Method R Discussion GroupMethod R helps developers, DBAs, and decision-makers optimize Oracle-based software in every phase of the software life cycle. |
||
Discuss Method R products, techniques, and events. Ask new questions or post your own responses. |
Is there a rule of thumb as to which is best in a given situation - height-based or freq-based histogram?
If you have a small number of distinct values, say less than 100 and queries are frequently executed against these columms for specific values and column is skewed, then Frequency Histograms are your best option.
If your number of distinct values is more than 254, Frequency Histograms are out of the question. You will have to use Height Balanced Histograms.
You should never say never (and I just broke that rule) but I have not yet seen a use for a height-balanced histogram in practice. All the histograms I have collected in production systems are frequency histograms and generally on columns with rather few distinct values - a dozen or two.
I can see a possible use for a height-balanced histogram on columns with only or predominantly range predicates or columns with extreme outliers - e.g. date '9999-12-31' for a never-never or unknown date. The problem with many automatically gathered height-balanced histograms is that they are marginal, meaning that there is no drastic skew with the consequence that any popular values are volatile, i.e. from one statistics gathering to the next different values surface as popular values. |
Powered by FogBugz