Method R Discussion Group

Method R helps developers, DBAs, and decision-makers optimize Oracle-based software in every phase of the software life cycle.


Estimate Size & Literal Values - Webinar on 8/30/07

If we are using literal values and have very large tables, is there a good estimate size (rule of thumb) for gathering statistics? 20%? 30%?
Becky Goodman
Thursday, August 30, 2007
 
 
If you are considering 20% or 30%, you might as well use 100% - unless you are using block sampling. Think about it: at 20% sampling you are asking for 1 out of every 5 rows. If your blocks hold more than 5 rows on average, that means (at least) 1 row from every block, on average. Granted, you may avoid reading a few blocks, but you will read almost all of the blocks of the table. The same argument holds for sampling rates all the way down to 1% or less, depending on the average row length, the block size, and hence the rows per block.
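(To put a rough number on that: with row sampling at 20%, a block holding, say, 50 rows has only a 0.8^50 chance - about 0.001% - of contributing no sampled row at all, so effectively every block still gets visited. The 50 rows per block is just an assumed figure for illustration.)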
For table statistics you can safely use very small sampling rates. The number of blocks is never sampled but taken from the segment header, so it is accurate regardless of the sampling rate, and the number of rows is estimated remarkably accurately even from small samples. The only fly in the ointment is the column statistics for columns with rather skewed value distributions: the coarser the sampling, the more likely you are to miss infrequently occurring values and ultimately underestimate num_distinct. Indexes are a different story - the estimate of distinct keys (and related statistics) suffers as the sampling rate drops.
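A rough way to see that num_distinct effect on a skewed column is to compare the sampled statistic against the real value (BIG_TABLE and STATUS are just placeholder names here):

  -- what the sampled gather recorded
  select column_name, num_distinct, sample_size
  from   user_tab_col_statistics
  where  table_name = 'BIG_TABLE';

  -- the actual number of distinct values, for comparison
  select count(distinct status) from big_table;

If the sampled num_distinct comes out noticeably lower than the real count, the sample was too coarse for that column.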
For tables I would (and do) use auto_sample_size, and for indexes and for those columns where I do collect histograms I use 100%.
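A minimal sketch of that split might look like the following (BIG_TABLE and BIG_TABLE_PK are made-up names; columns that need histograms would get their own 100% pass along the same lines):

  begin
    -- table statistics: small automatic sample, no histograms
    dbms_stats.gather_table_stats(
      ownname          => user,
      tabname          => 'BIG_TABLE',
      estimate_percent => dbms_stats.auto_sample_size,
      method_opt       => 'for all columns size 1',
      cascade          => false);
    -- index statistics: full compute
    dbms_stats.gather_index_stats(
      ownname          => user,
      indname          => 'BIG_TABLE_PK',
      estimate_percent => 100);
  end;
  /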
Wolfgang Breitling
Thursday, August 30, 2007
 
 
I have to revise my stand on this. An estimate_percent < 100 does perform faster. My initial assertion that you still read essentially all blocks as long as estimate_percent < average rows per block is still correct, though. The difference comes from the sort(s) required to get the column statistics, whether collecting histograms (for all columns size auto) or not (for all columns size 1). There Oracle applies the sampling percentage, so at e.g. estimate_percent=>20 only 20% of the rows are processed and thus the sort size is only 20%. Especially for large tables that is a significant saving; a full compute could easily push the sort into a multi-pass sort. Or, putting it another way, you want to select your sampling percentage such that your sort(s) for column statistics require at worst a one-pass sort.
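One way to get a feel for whether those sorts stayed within one pass is to watch the workarea statistics while the gather runs (this is just an observation aid, not part of the gather itself):

  select name, value
  from   v$sysstat
  where  name like 'workarea executions%';

A growing 'workarea executions - multipass' count during the gather is the sign that the sort spilled beyond a one-pass sort and the sampling percentage (or PGA sizing) needs another look.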
Wolfgang Breitling
Wednesday, October 24, 2007
 
 
Slight correction
"estimate_percent < average rows per block" should be
"estimate_percent > 100/average rows per block"
Wolfgang Breitling
Wednesday, October 24, 2007
 
 

This topic is archived. No further replies will be accepted.
