Resolution Clause
Use the Resolution clause in MEDIUM or HIGH mode to adjust the size of the distribution bins, and to avoid calculating data on indexes.
Element | Description | Restrictions | Syntax |
---|---|---|---|
confidence | Estimated likelihood that sampling in MEDIUM mode produces results equivalent to the exact HIGH mode. Default level is 0.95. | Must be within the range from 0.80 (minimum) to 0.99 (maximum) | Literal Number |
percent | Average percentage of the sample in each distribution bin. Default is 2.5 for MEDIUM and 0.5 for HIGH. | Minimum value is 1/nrows, for nrows the number of rows in the table | Literal Number |
min | The minimum integer number of randomly selected rows on which to generate the data distributions | Must be greater than zero but cannot exceed nrows | Literal Number |
A distribution is a mapping of the data in a column into a set of column values, ordered by magnitude or by collation. The range of these sample values is partitioned into disjunct intervals, called bins, each containing an approximately equal portion of the sample of column values. For example, if one bin holds 2 percent of the data, approximately 50 such intervals hold the entire sample.
Some statistical texts call these bins equivalence categories. Each contains a disjunct subset of the range of the data values that are sampled from the column.
UPDATE STATISTICS MEDIUM FOR TABLE orders RESOLUTION 4 0.90 DISTRIBUTIONS ONLY;This specifies 4% of the data per bin, implying approximately 25 bins, and a confidence level of 90%, and no examination of index data. If the 0.90 value were omitted, then the default level of confidence would have been in effect. If the RESOLUTION keyword and both numeric values were omitted, then default values for percent (2.5%) and for confidence (0.95) would be used.
The query optimizer estimates the selectivity of a WHERE clause by examining, for each column included in the WHERE clause, the proportional occurrence of the data values contained in the column.
You cannot create distributions for BYTE or TEXT columns. If you include a BYTE or TEXT column in an UPDATE STATISTICS statement that specifies MEDIUM or HIGH distributions, no distributions are created for those columns. Distributions are constructed for other columns in the list, however, and the statement does not return an error.
Columns of the VARCHAR data type do not use overflow bins, even when multiple bins are being used for duplicate values.
You can use the first two parameters of the DBUPSPACE environment variable to constrain the disk space and memory resources that the UPDATE STATISTICS statement can use to sort data when it constructs column distributions. These settings affect performance, because they determine how many times the database server scans the specified table to construct each distribution. (A third DBUPSPACE parameter can control whether UPDATE STATISTICS sorts with indexes when calculating column distributions, and whether the explain output file stores the plan by which the column distributions are calculated.)