Display data distributions
You can use the dbschema utility to display data distributions.
Unless column values change considerably, you do not need to regenerate the data distributions. To verify the accuracy of the distribution, compare dbschema -hd output with the results of appropriately constructed SELECT statements.
For example, the following dbschema command produces a list of distributions for each column of table customer in database vjp_stores, with the number of values in each bin and the number of distinct values:
dbschema -hd customer -d vjp_stores
"Displaying Data Distributions with dbschema -hd" shows the data distributions for the column zipcode that this dbschema -hd command produces. Because this column heads the zip_ix index, UPDATE STATISTICS HIGH was run on it, as the following output line indicates:
High Mode, 0.500000 Resolution
"Displaying Data Distributions with dbschema -hd" shows 17 bins with one distinct zipcode value in each bin.
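The comparison between dbschema -hd output and SELECT results can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the real database server, the six sample rows are hypothetical, and the variable names are invented for the example. The SELECT statements themselves are standard SQL that you would adapt and run against the actual customer table.

```python
import sqlite3

# Assumption: SQLite stands in for the real server, and these rows are
# hypothetical sample data, not the contents of vjp_stores.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_num INTEGER, zipcode TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [
        (101, "94086"),
        (102, "94117"),
        (103, "94303"),
        (104, "94026"),
        (105, "94303"),
        (106, "94022"),
    ],
)

# Total row count, to compare with the row count behind the distribution.
total_rows = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

# Number of distinct values, to compare with the distinct counts per bin.
distinct_values = conn.execute(
    "SELECT COUNT(DISTINCT zipcode) FROM customer"
).fetchone()[0]

# Values that occur more than once, to compare with the overflow list.
duplicates = conn.execute(
    "SELECT zipcode, COUNT(*) FROM customer "
    "GROUP BY zipcode HAVING COUNT(*) > 1 ORDER BY zipcode"
).fetchall()

print(total_rows, distinct_values, duplicates)
```

If the counts returned by such queries differ noticeably from what dbschema -hd reports, the column values have probably changed enough to justify regenerating the distributions.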
The OVERFLOW portion of the output shows the duplicate values that might skew the distribution data, so dbschema moves them from the distribution to a separate list. The number of duplicates in this overflow list must be greater than a critical amount that the following formula determines. "Displaying Data Distributions with dbschema -hd" shows a resolution value of 0.5 percent, which is .0050 as a fraction. Therefore, this formula determines that any value that is duplicated more than one time is listed in the overflow section.
Overflow = .25 * resolution * number_rows
= .25 * .0050 * 28
= .035
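The arithmetic above can be reproduced with a short calculation. This is a minimal sketch: the function name overflow_threshold is hypothetical, and it assumes that the resolution reported in the output (0.500000) is a percentage that must be divided by 100 before it is used in the formula.

```python
def overflow_threshold(resolution_percent: float, number_rows: int) -> float:
    """Return the critical amount: .25 * resolution * number_rows.

    Assumption: resolution_percent is given as a percentage
    (for example 0.5), so it is converted to a fraction first.
    """
    resolution = resolution_percent / 100.0
    return 0.25 * resolution * number_rows


# Values from the example: 0.5 percent resolution, 28 rows.
threshold = overflow_threshold(0.5, 28)
print(round(threshold, 3))
```

Because the result is well below 1, any value that appears more than once exceeds the critical amount, which matches the statement in the text.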
For more information about the dbschema utility, see the HCL OneDB™ Migration Guide.