Data-distribution configuration
The database server uses a hashing algorithm to store and locate information within the data-distribution cache. The DS_POOLSIZE configuration parameter controls the size of the data-distribution cache and controls the total number of column distributions that can be stored in the data-distribution cache. The value of the DS_POOLSIZE configuration parameter represents half of the maximum number of distributions in the data distribution cache.
To modify the number of buckets in the data-distribution cache, use the DS_HASHSIZE configuration parameter.
Distributions_per_bucket = DS_POOLSIZE / DS_HASHSIZE
To modify the number of distributions per bucket, change either the DS_POOLSIZE or DS_HASHSIZE configuration parameter.
For example, with the default values of 127
for
DS_POOLSIZE and 31
for DS_HASHSIZE, you can potentially
store distributions for about 254 columns in the data-distribution
cache. When the cache is full, the database server automatically increases
the size of the cache by 10%.
For example, with the default values of 127
for
DS_POOLSIZE and 31
for DS_HASHSIZE, you can potentially
store distributions for about 127 columns in the data-distribution
cache. The cache has 31 hash buckets, and each hash bucket can have
an average of 4 entries.
- The number of columns for which you run the UPDATE STATISTICS
statement in HIGH or MEDIUM mode and you expect to be used most often
in frequently run queries.
If you do not specify columns when you run UPDATE STATISTICS for a table, the database server generates distributions for all columns in the table.
You can use the values of DD_HASHSIZE and DD_HASHMAX as guidelines for DS_HASHSIZE and DS_POOLSIZE. The DD_HASHSIZE and DD_HASHMAX specify the size for the data-dictionary cache, which stores information and statistics about tables that queries access.
For medium to large systems, you can start with the following values:
- DD_HASHSIZE 503
- DD_HASHMAX 4
- DS_HASHSIZE 503
- DS_POOLSIZE 1000
- DS_POOLSIZE 2000
Monitor these caches by running the onstat -g dsc command to see the actual usage, and you can adjust these parameters accordingly.
- The amount of memory available
The amount of memory that is required to store distributions for a column depends on the level at which you run UPDATE STATISTICS. Distributions for a single column might require between 1 KB and 2 MB, depending on whether you specify medium or high mode or enter a finer resolution percentage when you run UPDATE STATISTICS.
- The database server uses the DS_POOLSIZE value to determine when
to remove entries from the data-distribution cache. However, if the
optimizer needs the dropped distributions for another query, the database
server must reaccess them from the sysdistrib system catalog
table on disk. The additional I/O and buffer pool operations to access sysdistrib on
disk adds to the total response time of the query.
The database server tries to maintain the number of entries in data-distribution cache at the DS_POOLSIZE value. If the total number of entries reaches within an internal threshold of DS_POOLSIZE, the database server uses a least recently used mechanism to remove entries from the data-distribution cache. The number of entries in a hash bucket can go past this DS_POOLSIZE value, but the database server eventually reduces the number of entries when memory requirements drop.
- If DS_HASHSIZE is small and DS_POOLSIZE is large, overflow lists
can be long and require more search time in the cache.
Overflow occurs when a hash bucket already contains an entry. When multiple distributions hash to the same bucket, the database server maintains an overflow list to store and retrieve the distributions after the first one.
If DS_HASHSIZE and DS_POOLSIZE are approximately the same size, the overflow lists might be smaller or even nonexistent, which might waste memory. However, the amount of unused memory is insignificant overall.
- If the data-distribution cache is full most of the time and commonly used columns are not listed in the distribution name field, try increasing the values of the DS_HASHSIZE and DS_POOLSIZE configuration parameters.
- If the total number of entries is much lower than the value of the DS_POOLSIZE configuration parameter, you can reduce the values of the DS_HASHSIZE and DS_POOLSIZE configuration parameters.
- If the number of hits are not evenly distributed among hash lists, increase the number of hash lists by increasing the value of the DS_HASHSIZE configuration parameter. Adjust the number of hash lists to have the least number of high hit entries per hash list.