Updating statistics for join columns

In some situations, you might want to run the UPDATE STATISTICS statement with the HIGH keyword for specific join columns.

About this task

Because of improvements and adjusted cost estimates to establish better query plans, the optimizer depends greatly on an accurate understanding of the underlying data distributions in certain cases. You might still think that a complex query does not execute quickly enough, even though you followed the guidelines in Creating data distributions. If your query involves equality predicates, take one of the following actions:

Run the UPDATE STATISTICS statement with the HIGH keyword for specific join columns that appear in the WHERE clause of the query. If you followed the guidelines in Creating data distributions, columns that head indexes already have HIGH mode distributions.
Determine whether HIGH mode distribution information about columns that do not head indexes can provide a better execution path, take the following steps:

To determine if UPDATE STATISTICS HIGH on join columns might make a difference:

Procedure

Issue the SET EXPLAIN ON statement and rerun the query.
Note the estimated number of rows in the SET EXPLAIN output and the actual number of rows that the query returns.
If these two numbers are significantly different, run UPDATE STATISTICS HIGH on the columns that participate in joins, unless you have already done so.

Results

Important: If your table is very large, UPDATE STATISTICS with the HIGH mode can take a long time to execute.

The following example shows a query that involves join columns:

SELECT employee.name, address.city
   FROM employee, address
   WHERE employee.ssn = address.ssn
   AND employee.name = 'James'

In this example, the join columns are the ssn fields in the employee and address tables. The data distributions for both of these columns must accurately reflect the actual data so that the optimizer can correctly determine the best join plan and execution order.

You cannot use the UPDATE STATISTICS statement to create data distributions for a table that is external to the current database. For additional information about data distributions and the UPDATE STATISTICS statement, see the HCL OneDB™ Guide to SQL: Syntax.