Updating statistics for join columns
In some situations, you might want to run the UPDATE STATISTICS statement with the HIGH keyword for specific join columns.
About this task
- Run the UPDATE STATISTICS statement with the HIGH keyword for specific join columns that appear in the WHERE clause of the query. If you followed the guidelines in Creating data distributions, columns that head indexes already have HIGH mode distributions.
- Determine whether HIGH mode distribution information about columns that do not head indexes can provide a better execution path, take the following steps:
To determine if UPDATE STATISTICS HIGH on join columns might make a difference:
Procedure
- Issue the SET EXPLAIN ON statement and rerun the query.
- Note the estimated number of rows in the SET EXPLAIN output and the actual number of rows that the query returns.
- If these two numbers are significantly different, run UPDATE STATISTICS HIGH on the columns that participate in joins, unless you have already done so.
Results
SELECT employee.name, address.city
FROM employee, address
WHERE employee.ssn = address.ssn
AND employee.name = 'James'
In this example, the join columns are the ssn fields in the employee and address tables. The data distributions for both of these columns must accurately reflect the actual data so that the optimizer can correctly determine the best join plan and execution order.
You cannot use the UPDATE STATISTICS statement to create data distributions for a table that is external to the current database. For additional information about data distributions and the UPDATE STATISTICS statement, see the HCL OneDB™ Guide to SQL: Syntax.