Updating statistics for join columns
In some situations, you might want to run the UPDATE STATISTICS statement with the HIGH keyword for specific join columns.
About this task
Because of improvements and adjusted cost estimates that establish better query plans, the optimizer in certain cases depends heavily on an accurate understanding of the underlying data distributions. A complex query might still not run quickly enough, even though you followed the guidelines in Creating data distributions. If your query involves equality predicates, take one of the following actions:
- Run the UPDATE STATISTICS statement with the HIGH keyword for specific join columns that appear in the WHERE clause of the query. If you followed the guidelines in Creating data distributions, columns that head indexes already have HIGH mode distributions.
- Determine whether HIGH mode distribution information about columns that do not head indexes can provide a better execution path.
To determine whether UPDATE STATISTICS HIGH on join columns might make a difference:
Procedure
- Issue the SET EXPLAIN ON statement and rerun the query.
- Note the estimated number of rows in the SET EXPLAIN output and the actual number of rows that the query returns.
- If these two numbers are significantly different, run UPDATE STATISTICS HIGH on the columns that participate in joins, unless you have already done so.
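As a rough sketch, the first two steps might look like the following in a dbaccess session, using the employee and address query from the example later in this topic. On UNIX, the explain output is written by default to a file named sqexplain.out in the current directory. The exact layout of that output varies by server version, so treat the comments as approximate.

```sql
-- Step 1: turn on explain output, then rerun the query under investigation.
SET EXPLAIN ON;

SELECT employee.name, address.city
FROM employee, address
WHERE employee.ssn = address.ssn
AND employee.name = 'James';

-- Stop writing explain output once the query plan has been captured.
SET EXPLAIN OFF;

-- Step 2: in the explain output, find the line that reports the estimated
-- number of rows returned, and compare it with the number of rows that the
-- query actually retrieved.
```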
Results
Important: If your table is very large, UPDATE STATISTICS in HIGH mode can take a long time to run.
The following example shows a query that involves join columns:
SELECT employee.name, address.city
FROM employee, address
WHERE employee.ssn = address.ssn
AND employee.name = 'James';
In this example, the join columns are the ssn columns in the employee and address tables. The data distributions for both of these columns must accurately reflect the actual data so that the optimizer can correctly determine the best join plan and execution order.
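For this query, HIGH mode distributions on the two join columns could be built with statements along the following lines. This is a sketch that assumes both tables are in the current database and that the columns are named ssn as in the example. Adjust the names for your schema.

```sql
-- Build HIGH mode distributions for the join column in each table.
UPDATE STATISTICS HIGH FOR TABLE employee (ssn);
UPDATE STATISTICS HIGH FOR TABLE address (ssn);
```

After the statements finish, you can rerun the query with SET EXPLAIN ON and check whether the estimated number of rows is now closer to the actual number of rows.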
You cannot use the UPDATE STATISTICS statement to create data distributions for a table that is external to the current database. For additional information about data distributions and the UPDATE STATISTICS statement, see the Informix® Guide to SQL: Syntax.