Filters in the query
The query optimizer bases query-cost estimates on the number of rows to be retrieved from each table. In turn, the estimated number of rows is based on the selectivity of each conditional expression that is used within the WHERE clause. A conditional expression that is used to select rows is termed a filter.
The selectivity is a value between 0 and 1 that indicates the proportion of rows within the table that the filter can pass. A selective filter, one that passes few rows, has a selectivity near 0, and a filter that passes almost all rows has a selectivity near 1. For guidelines on filters, see Improve filter selectivity.
The optimizer can use data distributions to calculate selectivity for the filters in a query. However, in the absence of data distributions, the database server calculates selectivity for filters of different types based on table indexes. The following table lists some of the selectivity values that the optimizer assigns to filters of different types. Selectivity that is calculated using data distributions is even more accurate than the selectivity that this table shows.
In the table:
- indexed-col is the first or only column in an index.
- 2nd-max, 2nd-min are the second-largest and second-smallest key values in indexed column.
- The plus sign ( + ) means logical union ( = the Boolean OR operator) and the multiplication symbol ( x ) means logical intersection ( = the Boolean AND operator).
Filter Expression | Selectivity (F) |
---|---|
indexed-col = literal-valueindexed-col = host-variableindexed-col IS NULL | F = 1/(number of distinct keys in index) |
tab1.indexed-col = tab2.indexed-col | F = 1/(number of distinct keys in the larger index) |
indexed-col > literal-value | F = (2nd-max - literal-value)/(2nd-max - 2nd-min) |
indexed-col < literal-value | F = (literal-value - 2nd-min)/(2nd-max - 2nd-min) |
any-col IS NULLany-col = any-expression | F = 1/10 |
any-col > any-expressionany-col < any-expression | F = 1/3 |
any-col MATCHES any-expressionany-col LIKE any-expression | F = 1/5 |
EXISTS subquery | F = 1 if subquery estimated to return >0 rows, else 0 |
NOT expression | F = 1 - F(expression) |
expr1 AND expr2 | F = F(expr1) x F(expr2) |
expr1 OR expr2 | F = F(expr1) + F(expr2) - (F(expr1) x F(expr2)) |
any-col IN list | Treated as any-col = item1 OR . . . OR any-col = itemn. |
any-col relop ANY subquery | Treated as any-col relop value1 OR .
. . OR any-col relop valuen for estimated size of
subquery n. Here relop is any relational operator, such as <, >, >=, <=. |