Estimate the size of an etx index
This section describes the factors that can make your etx index larger than necessary and, as a result, make your queries run slower than necessary.
- The nature of the data to be indexed
If your documents contain a large amount of numeric data, such as the data found in financial reports, and you specify a character set that indexes numeric characters, each string of numbers is indexed as if it were a word. This can increase the number of unique words in the index, and therefore the size of the index.
- The index parameters that you specify
If you specify PHRASE_SUPPORT=MEDIUM or PHRASE_SUPPORT=MAXIMUM, the index will be two to four times larger than if you specify PHRASE_SUPPORT=NONE.
- Stopword lists
An etx index built with a stopword list (specified by STOPWORD_LIST='my_stopwordlist') is generally slightly smaller than an index that contains all the words in a document. However, if you also specify INCLUDE_STOPWORDS='TRUE', the index is approximately 50% larger.
Word support | Phrase support | Stopword list | Include stopwords | Total index size, in disk pages, for an etx index containing 100,000 documents |
---|---|---|---|---|
Exact | None | No | False | 29 KB |
Exact | None | Yes | False | 28 KB |
Exact | Medium | No | False | 69 KB |
Exact | Maximum | No | False | 87 KB |
Exact | Maximum | Yes | False | 62 KB |
Exact | Maximum | Yes | True | 87 KB |
Pattern | None | No | False | 34 KB |
Pattern | None | Yes | True | 34 KB |
Pattern | Medium | No | False | 73 KB |
Pattern | Maximum | No | False | 92 KB |
Pattern | Maximum | Yes | False | 67 KB |
Pattern | Maximum | Yes | True | 92 KB |
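
The ratios behind the rules of thumb above can be checked against the table with a few lines of arithmetic. The following is a minimal sketch, not part of the product: the `SIZES` dictionary and the `growth` helper are hypothetical names introduced here for illustration, and the numbers are copied from the table in whatever unit the table uses.

```python
# Sizes copied from the table above, keyed by
# (word support, phrase support, stopword list used, include stopwords).
# Values are in the table's own units, for an index of 100,000 documents.
SIZES = {
    ("Exact", "None", False, False): 29,
    ("Exact", "None", True, False): 28,
    ("Exact", "Medium", False, False): 69,
    ("Exact", "Maximum", False, False): 87,
    ("Exact", "Maximum", True, False): 62,
    ("Exact", "Maximum", True, True): 87,
    ("Pattern", "None", False, False): 34,
    ("Pattern", "None", True, True): 34,
    ("Pattern", "Medium", False, False): 73,
    ("Pattern", "Maximum", False, False): 92,
    ("Pattern", "Maximum", True, False): 67,
    ("Pattern", "Maximum", True, True): 92,
}


def growth(config, baseline):
    """Return how many times larger `config` is than `baseline` (hypothetical helper)."""
    return SIZES[config] / SIZES[baseline]


# Phrase support compared with PHRASE_SUPPORT=NONE (exact word support, no stopword list).
base = ("Exact", "None", False, False)
print(round(growth(("Exact", "Medium", False, False), base), 2))   # 2.38
print(round(growth(("Exact", "Maximum", False, False), base), 2))  # 3.0

# Effect of INCLUDE_STOPWORDS='TRUE' when a stopword list is in use.
with_list = ("Exact", "Maximum", True, False)
print(round(growth(("Exact", "Maximum", True, True), with_list), 2))  # 1.4
```

For this sample, medium and maximum phrase support produce an index about 2.4 and 3 times the size of the baseline, and including stopwords adds about 40%. Your own ratios will depend on the nature of your data.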