Stopword lists

A stopword is a word that you want excluded from your index and, as a consequence, from your searches. A typical stopword list includes words like of, the, and by. Which words belong in a stopword list depends on the content and type of your data.

Any frequently occurring word that you want excluded from your index is a candidate for inclusion in a stopword list. Stopword lists can reduce the time it takes to perform a search, reduce index size, and help you avoid false hits.

To create and drop stopword lists, you use procedures defined for the DataBlade® module. For example, to create a stopword list, first create an operating system file that contains the list of stopwords, one word per line. Then make the stopword list known to the module by executing the procedure etx_CreateStopWlst(), as shown in the following example:
EXECUTE PROCEDURE etx_CreateStopWlst
    ('stopwlist', '/local0/excal/stopwlist');

This statement creates the stopword list stopwlist from the operating system file /local0/excal/stopwlist. An optional third argument specifies the sbspace in which to store the list. If you do not specify an sbspace, the list is stored in the default sbspace, which is set by the SBSPACENAME parameter in the onconfig file.
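
As a sketch of the optional third argument, the following statement stores the same list in a named sbspace. The sbspace name sbsp1 is an assumption made for illustration; substitute an sbspace that exists on your database server. The second statement drops the list with etx_DropStopWlst(), which is assumed here to take the list name as its only argument.

```sql
-- Create the stopword list in the sbspace sbsp1 (hypothetical name)
EXECUTE PROCEDURE etx_CreateStopWlst
    ('stopwlist', '/local0/excal/stopwlist', 'sbsp1');

-- Drop the stopword list when it is no longer needed
EXECUTE PROCEDURE etx_DropStopWlst('stopwlist');
```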

You can create your own stopword list file, or you can create one based on a list of standard English-language stopwords provided with your DataBlade module in the following location:
$INFORMIXDIR/extend/ETX.version/wordlist/etx_stopwords.txt 
where version is the current version of the DataBlade module installed on your computer.

You can have at most one stopword list associated with an etx index. The stopword list is specified when the index is initially created with the index parameter STOPWORD_LIST. The stopword list must exist when the etx index is created.
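
The following statement is a minimal sketch of how a stopword list might be associated with an etx index at creation time. The index name desc_idx and the sbspace sbsp1 are hypothetical, and the operator class etx_clob_ops assumes that the description column of the videos table is of type CLOB; choose the operator class that matches your column type.

```sql
-- The stopword list stopwlist must already exist when this statement runs
CREATE INDEX desc_idx ON videos (description etx_clob_ops)
    USING etx (STOPWORD_LIST = 'stopwlist')
    IN sbsp1;  -- hypothetical sbspace name
```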

At times, you might want a search to include words that are in your stopword list. For example, suppose that your stopword list contains the three words to, be, and or, and that you want to search for the exact phrase “to be or not to be.”

To search for stopwords with an etx index that has an associated stopword list, specify the INCLUDE_STOPWORDS index parameter when you create the index, and then specify the CONSIDER_STOPWORDS tuning parameter when you execute the search. The CONSIDER_STOPWORDS parameter forces the search engine to include words that are in the stopword list. For example, you can search for the phrase “to be or not to be” as follows:
SELECT id, description FROM videos 
    WHERE etx_contains(description, 
    Row('to be or not to be', 
        'SEARCH_TYPE = PHRASE_EXACT & CONSIDER_STOPWORDS'));
Important: The CONSIDER_STOPWORDS tuning parameter of the etx_contains() operator takes effect only if the index parameter INCLUDE_STOPWORDS = 'TRUE' is specified in the CREATE INDEX statement that creates the etx index.
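
For illustration, an index that supports the preceding query might be created as in the following sketch. The index name desc_sw_idx and the sbspace sbsp1 are hypothetical, and the operator class etx_clob_ops again assumes a CLOB column.

```sql
-- INCLUDE_STOPWORDS = 'TRUE' indexes the words in stopwlist so that a
-- search that specifies CONSIDER_STOPWORDS can match them
CREATE INDEX desc_sw_idx ON videos (description etx_clob_ops)
    USING etx (STOPWORD_LIST = 'stopwlist',
               INCLUDE_STOPWORDS = 'TRUE')
    IN sbsp1;  -- hypothetical sbspace name
```

Searches that omit CONSIDER_STOPWORDS against such an index are expected to ignore the stopwords as usual.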