The etx_contains() operator
The etx_contains() operator executes a search that you define by using a clue, tuning parameters, and an optional statement local variable (SLV).
Syntax for COMPARISON condition
Element | Purpose | Syntax |
---|---|---|
col_exp | The name of the column you want to search. | Syntax must conform to the column expression syntax in HCL OneDB™ Guide to SQL: Syntax. |
slv | A statement local variable (SLV) that the search engine uses to store the score and highlighting information of a particular row. | Syntax must conform to the identifier segment; see HCL OneDB Guide to SQL: Syntax. |
Syntax for quoted string
Element | Purpose | Restrictions | Syntax |
---|---|---|---|
character | A character that forms part of the quoted string | When the quoted string is part of a Row() expression
(see Syntax row expression usage), the following
conventions apply:
|
Characters are literal values that you enter from the keyboard. |
Syntax row expression usage
Element | Purpose | Restrictions | Syntax |
---|---|---|---|
hits | Maximum number of hits, per index fragment, returned by a text search. | The data type must be a positive integer. | Syntax must conform to the Literal Number syntax in HCL OneDB Guide to SQL: Syntax. |
limit | Number of nonsearch words that can occur between two or more search words. | The data type must be a nonzero integer greater than the number of search words. | Syntax must conform to the Literal Number syntax in HCL OneDB Guide to SQL: Syntax. |
score | Minimum score for a word to be considered a pattern match. | The data type must be a real number. | Syntax must conform to the Literal Number syntax in HCL OneDB Guide to SQL: Syntax. |
syn_name | Synonym list name. | The synonym list must exist. | Syntax must conform to the Literal Number syntax in HCL OneDB Guide to SQL: Syntax. |
IfxDocDesc | The clue can be stored in an IfxDocDesc data type. | Only the data in the location field is consulted when a search is executed. | See The IfxDocDesc data type for more information about the IfxDocDesc row data type. |
Tuning parameters
Tuning parameter | Description | Restrictions | Default |
---|---|---|---|
CONSIDER_STOPWORDS | Indicates to the search engine that stopwords are to be included in the search. | The etx index must have been created with the INCLUDE_STOPWORDS and STOPWORD_LIST index parameters. | Disabled |
MATCH_SYNONYM | Enables synonym matching. If no value is specified, the default synonym list etx_thesaurus is consulted. If a value is specified, the specified synonym list is consulted instead of the default list. Follow the instructions given in Synonym lists to create a default or alternative list. | If a custom synonym list is specified, it must have already been created through the etx_CreateSynWlst() routine. When used together with pattern matching, such as PATTERN_ALL, only pattern matches of root words are found, not of synonyms. | Disabled |
MAX_MATCHES | Allows you to specify the maximum number of hits per index fragment returned by the search engine. If you want to use this tuning parameter to limit the number of rows returned from a search, set it to the larger of 1000 or 10% of the total number of rows in the table. Per index fragment means that if, for example, the etx index is fragmented into two parts and this tuning parameter is set to 1000, a possible maximum of 2000 hits might be returned. | None | All hits |
NO_HILITE | Allows you to speed up queries for which you want to return the score of the query results but you are not interested in highlighting the information. | The query must include the etx_ReturnType data type, a complex row type that includes both the score for the document and a subsidiary row type with highlight information. Returning all this information can incur considerable overhead, but by specifying the NO_HILITE query parameter, the DataBlade® module skips all preparation of the highlighted data and returns just the score. | Disabled |
PATTERN_ALL | Enables all the pattern search options: PATTERN_BASIC, PATTERN_TRANS, and PATTERN_SUBS. | The etx index must have been created with
the WORD_SUPPORT = PATTERN index parameter. |
Disabled |
PATTERN_BASIC | Enables the basic search option. The search returns the best pattern matches based on the value of WORD_SCORE. This might include words that are substring or superstring pattern matches of the words in the clue, transpositions, and substitutions, although it is not guaranteed. | The etx index must have been created with
the WORD_SUPPORT = PATTERN index parameter. |
Disabled |
PATTERN_SUBS | Indicates to the search engine that you want words returned that match the clue except for one character. | The etx index must have been created with
the WORD_SUPPORT = PATTERN index parameter. |
Disabled |
PATTERN_TRANS | Indicates to the search engine that you want words returned that match the clue except for a single transposition. | The etx index must have been created with
the WORD_SUPPORT = PATTERN index parameter. |
Disabled |
SEARCH_TYPE | Allows you to specify the type of search you want
to perform.
|
One of the following values:
If set to If
set to |
WORD |
WILDCARD_CLUE | Provides an alternative to the PATTERN_* query parameters. When you include the WILDCARD_CLUE query parameter, the clue can have an * character at the beginning, end, or both of any or all words in the clue, and the * will be expanded to match any string of characters. WILDCARD_CLUE applies on a word-by-word basis within the clue. This means that after all the clue words have been expanded, the clue is treated as a keyword, a Boolean expression, or a phrase, depending on the other query parameters. An exact match of the literals in a clue when you use WILDCARD_CLUE results in a contribution of score 99 to the document score, regardless of whether the matched word has additional letters at the beginning or end. This means, for example, that the clue *worth* matches with a score of 99 all the words worth, worthy, and unworthiness. | You cannot specify an * in the middle of a clue.
You cannot combine the WILDCARD_CLUE query parameter with any of the
PATTERN_* parameters in the set of query parameters for a given clue.
If you want to execute a wildcard query for a clue that includes the
character * itself as a literal, use the backslash to escape the *
character. The * must be in the character set for the DataBlade module to be able to search for it. If you include the query parameter MATCH_SYNONYM in a wildcard query, the literals in the clue (excluding the leading or trailing *) are looked up in the synonym list. If found, all synonyms have the wildcard expansion done for them and for the original literals in the clue. If you created the index with a stopword list, the DataBlade module performs standard stopword processing for all clue words that result AFTER doing the wildcard expansion. |
Disabled |
WORD_SCORE | Allows you to specify a minimum resemblance for pattern matches. The search engine counts as hits only words that meet the minimum standard set by WORD_SCORE. | Must be a value from 1 through 100, inclusive.
Specifying 0 indicates that you want to set the value
back to the default, 70 . |
70 |
Return type
The etx_contains() operator
returns BOOLEAN
.
Usage
SELECT title FROM reports
WHERE etx_contains (abstract, Row('multimedia'))
AND doc_no > 1005 ;
The optional third parameter of etx_contains() is an SLV that returns scoring and internal highlighting information. The contents of the SLV are valid only for the life of the query. The data type of the SLV is etx_ReturnType, a row data type derived from HCL OneDB. For more information about the etx_ReturnType data type, see The etx_ReturnType data type.
Although the etx access method supports fragmented indexes, you cannot use the etx_contains() operator to fragment an index by expression.
SELECT title FROM reports
WHERE etx_contains (abstract, 'multimedia')
AND doc_no > 1005 ;
Typically, the clue is a quoted string of one or more words, such as the word multimedia in the preceding example. Sometimes, however, you might want to use an entire document as the clue. To do this, instead of specifying a quoted string as one of the parameters of the etx_contains() operator, specify an IfxDocDesc document.
Due to the flexibility of the LLD_Locator data type, the data type of the location field of the IfxDocDesc data type, you can specify as a clue either a document stored in the database or a document stored as a file on the operating system. Only the location field of the IfxDocDesc data type is consulted when a document is specified as a clue to the etx_contains() operator. The contents of the other fields, such as format and version, are ignored.
An example of a search that uses an IfxDocDesc document as a clue is shown in the next section.
For more information about the IfxDocDesc and LLD_Locator data types, see Data types.
Examples
SELECT title FROM reports
WHERE etx_contains(abstract, Row('multimedia'))
AND author = 'Joe Smith';
SELECT title FROM reports
WHERE etx_contains(abstract,
Row('multimedia video',
'PATTERN_TRANS & PATTERN_SUBS & MATCH_SYNONYM = my_synonymlist'));
SELECT rc1.score, title FROM reports
WHERE etx_contains (abstract,
Row('video'), rc1 # etx_ReturnType)
AND doc_no > 1005
ORDER BY 1;
SELECT title FROM reports
WHERE etx_contains (abstract,
Row ( Row ('ASCII', '0',
Row ('IFX_FILE', NULL::LLD_Lob,
'/local0/excal/clue.txt')::LLD_Locator,
NULL::LVARCHAR)::IfxDocDesc) );
The entire contents of the operating system file /local0/excal/clue.txt are automatically converted into the clue. Even though no tuning parameters are specified, the IfxDocDesc clue must still be encapsulated within a Row() constructor.
select title from reports
where etx_contains(abstract,
Row('dynamic server','search_type=PROX_SEARCH(3) & NO_HILITE'),
rc#Etx_ReturnType) order by rc.score;
select title from reports
where etx_contains(abstract,
Row('build*','WILDCARD_CLUE & NO_HILITE'), rc#Etx_ReturnType);
select title from reports
where etx_contains(abstract,
Row("*forgiv* drink*", "WILDCARD_CLUE & search_type=PHRASE_EXACT"));
This
query finds unforgivable drinking
and forgiven
drinkers
. The query does not find unforgiving nondrinkers
,
since there is no * at the beginning of the clue word drink*
.
- The stopwords in the stopword list are
the
,and
,a
, andor
. - Three rows in the table have the following text in the search
column:
- 1 “these are the best of times”
- 2 “wherever you are, there you are”
- 3 “over the rainbow”
- You specify the following clue:
select title from reports where etx_contains(abstract, Row("the*", "WILDCARD_CLUE & search_type=WORD"));
The query returns rows 1 and 2. This is because the *
is expanded to these
, there
, and the
.
Then stopword processing is performed, which eliminates the
from
the search. This eliminates row 3 from the result.
For additional examples of the etx_contains() operator, see Concepts, and Tutorial.