Glossary
- access method
- etx is an example of a secondary access method.
- approximate phrase search
- A search in which the search text must contain a phrase identical
to the clue, or one or more words of the clue. The order of the words
in the clue is not important. For example, if the clue is buy three
dolls, the search engine returns documents that contain the exact
phrase as well as those that contain the phrases three dolls, buy
dolls, or dolls buy.
- BLOB
- See also smart large object.
- Boolean search
- A search that uses the Boolean expressions (logical operators) &
(AND), | (OR), or ! and ^ (NOT). Use the & Boolean operator when
you want to search for documents that contain all the words in a keyword
list; use | when you want to search for documents that contain at
least one word in the list; and use ! or ^ when you want to search
for documents that do not contain a specified word. The Boolean operators
can be combined to make more complicated expressions. This type of
search is activated by setting the SEARCH_TYPE tuning parameter to
BOOLEAN_SEARCH.
- CLOB
- A smart large object data type that stores blocks of text items,
such as ASCII or PostScript™ files.
See also smart large object.
- clue
- The data that you are searching for, specified as the second argument to the etx_contains() operator.
- document score
- A value that the text search engine assigns to each of the returned rows of a fuzzy search that specifies the degree of similarity between your clue and each of the returned rows. Scores range from 0 to 100, with 0 indicating no match and 100 indicating a perfect match. You access scoring information through the third parameter of the etx_contains() operator, a statement local variable (SLV). The data type of the SLV is etx_ReturnType, a row type defined by HCL OneDB™ that consists of three fields. The scoring information is contained in the score field.
- filtering
- A component that automatically filters out all proprietary formatting information from a formatted document and converts it into ASCII form.
- exact phrase search
- A search for text that matches your clue exactly. An exact phrase search is successful when the text search engine finds a phrase that contains all the words in the clue in the exact order that you specify.
- fuzzy search
- A search for text that matches your clue approximately instead
of exactly. A fuzzy search takes into account substitutions, transpositions,
and basic pattern matching. A search that returns a document that
contains the word editer when searching for editor is an example of
a fuzzy search.
- hit
- The result (a row) of a text search.
- hitlist
- A list of hits (rows).
- highlighting
- The process of retrieving the location of every instance of a clue in the search text. Highlighting information is returned in the form of ordered pairs of integers that describe the location and length of all occurrences of the clue in the corresponding document.
- index parameter
- A variable that you use to specify the characteristics of an etx index
to support the searches you plan to perform. An example of an index
parameter is WORD_SUPPORT='EXACT'.
- keyword
- Any contiguous group of characters found in the search text or clue, delimited by nonindexable characters such as spaces or tabs.
- keyword search
- A search in which the words in the clue are treated as separate entities (keywords) instead of a single unit (phrase). When the text search engine performs a keyword search, it returns a row whenever it encounters one or more of the words in your clue.
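As a rough illustration of the keyword and keyword search entries, the following Python sketch splits text into keywords and reports a hit when at least one clue keyword occurs in the search text. It is not the text search engine's implementation. The names `keywords` and `keyword_search` are hypothetical, and treating whitespace as the only nonindexable characters and ignoring case are assumptions made for this sketch.

```python
import re

def keywords(text):
    """Split text into keywords.

    Assumption: whitespace (spaces, tabs) is the only kind of
    nonindexable character, and matching ignores case.
    """
    return [w for w in re.split(r"\s+", text.lower()) if w]

def keyword_search(search_text, clue):
    """Return True when at least one clue keyword occurs in search_text."""
    text_words = set(keywords(search_text))
    return any(word in text_words for word in keywords(clue))
```

With these definitions, `keyword_search("I will buy a car", "buy three dolls")` is true because the single keyword buy is enough for a hit.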
- operator class
- The set of operators that the database server associates with a secondary access method. When an index is created, it is associated with a particular operator class.
- pattern search
- See fuzzy search.
- phrase search
- A search in which the words in the clue are treated as a single unit (phrase) instead of separate entities (keywords). The two types of phrase searches are exact and approximate.
- proximity search
- A search in which you specify the number of nonsearch words that
can occur between two or more of the search words. You use a proximity
search if, for example, you are searching for a phrase that contains
the words editor and multimedia but do not want the two keywords
separated by more than four nonsearch words. This type of search is
activated by setting the tuning parameter SEARCH_TYPE equal to
PROX_SEARCH.
- rank
- The order given to a hitlist based on the score of each of the returned rows.
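To make the relationship between hit, hitlist, score, and rank concrete, here is a minimal Python sketch that orders a hitlist by score. The row identifiers and scores are invented for illustration, and sorting from highest to lowest score is an assumption; the glossary states only that the order is based on the score.

```python
# Hypothetical hitlist: (row identifier, document score) pairs.
hitlist = [("row_a", 62), ("row_b", 100), ("row_c", 85)]

def rank(hits):
    """Order hits by score, highest first.

    Assumption: descending order. The glossary only says the order
    is based on the score of each returned row.
    """
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

ranked = rank(hitlist)
```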
- root word
- The word in a synonym list that has one or more synonyms defined for it. It is the leftmost word of a single row of the synonym list. When synonym matching is activated, the keyword being searched for must be a root word for its synonym to be returned instead.
- row data type
- A complex data type consisting of a group of ordered data elements
(fields) of the same or different data types. The fields of a row
type can be of any supported built-in or extended data type, including
complex data types, except SERIAL, SERIAL8, and BIGSERIAL and, in
certain situations, TEXT and BYTE.
There are two kinds of row data types:
- Named row types, created with the CREATE ROW TYPE statement
- Unnamed row types, created with the ROW constructor
- sbspace
- A logical storage area that contains one or more chunks that store only smart large object data.
- score
- See document score, word score.
- search string
- See clue.
- search text
- The data that is to be searched, stored in a column of a table.
- SLV
- Abbreviation for statement local variable.
- smart large object
- A large object that:
- is stored in an sbspace, a logical storage area that contains one or more chunks.
- has read, write, and seek properties similar to a UNIX™ file.
- is recoverable.
- obeys transaction isolation modes.
- can be retrieved in segments by an application.
Smart large objects include CLOB and BLOB data types.
- statement local variable (SLV)
- A variable for storing a value that a function returns indirectly, through a pointer, in addition to the value that the function returns directly. The scope of an SLV is limited to the statement in which it is used. The third optional parameter of the etx_contains() operator is an SLV that holds scoring and highlighting information. The data type of the SLV is etx_ReturnType.
- stopword
- A keyword that you want excluded from your index or your search. Stopwords are typically common words such as and, or, the, and to, or any word that appears frequently in your document that you want to exclude.
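The effect of a stopword list can be sketched in a few lines of Python. This is only an illustration of the idea: the function name `remove_stopwords` is hypothetical, the sample list reuses the common words named in the entry above, and ignoring case is an assumption.

```python
# Sample stopword list, taken from the common words named in the entry above.
STOPWORDS = {"and", "or", "the", "to"}

def remove_stopwords(words, stopwords=STOPWORDS):
    """Drop every word that appears in the stopword list (case-insensitive)."""
    return [w for w in words if w.lower() not in stopwords]

remaining = remove_stopwords("the editor and the publisher".split())
```

Here only editor and publisher remain, so only those words would be indexed or searched.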
- substitution
- A misspelling of a word, in which one letter has been substituted by another, incorrect one. Misspelling searck for search is an example of a substitution.
- synonym
- One of two or more words or expressions that have the same or nearly the same meaning in some or all senses. The word java is a synonym of the word coffee.
- text search engine
- The component that calls the Text Retrieval Library (TRL) of Excalibur Technologies to perform a search. The TRL is a library of C-language object modules designed to perform fast retrieval and automatic indexing of text data. The text search engine is dynamically linked into HCL OneDB whenever a text search is performed or text data is indexed.
- transposition
- A misspelling of a word in which two adjacent letters switch positions. Misspelling saerch for search is an example of a transposition.
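The two kinds of misspelling defined above, substitution and transposition, can be told apart with a short Python sketch. The function names are hypothetical, and this is not the matching logic of the text search engine; it only checks the definitions as stated in this glossary.

```python
def is_substitution(word, target):
    """True when word differs from target in exactly one letter position."""
    if len(word) != len(target):
        return False
    return sum(a != b for a, b in zip(word, target)) == 1

def is_transposition(word, target):
    """True when swapping one pair of adjacent letters in word gives target."""
    if len(word) != len(target):
        return False
    diffs = [i for i, (a, b) in enumerate(zip(word, target)) if a != b]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and word[diffs[0]] == target[diffs[1]]
        and word[diffs[1]] == target[diffs[0]]
    )
```

Using the glossary's own examples, `is_substitution("searck", "search")` and `is_transposition("saerch", "search")` both hold, while each misspelling fails the other check.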
- tuning parameter
- A variable used to guide the way the text search engine conducts
a search. Tuning parameters are passed to the text search engine through
the second parameter of the Row() constructor of the etx_contains() operator.
An example of a tuning parameter is SEARCH_TYPE = WORD.
- word score
- The search engine uses fuzzy logic to determine whether a pattern match is to be considered a hit. It assigns a word score to candidate matches based on its internal rules. By default, only words that match your search clue by a relative measure of 70 out of 100—that have a word score of 70 or better—are considered hits.
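The default threshold described above can be illustrated with a small Python sketch that filters candidate matches by word score. The scores below are invented for illustration; real word scores come from the search engine's internal rules, which this sketch does not attempt to reproduce. The function name `hits_above_threshold` is hypothetical.

```python
DEFAULT_WORD_SCORE_THRESHOLD = 70  # default value stated in the entry above

def hits_above_threshold(candidates, threshold=DEFAULT_WORD_SCORE_THRESHOLD):
    """Keep candidate words whose word score is at or above the threshold."""
    return [word for word, score in candidates.items() if score >= threshold]

# Hypothetical word scores for the clue "editor".
candidates = {"editor": 100, "editer": 85, "edict": 40}
accepted = hits_above_threshold(candidates)
```

In this example editor and editer are treated as hits, and edict is rejected because its assumed score falls below 70.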