Managing stop words
ZooKeeper manages several kinds of customizable term lists that the search function uses. Stop words are terms that are removed from a query before it is processed. Each custom list is accessible, and you can change stop words directly in ZooKeeper.
Stop word lists in ZooKeeper
ZooKeeper stores a set of lists that are used by the HCL Commerce Version 9.1 Query service. Each entry in a list consists of a word and either its accompanying association or the action to be taken when the word is encountered. The Stop words list records the words that are filtered out of the search query before Natural Language Processing is performed on it. This list usually contains the most common words in a language (such as "the" for English).
(a|an|and|are|as|at|be|but|by|for|if|in|into|is|it|no|not|of|on|or|such|that|the|their|then|there|these|they|this|to|was|will|with|,
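As an illustration of what filtering stop words out of a query means, the following minimal Python sketch drops stop words from a whitespace-separated query. It is not the Query service's implementation: the function name `remove_stop_words`, the lowercase comparison, and the short `STOP_WORDS` subset are assumptions made for this example.

```python
# Illustrative subset of the English stop words shown above (not the full list).
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "with"}


def remove_stop_words(query, stop_words=STOP_WORDS):
    """Return the query with stop words dropped.

    Assumption: terms are split on whitespace and compared in lowercase.
    """
    kept = [term for term in query.split() if term.lower() not in stop_words]
    return " ".join(kept)


print(remove_stop_words("the red dress with a belt"))  # prints "red dress belt"
```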
These custom lists are stored in JSON format in ZooKeeper, in language-specific dictionaries. The following section describes the structure of these dictionaries, and how you can interact with them in ZooKeeper using the REST API.
The Stop Words dictionary
You interact with the Stop Words dictionary using REST calls. The permitted calls are GET, POST, and PATCH. For example, the response body of a GET call contains a JSON-formatted set of the terms that you requested. There is no explicit DELETE call; however, you can delete an item by sending a POST with empty content.
http://data_environment_hostname:30920/search/resources/api/v2/configuration?nodeName=environmentType_storeID_product_stopwords&locale=en_US
Where environmentType is either auth or live.
{
"the": "",
"and": ""
}
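The calls described above can be sketched in Python with only the standard library. This is a hedged example, not an official client: the helper names (`stopwords_url`, `get_stop_words`, `build_post`) and the `Content-Type` header are assumptions, and the POST body simply mirrors the JSON shape shown above. The text does not say whether "empty content" means an empty body or an empty JSON object, so that case is left for you to confirm against your environment.

```python
import json
import urllib.parse
import urllib.request

BASE_PATH = "/search/resources/api/v2/configuration"


def stopwords_url(host, env_type, store_id, locale="en_US", port=30920):
    """Build the Stop Words dictionary URL from its documented parts."""
    if env_type not in ("auth", "live"):
        raise ValueError("environmentType must be 'auth' or 'live'")
    node_name = f"{env_type}_{store_id}_product_stopwords"
    query = urllib.parse.urlencode({"nodeName": node_name, "locale": locale})
    return f"http://{host}:{port}{BASE_PATH}?{query}"


def get_stop_words(url):
    """GET the dictionary and decode the JSON response body."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def build_post(url, stop_words):
    """Prepare (but do not send) a POST whose body maps each word to ""."""
    body = json.dumps({word: "" for word in stop_words}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},  # assumed header
    )
```

For example, `stopwords_url("data_environment_hostname", "auth", "storeID")` reproduces the URL shown above, and passing the prepared request to `urllib.request.urlopen` sends it.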
Extending the Stop words
You might need to extend the Stop words in the following cases:
- When your customers commonly use additional terms, in English or in technical terminology, that need to be filtered out in addition to the default set.
- When you need to add Stop words in a language other than English.
PATCH http://host:port/search/resources/api/v2/configuration?nodeName=component&envType=auth
Where the body of the request is similar to the following example, in which the value contains a list of Spanish Stop words.
{
"extendedconfiguration": {
"configgrouping": [
{
"name": "SearchConfiguration",
"property":
{ "name": "SkipWords", "value": "\\s+(de|la|que|el|en|,|a|los|.|del|se|')\\s+" }
}
]
}
}
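A request body of this shape can be generated from a plain word list, as in the following Python sketch. The helper names (`skipwords_pattern`, `skipwords_payload`, `build_patch`) and the `Content-Type` header are assumptions for illustration. The sketch also assumes that the SkipWords value is interpreted as a regular expression, so it escapes each word with `re.escape`; this differs from the example above, where the "." entry is left unescaped.

```python
import json
import re
import urllib.request


def skipwords_pattern(words):
    """Join words into a whitespace-delimited alternation, as in the example."""
    alternation = "|".join(re.escape(word) for word in words)
    return r"\s+(" + alternation + r")\s+"


def skipwords_payload(words):
    """Build the request body with the structure shown above."""
    return {
        "extendedconfiguration": {
            "configgrouping": [
                {
                    "name": "SearchConfiguration",
                    "property": {
                        "name": "SkipWords",
                        "value": skipwords_pattern(words),
                    },
                }
            ]
        }
    }


def build_patch(host, port, words, env_type="auth"):
    """Prepare (but do not send) the PATCH request."""
    url = (
        f"http://{host}:{port}/search/resources/api/v2/configuration"
        f"?nodeName=component&envType={env_type}"
    )
    body = json.dumps(skipwords_payload(words)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},  # assumed header
    )
```

Because `json.dumps` escapes backslashes, the pattern `\s+` is serialized as `\\s+` in the request body, which matches the form of the example value.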