Overriding the stemmer
The default stemmer can process terms that end in "ing" and "er" inconsistently. For example, a search for "air conditioner" can return a different number of results than a search for "air conditioning":
- "air conditioner": 84 results
- "air conditioning": 9 results
The counts differ because the stemmer reduces the two words to different stems:
- "conditioner" becomes "condition"
- "conditioning" becomes "condit"
You can modify the behavior of the stemmer to achieve consistent results.
Rather than creating synonyms for a potentially large pool of product names, you can apply an override to protect terms modified by the stemmer. This capability is built into Elasticsearch, and is described in the Elasticsearch reference guide.
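As an illustration of the idea only, the following minimal Python sketch applies an override table before a stemmer runs. The names `OVERRIDES`, `toy_stem`, and `analyze` are hypothetical, and `toy_stem` is a stand-in that knows only the two stems quoted in this topic; it is not the Elasticsearch stemmer.

```python
from typing import Dict, List, Optional

# Hypothetical override table: surface form -> protected stem.
OVERRIDES: Dict[str, str] = {
    "condition": "condition",
    "conditioner": "condition",
    "conditioning": "condition",
}

# Stand-in for a real stemmer: it only reproduces the stems quoted in this topic.
_KNOWN_STEMS = {"conditioner": "condition", "conditioning": "condit"}


def toy_stem(token: str) -> str:
    """Return the stem quoted in this topic, or the token unchanged."""
    return _KNOWN_STEMS.get(token, token)


def analyze(phrase: str, overrides: Optional[Dict[str, str]] = None) -> List[str]:
    """Lowercase and split the phrase, applying overrides before stemming."""
    overrides = overrides or {}
    result = []
    for token in phrase.lower().split():
        if token in overrides:
            # An overridden token is protected: the stemmer never sees it.
            result.append(overrides[token])
        else:
            result.append(toy_stem(token))
    return result
```

Without overrides, `analyze("air conditioner")` and `analyze("air conditioning")` end in different stems, so they would match different documents. With `OVERRIDES` passed in, both phrases produce the same token list.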
Stemmer mapping rules
PATCH or POST http://hostname:port/search/resources/api/v2/configuration?nodeName=stemmer_override&envType=auth&locale=en_US
where the request body consists of data formatted as in the following example.
{
"dresses, dressing": "dresses",
"condition, conditioner, conditioning": "condition"
}
- If this is the first time you are adding this configuration, use the POST HTTP request method. For subsequent updates, use PATCH.
- Ensure there are no multiword tokens in the list when you submit it.
- After adding the configuration, clear the NLP cache data. For more information, see HCL Commerce data cache.
- After adding the configuration, a full reindexing is needed to regenerate data according to the override rules.
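The request rules above can be sketched in Python with the standard library. This is a minimal sketch, not an official client: the function name `build_override_request`, its parameters, and the `Content-Type` header are assumptions, and any authentication that your environment requires is omitted. The function only builds the request; it does not send it.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_override_request(base_url, rules, first_time, env_type="auth", locale="en_US"):
    """Build (but do not send) the stemmer override configuration request.

    base_url   -- for example "http://hostname:port" (placeholder, as in this topic)
    rules      -- dict such as {"dresses, dressing": "dresses"}
    first_time -- True selects POST, False selects PATCH
    """
    # Reject multiword tokens before submitting the list.
    for sources, target in rules.items():
        tokens = [t.strip() for t in sources.split(",")] + [target.strip()]
        for token in tokens:
            if len(token.split()) != 1:
                raise ValueError(f"empty or multiword token not allowed: {token!r}")

    query = urlencode({
        "nodeName": "stemmer_override",
        "envType": env_type,
        "locale": locale,
    })
    url = f"{base_url.rstrip('/')}/search/resources/api/v2/configuration?{query}"

    return urllib.request.Request(
        url,
        data=json.dumps(rules).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST" if first_time else "PATCH",
    )
```

To submit the request, the returned object can be passed to `urllib.request.urlopen`. Clearing the NLP cache and running the full reindex remain separate steps.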
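The rules take effect as a stemmer_override token filter that is placed before the stemmer in the analyzer filter chain, as in the following fragment.
"custom_en_US_stemmer_override": {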
"type": "stemmer_override",
"rules": [
"dresses, dressing => dresses",
"condition, conditioner, conditioning => condition"
]
},
…
"custom_en_US_analyzer": {
"filter": [
....
"custom_en_US_stemmer_override",
"custom_en_US_stemmer",
...
],
....
}
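Comparing the request body with the filter definition above suggests a direct mapping: each JSON key and value pair becomes one rule of the form `sources => target`. The following Python sketch shows that mapping under this assumption. The function names are hypothetical, and how the search service actually generates the index settings is not documented in this topic.

```python
from typing import Dict, List


def to_stemmer_override_rules(body: Dict[str, str]) -> List[str]:
    """Convert {"a, b": "c"} entries into "a, b => c" rule strings."""
    rules = []
    for sources, target in body.items():
        # Normalize spacing around the comma-separated source terms.
        tokens = [t.strip() for t in sources.split(",") if t.strip()]
        rules.append(f"{', '.join(tokens)} => {target.strip()}")
    return rules


def to_filter_definition(body: Dict[str, str]) -> Dict[str, object]:
    """Wrap the rules in a stemmer_override filter definition."""
    return {"type": "stemmer_override", "rules": to_stemmer_override_rules(body)}
```

For the example request body in this topic, `to_stemmer_override_rules` returns the two rule strings shown in the `custom_en_US_stemmer_override` filter.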
When you use Advanced NLP, the stemmer-override rule definition list is empty. There is no stemmer override, but a stemmer filter is added to the analyzer if the current language supports Advanced NLP. In this case, the Query service loads the rules from ZooKeeper and applies them while stemming the tokens of the search phrase. You can still add stemmer override rules in the same way as for Basic NLP.
During reindexing, the NLP service populates the natural fields in the product index and applies the same rules when it stems each token. To load the stemmer override rules from ZooKeeper, the NLP service calls the data query internally.