Overriding the stemmer

The default stemmer can process terms that end in "-ing" and "-er" inconsistently. For example, a search for "air conditioner" can return a different number of results than a search for "air conditioning". You can modify the behavior of the stemmer to get consistent results.

With the default HCL Commerce Search configuration, the stemmer's output is sometimes not a meaningful word. In the "air conditioner" example, the number of results can differ depending on whether the second word ends in "-er" or "-ing":

"air conditioner": 84 results
"air conditioning": 9 results

Internally, the stemmer translates these terms as follows:

"conditioner" becomes "condition"
"conditioning" becomes "condit"

One way to manage situations like this is to use synonyms. For more information about using this approach, see Managing synonyms, stop words, and Search Term Associations.

Rather than creating synonyms for a potentially large number of product names, you can apply a stemmer override that maps specific terms to the stem you choose and protects them from being modified by the stemmer. This capability is built into Elasticsearch as the stemmer override token filter, and is described in the Elasticsearch reference guide.
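To illustrate the idea, the following minimal Python sketch models an override that runs before the stemmer. The `toy_stem` function is a hypothetical stand-in that only reproduces the two stemmer outputs listed above; it is not the Elasticsearch stemmer. In Elasticsearch, the override filter replaces a matching token and protects it from later stemmers; the sketch models that protection by skipping the stemmer for any token that an override rule covers.

```python
# Hypothetical stand-in for the default stemmer: it only reproduces the
# two outputs described above and leaves every other token unchanged.
TOY_STEMS = {"conditioner": "condition", "conditioning": "condit"}


def toy_stem(token: str) -> str:
    return TOY_STEMS.get(token, token)


# Override rules: every term on the left maps to the stem on the right.
OVERRIDES = {
    "condition": "condition",
    "conditioner": "condition",
    "conditioning": "condition",
}


def stem_with_override(token: str) -> str:
    # A matching override wins, and the stemmer is skipped for that token
    # (modeling how the override protects the term from the stemmer).
    if token in OVERRIDES:
        return OVERRIDES[token]
    return toy_stem(token)


for word in ("conditioner", "conditioning"):
    print(word, toy_stem(word), stem_with_override(word))
```

Without the override the two words produce different stems ("condition" and "condit"); with it, both produce "condition", so both searches match the same indexed terms.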

Stemmer mapping rules

HCL Commerce Search supports rules-based stemmer overrides, which use a list of mapping rules that you provide. Use the configuration endpoint of the Data Query Service to configure the stemmer override rules. The rules are then written to your index settings at indexing time, and both Basic NLP and Advanced NLP use them when the Query service stems search phrase tokens.

PATCH or POST http://hostname:port/search/resources/api/v2/configuration?nodeName=stemmer_override&envType=auth&locale=en_US

The request body is a JSON object in which each key is a comma-separated list of terms and each value is the stem that those terms map to, as in the following example.

{
    "dresses, dressing": "dresses",
    "condition, conditioner, conditioning": "condition"
}
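Judging from the index settings shown later in this topic, each entry of the request body becomes one Elasticsearch rule of the form `terms => stem`. The following Python sketch performs that conversion to make the mapping concrete. The function name `to_rules` is hypothetical, and this is an illustration of the format, not the code that HCL Commerce runs during indexing.

```python
import json


def to_rules(body: str) -> list[str]:
    """Convert a stemmer_override request body into 'terms => stem' rules."""
    mapping = json.loads(body)
    rules = []
    for terms, stem in mapping.items():
        # Normalize spacing around the comma-separated terms.
        cleaned = ", ".join(t.strip() for t in terms.split(",") if t.strip())
        rules.append(f"{cleaned} => {stem.strip()}")
    return rules


body = """{
    "dresses, dressing": "dresses",
    "condition, conditioner, conditioning": "condition"
}"""
for rule in to_rules(body):
    print(rule)
```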

Note: Follow these guidelines.

If this is the first time you are adding this configuration, use the POST HTTP method. For subsequent updates, use PATCH.
Ensure that the list contains no multiword tokens before you submit it.
After adding the configuration, clear the NLP cache data. For more information, see HCL Commerce data cache.
After adding the configuration, run a full reindex so that the index data is regenerated according to the override rules.
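The first two guidelines can be applied on the client side before the rules are submitted. The following Python sketch, which uses only the standard library, rejects multiword or empty terms, chooses POST for the first submission and PATCH afterwards, and builds (but does not send) the request for the configuration endpoint. The host name `localhost`, port `8080`, and the `first_time` flag are placeholders, and any authentication that your environment requires is not shown; treat this as an assumption-laden sketch rather than a supported client.

```python
import json
import urllib.parse
import urllib.request


def build_override_request(host: str, port: int, rules: dict[str, str],
                           first_time: bool, locale: str = "en_US",
                           env_type: str = "auth") -> urllib.request.Request:
    # Reject multiword (or empty) tokens: every comma-separated term and
    # every target stem must be a single word.
    for terms, stem in rules.items():
        for term in [*terms.split(","), stem]:
            if len(term.split()) != 1:
                raise ValueError(f"multiword or empty token not allowed: {term!r}")
    query = urllib.parse.urlencode(
        {"nodeName": "stemmer_override", "envType": env_type, "locale": locale})
    url = f"http://{host}:{port}/search/resources/api/v2/configuration?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(rules).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        # POST the first time the configuration is added, PATCH afterwards.
        method="POST" if first_time else "PATCH",
    )


# Placeholder host and port; replace them with your Data Query Service address.
req = build_override_request("localhost", 8080,
                             {"dresses, dressing": "dresses"}, first_time=True)
print(req.get_method(), req.full_url)
# To submit the request: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the validation testable without a running server.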

After the rules are added to the stemmer_override ZooKeeper node and indexing is triggered, the index settings are updated as in the following example. When you use Basic NLP, the stemmer override filter is added to the analyzer before the stemmer filter.

"custom_en_US_stemmer_override": {
	"type": "stemmer_override",
	"rules": [
		"dresses, dressing => dresses",
		"condition, conditioner, conditioning => condition"
	]
},
…
"custom_en_US_analyzer": {
	"filter": [
		....
		"custom_en_US_stemmer_override",
		"custom_en_US_stemmer",
		...
	],
	....
}
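For Basic NLP, the important detail in these settings is the order of the analyzer's filter chain: the override filter must come before the stemmer filter, or the stemmer would already have changed the terms. The following Python sketch builds a simplified version of the two settings entries above, using the `custom_<locale>_...` naming from the example. The function name `add_stemmer_override` is hypothetical, the `lowercase` entry is a hypothetical stand-in for the filters elided in the example, and real analyzer definitions contain more keys than shown here.

```python
import json


def add_stemmer_override(locale: str, rules: list[str],
                         analyzer_filters: list[str]) -> dict:
    """Return simplified analysis settings in which the locale's stemmer
    override filter sits immediately before its stemmer filter."""
    override = f"custom_{locale}_stemmer_override"
    stemmer = f"custom_{locale}_stemmer"
    if stemmer not in analyzer_filters:
        raise ValueError(f"{stemmer} is not in the filter chain")
    # Drop any existing copy of the override, then insert it before the stemmer.
    chain = [name for name in analyzer_filters if name != override]
    chain.insert(chain.index(stemmer), override)
    return {
        "filter": {override: {"type": "stemmer_override", "rules": rules}},
        "analyzer": {f"custom_{locale}_analyzer": {"filter": chain}},
    }


settings = add_stemmer_override(
    "en_US",
    ["dresses, dressing => dresses",
     "condition, conditioner, conditioning => condition"],
    # Hypothetical chain: "lowercase" stands in for the elided filters.
    ["lowercase", "custom_en_US_stemmer"],
)
print(json.dumps(settings, indent=2))
```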

When you use Advanced NLP, the stemmer override rule list in the index settings is empty. No stemmer override filter is defined, although a stemmer filter is added to the analyzer if the current language supports Advanced NLP. Instead, the Query service loads the rules from ZooKeeper and applies them when it stems the tokens of the search phrase. You add the stemmer override rules in the same way as for Basic NLP.

During reindexing, the NLP service populates the natural fields in the product index, and it also applies these rules when it stems tokens. To load the stemmer override rules from ZooKeeper, the NLP service internally calls the Data Query Service.
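Both the Query service and the NLP service apply the loaded rules to individual tokens. The following Python sketch shows, under stated assumptions, what that amounts to: expand each comma-separated key into a term-to-stem lookup, use the override for any token it covers, and fall back to the regular stemmer for everything else. It assumes that the ZooKeeper node holds the same JSON shape as the request body; the whitespace tokenizer, the lowercasing, and the `fallback_stem` parameter are hypothetical simplifications, because the services' actual tokenization and stemming are not described in this topic.

```python
from typing import Callable


def load_override_map(rules: dict[str, str]) -> dict[str, str]:
    """Expand {"a, b": "stem"} into {"a": "stem", "b": "stem"}."""
    lookup = {}
    for terms, stem in rules.items():
        for term in terms.split(","):
            if term.strip():
                lookup[term.strip().lower()] = stem.strip().lower()
    return lookup


def stem_phrase(phrase: str, lookup: dict[str, str],
                fallback_stem: Callable[[str], str]) -> list[str]:
    # Overridden tokens bypass the fallback stemmer entirely.
    return [lookup[tok] if tok in lookup else fallback_stem(tok)
            for tok in phrase.lower().split()]


lookup = load_override_map({
    "dresses, dressing": "dresses",
    "condition, conditioner, conditioning": "condition",
})
# An identity function stands in for the real stemmer in this example.
print(stem_phrase("Air Conditioning", lookup, lambda tok: tok))
```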