Extending Natural Language Processor profiles
A Natural Language Processor (NLP) profile controls how search terms are preprocessed before an Elasticsearch query is executed. The profile is a .json file stored in your query runtime container.
You can find the default HCL_NLPProfile.json file in the resources\profiles\nlp directory of the query runtime. An NLP profile can also be created through the /profiles REST endpoint, in which case it is stored inside the Zookeeper “nlpprofiles” node. The /profiles endpoint accepts an optional query parameter, profileType, with a value of Search or NLP to differentiate the profile type. Search is the default if this parameter is not provided.
An NLP profile contains the following sections:
- A list of provider classes. These classes help to preprocess the search term. For more information about the provider classes, see Provider class reference.
- A list of NLP classifications that override the default classifications at query run time.
- A search term dropping priority section, which defines the sequence in which search terms are dropped.
The default profile, HCL_NLPProfile.json, is as follows:
{
  "profileName": "HCL_NLPProfile",
  "provider": {
    "PartNumber": "com.hcl.commerce.search.internal.expression.provider.SearchNLPPartNumberProviderHelper",
    "BlankSpace": "com.hcl.commerce.search.internal.expression.provider.SearchNLPWhiteSpaceProviderHelper",
    "CurrenySymbol": "com.hcl.commerce.search.internal.expression.provider.SearchNLPCurrencySymbolProviderHelper",
    "ExcludeSearchTerm": "com.hcl.commerce.search.internal.expression.provider.SearchNLPExcludedTermProviderHelper",
    "STA": "com.hcl.commerce.search.internal.expression.provider.SearchNLPSTAExpansionProviderHelper",
    "MultiWordSearchTerm": "com.hcl.commerce.search.internal.expression.provider.SearchNLPMultiwordTermProviderHelper",
    "LowerCase": "com.hcl.commerce.search.internal.expression.provider.SearchNLPLowerCaseProviderHelper",
    "PriceRangeSeparator": "com.hcl.commerce.search.internal.expression.provider.SearchNLPPriceRangeSeparatorProviderHelper",
    "DMM": "com.hcl.commerce.search.internal.expression.provider.SearchNLPDMMProviderHelper",
    "SpecialCharacter": "com.hcl.commerce.search.internal.expression.provider.SearchNLPSpecialCharacterProviderHelper",
    "MultiWordPriceFilter": "com.hcl.commerce.search.internal.expression.provider.SearchMultiwordFilterProviderHelper",
    "Stopword": "com.hcl.commerce.search.internal.expression.provider.SearchNLPStopwordProviderHelper",
    "WordToNumber": "com.hcl.commerce.search.internal.expression.provider.SearchNLPWordToNumberProviderHelper",
    "PriceFilter": "com.hcl.commerce.search.internal.expression.provider.SearchNLPPriceFilterProviderHelper",
    "POS_NER": "com.hcl.commerce.search.internal.expression.provider.SearchNLPPOSAndNERProviderHelper",
    "Color": "com.hcl.commerce.search.internal.expression.provider.SearchNLPColorMMProviderHelper"
  },
  "classification": {},
  "termDroppingPriority": {
    "1": "FILTER",
    "2": "MEASUREMENT",
    "3": "BRAND",
    "4": "COLOR",
    "5": "ADJECTIVE",
    "6": "CATEGORY",
    "7": "NOUN"
  }
}
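Before uploading a customized profile, it can help to check locally that the three sections are present and to see the dropping order that the numeric keys imply. The following is a minimal sketch, not part of the product: the function name `dropping_order` and the `EXPECTED_SECTIONS` check are illustrative assumptions, and the embedded profile is a trimmed copy of the default with only two providers.

```python
import json

# Trimmed copy of the default profile (two providers only, for brevity).
PROFILE_JSON = """
{
  "profileName": "HCL_NLPProfile",
  "provider": {
    "LowerCase": "com.hcl.commerce.search.internal.expression.provider.SearchNLPLowerCaseProviderHelper",
    "Stopword": "com.hcl.commerce.search.internal.expression.provider.SearchNLPStopwordProviderHelper"
  },
  "classification": {},
  "termDroppingPriority": {
    "1": "FILTER", "2": "MEASUREMENT", "3": "BRAND", "4": "COLOR",
    "5": "ADJECTIVE", "6": "CATEGORY", "7": "NOUN"
  }
}
"""

# Assumption: these are the sections a local sanity check should look for.
EXPECTED_SECTIONS = ("provider", "classification", "termDroppingPriority")


def dropping_order(profile: dict) -> list:
    """Return the classifications in the order their numeric keys imply."""
    missing = [name for name in EXPECTED_SECTIONS if name not in profile]
    if missing:
        raise ValueError("profile is missing sections: " + ", ".join(missing))
    priorities = profile["termDroppingPriority"]
    # Keys are strings ("1", "2", ...), so sort them numerically.
    return [priorities[key] for key in sorted(priorities, key=int)]


profile = json.loads(PROFILE_JSON)
print(dropping_order(profile))
```

Sorting the keys numerically rather than as strings keeps the order correct if a profile ever has ten or more priority entries.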
Creating or updating an NLP profile
POST https://server:port/search/resources/api/v2/documents/profiles/HCL_NLPProfile?profileType=NLP
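As an illustration of how such a request might be assembled, the sketch below builds (but does not send) the POST request with the Python standard library. Several details are assumptions rather than documented behavior: that the request body is the profile JSON itself, that the content type is `application/json`, and the host name `query.example.com:30901`, which is only a placeholder for `server:port`. Authentication is omitted. The helper name `build_profile_request` is hypothetical.

```python
import json
import urllib.parse
import urllib.request

PROFILES_PATH = "/search/resources/api/v2/documents/profiles"


def build_profile_request(base_url, profile, profile_type="Search"):
    """Build a POST request for the /profiles endpoint without sending it."""
    if profile_type not in ("Search", "NLP"):
        raise ValueError("profileType must be 'Search' or 'NLP'")
    name = urllib.parse.quote(profile["profileName"], safe="")
    query = urllib.parse.urlencode({"profileType": profile_type})
    url = f"{base_url}{PROFILES_PATH}/{name}?{query}"
    # Assumption: the endpoint accepts the profile document as a JSON body.
    body = json.dumps(profile).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


nlp_profile = {
    "profileName": "HCL_NLPProfile",
    "provider": {},
    "classification": {},
    "termDroppingPriority": {"1": "FILTER"},
}
# Placeholder host and port; replace with your own server:port.
request = build_profile_request("https://query.example.com:30901", nlp_profile, "NLP")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here because it needs a reachable server and credentials.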
How the Query service finds the NLP profile
The Query runtime can load the NLP profile configuration details from the store index at runtime, or in response to a call through the Query REST API. If the configuration details are not provided, it falls back on the default HCL NLP profile. The Query service performs the following steps to look up the name of the NLP profile.
- The Query service checks for an NLP profile for the store locale. If one is found, this profile is loaded from Zookeeper.
- If no NLP profile is configured for the store locale, the Query service derives the base locale from the language code of the store locale, and searches for a profile name for that base locale. If one is found, this profile is loaded from Zookeeper. The base locale can be any one of “en_US”, “es_ES”, “fr_FR”, “de_DE”, or “zh_CN”.
- If no NLP profile has been configured for the base locale, the Query service looks for the default NLP profile name for the store that is configured in the STORECONF table. If one is found, this profile is loaded from Zookeeper.
- If no default NLP profile is configured in the STORECONF table, the Query service falls back to the default NLP profile.
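The lookup order above can be sketched as a simple fallback chain. This is an illustration only, not the Query service implementation: the `locale_profiles` mapping stands in for whatever per-locale configuration the store index holds, `store_default` stands in for the STORECONF value, and both function names are hypothetical.

```python
BASE_LOCALES = ("en_US", "es_ES", "fr_FR", "de_DE", "zh_CN")
DEFAULT_PROFILE = "HCL_NLPProfile"


def base_locale_for(store_locale):
    """Map a store locale to the base locale that shares its language code."""
    language = store_locale.split("_")[0].lower()
    for base in BASE_LOCALES:
        if base.split("_")[0] == language:
            return base
    return None


def resolve_profile_name(store_locale, locale_profiles, store_default=None):
    """Return the NLP profile name, following the four lookup steps in order."""
    # Step 1: a profile configured for the exact store locale.
    name = locale_profiles.get(store_locale)
    if name:
        return name
    # Step 2: a profile configured for the base locale of the same language.
    base = base_locale_for(store_locale)
    if base and locale_profiles.get(base):
        return locale_profiles[base]
    # Step 3: the store-level default (STORECONF in the real system).
    if store_default:
        return store_default
    # Step 4: the default NLP profile.
    return DEFAULT_PROFILE


print(resolve_profile_name("en_GB", {"en_US": "MyEnglishNLPProfile"}))
```

In this example the store locale en_GB has no profile of its own, so the lookup falls through to the en_US base locale.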
Automatically handling search misses
Prior to HCL Commerce Version 9.1.8.0, if a search returned no results, search terms were dropped from left to right, one token at a time. You can now specify which token is removed from the search term when there are no results. In the NLP profile, the “termDroppingPriority” section defines the priority in which tokens are removed from the search term. After a token is removed, the process makes another call with the updated search term. If a result is found, the Query service returns the result; otherwise, based on the configuration, another token is removed from the search term. If you are using the default NLP profile, the dropping logic is applied in the following order.
- FILTER: Removes the price filter from the search term.
- MEASUREMENT: Removes measurement details from the search term.
- BRAND: Removes the brand name from the search term.
- COLOR: Removes the color name from the search term.
- ADJECTIVE: Removes adjectives from the search term.
- CATEGORY: Removes the category name from the search term.
- NOUN: Removes nouns from the search term.
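The retry loop described above can be sketched as follows. This is a simplified model under stated assumptions: tokens arrive already classified as `(text, classification)` pairs, exactly one token is dropped per retry, and `search` is any callable that returns a list of hits. The function name `search_with_dropping` and the fake search backend are hypothetical.

```python
DEFAULT_PRIORITY = ["FILTER", "MEASUREMENT", "BRAND", "COLOR",
                    "ADJECTIVE", "CATEGORY", "NOUN"]


def search_with_dropping(tokens, search, priority=DEFAULT_PRIORITY):
    """Retry a search, dropping one token per miss in priority order.

    tokens: list of (text, classification) pairs.
    search: callable taking the search term string and returning a list of hits.
    Returns (term_that_matched, hits), or ("", []) if nothing matched.
    """
    remaining = list(tokens)
    while remaining:
        term = " ".join(text for text, _ in remaining)
        hits = search(term)
        if hits:
            return term, hits
        # Find the first classification, in priority order, that is still present.
        for label in priority:
            index = next(
                (i for i, (_, cls) in enumerate(remaining) if cls == label), None
            )
            if index is not None:
                del remaining[index]
                break
        else:
            break  # no remaining token has a droppable classification
    return "", []


calls = []


def fake_search(term):
    """Stand-in for the Query service: only 'red shoes' has results."""
    calls.append(term)
    return ["sku-1"] if term == "red shoes" else []


result = search_with_dropping(
    [("red", "COLOR"), ("nike", "BRAND"), ("shoes", "NOUN")], fake_search
)
print(result, calls)
```

Here BRAND ranks ahead of COLOR and NOUN, so "nike" is dropped first and the second call succeeds with "red shoes".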
NLP profile classification
At query runtime, the NLP profile classification overrides the classifications that the Query service derived from index data or from the default NLP data model.
For example, consider a case where the word “apple” is classified as a BRAND_NAME by the Query service based on the index data. If you want “apple” to be classified as a category instead, you can configure this change in the classification section of the NLP profile.
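Conceptually, the override behaves like a dictionary merge in which the profile wins. The sketch below illustrates only that precedence rule. The term-to-label mapping used here is an assumed shape chosen for illustration; the default profile ships with an empty "classification" section, so the actual schema of its entries is not shown on this page.

```python
def effective_classification(analyzed, profile_overrides):
    """Merge classifications so that profile entries take precedence."""
    merged = dict(analyzed)           # copy, so the analyzed data is untouched
    merged.update(profile_overrides)  # profile entries replace analyzed ones
    return merged


# Classifications the Query service derived from index data (illustrative values).
analyzed = {"apple": "BRAND_NAME", "phone": "NOUN"}
# Hypothetical override shape: term -> classification label.
overrides = {"apple": "CATEGORY"}

print(effective_classification(analyzed, overrides))
```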
Provider class reference
The provider classes used in the profile have the following functions.
Provider | Class name | Usage |
---|---|---|
PartNumber | com.hcl.commerce.search.internal.expression.provider.SearchNLPPartNumberProviderHelper | Matches the input search term against the part number patterns. If it matches, a part number search is performed and the rest of the helper classes are not executed. |
BlankSpace | com.hcl.commerce.search.internal.expression.provider.SearchNLPWhiteSpaceProviderHelper | Replaces more than two white spaces with a single white space. |
CurrenySymbol | com.hcl.commerce.search.internal.expression.provider.SearchNLPCurrencySymbolProviderHelper | If the search term contains a price filter with a currency symbol, removes the currency symbol from the search term. |
ExcludeSearchTerm | com.hcl.commerce.search.internal.expression.provider.SearchNLPExcludedTermProviderHelper | Removes the excluded term from the search term. |
STA | com.hcl.commerce.search.internal.expression.provider.SearchNLPSTAExpansionProviderHelper | Performs Search Term Association (STA) expansion and replacement at query time in the Query service. |
MultiWordSearchTerm | com.hcl.commerce.search.internal.expression.provider.SearchNLPMultiwordTermProviderHelper | Checks for a multiword category, brand name, attribute value, or color name. If one is present, adds it to the respective list. |
LowerCase | com.hcl.commerce.search.internal.expression.provider.SearchNLPLowerCaseProviderHelper | Converts the search term into lowercase. |
PriceRangeSeparator | com.hcl.commerce.search.internal.expression.provider.SearchNLPPriceRangeSeparatorProviderHelper | Checks for search terms that contain a price range filter with "-". If one is found, replaces "-" with the appropriate locale-specific separator, for example "to" for en, "a" for es, and "至" for zh. |
DMM | com.hcl.commerce.search.internal.expression.provider.SearchNLPDMMProviderHelper | Checks whether the search term contains dimension details. If it does, parses the search term for the dimension matchmaker. |
SpecialCharacter | com.hcl.commerce.search.internal.expression.provider.SearchNLPSpecialCharacterProviderHelper | If the search term contains a special character, adds that token to the list of nouns. |
MultiWordPriceFilter | com.hcl.commerce.search.internal.expression.provider.SearchMultiwordFilterProviderHelper | Checks for a multiword filter in the search term, then replaces the space with NNNN so that the next processor identifies the term as a single word. |
Stopword | com.hcl.commerce.search.internal.expression.provider.SearchNLPStopwordProviderHelper | Removes words marked with IGNORE_TERM by the configuration filter from the search term. |
WordToNumber | com.hcl.commerce.search.internal.expression.provider.SearchNLPWordToNumberProviderHelper | Converts number words into their equivalent numeric format. |
PriceFilter | com.hcl.commerce.search.internal.expression.provider.SearchNLPPriceFilterProviderHelper | Checks for search terms with a price filter, including multiword filters marked with NNNN. |
POS_NER | com.hcl.commerce.search.internal.expression.provider.SearchNLPPOSAndNERProviderHelper | Performs part-of-speech (POS) tagging and named entity recognition (NER) extraction. Checks for NOUN, CATEGORY, BRAND_NAME, ADJECTIVE, ATTRIBUTE_VALUE, and so on, then adds each match to the respective list. |
Color | com.hcl.commerce.search.internal.expression.provider.SearchNLPColorMMProviderHelper | Retrieves the color family details for the color matchmaker based on the color name entered in the search term. |
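To make the idea of a preprocessing chain concrete, the sketch below re-creates three of the simpler steps (CurrenySymbol, LowerCase, and PriceRangeSeparator) as plain functions applied in sequence. These are illustrative stand-ins written for this example, not the HCL provider classes. The set of currency symbols, the regular expressions, and the assumption that steps run in the order listed are all choices made here, not documented behavior.

```python
import re

# Locale-specific range words taken from the PriceRangeSeparator description.
RANGE_WORD = {"en": "to", "es": "a", "zh": "至"}


def strip_currency_symbol(term, locale):
    """Remove a currency symbol that directly precedes a number."""
    # Assumption: only these four symbols are handled in this sketch.
    return re.sub(r"[$€£¥]\s*(?=\d)", "", term)


def lower_case(term, locale):
    """Convert the search term to lowercase."""
    return term.lower()


def price_range_separator(term, locale):
    """Replace '-' between two numbers with the locale's range word."""
    word = RANGE_WORD.get(locale)
    if word is None:
        return term  # no separator known for this locale; leave unchanged
    return re.sub(r"(\d)\s*[-–]\s*(\d)", rf"\1 {word} \2", term)


# Assumption: steps are applied one after another in this order.
PIPELINE = [strip_currency_symbol, lower_case, price_range_separator]


def preprocess(term, locale="en"):
    """Run the search term through each step in turn."""
    for step in PIPELINE:
        term = step(term, locale)
    return term


print(preprocess("Shoes $10-$20", "en"))
```

Each step takes and returns a plain string, so steps can be added, removed, or reordered without touching the others, which mirrors how the provider list in the profile can be customized.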