Natural Language Processing profiles
A Natural Language Processing (NLP) profile is used to preprocess search terms and modify search queries to fetch desired search results at the storefront.
Logic for processing search strings
Shoppers are rarely experts at formulating search queries, and they are often unaware of the ideal search terms for finding products or services at the storefront. Using Natural Language Processing, the Query service parses plain-language search terms, discerns what shoppers are trying to find, and modifies the search term at runtime to fetch the desired search results. The following examples describe the search term processing logic of the Query service. Each example consists of a search term and the logic that the Query service uses to process it and fetch search results at the storefront.
- "white shirt girls"
The NLP parser generates the following three tokens to map the search term and then runs the Elasticsearch query to fetch the search results at the storefront. It returns two products.
- white – COLOR
- shirt – CATEGORY
- girls – CATEGORY
- "white shirt girls under 89$"
The NLP parser generates the following four tokens to map the search term and then runs the Elasticsearch query to fetch the search results at the storefront. It returns a single product.
- white – COLOR
- shirt – CATEGORY
- girls – CATEGORY
- under 89$ – FILTER
- "white shirt girls under 20$"
The NLP parser generates the following four tokens to map the search term and then runs the Elasticsearch query to fetch the search results at the storefront. It returns zero matches.
- white – COLOR
- shirt – CATEGORY
- girls – CATEGORY
- under 20$ – FILTER
In this case, the NLP parser applies its search term-dropping logic. It drops tokens from the left of the search phrase, one at a time, until the remaining tokens return search results or four iterations have been completed. Any price filter in the search term is also removed in this process. After the term-dropping logic completes, the NLP parser runs the Elasticsearch query based on the following two tokens. It returns all eight products from the girls' category, matching shirt as a category or in the product's name or short description.
- shirt – CATEGORY
- girls – CATEGORY
- "vitamin capsules"
The NLP parser generates the following two tokens to map the search term and then runs the Elasticsearch query to fetch the search results. It returns zero matches because capsule has been set as an attribute value, whereas, based on the following tokens, the Elasticsearch query searches for capsules against the noun field.
- vitamin – CATEGORY
- capsules – NOUN
In this case, the NLP parser applies its search term-dropping logic. This also returns empty search results, for the same reason: capsule has been set as an attribute value, and the Elasticsearch query searches for capsules against the noun field. To handle such situations, business logic runs a fallback Elasticsearch query based on the previous analysis of the search term. The Elasticsearch query is executed based on the following token and returns all the products in the vitamin category.
- vitamin – CATEGORY
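The left-to-right dropping described in the examples above can be sketched as follows. This is an illustrative sketch only, not the Query service implementation: the names `drop_terms_from_left`, `fake_query`, and `MAX_ITERATIONS` are hypothetical, the Elasticsearch call is replaced by a stand-in function, and the sketch assumes the price filter is removed before the first token is dropped. The fallback query from the "vitamin capsules" example is not modeled.

```python
from typing import Callable, List, Sequence, Tuple

Token = Tuple[str, str]  # (text, classification), e.g. ("white", "COLOR")
QueryFn = Callable[[Sequence[Token]], List[str]]

MAX_ITERATIONS = 4  # the text mentions up to four iterations


def drop_terms_from_left(tokens: Sequence[Token], run_query: QueryFn,
                         max_iterations: int = MAX_ITERATIONS):
    """Return (tokens_used, results) after left-to-right term dropping."""
    results = run_query(tokens)
    if results:
        return list(tokens), results
    # Assumption: the price filter is removed as part of the dropping process.
    remaining = [t for t in tokens if t[1] != "FILTER"]
    for _ in range(max_iterations):
        remaining = remaining[1:]  # drop the leftmost token
        if not remaining:
            break
        results = run_query(remaining)
        if results:
            return remaining, results
    return remaining, []


def fake_query(tokens: Sequence[Token]) -> List[str]:
    """Stand-in for the Elasticsearch query: only 'shirt girls' matches."""
    texts = tuple(text for text, _ in tokens)
    if texts == ("shirt", "girls"):
        return [f"product-{i}" for i in range(8)]
    return []


tokens = [("white", "COLOR"), ("shirt", "CATEGORY"),
          ("girls", "CATEGORY"), ("under 20$", "FILTER")]
used, products = drop_terms_from_left(tokens, fake_query)
```

With the stand-in query, `used` ends up holding the shirt and girls tokens, mirroring the "white shirt girls under 20$" example.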
NLP profiles
A Natural Language Processing (NLP) profile controls the preprocessing flow of search terms before an Elasticsearch query is executed. The profile is a .json file that is stored in your query runtime container.
The default HCL_NLPProfile.json file can be found in the resources\profiles\nlp directory of the query runtime. An NLP profile can also be created through the /profiles REST endpoint and stored inside the Zookeeper “nlpprofiles” node. The /profiles endpoint accepts an optional query parameter, profileType, with a value of Search or NLP to identify the type of profile. Search is the default if this parameter is not provided.
The NLP profile contains three main sections:
- A list of provider classes. These classes help to preprocess the search term. For more information about the provider classes, see Provider class reference.
- A list of NLP classifications that will override default classifications at query run time.
- A search term dropping priority section, which defines the sequence in which search terms are dropped.
The following is a sample HCL_NLPProfile.json file, showing how the data is organized. In this sample the classification is provided for informational purposes only. In the HCL default profile this section is empty.
Creating or updating an NLP profile
POST https://server:port/search/resources/api/v2/documents/profiles/HCL_NLPProfile?profileType=NLP
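As a rough illustration of calling this endpoint, the following sketch builds the POST request with Python's standard library. The host, port, JSON content type, and the minimal profile body are assumptions made for the example, and `build_profile_request` is a hypothetical helper. The request is only constructed here, not sent, and any authentication your environment requires is not shown.

```python
import json
import urllib.request


def build_profile_request(base_url: str, profile_name: str,
                          profile_body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates or updates an NLP profile."""
    url = (f"{base_url}/search/resources/api/v2/documents/profiles/"
           f"{profile_name}?profileType=NLP")
    data = json.dumps(profile_body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},  # assumed content type
    )


# Hypothetical host and port; the body contains only fields shown in this topic.
profile = {
    "profileName": "HCL_NLPProfile",
    "classification": {"Bath": "CATEGORY"},
}
request = build_profile_request("https://search.example.com:3738",
                                "HCL_NLPProfile", profile)
# urllib.request.urlopen(request) would send the request.
```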
How the Query service finds the NLP profile
The Query runtime can load the NLP profile configuration details from the store index at runtime, or in response to a call via the Query REST API. If the configuration details are not provided, it falls back on the default HCL NLP profile. The Query service performs the following steps to look up the name of the NLP profile.
- The Query service will check for the store locale NLP profile. If it is found, this profile will be loaded from Zookeeper.
- If no NLP profile is configured for the store locale, the query service will find the base locale from the language code of the store locale, and search for a profile name for that base locale. If one is found, this profile will be loaded from Zookeeper. The base locale can be any one of “en_US”, “es_ES”, “fr_FR”, “de_DE”, or “zh_CN”.
- If no NLP profile has been configured for the base locale, the Query service will find the default NLP profile name for the store that is configured in the STORECONF table. If found, this profile will be loaded from Zookeeper.
- If no default NLP profile is configured in the STORECONF table, the Query service will fall back to the default NLP profile.
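The lookup order in the steps above can be sketched as a small function. This is a simplified model, not the Query service code: `resolve_nlp_profile_name`, the `locale_profiles` dictionary, and the `storeconf_default` argument are hypothetical stand-ins for the store configuration, and the sketch assumes the default profile is named HCL_NLPProfile, as the default file name suggests.

```python
from typing import Dict, Optional

# Base locales listed in this topic, keyed by language code.
BASE_LOCALES = {"en": "en_US", "es": "es_ES", "fr": "fr_FR",
                "de": "de_DE", "zh": "zh_CN"}
DEFAULT_PROFILE = "HCL_NLPProfile"  # assumed name of the default profile


def resolve_nlp_profile_name(store_locale: str,
                             locale_profiles: Dict[str, str],
                             storeconf_default: Optional[str] = None) -> str:
    """Return the NLP profile name following the lookup order described above."""
    # 1. Profile configured for the store locale.
    name = locale_profiles.get(store_locale)
    if name:
        return name
    # 2. Profile configured for the base locale of the store's language code.
    language = store_locale.split("_", 1)[0].lower()
    base_locale = BASE_LOCALES.get(language)
    if base_locale and locale_profiles.get(base_locale):
        return locale_profiles[base_locale]
    # 3. Store default (STORECONF in the real system; a plain argument here).
    if storeconf_default:
        return storeconf_default
    # 4. Default NLP profile.
    return DEFAULT_PROFILE
```

For example, a store with locale en_GB and a profile configured only for en_US would resolve to the en_US profile in this model.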
Automatically handling search misses
Prior to HCL Commerce Version 9.1.8.0, if a search returned no results, search terms were dropped from left to right, one token at a time. You can now specify which token is removed from the search term when there are no results. In the NLP profile, the termDroppingPriority section defines the priority in which tokens are removed from the search term. After removing a token, the process makes another call with the updated search term. If a result is found, the Query service returns the result; otherwise, based on the configuration, another token is removed from the search term. If you are using the default NLP profile, the dropping logic is applied in the order below, but you can change the order or remove items from the list.
- FILTER: Will remove the price filter from search term.
- MEASUREMENT: Will remove measurement details from search term.
- BRAND: Will remove brand name from search term.
- COLOR: Will remove color name from search term.
- ADJECTIVE: Will remove adjective from search term.
- CATEGORY: Will remove category name from search term.
- NOUN: Will remove nouns from search term.
Before applying the term dropping logic, the process also removes tokens that are not identified by the NLP processor. For more information about term dropping, see Addressing search misses due to search dropping.
If there is no response after applying this logic, then the Query service makes a final fallback call based on the spell-corrected details. This step cannot be customized.
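The priority-based dropping described in this section can be sketched as follows. This is an illustrative model only: `drop_by_priority` and `DEFAULT_PRIORITY` are hypothetical names, unidentified tokens are represented with a classification of `None`, the query function is a stand-in for the Elasticsearch call, and the sketch assumes the last remaining token is never dropped. The final spell-corrected fallback call is not modeled.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Token = Tuple[str, Optional[str]]  # classification is None when unidentified
QueryFn = Callable[[Sequence[Token]], List[str]]

# Default order listed in this topic.
DEFAULT_PRIORITY = ["FILTER", "MEASUREMENT", "BRAND", "COLOR",
                    "ADJECTIVE", "CATEGORY", "NOUN"]


def drop_by_priority(tokens: Sequence[Token], run_query: QueryFn,
                     priority: Sequence[str] = DEFAULT_PRIORITY):
    """Return (tokens_used, results) after priority-based term dropping."""
    # Tokens that the NLP processor did not identify are removed first.
    remaining = [t for t in tokens if t[1] is not None]
    results = run_query(remaining)
    if results:
        return remaining, results
    for label in priority:
        # Remove tokens with this classification one at a time,
        # re-running the query after each removal.
        while len(remaining) > 1 and any(t[1] == label for t in remaining):
            index = next(i for i, t in enumerate(remaining) if t[1] == label)
            remaining = remaining[:index] + remaining[index + 1:]
            results = run_query(remaining)
            if results:
                return remaining, results
    return remaining, []
```

Passing a reordered or shortened `priority` list models changing the order or removing items in the termDroppingPriority section.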
NLP profile classification
- Dresses = CATEGORY
- Bath = ATTRIBUTE_NAME
- Style Home = BRAND_NAME
- Hermitage Collection = BRAND_NAME
- Albini = BRAND_NAME
You can override the above NER classifications through the NLP profile, as shown in the following sample JSON. With this configuration, Dresses is tagged with the ATTRIBUTE_VALUE NER, and Bath, Style Home, and Hermitage Collection are always tagged with the CATEGORY NER for all e-sites and catalogs.
{
"profileName": "HCL_NLPProfile",
…..
"classification": {
"Dresses": "ATTRIBUTE_VALUE",
"Bath": "CATEGORY",
"Style Home": "CATEGORY",
"Hermitage Collection": "CATEGORY"
},
…..
}
Create catalog-specific NER classifications
The sample in this section uses the following catalog IDs:
- 3074457345616678668
- 3074457345616678669
- 3074457345616678670
- 3074457345616678671
In the NLP profile classification, if Bath has a CATEGORY tag with no catalog IDs, it is considered a CATEGORY irrespective of the catalog.
Bath: CATEGORY
The following is a sample NLP profile with catalog-specific NER classifications.
{
"profileName": "HCL_NLPProfile",
…..
"classification": {
"Dresses": "NOUN:[3074457345616678668]",
"Bath": "CATEGORY",
"Hermitage Collection": "CATEGORY:[3074457345616678669];NOUN:[3074457345616678668]",
"Style Home": "CATEGORY",
"Albini": "CATEGORY:[3074457345616678669,3074457345616678670];NOUN:[3074457345616678671]"
},
…..
}
As per the sample data:
- Dresses is tagged with the CATEGORY NER classification.
- Bath is tagged with the ATTRIBUTE_NAME NER classification.
- Style Home is tagged with the BRAND_NAME NER classification.
- Hermitage Collection is tagged with the BRAND_NAME NER classification.
- Albini is tagged with the BRAND_NAME NER classification.
- A search for Dresses with catalog ID 3074457345616678668 is tagged as NOUN, and the search is performed on the noun fields. For all catalogs other than 3074457345616678668, Dresses is considered a CATEGORY.
- A search for Bath is considered a CATEGORY because it is not mapped to any catalog, and the search is performed on the category fields.
- A search for Hermitage Collection is tagged as CATEGORY with catalog 3074457345616678669, and as NOUN with catalog 3074457345616678668, where the search is performed on the noun fields.
- A search for Style Home is considered a CATEGORY because it is not mapped to any catalog, and the search is performed on the category fields.
- A search for Albini with any catalog other than 3074457345616678669, 3074457345616678670, and 3074457345616678671 is tagged as BRAND_NAME.
- With catalog 3074457345616678669 or 3074457345616678670, Albini is tagged as CATEGORY and the search is performed on the category fields.
- With catalog 3074457345616678671, Albini is tagged as NOUN and the search is performed on the noun fields.
For example, consider a case where the word “apple” is classified as a BRAND_NAME by the Query service based on the index data. If you want to classify “apple” as a category, this change can be configured in the NLP profile classification section.
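To make the catalog-specific value format concrete, the following sketch parses a classification string such as `CATEGORY:[id,id];NOUN:[id]` and picks the tag for a given catalog. It is an interpretation of the sample profile only, not the Query service parser: `resolve_classification` and its `default_tag` argument (the classification that would apply without an override) are hypothetical.

```python
from typing import Optional


def resolve_classification(value: str, catalog_id: str,
                           default_tag: Optional[str] = None) -> Optional[str]:
    """Return the NER tag that a profile classification value yields for a catalog."""
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        if ":" not in part:
            # No catalog list: the tag applies to every catalog.
            return part
        tag, _, ids = part.partition(":")
        catalog_ids = {c.strip() for c in ids.strip().strip("[]").split(",")
                       if c.strip()}
        if catalog_id in catalog_ids:
            return tag.strip()
    # The catalog is not listed: keep the default classification.
    return default_tag


albini = ("CATEGORY:[3074457345616678669,3074457345616678670];"
          "NOUN:[3074457345616678671]")
tag_for_670 = resolve_classification(albini, "3074457345616678670", "BRAND_NAME")
tag_for_671 = resolve_classification(albini, "3074457345616678671", "BRAND_NAME")
tag_for_668 = resolve_classification(albini, "3074457345616678668", "BRAND_NAME")
```

In this model, Albini resolves to CATEGORY, NOUN, and BRAND_NAME for the three catalogs respectively, matching the behavior described for the sample profile.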
Provider class reference
The provider classes used in the profile are listed below. Each provider name is followed by its fully qualified class name.
- PartNumber: com.hcl.commerce.search.internal.expression.provider.SearchNLPPartNumberProviderHelper
- BlankSpace: com.hcl.commerce.search.internal.expression.provider.SearchNLPWhiteSpaceProviderHelper
- CurrenySymbol: com.hcl.commerce.search.internal.expression.provider.SearchNLPCurrencySymbolProviderHelper
- ExcludeSearchTerm: com.hcl.commerce.search.internal.expression.provider.SearchNLPExcludedTermProviderHelper
- STA: com.hcl.commerce.search.internal.expression.provider.SearchNLPSTAExpansionProviderHelper
- MultiWordSearchTerm: com.hcl.commerce.search.internal.expression.provider.SearchNLPMultiwordTermProviderHelper
- LowerCase: com.hcl.commerce.search.internal.expression.provider.SearchNLPLowerCaseProviderHelper
- PriceRangeSeparator: com.hcl.commerce.search.internal.expression.provider.SearchNLPPriceRangeSeparatorProviderHelper
- DMM: com.hcl.commerce.search.internal.expression.provider.SearchNLPDMMProviderHelper
- SpecialCharacter: com.hcl.commerce.search.internal.expression.provider.SearchNLPSpecialCharacterProviderHelper
- MultiWordPriceFilter: com.hcl.commerce.search.internal.expression.provider.SearchMultiwordFilterProviderHelper
- Stopword: com.hcl.commerce.search.internal.expression.provider.SearchNLPStopwordProviderHelper
- WordToNumber: com.hcl.commerce.search.internal.expression.provider.SearchNLPWordToNumberProviderHelper
- PriceFilter: com.hcl.commerce.search.internal.expression.provider.SearchNLPPriceFilterProviderHelper
- POS_NER: com.hcl.commerce.search.internal.expression.provider.SearchNLPPOSAndNERProviderHelper
- Color: com.hcl.commerce.search.internal.expression.provider.SearchNLPColorMMProviderHelper