HCL Commerce Version 9.1.14.0 or later

Considerations when using Basic Natural Language Processing

Natural Language Processing (NLP) can run in one of two modes: Basic NLP or Advanced NLP. Advanced NLP is the default.

In certain circumstances you may want to use Basic mode instead. For example, your language might not be supported by Stanford CoreNLP, or you might want to reduce the size of your Query containers or the heap memory that the Query container needs in the runtime environment. The NLP mode and its scope are controlled by the environment variable NLP_ENABLE_LANGUAGE_CODE or the Vault key nlpEnableLanguageCode. This variable accepts a list of up to eight languages that are supported by the Stanford CoreNLP module. If you add languages beyond the eight supported by CoreNLP, as described in Adding languages to the NLP service, the extra languages are evaluated by the Basic NLP module.

The eight languages that are supported in Advanced NLP are Arabic (ar), Chinese (zh), English (en), French (fr), German (de), Hungarian (hu), Italian (it), and Spanish (es). The eighteen languages supported by Basic NLP are Arabic (ar), English (en), Chinese (zh), Danish (da), Dutch (nl), Finnish (fi), French (fr), German (de), Greek (gr), Hungarian (hu), Italian (it), Norwegian (no), Portuguese (pt), Romanian (ro), Russian (ru), Spanish (es), Swedish (sv), and Turkish (tr).

Basic NLP provides all of the features of Advanced NLP except Dependency Parsing and Word to Number transformations. The eighteen languages that Basic NLP supports are listed in the Basic NLP properties file. If languages beyond these eighteen are specified, the Basic NLP flow is still used for them, but without Stemming. In other words, Stemming is applied only to languages that are listed in the properties file. If you do not want Stemming for a particular language, remove that language from the file.

For example, a store that supports Greek and English can be configured as follows:
  1. Only English is defined in NLP_ENABLE_LANGUAGE_CODE.
  2. Greek does not need to be added to the Basic NLP properties file. When a language that is not one of the eighteen supported languages is passed to the API, Stemming is not performed, but the rest of the Basic NLP flow still runs.
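The routing rules above can be summarized in a short sketch. It is a hypothetical illustration and not HCL Commerce code. The function names are invented, the comma-separated format assumed for NLP_ENABLE_LANGUAGE_CODE is an assumption, and the language codes are copied exactly as this topic lists them. A language being listed in the properties file does not by itself guarantee that a stemmer exists for it in the index settings.

```python
# Hypothetical sketch of the NLP language routing described in this topic.
# Not HCL Commerce code; language codes are copied as the topic lists them.

# Languages supported by Advanced NLP (Stanford CoreNLP).
ADVANCED_NLP_LANGUAGES = {"ar", "zh", "en", "fr", "de", "hu", "it", "es"}

# Languages listed in the Basic NLP properties file (Stemming candidates).
BASIC_NLP_PROPERTIES_LANGUAGES = {
    "ar", "en", "zh", "da", "nl", "fi", "fr", "de", "gr",
    "hu", "it", "no", "pt", "ro", "ru", "es", "sv", "tr",
}


def parse_enabled_codes(raw):
    """Parse NLP_ENABLE_LANGUAGE_CODE, ASSUMING a comma-separated value such as "en,fr"."""
    return {code.strip() for code in raw.split(",") if code.strip()}


def nlp_mode(language, enabled_codes):
    """Advanced NLP only for enabled, CoreNLP-supported languages; Basic NLP otherwise."""
    if language in enabled_codes and language in ADVANCED_NLP_LANGUAGES:
        return "advanced"
    return "basic"


def basic_stemming_listed(language):
    """In the Basic flow, Stemming is considered only for languages in the properties file."""
    return language in BASIC_NLP_PROPERTIES_LANGUAGES


if __name__ == "__main__":
    # The Greek-and-English store: only English is in NLP_ENABLE_LANGUAGE_CODE.
    enabled = parse_enabled_codes("en")
    for lang in ("en", "gr", "ja"):
        mode = nlp_mode(lang, enabled)
        listed = basic_stemming_listed(lang) if mode == "basic" else None
        print(lang, mode, listed)
```

For the example store, English is routed to Advanced NLP and Greek to Basic NLP. A language outside the eighteen, such as Japanese, still goes through the Basic flow but is not a Stemming candidate.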

Adding new snowball stemmers when using Basic NLP

Certain processes assume that the Advanced NLP functions are present. One such process is stemming, which reduces inflected or compound words to their word stem. HCL Commerce Search uses the snowballPorterFilterFactory filter for stemming. If you enable Basic NLP and your locale is not in the default supported stemmer list, manually add your locale name and the corresponding snowball stemmer language name to the stemmer.language property. Elasticsearch does not have stemmers available for Chinese and Greek. For the other default locales, no additional configuration is needed to make the stemmers available to the Query service when you enable Basic NLP. By default, HCL Commerce Search includes snowball stemmers for the following locales in the index settings:
[ar_EG = Arabic, da_DK = Danish, de_DE = German, en_US = English, es_ES = Spanish, 
fi_FI = Finnish, fr_FR = French, hu_HU = Hungarian, it_IT = Italian, nb_no = Norwegian, nl_NL = Dutch, pt_BR = Portuguese, 
ro_RO = Romanian, ru_RU = Russian, sv_SE = Swedish, tr_TR = Turkish]
To add a snowball stemmer for a new language when Basic NLP is enabled, send a PATCH request to the following configuration endpoint. After making this change, restart NiFi and trigger a full index. When indexing is complete, the new stemmer is included in the index settings. This example adds Dutch and Norwegian stemmers to the list of filters.
PATCH http://dataQueryHost:dataQueryPort/search/resources/api/v2/configuration?nodeName=ingest&envType=auth

Request Body:

{
  "global": {
    "connector": [
      {
        "name": "attribute",
        "property": [
          {
            "name": "stemmer.language",
            "value": "{\"nb_NO\": \"Norwegian\", \"nl_NL\": \"Dutch\"}"
          }
        ]
      }
    ]
  }
}
Note: If the locale is not supported by the current store, no stemmer is added to the index settings.
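The stemmer.language value is itself a JSON object encoded as a string, so the inner quotes must be escaped. Doing that by hand is error-prone. The following Python sketch uses only the standard library and a hypothetical helper name. It builds the same request body as the example above by serializing the locale map first and then the whole request. It does not send anything.

```python
import json


def build_stemmer_patch(stemmers):
    """Build the Ingest configuration PATCH body for the stemmer.language property.

    The property value is a JSON object serialized into a string, so json.dumps
    runs once for the value here and again when the whole body is serialized.
    """
    return {
        "global": {
            "connector": [
                {
                    "name": "attribute",
                    "property": [
                        {"name": "stemmer.language", "value": json.dumps(stemmers)}
                    ],
                }
            ]
        }
    }


if __name__ == "__main__":
    body = build_stemmer_patch({"nb_NO": "Norwegian", "nl_NL": "Dutch"})
    # Paste or send this text as the PATCH request body.
    print(json.dumps(body, indent=2))
```

Serializing twice produces the `\"` escapes shown in the request body above without manual editing.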

For assistance in acquiring the stemmer for a particular language, see the topic Snowball token filter on the Elasticsearch documentation website.

Important:
After an upgrade to HCL Commerce Search Version 9.1.14.1, verify the integrity of your Ingest node data using the /configuration REST endpoint.
GET - http://dataQueryHost:dataQueryPort/search/resources/api/v2/configuration?nodeName=ingest&envType=auth
The Ingest node configuration must include the following stemmer and stopword language properties, with all of their default values:
{
"name": "stemmer.language",
"value": "{\"ar_EG\": \"Arabic\", \"da_DK\": \"Danish\", \"de_DE\": \"German\", \"en_US\": \"English\", \"es_ES\": \"Spanish\", \"fi_FI\": \"Finnish\", \"fr_FR\": \"French\", \"hu_HU\": \"Hungarian\", \"it_IT\": \"Italian\", \"nb_no\": \"Norwegian\", \"nl_NL\": \"Dutch\", \"pt_BR\": \"Portuguese\", \"ro_RO\": \"Romanian\", \"ru_RU\": \"Russian\", \"sv_SE\": \"Swedish\", \"tr_TR\": \"Turkish\"}"
},
{
"name": "stopword.language",
"value": "{ \"ar\": \"_arabic_\", \"da\": \"_danish_\", \"de\": \"_german_\", \"el\": \"_greek_\", \"en\": \"_english_\", \"es\": \"_spanish_\", \"fi\": \"_finnish_\", \"fr\": \"_french_\", \"hu\": \"_hungarian_\", \"it\": \"_italian_\", \"ja\": \"_cjk_\", \"ko\": \"_cjk_\", \"nb\": \"_norwegian_\", \"nl\": \"_dutch_\", \"pt\": \"_portuguese_\", \"ro\": \"_romanian_\", \"ru\": \"_russian_\", \"sv\": \"_swedish_\", \"tr\": \"_turkish_\", \"zh\": \"_cjk_\"}"
}

If the stemmer.language or stopword.language property is not present in the Ingest node, add it using the /configuration endpoint as described in Manually adding languages when using Basic NLP.
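The post-upgrade check can be scripted. The following Python sketch is hypothetical. It assumes that the GET response uses the same global.connector[].property[] shape as the PATCH bodies in this topic, and it compares keys and values exactly, so case matters (for example "nb_no"). The default values are copied from the listing above.

```python
import json
import urllib.request
from typing import Dict, Optional, Set

# Default values listed in this topic for the two properties.
DEFAULT_STEMMERS = {
    "ar_EG": "Arabic", "da_DK": "Danish", "de_DE": "German", "en_US": "English",
    "es_ES": "Spanish", "fi_FI": "Finnish", "fr_FR": "French", "hu_HU": "Hungarian",
    "it_IT": "Italian", "nb_no": "Norwegian", "nl_NL": "Dutch", "pt_BR": "Portuguese",
    "ro_RO": "Romanian", "ru_RU": "Russian", "sv_SE": "Swedish", "tr_TR": "Turkish",
}
DEFAULT_STOPWORDS = {
    "ar": "_arabic_", "da": "_danish_", "de": "_german_", "el": "_greek_",
    "en": "_english_", "es": "_spanish_", "fi": "_finnish_", "fr": "_french_",
    "hu": "_hungarian_", "it": "_italian_", "ja": "_cjk_", "ko": "_cjk_",
    "nb": "_norwegian_", "nl": "_dutch_", "pt": "_portuguese_", "ro": "_romanian_",
    "ru": "_russian_", "sv": "_swedish_", "tr": "_turkish_", "zh": "_cjk_",
}
EXPECTED = {"stemmer.language": DEFAULT_STEMMERS, "stopword.language": DEFAULT_STOPWORDS}


def fetch_ingest_config(base_url: str) -> dict:
    """GET the Ingest configuration, e.g. base_url="http://dataQueryHost:dataQueryPort"."""
    url = base_url.rstrip("/") + "/search/resources/api/v2/configuration?nodeName=ingest&envType=auth"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def find_property(config: dict, name: str) -> Optional[str]:
    """Look up a property value, ASSUMING the global.connector[].property[] shape."""
    for connector in config.get("global", {}).get("connector", []):
        for prop in connector.get("property", []):
            if prop.get("name") == name:
                return prop.get("value")
    return None


def missing_defaults(config: dict) -> Dict[str, Set[str]]:
    """Map each property name to the keys whose default entry is absent or different."""
    problems: Dict[str, Set[str]] = {}
    for name, defaults in EXPECTED.items():
        raw = find_property(config, name)
        # json.loads tolerates the surrounding whitespace shown in the listing.
        present = json.loads(raw) if raw else {}
        bad = {key for key, value in defaults.items() if present.get(key) != value}
        if bad:
            problems[name] = bad
    return problems
```

An empty result from `missing_defaults(fetch_ingest_config(...))` means both properties carry every default entry. Any keys it reports are the ones to restore through the /configuration endpoint.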

Enabling category search for non-leaf categories

HCL Commerce allows shoppers to find products by the name of their associated parent category or, in some cases, by a list of category names. A list of category names can arise when linked categories or full category path search is enabled for searching. When a category name match is found, all products within that category are returned in the search result. The results can be refined further by other terms in the same search phrase. For example, the search phrase "Gusso dresses" contains the brand name "Gusso" and the category name "dresses" (which is also a product name). This search returns all Gusso-brand products under the categories that have "dresses" in their name, followed by other products that match only "Gusso" or "dresses" in their name or short description.

Category search is disabled in Basic NLP by default. This means that during keyword searches, only leaf-level categories are considered when the input term is matched to categories. To also match the input term against non-leaf categories, set the Ingest property flow.basic.nlp.category.search to true by performing the following steps.

  1. Send a PATCH request with the following request body to the Ingest configuration API.
    PATCH - http://dataQueryHost:dataQueryPort/search/resources/api/v2/configuration?nodeName=ingest&envType=auth
    
    {
      "global": {
        "connector": [
          {
            "name": "attribute",
            "property": [
              { "name": "flow.basic.nlp.category.search", "value": "true" }
            ]
          }
        ]
      }
    }
  2. Perform a full reindex. See Building the Elasticsearch index for the procedure.
  3. Restart the Query service. For more information, see Starting the Query Docker container with default configurations.
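Step 1 can also be scripted. The following Python sketch uses only the standard library. The helper names are hypothetical, the host and port are the placeholders from the endpoint above, and the sketch assumes that no additional authentication headers are required, which may not be true in your environment. It builds the request without sending it. Steps 2 and 3 (full reindex and Query service restart) are still required.

```python
import json
import urllib.request

INGEST_CONFIG_URL = (
    "http://dataQueryHost:dataQueryPort/search/resources/api/v2/"
    "configuration?nodeName=ingest&envType=auth"
)


def build_attribute_patch(name: str, value: str) -> bytes:
    """Serialize a single Ingest "attribute" connector property as a PATCH body."""
    body = {
        "global": {
            "connector": [
                {"name": "attribute", "property": [{"name": name, "value": value}]}
            ]
        }
    }
    return json.dumps(body).encode("utf-8")


def make_patch_request(url: str, payload: bytes) -> urllib.request.Request:
    """Wrap the payload in a PATCH request with a JSON content type."""
    return urllib.request.Request(
        url, data=payload, method="PATCH", headers={"Content-Type": "application/json"}
    )


if __name__ == "__main__":
    req = make_patch_request(
        INGEST_CONFIG_URL, build_attribute_patch("flow.basic.nlp.category.search", "true")
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it; then reindex and restart Query.
```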

Boosting results in Basic NLP

You can use a provider to boost the results of search terms that are identified as NOUN, CATEGORY, or BRAND. The default Boost provider is defined in the Basic NLP profile file HCL_Basic_NLPProfile.json:
"Boost":"com.hcl.commerce.search.internal.expression.provider.SearchBasicNLPBoostQueryProviderHelper"
For comparison, the default content of the Advanced NLP profile, HCL_NLPProfile, which does not define a Boost provider, is as follows:
{
	"profileName": "HCL_NLPProfile",
	"provider": {
		"PartNumber": "com.hcl.commerce.search.internal.expression.provider.SearchNLPCustomPartNumberHelper",
		"BlankSpace": "com.hcl.commerce.search.internal.expression.provider.SearchNLPWhiteSpaceProviderHelper",
		"CurrenySymbol": "com.hcl.commerce.search.internal.expression.provider.SearchNLPCurrencySymbolProviderHelper",
		"SpellCorrect": "com.hcl.commerce.search.internal.expression.provider.SearchNLPSpellCorrectionProviderHelper",
		"ExcludeSearchTerm": "com.hcl.commerce.search.internal.expression.provider.SearchNLPExcludedTermProviderHelper",
		"NumberFormatter": "com.hcl.commerce.search.internal.expression.provider.SearchNLPNumberFormatterProviderHelper",
		"STA": "com.hcl.commerce.search.internal.expression.provider.SearchNLPSTAExpansionProviderHelper",
		"DependenciesParsing": "com.hcl.commerce.search.internal.expression.provider.SearchNLPDependenciesParsingProviderHelper",
		"MultiWordSearchTerm": "com.hcl.commerce.search.internal.expression.provider.SearchNLPMultiwordTermProviderHelper",
		"LowerCase": "com.hcl.commerce.search.internal.expression.provider.SearchNLPLowerCaseProviderHelper",
		"DMM": "com.hcl.commerce.search.internal.expression.provider.SearchNLPDMMProviderHelper",
		"SpecialCharacter": "com.hcl.commerce.search.internal.expression.provider.SearchNLPSpecialCharacterProviderHelper",
		"MultiWordPriceFilter": "com.hcl.commerce.search.internal.expression.provider.SearchMultiwordFilterProviderHelper",
		"Stopword": "com.hcl.commerce.search.internal.expression.provider.SearchNLPStopwordProviderHelper",
		"WordToNumber": "com.hcl.commerce.search.internal.expression.provider.SearchNLPWordToNumberProviderHelper",
		"PriceFilter": "com.hcl.commerce.search.internal.expression.provider.SearchNLPPriceFilterProviderHelper",
		"POS_NER": "com.hcl.commerce.search.internal.expression.provider.SearchNLPPOSAndNERProviderHelper",
		"Color": "com.hcl.commerce.search.internal.expression.provider.SearchNLPColorMMProviderHelper"
	},
	"classification": {
	},
	"termDroppingPriority": {
		"1": "FILTER",
		"2": "MEASUREMENT",
		"3": "BRAND",
		"4": "COLOR",
		"5": "ADJECTIVE",
		"6": "CATEGORY",
		"7": "NOUN"
	}
}
HCL Commerce Version 9.1.15.0 or later
Note: You can disable term dropping by setting termDroppingPriority to a null value. For more information, see Disabling term dropping.
The Basic NLP providers, including Boost, are defined in the file HCL_Basic_NLPProfile.json:
{
	"profileName": "HCL_Basic_NLPProfile",
	"provider": {
		"PartNumber": "com.hcl.commerce.search.internal.expression.provider.SearchNLPCustomPartNumberHelper",
		"CategorySearch" : "com.hcl.commerce.search.internal.expression.provider.SearchBasicNLPCategorySearchProviderHelper",
		"BlankSpace": "com.hcl.commerce.search.internal.expression.provider.SearchNLPWhiteSpaceProviderHelper",
		"CurrenySymbol": "com.hcl.commerce.search.internal.expression.provider.SearchNLPCurrencySymbolProviderHelper",
		"LowerCase": "com.hcl.commerce.search.internal.expression.provider.SearchNLPLowerCaseProviderHelper",
		"SpellCorrect": "com.hcl.commerce.search.internal.expression.provider.SearchNLPSpellCorrectionProviderHelper",
		"ExcludeSearchTerm": "com.hcl.commerce.search.internal.expression.provider.SearchNLPExcludedTermProviderHelper",
		"Fractional": "com.hcl.commerce.search.internal.expression.provider.SearchNLPFractionalNumberHelper",
		"DMM": "com.hcl.commerce.search.internal.expression.provider.SearchNLPDMMProviderHelper",
		"MultiWordPriceFilter": "com.hcl.commerce.search.internal.expression.provider.SearchMultiwordFilterProviderHelper",
		"Stopword": "com.hcl.commerce.search.internal.expression.provider.SearchBasicNLPStopwordProviderHelper",
		"PriceFilter": "com.hcl.commerce.search.internal.expression.provider.SearchNLPPriceFilterProviderHelper",
		"POS_NER": "com.hcl.commerce.search.internal.expression.provider.SearchBasicNLPPOSAndNERProviderHelper",
		"Color": "com.hcl.commerce.search.internal.expression.provider.SearchNLPColorMMProviderHelper",
		"Boost": "com.hcl.commerce.search.internal.expression.provider.SearchBasicNLPBoostQueryProviderHelper"
	},
	"classification": {
	},
	"termDroppingPriority": {
		"1": "FILTER",
		"2": "MEASUREMENT",
		"3": "BRAND",
		"4": "COLOR",
		"5": "ADJECTIVE",
		"6": "CATEGORY",
		"7": "NOUN"
	}
}
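You can diff the provider maps to see which providers the two profiles above do not share, and where they use a different class for the same provider key (for example, Stopword and POS_NER). The helper below is hypothetical. It assumes you have saved the profile documents locally as JSON files, and the file paths are up to you.

```python
import json
from typing import Dict, List


def load_profile(path: str) -> dict:
    """Load an NLP profile document that was saved locally as JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def compare_providers(first: dict, second: dict) -> Dict[str, List[str]]:
    """Compare the "provider" maps of two NLP profile documents."""
    a, b = first.get("provider", {}), second.get("provider", {})
    shared = a.keys() & b.keys()
    return {
        "only_first": sorted(a.keys() - b.keys()),
        "only_second": sorted(b.keys() - a.keys()),
        "different_class": sorted(k for k in shared if a[k] != b[k]),
    }
```

For example, `compare_providers(load_profile("HCL_NLPProfile.json"), load_profile("HCL_Basic_NLPProfile.json"))` would list the Advanced-only providers, the Basic-only providers such as Boost and CategorySearch, and the keys that map to different classes. The file names in this call are hypothetical.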
A new configuration property, basic.nlp.boost.fields, is also added in the wc-component.json configuration file. It specifies the fields on which boosting is performed:
{
"name": "basic.nlp.boost.fields",
"value": "NOUN=nlp.natural.categories.[catalogId].normalized,BRAND=nlp.natural.names.raw,CATEGORY=nlp.natural.categories.[catalogId].raw"
}
This is a comma-separated list in which each entry consists of two parts:
  • Before the ‘=’ is the classification.
  • After the ‘=’ is the field on which boosting is performed when a search term is identified as that classification (NOUN, BRAND, or CATEGORY).
To boost a result, both the field and the boost factor are required. The field is taken from the basic.nlp.boost.fields property, and the boost factor is defined in the boostfields.properties file with the prefix ‘basic’.
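The following hypothetical Python sketch shows how the basic.nlp.boost.fields value breaks down into classification and field pairs. It assumes that "[catalogId]" is a placeholder for the current catalog ID, and the ID "10001" is only an example. It does not cover how the boost factor is read from boostfields.properties.

```python
from typing import Dict

BOOST_FIELDS = (
    "NOUN=nlp.natural.categories.[catalogId].normalized,"
    "BRAND=nlp.natural.names.raw,"
    "CATEGORY=nlp.natural.categories.[catalogId].raw"
)


def parse_boost_fields(value: str, catalog_id: str) -> Dict[str, str]:
    """Split basic.nlp.boost.fields into {classification: field}.

    ASSUMPTION: "[catalogId]" is a placeholder for the current catalog ID.
    """
    fields: Dict[str, str] = {}
    for entry in value.split(","):
        classification, sep, field = entry.strip().partition("=")
        if not sep:
            continue  # skip empty or malformed entries
        fields[classification.strip()] = field.strip().replace("[catalogId]", catalog_id)
    return fields


if __name__ == "__main__":
    print(parse_boost_fields(BOOST_FIELDS, "10001"))
```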

Basic NLP stored scripts

When the Query service starts, it loads the following stored scripts, which boost the search results. The scripts are loaded based on the basic.nlp.boost.fields property.
  • "noun-boost-script-param-1"
  • "noun-boost-script-param-2"
  • "noun-boost-script-param-3"
  • "brand-boost-script-param-1"
  • "brand-boost-script-param-2"
  • "brand-boost-script-param-3"
  • "category-boost-script-param-1"
  • "category-boost-script-param-2"
  • "category-boost-script-param-3"
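To inspect one of these scripts directly, you can use the Elasticsearch get stored script API (GET _scripts/<script-id>). The following sketch only builds the requests. The ID pattern is taken from the list above, the Elasticsearch address is a hypothetical placeholder, and your cluster may require TLS and credentials.

```python
import urllib.request
from typing import List

# Classification prefixes and parameter variants, as listed above.
CLASSIFICATIONS = ("noun", "brand", "category")
VARIANTS = (1, 2, 3)


def stored_script_ids() -> List[str]:
    """Return the nine stored script IDs in the order this topic lists them."""
    return [f"{c}-boost-script-param-{n}" for c in CLASSIFICATIONS for n in VARIANTS]


def get_script_request(es_base_url: str, script_id: str) -> urllib.request.Request:
    """Build an Elasticsearch get stored script request: GET _scripts/<id>."""
    return urllib.request.Request(f"{es_base_url.rstrip('/')}/_scripts/{script_id}", method="GET")


if __name__ == "__main__":
    # Hypothetical address; add TLS and credentials as your cluster requires.
    for script_id in stored_script_ids():
        print(get_script_request("http://elasticsearch-host:9200", script_id).full_url)
```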

The following screen capture shows an example of a "noun-boost-script-param-1" boost script configuration.

For more information about boosting search terms in Basic NLP, see NLP processing details in API responses.

Disabling Basic NLP in Versions 9.1.15.2+

To completely disable Natural Language Processing when the Basic NLP option is used in Version 9.1.15.2 and later, perform the following steps.

  1. Remove all languages from NLP_ENABLE_LANGUAGE_CODE by setting the value of this environment variable to null.
  2. Add a new empty Basic NLP profile in ZooKeeper using the Data Query endpoint /search/resources/api/v2/documents/profiles/. Include the following content:
    {
        "profileName": "HCL_Basic_NLPProfile1",
        "provider": {},
        "classification": {
        },
        "termDroppingPriority": {}
    }
    Configure this new NLP profile as explained in Configuring the NLP profile.
  3. Set the flow.disable.basic.nlp attribute to true using the following REST endpoint:
    PATCH https://data-query-host:data-query-port/search/resources/api/v2/configuration?nodeName=ingest&envType=auth
    Use the following content to change the attribute:
    {
        "global": {
            "connector": [
                {
                    "name": "attribute",
                    "property": [
                        {
                            "name": "flow.disable.basic.nlp",
                            "value": "true"
                        }
                    ]
                }
            ]
        }
    } 
    For more information about the flow.disable.basic.nlp attribute, see Ingest configuration via REST.
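The API calls in steps 2 and 3 can be prepared in order with the following Python sketch. It only builds the requests. Step 1 (clearing NLP_ENABLE_LANGUAGE_CODE) and configuring the new profile happen outside it. The host and port are the placeholders from this topic. The HTTP method for creating a profile is not stated here, so POST is an assumption, and the sketch assumes no extra authentication headers are needed.

```python
import json
import urllib.request
from typing import List

# Placeholders taken from the endpoints in this topic.
DATA_QUERY = "https://data-query-host:data-query-port"
PROFILES_URL = DATA_QUERY + "/search/resources/api/v2/documents/profiles/"
INGEST_CONFIG_URL = DATA_QUERY + "/search/resources/api/v2/configuration?nodeName=ingest&envType=auth"

EMPTY_BASIC_PROFILE = {
    "profileName": "HCL_Basic_NLPProfile1",
    "provider": {},
    "classification": {},
    "termDroppingPriority": {},
}
DISABLE_FLAG_BODY = {
    "global": {
        "connector": [
            {"name": "attribute", "property": [{"name": "flow.disable.basic.nlp", "value": "true"}]}
        ]
    }
}


def _json_request(url: str, body: dict, method: str) -> urllib.request.Request:
    """Wrap a JSON body in a request with the given HTTP method."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )


def disable_basic_nlp_requests() -> List[urllib.request.Request]:
    """Return the requests for steps 2 and 3, in the order they must be sent."""
    return [
        # ASSUMPTION: the profiles endpoint accepts POST for a new profile.
        _json_request(PROFILES_URL, EMPTY_BASIC_PROFILE, "POST"),
        _json_request(INGEST_CONFIG_URL, DISABLE_FLAG_BODY, "PATCH"),
    ]
```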