Generating queries for Elasticsearch

Incoming search strings are passed through a Natural Language Processing (NLP) system, and queries are run against the generated NLP classification. When synonyms are found, they are processed separately. Understanding how queries are constructed and run will help you customize and optimize your system.

During a keyword search, the entered search term goes through NLP processing, as described in Natural Language Processing (NLP) in Version 9.1. Once the NLP parser in the query service has analyzed the search term, the main search query is generated based on the resulting classifications. When synonym expansion applies, the expanded terms do not go through the NLP parser. Instead, each of the comma-separated synonym keywords is added to the SHOULD clause of the generated Elasticsearch query, with a minimum should match value of 1.
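
As a rough illustration of the synonym handling described above, the following Python sketch builds a bool query body in which each comma-separated synonym becomes one SHOULD clause and minimum_should_match is 1. The function name build_synonym_query, the use of multi_match clauses, and the example field list are assumptions made for illustration only; the query that the query service actually generates may be shaped differently.

```python
def build_synonym_query(synonyms_csv, fields):
    """Build a bool query whose SHOULD clauses are the expanded synonyms.

    Hypothetical helper: the clause type (multi_match) is an assumption.
    """
    # Split the comma-separated synonym list and drop empty entries.
    terms = [t.strip() for t in synonyms_csv.split(",") if t.strip()]
    should = [
        {"multi_match": {"query": term, "fields": list(fields)}}
        for term in terms
    ]
    # At least one synonym clause has to match.
    return {"bool": {"should": should, "minimum_should_match": 1}}


query = build_synonym_query("sofa, couch, settee", ["natural.nouns.raw"])
```

With this input, the resulting query contains three SHOULD clauses, and a document that matches any one of them satisfies the bool query.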

Once the search term has been fully analyzed, each NLP classification obtains its own list of analyzed keywords. An Elasticsearch query is generated from each keyword in the list, and the generated query is added to the MUST clause of the main query. The exception is the adjective classification. Adjectives are added to the SHOULD clause with minimum should match = 2<70% (default configuration). Here 2<70% means that when there are one or two SHOULD clauses, all of them must match; when there are more than two, 70% of them (rounded down) must match.
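
The arithmetic behind a conditional minimum should match value can be worked through with a small Python sketch. The function required_should_clauses is a hypothetical helper written for this explanation, not part of any product or Elasticsearch API, and it only handles the single "integer<percentage" form used here.

```python
def required_should_clauses(spec, clause_count):
    """Return how many SHOULD clauses must match for a spec such as "2<70%"."""
    threshold_text, percent_text = spec.split("<")
    threshold = int(threshold_text)
    percent = int(percent_text.rstrip("%"))
    # At or below the threshold, every SHOULD clause is required.
    if clause_count <= threshold:
        return clause_count
    # Above the threshold, the percentage applies and is rounded down.
    return clause_count * percent // 100


for n in (1, 2, 3, 5, 10):
    print(n, required_should_clauses("2<70%", n))
# prints: 1 1, 2 2, 3 2, 5 3, 10 7 (one pair per line)
```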

In the case of synonyms, if only part of the search term qualifies for synonym expansion, the remaining search terms are parsed through the NLP process. The query is then generated with a MUST clause for the NLP-analyzed terms and a SHOULD clause for the synonym-expanded terms. If any term is identified as an adjective in this case, the adjective query is not added to the SHOULD clause alongside the synonyms; it is added to the MUST clause instead.
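
The clause placement rules for partial synonym expansion can be sketched in Python as follows. This is a simplified model under stated assumptions: the function assemble_bool_query is hypothetical, every clause is shown as a multi_match query, and a single field list is used for all terms, whereas the real query uses the per-classification fields.

```python
def assemble_bool_query(must_keywords, adjectives, synonyms, fields):
    """Place NLP keywords, adjectives, and synonyms into bool clauses."""

    def clause(term):
        # Assumed clause shape, for illustration only.
        return {"multi_match": {"query": term, "fields": list(fields)}}

    must = [clause(k) for k in must_keywords]
    body = {}
    if synonyms:
        # Synonyms occupy the SHOULD clause, so adjectives move to MUST.
        must += [clause(a) for a in adjectives]
        body["should"] = [clause(s) for s in synonyms]
        body["minimum_should_match"] = 1
    elif adjectives:
        # Without synonyms, adjectives stay in SHOULD with the default rule.
        body["should"] = [clause(a) for a in adjectives]
        body["minimum_should_match"] = "2<70%"
    if must:
        body["must"] = must
    return {"bool": body}


with_synonyms = assemble_bool_query(["table"], ["red"], ["sofa", "couch"], ["natural.nouns.raw"])
without_synonyms = assemble_bool_query(["table"], ["red"], [], ["natural.nouns.raw"])
```

In with_synonyms, the adjective "red" sits in the MUST clause next to "table"; in without_synonyms, it remains in the SHOULD clause.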

In the case of brand name searches, an exact match is performed on the brand name after lemmatization, using the query fields listed below for the BRAND_NAME classification. If any other classification with a keyword list is also present, the brand name query is generated for the exact brand name plus each stemmed token from the brand name, and the query field scope is widened to include the NOUN classification fields.

The following list shows the query fields that are searched for each NLP classification and for synonym-expanded terms. Additional fields are added to the final Elasticsearch query from the search profile query field list. These fields are configurable and can be updated through the /configuration REST API endpoint for the ZooKeeper component node property nlp.classification.field.mapping. To widen the search scope of a classification, add a new field to its #-separated field list through the configuration endpoint.

NOUN=natural.categories.[catalogId].normalized, natural.nouns.raw

CATEGORY=natural.categories.normalized, natural.categories.[catalogId].raw, natural.nouns.raw, natural.categories.[catalogId].normalized

BRAND_NAME=natural.names.normalized, natural.names.raw

ADJECTIVES=natural.adjectives.normalized, natural.adjectives.raw, natural.nouns.raw, natural.categories.[catalogId].normalized

ADJECTIVES_NAME=attribute.name.normalized, attribute.name.raw, natural.nouns.raw, natural.categories.[catalogId].normalized

UNIT_OF_MEASURE_DEFAULT_FIELD=attribute.value.raw, natural.nouns.raw, natural.categories.[catalogId].normalized

STA_QUERY_FIELD=natural.categories.[catalogId].normalized, natural.nouns.raw

ROOT_BOOSTING_FIELD=nlp.name.normalized, nlp.keyword.text

The fields in the ROOT_BOOSTING_FIELD are used for boosting purposes only. They are not part of the actual query field. The prefix nlp. is for HCL internal use only; the actual field name occurs after nlp.
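
To show how a field might be added to a classification, the following Python sketch parses a property value in the format used by nlp.classification.field.mapping (classifications separated by commas, fields separated by #), appends a field, and serializes it again. The helper names parse_field_mapping and format_field_mapping are hypothetical, and adding natural.names.raw to NOUN is purely illustrative, not a recommended setting.

```python
def parse_field_mapping(value):
    """Parse "NAME=f1#f2,OTHER=f3" into {"NAME": ["f1", "f2"], "OTHER": ["f3"]}."""
    mapping = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, _, field_list = entry.partition("=")
        mapping[name.strip()] = [f.strip() for f in field_list.split("#") if f.strip()]
    return mapping


def format_field_mapping(mapping):
    """Serialize the mapping back into the comma and # separated form."""
    return ",".join(name + "=" + "#".join(fields) for name, fields in mapping.items())


value = (
    "NOUN=natural.categories.[catalogId].normalized#natural.nouns.raw,"
    "BRAND_NAME=natural.names.normalized#natural.names.raw"
)
mapping = parse_field_mapping(value)
mapping["NOUN"].append("natural.names.raw")  # illustrative extra field only
updated = format_field_mapping(mapping)
```

The string in updated is what would be supplied as the property value in the configuration request.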

Fields can be updated using the /configuration endpoint, as shown in the following example. Send the request with either the POST or the PATCH request method.

PATCH/POST http://dataQueryHost:dataQueryPort/search/resources/api/v2/configuration?nodeName=component&envType=auth

Request body:


{
    "extendedconfiguration": {
        "configgrouping": [
            {
                "name": "SearchConfiguration",
                "property": [
                    {
                        "name": "nlp.classification.field.mapping",
                        "value": "NOUN=natural.categories.[catalogId].normalized#natural.nouns.raw,CATEGORY=natural.categories.normalized#natural.categories.[catalogId].raw#natural.nouns.raw#natural.categories.[catalogId].normalized,BRAND_NAME=natural.names.normalized#natural.names.raw,ADJECTIVES=natural.adjectives.normalized#natural.adjectives.raw#natural.nouns.raw#natural.categories.[catalogId].normalized,ADJECTIVES_NAME=attribute.name.normalized#attribute.name.raw#natural.nouns.raw#natural.categories.[catalogId].normalized,UNIT_OF_MEASURE_DEFAULT_FIELD=attribute.value.raw#natural.nouns.raw#natural.categories.[catalogId].normalized,STA_QUERY_FIELD=natural.categories.[catalogId].normalized#natural.nouns.raw,ROOT_BOOSTING_FIELD=nlp.name.normalized#nlp.keyword.text"
                    }
                ]
            }
        ]
    }
}

Note:

If this is the first time that you are adding the configuration through the /configuration endpoint, use the POST request method; otherwise, use the PATCH request method.
Restart the Query service after adding or updating the configuration.
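
For scripted updates, the request above could be prepared along the following lines. This Python sketch only constructs the request object and does not send it. The function build_config_request and its first_time flag are hypothetical, the Content-Type header is an assumption, and dataQueryHost and dataQueryPort remain placeholders to be replaced with real values.

```python
import json
import urllib.request


def build_config_request(base_url, mapping_value, first_time):
    """Prepare (but do not send) the configuration update request."""
    body = {
        "extendedconfiguration": {
            "configgrouping": [
                {
                    "name": "SearchConfiguration",
                    "property": [
                        {
                            "name": "nlp.classification.field.mapping",
                            "value": mapping_value,
                        }
                    ],
                }
            ]
        }
    }
    url = base_url + "/search/resources/api/v2/configuration?nodeName=component&envType=auth"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        # POST when the configuration is added for the first time, PATCH afterwards.
        method="POST" if first_time else "PATCH",
    )


request = build_config_request(
    "http://dataQueryHost:dataQueryPort",
    "NOUN=natural.categories.[catalogId].normalized#natural.nouns.raw",
    first_time=False,
)
```

Passing the prepared object to urllib.request.urlopen would send it; the Query service still needs to be restarted afterwards, as noted above.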