Generating queries for Elasticsearch
Incoming search strings are passed through a Natural Language Processing (NLP) system, and queries are run against the generated NLP classification. When synonyms are found, they are processed separately. Understanding how queries are constructed and run will help you customize and optimize your system.
During a keyword search, the entered search term goes through NLP processing, as described in Natural Language Processing (NLP) in Version 9.1. Once the NLP parser in the query service has analyzed the search term, the main search query is generated based on the classifications.
In the case of synonym expansion, the expanded terms do not go through the NLP parser. All the comma-separated keywords from the synonyms are added to the SHOULD clause of the generated Elasticsearch query, with a minimum should match condition of 1.
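As a rough illustration of the synonym case, the Python sketch below builds a bool query with one SHOULD clause per comma-separated synonym and a minimum should match of 1. The helper name synonym_should_query, the use of match clauses, and the choice of natural.nouns.raw as the target field are assumptions made for this example; the query that the query service actually generates may be shaped differently.

```python
import json


def synonym_should_query(synonyms: str, field: str = "natural.nouns.raw") -> dict:
    """Build a bool query with one SHOULD clause per comma-separated synonym.

    Hypothetical helper: the field and the match clause type are assumptions.
    """
    terms = [term.strip() for term in synonyms.split(",") if term.strip()]
    return {
        "bool": {
            "should": [{"match": {field: term}} for term in terms],
            # At least one synonym clause has to match.
            "minimum_should_match": 1,
        }
    }


query = synonym_should_query("sofa, couch, settee")
print(json.dumps(query, indent=2))
```

With this shape, a document that matches any one of the three synonyms satisfies the SHOULD clause.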
Once the search term has been fully analyzed, each NLP classification has its own list of analyzed keywords. An Elasticsearch query is generated for each keyword in the list, and the generated query is added to the MUST clause of the query. The exception is the adjective classification. Adjectives are added to the SHOULD clause with minimum should match = 2<70% (default configuration). Here 2<70% means that when there are one or two should clauses, all of them are required; when there are more than two should clauses, 70% of them are required.
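The arithmetic behind 2<70% can be sketched in Python. The helper required_should_clauses is hypothetical and handles only a single conditional specification of the form N<P% with a positive percentage; it assumes the Elasticsearch behavior of rounding the percentage result down.

```python
def required_should_clauses(spec: str, optional_clauses: int) -> int:
    """Return how many SHOULD clauses must match for a spec such as '2<70%'.

    Hypothetical helper covering only the single 'N<P%' form.
    """
    threshold_text, percent_text = spec.split("<")
    threshold = int(threshold_text)
    percent = int(percent_text.rstrip("%"))
    if optional_clauses <= threshold:
        # At or below the threshold, every optional clause is required.
        return optional_clauses
    # Above the threshold, the percentage applies, rounded down.
    return (optional_clauses * percent) // 100


for count in (1, 2, 3, 4, 5, 10):
    print(count, "adjective clauses ->", required_should_clauses("2<70%", count), "required")
```

For 1, 2, 3, 4, 5, and 10 adjective clauses, this gives 1, 2, 2, 2, 3, and 7 required matches respectively.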
In the case of synonyms, if part of the search term qualifies for synonym expansion, the remaining search terms are parsed through the NLP process. Based on the classification, the query is generated with a MUST clause for the NLP-analyzed terms and a SHOULD clause for the synonym-expanded terms. If anything is identified as an adjective in this case, the adjective query is no longer added to the SHOULD clause along with the synonyms; instead, it is added to the MUST clause.
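The placement rules described so far can be summarized in a short Python sketch. Everything in it is illustrative: build_bool_query and keyword_query are hypothetical helpers, the FIELDS table is an abbreviated subset of the classification fields, the multi_match clause stands in for whatever query type is really generated, and matching synonyms against the NOUN fields is an assumption.

```python
# Abbreviated subset of the classification-to-field mapping, for illustration only.
FIELDS = {
    "NOUN": ["natural.nouns.raw"],
    "ADJECTIVES": ["natural.adjectives.normalized", "natural.adjectives.raw"],
}


def keyword_query(keyword, fields):
    # Assumption: a multi_match clause represents the per-keyword query.
    return {"multi_match": {"query": keyword, "fields": fields}}


def build_bool_query(classified, synonyms=()):
    """Place each classified keyword in MUST or SHOULD following the rules above."""
    must, should = [], []
    for classification, keywords in classified.items():
        # Adjectives are optional only when no synonym expansion took place.
        optional = classification == "ADJECTIVES" and not synonyms
        target = should if optional else must
        target.extend(keyword_query(k, FIELDS[classification]) for k in keywords)

    bool_query = {"must": must}
    if synonyms:
        # Assumption: synonyms are matched against the NOUN fields.
        bool_query["should"] = [keyword_query(s, FIELDS["NOUN"]) for s in synonyms]
        bool_query["minimum_should_match"] = 1
    elif should:
        bool_query["should"] = should
        bool_query["minimum_should_match"] = "2<70%"
    return {"bool": bool_query}


# No synonyms: the adjectives go to SHOULD with 2<70%.
plain = build_bool_query({"NOUN": ["table"], "ADJECTIVES": ["red", "wooden"]})
# Partial synonym expansion: the adjective moves to MUST, the synonyms fill SHOULD.
expanded = build_bool_query({"NOUN": ["table"], "ADJECTIVES": ["red"]}, synonyms=["sofa", "couch"])
```

Comparing plain and expanded shows the same adjective landing in different clauses depending on whether synonym expansion occurred.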
In the case of brand name searches, an exact match is performed on the brand name after lemmatization, using the query fields for the brand name classification from the list of fields below. If any other classification with a keyword list is available along with the brand name, a brand name query is generated for the exact brand name plus each stemmed token from the brand name, and the query field scope is widened to include the NOUN classification fields.
NOUN=natural.categories.[catalogId].normalized, natural.nouns.raw
CATEGORY=natural.categories.normalized, natural.categories.[catalogId].raw, natural.nouns.raw, natural.categories.[catalogId].normalized
BRAND_NAME=natural.names.normalized, natural.names.raw
ADJECTIVES=natural.adjectives.normalized, natural.adjectives.raw, natural.nouns.raw, natural.categories.[catalogId].normalized
ADJECTIVES_NAME=attribute.name.normalized, attribute.name.raw, natural.nouns.raw, natural.categories.[catalogId].normalized
UNIT_OF_MEASURE_DEFAULT_FIELD=attribute.value.raw, natural.nouns.raw, natural.categories.[catalogId].normalized
STA_QUERY_FIELD=natural.categories.[catalogId].normalized, natural.nouns.raw
ROOT_BOOSTING_FIELD=nlp.name.normalized, nlp.keyword.text
The fields in the ROOT_BOOSTING_FIELD are used for boosting purposes only. They are not part of the actual query field. The prefix nlp. is for HCL internal use only; the actual field name occurs after nlp.
PATCH/POST http://dataQueryHost:dataQueryPort/search/resources/api/v2/configuration?nodeName=component&envType=auth
Request body:
{
  "extendedconfiguration": {
    "configgrouping": [
      {
        "name": "SearchConfiguration",
        "property": [
          {
            "name": "nlp.classification.field.mapping",
            "value": "NOUN=natural.categories.[catalogId].normalized#natural.nouns.raw,CATEGORY=natural.categories.normalized#natural.categories.[catalogId].raw#natural.nouns.raw#natural.categories.[catalogId].normalized,BRAND_NAME=natural.names.normalized#natural.names.raw,ADJECTIVES=natural.adjectives.normalized#natural.adjectives.raw#natural.nouns.raw#natural.categories.[catalogId].normalized,ADJECTIVES_NAME=attribute.name.normalized#attribute.name.raw#natural.nouns.raw#natural.categories.[catalogId].normalized,UNIT_OF_MEASURE_DEFAULT_FIELD=attribute.value.raw#natural.nouns.raw#natural.categories.[catalogId].normalized,STA_QUERY_FIELD=natural.categories.[catalogId].normalized#natural.nouns.raw,ROOT_BOOSTING_FIELD=nlp.name.normalized#nlp.keyword.text"
          }
        ]
      }
    ]
  }
}
- If this is the first time you are adding the configuration through the /configuration endpoint, use the POST request method; otherwise, use the PATCH request method.
- Restart the Query service after adding or updating the configuration.
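The request above can be assembled programmatically. The Python sketch below is a minimal example under several assumptions: mapping_value and build_request are hypothetical helpers, FIELD_MAPPING is shortened to two classifications for readability (a real request would list every classification), and dataQueryHost and dataQueryPort remain placeholders. The request is only constructed, not sent, and any authentication your environment requires is not covered.

```python
import json
import urllib.request

# Shortened for illustration; a real request would carry every classification.
FIELD_MAPPING = {
    "NOUN": ["natural.categories.[catalogId].normalized", "natural.nouns.raw"],
    "BRAND_NAME": ["natural.names.normalized", "natural.names.raw"],
}


def mapping_value(mapping):
    """Join fields with '#' and classifications with ',' as in the property value."""
    return ",".join(f"{name}={'#'.join(fields)}" for name, fields in mapping.items())


def build_request(base_url, mapping, first_time):
    """Build (but do not send) the configuration request."""
    body = {
        "extendedconfiguration": {
            "configgrouping": [
                {
                    "name": "SearchConfiguration",
                    "property": [
                        {
                            "name": "nlp.classification.field.mapping",
                            "value": mapping_value(mapping),
                        }
                    ],
                }
            ]
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/search/resources/api/v2/configuration?nodeName=component&envType=auth",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        # POST the first time the configuration is added, PATCH afterwards.
        method="POST" if first_time else "PATCH",
    )


# Placeholder host and port; substitute the values for your environment.
request = build_request("http://dataQueryHost:dataQueryPort", FIELD_MAPPING, first_time=True)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Building the value from a dictionary keeps the # and , separators consistent and avoids stray line breaks inside the JSON string.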