Adding custom nouns and classifications to NLP Named Entity Recognition (NER)
Named Entity Recognition (NER) is one of the most common text pre-processing techniques in Natural Language Processing (NLP). It is also used in other areas of Artificial Intelligence (AI), such as Machine Learning. In addition to the default entities that the Stanford CoreNLP Natural Language Processor provides for NER, you can add custom, language-specific nouns and classifications. Custom nouns and classifications for NER are configured in the ZooKeeper filter node. To add new nouns and classifications, use the POST request method. To update existing nouns and classifications, use the PATCH request method.
Endpoints
http://data_environment_hostname:30920/search/resources/api/v2/configuration?nodeName=filter&envType=auth&locale=en_US
https://data_environment_hostname:30921/search/resources/api/v2/configuration?nodeName=filter&envType=auth&locale=en_US
{
"Dresses" : "CATEGORY",
"Versatil" : "BRAND_NAME",
"Size" : "ATTRIBUTE_NAME",
"XL" : "ATTRIBUTE_VALUE",
"an" : "IGNORE_TERM",
"One and half" : "TO_NUMBER~1.5",
"below" : "FILTER_LTE~1",
"above" : "FILTER_GTE~1",
"red" : "COLOR",
"inch" : "UOM"
}
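As a rough illustration of how such a body might be submitted, the following Python sketch builds (but does not send) a request to the first endpoint above. The hostname is the placeholder from the endpoint, the helper name build_ner_request is hypothetical, and the JSON content type header is an assumption, since the source does not state which headers or credentials the service expects.

```python
import json
import urllib.request

# Placeholder host taken from the endpoint above; replace with a real hostname.
BASE_URL = (
    "http://data_environment_hostname:30920"
    "/search/resources/api/v2/configuration"
)


def build_ner_request(mappings, method="POST", locale="en_US"):
    """Build (but do not send) a request that carries term-to-tag mappings."""
    if method not in ("POST", "PATCH"):
        raise ValueError("method must be POST (add) or PATCH (update)")
    url = f"{BASE_URL}?nodeName=filter&envType=auth&locale={locale}"
    body = json.dumps(mappings).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method=method,
        # Assumption: the service accepts a JSON content type.
        headers={"Content-Type": "application/json"},
    )


request = build_ner_request({"Dresses": "CATEGORY", "red": "COLOR"})
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything reaches the configuration service.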
Default NER tags
- CATEGORY
- Maps to the category related index fields in the Product index.
- BRAND_NAME
- Maps to the manufacturer name indexed field in the Product index.
- ATTRIBUTE_VALUE
- Maps to the NLP Adjective indexed fields in the Product index.
- ATTRIBUTE_NAME
- Maps to the indexed attribute name in the Product index.
- IGNORE_TERM
- Removes the matching terms from the term search expression.
- TO_NUMBER
- Maps to the NLP Numeric indexed fields in the Product index.
- FILTER_GTE~1
- Defines a range filter condition that is greater than or equal to the given argument. The argument is the term that immediately follows the matching pattern.
- FILTER_LTE~1
- Defines a range filter condition that is less than or equal to the given argument. The argument is the term that immediately follows the matching pattern.
- UOM
- Maps to the MatchMaker unit of measure indexed fields in the Product index.
- COLOR
- Maps to the MatchMaker color family indexed fields in the Product index.
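Before submitting a mapping, it can help to check that each value uses one of the default tags listed above. The sketch below is a hypothetical client-side check, not part of the product: the helper names split_tag and unknown_tags are invented for illustration, and treating the text after "~" as an argument to the tag is an assumption based on the TO_NUMBER~1.5 and FILTER_LTE~1 examples.

```python
# Default NER tags from the list above, without any "~argument" suffix.
DEFAULT_NER_TAGS = {
    "CATEGORY", "BRAND_NAME", "ATTRIBUTE_VALUE", "ATTRIBUTE_NAME",
    "IGNORE_TERM", "TO_NUMBER", "FILTER_GTE", "FILTER_LTE", "UOM", "COLOR",
}


def split_tag(value):
    """Split 'TAG~argument' into (tag, argument); argument is None if absent."""
    tag, sep, argument = value.partition("~")
    return tag, (argument if sep else None)


def unknown_tags(mappings):
    """Return the entries whose tag is not one of the default NER tags."""
    return {
        term: value
        for term, value in mappings.items()
        if split_tag(value)[0] not in DEFAULT_NER_TAGS
    }


sample = {
    "XL": "ATTRIBUTE_VALUE",
    "One and half": "TO_NUMBER~1.5",
    "below": "FILTER_LTE~1",
    "red": "COLOUR",  # deliberate misspelling of COLOR
}
print(unknown_tags(sample))  # {'red': 'COLOUR'}
```

An entry flagged here is not necessarily wrong, because it may refer to a custom NER tag that you have defined yourself.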
en_US locale. The POST request method is used for non-en_US locales.
Custom NER tags
If the default NER tags do not cover all of your needs, you can define your own. To do so, add a new NER tag mapping through the /configuration endpoint, using the PATCH request method and a request body that contains a JSON-formatted tag definition. Each request extends one NLPSearchFieldMapping object, and you can map only to Product index fields.
Procedure
- Define your new mapping tag. In the example below, the mapping tag is SELLER.
{
  "extendedconfiguration": {
    "configgrouping": [
      {
        "name": "NLPSearchFieldMapping",
        "property": {
          "name": "NLPFieldsDetail",
          "value": "[{\"NERTag\":\"SELLER\",\"IndexRawFieldName\":\"seller.raw\",\"IndexNormalizedFieldName\":\"seller.normalized\",\"FieldLevelLemmatization\":\"false\", \"BoostFactor\":\"100.0\"}]"
        }
      }
    ]
  }
}
The SELLER mapping tag uses five tags. Of these five tags, three are mandatory, and the other two are optional and have default values.
- NERTag
- The name of the NER tag used to classify the token (SELLER).
- IndexRawFieldName
- The field to use as the aggregation field while training custom data, and as the raw field while searching for the term (seller.raw).
- IndexNormalizedFieldName
- The field to use as the normalized field while searching for the term (seller.normalized).
- FieldLevelLemmatization
- If set to true, lemmatization is applied to the field while training and searching. The default value is false.
- BoostFactor
- The boost factor value to apply when boosting search results. The default value is 100.0.
Note: The first three tags are mandatory. If any of them is missing, an error is logged and processing continues with the other tags. This can affect the search results.
- Update the new NER tag mapping using the PATCH request method for the /configuration endpoint. Add your new mapping as the request body.
PATCH http://dataQueryHost:dataQueryPort/search/resources/api/v2/configuration?nodeName=component&envType=auth
- After you have updated the mapping, restart the Query service.
Result
Your custom NER tag is now available for use in NLP processing.
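The request body used in the procedure embeds the tag definition as a JSON array serialized into a string, which is easy to get wrong when escaping by hand. The following Python sketch shows one way to assemble it. The helper name build_custom_tag_payload is hypothetical, and the client-side check for the three mandatory tags is an added convenience rather than documented service behavior.

```python
import json

# The three tags that the procedure describes as mandatory.
MANDATORY_KEYS = ("NERTag", "IndexRawFieldName", "IndexNormalizedFieldName")


def build_custom_tag_payload(tag_details):
    """Wrap NER tag definitions in the extendedconfiguration structure."""
    for detail in tag_details:
        missing = [key for key in MANDATORY_KEYS if key not in detail]
        if missing:
            raise ValueError(f"missing mandatory keys: {missing}")
    return {
        "extendedconfiguration": {
            "configgrouping": [
                {
                    "name": "NLPSearchFieldMapping",
                    "property": {
                        "name": "NLPFieldsDetail",
                        # The value is a JSON array serialized as a string.
                        "value": json.dumps(tag_details),
                    },
                }
            ]
        }
    }


seller = {
    "NERTag": "SELLER",
    "IndexRawFieldName": "seller.raw",
    "IndexNormalizedFieldName": "seller.normalized",
    "FieldLevelLemmatization": "false",
    "BoostFactor": "100.0",
}
payload = build_custom_tag_payload([seller])
print(json.dumps(payload, indent=2))
```

Serializing the inner list with json.dumps and then serializing the whole payload again produces the escaped quotes seen in the example body, so no manual escaping is needed.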