Adding part-of-speech tags
Search tokens are tagged according to their part-of-speech (POS). For example, "sofa" is usually tagged as a noun. Terms that are not tagged as a recognized type will be ignored, but you can extend the part-of-speech logic to accommodate new types.
How parts-of-speech are processed
The tags recognized for each type are listed in the NLPPOSCodes configuration group:
"name": "NLPPOSCodes",
"property": [
  {
    "name": "NOUN_CODE",
    "value": "NN,NNS,NNS,NNPS,NOUN,NE"
  },
  {
    "name": "VERB_CODE",
    "value": "VB,VBD,VBG,VBN,VBP,VBZ,VERB,VMFIN,VVINF,VVFIN,VV"
  },
  {
    "name": "ADJECTIVE_CODE",
    "value": "DT,PDT,JJ,JJR,JJS,ADJ,ADJA,ADJD"
  },
  {
    "name": "NUMERIC_CODE",
    "value": "CD,CARD,NUM,NFP"
  }
]
When your search string includes the token "sofa," for example, CoreNLP will annotate that token with the POS tag NN (Noun).
- Searches for nouns are performed on natural.nouns.normalized and natural.nouns.raw in the index.
- Searches for adjectives are performed on natural.adjectives.normalized and natural.adjectives.raw in the index.
- Numeric codes are used based on the input search term identified by the matchmaker, then by natural.*.measurements; otherwise, the search is performed on natural.adjectives.normalized and natural.adjectives.raw in the index.
- Any search tokens annotated with a verb tag are ignored during the search.
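The routing rules above can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the tag lists and field names are copied from this page, while `route_tokens`, `POS_CODES`, and `SEARCH_FIELDS` are hypothetical names, the POS tags are supplied by hand rather than produced by CoreNLP, and the matchmaker step for numeric tokens is not modeled.

```python
# Tag lists copied from the NLPPOSCodes configuration group on this page.
POS_CODES = {
    "noun": "NN,NNS,NNS,NNPS,NOUN,NE",
    "verb": "VB,VBD,VBG,VBN,VBP,VBZ,VERB,VMFIN,VVINF,VVFIN,VV",
    "adjective": "DT,PDT,JJ,JJR,JJS,ADJ,ADJA,ADJD",
    "numeric": "CD,CARD,NUM,NFP",
}

# Index fields per type, taken from the list above. For numerics this is a
# simplification: the matchmaker step is omitted (assumption).
SEARCH_FIELDS = {
    "noun": ["natural.nouns.normalized", "natural.nouns.raw"],
    "adjective": ["natural.adjectives.normalized", "natural.adjectives.raw"],
    "numeric": [
        "natural.*.measurements",
        "natural.adjectives.normalized",
        "natural.adjectives.raw",
    ],
}

# Reverse lookup: POS tag -> type name.
TAG_TO_TYPE = {
    tag: kind for kind, codes in POS_CODES.items() for tag in codes.split(",")
}


def route_tokens(tagged_tokens):
    """Return (token, type, fields) for every token that takes part in the search."""
    routed = []
    for token, tag in tagged_tokens:
        kind = TAG_TO_TYPE.get(tag)
        if kind is None or kind == "verb":
            continue  # unrecognized tags and verb tags are ignored
        routed.append((token, kind, SEARCH_FIELDS[kind]))
    return routed


# Hand-tagged sample input (hypothetical; a real tagger may assign other tags).
tagged = [("red", "JJ"), ("sofa", "NN"), ("buy", "VB"), ("oh", "UH")]
for token, kind, fields in route_tokens(tagged):
    print(token, kind, fields)
```

In this sketch, "buy" is dropped because its tag is a verb code, and "oh" is dropped because UH does not appear in any of the four lists.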
A search string may contain tokens with more than one kind of tag. The four most common types of tags (noun, verb, adjective, and numeric) are all recognized by the above processors. Tags of any other type are ignored. In the search string "hello world," for example, "hello" is tagged as UH, while "world" is tagged as NN. UH is not listed under the noun, adjective, numeric, or verb codes, so only "world" participates in the search. To make "hello" participate as well, add UH to one of the code lists, for example to NOUN_CODE:
PATCH http://dataQueryHost:dataQueryPort/search/resources/api/v2/configuration?nodeName=component&envType=auth
{
  "extendedconfiguration": {
    "configgrouping": [
      {
        "name": "NLPPOSCodes",
        "property": [
          {
            "name": "NOUN_CODE",
            "value": "NN,NNS,NNS,NNPS,NOUN,NE,UH"
          }
        ]
      }
    ]
  }
}
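As a rough sketch, the request above could be assembled from a script using only the Python standard library. The helper names `build_pos_patch` and `make_request` are hypothetical, the `Content-Type` header is an assumption, and any authentication your environment requires is not shown. The host and port are the same placeholders used in the request above and must be replaced before sending.

```python
import json
import urllib.request

CONFIG_PATH = "/search/resources/api/v2/configuration?nodeName=component&envType=auth"


def build_pos_patch(code_name, current_value, extra_tag):
    """Build a PATCH body that appends extra_tag to one NLPPOSCodes property."""
    tags = current_value.split(",")
    if extra_tag not in tags:  # avoid adding the same tag twice
        tags.append(extra_tag)
    return {
        "extendedconfiguration": {
            "configgrouping": [
                {
                    "name": "NLPPOSCodes",
                    "property": [{"name": code_name, "value": ",".join(tags)}],
                }
            ]
        }
    }


def make_request(base_url, body):
    """Wrap the body in a PATCH request object without sending it."""
    return urllib.request.Request(
        base_url + CONFIG_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="PATCH",
    )


body = build_pos_patch("NOUN_CODE", "NN,NNS,NNS,NNPS,NOUN,NE", "UH")
request = make_request("http://dataQueryHost:dataQueryPort", body)
print(request.get_method(), request.full_url)
print(json.dumps(body, indent=2))
# To send it, substitute a real host and port, then call
# urllib.request.urlopen(request).
```

Building the value from the current list, rather than typing it by hand, keeps the existing tags intact, because the request supplies the complete value for the property.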