Adding part-of-speech tags

Search tokens are tagged according to their part-of-speech (POS). For example, "sofa" is usually tagged as a noun. Terms that are not tagged as a recognized type will be ignored, but you can extend the part-of-speech logic to accommodate new types.

How parts-of-speech are processed

During search preprocessing, incoming search tokens are usually assigned tags that correspond to nouns, verbs, adjectives, or numeric objects. The default tag codes for these categories are:

    "name": "NLPPOSCodes",
    "property": [
      {
        "name": "NOUN_CODE",
        "value": "NN,NNS,NNS,NNPS,NOUN,NE"
      },
      {
        "name": "VERB_CODE",
        "value": "VB,VBD,VBG,VBN,VBP,VBZ,VERB,VMFIN,VVINF,VVFIN,VV"
      },
      {
        "name": "ADJECTIVE_CODE",
        "value": "DT,PDT,JJ,JJR,JJS,ADJ,ADJA,ADJD"
      },
      {
        "name": "NUMERIC_CODE",
        "value": "CD,CARD,NUM,NFP"
      }
    ]

When your search string includes the token "sofa," for example, CoreNLP annotates that token with the POS tag NN (noun).

The annotated token string is passed to the appropriate processor:

Searches for nouns are performed on natural.nouns.normalized and natural.nouns.raw in the index.
Searches for adjectives are performed on natural.adjectives.normalized and natural.adjectives.raw in the index.
Tokens with numeric tags are handled according to how the matchmaker identifies the input search term, then matched against natural.*.measurements; otherwise, they are searched on natural.adjectives.normalized and natural.adjectives.raw in the index.
Search tokens annotated with a verb tag are ignored during the search.

The token string may contain more than one kind of tag. Tags that belong to the four categories listed above are recognized by these processors; tags of any other type are ignored. For example, in the search string "hello world," "hello" is tagged as UH and "world" is tagged as NN. UH is not in the default code list for nouns, verbs, adjectives, or numerics, so only "world" participates in the search.
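
The filtering described above can be illustrated with a short Python sketch. It is not the product's implementation; the function names category_for and searchable_tokens are hypothetical, and the code lists are copied from the default NLPPOSCodes values shown earlier. It only mimics the rule that unrecognized tags and verb tags drop out of the search.

```python
# Default code lists, copied from the NLPPOSCodes configuration in this topic.
POS_CODES = {
    "NOUN_CODE": "NN,NNS,NNS,NNPS,NOUN,NE",
    "VERB_CODE": "VB,VBD,VBG,VBN,VBP,VBZ,VERB,VMFIN,VVINF,VVFIN,VV",
    "ADJECTIVE_CODE": "DT,PDT,JJ,JJR,JJS,ADJ,ADJA,ADJD",
    "NUMERIC_CODE": "CD,CARD,NUM,NFP",
}


def category_for(tag, codes=POS_CODES):
    """Return the name of the code list that contains tag, or None."""
    for name, value in codes.items():
        if tag in value.split(","):
            return name
    return None


def searchable_tokens(tagged_tokens, codes=POS_CODES):
    """Keep tokens whose tag is recognized and is not a verb tag (hypothetical helper)."""
    kept = []
    for token, tag in tagged_tokens:
        category = category_for(tag, codes)
        if category is not None and category != "VERB_CODE":
            kept.append(token)
    return kept


# "hello" (UH) is dropped because UH is in none of the default lists.
print(searchable_tokens([("hello", "UH"), ("world", "NN")]))  # ['world']
```

If UH were appended to the NOUN_CODE value in this sketch, both tokens would be kept, which mirrors the effect of the configuration change described in this topic.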

To include such a token in the search, add its POS tag to one of the code lists by using a PATCH call to the /configuration REST endpoint. The following example adds UH to NOUN_CODE.

PATCH http://dataQueryHost:dataQueryPort/search/resources/api/v2/configuration?nodeName=component&envType=auth

Note: The first time that you add this or any configuration to the component node, use the POST request method. In subsequent calls, use PATCH.

Use the following JSON code as the body of the request.

{
    "extendedconfiguration": {
        "configgrouping": [
            {
                "name": "NLPPOSCodes",
                "property": [
                    {
                        "name": "NOUN_CODE",
                        "value": "NN,NNS,NNS,NNPS,NOUN,NE,UH"
                    }
                ]
            }
        ]
    }
}

Note: Restart the Query service after making this change.
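
The request described in this topic could be prepared from a script along the following lines. This is a minimal sketch that uses only the Python standard library. The helper name build_pos_request, the localhost host and port 8080, and the Content-Type header are assumptions for illustration; substitute your own dataQueryHost and dataQueryPort values and add whatever authentication your environment requires.

```python
import json
import urllib.request


def build_pos_request(host, port, noun_codes, first_time=False):
    """Build (but do not send) the configuration request (hypothetical helper)."""
    url = (
        f"http://{host}:{port}/search/resources/api/v2/configuration"
        "?nodeName=component&envType=auth"
    )
    body = {
        "extendedconfiguration": {
            "configgrouping": [
                {
                    "name": "NLPPOSCodes",
                    "property": [
                        {"name": "NOUN_CODE", "value": ",".join(noun_codes)}
                    ],
                }
            ]
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # Assumed header; the topic does not state which headers are required.
        headers={"Content-Type": "application/json"},
        # POST for the first configuration on the component node, PATCH afterward.
        method="POST" if first_time else "PATCH",
    )


# Default noun codes from this topic, plus the new UH tag.
noun_codes = ["NN", "NNS", "NNS", "NNPS", "NOUN", "NE", "UH"]
request = build_pos_request("localhost", 8080, noun_codes)
print(request.get_method(), request.full_url)

# To send the request, uncomment the next line in an environment where the
# Query service is reachable:
# urllib.request.urlopen(request)
```

Building the request separately from sending it makes it possible to inspect the method, URL, and body before anything is changed on the server.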