Ingest Stopword index pipeline

Stopword dictionaries are generated and applied to the product index through the NiFi pipeline.

Stopword Index Field Mapping From Data Specification

The following diagram illustrates the Stopword indexing pipeline implemented in Apache NiFi. The flow consists of three main stages:

Generate a Stopword dictionary document for Elasticsearch from the input Stopwords for each language.
(POST only) Extract the current stopwords from the product index dictionaries, and add them to the document generated in Stage 1.
Update the Product index's language-specific dictionaries with the Stopword document produced by Stage 1 and Stage 2.

Initial

PUT or POST REST Call: http://<Hostname>:30700/connectors/JsonStopword/data

{
    "stopwords": {
        "english": {
            "stopwords": ["step1", "car"]
        },
        "french": {
            "stopwords": ["step2", "dark"]
        }
    }
}
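
As a rough illustration of the call above, the following Python sketch builds the request without sending it. The host name `ingest.example.com`, the helper name `build_stopword_request`, and the `Content-Type` header are assumptions made for the example; only the port, path, HTTP methods, and body shape come from this page.

```python
import json
import urllib.request

# Hypothetical host; substitute the real <Hostname> of your environment.
INGEST_HOST = "ingest.example.com"


def build_stopword_request(stopwords_by_language, method="POST", host=INGEST_HOST):
    """Build (but do not send) the REST call to the JsonStopword connector."""
    if method not in ("PUT", "POST"):
        raise ValueError("method must be PUT or POST")
    # Wrap each language's word list in the body shape shown above.
    body = {
        "stopwords": {
            language: {"stopwords": list(words)}
            for language, words in stopwords_by_language.items()
        }
    }
    return urllib.request.Request(
        url=f"http://{host}:30700/connectors/JsonStopword/data",
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the connector expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method=method,
    )


request = build_stopword_request(
    {"english": ["step1", "car"], "french": ["step2", "dark"]}
)
# To actually send it: urllib.request.urlopen(request)
```

Passing `method="PUT"` builds the same body for the case where the stopwords already in the index should not be carried over.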

Stage 1. Generate Stopword dictionary document

The following dataflow describes how the language-specific Stopword data is transformed using the CreateStopwordBodyPart1 Groovy script.

Output:

{
    "analysis" : {
        "filter" : {
            "custom_english_stopwords_dictionary" : {
                "stopwords": ["step1", "car"],
                "type" : "stop"
            },
            "custom_french_stopwords_dictionary" : {
                "stopwords": ["step2", "dark"],
                "type" : "stop"
            }
        }
    }
}
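
The CreateStopwordBodyPart1 script itself is not reproduced on this page. The following Python sketch only mirrors the input and output shapes shown above, so it may differ from what the Groovy script actually does; the function name `build_stopword_filters` is hypothetical.

```python
def build_stopword_filters(payload):
    """Turn the per-language request body into an analysis settings document."""
    filters = {}
    for language, entry in payload["stopwords"].items():
        # One "stop" token filter per language, named as in the sample output.
        filters[f"custom_{language}_stopwords_dictionary"] = {
            "stopwords": list(entry["stopwords"]),
            "type": "stop",
        }
    return {"analysis": {"filter": filters}}


stage1_document = build_stopword_filters(
    {
        "stopwords": {
            "english": {"stopwords": ["step1", "car"]},
            "french": {"stopwords": ["step2", "dark"]},
        }
    }
)
```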

Stage 2. (IF POST) Extract current stopwords in the product index dictionary, and add them to the generated document

The following dataflow shows the steps that take place when the user makes a POST* request:

A GET call retrieves the current Stopword dictionaries per language from the product index.
The language-specific Stopword data is transformed using the CreateStopwordBodyPart2 Groovy script, which merges the current dictionaries with the document generated in Stage 1.

*If the user makes a PUT request instead, the current language-specific Stopword dictionaries in the index are not added to the document from Stage 1.

Stage 2 Output:

{
    "analysis" : {
        "filter" : {
            "custom_english_stopwords_dictionary" : {
                "stopwords": ["the", "step1", "car"],
                "type" : "stop"
            },
            "custom_french_stopwords_dictionary" : {
                "stopwords": ["step2", "dark"],
                "type" : "stop"
            }
        }
    }
}
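
The merge in this stage can be sketched in Python as follows. This is an illustration of the sample output rather than the CreateStopwordBodyPart2 script: the function name `merge_current_stopwords` is hypothetical, and the choices to list existing words first and to drop duplicates are assumptions inferred from the example.

```python
def merge_current_stopwords(generated, current_filters):
    """Add the stopwords already in the index to the Stage 1 document."""
    merged_filters = {}
    for name, new_filter in generated["analysis"]["filter"].items():
        current_words = current_filters.get(name, {}).get("stopwords", [])
        # dict.fromkeys keeps first-seen order while dropping duplicates.
        words = list(dict.fromkeys([*current_words, *new_filter["stopwords"]]))
        merged_filters[name] = {**new_filter, "stopwords": words}
    return {"analysis": {"filter": merged_filters}}


# Example: the index currently holds "the" in its English dictionary only.
stage2_document = merge_current_stopwords(
    {
        "analysis": {
            "filter": {
                "custom_english_stopwords_dictionary": {
                    "stopwords": ["step1", "car"],
                    "type": "stop",
                },
                "custom_french_stopwords_dictionary": {
                    "stopwords": ["step2", "dark"],
                    "type": "stop",
                },
            }
        }
    },
    {"custom_english_stopwords_dictionary": {"stopwords": ["the"], "type": "stop"}},
)
```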

Stage 3. Update the Product's language specific dictionaries with the stopword document generated.

The following dataflow describes the process of updating (overwriting) the language-specific dictionaries with the previously generated document through the following steps:

Close Product index
Update Product index
Open Product index
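
The close, update, and open sequence above corresponds to three standard Elasticsearch index API calls, since analysis settings can only be changed while an index is closed. The Python sketch below only lists those calls in order and does not send them. The index name `product_index` and the helper name `plan_dictionary_update` are hypothetical, and the exact requests issued by the NiFi flow may differ.

```python
import json


def plan_dictionary_update(index_name, settings_document):
    """Return the ordered (method, path, body) calls for a settings update."""
    return [
        # 1. Close the index so its analysis settings can be changed.
        ("POST", f"/{index_name}/_close", None),
        # 2. Overwrite the stopword filters with the generated document.
        ("PUT", f"/{index_name}/_settings", json.dumps(settings_document)),
        # 3. Reopen the index so it can serve requests again.
        ("POST", f"/{index_name}/_open", None),
    ]


steps = plan_dictionary_update(
    "product_index",  # hypothetical index name
    {
        "analysis": {
            "filter": {
                "custom_english_stopwords_dictionary": {
                    "stopwords": ["the", "step1", "car"],
                    "type": "stop",
                }
            }
        }
    },
)
for method, path, body in steps:
    print(method, path)
```

Returning the calls as data keeps the ordering explicit and lets any HTTP client send them against the Elasticsearch host.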