Ingest Stopword index pipeline
Stopwords are easily generated using the NiFi pipeline.
Stopword Index Field Mapping From Data Specification
The following diagram illustrates the Stopword indexing pipeline implemented in
Apache NiFi. The flow consists mainly of three stages:
- Generate a Stopword dictionary document for Elasticsearch from the input stopwords for each language.
- (IF POST) Extract the current stopwords from the product index dictionary, and add them to the document generated in stage one.
- Update the Product's language-specific dictionaries with the Stopword document generated in stages one and two.
- Initial
- PUT or POST REST Call:
http://<Hostname>:30700/connectors/JsonStopword/data
{ "stopwords": { "english": { "stopwords": ["step1", "car"] }, "french": { "stopwords": ["step2", "dark"] } } }
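As a minimal sketch, the call above could be prepared from Python with the standard library. The port, path, and payload shape come from the example above. The hostname value, the helper name `build_stopword_request`, and the `Content-Type` header are assumptions made for illustration. The request is only constructed here; sending it (for instance with `urllib.request.urlopen`) is left to the caller.

```python
import json
import urllib.request


def build_stopword_request(hostname, stopwords_by_language, method="PUT"):
    """Build (but do not send) a request to the JsonStopword connector.

    hostname is a placeholder for the Ingest host; method is PUT or POST.
    """
    if method not in ("PUT", "POST"):
        raise ValueError("method must be PUT or POST")
    # Wrap each language's word list in the nested shape shown in the sample body.
    payload = {
        "stopwords": {
            language: {"stopwords": list(words)}
            for language, words in stopwords_by_language.items()
        }
    }
    return urllib.request.Request(
        url=f"http://{hostname}:30700/connectors/JsonStopword/data",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method=method,
    )


request = build_stopword_request(
    "ingest.example.com",  # hypothetical hostname
    {"english": ["step1", "car"], "french": ["step2", "dark"]},
)
print(request.get_method(), request.full_url)
```

Passing `method="POST"` instead produces the variant that, per the stages listed above, also carries over the stopwords already in the product index dictionary.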
- Stage 1. Generate Stopword dictionary document
- The following dataflow describes how the language-specific Stopword data is transformed using the CreateStopwordBodyPart1 Groovy script.
- Stage 2. (IF POST) Extract current stopwords in the product index dictionary, and add them to the generated document
- The following dataflow shows that when the user makes a POST request, the following steps take place:
- Stage 3. Update the Product's language-specific dictionaries with the generated Stopword document.
- The following dataflow describes the process of updating (overwriting) the
language-specific dictionary with the previously generated document
through the following steps:
- Close Product index
- Update Product index
- Open Product index
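
The three stages can be sketched in Python as follows. This is an illustration under stated assumptions, not the logic of the CreateStopwordBodyPart1 script or of the NiFi flow itself. The function names, the `<language>_stop` filter names, and the index name `product-index` are hypothetical. The `stop` token filter and the `_close`, `_settings`, and `_open` endpoints are standard Elasticsearch features. The sketch only plans the calls and does not contact a cluster.

```python
def merge_stopwords(current, incoming):
    """Stage 2 (POST only): keep the current words, append incoming ones not yet present."""
    merged = list(current)
    for word in incoming:
        if word not in merged:
            merged.append(word)
    return merged


def build_settings_body(stopwords_by_language):
    """Stage 1 and 2 output: one Elasticsearch stop filter per language.

    The "<language>_stop" filter names are hypothetical.
    """
    return {
        "analysis": {
            "filter": {
                f"{language}_stop": {"type": "stop", "stopwords": words}
                for language, words in stopwords_by_language.items()
            }
        }
    }


def plan_dictionary_update(index_name, settings_body):
    """Stage 3: close the index, overwrite its settings, then reopen it."""
    return [
        ("POST", f"/{index_name}/_close", None),
        ("PUT", f"/{index_name}/_settings", settings_body),
        ("POST", f"/{index_name}/_open", None),
    ]


# Example: a POST request adds "step1" and "car" to an existing English list.
english = merge_stopwords(["the", "car"], ["step1", "car"])
steps = plan_dictionary_update(
    "product-index",  # hypothetical index name
    build_settings_body({"english": english}),
)
for method, path, _body in steps:
    print(method, path)
```

The index is closed before the update because Elasticsearch only accepts changes to analysis settings, such as stop filters, on a closed index. For a PUT request the merge step would be skipped, so the incoming list replaces the existing one.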