Common components of NiFi process groups
A common set of NiFi processors is used to perform various processing tasks within HCL Commerce NiFi process groups.
Process groups simplify complex data flows by allowing you to group components, such as processors, inside their own embedded canvas in the NiFi user interface. HCL Commerce Search comes with a set of default components that are commonly used in Ingest process groups. These processors are described below, as well as the most common set of NiFi-provided processors that are used.
For more information about process groups, see Anatomy of a Process Group in the Apache NiFi documentation.
HCL-supplied processors
- ComposeDatabaseSQL
- Typically used before ExecuteSQL. Its purpose is to define the SQL statement for use with ExecuteSQL, as well as to act as a user exit for an optional Ingest profile to perform additional modification to the given SQL before sending to ExecuteSQL.
- AnalyzeExecuteSQLRecordResponse
- Typically used after ExecuteSQL for analyzing the
database query response. It has two properties: Relationship Type, and
Refresh Index.
- Relationship Type: Relationship type defines whether the incoming connection is in a state of "success" or "failure" - there is dedicated logic within this processor to classify whether the response is of a real failure, success, or empty.
- Refresh Index: Refresh Index is an optional function which allows the Elasticsearch index to perform a refresh operation immediately after processing each database page.
- RouteOnCatalog
- Used only at the junction of the main processing flow at each flow
stage, to determine how many additional flow files should be sent to the
side flow. A "side flow" in NiFi is an alternative, optional processing
flow in an ingestion pipeline that is used with Ingest profiles for
performing custom ETL tasks. This processor uses three properties to
control the side flows, which are based on Master
Catalog, Default Catalog and
Other Catalogs.
For more information, see Customizing Ingest profiles.
- FilterOnCatalog
- Used only at the junction of the main processing flow at each flow stage, to ensure that flow files with desired catalog properties are sent to the side flow. This processor uses three properties to control what can and cannot be routed to side flows: Master Catalog, Default Catalog, and Other Catalogs.
- RouteOnLanguage
- Used only at the junction of the main processing flow at each flow stage, to determine how many additional flow files should be sent to the side flow. This processor uses two properties to control the side flows, which are based on Default Language, and Other Supported Languages.
- FilterOnLanguage
- Used only at the junction of the main processing flow, at each flow stage, to rensure that flow files with desired language properties are sent to the side flow. This processor uses two properties to control what can and cannot be routed to the side flows: Default Language, and Other Supported Languages..
- TrackBulkRequest
- Used only at the beginning, immediately after entering any of the Bulk Services. TrackBulkRequest registers additional metadata against each incoming flowfile,to track its state and total time spent within this Bulk Service. The processor has a property, Dataflow Rate Control, which can be used to enable or disable rate control against the incoming dataflow. Rate Control can be used to slow down dataflow to the specified rate to avoid overloading Elasticsearch. In addition, this processor also acts as a user exit for an optional Ingest profile to perform additional customization to the incoming dataflow.
- AnalyzeBulkResponse
- Used only at the end of a Bulk Service. Its main uses are to analyze the Elasticsearch Bulk response for error determination, and to act as a user exit for an optional Ingest profile to perform additional post-processing customization to the dataflow. This processor also detects the last flowfile of a stage and sends a release signal to the corresponding Wait Link of that stage in the main flow, to allow it continue to the next stage.
- ScrollElasticsearch
- Scroll through a given Elasticsearch result set.
- ComposeIndexSchema
- Calling a given ingest profile (if defined) to customize an existing index schema for Elasticsearch.
- SerializeDocument
- Find all (two dimensional) records serially and convert in (single dimension) format to be processed by downstream custom processor.
- MapIndexFieldsFromDatabase
- Mapping of custom database table columns into the corresponding index schema fields for ingest operation.
- PublishEvent
- Publish the current flowfile content as an event to HCL Cache.
- SubscribeEvent
- Subscribe events generated from HCL Cache.
- UpdateDocumentCounter
- Increase or decrease a given HCL Cache counter with the provided delta value. This processor is mainly used with event counters for tracking dataflows inside of NiFi.
- TrackDocument
- Register meta data to be used for tracking the dataflow in the current
ingest stage, such as
Product Stage 1a - Create Product Documents
. - RetryDocument
- Retry selected portion of a given bulk request flowfile in its waiting queue.
NiFi-supplied processors
- ExecuteSQL
- Executes the provided SQL statement. For more information ,see ExecuteSQL in the Apache NiFi documentation.
- ControlRate
- Controls the rate at which data is transferred to follow-on processors. For more information, see ControlRate in the Apache NiFi documentation.
- InvokeHTTP
- Mainly used to interact with a configurable Elasticsearch HTTP endpoint. For more information, see InvokeHTTP in the Apache NiFi documentation.
- RetryFlowFile
- Mainly used, along with the default RetryDocument processor, to perform rule-based retry operations. For more information, see RetryFlowFile in the Apache NiFi documentation.