
Tuning the Ingest service
NiFi can generate large amounts of data during the Ingest process. Use the formulas provided in the following topics to tune your Ingest processes so that NiFi supplies data no faster than Elasticsearch can consume it.
Why tune NiFi
NiFi can ingest information faster than Elasticsearch can index it. It is therefore possible for NiFi to overwhelm Elasticsearch with data, which can result in performance degradation. If you do not have separate Auth and Live environments, your production search experience may be affected.
The solution is NiFi tuning. If the rate at which NiFi ingests data is less than or equal to the rate at which Elasticsearch consumes it, then there is no performance degradation. This is a straightforward solution in theory. However, each HCL Commerce Search environment is different, so tuning has to be done individually for each implementation. To help you do this, HCL Commerce provides the following guidelines, methods, and parameter settings.
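The rate condition above can be expressed as a small check. The following is a minimal sketch, not part of HCL Commerce or NiFi: the function names are hypothetical, and it assumes that you have already measured both rates in the same unit (documents per second is used here for illustration).

```python
def ingest_is_sustainable(nifi_docs_per_sec: float, es_docs_per_sec: float) -> bool:
    """Return True when NiFi supplies data no faster than Elasticsearch consumes it."""
    return nifi_docs_per_sec <= es_docs_per_sec


def throttle_factor(nifi_docs_per_sec: float, es_docs_per_sec: float) -> float:
    """Fraction of the current NiFi rate that Elasticsearch can absorb.

    A value of 1.0 means that no slowdown is needed. A value of 0.5 means
    that NiFi would have to run at half its current rate.
    """
    if nifi_docs_per_sec <= 0:
        return 1.0  # nothing is being ingested, so nothing to throttle
    return min(1.0, es_docs_per_sec / nifi_docs_per_sec)


if __name__ == "__main__":
    # Illustrative numbers only; measure your own environment.
    print(ingest_is_sustainable(800.0, 1000.0))
    print(throttle_factor(2000.0, 1000.0))
```

A factor below 1.0 indicates how far the NiFi side is over the Elasticsearch side. How that reduction is distributed across the individual tuning parameters depends on your environment.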
General approach
Tuning NiFi to match your Elasticsearch throughput involves adding configuration options for the NiFi tuning parameters. These configuration points are described in the following topics. A method for calculating appropriate values for the tuning parameters is also provided, so that you know both what to tune and how to tune it.
- Tuning:
  - By adding additional upgrade-friendly configuration for NiFi tuning parameters.
  - By publicly documenting these configuration points.
  - By privately documenting a method for calculating sane values for these tuning parameters.
- Automation:
  - By adding an endpoint to the Ingest service so that it can analyze the historical ingest data and calculate the new tuning values.
  - By adding an endpoint to the Ingest service so that it can assign the new tuning values to the appropriate ingest pipelines.
The automation phase includes adding an endpoint to the Ingest service. This endpoint will analyze historical ingest data and calculate new tuning values. Another endpoint will assign these new tuning values to the relevant ingest pipelines. This automation will streamline the process and ensure accurate tuning based on actual data.
Summary of the tuning process
The Ingest dataflow consists of multiple business processing stages, linked together one after another. Each stage is a stream of data moving from one location (the database) to another (Elasticsearch). Each dataflow involves three main ETL operations: Extracting, Transforming, and Loading.
Each operation can be controlled by the following tuning parameters:
- Extracting uses page size and bucket size to determine the size of each payload in the data stream.
- Transforming can be spread across multiple concurrent threads.
- Loading rate is determined by the size of the request and the concurrent number of threads sending to Elasticsearch.
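To make the relationship between these parameters concrete, the sketch below groups them into one structure per stage. It is an illustration only, under stated assumptions: the class, field, and function names are hypothetical and are not NiFi or HCL Commerce property names, and the request size is assumed to be counted in documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageTuning:
    """Hypothetical container for the tuning parameters of one Ingest stage."""

    page_size: int          # Extracting: page size (exact semantics per product docs)
    bucket_size: int        # Extracting: bucket size (exact semantics per product docs)
    transform_threads: int  # Transforming: number of concurrent threads
    request_docs: int       # Loading: documents per request (assumed unit)
    load_threads: int       # Loading: concurrent threads sending to Elasticsearch

    def __post_init__(self) -> None:
        # Reject zero or negative values, which make no sense for any parameter.
        for name, value in vars(self).items():
            if value <= 0:
                raise ValueError(f"{name} must be positive, got {value}")


def max_docs_in_flight(tuning: StageTuning) -> int:
    """Upper bound on documents being sent to Elasticsearch at one moment.

    This follows from the loading rule: request size multiplied by the
    number of concurrent sending threads.
    """
    return tuning.request_docs * tuning.load_threads


if __name__ == "__main__":
    # Illustrative values only, not recommended settings.
    stage = StageTuning(
        page_size=1000,
        bucket_size=500,
        transform_threads=2,
        request_docs=1000,
        load_threads=4,
    )
    print(max_docs_in_flight(stage))
```

Lowering either the request size or the number of loading threads lowers this bound, which is the load that Elasticsearch has to absorb at any one time.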