Tuning the Ingest service

NiFi can generate large amounts of data during the Ingest process. Use the formulas provided in the following topics to tune your Ingest processes so that NiFi provides data at the same rate as Elasticsearch is able to consume it.

Why tune NiFi

NiFi can ingest information faster than Elasticsearch can index it, so NiFi can overwhelm Elasticsearch with data and degrade its performance. If you do not have separate Auth and Live environments, this degradation can affect your production search experience.

The solution is to tune NiFi. If the rate at which NiFi ingests data is less than or equal to the rate at which Elasticsearch consumes it, there is no performance degradation. This solution is straightforward in theory. However, every HCL Commerce Search environment is different, so tuning must be done individually for each implementation. To help you do this, HCL Commerce provides the following guidelines, methods, and parameter settings.
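The balance condition above amounts to a rate comparison. The sketch below uses a simple fluid model, which is a hypothetical illustration and not an HCL Commerce formula. Both rates are in documents per second, and any surplus of the NiFi ingest rate over the Elasticsearch indexing rate builds up as backlog. The function names are invented for this example.

```python
def backlog_growth(ingest_docs_per_sec: float, index_docs_per_sec: float, seconds: float) -> float:
    """Documents left waiting for Elasticsearch after `seconds` of steady ingest.

    Hypothetical fluid model: only the surplus of ingest rate over index rate accumulates.
    """
    surplus = float(ingest_docs_per_sec - index_docs_per_sec)
    return max(0.0, surplus) * seconds


def is_sustainable(ingest_docs_per_sec: float, index_docs_per_sec: float) -> bool:
    """True when NiFi sends data no faster than Elasticsearch can index it."""
    return ingest_docs_per_sec <= index_docs_per_sec


# NiFi pushing 1,000 docs/s into a cluster that indexes 800 docs/s, for one minute:
print(backlog_growth(1000, 800, 60))  # 12000.0
print(is_sustainable(1000, 800))      # False
```

In practice, Elasticsearch slows down as the backlog grows, so the real effect is usually worse than this linear estimate.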

General approach

Tuning NiFi to match your Elasticsearch throughput involves setting additional configuration options for the NiFi tuning parameters. The following topics describe these configuration points, together with a method for calculating appropriate values for them, so that you know both what to tune and how to tune it.

It is useful to break the tuning process into two distinct steps:

Tuning:
- Adding upgrade-friendly configuration for the NiFi tuning parameters.
- Publicly documenting these configuration points.
- Privately documenting a method for calculating sensible values for these tuning parameters.
Automation:
- Adding an endpoint to the Ingest service that analyzes historical ingest data and calculates new tuning values.
- Adding an endpoint to the Ingest service that assigns the new tuning values to the appropriate ingest pipelines.

Automating the analysis and assignment steps streamlines the tuning process and bases the new tuning values on actual historical ingest data.
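This topic does not describe how these endpoints calculate tuning values. As a purely hypothetical illustration of the analyze-then-assign idea, the sketch below first estimates a per-thread loading rate for each stage from past runs. It then picks the largest thread count whose estimated rate stays at or below a target Elasticsearch indexing rate. The function names, the record layout, the `load_threads` key, and the assumption that throughput scales linearly with threads are all invented for this example.

```python
import math
from statistics import median


def recommend_load_threads(history: dict[str, list[tuple[int, float, int]]],
                           target_docs_per_sec: float) -> dict[str, int]:
    """Hypothetical 'analyze' step.

    history maps stage name -> past runs as (docs_indexed, elapsed_seconds, load_threads).
    Assumes throughput grows linearly with the number of load threads.
    """
    recommended = {}
    for stage, runs in history.items():
        # Median per-thread rate across runs, to damp one-off slow or fast runs.
        per_thread = median(docs / secs / threads for docs, secs, threads in runs)
        # Largest thread count whose estimated rate stays within the target, but never below 1.
        recommended[stage] = max(1, math.floor(target_docs_per_sec / per_thread))
    return recommended


def assign(pipeline_config: dict[str, dict], recommended: dict[str, int]) -> dict[str, dict]:
    """Hypothetical 'assign' step: return a copy of the config with new thread counts applied."""
    updated = {stage: dict(cfg) for stage, cfg in pipeline_config.items()}
    for stage, threads in recommended.items():
        updated.setdefault(stage, {})["load_threads"] = threads
    return updated


history = {"product": [(60000, 60.0, 2), (90000, 60.0, 3)]}
new_values = recommend_load_threads(history, target_docs_per_sec=1600)
print(new_values)                                            # {'product': 3}
print(assign({"product": {"load_threads": 2}}, new_values))  # {'product': {'load_threads': 3}}
```

Returning a copy in `assign` keeps the previous configuration available, so a bad recommendation can be rolled back.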

Summary of the tuning process

The Ingest dataflow consists of multiple business processing stages linked one after another. Each stage is a stream of data moving from one location (the database) to another (Elasticsearch). The dataflow of each stage involves three main ETL operations: Extracting, Transforming, and Loading.

Each operation can be controlled by the following tuning parameters:
- Extracting uses page size and bucket size to determine the size of each payload in the data stream.
- Transforming can be spread across multiple concurrent threads.
- Loading rate is determined by the size of each request and the number of concurrent threads sending requests to Elasticsearch.
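One way to reason about these parameters is to treat each stage as limited by its slowest operation. The sketch below assumes that you have measured the rate of each operation. It also assumes that transform and load throughput scale linearly with thread count. Real scaling flattens out, so treat the results as estimates. The field names are illustrative and are not HCL Commerce configuration properties.

```python
from dataclasses import dataclass


@dataclass
class StageRates:
    """Hypothetical measurements for one Ingest stage (not HCL Commerce property names)."""
    extract_docs_per_sec: float       # observed rate at the chosen page size and bucket size
    transform_docs_per_thread: float  # observed transform rate of a single thread
    transform_threads: int
    docs_per_bulk_request: int        # size of each request sent to Elasticsearch
    seconds_per_bulk_request: float   # observed Elasticsearch response time per request
    load_threads: int                 # concurrent threads sending to Elasticsearch


def operation_rates(s: StageRates) -> dict[str, float]:
    """Docs/sec each ETL operation could sustain on its own, assuming linear thread scaling."""
    return {
        "extract": s.extract_docs_per_sec,
        "transform": s.transform_docs_per_thread * s.transform_threads,
        "load": s.load_threads * s.docs_per_bulk_request / s.seconds_per_bulk_request,
    }


def bottleneck(s: StageRates) -> tuple[str, float]:
    """Return the slowest operation, which limits the throughput of the whole stage."""
    rates = operation_rates(s)
    name = min(rates, key=rates.get)
    return name, rates[name]


stage = StageRates(2000, 300, 4, 500, 0.5, 2)
print(bottleneck(stage))  # ('transform', 1200)
```

Under this model, raising the loading concurrency of that stage would not help. Only more transform threads, up to the next-slowest operation, would raise its throughput.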

The goal of tuning the Ingest dataflow is to obtain environment-specific settings, optimized for each stage, that give the shortest overall Ingest elapsed time. To obtain more reliable results, try to satisfy the following assumptions when you tune: use the heaviest ingest run, such as the re-index connector, for tuning estimation, and allow only one exclusive re-indexing operation to run in NiFi at any time.
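The sketch below shows how such a goal might be evaluated from trial runs. It assumes, hypothetically, that you recorded the elapsed time of each stage under several candidate settings, each measured during an exclusive re-index run so that the timings are comparable. It also assumes that the stages run one after another and do not affect each other. Under those assumptions, picking the fastest setting for each stage minimizes the overall elapsed time. The stage names, setting labels, and timings are illustrative only.

```python
def pick_settings(trials: dict[str, dict[str, float]]) -> tuple[dict[str, str], float]:
    """Choose, for each stage, the trial setting with the shortest elapsed time.

    trials maps stage name -> {setting label: elapsed seconds in a re-index trial run}.
    Assumes stages run sequentially and independently, so the overall elapsed time
    is the sum of the per-stage minimums.
    """
    best: dict[str, str] = {}
    total = 0.0
    for stage, results in trials.items():
        label = min(results, key=results.get)
        best[stage] = label
        total += results[label]
    return best, total


# Hypothetical timings (seconds) from three trial settings per stage.
trials = {
    "stage 1": {"page size 500": 420.0, "page size 1000": 310.0, "page size 2000": 335.0},
    "stage 2": {"2 threads": 610.0, "4 threads": 380.0, "8 threads": 395.0},
}
print(pick_settings(trials))
# ({'stage 1': 'page size 1000', 'stage 2': '4 threads'}, 690.0)
```

If stages overlap or compete for the same Elasticsearch cluster, the per-stage minimums no longer add up so simply. In that case, re-measure the full run with the chosen settings.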