HCL Commerce Version 9.1.15.0 or later

Tuning the Ingest service

NiFi can generate large amounts of data during the Ingest process. Use the formulas provided in the following topics to tune your Ingest processes so that NiFi delivers data no faster than Elasticsearch is able to consume it.

Why tune NiFi

NiFi can ingest information faster than Elasticsearch can index it. It is therefore possible for NiFi to overwhelm Elasticsearch with data, which can result in performance degradation. If you do not have separate Auth and Live environments, your production search experience may be affected.

The solution is NiFi tuning. If the rate at which NiFi ingests data is less than or equal to the rate at which Elasticsearch consumes it, then there is no performance degradation. This solution is straightforward in theory. However, each HCL Commerce Search environment is different, so tuning has to be done individually for each implementation. To help you do this, HCL Commerce provides the following guidelines, methods, and parameter settings.
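The condition above can be checked with a few lines of arithmetic. The following is a minimal sketch, not an HCL Commerce formula: the function names and the document counts are hypothetical, and it assumes you can observe how many documents NiFi emitted and how many Elasticsearch indexed over the same time window.

```python
def rate(docs: int, seconds: float) -> float:
    """Return documents per second over an observation window."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return docs / seconds


def nifi_overwhelms_es(nifi_docs_per_s: float, es_docs_per_s: float) -> bool:
    """True when NiFi produces data faster than Elasticsearch consumes it."""
    return nifi_docs_per_s > es_docs_per_s


# Hypothetical observations taken over the same 60-second window.
nifi_rate = rate(docs=120_000, seconds=60)  # documents emitted by NiFi
es_rate = rate(docs=90_000, seconds=60)     # documents indexed by Elasticsearch

if nifi_overwhelms_es(nifi_rate, es_rate):
    print(f"Throttle NiFi: {nifi_rate:.0f} docs/s exceeds {es_rate:.0f} docs/s")
else:
    print("NiFi output is within Elasticsearch capacity")
```

With these example figures NiFi is producing faster than Elasticsearch can index, which is the situation that tuning is meant to remove.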

General approach

Tuning NiFi to match your Elasticsearch throughput involves adding configuration options for the NiFi tuning parameters. These configuration points are described in the following documents, together with a method for calculating appropriate values for them, so that you know both what to tune and how to tune it.

It is useful to break the tuning approach down into the following parts:
  • Tuning:
    • Adding upgrade-friendly configuration for the NiFi tuning parameters.
    • Publicly documenting these configuration points.
    • Privately documenting a method for calculating sensible values for these tuning parameters.

Summary of the tuning process

The Ingest dataflow consists of multiple business processing stages, linked together one after another. Each stage is a stream of data moving from one location (the database) to another (Elasticsearch). Each dataflow involves three main ETL operations: Extracting, Transforming, and Loading.

Each operation can be controlled by the following tuning parameters:
  • Extracting uses page size and bucket size to determine the size of each payload in the data stream.
  • Transforming can be spread across multiple concurrent threads.
  • Loading rate is determined by the size of the request and the number of concurrent threads sending to Elasticsearch.
The goal of tuning the Ingest dataflow is to obtain environment-specific settings, optimized for each stage, that give the lowest overall Ingest elapsed time. For more reliable results, try to satisfy the following assumptions when tuning: use the heaviest Ingest run, such as the re-index connector, for tuning estimation, and allow only one re-indexing operation to run in NiFi at any time.
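To illustrate how request size and thread count combine into a loading rate, the following sketch uses a simple back-of-envelope model. It is an assumption for illustration only, not the calculation method that HCL Commerce documents: the class, field, and function names are hypothetical, and the model assumes that every thread sends requests back to back with a constant average round-trip time measured during a heavy run such as a re-index.

```python
import math
from dataclasses import dataclass


@dataclass
class LoadSettings:
    """Hypothetical description of the Loading stage, for illustration only."""
    docs_per_request: int        # documents carried by each request
    threads: int                 # concurrent threads sending to Elasticsearch
    seconds_per_request: float   # measured average round-trip time per request

    def docs_per_second(self) -> float:
        # Assumed model: each thread completes one request per round trip.
        return self.threads * self.docs_per_request / self.seconds_per_request


def max_threads(docs_per_request: int,
                seconds_per_request: float,
                es_docs_per_second: float) -> int:
    """Largest thread count whose estimated rate stays within the
    Elasticsearch indexing rate (never less than one thread)."""
    per_thread = docs_per_request / seconds_per_request
    return max(1, math.floor(es_docs_per_second / per_thread))


# Hypothetical figures: 500 documents per request, 0.5 s per request.
current = LoadSettings(docs_per_request=500, threads=4, seconds_per_request=0.5)
print(current.docs_per_second())    # estimated load rate with 4 threads
print(max_threads(500, 0.5, 3500))  # threads that fit a 3500 docs/s budget
```

Under this model, four threads would send an estimated 4000 documents per second, which exceeds a measured Elasticsearch rate of 3500 documents per second, so the thread count would be reduced to three. Real throughput rarely scales this linearly, so treat the result as a starting point to verify by measurement.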