Troubleshooting: Request Level Trace in NiFi dataflow
Request Level Tracing (RLT) enables you to track individual data entities (identified by unique IDs) as they move through the complete ingest dataflow in NiFi. This allows you to monitor each processing stage, identify performance bottlenecks, and troubleshoot failures or data loss with high precision. This guide explains how to enable request level tracing, configure trace specifications, interpret trace outputs, and understand the data lifecycle within NiFi's ingest pipeline.
Purpose
- Track specific identifiers (IDs) through the entire ingest process.
- Understand how and where each piece of data is handled or transformed.
- Detect and investigate failures or missing data.
- Debug issues related to data ingestion and transformation.
How to Enable Request Level Trace
- Option 1: Via Query Parameter (Request level)
- Pass a 'trace' query parameter in your ingest request with the trace specification string:
trace="catentryId:1,2,3, 431[0-9]* ; attrId:1, 2,3 ; stageId: product-1a, product-1e"
- Option 2: Via Ingest Configuration (Site-Wide)
- Set the 'flow.trace.spec' property in the Ingest Configuration to apply tracing to all requests:
flow.trace.spec="catentryId:1,2,3,431[0-9]* ; attrId:1,2,3 ; stageId:product-1a,product-1e"
The example above combines two field conditions and one stage condition. Each condition has the format "field : comma-separated list of IDs", where the field is one of the supported field names listed below and each ID is either a literal value or a regular expression pattern. To specify more than one condition, separate the conditions with a semicolon (";").
With this configuration, all incoming data that matches these criteria is traced through the dataflow. To gather the traces from NiFi with Must-Gather, refer to Using the Must-Gather application.
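The condition format described above can be sketched as a small parser. This is only an illustration of the syntax, not the ingest service's own implementation: the names parse_trace_spec and matches are hypothetical, and the choice to match a pattern against the whole ID (re.fullmatch) is an assumption, since the exact matching rule is not stated here.

```python
import re


def parse_trace_spec(spec: str) -> dict[str, list[str]]:
    """Split a trace spec into {field: [id_or_pattern, ...]}."""
    conditions: dict[str, list[str]] = {}
    for condition in spec.split(";"):
        condition = condition.strip()
        if not condition:
            continue  # tolerate a trailing semicolon
        field, sep, ids = condition.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in condition {condition!r}")
        # Whitespace around fields and IDs is ignored, as in the Option 1 example.
        conditions[field.strip()] = [i.strip() for i in ids.split(",") if i.strip()]
    return conditions


def matches(conditions: dict[str, list[str]], field: str, value: str) -> bool:
    """Return True if value matches any ID or pattern listed for field.

    Assumption: a pattern must match the entire value (re.fullmatch).
    """
    return any(re.fullmatch(p, value) for p in conditions.get(field, []))


spec = "catentryId:1,2,3, 431[0-9]* ; attrId:1, 2,3 ; stageId: product-1a, product-1e"
conditions = parse_trace_spec(spec)
print(conditions["catentryId"])
print(matches(conditions, "catentryId", "43105"))
```

Because this sketch splits on commas, it cannot handle a pattern that itself contains a comma, such as a `{2,4}` quantifier. When the spec is passed as a query parameter (Option 1), characters such as ";", "[" and spaces generally need percent-encoding; in Python, urllib.parse.quote can do that.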
Supported Field Names in Trace Spec
- attrId
- catentryId
- catgroupId
- pageId
- objectId
The stageId condition accepts the following stage IDs:
- Attribute Stages: attribute-1a, attribute-1b, attribute-1c
- Catalog Stages: catalog-1a, catalog-1b
- Category Stages: category-1a, category-1b, category-1c, category-1d, category-1e
- Inventory Stages: inventory-1a, inventory-1b
- Page Stages: page-1
- Price Stages: price-1a, price-1b, price-1c
- Product Stages: product-1a, product-1b, product-1c, product-1e, product-1g, product-1h, product-1i
- Store Stages: store-1
- URL Stages: url-1a, url-1b, url-1c, url-1d, url-1e, url-1f
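Some suffixes are skipped in the list above (for example, there is no product-1d), so a mistyped stage ID is easy to miss. The following sketch checks stageId values against that list before a spec is used. It treats stage IDs as literal names; whether stageId also accepts regular expressions is not stated here, so that is an assumption. SUPPORTED_STAGES and unknown_stage_ids are illustrative names, not part of NiFi or the ingest service.

```python
# Stage IDs copied from the list above, grouped by entity.
SUPPORTED_STAGES = {
    "attribute": ["attribute-1a", "attribute-1b", "attribute-1c"],
    "catalog": ["catalog-1a", "catalog-1b"],
    "category": ["category-1a", "category-1b", "category-1c",
                 "category-1d", "category-1e"],
    "inventory": ["inventory-1a", "inventory-1b"],
    "page": ["page-1"],
    "price": ["price-1a", "price-1b", "price-1c"],
    "product": ["product-1a", "product-1b", "product-1c", "product-1e",
                "product-1g", "product-1h", "product-1i"],
    "store": ["store-1"],
    "url": ["url-1a", "url-1b", "url-1c", "url-1d", "url-1e", "url-1f"],
}

ALL_STAGES = {stage for stages in SUPPORTED_STAGES.values() for stage in stages}


def unknown_stage_ids(stage_ids: list[str]) -> list[str]:
    """Return the stage IDs that are not in the supported list (literal match)."""
    return [s for s in stage_ids if s not in ALL_STAGES]


print(unknown_stage_ids(["product-1a", "product-1d", "url-1f"]))
```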
flow.trace.setting = output=[file|index]
For example:
- 'file' writes trace logs to nifi-app.log.
- 'index' pushes trace logs to a searchable index (for example, Elasticsearch).
When logging to files, ensure your environment has a sufficient number of log history files to prevent old trace entries from being overwritten.
Log Structure and Lifecycle Keywords
The Request Level Trace (RLT) log uses the following keywords, each of which marks a phase of the ingest dataflow in NiFi:
| Keyword | Description |
|---|---|
| EXTRACT | When the ID is extracted from the SQL database |
| TRANSFORM | When the data is processed inside Java processors |
| LOAD | When the data enters the Bulk Service before going to Elasticsearch |
| ANALYZE | After receiving a response from Elasticsearch |
| RETRY | When the data enters the retry queue |
| SUCCESS | Processing succeeded; the flowfile is dropped |
| FAILURE | An error occurred; the flowfile is dropped and data may have been lost |
Understanding the Dataflow Lifecycle
- EXTRACT (from database) > TRANSFORM > LOAD / RETRY > Elasticsearch > ANALYZE > SUCCESS / FAILURE
- The trace log shows the matching ID in the EXTRACT phase and how it appears after the TRANSFORM phase. When the flowfile containing this ID enters the Bulk Service, the ID is displayed in the LOAD phase together with the matching bulk ID. You can then use this bulk ID to follow the rest of the data lifecycle as the data is loaded (ingested) into Elasticsearch. The lifecycle should end with either SUCCESS or FAILURE. If neither is found for a particular ID, data loss may have occurred.
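The lifecycle check described above can be sketched as a short script. The actual layout of RLT log lines is not shown in this guide, so the sample lines below are invented purely for illustration. The sketch assumes only that each trace line contains one lifecycle keyword and the ID or bulk ID as plain text. The names lifecycle and outcome are hypothetical helpers.

```python
import re

KEYWORDS = ("EXTRACT", "TRANSFORM", "LOAD", "ANALYZE", "RETRY", "SUCCESS", "FAILURE")
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b")


def lifecycle(lines, tokens):
    """Return the lifecycle keywords, in order, from lines mentioning any token.

    tokens holds the traced ID plus the bulk ID read from its LOAD line.
    """
    phases = []
    for line in lines:
        # Require non-word characters around the token so "1000" does not hit "10001".
        if not any(re.search(rf"(?<!\w){re.escape(t)}(?!\w)", line) for t in tokens):
            continue
        found = KEYWORD_RE.search(line)
        if found:
            phases.append(found.group(1))
    return phases


def outcome(phases):
    """Return SUCCESS or FAILURE if one was logged, otherwise INCOMPLETE."""
    terminals = [p for p in phases if p in ("SUCCESS", "FAILURE")]
    return terminals[-1] if terminals else "INCOMPLETE"


# Invented sample lines (assumed format, not real NiFi output).
sample = [
    "EXTRACT catentryId=10001",
    "TRANSFORM catentryId=10001",
    "LOAD catentryId=10001 bulkId=b-7",
    "ANALYZE bulkId=b-7",
    "SUCCESS bulkId=b-7",
    "EXTRACT catentryId=10002",
]

print(outcome(lifecycle(sample, ["10001", "b-7"])))
print(outcome(lifecycle(sample, ["10002"])))
```

An INCOMPLETE result corresponds to the case where neither SUCCESS nor FAILURE was found for the ID, which is the situation to investigate for possible data loss.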