Configuring a Distributor Node

Data Consumption Methods
  • From File: The node reads data from the path specified in the File Path setting.
  • From Map: The map executes and produces batches based on the defined input card.
  • Parameterization: To use a flow variable for the File Path, enclose the variable name in percent signs (for example, %my_data_file%).
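
The percent-sign substitution can be illustrated with a minimal Python sketch. It assumes that a variable name consists of letters, digits, and underscores, and that a reference to an undefined variable is an error. The function resolve_file_path and its behavior are hypothetical; they show the idea, not the product's implementation.

```python
import re

# Hypothetical helper: matches a run of word characters between two
# percent signs, e.g. %my_data_file%.
_VAR_PATTERN = re.compile(r"%(\w+)%")


def resolve_file_path(template: str, flow_vars: dict) -> str:
    """Replace each %name% token with the value of flow variable `name`.

    Assumption: an undefined variable raises KeyError instead of being
    left in the path unchanged.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in flow_vars:
            raise KeyError(f"undefined flow variable: {name}")
        return flow_vars[name]

    return _VAR_PATTERN.sub(substitute, template)


# Usage: the File Path setting holds only the variable reference.
flow_vars = {"my_data_file": "/data/in/customers.csv"}
print(resolve_file_path("%my_data_file%", flow_vars))  # /data/in/customers.csv
```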

Consuming Data from a Map

If a map provides the data, it must run in Burst Mode. This is configured in the map input card settings:
  • Set the Action property (Fetch As) to Burst.

  • Burst mode is supported by adapters including FILE, REST, and messaging adapters (for example, Kafka, JMS, and MQ).

  • The Fetch Unit controls the number of records per batch. If it is not set (the default value is 0), all records are fetched in a single batch.

  • Map Batch Size: You can override the map’s fetch unit at runtime using the map_batch_size property, allowing for flexible tuning via flow variables.
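
The batching rules above can be sketched in Python as follows. This is a minimal illustration, assuming that a runtime map_batch_size value simply replaces the Fetch Unit and that a value of 0 means "all records in one batch" in both cases. The functions effective_batch_size and burst are hypothetical and do not represent the adapter's implementation.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def effective_batch_size(fetch_unit: int, map_batch_size: Optional[int] = None) -> int:
    """Hypothetical: choose the records-per-batch value for a burst run.

    Assumption: a map_batch_size supplied at runtime (for example, from a
    flow variable) replaces the card's Fetch Unit; 0 means "no limit".
    """
    size = fetch_unit if map_batch_size is None else map_batch_size
    if size < 0:
        raise ValueError("batch size must be >= 0")
    return size


def burst(records: Iterable[str], batch_size: int) -> Iterator[list]:
    """Yield successive batches; a batch_size of 0 yields all records as one batch."""
    it = iter(records)
    if batch_size == 0:
        batch = list(it)
        if batch:
            yield batch
        return
    while batch := list(islice(it, batch_size)):
        yield batch


records = [f"row{i}" for i in range(7)]
print([len(b) for b in burst(records, effective_batch_size(fetch_unit=3))])  # [3, 3, 1]
print([len(b) for b in burst(records, effective_batch_size(3, map_batch_size=5))])  # [5, 2]
print([len(b) for b in burst(records, effective_batch_size(0))])  # [7]
```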

Distributed Instances

The Distributor node splits data into batches. Each batch is processed by downstream nodes as a separate flow instance. These instances can run on the same REST runtime or on other available REST runtime instances.
  • Parallel execution is limited by the Maximum Instances setting.
  • If Maximum Instances exceeds the number of available REST runtime instances, parallel execution is limited to the number of available REST runtime instances. The main flow instance that distributes the batches does not count toward the process limit for flow executions. However, each flow executor processes only one distributed batch of a given flow instance at a time.

    Example Scenario

    Consider a system with 5 available executors and a Distributor node configured with the following parameters:

    • Source: A CSV file containing 1,000,000 records.

    • Maximum Instances: 5.

    • Batch Size: 100,000 records (resulting in 10 total batches).

    Execution Logic:

    1. Initial State: The Distributor node in the main flow generates 5 initial requests for distributed batches.

    2. Concurrency: Since each executor can process only one request at a time, all 5 executors are immediately engaged.

    3. Queueing: The remaining 5 batches wait. As each distributed batch completes, the Distributor node in the main flow issues a request for the next batch, so no more than 5 batches run at once. This cycle continues until all 10 distributed batches have been processed.
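
The request cycle in this scenario can be modeled with a small discrete-event simulation. This is a simplified sketch, assuming every batch takes the same time (the hypothetical batch_duration) and that effective parallelism is the smaller of Maximum Instances and the number of available executors. The function simulate_distribution is illustrative only and does not model the process limit for flow executions.

```python
import heapq


def simulate_distribution(total_records: int, batch_size: int, max_instances: int,
                          available_executors: int, batch_duration: float = 1.0):
    """Simulate the Distributor request cycle under equal batch durations.

    Returns (number of batches, peak concurrency, time when the last batch ends).
    """
    num_batches = (total_records + batch_size - 1) // batch_size  # integer ceiling
    slots = min(max_instances, available_executors)  # effective parallelism
    running = []     # min-heap of finish times for in-flight batches
    next_batch = 0
    clock = 0.0
    # Initial state: issue one request per free slot.
    while next_batch < num_batches and len(running) < slots:
        heapq.heappush(running, clock + batch_duration)
        next_batch += 1
    peak = len(running)
    # Queueing: each completed batch frees a slot for the next pending batch.
    while running:
        clock = heapq.heappop(running)
        if next_batch < num_batches:
            heapq.heappush(running, clock + batch_duration)
            next_batch += 1
            peak = max(peak, len(running))
    return num_batches, peak, clock


# The example scenario: 1,000,000 records, 100,000 per batch, 5 instances, 5 executors.
print(simulate_distribution(1_000_000, 100_000, 5, 5))  # (10, 5, 2.0)
```

With these assumptions, the 10 batches complete in two "waves" of 5. Raising Maximum Instances above 5 would not help, because only 5 executors are available.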

New File Adapter Features

When the Distributor node splits a large CSV document, writing each batch to a separate file for the distributed instances would be inefficient. Instead, the Distributor node determines the start offset and data length of each batch and includes them in the payload sent to the distributed instances. All distributed instances consume the same file, but each reads only its own portion, as defined by its offset and data size.

If a Map node is the first node to consume the data in a distributed instance, it reads the data through the legacy file adapter. The file adapter has been enhanced to support two additional properties: offset and data size.
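
The offset-and-length approach can be sketched in Python. This sketch assumes newline-terminated records, no header row, and no quoted fields that contain embedded newlines. The functions compute_ranges and read_range are hypothetical stand-ins for the Distributor node and the file adapter's offset and data size properties, not their actual implementations.

```python
import os
import tempfile


def compute_ranges(path: str, records_per_batch: int) -> list:
    """Scan the file once and return (offset, data_size) byte ranges.

    Each range covers up to records_per_batch complete lines, so no record
    is split between two distributed instances.
    """
    ranges = []
    start = count = pos = 0
    with open(path, "rb") as f:
        for line in f:          # binary iteration splits on b"\n"
            pos += len(line)
            count += 1
            if count == records_per_batch:
                ranges.append((start, pos - start))
                start, count = pos, 0
    if count:                   # final, partially filled batch
        ranges.append((start, pos - start))
    return ranges


def read_range(path: str, offset: int, data_size: int) -> bytes:
    """What a distributed instance reads: only its own slice of the shared file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(data_size)


# Usage: 5 records of 8 bytes each, 2 records per batch.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"".join(f"{i},name{i}\n".encode() for i in range(5)))
ranges = compute_ranges(tmp.name, 2)
second = read_range(tmp.name, *ranges[1])
os.remove(tmp.name)
print(ranges)  # [(0, 16), (16, 16), (32, 8)]
print(second)  # b'2,name2\n3,name3\n'
```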

Initialization Flow and Status Flow

If the flow includes an initialization flow, it runs only on the main instance, before the main flow begins distributing work. Distributed instances do not execute the initialization flow.

Similarly, the status flow is executed only once when the flow finishes and is not used by distributed instances.
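
The order of these stages can be summarized in a short Python sketch. It assumes the stages can be modeled as plain callables; run_flow and its parameters are hypothetical, and the batches run sequentially here only to keep the example small.

```python
from typing import Callable, List


def run_flow(batches: List[str],
             init_flow: Callable[[], None],
             process_batch: Callable[[str], None],
             status_flow: Callable[[], None]) -> None:
    """Hypothetical ordering of flow stages around a Distributor node."""
    init_flow()  # main instance only, before any batch is distributed
    # Simplified: batches run sequentially here; in the product each batch runs
    # as a separate distributed instance that executes neither the
    # initialization flow nor the status flow.
    for batch in batches:
        process_batch(batch)
    status_flow()  # exactly once, after all batches have finished


events: List[str] = []
run_flow(["b1", "b2"],
         init_flow=lambda: events.append("init"),
         process_batch=lambda b: events.append(f"process {b}"),
         status_flow=lambda: events.append("status"))
print(events)  # ['init', 'process b1', 'process b2', 'status']
```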

Flow Variables

Flow variables available in the Distributor node, whether passed directly to the flow or generated during the initialization flow, are propagated to distributed instances through the flow run payload. Changes that a distributed instance makes to these variables stay local to that instance and are not shared with other instances. If instances need to share state, use global cache variables.
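
Why updates stay local can be shown with a minimal Python sketch. It assumes the run payload is a serialized document (JSON here), so each instance works on its own deserialized copy of the variables. The payload layout and the functions make_payload and distributed_instance are hypothetical, and global cache variables are not modeled.

```python
import json


def make_payload(batch_id: int, flow_vars: dict) -> str:
    """Serialize the variables into a per-instance run payload (hypothetical format)."""
    return json.dumps({"batch": batch_id, "variables": flow_vars})


def distributed_instance(payload: str) -> dict:
    """Each instance deserializes, and then modifies, its own private copy."""
    data = json.loads(payload)
    local_vars = data["variables"]
    local_vars["last_batch"] = str(data["batch"])  # local change only
    return local_vars


main_vars = {"region": "EU"}  # e.g. set during the initialization flow
results = [distributed_instance(make_payload(i, main_vars)) for i in range(3)]
print(main_vars)                            # {'region': 'EU'}
print([r["last_batch"] for r in results])   # ['0', '1', '2']
```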