Configuring a Distributor Node
- From File: The node reads data from the path specified in the File Path setting.
- From Map: The map executes and produces batches based on the defined input card.
- Parameterization: To use a flow variable for the File Path, enclose the variable name in percent signs (for example, %my_data_file%).
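The percent-sign syntax can be pictured as a simple token substitution. The sketch below is illustrative only and is not the product's implementation: the function name `resolve_file_path`, the `flow_vars` dictionary, and the sample path are all assumptions made for the example.

```python
import re

def resolve_file_path(template: str, flow_vars: dict) -> str:
    """Replace each %name% token with the value of the matching flow variable.

    Hypothetical helper: it only illustrates the %variable% convention.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in flow_vars:
            raise KeyError(f"undefined flow variable: {name}")
        return flow_vars[name]

    return re.sub(r"%(\w+)%", substitute, template)

# Example: the File Path setting holds "%my_data_file%" (sample value is made up).
path = resolve_file_path("%my_data_file%", {"my_data_file": "/data/in/orders.csv"})
print(path)
```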
Consuming Data from a Map
- Set the Action property (Fetch As) to Burst.
- This is supported by adapters including FILE, REST, and messaging adapters (for example, Kafka, JMS, and MQ).
- The Fetch Unit controls the number of records per batch. If not set (default 0), the node will fetch all records.
- Map Batch Size: You can override the map’s fetch unit at runtime using the map_batch_size property, allowing for flexible tuning via flow variables.
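The batching rules above can be sketched as follows. This is a minimal model, not the product's code: `split_into_batches` is a hypothetical helper, and treating a fetch unit of 0 as "one batch containing everything" is an assumption based on the description. Only the name `map_batch_size` comes from the documentation.

```python
from typing import Iterable, Iterator, List, Optional

def split_into_batches(records: Iterable[str],
                       fetch_unit: int = 0,
                       map_batch_size: Optional[int] = None) -> Iterator[List[str]]:
    """Yield batches of records (hypothetical helper, for illustration only)."""
    # A runtime map_batch_size, when given, overrides the map's fetch unit.
    size = map_batch_size if map_batch_size is not None else fetch_unit
    if size <= 0:
        # Fetch unit not set (default 0): fetch all records at once.
        yield list(records)
        return
    batch: List[str] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Final, partially filled batch.
        yield batch

rows = [f"row-{i}" for i in range(10)]
print([len(b) for b in split_into_batches(rows, fetch_unit=3)])
print([len(b) for b in split_into_batches(rows, fetch_unit=3, map_batch_size=5)])
```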
Distributed Instances
- Parallel execution is limited by the Maximum Instances setting.
- If Maximum Instances exceeds the number of available REST runtime instances, parallel execution is limited to the number of available REST instances.
- A flow that distributes batches does not count toward the execution process limit for flow executions. However, for a single flow instance, the execution of distributed batches is restricted to one batch per flow executor.
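As a rough illustration of the cap described above, the effective degree of parallelism is the smaller of the two values. The function and parameter names here are hypothetical and are not product settings.

```python
def effective_parallelism(maximum_instances: int, available_rest_instances: int) -> int:
    """Return how many batches can run at once (illustrative helper only)."""
    # The configured maximum cannot exceed what the runtime actually provides.
    return min(maximum_instances, available_rest_instances)

print(effective_parallelism(maximum_instances=8, available_rest_instances=5))
print(effective_parallelism(maximum_instances=3, available_rest_instances=5))
```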
Example Scenario
Consider a system with 5 available executors and a Distributor node configured with the following parameters:
- Source: A CSV file containing 1,000,000 records.
- Maximum Instances: 5.
- Batch Size: 100,000 records (resulting in 10 total batches).
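To make the numbers concrete, the sketch below computes the batch count and simulates a fixed pool of executors draining the batches. It is a simplified model under stated assumptions: a thread pool stands in for the flow executors, its internal queue stands in for the Distributor issuing a new request whenever a batch completes, and `process_batch` with its short sleep is a placeholder for real work.

```python
import math
import threading
import time
from concurrent.futures import ThreadPoolExecutor

TOTAL_RECORDS = 1_000_000
BATCH_SIZE = 100_000
MAXIMUM_INSTANCES = 5
AVAILABLE_EXECUTORS = 5

batch_count = math.ceil(TOTAL_RECORDS / BATCH_SIZE)      # 10 batches
workers = min(MAXIMUM_INSTANCES, AVAILABLE_EXECUTORS)    # 5 concurrent requests

lock = threading.Lock()
in_flight = 0
peak_in_flight = 0

def process_batch(batch_number: int) -> int:
    """Placeholder for processing one distributed batch."""
    global in_flight, peak_in_flight
    with lock:
        in_flight += 1
        peak_in_flight = max(peak_in_flight, in_flight)
    time.sleep(0.01)  # stand-in for the real work
    with lock:
        in_flight -= 1
    return batch_number

# Each worker handles one batch at a time; remaining batches wait in the queue.
with ThreadPoolExecutor(max_workers=workers) as pool:
    completed = sorted(pool.map(process_batch, range(1, batch_count + 1)))

print(f"batches: {batch_count}, completed: {len(completed)}")
print(f"peak concurrency: {peak_in_flight} (never more than {workers})")
```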
Execution Logic:
- Initial State: The Distributor node in the main flow generates 5 initial requests for distributed batches.
- Concurrency: Since each executor can process only one request at a time, all 5 executors are immediately engaged.
- Queueing: As each distributed batch completes, the main Distributor instance issues a new request. This cycle continues until all 10 distributed batches have been processed.
-