Distributor Node
The Distributor node horizontally scales single, large transactions. It splits data records into batches, which downstream nodes process in parallel across multiple machines or pods. The Distributor node then aggregates the data for each output flow terminal in distribution order, and the adapter for that output flow terminal acts on the aggregated data.
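The batching and ordered aggregation described above can be illustrated with a minimal Python sketch. This is not the node's actual implementation: the function names, the thread pool standing in for remote machines or pods, and the doubling step standing in for downstream nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(records, batch_size):
    """Split records into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def process_batch(batch):
    """Hypothetical stand-in for the downstream nodes: double each record."""
    return [record * 2 for record in batch]


def distribute(records, batch_size, workers=4):
    """Process batches in parallel, then aggregate results in distribution order."""
    batches = make_batches(records, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in submission order, even when
        # later batches finish before earlier ones.
        results = list(pool.map(process_batch, batches))
    aggregated = []
    for batch_result in results:
        aggregated.extend(batch_result)
    return aggregated
```

Calling `distribute(list(range(10)), batch_size=3)` processes four batches concurrently and returns the doubled records in their original order, which mirrors how the aggregated data reaches the output adapter in distribution order.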
Use Cases and Scaling:
A typical use case for the Distributor node is in a Kubernetes environment where multiple pods are available to execute REST endpoints, but each pod has limited CPU power.
While a Split node provides vertical scaling within a single pod, the pod's CPU limit defines the ceiling of that scaling. To move beyond this limit, place a Distributor node before the Split node to distribute the payload across multiple pods (horizontal scaling). Each pod then uses its own Split node to achieve vertical scaling locally.
Additionally, distributing data across multiple pods optimizes memory usage by improving heap management and limiting memory fragmentation. Kubernetes can also scale the deployment automatically, adding REST pods as CPU or memory utilization rises and removing them as it falls.
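The combination of a Distributor node in front of a Split node can be pictured as two nested levels of parallelism. The following Python sketch is only an analogy under stated assumptions: threads stand in for pods, `run_on_pod` and `run_distributed` are hypothetical names, and the per-record work is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items, size):
    """Cut items into consecutive chunks of at most size elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def transform(record):
    """Hypothetical per-record work."""
    return record + 1


def run_on_pod(batch, local_workers=2):
    """Stand-in for one pod: a local split fans the batch out over its own workers."""
    with ThreadPoolExecutor(max_workers=local_workers) as local_pool:
        return list(local_pool.map(transform, batch))


def run_distributed(records, batch_size, pod_count=3):
    """Stand-in for the distributor: one batch per pod call, results kept in order."""
    with ThreadPoolExecutor(max_workers=pod_count) as pods:
        per_pod = list(pods.map(run_on_pod, chunk(records, batch_size)))
    return [result for pod_result in per_pod for result in pod_result]
```

In this picture the inner pool is bounded by what a single pod can run (the vertical ceiling), while the outer level multiplies that capacity by the number of pods available.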
Technical Specifications:
- Flow Position: The Distributor node must be the first node in a flow. Consequently, it does not have input terminals.
- Data Sources: The node consumes data from either a Map or directly from a File.
- Dynamic Configuration: File names and batch sizes can be dynamically assigned using flow variables.
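
As an illustration of dynamic configuration, the sketch below resolves a file name and a batch size from a set of flow variables. The `${name}` placeholder syntax, the variable names, and the `resolve_settings` helper are assumptions chosen for this example; the product's actual flow-variable syntax may differ.

```python
from string import Template


def resolve_settings(file_pattern, batch_size_setting, flow_variables):
    """Substitute flow variables into a file name and a batch size setting."""
    # Hypothetical placeholder syntax: ${variableName}
    file_name = Template(file_pattern).substitute(flow_variables)
    batch_size = int(Template(str(batch_size_setting)).substitute(flow_variables))
    if batch_size < 1:
        raise ValueError("batch size must be at least 1")
    return file_name, batch_size
```

For example, with hypothetical flow variables `{"region": "emea", "batchSize": 500}`, the call `resolve_settings("orders_${region}.csv", "${batchSize}", ...)` yields the file name `orders_emea.csv` and a batch size of 500.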