Scaling
This section describes the scaling capabilities and limitations of HCL Link components defined in the Helm chart, focusing on replica management, downtime impact, and autoscaling.
Specifying Replicas for the Executor Service
The executor service is the only HCL Link component that supports horizontal scaling (multiple replicas). This allows parallel processing of large data transformation workloads.
# values.yaml
executor:
  ...
  # Set a fixed number of pods for the executor service
  replicas: 1
  ...

To scale out, increase this value. For example, to run three executor pods:

executor:
  replicas: 3

Applying this change with helm upgrade adds two executor pods, bringing the total to three.
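As a sketch, the change could be applied with helm upgrade; the release name, chart reference, and namespace are placeholders, and the grep filter assumes the executor pod names contain "executor":

# Apply the updated values.yaml to the existing release
helm upgrade <release-name> <chart> -f values.yaml

# Confirm that three executor pods are running
kubectl get pods -n <namespace> | grep executor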
Implications of a Single-Replica Architecture
The Client, Server, and REST components run as single-replica services. They do not include a replicas or autoscaling section in values.yaml. This architecture affects availability and maintenance as follows:
- No High Availability (HA): If the single pod for the Client, Server, or REST component fails its health check or the node fails, the service becomes unavailable. Kubernetes will restore availability only after restarting the pod on a healthy node.
- Guaranteed Upgrade/Rollback Downtime: Helm upgrades or rollbacks use the RollingUpdate strategy. For a single-replica pod:
  - Kubernetes terminates the old pod.
  - After termination, it creates the new pod.
  - The service is unavailable while the new pod starts, initializes, and passes readiness probes.
- This downtime is unavoidable because the Server and REST pods use persistent volumes with ReadWriteOnce (RWO) access mode. An RWO volume can be mounted by only one pod at a time, so the new pod cannot start until the old pod terminates and releases the volume.
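To confirm the access mode of the volumes in your cluster, you can list the chart's persistent volume claims; the namespace below is a placeholder for your actual namespace:

# The ACCESS MODES column shows RWO for the affected claims
kubectl get pvc -n <namespace>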
Vertical vs. Horizontal Scaling
There are two primary methods of scaling:
Vertical Scaling (Scaling Up)
Definition: Increasing the resources (CPU or memory) of a single pod.
How to: Increase the resources.requests and resources.limits values in your values.yaml for a component.

# Example: Vertically scaling the 'server' from 8Gi to 12Gi
server:
  resources:
    requests:
      memory: 12Gi
    limits:
      memory: 12Gi

Use Case: Used when a single pod is hitting its memory or CPU limit. All HCL Link components (client, server, rest, executor) can be vertically scaled.
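To see whether a pod is approaching its limits, current usage can be checked through the Kubernetes metrics API; this sketch assumes the metrics-server add-on is installed and uses a placeholder namespace:

# Show current CPU and memory consumption per pod
kubectl top pods -n <namespace>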
Horizontal Scaling (Scaling Out)
Definition: Increasing the number of pods (replicas) to distribute the load.
How to: Update executor.replicas in values.yaml.
Use Case: Used when the workload must handle more concurrent requests or tasks. Only the executor service supports horizontal scaling.
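As a sketch of an alternative to editing values.yaml, the replica count can also be set directly on the command line; the release name and chart reference are placeholders:

# One-off override; --reuse-values keeps all other chart settings unchanged
helm upgrade <release-name> <chart> --reuse-values --set executor.replicas=3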
Horizontal Pod Autoscaling (HPA) for Executor Pods
# values.yaml
executor:
  ...
  # This value is ignored if autoscaling.enabled is true
  replicas: 1
  ...
  autoscaling:
    enabled: false # <-- Set this to true to enable HPA
    minReplicas: 1
    maxReplicas: 10
    resources:
      cpu:
        enabled: true
        averageUtilization: 80
      memory:
        enabled: true
        averageUtilization: 80

To enable autoscaling:
- Set executor.autoscaling.enabled: true.
- Define the minReplicas and maxReplicas to set the boundaries for the autoscaler.
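For illustration, a values.yaml fragment with autoscaling turned on might look like this; the replica bounds are example values, not recommendations:

executor:
  autoscaling:
    enabled: true   # HPA now manages the replica count
    minReplicas: 2  # example lower bound
    maxReplicas: 10 # example upper bound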
How it Works
The HPA controller monitors the resource utilization of all executor pods and compares it to the requests defined in executor.resources.
For example, consider the following settings:
- executor.resources.requests.cpu: 4000m
- executor.resources.requests.memory: 8Gi
- executor.autoscaling.resources.cpu.averageUtilization: 80
In this example, if the average CPU usage across all executor pods exceeds 80% of the requested 4000m (i.e., 3200m), the HPA will start creating new pods. It will continue to add pods (up to maxReplicas: 10) until the average utilization drops back to or below the 80% target.
If the load decreases, the HPA will terminate pods (down to minReplicas: 1) to conserve resources. The same logic applies to the memory target.
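For reference, this behavior follows the standard Kubernetes HPA calculation; the 90% figure below is an assumed sample reading, not a value from the chart:

desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization)

For example, two executor pods averaging 90% CPU against the 80% target give ceil(2 × 90 / 80) = ceil(2.25) = 3 pods.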
Crucial Prerequisite: For HPA to work effectively, the executor.resources.requests must be set correctly. If the request value is too low, the HPA will scale up too aggressively. If it's too high, it may never scale at all.
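As a sketch of how these requests might be declared, assuming the executor section uses the same resources layout as the server example above; setting limits equal to requests is an illustrative choice, not a requirement:

# Requests are the baseline the HPA utilization targets are measured against
executor:
  resources:
    requests:
      cpu: 4000m
      memory: 8Gi
    limits:
      cpu: 4000m
      memory: 8Gi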