Scaling

This section describes the scaling capabilities and limitations of HCL Link components defined in the Helm chart, focusing on replica management, downtime impact, and autoscaling.

Specifying Replicas for the Executor Service

The executor service is the only HCL Link component that supports horizontal scaling (multiple replicas). This allows parallel processing of large data transformation workloads.

To set a fixed number of replicas, update the executor.replicas value in the values.yaml file.
# values.yaml
executor:
  ...
  # Set a fixed number of pods for the executor service
  replicas: 1
  ...
For example, to increase the number of executor pods to three, update the value as follows:
executor:
  replicas: 3

Applying this change with helm upgrade adds two executor pods, bringing the total to three.
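
For instance, assuming the chart was installed as a release named hcl-link from a local chart directory (both names here are illustrative), the change can be applied and verified as follows:
# Apply the updated values.yaml to the existing release
helm upgrade hcl-link ./hcl-link -f values.yaml

# Confirm that three executor pods are now running
kubectl get pods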

Implications of a Single-Replica Architecture

The Client, Server, and REST components run as single-replica services. They do not include a replicas or autoscaling section in values.yaml. This architecture affects availability and maintenance as follows:

  • No High Availability (HA): If the single pod for the Client, Server, or REST component fails its health check or the node fails, the service becomes unavailable. Kubernetes will restore availability only after restarting the pod on a healthy node.
  • Guaranteed Upgrade/Rollback Downtime: Because only one replica exists, Helm upgrades or rollbacks replace the pod with a terminate-then-create (Recreate) sequence:
    1. Kubernetes terminates the old pod.

    2. After termination, it creates the new pod.

    3. The service is unavailable while the new pod starts, initializes, and passes readiness probes.

This downtime is unavoidable because the Server and REST pods use persistent volumes with ReadWriteOnce (RWO) access mode. An RWO volume can be mounted to only one pod at a time. The new pod cannot start until the old pod terminates and releases the volume.
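
To confirm this constraint in your own cluster, the access mode of the claims is visible with kubectl (the namespace name below is an example):
# The ACCESS MODES column shows RWO for ReadWriteOnce volumes
kubectl get pvc -n hcl-link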

Vertical vs. Horizontal Scaling

There are two primary methods of scaling:

Vertical Scaling (Scaling Up)

Definition: Increasing the resources (CPU or memory) of a single pod.

How to: Update resources.requests and resources.limits values in your values.yaml for a component.
# Example: Vertically scaling the 'server' memory from 8Gi to 12Gi
server:
  resources:
    requests:
      memory: 12Gi
    limits:
      memory: 12Gi

Use Case: Use when a single pod is hitting its memory or CPU limits. All HCL Link components (client, server, rest, executor) can be vertically scaled.
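
Alternatively, the same change can be applied without editing values.yaml by passing --set flags to helm upgrade (the release and chart names are illustrative):
helm upgrade hcl-link ./hcl-link \
  --set server.resources.requests.memory=12Gi \
  --set server.resources.limits.memory=12Gi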

Horizontal Scaling (Scaling Out)

Definition: Increasing the number of pods (replicas) to distribute the load across them.

How to: Update executor.replicas in values.yaml.

Use Case: Use when the workload consists of many concurrent requests or tasks that can be distributed across pods. Only the executor service supports horizontal scaling.

Horizontal Pod Autoscaling (HPA) for Executor Pods

Enable Horizontal Pod Autoscaler (HPA) to automatically scale executor pods based on load. Configure it under executor.autoscaling in values.yaml.
# values.yaml
executor:
  ...
  # This value is ignored if autoscaling.enabled is true
  replicas: 1
  ...
  autoscaling:
    enabled: false  # <-- Set this to true to enable HPA
    minReplicas: 1
    maxReplicas: 10
    resources:
      cpu:
        enabled: true
        averageUtilization: 80
      memory:
        enabled: true
        averageUtilization: 80
How to Enable and Configure HPA
  1. Set executor.autoscaling.enabled: true.

  2. Define the minReplicas and maxReplicas to set the boundaries for the autoscaler.

  3. Configure the metrics to monitor. By default, values.yaml monitors both CPU and memory utilization.
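
When autoscaling.enabled is true, the chart is expected to render a HorizontalPodAutoscaler for the executor deployment. A minimal sketch of enabling and inspecting it from the command line (the release, chart, and namespace names are illustrative):
# Enable HPA at upgrade time instead of editing values.yaml
helm upgrade hcl-link ./hcl-link --set executor.autoscaling.enabled=true

# Watch the HPA's current vs. target utilization and replica count
kubectl get hpa -n hcl-link --watch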

How it Works

The HPA controller monitors the resource utilization of all executor pods and compares it to the requests defined in executor.resources.

  • executor.resources.requests.cpu: 4000m

  • executor.resources.requests.memory: 8Gi

  • executor.autoscaling.resources.cpu.averageUtilization: 80

In this example, if the average CPU usage across all executor pods exceeds 80% of the requested 4000m (i.e., 3200m), the HPA will start creating new pods. It will continue to add pods (up to maxReplicas: 10) until the average utilization drops back to or below the 80% target.

If the load decreases, the HPA will terminate pods (down to minReplicas: 1) to conserve resources. The same logic applies to the memory target.
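
This behavior follows the standard Kubernetes HPA calculation:
desiredReplicas = ceil( currentReplicas × currentUtilization / targetUtilization )

For example, if three executor pods average 3600m CPU each (90% of the 4000m request) against the 80% target, the HPA computes ceil(3 × 90 / 80) = ceil(3.375) = 4 and adds one pod.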

Crucial Prerequisite: For HPA to work effectively, executor.resources.requests must reflect the pods' actual needs, because the utilization target is a percentage of the request. If the request is set too low, measured utilization is inflated and the HPA scales out too aggressively. If it is set too high, utilization may never reach the target, and the HPA never scales at all.
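
For reference, the example request values used above map to the following values.yaml structure (the figures are the same illustrative ones from the HPA example):
executor:
  resources:
    requests:
      cpu: 4000m
      memory: 8Gi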