Compute Resource Allocation
CPU is a primary compute resource for our cloud-native services. Usage varies significantly across pods, with several components (such as Rest and Executor) performing CPU-heavy operations, especially under load.
To ensure stability and predictable performance, we define specific CPU resource allocations for each pod. These are categorized as Minimum (Requests) and Recommended (Limits).
CPU Requests vs. Limits
It's crucial to understand the two main parameters for managing pod compute resources:
- Minimum CPU (Request): This represents the requests.cpu value in the pod specification. It is the guaranteed amount of CPU reserved by the Kubernetes scheduler for the pod. A pod will not be scheduled on a node unless that node can guarantee this minimum amount. This is the baseline CPU required to simply run the pod and its application.
- Recommended CPU (Limit): This represents the limits.cpu value. It is the maximum amount of CPU the pod is allowed to use. This higher value allows pods to burst (temporarily use more CPU) during peak traffic or processing-intensive tasks. Setting a limit prevents a single pod from consuming all available node resources, which could destabilize other applications (a "noisy neighbor" problem).
All values are expressed in millicores (m), where 1000m is equivalent to 1 vCPU core.
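As a minimal sketch of how these two parameters appear in a pod spec (the pod, container, and image names here are illustrative, not from our actual manifests):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod          # illustrative name
spec:
  containers:
    - name: app              # illustrative container name
      image: example/app:latest
      resources:
        requests:
          cpu: "300m"        # Minimum (Request): 0.3 vCPU reserved by the scheduler
        limits:
          cpu: "2000m"       # Recommended (Limit): may burst up to 2 vCPU
```

Because the request is lower than the limit, Kubernetes assigns this pod the Burstable QoS class.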
Pod CPU Requirements
The following table details the defined CPU requests and limits for each service pod:
| Pod | Minimum CPU per pod (Request) | Recommended CPU per pod (Limit) |
|---|---|---|
| Client | 300m | 2000m |
| Server | 1000m | 2000m |
| Rest | 1000m | 4000m |
| Executor | 1000m | 4000m |
| Kafka-link | 250m | 2000m |
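Each table row maps directly onto the `resources` stanza of the corresponding container spec. For example, the Rest pod's row would translate to:

```yaml
# Rest pod: 1000m guaranteed, allowed to burst up to 4000m
resources:
  requests:
    cpu: "1000m"   # Minimum CPU per pod (Request)
  limits:
    cpu: "4000m"   # Recommended CPU per pod (Limit)
```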
Planning and Constraints
This resource model directly informs our cluster capacity planning and scaling strategy.
- Limiting CPU Usage: CPU constraints are enforced at the container level using the Kubernetes requests and limits defined above. This provides a Burstable Quality of Service (QoS) class for our pods, allowing them to use more resources when needed (up to their limit) while guaranteeing a baseline level of performance.
- Capacity Planning: Cluster-level planning must account for these values:
- Node Sizing: The total sum of all pod CPU requests on a single node cannot exceed the node's allocatable CPU capacity.
- Replica Scaling: When planning for horizontal scaling (i.e., increasing the number of pod replicas), the total required CPU is the pod's request multiplied by the number of desired replicas. Our cluster must have enough spare allocatable CPU to handle this scaling.
- Overcommit: While requests determine scheduling, limits determine resource contention. Because the sum of limits on a node may exceed its capacity, nodes must either be provisioned to absorb simultaneous bursting across pods, or we must implement a monitoring and alerting strategy for high CPU utilization and throttling.
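To make the replica-scaling arithmetic concrete: scaling the Executor pod to 3 replicas reserves 3 × 1000m = 3000m of allocatable CPU across the cluster, while the aggregate burst ceiling is 3 × 4000m = 12000m. A sketch of such a Deployment (the name, labels, and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: executor             # illustrative name
spec:
  replicas: 3                # scheduler needs 3 × 1000m = 3000m of spare allocatable CPU
  selector:
    matchLabels:
      app: executor
  template:
    metadata:
      labels:
        app: executor
    spec:
      containers:
        - name: executor
          image: example/executor:latest   # illustrative image
          resources:
            requests:
              cpu: "1000m"
            limits:
              cpu: "4000m"   # aggregate burst ceiling across replicas: 12000m
```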