Compute Resource Allocation
CPU is a primary compute resource for our cloud-native services. Usage varies significantly across pods, with several components (such as Rest and Executor) performing CPU-heavy operations, especially under load.
To ensure stability and predictable performance, we define specific CPU resource allocations for each pod. These are categorized as Minimum (Requests) and Recommended (Limits).
CPU Requests vs. Limits
It's crucial to understand the two main parameters for managing pod compute resources:
- Minimum CPU (Request): This represents the requests.cpu value in the pod specification. It is the guaranteed amount of CPU reserved by the Kubernetes scheduler for the pod. A pod will not be scheduled on a node unless that node can guarantee this minimum amount. This is the baseline CPU required to simply run the pod and its application.
- Recommended CPU (Limit): This represents the limits.cpu value. It is the maximum amount of CPU the pod is allowed to use. This higher value allows pods to burst (temporarily use more CPU) during peak traffic or processing-intensive tasks. Setting a limit prevents a single pod from consuming all available node resources, which could destabilize other applications (a "noisy neighbor" problem).
All values are expressed in millicores (m), where 1000m is equivalent to 1 vCPU core.
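As a minimal sketch of how these two parameters appear in a pod spec (the pod, container, and image names here are illustrative, not from our actual manifests):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod          # illustrative name
spec:
  containers:
    - name: app              # illustrative container name
      image: example/app:latest
      resources:
        requests:
          cpu: "300m"        # Minimum (Request): 0.3 vCPU reserved by the scheduler
        limits:
          cpu: "2000m"       # Recommended (Limit): may burst up to 2 vCPU
```

Because the request is lower than the limit, Kubernetes assigns this pod the Burstable QoS class.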
Pod CPU Requirements
The following table details the defined CPU requests and limits for each service pod:
| Pod | Minimum CPU per pod (Request) | Recommended CPU per pod (Limit) |
|---|---|---|
| Client | 300m | 2000m |
| Server | 1000m | 2000m |
| Rest | 1000m | 4000m |
| Executor | 1000m | 4000m |
| Kafka-link | 250m | 2000m |
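Each table row maps directly onto the `resources` stanza of the corresponding container spec. For example, the Rest pod's row would translate to:

```yaml
# Rest pod: 1000m guaranteed, allowed to burst up to 4000m
resources:
  requests:
    cpu: "1000m"   # Minimum CPU per pod (Request)
  limits:
    cpu: "4000m"   # Recommended CPU per pod (Limit)
```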
Planning and Constraints
This resource model directly informs our cluster capacity planning and scaling strategy.
- Limiting CPU Usage: CPU constraints are enforced at the container level using the Kubernetes requests and limits defined above. This provides a Burstable Quality of Service (QoS) class for our pods, allowing them to use more resources when needed (up to their limit) while guaranteeing a baseline level of performance.
- Capacity Planning: Cluster-level planning must account for these values:
- Node Sizing: The total sum of all pod CPU requests on a single node cannot exceed the node's allocatable CPU capacity.
- Replica Scaling: When planning for horizontal scaling (i.e., increasing the number of pod replicas), the total required CPU is the pod's request multiplied by the number of desired replicas. Our cluster must have enough spare allocatable CPU to handle this scaling.
- Overcommit: While requests determine scheduling, limits determine resource contention. Because the sum of limits on a node may exceed its capacity, nodes must either be provisioned to absorb simultaneous bursting across pods, or we must implement a monitoring and alerting strategy for high CPU utilization and throttling.
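To make the replica-scaling arithmetic concrete: scaling the Executor pod to 3 replicas reserves 3 × 1000m = 3000m of allocatable CPU across the cluster, while the aggregate burst ceiling is 3 × 4000m = 12000m. A sketch of such a Deployment (the name, labels, and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: executor             # illustrative name
spec:
  replicas: 3                # scheduler needs 3 × 1000m = 3000m of spare allocatable CPU
  selector:
    matchLabels:
      app: executor
  template:
    metadata:
      labels:
        app: executor
    spec:
      containers:
        - name: executor
          image: example/executor:latest   # illustrative image
          resources:
            requests:
              cpu: "1000m"
            limits:
              cpu: "4000m"   # aggregate burst ceiling across replicas: 12000m
```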