Beyond Reactive Scaling: Optimizing Amazon EKS Cost and Performance
The Impossible Triangle: Reliability, Performance, and Cost
As Kubernetes clusters scale, it becomes increasingly difficult to achieve all three key objectives at once: reliability, performance, and cost-efficiency.
With a small number of pods (e.g., 10), it's easy to manage these three pillars. As the cluster grows to thousands of pods, however, waste and inefficiencies start to emerge.
The core challenge is that developers often request more resources than their applications actually need, leading to "phantom waste" and node fragmentation.
Fundamental Resource Management Concepts
Requests and Limits: Requests define the minimum resources a container needs and drive scheduling decisions; Limits cap usage at runtime, triggering CPU throttling or memory OOM kills. Together they determine a pod's QoS class and, with it, its eviction priority.
Quality of Service (QoS) Classes: Kubernetes has three QoS classes - Best Effort, Burstable, and Guaranteed - which determine eviction priority during resource contention.
Kubernetes vs. Linux: Kubernetes expresses CPU as millicores, but Linux schedules CPU as time slices via CFS shares and quotas. This translation gap can surface as performance issues when nodes become heavily utilized.
Cgroups: Linux control groups (cgroups) enforce the resource boundaries defined in the pod spec, using shares, weights, periods, and quotas.
Scheduler and Kubelet: The Kubernetes scheduler places pods on nodes, while the Kubelet enforces QoS and eviction on each node.
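The concepts above can be tied together in a single pod spec. The following is an illustrative sketch (the names are hypothetical): because requests equal limits for every container, Kubernetes assigns the Guaranteed QoS class, and the kubelet translates these values into cgroup settings.

```yaml
# Hypothetical pod spec illustrating requests, limits, and QoS.
# requests == limits for all containers => Guaranteed QoS class.
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed-example
spec:
  containers:
    - name: app
      image: nginx:1.27
      resources:
        requests:
          cpu: 500m        # used by the scheduler; maps to a cgroup CPU weight/share
          memory: 256Mi
        limits:
          cpu: 500m        # enforced as a CFS quota (~50ms of CPU per 100ms period)
          memory: 256Mi    # exceeding this triggers an OOM kill of the container
```

Dropping the limits (or setting them higher than the requests) would demote the pod to Burstable; omitting both requests and limits would make it Best Effort, first in line for eviction.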
Challenges with Scaling and Optimization
Example scenario: A four-core node runs four pods, each requesting 1 CPU core with no limits set. A single busy pod can burst into the idle CPU time of all four cores; when the other pods wake up, they contend for cycles they assumed were reserved, causing performance degradation in production.
Saturation Threshold: There is a sweet spot where resources are fully utilized without causing performance issues. Underprovisioning or overprovisioning can both lead to problems.
Predictability and Consistency: Varying CPU time allocations across environments can make it difficult to achieve consistent performance.
Limitations of Larger Nodes: Increasing node size can lead to overpacking and higher saturation levels.
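One way to contain the noisy-pod scenario above is a hard CPU cap. A minimal fragment (illustrative values) shows the trade-off: the limit prevents one busy container from consuming idle cycles across all cores, at the cost of CFS throttling once the container hits its quota.

```yaml
# Fragment of a container spec: a CPU limit caps bursting.
resources:
  requests:
    cpu: "1"      # share the scheduler reserves on the node
  limits:
    cpu: "1"      # hard cap: usage beyond 1 core is throttled by CFS
```

Whether to set CPU limits at all is itself a tuning decision; the saturation "sweet spot" depends on how bursty the workloads are.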
Addressing Resource Management Challenges
Kubernetes 1.33 promoted in-place pod resizing to beta (enabled by default), making it possible to adjust CPU and memory settings without restarting pods.
Pod-level resource constraints can help with sidecar and init containers.
GPU resource management: NVIDIA MIG provides hardware-level partitioning, while MPS and time-slicing let multiple pods share a GPU, enabling fractional GPU allocation.
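In-place resizing is configured per container via a resizePolicy. A hedged sketch (pod and image names are illustrative), assuming the InPlacePodVerticalScaling feature is active, as it is by default from 1.33:

```yaml
# Illustrative pod spec: resizePolicy controls whether changing a
# resource restarts the container or applies in place.
apiVersion: v1
kind: Pod
metadata:
  name: resize-example
spec:
  containers:
    - name: app
      image: nginx:1.27
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired       # CPU changes apply without a restart
        - resourceName: memory
          restartPolicy: RestartContainer  # memory changes restart this container
      resources:
        requests:
          cpu: 250m
          memory: 128Mi
```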
Scaling Dimensions and Conflicts
Vertical Pod Autoscaler (VPA): Relies on historical data, making it hard to react to sudden changes or bursty workloads.
Horizontal Pod Autoscaler (HPA): Scales replica counts based on CPU, memory, or custom metrics. Because utilization targets are computed relative to resource requests, anything that rewrites those requests (such as a VPA) can trigger thrashing.
Node Scaling: Adopting spot instances introduces challenges around maintaining desired pod placement ratios.
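The HPA/VPA interaction is easiest to see in a manifest. In this sketch (the Deployment name is hypothetical), the 70% target is 70% of requested CPU, not of node capacity, so a VPA halving the request would effectively halve the scale-out threshold:

```yaml
# Illustrative autoscaling/v2 HPA: utilization is relative to requests.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # 70% of *requested* CPU per pod
```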
Towards a Proactive, Coordinated Approach
Limitations of Reactive Scaling: HPA and VPA can have race conditions and lead to unnecessary thrashing.
Need for Predictive Scaling: Anticipating traffic patterns and warming up replicas in advance can improve responsiveness and efficiency.
Custom Resources and Operators: Provide a way to define and reconcile custom scaling policies across the entire cluster.
Coordinating Scaling Dimensions: Integrating VPA, HPA, and node scaling into a cohesive, self-healing system is crucial for large-scale Kubernetes deployments.
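To make the custom-resource idea concrete, here is a purely hypothetical CRD instance (the API group, kind, and every field are invented for illustration) sketching how an operator might reconcile vertical, horizontal, and node scaling under a single policy:

```yaml
# Hypothetical custom resource; no such API exists upstream.
apiVersion: scaling.example.com/v1alpha1
kind: ScalingPolicy
metadata:
  name: checkout-policy
spec:
  target:
    kind: Deployment
    name: checkout
  vertical:
    mode: InPlacePreferred      # resize pods without restarts when possible
  horizontal:
    predictive: true            # warm replicas ahead of forecast traffic
    minReplicas: 5
  nodes:
    spotRatio: 0.7              # aim for ~70% of replicas on spot capacity
```

An operator watching such objects could sequence the three scaling dimensions instead of letting independent controllers race each other.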
ScaleOps: A Comprehensive Solution
ScaleOps is a platform that addresses the challenges of resource management and scaling in large-scale Kubernetes environments.
Key capabilities include:
Context-aware, workload-specific scaling policies
Predictive scaling to anticipate and respond to changes
Coordinated management of VPA, HPA, and node scaling
Automated healing and reaction to bursts or failures
Conclusion
Kubernetes resource management at scale requires a comprehensive, coordinated approach that goes beyond reactive scaling.
Predictive scaling, custom resource management, and integrating multiple scaling dimensions are crucial for achieving reliability, performance, and cost-efficiency.
Solutions like ScaleOps can help enterprises overcome the challenges of complex, large-scale Kubernetes deployments.