AWS re:Invent 2025 - How Samsung Uses AI Agents to Optimize Infrastructure at Scale (AIM238)
Summary of "How Samsung Uses AI Agents to Optimize Infrastructure at Scale"
Introduction to Samsung Electronics
Samsung Electronics is a global technology company founded in 1969, with over 260,000 employees worldwide
Key products include mobile phones, display panels, home appliances, and semiconductors
The company has R&D centers focused on localization and specialized features like AI
Challenges Faced by Samsung
Operating massive-scale Kubernetes infrastructure across multiple cloud providers (AWS, Google Cloud, Azure, Samsung Cloud)
Complexity of managing infrastructure as code and security policies for 50+ applications
Over 1,000 EKS clusters and tens of thousands of VM nodes running a mix of CPU and GPU workloads
Need to simplify operational requirements for training and inference teams
Desire to optimize costs beyond just reserved instances and savings plans
Samsung's Solution: Agent AI and Cast AI
Samsung has an ongoing project called "Agent AI" to manage infrastructure, security, change management, monitoring, and GPU/CPU workload integration
Cast AI is a key component of the Agent AI system, providing AI-driven automation and optimization of the Kubernetes infrastructure
Key Technical Solutions
Automation Driven by AI: Samsung's AI agents continuously optimize the cloud infrastructure in real time
Cast AI Integration: Leveraging Cast AI's intelligent automation features to scale quickly, right-size workloads, and manage spot instances
Bin-Packing Optimizations: Achieving over 30% cost savings by automating bin-packing and spot instance utilization
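To make the bin-packing idea above concrete, here is a minimal sketch of one classic heuristic, first-fit decreasing, applied to pod CPU requests. It is not Cast AI's or Samsung's actual algorithm, which the talk summary does not describe. The `Node` class, the `first_fit_decreasing` function, the single-resource (CPU-only) model, and all numbers are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical worker node with a fixed CPU capacity in millicores."""
    capacity_m: int
    pods: list[int] = field(default_factory=list)

    @property
    def free_m(self) -> int:
        # Capacity not yet claimed by the pods placed on this node.
        return self.capacity_m - sum(self.pods)


def first_fit_decreasing(pod_requests_m: list[int], node_capacity_m: int) -> list[Node]:
    """Pack pod CPU requests onto equal-sized nodes with first-fit decreasing."""
    nodes: list[Node] = []
    # Placing the largest pods first tends to leave less stranded capacity.
    for request in sorted(pod_requests_m, reverse=True):
        if request > node_capacity_m:
            raise ValueError(f"pod request {request}m exceeds node capacity")
        # Reuse the first node that still has room; otherwise add a new node.
        target = next((n for n in nodes if n.free_m >= request), None)
        if target is None:
            target = Node(capacity_m=node_capacity_m)
            nodes.append(target)
        target.pods.append(request)
    return nodes


if __name__ == "__main__":
    requests_m = [2500, 500, 1500, 3000, 1000, 2000, 500, 1000]  # made-up values
    for i, node in enumerate(first_fit_decreasing(requests_m, node_capacity_m=4000)):
        print(f"node {i}: pods={node.pods} free={node.free_m}m")
```

With these made-up requests (12,000m in total) and 4,000m nodes, the heuristic fills three nodes completely. A production scheduler would also have to weigh memory, GPUs, affinity rules, and spot interruption risk, none of which this sketch models.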
Benefits and Impact
Reduced Operational Overhead: AI-driven automation streamlines infrastructure management, reducing manual work and toil
Improved Application Efficiency: Real-time optimization and right-sizing of resources lead to better application performance
Significant Cost Savings: Combining AI automation, smart bin-packing, and spot instance usage results in substantial cost reductions across Kubernetes environments
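The right-sizing mentioned above can be illustrated with a common rule of thumb: set a workload's CPU request to a high percentile of its observed usage plus some headroom. The summary does not say how Samsung or Cast AI compute recommendations, so the function name `recommend_cpu_request_m`, the 95th-percentile default, the 15% headroom, and the 50m floor are all hypothetical choices for this sketch.

```python
def recommend_cpu_request_m(
    usage_samples_m: list[int],
    percentile: int = 95,     # assumed default, not taken from the talk
    headroom_pct: int = 15,   # assumed safety margin on top of the percentile
    floor_m: int = 50,        # assumed minimum request in millicores
) -> int:
    """Suggest a CPU request (millicores) from observed usage samples."""
    if not usage_samples_m:
        raise ValueError("need at least one usage sample")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")

    ordered = sorted(usage_samples_m)
    # Nearest-rank percentile, computed with integer ceiling division.
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    baseline = ordered[rank - 1]
    # Add headroom, rounding up to a whole millicore.
    with_headroom = (baseline * (100 + headroom_pct) + 99) // 100
    return max(floor_m, with_headroom)


if __name__ == "__main__":
    samples = [100] * 19 + [1000]  # made-up data: steady usage with one spike
    print(recommend_cpu_request_m(samples))                  # ignores the single spike
    print(recommend_cpu_request_m(samples, percentile=100))  # sizes for the spike
```

Choosing the percentile is the main trade-off: a lower value reclaims more capacity but risks throttling during spikes, while sizing for the maximum is safer and more expensive.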
Future Outlook
AI-Driven Autonomy: Moving towards fully autonomous cloud operations with minimal human intervention, especially for mission-critical projects
Industry Standard Benchmarking: Defining new benchmarks for efficiency and scalability, such as GPU processing capabilities
Driving Cost-Effective, AI-Powered Cloud Operations: Expanding AI automation across CPU and GPU workloads for unified optimization
Delivering Consistent Optimization Across Diverse Environments: Simplifying multi-cloud architecture with Cast AI as a central component
Key Takeaways
Samsung uses AI-driven automation and optimization, through its Agent AI project and Cast AI, to manage its massive Kubernetes infrastructure across multiple cloud providers
The solution has delivered significant benefits, including reduced operational overhead, improved application efficiency, and substantial cost savings
Samsung is focused on further advancing towards fully autonomous cloud operations, industry-leading benchmarks, and unified optimization across CPU and GPU workloads in a multi-cloud environment