AWS re:Invent 2025 - How Samsung Uses AI Agents to Optimize Infrastructure at Scale (AIM238)

Summary of "How Samsung Uses AI Agents to Optimize Infrastructure at Scale"

Introduction to Samsung Electronics

  • Samsung Electronics is a global technology company founded in 1969, with over 260,000 employees worldwide
  • Key products include mobile phones, display panels, home appliances, and semiconductors
  • The company has R&D centers focused on localization and specialized features like AI

Challenges Faced by Samsung

  • Operating massive-scale Kubernetes infrastructure across multiple cloud providers (AWS, Google Cloud, Azure, Samsung Cloud)
  • Complexity of managing infrastructure as code and security policies for 50+ applications
  • Over 1,000 EKS clusters and tens of thousands of VM nodes running a mix of CPU and GPU workloads
  • Need to simplify operational requirements for training and inference teams
  • Desire to optimize costs beyond just reserved instances and savings plans

Samsung's Solution: Agent AI and Cast AI

  • Samsung has an ongoing project called "Agent AI" to manage infrastructure, security, change management, monitoring, and GPU/CPU workload integration
  • Cast AI is a key component of the Agent AI system, providing AI-driven automation and optimization of the Kubernetes infrastructure

Key Technical Solutions

  1. Automation Driven by AI: Samsung's AI agents continuously optimize the cloud infrastructure in real time
  2. Cast AI Integration: Leveraging Cast AI's intelligent automation features to scale quickly, right-size workloads, and manage spot instances
  3. Bin-Packing Optimizations: Achieving over 30% cost savings by automating bin-packing and spot instance utilization
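
To make the bin-packing idea above concrete, here is a minimal sketch of a classic first-fit-decreasing heuristic that places pods onto as few identical nodes as possible. It is an illustration only: the talk summary does not describe the algorithm Cast AI or Samsung actually uses, and the names `Node` and `pack_first_fit_decreasing`, the pod tuples, and the node sizes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical worker node with fixed CPU (cores) and memory (GiB)."""
    cpu_capacity: float
    mem_capacity: float
    cpu_used: float = 0.0
    mem_used: float = 0.0
    pods: list = field(default_factory=list)

    def fits(self, cpu: float, mem: float) -> bool:
        # A pod fits only if both resources stay within capacity.
        return (self.cpu_used + cpu <= self.cpu_capacity
                and self.mem_used + mem <= self.mem_capacity)

    def place(self, name: str, cpu: float, mem: float) -> None:
        self.pods.append(name)
        self.cpu_used += cpu
        self.mem_used += mem


def pack_first_fit_decreasing(pods, node_cpu, node_mem):
    """Pack (name, cpu, mem) pods onto identical nodes.

    Pods are sorted by their dominant resource share (largest first),
    then each pod goes onto the first node with room; a new node is
    opened only when no existing node can hold the pod.
    """
    def dominant_share(pod):
        _, cpu, mem = pod
        return max(cpu / node_cpu, mem / node_mem)

    nodes = []
    for name, cpu, mem in sorted(pods, key=dominant_share, reverse=True):
        if cpu > node_cpu or mem > node_mem:
            raise ValueError(f"pod {name!r} is larger than a whole node")
        for node in nodes:
            if node.fits(cpu, mem):
                node.place(name, cpu, mem)
                break
        else:
            # No existing node had room, so open a new one.
            node = Node(node_cpu, node_mem)
            node.place(name, cpu, mem)
            nodes.append(node)
    return nodes


if __name__ == "__main__":
    # Hypothetical workload: (pod name, CPU cores, memory GiB).
    demo_pods = [("a", 2, 4), ("b", 2, 4), ("c", 1, 2), ("d", 3, 6)]
    for i, node in enumerate(pack_first_fit_decreasing(demo_pods, 4, 8)):
        print(i, node.pods, node.cpu_used, node.mem_used)
```

Fewer, fuller nodes mean fewer instances to pay for, which is the mechanism behind the savings claim. A production optimizer would also have to respect constraints this sketch ignores, such as affinity rules, disruption budgets, and mixed instance types.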

Benefits and Impact

  1. Reduced Operational Overhead: AI-driven automation streamlines infrastructure management, reducing manual work and toil
  2. Improved Application Efficiency: Real-time optimization and right-sizing of resources lead to better application performance
  3. Significant Cost Savings: Combining AI automation, smart bin-packing, and spot instance usage results in substantial cost reductions across Kubernetes environments
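
The right-sizing mentioned above can be sketched as deriving a resource request from observed usage. The example below is one common approach, assumed here for illustration rather than taken from the talk: take a high percentile of usage samples and add headroom. The function name `recommend_request`, the 95th-percentile default, and the 15% headroom are all hypothetical choices.

```python
def recommend_request(samples_millicores, percentile=95, headroom_pct=15):
    """Suggest a CPU request (in millicores) from integer usage samples.

    Uses the nearest-rank percentile of the samples, then adds a
    percentage of headroom, rounding up. Integer arithmetic avoids
    floating-point surprises in the result.
    """
    if not samples_millicores:
        raise ValueError("need at least one usage sample")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")

    ordered = sorted(samples_millicores)
    # Nearest-rank method: rank = ceil(percentile / 100 * n), 1-based.
    rank = (percentile * len(ordered) + 99) // 100
    value = ordered[rank - 1]
    # Add headroom and round up to a whole millicore.
    return (value * (100 + headroom_pct) + 99) // 100


if __name__ == "__main__":
    # Hypothetical samples: steady 100m usage with one 400m spike.
    usage = [100] * 19 + [400]
    print(recommend_request(usage))                  # ignores the single spike
    print(recommend_request(usage, percentile=100))  # sizes for the spike
```

Choosing the percentile is the main trade-off: a lower value reclaims more capacity but risks throttling during bursts, while sizing for the peak keeps performance safe at the cost of idle resources.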

Future Outlook

  1. AI-Driven Autonomy: Moving towards fully autonomous cloud operations with minimal human intervention, especially for mission-critical projects
  2. Industry Standard Benchmarking: Defining new benchmarks for efficiency and scalability, such as GPU processing capabilities
  3. Driving Cost-Effective, AI-Powered Cloud Operations: Expanding AI automation across CPU and GPU workloads for unified optimization
  4. Delivering Consistent Optimization Across Diverse Environments: Simplifying multi-cloud architecture with Cast AI as a central component

Key Takeaways

  • Samsung uses AI-driven automation and optimization, through its Agent AI project with Cast AI as a key component, to manage its massive Kubernetes infrastructure across multiple cloud providers
  • The solution has delivered significant benefits, including reduced operational overhead, improved application efficiency, and substantial cost savings
  • Samsung is focused on further advancing towards fully autonomous cloud operations, industry-leading benchmarks, and unified optimization across CPU and GPU workloads in a multi-cloud environment
