AWS re:Invent 2025 - Building Fast, Cost-Efficient, Sovereign Inference Platforms on AWS with Intel CPUs
Intel and AWS Partnership
The Intel and AWS partnership dates back to the launch of Amazon EC2 in 2006, spanning over 18 years.
The collaboration runs deep, with Intel providing the hardware infrastructure and software optimization to power AWS services.
AWS offers more than 400 EC2 instance types running on Intel processors, providing a wide breadth of compute and technology options.
Beyond just hardware, Intel also focuses on software optimization and enabling the ecosystem to run AI workloads efficiently on Intel platforms.
Optimizing AI Inference on Intel CPUs
While AI has been heavily focused on training large models, the next phase will be about enterprise-scale inference workloads.
Intel believes inference workloads will grow significantly, and that they can run efficiently on Intel Xeon CPUs, not just GPUs.
The new Intel Xeon 6 processor, custom-built for AWS for its eighth generation of EC2 instances, delivers up to 20% better performance than the previous generation across diverse workloads.
The processor includes Intel Advanced Matrix Extensions (AMX), which accelerate matrix multiplication to speed up both inference and training.
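AMX targets the matrix multiplications that dominate transformer inference, which is why a dedicated matmul engine matters for CPU inference. As a rough back-of-envelope illustration (the layer shape below is hypothetical and not from the talk), almost all of a decoder layer's arithmetic is matmul multiply-accumulates:

```python
def matmul_macs(m: int, k: int, n: int) -> int:
    """Multiply-accumulate operations for an (m x k) @ (k x n) matmul."""
    return m * k * n

def transformer_layer_macs(seq_len: int, d_model: int, d_ff: int) -> int:
    """Approximate MACs for one decoder layer (attention projections + FFN),
    ignoring attention-score matmuls and norms for simplicity."""
    qkv_and_out = 4 * matmul_macs(seq_len, d_model, d_model)  # Q, K, V, output projections
    ffn = 2 * matmul_macs(seq_len, d_model, d_ff)             # FFN up- and down-projection
    return qkv_and_out + ffn

# Illustrative 7B-class layer shape (hypothetical values)
macs = transformer_layer_macs(seq_len=1024, d_model=4096, d_ff=11008)
print(f"~{macs / 1e9:.1f} GMACs per layer per forward pass")
```

At this scale a single layer already runs into the hundreds of billions of multiply-accumulates per forward pass, so hardware that multiplies matrix tiles natively, as AMX does, moves the needle far more than general-purpose vector units.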
Flexible and Cost-Effective Inference on EC2
AWS has launched the new EC2 8i instances powered by the custom Intel Xeon 6 processor across the C, R, and M instance families.
These instances offer flexible configurations through the "Flex" variant, allowing customers to optimize network and storage performance based on their workload needs.
These Flex instances provide a lower-cost alternative to the standard instances, improving price-performance for workloads that do not need sustained peak performance.
Sovereign Inference Platforms with Intel and AWS
Customers like Deloitte are leveraging Intel-powered EC2 instances to build secure, cost-efficient, and scalable inference platforms.
Deloitte's approach runs large language models (LLMs) and small language models (SLMs) on CPU-based EC2 instances, cutting infrastructure costs by up to 56% compared to GPU-based instances.
By using compressed models together with Intel's software optimizations, Deloitte achieved near-parity in accuracy with uncompressed models running on GPUs.
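Model compression typically starts with quantizing weights to lower precision, such as int8. The talk does not detail the actual compression pipeline, so the following is only a minimal sketch of symmetric int8 quantization to show why accuracy loss can stay small: the per-tensor rounding error is bounded by half the quantization step.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric int8 quantization: map floats onto [-127, 127] with one scale."""
    peak = max(abs(w) for w in weights)
    scale = peak / 127 if peak else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

weights = [0.82, -1.37, 0.05, 2.54, -0.91]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
# Rounding error is at most half a quantization step (scale / 2)
print(q, round(max_err, 4))
```

Shrinking weights 4x versus float32 also cuts memory bandwidth, which is usually the bottleneck for CPU inference; production pipelines add calibration and per-channel scales on top of this basic idea.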
This enables secure, private inference inside the customer's own VPC, addressing data sovereignty and security concerns.
AI Innovation with Articul8
Articul8 is a platform that helps enterprises convert complex data into personalized insights and outcomes.
Articul8's platform combines traditional machine learning models, small language models, and large language models, orchestrated by an intelligent model-routing system.
The platform ships with a library of domain-specific and task-specific models, tuned to outperform general-purpose models on specialized use cases.
Articul8's models are also optimized for cost-efficient inference on Intel Xeon processors, delivering both performance and cost benefits.
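The intelligent model routing described above can be thought of as a dispatcher that sends each request to the cheapest model tier expected to handle it well. The toy sketch below illustrates the idea only; the model names, prices, and keyword heuristic are invented (real routers typically use learned classifiers):

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost_per_1k_tokens: float  # hypothetical pricing, for illustration

# Hypothetical tiers: traditional ML -> SLM -> LLM
CLASSIFIER = Model("domain-classifier", 0.01)
SLM = Model("domain-slm-3b", 0.05)
LLM = Model("general-llm-70b", 0.60)

def route(query: str) -> Model:
    """Send each query to the cheapest tier expected to handle it.
    This keyword heuristic is a stand-in for a learned routing model."""
    words = query.lower().split()
    if any(w in {"classify", "label", "categorize"} for w in words):
        return CLASSIFIER          # pure classification: traditional ML suffices
    if len(words) < 20 and "explain" not in words:
        return SLM                 # short, routine request: small model
    return LLM                     # open-ended reasoning: fall back to the LLM

print(route("classify this support ticket").name)    # -> domain-classifier
print(route("summarize this invoice").name)          # -> domain-slm-3b
print(route("explain the tradeoffs in detail").name) # -> general-llm-70b
```

The cost advantage comes from keeping the bulk of traffic on the cheap tiers, which is also where CPU-based inference on Xeon instances is most competitive.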
Key Takeaways
Intel and AWS have a long-standing partnership, collaborating to develop custom hardware and software solutions to power enterprise workloads.
The new Intel Xeon 6 processor, custom-built for AWS, offers significant performance improvements for inference and general compute workloads.
Customers can leverage Intel-powered EC2 instances to build secure, cost-efficient, and scalable inference platforms, addressing data sovereignty and security concerns.
Specialized AI platforms like Articul8 leverage Intel's hardware and software optimizations to deliver domain-specific and task-specific models that outperform general-purpose models.
The combination of Intel's hardware, software, and ecosystem enables enterprises to unlock the full potential of AI inference at scale.