AWS re:Invent 2025 - Scale AI agents with custom models using Amazon SageMaker AI & SGLang (AIM387)
Scaling AI Agents with Custom Models using Amazon SageMaker
Key Challenges in Building Agentic AI Applications
Lack of standardized tools to customize models with different techniques, leading to delays in time-to-market
Difficulty in finding the right instance and container configuration for cost-effective inference at scale
Fragmented observability across tools, making it hard to root cause and debug model behavior
Difficulty in productionizing experimental workflows into scalable, repeatable pipelines
Lack of governance practices to track, audit, and version models and generative AI assets
SageMaker Capabilities for Model Customization and Deployment
Model Training and Fine-tuning
Broad selection of pre-trained models, including open-source and Amazon-built models
Fine-tuning recipes for popular models and techniques, with support for both managed training jobs and persistent clusters
Automatic checkpointing and self-healing of training jobs to accelerate model development
Cost-Effective Inference at Scale
Deploy open-source or fine-tuned models on managed instances with a few steps
Host multiple models on a single endpoint for maximum GPU utilization
Leverage speculative decoding to achieve up to 2.5x higher throughput without compromising accuracy
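The throughput gain from speculative decoding comes from verifying several cheap draft tokens with a single forward pass of the large target model. The toy simulation below is a conceptual sketch of that draft-then-verify loop (random "models" over a 4-letter vocabulary, a hypothetical `accept_p` acceptance rate), not SageMaker's or SGLang's actual implementation:

```python
import random

def draft_tokens(k, rng):
    # Cheap draft model: propose k candidate tokens.
    return [rng.choice("abcd") for _ in range(k)]

def verify(proposed, rng, accept_p=0.7):
    # Target model checks drafts left-to-right; the first rejected draft
    # is replaced by the target's own token and the rest are discarded.
    accepted = []
    for tok in proposed:
        if rng.random() < accept_p:
            accepted.append(tok)
        else:
            accepted.append(rng.choice("abcd"))  # target's correction
            break
    return accepted

def generate(n_tokens, k=4, seed=0):
    rng = random.Random(seed)
    out, target_calls = [], 0
    while len(out) < n_tokens:
        out.extend(verify(draft_tokens(k, rng), rng))
        target_calls += 1  # one target forward pass per k-token draft batch
    return out[:n_tokens], target_calls
```

Because each target forward pass can accept up to `k` tokens, `target_calls` ends up well below the token count, which is where the "up to 2.5x" throughput figure comes from in practice.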
Observability and Experimentation Tracking
Integrated MLflow support for logging experiments, metrics, and agent traces in a centralized location
Partner AI apps for additional model and agent monitoring capabilities
SageMaker Pipelines for converting experimental workflows into scalable, repeatable pipelines
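Turning an experimental notebook flow into a repeatable pipeline amounts to naming each stage and passing artifacts between them explicitly. The sketch below illustrates that pattern in plain Python with made-up step names; SageMaker Pipelines expresses the same idea with managed, versioned steps executed on AWS infrastructure:

```python
# Conceptual sketch (not the SageMaker Pipelines SDK): each step takes and
# returns an artifact dict, so an ad-hoc notebook flow becomes a named,
# repeatable sequence that can be re-run end to end.
def preprocess(art):
    art["dataset"] = list(range(5))
    return art

def train(art):
    art["model"] = {"weights": sum(art["dataset"])}
    return art

def evaluate(art):
    art["metric"] = art["model"]["weights"] / len(art["dataset"])
    return art

def run_pipeline(steps, artifacts=None):
    artifacts = artifacts or {}
    for name, step in steps:
        artifacts = step(artifacts)
        artifacts.setdefault("log", []).append(name)  # execution record
    return artifacts

result = run_pipeline([("preprocess", preprocess),
                       ("train", train),
                       ("evaluate", evaluate)])
```

The execution log doubles as a minimal lineage record, which is what makes the pipeline auditable rather than a one-off experiment.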
Governance and Versioning
SageMaker Model Registry as a central hub to manage the entire lifecycle of machine learning models
Capture model metadata, training lineage, and compliance information for governance and auditing
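The value of a model registry is the metadata it forces you to capture per version: metrics, data lineage, and an approval gate before deployment. The sketch below shows that shape in pure Python with hypothetical field names; the real SageMaker Model Registry stores this server-side with IAM controls and audit logging:

```python
from dataclasses import dataclass

@dataclass
class ModelVersion:
    name: str
    version: int
    metrics: dict
    training_data: str      # lineage: which dataset produced this version
    approved: bool = False  # compliance gate before deployment

class ModelRegistry:
    # Conceptual sketch of registry behavior, not the AWS API.
    def __init__(self):
        self._models = {}

    def register(self, name, metrics, training_data):
        versions = self._models.setdefault(name, [])
        mv = ModelVersion(name, len(versions) + 1, metrics, training_data)
        versions.append(mv)
        return mv

    def approve(self, name, version):
        self._models[name][version - 1].approved = True

    def latest_approved(self, name):
        return max((m for m in self._models.get(name, []) if m.approved),
                   key=lambda m: m.version, default=None)
```

Querying only approved versions is what lets deployment automation pull a vetted model rather than whatever was trained last.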
Building Agentic Applications with SageMaker and Amazon Bedrock AgentCore
Deploying a Fine-tuned Clinical Agent
Use SageMaker to host a Llama-based model fine-tuned on a medical reasoning dataset
Leverage SageMaker Pipelines to orchestrate the end-to-end model customization and deployment workflow
Register the model in SageMaker Model Registry and MLflow Model Registry, including a model card for governance
Enhancing the Agent with Custom Tools
Implement a custom MCP (Model Context Protocol) server to provide patient search and medical report generation capabilities
Deploy the MCP server as an AWS Lambda function and register it with the Bedrock AgentCore gateway
Integrate the custom tools into the agent's workflow to provide a more comprehensive clinical assistant
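An MCP server exposes named tools behind a uniform request/response contract so the agent can discover and call them. The sketch below mimics only the registration-and-dispatch pattern in plain Python; the real Model Context Protocol SDK additionally handles transport, schemas, and tool discovery, and the `patient_search` tool here is hypothetical:

```python
import json

class ToolServer:
    # Conceptual sketch of an MCP-style tool server, not the MCP SDK.
    def __init__(self):
        self._tools = {}

    def tool(self, fn):
        # Decorator that registers a function as a callable tool.
        self._tools[fn.__name__] = fn
        return fn

    def handle(self, request_json):
        req = json.loads(request_json)
        fn = self._tools.get(req["tool"])
        if fn is None:
            return json.dumps({"error": f"unknown tool {req['tool']}"})
        return json.dumps({"result": fn(**req.get("args", {}))})

server = ToolServer()

@server.tool
def patient_search(name):
    # Hypothetical tool; a real implementation would query a patient DB.
    records = {"jane doe": {"id": 17, "last_visit": "2025-03-02"}}
    return records.get(name.lower(), {})
```

Packaged as a Lambda handler and registered with the AgentCore gateway, each tool call would arrive as a request like `{"tool": "patient_search", "args": {"name": "Jane Doe"}}`.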
Key Highlights of SGLang for Scalable Inference
Performance Optimizations
Hierarchical KV cache leveraging GPU, CPU, and remote memory for improved latency and throughput
Speculative decoding v2 with better CPU-GPU overlap for up to 2.5x throughput gains
Prefill and decode disaggregation, and wide expert parallelism for large-scale deployments
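The hierarchical KV cache idea is to keep hot key/value blocks in fast GPU memory and spill colder ones to slower, larger tiers rather than recomputing them. The toy below sketches a two-tier LRU version of that policy ("GPU" tier backed by a "CPU" tier); SGLang's actual implementation manages attention KV blocks across GPU HBM, host memory, and remote storage:

```python
from collections import OrderedDict

class TieredKVCache:
    # Toy two-tier cache: small fast tier spills LRU entries to a
    # larger slow tier; a slow-tier miss would mean recomputation.
    def __init__(self, gpu_capacity, cpu_capacity):
        self.gpu = OrderedDict()
        self.cpu = OrderedDict()
        self.gpu_capacity, self.cpu_capacity = gpu_capacity, cpu_capacity

    def put(self, key, value):
        self.gpu[key] = value
        self.gpu.move_to_end(key)
        while len(self.gpu) > self.gpu_capacity:
            old_key, old_val = self.gpu.popitem(last=False)  # evict LRU
            self.cpu[old_key] = old_val
            while len(self.cpu) > self.cpu_capacity:
                self.cpu.popitem(last=False)  # dropped: recompute on demand

    def get(self, key):
        if key in self.gpu:
            self.gpu.move_to_end(key)
            return self.gpu[key], "gpu"
        if key in self.cpu:
            self.put(key, self.cpu.pop(key))  # promote back to fast tier
            return self.gpu[key], "cpu"
        return None, "miss"
```

A "cpu" hit costs a transfer instead of a full recomputation, which is where the latency and throughput improvement comes from.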
Multimodal Capabilities
Support for image and video generation, in addition to language models
Combination of autoregressive language models and diffusion models for multimodal input and output
Business Impact and Real-World Applications
Rapid adoption of agentic AI in enterprise software, expected to grow 33x from 2024 to 2028
Customizable, cost-effective, and high-quality AI agents can drive significant productivity gains in industries like healthcare, customer service, and knowledge work
SageMaker and Bedrock AgentCore provide a comprehensive platform to build, deploy, and manage scalable agentic AI applications, addressing key challenges around model customization, inference optimization, observability, and governance
Conclusion
The presented solutions leverage the capabilities of Amazon SageMaker and Amazon Bedrock AgentCore to enable data scientists and AI developers to build and deploy high-quality, cost-effective agentic AI applications at scale. By addressing the key challenges around model customization, inference optimization, observability, and governance, these tools empower organizations to accelerate the adoption of agentic AI and unlock significant productivity gains across industries.