AWS re:Invent 2025 - Scale AI agents with custom models using Amazon SageMaker AI & SGLang (AIM387)
Scaling AI Agents with Custom Models using Amazon SageMaker
Key Challenges in Building Agentic AI Applications
Lack of standardized tools to customize models with different techniques, leading to delays in time-to-market
Difficulty in finding the right instance and container configuration for cost-effective inference at scale
Fragmented observability across tools, making it hard to root cause and debug model behavior
Difficulty in productionizing experimental workflows into scalable, repeatable pipelines
Lack of governance practices to track, audit, and version models and generative AI assets
SageMaker Capabilities for Model Customization and Deployment
Model Training and Fine-tuning
Broad selection of pre-trained models, including open-source and Amazon-built models
Fine-tuning recipes for popular models and techniques, with support for both managed training jobs and persistent clusters
Automatic checkpointing and self-healing of training jobs to accelerate model development
Cost-Effective Inference at Scale
Deploy open-source or fine-tuned models on managed instances with a few steps
Host multiple models on a single endpoint for maximum GPU utilization
Leverage speculative decoding to achieve up to 2.5x higher throughput without compromising accuracy
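The throughput gain from speculative decoding comes from verifying several cheap draft tokens with a single forward pass of the large target model. The toy simulation below is a conceptual sketch of that draft-then-verify loop (random "models" over a 4-letter vocabulary, a hypothetical `accept_p` acceptance rate), not SageMaker's or SGLang's actual implementation:

```python
import random

def draft_tokens(k, rng):
    # Cheap draft model: propose k candidate tokens.
    return [rng.choice("abcd") for _ in range(k)]

def verify(proposed, rng, accept_p=0.7):
    # Target model checks drafts left-to-right; the first rejected draft
    # is replaced by the target's own token and the rest are discarded.
    accepted = []
    for tok in proposed:
        if rng.random() < accept_p:
            accepted.append(tok)
        else:
            accepted.append(rng.choice("abcd"))  # target's correction
            break
    return accepted

def generate(n_tokens, k=4, seed=0):
    rng = random.Random(seed)
    out, target_calls = [], 0
    while len(out) < n_tokens:
        out.extend(verify(draft_tokens(k, rng), rng))
        target_calls += 1  # one target forward pass per k-token draft batch
    return out[:n_tokens], target_calls
```

Because each target forward pass can accept up to `k` tokens, `target_calls` ends up well below the token count, which is where the "up to 2.5x" throughput figure comes from in practice.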
Observability and Experimentation Tracking
Integrated MLflow support for logging experiments, metrics, and agent traces in a centralized location
Partner AI apps for additional model and agent monitoring capabilities
SageMaker Pipelines for converting experimental workflows into scalable, repeatable pipelines
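Turning an experimental notebook flow into a repeatable pipeline amounts to naming each stage and passing artifacts between them explicitly. The sketch below illustrates that pattern in plain Python with made-up step names; SageMaker Pipelines expresses the same idea with managed, versioned steps executed on AWS infrastructure:

```python
# Conceptual sketch (not the SageMaker Pipelines SDK): each step takes and
# returns an artifact dict, so an ad-hoc notebook flow becomes a named,
# repeatable sequence that can be re-run end to end.
def preprocess(art):
    art["dataset"] = list(range(5))
    return art

def train(art):
    art["model"] = {"weights": sum(art["dataset"])}
    return art

def evaluate(art):
    art["metric"] = art["model"]["weights"] / len(art["dataset"])
    return art

def run_pipeline(steps, artifacts=None):
    artifacts = artifacts or {}
    for name, step in steps:
        artifacts = step(artifacts)
        artifacts.setdefault("log", []).append(name)  # execution record
    return artifacts

result = run_pipeline([("preprocess", preprocess),
                       ("train", train),
                       ("evaluate", evaluate)])
```

The execution log doubles as a minimal lineage record, which is what makes the pipeline auditable rather than a one-off experiment.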
Governance and Versioning
SageMaker Model Registry as a central hub to manage the entire lifecycle of machine learning models
Capture model metadata, training lineage, and compliance information for governance and auditing
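The value of a model registry is the metadata it forces you to capture per version: metrics, data lineage, and an approval gate before deployment. The sketch below shows that shape in pure Python with hypothetical field names; the real SageMaker Model Registry stores this server-side with IAM controls and audit logging:

```python
from dataclasses import dataclass

@dataclass
class ModelVersion:
    name: str
    version: int
    metrics: dict
    training_data: str      # lineage: which dataset produced this version
    approved: bool = False  # compliance gate before deployment

class ModelRegistry:
    # Conceptual sketch of registry behavior, not the AWS API.
    def __init__(self):
        self._models = {}

    def register(self, name, metrics, training_data):
        versions = self._models.setdefault(name, [])
        mv = ModelVersion(name, len(versions) + 1, metrics, training_data)
        versions.append(mv)
        return mv

    def approve(self, name, version):
        self._models[name][version - 1].approved = True

    def latest_approved(self, name):
        return max((m for m in self._models.get(name, []) if m.approved),
                   key=lambda m: m.version, default=None)
```

Querying only approved versions is what lets deployment automation pull a vetted model rather than whatever was trained last.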
Building Agentic Applications with SageMaker and Amazon Bedrock AgentCore
Deploying a Fine-tuned Clinical Agent
Use SageMaker to host a Llama-based model fine-tuned on a medical reasoning dataset
Leverage SageMaker Pipelines to orchestrate the end-to-end model customization and deployment workflow
Register the model in SageMaker Model Registry and MLflow Model Registry, including a model card for governance
Enhancing the Agent with Custom Tools
Implement a custom MCP (Model Context Protocol) server to provide patient search and medical report generation capabilities
Deploy the MCP server as an AWS Lambda function and register it with the Bedrock AgentCore gateway
Integrate the custom tools into the agent's workflow to provide a more comprehensive clinical assistant
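An MCP server exposes named tools behind a uniform request/response contract so the agent can discover and call them. The sketch below mimics only the registration-and-dispatch pattern in plain Python; the real Model Context Protocol SDK additionally handles transport, schemas, and tool discovery, and the `patient_search` tool here is hypothetical:

```python
import json

class ToolServer:
    # Conceptual sketch of an MCP-style tool server, not the MCP SDK.
    def __init__(self):
        self._tools = {}

    def tool(self, fn):
        # Decorator that registers a function as a callable tool.
        self._tools[fn.__name__] = fn
        return fn

    def handle(self, request_json):
        req = json.loads(request_json)
        fn = self._tools.get(req["tool"])
        if fn is None:
            return json.dumps({"error": f"unknown tool {req['tool']}"})
        return json.dumps({"result": fn(**req.get("args", {}))})

server = ToolServer()

@server.tool
def patient_search(name):
    # Hypothetical tool; a real implementation would query a patient DB.
    records = {"jane doe": {"id": 17, "last_visit": "2025-03-02"}}
    return records.get(name.lower(), {})
```

Packaged as a Lambda handler and registered with the AgentCore gateway, each tool call would arrive as a request like `{"tool": "patient_search", "args": {"name": "Jane Doe"}}`.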
Key Highlights of SGLang for Scalable Inference
Performance Optimizations
Hierarchical KV cache leveraging GPU, CPU, and remote memory for improved latency and throughput
Speculative decoding v2 with better CPU-GPU overlap for up to 2.5x throughput gains
Prefill and decode disaggregation, and wide expert parallelism for large-scale deployments
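The hierarchical KV cache idea is to keep hot key/value blocks in fast GPU memory and spill colder ones to slower, larger tiers rather than recomputing them. The toy below sketches a two-tier LRU version of that policy ("GPU" tier backed by a "CPU" tier); SGLang's actual implementation manages attention KV blocks across GPU HBM, host memory, and remote storage:

```python
from collections import OrderedDict

class TieredKVCache:
    # Toy two-tier cache: small fast tier spills LRU entries to a
    # larger slow tier; a slow-tier miss would mean recomputation.
    def __init__(self, gpu_capacity, cpu_capacity):
        self.gpu = OrderedDict()
        self.cpu = OrderedDict()
        self.gpu_capacity, self.cpu_capacity = gpu_capacity, cpu_capacity

    def put(self, key, value):
        self.gpu[key] = value
        self.gpu.move_to_end(key)
        while len(self.gpu) > self.gpu_capacity:
            old_key, old_val = self.gpu.popitem(last=False)  # evict LRU
            self.cpu[old_key] = old_val
            while len(self.cpu) > self.cpu_capacity:
                self.cpu.popitem(last=False)  # dropped: recompute on demand

    def get(self, key):
        if key in self.gpu:
            self.gpu.move_to_end(key)
            return self.gpu[key], "gpu"
        if key in self.cpu:
            self.put(key, self.cpu.pop(key))  # promote back to fast tier
            return self.gpu[key], "cpu"
        return None, "miss"
```

A "cpu" hit costs a transfer instead of a full recomputation, which is where the latency and throughput improvement comes from.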
Multimodal Capabilities
Support for image and video generation, in addition to language models
Combination of autoregressive language models and diffusion models for multimodal input and output
Business Impact and Real-World Applications
Rapid adoption of agentic AI in enterprise software, expected to grow 33x from 2024 to 2028
Customizable, cost-effective, and high-quality AI agents can drive significant productivity gains in industries like healthcare, customer service, and knowledge work
SageMaker and Bedrock AgentCore provide a comprehensive platform to build, deploy, and manage scalable agentic AI applications, addressing key challenges around model customization, inference optimization, observability, and governance
Conclusion
The presented solutions leverage the capabilities of Amazon SageMaker and Amazon Bedrock AgentCore to enable data scientists and AI developers to build and deploy high-quality, cost-effective agentic AI applications at scale. By addressing the key challenges around model customization, inference optimization, observability, and governance, these tools empower organizations to accelerate the adoption of agentic AI and unlock significant productivity gains across industries.