AWS re:Invent 2025 - Scale AI agents with custom models using Amazon SageMaker AI & SGLang (AIM387)

Scaling AI Agents with Custom Models using Amazon SageMaker

Key Challenges in Building Agentic AI Applications

  • Lack of standardized tools to customize models with different techniques, leading to delays in time-to-market
  • Difficulty in finding the right instance and container configuration for cost-effective inference at scale
  • Fragmented observability across tools, making it hard to root-cause issues and debug model behavior
  • Difficulty in productionizing experimental workflows into scalable, repeatable pipelines
  • Lack of governance practices to track, audit, and version models and generative AI assets

SageMaker Capabilities for Model Customization and Deployment

Model Training and Fine-tuning

  • Broad selection of pre-trained models, including open-source and Amazon-built models
  • Fine-tuning recipes for popular models and techniques, with support for both managed training jobs and persistent clusters
  • Automatic checkpointing and self-healing of training jobs to accelerate model development
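The checkpoint-and-resume behavior described above can be sketched with a toy training loop: state is saved after every step, and a retried job picks up from the last checkpoint rather than restarting. This is purely illustrative pure-Python, not the SageMaker mechanism itself; all names here are made up.

```python
import json
import tempfile
from pathlib import Path

# Toy illustration of automatic checkpointing and self-healing: the loop
# persists state each step, and a retry resumes from the last checkpoint.
CKPT = Path(tempfile.gettempdir()) / "toy_training_ckpt.json"
CKPT.unlink(missing_ok=True)  # start the demo from a clean slate

def save_checkpoint(step, loss):
    CKPT.write_text(json.dumps({"step": step, "loss": loss}))

def load_checkpoint():
    if CKPT.exists():
        return json.loads(CKPT.read_text())
    return {"step": 0, "loss": float("inf")}

def train(total_steps, fail_at=None):
    state = load_checkpoint()  # resume from the last saved state, if any
    for step in range(state["step"], total_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated node failure")
        state = {"step": step + 1, "loss": 1.0 / (step + 1)}
        save_checkpoint(**state)
    return state

# First run "fails" mid-way; the retry resumes from step 4 instead of step 0.
try:
    train(10, fail_at=4)
except RuntimeError:
    pass
final = train(10)
print(final["step"])  # → 10
```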

Cost-Effective Inference at Scale

  • Deploy open-source or fine-tuned models on managed instances with a few steps
  • Host multiple models on a single endpoint for maximum GPU utilization
  • Leverage speculative decoding to achieve up to 2.5x higher throughput without compromising accuracy
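The multi-model hosting idea above can be illustrated with a minimal sketch: one serving object hosts several models, loads each lazily on first use, and routes requests by model name so a single set of GPUs serves all of them. The class and "models" below are stand-ins, not the SageMaker API.

```python
# Toy sketch of a multi-model endpoint: one process hosts several models,
# loading each lazily and routing requests by model name. Everything here
# (names, loaders) is illustrative only.
class MultiModelEndpoint:
    def __init__(self, loaders):
        self._loaders = loaders   # model name -> function that "loads" it
        self._loaded = {}         # lazily populated cache of live models

    def invoke(self, model_name, payload):
        if model_name not in self._loaded:          # load on first use
            self._loaded[model_name] = self._loaders[model_name]()
        return self._loaded[model_name](payload)

endpoint = MultiModelEndpoint({
    "sentiment": lambda: (lambda text: "positive" if "good" in text else "negative"),
    "upper": lambda: (lambda text: text.upper()),
})
print(endpoint.invoke("sentiment", "a good result"))  # → positive
print(endpoint.invoke("upper", "hello"))              # → HELLO
```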

Observability and Experimentation Tracking

  • Integrated MLflow support for logging experiments, metrics, and agent traces in a centralized location
  • Partner AI apps for additional model and agent monitoring capabilities
  • SageMaker Pipelines for converting experimental workflows into scalable, repeatable pipelines
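The centralized-logging pattern that the MLflow integration provides (parameters, metrics, and traces gathered per run) can be sketched with a minimal in-memory tracker; the class below mimics the shape of calls like `log_param`/`log_metric` but is a toy, not the MLflow API.

```python
import time

# Minimal stand-in for the experiment-tracking pattern MLflow provides:
# each run collects its parameters and step-wise metrics in one record,
# so runs can be compared side by side. Illustrative only.
class RunTracker:
    def __init__(self, run_name):
        self.record = {"run": run_name, "params": {}, "metrics": [], "ts": time.time()}

    def log_param(self, key, value):
        self.record["params"][key] = value

    def log_metric(self, key, value, step):
        self.record["metrics"].append({"key": key, "value": value, "step": step})

run = RunTracker("llama-finetune-trial-1")   # hypothetical run name
run.log_param("learning_rate", 2e-5)
for step, loss in enumerate([0.9, 0.5, 0.3]):
    run.log_metric("train_loss", loss, step)
print(len(run.record["metrics"]))  # → 3
```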

Governance and Versioning

  • SageMaker Model Registry as a central hub to manage the entire lifecycle of machine learning models
  • Capture model metadata, training lineage, and compliance information for governance and auditing
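The kind of governance metadata a registry entry carries can be sketched as a small record type: name, version, lineage back to the base model and training data, and an auditable approval status. The field names below are our own, not the SageMaker Model Registry schema.

```python
from dataclasses import dataclass, field

# Illustrative sketch of a model-registry entry capturing lineage and
# compliance state; field names are assumptions, not the real schema.
@dataclass
class ModelRegistryEntry:
    name: str
    version: int
    training_data: str            # lineage: dataset used for fine-tuning
    base_model: str               # lineage: model this was derived from
    approval_status: str = "PendingManualApproval"
    metrics: dict = field(default_factory=dict)

entry = ModelRegistryEntry(
    name="clinical-assistant",                      # hypothetical model name
    version=3,
    training_data="medical-reasoning dataset (illustrative)",
    base_model="llama-base (illustrative)",
    metrics={"eval_accuracy": 0.87},
)
entry.approval_status = "Approved"   # audited state change before deployment
print(entry.version, entry.approval_status)  # → 3 Approved
```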

Building Agentic Applications with SageMaker and Bedrock AgentCore

Deploying a Fine-tuned Clinical Agent

  • Use SageMaker to host a Llama-based model fine-tuned on a medical reasoning dataset
  • Leverage SageMaker Pipelines to orchestrate the end-to-end model customization and deployment workflow
  • Register the model in SageMaker Model Registry and MLflow Model Registry, including a model card for governance
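The orchestration idea behind the steps above can be sketched as a chain of named steps, each consuming the previous step's output, so the experiment becomes a repeatable workflow. Step names, payloads, and the placeholder metric below are illustrative, not SageMaker Pipelines code.

```python
# Hedged sketch of the pipeline pattern: steps run in order, each taking
# the previous step's state, turning an ad hoc experiment into a
# repeatable workflow. All names and values here are made up.
def fine_tune(state):
    return {**state, "model": "fine-tuned-" + state["base_model"]}

def evaluate(state):
    return {**state, "eval_accuracy": 0.87}   # placeholder metric

def register(state):
    return {**state, "registered": state["eval_accuracy"] >= 0.8}

def run_pipeline(initial_state, steps):
    state = initial_state
    for step in steps:
        state = step(state)
    return state

result = run_pipeline({"base_model": "llama"}, [fine_tune, evaluate, register])
print(result["registered"])  # → True
```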

Enhancing the Agent with Custom Tools

  • Implement a custom MCP (Model Context Protocol) server to provide patient search and medical report generation capabilities
  • Deploy the MCP server as an AWS Lambda function and register it with the Bedrock AgentCore gateway
  • Integrate the custom tools into the agent's workflow to provide a more comprehensive clinical assistant
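A Lambda-hosted tool server of this kind can be sketched as a handler that receives a tool name and arguments from the gateway and dispatches to the matching implementation. The tool names, payload shape, and stub logic below are assumptions for illustration, not the actual event contract.

```python
import json

# Illustrative Lambda handler for an MCP-style tool server: the gateway
# forwards a tool name plus arguments, and the handler dispatches to the
# matching implementation. Tool names and event shape are assumptions.
def search_patients(args):
    # stand-in for a real patient-record lookup
    return [{"patient_id": "p-001", "name": args.get("name", "unknown")}]

def generate_report(args):
    return {"report": f"Summary for patient {args['patient_id']} (illustrative)"}

TOOLS = {"search_patients": search_patients, "generate_report": generate_report}

def lambda_handler(event, context):
    tool = event.get("tool")
    if tool not in TOOLS:
        return {"statusCode": 400, "body": json.dumps({"error": "unknown tool"})}
    result = TOOLS[tool](event.get("arguments", {}))
    return {"statusCode": 200, "body": json.dumps(result)}

resp = lambda_handler({"tool": "search_patients", "arguments": {"name": "Doe"}}, None)
print(resp["statusCode"])  # → 200
```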

Key Highlights of SGLang for Scalable Inference

Performance Optimizations

  • Hierarchical KV cache leveraging GPU, CPU, and remote memory for improved latency and throughput
  • Speculative decoding v2 with better CPU-GPU overlap for up to 2.5x throughput gains
  • Prefill-decode disaggregation and wide expert parallelism for large-scale deployment on petaflop-scale hardware
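The speculative-decoding idea above can be shown with a toy greedy version: a cheap draft model proposes a chunk of tokens, and the target model verifies them, accepting the longest agreeing prefix plus one corrected token. Both "models" below are deterministic lookup tables, purely for illustration; real systems verify probabilistically in one batched target pass.

```python
# Toy greedy speculative decoding. DRAFT and TARGET are stand-in
# next-token tables; they agree on most transitions, so several draft
# tokens are usually accepted per target verification round.
DRAFT = {"the": "cat", "cat": "sat", "sat": "on", "on": "a", "a": "mat"}
TARGET = {"the": "cat", "cat": "sat", "sat": "on", "on": "the", "a": "mat", "mat": "."}

def propose(last, k):
    out = []
    for _ in range(k):
        last = DRAFT.get(last, ".")
        out.append(last)
    return out

def decode(prompt, steps, k=3):
    tokens = list(prompt)
    for _ in range(steps):
        draft = propose(tokens[-1], k)
        last = tokens[-1]
        for tok in draft:
            expected = TARGET.get(last, ".")
            if tok == expected:
                tokens.append(tok)       # accepted draft token "for free"
                last = tok
            else:
                tokens.append(expected)  # target's correction; end this round
                break
    return tokens

print(decode(["the"], steps=2, k=3))  # → ['the', 'cat', 'sat', 'on', 'the']
```

In round one all three draft tokens are accepted; in round two the models disagree, so the target's token is kept instead, which is why the technique never changes the output distribution.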

Multimodal Capabilities

  • Support for image and video generation, in addition to language models
  • Combination of autoregressive language models and diffusion models for multimodal input and output

Business Impact and Real-World Applications

  • Rapid adoption of agentic AI in enterprise software, expected to grow 33x from 2024 to 2028
  • Customizable, cost-effective, and high-quality AI agents can drive significant productivity gains in industries like healthcare, customer service, and knowledge work
  • SageMaker and Bedrock AgentCore provide a comprehensive platform to build, deploy, and manage scalable agentic AI applications, addressing key challenges around model customization, inference optimization, observability, and governance

Conclusion

The presented solutions leverage the capabilities of Amazon SageMaker and Bedrock AgentCore to enable data scientists and AI developers to build and deploy high-quality, cost-effective agentic AI applications at scale. By addressing the key challenges around model customization, inference optimization, observability, and governance, these tools empower organizations to accelerate the adoption of agentic AI and unlock significant productivity gains across various industries.
