AWS re:Invent 2025 - Customize models for agentic AI at scale with SageMaker AI and Bedrock (AIM381)

Customizing Models for Agentic AI at Scale with SageMaker

Overview

  • The presentation covered new capabilities in AWS SageMaker to enable data scientists and AI developers to customize models and deploy them at scale for building high-quality, cost-effective agentic AI applications.
  • Key trends highlighted include the rapid adoption of agentic AI in enterprise software (expected 33x increase from 2024 to 2028) and the need for autonomous decision-making by AI agents, requiring scalable compute and customizable models.
  • However, most agentic AI applications fail to reach production due to challenges in model customization, observability, asset management, and cost-effective inference.

Serverless Model Customization

  • SageMaker now offers a serverless model customization experience, providing:
    • Access to a broad choice of foundation models that can be customized using organization-specific data
    • Support for various fine-tuning techniques, including reinforcement learning
    • A fully managed, serverless experience that handles infrastructure provisioning
  • The customization workflow includes:
    1. Selecting a base model and fine-tuning technique
    2. Uploading or selecting a dataset
    3. Defining a reward function (including the ability to bring in custom code)
    4. Initiating the fine-tuning job, which is automatically checkpointed and resumed in case of node failures
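The talk notes you can bring custom code for the reward function in step 3. The exact interface SageMaker expects was not shown, but a reward function for RL-based fine-tuning generally maps a prompt/completion pair to a scalar score. A minimal illustrative sketch (the signature and the scoring rules are assumptions, not the documented API):

```python
# Hypothetical reward function for RL-based fine-tuning.
# The callback interface SageMaker expects is not shown in the talk;
# this illustrates the general shape: (prompt, completion) -> score.

def reward_fn(prompt: str, completion: str) -> float:
    """Score a completion: reward non-empty answers, penalize verbosity."""
    if not completion.strip():
        return 0.0  # no answer earns no reward
    score = 1.0
    # Penalize completions beyond an assumed 200-word budget.
    words = len(completion.split())
    if words > 200:
        score -= min(0.5, (words - 200) / 400)
    return score

print(reward_fn("Triage this patient", "Refer to urgent care."))  # -> 1.0
```

In practice the reward would encode task-specific quality signals (correctness, safety, format compliance) rather than length alone.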

SageMaker Pipelines for Model Customization and Deployment

  • SageMaker Pipelines now include new steps purpose-built for model customization and deployment, allowing users to accelerate development by leveraging pre-built integrations.
  • Pipelines can be used to automate the end-to-end workflow, from data processing to model training, evaluation, and deployment to SageMaker endpoints or Bedrock.
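Conceptually, that end-to-end workflow is a DAG of dependent steps: data processing feeds training, evaluation gates deployment. A toy sketch of the dependency ordering using only the standard library (this is not the SageMaker Pipelines SDK, just an illustration of the step graph):

```python
from graphlib import TopologicalSorter

# Illustrative step graph for a customization pipeline:
# each key lists the steps it depends on.
steps = {
    "train":    {"process"},   # training needs processed data
    "evaluate": {"train"},     # evaluation needs the trained model
    "deploy":   {"evaluate"},  # deploy only after evaluation passes
    "process":  set(),
}

order = list(TopologicalSorter(steps).static_order())
print(order)  # -> ['process', 'train', 'evaluate', 'deploy']
```

The real SDK expresses the same idea through purpose-built step objects whose input/output references imply the ordering.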

Serverless MLflow for Observability

  • Serverless MLflow provides a fully managed, scalable solution for tracking experiments, evaluations, and agent traces, without the need to manage any infrastructure.
  • MLflow is deeply integrated into the model customization experience, allowing users to easily access performance metrics and compare different fine-tuning iterations.
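The core value of that integration is comparing metrics across fine-tuning iterations against the base model. A minimal sketch of that comparison over run records of the kind MLflow tracks (the run names and metric values here are made up for illustration):

```python
# Toy run records mimicking MLflow-tracked experiments
# (names and accuracy values are invented for the sketch).
runs = [
    {"run": "base-model",      "accuracy": 0.71},
    {"run": "sft-iteration-1", "accuracy": 0.78},
    {"run": "sft-iteration-2", "accuracy": 0.84},
]

best = max(runs, key=lambda r: r["accuracy"])
lift = best["accuracy"] - runs[0]["accuracy"]
print(best["run"], round(lift, 2))  # -> sft-iteration-2 0.13
```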

Serverless Model Evaluation

  • SageMaker now offers a serverless model evaluation experience, allowing users to leverage popular industry benchmarks, as well as custom metrics, to evaluate their fine-tuned models.
  • Evaluation results are automatically logged to MLflow, enabling easy comparison against the base model and across different fine-tuning experiments.
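As an example of the custom-metric side, a simple exact-match accuracy computed over a benchmark, comparing base and fine-tuned model outputs (the labels and predictions below are toy data, not from the talk):

```python
# Minimal custom metric: exact-match accuracy over a small benchmark,
# comparing base vs. fine-tuned outputs (toy data, assumed).
def exact_match(predictions, references):
    hits = sum(p.strip().lower() == r.strip().lower()
               for p, r in zip(predictions, references))
    return hits / len(references)

refs  = ["urgent", "routine", "urgent", "emergency"]
base  = ["routine", "routine", "urgent", "urgent"]
tuned = ["urgent", "routine", "urgent", "emergency"]

print(exact_match(base, refs), exact_match(tuned, refs))  # -> 0.5 1.0
```

Metrics like this, alongside standard benchmarks, are what get logged to MLflow for side-by-side comparison.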

Agent Observability and Integration

  • Agent observability is integrated into CloudWatch, providing dashboards to monitor agent traces.
  • For models customized using SageMaker, agent traces can also be emitted in OpenTelemetry format and logged to managed MLflow or partner AI apps like Comet ML and Fiddler, enabling comprehensive root cause analysis.
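To make the OpenTelemetry point concrete, an agent trace is a tree of spans linked by parent IDs: the agent invocation is the root, and each tool call or model call is a child. A minimal stand-in for such a span record (field names follow OTel conventions; this is not a real exporter, and `triage_lookup` is a hypothetical tool name):

```python
import time
import uuid

# Sketch of an OpenTelemetry-style span record for one agent step.
def make_span(name, parent_id=None, attributes=None):
    return {
        "trace_id": uuid.uuid4().hex,          # 32-hex-char trace id
        "span_id": uuid.uuid4().hex[:16],      # 16-hex-char span id
        "parent_span_id": parent_id,
        "name": name,
        "start_time_unix_nano": time.time_ns(),
        "attributes": attributes or {},
    }

root = make_span("agent.invoke")
tool = make_span("tool.call", parent_id=root["span_id"],
                 attributes={"tool.name": "triage_lookup"})
print(tool["parent_span_id"] == root["span_id"])  # -> True
```

Root-cause analysis works by walking this parent/child structure to find the span where latency or an error was introduced.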

Cost-effective Inference with SageMaker Endpoints

  • SageMaker endpoints now support deploying multiple foundation models on the same endpoint, cutting costs by up to 50% through better resource utilization.
  • Speculative decoding, a new technique, can speed up inference by up to 2.5x without compromising accuracy by using a smaller "draft" model to propose tokens that the larger target model then verifies.

Tracking Generative AI Assets

  • SageMaker can now track and version not only models but also other generative AI assets, such as datasets and reward functions, enabling comprehensive lineage tracking.
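One common way to implement this kind of lineage (a general pattern, not necessarily how SageMaker does it internally) is content-addressed versioning: hash each asset's content so identical inputs always map to the same version ID, then record which versions produced a given model:

```python
import hashlib
import json

# Content-addressed versioning sketch: identical content -> same id,
# so a model's lineage can point at the exact dataset and reward code.
def asset_version(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()[:12]

dataset_v = asset_version(b"triage_dataset_rows...")
reward_v  = asset_version(b"def reward_fn(...): ...")

lineage = {
    "model": "triage-sft-1",            # hypothetical model name
    "dataset_version": dataset_v,
    "reward_fn_version": reward_v,
}
print(json.dumps(lineage, indent=2))
```

Any change to the dataset or reward function yields a new version ID, so the lineage record unambiguously identifies the inputs behind each fine-tuned model.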

Business Impact and Use Cases

  • The new SageMaker capabilities address key challenges that have historically prevented agentic AI applications from reaching production, including:
    • Lack of standardized tools for model customization
    • Fragmented observability of models and agents
    • Difficulty in tracking and managing evolving AI assets
    • Complexity in building cost-effective inference stacks
  • By overcoming these challenges, organizations can more effectively leverage agentic AI to drive business value, such as:
    • Automating more decisions within enterprise software applications
    • Deploying high-quality, cost-effective AI agents to assist customers or employees

Demonstration

  • The presentation included a demonstration of the new SageMaker model customization and agent integration capabilities, using a medical triage agent as an example.
  • Key steps included:
    1. Customizing an open-source model for the medical triage use case using supervised fine-tuning
    2. Evaluating the fine-tuned model using industry-standard benchmarks and custom metrics, with results logged to MLflow
    3. Deploying the customized model to a SageMaker endpoint
    4. Integrating the model with an agentic AI workflow using the Strands SDK, and later the Bedrock AgentCore runtime
    5. Leveraging MLflow to observe agent traces and debug the agent's decision-making process

Conclusion

  • The new SageMaker capabilities provide a comprehensive platform for data scientists and AI developers to customize models, build agentic AI applications, and deploy them at scale, addressing key challenges that have historically hindered the adoption of agentic AI in enterprise software.
  • By enabling more effective model customization, observability, asset management, and cost-optimized inference, organizations can accelerate the development and deployment of high-quality, autonomous agentic AI solutions to drive business value.
