AWS re:Invent 2025 - Unlock Advanced Model Training: Reinforcement Fine-tuning on Bedrock (AIM3327)

Introduction to Reinforcement Fine-Tuning (RFT)

  • Fine-tuning is a way to adjust an existing base model to fit a specific use case
  • Base models trained on vast internet data lack the specific details, tone, and style required for a company's needs
  • Traditional supervised fine-tuning (SFT) has challenges:
    • Data-hungry, requiring large high-quality labeled datasets
    • Rigid, with models potentially memorizing examples rather than adapting
    • Prone to model drift over time as data rules change

Reinforcement Fine-Tuning on Amazon Bedrock

  • RFT allows models to learn from a small set of examples, explore thousands of solutions automatically, and use the best solution to improve themselves
  • RFT does not require massive high-quality labeled datasets or deep ML expertise
  • The RFT process:
    1. Provide data from multiple sources (e.g., files, S3, logs)
    2. Define "what good looks like" using a reward function (pre-built templates or custom Lambda)
    3. Start training, with visibility into metrics like training/validation rewards and episode length
  • Once training is complete, the fine-tuned model can be deployed for on-demand inference with pay-as-you-go pricing
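Step 2 above mentions defining "what good looks like" with a pre-built template or a custom Lambda. As a minimal sketch of the idea, here is a hypothetical reward-function Lambda that scores each model completion against a reference answer; the event and response shapes are assumptions for illustration, not the official Bedrock RFT contract.

```python
def lambda_handler(event, context):
    """Hypothetical RFT reward function.

    Assumed event shape (illustrative, not the official Bedrock schema):
    {"records": [{"completion": "...", "reference": "..."}, ...]}
    Returns one scalar reward in [0.0, 1.0] per record.
    """
    rewards = []
    for record in event.get("records", []):
        completion = record.get("completion", "")
        reference = record.get("reference", "")
        # Toy scoring: fraction of reference tokens that appear
        # in the model's completion. A real reward function would
        # encode the task-specific notion of "good".
        completion_tokens = set(completion.lower().split())
        reference_tokens = set(reference.lower().split())
        if not reference_tokens:
            rewards.append(0.0)
            continue
        overlap = len(completion_tokens & reference_tokens) / len(reference_tokens)
        rewards.append(round(overlap, 4))
    return {"rewards": rewards}
```

The key design point is that the reward is a scalar per attempt, which is what lets RFT rank the thousands of explored solutions without needing a labeled "correct" output for each one.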

Demonstration of RFT on Bedrock

  • Walkthrough of the Bedrock console to create an RFT job:
    • Select the model to fine-tune (e.g., Nova Lite 2)
    • Upload data in JSON format (e.g., financial Q&A, sentiment analysis)
    • Choose a pre-built reward function template or create a custom Lambda
    • Configure training hyperparameters like epochs and learning rate
  • Example of testing the fine-tuned model in the Bedrock playground:
    • Comparing performance to the base model
    • Observing the model's real-time response to a complex financial Q&A prompt
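The demo uploads training data in JSON; fine-tuning datasets bound for S3 are commonly written as JSON Lines, one record per line. A small sketch of producing such a file, assuming a simple prompt/reference record shape (the field names are illustrative, since the exact schema Bedrock expects is not spelled out in the talk):

```python
import json

# Hypothetical financial Q&A and sentiment records; the "prompt" /
# "reference" field names are illustrative, not the official schema.
records = [
    {
        "prompt": "What was ACME Corp's revenue growth in Q3?",
        "reference": "Revenue grew 12% year over year in Q3.",
    },
    {
        "prompt": "Classify the sentiment of: 'Margins beat expectations.'",
        "reference": "positive",
    },
]

# Write one JSON object per line (JSON Lines).
with open("train.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Read it back to verify every line parses independently,
# which is what makes the format easy to stream and shard.
with open("train.jsonl") as f:
    parsed = [json.loads(line) for line in f]
```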

Salesforce's Use Case: Agentforce 360

  • Salesforce's enterprise AI platform, Agentforce, leverages RFT to build specialized models
  • Goals: High accuracy, low latency, and high explainability for latency-sensitive applications
  • Salesforce's in-house "Tax Eval" model, built using RFT:
    • Trained on a mix of public and synthetic data
    • Outperforms the GPT-4 base model on instruction adherence (97% vs. 88%) and task completion (95% vs. 83%)
    • Costs less than 10% of the GPT-4 model
  • Applying RFT to build reasoning models for Agentforce 360's "Agent Graph" architecture

Key Takeaways

  • Bedrock's RFT feature democratizes advanced model fine-tuning for all developers, without requiring deep ML expertise
  • RFT can significantly improve model performance (up to 60-70%) compared to base models, while reducing costs
  • Customers like Salesforce are leveraging RFT to build specialized, high-performing models for latency-sensitive enterprise AI applications
  • Bedrock continues to innovate with the latest models, customization features, and agentic AI capabilities to serve developers' evolving needs
