AWS re:Invent 2025 - Unlock Advanced Model Training: Reinforcement Fine-tuning on Bedrock (AIM3327)
Introduction to Reinforcement Fine-Tuning (RFT)
Fine-tuning is a way to adjust an existing base model to fit a specific use case
Base models trained on vast internet data lack the specific details, tone, and style required for a company's needs
Traditional supervised fine-tuning (SFT) has challenges:
Data-hungry, requiring large, high-quality labeled datasets
Rigid, with models potentially memorizing examples rather than generalizing
Prone to model drift over time as data and rules change
Reinforcement Fine-Tuning on Amazon Bedrock
RFT allows models to learn from a small set of examples, explore thousands of solutions automatically, and use the best solution to improve themselves
RFT does not require massive high-quality labeled datasets or deep ML expertise
The RFT process:
Provide data from multiple sources (e.g., files, S3, logs)
Define "what good looks like" using a reward function (pre-built templates or custom Lambda)
Start training, with visibility into metrics like training/validation rewards and episode length
Once training is complete, the fine-tuned model can be deployed for on-demand inference with pay-as-you-go pricing
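The custom reward function mentioned above can be implemented as an AWS Lambda. The event shape and field names below are assumptions for illustration, not Bedrock's documented contract; this is a minimal sketch of the idea of scoring model completions against references:

```python
def lambda_handler(event, context):
    """Hypothetical RFT reward function: score each model completion
    against a reference answer. Field names ("samples", "completion",
    "reference", "scores") are assumed for this sketch."""
    scores = []
    for sample in event.get("samples", []):
        completion = sample.get("completion", "").strip().lower()
        reference = sample.get("reference", "").strip().lower()
        # Full reward for an exact match, partial credit if the
        # reference answer appears inside the completion
        if completion == reference:
            reward = 1.0
        elif reference and reference in completion:
            reward = 0.5
        else:
            reward = 0.0
        scores.append({"reward": reward})
    return {"scores": scores}
```

A graded reward like this (rather than a binary pass/fail) gives the training loop a smoother signal to explore against.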
Demonstration of RFT on Bedrock
Walkthrough of the Bedrock console to create an RFT job:
Select the model to fine-tune (e.g., Nova Lite 2)
Upload data in JSON format (e.g., financial Q&A, sentiment analysis)
Choose a pre-built reward function template or create a custom Lambda
Configure training hyperparameters like epochs and learning rate
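A single record in the uploaded training file might look like the following; the talk does not show the exact schema Bedrock expects, so the field names here are illustrative:

```python
import json

# Illustrative prompt/completion record for a financial Q&A dataset
# (schema assumed for this sketch, not Bedrock's documented format)
record = {
    "prompt": "What is the debt-to-equity ratio if total liabilities "
              "are $500K and shareholder equity is $250K?",
    "completion": "Debt-to-equity = 500,000 / 250,000 = 2.0",
}

# Fine-tuning datasets are commonly JSON Lines: one record per line
line = json.dumps(record)
print(line)
```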
Example of testing the fine-tuned model in the Bedrock playground:
Comparing performance to the base model
Observing the model's real-time response to a complex financial Q&A prompt
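The same base-versus-fine-tuned comparison shown in the playground can be scripted against the Bedrock Converse API; the model identifiers below are placeholders, and the call requires AWS credentials:

```python
# Placeholder identifiers for the base model and the RFT-customized model
BASE_MODEL_ID = "amazon.nova-lite-example"  # assumed, not a real model ID
CUSTOM_MODEL_ARN = "arn:aws:bedrock:us-east-1:123456789012:custom-model/example"

def build_request(model_id, question):
    """Build a Bedrock Converse API request for one model and prompt."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": question}]}],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.2},
    }

if __name__ == "__main__":
    import boto3  # requires AWS credentials and boto3 installed

    client = boto3.client("bedrock-runtime")
    question = "Explain the impact of rising interest rates on bond prices."
    for model in (BASE_MODEL_ID, CUSTOM_MODEL_ARN):
        response = client.converse(**build_request(model, question))
        print(model, "->", response["output"]["message"]["content"][0]["text"])
```

Running the same prompt through both models side by side is the scripted equivalent of the playground comparison in the demo.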
Salesforce's Use Case: Agentforce 360
Salesforce's enterprise AI platform, Agentforce, leverages RFT to build specialized models
Goals: High accuracy, low latency, and high explainability for latency-sensitive applications
Salesforce's in-house "Tax Eval" model, built using RFT:
Trained on a mix of public and synthetic data
Outperforms the GPT-4 base model on instruction adherence (97% vs. 88%) and task completion (95% vs. 83%)
Costs less than 10% of the GPT-4 model
Applying RFT to build reasoning models for Agentforce 360's "Agent Graph" architecture
Key Takeaways
Bedrock's RFT feature democratizes advanced model fine-tuning for all developers, without requiring deep ML expertise
RFT can improve model performance by up to 60-70% over base models while reducing costs
Customers like Salesforce are leveraging RFT to build specialized, high-performing models for latency-sensitive enterprise AI applications
Bedrock continues to innovate with the latest models, customization features, and agentic AI capabilities to serve developers' evolving needs