AWS re:Invent 2025 - Unlock Advanced Model Training: Reinforcement Fine-tuning on Bedrock (AIM3327)
Introduction to Reinforcement Fine-Tuning (RFT)
Fine-tuning is a way to adjust an existing base model to fit a specific use case
Base models trained on vast internet data lack the specific details, tone, and style required for a company's needs
Traditional supervised fine-tuning (SFT) has challenges:
Data-hungry, requiring large, high-quality labeled datasets
Rigid, with models potentially memorizing examples rather than generalizing
Prone to model drift over time as data and rules change
Reinforcement Fine-Tuning on Amazon Bedrock
RFT allows models to learn from a small set of examples, explore thousands of solutions automatically, and use the best solution to improve themselves
RFT does not require massive high-quality labeled datasets or deep ML expertise
The RFT process:
Provide data from multiple sources (e.g., files, S3, logs)
Define "what good looks like" using a reward function (pre-built templates or custom Lambda)
Start training, with visibility into metrics like training/validation rewards and episode length
Once training is complete, the fine-tuned model can be deployed for on-demand inference with pay-as-you-go pricing
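The custom reward function mentioned above can be implemented as an AWS Lambda. The event shape and field names below are assumptions for illustration, not Bedrock's documented contract; this is a minimal sketch of the idea of scoring model completions against references:

```python
def lambda_handler(event, context):
    """Hypothetical RFT reward function: score each model completion
    against a reference answer. Field names ("samples", "completion",
    "reference", "scores") are assumed for this sketch."""
    scores = []
    for sample in event.get("samples", []):
        completion = sample.get("completion", "").strip().lower()
        reference = sample.get("reference", "").strip().lower()
        # Full reward for an exact match, partial credit if the
        # reference answer appears inside the completion
        if completion == reference:
            reward = 1.0
        elif reference and reference in completion:
            reward = 0.5
        else:
            reward = 0.0
        scores.append({"reward": reward})
    return {"scores": scores}
```

A graded reward like this (rather than a binary pass/fail) gives the training loop a smoother signal to explore against.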
Demonstration of RFT on Bedrock
Walkthrough of the Bedrock console to create an RFT job:
Select the model to fine-tune (e.g., Nova Lite 2)
Upload data in JSON format (e.g., financial Q&A, sentiment analysis)
Choose a pre-built reward function template or create a custom Lambda
Configure training hyperparameters like epochs and learning rate
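A single record in the uploaded training file might look like the following; the talk does not show the exact schema Bedrock expects, so the field names here are illustrative:

```python
import json

# Illustrative prompt/completion record for a financial Q&A dataset
# (schema assumed for this sketch, not Bedrock's documented format)
record = {
    "prompt": "What is the debt-to-equity ratio if total liabilities "
              "are $500K and shareholder equity is $250K?",
    "completion": "Debt-to-equity = 500,000 / 250,000 = 2.0",
}

# Fine-tuning datasets are commonly JSON Lines: one record per line
line = json.dumps(record)
print(line)
```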
Example of testing the fine-tuned model in the Bedrock playground:
Comparing performance to the base model
Observing the model's real-time response to a complex financial Q&A prompt
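The same base-versus-fine-tuned comparison shown in the playground can be scripted against the Bedrock Converse API; the model identifiers below are placeholders, and the call requires AWS credentials:

```python
# Placeholder identifiers for the base model and the RFT-customized model
BASE_MODEL_ID = "amazon.nova-lite-example"  # assumed, not a real model ID
CUSTOM_MODEL_ARN = "arn:aws:bedrock:us-east-1:123456789012:custom-model/example"

def build_request(model_id, question):
    """Build a Bedrock Converse API request for one model and prompt."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": question}]}],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.2},
    }

if __name__ == "__main__":
    import boto3  # requires AWS credentials and boto3 installed

    client = boto3.client("bedrock-runtime")
    question = "Explain the impact of rising interest rates on bond prices."
    for model in (BASE_MODEL_ID, CUSTOM_MODEL_ARN):
        response = client.converse(**build_request(model, question))
        print(model, "->", response["output"]["message"]["content"][0]["text"])
```

Running the same prompt through both models side by side is the scripted equivalent of the playground comparison in the demo.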
Salesforce's Use Case: Agentforce 360
Salesforce's enterprise AI platform, Agentforce, leverages RFT to build specialized models
Goals: High accuracy, low latency, and high explainability for latency-sensitive applications
Salesforce's in-house "Tax Eval" model, built using RFT:
Trained on a mix of public and synthetic data
Outperforms the GPT-4 base model on instruction adherence (97% vs. 88%) and task completion (95% vs. 83%)
Costs less than 10% of the GPT-4 model
Applying RFT to build reasoning models for Agentforce 360's "Agent Graph" architecture
Key Takeaways
Bedrock's RFT feature democratizes advanced model fine-tuning for all developers, without requiring deep ML expertise
RFT can improve model performance by up to 60-70% over base models while reducing costs
Customers like Salesforce are leveraging RFT to build specialized, high-performing models for latency-sensitive enterprise AI applications
Bedrock continues to innovate with the latest models, customization features, and agentic AI capabilities to serve developers' evolving needs