AWS re:Invent 2025 - Fine-tuning LLMs for Multi-Agent Orchestration: Cosine AI Case Study (SPS402)
Summary of "Fine-tuning LLMs for Multi-Agent Orchestration: Cosine AI Case Study"
Introduction to the AWS Generative AI Innovation Center
The Generative AI Innovation Center is a $200 million AWS program focused on helping customers adopt generative AI and accelerating their efforts.
The team includes generative AI strategists, applied scientists, and machine learning engineers who work hands-on with customers.
The team specializes in model customization (optimizing models for accuracy, latency, and cost) as well as hardware optimization.
Understanding Agents and Multi-Agent Systems
Agents are autonomous software systems that leverage AI to reason, plan, and complete tasks on behalf of humans or other systems.
Agents offer a higher level of automation than rule-based systems, allowing them to adapt to changing contexts and handle ambiguity.
Key advancements enabling practical agents include improved model reasoning, increased model choice, better data/knowledge integration, and improved agent development tools.
Multi-agent systems involve an orchestrator agent that breaks down tasks into subtasks for specialized worker agents to execute.
Benefits of multi-agent systems include modularity, scalability, and the ability to right-size models for specific tasks.
Challenges include managing latency, cost, task decomposition, context management, and error propagation.
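To make the orchestrator/worker split concrete, here is a minimal Python sketch. It is not from the talk: the worker functions, the `ROUTES` table, and the fixed `decompose` step are hypothetical stand-ins for real model calls. It shows where the challenges listed above surface in code: routing subtasks to right-sized models, handing each worker a trimmed context, and stopping a failed subtask before its error propagates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical workers standing in for differently sized models.
# A real system would call a model endpoint; plain functions keep the sketch runnable.
def small_worker(subtask: str, context: str) -> str:
    return f"[small] {subtask}"

def large_worker(subtask: str, context: str) -> str:
    return f"[large] {subtask} (context: {context})"

@dataclass
class Subtask:
    description: str
    kind: str  # used to route the subtask to a right-sized worker

# Assumed routing table: simple subtasks go to the cheaper, faster model.
ROUTES: Dict[str, Callable[[str, str], str]] = {
    "lint": small_worker,
    "refactor": large_worker,
}

def decompose(task: str) -> List[Subtask]:
    # Fixed decomposition for illustration; in a real system the
    # orchestrator model would produce this plan.
    return [
        Subtask(f"lint {task}", "lint"),
        Subtask(f"refactor {task}", "refactor"),
    ]

def orchestrate(task: str, max_retries: int = 1) -> List[str]:
    results: List[str] = []
    for sub in decompose(task):
        worker = ROUTES[sub.kind]
        # Context management: pass a truncated summary of the last two
        # results instead of the full history.
        context = "; ".join(r[:40] for r in results[-2:])
        for _ in range(max_retries + 1):
            output = worker(sub.description, context)
            if output:  # minimal check so an empty result is not passed downstream
                results.append(output)
                break
        else:
            # Retries exhausted: fail loudly rather than propagate a bad result.
            raise RuntimeError(f"subtask failed: {sub.description}")
    return results
```

Calling `orchestrate("utils.py")` sends the lint step to the small worker and the refactor step to the large one. The latency, cost, and error-propagation trade-offs become explicit choices in `ROUTES`, the context summary, and the retry policy.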
Techniques for Customizing Agents
Model Distillation
Distills knowledge from a large, capable "teacher" model into a smaller, faster, and more cost-effective "student" model.
Allows deploying smaller models to perform the same tasks as larger models, reducing latency and operational costs.
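The talk describes distillation at a conceptual level. One common formulation is the logit-matching loss of Hinton et al. (2015), sketched below in plain Python: the student is trained to match the teacher's temperature-softened output distribution. This is an illustrative assumption, not necessarily what Cosine used; LLM distillation is also often done by fine-tuning the student on text generated by the teacher.

```python
import math
from typing import List

def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    # Temperature > 1 softens the distribution, exposing how the teacher
    # ranks the non-top choices, not just its single best answer.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits: List[float],
                      student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's softened prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

In practice this term is computed per token over the full vocabulary and usually mixed with ordinary cross-entropy on ground-truth labels; the `T^2` factor keeps its gradients on a comparable scale as the temperature changes.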
Supervised Fine-tuning
Trains models on domain-specific data and patterns to handle specialized terminology, formats, and constraints.
Improves task decomposition and context management in multi-agent systems by teaching agents their specific roles.
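The talk does not describe Cosine's training data format. A common pattern for role-specific SFT is sketched below under that assumption: each example pairs a role prompt with the target output, and the prompt positions are labeled `-100` (the default `ignore_index` of PyTorch's `CrossEntropyLoss`) so the loss covers only what the agent itself should produce. The role prompts and the toy whitespace tokenizer are hypothetical.

```python
from typing import Dict, List, Tuple

IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss

# Hypothetical role prompts: each agent is tuned on examples of its own role.
ROLE_PROMPTS: Dict[str, str] = {
    "orchestrator": "You are the orchestrator. Break the task into subtasks.",
    "worker": "You are a coding worker. Complete exactly one subtask.",
}

def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    # Toy whitespace tokenizer that assigns IDs on first sight;
    # a real pipeline would use the base model's tokenizer.
    return [vocab.setdefault(tok, len(vocab)) for tok in text.split()]

def build_example(role: str, user: str, target: str,
                  vocab: Dict[str, int]) -> Tuple[List[int], List[int]]:
    """Return (input_ids, labels) with prompt positions masked out of the loss."""
    prompt_ids = encode(ROLE_PROMPTS[role], vocab) + encode(user, vocab)
    target_ids = encode(target, vocab)
    input_ids = prompt_ids + target_ids
    # Loss is computed only on the agent's own output, so the model learns
    # its role's response format instead of reproducing the prompt.
    labels = [IGNORE_INDEX] * len(prompt_ids) + target_ids
    return input_ids, labels
```

Building orchestrator examples and worker examples from separate data with separate role prompts is what lets each model specialize in its part of the workflow.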
Preference Optimization
Aligns agent outputs to preferred styles, tones, and formats to create a consistent user experience across multiple agents.
Teaches agents to produce responses that are not just accurate, but also well-formatted and aligned with customer preferences.
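The talk does not name a specific preference-optimization algorithm. Direct Preference Optimization (DPO; Rafailov et al., 2023) is one widely used choice, and its per-pair loss is sketched below as an assumption: it raises the policy's likelihood of the preferred response relative to a frozen reference model, and lowers it for the rejected one.

```python
import math

def neg_log_sigmoid(x: float) -> float:
    # -log(sigmoid(x)) = log(1 + exp(-x)), computed without overflow
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) response pair.

    Each *_logp is the summed token log-probability of a full response under
    the policy being trained or under the frozen reference model.
    """
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    # beta controls how strongly the policy is pushed away from the reference
    return neg_log_sigmoid(beta * (chosen_shift - rejected_shift))
```

For style alignment across agents, the "chosen" response would follow the house format and tone, and the "rejected" one would be similar in content but break it. While the policy still matches the reference model, the loss equals `log 2`; it falls as the policy favors the chosen response.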
Reinforcement Fine-tuning
Enables agents to learn through trial and error, making sequential decisions and receiving feedback when they reach successful outcomes.
Particularly useful for tasks like code generation, where agents need to explore different approaches to find the optimal solution.
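For code generation, a natural "successful outcome" signal is whether generated code passes tests. The sketch below assumes that setup: `code_reward` scores each sampled attempt by the fraction of unit tests it passes, and `group_advantages` turns a group of attempts into relative learning signals, as in GRPO (introduced with DeepSeekMath). The talk does not say which RL algorithm Cosine used; the function names and test cases are hypothetical.

```python
import statistics
from typing import Callable, List, Tuple

UnitTest = Tuple[tuple, object]  # (positional args, expected return value)

def code_reward(candidate_src: str, func_name: str, tests: List[UnitTest]) -> float:
    """Outcome reward: fraction of unit tests the generated function passes.

    WARNING: exec() runs model-generated code in this process. A real
    training loop would execute candidates in an isolated sandbox.
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)
        fn: Callable = namespace[func_name]
    except Exception:
        return 0.0  # code that fails to compile or define the function earns nothing
    passed = 0
    for args, expected in tests:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash on one test just earns no credit for it
    return passed / len(tests)

def group_advantages(rewards: List[float]) -> List[float]:
    """GRPO-style advantages: each sample's reward relative to its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0] * len(rewards)  # identical rewards carry no learning signal
    return [(r - mean) / std for r in rewards]

if __name__ == "__main__":
    tests = [((2, 3), 5), ((-1, 1), 0)]
    # Several sampled attempts at the same task (trial and error)
    candidates = [
        "def add(a, b):\n    return a + b",
        "def add(a, b):\n    return a - b",
        "def add(a, b) return a + b",  # syntax error
    ]
    rewards = [code_reward(src, "add", tests) for src in candidates]
    print(rewards)  # [1.0, 0.0, 0.0]
    print(group_advantages(rewards))
```

Attempts that beat their group's average get a positive advantage and are reinforced; the rest are discouraged. The policy-gradient update that consumes these advantages is left out of the sketch.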
Cosine AI's Multi-Agent Orchestration Solution
Cosine AI builds specialized coding agents for large enterprises, often in highly regulated industries.
Their multi-agent architecture includes an orchestrator agent that breaks down tasks and delegates to worker agents.
The orchestrator model is trained to plan, delegate, and manage the overall workflow, while worker models are specialized for executing specific coding tasks.
Key benefits of Cosine's multi-agent approach include:
31% improvement in the economic value of work compared to a single-agent model
60% reduction in GPU footprint for on-premises deployments
20% fewer errors in final code outputs
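The talk does not describe how Cosine's orchestrator communicates its plan to workers. Assuming it emits a structured plan (the JSON schema, subtask IDs, and worker names below are all hypothetical), a harness can validate that plan before delegating anything and then run subtasks in dependency order using Python's standard `graphlib` module.

```python
import json
from graphlib import TopologicalSorter
from typing import Dict, List

# Hypothetical plan in a made-up schema, as an orchestrator model might emit it.
PLAN_JSON = """
{"subtasks": [
  {"id": "schema", "worker": "db",      "depends_on": []},
  {"id": "api",    "worker": "backend", "depends_on": ["schema"]},
  {"id": "tests",  "worker": "qa",      "depends_on": ["api"]},
  {"id": "docs",   "worker": "writer",  "depends_on": ["api"]}
]}
"""

# Assumed set of specialized workers available to the orchestrator.
KNOWN_WORKERS = {"db", "backend", "qa", "writer"}

def load_plan(raw: str) -> Dict[str, dict]:
    """Parse the plan and reject it, before any work is delegated, if malformed."""
    subtasks = {s["id"]: s for s in json.loads(raw)["subtasks"]}
    for s in subtasks.values():
        if s["worker"] not in KNOWN_WORKERS:
            raise ValueError(f"unknown worker {s['worker']!r} in {s['id']!r}")
        for dep in s["depends_on"]:
            if dep not in subtasks:
                raise ValueError(f"{s['id']!r} depends on missing subtask {dep!r}")
    return subtasks

def execution_order(subtasks: Dict[str, dict]) -> List[str]:
    """Order subtasks so each one runs after all of its dependencies."""
    graph = {sid: s["depends_on"] for sid, s in subtasks.items()}
    # static_order() raises graphlib.CycleError (a ValueError) on cyclic plans
    return list(TopologicalSorter(graph).static_order())

if __name__ == "__main__":
    order = execution_order(load_plan(PLAN_JSON))
    print(order)  # "schema", then "api", then "tests" and "docs" in some order
```

Subtasks with no mutual dependency (`tests` and `docs` here) could be dispatched to workers in parallel, and validating the plan up front is one way to keep an orchestrator mistake from propagating into worker output.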
Lessons Learned from Cosine AI's Experience
Training the orchestrator model and the worker models requires different approaches and disciplines.
Distillation is crucial for bringing the capabilities of large foundation models into smaller, more efficient models.
Reinforcement learning and real-world execution data are invaluable for improving small model performance and generalization.
Multi-agent architectures enable highly isolated, auditable deployments in regulated industries like finance, defense, and healthcare.
Key Takeaways
Customizing agents through techniques like distillation, fine-tuning, and reinforcement learning can significantly improve performance, efficiency, and alignment with business needs.
Multi-agent architectures provide modularity, scalability, and the ability to right-size models, but require careful management of challenges like latency, cost, and error propagation.
Cosine AI's case study demonstrates the real-world benefits of multi-agent orchestration, including improved economic value, reduced infrastructure costs, and higher-quality outputs.
Effective agent customization requires understanding the unique requirements of orchestrator and worker models, as well as leveraging distillation and reinforcement learning.