AWS re:Invent 2025 - Fine-tuning models for accuracy and latency at Robinhood Markets (IND392)

Leveraging Fine-Tuning for Accuracy and Latency at Robinhood Markets

Robinhood's AI Vision and Mission

  • Robinhood's mission is to democratize finance for all, providing users the same level of support and insight as the ultra-wealthy
  • To achieve this, Robinhood believes harnessing the power of AI and machine learning is crucial

Key Generative AI Challenges

  • Robinhood faced challenges in improving accuracy, reducing cost, and lowering latency when using large language models (LLMs) in critical production workflows
  • Strategies employed include data curation, model right-sizing, fine-tuning, and optimized deployment options

Robinhood's Generative AI Use Cases

  1. Cortex Digest: Automatically generates summaries explaining stock price movements to users
    • Fine-tuning helps with vocabulary, objectivity, and identifying important information
  2. Custom Indicators and Scans: Allows users to create trading logic using natural language
    • Democratizes algorithmic trading by translating queries into executable code
  3. CX AI Agent: Robinhood's customer support chatbot, built in multiple stages:
    • Intent understanding, planner/tool selection, and final answer generation
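The staged structure of the CX agent can be sketched as a simple pipeline. This is a minimal illustration, not Robinhood's implementation: the stage names follow the talk, but the stub logic (keyword matching, a static tool table) stands in for the LLM calls each stage would actually make, and all identifiers here are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class AgentTurn:
    user_message: str
    intent: str = ""
    plan: list = field(default_factory=list)
    answer: str = ""

def understand_intent(turn: AgentTurn) -> AgentTurn:
    # Stage 1: intent understanding (keyword stub standing in for an LLM classifier).
    turn.intent = "account_question" if "account" in turn.user_message.lower() else "general"
    return turn

def plan_tools(turn: AgentTurn) -> AgentTurn:
    # Stage 2: planner / tool selection for the detected intent.
    tool_table = {"account_question": ["lookup_account"], "general": ["search_kb"]}
    turn.plan = tool_table[turn.intent]
    return turn

def generate_answer(turn: AgentTurn) -> AgentTurn:
    # Stage 3: final answer generation from the selected tools' results.
    turn.answer = f"(answer composed from {', '.join(turn.plan)})"
    return turn

turn = generate_answer(plan_tools(understand_intent(
    AgentTurn("Why is my account restricted?"))))
```

Separating the stages like this is what makes per-stage tuning (prompt tuning the planner, fine-tuning the answer generator) possible, since each stage can be evaluated and optimized in isolation.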

The Generative AI Trilemma

  • In the world of generative AI, cost, quality, and latency are often at odds with each other
  • Even small regressions in one area can jeopardize end-user experience

Robinhood's Tuning Roadmap

  1. Prompt Tuning:
    • Optimizes prompts across multiple stages of the agent pipeline
    • Uses a prompt optimization loop to generate and evaluate prompt candidates
  2. Trajectory Tuning:
    • Injects dynamic few-shot examples into the planner stage to improve quality
    • Balances quality uplift with increased context length and latency
  3. Fine-Tuning:
    • Focuses on data quality over quantity when creating the training dataset
    • Leverages techniques like LoRA (Low-Rank Adaptation) to reduce trainable parameters
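Trajectory tuning's dynamic few-shot injection can be sketched as retrieve-then-prepend: find the past trajectories most similar to the incoming query and splice them into the planner prompt. This is an assumed, simplified version; the word-overlap similarity below is a stand-in for the embedding-based retrieval a production system would use, and the prompt format is invented for illustration.

```python
def inject_few_shots(query: str, trajectory_bank: list, k: int = 2) -> str:
    """Prepend the k most similar past trajectories to the planner prompt.

    Similarity here is naive word overlap (a placeholder for embedding
    similarity). Each bank entry is {"query": ..., "plan": ...}.
    """
    def overlap(a: str, b: str) -> int:
        return len(set(a.lower().split()) & set(b.lower().split()))

    ranked = sorted(trajectory_bank,
                    key=lambda t: overlap(query, t["query"]), reverse=True)
    examples = "\n\n".join(f"Q: {t['query']}\nPlan: {t['plan']}"
                           for t in ranked[:k])
    return f"{examples}\n\nQ: {query}\nPlan:"

bank = [
    {"query": "scan for stocks above 50 day moving average", "plan": "ma_scan"},
    {"query": "summarize my portfolio", "plan": "portfolio_summary"},
]
prompt = inject_few_shots("scan stocks above moving average", bank, k=1)
```

Note the trade-off the talk calls out: every injected example lengthens the context, so k must be chosen to balance planner quality against latency.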

LoRA: Robinhood's Fine-Tuning Approach

  • LoRA significantly reduces the number of trainable parameters compared to full fine-tuning
  • Enables scalable fine-tuning across multiple use cases with:
    • Faster training times
    • Lower costs
    • Portable models
  • Robinhood integrates LoRA into their fine-tuning platform, leveraging Amazon SageMaker and Amazon Bedrock
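The parameter savings behind LoRA are easy to verify with arithmetic: instead of updating the full weight matrix W (d_in × d_out), LoRA freezes W and trains two low-rank factors A (d_in × r) and B (r × d_out), so the effective weight is W + AB. The dimensions below are illustrative, not from the talk.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int):
    """Compare trainable parameters: full fine-tuning vs. LoRA.

    Full fine-tuning updates every entry of W (d_in * d_out params);
    LoRA trains only A (d_in * rank) and B (rank * d_out).
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora

# e.g. a 4096x4096 projection adapted at rank 8:
full, lora = lora_param_counts(4096, 4096, 8)
# full = 16,777,216 trainable params; lora = 65,536 (~0.4% of full)
```

Because the frozen base model is shared, each use case ships only its small A/B adapter, which is what makes the models portable and the training fast and cheap across many use cases.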

Results and Lessons Learned

  • Robinhood's LoRA-based fine-tuned models achieved over 50% latency savings compared to previous models
  • Maintained quality parity with frontier models
  • Key lessons:
    • Importance of robust evaluation frameworks
    • Data preparation strategy (quality over quantity)
    • Methodical approach to tuning techniques
    • Leveraging AWS services for inference optimization

Conclusion

  • Robinhood's sophisticated use of AWS services and fine-tuning techniques demonstrates the potential for generative AI in regulated industries
  • Their approach can serve as a model for other organizations looking to reliably deploy generative AI in production environments
