Scaling secure large language models with Robinhood (FSI317)


Introduction

  • Trevor Spers, a Solutions Architect at AWS, introduces the session and the speaker, Dolly, a Machine Learning Engineer at Robinhood.
  • Robinhood is a fintech company that pioneered commission-free trading in 2013 and has expanded its offerings to include brokerage, crypto, joint accounts, futures, credit cards, and spending cards.
  • The mission of Robinhood's AI and ML platform is to empower developers with powerful AI and ML abstractions while supporting power users with advanced tools, focusing on streamlining the journey from experimentation to production and enabling rapid adoption of state-of-the-art generative AI technologies.

Fraud Investigation Use Case

  • Robinhood has over 80 fraud investigation agents manually writing over 300 case conclusions every day, and must ensure these narratives stay consistent and high quality.
  • They have introduced a solution powered by Amazon Bedrock's Provisioned Throughput mode, leveraging a Claude model to generate structured draft conclusions in seconds.
  • To prevent misuse, they have implemented safeguards, such as limiting the "Generate Narrative" button to two clicks per browser session.
  • They have also added a feedback loop to improve the system over time, allowing agents to rate the generation experience on a scale of 1 to 5.
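The generation flow described above can be sketched as a request to Claude's Messages API on Amazon Bedrock. The case fields and prompt template below are illustrative assumptions, not Robinhood's actual schema:

```python
import json

# Hypothetical sketch: build an Amazon Bedrock request body for Claude's
# Messages API to draft a structured fraud-case conclusion. Field names
# and the prompt wording are illustrative, not Robinhood's implementation.
def build_draft_request(case_summary: str, evidence: list[str]) -> str:
    prompt = (
        "Draft a structured fraud-case conclusion.\n"
        f"Summary: {case_summary}\n"
        "Evidence:\n" + "\n".join(f"- {e}" for e in evidence)
    )
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(body)

# The serialized body would then be passed to the Bedrock runtime, e.g.:
# bedrock_runtime.invoke_model(modelId="anthropic.claude-3-sonnet-20240229-v1:0",
#                              body=build_draft_request(...))
```

The draft returned by the model would be shown to the agent for review rather than filed automatically, consistent with the human-in-the-loop safeguards above.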

Initial Approach Challenges

  • The high demand for the fraud investigation use case highlighted the limitations of their initial approach to LLM inference.
  • As a fintech company, Robinhood is subject to stricter data regulations due to the sensitive nature of financial information, leading them to choose Amazon Bedrock's Provisioned Throughput mode.
  • This mode provided isolated inference by deploying the Claude model as a dedicated instance, ensuring secure connectivity through VPC endpoints.

Lessons Learned and Transition to On-Demand API

  • The Provisioned Throughput mode had scalability issues, as it required pre-committing to a fixed number of model units, making it challenging to handle surges in demand.
  • It also faced the "noisy neighbor" problem, where one tenant's high consumption could degrade performance for others sharing the same model units.
  • Predicting traffic patterns months in advance and getting access to the latest models were also challenges with the Provisioned Throughput mode.

To address these issues, Robinhood transitioned to Amazon Bedrock's on-demand API, which offered greater flexibility and scalability:

  • The on-demand API supported cross-region inference, which routes requests across regions and can effectively double the quota allocated in the inference profile's source region, helping absorb surges in demand.
  • Cross-region inference also enabled failover, ensuring continuous operations in the event of a regional impairment.
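The failover pattern above can be sketched as trying the primary region first and falling back to a secondary one on failure. The region list and the `invoke_in_region` callable are illustrative stand-ins for a real Bedrock client call:

```python
# Hypothetical sketch of regional failover: attempt each region in order
# and return the first successful result. invoke_in_region stands in for
# an actual Bedrock invoke_model call scoped to that region.
def invoke_with_failover(prompt, regions, invoke_in_region):
    last_error = None
    for region in regions:
        try:
            return invoke_in_region(region, prompt)
        except RuntimeError as err:  # e.g. throttling or a regional outage
            last_error = err
    raise last_error

# Usage with a stub that simulates a primary-region outage:
def stub(region, prompt):
    if region == "us-east-1":
        raise RuntimeError("region impaired")
    return f"{region}: ok"
```

In practice, Bedrock's cross-region inference profiles handle this routing server-side; an explicit client-side loop like this would only be needed for custom failover policies.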

LLM Gateway

Robinhood developed an LLM Gateway to facilitate the transition to the on-demand API:

  • The Gateway is designed to be compatible with a wide variety of SDKs, handling the translation of inputs to providers' endpoints for completion, embedding, and image generation, making integration seamless.
  • It validates token size, image size, and other input parameters to ensure requests are compliant and efficient.
  • The Gateway includes a built-in PII redaction service that automatically detects and redacts sensitive information from inputs and outputs, ensuring compliance with data privacy regulations.
  • It supports a fallback model mechanism, ensuring high availability and continuity by switching to a secondary fallback model if the primary model fails or exceeds capacity limits.
  • The Gateway logs requests and responses, facilitating continuous model evaluation and enabling teams to benchmark performance, accuracy, and efficiency of their LLMs in real-time.
  • It also provides granular control over usage and costs, allowing teams to set budgets and enforce rate limits at multiple levels.
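Two of the Gateway responsibilities above, input validation and PII redaction, can be sketched as simple pre-processing steps. The token heuristic, limits, and regex patterns here are illustrative assumptions, not Robinhood's implementation:

```python
import re

# Illustrative limits and patterns; a production gateway would use a real
# tokenizer and a dedicated PII-detection service.
MAX_TOKENS = 8000
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def validate(prompt: str) -> None:
    # Rough heuristic: ~4 characters per token.
    if len(prompt) / 4 > MAX_TOKENS:
        raise ValueError("prompt exceeds token limit")

def redact_pii(text: str) -> str:
    # Replace matched PII with placeholder tokens before the request
    # leaves the gateway (and again on the model's response).
    text = EMAIL_RE.sub("[EMAIL]", text)
    return SSN_RE.sub("[SSN]", text)
```

Running redaction on both the request and the response, as the Gateway does, keeps sensitive values out of logs as well as out of model context.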

Future Goals (2025)

Robinhood's goals for 2025 include:

  1. Introducing built-in tools for model and prompt evaluation analysis, making it easier to iterate and refine models effectively.
  2. Centralizing AI data governance, creating a unified framework for managing data across the platform to ensure consistency and compliance.
  3. Enhancing the Gateway to enable dynamic model routing and fine-tuned options, ensuring the right model serves the right request at the right time.
  4. Rolling out batch inference pipelines, enabling efficient processing of large-scale tasks.
  5. Providing a "Prompt Playground" to foster creativity and experimentation, allowing users to switch models and change parameters on production data.

These goals are designed to address key challenges in scaling AI and ML at Robinhood, empowering teams to innovate faster with greater precision and reliability.
