Bloomberg: Lessons learned from building and training LLMs on AWS (FSI320)

Navigating the Challenges of Training Large Language Models

Training Bloomberg's Own Large Language Model

  • In 2020, the release of GPT-3 by OpenAI sparked widespread interest in large language models (LLMs).
  • Bloomberg, with its vast financial data and experienced data science team, decided to train its own LLM, called BloombergGPT, as a research project.
  • The key challenges they faced included:
    • Accessing sufficient GPU compute, which was in high demand and short supply due to the global semiconductor shortage.
    • Managing massive volumes of training data (700 billion tokens) and checkpointing the model during training.
    • Ensuring the training process was reproducible, automated, and resilient to failures.

Partnering with AWS for Model Training

  • Bloomberg turned to AWS to overcome the GPU availability and infrastructure challenges.
  • They utilized AWS services like Amazon SageMaker, which provided a fully managed training platform and the necessary high-performance computing resources.
  • Amazon EC2 P4d instances in an EC2 UltraCluster, with NVIDIA A100 GPUs and high-speed interconnect, enabled efficient and reliable training.
  • Bloomberg's team of nine AI researchers was able to successfully train the BloombergGPT model, in contrast to the much larger teams behind other LLMs.
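For a sense of what a managed distributed training job looks like on SageMaker, the helper below builds the request body for the `CreateTrainingJob` API (as submitted via `boto3.client("sagemaker").create_training_job(...)`). It is purely illustrative: the image URI, S3 paths, and role ARN are placeholders, and Bloomberg's actual job configuration is not public.

```python
def build_training_job(name, instance_count):
    """Illustrative SageMaker CreateTrainingJob request for a multi-node GPU job.
    All <...> values are placeholders, not real resources."""
    return {
        "TrainingJobName": name,
        "AlgorithmSpecification": {
            "TrainingImage": "<account>.dkr.ecr.<region>.amazonaws.com/llm-train:latest",
            "TrainingInputMode": "File",
        },
        "RoleArn": "arn:aws:iam::<account>:role/SageMakerTrainingRole",
        "ResourceConfig": {
            "InstanceType": "ml.p4d.24xlarge",  # 8x NVIDIA A100 per instance
            "InstanceCount": instance_count,    # scale out across the cluster
            "VolumeSizeInGB": 1000,
        },
        "OutputDataConfig": {"S3OutputPath": "s3://<bucket>/checkpoints/"},
        "StoppingCondition": {"MaxRuntimeInSeconds": 30 * 24 * 3600},
    }
```

The point of the managed service is visible in what is absent here: no cluster provisioning, node health management, or interconnect setup appears in the job definition.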

Lessons Learned and Guiding Principles

  • The BloombergGPT model performed better on finance-specific tasks than general-purpose models, highlighting the value of domain-specific training.
  • Bloomberg developed three guiding principles for incorporating LLMs into their products:
    1. Model outputs must be derived from trustworthy sources and reflect reality.
    2. LLMs should be integrated into a larger user experience, with proper guidance and feedback mechanisms.
    3. Transparent attribution of model outputs to their data sources is crucial for building user trust.

Applying LLMs at Bloomberg

  • Bloomberg has explored various use cases for LLMs, such as:
    • Enabling natural language-based querying of their extensive financial data using a fine-tuned LLM.
    • Generating summaries of lengthy earnings call transcripts to surface key insights.
  • These applications prioritize user experience, transparency, and reliability, rather than relying on open-ended chatbot interactions.
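Earnings-call transcripts often exceed a model's context window. A common way to handle this (shown here as an illustrative sketch, not Bloomberg's published method) is map-reduce summarization: split the transcript into overlapping chunks, summarize each, then summarize the combined partial summaries. The `summarize_chunk` callable stands in for a hypothetical LLM call.

```python
def chunk_transcript(text, max_words=800, overlap=100):
    """Split a long transcript into overlapping word-window chunks."""
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
        start += max_words - overlap  # overlap preserves context across cuts
    return chunks

def summarize(transcript, summarize_chunk):
    """Map-reduce summarization: summarize each chunk, then the summaries.
    `summarize_chunk` is any callable wrapping an LLM call (hypothetical)."""
    partials = [summarize_chunk(c) for c in chunk_transcript(transcript)]
    return summarize_chunk("\n".join(partials))
```

Keeping chunking separate from the model call also makes it easy to attach each partial summary back to its source span, which supports the attribution principle above.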

Evolving Infrastructure for LLM Workloads

  • The infrastructure and operational practices required for large-scale LLM training have evolved, with new tools and services like Amazon SageMaker HyperPod.
  • Bloomberg has contributed to the open-source community, building the Envoy AI Gateway to provide centralized API management, access control, and cost attribution for integrating multiple LLMs.
  • Deciding whether to train a custom LLM or fine-tune an existing model requires careful evaluation of the specific use case and requirements.
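The Envoy AI Gateway itself is a Go-based proxy and out of scope here, but the cost-attribution idea it centralizes can be illustrated with a toy meter that tracks token usage per calling team and per model. All names and per-token rates below are invented for illustration.

```python
from collections import defaultdict

class TokenMeter:
    """Toy per-team usage metering, the kind of accounting an AI gateway
    (e.g. Envoy AI Gateway) can centralize. Rates here are made up."""

    def __init__(self, usd_per_1k_tokens):
        self.rates = usd_per_1k_tokens  # model name -> $ per 1K tokens
        self.usage = defaultdict(int)   # (team, model) -> total tokens

    def record(self, team, model, prompt_tokens, completion_tokens):
        """Called once per proxied LLM request."""
        self.usage[(team, model)] += prompt_tokens + completion_tokens

    def cost(self, team):
        """Chargeback total for one team across all models it called."""
        return sum(tokens / 1000 * self.rates[model]
                   for (t, model), tokens in self.usage.items() if t == team)
```

Placing this accounting at a shared gateway, rather than in each application, is what makes access control and chargeback consistent across many LLM providers.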
