A detailed summary of the key takeaways from the video, organized by section:
# Navigating the Challenges of Training Large Language Models

## Training Bloomberg's Own Large Language Model
- In 2020, the release of GPT-3 by OpenAI sparked widespread interest in large language models (LLMs).
- Bloomberg, with its vast financial data and experienced data science team, decided to train its own LLM, called BloombergGPT, as a research project.
- The key challenges they faced included:
  - Accessing sufficient GPU compute power, which was in high demand and short supply due to the global semiconductor crisis.
  - Managing massive volumes of training data (700 billion tokens) and checkpointing the model during training.
  - Ensuring the training process was reproducible, automated, and resilient to failures.
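The resilience challenge above boils down to a familiar pattern: periodically persist training state so a failed run resumes from the last checkpoint instead of restarting from scratch. Below is a minimal, framework-agnostic sketch of that pattern; the `train_step` stand-in, the state contents, and the file layout are illustrative assumptions, not Bloomberg's actual setup.

```python
import json
import os

CKPT_PATH = "checkpoint.json"  # illustrative path, not a real training layout

def save_checkpoint(step, state, path=CKPT_PATH):
    """Persist training state atomically so a crash mid-write cannot corrupt it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, path)  # atomic rename: readers see old or new file, never half

def load_checkpoint(path=CKPT_PATH):
    """Return (step, state) from the last checkpoint, or a fresh start."""
    if not os.path.exists(path):
        return 0, {"loss": None}
    with open(path) as f:
        ckpt = json.load(f)
    return ckpt["step"], ckpt["state"]

def train(total_steps, ckpt_every=100):
    """Run (or resume) a toy training loop, checkpointing every ckpt_every steps."""
    step, state = load_checkpoint()  # resume where the previous run stopped
    while step < total_steps:
        step += 1
        state["loss"] = 1.0 / step   # stand-in for a real training step
        if step % ckpt_every == 0 or step == total_steps:
            save_checkpoint(step, state)
    return step, state
```

If the process dies, a fresh call to `train()` picks up at the last saved step; real LLM training applies the same idea to multi-gigabyte model and optimizer state rather than a small JSON file.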
## Partnering with AWS for Model Training
- Bloomberg turned to AWS to overcome the GPU availability and infrastructure challenges.
- They utilized AWS services like Amazon SageMaker, which provided a fully managed training platform and the necessary high-performance computing resources.
- Amazon EC2 P4d UltraClusters with NVIDIA A100 GPUs and high-speed interconnects enabled efficient, reliable training.
- Bloomberg's team of nine AI researchers successfully trained the BloombergGPT model, in contrast to the much larger teams behind other LLMs.
## Lessons Learned and Guiding Principles
- The BloombergGPT model performed better on finance-specific tasks than general-purpose models, highlighting the value of domain-specific training.
- Bloomberg developed three guiding principles for incorporating LLMs into their products:
  - Model outputs must be derived from trustworthy sources and reflect reality.
  - LLMs should be integrated into a larger user experience, with proper guidance and feedback mechanisms.
  - Transparent attribution of model outputs to their data sources is crucial for building user trust.
## Applying LLMs at Bloomberg
- Bloomberg has explored various use cases for LLMs, such as:
  - Enabling natural language-based querying of their extensive financial data using a fine-tuned LLM.
  - Generating summaries of lengthy earnings call transcripts to surface key insights.
- These applications prioritize user experience, transparency, and reliability, rather than relying on open-ended chatbot interactions.
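The transparency principle behind these applications can be sketched concretely: every answer carries the identifiers of the documents it was derived from, so the interface can attribute the output to trustworthy sources and refuse to answer when nothing relevant is found. The document store, keyword scoring, and answer format below are all toy assumptions, not Bloomberg's system (a real pipeline would retrieve with embeddings and generate with an LLM).

```python
# Toy document store keyed by source ID; in practice this would be a
# search index over financial documents.
DOCS = {
    "earnings-2023-q4": "Revenue rose 8% year over year, driven by terminal sales.",
    "earnings-2023-q3": "Operating margin narrowed on higher data-center costs.",
}

def retrieve(query, docs=DOCS):
    """Rank documents by naive keyword overlap (stand-in for real retrieval)."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scored.append((overlap, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

def answer_with_sources(query):
    """Return an answer together with the source IDs it was derived from."""
    sources = retrieve(query)
    if not sources:
        return {"answer": None, "sources": []}  # refuse rather than guess
    snippet = DOCS[sources[0]]                  # stand-in for LLM generation
    return {"answer": snippet, "sources": sources}
```

The design choice worth noting is the refusal branch: when retrieval finds no supporting source, the system returns no answer at all, which is how "derived from trustworthy sources" becomes an enforceable property rather than a hope.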
## Evolving Infrastructure for LLM Workloads
- The infrastructure and operational practices required for large-scale LLM training have evolved, with new tools and services like Amazon SageMaker HyperPod.
- Bloomberg has contributed to the open-source community, building the Envoy AI Gateway to provide centralized API management, access control, and cost attribution for integrating multiple LLMs.
- Determining whether to train a custom LLM or leverage and fine-tune existing models requires careful evaluation of the specific use case and requirements.
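The gateway idea mentioned above (centralized API management, access control, and cost attribution in front of multiple LLMs) can be illustrated with a small sketch. This is only a toy model of the pattern, not the Envoy AI Gateway itself, which is built on Envoy proxy and configured declaratively; the backend names and per-token prices are invented for the example.

```python
from collections import defaultdict

# Hypothetical per-1k-token prices for two LLM backends.
BACKENDS = {
    "model-a": {"price_per_1k_tokens": 0.5},
    "model-b": {"price_per_1k_tokens": 2.0},
}

class LLMGateway:
    """One entry point in front of several LLM backends."""

    def __init__(self, allowed_teams):
        self.allowed = set(allowed_teams)   # access control
        self.spend = defaultdict(float)     # cost attribution per team

    def route(self, team, model, tokens):
        """Check access, dispatch to the chosen backend, and meter the cost."""
        if team not in self.allowed:
            raise PermissionError(f"{team} is not authorized")
        if model not in BACKENDS:
            raise ValueError(f"unknown backend {model}")
        cost = tokens / 1000 * BACKENDS[model]["price_per_1k_tokens"]
        self.spend[team] += cost
        return {"model": model, "tokens": tokens, "cost": cost}
```

Centralizing these three concerns in one gateway is what lets teams swap or mix LLM providers without re-implementing authentication and billing in every application.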