Leveraging Fine-Tuned Foundation Models to Implement Scalable AI Solutions at S&P Global
Introduction
- S&P Global is a leading provider of credit ratings and financial intelligence, dealing with large volumes of data and complex business problems.
- The company faced challenges in assessing credit risk for small and medium enterprises (SMEs) that lack publicly available financial information.
- To address this, S&P Global decided to leverage fine-tuned foundation models and a distributed architecture on AWS to build a scalable AI solution.
Business Problem and Challenges
- S&P Global's credit rating business is over 150 years old, with 1 million outstanding credit ratings on $46 trillion of debt.
- However, for private companies and SMEs, the lack of publicly available financial information made it difficult to assess credit risk.
- Key challenges included:
- Need for "out-of-the-box" credit assessments for large portfolios of SMEs.
- Lack of timely financial information, as private company data can be months or years old.
- Covering a target universe of roughly 60-70 million companies worldwide with no available financial information.
Approach to Building a Scalable AI Solution
- Scraping Textual Footprint from the Web:
- Created a domain graph to map child URLs related to a company and identify relevant pages.
- Utilized a heterogeneous graph structure to classify HTML content and extract relevant risk signals.
- Leveraged an orchestration pipeline using Amazon Managed Workflows for Apache Airflow (MWAA), Redis, and Amazon EKS to scale the web scraping process; a minimal domain-graph crawl is sketched below.
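As a rough illustration of the domain-graph step, the sketch below does a breadth-first crawl of same-domain links with requests and BeautifulSoup, recording parent-to-child URL edges. The function name, depth limit, and filtering logic are assumptions for illustration, not S&P Global's actual implementation.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def build_domain_graph(root_url: str, max_depth: int = 2) -> dict[str, set[str]]:
    """Breadth-first crawl that maps each page to its same-domain child URLs."""
    domain = urlparse(root_url).netloc
    graph: dict[str, set[str]] = {}  # parent URL -> child URLs
    queue = deque([(root_url, 0)])
    seen = {root_url}
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue  # skip unreachable pages rather than failing the whole crawl
        children = set()
        for anchor in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            child = urljoin(url, anchor["href"])
            if urlparse(child).netloc == domain:  # stay on the company's own domain
                children.add(child)
                if child not in seen:
                    seen.add(child)
                    queue.append((child, depth + 1))
        graph[url] = children
    return graph
```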
- Leveraging Fine-Tuned Foundation Models:
- Explored parameter-efficient fine-tuning techniques, such as LoRA, to update a small portion of the model parameters while preserving the knowledge in the pre-trained model.
- Employed a self-grading approach to build a quality training dataset for fine-tuning the models.
- Fine-tuned the models using Amazon SageMaker, focusing on prompt engineering and hyperparameter tuning (a LoRA configuration sketch follows).
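A minimal sketch of what a LoRA setup might look like with the Hugging Face peft library is shown below; the base model name and hyperparameters (rank, alpha, target modules) are illustrative assumptions rather than S&P Global's actual configuration.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor applied to the adapter output
    target_modules=["q_proj", "v_proj"],  # attach adapters to the attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # the adapters are a tiny fraction of the base weights
# ...train with a standard Trainer loop or the SageMaker Hugging Face estimator...
```

Because only the small adapter matrices are trained, the trainable parameter count typically drops to well under 1% of the base model, which is what keeps fine-tuning cost and complexity down.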
- Deploying and Operating the Solution at Scale:
- Converted the fine-tuned models to run on CPU for cost-effective inference, leveraging the llama.cpp library (see the worker sketch below).
- Deployed the classification and extraction tasks on Amazon EKS, with worker pods listening to a Redis-based queue.
- Achieved high scalability, running up to 400 concurrent pods to handle 6 million inferences per week.
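A worker pod in this setup might look like the following minimal sketch, which blocks on a Redis list and runs CPU inference through llama-cpp-python; the queue name, model path, and prompt handling are assumptions for illustration.

```python
import json

import redis
from llama_cpp import Llama

queue = redis.Redis(host="redis", port=6379)
llm = Llama(model_path="/models/classifier-q4.gguf", n_ctx=2048, n_threads=4)

while True:
    # BLPOP blocks until a job arrives on the "inference-jobs" list
    _, raw = queue.blpop("inference-jobs")
    job = json.loads(raw)
    out = llm(job["prompt"], max_tokens=64, temperature=0.0)
    answer = out["choices"][0]["text"].strip()
    queue.rpush(f"results:{job['id']}", answer)  # hand the result back keyed by job id
```

Because each pod is a stateless loop over a shared queue, scaling to hundreds of concurrent pods is just a matter of raising the replica count on EKS.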
Key Learnings and Insights
- Start Simple and Iterate:
- Begin with a smaller foundation model and fine-tune a small subset of the parameters.
- Gradually increase complexity as needed, considering factors like overfitting and catastrophic forgetting.
- Leverage Parameter-Efficient Fine-Tuning:
- Techniques like LoRA can significantly reduce the fine-tuning cost and complexity.
- Allows maintaining a single base model and swapping small adapter modules in and out for different tasks, as sketched below.
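A minimal sketch of this pattern with the peft library, assuming hypothetical adapter paths and names:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Load one adapter per task on top of the same base model (paths are hypothetical)
model = PeftModel.from_pretrained(base, "adapters/classification", adapter_name="classification")
model.load_adapter("adapters/extraction", adapter_name="extraction")

model.set_adapter("classification")  # route requests through the classification adapter
# ...run classification inference...
model.set_adapter("extraction")      # switch tasks without reloading the base model
```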
- Optimize for Cost-Effective Inference:
- Evaluate the trade-off between GPU and CPU instances for inference, considering the overall cost and performance requirements.
- Use tools like llama.cpp to convert models into a quantized format that runs efficiently on CPU.
- Adopt a Self-Grading Approach for Training Data:
- When dealing with limited resources, a self-grading approach can help build a quality training dataset.
- Break down complex tasks into smaller pieces and leverage the knowledge already present in the pre-trained model; a grading-loop sketch follows.
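One way such a self-grading loop might look, as a hedged sketch: the model first proposes a label, then scores its own answer, and only high-scoring pairs are kept for fine-tuning. The prompts, scoring scale, and threshold are illustrative assumptions, and `llm` is assumed to be a llama-cpp-python style callable as in the worker sketch above.

```python
def self_grade_dataset(llm, snippets, threshold=8):
    """Label raw text with the model, grade each label, keep high-confidence pairs."""
    dataset = []
    for text in snippets:
        label = llm(
            f"Classify the credit-risk signal in this text:\n{text}\nLabel:",
            max_tokens=8, temperature=0.0,
        )["choices"][0]["text"].strip()
        grade = llm(
            f"Text: {text}\nProposed label: {label}\n"
            "On a scale of 1-10, how well does the label fit? Answer with a number:",
            max_tokens=4, temperature=0.0,
        )["choices"][0]["text"].strip()
        try:
            score = int(grade.split()[0])
        except (ValueError, IndexError):
            continue  # drop examples whose grade cannot be parsed
        if score >= threshold:
            dataset.append({"text": text, "label": label})
    return dataset
```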
- Embrace a Distributed Architecture:
- Leverage managed services such as Amazon Managed Workflows for Apache Airflow (MWAA) and Amazon EKS, together with Redis, to build a scalable and resilient infrastructure.
- Separate heavy and light scraping tasks, and scale worker pods to handle the processing load (see the DAG sketch below).
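As a rough sketch of how such a split might be expressed as an Airflow DAG on MWAA (the DAG id, task names, queue, and schedule are hypothetical, not S&P Global's actual pipeline):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def scrape_light_pages(**context):
    """Fetch small, fast pages directly inside the Airflow worker."""
    ...


def enqueue_heavy_jobs(**context):
    """Push heavy scraping jobs onto a Redis queue consumed by EKS worker pods."""
    import json
    import redis
    r = redis.Redis(host="redis", port=6379)
    for url in context["params"]["heavy_urls"]:
        r.rpush("scrape-jobs", json.dumps({"url": url}))


with DAG(
    dag_id="company_footprint_scraper",
    start_date=datetime(2024, 1, 1),
    schedule="@weekly",
    catchup=False,
    params={"heavy_urls": []},
) as dag:
    light = PythonOperator(task_id="scrape_light", python_callable=scrape_light_pages)
    heavy = PythonOperator(task_id="enqueue_heavy", python_callable=enqueue_heavy_jobs)
    light >> heavy  # run the cheap scraping first, then fan out the heavy jobs
```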
Conclusion
S&P Global's journey demonstrates the power of fine-tuned foundation models and a distributed architecture in building scalable AI solutions to address complex business problems. By leveraging AWS services and adopting best practices, the company was able to create a highly efficient and cost-effective system that handles millions of inferences per week.