Customer stories: Optimizing AI performance and cost with AWS AI chips (CMP208)
Key Takeaways
2024 has been a transformative year for generative AI (GenAI), with several new trends emerging:
Large language models (LLMs) are getting larger, with the release of models like Llama 3.1 405B
Developers are focusing on fine-tuning and adapting LLMs for better accuracy and results using techniques like domain adaptation and distillation
Smaller language models (SLMs) are a new emerging field
Multimodal models that can process multiple inputs like audio, video and text are becoming more common
AI agents and ensembles of experts are gaining popularity for providing collective intelligence
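The talk mentions distillation as a fine-tuning technique but shows no code. As a rough illustration only, here is a minimal NumPy sketch of the standard soft-label distillation loss (temperature-softened KL divergence, in the style popularized by Hinton et al.); all names are invented for this example and nothing here comes from the session:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Scaled by T^2 so gradient magnitudes stay comparable across
    temperatures, following the usual distillation convention.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return float(np.mean(kl) * temperature ** 2)

# A student whose logits match the teacher's incurs zero loss.
teacher = np.array([[4.0, 1.0, 0.5]])
print(distillation_loss(teacher.copy(), teacher))                # → 0.0
print(distillation_loss(np.array([[1.0, 1.0, 1.0]]), teacher) > 0)  # → True
```

In a real distillation run this loss is typically mixed with the ordinary cross-entropy on ground-truth labels; the sketch shows only the teacher-matching term.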
AWS Infrastructure for GenAI
AWS has been investing in AI chips like Inferentia and Trainium for several years to support GenAI workloads
The latest Trainium 2 chip delivers 1.3 petaflops of compute, 30% more than any other EC2 instance, and 4x the compute, memory, and bandwidth of Trainium 1
This makes Trainium 2 the most powerful instance for large-scale distributed training and inference of GenAI models with hundreds of billions to trillions of parameters
AWS provides the Neuron SDK, with tools, libraries, a compiler, and framework support, to build GenAI workloads on Trainium and Inferentia
The Neuron SDK integrates with popular frameworks like PyTorch and JAX, as well as with other AWS services for orchestration and management
Customer Experiences
Ricoh (Takeshi Suzuki)
Ricoh is a Japanese company focused on developing Japanese large language models (LLMs)
They used techniques like tokenizer adaptation and curriculum learning to develop a Japanese LLM from an English base model
Leveraged Trainium 1 for efficient, cost-effective large-scale training, achieving a 45% cost reduction and 12% faster training compared to GPU clusters
Encountered some challenges with SDK version upgrades and infrastructure failures, but collaborated closely with AWS to address them
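The session does not show how Ricoh's tokenizer adaptation works. As a hypothetical sketch of the general idea, the snippet below extends a base vocabulary with new-language tokens and grows the embedding matrix, initializing the new rows near the mean of the existing embeddings (a common heuristic when adapting an English base model to another language); the function and variable names are invented for illustration:

```python
import numpy as np

def extend_vocab(vocab, embeddings, new_tokens, rng=None):
    """Add new tokens to a vocabulary and grow the embedding matrix.

    New rows are initialized to the mean of the existing embeddings
    plus small noise, so the adapted model starts from a reasonable
    point before continued pretraining on the target language.
    """
    rng = rng or np.random.default_rng(0)
    vocab = dict(vocab)
    added = [t for t in new_tokens if t not in vocab]
    for tok in added:
        vocab[tok] = len(vocab)
    mean = embeddings.mean(axis=0)
    noise = rng.normal(scale=0.01, size=(len(added), embeddings.shape[1]))
    embeddings = np.vstack([embeddings, mean + noise])
    return vocab, embeddings

# Toy base model: 2 English tokens with 4-dimensional embeddings.
base_vocab = {"hello": 0, "world": 1}
base_emb = np.zeros((2, 4))
vocab, emb = extend_vocab(base_vocab, base_emb, ["こんにちは", "世界"])
print(len(vocab), emb.shape)  # → 4 (4, 4)
```

After the vocabulary is extended, continued pretraining (often ordered easy-to-hard, i.e. curriculum learning) teaches the model to use the new tokens.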
Arcee AI (Mark McQuade)
Arcee AI focuses on building specialized small language models (SLMs) and end-to-end AI agent systems
Utilized Trainium for efficient, cost-effective training of their Llama-based SLMs, achieving up to 32% better cost-performance compared to GPU instances
Deployed their AI agent system "Arcee Orchestra" on Inferentia for inference, benefiting from better cost-performance, regional availability, and easier deployment
Continues to work closely with AWS as a design partner for Trainium 2 to further improve performance and cost benefits
IBM (Armand Ruiz)
IBM offers the watsonx platform for generative AI, with components for AI development, intelligent data management, and governance
Sees a shift from fixed flows to variable, automated AI agent systems that can reason and plan, which will drive a lot of innovation
IBM's own Granite foundation models are designed with enterprise needs in mind, providing full transparency into training data and its legal provenance
Collaborating closely with AWS to integrate watsonx with Trainium, Inferentia, and other AWS services for efficient deployment
ByteDance (Wong Pang)
ByteDance (parent company of TikTok) has developed a multimodal language model called "MM Modality" for their platforms
Leveraged Inferentia to deploy the model globally, achieving 133% better throughput and lower cost compared to GPUs
Sees multimodal models as the future, with potential to power advanced robotics and interactive AI agents
Believes AI will ultimately benefit everyone, as long as it is made accessible and scalable
Future Outlook
Smaller, specialized language models (SLMs) will continue to evolve, aiming to match the capabilities of larger models while being more efficient and cost-effective
Multimodal models that can process diverse inputs like text, images, audio and video will become more prominent
AI agent systems composed of ensembles of specialized models will emerge to provide more robust and flexible intelligence
Continued collaboration between AI software providers and cloud infrastructure vendors like AWS will be key to driving innovation and enabling cost-effective, scalable deployment of generative AI solutions.