Scaling generative AI workloads with efficient model choice (AIM397-NEW)

Intelligent Prompt Routing

  • Allows using a combination of foundation models for your application, routing each prompt to the model best suited for it.
  • Offers two routers in preview:
    • Anthropic prompt router: Routes between Claude 3.5 Sonnet and Claude 3 Haiku.
    • Meta prompt router: Routes between the Llama 3.1 8B and 70B models.
  • How it works:
    1. Encodes the prompt to understand what it's about.
    2. Predicts the performance of each model for that prompt.
    3. Routes the prompt to the right model based on the provided response quality threshold.
  • Benefits:
    • Choose your own models from the same model family.
    • Access via a single serverless endpoint.
    • Define your own routing criteria (acceptable response quality difference).
    • Align predictions with ground truth data (future).
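The single-endpoint model described above can be sketched with the Bedrock Converse API: the prompt router's ARN takes the place of a model ID, and the router decides which underlying model serves each request. This is a minimal sketch; the router ARN below is a placeholder, and the exact ARN for your router comes from the Bedrock console.

```python
# Sketch: sending prompts to a Bedrock intelligent prompt router via the
# Converse API. The router ARN stands in for a single model ID, so the
# application code does not change as routing decisions vary per prompt.

def build_router_request(router_arn: str, prompt: str) -> dict:
    """Build a Converse API request targeting a prompt-router endpoint."""
    return {
        "modelId": router_arn,  # router ARN used where a model ID would go
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
    }

def route_prompt(router_arn: str, prompt: str) -> str:
    """Invoke the router; Bedrock picks the best-suited model for the prompt."""
    import boto3  # requires AWS credentials with Bedrock invoke permissions

    client = boto3.client("bedrock-runtime")
    response = client.converse(**build_router_request(router_arn, prompt))
    return response["output"]["message"]["content"][0]["text"]
```

Because the endpoint is serverless, switching routing criteria or model pairs requires no change to this calling code, only a different router configuration.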

Model Distillation

  • Process of transferring knowledge from a larger "teacher" model to a smaller, more cost-efficient and faster "student" model.
  • Bedrock model distillation provides:
    • Ability to use your own production data (via invocation logs).
    • Proprietary data synthesis techniques to generate diverse and high-quality datasets.
  • Workflow:
    1. Select teacher and student models (from the same family).
    2. Choose data source (invocation logs or data synthesis).
    3. Bedrock distills the student model.
  • Benefits:
    • Maintain accuracy while reducing cost and latency.
    • Example: Robin AI's distilled model retained 98% of the teacher model's accuracy while cutting costs by 66%.
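The workflow above can be sketched as a model-customization job configuration. This is a hedged sketch assuming Bedrock's CreateModelCustomizationJob API with a "DISTILLATION" customization type; every ARN, model identifier, and S3 URI below is a placeholder, not a value from the talk.

```python
# Sketch: configuring a Bedrock model distillation job. The student is the
# smaller base model being trained; the teacher supplies the responses to
# distill from. Invocation logs from production traffic serve as the data
# source, matching the "use your own production data" option above.

def build_distillation_job(job_name: str, role_arn: str,
                           student_model: str, teacher_model: str,
                           invocation_logs_uri: str, output_uri: str) -> dict:
    """Assemble the request body for a distillation customization job."""
    return {
        "jobName": job_name,
        "customModelName": f"{job_name}-distilled",
        "roleArn": role_arn,                       # IAM role Bedrock assumes
        "baseModelIdentifier": student_model,      # smaller "student" model
        "customizationType": "DISTILLATION",
        "customizationConfig": {
            "distillationConfig": {
                "teacherModelConfig": {
                    "teacherModelIdentifier": teacher_model  # larger "teacher"
                }
            }
        },
        "trainingDataConfig": {
            "invocationLogsConfig": {
                "invocationLogSource": {"s3Uri": invocation_logs_uri}
            }
        },
        "outputDataConfig": {"s3Uri": output_uri},
    }
```

A caller would pass this dictionary to the Bedrock control-plane client (for example, `boto3.client("bedrock").create_model_customization_job(**job)`), after which Bedrock distills the student model as described in step 3.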

Amazon Bedrock Marketplace

  • Provides access to over 100 publicly available and proprietary models from 30+ providers.
  • Offers models for various use cases and specialized tasks:
    • Language translation (e.g., Upstage, Preferred Networks)
    • Protein sequence generation (e.g., EvolutionaryScale)
    • Image generation (e.g., Stable Diffusion 3.5)
    • Audio dubbing (e.g., Camb.ai's MARS6)
  • Serverless offering with auto-scaling capabilities.
  • Customers can fine-tune compatible models and import them into Bedrock.
  • Use cases:
    • Zense uses Wit's high-performance translation model.
    • A South Korean newspaper uses Upstage's proofreading model to improve accuracy.
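Once a Marketplace model is deployed to an endpoint, it is called through the Bedrock runtime with the endpoint ARN in place of a model ID. A minimal sketch, assuming the InvokeModel API; the endpoint ARN and request-body shape below are placeholders, since each provider documents its own input schema.

```python
# Sketch: invoking a Bedrock Marketplace model deployed to an endpoint.
# Marketplace endpoints accept provider-specific JSON bodies, so the body
# dict here is illustrative, not a real provider schema.
import json

def build_invoke_request(endpoint_arn: str, body: dict) -> dict:
    """Build an InvokeModel request against a Marketplace endpoint ARN."""
    return {
        "modelId": endpoint_arn,        # endpoint ARN replaces the model ID
        "contentType": "application/json",
        "accept": "application/json",
        "body": json.dumps(body),
    }

def invoke_marketplace_model(endpoint_arn: str, body: dict) -> dict:
    """Call the deployed endpoint and decode the JSON response."""
    import boto3  # requires AWS credentials with Bedrock invoke permissions

    client = boto3.client("bedrock-runtime")
    resp = client.invoke_model(**build_invoke_request(endpoint_arn, body))
    return json.loads(resp["body"].read())
```

The auto-scaling noted above happens behind the endpoint, so the calling code stays the same whether the model serves one request or thousands.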

Conclusion

  • Intelligent prompt routing, model distillation, and Amazon Bedrock Marketplace provide a comprehensive set of tools to scale your generative AI workloads.
  • These features allow you to optimize for quality, cost, and speed, enabling you to build more efficient and effective generative AI applications.
