AWS re:Invent 2025 - AI Pioneers: Shipping Transformative GenAI Architectures to Production (SMB301)

Overview of AI Pioneers

  • AI pioneers are organizations using AI to build transformative, customer-facing architectures and use cases
  • They often work close to the AI infrastructure layer, as model builders, model customizers, or teams running large language model (LLM) inference at scale
  • Common architectures include building LLM serving platforms, creative content generation, and domain-specific vision-language models

Building LLM Serving Platforms

Challenges of Scaling LLM Inference

  • LLMs are extremely large, often requiring multiple GPUs to serve a single model
  • Unlike traditional workloads, they consume variable "thinking budgets" at runtime, making compute demand hard to predict
  • Request/response patterns are highly variable, from single-word answers to multi-page outputs
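The "multiple GPUs" point can be made concrete with a back-of-envelope sizing calculation. The sketch below is illustrative, not from the talk: it assumes fp16/bf16 weights (2 bytes per parameter), 80 GiB accelerators, and a flat ~20% overhead for KV cache and activations; real sizing also depends on batch size and context length.

```python
import math

def gpus_needed(params_billion: float, bytes_per_param: int = 2,
                gpu_memory_gib: int = 80, overhead: float = 1.2) -> int:
    """Rough GPU count needed just to hold the model weights in memory.

    Assumptions (illustrative): fp16/bf16 weights, 80 GiB accelerators,
    ~20% headroom for KV cache and activations.
    """
    weight_gib = params_billion * 1e9 * bytes_per_param / (1024 ** 3)
    return math.ceil(weight_gib * overhead / gpu_memory_gib)
```

For example, a 7B-parameter model fits on one such accelerator, a 70B model needs two, and a 405B model needs a dozen, which is why multi-GPU (and multi-node) serving is the norm at this scale.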

Key Requirements of an LLM Serving Platform

  1. Model Choice: Fast access to a variety of foundation models for building generative AI applications
  2. Supporting Services: Vector databases, security, observability, and other requirements for LLM-powered apps
  3. SaaS Capabilities: Rate limiting, cost attribution, usage reporting for LLM-as-a-service offerings
  4. Self-Managed Models: Ability to host and fine-tune custom models on accelerated infrastructure
  5. Deployment Flexibility: Option to deploy LLM inference on-premises or in the cloud
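The cost-attribution and usage-reporting requirement (item 3) can be sketched as a simple per-tenant usage meter. The prices, model name, and class shape below are illustrative assumptions, not actual Bedrock pricing or APIs.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Illustrative per-1K-token prices (input, output) in USD; real model
# pricing varies by provider and model.
PRICES = {"model-a": (0.003, 0.015)}

@dataclass
class UsageMeter:
    """Tracks token usage per (tenant, model) pair for cost attribution."""
    usage: dict = field(default_factory=lambda: defaultdict(lambda: [0, 0]))

    def record(self, tenant: str, model: str, in_tokens: int, out_tokens: int):
        entry = self.usage[(tenant, model)]
        entry[0] += in_tokens
        entry[1] += out_tokens

    def report(self, tenant: str) -> dict:
        """Per-model cost breakdown for one tenant."""
        out = {}
        for (t, model), (i, o) in self.usage.items():
            if t != tenant:
                continue
            price_in, price_out = PRICES[model]
            out[model] = round(i / 1000 * price_in + o / 1000 * price_out, 6)
        return out
```

In a real gateway this metering would sit in the request path and feed a billing or chargeback system.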

Architecture Patterns

  1. Managed Architecture: Uses Amazon Bedrock to provide model choice and supporting services
  2. SaaS Architecture: Adds an LLM gateway to provide rate limiting, cost controls, and other SaaS capabilities
  3. Hybrid Architecture: Leverages Amazon SageMaker HyperPod to host and fine-tune custom LLM models
  4. Multi-Cloud/On-Premises Architecture: Extends the hybrid architecture to enable deployment on customer-managed infrastructure using EKS Hybrid
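The rate-limiting capability of the SaaS gateway in pattern 2 is commonly implemented as a per-tenant token bucket, charging each request its prompt-plus-completion token count. This is a minimal sketch of that technique, not the talk's implementation; capacities and refill rates are arbitrary.

```python
import time

class TokenBucket:
    """Per-tenant token bucket: an LLM gateway can refill buckets at a
    tokens-per-second rate and charge each request its token cost."""

    def __init__(self, capacity: int, refill_per_sec: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill = refill_per_sec
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self, cost: int) -> bool:
        # Refill based on elapsed time, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Injecting the clock makes the limiter deterministic to test; production gateways typically keep these buckets in a shared store (e.g. Redis) so limits hold across gateway replicas.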

Creative Content Generation

Challenges of Consistent Visual Generation

  • Generic AI models cannot capture the specific visual traits, characters, and consistency required for production-level content generation
  • Need to maintain visual fidelity and character essence across multiple images/scenes

Bedrock Fine-Tuning for Customized Models

  • Uses techniques like parameter-efficient fine-tuning (PEFT), distillation, and continued pre-training (CPT) to customize the Amazon Nova Canvas model
  • Requires curated dataset, image captioning, and human-in-the-loop evaluation to ensure consistent, high-quality outputs
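The curated, captioned dataset typically takes the form of a JSONL manifest pairing image locations with captions. The sketch below assumes the "image-ref"/"caption" field names documented for Bedrock image-model customization; verify the exact format against current documentation before use.

```python
import json

def build_manifest(pairs):
    """Serialize (s3_uri, caption) pairs into a JSONL training manifest.

    Field names ("image-ref", "caption") follow the format documented for
    Bedrock image-model customization; confirm against current docs.
    """
    return "\n".join(
        json.dumps({"image-ref": s3_uri, "caption": caption})
        for s3_uri, caption in pairs
    )
```

The captions themselves can come from the automated image-captioning step, then be corrected during human-in-the-loop review before fine-tuning.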

Architecture for LLM-Based Evaluation

  • Automated video processing and character extraction to create fine-tuning dataset
  • Bedrock fine-tuning to generate customized model
  • LLM-based "judge" evaluation to assess visual consistency, prompt adherence, and other criteria at scale
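The judge step above produces per-image scores that must be rolled up into a go/no-go signal for the fine-tuned model. This is a minimal aggregation sketch; the criteria names, 1-5 scale, and pass threshold are illustrative assumptions.

```python
from statistics import mean

# Illustrative evaluation criteria; the talk mentions visual consistency
# and prompt adherence among others.
CRITERIA = ("visual_consistency", "prompt_adherence", "aesthetic_quality")

def aggregate_judgments(judgments, threshold=3.5):
    """Roll up per-image judge scores (assumed 1-5) into per-criterion
    means plus an overall pass/fail verdict."""
    scores = {c: mean(j[c] for j in judgments) for c in CRITERIA}
    scores["passed"] = all(v >= threshold for v in scores.values())
    return scores
```

Running this over a held-out prompt set after each fine-tuning iteration gives a cheap, scalable regression check before any human review.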

Arabic Vision-Language Model for Document Processing

  • Misraji AI, a pioneering AI lab in Saudi Arabia, developed an Arabic-specific vision-language model for OCR and document-processing use cases
  • Leveraged a hybrid approach of real-world and synthetic data, along with iterative fine-tuning strategies, to create a state-of-the-art model
  • Enabled highly accurate Arabic OCR, competing with top models in the market
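Claims like "highly accurate Arabic OCR" are usually quantified with character error rate (CER), a standard OCR metric. Below is a plain Levenshtein-based CER, included as general background rather than Misraji AI's actual evaluation pipeline.

```python
def char_error_rate(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein edit distance between the
    ground-truth text and the OCR output, divided by reference length.
    Lower is better; 0.0 means a perfect transcription."""
    m, n = len(reference), len(hypothesis)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # substitution
        prev = cur
    return prev[n] / m if m else float(n > 0)
```

The metric works unchanged on Arabic strings, since it compares Unicode characters rather than bytes.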

Emerging Architecture: Intelligent Control and Operations Plane (ICOP)

  • Provides a specialized, provider-hosted API endpoint for deploying and managing AI workloads like LLM serving
  • Understands the workload requirements, plans the optimal deployment, handles the provisioning, and monitors the infrastructure
  • Leverages customized, task-specific language models rather than general-purpose assistants to enable fast, cost-effective, and reliable AI workload management
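The "understand requirements, plan deployment" loop can be illustrated with a toy planner that maps a declarative workload spec to a deployment shape. Everything here is hypothetical: the spec fields, instance labels, size thresholds, and monitored metrics are assumptions for illustration, not an actual ICOP API.

```python
def plan_deployment(spec: dict) -> dict:
    """Hypothetical ICOP-style planner: map a declarative LLM-serving
    workload spec to a deployment plan. All names are illustrative."""
    size = spec["model_size_b"]           # model size in billions of params
    target = spec.get("target", "cloud")  # e.g. "cloud" or "on_prem"
    if size <= 8:
        shape = {"accelerators": 1, "instance": "single-gpu"}
    elif size <= 70:
        shape = {"accelerators": 4, "instance": "multi-gpu"}
    else:
        shape = {"accelerators": 8, "instance": "multi-node"}
    return {"workload": spec["name"], "target": target, **shape,
            "monitoring": ["latency_p99", "tokens_per_sec", "gpu_util"]}
```

In the architecture described, a task-specific language model would produce or refine such a plan from the workload description, and the control plane would then provision and monitor the result.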

Key Takeaways

  • AI pioneers are pushing the boundaries of generative AI, building transformative customer-facing applications
  • Scaling LLM inference requires specialized platforms that address model choice, SaaS capabilities, self-managed models, and deployment flexibility
  • Customizing AI models, like image generation, is crucial for maintaining visual consistency and brand identity
  • Domain-specific vision-language models can unlock new capabilities, like state-of-the-art Arabic OCR
  • Emerging "Intelligent Control and Operations Plane" architectures aim to simplify the deployment and management of AI workloads
