AWS re:Invent 2025 - The path to Multimodal Artificial General Intelligence (STP102)

Summary of AWS re:Invent 2025 - The Path to Multimodal Artificial General Intelligence (STP102)

Introduction to Multimodal AGI

  • Current language models like GPT are highly capable with text, but lack an understanding of the physical world
  • To interact with the real world, AGI needs visual and spatial intelligence, not just linguistic intelligence
  • Luma AI's mission is to build multimodal AGI that can generate, understand, and operate within the physical world

Key Features of Ray 3 Model

  • Ray 3 is Luma's state-of-the-art video generation model that can:
    • Reason about and refine its own output in real time to match instructions
    • Export high-quality HDR and EXR files for VFX and post-production
  • Key strengths of Ray 3:
    • Realistic physics simulations, fluid dynamics, and explosions
    • Coherent world motion, depth, and scale
    • Anatomically accurate character animation and movement
    • Detailed lighting effects, textures, and surfaces

New Capabilities of Ray 3

  • HDR and EXR output support for advanced color control and compositing
  • Reasoning model that annotates and refines output to best match instructions
  • Visual annotation mode for sketching and directing model output
  • Draft mode for low-res previews to conserve credits
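
To illustrate why HDR/EXR output matters for color control, here is a minimal sketch in plain Python. It is not Luma's pipeline or API: the pixel values and the helper names `apply_exposure` and `clip_to_sdr` are assumptions made up for this example. It shows that scene-linear floating-point values (the kind of data EXR files typically store) keep highlight detail above 1.0, so an exposure change in post-production can still separate bright areas that a clipped SDR image has already flattened.

```python
# Hypothetical illustration only: values and helper names are assumptions,
# not part of Ray 3 or any Luma API.

def apply_exposure(value: float, stops: float) -> float:
    """Scale a scene-linear value by a number of photographic stops."""
    return value * (2.0 ** stops)

def clip_to_sdr(value: float) -> float:
    """Clamp a value into the 0.0-1.0 range, as a display-referred SDR image would."""
    return min(max(value, 0.0), 1.0)

# Three scene-linear pixel intensities; the last two are brighter than SDR white (1.0).
hdr_pixels = [0.5, 2.0, 4.0]

# The SDR version loses the difference between the two highlights.
sdr_pixels = [clip_to_sdr(v) for v in hdr_pixels]

# Pull exposure down by two stops in post-production.
hdr_graded = [apply_exposure(v, -2) for v in hdr_pixels]
sdr_graded = [apply_exposure(v, -2) for v in sdr_pixels]

print("HDR graded:", hdr_graded)  # highlights stay distinct
print("SDR graded:", sdr_graded)  # highlights collapse to one value
```

After the grade, the HDR highlights remain distinguishable while the SDR highlights share a single value, which is the practical reason compositors prefer floating-point formats.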

Ray 3 Core Features

  • Image-to-video: Turning static images into cinematic motion
  • Text-to-video: Bringing ideas to life through written instructions
  • Modify video: Editing existing videos by changing start frames or prompts
  • Keyframe animation: Seamless transitions between start and end frames
  • Video looping: Creating endless, seamless loop animations

Advanced Style Adherence

  • Ray 3 preserves the look, feel, and aesthetic of the original input
  • Maintains consistent lighting, camera effects, color palette, and surface details
  • Translates a variety of styles, from moody intensity to retro kung fu

Business Impact and Use Cases

  • Content demand has outpaced human production capabilities
  • Generative video unlocks speed and scale for enterprises:
    • Consumer packaged goods and e-commerce: Generating product videos, lifestyle shots, and personalized assets
    • Film and episodic production: Accelerating pre-production, storyboarding, and VFX workflows
    • Gaming: Enhancing 2D and 3D game asset creation and animation
    • Advertising: Producing high-quality, targeted ads at scale

Real-World Examples

  • Coca-Cola 2025 holiday ad campaign created by a small team in 30 days
  • 32-second animation produced for $630, versus $95K-$180K with traditional methods
  • Brands able to generate dozens of creative directions and iterate quickly
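
The cost figures for the 32-second animation can be put in perspective with a quick calculation. The sketch below assumes the traditional range means $95,000 to $180,000; the variable names are illustrative only and no figures beyond those quoted in the talk are used.

```python
# Figures quoted in the talk; the traditional range is read as $95,000-$180,000 (assumption).
ai_cost = 630.0
traditional_low = 95_000.0
traditional_high = 180_000.0
duration_s = 32

# Cost per second of finished animation with the generative workflow.
cost_per_second = ai_cost / duration_s

# How many times more expensive the traditional estimate is.
ratio_low = traditional_low / ai_cost
ratio_high = traditional_high / ai_cost

print(f"Generative cost per second: ${cost_per_second:.2f}")
print(f"Traditional methods cost roughly {ratio_low:.0f}x to {ratio_high:.0f}x more")
```

Under that reading, the generative workflow comes to just under $20 per second of footage, and the traditional estimate is roughly 150 to 290 times higher.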

The Path to Multimodal AGI

  • Luma's goal is to solve multimodal general intelligence, going beyond just video generation
  • Capturing and simulating the physical world is key to empowering anyone to produce cinematic-quality content
