AWS re:Invent 2025 - The path to Multimodal Artificial General Intelligence (STP102)

Summary of AWS re:Invent 2025 - The Path to Multimodal Artificial General Intelligence (STP102)

Introduction to Multimodal AGI

Current language models such as GPT are highly capable but lack an understanding of the physical world

To interact with the real world, AGI needs visual and spatial intelligence, not just linguistic intelligence

Luma AI's mission is to build multimodal AGI that can generate, understand, and operate within the physical world

Key Features of Ray 3 Model

Ray 3 is Luma's state-of-the-art video generation model that can:

Reason about and refine its own output in real time to match instructions
Export high-quality HDR and EXR files for VFX and post-production

Key strengths of Ray 3:

Realistic physics simulations, fluid dynamics, and explosions
Coherent world motion, depth, and scale
Anatomically accurate character animation and movement
Detailed lighting effects, textures, and surfaces

New Capabilities of Ray 3

HDR and EXR output support for advanced color control and compositing

Reasoning model that annotates and refines output to best match instructions

Visual annotation mode for sketching and directing model output

Draft mode for low-res previews to conserve credits

Ray 3 Core Features

Image-to-video: Turning static images into cinematic motion

Text-to-video: Bringing ideas to life through written instructions

Modify video: Editing existing videos by changing start frames or prompts

Keyframe animation: Seamless transitions between start and end frames

Video looping: Creating endless, seamless loop animations

Advanced Style Adherence

Ray 3 preserves the look, feel, and aesthetic of the original input

Maintains consistent lighting, camera effects, color palette, and surface details

Translates a variety of styles, from moody intensity to retro kung fu

Business Impact and Use Cases

Content demand has outpaced human production capabilities

Generative video unlocks speed and scale for enterprises:

Consumer packaged goods and e-commerce: Generating product videos, lifestyle shots, and personalized assets
Film and episodic production: Accelerating pre-production, storyboarding, and VFX workflows
Gaming: Enhancing 2D and 3D game asset creation and animation
Advertising: Producing high-quality, targeted ads at scale

Real-World Examples

Coca-Cola 2025 holiday ad campaign created by a small team in 30 days

32-second animation produced for $630, versus $95,000 to $180,000 with traditional methods

Brands can generate dozens of creative directions and iterate quickly
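
To put the cost figures above in perspective, the arithmetic can be worked through in a few lines of Python. This is only a sketch: the dollar amounts and the 32-second duration are taken as quoted in the summary and are not independently verified, and the function and variable names are illustrative.

```python
# Figures as quoted in the talk summary; treated as given, not verified.
AI_COST_USD = 630
TRADITIONAL_LOW_USD = 95_000
TRADITIONAL_HIGH_USD = 180_000
DURATION_SECONDS = 32


def cost_ratio(traditional_usd: float, ai_usd: float) -> float:
    """Return how many times larger the traditional cost is."""
    return traditional_usd / ai_usd


def savings_percent(traditional_usd: float, ai_usd: float) -> float:
    """Return the cost reduction as a percentage of the traditional cost."""
    return 100.0 * (1.0 - ai_usd / traditional_usd)


def cost_per_second(total_usd: float, seconds: float) -> float:
    """Return the cost of one second of finished animation."""
    return total_usd / seconds


# Compare against both ends of the quoted traditional range.
low_ratio = cost_ratio(TRADITIONAL_LOW_USD, AI_COST_USD)
high_ratio = cost_ratio(TRADITIONAL_HIGH_USD, AI_COST_USD)
low_savings = savings_percent(TRADITIONAL_LOW_USD, AI_COST_USD)
high_savings = savings_percent(TRADITIONAL_HIGH_USD, AI_COST_USD)
ai_per_second = cost_per_second(AI_COST_USD, DURATION_SECONDS)

if __name__ == "__main__":
    print(f"Traditional cost is {low_ratio:.0f}x to {high_ratio:.0f}x higher")
    print(f"Savings: {low_savings:.1f}% to {high_savings:.1f}%")
    print(f"Generated cost per second: ${ai_per_second:.2f}")
```

On these numbers, the traditional estimate is roughly 151 to 286 times the generated cost, a reduction of more than 99 percent at either end of the range. The comparison says nothing about what each figure includes (labor, revisions, licensing), which the summary does not specify.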

The Path to Multimodal AGI

Luma's goal is to solve multimodal general intelligence, going beyond just video generation

Capturing and simulating the physical world is key to empowering anyone to produce cinematic-quality content
