TalksAWS re:Invent 2025 - Building scalable applications with text and multimodal understanding (AIM375)
AWS re:Invent 2025 - Building scalable applications with text and multimodal understanding (AIM375)
Building Scalable Applications with Text and Multimodal Understanding
Introduction
Presented by DH Rajput, Principal Product Manager at Amazon AGI (Artificial General Intelligence)
Discussed how to utilize data beyond just text, such as images, documents, videos, and audio, to build accurate, context-aware applications using Amazon Nova Foundation models.
Joined by Brandon Nyer, Senior Product Manager, and Tyianne, representing Box, to discuss image/video understanding and customer use cases.
Enterprise Needs and Challenges
Organizations have vast amounts of multimodal data (text, structured data, contracts, videos, call recordings) but only use a small portion, mostly text or structured data.
Key challenges with using multimodal data:
Separate models and tools for each modality, leading to complexity and lack of context integration.
Difficulty in reasoning across modalities to deliver customer insights.
Inaccurate models requiring human intervention, which doesn't scale.
Amazon Nova 2.0 Models
Designed to treat all modalities as first-class citizens, with native multimodal processing capabilities.
Variety of models to cater to different cost, latency, and accuracy profiles:
Nova 2 Light: Fast, cost-effective reasoning model
Nova 2 Pro: Most intelligent model for complex tasks
Nova 2 Omni: Unified model for understanding and generation
Nova 2 Sonic: Conversational speech-to-text model
Nova Multimodal Embeddings: Cross-modal search and retrieval
Key features:
1 million context window to process long-form content
Multilingual support (200+ languages, 10+ for speech)
Integrated reasoning capabilities
Document Intelligence
Optimized for two key primitives: Optical Character Recognition (OCR) and Key Information Extraction (KIE).
OCR optimizations:
Robust real-world OCR for challenging documents (handwritten, low-quality scans, tilted)
These cookies are used to collect information about how you interact with this website and allow us to remember you. We use this information to improve and customize your browsing experience, as well as for analytics.
If you decline, your information won’t be tracked when you visit this website. A single cookie will be used in your browser to remember your preference.