AWS re:Invent 2025 - Turn unstructured data in Amazon S3 into AI-ready assets with SageMaker Catalog
Turning Unstructured Data into AI-Ready Assets with SageMaker Catalog
Data Readiness for AI
Importance of building a strong data foundation to support AI and generative AI applications
Shift from traditional data management to a "data mesh" approach, distributing data assets across the organization
Parallel shift in generative AI, moving towards multi-agent collaboration and orchestration
Data Modalities and Challenges
Life sciences example highlighting structured data (electronic health records, omics) and unstructured data (clinical notes, images, PDFs)
Challenges in processing unstructured data:
Governance and access control
Selecting optimal solutions and use cases
Manual processing and parameter tuning
Orchestrating multiple models
Building a Unified Data Platform
AWS services for a unified data platform:
SageMaker Lakehouse for structured and unstructured data storage
SageMaker Catalog for data governance and metadata management
SageMaker Unified Studio for building applications and experiences
SageMaker Catalog for Unstructured Data
Cataloging unstructured data assets from S3 with business context and metadata
Associating glossary terms and providing data quality metrics
Enabling secure, auditable access control and permissions
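The three capabilities above (business metadata, glossary terms, and auditable access control) can be pictured with a small in-memory model. This is a minimal sketch, not the SageMaker Catalog API: the `CatalogAsset` and `Catalog` classes, the bucket name, and the project names are all hypothetical and exist only to show how the pieces relate.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogAsset:
    """Hypothetical catalog record for one S3 object (not the real service schema)."""
    s3_uri: str
    description: str                       # business context
    glossary_terms: set[str] = field(default_factory=set)
    allowed_projects: set[str] = field(default_factory=set)


@dataclass
class Catalog:
    """Toy catalog: registers assets, searches by glossary term, logs access checks."""
    assets: dict[str, CatalogAsset] = field(default_factory=dict)
    audit_log: list[tuple[str, str, bool]] = field(default_factory=list)

    def register(self, asset: CatalogAsset) -> None:
        self.assets[asset.s3_uri] = asset

    def search(self, term: str) -> list[str]:
        # Return the URIs of all assets tagged with the given glossary term.
        return sorted(uri for uri, a in self.assets.items() if term in a.glossary_terms)

    def request_access(self, project: str, s3_uri: str) -> bool:
        # Every decision, granted or denied, is appended to the audit log.
        asset = self.assets.get(s3_uri)
        granted = asset is not None and project in asset.allowed_projects
        self.audit_log.append((project, s3_uri, granted))
        return granted


# Assumed example values; none of these names come from the talk.
uri = "s3://example-bucket/notes/visit-001.pdf"
catalog = Catalog()
catalog.register(CatalogAsset(
    s3_uri=uri,
    description="De-identified clinical visit note",
    glossary_terms={"clinical-note"},
    allowed_projects={"oncology-research"},
))
granted = catalog.request_access("oncology-research", uri)
denied = catalog.request_access("marketing", uri)
```

The point of the model is that discovery (`search`) and authorization (`request_access`) both run against the catalog record rather than against the raw S3 object, so every access decision leaves an audit entry.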
Building Generative AI Applications
Leveraging SageMaker Unified Studio to:
Create knowledge bases from cataloged data
Apply guardrails and governance controls
Deploy conversational AI applications
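The flow above (knowledge base, then guardrails, then a conversational answer) can be sketched without any AWS dependency. This is a toy under stated assumptions: retrieval uses plain keyword overlap where a production knowledge base would typically use embeddings, the guardrail is a simple deny-list, and every name here (`build_knowledge_base`, `retrieve`, `answer`, `BLOCKED_TOPICS`, the sample documents) is hypothetical rather than part of SageMaker Unified Studio.

```python
import re

BLOCKED_TOPICS = {"ssn", "password"}  # assumed deny-list standing in for a guardrail


def tokenize(text: str) -> set[str]:
    # Lowercase and split into alphanumeric tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_knowledge_base(documents: dict[str, str]) -> dict[str, set[str]]:
    # Index each document by its token set.
    return {doc_id: tokenize(text) for doc_id, text in documents.items()}


def retrieve(kb: dict[str, set[str]], question: str, top_k: int = 1) -> list[str]:
    # Rank documents by the number of tokens shared with the question.
    q = tokenize(question)
    scored = [(len(q & tokens), doc_id) for doc_id, tokens in kb.items()]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in ranked[:top_k]]


def answer(kb: dict[str, set[str]], documents: dict[str, str], question: str) -> str:
    # Apply the guardrail before any retrieval happens.
    if tokenize(question) & BLOCKED_TOPICS:
        return "Request blocked by guardrail."
    hits = retrieve(kb, question)
    if not hits:
        return "No relevant document found."
    return documents[hits[0]]


# Assumed sample documents standing in for cataloged assets.
documents = {
    "note-1": "Patient reports mild headache after dose escalation.",
    "protocol-7": "Biomarker sampling schedule for the phase 2 trial.",
}
kb = build_knowledge_base(documents)
```

Calling `answer(kb, documents, "What is the biomarker sampling schedule?")` returns the text of `protocol-7`, while a question mentioning a blocked topic is refused before retrieval runs.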
Real-World Example: Bayer's Data Modernization Journey
Challenges with data silos, lack of trust, and inability to scale
Adopting a data mesh architecture with SageMaker Unified Studio as the central governance control plane
Automating biomarker data ETL using a Bedrock-powered agent
Benefits:
Accelerated time to harmonize data for clinical trials
Improved efficiency of R&D decision-making
Laying a foundation for precision medicine
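To make the harmonization step concrete, here is a minimal sketch of the kind of deterministic mapping an ETL agent might generate or apply when source files name the same biomarker fields differently. It does not reproduce Bayer's pipeline or any Bedrock API; the canonical schema, the alias sets, and the `harmonize` function are assumptions made for illustration.

```python
# Assumed canonical schema: each canonical field lists the source aliases it accepts.
CANONICAL_FIELDS = {
    "subject_id": {"subject_id", "subjid", "patient"},
    "biomarker": {"biomarker", "analyte", "marker_name"},
    "value": {"value", "result", "measurement"},
}


def harmonize(record: dict) -> tuple[dict, list[str]]:
    """Map source field names onto the canonical schema.

    Returns the harmonized record plus the source keys that had no mapping,
    so a reviewer (or an agent) can decide what to do with them.
    """
    harmonized = {}
    unmapped = []
    for key, value in record.items():
        normalized = key.strip().lower()
        for canonical, aliases in CANONICAL_FIELDS.items():
            if normalized in aliases:
                harmonized[canonical] = value
                break
        else:
            unmapped.append(key)
    missing = sorted(set(CANONICAL_FIELDS) - set(harmonized))
    if missing:
        raise ValueError(f"record is missing required fields: {missing}")
    return harmonized, unmapped


# Assumed sample record from one hypothetical source system.
row, leftover = harmonize({"SUBJID": "S-001", "Analyte": "CRP", "Result": 4.2, "site": "A"})
```

Returning the unmapped keys instead of silently dropping them keeps the step auditable, which matters when harmonized data feeds clinical-trial analysis.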
Key Takeaways
Importance of unifying governance and access control for structured and unstructured data
Leveraging SageMaker services to build a scalable, governed data platform for AI
Automating data processing and model deployment with generative AI agents
Driving real-world business impact by modernizing data infrastructure and unlocking the value of unstructured data