AWS re:Invent 2025 - From enterprise data mesh to AI with Amazon SageMaker Unified Studio (IND3322)

Overcoming Data Challenges for Successful AI Adoption

The presentation covers four barriers that organizations face when building a modern data platform for enterprise-wide AI and data-driven insights. The speakers, Razan Wang and Sami Gordon, draw on their experience working with large financial services customers to address these barriers.

Key Barriers and Solutions

1. Breaking Data Silos

  • Challenges: Fragmented domain landscape, lack of data ownership and source of truth, fragmented data discovery, and duplicate data across the organization.
  • Solutions:
    • Establish an operating model with clear roles and responsibilities for data ownership and governance.
    • Implement a multi-account strategy to distribute data domains and maintain security and compliance boundaries.
    • Leverage data contracts to define data ownership, schema, and quality rules.
    • Use the centralized catalog in Amazon SageMaker Unified Studio to enable cross-organization data discovery and access.
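
A data contract of the kind described above can be sketched as a small typed structure. This is a minimal illustration, not a format shown in the talk: the class names, the fields (`dataset`, `domain`, `owner`, `columns`, `quality_rules`), the `pii` flag, and the example `customers` dataset are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    dtype: str
    pii: bool = False  # hypothetical flag marking personally identifiable data


@dataclass(frozen=True)
class DataContract:
    dataset: str
    domain: str
    owner: str
    columns: tuple[Column, ...]
    quality_rules: tuple[str, ...] = ()

    def validate_record(self, record: dict) -> list[str]:
        """Return schema violations for one record (empty list means valid)."""
        expected = {c.name for c in self.columns}
        present = set(record)
        missing = [f"missing column: {n}" for n in sorted(expected - present)]
        extra = [f"unexpected column: {n}" for n in sorted(present - expected)]
        return missing + extra

    def pii_columns(self) -> list[str]:
        """List the columns the contract classifies as PII."""
        return [c.name for c in self.columns if c.pii]


# Hypothetical contract for a retail banking domain (illustrative values only).
contract = DataContract(
    dataset="customers",
    domain="retail_banking",
    owner="retail-data-owner@example.com",
    columns=(
        Column("customer_id", "string"),
        Column("email", "string", pii=True),
        Column("segment", "string"),
    ),
    quality_rules=("customer_id is complete", "customer_id is unique"),
)
```

Keeping ownership, schema, and PII classification in one object means the same contract can feed schema checks at ingestion and tagging in the catalog, which is the role the talk assigns to data contracts.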

2. Establishing Trust in Data

  • Challenges: Data quality issues, lack of data lineage and traceability, unknown PII exposure, and ad-hoc access governance.
  • Solutions:
    • Leverage data contracts to define data quality rules and enable automated data quality checks using AWS Glue Data Quality.
    • Capture data lineage from AWS Glue, Apache Airflow, and dbt, and visualize it in SageMaker Unified Studio.
    • Classify PII data using data contracts and leverage AWS Glue to detect sensitive data during ingestion.
    • Implement fine-grained access control using AWS Lake Formation across the data pipeline.
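
One way to connect contract rules to automated checks is to render them as a Data Quality Definition Language (DQDL) ruleset, the rule language used by AWS Glue Data Quality. The sketch below is an assumption about how that translation might look: `render_dqdl` and the `kind`/`column`/`value` input shape are hypothetical, while `IsComplete`, `IsUnique`, and `ColumnValues` are DQDL rule types. Registering the resulting string with Glue is a separate step and is omitted here.

```python
def render_dqdl(rules: list[dict]) -> str:
    """Render contract-style rules as a DQDL ruleset string.

    Each rule is a dict with a hypothetical shape:
    {"kind": "complete" | "unique" | "min", "column": str, "value": number}
    """
    if not rules:
        raise ValueError("a ruleset needs at least one rule")

    lines = []
    for rule in rules:
        kind, column = rule["kind"], rule["column"]
        if kind == "complete":
            lines.append(f'IsComplete "{column}"')
        elif kind == "unique":
            lines.append(f'IsUnique "{column}"')
        elif kind == "min":
            lines.append(f'ColumnValues "{column}" >= {rule["value"]}')
        else:
            raise ValueError(f"unsupported rule kind: {kind}")

    body = ",\n".join(f"    {line}" for line in lines)
    return f"Rules = [\n{body}\n]"


# Illustrative rules for a hypothetical customers table.
ruleset = render_dqdl(
    [
        {"kind": "complete", "column": "customer_id"},
        {"kind": "unique", "column": "customer_id"},
        {"kind": "min", "column": "age", "value": 0},
    ]
)
print(ruleset)
```

Generating the ruleset from the contract, rather than writing it by hand, keeps the quality rules and the contract from drifting apart.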

3. Enabling Cross-Organization Governance

  • Challenges: Lack of unified data discovery, manual and inefficient access request handling, and absence of business context for data assets.
  • Solutions:
    • Use the federated data catalog in SageMaker Unified Studio to enable unified data discovery across the organization.
    • Implement a self-service access request workflow using pub-sub notifications and conditional access control.
    • Enrich data assets with business glossary terms and map them to specific data elements to provide business context.
    • Utilize Amazon Comprehend to automatically generate descriptions and assign taxonomy for data assets.
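
The self-service access workflow above combines pub-sub notifications with conditional approval. The following is a minimal in-memory sketch of that pattern, not the presenters' implementation: `Topic` stands in for a managed pub-sub service, and the approval policy (auto-approve only when the dataset has no PII) is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AccessRequest:
    requester: str
    dataset: str
    contains_pii: bool
    status: str = "PENDING"


class Topic:
    """Tiny in-memory stand-in for a pub-sub topic."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[AccessRequest], None]] = []

    def subscribe(self, handler: Callable[[AccessRequest], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, request: AccessRequest) -> None:
        # Deliver to subscribers in subscription order.
        for handler in self._subscribers:
            handler(request)


def conditional_approver(request: AccessRequest) -> None:
    # Assumed policy: datasets without PII are approved automatically,
    # everything else is routed to the data owner for manual review.
    request.status = "NEEDS_OWNER_REVIEW" if request.contains_pii else "APPROVED"


audit_log: list[str] = []


def audit(request: AccessRequest) -> None:
    # Record the decision so access governance stays traceable.
    audit_log.append(f"{request.requester}:{request.dataset}:{request.status}")


access_requests = Topic()
access_requests.subscribe(conditional_approver)
access_requests.subscribe(audit)

# Usage: one request is auto-approved, one is escalated.
low_risk = AccessRequest("analyst@example.com", "branch_locations", contains_pii=False)
sensitive = AccessRequest("analyst@example.com", "customer_360", contains_pii=True)
access_requests.publish(low_risk)
access_requests.publish(sensitive)
```

Because the approver and the audit handler only share the topic, further subscribers (for example, an owner notification) can be added without changing the request path.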

4. Driving Data-Driven Insights and AI Experimentation

  • Experiment Lifecycle:
    • Define a hypothesis and identify relevant data sources from the data catalog.
    • Load and prepare the data using tools like Amazon Athena, SageMaker Data Wrangler, and Boto3.
    • Split the data, train a machine learning model, and evaluate the results.
    • Validate the model's performance and feature importance against the initial hypothesis.
    • Deploy the model and make the results available as a new data product in the catalog.
  • Consumption Patterns:
    • Leverage SageMaker Unified Studio's integration with Amazon QuickSight to create interactive dashboards and reports.
    • Use natural language processing to generate data stories and scenarios for business users.
    • Promote data products to the business for wider consumption and continued experimentation.
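
The experiment lifecycle above (hypothesis, split, train, evaluate, validate, publish) can be sketched end to end with the standard library. Everything here is a stand-in under stated assumptions: the data is synthetic, a one-feature decision stump replaces a real training job, and the feature names and the `data_product` fields are hypothetical.

```python
import random


def split(rows, test_fraction=0.25, seed=0):
    """Shuffle with a fixed seed and split into train and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(rows, feature, threshold):
    """Share of rows where (feature value > threshold) matches the label."""
    hits = sum((features[feature] > threshold) == label for features, label in rows)
    return hits / len(rows)


def train_stump(rows):
    """Pick the (feature, threshold) pair with the best training accuracy."""
    candidates = [
        (feature, features[feature])
        for features, _ in rows
        for feature in features
    ]
    return max(candidates, key=lambda c: accuracy(rows, *c))


# Hypothesis (illustrative): customers with many late payments churn.
# Synthetic data: the label depends only on late_payments; noise is unrelated.
rows = [
    ({"late_payments": i % 10, "noise": (i * 7) % 10}, (i % 10) > 4)
    for i in range(200)
]

train, test = split(rows)
feature, threshold = train_stump(train)
test_accuracy = accuracy(test, feature, threshold)

# Validate against the hypothesis: did the model rely on the expected feature?
hypothesis_supported = feature == "late_payments" and test_accuracy > 0.8

# Describe the result as a new data product (hypothetical catalog fields).
data_product = {
    "name": "churn_risk_scores",
    "model": {"feature": feature, "threshold": threshold},
    "test_accuracy": test_accuracy,
    "hypothesis_supported": hypothesis_supported,
}
```

Checking which feature the model actually relied on, and not only its accuracy, mirrors the validation step in the lifecycle: a model that scores well for reasons unrelated to the hypothesis does not confirm it.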

Key Takeaways

  • Establishing data ownership and a clear operating model is crucial for breaking down data silos and enabling cross-organization collaboration.
  • Improving data quality, lineage, and access governance is essential for building trust in data and enabling AI-driven decision-making.
  • Implementing a centralized data catalog and self-service access workflows can significantly improve data discoverability and consumption.
  • Integrating the data platform with tools for data preparation, model training, and business intelligence can streamline the end-to-end data-to-insights lifecycle.
  • Iterative experimentation and promoting data products for wider consumption are key to driving continuous value from the data platform.

Technical Details

  • Key technologies mentioned: Amazon SageMaker Unified Studio, AWS Glue (including AWS Glue Data Quality), AWS Lake Formation, Amazon Athena, SageMaker Data Wrangler, Boto3, Amazon Comprehend, Amazon QuickSight, Apache Airflow, dbt.
  • Specific metrics and outcomes:
    • 70-80% faster time from question to answer
    • 80% data discoverability across the organization
    • 90% reduction in manual data governance processes
    • 5-10x increase in AI and machine learning experimentation

Business Impact

The solutions presented enable organizations to build a future-proof, enterprise-wide data platform that can unlock the full potential of AI and data-driven insights. By overcoming the four barriers (data silos, lack of trust in data, fragmented cross-organization governance, and limited data consumption), organizations can:

  • Achieve a unified view of customer behavior and other critical business data.
  • Accelerate the development and deployment of AI-powered applications and decision-making.
  • Empower business users with self-service access to high-quality data and insights.
  • Ensure regulatory compliance and data security through comprehensive governance.
  • Continuously experiment and innovate using the data platform as the foundation.

Real-World Examples

  • The presenters shared their experience working with a large bank in the ASEAN region to build a modern data platform on AWS, where the primary barrier was data silos.
  • They also discussed a use case around building a unified "Customer 360" view by integrating data from various business domains, such as retail banking, risk, customer analytics, and fraud detection.
