AWS re:Invent 2025 - From enterprise data mesh to AI with Amazon SageMaker Unified Studio (IND3322)
From Enterprise Data Mesh to AI with Amazon SageMaker Unified Studio
Overcoming Data Challenges for Successful AI Adoption
The presentation discusses the key barriers organizations face in building a modern data platform to enable enterprise-wide AI and data-driven insights. The speakers, Razan Wang and Sami Gordon, share their experiences working with large financial services customers to address these challenges.
Key Barriers and Solutions
1. Breaking Data Silos
Challenges: A fragmented domain landscape, no clear data ownership or single source of truth, fragmented data discovery, and duplicate data across the organization.
Solutions:
Establish an operating model with clear roles and responsibilities for data ownership and governance.
Implement a multi-account strategy to distribute data domains and maintain security and compliance boundaries.
Leverage data contracts to define data ownership, schema, and quality rules.
Use the centralized catalog in Amazon SageMaker Unified Studio to enable cross-organization data discovery and access.
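To make the data contract idea concrete, here is a minimal Python sketch of a contract that records ownership, schema, and quality rules in one place. The class names, field names, account ID, and rule strings are all hypothetical and chosen for illustration; the talk summary does not specify a contract format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    dtype: str
    pii: bool = False  # set by the data owner; hypothetical flag


@dataclass(frozen=True)
class DataContract:
    product: str            # data product name (hypothetical)
    domain: str             # owning business domain
    owner: str              # accountable data owner
    producer_account: str   # placeholder AWS account ID of the producing domain
    columns: tuple[Column, ...]
    quality_rules: tuple[str, ...] = ()

    def validate_record(self, record: dict) -> list[str]:
        """Return schema violations for one record; an empty list means it conforms."""
        expected = {c.name for c in self.columns}
        actual = set(record)
        problems = [f"missing column: {n}" for n in sorted(expected - actual)]
        problems += [f"unexpected column: {n}" for n in sorted(actual - expected)]
        return problems


# Illustrative contract for an invented "customer_profile" data product.
contract = DataContract(
    product="customer_profile",
    domain="retail_banking",
    owner="retail-data-owner@example.com",
    producer_account="111122223333",
    columns=(
        Column("customer_id", "string"),
        Column("email", "string", pii=True),
        Column("balance", "double"),
    ),
    quality_rules=("customer_id is unique", "balance is not negative"),
)

print(contract.validate_record({"customer_id": "c-1", "email": "a@example.com"}))
```

Keeping owner, schema, and rules in a single object is one way to give each domain account a clear source of truth that a catalog entry can point back to.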
2. Establishing Trust in Data
Challenges: Data quality issues, lack of data lineage and traceability, unknown PII exposure, and ad-hoc access governance.
Solutions:
Leverage data contracts to define data quality rules and enable automated data quality checks using AWS Glue Data Quality.
Capture data lineage from AWS Glue, Apache Airflow, and dbt, and visualize it in SageMaker Unified Studio.
Classify PII data using data contracts and leverage AWS Glue to detect sensitive data during ingestion.
Implement fine-grained access control using AWS Lake Formation across the data pipeline.
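As a rough illustration of how contract rules might drive automated checks, the sketch below turns a small dictionary-based contract into a ruleset string in the style of AWS Glue Data Quality's DQDL, and lists the columns flagged as PII. The contract layout and the `to_dqdl` and `pii_columns` helpers are assumptions made for this example, not something shown in the talk; registering or running the ruleset in Glue is left out.

```python
# Hypothetical contract fragment: column names and keys are invented for illustration.
contract = {
    "columns": [
        {"name": "customer_id", "required": True, "unique": True},
        {"name": "email", "required": True, "pii": True},
        {"name": "balance", "min": 0},
    ]
}


def to_dqdl(contract: dict) -> str:
    """Translate contract constraints into a DQDL-style ruleset string."""
    rules = []
    for col in contract["columns"]:
        name = col["name"]
        if col.get("required"):
            rules.append(f'IsComplete "{name}"')
        if col.get("unique"):
            rules.append(f'IsUnique "{name}"')
        if "min" in col:
            rules.append(f'ColumnValues "{name}" >= {col["min"]}')
    return "Rules = [\n    " + ",\n    ".join(rules) + "\n]"


def pii_columns(contract: dict) -> list[str]:
    """Columns the owner classified as PII, e.g. to target masking or access policies."""
    return [c["name"] for c in contract["columns"] if c.get("pii")]


print(to_dqdl(contract))
print(pii_columns(contract))
```

Generating the ruleset from the contract, rather than writing it by hand per pipeline, keeps the quality checks and the PII classification tied to the same owner-approved definition.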
3. Enabling Cross-Organization Governance
Challenges: Lack of unified data discovery, manual and inefficient access request handling, and absence of business context for data assets.
Solutions:
Use the federated data catalog in SageMaker Unified Studio to enable unified data discovery across the organization.
Implement a self-service access request workflow using pub-sub notifications and conditional access control.
Enrich data assets with business glossary terms and map them to specific data elements to provide business context.
Utilize Amazon Comprehend to automatically generate descriptions and assign taxonomy for data assets.
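The self-service access workflow could look roughly like the following sketch, which uses a tiny in-memory publish/subscribe bus in place of a managed notification service. The topic names, the `AccessRequest` fields, and the rule "auto-approve unless the asset contains PII" are assumptions for illustration; the summary does not say which conditions the presenters used.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AccessRequest:
    requester: str
    asset: str
    contains_pii: bool
    status: str = "PENDING"


class EventBus:
    """In-memory publish/subscribe bus standing in for a managed notification service."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, request):
        for handler in self._subscribers[topic]:
            handler(request)


def route(bus: EventBus, request: AccessRequest) -> None:
    """Conditional access (assumed rule): non-PII assets are auto-approved,
    PII assets stay pending and the data owner is notified for review."""
    if request.contains_pii:
        bus.publish("access.review_required", request)
    else:
        request.status = "APPROVED"
        bus.publish("access.approved", request)


bus = EventBus()
notifications = []
bus.subscribe("access.review_required",
              lambda r: notifications.append(f"owner review: {r.requester} -> {r.asset}"))
bus.subscribe("access.approved",
              lambda r: notifications.append(f"granted: {r.requester} -> {r.asset}"))

open_req = AccessRequest("analyst-a", "branch_locations", contains_pii=False)
pii_req = AccessRequest("analyst-b", "customer_profile", contains_pii=True)
route(bus, open_req)
route(bus, pii_req)
print(notifications)
```

Decoupling the request from its handlers through topics means owners, auditors, and provisioning jobs can each subscribe without the requester's tooling knowing about them.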
4. Driving Data-Driven Insights and AI Experimentation
Experiment Lifecycle:
Define a hypothesis and identify relevant data sources from the data catalog.
Load and prepare the data using tools like Amazon Athena, SageMaker Data Wrangler, and Boto3.
Split the data, train a machine learning model, and evaluate the results.
Validate the model's performance and feature importance against the initial hypothesis.
Deploy the model and make the results available as a new data product in the catalog.
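The split, train, evaluate, and validate steps of the lifecycle can be sketched with the standard library alone. Everything here is illustrative: the data is synthetic, the hypothesis ("monthly transaction count predicts product uptake") is invented, and a one-feature threshold classifier stands in for whatever model a team would actually train. Feature importance and deployment are not covered.

```python
import random


def make_rows(n, seed=7):
    """Synthetic (transaction_count, took_product) pairs; not real customer data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        tx = rng.randint(0, 40)
        label = 1 if tx + rng.randint(-5, 5) > 20 else 0  # noisy relationship
        rows.append((tx, label))
    return rows


def split(rows, test_fraction=0.25, seed=7):
    """Shuffle a copy of the rows and cut it into train and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(rows, threshold):
    """Share of rows where 'count above threshold' matches the label."""
    return sum((x > threshold) == bool(y) for x, y in rows) / len(rows)


def train(rows):
    """Pick the threshold that maximizes accuracy on the training rows."""
    candidates = sorted({x for x, _ in rows})
    return max(candidates, key=lambda t: accuracy(rows, t))


def baseline_accuracy(train_rows, test_rows):
    """Accuracy of always predicting the majority class seen in training."""
    positives = sum(y for _, y in train_rows)
    majority = 1 if positives * 2 >= len(train_rows) else 0
    return sum(y == majority for _, y in test_rows) / len(test_rows)


train_rows, test_rows = split(make_rows(200))
threshold = train(train_rows)
test_acc = accuracy(test_rows, threshold)
base_acc = baseline_accuracy(train_rows, test_rows)

# Validate against the hypothesis: the feature should beat the naive baseline.
print(f"threshold={threshold} test_accuracy={test_acc:.2f} baseline={base_acc:.2f}")
print("hypothesis supported" if test_acc > base_acc else "hypothesis not supported")
```

Comparing against a majority-class baseline on held-out rows is a simple way to check that the result supports the hypothesis before anything is published back to the catalog as a data product.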
Consumption Patterns:
Leverage SageMaker Unified Studio's integration with Amazon QuickSight to create interactive dashboards and reports.
Use natural language processing to generate data stories and scenarios for business users.
Promote data products to the business for wider consumption and continued experimentation.
Key Takeaways
Establishing data ownership and a clear operating model is crucial for breaking down data silos and enabling cross-organization collaboration.
Improving data quality, lineage, and access governance is essential for building trust in data and enabling AI-driven decision-making.
Implementing a centralized data catalog and self-service access workflows can significantly improve data discoverability and consumption.
Integrating the data platform with tools for data preparation, model training, and business intelligence can streamline the end-to-end data-to-insights lifecycle.
Iterative experimentation and promoting data products for wider consumption are key to driving continuous value from the data platform.
Cited outcome: a 5-10x increase in AI and machine learning experimentation.
Business Impact
The solutions presented are intended to help organizations build an enterprise-wide data platform that supports AI and data-driven insights. By addressing the four barriers above (data silos, lack of trust in data, fragmented cross-organization governance, and limited consumption of data), organizations can:
Achieve a unified view of customer behavior and other critical business data.
Accelerate the development and deployment of AI-powered applications and decision-making.
Empower business users with self-service access to high-quality data and insights.
Ensure regulatory compliance and data security through comprehensive governance.
Continuously experiment and innovate using the data platform as the foundation.
Real-World Examples
The presenters shared their experience working with a large bank in the ASEAN region to build a modern data platform on AWS, where the primary barrier was data silos.
They also discussed a use case around building a unified "Customer 360" view by integrating data from various business domains, such as retail banking, risk, customer analytics, and fraud detection.