Empower your data journey with Amazon DataZone’s data lineage (ANT207-NEW)

Overview

The session covers the following key aspects related to data lineage:

  1. Importance of Data Lineage:

    • Data lineage is crucial for organizations to understand where data is coming from, how it's being used, and the impact of any changes.
    • It helps address challenges such as data silos and the lack of communication and documentation across teams and technologies.
  2. OpenLineage Framework:

    • OpenLineage is an open specification and protocol for capturing data lineage across different data processing frameworks and tools.
    • It allows producers (e.g., Spark, Airflow, dbt) to emit lineage information in a standardized format, and consumers (e.g., Amazon DataZone) to ingest it.
    • This removes the need for a separate integration between each producer and each consumer, promoting interoperability.
  3. Amazon DataZone Lineage:

    • AWS has integrated Amazon DataZone with the OpenLineage framework to provide data lineage capabilities.
    • DataZone aims to address four key themes around data lineage: trust, impact analysis, troubleshooting, and governance.
    • It captures lineage information from sources such as AWS Glue and Amazon Redshift, and provides a visual interface to explore the lineage.
    • DataZone also supports column-level lineage and versioning to facilitate detailed root cause analysis.
  4. Customer Use Case: San Diego Gas & Electric (SDG&E):

    • SDG&E has adopted a data mesh architecture on AWS to manage their complex, siloed data landscape.
    • They have used Amazon DataZone and the OpenLineage framework to establish data lineage, not only for data in AWS but also for on-premises data sources.
    • This has enabled them to improve data compliance, build trusted data products, and accelerate data-driven decision-making across the organization.
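
The producer and consumer roles described under the OpenLineage framework, and the impact analysis theme, can be illustrated with a minimal sketch. It builds two OpenLineage-style run events as plain dictionaries (the producer side) and then walks them to find every dataset downstream of a source table (the consumer side). The field names follow the OpenLineage run event model; the job names, dataset names, namespaces, producer URI, and the spec version in the schema URL are all hypothetical and chosen only for illustration. This is not how Amazon DataZone implements lineage internally.

```python
import json
import uuid
from collections import defaultdict, deque
from datetime import datetime, timezone

# Hypothetical values: replace with the producer URI and spec version you target.
PRODUCER = "https://example.com/hypothetical-lineage-producer"
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"


def dataset(namespace, name):
    """A dataset is identified by its namespace and name."""
    return {"namespace": namespace, "name": name}


def run_event(job_name, inputs, outputs):
    """Producer side: build a minimal COMPLETE run event as a plain dict."""
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": "example-jobs", "name": job_name},
        "inputs": inputs,
        "outputs": outputs,
    }


def downstream(events, start):
    """Consumer side: return every dataset reachable from `start`.

    Each event contributes an edge from each of its inputs to each of its
    outputs; a breadth-first walk then collects everything downstream.
    """
    edges = defaultdict(set)
    for event in events:
        for src in event["inputs"]:
            for dst in event["outputs"]:
                edges[(src["namespace"], src["name"])].add(
                    (dst["namespace"], dst["name"])
                )
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical three-table pipeline: raw -> staging -> analytics.
raw = dataset("example-warehouse", "raw.meter_readings")
staged = dataset("example-warehouse", "staging.meter_readings")
report = dataset("example-warehouse", "analytics.daily_usage")

events = [
    run_event("clean_readings", [raw], [staged]),
    run_event("build_daily_usage", [staged], [report]),
]

# Impact analysis: which datasets are affected if the raw table changes?
impacted = downstream(events, ("example-warehouse", "raw.meter_readings"))

if __name__ == "__main__":
    print(json.dumps(events[0], indent=2))
    print(sorted(impacted))
```

Because both jobs emit events in the same shape, the consumer needs only one parser regardless of which tool produced the event, which is the interoperability benefit the session describes.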

Key Takeaways

  1. Data lineage is crucial for organizations to understand the provenance and movement of data, enabling better data governance, compliance, and data-driven decision-making.
  2. The OpenLineage framework provides a standardized way for data producers and consumers to integrate and capture lineage information, promoting interoperability.
  3. Amazon DataZone, powered by the OpenLineage framework, offers a data lineage solution that addresses key use cases around trust, impact analysis, troubleshooting, and governance.
  4. Customers like SDG&E have used Amazon DataZone and OpenLineage to establish data lineage across their hybrid data landscape, accelerating their data-driven transformation.

Conclusion

The session highlighted the importance of data lineage, the OpenLineage framework, and how Amazon DataZone builds on it to provide a data lineage solution. The customer use case from SDG&E demonstrated the practical benefits of implementing data lineage in a complex, hybrid data environment.
