Cross-engine governance for a data lakehouse built on open standards (ARC332)

Allianz's Data Ecosystem Transformation

Current Challenges

  1. Complex Custom Logic: As a large, regulated company, Allianz had built up custom logic over the past 20 years to adapt external systems to its needs, leaving an ecosystem that was complex and difficult to maintain.
  2. Heterogeneous Data Sources and Ingestion: Around 50 data sources, including databases, SAP data, and Excel sheets, were ingested through a variety of custom mechanisms, which made ingestion complex, error-prone, and difficult to maintain.
  3. Historization Bottleneck: The historization layer, which held a valuable 20-year data history, had problems with access management and with meeting compliance and data privacy requirements, making it difficult for data consumers to get the data they needed.

Transformation to a New Data Ecosystem

  1. Goals: Allianz aimed to create a scalable, performant, and interoperable data ecosystem that could handle any data type and enable data sharing across the company.
  2. Iceberg as the Foundation: Allianz chose Apache Iceberg, an open table format, as the foundation of their new data ecosystem, which provided features like transactional safety, schema evolution, and time travel.
  3. Snowflake Integration: Allianz leveraged Snowflake's support for Iceberg, which allowed for efficient data ingestion, transformation, and sharing, while benefiting from Snowflake's security and governance features.
  4. Data Ingestion and Processing: Allianz used AWS Glue to ingest data from various sources into the Iceberg layer, allowing different processing engines, including Snowflake, to access the data directly without the need for data movement.
  5. Access Management and Data Privacy: Allianz moved certain data privacy and access management rules upfront, ensuring that data leaving the Iceberg layer was already compliant, and leveraged Snowflake's capabilities to further optimize access management.
  6. Automation and Scalability: Allianz achieved 80% automation in its data processing tasks, freeing its teams to focus on higher-value activities, and was able to scale its Iceberg-based ecosystem to handle large volumes of data.
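
The table-format features named in item 2 (transactional safety, schema evolution, time travel) can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions, not Apache Iceberg's API or file layout: `ToyTable`, `Snapshot`, and the column names in the usage note are hypothetical, and every commit here copies the full table contents, which a real table format avoids.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """An immutable view of the table: its schema and its rows at one commit."""
    snapshot_id: int
    columns: Tuple[str, ...]
    rows: Tuple[Dict[str, object], ...]


class ToyTable:
    """Hypothetical toy table; each change appends a new snapshot."""

    def __init__(self, columns: List[str]) -> None:
        # Snapshot 0 is the empty table with its initial schema.
        self._snapshots: List[Snapshot] = [Snapshot(0, tuple(columns), ())]

    @property
    def current(self) -> Snapshot:
        return self._snapshots[-1]

    def _commit(self, columns: Tuple[str, ...], rows: Tuple[Dict[str, object], ...]) -> int:
        new_id = self.current.snapshot_id + 1
        self._snapshots.append(Snapshot(new_id, columns, rows))
        return new_id

    def append(self, new_rows: List[Dict[str, object]]) -> int:
        """Add rows all-or-nothing: validate everything before committing."""
        columns = self.current.columns
        for row in new_rows:
            if set(row) != set(columns):
                raise ValueError(f"row {row} does not match schema {columns}")
        copied = tuple(dict(row) for row in new_rows)
        return self._commit(columns, self.current.rows + copied)

    def add_column(self, name: str) -> int:
        """Schema evolution: add a column without rewriting existing rows."""
        if name in self.current.columns:
            raise ValueError(f"column {name!r} already exists")
        return self._commit(self.current.columns + (name,), self.current.rows)

    def scan(self, snapshot_id: Optional[int] = None) -> List[Dict[str, object]]:
        """Read the latest snapshot, or an older one (time travel)."""
        if snapshot_id is None:
            snap = self.current
        elif 0 <= snapshot_id < len(self._snapshots):
            snap = self._snapshots[snapshot_id]
        else:
            raise KeyError(f"unknown snapshot {snapshot_id}")
        # Rows written before a column existed read back as None for it.
        return [{col: row.get(col) for col in snap.columns} for row in snap.rows]
```

In this sketch, appending a row, calling `add_column("currency")`, and appending again yields three snapshots; `scan(1)` still returns the first row under the old two-column schema, while `scan()` returns both rows with `currency` set to `None` for the older one. A rejected `append` leaves the snapshot list untouched.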

Future Plans

  1. Bring Your Own Data: Allianz plans to empower source systems to directly bring their data into the Iceberg layer, using open table formats, reducing the need for central data ingestion.
  2. Open Data Catalog: Allianz is exploring the use of an open data catalog, such as Apache Polaris, to further enhance interoperability and cross-engine security and access management.
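
The idea behind a shared open catalog can be sketched as follows: every engine resolves tables and permissions through the same catalog, so one grant applies everywhere. This is an illustrative toy, not Apache Polaris's API; `ToyCatalog`, `ToyEngine`, the principals, and the storage location in the usage note are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class ToyCatalog:
    """Hypothetical central catalog: table locations plus access grants."""
    tables: Dict[str, str] = field(default_factory=dict)        # table -> metadata location
    grants: Set[Tuple[str, str]] = field(default_factory=set)   # (principal, table)

    def register(self, table: str, metadata_location: str) -> None:
        self.tables[table] = metadata_location

    def grant(self, principal: str, table: str) -> None:
        if table not in self.tables:
            raise KeyError(table)
        self.grants.add((principal, table))

    def load_table(self, principal: str, table: str) -> str:
        """Return the table's metadata location only if access was granted."""
        if table not in self.tables:
            raise KeyError(table)
        if (principal, table) not in self.grants:
            raise PermissionError(f"{principal} may not read {table}")
        return self.tables[table]


class ToyEngine:
    """Stand-in for any query engine; it holds no access rules of its own."""

    def __init__(self, name: str, catalog: ToyCatalog) -> None:
        self.name = name
        self.catalog = catalog

    def read(self, principal: str, table: str) -> str:
        # The access decision is delegated to the shared catalog.
        location = self.catalog.load_table(principal, table)
        return f"{self.name} reads {location}"
```

With one catalog and two engines built on it, granting `analyst` access to a registered `claims` table lets both engines read it, while a principal without a grant gets `PermissionError` from either engine. The design choice illustrated is that access rules live in one place instead of being re-implemented per engine.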

Key Takeaways

  1. Adopting open table formats, like Apache Iceberg, can help address complex custom logic, heterogeneous data sources, and historization challenges in a data ecosystem.
  2. Integrating Iceberg with a cloud-native data platform, such as Snowflake, can provide a scalable, performant, and interoperable data ecosystem.
  3. Upfront access management and data privacy controls, combined with federated data processing, can improve the operability and governance of a data ecosystem.
  4. Enabling source systems to directly contribute data to the Iceberg layer, using open table formats, can empower decentralized data teams and reduce the burden on central data teams.
  5. Leveraging open data catalog solutions, like Apache Polaris, can further enhance cross-engine interoperability and security for a federated data ecosystem.
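
The upfront privacy controls in takeaway 3 can be sketched as a masking step applied before rows are published for consumers. This is a minimal illustration under assumptions, not the rules Allianz applied: the column names, the `POLICY` mapping, the fixed salt, and the choice of redaction and truncated SHA-256 pseudonymization are all hypothetical.

```python
import hashlib
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def pseudonymize(value: object, salt: str = "demo-salt") -> str:
    """Replace a value with a stable token so joins on the column still work."""
    digest = hashlib.sha256(f"{salt}:{value}".encode("utf-8")).hexdigest()
    return digest[:12]


def redact(value: object) -> str:
    """Hide the value entirely."""
    return "***"


# Hypothetical column-level policy: column name -> masking function.
POLICY: Dict[str, Callable[[object], object]] = {
    "customer_name": redact,
    "customer_id": pseudonymize,
}


def publish(rows: Iterable[Row], policy: Dict[str, Callable[[object], object]] = POLICY) -> List[Row]:
    """Return masked copies of the rows; the input rows are not modified."""
    return [
        {
            col: policy[col](val) if col in policy and val is not None else val
            for col, val in row.items()
        }
        for row in rows
    ]
```

Calling `publish` on a row with `customer_id`, `customer_name`, and `premium` returns a copy in which the name is redacted, the id is replaced by a repeatable 12-character token, and `premium` passes through unchanged. Because masking happens before publication, every downstream engine sees only the already-compliant rows.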
