Data Ingestion Strategies on AWS
Outline
- Data Ingestion Patterns
- Ingesting Data into Data Warehouses
- Using AWS Glue and Amazon Redshift for Data Warehouse Ingestion
- Ingesting Data into Data Lakes
- AWS Glue for Batch and Streaming Ingestion
- Amazon Kinesis Data Streams and Amazon MSK for Real-time Ingestion
- Amazon S3 and Amazon Athena for File-based Ingestion
- Ingesting Data into Lakehouse
- Amazon SageMaker Lakehouse and Zero-ETL Integrations
- Ingesting Data into Log and Analytics Services
- Amazon OpenSearch Service Integrations
- Ingestion Strategies and Best Practices
- Leveraging Zero-ETL Integrations
- Optimizing Performance and Cost for Ingestion
1. Data Ingestion Patterns
The presenters discussed three main data ingestion patterns:
- Inside-Out: Data is ingested from a centralized data lake into purpose-built data stores such as data warehouses or ML applications.
- Outside-In: Data from business partners or specialized systems is moved into the centralized data hub.
- Around the Perimeter: Data is shared directly between purpose-built data stores or between users to meet common business goals, without passing through the central data lake.
2. Ingesting Data into Data Warehouses
- Presenters used Amazon Redshift as an example data warehouse.
- Highlighted the use of AWS Glue and Amazon Redshift's Zero-ETL integrations for efficient data ingestion.
- Zero-ETL allows configuring data movement without creating custom pipelines.
- Supports integration with various data sources like databases, SaaS applications, and files.
- Discussed strategies like auto-copy from S3, integrating streaming data from Kinesis/MSK, and using AWS DMS for on-premises database ingestion.
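The auto-copy strategy is configured in Redshift with a COPY JOB statement. The sketch below builds such a statement as a string; the table, bucket, role, and job names are hypothetical, and the exact COPY options depend on the file format being loaded.

```python
def build_auto_copy_job(table, s3_prefix, iam_role_arn, job_name):
    """Build a Redshift auto-copy (COPY JOB) statement.

    Sketch only: all identifiers are hypothetical, and the FORMAT
    clause must match the files landing under the S3 prefix.
    """
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS PARQUET "
        f"JOB CREATE {job_name} AUTO ON;"
    )

sql = build_auto_copy_job(
    "public.orders",                 # target table (hypothetical)
    "s3://example-bucket/orders/",   # S3 prefix to watch (hypothetical)
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",
    "orders_auto_copy",
)
print(sql)
```

With AUTO ON, Redshift watches the prefix and loads newly arriving files without a custom pipeline.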
3. Ingesting Data into Data Lakes
- Presenters discussed using AWS Glue for both batch and streaming data ingestion into data lakes.
- Glue provides connectors for various data sources and supports custom connectors.
- Continuously running Glue jobs for real-time ingestion from streaming sources like Kinesis and MSK.
- Highlighted Amazon S3 and Amazon Athena for file-based ingestion and querying.
- Discussed the use of Amazon Kinesis Data Firehose for efficient, scalable, and cost-effective data ingestion into data lakes.
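When producing into Kinesis Data Streams, the PutRecords API caps each request at 500 records, so producers typically batch. The sketch below groups events into request-sized batches of PutRecords entries; the event shape and partition-key field are hypothetical, and a real producer would send each batch with `kinesis.put_records(...)` and retry entries reported as failed.

```python
import json

MAX_BATCH = 500  # PutRecords accepts at most 500 records per request

def to_put_records_batches(events, partition_key_field="id"):
    """Group events into PutRecords-sized batches of request entries.

    Sketch only: in production, pass each batch to put_records and
    inspect the response for partially failed entries.
    """
    entries = [
        {"Data": json.dumps(e).encode("utf-8"),
         "PartitionKey": str(e[partition_key_field])}
        for e in events
    ]
    return [entries[i:i + MAX_BATCH] for i in range(0, len(entries), MAX_BATCH)]

batches = to_put_records_batches([{"id": n, "v": n * n} for n in range(1200)])
print(len(batches), [len(b) for b in batches])  # → 3 [500, 500, 200]
```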
4. Ingesting Data into Lakehouse
- Presenters introduced the concept of Amazon SageMaker Lakehouse, which bridges the gap between data warehouses and data lakes.
- Discussed using Zero-ETL Integrations to ingest data from various sources directly into the Lakehouse.
- Highlighted the support for open table formats like Apache Iceberg, Apache Hudi, and Delta Lake for Lakehouse ingestion.
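As a concrete example of an open table format target, an Apache Iceberg table can be created in Athena with a TBLPROPERTIES flag. The sketch below builds that DDL as a string; the table name, columns, partition transform, and S3 location are all hypothetical.

```python
def build_iceberg_ddl(table, location):
    """Build an Athena DDL statement creating an Apache Iceberg table.

    Sketch only: the schema and location are hypothetical placeholders.
    """
    return (
        f"CREATE TABLE {table} (id bigint, event_ts timestamp) "
        "PARTITIONED BY (day(event_ts)) "
        f"LOCATION '{location}' "
        "TBLPROPERTIES ('table_type' = 'ICEBERG');"
    )

ddl = build_iceberg_ddl("analytics.events", "s3://example-bucket/warehouse/events/")
print(ddl)
```

Once created, the table can be written to by Glue, Athena, or other Iceberg-aware engines feeding the lakehouse.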
5. Ingesting Data into Log and Analytics Services
- Presenters focused on ingesting data into Amazon OpenSearch Service for log and security analytics.
- Covered Zero-ETL Integrations with data sources like DynamoDB, DocumentDB, and S3.
- Discussed direct querying of data in S3 and CloudWatch Logs without the need for full ingestion.
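For context on what ingestion into OpenSearch looks like at the wire level, documents are indexed via the `_bulk` API, whose body is newline-delimited JSON alternating action and document lines. The sketch below serializes that body; the index name and documents are hypothetical, and a zero-ETL integration would normally handle this without hand-built requests.

```python
import json

def build_bulk_body(index, docs):
    """Serialize docs into an OpenSearch _bulk request body (NDJSON).

    Sketch of the bulk format only: each document is preceded by an
    action line naming the target index.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the body must end with a newline

body = build_bulk_body("app-logs", [{"level": "ERROR", "msg": "timeout"}])
print(body)
```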
6. Ingestion Strategies and Best Practices
- Leverage Zero-ETL Integrations to reduce operational overhead and improve data availability.
- Optimize performance and cost by:
- Choosing the right worker types and auto-scaling for AWS Glue jobs.
  - Utilizing Kinesis Data Streams' enhanced fan-out and Express brokers for Amazon MSK.
- Implementing fault tolerance and parallelism strategies for Flink.
- Configuring dead-letter queues and selective field mapping for OpenSearch ingestion.
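The dead-letter-queue and selective-field-mapping practices above can be sketched as a single routine: forward only the fields the index needs, and divert malformed records for later replay. The kept field names are hypothetical, and in a managed pipeline the dead-letter destination would be an S3 prefix or queue rather than an in-memory list.

```python
def ingest(records, keep_fields=("ts", "level", "msg")):
    """Apply selective field mapping; route bad records to a dead-letter list.

    Sketch of the pattern only, assuming dict-shaped records.
    """
    delivered, dead_letter = [], []
    for rec in records:
        try:
            # Selective mapping: forward only the fields the sink needs.
            delivered.append({k: rec[k] for k in keep_fields})
        except (KeyError, TypeError):
            dead_letter.append(rec)  # preserve the original for replay
    return delivered, dead_letter

ok, dlq = ingest([
    {"ts": 1, "level": "INFO", "msg": "started", "host": "a"},
    {"ts": 2, "level": "WARN"},  # missing "msg" → dead-lettered
])
print(len(ok), len(dlq))  # → 1 1
```

Keeping the original record in the dead-letter path means no data is lost when a mapping fails.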
Overall, the presenters provided a comprehensive overview of various data ingestion patterns and strategies, highlighting the use of managed AWS services to build efficient, scalable, and cost-effective data ingestion architectures.