Accelerate value from data: Migrating from batch to stream processing (ANT324)

Evolution of Data Processing

  • Humans and businesses have been collecting data for thousands of years, starting with clay tablets in ancient Mesopotamia.
  • Data production and processing have exploded in the 21st century due to the ubiquity of the internet, smartphones, and e-commerce.
  • Data is now being produced continuously, from diverse sources, and is being analyzed by many applications within businesses.

Modern Business Needs

  • Businesses still require reporting, but also need the ability to generate faster insights and power AI/ML capabilities.
  • Faster insights lead to better and faster decision-making, while AI/ML capabilities can provide differentiation for businesses.

Batch vs. Streaming Processing

  • Batch processing is useful for powering business reporting and BI tools, but is insufficient for generating faster insights and powering AI/ML.
  • Streaming processing can provide real-time data processing, enabling faster insights and better support for AI/ML use cases.
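The difference can be sketched in a few lines of plain Python (illustrative only, not tied to any specific service): batch recomputes a result over the full accumulated dataset on a schedule, while streaming keeps a running result up to date as each event arrives.

```python
from typing import Iterable, Iterator

def batch_total(events: Iterable[float]) -> float:
    """Batch: process the whole accumulated dataset in one pass."""
    return sum(events)

def streaming_totals(events: Iterable[float]) -> Iterator[float]:
    """Streaming: emit an up-to-date result after every event."""
    total = 0.0
    for amount in events:
        total += amount
        yield total  # the insight is available immediately, not at end of batch

orders = [10.0, 25.0, 5.0]
print(batch_total(orders))             # one answer, after all data has landed
print(list(streaming_totals(orders)))  # an updated answer after each event
```

The streaming version trades a single authoritative pass for continuously fresh intermediate results, which is exactly what faster insights and online AI/ML inference require.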

Streaming Architecture

  1. Producing and Storing Data:

    • Use CDC (Change Data Capture) tools like AWS DMS or Debezium to stream data changes from databases to streaming storage like Amazon Kinesis or Apache Kafka.
    • This reduces load on databases and provides low-latency data ingestion.
  2. Processing Data in Motion:

    • Use Amazon Managed Service for Apache Flink to continuously process streaming data, performing filtering, enrichment, and aggregation.
    • Flink's stateful processing capabilities enable complex event processing and machine learning inference.
  3. Writing Data to Data Store:

    • Use a serverless service like Amazon Data Firehose to deliver data to different destinations, including data lakes built on Apache Iceberg.
    • Iceberg supports updates and schema evolution, which are important for streaming use cases.
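The three stages above can be simulated locally as a minimal sketch. All names here are hypothetical illustrations of the pattern: a real pipeline would use AWS DMS or Debezium for CDC, Kinesis or Kafka for streaming storage, Managed Service for Apache Flink for stateful processing, and Amazon Data Firehose for buffered delivery to the table.

```python
import json
from collections import defaultdict
from typing import Iterable, Iterator

def make_change_event(op: str, table: str, row: dict) -> str:
    """Stage 1: a CDC-style change record, as a producer might publish it."""
    return json.dumps({"op": op, "table": table, "row": row})

def aggregate_by_key(events: Iterable[str]) -> Iterator[tuple[str, float]]:
    """Stage 2: Flink-style stateful processing - keyed running totals."""
    state: dict[str, float] = defaultdict(float)  # per-key state store
    for raw in events:
        event = json.loads(raw)
        key = event["row"]["customer_id"]
        state[key] += event["row"]["amount"]
        yield key, state[key]  # emit an updated aggregate per event

def buffered_writer(results: Iterable[tuple[str, float]], batch_size: int = 2):
    """Stage 3: Firehose-like buffering before each write to the destination."""
    buffer = []
    for record in results:
        buffer.append(record)
        if len(buffer) >= batch_size:
            yield list(buffer)  # one write to the table
            buffer.clear()
    if buffer:
        yield list(buffer)

changes = [
    make_change_event("insert", "orders", {"customer_id": "c1", "amount": 10.0}),
    make_change_event("insert", "orders", {"customer_id": "c2", "amount": 7.0}),
    make_change_event("insert", "orders", {"customer_id": "c1", "amount": 5.0}),
]
for batch in buffered_writer(aggregate_by_key(changes)):
    print(batch)
```

The per-key dictionary stands in for Flink's managed keyed state, and the buffering stands in for Firehose's size/time-based delivery batching; the real services add durability, checkpointing, and exactly-once semantics that this sketch omits.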

Key Takeaways

  1. Streaming can enable real-time insights and synchronize systems with continuous data production.
  2. Streaming can satisfy both latency-tolerant and latency-sensitive use cases, replacing the need for separate batch and streaming architectures.
  3. Streaming technologies are becoming more mature, cost-effective, and easier to use, making them suitable for mission-critical workloads.

Call to Action

  • Learn more about streaming technologies by exploring the session catalog and workshops.
  • Identify use cases where you can start implementing streaming, even starting small.
  • Provide feedback on the session to help improve future content.
