AWS re:Invent 2025 - Amazon Kinesis Data Streams under the hood (ANT423)

AWS Kinesis Data Streams: Powering Real-Time Streaming at Massive Scale

Overview of Kinesis Data Streams

  • Fully managed, serverless data streaming service
  • Enables real-time collection, processing, and analysis of data from various sources
  • Scales elastically across a wide throughput range, from 100 records/second to 807 million records/second during peak events

Key Kinesis Data Streams Principles

  1. Elastic Scaling: Automatically scales up and down based on load, without manual intervention
  2. Ease of Use: Serverless model, no infrastructure management required
  3. Reliability: Provides low-latency, predictable performance at any scale
  4. High Availability: Architected across multiple availability zones for redundancy

The Shard: Fundamental Unit of Kinesis Scaling

  • The shard is the atomic unit of scale, with defined throughput limits (1 MB/s for writes, 2 MB/s for reads)
  • Partition keys are hashed and distributed across shards, enabling horizontal scalability
  • Challenges with shards include message ordering, scaling back in, and uneven load distribution
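
To make the partition key routing concrete, here is a minimal sketch. It assumes the documented behavior that a partition key is hashed with MD5 into a 128-bit integer and that each shard owns a contiguous range of that hash space. The helper names (even_shard_ranges, hash_key, shard_for_key) and the evenly divided ranges are illustrative assumptions, not the service's internal code.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1  # MD5 produces a 128-bit value


def even_shard_ranges(shard_count):
    """Split the 128-bit hash key space into contiguous, near-equal ranges."""
    size = (MAX_HASH_KEY + 1) // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = MAX_HASH_KEY if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def hash_key(partition_key):
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key not covered by any shard range")
```

Because the mapping is deterministic, every record with the same partition key lands on the same shard. That preserves per-key ordering, and it is also why a single hot key can overload one shard.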

Kinesis On-Demand and On-Demand Advantage

  • On-Demand mode automatically scales shards up and down based on load
  • Kinesis On-Demand Advantage adds "warm throughput" reservations for predictable performance
  • On-Demand Advantage is 40% cheaper than regular On-Demand for workloads with committed usage

Large Record Support

  • Increased maximum record size from 1 MB to 10 MB, without additional cost
  • Uses a token bucket approach to handle bursts within shard throughput limits
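
The token bucket idea can be sketched as follows. The summary does not give the bucket's parameters, so the refill rate (1 MB/s, matching the per-shard write limit) and the capacity (10 MB, enough for one maximum-size record) are assumptions chosen for illustration. The class and method names are hypothetical.

```python
class TokenBucket:
    """Illustrative token bucket; one token represents one byte."""

    def __init__(self, rate_bytes_per_s, capacity_bytes):
        self.rate = rate_bytes_per_s
        self.capacity = capacity_bytes
        self.tokens = capacity_bytes  # start with a full bucket
        self.last = 0.0               # timestamp of the last refill

    def try_consume(self, size_bytes, now):
        """Refill for the elapsed time, then admit the record if it fits."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False  # caller would be throttled and should retry later


# Assumed parameters: 1 MB/s refill, 10 MB burst capacity.
bucket = TokenBucket(rate_bytes_per_s=1_000_000, capacity_bytes=10_000_000)
```

With these assumed numbers, a full bucket admits one 10 MB record immediately, after which the shard accepts further writes only as tokens refill at the sustained rate.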

Handling Spiky Workloads: Social Media Sentiment Analysis Example

  • Scenario: Launching a new product, with a viral social media spike
  • On-Demand Advantage with warm throughput reservations prevented data loss during the spike
  • Automatically scaled from 4 shards to 128 shards to handle 57.2 million records
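
A rough sizing sketch shows how shard counts relate to load. It assumes the 1 MB/s per-shard write limit quoted earlier plus a per-shard limit of 1,000 records/second, which comes from the public service quotas rather than from this summary. The function name and input rates are hypothetical, and On-Demand mode's actual scaling policy is not modeled here.

```python
import math

WRITE_MB_PER_S_PER_SHARD = 1          # from the shard limits quoted earlier
WRITE_RECORDS_PER_S_PER_SHARD = 1000  # assumption: public per-shard quota


def shards_needed(mb_per_s, records_per_s):
    """Smallest shard count that satisfies both write limits."""
    by_bytes = math.ceil(mb_per_s / WRITE_MB_PER_S_PER_SHARD)
    by_records = math.ceil(records_per_s / WRITE_RECORDS_PER_S_PER_SHARD)
    return max(1, by_bytes, by_records)
```

Under these limits, 4 shards correspond to 4 MB/s or 4,000 records/second of write capacity, and 128 shards to 128 MB/s or 128,000 records/second.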

Kinesis Streaming Architecture Under the Hood

  • Shard splitting and merging to maintain message ordering and enable scaling back in
  • Distributed storage across multiple availability zones for high availability
  • Front-end API layer that abstracts the complexity of the backend infrastructure
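
Splitting and merging can be pictured as operations on hash key ranges. This is a minimal sketch that assumes each shard owns a contiguous range of a 128-bit hash space. The function names are hypothetical, and the midpoint split is one possible policy rather than the service's confirmed behavior.

```python
def split_shard(start, end):
    """Split a parent hash key range into two child ranges at the midpoint."""
    if end <= start:
        raise ValueError("range too small to split")
    mid = (start + end) // 2
    return (start, mid), (mid + 1, end)


def merge_shards(left, right):
    """Merge two adjacent hash key ranges into a single range."""
    if left[1] + 1 != right[0]:
        raise ValueError("shards are not adjacent")
    return (left[0], right[1])
```

Each partition key still maps to exactly one range before and after the operation. A consumer that finishes the parent shard before reading its children therefore sees every key's records in order.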

Consumer Responsibilities

  • Awareness of shards and lease management for distributed processing
  • Kinesis Client Library v3 handles state management and load balancing automatically
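
To illustrate what lease management involves, the sketch below spreads shard leases across workers so that no worker holds more than one lease above any other. It is a deliberately simple round-robin assignment with hypothetical names. It is not the Kinesis Client Library's algorithm, which also handles failures, checkpoints, and rebalancing.

```python
def assign_leases(shard_ids, worker_ids):
    """Distribute shard leases round-robin across the given workers."""
    if not worker_ids:
        raise ValueError("at least one worker is required")
    assignment = {worker: [] for worker in worker_ids}
    for i, shard in enumerate(sorted(shard_ids)):
        worker = worker_ids[i % len(worker_ids)]
        assignment[worker].append(shard)
    return assignment
```

When the shard count changes after a split or merge, or a worker joins or leaves, the assignment has to be recomputed. This is the bookkeeping the client library takes off the application's hands.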

Producer Best Practices

  • Batch multiple records into larger payloads to reduce network overhead
  • Use Kinesis Producer Library for automatic record aggregation, compression, and retries
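
Batching can be sketched without any SDK calls. The function below groups records into batches bounded by a record count and a byte budget. The default limits (500 records, 5 MiB per request) are assumptions to check against the current PutRecords quotas, and the function name is hypothetical. The entry shape mirrors the PartitionKey/Data fields that a PutRecords request expects.

```python
def batch_records(records, max_count=500, max_bytes=5 * 1024 * 1024):
    """Group (partition_key, data) pairs into batches within the given limits."""
    batches, current, current_bytes = [], [], 0
    for partition_key, data in records:
        # Count both the key and the payload toward the byte budget.
        size = len(partition_key.encode("utf-8")) + len(data)
        if size > max_bytes:
            raise ValueError("single record exceeds the batch byte limit")
        if current and (len(current) >= max_count
                        or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append({"PartitionKey": partition_key, "Data": data})
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

Each resulting batch could then be sent as one request, replacing hundreds of individual writes with a single network round trip. The Kinesis Producer Library goes further by aggregating several user records into one stream record.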

Key Takeaways

  • Kinesis Data Streams provides a highly scalable, reliable, and easy-to-use streaming platform
  • Innovative architectural decisions, like shard splitting and distributed storage, enable massive scale
  • Managed services like On-Demand Advantage and client libraries simplify streaming application development
  • Elastic scaling and warm throughput reservations let Kinesis Data Streams absorb spiky workloads and business events

Additional Resources

  • Kinesis Data Streams documentation: [QR Code Link]
  • Kinesis Data Streams demo application: [QR Code Link]
  • Kinesis Producer Library: [QR Code Link]
  • Kinesis Client Library: [QR Code Link]
