AWS re:Invent 2025 - Amazon Kinesis Data Streams under the hood (ANT423)
AWS Kinesis Data Streams: Powering Real-Time Streaming at Massive Scale
Overview of Kinesis Data Streams
Fully managed, serverless data streaming service
Enables real-time collection, processing, and analysis of data from various sources
Scales elastically from small workloads of about 100 records/second up to peaks of 807 million records/second during major events
Key Kinesis Data Streams Principles
Elastic Scaling: Automatically scales up and down based on load, without manual intervention
Ease of Use: Serverless model, no infrastructure management required
Reliability: Provides low-latency, predictable performance at any scale
High Availability: Architected across multiple Availability Zones for redundancy
The Shard: Fundamental Unit of Kinesis Scaling
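The shard is the atomic unit of scale, with fixed per-shard limits: writes up to 1 MB/s or 1,000 records/s, reads up to 2 MB/s
Each record's partition key is hashed with MD5 to a 128-bit value, and each shard owns a contiguous range of hash keys, so adding shards scales the stream horizontally
Challenges with shards include preserving message ordering as shards change, scaling back in, and uneven load distribution (hot shards)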
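To make the partition-key mapping concrete, here is a minimal sketch that hashes a key with MD5, reads the digest as an unsigned big-endian 128-bit integer (an assumption about byte order), and finds the shard whose range contains it. The even split of the hash space and the helper names `hash_key`, `even_ranges`, and `shard_for` are hypothetical; a real stream's ranges come from its shard list.

```python
import hashlib
from bisect import bisect_right

MAX_HASH_KEY = 2**128 - 1  # hash keys span 0 .. 2^128 - 1


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer via MD5 (big-endian read assumed)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_ranges(shard_count: int) -> list[int]:
    """Starting hash key of each shard if the key space is split evenly (hypothetical)."""
    step = (MAX_HASH_KEY + 1) // shard_count
    return [i * step for i in range(shard_count)]


def shard_for(partition_key: str, starts: list[int]) -> int:
    """Index of the shard whose range contains the key's hash; starts[0] must be 0."""
    return bisect_right(starts, hash_key(partition_key)) - 1


starts = even_ranges(4)
for key in ["user-1", "user-2", "user-1"]:
    print(key, shard_for(key, starts))  # the same key always lands on the same shard
```

Because the mapping depends only on the key, all records with one partition key go to one shard, which is what gives per-key ordering and also what produces hot shards when a few keys dominate traffic.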
Kinesis On-Demand and On-Demand Advantage
On-Demand mode automatically scales shards up and down based on load
Kinesis On-Demand Advantage adds "warm throughput" reservations for predictable performance
On-Demand Advantage is 40% cheaper than standard On-Demand for workloads with committed usage
Large Record Support
Increased maximum record size from 1 MB to 10 MB, without additional cost
Uses a token bucket approach so a shard can absorb short bursts, such as a single large record, while sustained throughput stays within the shard limits
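A generic token bucket shows how a burst can be admitted without raising the long-run rate. This is a minimal sketch, not the Kinesis implementation: the class name, the refill rate of 1 MB/s, and the 10 MB capacity are assumptions chosen to echo the limits above, and time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Tokens (bytes) refill at `rate` per second, capped at `capacity`."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last = 0.0         # time of the last refill, in seconds

    def _refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_consume(self, size: float, now: float) -> bool:
        """Admit a write of `size` bytes at time `now` if enough tokens remain."""
        self._refill(now)
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


MB = 1_000_000
bucket = TokenBucket(rate=1 * MB, capacity=10 * MB)
print(bucket.try_consume(10 * MB, now=0.0))  # True: the full bucket absorbs one large record
print(bucket.try_consume(1 * MB, now=0.5))   # False: only 0.5 MB has refilled
print(bucket.try_consume(1 * MB, now=1.0))   # True: 1 MB has refilled after 1 s
```

The capacity sets how large a burst can be; the refill rate sets the sustained limit, so a burst is "paid back" by a quiet period afterwards.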
Handling Spiky Workloads: Social Media Sentiment Analysis Example
Scenario: A new product launch triggers a viral spike in social media traffic
On-Demand Advantage with warm throughput reservations prevented data loss during the spike
Automatically scaled from 4 shards to 128 shards to handle 57.2 million records
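The shard counts in this example follow from per-shard limits. The sketch below computes the minimum number of provisioned shards for a given rate from the 1 MB/s and 2 MB/s limits plus the documented 1,000 records/s write limit. It treats 1 MB as 1,000,000 bytes for simplicity and assumes every consumer uses shared (non-enhanced-fan-out) reads; the input rates are hypothetical, not figures from the talk.

```python
import math

WRITE_BYTES_PER_SEC = 1_000_000  # 1 MB/s per shard (simplified to decimal MB)
WRITE_RECORDS_PER_SEC = 1_000    # 1,000 records/s per shard
READ_BYTES_PER_SEC = 2_000_000   # 2 MB/s per shard, shared by standard consumers


def shards_needed(records_per_sec: int, avg_record_bytes: int, consumers: int = 1) -> int:
    """Smallest shard count that satisfies write-bytes, write-records, and read limits."""
    write_bytes = records_per_sec * avg_record_bytes
    read_bytes = write_bytes * consumers  # each shared-throughput consumer reads everything
    return max(
        1,
        math.ceil(write_bytes / WRITE_BYTES_PER_SEC),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(read_bytes / READ_BYTES_PER_SEC),
    )


print(shards_needed(4_000, 500))    # 4
print(shards_needed(128_000, 500))  # 128
```

With small records the record-count limit dominates; with large records the byte limit does. On-Demand mode performs this sizing automatically, which is why the stream could grow from 4 to 128 shards without operator action.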
Kinesis Streaming Architecture Under the Hood
Shard splitting and merging, with each new shard recording its parent shards, to preserve message ordering and enable scaling back in
Distributed storage across multiple availability zones for high availability
Front-end API layer that abstracts the complexity of the backend infrastructure
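Splitting and merging can be modeled as operations on hash key ranges. This is a minimal sketch with a hypothetical `Shard` class: a split divides a parent's range at a new starting hash key (mirroring the `NewStartingHashKey` idea of the SplitShard API), a merge requires adjacent ranges (as MergeShards does), and in both cases the parents are closed to new writes while children remember their lineage so consumers can finish a parent before reading its children.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    shard_id: str
    start: int            # inclusive starting hash key
    end: int              # inclusive ending hash key
    parents: tuple = ()   # ids of the shard(s) this one was created from
    is_open: bool = True  # closed shards take no new writes but stay readable


def split(parent: Shard, new_start: int, ids: tuple[str, str]) -> tuple[Shard, Shard]:
    """Split `parent` at `new_start` into two children covering the same range."""
    if not parent.start < new_start <= parent.end:
        raise ValueError("new_start must fall inside the parent's range")
    parent.is_open = False
    left = Shard(ids[0], parent.start, new_start - 1, (parent.shard_id,))
    right = Shard(ids[1], new_start, parent.end, (parent.shard_id,))
    return left, right


def merge(a: Shard, b: Shard, new_id: str) -> Shard:
    """Merge two adjacent shards into one child covering both ranges."""
    low, high = sorted((a, b), key=lambda s: s.start)
    if low.end + 1 != high.start:
        raise ValueError("shards must have adjacent hash key ranges")
    a.is_open = b.is_open = False
    return Shard(new_id, low.start, high.end, (low.shard_id, high.shard_id))


root = Shard("shard-0", 0, 2**128 - 1)
left, right = split(root, 2**127, ("shard-1", "shard-2"))
merged = merge(left, right, "shard-3")
print(merged.start == root.start and merged.end == root.end)  # True
print(merged.parents)  # ('shard-1', 'shard-2')
```

Merging is what makes scaling back in possible: without it, a stream that split during a spike would keep its larger shard count afterwards.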
Consumer Responsibilities
Consumers must track shards and manage leases so that each shard is processed by exactly one worker at a time
Kinesis Client Library v3 handles state management and load balancing automatically
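The core of lease management is giving every shard exactly one live owner and spreading shards across workers. The function below is a simplified, hypothetical sketch that balances by lease count only; it is not KCL's algorithm (KCL keeps leases in a DynamoDB table and v3 uses its own load-balancing logic), and the name `rebalance` and its inputs are assumptions.

```python
from typing import Dict, List, Optional


def rebalance(leases: Dict[str, Optional[str]], workers: List[str]) -> Dict[str, str]:
    """Give every shard one live owner and keep per-worker lease counts within one."""
    if not workers:
        raise ValueError("need at least one live worker")
    # Keep leases whose owner is still alive; shards of dead workers become unowned.
    owners = {s: w for s, w in leases.items() if w in workers}
    load = {w: 0 for w in workers}
    for w in owners.values():
        load[w] += 1
    # Hand each unowned shard to the currently least-loaded worker.
    for shard in sorted(leases):
        if shard not in owners:
            target = min(workers, key=lambda w: load[w])
            owners[shard] = target
            load[target] += 1
    # Move leases from the busiest to the idlest worker until counts differ by <= 1.
    while True:
        busiest = max(workers, key=lambda w: load[w])
        idlest = min(workers, key=lambda w: load[w])
        if load[busiest] - load[idlest] <= 1:
            return owners
        shard = next(s for s in sorted(owners) if owners[s] == busiest)
        owners[shard] = idlest
        load[busiest] -= 1
        load[idlest] += 1


print(rebalance({"s1": "w1", "s2": "w1", "s3": "w1", "s4": None}, ["w1", "w2"]))
```

Moving a lease means the new owner resumes from the shard's last checkpoint, which is why checkpointing and lease state are managed together.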
Producer Best Practices
Batch multiple records into larger payloads to reduce network overhead
Use the Kinesis Producer Library (KPL) for automatic record aggregation, compression, and retries
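Request-level batching can be sketched as a greedy packer that starts a new batch when either a record-count or a byte limit would be exceeded. The defaults below assume the long-documented PutRecords limits (500 records and 5 MiB per request, partition keys included); check current documentation, since large-record support may change them. Each batch could then be sent with one PutRecords call (for example boto3's `put_records`). Note this differs from KPL aggregation, which packs several user records into a single Kinesis record.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, bytes]  # (partition_key, data)


def batch_records(
    records: Iterable[Record],
    max_records: int = 500,            # assumed per-request record limit
    max_bytes: int = 5 * 1024 * 1024,  # assumed per-request size limit
) -> Iterator[List[Record]]:
    """Greedily pack records into batches that respect both limits."""
    batch: List[Record] = []
    size = 0
    for key, data in records:
        rec_size = len(key.encode("utf-8")) + len(data)  # keys count toward the limit
        if rec_size > max_bytes:
            raise ValueError("a single record exceeds the batch byte limit")
        if batch and (len(batch) == max_records or size + rec_size > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append((key, data))
        size += rec_size
    if batch:
        yield batch


sizes = [len(b) for b in batch_records(("k", b"x") for _ in range(1200))]
print(sizes)  # [500, 500, 200]
```

Fewer, fuller requests cut per-request network overhead, which is the point of the batching guidance above.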
Key Takeaways
Kinesis Data Streams provides a highly scalable, reliable, and easy-to-use streaming platform
Innovative architectural decisions, like shard splitting and distributed storage, enable massive scale
Managed services like On-Demand Advantage and client libraries simplify streaming application development
Kinesis Data Streams can handle spiky workloads and business events with ease, thanks to its elastic scaling and advanced features
Additional Resources
Kinesis Data Streams documentation
Kinesis Data Streams demo application
Kinesis Producer Library
Kinesis Client Library