Streaming Data on AWS: Key Launches and Customer Insights
Overview
- This session covered the latest updates and features in AWS's streaming data portfolio: Amazon Kinesis, Amazon Managed Streaming for Apache Kafka (Amazon MSK), and Amazon Managed Service for Apache Flink (Amazon MSF).
- The session focused on four areas that matter most to customers of streaming data technologies: performance, cost, resilience, and ease of use.
- The speakers shared details on recent launches and improvements across these areas, including:
  - Amazon Kinesis Client Library (KCL) 3.0 for improved performance and cost savings
  - Per-second billing for Amazon MSF, providing more granular pricing
  - Amazon MSK Express Brokers for up to 3x throughput, 20x faster scaling, and improved resilience
Ingestion, Storage, and Processing
- AWS offers a suite of streaming data technologies, including:
  - Ingestion and storage: Amazon Kinesis Data Streams (KDS) and Amazon MSK
  - Stream processing: Amazon MSF
  - Connecting sources and destinations: Amazon Data Firehose (formerly Amazon Kinesis Data Firehose) and Amazon MSK Connect
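As a rough illustration of the ingestion step, the sketch below builds the arguments for a Kinesis Data Streams `PutRecord` request. The stream name `orders-stream`, the event fields, and the helper `build_put_record_request` are hypothetical and were not part of the session; `StreamName`, `Data`, and `PartitionKey` are the keyword arguments accepted by boto3's Kinesis `put_record` call. The network call itself is left as a comment so the example runs without AWS credentials.

```python
import json


def build_put_record_request(stream_name: str, event: dict, key_field: str) -> dict:
    """Build keyword arguments for a Kinesis Data Streams PutRecord call."""
    return {
        "StreamName": stream_name,
        # Record payloads are bytes; JSON is one common encoding choice.
        "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
        # Records that share a partition key are routed to the same shard.
        "PartitionKey": str(event[key_field]),
    }


request = build_put_record_request(
    "orders-stream",  # hypothetical stream name
    {"order_id": 1234, "status": "created"},  # hypothetical event
    key_field="order_id",
)
print(request["PartitionKey"])

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("kinesis").put_record(**request)
```

Keeping request construction separate from the client call is only a convenience for this sketch; it makes the payload easy to inspect before anything is sent.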
Customer Spotlight: Mercado Libre
- Mercado Libre, a major Latin American e-commerce company, used Amazon KDS to process over 30 million incoming messages and 50 million outgoing messages per day, achieving six nines (99.9999%) of uptime and reliable data replication across regions.
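To put these figures in perspective, the following back-of-the-envelope sketch converts the daily message counts into average per-second rates and translates six nines of uptime into a yearly downtime budget. It assumes that "6-nines" means 99.9999% availability measured over a 365-day year; the function names are illustrative, and averages say nothing about peak traffic.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def average_rate_per_second(messages_per_day: int) -> float:
    """Average message rate implied by a daily total."""
    return messages_per_day / SECONDS_PER_DAY


def allowed_downtime_seconds(nines: int, period_seconds: int = SECONDS_PER_YEAR) -> float:
    """Downtime budget for an availability of `nines` nines (6 -> 99.9999%)."""
    unavailability = 10 ** (-nines)
    return period_seconds * unavailability


incoming = average_rate_per_second(30_000_000)
outgoing = average_rate_per_second(50_000_000)
downtime = allowed_downtime_seconds(6)

print(f"incoming: ~{incoming:.0f} msg/s, outgoing: ~{outgoing:.0f} msg/s")
print(f"six nines allows ~{downtime:.1f} s of downtime per year")
```

Under these assumptions the daily totals work out to roughly 347 incoming and 579 outgoing messages per second on average, and six nines leaves a budget of about 31.5 seconds of downtime per year.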
Key Streaming Use Cases
Customers are using AWS streaming services for:
- Real-time analytics
- Real-time data transformation
- Ingestion into a data lake
- Event-driven architectures
What's Important to Customers?
- Performance: Improved throughput and faster scaling
- Cost: Optimized compute costs and more granular billing
- Resilience: High availability and faster recovery
- Ease of Use: Reduced maintenance overhead and seamless integrations
Key Launches and Improvements
Amazon Kinesis
- KCL 3.0 provides more balanced workload distribution, enabling up to 30% compute cost savings
Amazon MSF
- Moved from per-hour to per-second billing, providing more granular pricing
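The effect of billing granularity can be sketched with simple arithmetic. The example below assumes that usage is rounded up to the next whole billing increment and uses a made-up hourly rate; neither the rounding model nor the price comes from the session, and real bills may include other components.

```python
import math


def billed_cost(runtime_seconds: int, hourly_rate: float, granularity_seconds: int) -> float:
    """Cost when usage is rounded up to whole billing increments."""
    increments = math.ceil(runtime_seconds / granularity_seconds)
    billed_seconds = increments * granularity_seconds
    return billed_seconds * hourly_rate / 3600


HYPOTHETICAL_RATE = 0.10  # illustrative price per unit-hour, not an AWS price
runtime = 61 * 60  # a job that runs for 61 minutes

per_hour = billed_cost(runtime, HYPOTHETICAL_RATE, granularity_seconds=3600)
per_second = billed_cost(runtime, HYPOTHETICAL_RATE, granularity_seconds=1)

print(f"hourly increments: {per_hour:.4f}, per-second increments: {per_second:.4f}")
```

In this toy case a 61-minute job is charged for two full hours under hourly increments but only for 3,660 seconds under per-second increments, which is where the savings for short or bursty workloads would come from.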
Amazon MSK
- Introduced Express Brokers, delivering up to 3x throughput, 20x faster scaling, and 90% faster recovery
- Significantly improved performance, elasticity, availability, and ease of use
When to Use Amazon MSK?
- Standard Brokers: For customers migrating from existing Kafka setups and needing fine-grained control
- MSK Serverless: For workloads that don't require Kafka management
- Express Brokers: For scaled Kafka deployments, balancing performance, elasticity, and automation
Availability and Resilience
- Three pillars of highly available streaming services:
  - Impact detection and avoidance
  - Real-time responsiveness
  - Redundant systems
Highlights:
- Seamless broker removal in MSK to avoid impacting bootstrap brokers
- 20x faster rebalancing with MSK Express Brokers compared to standard Apache Kafka
- Cross-region replication in MSK that preserves the original topic names
Ease of Use
- Key aspects of ease of use:
  - Choices to meet customer needs
  - Diverse sources and sinks
  - Agility to get to new versions
Highlights:
- Automated database-to-Apache Iceberg table replication in Amazon Data Firehose (formerly Amazon Kinesis Data Firehose)
- Real-time retrieval-augmented generation (RAG) pipeline built on Amazon MSK with Amazon Bedrock and Amazon OpenSearch Service
- Support for the latest Apache Flink versions, in-place upgrades, and automated rollbacks in Amazon MSF
Customer Spotlight: Verizon
- Verizon migrated from self-managed Apache Kafka to Amazon MSK, achieving:
  - Improved scalability, capacity, and performance
  - Reduced data loss from 5-10 TB to less than 0.1%
  - Seamless in-place upgrades and security patches
  - Faster scaling during peak events