Amazon EKS as data platform for analytics (KUB405)
Data and Analytics Platforms
Data has become a key commodity that arrives in many formats and serves many personas (external users, internal users, data scientists, developers, etc.)
Platforms need to support real-time decision-making and process massive data volumes
Platform engineering has emerged to address the needs of both developers (autonomy, open-source tools) and platform teams (security, scalability, cost, performance)
New data-intensive workloads (notebooks, data lakes, data meshes, streaming, ML/AI) pose new challenges for platform engineering
Optimizing Analytics Platforms on Kubernetes
Layer 1: Building a Production-Ready Kubernetes Cluster
Use non-routable secondary IP ranges for pods so the cluster can scale without exhausting routable VPC addresses
Configure VPC-CNI for efficient IP management
Optimize CoreDNS performance and resolution
Leverage managed scaling for CoreDNS
Monitor the Kubernetes control plane, API throttling, and network health
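As a rough illustration of the IP planning behind the first two points, the sketch below carves one pod subnet per Availability Zone out of a non-routable secondary CIDR and estimates how many addresses each subnet offers. The CIDR `100.64.0.0/16`, the AZ names, and the `/18` subnet size are assumptions chosen for the example, not values from the talk. The `VPC_CNI_ENV` dictionary lists settings of the VPC CNI `aws-node` DaemonSet that are commonly adjusted for this purpose; the values shown are illustrative only.

```python
import ipaddress

# Settings of the VPC CNI "aws-node" DaemonSet that are often tuned for
# IP efficiency. The values here are illustrative assumptions.
VPC_CNI_ENV = {
    "AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG": "true",  # place pods in separate subnets
    "ENABLE_PREFIX_DELEGATION": "true",            # assign /28 prefixes to ENIs
    "WARM_PREFIX_TARGET": "1",                     # keep one spare prefix per node
}


def plan_pod_subnets(secondary_cidr, az_names, new_prefix):
    """Carve one pod subnet per Availability Zone out of a secondary CIDR."""
    network = ipaddress.ip_network(secondary_cidr)
    subnets = network.subnets(new_prefix=new_prefix)
    # zip stops after the last AZ; remaining subnets stay unallocated.
    return dict(zip(az_names, subnets))


def usable_addresses(subnet):
    """AWS reserves five addresses in every VPC subnet."""
    return subnet.num_addresses - 5


# Hypothetical inputs: a CG-NAT range split across three AZs.
plan = plan_pod_subnets(
    "100.64.0.0/16", ["us-east-1a", "us-east-1b", "us-east-1c"], 18
)
for az, subnet in plan.items():
    print(az, subnet, usable_addresses(subnet))
```

Sizing the pod subnets up front like this makes it easier to check whether the planned range can hold the expected number of concurrent Spark executors before the cluster is built.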
Layer 2: Installing Open-Source Tools
Use the Spark Operator for running Apache Spark
Integrate Apache YuniKorn for priority-based job scheduling
Leverage workflow engines like Apache Airflow or Argo Workflows
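To make the Spark Operator and scheduler integration more concrete, here is a minimal sketch that builds a `SparkApplication` manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f -` accepts. The job name, namespace, queue name, image, and resource sizes are placeholders. The sketch assumes YuniKorn is installed with its admission controller so that pods carrying the `applicationId` and `queue` labels are routed to the named queue; the talk summary does not say how the integration was wired.

```python
import json


def build_spark_application(name, namespace, queue, executors):
    """Return a SparkApplication manifest for the Spark Operator."""
    # Labels that YuniKorn reads to group pods into one application
    # and place it in a queue (assumed setup, see lead-in).
    labels = {"applicationId": name, "queue": queue}
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Scala",
            "mode": "cluster",
            "image": "spark:3.5.1",  # placeholder image
            "mainClass": "org.apache.spark.examples.SparkPi",
            "mainApplicationFile": (
                "local:///opt/spark/examples/jars/"
                "spark-examples_2.12-3.5.1.jar"
            ),
            "sparkVersion": "3.5.1",
            "driver": {
                "cores": 1,
                "memory": "2g",
                "serviceAccount": "spark",  # hypothetical service account
                "labels": labels,
            },
            "executor": {
                "cores": 2,
                "memory": "4g",
                "instances": executors,
                "labels": labels,
            },
        },
    }


manifest = build_spark_application("daily-agg", "analytics", "root.analytics", 4)
print(json.dumps(manifest, indent=2))
```

A workflow engine such as Apache Airflow or Argo Workflows could submit manifests like this one as individual steps of a pipeline.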
Layer 3: Onboarding Tenants
Provide a self-service API for tenants to manage resources (IAM, S3, etc.)
Use projects like AWS Controllers for Kubernetes (ACK) to extend the Kubernetes API
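One way to picture the self-service flow is a small helper that turns a tenant name into the Kubernetes objects the platform would apply on the tenant's behalf. The sketch below emits a namespace plus an S3 bucket declared through the ACK S3 controller's `Bucket` resource. The tenant name, the bucket naming scheme, and the helper itself are hypothetical; a real onboarding flow would likely also create IAM roles and quotas.

```python
import json


def tenant_resources(tenant, bucket_suffix="data"):
    """Return a kubectl-compatible List with a namespace and an ACK S3 bucket."""
    # S3 bucket names are globally unique, so a real scheme would
    # probably add an account or organization prefix.
    bucket_name = f"{tenant}-{bucket_suffix}"
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": tenant},
    }
    bucket = {
        "apiVersion": "s3.services.k8s.aws/v1alpha1",
        "kind": "Bucket",
        "metadata": {"name": bucket_name, "namespace": tenant},
        "spec": {"name": bucket_name},
    }
    return {"apiVersion": "v1", "kind": "List", "items": [namespace, bucket]}


resources = tenant_resources("team-growth")
print(json.dumps(resources, indent=2))
```

Because the bucket is an ordinary Kubernetes object, tenants can request it with the same tooling and review process they already use for their workloads.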
Customer Case Study: AppsFlyer
Challenges
Massive data volumes (100+ PB daily)
Highly dynamic and distributed compute resources
Strict SLAs for data processing
Solutions
Migrated from EC2 to EKS with Karpenter for efficient scaling and cost optimization
Leveraged Graviton instances and local storage for performance
Enriched observability by combining metrics from Karpenter, Kubernetes, and Spark
Empowered data engineers with self-service APIs and automation
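For readers unfamiliar with Karpenter, the following sketch shows roughly what a node pool restricted to Graviton (arm64) instances with local NVMe storage could look like. It is not AppsFlyer's actual configuration: the pool name, the `EC2NodeClass` name, and the CPU limit are invented for illustration, and the field layout assumes the Karpenter `karpenter.sh/v1` API.

```python
import json


def graviton_nodepool(name, node_class, cpu_limit):
    """Return a Karpenter NodePool limited to arm64 nodes with local NVMe."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": node_class,  # hypothetical node class
                    },
                    "requirements": [
                        # Graviton instances report the arm64 architecture.
                        {
                            "key": "kubernetes.io/arch",
                            "operator": "In",
                            "values": ["arm64"],
                        },
                        # Only instance types that ship local NVMe disks.
                        {
                            "key": "karpenter.k8s.aws/instance-local-nvme",
                            "operator": "Gt",
                            "values": ["0"],
                        },
                    ],
                }
            },
            # Upper bound on total CPU this pool may provision.
            "limits": {"cpu": cpu_limit},
        },
    }


nodepool = graviton_nodepool("spark-graviton", "default", "2000")
print(json.dumps(nodepool, indent=2))
```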
Results
60% cost reduction
35% improvement in SLA
Reduced operational overhead for platform engineers
Key Takeaways
Optimize and monitor EKS for analytics workloads using best practices
Align tools and practices to foster organizational growth
Enable self-service APIs to empower data engineers and scientists