Amazon EKS as data platform for analytics (KUB405)

Data and Analytics Platforms

Data has become a key commodity, arriving in many formats and consumed by many personas (external users, internal users, data scientists, developers, etc.)
Platforms need to support real-time decision-making and the processing of massive data volumes
Platform engineering has emerged to serve both developers (autonomy, open-source tools) and platform teams (security, scalability, cost, performance)
New data-intensive workloads (notebooks, data lakes, data meshes, streaming, ML/AI) pose new challenges for platform engineering

Optimizing Analytics Platforms on Kubernetes

Layer 1: Building a Production-Ready Kubernetes Cluster

Use non-routable IP ranges for network scaling
Configure VPC-CNI for efficient IP management
Optimize CoreDNS performance and resolution
Leverage managed scaling for CoreDNS
Monitor the Kubernetes control plane, API throttling, and network health
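
As a rough illustration of the first item in this list, the sketch below estimates how many pod IPs a non-routable secondary CIDR could provide. The 100.64.0.0/16 range, the /18 subnet size, and the function name are assumptions chosen for illustration, not values from the talk. The 5 reserved addresses per subnet reflect how Amazon VPC reserves addresses in every subnet.

```python
import ipaddress


def usable_pod_ips(cidr, subnet_prefix, reserved_per_subnet=5):
    """Count addresses left for pods after splitting `cidr` into subnets.

    reserved_per_subnet defaults to 5 because Amazon VPC reserves the
    first four addresses and the last address of each subnet.
    """
    network = ipaddress.ip_network(cidr)
    total = 0
    for subnet in network.subnets(new_prefix=subnet_prefix):
        total += max(subnet.num_addresses - reserved_per_subnet, 0)
    return total


if __name__ == "__main__":
    # Hypothetical plan: one /16 from the 100.64.0.0/10 shared address
    # space, split into four /18 pod subnets.
    print(usable_pod_ips("100.64.0.0/16", 18))
```

A calculation like this is only a planning aid. The number of IPs a node can actually hand out also depends on the instance type and on how the VPC CNI is configured.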

Layer 2: Installing Open-Source Tools

Use the Spark Operator for running Apache Spark
Integrate Apache YuniKorn for priority-based job scheduling
Leverage workflow engines like Apache Airflow or Argo Workflows
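
To make the first two items more concrete, here is a minimal sketch that builds a SparkApplication manifest for the Spark Operator and asks for Apache YuniKorn as the batch scheduler. The field names follow the operator's v1beta2 API as I understand it and should be checked against the installed version. The image, file path, queue name, service account, Spark version, and resource sizes are all hypothetical placeholders.

```python
import json


def spark_application(name, namespace, queue, image, main_file, executors=2):
    """Return a SparkApplication manifest as a plain dict."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Python",
            "mode": "cluster",
            "image": image,
            "mainApplicationFile": main_file,
            "sparkVersion": "3.5.0",  # assumption: match your image
            # Hand scheduling to YuniKorn instead of the default scheduler.
            "batchScheduler": "yunikorn",
            "batchSchedulerOptions": {"queue": queue},
            "driver": {"cores": 1, "memory": "2g", "serviceAccount": "spark"},
            "executor": {"instances": executors, "cores": 2, "memory": "4g"},
        },
    }


if __name__ == "__main__":
    app = spark_application(
        name="daily-aggregation",                    # hypothetical job name
        namespace="team-a",                          # hypothetical tenant namespace
        queue="root.team-a",                         # hypothetical YuniKorn queue
        image="example.com/spark:latest",            # placeholder image
        main_file="local:///opt/app/aggregate.py",   # placeholder path
    )
    # JSON is valid YAML, so this output can be piped to `kubectl apply -f -`.
    print(json.dumps(app, indent=2))
```

A workflow engine such as Apache Airflow or Argo Workflows would typically submit a manifest like this as one step of a larger pipeline.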

Layer 3: Onboarding Tenants

Provide a self-service API for tenants to manage resources (IAM, S3, etc.)
Use projects like AWS Controllers for Kubernetes (ACK) to extend the Kubernetes API
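
One way to picture such a self-service API is a small generator that turns a tenant name into ACK custom resources, sketched below. The API groups and kinds are those of the ACK S3 and IAM controllers as I understand them and should be verified against the installed controllers. The naming scheme, the function name, and the trust policy are hypothetical.

```python
import json


def tenant_resources(tenant, trust_policy):
    """Return ACK manifests (an S3 bucket and an IAM role) for one tenant."""
    bucket = {
        "apiVersion": "s3.services.k8s.aws/v1alpha1",
        "kind": "Bucket",
        "metadata": {"name": f"{tenant}-data", "namespace": tenant},
        # Hypothetical naming scheme; S3 bucket names must be globally unique.
        "spec": {"name": f"{tenant}-data-lake"},
    }
    role = {
        "apiVersion": "iam.services.k8s.aws/v1alpha1",
        "kind": "Role",
        "metadata": {"name": f"{tenant}-spark", "namespace": tenant},
        "spec": {
            "name": f"{tenant}-spark-role",
            # ACK expects the trust policy as a JSON string.
            "assumeRolePolicyDocument": json.dumps(trust_policy),
        },
    }
    return [bucket, role]


if __name__ == "__main__":
    # Placeholder trust policy; a real one would name the cluster's
    # OIDC provider and the tenant's service account.
    policy = {"Version": "2012-10-17", "Statement": []}
    for manifest in tenant_resources("team-a", policy):
        print(json.dumps(manifest, indent=2))
```

Because the output is ordinary Kubernetes objects, tenants can request cloud resources through the same review and GitOps flow they already use for workloads.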

Customer Case Study: AppsFlyer

Challenges

Massive data volumes (100+ PB daily)
Highly dynamic and distributed compute resources
Strict SLAs for data processing

Solutions

Migrated from EC2 to EKS with Karpenter for efficient scaling and cost optimization
Leveraged Graviton instances and local storage for performance
Enriched observability by combining metrics from Karpenter, Kubernetes, and Spark
Empowered data engineers with self-service APIs and automation
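
The talk summary does not show the actual configuration, so the following is only a sketch of what a Karpenter NodePool restricted to Graviton (arm64) instances with local NVMe storage might look like. Field names follow the Karpenter v1 API as I understand it. The pool name, node class name, CPU limit, and consolidation policy are hypothetical choices, not the customer's settings.

```python
import json


def graviton_nodepool(name, node_class, cpu_limit):
    """Return a Karpenter NodePool manifest as a plain dict."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": node_class,
                    },
                    "requirements": [
                        # Graviton instances use the arm64 architecture.
                        {"key": "kubernetes.io/arch",
                         "operator": "In", "values": ["arm64"]},
                        # Only instance types that ship with local NVMe disks.
                        {"key": "karpenter.k8s.aws/instance-local-nvme",
                         "operator": "Gt", "values": ["0"]},
                    ],
                }
            },
            # Cap the total CPU this pool may provision.
            "limits": {"cpu": str(cpu_limit)},
            # Let Karpenter remove or replace underused nodes to save cost.
            "disruption": {"consolidationPolicy": "WhenEmptyOrUnderutilized"},
        },
    }


if __name__ == "__main__":
    pool = graviton_nodepool("spark-arm64", "default", 1000)
    print(json.dumps(pool, indent=2))
```

Spark images and any native dependencies need arm64 builds before a pool like this is useful.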

Results

60% cost reduction
35% improvement in SLA
Reduced operational overhead for platform engineers

Key Takeaways

Optimize and monitor EKS for analytics workloads using best practices
Align tools and practices to foster organizational growth
Enable self-service APIs to empower data engineers and scientists
