Netflix’s efficient network configuration for millions of containers (NFX306)

Scaling Container Networking at Netflix

Containers at Netflix

  • Netflix originally ran all workloads directly on EC2 instances and later migrated to containers.
  • Key reasons for using containers:
    1. Portability and consistency - Ensures an application runs consistently across development, testing, and production environments.
    2. Faster deployment - Containers can be deployed quickly, which is useful for CI/CD pipelines.
    3. Cost and efficiency - Allows better utilization of resources like CPUs, GPUs, and memory.

Networking Challenges with Containers

  • Providing IP addresses to hundreds of containers on a single node.
  • Ensuring containers can access services and be accessed securely and efficiently at Netflix scale.

Netflix's Container Management System - Titus

  • Titus is Netflix's multi-tenant container orchestration system with tight AWS integration.
  • Titus acts as a "magic box" that can deploy applications as containers on EC2 instances, improving resource utilization.

Container Networking Requirements

  1. Developer Experience: Containers should behave like EC2 VMs, with each container having its own single IP address.
  2. Isolation:
    • Data plane isolation - Containers should be isolated from each other's network traffic.
    • Account-level isolation - Some teams require their applications to run in separate accounts.
  3. Low Latency: The network setup time for containers, including for short-lived containers and bursts, should be minimal.

Container Networking Design

  1. Secondary ENIs: Attaching multiple secondary ENIs to an EC2 instance and allocating IPs to containers.
    • Limitations: It is hard to meet the account-isolation requirement, and the number of ENIs per instance is capped.
  2. ENI Trunking:
    • Using a single secondary "trunk" ENI and associating branch ENIs in different VPCs/accounts.
    • Uses VLAN tagging and IPVLAN to route traffic to containers.
  3. Bandwidth Limiting and Traffic Shaping:
    • Using Linux traffic control (TC) with Hierarchical Token Bucket (HTB) algorithm to limit bandwidth and isolate noisy containers.
    • Classifying traffic based on container IP addresses using BPF.
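The bandwidth-limiting idea above can be illustrated with a small user-space sketch. This is not Netflix's implementation and not the kernel's HTB: it is a single-level token bucket per container (no class hierarchy and no borrowing between classes), and a dictionary lookup stands in for the BPF classifier. The names `TokenBucket` and `Shaper`, and the rates used, are hypothetical.

```python
import ipaddress


class TokenBucket:
    """Token bucket that refills at `rate` bytes/sec, capped at `burst` bytes."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate          # sustained rate in bytes per second
        self.burst = burst        # maximum number of bytes that can accumulate
        self.tokens = burst       # start with a full bucket
        self.last = 0.0           # time of the last refill, in seconds

    def allow(self, size: int, now: float) -> bool:
        # Refill for the elapsed time, never exceeding the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


class Shaper:
    """Classifies packets by source IP and applies that container's bucket."""

    def __init__(self):
        self.classes = {}  # container IP address -> TokenBucket

    def add_container(self, ip: str, rate: float, burst: float) -> None:
        self.classes[ipaddress.ip_address(ip)] = TokenBucket(rate, burst)

    def send(self, src_ip: str, size: int, now: float) -> bool:
        # Stand-in for the classifier: look up the bucket by container IP.
        bucket = self.classes.get(ipaddress.ip_address(src_ip))
        if bucket is None:
            return False  # unknown source: drop in this sketch
        return bucket.allow(size, now)
```

Because each container has its own bucket, a container that exhausts its tokens is throttled without affecting the tokens available to its neighbors, which is the isolation property the shaping is meant to provide.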

Reducing Network Setup Latency

  1. Pre-Assigning and Caching IPs: Reducing API calls to assign IPs by pre-assigning and caching multiple IPs.
  2. Request Batching: Batching IP assignment requests to reduce the number of API calls.
  3. Prefix Delegation:
    • Requesting an IPv6 prefix delegation from AWS, allowing containers to use 2^48 addresses without additional API calls.
    • Enables "High-Scale Network Mode", where containers share a common subnet and security groups.
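To make the prefix delegation arithmetic concrete, here is a minimal sketch using Python's standard `ipaddress` module. It assumes a delegated prefix of length /80, which is consistent with the 2^48 figure above (128 - 80 = 48 host bits). The prefix comes from the IPv6 documentation range, and the `PrefixAllocator` class is a hypothetical illustration of handing out addresses locally, not the system described in the talk.

```python
import ipaddress


class PrefixAllocator:
    """Hands out addresses from a delegated prefix without any remote call."""

    def __init__(self, prefix: str):
        self.network = ipaddress.IPv6Network(prefix)
        self.next_offset = 1   # skip offset 0 (the prefix's base address)
        self.free = []         # released addresses available for reuse

    def allocate(self) -> ipaddress.IPv6Address:
        # Prefer a previously released address, otherwise take the next one.
        if self.free:
            return self.free.pop()
        if self.next_offset >= self.network.num_addresses:
            raise RuntimeError("prefix exhausted")
        addr = self.network[self.next_offset]
        self.next_offset += 1
        return addr

    def release(self, addr: ipaddress.IPv6Address) -> None:
        self.free.append(addr)


# Assumed example prefix (documentation range), delegated once up front.
allocator = PrefixAllocator("2001:db8:0:0:1::/80")
print(allocator.network.num_addresses == 2 ** 48)  # a /80 leaves 48 host bits
print(allocator.allocate() in allocator.network)
```

Once the prefix is attached, every container launch on that node becomes a local bookkeeping operation, which is why the per-container API call disappears.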

IPv6 Adoption and Transition

  • Netflix is committed to migrating to IPv6 to leverage the benefits of prefix delegation.
  • Developed an in-house solution called TSA (Titus Seccomp Agent) to transparently handle communication between IPv6-only containers and IPv4-only endpoints.

IPMan: Netflix's IP Management System

  • IPMan (IP Management) is the system that manages container networking at Netflix.
  • It consists of a distributed service cluster that handles IP address assignment and network setup for containers.
  • Leverages techniques like request batching and prefix delegation to optimize network setup latency and API costs.
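The request batching mentioned above can be sketched as follows. This is a simplified, synchronous illustration under stated assumptions: `FakeCloudApi.assign_ips` is a hypothetical stand-in for a rate-limited cloud API, `IpAssignmentBatcher` is an invented name, and the batch size is arbitrary. A production service would also flush on a timer so a partially filled batch does not wait indefinitely.

```python
class FakeCloudApi:
    """Hypothetical backend that counts how many API calls it receives."""

    def __init__(self):
        self.calls = 0
        self.counter = 0

    def assign_ips(self, count: int) -> list:
        self.calls += 1
        ips = [f"10.0.0.{self.counter + i + 1}" for i in range(count)]
        self.counter += count
        return ips


class IpAssignmentBatcher:
    """Collects per-container requests and assigns IPs one batch at a time."""

    def __init__(self, assign_many, max_batch: int = 8):
        self.assign_many = assign_many  # callable: count -> list of IPs
        self.max_batch = max_batch
        self.pending = []               # container IDs waiting for an IP
        self.results = {}               # container ID -> assigned IP

    def request(self, container_id: str) -> None:
        self.pending.append(container_id)
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        batch, self.pending = self.pending, []
        ips = self.assign_many(len(batch))  # one API call for the whole batch
        for container_id, ip in zip(batch, ips):
            self.results[container_id] = ip


api = FakeCloudApi()
batcher = IpAssignmentBatcher(api.assign_ips, max_batch=4)
for i in range(10):
    batcher.request(f"container-{i}")
batcher.flush()  # flush the final partial batch
print(api.calls, len(batcher.results))
```

With a batch size of 4, ten container launches cost three backend calls instead of ten; the trade-off is a small added wait for requests that arrive early in a batch.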

Ongoing Improvements

  • Continued IPv6 migration to benefit from prefix delegation.
  • Exploring more performant traffic shaping mechanisms to address global lock issues in HTB.
  • Investigating ways to reduce the extra TCP stack traversal overhead in the current IPVLAN-based data plane.
