Netflix's massive multi-account journey: Year two (NFX402)

Reimagining Multi-Account Deployments for Security and Speed

Introduction

  • Pratik Sharma, Principal Solutions Architect at AWS, has worked closely with Netflix's security team for the past five years.
  • At re:Invent 2022, the speakers began a conversation about reimagining multi-account AWS deployments with security and speed in mind.
  • Over the past couple of years, AWS and Netflix have collaborated on an idea, which they present in this talk.

The Problem

  • Historically, the triangle of AWS accounts, IAM roles, and compute resources has been tightly coupled, for valid reasons.
  • Netflix tried to migrate workloads from a few large accounts into smaller ones, but moving compute resources together with their roles and everything else those resources touch is a challenging task.

The Solution

  • Netflix came up with a new idea to decouple IAM roles from AWS accounts containing compute resources while taking advantage of the strong isolation boundaries that AWS accounts offer.
  • This decoupling has become the cornerstone of Netflix's massive multi-account journey.
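One way for IAM to trust an issuer rather than an account is an IAM OIDC identity provider. The minimal sketch below assumes an internal issuer at the hypothetical host `oidc.example.internal` is registered as an OIDC provider in the account that owns the role, and builds a trust policy that accepts tokens for one workload identity no matter which account its compute runs in. The account ID, audience, and subject values are placeholders; the talk does not describe Netflix's actual token claims.

```python
import json


def oidc_trust_policy(account_id: str, issuer_host: str, audience: str, subject: str) -> dict:
    """Build a role trust policy that accepts web identity tokens from an OIDC issuer.

    Assumes `issuer_host` is already registered as an IAM OIDC identity provider in
    `account_id` (the account that owns the role). The workload presenting the token
    can run in any account; the token's `sub` claim, not the account, identifies it.
    """
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{issuer_host}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                # OIDC condition keys are prefixed with the provider URL (minus https://).
                "Condition": {
                    "StringEquals": {
                        f"{issuer_host}:aud": audience,
                        f"{issuer_host}:sub": subject,
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical account, issuer host, audience, and workload identity.
    policy = oidc_trust_policy(
        "111122223333", "oidc.example.internal", "sts.amazonaws.com", "app:billing-api"
    )
    print(json.dumps(policy, indent=2))
```

With a policy like this in place, the workload would exchange its token for role credentials through STS `AssumeRoleWithWebIdentity` (in boto3, `sts.assume_role_with_web_identity` with `RoleArn`, `RoleSessionName`, and `WebIdentityToken`).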

Key Capabilities

  1. Account-Agnostic Credential Delivery:

    • Enabling AWS IAM to trust an internal token issuer or certificate issuer.
    • Leveraging this to deliver arbitrary IAM role credentials to workloads.
    • Implementing a lightweight OIDC provider and an IMDS proxy.
  2. Reasoning about Migration Targets:

    • Estimating migration complexity, security risk, and operational risk for each application.
    • Leveraging application metadata to assess the risk and complexity.
  3. Technical Capabilities for Migration:

    • Implementing a CD hook for the migration process.
    • Deploying a robust account vending solution.
    • Providing an on/off switch for the new credential delivery mechanism.
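The IMDS proxy and the on/off switch can be illustrated together. This is a minimal sketch, assuming the proxy sits between the workload and the instance metadata service and intercepts only the IAM credential paths; `ImdsProxy`, `fetch_role_creds`, and `passthrough` are hypothetical names. In practice the credential fetch would exchange a token from the internal OIDC provider for role credentials, and IMDSv2 session-token handling is omitted here. The `enabled` flag models the switch: when it is false, every request falls through to the real IMDS.

```python
import json
from dataclasses import dataclass
from typing import Callable, Tuple

# IMDS path under which instance-role credentials are served.
CREDS_PREFIX = "/latest/meta-data/iam/security-credentials/"

Response = Tuple[int, str]  # (HTTP status, body)


@dataclass
class Credentials:
    access_key_id: str
    secret_access_key: str
    token: str
    expiration: str  # ISO 8601, e.g. "2030-01-01T00:00:00Z"


@dataclass
class ImdsProxy:
    role_name: str                               # role delivered to the workload (any account)
    fetch_role_creds: Callable[[], Credentials]  # hypothetical: OIDC token -> STS credentials
    passthrough: Callable[[str], Response]       # forwards a request to the real IMDS
    enabled: bool = True                         # on/off switch for the new delivery path

    def handle_get(self, path: str) -> Response:
        # Switched off, or not a credential path: behave exactly like the real IMDS.
        if not self.enabled or not path.startswith(CREDS_PREFIX):
            return self.passthrough(path)
        role = path[len(CREDS_PREFIX):]
        if role == "":
            # Listing request: advertise only the delivered role.
            return 200, self.role_name
        if role != self.role_name:
            return 404, "Not Found"
        creds = self.fetch_role_creds()
        # Subset of the fields in the IMDS credential document.
        body = {
            "Code": "Success",
            "Type": "AWS-HMAC",
            "AccessKeyId": creds.access_key_id,
            "SecretAccessKey": creds.secret_access_key,
            "Token": creds.token,
            "Expiration": creds.expiration,
        }
        return 200, json.dumps(body)


if __name__ == "__main__":
    proxy = ImdsProxy(
        role_name="billing-api",
        fetch_role_creds=lambda: Credentials("AKIDEXAMPLE", "secret", "token", "2030-01-01T00:00:00Z"),
        passthrough=lambda p: (200, "real-imds:" + p),
    )
    print(proxy.handle_get(CREDS_PREFIX))  # (200, 'billing-api')
    proxy.enabled = False                  # flip the switch: fall back to the real IMDS
    print(proxy.handle_get(CREDS_PREFIX))
```

Because the workload keeps talking to the usual metadata endpoint, SDKs need no changes, and turning the switch off restores the original credential path without a redeploy of the application itself.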

Risk and Access Efficiency

  • Access Exposure: The number of allowed paths between application roles and resources.
  • Access Efficiency: The ratio of intended access to actual access.
  • By migrating applications to dedicated accounts, they can eliminate unnecessary access paths and improve access efficiency.
  • The risk reduction benefit can be realized even before all applications are migrated: once an app moves to its own account, it and the apps remaining in the multi-tenant account no longer have access to each other's resources.
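The two metrics can be made concrete with a toy model. This sketch assumes the worst case for a shared account (broad policies let every role reach every resource in its own account) and assumes each application intends to use only the resources it owns; the inventory and account names are hypothetical, and real exposure analysis would come from actual policies and access logs.

```python
from typing import Dict, Set, Tuple

Path = Tuple[str, str]  # (application role, resource)


def allowed_paths(app_account: Dict[str, str], owner: Dict[str, str]) -> Set[Path]:
    """Worst-case assumption: broad policies let every role reach every resource
    in its own account, and nothing in any other account."""
    return {
        (app, res)
        for app, acct in app_account.items()
        for res, res_owner in owner.items()
        if app_account[res_owner] == acct
    }


def intended_paths(owner: Dict[str, str]) -> Set[Path]:
    """Simplifying assumption: each application needs only the resources it owns."""
    return {(res_owner, res) for res, res_owner in owner.items()}


def access_metrics(app_account: Dict[str, str], owner: Dict[str, str]) -> Tuple[int, float]:
    allowed = allowed_paths(app_account, owner)
    exposure = len(allowed)  # access exposure: number of allowed role -> resource paths
    efficiency = len(intended_paths(owner) & allowed) / exposure  # intended / actual
    return exposure, efficiency


if __name__ == "__main__":
    # Hypothetical inventory: three apps, two resources each; resources move with their owner.
    owner = {f"{app}-{kind}": app for app in "abc" for kind in ("db", "bucket")}
    stages = {
        "all shared": {"a": "shared", "b": "shared", "c": "shared"},
        "a migrated": {"a": "acct-a", "b": "shared", "c": "shared"},
        "all migrated": {"a": "acct-a", "b": "acct-b", "c": "acct-c"},
    }
    for label, app_account in stages.items():
        exposure, efficiency = access_metrics(app_account, owner)
        print(f"{label}: exposure={exposure}, efficiency={efficiency:.2f}")
```

In this model, exposure falls from 18 to 10 after migrating a single app and to 6 once all three are migrated, while efficiency rises from 0.33 to 0.60 to 1.00. The intermediate step shows the partial-migration benefit: app `a` and the apps still in the shared account can no longer reach each other's resources.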

IAM Patterns

  • Leveraging Service Control Policies (SCPs), Resource Control Policies (RCPs), and Permissions Boundaries to increase developer freedom and velocity in the dedicated accounts.
  • Adopting Attribute-Based Access Control (ABAC) to simplify resource access control and support Infrastructure as Code.
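The interplay between developer freedom and guardrails, and the ABAC pattern, can be sketched as follows. The `ABAC_STATEMENT` dictionary uses real IAM policy syntax (the `aws:ResourceTag` and `aws:PrincipalTag` condition keys), but the tag key `app` and the example policy lists are hypothetical. The evaluator is a deliberately simplified model: an action is effective only if the identity policy, permissions boundary, and SCP all allow it, and explicit denies, RCPs, and resource-based policies are left out.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable

# Real IAM policy syntax: allow reading a secret only when the secret's "app" tag
# equals the calling role's "app" tag. The tag key "app" is a hypothetical convention.
ABAC_STATEMENT = {
    "Effect": "Allow",
    "Action": "secretsmanager:GetSecretValue",
    "Resource": "*",
    "Condition": {"StringEquals": {"aws:ResourceTag/app": "${aws:PrincipalTag/app}"}},
}


def abac_match(principal_tags: Dict[str, str], resource_tags: Dict[str, str], key: str = "app") -> bool:
    """Mirror the StringEquals condition above: the tag must exist on the principal and match."""
    return key in principal_tags and resource_tags.get(key) == principal_tags[key]


def layer_allows(action: str, patterns: Iterable[str]) -> bool:
    # IAM action names match case-insensitively and support * wildcards.
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def effective_allow(action: str, identity: Iterable[str], boundary: Iterable[str], scp: Iterable[str]) -> bool:
    """Simplified evaluation: identity policy, permissions boundary, and SCP must all allow.
    Explicit denies, RCPs, and resource-based policies are deliberately left out."""
    return all(layer_allows(action, layer) for layer in (identity, boundary, scp))


if __name__ == "__main__":
    identity = ["secretsmanager:*", "iam:*"]  # broad policy written by the app team
    boundary = ["secretsmanager:*", "s3:*"]   # cap set by a platform team (hypothetical)
    scp = ["*"]                               # account-level guardrail
    print(effective_allow("iam:CreateUser", identity, boundary, scp))  # False
    ok = effective_allow("secretsmanager:GetSecretValue", identity, boundary, scp)
    print(ok and abac_match({"app": "billing"}, {"app": "billing"}))   # True
    print(ok and abac_match({"app": "billing"}, {"app": "search"}))    # False
```

The point of the layering is that app teams can write broad identity policies in their dedicated accounts while the boundary and SCP cap what those policies can ever grant, and a single tag-matching statement replaces per-resource ARNs, which keeps Infrastructure as Code templates generic.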

Migration Planning and Execution

  • Categorizing applications based on their AWS service usage and migration complexity.
  • Gathering data about services accessed and resources used by each application.
  • Developing a Cloud Application Migration Platform (CAMP) to orchestrate the migration process.
  • Implementing validation checks to ensure the safety of the migration process.
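The categorization and validation steps can be sketched under stated assumptions. This assumes per-application usage data (services called, resources touched, and which resources other apps also use) has already been gathered. The tier names, the `STATEFUL_SERVICES` grouping, and the rules are hypothetical illustrations; the talk does not detail CAMP's actual logic.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical grouping of services whose resources hold state and must be
# re-homed or shared cross-account when an application moves.
STATEFUL_SERVICES = {"s3", "dynamodb", "rds", "sqs", "secretsmanager"}


@dataclass
class AppUsage:
    name: str
    services: Set[str]          # AWS services the app's role was observed calling
    resources: Set[str]         # resources the app's role was observed accessing
    shared_resources: Set[str]  # subset of `resources` also used by other apps


def categorize(app: AppUsage) -> str:
    """Hypothetical complexity tiers, from simplest to hardest to migrate."""
    if not app.services:
        return "no-aws-access"  # only the compute and its role move
    if not (app.services & STATEFUL_SERVICES):
        return "low"            # stateless API calls; nothing to re-home
    if not app.shared_resources:
        return "medium"         # owns stateful resources, but no other app uses them
    return "high"               # shares resources with other apps; needs coordination


def validate(app: AppUsage, moved: Set[str], cross_account_grants: Set[str]) -> List[str]:
    """Pre-cutover check: resources the app accessed that the target account could not reach."""
    return sorted(app.resources - moved - cross_account_grants)


if __name__ == "__main__":
    apps = [
        AppUsage("batch-worker", set(), set(), set()),
        AppUsage("metrics-pusher", {"cloudwatch"}, set(), set()),
        AppUsage("billing-api", {"dynamodb", "s3"}, {"billing-table", "billing-bucket"}, set()),
        AppUsage("search-indexer", {"s3"}, {"shared-bucket"}, {"shared-bucket"}),
    ]
    for app in apps:
        print(app.name, categorize(app))
    # Blocks cutover: the bucket was neither moved nor granted cross-account.
    print(validate(apps[2], moved={"billing-table"}, cross_account_grants=set()))  # ['billing-bucket']
```

A non-empty result from `validate` would hold the app back from its batch, which matches the goal of making each migration safe before it runs; lower tiers are natural candidates for large early batches.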

Lessons Learned and Key Numbers

  • Designing for minimal app owner interaction has been beneficial.
  • Organizational buy-in and commitment are crucial for such a project.
  • It's a lot of work, so getting into the migration feedback loop as early as possible is important.
  • Netflix has migrated 4,005 applications and over 1,100 IAM roles; the largest single batch contained 277 applications.
