Scaling and evolving media storage at Netflix with Amazon S3 (NFX302)

Content Drive and Data Lifecycle Management at Netflix

Netflix Studio Ecosystem and Media Asset Management

  • Netflix has pioneered the "studio in the cloud" model, which lets artists in different parts of the world collaborate on stories and assets.
  • Media assets pass through many production stages, with constant transformations and frequent uploads and downloads.
  • Supporting these globally distributed workflows requires a highly efficient, high-performance media cloud infrastructure.

Content Drive: A Metadata Abstraction Layer for Media Assets

Key Features and Motivations

  • Provide a file-and-folder interface that lets artists and studio workflows access media assets stored in Amazon S3.
  • Enable dynamic access control and collaboration/sharing capabilities.
  • Support new media enhancement features as first-class capabilities.

Architecture and Core Entities

  • Components: a REST/GraphQL API layer, a file service, access control, data transfer, and persistence.
  • Key entities: Workspace (user, project, team), File, Folder, Sequence, Snapshot.
  • A hierarchical tree structure with a unique-file-path invariant: no two nodes may share the same path.
  • Authorization based on workspace type (user-owned vs. application-owned).
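The tree model and workspace-based authorization above can be illustrated with a minimal in-memory sketch. The `Workspace`, `Node`, and `can_write` names and the two-rule authorization policy are hypothetical, not Content Drive's actual API. The sketch only shows how rejecting duplicate names within a folder guarantees that each full path resolves to at most one node.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class WorkspaceType(Enum):
    USER = "user"                # owned by a person (hypothetical)
    APPLICATION = "application"  # owned by a service (hypothetical)


@dataclass
class Node:
    name: str
    is_folder: bool
    children: dict[str, Node] = field(default_factory=dict)


@dataclass
class Workspace:
    owner: str
    kind: WorkspaceType
    root: Node = field(default_factory=lambda: Node("/", is_folder=True))

    def add(self, path: str, is_folder: bool = False) -> Node:
        """Create a node at `path`; every parent must be an existing folder."""
        parts = [p for p in path.split("/") if p]
        if not parts:
            raise ValueError("cannot create the root")
        parent = self.root
        for part in parts[:-1]:
            child = parent.children.get(part)
            if child is None or not child.is_folder:
                raise FileNotFoundError(f"no folder named {part!r} on path {path!r}")
            parent = child
        name = parts[-1]
        # Unique-path invariant: a name appears at most once per folder,
        # so every full path resolves to at most one node.
        if name in parent.children:
            raise FileExistsError(path)
        node = Node(name, is_folder)
        parent.children[name] = node
        return node


def can_write(ws: Workspace, principal: str, registered_apps: set[str]) -> bool:
    """Hypothetical rule: a user workspace accepts only its owner, while an
    application workspace accepts only registered application principals."""
    if ws.kind is WorkspaceType.USER:
        return principal == ws.owner
    return principal in registered_apps
```

For example, after `ws.add("/shows", is_folder=True)` and `ws.add("/shows/ep1.mov")`, a second `ws.add("/shows/ep1.mov")` raises `FileExistsError`, so the invariant is enforced at write time rather than repaired later.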

User Workflow and Hybrid Storage Support

  • Example user workflow: asset ingestion, file upload, checksum verification, asset creation.
  • Content Drive supports hybrid storage models, allowing representation and management of assets stored in different locations (cloud, on-premises).
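A minimal sketch of the checksum-verification step in the workflow above, assuming the client declares a SHA-256 digest before uploading and the asset record is created only after the server recomputes a matching digest. The hash algorithm, the `Asset` record, and the `finalize_upload` function are illustrative assumptions, not Content Drive's actual interface.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Asset:
    path: str
    size: int
    sha256: str


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Stream uploaded bytes through SHA-256 without holding the whole file."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def finalize_upload(path: str, chunks: list[bytes], client_sha256: str) -> Asset:
    """Create the asset record only if the server-side digest matches the
    digest the client declared (hypothetical flow)."""
    digest = sha256_of_chunks(chunks)
    if digest != client_sha256:
        # Reject corrupted or truncated uploads before any asset exists.
        raise ValueError(f"checksum mismatch for {path}")
    return Asset(path=path, size=sum(len(c) for c in chunks), sha256=digest)
```

Verifying before creating the asset means a failed transfer never becomes a visible file in the tree.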

Scale and Optimization

  • As of October 2024, Content Drive held 14.5 billion nodes, with 50 million new nodes created each week.
  • Optimization techniques: distributed caching, follower reads, time-based IDs, batch writes, CTE (common table expression) offloading, and query optimization.
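At 50 million new nodes per week, the average creation rate is roughly 83 nodes per second (50,000,000 / 604,800 s). One common way to mint time-based IDs without central coordination is a Snowflake-style layout. The sketch below assumes a 41-bit millisecond timestamp, a 10-bit worker ID, and a 12-bit per-millisecond sequence; the bit widths, the epoch, and the class name are illustrative choices, not Netflix's actual format.

```python
import threading
import time


class TimeOrderedIdGenerator:
    """Snowflake-style 64-bit IDs: ms timestamp | worker id | sequence.
    Bit layout and epoch are illustrative assumptions."""

    EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch (September 2020)
    WORKER_BITS = 10
    SEQ_BITS = 12

    def __init__(self, worker_id: int, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock = clock
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                # Clock moved backwards: keep issuing IDs on the last tick.
                now = self.last_ms
            if now == self.last_ms:
                self.seq = (self.seq + 1) & ((1 << self.SEQ_BITS) - 1)
                if self.seq == 0:
                    # Sequence exhausted in this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.seq = 0
            self.last_ms = now
            return (
                ((now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQ_BITS))
                | (self.worker_id << self.SEQ_BITS)
                | self.seq
            )
```

Because the timestamp occupies the high bits, IDs sort roughly by creation time. This tends to keep recent inserts adjacent in a B-tree index and makes time-range scans cheap, while each worker can generate IDs independently.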

Data Lifecycle Management

Challenges and Opportunities

  • Explosive growth in content volume (doubling year over year), driven by geographic expansion, vendor scaling, and high data movement.
  • A 50% year-over-year increase in storage and access costs, accompanied by a proportional increase in unused, or "dark," data.

Requirements and Approach

  • Requirements: policy-based automation, security, real-time insights, object-level granularity, archival integrity, secure purging, and auditing.
  • Approach: a hybrid solution that combines native S3 lifecycle capabilities with custom tagging and transition rules.
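One way to pair custom tagging with native S3 transition rules is to have bucket lifecycle rules filter on an object tag: the application decides which objects should move, and S3 performs the move. The sketch builds payloads for the real boto3 S3 client calls `put_bucket_lifecycle_configuration` and `put_object_tagging`. The tag key `lifecycle-action`, its values, the `DEEP_ARCHIVE` target, and the day counts are hypothetical choices for illustration.

```python
TAG_KEY = "lifecycle-action"  # hypothetical tag key


def tag_driven_lifecycle_rules(archive_after_days: int = 1,
                               purge_after_days: int = 1) -> dict:
    """LifecycleConfiguration payload: S3 transitions or expires objects
    whose tag was set by the application."""
    return {
        "Rules": [
            {
                "ID": "archive-tagged-objects",
                "Filter": {"Tag": {"Key": TAG_KEY, "Value": "archive"}},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "DEEP_ARCHIVE"}
                ],
            },
            {
                "ID": "purge-tagged-objects",
                "Filter": {"Tag": {"Key": TAG_KEY, "Value": "purge"}},
                "Status": "Enabled",
                "Expiration": {"Days": purge_after_days},
            },
        ]
    }


def apply_rules(s3_client, bucket: str) -> None:
    """s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3")."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=tag_driven_lifecycle_rules()
    )


def mark_object(s3_client, bucket: str, key: str, action: str) -> None:
    """Tag one object for archive or purge; this overwrites its existing tag set."""
    if action not in ("archive", "purge"):
        raise ValueError(f"unknown lifecycle action: {action}")
    s3_client.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={"TagSet": [{"Key": TAG_KEY, "Value": action}]},
    )
```

S3 counts `Days` from the object's creation date, not from when the tag was applied, so an older object becomes eligible on the next lifecycle run after it is tagged. Because `put_object_tagging` replaces the object's entire tag set, a production version would merge the new tag with any existing ones.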

Architecture Components

  • Policy Manager: Translates business context into actionable lifecycle requests.
  • Storage Metadata Layer (Content Drive): Maintains the separation between metadata and data, enabling efficient lifecycle operations.
  • Storage Lifecycle Manager: Orchestrates lifecycle operations, using a distributed processing model.
  • S3 Abstraction Layer: Interacts with S3 to tag objects and perform lifecycle transitions.
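The division of labor among these four components can be sketched end to end with in-memory stand-ins. Every class name, the status-to-action mapping, and the batching parameters are hypothetical; a thread pool stands in for the distributed processing model, and the S3 layer records tags instead of calling S3.

```python
from __future__ import annotations

import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class LifecycleRequest:
    project_id: str
    action: str  # e.g. "archive" or "purge"


class PolicyManager:
    """Maps business context (here, a project status) to a lifecycle action."""

    def __init__(self, status_to_action: dict[str, str]):
        self.status_to_action = status_to_action

    def evaluate(self, project_id: str, status: str) -> LifecycleRequest | None:
        action = self.status_to_action.get(status)
        return LifecycleRequest(project_id, action) if action else None


class MetadataLayer:
    """Stand-in for Content Drive: resolves a project to the object keys it owns."""

    def __init__(self, keys_by_project: dict[str, list[str]]):
        self.keys_by_project = keys_by_project

    def object_keys(self, project_id: str) -> list[str]:
        return self.keys_by_project.get(project_id, [])


class S3Abstraction:
    """In-memory stand-in; a real version would tag objects in S3."""

    def __init__(self):
        self.tags: dict[str, str] = {}
        self._lock = threading.Lock()

    def tag(self, key: str, action: str) -> None:
        with self._lock:
            self.tags[key] = action


class StorageLifecycleManager:
    """Splits a request's keys into batches and processes them in parallel."""

    def __init__(self, metadata: MetadataLayer, s3: S3Abstraction,
                 batch_size: int = 2, workers: int = 4):
        self.metadata, self.s3 = metadata, s3
        self.batch_size, self.workers = batch_size, workers

    def _process(self, batch: list[str], action: str) -> int:
        for key in batch:
            self.s3.tag(key, action)
        return len(batch)

    def execute(self, request: LifecycleRequest) -> int:
        keys = self.metadata.object_keys(request.project_id)
        batches = [keys[i:i + self.batch_size]
                   for i in range(0, len(keys), self.batch_size)]
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return sum(pool.map(lambda b: self._process(b, request.action), batches))
```

Keeping the policy decision (which project) separate from key resolution (which objects) mirrors the metadata/data separation described above: the lifecycle manager never needs to list buckets to find what to move.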

Impact and Future Initiatives

  • 70% cost savings projected through intelligent retention rules and cold data migration.
  • 77 PB of data archived, 200 TB temporarily restored, and 33 PB purged.
  • Future initiatives: Hybrid storage management, intelligent tagging, and metadata tiering.

Conclusion

Content Drive is a cloud-native media asset management solution that enables cost-efficient and scalable storage of media assets at Netflix. The integrated data lifecycle management system further optimizes storage costs by intelligently managing the transition of content across different storage tiers based on access patterns and business requirements.
