AWS re:Invent 2025 - Advanced data modeling with Amazon DynamoDB (DAT414)
Advanced Data Modeling with Amazon DynamoDB
Key Characteristics of DynamoDB
Fully Managed Service
DynamoDB is a fully managed, highly available, and scalable NoSQL database service provided by AWS.
It has a regional, multi-tenant, self-healing fleet of storage nodes, load balancers, and request routers that ensure high availability and consistent performance.
Consumption-Based Pricing
DynamoDB uses a consumption-based pricing model, where you pay for the read and write capacity units (RCUs and WCUs) consumed, as well as the storage used.
This pricing model provides predictable billing impacts, allowing you to estimate costs accurately when making changes to your data model.
Consistent Performance at Any Scale
DynamoDB provides consistent, single-digit millisecond latency performance, regardless of the table size, from megabytes to petabytes.
This is achieved through DynamoDB's partitioning and distribution of data across multiple storage nodes, which allows for constant-time access to your data.
Understanding DynamoDB Data Modeling
Primary Keys
DynamoDB tables require a primary key, which can be either a simple primary key (partition key only) or a composite primary key (partition key and sort key).
The primary key must be unique for each item in the table and is used to distribute and access data efficiently.
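To make this concrete, here is a sketch of a table definition with a composite primary key, in the shape the CreateTable API expects. The table and attribute names are illustrative; the dict would be passed to a boto3 DynamoDB client as `client.create_table(**create_table_request)`.

```python
# Sketch of a CreateTable request with a composite primary key
# (partition key + sort key). Names are illustrative.
create_table_request = {
    "TableName": "Orders",
    # Only key attributes are declared; non-key attributes are
    # schema-less and need no declaration.
    "AttributeDefinitions": [
        {"AttributeName": "CustomerId", "AttributeType": "S"},
        {"AttributeName": "OrderId", "AttributeType": "S"},
    ],
    # Composite primary key: partition (HASH) key plus sort (RANGE) key.
    "KeySchema": [
        {"AttributeName": "CustomerId", "KeyType": "HASH"},
        {"AttributeName": "OrderId", "KeyType": "RANGE"},
    ],
    # On-demand mode matches the consumption-based pricing model.
    "BillingMode": "PAY_PER_REQUEST",
}
```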
Partitioning and the DynamoDB API
DynamoDB automatically partitions your data across multiple storage nodes based on the partition key.
The DynamoDB API provides single-item actions (CRUD operations) that require the full primary key, as well as query operations that allow you to fetch multiple items with the same partition key.
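The two core read shapes can be sketched as request payloads (table, keys, and values are illustrative): GetItem requires the full primary key, while Query takes only the partition key plus an optional sort-key condition.

```python
# A single-item GetItem must supply the FULL primary key (both parts):
get_item_request = {
    "TableName": "Orders",
    "Key": {
        "CustomerId": {"S": "cust-123"},
        "OrderId": {"S": "order-456"},
    },
}

# A Query needs only the partition key and returns the items in that
# item collection, optionally narrowed by a sort-key condition:
query_request = {
    "TableName": "Orders",
    "KeyConditionExpression": "CustomerId = :cid AND begins_with(OrderId, :prefix)",
    "ExpressionAttributeValues": {
        ":cid": {"S": "cust-123"},
        ":prefix": {"S": "order-2025"},
    },
}
```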
Secondary Indexes
Secondary indexes in DynamoDB provide additional read-based access patterns by creating a fully managed copy of your data with a new primary key.
There are two types of secondary indexes: global secondary indexes (GSIs), which can key on any attributes and can be added to a table at any time, and local secondary indexes (LSIs), which share the table's partition key and must be created together with the table.
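Querying a secondary index looks just like querying the base table, with the addition of an IndexName. A sketch, assuming a hypothetical GSI named OrdersByStatus keyed on a Status attribute:

```python
# Sketch of a Query against a GSI; identical to a base-table Query
# except for IndexName. Index and attribute names are illustrative.
query_index_request = {
    "TableName": "Orders",
    "IndexName": "OrdersByStatus",
    "KeyConditionExpression": "#s = :status",
    # "Status" is a DynamoDB reserved word, so alias it.
    "ExpressionAttributeNames": {"#s": "Status"},
    "ExpressionAttributeValues": {":status": {"S": "SHIPPED"}},
}
```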
Data Modeling Goals and Process
Maintaining Data Integrity
Validate the schema in your application code to maintain the integrity of the data you save.
Enforce constraints and handle data consistency when duplicating data across items.
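One common way to enforce a constraint at write time is a conditional PutItem: the write is rejected with a ConditionalCheckFailedException if an item with the same primary key already exists. A sketch with illustrative names:

```python
# Conditional PutItem that enforces uniqueness at write time.
put_if_absent_request = {
    "TableName": "Orders",
    "Item": {
        "CustomerId": {"S": "cust-123"},
        "OrderId": {"S": "order-456"},
        "Status": {"S": "PENDING"},
    },
    # attribute_not_exists on a key attribute means
    # "no item with this primary key exists yet".
    "ConditionExpression": "attribute_not_exists(CustomerId)",
}
```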
Enabling Efficient Data Access
Design your primary keys to uniquely identify and efficiently access the relevant data.
Use secondary indexes to enable additional read-based access patterns.
Keeping It Simple
Avoid over-complicating your data model and stick to the basics of DynamoDB's partitioning and API.
Understand your access patterns upfront and model your data accordingly.
Evolving Your DynamoDB Schema
Schema Changes Not Affecting Data Access
Adding new, unindexed attributes to your table is the easiest type of schema evolution, as DynamoDB is schema-less outside of the primary key attributes.
You can handle these changes entirely within your application code.
Adding New Indexes
You can add new global secondary indexes (GSIs) to your DynamoDB table at any time, and DynamoDB will automatically backfill the index for you.
This allows you to enable new read-based access patterns without modifying your existing data.
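Adding a GSI goes through the UpdateTable API. A sketch of the request shape, with an illustrative index on a Status attribute; any new key attribute must be declared before it can be indexed:

```python
# Sketch of an UpdateTable request that adds a GSI to an existing
# table; DynamoDB backfills the index from existing data automatically.
update_table_request = {
    "TableName": "Orders",
    "AttributeDefinitions": [
        {"AttributeName": "Status", "AttributeType": "S"},
    ],
    "GlobalSecondaryIndexUpdates": [
        {
            "Create": {
                "IndexName": "OrdersByStatus",
                "KeySchema": [{"AttributeName": "Status", "KeyType": "HASH"}],
                # Project only the keys to keep the index small;
                # use "ALL" if queries need the full items.
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        }
    ],
}
```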
Changing Existing Data
If you need to add a new attribute or index that requires updating existing data, you'll need to perform a backfill operation.
Tools like AWS Glue, AWS Step Functions, and the DynamoDB Bulk Executor can help automate and simplify this process.
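A minimal sketch of one backfill step, under assumed names: given items returned by a Scan, derive UpdateItem requests that add a computed attribute (here a hypothetical GSI partition key, GSI1PK). Each yielded dict would be passed to `client.update_item(**req)`; at scale, a tool like Step Functions or Glue would drive the scan-and-update loop.

```python
# Derive UpdateItem requests from scanned items to backfill a new
# attribute. Attribute names (GSI1PK, Status) are illustrative.
def backfill_requests(table_name, scanned_items):
    for item in scanned_items:
        yield {
            "TableName": table_name,
            "Key": {
                "CustomerId": item["CustomerId"],
                "OrderId": item["OrderId"],
            },
            "UpdateExpression": "SET GSI1PK = :pk",
            "ExpressionAttributeValues": {
                ":pk": {"S": "STATUS#" + item["Status"]["S"]},
            },
            # Skip items deleted between the scan and the update.
            "ConditionExpression": "attribute_exists(CustomerId)",
        }

# Tiny example page of scanned items:
scanned = [{"CustomerId": {"S": "cust-1"},
            "OrderId": {"S": "ord-1"},
            "Status": {"S": "SHIPPED"}}]
requests = list(backfill_requests("Orders", scanned))
```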
Anti-Patterns and Best Practices
Avoiding the "Kitchen Sink" Item Collection
Only group different entity types in the same item collection if you have at least one access pattern that requires fetching them together.
Prefer to have separate tables or item collections for unrelated entities.
Embracing the DynamoDB API
Avoid hiding the DynamoDB API behind abstraction layers, as this can lead to performance issues and suboptimal usage of the service.
Leverage the full capabilities of the DynamoDB API, such as atomic updates and conditional writes, to optimize your data model.
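An atomic update is one example of leaning on the API directly: UpdateItem's ADD action increments a number on the server, avoiding the read-modify-write race you would have in application code. A sketch with illustrative names:

```python
# Atomic counter update guarded by a condition; the increment and the
# check happen server-side in one operation.
atomic_update_request = {
    "TableName": "Orders",
    "Key": {
        "CustomerId": {"S": "cust-123"},
        "OrderId": {"S": "order-456"},
    },
    "UpdateExpression": "ADD ItemCount :inc",
    # Guard: only update orders that exist and are not yet shipped.
    "ConditionExpression": "attribute_exists(OrderId) AND #s <> :shipped",
    "ExpressionAttributeNames": {"#s": "Status"},
    "ExpressionAttributeValues": {
        ":inc": {"N": "1"},
        ":shipped": {"S": "SHIPPED"},
    },
}
```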
Managing Item Size and Transactions
Keep item sizes small (DynamoDB caps items at 400 KB) to minimize costs and improve performance.
Use transactions sparingly, only for low-volume, high-value operations where data consistency is critical.
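A sketch of the kind of case that justifies a transaction: two items that must stay consistent, written via TransactWriteItems so that both writes succeed or neither does. Table and attribute names are illustrative.

```python
# All-or-nothing write of an order plus a customer-level counter.
transact_request = {
    "TransactItems": [
        {
            "Put": {
                "TableName": "Orders",
                "Item": {
                    "CustomerId": {"S": "cust-123"},
                    "OrderId": {"S": "order-789"},
                },
                # Fail the whole transaction if the order already exists.
                "ConditionExpression": "attribute_not_exists(OrderId)",
            }
        },
        {
            "Update": {
                "TableName": "Customers",
                "Key": {"CustomerId": {"S": "cust-123"}},
                "UpdateExpression": "ADD OpenOrders :one",
                "ExpressionAttributeValues": {":one": {"N": "1"}},
            }
        },
    ]
}
```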
Key Takeaways
Understand DynamoDB's unique characteristics, including its partitioning, API, and consumption-based pricing, to design an effective data model.
Focus on maintaining data integrity and enabling efficient data access through thoughtful primary key and secondary index design.
Leverage DynamoDB's schema flexibility to evolve your data model over time, using tools to automate backfill operations when necessary.
Avoid anti-patterns like the "kitchen sink" item collection and embrace the DynamoDB API to optimize your application's performance and cost-effectiveness.