

# Data modeling for DynamoDB tables

Before we dive into data modeling, it's important to understand some DynamoDB fundamentals. DynamoDB is a key-value NoSQL database with a flexible schema: apart from the key attributes, the set of attributes on each item can be uniform across items or vary from item to item. The DynamoDB key schema takes one of two forms: a simple primary key, where a partition key alone uniquely identifies an item, or a composite primary key, where the combination of a partition key and a sort key uniquely identifies an item. The partition key is hashed to determine the physical location where data is stored and retrieved, so it is important to choose a high-cardinality attribute that scales horizontally as the partition key to ensure even distribution of data. The sort key is optional in the key schema; having one enables modeling one-to-many relationships and creating item collections in DynamoDB. Sort keys are also referred to as range keys: they sort items within an item collection and allow flexible range-based operations.

For more details and best practices on DynamoDB key schema, you can refer to the following:
+ [Partitions and data distribution in DynamoDB](HowItWorks.Partitions.md) 
+ [Best practices for designing and using partition keys effectively in DynamoDB](bp-partition-key-design.md) 
+ [Best practices for using sort keys to organize data in DynamoDB](bp-sort-keys.md) 
+ [Choosing the right DynamoDB partition key](https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/) 
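As a concrete sketch of a composite key schema, the request below builds the shape accepted by boto3's low-level `create_table` call (the table and attribute names here are hypothetical, chosen only for illustration). Note that only the key attributes are declared up front; everything else is schemaless:

```python
def music_table_request():
    """CreateTable request sketch for a composite primary key."""
    return {
        "TableName": "MusicLibrary",  # hypothetical name
        # Composite primary key: Artist (partition/HASH key) is hashed to
        # place items on partitions; SongTitle (sort/RANGE key) orders
        # items within each item collection.
        "KeySchema": [
            {"AttributeName": "Artist", "KeyType": "HASH"},
            {"AttributeName": "SongTitle", "KeyType": "RANGE"},
        ],
        # Only key attributes need type definitions; all other attributes
        # are schemaless and can vary from item to item.
        "AttributeDefinitions": [
            {"AttributeName": "Artist", "AttributeType": "S"},
            {"AttributeName": "SongTitle", "AttributeType": "S"},
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }
```

With boto3, this request would be passed as `client.create_table(**music_table_request())`.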

Secondary indexes are often needed to support additional query patterns in DynamoDB. A secondary index is a shadow table in which the same data is organized under a different key schema than the base table. A local secondary index (LSI) shares the base table's partition key but uses an alternate sort key, and it shares the base table's provisioned capacity. A global secondary index (GSI) can have a different partition key as well as a different sort key than the base table, which means throughput for a GSI is managed independently of the base table.

For more details on secondary indexes and best practices, you can refer to the following:
+ [Improving data access with secondary indexes in DynamoDB](SecondaryIndexes.md) 
+ [Best practices for using secondary indexes in DynamoDB](bp-indexes.md) 
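To illustrate the difference between the two index types, the fragment below sketches index definitions for a hypothetical music table keyed on `Artist` (partition) and `SongTitle` (sort); all index and attribute names are invented for illustration. The LSI reuses the table's partition key with an alternate sort key, while the GSI defines an entirely different key pair:

```python
def index_definitions():
    """Secondary index sketches for a table keyed on Artist/SongTitle."""
    lsi = {  # shares the base table's partition key and capacity
        "IndexName": "ArtistAlbumIndex",
        "KeySchema": [
            {"AttributeName": "Artist", "KeyType": "HASH"},       # same PK
            {"AttributeName": "AlbumTitle", "KeyType": "RANGE"},  # new SK
        ],
        "Projection": {"ProjectionType": "ALL"},
    }
    gsi = {  # fully independent key schema and throughput
        "IndexName": "GenreYearIndex",
        "KeySchema": [
            {"AttributeName": "Genre", "KeyType": "HASH"},
            {"AttributeName": "ReleaseYear", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "KEYS_ONLY"},
    }
    return lsi, gsi
```

These dicts match the shapes used in the `LocalSecondaryIndexes` and `GlobalSecondaryIndexes` parameters of a CreateTable request.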

Let's now look at data modeling a little closer. Designing a flexible and highly optimized schema on DynamoDB, or any NoSQL database for that matter, can be a challenging skill to learn. The goal of this module is to help you develop a mental flowchart for designing a schema that will take you from use case to production. We will start with the foundational choice of any design: single table versus multiple table design. Then we will review the design patterns (building blocks) that can be used to achieve various organizational or performance results for your application. Finally, we include a variety of complete schema design packages for different use cases and industries.

![\[Image showing the conceptual relationship between the data, the blocks that sit under them, and then the foundation that sits under the blocks.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/SchemaDesign.png)


**Topics**
+ [Item collections - how to model one-to-many relationships in DynamoDB](WorkingWithItemCollections.md)
+ [Data modeling foundations in DynamoDB](data-modeling-foundations.md)
+ [Data modeling building blocks in DynamoDB](data-modeling-blocks.md)
+ [Data modeling schema design packages in DynamoDB](data-modeling-schemas.md)
+ [Best practices for modeling relational data in DynamoDB](bp-relational-modeling.md)

# Item collections - how to model one-to-many relationships in DynamoDB

In DynamoDB, an *item collection* is a group of items that share the same partition key value, which means the items are related. Item collections are the primary mechanism to model one-to-many relationships in DynamoDB. Item collections can only exist on tables or indexes configured to use a [composite primary key](HowItWorks.CoreComponents.md#HowItWorks.CoreComponents.PrimaryKey).

**Note**  
Item collections can exist either in a base table or a secondary index. For more information specifically about how item collections interact with indexes, see [Item collections in Local Secondary Indexes](LSI.md#LSI.ItemCollections).

Consider the following table showing three different users and their in-game inventories:

![\[Three different item collections with different attributes.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/item_collection.png)


For some items in each collection, the sort key is a concatenation of information used to group data, such as `inventory::armor`, `inventory::weapons`, or `info`. Each item collection can have a different combination of these sort key values. User `account1234` has an `inventory::weapons` item, while user `account1387` does not (because they have not found any weapons yet). User `account1138` has only two items in their collection (since they have no inventory yet), while the other users have three.

DynamoDB lets you selectively retrieve items from these item collections to do the following:
+ Retrieve all items from a particular user
+ Retrieve only one item from a particular user
+ Retrieve all the items of a specific type belonging to a particular user
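The three patterns above map onto `Query` key conditions as follows. This is a sketch using the low-level request syntax and the `PK`/`SK` attribute names and `inventory::` prefix from the inventory example; the exact names would follow your own schema:

```python
def queries_for_user(user_id):
    """Query request shapes for the three read patterns."""
    all_items = {  # everything in the user's item collection
        "KeyConditionExpression": "PK = :pk",
        "ExpressionAttributeValues": {":pk": {"S": user_id}},
    }
    one_item = {  # exact sort key match returns a single item
        "KeyConditionExpression": "PK = :pk AND SK = :sk",
        "ExpressionAttributeValues": {
            ":pk": {"S": user_id},
            ":sk": {"S": "info"},
        },
    }
    items_of_type = {  # sort-key prefix narrows to one item type
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": user_id},
            ":prefix": {"S": "inventory::"},
        },
    }
    return all_items, one_item, items_of_type
```

Each dict would be passed to the `Query` API (for example, `client.query(TableName=..., **request)` with boto3).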

## Speed up queries by organizing your data with item collections


In this example, each of these three item collections represents a player, and the data model was chosen based on the game's and players' access patterns. What data does the game need? When does it need it? How frequently does it need it? What does it cost to retrieve it this way? The data modeling decisions were made based on the answers to these questions.

In this game, one page is presented to the player for their weapons inventory and another page for armor. When the player opens their inventory, weapons are shown first because we want that page to load extremely fast, while subsequent inventory pages can load after that. Because each of these item types can grow quite large as the player acquires more in-game items, we decided that each inventory page would be its own item in the player's item collection in the database.

The following section talks more about how you can interact with item collections through the `Query` operation.

**Topics**
+ [Speed up queries by organizing your data with item collections](#WorkingWithItemCollections.Example)

# Data Modeling foundations in DynamoDB

This section covers the foundation layer by examining the two types of table design: single table and multiple table.

![\[Image showing the conceptual relationship between the data, the blocks that sit under them, and then the foundation that sits under the blocks. Emphasis on the foundation.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/SchemaDesignFoundation.png)


## Single table design foundation

One choice for the foundation of a DynamoDB schema is **single table design**. Single table design is a pattern that stores multiple types (entities) of data in a single DynamoDB table. It aims to optimize data access patterns, improve performance, and reduce costs by eliminating the need to maintain multiple tables and complex relationships between them. This is possible because DynamoDB stores items with the same partition key (known as an item collection) together on the same partition(s). In this design, different entity types are stored as items in the same table, and each item within a collection is identified by a unique sort key.

![\[Image showing a table and how the sort key is used to differentiate each item by entity type within the same UserID Item collection.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/SingleTableSchema.png)


**Advantages**
+ Data locality to support queries for multiple entity types in a single database call
+ Reduces overall financial and latency costs of reads:
  + A single query for two items totalling less than 4KB is 0.5 RCU eventually consistent
  + Two queries for two items totalling less than 4KB is 1 RCU eventually consistent (0.5 RCU each)
  + The time to return two separate database calls will average higher than a single call
+ Reduces the number of tables to manage:
  + Permissions do not need to be maintained across multiple IAM roles or policies 
  + Capacity management for the table is averaged across all entities, usually resulting in a more predictable consumption pattern
  + Monitoring requires fewer alarms
  + Customer Managed Encryption Keys only need to be rotated on one table
+ Smooths traffic to the table:
  + By aggregating multiple usage patterns to the same table, the overall usage tends to be smoother (the way a stock index's performance tends to be smoother than any individual stock) which works better for achieving higher utilization with provisioned mode tables

**Disadvantages**
+ Learning curve can be steep because the design approach is counterintuitive compared to relational databases
+ Data requirements must be consistent across all entity types
  + Backups are all or nothing so if some data is not mission critical, consider keeping it in a separate table
  + Table encryption is shared across all items. For multi-tenant applications with individual tenant encryption requirements, client-side encryption would be required
  + Tables with a mix of historical data and operational data will not see as much of a benefit from enabling the Infrequent Access Storage Class. For more information, see [DynamoDB table classes](HowItWorks.TableClasses.md) 
+ All changed data will be propagated to DynamoDB Streams even if only a subset of entities need to be processed.
  + Thanks to Lambda event filters, this will not affect your bill when using Lambda, but it will be an added cost when using the Kinesis Client Library 
+ When using GraphQL, single table design will be more difficult to implement
+ When using higher-level SDK clients like Java's [`DynamoDBMapper`](DynamoDBMapper.md) or [Enhanced Client](DynamoDBEnhanced.md), it can be more difficult to process results because items in the same response may be associated with different classes

**When to use**

Single table design works well for applications that frequently query multiple entity types together or need to maintain relationships between different data types. It's particularly effective when your access patterns benefit from data locality and when you want to minimize the overhead of managing multiple tables.

## Multiple table design foundation

The second choice for the foundation of a DynamoDB schema is **multiple table design**. Multiple table design is a pattern closer to traditional database design, where you store a single type (entity) of data in each DynamoDB table. Data within each table is still organized by partition key, so performance within a single entity type is optimized for scalability, but queries across multiple tables must be issued independently.

![\[Image showing a forum table containing a list of forums and some aggregate data.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/MultipleTable1.png)


![\[Image showing a thread table containing a list of threads partitioned by the specific forum they belong to.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/MultipleTable2.png)


**Advantages**
+ Simpler to design for those who aren't used to working with single table design 
+ Easier implementation of GraphQL resolvers because each resolver maps to a single entity (table)
+ Allows for unique data requirements across different entity types:
  + Backups can be made for the individual tables that are mission critical 
  + Table encryption can be managed for each table. For multi-tenant applications with individual tenant encryption requirements, separate tenant tables make it possible for each customer to have their own encryption key
  + Infrequent Access Storage Class can be enabled on just the tables with historical data to realize the full cost savings benefit. For more information, see [DynamoDB table classes](HowItWorks.TableClasses.md)
+ Each table will have its own change data stream allowing for a dedicated Lambda function to be designed for each type of item rather than a single monolithic processor

**Disadvantages**
+ For access patterns that require data from multiple tables, multiple reads from DynamoDB are required, and the data may need to be processed or joined in client code
+ Operating and monitoring multiple tables requires more CloudWatch alarms, and each table must be scaled independently
+ Each table's permissions must be managed separately. Adding tables in the future requires changes to any affected IAM roles or policies

**When to use**

If your application’s access patterns do not have the need to query multiple entities or tables together, then multiple table design is a good and sufficient approach.

# Data modeling building blocks in DynamoDB

This section covers the building block layer to give you design patterns you can use in your application.

![\[Image showing the conceptual relationship between the data, the blocks that sit under them, and then the foundation that sits under the blocks. Emphasis on the foundation.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/SchemaDesignBlocks.png)


**Topics**
+ [Composite sort key building block](#data-modeling-blocks-composite)
+ [Multi-tenancy building block](#data-modeling-blocks-multi-tenancy)
+ [Sparse index building block](#data-modeling-blocks-sparse-index)
+ [Time to live building block](#data-modeling-blocks-ttl)
+ [Time to live for archival building block](#data-modeling-blocks-ttl-archival)
+ [Vertical partitioning building block](#data-modeling-blocks-vertical-partitioning)
+ [Write sharding building block](#data-modeling-blocks-write-sharding)

## Composite sort key building block

When people think of NoSQL, they may also think of it as non-relational. Ultimately, there is no reason relationships cannot be expressed in a DynamoDB schema; they just look different than relational databases and their foreign keys. One of the most important patterns for developing a logical hierarchy of data in DynamoDB is the composite sort key. The most common style is to separate each layer of the hierarchy (parent layer > child layer > grandchild layer) with a hash sign: for example, `PARENT#CHILD#GRANDCHILD#ETC`.

![\[Image showing an item in a table with a userID as the primary key, and a combination of other attributes as the sort key.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ShoppingCart.png)


While a partition key in DynamoDB always requires the exact value to query for data, we can apply a partial condition to the sort key from left to right similar to traversing a binary tree.

In the example above, we have an e-commerce store with a shopping cart that must persist across user sessions. Whenever the user logs in, they may want to see the entire cart, including items saved for later. But when they enter checkout, only items in the active cart should be loaded for purchase. Because both of these `KeyConditions` explicitly ask for `CART` sort keys, the additional wishlist data is simply ignored by DynamoDB at read time. While saved and active items are both part of the same cart, the application treats them differently in different places, so applying a `KeyCondition` to the prefix of the sort key is the most efficient way to retrieve only the data each part of the application needs.
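As a sketch of those two `KeyConditions`, the requests below use low-level `Query` syntax against a sort key shaped like `CART#ACTIVE#<sku>` / `CART#SAVED#<sku>` / `WISHLIST#<sku>` (these prefixes and the `PK`/`SK` names are illustrative, not a fixed convention):

```python
def cart_key_conditions(user_id):
    """Two Query request sketches over a hierarchical sort key."""
    whole_cart = {  # login page: active + saved-for-later items
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :cart)",
        "ExpressionAttributeValues": {
            ":pk": {"S": user_id},
            ":cart": {"S": "CART#"},
        },
    }
    active_only = {  # checkout: narrow the prefix one level deeper
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :active)",
        "ExpressionAttributeValues": {
            ":pk": {"S": user_id},
            ":active": {"S": "CART#ACTIVE#"},
        },
    }
    return whole_cart, active_only
```

Because both conditions anchor on the `CART#` prefix, wishlist items under `WISHLIST#` are never read, and no RCUs are spent on them.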

**Key features of this building block**
+ Related items are stored locally to each other for effective data access 
+ Using `KeyCondition` expressions, subsets of the hierarchy can be selectively retrieved meaning there are no wasted RCUs 
+ Different parts of the application can store their items under a specific prefix preventing overwritten items or conflicting writes

## Multi-tenancy building block

Many customers use DynamoDB to host data for their multi-tenant applications. For these scenarios, we want to design the schema in a way that keeps all data from a single tenant in its own logical partition of the table. This leverages the concept of an Item Collection, which is a term for all items in a DynamoDB table with the same partition key. For more information on how DynamoDB approaches multitenancy, see [Multitenancy on DynamoDB](https://docs.aws.amazon.com/whitepapers/latest/multi-tenant-saas-storage-strategies/multitenancy-on-dynamodb.html). 

![\[Image showing a table that could represent a multi-tenant photo site. The primary key is made up of users as the partition key and different photos as the sort key. The attribute for each item shows the URL the photo is hosted at.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/MultiTenant.png)


For this example, we are running a photo hosting site with potentially thousands of users. Each user initially uploads photos only to their own profile, and by default a user cannot see the photos of any other user. Ideally, an additional level of isolation would be added to the authorization of each user's API calls to ensure they only request data from their own partition, but at the schema level, unique partition keys provide adequate separation.

**Key features of this building block**
+ The amount of data read by any one user or tenant can only be as much as the total amount of items in their partition
+ Removal of a tenant's data due to an account closure or compliance request can be done gracefully and cheaply. Simply run a query where the partition key equals their tenant ID, then execute a `DeleteItem` operation for each primary key returned
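A sketch of that tenant purge, using low-level request shapes and hypothetical `PK`/`SK` attribute names (in real code, the sort keys would come from paginating the query; here they are passed in so the sketch stays self-contained):

```python
def tenant_purge_requests(tenant_id, sort_keys):
    """Build the query and delete requests for wiping one tenant's partition."""
    query = {
        "KeyConditionExpression": "PK = :tenant",
        "ExpressionAttributeValues": {":tenant": {"S": tenant_id}},
        "ProjectionExpression": "PK, SK",  # keys only: cheaper reads
    }
    # One delete request per item returned by the query; these can be
    # sent up to 25 at a time via BatchWriteItem.
    deletes = [
        {"DeleteRequest": {"Key": {"PK": {"S": tenant_id}, "SK": {"S": sk}}}}
        for sk in sort_keys
    ]
    return query, deletes
```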

**Note**  
Designed with multi-tenancy in mind, you can use different encryption key providers across a single table to safely isolate data. [AWS Database Encryption SDK](https://docs.aws.amazon.com/database-encryption-sdk/latest/devguide/what-is-database-encryption-sdk.html) for Amazon DynamoDB enables you to include client-side encryption in your DynamoDB workloads. You can perform attribute-level encryption, enabling you to encrypt specific attribute values before storing them in your DynamoDB table and search on encrypted attributes without decrypting the entire database beforehand. 

## Sparse index building block

Sometimes an access pattern requires searching for rare items, or for items that have received a status requiring an escalated response. Rather than regularly querying across the entire dataset for these items, we can leverage the fact that **global secondary indexes (GSIs)** are sparsely loaded with data: only items in the base table that have the attributes defined in the index's key schema are replicated to the index.

![\[Image showing a base table that receives a large amount of steady state data\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/SparseBaseTable.png)


![\[Image showing a global secondary index that only receives items that have been escalated\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/SparseGSI.png)


In this example, we see an IoT use case where each device in the field reports a status on a regular basis. For the majority of reports we expect the device to say everything is okay, but occasionally there is a fault that must be escalated to a repair technician. For reports with an escalation, the `EscalatedTo` attribute is added to the item; it is not present otherwise. The GSI in this example is partitioned by `EscalatedTo`, and since a GSI carries over the base table's keys, we can still see which `DeviceID` reported the fault and at what time.

While reads are cheaper than writes in DynamoDB, sparse indexes are a very powerful tool for use cases where instances of a specific item type are rare but reads to find them are common.

**Key features of this building block**
+ Write and storage costs for the sparse GSI only apply to items that match the key pattern, so the cost of the GSI can be substantially less than other GSIs that have all items replicated to them 
+ A composite sort key can still be used to further narrow down the items that match the desired query, for instance, a timestamp could be used for the sort key to only view faults reported in the last X minutes (`SK > 5 minutes ago, ScanIndexForward: False`)
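A sketch of the time-windowed query against this sparse GSI follows; the index name and the ISO-8601 `Timestamp` sort key format are assumptions for illustration (`EscalatedTo` is from the example above):

```python
import datetime

def recent_faults_query(technician, minutes=5):
    """Query sketch: faults escalated to one technician in the last N minutes."""
    cutoff = (
        datetime.datetime.now(datetime.timezone.utc)
        - datetime.timedelta(minutes=minutes)
    ).isoformat()
    return {
        "IndexName": "EscalatedTo-index",  # hypothetical index name
        # Timestamp is a reserved-looking word, so alias it with #ts
        "KeyConditionExpression": "EscalatedTo = :tech AND #ts > :cutoff",
        "ExpressionAttributeNames": {"#ts": "Timestamp"},
        "ExpressionAttributeValues": {
            ":tech": {"S": technician},
            ":cutoff": {"S": cutoff},
        },
        "ScanIndexForward": False,  # newest faults first
    }
```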

## Time to live building block

Most data has some duration for which it is worth keeping in a primary datastore. To facilitate aging data out of DynamoDB, the service offers a feature called **time to live (TTL)**. The [TTL](TTL.md) feature lets you designate a specific attribute, at the table level, that DynamoDB monitors for items whose epoch timestamp is in the past. This lets you delete expired records from the table for free.

**Note**  
If you are using [Global Tables version 2019.11.21 (Current)](GlobalTables.md) together with the [Time to Live](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html) feature, DynamoDB replicates TTL deletes to all replica tables. The initial TTL delete does not consume write capacity in the Region in which the TTL expiry occurs. However, the replicated TTL delete to the replica table(s) consumes replicated write capacity in each of the replica Regions, and applicable charges will apply.

![\[Image showing a table with a user's messages with a time to live attribute\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/TTL.png)


In this example, we have an application designed to let a user create messages that are short-lived. When a message is created in DynamoDB, the TTL attribute is set to a date seven days in the future by the application code. In roughly seven days, DynamoDB will see that the epoch timestamp of these items is in the past and delete them.
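The write side of this pattern might look like the following sketch. The `expireAt` attribute name, the `PK`/`SK` names, and the `MESSAGE#` prefix are all illustrative; the only firm requirement is that the TTL attribute enabled on the table holds a Number containing a Unix epoch timestamp in seconds:

```python
import time

SEVEN_DAYS = 7 * 24 * 60 * 60  # message lifetime in seconds

def message_item(user_id, message_id, body, now=None):
    """PutItem sketch: stamp a TTL epoch timestamp seven days out."""
    now = int(now if now is not None else time.time())
    return {
        "PK": {"S": user_id},
        "SK": {"S": f"MESSAGE#{message_id}"},
        "Body": {"S": body},
        # TTL attribute: a plain Number holding an epoch time in seconds
        "expireAt": {"N": str(now + SEVEN_DAYS)},
    }
```

Once `expireAt` passes, DynamoDB deletes the item in the background at no write cost.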

Since the deletes done by TTL are free, it is strongly recommended to use this feature to remove historical data from the table. This will reduce the overall storage bill each month and will likely reduce the costs of user reads since there will be less data to be retrieved by their queries. While TTL is enabled at the table level, it is up to you which items or entities to create a TTL attribute for and how far into the future to set the epoch timestamp to.

**Key features of this building block**
+ TTL deletes are run behind the scenes with no impact to your table performance 
+ TTL is an asynchronous process that runs roughly every six hours, but can take over 48 hours for an expired record to be deleted 
  + Do not rely on TTL deletes for use cases like lock records or state management if stale data must be cleaned up in less than 48 hours 
+ You can give the TTL attribute any valid attribute name, but the value must be a Number type containing an epoch timestamp

## Time to live for archival building block

While TTL is an effective tool for deleting older data from DynamoDB, many use cases require that an archive of the data be kept for a longer period of time than in the primary datastore. In this instance, we can leverage TTL's timed deletion of records to push expired records into a long-term datastore.

![\[Image showing a table that sends a time to live delete job to DynamoDB Streams followed by a long-term datastore.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/TTLArchive.png)


When a TTL delete is performed by DynamoDB, the deletion is still pushed into the DynamoDB stream as a `REMOVE` event. When the delete was initiated by DynamoDB TTL, however, the stream record carries a `userIdentity` field with a `principalId` of `dynamodb.amazonaws.com`. Using a Lambda subscriber to the DynamoDB stream, we can apply an event filter on that DynamoDB principal and know that any records matching the filter should be pushed to an archival store such as Amazon S3 Glacier.
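A sketch of that event filter follows. The `userIdentity` shape (`type: Service`, `principalId: dynamodb.amazonaws.com`) is what DynamoDB Streams attaches to TTL-initiated deletes; the helper wraps it in the filter-criteria structure a Lambda event source mapping expects:

```python
import json

def ttl_delete_filter():
    """Event source mapping filter matching only TTL-initiated deletes."""
    pattern = {
        "eventName": ["REMOVE"],
        "userIdentity": {
            "type": ["Service"],
            "principalId": ["dynamodb.amazonaws.com"],
        },
    }
    # Event source mapping filters take a list of JSON-string patterns.
    return {"Filters": [{"Pattern": json.dumps(pattern)}]}
```

This structure would be supplied as the `FilterCriteria` of the event source mapping, so the Lambda function is only invoked for records headed to the archive.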

**Key features of this building block**
+  Once the low-latency reads of DynamoDB are no longer needed for the historical items, migrating them to a colder storage service like Amazon Glacier can reduce storage costs significantly while meeting the data compliance needs of your use case 
+ If the data is persisted into Amazon S3, cost-efficient analytics tools like Amazon Athena or Redshift Spectrum can be used to perform historical analysis of the data

## Vertical partitioning building block

Users familiar with a document model database will recognize the idea of storing all related data within a single JSON document. While DynamoDB supports JSON data types, it does not support executing `KeyConditions` on nested JSON. Since `KeyConditions` dictate how much data is read from disk, and effectively how many RCUs a query consumes, this can result in inefficiencies at scale. To better optimize the writes and reads of DynamoDB, we recommend breaking the document's individual entities apart into individual DynamoDB items, an approach also referred to as **vertical partitioning**.

![\[Image showing a large data structure formatted as a nested JSON object.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/DocumentBlob.png)


![\[Image showing an item collection where the item's sort key helps keep DynamoDB usage optimized.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/SingleTableSchema.png)


Vertical partitioning, as shown above, is a key example of single table design in action, but it can also be implemented across multiple tables if desired. Since DynamoDB bills writes in 1 KB increments, you should ideally partition the document in a way that results in items under 1 KB.
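A minimal sketch of the split: a nested user document becomes separate items sharing one partition key, with sort-key prefixes preserving the hierarchy (the document shape and the `USER#`/`ADDRESS#`/`ORDER#` prefixes are illustrative assumptions):

```python
def partition_document(user_id, doc):
    """Split a nested user document into vertically partitioned items."""
    # Top-level profile data becomes one small, independently updatable item.
    items = [{"PK": user_id, "SK": "USER#info", **doc.get("info", {})}]
    # Each nested entity becomes its own item under a typed sort-key prefix.
    for addr_id, addr in doc.get("addresses", {}).items():
        items.append({"PK": user_id, "SK": f"ADDRESS#{addr_id}", **addr})
    for order_id, order in doc.get("orders", {}).items():
        items.append({"PK": user_id, "SK": f"ORDER#{order_id}", **order})
    return items
```

Updating one address then costs only the WCUs for that small item, rather than rewriting the whole document, and a `begins_with(SK, "ADDRESS#")` query retrieves just the addresses.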

**Key features of this building block**
+ A hierarchy of data relationships is maintained via sort key prefixes so the singular document structure could be rebuilt client-side if needed 
+ Singular components of the data structure can be updated independently resulting in small item updates being only 1 WCU 
+ By using the sort key `BeginsWith`, the application can retrieve similar data in a single query, aggregating read costs for reduced total cost/latency
+ Large documents can easily be larger than the 400 KB individual item size limit in DynamoDB and vertical partitioning helps work around this limit

## Write sharding building block

One of the very few hard limits DynamoDB has in place is the restriction on how much throughput a single physical partition can sustain per second (not necessarily a single partition key). These limits are presently:
+ 1000 WCU (1000 writes per second of items up to 1 KB) and 3000 RCU (3000 *strongly consistent* reads per second of items up to 4 KB), or
+ 6000 *eventually consistent* reads per second of items up to 4 KB

If requests against the table exceed either of these limits, the client SDK receives a `ProvisionedThroughputExceededException`, more commonly referred to as throttling. Use cases that require read operations beyond that limit are usually best served by placing a read cache in front of DynamoDB, but write operations require a schema-level design known as **write sharding**.

![\[Image showing how DynamoDB shards partition keys across multiple partitions to prevent throttling from spikes in traffic.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/WriteShardingProblem.png)


![\[Image showing how DynamoDB shards partition keys across multiple partitions to prevent throttling from spikes in traffic.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/WriteShardingSolution.png)


To solve this problem, we'll append a random integer onto the end of the partition key for each contestant in the application's `UpdateItem` code. The upper bound of the random integer generator needs to match or exceed the expected writes per second for a given contestant divided by 1000: to support 20,000 votes per second, it would look like `rand(0,19)`. Now that the data is stored under separate logical partitions, it must be combined back together at read time. Since vote totals don't need to be real time, a Lambda function scheduled to read all vote partitions every X minutes could perform occasional aggregation for each contestant and write the result back to a single vote-total record for live reads.
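A minimal sketch of the sharding helpers described above, assuming 20 shards and a `contestantID#shard` key format (both illustrative choices):

```python
import random

NUM_SHARDS = 20  # supports ~20,000 writes/sec at 1,000 WCU per partition

def sharded_vote_key(contestant_id):
    """Write path: pick a random shard suffix for the partition key."""
    return f"{contestant_id}#{random.randint(0, NUM_SHARDS - 1)}"

def all_shard_keys(contestant_id):
    """Read path: enumerate every shard so the aggregator can sum them."""
    return [f"{contestant_id}#{n}" for n in range(NUM_SHARDS)]
```

The scheduled aggregator would query each key from `all_shard_keys`, sum the counts, and write the total back to a single record for live reads.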

**Key features of this building block**
+ For use cases with extremely high write throughput for a given partition key that cannot be avoided, write operations can be artificially spread across multiple DynamoDB partitions 
+ GSIs with a low cardinality partition key should also utilize this pattern since throttling on a GSI will apply backpressure to write operations on the base table

# Data modeling schema design packages in DynamoDB

Learn about data modeling schema design packages for DynamoDB, including use cases, access patterns, and final schema designs for social networks, gaming profiles, complaint management, recurring payments, device status, and online shops.

![\[Image showing the conceptual relationship between the data, the blocks that sit under them, and then the foundation that sits under the blocks. Emphasis on the foundation.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/SchemaDesignData.png)


## Prerequisites


Before we attempt to design a schema for DynamoDB, we must first gather some prerequisite data about the use case the schema needs to support. Unlike relational databases, DynamoDB is sharded by default, meaning the data lives on multiple servers behind the scenes, so designing for data locality is important. We'll need to put together the following list for each schema design:
+ List of entities (ER Diagram)
+ Estimated volumes and throughput for each entity
+ Access patterns that need to be supported (queries and writes)
+ Data retention requirements

**Topics**
+ [Prerequisites](#data-modeling-prereqs)
+ [Social network schema design in DynamoDB](data-modeling-schema-social-network.md)
+ [Gaming profile schema design in DynamoDB](data-modeling-schema-gaming-profile.md)
+ [Complaint management system schema design in DynamoDB](data-modeling-complaint-management.md)
+ [Recurring payments schema design in DynamoDB](data-modeling-schema-recurring-payments.md)
+ [Monitoring device status updates in DynamoDB](data-modeling-device-status.md)
+ [Using DynamoDB as a data store for an online shop](data-modeling-online-shop.md)

# Social network schema design in DynamoDB

## Social network business use case

This use case describes using DynamoDB as the datastore for a social network. A social network is an online service that lets users interact with each other. The social network we'll design lets a user see a timeline consisting of their own posts, their followers, the users they follow, and the posts written by the users they follow. The access patterns for this schema design are:
+ Get user information for a given userID 
+ Get follower list for a given userID
+ Get following list for a given userID
+ Get post list for a given userID
+ Get the list of users who liked the post for a given postID
+ Get the like count for a given postID
+ Get the timeline for a given userID

## Social network entity relationship diagram
Entity relationship diagram

This is the entity relationship diagram (ERD) we'll be using for the social network schema design.

![\[ERD for a social network application that shows entities, such as User, Post, and Follower.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/SocialNetworkERD.png)


## Social network access patterns
Access patterns

These are the access patterns we'll be considering for the social network schema design.
+ `getUserInfoByUserID`
+ `getFollowerListByUserID`
+ `getFollowingListByUserID`
+ `getPostListByUserID`
+ `getUserLikesByPostID`
+ `getLikeCountByPostID`
+ `getTimelineByUserID`

## Social network schema design evolution
Schema design

DynamoDB is a NoSQL database, so it does not allow you to perform a join, an operation that combines data from multiple tables. Customers unfamiliar with DynamoDB might apply relational database management system (RDBMS) design philosophies (such as creating a table for each entity) to DynamoDB when they don't need to. The purpose of DynamoDB's single-table design is to write data in a pre-joined form according to the application's access patterns, and then use the data immediately without additional computation. For more information, see [Single-table vs. multi-table design in DynamoDB](https://aws.amazon.com/blogs/database/single-table-vs-multi-table-design-in-amazon-dynamodb/). 

Now, let's step through how we'll evolve our schema design to address all the access patterns.

**Step 1: Address access pattern 1 (`getUserInfoByUserID`)**

To get a given user's information, we'll need to [Query](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html) the base table with a key condition of `PK=<userID>`. The `Query` operation lets you paginate the results, which can be useful when a user has many followers. For more information on `Query`, see [Querying tables in DynamoDB](Query.md). 

In our example, we track two types of data for our user: their "count" and their "info." A user's "count" reflects how many followers they have, how many users they are following, and how many posts they've created. A user's "info" reflects their personal information such as their name.

We see these two kinds of data represented by the two items below. The item that has "count" in its sort key (SK) is more likely to change than the item with "info." When an item is updated, DynamoDB considers its size as it appears before and after the update, and the provisioned throughput consumed reflects the larger of the two sizes. So even if you update just a subset of the item's attributes, [UpdateItem](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_UpdateItem.html) will still consume throughput based on the larger item size. Separating the frequently changing "count" data from the stable "info" data keeps those updates cheap. You can get both items via a single `Query` operation and use `UpdateItem` to add to or subtract from existing numeric attributes.
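As a concrete sketch, the request parameters for this pattern might look like the following. The table name `SocialNetwork` and the counter attribute `followers` are assumptions for illustration; the dictionary shapes follow the boto3 low-level client.

```python
def build_get_user_info_query(user_id):
    """Query kwargs that fetch both the 'count' and 'info' items for a user."""
    return {
        "TableName": "SocialNetwork",
        "KeyConditionExpression": "PK = :pk",
        "ExpressionAttributeValues": {":pk": {"S": f"u#{user_id}"}},
    }

def build_increment_follower_count(user_id):
    """UpdateItem kwargs that add 1 to the follower counter on the 'count' item."""
    return {
        "TableName": "SocialNetwork",
        "Key": {"PK": {"S": f"u#{user_id}"}, "SK": {"S": "count"}},
        "UpdateExpression": "ADD followers :one",
        "ExpressionAttributeValues": {":one": {"N": "1"}},
    }
```

Passing these dictionaries to `client.query(**...)` and `client.update_item(**...)` would perform the read and the counter update described above.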

![\[Result of the Query operation for a user with ID u#12345 and their count and info data.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/SocialNetwork1.png)


**Step 2: Address access pattern 2 (`getFollowerListByUserID`)**

To get a list of users who are following a given user, we'll need to `Query` the base table with a key condition of `PK=<userID>#follower`. 

![\[Result of Query operation on a table to list the followers of the user with ID u#12345.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/SocialNetwork2.png)


**Step 3: Address access pattern 3 (`getFollowingListByUserID`)**

To get a list of users a given user is following, we'll need to `Query` the base table with a key condition of `PK=<userID>#following`. When User A follows User B, you can use a [TransactWriteItems](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_TransactWriteItems.html) operation to group several requests together and do the following:
+ Add User A to User B's follower list, and then increment User B's follower count by one.
+ Add User B to User A's following list, and then increment User A's following count by one.
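As a sketch, the grouped writes for a follow action could be expressed as a single `TransactWriteItems` payload like the one below. The table name and item shapes are assumptions for illustration.

```python
def build_follow_transaction(follower_id, followee_id):
    """All-or-nothing payload that records 'follower follows followee'."""
    return {
        "TransactItems": [
            {"Put": {  # add the follower to the followee's follower list
                "TableName": "SocialNetwork",
                "Item": {"PK": {"S": f"u#{followee_id}#follower"},
                         "SK": {"S": f"u#{follower_id}"}}}},
            {"Update": {  # increment the followee's follower count
                "TableName": "SocialNetwork",
                "Key": {"PK": {"S": f"u#{followee_id}"}, "SK": {"S": "count"}},
                "UpdateExpression": "ADD followers :one",
                "ExpressionAttributeValues": {":one": {"N": "1"}}}},
            {"Put": {  # add the followee to the follower's following list
                "TableName": "SocialNetwork",
                "Item": {"PK": {"S": f"u#{follower_id}#following"},
                         "SK": {"S": f"u#{followee_id}"}}}},
            {"Update": {  # increment the follower's following count
                "TableName": "SocialNetwork",
                "Key": {"PK": {"S": f"u#{follower_id}"}, "SK": {"S": "count"}},
                "UpdateExpression": "ADD following :one",
                "ExpressionAttributeValues": {":one": {"N": "1"}}}},
        ]
    }
```

Because the four actions run as one transaction, the lists and counters can never drift out of sync.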

![\[Result of Query operation on a table to list all users the user with ID u#12345 is following.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/SocialNetwork3.png)


**Step 4: Address access pattern 4 (`getPostListByUserID`)**

To get a list of posts created by a given user, we'll need to `Query` the base table with a key condition of `PK=<userID>#post`. One important thing to note here is that a user's postIDs must be incremental: the second postID value must be greater than the first postID value (since users want to see their posts in a sorted manner). You can do this by generating postIDs based on a time value like a Universally Unique Lexicographically Sortable Identifier (ULID).
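To illustrate the sortability property (this is not a real ULID implementation, which encodes a 48-bit timestamp in Crockford base32), here is a simplified sketch of time-prefixed IDs whose lexicographic order matches creation order:

```python
import secrets
import time

def sortable_post_id(ts_ms=None):
    # Zero-pad the millisecond timestamp so plain string comparison preserves
    # chronological order; append randomness to avoid same-millisecond collisions.
    if ts_ms is None:
        ts_ms = int(time.time() * 1000)
    return f"{ts_ms:015d}-{secrets.token_hex(4)}"

first = sortable_post_id(1700000000000)
second = sortable_post_id(1700000005000)
assert first < second  # the later post sorts after the earlier post
```

Because DynamoDB sorts items within an item collection by sort key byte order, IDs built this way come back from `Query` already in posting order.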

![\[Result of Query operation with a key condition to get a list of posts created by a specific user.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/SocialNetwork4.png)


**Step 5: Address access pattern 5 (`getUserLikesByPostID`)**

To get a list of users who liked a given user's post, we'll need to `Query` the base table with a key condition of `PK=<postID>#likelist`. This approach is the same pattern that we used for retrieving the follower and following lists in access pattern 2 (`getFollowerListByUserID`) and access pattern 3 (`getFollowingListByUserID`).

![\[Result of Query operation with a key condition to get a list of users who liked a specific user’s post.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/SocialNetwork5.png)


**Step 6: Address access pattern 6 (`getLikeCountByPostID`)**

To get a count of likes for a given post, we'll need to perform a [GetItem](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_GetItem.html) operation on the base table with a key of `PK=<postID>#likecount`. This access pattern can cause throttling issues whenever a user with many followers (such as a celebrity) creates a post, since throttling occurs when a partition's throughput exceeds 1,000 WCU per second. This hot-key problem is not specific to DynamoDB; it just surfaces in DynamoDB because the database sits at the end of the software stack.

You should evaluate whether it's really essential for all users to view the like count simultaneously or if it can happen gradually over time. In general, a post's like count doesn't need to be immediately 100% accurate. You can implement this strategy by putting a queue between your application and DynamoDB to have the updates happen periodically.
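As a sketch of that buffering strategy, the consumer draining the queue could collapse individual like events into one counter update per post before writing to DynamoDB. The table and attribute names are assumptions, and the key shape mirrors the `<postID>#likecount` pattern above.

```python
from collections import Counter

def aggregate_likes(like_events):
    """Collapse individual like events into one delta per postID."""
    return Counter(event["postID"] for event in like_events)

def build_flush_updates(deltas):
    """Build one UpdateItem request per post instead of one per like."""
    return [{
        "TableName": "SocialNetwork",
        "Key": {"PK": {"S": f"{post_id}#likecount"}},
        "UpdateExpression": "ADD likeCount :d",
        "ExpressionAttributeValues": {":d": {"N": str(delta)}},
    } for post_id, delta in deltas.items()]

events = [{"postID": "p#1"}, {"postID": "p#1"}, {"postID": "p#2"}]
updates = build_flush_updates(aggregate_likes(events))
assert len(updates) == 2  # three likes collapsed into two writes
```

Flushing on a short timer spreads the write load on the hot partition while keeping the displayed count only seconds stale.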

![\[Result of GetItem operation with a key condition to get the count of likes for a specific post.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/SocialNetwork6.png)


**Step 7: Address access pattern 7 (`getTimelineByUserID`)**

To get the timeline for a given user, we'll need to perform a `Query` operation on the base table with a key condition of `PK=<userID>#timeline`. Let's consider a scenario where a user's followers can view their posts asynchronously. Every time a user writes a post, their follower list is read and the combination of their userID and postID is written into the timeline key of each of their followers. Then, when your application starts, you can read the timeline key with the `Query` operation and fill the timeline screen with each userID and postID combination, using the [BatchGetItem](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchGetItem.html) operation for any new items. You can't read the whole timeline with a single API call, but this is a more cost-effective solution when posts can be edited frequently.

The timeline is a place that shows recent posts, so we'll need a way to clean up the old ones. Instead of using WCU to delete them, you can use DynamoDB's [TTL](TTL.md) feature to do it for free.
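A sketch of a timeline item carrying a TTL attribute follows. The attribute name `expireAt` and the key shapes are assumptions for illustration; the TTL attribute name must match the one configured on the table.

```python
import time

def build_timeline_item(user_id, author_id, post_id, retention_days=30):
    """Timeline entry with a TTL attribute so DynamoDB expires it for free."""
    expire_at = int(time.time()) + retention_days * 24 * 60 * 60
    return {
        "PK": {"S": f"u#{user_id}#timeline"},
        "SK": {"S": f"p#{post_id}"},
        "authorID": {"S": f"u#{author_id}"},
        "expireAt": {"N": str(expire_at)},  # epoch seconds read by TTL
    }
```

Once `expireAt` passes, DynamoDB deletes the item in the background without consuming any of your write capacity.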

![\[Result of Query operation with a key condition to get the timeline for a given user showing their recent posts.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/SocialNetwork7.png)


All access patterns and how the schema design addresses them are summarized in the table below:


| Access pattern | Base table/GSI/LSI | Operation | Partition key value | Sort key value | Other conditions/filters | 
| --- | --- | --- | --- | --- | --- | 
| getUserInfoByUserID | Base table | Query | PK=<userID> |  |  | 
| getFollowerListByUserID | Base table | Query | PK=<userID>#follower |  |  | 
| getFollowingListByUserID | Base table | Query | PK=<userID>#following |  |  | 
| getPostListByUserID | Base table | Query | PK=<userID>#post |  |  | 
| getUserLikesByPostID | Base table | Query | PK=<postID>#likelist |  |  | 
| getLikeCountByPostID | Base table | GetItem | PK=<postID>#likecount |  |  | 
| getTimelineByUserID | Base table | Query | PK=<userID>#timeline |  |  | 

## Social network final schema
Final schema

Here is the final schema design. To download this schema design as a JSON file, see [DynamoDB Examples](https://github.com/aws-samples/aws-dynamodb-examples/blob/master/schema_design/SchemaExamples/SocialNetwork/SocialNetworkSchema.json) on GitHub.

**Base table:**

![\[Final schema design of a table that contains results of the preceding Query and GetItem operations.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/SocialNetwork8.png)


## Using NoSQL Workbench with this schema design
NoSQL Workbench

You can import this final schema into [NoSQL Workbench](workbench.md), a visual tool that provides data modeling, data visualization, and query development features for DynamoDB, to further explore and edit your new project. Follow these steps to get started:

1. Download NoSQL Workbench. For more information, see [Download NoSQL Workbench for DynamoDB](workbench.settingup.md).

1. Download the JSON schema file listed above, which is already in the NoSQL Workbench model format.

1. Import the JSON schema file into NoSQL Workbench. For more information, see [Importing an existing data model](workbench.Modeler.ImportExisting.md). 

1. Once you've imported into NoSQL Workbench, you can edit the data model. For more information, see [Editing an existing data model](workbench.Modeler.Edit.md).

# Gaming profile schema design in DynamoDB
Gaming profile

## Gaming profile business use case
Business use case

This use case talks about using DynamoDB to store player profiles for a gaming system. Users (in this case, players) need to create profiles before they can interact with many modern games, especially online ones. Gaming profiles typically include the following:
+ Basic information such as their user name
+ Game data such as items and equipment
+ Game records such as tasks and activities
+ Social information such as friend lists

To meet the fine-grained data query access requirements for this application, the primary keys (partition key and sort key) will use generic names (PK and SK) so they can be overloaded with various types of values as we will see below.

The access patterns for this schema design are:
+ Get a user's friend list
+ Get all of a player's information
+ Get a user's item list
+ Get a specific item from the user's item list
+ Update a user's character
+ Update the item count for a user

The size of the gaming profile will vary from game to game. [Compressing large attribute values](bp-use-s3-too.md) can let them fit within item limits in DynamoDB and reduce costs. The throughput management strategy depends on various factors such as the number of players, the number of games played per second, and the seasonality of the workload. Typically, for a newly launched game, the number of players and the level of popularity are unknown, so we will start with the [on-demand throughput mode](capacity-mode.md#capacity-mode-on-demand).

## Gaming profile entity relationship diagram
Entity relationship diagram

This is the entity relationship diagram (ERD) we'll be using for the gaming profile schema design.

![\[ER diagram for a gaming profile, showing relationships between entities, such as User, Game, and Score.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/GamingProfileERD.png)


## Gaming profile access patterns
Access patterns

These are the access patterns we'll be considering for the gaming profile schema design.
+ `getPlayerFriends`
+ `getPlayerAllProfile`
+ `getPlayerAllItems`
+ `getPlayerSpecificItem`
+ `updateCharacterAttributes`
+ `updateItemCount`

## Gaming profile schema design evolution
Schema design

From the above ERD, we can see that this is a one-to-many relationship type of data modeling. In DynamoDB, one-to-many data models can be organized into item collections, which is different from traditional relational databases where multiple tables are created and linked through foreign keys. An [item collection](WorkingWithItemCollections.md) is a group of items that share the same partition key value but have different sort key values. Within an item collection, each item has a unique sort key value that distinguishes it from other items. With this in mind, let's use the following pattern for `HASH` and `RANGE` values for each entity type.

To begin, we use generic names like `PK` and `SK` to store different types of entities in the same table and make the model future-proof. For better readability, we can include prefixes to denote the type of data, or include an attribute called `Entity_type` or `Type`. In the current example, we use a string starting with `player` to store the `player_ID` as the `PK`, use `entity name#` as the prefix of the `SK`, and add a `Type` attribute to indicate which entity type each item is. This allows us to store more entity types in the future and to use advanced techniques such as GSI overloading and sparse GSIs to meet more access patterns.

Let's start implementing the access patterns. Access patterns such as adding players and adding equipment can be realized through the [PutItem](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_PutItem.html) operation, so we won't cover them here. In this document, we'll focus on the typical access patterns listed above.

**Step 1: Address access pattern 1 (`getPlayerFriends`)**

We address access pattern 1 (`getPlayerFriends`) in this step. In our current design, friendship is simple and the number of friends in the game is small. For simplicity's sake, we use a list data type to store the friend list (1:1 modeling). In this design, we use [GetItem](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_GetItem.html) to satisfy this access pattern. In the `GetItem` operation, we explicitly provide the partition key and sort key values to get a specific item.

However, if a game has a large number of friends and the relationships between them are complex (such as friendships being bi-directional, with both an invite and an accept component), it would be necessary to model a many-to-many relationship that stores each friend individually, in order to scale to an unlimited friend list size. If a friendship change involves operating on multiple items at the same time, DynamoDB transactions can be used to group multiple actions together and submit them as a single all-or-nothing [TransactWriteItems](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_TransactWriteItems.html) or [TransactGetItems](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_TransactGetItems.html) operation.

![\[Complex many-to-many relationship diagram for a gaming profile of the Friends entity.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/GamingProfile1.png)


**Step 2: Address access patterns 2 (`getPlayerAllProfile`), 3 (`getPlayerAllItems`), and 4 (`getPlayerSpecificItem`)**

We address access patterns 2 (`getPlayerAllProfile`), 3 (`getPlayerAllItems`), and 4 (`getPlayerSpecificItem`) in this step. What these three access patterns have in common is a range query, which uses the [`Query`](Query.md) operation. Depending on the scope of the query, [key conditions](Query.KeyConditionExpressions.md) and [filter expressions](Query.FilterExpression.md), both common in practical development, are used.

In the `Query` operation, we provide a single partition key value and get all items with that partition key value. Access pattern 2 (`getPlayerAllProfile`) is implemented in this way. Optionally, we can add a sort key condition expression, a string that determines the items to be read from the table. Access pattern 3 (`getPlayerAllItems`) is implemented by adding the key condition `begins_with(SK, "ITEMS#")`. Further, to simplify development on the application side, we can use filter expressions to implement access pattern 4 (`getPlayerSpecificItem`).

Here's a pseudocode example using filter expression that filters items of the `Weapon` category:

```
filterExpression: "ItemType = :itemType"
expressionAttributeValues: {":itemType": "Weapon"}
```
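Expanded into full `Query` parameters (boto3-style shapes, with `GamePlayerProfiles` as an assumed table name), the pattern might look like this:

```python
def build_player_weapon_query(player_id):
    """Query kwargs combining a key condition with a filter expression."""
    return {
        "TableName": "GamePlayerProfiles",
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :items)",
        "FilterExpression": "ItemType = :itemType",  # applied after items are read
        "ExpressionAttributeValues": {
            ":pk": {"S": player_id},
            ":items": {"S": "ITEMS#"},
            ":itemType": {"S": "Weapon"},
        },
    }
```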

![\[Using Query operation with a partition key and sort key conditions to implement different access patterns.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/GamingProfile2.png)


**Note**  
A filter expression is applied after a Query finishes, but before the results are returned to the client. Therefore, a Query consumes the same amount of read capacity regardless of whether a filter expression is present.

If the access pattern is to query a large dataset and filter out a large amount of data to keep only a small subset of data, the appropriate approach is to design DynamoDB Partition Key and Sort Key more effectively. For example, in the above example for obtaining a certain `ItemType`, if there are many items for each player and querying for a certain `ItemType` is a typical access pattern, it would be more efficient to bring `ItemType` into the `SK` as a composite key. The data model would look like this: `ITEMS#ItemType#ItemId`.
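Under that composite sort key design, the same lookup could become a key condition instead of a filter, so only matching items are read and billed (names are again illustrative):

```python
def build_weapon_query_composite_sk(player_id):
    """With ItemType in the sort key, no filter expression is needed."""
    return {
        "TableName": "GamePlayerProfiles",
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": player_id},
            ":prefix": {"S": "ITEMS#Weapon#"},  # ITEMS#ItemType#ItemId layout
        },
    }
```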

**Step 3: Address access patterns 5 (`updateCharacterAttributes`) and 6 (`updateItemCount`)**

We address access patterns 5 (`updateCharacterAttributes`) and 6 (`updateItemCount`) in this step. When the player needs to modify their character, such as reducing currency or changing the quantity of a certain weapon in their inventory, use [UpdateItem](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_UpdateItem.html) to implement these access patterns. To update a player's currency while ensuring it never goes below a minimum amount, we can add a [condition expression](Expressions.ConditionExpressions.md) to reduce the balance only if it's greater than or equal to the minimum amount. Here is a pseudocode example:

```
UpdateExpression: "SET currency = currency - :amount"
ConditionExpression: "currency >= :minAmount"
```
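Expanded into full `UpdateItem` parameters (boto3-style shapes, assumed table name `GamePlayerProfiles` and the `#METADATA#` sort key from this design), the pseudocode might look like:

```python
def build_deduct_currency(player_id, amount, min_amount):
    """UpdateItem kwargs mirroring the pseudocode above."""
    return {
        "TableName": "GamePlayerProfiles",
        "Key": {"PK": {"S": player_id}, "SK": {"S": f"#METADATA#{player_id}"}},
        "UpdateExpression": "SET currency = currency - :amount",
        "ConditionExpression": "currency >= :minAmount",
        "ExpressionAttributeValues": {
            ":amount": {"N": str(amount)},
            ":minAmount": {"N": str(min_amount)},
        },
    }
```

If the condition fails, DynamoDB rejects the write with a `ConditionalCheckFailedException` and the balance is left unchanged.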

![\[Using UpdateItem with a condition expression to modify a player's currency, ensuring it's never less than a set amount.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/GamingProfile4-Update-player-Currency.png)


When developing with DynamoDB and using [Atomic Counters](WorkingWithItems.md#WorkingWithItems.AtomicCounters) to decrement inventory, we can ensure idempotency by using optimistic locking. Here is a pseudocode example for Atomic Counters:

```
UpdateExpression: "SET ItemCount = ItemCount - :incr"
expression-attribute-values: '{":incr":{"N":"1"}}'
```

![\[Using atomic counter to decrement the ItemCount attribute value from 5 to 4.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/GamingProfile5-Update-Item-Count.png)


In addition, in a scenario where the player purchases an item with currency, the entire process needs to deduct currency and add an item at the same time. We can use DynamoDB Transactions to group multiple actions together and submit them as a single all-or-nothing `TransactWriteItems` or `TransactGetItems` operation. `TransactWriteItems` is a synchronous and idempotent write operation that groups up to 100 write actions in a single all-or-nothing operation. The actions are completed atomically so that either all of them succeed or none of them succeeds. Transactions help eliminate the risk of duplication or vanishing currency. For more information on transactions, see [DynamoDB transactions example](transaction-example.md) .
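As a sketch, the purchase could be expressed as one `TransactWriteItems` payload; the optional `ClientRequestToken` makes a retried purchase idempotent. Table, key, and attribute names are assumptions for illustration.

```python
def build_purchase_transaction(player_id, item_id, price, request_token):
    """Deduct currency and add the purchased item in one atomic request."""
    return {
        # Reusing the same token on retry prevents double-charging.
        "ClientRequestToken": request_token,
        "TransactItems": [
            {"Update": {  # pay for the item, but only if the balance covers it
                "TableName": "GamePlayerProfiles",
                "Key": {"PK": {"S": player_id},
                        "SK": {"S": f"#METADATA#{player_id}"}},
                "UpdateExpression": "SET currency = currency - :price",
                "ConditionExpression": "currency >= :price",
                "ExpressionAttributeValues": {":price": {"N": str(price)}}}},
            {"Update": {  # grant the item
                "TableName": "GamePlayerProfiles",
                "Key": {"PK": {"S": player_id},
                        "SK": {"S": f"ITEMS#{item_id}"}},
                "UpdateExpression": "ADD ItemCount :one",
                "ExpressionAttributeValues": {":one": {"N": "1"}}}},
        ],
    }
```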

All access patterns and how the schema design addresses them are summarized in the table below:


| Access pattern | Base table/GSI/LSI | Operation | Partition key value | Sort key value | Other conditions/filters | 
| --- | --- | --- | --- | --- | --- | 
| getPlayerFriends | Base table | GetItem | PK=PlayerID | SK="FRIENDS#playerID" |  | 
| getPlayerAllProfile | Base table | Query | PK=PlayerID |  |  | 
| getPlayerAllItems | Base table | Query | PK=PlayerID | SK begins_with "ITEMS#" |  | 
| getPlayerSpecificItem | Base table | Query | PK=PlayerID | SK begins_with "ITEMS#" | filterExpression: "ItemType = :itemType" expressionAttributeValues: {":itemType": "Weapon"} | 
| updateCharacterAttributes | Base table | UpdateItem | PK=PlayerID | SK="#METADATA#playerID" | UpdateExpression: "SET currency = currency - :amount" ConditionExpression: "currency >= :minAmount" | 
| updateItemCount | Base table | UpdateItem | PK=PlayerID | SK="ITEMS#ItemID" | UpdateExpression: "SET ItemCount = ItemCount - :incr" expressionAttributeValues: '{":incr":{"N":"1"}}' | 

## Gaming profile final schema
Final schema

Here is the final schema design. To download this schema design as a JSON file, see [DynamoDB Examples](https://github.com/aws-samples/aws-dynamodb-examples/blob/master/schema_design/SchemaExamples/GamingPlayerProfiles/GamePlayerProfilesSchema.json) on GitHub.

**Base table:**

![\[Final schema design of a table that contains results of the preceding access pattern implementations.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/GamingProfile6-FinalSchema.png)


## Using NoSQL Workbench with this schema design
NoSQL Workbench

You can import this final schema into [NoSQL Workbench](workbench.md), a visual tool that provides data modeling, data visualization, and query development features for DynamoDB, to further explore and edit your new project. Follow these steps to get started:

1. Download NoSQL Workbench. For more information, see [Download NoSQL Workbench for DynamoDB](workbench.settingup.md).

1. Download the JSON schema file listed above, which is already in the NoSQL Workbench model format.

1. Import the JSON schema file into NoSQL Workbench. For more information, see [Importing an existing data model](workbench.Modeler.ImportExisting.md). 

1. Once you've imported into NoSQL Workbench, you can edit the data model. For more information, see [Editing an existing data model](workbench.Modeler.Edit.md).

# Complaint management system schema design in DynamoDB
Complaint management system

## Complaint management system business use case
Business use case

DynamoDB is a database well-suited for a complaint management system (or contact center) use case as most access patterns associated with them would be key-value based transactional lookups. The typical access patterns in this scenario would be to:
+ Create and update complaints
+ Escalate a complaint
+ Create and read comments on a complaint
+ Get all complaints by a customer
+ Get all comments by an agent and get all escalations 

Some comments may have attachments describing the complaint or solution. While these are all key-value access patterns, there can be additional requirements such as sending out notifications when a new comment is added to a complaint or running analytical queries to find complaint distribution by severity (or agent performance) per week. An additional requirement related to lifecycle management or compliance would be to archive complaint data after three years of logging the complaint.

## Complaint management system architecture diagram
Architecture diagram

The following diagram shows the architecture diagram of the complaint management system. This diagram shows the different AWS service integrations that the complaint management system uses.

![\[Combined workflow to fulfill non-transactional requirements using integrations with several AWS services.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-1-AD.jpg)


Apart from the key-value transactional access patterns that we will be handling in the DynamoDB data modeling section later, we have three non-transactional requirements. The architecture diagram above can be broken down into the following three workflows:

1. Send a notification when a new comment is added to a complaint

1. Run analytical queries on weekly data

1. Archive data older than three years

Let's take a more in-depth look at each one.

**Send a notification when a new comment is added to a complaint**

We can use the below workflow to achieve this requirement:

![\[Workflow to invoke Lambda functions to send notifications based on changes recorded by DynamoDB Streams.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-2-Workflow1.jpg)


[DynamoDB Streams](Streams.md) is a change data capture mechanism to record all write activity on your DynamoDB tables. You can configure Lambda functions to trigger on some or all of these changes. An [event filter](https://docs.aws.amazon.com/lambda/latest/dg/invocation-eventfiltering.html) can be configured on Lambda triggers to filter out events that are not relevant to the use-case. In this instance, we can use a filter to trigger Lambda only when a new comment is added and send out notification to relevant email ID(s) which can be fetched from [AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/intro.html) or any other credential store.
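A sketch of such a filter follows, expressed as the `FilterCriteria` structure used by Lambda event source mappings. The sort key name `SK` and the `comm#` comment prefix are assumptions for illustration.

```python
import json

# Only INSERT events whose sort key marks a comment item invoke the function;
# all other stream records are filtered out before Lambda is billed.
filter_criteria = {
    "Filters": [{
        "Pattern": json.dumps({
            "eventName": ["INSERT"],
            "dynamodb": {"Keys": {"SK": {"S": [{"prefix": "comm#"}]}}},
        })
    }]
}
```

This structure would be passed as the `FilterCriteria` parameter when creating or updating the event source mapping.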

**Run analytical queries on weekly data**

DynamoDB is suitable for workloads that are primarily focused on online transactional processing (OLTP). For the remaining 10-20% of access patterns with analytical requirements, data can be exported to S3 with the managed [Export to Amazon S3](S3DataExport.HowItWorks.md) feature with no impact on the live traffic to the DynamoDB table. Take a look at the workflow below:

![\[Workflow to periodically invoke a Lambda function to store DynamoDB data in an Amazon S3 bucket.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-3-Workflow2.jpg)


[Amazon EventBridge](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-what-is) can be used to trigger AWS Lambda on a schedule: it lets you configure a cron expression so the Lambda invocation takes place periodically. Lambda can invoke the `ExportToS3` API call and store DynamoDB data in S3. This S3 data can then be accessed by a SQL engine such as [Amazon Athena](https://docs.aws.amazon.com/athena/latest/ug/what-is) to run analytical queries on DynamoDB data without affecting the live transactional workload on the table. A sample Athena query to find the number of complaints per severity level would look like this:

```
SELECT Item.severity.S as "Severity", COUNT(Item) as "Count"
FROM "complaint_management"."data"
WHERE NOT Item.severity.S = ''
GROUP BY Item.severity.S ;
```

This results in the following Athena query result:

![\[Athena query results showing number of complaints for severity levels P3, P2, and P1.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-4-Athena.png)


**Archive data older than three years**

You can leverage the DynamoDB [Time to Live (TTL)](TTL.md) feature to delete obsolete data from your DynamoDB table at no additional cost (except for global table replicas on the 2019.11.21 (Current) version, where TTL deletes replicated to other Regions consume write capacity). Expired items appear in DynamoDB Streams and can be consumed from there to be archived into Amazon S3. The workflow for this requirement is as follows:

![\[Workflow to archive old data in an Amazon S3 bucket using the TTL feature and DynamoDB Streams.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-5-Workflow3.jpg)


## Complaint management system entity relationship diagram
Entity relationship diagram

This is the entity relationship diagram (ERD) we'll be using for the complaint management system schema design. 

![\[Complaint management system ERD that shows the entities Customer, Complaint, Comment, and Agent.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-6-ERD.jpg)


## Complaint management system access patterns
Access patterns

These are the access patterns we'll be considering for the complaint management schema design.

1. createComplaint

1. updateComplaint

1. updateSeveritybyComplaintID

1. getComplaintByComplaintID

1. addCommentByComplaintID

1. getAllCommentsByComplaintID

1. getLatestCommentByComplaintID

1. getAComplaintbyCustomerIDAndComplaintID

1. getAllComplaintsByCustomerID

1. escalateComplaintByComplaintID

1. getAllEscalatedComplaints

1. getEscalatedComplaintsByAgentID (order from newest to oldest)

1. getCommentsByAgentID (between two dates)

## Complaint management system schema design evolution
Schema design

Since this is a complaint management system, most access patterns revolve around a complaint as the primary entity. The `ComplaintID` being highly cardinal will ensure even distribution of data in the underlying partitions and is also the most common search criteria for our identified access patterns. Therefore, `ComplaintID` is a good partition key candidate in this data set.

**Step 1: Address access patterns 1 (`createComplaint`), 2 (`updateComplaint`), 3 (`updateSeveritybyComplaintID`), and 4 (`getComplaintByComplaintID`)**

We can use a generic sort key value called "metadata" (or "AA") to store complaint-specific information such as `CustomerID`, `State`, `Severity`, and `CreationDate`. We use singleton operations with `PK=ComplaintID` and `SK="metadata"` to do the following:

1. [PutItem](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_PutItem.html) to create a new complaint

1. [UpdateItem](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_UpdateItem.html) to update the severity or other fields in the complaint metadata

1. [GetItem](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_GetItem.html) to fetch metadata for the complaint

![\[Primary key, sort key, and attribute values, such as customer_id and severity, for a complaint item.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-7-Step1.png)
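As a sketch, the request parameters for these three singleton operations might look like the following. The table name `ComplaintManagement` and the helper names are assumptions for illustration; each dictionary matches the shape expected by the low-level DynamoDB API.

```python
# Hypothetical request builders for Step 1. Table name is assumed.
TABLE = "ComplaintManagement"

def create_complaint(complaint_id, customer_id, severity, creation_date):
    """PutItem parameters: create the complaint metadata item (pattern 1)."""
    return {
        "TableName": TABLE,
        "Item": {
            "PK": {"S": complaint_id},
            "SK": {"S": "metadata"},
            "customer_id": {"S": customer_id},
            "severity": {"S": severity},
            "creation_date": {"S": creation_date},
        },
        # Avoid silently overwriting an existing complaint with the same ID.
        "ConditionExpression": "attribute_not_exists(PK)",
    }

def update_severity(complaint_id, severity):
    """UpdateItem parameters: change only the severity field (pattern 3)."""
    return {
        "TableName": TABLE,
        "Key": {"PK": {"S": complaint_id}, "SK": {"S": "metadata"}},
        "UpdateExpression": "SET severity = :s",
        "ExpressionAttributeValues": {":s": {"S": severity}},
    }

def get_complaint(complaint_id):
    """GetItem parameters: fetch the complaint metadata (pattern 4)."""
    return {
        "TableName": TABLE,
        "Key": {"PK": {"S": complaint_id}, "SK": {"S": "metadata"}},
    }
```

With `boto3`, each dictionary can be passed directly to the matching client call, for example `boto3.client("dynamodb").put_item(**create_complaint(...))`.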


**Step 2: Address access pattern 5 (`addCommentByComplaintID`)**

This access pattern requires a one-to-many relationship model between a complaint and comments on the complaint. We will use the [vertical partitioning](data-modeling-blocks.md#data-modeling-blocks-vertical-partitioning) technique here to use a sort key and create an item collection with different types of data. If we look at access patterns 6 (`getAllCommentsByComplaintID`) and 7 (`getLatestCommentByComplaintID`), we know that comments will need to be sorted by time. We can also have multiple comments coming in at the same time so we can use the [composite sort key](data-modeling-blocks.md#data-modeling-blocks-composite) technique to append time and `CommentID` in the sort key attribute.

Other options to deal with such possible comment collisions would be to increase the granularity of the timestamp or to add an incremental number as a suffix instead of using `Comment_ID`. In this case, we'll prefix the sort key value for items corresponding to comments with "comm#" to enable range-based operations.

We also need to ensure that the `currentState` in the complaint metadata reflects the state when a new comment is added. Adding a comment might indicate that the complaint has been assigned to an agent, that it has been resolved, and so on. To bundle the comment addition and the metadata update in an all-or-nothing manner, we will use the [TransactWriteItems](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_TransactWriteItems.html) API. The resulting table state now looks like this:

![\[Table to store a complaint with its comments as a one-to-many relationship using a composite sort key.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-8-Step2.png)
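A minimal sketch of that transaction, assuming the same hypothetical table name and an illustrative `current_state` attribute:

```python
def comment_sort_key(comm_date, comment_id):
    # The "comm#" prefix groups comments after the "metadata" item and
    # enables begins_with range queries; appending the comment ID breaks
    # ties when two comments share the same timestamp.
    return f"comm#{comm_date}#{comment_id}"

def add_comment_transaction(complaint_id, comm_date, comment_id,
                            comment_text, new_state):
    """TransactWriteItems parameters: write the comment and update the
    complaint's current state, all-or-nothing (access pattern 5)."""
    return {
        "TransactItems": [
            {
                "Put": {
                    "TableName": "ComplaintManagement",
                    "Item": {
                        "PK": {"S": complaint_id},
                        "SK": {"S": comment_sort_key(comm_date, comment_id)},
                        "comment_text": {"S": comment_text},
                    },
                }
            },
            {
                "Update": {
                    "TableName": "ComplaintManagement",
                    "Key": {"PK": {"S": complaint_id},
                            "SK": {"S": "metadata"}},
                    "UpdateExpression": "SET current_state = :s",
                    "ExpressionAttributeValues": {":s": {"S": new_state}},
                }
            },
        ]
    }
```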


Let’s add some more data to the table and also add `ComplaintID` as a separate field from our `PK` to future-proof the model in case we need additional indexes on `ComplaintID`. Also note that some comments may have attachments, which we will store in Amazon Simple Storage Service (Amazon S3), keeping only their references or URLs in DynamoDB. It’s a best practice to keep the transactional database as lean as possible to optimize cost and performance. The data now looks like this:

![\[Table with complaint metadata and the data of all comments associated with each complaint.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-9-Step3.png)


**Step 3: Address access patterns 6 (`getAllCommentsByComplaintID`) and 7 (`getLatestCommentByComplaintID`)**

In order to get all comments for a complaint, we can use the [`query`](Query.md) operation with the `begins_with` condition on the sort key. Instead of consuming additional read capacity to read the metadata entry and then incurring the overhead of filtering out irrelevant results, a sort key condition like this helps us read only what we need. For example, a query operation with `PK=Complaint123` and `SK begins_with comm#` would return the following while skipping the metadata entry:

![\[Query operation result using a sort key condition that only displays a complaint's comments.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-10-Step4.png)


Since we need the latest comment for a complaint in pattern 7 (`getLatestCommentByComplaintID`), let's use two additional query parameters:

1. `ScanIndexForward` should be set to False to get results sorted in a descending order

1. `Limit` should be set to 1 to get the latest (only one) comment

Similar to access pattern 6 (`getAllCommentsByComplaintID`), we skip the metadata entry using `begins_with comm#` as the sort key condition. Now, you can perform access pattern 7 on this design using the query operation with `PK=Complaint123`, the sort key condition `begins_with comm#`, `ScanIndexForward=False`, and `Limit=1`. The following targeted item will be returned as a result:

![\[Result of query operation using a sort key condition to get a complaint's last comment.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-11-Step5.png)
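Putting those parameters together, the query for the latest comment might be sketched as follows (table name assumed):

```python
def latest_comment_query(complaint_id):
    """Query parameters for the newest comment only (access pattern 7):
    descending sort order, first item."""
    return {
        "TableName": "ComplaintManagement",
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": complaint_id},
            ":prefix": {"S": "comm#"},  # skip the "metadata" item
        },
        "ScanIndexForward": False,       # newest first
        "Limit": 1,                      # only the latest comment
    }
```

Dropping `Limit` (and `ScanIndexForward`) from this dictionary gives access pattern 6, all comments in ascending order.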


Let's add more dummy data to the table.

![\[Table with dummy data to get latest comments on complaints received.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-12-Step6.png)


**Step 4: Address access patterns 8 (`getAComplaintbyCustomerIDAndComplaintID`) and 9 (`getAllComplaintsByCustomerID`)**

Access patterns 8 (`getAComplaintbyCustomerIDAndComplaintID`) and 9 (`getAllComplaintsByCustomerID`) introduce a new search criterion: `CustomerID`. Fetching it from the existing table requires an expensive [Scan](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Scan.html) to read all data and then filter relevant items for the `CustomerID` in question. We can make this search more efficient by creating a [global secondary index (GSI)](GSI.md) with `CustomerID` as the partition key. Keeping in mind the one-to-many relationship between customer and complaints as well as access pattern 9 (`getAllComplaintsByCustomerID`), `ComplaintID` would be the right candidate for the sort key.

The data in the GSI would look like this:

![\[GSI with a one-to-many relationship model to get all complaints by a specific CustomerID.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-13-Step4-GSI.png)


An example query on this GSI for access pattern 8 (`getAComplaintbyCustomerIDAndComplaintID`) would be: `customer_id=custXYZ`, `sort key=Complaint1321`. The result would be:

![\[Query operation result on a GSI to get data of a specific complaint by a given customer.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-14-Step4-8.png)


To get all complaints for a customer for access pattern 9 (`getAllComplaintsByCustomerID`), the query on the GSI would be: `customer_id=custXYZ` as the partition key condition. The result would be:

![\[Query operation result using a partition key condition to get all complaints by a given customer.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-15-Step4-9.png)
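Both GSI queries share one shape; the sort key condition is added only for access pattern 8. A sketch, assuming the index name `Customer_Complaint_GSI`:

```python
def complaints_by_customer(customer_id, complaint_id=None):
    """Query the customer GSI. Without complaint_id this is access
    pattern 9 (all complaints); with it, access pattern 8 (one complaint)."""
    params = {
        "TableName": "ComplaintManagement",
        "IndexName": "Customer_Complaint_GSI",
        "KeyConditionExpression": "customer_id = :c",
        "ExpressionAttributeValues": {":c": {"S": customer_id}},
    }
    if complaint_id is not None:
        # Narrow the item collection to a single complaint.
        params["KeyConditionExpression"] += " AND complaint_id = :id"
        params["ExpressionAttributeValues"][":id"] = {"S": complaint_id}
    return params
```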


**Step 5: Address access pattern 10 (`escalateComplaintByComplaintID`)**

This access pattern introduces the escalation aspect. To escalate a complaint, we can use `UpdateItem` to add attributes such as `escalated_to` and `escalation_time` to the existing complaint metadata item. DynamoDB provides flexible schema design, which means a set of non-key attributes can be uniform or discrete across different items. See below for an example:

`UpdateItem with PK=Complaint1444, SK=metadata`

![\[Result of updating complaint metadata using UpdateItem API operation.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-16-Step5.png)


**Step 6: Address access patterns 11 (`getAllEscalatedComplaints`) and 12 (`getEscalatedComplaintsByAgentID`)**

Only a handful of complaints are expected to be escalated out of the whole data set. Therefore, creating an index on the escalation-related attributes would lead to efficient lookups as well as cost-effective GSI storage. We can do this by leveraging the [sparse index](data-modeling-blocks.md#data-modeling-blocks-sparse-index) technique. The GSI with partition key as `escalated_to` and sort key as `escalation_time` would look like this:

![\[GSI design using escalation-related attributes, escalated_to and escalation_time.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-17-Step6.png)


To get all escalated complaints for access pattern 11 (`getAllEscalatedComplaints`), we simply scan this GSI. Note that this scan will be performant and cost-efficient due to the small size of the GSI. To get escalated complaints for a specific agent (access pattern 12 (`getEscalatedComplaintsByAgentID`)), the partition key condition would be `escalated_to=agentID`, and we set `ScanIndexForward` to `False` to order from newest to oldest.
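A sketch of both operations against this sparse index (the index name `Escalations_GSI` is illustrative):

```python
def all_escalated_complaints():
    """Scan parameters for the sparse escalations GSI (pattern 11).
    Only escalated items exist in the index, so a scan stays small."""
    return {
        "TableName": "ComplaintManagement",
        "IndexName": "Escalations_GSI",
    }

def escalations_by_agent(agent_id):
    """Query parameters for one agent's escalations, newest first
    (pattern 12); escalation_time is the GSI sort key."""
    return {
        "TableName": "ComplaintManagement",
        "IndexName": "Escalations_GSI",
        "KeyConditionExpression": "escalated_to = :a",
        "ExpressionAttributeValues": {":a": {"S": agent_id}},
        "ScanIndexForward": False,  # newest to oldest
    }
```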

**Step 7: Address access pattern 13 (`getCommentsByAgentID`)**

For the last access pattern, we need to perform a lookup by a new dimension: `AgentID`. We also need time-based ordering to read comments between two dates, so we create a GSI with `agent_id` as the partition key and `comm_date` as the sort key. The data in this GSI will look like the following:

![\[GSI design to lookup comments by a given agent sorted using comment date.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-18.png)


An example query on this GSI would be `partition key agentID=AgentA` and `sort key=comm_date between (2023-04-30T12:30:00, 2023-05-01T09:00:00)`, the result of which is:

![\[Result of query using a partition key and sort key on a GSI.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-19.png)
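The `between` condition on this GSI might be expressed as follows (the index name `Agents_Comments_GSI` is assumed):

```python
def comments_by_agent_between(agent_id, start, end):
    """Query parameters for access pattern 13. BETWEEN is inclusive on
    both ends of the comm_date range; ISO 8601 strings sort correctly."""
    return {
        "TableName": "ComplaintManagement",
        "IndexName": "Agents_Comments_GSI",
        "KeyConditionExpression":
            "agent_id = :a AND comm_date BETWEEN :start AND :end",
        "ExpressionAttributeValues": {
            ":a": {"S": agent_id},
            ":start": {"S": start},
            ":end": {"S": end},
        },
    }
```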


All access patterns and how the schema design addresses them are summarized in the table below:


| Access pattern | Base table/GSI/LSI | Operation | Partition key value | Sort key value | Other conditions/filters | 
| --- | --- | --- | --- | --- | --- | 
| createComplaint | Base table | PutItem | PK=complaint_id | SK=metadata |  | 
| updateComplaint | Base table | UpdateItem | PK=complaint_id | SK=metadata |  | 
| updateSeveritybyComplaintID | Base table | UpdateItem | PK=complaint_id | SK=metadata |  | 
| getComplaintByComplaintID | Base table | GetItem | PK=complaint_id | SK=metadata |  | 
| addCommentByComplaintID | Base table | TransactWriteItems | PK=complaint_id | SK=metadata, SK=comm#comm_date#comm_id |  | 
| getAllCommentsByComplaintID | Base table | Query | PK=complaint_id | SK begins_with "comm#" |  | 
| getLatestCommentByComplaintID | Base table | Query | PK=complaint_id | SK begins_with "comm#" | scan_index_forward=False, Limit 1 | 
| getAComplaintbyCustomerIDAndComplaintID | Customer_Complaint_GSI | Query | customer_id=customer_id | complaint_id=complaint_id |  | 
| getAllComplaintsByCustomerID | Customer_Complaint_GSI | Query | customer_id=customer_id | N/A |  | 
| escalateComplaintByComplaintID | Base table | UpdateItem | PK=complaint_id | SK=metadata |  | 
| getAllEscalatedComplaints | Escalations_GSI | Scan | N/A | N/A |  | 
| getEscalatedComplaintsByAgentID (order from newest to oldest) | Escalations_GSI | Query | escalated_to=agent_id | N/A | scan_index_forward=False | 
| getCommentsByAgentID (between two dates) | Agents_Comments_GSI | Query | agent_id=agent_id | SK between (date1, date2) |  | 

## Complaint management system final schema
Final schema

Here are the final schema designs. To download this schema design as a JSON file, see [DynamoDB Examples](https://github.com/aws-samples/aws-dynamodb-examples/blob/master/schema_design/SchemaExamples/ComplainManagement/ComplaintManagementSchema.json) on GitHub.

**Base table**

![\[Base table design with complaint metadata.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-20-Complaint_management_system.png)


**Customer_Complaint_GSI**

![\[GSI design showing complaints by a given customer.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-21-Customer_Complaint_GSI.png)


**Escalations_GSI**

![\[GSI design showing escalation-related attributes.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-22-Escalations_GSI.png)


**Agents_Comments_GSI**

![\[GSI design showing comments made by a given agent.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ComplaintManagement-23-Comments_GSI.png)


## Using NoSQL Workbench with this schema design
NoSQL Workbench

You can import this final schema into [NoSQL Workbench](workbench.md), a visual tool that provides data modeling, data visualization, and query development features for DynamoDB, to further explore and edit your new project. Follow these steps to get started:

1. Download NoSQL Workbench. For more information, see [Download NoSQL Workbench for DynamoDB](workbench.settingup.md).

1. Download the JSON schema file listed above, which is already in the NoSQL Workbench model format.

1. Import the JSON schema file into NoSQL Workbench. For more information, see [Importing an existing data model](workbench.Modeler.ImportExisting.md). 

1. Once you've imported into NoSQL Workbench, you can edit the data model. For more information, see [Editing an existing data model](workbench.Modeler.Edit.md).

# Recurring payments schema design in DynamoDB
Recurring payments

## Recurring payments business use case
Business use case

This use case talks about using DynamoDB to implement a recurring payments system. The data model has the following entities: *accounts*, *subscriptions*, and *receipts*. The specifics for our use case include the following:
+ Each *account* can have multiple *subscriptions*
+ The *subscription* has a `NextPaymentDate` when the next payment needs to be processed and a `NextReminderDate` when an email reminder is sent to the customer
+ There is an item for the *subscription* that is stored and updated when the payment has been processed (the average item size is around 1 KB and the throughput depends on the number of *accounts* and *subscriptions*)
+ The *payment* processor will also create a *receipt* as part of the process, which is stored in the table and set to expire after a period of time by using a [TTL](TTL.md) attribute

## Recurring payments entity relationship diagram
Entity relationship diagram

This is the entity relationship diagram (ERD) we'll be using for the recurring payments system schema design.

![\[Recurring payments system ERD showing entities: Account, Subscription, and Receipt.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ReoccurringPayments-1-ERD.png)


## Recurring payments system access patterns
Access patterns

These are the access patterns we'll be considering for the recurring payments system schema design.

1. `createSubscription`

1. `createReceipt`

1. `updateSubscription`

1. `getDueRemindersByDate`

1. `getDuePaymentsByDate`

1. `getSubscriptionsByAccount`

1. `getReceiptsByAccount`

## Recurring payments schema design
Schema design

The generic names `PK` and `SK` are used for key attributes to allow storing different types of entities in the same table, such as the account, subscription, and receipt entities. The user first creates a subscription, agreeing to pay an amount on the same day each month in return for a product. They choose which day of the month the payment is processed, and a reminder is sent prior to the payment being processed. The application works by having two batch jobs that run each day: one batch job sends reminders due that day and the other batch job processes any payments due that day.

**Step 1: Address access pattern 1 (`createSubscription`)**

Access pattern 1 (`createSubscription`) is used to initially create the subscription, and the details including `SKU`, `NextPaymentDate`, `NextReminderDate` and `PaymentDetails` are set. This step shows the state of the table for just one account with one subscription. There can be multiple subscriptions in the item collection so this is a one-to-many relationship.

![\[Table design showing the subscription details for an account.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ReoccurringPayments-2-Step1.png)
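The composite key values for this item collection can be sketched as small helper functions (the exact `ACC#`, `SUB#`, and `REC#` formats follow the summary in this section and are otherwise illustrative):

```python
def account_pk(account_id):
    # All of an account's items share one partition key, one item collection.
    return f"ACC#{account_id}"

def subscription_sk(sub_id, sku_id):
    # "SUB#" prefix groups subscriptions within the account's collection.
    return f"SUB#{sub_id}#SKU{sku_id}"

def receipt_sk(receipt_date, sku_id):
    # "REC#" prefix plus an ISO 8601 date keeps receipts in time order.
    return f"REC#{receipt_date}#SKU{sku_id}"
```

Because "REC#" sorts after "SUB#" lexicographically, a `begins_with` condition on either prefix cleanly separates the two entity types within one partition.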


**Step 2: Address access patterns 2 (`createReceipt`) and 3 (`updateSubscription`)**

Access pattern 2 (`createReceipt`) is used to create the receipt item. After the payment is processed each month, the payment processor will write a receipt back to the base table. There could be multiple receipts in the item collection, so this is a one-to-many relationship. The payment processor will also update the subscription item (access pattern 3 (`updateSubscription`)) to set the `NextReminderDate` and the `NextPaymentDate` for the next month.

![\[Receipt details and subscription item update to show the next subscription reminder date.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ReoccurringPayments-3-Step2.png)


**Step 3: Address access pattern 4 (`getDueRemindersByDate`)**

The application processes reminders for the payment in batches for the current day. Therefore, the application needs to access the subscriptions on a different dimension: date rather than account. This is a good use case for a [global secondary index (GSI)](GSI.md). In this step we add the index `GSI-1`, which uses `NextReminderDate` as the GSI partition key. We do not need to replicate all the items: this GSI is a [sparse index](data-modeling-blocks.md#data-modeling-blocks-sparse-index) and the receipt items are not replicated. We also do not need to project all the attributes; we only need to include a subset of them. The image below shows the schema of `GSI-1`, which gives the information the application needs to send the reminder email.

![\[GSI-1 schema with details, such as email address, the application needs to send a reminder email.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ReoccurringPayments-4-Step3.png)


**Step 4: Address access pattern 5 (`getDuePaymentsByDate`)**

The application processes the payments in batches for the current day in the same way it does with reminders. We add `GSI-2` in this step, which uses `NextPaymentDate` as the GSI partition key. We do not need to replicate all the items; this GSI is a sparse index as the receipt items are not replicated. The image below shows the schema of `GSI-2`.

![\[GSI-2 schema with details to process payments. NextPaymentDate is the partition key for GSI-2.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ReoccurringPayments-5-Step4.png)


**Step 5: Address access patterns 6 (`getSubscriptionsByAccount`) and 7 (`getReceiptsByAccount`)**

The application can retrieve all the subscriptions for an account by using a [query](Query.md) on the base table that targets the account identifier (the `PK`) and uses the range operator to get all the items where the `SK` begins with “SUB#”. The application can also use the same query structure to retrieve all the receipts by using a range operator to get all the items where the `SK` begins with “REC#”. This allows us to satisfy access patterns 6 (`getSubscriptionsByAccount`) and 7 (`getReceiptsByAccount`). The application uses these access patterns so the user can see their current subscriptions and their past receipts for the last six months. There is no change to the table schema in this step and we can see below how we target just the subscription item(s) in access pattern 6 (`getSubscriptionsByAccount`).

![\[Result of query operation on the base table. It shows the subscription of a specific account.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ReoccurringPayments-6-Step5.png)
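Because the two patterns differ only in the sort key prefix, a single query builder can serve them both (the table name `RecurringPayments` is an assumption):

```python
def items_by_account(account_id, prefix):
    """Query parameters for one account's item collection. Pass "SUB#"
    for subscriptions (pattern 6) or "REC#" for receipts (pattern 7)."""
    return {
        "TableName": "RecurringPayments",
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :p)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"ACC#{account_id}"},
            ":p": {"S": prefix},
        },
    }
```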


All access patterns and how the schema design addresses them are summarized in the table below:


| Access pattern | Base table/GSI/LSI | Operation | Partition key value | Sort key value | 
| --- | --- | --- | --- | --- | 
| createSubscription | Base table | PutItem | ACC#account_id | SUB#<SUBID>#SKU<SKUID> | 
| createReceipt | Base table | PutItem | ACC#account_id | REC#<ReceiptDate>#SKU<SKUID> | 
| updateSubscription | Base table | UpdateItem | ACC#account_id | SUB#<SUBID>#SKU<SKUID> | 
| getDueRemindersByDate | GSI-1 | Query | <NextReminderDate> |  | 
| getDuePaymentsByDate | GSI-2 | Query | <NextPaymentDate> |  | 
| getSubscriptionsByAccount | Base table | Query | ACC#account_id | SK begins_with “SUB#” | 
| getReceiptsByAccount | Base table | Query | ACC#account_id | SK begins_with “REC#” | 

## Recurring payments final schema
Final schema

Here are the final schema designs. To download this schema design as a JSON file, see [DynamoDB Examples](https://github.com/aws-samples/aws-dynamodb-examples/blob/master/schema_design/SchemaExamples/ReocurringPayments/ReocurringPaymentsSchema.json) on GitHub.

**Base table**

![\[Base table design showing account information, and its subscription and receipt details.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ReoccurringPayments-7-Base.png)


**GSI-1**

![\[GSI-1 schema with subscription details, such as email address and NextPaymentDate.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ReoccurringPayments-8-GSI1.png)


**GSI-2**

![\[GSI-2 schema with payment details, such as PaymentAmount and PaymentDay.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/ReoccurringPayments-9-GSI2.png)


## Using NoSQL Workbench with this schema design
NoSQL Workbench

You can import this final schema into [NoSQL Workbench](workbench.md), a visual tool that provides data modeling, data visualization, and query development features for DynamoDB, to further explore and edit your new project. Follow these steps to get started:

1. Download NoSQL Workbench. For more information, see [Download NoSQL Workbench for DynamoDB](workbench.settingup.md).

1. Download the JSON schema file listed above, which is already in the NoSQL Workbench model format.

1. Import the JSON schema file into NoSQL Workbench. For more information, see [Importing an existing data model](workbench.Modeler.ImportExisting.md). 

1. Once you've imported into NoSQL Workbench, you can edit the data model. For more information, see [Editing an existing data model](workbench.Modeler.Edit.md).

# Monitoring device status updates in DynamoDB
Device status updates

This use case talks about using DynamoDB to monitor device status updates (or changes in device state) in DynamoDB.

## Use case
Use case

In IoT use cases (a smart factory, for instance), many devices need to be monitored by operators, and the devices periodically send their status or logs to a monitoring system. When there is a problem with a device, the status for the device changes from *normal* to *warning*. There are different log levels or statuses depending on the severity and type of abnormal behavior in the device. The system then assigns an operator to check on the device, and the operator may escalate the problem to their supervisor if needed.

Some typical access patterns for this system include:
+ Create log entry for a device
+ Get all logs for a specific device state showing the most recent logs first
+ Get all logs for a given operator between two dates
+ Get all escalated logs for a given supervisor
+ Get all escalated logs with a specific device state for a given supervisor
+ Get all escalated logs with a specific device state for a given supervisor for a specific date

## Entity relationship diagram
Entity relationship diagram

This is the entity relationship diagram (ERD) we'll be using for monitoring device status updates.

![\[ERD of device status updates. It shows the entities: Device and Operator.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/DeviceStatus-1-ERD.jpg)


## Access patterns
Access patterns

These are the access patterns we'll be considering for monitoring device status updates.

1. `createLogEntryForSpecificDevice`

1. `getLogsForSpecificDevice`

1. `getWarningLogsForSpecificDevice`

1. `getLogsForOperatorBetweenTwoDates`

1. `getEscalatedLogsForSupervisor`

1. `getEscalatedLogsWithSpecificStatusForSupervisor`

1. `getEscalatedLogsWithSpecificStatusForSupervisorForDate`

## Schema design evolution
Schema design

**Step 1: Address access patterns 1 (`createLogEntryForSpecificDevice`) and 2 (`getLogsForSpecificDevice`)**

The unit of scaling for a device tracking system would be individual devices. In this system, a `deviceID` uniquely identifies a device. This makes `deviceID` a good candidate for the partition key. Each device sends information to the tracking system periodically (say, every five minutes or so). This ordering makes date a logical sorting criterion and therefore, the sort key. The sample data in this case would look something like this:

![\[Table to store status of multiple devices. DeviceID is the primary key and status update Date is the sort key.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/DeviceStatus-2-Step1.png)


To fetch log entries for a specific device, we can perform a [query](Query.md) operation with partition key `DeviceID="d#12345"`.

**Step 2: Address access pattern 3 (`getWarningLogsForSpecificDevice`)**

Since `State` is a non-key attribute, addressing access pattern 3 with the current schema would require a [filter expression](Query.FilterExpression.md). In DynamoDB, filter expressions are applied after data is read using key condition expressions. For example, if we were to fetch warning logs for `d#12345`, the query operation with partition key `DeviceID="d#12345"` will read four items from the above table and then filter out the one item without the *warning* status. This approach is not efficient at scale. A filter expression can be a good way to exclude items that are queried if the ratio of excluded items is low or the query is performed infrequently. However, for cases where many items are retrieved from a table and the majority of the items are filtered out, we can continue evolving our table design so it runs more efficiently.

Let's change how we handle this access pattern by using [composite sort keys](data-modeling-blocks.md#data-modeling-blocks-composite). You can import sample data from [DeviceStateLog_3.json](https://github.com/aws-samples/amazon-dynamodb-design-patterns/blob/master/examples/device-state-log/json/DeviceStateLog_3.json) where the sort key is changed to `State#Date`. This sort key is the composition of the attributes `State` and `Date`, joined with `#` as a delimiter. The data now looks something like this: 

![\[Status update data for the device, d#12345, fetched using the composite sort key State#Date.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/DeviceStatus-3-Step2.png)


To fetch only warning logs for a device, the query becomes more targeted with this schema. The key condition for the query uses partition key `DeviceID="d#12345"` and sort key `State#Date begins_with “WARNING”`. This query will only read the relevant three items with the *warning* state.
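A sketch of the composite key and the targeted query (note that an attribute name containing `#`, such as `State#Date`, must be aliased with `ExpressionAttributeNames` inside expressions):

```python
def state_date_sk(state, date):
    # State first so begins_with can filter by state; date second so that
    # matching logs stay in chronological order within each state.
    return f"{state}#{date}"

def warning_logs_query(device_id):
    """Query parameters that read only the WARNING logs for one device
    (access pattern 3), skipping all other states entirely."""
    return {
        "TableName": "DeviceStateLog",
        "KeyConditionExpression":
            "DeviceID = :d AND begins_with(#sd, :prefix)",
        # Alias the attribute name because "#" is not valid in expressions.
        "ExpressionAttributeNames": {"#sd": "State#Date"},
        "ExpressionAttributeValues": {
            ":d": {"S": device_id},
            ":prefix": {"S": "WARNING"},
        },
    }
```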

**Step 3: Address access pattern 4 (`getLogsForOperatorBetweenTwoDates`)**

You can import [DeviceStateLog_4.json](https://github.com/aws-samples/amazon-dynamodb-design-patterns/blob/master/examples/device-state-log/json/DeviceStateLog_4.json) where the `Operator` attribute was added to the `DeviceStateLog` table with example data.

![\[DeviceStateLog table design with Operator attribute to get an operator's logs between specific dates.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/DeviceStatus-4-Step3.png)


Since `Operator` is not currently a partition key, there is no way to perform a direct key-value lookup on this table based on `OperatorID`. We’ll need to create a new [item collection](WorkingWithItemCollections.md) with a global secondary index on `OperatorID`. The access pattern requires a lookup based on dates so Date is the sort key attribute for the [global secondary index (GSI)](GSI.md). This is what the GSI now looks like:

![\[GSI design with OperatorID and Date as partition key and sort key to get logs for a specific operator.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/DeviceStatus-5-Step3.png)


For access pattern 4 (`getLogsForOperatorBetweenTwoDates`), you can query this GSI with partition key `OperatorID=Liz` and sort key `Date` between `2020-04-11T05:58:00` and `2020-04-24T14:50:00`.

![\[Querying on GSI using OperatorID and Date to get logs for an operator between two dates.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/DeviceStatus-6-GSI1_1.png)


**Step 4: Address access patterns 5 (`getEscalatedLogsForSupervisor`) 6 (`getEscalatedLogsWithSpecificStatusForSupervisor`), and 7 (`getEscalatedLogsWithSpecificStatusForSupervisorForDate`)**

We’ll be using a [sparse index](data-modeling-blocks.md#data-modeling-blocks-sparse-index) to address these access patterns.

Global secondary indexes are sparse by default, so only items in the base table that contain primary key attributes of the index will actually appear in the index. This is another way of excluding items that are not relevant for the access pattern being modeled.

You can import [DeviceStateLog_6.json](https://github.com/aws-samples/amazon-dynamodb-design-patterns/blob/master/examples/device-state-log/json/DeviceStateLog_6.json) where the `EscalatedTo` attribute was added to the `DeviceStateLog` table with example data. As mentioned earlier, not all of the logs get escalated to a supervisor.

![\[GSI design with the EscalatedTo attribute to get all the escalated logs for a supervisor.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/DeviceStatus-7-Step4.png)


You can now create a new GSI where `EscalatedTo` is the partition key and `State#Date` is the sort key. Notice that only items that have both `EscalatedTo` and `State#Date` attributes appear in the index.

![\[GSI design to get all the items with the EscalatedTo and State#Date attributes.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/DeviceStatus-8-Step4.png)


The rest of the access patterns are summarized as follows:

+ For access pattern 5 (`getEscalatedLogsForSupervisor`), you can perform a query on the escalations GSI with partition key `EscalatedTo="Sara"`.

+ For access pattern 6 (`getEscalatedLogsWithSpecificStatusForSupervisor`), you can perform a query on the escalations GSI with partition key `EscalatedTo="Sara"` and sort key condition `State#Date begins_with "WARNING"`.

+ For access pattern 7 (`getEscalatedLogsWithSpecificStatusForSupervisorForDate`), you can perform a query on the escalations GSI with partition key `EscalatedTo="Sara"` and sort key condition `State#Date begins_with "WARNING#2020-04-27"`.
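These three patterns differ only in how specific the `begins_with` prefix is, so they can be sketched as one parameterized query (the index name `GSI-2` follows the summary table in this section):

```python
def escalated_logs_query(supervisor, prefix=None):
    """Query parameters for access patterns 5-7. prefix=None returns all
    escalations; "WARNING" narrows to one state; "WARNING#2020-04-27"
    narrows to one state on one day."""
    params = {
        "TableName": "DeviceStateLog",
        "IndexName": "GSI-2",
        "KeyConditionExpression": "EscalatedTo = :s",
        "ExpressionAttributeValues": {":s": {"S": supervisor}},
    }
    if prefix is not None:
        params["KeyConditionExpression"] += " AND begins_with(#sd, :p)"
        # Alias State#Date, since "#" is not valid inside expressions.
        params["ExpressionAttributeNames"] = {"#sd": "State#Date"}
        params["ExpressionAttributeValues"][":p"] = {"S": prefix}
    return params
```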

All access patterns and how the schema design addresses them are summarized in the table below:


| Access pattern | Base table/GSI/LSI | Operation | Partition key value | Sort key value | Other conditions/filters | 
| --- | --- | --- | --- | --- | --- | 
| createLogEntryForSpecificDevice | Base table | PutItem | DeviceID=deviceId | State#Date=state#date |  | 
| getLogsForSpecificDevice | Base table | Query | DeviceID=deviceId | State#Date begins_with "state1#" | ScanIndexForward = False | 
| getWarningLogsForSpecificDevice | Base table | Query | DeviceID=deviceId | State#Date begins_with "WARNING" |  | 
| getLogsForOperatorBetweenTwoDates | GSI-1 | Query | Operator=operatorName | Date between date1 and date2 |  | 
| getEscalatedLogsForSupervisor | GSI-2 | Query | EscalatedTo=supervisorName |  |  | 
| getEscalatedLogsWithSpecificStatusForSupervisor | GSI-2 | Query | EscalatedTo=supervisorName | State#Date begins_with "state1#" |  | 
| getEscalatedLogsWithSpecificStatusForSupervisorForDate | GSI-2 | Query | EscalatedTo=supervisorName | State#Date begins_with "state1#date1" |  | 

## Final schema
Final schema

Here are the final schema designs. To download this schema design as a JSON file, see [DynamoDB Examples](https://github.com/aws-samples/aws-dynamodb-examples/tree/master/schema_design/SchemaExamples) on GitHub.

**Base table**

![\[Base table design with device status metadata, such as Device ID, State, and Date.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/DeviceStatus-9-Table.png)


**GSI-1**

![\[GSI-1 design. It shows the primary key and attributes: DeviceID, State#Date, and State.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/DeviceStatus-10-GSI1.png)


**GSI-2**

![\[GSI-2 design. It shows the primary key and attributes: DeviceID, Operator, Date, and State.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/DeviceStatus-11-GSI2.png)


## Using NoSQL Workbench with this schema design
NoSQL Workbench

You can import this final schema into [NoSQL Workbench](workbench.md), a visual tool that provides data modeling, data visualization, and query development features for DynamoDB, to further explore and edit your new project. Follow these steps to get started:

1. Download NoSQL Workbench. For more information, see [Download NoSQL Workbench for DynamoDB](workbench.settingup.md).

1. Download the JSON schema file listed above, which is already in the NoSQL Workbench model format.

1. Import the JSON schema file into NoSQL Workbench. For more information, see [Importing an existing data model](workbench.Modeler.ImportExisting.md). 

1. Once you've imported into NoSQL Workbench, you can edit the data model. For more information, see [Editing an existing data model](workbench.Modeler.Edit.md).

# Using DynamoDB as a data store for an online shop
Online shop

This use case talks about using DynamoDB as a data store for an online shop (or e-store).

## Use case
Use case

An online store lets users browse through different products and eventually purchase them. Based on the generated invoice, a customer can pay using a discount code or gift card and then pay the remaining amount with a credit card. Purchased products will be picked from one of several warehouses and will be shipped to the provided address. Typical access patterns for an online store include:
+ Get customer for a given customerId
+ Get product for a given productId
+ Get warehouse for a given warehouseId
+ Get a product inventory for all warehouses by a productId
+ Get order for a given orderId
+ Get all products for a given orderId
+ Get invoice for a given orderId
+ Get all shipments for a given orderId
+ Get all orders for a given productId for a given date range
+ Get invoice for a given invoiceId
+ Get all payments for a given invoiceId
+ Get shipment details for a given shipmentId
+ Get all shipments for a given warehouseId
+ Get inventory of all products for a given warehouseId
+ Get all invoices for a given customerId for a given date range
+ Get all products ordered by a given customerId for a given date range

## Entity relationship diagram
Entity relationship diagram

This is the entity relationship diagram (ERD) we'll be using to model DynamoDB as a data store for an online shop.

![\[ERD for an online store's data model with entities, such as Product, Order, Payment, and Customer.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/OnlineShop-1-ERD.png)


## Access patterns
Access patterns

These are the access patterns we'll be considering when using DynamoDB as a data store for an online shop.

1. `getCustomerByCustomerId`

1. `getProductByProductId`

1. `getWarehouseByWarehouseId`

1. `getProductInventoryByProductId`

1. `getOrderDetailsByOrderId`

1. `getProductByOrderId`

1. `getInvoiceByOrderId`

1. `getShipmentByOrderId`

1. `getOrderByProductIdForDateRange`

1. `getInvoiceByInvoiceId`

1. `getPaymentByInvoiceId`

1. `getShipmentDetailsByShipmentId`

1. `getShipmentByWarehouseId`

1. `getProductInventoryByWarehouseId`

1. `getInvoiceByCustomerIdForDateRange`

1. `getProductsByCustomerIdForDateRange`

## Schema design evolution
Schema design

Using [NoSQL Workbench for DynamoDB](workbench.md), import [AnOnlineShop_1.json](https://github.com/aws-samples/amazon-dynamodb-design-patterns/blob/master/examples/an-online-shop/json/AnOnlineShop_1.json) to create a new data model called `AnOnlineShop` and a new table called `OnlineShop`. Note that we use the generic names `PK` and `SK` for the partition key and sort key; this practice makes it possible to store different types of entities in the same table.

**Step 1: Address access pattern 1 (`getCustomerByCustomerId`)**

Import [AnOnlineShop_2.json](https://github.com/aws-samples/amazon-dynamodb-design-patterns/blob/master/examples/an-online-shop/json/AnOnlineShop_2.json) to handle access pattern 1 (`getCustomerByCustomerId`). Some entities do not have relationships to other entities, so we use the same value for `PK` and `SK`. In the example data, note that the keys use the prefix `c#` to distinguish the `customerId` from other entities that will be added later. This practice is repeated for other entities as well.

To address this access pattern, a [GetItem](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_GetItem.html) operation can be used with `PK=customerId` and `SK=customerId`.
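As a sketch, the `GetItem` request for this pattern could be built as follows. The table name `OnlineShop` comes from this section; the request is only constructed here, not sent:

```python
# Sketch: GetItem request shape for getCustomerByCustomerId. The "c#"
# prefix matches the key convention in the example data.
def get_customer_request(customer_id):
    key_value = {"S": f"c#{customer_id}"}
    return {
        "TableName": "OnlineShop",
        "Key": {"PK": key_value, "SK": key_value},  # PK and SK are identical
    }

request = get_customer_request("12345")
```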

**Step 2: Address access pattern 2 (`getProductByProductId`)**

Import [AnOnlineShop_3.json](https://github.com/aws-samples/amazon-dynamodb-design-patterns/blob/master/examples/an-online-shop/json/AnOnlineShop_3.json) to address access pattern 2 (`getProductByProductId`) for the `product` entity. The product entities are prefixed with `p#`, and the same sort key attribute is used to store both `customerId` and `productId` values. Generic naming and [vertical partitioning](data-modeling-blocks.md#data-modeling-blocks-vertical-partitioning) allow us to create such item collections for an effective single-table design.

To address this access pattern, a `GetItem` operation can be used with `PK=productId` and `SK=productId`.

**Step 3: Address access pattern 3 (`getWarehouseByWarehouseId`)**

Import [AnOnlineShop_4.json](https://github.com/aws-samples/amazon-dynamodb-design-patterns/blob/master/examples/an-online-shop/json/AnOnlineShop_4.json) to address access pattern 3 (`getWarehouseByWarehouseId`) for the `warehouse` entity. We currently have the `customer`, `product`, and `warehouse` entities in the same table, distinguished by key prefixes and the `EntityType` attribute. A type attribute (or prefix naming) improves the model's readability: without these identifiers, it would be difficult to tell one entity from another when the same attribute stores plain alphanumeric IDs.

To address this access pattern, a `GetItem` operation can be used with `PK=warehouseId` and `SK=warehouseId`.

**Base table:**

![\[DynamoDB table design with prefixes and EntityType to get warehouse data by its ID.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/OnlineShop-2-Step3.png)


**Step 4: Address access pattern 4 (`getProductInventoryByProductId`)**

Import [AnOnlineShop_5.json](https://github.com/aws-samples/amazon-dynamodb-design-patterns/blob/master/examples/an-online-shop/json/AnOnlineShop_5.json) to address access pattern 4 (`getProductInventoryByProductId`). The `warehouseItem` entity keeps track of the number of products in each warehouse; this item would normally be updated when a product is added to or removed from a warehouse. As seen in the ERD, there is a many-to-many relationship between `product` and `warehouse`. Here, the one-to-many relationship from `product` to `warehouse` is modeled as `warehouseItem`. Later on, the one-to-many relationship from `warehouse` to `product` will be modeled as well.

Access pattern 4 can be addressed with a query on `PK=productId` and `SK begins_with "w#"`.

For more information about `begins_with()` and other expressions that can be applied to sort keys, see [Key Condition Expressions](Query.KeyConditionExpressions.md).
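A minimal sketch of that query's request parameters follows. The table name and `productId` value are illustrative, and the request is only constructed, not sent:

```python
# Sketch: Query for access pattern 4 -- all warehouseItem rows in a
# product's item collection share the "w#" sort key prefix.
def product_inventory_query(product_id):
    return {
        "TableName": "OnlineShop",
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"p#{product_id}"},
            ":prefix": {"S": "w#"},
        },
    }

request = product_inventory_query("501")
```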

**Base table:**

![\[Table design to query ProductID and warehouseId for tracking product inventory in a given warehouse.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/OnlineShop-3-Step4.png)


**Step 5: Address access patterns 5 (`getOrderDetailsByOrderId`) and 6 (`getProductByOrderId`)**

Add some more `customer`, `product`, and `warehouse` items to the table by importing [AnOnlineShop_6.json](https://github.com/aws-samples/amazon-dynamodb-design-patterns/blob/master/examples/an-online-shop/json/AnOnlineShop_6.json). Then, import [AnOnlineShop_7.json](https://github.com/aws-samples/amazon-dynamodb-design-patterns/blob/master/examples/an-online-shop/json/AnOnlineShop_7.json) to build an item collection for `order` that can address access patterns 5 (`getOrderDetailsByOrderId`) and 6 (`getProductByOrderId`). You can see the one-to-many relationship between `order` and `product` modeled as `orderItem` entities.

To address access pattern 5 (`getOrderDetailsByOrderId`), query the table with `PK=orderId`. This will provide all information about the order including `customerId` and ordered products.

**Base table:**

![\[Table design to query using orderId for getting information about all ordered products.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/OnlineShop-4-Step5.png)


To address access pattern 6 (`getProductByOrderId`), we need to read only the products in an `order`. Query the table with `PK=orderId` and `SK begins_with "p#"` to accomplish this.

**Base table:**

![\[Table design to query using orderId and productId for getting products in an order.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/OnlineShop-5-Step5.png)


**Step 6: Address access pattern 7 (`getInvoiceByOrderId`)**

Import [AnOnlineShop_8.json](https://github.com/aws-samples/amazon-dynamodb-design-patterns/blob/master/examples/an-online-shop/json/AnOnlineShop_8.json) to add an `invoice` entity to the *order* item collection to handle access pattern 7 (`getInvoiceByOrderId`). To address this access pattern, you can use a query operation with `PK=orderId` and `SK begins_with "i#"`.

**Base table:**

![\[Table design with invoice entity in the order item collection to get an invoice by orderId.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/OnlineShop-6-Step6.png)


**Step 7: Address access pattern 8 (`getShipmentByOrderId`)**

Import [AnOnlineShop_9.json](https://github.com/aws-samples/amazon-dynamodb-design-patterns/blob/master/examples/an-online-shop/json/AnOnlineShop_9.json) to add `shipment` entities to the *order* item collection to address access pattern 8 (`getShipmentByOrderId`). We are extending the same vertically partitioned model by adding more types of entities to the single-table design. Notice how the *order* item collection contains the different relationships that an `order` entity has with the `shipment`, `orderItem`, and `invoice` entities.

To get shipments by `orderId`, you can perform a query operation with `PK=orderId` and `SK begins_with "sh#"`.

**Base table:**

![\[Table design with shipment entity added to the order item collection to get shipments by order ID.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/OnlineShop-7-Step7.png)


**Step 8: Address access pattern 9 (`getOrderByProductIdForDateRange`)**

We created an *order* item collection in the previous step. This access pattern introduces new lookup dimensions (`productId` and `Date`); without an index, you would have to scan the whole table and filter for the relevant items. To address it, we'll need to create a [global secondary index (GSI)](GSI.md). Import [AnOnlineShop_10.json](https://github.com/aws-samples/amazon-dynamodb-design-patterns/blob/master/examples/an-online-shop/json/AnOnlineShop_10.json) to create a new item collection using the GSI that makes it possible to retrieve `orderItem` data from several *order* item collections. The data now has `GSI1-PK` and `GSI1-SK`, which will be `GSI1`'s partition key and sort key, respectively.

DynamoDB automatically populates items which contain a GSI’s key attributes from the table to the GSI. There is no need to manually do any additional inserts into the GSI. 

To address access pattern 9, perform a query on `GSI1` with `GSI1-PK=productId` and `GSI1-SK between (date1, date2)`.
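Because `GSI1-PK` and `GSI1-SK` contain a hyphen, they must be aliased through `ExpressionAttributeNames` in the request. A sketch of the request shape (table name and values are illustrative; nothing is sent):

```python
# Sketch: Query on GSI1 for access pattern 9 (orders for a product
# within a date range). Hyphenated attribute names need aliases.
def orders_by_product_for_range(product_id, date1, date2):
    return {
        "TableName": "OnlineShop",
        "IndexName": "GSI1",
        "KeyConditionExpression": "#pk = :pid AND #sk BETWEEN :d1 AND :d2",
        "ExpressionAttributeNames": {"#pk": "GSI1-PK", "#sk": "GSI1-SK"},
        "ExpressionAttributeValues": {
            ":pid": {"S": f"p#{product_id}"},
            ":d1": {"S": date1},
            ":d2": {"S": date2},
        },
    }

request = orders_by_product_for_range("501", "2020-04-01", "2020-04-30")
```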

**Base table:**

![\[Table design with a GSI to get order data from several order item collections.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/OnlineShop-8-Step8-Base.png)


**GSI1:**

![\[GSI design with ProductID and Date as partition and sort keys to get orders by product ID and date.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/OnlineShop-9-Step8-GSI.png)


**Step 9: Address access patterns 10 (`getInvoiceByInvoiceId`) and 11 (`getPaymentByInvoiceId`)**

Import [AnOnlineShop_11.json](https://github.com/aws-samples/amazon-dynamodb-design-patterns/blob/master/examples/an-online-shop/json/AnOnlineShop_11.json) to address access patterns 10 (`getInvoiceByInvoiceId`) and 11 (`getPaymentByInvoiceId`), both of which are related to `invoice`. Even though these are two different access patterns, they are realized using the same key condition. `Payments` is defined as an attribute with the map data type on the `invoice` entity.

**Note**  
`GSI1-PK` and `GSI1-SK` are overloaded to store information about different entities so that multiple access patterns can be served from the same GSI. For more information about GSI overloading, see [Overloading Global Secondary Indexes in DynamoDB](bp-gsi-overloading.md).

To address access patterns 10 and 11, query `GSI1` with `GSI1-PK=invoiceId` and `GSI1-SK=invoiceId`.

**GSI1:**

![\[GSI design with invoiceId as both partition and sort key to get invoice and payment by invoice ID.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/OnlineShop-10-Step9.png)


**Step 10: Address access patterns 12 (`getShipmentDetailsByShipmentId`) and 13 (`getShipmentByWarehouseId`)**

Import [AnOnlineShop_12.json](https://github.com/aws-samples/amazon-dynamodb-design-patterns/blob/master/examples/an-online-shop/json/AnOnlineShop_12.json) to address access patterns 12 (`getShipmentDetailsByShipmentId`) and 13 (`getShipmentByWarehouseId`).

Notice that `shipmentItem` entities are added to the *order* item collection on the base table in order to be able to retrieve all details about an order in a single query operation.

**Base table:**

![\[Table design with shipmentItem entity in the order item collection to get all order details.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/OnlineShop-11-Step10.png)


The `GSI1` partition and sort keys have already been used to model a one-to-many relationship between `shipment` and `shipmentItem`. To address access pattern 12 (`getShipmentDetailsByShipmentId`), query `GSI1` with `GSI1-PK=shipmentId` and `GSI1-SK=shipmentId`.

**GSI1:**

![\[GSI1 design with shipmentId as partition and sort key to get shipment details by shipment ID.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/OnlineShop-12-Step10-GSI.png)


We'll need to create another GSI (`GSI2`) to model the new one-to-many relationship between `warehouse` and `shipment` for access pattern 13 (`getShipmentByWarehouseId`). To address this access pattern, query `GSI2` with `GSI2-PK=warehouseId` and `GSI2-SK begins_with "sh#"`.

**GSI2:**

![\[GSI2 design with warehouseId and shipmentId as partition and sort keys to get shipments by warehouse.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/OnlineShop-13-Step10-GSI2.png)


**Step 11: Address access patterns 14 (`getProductInventoryByWarehouseId`), 15 (`getInvoiceByCustomerIdForDateRange`), and 16 (`getProductsByCustomerIdForDateRange`)**

Import [AnOnlineShop_13.json](https://github.com/aws-samples/amazon-dynamodb-design-patterns/blob/master/examples/an-online-shop/json/AnOnlineShop_13.json) to add data related to the next set of access patterns. To address access pattern 14 (`getProductInventoryByWarehouseId`), query `GSI2` with `GSI2-PK=warehouseId` and `GSI2-SK begins_with "p#"`.

**GSI2:**

![\[GSI2 design with warehouseId and productId as partition and sort keys to address access pattern 14.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/OnlineShop-14-Step11-GSI2.png)


To address access pattern 15 (`getInvoiceByCustomerIdForDateRange`), query `GSI2` with `GSI2-PK=customerId` and `GSI2-SK between (i#date1, i#date2)`.

**GSI2:**

![\[GSI2 design with customerId and invoice date range as partition and sort keys to address access pattern 15.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/OnlineShop-15-Step11-GSI2.png)


To address access pattern 16 (`getProductsByCustomerIdForDateRange`), query `GSI2` with `GSI2-PK=customerId` and `GSI2-SK between (p#date1, p#date2)`.
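Patterns 15 and 16 differ only in the sort key prefix, so one hypothetical helper can serve both. Prefixing the `BETWEEN` bounds with `i#` or `p#` keeps the range inside the invoice or product portion of the customer's GSI2 item collection (table name and values are illustrative; nothing is sent):

```python
# Sketch: prefixed date-range Query on GSI2 for access patterns 15 and 16.
def customer_range_query(customer_id, prefix, date1, date2):
    return {
        "TableName": "OnlineShop",
        "IndexName": "GSI2",
        "KeyConditionExpression": "#pk = :cust AND #sk BETWEEN :lo AND :hi",
        "ExpressionAttributeNames": {"#pk": "GSI2-PK", "#sk": "GSI2-SK"},
        "ExpressionAttributeValues": {
            ":cust": {"S": f"c#{customer_id}"},
            ":lo": {"S": f"{prefix}{date1}"},
            ":hi": {"S": f"{prefix}{date2}"},
        },
    }

invoices = customer_range_query("12345", "i#", "2020-04-01", "2020-04-30")  # pattern 15
products = customer_range_query("12345", "p#", "2020-04-01", "2020-04-30")  # pattern 16
```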

**GSI2:**

![\[GSI2 design with customerId and product date range as partition and sort keys to address access pattern 16\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/OnlineShop-16-Step11-GSI2.png)


**Note**  
In [NoSQL Workbench](workbench.md), *facets* represent an application's different data access patterns for DynamoDB. Facets give you a way to view a subset of the data in a table without having to see records that don't meet the constraints of the facet. Facets are purely a visual data modeling aid; they don't exist as a usable construct in DynamoDB itself.  
Import [AnOnlineShop_facets.json](https://github.com/aws-samples/amazon-dynamodb-design-patterns/blob/master/examples/an-online-shop/json/AnOnlineShop_facets.json) to see the facets for this use case.

All access patterns and how the schema design addresses them are summarized in the table below:


| Access pattern | Base table/GSI/LSI | Operation | Partition key value | Sort key value | 
| --- | --- | --- | --- | --- | 
| getCustomerByCustomerId | Base table | GetItem |  PK=customerId | SK=customerId | 
| getProductByProductId | Base table | GetItem |  PK=productId | SK=productId | 
| getWarehouseByWarehouseId | Base table | GetItem |  PK=warehouseId | SK=warehouseId | 
| getProductInventoryByProductId | Base table | Query | PK=productId | SK begins_with "w#" | 
| getOrderDetailsByOrderId | Base table | Query | PK=orderId |  | 
| getProductByOrderId | Base table | Query | PK=orderId | SK begins_with "p#" | 
| getInvoiceByOrderId | Base table | Query | PK=orderId | SK begins_with "i#" | 
| getShipmentByOrderId | Base table | Query | PK=orderId | SK begins_with "sh#" | 
| getOrderByProductIdForDateRange | GSI1 | Query | PK=productId | SK between date1 and date2 | 
| getInvoiceByInvoiceId | GSI1 | Query | PK=invoiceId | SK=invoiceId | 
| getPaymentByInvoiceId | GSI1 | Query | PK=invoiceId | SK=invoiceId | 
| getShipmentDetailsByShipmentId | GSI1 | Query | PK=shipmentId | SK=shipmentId | 
| getShipmentByWarehouseId | GSI2 | Query | PK=warehouseId | SK begins_with "sh#" | 
| getProductInventoryByWarehouseId | GSI2 | Query | PK=warehouseId | SK begins_with "p#" | 
| getInvoiceByCustomerIdForDateRange | GSI2 | Query | PK=customerId | SK between i#date1 and i#date2 | 
| getProductsByCustomerIdForDateRange | GSI2 | Query | PK=customerId | SK between p#date1 and p#date2 | 

### Online shop final schema
Final schema

Here are the final schema designs. To download this schema design as a JSON file, see [DynamoDB Design Patterns](https://github.com/aws-samples/aws-dynamodb-examples/tree/master/schema_design/SchemaExamples) on GitHub.

**Base table**

![\[Final schema of base table for an online shop with attributes, such as EntityName and Name.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/OnlineShop-17-Final-BaseTable.png)


**GSI1**

![\[Final GSI1 schema for an online shop's base table with attributes, such as EntityType.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/OnlineShop-18-Final-GSI1.png)


**GSI2**

![\[Final GSI2 schema for an online shop's base table with attributes, such as EntityType.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/DataModeling/OnlineShop-19-Final-GSI2.png)


## Using NoSQL Workbench with this schema design
NoSQL Workbench

You can import this final schema into [NoSQL Workbench](workbench.md), a visual tool that provides data modeling, data visualization, and query development features for DynamoDB, to further explore and edit your new project. Follow these steps to get started:

1. Download NoSQL Workbench. For more information, see [Download NoSQL Workbench for DynamoDB](workbench.settingup.md).

1. Download the JSON schema file listed above, which is already in the NoSQL Workbench model format.

1. Import the JSON schema file into NoSQL Workbench. For more information, see [Importing an existing data model](workbench.Modeler.ImportExisting.md). 

1. Once you've imported into NoSQL Workbench, you can edit the data model. For more information, see [Editing an existing data model](workbench.Modeler.Edit.md).

# Best practices for modeling relational data in DynamoDB
Relational modeling

This section provides best practices for modeling relational data in Amazon DynamoDB. First, we introduce traditional data modeling concepts. Then, we describe the advantages of using DynamoDB over traditional relational database management systems—how it eliminates the need for JOIN operations and reduces overhead. 

We then explain how to design a DynamoDB table that scales efficiently. Finally, we provide an example of how to model relational data in DynamoDB.

**Topics**
+ [Traditional relational database models](#SQLtoNoSQL.relational-modeling2)
+ [How DynamoDB eliminates the need for JOIN operations](#bp-relational-modeling-joins)
+ [How DynamoDB transactions eliminate overhead to the write process](#bp-relational-modeling-transactions)
+ [First steps for modeling relational data in DynamoDB](bp-modeling-nosql.md)
+ [Example of modeling relational data in DynamoDB](bp-modeling-nosql-B.md)

## Traditional relational database models
Traditional relational database models

A traditional relational database management system (RDBMS) stores data in a normalized relational structure. The objective of the relational data model is to reduce the duplication of data (through normalization) to support referential integrity and reduce data anomalies. 

The following schema is an example of a relational data model for a generic order-entry application. The application supports a human resources schema that backs the operational and business support systems of a theoretical manufacturer.

![\[Example RDBMS schema.\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/RDBMS.png)


As a non-relational database service, DynamoDB offers many advantages over traditional relational database management systems. 

## How DynamoDB eliminates the need for JOIN operations


An RDBMS uses Structured Query Language (SQL) to return data to the application. Because of the normalization of the data model, such queries typically require the `JOIN` operator to combine data from multiple tables.

For example, to generate a list of purchase order items sorted by the quantity in stock at all warehouses that can ship each item, you could issue the following SQL query against the preceding schema.

```
SELECT * FROM Orders
  INNER JOIN Order_Items ON Orders.Order_ID = Order_Items.Order_ID
  INNER JOIN Products ON Products.Product_ID = Order_Items.Product_ID
  INNER JOIN Inventories ON Products.Product_ID = Inventories.Product_ID
  ORDER BY Quantity_on_Hand DESC
```

SQL queries of this kind can provide a flexible API for accessing data, but they require a significant amount of processing. Each join in the query increases its runtime complexity, because the data from each table must be staged and then assembled to return the result set.

Additional factors that can impact how long it takes the queries to run are the size of the tables and whether the columns being joined have indexes. The preceding query initiates complex queries across several tables and then sorts the result set.

Eliminating the need for `JOINs` is at the heart of NoSQL data modeling. This is why we built DynamoDB to support Amazon.com, and why DynamoDB can deliver consistent performance at any scale. Given the runtime complexity of SQL queries and `JOINs`, RDBMS performance is not constant at scale. This causes performance issues as customer applications grow.

While normalizing data does reduce the amount of data stored to disk, often the most constrained resources that impact performance are CPU time and network latency. 

DynamoDB is built to minimize both constraints by eliminating `JOINs` (and encouraging denormalization of data) and optimizing the database architecture to fully answer an application query with a single request to an item. These qualities enable DynamoDB to provide single-digit millisecond performance at any scale. This is because the runtime complexity for DynamoDB operations is constant, regardless of data size, for common access patterns.

## How DynamoDB transactions eliminate overhead to the write process


Another factor that can slow down an RDBMS is the use of transactions to write to a normalized schema. As shown in the example, relational data structures used by most online transaction processing (OLTP) applications must be broken down and distributed across multiple logical tables when they are stored in an RDBMS. 

Therefore, an ACID-compliant transaction framework is necessary to avoid race conditions and data integrity issues that could occur if an application tries to read an object that is in the process of being written. Such a transaction framework, when coupled with a relational schema, can add significant overhead to the write process.

The implementation of transactions in DynamoDB avoids common scaling issues that are found with an RDBMS. DynamoDB does this by issuing a transaction as a single API call and bounding the number of items that can be accessed in that single transaction. Long-running transactions can cause operational issues by holding locks on the data either for a long time, or perpetually, because the transaction is never closed.

To prevent such issues in DynamoDB, transactions were implemented with two distinct API operations: `TransactWriteItems` and `TransactGetItems`. These API operations do not have begin and end semantics that are common in an RDBMS. Further, DynamoDB has a 100-item access limit within a transaction to similarly prevent long-running transactions. To learn more about DynamoDB transactions, see [Working with transactions](transactions.md).
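As a sketch of this single-call shape, a `TransactWriteItems` request bundles all of its actions into one payload with no begin/commit step. The table name and item contents below are illustrative, and the request is only constructed here, not sent:

```python
# Sketch: one TransactWriteItems request replaces RDBMS begin/commit
# semantics -- all actions succeed or fail together.
transaction = {
    "TransactItems": [
        {"Put": {
            "TableName": "OnlineShop",
            "Item": {"PK": {"S": "o#1001"}, "SK": {"S": "o#1001"}},
        }},
        {"Update": {
            "TableName": "OnlineShop",
            "Key": {"PK": {"S": "p#501"}, "SK": {"S": "w#9"}},
            "UpdateExpression": "SET Quantity = Quantity - :n",
            "ExpressionAttributeValues": {":n": {"N": "1"}},
        }},
    ]
}

assert len(transaction["TransactItems"]) <= 100  # per-transaction item limit
```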

For these reasons, when your business requires a low-latency response to high-traffic queries, taking advantage of a NoSQL system generally makes technical and economic sense. Amazon DynamoDB helps solve the problems that limit relational system scalability by avoiding them.

The performance of an RDBMS does not typically scale well for the following reasons:
+ It uses expensive joins to reassemble required views of query results.
+ It normalizes data and stores it on multiple tables that require multiple queries to write to disk.
+ It generally incurs the performance costs of an ACID-compliant transaction system.

DynamoDB scales well for these reasons:
+ Schema flexibility lets DynamoDB store complex hierarchical data within a single item.
+ Composite key design lets it store related items close together on the same table.
+ Transactions are performed in a single operation. The limit for the number of items that can be accessed is 100, to avoid long-running operations.

Queries against the data store become much simpler, often in the following form:

```
SELECT * FROM Table_X WHERE Attribute_Y = "somevalue"
```

DynamoDB does far less work to return the requested data compared to the RDBMS in the earlier example.

# First steps for modeling relational data in DynamoDB
First steps

**Note**  
NoSQL design requires a different mindset than RDBMS design. For an RDBMS, you can create a normalized data model without thinking about access patterns. You can then extend it later when new questions and query requirements arise. By contrast, in Amazon DynamoDB, you shouldn't start designing your schema until you know the questions that it needs to answer. Understanding the business problems and the application use cases up front is absolutely essential.

To start designing a DynamoDB table that will scale efficiently, you must first take several steps to identify the access patterns required by the operations and business support systems (OSS/BSS) that it needs to support:
+ For new applications, review user stories about activities and objectives. Document the various use cases you identify, and analyze the access patterns that they require.
+ For existing applications, analyze query logs to find out how people are currently using the system and what the key access patterns are.

After completing this process, you should end up with a list that might look something like the following.


**Access Patterns for Order Entry Application**  

| Pattern # | Access Pattern Description | 
| --- | --- | 
| 1 | Look up Employee Details by Employee ID | 
| 2 | Query Employee Details by Employee Name | 
| 3 | Find an Employee's Phone Number(s) | 
| 4 | Find a Customer's Phone Number(s) | 
| 5 | Get Orders for Customer within Date Range | 
| 6 | Show all Open Orders within Date Range | 
| 7 | See all Employees hired recently | 
| 8 | Find all Employees in Warehouse | 
| 9 | Get all Items on Order for Product | 
| 10 | Get Inventories for Product at all Warehouses | 
| 11 | Get Customers by Account Rep | 
| 12 | Get Orders by Account Rep | 
| 13 | Get Employees with Job Title | 
| 14 | Get Inventory by Product and Warehouse | 
| 15 | Get Total Product Inventory | 

In a real application, your list might be much longer. But this collection represents the range of query pattern complexity that you might find in a production environment.

A modern approach to DynamoDB schema design uses aggregate-oriented principles, grouping data based on access patterns rather than rigid entity boundaries. This approach considers multiple design patterns:
+ *Single Table Design* - Using composite sort keys, overloaded global secondary indexes, and adjacency list patterns to store multiple entity types in one table
+ *Multi-Table Design* - Using separate tables for entities with independent operational characteristics and low access correlation, with strategic GSIs for cross-entity queries
+ *Aggregate Design* - Embedding related data when always accessed together (Order + OrderItems) or using item collections for identifying relationships (Product + Inventory)

The choice between these approaches depends on your specific access patterns, data characteristics, and operational requirements. You can use these elements to structure the data so that an application can retrieve whatever it needs for a given access pattern using a single query on a table or index.

**Note**  
The choice between single-table and multi-table design depends on your specific requirements. Single-table design works well when entities have high access correlation and similar operational characteristics. Multi-table design is preferred when entities have independent operational requirements, different access patterns, or when you need clear operational boundaries. The example in this guide demonstrates a multi-table approach with strategic aggregation and denormalization.

To use NoSQL Workbench for DynamoDB to help visualize your partition key design, see [Building data models with NoSQL Workbench](workbench.Modeler.md).

# Example of modeling relational data in DynamoDB
Example

This example describes how to model relational data in Amazon DynamoDB. The DynamoDB table design corresponds to the relational order entry schema that is shown in [Relational modeling](bp-relational-modeling.md). This design uses multiple specialized tables rather than a single adjacency list, providing clear operational boundaries while leveraging strategic GSIs to serve all access patterns efficiently.

The design approach uses aggregate-oriented principles, grouping data based on access patterns rather than rigid entity boundaries. Key design decisions include using separate tables for entities with low access correlation, embedding related data when always accessed together, and using item collections for identifying relationships.

The following tables and their accompanying indexes support the relational order entry schema:

## Employee Table Design


The Employee table stores employee information as a single entity per item, optimized for direct employee lookups and supporting multiple query patterns through strategic GSIs. This table demonstrates the principle of designing separate tables for entities with independent operational characteristics and low cross-entity access correlation.

The table uses a simple partition key (employee_id) without a sort key, as each employee is a distinct entity. Four GSIs enable efficient querying by different attributes:
+ *EmployeeByName GSI* - Uses INCLUDE projection with all employee attributes to support complete employee detail retrieval by name, handling potential duplicate names with employee_id as sort key
+ *EmployeeByWarehouse GSI* - Uses INCLUDE projection with only essential attributes (name, job_title, hire_date) to minimize storage costs while supporting warehouse-based queries
+ *EmployeeByJobTitle GSI* - Enables role-based queries with INCLUDE projection for reporting and organizational analysis
+ *EmployeeByHireDate GSI* - Uses a static partition key value "EMPLOYEE" with hire_date as sort key to enable efficient date range queries for recent hires. Because employee additions and updates typically stay well under 1,000 WCU, a single partition can handle the write load without hot partition issues
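The static-partition-key pattern behind EmployeeByHireDate can be sketched as a small helper. This is an illustrative sketch, not code from the guide: it builds the keyword arguments a boto3 `Table.query` call would accept, assuming the attribute names (`entity_type`, `hire_date`) and index name shown in the tables that follow.

```python
def recent_hires_query(since_date, index_name="EmployeeByHireDate"):
    """Build Query kwargs for a static-partition-key GSI.

    Every employee item carries entity_type="EMPLOYee".upper() == "EMPLOYEE",
    so all items share one GSI partition, and a single range condition on
    hire_date returns recent hires already sorted by date.
    """
    return {
        "IndexName": index_name,
        "KeyConditionExpression": "entity_type = :t AND hire_date >= :d",
        "ExpressionAttributeValues": {":t": "EMPLOYEE", ":d": since_date},
        "ScanIndexForward": False,  # newest hires first
    }

params = recent_hires_query("2025-01-01")
```

With boto3 this would be invoked as `table.query(**params)`; the same shape serves access pattern 7 ("See all Employees hired recently").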


**Employee Table - Base Table Structure**  

| employee_id (PK) | name | phone_numbers | warehouse_id | job_title | hire_date | entity_type | 
| --- | --- | --- | --- | --- | --- | --- | 
| emp_001 | John Smith | ["+1-555-0101"] | wh_sea | Manager | 2024-03-15 | EMPLOYEE | 
| emp_002 | Jane Doe | ["+1-555-0102", "+1-555-0103"] | wh_sea | Associate | 2025-01-10 | EMPLOYEE | 
| emp_003 | Bob Wilson | ["+1-555-0104"] | wh_pdx | Associate | 2025-06-20 | EMPLOYEE | 
| emp_004 | Alice Brown | ["+1-555-0105"] | wh_pdx | Supervisor | 2023-11-05 | EMPLOYEE | 
| emp_005 | Charlie Davis | ["+1-555-0106"] | wh_sea | Associate | 2025-12-01 | EMPLOYEE | 


**EmployeeByName GSI - Supporting Employee Name Queries**  

| name (GSI-PK) | employee_id (GSI-SK) | phone_numbers | warehouse_id | job_title | hire_date | 
| --- | --- | --- | --- | --- | --- | 
| Alice Brown | emp_004 | ["+1-555-0105"] | wh_pdx | Supervisor | 2023-11-05 | 
| Bob Wilson | emp_003 | ["+1-555-0104"] | wh_pdx | Associate | 2025-06-20 | 
| Charlie Davis | emp_005 | ["+1-555-0106"] | wh_sea | Associate | 2025-12-01 | 
| Jane Doe | emp_002 | ["+1-555-0102", "+1-555-0103"] | wh_sea | Associate | 2025-01-10 | 
| John Smith | emp_001 | ["+1-555-0101"] | wh_sea | Manager | 2024-03-15 | 


**EmployeeByWarehouse GSI - Supporting Warehouse Queries**  

| warehouse_id (GSI-PK) | employee_id (GSI-SK) | name | job_title | hire_date | 
| --- | --- | --- | --- | --- | 
| wh_pdx | emp_003 | Bob Wilson | Associate | 2025-06-20 | 
| wh_pdx | emp_004 | Alice Brown | Supervisor | 2023-11-05 | 
| wh_sea | emp_001 | John Smith | Manager | 2024-03-15 | 
| wh_sea | emp_002 | Jane Doe | Associate | 2025-01-10 | 
| wh_sea | emp_005 | Charlie Davis | Associate | 2025-12-01 | 


**EmployeeByJobTitle GSI - Supporting Job Title Queries**  

| job_title (GSI-PK) | employee_id (GSI-SK) | name | warehouse_id | hire_date | 
| --- | --- | --- | --- | --- | 
| Associate | emp_002 | Jane Doe | wh_sea | 2025-01-10 | 
| Associate | emp_003 | Bob Wilson | wh_pdx | 2025-06-20 | 
| Associate | emp_005 | Charlie Davis | wh_sea | 2025-12-01 | 
| Manager | emp_001 | John Smith | wh_sea | 2024-03-15 | 
| Supervisor | emp_004 | Alice Brown | wh_pdx | 2023-11-05 | 


**EmployeeByHireDate GSI - Supporting Recent Hire Queries**  

| entity_type (GSI-PK) | hire_date (GSI-SK) | employee_id | name | warehouse_id | 
| --- | --- | --- | --- | --- | 
| EMPLOYEE | 2023-11-05 | emp_004 | Alice Brown | wh_pdx | 
| EMPLOYEE | 2024-03-15 | emp_001 | John Smith | wh_sea | 
| EMPLOYEE | 2025-01-10 | emp_002 | Jane Doe | wh_sea | 
| EMPLOYEE | 2025-06-20 | emp_003 | Bob Wilson | wh_pdx | 
| EMPLOYEE | 2025-12-01 | emp_005 | Charlie Davis | wh_sea | 

## Customer Table Design


The Customer table maintains customer information with strategic denormalization of account_rep_id to enable efficient account representative queries. This design choice trades slight storage overhead for query performance, eliminating the need for joins between customer and account representative data.

The table supports multiple phone numbers per customer using a list attribute, demonstrating DynamoDB's schema flexibility. The single GSI enables account representative workflows:
+ *CustomerByAccountRep GSI* - Uses INCLUDE projection with name and email attributes to support account rep customer management without requiring full customer record retrieval


**Customer Table - Base Table Structure**  

| customer_id (PK) | name | phone_numbers | email | account_rep_id | 
| --- | --- | --- | --- | --- | 
| cust_001 | Acme Corp | ["+1-555-1001"] | contact@acme.com | rep_001 | 
| cust_002 | TechStart Inc | ["+1-555-1002", "+1-555-1003"] | info@techstart.com | rep_001 | 
| cust_003 | Global Traders | ["+1-555-1004"] | sales@globaltraders.com | rep_002 | 
| cust_004 | BuildRight LLC | ["+1-555-1005"] | orders@buildright.com | rep_002 | 
| cust_005 | FastShip Co | ["+1-555-1006"] | support@fastship.com | rep_003 | 


**CustomerByAccountRep GSI - Supporting Account Rep Queries**  

| account_rep_id (GSI-PK) | customer_id (GSI-SK) | name | email | 
| --- | --- | --- | --- | 
| rep_001 | cust_001 | Acme Corp | contact@acme.com | 
| rep_001 | cust_002 | TechStart Inc | info@techstart.com | 
| rep_002 | cust_003 | Global Traders | sales@globaltraders.com | 
| rep_002 | cust_004 | BuildRight LLC | orders@buildright.com | 
| rep_003 | cust_005 | FastShip Co | support@fastship.com | 
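To make the effect of the INCLUDE projection concrete, the following sketch simulates a Query against this GSI with plain Python data (the items mirror the rows above; no real DynamoDB call is made). Only the projected attributes come back, so retrieving a customer's phone_numbers would require a follow-up GetItem on the base table.

```python
# Simulated CustomerByAccountRep GSI contents. With an INCLUDE
# projection, only name and email are copied into the index
# alongside the key attributes.
GSI_ITEMS = [
    {"account_rep_id": "rep_001", "customer_id": "cust_001",
     "name": "Acme Corp", "email": "contact@acme.com"},
    {"account_rep_id": "rep_001", "customer_id": "cust_002",
     "name": "TechStart Inc", "email": "info@techstart.com"},
    {"account_rep_id": "rep_002", "customer_id": "cust_003",
     "name": "Global Traders", "email": "sales@globaltraders.com"},
]

def customers_for_rep(rep_id):
    """Mimic Query(account_rep_id=rep_id): equality on the GSI partition key."""
    return [item for item in GSI_ITEMS if item["account_rep_id"] == rep_id]

result = customers_for_rep("rep_001")
```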

## Order Table Design


The Order table uses vertical partitioning with separate items for order headers and order items. This design enables efficient product-based queries while maintaining all order components within the same partition for efficient access. Each order consists of multiple items:
+ *Order Header* - Contains order metadata with PK=order_id, SK=order_id
+ *Order Items* - Individual line items with PK=order_id, SK=product_id, enabling direct product queries

**Note**  
This vertical partitioning approach trades the simplicity of embedded order items for enhanced query flexibility. Each order item becomes a separate DynamoDB item, enabling efficient product-based queries while maintaining all order data within the same partition for efficient retrieval in a single request.

The table includes strategic denormalization of account_rep_id (duplicated from the Customer table) to enable direct account representative queries without requiring customer lookups. For high-throughput write scenarios, OPEN orders include status and shard attributes to enable write sharding across multiple partitions.

Four GSIs support different query patterns with optimized projections:
+ *OrderByCustomerDate GSI* - Uses INCLUDE projection with order summary and item details to support customer order history with date range filtering
+ *OpenOrdersByDate GSI (Sparse, Sharded)* - Uses a multi-attribute partition key (status + shard) with 5 shards to distribute 5,000 WPS (writes per second) across partitions (1,000 WPS each, matching DynamoDB's 1,000 WCU per-partition limit). Only indexes OPEN orders (20% of total), which can help reduce GSI storage costs. Requires parallel queries across all 5 shards with client-side result merging
+ *OrderByAccountRep GSI* - Uses INCLUDE projection with order summary attributes to support account representative workflows without full order details
+ *ProductInOrders GSI* - Created from OrderItem records (PK=order_id, SK=product_id), this GSI enables queries to find all orders containing a specific product. Uses INCLUDE projection with order context (customer_id, order_date, quantity) for product demand analysis
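The write-sharding and scatter-gather read behind the sharded sparse GSI can be sketched as follows. These are illustrative helpers, not part of the guide: a shard is chosen at write time, and at read time the client queries every shard in parallel and merges the per-shard, date-sorted result pages.

```python
import heapq
import random

NUM_SHARDS = 5  # matches the 5-shard design described above

def assign_shard(num_shards=NUM_SHARDS):
    """Pick a shard at random when writing an OPEN order, spreading
    writes evenly so no single GSI partition exceeds ~1,000 WPS."""
    return random.randrange(num_shards)

def merge_shard_results(pages):
    """Merge per-shard Query results into one date-ordered list.
    Each page is already sorted because order_date is the sort key."""
    return list(heapq.merge(*pages, key=lambda item: item["order_date"]))

# Example: results from querying shard 0 and shard 2 in parallel.
shard0 = [{"order_id": "ord_002", "order_date": "2025-12-20"}]
shard2 = [{"order_id": "ord_003", "order_date": "2026-01-05"}]
merged = merge_shard_results([shard0, shard2])
```

In practice the per-shard queries would be issued concurrently (for example with a thread pool), one Query per shard value 0-4.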


**Order Table - Base Table Structure (Vertical Partitioning)**  

| PK | SK | customer_id | order_date | status | account_rep_id | quantity | price | shard | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| ord_001 | ord_001 | cust_001 | 2025-11-15 | CLOSED | rep_001 |  |  |  | 
| ord_001 | prod_100 |  |  |  |  | 5 | 25.00 |  | 
| ord_002 | ord_002 | cust_001 | 2025-12-20 | OPEN | rep_001 |  |  | 0 | 
| ord_002 | prod_101 |  |  |  |  | 10 | 15.00 |  | 
| ord_003 | ord_003 | cust_002 | 2026-01-05 | OPEN | rep_001 |  |  | 2 | 
| ord_003 | prod_100 |  |  |  |  | 3 | 25.00 |  | 
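The vertical-partitioning layout above can be sketched with a hypothetical helper that splits one logical order into its header item and line items (the field names follow the base-table columns; this is illustrative, not a prescribed API):

```python
def order_to_items(order):
    """Split a logical order into DynamoDB items: one header item
    (PK == SK == order_id) plus one item per line (SK == product_id).
    All items share the partition key, so Query(PK=order_id) fetches
    the whole order in a single request."""
    header = {
        "PK": order["order_id"],
        "SK": order["order_id"],
        "customer_id": order["customer_id"],
        "order_date": order["order_date"],
        "status": order["status"],
        "account_rep_id": order["account_rep_id"],
    }
    lines = [
        {"PK": order["order_id"], "SK": line["product_id"],
         "quantity": line["quantity"], "price": line["price"]}
        for line in order["lines"]
    ]
    return [header] + lines

items = order_to_items({
    "order_id": "ord_001", "customer_id": "cust_001",
    "order_date": "2025-11-15", "status": "CLOSED",
    "account_rep_id": "rep_001",
    "lines": [{"product_id": "prod_100", "quantity": 5, "price": 25.00}],
})
```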


**OrderByCustomerDate GSI - Supporting Customer Order Queries**  

| customer_id (GSI-PK) | order_date (GSI-SK) | order_id | status | total_amount | order_items | shard | 
| --- | --- | --- | --- | --- | --- | --- | 
| cust_001 | 2025-11-15 | ord_001 | CLOSED | 225.00 | [{product_id: "prod_100", qty: 5}] |  | 
| cust_001 | 2025-12-20 | ord_002 | OPEN | 150.00 | [{product_id: "prod_101", qty: 10}] | 0 | 
| cust_002 | 2026-01-05 | ord_003 | OPEN | 175.00 | [{product_id: "prod_100", qty: 3}] | 2 | 
| cust_003 | 2025-10-10 | ord_004 | CLOSED | 250.00 | [{product_id: "prod_101", qty: 5}] |  | 
| cust_004 | 2026-01-03 | ord_005 | OPEN | 200.00 | [{product_id: "prod_100", qty: 20}] | 1 | 


**OpenOrdersByDate GSI (Sparse, Sharded) - Supporting High-Throughput Open Order Queries**  

| status (GSI-PK-1) | shard (GSI-PK-2) | order_date (SK) | order_id | customer_id | account_rep_id | order_items | total_amount | 
| --- | --- | --- | --- | --- | --- | --- | --- | 
| OPEN | 0 | 2025-12-20 | ord_002 | cust_001 | rep_001 | [{product_id: "prod_101", qty: 10}] | 150.00 | 
| OPEN | 1 | 2026-01-03 | ord_005 | cust_004 | rep_002 | [{product_id: "prod_100", qty: 20}] | 200.00 | 
| OPEN | 2 | 2026-01-05 | ord_003 | cust_002 | rep_001 | [{product_id: "prod_100", qty: 3}] | 175.00 | 


**OrderByAccountRep GSI - Supporting Account Rep Order Queries**  

| account_rep_id (GSI-PK) | order_date (GSI-SK) | order_id | customer_id | status | total_amount | 
| --- | --- | --- | --- | --- | --- | 
| rep_001 | 2025-11-15 | ord_001 | cust_001 | CLOSED | 225.00 | 
| rep_001 | 2025-12-20 | ord_002 | cust_001 | OPEN | 150.00 | 
| rep_001 | 2026-01-05 | ord_003 | cust_002 | OPEN | 175.00 | 
| rep_002 | 2025-10-10 | ord_004 | cust_003 | CLOSED | 250.00 | 
| rep_002 | 2026-01-03 | ord_005 | cust_004 | OPEN | 200.00 | 


**ProductInOrders GSI - Supporting Product Order Queries**  

| product_id (GSI-PK) | order_id (GSI-SK) | customer_id | order_date | quantity | 
| --- | --- | --- | --- | --- | 
| prod_100 | ord_001 | cust_001 | 2025-11-15 | 5 | 
| prod_100 | ord_003 | cust_002 | 2026-01-05 | 3 | 
| prod_101 | ord_002 | cust_001 | 2025-12-20 | 10 | 

## Product Table Design


The Product table uses the item collection pattern to store both product metadata and inventory data within the same partition. This design leverages the identifying relationship between products and inventory - inventory cannot exist without a parent product. Using PK=product_id with SK=product_id for product metadata and SK=warehouse_id for inventory items eliminates the need for a separate Inventory table and GSI, reducing costs by approximately 50%.

This pattern enables efficient queries for both individual warehouse inventory (GetItem with the composite key) and all warehouse inventory for a product (Query on the partition key). The total_inventory attribute in the product metadata item provides a denormalized aggregate for quick total inventory lookups.


**Product Table - Base Table Structure (Item Collection Pattern)**  

| product_id (PK) | warehouse_id (SK) | product_name | category | unit_price | inventory_quantity | total_inventory | 
| --- | --- | --- | --- | --- | --- | --- | 
| prod_100 | prod_100 | Widget A | Hardware | 25.00 |  | 500 | 
| prod_100 | wh_sea |  |  |  | 200 |  | 
| prod_100 | wh_pdx |  |  |  | 150 |  | 
| prod_100 | wh_atl |  |  |  | 150 |  | 
| prod_101 | prod_101 | Gadget B | Electronics | 50.00 |  | 300 | 
| prod_101 | wh_sea |  |  |  | 100 |  | 
| prod_101 | wh_pdx |  |  |  | 200 |  | 
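The item collection layout can be sketched with a small helper that separates the metadata item (the one whose SK equals its PK) from the per-warehouse inventory items, and recomputes the denormalized total. This is illustrative only, with data mirroring the Widget A rows above.

```python
# Items from one product's item collection, as returned by
# Query(PK="prod_100") against the Product table.
COLLECTION = [
    {"PK": "prod_100", "SK": "prod_100", "product_name": "Widget A",
     "total_inventory": 500},
    {"PK": "prod_100", "SK": "wh_sea", "inventory_quantity": 200},
    {"PK": "prod_100", "SK": "wh_pdx", "inventory_quantity": 150},
    {"PK": "prod_100", "SK": "wh_atl", "inventory_quantity": 150},
]

def split_collection(items):
    """The metadata item is the one whose SK equals its PK; every other
    item in the collection is a per-warehouse inventory record."""
    meta = next(it for it in items if it["SK"] == it["PK"])
    inventory = [it for it in items if it["SK"] != it["PK"]]
    return meta, inventory

meta, inventory = split_collection(COLLECTION)
recomputed_total = sum(it["inventory_quantity"] for it in inventory)
```

Keeping total_inventory consistent with the per-warehouse items is the application's job, for example by updating both in one transaction whenever inventory changes.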

Each table is designed with specific Global Secondary Indexes (GSIs) to support the required access patterns efficiently. The design uses aggregate-oriented principles with strategic denormalization and sparse indexing to optimize both performance and cost.

Key design optimizations include:
+ *Sparse GSI* - OpenOrdersByDate only indexes OPEN orders (20% of total), which can help reduce GSI storage costs
+ *Item Collection Pattern* - Product table stores inventory using PK=product_id, SK=warehouse_id to eliminate a separate inventory table
+ *Order + OrderItems Aggregation* - Order item summaries are embedded in GSI projections due to 100% access correlation, while the base table keeps line items as separate items to support product queries
+ *Strategic Denormalization* - account_rep_id duplicated in the Order table for efficient queries

Finally, you can revisit the access patterns that were defined earlier. The following table shows how each access pattern is efficiently supported using the multi-table design with strategic GSIs. Each pattern uses either direct key lookups or single GSI queries, avoiding expensive scans and providing consistent performance at any scale.


| S. No. | Access patterns | Query conditions | 
| --- | --- | --- | 
|  1  |  Look up Employee Details by Employee ID  |  Employee Table: GetItem(employee_id="emp_001")  | 
|  2  |  Query Employee Details by Employee Name  |  EmployeeByName GSI: Query(name="John Smith")  | 
|  3  |  Find an Employee's Phone Number(s)  |  Employee Table: GetItem(employee_id="emp_001")  | 
|  4  |  Find a Customer's Phone Number(s)  |  Customer Table: GetItem(customer_id="cust_001")  | 
|  5  |  Get Orders for Customer within Date Range  |  OrderByCustomerDate GSI: Query(customer_id="cust_001", order_date BETWEEN "2025-01-01" AND "2025-12-31")  | 
|  6  |  Show all Open Orders within Date Range  |  OpenOrdersByDate GSI: Query 5 shards in parallel with the multi-attribute PK (status="OPEN" + shard=0-4), SK=order_date BETWEEN "2025-01-01" AND "2025-12-31", merge results  | 
|  7  |  See all Employees hired recently  |  EmployeeByHireDate GSI: Query(entity_type="EMPLOYEE", hire_date >= "2025-01-01")  | 
|  8  |  Find all Employees in Warehouse  |  EmployeeByWarehouse GSI: Query(warehouse_id="wh_sea")  | 
|  9  |  Get all Items on Order for Product  |  ProductInOrders GSI: Query(product_id="prod_100")  | 
|  10  |  Get Inventories for Product at all Warehouses  |  Product Table: Query(product_id="prod_100")  | 
|  11  |  Get Customers by Account Rep  |  CustomerByAccountRep GSI: Query(account_rep_id="rep_001")  | 
|  12  |  Get Orders by Account Rep  |  OrderByAccountRep GSI: Query(account_rep_id="rep_001")  | 
|  13  |  Get Employees with Job Title  |  EmployeeByJobTitle GSI: Query(job_title="Manager")  | 
|  14  |  Get Inventory by Product and Warehouse  |  Product Table: GetItem(product_id="prod_100", warehouse_id="wh_sea")  | 
|  15  |  Get Total Product Inventory  |  Product Table: GetItem(product_id="prod_100", warehouse_id="prod_100")  | 