

# Data patterns
<a name="a-sus-data-patterns"></a>

**Topics**
+ [SUS 4 How do you take advantage of data access and usage patterns to support your sustainability goals?](sus-04.md)

# SUS 4 How do you take advantage of data access and usage patterns to support your sustainability goals?
<a name="sus-04"></a>

Implement data management practices to reduce the provisioned storage required to support your workload and the resources required to use it. Understand your data, and use storage technologies and configurations that best support the business value of the data and how it's used. As requirements for data decrease, move it to more efficient, less performant storage, and delete data that's no longer required.

**Topics**
+ [SUS04-BP01 Implement a data classification policy](sus_sus_data_a2.md)
+ [SUS04-BP02 Use technologies that support data access and storage patterns](sus_sus_data_a3.md)
+ [SUS04-BP03 Use lifecycle policies to delete unnecessary data](sus_sus_data_a4.md)
+ [SUS04-BP04 Minimize over-provisioning in block storage](sus_sus_data_a5.md)
+ [SUS04-BP05 Remove unneeded or redundant data](sus_sus_data_a6.md)
+ [SUS04-BP06 Use shared file systems or object storage to access common data](sus_sus_data_a7.md)
+ [SUS04-BP07 Minimize data movement across networks](sus_sus_data_a8.md)
+ [SUS04-BP08 Back up data only when difficult to recreate](sus_sus_data_a9.md)

# SUS04-BP01 Implement a data classification policy
<a name="sus_sus_data_a2"></a>

 Classify data to understand its significance to business outcomes. Use this information to determine when you can move data to more energy-efficient storage or safely delete it. 

 **Level of risk exposed if this best practice is not established:** Low 

## Implementation guidance
<a name="implementation-guidance"></a>
+  Determine requirements for the distribution, retention, and deletion of your data. 
+  Tag volumes and objects with the metadata used to determine how they are managed, including their data classification. 
+  Periodically audit your environment for untagged and unclassified data, and classify and tag the data appropriately. 
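The periodic audit can be sketched as a function that flags resources whose classification tag is missing or holds an unexpected value. This is a minimal illustration, not an AWS tool: the tag key `DataClassification`, the allowed levels, and the `Resource` type are hypothetical, and in practice the inventory might come from a service such as the AWS Resource Groups Tagging API.

```python
from dataclasses import dataclass, field

# Hypothetical tag key and allowed values; substitute your organization's scheme.
CLASSIFICATION_TAG = "DataClassification"
ALLOWED_LEVELS = {"public", "internal", "confidential", "restricted"}


@dataclass
class Resource:
    """Minimal stand-in for a tagged volume or object (illustrative only)."""
    resource_id: str
    tags: dict[str, str] = field(default_factory=dict)


def find_unclassified(resources: list[Resource]) -> list[str]:
    """Return IDs of resources whose classification tag is missing or not an allowed value."""
    flagged = []
    for res in resources:
        level = res.tags.get(CLASSIFICATION_TAG)
        if level is None or level.lower() not in ALLOWED_LEVELS:
            flagged.append(res.resource_id)
    return flagged


inventory = [
    Resource("vol-a", {"DataClassification": "internal"}),
    Resource("vol-b"),                                    # untagged
    Resource("obj-c", {"DataClassification": "secret"}),  # not an allowed value
]
print(find_unclassified(inventory))  # ['vol-b', 'obj-c']
```

Flagged resources would then be reviewed, classified, and tagged; tag policies in AWS Organizations can help keep the allowed values consistent across accounts.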

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Data Classification Process](https://docs.aws.amazon.com/whitepapers/latest/data-classification/data-classification-process.html) 
+  [Leveraging AWS Cloud to Support Data Classification](https://docs.aws.amazon.com/whitepapers/latest/data-classification/leveraging-aws-cloud-to-support-data-classification.html) 
+  [Tag policies from AWS Organizations](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_tag-policies.html) 

# SUS04-BP02 Use technologies that support data access and storage patterns
<a name="sus_sus_data_a3"></a>

 Use storage that best supports how your data is accessed and stored to minimize the resources provisioned while supporting your workload. For example, solid-state drives (SSDs) are more energy intensive than magnetic drives and should be used only for active data use cases. Use energy-efficient, archival-class storage for infrequently accessed data. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>
+  Monitor your data access patterns. 
+  Migrate data to the appropriate technology based on access pattern. 
+  Migrate archival data to storage designed for that purpose. 
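Once access patterns are monitored, the migration decision can be expressed as a mapping from "days since last access" to a storage tier. The sketch below assumes hypothetical thresholds and returns Amazon S3 storage class names (`STANDARD`, `STANDARD_IA`, `GLACIER_IR`, `DEEP_ARCHIVE`); it is an illustration of the idea, not a recommendation for specific cut-offs.

```python
# Hypothetical thresholds (days since last access); tune them from your own
# access-pattern monitoring rather than treating these as recommendations.
TIERS = [
    (30, "STANDARD"),       # active data
    (90, "STANDARD_IA"),    # infrequently accessed
    (180, "GLACIER_IR"),    # rarely accessed but still needs fast retrieval
]
ARCHIVE_TIER = "DEEP_ARCHIVE"  # archival data


def recommend_storage_class(days_since_last_access: int) -> str:
    """Map how long data has gone unread to an S3 storage class name."""
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    for max_days, storage_class in TIERS:
        if days_since_last_access < max_days:
            return storage_class
    return ARCHIVE_TIER


print(recommend_storage_class(45))   # STANDARD_IA
print(recommend_storage_class(400))  # DEEP_ARCHIVE
```

For data whose access pattern is unknown or changes over time, Amazon S3 Intelligent-Tiering moves objects between access tiers automatically, which avoids maintaining thresholds like these by hand.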

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon EBS volume types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html) 
+  [Amazon EC2 instance store](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html) 
+  [Amazon S3 Intelligent-Tiering](https://docs.aws.amazon.com/AmazonS3/latest/userguide/intelligent-tiering.html) 
+  [Using Amazon S3 storage classes](https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-class-intro.html) 
+  [What is Amazon CloudWatch?](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html) 
+  [What is Amazon Glacier?](https://docs.aws.amazon.com/amazonglacier/latest/dev/introduction.html) 

 **Related videos:** 
+  [Architectural Patterns for Data Lakes on AWS](https://www.youtube.com/watch?v=XpTly4XHmqc&ab_channel=AWSEvents) 

# SUS04-BP03 Use lifecycle policies to delete unnecessary data
<a name="sus_sus_data_a4"></a>

 Manage the lifecycle of all your data and automatically enforce deletion timelines to minimize the total storage requirements of your workload. 

 **Level of risk exposed if this best practice is not established:** Low 

## Implementation guidance
<a name="implementation-guidance"></a>
+  Define lifecycle policies for all your data classification types. 
+  Set automated lifecycle policies to enforce lifecycle rules. 
+  Delete unused volumes and snapshots. 
+  Aggregate data where applicable based on lifecycle rules. 
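As an illustration of defining one lifecycle policy per classification type, the sketch below builds an Amazon S3 lifecycle configuration with one rule per classification tag value. The retention periods and the `DataClassification` tag key are hypothetical. The resulting dictionary follows the S3 lifecycle configuration shape (`Rules` with `ID`, `Filter`, `Status`, `Expiration`) and could be passed as the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`; that call is not made here.

```python
import json

# Hypothetical retention schedule per classification value (days). Derive the
# real numbers from your data classification policy and compliance needs.
RETENTION_DAYS = {
    "internal": 365,
    "temporary": 7,
}
CLASSIFICATION_TAG = "DataClassification"  # hypothetical tag key


def build_lifecycle_configuration(retention: dict[str, int]) -> dict:
    """Build one S3 lifecycle rule per classification that expires tagged objects."""
    rules = []
    for level, days in sorted(retention.items()):
        rules.append({
            "ID": f"expire-{level}",
            "Filter": {"Tag": {"Key": CLASSIFICATION_TAG, "Value": level}},
            "Status": "Enabled",
            # Delete current object versions after the retention period.
            "Expiration": {"Days": days},
            # Also remove noncurrent versions if bucket versioning is enabled.
            "NoncurrentVersionExpiration": {"NoncurrentDays": days},
        })
    return {"Rules": rules}


config = build_lifecycle_configuration(RETENTION_DAYS)
print(json.dumps(config, indent=2))
```

Generating rules from the classification table keeps retention decisions in one place, so adding a classification level automatically adds an enforced deletion timeline.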

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon ECR Lifecycle policies](https://docs.aws.amazon.com/AmazonECR/latest/userguide/LifecyclePolicies.html) 
+  [Amazon EFS lifecycle management](https://docs.aws.amazon.com/efs/latest/ug/lifecycle-management-efs.html) 
+  [Amazon S3 Intelligent-Tiering](https://docs.aws.amazon.com/AmazonS3/latest/userguide/intelligent-tiering.html) 
+  [Evaluating Resources with AWS Config Rules](https://docs.aws.amazon.com/config/latest/developerguide/evaluate-config.html) 
+  [Managing your storage lifecycle on Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html) 
+  [Object lifecycle policies in AWS Elemental MediaStore](https://docs.aws.amazon.com/mediastore/latest/ug/policies-object-lifecycle.html) 

 **Related videos:** 
+  [Amazon S3 Lifecycle](https://www.youtube.com/watch?v=53eHNSpaMJI&ab_channel=AmazonWebServices) 

# SUS04-BP04 Minimize over-provisioning in block storage
<a name="sus_sus_data_a5"></a>

 To minimize total provisioned storage, create block storage with size allocations that are appropriate for the workload. Use elastic volumes to expand storage as data grows without detaching volumes from compute resources. Regularly review your volumes and right-size over-provisioned ones to fit the current data size. Because Amazon EBS volumes can't be decreased in size in place, right-sizing an EBS volume means migrating its data to a new, smaller volume. 

 **Level of risk exposed if this best practice is not established:** Low 

## Implementation guidance
<a name="implementation-guidance"></a>
+  Monitor the utilization of your data volumes. 
+  Use elastic volumes and managed block data services to automate allocation of additional storage as your persistent data grows. 
+  Set target levels of utilization for your data volumes, and resize volumes outside of expected ranges. 
+  Size read-only volumes to fit the data. 
+  Migrate data to object stores to avoid provisioning the excess capacity from fixed volume sizes on block storage. 
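Setting a target utilization band and resizing volumes that fall outside it can be sketched as follows. The band (50 to 80 percent) and the 70 percent sizing target are hypothetical; the used and provisioned sizes would come from your monitoring (for example, metrics published to Amazon CloudWatch by an agent on the instance).

```python
import math

# Hypothetical target utilization band; set it from your own monitoring data.
TARGET_LOW = 0.50
TARGET_HIGH = 0.80
TARGET_MID = 0.70  # size volumes so usage lands here after a resize


def review_volume(size_gib: int, used_gib: float) -> tuple[str, int]:
    """Return (action, suggested_size_gib) for one volume; size_gib must be positive."""
    utilization = used_gib / size_gib
    suggested = max(1, math.ceil(used_gib / TARGET_MID))
    if utilization > TARGET_HIGH:
        # Growing can be done in place with Elastic Volumes.
        return "grow", suggested
    if utilization < TARGET_LOW:
        # EBS volumes can't be shrunk in place: create a smaller volume and migrate data.
        return "migrate-to-smaller", suggested
    return "ok", size_gib


print(review_volume(100, 90))   # ('grow', 129)
print(review_volume(500, 100))  # ('migrate-to-smaller', 143)
```

Sizing to the middle of the band, rather than to its edge, leaves room for growth so that a volume is not flagged again immediately after it is resized.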

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon EBS Elastic Volumes](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-modify-volume.html) 
+  [Amazon FSx Documentation](https://docs.aws.amazon.com/fsx/index.html) 
+  [What is Amazon CloudWatch?](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html) 
+  [What is Amazon Elastic File System?](https://docs.aws.amazon.com/efs/latest/ug/whatisefs.html) 

# SUS04-BP05 Remove unneeded or redundant data
<a name="sus_sus_data_a6"></a>

 To minimize total storage consumed, duplicate data only when necessary. Use backup technologies that deduplicate data at the file and block level. Limit the use of Redundant Array of Independent Disks (RAID) configurations to cases where they're required to meet Service Level Agreements (SLAs). 

 **Level of risk exposed if this best practice is not established:** Low 

## Implementation guidance
<a name="implementation-guidance"></a>
+  Use mechanisms that can deduplicate data at the block and object level. 
+  Use backup technology that can make incremental backups and deduplicate data at the block, file, and object level. 
+  Use RAID only when required to meet your SLAs. 
+  Centralize log and trace data, deduplicate identical log entries, and establish mechanisms to tune verbosity when needed. 
+  Pre-populate caches only where justified. 
+  Establish cache monitoring and automation to resize cache accordingly. 
+  Remove out-of-date deployments and assets from object stores and edge caches when pushing new versions of your workload. 
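The core idea behind block-level deduplication is to store each unique block once and describe each file as a list of block references. The sketch below uses fixed-size blocks keyed by SHA-256 digest; the block size and in-memory store are hypothetical simplifications, and managed features such as deduplication on Amazon FSx handle this for you rather than working exactly this way.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size in bytes


def dedup_blocks(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Split data into fixed-size blocks, store each unique block once, return the block recipe."""
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # only the first copy of a block is kept
        recipe.append(digest)
    return recipe


def restore(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Reassemble the original data from its block recipe."""
    return b"".join(store[digest] for digest in recipe)


store: dict[str, bytes] = {}
file_a = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
file_b = b"A" * BLOCK_SIZE * 2  # shares blocks with file_a
recipe_a = dedup_blocks(file_a, store)
recipe_b = dedup_blocks(file_b, store)
print(len(recipe_a) + len(recipe_b), "logical blocks,", len(store), "stored")  # 6 logical blocks, 2 stored
```

Fixed-size blocks are simple but miss duplicates that are shifted by an insertion; content-defined chunking addresses that at the cost of more computation.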

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon EBS snapshots](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSSnapshots.html) 
+  [Change log data retention in CloudWatch Logs](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Working-with-log-groups-and-streams.html#SettingLogRetention) 
+  [Data deduplication on Amazon FSx for Windows File Server](https://docs.aws.amazon.com/fsx/latest/WindowsGuide/using-data-dedup.html) 
+  [Features of Amazon FSx for ONTAP including data deduplication](https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/what-is-fsx-ontap.html#features-overview) 
+  [Invalidating Files on Amazon CloudFront](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Invalidation.html) 
+  [Using AWS Backup to back up and restore Amazon EFS file systems](https://docs.aws.amazon.com/efs/latest/ug/awsbackup.html) 
+  [What is Amazon CloudWatch Logs?](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html) 
+  [Working with backups on Amazon RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithAutomatedBackups.html) 

 **Related examples:** 
+  [Lab: Optimize Data Pattern Using Amazon Redshift Data Sharing](https://wellarchitectedlabs.com/sustainability/300_labs/300_optimize_data_pattern_using_redshift_data_sharing/) 

# SUS04-BP06 Use shared file systems or object storage to access common data
<a name="sus_sus_data_a7"></a>

 Adopt shared storage and single sources of truth to avoid data duplication and reduce the total storage requirements of your workload. Fetch data from shared storage only as needed. Detach unused volumes to make more resources available. 

 **Level of risk exposed if this best practice is not established:** Low 

## Implementation guidance
<a name="implementation-guidance"></a>
+  Migrate data to shared storage when the data has multiple consumers. 
+  Fetch data from shared storage only as needed. 
+  Delete data as appropriate for your usage patterns, and implement time-to-live (TTL) functionality to manage cached data. 
+  Detach volumes from clients that are not actively using them. 
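Fetching shared data only as needed and managing cached copies with a TTL corresponds to the lazy-loading (cache-aside) strategy with expiring entries. The sketch below is a minimal in-process version; `fetch_from_shared_storage` is a hypothetical loader standing in for a read from shared storage such as Amazon S3 or Amazon EFS, and the injectable clock exists only to make the behavior easy to demonstrate.

```python
import time
from typing import Callable


class TTLCache:
    """Lazy-loading (cache-aside) cache whose entries expire after ttl_seconds."""

    def __init__(self, fetch: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch  # reads from shared storage on a miss
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, bytes]] = {}

    def get(self, key: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: no read from shared storage
        value = self._fetch(key)  # miss or expired: fetch only now, as needed
        self._entries[key] = (now + self._ttl, value)
        return value

    def evict_expired(self) -> int:
        """Delete expired entries so stale copies stop consuming memory."""
        now = self._clock()
        expired = [k for k, (expires_at, _) in self._entries.items() if expires_at <= now]
        for k in expired:
            del self._entries[k]
        return len(expired)


calls = []


def fetch_from_shared_storage(key: str) -> bytes:  # hypothetical loader
    calls.append(key)
    return key.encode()


fake_now = [0.0]
cache = TTLCache(fetch_from_shared_storage, ttl_seconds=60, clock=lambda: fake_now[0])
cache.get("config.json")
cache.get("config.json")  # served from cache, no second fetch
fake_now[0] = 61
cache.get("config.json")  # entry expired, fetched again
print(len(calls))  # 2
```

A shorter TTL keeps cached data fresher but increases reads from shared storage; the right value depends on how often the underlying data changes.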

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon FSx](https://aws.amazon.com/fsx/) 
+  [Caching strategies](https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/Strategies.html) 
+  [What is Amazon Elastic File System?](https://docs.aws.amazon.com/efs/latest/ug/whatisefs.html) 
+  [What is Amazon S3?](https://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html) 

# SUS04-BP07 Minimize data movement across networks
<a name="sus_sus_data_a8"></a>

 Use shared storage and access data from regional data stores to minimize the total networking resources required to support data movement for your workload. 

 **Level of risk exposed if this best practice is not established:** Low 

## Implementation guidance
<a name="implementation-guidance"></a>
+  Store data as close to the consumer as possible. 
+  Partition regionally consumed services so that their Region-specific data is stored within the Region where it is consumed. 
+  When copying changes across the network, use block-level duplication instead of file- or object-level duplication so that only changed blocks are transferred. 
+  Compress data before moving it over the network. 
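Compressing data before moving it can be as simple as gzip-encoding the payload on the sender and decoding it on the receiver. This sketch uses Python's standard `gzip` module with a hypothetical batch of repetitive JSON records; the receiving side (for example, a service that accepts gzip-encoded requests) must support the encoding, and the actual savings depend entirely on how compressible your data is.

```python
import gzip


def compress_for_transfer(payload: bytes, level: int = 6) -> bytes:
    """Gzip-compress a payload before sending it over the network."""
    return gzip.compress(payload, compresslevel=level)


def decompress_after_transfer(blob: bytes) -> bytes:
    """Restore the original payload on the receiving side."""
    return gzip.decompress(blob)


# Hypothetical batch of repetitive records, which typically compress well.
records = b"\n".join(b'{"region": "eu-west-1", "status": "ok"}' for _ in range(1000))
compressed = compress_for_transfer(records)
print(len(records), "->", len(compressed), "bytes")
```

Higher compression levels trade CPU time for fewer bytes on the wire, so the level is worth measuring against your own payloads.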

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Optimizing your AWS Infrastructure for Sustainability, Part III: Networking](https://aws.amazon.com/blogs/architecture/optimizing-your-aws-infrastructure-for-sustainability-part-iii-networking/) 
+  [AWS Global Infrastructure](https://aws.amazon.com/about-aws/global-infrastructure/) 
+  [Amazon CloudFront Key Features including the CloudFront Global Edge Network](https://aws.amazon.com/cloudfront/features/) 
+  [Compressing HTTP requests in Amazon OpenSearch Service](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/gzip.html) 
+  [Intermediate data compression with Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-output-compression.html#HadoopIntermediateDataCompression) 
+  [Loading compressed data files from Amazon S3 into Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/dg/t_loading-gzip-compressed-data-files-from-S3.html) 
+  [Serving compressed files with Amazon CloudFront](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html) 

# SUS04-BP08 Back up data only when difficult to recreate
<a name="sus_sus_data_a9"></a>

 To minimize storage consumption, only back up data that has business value or is needed to satisfy compliance requirements. Examine backup policies and exclude ephemeral storage that doesn’t provide value in a recovery scenario. 

 **Level of risk exposed if this best practice is not established:** Low 

## Implementation guidance
<a name="implementation-guidance"></a>
+  Use your data classification to establish what data needs to be backed up. 
+  Exclude data that you can easily recreate. 
+  Exclude ephemeral data from your backups. 
+  Exclude local copies of data, unless the time required to restore that data from a common location exceeds your service level agreements (SLAs). 
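The exclusion rules above can be expressed as a single decision function applied to an inventory of data sets. Everything here is hypothetical: the `DataSet` fields, the set of classifications that require backup, and the SLA value. In practice the result might drive tag-based resource selection in an AWS Backup plan.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    """Illustrative description of a data set; all fields are hypothetical."""
    name: str
    classification: str
    ephemeral: bool = False    # e.g. scratch space, temporary files
    recreatable: bool = False  # can be rebuilt cheaply from a source of truth
    local_copy: bool = False   # copy of data that also lives in a common location
    restore_minutes_from_common: float = 0.0


# Hypothetical: classifications whose data must be backed up.
BACKUP_CLASSIFICATIONS = {"confidential", "restricted", "internal"}


def needs_backup(ds: DataSet, restore_sla_minutes: float) -> bool:
    """Apply the classification and exclusion rules to one data set."""
    if ds.classification not in BACKUP_CLASSIFICATIONS:
        return False
    if ds.ephemeral or ds.recreatable:
        return False
    if ds.local_copy:
        # Back up a local copy only if restoring from the common location would breach the SLA.
        return ds.restore_minutes_from_common > restore_sla_minutes
    return True


datasets = [
    DataSet("orders-db", "confidential"),
    DataSet("build-cache", "internal", ephemeral=True),
    DataSet("thumbnails", "internal", recreatable=True),
    DataSet("edge-replica", "internal", local_copy=True, restore_minutes_from_common=15),
    DataSet("public-docs", "public"),
]
to_back_up = [d.name for d in datasets if needs_backup(d, restore_sla_minutes=60)]
print(to_back_up)  # ['orders-db']
```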

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Using AWS Backup to back up and restore Amazon EFS file systems](https://docs.aws.amazon.com/efs/latest/ug/awsbackup.html) 
+  [Amazon EBS snapshots](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSSnapshots.html) 
+  [Working with backups on Amazon Relational Database Service](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithAutomatedBackups.html) 