

# 13 – Manage cost over time

 **How do you manage the cost of your workload over time?** To ensure that you always have the most cost-efficient workload, periodically review your workload to discover opportunities to implement new services, features, and components. It is common for analytics workloads to have an ever-growing number of users and exponential growth of data volume. Implement a standardized process across your organization to identify and remove unused resources, such as unused data, infrastructure, and ETL jobs. 


|  **ID**  |  **Priority**  |  **Best practice**  | 
| --- | --- | --- | 
|  ☐ BP 13.1   |  Recommended  |   Remove unused data and infrastructure.   | 
|  ☐ BP 13.2   |  Recommended  |  Reduce overprovisioned infrastructure.  | 
|  ☐ BP 13.3   |  Recommended  |  Evaluate and adopt new cost-effective solutions.  | 

 For more details, refer to the following information: 
+  AWS Database Blog: [Safely reduce the cost of your unused Amazon DynamoDB tables using On-Demand mode](https://aws.amazon.com/blogs/database/safely-reduce-the-cost-of-your-unused-amazon-dynamodb-tables-using-on-demand-mode/). 
+  AWS Management and Governance Blog: [Controlling your AWS costs by deleting unused Amazon EBS volumes](https://aws.amazon.com/blogs/mt/controlling-your-aws-costs-by-deleting-unused-amazon-ebs-volumes/). 
+  AWS Database Blog: [Implementing DB Instance Stop and Start in Amazon RDS](https://aws.amazon.com/blogs/database/implementing-db-instance-stop-and-start-in-amazon-rds/). 
+  AWS Big Data Blog: [Lower your costs with the new pause and resume actions on Amazon Redshift](https://aws.amazon.com/blogs/big-data/lower-your-costs-with-the-new-pause-and-resume-actions-on-amazon-redshift/). 
+  AWS Partner Network (APN) Blog: [Scaling Laravel Jobs with AWS Batch and Amazon EventBridge](https://aws.amazon.com/blogs/apn/scaling-laravel-jobs-with-aws-batch-and-amazon-eventbridge/). 
+  AWS Glue Developer Guide: [Tracking Processed Data Using Job Bookmarks](https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html). 

# Best practice 13.1 – Remove unused data and infrastructure

 Delete data that is out of its retention period, or not needed anymore. Delete intermediate-processed data that can be removed without business impacts. If the output of analytics jobs is not used by anyone, consider removing such jobs so that you don't waste resources. 

## Suggestion 13.1.1 – Track data freshness

 In many cases, maintaining a metadata repository for tracking data movement will be worthwhile. This is not only to instill confidence in the quality of the data, but also to identify infrequently updated data, and unused data. 

## Suggestion 13.1.2 – Delete data that is out of its retention period

 Data that is past its retention period should be deleted to reduce unnecessary storage costs. Identify data through the metadata catalog that is outside its retention period. To reduce human effort, automate the data removal process. If data is stored in Amazon S3, use Amazon S3 Lifecycle configurations to expire data automatically. 
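As one concrete way to automate this, the expiration can be expressed as an S3 Lifecycle rule. The sketch below is illustrative: the `raw/` prefix, the 365-day retention period, and the bucket name are assumptions, not values from this document.

```python
# Illustrative S3 Lifecycle configuration: expire objects under a
# hypothetical "raw/" prefix once they pass a 365-day retention period.
RETENTION_DAYS = 365

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "expire-raw-data-past-retention",
            "Status": "Enabled",
            "Filter": {"Prefix": "raw/"},
            # Permanently delete current object versions after retention.
            "Expiration": {"Days": RETENTION_DAYS},
            # Also clean up stalled multipart uploads after a week.
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }
    ]
}

# Applying it requires credentials, so it is shown as a comment only:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-analytics-bucket",  # hypothetical bucket name
#     LifecycleConfiguration=lifecycle_configuration,
# )
```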

## Suggestion 13.1.3 – Delete intermediate-processed data that can be removed without business impacts

 Many steps in analytics processes create intermediate or temporary datasets. Ensure that intermediate datasets are removed if they have no further business value. 

## Suggestion 13.1.4 – Remove analytics jobs that consume infrastructure resources but whose results no one uses

 Periodically review the ownership, source, and downstream consumers of all analytics infrastructure resources. If downstream consumers no longer need the analytics job, stop the job from running and remove unneeded resources. 
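A lightweight way to operationalize such a review is to compare each job's output against when that output was last read (for example, from S3 server access logs or CloudTrail data events). A minimal sketch, with the 90-day idle threshold and the job names purely illustrative:

```python
from datetime import datetime, timedelta

def find_unused_jobs(last_read_by_job, now, idle_days=90):
    """Return names of jobs whose output has not been read in `idle_days`.

    `last_read_by_job` maps a job name to the datetime its results were
    last consumed; the 90-day threshold is an illustrative default.
    """
    cutoff = now - timedelta(days=idle_days)
    return sorted(
        name for name, last_read in last_read_by_job.items() if last_read < cutoff
    )

# Hypothetical review: one job's report has not been read in over a year.
stale = find_unused_jobs(
    {"daily_sales": datetime(2024, 6, 1), "legacy_report": datetime(2023, 1, 1)},
    now=datetime(2024, 6, 30),
)
```

Jobs flagged this way still deserve a human check with their owners before removal; the function only narrows the review list.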

 

## Suggestion 13.1.5 – Use the lowest acceptable frequency for data processing

 Data processing requirements must be considered in the business context. There is no value in processing data faster than it is consumed or delivered. For example, in a sales analytics workload, it might not be necessary to perform analytics on each transaction as it arrives. In some cases, business management only needs hourly reports. Batch processing the transactions is more efficient and avoids paying for idle infrastructure between processing runs. 
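The trade-off can be made concrete with a simple cost model: each run pays a fixed startup overhead (cluster spin-up, driver initialization), while the useful work per day is roughly constant no matter how it is batched. All numbers below are hypothetical.

```python
def daily_job_cost(runs_per_day, startup_minutes, work_minutes_per_day,
                   cost_per_minute):
    """Daily compute cost: startup overhead is paid on every run, while
    the total useful work is fixed regardless of batching."""
    return (runs_per_day * startup_minutes + work_minutes_per_day) * cost_per_minute

# Hypothetical numbers: 2 minutes of startup overhead per run, 60 minutes
# of real work per day, $0.10 per compute-minute.
near_real_time = daily_job_cost(1440, 2, 60, 0.10)  # one run per minute
hourly_batches = daily_job_cost(24, 2, 60, 0.10)    # one run per hour
```

Under these assumptions the per-minute schedule spends far more on startup overhead than on useful work, while the hourly schedule does not.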

## Suggestion 13.1.6 – Compress data to reduce cost

 Data compression can significantly reduce storage and query costs. Columnar data formats like Apache Parquet store data in columns rather than rows, allowing similar data to be stored contiguously and compressed effectively. Using Parquet instead of CSV can reduce storage costs significantly. Because services like Amazon Redshift Spectrum and Amazon Athena charge for bytes scanned, compressing data lowers the overall cost of using those services. 
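Since Athena bills per terabyte scanned, the saving from a columnar, compressed format is simple arithmetic. The sketch below assumes the commonly published $5-per-TB rate and a hypothetical query that reads only a few columns of a Parquet dataset; check current pricing before relying on the numbers.

```python
PRICE_PER_TB_SCANNED = 5.00  # USD; illustrative -- check current Athena pricing
TB = 1_000_000_000_000       # Athena bills per terabyte of data scanned

def athena_query_cost(bytes_scanned):
    """Cost of one query given the bytes Athena had to scan."""
    return bytes_scanned / TB * PRICE_PER_TB_SCANNED

# Hypothetical comparison: a full scan of 1 TB of raw CSV versus the same
# query on Parquet, which reads only the needed columns and benefits from
# compression (here assumed to shrink the scan to 130 GB).
csv_cost = athena_query_cost(1 * TB)
parquet_cost = athena_query_cost(130_000_000_000)
```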

# Best practice 13.2 – Continuously evaluate your provisioned resources and identify overprovisioned workloads

 Workload resource utilization can change over time, especially with the growth of data or after process optimization has occurred. Your organization should review resource usage patterns and determine if you require the same infrastructure footprint to meet your business goals. 

## Suggestion 13.2.1 – Evaluate whether compute resources can be downsized

 Investigate your resource utilization by inspecting the metrics provided by Amazon CloudWatch. Evaluate whether resources can be downsized one level within the same instance family. For example, downsize Amazon EMR cluster nodes from m5.16xlarge to m5.12xlarge, or reduce the number of instances that make up the cluster. 
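One way to make the evaluation repeatable is a simple rule over CloudWatch `CPUUtilization` datapoints: downsize only when both the average and the peak leave enough headroom. The thresholds below are illustrative, not recommendations.

```python
def should_downsize(cpu_samples, avg_threshold=40.0, peak_threshold=80.0):
    """Suggest a one-level downsize when CPU utilization datapoints
    (e.g. two weeks of CloudWatch CPUUtilization values, in percent)
    show consistently low average and peak usage.

    Thresholds are illustrative; a smaller instance runs hotter for the
    same load, so leave headroom for peaks before downsizing.
    """
    if not cpu_samples:
        return False  # no evidence, keep the current size
    average = sum(cpu_samples) / len(cpu_samples)
    return average < avg_threshold and max(cpu_samples) < peak_threshold
```

In practice you would feed this from `GetMetricStatistics` or CloudWatch metric math, and treat a positive result as a prompt for review, not an automatic resize.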

## Suggestion 13.2.2 – Move infrequently used data out of a data warehouse into a data lake

 Data that is infrequently used can be moved from the data warehouse into the data lake. From there, the data can be queried in place or joined with data in the warehouse. Use services such as Amazon Redshift Spectrum to query and join data in the Amazon S3 data lake, or Amazon Athena to query data at rest in Amazon S3. 

## Suggestion 13.2.3 – Merge low utilization infrastructure resources

 If you have several workloads that all have low-utilization resources, determine if you can combine those workloads to run on shared infrastructure. In many cases, using a pooled resource model for analytics workloads will save on infrastructure costs. 

## Suggestion 13.2.4 – Move infrequently accessed data into low-cost storage tiers

 When designing a data lake or data analytics project, consider required access patterns, transaction concurrency, and acceptable transaction latency. These will influence where data is stored. It is equally important to consider how often data will be accessed. Have a data lifecycle plan to migrate data tiers from hotter storage to colder, less-expensive storage, while still meeting all business objectives. 

 Transitioning between storage tiers is achieved using Amazon S3 Lifecycle policies. These automatically transition objects into another tier with lower cost, and will even delete expired data. Amazon S3 Intelligent-Tiering will analyze the data access patterns and automatically move objects between tiers. 
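For illustration, a tiering plan like the one described might look like the Lifecycle rule below. The `processed/` prefix, the 30- and 90-day transition points, and the 365-day expiry are assumptions to adapt to your own access patterns and retention policy.

```python
# Illustrative S3 Lifecycle rule: step objects down to cheaper storage
# classes as they cool, then expire them at the end of retention.
tiering_rule = {
    "ID": "tier-down-processed-data",
    "Status": "Enabled",
    "Filter": {"Prefix": "processed/"},  # hypothetical prefix
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access
        {"Days": 90, "StorageClass": "GLACIER"},      # archival
    ],
    "Expiration": {"Days": 365},
}
```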

 

## Suggestion 13.2.5 – Move to serverless when you don't need always-on infrastructure

 For analytics workloads that have intermittent or unpredictable usage patterns, moving to AWS serverless can provide significant cost savings compared to provisioned servers. AWS serverless analytics services like Amazon Athena, EMR Serverless, and Amazon Redshift Serverless are great options that provide on-demand access without having to provision always-on resources. These services automatically start up when needed and shut down when not in use so you don't have to pay for idle capacity. 

 For example, with Amazon Redshift Serverless, you pay for compute only when the data warehouse is in use. By using Amazon Redshift Serverless for tasks such as loading data and leveraging Amazon Redshift data sharing, you can scale down your main cluster and still maintain the same performance for end users. 

 For more detail, refer to the following: 
+ [Easy analytics and cost optimization with Amazon Redshift Serverless](https://aws.amazon.com/blogs/big-data/easy-analytics-and-cost-optimization-with-amazon-redshift-serverless/)
+ [Amazon EMR Serverless cost estimator](https://aws.amazon.com/blogs/big-data/amazon-emr-serverless-cost-estimator/)
+ [Run queries 3x faster with up to 70% cost savings on the latest Amazon Athena engine](https://aws.amazon.com/blogs/big-data/run-queries-3x-faster-with-up-to-70-cost-savings-on-the-latest-amazon-athena-engine/)

# Best practice 13.3 – Evaluate and adopt new cost-effective solutions

 As AWS releases new services and features, it’s a best practice to review your existing architectural decisions to ensure that they remain cost effective. If a new or updated service can support the same workload but in a much cheaper way, consider implementing the change to reduce cost. 

## Suggestion 13.3.1 – Set Service Quotas to control resource usage

 Many AWS services support per-account Service Quotas. Establish quotas to prevent accidental, runaway infrastructure deployment, while ensuring that they are set high enough to cover expected peak usage. 

## Suggestion 13.3.2 – Pause and resume resources if the workload is not always required

 Use automation to pause and resume resources when the resource is unneeded. For example, stop development and test Amazon RDS instances that are not used after working hours. 
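A scheduled EventBridge rule invoking a small Lambda function is one common way to do this; the decision itself reduces to checking the clock. The working hours below (08:00 to 20:00, Monday through Friday) are hypothetical.

```python
from datetime import datetime

def should_be_stopped(now, start_hour=8, end_hour=20):
    """Decide whether a dev/test instance should be stopped at `now`.

    Hypothetical schedule: running 08:00-20:00, Monday through Friday.
    A scheduled Lambda could apply the result with boto3's
    rds.stop_db_instance / rds.start_db_instance calls.
    """
    is_weekend = now.weekday() >= 5  # Monday is 0, Saturday is 5
    in_working_hours = start_hour <= now.hour < end_hour
    return is_weekend or not in_working_hours
```

Note that stopped RDS instances still incur storage charges, and RDS restarts a stopped instance automatically after seven days, so this suits short off-hours windows rather than long-term shutdown.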

## Suggestion 13.3.3 – Switch to a new service or take advantage of new features that can reduce cost

 AWS consistently adds new capabilities that enable your organization to leverage the latest technologies and to experiment and innovate more quickly. Review new service releases frequently to understand their price and performance, and determine whether they can reduce your costs. 