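# PERF 3. How do you store, manage, and access data in your workload?
<a name="perf-03"></a>

 The optimal data management solution for a particular system varies based on the data type (block, file, or object), access patterns (random or sequential), required throughput, frequency of access (online, offline, archival), frequency of update (WORM, dynamic), and availability and durability constraints. Well-Architected workloads use purpose-built data stores, which offer different features to improve performance.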

**Topics**
+ [PERF03-BP01 Use a purpose-built data store that best supports your data access and storage requirements](perf_data_use_purpose_built_data_store.md)
+ [PERF03-BP02 Evaluate available configuration options for data store](perf_data_evaluate_configuration_options_data_store.md)
+ [PERF03-BP03 Collect and record data store performance metrics](perf_data_collect_record_data_store_performance_metrics.md)
+ [PERF03-BP04 Implement strategies to improve query performance in data store](perf_data_implement_strategies_to_improve_query_performance.md)
+ [PERF03-BP05 Implement data access patterns that utilize caching](perf_data_access_patterns_caching.md)

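# PERF03-BP01 Use a purpose-built data store that best supports your data access and storage requirements
<a name="perf_data_use_purpose_built_data_store"></a>

 Understand data characteristics (such as shareability, size, cache size, access patterns, latency, throughput, and persistence) to select the right purpose-built data stores (storage or database) for your workload.

 **Common anti-patterns:**
+  You stick to one data store because your organization has experience with, and knowledge of, one particular type of database solution.
+  You assume that all workloads have similar data storage and access requirements.
+  You have not implemented a data catalog to inventory your data assets.

 **Benefits of establishing this best practice:** Understanding data characteristics and requirements helps you determine the most efficient and performant storage technology for your workload needs.

 **Level of risk exposed if this best practice is not established:** High 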

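## Implementation guidance
<a name="implementation-guidance"></a>

 When selecting and implementing data storage, make sure that the querying, scaling, and storage characteristics support the workload data requirements. AWS provides numerous data storage and database technologies, including block storage, object storage, streaming storage, file systems, and relational, key-value, document, in-memory, graph, time series, and ledger databases. Each data management solution has options and configurations available to support your use cases and data models. By understanding data characteristics and requirements, you can break away from monolithic storage technology and restrictive, one-size-fits-all approaches and focus on managing data appropriately.

### Implementation steps
<a name="implementation-steps"></a>
+  Conduct an inventory of the various data types that exist in your workload.
+  Understand and document data characteristics and requirements, including:
  +  Data type (unstructured, semi-structured, relational) 
  +  Data volume and growth 
  +  Data durability: persistent, ephemeral, transient 
  +  ACID (atomicity, consistency, isolation, durability) requirements 
  +  Data access patterns (read-heavy or write-heavy) 
  +  Latency 
  +  Throughput 
  +  IOPS (input/output operations per second) 
  +  Data retention period 
+  Learn about the different data stores ([storage](https://docs.aws.amazon.com/whitepapers/latest/aws-overview/storage-services.html) and [database](https://docs.aws.amazon.com/whitepapers/latest/aws-overview/database.html) services) available for your workload on AWS that can meet your data characteristics, as outlined in [PERF01-BP01 Learn about and understand available cloud services and features](perf_architecture_understand_cloud_services_and_features.md). Some examples of AWS storage technologies and their key characteristics include:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/zh_cn/wellarchitected/latest/framework/perf_data_use_purpose_built_data_store.html)
+  If you are building a data platform, use the [modern data architecture](https://aws.amazon.com/big-data/datalakes-and-analytics/modern-data-architecture/) on AWS to integrate your data lake, data warehouse, and purpose-built data stores.
+  The key questions to consider when choosing a data store for your workload are as follows:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/zh_cn/wellarchitected/latest/framework/perf_data_use_purpose_built_data_store.html)
+  Perform experiments and benchmarking in a non-production environment to identify which data store can address your workload requirements.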

## Resources
<a name="resources"></a>

 **Related documents:**
+  [Amazon EBS Volume Types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html) 
+  [Amazon EC2 Storage](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Storage.html) 
+  [Amazon EFS: Amazon EFS Performance](https://docs.aws.amazon.com/efs/latest/ug/performance.html) 
+  [Amazon FSx for Lustre Performance](https://docs.aws.amazon.com/fsx/latest/LustreGuide/performance.html) 
+  [Amazon FSx for Windows File Server Performance](https://docs.aws.amazon.com/fsx/latest/WindowsGuide/performance.html) 
+  [Amazon Glacier: Amazon Glacier Documentation](https://docs.aws.amazon.com/amazonglacier/latest/dev/introduction.html) 
+  [Amazon S3: Request Rate and Performance Considerations](https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html) 
+  [Cloud Storage with AWS](https://aws.amazon.com/products/storage/) 
+  [Amazon EBS I/O Characteristics](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ebs-io-characteristics.html) 
+  [Cloud Databases with AWS](https://aws.amazon.com/products/databases/?ref=wellarchitected) 
+  [AWS Database Caching](https://aws.amazon.com/caching/database-caching/?ref=wellarchitected) 
+  [DynamoDB Accelerator](https://aws.amazon.com/dynamodb/dax/?ref=wellarchitected) 
+  [Amazon Aurora best practices](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.BestPractices.html?ref=wellarchitected) 
+  [Amazon Redshift performance](https://docs.aws.amazon.com/redshift/latest/dg/c_challenges_achieving_high_performance_queries.html?ref=wellarchitected) 
+  [Amazon Athena top 10 performance tips](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/?ref=wellarchitected) 
+  [Amazon Redshift Spectrum best practices](https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/?ref=wellarchitected) 
+  [Amazon DynamoDB best practices](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html?ref=wellarchitected) 
+  [Choose between Amazon EC2 and Amazon RDS](https://docs.aws.amazon.com/prescriptive-guidance/latest/migration-sql-server/comparison.html) 
+  [Best Practices for Implementing Amazon ElastiCache](https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/BestPractices.html)

 **Related videos:**
+  [AWS re:Invent 2023: Improve Amazon Elastic Block Store efficiency and be more cost-efficient](https://www.youtube.com/watch?v=7-CB02rqiuw) 
+  [AWS re:Invent 2023: Optimizing storage price and performance with Amazon Simple Storage Service](https://www.youtube.com/watch?v=RxgYNrXPOLw) 
+  [AWS re:Invent 2023: Building and optimizing a data lake on Amazon Simple Storage Service](https://www.youtube.com/watch?v=mpQa_Zm1xW8) 
+  [AWS re:Invent 2022: Building modern data architectures on AWS](https://www.youtube.com/watch?v=Uk2CqEt5f0o) 
+  [AWS re:Invent 2022: Building data mesh architectures on AWS](https://www.youtube.com/watch?v=nGRvlobeM_U) 
+  [AWS re:Invent 2023: Deep dive into Amazon Aurora and its innovations](https://www.youtube.com/watch?v=je6GCOZ22lI) 
+  [AWS re:Invent 2023: Advanced data modeling with Amazon DynamoDB](https://www.youtube.com/watch?v=PVUofrFiS_A) 
+ [AWS re:Invent 2022: Modernize apps with purpose-built databases](https://www.youtube.com/watch?v=V-DiplATdi0)
+ [Amazon DynamoDB deep dive: Advanced design patterns](https://www.youtube.com/watch?v=6yqfmXiZTlM)

 **Related examples:**
+  [AWS Purpose Built Databases Workshop](https://catalog.us-east-1.prod.workshops.aws/workshops/93f64257-52be-4c12-a95b-c0a1ff3b7e2b/en-US) 
+  [Databases for Developers](https://catalog.workshops.aws/db4devs/en-US) 
+  [AWS Modern Data Architecture Immersion Day](https://catalog.us-east-1.prod.workshops.aws/workshops/32f3e732-d67d-4c63-b967-c8c5eabd9ebf/en-US) 
+  [Build a Data Mesh on AWS](https://catalog.us-east-1.prod.workshops.aws/workshops/23e6326b-58ee-4ab0-9bc7-3c8d730eb851/en-US) 
+  [Amazon S3 Examples](https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/s3-examples.html) 
+  [Optimize Data Pattern using Amazon Redshift Data Sharing](https://wellarchitectedlabs.com/sustainability/300_labs/300_optimize_data_pattern_using_redshift_data_sharing/) 
+  [Database Migrations](https://github.com/aws-samples/aws-database-migration-samples) 
+  [MS SQL Server - AWS Database Migration Service (AWS DMS) Replication Demo](https://github.com/aws-samples/aws-dms-sql-server) 
+  [Database Modernization Hands On Workshop](https://github.com/aws-samples/amazon-rds-purpose-built-workshop) 
+  [Amazon Neptune Samples](https://github.com/aws-samples/amazon-neptune-samples) 

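# PERF03-BP02 Evaluate available configuration options for data store
<a name="perf_data_evaluate_configuration_options_data_store"></a>

 Understand and evaluate the various features and configuration options available for your data stores to optimize storage space and performance for your workload.

 **Common anti-patterns:**
+  You use only one storage type, such as Amazon EBS, for all workloads.
+  You use provisioned IOPS for all workloads without real-world testing against all storage tiers.
+  You are not aware of the configuration options of your chosen data management solution.
+  You rely solely on increasing instance size without looking at other available configuration options.
+  You do not test the scaling characteristics of your data store.

 **Benefits of establishing this best practice:** By exploring and experimenting with data store configurations, you may be able to reduce infrastructure cost, improve performance, and lower the effort required to maintain your workloads.

 **Level of risk exposed if this best practice is not established:** Medium 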

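## Implementation guidance
<a name="implementation-guidance"></a>

 A workload can use one or more data stores, depending on its data storage and access requirements. To optimize performance efficiency and cost, you must evaluate data access patterns to determine the appropriate data store configurations. While you explore data store options, consider aspects such as storage options, memory, compute, read replicas, consistency requirements, connection pooling, and caching options. Experiment with these configuration options to improve performance efficiency metrics.

### Implementation steps
<a name="implementation-steps"></a>
+  Understand the current configuration of your data store (such as instance type, storage size, or database engine version).
+  Review AWS documentation and best practices to learn about recommended configuration options that can help improve the performance of your data store. Key data store options to consider are the following:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/zh_cn/wellarchitected/latest/framework/perf_data_evaluate_configuration_options_data_store.html)
+  Perform experiments and benchmarking in a non-production environment to identify which configuration option can address your workload requirements.
+  Once you have experimented, plan your migration and validate your performance metrics.
+  Use AWS monitoring tools (such as [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/)) and optimization tools (such as [Amazon S3 Storage Lens](https://aws.amazon.com/s3/storage-lens/)) to continuously optimize your data store under real-world usage patterns.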
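
The experimentation step above can be sketched as a small timing harness that runs the same operation against each candidate configuration and compares latency percentiles. This is a minimal illustration only: `percentile`, `benchmark`, and the two in-memory stand-ins for "configurations" are hypothetical names introduced here, not part of any AWS tooling, and a realistic benchmark would run against the actual data store in a non-production environment.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Return the nearest-rank percentile (pct in 0..100) of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def benchmark(operation: Callable[[], object], runs: int = 50) -> Dict[str, float]:
    """Time `operation` repeatedly and summarize the latencies in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(latencies),
        "p50": percentile(latencies, 50),
        "p99": percentile(latencies, 99),
    }


if __name__ == "__main__":
    # Hypothetical stand-ins for two candidate configurations of the same store.
    records = list(range(50_000))
    indexed = set(records)
    candidates = {
        "config_a_full_scan": lambda: 49_999 in records,
        "config_b_indexed": lambda: 49_999 in indexed,
    }
    for name, operation in candidates.items():
        print(name, benchmark(operation))
```

Reporting a tail percentile alongside the mean is a deliberate choice here: two configurations with similar average latency can differ noticeably at p99.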

## Resources
<a name="resources"></a>

 **Related documents:**
+  [Cloud Storage with AWS](https://aws.amazon.com/products/storage/?ref=wellarchitected) 
+  [Amazon EBS Volume Types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html) 
+  [Amazon EC2 Storage](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Storage.html) 
+  [Amazon EFS: Amazon EFS Performance](https://docs.aws.amazon.com/efs/latest/ug/performance.html) 
+  [Amazon FSx for Lustre Performance](https://docs.aws.amazon.com/fsx/latest/LustreGuide/performance.html) 
+  [Amazon FSx for Windows File Server Performance](https://docs.aws.amazon.com/fsx/latest/WindowsGuide/performance.html) 
+  [Amazon Glacier: Amazon Glacier Documentation](https://docs.aws.amazon.com/amazonglacier/latest/dev/introduction.html) 
+  [Amazon S3: Request Rate and Performance Considerations](https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html) 
+  [Amazon EBS I/O Characteristics](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ebs-io-characteristics.html) 
+  [Cloud Databases with AWS](https://aws.amazon.com/products/databases/?ref=wellarchitected) 
+  [AWS Database Caching](https://aws.amazon.com/caching/database-caching/?ref=wellarchitected) 
+  [DynamoDB Accelerator](https://aws.amazon.com/dynamodb/dax/?ref=wellarchitected) 
+  [Amazon Aurora best practices](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.BestPractices.html?ref=wellarchitected) 
+  [Amazon Redshift performance](https://docs.aws.amazon.com/redshift/latest/dg/c_challenges_achieving_high_performance_queries.html?ref=wellarchitected) 
+  [Amazon Athena top 10 performance tips](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/?ref=wellarchitected) 
+  [Amazon Redshift Spectrum best practices](https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/?ref=wellarchitected) 
+  [Amazon DynamoDB best practices](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html?ref=wellarchitected) 
 **Related videos:**
+  [AWS re:Invent 2023: Improve Amazon Elastic Block Store efficiency and be more cost-efficient](https://www.youtube.com/watch?v=7-CB02rqiuw) 
+  [AWS re:Invent 2023: Optimize storage price and performance with Amazon Simple Storage Service](https://www.youtube.com/watch?v=RxgYNrXPOLw) 
+  [AWS re:Invent 2023: Building and optimizing a data lake on Amazon Simple Storage Service](https://www.youtube.com/watch?v=mpQa_Zm1xW8) 
+  [AWS re:Invent 2023: What's new with AWS file storage](https://www.youtube.com/watch?v=yXIeIKlTFV0) 
+  [AWS re:Invent 2023: Dive deep into Amazon DynamoDB](https://www.youtube.com/watch?v=ld-xoehkJuU) 

 **Related examples:**
+  [AWS Purpose Built Databases Workshop](https://catalog.us-east-1.prod.workshops.aws/workshops/93f64257-52be-4c12-a95b-c0a1ff3b7e2b/en-US) 
+  [Databases for Developers](https://catalog.workshops.aws/db4devs/en-US) 
+  [AWS Modern Data Architecture Immersion Day](https://catalog.us-east-1.prod.workshops.aws/workshops/32f3e732-d67d-4c63-b967-c8c5eabd9ebf/en-US) 
+  [Amazon EBS Autoscale](https://github.com/awslabs/amazon-ebs-autoscale) 
+  [Amazon S3 Examples](https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/s3-examples.html) 
+  [Amazon DynamoDB Examples](https://github.com/aws-samples/aws-dynamodb-examples) 
+  [AWS Database migration samples](https://github.com/aws-samples/aws-database-migration-samples) 
+  [Database Modernization Workshop](https://github.com/aws-samples/amazon-rds-purpose-built-workshop) 
+  [Working with parameters on your Amazon RDS for PostgreSQL DB](https://github.com/awsdocs/amazon-rds-user-guide/blob/main/doc_source/Appendix.PostgreSQL.CommonDBATasks.Parameters.md) 

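# PERF03-BP03 Collect and record data store performance metrics
<a name="perf_data_collect_record_data_store_performance_metrics"></a>

 Track and record relevant performance metrics for your data store to understand how your data management solutions are performing. These metrics help you optimize your data store, verify that your workload requirements are met, and give you a clear overview of how the workload performs.

 **Common anti-patterns:**
+  You only search log files manually to find metrics.
+  You only publish metrics to internal tools used by your team and don't have a comprehensive picture of your workload.
+  You only use the default metrics recorded by your selected monitoring software.
+  You only review metrics when there is an issue.
+  You only monitor system-level metrics and do not capture data access or usage metrics.

 **Benefits of establishing this best practice:** Establishing a performance baseline helps you understand the normal behavior and requirements of workloads. Abnormal patterns can be identified and debugged faster, improving the performance and reliability of the data store.

 **Level of risk exposed if this best practice is not established:** High 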

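## Implementation guidance
<a name="implementation-guidance"></a>

 To monitor the performance of your data stores, you must record multiple performance metrics over a period of time. This allows you to detect anomalies and measure performance against business metrics to verify that you are meeting your workload needs.

 Metrics should include both the underlying system that supports the data store and the database itself. Underlying system metrics might include CPU utilization, memory, available disk storage, disk I/O, cache hit ratio, and network inbound and outbound metrics. Data store metrics might include transactions per second, top queries, average query rates, response times, index usage, table locks, query timeouts, and the number of open connections. This data is crucial for understanding how the workload is performing and how the data management solution is used. Use these metrics as part of a data-driven approach to tune and optimize your workload's resources.  

 Use tools, libraries, and systems that record performance measurements related to database performance.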

### Implementation steps
<a name="implementation-steps"></a>
+  Identify the key performance metrics for your data store to track.
  +  [Amazon S3 Metrics and dimensions](https://docs.aws.amazon.com/AmazonS3/latest/userguide/metrics-dimensions.html) 
  +  [Monitoring metrics in an Amazon RDS instance](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Monitoring.html) 
  +  [Monitoring DB load with Performance Insights on Amazon RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.html) 
  +  [Overview of Enhanced Monitoring](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Monitoring.OS.overview.html) 
  +  [DynamoDB Metrics and dimensions](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/metrics-dimensions.html) 
  +  [Monitoring DynamoDB Accelerator](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DAX.Monitoring.html) 
  +  [Monitoring Amazon MemoryDB with Amazon CloudWatch](https://docs.aws.amazon.com/memorydb/latest/devguide/monitoring-cloudwatch.html) 
  +  [Which Metrics Should I Monitor?](https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/CacheMetrics.WhichShouldIMonitor.html) 
  +  [Monitoring Amazon Redshift cluster performance](https://docs.aws.amazon.com/redshift/latest/mgmt/metrics.html) 
  +  [Timestream metrics and dimensions](https://docs.aws.amazon.com/timestream/latest/developerguide/metrics-dimensions.html) 
  +  [Amazon CloudWatch metrics for Amazon Aurora](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraMonitoring.Metrics.html) 
  +  [Logging and monitoring in Amazon Keyspaces (for Apache Cassandra)](https://docs.aws.amazon.com/keyspaces/latest/devguide/monitoring.html) 
  +  [Monitoring Amazon Neptune Resources](https://docs.aws.amazon.com/neptune/latest/userguide/monitoring.html) 
+  Use an approved logging and monitoring solution to collect these metrics. [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/) can collect metrics across the resources in your architecture. You can also collect and publish custom metrics to surface business or derived metrics. Use CloudWatch or third-party solutions to set alarms that indicate when thresholds are breached.
+  Check whether data store monitoring can benefit from a machine learning solution that detects performance anomalies.
  +  [Amazon DevOps Guru for Amazon RDS](https://docs.aws.amazon.com/devops-guru/latest/userguide/working-with-rds.overview.how-it-works.html) provides visibility into performance issues and makes recommendations for corrective actions.
+  Configure data retention in your monitoring and logging solution to match your security and operational goals.
  +  [Default data retention for CloudWatch metrics](https://aws.amazon.com/cloudwatch/faqs/#AWS_resource_.26_custom_metrics_monitoring) 
  +  [Default data retention for CloudWatch Logs](https://aws.amazon.com/cloudwatch/faqs/#Log_management) 
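
The idea of establishing a baseline and alarming on threshold breaches can be sketched in a few lines. The example below is an assumption-laden illustration, not a description of how CloudWatch alarms or DevOps Guru work internally: `build_baseline`, `find_anomalies`, and the "mean plus or minus k standard deviations" rule are hypothetical choices made here to show the concept on recorded metric samples.

```python
import statistics
from typing import List, Sequence, Tuple


def build_baseline(samples: Sequence[float]) -> Tuple[float, float]:
    """Summarize historical metric samples as (mean, sample standard deviation)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to build a baseline")
    return statistics.mean(samples), statistics.stdev(samples)


def find_anomalies(
    samples: Sequence[float], mean: float, stdev: float, k: float = 3.0
) -> List[int]:
    """Return the indexes of samples outside mean +/- k * stdev."""
    lower, upper = mean - k * stdev, mean + k * stdev
    return [i for i, value in enumerate(samples) if value < lower or value > upper]


if __name__ == "__main__":
    # Hypothetical query latencies in milliseconds recorded during normal operation.
    history = [10, 12, 11, 9, 10, 11, 10, 12]
    mean, stdev = build_baseline(history)
    # New observations; the third one is far outside the baseline band.
    print(find_anomalies([10, 11, 30, 9], mean, stdev))
```

A fixed statistical band like this is only a starting point; workloads with daily or weekly seasonality usually need a baseline that varies with time.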

## Resources
<a name="resources"></a>

 **Related documents:**
+  [AWS Database Caching](https://aws.amazon.com/caching/database-caching/) 
+  [Amazon Athena top 10 performance tips](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/) 
+  [Amazon Aurora best practices](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.BestPractices.html) 
+  [DynamoDB Accelerator](https://aws.amazon.com/dynamodb/dax/) 
+  [Amazon DynamoDB best practices](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html) 
+  [Amazon Redshift Spectrum best practices](https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/) 
+  [Amazon Redshift performance](https://docs.aws.amazon.com/redshift/latest/dg/c_challenges_achieving_high_performance_queries.html) 
+  [Cloud Databases with AWS](https://aws.amazon.com/products/databases/) 
+  [Amazon RDS Performance Insights](https://aws.amazon.com/rds/performance-insights/) 
 **Related videos:**
+ [AWS re:Invent 2022 - Performance monitoring with Amazon RDS and Aurora, featuring Autodesk](https://www.youtube.com/watch?v=wokRbwK4YLo)
+ [Database Performance Monitoring and Tuning with Amazon DevOps Guru for Amazon RDS](https://www.youtube.com/watch?v=cHKuVH7YGBE)
+ [AWS re:Invent 2023 - What’s new with AWS file storage](https://www.youtube.com/watch?v=yXIeIKlTFV0)
+ [AWS re:Invent 2023 - Dive deep into Amazon DynamoDB](https://www.youtube.com/watch?v=ld-xoehkJuU)
+ [AWS re:Invent 2023 - Building and optimizing a data lake on Amazon S3](https://www.youtube.com/watch?v=mpQa_Zm1xW8)
+  [Best Practices for Monitoring Redis Workloads on Amazon ElastiCache](https://www.youtube.com/watch?v=c-hTMLN35BY&ab_channel=AWSOnlineTechTalks) 

 **Related examples:**
+  [AWS Dataset Ingestion Metrics Collection Framework](https://github.com/awslabs/aws-dataset-ingestion-metrics-collection-framework) 
+  [Amazon RDS Monitoring Workshop](https://www.workshops.aws/?tag=Enhanced%20Monitoring) 
+ [AWS Purpose Built Databases Workshop](https://catalog.us-east-1.prod.workshops.aws/workshops/93f64257-52be-4c12-a95b-c0a1ff3b7e2b/en-US)

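# PERF03-BP04 Implement strategies to improve query performance in data store
<a name="perf_data_implement_strategies_to_improve_query_performance"></a>

 Implement strategies to optimize data and improve data queries to enable more scalability and efficient performance for your workload.

 **Common anti-patterns:**
+  You do not partition data in your data store.
+  You store data in only one file format in your data store.
+  You do not use indexes in your data store.

 **Benefits of establishing this best practice:** Optimizing data and query performance results in more efficiency, lower cost, and an improved user experience.

 **Level of risk exposed if this best practice is not established:** Medium 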

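## Implementation guidance
<a name="implementation-guidance"></a>

Data optimization and query tuning are critical aspects of performance efficiency in a data store, because they affect the performance and responsiveness of the entire cloud workload. Unoptimized queries consume more resources and create bottlenecks, which reduces the overall efficiency of a data store.

Data optimization covers several techniques that make data storage and access efficient and also improve query performance in a data store. Key strategies include data partitioning, data compression, and data denormalization, which help optimize data for both storage and access.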

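### Implementation steps
<a name="implementation-steps"></a>
+  Understand and analyze the critical data queries that are performed in your data store.
+  Identify the slow-running queries in your data store and use query plans to understand their current state.
  +  [Analyzing the query plan in Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/dg/c-analyzing-the-query-plan.html) 
  +  [Using EXPLAIN and EXPLAIN ANALYZE in Athena](https://docs.aws.amazon.com/athena/latest/ug/athena-explain-statement.html) 
+  Implement strategies to improve query performance. Some of the key strategies include:
  +  Using a [columnar file format](https://docs.aws.amazon.com/athena/latest/ug/columnar-storage.html) (such as Parquet or ORC).
  +  Compressing data in the data store to reduce storage space and I/O operations.
  +  Partitioning data to split it into smaller parts and reduce data scanning time.
    + [Partitioning data in Athena](https://docs.aws.amazon.com/athena/latest/ug/partitions.html)
    + [Partitions and data distribution](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.Partitions.html)
  +  Indexing data on the columns commonly used in queries.
  +  Using materialized views for frequently run queries.
    + [Understanding materialized views](https://docs.aws.amazon.com/prescriptive-guidance/latest/materialized-views-redshift/understanding-materialized-views.html)
    + [Creating materialized views in Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-overview.html)
  +  Choosing the right join operation for the query. When you join two tables, specify the larger table on the left side of the join and the smaller table on the right side of the join.
  +  Implementing a distributed caching solution to improve latency and reduce the number of database I/O operations.
  +  Performing regular maintenance such as [vacuum](https://docs.aws.amazon.com/prescriptive-guidance/latest/postgresql-maintenance-rds-aurora/autovacuum.html) operations, reindexing, and [running statistics](https://docs.aws.amazon.com/redshift/latest/dg/t_Analyzing_tables.html).
+  Experiment with and test strategies in a non-production environment.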
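
Two of the steps above, reading a query plan and indexing a commonly filtered column, can be tried locally before touching a real data store. The sketch below uses SQLite from the Python standard library purely as a stand-in engine; the `orders` table, the `idx_orders_customer` index, and the `query_plan` and `demo` helpers are hypothetical names chosen for illustration, and the plan syntax of Amazon Redshift or Athena differs from SQLite's.

```python
import sqlite3
from typing import Sequence, Tuple


def query_plan(conn: sqlite3.Connection, sql: str, params: Sequence = ()) -> str:
    """Return SQLite's human-readable plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the readable description.
    return " | ".join(row[3] for row in rows)


def demo() -> Tuple[str, str]:
    """Build a small table and return the plan before and after indexing."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(i % 100, i * 1.5) for i in range(1000)],
    )
    sql = "SELECT id, total FROM orders WHERE customer_id = ?"

    # Without an index, the filter requires scanning the whole table.
    plan_before = query_plan(conn, sql, (42,))

    # Index the column used in the WHERE clause, then ask for the plan again.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    plan_after = query_plan(conn, sql, (42,))
    conn.close()
    return plan_before, plan_after


if __name__ == "__main__":
    before, after = demo()
    print("before:", before)
    print("after: ", after)
```

Comparing the two plans shows the engine switching from a table scan to an index search, which is the kind of evidence worth collecting before rolling an index out to production.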

## Resources
<a name="resources"></a>

 **Related documents:**
+  [Amazon Aurora best practices](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.BestPractices.html?ref=wellarchitected) 
+  [Amazon Redshift performance](https://docs.aws.amazon.com/redshift/latest/dg/c_challenges_achieving_high_performance_queries.html?ref=wellarchitected) 
+  [Amazon Athena top 10 performance tips](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/?ref=wellarchitected) 
+  [AWS Database Caching](https://aws.amazon.com/caching/database-caching/?ref=wellarchitected) 
+  [Best Practices for Implementing Amazon ElastiCache](https://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/BestPractices.html) 
+  [Partitioning data in Athena](https://docs.aws.amazon.com/athena/latest/ug/partitions.html) 
 **Related videos:**
+ [AWS re:Invent 2023 - AWS storage cost-optimization best practices](https://www.youtube.com/watch?v=8LVKNHcA6RY)
+ [AWS re:Invent 2022 - Performance monitoring with Amazon RDS and Aurora, featuring Autodesk](https://www.youtube.com/watch?v=wokRbwK4YLo)
+  [Optimize Amazon Athena Queries with New Query Analysis Tools](https://www.youtube.com/watch?v=7JUyTqglmNU&ab_channel=AmazonWebServices) 

 **Related examples:**
+ [AWS Purpose Built Databases Workshop](https://catalog.us-east-1.prod.workshops.aws/workshops/93f64257-52be-4c12-a95b-c0a1ff3b7e2b/en-US)

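# PERF03-BP05 Implement data access patterns that utilize caching
<a name="perf_data_access_patterns_caching"></a>

 Implement access patterns that can benefit from caching data for fast retrieval of frequently accessed data.

 **Common anti-patterns:**
+  You cache data that changes frequently.
+  You rely on cached data as if it is durably stored and always available.
+  You don't consider the consistency of your cached data.
+  You don't monitor the efficiency of your caching implementation.

 **Benefits of establishing this best practice:** Storing data in a cache can improve read latency, read throughput, user experience, and overall efficiency, as well as reduce costs.

 **Level of risk exposed if this best practice is not established:** Medium 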

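## Implementation guidance
<a name="implementation-guidance"></a>

 A cache is a software or hardware component that stores data so that future requests for the same data can be served faster or more efficiently. If the data stored in a cache is lost, it can be reconstructed by repeating an earlier calculation or fetching it from another data store.

 Data caching can be one of the most effective strategies to improve overall application performance and reduce the burden on your underlying primary data sources. Data can be cached at multiple levels in the application, such as within the application making remote calls (known as *client-side caching*) or by using a fast secondary service to store the data (known as *remote caching*).

 **Client-side caching** 

 With client-side caching, each client (an application or service that queries the backend data store) can store the results of its own queries locally for a specified amount of time. Checking the local client cache first reduces the number of requests sent across the network to the data store. If the results are not present, the application queries the data store and stores those results locally. This pattern allows each client to store data in the closest possible location (the client itself), resulting in the lowest possible latency. Clients can also continue to serve some queries when the backend data store is unavailable, which increases the availability of the overall system.

 One disadvantage of this approach is that when multiple clients are involved, they may each store the same cached data locally. This results in both duplicate storage usage and data inconsistency between those clients. One client might cache the results of a query, and one minute later another client can run the same query and get a different result.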
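
The client-side pattern described above (check the local cache, fall back to the data store, keep the result for a fixed time) can be sketched as follows. `LocalTTLCache`, `get_or_load`, and the `loader` callback are hypothetical names introduced for this illustration; the injectable `clock` parameter is an assumption added only to make expiry easy to test.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class LocalTTLCache:
    """In-process cache that keeps each entry for a fixed time-to-live."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps key -> (expiry timestamp, cached value).
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        """Return the cached value, or call `loader` and cache its result."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # Fresh local hit: no request to the data store.
        value = loader(key)  # Miss or expired: query the backend data store.
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    def load_from_backend(key):
        # Stand-in for a real query against the backend data store.
        print("querying backend for", key)
        return key.upper()

    cache = LocalTTLCache(ttl_seconds=60)
    print(cache.get_or_load("user:1", load_from_backend))  # queries backend
    print(cache.get_or_load("user:1", load_from_backend))  # served locally
```

Because every client holds its own copy, two clients running this code can return different values for the same key until their entries expire, which is the inconsistency the paragraph above warns about.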

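 **Remote caching** 

 To solve the issue of duplicate data between clients, a fast external service, or *remote cache*, can be used to store the queried data. Instead of checking a local data store, each client checks the remote cache before querying the backend data store. This strategy allows for more consistent responses between clients, better efficiency in stored data, and a higher volume of cached data, because the storage space scales independently of clients.

 The disadvantage of a remote cache is that the overall system may see higher latency, because an additional network hop is required to check the remote cache. Client-side caching can be used alongside remote caching to form a multi-level cache that improves latency.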
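
The multi-level arrangement mentioned above can be sketched as a lookup that tries the local cache, then the remote cache, then the backend, filling the faster tiers on the way back. This is a simplified illustration under stated assumptions: `read_through` is a hypothetical helper, plain dictionaries stand in for the local cache, the remote cache service, and the backend data store, and expiry (TTL) is omitted for brevity.

```python
from typing import Any, Dict, Hashable, Tuple


def read_through(
    key: Hashable,
    local: Dict[Hashable, Any],
    remote: Dict[Hashable, Any],
    backend: Dict[Hashable, Any],
) -> Tuple[Any, str]:
    """Return (value, tier that served it), filling faster tiers on a miss."""
    if key in local:
        return local[key], "local"
    if key in remote:
        # Remote hit: copy into the local tier so the next read avoids the network hop.
        local[key] = remote[key]
        return local[key], "remote"
    # Miss in both caches: read the backend (raises KeyError if the key is unknown).
    value = backend[key]
    remote[key] = value
    local[key] = value
    return value, "backend"


if __name__ == "__main__":
    local, remote, backend = {}, {}, {"product:7": {"name": "widget"}}
    print(read_through("product:7", local, remote, backend))  # served by backend
    print(read_through("product:7", local, remote, backend))  # served locally
```

In a real deployment each tier would apply its own expiry, typically with a shorter TTL on the local tier to limit how long clients can disagree.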

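### Implementation steps
<a name="implementation-steps"></a>
+  Identify databases, APIs, and network services that could benefit from caching. Services that have heavy read workloads, have a high read-to-write ratio, or are expensive to scale are candidates for caching.
  +  [Database Caching](https://aws.amazon.com/caching/database-caching/) 
  +  [Enabling API caching to enhance responsiveness](https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-caching.html) 
+  Identify the type of caching strategy that best fits your access pattern.
  +  [Caching strategies](https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/Strategies.html) 
  +  [AWS Caching Solutions](https://aws.amazon.com/caching/aws-caching/) 
+  Follow [Caching Best Practices](https://aws.amazon.com/caching/best-practices/) for your data store.
+  Configure a cache invalidation strategy, such as a time-to-live (TTL), for all data, balancing data freshness against pressure on the backend data store.
+  Enable features such as automatic connection retries, exponential backoff, client-side timeouts, and connection pooling in the client, if available, because they can improve performance and reliability.
  +  [Best practices: Redis clients and Amazon ElastiCache for Redis](https://aws.amazon.com/blogs/database/best-practices-redis-clients-and-amazon-elasticache-for-redis/) 
+  Monitor the cache hit rate with a goal of 80% or higher. Lower values may indicate insufficient cache size or an access pattern that does not benefit from caching.
  +  [Which metrics should I monitor?](https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/CacheMetrics.WhichShouldIMonitor.html)
  +  [Best practices for monitoring Redis workloads on Amazon ElastiCache](https://www.youtube.com/watch?v=c-hTMLN35BY) 
  +  [Monitoring best practices with Amazon ElastiCache (Redis OSS) using Amazon CloudWatch](https://aws.amazon.com/blogs/database/monitoring-best-practices-with-amazon-elasticache-for-redis-using-amazon-cloudwatch/) 
+  Implement [data replication](https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/Replication.Redis.Groups.html) to offload reads to multiple instances and improve data read performance and availability.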
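
The hit-rate check in the steps above reduces to simple arithmetic: hits divided by total lookups, compared against the 80% goal. The sketch below assumes you already have hit and miss counters from your cache's monitoring; `hit_ratio` and `meets_target` are hypothetical helper names, not part of any AWS SDK.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Return hits / (hits + misses), or 0.0 when there were no lookups."""
    total = hits + misses
    if total == 0:
        return 0.0
    return hits / total


def meets_target(hits: int, misses: int, target: float = 0.80) -> bool:
    """Check the observed hit ratio against the target (80% by default)."""
    return hit_ratio(hits, misses) >= target


if __name__ == "__main__":
    # Hypothetical counters sampled over the same time window.
    for hits, misses in [(800, 200), (700, 300)]:
        status = "ok" if meets_target(hits, misses) else "investigate"
        print(f"hits={hits} misses={misses} ratio={hit_ratio(hits, misses):.2f} {status}")
```

A ratio below the target is a prompt to investigate rather than a verdict: the cache may be undersized, the TTL too short, or the access pattern simply not repetitive enough to benefit from caching.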

## Resources
<a name="resources"></a>

 **Related documents:**
+  [Using the Amazon ElastiCache Well-Architected Lens](https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/WellArchitechtedLens.html) 
+  [Monitoring best practices with Amazon ElastiCache (Redis OSS) using Amazon CloudWatch](https://aws.amazon.com/blogs/database/monitoring-best-practices-with-amazon-elasticache-for-redis-using-amazon-cloudwatch/) 
+  [Which Metrics Should I Monitor?](https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/CacheMetrics.WhichShouldIMonitor.html) 
+  [Performance at Scale with Amazon ElastiCache](https://docs.aws.amazon.com/whitepapers/latest/scale-performance-elasticache/scale-performance-elasticache.html) whitepaper 
+  [Caching challenges and strategies](https://aws.amazon.com/builders-library/caching-challenges-and-strategies/) 

 **Related videos:**
+  [Amazon ElastiCache Learning Path](https://pages.awscloud.com/GLB-WBNR-AWS-OTT-2021_LP_0003-DAT_AmazonElastiCache.html) 
+ [AWS re:Invent 2020 - Design for success with Amazon ElastiCache best practices](https://www.youtube.com/watch?v=_4SkEy6r-C4)
+ [AWS re:Invent 2023 - [LAUNCH] Introducing Amazon ElastiCache Serverless](https://www.youtube.com/watch?v=YYStP97pbXo)
+ [AWS re:Invent 2022 - 5 great ways to reimagine your data layer with Redis](https://www.youtube.com/watch?v=CD1kvauvKII)
+ [AWS re:Invent 2021 - Deep dive on Amazon ElastiCache (Redis OSS)](https://www.youtube.com/watch?v=QEKDpToureQ)

 **Related examples:**
+  [Boosting MySQL database performance with Amazon ElastiCache for Redis](https://aws.amazon.com/getting-started/hands-on/boosting-mysql-database-performance-with-amazon-elasticache-for-redis/)