

# Architecture selection
<a name="architecture-selection"></a>


| AOSPERF01: How do you plan, monitor, and optimize your sharding strategy? | 
| --- | 
|   | 

 Ensuring the effectiveness of your sharding strategy in OpenSearch Service involves planning, monitoring, and ongoing optimization. Sharding is a critical aspect of OpenSearch Service's distributed architecture, and a well-designed strategy can significantly improve the performance, scalability, and reliability of your domain.


| AOSPERF02: How do you monitor and optimize resource allocation for your OpenSearch instances? | 
| --- | 
|   | 

 Optimizing resource utilization in your OpenSearch domain instances contributes to the efficiency, cost-effectiveness, stability, reliability, and overall performance of your search and analytics environment.

**Topics**
+ [AOSPERF01-BP01 Maintain shard sizes at recommended ranges](aosperf01-bp01.md)
+ [AOSPERF01-BP02 Check shard-to-CPU ratio](aosperf01-bp02.md)
+ [AOSPERF01-BP03 Check the number of shards per GiB of heap memory](aosperf01-bp03.md)
+ [AOSPERF02-BP01 Implement processor utilization monitoring](aosperf02-bp01.md)
+ [AOSPERF02-BP02 Implement Java memory utilization monitoring](aosperf02-bp02.md)

# AOSPERF01-BP01 Maintain shard sizes at recommended ranges
<a name="aosperf01-bp01"></a>

 Keep shard sizes within the recommended range of 10 GiB to 50 GiB. 

 **Level of risk exposed if this best practice is not established:** High 

 **Desired outcome:** Shards fall within the recommended size range of 10 GiB to 50 GiB for efficient indexing, querying, and relocation performance. 

 **Benefits of establishing this best practice:** 
+  Reduced CPU and memory utilization 
+  Enhanced domain stability by avoiding shard contention and node overload 

## Implementation guidance
<a name="implementation-guidance-29"></a>

 Shard sizes depend on the workload, and the distribution of shards to data nodes significantly influences a domain's performance. 
+  Target a shard size of 10 GiB to 30 GiB for *search-intensive* workloads, and 30 GiB to 50 GiB for *log analytics* and *time-series* data processing. Use `GET _cat/shards` to view shard sizes. 
+  If your indices have excessively small shards (less than 10 GiB in size), consider reindexing the data with a reduced number of shards for that index to boost performance and reduce CPU utilization. You can reindex the data from the source into a new index, or use the `_reindex` API to copy data from an existing index to a new one within the same domain. 
+  Verify the shard-to-Java-heap-memory ratio: you should have no more than 25 shards per GiB of Java heap. 
+  Verify that you don't have more than 1,000 shards per data node. 
+  Implement Index State Management (ISM) policies to roll over your indices when shards reach a certain size. 
+  Consider OpenSearch's [aliasing](https://opensearch.org/docs/latest/im-plugin/index-alias/) capability, which you can use to quickly update your indices without requiring modifications to your application. 
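The shard-size check above can be scripted. The following is a minimal, illustrative sketch (the `shards_outside_range` helper is a hypothetical name, not an official tool) that scans the JSON returned by `GET _cat/shards?format=json&bytes=b` and flags shards outside the recommended range:

```python
# Sketch: flag shards outside the recommended 10-50 GiB size range.
# Input mirrors the response of `GET _cat/shards?format=json&bytes=b`,
# where each entry carries "index", "shard", and "store" (size in bytes).

GIB = 1024 ** 3

def shards_outside_range(shards, min_gib=10, max_gib=50):
    """Return (index, shard, size_gib) tuples for shards outside the range."""
    flagged = []
    for s in shards:
        store = s.get("store")
        if store is None:  # unassigned shards report no store size
            continue
        size_gib = int(store) / GIB
        if size_gib < min_gib or size_gib > max_gib:
            flagged.append((s["index"], s["shard"], round(size_gib, 1)))
    return flagged
```

Shards flagged at the low end are candidates for reindexing into fewer shards; shards at the high end are candidates for an ISM rollover policy.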

## Resources
<a name="resources-28"></a>
+  [Sharding strategy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/bp.html#bp-sharding-strategy) 
+  [Index aliases](https://opensearch.org/docs/latest/im-plugin/index-alias/) 

# AOSPERF01-BP02 Check shard-to-CPU ratio
<a name="aosperf01-bp02"></a>

 Allocate a minimum of 1.5 vCPUs per shard to support efficient processing and scalability. 

 **Level of risk exposed if this best practice is not established:** High 

 **Desired outcome:** The shard-to-CPU ratio is maintained at a minimum of 1.5 vCPUs per shard. This may vary by use case: search use cases might require a higher ratio, while log analytics use cases may work with a smaller one. 

 **Benefits of establishing this best practice:** 
+  A minimum shard-to-CPU ratio of 1.5 helps support each shard with sufficient processing power, reducing the risk of bottlenecks and promoting performance during indexing or search operations. 
+  By provisioning at least 1.5 vCPUs per shard, you can scale your cluster more efficiently, either by scaling up (increasing instance size) or out (adding more data nodes), to meet changing workload demands without sacrificing performance. 

## Implementation guidance
<a name="implementation-guidance-30"></a>

 During indexing or search operations, each shard should use at least one vCPU to process requests. To guarantee sufficient processing capacity, maintain a minimum shard-to-vCPU ratio of 1:1.5. For example, if you have 10 shards in your cluster, you should provision at least 15 vCPUs (10 x 1.5 = 15). You can scale up or out to meet the needs of your cluster based on the total number of shards and their density on your data nodes. 
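The arithmetic above can be captured in a small helper. This is an illustrative sketch under the 1:1.5 ratio stated above; the function names are hypothetical:

```python
import math

def min_vcpus_for_shards(active_shards, vcpus_per_shard=1.5):
    """Minimum total vCPUs to provision for a given active shard count."""
    return math.ceil(active_shards * vcpus_per_shard)

def min_data_nodes(active_shards, vcpus_per_node, vcpus_per_shard=1.5):
    """Minimum data nodes of a given instance size to satisfy the ratio."""
    needed = min_vcpus_for_shards(active_shards, vcpus_per_shard)
    return math.ceil(needed / vcpus_per_node)
```

For example, 10 shards need at least 15 vCPUs, which on a hypothetical 8-vCPU instance type means at least 2 data nodes.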

 In addition to maintaining the recommended shard-to-CPU ratio, align shard sizes with your workload requirements; for more detail, see [AOSPERF01-BP01](aosperf01-bp01.md). 

## Resources
<a name="resources-29"></a>
+  [Sharding strategy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/bp.html#bp-sharding-strategy) 

# AOSPERF01-BP03 Check the number of shards per GiB of heap memory
<a name="aosperf01-bp03"></a>

 Prevent inefficient resource utilization by keeping each data node at or below 25 shards per GiB of Java heap memory. 

 **Level of risk exposed if this best practice is not established:** High 

 **Desired outcome:** Each GiB of a data node's heap memory supports no more than 25 shards, optimizing resource utilization and enhancing query performance. 

 **Benefits of establishing this best practice:** 
+  Improved query performance 
+  Reduced risk of running out of memory while running queries 

## Implementation guidance
<a name="implementation-guidance-31"></a>

 Amazon OpenSearch Service allocates approximately half of each data node's physical memory, up to 32 GB, for the Java Virtual Machine (JVM) and OpenSearch. In a system with 16 GB of memory, this amounts to roughly 8 GB reserved for the JVM. 

 Additionally, the total number of shards a node can handle is directly related to its JVM heap memory size. To maintain an optimal balance, strive for a shard-to-JVM-heap-memory ratio of approximately 25:1. For example, with 16 GB of memory (and thus 8 GB allocated to the JVM), you should aim for no more than 200 shards per node (25 x 8 = 200). 
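The heap and shard arithmetic above can be sketched as a small helper. This is illustrative only, with hypothetical names; the heap estimate follows the "half of physical memory, up to 32 GB" rule stated above:

```python
def max_shards_per_node(node_memory_gib, shards_per_heap_gib=25, heap_cap_gib=32):
    """Estimate the maximum shards a data node should hold.

    OpenSearch Service reserves roughly half of a node's physical memory
    for the JVM heap, capped at about 32 GiB, and the guideline is at most
    25 shards per GiB of heap.
    """
    heap_gib = min(node_memory_gib / 2, heap_cap_gib)
    return int(heap_gib * shards_per_heap_gib)
```

For example, a 16 GiB node (8 GiB heap) should hold at most 200 shards, while a 128 GiB node is still limited to 800 shards because the heap is capped at 32 GiB.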

 For further guidance on optimal shard sizing, see [AOSPERF01-BP01](aosperf01-bp01.md) and [AOSPERF01-BP02](aosperf01-bp02.md). 

## Resources
<a name="resources-30"></a>
+  [Sharding strategy](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/bp.html#bp-sharding-strategy) 
+  [Choosing the number of shards](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/bp-sharding.html) 
+  [Sizing OpenSearch Service domains](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/sizing-domains.html) 

# AOSPERF02-BP01 Implement processor utilization monitoring
<a name="aosperf02-bp01"></a>

 Keep CPU usage under 75% to maintain efficient resource utilization and prevent potential performance issues. 

 **Level of risk exposed if this best practice is not established**: High 

 **Desired outcome**: CPU utilization remains below 75%, ensuring efficient resource utilization and minimizing potential performance issues. 

 **Benefits of establishing this best practice:** 
+  **Prevent performance issues:** Keeping CPU utilization below 75% helps identify and address potential issues related to high CPU usage, minimizing the risk of indexing or query processing bottlenecks. 
+  **Maintain scalability:** By keeping CPU utilization within a safe threshold, you can prevent scalability issues and ensure that your OpenSearch Service domains can handle increased traffic or workloads without degradation. 

## Implementation guidance
<a name="implementation-guidance-32"></a>

 Maintain the performance and efficiency of your OpenSearch Service domain by closely monitoring CPU utilization. Track CPU usage over the past 14 days and verify that the average remains below 75 percent. By doing so, you can proactively identify and address potential issues related to high CPU utilization, such as indexing or query processing bottlenecks, before they affect your domain's performance or scalability. 

### Implementation steps
<a name="implementation-steps-17"></a>
+  Log in to the AWS Management Console. 
+  Navigate to the Amazon OpenSearch Service console. 
+  Select your OpenSearch Service domain name. 
+  Choose the **Cluster health** tab. 
+  Navigate to the Data nodes box. 
+  Choose the **CPU utilization** graph. 
+  Adjust the time range to `2w` and the statistic to `Average`. 
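Instead of reading the console graph, you could retrieve the same metric programmatically (for example, with CloudWatch `GetMetricStatistics` on the `CPUUtilization` metric using the `Average` statistic) and evaluate it with a small helper. The sketch below is illustrative, and `cpu_exceeds_threshold` is a hypothetical name; the datapoint shape mirrors what CloudWatch returns for the `Average` statistic:

```python
def cpu_exceeds_threshold(datapoints, threshold=75.0):
    """Return True if mean CPU utilization over the window exceeds the threshold.

    `datapoints` mirrors the Datapoints list returned by CloudWatch
    GetMetricStatistics with Statistics=['Average'] (one entry per period).
    """
    if not datapoints:  # no data: nothing to flag
        return False
    mean = sum(dp["Average"] for dp in datapoints) / len(datapoints)
    return mean > threshold
```

A check like this could run on a schedule and alert when the 14-day average crosses 75%, mirroring the manual console review above.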

 If your average CPU utilization exceeds 75% over a 14-day period, investigate further by checking for dedicated leader nodes and reviewing the best practices outlined in [AOSPERF01](https://docs.aws.amazon.com/wellarchitected/latest/amazon-opensearch-service-lens/architecture-selection.html). 

## Resources
<a name="resources-31"></a>
+  [Monitoring OpenSearch cluster metrics with Amazon CloudWatch](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/managedomains-cloudwatchmetrics.html#managedomains-cloudwatchmetrics-cluster-metrics) 
+  [Choosing the number of shards](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/bp-sharding.html) 
+  [Sizing OpenSearch Service domains](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/sizing-domains.html) 

# AOSPERF02-BP02 Implement Java memory utilization monitoring
<a name="aosperf02-bp02"></a>

 Keep `JVMMemoryPressure` below 85% to maintain efficient memory utilization and prevent potential performance issues. 

 **Level of risk exposed if this best practice is not established:** High 

 **Desired outcome:** The `JVMMemoryPressure` metric remains below 85%, ensuring efficient memory utilization and minimizing potential performance issues or cluster blocks. 

 **Benefits of establishing this best practice:** Closely monitoring `JVMMemoryPressure` helps you identify and address any potential issues related to high memory pressure, such as excessive garbage collection or memory leaks, before they impact your domain's performance, availability and scalability. 

## Implementation guidance
<a name="implementation-guidance-33"></a>

 To maintain optimal performance and memory utilization for your OpenSearch Service domains, closely monitor the `JVMMemoryPressure` metric. Specifically, track this metric to verify that it remains below the recommended threshold of 85 percent. This helps you identify and address any potential issues related to high memory pressure, such as excessive garbage collection or out of memory problems, before they impact your domain's performance or scalability. 

### Implementation steps
<a name="implementation-steps-18"></a>
+  Log in to the AWS Management Console. 
+  Navigate to the Amazon OpenSearch Service console. 
+  Select your OpenSearch Service domain name. 
+  Choose the **Cluster health** tab. 
+  Navigate to the Data nodes box. 
+  Choose the `JVMMemoryPressure` graph. 
+  Adjust the time range to `2w` and the statistic to `Maximum`. 
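As with CPU utilization, this check could be automated by retrieving `JVMMemoryPressure` datapoints (for example, via CloudWatch `GetMetricStatistics` with the `Maximum` statistic) and evaluating them with a helper like the following sketch; `jvm_pressure_breached` is a hypothetical name:

```python
def jvm_pressure_breached(datapoints, threshold=85.0):
    """Return True if any 'Maximum' datapoint exceeds the threshold.

    The Maximum statistic is used because short spikes in JVMMemoryPressure
    can trigger problems even when the average over the window stays low.
    """
    return any(dp["Maximum"] > threshold for dp in datapoints)
```

Alarming on the maximum rather than the average matches the console steps above, which set the statistic to `Maximum`.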

 If your `JVMMemoryPressure` metric exceeds 85% over a 14-day period, investigate further by reviewing the best practices outlined in [AOSPERF01](https://docs.aws.amazon.com/wellarchitected/latest/amazon-opensearch-service-lens/architecture-selection.html), as well as the common issues outlined in [How do I troubleshoot high JVM memory pressure on my OpenSearch Service cluster](https://repost.aws/knowledge-center/opensearch-high-jvm-memory-pressure). 

## Resources
<a name="resources-32"></a>
+  [How do I troubleshoot high JVM memory pressure on my OpenSearch Service cluster](https://repost.aws/knowledge-center/opensearch-high-jvm-memory-pressure) 
+  [Troubleshooting Amazon OpenSearch Service](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/handling-errors.html#troubleshooting-cluster-block) 
+  [Choosing the number of shards](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/bp-sharding.html) 
+  [Sizing OpenSearch Service domains](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/sizing-domains.html) 