

# Process and culture
<a name="process-and-culture"></a>


| AOSPERF06: How do you balance throughput and latency through bulk request size? | 
| --- | 
|   | 

 Determining the optimal bulk request size for data ingestion in OpenSearch involves balancing factors such as network latency, indexing rate, document size, batching frequency, refresh intervals, and buffer size. 

**Topics**
+ [AOSPERF06-BP01 Identify index refresh controls for optimal ingestion performance](aosperf06-bp01.md)
+ [AOSPERF06-BP02 Evaluate bulk request size](aosperf06-bp02.md)
+ [AOSPERF06-BP03 Implement HTTP compression](aosperf06-bp03.md)
+ [AOSPERF06-BP04 Evaluate `filter_path` criteria](aosperf06-bp04.md)

# AOSPERF06-BP01 Identify index refresh controls for optimal ingestion performance
<a name="aosperf06-bp01"></a>

 Improve indexing throughput and speed by increasing the `refresh_interval` value to 30 seconds or more. 

 **Level of risk exposed if this best practice is not established:** Medium 

 **Desired outcome**: The `refresh_interval` value is set to 30 seconds or more, which can increase indexing throughput and speed. 

 **Benefits of establishing this best practice:** Adjusting the `refresh_interval` optimizes index write performance. Less frequent refreshes allow ongoing writes to proceed more efficiently, which usually results in faster indexing. 

## Implementation guidance
<a name="implementation-guidance-42"></a>

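 A refresh operation makes all updates to an index accessible for search. The default refresh interval is one second, which means that OpenSearch Service performs a refresh every second while index writes are ongoing. Each refresh creates a new segment, so frequent refreshes consume CPU and I/O that could otherwise go to indexing. 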
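To check a `refresh_interval` value against the 30-second guidance in code, the following Python sketch converts the setting string to seconds. The function names are hypothetical, and the sketch assumes the value is either `-1` (automatic refresh disabled) or an integer followed by one of the common time units `ms`, `s`, `m`, `h`, or `d`. It treats `-1` as meeting the guidance, because disabled refreshes avoid periodic refresh work entirely.

```python
import re
from typing import Optional

# Assumed subset of time units accepted for refresh_interval in this sketch.
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}


def refresh_interval_seconds(value: str) -> Optional[float]:
    """Return the interval in seconds, or None if automatic refresh is disabled ("-1")."""
    if value.strip() == "-1":
        return None
    match = re.fullmatch(r"\s*(\d+)\s*(ms|s|m|h|d)\s*", value)
    if match is None:
        raise ValueError(f"unrecognized refresh_interval: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


def meets_ingest_guidance(value: str, minimum_seconds: float = 30) -> bool:
    """True if refreshes are disabled or happen at most every `minimum_seconds`."""
    seconds = refresh_interval_seconds(value)
    return seconds is None or seconds >= minimum_seconds
```

For example, `meets_ingest_guidance("1s")` is `False` for the one-second default, while `"30s"`, `"1m"`, and `"-1"` all return `True`.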

### Implementation steps
<a name="implementation-steps-27"></a>
+  Check the current `refresh_interval` value for your index. 

```
GET /<index-name>/_settings/index.refresh_interval
```
+  Change the `refresh_interval` value to 30s or more. 

```
PUT /sample_data/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}
```
+ You can also disable automatic refreshes by setting `"refresh_interval": "-1"`.
+  If automatic refreshes are disabled, you can refresh an index manually by running `POST /<index-name>/_refresh`. 
+  If you load new data into your domain through a batch process, consider disabling automatic refreshes just before the batch process begins and re-enabling them after it completes. 
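The batch-load pattern in the last step can be sketched in Python as a context manager. This is a minimal illustration, not a client library API: `send` is a hypothetical transport that you would implement with your own HTTP client (including authentication for your domain), and `restore_to` is an assumed parameter holding the interval to re-apply after the load. The `finally` block restores refreshes even if the batch fails, then issues a manual refresh so that the loaded data becomes searchable immediately.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional

# Hypothetical transport: send(method, path, json_body) performs one REST call
# against the domain (including authentication) and returns the parsed response.
Send = Callable[[str, str, Optional[dict]], dict]


@contextmanager
def refresh_disabled(send: Send, index: str, restore_to: str = "30s") -> Iterator[None]:
    """Disable automatic refreshes on `index` for the duration of a batch load."""
    send("PUT", f"/{index}/_settings", {"index": {"refresh_interval": "-1"}})
    try:
        yield
    finally:
        # Runs even if the batch load raises, so refreshes are never left disabled.
        send("PUT", f"/{index}/_settings", {"index": {"refresh_interval": restore_to}})
        # Make everything loaded during the batch searchable right away.
        send("POST", f"/{index}/_refresh", None)
```

Usage might look like `with refresh_disabled(my_send, "sample_data"): load_batch()`, where `my_send` and `load_batch` are placeholders for your own transport and loader. Set `restore_to` to the value the index had before the load if it differs from 30 seconds.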

## Resources
<a name="resources-40"></a>
+  [Optimize OpenSearch Refresh Interval](https://opensearch.org/blog/optimize-refresh-interval/) 

# AOSPERF06-BP02 Evaluate bulk request size
<a name="aosperf06-bp02"></a>

 Improve indexing performance and domain stability by adjusting bulk request size to a recommended range of 3-5 MiB. 

 **Level of risk exposed if this best practice is not established:** Low 

 **Desired outcome**: The bulk request size is within the recommended starting range of 3 – 5 MiB, ensuring efficient data ingestion and query performance. 

 **Benefits of establishing this best practice:** 
+  Improved indexing performance 
+  Enhanced domain stability by avoiding unnecessary resource utilization 

## Implementation guidance
<a name="implementation-guidance-43"></a>

 The optimal bulk size depends on your data and domain configuration, but a recommended starting point is three to five MiB per request. 

 To optimize bulk requests, define a target batch size in bytes within your application. Instead of aiming for an exact number of documents per batch, accumulate documents until the total request size falls within this range. 
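Size-based batching can be sketched in Python as follows. `bulk_bodies` is a hypothetical helper, not part of an OpenSearch client, and it assumes documents arrive as dicts and are written with plain `index` actions. It builds newline-delimited `_bulk` request bodies and starts a new body whenever adding the next document would push the current one past the byte target (5 MiB by default, the top of the recommended starting range). A single document larger than the target is emitted on its own.

```python
import json
from typing import Iterable, Iterator

MIB = 1024 * 1024


def bulk_bodies(docs: Iterable[dict], index: str, target_bytes: int = 5 * MIB) -> Iterator[bytes]:
    """Yield NDJSON _bulk bodies whose size stays at or below target_bytes when possible."""
    buffer: list[bytes] = []
    size = 0
    for doc in docs:
        # Each document contributes an action line and a source line, both newline-terminated.
        action = json.dumps({"index": {"_index": index}}).encode("utf-8") + b"\n"
        source = json.dumps(doc).encode("utf-8") + b"\n"
        entry = action + source
        # Flush before this entry would overflow a non-empty batch.
        if buffer and size + len(entry) > target_bytes:
            yield b"".join(buffer)
            buffer, size = [], 0
        buffer.append(entry)
        size += len(entry)
    if buffer:
        yield b"".join(buffer)
```

Measuring encoded bytes rather than counting documents keeps each request near the target even when document sizes vary widely. You can lower `target_bytes` toward 3 MiB if larger requests put pressure on the domain.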

 If multiple systems send batches to OpenSearch, either use a service such as [Amazon OpenSearch Ingestion](https://aws.amazon.com/opensearch-service/features/ingestion/), which can queue the batches for more effective ingestion, or make sure that the combined size of the batches from all systems that ingest data into your OpenSearch Service domain at the same time does not exceed the recommended range. 

## Resources
<a name="resources-41"></a>
+  [Optimize bulk request size and compression](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/bp.html#bp-perf-bulk) 

# AOSPERF06-BP03 Implement HTTP compression
<a name="aosperf06-bp03"></a>

 Reduce request and response payload sizes by enabling GZIP compression. 

 **Level of risk exposed if this best practice is not established:** Low 

 **Desired outcome**: GZIP compression is enabled in the OpenSearch Service domain, reducing the size of data being ingested or requested. 

 **Benefits of establishing this best practice:** Reduced payload size for requests and responses. 

## Implementation guidance
<a name="implementation-guidance-44"></a>

 In OpenSearch Service domains, you can use GZIP compression to compress HTTP requests and responses. Compression reduces payload size, which lowers bandwidth usage and transfer latency, at the cost of some CPU time on the client and the domain to compress and decompress the data. 

### Implementation steps
<a name="implementation-steps-28"></a>
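+  To use compression, include the following headers in your application's HTTP requests. `Content-Encoding: gzip` indicates that the request body is GZIP-compressed, and `Accept-Encoding: gzip` asks the domain to return a GZIP-compressed response: 

```
Accept-Encoding: gzip
Content-Encoding: gzip
```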
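The headers can be applied in Python with only the standard library, as in the following sketch. It compresses a `_bulk` body with `gzip.compress` and builds a `urllib.request.Request` that carries both headers. The endpoint is a placeholder, the helper names are hypothetical, and authentication (for example, SigV4 signing or fine-grained access control credentials) is omitted even though a real request to your domain needs it. Because `urllib` does not decompress responses automatically, a second helper decodes the response body when the domain returns it GZIP-encoded.

```python
import gzip
import urllib.request
from typing import Optional


def gzip_bulk_request(endpoint: str, ndjson_body: bytes) -> urllib.request.Request:
    """Build (but do not send) a gzip-compressed POST to the _bulk API."""
    return urllib.request.Request(
        f"{endpoint}/_bulk",
        data=gzip.compress(ndjson_body),
        headers={
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",  # the request body is gzip-compressed
            "Accept-Encoding": "gzip",   # ask the domain to gzip the response
        },
        method="POST",
    )


def decode_response_body(raw: bytes, content_encoding: Optional[str]) -> bytes:
    """Decompress a response body if the server reports gzip encoding."""
    if content_encoding and content_encoding.lower() == "gzip":
        return gzip.decompress(raw)
    return raw
```

After sending the request with `urllib.request.urlopen`, pass `response.read()` and `response.headers.get("Content-Encoding")` to `decode_response_body` to recover the JSON response.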

## Resources
<a name="resources-42"></a>
+  [Compressing HTTP requests in Amazon OpenSearch Service](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/gzip.html) 

# AOSPERF06-BP04 Evaluate `filter_path` criteria
<a name="aosperf06-bp04"></a>

 Reduce response sizes and minimize download traffic by using the `filter_path` parameter to return only the response fields that you need. 

 **Level of risk exposed if this best practice is not established:** Low 

 **Desired outcome**: Reduced response size. 

 **Benefits of establishing this best practice:** Reduced download traffic. 

## Implementation guidance
<a name="implementation-guidance-45"></a>

 Responses from the `_index` and `_bulk` APIs carry extensive information, which is valuable for troubleshooting or implementing retry logic. However, that information consumes bandwidth: for example, indexing a 32-byte document results in a 339-byte response (including headers). 

 This response size might seem small, but if you index 1,000,000 documents per day (approximately 11.6 documents per second), 339 bytes per response adds up to 10.17 GB of download traffic per 30-day month. 
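The arithmetic behind these figures can be reproduced in a few lines of Python, assuming a 30-day month and decimal gigabytes (1 GB = 10^9 bytes). You can substitute your own response size and daily document count to estimate the traffic for your workload.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def docs_per_second(docs_per_day: int) -> float:
    """Average indexing rate implied by a daily document count."""
    return docs_per_day / SECONDS_PER_DAY


def monthly_download_gb(response_bytes: int, docs_per_day: int, days: int = 30) -> float:
    """Response traffic per month in decimal gigabytes, one response per document."""
    return response_bytes * docs_per_day * days / 10**9


print(f"{docs_per_second(1_000_000):.1f} docs/s")             # 11.6 docs/s
print(f"{monthly_download_gb(339, 1_000_000):.2f} GB/month")  # 10.17 GB/month
```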

### Implementation steps
<a name="implementation-steps-29"></a>
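+  Use `filter_path` with the APIs that you call frequently, such as the `_index` and `_bulk` APIs. The first request below returns only the `result` and `_shards.total` fields, and the second uses a leading `-` to exclude the `took` field and every field under `items.index` whose name starts with an underscore. For example: 

```
PUT opensearch-domain/<index-name>/_doc/1?filter_path=result,_shards.total
POST opensearch-domain/_bulk?filter_path=-took,-items.index._*
```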
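To see what an inclusion filter such as `filter_path=result,_shards.total` keeps, the following Python sketch applies a simplified version of it locally to an illustrative response body shaped like the one the `_doc` API returns. This is not OpenSearch's implementation: `filter_paths` is a hypothetical helper that supports only exact dotted paths, with no wildcards or `-` exclusions. In practice the domain applies the filter server side, so the omitted fields are never sent over the network.

```python
import json
from typing import Any


def filter_paths(response: dict, paths: list[str]) -> dict:
    """Keep only the listed dotted paths (simplified: no wildcards or exclusions)."""
    result: dict[str, Any] = {}
    for path in paths:
        keys = path.split(".")
        node: Any = response
        for key in keys:
            if not isinstance(node, dict) or key not in node:
                break  # path not present in this response; skip it
            node = node[key]
        else:
            # Rebuild the nested structure for the kept path in the output.
            target = result
            for key in keys[:-1]:
                target = target.setdefault(key, {})
            target[keys[-1]] = node
    return result


# Illustrative index response; values are examples only.
sample = {
    "_index": "sample_data", "_id": "1", "_version": 1, "result": "created",
    "_shards": {"total": 2, "successful": 1, "failed": 0},
    "_seq_no": 0, "_primary_term": 1,
}
filtered = filter_paths(sample, ["result", "_shards.total"])
print(json.dumps(filtered))  # {"result": "created", "_shards": {"total": 2}}
```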

## Resources
<a name="resources-43"></a>
+  [Reducing response size](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/indexing.html#indexing-size) 