

# Amazon OpenSearch Serverless collections

A *collection* in Amazon OpenSearch Serverless is a logical grouping of one or more indexes that represent an analytics workload. OpenSearch Serverless automatically manages and tunes the collection, requiring minimal manual input.

**Topics**
+ [Managing Amazon OpenSearch Serverless collections](serverless-manage.md)
+ [Working with vector search collections](serverless-vector-search.md)
+ [Using data lifecycle policies with Amazon OpenSearch Serverless](serverless-lifecycle.md)
+ [Using the AWS SDKs to interact with Amazon OpenSearch Serverless](serverless-sdk.md)
+ [Using CloudFormation to create Amazon OpenSearch Serverless collections](serverless-cfn.md)
+ [Backing up collections using snapshots](serverless-snapshots.md)
+ [Zstandard Codec Support in Amazon OpenSearch Serverless](serverless-zstd-compression.md)
+ [Save Storage by Using Derived Source](serverless-derived-source.md)

# Managing Amazon OpenSearch Serverless collections

A *collection* in Amazon OpenSearch Serverless is a logical grouping of one or more indexes that represent an analytics workload. OpenSearch Serverless automatically manages and tunes the collection, requiring minimal manual input.

**Topics**
+ [Configuring permissions for collections](serverless-collection-permissions.md)
+ [Automatic semantic enrichment for Serverless](serverless-semantic-enrichment.md)
+ [Creating collections](serverless-create.md)
+ [Accessing OpenSearch Dashboards](serverless-dashboards.md)
+ [Viewing collections](serverless-list.md)
+ [Deleting collections](serverless-delete.md)

# Configuring permissions for collections


OpenSearch Serverless uses the following AWS Identity and Access Management (IAM) permissions for creating and managing collections. You can specify IAM conditions to restrict users to specific collections.
+ `aoss:CreateCollection` – Create a collection.
+ `aoss:ListCollections` – List collections in the current account.
+ `aoss:BatchGetCollection` – Get details about one or more collections.
+ `aoss:UpdateCollection` – Modify a collection.
+ `aoss:DeleteCollection` – Delete a collection.

The following sample identity-based access policy provides the minimum permissions necessary for a user to manage a single collection named `Logs`:

```
{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Sid":"AllowsManagingLogsCollections",
         "Effect":"Allow",
         "Action":[
            "aoss:CreateCollection",
            "aoss:ListCollections",
            "aoss:BatchGetCollection",
            "aoss:UpdateCollection",
            "aoss:DeleteCollection",
            "aoss:CreateAccessPolicy",
            "aoss:CreateSecurityPolicy"
         ],
         "Resource":"*",
         "Condition":{
            "StringEquals":{
               "aoss:collection":"Logs"
            }
         }
      }
   ]
}
```

`aoss:CreateAccessPolicy` and `aoss:CreateSecurityPolicy` are included because encryption, network, and data access policies are required in order for a collection to function properly. For more information, see [Identity and Access Management for Amazon OpenSearch Serverless](security-iam-serverless.md).
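If you build policy documents like this programmatically before attaching them with IAM, a small helper can keep the action list and the `aoss:collection` condition key consistent. The following Python sketch is illustrative only; the helper name is ours, and it simply reproduces the statement shown above for an arbitrary collection name.

```python
import json

def collection_admin_policy(collection_name: str) -> dict:
    """Return an identity-based policy that restricts the collection
    management actions to a single collection name via the
    aoss:collection condition key."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowsManagingNamedCollection",
                "Effect": "Allow",
                "Action": [
                    "aoss:CreateCollection",
                    "aoss:ListCollections",
                    "aoss:BatchGetCollection",
                    "aoss:UpdateCollection",
                    "aoss:DeleteCollection",
                    "aoss:CreateAccessPolicy",
                    "aoss:CreateSecurityPolicy",
                ],
                "Resource": "*",
                "Condition": {"StringEquals": {"aoss:collection": collection_name}},
            }
        ],
    }

print(json.dumps(collection_admin_policy("Logs"), indent=2))
```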

**Note**  
If you're creating the first collection in your account, you also need the `iam:CreateServiceLinkedRole` permission. For more information, see [Using service-linked roles to create OpenSearch Serverless collections](serverless-service-linked-roles.md).

# Automatic semantic enrichment for Serverless


## Introduction


The automatic semantic enrichment feature can help improve search relevance by up to 20% over lexical search. Automatic semantic enrichment eliminates the undifferentiated heavy lifting of managing your own ML (machine learning) model infrastructure and integration with the search engine. The feature is available for all three serverless collection types: Search, Time Series, and Vector.

## What is semantic search?


 Traditional search engines rely on word-to-word matching (referred to as lexical search) to find results for queries. Although this works well for specific queries such as television model numbers, it struggles with more abstract searches. For example, when searching for "shoes for the beach," a lexical search merely matches individual words "shoes," "beach," "for," and "the" in catalog items, potentially missing relevant products like "water-resistant sandals" or "surf footwear" that don't contain the exact search terms.

 Semantic search returns query results that incorporate not just keyword matching, but the intent and contextual meaning of the user's search. For example, if a user searches for "how to treat a headache," a semantic search system might return the following results: 
+ Migraine remedies
+ Pain management techniques
+ Over-the-counter pain relievers 

## Model details and performance benchmark


 While this feature handles the technical complexities behind the scenes without exposing the underlying model, we provide transparency through a brief model description and benchmark results to help you make informed decisions about feature adoption in your critical workloads.

 Automatic semantic enrichment uses a service-managed, pre-trained sparse model that works effectively without requiring custom fine-tuning. The model analyzes the fields you specify, expanding them into sparse vectors based on learned associations from diverse training data. The expanded terms and their significance weights are stored in native Lucene index format for efficient retrieval. We’ve optimized this process using [document-only mode,](https://docs.opensearch.org/docs/latest/vector-search/ai-search/neural-sparse-with-pipelines/#step-1a-choose-the-search-mode) where encoding happens only during data ingestion. Search queries are merely tokenized rather than processed through the sparse model, making the solution both cost-effective and performant. 
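To make document-only mode concrete, here is a toy Python sketch of the idea: at ingestion, a document is expanded into a token-to-weight map, and at search time the query is only tokenized and scored against the stored weights. The expansion map and weights below are invented for illustration; the real service uses a learned sparse model.

```python
# Document-only mode, as a toy: the document text is expanded into
# weighted tokens once, at ingestion; queries are merely tokenized.
doc_expansion = {
    # Invented ingestion-time expansion of "water-resistant sandals"
    "sandals": 1.8, "shoes": 1.2, "beach": 0.9, "water": 0.7, "footwear": 1.1,
}

def score(query: str, expansion: dict) -> float:
    """Sum the stored weights of the query tokens found in the expansion."""
    return sum(expansion.get(token, 0.0) for token in query.lower().split())

# Lexical matching finds no overlap with the raw text "water-resistant
# sandals", but the sparse expansion matches "shoes" and "beach".
print(score("shoes for the beach", doc_expansion))
```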

Our performance validation during feature development used the [MS MARCO](https://huggingface.co/datasets/BeIR/msmarco) passage retrieval dataset, featuring passages averaging 334 characters. For relevance scoring, we measured average Normalized Discounted Cumulative Gain (NDCG) for the first 10 search results (ndcg@10) on the [BEIR](https://github.com/beir-cellar/beir) benchmark for English content and average ndcg@10 on MIRACL for multilingual content. We assessed latency through client-side, 90th-percentile (p90) measurements and search response p90 `took` values. These benchmarks provide baseline performance indicators for both search relevance and response times. Here are the key benchmark numbers:
+ English – Relevance improved by 20% over lexical search. P90 search latency also dropped by 7.7% relative to lexical search (BM25 is 26 ms; automatic semantic enrichment is 24 ms).
+ Multilingual – Relevance improved by 105% over lexical search, while P90 search latency increased by 38.4% relative to lexical search (BM25 is 26 ms; automatic semantic enrichment is 36 ms).
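The latency percentages above follow directly from the quoted p90 numbers, as this short calculation shows:

```python
bm25_p90 = 26     # ms, lexical (BM25) baseline
english_p90 = 24  # ms, automatic semantic enrichment, English
multi_p90 = 36    # ms, automatic semantic enrichment, multilingual

english_change = (bm25_p90 - english_p90) / bm25_p90 * 100  # 2/26, about 7.7% lower
multi_change = (multi_p90 - bm25_p90) / bm25_p90 * 100      # 10/26, about 38% higher

print(f"English: {english_change:.1f}% lower; multilingual: {multi_change:.1f}% higher")
```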

Given the unique nature of each workload, we encourage you to evaluate this feature in your development environment using your own benchmarking criteria before making implementation decisions.

## Languages Supported


The feature supports English. The model also supports Arabic, Bengali, Chinese, Finnish, French, Hindi, Indonesian, Japanese, Korean, Persian, Russian, Spanish, Swahili, and Telugu.

## Set up an automatic semantic enrichment index for serverless collections


You can set up an index with automatic semantic enrichment enabled for your text fields through the console, the APIs, or CloudFormation templates during new index creation. To enable it for an existing index, you need to recreate the index with automatic semantic enrichment enabled for its text fields.

Console experience – In the AWS Management Console, select a collection, and then choose the **Create index** button at the top of the page. The create index workflow includes options to define automatic semantic enrichment fields. A single index can combine automatic semantic enrichment fields for English and multilingual content with lexical fields.

![Console options for defining automatic semantic enrichment fields](http://docs.aws.amazon.com/opensearch-service/latest/developerguide/images/ase-console-exp-serverless.png)


API experience – To create an automatic semantic enrichment index using the AWS Command Line Interface (AWS CLI), use the `create-index` command:

```
aws opensearchserverless create-index \
--id [collection_id] \
--index-name [index_name] \
--index-schema [index_body]
```

In the following example index-schema, the *title_semantic* field has a field type set to *text* and has the parameter *semantic_enrichment* set to status *ENABLED*. Setting the *semantic_enrichment* parameter enables automatic semantic enrichment on the *title_semantic* field. You can use the *language_options* field to specify either *english* or *multi-lingual*.

```
aws opensearchserverless create-index \
--id XXXXXXXXX \
--index-name 'product-catalog' \
--index-schema '{
    "mappings": {
        "properties": {
            "product_id": {
                "type": "keyword"
            },
            "title_semantic": {
                "type": "text",
                "semantic_enrichment": {
                    "status": "ENABLED",
                    "language_options": "english"
                }
            },
            "title_non_semantic": {
                "type": "text"
            }
        }
    }
}'
```
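Because the `--index-schema` value is a single quoted JSON string, it can be less error-prone to build the mapping in code and serialize it rather than hand-writing the JSON. A minimal Python sketch, reusing the field names from the example above (the helper function is ours):

```python
import json

def semantic_text_field(language: str = "english") -> dict:
    """Mapping snippet for a text field with automatic semantic enrichment."""
    return {
        "type": "text",
        "semantic_enrichment": {"status": "ENABLED", "language_options": language},
    }

index_schema = {
    "mappings": {
        "properties": {
            "product_id": {"type": "keyword"},
            "title_semantic": semantic_text_field("english"),
            "title_non_semantic": {"type": "text"},
        }
    }
}

# The serialized string is what --index-schema expects.
print(json.dumps(index_schema))
```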

To describe the created index, use the following command:

```
aws opensearchserverless get-index \
--id [collection_id] \
--index-name [index_name]
```

You can also use CloudFormation templates (type `AWS::OpenSearchServerless::Index`) to create an automatic semantic enrichment index during collection provisioning, as well as after the collection is created.

## Data ingestion and search


Once you've created an index with automatic semantic enrichment enabled, the feature works automatically during the data ingestion process; no additional configuration is required.

Data ingestion: When you add documents to your index, the system automatically:
+ Analyzes the text fields you designated for semantic enrichment
+ Generates semantic encodings using the OpenSearch Service managed sparse model
+ Stores these enriched representations alongside your original data

This process uses OpenSearch's built-in ML connectors and ingest pipelines, which are created and managed automatically behind the scenes.

Search: The semantic enrichment data is already indexed, so queries run efficiently without invoking the ML model again. This means you get improved search relevance with no additional search latency overhead.

## Configuring permissions for automatic semantic enrichment


Before creating an automatic semantic enrichment index, you need to configure the required permissions. This section explains the permissions needed and how to set them up.

### IAM policy permissions


Use the following AWS Identity and Access Management (IAM) policy to grant the necessary permissions for working with automatic semantic enrichment:


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AutomaticSemanticEnrichmentPermissions",
            "Effect": "Allow",
            "Action": [
                "aoss:CreateIndex",
                "aoss:GetIndex",
                "aoss:UpdateIndex",
                "aoss:DeleteIndex",
                "aoss:APIAccessAll"
            ],
            "Resource": "*"
        }
    ]
}
```


**Key permissions**  
+ The `aoss:*Index` permissions enable index management
+ The `aoss:APIAccessAll` permission allows OpenSearch API operations
+ To restrict permissions to a specific collection, replace `"Resource": "*"` with the collection's ARN

### Configure data access permissions


To set up an index for automatic semantic enrichment, you must have appropriate data access policies that grant permission to access index, pipeline, and model collection resources. For more information about data access policies, see [Data access control for Amazon OpenSearch Serverless](serverless-data-access.md). For the procedure to configure a data access policy, see [Creating data access policies (console)](serverless-data-access.md#serverless-data-access-console).

#### Data access permissions


```
[
    {
        "Description": "Create index permission",
        "Rules": [
            {
                "ResourceType": "index",
                "Resource": ["index/collection_name/*"],
                "Permission": [
                  "aoss:CreateIndex", 
                  "aoss:DescribeIndex",
                  "aoss:UpdateIndex",
                  "aoss:DeleteIndex"
                ]
            }
        ],
        "Principal": [
            "arn:aws:iam::account_id:role/role_name"
        ]
    },
    {
        "Description": "Create pipeline permission",
        "Rules": [
            {
                "ResourceType": "collection",
                "Resource": ["collection/collection_name"],
                "Permission": [
                  "aoss:CreateCollectionItems",
                  "aoss:DescribeCollectionItems"
                ]
            }
        ],
        "Principal": [
            "arn:aws:iam::account_id:role/role_name"
        ]
    },
    {
        "Description": "Create model permission",
        "Rules": [
            {
                "ResourceType": "model",
                "Resource": ["model/collection_name/*"],
                "Permission": ["aoss:CreateMLResource"]
            }
        ],
        "Principal": [
            "arn:aws:iam::account_id:role/role_name"
        ]
    }
]
```

#### Network access permissions


To allow service APIs to access private collections, you must configure network policies that permit the required access between the service API and the collection. For more information about network policies, see [Network access for Amazon OpenSearch Serverless](serverless-network.md).

```
[
   {
      "Description":"Enable automatic semantic enrichment in a private collection",
      "Rules":[
         {
            "ResourceType":"collection",
            "Resource":[
               "collection/collection_name"
            ]
         }
      ],
      "AllowFromPublic":false,
      "SourceServices":[
         "aoss.amazonaws.com"
      ]
   }
]
```

**To configure network access permissions for a private collection**

1. Sign in to the OpenSearch Service console at [https://console.aws.amazon.com/aos/home](https://console.aws.amazon.com/aos/home).

1. In the left navigation, choose *Network policies*. Then do one of the following:
   + Choose an existing policy name and choose *Edit*
   + Choose *Create network policy* and configure the policy details

1. In the *Access type* area, choose *Private (recommended)*, and then select *AWS service private access*.

1. In the search field, choose *Service*, and then choose *aoss.amazonaws.com*.

1. In the *Resource type* area, select the *Enable access to OpenSearch endpoint* box.

1. For *Search collection(s), or input specific prefix term(s)*, in the search field, select *Collection Name*. Then enter or select the name of the collections to associate with the network policy.

1. Choose *Create* for a new network policy or *Update* for an existing network policy.

## Query Rewrites


Automatic semantic enrichment converts your existing `match` queries to semantic search queries without requiring query modifications. If a match query is part of a compound query, the system traverses your query structure, finds match queries, and replaces them with neural sparse queries. Currently, the feature only supports replacing `match` queries, whether standalone or part of a compound query; `multi_match` is not supported. The feature supports replacing nested match queries in all compound queries: `bool`, `boosting`, `constant_score`, `dis_max`, `function_score`, and `hybrid`.
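Conceptually, the rewrite can be pictured as a recursive walk over the query tree that substitutes a sparse query wherever a match clause targets an enriched field. The following Python sketch only simulates that behavior for illustration; the actual rewrite happens inside the service, and the substituted query shape here is a placeholder.

```python
def rewrite(query: dict, enriched_fields: set) -> dict:
    """Recursively replace match clauses on enriched fields with a
    sparse-query placeholder, leaving all other clauses untouched."""
    if not isinstance(query, dict):
        return query
    out = {}
    for key, value in query.items():
        if key == "match" and isinstance(value, dict):
            field = next(iter(value))
            if field in enriched_fields:
                out["neural_sparse"] = value  # illustrative substitution
                continue
        if isinstance(value, dict):
            out[key] = rewrite(value, enriched_fields)
        elif isinstance(value, list):
            out[key] = [rewrite(v, enriched_fields) for v in value]
        else:
            out[key] = value
    return out

original = {
    "query": {
        "bool": {
            "must": [{"match": {"title_semantic": {"query": "beach shoes"}}}],
            "filter": [{"term": {"in_stock": True}}],
        }
    }
}
rewritten = rewrite(original, {"title_semantic"})
```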

## Limitations of automatic semantic enrichment


Automatic semantic enrichment is most effective when applied to small-to-medium sized fields containing natural language content, such as movie titles, product descriptions, reviews, and summaries. Although semantic search enhances relevance for most use cases, it might not be optimal for certain scenarios. Consider the following limitations when deciding whether to implement automatic semantic enrichment for your specific use case.
+ Very long documents – The current sparse model processes only the first 8,192 tokens of each document for English. For multilingual documents, it’s 512 tokens. For lengthy articles, consider implementing document chunking to ensure complete content processing.
+ Log analysis workloads – Semantic enrichment significantly increases index size, which might be unnecessary for log analysis where exact matching typically suffices. The additional semantic context rarely improves log search effectiveness enough to justify the increased storage requirements. 
+ Derived Source – Automatic semantic enrichment is not compatible with the Derived Source feature.
+ Throttling – Indexing inference requests are currently capped at 100 TPS for OpenSearch Serverless. This is a soft limit; reach out to AWS Support for higher limits.
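For the long-document limitation, a simple ingestion-side chunker can keep each piece under the token budget before indexing. This sketch approximates tokens as whitespace-separated words, which is only a rough proxy for the model's real tokenizer:

```python
def chunk_text(text: str, max_tokens: int = 8192) -> list:
    """Split text into chunks of at most max_tokens whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]

# 20,000 words exceed the 8,192-token budget, so index three chunks
# (for example, as separate documents sharing a parent ID).
chunks = chunk_text("word " * 20000, max_tokens=8192)
print(len(chunks))  # 3
```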

## Pricing


OpenSearch Serverless bills automatic semantic enrichment based on OpenSearch Compute Units (OCUs) consumed during sparse vector generation at indexing time. You're charged only for actual usage during indexing. You can monitor this consumption using the Amazon CloudWatch metric `SemanticSearchOCU`. For details about model token limits, volume throughput per OCU, and a sample calculation, see [OpenSearch Service Pricing](https://aws.amazon.com/opensearch-service/pricing/).

# Creating collections


You can use the console or the AWS CLI to create a serverless collection. These steps cover how to create a *search* or *time series* collection. To create a *vector search* collection, see [Working with vector search collections](serverless-vector-search.md). 

**Topics**
+ [Create a collection (console)](serverless-create-console.md)
+ [Create a collection (CLI)](serverless-create-cli.md)

# Create a collection (console)


Use the procedures in this section to create a collection by using the AWS Management Console. These steps cover how to create a *search* or *time series* collection. To create a *vector search* collection, see [Working with vector search collections](serverless-vector-search.md). 

**Topics**
+ [Configure collection settings](#serverless-create-console-step-2)
+ [Configure additional search fields](#serverless-create-console-step-3)

## Configure collection settings


Use the following procedure to configure your collection settings.

**To configure collection settings using the console**

1. Navigate to the Amazon OpenSearch Service console at [https://console.aws.amazon.com/aos/home/](https://console.aws.amazon.com/aos/home/).

1. Expand **Serverless** in the left navigation pane and choose **Collections**. 

1. Choose **Create collection**.

1. Provide a name and description for the collection. The name must meet the following criteria:
   + Is unique to your account and AWS Region
   + Contains only lowercase letters a-z, the numbers 0–9, and the hyphen (-)
   + Contains between 3 and 32 characters

1. Choose a collection type:
   + **Time series** – Log analytics segment that focuses on analyzing large volumes of semi-structured, machine-generated data. At least 24 hours of data is stored on hot indexes, and the rest remains in warm storage.
   + **Search** – Full-text search that powers applications in your internal networks and internet-facing applications. All search data is stored in hot storage to ensure fast query response times.
**Note**  
Choose this option if you are enabling automatic semantic enrichment, as described in [Configure automatic semantic enrichment](#serverless-create-console-step-3-semantic-enrichment-fields).
   + **Vector search** – Semantic search on vector embeddings that simplifies vector data management. Powers machine learning (ML) augmented search experiences and generative AI applications such as chatbots, personal assistants, and fraud detection.

   For more information, see [Choosing a collection type](serverless-overview.md#serverless-usecase).

1. For **Deployment type**, choose the redundancy setting for your collection. By default, each collection has redundancy, which means that the indexing and search OpenSearch Compute Units (OCUs) each have their own standby replicas in a different Availability Zone. For development and testing purposes, you can choose to disable redundancy, which reduces the number of OCUs in your collection to two. For more information, see [How it works](serverless-overview.md#serverless-process).

1. For **Security**, choose **Standard create**.

1. For **Encryption**, choose an AWS KMS key to encrypt your data with. OpenSearch Serverless notifies you if the collection name that you entered matches a pattern defined in an encryption policy. You can choose to keep this match or override it with unique encryption settings. For more information, see [Encryption in Amazon OpenSearch Serverless](serverless-encryption.md).

1. For **Network access settings**, configure network access for the collection.
   + For **Access type**, select public or private. 

     If you choose private, specify which VPC endpoints and AWS services can access the collection.
     + **VPC endpoints for access** – Specify one or more VPC endpoints to allow access through. To create a VPC endpoint, see [Data plane access through AWS PrivateLink](serverless-vpc.md).
     + **AWS service private access** – Select one or more supported services to allow access to.
   + For **Resource type**, select whether users can access the collection through its *OpenSearch* endpoint (to make API calls through cURL, Postman, and so on), through the *OpenSearch Dashboards* endpoint (to work with visualizations and make API calls through the console), or both.
**Note**  
AWS service private access applies only to the OpenSearch endpoint, not to the OpenSearch Dashboards endpoint.

   OpenSearch Serverless notifies you if the collection name that you entered matches a pattern defined in a network policy. You can choose to keep this match or override it with custom network settings. For more information, see [Network access for Amazon OpenSearch Serverless](serverless-network.md).

1. (Optional) Add one or more tags to the collection. For more information, see [Tagging Amazon OpenSearch Serverless collections](tag-collection.md).

1. Choose **Next**.

## Configure additional search fields


The options you see on page two of the create collection workflow depend on the type of collection you're creating. This section describes how to configure additional search fields for each collection type, as well as how to configure automatic semantic enrichment. Skip any section that doesn't apply to your collection type.

**Topics**
+ [Configure automatic semantic enrichment](#serverless-create-console-step-3-semantic-enrichment-fields)
+ [Configure time series search fields](#serverless-create-console-step-3-time-series-fields)
+ [Configure lexical search fields](#serverless-create-console-step-3-lexical-fields)
+ [Configure vector search fields](#serverless-create-console-step-3-vector-search-fields)

### Configure automatic semantic enrichment


When you create or edit a collection, you can configure automatic semantic enrichment, which simplifies semantic search implementation and capabilities in Amazon OpenSearch Service. Semantic search returns query results that incorporate not just keyword matching, but the intent and contextual meaning of the user's search. For more information, see [Automatic semantic enrichment for Serverless](serverless-semantic-enrichment.md).

**To configure automatic semantic enrichment**

1. In the **Index details** section, for **Index name**, specify a name.

1. In the **Automatic semantic enrichment fields** section, choose **Add semantic search field**.

1. In the **Input field name for semantic enrichment** field, enter the name of a field that you want to enrich.

1. **Data type** is **Text**. You can't change this.

1. For **Language**, choose either **English** or **Multilingual**.

1. Choose **Add field**.

1. After you finish configuring optional fields for your collection, choose **Next**. Review your changes and choose **Submit** to create the collection.

### Configure time series search fields


The options in the **Time series search fields** section pertain to time series data and data streams. For more information about these subjects, see [Managing time-series data in Amazon OpenSearch Service with data streams](data-streams.md).

**To configure time series search fields**

1. In the **Time series search fields** section, choose **Add time series field**.

1. For **Field name**, enter a name.

1. For **Data type**, choose a type from the list.

1. Choose **Add field**.

1. After you finish configuring optional fields for your collection, choose **Next**. Review your changes and choose **Submit** to create the collection.

### Configure lexical search fields


Lexical search seeks an exact match between a search query and indexed terms or keywords.

**To configure lexical search fields**

1. In the **Lexical search fields** section, choose **Add search field**.

1. For **Field name**, enter a name.

1. For **Data type**, choose a type from the list.

1. Choose **Add field**.

1. After you finish configuring optional fields for your collection, choose **Next**. Review your changes and choose **Submit** to create the collection.

### Configure vector search fields


**To configure vector search fields**

1. In the **Vector fields** section, choose **Add vector field**.

1. For **Field name**, enter a name.

1. For **Engine**, choose a type from the list.

1. Enter the number of dimensions.

1. For **Distance Metric**, choose a type from the list.

1. After you finish configuring optional fields for your collection, choose **Next**.

1. Review your changes and choose **Submit** to create the collection.

# Create a collection (CLI)


Use the procedures in this section to create an OpenSearch Serverless collection using the AWS CLI. 

**Topics**
+ [Before you begin](#serverless-create-cli-before-you-begin)
+ [Creating a collection](#serverless-create-cli-creating)
+ [Creating a collection with an automatic semantic enrichment index](#serverless-create-cli-automatic-semantic-enrichment)

## Before you begin


Before you create a collection using the AWS CLI, use the following procedure to create required policies for the collection.

**Note**  
In each of the following procedures, when you specify a name for a collection, the name must meet the following criteria:  
+ Is unique to your account and AWS Region
+ Contains only lowercase letters a-z, the numbers 0–9, and the hyphen (-)
+ Contains between 3 and 32 characters
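The character-set and length criteria (though not uniqueness, which only the service can check) can be validated locally before you call the CLI. A small Python sketch encoding just the rules listed above:

```python
import re

# Lowercase letters a-z, digits 0-9, and hyphens; 3 to 32 characters.
NAME_PATTERN = re.compile(r"^[a-z0-9-]{3,32}$")

def is_valid_collection_name(name: str) -> bool:
    return bool(NAME_PATTERN.match(name))

print(is_valid_collection_name("logs-application"))  # True
print(is_valid_collection_name("Logs"))              # False (uppercase)
print(is_valid_collection_name("ab"))                # False (too short)
```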

**To create required policies for a collection**

1. Open the AWS CLI and run the following command to create an [encryption policy](serverless-encryption.md) with a resource pattern that matches the intended name of the collection. 

   ```
aws opensearchserverless create-security-policy \
     --name policy name \
     --type encryption --policy "{\"Rules\":[{\"ResourceType\":\"collection\",\"Resource\":[\"collection\/collection name\"]}],\"AWSOwnedKey\":true}"
   ```

   For example, if you plan to name your collection *logs-application*, you might create an encryption policy like this:

   ```
aws opensearchserverless create-security-policy \
     --name logs-policy \
     --type encryption --policy "{\"Rules\":[{\"ResourceType\":\"collection\",\"Resource\":[\"collection\/logs-application\"]}],\"AWSOwnedKey\":true}"
   ```

   If you plan to use the policy for additional collections, you can make the rule more broad, such as `collection/logs*` or `collection/*`.

1. Run the following command to configure network settings for the collection using a [network policy](serverless-network.md). You can create network policies after you create a collection, but we recommend doing it beforehand.

   ```
aws opensearchserverless create-security-policy \
     --name policy name \
     --type network --policy "[{\"Description\":\"description\",\"Rules\":[{\"ResourceType\":\"dashboard\",\"Resource\":[\"collection\/collection name\"]},{\"ResourceType\":\"collection\",\"Resource\":[\"collection\/collection name\"]}],\"AllowFromPublic\":true}]"
   ```

   Using the previous *logs-application* example, you might create the following network policy:

   ```
aws opensearchserverless create-security-policy \
     --name logs-policy \
     --type network --policy "[{\"Description\":\"Public access for logs collection\",\"Rules\":[{\"ResourceType\":\"dashboard\",\"Resource\":[\"collection\/logs-application\"]},{\"ResourceType\":\"collection\",\"Resource\":[\"collection\/logs-application\"]}],\"AllowFromPublic\":true}]"
   ```
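The escaped `--policy` strings above are easy to get wrong by hand. One option is to build the policy as a data structure and serialize it with `json.dumps`, then pass the result to the CLI. This sketch mirrors the *logs-application* network policy:

```python
import json

network_policy = [
    {
        "Description": "Public access for logs collection",
        "Rules": [
            {"ResourceType": "dashboard", "Resource": ["collection/logs-application"]},
            {"ResourceType": "collection", "Resource": ["collection/logs-application"]},
        ],
        "AllowFromPublic": True,
    }
]

# The serialized string is what the --policy argument expects.
policy_arg = json.dumps(network_policy)
print(policy_arg)
```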

## Creating a collection


The following procedure uses the [CreateCollection](https://docs.aws.amazon.com/opensearch-service/latest/ServerlessAPIReference/API_CreateCollection.html) API action to create a collection of type `SEARCH` or `TIMESERIES`. If you don't specify a collection type in the request, it defaults to `TIMESERIES`. For more information about these types, see [Choosing a collection type](serverless-overview.md#serverless-usecase). To create a *vector search* collection, see [Working with vector search collections](serverless-vector-search.md).

If your collection is encrypted with an AWS owned key, the `kmsKeyArn` is `auto` rather than an ARN.

**Important**  
After you create a collection, you won't be able to access it unless it matches a data access policy. For more information, see [Data access control for Amazon OpenSearch Serverless](serverless-data-access.md).

**To create a collection**

1. Verify that you created required policies described in [Before you begin](#serverless-create-cli-before-you-begin).

1. Run the following command. For `type`, specify either `SEARCH` or `TIMESERIES`.

   ```
aws opensearchserverless create-collection --name "collection name" --type collection type --description "description"
   ```

## Creating a collection with an automatic semantic enrichment index


Use the following procedure to create a new OpenSearch Serverless collection with an index that is configured for [automatic semantic enrichment](serverless-semantic-enrichment.md). The procedure uses the OpenSearch Serverless [CreateIndex](https://docs.aws.amazon.com/opensearch-service/latest/ServerlessAPIReference/API_CreateIndex.html) API action.

**To create a new collection with an index configured for automatic semantic enrichment**

Run the following command to create the collection and an index.

```
aws opensearchserverless create-index \
--region Region ID \
--id collection ID --index-name index name \
--index-schema \
'mapping in json'
```

Here's an example.

```
aws opensearchserverless create-index \
--region us-east-1 \
--id conversation_history --index-name conversation_history_index \
--index-schema \
'{
    "mappings": {
        "properties": {
            "age": {
                "type": "integer"
            },
            "name": {
                "type": "keyword"
            },
            "user_description": {
                "type": "text"
            },
            "conversation_history": {
                "type": "text",
                "semantic_enrichment": {
                    "status": "ENABLED",
                    // Specifies the sparse tokenizer for processing multi-lingual text
                    "language_option": "MULTI-LINGUAL", 
                    // If embedding_field is provided, the semantic embedding field will be set to the given name rather than original field name + "_embedding"
                    "embedding_field": "conversation_history_user_defined" 
                }
            },
            "book_title": {
                "type": "text",
                "semantic_enrichment": {
                    // No embedding_field is provided, so the semantic embedding field is set to "book_title_embedding"
                    "status": "ENABLED",
                    "language_option": "ENGLISH"
                }
            },
            "abstract": {
                "type": "text",
                "semantic_enrichment": {
                    // If no language_option is provided, it will be set to English.
                    // No embedding_field is provided, so the semantic embedding field is set to "abstract_embedding"
                    "status": "ENABLED" 
                }
            }
        }
    }
}'
```
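Note that JSON doesn't allow comments, so a mapping like the annotated example above must have them removed before you pass it to `--index-schema`. One way to keep the schema valid is to build it programmatically and serialize it, as in this Python sketch (the field names are taken from the example above):

```python
import json

# Build the index schema as a Python dict, then serialize it to valid,
# comment-free JSON for the --index-schema argument.
mapping = {
    "mappings": {
        "properties": {
            "conversation_history": {
                "type": "text",
                "semantic_enrichment": {
                    "status": "ENABLED",
                    "language_option": "MULTI-LINGUAL",
                    # Overrides the default "<field>_embedding" name.
                    "embedding_field": "conversation_history_user_defined",
                },
            },
            "abstract": {
                "type": "text",
                # language_option defaults to English when omitted.
                "semantic_enrichment": {"status": "ENABLED"},
            },
        }
    }
}

schema_arg = json.dumps(mapping)  # pass this string to --index-schema
```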

# Accessing OpenSearch Dashboards


After you create a collection with the AWS Management Console, you can navigate to the collection's OpenSearch Dashboards URL. You can find the Dashboards URL by choosing **Collections** in the left navigation pane and selecting the collection to open its details page. The URL takes the format `https://dashboards.us-east-1.aoss.amazonaws.com/_login/?collectionId=07tjusf2h91cunochc`. Once you navigate to the URL, you'll automatically log into Dashboards.

If you already have the OpenSearch Dashboards URL available but aren't on the AWS Management Console, calling the Dashboards URL from the browser will redirect to the console. Once you enter your AWS credentials, you'll automatically log in to Dashboards. For information about accessing collections for SAML, see [Accessing OpenSearch Dashboards with SAML](serverless-saml.md#serverless-saml-dashboards).

The OpenSearch Dashboards console timeout is one hour and isn't configurable.

**Note**  
On May 10, 2023, OpenSearch introduced a common global endpoint for OpenSearch Dashboards. You can now navigate to OpenSearch Dashboards in the browser with a URL that takes the format `https://dashboards.us-east-1.aoss.amazonaws.com/_login/?collectionId=07tjusf2h91cunochc`. To ensure backward compatibility, we'll continue to support the existing collection specific OpenSearch Dashboards endpoints with the format `https://07tjusf2h91cunochc.us-east-1.aoss.amazonaws.com/_dashboards`.

# Viewing collections


You can view the existing collections in your AWS account on the **Collections** tab of the Amazon OpenSearch Service console.

To list collections along with their IDs, send a [ListCollections](https://docs.aws.amazon.com/opensearch-service/latest/ServerlessAPIReference/API_ListCollections.html) request.

```
aws opensearchserverless list-collections
```

**Sample response**

```
{
   "collectionSummaries":[
      {
         "arn":"arn:aws:aoss:us-east-1:123456789012:collection/07tjusf2h91cunochc",
         "id":"07tjusf2h91cunochc",
         "name":"my-collection",
         "status":"CREATING"
      }
   ]
}
```

To limit the search results, use collection filters. This request filters the response to collections in the `ACTIVE` state: 

```
aws opensearchserverless list-collections --collection-filters '{ "status": "ACTIVE" }'
```
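With Boto3, the filter is passed as `collectionFilters`. The sketch below shows the service call (commented out, since it requires AWS credentials) along with an equivalent client-side filter over a `ListCollections` response:

```python
def filter_by_status(summaries, status="ACTIVE"):
    """Client-side equivalent of the --collection-filters status filter."""
    return [c for c in summaries if c.get("status") == status]

# Server-side filtering with Boto3 (requires AWS credentials):
# import boto3
# client = boto3.client("opensearchserverless")
# resp = client.list_collections(collectionFilters={"status": "ACTIVE"})

# Applied to the sample response above, whose only collection is CREATING:
sample = [
    {"id": "07tjusf2h91cunochc", "name": "my-collection", "status": "CREATING"},
]
active = filter_by_status(sample)  # empty: no ACTIVE collections yet
```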

To get more detailed information about one or more collections, including the OpenSearch endpoint and the OpenSearch Dashboards endpoint, send a [BatchGetCollection](https://docs.aws.amazon.com/opensearch-service/latest/ServerlessAPIReference/API_BatchGetCollection.html) request:

```
aws opensearchserverless batch-get-collection --ids "07tjusf2h91cunochc" "1iu5usc4rame"
```

**Note**  
You can include `--names` or `--ids` in the request, but not both.

**Sample response**

```
{
   "collectionDetails":[
      {
         "id": "07tjusf2h91cunochc",
         "name": "my-collection",
         "status": "ACTIVE",
         "type": "SEARCH",
         "description": "",
         "arn": "arn:aws:aoss:us-east-1:123456789012:collection/07tjusf2h91cunochc",
         "kmsKeyArn": "arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab",
         "createdDate": 1667446262828,
         "lastModifiedDate": 1667446300769,
         "collectionEndpoint": "https://07tjusf2h91cunochc.us-east-1.aoss.amazonaws.com",
         "dashboardEndpoint": "https://07tjusf2h91cunochc.us-east-1.aoss.amazonaws.com/_dashboards"
      },
      {
         "id": "178ukvtg3i82dvopdid",
         "name": "another-collection",
         "status": "ACTIVE",
         "type": "TIMESERIES",
         "description": "",
         "arn": "arn:aws:aoss:us-east-1:123456789012:collection/178ukvtg3i82dvopdid",
         "kmsKeyArn": "arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab",
         "createdDate": 1667446262828,
         "lastModifiedDate": 1667446300769,
         "collectionEndpoint": "https://178ukvtg3i82dvopdid.us-east-1.aoss.amazonaws.com",
         "dashboardEndpoint": "https://178ukvtg3i82dvopdid.us-east-1.aoss.amazonaws.com/_dashboards"
      }
   ],
   "collectionErrorDetails":[]
}
```
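A script that needs the endpoints from a `BatchGetCollection` response can index them by collection name. A small sketch, using a trimmed version of the sample response above:

```python
def endpoints_by_name(batch_response):
    """Map collection name -> (OpenSearch endpoint, Dashboards endpoint)."""
    return {
        d["name"]: (d["collectionEndpoint"], d["dashboardEndpoint"])
        for d in batch_response["collectionDetails"]
    }

response = {
    "collectionDetails": [
        {
            "name": "my-collection",
            "collectionEndpoint": "https://07tjusf2h91cunochc.us-east-1.aoss.amazonaws.com",
            "dashboardEndpoint": "https://07tjusf2h91cunochc.us-east-1.aoss.amazonaws.com/_dashboards",
        }
    ]
}
api_url, dashboards_url = endpoints_by_name(response)["my-collection"]
```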

# Deleting collections


Deleting a collection deletes all data and indexes in the collection. You can't recover collections after you delete them.

**To delete a collection using the console**

1. From the **Collections** panel of the Amazon OpenSearch Service console, select the collection you want to delete.

1. Choose **Delete** and confirm deletion.

To delete a collection using the AWS CLI, send a [DeleteCollection](https://docs.aws.amazon.com/opensearch-service/latest/ServerlessAPIReference/API_DeleteCollection.html) request:

```
aws opensearchserverless delete-collection --id 07tjusf2h91cunochc
```

**Sample response**

```
{
   "deleteCollectionDetail":{
      "id":"07tjusf2h91cunochc",
      "name":"my-collection",
      "status":"DELETING"
   }
}
```

# Working with vector search collections


The *vector search* collection type in OpenSearch Serverless provides a scalable, high-performing similarity search capability. It makes it easy for you to build modern machine learning (ML)-augmented search experiences and generative artificial intelligence (AI) applications without having to manage the underlying vector database infrastructure. 

Use cases for vector search collections include image searches, document searches, music retrieval, product recommendations, video searches, location-based searches, fraud detection, and anomaly detection. 

Because the vector engine for OpenSearch Serverless is powered by the [k-nearest neighbor (k-NN) search feature](https://opensearch.org/docs/latest/search-plugins/knn/index/) in OpenSearch, you get the same functionality with the simplicity of a serverless environment. The engine supports the [k-NN plugin API](https://opensearch.org/docs/latest/search-plugins/knn/api/). With these operations, you can take advantage of full-text search, advanced filtering, aggregations, geospatial queries, and nested queries for faster retrieval of data and enhanced search results.

The vector engine provides distance metrics such as Euclidean distance, cosine similarity, and dot product similarity, and can accommodate up to 16,000 dimensions. You can store fields with various data types for metadata, such as numbers, Booleans, dates, keywords, and geopoints. You can also store text fields with descriptive information to add more context to stored vectors. Colocating the data types reduces complexity, increases maintainability, and avoids data duplication, version compatibility challenges, and licensing issues. 

**Note**  
Note the following information:  
+ Amazon OpenSearch Serverless supports Faiss 16-bit scalar quantization, which converts between 32-bit floating-point and 16-bit vectors. To learn more, see [Faiss 16-bit scalar quantization](https://opensearch.org/docs/latest/search-plugins/knn/knn-vector-quantization/#faiss-16-bit-scalar-quantization). You can also use binary vectors to reduce memory costs. For more information, see [Binary vectors](https://opensearch.org/docs/latest/field-types/supported-field-types/knn-vector#binary-vectors).
+ Amazon OpenSearch Serverless supports disk-based vector search, which significantly reduces the operational costs for vector workloads in low-memory environments. For more information, see [Disk-based vector search](https://docs.opensearch.org/2.19/vector-search/optimizing-storage/disk-based-vector-search/).

## Getting started with vector search collections
Getting started

In this tutorial, you complete the following steps to store, search, and retrieve vector embeddings in real time:

1. [Configure permissions](#serverless-vector-permissions)

1. [Create a collection](#serverless-vector-create)

1. [Upload and search data](#serverless-vector-index)

1. [Delete the collection](#serverless-vector-delete)

### Step 1: Configure permissions


To complete this tutorial (and to use OpenSearch Serverless in general), you must have the correct AWS Identity and Access Management (IAM) permissions. In this tutorial, you create a collection, upload and search data, and then delete the collection.

Your user or role must have an attached [identity-based policy](security-iam-serverless.md#security-iam-serverless-id-based-policies) with the following minimum permissions:

------
#### [ JSON ]

****  

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "aoss:CreateCollection",
        "aoss:ListCollections",
        "aoss:BatchGetCollection",
        "aoss:DeleteCollection",
        "aoss:CreateAccessPolicy",
        "aoss:ListAccessPolicies",
        "aoss:UpdateAccessPolicy",
        "aoss:CreateSecurityPolicy",
        "iam:ListUsers",
        "iam:ListRoles"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
```

------

For more information about OpenSearch Serverless IAM permissions, see [Identity and Access Management for Amazon OpenSearch Serverless](security-iam-serverless.md).

### Step 2: Create a collection


A *collection* is a group of OpenSearch indexes that work together to support a specific workload or use case.

**To create an OpenSearch Serverless collection**

1. Open the Amazon OpenSearch Service console at [https://console.aws.amazon.com/aos/home](https://console.aws.amazon.com/aos/home ).

1. Choose **Collections** in the left navigation pane and choose **Create collection**.

1. Name the collection **housing**.

1. For collection type, choose **Vector search**. For more information, see [Choosing a collection type](serverless-overview.md#serverless-usecase).

1. Under **Deployment type**, clear **Enable redundancy (active replicas)**. This creates a collection in development or testing mode, and reduces the number of OpenSearch Compute Units (OCUs) in your collection to two. If you want to create a production environment in this tutorial, leave the check box selected. 

1. Under **Security**, select **Easy create** to streamline your security configuration. All data in the vector engine is encrypted in transit and at rest by default. The vector engine supports fine-grained IAM permissions so that you can define who can create, update, and delete encryption policies, network policies, collections, and indexes.

1. Choose **Next**.

1. Review your collection settings and choose **Submit**. Wait several minutes for the collection status to become `Active`.

### Step 3: Upload and search data


An *index* is a collection of documents with a common data schema that provides a way for you to store, search, and retrieve your vector embeddings and other fields. You can create and upload data to indexes in an OpenSearch Serverless collection by using the [Dev Tools](https://opensearch.org/docs/latest/dashboards/dev-tools/index-dev/) console in OpenSearch Dashboards, or an HTTP tool such as [Postman](https://www.postman.com/downloads/) or [awscurl](https://github.com/okigan/awscurl). This tutorial uses Dev Tools.

**To index and search data in the housing collection**

1. To create a single index for your new collection, send the following request in the [Dev Tools](https://opensearch.org/docs/latest/dashboards/dev-tools/index-dev/) console. By default, this creates an index with an `nmslib` engine and Euclidean distance.

   ```
   PUT housing-index
   {
      "settings": {
         "index.knn": true
      },
      "mappings": {
         "properties": {
            "housing-vector": {
               "type": "knn_vector",
               "dimension": 3
            },
            "title": {
               "type": "text"
            },
            "price": {
               "type": "long"
            },
            "location": {
               "type": "geo_point"
            }
         }
      }
   }
   ```

1. To index a single document into *housing-index*, send the following request:

   ```
   POST housing-index/_doc
   {
     "housing-vector": [
       10,
       20,
       30
     ],
     "title": "2 bedroom in downtown Seattle",
     "price": "2800",
     "location": "47.71, 122.00"
   }
   ```

1. To search for properties that are similar to the ones in your index, send the following query:

   ```
   GET housing-index/_search
   {
       "size": 5,
       "query": {
           "knn": {
               "housing-vector": {
                   "vector": [
                       10,
                       20,
                       30
                   ],
                   "k": 5
               }
           }
       }
   }
   ```
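For the Euclidean (`l2`) space used by this index, OpenSearch derives the relevance score from the distance as 1 / (1 + d²), so an exact match scores 1.0 and more distant vectors score lower. A quick sketch of that scoring function, based on the formula in the OpenSearch k-NN documentation:

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_l2_score(query, vector):
    """OpenSearch k-NN relevance score for the l2 space: 1 / (1 + d^2)."""
    return 1.0 / (1.0 + l2_distance(query, vector) ** 2)

exact = knn_l2_score([10, 20, 30], [10, 20, 30])   # identical vectors: 1.0
nearby = knn_l2_score([10, 20, 30], [11, 20, 30])  # distance 1: 0.5
```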

### Step 4: Delete the collection


Because the *housing* collection is for test purposes, make sure to delete it when you're done experimenting.

**To delete an OpenSearch Serverless collection**

1. Go back to the **Amazon OpenSearch Service** console.

1. Choose **Collections** in the left navigation pane and select the **housing** collection.

1. Choose **Delete** and confirm the deletion.

## Filtered search


You can use filters to refine your semantic search results. To create an index and perform a filtered search on your documents, substitute [Upload and search data](#serverless-vector-index) in the previous tutorial with the following instructions. The other steps remain the same. For more information about filters, see [k-NN search with filters](https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/).

**To index and search data in the housing collection**

1. To create a single index for your collection, send the following request in the [Dev Tools](https://opensearch.org/docs/latest/dashboards/dev-tools/index-dev/) console:

   ```
   PUT housing-index-filtered
   {
     "settings": {
       "index.knn": true
     },
     "mappings": {
       "properties": {
         "housing-vector": {
           "type": "knn_vector",
           "dimension": 3,
           "method": {
             "engine": "faiss",
             "name": "hnsw"
           }
         },
         "title": {
           "type": "text"
         },
         "price": {
           "type": "long"
         },
         "location": {
           "type": "geo_point"
         }
       }
     }
   }
   ```

1. To index a single document into *housing-index-filtered*, send the following request:

   ```
   POST housing-index-filtered/_doc
   {
     "housing-vector": [
       10,
       20,
       30
     ],
     "title": "2 bedroom in downtown Seattle",
     "price": "2800",
     "location": "47.71, 122.00"
   }
   ```

1. To search your data for an apartment in Seattle under a given price and within a given distance of a geographical point, send the following request:

   ```
   GET housing-index-filtered/_search
   {
     "size": 5,
     "query": {
       "knn": {
         "housing-vector": {
           "vector": [
             0.1,
             0.2,
             0.3
           ],
           "k": 5,
           "filter": {
             "bool": {
               "must": [
                 {
                   "query_string": {
                     "query": "Find me 2 bedroom apartment in Seattle under $3000 ",
                     "fields": [
                       "title"
                     ]
                   }
                 },
                 {
                   "range": {
                     "price": {
                       "lte": 3000
                     }
                   }
                 },
                 {
                   "geo_distance": {
                     "distance": "100miles",
                     "location": {
                       "lat": 48,
                       "lon": 121
                     }
                   }
                 }
               ]
             }
           }
         }
       }
     }
   }
   ```

## Billion scale workloads


Vector search collections support workloads with billions of vectors. You don't need to reindex for scaling purposes because auto scaling does this for you. If you have millions of vectors (or more) with a high number of dimensions and need more than 200 OpenSearch Compute Units (OCUs), contact [AWS Support](https://aws.amazon.com/premiumsupport/) to raise the maximum number of OCUs for your account. 

## Limitations


Vector search collections have the following limitations:
+ Vector search collections don't support the Apache Lucene ANN engine.
+ Vector search collections support only the HNSW algorithm with Faiss; they don't support IVF and IVFQ.
+ Vector search collections don't support the warmup, stats, and model training API operations.
+ Vector search collections don't support inline or stored scripts.
+ Index count information isn't available in the AWS Management Console for vector search collections. 
+ The refresh interval for indexes on vector search collections is 60 seconds.

## Next steps


Now that you know how to create a vector search collection and index data, you might want to try some of the following exercises:
+ Use the OpenSearch Python client to work with vector search collections. See this tutorial on [GitHub](https://github.com/opensearch-project/opensearch-py/blob/main/guides/plugins/knn.md). 
+ Use the OpenSearch Java client to work with vector search collections. See this tutorial on [GitHub](https://github.com/opensearch-project/opensearch-java/blob/main/guides/plugins/knn.md). 
+ Set up LangChain to use OpenSearch as a vector store. LangChain is an open source framework for developing applications powered by language models. For more information, see the [LangChain documentation](https://python.langchain.com/docs/integrations/vectorstores/opensearch).

# Using data lifecycle policies with Amazon OpenSearch Serverless
Using data lifecycle policies

A data lifecycle policy in Amazon OpenSearch Serverless defines how long OpenSearch Serverless retains data in a time series collection. For example, you can set a policy to retain log data for 30 days before OpenSearch Serverless deletes it.

You can configure a separate policy for each index within each time series collection in your AWS account. OpenSearch Serverless retains documents for at least the duration that you specify in the policy. It then deletes the documents automatically on a best-effort basis, typically within 48 hours or 10% of the retention period, whichever is longer.
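In other words, the worst-case lag before deletion is the larger of 48 hours and 10 percent of the retention period. A sketch of that calculation:

```python
def max_deletion_lag_hours(retention_hours):
    """Best-effort deletion window: the longer of 48 hours or 10% of retention."""
    return max(48.0, 0.1 * retention_hours)

# 30-day retention: 10% is 72 hours, which exceeds the 48-hour floor.
lag_30d = max_deletion_lag_hours(30 * 24)   # 72.0
# 10-day retention: 10% is only 24 hours, so the 48-hour floor applies.
lag_10d = max_deletion_lag_hours(10 * 24)   # 48.0
```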

Only time series collections support data lifecycle policies. Search and vector search collections do not.

**Topics**
+ [

## Data lifecycle policies
](#serverless-lifecycle-policies)
+ [

## Required permissions
](#serverless-lifecycle-permissions)
+ [

## Policy precedence
](#serverless-lifecycle-precedence)
+ [

## Policy syntax
](#serverless-lifecycle-syntax)
+ [

## Creating data lifecycle policies
](#serverless-lifecycle-create)
+ [

## Updating data lifecycle policies
](#serverless-lifecycle-update)
+ [

## Deleting data lifecycle policies
](#serverless-lifecycle-delete)

## Data lifecycle policies


In a data lifecycle policy, you specify a series of rules that define the retention period for data in an index or group of indexes. Each rule consists of a resource type (`index`), a retention period, and a list of resources (indexes) that the retention period applies to.

You define the retention period with one of the following formats:
+ `"MinIndexRetention": "24h"` – OpenSearch Serverless retains index data for the specified period in hours or days. You can set this period to be from `24h` to `3650d`.
+ `"NoMinIndexRetention": true` – OpenSearch Serverless retains index data indefinitely.

In the following sample policy, the first rule specifies a retention period of 15 days for all indexes within the collection `marketing`. The second rule specifies that indexes in the `finance` collection whose names begin with `log` have no minimum retention period and are retained indefinitely.

```
{
   "lifeCyclePolicyDetail": {
      "type": "retention",
      "name": "my-policy",
      "policyVersion": "MTY4ODI0NTM2OTk1N18x",
      "policy": {
         "Rules": [
            {
               "ResourceType": "index",
               "Resource": [
                  "index/marketing/*"
               ],
               "MinIndexRetention": "15d"
            },
            {
               "ResourceType": "index",
               "Resource": [
                  "index/finance/log*"
               ],
               "NoMinIndexRetention": true
            }
         ]
      },
      "createdDate": 1688245369957,
      "lastModifiedDate": 1688245369957
   }
}
```

In the following sample policy rule, OpenSearch Serverless indefinitely retains the data in all indexes for all collections within the account.

```
{
   "Rules": [
      {
         "ResourceType": "index",
         "Resource": [
            "index/*/*"
         ],
         "NoMinIndexRetention": true
      }
   ]
}
```

## Required permissions


Lifecycle policies for OpenSearch Serverless use the following AWS Identity and Access Management (IAM) permissions. You can specify IAM conditions to restrict users to data lifecycle policies associated with specific collections and indexes.
+ `aoss:CreateLifecyclePolicy` – Create a data lifecycle policy.
+ `aoss:ListLifecyclePolicies` – List all data lifecycle policies in the current account.
+ `aoss:BatchGetLifecyclePolicy` – View a data lifecycle policy associated with an account or policy name.
+ `aoss:BatchGetEffectiveLifecyclePolicy` – View a data lifecycle policy for a given resource (`index` is the only supported resource).
+ `aoss:UpdateLifecyclePolicy` – Modify a given data lifecycle policy, and change its retention setting or resource.
+ `aoss:DeleteLifecyclePolicy` – Delete a data lifecycle policy.

The following identity-based access policy allows a user to view all data lifecycle policies, and update policies with the resource pattern `index/application-logs`:

------
#### [ JSON ]

****  

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "aoss:UpdateLifecyclePolicy"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "aoss:collection": "application-logs"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "aoss:ListLifecyclePolicies",
                "aoss:BatchGetLifecyclePolicy"
            ],
            "Resource": "*"
        }
    ]
}
```

------

## Policy precedence


There can be situations where data lifecycle policy rules overlap, within or across policies. When this happens, a rule with a more specific resource name or pattern for an index overrides a rule with a more general resource name or pattern for any indexes that are common to *both* rules.

For example, in the following policy, two rules apply to an index `index/sales/logstash`. In this situation, the second rule takes precedence because `index/sales/log*` is the longest match to `index/sales/logstash`. Therefore, OpenSearch Serverless sets no retention period for the index.

```
{
   "Rules": [
      {
         "ResourceType": "index",
         "Resource": [
            "index/sales/*"
         ],
         "MinIndexRetention": "15d"
      },
      {
         "ResourceType": "index",
         "Resource": [
            "index/sales/log*"
         ],
         "NoMinIndexRetention": true
      }
   ]
}
```
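The longest-match behavior can be sketched locally. The helper below is hypothetical: it approximates the precedence rule with shell-style pattern matching and picks the rule whose matching pattern is longest; the service's actual tie-breaking may differ.

```python
import fnmatch

def effective_rule(rules, index_name):
    """Return the rule whose matching resource pattern is the longest."""
    def best_match_len(rule):
        lengths = [len(p) for p in rule["Resource"]
                   if fnmatch.fnmatch(index_name, p)]
        return max(lengths) if lengths else -1

    matching = [r for r in rules if best_match_len(r) >= 0]
    return max(matching, key=best_match_len) if matching else None

rules = [
    {"ResourceType": "index", "Resource": ["index/sales/*"],
     "MinIndexRetention": "15d"},
    {"ResourceType": "index", "Resource": ["index/sales/log*"],
     "NoMinIndexRetention": True},
]
# "index/sales/log*" is the longest match for index/sales/logstash,
# so the second rule wins.
winner = effective_rule(rules, "index/sales/logstash")
```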

## Policy syntax


Provide one or more *rules*. These rules define data lifecycle settings for your OpenSearch Serverless indexes.

Each rule contains the following elements. You can either provide `MinIndexRetention` or `NoMinIndexRetention` in each rule, but not both. 


| Element | Description | 
| --- | --- | 
| Resource type | The type of resource that the rule applies to. The only supported option for data lifecycle policies is index. | 
| Resource | A list of resource names and/or patterns. Patterns consist of a prefix and a wildcard (`*`), which allows a rule to apply to multiple resources. For example, index/<collection name or pattern>/<index name or pattern>. | 
| MinIndexRetention | The minimum period, in days (d) or hours (h), to retain the document in the index. The lower bound is 24h and the upper bound is 3650d. | 
| NoMinIndexRetention | If true, OpenSearch Serverless retains documents indefinitely. | 

In the following example, the first rule applies to all indexes in the `autoparts-inventory` collection (`index/autoparts-inventory/*`) and requires the data to be retained for at least 20 days before OpenSearch Serverless can delete it. 

The second rule targets indexes matching the `auto*/gear` pattern (`index/auto*/gear`), setting a minimum retention period of 24 hours.

The third rule applies specifically to the `tires` index and sets no minimum retention period, so OpenSearch Serverless retains the data in that index indefinitely. Together, these rules manage index data with different retention requirements.

```
{
  "Rules": [
    {
      "ResourceType": "index",
      "Resource": [
        "index/autoparts-inventory/*"
      ],
      "MinIndexRetention": "20d"
    },
    {
      "ResourceType": "index",
      "Resource": [
        "index/auto*/gear"
      ],
      "MinIndexRetention": "24h"
    },
    {
      "ResourceType": "index",
      "Resource": [
        "index/autoparts-inventory/tires"
      ],
      "NoMinIndexRetention": true
    }
  ]
}
```

## Creating data lifecycle policies


To create a data lifecycle policy, you define rules that manage the retention and deletion of your data based on specified criteria. 

### Console


**To create a data lifecycle policy**

1. Sign in to the Amazon OpenSearch Service console at [https://console.aws.amazon.com/aos/home](https://console.aws.amazon.com/aos/home).

1. In the left navigation pane, choose **Data lifecycle policies**.

1. Choose **Create data lifecycle policy**.

1. Enter a descriptive name for the policy.

1. For **Data lifecycle**, choose **Add** and select the collections and indexes for the policy. 

   Start by choosing the collections to which the indexes belong. Then, either choose the index from the list or enter an index pattern. To select all collections as sources, enter an asterisk (`*`).

1. For **Data retention**, either retain the data indefinitely, or clear **Unlimited (never delete)** and specify a time period after which OpenSearch Serverless automatically deletes the data.

1. Choose **Save**, then **Create**.

### AWS CLI


To create a data lifecycle policy using the AWS CLI, use the [create-lifecycle-policy](https://docs.aws.amazon.com/cli/latest/reference/opensearchserverless/create-lifecycle-policy.html) command with the following options:
+ `--name` – The name of the policy.
+ `--type` – The type of policy. Currently, the only available value is `retention`.
+ `--policy` – The data lifecycle policy. This parameter accepts both inline policies and .json files. You must encode inline policies as a JSON escaped string. To provide the policy in a file, use the format `--policy file://my-policy.json`.

**Example**  

```
aws opensearchserverless create-lifecycle-policy \
  --name my-policy \
  --type retention \
  --policy "{\"Rules\":[{\"ResourceType\":\"index\",\"Resource\":[\"index/autoparts-inventory/*\"],\"MinIndexRetention\": \"81d\"},{\"ResourceType\":\"index\",\"Resource\":[\"index/sales/orders*\"],\"NoMinIndexRetention\":true}]}"
```
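Escaping the inline policy by hand is error-prone; building it in code and letting `json.dumps` produce the string is easier. A sketch using the same rules as the example above:

```python
import json

policy = {
    "Rules": [
        {
            "ResourceType": "index",
            "Resource": ["index/autoparts-inventory/*"],
            "MinIndexRetention": "81d",
        },
        {
            "ResourceType": "index",
            "Resource": ["index/sales/orders*"],
            "NoMinIndexRetention": True,
        },
    ]
}

# Pass this string as the --policy value, or write it to a file and
# use --policy file://my-policy.json instead.
policy_arg = json.dumps(policy)
```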

## Updating data lifecycle policies


To update a data lifecycle policy, you can modify existing rules to reflect changes in your data retention or deletion requirements. This allows you to adapt your policies as your data management needs evolve.

There might be a few minutes of lag time between when you update the policy and when OpenSearch Serverless starts to enforce the new retention periods.

### Console


**To update a data lifecycle policy**

1. Sign in to the Amazon OpenSearch Service console at [https://console.aws.amazon.com/aos/home](https://console.aws.amazon.com/aos/home).

1. In the left navigation pane, choose **Data lifecycle policies**.

1. Select the data lifecycle policy that you want to update, then choose **Edit**.

1. Modify the policy using the visual editor or the JSON editor.

1. Choose **Save**.

### AWS CLI


To update a data lifecycle policy using the AWS CLI, use the [update-lifecycle-policy](https://docs.aws.amazon.com/cli/latest/reference/opensearchserverless/update-lifecycle-policy.html) command. 

You must include the `--policy-version` parameter in the request. You can retrieve the policy version by using the [list-lifecycle-policies](https://docs.aws.amazon.com/cli/latest/reference/opensearchserverless/list-lifecycle-policies.html) or [batch-get-lifecycle-policy](https://docs.aws.amazon.com/cli/latest/reference/opensearchserverless/batch-get-lifecycle-policy.html) commands. We recommend including the most recent policy version to prevent accidentally overwriting changes made by others.

The following request updates a data lifecycle policy with a new policy JSON document.

**Example**  

```
aws opensearchserverless update-lifecycle-policy \
  --name my-policy \
  --type retention \
  --policy-version MTY2MzY5MTY1MDA3Ml8x \
  --policy file://my-new-policy.json
```

## Deleting data lifecycle policies


When you delete a data lifecycle policy, OpenSearch Serverless no longer enforces it on any matching indexes.

### Console


**To delete a data lifecycle policy**

1. Sign in to the Amazon OpenSearch Service console at [https://console.aws.amazon.com/aos/home](https://console.aws.amazon.com/aos/home).

1. In the left navigation pane, choose **Data lifecycle policies**.

1. Select the policy that you want to delete, then choose **Delete** and confirm deletion.

### AWS CLI


To delete a data lifecycle policy using the AWS CLI, use the [delete-lifecycle-policy](https://docs.aws.amazon.com/cli/latest/reference/opensearchserverless/delete-lifecycle-policy.html) command.

**Example**  

```
aws opensearchserverless delete-lifecycle-policy \
  --name my-policy \
  --type retention
```

# Using the AWS SDKs to interact with Amazon OpenSearch Serverless
Managing collections with the AWS SDKs

This section includes examples of how to use the AWS SDKs to interact with Amazon OpenSearch Serverless. These code samples show how to create security policies and collections, and how to index data into collections.

**Note**  
We're currently building out these code samples. If you want to contribute a code sample (Java, Go, etc.), please open a pull request directly within the [GitHub repository](https://github.com/awsdocs/amazon-opensearch-service-developer-guide/blob/master/doc_source/serverless-sdk.md).

**Topics**
+ [

## Python
](#serverless-sdk-python)
+ [

## JavaScript
](#serverless-sdk-javascript)

## Python


The following sample script uses the [AWS SDK for Python (Boto3)](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/opensearchserverless.html), as well as the [opensearch-py](https://pypi.org/project/opensearch-py/) client for Python, to create encryption, network, and data access policies, create a matching collection, and index some sample data.

To install the required dependencies, run the following commands:

```
pip install opensearch-py
pip install boto3
pip install botocore
pip install requests-aws4auth
```

Within the script, replace the `Principal` element with the Amazon Resource Name (ARN) of the user or role that's signing the request. You can also optionally modify the `region`.

```
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
import boto3
import botocore
import time

# Build the client using the default credential configuration.
# You can use the CLI and run 'aws configure' to set access key, secret
# key, and default region.

client = boto3.client('opensearchserverless')
service = 'aoss'
region = 'us-east-1'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                   region, service, session_token=credentials.token)


def createEncryptionPolicy(client):
    """Creates an encryption policy that matches all collections beginning with tv-"""
    try:
        response = client.create_security_policy(
            description='Encryption policy for TV collections',
            name='tv-policy',
            policy="""
                {
                    \"Rules\":[
                        {
                            \"ResourceType\":\"collection\",
                            \"Resource\":[
                                \"collection\/tv-*\"
                            ]
                        }
                    ],
                    \"AWSOwnedKey\":true
                }
                """,
            type='encryption'
        )
        print('\nEncryption policy created:')
        print(response)
    except botocore.exceptions.ClientError as error:
        if error.response['Error']['Code'] == 'ConflictException':
            print(
                '[ConflictException] The policy name or rules conflict with an existing policy.')
        else:
            raise error


def createNetworkPolicy(client):
    """Creates a network policy that matches all collections beginning with tv-"""
    try:
        response = client.create_security_policy(
            description='Network policy for TV collections',
            name='tv-policy',
            policy="""
                [{
                    \"Description\":\"Public access for TV collection\",
                    \"Rules\":[
                        {
                            \"ResourceType\":\"dashboard\",
                            \"Resource\":[\"collection\/tv-*\"]
                        },
                        {
                            \"ResourceType\":\"collection\",
                            \"Resource\":[\"collection\/tv-*\"]
                        }
                    ],
                    \"AllowFromPublic\":true
                }]
                """,
            type='network'
        )
        print('\nNetwork policy created:')
        print(response)
    except botocore.exceptions.ClientError as error:
        if error.response['Error']['Code'] == 'ConflictException':
            print(
                '[ConflictException] A network policy with this name already exists.')
        else:
            raise error


def createAccessPolicy(client):
    """Creates a data access policy that matches all collections beginning with tv-"""
    try:
        response = client.create_access_policy(
            description='Data access policy for TV collections',
            name='tv-policy',
            policy="""
                [{
                    \"Rules\":[
                        {
                            \"Resource\":[
                                \"index\/tv-*\/*\"
                            ],
                            \"Permission\":[
                                \"aoss:CreateIndex\",
                                \"aoss:DeleteIndex\",
                                \"aoss:UpdateIndex\",
                                \"aoss:DescribeIndex\",
                                \"aoss:ReadDocument\",
                                \"aoss:WriteDocument\"
                            ],
                            \"ResourceType\": \"index\"
                        },
                        {
                            \"Resource\":[
                                \"collection\/tv-*\"
                            ],
                            \"Permission\":[
                                \"aoss:CreateCollectionItems\"
                            ],
                            \"ResourceType\": \"collection\"
                        }
                    ],
                    \"Principal\":[
                        \"arn:aws:iam::123456789012:role\/Admin\"
                    ]
                }]
                """,
            type='data'
        )
        print('\nAccess policy created:')
        print(response)
    except botocore.exceptions.ClientError as error:
        if error.response['Error']['Code'] == 'ConflictException':
            print(
                '[ConflictException] An access policy with this name already exists.')
        else:
            raise error


def createCollection(client):
    """Creates a collection"""
    try:
        response = client.create_collection(
            name='tv-sitcoms',
            type='SEARCH'
        )
        return(response)
    except botocore.exceptions.ClientError as error:
        if error.response['Error']['Code'] == 'ConflictException':
            print(
                '[ConflictException] A collection with this name already exists. Try another name.')
        else:
            raise error


def waitForCollectionCreation(client):
    """Waits for the collection to become active"""
    response = client.batch_get_collection(
        names=['tv-sitcoms'])
    # Periodically check collection status
    while (response['collectionDetails'][0]['status']) == 'CREATING':
        print('Creating collection...')
        time.sleep(30)
        response = client.batch_get_collection(
            names=['tv-sitcoms'])
    print('\nCollection successfully created:')
    print(response["collectionDetails"])
    # Extract the collection endpoint from the response
    host = (response['collectionDetails'][0]['collectionEndpoint'])
    final_host = host.replace("https://", "")
    indexData(final_host)


def indexData(host):
    """Create an index and add some sample data"""
    # Build the OpenSearch client
    client = OpenSearch(
        hosts=[{'host': host, 'port': 443}],
        http_auth=awsauth,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection,
        timeout=300
    )
    # It can take up to a minute for data access rules to be enforced
    time.sleep(45)

    # Create index
    response = client.indices.create('sitcoms-eighties')
    print('\nCreating index:')
    print(response)

    # Add a document to the index.
    response = client.index(
        index='sitcoms-eighties',
        body={
            'title': 'Seinfeld',
            'creator': 'Larry David',
            'year': 1989
        },
        id='1',
    )
    print('\nDocument added:')
    print(response)


def main():
    createEncryptionPolicy(client)
    createNetworkPolicy(client)
    createAccessPolicy(client)
    createCollection(client)
    waitForCollectionCreation(client)


if __name__ == "__main__":
    main()
```

## JavaScript


The following sample script uses the [SDK for JavaScript in Node.js](https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-opensearchserverless/), as well as the [opensearch-js](https://www.npmjs.com/package/@opensearch-project/opensearch) client for JavaScript, to create encryption, network, and data access policies, create a matching collection, create an index, and index some sample data.

To install the required dependencies, run the following commands:

```
npm i aws-sdk
npm i aws4
npm i @aws-sdk/client-opensearchserverless
npm i @opensearch-project/opensearch
```

Within the script, replace the `Principal` element with the Amazon Resource Name (ARN) of the user or role that's signing the request. You can also optionally modify the `region`.

```
var AWS = require('aws-sdk');
var aws4 = require('aws4');
var {
    Client,
    Connection
} = require("@opensearch-project/opensearch");
var {
    OpenSearchServerlessClient,
    CreateSecurityPolicyCommand,
    CreateAccessPolicyCommand,
    CreateCollectionCommand,
    BatchGetCollectionCommand
} = require("@aws-sdk/client-opensearchserverless");
var client = new OpenSearchServerlessClient();

async function execute() {
    await createEncryptionPolicy(client)
    await createNetworkPolicy(client)
    await createAccessPolicy(client)
    await createCollection(client)
    await waitForCollectionCreation(client)
}

async function createEncryptionPolicy(client) {
    // Creates an encryption policy that matches all collections beginning with 'tv-'
    try {
        var command = new CreateSecurityPolicyCommand({
            description: 'Encryption policy for TV collections',
            name: 'tv-policy',
            type: 'encryption',
            policy: " \
        { \
            \"Rules\":[ \
                { \
                    \"ResourceType\":\"collection\", \
                    \"Resource\":[ \
                        \"collection\/tv-*\" \
                    ] \
                } \
            ], \
            \"AWSOwnedKey\":true \
        }"
        });
        const response = await client.send(command);
        console.log("Encryption policy created:");
        console.log(response['securityPolicyDetail']);
    } catch (error) {
        if (error.name === 'ConflictException') {
            console.log('[ConflictException] The policy name or rules conflict with an existing policy.');
        } else
            console.error(error);
    };
}

async function createNetworkPolicy(client) {
    // Creates a network policy that matches all collections beginning with 'tv-'
    try {
        var command = new CreateSecurityPolicyCommand({
            description: 'Network policy for TV collections',
            name: 'tv-policy',
            type: 'network',
            policy: " \
            [{ \
                \"Description\":\"Public access for television collection\", \
                \"Rules\":[ \
                    { \
                        \"ResourceType\":\"dashboard\", \
                        \"Resource\":[\"collection\/tv-*\"] \
                    }, \
                    { \
                        \"ResourceType\":\"collection\", \
                        \"Resource\":[\"collection\/tv-*\"] \
                    } \
                ], \
                \"AllowFromPublic\":true \
            }]"
        });
        const response = await client.send(command);
        console.log("Network policy created:");
        console.log(response['securityPolicyDetail']);
    } catch (error) {
        if (error.name === 'ConflictException') {
            console.log('[ConflictException] A network policy with that name already exists.');
        } else
            console.error(error);
    };
}

async function createAccessPolicy(client) {
    // Creates a data access policy that matches all collections beginning with 'tv-'
    try {
        var command = new CreateAccessPolicyCommand({
            description: 'Data access policy for TV collections',
            name: 'tv-policy',
            type: 'data',
            policy: " \
            [{ \
                \"Rules\":[ \
                    { \
                        \"Resource\":[ \
                            \"index\/tv-*\/*\" \
                        ], \
                        \"Permission\":[ \
                            \"aoss:CreateIndex\", \
                            \"aoss:DeleteIndex\", \
                            \"aoss:UpdateIndex\", \
                            \"aoss:DescribeIndex\", \
                            \"aoss:ReadDocument\", \
                            \"aoss:WriteDocument\" \
                        ], \
                        \"ResourceType\": \"index\" \
                    }, \
                    { \
                        \"Resource\":[ \
                            \"collection\/tv-*\" \
                        ], \
                        \"Permission\":[ \
                            \"aoss:CreateCollectionItems\" \
                        ], \
                        \"ResourceType\": \"collection\" \
                    } \
                ], \
                \"Principal\":[ \
                    \"arn:aws:iam::123456789012:role\/Admin\" \
                ] \
            }]"
        });
        const response = await client.send(command);
        console.log("Access policy created:");
        console.log(response['accessPolicyDetail']);
    } catch (error) {
        if (error.name === 'ConflictException') {
            console.log('[ConflictException] An access policy with that name already exists.');
        } else
            console.error(error);
    };
}

async function createCollection(client) {
    // Creates a collection to hold TV sitcoms indexes
    try {
        var command = new CreateCollectionCommand({
            name: 'tv-sitcoms',
            type: 'SEARCH'
        });
        const response = await client.send(command);
        return (response)
    } catch (error) {
        if (error.name === 'ConflictException') {
            console.log('[ConflictException] A collection with this name already exists. Try another name.');
        } else
            console.error(error);
    };
}

async function waitForCollectionCreation(client) {
    // Waits for the collection to become active
    try {
        var command = new BatchGetCollectionCommand({
            names: ['tv-sitcoms']
        });
        var response = await client.send(command);
        while (response.collectionDetails[0]['status'] == 'CREATING') {
            console.log('Creating collection...')
            await sleep(30000) // Wait for 30 seconds, then check the status again
            function sleep(ms) {
                return new Promise((resolve) => {
                    setTimeout(resolve, ms);
                });
            }
            var response = await client.send(command);
        }
        console.log('Collection successfully created:');
        console.log(response['collectionDetails']);
        // Extract the collection endpoint from the response
        var host = (response.collectionDetails[0]['collectionEndpoint'])
        // Pass collection endpoint to index document request
        indexDocument(host)
    } catch (error) {
        console.error(error);
    };
}

async function indexDocument(host) {

    var client = new Client({
        node: host,
        Connection: class extends Connection {
            buildRequestObject(params) {
                var request = super.buildRequestObject(params)
                request.service = 'aoss';
                request.region = 'us-east-1'; // e.g. us-east-1
                var body = request.body;
                request.body = undefined;
                delete request.headers['content-length'];
                request.headers['x-amz-content-sha256'] = 'UNSIGNED-PAYLOAD';
                request = aws4.sign(request, AWS.config.credentials);
                request.body = body;

                return request
            }
        }
    });

    // Create an index
    try {
        var index_name = "sitcoms-eighties";

        var response = await client.indices.create({
            index: index_name
        });

        console.log("Creating index:");
        console.log(response.body);

        // Add a document to the index
        var document = "{ \"title\": \"Seinfeld\", \"creator\": \"Larry David\", \"year\": \"1989\" }\n";

        var response = await client.index({
            index: index_name,
            body: document
        });

        console.log("Adding document:");
        console.log(response.body);
    } catch (error) {
        console.error(error);
    };
}

execute()
```

# Using CloudFormation to create Amazon OpenSearch Serverless collections
Creating collections with CloudFormation

You can use CloudFormation to create Amazon OpenSearch Serverless resources such as collections, security policies, and VPC endpoints. For the comprehensive OpenSearch Serverless CloudFormation reference, see [Amazon OpenSearch Serverless](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/AWS_OpenSearchServerless.html) in the *CloudFormation User Guide*.

The following sample CloudFormation template creates a simple data access policy, network policy, and security policy, as well as a matching collection. It's a good way to get up and running quickly with Amazon OpenSearch Serverless and provision the necessary elements to create and use a collection.

**Important**  
This example uses public network access, which isn't recommended for production workloads. We recommend using VPC access to protect your collections. For more information, see [AWS::OpenSearchServerless::VpcEndpoint](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-opensearchserverless-vpcendpoint.html) and [Data plane access through AWS PrivateLink](serverless-vpc.md).

```
AWSTemplateFormatVersion: 2010-09-09
Description: 'Amazon OpenSearch Serverless template to create an IAM user, encryption policy, network policy, data access policy, and collection'
Resources:
  IAMUser:
    Type: 'AWS::IAM::User'
    Properties:
      UserName: aossadmin
  DataAccessPolicy:
    Type: 'AWS::OpenSearchServerless::AccessPolicy'
    Properties:
      Name: quickstart-access-policy
      Type: data
      Description: Access policy for quickstart collection
      Policy: !Sub >-
        [{"Description":"Access for cfn user","Rules":[{"ResourceType":"index","Resource":["index/*/*"],"Permission":["aoss:*"]},
        {"ResourceType":"collection","Resource":["collection/quickstart"],"Permission":["aoss:*"]}],
        "Principal":["arn:aws:iam::${AWS::AccountId}:user/aossadmin"]}]
  NetworkPolicy:
    Type: 'AWS::OpenSearchServerless::SecurityPolicy'
    Properties:
      Name: quickstart-network-policy
      Type: network
      Description: Network policy for quickstart collection
      Policy: >-
        [{"Rules":[{"ResourceType":"collection","Resource":["collection/quickstart"]}, {"ResourceType":"dashboard","Resource":["collection/quickstart"]}],"AllowFromPublic":true}]
  EncryptionPolicy:
    Type: 'AWS::OpenSearchServerless::SecurityPolicy'
    Properties:
      Name: quickstart-security-policy
      Type: encryption
      Description: Encryption policy for quickstart collection
      Policy: >-
        {"Rules":[{"ResourceType":"collection","Resource":["collection/quickstart"]}],"AWSOwnedKey":true}
  Collection:
    Type: 'AWS::OpenSearchServerless::Collection'
    Properties:
      Name: quickstart
      Type: TIMESERIES
      Description: Collection to hold timeseries data
    DependsOn: EncryptionPolicy
Outputs:
  IAMUser:
    Value: !Ref IAMUser
  DashboardURL:
    Value: !GetAtt Collection.DashboardEndpoint
  CollectionARN:
    Value: !GetAtt Collection.Arn
```
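Because the template creates an IAM user with an explicit `UserName`, any deployment of it must acknowledge the `CAPABILITY_NAMED_IAM` capability or stack creation fails. The following Boto3 sketch shows one way to deploy it; the stack name and template path are hypothetical.

```python
def create_quickstart_stack(stack_name: str, template_path: str) -> None:
    """Create the stack from a local template file and wait for completion.

    CAPABILITY_NAMED_IAM is required because the template creates
    an IAM user with an explicit UserName.
    """
    import boto3  # imported here to keep the module importable without AWS

    cfn = boto3.client("cloudformation")
    with open(template_path) as f:
        template_body = f.read()
    cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    # Block until the stack reaches CREATE_COMPLETE (or raise on failure).
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
```

The equivalent acknowledgment with the AWS CLI is the `--capabilities CAPABILITY_NAMED_IAM` flag on `aws cloudformation create-stack` or `deploy`.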

# Backing up collections using snapshots


Snapshots are point-in-time backups of your Amazon OpenSearch Serverless collections that provide disaster recovery capabilities. OpenSearch Serverless automatically creates and manages snapshots of your collections, ensuring business continuity and data protection. Each snapshot contains index metadata (settings and mappings for your indexes), cluster metadata (index templates and aliases), and index data (all documents and data stored in your indexes).

OpenSearch Serverless provides automatic hourly backups with no manual configuration, zero maintenance overhead, no additional storage costs, quick recovery from accidental data loss, and the ability to restore specific indexes from a snapshot.

Before working with snapshots, understand these important considerations:
+ Creating a snapshot takes time to complete and isn't instantaneous. Documents added or updated during snapshot creation aren't included in the snapshot.
+ You can restore snapshots only to their original collection, not to a new one.
+ Restored indexes receive new UUIDs that differ from their original versions.
+ Restoring to an existing open index overwrites that index's data unless you provide a new index name or a rename prefix pattern. This differs from OpenSearch core behavior.
+ You can run only one restore operation on a collection at a time. Attempting to restore indexes during an active restore operation causes the operation to fail.
+ During a restore operation, your requests to the affected indexes fail.

## Required permissions


To work with snapshots, configure the following permissions in your data access policy. For more information about data access policies, see [Data access policies versus IAM policies](serverless-data-access.md#serverless-data-access-vs-iam).


| Data access policy permission | APIs | 
| --- | --- | 
| `aoss:DescribeSnapshot` | `GET /_cat/snapshots/aoss-automated`, `GET /_snapshot/aoss-automated/snapshot-ID` | 
| `aoss:RestoreSnapshot` | `POST /_snapshot/aoss-automated/snapshot-ID/_restore` | 
| `aoss:DescribeCollectionItems` | `GET /_cat/recovery` | 

You can configure policies using the following AWS CLI commands:

1. [create-access-policy](https://docs.aws.amazon.com/cli/latest/reference/opensearchserverless/create-access-policy.html)

1. [delete-access-policy](https://docs.aws.amazon.com/cli/latest/reference/opensearchserverless/delete-access-policy.html)

1. [get-access-policy](https://docs.aws.amazon.com/cli/latest/reference/opensearchserverless/get-access-policy.html)

1. [update-access-policy](https://docs.aws.amazon.com/cli/latest/reference/opensearchserverless/update-access-policy.html)

Here is a sample CLI command for creating an access policy. In the command, replace the *example* content with your specific information.

```
aws opensearchserverless create-access-policy \
--type data \
--name Example-data-access-policy \
--region aws-region \
--policy '[
  {
    "Rules": [
      {
        "Resource": [
          "collection/Example-collection"
        ],
        "Permission": [
          "aoss:DescribeSnapshot",
          "aoss:RestoreSnapshot",
          "aoss:DescribeCollectionItems"
        ],
        "ResourceType": "collection"
      }
    ],
    "Principal": [
      "arn:aws:iam::111122223333:user/UserName"
    ],
    "Description": "Data policy to support snapshot operations."
  }
]'
```

## Working with snapshots


By default, when you create a new collection, OpenSearch Serverless automatically creates snapshots every hour. You don't need to take any action. Each snapshot includes all indexes in the collection. After OpenSearch Serverless creates snapshots, you can list them and review the details of the snapshot using the following procedures.

### List snapshots


Use the following procedures to list all snapshots in a collection and review their details.

------
#### [ Console ]

1. Open the Amazon OpenSearch Service console at [https://console.aws.amazon.com/aos/](https://console.aws.amazon.com/aos/).

1. In the left navigation pane, choose **Serverless**, then choose **Collections**.

1. Choose the name of your collection to open its details page.

1. Choose the **Snapshots** tab to display all generated snapshots.

1. Review the snapshot information including:
   + **Snapshot ID** - Unique identifier for the snapshot
   + **Status** - Current state (Available, In Progress)
   + **Created time** - When the snapshot was taken

------
#### [ OpenSearch API ]
+ Use the following command to list all snapshots in a collection:

  ```
  GET /_cat/snapshots/aoss-automated
  ```

  OpenSearch Serverless returns a response like the following:

  ```
  id                                 status  start_epoch start_time end_epoch  end_time    duration    indexes successful_shards failed_shards total_shards
  snapshot-ExampleSnapshotID1     SUCCESS 1737964331  07:52:11   1737964382 07:53:02    50.4s       1                                             
  snapshot-ExampleSnapshotID2     SUCCESS 1737967931  08:52:11   1737967979 08:52:59    47.7s       2                                             
  snapshot-ExampleSnapshotID3     SUCCESS 1737971531  09:52:11   1737971581 09:53:01    49.1s       3                                             
  snapshot-ExampleSnapshotID4 IN_PROGRESS 1737975131  10:52:11   -          -            4.8d       3
  ```

------
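If you call `_cat/snapshots` programmatically, the plain-text response shown above (a header row followed by data rows) can be split into records. The following helper is a hypothetical sketch, not part of any SDK:

```python
def parse_cat_snapshots(text: str) -> list[dict]:
    """Pair each data row of _cat/snapshots plain-text output with the header row.

    Rows for in-progress snapshots may have fewer columns than the header;
    zip() simply drops the missing trailing fields.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, row.split())) for row in lines[1:]]
```

For example, `parse_cat_snapshots(response_text)[0]["status"]` returns the status of the first listed snapshot.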

### Get snapshot details


Use the following procedures to retrieve detailed information about a specific snapshot.

------
#### [ Console ]

1. Open the Amazon OpenSearch Service console at [https://console.aws.amazon.com/aos/](https://console.aws.amazon.com/aos/).

1. In the left navigation pane, choose **Serverless**, then choose **Collections**.

1. Choose the name of your collection to open its details page.

1. Choose the **Snapshots** tab.

1. Choose the snapshot job ID to display detailed information about the snapshot, including metadata, indexes included, and timing information.

------
#### [ OpenSearch API ]
+ Use the following command to retrieve information about a snapshot. In the command, replace the *example* content with your specific information.

  ```
  GET _snapshot/aoss-automated/snapshot-ID/
  ```

  Example Request:

  ```
  GET _snapshot/aoss-automated/snapshot-ExampleSnapshotID1/
  ```

  Example Response:

  ```
  {
      "snapshots": [
          {
              "snapshot": "snapshot-ExampleSnapshotID1-5e01-4423-9833Example",
              "uuid": "Example-5e01-4423-9833-9e9eb757Example",
              "version_id": 136327827,
              "version": "2.11.0",
              "remote_store_index_shallow_copy": true,
              "indexes": [
                  "Example-index-0117"
              ],
              "data_streams": [],
              "include_global_state": true,
              "metadata": {},
              "state": "SUCCESS",
              "start_time": "2025-01-27T09:52:11.953Z",
              "start_time_in_millis": 1737971531953,
              "end_time": "2025-01-27T09:53:01.062Z",
              "end_time_in_millis": 1737971581062,
              "duration_in_millis": 49109,
              "failures": [],
              "shards": {
                  "total": 0,
                  "failed": 0,
                  "successful": 0
              }
          }
      ]
  }
  ```

------

The snapshot response includes several key fields: `id` is a unique identifier for the snapshot operation, `status` is the current state (`SUCCESS` or `IN_PROGRESS`), `duration` is the time taken to complete the snapshot operation, and `indexes` is the number of indexes included in the snapshot.

## Restoring from a snapshot


Restoring from a snapshot recovers data from a previously taken backup. This process is crucial for disaster recovery and data management in OpenSearch Serverless. Before restoring, understand the following:
+ Restored indexes receive different UUIDs than their original versions.
+ Restoring to an existing open index overwrites that index's data unless you provide a new index name or a rename prefix pattern.
+ Snapshots can be restored only to their original collection; cross-collection restoration isn't supported.
+ Restore operations affect collection performance, so plan accordingly.

Use the following procedures to restore backed up indexes from a snapshot.

------
#### [ Console ]

1. Open the Amazon OpenSearch Service console at [https://console.aws.amazon.com/aos/](https://console.aws.amazon.com/aos/).

1. In the left navigation pane, choose **Serverless**, then choose **Collections**.

1. Choose the name of your collection to open its details page.

1. Choose the **Snapshots** tab to display available snapshots.

1. Choose the snapshot you want to restore from, then choose **Restore from snapshot**.

1. In the **Restore from snapshot** dialog:
   + For **Snapshot name**, verify the selected snapshot ID.
   + For **Snapshot scope**, choose either:
     + **All indexes in collection** - Restore all indexes from the snapshot
     + **Specific indexes** - Select individual indexes to restore
   + For **Destination**, choose the collection to restore to.
   + (Optional) Configure **Rename settings** to rename restored indexes:
     + **Do not rename** - Keep original index names
     + **Add prefix to restored index names** - Add a prefix to avoid conflicts
     + **Rename using regular expression** - Use advanced renaming patterns
   + (Optional) Configure **Notification** settings to be notified when the restore completes or encounters errors.

1. Choose **Save** to start the restore operation.

------
#### [ OpenSearch API ]

1. Run the following command to identify the appropriate snapshot.

   ```
   GET /_snapshot/aoss-automated/_all
   ```

   For a smaller list of snapshots, run the following command.

   ```
   GET /_cat/snapshots/aoss-automated
   ```

1. Run the following command to verify the details of the snapshot before restoring. In the command, replace the *example* content with your specific information.

   ```
   GET _snapshot/aoss-automated/snapshot-ExampleSnapshotID1/
   ```

1. Run the following command to restore from a specific snapshot.

   ```
   POST /_snapshot/aoss-automated/snapshot-ID/_restore
   ```

   You can customize the restore operation by including a request body. Here's an example.

   ```
   POST /_snapshot/aoss-automated/snapshot-ExampleSnapshotID1-5e01-4423-9833Example/_restore
   {
     "indices": "opensearch-dashboards*,my-index*",
     "ignore_unavailable": true,
     "include_global_state": false,
     "include_aliases": false,
     "rename_pattern": "opensearch-dashboards(.+)",
     "rename_replacement": "restored-opensearch-dashboards$1"
   }
   ```

1. Run the following command to view the restore progress.

   ```
   GET /_cat/recovery
   ```

------

**Note**  
When restoring a snapshot with a command that includes a request body, you can use several parameters to control the restore behavior. The `indices` parameter specifies which indexes to restore and supports wildcard patterns. Set `ignore_unavailable` to `true` to continue the restore operation even if an index named in the request is missing from the snapshot. Use `include_global_state` to determine whether to restore the cluster state, and `include_aliases` to control whether to restore associated aliases. The `rename_pattern` and `rename_replacement` parameters rename indexes during the restore operation.

# Zstandard Codec Support in Amazon OpenSearch Serverless


Index codecs determine how an index's stored fields are compressed and stored on disk and in Amazon S3. The codec is controlled by the static `index.codec` setting, which specifies the compression algorithm. This setting affects both index shard size and the performance of index operations.

By default, indexes in OpenSearch Serverless use the `default` codec, which applies the LZ4 compression algorithm. OpenSearch Serverless also supports the `zstd` and `zstd_no_dict` codecs with configurable compression levels from 1 to 6.

**Important**  
Because `index.codec` is a static setting, it cannot be changed after index creation.
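
Because the codec is fixed at creation time, a common workaround is to create a new index with the desired codec and copy the data over with the Reindex API. The index names below are placeholders:

```
PUT /your_index_zstd
{
  "settings": {
    "index.codec": "zstd"
  }
}

POST /_reindex
{
  "source": { "index": "your_index" },
  "dest": { "index": "your_index_zstd" }
}
```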

For more details, refer to the [OpenSearch Index Codecs documentation](https://opensearch.org/docs/latest/im-plugin/index-codecs/).

## Creating an index with ZSTD codec


You can specify the ZSTD codec during index creation using the `index.codec` setting:

```
PUT /your_index
{
  "settings": {
    "index.codec": "zstd"
  }
}
```
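
OpenSearch Serverless also accepts the `zstd_no_dict` codec, which omits Zstandard's preset compression dictionary in exchange for faster stored-field retrieval at a slightly lower compression ratio:

```
PUT /your_index
{
  "settings": {
    "index.codec": "zstd_no_dict"
  }
}
```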

## Compression levels


ZSTD codecs support optional compression levels via the `index.codec.compression_level` setting, accepting integers in the range [1, 6]. Higher compression levels result in better compression ratios (smaller storage) but slower compression and decompression speeds. The default compression level is 3.

```
PUT /your_index
{
  "settings": {
    "index.codec": "zstd",
    "index.codec.compression_level": 2
  }
}
```

## Performance benchmarking


Based on benchmark testing with the `nyc_taxi` dataset, ZSTD compression achieved 26-32% better compression than the default LZ4 codec across different combinations of `zstd`, `zstd_no_dict`, and compression levels.


| Metric | ZSTD L1 | ZSTD L6 | ZSTD_NO_DICT L1 | ZSTD_NO_DICT L6 | 
| --- | --- | --- | --- | --- | 
| Index Size Reduction | 28.10% | 32% | 26.90% | 28.70% | 
| Indexing Throughput Change | -0.50% | -23.80% | -0.50% | -5.30% | 
| Match-all Query p90 Latency Improvement | -16.40% | 29.50% | -16.40% | 23.40% | 
| Range Query p90 Latency Improvement | 90.90% | 92.40% | -282.90% | 92.50% | 
| Distance Amount p90 Agg Latency Improvement | 2% | 24.70% | 2% | 13.80% | 

For more details, refer to the [AWS OpenSearch blog](https://aws.amazon.com/blogs/big-data/optimize-storage-costs-in-amazon-opensearch-service-using-zstandard-compression/).

# Save Storage by Using Derived Source


By default, OpenSearch Serverless stores each ingested document in the `_source` field, which contains the original JSON document body, and indexes individual fields for search. While the `_source` field is not searchable, it is retained so that the full document can be returned by fetch requests, such as get and search. When derived source is enabled, OpenSearch Serverless skips storing the `_source` field and instead reconstructs it dynamically on demand, for example during search, get, mget, reindex, or update operations. Using the derived source setting can reduce storage usage by up to 50%.

## Configuration


To configure derived source for your index, create the index using the `index.derived_source.enabled` setting:

```
PUT my-index1
{
  "settings": {
    "index": {
      "derived_source": {
        "enabled": true
      }
    }
  }
}
```
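
To verify that the setting took effect, retrieve the index settings after creation:

```
GET my-index1/_settings
```

Document fetches behave as usual; the `_source` body in the response is reconstructed on demand rather than read from storage.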

## Important considerations

+ Only certain field types are supported. For the list of supported fields and limitations, refer to the [OpenSearch documentation](https://docs.opensearch.org/latest/mappings/metadata-fields/source/#supported-fields-and-parameters). Creating an index with derived source enabled fails if the mapping contains an unsupported field type, and ingesting a document with an unsupported field into a derived source-enabled index also fails. Enable this feature only when you know which field types will be added to your index.
+ The `index.derived_source.enabled` setting is static and cannot be changed after the index is created.

## Limitations on query responses


When derived source is enabled, it imposes certain limitations on how query responses are generated and returned.
+ Date fields with multiple formats specified always use the first format in the list for all requested documents, regardless of the original ingested format.
+ Geopoint values are returned in a fixed `{"lat": lat_val, "lon": lon_val}` format and may lose some precision.
+ Multi-value arrays may be sorted, and keyword fields may be deduplicated.

For more details, refer to the [OpenSearch blog](https://opensearch.org/blog/save-up-to-2x-on-storage-with-derived-source/).

## Performance benchmarking


Based on benchmark testing with the `nyc_taxi` dataset, derived source achieved a 58% reduction in index size compared to baseline.


| Metric | Derived Source | 
| --- | --- | 
| Index Size Reduction | 58.3% | 
| Indexing Throughput Change | 3.7% | 
| Indexing p90 Latency Change | 6.9% | 
| Match-all Query p90 Latency Improvement | 19% | 
| Range Query p90 Latency Improvement | -18.8% | 
| Distance Amount p90 Agg Latency Improvement | -7.3% | 

For more details, refer to the [OpenSearch blog](https://opensearch.org/blog/save-up-to-2x-on-storage-with-derived-source/).