

# Stage 4 – Data migration
<a name="stage-4-data-migration"></a>

Now that your target environment is ready, you can implement the data migration strategy that you chose during the planning stage.

This section covers implementation steps for the four different patterns:
+ [Building from a snapshot](build-from-snapshot.md)
+ [Building from the source](build-from-source.md)
+ [Remote reindexing](remote-reindexing.md)
+ [Using Logstash](logstash.md)

# 1. Building from a snapshot
<a name="build-from-snapshot"></a>

When you use the snapshot-restore approach, you copy data from the source Elasticsearch or OpenSearch cluster to the target Amazon OpenSearch Service domain.

Broadly, the snapshot-restore process consists of the following steps:

1. Take a snapshot of the necessary data (indexes) from the existing cluster, and upload the snapshot to an S3 bucket.

1. Create an Amazon OpenSearch Service domain.

1. Give Amazon OpenSearch Service permissions to access the bucket, and give your user account permissions to work with snapshots. Create a snapshot repository and point that to your bucket.

1. Restore the snapshot on the Amazon OpenSearch Service domain.

1. Point your client applications to the Amazon OpenSearch Service domain.

1. Create Index State Management (ISM) policies for configuring retention (optional).

Snapshots are incremental, so you can take and restore them repeatedly, and each run after the first handles only the data that changed. By using snapshots, you can extract data in bulk as files on a storage system (for example, Amazon S3). You can then load these files into the target environment by using the `_restore` API operation. This eliminates the need for reindexing, which is time consuming, and it also reduces network traffic.
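The snapshot and restore calls in the steps above can be sketched as plain request descriptions. The following is a minimal Python sketch, not a definitive implementation: the helper names, repository name, bucket, and role ARN are hypothetical, and the sketch only builds the HTTP method, path, and JSON body for each call. It assumes the `role_arn` repository setting that Amazon OpenSearch Service uses to access the bucket. Sending the requests (and signing them for the Amazon OpenSearch Service domain) is left to the HTTP client that you choose.

```python
import json


def register_repository_request(repo, bucket, region, role_arn):
    """Build the request that registers an S3 snapshot repository on the target domain."""
    body = {
        "type": "s3",
        "settings": {"bucket": bucket, "region": region, "role_arn": role_arn},
    }
    return "PUT", f"/_snapshot/{repo}", body


def create_snapshot_request(repo, snapshot, indices):
    """Build the request that snapshots the listed indexes on the source cluster."""
    body = {"indices": ",".join(indices), "include_global_state": False}
    return "PUT", f"/_snapshot/{repo}/{snapshot}", body


def restore_snapshot_request(repo, snapshot, indices):
    """Build the request that restores the listed indexes on the target domain."""
    body = {"indices": ",".join(indices), "include_global_state": False}
    return "POST", f"/_snapshot/{repo}/{snapshot}/_restore", body


if __name__ == "__main__":
    # All names below are placeholders for illustration only.
    requests_to_send = [
        register_repository_request(
            "migration-repo",
            "example-snapshot-bucket",
            "us-west-2",
            "arn:aws:iam::123456789012:role/ExampleSnapshotRole",
        ),
        create_snapshot_request("migration-repo", "snapshot-1", ["logs-2023", "products"]),
        restore_snapshot_request("migration-repo", "snapshot-1", ["logs-2023", "products"]),
    ]
    for method, path, body in requests_to_send:
        print(method, path, json.dumps(body))
```

Setting `include_global_state` to `False` in this sketch keeps the cluster state out of both the snapshot and the restore, so only index data is moved. A repository that points to the same bucket must also be registered on the source cluster before the snapshot is taken; its settings depend on how that cluster is configured.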

## Snapshot considerations
<a name="snapshot-considerations"></a>

When using the snapshot-restore approach, consider the following:
+ You can't search or reindex while an index is being restored. However, you can search and reindex an index while the snapshot is being taken.
+ The source and target Elasticsearch or OpenSearch versions must be compatible. A snapshot of an index that was created in:
  + 5.x can be restored to 6.x
  + 2.x can be restored to 5.x
  + 1.x can be restored to 2.x
+ Because this is a point-in-time restoration of the Elasticsearch or OpenSearch snapshot, subsequent changes in the source cluster won't be replicated to the target Amazon OpenSearch Service domain. To handle those changes, you can stop ingestion of data into the source Elasticsearch or OpenSearch cluster until the restoration is done, or you can repeat the snapshot-restore process a few times. Because the snapshot is incremental, each repeated run copies and restores only the changes, so it takes less time than the first restore. After restoration is successfully finished, you point the ingestion applications to the Amazon OpenSearch Service domain.
+ Taking a snapshot includes, by default, a snapshot of the cluster state and all indexes. When migrating from Elasticsearch, you might need to create equivalent index lifecycle policies in the target environment using the ISM feature in OpenSearch. Elasticsearch Index Lifecycle Management (ILM) is not supported in Amazon OpenSearch Service.
+ You can't restore a snapshot to an earlier version of Elasticsearch or OpenSearch. For example, you can't restore a snapshot of version 7.10 to 7.9. Similarly, you can't restore snapshots from Elasticsearch 7.11 or later to an Amazon OpenSearch Service domain. If you have migrated your self-managed Elasticsearch environment to version 7.11 or later, you can use Logstash to load data from the Elasticsearch cluster and write it to the OpenSearch domain.
+ You export a snapshot to a designated storage location called a repository. Elasticsearch or OpenSearch creates a number of files in the repository. You can't modify or delete these files. Doing so might create inconsistencies or cause the restore process to fail.
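Two of the version rules above can be expressed as a small pre-flight check. The following Python sketch is illustrative only: the function names are hypothetical, it compares only major and minor version numbers, and the downgrade check assumes that both versions belong to the same product (Elasticsearch and OpenSearch use different version numbering). It does not encode the full compatibility matrix.

```python
def major_minor(version):
    """Return (major, minor) from a version string such as '7.10.2'."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def is_downgrade(snapshot_version, target_version):
    """True if the target is an earlier version than the snapshot (same product assumed)."""
    return major_minor(target_version) < major_minor(snapshot_version)


def elasticsearch_snapshot_restorable_to_service(source_version):
    """False for Elasticsearch 7.11 or later, which can't be restored to the domain."""
    return major_minor(source_version) < (7, 11)


if __name__ == "__main__":
    print(is_downgrade("7.10", "7.9"))
    print(elasticsearch_snapshot_restorable_to_service("7.10.2"))
    print(elasticsearch_snapshot_restorable_to_service("7.11.0"))
```

A check like this can run before the snapshot is taken, so that an incompatible source version is routed to a different migration pattern (for example, Logstash) instead of failing at restore time.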

# 2. Building from the source
<a name="build-from-source"></a>

As described earlier, building from the source is the approach where you do not migrate data from the current Elasticsearch or OpenSearch environment. Instead, you build indexes in the target domain directly from the original data source, such as your logs, product catalog, or content source.

Two options are available for building from the source. The option you choose depends on the type of your data source:
+ Using AWS Database Migration Service – If the source of your data is a relational database management system (RDBMS) and the source is supported by AWS Database Migration Service (AWS DMS), you can use AWS DMS to copy data from your data source to your target Amazon OpenSearch Service domain. AWS DMS supports full load and change data capture (CDC) options. In the full load option, the AWS DMS task copies all data from a source database table to a target OpenSearch index. You can use default mapping or provide custom mapping configurations. In the CDC option, AWS DMS first makes a full copy of the source table records into a target OpenSearch index. Then it captures changed data (updates and inserts) and copies it to the OpenSearch index. For more information, see the blog posts [Introducing Amazon Elasticsearch Service as a target in AWS Database Migration Service](https://aws.amazon.com/blogs/database/introducing-amazon-elasticsearch-service-as-a-target-in-aws-database-migration-service/) and [Scale Amazon Elasticsearch Service for AWS Database Migration Service migrations](https://aws.amazon.com/blogs/database/scale-amazon-elasticsearch-service-for-aws-database-migration-service-migrations/).
+ Building from the document source – If your data source is not an RDBMS or it is not supported by AWS DMS, you might have to create a custom solution using open-source tools or a combination of open-source tools and AWS services. You must convert your source data to JSON documents before it can be loaded in OpenSearch. If you already have pipelines set up from your source to your current Elasticsearch or OpenSearch environment, you can point those data pipelines to OpenSearch with appropriate changes in client libraries and (if required) data model changes in indexes in the Amazon OpenSearch Service domain. When building indexes from the source, keep in mind the following considerations:
  + The location of the documents – The documents could already be available in the AWS Cloud, in object storage such as Amazon S3, or they might be stored in an on-premises storage location such as a file system.
  + The format of the documents – The documents could already be in JSON format, ready to be ingested into the Amazon OpenSearch Service domain, or they might need to be cleansed, processed, and formatted into JSON before they can be ingested into the Amazon OpenSearch Service domain.

Building from the source involves the following high-level steps:

1. Define index mapping and settings in the Amazon OpenSearch Service domain.

1. Extract data from the document source and copy it into an object storage location such as Amazon S3. You can use an open-source tool (for example, Logstash), an AWS service client (for example, Amazon Kinesis Agent), a third-party commercial tool, or a custom program.

1. Configure an open-source tool (for example, Logstash or Fluent Bit) or a native AWS service (for example, AWS Lambda or AWS DMS) to convert data into JSON documents and load it periodically or continuously from the object store to the Amazon OpenSearch Service domain.

For more information, see [Loading streaming data into Amazon OpenSearch Service](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/integrations.html).
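The conversion step in the list above (turning source records into JSON documents that are ready to load) can be sketched as follows. This is a minimal Python example under stated assumptions: the function name, index name, and sample records are hypothetical, and the sketch only formats records as a newline-delimited payload for the `_bulk` API operation. Reading from the object store and sending the payload to the domain are left out.

```python
import json


def to_bulk_payload(index, records, id_field=None):
    """Format records as a newline-delimited _bulk payload (action line, then document)."""
    lines = []
    for record in records:
        action = {"index": {"_index": index}}
        # Use a field from the record as the document ID when one is provided.
        if id_field is not None and id_field in record:
            action["index"]["_id"] = str(record[id_field])
        lines.append(json.dumps(action))
        lines.append(json.dumps(record))
    if not lines:
        return ""
    # The _bulk API expects the payload to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical product-catalog records for illustration only.
    sample = [
        {"sku": 1, "name": "keyboard"},
        {"sku": 2, "name": "mouse"},
    ]
    print(to_bulk_payload("products", sample, id_field="sku"), end="")
```

Passing a stable `id_field` means that reloading the same record overwrites the existing document instead of creating a duplicate, which can help when the load runs periodically.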

# 3. Remote reindexing
<a name="remote-reindexing"></a>

In this case, the indexes of the source self-managed Elasticsearch or OpenSearch cluster are migrated into the Amazon OpenSearch Service domain using the [reindex document API operation](https://docs.opensearch.org/latest/api-reference/document-apis/reindex/). You can use the reindex document API operation to create an index from an existing Elasticsearch or OpenSearch index. The existing index can be in the same cluster where you run the reindex operation, or it can be in a remote cluster. Amazon OpenSearch Service supports using the reindex document API operation with remote clusters, so you can reindex from an index in a self-managed Elasticsearch cluster to an index in Amazon OpenSearch Service.

Remote reindex supports Elasticsearch 1.5 and later for the remote Elasticsearch cluster and Amazon OpenSearch Service 6.7 and later for the local domain. For more information, see the blog post [Migrate data into Amazon ES using remote reindex](https://aws.amazon.com/blogs/big-data/migrate-data-into-amazon-es-using-remote-reindex/). The blog post refers to Amazon Elasticsearch Service (Amazon ES), but the guidance applies equally to Amazon OpenSearch Service domains.
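A remote reindex request is sent to the target domain's `_reindex` endpoint and names the remote cluster in its `source` section. The following Python sketch builds such a request body. The function name, host, index names, and credentials are hypothetical placeholders, and the sketch assumes basic authentication on the remote cluster; it does not send the request.

```python
import json


def remote_reindex_body(remote_host, source_index, dest_index, username=None, password=None):
    """Build the JSON body for a POST _reindex call that reads from a remote cluster."""
    remote = {"host": remote_host}
    # Credentials are included only when both values are supplied.
    if username is not None and password is not None:
        remote["username"] = username
        remote["password"] = password
    return {
        "source": {"remote": remote, "index": source_index},
        "dest": {"index": dest_index},
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    body = remote_reindex_body(
        "https://source-cluster.example.com:9200",
        "logs-2023",
        "logs-2023",
        username="migration-user",
        password="example-password",
    )
    print("POST /_reindex")
    print(json.dumps(body, indent=2))
```

The remote host value in this sketch includes the scheme and port, which the reindex operation generally expects. The target domain must be able to reach the remote cluster over the network for the operation to succeed.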

# 4. Using Logstash
<a name="logstash"></a>

[Logstash](https://www.elastic.co/guide/en/logstash/current/index.html) is an open-source data processing tool that can collect data from a source, transform or filter it, and send it to one or more destinations. To read data from the source Elasticsearch or OpenSearch cluster and write it to the Amazon OpenSearch Service domain, you can use the following Logstash plugins:
+ logstash-input-elasticsearch
+ logstash-input-opensearch
+ logstash-output-opensearch

For more information, see [Loading data into Amazon OpenSearch Service with Logstash](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/managedomains-logstash.html) and the OpenSearch blog post [Introducing logstash-input-opensearch plugin for OpenSearch](https://opensearch.org/blog/community/2022/05/introducing-logstash-input-opensearch-plugin-for-opensearch/).