# Automatically copy your Amazon Glacier vault archives to an S3 bucket and storage classes
<a name="overview"></a>

Data Transfer from Amazon Glacier Vaults to Amazon S3 is a serverless Guidance that automates and optimizes the restore, copy, and transfer process of [Amazon Simple Storage Service Glacier](https://docs.aws.amazon.com/amazonglacier/latest/dev/introduction.html) (Amazon Glacier) vault archives. The Guidance copies all of the vault's archives to a defined [Amazon Simple Storage Service](https://aws.amazon.com/s3/) (Amazon S3) bucket destination and [storage class](https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-class-intro.html). Then you can attach [tags](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html) to help you categorize your data, such as with data classification or cost allocation. A prebuilt [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/) dashboard provides a visualization of the copy operation progress. 

**Important**  
 Amazon S3 and Amazon Glacier are different AWS services.   
 *Amazon Glacier* is an object storage service for low-cost data archiving and long-term backup. It stores *archives* in *vaults*. It doesn't offer storage classes. The Amazon Glacier service provides a console. However, any archive operation, such as upload, download, or deletion, requires you to use the AWS CLI or write code. There is no console support for archive operations.   
 *Amazon S3* is an object storage service for any type of data. It stores *objects* in *buckets*. It offers different storage classes for frequent access, infrequent access, archives, and optimized tiering. You can interact with the Amazon S3 service by using the Amazon S3 console or [AWS Command Line Interface](https://aws.amazon.com/cli/) (AWS CLI).   
 The *S3 Glacier Instant Retrieval, S3 Glacier Flexible Retrieval, and S3 Glacier Deep Archive storage classes* are features of the Amazon S3 service. The S3 Glacier Flexible Retrieval storage class offers the same features as the Amazon Glacier service. The Amazon Glacier service doesn't offer storage classes. 

 For example, Saanvi works at AnyCompany Archives. Five years ago, she used the Amazon Glacier service to store scanned copies of historical documents in a vault. AnyCompany just announced that they will have a different online exhibit each month, featuring documents that are stored in the Amazon Glacier vault. To address this change of business: 
+  Saanvi wants to take advantage of the storage classes offered with the Amazon S3 service, including more flexibility in how files are stored and accessed. 
+  Using Data Transfer from Amazon Glacier Vaults to Amazon S3, Saanvi can copy all of her document archives from her Amazon Glacier vault to an S3 bucket. She can assign them to the S3 storage classes that best fit her use cases. For example, she can use the S3 Standard storage class for documents that will be featured in the first exhibit and accessed daily, and the S3 Glacier Deep Archive storage class for documents that won't be featured in any of the exhibits. 
+  Now that the documents are stored in the Amazon S3 service, Saanvi can also apply [S3 Lifecycle](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html) configurations, tag her data, and use the Amazon S3 console. 

**Note**  
 This Guidance doesn't delete the original archives or the source Amazon Glacier vault. You must manually delete the archives and vault. For more information, refer to [Deleting an Archive in Amazon Glacier](https://docs.aws.amazon.com/amazonglacier/latest/dev/deleting-an-archive.html) in the *Amazon Glacier Developer Guide*.   
 If your source Amazon Glacier vault has a [Vault Lock policy](https://docs.aws.amazon.com/amazonglacier/latest/dev/vault-lock-policy.html) that prevents deletion, you must delete this policy before deleting the original archives. However, if your Vault Lock policy is in the Locked state, you can't delete it. For more information, see [Amazon Glacier Vault Lock](https://docs.aws.amazon.com/amazonglacier/latest/dev/vault-lock.html) and [Abort Vault Lock (DELETE lock-policy)](https://docs.aws.amazon.com/amazonglacier/latest/dev/api-AbortVaultLock.html) in the *Amazon Glacier Developer Guide*. 
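
As a rough illustration of the manual cleanup described in this note, the following Python sketch deletes every archive listed in a vault inventory. It assumes a client object that exposes the `delete_archive` operation of the AWS SDK for Python (Boto3) Amazon Glacier client, and an inventory dictionary in the JSON layout that Amazon Glacier returns (an `ArchiveList` whose entries carry an `ArchiveId`). The function name `delete_archives` is hypothetical and is not part of this Guidance.

```python
from typing import Any, Dict, List


def delete_archives(glacier_client: Any, vault_name: str,
                    inventory: Dict[str, Any]) -> List[str]:
    """Delete every archive listed in a vault inventory and return the deleted IDs.

    Hypothetical helper, not part of the Guidance. `glacier_client` is assumed
    to behave like the object returned by boto3.client("glacier").
    """
    deleted: List[str] = []
    for entry in inventory.get("ArchiveList", []):
        archive_id = entry["ArchiveId"]
        # accountId="-" means "the account that owns the caller's credentials".
        glacier_client.delete_archive(
            accountId="-", vaultName=vault_name, archiveId=archive_id
        )
        deleted.append(archive_id)
    return deleted
```

Run something like this only after you have confirmed that the copy completed and that no Vault Lock policy blocks deletion, because a deleted archive can't be recovered.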

 This implementation guide provides an overview of the Data Transfer from Amazon Glacier Vaults to Amazon S3 Guidance, its reference architecture and components, considerations for planning the deployment, and configuration steps for deploying the Guidance to the Amazon Web Services (AWS) Cloud. 

 This guide is intended for solution architects, business decision makers, DevOps engineers, data scientists, and cloud professionals who want to use this Guidance's features and capabilities in their environment. Practical experience with the AWS Cloud, Amazon Glacier vaults, Amazon S3 buckets, and Amazon S3 storage classes is preferred. 

 Use this navigation table to quickly find answers to these questions: 


|  If you want to . . .  |  Read . . .  | 
| --- | --- | 
|   Know the cost for running this Guidance.   The estimated cost for running this Guidance in the US East (Ohio) Region is USD \$1153.57 to copy 100,000 Amazon Glacier vault archives, totaling 100 TB of data, from an Amazon Glacier vault to an S3 bucket.   |  [Cost](cost.md)  | 
|  Understand the security considerations for this Guidance.  |  [Security](security-1.md)  | 
|   Know how to plan for quotas for this Guidance.   This Guidance uses [AWS Lambda](https://aws.amazon.com/lambda/) functions to transfer data. This affects your account-wide Lambda concurrency limit.   |  [Quotas](quotas.md)  | 
|  Know which AWS Regions support this Guidance.  |  [Supported AWS Regions](plan-your-deployment.md#supported-aws-regions)  | 
|  View or download the AWS CloudFormation template included in this Guidance to automatically deploy the infrastructure resources (the "stack") for this Guidance.  |  [AWS CloudFormation template](aws-cloudformation-template.md)  | 
| Access the source code and optionally use the AWS Cloud Development Kit (AWS CDK) to deploy the Guidance. | [GitHub repository](https://github.com/aws-solutions/data-transfer-from-amazon-s3-glacier-vaults-to-amazon-s3) | 

 
# Features and benefits
<a name="features-and-benefits"></a>

 The Guidance provides the following features: 

 **Automation** 

Automate the process of restoring, copying, and transferring archives from a vault in the Amazon Glacier service to a bucket in the Amazon S3 service. After you move your data, you can use the Amazon S3 console. A prebuilt Amazon CloudWatch dashboard helps you monitor metrics and visualize the copy operation progress. 

 **Ability to assign storage classes** 

 When you use this Guidance to move your data from the Amazon Glacier service into the Amazon S3 service, you choose a storage class to assign to all of your objects. After your data is stored in the Amazon S3 service, you can change the storage classes to fit the use case for each file. We recommend carefully reviewing each storage class and its pricing details before deploying this Guidance. See [Amazon S3 storage class considerations](amazon-s3-storage-class-considerations.md) for more information. 

 **Visibility and access to data** 

 After the Guidance stores Amazon Glacier archives as objects in the destination S3 bucket, you can add tags to your data. Tagging offers benefits such as data classification, permissions controls, object lifecycle management, and cost allocation. For more information, see [Categorizing your storage using tags](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html) in the *Amazon Simple Storage Service User Guide*. 
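
To make the tagging step more concrete, here is a minimal Python sketch that turns a plain dictionary into the `TagSet` structure that Amazon S3 object tagging requests expect. The helper name `build_tag_set` and the example tag keys are assumptions for illustration. The limits it checks (10 tags per object, 128-character keys, 256-character values) are the documented Amazon S3 object tag limits.

```python
from typing import Dict, List

MAX_TAGS_PER_OBJECT = 10   # Amazon S3 limit on tags per object
MAX_KEY_LENGTH = 128       # maximum tag key length, in characters
MAX_VALUE_LENGTH = 256     # maximum tag value length, in characters


def build_tag_set(tags: Dict[str, str]) -> Dict[str, List[Dict[str, str]]]:
    """Convert {"key": "value"} pairs into an S3-style Tagging structure.

    Hypothetical helper, not part of the Guidance.
    """
    if len(tags) > MAX_TAGS_PER_OBJECT:
        raise ValueError(f"at most {MAX_TAGS_PER_OBJECT} tags are allowed per object")
    for key, value in tags.items():
        if not key or len(key) > MAX_KEY_LENGTH:
            raise ValueError(f"invalid tag key length: {key!r}")
        if len(value) > MAX_VALUE_LENGTH:
            raise ValueError(f"tag value too long for key {key!r}")
    return {"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]}


# Example tags for data classification and cost allocation (made-up values).
tagging = build_tag_set({"classification": "public", "cost-center": "exhibits"})
```

The resulting dictionary has the shape accepted by the `Tagging` parameter of the Boto3 `put_object_tagging` call.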

 **Flexibility to cancel your transfer and resume later** 

 The Guidance tracks the progress of the transfer. You can stop and restart transfers without needing to retransfer existing archives. See [Problem: Transfer workflow must be stopped](troubleshooting.md#problem-transfer-workflow-must-be-stopped) for more information. 

 **Cost optimization** 

 Copy Amazon Glacier vault archives to an S3 bucket and assign more [cost-effective storage classes](https://aws.amazon.com/s3/pricing/), such as: 
+  The low-cost S3 Glacier Deep Archive storage class for data that you rarely access 
+  The S3 Glacier Instant Retrieval storage class if you'll need your data quarterly but within milliseconds 
+  The S3 Standard storage class for data you'll need daily 

 You can also configure and apply [S3 Lifecycle](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html) configurations to transition your objects automatically into different storage classes, based on rules you set. 
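
As one possible illustration of such a rule, the sketch below builds a lifecycle configuration that moves objects under a prefix to the S3 Glacier Deep Archive storage class after a number of days. The helper name, rule ID, prefix, and 30-day threshold are all assumptions chosen for the example. The dictionary layout follows the `LifecycleConfiguration` structure of the Amazon S3 `PutBucketLifecycleConfiguration` API.

```python
from typing import Any, Dict


def build_lifecycle_configuration(rule_id: str, prefix: str,
                                  days: int) -> Dict[str, Any]:
    """Build a one-rule S3 Lifecycle configuration (hypothetical helper).

    Objects whose key starts with `prefix` transition to the
    S3 Glacier Deep Archive storage class `days` days after creation.
    """
    if days < 0:
        raise ValueError("days must not be negative")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days, "StorageClass": "DEEP_ARCHIVE"}
                ],
            }
        ]
    }


# Example: archive documents from past exhibits after 30 days (made-up prefix).
config = build_lifecycle_configuration("archive-past-exhibits", "exhibits/past/", 30)
```

A dictionary like `config` could then be passed as the `LifecycleConfiguration` argument of the Boto3 `put_bucket_lifecycle_configuration` call for the destination bucket.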

 **Integration with Service Catalog AppRegistry and Application Manager, a capability of AWS Systems Manager** 

 This Guidance includes a [Service Catalog AppRegistry](https://docs.aws.amazon.com/servicecatalog/latest/arguide/intro-app-registry.html) resource to register the Guidance's CloudFormation template and its underlying resources as an application in both Service Catalog AppRegistry and [Application Manager](https://docs.aws.amazon.com/systems-manager/latest/userguide/application-manager.html). With this integration, you can centrally manage the Guidance's resources and enable application search, reporting, and management actions. 

 
# Use cases
<a name="use-cases"></a>

**Performance and cost optimization**

Balancing cloud storage performance and storage cost is crucial for organizations. You can use this Guidance to help you optimize your storage infrastructure with the features that the Amazon S3 service offers. With tagging, lifecycle configurations, and a variety of storage classes, you can improve your performance while minimizing costs.

**Cloud archiving**

Many organizations store their most fundamental asset—their data—in locations that are slow to retrieve and lack flexibility. You can use this Guidance to help you automate, monitor, and seamlessly move your data when and where you need it.

# Concepts and definitions
<a name="concepts-and-definitions"></a>

 This section describes key concepts and defines terminology specific to this Guidance:  

 **archive** 

 Any data stored in an Amazon Glacier vault, such as a photo, video, or document. An archive is similar to an Amazon S3 object: it's the base unit of storage in the Amazon Glacier service. For more information, see [Archive](https://docs.aws.amazon.com/amazonglacier/latest/dev/amazon-glacier-data-model.html#data-model-archive) in the *Amazon Glacier Developer Guide*. 

 **chunk** 

 Term used to describe a *part* in a multipart upload or download for the Amazon Glacier service. This Guidance uses multipart upload to transfer the archives. For more information, see [Uploading Large Archives in Parts (Multipart Upload)](https://docs.aws.amazon.com/amazonglacier/latest/dev/uploading-archive-mpu.html) and [Retrieving Amazon Glacier Archives Using AWS Management Console](https://docs.aws.amazon.com/amazonglacier/latest/dev/downloading-an-archive-two-steps.html) in the *Amazon Glacier Developer Guide*. 
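
To illustrate what splitting an archive into chunks can look like, the following Python sketch computes inclusive byte ranges for an archive of a given size. The function names and the 4 MiB chunk size are assumptions for the example. The chunk size that the Guidance actually uses is not stated in this section. The `bytes=start-end` string matches the range format that Amazon Glacier accepts when retrieving job output.

```python
from typing import List, Tuple

MIB = 1024 * 1024


def chunk_ranges(archive_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split an archive of `archive_size` bytes into inclusive (start, end) ranges.

    Hypothetical helper: every chunk has `chunk_size` bytes except possibly
    the last one, which holds the remainder.
    """
    if archive_size <= 0 or chunk_size <= 0:
        raise ValueError("archive_size and chunk_size must be positive")
    return [
        (start, min(start + chunk_size, archive_size) - 1)
        for start in range(0, archive_size, chunk_size)
    ]


def range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as a range string."""
    return f"bytes={start}-{end}"


# A 10 MiB archive split into 4 MiB chunks yields three ranges.
ranges = chunk_ranges(10 * MIB, 4 * MIB)
# ranges == [(0, 4194303), (4194304, 8388607), (8388608, 10485759)]
```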

 **inventory** 

 A point-in-time snapshot or listing of the archives stored within an Amazon Glacier vault. For more information, see [Downloading a Vault Inventory in Amazon Glacier](https://docs.aws.amazon.com/amazonglacier/latest/dev/vault-inventory.html) in the *Amazon Glacier Developer Guide*. 
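
For orientation, a JSON-format inventory lists each archive with fields such as `ArchiveId` and `Size` under an `ArchiveList` key. The sketch below, with a hypothetical function name and made-up sample data, counts the archives in such an inventory and totals their size, which can help when estimating how much data a transfer will copy.

```python
import json
from typing import Tuple


def summarize_inventory(inventory_json: str) -> Tuple[int, int]:
    """Return (archive_count, total_bytes) for a JSON vault inventory.

    Hypothetical helper, not part of the Guidance.
    """
    inventory = json.loads(inventory_json)
    archives = inventory.get("ArchiveList", [])
    return len(archives), sum(archive["Size"] for archive in archives)


# Made-up sample inventory with two archives.
sample = json.dumps({
    "VaultARN": "arn:aws:glacier:us-east-2:111122223333:vaults/example-vault",
    "InventoryDate": "2024-01-01T00:00:00Z",
    "ArchiveList": [
        {"ArchiveId": "example-id-1", "Size": 1048576},
        {"ArchiveId": "example-id-2", "Size": 2097152},
    ],
})
count, total_bytes = summarize_inventory(sample)
```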

 **tag** 

 A key-value pair used to categorize storage in the Amazon S3 service. For more information, see [Categorizing your storage using tags](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html) in the *Amazon S3 User Guide*. 

 **vault** 

 A container in the Amazon Glacier service for storing archives. An Amazon Glacier vault is similar to an S3 bucket. For more information, see [Vault](https://docs.aws.amazon.com/amazonglacier/latest/dev/amazon-glacier-data-model.html#data-model-vault) in the *Amazon Glacier Developer Guide*. 

 **workflow\_run** 

 An identifier used to represent the transfer of an Amazon Glacier vault to an S3 bucket. The Guidance randomly generates the `workflow_run` value on the first run (or you can choose the value). The Guidance uses this value when resuming a transfer. 
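
The idea of resuming by identifier can be sketched as follows. This is only a conceptual illustration: the identifier format, the `resolve_workflow_run` and `TransferProgress` names, and the in-memory store are all assumptions, and the Guidance's real implementation is not described in this section.

```python
import uuid
from typing import Dict, Iterable, List, Optional, Set


def resolve_workflow_run(chosen: Optional[str] = None) -> str:
    """Use the caller's value if one is given, otherwise generate a random one."""
    return chosen if chosen else f"workflow_{uuid.uuid4().hex}"


class TransferProgress:
    """In-memory stand-in for a progress store keyed by workflow_run."""

    def __init__(self) -> None:
        self._completed: Dict[str, Set[str]] = {}

    def mark_copied(self, workflow_run: str, archive_id: str) -> None:
        """Record that an archive was copied during the given run."""
        self._completed.setdefault(workflow_run, set()).add(archive_id)

    def pending(self, workflow_run: str, archive_ids: Iterable[str]) -> List[str]:
        """Return the archives that the given run has not copied yet."""
        done = self._completed.get(workflow_run, set())
        return [a for a in archive_ids if a not in done]


progress = TransferProgress()
run = resolve_workflow_run("my_first_transfer")
progress.mark_copied(run, "archive-1")
# Resuming with the same workflow_run skips the archive that was already copied.
remaining = progress.pending(run, ["archive-1", "archive-2"])
```

Reusing the same `workflow_run` value is what lets a resumed transfer find its earlier progress. A different value would be treated as a fresh transfer.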

**Note**  
 For a general reference of AWS terms, see the [AWS Glossary](https://docs.aws.amazon.com/general/latest/gr/glos-chap.html).