

# Importing changes from your data repository
<a name="importing-files-dra"></a>

You can import changes to data and POSIX metadata from a linked data repository to your Amazon FSx file system. Associated POSIX metadata includes ownership, permissions, and timestamps.

To import changes to the file system, use one of the following methods:
+ Configure your file system to automatically import new, changed, or deleted files from your linked data repository. For more information, see [Automatically import updates from your S3 bucket](autoimport-data-repo-dra.md).
+ Select the option to import metadata when you create a data repository association. This will initiate an import data repository task immediately after creating the data repository association.
+ Use an on-demand import data repository task. For more information, see [Using data repository tasks to import changes](import-data-repo-task-dra.md).

Automatic import and import data repository tasks can run at the same time.

When you turn on automatic import for a data repository association, your file system automatically updates file metadata as objects are created, modified, or deleted in S3. When you select the option to import metadata while creating a data repository association, your file system imports metadata for all objects in the data repository. When you import using an import data repository task, your file system imports only metadata for objects that were created or modified since the last import.

FSx for Lustre automatically copies the content of a file from your data repository and loads it into the file system when your application first accesses the file in the file system. This data movement is managed by FSx for Lustre and is transparent to your applications. Subsequent reads of these files are served directly from the file system with sub-millisecond latencies.

You can also preload your whole file system or a directory within your file system. For more information, see [Preloading files into your file system](preload-file-contents-hsm-dra.md). If you request the preloading of multiple files simultaneously, FSx for Lustre loads files from your Amazon S3 data repository in parallel.

FSx for Lustre only imports S3 objects that have POSIX-compliant object keys. Both automatic import and import data repository tasks import POSIX metadata. For more information, see [POSIX metadata support for data repositories](posix-metadata-support.md).

**Note**  
FSx for Lustre doesn't support importing metadata for symbolic links (symlinks) from S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive storage classes. Metadata for S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive objects that aren't symlinks can be imported (that is, an inode is created on the FSx for Lustre file system with the correct metadata). However, to read this data from the file system, you must first restore the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive object. Importing file data directly from Amazon S3 objects in the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage class into FSx for Lustre isn't supported.

# Automatically import updates from your S3 bucket
<a name="autoimport-data-repo-dra"></a>

You can configure FSx for Lustre to automatically update metadata in the file system as objects are added to, changed in, or deleted from your S3 bucket. FSx for Lustre creates, updates, or deletes the file and directory listing corresponding to the change in S3. If the changed object in the S3 bucket no longer contains its metadata, FSx for Lustre maintains the current metadata values of the file, including the current permissions.

**Note**  
The FSx for Lustre file system and the linked S3 bucket must be located in the same AWS Region to automatically import updates.

You can configure automatic import when you create the data repository association, and you can update the automatic import settings at any time using the FSx management console, the AWS CLI, or the AWS API.

**Note**  
You can configure both automatic import and automatic export on the same data repository association. This topic describes only the automatic import feature.

**Important**  
If an object is modified in S3 with all automatic import policies enabled and automatic export disabled, the content of that object is always imported to a corresponding file in the file system. If a file already exists in the target location, the file is overwritten.
If a file is modified in both the file system and S3, with all automatic import and automatic export policies enabled, either the file in the file system or the object in S3 could be overwritten by the other. It isn't guaranteed that a later edit in one location will overwrite an earlier edit in another location. If you modify the same file in both the file system and the S3 bucket, you should ensure application-level coordination to prevent such conflicts. FSx for Lustre doesn't prevent conflicting writes in multiple locations.

The import policy specifies how you want FSx for Lustre to update your file system as the contents change in the linked S3 bucket. A data repository association can have one of the following import policies:
+ **New** – FSx for Lustre automatically updates file and directory metadata only when new objects are added to the linked S3 data repository.
+ **Changed** – FSx for Lustre automatically updates file and directory metadata only when an existing object in the data repository is changed.
+ **Deleted** – FSx for Lustre automatically updates file and directory metadata only when an object in the data repository is deleted.
+ **Any combination of New, Changed, and Deleted** – FSx for Lustre automatically updates file and directory metadata when any of the specified actions occur in the S3 data repository. For example, you can specify that the file system is updated when an object is added to (**New**) or removed from (**Deleted**) the S3 repository, but not updated when an object is changed.
+ **No policy configured** – FSx for Lustre doesn't update file and directory metadata on the file system when objects are added to, changed in, or deleted from the S3 data repository. If you don't configure an import policy, automatic import is disabled for the data repository association. You can still manually import metadata changes by using an import data repository task, as described in [Using data repository tasks to import changes](import-data-repo-task-dra.md).
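
As a sketch, an import policy covering all three event types can be set when you create the data repository association with the AWS CLI. The file system ID, file system path, and bucket name below are placeholders, and the `--s3` shorthand syntax should be checked against the current CLI reference:

```shell
# Sketch: create a DRA with NEW, CHANGED, and DELETED import events enabled.
# All resource identifiers below are placeholders.
aws fsx create-data-repository-association \
    --file-system-id fs-0123456789abcdef0 \
    --file-system-path /ns1 \
    --data-repository-path s3://amzn-s3-demo-bucket \
    --s3 "AutoImportPolicy={Events=[NEW,CHANGED,DELETED]}"
```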

**Important**  
Automatic import will not synchronize the following S3 actions with your linked FSx for Lustre file system:  
Deleting an object using S3 object lifecycle expirations
Permanently deleting the current object version in a versioning-enabled bucket
Undeleting an object in a versioning-enabled bucket

For most use cases, we recommend that you configure an import policy of **New**, **Changed**, and **Deleted**. This policy ensures that all updates made in your linked S3 data repository are automatically imported to your file system.

When you set an import policy to update your file system file and directory metadata based on changes in the linked S3 data repository, FSx for Lustre creates an event notification configuration on the linked S3 bucket. The event notification configuration is named `FSx`. Don't modify or delete the `FSx` event notification configuration on the S3 bucket – doing so will prevent the automatic import of updated file and directory metadata to your file system.
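
If you want to confirm that the `FSx` event notification configuration is in place, you can inspect the bucket's notification configuration, as in the following sketch (the bucket name is a placeholder):

```shell
# The returned notification configuration should include an entry named "FSx".
aws s3api get-bucket-notification-configuration \
    --bucket amzn-s3-demo-bucket
```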

When FSx for Lustre updates a file listing that has changed on the linked S3 data repository, it overwrites the local file with the updated version, even if the file is write-locked.

FSx for Lustre makes a best effort to update your file system. FSx for Lustre cannot update the file system in the following situations:
+ If FSx for Lustre doesn't have permission to open the changed or new S3 object. In this case, FSx for Lustre skips the object and continues. The DRA lifecycle state isn't affected.
+ If FSx for Lustre doesn't have bucket-level permissions, such as for `GetBucketAcl`. This will cause the data repository lifecycle state to become **Misconfigured**. For more information, see [Data repository association lifecycle state](dra-lifecycles.md).
+ If the `FSx` event notification configuration on the linked S3 bucket is deleted or changed. This will cause the data repository lifecycle state to become **Misconfigured**. For more information, see [Data repository association lifecycle state](dra-lifecycles.md).

We recommend that you [turn on logging](cw-event-logging.md#manage-logging) to CloudWatch Logs to log information about any files or directories that couldn't be imported automatically. Warnings and errors in the log contain information about the failure reason. For more information, see [Data repository event logs](data-repo-event-logs.md).

## Prerequisites
<a name="auto-import-prereqs-dra"></a>

The following conditions are required for FSx for Lustre to automatically import new, changed, or deleted files from the linked S3 bucket:
+ The file system and its linked S3 bucket are located in the same AWS Region.
+ The S3 bucket doesn't have a misconfigured **Lifecycle state**. For more information, see [Data repository association lifecycle state](dra-lifecycles.md).
+ Your account has the permissions required to configure and receive event notifications on the linked S3 bucket.

## Types of file changes supported
<a name="file-change-support-dra"></a>

FSx for Lustre supports importing the following changes to files and directories that occur in the linked S3 bucket:
+ Changes to file contents.
+ Changes to file or directory metadata.
+ Changes to symlink target or metadata.
+ Deletions of files and directories. If you delete an object in the linked S3 bucket which corresponds to a directory in the file system (that is, an object with a key name that ends with a slash), FSx for Lustre deletes the corresponding directory on the file system only if it is empty.

## Updating import settings
<a name="manage-autoimport-dra"></a>

You can set a file system's import settings for a linked S3 bucket when you create the data repository association. For more information, see [Creating a link to an S3 bucket](create-linked-dra.md).

You can also update the import settings at any time, including the import policy. For more information, see [Updating data repository association settings](update-dra-settings.md).
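
For example, the import policy of an existing association can be changed from the CLI, as in the following sketch (the association ID is a placeholder):

```shell
# Sketch: narrow the import policy of an existing DRA to NEW and DELETED events.
aws fsx update-data-repository-association \
    --association-id dra-0123456789abcdef0 \
    --s3 "AutoImportPolicy={Events=[NEW,DELETED]}"
```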

## Monitoring automatic import
<a name="monitoring-autoimport"></a>

If the rate of change in your S3 bucket exceeds the rate at which automatic import can process these changes, the corresponding metadata changes being imported to your FSx for Lustre file system are delayed. If this occurs, you can use the `AgeOfOldestQueuedMessage` metric to monitor the age of the oldest change waiting to be processed by automatic import. For more information on this metric, see [FSx for Lustre S3 repository metrics](fs-metrics.md#auto-import-export-metrics).

If the delay in importing metadata changes exceeds 14 days (as measured using the `AgeOfOldestQueuedMessage` metric), changes in your S3 bucket that haven't been processed by automatic import aren't imported into your file system. Additionally, your data repository association lifecycle is marked as **MISCONFIGURED** and automatic import is stopped. If you have automatic export enabled, automatic export continues monitoring your FSx for Lustre file system for changes. However, additional changes aren't synchronized from your FSx for Lustre file system to S3.

To return your data repository association from the **MISCONFIGURED** lifecycle state to the **AVAILABLE** lifecycle state, you must update your data repository association. You can update your data repository association using the [update-data-repository-association](https://docs.aws.amazon.com/cli/latest/reference/fsx/update-data-repository-association.html) CLI command (or the corresponding [UpdateDataRepositoryAssociation](https://docs.aws.amazon.com/fsx/latest/APIReference/API_UpdateDataRepositoryAssociation.html) API operation). The only request parameter that you need is the `AssociationID` of the data repository association that you want to update.
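
As a sketch, the update call needs only the association ID (the ID shown is a placeholder):

```shell
# Updating the DRA with no other parameters returns it to the AVAILABLE state.
aws fsx update-data-repository-association \
    --association-id dra-0123456789abcdef0
```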

After the data repository association lifecycle state changes to **AVAILABLE**, automatic import (and automatic export if enabled) restarts. Upon restarting, automatic export resumes synchronizing file system changes to S3. To synchronize the metadata of new and changed objects in S3 with your FSx for Lustre file system that weren't imported or are from when the data repository association was in a misconfigured state, run an [import data repository task](import-data-repo-task-dra.md). Import data repository tasks don't synchronize deletes in your S3 bucket with your FSx for Lustre file system. If you want to fully synchronize S3 with your file system (including deletes), you must re-create your file system.

To ensure that delays to importing metadata changes don't exceed 14 days, we recommend that you set an alarm on the `AgeOfOldestQueuedMessage` metric and reduce activity in your S3 bucket if the `AgeOfOldestQueuedMessage` metric grows beyond your alarm threshold. For an FSx for Lustre file system connected to an S3 bucket with a single shard continuously sending the maximum number of possible changes from S3, with only automatic import running on the FSx for Lustre file system, automatic import can process a 7-hour backlog of S3 changes within 14 days.
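
A CloudWatch alarm on this metric might look like the following sketch. The three-day threshold (259,200 seconds) and the dimension shown are assumptions; verify the metric's actual dimensions and choose a threshold that gives you enough time to react before the 14-day limit:

```shell
# Sketch: alarm when the oldest queued change is more than 3 days old.
# The file system ID and dimension name are placeholders/assumptions.
aws cloudwatch put-metric-alarm \
    --alarm-name fsx-autoimport-backlog \
    --namespace AWS/FSx \
    --metric-name AgeOfOldestQueuedMessage \
    --dimensions Name=FileSystemId,Value=fs-0123456789abcdef0 \
    --statistic Maximum \
    --period 3600 \
    --evaluation-periods 1 \
    --threshold 259200 \
    --comparison-operator GreaterThanThreshold
```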

Additionally, a single S3 action can generate more changes than automatic import can process in 14 days. Examples of such actions include, but are not limited to, AWS Snowball uploads to S3 and large-scale deletions. If you make a large-scale change to your S3 bucket that you want synchronized with your FSx for Lustre file system, you should delete your file system and re-create it after the S3 change has completed, so that the automatic import backlog doesn't exceed 14 days.

If your `AgeOfOldestQueuedMessage` metric is growing, review your S3 bucket `GetRequests`, `PutRequests`, `PostRequests`, and `DeleteRequests` metrics for activity changes that would cause an increase in the rate and/or number of changes being sent to automatic import. For information about available S3 metrics, see [Monitoring Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/monitoring-overview.html) in the *Amazon S3 User Guide*.

For a list of all available FSx for Lustre metrics, see [Monitoring with Amazon CloudWatch](monitoring-cloudwatch.md).

# Using data repository tasks to import changes
<a name="import-data-repo-task-dra"></a>

The import data repository task imports metadata of objects that are new or changed in your S3 data repository, creating a new file or directory listing for any new object in the S3 data repository. For any object that has been changed in the data repository, the corresponding file or directory listing is updated with the new metadata. No action is taken for objects that have been deleted from the data repository.

Use the following procedures to import metadata changes by using the Amazon FSx console and CLI. Note that you can use one data repository task for multiple DRAs.

## To import metadata changes (console)
<a name="create-import-data-repo-task-dra-console"></a>

1. Open the Amazon FSx console at [https://console.aws.amazon.com/fsx/](https://console.aws.amazon.com/fsx/).

1. On the navigation pane, choose **File systems**, then choose your Lustre file system.

1. Choose the **Data repository** tab.

1. In the **Data repository associations** pane, choose the data repository associations you want to create the import task for.

1. From the **Actions** menu, choose **Import task**. This choice isn't available if the file system isn't linked to a data repository. The **Create import data repository task** page appears.

1. (Optional) Specify up to 32 directories or files to import from your linked S3 buckets by providing the paths to those directories or files in **Data repository paths to import**.
**Note**  
If a path that you provide isn't valid, the task fails.

1. (Optional) Choose **Enable** under **Completion report** to generate a task completion report after the task completes. A *task completion report* provides details about the files processed by the task that meet the scope provided in **Report scope**. To specify the location for Amazon FSx to deliver the report, enter a relative path on a linked S3 data repository for **Report path**.

1. Choose **Create**. 

   A notification at the top of the **File systems** page shows the task that you just created in progress. 

To view the task status and details, scroll down to the **Data Repository Tasks** pane in the **Data Repository** tab for the file system. The default sort order shows the most recent task at the top of the list.

To view a task summary from this page, choose **Task ID** for the task you just created. The **Summary** page for the task appears. 

## To import metadata changes (CLI)
<a name="create-import-data-repo-task-dra-cli"></a>
+ Use the [create-data-repository-task](https://docs.aws.amazon.com/cli/latest/reference/fsx/create-data-repository-task.html) CLI command to import metadata changes on your FSx for Lustre file system. The corresponding API operation is [CreateDataRepositoryTask](https://docs.aws.amazon.com/fsx/latest/APIReference/API_CreateDataRepositoryTask.html).

  ```
  $ aws fsx create-data-repository-task \
      --file-system-id fs-0123456789abcdef0 \
      --type IMPORT_METADATA_FROM_REPOSITORY \
      --paths s3://bucketname1/dir1/path1 \
      --report Enabled=true,Path=s3://bucketname1/dir1/path1,Format=REPORT_CSV_20191124,Scope=FAILED_FILES_ONLY
  ```

  After successfully creating the data repository task, Amazon FSx returns the task description as JSON.

After creating the task to import metadata from the linked data repository, you can check the status of the import data repository task. For more information about viewing data repository tasks, see [Accessing data repository tasks](view-data-repo-tasks.md).
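
For example, you can poll the task status from the CLI, as in the following sketch (the file system ID is a placeholder):

```shell
# List data repository tasks for the file system and their lifecycle states.
aws fsx describe-data-repository-tasks \
    --filters Name=file-system-id,Values=fs-0123456789abcdef0
```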

# Preloading files into your file system
<a name="preload-file-contents-hsm-dra"></a>

You can optionally preload the contents of individual files or directories into your file system.

## Importing files using HSM commands
<a name="preload-hsm"></a>

Amazon FSx copies data from your Amazon S3 data repository when a file is first accessed. Because of this approach, the initial read or write to a file incurs a small amount of latency. If your application is sensitive to this latency, and you know which files or directories your application needs to access, you can optionally preload contents of individual files or directories. You do so using the `hsm_restore` command, as follows.

You can use the `hsm_action` command (issued with the `lfs` user utility) to verify that the file's contents have finished loading into the file system. A return value of `NOOP` indicates that the file has successfully been loaded. Run the following commands from a compute instance with the file system mounted. Replace *path/to/file* with the path of the file you're preloading into your file system.

```
sudo lfs hsm_restore path/to/file
sudo lfs hsm_action path/to/file
```

You can preload your whole file system or an entire directory within your file system by using the following command. (The trailing ampersand makes the command run as a background process.) If you request the preloading of multiple files simultaneously, Amazon FSx loads your files from your Amazon S3 data repository in parallel. If a file has already been loaded to the file system, the `hsm_restore` command doesn't reload it.

```
nohup find local/directory -type f -print0 | xargs -0 -n 1 -P 8 sudo lfs hsm_restore &
```

**Note**  
If your linked S3 bucket is larger than your file system, you should be able to import all the file metadata into your file system. However, you can load only as much actual file data as will fit into the file system's remaining storage space. You'll receive an error if you attempt to access file data when there is no more storage left on the file system. If this occurs, you can increase the amount of storage capacity as needed. For more information, see [Managing storage capacity](managing-storage-capacity.md).

## Validation step
<a name="preload-validation"></a>

You can run the bash script listed below to help you discover how many files or objects are in an archived (released) state.

To improve the script's performance, especially across file systems with a large number of files, the number of CPU threads is determined automatically using the `nproc` command (or the `/proc/cpuinfo` file as a fallback). That is, you will see faster performance on an Amazon EC2 instance with a higher vCPU count.

1. Set up the bash script.

   ```
   #!/bin/bash
   
   # Check if a directory argument is provided
   if [ $# -ne 1 ]; then
       echo "Usage: $0 /path/to/lustre/mount"
       exit 1
   fi
   
   # Set the root directory from the argument
   ROOT_DIR="$1"
   
   # Check if the provided directory exists
   if [ ! -d "$ROOT_DIR" ]; then
       echo "Error: Directory $ROOT_DIR does not exist."
       exit 1
   fi
   
   # Automatically detect number of CPUs and set threads
   if command -v nproc &> /dev/null; then
       THREADS=$(nproc)
   elif [ -f /proc/cpuinfo ]; then
       THREADS=$(grep -c ^processor /proc/cpuinfo)
   else
       echo "Unable to determine number of CPUs. Defaulting to 1 thread."
       THREADS=1
   fi
   
   # Output file
   OUTPUT_FILE="released_objects_$(date +%Y%m%d_%H%M%S).txt"
   
   echo "Searching in $ROOT_DIR for all released objects using $THREADS threads"
   echo "This may take a while depending on the size of the filesystem..."
   
   # Find all released files in the specified lustre directory using parallel
   # If you  get false positives for file names/paths that include the word 'released',
   # you can grep 'released exists archived' instead of just 'released'
   time sudo lfs find "$ROOT_DIR" -type f | \
   parallel --will-cite -j "$THREADS" -n 1000 "sudo lfs hsm_state {} | grep released" > "$OUTPUT_FILE"
   
   echo "Search complete. Released objects are listed in $OUTPUT_FILE"
   echo "Total number of released objects: $(wc -l <"$OUTPUT_FILE")"
   ```

1. Make the script executable:

   ```
   $ chmod +x find_lustre_released_files.sh
   ```

1. Run the script, as in the following example:

   ```
   $ ./find_lustre_released_files.sh /fsxl/sample
   Searching in /fsxl/sample for all released objects using 16 threads
   This may take a while depending on the size of the filesystem...
   real 0m9.906s
   user 0m1.502s
   sys 0m5.653s
   Search complete. Released objects are listed in released_objects_20241121_184537.txt
   Total number of released objects: 30000
   ```

If there are released objects present, then perform a bulk restore on the desired directories to bring the files into FSx for Lustre from S3, as in the following example:

```
$ DIR=/path/to/lustre/mount
$ nohup find $DIR -type f -print0 | xargs -0 -n 1 -P 8 sudo lfs hsm_restore &
```

Note that `hsm_restore` can take a while when there are millions of files.