

# Using files in your jobs
<a name="using-files-in-your-jobs"></a>

Many of the jobs that you submit to AWS Deadline Cloud have input and output files. Your input files and output directories may be located on a combination of shared file systems and local drives, and jobs need to locate the content in those locations. Deadline Cloud provides two features, [job attachments](https://docs.aws.amazon.com/deadline-cloud/latest/userguide/storage-job-attachments.html) and [storage profiles](https://docs.aws.amazon.com/deadline-cloud/latest/userguide/storage-shared.html), that work together to help your jobs locate the files that they need.

Job attachments offer several benefits:
+ Move files between hosts using Amazon S3
+ Transfer files from your workstation to worker hosts and vice versa
+ Available for jobs in queues where you enable the feature
+ Primarily used with service-managed fleets, but also compatible with customer-managed fleets

Use storage profiles to map the layout of shared file system locations on your workstation and worker hosts. This mapping helps your jobs locate shared files and directories when their locations differ between your workstation and worker hosts, such as in cross-platform setups with Windows-based workstations and Linux-based worker hosts. Job attachments also uses this map of your file system configuration to identify the files that it needs to transfer between hosts through Amazon S3.

If you are not using job attachments, and you don't need to remap file and directory locations between workstations and worker hosts, then you don't need to model your file shares with storage profiles.
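The remapping that storage profiles enable can be illustrated with a short sketch. The following Python function is purely illustrative; Deadline Cloud generates and applies path mapping rules for you, and the example paths are hypothetical:

```
# Illustrative only: Deadline Cloud performs this remapping for you.
# Shows the idea of rewriting a path under one shared location root
# to the same path under a different root on another host.
def remap(path: str, source_root: str, dest_root: str) -> str:
    """Rewrite a path under source_root to the same path under dest_root."""
    if not path.startswith(source_root):
        raise ValueError(f"{path!r} is not under {source_root!r}")
    relative = path[len(source_root):].lstrip("\\/").replace("\\", "/")
    return f"{dest_root}/{relative}"

# A Windows workstation path on a shared drive, remapped to its
# location on a Linux worker host:
print(remap(r"Z:\scenes\shot010.blend", "Z:\\", "/mnt/projects/project1"))
# /mnt/projects/project1/scenes/shot010.blend
```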

**Topics**
+ [Sample project infrastructure](sample-project-infrastructure.md)
+ [Storage profiles and path mapping](storage-profiles-and-path-mapping.md)

# Sample project infrastructure
<a name="sample-project-infrastructure"></a>

To demonstrate using job attachments and storage profiles, set up a test environment with two separate projects. You can use the Deadline Cloud console to create the test resources.

1. If you haven't already, create a test farm. To create a farm, follow the procedure in [Create a farm](https://docs.aws.amazon.com/deadline-cloud/latest/userguide/farms.html). 

1. Create two queues for jobs in each of the two projects. To create queues, follow the procedure in [Create a queue](https://docs.aws.amazon.com/deadline-cloud/latest/userguide/create-queue.html).

   1. Create the first queue, named **Q1**. Use the following configuration, and keep the defaults for all other settings.
      + For job attachments, choose **Create a new Amazon S3 bucket**.
      + Select **Enable association with customer-managed fleets**.
      + For the run as user, enter **jobuser** for both the POSIX user and group.
      + For the queue service role, create a new role named **AssetDemoFarm-Q1-Role**.
      + Clear the default conda queue environment checkbox.

   1. Create the second queue, named **Q2**. Use the following configuration, and keep the defaults for all other settings.
      + For job attachments, choose **Create a new Amazon S3 bucket**.
      + Select **Enable association with customer-managed fleets**.
      + For the run as user, enter **jobuser** for both the POSIX user and group.
      + For the queue service role, create a new role named **AssetDemoFarm-Q2-Role**.
      + Clear the default conda queue environment checkbox.

1. Create a single customer-managed fleet that runs the jobs from both queues. To create the fleet, follow the procedure in [Create a customer-managed fleet](https://docs.aws.amazon.com/deadline-cloud/latest/userguide/create-a-cmf.html). Use the following configuration:
   + For **Name**, use **DemoFleet**.
   + For **Fleet type**, choose **Customer managed**.
   + For **Fleet service role**, create a new role named **AssetDemoFarm-Fleet-Role**.
   + Don't associate the fleet with any queues.

The test environment assumes that there are three file systems shared between hosts using network file shares. In this example, the locations have the following names:
+ `FSCommon` - contains input job assets that are common to both projects.
+ `FS1` - contains input and output job assets for project 1.
+ `FS2` - contains input and output job assets for project 2.

The test environment also assumes that there are three workstations, as follows:
+ `WSAll` - A Linux-based workstation used by developers for all projects. The shared file system locations are:
  + `FSCommon`: `/shared/common`
  + `FS1`: `/shared/projects/project1`
  + `FS2`: `/shared/projects/project2`
+ `WS1` - A Windows-based workstation used for project 1. The shared file system locations are:
  + `FSCommon`: `S:\`
  + `FS1`: `Z:\`
  + `FS2`: Not available
+ `WS2` - A macOS-based workstation used for project 2. The shared file system locations are:
  + `FSCommon`: `/Volumes/common`
  + `FS1`: Not available
  + `FS2`: `/Volumes/projects/project2`

Finally, define the shared file system locations for the workers in your fleet. The examples that follow refer to this configuration as `WorkerConfig`. The shared locations are: 
+ `FSCommon`: `/mnt/common`
+ `FS1`: `/mnt/projects/project1`
+ `FS2`: `/mnt/projects/project2`

 You don't need to set up any shared file systems, workstations, or workers that match this configuration. The shared locations don't need to exist for the demonstration. 
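For reference, this sample configuration can be summarized as plain data. The following Python dictionary is illustrative only; it mirrors the locations listed above and is not a Deadline Cloud data structure:

```
# Hypothetical summary of the sample infrastructure, as plain data.
# Names and paths match the text above; nothing here is required
# by Deadline Cloud.
STORAGE_LAYOUTS = {
    "WSAll": {"FSCommon": "/shared/common",
              "FS1": "/shared/projects/project1",
              "FS2": "/shared/projects/project2"},
    "WS1":   {"FSCommon": "S:\\", "FS1": "Z:\\"},
    "WS2":   {"FSCommon": "/Volumes/common",
              "FS2": "/Volumes/projects/project2"},
    "WorkerConfig": {"FSCommon": "/mnt/common",
                     "FS1": "/mnt/projects/project1",
                     "FS2": "/mnt/projects/project2"},
}

# Which hosts can see project 1's files? Only those that mount FS1.
hosts_with_fs1 = sorted(h for h, mounts in STORAGE_LAYOUTS.items()
                        if "FS1" in mounts)
print(hosts_with_fs1)  # ['WS1', 'WSAll', 'WorkerConfig']
```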

# Storage profiles and path mapping
<a name="storage-profiles-and-path-mapping"></a>

Use storage profiles to model the file systems on your workstation and worker hosts. Each storage profile describes the operating system and file system layout of one of your system configurations. This topic describes how to use storage profiles to model the file system configurations of your hosts so that Deadline Cloud can generate path mapping rules for your jobs, and explains how those rules are derived from your storage profiles.

When you submit a job to Deadline Cloud, you can provide an optional storage profile ID for the job. This storage profile describes the submitting workstation's file system, the original configuration that the file paths in the job template use.

You can also associate a storage profile with a fleet. That storage profile describes the file system configuration of all worker hosts in the fleet. If you have workers with different file system configurations, assign them to separate fleets in your farm.

 Path mapping rules describe how paths should be remapped from how they are specified in the job to the path's actual location on a worker host. Deadline Cloud compares the file system configuration described in a job's storage profile with the storage profile of the fleet that is running the job to derive these path mapping rules. 

**Topics**
+ [Model shared file system locations with storage profiles](modeling-your-shared-filesystem-locations-with-storage-profiles.md)
+ [Configure storage profiles for fleets](configuring-storage-profiles-for-fleets.md)
+ [Configure storage profiles for queues](storage-profiles-for-queues.md)
+ [Derive path mapping rules from storage profiles](deriving-path-mapping-rules-from-storage-profiles.md)

# Model shared file system locations with storage profiles
<a name="modeling-your-shared-filesystem-locations-with-storage-profiles"></a>

A storage profile models the file system configuration of one of your host configurations. There are four different host configurations in the [sample project infrastructure](sample-project-infrastructure.md). In this example, you create a separate storage profile for each of them. You can create a storage profile using any of the following:
+ [CreateStorageProfile API](https://docs.aws.amazon.com/deadline-cloud/latest/APIReference/API_CreateStorageProfile.html)
+ [AWS::Deadline::StorageProfile](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-deadline-storageprofile.html) CloudFormation resource
+ [AWS console](https://docs.aws.amazon.com/deadline-cloud/latest/userguide/storage-shared.html#storage-profile)

A storage profile is a list of file system locations. Each one tells Deadline Cloud the path and type of a location that is relevant for jobs submitted from, or run on, a host with that configuration. A storage profile should model only the locations that are relevant for jobs. For example, the shared `FSCommon` location is mounted on workstation `WS1` at `S:\`, so the corresponding file system location is:

```
{
    "name": "FSCommon",
    "path": "S:\\",
    "type": "SHARED"
}
```

Use the following commands to create the storage profiles for the workstation configurations `WSAll`, `WS1`, and `WS2` and the worker configuration `WorkerConfig` using the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html) in [AWS CloudShell](https://docs.aws.amazon.com/cloudshell/latest/userguide/welcome.html):

```
# Change the value of FARM_ID to your farm's identifier
FARM_ID=farm-00112233445566778899aabbccddeeff

aws deadline create-storage-profile --farm-id $FARM_ID \
  --display-name WSAll \
  --os-family LINUX \
  --file-system-locations \
  '[
      {"name": "FSCommon", "type":"SHARED", "path":"/shared/common"},
      {"name": "FS1", "type":"SHARED", "path":"/shared/projects/project1"},
      {"name": "FS2", "type":"SHARED", "path":"/shared/projects/project2"}
  ]'

aws deadline create-storage-profile --farm-id $FARM_ID \
  --display-name WS1 \
  --os-family WINDOWS \
  --file-system-locations \
  '[
      {"name": "FSCommon", "type":"SHARED", "path":"S:\\"},
      {"name": "FS1", "type":"SHARED", "path":"Z:\\"}
   ]'

aws deadline create-storage-profile --farm-id $FARM_ID \
  --display-name WS2 \
  --os-family MACOS \
  --file-system-locations \
  '[
      {"name": "FSCommon", "type":"SHARED", "path":"/Volumes/common"},
      {"name": "FS2", "type":"SHARED", "path":"/Volumes/projects/project2"}
  ]'

aws deadline create-storage-profile --farm-id $FARM_ID \
  --display-name WorkerConfig \
  --os-family LINUX \
  --file-system-locations \
  '[
      {"name": "FSCommon", "type":"SHARED", "path":"/mnt/common"},
      {"name": "FS1", "type":"SHARED", "path":"/mnt/projects/project1"},
      {"name": "FS2", "type":"SHARED", "path":"/mnt/projects/project2"}
  ]'
```

**Note**  
You must refer to each file system location using the same value for the `name` property across all storage profiles in your farm. When generating path mapping rules, Deadline Cloud compares these names to determine which file system locations in different storage profiles refer to the same location.

# Configure storage profiles for fleets
<a name="configuring-storage-profiles-for-fleets"></a>

You can configure a fleet to include a storage profile that models the file system locations on all workers in the fleet. The host file system configuration of all workers in a fleet must match their fleet's storage profile. Workers with different file system configurations must be in separate fleets. 

To set your fleet's configuration to use the `WorkerConfig` storage profile use the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html) in [AWS CloudShell](https://docs.aws.amazon.com/cloudshell/latest/userguide/welcome.html): 

```
# Change the value of FARM_ID to your farm's identifier
FARM_ID=farm-00112233445566778899aabbccddeeff
# Change the value of FLEET_ID to your fleet's identifier
FLEET_ID=fleet-00112233445566778899aabbccddeeff
# Change the value of WORKER_CFG_ID to your storage profile named WorkerConfig
WORKER_CFG_ID=sp-00112233445566778899aabbccddeeff

FLEET_WORKER_MODE=$( \
  aws deadline get-fleet --farm-id $FARM_ID --fleet-id $FLEET_ID \
   --query 'configuration.customerManaged.mode' --output json \
)
FLEET_WORKER_CAPABILITIES=$( \
  aws deadline get-fleet --farm-id $FARM_ID --fleet-id $FLEET_ID \
   --query 'configuration.customerManaged.workerCapabilities' --output json \
)

aws deadline update-fleet --farm-id $FARM_ID --fleet-id $FLEET_ID \
  --configuration \
  "{
    \"customerManaged\": {
      \"storageProfileId\": \"$WORKER_CFG_ID\",
      \"mode\": $FLEET_WORKER_MODE,
      \"workerCapabilities\": $FLEET_WORKER_CAPABILITIES
    }
  }"
```

# Configure storage profiles for queues
<a name="storage-profiles-for-queues"></a>

A queue's configuration includes a list of case-sensitive names of the shared file system locations that jobs submitted to the queue require. For example, jobs submitted to queue `Q1` require file system locations `FSCommon` and `FS1`. Jobs submitted to queue `Q2` require file system locations `FSCommon` and `FS2`.

To configure both queues to require these file system locations, use the following script:

```
# Change the value of FARM_ID to your farm's identifier
FARM_ID=farm-00112233445566778899aabbccddeeff
# Change the value of QUEUE1_ID to queue Q1's identifier
QUEUE1_ID=queue-00112233445566778899aabbccddeeff
# Change the value of QUEUE2_ID to queue Q2's identifier
QUEUE2_ID=queue-00112233445566778899aabbccddeeff

aws deadline update-queue --farm-id $FARM_ID --queue-id $QUEUE1_ID \
  --required-file-system-location-names-to-add FSCommon FS1

aws deadline update-queue --farm-id $FARM_ID --queue-id $QUEUE2_ID \
  --required-file-system-location-names-to-add FSCommon FS2
```

A queue's configuration also includes a list of allowed storage profiles that applies to jobs submitted to, and fleets associated with, that queue. Only storage profiles that define all of the queue's required file system locations can be added to the queue's list of allowed storage profiles.

Submitting a job fails if its storage profile isn't in the queue's list of allowed storage profiles. You can always submit a job with no storage profile. The workstation configurations `WSAll` and `WS1` both define the file system locations (`FSCommon` and `FS1`) that queue `Q1` requires, so both must be allowed to submit jobs to that queue. Similarly, `WSAll` and `WS2` meet the requirements for queue `Q2` and must be allowed to submit jobs to it. Update both queue configurations to allow jobs submitted with these storage profiles using the following script:

```
# Change the value of WSALL_ID to the identifier of the WSAll storage profile
WSALL_ID=sp-00112233445566778899aabbccddeeff
# Change the value of WS1 to the identifier of the WS1 storage profile
WS1_ID=sp-00112233445566778899aabbccddeeff
# Change the value of WS2 to the identifier of the WS2 storage profile
WS2_ID=sp-00112233445566778899aabbccddeeff

aws deadline update-queue --farm-id $FARM_ID --queue-id $QUEUE1_ID \
  --allowed-storage-profile-ids-to-add $WSALL_ID $WS1_ID

aws deadline update-queue --farm-id $FARM_ID --queue-id $QUEUE2_ID \
  --allowed-storage-profile-ids-to-add $WSALL_ID $WS2_ID
```

If you add the `WS2` storage profile to the list of allowed storage profiles for queue `Q1`, the operation fails:

```
$ aws deadline update-queue --farm-id $FARM_ID --queue-id $QUEUE1_ID \
  --allowed-storage-profile-ids-to-add $WS2_ID

An error occurred (ValidationException) when calling the UpdateQueue operation: Storage profile id: sp-00112233445566778899aabbccddeeff does not have required file system location: FS1
```

This is because the `WS2` storage profile doesn't define the file system location named `FS1`, which queue `Q1` requires.
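Conceptually, this validation is a set-difference check between the queue's required file system location names and the names that a storage profile defines. The following Python sketch is an assumption about the behavior described above, not Deadline Cloud's actual implementation:

```
# Sketch of the allowed-storage-profile validation (assumption: the
# service's logic is more involved; this captures the name-based check).
def missing_locations(required: set, profile_locations: set) -> set:
    """Return required file system location names the profile doesn't define."""
    return required - profile_locations

Q1_REQUIRED = {"FSCommon", "FS1"}     # queue Q1's required locations
WS2_LOCATIONS = {"FSCommon", "FS2"}   # from the WS2 storage profile above

# WS2 is missing FS1, so adding it to Q1's allowed list would fail.
print(missing_locations(Q1_REQUIRED, WS2_LOCATIONS))  # {'FS1'}
```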

Associating a fleet whose configured storage profile is not in the queue's list of allowed storage profiles also fails. For example:

```
$ aws deadline create-queue-fleet-association --farm-id $FARM_ID \
   --fleet-id $FLEET_ID \
   --queue-id $QUEUE1_ID

An error occurred (ValidationException) when calling the CreateQueueFleetAssociation operation: Mismatch between storage profile ids.
```

To fix the error, add the storage profile named `WorkerConfig` to the list of allowed storage profiles for both queue `Q1` and queue `Q2`. Then, associate the fleet with these queues so that workers in the fleet can run jobs from both queues. 

```
# Change the value of FLEET_ID to your fleet's identifier
FLEET_ID=fleet-00112233445566778899aabbccddeeff
# Change the value of WORKER_CFG_ID to the identifier of your storage profile named WorkerConfig
WORKER_CFG_ID=sp-00112233445566778899aabbccddeeff

aws deadline update-queue --farm-id $FARM_ID --queue-id $QUEUE1_ID \
  --allowed-storage-profile-ids-to-add $WORKER_CFG_ID

aws deadline update-queue --farm-id $FARM_ID --queue-id $QUEUE2_ID \
  --allowed-storage-profile-ids-to-add $WORKER_CFG_ID

aws deadline create-queue-fleet-association --farm-id $FARM_ID \
  --fleet-id $FLEET_ID \
  --queue-id $QUEUE1_ID

aws deadline create-queue-fleet-association --farm-id $FARM_ID \
  --fleet-id $FLEET_ID \
  --queue-id $QUEUE2_ID
```

# Derive path mapping rules from storage profiles
<a name="deriving-path-mapping-rules-from-storage-profiles"></a>

Path mapping rules describe how paths specified in the job are remapped to their actual locations on a worker host. When a task runs on a worker, Deadline Cloud compares the job's storage profile with the storage profile of the worker's fleet to derive the path mapping rules for the task.

 Deadline Cloud creates a mapping rule for each of the required file system locations in the queue's configuration. For example, a job submitted with the `WSAll` storage profile to queue `Q1` has the path mapping rules: 
+  `FSCommon`: `/shared/common -> /mnt/common` 
+  `FS1`: `/shared/projects/project1 -> /mnt/projects/project1` 

Deadline Cloud creates rules for the `FSCommon` and `FS1` file system locations, but not for `FS2`, even though both the `WSAll` and `WorkerConfig` storage profiles define `FS2`. This is because queue `Q1`'s list of required file system locations is `["FSCommon", "FS1"]`.
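The derivation described above can be sketched as a name-based join between the job's storage profile and the fleet's storage profile, filtered by the queue's required locations. This Python sketch is a simplified model, not Deadline Cloud source code:

```
# Simplified model of path mapping rule derivation: for each file system
# location the queue requires, pair the job profile's path with the fleet
# profile's path for the same location name.
def derive_rules(required_names, job_profile, fleet_profile):
    return {
        name: (job_profile[name], fleet_profile[name])
        for name in required_names
        if name in job_profile and name in fleet_profile
    }

# Paths from the WSAll and WorkerConfig storage profiles above
wsall = {"FSCommon": "/shared/common",
         "FS1": "/shared/projects/project1",
         "FS2": "/shared/projects/project2"}
worker = {"FSCommon": "/mnt/common",
          "FS1": "/mnt/projects/project1",
          "FS2": "/mnt/projects/project2"}

# Queue Q1 requires only FSCommon and FS1, so no rule is made for FS2.
print(derive_rules(["FSCommon", "FS1"], wsall, worker))
```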

 You can confirm the path mapping rules available to jobs submitted with a particular storage profile by submitting a job that prints out [Open Job Description's path mapping rules file](https://github.com/OpenJobDescription/openjd-specifications/wiki/How-Jobs-Are-Run#path-mapping), and then reading the session log after the job has completed: 

```
# Change the value of FARM_ID to your farm's identifier
FARM_ID=farm-00112233445566778899aabbccddeeff
# Change the value of QUEUE1_ID to queue Q1's identifier
QUEUE1_ID=queue-00112233445566778899aabbccddeeff
# Change the value of WSALL_ID to the identifier of the WSAll storage profile
WSALL_ID=sp-00112233445566778899aabbccddeeff

aws deadline create-job --farm-id $FARM_ID --queue-id $QUEUE1_ID \
  --priority 50 \
  --storage-profile-id $WSALL_ID \
  --template-type JSON --template \
  '{
    "specificationVersion": "jobtemplate-2023-09",
    "name": "DemoPathMapping",
    "steps": [
      {
        "name": "ShowPathMappingRules",
        "script": {
          "actions": {
            "onRun": {
              "command": "/bin/cat",
              "args": [ "{{Session.PathMappingRulesFile}}" ]
            }
          }
        }
      }
    ]
  }'
```

If you use the [Deadline Cloud CLI](https://pypi.org/project/deadline/) to submit jobs, the `settings.storage_profile_id` configuration setting determines the storage profile attached to the jobs you submit. To submit jobs with the `WSAll` storage profile, run:

```
deadline config set settings.storage_profile_id $WSALL_ID
```

To run a customer-managed worker as though it is running in the sample infrastructure, follow the procedure in [Run the worker agent](https://docs.aws.amazon.com/deadline-cloud/latest/userguide/run-worker.html) in the *Deadline Cloud User Guide* to run a worker with AWS CloudShell. If you followed those instructions before, delete the `~/demoenv-logs` and `~/demoenv-persist` directories first. Before you run the worker, set the `DEV_FARM_ID` and `DEV_CMF_ID` environment variables that the instructions reference:

```
DEV_FARM_ID=$FARM_ID
DEV_CMF_ID=$FLEET_ID
```

 After the job runs, you can see the path mapping rules in the job's log file: 

```
cat demoenv-logs/${QUEUE1_ID}/*.log
...
(path mapping rules JSON output)
...
```

The log contains mappings for both the `FS1` and `FSCommon` file system locations. Reformatted for readability, the log entry looks like this:

```
{
    "version": "pathmapping-1.0",
    "path_mapping_rules": [
        {
            "source_path_format": "POSIX",
            "source_path": "/shared/projects/project1",
            "destination_path": "/mnt/projects/project1"
        },
        {
            "source_path_format": "POSIX",
            "source_path": "/shared/common",
            "destination_path": "/mnt/common"
        }
    ]
}
```
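To see how a consumer might apply these rules, the following illustrative Python snippet rewrites a job path using the rules file shown above. In practice, the Open Job Description tooling applies path mapping for you:

```
import json

# The path mapping rules document from the session log above
rules_doc = json.loads("""
{
    "version": "pathmapping-1.0",
    "path_mapping_rules": [
        {"source_path_format": "POSIX",
         "source_path": "/shared/projects/project1",
         "destination_path": "/mnt/projects/project1"},
        {"source_path_format": "POSIX",
         "source_path": "/shared/common",
         "destination_path": "/mnt/common"}
    ]
}
""")

def apply_rules(path: str, doc: dict) -> str:
    """Rewrite path using the longest-matching source_path rule, if any."""
    rules = sorted(doc["path_mapping_rules"],
                   key=lambda r: len(r["source_path"]), reverse=True)
    for rule in rules:
        src = rule["source_path"]
        if path == src or path.startswith(src + "/"):
            return rule["destination_path"] + path[len(src):]
    return path  # no rule matched; path is unchanged

print(apply_rules("/shared/common/textures/wood.png", rules_doc))
# /mnt/common/textures/wood.png
```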

 You can submit jobs with different storage profiles to see how the path mapping rules change. 