Metadata lookup and enrichment - Spatial Data Management on AWS

Metadata lookup and enrichment

Metadata lookup is a role that S3, REST API, and DynamoDB connectors can fill. When a derive connector includes a derivation block in its trigger configuration, it uses the lookup-derivation execution model: fetch records from an external data source, match them to SDMA files or assets by a key field, and write the matched values as governed metadata attributes.

This is how you automate bulk metadata enrichment. A domain expert defines the external data source, the matching rules, and the field mappings once on a connector. That connector is approved on a template. From that point on, every asset created under that template is automatically enriched — files are matched to external records, metadata is applied, and the asset record is updated with provenance showing where each value came from.

Important

Derive connectors use the IAM role prefix SpatialDataManagementContentDerivation- (not SpatialDataManagementContentPublisher- used by publish connectors). SDMA validates the role name prefix and tests role assumption when the connector is created.
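
To check a role ARN before pasting it into a connector configuration, a local pre-check along these lines may help. It is only a sketch: the function names are hypothetical, and it covers the name prefix only, not the role-assumption test that SDMA performs at creation time.

```python
DERIVE_ROLE_PREFIX = "SpatialDataManagementContentDerivation-"


def role_name_from_arn(role_arn: str) -> str:
    """Return the role name (text after the last '/') from an IAM role ARN."""
    parts = role_arn.split(":", 5)
    if len(parts) != 6 or not parts[5].startswith("role/"):
        raise ValueError(f"not an IAM role ARN: {role_arn!r}")
    return parts[5].rsplit("/", 1)[1]


def has_derive_prefix(role_arn: str) -> bool:
    """True when the role name starts with the derive connector prefix."""
    return role_name_from_arn(role_arn).startswith(DERIVE_ROLE_PREFIX)
```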

Create a derive connector with lookup

  1. In the Spatial Data Portal, go to Library settings > Connectors.

  2. Choose the Derive content tab.

  3. Choose Create deriver.

  4. Enter a connector name.

  5. For Connector type, select the type that matches your data source:

    • Amazon S3 CSV file import — for s3Lookup operations (S3 connector type)

    • Amazon DynamoDB import — for dynamodbLookup operations (DynamoDB connector type)

    • REST API import — for restLookup operations (REST connector type)

  6. Paste the connector configuration JSON (see the operation sections below) into the JSON editor.

  7. Choose Create.

The connector type you select determines the protocol and authentication model. The derivation block in the trigger configuration is what makes it a lookup connector — the same S3, DynamoDB, or REST connector type can also fill publish, resource provision, or content production roles depending on its configuration.

Lookup operations

The lookup-derivation model supports the following operations (used as the derivation.op value).

| Operation | Description |
| --- | --- |
| s3Lookup | Reads a CSV or JSON file from Amazon S3 and matches records to resources. Useful for bulk metadata import from spreadsheets or data exports. |
| restLookup | Calls a REST API endpoint and matches response records to resources. Useful for enriching metadata from external catalogs or databases. |
| dynamodbLookup | Performs a GetItem on an Amazon DynamoDB table and applies the result to a resource. Useful for single-record lookups by key. |

Record matching

The applyTo block controls how external records are matched to resources:

  • resource – The resource type to apply derived metadata to (file or asset).

  • match.source – The field in the external record used for matching (for example, filename).

  • match.target – The resource field to match against (for example, file.path:basename). Supports transforms: :basename, :ext, :tolower.

  • onNoMatch – Behavior when no match is found. Currently only skip (default) is supported, which continues to the next record.

  • mappingPolicy – Controls how derived attributes interact with existing attributes: inherit (default) or override.

  • responseFieldMapping – Maps external record fields to resource metadata attributes.
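
The matching rules above can be pictured with a small Python sketch. It is illustrative only and does not reflect SDMA's internal implementation. It assumes that resources are flat dictionaries keyed by dotted field names, that transforms can be chained with further colons, and that :ext returns the extension without the leading dot. None of these details are confirmed by this page, and the function names are hypothetical.

```python
import posixpath


def apply_transforms(value: str, transforms: list[str]) -> str:
    """Apply match.target transforms in order (chaining is an assumption)."""
    for name in transforms:
        if name == "basename":
            value = posixpath.basename(value)
        elif name == "ext":
            value = posixpath.splitext(value)[1].lstrip(".")
        elif name == "tolower":
            value = value.lower()
        else:
            raise ValueError(f"unsupported transform: {name}")
    return value


def resolve_target(resource: dict, target: str) -> str:
    """Split 'file.path:basename' into a field name and its transforms."""
    field, *transforms = target.split(":")
    return apply_transforms(str(resource[field]), transforms)


def match_records(records: list[dict], files: list[dict],
                  source: str, target: str) -> list[tuple[dict, dict]]:
    """Pair each external record with the files whose target value equals it."""
    files_by_key: dict[str, list[dict]] = {}
    for f in files:
        files_by_key.setdefault(resolve_target(f, target), []).append(f)
    pairs = []
    for record in records:
        # A record with no matching file yields nothing (onNoMatch: skip).
        for f in files_by_key.get(str(record.get(source)), []):
            pairs.append((f, record))
    return pairs
```

With a file whose file.path is scans/A.LAZ and a record whose filename is a.laz, the target file.path:basename:tolower produces a match, while file.path:basename alone does not.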

Amazon S3 lookup

The s3Lookup operation reads a file from S3 and parses it into records. It supports CSV and JSON content types.

Prerequisites

  1. Upload the source data file (CSV or JSON) to an S3 bucket.

  2. Create an IAM role with the following:

    • Role name must start with SpatialDataManagementContentDerivation- (for example, SpatialDataManagementContentDerivation-MyS3Lookup).

    • Trust policy must allow the SDMA solution account to assume it:

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "AWS": "arn:aws:iam::<SDMA_ACCOUNT_ID>:role/SpatialDataManagement-ConnectorInvocationFunctionRole"
            },
            "Action": "sts:AssumeRole"
          }
        ]
      }

      Replace <SDMA_ACCOUNT_ID> with the AWS account ID where SDMA is deployed.

    • Permissions policy must grant s3:GetObject on the source file:

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::<YOUR_BUCKET_NAME>/metadata/*"
          }
        ]
      }

CSV options

For CSV files, the following options are available in the trigger-level derivation.s3Config.csvOptions:

  • hasHeader – Whether the CSV has a header row. Defaults to true. When true, column names from the header are used as field names. When false, columns are indexed as 0, 1, 2, and so on.

  • delimiter – CSV delimiter character. Defaults to ,.
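
As a rough illustration of how these two options shape the parsed records, the following Python sketch uses the standard csv module. It assumes that index-based field names are the strings "0", "1", "2" and that blank rows are ignored. The page does not confirm either detail, and parse_csv_records is a hypothetical name.

```python
import csv
import io


def parse_csv_records(text: str, has_header: bool = True,
                      delimiter: str = ",") -> list[dict]:
    """Parse CSV text into records keyed by header name or column index."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    if not rows:
        return []
    if has_header:
        header, *data = rows
        return [dict(zip(header, row)) for row in data]
    # Without a header row, fall back to positional field names.
    return [{str(i): value for i, value in enumerate(row)} for row in rows]
```

With hasHeader set to false, match.source and the source side of each field mapping would then refer to column positions rather than column names.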

Example

This example reads a CSV file from S3 and applies metadata to files based on filename matching:

{
  "s3Config": {
    "bucketName": "my-metadata-bucket",
    "securityConfig": {
      "assumeRoleArn": "arn:aws:iam::<ACCOUNT_ID>:role/SpatialDataManagementContentDerivation-MyS3Lookup",
      "type": "AssumeRole"
    }
  },
  "triggers": [
    {
      "description": "Derive file metadata from CSV on asset creation",
      "resources": ["asset"],
      "events": ["create"],
      "derivation": {
        "op": "s3Lookup",
        "sourceContentType": "csv",
        "s3Config": {
          "objectKey": "metadata/${project.projectId}/attributes.csv"
        },
        "applyTo": {
          "resource": "file",
          "scope": "all",
          "match": {
            "source": "filename",
            "target": "file.path:basename"
          },
          "onNoMatch": "skip",
          "responseFieldMapping": [
            { "source": "department", "target": "file.department" },
            { "source": "classification", "target": "file.classification" }
          ]
        }
      }
    }
  ]
}
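
The objectKey in this example relies on ${variable} substitution. A minimal Python sketch of that kind of substitution follows. It assumes the context is a flat dictionary keyed by the dotted variable name and that an unknown variable is treated as an error. The actual SDMA behavior for unresolved variables is not described on this page, and substitute is a hypothetical helper.

```python
import re

_VARIABLE = re.compile(r"\$\{([^}]+)\}")


def substitute(template: str, context: dict) -> str:
    """Replace each ${name} in the template with its value from context."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in context:
            # Assumption: fail loudly rather than emit a partial key.
            raise KeyError(f"no value for variable {name!r}")
        return str(context[name])
    return _VARIABLE.sub(replace, template)
```

Checking a template this way against a known project ID is one way to confirm the resulting key before investigating a NoSuchKey error.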

REST API lookup

The restLookup operation calls a REST API endpoint and parses the response into records. It supports GET and POST methods, query parameter substitution, and response filtering.

Prerequisites

  1. Identify the REST API endpoint that returns the metadata to derive.

  2. If using API key, token, or basic auth, store the credentials in AWS Secrets Manager.

  3. Create an IAM role with the following:

    • Role name must start with SpatialDataManagementContentDerivation- (for example, SpatialDataManagementContentDerivation-MyRestLookup).

    • Trust policy must allow the SDMA solution account to assume it:

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "AWS": "arn:aws:iam::<SDMA_ACCOUNT_ID>:role/SpatialDataManagement-ConnectorInvocationFunctionRole"
            },
            "Action": "sts:AssumeRole"
          }
        ]
      }

      Replace <SDMA_ACCOUNT_ID> with the AWS account ID where SDMA is deployed.

    • Permissions policy must grant access to the Secrets Manager secret (if using API key, token, or basic auth):

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": "secretsmanager:GetSecretValue",
            "Resource": "arn:aws:secretsmanager:<REGION>:<ACCOUNT_ID>:secret:<SECRET_NAME>"
          }
        ]
      }

Example

This example calls a REST API to look up file metadata from an external catalog by project ID:

{
  "restConfig": {
    "apiBase": "https://api.example.com/v1",
    "securityConfig": {
      "type": "ApiKey",
      "secretArn": "arn:aws:secretsmanager:<REGION>:<ACCOUNT_ID>:secret:<SECRET_NAME>",
      "assumeRoleArn": "arn:aws:iam::<ACCOUNT_ID>:role/SpatialDataManagementContentDerivation-MyRestLookup"
    }
  },
  "triggers": [
    {
      "description": "Derive asset metadata from external catalog",
      "resources": ["asset"],
      "events": ["create"],
      "derivation": {
        "op": "restLookup",
        "restConfig": {
          "method": "GET",
          "path": "/records",
          "queryParams": {
            "projectId": "${project.projectId}"
          },
          "responseFilter": "data.items"
        },
        "applyTo": {
          "resource": "file",
          "scope": "all",
          "match": {
            "source": "filename",
            "target": "file.path:basename"
          },
          "responseFieldMapping": [
            { "source": "category", "target": "file.category" },
            { "source": "owner", "target": "file.owner" }
          ]
        }
      }
    }
  ]
}

The responseFilter field uses dot notation to extract a nested array from the API response (for example, data.items extracts the items array from { "data": { "items": [...] } }).

Amazon DynamoDB lookup

The dynamodbLookup operation performs a GetItem on a DynamoDB table and returns the item as a single record. It automatically deserializes DynamoDB attribute types (S, N, BOOL, M, L) to native values.
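
In the low-level DynamoDB format, each attribute is a one-key object whose key is the type tag, for example {"S": "a.laz"} or {"N": "3"}. The following Python sketch shows one way to turn the five listed types into native values. It is an illustration, not SDMA's code: how SDMA represents numbers (int, float, or decimal) is an assumption here, and the function names are hypothetical.

```python
def deserialize(attr: dict):
    """Convert one DynamoDB attribute value to a native Python value."""
    (type_tag, value), = attr.items()
    if type_tag == "S":
        return value
    if type_tag == "N":
        # DynamoDB sends numbers as strings; try int first, then float.
        try:
            return int(value)
        except ValueError:
            return float(value)
    if type_tag == "BOOL":
        return value
    if type_tag == "M":
        return {key: deserialize(item) for key, item in value.items()}
    if type_tag == "L":
        return [deserialize(item) for item in value]
    raise ValueError(f"unsupported DynamoDB type: {type_tag}")


def deserialize_item(item: dict) -> dict:
    """Convert a GetItem 'Item' map into a plain record."""
    return {name: deserialize(attr) for name, attr in item.items()}
```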

Prerequisites

  1. Create a DynamoDB table with the metadata to derive.

  2. Create an IAM role with the following:

    • Role name must start with SpatialDataManagementContentDerivation- (for example, SpatialDataManagementContentDerivation-MyDynamoDBLookup).

    • Trust policy must allow the SDMA solution account to assume it:

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "AWS": "arn:aws:iam::<SDMA_ACCOUNT_ID>:role/SpatialDataManagement-ConnectorInvocationFunctionRole"
            },
            "Action": "sts:AssumeRole"
          }
        ]
      }

      Replace <SDMA_ACCOUNT_ID> with the AWS account ID where SDMA is deployed.

    • Permissions policy must grant dynamodb:GetItem on the target table:

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": "dynamodb:GetItem",
            "Resource": "arn:aws:dynamodb:<REGION>:<ACCOUNT_ID>:table/<TABLE_NAME>"
          }
        ]
      }

Partition key resolution

The partition key value can be resolved in two ways:

  • Explicit – Set dynamodbConfig.partitionKeyValue to a template string (for example, ${asset.assetId}).

  • From match config – If partitionKeyValue is omitted, the value is derived from the applyTo.match.target field. For example, if match.target is file.path:basename, the file’s basename is used as the partition key value.

Example

This example looks up file metadata from a DynamoDB table using the filename as the partition key:

{
  "dynamodbConfig": {
    "tableName": "file-metadata-table",
    "partitionKey": "filename",
    "region": "us-west-2",
    "securityConfig": {
      "assumeRoleArn": "arn:aws:iam::<ACCOUNT_ID>:role/SpatialDataManagementContentDerivation-MyDynamoDBLookup",
      "type": "AssumeRole"
    }
  },
  "triggers": [
    {
      "description": "Derive file metadata from DynamoDB",
      "resources": ["asset"],
      "events": ["create"],
      "derivation": {
        "op": "dynamodbLookup",
        "dynamodbConfig": {
          "consistentRead": true
        },
        "applyTo": {
          "resource": "file",
          "scope": "all",
          "match": {
            "source": "filename",
            "target": "file.path:basename"
          },
          "responseFieldMapping": [
            { "source": "status", "target": "file.status" },
            { "source": "priority", "target": "file.priority" }
          ]
        }
      }
    }
  ]
}

Note

The dynamodbConfig at the trigger’s derivation level is merged with the connector-level dynamodbConfig. Use the trigger-level config to override specific fields like consistentRead per trigger.
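
One way to picture the merge is as a dictionary update in which trigger-level keys win. The Python sketch below assumes a shallow merge. Whether SDMA merges nested objects such as securityConfig key by key is not stated on this page, and effective_dynamodb_config is a hypothetical name.

```python
def effective_dynamodb_config(connector_cfg: dict, trigger_cfg: dict) -> dict:
    """Return the connector-level config with trigger-level keys applied on top."""
    # Shallow merge (assumption): a trigger-level key replaces the whole
    # connector-level value for that key.
    return {**connector_cfg, **trigger_cfg}


connector = {"tableName": "file-metadata-table", "partitionKey": "filename", "region": "us-west-2"}
merged = effective_dynamodb_config(connector, {"consistentRead": True})
```

For the example configuration, the merged result keeps tableName, partitionKey, and region from the connector and adds consistentRead from the trigger.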

Copy fields

The copyFields configuration provides a shortcut for copying a set of fields from the external record to resource metadata using prefix-based matching. Instead of listing individual field mappings, you specify a source prefix and target prefix:

"copyFields": { "sourcePrefix": "metadata.", "targetPrefix": "file." }

This copies all fields starting with metadata. from the external record to the resource, replacing the prefix with file.. For example, metadata.classification becomes file.classification.
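
The prefix rewrite can be sketched in Python as follows. The sketch assumes the external record is a flat dictionary whose keys carry the dotted prefix. How SDMA treats nested objects is not covered here, and copy_fields is a hypothetical helper.

```python
def copy_fields(record: dict, source_prefix: str, target_prefix: str) -> dict:
    """Copy fields that start with source_prefix, swapping in target_prefix."""
    return {
        target_prefix + name[len(source_prefix):]: value
        for name, value in record.items()
        if name.startswith(source_prefix)
    }
```

Fields without the source prefix, such as the filename used for matching, are left out of the result.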

Record ID mapping

The recordIdMapping configuration stores the external record’s identifier as a metadata attribute on the matched resource:

"recordIdMapping": { "target": "file.external_id" }

This writes the record’s key (from the external source) to the external_id attribute on the matched file.

Configuration fields

The following tables describe the configuration fields for the metadata derivation connector.

Connector-level fields

| Field | Required | Description |
| --- | --- | --- |
| s3Config.bucketName | Yes (for s3Lookup) | Target S3 bucket name. |
| s3Config.securityConfig | Yes (for s3Lookup) | Authentication configuration. Must use AssumeRole type. |
| restConfig.apiBase | Yes (for restLookup) | Base URL for the REST API. |
| restConfig.securityConfig | Yes (for restLookup) | Authentication configuration. Supports ApiKey, TokenAuth, and BasicAuth types. |
| dynamodbConfig.tableName | Yes (for dynamodbLookup) | DynamoDB table name. |
| dynamodbConfig.partitionKey | Yes (for dynamodbLookup) | Partition key attribute name. |
| dynamodbConfig.region | No | AWS Region of the DynamoDB table. |
| dynamodbConfig.securityConfig | Yes (for dynamodbLookup) | Authentication configuration. Must use AssumeRole type. |

Trigger-level derivation fields

| Field | Required | Description |
| --- | --- | --- |
| derivation.op | Yes | Derivation operation: s3Lookup, restLookup, or dynamodbLookup. |
| derivation.sourceContentType | No | Content type of the source data. csv (default) or json. Applies to s3Lookup. |
| derivation.s3Config.objectKey | Yes (for s3Lookup) | S3 object key with ${variable} substitution support. |
| derivation.s3Config.csvOptions.hasHeader | No | Whether the CSV has a header row. Defaults to true. |
| derivation.s3Config.csvOptions.delimiter | No | CSV delimiter character. Defaults to ,. |
| derivation.restConfig.method | No | HTTP method for restLookup. Defaults to GET. |
| derivation.restConfig.path | Yes (for restLookup) | API path appended to connector-level restConfig.apiBase. |
| derivation.restConfig.queryParams | No | Query parameters with ${variable} substitution support. |
| derivation.restConfig.responseFilter | No | Dot-notation path to extract records from the API response. |
| derivation.dynamodbConfig.partitionKeyValue | No | Partition key value template with ${variable} support. If omitted, derived from applyTo.match. |
| derivation.dynamodbConfig.consistentRead | No | Use strongly consistent reads. Defaults to false. |
| derivation.applyTo.resource | Yes | Target resource type: file or asset. |
| derivation.applyTo.scope | No | Reserved. Currently all records are always processed regardless of this value. Defaults to all. |
| derivation.applyTo.match.source | Yes | External record field used for matching. |
| derivation.applyTo.match.target | Yes | Resource field to match against. Supports :basename, :ext, :tolower transforms. |
| derivation.applyTo.onNoMatch | No | Behavior on no match. Currently only skip (default) is supported. |
| derivation.applyTo.mappingPolicy | No | How derived attributes interact with existing ones: inherit (default) or override. |
| derivation.applyTo.responseFieldMapping | No | Field mappings from external record to resource metadata. |
| derivation.copyFields.sourcePrefix | No | Prefix to extract from external record fields. |
| derivation.copyFields.targetPrefix | No | Prefix to apply to target resource fields. |
| derivation.recordIdMapping.target | No | Resource field to store the external record ID (for example, file.external_id). |
| derivation.onError | No | Error handling: fail (default) or record-and-continue. |
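
The difference between the two onError values can be sketched as a processing loop in Python. This page does not say at what granularity errors are recorded or where they are stored, so the per-record handling and the returned error list below are assumptions, and process_records is a hypothetical name.

```python
from typing import Callable


def process_records(records: list, apply: Callable[[object], None],
                    on_error: str = "fail") -> tuple[int, list[tuple[int, str]]]:
    """Apply each record; stop on the first error or record it and continue."""
    if on_error not in ("fail", "record-and-continue"):
        raise ValueError(f"unsupported onError value: {on_error}")
    applied = 0
    errors: list[tuple[int, str]] = []
    for index, record in enumerate(records):
        try:
            apply(record)
            applied += 1
        except Exception as exc:
            if on_error == "fail":
                raise  # Abort the whole derivation on the first error.
            errors.append((index, str(exc)))  # Keep going, remember the failure.
    return applied, errors
```

With fail, one bad record stops the run. With record-and-continue, the remaining records are still applied and the failures can be reviewed afterward.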

Error handling

The following table describes common derivation errors and their resolution.

| Operation | Error | Resolution |
| --- | --- | --- |
| s3Lookup | AccessDenied / 403 | The assumed IAM role does not have s3:GetObject permission on the source file. Verify the role’s permissions policy. |
| s3Lookup | NoSuchKey / 404 | The S3 object does not exist at the configured key. Verify the objectKey and ${variable} substitution values. |
| restLookup | 401 / 403 | Authentication failed. Verify the securityConfig credentials and Secrets Manager secret. |
| restLookup | 404 | The REST endpoint returned not found. Verify the apiBase and path configuration. |
| dynamodbLookup | AccessDeniedException | The assumed IAM role does not have dynamodb:GetItem permission on the target table. Verify the role’s permissions policy. |
| dynamodbLookup | ResourceNotFoundException | The configured DynamoDB table does not exist. Verify the tableName and region. |
| All | No matching records | When applyTo.onNoMatch is skip (default), unmatched records are silently skipped and processing continues with the next record. |