Metadata lookup and enrichment
Metadata lookup is a role that S3, REST API, and DynamoDB connectors can fill. When a derive connector includes a derivation block in its trigger configuration, it uses the lookup-derivation execution model: fetch records from an external data source, match them to SDMA files or assets by a key field, and write the matched values as governed metadata attributes.
This is how you automate bulk metadata enrichment. A domain expert defines the external data source, the matching rules, and the field mappings once on a connector. That connector is approved on a template. From that point on, every asset created under that template is automatically enriched — files are matched to external records, metadata is applied, and the asset record is updated with provenance showing where each value came from.
Important
Derive connectors use the IAM role prefix SpatialDataManagementContentDerivation- (not SpatialDataManagementContentPublisher- used by publish connectors). SDMA validates the role name prefix and tests role assumption when the connector is created.
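As a pre-flight check before creating a connector, the prefix rule above can be tested locally. The following is a minimal sketch, not SDMA's own validation: the function names are hypothetical, and it checks only the role name prefix, not whether the role can actually be assumed.

```python
DERIVE_ROLE_PREFIX = "SpatialDataManagementContentDerivation-"


def role_name_from_arn(role_arn: str) -> str:
    """Return the role name (the text after the last '/') from an IAM role ARN."""
    parts = role_arn.split(":", 5)
    if len(parts) != 6 or not parts[5].startswith("role/"):
        raise ValueError(f"not an IAM role ARN: {role_arn}")
    return parts[5].rsplit("/", 1)[1]


def has_derive_prefix(role_arn: str) -> bool:
    """True when the role name starts with the derive connector prefix."""
    return role_name_from_arn(role_arn).startswith(DERIVE_ROLE_PREFIX)
```

A role created for a publish connector fails this check, which is the mismatch the note above warns about.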
Create a derive connector with lookup
- In the Spatial Data Portal, go to Library settings > Connectors.
- Choose the Derive content tab.
- Choose Create deriver.
- Enter a connector name.
- For Connector type, select the type that matches your data source:
  - Amazon S3 CSV file import: for s3Lookup operations (S3 connector type)
  - Amazon DynamoDB import: for dynamodbLookup operations (DynamoDB connector type)
  - REST API import: for restLookup operations (REST connector type)
- Paste the connector configuration JSON (see the operation sections below) into the JSON editor.
- Choose Create.
The connector type you select determines the protocol and authentication model. The derivation block in the trigger configuration is what makes it a lookup connector — the same S3, DynamoDB, or REST connector type can also fill publish, resource provision, or content production roles depending on its configuration.
Lookup operations
The lookup-derivation model supports the following operations (used as the derivation.op value).
| Operation | Description |
|---|---|
| s3Lookup | Reads a CSV or JSON file from Amazon S3 and matches records to resources. Useful for bulk metadata import from spreadsheets or data exports. |
| restLookup | Calls a REST API endpoint and matches response records to resources. Useful for enriching metadata from external catalogs or databases. |
| dynamodbLookup | Performs a GetItem on an Amazon DynamoDB table and applies the result to a resource. Useful for single-record lookups by key. |
Record matching
The applyTo block controls how external records are matched to resources:
- resource – The resource type to apply derived metadata to (file or asset).
- match.source – The field in the external record used for matching (for example, filename).
- match.target – The resource field to match against (for example, file.path:basename). Supports the transforms :basename, :ext, and :tolower.
- onNoMatch – Behavior when no match is found. Currently only skip (default) is supported, which continues to the next record.
- mappingPolicy – Controls how derived attributes interact with existing attributes: inherit (default) or override.
- responseFieldMapping – Maps external record fields to resource metadata attributes.
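The matching rules above can be illustrated with a short sketch. It is not SDMA's implementation: the function names are hypothetical, resources are assumed to be flat dictionaries keyed by field name (such as file.path), and :ext is assumed to return the extension without the leading dot.

```python
import posixpath


def apply_transform(value: str, transform: str) -> str:
    """Apply one of the documented match.target transforms to a value."""
    if transform == "basename":
        return posixpath.basename(value)
    if transform == "ext":
        # Assumption: the extension is returned without the leading dot.
        return posixpath.splitext(value)[1].lstrip(".")
    if transform == "tolower":
        return value.lower()
    raise ValueError(f"unsupported transform: {transform}")


def resolve_target(resource: dict, target: str) -> str:
    """Resolve a target such as 'file.path:basename' against a resource."""
    field, _, transform = target.partition(":")
    value = resource[field]
    return apply_transform(value, transform) if transform else value


def match_records(records: list, resources: list, match: dict) -> list:
    """Pair each external record with the resource whose target value equals it."""
    by_key = {resolve_target(res, match["target"]): res for res in resources}
    pairs = []
    for record in records:
        resource = by_key.get(record.get(match["source"]))
        if resource is None:
            continue  # onNoMatch "skip": move on to the next record
        pairs.append((record, resource))
    return pairs
```

With match.source set to filename and match.target set to file.path:basename, a record whose filename is tile_001.las pairs with a file stored at scans/site-a/tile_001.las, and records with no corresponding file are skipped.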
Amazon S3 lookup
The s3Lookup operation reads a file from S3 and parses it into records. It supports CSV and JSON content types.
Prerequisites
- Upload the source data file (CSV or JSON) to an S3 bucket.
- Create an IAM role with the following:
  - Role name must start with SpatialDataManagementContentDerivation- (for example, SpatialDataManagementContentDerivation-MyS3Lookup).
  - Trust policy must allow the SDMA solution account to assume it:

        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": {
                "AWS": "arn:aws:iam::<SDMA_ACCOUNT_ID>:role/SpatialDataManagement-ConnectorInvocationFunctionRole"
              },
              "Action": "sts:AssumeRole"
            }
          ]
        }

    Replace <SDMA_ACCOUNT_ID> with the AWS account ID where SDMA is deployed.
  - Permissions policy must grant s3:GetObject on the source file:

        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": "s3:GetObject",
              "Resource": "arn:aws:s3:::<YOUR_BUCKET_NAME>/metadata/*"
            }
          ]
        }
CSV options
For CSV files, the following options are available in the trigger-level derivation.s3Config.csvOptions:
- hasHeader – Whether the CSV has a header row. Defaults to true. When true, column names from the header are used as field names. When false, columns are indexed as 0, 1, 2, and so on.
- delimiter – CSV delimiter character. Defaults to , (comma).
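To make the effect of these two options concrete, here is a minimal sketch that parses CSV text into records with Python's standard csv module. It is an illustration only: the function name is hypothetical, and representing the column indexes as the strings "0", "1", "2" is an assumption.

```python
import csv
import io


def parse_csv_records(text: str, has_header: bool = True, delimiter: str = ",") -> list:
    """Parse CSV text into a list of dictionaries, one per data row."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    if has_header:
        header, data = rows[0], rows[1:]
    else:
        header, data = None, rows
    records = []
    for row in data:
        # Without a header, fall back to positional keys "0", "1", "2", ...
        keys = header if header is not None else [str(i) for i in range(len(row))]
        records.append(dict(zip(keys, row)))
    return records
```

With a header row, match.source can name a column such as filename. Without one, it would have to refer to a column by its index.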
Example
This example reads a CSV file from S3 and applies metadata to files based on filename matching:
    {
      "s3Config": {
        "bucketName": "my-metadata-bucket",
        "securityConfig": {
          "assumeRoleArn": "arn:aws:iam::<ACCOUNT_ID>:role/SpatialDataManagementContentDerivation-MyS3Lookup",
          "type": "AssumeRole"
        }
      },
      "triggers": [
        {
          "description": "Derive file metadata from CSV on asset creation",
          "resources": ["asset"],
          "events": ["create"],
          "derivation": {
            "op": "s3Lookup",
            "sourceContentType": "csv",
            "s3Config": {
              "objectKey": "metadata/${project.projectId}/attributes.csv"
            },
            "applyTo": {
              "resource": "file",
              "scope": "all",
              "match": {
                "source": "filename",
                "target": "file.path:basename"
              },
              "onNoMatch": "skip",
              "responseFieldMapping": [
                { "source": "department", "target": "file.department" },
                { "source": "classification", "target": "file.classification" }
              ]
            }
          }
        }
      ]
    }
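The responseFieldMapping entries in this example can be read as a source-to-target copy from a matched record to resource attributes. The sketch below illustrates that reading. The function name is hypothetical, and skipping mapping entries whose source field is absent from the record is an assumption, not documented behavior.

```python
def apply_field_mapping(record: dict, mapping: list) -> dict:
    """Build resource attributes from a matched record and a responseFieldMapping list."""
    attributes = {}
    for entry in mapping:
        source, target = entry["source"], entry["target"]
        if source in record:  # assumption: absent source fields are skipped
            attributes[target] = record[source]
    return attributes


mapping = [
    {"source": "department", "target": "file.department"},
    {"source": "classification", "target": "file.classification"},
]
row = {"filename": "tile_001.las", "department": "survey", "classification": "internal"}
attributes = apply_field_mapping(row, mapping)
```

Here attributes holds file.department and file.classification. The filename column is used only for matching and is not copied, because no mapping entry names it.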
REST API lookup
The restLookup operation calls a REST API endpoint and parses the response into records. It supports GET and POST methods, query parameter substitution, and response filtering.
Prerequisites
- Identify the REST API endpoint that returns the metadata to derive.
- If using API key, token, or basic auth, store the credentials in AWS Secrets Manager.
- Create an IAM role with the following:
  - Role name must start with SpatialDataManagementContentDerivation- (for example, SpatialDataManagementContentDerivation-MyRestLookup).
  - Trust policy must allow the SDMA solution account to assume it:

        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": {
                "AWS": "arn:aws:iam::<SDMA_ACCOUNT_ID>:role/SpatialDataManagement-ConnectorInvocationFunctionRole"
              },
              "Action": "sts:AssumeRole"
            }
          ]
        }

    Replace <SDMA_ACCOUNT_ID> with the AWS account ID where SDMA is deployed.
  - Permissions policy must grant access to the Secrets Manager secret (if using API key, token, or basic auth):

        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": "secretsmanager:GetSecretValue",
              "Resource": "arn:aws:secretsmanager:<REGION>:<ACCOUNT_ID>:secret:<SECRET_NAME>"
            }
          ]
        }
Example
This example calls a REST API to look up file metadata from an external catalog by project ID:
    {
      "restConfig": {
        "apiBase": "https://api.example.com/v1",
        "securityConfig": {
          "type": "ApiKey",
          "secretArn": "arn:aws:secretsmanager:<REGION>:<ACCOUNT_ID>:secret:<SECRET_NAME>",
          "assumeRoleArn": "arn:aws:iam::<ACCOUNT_ID>:role/SpatialDataManagementContentDerivation-MyRestLookup"
        }
      },
      "triggers": [
        {
          "description": "Derive asset metadata from external catalog",
          "resources": ["asset"],
          "events": ["create"],
          "derivation": {
            "op": "restLookup",
            "restConfig": {
              "method": "GET",
              "path": "/records",
              "queryParams": {
                "projectId": "${project.projectId}"
              },
              "responseFilter": "data.items"
            },
            "applyTo": {
              "resource": "file",
              "scope": "all",
              "match": {
                "source": "filename",
                "target": "file.path:basename"
              },
              "responseFieldMapping": [
                { "source": "category", "target": "file.category" },
                { "source": "owner", "target": "file.owner" }
              ]
            }
          }
        }
      ]
    }
The responseFilter field uses dot notation to extract a nested array from the API response (for example, data.items extracts the items array from { "data": { "items": […] } }).
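The dot-notation extraction can be sketched in a few lines. This is an illustration of the idea rather than SDMA's code: the function name is hypothetical, and wrapping a single object in a list when the path does not end at an array is an assumption.

```python
def extract_records(response, response_filter=None):
    """Walk a dot-notation path into a decoded JSON response and return its records."""
    node = response
    if response_filter:
        for key in response_filter.split("."):
            node = node[key]  # raises KeyError if the path does not exist
    # Assumption: a single object is treated as a one-record list.
    return node if isinstance(node, list) else [node]


body = {"data": {"items": [{"filename": "tile_001.las", "owner": "gis-team"}]}}
records = extract_records(body, "data.items")
```

A path that does not exist in the response raises KeyError in this sketch, which makes a misconfigured responseFilter easy to spot during testing.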
Amazon DynamoDB lookup
The dynamodbLookup operation performs a GetItem on a DynamoDB table and returns the item as a single record. It automatically deserializes DynamoDB attribute types (S, N, BOOL, M, L) to native values.
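To show what this deserialization means for the attribute types listed above, the following sketch converts a raw GetItem item into native Python values. It is not SDMA's implementation: the function names are hypothetical, and returning numbers as Decimal is an assumption chosen to avoid losing precision.

```python
from decimal import Decimal


def deserialize(attr: dict):
    """Convert one DynamoDB-typed attribute value to a native Python value."""
    (type_tag, value), = attr.items()  # each attribute holds exactly one type tag
    if type_tag == "S":
        return value
    if type_tag == "N":
        return Decimal(value)  # DynamoDB sends numbers as strings
    if type_tag == "BOOL":
        return value
    if type_tag == "M":
        return {key: deserialize(item) for key, item in value.items()}
    if type_tag == "L":
        return [deserialize(item) for item in value]
    raise ValueError(f"unsupported attribute type: {type_tag}")


def deserialize_item(item: dict) -> dict:
    """Convert a whole GetItem item into a single flat record."""
    return {name: deserialize(attr) for name, attr in item.items()}
```

In Python code that already depends on boto3, the TypeDeserializer class in boto3.dynamodb.types performs a comparable conversion.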
Prerequisites
- Create a DynamoDB table with the metadata to derive.
- Create an IAM role with the following:
  - Role name must start with SpatialDataManagementContentDerivation- (for example, SpatialDataManagementContentDerivation-MyDynamoDBLookup).
  - Trust policy must allow the SDMA solution account to assume it:

        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": {
                "AWS": "arn:aws:iam::<SDMA_ACCOUNT_ID>:role/SpatialDataManagement-ConnectorInvocationFunctionRole"
              },
              "Action": "sts:AssumeRole"
            }
          ]
        }

    Replace <SDMA_ACCOUNT_ID> with the AWS account ID where SDMA is deployed.
  - Permissions policy must grant dynamodb:GetItem on the target table:

        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": "dynamodb:GetItem",
              "Resource": "arn:aws:dynamodb:<REGION>:<ACCOUNT_ID>:table/<TABLE_NAME>"
            }
          ]
        }
Partition key resolution
The partition key value can be resolved in two ways:
- Explicit – Set dynamodbConfig.partitionKeyValue to a template string (for example, ${asset.assetId}).
- From match config – If partitionKeyValue is omitted, the value is derived from the applyTo.match.target field. For example, if match.target is file.path:basename, the file's basename is used as the partition key value.
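The two resolution paths above can be sketched as a single function. This is a hedged illustration: the function names, the flat context dictionary keyed by names such as asset.assetId and file.path, and the exact ${...} substitution behavior are all assumptions.

```python
import posixpath
import re

_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")

_TRANSFORMS = {
    "basename": posixpath.basename,
    "ext": lambda path: posixpath.splitext(path)[1].lstrip("."),
    "tolower": str.lower,
}


def render_template(template: str, context: dict) -> str:
    """Replace each ${name} placeholder with the matching context value."""
    return _PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), template)


def resolve_partition_key_value(dynamodb_config: dict, match: dict, context: dict) -> str:
    """Use the explicit template when present, otherwise fall back to match.target."""
    explicit = dynamodb_config.get("partitionKeyValue")
    if explicit:
        return render_template(explicit, context)
    field, _, transform = match["target"].partition(":")
    value = context[field]
    return _TRANSFORMS[transform](value) if transform else value
```

Given a context with file.path set to scans/site-a/tile_001.las and no explicit template, the fallback path yields tile_001.las as the key value.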
Example
This example looks up file metadata from a DynamoDB table using the filename as the partition key:
    {
      "dynamodbConfig": {
        "tableName": "file-metadata-table",
        "partitionKey": "filename",
        "region": "us-west-2",
        "securityConfig": {
          "assumeRoleArn": "arn:aws:iam::<ACCOUNT_ID>:role/SpatialDataManagementContentDerivation-MyDynamoDBLookup",
          "type": "AssumeRole"
        }
      },
      "triggers": [
        {
          "description": "Derive file metadata from DynamoDB",
          "resources": ["asset"],
          "events": ["create"],
          "derivation": {
            "op": "dynamodbLookup",
            "dynamodbConfig": {
              "consistentRead": true
            },
            "applyTo": {
              "resource": "file",
              "scope": "all",
              "match": {
                "source": "filename",
                "target": "file.path:basename"
              },
              "responseFieldMapping": [
                { "source": "status", "target": "file.status" },
                { "source": "priority", "target": "file.priority" }
              ]
            }
          }
        }
      ]
    }
Note
The dynamodbConfig at the trigger’s derivation level is merged with the connector-level dynamodbConfig. Use the trigger-level config to override specific fields like consistentRead per trigger.
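One way to picture this merge is a dictionary update in which trigger-level keys win. The sketch below assumes a shallow merge. The source does not say whether nested objects are merged key by key, and the function name is hypothetical.

```python
def effective_dynamodb_config(connector_config: dict, derivation: dict) -> dict:
    """Combine connector-level and trigger-level dynamodbConfig (trigger keys win)."""
    merged = dict(connector_config.get("dynamodbConfig", {}))
    merged.update(derivation.get("dynamodbConfig", {}))  # assumption: shallow merge
    return merged


connector = {"dynamodbConfig": {"tableName": "file-metadata-table", "partitionKey": "filename", "region": "us-west-2"}}
derivation = {"op": "dynamodbLookup", "dynamodbConfig": {"consistentRead": True}}
config = effective_dynamodb_config(connector, derivation)
```

The resulting config keeps the connector's tableName, partitionKey, and region, and adds consistentRead from the trigger.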
Copy fields
The copyFields configuration provides a shortcut for copying a set of fields from the external record to resource metadata using prefix-based matching. Instead of listing individual field mappings, you specify a source prefix and target prefix:
"copyFields": { "sourcePrefix": "metadata.", "targetPrefix": "file." }
This copies all fields starting with metadata. from the external record to the resource, replacing the prefix with file.. For example, metadata.classification becomes file.classification.
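The prefix replacement can be sketched as a dictionary comprehension. This assumes the external record is available as a flat dictionary with dotted field names. The function name is hypothetical.

```python
def copy_fields(record: dict, source_prefix: str, target_prefix: str) -> dict:
    """Copy every field that starts with source_prefix, swapping in target_prefix."""
    return {
        target_prefix + name[len(source_prefix):]: value
        for name, value in record.items()
        if name.startswith(source_prefix)
    }


record = {"metadata.classification": "internal", "metadata.owner": "gis-team", "filename": "tile_001.las"}
copied = copy_fields(record, "metadata.", "file.")
```

Fields outside the source prefix, such as filename here, are left out of the result.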
Record ID mapping
The recordIdMapping configuration stores the external record’s identifier as a metadata attribute on the matched resource:
"recordIdMapping": { "target": "file.external_id" }
This writes the record’s key (from the external source) to the external_id attribute on the matched file.
Configuration fields
The following tables describe the configuration fields for the metadata derivation connector.
Connector-level fields
| Field | Required | Description |
|---|---|---|
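| s3Config.bucketName | Yes (for s3Lookup) | Target S3 bucket name. |
| s3Config.securityConfig | Yes (for s3Lookup) | Authentication configuration. Must use AssumeRole. |
| restConfig.apiBase | Yes (for restLookup) | Base URL for the REST API. |
| restConfig.securityConfig | Yes (for restLookup) | Authentication configuration. Supports API key, token, and basic auth. |
| dynamodbConfig.tableName | Yes (for dynamodbLookup) | DynamoDB table name. |
| dynamodbConfig.partitionKey | Yes (for dynamodbLookup) | Partition key attribute name. |
| dynamodbConfig.region | No | AWS Region of the DynamoDB table. |
| dynamodbConfig.securityConfig | Yes (for dynamodbLookup) | Authentication configuration. Must use AssumeRole. |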
Trigger-level derivation fields
| Field | Required | Description |
|---|---|---|
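| op | Yes | Derivation operation: s3Lookup, restLookup, or dynamodbLookup. |
| sourceContentType | No | Content type of the source data. |
| s3Config.objectKey | Yes (for s3Lookup) | S3 object key with template variables (for example, metadata/${project.projectId}/attributes.csv). |
| s3Config.csvOptions.hasHeader | No | Whether the CSV has a header row. Defaults to true. |
| s3Config.csvOptions.delimiter | No | CSV delimiter character. Defaults to , (comma). |
| restConfig.method | No | HTTP method for restLookup (GET or POST). |
| restConfig.path | Yes (for restLookup) | API path appended to connector-level apiBase. |
| restConfig.queryParams | No | Query parameters with template variables (for example, ${project.projectId}). |
| restConfig.responseFilter | No | Dot-notation path to extract records from the API response. |
| dynamodbConfig.partitionKeyValue | No | Partition key value template with variables (for example, ${asset.assetId}). |
| dynamodbConfig.consistentRead | No | Use strongly consistent reads. |
| applyTo.resource | Yes | Target resource type: file or asset. |
| applyTo.scope | No | Reserved. Currently all records are always processed regardless of this value. |
| applyTo.match.source | Yes | External record field used for matching. |
| applyTo.match.target | Yes | Resource field to match against. Supports the :basename, :ext, and :tolower transforms. |
| applyTo.onNoMatch | No | Behavior on no match. Currently only skip is supported. |
| applyTo.mappingPolicy | No | How derived attributes interact with existing ones: inherit (default) or override. |
| applyTo.responseFieldMapping | No | Field mappings from external record to resource metadata. |
| copyFields.sourcePrefix | No | Prefix to extract from external record fields. |
| copyFields.targetPrefix | No | Prefix to apply to target resource fields. |
| recordIdMapping.target | No | Resource field to store the external record ID (for example, file.external_id). |
| | No | Error handling behavior. |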
Error handling
The following table describes common derivation errors and their resolution.
| Operation | Error | Resolution |
|---|---|---|
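| s3Lookup | | The assumed IAM role does not have s3:GetObject permission on the source object. |
| s3Lookup | | The S3 object does not exist at the configured key. Verify the objectKey value. |
| restLookup | | Authentication failed. Verify the securityConfig settings. |
| restLookup | | The REST endpoint returned not found. Verify the apiBase and path values. |
| dynamodbLookup | | The assumed IAM role does not have dynamodb:GetItem permission on the table. |
| dynamodbLookup | | The configured DynamoDB table does not exist. Verify the tableName value. |
| All | No matching records | When onNoMatch is skip, processing continues with the next record. |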