

# Personally identifiable information (PII) recipe steps

Use these recipe steps to perform transformations on personally identifiable information (PII) in a dataset.

**Note**  
In addition to the recipe steps in this section, there are DataBrew recipe steps not designed specifically for PII that you can use to handle PII. An example is [DELETE](recipe-actions.DELETE.md), a basic column recipe step that deletes a column.

**Topics**
+ [CRYPTOGRAPHIC_HASH](recipe-actions.CRYPTOGRAPHIC_HASH.md)
+ [DECRYPT](recipe-actions.DECRYPT.md)
+ [DETERMINISTIC_DECRYPT](recipe-actions.DETERMINISTIC_DECRYPT.md)
+ [DETERMINISTIC_ENCRYPT](recipe-actions.DETERMINISTIC_ENCRYPT.md)
+ [ENCRYPT](recipe-actions.ENCRYPT.md)
+ [MASK_CUSTOM](recipe-actions.MASK_CUSTOM.md)
+ [MASK_DATE](recipe-actions.MASK_DATE.md)
+ [MASK_DELIMITER](recipe-actions.MASK_DELIMITER.md)
+ [MASK_RANGE](recipe-actions.MASK_RANGE.md)
+ [REPLACE_WITH_RANDOM_BETWEEN](recipe-actions.REPLACE_WITH_RANDOM_BETWEEN.md)
+ [REPLACE_WITH_RANDOM_DATE_BETWEEN](recipe-actions.REPLACE_WITH_RANDOM_DATE_BETWEEN.md)
+ [SHUFFLE_ROWS](recipe-actions.SHUFFLE_ROWS.md)

# CRYPTOGRAPHIC_HASH


Applies an algorithm to hash values in the column.

**Parameters**
+ `sourceColumns` – An array of existing columns.
+ `secretId` – The ARN of the Secrets Manager secret, or `databrew!default`. The key used by the hash-based message authentication code (HMAC) algorithms to hash the source columns is the base64-decoded output of the secret's value.
+ `secretVersion` – Optional. Defaults to the latest secret version.
+ `entityTypeFilter` – Optional array of [entity types](https://docs.aws.amazon.com/databrew/latest/dg/API_EntityDetectorConfiguration.html#databrew-Type-EntityDetectorConfiguration-EntityTypes). Can be used to hash only detected PII in a free-text column.
+ `createSecretIfMissing` – Optional boolean. If true, DataBrew attempts to create the secret on behalf of the caller.
+ `algorithm` – The algorithm used to hash your data. Valid enum values: MD5, SHA1, SHA256, SHA512, HMAC_MD5, HMAC_SHA1, HMAC_SHA256, HMAC_SHA512

  Each option refers to a different hashing algorithm. Those options with the "HMAC" prefix refer to a keyed hashing algorithm, and require the `secretId` parameter. For options without the "HMAC" prefix, the `secretId` parameter is not required.

  If you do not provide a hash algorithm, the service defaults to "HMAC_SHA256".

```
{
   "sourceColumns": ["phonenumber"],   
   "secretId": "arn:aws:secretsmanager:us-east-1:012345678901:secret:mysecret",
   "entityTypeFilter": ["USA_ALL"]
}
```
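
The difference between the keyed (HMAC) and unkeyed algorithm options can be illustrated with Python's standard library. This is a minimal sketch, not DataBrew's implementation: the function name `hash_value`, the hex output encoding, and the sample secret value are assumptions made for illustration only.

```python
import base64
import hashlib
import hmac


def hash_value(value: str, algorithm: str = "HMAC_SHA256", key: bytes = b"") -> str:
    """Hash one cell value and return a hex digest (hex encoding is an assumption)."""
    data = value.encode("utf-8")
    if algorithm.startswith("HMAC_"):
        # Keyed algorithms need a secret key, mirroring the secretId requirement.
        if not key:
            raise ValueError(f"{algorithm} is keyed and needs a secret key")
        digest_name = algorithm[len("HMAC_"):].lower()  # for example "sha256"
        return hmac.new(key, data, digest_name).hexdigest()
    # Unkeyed algorithms hash the value directly.
    return hashlib.new(algorithm.lower(), data).hexdigest()


# Hypothetical secret value, stored base64-encoded and decoded before use.
secret_key = base64.b64decode("bXktc2VjcmV0LWtleQ==")  # b"my-secret-key"

unkeyed = hash_value("555-0100", "SHA256")
keyed = hash_value("555-0100", "HMAC_SHA256", secret_key)
print(unkeyed != keyed)  # the key changes the digest
```

Anyone can recompute an unkeyed digest from a guessed input, whereas a keyed digest can be reproduced only by someone who also holds the key.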

When working in the interactive experience, in addition to the project’s role, the console user must have permission to `secretsmanager:GetSecretValue` on the provided Secrets Manager secret.

**Sample policy:**

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": [
        "arn:aws:secretsmanager:us-east-1:012345678901:secret:mysecret"
      ]
    }
  ]
}
```

You can also use the DataBrew-created default secret by passing `databrew!default` as `secretId` and setting `createSecretIfMissing` to true. This is not recommended for production. Anyone with the **AwsGlueDataBrewFullAccessPolicy** managed policy attached can use the default secret.

# DECRYPT


You can use the DECRYPT transform to decrypt inside of DataBrew. Your data can also be decrypted outside of DataBrew with the AWS Encryption SDK. If the provided KMS key ARN does not match what was used to encrypt the column, the decrypt operation fails. For more information on the AWS Encryption SDK, see [What is the AWS Encryption SDK](https://docs.aws.amazon.com/encryption-sdk/latest/developer-guide/introduction.html) in the *AWS Encryption SDK Developer Guide*.

**Parameters**
+ `sourceColumns` – An array of existing columns.
+ `kmsKeyArn` – The key ARN of the AWS Key Management Service key to use to decrypt the source columns. For more information on the key ARN, see [Key ARN](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#key-id-key-ARN) in the *AWS Key Management Service Developer Guide*. 

```
{
   "sourceColumns": ["phonenumber"],
   "kmsKeyArn": "arn:aws:kms:us-east-1:012345678901:key/<kms-key-id>"
}
```

When working in the interactive experience, in addition to the project’s role, the console user must have permission to `kms:GenerateDataKey` and `kms:Decrypt` on the provided KMS key.

**Sample policy:**

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kms:GenerateDataKey",
        "kms:Decrypt"
      ],
      "Resource": [
        "arn:aws:kms:us-east-1:012345678901:key/kms-key-id"
      ]
    }
  ]
}
```

# DETERMINISTIC_DECRYPT


Decrypts data encrypted with DETERMINISTIC_ENCRYPT.

This transformation is a no-op if the provided secret ID and version do not match what was used to encrypt the column.

**Parameters**
+ `sourceColumns` – An array of existing columns.
+ `secretId` – The ARN of the Secrets Manager secret key to use to decrypt the source columns.
+ `secretVersion` – Optional. Defaults to the latest secret version.

**Example**

```
{
   "sourceColumns": ["phonenumber"],   
   "secretId": "arn:aws:secretsmanager:us-east-1:012345678901:secret:mysecret",
   "secretVersion": "adfe-1232-7563-3123"
}
```

When working in the interactive experience, in addition to the project’s role, the console user must have permission to `secretsmanager:GetSecretValue` on the provided Secrets Manager secret.

**Sample policy:**

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": [
        "arn:aws:secretsmanager:us-east-1:012345678901:secret:mysecret"
      ]
    }
  ]
}
```

# DETERMINISTIC_ENCRYPT


Encrypts the column using AES-GCM-SIV with a 256-bit key. Data encrypted with DETERMINISTIC_ENCRYPT can be decrypted only inside DataBrew, with the DETERMINISTIC_DECRYPT transform. This transform does not use AWS KMS or the AWS Encryption SDK; instead, it uses the [AWS-LC library on GitHub](https://github.com/awslabs/aws-lc).

This transform can encrypt up to 400 KB per cell. It does not preserve the data type on decryption.

**Note**  
Using a secret for more than a year is discouraged.

**Parameters**
+ `sourceColumns` – An array of existing columns.
+ `secretId` – The ARN of the Secrets Manager secret key to use to encrypt the source columns, or `databrew!default`.
+ `secretVersion` – Optional. Defaults to the latest secret version.
+ `entityTypeFilter` – Optional array of [entity types](https://docs.aws.amazon.com/databrew/latest/dg/API_EntityDetectorConfiguration.html#databrew-Type-EntityDetectorConfiguration-EntityTypes). Can be used to encrypt only detected PII in a free-text column.
+ `createSecretIfMissing` – Optional boolean. If true, DataBrew attempts to create the secret on behalf of the caller.

**Example**

```
{
   "sourceColumns": ["phonenumber"],   
   "secretId": "arn:aws:secretsmanager:us-east-1:012345678901:secret:mysecret",
   "secretVersion": "adfe-1232-7563-3123",
   "entityTypeFilter": ["USA_ALL"]
}
```

When working in the interactive experience, in addition to the project’s role, the console user must have permission to `secretsmanager:GetSecretValue` on the provided Secrets Manager secret.

**Sample policy:**

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": [
        "arn:aws:secretsmanager:us-east-1:012345678901:secret:mysecret"
      ]
    }
  ]
}
```

# ENCRYPT


Encrypts values in the source columns with the [AWS Encryption SDK](https://docs.aws.amazon.com/encryption-sdk/latest/developer-guide/introduction.html). The DECRYPT transform can be used to decrypt inside of DataBrew. You can also decrypt the data outside of DataBrew using the AWS Encryption SDK.

The ENCRYPT transform can encrypt up to 128 MiB per cell. It attempts to preserve the format on decryption. To preserve the data type, the data type metadata must serialize to less than 1 KB. Otherwise, you must set the `preserveDataType` parameter to false. The data type metadata is stored in plaintext in the encryption context. For more information on the encryption context, see [Encryption context](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#encrypt_context) in the *AWS Key Management Service Developer Guide*.

**Parameters**
+ `sourceColumns` – An array of existing columns.
+ `kmsKeyArn` – The key ARN of the AWS Key Management Service key to use to encrypt the source columns. For more information on the key ARN, see [Key ARN](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#key-id-key-ARN) in the *AWS Key Management Service Developer Guide*.
+ `entityTypeFilter` – Optional array of [entity types](https://docs.aws.amazon.com/databrew/latest/dg/API_EntityDetectorConfiguration.html#databrew-Type-EntityDetectorConfiguration-EntityTypes). Can be used to encrypt only detected PII in a free-text column.
+ `preserveDataType` – Optional boolean. Defaults to true. If false, the data type will not be stored.

In the following example, `entityTypeFilter` and `preserveDataType` are optional.

**Example**

```
{
    "sourceColumns": ["phonenumber"],
    "kmsKeyArn": "arn:aws:kms:us-east-1:012345678901:key/kms-key-id",
    "entityTypeFilter": ["USA_ALL"],
    "preserveDataType": "true"
}
```

When working in the interactive experience, in addition to the project’s role, the console user must have permission to `kms:GenerateDataKey` on the provided AWS KMS key.

**Sample policy:**

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kms:GenerateDataKey"
      ],
      "Resource": [
        "arn:aws:kms:us-east-1:012345678901:key/kms-key-id"
      ]
    }
  ]
}
```

# MASK_CUSTOM


Masks characters that match a provided custom value.

**Parameters**
+ `sourceColumns` – A list of existing column names.
+ `maskSymbol` – A symbol that will be used to replace specified characters.
+ `regex` – If true, treats `customValue` as a regex pattern to match.
+ `customValue` – All occurrences (or regex matches) of `customValue` will be masked in the string.
+ `entityTypeFilter` – Optional array of [entity types](https://docs.aws.amazon.com/databrew/latest/dg/API_EntityDetectorConfiguration.html#databrew-Type-EntityDetectorConfiguration-EntityTypes). Can be used to mask only detected PII in a free-text column.

**Example**  
  

```
// Mask all occurrences of 'amazon' in the column
{ 
    "RecipeAction": {
        "Operation": "MASK_CUSTOM",
        "Parameters": {
            "sourceColumns": ["company"],
            "maskSymbol": "#",
            "customValue": "amazon"
        }
    }
}
```
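
The effect of `customValue` and `regex` on a single string can be sketched in Python. This is an illustration of the described behavior, not DataBrew's implementation; the function name `mask_custom` is hypothetical, and replacing each matched character with one mask symbol is an assumption.

```python
import re


def mask_custom(text: str, custom_value: str, mask_symbol: str = "#", regex: bool = False) -> str:
    """Mask every occurrence (or regex match) of custom_value in text."""
    # Treat custom_value literally unless regex matching was requested.
    pattern = custom_value if regex else re.escape(custom_value)
    # Assumption: each matched character is replaced by one mask symbol.
    return re.sub(pattern, lambda m: mask_symbol * len(m.group(0)), text)


print(mask_custom("amazon web services by amazon", "amazon"))
# prints: ###### web services by ######
print(mask_custom("order 12345", r"\d+", "*", regex=True))
# prints: order *****
```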

# MASK_DATE


Masks components of a date with a user-specified mask symbol.

**Parameters**
+ `sourceColumns` – A list of existing column names.
+ `maskSymbol` – A symbol that will be used to replace specified characters.
+ `redact` – An array of date component enums to mask. Valid enum values: YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, MILLISECOND.
+ `locale` – Optional IETF BCP 47 language tag. Defaults to `en`. The locale to use for date formatting.

**Example**  
  

```
// Mask year
{ 
    "RecipeAction": {
        "Operation": "MASK_DATE",
        "Parameters": {
            "sourceColumns": ["birthday"],
            "maskSymbol": "#",
            "redact": ["YEAR"]
        }
    }
}
```

# MASK_DELIMITER


Masks characters between two delimiters with a user-specified masking symbol.

**Parameters**
+ `sourceColumns` – A list of existing column names.
+ `maskSymbol` – A symbol that will be used to replace specified characters.
+ `startDelimiter` – A character indicating where masking is to begin. Omitting this parameter will apply the mask starting from the start of the string.
+ `endDelimiter` – A character indicating where masking is to end. Omitting this parameter will apply the masking from the startDelimiter to the end of the string.
+ `preserveDelimiters` – If true, applies mask to delimiters.
+ `alphabet` – An array of character sets to preserve during masking. Valid enum values: SYMBOLS, WHITESPACE.
+ `entityTypeFilter` – Optional array of [entity types](https://docs.aws.amazon.com/databrew/latest/dg/API_EntityDetectorConfiguration.html#databrew-Type-EntityDetectorConfiguration-EntityTypes). Can be used to mask only detected PII in a free-text column.

**Example**  
  

```
// Mask string between '<' and '>', preserving white space and symbols
{ 
    "RecipeAction": {
        "Operation": "MASK_DELIMITER",
        "Parameters": {
            "sourceColumns": ["name"],
            "maskSymbol": "#",
            "startDelimiter": "<",
            "endDelimiter": ">",
            "preserveDelimiters": false,
            "alphabet": ["WHITESPACE", "SYMBOLS"]
        }
    }
}
```

# MASK_RANGE


Masks characters between two positions with a user-specified masking symbol.

**Parameters**
+ `sourceColumns` – A list of existing column names.
+ `maskSymbol` – A symbol that will be used to replace specified characters.
+ `start` – A number indicating at which character position the masking is to begin (0-indexed, inclusive). Negative indexing is allowed. Omitting this parameter will apply the mask from the beginning of the string until 'stop'.
+ `stop` – A number indicating at which character position the masking is to end (0-indexed, exclusive). Negative indexing is allowed. Omitting this parameter will apply the mask from 'start' until the end of the string.
+ `alphabet` – An array of character set enums to preserve during masking. Valid enum values: SYMBOLS, WHITESPACE.
+ `entityTypeFilter` – Optional array of [entity types](https://docs.aws.amazon.com/databrew/latest/dg/API_EntityDetectorConfiguration.html#databrew-Type-EntityDetectorConfiguration-EntityTypes). Can be used to mask only detected PII in a free-text column.

**Example**  
  

```
// Mask entire string
{ 
    "RecipeAction": {
        "Operation": "MASK_RANGE",
        "Parameters": {
            "sourceColumns": ["firstName", "lastName"],
            "maskSymbol": "#"
        }
    }
}
```
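
The `start`, `stop`, and `alphabet` semantics can be sketched on a single string in Python. This is not DataBrew's implementation: the function name `mask_range` is hypothetical, negative positions are assumed to count from the end of the string as in Python slicing, and SYMBOLS is assumed to mean ASCII punctuation.

```python
import string
from typing import Optional, Sequence

# Assumed contents of the two character sets that can be preserved.
PRESERVED = {
    "SYMBOLS": set(string.punctuation),
    "WHITESPACE": set(string.whitespace),
}


def mask_range(
    text: str,
    mask_symbol: str = "#",
    start: Optional[int] = None,
    stop: Optional[int] = None,
    alphabet: Sequence[str] = (),
) -> str:
    """Mask characters from start (inclusive) to stop (exclusive), 0-indexed."""
    keep = set()
    for name in alphabet:
        keep |= PRESERVED[name]
    # slice.indices resolves omitted and negative positions like Python slicing.
    lo, hi, _ = slice(start, stop).indices(len(text))
    return "".join(
        mask_symbol if lo <= i < hi and ch not in keep else ch
        for i, ch in enumerate(text)
    )


print(mask_range("Jane"))  # prints: ####
print(mask_range("555-0100", start=0, stop=-4, alphabet=["SYMBOLS"]))  # prints: ###-0100
print(mask_range("John Smith", start=1, alphabet=["WHITESPACE"]))  # prints: J### #####
```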

# REPLACE_WITH_RANDOM_BETWEEN


Replaces values with a random number.

**Parameters**
+ `lowerBound` – The lower bound of the random number range.
+ `sourceColumns` – A list of existing column names.
+ `upperBound` – The upper bound of the random number range.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "REPLACE_WITH_RANDOM_BETWEEN",
        "Parameters": {
            "lowerBound": "1",
            "sourceColumns": ["column1", "column2"],
            "upperBound": "100"
        }
    }
}
```
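
The replacement can be sketched in Python as follows. This is an illustration only, assuming integer output and inclusive bounds, neither of which is stated above; the function name and the `seed` argument are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def replace_with_random_between(
    values: Sequence[object],
    lower_bound: int,
    upper_bound: int,
    seed: Optional[int] = None,
) -> List[int]:
    """Replace every value with a random integer in [lower_bound, upper_bound]."""
    rng = random.Random(seed)  # a seed makes the sketch repeatable
    # The original values are ignored; only the number of rows is kept.
    return [rng.randint(lower_bound, upper_bound) for _ in values]


salaries = [52000, 61000, 75000]
masked = replace_with_random_between(salaries, 1, 100, seed=42)
print(len(masked) == len(salaries))  # one replacement per original value
```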

# REPLACE_WITH_RANDOM_DATE_BETWEEN


Replaces values with a random date.

**Parameters**
+ `startDate` – The start of the range of dates from which a random date will be taken.
+ `sourceColumns` – A list of existing column names.
+ `endDate` – The end of the range of dates from which a random date will be taken.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "REPLACE_WITH_RANDOM_DATE_BETWEEN",
        "Parameters": {
            "startDate": "2020-12-12 12:12:12",
            "sourceColumns": ["column1", "column2"],
            "endDate": "2021-12-12 12:12:12"
        }
    }
}
```
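
Drawing a random date from the range can be sketched in Python. This is an illustration, not DataBrew's implementation: the function name is hypothetical, and second-level granularity, inclusive bounds, and the timestamp format (taken from the example above) are assumptions.

```python
import random
from datetime import datetime, timedelta

# Timestamp format used in the example above.
FORMAT = "%Y-%m-%d %H:%M:%S"


def random_date_between(start_date: str, end_date: str, rng: random.Random) -> str:
    """Return a random timestamp between start_date and end_date (inclusive)."""
    start = datetime.strptime(start_date, FORMAT)
    end = datetime.strptime(end_date, FORMAT)
    span_seconds = int((end - start).total_seconds())
    # Pick a whole-second offset inside the range.
    offset = rng.randint(0, span_seconds)
    return (start + timedelta(seconds=offset)).strftime(FORMAT)


rng = random.Random(7)
value = random_date_between("2020-12-12 12:12:12", "2021-12-12 12:12:12", rng)
print("2020-12-12 12:12:12" <= value <= "2021-12-12 12:12:12")  # prints: True
```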

# SHUFFLE_ROWS


Shuffles values in a given column. The shuffling can occur with values grouped by a secondary column.

**Parameters**
+ `sourceColumns` – An array of existing columns.
+ `groupByColumns` – An array of columns to group the source columns by while shuffling.

**Example**  
  

```
{
   "sourceColumns": ["age"],
   "groupByColumns": ["country"]
}
```
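
Grouped shuffling can be sketched in Python for one source column and one group-by column. This is an illustration of the described behavior, not DataBrew's implementation; the function name `shuffle_rows`, the row representation (a list of dictionaries), and the `seed` argument are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional


def shuffle_rows(
    rows: List[Dict[str, object]],
    source_column: str,
    group_by_column: Optional[str] = None,
    seed: Optional[int] = None,
) -> List[Dict[str, object]]:
    """Shuffle source_column values, only among rows that share a group value."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        key = row[group_by_column] if group_by_column else None
        groups[key].append(index)

    shuffled = [dict(row) for row in rows]  # copy so the input is untouched
    for indexes in groups.values():
        values = [rows[i][source_column] for i in indexes]
        rng.shuffle(values)
        for i, value in zip(indexes, values):
            shuffled[i][source_column] = value
    return shuffled


rows = [
    {"age": 31, "country": "US"},
    {"age": 45, "country": "US"},
    {"age": 27, "country": "DE"},
    {"age": 52, "country": "DE"},
]
result = shuffle_rows(rows, "age", "country", seed=3)
```

Within each country the same set of ages remains, but an age is no longer tied to its original row.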