

# Data protection in AWS Glue DataBrew
<a name="data-protection"></a>

DataBrew offers several features that are designed to help protect your data.

**Topics**
+ [Encryption at rest](encryption-at-rest.md)
+ [Encryption in transit](encryption-in-transit.md)
+ [Key management](key-management.md)
+ [Identifying and handling personally identifiable information (PII)](personal-information-protection.md)
+ [DataBrew dependency on other AWS services](dependency-on-other-services.md)

The AWS [shared responsibility model](https://aws.amazon.com/compliance/shared-responsibility-model/) applies to data protection in AWS Glue DataBrew. As described in this model, AWS is responsible for protecting the global infrastructure that runs all of the AWS Cloud. You are responsible for maintaining control over your content that is hosted on this infrastructure. You are also responsible for the security configuration and management tasks for the AWS services that you use. For more information about data privacy, see the [Data Privacy FAQ](https://aws.amazon.com/compliance/data-privacy-faq/). For information about data protection in Europe, see the [AWS Shared Responsibility Model and GDPR](https://aws.amazon.com/blogs/security/the-aws-shared-responsibility-model-and-gdpr/) blog post on the *AWS Security Blog*.

For data protection purposes, we recommend that you protect AWS account credentials and set up individual users with AWS IAM Identity Center or AWS Identity and Access Management (IAM). That way, each user is given only the permissions necessary to fulfill their job duties. We also recommend that you secure your data in the following ways:
+ Use multi-factor authentication (MFA) with each account.
+ Use SSL/TLS to communicate with AWS resources. We require TLS 1.2 and recommend TLS 1.3.
+ Set up API and user activity logging with AWS CloudTrail. For information about using CloudTrail trails to capture AWS activities, see [Working with CloudTrail trails](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-trails.html) in the *AWS CloudTrail User Guide*.
+ Use AWS encryption solutions, along with all default security controls within AWS services.
+ Use advanced managed security services such as Amazon Macie, which assists in discovering and securing sensitive data that is stored in Amazon S3.
+ If you require FIPS 140-3 validated cryptographic modules when accessing AWS through a command line interface or an API, use a FIPS endpoint. For more information about the available FIPS endpoints, see [Federal Information Processing Standard (FIPS) 140-3](https://aws.amazon.com/compliance/fips/).

We strongly recommend that you never put confidential or sensitive information, such as your customers' email addresses, into tags or free-form text fields such as a **Name** field. This includes when you work with DataBrew or other AWS services using the console, API, AWS CLI, or AWS SDKs. Any data that you enter into tags or free-form text fields used for names may be used for billing or diagnostic logs. If you provide a URL to an external server, we strongly recommend that you do not include credentials information in the URL to validate your request to that server.

# Encryption at rest
<a name="encryption-at-rest"></a>

DataBrew supports data encryption at rest for DataBrew projects and jobs. Projects and jobs can read encrypted data, and jobs can write encrypted data by calling [AWS Key Management Service (AWS KMS)](https://aws.amazon.com/kms/) to generate keys and decrypt data. You can also use KMS keys to encrypt the job logs that are generated by DataBrew jobs. You can specify encryption keys using the DataBrew console or the DataBrew API.

**Important**  
AWS Glue DataBrew supports only symmetric AWS KMS keys. For more information, see [AWS KMS keys](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#kms_keys) in the *AWS Key Management Service Developer Guide*.

When you create jobs in DataBrew with encryption enabled, you can use the DataBrew console to specify S3-managed server-side encryption keys (SSE-S3) or KMS keys stored in AWS KMS (SSE-KMS) to encrypt data at rest.
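You can also supply these encryption settings through the DataBrew API. The sketch below builds the parameters for the `CreateRecipeJob` operation (`EncryptionMode` and `EncryptionKeyArn` are the relevant fields); all account IDs, ARNs, and resource names are placeholders, and the actual call is shown commented out because it requires AWS credentials.

```python
# Sketch: parameters for a DataBrew recipe job that writes SSE-KMS-encrypted
# output. Role, key, bucket, and project names below are placeholders.
job_params = {
    "Name": "encrypted-sample-job",
    "RoleArn": "arn:aws:iam::111122223333:role/DataBrewJobRole",
    "EncryptionMode": "SSE-KMS",  # or "SSE-S3" for S3-managed keys
    "EncryptionKeyArn": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
    "Outputs": [
        {
            "Location": {"Bucket": "amzn-s3-demo-bucket", "Key": "output/"},
            "Format": "CSV",
        }
    ],
    "ProjectName": "sample-project",
}

# With credentials configured, the job could then be created with boto3:
# import boto3
# databrew = boto3.client("databrew")
# databrew.create_recipe_job(**job_params)
```

When `EncryptionMode` is `SSE-S3`, omit `EncryptionKeyArn`; it is required only for SSE-KMS.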

**Important**  
When you use an Amazon Redshift dataset, objects unloaded to the provided temporary directory are encrypted with SSE-S3.

# Encrypting data written by DataBrew jobs
<a name="encryption-security-configuration"></a>

DataBrew jobs can write to encrypted Amazon S3 targets and encrypted Amazon CloudWatch Logs. 

**Topics**
+ [Setting up DataBrew to use encryption](#encryption-setup-DataBrew)
+ [Creating a route to AWS KMS for VPC jobs](#encryption-kms-vpc-endpoint)
+ [Setting up encryption with AWS KMS keys](#console-security-configurations-wizard)

## Setting up DataBrew to use encryption
<a name="encryption-setup-DataBrew"></a>

Follow this procedure to set up your DataBrew environment to use encryption.

**To set up your DataBrew environment to use encryption**

1. Create or update your AWS KMS keys to give AWS KMS permissions to the AWS Identity and Access Management (IAM) roles that are passed to DataBrew jobs. These IAM roles are used to encrypt CloudWatch Logs and Amazon S3 targets. For more information, see [Encrypt Log Data in CloudWatch Logs Using AWS KMS](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/encrypt-log-data-kms.html) in the *Amazon CloudWatch Logs User Guide*.

   In the following example, *`"role1"`*, *`"role2"`*, and *`"role3"`* are IAM roles that are passed to DataBrew jobs. This policy statement describes a KMS key policy that gives permission to the listed IAM roles to encrypt and decrypt with this KMS key.

   ```
      {
          "Effect": "Allow",
          "Principal": {
              "Service": "logs.region.amazonaws.com",
              "AWS": [
                  "role1",
                  "role2",
                  "role3"
              ]
          },
          "Action": [
              "kms:Encrypt*",
              "kms:Decrypt*",
              "kms:ReEncrypt*",
              "kms:GenerateDataKey*",
              "kms:Describe*"
          ],
          "Resource": "*"
      }
   ```

   The `Service` statement, shown as `"Service": "logs.region.amazonaws.com"`, is required if you use the key to encrypt CloudWatch Logs.

1. Ensure that the AWS KMS key is set to `ENABLED` before it is used.

For more information about specifying permissions using AWS KMS key policies, see [Using key policies in AWS KMS](https://docs.aws.amazon.com/kms/latest/developerguide/key-policies.html).
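The policy statement from step 1 can also be assembled programmatically. This sketch uses three placeholder role ARNs and a placeholder Region; note that a complete key policy also needs a statement granting your key administrators access, and applying the policy (for example, with `aws kms put-key-policy`) is left out because it requires AWS credentials.

```python
import json

# Placeholder ARNs for the IAM roles that your DataBrew jobs use.
role_arns = [
    "arn:aws:iam::111122223333:role/role1",
    "arn:aws:iam::111122223333:role/role2",
    "arn:aws:iam::111122223333:role/role3",
]

statement = {
    "Effect": "Allow",
    "Principal": {
        # The Service principal is required only if the key also
        # encrypts CloudWatch Logs; adjust the Region as needed.
        "Service": "logs.us-east-1.amazonaws.com",
        "AWS": role_arns,
    },
    "Action": [
        "kms:Encrypt*",
        "kms:Decrypt*",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:Describe*",
    ],
    "Resource": "*",
}

key_policy = json.dumps(
    {"Version": "2012-10-17", "Statement": [statement]}, indent=4
)
# Apply with, for example:
# aws kms put-key-policy --key-id <key-id> --policy-name default --policy file://policy.json
```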

## Creating a route to AWS KMS for VPC jobs
<a name="encryption-kms-vpc-endpoint"></a>

You can connect directly to AWS KMS through a private endpoint in your virtual private cloud (VPC) instead of connecting over the internet. When you use a VPC endpoint, communication between your VPC and AWS KMS is conducted entirely within the AWS network.

You can create an AWS KMS VPC endpoint within a VPC. Without this step, your DataBrew jobs might fail with a `kms timeout`. For detailed instructions, see [Connecting to AWS KMS Through a VPC Endpoint](https://docs.aws.amazon.com/kms/latest/developerguide/kms-vpc-endpoint.html) in the *AWS Key Management Service Developer Guide*.

As you follow these instructions, on the [VPC console](https://console.aws.amazon.com/vpc/), make sure to do the following:
+ Choose **Enable Private DNS name**.
+ For **Security group**, choose the security group (including a self-referencing rule) that you use for your DataBrew job that accesses Java Database Connectivity (JDBC).

When you run a DataBrew job that accesses JDBC data stores, DataBrew must have a route to the AWS KMS endpoint. You can provide the route with a network address translation (NAT) gateway or with an AWS KMS VPC endpoint. To create a NAT gateway, see [NAT Gateways](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html) in the *Amazon VPC User Guide*.
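As a sketch, the KMS interface endpoint described above maps to the Amazon EC2 `CreateVpcEndpoint` operation. The VPC, subnet, and security group IDs below are placeholders; the service name format `com.amazonaws.<region>.kms` is the standard AWS KMS endpoint service name, and the call itself is commented out because it requires AWS credentials.

```python
# Sketch: parameters for an interface VPC endpoint to AWS KMS.
region = "us-east-1"

endpoint_params = {
    "VpcEndpointType": "Interface",
    "VpcId": "vpc-0abc1234567890def",          # placeholder VPC ID
    "ServiceName": f"com.amazonaws.{region}.kms",
    "SubnetIds": ["subnet-0abc1234567890def"],  # placeholder subnet
    # Use the security group from your JDBC job, including its
    # self-referencing rule.
    "SecurityGroupIds": ["sg-0abc1234567890def"],
    "PrivateDnsEnabled": True,                  # "Enable Private DNS name"
}

# With credentials configured:
# import boto3
# ec2 = boto3.client("ec2", region_name=region)
# ec2.create_vpc_endpoint(**endpoint_params)
```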

## Setting up encryption with AWS KMS keys
<a name="console-security-configurations-wizard"></a>

When you enable encryption on a job, the setting applies to both the Amazon S3 output and the CloudWatch Logs. The IAM role that you pass to the job must have the AWS KMS permissions described in [Setting up DataBrew to use encryption](#encryption-setup-DataBrew).

For more information, see the following topics in the *Amazon Simple Storage Service User Guide*:
+ For information about `SSE-S3`, see [Protecting Data Using Server-Side Encryption with Amazon S3-Managed Encryption Keys (SSE-S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingServerSideEncryption.html).
+ For information about `SSE-KMS`, see [Protecting Data Using Server-Side Encryption with AWS KMS–Managed Keys (SSE-KMS)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html).

# Encryption in transit
<a name="encryption-in-transit"></a>

AWS provides Transport Layer Security (TLS) encryption for data in transit.

DataBrew support for JDBC data sources comes through AWS Glue. When connecting to JDBC data sources, DataBrew uses the settings on your AWS Glue connection, including the **Require SSL connection** option. For more information, see [AWS Glue connection properties](https://docs.aws.amazon.com/glue/latest/dg/connection-defining.html) in the *AWS Glue Developer Guide*.
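As a sketch, requiring SSL on the underlying AWS Glue connection maps to the `JDBC_ENFORCE_SSL` connection property in the Glue `CreateConnection` operation. The hostnames, IDs, and credentials below are placeholders, and the call is commented out because it requires AWS credentials.

```python
# Sketch: a Glue connection definition with SSL required. DataBrew inherits
# this setting when it reads from the JDBC source.
connection_input = {
    "Name": "sample-jdbc-connection",
    "ConnectionType": "JDBC",
    "ConnectionProperties": {
        "JDBC_CONNECTION_URL": "jdbc:postgresql://db.example.com:5432/sampledb",
        "JDBC_ENFORCE_SSL": "true",  # the "Require SSL connection" option
        "USERNAME": "db_user",
        "PASSWORD": "REPLACE_ME",    # prefer a secrets store in practice
    },
    "PhysicalConnectionRequirements": {
        "SubnetId": "subnet-0abc1234567890def",
        "SecurityGroupIdList": ["sg-0abc1234567890def"],
        "AvailabilityZone": "us-east-1a",
    },
}

# With credentials configured:
# import boto3
# glue = boto3.client("glue")
# glue.create_connection(ConnectionInput=connection_input)
```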

AWS KMS provides both "bring your own key" encryption and server-side encryption for DataBrew extract, transform, load (ETL) processing and for the AWS Glue Data Catalog. 

# Key management
<a name="key-management"></a>

You can use IAM with DataBrew to define users, groups, roles, and fine-grained policies that allow or deny access to your AWS resources.

You can define access to your metadata using both resource-based and identity-based policies, depending on your organization's needs. Resource-based policies list the principals that are allowed or denied access to your resources, allowing you to set up policies such as cross-account access. Identity-based policies are attached to users, groups, and roles within IAM.

DataBrew supports "bring your own key" encryption with AWS KMS keys that you create. DataBrew also provides server-side encryption for DataBrew jobs using KMS keys stored in AWS KMS.

# Identifying and handling personally identifiable information (PII)
<a name="personal-information-protection"></a>

When you build analytic functions or machine learning models, you need safeguards to prevent exposure of personally identifiable information (PII) data. *PII* is personal data that can be used to identify an individual, such as an address, bank account number, or phone number. For example, when data analysts and data scientists use datasets to discover general demographic information, they should not have access to specific individuals' PII. 

DataBrew provides data masking mechanisms to obfuscate PII data during the data preparation process. Depending on your organization's needs, different PII redaction mechanisms are available. You can obfuscate the PII data so that users can't reverse it, or you can make the obfuscation reversible.

Identifying and masking PII data in DataBrew involves applying a set of transforms that redact PII data. As part of this process, DataBrew provides PII data detection and statistics in the **Data Profile overview** dashboard on the DataBrew console.

You can use the following data-masking techniques:
+ *Substitution* – Replace PII data with other authentic-looking values.
+ *Shuffling* – Shuffle the value from the same column in different rows. 
+ *Deterministic encryption* – Apply deterministic encryption algorithms to the column values. Deterministic encryption always produces the same ciphertext for a value. 
+ *Probabilistic encryption* – Apply probabilistic encryption algorithms to the column values. Probabilistic encryption produces different ciphertext each time that it's applied. 
+ *Decryption* – Decrypt columns based on encryption keys. 
+ *Nulling out or deletion* – Replace a particular field with a null value or delete the column. 
+ *Masking out* – Use character scrambling or mask certain portions in the columns. 
+ *Hashing* – Apply hash functions to the column values. 

For more information on using transforms, see [Personally identifiable information (PII) recipe steps](https://docs.aws.amazon.com/databrew/latest/dg/recipe-actions.pii.html). For more information on using profile jobs to detect PII, including a list of the entity types that can be detected, see [EntityDetectorConfiguration section for configuring PII](https://docs.aws.amazon.com/databrew/latest/dg/profile.configuration.html#entity-detector-configuration) in *Building a profile job configuration programmatically*.

# DataBrew dependency on other AWS services
<a name="dependency-on-other-services"></a>

To work with the DataBrew console, you need a minimum set of permissions to work with the DataBrew resources for your AWS account. In addition to these DataBrew permissions, the console requires permissions from the following services: 
+ CloudWatch Logs permissions to display logs.
+ IAM permissions to list and pass roles.
+ Amazon EC2 permissions to list VPCs, subnets, security groups, instances, and other objects. DataBrew uses these permissions to set up Amazon EC2 items such as VPCs when running DataBrew jobs.
+ Amazon S3 permissions to list buckets and objects.
+ AWS Glue permissions to read AWS Glue schema objects, such as databases, partitions, tables, and connections.
+ AWS Lake Formation permissions to work with Lake Formation data lakes.