

# Getting started with AWS Glue
<a name="setting-up"></a>

The following sections provide information on setting up AWS Glue. Not all of the setup sections are required to start using AWS Glue. You can use the instructions as needed to set up IAM permissions, encryption, and DNS (if you're using a VPC environment to access data stores, or if you're using interactive sessions).

**Topics**
+ [Overview of using AWS Glue](start-console-overview.md)
+ [Setting up IAM permissions for AWS Glue](set-up-iam.md)
+ [Setting up AWS Glue usage profiles](start-usage-profiles.md)
+ [Getting started with the AWS Glue Data Catalog](start-data-catalog.md)
+ [Setting up network access to data stores](start-connecting.md)
+ [Setting up encryption in AWS Glue](set-up-encryption.md)
+ [Setting up networking for development for AWS Glue](start-development-endpoint.md)

# Overview of using AWS Glue
<a name="start-console-overview"></a>

With AWS Glue, you store metadata in the AWS Glue Data Catalog. You use this metadata to orchestrate ETL jobs that transform data sources and load your data warehouse or data lake. The following steps describe the general workflow and some of the choices that you make when working with AWS Glue.

**Note**  
You can use the following steps, or you can create a workflow that automatically performs steps 1 through 3. For more information, see [Performing complex ETL activities using blueprints and workflows in AWS Glue](orchestrate-using-workflows.md).

1. Populate the AWS Glue Data Catalog with table definitions.

   In the console, for persistent data stores, you can add a crawler to populate the AWS Glue Data Catalog. You can start the **Add crawler** wizard from the list of tables or the list of crawlers. You choose one or more data stores for your crawler to access. You can also create a schedule to determine the frequency of running your crawler. For data streams, you can manually create the table definition, and define stream properties.

   Optionally, you can provide a custom classifier that infers the schema of your data. You can create custom classifiers using a grok pattern. AWS Glue also provides built-in classifiers, which crawlers use automatically if a custom classifier does not recognize your data. When you define a crawler, you don't have to select a classifier. For more information about classifiers in AWS Glue, see [Defining and managing classifiers](add-classifier.md).

   Crawling some types of data stores requires a connection that provides authentication and location information. If needed, you can create a connection that provides this required information in the AWS Glue console.

   The crawler reads your data store and creates data definitions and named tables in the AWS Glue Data Catalog. These tables are organized into a database of your choosing. You can also populate the Data Catalog with manually created tables. With this method, you provide the schema and other metadata to create table definitions in the Data Catalog. Because this method can be a bit tedious and error prone, it's often better to have a crawler create the table definitions.

   For more information about populating the AWS Glue Data Catalog with table definitions, see [Creating tables](tables-described.md).
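
   As an illustration, a crawler like the one this step describes could also be created with the AWS CLI. The following input is a sketch for `aws glue create-crawler --cli-input-json`; the crawler name, role, database, schedule, and Amazon S3 path are placeholders, not values from this guide.

   ```
   {
       "Name": "example-crawler",
       "Role": "AWSGlueServiceRole-Example",
       "DatabaseName": "example_db",
       "Targets": {
           "S3Targets": [
               { "Path": "s3://example-bucket/data/" }
           ]
       },
       "Schedule": "cron(0 2 * * ? *)"
   }
   ```

   The `Schedule` field uses the six-field cron syntax that AWS Glue expects; omit it to create a crawler that runs only on demand.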

1. Define a job that describes the transformation of data from source to target.

   Generally, to create a job, you have to make the following choices:
   + Choose a table from the AWS Glue Data Catalog to be the source of the job. Your job uses this table definition to access your data source and interpret the format of your data.
   + Choose a table or location from the AWS Glue Data Catalog to be the target of the job. Your job uses this information to access your data store.
   + Tell AWS Glue to generate a script to transform your source to target. AWS Glue generates the code to call built-in transforms that convert data from its source schema to the target schema format. These transforms perform operations such as copying data, renaming columns, and filtering data, as necessary. You can modify this script in the AWS Glue console.

   For more information about defining jobs in AWS Glue, see [Building visual ETL jobs](author-job-glue.md).
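
   For example, a minimal job definition of this shape could be created outside the console with `aws glue create-job --cli-input-json`. The job name, role, script location, and Glue version below are illustrative placeholders, not values from this guide.

   ```
   {
       "Name": "example-etl-job",
       "Role": "AWSGlueServiceRole-Example",
       "Command": {
           "Name": "glueetl",
           "ScriptLocation": "s3://example-bucket/scripts/example-etl-job.py",
           "PythonVersion": "3"
       },
       "GlueVersion": "4.0"
   }
   ```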

1. Run your job to transform your data.

   You can run your job on demand, or start it based on one of these trigger types:
   + A trigger that is based on a cron schedule.
   + A trigger that is event-based; for example, the successful completion of another job can start an AWS Glue job.
   + A trigger that starts a job on demand.

   For more information about triggers in AWS Glue, see [Starting jobs and crawlers using triggers](trigger-job.md).
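
   As a sketch, a trigger based on a cron schedule could be defined with `aws glue create-trigger --cli-input-json` using input like the following; the trigger and job names are placeholders.

   ```
   {
       "Name": "example-nightly-trigger",
       "Type": "SCHEDULED",
       "Schedule": "cron(0 3 * * ? *)",
       "StartOnCreation": true,
       "Actions": [
           { "JobName": "example-etl-job" }
       ]
   }
   ```

   An event-based trigger would instead use `"Type": "CONDITIONAL"` with a `Predicate` on another job's run state, and an on-demand trigger uses `"Type": "ON_DEMAND"`.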

1. Monitor your scheduled crawlers and triggered jobs.

   Use the AWS Glue console to view the following:
   + Job run details and errors.
   + Crawler run details and errors.
   + Any notifications about AWS Glue activities.

   For more information about monitoring your crawlers and jobs in AWS Glue, see [Monitoring AWS Glue](monitor-glue.md).

# Setting up IAM permissions for AWS Glue
<a name="set-up-iam"></a>

The instructions in this topic help you quickly set up AWS Identity and Access Management (IAM) permissions for AWS Glue. You will complete the following tasks:
+ Grant your IAM identities access to AWS Glue resources.
+ Create a service role for running jobs, accessing data, and running AWS Glue Data Quality tasks.

For detailed instructions that you can use to customize IAM permissions for AWS Glue, see [Configuring IAM permissions for AWS Glue](configure-iam-for-glue.md).

**To set up IAM permissions for AWS Glue in the AWS Management Console**

1. Sign in to the AWS Management Console and open the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/).

1. Choose **Getting started**.

1. Under **Prepare your account for AWS Glue**, choose **Set up IAM permissions**.

1. Choose the IAM identities (roles or users) that you want to give AWS Glue permissions to. AWS Glue attaches the [AWSGlueConsoleFullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AWSGlueConsoleFullAccess) managed policy to these identities. You can skip this step if you want to set these permissions manually, or if you only want to set a default service role.

1. Choose **Next**.

1. Choose the level of Amazon S3 access that your roles and users need. The options that you choose in this step are applied to all of the identities that you selected.

   1. Under **Choose S3 locations**, choose the Amazon S3 locations that you want to grant access to.

   1. Next, select whether your identities should have **Read only (recommended)** or **Read and write** access to the locations that you previously selected. AWS Glue adds permissions policies to your identities based on the combination of locations and read or write permissions you select.

      The following table displays the permissions that AWS Glue attaches for Amazon S3 access.  
      [See the AWS documentation website for more details.](http://docs.aws.amazon.com/glue/latest/dg/set-up-iam.html)

1. Choose **Next**.

1. Choose a default AWS Glue service role for your account. A service role is an IAM role that AWS Glue uses to access resources in other AWS services on your behalf. For more information, see [Service roles for AWS Glue](security_iam_service-with-iam.md#security_iam_service-with-iam-roles-service).
   + When you choose the standard AWS Glue service role, AWS Glue creates a new IAM role in your AWS account named `AWSGlueServiceRole` with the following managed policies attached. If your account already has an IAM role named `AWSGlueServiceRole`, AWS Glue attaches these policies to the existing role.
     +  [AWSGlueServiceRole](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole) – This managed policy is required for AWS Glue to access and manage resources on your behalf. It allows AWS Glue to create, update, and delete various resources such as AWS Glue jobs, crawlers, and connections. This policy also grants permissions for AWS Glue to access Amazon CloudWatch logs for logging purposes. For the purposes of getting started, we recommend using this policy to learn how to use AWS Glue. As you get more comfortable with AWS Glue, you can create policies that allow you to fine-tune access to resources as needed.
     +  [AWSGlueConsoleFullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AWSGlueConsoleFullAccess) – This managed policy grants full access to the AWS Glue service through the AWS Management Console. This policy grants permissions to perform any operation within AWS Glue, enabling you to create, modify, and delete any AWS Glue resource as needed. However, it's important to note that this policy does not grant permissions to access the underlying data stores or other AWS services that may be involved in the ETL process. Due to the broad scope of permissions granted by the `AWSGlueConsoleFullAccess` policy, it should be assigned with caution and following the principle of least privilege. It is generally recommended to create and use more granular policies tailored to specific use cases and requirements whenever possible. 
     +  [AWSGlueConsole-S3-read-only-policy](https://console.aws.amazon.com/iam/home#policies/details/arn:aws:iam:aws:policy/AWSGlueConsole-S3-read-only-policy) – This policy allows AWS Glue to read data from specified Amazon S3 buckets, but it does not grant permissions to write or modify data in Amazon S3.
     +  [AWSGlueConsole-S3-read-and-write](https://console.aws.amazon.com/iam/home#policies/details/arn:aws:iam:aws:policy/AWSGlueConsole-S3-read-and-write) – This policy allows AWS Glue to read and write data to specified Amazon S3 buckets as part of the ETL process.
   +  When you choose an existing IAM role, AWS Glue sets the role as the default, but doesn't add `AWSGlueServiceRole` permissions to it. Ensure that you've configured the role to use as a service role for AWS Glue. For more information, see [Step 1: Create an IAM policy for the AWS Glue service](create-service-policy.md) and [Step 2: Create an IAM role for AWS Glue](create-an-iam-role.md). 

1. Choose **Next**.

1. Finally, review the permissions you've selected and then choose **Apply changes**. When you apply the changes, AWS Glue adds IAM permissions to the identities that you selected. You can view or modify the new permissions in the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

You've now completed the minimum IAM permissions setup for AWS Glue. In a production environment, we recommend that you familiarize yourself with [Security in AWS Glue](security.md) and [Identity and access management for AWS Glue](security-iam.md) to help you secure AWS resources for your use case.

## Next steps
<a name="set-up-iam-next-steps"></a>

Now that you have IAM permissions set up, you can explore the following topics to get started using AWS Glue:
+ [Getting Started with AWS Glue in AWS Skill Builder](https://explore.skillbuilder.aws/learn/course/external/view/elearning/8171/getting-started-with-aws-glue)
+ [Getting started with the AWS Glue Data Catalog](start-data-catalog.md)

# Setting up for AWS Glue Studio
<a name="setting-up-studio"></a>

Complete the tasks in this section when you're using AWS Glue Studio for visual ETL for the first time:

**Topics**
+ [Review IAM permissions needed for the AWS Glue Studio user](getting-started-min-privs.md)
+ [Review IAM permissions needed for ETL jobs](getting-started-min-privs-job.md)
+ [Set up IAM permissions for AWS Glue Studio](getting-started-iam-permissions.md)
+ [Configure a VPC for your ETL job](getting-started-vpc-config.md)

# Review IAM permissions needed for the AWS Glue Studio user
<a name="getting-started-min-privs"></a>

To use AWS Glue Studio, the user must have access to various AWS resources. The user must be able to view and select Amazon S3 buckets, IAM policies and roles, and AWS Glue Data Catalog objects.

## AWS Glue service permissions
<a name="getting-started-min-privs-glue"></a>

AWS Glue Studio uses the actions and resources of the AWS Glue service. Your user needs permissions on these actions and resources to effectively use AWS Glue Studio. You can grant the AWS Glue Studio user the `AWSGlueConsoleFullAccess` managed policy, or create a custom policy with a smaller set of permissions.

**Important**  
As a security best practice, tighten your policies to further restrict access to the Amazon S3 buckets and Amazon CloudWatch log groups that you use. For an example Amazon S3 policy, see [Writing IAM Policies: How to Grant Access to an Amazon S3 Bucket](https://aws.amazon.com/blogs/security/writing-iam-policies-how-to-grant-access-to-an-amazon-s3-bucket/).

## Creating Custom IAM Policies for AWS Glue Studio
<a name="getting-started-all-gs-privs"></a>

You can create a custom policy with a smaller set of permissions for AWS Glue Studio. The policy can grant permissions for a subset of objects or actions. Use the following information when creating a custom policy. 

To use the AWS Glue Studio APIs, include the `glue:UseGlueStudio` action in your IAM policy. Including `glue:UseGlueStudio` allows you to access all AWS Glue Studio actions, even as more actions are added to the API over time.

 For more information on actions defined by AWS Glue, see [ Actions defined by AWS Glue](https://docs.aws.amazon.com/service-authorization/latest/reference/list_awsglue.html). 

 **Data preparation authoring Actions** 
+ SendRecipeAction
+ GetRecipeAction

 **Directed acyclic graph (DAG) Actions** 
+ CreateDag
+ UpdateDag
+ GetDag
+ DeleteDag

 **Job Actions** 
+ SaveJob
+ GetJob
+ CreateJob
+ DeleteJob
+ GetJobs
+ UpdateJob

 **Job run Actions** 
+ StartJobRun
+ GetJobRuns
+ BatchStopJobRun
+ GetJobRun
+ QueryJobRuns
+ QueryJobs
+ QueryJobRunsAggregated

 **Schema Actions** 
+ GetSchema
+ GetInferredSchema

 **Database Actions** 
+ GetDatabases

 **Plan Actions** 
+ GetPlan

 **Table Actions** 
+ SearchTables
+ GetTables
+ GetTable

 **Connection Actions** 
+ CreateConnection
+ DeleteConnection
+ UpdateConnection
+ GetConnections
+ GetConnection

 **Mapping Actions** 
+ GetMapping

 **S3 Proxy Actions**
+ ListBuckets
+ ListObjectsV2
+ GetBucketLocation

**Security Configuration Actions**
+ GetSecurityConfigurations 

**Script Actions**
+ CreateScript (different from the AWS Glue API operation of the same name)

## Accessing AWS Glue Studio APIs
<a name="getting-started-glue-studio-apis"></a>

To access AWS Glue Studio, add `glue:UseGlueStudio` to the list of actions in your IAM policy.

 In the example below, `glue:UseGlueStudio` is included in the action policy, but the AWS Glue Studio APIs are not individually identified. That is because when you include `glue:UseGlueStudio`, you are automatically granted access to the internal APIs without having to specify the individual AWS Glue Studio APIs in the IAM permissions. 

 In the example, the additional listed action policies (for example, `glue:SearchTables`) are not AWS Glue Studio APIs, so they will need to be included in the IAM permissions as required. You may also want to include Amazon S3 Proxy actions to specify the level of Amazon S3 access to grant. The example policy below provides access to open AWS Glue Studio, create a visual job, and save/run it if the IAM role selected has sufficient access. 
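
A sketch of such a policy follows. The specific non-Studio actions shown and the `AWSGlueServiceRole*` role prefix are illustrative assumptions; include the actions and resources that your own use case requires.

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:UseGlueStudio",
                "glue:SearchTables",
                "glue:GetTables",
                "glue:GetDatabases",
                "glue:GetConnections",
                "glue:StartJobRun",
                "glue:GetJobRuns",
                "s3:ListAllMyBuckets",
                "s3:GetBucketLocation"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::*:role/AWSGlueServiceRole*",
            "Condition": {
                "StringLike": {
                    "iam:PassedToService": [
                        "glue.amazonaws.com"
                    ]
                }
            }
        }
    ]
}
```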

## Notebook and data preview permissions
<a name="getting-started-data-preview-perms"></a>

Data previews and notebooks allow you to see a sample of your data at any stage of your job (reading, transforming, writing), without having to run the job. You specify an AWS Identity and Access Management (IAM) role for AWS Glue Studio to use when accessing the data. IAM roles are intended to be assumable and do not have standard long-term credentials, such as a password or access keys, associated with them. Instead, when AWS Glue Studio assumes the role, IAM provides it with temporary security credentials.

To ensure data previews and notebook commands work correctly, use a role that has a name that starts with the string `AWSGlueServiceRole`. If you choose to use a different name for your role, then you must add the `iam:PassRole` permission and configure a policy for the role in IAM. For more information, see [Create an IAM policy for roles not named "AWSGlueServiceRole\$1"](getting-started-iam-permissions.md#create-iam-policy).

**Warning**  
If a role grants the `iam:PassRole` permission for a notebook, and you implement role chaining, a user could unintentionally gain access to the notebook. There is currently no auditing that would let you monitor which users have been granted access to the notebook.

If you would like to deny an IAM identity the ability to create data preview sessions, consult the following example [Deny an identity the ability to create data preview sessions](security_iam_id-based-policy-examples.md#deny-data-preview-sessions-per-identity).

## Amazon CloudWatch permissions
<a name="getting-started-min-privs-cloudwatch"></a>

You can monitor your AWS Glue Studio jobs using Amazon CloudWatch, which collects and processes raw data from AWS Glue into readable, near-real-time metrics. By default, AWS Glue metrics data is sent to CloudWatch automatically. For more information, see [What Is Amazon CloudWatch?](https://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/WhatIsCloudWatch.html) in the *Amazon CloudWatch User Guide*, and [AWS Glue Metrics](https://docs.aws.amazon.com/glue/latest/dg/monitoring-awsglue-with-cloudwatch-metrics.html#awsglue-metrics) in the *AWS Glue Developer Guide*. 

To access CloudWatch dashboards, the user accessing AWS Glue Studio needs one of the following:
+ The `AdministratorAccess` policy
+ The `CloudWatchFullAccess` policy
+ A custom policy that includes one or more of these specific permissions:
  + `cloudwatch:GetDashboard` and `cloudwatch:ListDashboards` to view dashboards
  + `cloudwatch:PutDashboard` to create or modify dashboards
  + `cloudwatch:DeleteDashboards` to delete dashboards

For more information about changing permissions for an IAM user using policies, see [Changing Permissions for an IAM User](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_change-permissions.html) in the *IAM User Guide*.

# Review IAM permissions needed for ETL jobs
<a name="getting-started-min-privs-job"></a>

When you create a job using AWS Glue Studio, the job assumes the permissions of the IAM role that you specify when you create it. This IAM role must have permission to extract data from your data source, write data to your target, and access AWS Glue resources. 

The name of the role that you create for the job must start with the string `AWSGlueServiceRole` for it to be used correctly by AWS Glue Studio. For example, you might name your role `AWSGlueServiceRole-FlightDataJob`.

## Data source and data target permissions
<a name="getting-started-min-privs-data"></a>

An AWS Glue Studio job must have access to Amazon S3 for any sources, targets, scripts, and temporary directories that you use in your job. You can create a policy to provide fine-grained access to specific Amazon S3 resources. 
+ Data sources require `s3:ListBucket` and `s3:GetObject` permissions. 
+ Data targets require `s3:ListBucket`, `s3:PutObject`, and `s3:DeleteObject` permissions.
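
A sketch of a fine-grained policy that grants these permissions for one source bucket and one target bucket might look like the following; the bucket names and prefixes are placeholders.

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadSourceData",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::example-source-bucket",
                "arn:aws:s3:::example-source-bucket/input/*"
            ]
        },
        {
            "Sid": "WriteTargetData",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::example-target-bucket",
                "arn:aws:s3:::example-target-bucket/output/*"
            ]
        }
    ]
}
```

Note that `s3:ListBucket` applies to the bucket ARN itself, while the object-level actions apply to the object ARNs under the prefix.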

**Note**  
 Your IAM policy needs to allow `s3:GetObject` for the specific buckets used for hosting AWS Glue transforms.   
 The following buckets are owned by an AWS service account and are world-readable. They serve as a repository for the source code of a subset of the transformations available in the AWS Glue Studio visual editor. Permissions on each bucket deny every other API action, so anyone can read the transformation scripts, but only the AWS Glue service team can write to them. When your AWS Glue job runs, the script is downloaded to the local container as a local import. After that, there is no further communication with that account. 

 Region: Bucket name 
+ af-south-1: aws-glue-studio-transforms-762339736633-prod-af-south-1
+ ap-east-1: aws-glue-studio-transforms-125979764932-prod-ap-east-1
+ ap-northeast-2: aws-glue-studio-transforms-673535381443-prod-ap-northeast-2
+ ap-northeast-3: aws-glue-studio-transforms-149976050262-prod-ap-northeast-3
+ ap-south-1: aws-glue-studio-transforms-584702181950-prod-ap-south-1
+ ap-south-2: aws-glue-studio-transforms-380279651983-prod-ap-south-2
+ ap-southeast-1: aws-glue-studio-transforms-737106620487-prod-ap-southeast-1
+ ap-southeast-2: aws-glue-studio-transforms-234881715811-prod-ap-southeast-2
+ ap-southeast-3: aws-glue-studio-transforms-151265630221-prod-ap-southeast-3
+ ap-southeast-4: aws-glue-studio-transforms-052235663858-prod-ap-southeast-4
+ ca-central-1: aws-glue-studio-transforms-622716468547-prod-ca-central-1
+ ca-west-1: aws-glue-studio-transforms-915795495192-prod-ca-west-1
+ eu-central-1: aws-glue-studio-transforms-560373232017-prod-eu-central-1
+ eu-central-2: aws-glue-studio-transforms-907358657121-prod-eu-central-2
+ eu-north-1: aws-glue-studio-transforms-312557305497-prod-eu-north-1
+ eu-south-1: aws-glue-studio-transforms-939684186351-prod-eu-south-1
+ eu-south-2: aws-glue-studio-transforms-239737454084-prod-eu-south-2
+ eu-west-1: aws-glue-studio-transforms-244479516193-prod-eu-west-1
+ eu-west-2: aws-glue-studio-transforms-804222392271-prod-eu-west-2
+ eu-west-3: aws-glue-studio-transforms-371299348807-prod-eu-west-3
+ il-central-1: aws-glue-studio-transforms-806964611811-prod-il-central-1
+ me-central-1: aws-glue-studio-transforms-733304270342-prod-me-central-1
+ me-south-1: aws-glue-studio-transforms-112120182341-prod-me-south-1
+ sa-east-1: aws-glue-studio-transforms-881619130292-prod-sa-east-1
+ us-east-1: aws-glue-studio-transforms-510798373988-prod-us-east-1
+ us-east-2: aws-glue-studio-transforms-251189692203-prod-us-east-2
+ us-west-1: aws-glue-studio-transforms-593230150239-prod-us-west-1
+ us-west-2: aws-glue-studio-transforms-818035625594-prod-us-west-2
+ ap-northeast-1: aws-glue-studio-transforms-200493242866-prod-ap-northeast-1
+ cn-north-1: aws-glue-studio-transforms-071033555442-prod-cn-north-1
+ cn-northwest-1: aws-glue-studio-transforms-070947029561-prod-cn-northwest-1
+ us-gov-west-1: aws-glue-studio-transforms-227493901923-prod-us-gov-west-1-2604
+ eusc-de-east-1: aws-glue-studio-transforms-780995497573-prod-eusc-de-east-1-555

If you choose Amazon Redshift as your data source, you can provide a role for cluster permissions. Jobs that run against an Amazon Redshift cluster issue commands that access Amazon S3 for temporary storage using temporary credentials. If your job runs for more than an hour, these credentials expire, causing the job to fail. To avoid this problem, you can assign a role to the Amazon Redshift cluster itself that grants the necessary permissions to jobs using temporary credentials. For more information, see [Moving Data to and from Amazon Redshift](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-redshift.html) in the *AWS Glue Developer Guide*.

If the job uses data sources or targets other than Amazon S3, then you must attach the necessary permissions to the IAM role used by the job to access these data sources and targets. For more information, see [Setting Up Your Environment to Access Data Stores](https://docs.aws.amazon.com/glue/latest/dg/start-connecting.html) in the *AWS Glue Developer Guide*.

If you're using connectors and connections for your data store, you need additional permissions, as described in [Permissions required for using connectors](#getting-started-min-privs-connectors).

## Permissions required for deleting jobs
<a name="getting-started-min-privs-delete-job"></a>

In AWS Glue Studio you can select multiple jobs in the console to delete. To perform this action, you must have the `glue:BatchDeleteJob` permission. This is different from the AWS Glue console, which requires the `glue:DeleteJob` permission for deleting jobs.

## AWS Key Management Service permissions
<a name="getting-started-min-privs-kms"></a>

If you plan to access Amazon S3 sources and targets that use server-side encryption with AWS Key Management Service (AWS KMS), then attach a policy to the AWS Glue Studio role used by the job that enables the job to decrypt the data. The job role needs the `kms:ReEncrypt`, `kms:GenerateDataKey`, and `kms:DescribeKey` permissions. Additionally, the job role needs the `kms:Decrypt` permission to upload or download an Amazon S3 object that is encrypted with an AWS KMS customer master key (CMK).
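
A sketch of such a policy statement might look like the following; the key ARN is a placeholder, and the `kms:ReEncrypt*` wildcard covers both `kms:ReEncryptFrom` and `kms:ReEncryptTo`.

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey",
                "kms:DescribeKey"
            ],
            "Resource": "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
        }
    ]
}
```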

There are additional charges for using AWS KMS CMKs. For more information, see [AWS Key Management Service Concepts - Customer Master Keys (CMKs)](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#master_keys) and [AWS Key Management Service Pricing](https://aws.amazon.com/kms/pricing) in the *AWS Key Management Service Developer Guide*.

## Permissions required for using connectors
<a name="getting-started-min-privs-connectors"></a>

If you're using an AWS Glue Custom Connector and connection to access a data store, the role used to run the AWS Glue ETL job needs additional permissions attached:
+ The AWS managed policy `AmazonEC2ContainerRegistryReadOnly` for accessing connectors purchased from AWS Marketplace.
+ The `glue:GetJob` and `glue:GetJobs` permissions.
+ AWS Secrets Manager permissions for accessing secrets that are used with connections. Refer to [Example: Permission to retrieve secret values](https://docs.aws.amazon.com/secretsmanager/latest/userguide/auth-and-access_examples.html#auth-and-access_examples_read) for example IAM policies.
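
A sketch of the Secrets Manager statement might look like the following; the Region, account ID, and secret name are placeholders.

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "secretsmanager:GetSecretValue",
            "Resource": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-connection-secret-*"
        }
    ]
}
```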

If your AWS Glue ETL job runs within a virtual private cloud (VPC), then the VPC must be configured as described in [Configure a VPC for your ETL job](getting-started-vpc-config.md).

# Set up IAM permissions for AWS Glue Studio
<a name="getting-started-iam-permissions"></a>

As an AWS administrator user, you can create the required roles and assign policies to users and job roles.

You can use the **AWSGlueConsoleFullAccess** AWS managed policy to provide the necessary permissions for using the AWS Glue Studio console. 

To create your own policy, follow the steps documented in [Create an IAM Policy for the AWS Glue Service](https://docs.aws.amazon.com/glue/latest/dg/create-service-policy.html) in the *AWS Glue Developer Guide*. Include the IAM permissions described previously in [Review IAM permissions needed for the AWS Glue Studio user](getting-started-min-privs.md).

**Topics**
+ [Attach policies to the AWS Glue Studio user](#attach-iam-policy)
+ [Create an IAM policy for roles not named "AWSGlueServiceRole\$1"](#create-iam-policy)

## Attach policies to the AWS Glue Studio user
<a name="attach-iam-policy"></a>

Any AWS user that signs in to the AWS Glue Studio console must have permissions to access specific resources. You provide those permissions by assigning IAM policies to the user.

**To attach the AWSGlueConsoleFullAccess managed policy to a user**

1. Sign in to the AWS Management Console and open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation pane, choose **Policies**. 

1. In the list of policies, select the check box next to **AWSGlueConsoleFullAccess**. You can use the **Filter** menu and the search box to filter the list of policies.

1. Choose **Policy actions**, and then choose **Attach**. 

1. Choose the user to attach the policy to. You can use the **Filter** menu and the search box to filter the list of principal entities. After choosing the user to attach the policy to, choose **Attach policy**. 

1. Repeat the previous steps to attach additional policies to the user, as needed.

## Create an IAM policy for roles not named "AWSGlueServiceRole\$1"
<a name="create-iam-policy"></a>

**To configure an IAM policy for roles used by AWS Glue Studio**

1. Sign in to the AWS Management Console and open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. Add a new IAM policy. You can add to an existing policy or create a new IAM inline policy. To create an IAM policy:

   1. Choose **Policies**, and then choose **Create Policy**. If a **Get Started** button appears, choose it, and then choose **Create Policy**.

   1. Next to **Create Your Own Policy**, choose **Select**.

   1. For **Policy Name**, type any value that is easy for you to refer to later. Optionally, type descriptive text in **Description**.

   1. For **Policy Document**, type a policy statement with the following format, and then choose **Create Policy**:

1. Copy and paste the following block into the policy under the `"Statement"` array, replacing *my-interactive-session-role-prefix* with the common prefix of the roles that you want to associate with AWS Glue permissions.

   ```
   {
       "Action": [
           "iam:PassRole"
       ],
       "Effect": "Allow",
       "Resource": "arn:aws:iam::*:role/my-interactive-session-role-prefix*",
       "Condition": {
           "StringLike": {
               "iam:PassedToService": [
                   "glue.amazonaws.com"
               ]
           }
       }
   }
   ```

    Here is the full example with the `Version` and `Statement` elements included in the policy:


   ```
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Action": [
           "iam:PassRole"
         ],
         "Effect": "Allow",
         "Resource": "arn:aws:iam::*:role/my-interactive-session-role-prefix*",
         "Condition": {
           "StringLike": {
             "iam:PassedToService": [
               "glue.amazonaws.com"
             ]
           }
         }
       }
     ]
   }
   ```


1. To enable the policy for a user, choose **Users**.

1. Choose the user to whom you want to attach the policy.

# Configure a VPC for your ETL job
<a name="getting-started-vpc-config"></a>

You can use Amazon Virtual Private Cloud (Amazon VPC) to define a virtual network in your own logically isolated area within the AWS Cloud, known as a *virtual private cloud (VPC)*. You can launch your AWS resources, such as instances, into your VPC. Your VPC closely resembles a traditional network that you might operate in your own data center, with the benefits of using the scalable infrastructure of AWS. You can configure your VPC; you can select its IP address range, create subnets, and configure route tables, network gateways, and security settings. You can connect instances in your VPC to the internet. You can connect your VPC to your own corporate data center, making the AWS Cloud an extension of your data center. To protect the resources in each subnet, you can use multiple layers of security, including security groups and network access control lists. For more information, see the [Amazon VPC User Guide](https://docs.aws.amazon.com/vpc/latest/userguide/).

You can configure your AWS Glue ETL jobs to run within a VPC when using connectors. You must configure your VPC for the following, as needed:
+ Public network access for data stores not in AWS. All data stores that are accessed by the job must be available from the VPC subnet. 
+ If your job needs to access both VPC resources and the public internet, your VPC needs a network address translation (NAT) gateway. 

  For more information, see [Setting Up Your Environment to Access Data Stores](https://docs.aws.amazon.com/glue/latest/dg/start-connecting.html) in the *AWS Glue Developer Guide*.
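As a sketch of what this VPC setup looks like programmatically, the following builds the input for a `NETWORK`-type AWS Glue connection that pins ETL jobs to a VPC subnet. The connection name, subnet, security group, and Availability Zone are placeholder assumptions, not values from this guide.

```python
import json

# Hedged sketch: input for a NETWORK-type AWS Glue connection that makes ETL
# jobs run inside a specific VPC subnet. All IDs below are placeholders --
# substitute your own subnet, security group, and Availability Zone.
connection_input = {
    "Name": "my-vpc-connection",  # hypothetical connection name
    "ConnectionType": "NETWORK",
    "ConnectionProperties": {},
    "PhysicalConnectionRequirements": {
        "SubnetId": "subnet-0123456789abcdef0",
        "SecurityGroupIdList": ["sg-0123456789abcdef0"],
        "AvailabilityZone": "us-east-1a",
    },
}

# The actual call (requires AWS credentials and glue:CreateConnection):
# import boto3
# boto3.client("glue").create_connection(ConnectionInput=connection_input)

print(json.dumps(connection_input, indent=2))
```

A job that references this connection then has its elastic network interfaces created in the specified subnet, with the listed security groups applied.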

# Getting started with notebooks in AWS Glue Studio
<a name="notebook-getting-started"></a>

 When you start a notebook through AWS Glue Studio, all the configuration steps are done for you so that you can explore your data and start developing your job script after only a few seconds. 

 The following sections describe how to create a role and grant the appropriate permissions to use notebooks in AWS Glue Studio for ETL jobs. 

 For more information on actions defined by AWS Glue, see [Actions defined by AWS Glue](https://docs.aws.amazon.com/service-authorization/latest/reference/list_awsglue.html). 

**Topics**
+ [Granting permissions for the IAM role](#studio-notebook-permissions)

## Granting permissions for the IAM role
<a name="studio-notebook-permissions"></a>

 Setting up AWS Glue Studio is a prerequisite to using notebooks. 

To use notebooks in AWS Glue, your role requires the following:
+  A trust relationship with AWS Glue for the `sts:AssumeRole` action and, if you want to use tagging, the `sts:TagSession` action. 
+  An IAM policy containing all the permissions for notebooks, AWS Glue, and interactive sessions. 
+  An IAM policy that allows `iam:PassRole`, because the role needs to be able to pass itself from the notebook to interactive sessions. 

 For example, when you create a new role, you can add a standard AWS managed policy like `AWSGlueConsoleFullAccessRole` to the role, and then add a new policy for the notebook operations and another for the IAM PassRole policy. 
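To make the three requirements concrete, here is a hedged sketch of assembling them with the IAM API. The role name, account ID, and managed policy ARN are placeholder assumptions; the actual calls are shown commented because they require credentials.

```python
import json

# Placeholder assumptions -- substitute your own values.
ROLE_NAME = "MyGlueNotebookRole"   # hypothetical role name
ACCOUNT_ID = "111122223333"        # example account ID

# 1) Trust relationship: AssumeRole, plus TagSession for tagged sessions.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "glue.amazonaws.com"},
        "Action": ["sts:AssumeRole", "sts:TagSession"],
    }],
}

# 2) The role passes itself from the notebook to interactive sessions.
pass_role_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "iam:PassRole",
        "Resource": f"arn:aws:iam::{ACCOUNT_ID}:role/{ROLE_NAME}",
    }],
}

# Actual calls (requires IAM permissions):
# import boto3
# iam = boto3.client("iam")
# iam.create_role(RoleName=ROLE_NAME,
#                 AssumeRolePolicyDocument=json.dumps(trust_policy))
# iam.attach_role_policy(RoleName=ROLE_NAME,
#                        PolicyArn="arn:aws:iam::aws:policy/...")  # your managed policy
# iam.put_role_policy(RoleName=ROLE_NAME, PolicyName="pass-self",
#                     PolicyDocument=json.dumps(pass_role_policy))
```

The policy bodies match the JSON examples in the sections that follow; only the assembly into one role is sketched here.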

### Actions needed for a trust relationship with AWS Glue
<a name="create-notebook-permissions-trust"></a>

 When starting a notebook session, you must add the `sts:AssumeRole` action to the trust relationship of the role that is passed to the notebook. If your session includes tags, you must also include the `sts:TagSession` action. Without these actions, the notebook session cannot start. 

 For example: 

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

------

### Policies containing IAM permissions for notebooks
<a name="create-notebook-permissions-operations"></a>

 The following sample policy describes the IAM permissions required for notebooks. If you are creating a new role, create a policy that contains the following: 

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:StartNotebook",
        "glue:TerminateNotebook",
        "glue:GlueNotebookRefreshCredentials",
        "glue:DeregisterDataPreview",
        "glue:GetNotebookInstanceStatus",
        "glue:GlueNotebookAuthorize"
      ],
      "Resource": "*"
    }
  ]
}
```

------

 You can use the following IAM policies to allow access to specific resources: 
+  *AwsGlueSessionUserRestrictedNotebookServiceRole*: Provides full access to all AWS Glue resources except for sessions. Allows users to create and use only the notebook sessions that are associated with the user. This policy also includes other permissions needed by AWS Glue to manage AWS Glue resources in other AWS services. 
+  *AwsGlueSessionUserRestrictedNotebookPolicy*: Provides permissions that allow users to create and use only the notebook sessions that are associated with the user. This policy also includes permissions to explicitly allow users to pass a restricted AWS Glue session role. 

### IAM policy to pass a role
<a name="create-notebook-permissions-pass-role"></a>

 When you create a notebook with a role, that role is then passed to interactive sessions so that the same role can be used in both places. As such, the `iam:PassRole` permission needs to be part of the role's policy. 

 Create a new policy for your role using the following example. Replace the account number and role name with your own. 

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::111122223333:role/<role_name>"
    }
  ]
}
```

------

# Setting up AWS Glue usage profiles
<a name="start-usage-profiles"></a>

One of the main advantages of using a cloud platform is its flexibility. However, with this ease of creating compute resources comes a risk of spiraling cloud costs when left unmanaged and without guardrails. As a result, admins need to balance avoiding high infrastructure costs while at the same time allowing users to work without unnecessary friction.

With AWS Glue usage profiles, admins can create different profiles for various classes of users within the account, such as developers, testers, and product teams. Each profile is a unique set of parameters that can be assigned to different types of users. For example, developers may need more workers and can have a higher number of maximum workers while product teams may need fewer workers and a lower timeout or idle timeout value.

**Example of jobs and job runs behavior**  
Suppose that a job is created by user A with profile A. The job is saved with certain parameter values. User B with profile B will try to run the job.

When user A authored the job, if they didn't set a specific number of workers, the default set in user A's profile was applied and saved with the job's definition.

When user B runs the job, it runs with whatever values were saved for it. If user B's own profile is more restrictive and doesn't allow that many workers, the job run will fail.

**Usage profile as a resource**  
An AWS Glue usage profile is a resource identified by an Amazon Resource Name (ARN). All the default IAM (Identity and Access Management) controls apply, including action-based and resource-based authorization. Admins should update the IAM policy of users who create AWS Glue resources, granting them access to use the profiles.

![\[An example of usage profiles configured in AWS Glue.\]](http://docs.aws.amazon.com/glue/latest/dg/images/usage-profiles-1.png)


**Topics**
+ [Creating and managing usage profiles](start-usage-profiles-managing.md)
+ [Usage profiles and jobs](start-usage-profiles-jobs.md)

# Creating and managing usage profiles
<a name="start-usage-profiles-managing"></a>

## Creating an AWS Glue usage profile
<a name="w2aac15c15c19b3"></a>

Admins should create usage profiles and then assign them to the various users. When creating a usage profile, you specify default values as well as a range of allowed values for various job and session parameters. You must configure at least one parameter for jobs or interactive sessions. You can customize the default value to be used when a parameter value is not provided for the job, and/or set up a range limit or a set of allowed values for validation if a user provides a parameter value when using this profile. 

*Defaults* are a best practice set by the admin to assist job authors. When a user creates a new job and doesn't set a timeout value, the usage profile's default timeout will apply. If the author doesn’t have a profile, then the AWS Glue service defaults would apply and be saved in the job's definition. At runtime, AWS Glue enforces the limits set in the profile (min, max, allowed workers).

 Once a parameter is configured, all other parameters are optional. Parameters that can be customized for jobs or interactive sessions are: 
+  **Number of workers** – restrict the number of workers to avoid excessive use of compute resources. You can set a default, minimum, and maximum value. The minimum is 1. 
+  **Worker type** – restrict the relevant worker types for your workloads. You can set a default type and allow worker types for a user profile. 
+  **Timeout** – define the maximum time a job or interactive session can run and consume resources before it is terminated. Set up timeout values to avoid long-running jobs.

  You can set a default, minimum, and maximum value in minutes. The minimum is 1 (minute). While the AWS Glue default timeout is 2880 minutes, you can set any default value in the usage profile.

  It is a best practice to set a value for 'default'. This value will be used for the job or session creation if no value was set by the user.
+  **Idle timeout** – define the number of minutes an interactive session can be inactive after a cell has run before it times out. Set an idle timeout so that interactive sessions terminate after the work is complete. The idle timeout range must fall within the timeout range.

  You can set a default, minimum, and maximum value in minutes. The minimum is 1 (minute). While the AWS Glue default timeout is 2880 minutes, you can set any default value in the usage profile. 

  It is a best practice to set a value for 'default'. This value will be used for the session creation if no value was set by the user.
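The default-and-range behavior described above can be illustrated with a minimal sketch (not AWS Glue's actual implementation): a requested value must fall within the profile's `MinValue`/`MaxValue` range, and the `DefaultValue` applies when the user supplies nothing.

```python
# Minimal sketch of profile-limit resolution. `param` is a dict with
# DefaultValue/MinValue/MaxValue as strings, mirroring config.json.
def resolve(param, requested=None):
    # No value from the user: the profile default applies.
    if requested is None:
        return int(param["DefaultValue"])
    value = int(requested)
    # A supplied value is validated against the profile's range.
    if not int(param["MinValue"]) <= value <= int(param["MaxValue"]):
        raise ValueError(f"{value} is outside the allowed range for this profile")
    return value

timeout = {"DefaultValue": "2880", "MinValue": "100", "MaxValue": "4000"}
print(resolve(timeout))        # no value given -> profile default, 2880
print(resolve(timeout, 500))   # within range -> accepted, 500
```

A request of `resolve(timeout, 50)` would raise, analogous to the validation error AWS Glue returns when a parameter is outside the profile's allowed range.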

**To create an AWS Glue usage profile as an admin (console)**

1. In the left navigation menu, choose **Cost management**.

1. Choose **Create usage profile**.

1. Enter a name for the usage profile in **Usage profile name**.

1. Enter an optional description that will help others recognize the purpose of the usage profile.

1. Define at least one parameter in the profile. Any field in the form is a parameter. For example, the session idle timeout minimum.

1. Define any optional tags that apply to the usage profile.

1. Choose **Save**.  
![\[An example of an admin creating a usage profile in AWS Glue.\]](http://docs.aws.amazon.com/glue/latest/dg/images/usage-profiles-2-create.png)

**To create a usage profile (AWS CLI)**

1. Enter the following command.

   ```
   aws glue create-usage-profile --name profile-name --configuration file://config.json --tags list-of-tags
   ```

   where the config.json can define parameter values for interactive sessions (`SessionConfiguration`) and jobs (`JobConfiguration`):

   ```
    // config.json (separate blobs for session and job configuration)
   {
       "SessionConfiguration": {
           "timeout": {
               "DefaultValue": "2880",
               "MinValue": "100",
               "MaxValue": "4000"
           },
           "idleTimeout": {
               "DefaultValue": "30",
               "MinValue": "10",
               "MaxValue": "4000"
           },
           "workerType": {
               "DefaultValue": "G.2X",
               "AllowedValues": [
                   "G.1X",
                   "G.2X",
                   "G.4X",
                   "G.8X",
                   "G.12X",
                   "G.16X",
                   "R.1X",
                   "R.2X",
                   "R.4X",
                   "R.8X"
               ]
           },
           "numberOfWorkers": {
               "DefaultValue": "10",
               "MinValue": "1",
               "MaxValue": "10"
           }
       },
       "JobConfiguration": {
           "timeout": {
               "DefaultValue": "2880",
               "MinValue": "100",
               "MaxValue": "4000"
           },
           "workerType": {
               "DefaultValue": "G.2X",
               "AllowedValues": [
                   "G.1X",
                   "G.2X",
                   "G.4X",
                   "G.8X",
                   "G.12X",
                   "G.16X",
                   "R.1X",
                   "R.2X",
                   "R.4X",
                   "R.8X"
               ]
           },
           "numberOfWorkers": {
               "DefaultValue": "10",
               "MinValue": "1",
               "MaxValue": "10"
           }
       }
   }
   ```

1. Enter the following command to see the usage profile created:

   ```
   aws glue get-usage-profile --name profile-name
   ```

   The response:

   ```
   {
       "ProfileName": "foo",
       "Configuration": {
           "SessionConfiguration": {
               "numberOfWorkers": {
                   "DefaultValue": "10",
                   "MinValue": "1",
                   "MaxValue": "10"
               },
               "workerType": {
                   "DefaultValue": "G.2X",
                   "AllowedValues": [
                       "G.1X",
                       "G.2X",
                       "G.4X",
                       "G.8X",
                       "G.12X",
                       "G.16X",
                       "R.1X",
                       "R.2X",
                       "R.4X",
                       "R.8X"
                   ]
               },
               "timeout": {
                   "DefaultValue": "2880",
                   "MinValue": "100",
                   "MaxValue": "4000"
               },
               "idleTimeout": {
                   "DefaultValue": "30",
                   "MinValue": "10",
                   "MaxValue": "4000"
               }
           },
           "JobConfiguration": {
               "numberOfWorkers": {
                   "DefaultValue": "10",
                   "MinValue": "1",
                   "MaxValue": "10"
               },
               "workerType": {
                   "DefaultValue": "G.2X",
                   "AllowedValues": [
                       "G.1X",
                       "G.2X",
                       "G.4X",
                       "G.8X",
                       "G.12X",
                       "G.16X",
                       "R.1X",
                       "R.2X",
                       "R.4X",
                       "R.8X"
                   ]
               },
               "timeout": {
                   "DefaultValue": "2880",
                   "MinValue": "100",
                   "MaxValue": "4000"
               }
           }
       },
       "CreatedOn": "2024-01-19T23:15:24.542000+00:00"
   }
   ```

Additional CLI commands used to manage usage profiles:
+ aws glue list-usage-profiles
+ aws glue update-usage-profile --name *profile-name* --configuration *file://config.json*
+ aws glue delete-usage-profile --name *profile-name*

## Editing a usage profile
<a name="w2aac15c15c19b5"></a>

Admins can edit usage profiles that they have created to change the profile parameter values for jobs and interactive sessions.

**To edit an AWS Glue usage profile as an admin (console)**

1. In the left navigation menu, choose **Cost management**.

1. Choose a usage profile that you have permissions to edit and choose **Edit**.

1. Make changes as needed to the profile. By default, the parameters that already have values are expanded.

1. Choose **Save Edits**.  
![\[An example of a user editing a usage profile in AWS Glue.\]](http://docs.aws.amazon.com/glue/latest/dg/images/usage-profiles-4-edit.png)

**To edit a usage profile (AWS CLI)**
+ Enter the following command, where config.json uses the same `--configuration` file syntax as the create command, defining parameter values for interactive sessions (`SessionConfiguration`) and jobs (`JobConfiguration`).

  ```
  aws glue update-usage-profile --name profile-name --configuration file://config.json
  ```

## Assigning a usage profile
<a name="w2aac15c15c19b7"></a>

The **Utilization status** column in the **Usage profiles** page shows whether a usage profile is assigned to users. Hovering over the status shows the assigned IAM entities.

The admin can assign an AWS Glue usage profile to users/roles who create AWS Glue resources. Assigning a profile is a combination of two actions:
+ Updating the IAM user/role tag with the `glue:UsageProfile` key, then
+ Updating the IAM policy of the user/role.

For users who use AWS Glue Studio to create jobs/interactive sessions, the admin tags the following roles:
+ For restrictions on jobs, the admin tags the logged-in console role
+ For restrictions on interactive sessions, the admin tags the role the user provides when they create the notebook

The following is an example policy that the admin needs to attach to the IAM users/roles who create AWS Glue resources:

```
{
    "Effect": "Allow",
    "Action": [
        "glue:GetUsageProfile"
    ],
    "Resource": [
        "arn:aws:glue:us-east-1:123456789012:usageProfile/foo"
    ]
}
```

AWS Glue validates job, job run, and session requests based on the values specified in the AWS Glue usage profile and raises an exception if the request is disallowed. For synchronous APIs, an error will be thrown to the user. For asynchronous paths, a failed job run is created with the error message that the input parameter is outside of the allowed range for the assigned profile of the user/role.

To assign a usage profile to a user/role:

1. Open the AWS Identity and Access Management (IAM) console.

1. In the left navigation, choose **Users** or **Roles**.

1. Choose a user or role.

1. Choose the **Tags** tab.

1. Choose **Add new tag**.

1. Add a tag with the **Key** of `glue:UsageProfile` and the **Value** of the name of your usage profile.

1. Choose **Save changes**.  
![\[An example of adding a tag to an IAM role.\]](http://docs.aws.amazon.com/glue/latest/dg/images/usage-profiles-iam-role-tagged.png)
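The same assignment can be scripted; here is a hedged sketch with placeholder role and profile names (`foo` matches the example policy above). The IAM call is shown commented because it requires credentials.

```python
# Hedged sketch: the two pieces of assigning a usage profile to a role --
# tagging the role with glue:UsageProfile, and granting read access to the
# profile. Role name and Region/account values are placeholders.
tags = [{"Key": "glue:UsageProfile", "Value": "foo"}]

profile_access_statement = {
    "Effect": "Allow",
    "Action": ["glue:GetUsageProfile"],
    "Resource": ["arn:aws:glue:us-east-1:123456789012:usageProfile/foo"],
}

# Actual tagging call (requires iam:TagRole):
# import boto3
# boto3.client("iam").tag_role(RoleName="MyGlueJobRole", Tags=tags)
```

After both steps, jobs and sessions created with that role are validated against the `foo` profile's limits.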

## Viewing your assigned usage profile
<a name="w2aac15c15c19b9"></a>

Users can view their assigned usage profiles and use them when making API calls to create AWS Glue job and session resources, or starting a job.

Profile permissions are provided in IAM policies. As long as the caller's policy has the `glue:GetUsageProfile` permission, a user can see the profile. Otherwise, they will get an access denied error.

To view an assigned usage profile:

1. In the left navigation menu, choose **Cost management**.

1. Choose a usage profile that you have permissions to view.

![\[An example of a user viewing their assigned usage profile in AWS Glue.\]](http://docs.aws.amazon.com/glue/latest/dg/images/usage-profiles-3-view.png)


# Usage profiles and jobs
<a name="start-usage-profiles-jobs"></a>

## Authoring jobs with usage profiles
<a name="w2aac15c15c21b3"></a>

While authoring jobs, the limits and defaults set in your usage profile will apply. Your profile will be assigned to the job upon save.

## Running jobs with usage profiles
<a name="w2aac15c15c21b5"></a>

When you start a job run, AWS Glue enforces the limits set in the caller's profile. If there is no direct caller, AWS Glue applies the limits from the profile assigned to the job by its author.

**Note**  
When a job is run on a schedule (by AWS Glue workflows or AWS Glue triggers), the profile assigned to the job by its author will apply.  
When a job is run by an external service (Step Functions, Amazon MWAA) or the `StartJobRun` API, the caller's profile limits will be enforced.

For AWS Glue workflows or AWS Glue triggers: pre-existing jobs need to be updated to save the new profile name so that profile's limits (min, max, and allowed workers) will be enforced at runtime for scheduled runs.

## Viewing a usage profile assigned for jobs
<a name="w2aac15c15c21b7"></a>

To view the profile assigned to your jobs (the one used at runtime for scheduled AWS Glue workflows or AWS Glue triggers), look at the job **Details** tab. You can also see the profile used in past runs on the job runs details tab.

## Updating or deleting a usage profile attached to a job
<a name="w2aac15c15c21b9"></a>

The profile assigned to a job is changed upon update. If the author isn't assigned a usage profile, any profile previously attached to the job will be removed from it.

# Getting started with the AWS Glue Data Catalog
<a name="start-data-catalog"></a>

 The AWS Glue Data Catalog is your persistent technical metadata store. It is a managed service that you can use to store, annotate, and share metadata in the AWS Cloud. For more information, see [AWS Glue Data Catalog](https://docs.aws.amazon.com/glue/latest/dg/components-overview.html#data-catalog-intro). 


**Note**  
The AWS Glue console and some user interfaces were recently updated.

## Overview
<a name="start-data-catalog-overview"></a>

 You can use this tutorial to create your first AWS Glue Data Catalog, which uses an Amazon S3 bucket as your data source. 

 In this tutorial, you'll do the following using the AWS Glue console: 

1.  Create a database 

1.  Create a table 

1.  Use an Amazon S3 bucket as a data source 

 After completing these steps, you will have successfully used an Amazon S3 bucket as the data source to populate the AWS Glue Data Catalog. 

## Step 1: Create a database
<a name="start-data-catalog-database"></a>

 To get started, sign in to the AWS Management Console and open the [AWS Glue console](https://console.aws.amazon.com/glue). 

 **To create a database using the AWS Glue console:** 

1.  In the AWS Glue console, choose **Databases** under **Data catalog** from the left-hand menu. 

1.  Choose **Add database**. 

1.  On the **Create a database** page, enter a name for the database. In the **Location - *optional*** section, set the URI location for use by clients of the Data Catalog. If you don't know this, you can continue with creating the database. 

1.  (Optional) Enter a description for the database. 

1.  Choose **Create database**. 

 Congratulations, you've just set up your first database using the AWS Glue console. Your new database will appear in the list of available databases. You can edit the database by choosing the database's name from the **Databases** dashboard. 

 **Next steps** 

 **Other ways to create a database:** 

 You just created a database using the AWS Glue console, but there are other ways to create a database: 
+ You can use crawlers to create a database and tables for you automatically. To set up a database using crawlers, see [Working with Crawlers in the AWS Glue Console](https://docs.aws.amazon.com/glue/latest/dg/console-crawlers.html). 
+  You can use CloudFormation templates. See [Creating AWS Glue Resources Using AWS Glue Data Catalog Templates](https://docs.aws.amazon.com/glue/latest/dg/populate-with-cloudformation-templates.html). 
+  You can also create a database using the AWS Glue Database API operations. 

   To create a database using the `create` operation, structure the request by including the `DatabaseInput` (required) parameters. 

   For example:   

  ```
  aws glue create-database --database-input "{\"Name\":\"clidb\"}"                                              
  ```

  ```
  import boto3
  glueClient = boto3.client('glue')
  
  response = glueClient.create_database(
      DatabaseInput={
          'Name': 'boto3db'
      }
  )
  ```

 For more information about the Database API data types, structure, and operations, see [Database API](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-catalog-databases.html). 

 **Next Steps** 

 In the next section, you'll create a table and add that table to your database. 

You can also explore the settings and permissions for your Data Catalog. See [Working with Data Catalog Settings in the AWS Glue Console](https://docs.aws.amazon.com/glue/latest/dg/console-data-catalog-settings.html). 

## Step 2. Create a table
<a name="start-data-catalog-table"></a>

 In this step, you create a table using the AWS Glue console. 

1.  In the AWS Glue console, choose **Tables** in the left-hand menu. 

1.  Choose **Add table**. 

1.  Set your table's properties by entering a name for your table in **Table details**. 

1.  In the **Databases** section, choose the database that you created in Step 1 from the drop-down menu. 

1.  In the **Add a data store** section, **S3** will be selected by default as the type of source. 

1.  For **Data is located in**, choose **Specified path in another account**. 

1. Copy and paste the following path into the **Include path** input field:

   `s3://crawler-public-us-west-2/flight/2016/csv/`

1.  In the **Data format** section, for **Classification**, choose **CSV**, and for **Delimiter**, choose **comma (,)**. Choose **Next**. 

1. You are asked to define a schema. A schema defines the structure and format of a data record. Choose **Add column**. (For more information, see [Schema registries](https://docs.aws.amazon.com/glue/latest/dg/schema-registry.html#schema-registry-schemas).)

1.  Specify the column properties: 

   1. Enter a column name. 

   1. For **Column type**, 'string' is already selected by default.

   1. For **Column number**, '1' is already selected by default.

   1. Choose **Add**.

1.  You are asked to add partition indexes. This is optional. To skip this step, choose **Next**. 

1.  A summary of the table properties is displayed. If everything looks as expected, choose **Create**. Otherwise, choose **Back** and make edits as needed. 

 Congratulations, you've successfully created a table manually and associated it to a database. Your newly created table will appear in the Tables dashboard. From the dashboard, you can modify and manage all your tables. 

 For more information, see [Working with Tables in the AWS Glue Console](https://docs.aws.amazon.com/glue/latest/dg/console-tables.html). 

## Next steps
<a name="start-data-catalog-next-steps"></a>


 Now that the Data Catalog is populated, you can begin authoring jobs in AWS Glue. See [Building visual ETL jobs with AWS Glue Studio](https://docs.aws.amazon.com/glue/latest/dg/author-job-glue.html). 

 In addition to using the console, there are other ways to define tables in the Data Catalog including:
+  [Creating and running a crawler](https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html) 
+  [Adding classifiers to a crawler in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html) 
+  [Using the AWS Glue Table API](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-catalog-tables.html) 
+  [Using the AWS Glue Data Catalog template](https://docs.aws.amazon.com/glue/latest/dg/populate-with-cloudformation-templates.html) 
+  [Migrating an Apache Hive metastore](https://github.com/aws-samples/aws-glue-samples/tree/master/utilities/Hive_metastore_migration) 
+  [Using the AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/glue/create-table.html), Boto3, or data definition language (DDL)   
****  
 The following are examples of how you can use the CLI, Boto3, or DDL to define a table based on the same flight data CSV file from the S3 bucket that you used in the tutorial.   
 See the [create-table](https://docs.aws.amazon.com/cli/latest/reference/glue/create-table.html) documentation for how to structure the AWS CLI command. The CLI example contains the JSON syntax for the `aws glue create-table --table-input` value.   

  ```
  {
          "Name": "flights_data_cli",
          "StorageDescriptor": {
              "Columns": [
                  {
                      "Name": "year",
                      "Type": "bigint"
                  },
                  {
                      "Name": "quarter",
                      "Type": "bigint"
                  }
              ],
              "Location": "s3://crawler-public-us-west-2/flight/2016/csv",
              "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
              "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
              "Compressed": false,
              "NumberOfBuckets": -1,
              "SerdeInfo": {
                  "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                  "Parameters": {
                      "field.delim": ",",
                      "serialization.format": ","
                  }
              }
          },
          "PartitionKeys": [
              {
                  "Name": "mon",
                  "Type": "string"
              }
          ],
          "TableType": "EXTERNAL_TABLE",
          "Parameters": {
              "EXTERNAL": "TRUE",
              "classification": "csv",
              "columnsOrdered": "true",
              "compressionType": "none",
              "delimiter": ",",
              "skip.header.line.count": "1",
              "typeOfData": "file"
          }
      }
  ```

  ```
  import boto3
  
  glue_client = boto3.client("glue")
  
  response = glue_client.create_table(
      DatabaseName='sampledb',
      TableInput={
          'Name': 'flights_data_manual',
      'StorageDescriptor': {
        'Columns': [{
          'Name': 'year',
          'Type': 'bigint'
        }, {
          'Name': 'quarter',
          'Type': 'bigint'
        }],
        'Location': 's3://crawler-public-us-west-2/flight/2016/csv',
        'InputFormat': 'org.apache.hadoop.mapred.TextInputFormat',
        'OutputFormat': 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat',
        'Compressed': False,
        'NumberOfBuckets': -1,
        'SerdeInfo': {
          'SerializationLibrary': 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe',
          'Parameters': {
            'field.delim': ',',
            'serialization.format': ','
          }
        },
      },
      'PartitionKeys': [{
        'Name': 'mon',
        'Type': 'string'
      }],
      'TableType': 'EXTERNAL_TABLE',
      'Parameters': {
        'EXTERNAL': 'TRUE',
        'classification': 'csv',
        'columnsOrdered': 'true',
        'compressionType': 'none',
        'delimiter': ',',
        'skip.header.line.count': '1',
        'typeOfData': 'file'
      }
      }
  )
  ```

  ```
  CREATE EXTERNAL TABLE `sampledb`.`flights_data` (
    `year` bigint, 
    `quarter` bigint)
  PARTITIONED BY ( 
    `mon` string)
  ROW FORMAT DELIMITED 
    FIELDS TERMINATED BY ',' 
  STORED AS INPUTFORMAT 
    'org.apache.hadoop.mapred.TextInputFormat' 
  OUTPUTFORMAT 
    'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
  LOCATION
    's3://crawler-public-us-west-2/flight/2016/csv/'
  TBLPROPERTIES (
    'classification'='csv', 
    'columnsOrdered'='true', 
    'compressionType'='none', 
    'delimiter'=',', 
    'skip.header.line.count'='1', 
    'typeOfData'='file')
  ```

# Setting up network access to data stores
<a name="start-connecting"></a>

To run your extract, transform, and load (ETL) jobs, AWS Glue must be able to access your data stores. If a job doesn't need to run in your virtual private cloud (VPC) subnet—for example, transforming data from Amazon S3 to Amazon S3—no additional configuration is needed.

If a job needs to run in your VPC subnet—for example, transforming data from a JDBC data store in a private subnet—AWS Glue sets up [elastic network interfaces](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_ElasticNetworkInterfaces.html) that enable your jobs to connect securely to other resources within your VPC. Each elastic network interface is assigned a private IP address from the IP address range within the subnet you specified. No public IP addresses are assigned. Security groups specified in the AWS Glue connection are applied on each of the elastic network interfaces. For more information, see [Setting up Amazon VPC for JDBC connections to Amazon RDS data stores from AWS Glue](setup-vpc-for-glue-access.md). 

All JDBC data stores that are accessed by the job must be available from the VPC subnet. To access Amazon S3 from within your VPC, a [VPC endpoint](vpc-endpoints-s3.md) is required. If your job needs to access both VPC resources and the public internet, your VPC needs a network address translation (NAT) gateway.
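For illustration, here is a hedged sketch of the parameters for an S3 gateway VPC endpoint. The VPC and route table IDs are placeholders, and the service name is Region-specific; the EC2 API call is shown commented because it requires credentials.

```python
# Hedged sketch: parameters for an S3 gateway endpoint so that jobs in the
# VPC can reach Amazon S3 without traversing the public internet. All IDs
# are placeholders -- substitute your own VPC, Region, and route tables.
endpoint_params = {
    "VpcId": "vpc-0123456789abcdef0",
    "ServiceName": "com.amazonaws.us-east-1.s3",  # match your Region
    "VpcEndpointType": "Gateway",
    "RouteTableIds": ["rtb-0123456789abcdef0"],
}

# Actual call (requires ec2:CreateVpcEndpoint):
# import boto3
# boto3.client("ec2").create_vpc_endpoint(**endpoint_params)
```

A gateway endpoint adds a route for S3 traffic to the listed route tables; jobs that also need the public internet still require the NAT gateway described above.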

 A job or development endpoint can only access one VPC (and subnet) at a time. If you need to access data stores in different VPCs, you have the following options: 
+ Use VPC peering to access the data stores. For more information about VPC peering, see [VPC Peering Basics](https://docs.aws.amazon.com/vpc/latest/peering/vpc-peering-basics.html).
+ Use an Amazon S3 bucket as an intermediary storage location. Split the work into two jobs, with the Amazon S3 output of job 1 as the input to job 2.

For details on how to connect to an Amazon Redshift data store using Amazon VPC, see [Configuring Redshift connections](aws-glue-programming-etl-connect-redshift-home.md#aws-glue-programming-etl-connect-redshift-configure).

For details on how to connect to Amazon RDS data stores using Amazon VPC, see [Setting up Amazon VPC for JDBC connections to Amazon RDS data stores from AWS Glue](setup-vpc-for-glue-access.md).

After the necessary rules are set up in Amazon VPC, you create a connection in AWS Glue with the properties needed to connect to your data stores. For more information about connections, see [Connecting to data](glue-connections.md).
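A minimal sketch of creating such a connection with boto3 follows; the JDBC URL, subnet, security group, Availability Zone, and credentials are placeholders.

```python
def build_connection_input(name, jdbc_url, username, password,
                           subnet_id, security_group_ids, availability_zone):
    """Assemble a ConnectionInput structure for the glue CreateConnection API.

    PhysicalConnectionRequirements ties the connection to your VPC subnet and
    security groups, so AWS Glue can place its elastic network interfaces there.
    """
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": jdbc_url,
            "USERNAME": username,
            "PASSWORD": password,
        },
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": security_group_ids,
            "AvailabilityZone": availability_zone,
        },
    }

# With boto3 (not executed here):
#   glue = boto3.client("glue")
#   glue.create_connection(ConnectionInput=build_connection_input(
#       "my-jdbc-connection", "jdbc:postgresql://db.example.internal:5432/mydb",
#       "admin", "placeholder-password", "subnet-0abc1234", ["sg-0abc1234"],
#       "us-east-1a"))
```
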

**Note**  
Make sure you set up your DNS environment for AWS Glue. For more information, see [Setting up DNS in your VPC](set-up-vpc-dns.md). 

**Topics**
+ [Setting up a VPC to connect to PyPI for AWS Glue](setup-vpc-for-pypi.md)
+ [Setting up DNS in your VPC](set-up-vpc-dns.md)

# Setting up a VPC to connect to PyPI for AWS Glue
<a name="setup-vpc-for-pypi"></a>

The Python Package Index (PyPI) is a repository of software for the Python programming language. This topic addresses the details needed to support the use of pip installed packages (as specified by the session creator using the `--additional-python-modules` flag).

When you use AWS Glue interactive sessions with a connector, session traffic flows through your VPC via the subnet specified for the connector. Consequently, AWS services and other network destinations are not available unless you set up additional configuration.

You can resolve this issue in any of the following ways:
+ Use an internet gateway that is reachable from your session.
+ Set up an Amazon S3 bucket hosting a PyPI/simple repository that contains the transitive closure of a package set's dependencies.
+ Use a CodeArtifact repository that mirrors PyPI and is attached to your VPC.

## Setting up an internet gateway
<a name="setup-vpc-for-pypi-internet-gateway"></a>

The technical aspects are detailed in [NAT gateway use cases](https://docs.aws.amazon.com/vpc/latest/userguide/nat-gateway-scenarios.html). Using `--additional-python-modules` requires access to pypi.org, which is determined by the configuration of your VPC. Note the following requirements:

1. Installing additional Python modules with pip requires network access to pypi.org from the session. If the session uses a connector, your network configuration may be affected.

1. When a connector is used with `--additional-python-modules`, the subnet associated with the connector's `PhysicalConnectionRequirements` must provide a network path to pypi.org when the session starts.

1. Verify that your configuration provides that network path.

## Setting up an Amazon S3 bucket to host a targeted PyPI/simple repo
<a name="setup-vpc-for-pypi-s3-bucket"></a>

This example sets up a PyPI mirror in Amazon S3 for a set of packages and their dependencies.

To set up the PyPI mirror for a set of packages:

```
# pip download all the dependencies
pip download -d s3pypi --only-binary :all: plotly ggplot
pip download -d s3pypi --platform manylinux_2_17_x86_64 --only-binary :all: psycopg2-binary
# create and upload the pypi/simple index and wheel files to the s3 bucket
s3pypi -b test-domain-name --put-root-index -v s3pypi/*
```

If you already have an existing artifact repository, it will have an index URL for pip that you can use in place of the Amazon S3 bucket URL shown in the following example.

To use the custom index-url, with some example packages:

```
%%configure
{
    "--additional-python-modules": "psycopg2_binary==2.9.5",
    "python-modules-installer-option": "--no-cache-dir --verbose --index-url https://test-domain-name.s3.amazonaws.com/ --trusted-host test-domain-name.s3.amazonaws.com"
}
```

## Setting up a CodeArtifact mirror of pypi attached to your VPC
<a name="setup-vpc-for-pypi-code-artifact"></a>

To set up a mirror:

1. Create a repository in the same region as the subnet used by the connector.

   Select **Public upstream repositories** and choose **pypi-store**.

1. Provide access to the repository from the VPC for the subnet.

1. Specify the correct `--index-url` using the `python-modules-installer-option`. 

   ```
   %%configure
   {
       "--additional-python-modules": "psycopg2_binary==2.9.5",
       "python-modules-installer-option": "--no-cache-dir --verbose --index-url https://test-domain-name.s3.amazonaws.com/ --trusted-host test-domain-name.s3.amazonaws.com"
   }
   ```

For more information, see [Use CodeArtifact from a VPC](https://docs.aws.amazon.com/codeartifact/latest/ug/use-codeartifact-from-vpc.html).

# Setting up DNS in your VPC
<a name="set-up-vpc-dns"></a>

Domain Name System (DNS) is a standard by which names used on the internet are resolved to their corresponding IP addresses. A DNS hostname uniquely names a computer and consists of a host name and a domain name. DNS servers resolve DNS hostnames to their corresponding IP addresses.

To set up DNS in your VPC, ensure that DNS hostnames and DNS resolution are both enabled in your VPC. The VPC network attributes `enableDnsHostnames` and `enableDnsSupport` must be set to `true`. To view and modify these attributes, go to the VPC console at [https://console.aws.amazon.com/vpc/](https://console.aws.amazon.com/vpc/). 

For more information, see [Using DNS with your VPC](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-dns.html). Also, you can use the AWS CLI and call the [modify-vpc-attribute](https://docs.aws.amazon.com/cli/latest/reference/ec2/modify-vpc-attribute.html) command to configure the VPC network attributes.
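As a small illustration, you can check both attributes programmatically. The helper below operates on response dictionaries shaped like the EC2 `describe-vpc-attribute` output; the VPC ID in the comments is a placeholder.

```python
def dns_attributes_ok(dns_support_resp, dns_hostnames_resp):
    """Return True only if both enableDnsSupport and enableDnsHostnames
    are enabled, based on describe-vpc-attribute style responses such as
    {"EnableDnsSupport": {"Value": True}}."""
    support = dns_support_resp.get("EnableDnsSupport", {}).get("Value", False)
    hostnames = dns_hostnames_resp.get("EnableDnsHostnames", {}).get("Value", False)
    return support and hostnames

# With boto3 (not executed here), the responses would come from:
#   ec2 = boto3.client("ec2")
#   s = ec2.describe_vpc_attribute(VpcId="vpc-0abc1234", Attribute="enableDnsSupport")
#   h = ec2.describe_vpc_attribute(VpcId="vpc-0abc1234", Attribute="enableDnsHostnames")
#   assert dns_attributes_ok(s, h)
```
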

**Note**  
If you are using Route 53, confirm that your configuration does not override DNS network attributes.

# Setting up encryption in AWS Glue
<a name="set-up-encryption"></a>

The following example workflow highlights the options to configure when you use encryption with AWS Glue. The example demonstrates the use of specific AWS Key Management Service (AWS KMS) keys, but you might choose other settings based on your particular needs. This workflow highlights only the options that pertain to encryption when setting up AWS Glue. 

1. If the user of the AWS Glue console doesn't use a permissions policy that allows all AWS Glue API operations (for example, `"glue:*"`), confirm that the following actions are allowed:
   + `"glue:GetDataCatalogEncryptionSettings"`
   + `"glue:PutDataCatalogEncryptionSettings"`
   + `"glue:CreateSecurityConfiguration"`
   + `"glue:GetSecurityConfiguration"`
   + `"glue:GetSecurityConfigurations"`
   + `"glue:DeleteSecurityConfiguration"`

1. Any client that accesses or writes to an encrypted catalog—that is, any console user, crawler, job, or development endpoint—needs AWS KMS permissions on the key.

1. Any user or role that accesses an encrypted connection password needs AWS KMS permissions on the key used to encrypt passwords.

1. The role of any extract, transform, and load (ETL) job that writes encrypted data to Amazon S3 needs the following permissions.

------
#### [ JSON ]


   ```
   {
     "Version":"2012-10-17",		 	 	 
     "Statement": {
       "Effect": "Allow",
       "Action": [
         "kms:Decrypt",
         "kms:Encrypt",
         "kms:GenerateDataKey"
       ],
       "Resource": "arn:aws:kms:us-east-1:111122223333:key/key-id"
     }
   }
   ```

------

1. Any ETL job or crawler that writes encrypted Amazon CloudWatch Logs requires the following permissions in the key and IAM policies.

   In the key policy (not the IAM policy):

   ```
   {
     "Effect": "Allow",
     "Principal": {
       "Service": "logs.region.amazonaws.com"
     },
     "Action": [
       "kms:Encrypt*",
       "kms:Decrypt*",
       "kms:ReEncrypt*",
       "kms:GenerateDataKey*",
       "kms:Describe*"
     ],
     "Resource": "<arn of key used for ETL/crawler cloudwatch encryption>"
   }
   ```

   For more information about key policies, see [Using Key Policies in AWS KMS](https://docs.aws.amazon.com/kms/latest/developerguide/key-policies.html) in the *AWS Key Management Service Developer Guide*.

   In the IAM policy (attached to the job or crawler role), allow the `logs:AssociateKmsKey` action. Unlike the key policy, an identity-based IAM policy does not include a `Principal` element:

   ```
   {
     "Effect": "Allow",
     "Action": [
       "logs:AssociateKmsKey"
     ],
     "Resource": "<arn of the CloudWatch Logs log group>"
   }
   ```

1. Any ETL job that uses an encrypted job bookmark needs the following permissions.

------
#### [ JSON ]


   ```
   {
     "Version":"2012-10-17",		 	 	 
     "Statement": {
       "Effect": "Allow",
       "Action": [
         "kms:Decrypt",
         "kms:Encrypt"
       ],
       "Resource": "arn:aws:kms:us-east-1:111122223333:key/*"
     }
   }
   ```

------

1. On the AWS Glue console, choose **Settings** in the navigation pane.

   1. On the **Data catalog settings** page, encrypt your Data Catalog by selecting **Metadata encryption**. This option encrypts all the objects in the Data Catalog with the AWS KMS key that you choose.

   1. For **AWS KMS key**, choose **aws/glue**. You can also choose an AWS KMS key that you created.
**Important**  
AWS Glue supports only symmetric customer master keys (CMKs). The **AWS KMS key** list displays only symmetric keys. However, if you select **Choose an AWS KMS key ARN**, the console lets you enter an ARN for any key type. Ensure that you enter only ARNs for symmetric keys.

   When encryption is enabled, the client that is accessing the Data Catalog must have AWS KMS permissions. 

1. In the navigation pane, choose **Security configurations**. A security configuration is a set of security properties that can be used to configure AWS Glue processes. Then choose **Add security configuration**. In the configuration, choose any of the following options: 

   1. Select **S3 encryption**. For **Encryption mode**, choose **SSE-KMS**. For the **AWS KMS key**, choose **aws/s3** (ensure that the user has permission to use this key). This encrypts data that the job writes to Amazon S3 with the AWS managed key for Amazon S3.

   1. Select **CloudWatch logs encryption**, and choose a CMK (ensure that the user has permission to use this key). For more information, see [Encrypt Log Data in CloudWatch Logs Using AWS KMS](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/encrypt-log-data-kms.html) in the *Amazon CloudWatch Logs User Guide*.
**Important**  
AWS Glue supports only symmetric customer master keys (CMKs). The **AWS KMS key** list displays only symmetric keys. However, if you select **Choose an AWS KMS key ARN**, the console lets you enter an ARN for any key type. Ensure that you enter only ARNs for symmetric keys.

   1. Choose **Advanced properties**, and select **Job bookmark encryption**. For the **AWS KMS key**, choose **aws/glue** (ensure that the user has permission to use this key). This enables encryption of job bookmarks written to Amazon S3 with the AWS Glue AWS KMS key.

1. In the navigation pane, choose **Connections**.

   1. Choose **Add connection** to create a connection to the Java Database Connectivity (JDBC) data store that is the target of your ETL job.

   1. To enforce that Secure Sockets Layer (SSL) encryption is used, select **Require SSL connection**, and test your connection.

1. In the navigation pane, choose **Jobs**. 

   1. Choose **Add job** to create a job that transforms data. 

   1. In the job definition, choose the security configuration that you created. 

1. On the AWS Glue console, run your job on demand. Verify that any Amazon S3 data written by the job, the CloudWatch Logs written by the job, and the job bookmarks are all encrypted.
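One way to spot-check the Amazon S3 output is to inspect each object's `HeadObject` response; the following sketch wraps that check in a helper (the bucket and key in the comments are placeholders).

```python
def is_kms_encrypted(head_object_response):
    """Return True if an S3 HeadObject response reports SSE-KMS encryption.

    The ServerSideEncryption field is "aws:kms" for KMS-encrypted objects
    and "AES256" for SSE-S3.
    """
    return head_object_response.get("ServerSideEncryption") == "aws:kms"

# With boto3 (not executed here):
#   s3 = boto3.client("s3")
#   resp = s3.head_object(Bucket="my-output-bucket", Key="output/part-00000")
#   assert is_kms_encrypted(resp)
```
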

# Setting up networking for development for AWS Glue
<a name="start-development-endpoint"></a>

To run your extract, transform, and load (ETL) scripts with AWS Glue, you can develop and test your scripts using a *development endpoint*. Development endpoints are not supported for use with AWS Glue version 2.0 jobs. For versions 2.0 and later, the preferred development method is using Jupyter Notebook with one of the AWS Glue kernels. For more information, see [Getting started with AWS Glue interactive sessions](interactive-sessions.md).

## Setting up your network for a development endpoint
<a name="setup-vpc-for-development-endpoint"></a>

When you set up a development endpoint, you specify a virtual private cloud (VPC), subnet, and security groups.

**Note**  
Make sure you set up your DNS environment for AWS Glue. For more information, see [Setting up DNS in your VPC](set-up-vpc-dns.md). 

To enable AWS Glue to access required resources, add a row to your subnet route table that associates the prefix list for Amazon S3 with the VPC endpoint. A prefix list ID is required to create an outbound security group rule that allows traffic from the VPC to access an AWS service through a VPC endpoint. To make it easier to connect from your local machine to a notebook server associated with this development endpoint, also add a row to the route table for an internet gateway ID. For more information, see [VPC Endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html). Update the subnet route table to be similar to the following table: 


****  

| Destination | Target | 
| --- | --- | 
| 10.0.0.0/16 | local | 
| pl-id for Amazon S3 | vpce-id | 
| 0.0.0.0/0 | igw-xxxx | 
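The route table above can be validated programmatically; this sketch checks a subnet's routes using the shape of the EC2 `describe-route-tables` response (the prefix list ID is a placeholder).

```python
def has_required_routes(routes, s3_prefix_list_id):
    """Check a subnet's route list for the Amazon S3 prefix-list route via a
    VPC endpoint and a default route to an internet gateway.

    `routes` mirrors the Routes field of an ec2 describe-route-tables entry.
    """
    s3_via_endpoint = any(
        r.get("DestinationPrefixListId") == s3_prefix_list_id
        and r.get("GatewayId", "").startswith("vpce-")
        for r in routes)
    default_via_igw = any(
        r.get("DestinationCidrBlock") == "0.0.0.0/0"
        and r.get("GatewayId", "").startswith("igw-")
        for r in routes)
    return s3_via_endpoint and default_via_igw

# With boto3 (not executed here):
#   ec2 = boto3.client("ec2")
#   tables = ec2.describe_route_tables(
#       Filters=[{"Name": "association.subnet-id", "Values": ["subnet-0abc1234"]}])
#   assert has_required_routes(tables["RouteTables"][0]["Routes"], "pl-68a54001")
```
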

 To enable AWS Glue to communicate between its components, specify a security group with a self-referencing inbound rule for all TCP ports. By creating a self-referencing rule, you can restrict the source to the same security group in the VPC, and it's not open to all networks. The default security group for your VPC might already have a self-referencing inbound rule for ALL Traffic. 

**To set up a security group**

1. Sign in to the AWS Management Console and open the Amazon EC2 console at [https://console.aws.amazon.com/ec2/](https://console.aws.amazon.com/ec2/).

1. In the left navigation pane, choose **Security Groups**.

1. Either choose an existing security group from the list, or choose **Create Security Group** to create one to use with the development endpoint. 

1. In the security group pane, navigate to the **Inbound** tab.

1. Add a self-referencing rule to allow AWS Glue components to communicate. Specifically, add or confirm that there is a rule with **Type** `All TCP`, **Protocol** `TCP`, a **Port Range** that includes all ports, and a **Source** that is the same security group name as the **Group ID**. 

   The inbound rule looks similar to this:

   | Type | Protocol | Port Range | Source |
   | --- | --- | --- | --- |
   | All TCP | TCP | ALL | The same security group (self-referencing) |

   The following shows an example of a self-referencing inbound rule:  
![\[Image showing an example of a self-referencing inbound rule.\]](http://docs.aws.amazon.com/glue/latest/dg/images/SetupSecurityGroup-Start.png)

1. Also add a rule for outbound traffic. Either open outbound traffic to all ports, or create a self-referencing rule with **Type** `All TCP`, **Protocol** `TCP`, a **Port Range** that includes all ports, and a **Source** that is the same security group name as the **Group ID**. 

   The outbound rule looks similar to one of these rules:

   | Type | Protocol | Port Range | Destination |
   | --- | --- | --- | --- |
   | All TCP | TCP | ALL | The same security group (self-referencing) |
   | All traffic | ALL | ALL | 0.0.0.0/0 |
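The self-referencing rule described in the steps above can also be applied programmatically; this sketch builds the `IpPermissions` structure used by the EC2 `authorize-security-group-ingress` and `authorize-security-group-egress` APIs (the group ID is a placeholder).

```python
def self_referencing_rule(security_group_id):
    """Build an IpPermissions entry that allows all TCP ports from (or to)
    the same security group, i.e. a self-referencing rule."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": 0,
        "ToPort": 65535,
        # Referencing the group's own ID restricts traffic to members of
        # this security group instead of opening it to all networks.
        "UserIdGroupPairs": [{"GroupId": security_group_id}],
    }]

# With boto3 (not executed here), apply it to both directions:
#   ec2 = boto3.client("ec2")
#   ec2.authorize_security_group_ingress(
#       GroupId="sg-0abc1234", IpPermissions=self_referencing_rule("sg-0abc1234"))
#   ec2.authorize_security_group_egress(
#       GroupId="sg-0abc1234", IpPermissions=self_referencing_rule("sg-0abc1234"))
```
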

## Setting up Amazon EC2 for a notebook server
<a name="setup-vpc-for-notebook-server"></a>

 With a development endpoint, you can create a notebook server to test your ETL scripts with Jupyter notebooks. To enable communication to your notebook, specify a security group with inbound rules for both HTTPS (port 443) and SSH (port 22). Ensure that the rule's source is either 0.0.0.0/0 or the IP address of the machine that is connecting to the notebook. 

**To set up a security group**

1. Sign in to the AWS Management Console and open the Amazon EC2 console at [https://console.aws.amazon.com/ec2/](https://console.aws.amazon.com/ec2/).

1. In the left navigation pane, choose **Security Groups**.

1. Either choose an existing security group from the list, or **Create Security Group** to use with your notebook server. The security group that is associated with your development endpoint is also used to create your notebook server.

1. In the security group pane, navigate to the **Inbound** tab.

1. Add inbound rules similar to this:

   | Type | Protocol | Port Range | Source |
   | --- | --- | --- | --- |
   | HTTPS | TCP | 443 | 0.0.0.0/0 or the IP address of your connecting machine |
   | SSH | TCP | 22 | 0.0.0.0/0 or the IP address of your connecting machine |

   The following shows an example of the inbound rules for the security group:  
![\[Image showing an example of the inbound rules for the security group.\]](http://docs.aws.amazon.com/glue/latest/dg/images/SetupSecurityGroupNotebook-Start.png)
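The HTTPS and SSH inbound rules described above can be sketched as an `IpPermissions` structure for the EC2 API; narrow the source CIDR to your own IP address where possible (the example CIDR is a placeholder).

```python
def notebook_inbound_rules(source_cidr="0.0.0.0/0"):
    """Build IpPermissions entries allowing HTTPS (443) and SSH (22) access
    to a notebook server from the given source CIDR."""
    return [
        {"IpProtocol": "tcp", "FromPort": port, "ToPort": port,
         "IpRanges": [{"CidrIp": source_cidr}]}
        for port in (443, 22)
    ]

# With boto3 (not executed here):
#   ec2 = boto3.client("ec2")
#   ec2.authorize_security_group_ingress(
#       GroupId="sg-0abc1234",
#       IpPermissions=notebook_inbound_rules("203.0.113.5/32"))
```
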