

# Connecting to Amazon Redshift in AWS Glue Studio
<a name="connecting-to-data-redshift"></a>

**Note**  
 You can use AWS Glue for Spark to read from and write to tables in Amazon Redshift databases outside of AWS Glue Studio. To configure Amazon Redshift with AWS Glue jobs programatically, see [Redshift connections](aws-glue-programming-etl-connect-redshift-home.md). 

 AWS Glue provides built-in support for Amazon Redshift. AWS Glue Studio provides a visual interface to connect to Amazon Redshift, author data integration jobs, and run them on AWS Glue Studio serverless Spark runtime. 

**Topics**
+ [Creating an Amazon Redshift connection](creating-redshift-connection.md)
+ [Creating a Amazon Redshift source node](creating-redshift-source-node.md)
+ [Creating an Amazon Redshift target node](creating-redshift-target-node.md)
+ [Advanced options](creating-redshift-connection-advanced-options.md)

# Creating an Amazon Redshift connection
<a name="creating-redshift-connection"></a>

## Permissions needed
<a name="creating-redshift-connection-permissions"></a>

 Additional permissions are need to use Amazon Redshift clusters and Amazon Redshift serverless environments. For more information on how to add permissions to ETL jobs, see [Review IAM permissions needed for ETL jobs](https://docs.aws.amazon.com/glue/latest/ug/setting-up.html#getting-started-min-privs-job). 
+  redshift:DescribeClusters 
+  redshift-serverless:ListWorkgroups 
+  redshift-serverless:ListNamespaces 

## Overview
<a name="w2aac25c27c13c11b5"></a>

 When adding an Amazon Redshift connection, you can choose an existing Amazon Redshift connection or create a new connection when adding a **Data source - Redshift** node in AWS Glue Studio. 

 AWS Glue supports both Amazon Redshift clusters and Amazon Redshift serverless environments. When you create a connection, Amazon Redshift serverless environments display the **serverless** label next to the connection option. 

 For more information on how to create a Amazon Redshift connection, see [ Moving data to and from Amazon Redshift](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-redshift.html#aws-glue-programming-etl-redshift-using). 

# Creating a Amazon Redshift source node
<a name="creating-redshift-source-node"></a>

## Permissions needed
<a name="creating-redshift-source-node-permissions"></a>

 AWS Glue Studio jobs using Amazon Redshift data sources require additional permissions. For more information on how to add permissions to ETL jobs, see [Review IAM permissions needed for ETL jobs](https://docs.aws.amazon.com/glue/latest/ug/setting-up.html#getting-started-min-privs-job). 

 The following permissions are needed in order to use an Amazon Redshift connection. 
+  redshift-data:ListSchemas 
+  redshift-data:ListTables 
+  redshift-data:DescribeTable 
+  redshift-data:ExecuteStatement 
+  redshift-data:DescribeStatement 
+  redshift-data:GetStatementResult 

## Adding an Amazon Redshift data source
<a name="creating-redshift-source-node-add"></a>

**To add a **Data Source – Amazon Redshift** node:**

1.  Choose the Amazon Redshift access type: 
   +  Direct data connection (recommended) – choose this option if you want to access your Amazon Redshift data directly. This is the recommended option and also the default. 
   +  Data Catalog tables – choose this option if you have Data Catalog tables that you want to use. 

1.  If you choose Direct data connection, choose the connection for your Amazon Redshift data source. This assumes that the connection already exists and you can select from existing connections. If you need to create a connection, choose **Create Redshift connection**. For more information, see [ Overview of using connectors and connections ](https://docs.aws.amazon.com/glue/latest/ug/connectors-chapter.html#using-connectors-overview). 

    Once you have chosen a connection, you can view the connection properties by clicking **View properties**. Information about the connection are visible, including URL, security groups, subnet, availability zone, description, and created (UTC) and last updated (UTC) timestamps. 

1.  Choose a Amazon Redshift source option: 
   +  **Choose a single table** – this is the table that contains the data you want to access from a single Amazon Redshift table. 
   +  **Enter custom query ** – allows you to access a dataset from multiple Amazon Redshift tables based on your custom query. 

1.  If you chose a single table, choose the Amazon Redshift schema. The list of available schema to choose from is determined by the selected table. 

    Or, choose **Enter custom query**. Choose this option to access a custom dataset from multiple Amazon Redshift tables. When you choose this option, enter the Amazon Redshift query. 

    When connecting to an Amazon Redshift serverless environment, add the following permission to the custom query: 

   ```
               GRANT SELECT ON ALL TABLES IN <schema> TO PUBLIC
   ```

    You can choose **Infer schema** to read the schema based on the query that you entered. You can also choose **Open Redshift query editor** to enter a Amazon Redshift query. For more information, see [ Querying a database using the query editor ](https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor.html). 

1.  In **Performance and security**, choose the Amazon S3 staging directory and IAM role. 
   +  **Amazon S3 staging directory** – choose the Amazon S3 location for temporarily staging data. 
   +  **IAM role** – choose the IAM role that can write to the Amazon S3 location you selected. 

1.  In **Custom Redshift paramters - optional**, enter the parameter and value. 

# Creating an Amazon Redshift target node
<a name="creating-redshift-target-node"></a>

## Permissions needed
<a name="creating-redshift-target-node-permissions"></a>

 AWS Glue Studio jobs using Amazon Redshift data target require additional permissions. For more information on how to add permissions to ETL jobs, see [Review IAM permissions needed for ETL jobs](https://docs.aws.amazon.com/glue/latest/ug/setting-up.html#getting-started-min-privs-job). 

 The following permissions are needed in order to use an Amazon Redshift connection. 
+  redshift-data:ListSchemas 
+  redshift-data:ListTables 

## Adding an Amazon Redshift target node
<a name="w2aac25c27c13c15b5"></a>

**To create a a Amazon Redshift target node:**

1.  Choose an existing Amazon Redshift table as the target, or enter a new table name. 

1.  When you use the **Data target - Redshift** target node, you can choose from the following options: 
   +  **APPEND** – If a table already exists, dump all the new data into the table as an insert. If the table doesn't exist, create it and then insert all new data. 

      Additionally, check the box if you want to update (UPSERT) existing records in the target table. The table must exist first, otherwise the operation will fail. 
   +  **MERGE** – AWS Glue will update or append data to your target table based on the conditions you specify. 
**Note**  
 To use the merge action in AWS Glue, you must enable Amazon Redshift merge functionality. For instructions on how to enable merge for your Amazon Redshift instance, see [MERGE (preview) ](https://docs.aws.amazon.com/redshift/latest/dg/r_MERGE.html). 

      Choose options: 
     + **Choose keys and simple actions** – choose the columns to be used as matching keys between the source data and your target data set. 

       Specify the following options when matched:
       + Update record in your target data set with data from source.
       + Delete record in your target data set.

       Specify the following options when not matched:
       + Insert source data as a new row into your target data set.
       + Do nothing.
     + **Enter custom MERGE statement** – You can then choose **Validate Merge statement** to verify that the statement is valid or invalid.
   +  **TRUNCATE** – If a table already exists, truncate the table data by first clearing the contents of the target table. If truncate is successful, then insert all data. If the table doesn't exist, create the table and insert all data. If truncate is not successful, the operation will fail. 
   +  **DROP** – If a table already exists, delete the table metadata and data. If deletion is successful, then insert all data. If the table doesn't exist, create the table and insert all data. If drop is not successful, the operation will fail. 
   +  **CREATE** – Create a new table with the default name. If table name already exist, create a new table with a name postfix of `job_datetime` to the name for uniqueness. This will insert all the data into the new table. If the table exists, the final table name will have the postfix appended. If the table doesn’t exist, a table will be created. In either case, a new table will be created. 

# Advanced options
<a name="creating-redshift-connection-advanced-options"></a>

 See [ Using the Amazon Redshift Spark connector on AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-redshift.html#aws-glue-programming-etl-redshift-using). 