

# Connecting to Vertica in AWS Glue Studio
<a name="connecting-to-data-vertica"></a>

 AWS Glue provides built-in support for Vertica. AWS Glue Studio provides a visual interface to connect to Vertica, author data integration jobs, and run them on the AWS Glue Studio serverless Spark runtime. 

 AWS Glue Studio creates a unified connection for Vertica. For more information, see [Considerations](using-connectors-unified-connections.md#using-connectors-unified-connections-considerations). 

**Topics**
+ [Creating a Vertica connection](creating-vertica-connection.md)
+ [Creating a Vertica source node](creating-vertica-source-node.md)
+ [Creating a Vertica target node](creating-vertica-target-node.md)
+ [Advanced options](#creating-vertica-connection-advanced-options)

# Creating a Vertica connection
<a name="creating-vertica-connection"></a>

**Prerequisites**:
+ An Amazon S3 bucket or folder to use for temporary storage when reading from and writing to the database, referred to by *tempS3Path*.
**Note**  
When using Vertica in AWS Glue job data previews, temporary files may not be automatically removed from *tempS3Path*. To ensure the removal of temporary files, directly end the data preview session by choosing **End session** in the **Data preview** pane.  
If you cannot guarantee that the data preview session is ended directly, consider setting up an Amazon S3 Lifecycle configuration to remove old data. We recommend removing data older than 49 hours, based on the maximum job runtime plus a margin. For more information about configuring Amazon S3 Lifecycle, see [Managing your storage lifecycle](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html) in the Amazon S3 documentation.
+ An IAM policy that grants appropriate permissions to your Amazon S3 path and that you can associate with your AWS Glue job role.
+ If your Vertica instance is in an Amazon VPC, configure Amazon VPC to allow your AWS Glue job to communicate with the Vertica instance without traffic traversing the public internet. 

  In Amazon VPC, identify or create a **VPC**, **Subnet**, and **Security group** that AWS Glue will use while running the job. You must also ensure that Amazon VPC is configured to permit network traffic between your Vertica instance and this location. Your job needs to establish a TCP connection to your Vertica client port (5433 by default). Depending on your network layout, this may require changes to security group rules, network ACLs, NAT gateways, and peering connections.
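
The 49-hour retention recommended in the note above can be expressed as an Amazon S3 Lifecycle rule. The following is a minimal sketch, not a definitive configuration. It assumes that Lifecycle expiration is specified in whole days, so the 49 hours are rounded up. The function name, the rule ID, and the `glue-temp/vertica/` prefix are hypothetical placeholders for your own *tempS3Path*.

```python
import math


def temp_path_lifecycle_config(prefix: str, max_age_hours: int = 49) -> dict:
    """Build a Lifecycle configuration that expires objects under `prefix`."""
    # Lifecycle expiration takes whole days, so round the hour-based
    # retention up to the next full day.
    days = math.ceil(max_age_hours / 24)
    return {
        "Rules": [
            {
                "ID": "expire-glue-vertica-temp-files",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


# Hypothetical prefix inside the bucket that holds tempS3Path.
lifecycle_config = temp_path_lifecycle_config("glue-temp/vertica/")

# To apply the rule (requires boto3 and AWS credentials; the bucket name
# is a placeholder):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="amzn-s3-demo-bucket",
#       LifecycleConfiguration=lifecycle_config,
#   )
```

Scoping the rule to a prefix keeps it from expiring other objects in the same bucket.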

**To configure a connection to Vertica:**

1. In AWS Secrets Manager, create a secret using your Vertica credentials, *verticaUsername* and *verticaPassword*. To create a secret in Secrets Manager, follow the tutorial available in [Create an AWS Secrets Manager secret](https://docs.aws.amazon.com//secretsmanager/latest/userguide/create_secret.html) in the AWS Secrets Manager documentation. After creating the secret, keep the secret name, *secretName*, for the next step. 
   + When selecting **Key/value pairs**, create a pair for the key `user` with the value *verticaUsername*.
   + When selecting **Key/value pairs**, create a pair for the key `password` with the value *verticaPassword*.

1. In the AWS Glue console, create a connection by following the steps in [Adding an AWS Glue connection](console-connections.md). After creating the connection, keep the connection name, *connectionName*, for the next step. 
   + When selecting a **Connection type**, select Vertica.
   + When selecting **Vertica Host**, provide the hostname of your Vertica installation.
   + When selecting **Vertica Port**, provide the port your Vertica installation is available through.
   + When selecting an **AWS Secret**, provide *secretName*.

1. In the following situations, you may require additional configuration:
   + For Vertica instances hosted on AWS in an Amazon VPC:
     + Provide Amazon VPC connection information to the AWS Glue connection that defines your Vertica security credentials. When creating or updating your connection, set **VPC**, **Subnet** and **Security groups** in **Network options**.
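
If you prefer to create the secret from step 1 with a script rather than the console, the key/value pairs can be serialized as a JSON secret string. This is a minimal sketch: the keys `user` and `password` come from the steps above, while the function name and the example credentials are hypothetical.

```python
import json


def build_vertica_secret_string(vertica_username: str, vertica_password: str) -> str:
    """Serialize the Vertica credentials as the secret's key/value pairs."""
    # `user` and `password` are the keys named in the console steps above.
    return json.dumps({"user": vertica_username, "password": vertica_password})


# Example values only; do not hard-code real credentials in scripts.
secret_string = build_vertica_secret_string("dbadmin", "example-password")

# To store the secret (requires boto3 and AWS credentials; "secretName"
# is the placeholder used in this topic):
#   import boto3
#   boto3.client("secretsmanager").create_secret(
#       Name="secretName",
#       SecretString=secret_string,
#   )
```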

You will need to perform the following steps before running your AWS Glue job:
+ Grant the IAM role associated with your AWS Glue job permissions to *tempS3Path*.
+ Grant the IAM role associated with your AWS Glue job permission to read *secretName*.
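
The two grants above could be combined into a single IAM policy document along the following lines. This is a sketch under stated assumptions: the exact set of Amazon S3 actions your job needs is an assumption and is not specified in this topic, and the function name, bucket, prefix, and secret ARN are hypothetical placeholders.

```python
def build_job_role_policy(bucket: str, prefix: str, secret_arn: str) -> dict:
    """Build a policy granting access to tempS3Path and read access to the secret."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Assumed object-level actions for reading, writing, and
                # cleaning up temporary files under tempS3Path.
                "Sid": "TempS3PathObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # Listing is a bucket-level action, restricted here to the prefix.
                "Sid": "TempS3PathList",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Lets the job read secretName at run time.
                "Sid": "ReadVerticaSecret",
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": secret_arn,
            },
        ],
    }


# Placeholder values for illustration only.
policy = build_job_role_policy(
    bucket="amzn-s3-demo-bucket",
    prefix="glue-temp/vertica/",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:secretName-AbCdEf",
)
```

You can serialize `policy` with `json.dumps` and attach it to the job role as an inline or managed policy.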

# Creating a Vertica source node
<a name="creating-vertica-source-node"></a>

## Prerequisites needed
<a name="creating-vertica-source-node-prerequisites"></a>
+ A Vertica type AWS Glue Data Catalog connection, *connectionName*, and a temporary Amazon S3 location, *tempS3Path*, as described in the previous section, [Creating a Vertica connection](creating-vertica-connection.md).
+ A Vertica table you would like to read from, *tableName*, or a query, *targetQuery*.

## Adding a Vertica data source
<a name="creating-vertica-source-node-add"></a>

**To add a **Data source – Vertica** node:**

1.  Choose the connection for your Vertica data source. If you have already created it, it should be available in the dropdown. If you need to create a connection, choose **Create Vertica connection**. For more information, see the previous section, [Creating a Vertica connection](creating-vertica-connection.md). 

    Once you have chosen a connection, you can view the connection properties by clicking **View properties**. 

1. Choose the **Database** containing your table.

1. For **Staging area in Amazon S3**, enter an S3A URI for *tempS3Path*.

1. Choose the **Vertica Source**.
   +  **Choose a single table** – access all data from a single table. 
   +  **Enter custom query** – access a dataset from multiple tables based on your custom query. 

1.  If you chose a single table, enter *tableName* and optionally select a **Schema**. 

    If you chose **Enter custom query**, enter a SQL SELECT query and optionally select a **Schema**. 

1.  In **Custom Vertica properties**, enter parameters and values as needed. 
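
The staging area in the steps above expects an S3A URI, that is, one that uses the `s3a://` scheme. If your *tempS3Path* is recorded as an `s3://` URI or as a bare `bucket/prefix` path, a small helper such as the following can normalize it. This is an illustrative sketch. The function name and example paths are hypothetical.

```python
def to_s3a_uri(temp_s3_path: str) -> str:
    """Return `temp_s3_path` rewritten to use the s3a:// scheme."""
    for scheme in ("s3a://", "s3n://", "s3://"):
        if temp_s3_path.startswith(scheme):
            # Swap the existing scheme for s3a:// and keep bucket and key as-is.
            return "s3a://" + temp_s3_path[len(scheme):]
    # No scheme present: treat the value as "bucket/prefix".
    return "s3a://" + temp_s3_path.lstrip("/")


# Placeholder bucket and prefix for illustration only.
staging_uri = to_s3a_uri("s3://amzn-s3-demo-bucket/glue-temp/vertica/")
```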

# Creating a Vertica target node
<a name="creating-vertica-target-node"></a>

## Prerequisites needed
<a name="creating-vertica-target-node-prerequisites"></a>
+ A Vertica type AWS Glue Data Catalog connection, *connectionName*, and a temporary Amazon S3 location, *tempS3Path*, as described in the previous section, [Creating a Vertica connection](creating-vertica-connection.md).

## Adding a Vertica data target
<a name="creating-vertica-target-node-add"></a>

**To add a **Data target – Vertica** node:**

1.  Choose the connection for your Vertica data target. If you have already created it, it should be available in the dropdown. If you need to create a connection, choose **Create Vertica connection**. For more information, see the previous section, [Creating a Vertica connection](creating-vertica-connection.md). 

    Once you have chosen a connection, you can view the connection properties by clicking **View properties**. 

1. Choose the **Database** containing your table.

1. For **Staging area in Amazon S3**, enter an S3A URI for *tempS3Path*.

1. Enter *tableName* and optionally select a **Schema**. 

1.  In **Custom Vertica properties**, enter parameters and values as needed. 

## Advanced options
<a name="creating-vertica-connection-advanced-options"></a>

You can provide advanced options when creating a Vertica node. These options are the same as those available when programming AWS Glue for Spark scripts.

See [Vertica connections](aws-glue-programming-etl-connect-vertica-home.md).
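
As a rough illustration of how the node settings in this topic might map to a Spark script, the following sketch assembles a connection options dictionary. The option keys (`connectionName`, `staging_fs_url`, `db`, `table`, `query`) are assumptions based on the Vertica Spark connector's option names and are not confirmed by this topic, so verify them against the Vertica connections page before use. The function name and example values are hypothetical.

```python
from typing import Optional


def vertica_connection_options(
    connection_name: str,
    staging_fs_url: str,
    db: str,
    table: Optional[str] = None,
    query: Optional[str] = None,
) -> dict:
    """Assemble connection options for a Vertica source or target (assumed keys)."""
    # Mirror the node settings: either a single table or a custom query.
    if (table is None) == (query is None):
        raise ValueError("Provide exactly one of `table` or `query`.")
    # The staging area is entered as an S3A URI.
    if not staging_fs_url.startswith("s3a://"):
        raise ValueError("`staging_fs_url` must be an s3a:// URI.")

    options = {
        "connectionName": connection_name,  # assumed key name
        "staging_fs_url": staging_fs_url,   # assumed key name
        "db": db,                           # assumed key name
    }
    if table is not None:
        options["table"] = table            # assumed key name
    else:
        options["query"] = query            # assumed key name
    return options


# Placeholder values for illustration only.
read_options = vertica_connection_options(
    connection_name="connectionName",
    staging_fs_url="s3a://amzn-s3-demo-bucket/glue-temp/vertica/",
    db="exampledb",
    table="tableName",
)

# Inside an AWS Glue job, options like these would typically be passed as:
#   glueContext.create_dynamic_frame.from_options(
#       connection_type="vertica",
#       connection_options=read_options,
#   )
```

A target node takes a table name only, so the `query` branch applies to sources.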