

# Connecting to Google BigQuery in AWS Glue Studio
<a name="connecting-to-data-bigquery"></a>

**Note**  
  You can use AWS Glue for Spark to read from and write to tables in Google BigQuery in AWS Glue 4.0 and later versions. To configure Google BigQuery with AWS Glue jobs programmatically, see  [BigQuery connections](aws-glue-programming-etl-connect-bigquery-home.md).  

 AWS Glue Studio provides a visual interface to connect to BigQuery, author data integration jobs, and run them on the AWS Glue Studio serverless Spark runtime. 

 When creating a connection to Google BigQuery in AWS Glue Studio, a unified connection is created. For more information, see [Considerations](using-connectors-unified-connections.md#using-connectors-unified-connections-considerations). 

 Previously, you had to create a secret with the credentials in a specific format, `{"credentials": "base64 encoded JSON"}`. With a unified connection to Google BigQuery, you can instead create a secret that directly contains the JSON exported from Google BigQuery: `{"type": "service_account", ...}`. 
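To make the difference concrete, the following sketch shows that the two secret formats carry the same credentials document; the project ID is a placeholder, and the real file contains more fields:

```python
import base64
import json

# A service-account JSON as exported from Google Cloud (values are placeholders).
service_account_json = json.dumps(
    {"type": "service_account", "project_id": "my-project"}
)

# Legacy secret format: the JSON is base64 encoded and wrapped in a "credentials" key.
legacy_secret = {
    "credentials": base64.b64encode(service_account_json.encode("utf-8")).decode("utf-8")
}

# Unified connection secret format: the exported JSON is stored directly.
unified_secret = json.loads(service_account_json)

# Decoding the legacy secret yields the same credentials document.
decoded = json.loads(base64.b64decode(legacy_secret["credentials"]))
assert decoded == unified_secret
```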

**Topics**
+ [Creating a BigQuery connection](creating-bigquery-connection.md)
+ [Creating a BigQuery source node](creating-bigquery-source-node.md)
+ [Creating a BigQuery target node](creating-bigquery-target-node.md)
+ [Advanced options](#creating-bigquery-connection-advanced-options)

# Creating a BigQuery connection
<a name="creating-bigquery-connection"></a>

To connect to Google BigQuery from AWS Glue, you will need to create and store your Google Cloud Platform credentials in an AWS Secrets Manager secret, then associate that secret with a Google BigQuery AWS Glue connection.

**To configure a connection to BigQuery:**

1. In Google Cloud Platform, create and identify relevant resources:
   + Create or identify a GCP project containing BigQuery tables you would like to connect to.
   + Enable the BigQuery API. For more information, see [Use the BigQuery Storage Read API to read table data](https://cloud.google.com/bigquery/docs/reference/storage/#enabling_the_api).

1. In Google Cloud Platform, create and export service account credentials:

   You can use the BigQuery credentials wizard to expedite this step: [Create credentials](https://console.cloud.google.com/apis/credentials/wizard?api=bigquery.googleapis.com).

   To create a service account in GCP, follow the tutorial available in [Create service accounts](https://cloud.google.com/iam/docs/service-accounts-create).
   + When selecting **project**, select the project containing your BigQuery table.
   + When selecting GCP IAM roles for your service account, add or create a role that grants the permissions needed to run BigQuery jobs that read, write, or create BigQuery tables.

   To create credentials for your service account, follow the tutorial available in [Create a service account key](https://cloud.google.com/iam/docs/keys-create-delete#creating).
   + When selecting key type, select **JSON**.

   You should now have downloaded a JSON file with credentials for your service account. It should look similar to the following:

   ```
   {
     "type": "service_account",
     "project_id": "*****",
     "private_key_id": "*****",
     "private_key": "*****",
     "client_email": "*****",
     "client_id": "*****",
     "auth_uri": "https://accounts.google.com/o/oauth2/auth",
     "token_uri": "https://oauth2.googleapis.com/token",
     "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
     "client_x509_cert_url": "*****",
     "universe_domain": "googleapis.com"
   }
   ```

1. In AWS Secrets Manager, create a secret using your downloaded credentials file. You can choose the **Plaintext** tab and paste the JSON formatted file content. To create a secret in Secrets Manager, follow the tutorial available in [Create an AWS Secrets Manager secret](https://docs.aws.amazon.com//secretsmanager/latest/userguide/create_secret.html) in the AWS Secrets Manager documentation. After creating the secret, keep the secret name, *secretName*, for the next step. 

1. In the AWS Glue Data Catalog, create a connection by following the steps in [https://docs.aws.amazon.com/glue/latest/dg/console-connections.html](https://docs.aws.amazon.com/glue/latest/dg/console-connections.html). After creating the connection, keep the connection name, *connectionName*, for the next step. 
   + When selecting a **Connection type**, select **Google BigQuery**.
   + When selecting an **AWS Secret**, provide *secretName*.

1. Grant the IAM role associated with your AWS Glue job permission to read *secretName*.

1. In your AWS Glue job configuration, provide *connectionName* as an **Additional network connection**.
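Before pasting the credentials into Secrets Manager in step 3, you can sanity-check the downloaded file locally. The following is a minimal sketch, not part of the official procedure; the required-key list mirrors the example file shown in step 2:

```python
import json

# Keys present in a service-account key file exported from Google Cloud (step 2).
REQUIRED_KEYS = {
    "type", "project_id", "private_key_id", "private_key",
    "client_email", "client_id", "auth_uri", "token_uri",
}

def validate_service_account_json(raw: str) -> dict:
    """Parse the downloaded JSON and confirm it looks like a service-account key."""
    creds = json.loads(raw)
    missing = REQUIRED_KEYS - creds.keys()
    if missing:
        raise ValueError(f"credentials file is missing keys: {sorted(missing)}")
    if creds["type"] != "service_account":
        raise ValueError(f"unexpected credential type: {creds['type']}")
    return creds

# Example with redacted values, mirroring the file shown in step 2.
sample = json.dumps({
    "type": "service_account", "project_id": "*****", "private_key_id": "*****",
    "private_key": "*****", "client_email": "*****", "client_id": "*****",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
})
creds = validate_service_account_json(sample)
```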

# Creating a BigQuery source node
<a name="creating-bigquery-source-node"></a>

## Prerequisites needed
<a name="creating-bigquery-source-node-prerequisites"></a>
+ A BigQuery type AWS Glue Data Catalog connection
+ An AWS Secrets Manager secret for your Google BigQuery credentials, used by the connection.
+ Appropriate permissions on your job to read the secret used by the connection.
+ The name and dataset of the table and corresponding Google Cloud project you would like to read.

## Adding a BigQuery data source
<a name="creating-bigquery-source-node-add"></a>

**To add a Data source – BigQuery node:**

1.  Choose the connection for your BigQuery data source. If you have already created it, it is available in the dropdown. Otherwise, choose **Create BigQuery connection**. For more information, see [Overview of using connectors and connections](https://docs.aws.amazon.com/glue/latest/ug/connectors-chapter.html#using-connectors-overview). 

    Once you have chosen a connection, you can view the connection properties by clicking **View properties**. 

1. Identify what BigQuery data you would like to read, then choose a **BigQuery Source** option:
   + Choose a single table – allows you to pull all data from a table.
   + Enter a custom query – allows you to customize which data is retrieved by providing a query.

1.  Describe the data you would like to read.

   **(Required)** Set **Parent Project** to the project containing your table or, if relevant, to a billing parent project.

   If you chose a single table, set **Table** to the name of a Google BigQuery table in the following format: `[dataset].[table]` 

   If you chose a query, provide it to **Query**. In your query, refer to tables with their fully qualified table name, in the format: `[project].[dataset].[tableName]`.

1.  Provide BigQuery properties 

   If you chose a single table, you do not need to provide additional properties.

   If you chose a query, you must provide the following **Custom Google BigQuery properties**:
   + Set `viewsEnabled` to true.
   + Set `materializationDataset` to a dataset. The GCP principal authenticated by the credentials provided through the AWS Glue connection must be able to create tables in this dataset.
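To make the two source modes concrete, here is a hedged sketch of how the choices above could be assembled as connection options; the option names follow the BigQuery connection option reference, and the project, dataset, and table names are placeholders:

```python
def bigquery_source_options(parent_project, table=None, query=None,
                            materialization_dataset=None):
    """Build connection options for a BigQuery source node (sketch)."""
    options = {"parentProject": parent_project}
    if table:
        # Single-table mode: table is given as [dataset].[table].
        options["table"] = table
    elif query:
        # Custom-query mode requires the view materialization settings above.
        options["query"] = query
        options["viewsEnabled"] = "true"
        options["materializationDataset"] = materialization_dataset
    return options

# Single table: pulls all data from my_dataset.my_table.
table_opts = bigquery_source_options("my-project", table="my_dataset.my_table")

# Custom query: tables are referenced by fully qualified name.
query_opts = bigquery_source_options(
    "my-project",
    query="SELECT id FROM my-project.my_dataset.my_table",
    materialization_dataset="my_dataset",
)
```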

# Creating a BigQuery target node
<a name="creating-bigquery-target-node"></a>

## Prerequisites needed
<a name="creating-bigquery-target-node-prerequisites"></a>
+ A BigQuery type AWS Glue Data Catalog connection
+ An AWS Secrets Manager secret for your Google BigQuery credentials, used by the connection.
+ Appropriate permissions on your job to read the secret used by the connection.
+ The name and dataset of the table and corresponding Google Cloud project you would like to write to.

## Adding a BigQuery data target
<a name="creating-bigquery-target-node-add"></a>

**To add a Data target – BigQuery node:**

1.  Choose the connection for your BigQuery data target. If you have already created it, it is available in the dropdown. Otherwise, choose **Create BigQuery connection**. For more information, see [Overview of using connectors and connections](https://docs.aws.amazon.com/glue/latest/ug/connectors-chapter.html#using-connectors-overview). 

    Once you have chosen a connection, you can view the connection properties by clicking **View properties**. 

1. Identify what BigQuery table you would like to write to, then choose a **Write method**.
   + Direct – writes to BigQuery directly using the BigQuery Storage Write API.
   + Indirect – writes to Google Cloud Storage, then copies to BigQuery.

   If you would like to write indirectly, provide a destination GCS location with **Temporary GCS bucket**. You will need to provide additional configuration in your AWS Glue connection. For more information, see [Using indirect write with Google BigQuery](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect-bigquery-home.html#aws-glue-programming-etl-connect-bigquery-indirect-write).

1.  Describe the data you would like to write.

   **(Required)** Set **Parent Project** to the project containing your table or, if relevant, to a billing parent project.

   Set **Table** to the name of the Google BigQuery table you want to write to, in the following format: `[dataset].[table]` 
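As with the source node, the target choices can be sketched as connection options; `writeMethod` and `temporaryGcsBucket` follow the BigQuery connection option reference, and the project, table, and bucket names are placeholders:

```python
def bigquery_target_options(parent_project, table, write_method="direct",
                            temporary_gcs_bucket=None):
    """Build connection options for a BigQuery target node (sketch)."""
    options = {
        "parentProject": parent_project,
        "table": table,              # [dataset].[table]
        "writeMethod": write_method,  # "direct" or "indirect"
    }
    if write_method == "indirect":
        # Indirect writes stage the data in Google Cloud Storage first.
        if not temporary_gcs_bucket:
            raise ValueError("indirect writes require a temporary GCS bucket")
        options["temporaryGcsBucket"] = temporary_gcs_bucket
    return options

# Direct: writes using the BigQuery Storage Write API.
direct = bigquery_target_options("my-project", "my_dataset.my_table")

# Indirect: writes to GCS, then copies to BigQuery.
indirect = bigquery_target_options("my-project", "my_dataset.my_table",
                                   write_method="indirect",
                                   temporary_gcs_bucket="my-temp-bucket")
```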

## Advanced options
<a name="creating-bigquery-connection-advanced-options"></a>

You can provide advanced options when creating a BigQuery node. These options are the same as those available when programming AWS Glue for Spark scripts.

See [BigQuery connection option reference](https://docs.aws.amazon.com//glue/latest/dg/aws-glue-programming-etl-connect-bigquery-home.html) in the AWS Glue developer guide. 