

# Connecting to Google Sheets
<a name="connecting-to-googlesheets"></a>

 Google Sheets is an online spreadsheet software that allows you to organize large amounts of data, create custom reports, automate calculations, and collaborate with others.If you're a Google Sheets user, you can connect AWS Glue to your Google Sheets account. Then, you can use Google Sheets as a data source in your ETL jobs. Run these jobs to transfer data between Google Sheets and AWS services or other supported applications. 

**Topics**
+ [

# AWS Glue support for Google Sheets
](googlesheets-support.md)
+ [

# Policies containing the API operations for creating and using connections
](googlesheets-configuring-iam-permissions.md)
+ [

# Configuring Google Sheets
](googlesheets-configuring.md)
+ [

# Configuring Google Sheets connections
](googlesheets-configuring-connections.md)
+ [

# Reading from Google Sheets entities
](googlesheets-reading-from-entities.md)
+ [

# Google Sheets connection options
](googlesheets-connection-options.md)
+ [

# Set up Authorization code OAuth flow for Google Sheets
](googlesheets-oauth-authorization.md)
+ [

# Limitations for Google Sheets connector
](googlesheets-connector-limitations.md)

# AWS Glue support for Google Sheets
<a name="googlesheets-support"></a>

AWS Glue supports Google Sheets as follows:

**Supported as a source?**  
Yes. You can use AWS Glue ETL jobs to query data from Google Sheets.

**Supported as a target?**  
No.

**Supported Google Sheets API versions**  
 Google Sheets API v4 and Google Drive API v3 

# Policies containing the API operations for creating and using connections
<a name="googlesheets-configuring-iam-permissions"></a>

 The following sample policy describes the required AWS permissions for creating and using connections. If you are creating a new role, create a policy that contains the following: 

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:ListConnectionTypes",
        "glue:DescribeConnectionType",
        "glue:RefreshOAuth2Tokens",
        "glue:ListEntities",
        "glue:DescribeEntity"
      ],
      "Resource": "*"
    }
  ]
}
```

------

You can also use the following managed IAM policies to allow access:
+  [ AWSGlueServiceRole ](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole) – Grants access to resources that various AWS Glue processes require to run on your behalf. These resources include AWS Glue, Amazon S3, IAM, CloudWatch Logs, and Amazon EC2. If you follow the naming convention for resources specified in this policy, AWS Glue processes have the required permissions. This policy is typically attached to roles specified when defining crawlers, jobs, and development endpoints. 
+  [ AWSGlueConsoleFullAccess ](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AWSGlueConsoleFullAccess) – Grants full access to AWS Glue resources when an identity that the policy is attached to uses the AWS Management Console. If you follow the naming convention for resources specified in this policy, users have full console capabilities. This policy is typically attached to users of the AWS Glue console. 

# Configuring Google Sheets
<a name="googlesheets-configuring"></a>

Before you can use AWS Glue to transfer from Google Sheets, you must meet these requirements:

## Minimum requirements
<a name="googlesheets-configuring-min-requirements"></a>
+ You have a Google Sheets account with Email and Password.
+  Your Google Sheets account is enabled for API access. All use of the Google Sheets API is available at no additional cost. 
+  Your Google Sheets account allows you to install connected apps. If you lack access to this functionality, contact your Google Sheets administrator. 

 If you meet these requirements, you’re ready to connect AWS Glue to your Google Sheets account. 

# Configuring Google Sheets connections
<a name="googlesheets-configuring-connections"></a>

To configure a Google Sheet connection:

1. In AWS Secrets Manager, create a secret with the following details: 

   1.  For AuthorizationCode grant type: 
      +  For customer managed connected app – Secret should contain the connected app Consumer Secret with `USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET` as key. 

1. In AWS Glue Glue Studio, create a connection under **Data Connections** by following the steps below: 

   1. When selecting a **Data Source**, select Google Sheets.

   1. Provide the Google Sheets environment.

      1.  Select the `secretName` which you want to use for this connection in AWS Glue to put the tokens. 

      1.  Select the network options if you want to use your network. 

1.  Grant the IAM role associated with your AWS Glue job permission to read `secretName`. 

------
#### [ JSON ]

****  

   ```
   {
     "Version":"2012-10-17",		 	 	 
     "Statement": [
       {
         "Effect": "Allow",
         "Action": [
           "secretsmanager:DescribeSecret",
           "secretsmanager:GetSecretValue",
           "secretsmanager:PutSecretValue",
           "ec2:CreateNetworkInterface",
           "ec2:DescribeNetworkInterfaces",
           "ec2:DeleteNetworkInterface"
         ],
         "Resource": "*"
       }
     ]
   }
   ```

------

 **AUTHORIZATION\$1CODE Grant Type** 

 This grant type is considered “three-legged” OAuth as it relies on redirecting users to the third party authorization server to authenticate the user. It is used when creating connections via the AWS Glue Console. The AWS Glue Console will redirect the user to Google Sheets where the user must login and allow AWS Glue the requested permissions to access their Google Sheets instance. 

 Users may opt to create their own connected app in Google Sheets and provide their own client id and client secret when creating connections through the AWS Glue Console. In this scenario, they will still be redirected to Google Sheets to login and authorize AWS Glue to access their resources. 

 This grant type results in a refresh token and access token. The access token is short lived, and may be refreshed automatically without user interaction using the refresh token. 

 For more information, see [ public Google Sheets documentation on creating a connected app for Authorization Code OAuth flow ](https://developers.google.com/workspace/guides/create-credentials). 

# Reading from Google Sheets entities
<a name="googlesheets-reading-from-entities"></a>

 **Prerequisites** 
+  A Google SpreadSheet that you would like to read from. You will need the SpreadSheet ID and tabName of the spreadsheet. 

 **Google Sheets Entity and Field Details:** 


| Entity | Data Type | Supported Operators | 
| --- | --- | --- | 
| Spreadsheet | String | N/A (filter is not supported) | 

 **Example** 

```
googleSheets_read = glueContext.create_dynamic_frame.from_options(
    connection_type="googlesheets",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "{SpreadSheetID}#{SheetTabName}",
        "API_VERSION": "v4"
    }
```

 **Partitioning queries** 

 For Record Base Partitioning only, `NUM_PARTITIONS` can be provided as additional spark options if you want to utilize concurrency in Spark. With this parameter, the original query would be split into `NUM_PARTITIONS` number of sub-queries that can be executed by spark tasks concurrently. 

 **Example with `NUM_PARTITIONS`** 

```
googlesheets_read = glueContext.create_dynamic_frame.from_options(
    connection_type="googlesheets",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "{SpreadSheetID}#{SheetTabName}",
        "API_VERSION": "v4",
        "NUM_PARTITIONS": "10"
    }
```

# Google Sheets connection options
<a name="googlesheets-connection-options"></a>

The following are connection options for Google Sheets:
+  `ENTITY_NAME`(String) - (Required) Used for Read. The `SpreadSheet ID` and `sheetTabName` in Google Sheets. Example: `{SpreadSheetID}#{SheetTabName}`. 
+  `API_VERSION`(String) - (Required) Used for Read. Google Sheets Rest API version you want to use. 
+  `SELECTED_FIELDS`(List<String>) - Default: empty(SELECT \$1). Used for Read. Columns you want to select for the object. 
+  `FILTER_PREDICATE`(String) - Default: empty. Used for Read. It should be in the Spark SQL format. 
+  `QUERY`(String) - Default: empty. Used for Read. Full Spark SQL query. 
+  `NUM_PARTITIONS`(Integer) - Default: 1. Used for Read. Number of partitions for read. 

# Set up Authorization code OAuth flow for Google Sheets
<a name="googlesheets-oauth-authorization"></a>

 **Prerequisites** 
+  A Google account where you can sign in to use the Google Sheets app. In your Google account, Google Sheets contains the data that you want to transfer. 
+  A Google Cloud Platform account and a Google Cloud project. See [ Create Google Cloud Project ](https://developers.google.com/workspace/guides/create-project) for more details. 

**To set up your Google account and get OAuth 2.0 credentials:**

1.  Once the Google Cloud project is setup, enable the Google Sheets API and Google Drive APIs in the project. For the steps to enable them, see [ Enable and disable APIs ](https://support.google.com/googleapi/answer/6158841) in the API Console Help for Google Cloud Platform. 

1.  Next, configure an OAuth consent screen for external users. For more information about the OAuth consent screen, see [ Setting up your OAuth consent screen ](https://support.google.com/cloud/answer/10311615#) in the Google Cloud Platform Console Help. 

1.  In the OAuth consent screen, add the following scopes: 
   +  [ The Google Sheets API read-only scope ](https://www.googleapis.com/auth/spreadsheets.readonly) 
   +  [ The Google Drive API read-only scope ](ttps://www.googleapis.com/auth/drive.readonly) 

    For more information about these scopes, see [ OAuth 2.0 Scopes for Google APIs ](https://developers.google.com/identity/protocols/oauth2/scopes) in the Google Identity documentation. 

1.  Generate OAuth 2.0 client ID and secret. For the steps to create this client ID, see [ Setting up OAuth 2.0 ](https://support.google.com/cloud/answer/6158849?hl=en#zippy=) in the Google Cloud Platform Console Help. 

    The OAuth 2.0 client ID must have one or more authorized redirect URLs. 

    Redirect URLs have the following format: 
   + https://<aws-region>.console.aws.amazon.com/gluestudio/oauth 

1.  Note the client ID and client secret from the settings for your OAuth 2.0 client ID. 

# Limitations for Google Sheets connector
<a name="googlesheets-connector-limitations"></a>

The following are limitations for the Google Sheets connector:
+  Google Sheets connector does not support Filters. Hence, filter based partitioning cannot be supported. 
+  In Record Base Partitioning, there is no provision to return exact record count by SAAS. As a result, there can be scenarios where files with empty records are created.
+  Since the Google Sheets connector does not support filter-based partitioning, `partitionField`, `lowerbound`, and `upperbound` are not valid connection options. If these options are provided, the AWS Glue job is expected to fail. 
+  It is essential to designate the first row of the sheet as the header row to avoid data processing issues. 
  +  If not provided, header row will be replaced with `Unnamed:1`, `Unnamed:2`, `Unnamed:3`... if the sheet contains data with the first row empty. 
  +  If header row is provided, empty column names will be replaced with `Unnamed:<number of column>`. For example, if header row is `['ColumnName1', 'ColumnName2', '', '', 'ColumnName5', 'ColumnName6']`, then it will become `['ColumnName1', 'ColumnName2', 'Unnamed:3', 'Unnamed:4', 'ColumnName5', 'ColumnName6'].` 
+  The Google Sheets connector does not support Incremental transfer. 
+  Google Sheets connector supports only String datatype. 
+  Duplicate headers in a sheet will be iteratively renamed with a numeric suffix. Header names provided by the user will have precedence while renaming the duplicate headers. For example, if the header row is ["Name", "", "Name", null, "Unnamed:6", ""], it will change to: ["Name", "Unnamed:2", "Name1", "Unnamed:4", "Unnamed:6", "Unnamed:61"]. 
+  Google Sheets connector does not support spaces for a tabName. 
+  A folder name can't have the following special characters: 
  + \$1
  + /