

# Connecting to Google Search Console
<a name="connecting-to-data-google-search-console"></a>

Google Search Console is a free platform available to website owners for monitoring how Google views the site and optimizing its organic presence. This includes viewing referring domains, mobile site performance, rich search results, and the highest-traffic queries and pages. If you are a Google Search Console user, you can connect AWS Glue to your Google Search Console account. You can use Google Search Console as a data source in your ETL jobs. Run these jobs to transfer data from Google Search Console to AWS services or other supported applications.

**Topics**
+ [AWS Glue support for Google Search Console](google-search-console-support.md)
+ [Policies containing the API operations for creating and using connections](google-search-console-configuring-iam-permissions.md)
+ [Configuring Google Search Console](google-search-console-configuring.md)
+ [Configuring Google Search Console connections](google-search-console-configuring-connections.md)
+ [Reading from Google Search Console entities](google-search-console-reading-from-entities.md)
+ [Google Search Console connection options](google-search-console-connection-options.md)
+ [Google Search Console limitations](google-search-console-limitations.md)

# AWS Glue support for Google Search Console
<a name="google-search-console-support"></a>

AWS Glue supports Google Search Console as follows:

**Supported as a source?**  
Yes. You can use AWS Glue ETL jobs to query data from Google Search Console.

**Supported as a target?**  
No.

**Supported Google Search Console API versions**  
The following Google Search Console API versions are supported:
+ v3

# Policies containing the API operations for creating and using connections
<a name="google-search-console-configuring-iam-permissions"></a>

The following sample policy describes the required AWS IAM permissions for creating and using connections. If you are creating a new role, create a policy that contains the following:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:ListConnectionTypes",
        "glue:DescribeConnectionType",
        "glue:RefreshOAuth2Tokens",
        "glue:ListEntities",
        "glue:DescribeEntity"
      ],
      "Resource": "*"
    }
  ]
}
```

Alternatively, instead of creating a custom policy, you can use the following managed IAM policies:
+ [AWSGlueServiceRole](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole) – Grants access to resources that various AWS Glue processes require to run on your behalf. These resources include AWS Glue, Amazon S3, IAM, CloudWatch Logs, and Amazon EC2. If you follow the naming convention for resources specified in this policy, AWS Glue processes have the required permissions. This policy is typically attached to roles specified when defining crawlers, jobs, and development endpoints.
+ [AWSGlueConsoleFullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AWSGlueConsoleFullAccess) – Grants full access to AWS Glue resources when an identity that the policy is attached to uses the AWS Management Console. If you follow the naming convention for resources specified in this policy, users have full console capabilities. This policy is typically attached to users of the AWS Glue console.
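
As a minimal sketch, the sample policy shown earlier in this section could also be created programmatically with the AWS SDK for Python (Boto3). The policy name `GlueGoogleSearchConsoleConnectionPolicy` and both helper function names are hypothetical and chosen only for illustration. Boto3 is imported inside the function that calls IAM, so the document-building part runs without it.

```python
import json

# Actions copied from the sample policy in this section.
CONNECTION_ACTIONS = [
    "glue:ListConnectionTypes",
    "glue:DescribeConnectionType",
    "glue:RefreshOAuth2Tokens",
    "glue:ListEntities",
    "glue:DescribeEntity",
]


def build_connection_policy():
    """Return the sample IAM policy document as a Python dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": CONNECTION_ACTIONS, "Resource": "*"}
        ],
    }


def create_connection_policy(policy_name="GlueGoogleSearchConsoleConnectionPolicy"):
    """Create the policy in IAM. Requires Boto3 and AWS credentials.

    The default policy name is a hypothetical placeholder.
    """
    import boto3  # imported lazily so build_connection_policy() has no dependencies

    iam = boto3.client("iam")
    response = iam.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(build_connection_policy()),
    )
    return response["Policy"]["Arn"]
```

Calling `create_connection_policy()` returns the ARN of the new policy, which you can then attach to the role that you use for AWS Glue.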

# Configuring Google Search Console
<a name="google-search-console-configuring"></a>

Before you can use AWS Glue to transfer data from Google Search Console, you must meet these requirements:

## Minimum requirements
<a name="google-search-console-configuring-min-requirements"></a>

The following are minimum requirements:
+ You have a Google Search Console account.
+ You have a Google Cloud Platform account and a Google Cloud project.
+ In your Google Cloud project, you've enabled the Google Search Console API.
+ In your Google Cloud project, you've configured an OAuth consent screen for external users. For more information, see [Setting up your OAuth consent screen](https://support.google.com/cloud/answer/10311615) in the Google Cloud Platform Console Help.
+ In your Google Cloud project, you've configured an OAuth 2.0 client ID. AWS Glue uses these client credentials to access your data securely when it makes authenticated calls to your account. For more information, see [Setting up OAuth 2.0](https://support.google.com/cloud/answer/6158849).

If you meet these requirements, you're ready to connect AWS Glue to your Google Search Console account. For typical connections, you don't need to do anything else in Google Search Console.

# Configuring Google Search Console connections
<a name="google-search-console-configuring-connections"></a>

Google Search Console supports the `AUTHORIZATION_CODE` grant type for OAuth2. The grant type determines how AWS Glue communicates with Google Search Console to request access to your data.
+ This grant type is considered "three-legged" OAuth as it relies on redirecting users to a third-party authorization server to authenticate the user. It is used when creating connections via the AWS Glue console.
+ Users may still opt to create their own connected app in Google Search Console and provide their own client ID and client secret when creating connections through the AWS Glue console. In this scenario, they are still redirected to Google Search Console to log in and authorize AWS Glue to access their resources.
+ This grant type results in a refresh token and an access token. The access token is short-lived and can be refreshed automatically, without user interaction, by using the refresh token.
+ For public Google Search Console documentation on creating a connected app for Authorization Code OAuth flow, see [Using OAuth 2.0 to Access Google APIs](https://developers.google.com/identity/protocols/oauth2).
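
To make the first leg of the three-legged flow concrete, the following sketch builds the kind of authorization URL that a user is redirected to. The AWS Glue console performs this step for you, so this is illustrative only. The endpoint and the read-only Search Console scope come from Google's public OAuth 2.0 documentation. The scope that AWS Glue actually requests, the redirect URI, and the function name `build_authorization_url` are assumptions and are not stated in this guide.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Google's OAuth 2.0 authorization endpoint.
GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
# Read-only scope for the Search Console API (assumed; AWS Glue may request another scope).
READONLY_SCOPE = "https://www.googleapis.com/auth/webmasters.readonly"


def build_authorization_url(client_id, redirect_uri, state):
    """Build the URL that starts the authorization code flow.

    client_id and redirect_uri are placeholders for the values registered
    with your OAuth 2.0 client ID in the Google Cloud project.
    """
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",   # authorization code grant
        "scope": READONLY_SCOPE,
        "access_type": "offline",  # ask Google to also issue a refresh token
        "state": state,            # opaque value echoed back on the redirect
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"


url = build_authorization_url(
    client_id="example-client-id",
    redirect_uri="https://example.com/oauth/callback",
    state="abc123",
)
query = parse_qs(urlparse(url).query)
```

After the user signs in and consents, Google redirects back with a `code` parameter, which is exchanged for the access token and refresh token described above.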

To configure a Google Search Console connection:

1. In AWS Secrets Manager, create a secret with the following details:

   1. For a customer-managed connected app, the secret should contain the connected app's consumer secret, with `USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET` as the key.

   1. Note: You must create a secret for your connections in AWS Glue.

1. In AWS Glue Studio, create a connection under **Data Connections** by following the steps below:

   1. When selecting a **Connection type**, select Google Search Console.

   1. Select the AWS IAM role that AWS Glue can assume and that has permissions for the following actions:

      ```
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": [
              "secretsmanager:DescribeSecret",
              "secretsmanager:GetSecretValue",
              "secretsmanager:PutSecretValue",
              "ec2:CreateNetworkInterface",
              "ec2:DescribeNetworkInterfaces",
              "ec2:DeleteNetworkInterface"
            ],
            "Resource": "*"
          }
        ]
      }
      ```

   1. Select the `secretName` that you want AWS Glue to use for this connection to store the tokens.

   1. Select the network options if you want to use your network.

1. Grant the IAM role associated with your AWS Glue job permission to read `secretName`.
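
The secret from step 1 and the read permission from the last step can be sketched with the AWS SDK for Python (Boto3) as follows. This is an illustration under stated assumptions, not a definitive setup: the helper names are hypothetical, the choice of `secretsmanager:DescribeSecret` plus `secretsmanager:GetSecretValue` as the "read" actions is an assumption, and scoping the statement to a single secret ARN is a design choice that this guide does not prescribe.

```python
import json

# Key name required for a customer-managed connected app (from step 1).
SECRET_KEY = "USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET"


def build_secret_string(client_secret):
    """Return the JSON string to store as the secret value."""
    return json.dumps({SECRET_KEY: client_secret})


def build_secret_read_policy(secret_arn):
    """Return a policy document that lets a job role read one secret.

    The two actions listed here are an assumption about what "read" requires.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "secretsmanager:DescribeSecret",
                    "secretsmanager:GetSecretValue",
                ],
                "Resource": secret_arn,
            }
        ],
    }


def create_connection_secret(secret_name, client_secret):
    """Create the secret in AWS Secrets Manager. Requires Boto3 and credentials."""
    import boto3  # imported lazily so the builders above have no dependencies

    client = boto3.client("secretsmanager")
    response = client.create_secret(
        Name=secret_name,
        SecretString=build_secret_string(client_secret),
    )
    return response["ARN"]
```

The ARN returned by `create_connection_secret` can be passed to `build_secret_read_policy` to produce a policy document for the IAM role associated with your AWS Glue job.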

# Reading from Google Search Console entities
<a name="google-search-console-reading-from-entities"></a>

**Prerequisite**

A Google Search Console object you would like to read from. You will need the object name.

**Supported entities for source**:


| Entity | Can be filtered | Supports limit | Supports Order by | Supports Select \* | Supports partitioning | 
| --- | --- | --- | --- | --- | --- | 
| Search Analytics | Yes | Yes | No | Yes | No | 
| Sites | No | No | No | Yes | No | 
| Sitemaps | No | No | No | Yes | No | 

**Example**:

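```
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

# "connectionName" and "entityName" are placeholders for your own values.
googleSearchConsole_read = glueContext.create_dynamic_frame.from_options(
    connection_type="googlesearchconsole",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "entityName",
        "API_VERSION": "v3"
    }
)
```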

**Google Search Console entity and field details**:

Google Search Console provides endpoints to fetch metadata dynamically for supported entities. Accordingly, operator support is captured at the datatype level.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/glue/latest/dg/google-search-console-reading-from-entities.html)

**Note**  
For an updated list of valid values for filters, see the [Google Search Console](https://developers.google.com/webmaster-tools/v1/searchanalytics/query) API docs.  
The field `start_end_date` is a combination of `start_date` and `end_date`.

## Partitioning queries
<a name="google-search-console-reading-partitioning-queries"></a>

Filter-based partitioning and record-based partitioning are not supported.

# Google Search Console connection options
<a name="google-search-console-connection-options"></a>

The following are connection options for Google Search Console:
+ `ENTITY_NAME` (String) - (Required) Used for Read. The name of your object in Google Search Console.
+ `API_VERSION` (String) - (Required) Used for Read. The Google Search Console REST API version that you want to use.
+ `SELECTED_FIELDS` (List<String>) - Default: empty (SELECT \*). Used for Read. The columns that you want to select for the object.
+ `FILTER_PREDICATE` (String) - Default: `start_end_date between <30 days ago from current date> AND <yesterday: that is, 1 day ago from the current date>`. Used for Read. It should be in the Spark SQL format.
+ `QUERY` (String) - Default: `start_end_date between <30 days ago from current date> AND <yesterday: that is, 1 day ago from the current date>`. Used for Read. The full Spark SQL query.
+ `INSTANCE_URL` (String) - Used for Read. A valid Google Search Console instance URL.
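
As a rough sketch of how these options fit together, the following helpers compute the documented default date window (30 days ago through yesterday) and assemble a `connection_options` dictionary with an explicit `FILTER_PREDICATE`. The helper names are hypothetical. The quoted `YYYY-MM-DD` date literals are an assumption about the accepted Spark SQL form, so verify them against your own job before relying on them.

```python
from datetime import date, timedelta


def default_date_window(today):
    """Return (start, end) for the documented default: 30 days ago through yesterday."""
    return today - timedelta(days=30), today - timedelta(days=1)


def build_read_options(connection_name, entity_name, start, end, api_version="v3"):
    """Assemble connection options for a read with an explicit date filter.

    The quoted ISO date literals are an assumption, not a documented format.
    """
    predicate = (
        f"start_end_date between '{start.isoformat()}' AND '{end.isoformat()}'"
    )
    return {
        "connectionName": connection_name,
        "ENTITY_NAME": entity_name,
        "API_VERSION": api_version,
        "FILTER_PREDICATE": predicate,
    }


# Example with a fixed "today" so that the result is reproducible.
start, end = default_date_window(date(2024, 3, 31))
options = build_read_options("connectionName", "entityName", start, end)
```

In an AWS Glue job, the resulting dictionary would be passed as `connection_options` to `glueContext.create_dynamic_frame.from_options` with `connection_type="googlesearchconsole"`.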

# Google Search Console limitations
<a name="google-search-console-limitations"></a>

The following are limitations or notes for Google Search Console:
+ Google Search Console enforces usage limits on the API. For more information, see [Usage Limits](https://developers.google.com/webmaster-tools/limits).
+ When no filter is passed for the `Search Analytics` entity, the API sums up all the clicks, impressions, CTR, and other data for your entire site within the specified default date range and presents it as a single record.
+ To break down the data into smaller segments, you need to introduce dimensions to your query. Dimensions tell the API how you want to segment your data.
  + For example, if you add `filterPredicate: dimensions="country"` you'll get one record for each country where your site received traffic during the specified period.
  + Example to pass multiple dimensions: `filterPredicate: dimensions="country" AND dimensions="device" AND dimensions="page"`. In this case you'll get one row in the response for each unique combination of these three dimensions.
+ Default values are set for the `start_end_date` and `dataState` fields.     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/glue/latest/dg/google-search-console-limitations.html)
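
The multi-dimension predicate described in the limitations above can be generated from a list rather than written by hand. The following is a minimal sketch. The helper name `build_dimensions_predicate` is hypothetical, and it does not check that the dimension names are ones the Google Search Console API accepts.

```python
def build_dimensions_predicate(dimensions):
    """Join dimension names into the `dimensions="..." AND ...` form shown above."""
    if not dimensions:
        raise ValueError("at least one dimension is required")
    return " AND ".join(f'dimensions="{name}"' for name in dimensions)


single = build_dimensions_predicate(["country"])
combined = build_dimensions_predicate(["country", "device", "page"])
print(combined)
```

With the three dimensions in this example, the response would contain one row for each unique combination of country, device, and page.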