

# Connecting to Salesforce
<a name="connecting-to-data-salesforce"></a>

Salesforce provides customer relationship management (CRM) software that helps you with sales, customer service, e-commerce, and more. If you're a Salesforce user, you can connect AWS Glue to your Salesforce account and then use Salesforce as a data source or destination in your ETL jobs. Run these jobs to transfer data between Salesforce and AWS services or other supported applications.

**Topics**
+ [AWS Glue support for Salesforce](salesforce-support.md)
+ [Policies containing the API operations for creating and using connections](salesforce-configuring-iam-permissions.md)
+ [Configuring Salesforce](salesforce-configuring.md)
+ [Apply System Admin profile](#salesforce-configuring-apply-system-admin-profile)
+ [Configuring Salesforce connections](salesforce-configuring-connections.md)
+ [Reading from Salesforce](salesforce-reading-from-entities.md)
+ [Writing to Salesforce](salesforce-writing-to.md)
+ [Salesforce connection options](salesforce-connection-options.md)
+ [Limitations for the Salesforce connector](salesforce-connector-limitations.md)
+ [Set up the Authorization Code flow for Salesforce](salesforce-setup-authorization-code-flow.md)
+ [Set up the JWT bearer OAuth flow for Salesforce](salesforce-setup-jwt-bearer-oauth.md)

# AWS Glue support for Salesforce
<a name="salesforce-support"></a>

AWS Glue supports Salesforce as follows:

**Supported as a source?**  
Yes. You can use AWS Glue ETL jobs to query data from Salesforce.

**Supported as a target?**  
Yes. You can use AWS Glue ETL jobs to write records into Salesforce.

**Supported Salesforce API versions**  
The following Salesforce API versions are supported:
+ v58.0
+ v59.0
+ v60.0

# Policies containing the API operations for creating and using connections
<a name="salesforce-configuring-iam-permissions"></a>

The following sample IAM policy describes the required permissions for creating, managing and using Salesforce connections within AWS Glue ETL jobs. If you are creating a new role, create a policy that contains the following:

------
#### [ JSON ]

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:DescribeSecret",
        "secretsmanager:GetSecretValue",
        "secretsmanager:PutSecretValue",
        "glue:ListConnectionTypes",
        "glue:DescribeConnectionType",
        "glue:RefreshOAuth2Tokens",
        "glue:ListEntities",
        "glue:DescribeEntity"
      ],
      "Resource": "*"
    }
  ]
}
```

------

You can also use the following IAM policies to allow access:
+ [AWSGlueServiceRole](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole) – Grants access to resources that various AWS Glue processes require to run on your behalf. These resources include AWS Glue, Amazon S3, IAM, CloudWatch Logs, and Amazon EC2. If you follow the naming convention for resources specified in this policy, AWS Glue processes have the required permissions. This policy is typically attached to roles specified when defining crawlers, jobs, and development endpoints.
+ [AWSGlueConsoleFullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AWSGlueConsoleFullAccess) – Grants full access to AWS Glue resources when an identity that the policy is attached to uses the AWS Management Console. If you follow the naming convention for resources specified in this policy, users have full console capabilities. This policy is typically attached to users of the AWS Glue console.

If providing Network Options when creating a Salesforce connection, the following actions must also be included in the IAM role:

------
#### [ JSON ]

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateNetworkInterface",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DeleteNetworkInterface"
      ],
      "Resource": "*"
    }
  ]
}
```

------
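As a minimal sketch, the two sample policies above can be combined into one policy document programmatically. The helper name `build_salesforce_policy` is hypothetical and not part of any AWS SDK. The action names are copied from the samples above.

```python
import json

# Actions copied from the sample policies above.
BASE_ACTIONS = [
    "secretsmanager:DescribeSecret",
    "secretsmanager:GetSecretValue",
    "secretsmanager:PutSecretValue",
    "glue:ListConnectionTypes",
    "glue:DescribeConnectionType",
    "glue:RefreshOAuth2Tokens",
    "glue:ListEntities",
    "glue:DescribeEntity",
]
NETWORK_ACTIONS = [
    "ec2:CreateNetworkInterface",
    "ec2:DescribeNetworkInterfaces",
    "ec2:DeleteNetworkInterface",
]


def build_salesforce_policy(include_network_options: bool = False) -> dict:
    """Return an IAM policy document (hypothetical helper, not an AWS API)."""
    actions = list(BASE_ACTIONS)
    if include_network_options:
        # Only needed when the connection is created with network options.
        actions += NETWORK_ACTIONS
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": actions, "Resource": "*"}
        ],
    }


policy_json = json.dumps(
    build_salesforce_policy(include_network_options=True), indent=2
)
print(policy_json)
```

The printed JSON can be pasted into the IAM policy editor. The `"Resource": "*"` value mirrors the samples above.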

 For Zero-ETL Salesforce connections, see [Zero-ETL prerequisites](https://docs.aws.amazon.com/glue/latest/dg/zero-etl-prerequisites.html). 

# Configuring Salesforce
<a name="salesforce-configuring"></a>

Before you can use AWS Glue to transfer data to or from Salesforce, you must meet these requirements:

## Minimum requirements
<a name="salesforce-configuring-min-requirements"></a>

The following are minimum requirements:
+ You have a Salesforce account.
+ Your Salesforce account is enabled for API access. API access is enabled by default for the Enterprise, Unlimited, Developer, and Performance editions.

If you meet these requirements, you’re ready to connect AWS Glue to your Salesforce account. AWS Glue handles the remaining requirements with the AWS managed connected app.

## The AWS managed connected app for Salesforce
<a name="salesforce-configuring-connected-app"></a>

The AWS managed connected app helps you create a Salesforce connection in fewer steps. In Salesforce, a connected app is a framework that authorizes external applications, like AWS Glue, to access your Salesforce data using OAuth 2.0. To use the AWS managed connected app, create a Salesforce connection by using the AWS Glue console. When you configure the connection, set the **OAuth grant type** to **Authorization code** and leave the box checked for **Use AWS managed client application**.

When you save the connection, you are redirected to Salesforce to log in and approve AWS Glue access to your Salesforce account.

## Apply System Admin profile
<a name="salesforce-configuring-apply-system-admin-profile"></a>

 In Salesforce, follow the steps to apply the System Admin profile: 

1.  In Salesforce, navigate to **Settings > Connected Apps > Connected Apps OAuth Usage**. 

1.  In the list of connected apps, find AWS Glue and choose **Install**. If needed, choose **Unblock**. 

 1.  Navigate to **Settings > Manage Connected Apps**, then choose **AWS Glue**. Under **OAuth Policies**, choose **Admin approved users are pre-authorized** and select the **System Admin** profile. This action restricts access to AWS Glue to users with the System Admin profile. 

# Configuring Salesforce connections
<a name="salesforce-configuring-connections"></a>

To configure a Salesforce connection:

1. In AWS Secrets Manager, create a secret with the following details:

   1. For the JWT\_TOKEN grant type, the secret should contain the `JWT_TOKEN` key with its value.

   1. For the AuthorizationCode grant type:

      1. For an AWS Managed connected app, an empty secret or a secret with some temporary value must be provided.

      1. For a customer managed connected app, the secret should contain the connected app `Consumer Secret` with `USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET` as the key.

   1. Note: You must create a secret for your connection in AWS Glue.

1. In AWS Glue Studio, create a connection under **Data Connections** by following the steps below:

   1. When selecting a **Connection type**, select Salesforce.

   1. Provide the `INSTANCE_URL` of the Salesforce instance you want to connect to.

   1. Provide the Salesforce environment.

   1. Select the IAM role that AWS Glue can assume and that has permissions for the following actions:

------
#### [ JSON ]

      ```
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": [
              "secretsmanager:DescribeSecret",
              "secretsmanager:GetSecretValue",
              "secretsmanager:PutSecretValue",
              "ec2:CreateNetworkInterface",
              "ec2:DescribeNetworkInterfaces",
              "ec2:DeleteNetworkInterface"
            ],
            "Resource": "*"
          }
        ]
      }
      ```

------

   1. Select the OAuth2 grant type which you want to use for the connections. The grant type determines how AWS Glue communicates with Salesforce to request access to your data. Your choice affects the requirements that you must meet before you create the connection. You can choose either of these types:
      + **JWT\_BEARER Grant Type**: This grant type works well for automation scenarios because it allows a JSON Web Token (JWT) to be created up front with the permissions of a particular user in the Salesforce instance. The creator controls how long the JWT is valid. AWS Glue can use the JWT to obtain an access token, which is used to call Salesforce APIs.

        This flow requires that the user has created a connected app in their Salesforce instance which enables issuing JWT-based access tokens for users.

        For information on creating a connected app for the JWT bearer OAuth flow, see [OAuth 2.0 JWT bearer flow for server-to-server integration](https://help.salesforce.com/s/articleView?id=sf.remoteaccess_oauth_jwt_flow.htm). To set up the JWT bearer flow with the Salesforce connected app, see [Set up the JWT bearer OAuth flow for Salesforce](salesforce-setup-jwt-bearer-oauth.md).
      + **AUTHORIZATION\_CODE Grant Type**: This grant type is considered "three-legged" OAuth because it relies on redirecting users to the third-party authorization server to authenticate the user. It is used when creating connections through the AWS Glue console. By default, the user creating a connection can rely on an AWS Glue connected app (the AWS Glue managed client application), in which case they do not need to provide any OAuth-related information except for their Salesforce instance URL. The AWS Glue console redirects the user to Salesforce, where the user must log in and grant AWS Glue the requested permissions to access their Salesforce instance.

        Users may still opt to create their own connected app in Salesforce and provide their own client ID and client secret when creating connections through the AWS Glue console. In this scenario, they are still redirected to Salesforce to log in and authorize AWS Glue to access their resources.

        This grant type results in a refresh token and an access token. The access token is short-lived and may be refreshed automatically, without user interaction, using the refresh token.

        For information on creating a connected app for the Authorization Code OAuth flow, see [Set up the Authorization Code flow for Salesforce](salesforce-setup-authorization-code-flow.md).

   1. Select the `secretName` which you want to use for this connection in AWS Glue to store the OAuth 2.0 tokens.

   1. Select the network options if you want to use your network.

1. Grant the IAM role associated with your AWS Glue job permission to read `secretName`.

1. If providing network options, also grant the IAM role the following permissions:

------
#### [ JSON ]

   ```
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Action": [
           "ec2:CreateNetworkInterface",
           "ec2:DescribeNetworkInterfaces",
           "ec2:DeleteNetworkInterface"
         ],
         "Resource": "*"
       }
     ]
   }
   ```

------
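The secret contents described in step 1 can be sketched as follows. This is an illustration only: `build_salesforce_secret` is a hypothetical helper, and representing the "empty secret" for the AWS managed connected app as an empty JSON object is an assumption. Only the key names `JWT_TOKEN` and `USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET` come from the steps above.

```python
import json
from typing import Optional


def build_salesforce_secret(
    grant_type: str,
    jwt_token: Optional[str] = None,
    consumer_secret: Optional[str] = None,
) -> str:
    """Return the secret string for a Salesforce connection secret.

    Hypothetical helper; only the key names come from the steps above.
    """
    if grant_type == "JWT_BEARER":
        if not jwt_token:
            raise ValueError("JWT_BEARER requires a JWT")
        return json.dumps({"JWT_TOKEN": jwt_token})
    if grant_type == "AUTHORIZATION_CODE":
        if consumer_secret:
            # Customer managed connected app: store the Consumer Secret.
            return json.dumps(
                {"USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET": consumer_secret}
            )
        # AWS managed connected app: an empty secret (assumed here to be {}).
        return json.dumps({})
    raise ValueError(f"Unsupported grant type: {grant_type}")


print(build_salesforce_secret("AUTHORIZATION_CODE"))
```

The returned string is intended to be stored as the secret value in AWS Secrets Manager, for example as the `SecretString` of a new secret.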

## Configuring Salesforce connections with the AWS CLI
<a name="salesforce-configuring-connections-cli"></a>

You can create Salesforce connections using the AWS CLI:

```
aws glue create-connection --connection-input \
"{\"Name\": \"salesforce-conn1\",\"ConnectionType\": \"SALESFORCE\",\"ConnectionProperties\": {\"ROLE_ARN\": \"arn:aws:iam::123456789012:role/glue-role\",\"INSTANCE_URL\": \"https://example.my.salesforce.com\"},\"ValidateCredentials\": true,\"AuthenticationConfiguration\": {\"AuthenticationType\": \"OAUTH2\",\"SecretArn\": \"arn:aws:secretsmanager:us-east-1:123456789012:secret:salesforce-conn1-secret-IAmcdk\",\"OAuth2Properties\": {\"OAuth2GrantType\": \"JWT_BEARER\",\"TokenUrl\": \"https://login.salesforce.com/services/oauth2/token\"}}}" \
--endpoint-url https://glue.us-east-1.amazonaws.com \
--region us-east-1
```
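The escaped JSON string in the command above is hard to read and easy to break. As a sketch, the same connection input can be built as a Python dictionary and serialized. All values are the placeholders from the CLI example, and the output file name `salesforce-conn1.json` is an arbitrary choice.

```python
import json

# All values are the placeholders from the CLI example above.
connection_input = {
    "Name": "salesforce-conn1",
    "ConnectionType": "SALESFORCE",
    "ConnectionProperties": {
        "ROLE_ARN": "arn:aws:iam::123456789012:role/glue-role",
        "INSTANCE_URL": "https://example.my.salesforce.com",
    },
    "ValidateCredentials": True,
    "AuthenticationConfiguration": {
        "AuthenticationType": "OAUTH2",
        "SecretArn": (
            "arn:aws:secretsmanager:us-east-1:123456789012:"
            "secret:salesforce-conn1-secret-IAmcdk"
        ),
        "OAuth2Properties": {
            "OAuth2GrantType": "JWT_BEARER",
            "TokenUrl": "https://login.salesforce.com/services/oauth2/token",
        },
    },
}

# Serialize once; the result needs no manual quote escaping.
connection_input_json = json.dumps(connection_input, indent=2)
print(connection_input_json)
```

If the output is saved to a file, the AWS CLI can load it with `--connection-input file://salesforce-conn1.json` instead of an inline string.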

# Reading from Salesforce
<a name="salesforce-reading-from-entities"></a>

**Prerequisite**

A Salesforce sObject you would like to read from. You will need the object name such as `Account` or `Case` or `Opportunity`.

**Example**:

```
salesforce_read = glueContext.create_dynamic_frame.from_options(
    connection_type="salesforce",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "Account",
        "API_VERSION": "v60.0"
    }
)
```

## Partitioning queries
<a name="salesforce-reading-partitioning-queries"></a>

You can provide the additional Spark options `PARTITION_FIELD`, `LOWER_BOUND`, `UPPER_BOUND`, and `NUM_PARTITIONS` if you want to utilize concurrency in Spark. With these parameters, the original query would be split into `NUM_PARTITIONS` number of sub-queries that can be executed by Spark tasks concurrently.
+ `PARTITION_FIELD`: the name of the field to be used to partition the query.
+ `LOWER_BOUND`: an **inclusive** lower bound value of the chosen partition field.

  For Date or Timestamp fields, the connector accepts the Spark timestamp format used in Spark SQL queries.

  Examples of valid values:

  ```
  "TIMESTAMP \"1707256978123\""
  "TIMESTAMP '2018-01-01 00:00:00.000 UTC'"
  "TIMESTAMP \"2018-01-01 00:00:00 Pacific/Tahiti\"" 
  "TIMESTAMP \"2018-01-01 00:00:00\""
  "TIMESTAMP \"-123456789\" Pacific/Tahiti"
  "TIMESTAMP \"1702600882\""
  ```
+ `UPPER_BOUND`: an **exclusive** upper bound value of the chosen partition field.
+ `NUM_PARTITIONS`: the number of partitions.
+ `TRANSFER_MODE`: the transfer mode, either `SYNC` or `ASYNC`. The default is `SYNC`. When set to `ASYNC`, Bulk API 2.0 Query is used for processing.

Example:

```
salesforce_read = glueContext.create_dynamic_frame.from_options(
    connection_type="salesforce",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "Account",
        "API_VERSION": "v60.0",
        "PARTITION_FIELD": "SystemModstamp",
        "LOWER_BOUND": "TIMESTAMP '2021-01-01 00:00:00 Pacific/Tahiti'",
        "UPPER_BOUND": "TIMESTAMP '2023-01-10 00:00:00 Pacific/Tahiti'",
        "NUM_PARTITIONS": "10",
        "TRANSFER_MODE": "ASYNC" 
    }
)
```
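To make the bounds semantics concrete, the following sketch splits an inclusive lower bound and an exclusive upper bound into contiguous sub-ranges. It is an illustration only: the way the connector actually divides the range is not described here, and the equal-width split and the `split_range` helper are assumptions.

```python
from datetime import datetime
from typing import List, Tuple


def split_range(
    lower: datetime, upper: datetime, num_partitions: int
) -> List[Tuple[datetime, datetime]]:
    """Split [lower, upper) into contiguous, equal-width sub-ranges."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    step = (upper - lower) / num_partitions
    # Use the exact upper bound as the final edge to avoid rounding drift.
    bounds = [lower + step * i for i in range(num_partitions)] + [upper]
    return list(zip(bounds[:-1], bounds[1:]))


ranges = split_range(datetime(2021, 1, 1), datetime(2023, 1, 10), 10)
for start, end in ranges:
    # Each sub-query would cover start <= SystemModstamp < end.
    print(f"SystemModstamp >= '{start}' AND SystemModstamp < '{end}'")
```

Because each sub-range includes its start and excludes its end, a record that falls exactly on a boundary belongs to exactly one partition.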

## FILTER\_PREDICATE option
<a name="salesforce-filter-predicate"></a>

**FILTER\_PREDICATE**: An optional parameter used to filter the query.

Examples of **FILTER\_PREDICATE**:

```
     Case 1: FILTER_PREDICATE with single criterion
     Examples: 	
       LastModifiedDate >= TIMESTAMP '2025-04-01 00:00:00 Pacific/Tahiti'
       LastModifiedDate <= TIMESTAMP "2025-04-01 00:00:00"
       LastModifiedDate >= TIMESTAMP '2018-01-01 00:00:00.000 UTC'
       LastModifiedDate <= TIMESTAMP "-123456789 Pacific/Tahiti"
       LastModifiedDate <= TIMESTAMP "1702600882"

     Case 2: FILTER_PREDICATE with multiple criteria
     Examples: 
       LastModifiedDate >= TIMESTAMP '2025-04-01 00:00:00 Pacific/Tahiti' AND Id = "0012w00001CotGiAAJ"
       LastModifiedDate >= TIMESTAMP "1702600882" AND Id = "001gL000002i26MQAQ"

     Case 3: FILTER_PREDICATE single criterion with LIMIT
     Examples: 
       LastModifiedDate >= TIMESTAMP "1702600882" LIMIT 2

     Case 4: FILTER_PREDICATE with LIMIT
     Examples: 
       LIMIT 2
```
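The following sketch shows one way to assemble a `FILTER_PREDICATE` value and pass it in the connection options. `build_filter_predicate` is a hypothetical helper. Its operator check is deliberately naive and relies on the connection options reference, which lists `AND` as the only supported operator.

```python
import re
from typing import List

# Naive check for operators listed as unsupported (OR, IN).
# It also flags OR/IN inside quoted values, which is acceptable for a sketch.
_UNSUPPORTED = re.compile(r"\b(OR|IN)\b", re.IGNORECASE)


def build_filter_predicate(criteria: List[str], limit: int = 0) -> str:
    """Join criteria with AND and optionally append LIMIT (hypothetical helper)."""
    for criterion in criteria:
        if _UNSUPPORTED.search(criterion):
            raise ValueError(f"Unsupported operator in: {criterion}")
    predicate = " AND ".join(criteria)
    if limit > 0:
        # With no criteria this yields just "LIMIT n", as in Case 4 above.
        predicate = f"{predicate} LIMIT {limit}".strip()
    return predicate


connection_options = {
    "connectionName": "connectionName",
    "ENTITY_NAME": "Account",
    "API_VERSION": "v60.0",
    "FILTER_PREDICATE": build_filter_predicate(
        [
            "LastModifiedDate >= TIMESTAMP '2025-04-01 00:00:00 Pacific/Tahiti'",
            'Id = "0012w00001CotGiAAJ"',
        ]
    ),
}
print(connection_options["FILTER_PREDICATE"])
```

The resulting dictionary can be passed as `connection_options` to `glueContext.create_dynamic_frame.from_options`, as in the read examples above.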

# Writing to Salesforce
<a name="salesforce-writing-to"></a>

**Prerequisites**

A Salesforce sObject you would like to write to. You will need the object name such as `Account` or `Case` or `Opportunity`.

The Salesforce connector supports four write operations:
+ INSERT
+ UPSERT
+ UPDATE
+ DELETE

When using the `UPSERT` write operation, the `ID_FIELD_NAMES` option must be provided to specify the external ID field for the records.

You can also set the following connection options:
+ `TRANSFER_MODE`: Supports two modes: `SYNC` and `ASYNC`. The default is `SYNC`. When set to `ASYNC`, Bulk API 2.0 Ingest is used for processing.
+ `FAIL_ON_FIRST_ERROR`: The default value is `FALSE`, which means the AWS Glue job continues processing all the data even if some records fail to write. When set to `TRUE`, the AWS Glue job fails if any record fails to write, and it does not continue processing.

**Example**

```
salesforce_write = glueContext.write_dynamic_frame.from_options(
    frame=frameToWrite,
    connection_type="salesforce",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "Account",
        "API_VERSION": "v60.0",
        "WRITE_OPERATION": "INSERT",
        "TRANSFER_MODE": "ASYNC",
        "FAIL_ON_FIRST_ERROR": "true"
    }
)
```
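For operations other than `INSERT`, a small validation step before the write can catch a missing `ID_FIELD_NAMES` early. The sketch below is a hypothetical helper, not part of the AWS Glue API. It treats `ID_FIELD_NAMES` as required for both `UPSERT` and `UPDATE`, following the connection options reference. The field name `ExternalId__c` is made up for illustration.

```python
from typing import Dict, Optional

WRITE_OPERATIONS = ("INSERT", "UPSERT", "UPDATE", "DELETE")
# The connection options reference marks ID_FIELD_NAMES as required for these.
NEEDS_ID_FIELD_NAMES = ("UPSERT", "UPDATE")


def build_write_options(
    connection_name: str,
    entity_name: str,
    write_operation: str = "INSERT",
    id_field_names: Optional[str] = None,
    api_version: str = "v60.0",
) -> Dict[str, str]:
    """Validate and assemble connection_options for a Salesforce write."""
    operation = write_operation.upper()
    if operation not in WRITE_OPERATIONS:
        raise ValueError(f"Unsupported WRITE_OPERATION: {write_operation}")
    if operation in NEEDS_ID_FIELD_NAMES and not id_field_names:
        raise ValueError(f"{operation} requires ID_FIELD_NAMES")
    options = {
        "connectionName": connection_name,
        "ENTITY_NAME": entity_name,
        "API_VERSION": api_version,
        "WRITE_OPERATION": operation,
    }
    if id_field_names:
        options["ID_FIELD_NAMES"] = id_field_names
    return options


# "ExternalId__c" is a made-up external ID field used only for illustration.
upsert_options = build_write_options(
    "connectionName", "Account", "UPSERT", id_field_names="ExternalId__c"
)
print(upsert_options)
```

The returned dictionary can be passed as `connection_options` to `glueContext.write_dynamic_frame.from_options`, as in the example above.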

# Salesforce connection options
<a name="salesforce-connection-options"></a>

The following connection options are supported for the Salesforce connector:
+ `ENTITY_NAME` (String) - (Required) Used for read/write. The name of your object in Salesforce.
+ `API_VERSION` (String) - (Required) Used for read/write. The Salesforce REST API version you want to use.
+ `SELECTED_FIELDS` (List<String>) - Default: empty (SELECT \*). Used for read. The columns you want to select for the object.
+ `FILTER_PREDICATE` (String) - Default: empty. Used for read. It should be in the Spark SQL format.

  When providing a filter predicate, only the `AND` operator is supported. Other operators such as `OR` and `IN` are not currently supported.
+ `QUERY` (String) - Default: empty. Used for read. A full Spark SQL query.
+ `PARTITION_FIELD` (String) - Used for read. The field used to partition the query.
+ `LOWER_BOUND` (String) - Used for read. An inclusive lower bound value of the chosen partition field.
+ `UPPER_BOUND` (String) - Used for read. An exclusive upper bound value of the chosen partition field.
+ `NUM_PARTITIONS` (Integer) - Default: 1. Used for read. The number of partitions for the read.
+ `IMPORT_DELETED_RECORDS` (String) - Default: FALSE. Used for read. Whether to include deleted records in the query results.
+ `WRITE_OPERATION` (String) - Default: INSERT. Used for write. The value must be one of INSERT, UPDATE, UPSERT, or DELETE.
+ `ID_FIELD_NAMES` (String) - Default: null. Required for UPDATE and UPSERT.

# Limitations for the Salesforce connector
<a name="salesforce-connector-limitations"></a>

The following are limitations for the Salesforce connector:
+ Only Spark SQL is supported. Salesforce SOQL is not supported.
+ Job bookmarks are not supported.
+ Salesforce field names are case sensitive. When writing to Salesforce, data must match the casing of the fields defined within Salesforce.

# Set up the Authorization Code flow for Salesforce
<a name="salesforce-setup-authorization-code-flow"></a>

Refer to the Salesforce public documentation to enable the OAuth 2.0 Authorization Code flow.

To configure the connected app:

1. Activate the **Enable OAuth Settings** checkbox.

1. In the **Callback URL** text field, enter one or more redirect URLs for AWS Glue.

   Redirect URLs have the following format:

   https://*region*.console.aws.amazon.com/gluestudio/oauth

   In this URL, *region* is the code for the AWS Region where you use AWS Glue to transfer data from Salesforce. For example, the code for the US East (N. Virginia) Region is `us-east-1`. For that Region, the URL is the following:

   https://us-east-1.console.aws.amazon.com/gluestudio/oauth

   For the AWS Regions that AWS Glue supports, and their codes, see [AWS Glue endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/glue.html) in the *AWS General Reference*.

1. Activate the **Require Secret for Web Server Flow** checkbox.

1. In the **Available OAuth Scopes** list, add the following scopes:
   + Manage user data via APIs (api)
   + Access custom permissions (custom\_permissions)
   + Access the identity URL service (id, profile, email, address, phone)
   + Access unique user identifiers (openid)
   + Perform requests at any time (refresh\_token, offline\_access)

1. Set the refresh token policy for the connected app to **Refresh token is valid until revoked**. Otherwise, your jobs will fail when your refresh token expires. For more information on how to check and edit the refresh token policy, see [Manage OAuth Access Policies for a Connected App](https://help.salesforce.com/articleView?id=connected_app_manage_oauth.htm) in the Salesforce documentation.
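
When a connected app must serve several Regions, the callback URLs from step 2 can be generated rather than typed by hand. This is a minimal sketch that follows the URL format shown above. The helper name `glue_oauth_redirect_url` is hypothetical, and the second Region in the example is an arbitrary choice.

```python
def glue_oauth_redirect_url(region: str) -> str:
    """Return the AWS Glue OAuth callback URL for an AWS Region code."""
    region = region.strip().lower()
    if not region:
        raise ValueError("region must not be empty")
    return f"https://{region}.console.aws.amazon.com/gluestudio/oauth"


# One URL per line, matching how multiple callback URLs are usually entered.
callback_urls = [glue_oauth_redirect_url(r) for r in ("us-east-1", "eu-west-1")]
print("\n".join(callback_urls))
```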

# Set up the JWT bearer OAuth flow for Salesforce
<a name="salesforce-setup-jwt-bearer-oauth"></a>

Refer to the Salesforce public documentation for enabling server-to-server integration with [OAuth 2.0 JSON Web Tokens](https://help.salesforce.com/s/articleView?id=sf.remoteaccess_oauth_jwt_flow.htm).

After you have created a JWT and configured the connected app in Salesforce, you can create a new Salesforce connection with the `JWT_TOKEN` key set in your Secrets Manager secret. Set the OAuth grant type to **JWT Bearer Token** when creating the connection.