

# Connecting to SAP OData
<a name="connecting-to-data-sap-odata"></a>

SAP OData is a standard web protocol for querying and updating data present in SAP using ABAP (Advanced Business Application Programming). It applies and builds on web technologies such as HTTP to provide access to information from a variety of external applications, platforms, and devices. With the AWS Glue connector for SAP OData, you can integrate with your SAP system, application, or data.

**Topics**
+ [AWS Glue support for SAP OData](sap-odata-support.md)
+ [Create connections](sap-odata-creating-connections.md)
+ [Creating SAP OData job](sap-odata-creating-job.md)
+ [Writing to SAP OData](sap-odata-writing.md)
+ [Using the SAP OData state management script](sap-odata-state-management-script.md)
+ [Partitioning for Non ODP entities](sap-odata-non-odp-entities-partitioning.md)
+ [SAP OData connection options](sap-odata-connection-options.md)
+ [SAP OData entity and field details](sap-odata-entity-field-details.md)

# AWS Glue support for SAP OData
<a name="sap-odata-support"></a>

AWS Glue supports SAP OData as follows:

**Supported as a source?**  
Yes. You can use AWS Glue ETL jobs to query data from SAP OData.

**Supported as a target?**  
Yes. You can use AWS Glue ETL jobs to write records into SAP OData.

**Supported SAP OData API versions**  
The following SAP OData API versions are supported:
+ 2.0

**Supported sources**  
The following sources are supported:
+ ODP (Operational Data Provisioning) Sources:
  + BW Extractors (DataSources)
  + CDS Views
  + SLT
+ Non-ODP Sources, for example:
  + CDS View Services
  + RFC-based Services
  + Custom ABAP Services

**Supported SAP Components**  
The following are minimum requirements:
+ You must enable catalog service for service discovery.
  + Configure operational data provisioning (ODP) data sources for extraction in the SAP Gateway of your SAP system.
  + **OData V2.0**: Enable the OData V2.0 catalog service(s) in your SAP Gateway via transaction `/IWFND/MAINT_SERVICE`.
  + Enable OData V2.0 services in your SAP Gateway via transaction `/IWFND/MAINT_SERVICE`.
  + Your SAP OData service must support client side pagination/query options such as `$top` and `$skip`. It must also support system query option `$count`.
  + You must provide the required authorization for the user in SAP to discover the services and extract data using SAP OData services. Refer to the security documentation provided by SAP.
+ If you want to use OAuth 2.0 as an authorization mechanism, you must enable OAuth 2.0 for the OData service and register the OAuth client per SAP documentation.
+ To generate an OData service based on ODP data sources, SAP Gateway Foundation must be installed locally in your ERP/BW stack or in a hub configuration.
  + For your ERP/BW applications, the SAP NetWeaver AS ABAP stack must be at 7.50 SP02 or above.
  + For the hub system (SAP Gateway), the SAP NetWeaver AS ABAP of the hub system must be 7.50 SP01 or above for remote hub setup.
+ For non-ODP sources, your SAP NetWeaver stack version must be 7.40 SP02 or above.

**Supported Authentication Methods**  
The following authentication methods are supported:
+ Basic Authentication
+ OAuth 2.0

# Prerequisites
<a name="sap-odata-prerequisites"></a>

Prior to initiating an AWS Glue job for data extraction from SAP OData using the SAP OData connection, complete the following prerequisites:
+ The relevant SAP OData Service must be activated in the SAP system, ensuring the data source is available for consumption. If the OData service is not activated, the Glue job will not be able to access or extract data from SAP.
+ Appropriate authentication mechanisms such as basic (custom) authentication or OAuth 2.0 must be configured in SAP to ensure that the AWS Glue job can successfully establish a connection with the SAP OData service.
+ Configure IAM policies to grant the AWS Glue job appropriate permissions for accessing SAP, Secrets Manager, and other AWS resources involved in the process.
+ If the SAP system is hosted within a private network, VPC connectivity must be configured to ensure that the AWS Glue job can securely communicate with SAP without exposing sensitive data over public internet.

AWS Secrets Manager can be used to securely store sensitive information such as SAP credentials, which the AWS Glue job can dynamically retrieve at runtime. This approach eliminates the need to hard-code credentials, enhancing security and flexibility.

The following prerequisites provide step-by-step guidance on how to set up each component for a smooth integration between AWS Glue and SAP OData.

**Topics**
+ [SAP OData activation](sap-odata-activation.md)
+ [IAM policies](sap-odata-configuring-iam-permissions.md)
+ [Connectivity / VPC Connection](sap-odata-connectivity-vpc-connection.md)
+ [SAP Authentication](sap-odata-authentication.md)
+ [AWS Secrets Manager to store your Auth secret](sap-odata-aws-secret-manager-auth-secret.md)

# SAP OData activation
<a name="sap-odata-activation"></a>

Complete the following steps for SAP OData connection:

## ODP Sources
<a name="sap-odata-odp-sources"></a>

Before you can transfer data from an ODP provider, you must meet the following requirements:
+ You have an SAP NetWeaver AS ABAP instance.
+ Your SAP NetWeaver instance contains an ODP provider that you want to transfer data from. ODP providers include:
  + SAP DataSources (Transaction code RSO2)
  + SAP Core Data Services ABAP CDS Views
  + SAP BW or SAP BW/4HANA systems (InfoObject, DataStore Object)
  + Real-time replication of Tables and DB-Views from SAP Source System via SAP Landscape Replication Server (SAP SLT)
  + SAP HANA Information Views in SAP ABAP based Sources
+ Your SAP NetWeaver instance has the SAP Gateway Foundation component.
+ You have created an OData service that extracts data from your ODP provider. To create the OData service, you use the SAP Gateway Service Builder. To access your ODP data, AWS Glue calls this service by using the OData API. For more information, see [Generating a Service for Extracting ODP Data via OData](https://help.sap.com/docs/SAP_BPC_VERSION_BW4HANA/dd104a87ab9249968e6279e61378ff66/69b481859ef34bab9cc7d449e6fff7b6.html?version=11.0) in the SAP BW/4HANA documentation.
+ To generate an OData service based on ODP data sources, SAP Gateway Foundation must be installed locally in your ERP/BW stack or in a hub configuration.
  + For your ERP/BW applications, the SAP NetWeaver AS ABAP stack must be at 7.50 SP02 or above.
  + For the hub system (SAP Gateway), the SAP NetWeaver AS ABAP of the hub system must be 7.50 SP01 or above for remote hub setup.

## Non-ODP Sources
<a name="sap-odata-non-odp-sources"></a>
+ Your SAP NetWeaver stack version must be 7.40 SP02 or above.
+ You must enable catalog service for service discovery.
  + **OData V2.0**: The OData V2.0 catalog service(s) can be enabled in your SAP Gateway via transaction `/IWFND/MAINT_SERVICE`
+ Your SAP OData service must support client side pagination/query options such as `$top` and `$skip`. It must also support system query option `$count`.
+ For OAuth 2.0, you must enable OAuth 2.0 for the OData service and register the OAuth client per SAP documentation and set the authorized redirect URL as follows:
  + `https://<region>.console.aws.amazon.com/gluestudio/oauth`, replacing `<region>` with the Region where AWS Glue is running, for example, `us-east-1`.
  + You must enable secure setup for connecting over HTTPS.
+ You must provide the required authorization for the user in SAP to discover the services and extract data using SAP OData services. Refer to the security documentation provided by SAP.

# IAM policies
<a name="sap-odata-configuring-iam-permissions"></a>

## Policies containing the API operations for creating and using connections
<a name="sap-odata-policies-api-operations"></a>

The following sample policy describes the required AWS IAM permissions for creating and using connections. If you are creating a new role, create a policy that contains the following:

------
#### [ JSON ]

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:ListConnectionTypes",
        "glue:DescribeConnectionType",
        "glue:RefreshOAuth2Tokens",
        "glue:ListEntities",
        "glue:DescribeEntity"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:DescribeSecret",
        "secretsmanager:GetSecretValue",
        "secretsmanager:PutSecretValue"
      ],
      "Resource": "*"
    }
  ]
}
```

------

The role must also grant access to all other resources used by the job, for example Amazon S3. As an alternative to the sample policy above, you can use the following managed IAM policies:
+ [AWSGlueServiceRole](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole) – Grants access to resources that various AWS Glue processes require to run on your behalf. These resources include AWS Glue, Amazon S3, IAM, CloudWatch Logs, and Amazon EC2. If you follow the naming convention for resources specified in this policy, AWS Glue processes have the required permissions. This policy is typically attached to roles specified when defining crawlers, jobs, and development endpoints.
+ [AWSGlueConsoleFullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AWSGlueConsoleFullAccess) – Grants full access to AWS Glue resources when an identity that the policy is attached to uses the AWS Management Console. If you follow the naming convention for resources specified in this policy, users have full console capabilities. This policy is typically attached to users of the AWS Glue console.
+ [SecretsManagerReadWrite](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/SecretsManagerReadWrite) – Provides read/write access to AWS Secrets Manager via the AWS Management Console. Note: this excludes IAM actions, so combine with `IAMFullAccess` if rotation configuration is required.
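
The sample policy for creating and using connections grants the Secrets Manager actions on all resources (`"Resource": "*"`). Where that is broader than you need, the Secrets Manager statement can be limited to the secret used by the connection. The following is a minimal sketch that builds such a policy document in Python. The helper name `build_connection_policy` and the secret ARN are illustrative placeholders, not part of any AWS SDK.

```python
import json

# Actions copied from the sample policy in this section.
GLUE_ACTIONS = [
    "glue:ListConnectionTypes",
    "glue:DescribeConnectionType",
    "glue:RefreshOAuth2Tokens",
    "glue:ListEntities",
    "glue:DescribeEntity",
]
SECRET_ACTIONS = [
    "secretsmanager:DescribeSecret",
    "secretsmanager:GetSecretValue",
    "secretsmanager:PutSecretValue",
]


def build_connection_policy(secret_arn="*"):
    """Return the sample policy as a dict, optionally scoped to one secret ARN."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": GLUE_ACTIONS, "Resource": "*"},
            {"Effect": "Allow", "Action": SECRET_ACTIONS, "Resource": secret_arn},
        ],
    }


# Placeholder ARN: replace with the ARN of your own secret.
policy = build_connection_policy(
    "arn:aws:secretsmanager:us-east-1:111122223333:secret:sap-odata-secret-AbCdEf"
)
policy_json = json.dumps(policy, indent=2)
```

The resulting `policy_json` string can be pasted into the IAM console's JSON policy editor.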

**IAM Policies/Permissions needed to configure VPC**

The following IAM permissions are required while using VPC connection for creating AWS Glue Connection. For more details, refer to [create an IAM policy for AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/create-service-policy.html).

------
#### [ JSON ]

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateNetworkInterface",
        "ec2:DeleteNetworkInterface",
        "ec2:DescribeNetworkInterfaces"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
```

------

# Connectivity / VPC Connection
<a name="sap-odata-connectivity-vpc-connection"></a>

Steps for VPC Connection:

1. Use an existing VPC or create a new one by following the [Amazon VPC documentation](https://docs.aws.amazon.com/vpc/latest/userguide/create-vpc.html).

1. Make sure the VPC has a NAT gateway that routes traffic to the internet.

1. Create a VPC endpoint of type Gateway for Amazon S3.

1. Enable DNS resolution and DNS hostnames so that the VPC uses the AWS-provided DNS services.

1. Go to the VPC you created and add the necessary endpoints for other services such as AWS STS, AWS Glue, and Secrets Manager.

   1. Choose Create Endpoint.

   1. For Service Category, choose AWS Services.

   1. For Service Name, choose the service that you are connecting to.

   1. Choose the VPC and enable DNS names.

   1. VPC endpoints required for the VPC connection:

      1. [STS](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_sts_vpc_endpoint_create.html)

      1. [AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/vpc-interface-endpoints.html)

      1. [Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/vpc-endpoint-overview.html)

## Security Group Configuration
<a name="sap-odata-security-group-configuration"></a>

The security group of the endpoint (either NLB or SAP instance) must allow traffic to its listening port from the AWS Glue VPC so that AWS Glue can connect to it. It is a good practice to restrict the range of source IP addresses as much as possible.

AWS Glue requires a security group that allows all inbound traffic from itself. You can create a self-referencing rule that allows all traffic originating from the security group, either in a new security group or by modifying an existing one and specifying the security group itself as the source.

Open the communication from the HTTPS ports of the URL endpoint (either NLB or SAP instance).

## Connectivity options
<a name="sap-odata-connectivity-options"></a>
+ HTTPS connection with internal and external NLB, SSL certificate from certificate authority (CA), not self-signed SSL certificate
+ HTTPS connection with SAP instance SSL certificate from certificate authority (CA), not self-signed SSL certificate

# SAP Authentication
<a name="sap-odata-authentication"></a>

The SAP connector supports both CUSTOM (this is SAP BASIC authentication) and OAUTH authentication methods.

## Custom Authentication
<a name="sap-odata-custom-authentication"></a>

AWS Glue supports Custom (Basic Authentication) as a method for establishing connections to your SAP systems, allowing the use of a username and password for secure access. This auth type works well for automation scenarios as it allows using username and password up front with the permissions of a particular user in the SAP OData instance. AWS Glue is able to use the username and password to authenticate SAP OData APIs. In AWS Glue, basic authorization is implemented as custom authorization.

For public SAP OData documentation for Basic Auth flow, see [HTTP Basic Authentication](https://help.sap.com/docs/SAP_SUCCESSFACTORS_PLATFORM/d599f15995d348a1b45ba5603e2aba9b/5c8bca0af1654b05a83193b2922dcee2.html).

## OAuth 2.0 Authentication
<a name="sap-odata-oauth-2.0-authentication"></a>

AWS Glue also supports OAuth 2.0 as a secure authentication mechanism for establishing connections to your SAP systems. This enables seamless integration while ensuring compliance with modern authentication standards and enhancing the security of data access.

## AUTHORIZATION\_CODE Grant Type
<a name="sap-odata-authentication-code-grant-type"></a>

The grant type determines how AWS Glue communicates with SAP OData to request access to your data. SAP OData supports only the `AUTHORIZATION_CODE` grant type. This grant type is considered "three-legged" OAuth as it relies on redirecting users to the third-party authorization server to authenticate the user. It is used when creating connections via the AWS Glue console. 

Users may still opt to create their own connected app in SAP OData and provide their own client ID and client secret when creating connections through the AWS Glue console. In this scenario, they will still be redirected to SAP OData to log in and authorize AWS Glue to access their resources.

This grant type results in a refresh token and access token. The access token is short lived, and may be refreshed automatically without user interaction using the refresh token.

For public SAP OData documentation on creating a connected app for Authorization Code OAuth flow, see [Authentication Using OAuth 2.0](https://help.sap.com/docs/ABAP_PLATFORM_NEW/e815bb97839a4d83be6c4fca48ee5777/2e5104fd87ff452b9acb247bd02b9f9e.html).

# AWS Secrets Manager to store your Auth secret
<a name="sap-odata-aws-secret-manager-auth-secret"></a>

You will need to store the SAP OData connection secrets in AWS Secrets Manager, configure the necessary permissions for retrieval as specified in the [IAM policies](sap-odata-configuring-iam-permissions.md) section, and use it while creating a connection.

Use the AWS Management Console for AWS Secrets Manager to create a secret for your SAP source. For more information, see [Create an AWS Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html). Details in AWS Secrets Manager should include the elements in the following code. 

## Custom Authentication Secret
<a name="sap-odata-custom-auth-secret"></a>

Enter your SAP system username in place of *<your SAP username>*, its password in place of *<your SAP username password>*, and `True` or `False` in place of *<True/False>*. Setting `basicAuthDisableSSO` to `true` disables Single Sign-On (SSO) for Basic Authentication requests, requiring explicit user credentials for each request. Conversely, setting it to `false` allows the use of existing SSO sessions if available.

```
{
   "basicAuthUsername": "<your SAP username>",
   "basicAuthPassword": "<your SAP username password>",
   "basicAuthDisableSSO": "<True/False>",
   "customAuthenticationType": "CustomBasicAuth"
}
```
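
If you prefer to create the secret programmatically rather than in the console, the payload above can be assembled and stored with the AWS SDK for Python. The following is a sketch under stated assumptions: the helper names and the secret name are illustrative, and `basicAuthDisableSSO` is written as a string because that is how the sample above shows it.

```python
import json


def build_custom_auth_secret(username, password, disable_sso):
    """Build the secret string for CUSTOM (basic) authentication."""
    return json.dumps({
        "basicAuthUsername": username,
        "basicAuthPassword": password,
        # Assumption: the value is stored as the string "True" or "False",
        # matching the <True/False> placeholder in the sample above.
        "basicAuthDisableSSO": "True" if disable_sso else "False",
        "customAuthenticationType": "CustomBasicAuth",
    })


def create_secret(secret_name, secret_string):
    """Store the secret in AWS Secrets Manager (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("secretsmanager")
    return client.create_secret(Name=secret_name, SecretString=secret_string)


# Example usage (hypothetical names; read the password from a prompt or
# environment variable rather than hard-coding it):
#   import getpass
#   payload = build_custom_auth_secret("SAP_USER", getpass.getpass(), True)
#   create_secret("sap-odata-secret", payload)
```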

## OAuth 2.0 Secret
<a name="sap-odata-oauth-2.0-secret"></a>

If you use OAuth 2.0 as your authentication mechanism, the secret in AWS Secrets Manager must contain the client secret of your user managed client application in the following format. Enter your SAP client secret in place of `<your client secret>`.

```
{
   "USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET": "<your client secret>"
}
```

# Create connections
<a name="sap-odata-creating-connections"></a>

To configure an SAP OData connection:

1. Sign in to the AWS Management Console and open the [AWS Glue console](https://console.aws.amazon.com/glue). In the AWS Glue Studio, create a connection by following the steps below:

   1. Click Data connections on the left panel.

   1. Click on Create connection.

   1. Select **SAP OData** in **Choose data source**.

   1. Provide the **Application host URL** of the SAP OData instance you want to connect to. For a non-VPC connection, this URL must be accessible over the public internet.

   1. Provide the **Application service path** of the SAP OData instance you want to connect to. This is the same as the catalog service path. For example: `/sap/opu/odata/iwfnd/catalogservice;v=2`. AWS Glue doesn’t accept a path to a specific object.

   1. Provide the **Client number** of the SAP OData instance you want to connect to. Acceptable values are [001-999]. Example: 010

   1. Provide the **Port number** of the SAP OData instance you want to connect to. Example: 443

   1. Provide the **Logon language** of the SAP OData instance you want to connect to. Example: EN

   1. Select the AWS IAM role which AWS Glue can assume and has permissions as outlined in the [IAM policies](sap-odata-configuring-iam-permissions.md) section.

   1. Select the **Authentication Type** which you want to use for this connection in AWS Glue from the dropdown list: OAUTH2 or CUSTOM

      1. CUSTOM - Select the secret you created as specified in the [AWS Secrets Manager to store your Auth secret](sap-odata-aws-secret-manager-auth-secret.md) section.

      1. OAUTH 2.0 - enter the following inputs only in case of OAuth 2.0:

         1. Under **User Managed Client Application ClientId**, enter your client id.

         1. Select the secret that contains `USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET` (your client secret), which you created in the [AWS Secrets Manager to store your Auth secret](sap-odata-aws-secret-manager-auth-secret.md) section.

         1. Under **Authorization Code URL**, enter your authorization code URL.

         1. Under **Authorization Tokens URL**, enter your authorization token URL.

         1. Under **OAuth Scopes**, enter your OAuth scopes separated by space. Example: `/IWFND/SG_MED_CATALOG_0002 ZAPI_SALES_ORDER_SRV_0001`

   1. Select the network options if you want to use your network. For more details, see [Connectivity / VPC Connection](sap-odata-connectivity-vpc-connection.md).

1. Grant the IAM role associated with your AWS Glue job permission to read `secretName`. For more details, see the [IAM policies](sap-odata-configuring-iam-permissions.md).

1. Choose **Test connection** and test your connection. If the connection test passes, click next, enter your connection name and save your connection. Test connection functionality is not available if you have chosen Network options (VPC). 
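
Because **Test connection** is not available for VPC connections, it can help to sanity-check the connection inputs before saving. The following sketch checks the values described in the steps above. The function name is illustrative, the HTTPS requirement is an assumption based on the connectivity options listed in this guide, and the checks are not the validation that the AWS Glue console itself performs.

```python
import re
from urllib.parse import urlparse


def validate_connection_inputs(host_url, service_path, client_number, port):
    """Return a list of problems found in the connection inputs (empty if none)."""
    problems = []

    parsed = urlparse(host_url)
    # Assumption: only HTTPS endpoints are used, as in the connectivity options.
    if parsed.scheme != "https" or not parsed.netloc:
        problems.append("Application host URL should look like https://<host>")

    # The service path is the catalog service path, for example
    # /sap/opu/odata/iwfnd/catalogservice;v=2
    if not service_path.startswith("/"):
        problems.append("Application service path should start with '/'")

    # Acceptable client numbers are 001-999, written with three digits.
    if not re.fullmatch(r"\d{3}", client_number) or client_number == "000":
        problems.append("Client number should be three digits between 001 and 999")

    if not (isinstance(port, int) and 1 <= port <= 65535):
        problems.append("Port number should be an integer between 1 and 65535")

    return problems


issues = validate_connection_inputs(
    "https://sap.example.com",  # hypothetical host
    "/sap/opu/odata/iwfnd/catalogservice;v=2",
    "010",
    443,
)
```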

# Creating SAP OData job
<a name="sap-odata-creating-job"></a>

Refer to [Building visual ETL jobs with AWS Glue Studio](https://docs.aws.amazon.com/glue/latest/dg/author-job-glue.html).

# Operational Data Provisioning (ODP) Sources
<a name="sap-odata-operational-data-provisioning-sources"></a>

Operational Data Provisioning (ODP) provides a technical infrastructure that you can use to support data extraction and replication for various target applications, and it supports delta mechanisms in these scenarios. In a delta procedure, the data from a source (ODP provider) is automatically written to a delta queue (the Operational Delta Queue, ODQ) using an update process or passed to the delta queue using an extractor interface. An ODP provider can be a DataSource (extractor), an ABAP Core Data Services view (ABAP CDS view), SAP BW or SAP BW/4HANA, SAP Landscape Transformation Replication Server (SLT), or an SAP HANA information view (calculation view). The target applications (referred to as ODQ "subscribers" or, more generally, "ODP consumers") retrieve the data from the delta queue and continue processing the data.

## Full Load
<a name="sap-odata-full-load"></a>

In the context of SAP OData and ODP entities, a **Full Load** refers to the process of extracting all available data from an ODP entity in a single operation. This operation retrieves the complete dataset from the source system, ensuring that the target system has a comprehensive and up-to-date copy of the entity's data. Full loads are typically used for sources that do not support incremental loads or when a refresh of the target system is required.

**Example**

You can explicitly set the `ENABLE_CDC` flag to `false` when creating the DynamicFrame. Note: `ENABLE_CDC` is `false` by default, so if you don’t want to initialize the delta queue, you can omit this flag. Unless the flag is set to `true`, the job performs a full load extraction.

```
sapodata_df = glueContext.create_dynamic_frame.from_options(
    connection_type="SAPOData",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "entityName",
        "ENABLE_CDC": "false"
    }, transformation_ctx=key)
```

## Incremental Load
<a name="sap-odata-incremental-load"></a>

An **incremental load** in the context of ODP (Operational Data Provisioning) entities involves extracting only the new or changed data (deltas) from the source system since the last data extraction, avoiding reprocessing of records that have already been processed. This approach reduces data transfer volumes and processing time and keeps systems synchronized efficiently, especially for large datasets that change frequently.

# Delta Token based Incremental Transfers
<a name="sap-odata-incremental-transfers"></a>

To enable Incremental Transfer using Change Data Capture (CDC) for ODP-enabled entities that support it, follow these steps:

1. Create the Incremental Transfer job in script mode.

1. When creating the DataFrame or AWS Glue DynamicFrame, pass the option `"ENABLE_CDC": "true"`. This option ensures that you receive a delta token from SAP, which can be used for subsequent retrieval of changed data.

The delta token will be present in the last row of the dataframe, in the `DELTA_TOKEN` column. This token can be used as a connector option in subsequent calls to incrementally retrieve the next set of data.

**Example**
+ Set the `ENABLE_CDC` flag to `true` when creating the DynamicFrame. Note: `ENABLE_CDC` is `false` by default. If the flag is not set to `true`, the delta queue is not initialized and the job performs a full load extraction.

  ```
  sapodata_df = glueContext.create_dynamic_frame.from_options(
      connection_type="SAPOData",
      connection_options={
          "connectionName": "connectionName",
          "ENTITY_NAME": "entityName",
          "ENABLE_CDC": "true"
      }, transformation_ctx=key)
  
  # Extract the delta token from the last row of the DELTA_TOKEN column
  delta_token_1 = your_logic_to_extract_delta_token(sapodata_df) # e.g., D20241029164449_000370000
  ```
+ The extracted delta token can be passed as an option to retrieve new events.

  ```
  sapodata_df_2 = glueContext.create_dynamic_frame.from_options(
      connection_type="SAPOData",
      connection_options={
          "connectionName": "connectionName",
          "ENTITY_NAME": "entityName",
          # Pass the delta token retrieved in the last run
          "DELTA_TOKEN": delta_token_1
      }, transformation_ctx=key)
  
  # Extract the new delta token for the next run
  delta_token_2 = your_logic_to_extract_delta_token(sapodata_df_2)
  ```

Note that the last record, in which the `DELTA_TOKEN` is present, is not a transactional record from source, and is only there for the purpose of passing the delta token value.
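
The placeholder `your_logic_to_extract_delta_token` in the examples above can be implemented in several ways. The following is a minimal sketch that works on rows represented as Python dictionaries, for example `[r.asDict() for r in sapodata_df.toDF().collect()]`. It assumes that ordinary data records carry a null or empty `DELTA_TOKEN` value, which this guide does not state explicitly; the function name is illustrative.

```python
def split_delta_token(rows):
    """Separate data records from the record that carries the delta token.

    Returns (data_rows, delta_token). delta_token is None when no row
    contains a non-empty DELTA_TOKEN value.
    """
    data_rows = []
    delta_token = None
    for row in rows:
        token = row.get("DELTA_TOKEN")
        if token:
            # Token-carrying record: keep the token, drop the record itself
            # because it is not a transactional record from the source.
            delta_token = token
        else:
            data_rows.append(row)
    return data_rows, delta_token


sample_rows = [
    {"SalesOrder": "1001", "DELTA_TOKEN": None},
    {"SalesOrder": "1002", "DELTA_TOKEN": None},
    {"SalesOrder": None, "DELTA_TOKEN": "D20241029164449_000370000"},
]
records, delta_token_1 = split_delta_token(sample_rows)
```

Collecting all rows to the driver is only practical for small results. For larger datasets, the same idea can be expressed on the Spark DataFrame itself, for example by filtering with `sapodata_df.toDF().filter("DELTA_TOKEN IS NOT NULL")` and reading the single matching row.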

Apart from the `DELTA_TOKEN`, the following fields are returned in each row of the dataframe. 
+ **GLUE\_FETCH\_SQ**: This is a sequence field, generated from the epoch timestamp in the order the record was received, and is unique for each record. This can be used if you need to know or establish the order of changes in the source system. This field will be present only for ODP enabled entities.
+ **DML\_STATUS**: This will show `UPDATED` for all newly inserted and updated records from the source, and `DELETED` for records that have been deleted from source.

For an example of how to manage state and reuse the delta token to retrieve changed records, refer to the [Using the SAP OData state management script](sap-odata-state-management-script.md) section.
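
The sequence and status fields described above can be used to replay changes against a target in source order. The following sketch applies change records to an in-memory dictionary keyed by a primary key. It is illustrative only: the key field `SalesOrder` is hypothetical, and it assumes that `GLUE_FETCH_SQ` values are directly comparable (for example, integers).

```python
CONTROL_FIELDS = ("GLUE_FETCH_SQ", "DML_STATUS", "DELTA_TOKEN")


def apply_changes(target, changes, key_field):
    """Apply change records to `target` (a dict keyed by primary key) in source order."""
    for record in sorted(changes, key=lambda r: r["GLUE_FETCH_SQ"]):
        key = record[key_field]
        if record["DML_STATUS"] == "DELETED":
            target.pop(key, None)
        else:
            # "UPDATED" covers both newly inserted and updated records.
            target[key] = {
                k: v for k, v in record.items() if k not in CONTROL_FIELDS
            }
    return target


existing = {"1001": {"SalesOrder": "1001", "Qty": 1}}
changes = [
    {"SalesOrder": "1001", "Qty": 5, "GLUE_FETCH_SQ": 3, "DML_STATUS": "DELETED"},
    {"SalesOrder": "1001", "Qty": 2, "GLUE_FETCH_SQ": 1, "DML_STATUS": "UPDATED"},
    {"SalesOrder": "1002", "Qty": 7, "GLUE_FETCH_SQ": 2, "DML_STATUS": "UPDATED"},
]
result = apply_changes(existing, changes, "SalesOrder")
```

Sorting by `GLUE_FETCH_SQ` matters here: order `1001` is updated first and deleted last, so it does not appear in the final result.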

## Delta Token Invalidation
<a name="sap-odata-invalidation"></a>

A delta token is associated with the service collection and a user. If a new initial pull with `"ENABLE_CDC": "true"` is initiated for the same service collection and user, the SAP OData service invalidates all delta tokens issued as a result of a previous initialization. Invoking the connector with an invalidated delta token leads to the following exception:

`Could not open data access via extraction API RODPS_REPL_ODP_OPEN` 
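
A job can detect this condition and decide how to recover. The following sketch recognizes the error by its message text and falls back to a caller-supplied re-initialization step. Both function names are illustrative, and matching on the message text is an assumption: how the exception surfaces inside an AWS Glue job (for example, wrapped by Spark) may differ. Note that the fallback starts a new initial pull, which in turn invalidates any earlier tokens.

```python
INVALID_TOKEN_MARKER = "RODPS_REPL_ODP_OPEN"


def is_invalid_delta_token_error(exc):
    """Return True if the exception message matches the invalid-token error."""
    return INVALID_TOKEN_MARKER in str(exc)


def read_with_fallback(read_incremental, read_initial):
    """Try an incremental read; on an invalid delta token, re-initialize instead.

    Both arguments are zero-argument callables supplied by the job, for
    example lambdas that wrap glueContext.create_dynamic_frame.from_options.
    """
    try:
        return read_incremental()
    except Exception as exc:
        if is_invalid_delta_token_error(exc):
            return read_initial()
        raise  # unrelated errors are not swallowed
```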

# OData Services (Non-ODP Sources)
<a name="sap-odata-non-odp-services"></a>

## Full Load
<a name="sap-odata-non-odp-full-load"></a>

For Non-ODP (Operational Data Provisioning) systems, a **Full Load** involves extracting the entire dataset from the source system and loading it into the target system. Since Non-ODP systems do not inherently support advanced data extraction mechanisms like deltas, the process is straightforward but can be resource-intensive depending on the size of the data.

## Incremental Load
<a name="sap-odata-non-odp-incremental-load"></a>

For systems or entities that do not support **ODP (Operational Data Provisioning)**, incremental data transfer can be managed manually by implementing a timestamp based mechanism to track and extract changes.

**Timestamp based Incremental Transfers**

For non-ODP enabled entities (or for ODP enabled entities that don’t use the `ENABLE_CDC` flag), you can use the `filteringExpression` option in the connector to specify the `datetime` interval for which you want to retrieve data. This method relies on a timestamp field in your data that represents when each record was created or last modified.

**Example**

The following example retrieves records that changed on or after 2024-01-01T00:00:00.000:

```
sapodata_df = glueContext.create_dynamic_frame.from_options(
    connection_type="SAPOData",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "entityName",
        "filteringExpression": "LastChangeDateTime >= 2024-01-01T00:00:00.000"
    }, transformation_ctx=key)
```

Note: In this example, `LastChangeDateTime` is the field that represents when each record was last modified. The actual field name may vary depending on your specific SAP OData entity.

To get a new subset of data in subsequent runs, you would update the `filteringExpression` with a new timestamp. Typically, this would be the maximum timestamp value from the previously retrieved data.

**Example**

```
max_timestamp = get_max_timestamp(sapodata_df)  # Function to get the max timestamp from the previous run
next_filtering_expression = f"LastChangeDateTime > {max_timestamp}"

# Use this next_filtering_expression in your next run
```
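
The `get_max_timestamp` helper above is left to the reader. One possible shape, sketched on rows represented as Python dictionaries, is shown below. It assumes the timestamp field holds `datetime` values and formats the result like the timestamp in the earlier filter example; the function names and the fallback behavior for empty results are illustrative choices.

```python
from datetime import datetime


def max_timestamp_from_rows(rows, field="LastChangeDateTime"):
    """Return the latest timestamp in `rows` as a string, or None if there is none."""
    values = [row[field] for row in rows if row.get(field) is not None]
    if not values:
        return None
    latest = max(values)
    # Format with millisecond precision, e.g. 2024-01-01T00:00:00.000
    return latest.strftime("%Y-%m-%dT%H:%M:%S.") + f"{latest.microsecond // 1000:03d}"


def build_next_filtering_expression(rows, previous_expression, field="LastChangeDateTime"):
    """Build the filter for the next run, keeping the old one if no rows arrived."""
    max_ts = max_timestamp_from_rows(rows, field)
    if max_ts is None:
        return previous_expression
    return f"{field} > {max_ts}"


rows = [
    {"LastChangeDateTime": datetime(2024, 1, 5, 10, 30, 0, 123000)},
    {"LastChangeDateTime": datetime(2024, 1, 2)},
]
next_expression = build_next_filtering_expression(
    rows, "LastChangeDateTime >= 2024-01-01T00:00:00.000"
)
```

On a Spark DataFrame, the maximum can be computed without collecting rows, for example with `sapodata_df.toDF().agg({"LastChangeDateTime": "max"})`.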

The [Using the SAP OData state management script](sap-odata-state-management-script.md) section provides an automated approach to manage these timestamp-based incremental transfers, eliminating the need to manually update the filtering expression between runs.

# Writing to SAP OData
<a name="sap-odata-writing"></a>

This section describes how to write data to your SAP OData service using the AWS Glue connector for SAP OData.

**Prerequisites**
+ Access to an SAP OData service
+ An SAP OData EntitySet Object you would like to write to. You will need the Object name.
+ Valid SAP OData credentials and a valid connection
+ Appropriate permissions as described in [IAM policies](https://docs.aws.amazon.com/glue/latest/dg/sap-odata-configuring-iam-permissions.html)

The SAP OData connector supports two write operations:
+ INSERT
+ UPDATE

When using the UPDATE write operation, you must provide `ID_FIELD_NAMES` to specify the external ID field for the records.

**Example:**

```
sapodata_write = glueContext.write_dynamic_frame.from_options(
    frame=frameToWrite,
    connection_type="sapodata",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "entityName",
        "WRITE_OPERATION": "INSERT"
    }
)
```
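
For the UPDATE operation, the connection options also need `ID_FIELD_NAMES`. The following sketch builds the options dictionary for either operation and fails early when the ID field is missing. The helper name and the field name `SalesOrder` are hypothetical, and the value is passed through unchanged because the expected format of `ID_FIELD_NAMES` (for example, how multiple fields are separated) is not described here.

```python
def build_write_options(connection_name, entity_name, operation, id_field_names=None):
    """Build connection_options for write_dynamic_frame.from_options."""
    operation = operation.upper()
    if operation not in ("INSERT", "UPDATE"):
        raise ValueError("WRITE_OPERATION must be INSERT or UPDATE")

    options = {
        "connectionName": connection_name,
        "ENTITY_NAME": entity_name,
        "WRITE_OPERATION": operation,
    }
    if operation == "UPDATE":
        if not id_field_names:
            raise ValueError("ID_FIELD_NAMES is required for UPDATE")
        # Passed through as given; check the connector documentation for
        # the exact format this option expects.
        options["ID_FIELD_NAMES"] = id_field_names
    return options


update_options = build_write_options(
    "connectionName", "entityName", "UPDATE", id_field_names="SalesOrder"
)
```

The resulting dictionary can then be passed as `connection_options` in the write call shown above.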

# Using the SAP OData state management script
<a name="sap-odata-state-management-script"></a>

To use the SAP OData state management script in your AWS Glue job, follow these steps:
+ Download the state management script from the public Amazon S3 bucket: `s3://aws-blogs-artifacts-public/artifacts/BDB-4789/sap_odata_state_management.zip`.
+ Upload the script to an Amazon S3 bucket that your AWS Glue job has permissions to access.
+ Reference the script in your AWS Glue job: When creating or updating your AWS Glue job, pass the `--extra-py-files` option referencing the script path in your Amazon S3 bucket. For example: `--extra-py-files s3://your-bucket/path/to/sap_odata_state_management.py`
+ Import and use the state management library in your AWS Glue job scripts.
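
When jobs are defined programmatically, the `--extra-py-files` value is part of the job's default arguments. The following sketch adds the script path to an existing set of default arguments without disturbing other entries. The helper name is illustrative; it relies on `--extra-py-files` accepting a comma-separated list of Amazon S3 paths.

```python
def with_extra_py_file(default_arguments, script_s3_path):
    """Return a copy of the default arguments with the script in --extra-py-files."""
    arguments = dict(default_arguments)
    existing = [p for p in arguments.get("--extra-py-files", "").split(",") if p]
    if script_s3_path not in existing:
        existing.append(script_s3_path)
    arguments["--extra-py-files"] = ",".join(existing)
    return arguments


job_arguments = with_extra_py_file(
    {"--extra-py-files": "s3://your-bucket/libs/helpers.py"},
    "s3://your-bucket/path/to/sap_odata_state_management.py",
)
```

The resulting dictionary can be supplied as `DefaultArguments` when creating the job, for example through the `create_job` call of the AWS SDK for Python.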

## Delta-token based Incremental Transfer example
<a name="sap-odata-delta-token-incremental-transfer"></a>

Here's an example of how to use the state management script for delta-token based incremental transfers:

```
from sap_odata_state_management import StateManagerFactory, StateManagerType, StateType

# Initialize the state manager
state_manager = StateManagerFactory.create_manager(
    manager_type=StateManagerType.JOB_TAG,
    state_type=StateType.DELTA_TOKEN,
    options={
        "job_name": args['JOB_NAME'],
        "logger": logger
    }
)

# Get connector options (including delta token if available)
key = "SAPODataNode"
connector_options = state_manager.get_connector_options(key)

# Use the connector options in your Glue job
sapodata_df = glueContext.create_dynamic_frame.from_options(
    connection_type="SAPOData",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "entityName",
        "ENABLE_CDC": "true",
        **connector_options
    }
)

# Process your data here...

# Update the state after processing
state_manager.update_state(key, sapodata_df.toDF())
```

## Timestamp based Incremental Transfer example
<a name="sap-odata-timestamp-incremental-transfer"></a>

Here's an example of how to use the state management script for timestamp based incremental transfers:

```
from sap_odata_state_management import StateManagerFactory, StateManagerType, StateType

# Initialize the state manager
state_manager = StateManagerFactory.create_manager(
    manager_type=StateManagerType.JOB_TAG,
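    # Assumption: for timestamp-based transfers, use the timestamp state type
    # defined in the script's StateType enum in place of DELTA_TOKEN
    state_type=StateType.DELTA_TOKEN,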
    options={
        "job_name": args['JOB_NAME'],
        "logger": logger
    }
)

# Get connector options (including the last stored state, if available)
key = "SAPODataNode"
connector_options = state_manager.get_connector_options(key)

# Use the connector options in your Glue job
sapodata_df = glueContext.create_dynamic_frame.from_options(
    connection_type="SAPOData",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "entityName",
        "ENABLE_CDC": "true",
        **connector_options
    }
)

# Process your data here...

# Update the state after processing
state_manager.update_state(key, sapodata_df.toDF())
```

In both examples, the state management script handles the complexity of storing the state (either a delta token or a timestamp) between job runs. It automatically retrieves the last known state when getting connector options and updates the state after processing, ensuring that each job run processes only new or changed data.
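
The retrieve-then-update pattern described above can be illustrated with a small stand-in. The `InMemoryStateStore` class below is hypothetical and keeps state in a Python dictionary. It is not the implementation used by the state management script, which persists state between job runs.

```python
class InMemoryStateStore:
    """Hypothetical stand-in that keeps one state value per key."""

    def __init__(self):
        self._state = {}

    def get_connector_options(self, key):
        # With no stored state, no extra connector options are returned
        token = self._state.get(key)
        return {"DELTA_TOKEN": token} if token is not None else {}

    def update_state(self, key, new_token):
        # Remember the latest state so the next run can resume from it
        self._state[key] = new_token


store = InMemoryStateStore()
key = "SAPODataNode"

first_run_options = store.get_connector_options(key)    # nothing stored yet
store.update_state(key, "D20241107043437_000463000")    # state saved after processing
second_run_options = store.get_connector_options(key)   # carries the stored token
```

The second call returns the stored token as a `DELTA_TOKEN` option, which is the kind of value that gets merged into `connection_options` in the examples above.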

# Partitioning for Non ODP entities
<a name="sap-odata-non-odp-entities-partitioning"></a>

In Apache Spark, partitioning refers to the way data is divided and distributed across the worker nodes in a cluster for parallel processing. Each partition is a logical chunk of data that can be processed independently by a task. Partitioning is a fundamental concept in Spark that directly impacts performance, scalability, and resource utilization. AWS Glue jobs use Spark's partitioning mechanism to divide the dataset into smaller chunks (partitions) that can be processed in parallel across the cluster's worker nodes. Note that partitioning is not applicable for ODP entities.

For more details, see [AWS Glue Spark and PySpark jobs](https://docs.aws.amazon.com/glue/latest/dg/spark_and_pyspark.html).

**Prerequisites**

An SAP OData object that you want to read from. You need the object or EntitySet name, for example: `/sap/opu/odata/sap/API_SALES_ORDER_SRV/A_SalesOrder`.

**Example**

```
sapodata_read = glueContext.create_dynamic_frame.from_options(
    connection_type="SAPOData",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "/sap/opu/odata/sap/API_SALES_ORDER_SRV/A_SalesOrder"
    }, transformation_ctx=key)
```

## Partitioning Queries
<a name="sap-odata-partitioning-queries"></a>

### Field Based partitioning
<a name="sap-odata-field-based-partitioning"></a>

You can provide the additional Spark options `PARTITION_FIELD`, `LOWER_BOUND`, `UPPER_BOUND`, and `NUM_PARTITIONS` if you want to utilize concurrency in Spark. With these parameters, the original query would be split into `NUM_PARTITIONS` number of sub-queries that can be executed by Spark tasks concurrently. Integer, Date and DateTime fields support field-based partitioning in the SAP OData connector.
+ `PARTITION_FIELD`: the name of the field to be used to partition the query.
+ `LOWER_BOUND`: an inclusive lower bound value of the chosen partition field.

  For any field whose data type is DateTime, the Spark timestamp format used in Spark SQL queries is accepted.

  Example of a valid value: `"2000-01-01T00:00:00.000Z"`
+ `UPPER_BOUND`: an exclusive upper bound value of the chosen partition field.
+ `NUM_PARTITIONS`: the number of partitions.
+ `PARTITION_BY`: the type of partitioning to be performed. Pass `FIELD` for field-based partitioning.

**Example**

```
sapodata = glueContext.create_dynamic_frame.from_options(
    connection_type="sapodata",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "/sap/opu/odata/sap/SEPM_HCM_SCENARIO_SRV/EmployeeSet",
        "PARTITION_FIELD": "validStartDate",
        "LOWER_BOUND": "2000-01-01T00:00:00.000Z",
        "UPPER_BOUND": "2020-01-01T00:00:00.000Z",
        "NUM_PARTITIONS": "10",
        "PARTITION_BY": "FIELD"
    }, transformation_ctx=key)
```
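
To illustrate how a bounded range can be divided into `NUM_PARTITIONS` sub-queries, the following sketch splits the interval from `LOWER_BOUND` (inclusive) to `UPPER_BOUND` (exclusive) into equal-width sub-ranges. The `split_range` function is hypothetical, and equal-width splitting is an assumption. The connector's actual splitting logic may differ.

```python
from datetime import datetime, timezone


def split_range(lower, upper, num_partitions):
    """Split [lower, upper) into contiguous, equal-width half-open sub-ranges."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    step = (upper - lower) / num_partitions
    edges = [lower + i * step for i in range(num_partitions)] + [upper]
    return list(zip(edges[:-1], edges[1:]))


lower = datetime(2000, 1, 1, tzinfo=timezone.utc)
upper = datetime(2020, 1, 1, tzinfo=timezone.utc)
partitions = split_range(lower, upper, 10)

for start, end in partitions:
    # Each sub-range stands for one sub-query: start <= field < end
    print(f"validStartDate >= {start.isoformat()} and validStartDate < {end.isoformat()}")
```

Because each sub-range includes its lower edge and excludes its upper edge, adjacent sub-ranges never overlap and every value in the overall range falls into exactly one of them.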

### Record Based partitioning
<a name="sap-odata-record-based-partitioning"></a>

The original query would be split into `NUM_PARTITIONS` number of sub-queries that can be executed by Spark tasks concurrently.

Record-based partitioning is only supported for non-ODP entities, as pagination in ODP entities is supported through the next token/skip token.
+ `PARTITION_BY`: the type of partitioning to be performed. Pass `COUNT` for record-based partitioning.

**Example**

```
sapodata = glueContext.create_dynamic_frame.from_options(
    connection_type="sapodata",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "/sap/opu/odata/sap/SEPM_HCM_SCENARIO_SRV/EmployeeSet",
        "NUM_PARTITIONS": "10",
        "PARTITION_BY": "COUNT"
    }, transformation_ctx=key)
```
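
As an illustration of record-based splitting, the sketch below divides a known record count into `NUM_PARTITIONS` contiguous offset windows, expressed with the OData `$skip` and `$top` query options. The `count_partitions` function and the record count are hypothetical, and this is an assumption about how such a split could work, not a description of the connector's internals.

```python
def count_partitions(total_records, num_partitions):
    """Return (skip, top) pairs that cover total_records without overlap."""
    base, remainder = divmod(total_records, num_partitions)
    windows = []
    skip = 0
    for i in range(num_partitions):
        # Spread the remainder over the first few windows
        top = base + (1 if i < remainder else 0)
        if top == 0:
            continue  # fewer records than partitions: skip empty windows
        windows.append((skip, top))
        skip += top
    return windows


# 105 records is a placeholder count for illustration
windows = count_partitions(total_records=105, num_partitions=10)
queries = [f"$skip={skip}&$top={top}" for skip, top in windows]
```

Each window can then be fetched independently, which is what allows the sub-queries to run as concurrent Spark tasks.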

# Limitations / Callouts
<a name="sap-odata-limitations"></a>
+ ODP entities are not compatible with Record Based Partitioning since pagination is handled using skip token/delta token. Consequently, for Record Based Partitioning, the default value for maxConcurrency is set to "null" irrespective of the user input.
+ When both a limit and partitioning are applied, the limit takes precedence over partitioning.

# SAP OData connection options
<a name="sap-odata-connection-options"></a>

The following are connection options for SAP OData:
+ `ENTITY_NAME`(String) - (Required) Used for Read. The name of your object in SAP OData.

  For example: `/sap/opu/odata/sap/API_SALES_ORDER_SRV/A_SalesOrder`
+ `API_VERSION`(String) - (Optional) Used for Read. SAP OData REST API version you want to use. Example: 2.0.
+ `SELECTED_FIELDS`(List<String>) - Default: empty (`SELECT *`). Used for Read. Columns you want to select for the object.

  For example: SalesOrder
+ `FILTER_PREDICATE`(String) - Default: empty. Used for Read. It should be in the Spark SQL format.

  For example: `SalesOrder = "10"`
+ `QUERY`(String) - Default: empty. Used for Read. Full Spark SQL query.

  For example: `SELECT * FROM /sap/opu/odata/sap/API_SALES_ORDER_SRV/A_SalesOrder`
+ `PARTITION_FIELD`(String) - Used for Read. Field to be used to partition query.

  For example: `ValidStartDate`
+ `LOWER_BOUND`(String)- Used for Read. An inclusive lower bound value of the chosen partition field.

  For example: `"2000-01-01T00:00:00.000Z"`
+ `UPPER_BOUND`(String) - Used for Read. An exclusive upper bound value of the chosen partition field.

  For example: `"2024-01-01T00:00:00.000Z"`
+ `NUM_PARTITIONS`(Integer) - Default: 1. Used for Read. Number of partitions for read.
+ `INSTANCE_URL`(String) - The SAP instance application host URL.

  For example: `https://example-externaldata.sierra.aws.dev`
+ `SERVICE_PATH`(String) - The SAP instance application service path.

  For example: `/sap/opu/odata/iwfnd/catalogservice;v=2`
+ `CLIENT_NUMBER`(String) - The SAP instance application client number.

  For example: 100
+ `PORT_NUMBER`(String) - The SAP instance application port number.

  For example: 443
+ `LOGON_LANGUAGE`(String) - The SAP instance application logon language.

  For example: `EN`
+ `ENABLE_CDC`(String) - Defines whether to run a job with CDC enabled, that is, with track changes.

  For example: `true` or `false`
+ `DELTA_TOKEN`(String) - Runs an incremental data pull based on the valid Delta Token supplied. 

  For example: `D20241107043437_000463000`
+ `PAGE_SIZE`(Integer) - Defines the page size for querying the records. The default page size is 50,000. When a page size is specified, SAP returns only the defined number of records per API call, rather than the entire dataset. The connector still provides the total number of records and handles pagination using your specified page size. You can choose any value up to 500,000, which is the maximum allowed. If you specify a page size greater than 500,000, it is ignored and the maximum allowed page size is used instead. You can specify the page size in the AWS Glue Studio UI by adding a connection option `PAGE_SIZE` with your desired value.

  For example: `20000`
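
The page size rules above can be summarized in a short sketch. The `effective_page_size` and `api_calls_needed` functions are hypothetical helpers for reasoning about the setting. They are not part of the connector, and the record count is a placeholder.

```python
import math

DEFAULT_PAGE_SIZE = 50_000   # used when no page size is specified
MAX_PAGE_SIZE = 500_000      # larger requests fall back to this value


def effective_page_size(requested=None):
    """Return the page size that applies for a requested value."""
    if requested is None:
        return DEFAULT_PAGE_SIZE
    return min(requested, MAX_PAGE_SIZE)


def api_calls_needed(total_records, page_size):
    """Number of paginated API calls needed to read total_records."""
    return math.ceil(total_records / page_size)


page_size = effective_page_size(20_000)
calls = api_calls_needed(1_000_000, page_size)
```

A smaller page size means more API calls for the same dataset, so the value is a trade-off between the size of each response and the number of round trips.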

# SAP OData entity and field details
<a name="sap-odata-entity-field-details"></a>

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/glue/latest/dg/sap-odata-entity-field-details.html)