

# Programming Ray scripts
<a name="aws-glue-programming-ray"></a>

**Important**  
AWS Glue for Ray will no longer be open to new customers starting April 30, 2026. If you would like to use AWS Glue for Ray, sign up prior to that date. Existing customers can continue to use the service as normal. For capabilities similar to AWS Glue for Ray, explore Amazon EKS. For more information, see [AWS Glue for Ray end of support](https://docs.aws.amazon.com/glue/latest/dg/awsglue-ray-jobs-availability-change.html).

AWS Glue makes it easy to write and run Ray scripts. This section describes the supported Ray capabilities that are available in AWS Glue for Ray. You program Ray scripts in Python.

Your custom script must be compatible with the version of Ray that's defined by the `Runtime` field in your job definition. For more information about `Runtime` in the Jobs API, see [Jobs](aws-glue-api-jobs-job.md). For information about each runtime environment, see [Supported Ray runtime environments](ray-jobs-section.md#author-job-ray-runtimes).

**Topics**
+ [Tutorial: Writing an ETL script in AWS Glue for Ray](edit-script-ray-intro-tutorial.md)
+ [Using Ray Core and Ray Data in AWS Glue for Ray](edit-script-ray-scripting.md)
+ [Providing files and Python libraries to Ray jobs](edit-script-ray-env-dependencies.md)
+ [Connecting to data in Ray jobs](edit-script-ray-connections-formats.md)

# Tutorial: Writing an ETL script in AWS Glue for Ray
<a name="edit-script-ray-intro-tutorial"></a>

Ray gives you the ability to write and scale distributed tasks natively in Python. AWS Glue for Ray offers serverless Ray environments that you can access from both jobs and interactive sessions (Ray interactive sessions are in preview). The AWS Glue job system provides a consistent way to manage and run your tasks—on a schedule, from a trigger, or from the AWS Glue console. 

Combining these AWS Glue tools creates a powerful toolchain that you can use for extract, transform, and load (ETL) workloads, a popular use case for AWS Glue. In this tutorial, you will learn the basics of putting together this solution.

We also support using AWS Glue for Spark for your ETL workloads. For a tutorial on writing an AWS Glue for Spark script, see [Tutorial: Writing an AWS Glue for Spark script](aws-glue-programming-intro-tutorial.md). For more information about available engines, see [AWS Glue for Spark and AWS Glue for Ray](how-it-works-engines.md). Ray is capable of addressing many different kinds of tasks in analytics, machine learning (ML), and application development. 

In this tutorial, you will extract, transform, and load a CSV dataset that is hosted in Amazon Simple Storage Service (Amazon S3). You will begin with the New York City Taxi and Limousine Commission (TLC) Trip Record Data dataset, which is stored in a public Amazon S3 bucket. For more information about this dataset, see the [Registry of Open Data on AWS](https://registry.opendata.aws/nyc-tlc-trip-records-pds/). 

You will transform your data with predefined transforms that are available in the Ray Data library. Ray Data is a dataset preparation library designed by Ray and included by default in AWS Glue for Ray environments. For more information about libraries included by default, see [Modules provided with Ray jobs](edit-script-ray-env-dependencies.md#edit-script-ray-modules-provided). You will then write your transformed data to an Amazon S3 bucket that you control.

**Prerequisites** – For this tutorial, you need an AWS account with access to AWS Glue and Amazon S3. 

## Step 1: Create a bucket in Amazon S3 to hold your output data
<a name="edit-script-ray-intro-tutorial-s3"></a>

You will need an Amazon S3 bucket that you control to serve as a sink for data created in this tutorial. You can create this bucket with the following procedure.

**Note**  
If you want to write your data to an existing bucket that you control, you can skip this step. Take note of *yourBucketName*, the existing bucket's name, to use in later steps.

**To create a bucket for your Ray job output**
+ Create a bucket by following the steps in [Creating a bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html) in the *Amazon S3 User Guide*.
  + When choosing a bucket name, take note of *yourBucketName*, which you will refer to in later steps.
  + For other configuration, the suggested settings provided in the Amazon S3 console should work fine in this tutorial.

  As an example, the bucket creation dialog box might look like this in the Amazon S3 console.  
![\[A dialog box in the Amazon S3 console that is used in configuring a new bucket.\]](http://docs.aws.amazon.com/glue/latest/dg/images/ray-tutorial-create-bucket.jpg)

## Step 2: Create an IAM role and policy for your Ray job
<a name="edit-script-ray-intro-tutorial-iam"></a>

Your job will need an AWS Identity and Access Management (IAM) role with the following:
+ Permissions granted by the `AWSGlueServiceRole` managed policy. These are the basic permissions that are necessary to run an AWS Glue job.
+ `Read` access level permissions for the `nyc-tlc/*` Amazon S3 resource.
+ `Write` access level permissions for the `yourBucketName/*` Amazon S3 resource.
+ A trust relationship that allows the `glue.amazonaws.com` principal to assume the role.

You can create this role with the following procedure.
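
The Amazon S3 permissions outlined above might be expressed as a policy document similar to the following sketch. The ARNs and actions here are illustrative assumptions, not the exact policy the visual editor produces; adjust them to your bucket and access needs.

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": ["arn:aws:s3:::nyc-tlc", "arn:aws:s3:::nyc-tlc/*"]
        },
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": ["arn:aws:s3:::yourBucketName/*"]
        }
    ]
}
```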

**To create an IAM role for your AWS Glue for Ray job**
**Note**  
You can create an IAM role by following many different procedures. For more information or options about how to provision IAM resources, see the [AWS Identity and Access Management documentation](https://docs.aws.amazon.com/iam/index.html).

1. Create a policy that defines the previously outlined Amazon S3 permissions by following the steps in [Creating IAM policies (console) with the visual editor](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create-console.html#access_policies_create-visual-editor) in the *IAM User Guide*.
   + When selecting a service, choose Amazon S3.
   + When selecting permissions for your policy, attach the following sets of actions for the following resources (mentioned previously):
     + Read access level permissions for the `nyc-tlc/*` Amazon S3 resource.
     + Write access level permissions for the `yourBucketName/*` Amazon S3 resource.
   + When selecting the policy name, take note of *YourPolicyName*, which you will refer to in a later step.

1. Create a role for your AWS Glue for Ray job by following the steps in [ Creating a role for an AWS service (console)](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-service.html#roles-creatingrole-service-console) in the *IAM User Guide*.
   + When selecting a trusted AWS service entity, choose `Glue`. This will automatically populate the necessary trust relationship for your job.
   + When selecting policies for the permissions policy, attach the following policies:
     + `AWSGlueServiceRole`
     + *YourPolicyName*
   + When selecting the role name, take note of *YourRoleName*, which you will refer to in later steps.

## Step 3: Create and run an AWS Glue for Ray job
<a name="edit-script-ray-intro-tutorial-author-job"></a>

In this step, you create an AWS Glue job using the AWS Management Console, provide it with a sample script, and run the job. When you create a job, it creates a place in the console for you to store, configure, and edit your Ray script. For more information about creating jobs, see [Managing AWS Glue Jobs in the AWS Console](author-job-glue.md#console-jobs).

In this tutorial, we address the following ETL scenario: read the January 2022 records from the New York City TLC Trip Record dataset, add a new `tip_rate` column by combining data in existing columns, drop a number of columns that aren't relevant to your current analysis, and write the results to *yourBucketName*. The following Ray script performs these steps:

```
import ray
import pandas
from ray import data

ray.init('auto')

ds = ray.data.read_csv("s3://nyc-tlc/opendata_repo/opendata_webconvert/yellow/yellow_tripdata_2022-01.csv")

# Add a new tip_rate column, derived from existing columns
ds = ds.add_column("tip_rate", lambda df: df["tip_amount"] / df["total_amount"])

# Drop a few columns that aren't relevant to the current analysis
ds = ds.drop_columns(["payment_type", "fare_amount", "extra", "tolls_amount", "improvement_surcharge"])

ds.write_parquet("s3://yourBucketName/ray/tutorial/output/")
```

**To create and run an AWS Glue for Ray job**

1. In the AWS Management Console, navigate to the AWS Glue landing page.

1. In the side navigation pane, choose **ETL Jobs**.

1. In **Create job**, choose **Ray script editor**, and then choose **Create**, as in the following illustration.  
![\[A dialog box in the AWS Glue console used to create a Ray job.\]](http://docs.aws.amazon.com/glue/latest/dg/images/edit-script-ray-create.png)

1. Paste the full text of the script into the **Script** pane, and replace any existing text.

1. Navigate to **Job details** and set the **IAM Role** property to *YourRoleName*.

1. Choose **Save**, and then choose **Run**.

## Step 4: Inspect your output
<a name="edit-script-ray-intro-tutorial-inspect"></a>

After running your AWS Glue job, you should validate that the output matches the expectations of this scenario. You can do so with the following procedure.

**To validate whether your Ray job ran successfully**

1. On the job details page, navigate to **Runs**.

1. After a few minutes, you should see a run with a **Run status** of **Succeeded**.

1. Navigate to the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/) and inspect *yourBucketName*. You should see files written to your output bucket.

1. Read the Parquet files and verify their contents. You can do this with your existing tools. If you don't have a process for validating Parquet files, you can do this in the AWS Glue console with an AWS Glue interactive session, using either Spark or Ray (in preview).

   In an interactive session, you have access to Ray Data, Spark, or pandas libraries, which are provided by default (based on your choice of engine). To verify your file contents, you can use common inspection methods that are available in those libraries—methods like `count`, `schema`, and `show`. For more information about interactive sessions in the console, see [Using notebooks with AWS Glue Studio and AWS Glue](https://docs.aws.amazon.com/glue/latest/ug/notebooks-chapter.html). 

   Because you have confirmed that files were written to the bucket, you can say with relative certainty that any remaining problems in your output are not related to IAM configuration. Configure your session with *YourRoleName* so that it has access to the relevant files.
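
If you'd rather check the transform logic locally before inspecting the Amazon S3 output, the following pandas sketch approximates the checks you would run. The sample rows are hypothetical, not real TLC data.

```
import pandas as pd

# A tiny hypothetical sample mirroring the tutorial's output schema
df = pd.DataFrame({
    "tip_amount": [2.0, 0.0],
    "total_amount": [10.0, 8.0],
})

# Recompute the derived column the same way the Ray script does
df["tip_rate"] = df["tip_amount"] / df["total_amount"]

# The dropped columns should be absent, and tip_rate should be present
dropped = {"payment_type", "fare_amount", "extra", "tolls_amount", "improvement_surcharge"}
assert "tip_rate" in df.columns
assert dropped.isdisjoint(df.columns)
print(df["tip_rate"].tolist())  # → [0.2, 0.0]
```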

If you don't see the expected outcomes, examine the troubleshooting content in this guide to identify and remediate the source of the error. You can find the troubleshooting content in the [Troubleshooting AWS Glue](troubleshooting-glue.md) chapter. For specific errors that are related to Ray jobs, see [Troubleshooting AWS Glue for Ray errors from logs](troubleshooting-ray.md) in the troubleshooting chapter. 

## Next steps
<a name="edit-script-ray-intro-tutorial-next"></a>

 You have now seen and performed an ETL process using AWS Glue for Ray from end to end. You can use the following resources to understand what tools AWS Glue for Ray provides to transform and interpret your data at scale. 
+  For more information about Ray's task model, see [Using Ray Core and Ray Data in AWS Glue for Ray](edit-script-ray-scripting.md). For more experience in using Ray tasks, follow the examples in the Ray Core documentation. See [Ray Core: Ray Tutorials and Examples (2.4.0)](https://docs.ray.io/en/releases-2.4.0/ray-core/examples/overview.html) in the Ray documentation. 
+  For guidance about available data management libraries in AWS Glue for Ray, see [Connecting to data in Ray jobs](edit-script-ray-connections-formats.md). For more experience using Ray Data to transform and write datasets, follow the examples in the Ray Data documentation. See [Ray Data: Examples (2.4.0)](https://docs.ray.io/en/releases-2.4.0/data/examples/index.html). 
+ For more information about configuring AWS Glue for Ray jobs, see [Working with Ray jobs in AWS Glue](ray-jobs-section.md).
+ For more information about writing AWS Glue for Ray scripts, continue reading the documentation in this section.

# Using Ray Core and Ray Data in AWS Glue for Ray
<a name="edit-script-ray-scripting"></a>

Ray is a framework for scaling up Python scripts by distributing work across a cluster. You can use Ray as a solution to many sorts of problems, so Ray provides libraries to optimize certain tasks. In AWS Glue, we focus on using Ray to transform large datasets. AWS Glue offers support for Ray Data and parts of Ray Core to facilitate this task. 

## What is Ray Core?
<a name="edit-script-ray-scripting-core-what"></a>

The first step of building a distributed application is identifying and defining work that can be performed concurrently. Ray Core contains the parts of Ray that you use to define tasks that can be performed concurrently. Ray provides reference and quick start information that you can use to learn the tools they provide. For more information, see [What is Ray Core?](https://docs.ray.io/en/latest/ray-core/walkthrough.html) and [Ray Core Quick Start](https://docs.ray.io/en/latest/ray-overview/getting-started.html#ray-core-quick-start). For more information about effectively defining concurrent tasks in Ray, see [Tips for first-time users](https://docs.ray.io/en/latest/ray-core/tips-for-first-time.html). 

**Ray tasks and actors**  
In AWS Glue for Ray documentation, we might refer to *tasks* and *actors*, which are core concepts in Ray.  
Ray uses Python functions and classes as the building blocks of a distributed computing system. Much like when Python functions and variables become "methods" and "attributes" when used in a class, functions become "tasks" and classes become "actors" when they're used in Ray to send code to workers. You can identify functions and classes that might be used by Ray by the `@ray.remote` annotation.  
Tasks and actors are configurable, they have a lifecycle, and they take up compute resources throughout their life. Code that throws errors can be traced back to a task or actor when you're finding the root cause of problems. Thus, these terms might come up when you're learning how to configure, monitor, or debug AWS Glue for Ray jobs.   
To begin learning how to effectively use tasks and actors to build a distributed application, see [Key Concepts](https://docs.ray.io/en/latest/ray-core/key-concepts.html) in the Ray docs.

## Ray Core in AWS Glue for Ray
<a name="edit-script-ray-scripting-core-glue"></a>

AWS Glue for Ray environments manage cluster formation and scaling, as well as collecting and visualizing logs. Because we manage these concerns, we consequently limit access to and support for the APIs in Ray Core that would be used to address these concerns in an open-source cluster.

In the managed `Ray2.4` runtime environment, we do not support:
+ [Ray Core CLI](https://docs.ray.io/en/releases-2.4.0/ray-core/api/cli.html)
+ [Ray State CLI](https://docs.ray.io/en/releases-2.4.0/ray-observability/api/state/cli.html)
+ `ray.util.metrics` Prometheus metric utility methods:
  + [Counter](https://docs.ray.io/en/releases-2.4.0/ray-core/api/doc/ray.util.metrics.Counter.html)
  + [Gauge](https://docs.ray.io/en/releases-2.4.0/ray-core/api/doc/ray.util.metrics.Gauge.html)
  + [Histogram](https://docs.ray.io/en/releases-2.4.0/ray-core/api/doc/ray.util.metrics.Histogram.html)
+ Other debugging tools:
  + [ray.util.pdb.set\_trace](https://docs.ray.io/en/releases-2.4.0/ray-core/api/doc/ray.util.pdb.set_trace.html)
  + [ray.util.inspect\_serializability](https://docs.ray.io/en/releases-2.4.0/ray-core/api/doc/ray.util.inspect_serializability.html)
  + [ray.timeline](https://docs.ray.io/en/releases-2.4.0/ray-core/api/doc/ray.timeline.html)

## What is Ray Data?
<a name="edit-script-ray-scripting-data-what"></a>

Ray Data provides a straightforward way to use Ray for connecting to data sources and destinations, handling datasets, and applying common transforms to Ray datasets. For more information about using Ray Data, see [Ray Datasets: Distributed Data Preprocessing](https://docs.ray.io/en/releases-2.4.0/data/dataset.html). 

You can use Ray Data or other tools to access your data. For more information on accessing your data in Ray, see [Connecting to data in Ray jobs](edit-script-ray-connections-formats.md).

## Ray Data in AWS Glue for Ray
<a name="edit-script-ray-scripting-data-glue"></a>

Ray Data is supported and provided by default in the managed `Ray2.4` runtime environment. For more information about provided modules, see [Modules provided with Ray jobs](edit-script-ray-env-dependencies.md#edit-script-ray-modules-provided).

# Providing files and Python libraries to Ray jobs
<a name="edit-script-ray-env-dependencies"></a>

This section provides information that you need for using Python libraries with AWS Glue Ray jobs. You can use certain common libraries included by default in all Ray jobs. You can also provide your own Python libraries to your Ray job. 

## Modules provided with Ray jobs
<a name="edit-script-ray-modules-provided"></a>

You can perform data integration workflows in a Ray job with the following provided packages. These packages are available by default in Ray jobs.

------
#### [ AWS Glue version 4.0 ]

In AWS Glue 4.0, the Ray (`Ray2.4` runtime) environment provides the following packages:
+ boto3 == 1.26.133
+ ray == 2.4.0
+ pyarrow == 11.0.0
+ pandas == 1.5.3
+ numpy == 1.24.3
+ fsspec == 2023.4.0

This list includes all packages that would be installed with `ray[data] == 2.4.0`. Ray Data is supported out of the box.

------

## Providing files to your Ray job
<a name="edit-script-ray-working-directory"></a>

You can provide files to your Ray job with the `--working-dir` parameter. Provide this parameter with a path to a .zip file hosted on Amazon S3. Within the .zip file, your files must be contained in a single top-level directory. No other files should be at the top level.

Your files will be distributed to each Ray node before your script begins to run. Consider how this might impact the disk space that's available to each Ray node. Available disk space is determined by the WorkerType set in the job configuration. If you want to provide your job data at scale, this mechanism is not the right solution. For more information on providing data to your job, see [Connecting to data in Ray jobs](edit-script-ray-connections-formats.md). 

Your files will be accessible as if the directory was provided to Ray through the `working_dir` parameter. For example, to read a file named `sample.txt` in your .zip file's top-level directory, you could call:

```
import ray

ray.init('auto')

@ray.remote
def do_work():
    # Files from the .zip are available in the task's working directory
    with open("sample.txt", "r") as f:
        print(f.read())

ray.get(do_work.remote())
```

For more information about `working_dir`, see the [Ray documentation](https://docs.ray.io/en/latest/ray-core/handling-dependencies.html#remote-uris). This feature behaves similarly to Ray's native capabilities.

## Additional Python modules for Ray jobs
<a name="edit-script-ray-python-libraries-additional"></a>

**Additional modules from PyPI**

Ray jobs use the Python Package Installer (pip3) to install additional modules to be used by a Ray script. You can use the `--pip-install` parameter with a list of comma-separated Python modules to add a new module or change the version of an existing module. 

For example, to update or add a new `scikit-learn` module, use the following key-value pair: 

`"--pip-install", "scikit-learn==0.21.3"`

If you have custom modules or custom patches, you can distribute your own libraries from Amazon S3 with the `--s3-py-modules` parameter. Before uploading your distribution, it might need to be repackaged and rebuilt. Follow the guidelines in [Including Python code in Ray jobs](#edit-script-ray-packaging).

**Custom distributions from Amazon S3**

Custom distributions should adhere to Ray packaging guidelines for dependencies. You can find out how to build these distributions in the following section. For more information about how Ray sets up dependencies, see [Environment Dependencies](https://docs.ray.io/en/latest/ray-core/handling-dependencies.html) in the Ray documentation. 

To include a custom distributable after assessing its contents, upload your distributable to a bucket available to the job's IAM role. Specify the Amazon S3 path to a Python zip archive in your parameter configuration. If you're providing multiple distributables, separate them by comma. For example:

`"--s3-py-modules", "s3://s3bucket/pythonPackage.zip"` 

**Limitations**

Ray jobs do not support compiling native code in the job environment. This can limit you if your Python dependencies transitively depend on native, compiled code. Ray jobs can run provided binaries, but they must be compiled for Linux on ARM64. This means you might be able to use the contents of `aarch64` `manylinux` wheels. You can provide your native dependencies in a compiled form by repackaging a wheel to Ray standards. Typically, this means removing `dist-info` folders so that there is only one folder at the root of the archive. 

You cannot upgrade the version of `ray` or `ray[data]` using this parameter. To use a new version of Ray, you will need to change the `Runtime` field on your job after we have released support for it. For more information about supported Ray versions, see [AWS Glue versions](release-notes.md#release-notes-versions).

## Including Python code in Ray jobs
<a name="edit-script-ray-packaging"></a>

The Python Software Foundation offers standardized behaviors for packaging Python files for use across different runtimes. Ray introduces limitations to packaging standards that you should be aware of. AWS Glue does not specify packaging standards beyond those specified to Ray. The following instructions provide standard guidance on packaging simple Python packages.

Package your files in a `.zip` archive. A directory should be at the root of the archive. **There should be no other files at the root level of the archive, or this may lead to unexpected behavior.** The root directory is the package, and its name is used to refer to your Python code when importing it.

If you provide a distribution in this form to a Ray job with `--s3-py-modules`, you will be able to import Python code from your package in your Ray script.

Your package can provide a single Python module with some Python files, or you can package together many modules. When repackaging dependencies, such as libraries from PyPI, **check for hidden files and metadata directories** inside of those packages. 

**Warning**  
 Certain OS behaviors may make it difficult to properly follow these packaging instructions.   
OSX may add hidden files such as `__MACOSX` to your zip file at the top level.
Windows may add your files to a folder inside the zip automatically, unintentionally creating a nested folder.

The following procedures assume you are interacting with your files in Amazon Linux 2 or a similar OS that provides a distribution of the Info-ZIP `zip` and `zipinfo` utilities. We recommend using these tools to prevent unexpected behaviors. 

**To package Python files for use in Ray**

1. Create a temporary directory with your package name, then confirm your working directory is its parent directory. You can do this with the following commands:

   ```
   cd parent_directory
   mkdir temp_dir
   ```

1. Copy your files into the temporary directory, then confirm your directory structure. The contents of this directory will be directly accessed as your Python module. You can do this with the following command:

   ```
   ls -AR temp_dir
   # my_file_1.py
   # my_file_2.py
   ```

1. Compress your temporary folder using zip. You can do this with the following commands:

   ```
   zip -r zip_file.zip temp_dir
   ```

1. Confirm your file is properly packaged. `zip_file.zip` should now be found in your working directory. You can inspect it with the following command:

   ```
   zipinfo -1 zip_file.zip
   # temp_dir/
   # temp_dir/my_file_1.py
   # temp_dir/my_file_2.py
   ```

**To repackage a Python package for use in Ray**

1. Create a temporary directory with your package name, then confirm your working directory is its parent directory. You can do this with the following commands:

   ```
   cd parent_directory
   mkdir temp_dir
   ```

1. Decompress your package and copy the contents into your temporary directory. Remove files related to your previous packaging standard, leaving only the contents of the module. Confirm your file structure looks correct with the following command:

   ```
   ls -AR temp_dir
   # my_module
   # my_module/__init__.py
   # my_module/my_file_1.py
   # my_module/my_submodule/__init__.py
   # my_module/my_submodule/my_file_2.py
   # my_module/my_submodule/my_file_3.py
   ```

1. Compress your temporary folder using zip. You can do this with the following commands:

   ```
   zip -r zip_file.zip temp_dir
   ```

1. Confirm your file is properly packaged. `zip_file.zip` should now be found in your working directory. You can inspect it with the following command:

   ```
   zipinfo -1 zip_file.zip
   # temp_dir/my_module/
   # temp_dir/my_module/__init__.py
   # temp_dir/my_module/my_file_1.py
   # temp_dir/my_module/my_submodule/
   # temp_dir/my_module/my_submodule/__init__.py
   # temp_dir/my_module/my_submodule/my_file_2.py
   # temp_dir/my_module/my_submodule/my_file_3.py
   ```
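
If the Info-ZIP `zip` and `zipinfo` utilities aren't available, the same packaging and verification steps can be sketched with Python's standard library. The directory and file names below are placeholders.

```
import os
import tempfile
import zipfile

# Placeholder package layout: my_module/ with two files inside
root = tempfile.mkdtemp()
pkg = os.path.join(root, "my_module")
os.makedirs(pkg)
for name in ("__init__.py", "my_file_1.py"):
    open(os.path.join(pkg, name), "w").close()

# Compress so that "my_module/" is the single top-level directory,
# per the packaging guidelines above
archive = os.path.join(root, "zip_file.zip")
with zipfile.ZipFile(archive, "w") as zf:
    for dirpath, _dirnames, filenames in os.walk(pkg):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            zf.write(full, os.path.relpath(full, root))

# Verify the archive contents, analogous to `zipinfo -1 zip_file.zip`
with zipfile.ZipFile(archive) as zf:
    names = sorted(zf.namelist())
print(names)  # → ['my_module/__init__.py', 'my_module/my_file_1.py']
```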

# Connecting to data in Ray jobs
<a name="edit-script-ray-connections-formats"></a>

AWS Glue Ray jobs can use a broad array of Python packages that are designed for you to quickly integrate data. We provide a minimal set of dependencies in order to not clutter your environment. For more information about what is included by default, see [Modules provided with Ray jobs](edit-script-ray-env-dependencies.md#edit-script-ray-modules-provided).

**Note**  
AWS Glue extract, transform, and load (ETL) provides the DynamicFrame abstraction to streamline ETL workflows where you resolve schema differences between rows in your dataset. AWS Glue ETL provides additional features—job bookmarks and grouping input files. We don't currently provide corresponding features in Ray jobs.  
AWS Glue for Spark provides direct support for connecting to certain data formats, sources and sinks. In Ray, AWS SDK for pandas and current third-party libraries substantively cover that need. You will need to consult those libraries to understand what capabilities are available.

AWS Glue for Ray integration with Amazon VPC is not currently available. Resources in Amazon VPC will not be accessible without a public route. For more information about using AWS Glue with Amazon VPC, see [Configuring interface VPC endpoints (AWS PrivateLink) for AWS Glue](vpc-interface-endpoints.md). 

## Common libraries for working with data in Ray
<a name="edit-script-ray-etl-libraries"></a>

**Ray Data** – Ray Data provides methods to handle common data formats, sources and sinks. For more information about supported formats and sources in Ray Data, see [Input/Output](https://docs.ray.io/en/latest/data/api/input_output.html) in the Ray Data documentation. Ray Data is an opinionated library, rather than a general-purpose library, for handling datasets. 

Ray provides certain guidance around use cases where Ray Data might be the best solution for your job. For more information, see [ Ray use cases ](https://docs.ray.io/en/latest/ray-overview/use-cases.html) in the Ray documentation. 

**AWS SDK for pandas (awswrangler)** – AWS SDK for pandas is an AWS product that delivers clean, tested solutions for reading from and writing to AWS services when your transformations manage data with pandas DataFrames. For more information about supported formats and sources in the AWS SDK for pandas, see the [API Reference](https://aws-sdk-pandas.readthedocs.io/en/stable/api.html) in the AWS SDK for pandas documentation. 

For examples of how to read and write data with the AWS SDK for pandas, see [Quick Start](https://aws-sdk-pandas.readthedocs.io/en/stable/) in the AWS SDK for pandas documentation. The AWS SDK for pandas doesn't provide transforms for your data. It only provides support for reading and writing from sources. 

**Modin** – Modin is a Python library that implements common pandas operations in a distributable way. For more information about Modin, see the [Modin documentation](https://modin.readthedocs.io/en/stable/). Modin itself doesn't provide support for reading and writing from sources. It provides distributed implementations of common transforms. Modin is supported by the AWS SDK for pandas. 

When you run Modin and the AWS SDK for pandas together in a Ray environment, you can perform common ETL tasks with performant results. For more information about using Modin with the AWS SDK for pandas, see [At scale](https://aws-sdk-pandas.readthedocs.io/en/stable/scale.html) in the AWS SDK for pandas documentation. 

**Other frameworks** – For more information about frameworks that Ray supports, see [ The Ray Ecosystem ](https://docs.ray.io/en/latest/ray-overview/ray-libraries.html) in the Ray documentation. We don't provide support for other frameworks in AWS Glue for Ray.

## Connecting to data through the Data Catalog
<a name="edit-script-ray-gludc"></a>

Managing your data through the Data Catalog in conjunction with Ray jobs is supported with the AWS SDK for pandas. For more information, see [Glue Catalog](https://aws-sdk-pandas.readthedocs.io/en/3.0.0rc2/tutorials/005%20-%20Glue%20Catalog.html) on the AWS SDK for pandas website.