

# How data processing works in Data Wrangler

While working with data interactively in an Amazon SageMaker Data Wrangler data flow, Amazon SageMaker Canvas only applies the transformations to a sample dataset for you to preview. After finishing your data flow in SageMaker Canvas, you can process all of your data and save it in a location that is suitable for your machine learning workflows.

There are several options for how to proceed after you've finished transforming your data in Data Wrangler:
+ [Create a model](canvas-processing-export-model.md). You can create a Canvas model, where you directly start creating a model with your prepared data. You can create a model either after processing your entire dataset, or by exporting just the sample data you worked with in Data Wrangler. Canvas saves your processed data (either the entire dataset or the sample data) as a Canvas dataset.

  We recommend that you use your sample data for quick iterations, and that you use your entire dataset when you want to train your final model. When you build tabular models, datasets larger than 5 GB are automatically downsampled to 5 GB; for time series forecasting models, datasets larger than 30 GB are downsampled to 30 GB.

  To learn more about creating a model, see [How custom models work](canvas-build-model.md).
+ [Export the data](canvas-export-data.md). You can export your data for use in machine learning workflows. When you choose to export your data, you have several options:
  + You can save your data in the Canvas application as a dataset. For more information about the supported file types for Canvas datasets and additional requirements when importing data into Canvas, see [Create a dataset](canvas-import-dataset.md).
  + You can save your data to Amazon S3. Depending on the Canvas memory availability, your data is processed in the application and then exported to Amazon S3. If the size of your dataset exceeds what Canvas can process, then by default, Canvas uses an EMR Serverless job to scale to multiple compute instances, process your full dataset, and export it to Amazon S3. You can also manually configure a SageMaker Processing job to have more granular control over the compute resources used to process your data.
+ [Export a data flow](canvas-export-data-flow.md). You might want to save the code for your data flow so that you can modify or run your transformations outside of Canvas. Canvas provides you with the option to save your data flow transformations as Python code in a Jupyter notebook, which you can then export to Amazon S3 for use elsewhere in your machine learning workflows.

When you export your data from a data flow and save it either as a Canvas dataset or to Amazon S3, Canvas creates a new destination node in your data flow, which is a final node that shows you where your processed data is stored. You can add additional destination nodes to your flow if you'd like to perform multiple export operations. For example, you can export the data from different points in your data flow to only apply some of the transformations, or you can export transformed data to different Amazon S3 locations. For more information about how to add or edit a destination node, see [Add destination nodes](canvas-destination-nodes-add.md) and [Edit a destination node](canvas-destination-nodes-edit.md).

For more information about using Amazon EventBridge to automatically process and export your data on a schedule, see [Create a schedule to automatically process new data](canvas-data-export-schedule-job.md).

# Export to create a model


In just a few clicks from your data flow, you can export your transformed data and start creating an ML model in Canvas. Canvas saves your data as a Canvas dataset, and you're taken to the model build configuration page for a new model.

To create a Canvas model with your transformed data:

1. Navigate to your data flow.

1. Choose the ellipsis icon next to the node that you're exporting.

1. From the context menu, choose **Create model**.

1. In the **Export to create a model** side panel, enter a **Dataset name** for the new dataset.

1. Leave the **Process entire dataset** option selected to process and export your entire dataset before proceeding with building a model. Turn this option off to train your model using the interactive sample data you are working with in your data flow.

1. Enter a **Model name** to name the new model.

1. Select a **Problem type**, or the type of model that you want to build. For more information about the supported model types in SageMaker Canvas, see [How custom models work](canvas-build-model.md).

1. Select the **Target column**, or the value that you want the model to predict.

1. Choose **Export and create model**.

The **Build** tab for a new Canvas model should open, and you can finish configuring and training your model. For more information about how to build a model, see [Build a model](canvas-build-model-how-to.md).

# Export data


Export data to apply the transforms from your data flow to the full imported dataset. You can export any node in your data flow to the following locations:
+ SageMaker Canvas dataset
+ Amazon S3

If you want to train models in Canvas, you can export your full, transformed dataset as a Canvas dataset. If you want to use your transformed data in machine learning workflows external to SageMaker Canvas, you can export your dataset to Amazon S3.

## Export to a Canvas dataset


Use the following procedure to export a SageMaker Canvas dataset from a node in your data flow.

**To export a node in your flow as a SageMaker Canvas dataset**

1. Navigate to your data flow.

1. Choose the ellipsis icon next to the node that you're exporting.

1. In the context menu, hover over **Export**, and then select **Export data to Canvas dataset**.

1. In the **Export to Canvas dataset** side panel, enter a **Dataset name** for the new dataset.

1. Leave the **Process entire dataset** option selected if you want SageMaker Canvas to process and save your full dataset. Turn this option off to only apply the transforms to the sample data you are working with in your data flow.

1. Choose **Export**.

You should now be able to go to the **Datasets** page of the Canvas application and see your new dataset.

## Export to Amazon S3


When exporting your data to Amazon S3, you can scale to transform and process data of any size. Canvas automatically processes your data locally if the application's memory can handle the size of your dataset. If your dataset size exceeds the local memory capacity of 5 GB, then Canvas initiates a remote job on your behalf to provision additional compute resources and process the data more quickly. By default, Canvas uses Amazon EMR Serverless to run these remote jobs. However, you can manually configure Canvas to use either EMR Serverless or a SageMaker Processing job with your own settings.

**Note**  
When running an EMR Serverless job, by default the job inherits the IAM role, KMS key settings, and tags of your Canvas application.

The following summarizes the options for remote jobs in Canvas:
+ **EMR Serverless**: This is the default option that Canvas uses for remote jobs. EMR Serverless automatically provisions and scales compute resources to process your data so that you don't have to worry about choosing the right compute resources for your workload. For more information about EMR Serverless, see the [EMR Serverless User Guide](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/emr-serverless.html).
+ **SageMaker Processing**: SageMaker Processing jobs offer more advanced options and granular control over the compute resources used to process your data. For example, you can specify the type and count of the compute instances, configure the job in your own VPC and control network access, automate processing jobs, and more. For more information about automating processing jobs see [Create a schedule to automatically process new data](canvas-data-export-schedule-job.md). For more general information about SageMaker Processing jobs, see [Data transformation workloads with SageMaker Processing](processing-job.md).
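To make the SageMaker Processing option above concrete, the manual settings that Canvas exposes (instance type, instance count, IAM role, volume size) correspond to fields in a `CreateProcessingJob`-style request. The following sketch only assembles such a request as a plain dictionary without calling AWS; the function name, defaults, and ARN are illustrative, not values Canvas uses.

```python
# Sketch: assemble CreateProcessingJob-style parameters that mirror the
# manual SageMaker Processing settings in Canvas. Illustrative only; this
# does not call AWS, and the defaults below are assumptions.

def build_processing_job_params(job_name, role_arn, instance_type="ml.m5.4xlarge",
                                instance_count=2, volume_size_gb=30):
    """Return a request dict shaped like the CreateProcessingJob compute settings."""
    return {
        "ProcessingJobName": job_name,
        "RoleArn": role_arn,
        "ProcessingResources": {
            "ClusterConfig": {
                "InstanceType": instance_type,
                "InstanceCount": instance_count,
                "VolumeSizeInGB": volume_size_gb,
            }
        },
    }

params = build_processing_job_params(
    "canvas-export-job", "arn:aws:iam::111122223333:role/MyExecutionRole")
print(params["ProcessingResources"]["ClusterConfig"]["InstanceCount"])  # 2
```

Mapping the console fields onto a request like this can help when you later automate the same export outside of Canvas.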

The following file types are supported when exporting to Amazon S3:
+ CSV
+ Parquet

To get started, review the following prerequisites.

### Prerequisites for EMR Serverless jobs


To create a remote job that uses EMR Serverless resources, you must have the necessary permissions. You can grant permissions either through the Amazon SageMaker AI domain or user profile settings, or you can manually configure your user's AWS IAM role. For instructions on how to grant users permissions to perform large data processing, see [Grant Users Permissions to Use Large Data across the ML Lifecycle](canvas-large-data-permissions.md).

If you don't want to configure these policies but still need to process large datasets through Data Wrangler, you can alternatively use a SageMaker Processing job.

Use the following procedures to export your data to Amazon S3. To configure a remote job, follow the optional advanced steps.

**To export a node in your flow to Amazon S3**

1. Navigate to your data flow.

1. Choose the ellipsis icon next to the node that you're exporting.

1. In the context menu, hover over **Export**, and then select **Export data to Amazon S3**.

1. In the **Export to Amazon S3** side panel, you can change the **Dataset name** for the new dataset.

1. For the **S3 location**, enter the Amazon S3 location to which you want to export the dataset. You can enter the S3 URI, alias, or ARN of the S3 location or S3 access point. For more information about access points, see [Managing data access with Amazon S3 access points](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-points.html) in the *Amazon S3 User Guide*.

1. (Optional) For the **Advanced settings**, specify values for the following fields:

   1. **File type** – The file format of your exported data.

   1. **Delimiter** – The delimiter used to separate values in the file.

   1. **Compression** – The compression method used to reduce the file size.

   1. **Number of partitions** – The number of dataset files that Canvas writes as the output of the job.

   1. **Choose columns** – You can choose a subset of columns from the data to include in the partitions.

1. Leave the **Process entire dataset** option selected if you want Canvas to apply your data flow transforms to your entire dataset and export the result. If you deselect this option, Canvas only applies the transforms to the sample of your dataset used in the interactive Data Wrangler data flow.
**Note**  
If you only export a sample of your data, Canvas processes your data in the application and doesn't create a remote job for you.

1. Leave the **Auto job configuration** option selected if you want Canvas to automatically determine whether to run the job using Canvas application memory or an EMR Serverless job. If you deselect this option and manually configure your job, then you can choose to use either an EMR Serverless or a SageMaker Processing job. For instructions on how to configure an EMR Serverless or a SageMaker Processing job, see the section after this procedure before you export your data.

1. Choose **Export**.
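The **Number of partitions** setting above controls how many output files the job writes. As a rough sketch of what partitioned output looks like, the following splits rows round-robin into a given number of CSV partitions using only the standard library; the in-memory buffers stand in for the partition files Canvas would write to Amazon S3.

```python
# Sketch: round-robin partitioning of rows into N CSV "files".
# Illustrative only; Canvas's actual partitioning strategy may differ.
import csv
import io

def write_partitions(rows, header, num_partitions):
    """Split rows round-robin into num_partitions CSV buffers (in-memory here)."""
    buffers = [io.StringIO() for _ in range(num_partitions)]
    writers = [csv.writer(b) for b in buffers]
    for w in writers:
        w.writerow(header)  # each partition file repeats the header
    for i, row in enumerate(rows):
        writers[i % num_partitions].writerow(row)
    return [b.getvalue() for b in buffers]

parts = write_partitions([[1, "a"], [2, "b"], [3, "c"]], ["id", "val"], 2)
print(len(parts))  # 2 partition files
```

More partitions generally means smaller files that downstream systems can read in parallel; a single partition produces one consolidated file.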

The following procedures show how to manually configure the remote job settings for either EMR Serverless or SageMaker Processing when exporting your full dataset to Amazon S3.

------
#### [ EMR Serverless ]

To configure an EMR Serverless job while exporting to Amazon S3, do the following:

1. In the Export to Amazon S3 side panel, turn off the **Auto job configuration** option.

1. Select **EMR Serverless**.

1. For **Job name**, enter a name for your EMR Serverless job. The name can contain letters, numbers, hyphens, and underscores.

1. For **IAM role**, enter the user's IAM execution role. This role should have the required permissions to run EMR Serverless applications. For more information, see [Grant Users Permissions to Use Large Data across the ML Lifecycle](canvas-large-data-permissions.md).

1. (Optional) For **KMS key**, specify the key ID or ARN of an AWS KMS key to encrypt the job logs. If you don't enter a key, Canvas uses a default key for EMR Serverless.

1. (Optional) For **Monitoring configuration**, enter the name of an Amazon CloudWatch Logs log group to which you want to publish your logs.

1. (Optional) For **Tags**, add metadata tags to the EMR Serverless job consisting of key-value pairs. These tags can be used to categorize and search for jobs.

1. Choose **Export** to start the job.

------
#### [ SageMaker Processing ]

To configure a SageMaker Processing job while exporting to Amazon S3, do the following:

1. In the **Export to Amazon S3** side panel, turn off the **Auto job configuration** option.

1. Select **SageMaker Processing**.

1. For **Job name**, enter a name for your SageMaker AI Processing job.

1. For **Instance type**, select the type of compute instance to run the processing job.

1. For **Instance count**, specify the number of compute instances to launch.

1. For **IAM role**, enter the user's IAM execution role. This role should have the required permissions for SageMaker AI to create and run processing jobs on your behalf. These permissions are granted if you have the [AmazonSageMakerFullAccess](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonSageMakerFullAccess.html) policy attached to your IAM role.

1. For **Volume size**, enter the storage size in GB for the ML storage volume that is attached to each processing instance. Choose the size based on your expected input and output data size.

1. (Optional) For **Volume KMS key**, specify a KMS key to encrypt the storage volume. If you don't specify a key, the default Amazon EBS encryption key is used.

1. (Optional) For **KMS key**, specify a KMS key to encrypt input and output Amazon S3 data sources used by the processing job.

1. (Optional) For **Spark memory configuration**, do the following:

   1. Enter **Driver memory in MB** for the Spark driver node that handles job coordination and scheduling.

   1. Enter **Executor memory in MB** for the Spark executor nodes that run individual tasks in the job.

1. (Optional) For **Network configuration**, do the following:

   1. For **Subnet configuration**, enter the IDs of the VPC subnets for the processing instances to be launched in. By default, the job uses the settings of your default VPC.

   1. For **Security group configuration**, enter the IDs of the security groups to control inbound and outbound connectivity rules.

   1. Turn on the **Enable inter-container traffic encryption** option to encrypt network communication between processing containers during the job.

1. (Optional) For **Associate schedules**, you can choose to create an Amazon EventBridge schedule that runs the processing job at recurring intervals. Choose **Create new schedule** and fill out the dialog box. For more information about filling out this section and running processing jobs on a schedule, see [Create a schedule to automatically process new data](canvas-data-export-schedule-job.md).

1. (Optional) Add **Tags** as key-value pairs so that you can categorize and search for processing jobs.

1. Choose **Export** to start the processing job.

------

After exporting your data, you should find the fully processed dataset in the specified Amazon S3 location.

# Export a data flow


Exporting your data flow translates the operations that you've made in Data Wrangler into Python code and saves it as a Jupyter notebook that you can modify and run. This can be helpful for integrating your data transformation code into your machine learning pipelines.

You can choose any data node in your data flow and export it. Exporting the data node exports the transformation that the node represents and the transformations that precede it.

**To export a data flow as a Jupyter notebook**

1. Navigate to your data flow.

1. Choose the ellipsis icon next to the node that you want to export.

1. In the context menu, hover over **Export**, and then hover over **Export via Jupyter notebook**.

1. Choose one of the following:
   + **SageMaker Pipelines**
   + **Amazon S3**
   + **SageMaker AI Inference Pipeline**
   + **SageMaker AI Feature Store**
   + **Python Code**

1. The **Export data flow as notebook** dialog box opens. Select one of the following:
   + **Download a local copy**
   + **Export to S3 location**

1. If you selected **Export to S3 location**, enter the Amazon S3 location to which you want to export the notebook.

1. Choose **Export**.

Your Jupyter notebook should either download to your local machine, or you can find it saved in the Amazon S3 location you specified.
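Because the exported notebook is an ordinary `.ipynb` file, which is JSON in the standard nbformat, you can inspect or post-process it with the standard library before running it elsewhere. The notebook content below is a minimal illustrative structure, not actual Canvas output.

```python
# Sketch: inspect an exported notebook as plain JSON (nbformat 4).
# The notebook body shown here is illustrative, not Canvas output.
import json

notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Data flow transforms"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["df = apply_transforms(df)"]},
    ],
})

nb = json.loads(notebook_json)
code_cells = [c for c in nb["cells"] if c["cell_type"] == "code"]
print(len(code_cells))  # 1
```

This kind of inspection is useful if you want to extract just the transformation code cells for use in a pipeline script.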

# Add destination nodes


A destination node in SageMaker Canvas specifies where to store your processed and transformed data. When you choose to export your transformed data to Amazon S3, Canvas uses the specified destination node location, applying all the transformations you've configured in your data flow. For more information about export jobs to Amazon S3, see the preceding section [Export to Amazon S3](canvas-export-data.md#canvas-export-data-s3).

By default, choosing to export your data to Amazon S3 adds a destination node to your data flow. However, you can add multiple destination nodes to your flow, allowing you to simultaneously export different sets of transformations or variations of your data to different Amazon S3 locations. For example, you can create one destination node that exports the data after applying all transformations, and another destination node that exports the data after only certain initial transformations, such as a join operation. This flexibility enables you to export and store different versions or subsets of your transformed data in separate S3 locations for various use cases.

Use the following procedure to add a destination node to your data flow.

**To add a destination node**

1. Navigate to your data flow.

1. Choose the ellipsis icon next to the node where you want to place the destination node.

1. In the context menu, hover over **Export**, and then select **Add destination**.

1. In the **Export destination** side panel, enter a **Dataset name** to name the output.

1. For **Amazon S3 location**, enter the Amazon S3 location to which you want to export the output. You can enter the S3 URI, alias, or ARN of the S3 location or S3 access point. For more information about access points, see [Managing data access with Amazon S3 access points](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-points.html) in the *Amazon S3 User Guide*.

1. For **Export settings**, specify the following fields:

   1. **File type** – The file format of the exported data.

   1. **Delimiter** – The delimiter used to separate values in the file.

   1. **Compression** – The compression method used to reduce the file size.

1. For **Partitioning**, specify the following fields:

   1. **Number of partitions** – The number of dataset files that SageMaker Canvas writes as the output of the job.

   1. **Choose columns** – You can choose a subset of columns from the data to include in the partitions.

1. Choose **Add** to add the destination node to your data flow, or choose **Add** and then choose **Export** to add the node and initiate an export job.

You should now see a new destination node in your flow.

# Edit a destination node


A *destination node* in an Amazon SageMaker Canvas data flow specifies the Amazon S3 location where your processed and transformed data is stored, with all of the transformations configured in your data flow applied. You can edit the configuration of an existing destination node and then choose to re-run the job to overwrite the data in the specified Amazon S3 location. For more information about adding a new destination node, see [Add destination nodes](canvas-destination-nodes-add.md).

Use the following procedure to edit a destination node in your data flow and initiate an export job.

**To edit a destination node**

1. Navigate to your data flow.

1. Choose the ellipsis icon next to the destination node that you want to edit.

1. In the context menu, choose **Edit**.

1. The **Edit destination** side panel opens. From this panel, you can edit details such as the dataset name, the Amazon S3 location, and the export and partitioning settings.

1. (Optional) In **Additional nodes to export**, you can select more destination nodes to process when you run the export job.

1. Leave the **Process entire dataset** option selected if you want Canvas to apply your data flow transforms to your entire dataset and export the result. If you deselect this option, Canvas only applies the transforms to the sample of your dataset used in the interactive Data Wrangler data flow.

1. Leave the **Auto job configuration** option selected if you want Canvas to automatically determine whether to run the job using Canvas application memory or an EMR Serverless job. If you deselect this option and manually configure your job, then you can choose to use either an EMR Serverless or a SageMaker Processing job. For instructions on how to configure an EMR Serverless or a SageMaker Processing job, see the preceding section [Export to Amazon S3](canvas-export-data.md#canvas-export-data-s3).

1. When you're done making changes, choose **Update**.

Saving changes to your destination node configuration doesn't automatically re-run a job or overwrite data that has already been processed and exported. Export your data again to run a job with the new configuration. If you decide to export your data again with a job, Canvas uses the updated destination node configuration to transform and output the data to the specified location, overwriting any existing data.

# Create a schedule to automatically process new data


**Note**  
The following section only applies to SageMaker Processing jobs. If you used the default Canvas settings or EMR Serverless to create a remote job to apply transforms to your full dataset, this section doesn’t apply.

If you're processing data periodically, you can create a schedule to run the processing job automatically. For example, you can create a schedule that runs a processing job automatically when you get new data. For more information about processing jobs, see [Export to Amazon S3](canvas-export-data.md#canvas-export-data-s3).

When you create a job, you must specify an IAM role that has permissions to create the job. You can use the [AmazonSageMakerCanvasDataPrepFullAccess](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonSageMakerCanvasDataPrepFullAccess.html) policy to add permissions.

Add the following trust policy to the role to allow EventBridge to assume it.

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "events.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```

**Important**  
When you create a schedule, Data Wrangler creates an `eventRule` in EventBridge. You incur charges for both the event rules that you create and the instances used to run the processing job.  
For information about EventBridge pricing, see [Amazon EventBridge pricing](https://aws.amazon.com/eventbridge/pricing/). For information about processing job pricing, see [Amazon SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/).

You can set a schedule using one of the following methods:
+ [CRON expressions](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-create-rule-schedule.html)
**Note**  
Data Wrangler doesn't support the following expressions:  
The `L`, `W`, and `#` characters
Abbreviations for days
Abbreviations for months
+ [RATE expressions](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-create-rule-schedule.html#eb-rate-expressions)
+ Recurring – Set an hourly or daily interval to run the job.
+ Specific time – Set specific days and times to run the job.
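The CRON and RATE methods above correspond to EventBridge schedule expression strings. The following helpers just format those strings so you can see the expected shapes; the values shown are examples, and the function names are illustrative.

```python
# Sketch: format EventBridge schedule expressions. Helper names are
# illustrative; the string shapes follow the EventBridge syntax.

def rate_expression(value, unit):
    """RATE expression, for example rate(12 hours)."""
    return f"rate({value} {unit})"

def cron_expression(minutes, hours, day_of_month, month, day_of_week, year="*"):
    """Six-field EventBridge CRON expression."""
    return f"cron({minutes} {hours} {day_of_month} {month} {day_of_week} {year})"

print(rate_expression(12, "hours"))              # rate(12 hours)
print(cron_expression("0", "9", "*", "*", "?"))  # cron(0 9 * * ? *)
```

Note that EventBridge CRON expressions have six fields (including the year), and either the day-of-month or day-of-week field must be `?`.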

The following sections provide procedures on scheduling jobs when filling out the SageMaker AI Processing job settings while [exporting your data to Amazon S3](canvas-export-data.md#canvas-export-data-s3). All of the following instructions begin in the **Associate schedules** section of the SageMaker Processing job settings.

------
#### [ CRON ]

Use the following procedure to create a schedule with a CRON expression.

1. In the **Export to Amazon S3** side panel, make sure you've turned off the **Auto job configuration** toggle and have the **SageMaker Processing** option selected.

1. In the **SageMaker Processing** job settings, open the **Associate schedules** section and choose **Create new schedule**.

1. The **Create new schedule** dialog box opens. For **Schedule Name**, specify the name of the schedule.

1. For **Run Frequency**, choose **CRON**.

1. For each of the **Minutes**, **Hours**, **Days of month**, **Month**, and **Day of week** fields, enter valid CRON expression values.

1. Choose **Create**.

1. (Optional) Choose **Add another schedule** to run the job on an additional schedule.
**Note**  
You can associate a maximum of two schedules. The schedules are independent and don't affect each other unless the times overlap.

1. Choose one of the following:
   + **Schedule and run now** – The job runs immediately and subsequently runs on the schedules.
   + **Schedule only** – The job only runs on the schedules that you specify.

1. Choose **Export** after you've filled out the rest of the export job settings.

------
#### [ RATE ]

Use the following procedure to create a schedule with a RATE expression.

1. In the **Export to Amazon S3** side panel, make sure you've turned off the **Auto job configuration** toggle and have the **SageMaker Processing** option selected.

1. In the **SageMaker Processing** job settings, open the **Associate schedules** section and choose **Create new schedule**.

1. The **Create new schedule** dialog box opens. For **Schedule Name**, specify the name of the schedule.

1. For **Run Frequency**, choose **Rate**.

1. For **Value**, specify an integer.

1. For **Unit**, select one of the following:
   + **Minutes**
   + **Hours**
   + **Days**

1. Choose **Create**.

1. (Optional) Choose **Add another schedule** to run the job on an additional schedule.
**Note**  
You can associate a maximum of two schedules. The schedules are independent and don't affect each other unless the times overlap.

1. Choose one of the following:
   + **Schedule and run now** – The job runs immediately and subsequently runs on the schedules.
   + **Schedule only** – The job only runs on the schedules that you specify.

1. Choose **Export** after you've filled out the rest of the export job settings.

------
#### [ Recurring ]

Use the following procedure to create a schedule that runs a job on a recurring basis.

1. In the **Export to Amazon S3** side panel, make sure you've turned off the **Auto job configuration** toggle and have the **SageMaker Processing** option selected.

1. In the **SageMaker Processing** job settings, open the **Associate schedules** section and choose **Create new schedule**.

1. The **Create new schedule** dialog box opens. For **Schedule Name**, specify the name of the schedule.

1. For **Run Frequency**, choose **Recurring**.

1. For **Every x hours**, specify the hourly interval at which the job runs during the day. Valid values are integers from **1** to **23**, inclusive.

1. For **On days**, select one of the following options:
   + **Every Day**
   + **Weekends**
   + **Weekdays**
   + **Select Days**

   1. (Optional) If you've selected **Select Days**, choose the days of the week to run the job.
**Note**  
The schedule resets every day. If you schedule a job to run every five hours, it runs at the following times during the day:  
00:00
05:00
10:00
15:00
20:00

1. Choose **Create**.

1. (Optional) Choose **Add another schedule** to run the job on an additional schedule.
**Note**  
You can associate a maximum of two schedules. The schedules are independent and don't affect each other unless the times overlap.

1. Choose one of the following:
   + **Schedule and run now** – The job runs immediately and subsequently runs on the schedules.
   + **Schedule only** – The job only runs on the schedules that you specify.

1. Choose **Export** after you've filled out the rest of the export job settings.
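Because the recurring schedule resets at midnight, the run times for an **Every x hours** setting can be computed directly, as this small sketch shows:

```python
# The recurring schedule resets at midnight each day, so the run times
# for an "Every x hours" setting are simply every x-th hour from 00:00.

def daily_run_times(every_x_hours):
    """Times of day (24-hour clock) at which a recurring job fires."""
    return [f"{h:02d}:00" for h in range(0, 24, every_x_hours)]

print(daily_run_times(5))  # ['00:00', '05:00', '10:00', '15:00', '20:00']
```

This matches the five-hour example in the note above: the last run is at 20:00, and the next run starts over at 00:00 the following day.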

------
#### [ Specific time ]

Use the following procedure to create a schedule that runs a job at specific times.

1. In the **Export to Amazon S3** side panel, make sure you've turned off the **Auto job configuration** toggle and have the **SageMaker Processing** option selected.

1. In the **SageMaker Processing** job settings, open the **Associate schedules** section and choose **Create new schedule**.

1. The **Create new schedule** dialog box opens. For **Schedule Name**, specify the name of the schedule.

1. For **Run Frequency**, choose **Start time**.

1. For **Start time**, enter a time in 24-hour format (for example, **09:00**). The start time is interpreted in your local time zone by default.

1. For **On days**, select one of the following options:
   + **Every Day**
   + **Weekends**
   + **Weekdays**
   + **Select Days**

   1. (Optional) If you've selected **Select Days**, choose the days of the week to run the job.

1. Choose **Create**.

1. (Optional) Choose **Add another schedule** to run the job on an additional schedule.
**Note**  
You can associate a maximum of two schedules. The schedules are independent and don't affect each other unless the times overlap.

1. Choose one of the following:
   + **Schedule and run now** – The job runs immediately and subsequently runs on the schedules.
   + **Schedule only** – The job only runs on the schedules that you specify.

1. Choose **Export** after you've filled out the rest of the export job settings.

------

You can use the SageMaker AI console to view the jobs that are scheduled to run. Your processing jobs run within Pipelines: each processing job runs as a processing step in its own pipeline. You can view the schedules that you've created within a pipeline. For information about viewing a pipeline, see [View the details of a pipeline](pipelines-studio-list.md).

Use the following procedure to view the jobs that you've scheduled.

**To view your scheduled jobs**

1. Open Amazon SageMaker Studio Classic.

1. Open **Pipelines**.

1. View the pipelines for the jobs that you've created.

   The pipeline name is the job name prefixed with `canvas-data-prep-`. For example, if you've created a job named `housing-data-feature-engineering`, the name of the pipeline is `canvas-data-prep-housing-data-feature-engineering`.

1. Choose the pipeline containing your job.

1. View the status of the pipelines. Pipelines with a **Status** of **Succeeded** have run the processing job successfully.
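The naming convention described in the procedure above can be captured in one line, which is handy if you're locating these pipelines programmatically; the prefix is taken from the example in this section.

```python
# Sketch: Canvas names the pipeline by prefixing the processing job name,
# per the example in this section.

def pipeline_name_for_job(job_name, prefix="canvas-data-prep-"):
    """Return the pipeline name that corresponds to a scheduled processing job."""
    return prefix + job_name

print(pipeline_name_for_job("housing-data-feature-engineering"))
# canvas-data-prep-housing-data-feature-engineering
```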

To stop a processing job from running, delete the event rule that specifies the schedule. Deleting an event rule stops all the jobs associated with the schedule from running. For information about deleting a rule, see [Disabling or deleting an Amazon EventBridge rule](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-delete-rule.html).

You can stop and delete the pipelines associated with the schedules as well. For information about stopping a pipeline, see [StopPipelineExecution](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_StopPipelineExecution.html). For information about deleting a pipeline, see [DeletePipeline](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DeletePipeline.html#API_DeletePipeline_RequestSyntax).