

# Where you can create a notebook job


You have multiple options for creating a notebook job. The following describes the options that SageMaker AI provides.

You can create a job in your JupyterLab notebook in the Studio UI, or you can programmatically create a job with the SageMaker Python SDK:
+ If you create your notebook job in the Studio UI, you supply details about the image and kernel, security configurations, and any custom variables or scripts, and your job is scheduled. For details about how to schedule your job using SageMaker Notebook Jobs, see [Create a notebook job in Studio](create-notebook-auto-run-studio.md).
+ To create a notebook job with the SageMaker Python SDK, you create a pipeline with a Notebook Job step and initiate an on-demand run or optionally use the pipeline scheduling feature to schedule future runs. The SageMaker SDK gives you the flexibility to customize your pipeline—you can expand your pipeline to a workflow with multiple notebook job steps. Since you create both a SageMaker Notebook Job step and a pipeline, you can track your pipeline execution status in the SageMaker Notebook Jobs job dashboard and also view your pipeline graph in Studio. For details about how to schedule your job with the SageMaker Python SDK and links to example notebooks, see [Create notebook job with SageMaker AI Python SDK example](create-notebook-auto-run-sdk.md).

# Create notebook job with SageMaker AI Python SDK example


To run a standalone notebook using the SageMaker Python SDK, create a Notebook Job step, attach it to a pipeline, and use the utilities provided by Pipelines to run your job on demand or optionally schedule one or more future jobs. The following sections describe the basic steps to create an on-demand or scheduled notebook job and track the run. In addition, refer to the following discussion if you need to pass parameters to your notebook job or connect to Amazon EMR in your notebook; these cases require additional preparation of your Jupyter notebook. You can also apply defaults for a subset of the arguments of `NotebookJobStep` so you don't have to specify them every time you create a Notebook Job step.

To view sample notebooks that demonstrate how to schedule notebook jobs with the SageMaker AI Python SDK, see [notebook job sample notebooks](https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-pipelines/notebook-job-step).

**Topics**
+ [Steps to create a notebook job](#create-notebook-auto-run-overall)
+ [View your notebook jobs in the Studio UI dashboard](#create-notebook-auto-run-dash)
+ [View your pipeline graph in Studio](#create-notebook-auto-run-graph)
+ [Passing parameters to your notebook](#create-notebook-auto-run-passparam)
+ [Connecting to an Amazon EMR cluster in your input notebook](#create-notebook-auto-run-emr)
+ [Set up default options](#create-notebook-auto-run-intdefaults)

## Steps to create a notebook job


You can create a notebook job that runs either immediately or on a schedule. The following instructions describe both methods.

**To schedule a notebook job, complete the following basic steps:**

1. Create a `NotebookJobStep` instance. For details about `NotebookJobStep` parameters, see [sagemaker.workflow.steps.NotebookJobStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.notebook_job_step.NotebookJobStep). At minimum, provide the following arguments, as shown in the following code snippet:
**Important**  
If you schedule your notebook job using the SageMaker Python SDK, you can only specify certain images to run your notebook job. For more information, see [Image constraints for SageMaker AI Python SDK notebook jobs](notebook-auto-run-constraints.md#notebook-auto-run-constraints-image-sdk).

   ```
   notebook_job_step = NotebookJobStep(
       input_notebook=input-notebook,
       image_uri=image-uri,
       kernel_name=kernel-name
   )
   ```

1. Create a pipeline with your `NotebookJobStep` as a single step, as shown in the following snippet:

   ```
   pipeline = Pipeline(
       name=pipeline-name,
       steps=[notebook_job_step],
       sagemaker_session=sagemaker-session,
   )
   ```

1. Run the pipeline on demand or optionally schedule future pipeline runs. To initiate an immediate run, use the following command:

   ```
   execution = pipeline.start(
       parameters={...}
   )
   ```

   Optionally, you can schedule a single future pipeline run or multiple runs at a predetermined interval. You specify your schedule in `PipelineSchedule` and then pass the schedule object to your pipeline with `put_triggers`. For more information about pipeline scheduling, see [Schedule a pipeline with the SageMaker Python SDK](pipeline-eventbridge.md#build-and-manage-scheduling).

   The following example schedules your pipeline to run once on December 25, 2023 at 10:31:32 UTC.

   ```
   my_schedule = PipelineSchedule(
       name="my-schedule",
       at=datetime(year=2023, month=12, day=25, hour=10, minute=31, second=32)
   )
   pipeline.put_triggers(triggers=[my_schedule])
   ```

   The following example schedules your pipeline to run at 10:15 AM UTC on the last Friday of each month during the years 2022 to 2023. For details about cron-based scheduling, see [Cron-based schedules](https://docs.aws.amazon.com/scheduler/latest/UserGuide/schedule-types.html#cron-based).

   ```
   my_schedule = PipelineSchedule(
       name="my-schedule",
       cron="15 10 ? * 6L 2022-2023"
   )
   pipeline.put_triggers(triggers=[my_schedule])
   ```

1. (Optional) View your notebook jobs in the SageMaker Notebook Jobs dashboard. The values you supply for the `tags` argument of your Notebook Job step control how the Studio UI captures and displays the job. For more information, see [View your notebook jobs in the Studio UI dashboard](#create-notebook-auto-run-dash).
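The cron expression above follows the six-field Amazon EventBridge format: minutes, hours, day-of-month, month, day-of-week, and year. As an illustrative aid only (this helper is not part of the SageMaker SDK), you can label the fields of an expression before passing it to `PipelineSchedule`:

```
def split_eventbridge_cron(expr):
    """Split a six-field Amazon EventBridge cron expression into named parts."""
    fields = expr.split()
    if len(fields) != 6:
        raise ValueError(
            "expected six fields: minutes hours day-of-month month day-of-week year"
        )
    names = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")
    return dict(zip(names, fields))

# "6L" in the day-of-week field means the last Friday of the month.
print(split_eventbridge_cron("15 10 ? * 6L 2022-2023"))
```

Splitting the expression this way makes it easier to spot a wrong field order before a schedule silently fires at the wrong time.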

## View your notebook jobs in the Studio UI dashboard


The notebook jobs you create as pipeline steps appear in the Studio Notebook Jobs dashboard if you specify certain tags.

**Note**  
Only notebook jobs created in Studio or local JupyterLab environments create job definitions. Therefore, if you create your notebook job with the SageMaker Python SDK, you don’t see job definitions in the Notebook Jobs dashboard. You can, however, view your notebook jobs as described in [View notebook jobs](view-notebook-jobs.md). 

You can control which team members can view your notebook jobs with the following tags:
+ To display the notebook job to all user profiles or [spaces](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-jl-user-guide.html) in a domain, add the domain tag with your domain name. An example is shown as follows:
  + key: `sagemaker:domain-name`, value: `d-abcdefghij5k`
+ To display the notebook job to a certain user profile in a domain, add both the user profile and the domain tags. An example of a user profile tag is shown as follows:
  + key: `sagemaker:user-profile-name`, value: `studio-user`
+ To display the notebook job to a [space](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-jl-user-guide.html), add both the space and the domain tags. An example of a space tag is shown as follows:
  + key: `sagemaker:shared-space-name`, value: `my-space-name`
+ If you do not attach any domain, user profile, or space tags, the Studio UI does not show the notebook job created by the pipeline step. In this case, you can view the underlying training job in the training job console, or you can view the status in the [list of pipeline executions](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-studio-view-execution.html).

Once you set up the necessary tags to view your jobs in the dashboard, see [View notebook jobs](view-notebook-jobs.md) for instructions about how to view your jobs and download outputs.
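As a sketch (the helper name is hypothetical), the tags above can be assembled into the list of `Key`/`Value` dictionaries that the SageMaker SDK conventionally uses for the `tags` argument:

```
def notebook_job_visibility_tags(domain_name, user_profile_name=None, shared_space_name=None):
    """Build the dashboard-visibility tags for a Notebook Job step."""
    tags = [{"Key": "sagemaker:domain-name", "Value": domain_name}]
    if user_profile_name:
        tags.append({"Key": "sagemaker:user-profile-name", "Value": user_profile_name})
    if shared_space_name:
        tags.append({"Key": "sagemaker:shared-space-name", "Value": shared_space_name})
    return tags

# Show the job to one user profile in a domain.
print(notebook_job_visibility_tags("d-abcdefghij5k", user_profile_name="studio-user"))
```

The domain ID and user profile name shown are the example values from the list above; substitute your own.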

## View your pipeline graph in Studio


Since your notebook job step is part of a pipeline, you can view the pipeline graph (DAG) in Studio. In the pipeline graph, you can view the status of the pipeline run and track lineage. For details, see [View the details of a pipeline run](pipelines-studio-view-execution.md).

## Passing parameters to your notebook


If you want to pass parameters to your notebook job (using the `parameters` argument of `NotebookJobStep`), you need to prepare your input notebook to receive the parameters. 

The Papermill-based notebook job executor searches for a Jupyter cell tagged with the `parameters` tag and applies the new parameters or parameter overrides immediately after this cell. For details, see [Parameterize your notebook](notebook-auto-run-troubleshoot-override.md). 
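For illustration, a cell tagged `parameters` in your input notebook might declare a default like the following (the variable name is hypothetical); Papermill injects a new cell containing your overrides immediately after it:

```
# Cell tagged "parameters": declares defaults that the notebook job can override.
company = "Default Company"
```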

Once you have performed this step, pass your parameters to your `NotebookJobStep`, as shown in the following example:

```
notebook_job_parameters = {
    "company": "Amazon"
}

notebook_job_step = NotebookJobStep(
    image_uri=image-uri,
    kernel_name=kernel-name,
    role=role-name,
    input_notebook=input-notebook,
    parameters=notebook_job_parameters,
    ...
)
```

## Connecting to an Amazon EMR cluster in your input notebook


If you connect to an Amazon EMR cluster from your Jupyter notebook in Studio, you might need to further modify your Jupyter notebook. See [Connect to an Amazon EMR cluster from your notebook](scheduled-notebook-connect-emr.md) if you need to perform any of the following tasks in your notebook:
+ **Pass parameters into your Amazon EMR connection command.** Studio uses Papermill to run notebooks. In SparkMagic kernels, parameters you pass to your Amazon EMR connection command may not work as expected due to how Papermill passes information to SparkMagic.
+ **Pass user credentials to Kerberos, LDAP, or HTTP Basic Auth-authenticated Amazon EMR clusters.** You have to pass user credentials through AWS Secrets Manager.

## Set up default options


The SageMaker SDK gives you the option to set defaults for a subset of parameters so you don’t have to specify these parameters every time you create a `NotebookJobStep` instance. These parameters are `role`, `s3_root_uri`, `s3_kms_key`, `volume_kms_key`, `subnets`, and `security_group_ids`. Use the SageMaker AI config file to set the defaults for the step. For information about the SageMaker AI configuration file, see [Configuring and using defaults with the SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/overview.html#configuring-and-using-defaults-with-the-sagemaker-python-sdk).

To set up the notebook job defaults, apply your new defaults to the notebook job section of the config file as shown in the following snippet:

```
SageMaker:
  PythonSDK:
    Modules:
      NotebookJob:
        RoleArn: 'arn:aws:iam::555555555555:role/IMRole'
        S3RootUri: 's3://amzn-s3-demo-bucket/my-project'
        S3KmsKeyId: 's3kmskeyid'
        VolumeKmsKeyId: 'volumekmskeyid1'
        VpcConfig:
          SecurityGroupIds:
            - 'sg123'
          Subnets:
            - 'subnet-1234'
```

# Create a notebook job in Studio


**Note**  
The notebook scheduler is built on the Amazon EventBridge, SageMaker Training, and Pipelines services. If your notebook jobs fail, you might see errors related to these services.

The following provides information on how to create a notebook job in the Studio UI.

SageMaker Notebook Jobs gives you the tools to create and manage your noninteractive notebook jobs using the Notebook Jobs widget. You can create jobs, view the jobs you created, and pause, stop, or resume existing jobs. You can also modify notebook schedules.

When you create your scheduled notebook job with the widget, the scheduler tries to infer a selection of default options and automatically populates the form to help you get started quickly. If you are using Studio, at minimum you can submit an on-demand job without setting any options. You can also submit a scheduled notebook job definition by supplying just the schedule information. However, you can customize other fields if your scheduled job requires specialized settings. If you are running a local Jupyter notebook, the scheduler extension provides a feature for you to specify your own defaults (for a subset of options) so you don't have to manually insert the same values every time.

When you create a notebook job, you can include additional files such as datasets, images, and local scripts. To do so, choose **Run job with input folder**. The notebook job then has access to all files under the input file's folder. While the notebook job is running, the file structure of the directory remains unchanged.

To schedule a notebook job, complete the following steps.

1. Open the **Create Job** form.

   In local JupyterLab environments, choose the **Create a notebook job** icon (![\[Blue icon of a calendar with a checkmark, representing a scheduled task or event.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/notebook-schedule.png)) in the taskbar. If you don't see the icon, follow the instructions in [Installation guide](scheduled-notebook-installation.md) to install it.

   In Studio, open the form in one of two ways:
   + Using the **File Browser**

     1. In the **File Browser** in the left panel, right-click on the notebook you want to run as a scheduled job.

     1. Choose **Create Notebook Job**.
   + Within the Studio notebook
     + Inside the Studio notebook you want to run as a scheduled job, choose the **Create a notebook job** icon (![\[Blue icon of a calendar with a checkmark, representing a scheduled task or event.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/notebook-schedule.png)) in the Studio toolbar.

1. Complete the popup form. The form displays the following fields:
   + **Job name**: A descriptive name you specify for your job.
   + **Input file**: The name of the notebook that you are scheduling to run in noninteractive mode.
   + **Compute type**: The type of Amazon EC2 instance on which you want to run your notebook.
   + **Parameters**: Custom parameters you can optionally specify as inputs to your notebook. To use this feature, you can optionally tag a specific cell in your Jupyter notebook with the **parameters** tag to control where your parameters are applied. For more details, see [Parameterize your notebook](notebook-auto-run-troubleshoot-override.md).
   + (Optional) **Run job with input folder**: If selected, the scheduled job has access to all the files found in the same folder as the **Input file**.
   + **Additional Options**: You can specify additional customizations for your job. For example, you can specify an image or kernel, input and output folders, job retry and timeout options, encryption details, and custom initialization scripts. For the complete listing of customizations you can apply, see [Available options](create-notebook-auto-execution-advanced.md).

1. Schedule your job. You can run your notebook on demand or on a fixed schedule.
   + To run the notebook on demand, complete the following steps:
     + Select **Run Now**.
     + Choose **Create**.
     + The **Notebook Jobs** tab appears. Choose **Reload** to load your job into the dashboard.
   + To run the notebook on a fixed schedule, complete the following steps:
     + Choose **Run on a schedule**.
     + Choose the **Interval** dropdown list and select an interval. The intervals range from every minute to monthly. You can also select **Custom schedule**.
     + Based on the interval you choose, additional fields appear to help you further specify your desired run day and time. For example, if you select **Day** for a daily run, an additional field appears for you to specify the desired time. Note that any time you specify is in UTC format. Note also that if you choose a small interval, such as one minute, your jobs overlap if the previous job is not complete when the next job starts.

       If you select a custom schedule, you use cron syntax in the expression box to specify your exact run date and time. The cron syntax is a space-separated list of fields, each of which represents a unit of time from seconds to years. For help with cron syntax, you can choose **Get help with cron syntax** under the expression box.
     + Choose **Create**.
     + The **Notebook Job Definitions** tab appears. Choose **Reload** to load your job definition into the dashboard.
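Because schedule times are interpreted as UTC, you may need to convert from your local time zone when filling in the form. A minimal sketch using only the Python standard library (the offset and run time shown are example values):

```
from datetime import datetime, timedelta, timezone

# Desired local run time: 5:31 AM in UTC-5 (for example, US Eastern Standard Time).
local_run = datetime(2023, 12, 25, 5, 31, tzinfo=timezone(timedelta(hours=-5)))

# Convert to UTC before entering the time in the schedule form.
utc_run = local_run.astimezone(timezone.utc)
print(utc_run.strftime("%H:%M UTC"))  # 10:31 UTC
```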

# Set up default options for local notebooks


**Important**  
As of November 30, 2023, the previous Amazon SageMaker Studio experience is now named Amazon SageMaker Studio Classic. The following section is specific to using the Studio Classic application. For information about using the updated Studio experience, see [Amazon SageMaker Studio](studio-updated.md).  
Studio Classic is still maintained for existing workloads but is no longer available for onboarding. You can only stop or delete existing Studio Classic applications and cannot create new ones. We recommend that you [migrate your workload to the new Studio experience](studio-updated-migrate.md).

You can set up default options when you create a notebook job. This can save you time if you plan to create multiple notebook jobs with different options than the provided defaults. The following provides information on how to set up the default options for local notebooks.

Rather than manually typing (or pasting in) custom values in the **Create Job** form each time, you can store new default values, and the scheduler extension inserts them every time you create a new job definition. This feature is available for the following options:
+ **Role ARN**
+ **S3 Input Folder**
+ **S3 Output Folder**
+ **Output encryption KMS key** (if you turn on **Configure Job Encryption**)
+ **Job instance volume encryption KMS key** (if you turn on **Configure Job Encryption**)

This feature saves you time if you insert different values than the provided defaults and continue to use those values for future job runs. Your chosen user settings are stored on the machine that runs your JupyterLab server and are retrieved with the help of a native API. If you provide new default values for some but not all five options, the previous defaults are used for the ones you don’t customize.

The following instructions show you how to preview the existing default values, set new default values, and reset your default values for your notebook jobs.

**To preview existing default values for your notebook jobs, complete the following steps:**

1. Open the Amazon SageMaker Studio Classic console by following the instructions in [Launch Amazon SageMaker Studio Classic](studio-launch.md).

1. In the **File Browser** in the left panel, right-click on the notebook you want to run as a scheduled job.

1. Choose **Create Notebook Job**.

1. Choose **Additional options** to expand the tab of notebook job settings. You can view the default settings here. 

**To set new default values for your future notebook jobs, complete the following steps:**

1. Open the Amazon SageMaker Studio Classic console by following the instructions in [Launch Amazon SageMaker Studio Classic](studio-launch.md).

1. From the top menu in Studio Classic, choose **Settings**, then choose **Advanced Settings Editor**.

1. Choose **Amazon SageMaker Scheduler** from the list below **Settings**. This may already be open by default.

1. You can update the default settings directly in this UI page or by using the JSON editor.
   + In the UI you can insert new values for **Role ARN**, **S3 Input Folder**, **S3 Output Folder**, **Output encryption KMS key**, or **Job instance volume encryption KMS key**. If you change these values, you will see the new defaults for these fields while you create your next notebook job under **Additional options**.
   + (Optional) To update the user defaults using the **JSON Settings Editor**, complete the following steps:

     1. In the top right corner, choose **JSON Settings Editor**.

     1. In the **Settings** left sidebar, choose **Amazon SageMaker AI Scheduler**. This may already be open by default.

        You can see your current default values in the **User Preferences** panel.

        You can see the system default values in the **System Defaults** panel.

     1. To update your default values, copy and paste the JSON snippet from the **System Defaults** panel to the **User Preferences** panel, and update the fields.

     1. If you updated the default values, choose the **Save User Settings** icon (![\[Icon of a cloud with an arrow pointing upward, representing cloud upload functionality.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/Notebook_save.png)) in the top right corner. Closing the editor does not save the changes.

**If you previously changed the user-defined default values and now want to reset them, complete the following steps:**

1. From the top menu in Studio Classic, choose **Settings**, then choose **Advanced Settings Editor**.

1. Choose **Amazon SageMaker Scheduler** from the list below **Settings**. This may already be open by default.

1. You can restore the defaults by directly using this UI page or using the JSON editor.
   + In the UI you can choose **Restore to Defaults** in the top right corner. Your defaults are restored to empty strings. You only see this option if you previously changed your default values.
   + (Optional) To restart the default settings using the **JSON Settings Editor**, complete the following steps:

     1. In the top right corner, choose **JSON Settings Editor**.

     1. In the **Settings** left sidebar, choose **Amazon SageMaker AI Scheduler**. This may already be open by default.

        You can see your current default values in the **User Preferences** panel.

        You can see the system default values in the **System Defaults** panel.

      1. To restore your current default settings, copy the content from the **System Defaults** panel to the **User Preferences** panel.

     1. Choose the **Save User Settings** icon (![\[Icon of a cloud with an arrow pointing upward, representing cloud upload functionality.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/Notebook_save.png)) in the top right corner. Closing the editor does not save the changes.