

# Schedule monitoring jobs
<a name="model-monitor-scheduling"></a>

Amazon SageMaker Model Monitor lets you monitor the data collected from your real-time endpoints. You can monitor your data on a recurring schedule, or you can monitor it one time, immediately. You can create a monitoring schedule with the [CreateMonitoringSchedule](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateMonitoringSchedule.html) API.

With a monitoring schedule, SageMaker AI can start processing jobs to analyze the data collected during a given period. In the processing job, SageMaker AI compares the dataset for the current analysis with the baseline statistics and constraints that you provide. Then, SageMaker AI generates a violations report. In addition, CloudWatch metrics are emitted for each feature under analysis.

SageMaker AI provides a prebuilt container for performing analysis on tabular datasets. Alternatively, you can bring your own container, as outlined in the [Support for Your Own Containers With Amazon SageMaker Model Monitor](model-monitor-byoc-containers.md) topic.

You can create a model monitoring schedule for your real-time endpoint or batch transform job. Use the baseline resources (constraints and statistics) to compare against the real-time traffic or batch job inputs. 

**Example baseline assignments**  
In the following example, the training dataset used to train the model was uploaded to Amazon S3. If you already have it in Amazon S3, you can point to it directly.  

```
# Define Amazon S3 locations for the baseline data and results
# (bucket and prefix are assumed to be defined earlier)
baseline_prefix = prefix + '/baselining'
baseline_data_prefix = baseline_prefix + '/data'
baseline_results_prefix = baseline_prefix + '/results'

baseline_data_uri = 's3://{}/{}'.format(bucket,baseline_data_prefix)
baseline_results_uri = 's3://{}/{}'.format(bucket, baseline_results_prefix)
print('Baseline data uri: {}'.format(baseline_data_uri))
print('Baseline results uri: {}'.format(baseline_results_uri))
```

```
import os
import boto3

# Upload the training dataset to the baseline data location in Amazon S3.
s3_key = os.path.join(baseline_prefix, 'data', 'training-dataset-with-header.csv')
with open("test_data/training-dataset-with-header.csv", 'rb') as training_data_file:
    boto3.Session().resource('s3').Bucket(bucket).Object(s3_key).upload_fileobj(training_data_file)
```

**Example schedule for recurring analysis**  
If you are scheduling a model monitor for a real-time endpoint, use the baseline constraints and statistics to compare against real-time traffic. The following code snippet shows the general format you use to schedule a model monitor for a real-time endpoint. This example schedules the model monitor to run hourly.  

```
from sagemaker.model_monitor import CronExpressionGenerator, EndpointInput
from time import gmtime, strftime

mon_schedule_name = 'my-model-monitor-schedule-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
my_default_monitor.create_monitoring_schedule(
    monitor_schedule_name=mon_schedule_name,
    endpoint_input=EndpointInput(
        endpoint_name=endpoint_name,
        destination="/opt/ml/processing/input/endpoint"
    ),
    post_analytics_processor_script=s3_code_postprocessor_uri,
    output_s3_uri=s3_report_path,
    statistics=my_default_monitor.baseline_statistics(),
    constraints=my_default_monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,
)
```

**Example schedule for one-time analysis**  
You can also schedule the analysis to run once without recurring by passing arguments like the following to the `create_monitoring_schedule` method:  

```
    schedule_cron_expression=CronExpressionGenerator.now(),
    data_analysis_start_time="-PT1H",
    data_analysis_end_time="-PT0H",
```
In these arguments, the `schedule_cron_expression` parameter schedules the analysis to run once, immediately, with the value `CronExpressionGenerator.now()`. For any schedule with this setting, the `data_analysis_start_time` and `data_analysis_end_time` parameters are required. These parameters set the start time and end time of an analysis window. Define these times as offsets that are relative to the current time, and use ISO 8601 duration format. In this example, the times `-PT1H` and `-PT0H` define a window between one hour in the past and the current time. With this schedule, the analysis evaluates only the data that was collected during the specified window.
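To see how these offsets resolve to a concrete analysis window, the following sketch computes the window relative to the current time. The `resolve_offset` helper is hypothetical and for illustration only; it handles only whole-hour offsets such as `-PT1H`, not the full ISO 8601 duration grammar.

```
import re
from datetime import datetime, timedelta, timezone

def resolve_offset(offset, now):
    """Resolve a simplified ISO 8601 hour offset such as '-PT1H' or '-PT0H'
    into a concrete timestamp relative to `now`. Illustrative helper; it
    does not cover the full ISO 8601 duration grammar."""
    match = re.fullmatch(r"(-?)PT(\d+)H", offset)
    if not match:
        raise ValueError("unsupported offset: {}".format(offset))
    sign = -1 if match.group(1) == "-" else 1
    return now + sign * timedelta(hours=int(match.group(2)))

now = datetime.now(timezone.utc)
window_start = resolve_offset("-PT1H", now)  # one hour in the past
window_end = resolve_offset("-PT0H", now)    # the current time
print(window_start, window_end)
```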

**Example schedule for a batch transform job**  
The following code snippet shows the general format you use to schedule a model monitor for a batch transform job.  

```
from sagemaker.model_monitor import (
    CronExpressionGenerator,
    BatchTransformInput, 
    MonitoringDatasetFormat, 
)
from time import gmtime, strftime

mon_schedule_name = 'my-model-monitor-schedule-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
my_default_monitor.create_monitoring_schedule(
    monitor_schedule_name=mon_schedule_name,
    batch_transform_input=BatchTransformInput(
        destination="/opt/ml/processing/input",
        data_captured_destination_s3_uri=s3_capture_upload_path,
        dataset_format=MonitoringDatasetFormat.csv(header=False),
    ),
    post_analytics_processor_script=s3_code_postprocessor_uri,
    output_s3_uri=s3_report_path,
    statistics=my_default_monitor.baseline_statistics(),
    constraints=my_default_monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,
)
```

```
desc_schedule_result = my_default_monitor.describe_schedule()
print('Schedule status: {}'.format(desc_schedule_result['MonitoringScheduleStatus']))
```

# The cron expression for a monitoring schedule
<a name="model-monitor-schedule-expression"></a>

To provide details for the monitoring schedule, use [ScheduleConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ScheduleConfig.html), which contains a `cron` expression that describes details about the monitoring schedule.

Amazon SageMaker Model Monitor supports the following `cron` expressions:
+ To run the job every hour, use the following:

  `cron(0 * ? * * *)`
+ To run the job daily, use the following:

  `cron(0 [00-23] ? * * *)`
+ To run the job one time, immediately, use the following keyword:

  `NOW`

For example, the following are valid `cron` expressions:
+ Daily at 12 PM UTC: `cron(0 12 ? * * *)`
+ Daily at 12 AM UTC: `cron(0 0 ? * * *)`

To support running at a regular interval, such as every 6 or 12 hours, Model Monitor supports the following expression:

`cron(0 [00-23]/[01-24] ? * * *)`

For example, the following are valid `cron` expressions:
+ Every 12 hours, starting at 5 PM UTC: `cron(0 17/12 ? * * *)`
+ Every two hours, starting at 12 AM UTC: `cron(0 0/2 ? * * *)`

**Notes**  
+ Although the `cron` expression is set to start at 5 PM UTC, there could be a delay of 0-20 minutes from the requested time before the execution actually runs.
+ If you want to run on a daily schedule, don't provide a starting hour. SageMaker AI picks a time to run every day.
+ Currently, SageMaker AI supports only integer hourly rates between 1 hour and 24 hours.
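The supported formats can be captured in a small helper. The following sketch is a hypothetical, illustrative function (not part of any SageMaker SDK) that builds the `cron` string for a rate-based schedule and enforces the 1-24 hour constraint noted above.

```
def monitor_cron_expression(hour_interval, starting_hour=0):
    """Build a Model Monitor cron expression for a job that runs every
    `hour_interval` hours, starting at `starting_hour` UTC.
    Illustrative helper only; SageMaker AI supports integer rates
    between 1 and 24 hours."""
    if not 1 <= hour_interval <= 24:
        raise ValueError("hour interval must be between 1 and 24")
    if not 0 <= starting_hour <= 23:
        raise ValueError("starting hour must be between 0 and 23")
    if hour_interval == 1:
        return "cron(0 * ? * * *)"  # plain hourly schedule
    return "cron(0 {}/{} ? * * *)".format(starting_hour, hour_interval)

# Every 12 hours, starting at 5 PM UTC
print(monitor_cron_expression(12, starting_hour=17))  # cron(0 17/12 ? * * *)
```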

# Configuring service control policies for monitoring schedules
<a name="model-monitor-scp-rules"></a>

You must specify the parameters of a monitoring job when you create or update a schedule for it with the [CreateMonitoringSchedule](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateMonitoringSchedule.html) API or the [UpdateMonitoringSchedule](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateMonitoringSchedule.html) API, respectively. Depending on your use case, you can do this in one of the following ways:
+ You can specify the [MonitoringJobDefinition](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_MonitoringJobDefinition.html) field of [MonitoringScheduleConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_MonitoringScheduleConfig.html) when you invoke `CreateMonitoringSchedule` or `UpdateMonitoringSchedule`. You can use this approach only to create or update a schedule for a data quality monitoring job.
+ You can specify the name of a monitoring job definition that you have already created in the `MonitoringJobDefinitionName` field of `MonitoringScheduleConfig` when you invoke `CreateMonitoringSchedule` or `UpdateMonitoringSchedule`. You can use this approach for any job definition that you create with one of the following APIs:
  +  [CreateDataQualityJobDefinition](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateDataQualityJobDefinition.html) 
  +  [CreateModelQualityJobDefinition](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModelQualityJobDefinition.html) 
  +  [CreateModelBiasJobDefinition](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModelBiasJobDefinition.html) 
  +  [CreateModelExplainabilityJobDefinition](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModelExplainabilityJobDefinition.html) 

   If you want to use the SageMaker Python SDK to create or update schedules, you must use this approach.

The two approaches are mutually exclusive: when you create or update a monitoring schedule, you can specify either the `MonitoringJobDefinition` field or the `MonitoringJobDefinitionName` field, but not both.
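For the second approach, the `MonitoringScheduleConfig` that you pass to `CreateMonitoringSchedule` carries only the name and type of a pre-created job definition alongside the schedule expression. The following sketch assembles that request shape as a plain dictionary; the schedule and definition names are hypothetical, and no AWS call is made.

```
def build_schedule_request(schedule_name, job_definition_name, cron_expression):
    """Assemble a CreateMonitoringSchedule request that references an
    existing job definition by name rather than embedding a
    MonitoringJobDefinition inline. Illustrative only."""
    return {
        "MonitoringScheduleName": schedule_name,
        "MonitoringScheduleConfig": {
            "ScheduleConfig": {"ScheduleExpression": cron_expression},
            # Reference the pre-created definition; do not also set
            # MonitoringJobDefinition -- the two fields are mutually exclusive.
            "MonitoringJobDefinitionName": job_definition_name,
            "MonitoringType": "DataQuality",
        },
    }

request = build_schedule_request(
    "my-schedule", "my-data-quality-job-definition", "cron(0 * ? * * *)"
)
# A boto3 client call would then look like:
#   boto3.client("sagemaker").create_monitoring_schedule(**request)
```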

When you create a monitoring job definition, or specify one in the `MonitoringJobDefinition` field, you can set security parameters, such as `NetworkConfig` and `VolumeKmsKeyId`. As an administrator, you might want these parameters to always be set to certain values, so that monitoring jobs always run in a secure environment. To enforce this, set up appropriate [Service control policies](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_scps.html) (SCPs). SCPs are a type of organization policy that you can use to manage permissions in your organization.

The following example shows an SCP that you can use to ensure that infrastructure parameters are set properly when creating or updating schedules for monitoring jobs.

------
#### [ JSON ]

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "sagemaker:CreateDataQualityJobDefinition",
                "sagemaker:CreateModelBiasJobDefinition",
                "sagemaker:CreateModelExplainabilityJobDefinition",
                "sagemaker:CreateModelQualityJobDefinition"
            ],
            "Resource": "arn:*:sagemaker:*:*:*",
            "Condition": {
                "Null": {
                    "sagemaker:VolumeKmsKey":"true",
                    "sagemaker:VpcSubnets": "true",
                    "sagemaker:VpcSecurityGroupIds": "true"
                }
            }
        },
        {
            "Effect": "Deny",
            "Action": [
                "sagemaker:CreateDataQualityJobDefinition",
                "sagemaker:CreateModelBiasJobDefinition",
                "sagemaker:CreateModelExplainabilityJobDefinition",
                "sagemaker:CreateModelQualityJobDefinition"
            ],
            "Resource": "arn:*:sagemaker:*:*:*",
            "Condition": {
                "Bool": {
                    "sagemaker:InterContainerTrafficEncryption": "false"
                }
            }
        },
        {
            "Effect": "Deny",
            "Action": [
                "sagemaker:CreateMonitoringSchedule",
                "sagemaker:UpdateMonitoringSchedule"
            ],
            "Resource": "arn:*:sagemaker:*:*:monitoring-schedule/*",
            "Condition": {
                "Null": {
                    "sagemaker:ModelMonitorJobDefinitionName": "true"
                }
            }
        }
    ]
}
```

------

The first two rules in the example ensure that the security parameters are always set for monitoring job definitions. The final rule requires anyone in your organization who creates or updates a schedule to specify the `MonitoringJobDefinitionName` field. This ensures that no one in your organization can set insecure values for the security parameters by specifying the `MonitoringJobDefinition` field when creating or updating schedules.