

# Bias drift for models in production
<a name="clarify-model-monitor-bias-drift"></a>

Amazon SageMaker Clarify bias monitoring helps data scientists and ML engineers monitor the predictions of deployed models for bias on a regular basis. When SageMaker AI detects bias beyond a certain threshold, it automatically generates metrics that you can view in SageMaker Studio as exportable reports and graphs, and you can configure alerts in Amazon CloudWatch to receive notifications. Bias can be introduced or exacerbated in a deployed ML model when the training data differs from the data that the model sees during deployment (that is, the live data). For example, the outputs of a model for predicting home prices can become biased if the mortgage rates used to train the model differ from current, real-world mortgage rates. 

In general, measuring bias only during the train-and-deploy phase might not be sufficient. It is possible that after the model has been deployed, the distribution of the data that the deployed model sees (that is, the live data) differs from the data distribution in the training dataset. This change might introduce bias into the model over time. The change in the live data distribution might be temporary (for example, due to some short-lived behavior like the holiday season) or permanent. In either case, it might be important to detect these changes and take steps to reduce the bias when appropriate.

To detect these changes, SageMaker Clarify provides functionality to monitor the bias metrics of a deployed model continuously and raise automated alerts if a metric exceeds a threshold. For example, consider the DPPL bias metric. Specify an allowed range of values A = (a_min, a_max), for instance the interval (-0.1, 0.1), that DPPL should belong to during deployment. Any deviation from this range should raise a *bias detected* alert. With SageMaker Clarify, you can perform these checks at regular intervals.

For example, you can set the frequency of the checks to 2 days. This means that SageMaker Clarify computes the DPPL metric on data collected during a 2-day window. In this example, D_win is the data that the model processed during the last 2-day window. An alert is issued if the DPPL value b_win computed on D_win falls outside of the allowed range A. This approach to checking whether b_win is outside of A can be somewhat noisy. D_win might consist of very few samples and might not be representative of the live data distribution. Because of the small sample size, the value of bias b_win computed over D_win might not be a robust estimate. In fact, very high (or low) values of b_win may be observed purely due to chance. To ensure that the conclusions drawn from the observed data D_win are statistically significant, SageMaker Clarify makes use of confidence intervals. Specifically, it uses the normal bootstrap interval method to construct an interval C = (c_min, c_max) such that SageMaker Clarify is confident that the true bias value computed over the full live data is contained in C with high probability. If the confidence interval C overlaps with the allowed range A, SageMaker Clarify interprets it as "it is likely that the bias metric value of the live data distribution falls within the allowed range". If C and A are disjoint, SageMaker Clarify is confident that the bias metric does not lie in A and raises an alert.
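
To illustrate this logic (this is a sketch of the statistical idea, not SageMaker Clarify's internal implementation; the function names and the 95% z-value are assumptions), the following example computes a normal bootstrap interval for a generic bias metric over a window of samples and checks whether it is disjoint from the allowed range:

```python
import numpy as np

def normal_bootstrap_interval(samples, metric_fn, n_boot=1000, z=1.96, seed=0):
    """Estimate a confidence interval for a bias metric via the normal
    bootstrap: resample the window with replacement, compute the metric on
    each resample, and take point_estimate +/- z * bootstrap std dev."""
    rng = np.random.default_rng(seed)
    point = metric_fn(samples)
    boot = [
        metric_fn(rng.choice(samples, size=len(samples), replace=True))
        for _ in range(n_boot)
    ]
    half_width = z * np.std(boot)
    return point - half_width, point + half_width

def raises_alert(confidence_interval, allowed_range):
    """Alert only when the interval C and the allowed range A are disjoint."""
    c_min, c_max = confidence_interval
    a_min, a_max = allowed_range
    return c_max < a_min or c_min > a_max
```

Here `metric_fn` stands in for any bias metric computable on a sample; `raises_alert` returns `True` only when the interval C lies entirely outside the allowed range A.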

## Model Monitor Sample Notebook
<a name="clarify-model-monitor-sample-notebooks-bias-drift"></a>

Amazon SageMaker Clarify provides the following sample notebook that shows how to capture inference data for a real-time endpoint, create a baseline to monitor evolving bias against, and inspect the results: 
+ [Monitoring bias drift and feature attribution drift with Amazon SageMaker Clarify](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_model_monitor/fairness_and_explainability/SageMaker-Model-Monitor-Fairness-and-Explainability.html) – Use Amazon SageMaker Model Monitor to monitor bias drift and feature attribution drift over time.

This notebook has been verified to run in Amazon SageMaker Studio only. If you need instructions on how to open a notebook in Amazon SageMaker Studio, see [Create or Open an Amazon SageMaker Studio Classic Notebook](notebooks-create-open.md). If you're prompted to choose a kernel, choose **Python 3 (Data Science)**. The following topics highlight the last two steps and include code examples from the sample notebook. 

**Topics**
+ [Model Monitor Sample Notebook](#clarify-model-monitor-sample-notebooks-bias-drift)
+ [Create a Bias Drift Baseline](clarify-model-monitor-bias-drift-baseline.md)
+ [Bias Drift Violations](clarify-model-monitor-bias-drift-violations.md)
+ [Parameters to Monitor Bias Drift](clarify-config-json-monitor-bias-parameters.md)
+ [Schedule Bias Drift Monitoring Jobs](clarify-model-monitor-bias-drift-schedule.md)
+ [Inspect Reports for Data Bias Drift](clarify-model-monitor-bias-drift-report.md)
+ [CloudWatch Metrics for Bias Drift Analysis](clarify-model-monitor-bias-drift-cw.md)

# Create a Bias Drift Baseline
<a name="clarify-model-monitor-bias-drift-baseline"></a>

After you have configured your application to capture real-time or batch transform inference data, the first task in monitoring for bias drift is to create a baseline. This involves configuring the data inputs, the groups that are sensitive, how the predictions are captured, and the model and its post-training bias metrics. Then you need to start the baselining job.

The model bias monitor can detect bias drift in ML models on a regular basis. As with the other monitoring types, the standard procedure for creating a model bias monitor is to first run a baselining job and then establish a monitoring schedule.

```
from sagemaker.model_monitor import ModelBiasMonitor

model_bias_monitor = ModelBiasMonitor(
    role=role,
    sagemaker_session=sagemaker_session,
    max_runtime_in_seconds=1800,
)
```

`DataConfig` stores information about the dataset to be analyzed (for example, the dataset file), its format (that is, CSV or JSON Lines), headers (if any) and label.

```
model_bias_baselining_job_result_uri = f"{baseline_results_uri}/model_bias"
model_bias_data_config = DataConfig(
    s3_data_input_path=validation_dataset,
    s3_output_path=model_bias_baselining_job_result_uri,
    label=label_header,
    headers=all_headers,
    dataset_type=dataset_type,
)
```

`BiasConfig` is the configuration of the sensitive groups in the dataset. Typically, bias is measured by computing a metric and comparing it across groups. The group of interest is called the *facet*. For post-training bias, you should also take the positive label into account.

```
model_bias_config = BiasConfig(
    label_values_or_threshold=[1],
    facet_name="Account Length",
    facet_values_or_threshold=[100],
)
```

`ModelPredictedLabelConfig` specifies how to extract a predicted label from the model output. In this example, the 0.8 cutoff has been chosen in anticipation that customers churn frequently. For more complicated outputs, there are a few more options; for example, `label` can be the index, name, or JMESPath used to locate the predicted label in the endpoint response payload.

```
model_predicted_label_config = ModelPredictedLabelConfig(
    probability_threshold=0.8,
)
```

`ModelConfig` is the configuration of the model to be used for inference. To compute post-training bias metrics, the processing job needs to get inferences from the model name provided. To accomplish this, the processing job uses the model to create an ephemeral endpoint (also known as a *shadow endpoint*), and deletes the shadow endpoint after the computations are completed. This configuration is also used by the explainability monitor.

```
model_config = ModelConfig(
    model_name=model_name,
    instance_count=endpoint_instance_count,
    instance_type=endpoint_instance_type,
    content_type=dataset_type,
    accept_type=dataset_type,
)
```

Now you can start the baselining job.

```
model_bias_monitor.suggest_baseline(
    model_config=model_config,
    data_config=model_bias_data_config,
    bias_config=model_bias_config,
    model_predicted_label_config=model_predicted_label_config,
)
print(f"ModelBiasMonitor baselining job: {model_bias_monitor.latest_baselining_job_name}")
```

The scheduled monitor automatically picks up the baselining job name and waits for the job to finish before monitoring begins.

# Bias Drift Violations
<a name="clarify-model-monitor-bias-drift-violations"></a>

Bias drift jobs evaluate the baseline constraints provided by the [baseline configuration](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModelBiasJobDefinition.html#sagemaker-CreateModelBiasJobDefinition-request-ModelBiasBaselineConfig) against the analysis results of the current `MonitoringExecution`. If violations are detected, the job lists them in the *constraint\_violations.json* file in the execution output location and marks the execution status accordingly. For more information, see [Interpret results](model-monitor-interpreting-results.md).

Here is the schema of the bias drift violations file.
+ `facet` – The name of the facet, provided by the monitoring job analysis configuration facet `name_or_index`. 
+ `facet_value` – The value of the facet, provided by the monitoring job analysis configuration facet `value_or_threshold`.
+ `metric_name` – The short name of the bias metric. For example, "CI" for class imbalance. See [Pre-training Bias Metrics](clarify-measure-data-bias.md) for the short names of each of the pre-training bias metrics and [Post-training Data and Model Bias Metrics](clarify-measure-post-training-bias.md) for the short names of each of the post-training bias metrics.
+ `constraint_check_type` – The type of violation monitored. Currently only `bias_drift_check` is supported.
+ `description` – A descriptive message to explain the violation.

```
{
    "version": "1.0",
    "violations": [{
        "facet": "string",
        "facet_value": "string",
        "metric_name": "string",
        "constraint_check_type": "string",
        "description": "string"
    }]
}
```

A bias metric measures the level of equality in a distribution; a value close to zero indicates a more balanced distribution. If the value of a bias metric in the job analysis results file (analysis.json) is worse than its corresponding value in the baseline constraints file, a violation is logged. For example, if the baseline constraint for the DPPL bias metric is `0.2` and the analysis result is `0.1`, no violation is logged because `0.1` is closer to `0` than `0.2`. However, if the analysis result is `-0.3`, a violation is logged because it is farther from `0` than the baseline constraint of `0.2`. The following is an example of a bias drift violations file with two logged violations.

```
{
    "version": "1.0",
    "violations": [{
        "facet": "Age",
        "facet_value": "40",
        "metric_name": "CI",
        "constraint_check_type": "bias_drift_check",
        "description": "Value 0.0751544567666083 does not meet the constraint requirement"
    }, {
        "facet": "Age",
        "facet_value": "40",
        "metric_name": "DPPL",
        "constraint_check_type": "bias_drift_check",
        "description": "Value -0.0791244970125596 does not meet the constraint requirement"
    }]
}
```
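
The constraint check that produces such a file can be sketched in Python. The function and argument names below are illustrative, not part of the SageMaker SDK; the actual check runs inside the monitoring job:

```python
def check_bias_drift(analysis_metrics, baseline_constraints, facet, facet_value):
    """Log a violation when an observed bias metric is farther from zero
    than its baseline constraint allows, mirroring the violations schema."""
    violations = []
    for name, observed in analysis_metrics.items():
        constraint = baseline_constraints.get(name)
        if constraint is not None and abs(observed) > abs(constraint):
            violations.append({
                "facet": facet,
                "facet_value": facet_value,
                "metric_name": name,
                "constraint_check_type": "bias_drift_check",
                "description": f"Value {observed} does not meet the constraint requirement",
            })
    return {"version": "1.0", "violations": violations}
```

For instance, an observed DPPL of `-0.3` against a baseline constraint of `0.2` yields one violation entry, while an observed CI of `0.1` against a constraint of `0.2` yields none.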

# Parameters to Monitor Bias Drift
<a name="clarify-config-json-monitor-bias-parameters"></a>

Amazon SageMaker Clarify bias monitoring reuses a subset of the parameters used in the analysis configuration of [Analysis Configuration Files](clarify-processing-job-configure-analysis.md). After describing the configuration parameters, this topic provides examples of JSON files. These files are used to configure CSV and JSON Lines datasets to monitor them for bias drift when machine learning models are in production.

The following parameters must be provided in a JSON file. The path to this JSON file must be provided in the `ConfigUri` parameter of the [ModelBiasAppSpecification](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ModelBiasAppSpecification) API.
+ `"version"` – (Optional) Schema version of the configuration file. If not provided, the latest supported version is used.
+ `"headers"` – (Optional) A list of column names in the dataset. If the `dataset_type` is `"application/jsonlines"` and `"label"` is specified, then the last header becomes the header of the label column. 
+ `"label"` – (Optional) Target attribute for the model to be used for *bias metrics*. Specified either as a column name, or an index (if dataset format is CSV), or as a JMESPath (if dataset format is JSON Lines).
+ `"label_values_or_threshold"` – (Optional) List of label values or threshold. Indicates positive outcome used for bias metrics.
+ `"facet"` – (Optional) A list of features that are sensitive attributes, referred to as facets. Facets are used for *bias metrics* in the form of pairs, and include the following:
  + `"name_or_index"` – Facet column name or index.
  + `"value_or_threshold"` – (Optional) List of values or threshold that the facet column can take. Indicates the sensitive group, that is, the group against which bias is measured. If not provided, bias metrics are computed as one group for every unique value (rather than for all values together). If the facet column is numeric, this threshold value is applied as the lower bound to select the sensitive group.
+ `"group_variable"` – (Optional) A column name or index to indicate the group variable to be used for the *bias metric* *Conditional Demographic Disparity.*

The other parameters should be provided in `EndpointInput` (for real-time endpoints) or `BatchTransformInput` (for batch transform jobs) of the [ModelBiasJobInput](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ModelBiasJobInput) API.
+ `FeaturesAttribute` – This parameter is required if endpoint input data format is `"application/jsonlines"`. It is the JMESPath used to locate the feature columns if the dataset format is JSON Lines.
+ `InferenceAttribute` – Index or JMESPath location in the model output for the target attribute that is monitored for bias using bias metrics. If it is not provided in the CSV `accept_type` case, the model output is assumed to be a single numeric value corresponding to a score or probability.
+ `ProbabilityAttribute` – Index or JMESPath location in the model output for probabilities. If the model output is JSON Lines with a list of labels and probabilities, for example, then the label that corresponds to the maximum probability is selected for bias computations.
+ `ProbabilityThresholdAttribute` – (Optional) A float value to indicate the threshold to select the binary label, in the case of binary classification. The default value is 0.5.
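
To make these parameters concrete, here is a hypothetical sketch of how a predicted label could be resolved from a single model-output record; the function and argument names are illustrative, and SageMaker AI performs this resolution internally:

```python
def predicted_label(model_output, inference_attribute=None,
                    probability_attribute=None, probability_threshold=0.5):
    """Resolve a predicted label from one model-output record
    (a list of values for CSV output, a dict for JSON Lines output)."""
    if inference_attribute is not None:
        # InferenceAttribute: read the label directly at the index or key.
        return model_output[inference_attribute]
    if probability_attribute is not None:
        # ProbabilityAttribute + ProbabilityThresholdAttribute:
        # threshold a probability into a binary label.
        return 1 if model_output[probability_attribute] > probability_threshold else 0
    # Neither provided: the output is assumed to be a single score.
    return 1 if model_output > probability_threshold else 0
```

For example, `predicted_label([1, 0.54], inference_attribute=0)` reads the label `1` directly, while `predicted_label({"probability": 0.54}, probability_attribute="probability", probability_threshold=0.6)` thresholds the probability to `0`.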

## Example JSON Configuration Files for CSV and JSON Lines Datasets
<a name="clarify-config-json-monitor-bias-parameters-examples"></a>

Here are examples of the JSON files used to configure CSV and JSON Lines datasets to monitor them for bias drift.

**Topics**
+ [CSV Datasets](#clarify-config-json-monitor-bias-parameters-example-csv)
+ [JSON Lines Datasets](#clarify-config-json-monitor-bias-parameters-example-jsonlines)

### CSV Datasets
<a name="clarify-config-json-monitor-bias-parameters-example-csv"></a>

Consider a dataset that has four feature columns and one label column, where the first feature and the label are binary, as in the following example.

```
0, 0.5814568701544718, 0.6651538910132964, 0.3138080342665499, 0
1, 0.6711642728531724, 0.7466687034026017, 0.1215477472819713, 1
0, 0.0453256543003371, 0.6377430803264152, 0.3558625219713576, 1
1, 0.4785191813363956, 0.0265841045263860, 0.0376935084990697, 1
```

Assume that the model output has two columns, where the first one is the predicted label and the second one is the probability, as in the following example.

```
1, 0.5385257417814224
```

Then the following JSON configuration file shows an example of how this CSV dataset can be configured.

```
{
    "headers": [
        "feature_0",
        "feature_1",
        "feature_2",
        "feature_3",
        "target"
    ],
    "label": "target",
    "label_values_or_threshold": [1],
    "facet": [{
        "name_or_index": "feature_1",
        "value_or_threshold": [1]
    }]
}
```

The predicted label is selected by the `"InferenceAttribute"` parameter. Zero-based numbering is used, so 0 indicates the first column of the model output:

```
"EndpointInput": {
    ...
    "InferenceAttribute": 0
    ...
}
```

Alternatively, you can use different parameters to convert probability values to binary predicted labels. Zero-based numbering is used: 1 indicates the second column; the `ProbabilityThresholdAttribute` value of 0.6 indicates that a probability greater than 0.6 predicts the binary label as 1.

```
"EndpointInput": {
    ...
    "ProbabilityAttribute": 1,
    "ProbabilityThresholdAttribute": 0.6
    ...
}
```

### JSON Lines Datasets
<a name="clarify-config-json-monitor-bias-parameters-example-jsonlines"></a>

Consider a dataset that has four feature columns and one label column, where the first feature and the label are binary, as in the following example.

```
{"features":[0, 0.5814568701544718, 0.6651538910132964, 0.3138080342665499], "label":0}
{"features":[1, 0.6711642728531724, 0.7466687034026017, 0.1215477472819713], "label":1}
{"features":[0, 0.0453256543003371, 0.6377430803264152, 0.3558625219713576], "label":1}
{"features":[1, 0.4785191813363956, 0.0265841045263860, 0.0376935084990697], "label":1}
```

Assume that the model output has two columns, where the first is a predicted label and the second is a probability.

```
{"predicted_label":1, "probability":0.5385257417814224}
```

The following JSON configuration file shows an example of how this JSON Lines dataset can be configured.

```
{
    "headers": [
        "feature_0",
        "feature_1",
        "feature_2",
        "feature_3",
        "target"
    ],
    "label": "label",
    "label_values_or_threshold": [1],
    "facet": [{
        "name_or_index": "feature_1",
        "value_or_threshold": [1]
    }]
}
```

Then, the `"features"` parameter value in `EndpointInput` (for real-time endpoints) or `BatchTransformInput` (for batch transform jobs) is used to locate the features in the dataset, and the `"predicted_label"` parameter value selects the predicted label from the model output. 

```
"EndpointInput": {
    ...
    "FeaturesAttribute": "features",
    "InferenceAttribute": "predicted_label"
    ...
}
```

Alternatively, you can convert probability values to predicted binary labels using the `ProbabilityThresholdAttribute` parameter value. A value of 0.6, for example, indicates that a probability greater than 0.6 predicts the binary label as 1.

```
"EndpointInput": {
    ...
    "FeaturesAttribute": "features",
    "ProbabilityAttribute": "probability",
    "ProbabilityThresholdAttribute": 0.6
    ...
}
```

# Schedule Bias Drift Monitoring Jobs
<a name="clarify-model-monitor-bias-drift-schedule"></a>

After you create your baseline, you can call the `create_monitoring_schedule()` method of your `ModelBiasMonitor` instance to schedule an hourly bias drift monitor. The following sections show you how to create a bias drift monitor for a model deployed to a real-time endpoint, as well as for a batch transform job.

**Important**  
You can specify either a batch transform input or an endpoint input, but not both, when you create your monitoring schedule.

Unlike data quality monitoring, bias drift monitoring requires Ground Truth labels. However, Ground Truth labels could be delayed. To address this, specify offsets when you create your monitoring schedule. For details about how to create time offsets, see [Model monitor offsets](model-monitor-model-quality-schedule.md#model-monitor-model-quality-schedule-offsets). 
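
As an illustration of offset semantics (the helper below is not part of the SDK): with `start_time_offset="-PT25H"` and `end_time_offset="-PT24H"`, an execution analyzes data captured between 25 and 24 hours before it runs, which leaves a day for delayed Ground Truth labels to arrive.

```python
from datetime import datetime, timedelta

def analysis_window(execution_time, start_offset_hours, end_offset_hours):
    """Compute the capture-data window that a monitoring execution analyzes,
    given offsets such as -PT25H (25 hours) and -PT24H (24 hours)."""
    return (
        execution_time - timedelta(hours=start_offset_hours),
        execution_time - timedelta(hours=end_offset_hours),
    )
```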

If you have submitted a baselining job, the monitor automatically picks up analysis configuration from the baselining job. If you skip the baselining step or the capture dataset has a different nature from the training dataset, you must provide the analysis configuration.

## Bias drift monitoring for models deployed to real-time endpoint
<a name="model-monitor-bias-quality-rt"></a>

To schedule a bias drift monitor for a real-time endpoint, pass your `EndpointInput` instance to the `endpoint_input` argument of your `ModelBiasMonitor` instance, as shown in the following code sample:

```
from sagemaker.model_monitor import (
    CronExpressionGenerator,
    EndpointInput,
    ModelBiasMonitor,
)
from sagemaker.model_monitor.clarify_model_monitoring import BiasAnalysisConfig

model_bias_monitor = ModelBiasMonitor(
    role=sagemaker.get_execution_role(),
    ...
)

model_bias_analysis_config = None
if not model_bias_monitor.latest_baselining_job:
    model_bias_analysis_config = BiasAnalysisConfig(
        model_bias_config,
        headers=all_headers,
        label=label_header,
    )

model_bias_monitor.create_monitoring_schedule(
    monitor_schedule_name=schedule_name,
    post_analytics_processor_script=s3_code_postprocessor_uri,
    output_s3_uri=s3_report_path,
    statistics=model_bias_monitor.baseline_statistics(),
    constraints=model_bias_monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,
    analysis_config=model_bias_analysis_config,
    endpoint_input=EndpointInput(
        endpoint_name=endpoint_name,
        destination="/opt/ml/processing/input/endpoint",
        start_time_offset="-PT1H",
        end_time_offset="-PT0H",
        probability_threshold_attribute=0.8,
    ),
)
```

## Bias drift monitoring for batch transform jobs
<a name="model-monitor-bias-quality-bt"></a>

To schedule a bias drift monitor for a batch transform job, pass your `BatchTransformInput` instance to the `batch_transform_input` argument of your `ModelBiasMonitor` instance, as shown in the following code sample:

```
from sagemaker.model_monitor import (
    BatchTransformInput,
    CronExpressionGenerator,
    ModelBiasMonitor,
)
from sagemaker.model_monitor.clarify_model_monitoring import BiasAnalysisConfig

model_bias_monitor = ModelBiasMonitor(
    role=sagemaker.get_execution_role(),
    ...
)

model_bias_analysis_config = None
if not model_bias_monitor.latest_baselining_job:
    model_bias_analysis_config = BiasAnalysisConfig(
        model_bias_config,
        headers=all_headers,
        label=label_header,
    )
    
schedule = model_bias_monitor.create_monitoring_schedule(
   monitor_schedule_name=schedule_name,
   post_analytics_processor_script=s3_code_postprocessor_uri,
   output_s3_uri=s3_report_path,
   statistics=model_bias_monitor.baseline_statistics(),
   constraints=model_bias_monitor.suggested_constraints(),
   schedule_cron_expression=CronExpressionGenerator.hourly(),
   enable_cloudwatch_metrics=True,
   analysis_config=model_bias_analysis_config,
   batch_transform_input=BatchTransformInput(
        destination="/opt/ml/processing/input",
        data_captured_destination_s3_uri=s3_capture_path,
        start_time_offset="-PT1H",
        end_time_offset="-PT0H",
        probability_threshold_attribute=0.8
   ),
)
```

# Inspect Reports for Data Bias Drift
<a name="clarify-model-monitor-bias-drift-report"></a>

If you are not able to inspect the monitoring results in the generated reports in SageMaker Studio, you can print them out as follows:

```
from sagemaker.s3 import S3Downloader

schedule_desc = model_bias_monitor.describe_schedule()
execution_summary = schedule_desc.get("LastMonitoringExecutionSummary")
if execution_summary and execution_summary["MonitoringExecutionStatus"] in ["Completed", "CompletedWithViolations"]:
    last_model_bias_monitor_execution = model_bias_monitor.list_executions()[-1]
    last_model_bias_monitor_execution_report_uri = last_model_bias_monitor_execution.output.destination
    print(f'Report URI: {last_model_bias_monitor_execution_report_uri}')
    last_model_bias_monitor_execution_report_files = sorted(S3Downloader.list(last_model_bias_monitor_execution_report_uri))
    print("Found Report Files:")
    print("\n ".join(last_model_bias_monitor_execution_report_files))
else:
    last_model_bias_monitor_execution = None
    print("====STOP==== \n No completed executions to inspect further. Please wait till an execution completes or investigate previously reported failures.")
```

 If there are violations compared to the baseline, they are listed here:

```
if last_model_bias_monitor_execution:
    model_bias_violations = last_model_bias_monitor_execution.constraint_violations()
    if model_bias_violations:
        print(model_bias_violations.body_dict)
```

If your model is deployed to a real-time endpoint, you can see visualizations of the analysis results and CloudWatch metrics in SageMaker Studio by choosing the **Endpoints** tab and then double-clicking the endpoint.

# CloudWatch Metrics for Bias Drift Analysis
<a name="clarify-model-monitor-bias-drift-cw"></a>

This guide shows CloudWatch metrics and their properties that you can use for bias drift analysis in SageMaker Clarify. Bias drift monitoring jobs compute both [pre-training bias metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-measure-data-bias.html) and [post-training bias metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-measure-post-training-bias.html), and publish them to the following CloudWatch namespace:
+ For real-time endpoints: `aws/sagemaker/Endpoints/bias-metrics`
+ For batch transform jobs: `aws/sagemaker/ModelMonitoring/bias-metrics` 

The CloudWatch metric name appends the metric's short name to `bias_metric`.

For example, `bias_metric_CI` is the bias metric for class imbalance (CI).

**Note**  
`+/- infinity` is published as the floating point number `+/- 2.348543e108`, and errors including null values are not published.

Each metric has the following properties:
+ `Endpoint`: The name of the monitored endpoint, if applicable.
+ `MonitoringSchedule`: The name of the schedule for the monitoring job. 
+ `BiasStage`: The stage of the bias drift monitoring job, either `Pre-training` or `Post-training`.
+ `Label`: The name of the target feature, provided by the monitoring job analysis configuration `label`.
+ `LabelValue`: The value of the target feature, provided by the monitoring job analysis configuration `label_values_or_threshold`.
+ `Facet`: The name of the facet, provided by the monitoring job analysis configuration facet `name_or_index`.
+ `FacetValue`: The value of the facet, provided by the monitoring job analysis configuration facet `value_or_threshold`.
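
For example, you could read one of these metrics back with the AWS SDK for Python. The following sketch only builds the arguments for a CloudWatch `get_metric_statistics` call; the exact set of dimensions attached to a given metric is an assumption to verify against the metrics your monitoring jobs actually publish:

```python
from datetime import datetime, timedelta, timezone

def bias_metric_query(endpoint, schedule, metric_short_name, facet, facet_value,
                      hours=24):
    """Build keyword arguments for CloudWatch get_metric_statistics to read a
    bias drift metric published for a real-time endpoint. Dimension names
    mirror the metric properties listed above."""
    now = datetime.now(timezone.utc)
    return {
        "Namespace": "aws/sagemaker/Endpoints/bias-metrics",
        "MetricName": f"bias_metric_{metric_short_name}",
        "Dimensions": [
            {"Name": "Endpoint", "Value": endpoint},
            {"Name": "MonitoringSchedule", "Value": schedule},
            {"Name": "Facet", "Value": facet},
            {"Name": "FacetValue", "Value": facet_value},
        ],
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
        "Period": 3600,
        "Statistics": ["Average"],
    }

# Pass the result to a boto3 CloudWatch client, for example:
# boto3.client("cloudwatch").get_metric_statistics(**bias_metric_query(...))
```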

To stop the monitoring jobs from publishing metrics, set `publish_cloudwatch_metrics` to `Disabled` in the `Environment` map of [model bias job](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModelBiasJobDefinition.html) definition.