

# Create a model evaluation job with Amazon Bedrock

When you create a model evaluation job, you specify the model, task type, and prompt dataset that you want the job to use. You also specify the metrics that you want the job to collect.

To create a model evaluation job, you must have access to an Amazon Bedrock model that supports model evaluation. For more information, see [Model support by feature](https://docs.aws.amazon.com/bedrock/latest/userguide/models-features.html) in the *Amazon Bedrock User Guide*. If you don't have access to a suitable model, contact your administrator. 

 Model evaluation supports the following task types that assess different aspects of the model's performance:
+ **[General text generation](model-evaluation-tasks-general-text.md)** – the model performs natural language processing and text generation tasks.
+ **[Text summarization](model-evaluation-tasks-text-summary.md)** – the model summarizes text based on the prompts you provide.
+ **[Question and answer](model-evaluation-tasks-question-answer.md)** – the model provides answers based on your prompts.
+ **[Text classification](model-evaluation-text-classification.md)** – the model categorizes text into predefined classes based on the input dataset.

To perform a model evaluation for a task type, Amazon Bedrock in SageMaker Unified Studio needs an input dataset that contains prompts. The job uses the dataset for inference during evaluation. You can use a [built-in](model-evaluation-prompt-datasets-builtin.md) dataset that Amazon Bedrock in SageMaker Unified Studio supplies, or you can supply your own [custom](model-evaluation-prompt-datasets-custom.md) prompt dataset. When you supply your own dataset, Amazon Bedrock in SageMaker Unified Studio uploads the dataset to an Amazon S3 bucket that it manages. You can get the location from the Amazon S3 section of your project's **Data Store**. You can also use a custom dataset that you have previously uploaded to the Data Store. 
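As a rough sketch of what a custom prompt dataset can look like, the snippet below writes a small JSON Lines file. The field names (`prompt`, `referenceResponse`, `category`) follow the shape Amazon Bedrock documents for custom evaluation datasets, but the exact schema depends on your task type, so confirm it against the custom prompt dataset documentation before uploading.

```python
import json

# Illustrative records for a question-and-answer evaluation. The field
# names assume the documented Bedrock custom dataset schema; verify them
# for your task type before uploading the file.
records = [
    {
        "prompt": "What is the capital of France?",
        "referenceResponse": "Paris",
        "category": "Geography",
    },
    {
        "prompt": "What is the chemical symbol for gold?",
        "referenceResponse": "Au",
        "category": "Chemistry",
    },
]

with open("custom-prompt-dataset.jsonl", "w") as f:
    for record in records:
        # JSON Lines: one JSON object per line, no enclosing array.
        f.write(json.dumps(record) + "\n")
```

Each line must be a complete, standalone JSON object; a single JSON array spanning the file is not valid JSON Lines.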

You can choose the metrics that you want the model evaluation job to collect: 
+ **Toxicity** – The presence of harmful, abusive, or undesirable content generated by the model. 
+ **Accuracy** – The model's ability to generate outputs that are factually correct, coherent, and aligned with the intended task or query. 
+ **Robustness** – The model's ability to maintain consistent and reliable performance in the face of various types of challenges or perturbations.
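As a toy illustration of what an accuracy-style metric measures (this is not how Amazon Bedrock computes its scores; the managed job uses task-specific algorithms), the sketch below computes exact-match accuracy between model outputs and reference answers:

```python
# Hypothetical helper, for illustration only: the fraction of model
# outputs that exactly match their reference answers, ignoring case
# and surrounding whitespace.
def exact_match_accuracy(outputs: list[str], references: list[str]) -> float:
    if len(outputs) != len(references) or not outputs:
        raise ValueError("outputs and references must be non-empty and the same length")
    matches = sum(
        out.strip().lower() == ref.strip().lower()
        for out, ref in zip(outputs, references)
    )
    return matches / len(outputs)
```

Real evaluation metrics for generation tasks are typically softer than exact match (for example, semantic similarity), but the input/output shape is the same: per-prompt model outputs scored against references, aggregated into a single number.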

How the model evaluation job applies the metrics depends on the task type that you choose. For more information, see [Review a model evaluation job in Amazon Bedrock](model-evaluation-report.md).

You can tag model evaluation jobs for purposes such as tracking costs. Amazon Bedrock in SageMaker Unified Studio automatically prepends *ProjectUserTag* to the tags that you add. To view the tags that you add, use the tag editor in the AWS Resource Groups console. For more information, see [What is Tag Editor?](https://docs.aws.amazon.com/tag-editor/latest/userguide/gettingstarted.html) in the *AWS Resource Management Documentation*.

You can set the inference parameters for the model evaluation job. You can change the *Max tokens*, *Temperature*, and *Top P* inference parameters. Models might support other parameters that you can change. For more information, see [Inference request parameters and response fields for foundation models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters.html) in the *Amazon Bedrock User Guide*.
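The sketch below shows typical ranges for these three parameters. It is illustrative only (not an AWS API call), and the exact bounds vary by model and provider; for example, some models accept temperatures above 1. Check the model provider's documentation for the ranges your model supports.

```python
# Illustrative sketch, not an AWS SDK call: validate an inference
# configuration against commonly used ranges. Real bounds differ by
# model provider, so treat these checks as assumptions.
def validate_inference_config(max_tokens: int, temperature: float, top_p: float) -> dict:
    """Return the configuration as a dict, raising if a value is out of range."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature is typically between 0 and 1 (model-dependent)")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0 and 1")
    return {"maxTokens": max_tokens, "temperature": temperature, "topP": top_p}
```

Lower temperature and Top P values make outputs more deterministic, which is often preferable for repeatable evaluation runs; Max tokens caps the length of each generated response.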

**To create an automatic model evaluation job**

1. Navigate to the Amazon SageMaker Unified Studio landing page by using the URL from your administrator.

1. Access Amazon SageMaker Unified Studio using your IAM or single sign-on (SSO) credentials. For more information, see [Access Amazon SageMaker Unified Studio](getting-started-access-the-portal.md).

1. If you want to create a new project, do the following:

   1. Choose the current project at the top of the page. If a project isn't already open, choose **Select a project**.

   1. Select **Create project**. 

   1. Follow the instructions at [Create a new project](create-new-project.md). For the **Project profile** in step 1, choose **Generative AI application development**.

1. If the project that you want to use isn't already open, do the following:

   1. Choose the current project at the top of the page. If a project isn't already open, choose **Select a project**.

   1. Select **Browse all projects**. 

   1. In **Projects**, select the project that you want to use.

1. At the top of the page, select **Build**. 

1. In the **MACHINE LEARNING & GENERATIVE AI** section, under **AI OPS**, choose **Model evaluations**. 

1. Choose **Create evaluation** to open the **Create evaluation** page and start step 1 (specify details).

1. For **Evaluation job name**, enter a name for the evaluation job. This name is shown in your model evaluation job list. 

1. (Optional) For **Description**, enter a description.

1. (Optional) For **Tags**, add tags that you want to attach to the model evaluation job. 

1. Choose **Next** to start step 2 (set up evaluation).

1. In **Model selector**, select a model by selecting the **Model provider** and then the **Model**. 

1. (Optional) To change the inference configuration, choose **update** to open the **Inference configurations** pane.

1. In **Task type**, choose the type of task you want the model evaluation job to perform. For information about the available task types, see [Model evaluation task types in Amazon Bedrock](model-evaluation-tasks.md).

1. For the task type, choose the metrics that you want the evaluation job to collect. For information about available metrics, see [Review a model evaluation job in Amazon Bedrock](model-evaluation-report.md). 

1. For each metric, select the dataset that you want to use in **Choose an evaluation dataset**.
   + To use a [built-in](model-evaluation-prompt-datasets-builtin.md) dataset, choose **Built in datasets** and choose the datasets that you want to use.
   + To upload a [custom dataset](model-evaluation-prompt-datasets-custom.md), choose **Upload a dataset to S3** and upload the dataset file. 
   + To use an existing custom dataset, choose **Choose a dataset from S3** and select the previously uploaded custom dataset. 

1. Choose **Next** to start step 3 (review and submit).

1. Check that the evaluation job details are correct.

1. Choose **Submit** to start the model evaluation job.

1. Wait until the model evaluation job finishes. The job is complete when its status is **Success** on the model evaluations page.

1. Next step: [Review](model-evaluation-report.md) the results of the model evaluation job.

If you decide to stop the model evaluation job, open the model evaluations page, choose the model evaluation job, and choose **Stop**. To delete the evaluation job, choose **Delete**.