

# Fine-tune Amazon Nova models with reinforcement fine-tuning

Before fine-tuning, make sure that you have the prerequisites in place, because Amazon Bedrock needs specific permissions to create and manage the fine-tuning process. For comprehensive security and permissions information, see [Access and security for Amazon Nova models](rft-access-security.md).

Run reinforcement fine-tuning for Amazon Nova models in 5 steps:

1. **Provide Training Dataset** – Upload prompts in required format (e.g., JSONL) as the reinforcement fine-tuning training dataset. For more information, see [Prepare data for Amazon Nova models](rft-prepare-data.md).

1. **Configure Reward Function (grader)** – Define a grader to score model responses based on correctness, structure, tone, or other objectives. The reward function can be executed using Lambda to compute objective scores. You can also choose a model as a judge (via console) and grade responses based on criteria and principles you configure (the console converts these into Lambda functions automatically). For more information, see [Setting up reward functions for Amazon Nova models](reward-functions.md).

1. **Submit reinforcement fine-tuning Job** – Launch the reinforcement fine-tuning job by specifying base model, dataset, reward function, and other optional settings such as hyperparameters. For more information, see [Create and manage fine-tuning jobs for Amazon Nova models](rft-submit-job.md).

1. **Monitor Training** – Track job status, reward metrics, and training progress until completion. For more information, see [Monitor your RFT training job](rft-submit-job.md#rft-monitor-job).

1. **Use Fine-Tuned Model** – After job completion, deploy the resulting RFT model with one click for on-demand inference. You can also use Provisioned Throughput for mission-critical workloads that require consistent performance. See [Set up inference for a custom model](model-customization-use.md). Use **Test in Playground** to evaluate and compare responses with the base model.

**Important**  
You can provide a maximum of 20,000 prompts to Amazon Bedrock for reinforcement fine-tuning.

## Supported Nova models


The following table shows the Amazon Nova models that you can customize with reinforcement fine-tuning:

**Note**  
For information about additional supported models including open-weight models, see [Fine-tune open-weight models using OpenAI-compatible APIs](fine-tuning-openai-apis.md).


**Supported models for reinforcement fine-tuning**  

| Provider | Model | Model ID | Single-region model support | 
| --- | --- | --- | --- | 
| Amazon | Nova 2 Lite | amazon.nova-2-lite-v1:0:256k | us-east-1 | 

# Access and security for Amazon Nova models

Before you begin reinforcement fine-tuning, make sure that you understand what kind of access Amazon Bedrock needs for RFT-specific operations. RFT requires additional permissions beyond standard fine-tuning due to its reward function execution capabilities.

For basic model customization security setup including trust relationships, Amazon S3 permissions, and KMS encryption, see [Create an IAM service role for model customization](custom-model-job-access-security.md#custom-model-job-service-role).

## Prerequisites


Before adding RFT-specific IAM permissions, you must add the following IAM service roles:
+ [Trust relationship](custom-model-job-access-security.md#custom-model-job-service-role-trust-relationship)
+ [Permissions to access training and validation files and to write output files in S3](custom-model-job-access-security.md#custom-model-job-service-role-s3-permissions)

## RFT-specific IAM permissions


Add these permissions to your existing model customization service role for RFT functionality.

### Lambda permissions for reward functions


You must add Lambda invocation permissions. The following shows an example policy you can use:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lambda:InvokeFunction"
            ],
            "Resource": [
                "arn:aws:lambda:*:*:function:reward-function-name"
            ]
        }
    ]
}
```

### Invocation log access


To use existing Amazon Bedrock model invocation logs as training data, add permissions to access your Amazon S3 bucket where invocation logs are stored. 

You need to provide Amazon S3 bucket access permissions for the input bucket. The following shows an example policy you can use:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::s3-invocation-logs-bucket",
                "arn:aws:s3:::s3-invocation-logs-bucket/*"
            ]
        }
    ]
}
```

For security setup including basic IAM roles, Amazon S3 permissions, and encryption, see [Create an IAM service role for model customization](custom-model-job-access-security.md#custom-model-job-service-role).

## Grader Lambda function permissions for RLAIF


If you create your own Lambda function for Reinforcement Learning from AI Feedback (RLAIF) reward functions, you need to add specific permissions to the Lambda execution role.

### Bedrock permissions for LLM judges


For LLM-as-Judge reward functions (RLAIF), add permissions to invoke foundation models. The following shows an example policy you can use for your Lambda execution role.

**Note**  
Only add these permissions to your Lambda execution role if you create your own Lambda function. The console handles this automatically when creating Lambda functions through the console.

The following is an example policy for LLM-as-Judge invocation using foundation models:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel"
            ],
            "Resource": [
                "arn:aws:bedrock:*:*:foundation-model/*"
            ]
        }
    ]
}
```

The following is an example policy for LLM-as-Judge invocation using an inference profile:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel"
            ],
            "Resource": [
                "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-premier-v1:0",
                "arn:aws:bedrock:us-east-2::foundation-model/amazon.nova-premier-v1:0",
                "arn:aws:bedrock:us-west-2::foundation-model/amazon.nova-premier-v1:0"
            ],
            "Condition": {
                "StringLike": {
                    "bedrock:InferenceProfileArn": "arn:aws:bedrock:us-east-1:111122223333:inference-profile/us.amazon.nova-premier-v1:0"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel"
            ],
            "Resource": [
                "arn:aws:bedrock:us-east-1:111122223333:inference-profile/us.amazon.nova-premier-v1:0"
            ]
        }
    ]
}
```

For information about inference profile prerequisites, see [Prerequisites for inference profiles](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-prereq.html).

# Prepare data for Amazon Nova models

When you fine-tune an Amazon Nova model with reinforcement fine-tuning, you can bring your own prompts or use existing Amazon Bedrock API invocation logs as training data.

## Training data requirements and sources


You can provide training data through one of the following options:

**Note**  
We only support the OpenAI chat completion format.

### Option 1: Provide your own prompts


Collect your prompts and store them in `.jsonl` file format. You can upload custom datasets in JSONL format or select existing datasets from Amazon S3. Each record in the JSONL must use the OpenAI chat completion format in the following structure:
+ `messages`: This field contains the system, user, and assistant roles with the input prompt provided to the model.
+ `reference_answer`: This field contains the expected output or evaluation criteria that your reward function uses to score the model's response. It is not limited to structured outputs; it can contain any format that helps your reward function evaluate quality.
+ (Optional) You can add fields that your grader Lambda function uses for grading.

**Requirements:**
+ JSONL format with prompts in OpenAI chat completion format (one prompt per line)
+ A minimum of 100 records in the training dataset
+ Amazon Bedrock automatically validates the training dataset format

------
#### [ Example: General question-answering ]

```
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant"
    },
    {
      "role": "user",
      "content": "What is machine learning?"
    }
  ],
  "reference_answer": "Machine learning is a subset of artificial intelligence that enables computers to learn and make decisions from data without being explicitly programmed."
}
```

------
#### [ Example: Math problem ]

```
{
  "id": "sample-001",
  "messages": [
    {
      "role": "system",
      "content": "You are a math tutor"
    },
    {
      "role": "user",
      "content": "Solve: 2x + 5 = 13"
    }
  ],
  "reference_answer": {
    "solution": "x = 4",
    "steps": ["2x = 13 - 5", "2x = 8", "x = 4"]
  }
}
```

------
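To catch formatting problems before you upload a dataset, you can check each record locally. The following is an illustrative sketch (the `validate_record` and `validate_dataset` helpers are not part of any Amazon Bedrock API); it checks the required `messages` and `reference_answer` fields and the minimum record count:

```
import json

VALID_ROLES = {"system", "user", "assistant"}

def validate_record(line):
    """Return a list of problems found in one JSONL training record."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as e:
        return ["invalid JSON: {}".format(e)]
    problems = []
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("'messages' must be a non-empty list")
    else:
        for i, msg in enumerate(messages):
            if msg.get("role") not in VALID_ROLES:
                problems.append("messages[{}] has an invalid role".format(i))
            if "content" not in msg:
                problems.append("messages[{}] is missing 'content'".format(i))
    if "reference_answer" not in record:
        problems.append("missing 'reference_answer'")
    return problems

def validate_dataset(lines, min_records=100):
    """Validate every record and check the minimum dataset size."""
    issues = {i: p for i, line in enumerate(lines)
              if (p := validate_record(line))}
    if len(lines) < min_records:
        issues["dataset"] = ["need at least {} records, got {}".format(
            min_records, len(lines))]
    return issues
```

Amazon Bedrock validates the dataset format again when you submit the job, but a local check like this gives faster feedback.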

### Option 2: Use invocation logs


When you create a reinforcement fine-tuning job, you can have Amazon Bedrock use existing invocation logs from your S3 bucket as training data. For Amazon Bedrock, an invocation log is a detailed record of model invocations.

You can use Invoke and Converse API invocation logs stored in your own Amazon S3 bucket for training.

**Requirements:**
+ API logging must be enabled for your Amazon Bedrock usage
+ Logs must be in a supported format (Amazon Bedrock Invoke/Converse API)
+ A minimum of 100 prompt examples

To use invocation logs for reinforcement fine-tuning, turn on model invocation logging, use one of the model invocation operations, and make sure that you've set up an Amazon S3 bucket as the destination for the logs. For more information about setting up the invocation logs, see [Monitor model invocation using CloudWatch Logs and Amazon S3](https://docs.aws.amazon.com/bedrock/latest/userguide/model-invocation-logging.html).

Before you can start the reinforcement fine-tuning job with invocation logs from an S3 bucket as input, you must provide Amazon Bedrock permissions to access the logs from an S3 Bucket. For more information, see [Model customization access and security](custom-model-job-access-security.md).

You can optionally add request metadata to the prompt-response pairs in the invocation log using one of the model invocation operations and then later use it to filter the logs. Amazon Bedrock can use the filtered logs to fine-tune the model.

#### Add request metadata to prompts and responses in your invocation logs


With invocation logs, you can use request metadata attached to the logs to identify the training dataset prompts that you want Amazon Bedrock to use for reinforcement fine-tuning.

The following is an example of an invocation log from an `InvokeModel` call with the `requestMetadata`:

```
{
    "schemaType": "ModelInvocationLog",
    "schemaVersion": "1.0",
    "timestamp": "2024-09-11T17:16:18Z",
    "accountId": "XXXXX",
    "identity": {
        "arn": "arn:aws:sts::XXXXXXX:assumed-role/Admin/XXXXXX"
    },
    "region": "us-east-1",
    "requestId": "7ee514f3-9eff-467c-af75-8fd564ce58e5",
    "operation": "InvokeModel",
    "modelId": "meta.llama3-1-405b-instruct-v1:0",
    "input": {
        "inputContentType": "application/json",
        "inputBodyJson": {
            "prompt": "story of two dogs",
            "max_tokens_to_sample": 300
        },
        "inputTokenCount": 13
    },
    "output": {
        "outputContentType": "application/json",
        "outputBodyJson": {
            "type": "completion",
            "completion": " Here is a story about two dogs:\n\nRex and Charlie were best friends who lived next door to each other. Every morning, their owners would let them out into the backyard to play. Rex was a golden retriever with long, shaggy fur. He loved to run and chase sticks. Charlie was a corgi with short legs and a big smile. He was happy just sniffing the grass and rolling around. \n\nThe two dogs spent every day playing together. Rex would grab a stick and bring it over to Charlie, wagging his tail excitedly. Charlie would take the other end of the stick in his mouth and they'd have a friendly game of tug-of-war. After tiring themselves out, they'd flop down in the warm sunshine for a nap. \n\nAt lunchtime, their owners would call them in for food. Rex would gobble up his kibble hungrily while Charlie ate his at a more leisurely pace. After lunch, it was right back outside for more playtime. The afternoon sunbeams would light up their fur as they chased each other around the yard. \n\nWhen it started getting late, their owners called them in for the night. Rex and Charlie would head inside, tired but happy after another fun day of play. After slurping up fresh water from their bowls, they'd curl up on their beds, Rex's fluffy golden tail tucked over his nose and little",
            "stop_reason": "max_tokens",
            "stop": null
        },
        "outputTokenCount": 300
    },
    "requestMetadata": {
        "project": "CustomerService",
        "intent": "ComplaintResolution",
        "priority": "High"
    }
}
```

You can specify the invocation log as your input data source when you start a reinforcement fine-tuning job through the Amazon Bedrock console or by using the API, AWS CLI, or an SDK.

##### Requirements for providing request metadata


The request metadata must meet the following requirements:
+ Provided in the JSON `key:value` format.
+ Key and value pair must be a string of 256 characters maximum.
+ Provide a maximum of 16 key-value pairs.
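A quick local check against these limits can save a failed filter later. This sketch is illustrative (the helper name is an assumption, not an AWS API):

```
def validate_request_metadata(metadata):
    """Check request metadata against the documented limits."""
    problems = []
    if len(metadata) > 16:
        problems.append("too many pairs: {} (maximum is 16)".format(len(metadata)))
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            problems.append("{!r}: keys and values must be strings".format(key))
        elif len(key) > 256 or len(value) > 256:
            problems.append("{!r}: keys and values are limited to 256 characters".format(key))
    return problems
```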

##### Using request metadata filters


Once invocation logs with request metadata are available, you can apply filters based on the request metadata to selectively choose which prompts to include for fine-tuning the model. For example, you might want to include only those with `"project": "CustomerService"` and `"priority": "High"` request metadata.

To filter the logs using multiple request metadata, use a single Boolean operator `AND` or `OR`. You cannot combine these operators. For single request metadata filtering, use the `Equals` or `Not Equals` operator.
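The filter semantics above can be sketched in a few lines of Python. This is only an illustration of the matching logic (the `matches` helper and the filter shape are assumptions), not how Amazon Bedrock implements filtering:

```
def matches(metadata, filters, operator="AND"):
    """Apply Equals / NotEquals conditions joined by a single AND or OR."""
    def check(f):
        actual = metadata.get(f["key"])
        if f["op"] == "Equals":
            return actual == f["value"]
        return actual != f["value"]  # NotEquals
    results = [check(f) for f in filters]
    return all(results) if operator == "AND" else any(results)

# Include only high-priority CustomerService logs (both conditions must hold).
filters = [
    {"key": "project", "op": "Equals", "value": "CustomerService"},
    {"key": "priority", "op": "Equals", "value": "High"},
]
```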

## Characteristics of effective training data


Effective RFT training data requires three key characteristics:
+ **Clarity and consistency** – Use clear, unambiguous prompts with consistent formatting. Avoid contradictory labels, ambiguous instructions, or conflicting reference answers that mislead training.
+ **Diversity** – Include varied input formats, edge cases, and difficulty levels that reflect production usage patterns across different user types and scenarios.
+ **Efficient reward functions** – Design functions that execute quickly (seconds, not minutes), parallelize with AWS Lambda, and return consistent scores for cost-effective training.

## Additional properties


The RFT data format supports custom fields beyond the core schema requirements (`messages` and `reference_answer`). This flexibility allows you to add any additional data your reward function needs for proper evaluation.

**Note**  
You don't need to configure this in your recipe. The data format inherently supports additional fields. Simply include them in your training data JSON, and they will be passed to your reward function in the `metadata` field.

**Common additional properties**
+ `task_id` – Unique identifier for tracking
+ `difficulty_level` – Problem complexity indicator
+ `domain` – Subject area or category
+ `expected_reasoning_steps` – Number of steps in solution

These additional fields are passed to your reward function during evaluation, enabling sophisticated scoring logic tailored to your specific use case.

**Examples with additional properties**

------
#### [ Chemistry problem ]

```
{
  "id": "chem-001",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful chemistry assistant"
    },
    {
      "role": "user",
      "content": "Predict hydrogen bond donors and acceptors for this SMILES: CCN(CC)CCC(=O)c1sc(N)nc1C"
    }
  ],
  "reference_answer": {
    "donor_bond_counts": 2,
    "acceptor_bond_counts": 4
  }
}
```

The `reference_answer` field contains the expected output or evaluation criteria that your reward function uses to score the model's response. It is not limited to structured outputs—it can contain any format that helps your reward function evaluate quality.

------
#### [ Math problem with metadata ]

```
{
  "messages": [
    {
      "role": "system",
      "content": "You are a math tutor"
    },
    {
      "role": "user",
      "content": "Solve: 2x + 5 = 13"
    }
  ],
  "reference_answer": {
    "solution": "x = 4",
    "steps": ["2x = 13 - 5", "2x = 8", "x = 4"]
  },
  "task_id": "algebra_001",
  "difficulty_level": "easy",
  "domain": "algebra",
  "expected_reasoning_steps": 3
}
```

------

# Setting up reward functions for Amazon Nova models

Reward functions evaluate response quality and provide feedback signals for model training. You can set up reward functions using custom Lambda functions or Amazon Bedrock-hosted foundation models as judges. Guided templates are available to simplify reward function creation for common tasks like instruction following and format validation. Choose the approach that matches your task requirements.

## Reinforcement Learning via Verifiable Rewards (RLVR)


RLVR optimizes models for objective tasks such as code generation or math reasoning using verifiable rule-based graders or ready-to-use templates.

You have two options for RLVR (Custom Code):

### Option 1: Use console-provided templates


Amazon Bedrock console provides sample templates for grader Lambda functions:
+ Mathematical reasoning with ground truth verification
+ Format validation and constraint checking
+ Generic grader Lambda template with boilerplate code

Follow the instructions in the provided template on the **Create RFT job** page in the [Amazon Bedrock console](https://console.aws.amazon.com/bedrock).

### Option 2: Bring your own Lambda function


Create custom reward functions using your own Lambda ARN for complex logic, external APIs, multi-step calculations, or combining multiple evaluation criteria.

**Note**  
If you bring your own Lambda function, keep the following in mind:  
+ Increase the Lambda timeout from the default of 3 seconds to a maximum of 15 minutes for complex evaluations.
+ The Lambda execution role needs permissions to invoke models as described in [Access and security for Amazon Nova models](rft-access-security.md).
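For example, you can raise an existing function's timeout with the AWS CLI (the function name below is a placeholder):

```
# Raise the reward function's timeout to the Lambda maximum of 900 seconds (15 minutes).
aws lambda update-function-configuration \
    --function-name my-reward-function \
    --timeout 900
```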

## Reinforcement Learning via AI Feedback (RLAIF)


RLAIF optimizes models for subjective tasks such as instruction following or chatbot interactions using AI-based judges with ready-to-use templates.

**For RLAIF (Model as Judge):**
+ Select an Amazon Bedrock hosted base Model as Judge
+ Configure instructions for evaluation
+ Define evaluation criteria and scoring guidelines

Available LLM-as-Judge prompt templates in the Amazon Bedrock console:
+ Instruction following (Judge model training)
+ Summarization (Multi-turn dialogs)
+ Reasoning evaluation (CoT for specialized domains)
+ RAG faithfulness (Context-grounded Q&A)

**Note**  
The console's **Model as Judge** option automatically converts your configuration into a Lambda function during training.

## Lambda function implementation details


When implementing custom Lambda reward functions, your function must accept and return data in the following format.

------
#### [ Input structure ]

```
[{
  "id": "123",
  "messages": [
    {
      "role": "user",
      "content": "Do you have a dedicated security team?"
    },
    {
      "role": "assistant",
      "content": "As an AI developed by Amazon, I do not have a dedicated security team..."
    }
  ],
  "metadata": {
    "reference_answer": {
      "compliant": "No",
      "explanation": "As an AI developed by Company, I do not have a traditional security team..."
    },
    "my_key": "sample-001"
  }
}]
```

------
#### [ Output structure ]

```
[{
  "id": "123",
  "aggregate_reward_score": 0.85,
  "metrics_list": [
    {
      "name": "accuracy",
      "value": 0.9,
      "type": "Reward"
    },
    {
      "name": "policy_compliance",
      "value": 0.8,
      "type": "Metric"
    }
  ]
}]
```

------

**Design guidelines**
+ **Rank responses** – Give the best answer a clearly higher score
+ **Use consistent checks** – Evaluate task completion, format adherence, safety, and reasonable length
+ **Maintain stable scaling** – Keep scores normalized and non-exploitable
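Putting the contract together, a minimal grader Lambda might look like the following sketch. The exact-match scoring rule is purely illustrative and stands in for your own evaluation logic; it assumes the function receives the JSON array shown in the input structure above:

```
def lambda_handler(event, context):
    """Score a batch of model responses using the RFT input/output contract."""
    results = []
    for sample in event:  # event is the list of samples shown in the input structure
        # The last assistant turn is the model response to grade.
        response = next(
            (m["content"] for m in reversed(sample["messages"])
             if m["role"] == "assistant"), "")
        reference = sample.get("metadata", {}).get("reference_answer", "")
        # Illustrative scoring: 1.0 for an exact match with the reference, else 0.0.
        accuracy = 1.0 if str(response).strip() == str(reference).strip() else 0.0
        results.append({
            "id": sample["id"],
            "aggregate_reward_score": accuracy,
            "metrics_list": [
                {"name": "accuracy", "value": accuracy, "type": "Reward"},
            ],
        })
    return results
```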

# Create and manage fine-tuning jobs for Amazon Nova models

You can create a reinforcement fine-tuning (RFT) job using the Amazon Bedrock console or API. The RFT job can take a few hours depending on the size of your training data, number of epochs, and complexity of your reward functions.

## Prerequisites

+ Create an IAM service role with the required permissions. For comprehensive security and permissions information including RFT-specific permissions, see [Access and security for Amazon Nova models](rft-access-security.md).
+ (Optional) Encrypt input and output data, your RFT job, or inference requests made to custom models. For more information, see [Encryption of custom models](https://docs.aws.amazon.com/bedrock/latest/userguide/encryption-custom-job.html).

## Create your RFT job


Choose the tab for your preferred method, and then follow the steps:

------
#### [ Console ]

To submit an RFT job in the console, carry out the following steps:

1. Open the Amazon Bedrock console and navigate to **Custom models** under **Tune**.

1. Choose **Create**, then **Create reinforcement fine-tuning job**.

1. In the **Model details** section, choose **Amazon Nova 2 Lite** as your base model.

1. In the **Customization details** section, enter the customization name.

1. In the **Training data** section, choose your data source. Select from your available invocation logs stored in Amazon S3, select the Amazon S3 location of your training dataset file, or upload a file directly from your device.
**Note**  
Your training dataset should be in the OpenAI Chat Completions data format. If you provide invocation logs in the Amazon Bedrock invoke or converse format, Amazon Bedrock automatically converts them to the Chat Completions format.

1. In the **Reward function** section, set up your reward mechanism:
   + **Model as judge (RLAIF)** - Select a Bedrock hosted base model as judge and configure the instructions for evaluation. Use this for subjective tasks like content moderation.
**Note**  
The console's **Model as judge** option automatically converts your configuration into a Lambda function during training.
   + **Custom code (RLVR)** - Create custom reward functions using Python code executed through Lambda functions. Use this for objective tasks like code generation.

   For more information, see [Setting up reward functions for Amazon Nova models](reward-functions.md).

1. (Optional) In the **Hyperparameters** section, adjust training parameters or use default values.

1. In the **Output data** section, enter the Amazon S3 location where Amazon Bedrock should save job outputs.

1. In the **Role configuration** section, either choose an existing role from the dropdown list or enter a name for the service role to create.

1. (Optional) In the **Additional configuration** section, configure the validation data by pointing to an Amazon S3 bucket, KMS encryption settings, and job and model tags.

1. Choose **Create reinforcement fine-tuning job** to begin the job.

------
#### [ API ]

Send a `CreateModelCustomizationJob` request with `customizationType` set to `REINFORCEMENT_FINE_TUNING`.

**Required fields:** `roleArn`, `baseModelIdentifier`, `customModelName`, `jobName`, `trainingDataConfig`, `outputDataConfig`, `rftConfig`

**Example request:**

```
{
    "roleArn": "arn:aws:iam::123456789012:role/BedrockRFTRole",
    "baseModelIdentifier": "amazon.nova-2-lite-v1:0:256k",
    "customModelName": "my-rft-model",
    "jobName": "my-rft-job",
    "customizationType": "REINFORCEMENT_FINE_TUNING",
    "trainingDataConfig": {
        "s3Uri": "s3://my-bucket/training-data.jsonl"
    },
    "customizationConfig": {
        "rftConfig" : {
            "graderConfig": {
                "lambdaGrader": {
                    "lambdaArn": "arn:aws:lambda:us-east-1:123456789012:function:function-name"
                }
            },
            "hyperParameters": {
                "batchSize": 64,
                "epochCount": 2,
                "evalInterval": 10,
                "inferenceMaxTokens": 8192,
                "learningRate": 0.00001,
                "maxPromptLength": 4096,
                "reasoningEffort": "high",
                "trainingSamplePerPrompt": 4
            }
        }
    },
    "outputDataConfig": {
        "s3Uri": "s3://my-bucket/rft-output/"
    }
}
```

**Python API sample request:**

```
import boto3

bedrock = boto3.client(service_name='bedrock')

# Set parameters
customizationType = "REINFORCEMENT_FINE_TUNING"
baseModelIdentifier = "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-2-lite-v1:0:256k"
roleArn = "${your-customization-role-arn}"
jobName = "MyFineTuningJob"
customModelName = "MyCustomModel"

customizationConfig = {
    'rftConfig' : {
        'graderConfig': {
            'lambdaGrader': {
                'lambdaArn': 'arn:aws:lambda:us-east-1:123456789012:function:function-name'
            }
        },
        'hyperParameters': {
            'batchSize': 64,
            'epochCount': 2,
            'evalInterval': 10,
            'inferenceMaxTokens': 8192,
            'learningRate': 0.00001,
            'maxPromptLength': 4096,
            'reasoningEffort': 'high',
            'trainingSamplePerPrompt': 4
        }
    }
}

trainingDataConfig = {"s3Uri": "s3://${training-bucket}/myInputData/train.jsonl"}
outputDataConfig = {"s3Uri": "s3://${output-bucket}/myOutputData"}

# Create job
response_ft = bedrock.create_model_customization_job(
    jobName=jobName, 
    customModelName=customModelName,
    roleArn=roleArn,
    baseModelIdentifier=baseModelIdentifier,
    customizationConfig=customizationConfig,
    trainingDataConfig=trainingDataConfig,
    outputDataConfig=outputDataConfig,
    customizationType=customizationType
)

jobArn = response_ft['jobArn']
```

------

## Monitor your RFT training job

Amazon Bedrock provides real-time monitoring with visual graphs and metrics during RFT training. These metrics help you understand whether the model converges properly and if the reward function effectively guides the learning process.

### Job status tracking


You can monitor your RFT job status through the validation and training phases in the Amazon Bedrock console.

**Completion indicators:**
+ Job status changes to **Completed** when training completes successfully
+ Custom model ARN becomes available for deployment
+ Training metrics reach convergence thresholds

### Real-time training metrics


Amazon Bedrock provides real-time monitoring during RFT training with visual graphs displaying training and validation metrics.

#### Core training metrics

+ **Training loss** - Measures how well the model is learning from the training data
+ **Training reward statistics** - Shows reward scores assigned by your reward functions
+ **Reward margin** - Measures the difference between good and bad response rewards
+ **Accuracy on training and validation sets** - Shows model performance on both the training and held-out data

**Detailed metric categories**
+ **Reward metrics** – `critic/rewards/mean`, `critic/rewards/max`, `critic/rewards/min` (reward distribution), and `val-score/rewards/mean@1` (validation rewards)
+ **Model behavior** – `actor/entropy` (policy variation; higher equals more exploratory)
+ **Training health** – `actor/pg_loss` (policy gradient loss), `actor/pg_clipfrac` (frequency of clipped updates), and `actor/grad_norm` (gradient magnitude)
+ **Response characteristics** – `prompt_length/mean`, `prompt_length/max`, `prompt_length/min` (input token statistics), `response_length/mean`, `response_length/max`, `response_length/min` (output token statistics), and `response/aborted_ratio` (incomplete generation rate; 0 equals all completed)
+ **Performance** – `perf/throughput` (training throughput), `perf/time_per_step` (time per training step), and `timing_per_token_ms/*` (per-token processing times)
+ **Resource usage** – `perf/max_memory_allocated_gb`, `perf/max_memory_reserved_gb` (GPU memory), and `perf/cpu_memory_used_gb` (CPU memory)

#### Training progress visualization


The console displays interactive graphs that update in real-time as your RFT job progresses. These visualizations can help you:
+ Track convergence toward optimal performance
+ Identify potential training issues early
+ Determine optimal stopping points
+ Compare performance across different epochs

## Set up inference


After job completion, deploy the RFT model for on-demand inference or use Provisioned Throughput for consistent performance. For setting up inference, see [Set up inference for a custom model](model-customization-use.md).

Use **Test in Playground** to evaluate and compare responses with the base model. For evaluating your completed RFT model, see [Evaluate your RFT model](rft-evaluate-model.md).