

# Customize a model with reinforcement fine-tuning in Amazon Bedrock
<a name="reinforcement-fine-tuning"></a>

Reinforcement fine-tuning is a model customization technique in Amazon Bedrock that improves foundation model performance by teaching models what constitutes a "good" response through feedback signals called rewards. Unlike traditional fine-tuning methods that depend on labeled datasets, reinforcement fine-tuning uses a feedback-driven approach that iteratively optimizes the model to maximize these rewards.

## Reinforcement fine-tuning applications and scenarios
<a name="reinforcement-fine-tuning-when"></a>

Use reinforcement fine-tuning when you can define clear, measurable success criteria for evaluating response quality. Reinforcement fine-tuning excels in domains where output quality can be objectively measured, especially when multiple valid responses exist or when optimal responses are difficult to define upfront. It's ideal for:
+ Mathematical problem-solving and code generation (using rule-based graders for objective evaluation)
+ Scientific reasoning and structured data analysis
+ Subjective tasks like instruction following, content moderation, and creative writing (using AI-based judges)
+ Tasks requiring step-by-step reasoning or multi-turn problem solving
+ Scenarios with multiple valid solutions where some are clearly better than others
+ Applications balancing multiple objectives (accuracy, efficiency, style)
+ Applications requiring iterative improvement, personalization, or adherence to complex business rules
+ Scenarios where success can be verified programmatically through execution results or performance metrics
+ Cases where collecting high-quality labeled examples is expensive or impractical

## Benefits of reinforcement fine-tuning
<a name="reinforcement-fine-tuning-benefits"></a>
+ **Improved model performance** – Reinforcement fine-tuning improves model accuracy by up to 66% on average compared to base models. This enables optimization for price and performance by fine-tuning smaller, faster, and more efficient model variants.
+ **Ease of use** – Amazon Bedrock automates the complexity of reinforcement fine-tuning, making it accessible to developers building AI applications. You can fine-tune models using your uploaded datasets or existing API invocation logs. You can define reward functions that grade model outputs with custom code running in Lambda or with a model-as-a-judge grader, and built-in templates help with quick setup.
+ **Security and compliance** – Your proprietary data never leaves AWS's secure, governed environment during the customization process.

## Supported models for reinforcement fine-tuning
<a name="rft-supported-models"></a>

The following table shows the foundation models that you can customize with reinforcement fine-tuning:


**Supported models for reinforcement fine-tuning**  

| Provider | Model | Model ID | Region name | Region | 
| --- | --- | --- | --- | --- | 
| Amazon | Nova 2 Lite | amazon.nova-2-lite-v1:0:256k |  US East (N. Virginia)  |  us-east-1  | 
| OpenAI | gpt-oss-20B | openai.gpt-oss-20b | US West (Oregon) | us-west-2 | 
| Qwen | Qwen3 32B | qwen.qwen3-32b | US West (Oregon) | us-west-2 | 

## How reinforcement fine-tuning works
<a name="rft-how-it-works"></a>

Amazon Bedrock fully automates the reinforcement fine-tuning workflow. The model receives prompts from your training dataset and generates several responses per prompt. These responses are then scored by a reward function. Amazon Bedrock uses the prompt-response pairs with scores to train the model through policy-based learning using Group Relative Policy Optimization (GRPO). The training loop continues until it reaches the end of your training data or you stop the job at a chosen checkpoint, producing a model optimized for the metric that matters to you.
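To illustrate the group-relative idea behind GRPO, the following sketch computes normalized advantages for one prompt's group of scored responses. This is illustrative only; Amazon Bedrock performs this step internally, and the function shown is not a Bedrock API:

```python
# Illustrative sketch of GRPO's group-relative scoring (not a Bedrock API).
# Each response is scored relative to the other responses for the same prompt:
# above-average responses get positive advantages, below-average get negative.
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Normalize a group of reward scores to zero mean and unit variance."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        return [0.0 for _ in rewards]  # identical scores carry no signal
    return [(r - mu) / sigma for r in rewards]

# Four responses to one prompt, as scored by a reward function:
advantages = group_relative_advantages([0.9, 0.2, 0.5, 0.4])
```

Responses with positive advantages are reinforced; responses with negative advantages are discouraged.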

## Reinforcement fine-tuning best practices
<a name="rft-best-practices"></a>
+ **Start small** – Begin with 100-200 examples, validate reward function correctness, and scale gradually based on results
+ **Pre-fine-tuning evaluation** – Test baseline model performance before reinforcement fine-tuning. If rewards are consistently 0 percent, use supervised fine-tuning first to establish basic capabilities. If rewards are greater than 95 percent, reinforcement fine-tuning might be unnecessary
+ **Monitor training** – Track average reward scores and distribution. Watch for overfitting (training rewards increase while validation rewards decrease). Look for concerning patterns such as rewards plateauing below 0.15, increasing reward variance over time, and declining validation performance
+ **Optimize reward functions** – Execute within seconds (not minutes), minimize external API calls, use efficient algorithms, implement proper error handling, and take advantage of Lambda's parallel scaling
+ **Iteration strategy** – If rewards aren't improving, adjust reward function design, increase dataset diversity, add more representative examples, and verify reward signals are clear and consistent

**Topics**
+ [Reinforcement fine-tuning applications and scenarios](#reinforcement-fine-tuning-when)
+ [Benefits of reinforcement fine-tuning](#reinforcement-fine-tuning-benefits)
+ [Supported models for reinforcement fine-tuning](#rft-supported-models)
+ [How reinforcement fine-tuning works](#rft-how-it-works)
+ [Reinforcement fine-tuning best practices](#rft-best-practices)
+ [Fine-tune Amazon Nova models with reinforcement fine-tuning](rft-nova-models.md)
+ [Fine-tune open-weight models using OpenAI-compatible APIs](fine-tuning-openai-apis.md)
+ [Evaluate your RFT model](rft-evaluate-model.md)

# Fine-tune Amazon Nova models with reinforcement fine-tuning
<a name="rft-nova-models"></a>


Before fine-tuning, make sure that you meet the prerequisites, because Amazon Bedrock needs specific permissions to create and manage the fine-tuning process. For comprehensive security and permissions information, see [Access and security for Amazon Nova models](rft-access-security.md).

Run reinforcement fine-tuning for Amazon Nova models in five steps:

1. **Provide Training Dataset** – Upload prompts in the required JSONL format as the reinforcement fine-tuning training dataset. For more information, see [Prepare data for Amazon Nova models](rft-prepare-data.md).

1. **Configure Reward Function (grader)** – Define a grader to score model responses based on correctness, structure, tone, or other objectives. The reward function can be executed using Lambda to compute objective scores. You can also choose a model as a judge (via console) and grade responses based on criteria and principles you configure (the console converts these into Lambda functions automatically). For more information, see [Setting up reward functions for Amazon Nova models](reward-functions.md).

1. **Submit reinforcement fine-tuning Job** – Launch the reinforcement fine-tuning job by specifying base model, dataset, reward function, and other optional settings such as hyperparameters. For more information, see [Create and manage fine-tuning jobs for Amazon Nova models](rft-submit-job.md).

1. **Monitor Training** – Track job status, reward metrics, and training progress until completion. For more information, see [Monitor your RFT training job](rft-submit-job.md#rft-monitor-job).

1. **Use Fine-Tuned Model** – After job completion, deploy the resulting RFT model with one click for on-demand inference. You can also use Provisioned Throughput for mission-critical workloads that require consistent performance. See [Set up inference for a custom model](model-customization-use.md). Use **Test in Playground** to evaluate and compare responses with the base model.

**Important**  
You can provide a maximum of 20,000 prompts to Amazon Bedrock for reinforcement fine-tuning.

## Supported Nova models
<a name="rft-nova-supported-models"></a>

The following table shows the Amazon Nova models that you can customize with reinforcement fine-tuning:

**Note**  
For information about additional supported models including open-weight models, see [Fine-tune open-weight models using OpenAI-compatible APIs](fine-tuning-openai-apis.md).


**Supported models for reinforcement fine-tuning**  

| Provider | Model | Model ID | Single-region model support | 
| --- | --- | --- | --- | 
| Amazon | Nova 2 Lite | amazon.nova-2-lite-v1:0:256k | us-east-1 | 

# Access and security for Amazon Nova models
<a name="rft-access-security"></a>

Before you begin reinforcement fine-tuning, make sure that you understand what kind of access Amazon Bedrock needs for RFT-specific operations. RFT requires additional permissions beyond standard fine-tuning due to its reward function execution capabilities.

For basic model customization security setup including trust relationships, Amazon S3 permissions, and KMS encryption, see [Create an IAM service role for model customization](custom-model-job-access-security.md#custom-model-job-service-role).

## Prerequisites
<a name="rft-access-prerequisites"></a>

Before adding RFT-specific IAM permissions, you must add the following IAM service roles:
+ [Trust relationship](custom-model-job-access-security.md#custom-model-job-service-role-trust-relationship)
+ [Permissions to access training and validation files and to write output files in S3](custom-model-job-access-security.md#custom-model-job-service-role-s3-permissions)

## RFT-specific IAM permissions
<a name="rft-iam-permissions"></a>

Add these permissions to your existing model customization service role for RFT functionality.

### Lambda permissions for reward functions
<a name="rft-lambda-permissions"></a>

You must add Lambda invocation permissions. The following shows an example policy you can use:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lambda:InvokeFunction"
            ],
            "Resource": [
                "arn:aws:lambda:*:*:function:reward-function-name"
            ]
        }
    ]
}
```
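As a sketch, you could build and attach this policy programmatically with boto3. The role name, policy name, and function name below are placeholders; replace them with your own values:

```python
import json

# Builds the inline policy shown above. The function name is a placeholder;
# replace it with the name of your reward-function Lambda.
def lambda_invoke_policy(function_name):
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["lambda:InvokeFunction"],
                "Resource": [f"arn:aws:lambda:*:*:function:{function_name}"],
            }
        ],
    }

policy = lambda_invoke_policy("reward-function-name")
policy_json = json.dumps(policy, indent=4)

# To attach it to your existing customization service role (role and policy
# names here are placeholders), you could run:
# import boto3
# iam = boto3.client("iam")
# iam.put_role_policy(
#     RoleName="MyBedrockCustomizationRole",
#     PolicyName="RftRewardLambdaInvoke",
#     PolicyDocument=policy_json,
# )
```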

### Invocation log access
<a name="rft-api-log-permissions"></a>

To use existing Amazon Bedrock model invocation logs as training data, add permissions to access your Amazon S3 bucket where invocation logs are stored. 

You need to provide Amazon S3 bucket access permissions for the input bucket. The following shows an example policy you can use:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::s3-invocation-logs-bucket",
                "arn:aws:s3:::s3-invocation-logs-bucket/*"
            ]
        }
    ]
}
```

For security setup including basic IAM roles, Amazon S3 permissions, and encryption, see [Create an IAM service role for model customization](custom-model-job-access-security.md#custom-model-job-service-role).

## Grader Lambda function permissions for RLAIF
<a name="rft-grader-lambda-permissions"></a>

If you create your own Lambda function for Reinforcement Learning from AI Feedback (RLAIF) reward functions, you need to add specific permissions to the Lambda execution role.

### Bedrock permissions for LLM judges
<a name="rft-bedrock-permissions"></a>

For LLM-as-Judge reward functions (RLAIF), add permissions to invoke foundation models. The following shows an example policy you can use for your Lambda execution role.

**Note**  
Only add these permissions to your Lambda execution role if you create your own Lambda function. The console handles this automatically when creating Lambda functions through the console.

The following is an example policy for LLM-as-Judge invocation of Amazon Bedrock foundation models:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel"
            ],
            "Resource": [
                "arn:aws:bedrock:*:*:foundation-model/*"
            ]
        }
    ]
}
```

The following is an example policy for LLM-as-Judge invocation through an inference profile:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel"
            ],
            "Resource": [
                "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-premier-v1:0",
                "arn:aws:bedrock:us-east-2::foundation-model/amazon.nova-premier-v1:0",
                "arn:aws:bedrock:us-west-2::foundation-model/amazon.nova-premier-v1:0"
            ],
            "Condition": {
                "StringLike": {
                    "bedrock:InferenceProfileArn": "arn:aws:bedrock:us-east-1:111122223333:inference-profile/us.amazon.nova-premier-v1:0"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel"
            ],
            "Resource": [
                "arn:aws:bedrock:us-east-1:111122223333:inference-profile/us.amazon.nova-premier-v1:0"
            ]
        }
    ]
}
```

For information about inference profile prerequisites, see [Prerequisites for inference profiles](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-prereq.html).

# Prepare data for Amazon Nova models
<a name="rft-prepare-data"></a>

When you fine-tune an Amazon Nova model with reinforcement fine-tuning, you can bring your own prompts or use existing Amazon Bedrock API invocation logs as training data.

## Training data requirements and sources
<a name="rft-data-source-options"></a>

You can provide training data through one of the following options:

**Note**  
We only support the OpenAI chat completion format.

### Option 1: Provide your own prompts
<a name="w2aac15c25c17c15b5b7b1"></a>

Collect your prompts and store them in `.jsonl` file format. You can upload custom datasets in JSONL format or select existing datasets from Amazon S3. Each record in the JSONL must use the OpenAI chat completion format in the following structure:
+ `messages`: This field contains the input prompt provided to the model as a list of messages, each with a `system`, `user`, or `assistant` role.
+ `reference_answer`: This field contains the expected output or evaluation criteria that your reward function uses to score the model's response. It is not limited to structured outputs; it can contain any format that helps your reward function evaluate quality.
+ [Optional] You can add fields that your grader Lambda uses for grading.

**Requirements:**
+ JSONL format with prompts in OpenAI chat completion format (one prompt per line)
+ A minimum of 100 records in training dataset
+ Amazon Bedrock automatically validates training dataset format
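Although Amazon Bedrock validates your dataset when you submit a job, you can catch problems earlier with a local sanity check. The following sketch checks each JSONL record against the structure described above; the field names follow that structure, and the helper itself is illustrative:

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}

def validate_record(line):
    """Return a list of problems found in one JSONL record (empty if OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    problems = []
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("missing or empty 'messages' field")
    else:
        for msg in messages:
            if msg.get("role") not in ALLOWED_ROLES:
                problems.append(f"unexpected role: {msg.get('role')!r}")
            if "content" not in msg:
                problems.append("message missing 'content'")
    if "reference_answer" not in record:
        problems.append("missing 'reference_answer' field")
    return problems

def validate_file(path, minimum=100):
    """Check every non-empty line of a JSONL file; return {line_no: problems}."""
    with open(path, encoding="utf-8") as f:
        lines = [ln for ln in f if ln.strip()]
    if len(lines) < minimum:
        print(f"warning: only {len(lines)} records; minimum is {minimum}")
    issues = {i: validate_record(ln) for i, ln in enumerate(lines, 1)}
    return {i: p for i, p in issues.items() if p}
```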

------
#### [ Example: General question-answering ]

```
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant"
    },
    {
      "role": "user",
      "content": "What is machine learning?"
    }
  ],
  "reference_answer": "Machine learning is a subset of artificial intelligence that enables computers to learn and make decisions from data without being explicitly programmed."
}
```

------
#### [ Example: Math problem ]

```
{
  "id": "sample-001",
  "messages": [
    {
      "role": "system",
      "content": "You are a math tutor"
    },
    {
      "role": "user",
      "content": "Solve: 2x + 5 = 13"
    }
  ],
  "reference_answer": {
    "solution": "x = 4",
    "steps": ["2x = 13 - 5", "2x = 8", "x = 4"]
  }
}
```

------

### Option 2: Use invocation logs
<a name="w2aac15c25c17c15b5b7b3"></a>

When you create a reinforcement fine-tuning job, you can have Amazon Bedrock use existing invocation logs from your S3 bucket as training data. For Amazon Bedrock, an invocation log is a detailed record of model invocations.

You can use Invoke and Converse API invocation logs that are stored on the customer side in Amazon S3 for training.

**Requirements:**
+ API logging must be enabled for your Amazon Bedrock usage
+ Logs must be in a supported format (Amazon Bedrock Invoke/Converse API)
+ A minimum of 100 prompt examples

To use invocation logs for reinforcement fine-tuning, turn on model invocation logging, use one of the model invocation operations, and make sure that you've set up an Amazon S3 bucket as the destination for the logs. For more information about setting up invocation logs, see [Monitor model invocation using CloudWatch Logs and Amazon S3](https://docs.aws.amazon.com/bedrock/latest/userguide/model-invocation-logging.html).

Before you can start the reinforcement fine-tuning job with invocation logs from an S3 bucket as input, you must provide Amazon Bedrock permissions to access the logs from an S3 Bucket. For more information, see [Model customization access and security](custom-model-job-access-security.md).

You can optionally add request metadata to the prompt-response pairs in the invocation log using one of the model invocation operations and then later use it to filter the logs. Amazon Bedrock can use the filtered logs to fine-tune the model.

#### Add request metadata to prompts and responses in your invocation logs
<a name="rft-request-metadata"></a>

With invocation logs, you can identify the training dataset prompts that you want Amazon Bedrock to use for reinforcement fine-tuning using request metadata attached to invocation logs. 

The following is an example of an invocation log from an `InvokeModel` call with the `requestMetadata`:

```
{
    "schemaType": "ModelInvocationLog",
    "schemaVersion": "1.0",
    "timestamp": "2024-09-11T17:16:18Z",
    "accountId": "XXXXX",
    "identity": {
        "arn": "arn:aws:sts::XXXXXXX:assumed-role/Admin/XXXXXX"
    },
    "region": "us-east-1",
    "requestId": "7ee514f3-9eff-467c-af75-8fd564ce58e5",
    "operation": "InvokeModel",
    "modelId": "meta.llama3-1-405b-instruct-v1:0",
    "input": {
        "inputContentType": "application/json",
        "inputBodyJson": {
            "prompt": "story of two dogs",
            "max_tokens_to_sample": 300
        },
        "inputTokenCount": 13
    },
    "output": {
        "outputContentType": "application/json",
        "outputBodyJson": {
            "type": "completion",
            "completion": " Here is a story about two dogs:\n\nRex and Charlie were best friends who lived next door to each other. Every morning, their owners would let them out into the backyard to play. Rex was a golden retriever with long, shaggy fur. He loved to run and chase sticks. Charlie was a corgi with short legs and a big smile. He was happy just sniffing the grass and rolling around. \n\nThe two dogs spent every day playing together. Rex would grab a stick and bring it over to Charlie, wagging his tail excitedly. Charlie would take the other end of the stick in his mouth and they'd have a friendly game of tug-of-war. After tiring themselves out, they'd flop down in the warm sunshine for a nap. \n\nAt lunchtime, their owners would call them in for food. Rex would gobble up his kibble hungrily while Charlie ate his at a more leisurely pace. After lunch, it was right back outside for more playtime. The afternoon sunbeams would light up their fur as they chased each other around the yard. \n\nWhen it started getting late, their owners called them in for the night. Rex and Charlie would head inside, tired but happy after another fun day of play. After slurping up fresh water from their bowls, they'd curl up on their beds, Rex's fluffy golden tail tucked over his nose and little",
            "stop_reason": "max_tokens",
            "stop": null
        },
        "outputTokenCount": 300
    },
    "requestMetadata": {
        "project": "CustomerService",
        "intent": "ComplaintResolution",
        "priority": "High"
    }
}
```

You can specify the invocation log as your input data source when you start a reinforcement fine-tuning job. You can start a reinforcement fine-tuning job through the Amazon Bedrock console, using the API, AWS CLI, or SDK.

##### Requirements for providing request metadata
<a name="rft-metadata-requirements"></a>

The request metadata must meet the following requirements:
+ Provided in the JSON `key:value` format.
+ Each key and each value must be a string of 256 characters maximum.
+ Provide a maximum of 16 key-value pairs.
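A quick pre-flight check of these limits can be sketched as follows. The helper is illustrative, not a Bedrock API; the metadata values match the invocation log example above:

```python
def check_request_metadata(metadata):
    """Validate request metadata against the limits listed above:
    string keys and values, at most 256 characters each, at most 16 pairs."""
    if len(metadata) > 16:
        raise ValueError("at most 16 key-value pairs are allowed")
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise ValueError("keys and values must be strings")
        if len(key) > 256 or len(value) > 256:
            raise ValueError("keys and values are limited to 256 characters")
    return metadata

# Example metadata, as in the invocation log shown earlier:
check_request_metadata({
    "project": "CustomerService",
    "intent": "ComplaintResolution",
    "priority": "High",
})
```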

##### Using request metadata filters
<a name="rft-metadata-filters"></a>

Once invocation logs with request metadata are available, you can apply filters based on the request metadata to selectively choose which prompts to include for fine-tuning the model. For example, you might want to include only those with `"project": "CustomerService"` and `"priority": "High"` request metadata.

To filter the logs using multiple request metadata, use a single Boolean operator `AND` or `OR`. You cannot combine these operators. For single request metadata filtering, use the `Equals` or `Not Equals` operator.

## Characteristics of effective training data
<a name="rft-data-characteristics"></a>

Effective RFT training data requires three key characteristics:
+ **Clarity and consistency** – Use clear, unambiguous prompts with consistent formatting. Avoid contradictory labels, ambiguous instructions, or conflicting reference answers that mislead training.
+ **Diversity** – Include varied input formats, edge cases, and difficulty levels that reflect production usage patterns across different user types and scenarios.
+ **Efficient reward functions** – Design functions that execute quickly (seconds, not minutes), parallelize with AWS Lambda, and return consistent scores for cost-effective training.

## Additional properties
<a name="rft-additional-properties"></a>

The RFT data format supports custom fields beyond the core schema requirements (`messages` and `reference_answer`). This flexibility allows you to add any additional data your reward function needs for proper evaluation.

**Note**  
You don't need to configure this in your recipe. The data format inherently supports additional fields. Simply include them in your training data JSON, and they will be passed to your reward function in the `metadata` field.

**Common additional properties**
+ `task_id` – Unique identifier for tracking
+ `difficulty_level` – Problem complexity indicator
+ `domain` – Subject area or category
+ `expected_reasoning_steps` – Number of steps in solution

These additional fields are passed to your reward function during evaluation, enabling sophisticated scoring logic tailored to your specific use case.

**Examples with additional properties**

------
#### [ Chemistry problem ]

```
{
  "id": "chem-001",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful chemistry assistant"
    },
    {
      "role": "user",
      "content": "Predict hydrogen bond donors and acceptors for this SMILES: CCN(CC)CCC(=O)c1sc(N)nc1C"
    }
  ],
  "reference_answer": {
    "donor_bond_counts": 2,
    "acceptor_bond_counts": 4
  }
}
```

The `reference_answer` field contains the expected output or evaluation criteria that your reward function uses to score the model's response. It is not limited to structured outputs—it can contain any format that helps your reward function evaluate quality.

------
#### [ Math problem with metadata ]

```
{
  "messages": [
    {
      "role": "system",
      "content": "You are a math tutor"
    },
    {
      "role": "user",
      "content": "Solve: 2x + 5 = 13"
    }
  ],
  "reference_answer": {
    "solution": "x = 4",
    "steps": ["2x = 13 - 5", "2x = 8", "x = 4"]
  },
  "task_id": "algebra_001",
  "difficulty_level": "easy",
  "domain": "algebra",
  "expected_reasoning_steps": 3
}
```

------

# Setting up reward functions for Amazon Nova models
<a name="reward-functions"></a>

Reward functions evaluate response quality and provide feedback signals for model training. You can set up reward functions using custom Lambda functions or Amazon Bedrock-hosted foundation models as judges. Guided templates are available to simplify reward function creation for common tasks like instruction following and format validation. Choose the approach that matches your task requirements.

## Reinforcement Learning via Verifiable Rewards (RLVR)
<a name="rft-rlvr"></a>

RLVR optimizes models for objective tasks such as code generation or math reasoning using verifiable rule-based graders or ready-to-use templates.

You have two options for RLVR (Custom Code):

### Option 1: Use console-provided templates
<a name="w2aac15c25c17c17b5b7b1"></a>

Amazon Bedrock console provides sample templates for grader Lambda functions:
+ Mathematical reasoning with ground truth verification
+ Format validation and constraint checking
+ Generic grader Lambda template with boilerplate code

Follow the instructions in the provided template on the **Create RFT job** page in the [Amazon Bedrock console](https://console.aws.amazon.com/bedrock).

### Option 2: Bring your own Lambda function
<a name="w2aac15c25c17c17b5b7b3"></a>

Create custom reward functions using your own Lambda ARN for complex logic, external APIs, multi-step calculations, or combining multiple evaluation criteria.

**Note**  
If you bring your own Lambda function, keep the following in mind:  
+ Increase the Lambda timeout from the default 3 seconds up to the 15-minute maximum for complex evaluations.
+ The Lambda execution role needs permissions to invoke models as described in [Access and security for Amazon Nova models](rft-access-security.md).

## Reinforcement Learning via AI Feedback (RLAIF)
<a name="rft-rlaif"></a>

RLAIF optimizes models for subjective tasks such as instruction following or chatbot interactions using AI-based judges with ready-to-use templates.

**For RLAIF (Model as Judge):**
+ Select an Amazon Bedrock hosted base Model as Judge
+ Configure instructions for evaluation
+ Define evaluation criteria and scoring guidelines

Available LLM-as-Judge prompt templates in the Amazon Bedrock console:
+ Instruction following (Judge model training)
+ Summarization (Multi-turn dialogs)
+ Reasoning evaluation (CoT for specialized domains)
+ RAG faithfulness (Context-grounded Q&A)

**Note**  
The console's **Model as Judge** option automatically converts your configuration into a Lambda function during training.

## Lambda function implementation details
<a name="rft-lambda-implementation"></a>

When implementing custom Lambda reward functions, your function must accept and return data in the following format.

------
#### [ Input structure ]

```
[{
  "id": "123",
  "messages": [
    {
      "role": "user",
      "content": "Do you have a dedicated security team?"
    },
    {
      "role": "assistant",
      "content": "As an AI developed by Amazon, I do not have a dedicated security team..."
    }
  ],
  "metadata": {
    "reference_answer": {
      "compliant": "No",
      "explanation": "As an AI developed by Company, I do not have a traditional security team..."
    },
    "my_key": "sample-001"
  }
}]
```

------
#### [ Output structure ]

```
[{
  "id": "123",
  "aggregate_reward_score": 0.85,
  "metrics_list": [
    {
      "name": "accuracy",
      "value": 0.9,
      "type": "Reward"
    },
    {
      "name": "policy_compliance",
      "value": 0.8,
      "type": "Metric"
    }
  ]
}]
```

------
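Putting the two structures together, a minimal grader that implements this contract might look like the following sketch. The exact-match scoring is an illustrative assumption; real graders implement task-specific logic:

```python
def lambda_handler(event, context):
    """Sketch of a grader Lambda following the input/output structures above.
    Scores each record 1.0 if the model's last assistant message exactly
    matches metadata['reference_answer'], otherwise 0.0 (illustrative only)."""
    results = []
    for record in event:
        # The last assistant turn is the model response being graded.
        response = next(
            (m["content"] for m in reversed(record["messages"])
             if m["role"] == "assistant"),
            "",
        )
        reference = record.get("metadata", {}).get("reference_answer", "")
        score = 1.0 if str(response).strip() == str(reference).strip() else 0.0
        results.append({
            "id": record["id"],
            "aggregate_reward_score": score,
            "metrics_list": [
                {"name": "exact_match", "value": score, "type": "Reward"},
            ],
        })
    return results
```

Any additional fields from your training data arrive in `metadata`, so you can branch your scoring logic on values such as `difficulty_level` or `domain`.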

**Design guidelines**
+ **Rank responses** – Give the best answer a clearly higher score
+ **Use consistent checks** – Evaluate task completion, format adherence, safety, and reasonable length
+ **Maintain stable scaling** – Keep scores normalized and non-exploitable

# Create and manage fine-tuning jobs for Amazon Nova models
<a name="rft-submit-job"></a>

You can create a reinforcement fine-tuning (RFT) job using the Amazon Bedrock console or API. The RFT job can take a few hours depending on the size of your training data, number of epochs, and complexity of your reward functions.

## Prerequisites
<a name="rft-prerequisites"></a>
+ Create an IAM service role with the required permissions. For comprehensive security and permissions information including RFT-specific permissions, see [Access and security for Amazon Nova models](rft-access-security.md).
+ (Optional) Encrypt input and output data, your RFT job, or inference requests made to custom models. For more information, see [ Encryption of custom models](https://docs.aws.amazon.com/bedrock/latest/userguide/encryption-custom-job.html).

## Create your RFT job
<a name="rft-submit-job-how-to"></a>

Choose the tab for your preferred method, and then follow the steps:

------
#### [ Console ]

To submit an RFT job in the console, carry out the following steps:

1. Open the Amazon Bedrock console and navigate to **Custom models** under **Tune**.

1. Choose **Create**, then **Create reinforcement fine-tuning job**.

1. In the **Model details** section, choose **Amazon Nova 2 Lite** as your base model.

1. In the **Customization details** section, enter the customization name.

1. In the **Training data** section, choose your data source. Select from your available invocation logs stored in Amazon S3, choose the Amazon S3 location of your training dataset file, or upload a file directly from your device.
**Note**  
Your training dataset should be in the OpenAI Chat Completions data format. If you provide invocation logs in the Amazon Bedrock invoke or converse format, Amazon Bedrock automatically converts them to the Chat Completions format.

1. In the **Reward function** section, set up your reward mechanism:
   + **Model as judge (RLAIF)** - Select a Bedrock hosted base model as judge and configure the instructions for evaluation. Use this for subjective tasks like content moderation.
**Note**  
The console's **Model as judge** option automatically converts your configuration into a Lambda function during training.
   + **Custom code (RLVR)** - Create custom reward functions using Python code executed through Lambda functions. Use this for objective tasks like code generation.

   For more information, see [Setting up reward functions for Amazon Nova models](reward-functions.md).

1. (Optional) In the **Hyperparameters** section, adjust training parameters or use default values.

1. In the **Output data** section, enter the Amazon S3 location where Amazon Bedrock should save job outputs.

1. In the **Role configuration** section, either choose an existing role from the dropdown list or enter a name for the service role to create.

1. (Optional) In the **Additional configuration** section, configure the validation data by pointing to an Amazon S3 bucket, KMS encryption settings, and job and model tags.

1. Choose **Create reinforcement fine-tuning job** to begin the job.

------
#### [ API ]

Send a `CreateModelCustomizationJob` request with `customizationType` set to `REINFORCEMENT_FINE_TUNING`.

**Required fields:** `roleArn`, `baseModelIdentifier`, `customModelName`, `jobName`, `trainingDataConfig`, `outputDataConfig`, and `customizationConfig` (containing `rftConfig`)

**Example request:**

```
{
    "roleArn": "arn:aws:iam::123456789012:role/BedrockRFTRole",
    "baseModelIdentifier": "amazon.nova-2-lite-v1:0:256k",
    "customModelName": "my-rft-model",
    "jobName": "my-rft-job",
    "customizationType": "REINFORCEMENT_FINE_TUNING",
    "trainingDataConfig": {
        "s3Uri": "s3://my-bucket/training-data.jsonl"
    },
    "customizationConfig": {
        "rftConfig" : {
            "graderConfig": {
                "lambdaGrader": {
                    "lambdaArn": "arn:aws:lambda:us-east-1:123456789012:function:function-name"
                }
            },
            "hyperParameters": {
                "batchSize": 64,
                "epochCount": 2,
                "evalInterval": 10,
                "inferenceMaxTokens": 8192,
                "learningRate": 0.00001,
                "maxPromptLength": 4096,
                "reasoningEffort": "high",
                "trainingSamplePerPrompt": 4
            }
        }
    },
    "outputDataConfig": {
        "s3Uri": "s3://my-bucket/rft-output/"
    }
}
```

**Python API sample request:**

```
import boto3

bedrock = boto3.client(service_name='bedrock')
    
# Set parameters
customizationType = "REINFORCEMENT_FINE_TUNING"
baseModelIdentifier = "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-2-lite-v1:0:256k"
roleArn = "${your-customization-role-arn}"
jobName = "MyFineTuningJob"
customModelName = "MyCustomModel"

customizationConfig = {
    'rftConfig' : {
        'graderConfig': {
            'lambdaGrader': {
                'lambdaArn': 'arn:aws:lambda:us-east-1:123456789012:function:function-name'
            }
        },
        'hyperParameters': {
            'batchSize': 64,
            'epochCount': 2,
            'evalInterval': 10,
            'inferenceMaxTokens': 8192,
        'learningRate': 0.00001,
        'maxPromptLength': 4096,
        'reasoningEffort': 'high',
        'trainingSamplePerPrompt': 4
        }
    }
}

trainingDataConfig = {"s3Uri": "s3://${training-bucket}/myInputData/train.jsonl"}
outputDataConfig = {"s3Uri": "s3://${output-bucket}/myOutputData"}

# Create job
response_ft = bedrock.create_model_customization_job(
    jobName=jobName, 
    customModelName=customModelName,
    roleArn=roleArn,
    baseModelIdentifier=baseModelIdentifier,
    customizationConfig=customizationConfig,
    trainingDataConfig=trainingDataConfig,
    outputDataConfig=outputDataConfig,
    customizationType=customizationType
)

jobArn = response_ft['jobArn']
```

------

## Monitor your RFT training job
<a name="rft-monitor-job"></a>

Amazon Bedrock provides real-time monitoring with visual graphs and metrics during RFT training. These metrics help you understand whether the model converges properly and if the reward function effectively guides the learning process.

### Job status tracking
<a name="rft-job-status"></a>

You can monitor your RFT job status through the validation and training phases in the Amazon Bedrock console.

**Completion indicators:**
+ Job status changes to **Completed** when training completes successfully
+ Custom model ARN becomes available for deployment
+ Training metrics reach convergence thresholds
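
You can also track these indicators programmatically by polling the job with the `GetModelCustomizationJob` API. The following sketch assumes a `jobArn` value returned from job creation, as in the earlier Python example:

```python
import time

def wait_for_job(bedrock, job_arn, poll_seconds=60):
    """Poll a customization job until it reaches a terminal state."""
    terminal = {"Completed", "Failed", "Stopped"}
    while True:
        job = bedrock.get_model_customization_job(jobIdentifier=job_arn)
        status = job["status"]
        if status in terminal:
            return status
        time.sleep(poll_seconds)

# Example usage (requires boto3 and AWS credentials):
# import boto3
# status = wait_for_job(boto3.client(service_name="bedrock"), jobArn)
```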

### Real-time training metrics
<a name="rft-real-time-metrics"></a>

Amazon Bedrock provides real-time monitoring during RFT training with visual graphs displaying training and validation metrics.

#### Core training metrics
<a name="rft-core-metrics"></a>
+ **Training loss** - Measures how well the model is learning from the training data
+ **Training reward statistics** - Shows reward scores assigned by your reward functions
+ **Reward margin** - Measures the difference between good and bad response rewards
+ **Accuracy on training and validation sets** - Shows model performance on both the training and held-out data

**Detailed metric categories**
+ **Reward metrics** – `critic/rewards/mean`, `critic/rewards/max`, `critic/rewards/min` (reward distribution), and `val-score/rewards/mean@1` (validation rewards)
+ **Model behavior** – `actor/entropy` (policy variation; higher equals more exploratory)
+ **Training health** – `actor/pg_loss` (policy gradient loss), `actor/pg_clipfrac` (frequency of clipped updates), and `actor/grad_norm` (gradient magnitude)
+ **Response characteristics** – `prompt_length/mean`, `prompt_length/max`, `prompt_length/min` (input token statistics), `response_length/mean`, `response_length/max`, `response_length/min` (output token statistics), and `response/aborted_ratio` (incomplete generation rate; 0 equals all completed)
+ **Performance** – `perf/throughput` (training throughput), `perf/time_per_step` (time per training step), and `timing_per_token_ms/*` (per-token processing times)
+ **Resource usage** – `perf/max_memory_allocated_gb`, `perf/max_memory_reserved_gb` (GPU memory), and `perf/cpu_memory_used_gb` (CPU memory)

#### Training progress visualization
<a name="rft-progress-visualization"></a>

The console displays interactive graphs that update in real-time as your RFT job progresses. These visualizations can help you:
+ Track convergence toward optimal performance
+ Identify potential training issues early
+ Determine optimal stopping points
+ Compare performance across different epochs

## Set up inference
<a name="rft-setup-inference"></a>

After job completion, deploy the RFT model for on-demand inference or use Provisioned Throughput for consistent performance. For setting up inference, see [Set up inference for a custom model](model-customization-use.md).

Use **Test in Playground** to evaluate and compare responses with the base model. For evaluating your completed RFT model, see [Evaluate your RFT model](rft-evaluate-model.md).

# Fine-tune open-weight models using OpenAI-compatible APIs
<a name="fine-tuning-openai-apis"></a>

Amazon Bedrock provides OpenAI compatible API endpoints for fine-tuning foundation models. These endpoints allow you to use familiar OpenAI SDKs and tools to create, monitor, and manage fine-tuning jobs with Amazon Bedrock models. This page highlights using these APIs for reinforcement fine-tuning.

## Key capabilities
<a name="fine-tuning-openai-key-capabilities"></a>
+ **Upload training files** – Use the Files API to upload and manage training data for fine-tuning jobs
+ **Create fine-tuning jobs** – Start fine-tuning jobs with custom training data and reward functions
+ **List and retrieve jobs** – View all fine-tuning jobs and get detailed information about specific jobs
+ **Monitor job events** – Track fine-tuning progress through detailed event logs
+ **Access checkpoints** – Retrieve intermediate model checkpoints created during training
+ **Immediate inference** – After fine-tuning completes, use the resulting fine-tuned model for on-demand inference through Amazon Bedrock's OpenAI-compatible APIs (Responses/chat completions API) without additional deployment steps
+ **Easy migration** – Compatible with existing OpenAI SDK codebases

## Reinforcement fine-tuning workflow for open-weight models
<a name="fine-tuning-openai-workflow"></a>

Before fine-tuning, ensure you have the prerequisites in place, because Amazon Bedrock needs specific permissions to create and manage the fine-tuning process. For comprehensive security and permissions information, see [Access and security for open-weight models](rft-open-weight-access-security.md).

Run reinforcement fine-tuning for open-weight models in 5 steps:

1. **Upload Training Dataset** – Use the Files API to upload prompts in required format (e.g., JSONL) with purpose "fine-tune" as the reinforcement fine-tuning training dataset. For more information, see [Prepare data for open-weight models](rft-prepare-data-open-weight.md).

1. **Configure Reward Function** – Define a grader to score model responses based on correctness, structure, tone, or other objectives using Lambda functions. For more information, see [Setting up reward functions for open-weight models](reward-functions-open-weight.md).

1. **Create Fine-tuning Job** – Launch the reinforcement fine-tuning job using the OpenAI-compatible API by specifying base model, dataset, reward function, and other optional settings such as hyperparameters. For more information, see [Create fine-tuning job](fine-tuning-openai-job-create.md#fine-tuning-openai-create-job).

1. **Monitor Training Progress** – Track job status, events, and training metrics using the fine-tuning jobs APIs. For more information, see [List fine-tuning events](fine-tuning-openai-job-create.md#fine-tuning-openai-list-events). Access intermediate model checkpoints to evaluate performance at different training stages, see [List fine-tuning checkpoints](fine-tuning-openai-job-create.md#fine-tuning-openai-list-checkpoints).

1. **Run Inference** – Use the fine-tuned model ID directly for inference through Amazon Bedrock's OpenAI-compatible Responses or Chat Completions APIs. For more information, see [Run inference with fine-tuned model](fine-tuning-openai-job-create.md#fine-tuning-openai-inference).

## Supported regions and endpoints
<a name="fine-tuning-openai-supported-regions"></a>

The following table shows the foundation models and regions that support OpenAI compatible fine-tuning APIs:


**Supported models and regions for OpenAI compatible fine-tuning APIs**  

| Provider | Model | Model ID | Region name | Region | Endpoint | 
| --- | --- | --- | --- | --- | --- | 
| OpenAI | gpt-oss-20b | openai.gpt-oss-20b | US West (Oregon) | us-west-2 | bedrock-mantle.us-west-2.api.aws | 
| Qwen | Qwen3 32B | qwen.qwen3-32b | US West (Oregon) | us-west-2 | bedrock-mantle.us-west-2.api.aws | 

# Access and security for open-weight models
<a name="rft-open-weight-access-security"></a>

Before you begin reinforcement fine-tuning (RFT), make sure that you understand what kind of access Amazon Bedrock needs for RFT-specific operations. RFT requires additional permissions beyond standard fine-tuning due to its reward function execution capabilities.

## Prerequisites
<a name="fine-tuning-openai-prereq"></a>

Before using Amazon Bedrock's OpenAI-compatible fine-tuning APIs, ensure you have the following:

1. An AWS account with appropriate permissions to access Amazon Bedrock

1. **Authentication** – You can authenticate using:
   + Amazon Bedrock API key (required for OpenAI SDK and available for HTTP requests)
   + AWS credentials (supported for HTTP requests)
**Note**  
If you are using Amazon Bedrock short-term or long-term API keys, make sure that your role has the following IAM policy permissions: `AmazonBedrockMantleFullAccess` and [AWSLambdaRole](rft-open-weight-access-security.md#openai-fine-tuning-lambda-permissions).

1. **OpenAI SDK (optional)** – Install the OpenAI Python SDK if using SDK-based requests.

1. **Environment variables** – Set the following environment variables:
   + `OPENAI_API_KEY` – Set to your Amazon Bedrock API key
   + `OPENAI_BASE_URL` – Set to the Amazon Bedrock endpoint for your region (for example, `https://bedrock-mantle.us-west-2.api.aws/v1`)

   For more information, see [Responses API](bedrock-mantle.md#bedrock-mantle-responses).

1. **Training data** formatted as JSONL files with the purpose `fine-tune`. For more information, see [Prepare data for open-weight models](rft-prepare-data-open-weight.md).
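
For example, the environment variables from the prerequisites above can be set in a shell session as follows (the API key value is a placeholder for your own key):

```shell
# Replace with your own Amazon Bedrock API key
export OPENAI_API_KEY="your-bedrock-api-key"

# Amazon Bedrock OpenAI-compatible endpoint for your Region
export OPENAI_BASE_URL="https://bedrock-mantle.us-west-2.api.aws/v1"
```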

## Lambda permissions for reward functions
<a name="openai-fine-tuning-lambda-permissions"></a>

You must add Lambda invocation permissions to the role that runs your fine-tuning job. The following example policy grants the required permission:

```
{
    "Version": "2012-10-17",		 	 	 
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lambda:InvokeFunction"
            ],
            "Resource": [
                "arn:aws:lambda:*:*:function:reward-function-name"
            ]
        }
    ]
}
```

You can also use Amazon Bedrock hosted foundation models as judges when setting up reward functions. To do this, add permissions to invoke foundation models to the Lambda execution role. You can attach a managed policy such as [AmazonBedrockLimitedAccess](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonBedrockLimitedAccess.html) to the role, or use a scoped-down inline policy.

The following example policy allows the Lambda function to invoke Amazon Bedrock foundation models as judges through the Invoke API:

```
{
    "Version": "2012-10-17",		 	 	 
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel"
            ],
            "Resource": [
                "arn:aws:bedrock:*:*:foundation-model/*"
            ]
        }
    ]
}
```
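
As a sketch of how a Lambda reward function might call a Bedrock model as a judge, the following hypothetical helper uses the Converse API and parses a numeric score from the judge's reply. The prompt wording and score parsing are illustrative assumptions, not a prescribed format:

```python
def judge_with_bedrock(runtime, model_id, question, answer):
    """Ask a Bedrock foundation model to score an answer from 0.0 to 1.0.

    'runtime' is a bedrock-runtime client, for example
    boto3.client("bedrock-runtime"); calling it requires the
    bedrock:InvokeModel permission shown above.
    """
    resp = runtime.converse(
        modelId=model_id,
        messages=[{
            "role": "user",
            "content": [{"text": (
                "Score the following answer to the question from 0.0 to 1.0. "
                "Reply with only the number.\n\n"
                f"Question: {question}\nAnswer: {answer}"
            )}],
        }],
    )
    text = resp["output"]["message"]["content"][0]["text"].strip()
    try:
        # Clamp to the valid range in case the judge returns an out-of-range value
        return max(0.0, min(1.0, float(text)))
    except ValueError:
        return 0.0  # Unparseable judge output; treat as the lowest score
```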

# Prepare data for open-weight models
<a name="rft-prepare-data-open-weight"></a>

When you fine-tune open-weight models with reinforcement fine-tuning using OpenAI-compatible APIs, provide training data by bringing your own prompts in JSONL format with the purpose `fine-tune`.

## Training data format and requirements
<a name="rft-data-format-open-weight"></a>

Training data must follow the OpenAI chat completions format and contain between 100 and 20,000 examples. Each training example contains:
+ `messages`: The input prompt provided to the model, as a list of messages with `system`, `user`, or `assistant` roles.
+ `reference_answer`: The expected output or evaluation criteria that your reward function uses to score the model's response. It is not limited to structured outputs—it can contain any format that helps your reward function evaluate quality.
+ (Optional) Additional fields that your grader Lambda function uses during grading.

**Requirements:**
+ JSONL format with prompts in OpenAI chat completion format (one prompt per line)
+ Purpose must be set to `fine-tune`
+ A minimum of 100 records in training dataset
+ Amazon Bedrock automatically validates training dataset format
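
For example, a training file in this format can be assembled from Python dictionaries and written as JSONL; the record below is illustrative:

```python
import json

examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant"},
            {"role": "user", "content": "What is machine learning?"},
        ],
        "reference_answer": "Machine learning is a subset of artificial "
                            "intelligence that learns patterns from data.",
    },
]

# Write one JSON object per line (JSONL)
with open("training_data.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```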

------
#### [ Example: General question-answering ]

```
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant"
    },
    {
      "role": "user",
      "content": "What is machine learning?"
    }
  ],
  "reference_answer": "Machine learning is a subset of artificial intelligence that enables computers to learn and make decisions from data without being explicitly programmed."
}
```

------
#### [ Example: Math problem ]

```
{
  "id": "sample-001",
  "messages": [
    {
      "role": "system",
      "content": "You are a math tutor"
    },
    {
      "role": "user",
      "content": "Solve: 2x + 5 = 13"
    }
  ],
  "reference_answer": {
    "solution": "x = 4",
    "steps": ["2x = 13 - 5", "2x = 8", "x = 4"]
  }
}
```

------

## Files API
<a name="fine-tuning-openai-files-api"></a>

You can use the OpenAI-compatible Files API to upload your training data for fine-tuning jobs. Files are stored securely in Amazon Bedrock and are used when creating fine-tuning jobs. For complete API details, see the [OpenAI Files documentation](https://platform.openai.com/docs/api-reference/files).

### Upload training file
<a name="fine-tuning-openai-upload-file"></a>

To upload a training file, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

```
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
from openai import OpenAI
client = OpenAI()

# Upload training file
with open(TRAINING_FILE_PATH, 'rb') as f:
    file_response = client.files.create(
        file=f,
        purpose='fine-tune'
    )

# Store file ID for next steps
training_file_id = file_response.id
print(f"✅ Training file uploaded successfully: {training_file_id}")
```

------
#### [ HTTP request ]

Make a POST request to `/v1/files`:

```
curl https://bedrock-mantle.us-west-2.api.aws/v1/files \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F purpose="fine-tune" \
  -F file="@training_data.jsonl"
```

------

### Retrieve file details
<a name="fine-tuning-openai-retrieve-file"></a>

To retrieve details about a specific file, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

```
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
import json

from openai import OpenAI
client = OpenAI()

# Retrieve file details
file_details = client.files.retrieve(training_file_id)

# Print raw response
print(json.dumps(file_details.model_dump(), indent=2))
```

------
#### [ HTTP request ]

Make a GET request to `/v1/files/{file_id}`:

```
curl https://bedrock-mantle.us-west-2.api.aws/v1/files/file-abc123 \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```

------

### List files
<a name="fine-tuning-openai-list-files"></a>

To list uploaded files, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

```
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
import json

from openai import OpenAI
client = OpenAI()

# List files
files_response = client.files.list(purpose='fine-tune')

# Print raw response
print(json.dumps(files_response.model_dump(), indent=2))
```

------
#### [ HTTP request ]

Make a GET request to `/v1/files`:

```
curl "https://bedrock-mantle.us-west-2.api.aws/v1/files?purpose=fine-tune" \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```

------

### Delete file
<a name="fine-tuning-openai-delete-file"></a>

To delete a file, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

```
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
from openai import OpenAI
client = OpenAI()

# Delete file
delete_response = client.files.delete(training_file_id)
```

------
#### [ HTTP request ]

Make a DELETE request to `/v1/files/{file_id}`:

```
curl -X DELETE https://bedrock-mantle.us-west-2.api.aws/v1/files/file-abc123 \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```

------

## Characteristics of effective training data
<a name="rft-data-characteristics-open-weight"></a>

Effective RFT training data requires three key characteristics:
+ **Clarity and consistency** – Use clear, unambiguous prompts with consistent formatting. Avoid contradictory labels, ambiguous instructions, or conflicting reference answers that mislead training.
+ **Diversity** – Include varied input formats, edge cases, and difficulty levels that reflect production usage patterns across different user types and scenarios.
+ **Efficient reward functions** – Design functions that execute quickly (seconds, not minutes), parallelize with AWS Lambda, and return consistent scores for cost-effective training.

## Additional properties
<a name="rft-additional-properties-open-weight"></a>

The RFT data format supports custom fields beyond the core schema requirements (`messages` and `reference_answer`). This flexibility allows you to add any additional data your reward function needs for proper evaluation.

**Note**  
You don't need to configure this in your recipe. The data format inherently supports additional fields. Simply include them in your training data JSON, and they will be passed to your reward function in the `metadata` field.

**Common additional properties**
+ `task_id` – Unique identifier for tracking
+ `difficulty_level` – Problem complexity indicator
+ `domain` – Subject area or category
+ `expected_reasoning_steps` – Number of steps in solution

These additional fields are passed to your reward function during evaluation, enabling sophisticated scoring logic tailored to your specific use case.

**Examples with additional properties**

------
#### [ Chemistry problem ]

```
{
  "id": "chem-001",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful chemistry assistant"
    },
    {
      "role": "user",
      "content": "Predict hydrogen bond donors and acceptors for this SMILES: CCN(CC)CCC(=O)c1sc(N)nc1C"
    }
  ],
  "reference_answer": {
    "donor_bond_counts": 2,
    "acceptor_bond_counts": 4
  }
}
```

The `reference_answer` field contains the expected output or evaluation criteria that your reward function uses to score the model's response. It is not limited to structured outputs—it can contain any format that helps your reward function evaluate quality.

------
#### [ Math problem with metadata ]

```
{
  "messages": [
    {
      "role": "system",
      "content": "You are a math tutor"
    },
    {
      "role": "user",
      "content": "Solve: 2x + 5 = 13"
    }
  ],
  "reference_answer": {
    "solution": "x = 4",
    "steps": ["2x = 13 - 5", "2x = 8", "x = 4"]
  },
  "task_id": "algebra_001",
  "difficulty_level": "easy",
  "domain": "algebra",
  "expected_reasoning_steps": 3
}
```

------

# Setting up reward functions for open-weight models
<a name="reward-functions-open-weight"></a>

Reward functions evaluate response quality and provide feedback signals for model training. You can set up reward functions using custom Lambda functions. Choose the approach that matches your task requirements.

## Custom Lambda functions for reward evaluation
<a name="rft-custom-lambda-functions"></a>

Within your Lambda function, you have flexibility in how you implement the evaluation logic:
+ **Objective tasks** – For objective tasks like code generation or math reasoning, use verifiable rule-based graders that check correctness against known standards or test cases.
+ **Subjective tasks** – For subjective tasks like instruction following or chatbot interactions, call Amazon Bedrock foundation models as judges within your Lambda function to evaluate response quality based on your criteria.

Your Lambda function can implement complex logic, integrate external APIs, perform multi-step calculations, or combine multiple evaluation criteria depending on your task requirements.

**Note**  
When using custom Lambda functions:  
Increase the Lambda timeout from the default of 3 seconds, up to the maximum of 15 minutes, for complex evaluations.
The role that runs your fine-tuning job needs permission to invoke the Lambda function, as described in [Lambda permissions for reward functions](rft-open-weight-access-security.md#openai-fine-tuning-lambda-permissions).

## Lambda function implementation details
<a name="rft-lambda-implementation-open-weight"></a>

When implementing custom Lambda reward functions, your function must accept and return data in the following format.

------
#### [ Input structure ]

```
[{
  "id": "123",
  "messages": [
    {
      "role": "user",
      "content": "Do you have a dedicated security team?"
    },
    {
      "role": "assistant",
      "content": "As an AI developed by Amazon, I don not have a dedicated security team..."
    }
  ],
  "metadata": {
    "reference_answer": {
      "compliant": "No",
      "explanation": "As an AI developed by Company, I do not have a traditional security team..."
    },
    "my_key": "sample-001"
  }
}]
```

------
#### [ Output structure ]

```
[{
  "id": "123",
  "aggregate_reward_score": 0.85,
  "metrics_list": [
    {
      "name": "accuracy",
      "value": 0.9,
      "type": "Reward"
    },
    {
      "name": "policy_compliance",
      "value": 0.8,
      "type": "Metric"
    }
  ]
}]
```

------

**Design guidelines**
+ **Rank responses** – Give the best answer a clearly higher score
+ **Use consistent checks** – Evaluate task completion, format adherence, safety, and reasonable length
+ **Maintain stable scaling** – Keep scores normalized and non-exploitable
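
A minimal Lambda handler that satisfies this input/output contract might look like the following sketch. The exact-match scoring is purely illustrative; a real reward function would implement evaluation logic suited to its task:

```python
def lambda_handler(event, context):
    """Grade a batch of samples; 'event' is a list in the input structure above."""
    results = []
    for sample in event:
        reference = sample.get("metadata", {}).get("reference_answer")
        # The last assistant turn is the model response to grade
        response = next(
            (m["content"] for m in reversed(sample["messages"])
             if m["role"] == "assistant"),
            "",
        )
        # Illustrative scoring: 1.0 if the reference text appears in the response
        score = 1.0 if reference and str(reference) in response else 0.0
        results.append({
            "id": sample["id"],
            "aggregate_reward_score": score,
            "metrics_list": [
                {"name": "exact_match", "value": score, "type": "Reward"},
            ],
        })
    return results
```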

# Create and manage fine-tuning jobs for open-weight models using OpenAI APIs
<a name="fine-tuning-openai-job-create"></a>

The OpenAI-compatible fine-tuning job APIs allow you to create, monitor, and manage fine-tuning jobs. This page highlights using these APIs for reinforcement fine-tuning. For complete API details, see the [OpenAI Fine-tuning documentation](https://platform.openai.com/docs/api-reference/fine-tuning).

## Create fine-tuning job
<a name="fine-tuning-openai-create-job"></a>

Creates a fine-tuning job that begins the process of creating a new model from a given dataset. For complete API details, see the [OpenAI Create fine-tuning jobs documentation](https://developers.openai.com/api/reference/resources/fine_tuning/subresources/jobs/methods/create).

### Examples
<a name="fine-tuning-openai-create-job-examples"></a>

To create a fine-tuning job with RFT method, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

```
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
from openai import OpenAI
client = OpenAI()

# Create fine-tuning job with RFT method
job_response = client.fine_tuning.jobs.create(
    model=MODEL_ID,
    training_file=training_file_id,
    # The suffix parameter is not supported for these jobs, so it is omitted here
    extra_body={
        "method": {
            "type": "reinforcement", 
            "reinforcement": {
                "grader": {
                    "type": "lambda",
                    "lambda": {
                        "function": "arn:aws:lambda:us-west-2:123456789012:function:my-reward-function"  # Replace with your Lambda ARN
                    }
                },
                "hyperparameters": {
                    "n_epochs": 1,  # Number of training epochs
                    "batch_size": 4,  # Batch size
                    "learning_rate_multiplier": 1.0  # Learning rate multiplier
                }
            }
        }
    }
)

# Store job ID for next steps
job_id = job_response.id
print(f"Fine-tuning job created: {job_id}")
```

------
#### [ HTTP request ]

Make a POST request to `/v1/fine_tuning/jobs`:

```
curl https://bedrock-mantle.us-west-2.api.aws/v1/fine_tuning/jobs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "training_file": "file-abc123",
    "model": "gpt-4o-mini",
    "method": {
      "type": "reinforcement",
      "reinforcement": {
        "grader": {
          "type": "lambda",
          "lambda": {
            "function": "arn:aws:lambda:us-west-2:123456789012:function:my-grader"
          }
        },
        "hyperparameters": {
          "n_epochs": 1,
          "batch_size": 4,
          "learning_rate_multiplier": 1.0
        }
      }
    }
  }'
```

------

## List fine-tuning events
<a name="fine-tuning-openai-list-events"></a>

Lists events for a fine-tuning job. Fine-tuning events provide detailed information about the progress of your job, including training metrics, checkpoint creation, and error messages. For complete API details, see the [OpenAI List fine-tuning events documentation](https://developers.openai.com/api/reference/resources/fine_tuning/subresources/jobs/methods/list_events).

### Examples
<a name="fine-tuning-openai-list-events-examples"></a>

To list fine-tuning events, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

```
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
from openai import OpenAI
client = OpenAI()

# List fine-tuning events
events = client.fine_tuning.jobs.list_events(
    fine_tuning_job_id="ftjob-abc123",
    limit=50
)

for event in events.data:
    print(f"[{event.created_at}] {event.level}: {event.message}")
    if event.data:
        print(f"  Metrics: {event.data}")
```

------
#### [ HTTP request ]

Make a GET request to `/v1/fine_tuning/jobs/{fine_tuning_job_id}/events`:

```
curl "https://bedrock-mantle.us-west-2.api.aws/v1/fine_tuning/jobs/ftjob-abc123/events?limit=50" \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```

------

Events include information such as:
+ Training started and completed messages
+ Checkpoint creation notifications
+ Training metrics (loss, accuracy) at each step
+ Error messages if the job fails

To paginate through all events, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

```
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
from openai import OpenAI
client = OpenAI()

# Paginate through all events
all_events = []
after = None

while True:
    events = client.fine_tuning.jobs.list_events(
        fine_tuning_job_id="ftjob-abc123",
        limit=100,
        after=after
    )
    
    all_events.extend(events.data)
    
    if not events.has_more:
        break
    
    after = events.data[-1].id
```

------
#### [ HTTP request ]

Make multiple GET requests with the `after` parameter:

```
# First request
curl "https://bedrock-mantle.us-west-2.api.aws/v1/fine_tuning/jobs/ftjob-abc123/events?limit=100" \
  -H "Authorization: Bearer $OPENAI_API_KEY"

# Subsequent requests with 'after' parameter
curl "https://bedrock-mantle.us-west-2.api.aws/v1/fine_tuning/jobs/ftjob-abc123/events?limit=100&after=ft-event-abc123" \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```

------

## Retrieve fine-tuning job
<a name="fine-tuning-openai-retrieve-job"></a>

Get detailed information about a fine-tuning job. For complete API details, see the [OpenAI Retrieve fine-tuning job documentation](https://developers.openai.com/api/reference/resources/fine_tuning/subresources/jobs/methods/retrieve).

### Examples
<a name="fine-tuning-openai-retrieve-job-examples"></a>

To retrieve specific job details, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

```
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
import json

from openai import OpenAI
client = OpenAI()

# Retrieve specific job details
job_details = client.fine_tuning.jobs.retrieve(job_id)

# Print raw response
print(json.dumps(job_details.model_dump(), indent=2))
```

------
#### [ HTTP request ]

Make a GET request to `/v1/fine_tuning/jobs/{fine_tuning_job_id}`:

```
curl https://bedrock-mantle.us-west-2.api.aws/v1/fine_tuning/jobs/ftjob-abc123 \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```

------

## List fine-tuning jobs
<a name="fine-tuning-openai-list-jobs"></a>

Lists your organization's fine-tuning jobs with pagination support. For complete API details, see the [OpenAI List fine-tuning jobs documentation](https://developers.openai.com/api/reference/resources/fine_tuning/subresources/jobs/methods/list).

### Examples
<a name="fine-tuning-openai-list-jobs-examples"></a>

To list fine-tuning jobs with limit and pagination, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

```
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
import json

from openai import OpenAI
client = OpenAI()

# List fine-tuning jobs with limit and pagination
response = client.fine_tuning.jobs.list(
    limit=20  # Maximum number of jobs to return
)

# Print raw response
print(json.dumps(response.model_dump(), indent=2))
```

------
#### [ HTTP request ]

Make a GET request to `/v1/fine_tuning/jobs`:

```
curl "https://bedrock-mantle.us-west-2.api.aws/v1/fine_tuning/jobs?limit=20" \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```

------

## Cancel fine-tuning job
<a name="fine-tuning-openai-cancel-job"></a>

Cancels a fine-tuning job that is in progress. Once cancelled, the job cannot be resumed. For complete API details, see the [OpenAI Cancel fine-tuning job documentation](https://developers.openai.com/api/reference/resources/fine_tuning/subresources/jobs/methods/cancel).

### Examples
<a name="fine-tuning-openai-cancel-job-examples"></a>

To cancel a fine-tuning job, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

```
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
from openai import OpenAI
client = OpenAI()

# Cancel fine-tuning job
cancel_response = client.fine_tuning.jobs.cancel("ftjob-abc123")

print(f"Job ID: {cancel_response.id}")
print(f"Status: {cancel_response.status}")  # Should be "cancelled"
```

------
#### [ HTTP request ]

Make a POST request to `/v1/fine_tuning/jobs/{fine_tuning_job_id}/cancel`:

```
curl -X POST https://bedrock-mantle.us-west-2.api.aws/v1/fine_tuning/jobs/ftjob-abc123/cancel \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```

------

## List fine-tuning checkpoints
<a name="fine-tuning-openai-list-checkpoints"></a>

Lists checkpoints for a fine-tuning job. Checkpoints are intermediate model snapshots created during fine-tuning that can be used for inference to evaluate performance at different training stages. For more information, see the [OpenAI List fine-tuning checkpoints documentation](https://developers.openai.com/api/reference/resources/fine_tuning/subresources/jobs/subresources/checkpoints/methods/list).

### Examples
<a name="fine-tuning-openai-list-checkpoints-examples"></a>

To list checkpoints for a fine-tuning job, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

```
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
from openai import OpenAI
client = OpenAI()

# List checkpoints for a fine-tuning job
checkpoints = client.fine_tuning.jobs.checkpoints.list(
    fine_tuning_job_id="ftjob-abc123",
    limit=10
)

for checkpoint in checkpoints.data:
    print(f"Checkpoint ID: {checkpoint.id}")
    print(f"Step: {checkpoint.step_number}")
    print(f"Model: {checkpoint.fine_tuned_model_checkpoint}")
    print(f"Metrics: {checkpoint.metrics}")
    print("---")
```

------
#### [ HTTP request ]

Make a GET request to `/v1/fine_tuning/jobs/{fine_tuning_job_id}/checkpoints`:

```
curl "https://bedrock-mantle.us-west-2.api.aws/v1/fine_tuning/jobs/ftjob-abc123/checkpoints?limit=10" \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```

------

Each checkpoint includes:
+ **Checkpoint ID** – Unique identifier for the checkpoint
+ **Step number** – Training step at which the checkpoint was created
+ **Model checkpoint** – Model identifier that can be used for inference
+ **Metrics** – Validation loss and accuracy at this checkpoint

To use a checkpoint model for inference, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

```
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
from openai import OpenAI
client = OpenAI()

# Retrieve the job's checkpoints, then test inference with the first one listed
checkpoints = client.fine_tuning.jobs.checkpoints.list(
    fine_tuning_job_id="ftjob-abc123"
)
checkpoint = checkpoints.data[0]

# Test inference with the checkpoint model
response = client.chat.completions.create(
    model=checkpoint.fine_tuned_model_checkpoint,
    messages=[{"role": "user", "content": "What is AI?"}],
    max_tokens=100
)

print(response.choices[0].message.content)
```

------
#### [ HTTP request ]

Make a POST request to `/v1/chat/completions`:

```
curl https://bedrock-mantle.us-west-2.api.aws/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ft:gpt-4o-mini:openai:custom:7p4lURel:ckpt-step-1000",
    "messages": [{"role": "user", "content": "What is AI?"}],
    "max_tokens": 100
  }'
```

------

## Run inference with fine-tuned model
<a name="fine-tuning-openai-inference"></a>

After your fine-tuning job completes, you can use the fine-tuned model for inference through the Responses API or the Chat Completions API. For complete API details, see [Generate responses using OpenAI APIs](bedrock-mantle.md).

### Responses API
<a name="fine-tuning-openai-responses-api"></a>

Use the Responses API for single-turn text generation with your fine-tuned model:

------
#### [ OpenAI SDK (Python) ]

```
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
from openai import OpenAI
client = OpenAI()

# Get the fine-tuned model ID
job_details = client.fine_tuning.jobs.retrieve("ftjob-abc123")

if job_details.status == 'succeeded' and job_details.fine_tuned_model:
    fine_tuned_model = job_details.fine_tuned_model
    print(f"Using fine-tuned model: {fine_tuned_model}")
    
    # Run inference with Responses API
    response = client.responses.create(
        model=fine_tuned_model,
        input="What is the capital of France?",
        max_output_tokens=100,
        temperature=0.7
    )
    
    print(f"Response: {response.output_text}")
else:
    print(f"Job status: {job_details.status}")
    print("Job must be in 'succeeded' status to run inference")
```

------
#### [ HTTP request ]

Make a POST request to `/v1/responses`:

```
curl https://bedrock-mantle.us-west-2.api.aws/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "ft:gpt-4o-mini:openai:custom-model:7p4lURel",
    "input": "What is the capital of France?",
    "max_output_tokens": 100,
    "temperature": 0.7
  }'
```

------

### Chat Completions API
<a name="fine-tuning-openai-inference-examples"></a>

Use the Chat Completions API for conversational interactions with your fine-tuned model:

------
#### [ OpenAI SDK (Python) ]

```
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
from openai import OpenAI
client = OpenAI()

# Get the fine-tuned model ID
job_details = client.fine_tuning.jobs.retrieve("ftjob-abc123")

if job_details.status == 'succeeded' and job_details.fine_tuned_model:
    fine_tuned_model = job_details.fine_tuned_model
    print(f"Using fine-tuned model: {fine_tuned_model}")
    
    # Run inference
    inference_response = client.chat.completions.create(
        model=fine_tuned_model,
        messages=[
            {"role": "user", "content": "What is the capital of France?"}
        ],
        max_tokens=100
    )
    
    print(f"Response: {inference_response.choices[0].message.content}")
else:
    print(f"Job status: {job_details.status}")
    print("Job must be in 'succeeded' status to run inference")
```

------
#### [ HTTP request ]

Make a POST request to `/v1/chat/completions`:

```
curl https://bedrock-mantle.us-west-2.api.aws/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "ft:gpt-4o-mini:openai:custom-model:7p4lURel",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 100
  }'
```

------

# Evaluate your RFT model
<a name="rft-evaluate-model"></a>

After your reinforcement fine-tuning job completes successfully, you can evaluate your custom model's performance using multiple assessment methods. Amazon Bedrock provides built-in evaluation tools to help you compare your RFT model against the base model and validate improvements.

**Topics**
+ [Evaluation methods](#rft-evaluation-methods)
+ [Setting up inference for evaluation](#rft-setup-inference-evaluation)
+ [Evaluation best practices](#rft-evaluation-best-practices)

## Evaluation methods
<a name="rft-evaluation-methods"></a>

Amazon Bedrock offers several ways to assess your RFT model's performance.

### Validation metrics
<a name="rft-validation-metrics"></a>

If you upload a validation dataset, two additional graphs appear in the training metrics:
+ **Validation rewards** – Shows how well your model generalizes beyond the training examples. Scores lower than the training rewards are normal and expected.
+ **Validation episode lengths** – Average response length on unseen validation data. Shows how efficiently your model responds to new inputs compared to the training examples.
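
The gap between training and validation rewards is a quick overfitting signal. Below is a minimal sketch of computing it offline, assuming you have exported reward values at several training steps; the sample numbers are illustrative, not real training output:

```python
# Hypothetical sketch: quantify the train/validation reward gap from
# exported metrics. The reward values below are illustrative placeholders.

def reward_gap(training_rewards, validation_rewards):
    """Mean training reward minus mean validation reward.
    A small positive gap is normal; a large gap suggests overfitting."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(training_rewards) - mean(validation_rewards)

# Example: rewards sampled at several training steps
train = [0.62, 0.71, 0.78, 0.84]
val = [0.58, 0.66, 0.70, 0.72]

print(f"Reward gap: {reward_gap(train, val):.4f}")
```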

### Test in Playground
<a name="rft-test-playground"></a>

Use the Test in Playground feature for quick, ad-hoc evaluations. Before you can use this feature, you must set up inference. For more information, see [Setting up inference for evaluation](#rft-setup-inference-evaluation).

This interactive tool allows you to:
+ Test prompts directly with your RFT model
+ Compare responses side-by-side between your custom model and the base model
+ Evaluate response quality improvements in real-time
+ Experiment with different prompts to assess model capabilities
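
The same side-by-side comparison can also be scripted. The sketch below sends one prompt to several models through the Chat Completions API; the model IDs in the usage comment are placeholders, and the client is assumed to be configured as in the earlier examples:

```python
# Sketch: side-by-side comparison of base and fine-tuned model responses.
# Assumes a configured OpenAI client (OPENAI_API_KEY and OPENAI_BASE_URL
# environment variables), as in the earlier examples.

def compare_models(client, prompt, model_ids, max_tokens=100):
    """Send the same prompt to each model and return {model_id: response_text}."""
    results = {}
    for model_id in model_ids:
        completion = client.chat.completions.create(
            model=model_id,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
        )
        results[model_id] = completion.choices[0].message.content
    return results

# Usage sketch (model IDs are placeholders):
#   from openai import OpenAI
#   client = OpenAI()
#   responses = compare_models(client, "What is AI?",
#                              ["base-model-id", "ft:base-model-id:custom:7p4lURel"])
#   for model_id, text in responses.items():
#       print(f"--- {model_id} ---\n{text}\n")
```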

### Bedrock Model Evaluation
<a name="rft-model-evaluation"></a>

Use Amazon Bedrock Model Evaluation to assess your RFT model using your own datasets. This provides comprehensive performance analysis with standardized metrics and benchmarks. Benefits of Amazon Bedrock Model Evaluation include:
+ Systematic evaluation using custom test datasets
+ Quantitative performance comparisons
+ Standardized metrics for consistent assessment
+ Integration with existing Amazon Bedrock evaluation workflows
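
A custom test dataset is a JSONL file with one test case per line. The sketch below writes a minimal one; the field names shown (`prompt`, `referenceResponse`, `category`) are assumptions based on the custom prompt dataset format, so verify them against the Model Evaluation documentation for your use case:

```python
import json

# Sketch: build a small custom test dataset (JSONL) for model evaluation.
# Field names are assumptions; check the current service documentation.
test_cases = [
    {"prompt": "What is the capital of France?",
     "referenceResponse": "Paris",
     "category": "Geography"},
    {"prompt": "Solve: 12 * 8",
     "referenceResponse": "96",
     "category": "Math"},
]

with open("rft-eval-dataset.jsonl", "w") as f:
    for case in test_cases:
        f.write(json.dumps(case) + "\n")
```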

## Setting up inference for evaluation
<a name="rft-setup-inference-evaluation"></a>

Before evaluating your RFT model, set up inference using one of these options:

### On-demand inference
<a name="rft-on-demand-inference"></a>

Create a custom model on-demand deployment for flexible, pay-per-use evaluation. This option uses token-based pricing: you are charged for the number of tokens processed during inference.
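
As a rough sketch of what token-based pricing means for an evaluation run, the script below multiplies token counts by per-1,000-token rates. The prices are placeholders, not actual Amazon Bedrock rates; consult the Amazon Bedrock pricing page for current values:

```python
# Sketch: estimate on-demand inference cost from token counts.
# The per-1K-token prices below are hypothetical placeholders.
PRICE_PER_1K_INPUT = 0.0005   # USD, placeholder
PRICE_PER_1K_OUTPUT = 0.0015  # USD, placeholder

def estimate_cost(input_tokens, output_tokens):
    """Token-based cost: charged per 1,000 input and output tokens."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT + \
           (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# Example: a 10,000-prompt evaluation, ~200 input / 100 output tokens each
total = estimate_cost(10_000 * 200, 10_000 * 100)
print(f"Estimated evaluation cost: ${total:.2f}")
```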

## Evaluation best practices
<a name="rft-evaluation-best-practices"></a>
+ **Compare systematically** – Always evaluate your RFT model against the base model using the same test prompts and evaluation criteria.
+ **Use diverse test cases** – Include various prompt types and scenarios that represent your real-world use cases.
+ **Validate reward alignment** – Ensure your model improvements align with the reward functions used during training.
+ **Test edge cases** – Evaluate model behavior on challenging or unusual inputs to assess robustness.
+ **Monitor response consistency** – Check that your model provides consistent quality across multiple runs with similar prompts.
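
Response consistency can be quantified with a simple script: call your model several times with the same prompt and compute the mean pairwise similarity of the responses. A minimal sketch using Python's standard library follows; the sample responses are illustrative, and in practice you would populate the list from repeated model calls:

```python
import difflib

# Sketch: quantify response consistency across repeated runs of one prompt.
def consistency_score(responses):
    """Mean pairwise similarity (0-1) between responses, via difflib."""
    pairs = [(a, b) for i, a in enumerate(responses)
             for b in responses[i + 1:]]
    ratios = [difflib.SequenceMatcher(None, a, b).ratio() for a, b in pairs]
    return sum(ratios) / len(ratios)

# Illustrative responses from three runs of the same prompt
runs = [
    "Paris is the capital of France.",
    "The capital of France is Paris.",
    "Paris is the capital of France.",
]
print(f"Consistency: {consistency_score(runs):.2f}")
```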