

# Training for Amazon Nova models
<a name="smtj-training"></a>

Training Amazon Nova models on SageMaker Training Jobs supports Supervised Fine-Tuning (SFT) and Reinforcement Fine-Tuning (RFT). Each technique serves different customization needs and can be applied to different Amazon Nova model versions.

**Topics**
+ [Fine-tune Nova 2.0](nova-fine-tune-2.md)
+ [Reinforcement Fine-Tuning (RFT) with Amazon Nova models](nova-reinforcement-fine-tuning.md)

# Fine-tune Nova 2.0
<a name="nova-fine-tune-2"></a>

## Prerequisites
<a name="nova-model-training-jobs-prerequisites2"></a>

Before you start a training job, make sure you have the following.
+ Amazon S3 buckets to store the input data and output of your training jobs. You can use one bucket for both, or a separate bucket for each type of data. Make sure your buckets are in the same AWS Region where you create all the other resources for training. For more information, see [Creating a general purpose bucket](https://docs.aws.amazon.com//AmazonS3/latest/userguide/create-bucket-overview.html).
+ An IAM role with permissions to run a training job. Make sure you attach an IAM policy with `AmazonSageMakerFullAccess`. For more information, see [How to use SageMaker AI execution roles](https://docs.aws.amazon.com//sagemaker/latest/dg/sagemaker-roles.html).
+ Base Amazon Nova recipes. For more information, see [Getting Amazon Nova recipes](nova-model-recipes.md#nova-model-get-recipes).

## What is SFT?
<a name="nova-2-what-is-sft"></a>

Supervised fine-tuning (SFT) trains a language model using labeled input-output pairs. The model learns from demonstration examples consisting of prompts and responses, refining its capabilities to align with specific tasks, instructions, or desired behaviors.

## Data preparation
<a name="nova-2-data-preparation"></a>

### Overview
<a name="nova-2-data-overview"></a>

Nova 2.0 SFT data uses the same Converse API format as Nova 1.0, with the addition of optional reasoning content fields. For complete format specifications, see:
+ Reasoning content: [ReasoningContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ReasoningContentBlock.html)
+ Converse API schema: [Converse API](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference-call.html)
+ Dataset constraints: [Dataset constraints](https://docs.aws.amazon.com/nova/latest/userguide/fine-tune-prepare-data-understanding.html)

### Supported features
<a name="nova-2-supported-features"></a>
+ **Input types** – Text, image, or video in user content blocks
+ **Assistant content** – Text-only responses and reasoning content
+ **Dataset composition** – Must be homogeneous. Choose one of:
  + Text-only turns
  + Text + image turns
  + Text + video turns (supports document understanding)

**Important**  
You cannot mix images and videos within the same dataset or across different turns.
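
The homogeneity rule can be checked before you submit a job. The following sketch (the helper names are illustrative and not part of any AWS SDK) scans a JSONL dataset and rejects any mix of images and videos:

```python
import json

def media_types(record):
    """Collect the media modalities ("image", "video") used in one Converse record."""
    kinds = set()
    for message in record.get("messages", []):
        for block in message.get("content", []):
            for kind in ("image", "video"):
                if kind in block:
                    kinds.add(kind)
    return kinds

def check_homogeneous(jsonl_lines):
    """Return the single media modality used across the dataset (None for
    text-only), raising if images and videos are mixed in any sample or
    across samples."""
    seen = set()
    for line_no, line in enumerate(jsonl_lines, start=1):
        kinds = media_types(json.loads(line))
        if len(kinds) > 1:
            raise ValueError(f"line {line_no}: mixes {sorted(kinds)} in one sample")
        seen |= kinds
    if len(seen) > 1:
        raise ValueError(f"dataset mixes modalities: {sorted(seen)}")
    return seen.pop() if seen else None
```

Running this over your training file before upload catches modality mixing early, rather than after a job has been submitted.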

### Current limitations
<a name="nova-2-current-limitations"></a>
+ **Multimodal reasoning content** – Although the Converse format supports image-based reasoning content, Nova 2.0 SFT supports only text-based reasoning content in the `reasoningText` field.
+ **Validation sets** – You cannot provide a validation dataset for SFT with Nova 2.0. If you provide a validation dataset, it is ignored during training. This limitation applies to both UI-based and programmatic job submissions.

### Supported media formats
<a name="nova-2-supported-media"></a>
+ **Images** – PNG, JPEG, GIF
+ **Videos** – MOV, MKV, MP4

### Data format examples
<a name="nova-2-data-examples"></a>

------
#### [ Text-only (Nova 1.0 compatible) ]

```
{  
  "schemaVersion": "bedrock-conversation-2024",  
  "system": [  
    {  
      "text": "You are a digital assistant with a friendly personality"  
    }  
  ],  
  "messages": [  
    {  
      "role": "user",  
      "content": [  
        {  
          "text": "What country is right next to Australia?"  
        }  
      ]  
    },  
    {  
      "role": "assistant",  
      "content": [  
        {  
          "text": "The closest country is New Zealand"  
        }  
      ]  
    }  
  ]  
}
```

------
#### [ Text with reasoning (Nova 2.0) ]

```
{  
  "schemaVersion": "bedrock-conversation-2024",  
  "system": [  
    {  
      "text": "You are a digital assistant with a friendly personality"  
    }  
  ],  
  "messages": [  
    {  
      "role": "user",  
      "content": [  
        {  
          "text": "What country is right next to Australia?"  
        }  
      ]  
    },  
    {  
      "role": "assistant",  
      "content": [  
        {  
          "reasoningContent": {  
            "reasoningText": {  
              "text": "I need to use my world knowledge of geography to answer this question"  
            }  
          }  
        },  
        {  
          "text": "The closest country to Australia is New Zealand, located to the southeast across the Tasman Sea."  
        }  
      ]  
    }  
  ]  
}
```

------
#### [ Image + text input ]

```
{  
  "schemaVersion": "bedrock-conversation-2024",  
  "system": [  
    {  
      "text": "You are a helpful assistant."  
    }  
  ],  
  "messages": [  
    {  
      "role": "user",  
      "content": [  
        {  
          "image": {  
            "format": "jpeg",  
            "source": {  
              "s3Location": {  
                "uri": "s3://your-bucket/your-path/your-image.jpg",  
                "bucketOwner": "your-aws-account-id"  
              }  
            }  
          }  
        },  
        {  
          "text": "Which country is highlighted in the image?"  
        }  
      ]  
    },  
    {  
      "role": "assistant",  
      "content": [  
        {  
          "reasoningContent": {  
            "reasoningText": {  
              "text": "I will determine the highlighted country by examining its location on the map and using my geographical knowledge"  
            }  
          }  
        },  
        {  
          "text": "The highlighted country is New Zealand"  
        }  
      ]  
    }  
  ]  
}
```

------
#### [ Video + text input ]

```
{  
  "schemaVersion": "bedrock-conversation-2024",  
  "system": [  
    {  
      "text": "You are a helpful assistant."  
    }  
  ],  
  "messages": [  
    {  
      "role": "user",  
      "content": [  
        {  
          "video": {  
            "format": "mp4",  
            "source": {  
              "s3Location": {  
                "uri": "s3://your-bucket/your-path/your-video.mp4",  
                "bucketOwner": "your-aws-account-id"  
              }  
            }  
          }  
        },  
        {  
          "text": "What is shown in this video?"  
        }  
      ]  
    },  
    {  
      "role": "assistant",  
      "content": [  
        {  
          "reasoningContent": {  
            "reasoningText": {  
              "text": "I will analyze the video content to identify key elements"  
            }  
          }  
        },  
        {  
          "text": "The video shows a map with New Zealand highlighted"  
        }  
      ]  
    }  
  ]  
}
```

------

## Tool calling
<a name="nova-2-tool-calling"></a>

Nova 2.0 SFT supports training models on tool calling patterns, enabling your model to learn when and how to invoke external tools or functions.

### Data format for tool calling
<a name="nova-2-tool-calling-format"></a>

Tool calling training data includes a `toolConfig` section that defines available tools, along with conversation turns that demonstrate tool usage patterns.

**Sample input**

```
{  
  "schemaVersion": "bedrock-conversation-2024",  
  "system": [  
    {  
      "text": "You are an expert in composing function calls."  
    }  
  ],  
  "toolConfig": {  
    "tools": [  
      {  
        "toolSpec": {  
          "name": "getItemCost",  
          "description": "Retrieve the cost of an item from the catalog",  
          "inputSchema": {  
            "json": {  
              "type": "object",  
              "properties": {  
                "item_name": {  
                  "type": "string",  
                  "description": "The name of the item to retrieve cost for"  
                },  
                "item_id": {  
                  "type": "string",  
                  "description": "The ASIN of item to retrieve cost for"  
                }  
              },  
              "required": [  
                "item_id"  
              ]  
            }  
          }  
        }  
      },  
      {  
        "toolSpec": {  
          "name": "getItemAvailability",  
          "description": "Retrieve whether an item is available in a given location",  
          "inputSchema": {  
            "json": {  
              "type": "object",  
              "properties": {  
                "zipcode": {  
                  "type": "string",  
                  "description": "The zipcode of the location to check in"  
                },  
                "quantity": {  
                  "type": "integer",  
                  "description": "The number of items to check availability for"  
                },  
                "item_id": {  
                  "type": "string",  
                  "description": "The ASIN of item to check availability for"  
                }  
              },  
              "required": [  
                "item_id", "zipcode"  
              ]  
            }  
          }  
        }  
      }  
    ]  
  },  
  "messages": [  
    {  
      "role": "user",  
      "content": [  
        {  
          "text": "I need to check whether there are twenty pieces of the following item available. Here is the item ASIN on Amazon: id-123. Please check for the zipcode 94086"  
        }  
      ]  
    },  
    {  
      "role": "assistant",  
      "content": [  
        {  
          "reasoningContent": {  
            "reasoningText": {  
              "text": "The user wants to check how many pieces of the item with ASIN id-123 are available in the zipcode 94086"  
            }  
          }  
        },  
        {  
          "toolUse": {  
            "toolUseId": "getItemAvailability_0",  
            "name": "getItemAvailability",  
            "input": {  
              "zipcode": "94086",  
              "quantity": 20,  
              "item_id": "id-123"  
            }  
          }  
        }  
      ]  
    },  
    {  
      "role": "user",  
      "content": [  
        {  
          "toolResult": {  
            "toolUseId": "getItemAvailability_0",  
            "content": [  
              {  
                "text": "[{\"name\": \"getItemAvailability\", \"results\": {\"availability\": true}}]"  
              }  
            ]  
          }  
        }  
      ]  
    },  
    {  
      "role": "assistant",  
      "content": [  
        {  
          "text": "Yes, there are twenty pieces of item id-123 available at 94086. Would you like to place an order or know the total cost?"  
        }  
      ]  
    }  
  ]  
}
```

### Tool calling requirements
<a name="nova-2-tool-calling-requirements"></a>

When creating tool calling training data, follow these requirements:


| Requirement | Description | 
| --- | --- | 
| ToolUse placement | ToolUse must appear in assistant turns only | 
| ToolResult placement | ToolResult must appear in user turns only | 
| ToolResult format | ToolResult should be text or JSON only. Other modalities are not supported for Nova models | 
| inputSchema format | The inputSchema within the toolSpec must be a valid JSON Schema object | 
| toolUseId matching | Each ToolResult must reference a valid toolUseId from a preceding assistant ToolUse, with each toolUseId used exactly once per conversation | 
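
These placement and matching rules can be verified mechanically before training. The following is a minimal sketch, not the official dataset validation tooling; the function name is ours:

```python
def check_tool_turns(record):
    """Verify toolUse/toolResult placement and one-to-one toolUseId matching
    for a single Converse-format training record. Raises ValueError on violation."""
    pending = set()   # toolUseIds emitted by the assistant, awaiting a result
    consumed = set()  # toolUseIds already answered by a toolResult
    for message in record["messages"]:
        role = message["role"]
        for block in message.get("content", []):
            if "toolUse" in block:
                if role != "assistant":
                    raise ValueError("toolUse must appear in assistant turns only")
                pending.add(block["toolUse"]["toolUseId"])
            if "toolResult" in block:
                if role != "user":
                    raise ValueError("toolResult must appear in user turns only")
                tid = block["toolResult"]["toolUseId"]
                if tid not in pending or tid in consumed:
                    raise ValueError(f"toolResult references invalid or reused toolUseId: {tid}")
                consumed.add(tid)
```

Applied to each JSONL record, this enforces the table above; schema validation of `inputSchema` itself would need a separate JSON Schema check.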

### Important notes
<a name="nova-2-tool-calling-notes"></a>
+ Ensure your tool definitions are consistent across all training samples
+ The model learns tool invocation patterns from the demonstrations you provide
+ Include diverse examples of when to use each tool and when not to use tools

## Document understanding
<a name="nova-2-document-understanding"></a>

Nova 2.0 SFT supports training on document-based tasks, enabling your model to learn how to analyze and respond to questions about PDF documents.

### Data format for document understanding
<a name="nova-2-document-format"></a>

Document understanding training data includes document references in the user content blocks, with the model learning to extract and reason over document content.

**Sample input**

```
{
  "schemaVersion": "bedrock-conversation-2024",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "What are the ways in which a customer can experience issues during checkout on Amazon?"
        },
        {
          "document": {
            "format": "pdf",
            "source": {
              "s3Location": {
                "uri": "s3://my-bucket-name/path/to/documents/customer_service_debugging.pdf",
                "bucketOwner": "123456789012"
              }
            }
          }
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "reasoningContent": {
            "reasoningText": {
              "text": "I need to find the relevant section in the document to answer the question."
            }
          }
        },
        {
          "text": "Customers can experience issues with 1. Data entry, 2. Payment methods, 3. Connectivity while placing the order. Which one would you like to dive into?"
        }
      ]
    }
  ]
}
```

### Document understanding limitations
<a name="nova-2-document-limitations"></a>


| Limitation | Details | 
| --- | --- | 
| Supported format | PDF files only | 
| Maximum document size | 10 MB | 
| Modality mixing | A sample can have documents and text, but cannot have documents mixed with other modalities (images, videos) | 

### Best practices for document understanding
<a name="nova-2-document-best-practices"></a>
+ Ensure documents are clearly formatted and text is extractable
+ Provide diverse examples covering different document types and question formats
+ Include reasoning content to help the model learn document analysis patterns

## Video understanding
<a name="nova-2-video-understanding"></a>

Nova 2.0 SFT supports training on video-based tasks, enabling your model to learn how to analyze and respond to questions about video content.

### Data format for video understanding
<a name="nova-2-video-format"></a>

Video understanding training data includes video references in the user content blocks, with the model learning to extract information and reason over video content.

**Sample input**

```
  
{  
  "schemaVersion": "bedrock-conversation-2024",  
  "messages": [  
    {  
      "role": "user",  
      "content": [  
        {  
          "text": "What are the ways in which a customer can experience issues during checkout on Amazon?"  
        },  
        {  
          "video": {  
            "format": "mp4",  
            "source": {  
              "s3Location": {  
                "uri": "s3://my-bucket-name/path/to/videos/customer_service_debugging.mp4",  
                "bucketOwner": "123456789012"  
              }  
            }  
          }  
        }  
      ]  
    },  
    {  
      "role": "assistant",  
      "content": [  
        {
          "reasoningContent": {  
            "reasoningText": {  
              "text": "I need to find the relevant section in the video to answer the question."  
            }  
          }
        },
        {  
          "text": "Customers can experience issues with 1. Data entry, 2. Payment methods, 3. Connectivity while placing the order. Which one would you like to dive into?"  
        }   
      ]  
    }  
  ]  
}
```

### Video understanding limitations
<a name="nova-2-video-limitations"></a>


| Limitation | Details | 
| --- | --- | 
| Maximum video size | 50 MB | 
| Maximum video duration | 15 minutes | 
| Videos per sample | Only one video is allowed per sample. Multiple videos in the same sample are not supported | 
| Modality mixing | A sample can have video and text, but cannot have video combined with other modalities (images, documents) | 

### Supported video formats
<a name="nova-2-video-formats"></a>
+ MOV
+ MKV
+ MP4

### Best practices for video understanding
<a name="nova-2-video-best-practices"></a>
+ Keep videos concise and focused on the content relevant to your task
+ Ensure video quality is sufficient for the model to extract meaningful information
+ Provide clear questions that reference specific aspects of the video content
+ Include diverse examples covering different video types and question formats

## Reasoning vs non-reasoning modes
<a name="nova-2-reasoning-modes"></a>

### Understanding reasoning content
<a name="nova-2-understanding-reasoning"></a>

Reasoning content (also called chain-of-thought) captures the model's intermediate thinking steps before generating a final answer. In the `assistant` turn, use the `reasoningContent` field to include these reasoning traces.

**How loss is calculated**
+ **With reasoning content** – Training loss includes both reasoning tokens and final output tokens
+ **Without reasoning content** – Training loss is calculated only on the final output tokens

You can include `reasoningContent` across multiple assistant turns in multi-turn conversations.

**Formatting guidelines**
+ Use plain text for reasoning content
+ Avoid markup tags like `<thinking>` and `</thinking>` unless specifically required by your task
+ Ensure reasoning content is clear and relevant to the problem-solving process

### When to enable reasoning mode
<a name="nova-2-when-enable-reasoning"></a>

Set `reasoning_enabled: true` in your training configuration when:
+ Your training data has reasoning tokens
+ You want the model to generate thinking tokens before producing final outputs
+ You need improved performance on complex reasoning tasks

Training Nova on a non-reasoning dataset with `reasoning_enabled = true` is permitted. However, doing so may cause the model to lose its reasoning capabilities, as Nova primarily learns to generate the responses presented in the data without applying reasoning. If you want to train Nova on a non-reasoning dataset but still expect reasoning during inference, you can disable reasoning during training (`reasoning_enabled = false`) but enable it for inference. While this approach allows reasoning to be used at inference time, it does not guarantee improved performance compared to inference without reasoning. In general, enable reasoning for both training and inference when using reasoning datasets, and disable it for both when using non-reasoning datasets.

Set `reasoning_enabled: false` when:
+ Your training data does not have reasoning tokens
+ You're training on straightforward tasks that don't benefit from explicit reasoning steps
+ You want to optimize for speed and reduce token usage

### Generating reasoning data
<a name="nova-2-generating-reasoning"></a>

If your dataset lacks reasoning traces, you can create them using a reasoning-capable model like Nova Premier. Provide your input-output pairs to the model and capture its reasoning process to build a reasoning-augmented dataset.
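
Assuming you have already captured a reasoning trace for each input-output pair, a small helper like the following (the function name is illustrative) can assemble the augmented Converse-format records:

```python
import json

def to_reasoning_record(prompt, reasoning, answer, system_prompt=None):
    """Build one Converse-format SFT record whose assistant turn carries a
    text reasoning trace ahead of the final answer."""
    record = {
        "schemaVersion": "bedrock-conversation-2024",
        "messages": [
            {"role": "user", "content": [{"text": prompt}]},
            {
                "role": "assistant",
                "content": [
                    {"reasoningContent": {"reasoningText": {"text": reasoning}}},
                    {"text": answer},
                ],
            },
        ],
    }
    if system_prompt:
        record["system"] = [{"text": system_prompt}]
    return record

# Each record becomes one line of the training JSONL file.
line = json.dumps(to_reasoning_record(
    "What country is right next to Australia?",
    "I need to use my world knowledge of geography to answer this question",
    "The closest country is New Zealand",
))
```

Write one such line per sample to produce a reasoning-augmented JSONL dataset in the format shown earlier.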

### Using reasoning tokens for training
<a name="nova-2-using-reasoning-training"></a>

When training with reasoning mode enabled, the model learns to separate internal reasoning from the final answer. The training process:
+ Organizes data as triples: input, reasoning, and answer
+ Optimizes using standard next-token prediction loss from both reasoning and answer tokens
+ Encourages the model to reason internally before generating responses

### Effective reasoning content
<a name="nova-2-effective-reasoning"></a>

High-quality reasoning content should include:
+ Intermediate thoughts and analysis
+ Logical deductions and inference steps
+ Step-by-step problem-solving approaches
+ Explicit connections between steps and conclusions

This helps the model develop the ability to "think before answering."

## Dataset preparation guidelines
<a name="nova-2-dataset-preparation"></a>

### Size and quality
<a name="nova-2-size-quality"></a>
+ **Recommended size** – 2,000-10,000 samples
+ **Minimum samples** – 200
+ **Priority** – Quality over quantity. Ensure examples are accurate and well-annotated
+ **Application alignment** – Dataset should closely reflect your production use cases

### Diversity
<a name="nova-2-diversity"></a>

Include diverse examples that:
+ Cover the full range of expected inputs
+ Represent different difficulty levels
+ Include edge cases and variations
+ Prevent overfitting to narrow patterns

### Output formatting
<a name="nova-2-output-formatting"></a>

Clearly specify the desired output format in assistant responses:
+ JSON structures
+ Tables
+ CSV format
+ Custom formats specific to your application

### Multi-turn conversations
<a name="nova-2-multi-turn"></a>

For multi-turn datasets, remember:
+ Loss is calculated only on assistant turns, not user turns
+ Each assistant response should be properly formatted
+ Maintain consistency across conversation turns

### Quality checklist
<a name="nova-2-quality-checklist"></a>
+ Sufficient dataset size (2K-10K samples)
+ Diverse examples covering all use cases
+ Clear, consistent output formatting
+ Accurate labels and annotations
+ Representative of production scenarios
+ Free from contradictions or ambiguities

### Uploading your data
<a name="nova-2-uploading-data"></a>

Datasets should be uploaded to a bucket that can be accessed by SageMaker training jobs. For information about setting the right permissions, see [Prerequisites](https://docs.aws.amazon.com/sagemaker/latest/dg/nova-model-general-prerequisites.html).

## Starting a training job
<a name="nova-2-starting-training"></a>

### Selecting hyperparameters and updating the recipe
<a name="nova-2-selecting-hyperparameters"></a>

The setup for Nova 2.0 is largely the same as for Nova 1.0. After the input data has been uploaded to Amazon S3, use a recipe from the fine-tuning folder of [SageMaker HyperPod Recipes](https://github.com/aws/sagemaker-hyperpod-recipes/tree/main/recipes_collection/recipes/fine-tuning/nova). The following example of the Nova 2.0 SFT PEFT recipe shows some of the key hyperparameters that you can update for your use case. For the container image URI, use `708977205387.dkr.ecr.us-east-1.amazonaws.com/nova-fine-tune-repo:SM-TJ-SFT-V2-latest` to run an SFT fine-tuning job.

Use v2.254.1 of the SageMaker AI Python SDK for strict compatibility with Nova training. Upgrading the SDK to v3.0 will result in breaking changes. Support for v3 of the SageMaker AI Python SDK is coming soon.

**Sample input**

```
!pip install sagemaker==2.254.1
```

```
run:
  name: {peft_recipe_job_name}
  model_type: amazon.nova-2-lite-v1:0:256k
  model_name_or_path: {peft_model_name_or_path}
  data_s3_path: {train_dataset_s3_path} # SageMaker HyperPod (SMHP) only; not compatible with SageMaker Training jobs. For an SMHP job, replace the bucket name with your real bucket name
  replicas: 4                      # Number of compute instances for training; allowed values are 4, 8, 16, 32
  output_s3_path: ""               # Output artifact path (SMHP-specific; not compatible with standard SageMaker Training jobs). For an SMHP job, replace the bucket name with your real bucket name

training_config:
  max_steps: 10                    # Maximum training steps. Minimum is 4
  save_steps: 10                   # Save a checkpoint every N training steps. Must be less than or equal to max_steps
  save_top_k: 1                    # Keep the top K best checkpoints. Supported only for SageMaker HyperPod jobs. Minimum is 1
  max_length: 32768                # Sequence length (options: 8192, 16384, 32768 [default], 65536)
  global_batch_size: 32            # Global batch size (options: 32, 64, 128)
  reasoning_enabled: true          # If the data has reasoningContent, set to true; otherwise false

  lr_scheduler:
    warmup_steps: 15               # Learning rate warmup steps. Recommended: 15% of max_steps
    min_lr: 1e-6                   # Minimum learning rate; must be between 0.0 and 1.0

  optim_config:                    # Optimizer settings
    lr: 1e-5                       # Learning rate; must be between 0.0 and 1.0
    weight_decay: 0.0              # L2 regularization strength; must be between 0.0 and 1.0
    adam_beta1: 0.9                # Exponential decay rate for first-moment estimates; must be between 0.0 and 1.0
    adam_beta2: 0.95               # Exponential decay rate for second-moment estimates; must be between 0.0 and 1.0

  peft:                            # Parameter-efficient fine-tuning (LoRA)
    peft_scheme: "lora"            # Enable LoRA for PEFT
    lora_tuning:
      alpha: 64                    # Scaling factor for LoRA weights (options: 32, 64, 96, 128, 160, 192)
      lora_plus_lr_ratio: 64.0
```

The recipe also contains largely the same hyperparameters as Nova 1.0. The notable hyperparameters are:
+ `max_steps` – The number of steps you want to run the job for. Generally, for one epoch (one run through your entire dataset), the number of steps = number of data samples / global batch size. The larger the number of steps and the smaller your global batch size, the longer the job will take to run.
+ `reasoning_enabled` – Controls reasoning mode for your dataset. Options:
  + `true`: Enables reasoning mode (equivalent to high reasoning)
  + `false`: Disables reasoning mode

  Note: For SFT, there is no granular control over reasoning effort levels. Setting `reasoning_enabled: true` enables full reasoning capability.
+ `peft.peft_scheme` – Setting this to `"lora"` enables PEFT-based fine-tuning. Setting it to `null` (no quotes) enables full-rank fine-tuning.
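
As a rough planning aid, the epoch arithmetic above and the recipe's warmup recommendation (about 15% of `max_steps`) can be sketched as:

```python
import math

def plan_steps(num_samples, global_batch_size, epochs=1):
    """Estimate max_steps for a given number of epochs over the dataset, plus
    the recommended warmup_steps (~15% of max_steps). Rough planning aid only."""
    max_steps = math.ceil(num_samples / global_batch_size) * epochs
    warmup_steps = max(1, round(0.15 * max_steps))
    return {"max_steps": max_steps, "warmup_steps": warmup_steps}
```

For example, 3,200 samples with a global batch size of 32 gives 100 steps per epoch and a suggested warmup of 15 steps.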

### Start the training job
<a name="nova-2-start-job"></a>

```
from sagemaker.pytorch import PyTorch  
  
# define OutputDataConfig path  
if default_prefix:  
    output_path = f"s3://{bucket_name}/{default_prefix}/{sm_training_job_name}"  
else:  
    output_path = f"s3://{bucket_name}/{sm_training_job_name}"  

output_kms_key = "<KMS key arn to encrypt trained model in Amazon-owned S3 bucket>" # optional, leave blank for Amazon managed encryption
  
recipe_overrides = {  
    "run": {  
        "replicas": instance_count,  # Required  
        "output_s3_path": output_path  
    },  
}  
  
estimator = PyTorch(  
    output_path=output_path,  
    base_job_name=sm_training_job_name,  
    role=role,  
    disable_profiler=True,  
    debugger_hook_config=False,  
    instance_count=instance_count,  
    instance_type=instance_type,  
    training_recipe=training_recipe,  
    recipe_overrides=recipe_overrides,  
    max_run=432000,  
    sagemaker_session=sagemaker_session,  
    image_uri=image_uri,
    output_kms_key=output_kms_key,
    tags=[  
        {'Key': 'model_name_or_path', 'Value': model_name_or_path},  
    ]  
)  
  
print(f"\nsm_training_job_name:\n{sm_training_job_name}\n")  
print(f"output_path:\n{output_path}")
```

```
from sagemaker.inputs import TrainingInput  
  
train_input = TrainingInput(  
    s3_data=train_dataset_s3_path,  
    distribution="FullyReplicated",  
    s3_data_type="Converse",  
)  
  
estimator.fit(inputs={"train": train_input}, wait=False)
```

**Note**  
Passing a validation dataset is not supported for supervised fine tuning of Nova 2.0.

To kick off the job:
+ Update the recipe with your dataset paths and hyperparameters
+ Execute the specified cells in the notebook to submit the training job

The notebook handles job submission and provides status tracking.

# Preparing data for multimodal fine-tuning
<a name="fine-tune-prepare-data-understanding"></a>

The following are guidelines and requirements for preparing data for fine-tuning Understanding models:

1. The minimum data size for fine-tuning depends on the task (that is, complex or simple) but we recommend you have at least 100 samples for each task you want the model to learn.

1. We recommend using your optimized prompt in a zero-shot setting during both training and inference to achieve the best results.

1. Training and validation datasets must be JSONL files, where each line is a JSON object corresponding to a record. These file names can consist of only alphanumeric characters, underscores, hyphens, slashes, and dots.

1. Image and video constraints

   1. A dataset can't contain mixed media modalities. That is, the dataset can be either text with images or text with videos.

   1. One sample (a single record in messages) can have multiple images.

   1. One sample (a single record in messages) can have only one video.

1. `schemaVersion` can be any string value

1. The (*optional*) `system` turn can be a customer-provided custom system prompt.

1. Supported roles are `user` and `assistant`.

1. The first turn in `messages` must always start with `"role": "user"`. The last turn is the bot's response, denoted by `"role": "assistant"`.

1. The `image.source.s3Location.uri` and `video.source.s3Location.uri` must be accessible to Amazon Bedrock.

1. Your Amazon Bedrock service role must be able to access the image files in Amazon S3. For more information about granting access, see [Create a service role for model customization](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-iam-role.html).

1. The images or videos must be in the same Amazon S3 bucket as your dataset. For example, if your dataset is in `s3://amzn-s3-demo-bucket/train/train.jsonl`, then your images or videos must be in `s3://amzn-s3-demo-bucket`.

1. The terms `User:`, `Bot:`, `Assistant:`, `System:`, `<image>`, `<video>`, and `[EOS]` are reserved keywords. If a user prompt or system prompt starts with any of these keywords, or contains them anywhere, your training job will fail due to data issues. If you need to use these keywords for your use case, substitute a different keyword with a similar meaning so that your training can proceed.
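
Many of these rules (turn order, supported roles, reserved keywords) can be spot-checked before you submit a job. The following is a minimal sketch, not a replacement for the official validation script; it covers only the text blocks in `messages`:

```python
# Reserved keywords that must not appear in prompts (see the rules above).
RESERVED = ("User:", "Bot:", "Assistant:", "System:", "<image>", "<video>", "[EOS]")

def check_record(record):
    """Spot-check one JSONL record against the turn and keyword rules.
    Returns a list of problems (empty means the checks passed)."""
    problems = []
    messages = record.get("messages", [])
    if not messages:
        return ["no messages"]
    if messages[0]["role"] != "user":
        problems.append("first turn must be user")
    if messages[-1]["role"] != "assistant":
        problems.append("last turn must be assistant")
    for msg in messages:
        if msg["role"] not in ("user", "assistant"):
            problems.append(f"unsupported role: {msg['role']}")
        for block in msg.get("content", []):
            text = block.get("text", "")
            for kw in RESERVED:
                if kw in text:
                    problems.append(f"reserved keyword in text: {kw}")
    return problems
```

Note that reserved keywords in the `system` turn would also fail the job; extending the check to cover `record.get("system", [])` is straightforward.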

**Note**  
To validate your dataset before submitting a fine-tuning job, you can use the [dataset validation script](https://github.com/aws-samples/amazon-nova-samples/tree/main/customization/bedrock-finetuning/understanding/dataset_validation) available on GitHub.

**Topics**
+ [Example dataset formats](#customize-fine-tune-examples)
+ [Dataset constraints](#custom-fine-tune-constraints)

## Example dataset formats
<a name="customize-fine-tune-examples"></a>

The following example dataset formats provide a guide for you to follow.

### Text-only custom fine tuning format
<a name="example4"></a>

The following example is for custom fine tuning over text only.

```
// train.jsonl
{
  "schemaVersion": "bedrock-conversation-2024",
  "system": [
    {
      "text": "You are a digital assistant with a friendly personality"
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "What is the capital of Mars?"
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "text": "Mars does not have a capital. Perhaps it will one day."
        }
      ]
    }
  ]
}
```

### Single image custom fine tuning format
<a name="example1"></a>

The following example is for custom fine tuning over text and a single image.

```
// train.jsonl
{
    "schemaVersion": "bedrock-conversation-2024",
    "system": [{
        "text": "You are a smart assistant that answers questions respectfully"
    }],
    "messages": [{
            "role": "user",
            "content": [{
                    "text": "What does the text in this image say?"
                },
                {
                    "image": {
                        "format": "png",
                        "source": {
                            "s3Location": {
                                "uri": "s3://your-bucket/your-path/your-image.png",
                                "bucketOwner": "your-aws-account-id"
                            }
                        }
                    }
                }
            ]
        },
        {
            "role": "assistant",
            "content": [{
                "text": "The text in the attached image says 'LOL'."
            }]
        }
    ]
}
```

### Video custom fine tuning format
<a name="example3"></a>

The following example is for custom fine tuning over text and video.

```
{
    "schemaVersion": "bedrock-conversation-2024",
    "system": [{
        "text": "You are a helpful assistant designed to answer questions crisply and to the point"
    }],
    "messages": [{
            "role": "user",
            "content": [{
                    "text": "How many white items are visible in this video?"
                },
                {
                    "video": {
                        "format": "mp4",
                        "source": {
                            "s3Location": {
                                "uri": "s3://your-bucket/your-path/your-video.mp4",
                                "bucketOwner": "your-aws-account-id"
                            }
                        }
                    }
                }
            ]
        },
        {
            "role": "assistant",
            "content": [{
                "text": "There are at least eight visible items that are white"
            }]
        }
    ]
}
```

## Dataset constraints
<a name="custom-fine-tune-constraints"></a>

Amazon Nova applies the following constraints on model customizations for Understanding models.


| Model | Minimum Samples | Maximum Samples | Context Length | 
| --- |--- |--- |--- |
| Nova 2 Lite | 8 | 20k | 32k | 


**Image and video constraints**  

| Constraint | Limit | 
| --- |--- |
| Maximum images | 10/sample | 
| Maximum image file size | 10 MB | 
| Maximum videos | 1/sample | 
| Maximum video length/duration | 90 seconds | 
| Maximum video file size | 50 MB | 

**Supported media formats**
+ Image - `png`, `jpeg`, `gif`, `webp`
+ Video - `mov`, `mkv`, `mp4`, `webm`

# Reinforcement Fine-Tuning (RFT) with Amazon Nova models
<a name="nova-reinforcement-fine-tuning"></a>

## Overview
<a name="nova-rft-overview"></a>

**What is RFT?**

Reinforcement fine-tuning (RFT) improves model performance by training on feedback signals—measurable scores or rewards indicating how well the model performed—rather than exact correct answers. Unlike supervised fine-tuning that learns from input-output pairs, RFT uses reward functions to evaluate model responses and iteratively optimizes the model to maximize these rewards. This approach excels when defining the exact correct output is challenging, but you can reliably measure response quality.

**When to use RFT**

Use RFT when you can define clear, measurable success criteria but struggle to provide exact correct outputs for training. It's ideal for:
+ Tasks where quality is subjective or multifaceted (creative writing, code optimization, complex reasoning)
+ Scenarios with multiple valid solutions where some are clearly better than others
+ Applications requiring iterative improvement, personalization, or adherence to complex business rules
+ Cases where collecting high-quality labeled examples is expensive or impractical

**Best use cases**

RFT excels in domains where output quality can be objectively measured but optimal responses are difficult to define upfront:
+ Mathematical problem-solving and code generation
+ Scientific reasoning and structured data analysis
+ Tasks requiring step-by-step reasoning or multi-turn problem solving
+ Applications balancing multiple objectives (accuracy, efficiency, style)
+ Scenarios where success can be verified programmatically through execution results or performance metrics

**Supported models**

Nova Lite 2.0

## Data format overview
<a name="nova-rft-data-format"></a>

RFT training data must follow the OpenAI Reinforcement Fine-Tuning [format](https://platform.openai.com/docs/api-reference/fine-tuning/reinforcement-input). Each training example is a JSON object containing:
+ A `messages` array with conversational turns using `system` and `user` roles
+ A `reference_answer` field containing the expected output or evaluation criteria for reward calculation

**Current limitations**
+ Text only

### Data format examples
<a name="nova-rft-data-examples"></a>

Each example in your JSONL file must be a single JSON object on its own line.
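A short sketch like the following can confirm the file meets that shape before you submit a job (a hypothetical helper; the field check assumes the core schema described above):

```
import json

def validate_jsonl(path):
    """Ensure every non-empty line parses as one JSON object with the core RFT fields."""
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            obj = json.loads(line)  # raises ValueError on malformed lines
            if "messages" not in obj or "reference_answer" not in obj:
                raise ValueError(f"line {lineno}: missing required field")
```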

------
#### [ Chemistry problem ]

```
{
  "id": "chem-01",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful chemistry assistant"
    },
    {
      "role": "user",
      "content": "Calculate the molecular weight of caffeine (C8H10N4O2)"
    }
  ],
  "reference_answer": {
    "molecular_weight": 194.19,
    "unit": "g/mol",
    "calculation": "8(12.01) + 10(1.008) + 4(14.01) + 2(16.00) = 194.19"
  }
}
```

------
#### [ Math problem ]

```
{
  "id": "sample-001",  // Optional
  "messages": [
    {
      "role": "system",
      "content": "You are a math tutor"
    },
    {
      "role": "user",
      "content": "Solve: 2x + 5 = 13"
    }
  ],
  "reference_answer": {
    "solution": "x = 4",
    "steps": ["2x = 13 - 5", "2x = 8", "x = 4"]
  }
}
```

------
#### [ Code problem ]

```
{
  "id": "code-002",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful programming assistant"
    },
    {
      "role": "user",
      "content": "Write a Python function that reverses a string without using built-in reverse methods"
    }
  ],
  "reference_answer": {
    "code": "def reverse_string(s):  \n    result = ''  \n    for i in range(len(s) - 1, -1, -1):  \n        result += s[i]  \n    return result",
    "test_cases": [
      {
        "input": "hello",
        "expected_output": "olleh"
      },
      {
        "input": "",
        "expected_output": ""
      },
      {
        "input": "a",
        "expected_output": "a"
      },
      {
        "input": "Python123",
        "expected_output": "321nohtyP"
      }
    ],
    "all_tests_pass": true
  }
}
```

------

The `reference_answer` field contains the expected output or evaluation criteria that your reward function uses to score the model's response. It is not limited to structured outputs—it can contain any format that helps your reward function evaluate quality.
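As an illustration, a reward function for the code problem above might execute the model's response against the `test_cases` in `reference_answer`. This is a hypothetical sketch: it assumes the response defines `reverse_string` and uses `exec()` on untrusted output purely for illustration.

```
def score_code_response(code, reference_answer):
    """Return the fraction of reference test cases the generated function passes."""
    namespace = {}
    try:
        exec(code, namespace)  # illustration only; sandbox untrusted code in practice
        fn = namespace["reverse_string"]
    except Exception:
        return 0.0
    cases = reference_answer["test_cases"]
    passed = 0
    for case in cases:
        try:
            if fn(case["input"]) == case["expected_output"]:
                passed += 1
        except Exception:
            pass
    return passed / len(cases)
```

A fractional score like this gives the optimizer a smoother signal than a single pass/fail bit.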

## Dataset size recommendations
<a name="nova-rft-dataset-size"></a>

**Starting point**
+ Minimum 100 training examples
+ Minimum 100 evaluation examples

**Evaluation-first approach**

Before investing in large-scale RFT training, evaluate your model's baseline performance:
+ **High performance (>95% reward)** – RFT may be unnecessary—your model already performs well
+ **Very poor performance (0% reward)** – Switch to SFT first to establish basic capabilities
+ **Moderate performance** – RFT is likely appropriate

Starting with a small dataset allows you to:
+ Validate your reward function is bug-free
+ Confirm RFT is the right approach for your use case
+ Identify and fix issues early
+ Test the workflow before scaling up

Once validated, you can expand to larger datasets to further improve performance.

## Characteristics of effective training data
<a name="nova-rft-effective-data"></a>

**Clarity and consistency**

Good RFT examples require clear, unambiguous input data that enables accurate reward calculation across different model outputs. Avoid noise in your data, including:
+ Inconsistent formatting
+ Contradictory labels or instructions
+ Ambiguous prompts
+ Conflicting reference answers

Any ambiguity will mislead the training process and cause the model to learn unintended behaviors.

**Diversity**

Your dataset should capture the full diversity of production use cases to ensure robust real-world performance. Include:
+ Different input formats and edge cases
+ Actual production usage patterns, drawn from logs and user analytics
+ Samples across user types, geographic regions, and seasonal variations
+ Difficulty levels ranging from simple to complex problems

**Reward function considerations**

Design your reward function for efficient training:
+ Execute within seconds (not minutes)
+ Parallelize effectively with Lambda
+ Return consistent, reliable scores
+ Handle different types of model outputs gracefully

Fast, scalable reward functions enable rapid iteration and cost-effective experimentation.

## Additional properties
<a name="nova-rft-additional-properties"></a>

The RFT data format supports custom fields beyond the core schema requirements (`messages` and `reference_answer`). This flexibility lets you add any additional data your reward function needs for proper evaluation.

**Note**  
You don't need to configure this in your recipe—the data format inherently supports additional fields. Simply include them in your training data JSON, and they will be passed to your reward function in the `metadata` field.

**Common additional properties**

Example metadata fields:
+ `task_id` – Unique identifier for tracking
+ `difficulty_level` – Problem complexity indicator
+ `domain` – Subject area or category
+ `expected_reasoning_steps` – Number of steps in solution

**Example with additional properties**

```
{
  "messages": [
    {
      "role": "system",
      "content": "You are a math tutor"
    },
    {
      "role": "user",
      "content": "Solve: 2x + 5 = 13"
    }
  ],
  "reference_answer": {
    "solution": "x = 4",
    "steps": ["2x = 13 - 5", "2x = 8", "x = 4"]
  },
  "task_id": "algebra_001",
  "difficulty_level": "easy",
  "domain": "algebra",
  "expected_reasoning_steps": 3
}
```

These additional fields are passed to your reward function during evaluation, enabling sophisticated scoring logic tailored to your specific use case.
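As one sketch of such scoring logic, a reward function might weight its base score by the `difficulty_level` field shown above (the weights here are hypothetical):

```
def shaped_reward(base_score, metadata):
    """Scale a base reward using the difficulty_level metadata field, if present."""
    weights = {"easy": 0.5, "medium": 1.0, "hard": 2.0}
    return base_score * weights.get(metadata.get("difficulty_level"), 1.0)
```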

## Training configuration
<a name="nova-rft-training-config"></a>

**Sample recipe**

```
# Note:
# This recipe can run on p5.48xlarge and p5en.48xlarge instance types.
run:
  name: "my-rft-run"                           # Unique run name (appears in logs/artifacts).
  model_type: amazon.nova-2-lite-v1:0:256k
  model_name_or_path: nova-lite-2/prod
  data_s3_path: s3://<bucket>/<data file>      # Training dataset in JSONL;
  replicas: 4
  reward_lambda_arn: arn:aws:lambda:<region>:<account-id>:function:<function-name>

  ## MLFlow configs
  mlflow_tracking_uri: "" # Required for MLFlow
  mlflow_experiment_name: "my-rft-experiment" # Optional for MLFlow. Note: leave this field non-empty
  mlflow_run_name: "my-rft-run" # Optional for MLFlow. Note: leave this field non-empty

## SMTJ GRPO Training specific configs
training_config:
  max_length: 8192                              # Context window (tokens) for inputs+prompt;
  global_batch_size: 16                         # Total samples per optimizer step across all replicas (16/32/64/128/256).
  reasoning_effort: high                        # Enables reasoning mode high / low / or null for non-reasoning

  rollout:                                      # How responses are generated for GRPO/advantage calc.
    advantage_strategy:
      number_generation: 2                      # N samples per prompt to estimate advantages (variance vs cost).
    generator:
      max_new_tokens: 6000                      # Cap on tokens generated per sample
      set_random_seed: true                     # Seed generation for reproducibility across runs.
      temperature: 1                            # Softmax temperature;
      top_k: 1                                  # Sample only from top-K logits
    rewards:
      preset_reward_function: null              # Usage of preset reward functions [exact_match]
      api_endpoint:
        lambda_arn: arn:aws:lambda:<region>:<account-id>:function:<function-name>
        lambda_concurrency_limit: 12             # Max concurrent Lambda invocations (throughput vs. throttling).

  trainer:
    max_steps: 2                                 # Steps to train for. One Step = global_batch_size
    save_steps: 5
    test_steps: 1

    # RL parameters
    ent_coeff: 0.0                              # A bonus added to the policy loss that rewards higher-output entropy.
    kl_loss_coef: 0.001                         # Weight on the KL penalty between the actor (trainable policy) and a frozen reference model

    optim_config:                    # Optimizer settings
        lr: 5e-5                       # Learning rate
        weight_decay: 0.0              # L2 regularization strength (0.0–1.0)
        adam_beta1: 0.9
        adam_beta2: 0.95

    peft:                            # Parameter-efficient fine-tuning (LoRA)
        peft_scheme: "lora"            # Enable LoRA for PEFT
        lora_tuning:
            alpha: 32
            lora_plus_lr_ratio: 64.0     # LoRA+ learning rate scaling factor (0.0–100.0)
```

## RFT training using LLM as a judge
<a name="nova-rft-llm-judge"></a>

### Overview
<a name="nova-rft-llm-judge-overview"></a>

Large language models (LLMs) are increasingly being used as judges in reinforcement fine-tuning (RFT) workflows, providing automated reward signals that guide model optimization. In this approach, an LLM evaluates model outputs against specified criteria—whether assessing correctness, quality, style adherence, or semantic equivalence—and assigns rewards that drive the reinforcement learning process.

This is particularly valuable for tasks where traditional reward functions are difficult to define programmatically, such as determining whether different representations (like "1/3", "0.333", and "one-third") are semantically equivalent, or evaluating nuanced qualities like coherence and relevance. By using LLM-based judges as reward functions, you can scale RFT to complex domains without requiring extensive human annotation, enabling rapid iteration and continuous improvement of your models across diverse use cases beyond traditional alignment problems.

### Reasoning mode selection
<a name="nova-rft-reasoning-mode"></a>

**Available modes**
+ none – No reasoning (omit the `reasoning_effort` field)
+ low – Minimal reasoning overhead
+ high – Maximum reasoning capability (default when `reasoning_effort` is specified)

**Note**  
There is no medium option for RFT. If the `reasoning_effort` field is absent from your configuration, reasoning is disabled. When reasoning is enabled, you should set `max_new_tokens` to 32768 to accommodate extended reasoning outputs.
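In the sample recipe format shown earlier, enabling reasoning therefore looks like this fragment:

```
training_config:
  reasoning_effort: high        # high or low; omit this field entirely to disable reasoning
  rollout:
    generator:
      max_new_tokens: 32768     # recommended cap when reasoning is enabled
```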

**When to use each mode**

Use high reasoning for:
+ Complex analytical tasks
+ Mathematical problem-solving
+ Multi-step logical deduction
+ Tasks where step-by-step thinking adds value

Use none (omit `reasoning_effort`) or low reasoning for:
+ Simple factual queries
+ Direct classifications
+ Speed and cost optimization
+ Straightforward question-answering

**Cost and performance trade-offs**

Higher reasoning modes increase:
+ Training time and cost
+ Inference latency and cost
+ Model capability for complex reasoning tasks

### Validating your LLM judge
<a name="nova-rft-validating-judge"></a>

Before deploying an LLM-as-a-judge in production, validate that the judge model's evaluations align with human judgment. This involves:
+ Measuring agreement rates between the LLM judge and human evaluators on representative samples of your task
+ Ensuring that the LLM's agreement with humans meets or exceeds inter-human agreement rates
+ Identifying potential biases in the judge model
+ Building confidence that the reward signal guides your model in the intended direction

This validation step helps ensure the automated evaluation process will produce models that meet your production quality criteria.
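The agreement-rate measurement described above can be sketched in a few lines (a hypothetical helper; labels can be any comparable values, such as pass/fail verdicts or discrete scores):

```
def agreement_rate(judge_labels, human_labels):
    """Return the fraction of samples on which the LLM judge matches the human label."""
    assert len(judge_labels) == len(human_labels) and judge_labels
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(judge_labels)
```

Compute the same metric between two human raters to get the inter-human baseline the judge should meet or exceed.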

### Lambda configuration for LLM judge
<a name="nova-rft-lambda-config"></a>

Using an LLM as a judge is an extension of using Lambda functions for Reinforcement Learning with Verifiable Rewards (RLVR). Inside the Lambda function, you make a call to one of the models hosted in Amazon Bedrock.

**Important configuration requirements:**


| Configuration | Requirement | Details | 
| --- | --- | --- | 
| Amazon Bedrock throughput | Sufficient quota | Ensure your throughput quota for the Amazon Bedrock model used is sufficient for your training workload | 
| Lambda timeout | Extended timeout | Configure your Lambda function timeout up to the maximum of 15 minutes. The default setting is 3 seconds, which is insufficient for Amazon Bedrock model responses | 
| Lambda concurrency | Increased concurrency | The Lambda gets invoked in parallel during training. Increase concurrency to maximize available throughput | 
| Recipe configuration | Match Lambda settings | The concurrency limit must be configured in your recipe | 
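A minimal LLM-judge Lambda might look like the following sketch. The event and return shapes here are assumptions for illustration (consult the RFT reward-function contract for the actual payload), and the model ID is a placeholder; the Bedrock Runtime `converse` call itself is a real API.

```
import re

JUDGE_PROMPT = (
    "Rate the response from 0 to 10 for correctness against the reference.\n"
    "Reference: {reference}\nResponse: {response}\n"
    "Reply with only the number."
)

def parse_score(judge_text):
    """Map the judge's free-text reply to a reward in [0, 1]."""
    match = re.search(r"\d+(\.\d+)?", judge_text)
    if match is None:
        return 0.0
    return min(float(match.group()), 10.0) / 10.0

def handler(event, context):
    # Hypothetical payload shape; see the RFT reward-function contract
    # for the fields your Lambda actually receives and must return.
    import boto3  # deferred so parse_score stays testable offline
    client = boto3.client("bedrock-runtime")
    prompt = JUDGE_PROMPT.format(reference=event["reference_answer"],
                                 response=event["model_response"])
    result = client.converse(
        modelId="us.amazon.nova-pro-v1:0",  # substitute a model you have quota for
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 16, "temperature": 0},
    )
    judge_text = result["output"]["message"]["content"][0]["text"]
    return {"reward": parse_score(judge_text)}
```

Keeping the prompt short and `maxTokens` small reduces latency per invocation, which matters given the Lambda concurrency and timeout constraints above.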

## Creating and running jobs
<a name="nova-rft-creating-jobs"></a>

**Starting a training job**

Use the [SageMaker training job notebook template](https://docs.aws.amazon.com/sagemaker/latest/dg/nova-fine-tuning-training-job.html#nova-model-training-jobs-notebook).

**Instance requirements**

The container supports both Full-Rank and LoRA training:
+ **LoRA training** – 2/4/6/8 × p5.48xlarge or p5en.48xlarge instances
+ **Full-Rank training** – 2/4/6/8 × p5.48xlarge instances (required)

## Monitoring training
<a name="nova-rft-monitoring"></a>

Training logs include comprehensive metrics at each step. Key metric categories:

**Reward metrics**
+ `critic/rewards/mean`, `critic/rewards/max`, `critic/rewards/min` – Reward distribution
+ `val-score/rewards/mean@1` – Validation rewards

**Model behavior**
+ `actor/entropy` – Policy variation (higher = more exploratory)

**Training health**
+ `actor/pg_loss` – Policy gradient loss
+ `actor/pg_clipfrac` – Frequency of clipped updates
+ `actor/grad_norm` – Gradient magnitude

**Response characteristics**
+ `prompt_length/mean`, `prompt_length/max`, `prompt_length/min` – Input token statistics
+ `response_length/mean`, `response_length/max`, `response_length/min` – Output token statistics
+ `response/aborted_ratio` – Incomplete generation rate (0 = all completed)

**Performance**
+ `perf/throughput` – Training throughput
+ `perf/time_per_step` – Time per training step
+ `timing_per_token_ms/*` – Per-token processing times

**Resource usage**
+ `perf/max_memory_allocated_gb`, `perf/max_memory_reserved_gb` – GPU memory
+ `perf/cpu_memory_used_gb` – CPU memory

## Using fine-tuned models
<a name="nova-rft-using-models"></a>

After training completes, the final model checkpoint is saved to your specified output location. The checkpoint path is available in:
+ Training logs
+ `manifest.json` file in the output Amazon S3 location (defined by `output_s3_uri` in your notebook)

## Limitations and best practices
<a name="nova-rft-limitations"></a>

**Limitations**
+ **Lambda timeout** – Reward functions must complete within 15 minutes (prevents runaway processes and manages costs)
+ **Single-turn only** – Multi-turn conversations are not supported
+ **Data requirements** – Needs sufficient diversity; struggles with sparse rewards (<5% positive examples)
+ **Computational cost** – More expensive than supervised fine-tuning
+ **No multi-modal data** – Only text data type is supported

**Best practices**

**Start small**
+ Begin with 100-200 examples
+ Validate reward function correctness
+ Scale gradually based on results

**Pre-training evaluation**
+ Test baseline model performance before RFT
+ If rewards are consistently 0%, use SFT first to establish basic capabilities
+ If rewards are >95%, RFT may be unnecessary

**Monitor training**
+ Track average reward scores and distribution
+ Watch for overfitting (training rewards increase while validation rewards decrease)
+ Look for concerning patterns:
  + Rewards plateauing below 0.15
  + Increasing reward variance over time
  + Declining validation performance
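The concerning patterns above can be screened for automatically. The following is a hypothetical helper over per-step mean-reward series; the thresholds mirror the list above:

```
def flag_training_issues(train_rewards, val_rewards):
    """Flag concerning patterns in per-step mean reward series."""
    issues = []
    if train_rewards and max(train_rewards) < 0.15:
        issues.append("rewards plateauing below 0.15")
    if len(val_rewards) >= 2 and val_rewards[-1] < val_rewards[0]:
        issues.append("declining validation performance")
    if len(train_rewards) >= 4:
        half = len(train_rewards) // 2
        def variance(xs):
            mean = sum(xs) / len(xs)
            return sum((x - mean) ** 2 for x in xs) / len(xs)
        if variance(train_rewards[half:]) > variance(train_rewards[:half]):
            issues.append("increasing reward variance over time")
    return issues
```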

**Optimize reward functions**
+ Execute within seconds (not minutes)
+ Minimize external API calls
+ Use efficient algorithms
+ Implement proper error handling
+ Take advantage of Lambda's parallel scaling

**Iteration strategy**

If rewards aren't improving:
+ Adjust reward function design
+ Increase dataset diversity
+ Add more representative examples
+ Verify reward signals are clear and consistent

## Advanced capabilities: Nova Forge
<a name="nova-rft-advanced"></a>

For users requiring advanced capabilities beyond standard RFT limitations, Nova Forge is available as a paid subscription service offering:
+ Multi-turn conversation support
+ Reward functions with >15 minute execution time
+ Additional algorithms and tuning options
+ Custom training recipe modifications
+ State-of-the-art AI techniques

Nova Forge runs on SageMaker HyperPod and is designed to help enterprise customers build their own frontier models.

## Useful commands and tips
<a name="nova-rft-useful-commands"></a>

A collection of [observability scripts](https://github.com/aws-samples/amazon-nova-samples/tree/main/customization/SageMakerUilts/SageMakerJobsMonitoring) is available to help monitor the status and progress of training jobs.

The available scripts support:
+ Enabling email notifications for training job status updates
+ Obtaining training time estimates based on job configurations
+ Obtaining approximations for how long training is expected to take for in-progress jobs

**Installation**

**Note**  
Be sure to refresh your AWS credentials prior to using any of the following scripts.

```
pip install boto3
git clone https://github.com/aws-samples/amazon-nova-samples.git
cd amazon-nova-samples/customization/SageMakerUilts/SageMakerJobsMonitoring/
```

**Basic usage**

```
# Enabling email notifications for training job status updates
python enable_sagemaker_job_notifs.py --email test@amazon.com test2@gmail.com --region us-east-1 --platform SMTJ

Creating resources........
Please check your email for a subscription confirmation email, and click 'Confirm subscription' to start receiving job status email notifications!
You'll receive the confirmation email within a few minutes.
```

```
# Obtaining training time estimates based on job configurations
python get_training_time_estimate.py
```

```
# Obtaining approximations for how long training is expected to take for in-progress jobs
python get-training-job-progress.py --region us-east-1 --job-name my-training-job --num-dataset-samples 1000
```

See the [README](https://github.com/aws-samples/amazon-nova-samples/blob/main/customization/SageMakerUilts/SageMakerJobsMonitoring/README.md) for additional details and examples.