Training job submission

Launching Training Jobs

After your agent is deployed and your dataset is in S3, create a training job using one of the following methods.

SageMaker AI Studio

Navigate to Models in the navigation pane and select JumpStart Base Models.
Select a model that supports multi-turn RL (see the supported models table) and choose Customize model, then Customize with UI.
Select Multi-Turn Reinforcement Learning as the customization technique.
Configure your agent environment — select your Bedrock AgentCore runtime or provide your Lambda forwarder ARN.
Provide your training dataset as an S3 URI or registered dataset.
Adjust hyperparameters as needed.
Review your configuration and choose Submit.

SageMaker AI Python SDK

Discover supported models


from sagemaker.train.multi_turn_rl_trainer import MultiTurnRLTrainer

supported_models = MultiTurnRLTrainer.list_supported_models()
print(f"Supported MTRL models ({len(supported_models)}):")
for m in supported_models:
    print(f"  - {m}")

Set up your agent environment

Option 1: Bedrock AgentCore runtime


# List available runtimes
runtimes = MultiTurnRLTrainer.list_bedrock_agentcore_runtimes()
for rt in runtimes:
    print(f"  - {rt['name']} ({rt['status']}) → {rt['arn']}")

Option 2: Custom Lambda agent


from sagemaker.train.agent_lambda import AgentLambda

# Create from inline code
adapter = AgentLambda.create(
    source='''
import json
def handler(event, context):
    prompt = event.get("prompt", "")
    return {"statusCode": 200, "body": json.dumps({"status": "ok", "agentResponse": prompt})}
''',
    role="arn:aws:iam::123456789012:role/AgentLambdaRole",
)

# Create from a local file
adapter = AgentLambda.create(
    source="~/my_agent_handler.py",
    role="arn:aws:iam::123456789012:role/AgentLambdaRole",
)

# Create from S3
adapter = AgentLambda.create(
    source="s3://my-bucket/agent_handler.py",
    role="arn:aws:iam::123456789012:role/AgentLambdaRole",
)

# Wrap an existing Lambda
adapter = AgentLambda.get("arn:aws:lambda:us-west-2:123456789012:function:my-agent")
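Before packaging a handler with AgentLambda.create, it can help to run it locally. The following is a minimal sketch that repeats the inline echo handler from the example above and checks the shape of its response. The event used here contains only the `prompt` key that the sample handler reads. The full event that the training service sends is not described in this section, so treat the payload as an assumption. `check_handler` is a hypothetical helper, not part of the SDK.

```python
import json


def handler(event, context):
    """Echo handler copied from the inline example above."""
    prompt = event.get("prompt", "")
    return {
        "statusCode": 200,
        "body": json.dumps({"status": "ok", "agentResponse": prompt}),
    }


def check_handler(fn, prompt):
    """Invoke a handler locally and return its decoded body.

    Assumption: the event carries a single "prompt" key, which is all the
    sample handler reads. This is not a documented contract.
    """
    response = fn({"prompt": prompt}, None)
    if response.get("statusCode") != 200:
        raise ValueError(f"unexpected status: {response.get('statusCode')}")
    return json.loads(response["body"])


body = check_handler(handler, "What is 2 + 2?")
print(body["agentResponse"])
```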

Register your dataset (optional)


from sagemaker.ai_registry.dataset import DataSet

dataset = DataSet.create(
    name="my-mtrl-dataset",
    source="s3://my-bucket/prompts/training_prompts.parquet"
)
print(f"Dataset ARN: {dataset.arn}")

Create Restricted Model Package Group for Nova (optional)

If you choose a Nova model (nova-textgeneration-lite-v2), you can optionally create a Restricted Model Package Group before submitting a training job (next step). If you skip this step, the SDK automatically creates one for you.

A Restricted Model Package Group (RMPG) is a Model Package Group with ManagedStorageType: Restricted. It is required for closed-source models like Nova, where the model weights are managed by AWS and not directly accessible to the customer.

The RFT job schema requires two separate Restricted MPGs:

Output MPG — stores the final fine-tuned model package
Intermediate Checkpoint MPG — reserved for intermediate training checkpoints (must differ from the Output MPG)


from sagemaker.core.resources import ModelPackageGroup
from sagemaker.core.shapes import ManagedConfiguration

model_name = "nova-textgeneration-lite-v2"

# Restricted configuration
managed_config = ManagedConfiguration(managed_storage_type="Restricted")

# Output Model package group
output_mpg_name = f"{model_name}-mtrl-output-mpg"
create_kwargs = {
    "model_package_group_name": output_mpg_name,
    "region": "us-east-1",
    "managed_configuration": managed_config
}
output_mpg = ModelPackageGroup.create(**create_kwargs)

# Intermediate Model package group
intermediate_mpg_name = f"{model_name}-mtrl-inter-mpg"
create_kwargs = {
    "model_package_group_name": intermediate_mpg_name,
    "region": "us-east-1",
    "managed_configuration": managed_config
}
intermediate_mpg = ModelPackageGroup.create(**create_kwargs)

Once both Model Package Groups are created, pass them in the next step when submitting a training job.

Submit a training job with Bedrock AgentCore


from sagemaker.train.multi_turn_rl_trainer import MultiTurnRLTrainer

trainer = MultiTurnRLTrainer(
    model="openai-reasoning-gpt-oss-20b",
    agent_env="arn:aws:bedrock-agentcore:us-west-2:123456789012:runtime/my-agent-runtime",
    training_dataset="s3://my-bucket/prompts/prompts.parquet",
    mlflow_app_arn="arn:aws:sagemaker:us-west-2:123456789012:mlflow-app/mlflow-app-id",
    s3_output_path="s3://my-bucket/output/",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    accept_eula=True,
)

# View and adjust hyperparameters
trainer.hyperparameters.get_info()
trainer.hyperparameters.max_epochs = 1
trainer.hyperparameters.global_batch_size = 32
trainer.hyperparameters.max_steps = 12

job = trainer.train(wait=True)
print(f"Job: {job.job_name}")
print(f"Status: {job.job_status}")
print(f"Output Model Package: {job.output_model_package_arn}")

Submit a training job with a custom Lambda agent


trainer = MultiTurnRLTrainer(
    model="openai-reasoning-gpt-oss-20b",
    agent_env=adapter,  # AgentLambda object or Lambda ARN string
    training_dataset="s3://my-bucket/prompts/prompts.parquet",
    mlflow_app_arn="arn:aws:sagemaker:us-west-2:123456789012:mlflow-app/mlflow-app-id",
    s3_output_path="s3://my-bucket/output/",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    accept_eula=True,
)

trainer.hyperparameters.max_epochs = 1
trainer.hyperparameters.global_batch_size = 32
trainer.hyperparameters.max_steps = 12

job = trainer.train(wait=True)
print(f"Job: {job.job_name}")
print(f"Status: {job.job_status}")
print(f"Output Model Package: {job.output_model_package_arn}")

Submit a training job with Restricted Model package group for Nova

To create the Restricted Model Package Groups used here, see the earlier step, Create Restricted Model Package Group for Nova.


from sagemaker.train.multi_turn_rl_trainer import MultiTurnRLTrainer

trainer = MultiTurnRLTrainer(
    model="nova-textgeneration-lite-v2",
    agent_env="arn:aws:bedrock-agentcore:us-west-2:123456789012:runtime/my-agent-runtime",
    training_dataset="s3://my-bucket/prompts/prompts.parquet",
    mlflow_app_arn="arn:aws:sagemaker:us-west-2:123456789012:mlflow-app/mlflow-app-id",
    s3_output_path="s3://my-bucket/output/",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    accept_eula=True,
    output_model_package_group=output_mpg,
    intermediate_checkpoint_model_package_group=intermediate_mpg
)

# View and adjust hyperparameters
trainer.hyperparameters.get_info()
trainer.hyperparameters.max_epochs = 1
trainer.hyperparameters.global_batch_size = 32
trainer.hyperparameters.max_steps = 12

job = trainer.train(wait=True)
print(f"Job: {job.job_name}")
print(f"Status: {job.job_status}")
print(f"Output Model Package: {job.output_model_package_arn}")

AWS CLI

Create a training job using the CreateJob API. You specify the agent configuration, training data location, base model, and output settings in the JobConfigDocument.

To retrieve the full JobConfigDocument schema:


aws sagemaker list-job-schema-versions --job-category AgentRFT
aws sagemaker describe-job-schema-version --job-category AgentRFT --version "1.0.0"

Create job with Bedrock AgentCore


aws sagemaker create-job \
  --job-category AgentRFT \
  --job-name "my-agent-rft-job" \
  --role-arn "arn:aws:iam::123456789012:role/SageMakerFineTuningJobRole" \
  --job-config-schema-version "1.0.0" \
  --job-config-document '{
    "AgentConfig": {
      "BedrockAgentCoreConfig": {
        "AgentRuntimeArn": "arn:aws:bedrock-agentcore:us-west-2:123456789012:runtime/my-agent"
      }
    },
    "InputDataConfig": [...],
    "OutputDataConfig": {...},
    "ModelPackageConfig": {...},
    "TrainingConfig": {...}
  }' \
  --region us-west-2

Create job with custom Lambda agent


aws sagemaker create-job \
  --job-category AgentRFT \
  --job-name "my-custom-agent-rft-job" \
  --role-arn "arn:aws:iam::account-id:role/SageMakerFineTuningJobRole" \
  --job-config-schema-version "1.0.0" \
  --job-config-document '{
    "AgentConfig": {
      "CustomAgentLambdaConfig": {
        "LambdaArn": "arn:aws:lambda:us-west-2:account-id:function:rft-agent-forwarder"
      }
    },
    "InputDataConfig": [...],
    "OutputDataConfig": {...},
    "ModelPackageConfig": {...},
    "TrainingConfig": {...}
  }' \
  --region us-west-2

boto3

Create job with Bedrock AgentCore


import json
import boto3

sm = boto3.client("sagemaker")

response = sm.create_job(
    JobName="my-agent-rft-job",
    RoleArn="arn:aws:iam::123456789012:role/SageMakerFineTuningJobRole",
    JobCategory="AgentRFT",
    JobConfigSchemaVersion="1.0.0",
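    # The [...] and {...} values below are elided placeholders. Replace them
    # with real values from the JobConfigDocument schema before running.
    JobConfigDocument=json.dumps({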
        "AgentConfig": {
            "BedrockAgentCoreConfig": {
                "AgentRuntimeArn": "arn:aws:bedrock-agentcore:us-west-2:123456789012:runtime/my-agent"
            }
        },
        "InputDataConfig": [...],
        "OutputDataConfig": {...},
        "ModelPackageConfig": {...},
        "TrainingConfig": {...}
    })
)

print(f"Job ARN: {response['JobArn']}")

Create job with custom Lambda agent


import json
import boto3

sm = boto3.client("sagemaker")

response = sm.create_job(
    JobName="my-custom-agent-rft-job",
    RoleArn="arn:aws:iam::account-id:role/SageMakerFineTuningJobRole",
    JobCategory="AgentRFT",
    JobConfigSchemaVersion="1.0.0",
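    # The [...] and {...} values below are elided placeholders. Replace them
    # with real values from the JobConfigDocument schema before running.
    JobConfigDocument=json.dumps({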
        "AgentConfig": {
            "CustomAgentLambdaConfig": {
                "LambdaArn": "arn:aws:lambda:us-west-2:account-id:function:rft-agent-forwarder"
            }
        },
        "InputDataConfig": [...],
        "OutputDataConfig": {...},
        "ModelPackageConfig": {...},
        "TrainingConfig": {...}
    })
)

print(f"Job ARN: {response['JobArn']}")
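The two variants above differ only in the AgentConfig block. As a rough sketch, that fragment can be chosen from the service segment of the agent ARN. The key names (BedrockAgentCoreConfig, AgentRuntimeArn, CustomAgentLambdaConfig, LambdaArn) come from the examples above. `build_agent_config` is a hypothetical helper and not part of boto3 or the SageMaker AI SDK.

```python
def build_agent_config(agent_arn: str) -> dict:
    """Return the AgentConfig fragment that matches the ARN's service.

    ARN layout: arn:partition:service:region:account:resource
    """
    parts = agent_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {agent_arn!r}")
    service = parts[2]
    if service == "bedrock-agentcore":
        return {"BedrockAgentCoreConfig": {"AgentRuntimeArn": agent_arn}}
    if service == "lambda":
        return {"CustomAgentLambdaConfig": {"LambdaArn": agent_arn}}
    raise ValueError(f"unsupported agent service: {service!r}")


runtime_cfg = build_agent_config(
    "arn:aws:bedrock-agentcore:us-west-2:123456789012:runtime/my-agent"
)
lambda_cfg = build_agent_config(
    "arn:aws:lambda:us-west-2:123456789012:function:rft-agent-forwarder"
)
print(list(runtime_cfg), list(lambda_cfg))
```

The returned dictionary can be placed under the "AgentConfig" key of the JobConfigDocument. The remaining sections still have to be filled in from the schema.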

Monitoring Training

Monitor your Training Job

Use the DescribeJob API to check your job's current status at any time. The job status moves through InProgress and ends in one of Completed, Failed, or Stopped.


aws sagemaker describe-job \
  --job-name "my-agent-rft-job" \
  --job-category AgentRFT \
  --region us-west-2
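To wait for a job programmatically, you can poll DescribeJob until the status is terminal. The following is a minimal sketch, not an official client. `wait_for_job` is a hypothetical helper that takes any zero-argument callable returning the DescribeJob response as a dictionary. The response key `JobStatus` is an assumed field name, so check it against the actual API response. The usage at the bottom runs against a stub instead of a real endpoint.

```python
import time

# Terminal statuses named in this section.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_job(describe, poll_seconds=30.0, timeout_seconds=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches a terminal status or the timeout expires.

    describe: zero-argument callable returning the DescribeJob response.
    The "JobStatus" key is an assumption about the response shape.
    """
    deadline = clock() + timeout_seconds
    while True:
        response = describe()
        status = response["JobStatus"]
        if status in TERMINAL_STATUSES:
            return response
        if clock() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout_seconds}s")
        sleep(poll_seconds)


# Stubbed usage: a fake describe call that completes on the third poll.
_responses = iter([
    {"JobStatus": "InProgress"},
    {"JobStatus": "InProgress"},
    {
        "JobStatus": "Completed",
        "OutputModelPackageArn":
            "arn:aws:sagemaker:us-west-2:123456789012:model-package/my-final-models/5",
    },
])
final = wait_for_job(lambda: next(_responses), poll_seconds=0, sleep=lambda s: None)
print(final["JobStatus"])
```

With boto3, the `describe` argument could wrap a call such as `sm.describe_job(JobName="my-agent-rft-job", JobCategory="AgentRFT")`, assuming the client exposes DescribeJob under that method name.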

Use the SDK:


# Submit without blocking, then wait with a polling interval and a timeout
job = trainer.train(wait=False)
job.wait(poll=5, timeout=3000, max_log_lines=10)

# Check status
job.refresh()
print(f"Status: {job.job_status}")
print(f"Secondary Status: {job.secondary_status}")
print(f"Output Model Package: {job.output_model_package_arn}")
print(f"MLflow Details: {job.mlflow_details}")
print(f"Billable Tokens: {job.billable_token_usage}")

# Open MLflow tracking URL
job.get_mlflow_url()

# Stop a running job
job.stop()

# Attach to an existing job from a different session
existing_job = MultiTurnRLTrainer.attach(job_name="my-existing-job-name")
print(f"Status: {existing_job.job_status}")
print(f"Output Model: {existing_job.output_model_package_arn}")

# List all completed jobs
from sagemaker.train.agent_rft_job import AgentRFTJob
for j in AgentRFTJob.get_all(status_equals="Completed"):
    print(f"{j.job_name}: {j.job_status}")

Monitor Training in MLflow

SageMaker AI automatically integrates with managed MLflow to track your training job's progress, metrics, and artifacts. To enable MLflow tracking, include an MlflowConfig in your job's OutputDataConfig:


"OutputDataConfig": {
    "S3OutputPath": "s3://your-bucket/output/",
    "MlflowConfig": {
        "MlflowResourceArn": "arn:aws:sagemaker:us-west-2:123456789012:mlflow-app/my-rft-mlflow-app"
    }
}

Prerequisites

Create a managed MLflow App in your account. For setup instructions, see MLflow App Setup.
Ensure your SageMaker AI execution role has permissions to write to the MLflow App (sagemaker-mlflow:* actions).
Include the MlflowResourceArn in your job configuration.

What gets logged

#	Category	What's logged	Where in MLflow UI
1	Training metrics	Per-step counters, throughput, datum and token accounting, wall-clock duration of each phase of a step, trajectory-reward summary per rollout batch, and turn-count distribution per trajectory	Metrics tab (time-series charts)
2	Trajectory traces	Full multi-turn conversations with tool calls and rewards	Traces tab

Detailed training metrics reference

The following metrics are logged at each training step.

Step counters and throughput (training/)

Metric	Description
`training/epoch`	Current epoch number
`training/global_step`	Global training step counter
`training/num_groups`	Trajectory groups in this step
`training/num_trajectories`	Total trajectories processed in this step
`training/total_tokens`	Tokens summed across all micro-batches in this step
`training/num_datums`	Training datums formed from trajectories
`training/datums_per_trajectory`	Average datums emitted per trajectory
`training/action_tokens_mean`	Mean action (response) tokens per trajectory
`training/obs_tokens_mean`	Mean observation (prompt) tokens per trajectory
`training/trainable_token_positions`	Total trainable target positions in this step
`training/nontrainable_token_positions`	Total non-trainable target positions in this step
`training/trainable_token_ratio`	Ratio: `trainable / (trainable + nontrainable)` token positions

Phase durations (timing_s/)

Metric	Description
`timing_s/step`	Total time for the full step
`timing_s/training`	Time for forward/backward passes and optimizer step
`timing_s/policy_update`	Time saving updated weights for the sampler
`timing_s/save_checkpoint`	Time saving a checkpoint (only on checkpoint steps)
`timing_s/eval`	Time running evaluation (only on eval steps)

Reward distribution (rollout/reward/)

Metric	Description
`rollout/reward/mean`	Mean trajectory reward across all groups
`rollout/reward/valid_mean`	Mean reward over only the valid (non-zero-advantage) groups; equals `mean` when no filtering occurred
`rollout/reward/std`	Standard deviation of trajectory rewards
`rollout/reward/min`	Minimum trajectory reward
`rollout/reward/max`	Maximum trajectory reward
`rollout/reward/zero_frac`	Fraction of trajectories with total reward exactly 0.0
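To make the reward metrics concrete, the following sketch computes the same summary from a plain list of trajectory rewards. It is an illustration of the definitions in the table, not the service's implementation. In particular, the use of the population standard deviation is an assumption, because the table does not say which variant is logged. `reward_summary` is a hypothetical helper.

```python
import statistics


def reward_summary(rewards):
    """Summarize trajectory rewards using the definitions in the table above."""
    if not rewards:
        raise ValueError("rewards must not be empty")
    return {
        "mean": statistics.fmean(rewards),
        # Assumption: population standard deviation.
        "std": statistics.pstdev(rewards),
        "min": min(rewards),
        "max": max(rewards),
        # Fraction of trajectories whose reward is exactly 0.0.
        "zero_frac": sum(1 for r in rewards if r == 0.0) / len(rewards),
    }


summary = reward_summary([0.0, 1.0, 1.0, 0.0])
print(summary)
```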

Turn counts (rollout/turns/)

Metric	Description
`rollout/turns/mean`	Mean turns (transitions) per trajectory
`rollout/turns/min`	Minimum turns across trajectories
`rollout/turns/max`	Maximum turns across trajectories

Token lengths (rollout/tokens/)

Metric	Description
`rollout/tokens/prompt_mean`	Mean prompt token count per transition
`rollout/tokens/response_mean`	Mean response token count per transition
`rollout/tokens/response_std`	Standard deviation of response token counts
`rollout/tokens/response_min`	Minimum response tokens
`rollout/tokens/response_max`	Maximum response tokens (watch for clustering at `sampling_max_tokens`)
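Clustering at `sampling_max_tokens` usually means responses are being cut off. One way to quantify it, sketched here under the assumption that you have exported per-response token counts yourself, is the fraction of responses that reach the limit. `truncation_fraction` is a hypothetical helper and the sample counts are made up for illustration.

```python
def truncation_fraction(response_token_counts, sampling_max_tokens):
    """Fraction of responses whose length reaches the sampling limit."""
    if not response_token_counts:
        raise ValueError("response_token_counts must not be empty")
    hit_limit = sum(1 for n in response_token_counts if n >= sampling_max_tokens)
    return hit_limit / len(response_token_counts)


# Two of the four illustrative responses stop exactly at the 512-token limit.
frac = truncation_fraction([120, 512, 87, 512], sampling_max_tokens=512)
print(frac)
```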

Log-probability health (rollout/logprob/)

Metric	Description
`rollout/logprob/zero_count`	Total zero-logprob tokens
`rollout/logprob/zero_frac`	Fraction of all logprobs that are exactly 0.0
`rollout/logprob/zero_per_group`	Average zero logprobs per trajectory group
`rollout/logprob/nz_mean`	Mean of non-zero logprobs
`rollout/logprob/nz_std`	Standard deviation of non-zero logprobs
`rollout/logprob/nz_min`	Minimum non-zero logprob
`rollout/logprob/nz_max`	Maximum non-zero logprob

Advantage distribution (rollout/advantage/)

Metric	Description
`rollout/advantage/mean`	Mean advantage value across all transitions
`rollout/advantage/std`	Standard deviation of advantages
`rollout/advantage/min`	Minimum advantage
`rollout/advantage/max`	Maximum advantage
`rollout/advantage/n_positive`	Transitions with positive advantage
`rollout/advantage/n_negative`	Transitions with negative advantage

Batch-quality classification (analysis/)

Metric	Description
`analysis/batch_completion_ratio`	`total_completed / batch_size` — fraction of expected groups that arrived
`analysis/batch_valid_ratio`	`valid_count / batch_size` — non-zero-advantage groups relative to full batch
`analysis/zero_adv_groups`	Groups where all transitions have near-zero advantage
`analysis/zero_adv_nonzero_reward`	Zero-advantage groups where at least one transition has reward not equal to 0 (all-correct case for binary rewards)
`analysis/zero_adv_zero_reward`	Zero-advantage groups where all rewards are 0 (all-wrong case)
`analysis/reward_variance_across_groups`	Variance of per-group mean rewards (high = diverse batch)
`analysis/mean_group_reward_spread`	Average within-group reward spread `max - min`
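The batch-quality metrics are easier to read with a worked example. The sketch below assumes a group-relative advantage (each trajectory's reward minus its group mean), so a group has zero advantage exactly when all of its rewards are equal. That assumption, the `eps` tolerance, and the helper name `batch_quality` are illustrative and are not confirmed by this section.

```python
import statistics


def batch_quality(groups, batch_size, eps=1e-8):
    """Classify groups of trajectory rewards.

    groups: list of non-empty lists, one reward per trajectory.
    Assumption: advantage = reward - group mean, so a group whose rewards
    are all (nearly) equal has zero advantage.
    """
    spreads = [max(g) - min(g) for g in groups]
    zero_adv = [g for g, s in zip(groups, spreads) if s <= eps]
    nonzero_reward = sum(1 for g in zero_adv if any(r != 0 for r in g))
    group_means = [statistics.fmean(g) for g in groups]
    return {
        "batch_completion_ratio": len(groups) / batch_size,
        "batch_valid_ratio": (len(groups) - len(zero_adv)) / batch_size,
        "zero_adv_groups": len(zero_adv),
        "zero_adv_nonzero_reward": nonzero_reward,
        "zero_adv_zero_reward": len(zero_adv) - nonzero_reward,
        "reward_variance_across_groups": statistics.pvariance(group_means),
        "mean_group_reward_spread": statistics.fmean(spreads),
    }


# Three of four expected groups arrived: all-correct, all-wrong, and mixed.
groups = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 1.0, 0.0]]
quality = batch_quality(groups, batch_size=4)
print(quality)
```

In this example only the mixed group contributes a learning signal, which is why `batch_valid_ratio` is much lower than `batch_completion_ratio`.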

Evaluation reward and pass@k (val/reward/)

Emitted at baseline (step 0), at every val_every interval, and at the final step. Includes the same distribution metrics as rollout/reward plus group-reward metrics aggregated by prompt.

Distribution:

Metric	Description
`val/reward/mean`	Mean reward over the eval set
`val/reward/std`	Reward std dev
`val/reward/min`	Minimum reward
`val/reward/max`	Maximum reward
`val/reward/zero_frac`	Fraction of zero-reward trajectories

Group-reward (per-prompt aggregation):

Metric	Description
`val/reward/min_within_groups`	Average per-prompt minimum reward
`val/reward/mean_within_groups`	Average per-prompt mean reward
`val/reward/max_within_groups`	Average per-prompt maximum reward
`val/reward/std_within_groups`	Average per-prompt reward std (consistency)
`val/reward/rollouts_per_prompt`	Mean rollouts (n) across prompts
`val/reward/num_prompts`	Distinct prompts evaluated

Pass@k and success accounting:

Metric	Description
`val/reward/succeeded_rollouts`	Total rollouts with reward ≥ `success_threshold`
`val/reward/failed_rollouts`	Total rollouts with reward < `success_threshold`
`val/reward/success_threshold`	Threshold used (echoed for clarity)
`val/reward/pass_at_{k}`	Probability that at least one of k samples passes
`val/reward/pass_power_{k}`	Probability all k samples pass (reliability)
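This section does not state which estimator the service uses for these two metrics. A common choice, shown here as an assumption, is the combinatorial estimator over n rollouts of a prompt with c successes: the chance that a random subset of k rollouts contains at least one success (pass@k) or only successes (pass^k). The helper names are hypothetical.

```python
from math import comb


def count_successes(rewards, success_threshold):
    """Number of rollouts with reward >= success_threshold."""
    return sum(1 for r in rewards if r >= success_threshold)


def _check(n, c, k):
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n, c, k):
    """Probability that at least one of k rollouts, drawn without
    replacement from n rollouts with c successes, passes."""
    _check(n, c, k)
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_power_k(n, c, k):
    """Probability that all k drawn rollouts pass."""
    _check(n, c, k)
    return comb(c, k) / comb(n, k)


rewards = [0.2, 0.9, 1.0, 0.4]          # four rollouts for one prompt
c = count_successes(rewards, success_threshold=0.5)
print(c, pass_at_k(len(rewards), c, 2), pass_power_k(len(rewards), c, 2))
```

Under this estimator, per-prompt values would then be averaged across prompts to obtain a single number per k.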

Evaluation turn counts (val/turns/)

Metric	Description
`val/turns/mean`	Mean turns per eval trajectory
`val/turns/min`	Minimum turns
`val/turns/max`	Maximum turns

Evaluation token lengths (val/tokens/)

Metric	Description
`val/tokens/prompt_mean`	Mean prompt tokens per transition
`val/tokens/response_mean`	Mean response tokens per transition
`val/tokens/response_std`	Standard deviation of response tokens
`val/tokens/response_min`	Minimum response tokens
`val/tokens/response_max`	Maximum response tokens

Evaluation log-probability health (val/logprob/)

Metric	Description
`val/logprob/zero_count`	Total zero-logprob tokens
`val/logprob/zero_frac`	Fraction of zero logprobs
`val/logprob/zero_per_group`	Zero logprobs per group
`val/logprob/nz_mean`	Mean of non-zero logprobs
`val/logprob/nz_std`	Standard deviation of non-zero logprobs
`val/logprob/nz_min`	Minimum non-zero logprob
`val/logprob/nz_max`	Maximum non-zero logprob

Accessing the MLflow UI

Access the MLflow UI via a presigned URL:


aws sagemaker create-presigned-mlflow-app-url \
  --arn arn:aws:sagemaker:us-west-2:123456789012:mlflow-app/mlflow-app-id \
  --region us-west-2

Copy the AuthorizedUrl from the output into your browser.

Agent trajectories and traces

During training, SageMaker AI records every interaction between your agent and the policy model as a trajectory — the complete record of one rollout. Each trajectory captures every prompt sent to the model, every response generated, every tool call made, and the final reward. Trajectories are published to your MLflow experiment as structured traces.

Trace contents

The input prompt from your training dataset
Every model inference turn (prompt, response, and token-level data)
Tool calls and their results, if your agent uses tools
The final reward score
Timing information for each turn

Viewing trajectories in the MLflow UI

Open the MLflow UI using the presigned URL above. Navigate to your experiment's run and select the Traces tab. Each trace represents one completed rollout and shows:

The system prompt and user prompt
Each assistant response (with thinking/reasoning if applicable)
Tool use spans showing which tools were called and their outputs
The reward score assigned to the trajectory

Use trajectories to debug low reward scores

Symptom	What to look for
Low reward across most rollouts	Are model responses coherent? Is the prompt format correct?
Tool-related failures	Are tool calls succeeding? Are inputs and outputs well-formed?
Agent looping	Is the agent repeating the same actions without making progress?
Truncated responses	Are responses being cut off by the maxTokens limit?
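The agent-looping symptom can also be checked mechanically once tool calls have been extracted from a trace. How to export tool calls from MLflow traces is not covered here, so the representation below (a list of `(tool_name, arguments)` tuples in call order) and the helper `longest_repeat_run` are hypothetical. The sketch reports the longest run of identical consecutive calls.

```python
def longest_repeat_run(tool_calls):
    """Length of the longest run of identical consecutive tool calls.

    tool_calls: list of (tool_name, arguments) tuples in call order.
    This representation is an assumption, not the MLflow trace schema.
    """
    longest = current = 0
    previous = object()  # sentinel that equals no real call
    for call in tool_calls:
        current = current + 1 if call == previous else 1
        longest = max(longest, current)
        previous = call
    return longest


calls = [
    ("search", '{"q": "weather"}'),
    ("search", '{"q": "weather"}'),
    ("search", '{"q": "weather"}'),
    ("answer", "{}"),
]
run = longest_repeat_run(calls)
print(run)
```

A long run relative to the total number of turns suggests that the agent is repeating an action without making progress.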

Get Training Results

When a training job completes, your trained model weights are stored as a SageMaker AI Model Package. This section explains how to find your results, understand the checkpoint types produced during training, and use them for deployment or continued training.

How results are stored

SageMaker AI stores training output as versioned, immutable Model Packages inside Model Package Groups. Multi-turn RL uses two separate groups, which you specify when creating a job:

Group	Purpose	Contents
Output Model Package Group	Final trained model	HuggingFace-compatible LoRA adapter weights (adapter_config.json + adapter_model.safetensors)
Intermediate Checkpoint Model Package Group	Resumable training state	LoRA adapter weights + optimizer states + training step metadata

Configure both groups in your ModelPackageConfig:


"ModelPackageConfig": {
  "OutputModelPackageGroupArn": "arn:aws:sagemaker:us-west-2:123456789012:model-package-group/my-final-models",
  "IntermediateCheckpointModelPackageGroupArn": "arn:aws:sagemaker:us-west-2:123456789012:model-package-group/my-intermediate-checkpoints"
}

Checkpoint types

Training produces two types of checkpoints, saved at every training step:

Model checkpoint (weights only)

Stored in the Output Model Package Group
Contains HuggingFace-compatible LoRA adapter weights in SafeTensors format
Use for inference, deployment, or as the starting point for a new training job
Created at every step, at job completion, and when a job is stopped

Resumable checkpoint (full state)

Stored in the Intermediate Checkpoint Model Package Group
Contains LoRA adapter weights, optimizer states, and per-GPU training step metadata
Use to resume an interrupted job from the exact step it stopped
Internal format — not directly usable for inference

Checkpoint lifecycle


Step 1 → Intermediate Checkpoint (resumable)
Step 1 → Intermediate Checkpoint (HF-compatible)
...
Step N-1 → Intermediate Checkpoint (resumable)
Step N-1 → Intermediate Checkpoint (HF-compatible)
...
Step N (final) → Model Checkpoint (HuggingFace LoRA) → Output Model Package Group

Retrieve your trained model

When a job completes successfully, the final model is saved as a Model Package in the Output Model Package Group. The OutputModelPackageArn field on the job record contains the ARN.

Check job completion and retrieve the output model ARN:


aws sagemaker describe-job \
  --job-name "my-agent-rft-job" \
  --job-category AgentRFT \
  --region us-west-2

Look for OutputModelPackageArn in the response. Use it to describe the Model Package and get the S3 location of the weights:


aws sagemaker describe-model-package \
  --model-package-name "arn:aws:sagemaker:us-west-2:123456789012:model-package/my-final-models/5"

If a job fails or is stopped before completion, the last intermediate checkpoint is promoted to the Output Model Package Group on a best-effort basis. Check OutputModelPackageArn the same way.

To monitor checkpoint creation during training, watch the ResumableCheckpoint and ModelCheckpoint fields in DescribeJob output.

Resume an interrupted job

If a job fails or is stopped mid-training, you can start a new job that picks up from the exact step where it left off. The platform restores the full training state — weights, optimizer momentum, and step counter — from the resumable checkpoint.

Specify a resumable checkpoint from the Intermediate Checkpoint Model Package Group as InputModelPackageArn:


"ModelPackageConfig": {
  "OutputModelPackageGroupArn": "arn:aws:sagemaker:us-west-2:123456789012:model-package-group/my-final-models",
  "IntermediateCheckpointModelPackageGroupArn": "arn:aws:sagemaker:us-west-2:123456789012:model-package-group/my-intermediate-checkpoints",
  "InputModelPackageArn": "arn:aws:sagemaker:us-west-2:123456789012:model-package/my-intermediate-checkpoints/5"
}

The InputModelPackageArn must point to a resumable checkpoint (one with IsCheckpoint=true in its Model Package metadata). Training resumes from the step after the checkpoint — for example, if the checkpoint was saved at step 4, training continues from step 5.

The following must stay the same between the original job and the resumed job:

Base model
LoRA configuration (rank and alpha)
Hyperparameters (learning rate, batch size, etc.)
Dataset
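Because a resumed job must match the original on all four items above, a quick comparison before submission can catch accidental changes. The following is a minimal sketch. The dictionary keys (`base_model`, `lora_config`, `hyperparameters`, `dataset`), the LoRA and learning-rate values, and the helper `resume_mismatches` are all hypothetical, so map them to however you record your own job settings.

```python
# Hypothetical keys for the four items that must not change on resume.
RESUME_INVARIANTS = ("base_model", "lora_config", "hyperparameters", "dataset")


def resume_mismatches(original, resumed, keys=RESUME_INVARIANTS):
    """Return the invariant keys whose values differ between two jobs."""
    return [key for key in keys if original.get(key) != resumed.get(key)]


original_job = {
    "base_model": "openai-reasoning-gpt-oss-20b",
    "lora_config": {"rank": 16, "alpha": 32},          # illustrative values
    "hyperparameters": {"learning_rate": 1e-5, "global_batch_size": 32},
    "dataset": "s3://my-bucket/prompts/prompts.parquet",
}
# The learning rate was changed by mistake in the resumed job.
resumed_job = {
    **original_job,
    "hyperparameters": {"learning_rate": 5e-6, "global_batch_size": 32},
}
mismatches = resume_mismatches(original_job, resumed_job)
print(mismatches)
```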

Continue training on a new job (iterative training)

Iterative training lets you build on a previously trained model with a different dataset, different hyperparameters, or a refined reward function. Unlike resuming, this starts a fresh training run — the optimizer resets, the step counter resets to 0, and only the trained LoRA weights carry over.

Specify a model checkpoint from the Output Model Package Group as InputModelPackageArn:


"ModelPackageConfig": {
  "OutputModelPackageGroupArn": "arn:aws:sagemaker:us-west-2:123456789012:model-package-group/my-final-models",
  "IntermediateCheckpointModelPackageGroupArn": "arn:aws:sagemaker:us-west-2:123456789012:model-package-group/my-intermediate-checkpoints",
  "InputModelPackageArn": "arn:aws:sagemaker:us-west-2:123456789012:model-package/my-final-models/3"
}
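Resuming and iterative training differ only in which group the InputModelPackageArn comes from: the Intermediate Checkpoint group for a resume, the Output group for a new iteration. The sketch below builds the ModelPackageConfig and checks that pairing. It assumes the input package lives in one of the two groups passed to the same job, as in the examples in this section. The helper names and the `mode` values are hypothetical.

```python
def _resource(arn):
    """Return the resource part of an ARN (text after the fifth colon)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[5]


def group_of_package(model_package_arn):
    """model-package/<group>/<version> -> <group>"""
    kind, _, rest = _resource(model_package_arn).partition("/")
    group, _, version = rest.rpartition("/")
    if kind != "model-package" or not group or not version.isdigit():
        raise ValueError(f"not a versioned model package: {model_package_arn!r}")
    return group


def name_of_group(group_arn):
    """model-package-group/<group> -> <group>"""
    kind, _, name = _resource(group_arn).partition("/")
    if kind != "model-package-group" or not name:
        raise ValueError(f"not a model package group: {group_arn!r}")
    return name


def build_model_package_config(output_group_arn, intermediate_group_arn,
                               input_package_arn=None, mode="new"):
    """mode is "new", "resume", or "iterate" (hypothetical labels)."""
    if output_group_arn == intermediate_group_arn:
        raise ValueError("output and intermediate groups must differ")
    config = {
        "OutputModelPackageGroupArn": output_group_arn,
        "IntermediateCheckpointModelPackageGroupArn": intermediate_group_arn,
    }
    if mode == "new":
        return config
    source = {"resume": intermediate_group_arn, "iterate": output_group_arn}[mode]
    if input_package_arn is None:
        raise ValueError(f"mode {mode!r} needs an input model package ARN")
    if group_of_package(input_package_arn) != name_of_group(source):
        raise ValueError(f"input package is not from the group used for {mode!r}")
    config["InputModelPackageArn"] = input_package_arn
    return config


PREFIX = "arn:aws:sagemaker:us-west-2:123456789012:"
OUTPUT = PREFIX + "model-package-group/my-final-models"
INTERMEDIATE = PREFIX + "model-package-group/my-intermediate-checkpoints"

resume_cfg = build_model_package_config(
    OUTPUT, INTERMEDIATE,
    PREFIX + "model-package/my-intermediate-checkpoints/5", mode="resume")
iterate_cfg = build_model_package_config(
    OUTPUT, INTERMEDIATE,
    PREFIX + "model-package/my-final-models/3", mode="iterate")
print(resume_cfg["InputModelPackageArn"])
print(iterate_cfg["InputModelPackageArn"])
```

For curriculum-style iteration, the output model package ARN of one job can be passed as the input of the next with `mode="iterate"`.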

What you can change between iterations:

Hyperparameters (learning rate, batch size, max_steps, group_size, etc.)
Dataset (different prompts or data distribution)
Reward function
Agent configuration

What must stay the same:

Base model — the LoRA adapter is tied to the base model architecture

Common patterns for iterative training:

Curriculum learning — train on easier problems first, then continue on harder ones
Reward refinement — start with a simple reward function, then iterate with a more nuanced one
Hyperparameter adjustment — increase batch size or tune learning rate after observing initial training dynamics

Checkpoint best practices

Monitor checkpoint creation. Use DescribeJob to track ResumableCheckpoint and ModelCheckpoint fields during training so you know what's available if you need to resume.
Plan for failures on long jobs. If a job has many steps, design your workflow to resume from checkpoints rather than restart from scratch.
