

# Foundation models and hyperparameters for fine-tuning
<a name="jumpstart-foundation-models-fine-tuning"></a>

Foundation models are computationally expensive to train and are trained on a large, unlabeled corpus. Fine-tuning a pre-trained foundation model is an affordable way to take advantage of its broad capabilities while customizing the model on your own small corpus. Fine-tuning is a customization method that involves further training and does change the weights of your model. 

Fine-tuning might be useful to you if you need: 
+ to customize your model to specific business needs
+ your model to successfully work with domain-specific language, such as industry jargon, technical terms, or other specialized vocabulary
+ enhanced performance for specific tasks
+ accurate, relevant, and context-aware responses in applications
+ responses that are more factual, less toxic, and better-aligned to specific requirements

There are two main approaches that you can take for fine-tuning depending on your use case and chosen foundation model.

1. If you're interested in fine-tuning your model on domain-specific data, see [Fine-tune a large language model (LLM) using domain adaptation](jumpstart-foundation-models-fine-tuning-domain-adaptation.md).

1. If you're interested in instruction-based fine-tuning using prompt and response examples, see [Fine-tune a large language model (LLM) using prompt instructions](jumpstart-foundation-models-fine-tuning-instruction-based.md).

## Foundation models available for fine-tuning
<a name="jumpstart-foundation-models-fine-tuning-models"></a>

You can fine-tune any of the following JumpStart foundation models:
+ Bloom 3B
+ Bloom 7B1
+ BloomZ 3B FP16
+ BloomZ 7B1 FP16
+ Code Llama 13B
+ Code Llama 13B Python
+ Code Llama 34B
+ Code Llama 34B Python
+ Code Llama 70B
+ Code Llama 70B Python
+ Code Llama 7B
+ Code Llama 7B Python
+ CyberAgentLM2-7B-Chat (CALM2-7B-Chat)
+ Falcon 40B BF16
+ Falcon 40B Instruct BF16
+ Falcon 7B BF16
+ Falcon 7B Instruct BF16
+ Flan-T5 Base
+ Flan-T5 Large
+ Flan-T5 Small
+ Flan-T5 XL
+ Flan-T5 XXL
+ Gemma 2B
+ Gemma 2B Instruct
+ Gemma 7B
+ Gemma 7B Instruct
+ GPT-2 XL
+ GPT-J 6B
+ GPT-Neo 1.3B
+ GPT-Neo 125M
+ GPT-Neo 2.7B
+ LightGPT Instruct 6B
+ Llama 2 13B
+ Llama 2 13B Chat
+ Llama 2 13B Neuron
+ Llama 2 70B
+ Llama 2 70B Chat
+ Llama 2 7B
+ Llama 2 7B Chat
+ Llama 2 7B Neuron
+ Mistral 7B
+ Mixtral 8x7B
+ Mixtral 8x7B Instruct
+ RedPajama INCITE Base 3B V1
+ RedPajama INCITE Base 7B V1
+ RedPajama INCITE Chat 3B V1
+ RedPajama INCITE Chat 7B V1
+ RedPajama INCITE Instruct 3B V1
+ RedPajama INCITE Instruct 7B V1
+ Stable Diffusion 2.1

## Commonly supported fine-tuning hyperparameters
<a name="jumpstart-foundation-models-fine-tuning-hyperparameters"></a>

Different foundation models support different hyperparameters when fine-tuning. The following are commonly supported hyperparameters that can further customize your model during training:


| Hyperparameter | Description | 
| --- | --- | 
| `epoch` | The number of passes that the model takes through the fine-tuning dataset during training. Must be an integer greater than 1.  | 
| `learning_rate` |  The rate at which the model weights are updated after working through each batch of fine-tuning training examples. Must be a positive float.  | 
| `instruction_tuned` |  Whether to instruction-train the model or not. Must be `'True'` or `'False'`.  | 
| `per_device_train_batch_size` |  The batch size per GPU core or CPU for training. Must be a positive integer. | 
| `per_device_eval_batch_size` |  The batch size per GPU core or CPU for evaluation. Must be a positive integer.  | 
| `max_train_samples` |  For debugging purposes or quicker training, truncate the number of training examples to this value. Value -1 means that the model uses all of the training samples. Must be a positive integer or -1.  | 
| `max_val_samples` |  For debugging purposes or quicker training, truncate the number of validation examples to this value. Value -1 means that the model uses all of the validation samples. Must be a positive integer or -1.  | 
| `max_input_length` |  Maximum total input sequence length after tokenization. Sequences longer than this will be truncated. If -1, `max_input_length` is set to the minimum of 1024 and the `model_max_length` defined by the tokenizer. If set to a positive value, `max_input_length` is set to the minimum of the provided value and the `model_max_length` defined by the tokenizer. Must be a positive integer or -1.  | 
| `validation_split_ratio` |  If there is no validation channel, ratio of train-validation split from the training data. Must be between 0 and 1.  | 
| `train_data_split_seed` |  If validation data is not present, this fixes the random splitting of the input training data to training and validation data used by the model. Must be an integer.  | 
| `preprocessing_num_workers` |  The number of processes to use for pre-processing. If `None`, the main process is used for pre-processing.  | 
| `lora_r` |  Low-rank adaptation (LoRA) r value, which is the rank of the low-rank update matrices. Must be a positive integer.  | 
| `lora_alpha` |  Low-rank adaptation (LoRA) alpha value, which acts as the scaling factor for weight updates. Generally 2 to 4 times the size of `lora_r`. Must be a positive integer.  | 
| `lora_dropout` |  Dropout value for low-rank adaptation (LoRA) layers. Must be a positive float between 0 and 1.  | 
| `int8_quantization` |  If `True`, model is loaded with 8 bit precision for training.  | 
| `enable_fsdp` |  If `True`, training uses Fully Sharded Data Parallelism.  | 

You can specify hyperparameter values when you fine-tune your model in Studio. For more information, see [Fine-tune a model in Studio](jumpstart-foundation-models-use-studio-updated-fine-tune.md). 

You can also override default hyperparameter values when fine-tuning your model using the SageMaker Python SDK. For more information, see [Fine-tune publicly available foundation models with the `JumpStartEstimator` class](jumpstart-foundation-models-use-python-sdk-estimator-class.md).
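Before submitting a training job, it can help to check your chosen values against the constraints in the table above. The following is a minimal, hypothetical sketch of such a local check — the helper name and the subset of hyperparameters validated are illustrative, not part of the SageMaker SDK:

```
# Hypothetical helper: checks a few hyperparameter values against the
# constraints listed in the table above before you start a training job.
def validate_hyperparameters(hp):
    """Return a list of constraint violations (empty if all values are valid)."""
    errors = []
    lr = hp.get("learning_rate")
    if not (isinstance(lr, float) and lr > 0):
        errors.append("learning_rate must be a positive float")
    if hp.get("instruction_tuned") not in ("True", "False"):
        errors.append("instruction_tuned must be 'True' or 'False'")
    ratio = hp.get("validation_split_ratio")
    if not (isinstance(ratio, float) and 0 < ratio < 1):
        errors.append("validation_split_ratio must be between 0 and 1")
    dropout = hp.get("lora_dropout")
    if not (isinstance(dropout, float) and 0 < dropout < 1):
        errors.append("lora_dropout must be a float between 0 and 1")
    return errors

hyperparameters = {
    "learning_rate": 6e-06,
    "instruction_tuned": "True",
    "validation_split_ratio": 0.2,
    "lora_dropout": 0.05,
}
print(validate_hyperparameters(hyperparameters))  # [] when all values are valid
```

A check like this catches type mistakes (for example, passing the boolean `True` instead of the string `'True'` for `instruction_tuned`) before the training job fails remotely.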

# Fine-tune a large language model (LLM) using domain adaptation
<a name="jumpstart-foundation-models-fine-tuning-domain-adaptation"></a>

Domain adaptation fine-tuning allows you to leverage pre-trained foundation models and adapt them to specific tasks using limited domain-specific data. If prompt engineering efforts do not provide enough customization, you can use domain adaptation fine-tuning to get your model working with domain-specific language, such as industry jargon, technical terms, or other specialized data. This fine-tuning process modifies the weights of the model. 

To fine-tune your model on a domain-specific dataset:

1. Prepare your training data. For instructions, see [Prepare and upload training data for domain adaptation fine-tuning](#jumpstart-foundation-models-fine-tuning-domain-adaptation-prepare-data).

1. Create your fine-tuning training job. For instructions, see [Create a training job for domain adaptation fine-tuning](#jumpstart-foundation-models-fine-tuning-domain-adaptation-train).

You can find end-to-end examples in [Example notebooks](#jumpstart-foundation-models-fine-tuning-domain-adaptation-examples).

Domain adaptation fine-tuning is available with the following foundation models:

**Note**  
Some JumpStart foundation models, such as Llama 2 7B, require acceptance of an end-user license agreement before fine-tuning and performing inference. For more information, see [End-user license agreements](jumpstart-foundation-models-choose.md#jumpstart-foundation-models-choose-eula).
+ Bloom 3B
+ Bloom 7B1
+ BloomZ 3B FP16
+ BloomZ 7B1 FP16
+ GPT-2 XL
+ GPT-J 6B
+ GPT-Neo 1.3B
+ GPT-Neo 125M
+ GPT-Neo 2.7B
+ Llama 2 13B
+ Llama 2 13B Chat
+ Llama 2 13B Neuron
+ Llama 2 70B
+ Llama 2 70B Chat
+ Llama 2 7B
+ Llama 2 7B Chat
+ Llama 2 7B Neuron

## Prepare and upload training data for domain adaptation fine-tuning
<a name="jumpstart-foundation-models-fine-tuning-domain-adaptation-prepare-data"></a>

Training data for domain adaptation fine-tuning can be provided in CSV, JSON, or TXT file format. All training data must be in a single file within a single folder.

The training data is taken from the **Text** column for CSV or JSON training data files. If no column is labeled **Text**, then the training data is taken from the first column for CSV or JSON training data files.
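The column-selection rule for CSV files can be sketched with the standard library's `csv` module. This is an illustrative sketch of the described behavior, not the actual training-script implementation; the sample data is hypothetical:

```
import csv
import io

def extract_training_text(csv_source):
    """Prefer the column named 'Text'; otherwise fall back to the first column."""
    reader = csv.reader(io.StringIO(csv_source))
    header = next(reader)
    col = header.index("Text") if "Text" in header else 0
    return [row[col] for row in reader]

sample = "id,Text\n1,First training document.\n2,Second training document.\n"
print(extract_training_text(sample))
# ['First training document.', 'Second training document.']
```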

The following is an example body of a TXT file to be used for fine-tuning:

```
This report includes estimates, projections, statements relating to our
business plans, objectives, and expected operating results that are “forward-
looking statements” within the meaning of the Private Securities Litigation
Reform Act of 1995, Section 27A of the Securities Act of 1933, and Section 21E
of ....
```

### Split data for training and testing
<a name="jumpstart-foundation-models-fine-tuning-domain-adaptation-split-data"></a>

You can optionally provide another folder containing validation data. This folder should also include one CSV, JSON, or TXT file. If no validation dataset is provided, then a set amount of the training data is set aside for validation purposes. You can adjust the percentage of training data used for validation when you choose the hyperparameters for fine-tuning your model. 
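The exact split logic is internal to the training script, but the interaction of the `validation_split_ratio` and `train_data_split_seed` hyperparameters can be sketched roughly as follows. The function name and split mechanics here are illustrative assumptions:

```
import random

def split_train_validation(examples, validation_split_ratio=0.2, seed=0):
    """Sketch of a seeded train/validation split: shuffle deterministically,
    then hold out a fraction of examples for validation."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)  # seed plays the role of train_data_split_seed
    n_val = int(len(shuffled) * validation_split_ratio)
    return shuffled[n_val:], shuffled[:n_val]

train, val = split_train_validation(list(range(10)), validation_split_ratio=0.2, seed=42)
print(len(train), len(val))  # 8 2
```

Because the seed fixes the shuffle, rerunning the job with the same `train_data_split_seed` reproduces the same split.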

### Upload fine-tuning data to Amazon S3
<a name="jumpstart-foundation-models-fine-tuning-domain-adaptation-upload-data"></a>

Upload your prepared data to Amazon Simple Storage Service (Amazon S3) to use when fine-tuning a JumpStart foundation model. You can use the following commands to upload your data:

```
from sagemaker.s3 import S3Uploader
import sagemaker

output_bucket = sagemaker.Session().default_bucket()
local_data_file = "train.txt"
train_data_location = f"s3://{output_bucket}/training_folder"
S3Uploader.upload(local_data_file, train_data_location)
print(f"Training data: {train_data_location}")
```

## Create a training job for domain adaptation fine-tuning
<a name="jumpstart-foundation-models-fine-tuning-domain-adaptation-train"></a>

After your data is uploaded to Amazon S3, you can fine-tune and deploy your JumpStart foundation model. To fine-tune your model in Studio, see [Fine-tune a model in Studio](jumpstart-foundation-models-use-studio-updated-fine-tune.md). To fine-tune your model using the SageMaker Python SDK, see [Fine-tune publicly available foundation models with the `JumpStartEstimator` class](jumpstart-foundation-models-use-python-sdk-estimator-class.md).

## Example notebooks
<a name="jumpstart-foundation-models-fine-tuning-domain-adaptation-examples"></a>

For more information on domain adaptation fine-tuning, see the following example notebooks:
+ [SageMaker JumpStart Foundation Models - Fine-tuning text generation GPT-J 6B model on domain specific dataset](https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/jumpstart-foundation-models/domain-adaption-finetuning-gpt-j-6b.html)
+ [Fine-tune LLaMA 2 models on JumpStart](https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/jumpstart-foundation-models/llama-2-finetuning.html)

# Fine-tune a large language model (LLM) using prompt instructions
<a name="jumpstart-foundation-models-fine-tuning-instruction-based"></a>

Instruction-based fine-tuning uses labeled examples to improve the performance of a pre-trained foundation model on a specific task. The labeled examples are formatted as prompt-response pairs and phrased as instructions. This fine-tuning process modifies the weights of the model. For more information on instruction-based fine-tuning, see the papers [Introducing FLAN: More generalizable Language Models with Instruction Fine-Tuning](https://ai.googleblog.com/2021/10/introducing-flan-more-generalizable.html) and [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416).

Fine-tuned LAnguage Net (FLAN) models use instruction tuning to make models more amenable to solving general downstream NLP tasks. Amazon SageMaker JumpStart provides a number of foundation models in the FLAN model family. For example, FLAN-T5 models are instruction fine-tuned on a wide range of tasks to increase zero-shot performance for a variety of common use cases. With additional data and fine-tuning, instruction-based models can be further adapted to more specific tasks that weren’t considered during pre-training. 

To fine-tune an LLM on a specific task using labeled prompt-response pairs:

1. Prepare your instructions in JSON files. For more information about the required format for the prompt-response pair files and the structure of the data folder, see [Prepare and upload training data for instruction-based fine-tuning](#jumpstart-foundation-models-fine-tuning-instruction-based-prepare-data).

1. Create your fine-tuning training job. For instructions, see [Create a training job for instruction-based fine-tuning](#jumpstart-foundation-models-fine-tuning-instruction-based-train).

You can find end-to-end examples in [Example notebooks](#jumpstart-foundation-models-fine-tuning-instruction-based-examples).

Only a subset of JumpStart foundation models are compatible with instruction-based fine-tuning. Instruction-based fine-tuning is available with the following foundation models: 

**Note**  
Some JumpStart foundation models, such as Llama 2 7B, require acceptance of an end-user license agreement before fine-tuning and performing inference. For more information, see [End-user license agreements](jumpstart-foundation-models-choose.md#jumpstart-foundation-models-choose-eula).
+ Flan-T5 Base
+ Flan-T5 Large
+ Flan-T5 Small
+ Flan-T5 XL
+ Flan-T5 XXL
+ Llama 2 13B
+ Llama 2 13B Chat
+ Llama 2 13B Neuron
+ Llama 2 70B
+ Llama 2 70B Chat
+ Llama 2 7B
+ Llama 2 7B Chat
+ Llama 2 7B Neuron
+ Mistral 7B
+ RedPajama INCITE Base 3B V1
+ RedPajama INCITE Base 7B V1
+ RedPajama INCITE Chat 3B V1
+ RedPajama INCITE Chat 7B V1
+ RedPajama INCITE Instruct 3B V1
+ RedPajama INCITE Instruct 7B V1

## Prepare and upload training data for instruction-based fine-tuning
<a name="jumpstart-foundation-models-fine-tuning-instruction-based-prepare-data"></a>

Training data for instruction-based fine-tuning must be provided in JSON Lines text file format, where each line is a dictionary. All training data must be in a single folder. The folder can include multiple .jsonl files. 

The training folder can also include a template JSON file (`template.json`) that describes the input and output formats of your data. If no template file is provided, the following template file is used: 

```
{
  "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}",
  "completion": "{response}"
}
```

According to the `template.json` file, each .jsonl entry of the training data must include `{instruction}`, `{context}`, and `{response}` fields. 
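A training record that satisfies the default template can be written to a .jsonl file with one JSON object per line. This is a minimal sketch; the record values and the `train.jsonl` filename are illustrative:

```
import json

# Illustrative records with the fields required by the default template.
records = [
    {
        "instruction": "Summarize the following paragraph.",
        "context": "Amazon S3 is an object storage service that stores data as objects in buckets.",
        "response": "S3 stores data as objects in buckets.",
    },
]

with open("train.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```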

If you provide a custom template JSON file, use the `"prompt"` and `"completion"` keys to define your own required fields. According to the following custom template JSON file, each .jsonl entry of the training data must include `{question}`, `{context}`, and `{answer}` fields:

```
{
  "prompt": "question: {question} context: {context}",
  "completion": "{answer}"
}
```
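The mapping from a template to one training record works like Python string formatting: each `{placeholder}` in the `"prompt"` and `"completion"` strings is filled from the matching field of the .jsonl entry. A minimal illustrative sketch, using the custom template above with a hypothetical record:

```
# Illustrative: how a template's "prompt" and "completion" strings map onto
# the fields of a single .jsonl training record.
template = {
    "prompt": "question: {question} context: {context}",
    "completion": "{answer}",
}
record = {
    "question": "What does S3 stand for?",
    "context": "Amazon S3 is an object storage service.",
    "answer": "Simple Storage Service",
}

prompt = template["prompt"].format(**record)
completion = template["completion"].format(**record)
print(prompt)      # question: What does S3 stand for? context: Amazon S3 is an object storage service.
print(completion)  # Simple Storage Service
```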

### Split data for training and testing
<a name="jumpstart-foundation-models-fine-tuning-instruction-based-split-data"></a>

You can optionally provide another folder containing validation data. This folder should also include one or more .jsonl files. If no validation dataset is provided, then a set amount of the training data is set aside for validation purposes. You can adjust the percentage of training data used for validation when you choose the hyperparameters for fine-tuning your model. 

### Upload fine-tuning data to Amazon S3
<a name="jumpstart-foundation-models-fine-tuning-instruction-based-upload-data"></a>

Upload your prepared data to Amazon Simple Storage Service (Amazon S3) to use when fine-tuning a JumpStart foundation model. You can use the following commands to upload your data:

```
from sagemaker.s3 import S3Uploader
import sagemaker

output_bucket = sagemaker.Session().default_bucket()
local_data_file = "train.jsonl"
train_data_location = f"s3://{output_bucket}/dolly_dataset"
S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload("template.json", train_data_location)
print(f"Training data: {train_data_location}")
```

## Create a training job for instruction-based fine-tuning
<a name="jumpstart-foundation-models-fine-tuning-instruction-based-train"></a>

After your data is uploaded to Amazon S3, you can fine-tune and deploy your JumpStart foundation model. To fine-tune your model in Studio, see [Fine-tune a model in Studio](jumpstart-foundation-models-use-studio-updated-fine-tune.md). To fine-tune your model using the SageMaker Python SDK, see [Fine-tune publicly available foundation models with the `JumpStartEstimator` class](jumpstart-foundation-models-use-python-sdk-estimator-class.md).

## Example notebooks
<a name="jumpstart-foundation-models-fine-tuning-instruction-based-examples"></a>

For more information on instruction-based fine-tuning, see the following example notebooks:
+ [Fine-tune LLaMA 2 models on JumpStart](https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/jumpstart-foundation-models/llama-2-finetuning.html)
+ [Introduction to SageMaker JumpStart - Text Generation with Mistral models](https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/jumpstart-foundation-models/mistral-7b-instruction-domain-adaptation-finetuning.html)
+ [Introduction to SageMaker JumpStart - Text Generation with Falcon models](https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/jumpstart-foundation-models/falcon-7b-instruction-domain-adaptation-finetuning.html)
+ [SageMaker JumpStart Foundation Models - HuggingFace Text2Text Instruction Fine-Tuning](https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/jumpstart-foundation-models/instruction-fine-tuning-flan-t5.html)