本文為英文版的機器翻譯版本，如內容有任何歧義或不一致之處，概以英文版為準。

# HyperPod Slurm 叢集 DPO 教學課程 (GPU)
<a name="hyperpod-gpu-slurm-dpo-tutorial"></a>

下列教學課程會設定 Slurm 環境，並在 Llama 80 億參數模型上啟動直接喜好設定最佳化 (DPO) 任務。

**先決條件**  
開始設定環境之前，請確定您具有下列先決條件：  
設定 HyperPod GPU Slurm 叢集  
您的 HyperPod Slurm 叢集必須啟用 Nvidia Enroot 和 Pyxis (這些項目預設為啟用)。
共用儲存位置。它可以是可從叢集節點存取的 Amazon FSx 檔案系統或 NFS 系統。
採用下列其中一種格式的記號化二進位喜好設定資料集：  
JSON
JSONGZ (壓縮 JSON)
ARROW
(選用) 如果您需要來自 HuggingFace 的預先訓練權重，或者如果您要訓練 Llama 3.2 模型，則您必須在開始訓練之前取得 HuggingFace 權杖。如需取得權杖的詳細資訊，請參閱[使用者存取權杖](https://huggingface.co/docs/hub/en/security-tokens)。

## 設定 HyperPod GPU Slurm 環境
<a name="hyperpod-gpu-slurm-dpo-hyperpod-gpu-slurm-environment"></a>

若要在 Slurm 叢集上啟動訓練任務，請執行下列動作：
+ 對 Slurm 叢集的主節點執行 SSH。
+ 登入後，請設定虛擬環境。請確定您使用的是 Python 3.9 或更新版本。

  ```
  #set up a virtual environment
  python3 -m venv ${PWD}/venv
  source venv/bin/activate
  ```
+ 將 SageMaker HyperPod 配方和 SageMaker HyperPod 轉接器儲存庫複製到共用儲存位置。共用儲存位置可以是可從叢集節點存取的 Amazon FSx 檔案系統或 NFS 系統。

  ```
  git clone https://github.com/aws/sagemaker-hyperpod-training-adapter-for-nemo.git
  git clone --recursive https://github.com/aws/sagemaker-hyperpod-recipes.git
  cd sagemaker-hyperpod-recipes
  pip3 install -r requirements.txt
  ```
+ 使用 Enroot 建立 squash 檔案。若要尋找 SMP 容器的最新版本，請參閱 [SageMaker 模型平行化程式庫的版本備註](model-parallel-release-notes.md)。如需使用 Enroot 檔案的詳細資訊，請參閱[建置最佳化 Nemo-Launcher AWS映像](https://github.com/aws-samples/awsome-distributed-training/tree/main/3.test_cases/2.nemo-launcher#2-build-aws-optimized-nemo-launcher-image)。

  ```
  REGION="{{<region>}}"
  IMAGE="658645717510.dkr.ecr.${REGION}.amazonaws.com/smdistributed-modelparallel:2.4.1-gpu-py311-cu121"
  aws ecr get-login-password --region ${REGION} | docker login --username AWS --password-stdin 658645717510.dkr.ecr.${REGION}.amazonaws.com
  enroot import -o $PWD/smdistributed-modelparallel.sqsh dockerd://${IMAGE}
  mv $PWD/smdistributed-modelparallel.sqsh "/fsx/{{<any-path-in-the-shared-filesystem>}}"
  ```
+ 若要使用 Enroot squash 檔案開始訓練，請使用下列範例來修改 `recipes_collection/config.yaml` 檔案。

  ```
  container: /fsx/path/to/your/smdistributed-modelparallel.sqsh
  ```

## 啟動訓練任務
<a name="hyperpod-gpu-slurm-dpo-launch-training-job"></a>

若要在單一 Slurm 運算節點上為序列長度為 8192 的 Llama 80 億參數模型啟動 DPO 任務，請將啟動指令碼 `launcher_scripts/llama/run_hf_llama3_8b_seq8k_gpu_dpo.sh` 設定為下列項目：
+ `IMAGE`：來自環境設定區段的容器。
+ `HF_MODEL_NAME_OR_PATH`：在配方的 hf\_model\_name\_or\_path 參數中定義預先訓練權重的名稱或路徑。
+ (選用) 如果您需要來自 HuggingFace 的預先訓練權重，您可以設定下列金鑰/值對，以提供 HuggingFace 權杖：

  ```
  recipes.model.hf_access_token=${HF_ACCESS_TOKEN}
  ```

**注意**  
此設定中用於 DPO 的參考模型自動衍生自正在訓練的基礎模型 (未明確定義任何個別的參考模型)。DPO 特定超參數已預先設定下列預設值：  
`beta`：0.1 (控制 KL 散度正規化的強度)
`label_smoothing`：0.0 (未將平滑套用至喜好設定標籤)

```
recipes.dpo.beta=${BETA}
recipes.dpo.label_smoothing=${LABEL_SMOOTHING}
```

```
#!/bin/bash
IMAGE="${YOUR_IMAGE}"
SAGEMAKER_TRAINING_LAUNCHER_DIR="${SAGEMAKER_TRAINING_LAUNCHER_DIR:-${PWD}}"

TRAIN_DIR="${YOUR_TRAIN_DIR}" # Location of training dataset
VAL_DIR="${YOUR_VAL_DIR}" # Location of validation dataset
# experiment output directory
EXP_DIR="${YOUR_EXP_DIR}"
HF_ACCESS_TOKEN="${YOUR_HF_TOKEN}"
HF_MODEL_NAME_OR_PATH="${HF_MODEL_NAME_OR_PATH}"
BETA="${BETA}"
LABEL_SMOOTHING="${LABEL_SMOOTHING}"

# Add hf_model_name_or_path and turn off synthetic_data
HYDRA_FULL_ERROR=1 python3 ${SAGEMAKER_TRAINING_LAUNCHER_DIR}/main.py \
recipes=fine-tuning/llama/hf_llama3_8b_seq8k_gpu_dpo \
base_results_dir=${SAGEMAKER_TRAINING_LAUNCHER_DIR}/results \
recipes.run.name="hf_llama3_dpo" \
recipes.exp_manager.exp_dir="$EXP_DIR" \
recipes.model.data.train_dir="$TRAIN_DIR" \
recipes.model.data.val_dir="$VAL_DIR" \
recipes.model.hf_model_name_or_path="$HF_MODEL_NAME_OR_PATH" \
container="${IMAGE}" \
+cluster.container_mounts.0="/fsx:/fsx" \
recipes.model.hf_access_token="${HF_ACCESS_TOKEN}" \
recipes.dpo.enabled=true \
recipes.dpo.beta="${BETA}" \
recipes.dpo.label_smoothing="${LABEL_SMOOTHING}$" \
```

在上述指令碼中設定了所有必要參數之後，您可以透過執行該指令碼來啟動訓練任務。

```
bash launcher_scripts/llama/run_hf_llama3_8b_seq8k_gpu_dpo.sh
```

如需 Slurm 叢集組態的詳細資訊，請參閱 [在 HyperPod Slurm 上執行訓練任務](cluster-specific-configurations-run-training-job-hyperpod-slurm.md)。