This document is a machine translation of the English version. If anything is ambiguous or inconsistent, the English version prevails.

# Implement MLOps
<a name="mlops"></a>

Amazon SageMaker AI supports features for implementing machine learning models in production environments with continuous integration and deployment. The following topics describe how to set up MLOps infrastructure when you use SageMaker AI.

**Topics**
+ [Why use MLOps?](sagemaker-projects-why.md)
+ [SageMaker Experiments](experiments-mlops.md)
+ [SageMaker AI Workflows](workflows.md)
+ [Amazon SageMaker ML lineage tracking](lineage-tracking.md)
+ [Model registration and deployment with Model Registry](model-registry.md)
+ [Model deployment in SageMaker AI](model-deploy-mlops.md)
+ [SageMaker Model Monitor](model-monitor-mlops.md)
+ [MLOps automation with SageMaker Projects](sagemaker-projects.md)
+ [Amazon SageMaker AI MLOps troubleshooting](mlopsfaq.md)

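# Why use MLOps?
<a name="sagemaker-projects-why"></a>

As you move from running individual artificial intelligence and machine learning (AI/ML) projects to using AI/ML to transform your business at scale, the discipline of ML operations (MLOps) can help. MLOps accounts for the unique aspects of AI/ML projects in project management, CI/CD, and quality assurance, helping you shorten delivery time, reduce defects, and make data science more productive. MLOps is a methodology built on applying DevOps practices to machine learning workloads. For a discussion of DevOps principles, see the whitepaper [Introduction to DevOps on AWS](https://docs.aws.amazon.com/whitepapers/latest/introduction-devops-aws/welcome.html?did=wp_card). To learn more about implementation using AWS services, see [Practicing CI/CD on AWS](https://d1.awsstatic.com/whitepapers/DevOps/practicing-continuous-integration-continuous-delivery-on-AWS.pdf) and [Infrastructure as Code](https://d1.awsstatic.com/whitepapers/DevOps/infrastructure-as-code.pdf).

Like DevOps, MLOps relies on a collaborative and streamlined approach to the machine learning development lifecycle, in which the intersection of people, process, and technology optimizes the end-to-end activities required to develop, build, and operate machine learning workloads.

MLOps focuses on the intersection of data science and data engineering, combined with existing DevOps practices, to streamline model delivery across the machine learning development lifecycle. MLOps is the discipline of integrating ML workloads into release management, CI/CD, and operations. MLOps requires the integration of software development, operations, data engineering, and data science.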

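## Challenges with MLOps
<a name="sagemaker-projects-why-challenges"></a>

Although MLOps can provide valuable tools to help you scale your business, you might face certain issues as you integrate MLOps into your machine learning workloads.

**Project management**
+ ML projects involve data scientists, a relatively new role that is often not integrated into cross-functional teams. These new team members often speak a very different technical language than product owners and software engineers, which compounds the usual problem of translating business requirements into technical requirements.

**Communication and collaboration**
+ Building visibility into ML projects and enabling collaboration across different stakeholders, such as data engineers, data scientists, ML engineers, and DevOps, is increasingly important to ensure successful outcomes.

**Everything is code**
+ Use of production data in development activities, longer experimentation lifecycles, dependencies on data pipelines, retraining deployment pipelines, and unique metrics for evaluating the performance of a model.
+ Models often have a lifecycle independent of the applications and systems that integrate with those models.
+ The entire end-to-end system is reproducible through versioned code and artifacts. DevOps projects use Infrastructure-as-Code (IaC) and Configuration-as-Code (CaC) to build environments, and Pipelines-as-Code (PaC) to ensure consistent CI/CD patterns. The pipelines have to integrate with big data and ML training workflows. This often means that the pipeline is a combination of a traditional CI/CD tool and another workflow engine. Many ML projects have important policy concerns, so the pipeline might also need to enforce those policies. Biased input data produces biased results, an increasing concern for business stakeholders.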

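**CI/CD**
+ In MLOps, the source data is a first-class input, along with the source code. That's why MLOps calls for versioning the source data and initiating pipeline runs when the source or inference data changes.
+ Pipelines must also version the ML models, along with inputs and other outputs, to provide traceability.
+ Automated testing must include proper validation of the ML model during build phases and when the model is in production.
+ Build phases might include model training and retraining, a time-consuming and resource-intensive process. Pipelines must be granular enough to perform a full training cycle only when the source data or ML code changes, not when related components change.
+ Because machine learning code is typically a small part of an overall solution, a deployment pipeline might also incorporate the additional steps required to package a model for consumption as an API by other applications and systems.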
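
The granularity requirement in the CI/CD list can be sketched with content fingerprints. The following is a minimal illustration, not part of SageMaker AI: the helper names `fingerprint` and `needs_full_training`, and the component keys `source_data`, `ml_code`, and `api_wrapper`, are assumptions made up for this example. It triggers a full training cycle only when the data or ML code fingerprint differs from the previous run.

```python
import hashlib


def fingerprint(blobs: dict) -> str:
    """Return a stable SHA-256 digest over named byte blobs (sorted by name)."""
    digest = hashlib.sha256()
    for name in sorted(blobs):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(hashlib.sha256(blobs[name]).digest())
    return digest.hexdigest()


def needs_full_training(previous: dict, current: dict) -> bool:
    """Retrain only when the source data or ML code fingerprint changed."""
    watched = ("source_data", "ml_code")  # hypothetical component names
    return any(previous.get(key) != current.get(key) for key in watched)


last_run = {
    "source_data": fingerprint({"train.csv": b"a,b\n1,2\n"}),
    "ml_code": fingerprint({"train.py": b"print('fit')"}),
    "api_wrapper": fingerprint({"app.py": b"v1"}),
}
# Only the serving wrapper changed, so a full training cycle is not needed.
this_run = dict(last_run, api_wrapper=fingerprint({"app.py": b"v2"}))
print(needs_full_training(last_run, this_run))
```

In a real pipeline, the fingerprints would more likely come from dataset versions and commit IDs than from hashing file contents directly.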

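**Monitoring and logging**
+ The feature engineering and model training phases need to capture model training metrics as well as model experiments. Tuning an ML model requires manipulating the form of the input data as well as algorithm hyperparameters, and systematically capturing those experiments. Experiment tracking helps data scientists work more effectively and gives a reproducible snapshot of their work.
+ Deployed ML models require monitoring of the data passed to the model for inference, along with the standard endpoint stability and performance metrics. The monitoring system must also capture the quality of model output, as evaluated by an appropriate ML metric.

## Benefits of MLOps
<a name="sagemaker-projects-benefits"></a>

Adopting MLOps practices gives you faster time-to-market for ML projects by delivering the following benefits.
+ **Productivity**: Providing self-service environments with access to curated data sets lets data engineers and data scientists move faster and waste less time on missing or invalid data.
+ **Repeatability**: Automating all the steps in the machine learning development lifecycle (MLDC) helps you ensure a repeatable process, including how the model is trained, evaluated, versioned, and deployed.
+ **Reliability**: Incorporating CI/CD practices lets you deploy not only quickly but also with increased quality and consistency.
+ **Auditability**: Versioning all inputs and outputs, from data science experiments to source data to the trained model, means that we can demonstrate exactly how the model was built and where it was deployed.
+ **Data and model quality**: MLOps lets us enforce policies that guard against model bias and track changes to data statistical properties and model quality over time.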

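# SageMaker Experiments
<a name="experiments-mlops"></a>

Building ML models requires many iterations of training as you tune the algorithm, model architecture, and parameters to achieve high prediction accuracy. You can use Amazon SageMaker Experiments to track the inputs and outputs across these training iterations, which improves the repeatability of trials and collaboration within your team. You can also track parameters, metrics, datasets, and other artifacts related to your model training jobs. SageMaker Experiments offers a single interface where you can visualize your in-progress training jobs, share experiments within your team, and deploy models directly from an experiment.

To learn about SageMaker Experiments, see [Amazon SageMaker Experiments in Studio Classic](experiments.md).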

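# SageMaker AI Workflows
<a name="workflows"></a>

As you scale your machine learning (ML) operations, you can use Amazon SageMaker AI fully managed workflow services to implement continuous integration and deployment (CI/CD) practices for your ML lifecycle. With the Pipelines SDK, you choose pipeline steps and integrate them into a unified solution that automates the model-building process from data preparation to model deployment. For Kubernetes-based architectures, you can install SageMaker AI Operators on your Kubernetes cluster to create SageMaker AI jobs natively using the Kubernetes API and command-line tools such as `kubectl`. With SageMaker AI components for Kubeflow Pipelines, you can create and monitor native SageMaker AI jobs from your Kubeflow Pipelines. The job parameters, status, and outputs from SageMaker AI are accessible from the Kubeflow Pipelines UI. Lastly, if you want to schedule batch jobs, you can use either the AWS Batch job queue integration or the Jupyter notebook-based workflow service to initiate standalone or recurring runs on a schedule you define.

In summary, SageMaker AI offers the following workflow technologies:
+ [Pipelines](pipelines.md): Tool for building and managing ML pipelines.
+ [Kubernetes orchestration](kubernetes-workflows.md): SageMaker AI custom operators for your Kubernetes cluster and components for Kubeflow Pipelines.
+ [SageMaker Notebook Jobs](notebook-auto-run.md): On-demand or scheduled non-interactive batch runs of your Jupyter notebook.

You can also use other services that integrate with SageMaker AI to build your workflow. Options include the following services:
+ [Airflow workflows](https://sagemaker.readthedocs.io/en/stable/workflows/airflow/index.html): SageMaker APIs to export configurations for creating and managing Airflow workflows.
+ [AWS Step Functions](https://sagemaker.readthedocs.io/en/stable/workflows/step_functions/index.html): Multi-step ML workflows in Python that orchestrate SageMaker AI infrastructure without having to provision your resources separately.
+ [AWS Batch](https://docs.aws.amazon.com/batch/latest/userguide/getting-started-sagemaker.html): Submit SageMaker AI training jobs to an AWS Batch job queue, where you can prioritize and schedule jobs to run in a compute environment.

For more information about managing SageMaker training and inference, see [Amazon SageMaker Python SDK Workflows](https://sagemaker.readthedocs.io/en/stable/workflows/index.html).

**Topics**
+ [Pipelines](pipelines.md)
+ [Kubernetes orchestration](kubernetes-workflows.md)
+ [SageMaker Notebook Jobs](notebook-auto-run.md)
+ [Schedule your ML workflows](workflow-scheduling.md)
+ [AWS Batch support for SageMaker AI training jobs](training-job-queues.md)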

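# Pipelines
<a name="pipelines"></a>

Amazon SageMaker Pipelines is a purpose-built workflow orchestration service that automates machine learning (ML) development.

Pipelines provides the following advantages over other AWS workflow offerings:

**Auto-scaling serverless infrastructure** You don't need to manage the underlying orchestration infrastructure to run Pipelines, which lets you focus on core ML tasks. SageMaker AI automatically provisions, scales, and shuts down the pipeline orchestration compute resources as your ML workload demands.

**Intuitive user experience** You can create and manage pipelines through the interface of your choice: visual editor, SDK, APIs, or JSON. You can drag and drop the various ML steps to author your pipelines in the Amazon SageMaker Studio visual interface. The following screenshot shows the Studio visual editor for pipelines.

![\[Screenshot of the visual drag-and-drop interface for Pipelines in Studio.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/pipelines/pipelines-studio-overview.png)


If you prefer managing your ML workflows programmatically, the SageMaker Python SDK offers advanced orchestration features. For more information, see [Amazon SageMaker Pipelines](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html) in the SageMaker Python SDK documentation.

**AWS integrations** Pipelines provides seamless integration with all SageMaker AI features and other AWS services to automate data processing, model training, fine-tuning, evaluation, deployment, and monitoring jobs. You can incorporate SageMaker AI features in your pipelines and navigate across them using deep links to create, monitor, and debug your ML workflows at scale.

**Reduced costs** With Pipelines, you pay only for the SageMaker Studio environment and the underlying jobs that Pipelines orchestrates (for example, SageMaker Training, SageMaker Processing, SageMaker AI Inference, and Amazon S3 data storage).

**Auditability and lineage tracking** With Pipelines, you can track the history of your pipeline updates and executions using built-in versioning. Amazon SageMaker ML Lineage Tracking helps you analyze the data sources and data consumers in the end-to-end ML development lifecycle.

**Topics**
+ [Pipelines overview](pipelines-overview.md)
+ [Pipelines actions](pipelines-build.md)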

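# Pipelines overview
<a name="pipelines-overview"></a>

An Amazon SageMaker AI pipeline is a series of interconnected steps in a directed acyclic graph (DAG), defined using the drag-and-drop UI or the [Pipelines SDK](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html). You can also build your pipeline using the [pipeline definition JSON schema](https://aws-sagemaker-mlops.github.io/sagemaker-model-building-pipeline-definition-JSON-schema/). This DAG JSON definition gives information on the requirements of, and relationships between, each step of your pipeline. The structure of a pipeline's DAG is determined by the data dependencies between steps. These data dependencies are created when the properties of one step's output are passed as the input to another step. The following image is an example of a pipeline DAG:

![\[\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/pipeline-full.png)


**The example DAG includes the following steps:**

1. `AbaloneProcess`, an instance of the [Processing](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-processing) step, runs a preprocessing script on the data used for training. For example, the script could fill in missing values, normalize numerical data, or split data into train, validation, and test datasets.

1. `AbaloneTrain`, an instance of the [Training](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-training) step, configures hyperparameters and trains a model from the preprocessed input data.

1. `AbaloneEval`, another instance of the [Processing](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-processing) step, evaluates the model for accuracy. This step shows an example of a data dependency: it uses the test dataset output of `AbaloneProcess`.

1. `AbaloneMSECond` is an instance of the [Condition](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-condition) step. In this example, it checks that the mean-square-error result of model evaluation is below a certain limit. If the model does not meet the criteria, the pipeline run stops.

1. Otherwise, the pipeline run proceeds with the following steps:

   1. `AbaloneRegisterModel`, where SageMaker AI calls a [RegisterModel](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-register-model) step to register the model as a versioned model package group in the Amazon SageMaker Model Registry.

   1. `AbaloneCreateModel`, where SageMaker AI calls a [CreateModel](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-create-model) step to create the model in preparation for batch transform. In `AbaloneTransform`, SageMaker AI calls a [Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-transform) step to generate model predictions on a dataset you specify.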
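
To illustrate how an execution order can fall out of data dependencies alone, here is a minimal sketch that uses the Python standard library rather than the Pipelines service. The dependency edges are an assumption inferred from the step descriptions above (only the `AbaloneEval` to `AbaloneProcess` edge is stated explicitly), and `execution_waves` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# Map each step to the upstream steps whose output it consumes.
# These edges are assumed for illustration.
dependencies = {
    "AbaloneProcess": set(),
    "AbaloneTrain": {"AbaloneProcess"},
    "AbaloneEval": {"AbaloneProcess", "AbaloneTrain"},
    "AbaloneMSECond": {"AbaloneEval"},
    "AbaloneRegisterModel": {"AbaloneMSECond"},
    "AbaloneCreateModel": {"AbaloneMSECond"},
    "AbaloneTransform": {"AbaloneCreateModel"},
}


def execution_waves(deps):
    """Group steps into waves; steps in the same wave could run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for index, wave in enumerate(execution_waves(dependencies), start=1):
    print(index, wave)
```

Under these assumed edges, `AbaloneRegisterModel` and `AbaloneCreateModel` land in the same wave because neither depends on the other.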

The following topics describe fundamental Pipelines concepts. For a tutorial that implements these concepts, see [Pipelines actions](pipelines-build.md).

**Topics**
+ [Pipeline structure and execution](build-and-manage-pipeline.md)
+ [IAM access management](build-and-manage-access.md)
+ [Set up cross-account support for Pipelines](build-and-manage-xaccount.md)
+ [Pipeline parameters](build-and-manage-parameters.md)
+ [Pipelines steps](build-and-manage-steps.md)
+ [Lift-and-shift code with the @step decorator](pipelines-step-decorator.md)
+ [Pass data between steps](build-and-manage-propertyfile.md)
+ [Caching pipeline steps](pipelines-caching.md)
+ [Retry policy for pipeline steps](pipelines-retry-policy.md)
+ [Selective execution of pipeline steps](pipelines-selective-ex.md)
+ [Baseline calculation, drift detection, and lifecycle with ClarifyCheck and QualityCheck steps in Amazon SageMaker Pipelines](pipelines-quality-clarify-baseline-lifecycle.md)
+ [Schedule pipeline runs](pipeline-eventbridge.md)
+ [Amazon SageMaker Experiments integration](pipelines-experiments.md)
+ [Run pipelines using local mode](pipelines-local-mode.md)
+ [Troubleshooting Amazon SageMaker Pipelines](pipelines-troubleshooting.md)

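# Pipeline structure and execution
<a name="build-and-manage-pipeline"></a>

**Topics**
+ [Pipeline structure](#build-and-manage-pipeline-structure)
+ [Pipeline execution using parallelism configuration](#build-and-manage-pipeline-execution)

## Pipeline structure
<a name="build-and-manage-pipeline-structure"></a>

An Amazon SageMaker Pipelines instance is composed of a `name`, `parameters`, and `steps`. Pipeline names must be unique within an `(account, region)` pair. All parameters used in step definitions must be defined in the pipeline. The listed pipeline steps automatically determine their order of execution from their data dependencies on one another. The Pipelines service resolves the relationships between steps in the data dependency DAG to create a series of steps that the execution completes. The following is an example of a pipeline structure.

**Warning**  
When building a pipeline through either the visual editor or the SageMaker AI Python SDK, do not include sensitive information in pipeline parameters or any step definition field (such as environment variables). These fields will be visible in the future when returned in a `DescribePipeline` request.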

```
from sagemaker.workflow.pipeline import Pipeline

# The parameters and steps referenced here are assumed to be defined earlier.
pipeline_name = "AbalonePipeline"
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        processing_instance_type,
        processing_instance_count,
        training_instance_type,
        model_approval_status,
        input_data,
        batch_data,
    ],
    steps=[step_process, step_train, step_eval, step_cond],
)
```

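## Pipeline execution using parallelism configuration
<a name="build-and-manage-pipeline-execution"></a>

By default, a pipeline performs all steps that are available to run in parallel. You can control this behavior by using the `ParallelismConfiguration` property when creating or updating a pipeline, as well as when starting or retrying a pipeline execution.

Parallelism configurations are applied per execution. For example, if the limit is 50 steps and two executions are started, each execution can run a maximum of 50 steps concurrently, for a total of 100 concurrently running steps. Also, a `ParallelismConfiguration` specified when starting, retrying, or updating an execution takes precedence over the parallelism configuration defined in the pipeline.

**Example Creating a pipeline execution with `ParallelismConfiguration`**  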

```
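from sagemaker.workflow.pipeline import Pipeline

pipeline = Pipeline(
    name="myPipeline",
    steps=[step_process, step_train],
)

# Limit each execution of this pipeline to 50 concurrently running steps.
pipeline.create(role, parallelism_config={"MaxParallelExecutionSteps": 50})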
```
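
The precedence and per-execution rules described in this section can be sketched in plain Python. This is only an illustration of the stated behavior, not service code: `effective_max_parallel_steps` and `total_concurrent_steps` are hypothetical helper names, and the dictionaries mirror the `MaxParallelExecutionSteps` key used in the example above.

```python
def effective_max_parallel_steps(pipeline_config=None, execution_config=None):
    """Return the step limit for one execution, or None if nothing is set."""
    # A value given when starting, retrying, or updating an execution wins
    # over the value stored in the pipeline definition.
    for config in (execution_config, pipeline_config):
        if config and "MaxParallelExecutionSteps" in config:
            return config["MaxParallelExecutionSteps"]
    return None


def total_concurrent_steps(limits_per_execution):
    """The limit applies per execution, so concurrent executions add up."""
    return sum(limits_per_execution)


pipeline_level = {"MaxParallelExecutionSteps": 50}
first = effective_max_parallel_steps(pipeline_level)
second = effective_max_parallel_steps(
    pipeline_level, {"MaxParallelExecutionSteps": 10}
)
print(first, second, total_concurrent_steps([first, second]))
```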

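# IAM access management
<a name="build-and-manage-access"></a>

The following sections describe the AWS Identity and Access Management (IAM) requirements for Amazon SageMaker Pipelines. For an example of how you can implement these permissions, see [Prerequisites](define-pipeline.md#define-pipeline-prereq).

**Topics**
+ [Pipeline role permissions](#build-and-manage-role-permissions)
+ [Pipeline step permissions](#build-and-manage-step-permissions)
+ [CORS configuration with Amazon S3 buckets](#build-and-manage-cors-s3)
+ [Customize access management for Pipelines jobs](#build-and-manage-step-permissions-prefix)
+ [Customize access to pipeline versions](#build-and-manage-step-permissions-version)
+ [Service control policies with Pipelines](#build-and-manage-scp)

## Pipeline role permissions
<a name="build-and-manage-role-permissions"></a>

Your pipeline requires an IAM pipeline execution role that is passed to Pipelines when you create a pipeline. The role for the SageMaker AI instance that you use to create the pipeline must have a policy with the `iam:PassRole` permission that specifies the pipeline execution role. This is because the instance needs permission to pass your pipeline execution role to the Pipelines service for use in creating and running pipelines. For more information on IAM roles, see [IAM Roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html).

Your pipeline execution role requires the following permissions:
+ You can use a unique or customized role for any of the SageMaker AI job steps in your pipeline (rather than the pipeline execution role, which is used by default). Make sure that your pipeline execution role has a policy with the `iam:PassRole` permission that specifies each of these roles.
+  `Create` and `Describe` permissions for each of the job types in the pipeline.
+  Amazon S3 permissions to use the `JsonGet` function. You control access to your Amazon S3 resources using resource-based policies and identity-based policies. A resource-based policy is applied to your Amazon S3 bucket and grants Pipelines access to the bucket. An identity-based policy gives your pipeline the ability to make Amazon S3 calls from your account. For more information on resource-based policies and identity-based policies, see [Identity-based policies and resource-based policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_identity-vs-resource.html).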

  ```
  {
      "Action": [
          "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::<your-bucket-name>/*",
      "Effect": "Allow"
  }
  ```
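
As a rough sketch of how the permissions in this list could be assembled into one policy document, the following Python builds a JSON policy with an `iam:PassRole` statement and the `s3:GetObject` statement shown above. The function name `build_execution_role_policy`, the `Sid` values, the bucket name, and the role ARNs are placeholders invented for this example, and the `Create` and `Describe` job permissions are deliberately left out because they depend on which job types your pipeline uses.

```python
import json


def build_execution_role_policy(bucket_name, step_role_arns):
    """Assemble an illustrative policy for a pipeline execution role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PassStepRoles",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": sorted(step_role_arns),
            },
            {
                "Sid": "ReadJsonGetObjects",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


policy = build_execution_role_policy(
    "amzn-s3-demo-bucket",
    ["arn:aws:iam::111122223333:role/ExampleTrainingStepRole"],
)
print(json.dumps(policy, indent=4))
```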

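## Pipeline step permissions
<a name="build-and-manage-step-permissions"></a>

Pipelines include steps that run SageMaker AI jobs. For the pipeline steps to run these jobs, they require an IAM role in your account that provides access to the needed resources. This role is passed to the SageMaker AI service principal by your pipeline. For more information on IAM roles, see [IAM Roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html).

By default, each step takes on the pipeline execution role. You can optionally pass a different role to any of the steps in your pipeline. This ensures that the code in each step cannot affect resources used in other steps unless there is a direct relationship between the two steps specified in the pipeline definition. You pass these roles when defining the processor or estimator for your step. For examples of how to include these roles in these definitions, see the [SageMaker AI Python SDK documentation](https://sagemaker.readthedocs.io/en/stable/overview.html#using-estimators).

## CORS configuration with Amazon S3 buckets
<a name="build-and-manage-cors-s3"></a>

To ensure that your images are imported into your pipelines from an Amazon S3 bucket in a predictable manner, you must add a CORS configuration to the Amazon S3 buckets that images are imported from. This section provides instructions on how to set the required CORS configuration on your Amazon S3 bucket. The XML `CORSConfiguration` required for Pipelines differs from the one in [CORS Requirement for Input Image Data](sms-cors-update.md); otherwise, you can use the information there to learn more about the CORS requirement with Amazon S3 buckets.

Use the following CORS configuration code for the Amazon S3 buckets that host your images. For instructions on configuring CORS, see [Configuring cross-origin resource sharing (CORS)](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/add-cors-configuration.html) in the Amazon Simple Storage Service User Guide. If you use the Amazon S3 console to add the policy to your bucket, you must use the JSON format.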

**JSON**

```
[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "PUT"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": [
            "Access-Control-Allow-Origin"
        ]
    }
]
```

**XML**

```
<CORSConfiguration>
 <CORSRule>
   <AllowedHeader>*</AllowedHeader>
   <AllowedOrigin>*</AllowedOrigin>
   <AllowedMethod>PUT</AllowedMethod>
   <ExposeHeader>Access-Control-Allow-Origin</ExposeHeader>
 </CORSRule>
</CORSConfiguration>
```

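The following GIF demonstrates the instructions found in the Amazon S3 documentation for adding a CORS header policy using the Amazon S3 console.

![\[GIF showing how to add a CORS header policy using the Amazon S3 console.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/sms/gifs/cors-config.gif)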


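## Customize access management for Pipelines jobs
<a name="build-and-manage-step-permissions-prefix"></a>

You can further customize your IAM policies so that selected members of your organization can run any or all pipeline steps. For example, you can give certain users permission to create training jobs, another group of users permission to create processing jobs, and all of your users permission to run the remaining steps. To use this feature, you select a custom string that prefixes your job name. Your administrator prepends the permitted ARNs with the prefix, while your data scientist includes this prefix in pipeline instantiations. Because the IAM policy for permitted users contains a job ARN with the specified prefix, subsequent jobs of your pipeline step have the necessary permissions to proceed. Job prefixing is off by default. You must turn on this option in your `Pipeline` class to use it.

For jobs with prefixing turned off, the job name is formatted as shown and is a concatenation of the fields described in the following table:

`pipelines-<executionId>-<stepNamePrefix>-<entityToken>-<failureCount>`


| Field | Definition | 
| --- | --- | 
|  pipelines   |  A static string that is always prepended. This string identifies the pipeline orchestration service as the job's source.  | 
|  executionId  |  A randomized buffer for the running instance of the pipeline.  | 
|  stepNamePrefix  |  The user-specified step name (given in the `name` argument of the pipeline step), limited to the first 20 characters.  | 
|  entityToken  |  A randomized token that ensures idempotency of the step entity.  | 
|  failureCount  |  The current number of retries attempted to complete the job.  | 

In this case, no custom prefix is prepended to the job name, and the corresponding IAM policy must match this string.

For users who turn on job prefixing, the underlying job name takes the following form, with the custom prefix specified as `MyBaseJobName`:

*<MyBaseJobName>*-*<executionId>*-*<entityToken>*-*<failureCount>*

The custom prefix replaces the static `pipelines` string to help you narrow the selection of users who can run the SageMaker AI job as part of a pipeline.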
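
The two naming formats can be compared side by side with a small sketch. This code only mimics the formats documented above so you can see what an IAM policy would need to match. `build_job_name` is a hypothetical helper, and the execution ID and token values are made up, because the service generates the real ones.

```python
def build_job_name(execution_id, entity_token, failure_count,
                   step_name=None, custom_prefix=None):
    """Compose a job name in the documented format (illustration only)."""
    if custom_prefix:
        # Prefixing on: <prefix>-<executionId>-<entityToken>-<failureCount>
        parts = [custom_prefix, execution_id, entity_token, str(failure_count)]
    else:
        # Prefixing off: the static "pipelines" string comes first and the
        # step name is limited to its first 20 characters.
        parts = ["pipelines", execution_id, (step_name or "")[:20],
                 entity_token, str(failure_count)]
    return "-".join(parts)


print(build_job_name("a1b2c3", "tok42", 0, step_name="TestTrainingJob"))
print(build_job_name("a1b2c3", "tok42", 0, custom_prefix="MyBaseJobName"))
```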

**Prefix length restrictions**

Job names have internal length constraints specific to individual pipeline steps. These constraints also limit the length of the allowed prefix. The prefix length requirements are as follows:


| Pipeline step | Prefix length | 
| --- | --- | 
|   [`TrainingStep`](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#trainingstep), [`ModelStep`](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#step-collections), [`TransformStep`](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#transformstep), [`ProcessingStep`](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#processingstep), [`ClarifyCheckStep`](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#clarifycheckstep), [`QualityCheckStep`](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#qualitycheckstep), [`RegisterModelStep`](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#step-collections)   |  38  | 
|  [`TuningStep`](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#tuningstep), [`AutoML`](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#automlstep)  |  6  | 
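
A prefix can be checked against these limits before it is used. The sketch below copies the limits from the table; the `validate_prefix` helper and the `AutoMLStep` key (used here for the table's `AutoML` entry) are naming assumptions for this example.

```python
# Maximum prefix length per step type, copied from the table above.
MAX_PREFIX_LENGTH = {
    "TrainingStep": 38,
    "ModelStep": 38,
    "TransformStep": 38,
    "ProcessingStep": 38,
    "ClarifyCheckStep": 38,
    "QualityCheckStep": 38,
    "RegisterModelStep": 38,
    "TuningStep": 6,
    "AutoMLStep": 6,
}


def validate_prefix(prefix, step_types):
    """Return the step types for which the prefix is too long."""
    return [
        step_type
        for step_type in step_types
        if len(prefix) > MAX_PREFIX_LENGTH[step_type]
    ]


# "MyBaseJobName" has 13 characters: fine for training, too long for tuning.
print(validate_prefix("MyBaseJobName", ["TrainingStep", "TuningStep"]))
```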

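### Applying job prefixes to an IAM policy
<a name="build-and-manage-step-permissions-prefix-iam"></a>

Your administrator creates IAM policies that allow users of specific prefixes to create jobs. The following example policy permits data scientists to create training jobs if they use the `MyBaseJobName` prefix.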

```
{
    "Action": "sagemaker:CreateTrainingJob",
    "Effect": "Allow",
    "Resource": [
        "arn:aws:sagemaker:region:account-id:*/MyBaseJobName-*"
    ]
}
```
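
To get a feel for which job ARNs such a `Resource` pattern would cover, you can approximate the wildcard matching locally. This is a rough sketch, not the IAM evaluation engine: it uses `fnmatchcase`, which treats `*` and `?` similarly to IAM but also gives special meaning to `[` and `]`. The Region, account ID, and job names are made-up placeholders.

```python
from fnmatch import fnmatchcase


def resource_matches(pattern, arn):
    """Approximate IAM wildcard matching with case-sensitive globbing."""
    return fnmatchcase(arn, pattern)


pattern = "arn:aws:sagemaker:us-east-1:111122223333:*/MyBaseJobName-*"

prefixed_job = (
    "arn:aws:sagemaker:us-east-1:111122223333:"
    "training-job/MyBaseJobName-a1b2c3-tok42-0"
)
default_job = (
    "arn:aws:sagemaker:us-east-1:111122223333:"
    "training-job/pipelines-a1b2c3-TestTrainingJob-tok42-0"
)

print(resource_matches(pattern, prefixed_job))
print(resource_matches(pattern, default_job))
```

Only the job created with the custom prefix matches, which is what lets the policy separate users by prefix.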

### Applying job prefixes to pipeline instantiations
<a name="build-and-manage-step-permissions-prefix-inst"></a>

You specify your prefix with the `*base_job_name` argument of the job instance class, where `*` stands for the class-specific part of the argument name.

**Note**  
You pass your job prefix with the `*base_job_name` argument to the job instance before creating a pipeline step. This job instance contains the information the job needs to run as a step in a pipeline. The argument varies depending on the job instance used. The following list shows which argument to use for each pipeline step type:  
+ `base_job_name` for the [`Estimator`](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html) ([`TrainingStep`](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#trainingstep)), [`Processor`](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html) ([`ProcessingStep`](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#processingstep)), and [`AutoML`](https://sagemaker.readthedocs.io/en/stable/api/training/automl.html) ([`AutoMLStep`](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#automlstep)) classes
+ `tuning_base_job_name` for the [`Tuner`](https://sagemaker.readthedocs.io/en/stable/api/training/tuner.html) ([`TuningStep`](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#tuningstep)) class
+ `transform_base_job_name` for the [`Transformer`](https://sagemaker.readthedocs.io/en/stable/api/inference/transformer.html) ([`TransformStep`](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#transformstep)) class
+ `base_job_name` of [`CheckJobConfig`](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#checkjobconfig) for the [`QualityCheckStep`](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#qualitycheckstep) and [`ClarifyCheckstep`](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#clarifycheckstep) classes
+ For the [`Model`](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html) class, the argument depends on whether you run `create` or `register` on your model before passing the result to [`ModelStep`](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#step-collections)  
  + If you call `create`, the custom prefix comes from the `name` argument when you construct your model (that is, `Model(name=)`)
  + If you call `register`, the custom prefix comes from the `model_package_name` argument of your call to `register` (that is, `my_model.register(model_package_name=)`)

The following example shows how to specify a prefix for a new training job instance.

```
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep

# Create a job instance; base_job_name carries the custom prefix
xgb_train = Estimator(
    image_uri=image_uri,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    output_path=model_path,
    role=role,
    subnets=["subnet-0ab12c34567de89f0"],
    base_job_name="MyBaseJobName",
    security_group_ids=["sg-1a2bbcc3bd4444e55"],
    tags=[ ... ],
    encrypt_inter_container_traffic=True,
)

# Attach your job instance to a pipeline step
step_train = TrainingStep(
    name="TestTrainingJob",
    estimator=xgb_train,
    inputs={
        "train": TrainingInput(...),
        "validation": TrainingInput(...),
    }
)
```

Job prefixing is off by default. To opt in to this feature, use the `use_custom_job_prefix` option of `PipelineDefinitionConfig`, as shown in the following snippet:

```
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_definition_config import PipelineDefinitionConfig

# Create a definition configuration and toggle on custom prefixing
definition_config = PipelineDefinitionConfig(use_custom_job_prefix=True)

# Create a pipeline with a custom prefix
pipeline = Pipeline(
    name="MyJobPrefixedPipeline",
    parameters=[...],
    steps=[...],
    pipeline_definition_config=definition_config,
)
```

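Create and run your pipeline. The following example creates and runs a pipeline, and also demonstrates how to turn off job prefixing and rerun your pipeline.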

```
import sagemaker

pipeline.create(role_arn=sagemaker.get_execution_role())

# Optionally, call definition() to confirm your prefixed job names are in the built JSON
pipeline.definition()
pipeline.start()
      
# To run a pipeline without custom-prefixes, toggle off use_custom_job_prefix, update the pipeline 
# via upsert() or update(), and start a new run
definition_config = PipelineDefinitionConfig(use_custom_job_prefix=False)
pipeline.pipeline_definition_config = definition_config
pipeline.update()
execution = pipeline.start()
```

Similarly, you can turn the feature on for an existing pipeline and start a new run that uses job prefixes.

```
definition_config = PipelineDefinitionConfig(use_custom_job_prefix=True)
pipeline.pipeline_definition_config = definition_config
pipeline.update()
execution = pipeline.start()
```

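Finally, you can view your custom-prefixed job by calling `list_steps` on the pipeline execution.

```
steps = execution.list_steps()

# list_steps() returns the step summaries as a list; this assumes the first
# entry is the training step. The job name is the last segment of the ARN.
prefixed_training_job_arn = steps[0]['Metadata']['TrainingJob']['Arn']
prefixed_training_job_name = prefixed_training_job_arn.split('/')[-1]
```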

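## Customize access to pipeline versions
<a name="build-and-manage-step-permissions-version"></a>

You can grant customized access to specific versions of Amazon SageMaker Pipelines by using the `sagemaker:PipelineVersionId` condition key. For example, the following policy grants access to start executions or update the pipeline version only for version ID 6 and above.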

------
#### [ JSON ]

```
{
    "Version": "2012-10-17",
    "Statement": {
        "Sid": "AllowStartPipelineExecution",
        "Effect": "Allow",
        "Action": [
            "sagemaker:StartPipelineExecution",
            "sagemaker:UpdatePipelineVersion"
        ],
        "Resource": "*",
        "Condition": {
            "NumericGreaterThanEquals": {
                "sagemaker:PipelineVersionId": 6
            }
        }
    }
}
```

------

For more information about supported condition keys, see [Condition keys for Amazon SageMaker AI](https://docs.aws.amazon.com//service-authorization/latest/reference/list_amazonsagemaker.html#amazonsagemaker-policy-keys).
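
The effect of the `NumericGreaterThanEquals` condition can be sketched as a simple comparison. The following is a simplified illustration and not how IAM evaluates policies in full (it ignores `Resource`, explicit denies, and other statements); `is_allowed` is a hypothetical helper name.

```python
policy_statement = {
    "Effect": "Allow",
    "Action": [
        "sagemaker:StartPipelineExecution",
        "sagemaker:UpdatePipelineVersion",
    ],
    "Condition": {
        "NumericGreaterThanEquals": {"sagemaker:PipelineVersionId": 6}
    },
}


def is_allowed(statement, action, pipeline_version_id):
    """Check one Allow statement against an action and a version ID."""
    if statement["Effect"] != "Allow" or action not in statement["Action"]:
        return False
    condition = statement["Condition"]["NumericGreaterThanEquals"]
    return pipeline_version_id >= condition["sagemaker:PipelineVersionId"]


for version_id in (5, 6, 7):
    print(version_id,
          is_allowed(policy_statement, "sagemaker:StartPipelineExecution", version_id))
```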

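## Service control policies with Pipelines
<a name="build-and-manage-scp"></a>

Service control policies (SCPs) are a type of organization policy that you can use to manage permissions in your organization. SCPs offer central control over the maximum available permissions for all accounts in your organization. By using Pipelines within your organization, you can ensure that data scientists manage your pipeline executions without having to interact with the AWS console. 

If you use a VPC with your SCP to restrict access to Amazon S3, you need to take steps to allow your pipeline to access other Amazon S3 resources.

To allow Pipelines to access Amazon S3 outside of your VPC with the `JsonGet` function, update your organization's SCP to ensure that the role using Pipelines can access Amazon S3. To do this, use a principal tag and condition key to create an exception for the roles that the Pipelines executor uses via the pipeline execution role.

**To allow Pipelines to access Amazon S3 outside of your VPC**

1. Create a unique tag for your pipeline execution role by following the steps in [Tagging IAM users and roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_tags.html).

1. Grant an exception in your SCP using the `aws:PrincipalTag` IAM condition key for the tag you created. For more information, see [Creating, updating, and deleting service control policies](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_scps_create.html).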

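# Set up cross-account support for Pipelines
<a name="build-and-manage-xaccount"></a>

Cross-account support for Amazon SageMaker Pipelines lets you collaborate on machine learning pipelines with other teams or organizations that operate in different AWS accounts. By setting up cross-account pipeline sharing, you can grant controlled access to your pipelines and allow other accounts to view pipeline details, trigger executions, and monitor runs. The following topics cover how to set up cross-account pipeline sharing, the different permission policies available for shared resources, and how to access and interact with shared pipeline entities through direct API calls to SageMaker AI.

## Set up cross-account pipeline sharing
<a name="build-and-manage-xaccount-set-up"></a>

SageMaker AI uses [AWS Resource Access Manager](https://docs.aws.amazon.com/ram/latest/userguide/what-is.html) (AWS RAM) to help you securely share your pipeline entities across accounts.

### Create a resource share
<a name="build-and-manage-xaccount-set-up-console"></a>

1. Select **Create a resource share** through the [AWS RAM](https://console.aws.amazon.com/ram/home) console.

1. When specifying resource share details, choose the Pipelines resource type and select one or more pipelines that you want to share. When you share a pipeline with another account, all of its executions are also shared implicitly.

1. Associate permissions with your resource share. Choose either the default read-only permission policy or the extended pipeline execution permission policy. For more information, see [Permission policies for Pipelines resources](#build-and-manage-xaccount-permissions).
**Note**  
If you select the extended pipeline execution policy, note that any start, stop, and retry commands called by shared accounts use resources in the AWS account that shared the pipeline.

1. Use AWS account IDs to specify the accounts that you want to grant access to your shared resources.

1. Review your resource share configuration and select **Create resource share**. It might take a few minutes for the resource share and principal associations to complete.

For more information, see [Sharing your AWS resources](https://docs.aws.amazon.com/ram/latest/userguide/getting-started-sharing.html) in the *AWS Resource Access Manager User Guide*.

### Get responses to your resource share invitation
<a name="build-and-manage-xaccount-set-up-responses"></a>

After the resource share and principal associations are set, the specified AWS accounts receive an invitation to join the resource share. The AWS accounts must accept the invitation to gain access to any shared resources.

For more information on accepting a resource share invitation through AWS RAM, see [Using shared AWS resources](https://docs.aws.amazon.com/ram/latest/userguide/getting-started-shared.html) in the *AWS Resource Access Manager User Guide*.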

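## Permission policies for Pipelines resources
<a name="build-and-manage-xaccount-permissions"></a>

When creating your resource share, choose one of two supported permission policies to associate with the SageMaker AI pipeline resource type. Both policies grant access to any selected pipeline and all of its executions.

### Default read-only permissions
<a name="build-and-manage-xaccount-permissions-default"></a>

The `AWSRAMDefaultPermissionSageMakerPipeline` policy allows the following read-only actions: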

```
"sagemaker:DescribePipeline"
"sagemaker:DescribePipelineDefinitionForExecution"   
"sagemaker:DescribePipelineExecution"
"sagemaker:ListPipelineExecutions"
"sagemaker:ListPipelineExecutionSteps"
"sagemaker:ListPipelineParametersForExecution"
"sagemaker:Search"
```

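### Extended pipeline execution permissions
<a name="build-and-manage-xaccount-permissions-extended"></a>

The `AWSRAMPermissionSageMakerPipelineAllowExecution` policy includes all of the read-only permissions from the default policy and also allows shared accounts to start, stop, and retry pipeline executions.

**Note**  
Be mindful of AWS resource usage when using the extended pipeline execution permission policy. With this policy, shared accounts are allowed to start, stop, and retry pipeline executions. Any resources used for shared pipeline executions are consumed by the owner account.

The extended pipeline execution permission policy allows the following actions: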

```
"sagemaker:DescribePipeline"
"sagemaker:DescribePipelineDefinitionForExecution"   
"sagemaker:DescribePipelineExecution"
"sagemaker:ListPipelineExecutions"
"sagemaker:ListPipelineExecutionSteps"
"sagemaker:ListPipelineParametersForExecution"
"sagemaker:StartPipelineExecution"
"sagemaker:StopPipelineExecution"
"sagemaker:RetryPipelineExecution"
"sagemaker:Search"
```

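## Access shared pipeline entities through direct API calls
<a name="build-and-manage-xaccount-api-calls"></a>

After cross-account pipeline sharing is set up, you can call the following SageMaker API actions using a pipeline ARN:

**Note**  
You can call an API command only if it is included in the permissions associated with your resource share. If you select the `AWSRAMPermissionSageMakerPipelineAllowExecution` policy, then the start, stop, and retry commands use resources in the AWS account that shared the pipeline.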
+ [DescribePipeline](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribePipeline.html)
+ [DescribePipelineDefinitionForExecution](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribePipelineDefinitionForExecution.html)
+ [DescribePipelineExecution](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribePipelineExecution.html)
+ [ListPipelineExecutions](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ListPipelineExecutions.html)
+ [ListPipelineExecutionSteps](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ListPipelineExecutionSteps.html)
+ [ListPipelineParametersForExecution](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ListPipelineParametersForExecution.html)
+ [StartPipelineExecution](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_StartPipelineExecution.html)
+ [StopPipelineExecution](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_StopPipelineExecution.html)
+ [RetryPipelineExecution](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_RetryPipelineExecution.html)
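
As a minimal sketch of calling two of these actions with a shared pipeline ARN, the helper below accepts any client object that exposes `describe_pipeline` and `list_pipeline_executions`, such as one created with `boto3.client("sagemaker")`. The `summarize_shared_pipeline` function, the stand-in client class, and the ARN are all hypothetical and exist only so the example runs without AWS credentials.

```python
def summarize_shared_pipeline(sagemaker_client, pipeline_arn):
    """Describe a shared pipeline and list its executions by ARN."""
    # Both calls take the pipeline ARN in the PipelineName field.
    pipeline = sagemaker_client.describe_pipeline(PipelineName=pipeline_arn)
    executions = sagemaker_client.list_pipeline_executions(
        PipelineName=pipeline_arn
    )
    return {
        "name": pipeline["PipelineName"],
        "status": pipeline["PipelineStatus"],
        "execution_arns": [
            summary["PipelineExecutionArn"]
            for summary in executions["PipelineExecutionSummaries"]
        ],
    }


class FakeSageMakerClient:
    """Stand-in for a real SageMaker client so the sketch runs offline."""

    def describe_pipeline(self, PipelineName):
        return {
            "PipelineName": PipelineName.split("/")[-1],
            "PipelineStatus": "Active",
        }

    def list_pipeline_executions(self, PipelineName):
        return {
            "PipelineExecutionSummaries": [
                {"PipelineExecutionArn": f"{PipelineName}/execution/abc123"}
            ]
        }


shared_arn = "arn:aws:sagemaker:us-east-1:111122223333:pipeline/shared-pipeline"
print(summarize_shared_pipeline(FakeSageMakerClient(), shared_arn))
```

With real credentials in the consuming account, you would pass a real client instead of the stand-in, and the calls succeed only for actions that the resource share's permission policy includes.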

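# Pipeline parameters
<a name="build-and-manage-parameters"></a>

You can introduce variables into your pipeline definition using parameters. You can reference the parameters that you define throughout your pipeline definition. Parameters have a default value, which you can override by specifying parameter values when starting a pipeline execution. The default value must be an instance that matches the parameter type. All parameters used in step definitions must be defined in your pipeline definition. This topic describes the parameters that you can define and how to implement them.

Amazon SageMaker Pipelines supports the following parameter types:
+  `ParameterString` – Represents a string parameter.
+  `ParameterInteger` – Represents an integer parameter.
+  `ParameterFloat` – Represents a float parameter.
+  `ParameterBoolean` – Represents a Boolean Python type.

Parameters take the following format: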

```
<parameter> = <parameter_type>(
    name="<parameter_name>",
    default_value=<default_value>
)
```

The following example shows a sample parameter implementation.

```
from sagemaker.workflow.parameters import (
    ParameterInteger,
    ParameterString,
    ParameterFloat,
    ParameterBoolean
)

processing_instance_count = ParameterInteger(
    name="ProcessingInstanceCount",
    default_value=1
)
```

You pass the parameter when creating your pipeline, as shown in the following example.

```
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        processing_instance_count
    ],
    steps=[step_process]
)
```

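You can also pass a parameter value that differs from the default value to a pipeline execution, as shown in the following example.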

```
# Each key must match the name of a parameter declared in the pipeline.
execution = pipeline.start(
    parameters=dict(
        ProcessingInstanceCount=2,
        ModelApprovalStatus="Approved"
    )
)
```
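
The override rules in this topic (defaults apply unless a value is supplied at start, values must match the parameter type, and every parameter must be declared) can be sketched without the SDK. The `Param` dataclass and `resolve_parameters` function below are hypothetical stand-ins written for this illustration; they are not the SDK's `Parameter*` classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    """Hypothetical stand-in for a typed pipeline parameter."""
    name: str
    python_type: type
    default: object


def resolve_parameters(declared, overrides):
    """Merge start-time overrides over the declared defaults."""
    by_name = {param.name: param for param in declared}
    unknown = sorted(set(overrides) - set(by_name))
    if unknown:
        raise KeyError(f"undeclared parameters: {unknown}")
    resolved = {}
    for name, param in by_name.items():
        value = overrides.get(name, param.default)
        if not isinstance(value, param.python_type):
            raise TypeError(f"{name} expects {param.python_type.__name__}")
        resolved[name] = value
    return resolved


declared = [
    Param("ProcessingInstanceCount", int, 1),
    Param("ModelApprovalStatus", str, "PendingManualApproval"),
]
print(resolve_parameters(declared, {"ProcessingInstanceCount": 2}))
```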

You can manipulate your parameters with SageMaker Python SDK functions such as [`sagemaker.workflow.functions.Join`](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.functions.Join). For more information on parameters, see [SageMaker Pipelines Parameters](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#parameters).

For known limitations of Pipelines parameters, see *[Limitations - Parameterization](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#parameterization)* in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable).

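# Pipelines steps
<a name="build-and-manage-steps"></a>

Pipelines are composed of steps. These steps define the actions that the pipeline takes and the relationships between steps using properties. The following page describes the types of steps, their properties, and the relationships between them.

**Topics**
+ [Add a step](build-and-manage-steps-types.md)
+ [Add integration](build-and-manage-steps-integration.md)
+ [Step properties](#build-and-manage-properties)
+ [Step parallelism](#build-and-manage-parallelism)
+ [Data dependency between steps](#build-and-manage-data-dependency)
+ [Custom dependency between steps](#build-and-manage-custom-dependency)
+ [Custom images in a step](#build-and-manage-images)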

# Add a step
<a name="build-and-manage-steps-types"></a>

The following describes the requirements of each step type and provides an example implementation of the step, as well as how to add the step to a pipeline. These are not working implementations because they don't provide the required resources and inputs. For a tutorial that implements these steps, see [Pipelines actions](pipelines-build.md).

**Note**  
You can also create a step from your local machine learning code by converting it to a Pipelines step with the `@step` decorator. For more information, see [@step decorator](#step-type-custom).

Amazon SageMaker Pipelines supports the following step types:
+ [Execute code](#step-type-executecode)
+ [Processing](#step-type-processing)
+ [Training](#step-type-training)
+ [Tuning](#step-type-tuning)
+ [AutoML](#step-type-automl)
+ [`Model`](#step-type-model)
+ [`Create model`](#step-type-create-model)
+ [`Register model`](#step-type-register-model)
+ [`Deploy model (endpoint)`](#step-type-deploy-model-endpoint)
+ [Transform](#step-type-transform)
+ [Condition](#step-type-condition)
+ [`Callback`](#step-type-callback)
+ [Lambda](#step-type-lambda)
+ [`ClarifyCheck`](#step-type-clarify-check)
+ [`QualityCheck`](#step-type-quality-check)
+ [EMR](#step-type-emr)
+ [Notebook job](#step-type-notebook-job)
+ [Fail](#step-type-fail)

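## @step decorator
<a name="step-type-custom"></a>

If you want to orchestrate a custom ML job that leverages advanced SageMaker AI features or other AWS services in the drag-and-drop Pipelines UI, use the [Execute code step](#step-type-executecode).

You can create a step from local machine learning code using the `@step` decorator. After you test your code, you can convert the function to a SageMaker AI pipeline step by annotating it with the `@step` decorator. Pipelines creates and runs a pipeline when you pass the output of the `@step`-decorated function as a step to your pipeline. You can also create a multi-step DAG pipeline that includes one or more `@step`-decorated functions as well as traditional SageMaker AI pipeline steps. For more information about how to create a step with the `@step` decorator, see [Lift-and-shift code with the @step decorator](pipelines-step-decorator.md).

## Execute code step
<a name="step-type-executecode"></a>

In the Pipelines drag-and-drop UI, you can use an **Execute code** step to run your own code as a pipeline step. You can upload a Python function, script, or notebook to be run as part of your pipeline. Use this step if you want to orchestrate a custom ML job that leverages advanced SageMaker AI features or other AWS services.

The **Execute code** step uploads files to your default Amazon S3 bucket for Amazon SageMaker AI. This bucket might not have the required cross-origin resource sharing (CORS) permissions set. To learn more about configuring CORS permissions, see [CORS Requirement for Input Image Data](sms-cors-update.md).

The **Execute code** step uses an Amazon SageMaker training job to run your code. Ensure that your IAM role has the `sagemaker:DescribeTrainingJob` and `sagemaker:CreateTrainingJob` API permissions. To learn more about all the required permissions for Amazon SageMaker AI and how to set them up, see [Amazon SageMaker AI API Permissions: Actions, Permissions, and Resources Reference](api-permissions-reference.md).

To add an execute code step to a pipeline using the Pipeline Designer, do the following:

1. Open the Amazon SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. In the left navigation pane, select **Pipelines**.

1. Choose **Create**.

1. Choose **Blank**.

1. In the left sidebar, choose **Execute code** and drag it to the canvas.

1. In the canvas, choose the **Execute code** step you added.

1. In the right sidebar, complete the forms in the **Setting** and **Details** tabs.

1. You can upload a single file to run, or upload a compressed folder containing multiple artifacts.

1. For single file uploads, you can provide optional parameters for notebooks, Python functions, or scripts.

1. When providing Python functions, you must provide a handler in the format `file.py:<function_name>`.

1. For compressed folder uploads, you must provide relative paths to your code, and you can optionally provide paths to a `requirements.txt` file or initialization script inside the compressed folder.

1. If the canvas includes any step that immediately precedes the **Execute code** step you added, click and drag the cursor from that step to the **Execute code** step to create an edge.

1. If the canvas includes any step that immediately follows the **Execute code** step you added, click and drag the cursor from the **Execute code** step to that step to create an edge. Python functions can reference outputs from **Execute code** steps.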
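
The handler format `file.py:<function_name>` from the procedure above can be validated before you paste it into the form. The following sketch is a local convenience check only; `parse_handler` is a hypothetical helper, and the exact validation that the Studio UI applies is not documented here.

```python
def parse_handler(handler):
    """Split a 'file.py:<function_name>' handler into its two parts."""
    file_name, separator, function_name = handler.partition(":")
    if (
        not separator
        or not file_name.endswith(".py")
        or not function_name.isidentifier()
    ):
        raise ValueError(
            f"expected 'file.py:<function_name>', got {handler!r}"
        )
    return file_name, function_name


print(parse_handler("preprocess.py:run_preprocessing"))
```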

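## Processing step
<a name="step-type-processing"></a>

Use a processing step to create a processing job for data processing. For more information on processing jobs, see [Process Data and Evaluate Models](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html).

------
#### [ Pipeline Designer ]

To add a processing step to a pipeline using the Pipeline Designer, do the following:

1. Open the Amazon SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. In the left navigation pane, select **Pipelines**.

1. Choose **Create**.

1. In the left sidebar, choose **Process data** and drag it to the canvas.

1. In the canvas, choose the **Process data** step you added.

1. In the right sidebar, complete the forms in the **Setting** and **Details** tabs. For information about the fields in these tabs, see [sagemaker.workflow.steps.ProcessingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.ProcessingStep).

1. If the canvas includes any step that immediately precedes the **Process data** step you added, click and drag the cursor from that step to the **Process data** step to create an edge.

1. If the canvas includes any step that immediately follows the **Process data** step you added, click and drag the cursor from the **Process data** step to that step to create an edge.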

------
#### [ SageMaker Python SDK ]

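A processing step requires a processor, a Python script that defines the processing code, outputs for processing, and job arguments. The following example shows how to create a `ProcessingStep` definition.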

```
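from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.pipeline_context import PipelineSession

# A PipelineSession defers job creation, so run() returns step arguments
# instead of starting a processing job right away.
sklearn_processor = SKLearnProcessor(framework_version='1.0-1',
                                     role=<role>,
                                     instance_type='ml.m5.xlarge',
                                     instance_count=1,
                                     sagemaker_session=PipelineSession())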
```

```
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep

inputs = [
    ProcessingInput(source=<input_data>, destination="/opt/ml/processing/input"),
]

outputs = [
    ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
    ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
    ProcessingOutput(output_name="test", source="/opt/ml/processing/test")
]

step_process = ProcessingStep(
    name="AbaloneProcess",
    step_args = sklearn_processor.run(inputs=inputs, outputs=outputs,
        code="abalone/preprocessing.py")
)
```

**傳遞執行期參數**

下列範例示範如何將執行期參數從 PySpark 處理器傳遞給 `ProcessingStep`。

```
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.spark.processing import PySparkProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep

pipeline_session = PipelineSession()

pyspark_processor = PySparkProcessor(
    framework_version='2.4',
    role=<role>,
    instance_type='ml.m5.xlarge',
    instance_count=1,
    sagemaker_session=pipeline_session,
)

step_args = pyspark_processor.run(
    inputs=[ProcessingInput(source=<input_data>, destination="/opt/ml/processing/input"),],
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test")
    ],
    code="preprocess.py",
    arguments=None,
)


step_process = ProcessingStep(
    name="AbaloneProcess",
    step_args=step_args,
)
```

如需有關處理步驟要求的詳細資訊，請參閱 [sagemaker.workflow.steps.ProcessingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.ProcessingStep) 文件。如需深入範例，請參閱[協調任務以使用 Amazon SageMaker Pipelines 訓練和評估模型](https://github.com/aws/amazon-sagemaker-examples/blob/62de6a1fca74c7e70089d77e36f1356033adbe5f/sagemaker-pipelines/tabular/abalone_build_train_deploy/sagemaker-pipelines-preprocess-train-evaluate-batch-transform.ipynb)範例筆記本。*定義特徵工程的處理步驟*一節包含詳細資訊。

------

## 訓練步驟
<a name="step-type-training"></a>

您使用訓練步驟建立訓練任務來訓練模型。如需訓練任務的詳細資訊，請參閱[使用 Amazon SageMaker AI 訓練模型](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-training.html)。

訓練步驟需要估算器以及訓練和驗證資料輸入。

------
#### [ Pipeline Designer ]

若要使用管道設計工具將訓練步驟新增至管道，請執行下列動作：

1. 遵循 [啟動 Amazon SageMaker Studio](studio-updated-launch.md) 中的指示開啟 Amazon SageMaker Studio 主控台。

1. 在左側導覽窗格中，選取**管道**。

1. 選擇**建立**。

1. 選擇**空白**。

1. 在左側邊欄中，選擇**訓練模型**並將其拖曳至畫布。

1. 在畫布中，選擇您新增的**訓練模型**步驟。

1. 在右側邊欄中，完成**設定**和**詳細資訊**索引標籤中的表單。如需這些標籤中欄位的相關資訊，請參閱 [sagemaker.workflow.steps.TrainingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TrainingStep)。

1. 如果畫布包含緊接在您已新增之**訓練模型**步驟之前的任何步驟，請按一下游標，然後將其從該步驟拖曳到**訓練模型**步驟以建立邊緣。

1. 如果畫布包含緊接在您已新增之**訓練模型**步驟之後的任何步驟，請按一下游標，然後將其從**訓練模型**步驟拖曳到該步驟以建立邊緣。

------
#### [ SageMaker Python SDK ]

下列範例示範如何建立 `TrainingStep` 定義。如需訓練步驟要求的詳細資訊，請參閱 [sagemaker.workflow.steps.TrainingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TrainingStep) 文件。

```
from sagemaker.workflow.pipeline_context import PipelineSession

from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep

from sagemaker.xgboost.estimator import XGBoost

pipeline_session = PipelineSession()

xgb_estimator = XGBoost(..., sagemaker_session=pipeline_session)

step_args = xgb_estimator.fit(
    inputs={
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "train"
            ].S3Output.S3Uri,
            content_type="text/csv"
        ),
        "validation": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "validation"
            ].S3Output.S3Uri,
            content_type="text/csv"
        )
    }
)

step_train = TrainingStep(
    name="TrainAbaloneModel",
    step_args=step_args,
)
```

------

## 調校步驟
<a name="step-type-tuning"></a>

您可以使用調校步驟來建立超參數調校任務，也稱為超參數最佳化 (HPO)。超參數調校任務會執行多個訓練任務，每個任務都會產生一個模型版本。如需有關超參數調校的詳細資訊，請參閱[使用 SageMaker AI 執行自動模型調校](automatic-model-tuning.md)。

調校任務與管道的 SageMaker AI 實驗相關聯，並以試驗的形式建立訓練任務。如需詳細資訊，請參閱[Experiments 整合](pipelines-experiments.md)。

The tuning step requires a [HyperparameterTuner](https://sagemaker.readthedocs.io/en/stable/api/training/tuner.html) and training inputs. You can retrain previous tuning jobs by specifying the `warm_start_config` parameter of the `HyperparameterTuner`. For more information about hyperparameter tuning and warm start, see [Run a warm start hyperparameter tuning job](automatic-model-tuning-warm-start.md).

Use the [get_top_model_s3_uri](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TuningStep.get_top_model_s3_uri) method of the [sagemaker.workflow.steps.TuningStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TuningStep) class to get the model artifact from one of the top-performing model versions. For a notebook that shows how to use a tuning step in a SageMaker AI pipeline, see [sagemaker-pipelines-tuning-step.ipynb](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-pipelines/tabular/tuning-step/sagemaker-pipelines-tuning-step.ipynb).

**重要**  
Amazon SageMaker Python SDK v2.48.0 和 Amazon SageMaker Studio Classic v3.8.0 中引入了調校步驟。您必須先更新 Studio Classic，才能使用調校步驟，否則管道 DAG 不會顯示。若要更新 Studio Classic，請參閱[關閉並更新 Amazon SageMaker Studio Classic](studio-tasks-update-studio.md)。

下列範例示範如何建立 `TuningStep` 定義。

```
from sagemaker.workflow.pipeline_context import PipelineSession

from sagemaker.tuner import HyperparameterTuner
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TuningStep

tuner = HyperparameterTuner(..., sagemaker_session=PipelineSession())
    
step_tuning = TuningStep(
    name = "HPTuning",
    step_args = tuner.fit(inputs=TrainingInput(s3_data="s3://amzn-s3-demo-bucket/my-data"))
)
```

**取得最佳模型版本**

下列範例示範如何使用 `get_top_model_s3_uri` 方法從調校任務取得最佳模型版本。根據 [HyperParameterTuningJobObjective](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTuningJobObjective.html)，最多提供效能排名前 50 的版本。`top_k` 引數是版本的索引，其中 `top_k=0` 是效能最佳的版本，而 `top_k=49` 是效能最差的版本。

```
best_model = Model(
    image_uri=image_uri,
    model_data=step_tuning.get_top_model_s3_uri(
        top_k=0,
        s3_bucket=sagemaker_session.default_bucket()
    ),
    ...
)
```

如需有關調整步驟要求的詳細資訊，請參閱 [sagemaker.workflow.steps.TuningStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TuningStep) 文件。
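The meaning of the `top_k` index can be sketched without the SDK. The following example only mimics the ranking idea: `top_model_artifact`, the job dictionaries, and the objective values are hypothetical, and the real ranking is performed by the tuning job according to its `HyperParameterTuningJobObjective`.

```python
def top_model_artifact(training_jobs, top_k=0, maximize=True, limit=50):
    """Return the model artifact URI of the job ranked at index top_k."""
    # Rank jobs by objective metric, best first, and keep at most `limit` of them.
    ranked = sorted(
        training_jobs, key=lambda job: job["objective"], reverse=maximize
    )[:limit]
    if not 0 <= top_k < len(ranked):
        raise IndexError(f"top_k={top_k} is outside 0..{len(ranked) - 1}")
    return ranked[top_k]["model_artifact"]


# Hypothetical training jobs with a metric to maximize (for example, AUC).
jobs = [
    {"objective": 0.81, "model_artifact": "s3://amzn-s3-demo-bucket/job-a/model.tar.gz"},
    {"objective": 0.93, "model_artifact": "s3://amzn-s3-demo-bucket/job-b/model.tar.gz"},
    {"objective": 0.88, "model_artifact": "s3://amzn-s3-demo-bucket/job-c/model.tar.gz"},
]

print(top_model_artifact(jobs, top_k=0))
```

Here `top_k=0` selects the job with the best objective value, and larger indexes select progressively worse ones.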

## 微調步驟
<a name="step-type-fine-tuning"></a>

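Fine-tuning trains a pretrained foundation model from Amazon SageMaker JumpStart on a new dataset. This process, also known as transfer learning, can produce accurate models with smaller datasets and less training time. When you fine-tune a model, you can use the default dataset or choose your own data. To learn more about fine-tuning a foundation model from JumpStart, see [Fine-tune a model](jumpstart-fine-tune.md).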

微調步驟使用 Amazon SageMaker 訓練任務來自訂您的模型。確保您的 IAM 角色具有在您管道中執行微調任務的 `sagemaker:DescribeTrainingJob` 和 `sagemaker:CreateTrainingJob` API 許可。若要進一步了解 Amazon SageMaker AI 的必要許可，以及如何設定這些許可，請參閱 [Amazon SageMaker AI API 許可：動作、許可與資源參考](api-permissions-reference.md)。

若要使用拖放編輯器將**微調模型**步驟新增至您的管道，請遵循下列步驟：

1. 遵循 [啟動 Amazon SageMaker Studio](studio-updated-launch.md) 中的指示開啟 Studio 主控台。

1. 在左側導覽窗格中，選取**管道**。

1. 選擇**建立**。

1. 選擇**空白**。

1. 在左側邊欄中，選擇**微調模型**並將其拖曳至畫布。

1. 在畫布中，選擇您已新增的**微調模型**步驟。

1. 在右側邊欄中，完成**設定**和**詳細資訊**索引標籤中的表單。

1. 如果畫布包含緊接在您已新增之**微調模型**步驟之前的任何步驟，請按一下游標，然後將其從該步驟拖曳到**微調模型**步驟以建立邊緣。

1. 如果畫布包含緊接在您已新增之**微調模型**步驟之後的任何步驟，請按一下游標，然後將其從**微調模型**步驟拖曳到該步驟以建立邊緣。

## AutoML 步驟
<a name="step-type-automl"></a>

使用 [AutoML](https://sagemaker.readthedocs.io/en/stable/api/training/automl.html) API 建立 AutoML 任務以自動訓練模型。如需有關 AutoML 任務的詳細資訊，請參閱[使用 Amazon SageMaker Autopilot 將模型開發自動化](https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-automate-model-development.html)。

**注意**  
目前，AutoML 步驟僅支援[集成訓練模式](https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-model-support-validation.html)。

下列範例示範如何使用 `AutoMLStep` 建立定義。

```
from sagemaker.automl.automl import AutoML, AutoMLInput
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.automl_step import AutoMLStep

pipeline_session = PipelineSession()

auto_ml = AutoML(...,
    role="<role>",
    target_attribute_name="my_target_attribute_name",
    mode="ENSEMBLING",
    sagemaker_session=pipeline_session) 

input_training = AutoMLInput(
    inputs="s3://amzn-s3-demo-bucket/my-training-data",
    target_attribute_name="my_target_attribute_name",
    channel_type="training",
)
input_validation = AutoMLInput(
    inputs="s3://amzn-s3-demo-bucket/my-validation-data",
    target_attribute_name="my_target_attribute_name",
    channel_type="validation",
)

step_args = auto_ml.fit(
    inputs=[input_training, input_validation]
)

step_automl = AutoMLStep(
    name="AutoMLStep",
    step_args=step_args,
)
```

**取得最佳模型版本**

AutoML 步驟會自動訓練多個候選模型。使用 `get_best_auto_ml_model` 方法從 AutoML 任務取得具有最佳目標指標的模型，如下所示。您亦須使用 IAM `role` 來存取模型成品。

```
best_model = step_automl.get_best_auto_ml_model(role=<role>)
```

如需詳細資訊，請參閱 SageMaker Python SDK 中的 [AutoML](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.automl_step.AutoMLStep) 步驟。

## 模型步驟
<a name="step-type-model"></a>

Use a `ModelStep` to create or register a SageMaker AI model. For more information about `ModelStep` requirements, see the [sagemaker.workflow.model_step.ModelStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.model_step.ModelStep) documentation.

### 建立模型
<a name="step-type-model-create"></a>

您可以使用 `ModelStep` 建立 SageMaker AI 模型。`ModelStep` 需要模型成品以及您用來建立模型所需之 SageMaker AI 執行個體的相關資訊。如需 SageMaker AI 模型的詳細資訊，請參閱[使用 Amazon SageMaker AI 訓練模型](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-training.html)。

下列範例示範如何建立 `ModelStep` 定義。

```
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.model import Model
from sagemaker.workflow.model_step import ModelStep
from sagemaker.workflow.steps import TrainingStep

step_train = TrainingStep(...)
model = Model(
    image_uri=pytorch_estimator.training_image_uri(),
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=PipelineSession(),
    role=role,
)

step_model_create = ModelStep(
   name="MyModelCreationStep",
   step_args=model.create(instance_type="ml.m5.xlarge"),
)
```

### 註冊模型
<a name="step-type-model-register"></a>

您可以使用 `ModelStep` 搭配 Amazon SageMaker 模型註冊表註冊 `sagemaker.model.Model` 或 `sagemaker.pipeline.PipelineModel`。`PipelineModel` 表示推論管道，是一種由直線順序容器構成的模型，可處理推論請求。如需有關如何註冊模型的詳細資訊，請參閱[使用模型註冊庫進行模型註冊部署](model-registry.md)。

下列範例示範如何建立註冊 `PipelineModel` 所需的 `ModelStep`。

```
import time

from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.sklearn import SKLearnModel
from sagemaker.xgboost import XGBoostModel

pipeline_session = PipelineSession()

code_location = 's3://{0}/{1}/code'.format(bucket_name, prefix)

sklearn_model = SKLearnModel(
   model_data=processing_step.properties.ProcessingOutputConfig.Outputs['model'].S3Output.S3Uri,
   entry_point='inference.py',
   source_dir='sklearn_source_dir/',
   code_location=code_location,
   framework_version='1.0-1',
   role=role,
   sagemaker_session=pipeline_session,
   py_version='py3'
)

xgboost_model = XGBoostModel(
   model_data=training_step.properties.ModelArtifacts.S3ModelArtifacts,
   entry_point='inference.py',
   source_dir='xgboost_source_dir/',
   code_location=code_location,
   framework_version='0.90-2',
   py_version='py3',
   sagemaker_session=pipeline_session,
   role=role
)

from sagemaker.workflow.model_step import ModelStep
from sagemaker import PipelineModel

pipeline_model = PipelineModel(
   models=[sklearn_model, xgboost_model],
   role=role,
   sagemaker_session=pipeline_session,
)

register_model_step_args = pipeline_model.register(
   content_types=["application/json"],
   response_types=["application/json"],
   inference_instances=["ml.t2.medium", "ml.m5.xlarge"],
   transform_instances=["ml.m5.xlarge"],
   model_package_group_name='sipgroup',
)

step_model_registration = ModelStep(
   name="AbaloneRegisterModel",
   step_args=register_model_step_args,
)
```

## 建立模型步驟
<a name="step-type-create-model"></a>

您可以使用建立模型步驟建立 SageMaker AI 模型。如需 SageMaker AI 模型的詳細資訊，請參閱[使用 Amazon SageMaker 訓練模型](how-it-works-training.md)。

建立模型步驟需要模型成品以及您用來建立模型所需之 SageMaker AI 執行個體的相關資訊。下列範例展示如何建立一個建立模型步驟定義。如需建立模型步驟要求的詳細資訊，請參閱 [sagemaker.workflow.steps.CreateModelStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.CreateModelStep) 文件。

------
#### [ Pipeline Designer ]

若要將建立模型步驟新增至您的管道，請執行下列動作：

1. 遵循 [啟動 Amazon SageMaker Studio](studio-updated-launch.md) 中的指示開啟 Studio 主控台。

1. 在左側導覽窗格中，選取**管道**。

1. 選擇**建立**。

1. 選擇**空白**。

1. 在左側邊欄中，選擇**建立模型**並將其拖曳至畫布。

1. 在畫布中，選擇您已新增的**建立模型**步驟。

1. 在右側邊欄中，完成**設定**和**詳細資訊**索引標籤中的表單。如需這些索引標籤中欄位的相關資訊，請參閱 [sagemaker.workflow.steps.CreateModelStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.CreateModelStep)。

1. 如果畫布包含緊接在您已新增之**建立模型**步驟之前的任何步驟，請按一下游標，然後將其從該步驟拖曳到**建立模型**步驟以建立邊緣。

1. 如果畫布包含緊接在您已新增之**建立模型**步驟之後的任何步驟，請按一下游標，然後將其從**建立模型**步驟拖曳到該步驟以建立邊緣。

------
#### [ SageMaker Python SDK ]

**重要**  
我們建議從 SageMaker AI Python SDK v2.90.0 起使用 [模型步驟](#step-type-model) 建立模型。`CreateModelStep` 將會繼續在 SageMaker Python SDK 的先前版本中運作，但不再受到主動支援。

```
from sagemaker.workflow.steps import CreateModelStep

step_create_model = CreateModelStep(
    name="AbaloneCreateModel",
    model=best_model,
    inputs=inputs
)
```

------

## 註冊模型步驟
<a name="step-type-register-model"></a>

註冊模型步驟會將模型註冊至 SageMaker 模型註冊庫。

------
#### [ Pipeline Designer ]

若要使用管道設計工具從管道註冊模型，請執行下列動作：

1. 遵循 [啟動 Amazon SageMaker Studio](studio-updated-launch.md) 中的指示開啟 Amazon SageMaker Studio 主控台。

1. 在左側導覽窗格中，選取**管道**。

1. 選擇**建立**。

1. 選擇**空白**。

1. 在左側邊欄中，選擇**註冊模型**並將其拖曳至畫布。

1. 在畫布中，選擇您新增的**註冊模型**步驟。

1. In the right sidebar, complete the forms in the **Settings** and **Details** tabs. For information about the fields in these tabs, see [sagemaker.workflow.step_collections.RegisterModel](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.step_collections.RegisterModel).

1. 如果畫布包含緊接在您已新增之**註冊模型**步驟之前的任何步驟，請按一下游標，然後將其從該步驟拖曳到**註冊模型**步驟以建立邊緣。

1. 如果畫布包含緊接在您已新增之**註冊模型**步驟之後的任何步驟，請按一下游標，然後將其從**註冊模型**步驟拖曳到該步驟以建立邊緣。

------
#### [ SageMaker Python SDK ]

**重要**  
我們建議從 SageMaker AI Python SDK v2.90.0 起使用 [模型步驟](#step-type-model) 註冊模型。`RegisterModel` 將會繼續在 SageMaker Python SDK 的先前版本中運作，但不再受到主動支援。

您可以使用 `RegisterModel` 步驟搭配 Amazon SageMaker 模型註冊表註冊 [sagemaker.model.Model](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html) 或 [sagemaker.pipeline.PipelineModel](https://sagemaker.readthedocs.io/en/stable/api/inference/pipeline.html#pipelinemodel)。`PipelineModel` 表示推論管道，是一種由直線順序容器構成的模型，可處理推論請求。

For more information about how to register a model, see [Model Registration Deployment with Model Registry](model-registry.md). For more information about `RegisterModel` step requirements, see the [sagemaker.workflow.step_collections.RegisterModel](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.step_collections.RegisterModel) documentation.

下列範例示範如何建立註冊 `PipelineModel` 所需的 `RegisterModel` 步驟。

```
from sagemaker import PipelineModel
from sagemaker.sklearn import SKLearnModel
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.xgboost import XGBoostModel

code_location = 's3://{0}/{1}/code'.format(bucket_name, prefix)

# First container: scikit-learn model built from the processing step output.
sklearn_model = SKLearnModel(
    model_data=processing_step.properties.ProcessingOutputConfig.Outputs['model'].S3Output.S3Uri,
    entry_point='inference.py',
    source_dir='sklearn_source_dir/',
    code_location=code_location,
    framework_version='1.0-1',
    role=role,
    sagemaker_session=sagemaker_session,
    py_version='py3',
)

# Second container: XGBoost model built from the training step artifacts.
xgboost_model = XGBoostModel(
    model_data=training_step.properties.ModelArtifacts.S3ModelArtifacts,
    entry_point='inference.py',
    source_dir='xgboost_source_dir/',
    code_location=code_location,
    framework_version='0.90-2',
    py_version='py3',
    sagemaker_session=sagemaker_session,
    role=role,
)

pipeline_model = PipelineModel(
    models=[sklearn_model, xgboost_model],
    role=role,
    sagemaker_session=sagemaker_session,
)

step_register = RegisterModel(
    name="AbaloneRegisterModel",
    model=pipeline_model,
    content_types=["application/json"],
    response_types=["application/json"],
    inference_instances=["ml.t2.medium", "ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name='sipgroup',
)
```

如果未提供 `model`，則註冊模型步驟需要估算器，如下列範例所示。

```
from sagemaker.workflow.step_collections import RegisterModel

step_register = RegisterModel(
    name="AbaloneRegisterModel",
    estimator=xgb_train,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.t2.medium", "ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name=model_package_group_name,
    approval_status=model_approval_status,
    model_metrics=model_metrics
)
```

------

## 部署模型 (端點) 步驟
<a name="step-type-deploy-model-endpoint"></a>

在管道設計工具中，使用部署模型 (端點) 步驟將您的模型部署到端點。您可以建立新的端點或使用現有的端點。即時推論非常適合您具有即時、互動、低延遲需求的推論工作負載。您可以將模型部署到 SageMaker AI 託管服務，並取得可用於推論的即時端點。這些端點是完全受管且支援自動擴展。若要進一步了解 SageMaker AI 中的即時推論，請參閱[即時推論](realtime-endpoints.md)。

在將部署模型步驟新增至您的管道之前，請確定您的 IAM 角色具有下列許可：
+ `sagemaker:CreateModel`
+ `sagemaker:CreateEndpointConfig`
+ `sagemaker:CreateEndpoint`
+ `sagemaker:UpdateEndpoint`
+ `sagemaker:DescribeModel`
+ `sagemaker:DescribeEndpointConfig`
+ `sagemaker:DescribeEndpoint`

若要進一步了解 SageMaker AI 的所有必要許可，以及如何設定這些許可，請參閱 [Amazon SageMaker AI API 許可：動作、許可與資源參考](api-permissions-reference.md)。
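The permissions listed above can be collected into an IAM policy document. The following sketch builds such a document as a Python dictionary and prints it as JSON. The `"Resource": "*"` value is an assumption made for brevity. In practice you would probably scope it to specific model, endpoint configuration, and endpoint ARNs.

```python
import json

# Actions required by the Deploy model (endpoint) step, as listed above.
DEPLOY_ACTIONS = [
    "sagemaker:CreateModel",
    "sagemaker:CreateEndpointConfig",
    "sagemaker:CreateEndpoint",
    "sagemaker:UpdateEndpoint",
    "sagemaker:DescribeModel",
    "sagemaker:DescribeEndpointConfig",
    "sagemaker:DescribeEndpoint",
]

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": DEPLOY_ACTIONS,
            "Resource": "*",  # Assumption: narrow this to your own resource ARNs.
        }
    ],
}

print(json.dumps(policy, indent=2))
```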

若要在拖放編輯器中將模型部署步驟新增至您的管道，請完成下列步驟：

1. 遵循 [啟動 Amazon SageMaker Studio](studio-updated-launch.md) 中的指示開啟 Studio 主控台。

1. 在左側導覽窗格中，選取**管道**。

1. 選擇**建立**。

1. 選擇**空白**。

1. 在左側邊欄中，選擇**部署模型 (端點)** 並將其拖曳至畫布。

1. 在畫布中，選擇您新增的**部署模型 (端點)** 步驟。

1. 在右側邊欄中，完成**設定**和**詳細資訊**索引標籤中的表單。

1. 如果畫布包含緊接在您已新增之**部署模型 (端點)** 步驟之前的任何步驟，請按一下游標，然後將其從該步驟拖曳到**部署模型 (端點)** 步驟以建立邊緣。

1. 如果畫布包含緊接在您已新增之**部署模型 (端點)** 步驟之後的任何步驟，請按一下游標，然後將其從**部署模型 (端點)** 步驟拖曳到該步驟以建立邊緣。

## 轉換步驟
<a name="step-type-transform"></a>

您可以使用批次轉換的轉換步驟對整個資料集執行推論。有關批次轉換的更多資訊，請參閱[使用推論管道執行批次轉換](inference-pipeline-batch.md)。

轉換步驟需要轉換器以及要對其執行批次轉換的資料。下列範例展示如何建立轉換步驟定義。如需轉換步驟要求的詳細資訊，請參閱 [sagemaker.workflow.steps.TransformStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TransformStep) 文件。

------
#### [ Pipeline Designer ]

若要使用拖放視覺化編輯器將批次轉換步驟新增至您的管道，請執行下列動作：

1. 遵循 [啟動 Amazon SageMaker Studio](studio-updated-launch.md) 中的指示開啟 Studio 主控台。

1. 在左側導覽窗格中，選取**管道**。

1. 選擇**建立**。

1. 選擇**空白**。

1. 在左側邊欄中，選擇**部署模型 (批次轉換)** 並將其拖曳至畫布。

1. 在畫布中，選擇您已新增的**部署模型 (批次轉換)** 步驟。

1. 在右側邊欄中，完成**設定**和**詳細資訊**索引標籤中的表單。如需這些索引標籤中欄位的相關資訊，請參閱 [sagemaker.workflow.steps.TransformStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TransformStep)。

1. 如果畫布包含緊接在您已新增之**部署模型 (批次轉換)** 步驟之前的任何步驟，請按一下游標，然後將其從該步驟拖曳到**部署模型 (批次轉換)** 步驟以建立邊緣。

1. 如果畫布包含緊接在您已新增之**部署模型 (批次轉換)** 步驟之後的任何步驟，請按一下游標，然後將其從**部署模型 (批次轉換)** 步驟拖曳到該步驟以建立邊緣。

------
#### [ SageMaker Python SDK ]

```
from sagemaker.workflow.pipeline_context import PipelineSession

from sagemaker.transformer import Transformer
from sagemaker.inputs import TransformInput
from sagemaker.workflow.steps import TransformStep

transformer = Transformer(..., sagemaker_session=PipelineSession())

step_transform = TransformStep(
    name="AbaloneTransform",
    step_args=transformer.transform(data="s3://amzn-s3-demo-bucket/my-data"),
)
```

------

## 條件步驟
<a name="step-type-condition"></a>

您可以使用條件步驟來評估步驟屬性的條件，以評估接下來應該在管道中採取哪些動作。

條件步驟需要：
+ 條件清單。
+ 如果條件評估為 `true`，要執行的步驟清單。
+ 如果條件評估為 `false`，要執行的步驟清單。

------
#### [ Pipeline Designer ]

若要使用管道設計工具將條件步驟新增至管道，請執行下列動作：

1. 遵循 [啟動 Amazon SageMaker Studio](studio-updated-launch.md) 中的指示開啟 Amazon SageMaker Studio 主控台。

1. 在左側導覽窗格中，選取**管道**。

1. 選擇**建立**。

1. 選擇**空白**。

1. 在左側邊欄中，選擇**條件**並將其拖曳至畫布。

1. 在畫布中，選擇您已新增的**條件**步驟。

1. In the right sidebar, complete the forms in the **Settings** and **Details** tabs. For information about the fields in these tabs, see [sagemaker.workflow.condition_step.ConditionStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.condition_step.ConditionStep).

1. 如果畫布包含緊接在您已新增之**條件**步驟之前的任何步驟，請按一下游標，然後將其從該步驟拖曳到**條件**步驟以建立邊緣。

1. 如果畫布包含緊接在您已新增之**條件**步驟之後的任何步驟，請按一下游標，然後將其從**條件**步驟拖曳到該步驟以建立邊緣。

------
#### [ SageMaker Python SDK ]

 下列範例示範如何建立 `ConditionStep` 定義。

**限制**
+ Pipelines 不支援使用巢狀條件步驟。您無法將條件步驟作為另一個條件步驟的輸入傳遞。
+ 條件步驟不能在兩個分支中使用相同的步驟。如果您需要在兩個分支中使用相同的步驟功能，請複製該步驟並為其指定不同的名稱。

```
from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.functions import JsonGet

cond_lte = ConditionLessThanOrEqualTo(
    left=JsonGet(
        step_name=step_eval.name,
        property_file=evaluation_report,
        json_path="regression_metrics.mse.value"
    ),
    right=6.0
)

step_cond = ConditionStep(
    name="AbaloneMSECond",
    conditions=[cond_lte],
    if_steps=[step_register, step_create_model, step_transform],
    else_steps=[]
)
```

For more information about `ConditionStep` requirements, see the [sagemaker.workflow.condition_step.ConditionStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#conditionstep) API reference. For more information about supported conditions, see *[Amazon SageMaker Pipelines - Conditions](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#conditions)* in the SageMaker AI Python SDK documentation.
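To make the branching logic concrete, the following plain-Python sketch mimics what the condition in the example above expresses: read `regression_metrics.mse.value` from an evaluation report and choose a branch by comparing it with `6.0`. The helper names `json_get` and `choose_branch` and the sample report are hypothetical. Pipelines evaluates the real condition at run time, not with this code.

```python
from functools import reduce


def json_get(document, json_path):
    """Follow a dot-separated path such as 'a.b.c' through nested dictionaries."""
    return reduce(lambda node, key: node[key], json_path.split("."), document)


def choose_branch(report, threshold, if_steps, else_steps):
    """Return if_steps when MSE <= threshold, otherwise else_steps."""
    mse = json_get(report, "regression_metrics.mse.value")
    return if_steps if mse <= threshold else else_steps


# Hypothetical evaluation report written by an evaluation step.
evaluation_report = {"regression_metrics": {"mse": {"value": 4.9}}}

branch = choose_branch(
    evaluation_report,
    threshold=6.0,
    if_steps=["step_register", "step_create_model", "step_transform"],
    else_steps=[],
)
print(branch)
```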

------

## 回呼步驟
<a name="step-type-callback"></a>

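Use a `Callback` step to add processes and AWS services to your workflow that Amazon SageMaker Pipelines doesn't provide directly. When a `Callback` step runs, the following procedure occurs:
+ Pipelines sends a message to a customer-specified Amazon Simple Queue Service (Amazon SQS) queue. The message contains a Pipelines-generated token and a customer-supplied list of input parameters. After sending the message, Pipelines waits for a response from the customer.
+ The customer retrieves the message from the Amazon SQS queue and starts their custom process.
+ When the process finishes, the customer calls one of the following APIs and submits the Pipelines-generated token:
  +  [SendPipelineExecutionStepSuccess](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_SendPipelineExecutionStepSuccess.html), along with a list of output parameters
  +  [SendPipelineExecutionStepFailure](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_SendPipelineExecutionStepFailure.html), along with a failure reason
+ The API call causes Pipelines to either continue the pipeline process or fail the process.

For more information about `Callback` step requirements, see the [sagemaker.workflow.callback_step.CallbackStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.callback_step.CallbackStep) documentation. For a complete solution, see [Extend SageMaker Pipelines to include custom steps using callback steps](https://aws.amazon.com/blogs/machine-learning/extend-amazon-sagemaker-pipelines-to-include-custom-steps-using-callback-steps/).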

**重要**  
Amazon SageMaker Python SDK v2.45.0 和 Amazon SageMaker Studio Classic v3.6.2 中引入了 `Callback` 步驟。您必須先更新 Studio Classic，才能使用 `Callback` 步驟，否則管道 DAG 不會顯示。若要更新 Studio Classic，請參閱[關閉並更新 Amazon SageMaker Studio Classic](studio-tasks-update-studio.md)。

下列範例展示上述程序的實作。

```
from sagemaker.workflow.callback_step import CallbackStep

step_callback = CallbackStep(
    name="MyCallbackStep",
    sqs_queue_url="https://sqs.us-east-2.amazonaws.com/012345678901/MyCallbackQueue",
    inputs={...},
    outputs=[...]
)

# Source of a Lambda function that consumes the SQS messages sent by the Callback step.
callback_handler_code = '''
    import boto3
    import json

    def handler(event, context):
        sagemaker_client=boto3.client("sagemaker")

        for record in event["Records"]:
            payload=json.loads(record["body"])
            token=payload["token"]

            # Custom processing

            # Call SageMaker AI to complete the step
            sagemaker_client.send_pipeline_execution_step_success(
                CallbackToken=token,
                OutputParameters=[...]  # list of {"Name": ..., "Value": ...} entries
            )
'''
```

**注意**  
`CallbackStep` 的輸出參數不應巢狀化。例如，如果您使用巢狀字典作為輸出參數，則字典會被視為單一字串 (例如 `{"output1": "{\"nested_output1\":\"my-output\"}"}`)。如果您提供巢狀值，則當您嘗試參考特定輸出參數時，SageMaker AI 會擲出不可重試的用戶端錯誤。
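The effect of nesting can be sketched as follows. The helper `to_output_parameters` is hypothetical. It converts a dictionary into the `Name`/`Value` list shape that `SendPipelineExecutionStepSuccess` accepts for `OutputParameters`, and shows that a nested dictionary survives only as one opaque string.

```python
import json


def to_output_parameters(outputs):
    """Convert a dict into a list of {"Name": ..., "Value": ...} entries."""
    params = []
    for name, value in outputs.items():
        if isinstance(value, (dict, list)):
            # A nested value cannot be addressed key by key later on;
            # it is serialized into a single string.
            value = json.dumps(value)
        params.append({"Name": name, "Value": str(value)})
    return params


flat = to_output_parameters({"output1": "my-output", "row_count": 42})
nested = to_output_parameters({"output1": {"nested_output1": "my-output"}})

print(flat)
print(nested)
```

Keeping every output as a flat, top-level key avoids the client error described above.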

**停止行為**

`Callback` 步驟執行期間，管道程序不會停止。

當您使用執行中 `Callback` 步驟對管道程序呼叫 [StopPipelineExecution](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_StopPipelineExecution.html) 時，Pipelines 會將 Amazon SQS 訊息傳送到 SQS 佇列。SQS 訊息的主體包含設定為 `Stopping` 的**狀態**欄位。下列範例示範 SQS 訊息主體。

```
{
  "token": "26vcYbeWsZ",
  "pipelineExecutionArn": "arn:aws:sagemaker:us-east-2:012345678901:pipeline/callback-pipeline/execution/7pinimwddh3a",
  "arguments": {
    "number": 5,
    "stringArg": "some-arg",
    "inputData": "s3://sagemaker-us-west-2-012345678901/abalone/abalone-dataset.csv"
  },
  "status": "Stopping"
}
```

您應該將邏輯新增至 Amazon SQS 訊息取用者，以在收到訊息時採取任何必要的動作 (例如資源清理)。然後將呼叫新增至 `SendPipelineExecutionStepSuccess` 或 `SendPipelineExecutionStepFailure`。

只有當 Pipelines 接收到其中一個呼叫時，才會停止管道程序。
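The stop handling described above can be sketched as a message handler. To keep the example runnable offline, `RecordingClient` is a hypothetical stand-in for `boto3.client("sagemaker")`. Only the two method names and their `CallbackToken`, `OutputParameters`, and `FailureReason` arguments come from the SageMaker API. Reporting a failure after cleanup is one possible choice, because the text allows either call.

```python
import json


class RecordingClient:
    """Offline stand-in for boto3.client("sagemaker") that records calls."""

    def __init__(self):
        self.calls = []

    def send_pipeline_execution_step_success(self, **kwargs):
        self.calls.append(("success", kwargs))

    def send_pipeline_execution_step_failure(self, **kwargs):
        self.calls.append(("failure", kwargs))


def handle_message(body, client, cleanup=lambda: None):
    """Process one SQS message body sent by a Callback step."""
    payload = json.loads(body)
    token = payload["token"]

    if payload.get("status") == "Stopping":
        cleanup()  # For example, release resources the custom process created.
        client.send_pipeline_execution_step_failure(
            CallbackToken=token, FailureReason="Pipeline execution was stopped"
        )
        return "stopped"

    # Custom processing would happen here.
    client.send_pipeline_execution_step_success(
        CallbackToken=token,
        OutputParameters=[{"Name": "result", "Value": "done"}],
    )
    return "completed"


client = RecordingClient()
stop_body = json.dumps({"token": "26vcYbeWsZ", "arguments": {}, "status": "Stopping"})
print(handle_message(stop_body, client))
```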

## Lambda 步驟
<a name="step-type-lambda"></a>

Use a Lambda step to run an AWS Lambda function. You can run an existing Lambda function, or SageMaker AI can create and run a new Lambda function. If you choose to use an existing Lambda function, it must be in the same AWS Region as the SageMaker AI pipeline. For a notebook that shows how to use a Lambda step in a SageMaker AI pipeline, see [sagemaker-pipelines-lambda-step.ipynb](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-pipelines/tabular/lambda-step/sagemaker-pipelines-lambda-step.ipynb).

**重要**  
Amazon SageMaker Python SDK v2.51.0 和 Amazon SageMaker Studio Classic v3.9.1 中引入了 Lambda 步驟。您必須先更新 Studio Classic，才能使用 Lambda 步驟，否則管道 DAG 不會顯示。若要更新 Studio Classic，請參閱[關閉並更新 Amazon SageMaker Studio Classic](studio-tasks-update-studio.md)。

SageMaker AI provides the [sagemaker.lambda_helper.Lambda](https://sagemaker.readthedocs.io/en/stable/api/utility/lambda_helper.html) class to create, update, invoke, and delete Lambda functions. `Lambda` has the following signature.

```
Lambda(
    function_arn,       # Only required argument to invoke an existing Lambda function

    # The following arguments are required to create a Lambda function:
    function_name,
    execution_role_arn,
    zipped_code_dir,    # Specify either zipped_code_dir and s3_bucket, OR script
    s3_bucket,          # S3 bucket where zipped_code_dir is uploaded
    script,             # Path of Lambda function script
    handler,            # Lambda handler specified as "lambda_script.lambda_handler"
    timeout,            # Maximum time the Lambda function can run before the lambda step fails
    ...
)
```

The [sagemaker.workflow.lambda_step.LambdaStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.lambda_step.LambdaStep) class has a `lambda_func` argument of type `Lambda`. To invoke an existing Lambda function, the only requirement is to supply the Amazon Resource Name (ARN) of the function to `function_arn`. If you don't supply a value for `function_arn`, you must specify `handler` and one of the following:
+ `zipped_code_dir` – 壓縮 Lambda 函式的路徑

  `s3_bucket` – `zipped_code_dir` 上傳至的 Amazon S3 儲存貯體
+ `script` – Lambda 函式指令碼檔案的路徑
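The argument rules above can be summarized as a small check. This is a sketch of the documented rules only, not the SDK's own validation code. The function name `lambda_mode` and its return values are hypothetical, and treating "both `zipped_code_dir` and `script`" as an error is an assumption.

```python
def lambda_mode(function_arn=None, handler=None, zipped_code_dir=None,
                s3_bucket=None, script=None):
    """Classify how a Lambda helper would be used, or raise if arguments conflict."""
    if function_arn:
        # An ARN alone is enough to invoke an existing function.
        return "invoke-existing"
    if not handler:
        raise ValueError("handler is required when function_arn is not provided")
    has_zip = zipped_code_dir is not None and s3_bucket is not None
    has_script = script is not None
    if has_zip == has_script:
        # Either nothing usable was supplied, or both sources were supplied.
        raise ValueError("specify either zipped_code_dir with s3_bucket, or script")
    return "create-from-zip" if has_zip else "create-from-script"


print(lambda_mode(handler="lambda_script.lambda_handler", script="lambda_script.py"))
```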

下列範例示範如何建立調用現有 Lambda 函式的 `Lambda` 步驟定義。

```
from sagemaker.workflow.lambda_step import LambdaStep
from sagemaker.lambda_helper import Lambda

step_lambda = LambdaStep(
    name="ProcessingLambda",
    lambda_func=Lambda(
        function_arn="arn:aws:lambda:us-west-2:012345678910:function:split-dataset-lambda"
    ),
    inputs={
        "s3_bucket": s3_bucket,
        "data_file": data_file
    },
    outputs=[
        "train_file", "test_file"
    ]
)
```

下列範例示範如何建立使用 Lambda 函式指令碼建立並調用 Lambda 函式的 `Lambda` 步驟定義。

```
from sagemaker.workflow.lambda_step import LambdaStep
from sagemaker.lambda_helper import Lambda

step_lambda = LambdaStep(
    name="ProcessingLambda",
    lambda_func=Lambda(
      function_name="split-dataset-lambda",
      execution_role_arn=execution_role_arn,
      script="lambda_script.py",
      handler="lambda_script.lambda_handler",
      ...
    ),
    inputs={
        "s3_bucket": s3_bucket,
        "data_file": data_file
    },
    outputs=[
        "train_file", "test_file"
    ]
)
```

**輸入和輸出**

如果您的 `Lambda` 函式具有輸入或輸出，則輸入或輸出也必須在 `Lambda` 步驟中定義。

**注意**  
輸入和輸出參數不應巢狀化。例如，如果您使用巢狀字典作為輸出參數，則字典會被視為單一字串 (例如 `{"output1": "{\"nested_output1\":\"my-output\"}"}`)。如果您提供巢狀值並稍後嘗試參考它，則系統會拋出不可重試的用戶端錯誤。

定義 `Lambda` 步驟時，`inputs` 必須是鍵值對的字典。`inputs` 字典的每個值都必須是基本類型 (字串、整數或浮點數)。不支援巢狀物件。如果未定義，則 `inputs` 值預設為 `None`。

`outputs` 值必須是鍵清單。這些鍵指的是在 `Lambda` 函式的輸出中定義的字典。與 `inputs` 類似，這些鍵必須是基本類型，並且不支援巢狀物件。
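On the Lambda side, these rules mean that the function receives the `inputs` dictionary as its event and returns a flat dictionary whose keys match the `outputs` list. The following handler is a minimal sketch for the `s3_bucket`/`data_file` inputs and `train_file`/`test_file` outputs used in the examples above. The file-naming scheme and the `is_flat` helper are hypothetical.

```python
def lambda_handler(event, context):
    """Return flat, primitive outputs whose keys match the step's outputs list."""
    bucket = event["s3_bucket"]
    data_file = event["data_file"]
    stem = data_file.rsplit(".", 1)[0]  # Drop the file extension.

    # A real function would split the dataset here; this sketch only names the results.
    return {
        "train_file": f"s3://{bucket}/{stem}-train.csv",
        "test_file": f"s3://{bucket}/{stem}-test.csv",
    }


def is_flat(mapping):
    """Check that every value is a primitive type (string, integer, or float)."""
    return all(isinstance(value, (str, int, float)) for value in mapping.values())


result = lambda_handler(
    {"s3_bucket": "amzn-s3-demo-bucket", "data_file": "abalone/abalone.csv"}, None
)
print(result, is_flat(result))
```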

**逾時和停止行為**

`Lambda` 類別具有一個 `timeout` 引數，可指定 Lambda 函式可以執行的最長時間。預設值是 120 秒，上限值為 10 分鐘。如果 Lambda 函式在達到逾時時間時正在執行，則 Lambda 步驟會失敗；不過，Lambda 函式會繼續執行。

Lambda 步驟執行時，無法停止管道程序，因為 Lambda 步驟調用的 Lambda 函式無法停止。如果您在 Lambda 函式執行時停止程序，管道會等待該函式完成，或直到逾時為止。這取決於哪個先發生。程序接著會停止。如果 Lambda 函式完成，則管道程序狀態為 `Stopped`。如果超時到達，則管道程序狀態為 `Failed`。
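The stop behavior described above reduces to a comparison between how long the Lambda function runs and the configured timeout. The function below is a hypothetical sketch of that decision. The default of 120 seconds and the 10-minute upper limit come from the text, and everything else is illustrative.

```python
MAX_TIMEOUT_S = 600  # Upper limit of 10 minutes, per the text above.


def status_after_stop_request(lambda_runtime_s, timeout_s=120):
    """Return the pipeline status when a stop is requested while the Lambda runs."""
    if not 0 < timeout_s <= MAX_TIMEOUT_S:
        raise ValueError("timeout must be between 1 and 600 seconds")
    # The pipeline waits for whichever happens first: completion or timeout.
    if lambda_runtime_s < timeout_s:
        return "Stopped"  # The function finished before the timeout.
    return "Failed"       # The timeout was reached first.


print(status_after_stop_request(lambda_runtime_s=45))
print(status_after_stop_request(lambda_runtime_s=300))
```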

## ClarifyCheck 步驟
<a name="step-type-clarify-check"></a>

Use the `ClarifyCheck` step to run baseline drift checks against previous baselines for bias analysis and model explainability. You can then generate and [register your baselines](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-quality-clarify-baseline-lifecycle.html#pipelines-quality-clarify-baseline-calculations) with the `model.register()` method and pass the output of that method to a [Model step](#step-type-model) using [`step_args`](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#model-step). Amazon SageMaker Model Monitor can use these drift check baselines for your model endpoints. As a result, you don't need to run a [baseline](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-create-baseline.html) suggestion separately.

`ClarifyCheck` 步驟也可以從模型註冊中提取漂移檢查基準。`ClarifyCheck` 步驟會使用 SageMaker Clarify 預先建置的容器。此容器提供一系列模型監控功能，包括建議限制條件和根據指定基準來驗證限制條件。如需詳細資訊，請參閱[預先建置的 SageMaker Clarify 容器](clarify-processing-job-configure-container.md)。

### 設定 ClarifyCheck 步驟
<a name="configuring-step-type-clarify"></a>

您可以設定 `ClarifyCheck` 步驟，以便每次在管道中使用該步驟時僅執行下列其中一種可用的檢查類型。
+ 資料偏差檢查
+ 模型偏差檢查
+ 模型可解釋性檢查

若要這樣做，請使用下列其中一個檢查類型值來設定 `clarify_check_config` 參數：
+ `DataBiasCheckConfig`
+ `ModelBiasCheckConfig`
+ `ModelExplainabilityCheckConfig`

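The `ClarifyCheck` step launches a processing job that runs the SageMaker Clarify prebuilt container and requires dedicated [configurations for the check and the processing job](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-configure-processing-jobs.html). `ClarifyCheckConfig` and `CheckJobConfig` are helper functions for these configurations. These helper functions are aligned with how the SageMaker Clarify processing job computes checks for model bias, data bias, or model explainability. For more information, see [Run SageMaker Clarify Processing Jobs for Bias Analysis and Explainability](clarify-processing-job-run.md).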

### 控制漂移檢查的步驟行為
<a name="controlling-step-type-clarify"></a>

`ClarifyCheck` 步驟需要下列兩個布林標記來控制其行為：
+ `skip_check`：此參數指示是否略過針對先前基準的漂移檢查。如果設定為 `False`，則必須有已設定檢查類型的先前基準。
+ `register_new_baseline`：此參數指示是否可透過步驟屬性 `BaselineUsedForDriftCheckConstraints` 存取新計算的基準。如果設定為 `False`，則也必須有已設定檢查類型的先前基準。這可以透過 `BaselineUsedForDriftCheckConstraints` 屬性存取。

如需更多詳細資訊，請參閱[Amazon SageMaker Pipelines 中具有 ClarifyCheck 和 QualityCheck 步驟的基準計算、漂移偵測和生命週期](pipelines-quality-clarify-baseline-lifecycle.md)。
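The interaction of the two flags can be sketched as a small function. This reflects only the rules stated above. `baseline_used_for_drift_check` is a hypothetical helper and the baseline values are placeholder strings, not real step properties.

```python
def baseline_used_for_drift_check(skip_check, register_new_baseline,
                                  previous_baseline=None,
                                  new_baseline="newly-calculated"):
    """Return the baseline exposed as BaselineUsedForDriftCheckConstraints."""
    # A previous baseline is needed when the drift check runs (skip_check=False)
    # or when the new baseline is not registered (register_new_baseline=False).
    needs_previous = (not skip_check) or (not register_new_baseline)
    if needs_previous and previous_baseline is None:
        raise ValueError("a previous baseline is required for this flag combination")
    return new_baseline if register_new_baseline else previous_baseline


# First run of a pipeline: no previous baseline exists yet.
print(baseline_used_for_drift_check(skip_check=True, register_new_baseline=True))
```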

### 處理基準
<a name="step-type-clarify-working-with-baselines"></a>

您可以選擇性地指定 `model_package_group_name` 來尋找現有的基準。然後，`ClarifyCheck` 步驟會在模型套件群組中提取最新核准的模型套件上的 `DriftCheckBaselines`。

或者，您可以透過 `supplied_baseline_constraints` 參數提供先前的基準。如果同時指定 `model_package_group_name` 和 `supplied_baseline_constraints`，則 `ClarifyCheck` 步驟會使用 `supplied_baseline_constraints` 參數指定的基準。
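The precedence between the two baseline sources can be written as one short function. The name `resolve_previous_baseline` and its second parameter are hypothetical. The sketch only encodes the rule that `supplied_baseline_constraints` wins when both are given.

```python
def resolve_previous_baseline(supplied_baseline_constraints=None,
                              model_package_group_baseline=None):
    """Pick the previous baseline, preferring an explicitly supplied one."""
    if supplied_baseline_constraints is not None:
        return supplied_baseline_constraints
    # Otherwise fall back to the baseline pulled from the model package group.
    return model_package_group_baseline


print(resolve_previous_baseline(
    supplied_baseline_constraints="s3://amzn-s3-demo-bucket/baseline/analysis.json",
    model_package_group_baseline="baseline-from-latest-approved-model-package",
))
```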

For more information about `ClarifyCheck` step requirements, see [sagemaker.workflow.clarify_check_step.ClarifyCheckStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.clarify_check_step.ClarifyCheckStep) in the *Amazon SageMaker AI SDK for Python*. For an Amazon SageMaker Studio Classic notebook that shows how to use the `ClarifyCheck` step in Pipelines, see [sagemaker-pipeline-model-monitor-clarify-steps.ipynb](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-pipelines/tabular/model-monitor-clarify-pipelines/sagemaker-pipeline-model-monitor-clarify-steps.ipynb).

**Example 為資料偏差檢查建立 `ClarifyCheck` 步驟**  

```
from sagemaker.workflow.check_job_config import CheckJobConfig
from sagemaker.workflow.clarify_check_step import DataBiasCheckConfig, ClarifyCheckStep
from sagemaker.workflow.execution_variables import ExecutionVariables

check_job_config = CheckJobConfig(
    role=role,
    instance_count=1,
    instance_type="ml.c5.xlarge",
    volume_size_in_gb=120,
    sagemaker_session=sagemaker_session,
)

data_bias_data_config = DataConfig(
    s3_data_input_path=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
    s3_output_path=Join(on='/', values=['s3:/', your_bucket, base_job_prefix, ExecutionVariables.PIPELINE_EXECUTION_ID, 'databiascheckstep']),
    label=0,
    dataset_type="text/csv",
    s3_analysis_config_output_path=data_bias_analysis_cfg_output_path,
)

data_bias_config = BiasConfig(
    label_values_or_threshold=[15.0], facet_name=[8], facet_values_or_threshold=[[0.5]]  
)

data_bias_check_config = DataBiasCheckConfig(
    data_config=data_bias_data_config,
    data_bias_config=data_bias_config,
)

data_bias_check_step = ClarifyCheckStep(
    name="DataBiasCheckStep",
    clarify_check_config=data_bias_check_config,
    check_job_config=check_job_config,
    skip_check=False,
    register_new_baseline=False,
    supplied_baseline_constraints="s3://sagemaker-us-west-2-111122223333/baseline/analysis.json",
    model_package_group_name="MyModelPackageGroup"
)
```

## QualityCheck 步驟
<a name="step-type-quality-check"></a>

Use the `QualityCheck` step to run [baseline suggestions](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-create-baseline.html) and drift checks against a previous baseline for data quality or model quality in a pipeline. You can then generate and [register your baselines](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-quality-clarify-baseline-lifecycle.html#pipelines-quality-clarify-baseline-calculations) with the `model.register()` method, and pass the output of that method to a [Model step](#step-type-model) using [`step_args`](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#model-step).

Model Monitor 可針對模型端點使用這些基準進行漂移檢查，因此您不需要單獨執行基準建議。`QualityCheck` 步驟也可以從模型註冊中提取漂移檢查基準。`QualityCheck` 步驟會利用 Amazon SageMaker AI Model Monitor 預先建置的容器。此容器具有一系列模型監控功能，包括建議限制條件、產生統計資料和根據基準驗證限制條件。如需詳細資訊，請參閱[Amazon SageMaker Model Monitor 預建容器](model-monitor-pre-built-container.md)。

### 設定 QualityCheck 步驟
<a name="configuring-step-type-quality"></a>

您可以將 `QualityCheck` 步驟設定為每次在管道中使用該步驟時，僅執行下列其中一種檢查類型。
+ 資料品質檢查
+ 模型品質檢查

為了執行此動作，您可以使用下列其中一個檢查類型值來設定 `quality_check_config` 參數：
+ `DataQualityCheckConfig`
+ `ModelQualityCheckConfig`

`QualityCheck` 步驟會啟動執行 Model Monitor 預先建置容器的處理任務，並需要檢查和處理任務的專用組態。`QualityCheckConfig` 和 `CheckJobConfig` 是這些組態的協助程式函式。這些協助程式函式與 Model Monitor 為模型品質或資料品質監控建立基準的方式一致。如需有關模型監視器基線建議的詳細資訊，請參閱[建立基準](model-monitor-create-baseline.md)和[建立模型品質基準](model-monitor-model-quality-baseline.md)。

### 控制漂移檢查的步驟行為
<a name="controlling-step-type-quality"></a>

`QualityCheck` 步驟需要下列兩個布林標記來控制其行為：
+ `skip_check`：此參數指示是否略過針對先前基準的漂移檢查。如果設定為 `False`，則必須有已設定檢查類型的先前基準。
+ `register_new_baseline`：此參數指示是否可透過步驟屬性 `BaselineUsedForDriftCheckConstraints` 和 `BaselineUsedForDriftCheckStatistics` 存取新計算的基準。如果設定為 `False`，則也必須有已設定檢查類型的先前基準。這些可透過 `BaselineUsedForDriftCheckConstraints` 和 `BaselineUsedForDriftCheckStatistics` 屬性存取。

如需更多詳細資訊，請參閱[Amazon SageMaker Pipelines 中具有 ClarifyCheck 和 QualityCheck 步驟的基準計算、漂移偵測和生命週期](pipelines-quality-clarify-baseline-lifecycle.md)。

### 處理基準
<a name="step-type-quality-working-with-baselines"></a>

您可以直接透過 `supplied_baseline_statistics` 和 `supplied_baseline_constraints` 參數指定先前的基準。您也可以指定 `model_package_group_name`，然後 `QualityCheck` 步驟會在模型套件組中提取最新核准的模型套件上的 `DriftCheckBaselines`。

當您指定下列項目時，`QualityCheck` 步驟會對 `QualityCheck` 步驟的檢查類型使用 `supplied_baseline_constraints` 和 `supplied_baseline_statistics` 指定的基準。
+ `model_package_group_name`
+ `supplied_baseline_constraints`
+ `supplied_baseline_statistics`
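
The precedence described above, where explicitly supplied baselines win over the ones found through the model package group, can be sketched as follows. This is an illustration only: `select_baselines` is a hypothetical helper, and the per-file fallback to the registered baseline when only one file is supplied is an assumption of this sketch rather than documented behavior.

```python
from typing import Dict, Optional


def select_baselines(
    supplied: Dict[str, Optional[str]],
    registered: Dict[str, Optional[str]],
) -> Dict[str, Optional[str]]:
    """Pick the baseline URI for each file, preferring supplied values."""
    chosen = {}
    for kind in ("constraints", "statistics"):
        explicit = supplied.get(kind)
        # Assumption: fall back to the registered baseline only when
        # nothing was supplied for this particular file.
        chosen[kind] = explicit if explicit is not None else registered.get(kind)
    return chosen


registered = {
    "constraints": "s3://amzn-s3-demo-bucket/registry/constraints.json",
    "statistics": "s3://amzn-s3-demo-bucket/registry/statistics.json",
}
supplied = {
    "constraints": "s3://amzn-s3-demo-bucket/baseline/constraints.json",
    "statistics": "s3://amzn-s3-demo-bucket/baseline/statistics.json",
}
print(select_baselines(supplied, registered))
```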

For more information about the `QualityCheck` step requirements, see [sagemaker.workflow.steps.QualityCheckStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.quality_check_step.QualityCheckStep) in the *Amazon SageMaker AI SDK for Python*. For an Amazon SageMaker Studio Classic notebook that shows how to use the `QualityCheck` step in Pipelines, see [sagemaker-pipeline-model-monitor-clarify-steps.ipynb](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-pipelines/tabular/model-monitor-clarify-pipelines/sagemaker-pipeline-model-monitor-clarify-steps.ipynb).

**Example 為資料品質檢查建立 `QualityCheck` 步驟**  

```
from sagemaker.model_monitor.dataset_format import DatasetFormat
from sagemaker.workflow.check_job_config import CheckJobConfig
from sagemaker.workflow.quality_check_step import DataQualityCheckConfig, QualityCheckStep
from sagemaker.workflow.execution_variables import ExecutionVariables
from sagemaker.workflow.functions import Join

check_job_config = CheckJobConfig(
    role=role,
    instance_count=1,
    instance_type="ml.c5.xlarge",
    volume_size_in_gb=120,
    sagemaker_session=sagemaker_session,
)

data_quality_check_config = DataQualityCheckConfig(
    baseline_dataset=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
    dataset_format=DatasetFormat.csv(header=False, output_columns_position="START"),
    output_s3_uri=Join(on='/', values=['s3:/', your_bucket, base_job_prefix, ExecutionVariables.PIPELINE_EXECUTION_ID, 'dataqualitycheckstep'])
)

data_quality_check_step = QualityCheckStep(
    name="DataQualityCheckStep",
    skip_check=False,
    register_new_baseline=False,
    quality_check_config=data_quality_check_config,
    check_job_config=check_job_config,
    supplied_baseline_statistics="s3://sagemaker-us-west-2-555555555555/baseline/statistics.json",
    supplied_baseline_constraints="s3://sagemaker-us-west-2-555555555555/baseline/constraints.json",
    model_package_group_name="MyModelPackageGroup"
)
```

## EMR 步驟
<a name="step-type-emr"></a>

Use the Amazon SageMaker Pipelines [EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-overview.html) step to do either of the following:
+ Process [Amazon EMR steps](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-work-with-steps.html) on a running Amazon EMR cluster.
+ Have the pipeline create and manage an Amazon EMR cluster for you.

如需有關 Amazon EMR 的詳細資訊，請參閱 [Getting started with Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-gs.html)。

EMR 步驟要求 `EMRStepConfig` 包括 Amazon EMR 叢集所使用的 JAR 檔案的位置，以及要傳遞的任何引數。如果您想要在執行中的 EMR 叢集上執行步驟，您也會提供 Amazon EMR 叢集 ID。您也可以傳遞叢集組態，在其為您建立、管理和終止的叢集上執行 EMR 步驟。以下章節包括示範這兩種方法的範例和範例筆記本連結。

**注意**  
EMR 步驟要求傳遞至管道的角色具有其他許可。將 [AWS 受管政策：`AmazonSageMakerPipelinesIntegrations`](https://docs.aws.amazon.com/sagemaker/latest/dg/security-iam-awsmanpol-pipelines.html#security-iam-awsmanpol-AmazonSageMakerPipelinesIntegrations) 連接至您的管道角色，或確保該角色包含該政策中的許可。
If you process an EMR step on a running cluster, you can only use a cluster that is in one of the following states: `STARTING`, `BOOTSTRAPPING`, `RUNNING`, or `WAITING`.
如果您在正在執行的叢集上處理 EMR 步驟，則 EMR 叢集上最多可以有 256 個處於 `PENDING` 狀態的 EMR 步驟。提交超出此限制的 EMR 步驟會導致管道執行失敗。您可以考慮使用 [管道步驟的重試政策](pipelines-retry-policy.md)。
您可以指定叢集 ID 或叢集組態，但不能同時指定兩者。
EMR 步驟憑藉 Amazon EventBridge 來監控 EMR 步驟或叢集狀態變更。如果您在正在執行的叢集上處理 Amazon EMR 任務，EMR 步驟會使用 `SageMakerPipelineExecutionEMRStepStatusUpdateRule` 規則來監控 EMR 步驟狀態。如果您在 EMR 步驟建立的叢集上處理任務，則此步驟會使用 `SageMakerPipelineExecutionEMRClusterStatusRule` 規則來監控叢集狀態中的變化。如果您在 AWS 帳戶中看到這些 EventBridge 規則之一，請不要刪除它們，否則 EMR 步驟可能無法完成。
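
The constraints in the preceding note can be checked before you define the step. The following is a minimal sketch in plain Python; `check_emr_target` and its parameters are hypothetical names for illustration, and the cluster state and pending step count are assumed to have been looked up separately (for example, through the Amazon EMR API).

```python
from typing import List, Optional

USABLE_CLUSTER_STATES = {"STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"}
MAX_PENDING_EMR_STEPS = 256


def check_emr_target(
    cluster_id: Optional[str] = None,
    cluster_config: Optional[dict] = None,
    cluster_state: Optional[str] = None,
    pending_steps: int = 0,
) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    # A cluster ID and a cluster configuration are mutually exclusive.
    if cluster_id is not None and cluster_config is not None:
        problems.append("specify either cluster_id or cluster_config, not both")
    if cluster_id is not None:
        # A running cluster must be in one of the usable states.
        if cluster_state not in USABLE_CLUSTER_STATES:
            problems.append(f"cluster state {cluster_state!r} cannot accept EMR steps")
        # Submitting another step must not exceed the PENDING limit.
        if pending_steps >= MAX_PENDING_EMR_STEPS:
            problems.append("cluster already has the maximum number of PENDING steps")
    return problems


print(check_emr_target(cluster_id="j-1ABCDEFG2HIJK", cluster_state="WAITING"))
```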

**將 Amazon EMR 步驟新增至管道**

若要將 EMR 步驟新增至管道，請執行下列動作：
+ Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).
+ 在左側導覽窗格中，選取**管道**。
+ 選擇**建立**。
+ 選擇**空白**。
+ 在左側邊欄中，選擇**處理資料**並將其拖曳至畫布。
+ 在畫布中，選擇您新增的**處理資料**步驟。
+ In the right sidebar, under **Mode**, choose **EMR (managed)**.
+ In the right sidebar, complete the forms in the **Settings** and **Details** tabs. For information about the fields in these tabs, see [sagemaker.workflow.emr_step.EMRStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.emr_step.EMRStep).

**在正在執行的 Amazon EMR 叢集上啟動新任務**

若要在執行中的 Amazon EMR 叢集上啟動新任務，請將叢集 ID 作為字串傳遞至 `EMRStep` 的 `cluster_id` 引數。下列範例示範此程序。

```
from sagemaker.workflow.emr_step import EMRStep, EMRStepConfig

emr_config = EMRStepConfig(
    jar="jar-location", # required, path to jar file used
    args=["--verbose", "--force"], # optional list of arguments to pass to the jar
    main_class="com.my.Main1", # optional main class, this can be omitted if jar above has a manifest 
    properties=[ # optional list of Java properties that are set when the step runs
        {
            "key": "mapred.tasktracker.map.tasks.maximum",
            "value": "2"
        },
        {
            "key": "mapreduce.map.sort.spill.percent",
            "value": "0.90"
        },
        {
            "key": "mapreduce.tasktracker.reduce.tasks.maximum",
            "value": "5"
        }
    ]
)

step_emr = EMRStep(
    name="EMRSampleStep", # required
    cluster_id="j-1ABCDEFG2HIJK", # include cluster_id to use a running cluster
    step_config=emr_config, # required
    display_name="My EMR Step",
    description="Pipeline step to execute EMR job"
)
```

如需逐步引導您完成整個範例的範例筆記本，請參閱 [Pipelines EMR Step With Running EMR Cluster](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-pipelines/tabular/emr-step/sagemaker-pipelines-emr-step-with-running-emr-cluster.ipynb)。

**在新的 Amazon EMR 叢集上啟動新任務**

若要在 `EMRStep` 為您建立的新叢集上啟動新任務，請提供您的叢集組態作為字典。字典必須具有與 [RunJobFlow](https://docs.aws.amazon.com/emr/latest/APIReference/API_RunJobFlow.html) 請求相同的結構。但是，請勿在叢集組態中包含下列欄位：
+ [`Name`]
+ [`Steps`]
+ [`AutoTerminationPolicy`]
+ [`Instances`][`KeepJobFlowAliveWhenNoSteps`]
+ [`Instances`][`TerminationProtected`]

所有其他 `RunJobFlow` 引數都可以在叢集組態中使用。如需有關請求語法的詳細資訊，請參閱 [RunJobFlow](https://docs.aws.amazon.com/emr/latest/APIReference/API_RunJobFlow.html)。
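
Before passing a cluster configuration to the step, you can check it for the excluded fields listed above. The following is a minimal sketch in plain Python; `find_disallowed_fields` is a hypothetical helper, not an SDK function, and it only looks at the five field paths named in the list.

```python
from typing import List

# Field paths that must not appear in the cluster configuration.
DISALLOWED_PATHS = [
    ("Name",),
    ("Steps",),
    ("AutoTerminationPolicy",),
    ("Instances", "KeepJobFlowAliveWhenNoSteps"),
    ("Instances", "TerminationProtected"),
]


def find_disallowed_fields(cluster_config: dict) -> List[str]:
    """Return the dotted paths of excluded fields present in the config."""
    found = []
    for path in DISALLOWED_PATHS:
        node = cluster_config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                break  # this path is not present in the config
            node = node[key]
        else:
            found.append(".".join(path))
    return found


config = {
    "Name": "my-cluster",
    "ReleaseLabel": "emr-6.6.0",
    "Instances": {"TerminationProtected": False, "InstanceGroups": []},
}
print(find_disallowed_fields(config))
```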

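The following example passes a cluster configuration to an EMR step definition. This prompts the step to launch a new job on a new EMR cluster. The EMR cluster configuration in this example includes specifications for primary and core EMR cluster nodes. For more information about Amazon EMR node types, see [Understand node types: primary, core, and task nodes](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-master-core-task-nodes.html).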

```
from sagemaker.workflow.emr_step import EMRStep, EMRStepConfig

emr_step_config = EMRStepConfig(
    jar="jar-location", # required, path to jar file used
    args=["--verbose", "--force"], # optional list of arguments to pass to the jar
    main_class="com.my.Main1", # optional main class, this can be omitted if jar above has a manifest 
    properties=[ # optional list of Java properties that are set when the step runs
        {
            "key": "mapred.tasktracker.map.tasks.maximum",
            "value": "2"
        },
        {
            "key": "mapreduce.map.sort.spill.percent",
            "value": "0.90"
        },
        {
            "key": "mapreduce.tasktracker.reduce.tasks.maximum",
            "value": "5"
        }
    ]
)

# include your cluster configuration as a dictionary
emr_cluster_config = {
    "Applications": [
        {
            "Name": "Spark", 
        }
    ],
    "Instances":{
        "InstanceGroups":[
            {
                "InstanceRole": "MASTER",
                "InstanceCount": 1,
                "InstanceType": "m5.2xlarge"
            },
            {
                "InstanceRole": "CORE",
                "InstanceCount": 2,
                "InstanceType": "m5.2xlarge"
            }
        ]
    },
    "BootstrapActions":[],
    "ReleaseLabel": "emr-6.6.0",
    "JobFlowRole": "job-flow-role",
    "ServiceRole": "service-role"
}

emr_step = EMRStep(
    name="emr-step",
    cluster_id=None,
    display_name="emr_step",
    description="MyEMRStepDescription",
    step_config=emr_step_config,
    cluster_config=emr_cluster_config
)
```

如需逐步引導您完成整個範例的範例筆記本，請參閱 [Pipelines EMR Step With Cluster Lifecycle Management](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-pipelines/tabular/emr-step/sagemaker-pipelines-emr-step-with-cluster-lifecycle-management.ipynb)。

## EMR 無伺服器步驟
<a name="step-type-serverless"></a>

若要將 EMR 無伺服器步驟新增至管道，請執行下列動作：
+ Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).
+ 在左側導覽窗格中，選取**管道**。
+ 選擇**建立**。
+ 選擇**空白**。
+ 在左側邊欄中，選擇**處理資料**並將其拖曳至畫布。
+ 在畫布中，選擇您新增的**處理資料**步驟。
+ In the right sidebar, under **Mode**, choose **EMR (serverless)**.
+ In the right sidebar, complete the forms in the **Settings** and **Details** tabs.

## 筆記本任務步驟
<a name="step-type-notebook-job"></a>

使用 `NotebookJobStep` 以非互動方式將您的 SageMaker 筆記本任務做為管道步驟執行。如果您在 Pipelines 拖放 UI 中建置管道，請使用 [執行程式碼步驟](#step-type-executecode) 來執行筆記本。如需 SageMaker 筆記本任務的詳細資訊，請參閱 [SageMaker 筆記本工作](notebook-auto-run.md)。

`NotebookJobStep` 至少需要一個輸入筆記本、映像 URI 和核心名稱。如需筆記本任務步驟要求以及您可以設定為自訂步驟的其他參數的詳細資訊，請參閱 [sagemaker.workflow.steps.NotebookJobStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.notebook_job_step.NotebookJobStep)。

下列範例使用最少引數來定義 `NotebookJobStep`。

```
from sagemaker.workflow.notebook_job_step import NotebookJobStep


notebook_job_step = NotebookJobStep(
    input_notebook=input_notebook,
    image_uri=image_uri,
    kernel_name=kernel_name
)
```

您的 `NotebookJobStep` 管道步驟會被視為 SageMaker 筆記本任務。因此，請在 Studio Classic UI 筆記本任務儀表板中追蹤執行狀態，方法是使用 `tags` 引數包含特定標籤。如需要包含的標籤詳細資訊，請參閱[在 Studio UI 儀表板中檢視您的筆記本任務](create-notebook-auto-run-sdk.md#create-notebook-auto-run-dash)。

此外，如果您使用 SageMaker Python SDK 排程筆記本任務，您只能指定特定映像來執行筆記本任務。如需詳細資訊，請參閱[SageMaker AI Python SDK 筆記本任務的映像限制條件](notebook-auto-run-constraints.md#notebook-auto-run-constraints-image-sdk)。

## 失敗步驟
<a name="step-type-fail"></a>

使用失敗步驟在未達到所需條件或狀態時停止 Amazon SageMaker Pipelines 執行。失敗步驟也可讓您輸入自訂錯誤訊息，指出管道執行失敗的原因。

**注意**  
當失敗步驟和其他管道步驟同時執行時，在所有並行步驟都完成之前，管道都不會終止。

### 使用失敗步驟的限制
<a name="step-type-fail-limitations"></a>
+ 您無法將失敗步驟新增至其他步驟的 `DependsOn` 清單。如需詳細資訊，請參閱[步驟之間的自訂相依性](build-and-manage-steps.md#build-and-manage-custom-dependency)。
+ 其他步驟無法參考失敗步驟。它*始終*是管道執行的最後一步。
+ 您無法重試以失敗步驟結束的管道執行。

You can create the Fail step's error message as a static text string. Alternatively, if you use the SDK, you can use [Pipeline Parameters](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-parameters.html), a [Join](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html?highlight=Join#sagemaker.workflow.functions.Join) operation, or other [step properties](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#build-and-manage-properties) to create a more informative error message.

------
#### [ Pipeline Designer ]

若要將失敗步驟新增至您的管道，請執行下列動作：

1. 遵循 [啟動 Amazon SageMaker Studio](studio-updated-launch.md) 中的指示開啟 Studio 主控台。

1. 在左側導覽窗格中，選取**管道**。

1. 選擇**建立**。

1. 選擇**空白**。

1. 在左側邊欄中，選擇**失敗**並將其拖曳至畫布。

1. 在畫布中，選擇您已新增的**失敗**步驟。

1. In the right sidebar, complete the forms in the **Settings** and **Details** tabs. For information about the fields in these tabs, see [sagemaker.workflow.fail_step.FailStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.fail_step.FailStep).

1. 如果畫布包含緊接在您已新增之**失敗**步驟之前的任何步驟，請按一下游標，然後將其從該步驟拖曳到**失敗**步驟以建立邊緣。

1. 如果畫布包含緊接在您已新增之**失敗**步驟之後的任何步驟，請按一下游標，然後將其從**失敗**步驟拖曳到該步驟以建立邊緣。

------
#### [ SageMaker Python SDK ]

**Example**  
下列範例程式碼片段使用 `FailStep` (包含透過管道參數和 `Join` 操作設定的 `ErrorMessage`)。  

```
from sagemaker.workflow.fail_step import FailStep
from sagemaker.workflow.functions import Join
from sagemaker.workflow.parameters import ParameterInteger

mse_threshold_param = ParameterInteger(name="MseThreshold", default_value=5)
step_fail = FailStep(
    name="AbaloneMSEFail",
    error_message=Join(
        on=" ", values=["Execution failed due to MSE >", mse_threshold_param]
    ),
)
```

------

# 新增整合
<a name="build-and-manage-steps-integration"></a>

The MLflow integration lets you use MLflow with pipelines to select a tracking server or serverless application, choose an experiment, and log metrics.

## 重要概念
<a name="add-integration-key-concepts"></a>

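**Default app creation** - A default MLflow application is created when you open the pipeline visual editor.

**Integrations panel** - A new integrations panel includes MLflow, which you can select and configure.

**Update app and experiment** - An option to override the selected application and experiment during a pipeline execution.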

## 運作方式
<a name="add-integration-how-it-works"></a>
+ 前往**管道視覺化編輯器**
+ 選擇工具列上的**整合** 
+ 選擇 **MLflow**
+ 設定 MLflow 應用程式和實驗

## 範例螢幕擷取畫面
<a name="add-integration-example-screenshots"></a>

整合側邊面板

![\[要執行的描述。\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/screenshot-pipeline-1.png)


MLflow 組態

![\[要執行的描述。\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/screenshot-pipeline-2.png)


如何在管道執行期間覆寫實驗

![\[要執行的描述。\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/screenshot-pipeline-3.png)


## 步驟屬性
<a name="build-and-manage-properties"></a>

使用 `properties` 屬性在管道中的步驟之間新增資料相依性。Pipelines 會使用這些資料相依性，從管道定義建構 DAG。這些屬性可以作為預留位置值參考，並在執行期解析。

Pipelines 步驟的 `properties` 屬性與 `Describe` 呼叫針對對應 SageMaker AI 任務類型傳回的物件相符。對於每個任務類型，`Describe` 呼叫都會傳回下列回應物件：
+ `ProcessingStep` – [DescribeProcessingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeProcessingJob.html)
+ `TrainingStep` – [DescribeTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTrainingJob.html)
+ `TransformStep` – [DescribeTransformJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTransformJob.html)

若要在建立資料相依性期間檢查每個步驟類型可參考的屬性，請參閱 [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) 中的 *[資料相依性 - 屬性參考](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#data-dependency-property-reference)*。

## 步驟平行處理
<a name="build-and-manage-parallelism"></a>

當步驟不依賴任何其他步驟時，其會在管道執行時立即執行。不過，平行執行過多管道步驟可能會快速耗盡可用的資源。使用 `ParallelismConfiguration` 控制管道執行的並行步驟數量。

下列範例使用 `ParallelismConfiguration` 將並行步驟限制為 5 個。

```
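from sagemaker.workflow.parallelism_config import ParallelismConfiguration

# Limit the pipeline execution to at most five concurrent steps
pipeline.create(
    parallelism_config=ParallelismConfiguration(5),
)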
```
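
As a local analogy for what a concurrency cap means, the following sketch runs twelve independent placeholder "steps" on threads while a semaphore admits at most five at a time. It does not call SageMaker AI; `run_with_limit` and the step names are hypothetical, and the sleep stands in for real work.

```python
import threading
import time


def run_with_limit(step_names, max_parallel):
    """Run placeholder steps on threads and return the peak concurrency seen."""
    gate = threading.Semaphore(max_parallel)  # admits at most max_parallel steps
    lock = threading.Lock()
    running = 0
    peak = 0

    def run_step(name):
        nonlocal running, peak
        with gate:
            with lock:
                running += 1
                peak = max(peak, running)
            time.sleep(0.01)  # stand-in for the work of one step
            with lock:
                running -= 1

    threads = [threading.Thread(target=run_step, args=(n,)) for n in step_names]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return peak


peak = run_with_limit([f"step-{i}" for i in range(12)], max_parallel=5)
print(f"peak concurrent steps: {peak}")
```

All twelve steps are ready at once, but the observed peak never exceeds the configured limit; the remaining steps wait until a slot frees up.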

## 步驟之間的資料相依性
<a name="build-and-manage-data-dependency"></a>

您可以透過指定步驟之間的資料關係來定義 DAG 的結構。若要在步驟之間建立資料相依性，請將一個步驟的屬性作為輸入傳遞至管道中的另一個步驟。在提供輸入的步驟執行完後，接收輸入的步驟才會啟動。

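Data dependencies use JsonPath notation in the following format. This format traverses the JSON property file, which means that you can append as many *<property>* instances as needed to reach the desired nested property in the file. For more information about JsonPath notation, see the [JsonPath repo](https://github.com/json-path/JsonPath).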

```
<step_name>.properties.<property>.<property>
```

下面示範如何使用處理步驟的 `ProcessingOutputConfig` 屬性來指定 Amazon S3 儲存貯體。

```
step_process.properties.ProcessingOutputConfig.Outputs["train_data"].S3Output.S3Uri
```
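
To make the traversal concrete, the following sketch resolves the same chain of keys against a plain dictionary. It is an illustration only: `resolve_property` is a hypothetical helper, the sample response is made up, and `Outputs` is modeled here as a dictionary keyed by output name for simplicity. In a pipeline, the property reference is a placeholder that Pipelines resolves at runtime.

```python
def resolve_property(response: dict, path: list):
    """Walk a nested dictionary by following each key in the path."""
    node = response
    for key in path:
        node = node[key]
    return node


# Made-up stand-in for the properties of a completed processing step.
describe_response = {
    "ProcessingOutputConfig": {
        "Outputs": {
            "train_data": {
                "S3Output": {"S3Uri": "s3://amzn-s3-demo-bucket/output/train"}
            }
        }
    }
}

uri = resolve_property(
    describe_response,
    ["ProcessingOutputConfig", "Outputs", "train_data", "S3Output", "S3Uri"],
)
print(uri)
```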

若要建立資料相依性，請將儲存貯體傳遞至訓練步驟，如下所示。

```
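from sagemaker.inputs import TrainingInput
from sagemaker.sklearn.estimator import SKLearn
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import TrainingStep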

sklearn_train = SKLearn(..., sagemaker_session=PipelineSession())

step_train = TrainingStep(
    name="CensusTrain",
    step_args=sklearn_train.fit(inputs=TrainingInput(
        s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
            "train_data"].S3Output.S3Uri
    ))
)
```

To inspect which properties you can reference for each step type when creating data dependencies, see [Data Dependency - Property Reference](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#data-dependency-property-reference) in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable).

## 步驟之間的自訂相依性
<a name="build-and-manage-custom-dependency"></a>

當您指定資料相依性時，Pipelines 會提供步驟之間的資料連線。或者，一個步驟可以存取上一個步驟中的資料，而無需直接使用 Pipelines。在此情況下，您可以建立自訂相依性，告知 Pipelines 在另一個步驟完成執行之前不要開始步驟。您可以透過指定步驟的 `DependsOn` 屬性來建立自訂相依性。

例如，下列內容定義了僅在 `A` 步驟和 `B` 步驟執行完後才開始的 `C` 步驟。

```
{
  'Steps': [
    {'Name':'A', ...},
    {'Name':'B', ...},
    {'Name':'C', 'DependsOn': ['A', 'B']}
  ]
}
```

如果相依性將建立循環相依性，Pipelines 會擲回驗證例外狀況。
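
The ordering and the cycle check can be illustrated locally with the Python standard library. This is a minimal sketch, not how Pipelines validates a definition: it feeds a `DependsOn`-style mapping to `graphlib.TopologicalSorter` (Python 3.9 or later), and the helper name `execution_order` is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(depends_on: dict) -> list:
    """Return one valid run order for a mapping of step -> prerequisite steps."""
    return list(TopologicalSorter(depends_on).static_order())


# Step C depends on both A and B, as in the definition above.
order = execution_order({"A": [], "B": [], "C": ["A", "B"]})
print(order)

# A circular dependency is rejected.
try:
    execution_order({"A": ["B"], "B": ["A"]})
    cycle_detected = False
except CycleError:
    cycle_detected = True
print(f"cycle detected: {cycle_detected}")
```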

下列範例會建立在處理步驟執行完後開始的訓練步驟。

```
processing_step = ProcessingStep(...)
training_step = TrainingStep(...)

training_step.add_depends_on([processing_step])
```

下列範例會建立在兩個不同的處理步驟執行完後才開始的訓練步驟。

```
processing_step_1 = ProcessingStep(...)
processing_step_2 = ProcessingStep(...)

training_step = TrainingStep(...)

training_step.add_depends_on([processing_step_1, processing_step_2])
```

下面提供了建立自訂相依性的替代方法。

```
training_step.add_depends_on([processing_step_1])
training_step.add_depends_on([processing_step_2])
```

下列範例會建立一個訓練步驟，接收來自一個處理步驟的輸入，並等待另一個處理步驟執行完成。

```
processing_step_1 = ProcessingStep(...)
processing_step_2 = ProcessingStep(...)

training_step = TrainingStep(
    ...,
    inputs=TrainingInput(
        s3_data=processing_step_1.properties.ProcessingOutputConfig.Outputs[
            "train_data"
        ].S3Output.S3Uri
    )
)

training_step.add_depends_on([processing_step_2])
```

下列範例示範如何擷取步驟之自訂相依性的字串清單。

```
custom_dependencies = training_step.depends_on
```

## 步驟中的自訂映像
<a name="build-and-manage-images"></a>

 在您的管道中建立步驟時，您可以使用任何可用的 SageMaker AI [深度學習容器映像](https://github.com/aws/deep-learning-containers/blob/master/available_images.md)。

您也可以在管道步驟中使用您自己的容器。因為您無法從 Studio Classic 內建立映像，所以必須先使用另一種方法建立映像，然後才能使用該映像搭配 Pipelines。

若要在為管道建立步驟時使用您自己的容器，請在估算器定義中包含映像 URI。如需使用您自己的容器搭配 SageMaker AI 的詳細資訊，請參閱[使用 Docker 容器搭配 SageMaker AI](https://docs.aws.amazon.com/sagemaker/latest/dg/docker-containers.html)。

# 使用 @step 裝飾項目直接移轉程式碼
<a name="pipelines-step-decorator"></a>

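The `@step` decorator is a feature that converts your local machine learning (ML) code into one or more pipeline steps. You can write your ML function as you would for any ML project. Once you have tested it locally, or as a training job using the `@remote` decorator, you can convert the function to a SageMaker AI pipeline step by adding the `@step` decorator. You can then pass the output of the `@step`-decorated function call as a step to Pipelines to create and run a pipeline. You can also chain a series of functions with the `@step` decorator to create a multi-step directed acyclic graph (DAG) pipeline.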

The setup to use the `@step` decorator is the same as the setup to use the `@remote` decorator. For details about how to [set up the environment](https://docs.aws.amazon.com/sagemaker/latest/dg/train-remote-decorator.html#train-remote-decorator-env) and [use a configuration file](https://docs.aws.amazon.com/sagemaker/latest/dg/train-remote-decorator-config.html) to set defaults, see the remote function documentation. For more information about the `@step` decorator, see [sagemaker.workflow.function_step.step](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.function_step.step).

若要檢視示範如何使用 `@step` 裝飾項目的範例筆記本，請參閱 [@step 裝飾項目範例筆記本](https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-pipelines/step-decorator)。

下列各節說明如何使用 `@step` 裝飾項目註釋您的本機 ML 程式碼，以建立步驟、使用步驟建立和執行管道，以及為您的使用案例自訂體驗。

**Topics**
+ [使用 `@step` 裝飾的函式建立管道](pipelines-step-decorator-create-pipeline.md)
+ [執行管道](pipelines-step-decorator-run-pipeline.md)
+ [設定您的管道](pipelines-step-decorator-cfg-pipeline.md)
+ [最佳實務](pipelines-step-decorator-best.md)
+ [限制](pipelines-step-decorator-limit.md)

# 使用 `@step` 裝飾的函式建立管道
<a name="pipelines-step-decorator-create-pipeline"></a>

您可以建立一個管道，方法是使用 `@step` 裝飾項目將 Python 函式轉換為管道步驟、在這些函式之間建立相依性以建立管道圖形 (或有向無環圖 (DAG))，然後將該圖形的分葉節點做為步驟清單傳遞至管道。下列各節會使用範例詳細說明此程序。

**Topics**
+ [將函式轉換為步驟](#pipelines-step-decorator-run-pipeline-convert)
+ [在步驟之間建立相依性](#pipelines-step-decorator-run-pipeline-link)
+ [使用 `ConditionStep` 搭配 `@step` 裝飾的步驟](#pipelines-step-decorator-condition)
+ [使用步驟的 `DelayedReturn` 輸出定義管道](#pipelines-step-define-delayed)
+ [建立管道](#pipelines-step-decorator-pipeline-create)

## 將函式轉換為步驟
<a name="pipelines-step-decorator-run-pipeline-convert"></a>

若要使用 `@step` 裝飾項目建立步驟，請使用 `@step` 註釋函式。下列範例顯示預先處理資料的 `@step` 裝飾函式。

```
import pandas
from sagemaker.workflow.function_step import step

@step
def preprocess(raw_data):
    df = pandas.read_csv(raw_data)
    ...
    return processed_dataframe
    
step_process_result = preprocess(raw_data)
```

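When you invoke a `@step`-decorated function, SageMaker AI returns a `DelayedReturn` instance instead of running the function. A `DelayedReturn` instance is a proxy for the actual return of that function. You can pass a `DelayedReturn` instance to another function as an argument, or directly to a pipeline instance as a step. For information about the `DelayedReturn` class, see [sagemaker.workflow.function_step.DelayedReturn](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.function_step.DelayedReturn).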
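
The idea of returning a proxy instead of running the function can be illustrated with a small stand-alone sketch. The `lazy_step` decorator and `Deferred` class below are hypothetical and much simpler than the SDK's `@step` and `DelayedReturn`; they only show that calling a decorated function records the call, and that upstream calls can be discovered from the recorded arguments.

```python
import functools


class Deferred:
    """Records a function call so that it can be run later."""

    def __init__(self, func, args, kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def upstream(self):
        """Return the deferred calls that this call receives as inputs."""
        inputs = (*self.args, *self.kwargs.values())
        return [value for value in inputs if isinstance(value, Deferred)]

    def run(self):
        """Run upstream calls first, then this function."""
        args = [a.run() if isinstance(a, Deferred) else a for a in self.args]
        kwargs = {
            k: v.run() if isinstance(v, Deferred) else v
            for k, v in self.kwargs.items()
        }
        return self.func(*args, **kwargs)


def lazy_step(func):
    """Decorator that returns a Deferred instead of executing the function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return Deferred(func, args, kwargs)
    return wrapper


calls = []  # records which functions have actually executed


@lazy_step
def preprocess(raw):
    calls.append("preprocess")
    return [x * 2 for x in raw]


@lazy_step
def train(data):
    calls.append("train")
    return sum(data)


result = train(preprocess([1, 2, 3]))  # nothing executes yet
executed_before_run = list(calls)
value = result.run()                   # runs preprocess, then train
print(executed_before_run, value, calls)
```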

## 在步驟之間建立相依性
<a name="pipelines-step-decorator-run-pipeline-link"></a>

當您在兩個步驟之間建立相依性時，您可以在管道圖中的步驟之間建立連線。下列各節介紹在管道步驟之間建立相依性的多種方式。

### 透過輸入引數的資料相依性
<a name="pipelines-step-decorator-run-pipeline-link-interstep"></a>

將某個函式的 `DelayedReturn` 輸出做為輸入傳遞至另一個函式，會自動在管道 DAG 中建立資料相依性。在下列範例中，將 `preprocess` 函式的 `DelayedReturn` 輸出傳遞至 `train` 函式，會在 `preprocess` 與 `train` 之間建立相依性。

```
import pandas
from sagemaker.workflow.function_step import step

@step
def preprocess(raw_data):
    df = pandas.read_csv(raw_data)
    ...
    return processed_dataframe

@step
def train(training_data):
    ...
    return trained_model

step_process_result = preprocess(raw_data)    
step_train_result = train(step_process_result)
```

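The previous example defines a training function that is decorated with `@step`. When this function is invoked, it receives the `DelayedReturn` output of the preprocessing pipeline step as input. Invoking the training function returns another `DelayedReturn` instance. This instance holds the information about all the previous steps defined in that function (in this example, the `preprocess` step) that form the pipeline DAG.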

在前一個範例中，`preprocess` 函式會傳回單一值。如需清單或元組等更複雜的傳回類型，請參閱[限制](pipelines-step-decorator-limit.md)。

### 定義自訂相依性
<a name="pipelines-step-decorator-run-pipeline-link-custom"></a>

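In the previous example, the `train` function received the `DelayedReturn` output of `preprocess` and created a dependency. If you want to define the dependency explicitly without passing the previous step's output, use the `add_depends_on` function with the step. You can use the `get_step()` function to retrieve the underlying step from its `DelayedReturn` instance, and then call `add_depends_on` with the dependency as input. To view the `get_step()` function definition, see [sagemaker.workflow.step_outputs.get_step](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.step_outputs.get_step). The following example shows how to create a dependency between `preprocess` and `train` using `get_step()` and `add_depends_on()`.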

```
import pandas
from sagemaker.workflow.function_step import step
from sagemaker.workflow.step_outputs import get_step

@step
def preprocess(raw_data):
    df = pandas.read_csv(raw_data)
    ...
    processed_data = ...
    return s3.upload(processed_data)

@step
def train():
    training_data = s3.download(....)
    ...
    return trained_model

step_process_result = preprocess(raw_data)    
step_train_result = train()

get_step(step_train_result).add_depends_on([step_process_result])
```

### 將資料從 `@step` 裝飾函式傳遞至傳統管道步驟，或從中將資料傳遞至該函式
<a name="pipelines-step-decorator-run-pipeline-link-pass"></a>

您可以建立一個管道，其中包含 `@step` 裝飾步驟和傳統管道步驟，以及在它們之間傳遞資料。例如，您可以使用 `ProcessingStep` 來處理資料，並將其結果傳遞至 `@step` 裝飾的訓練函式。在下列範例中，`@step` 裝飾的訓練步驟會參考處理步驟的輸出。

```
# Define processing step

from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep

sklearn_processor = SKLearnProcessor(
    framework_version='1.2-1',
    role='arn:aws:iam::123456789012:role/SagemakerExecutionRole',
    instance_type='ml.m5.large',
    instance_count=1,
)

inputs = [
    ProcessingInput(source=input_data, destination="/opt/ml/processing/input"),
]
outputs = [
    ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
    ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
    ProcessingOutput(output_name="test", source="/opt/ml/processing/test")
]

process_step = ProcessingStep(
    name="MyProcessStep",
    step_args=sklearn_processor.run(inputs=inputs, outputs=outputs,code='preprocessing.py'),
)
```

```
# Define a @step-decorated train step which references the 
# output of a processing step

@step
def train(train_data_path, test_data_path):
    ...
    return trained_model
    
step_train_result = train(
   process_step.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
   process_step.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri,
)
```

## 使用 `ConditionStep` 搭配 `@step` 裝飾的步驟
<a name="pipelines-step-decorator-condition"></a>

Pipelines 支援 `ConditionStep` 類別，其會評估先前步驟的結果，以決定要在管道中採取的動作。您也可以使用 `ConditionStep` 搭配 `@step` 裝飾的步驟。若要使用任何 `@step` 裝飾步驟的輸出搭配 `ConditionStep`，請輸入該步驟的輸出做為 `ConditionStep` 的引數。在下列範例中，條件步驟會收到 `@step` 裝飾的模型評估步驟的輸出。

```
# Define steps

@step(name="evaluate")
def evaluate_model():
    # code to evaluate the model
    return {
        "rmse":rmse_value
    }
    
@step(name="register")
def register_model():
    # code to register the model
    ...
```

```
# Define ConditionStep

from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.fail_step import FailStep

conditionally_register = ConditionStep(
    name="conditional_register",
    conditions=[
        ConditionGreaterThanOrEqualTo(
            # Output of the evaluate step must be JSON serializable
            left=evaluate_model()["rmse"],
            right=5,
        )
    ],
    if_steps=[FailStep(name="Fail", error_message="Model performance is not good enough")],
    else_steps=[register_model()],
)
```
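
The branching logic of the condition above can be restated in plain Python. This sketch is only an illustration of which branch runs: `choose_branch` is a hypothetical function, and the threshold of 5 mirrors the `right=5` value in the example.

```python
def choose_branch(rmse: float, threshold: float = 5) -> str:
    """Mirror the condition: if rmse >= threshold, fail; otherwise register."""
    # The condition holds -> the if_steps branch (the fail step) runs.
    if rmse >= threshold:
        return "fail"
    # The condition does not hold -> the else_steps branch (register) runs.
    return "register"


for rmse in (3.2, 5.0, 7.8):
    print(f"rmse={rmse} -> {choose_branch(rmse)}")
```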

## 使用步驟的 `DelayedReturn` 輸出定義管道
<a name="pipelines-step-define-delayed"></a>

無論您是否使用 `@step` 裝飾項目，您都會以相同的方式定義管道。將 `DelayedReturn` 執行個體傳遞至您的管道時，您不需要傳遞完整步驟清單以建置管道。SDK 會根據您定義的相依性，自動推斷先前的步驟。管道圖中包含您傳遞至管道的 `Step` 物件或 `DelayedReturn` 物件的所有先前步驟。在下列範例中，管道會收到 `train` 函式的 `DelayedReturn` 物件。SageMaker AI 會將 `preprocess` 步驟做為 `train` 的前一個步驟傳遞至管道圖。

```
from sagemaker.workflow.pipeline import Pipeline

pipeline = Pipeline(
    name="<pipeline-name>",
    steps=[step_train_result],
    sagemaker_session=<sagemaker-session>,
)
```

如果步驟之間沒有資料或自訂相依性，而且您平行執行多個步驟，則管道圖具有多個分葉節點。將清單中的所有這些分葉節點傳遞至管道定義中的 `steps` 引數，如下列範例所示：

```
@step
def process1():
    ...
    return data
    
@step
def process2():
   ...
   return data
   
step_process1_result = process1()
step_process2_result = process2()

pipeline = Pipeline(
    name="<pipeline-name>",
    steps=[step_process1_result, step_process2_result],
    sagemaker_session=<sagemaker-session>,
)
```

當管道執行時，這兩個步驟都會平行執行。

您只能將圖形的分葉節點傳遞至管道，因為分葉節點包含透過資料或自訂相依性定義之所有先前步驟的相關資訊。當其編譯管道時，SageMaker AI 也會推斷形成管道圖的所有後續步驟，並將每個步驟做為個別步驟新增至管道。
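
Why leaf nodes are sufficient can be seen from a small graph walk: starting from the leaves and following dependencies backwards reaches every step in the graph. The following is a minimal sketch in plain Python; `collect_steps` and the step names are hypothetical and do not reflect the SDK's internal implementation.

```python
def collect_steps(leaves, depends_on):
    """Return every step reachable from the leaves by following dependencies."""
    seen = set()
    stack = list(leaves)
    while stack:
        step = stack.pop()
        if step in seen:
            continue
        seen.add(step)
        # Visit the steps that this step depends on.
        stack.extend(depends_on.get(step, []))
    return sorted(seen)


depends_on = {
    "evaluate": ["train"],
    "train": ["preprocess"],
    "report": [],  # independent leaf with no upstream steps
}

all_steps = collect_steps(["evaluate", "report"], depends_on)
print(all_steps)
```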

## 建立管道
<a name="pipelines-step-decorator-pipeline-create"></a>

呼叫 `pipeline.create()` 來建立管道，如下列程式碼片段所示。如需 `create()` 的詳細資訊，請參閱 [sagemaker.workflow.pipeline.Pipeline.create](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.pipeline.Pipeline.create)。

```
role = "pipeline-role"
pipeline.create(role)
```

當您呼叫 `pipeline.create()` 時，SageMaker AI 會編譯所有定義為管道執行個體一部分的步驟。SageMaker AI 會將序列化函式、引數和所有其他步驟相關成品上傳至 Amazon S3。

根據下列結構，資料位於 S3 儲存貯體中：

```
s3_root_uri/
    pipeline_name/
        sm_rf_user_ws/
            workspace.zip  # archive of the current working directory (workdir)
        step_name/
            timestamp/
                arguments/                # serialized function arguments
                function/                 # serialized function
                pre_train_dependencies/   # any dependencies and pre_execution scripts provided for the step       
        execution_id/
            step_name/
                results     # returned output from the serialized function including the model
```

`s3_root_uri` 定義在 SageMaker AI 組態檔案中，並套用至整個管道。如果未定義，則會使用預設的 SageMaker AI 儲存貯體。
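
The layout above can be expressed as a small path-building helper, which may be useful when you look for a step's artifacts in Amazon S3. This is a sketch based only on the directory tree shown above; `artifact_locations` is a hypothetical function, and the timestamp and execution ID values are made-up placeholders.

```python
def artifact_locations(s3_root_uri, pipeline_name, step_name, timestamp, execution_id):
    """Build the S3 locations that follow the documented directory layout."""
    base = f"{s3_root_uri.rstrip('/')}/{pipeline_name}"
    compiled = f"{base}/{step_name}/{timestamp}"
    return {
        "workspace": f"{base}/sm_rf_user_ws/workspace.zip",
        "arguments": f"{compiled}/arguments/",
        "function": f"{compiled}/function/",
        "dependencies": f"{compiled}/pre_train_dependencies/",
        "results": f"{base}/{execution_id}/{step_name}/results",
    }


locations = artifact_locations(
    s3_root_uri="s3://amzn-s3-demo-bucket/my-project/",
    pipeline_name="my-pipeline",
    step_name="train",
    timestamp="2024-01-01-00-00-00",  # placeholder; the real format may differ
    execution_id="exec-123",          # placeholder execution ID
)
for name, uri in locations.items():
    print(f"{name}: {uri}")
```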

**注意**  
每次 SageMaker AI 編譯管道時，SageMaker AI 都會將步驟的序列化函式、引數和相依性儲存在以目前時間加上時間戳記的資料夾中。每次執行 `pipeline.create()`、`pipeline.update()`、`pipeline.upsert()` 或 `pipeline.definition()` 時都會發生這種情況。

# 執行管道
<a name="pipelines-step-decorator-run-pipeline"></a>

下頁描述如何使用 Amazon SageMaker Pipelines 執行管道，無論是搭配 SageMaker AI 資源還是在本機。

使用 `pipeline.start()` 函式啟動新的管道執行，就像傳統 SageMaker AI 管道執行一樣。如需 `start()` 函式的相關資訊，請參閱 [sagemaker.workflow.pipeline.Pipeline.start](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.pipeline.Pipeline.start)。

**注意**  
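Steps defined using the `@step` decorator run as training jobs. Therefore, be aware of the following limits:  
+ Instance limits and training job limits in your accounts. Update your limits accordingly to avoid any throttling or resource limit issues.
+ The monetary costs associated with every run of a training step in the pipeline. For more details, see [Amazon SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/).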

## 從本機執行的管道擷取結果
<a name="pipelines-step-decorator-run-pipeline-retrieve"></a>

To view the result of any step of a pipeline run, use [execution.result()](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.pipeline._PipelineExecution.result), as shown in the following snippet:

```
execution = pipeline.start()
execution.result(step_name="train")
```

**注意**  
Pipelines 不支援本機模式下的 `execution.result()`。

您一次只能擷取一個步驟的結果。如果步驟名稱是由 SageMaker AI 產生，您可以呼叫 `list_steps` 來擷取步驟名稱，如下所示：

```
execution.list_steps()
```

## 在本機執行管道
<a name="pipelines-step-decorator-run-pipeline-local"></a>

您可以像執行傳統管道步驟一樣，使用 `@step` 裝飾步驟在本機執行管道。如需本機模式管道執行的詳細資訊，請參閱[使用本機模式執行管道](pipelines-local-mode.md)。若要使用本機模式，請將 `LocalPipelineSession` 而非 `SageMakerSession` 提供給管道定義，如下列範例所示：

```
from sagemaker.workflow.function_step import step
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import LocalPipelineSession

@step
def train():
    training_data = s3.download(....)
    ...
    return trained_model
    
step_train_result = train()

local_pipeline_session = LocalPipelineSession()

local_pipeline = Pipeline(
    name="<pipeline-name>",
    steps=[step_train_result],
    sagemaker_session=local_pipeline_session # needed for local mode
)

local_pipeline.create(role_arn="role_arn")

# pipeline runs locally
execution = local_pipeline.start()
```

# 設定您的管道
<a name="pipelines-step-decorator-cfg-pipeline"></a>

建議您使用 SageMaker AI 組態檔案來設定管道的預設值。如需 SageMaker AI 組態檔案的詳細資訊，請參閱[搭配 SageMaker Python SDK 設定和使用預設值](https://sagemaker.readthedocs.io/en/stable/overview.html#configuring-and-using-defaults-with-the-sagemaker-python-sdk)。新增至組態檔案的任何組態都會套用至管道中的所有步驟。如果您想要覆寫任何步驟的選項，請在 `@step` 裝飾項目引數中提供新值。下列主題描述如何設定組態檔案。

組態檔案中 `@step` 裝飾項目的組態與 `@remote` 裝飾項目的組態相同。若要在組態檔案中設定管道角色 ARN 和管道標籤，請使用下列程式碼片段中顯示的 `Pipeline` 區段：

```
SchemaVersion: '1.0'
SageMaker:
  Pipeline:
    RoleArn: 'arn:aws:iam::555555555555:role/IMRole'
    Tags:
    - Key: 'tag_key'
      Value: 'tag_value'
```

對於您在組態檔案中設定的大多數預設值，您也可以透過將新值傳遞至 `@step` 裝飾項目來覆寫這些預設值。例如，您可以覆寫組態檔案中為預先處理步驟設定的執行個體類型，如下列範例所示：

```
@step(instance_type="ml.m5.large")
def preprocess(raw_data):
    import pandas  # import inside the function, as recommended for @step functions

    df = pandas.read_csv(raw_data)
    ...
    return processed_dataframe
```
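
The precedence described above can be pictured as a simple dictionary merge. The following is a minimal sketch, not the SDK's implementation: `resolve_step_settings` and the setting names in `config_defaults` are hypothetical, and the sketch only assumes that values passed to the decorator win over config-file defaults.

```python
def resolve_step_settings(config_defaults, decorator_kwargs):
    """Merge config-file defaults with per-step decorator overrides.

    Hypothetical helper for illustration only: explicit decorator values
    (those that are not None) take precedence over config-file defaults.
    """
    resolved = dict(config_defaults)
    for key, value in decorator_kwargs.items():
        if value is not None:
            resolved[key] = value
    return resolved


# Defaults as they might look after loading the config file (assumed values).
config_defaults = {"instance_type": "ml.m5.xlarge", "instance_count": 1}

# The preprocessing step overrides only the instance type.
preprocess_settings = resolve_step_settings(
    config_defaults, {"instance_type": "ml.m5.large"}
)
print(preprocess_settings)
```

Settings that the step doesn't override (here, the assumed `instance_count`) keep the value from the config file.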

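Some arguments aren't part of the `@step` decorator parameter list. You can configure them for the entire pipeline only through the SageMaker AI configuration file. They are listed as follows:
+ `sagemaker_session` (`sagemaker.session.Session`): The underlying SageMaker AI session to which SageMaker AI delegates service calls. If unspecified, a session is created using a default configuration as follows: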

  ```
  SageMaker:
    PythonSDK:
      Modules:
        Session:
          DefaultS3Bucket: 'default_s3_bucket'
          DefaultS3ObjectKeyPrefix: 'key_prefix'
  ```
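+ `custom_file_filter` (`CustomFileFilter`): A `CustomFileFilter` object that specifies the local directories and files to include in the pipeline step. If unspecified, this value defaults to `None`. For `custom_file_filter` to take effect, you must set `IncludeLocalWorkDir` to `True`. The following example shows a configuration that ignores all notebook files, and files and directories named `data`.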

  ```
  SchemaVersion: '1.0'
  SageMaker:
    PythonSDK:
      Modules:
        RemoteFunction:
          IncludeLocalWorkDir: true
          CustomFileFilter: 
            IgnoreNamePatterns: # files or directories to ignore
            - "*.ipynb" # all notebook files
            - "data" # folder or file named "data"
  ```

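  For more details about how to use `IncludeLocalWorkDir` with `CustomFileFilter`, see [Using modular code with the @remote decorator](train-remote-decorator-modular.md).
+ `s3_root_uri (str)`: The root Amazon S3 folder to which SageMaker AI uploads the code archives and data. If unspecified, the default SageMaker AI bucket is used.
+ `s3_kms_key (str)`: The key used to encrypt the input and output data. You can configure this argument only in the SageMaker AI config file, and the argument applies to all steps defined in the pipeline. If unspecified, the value defaults to `None`. See the following snippet for an example S3 KMS key configuration: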

  ```
  SchemaVersion: '1.0'
  SageMaker:
    PythonSDK:
      Modules:
        RemoteFunction:
          S3KmsKeyId: 's3kmskeyid'
          S3RootUri: 's3://amzn-s3-demo-bucket/my-project'
  ```
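
To get a feel for what the `IgnoreNamePatterns` entries in the `custom_file_filter` example above exclude, the following sketch applies the same two patterns to a list of relative paths with Python's `fnmatch`. It is an approximation under a stated assumption (a path is skipped when any of its components matches a pattern). The SDK's actual matching rules may differ, and `is_ignored` is a hypothetical helper.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

IGNORE_NAME_PATTERNS = ["*.ipynb", "data"]  # same patterns as in the config example


def is_ignored(relative_path, patterns=IGNORE_NAME_PATTERNS):
    """Return True if any path component matches one of the patterns (assumed rule)."""
    parts = PurePosixPath(relative_path).parts
    return any(fnmatch(part, pattern) for part in parts for pattern in patterns)


workdir_files = [
    "pipeline.py",
    "steps/train.py",
    "explore.ipynb",
    "data/raw.csv",
    "steps/data_utils.py",
]
# Keep only the files that no pattern excludes.
kept = [path for path in workdir_files if not is_ignored(path)]
print(kept)
```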

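# Best practices
<a name="pipelines-step-decorator-best"></a>

The following sections suggest best practices to follow when you use the `@step` decorator for your pipeline steps.

## Use warm pools
<a name="pipelines-step-decorator-best-warmpool"></a>

For faster pipeline step runs, use the warm pooling functionality provided for training jobs. You can turn on the warm pool functionality by providing the `keep_alive_period_in_seconds` argument to the `@step` decorator, as demonstrated in the following snippet: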

```
@step(
   keep_alive_period_in_seconds=900
)
```

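For more information about warm pools, see [SageMaker AI Managed Warm Pools](train-warm-pools.md).

## Structure your directory
<a name="pipelines-step-decorator-best-dir"></a>

We recommend that you use code modules when you use the `@step` decorator. Put the `pipeline.py` module, in which you invoke the step functions and define the pipeline, at the root of the workspace. The recommended structure is as follows: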

```
.
├── config.yaml # the configuration file that defines the infra settings
├── requirements.txt # dependencies
├── pipeline.py  # invoke @step-decorated functions and define the pipeline here
├── steps/
│   ├── processing.py
│   └── train.py
├── data/
└── test/

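# Limitations
<a name="pipelines-step-decorator-limit"></a>

The following sections outline the limitations that you should be aware of when you use the `@step` decorator for your pipeline steps.

## Function argument limitations
<a name="pipelines-step-decorator-arg"></a>

When you pass an input argument to the `@step`-decorated function, the following limitations apply:
+ You can pass `DelayedReturn`, `Properties` (of steps of other types), `Parameter`, and `ExecutionVariable` objects to `@step`-decorated functions as arguments. However, `@step`-decorated functions don't support `JsonGet` and `Join` objects as arguments.
+ You can't directly access a pipeline variable from a `@step` function. The following example produces an error: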

  ```
  param = ParameterInteger(name="<parameter-name>", default_value=10)
  
  @step
  def func():
      print(param)
  
  func() # this raises a SerializationError
  ```
+ You can't nest a pipeline variable in another object and pass it to a `@step` function. The following example produces an error:

  ```
  param = ParameterInteger(name="<parameter-name>", default_value=10)
  
  @step
  def func(arg):
      print(arg)
  
  func(arg=(param,)) # this raises a SerializationError because param is nested in a tuple
  ```
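+ Because the inputs and outputs of a function are serialized, there are restrictions on the types of data that can be passed as a function's input or output. For details, see the *Data serialization and deserialization* section of [Invoke a remote function](train-remote-decorator-invocation.md). The same restrictions apply to `@step`-decorated functions.
+ Any object that holds a boto client can't be serialized, so you can't pass such an object as input to, or return it as output from, a `@step`-decorated function. For example, SageMaker Python SDK client classes such as `Estimator`, `Predictor`, and `Processor` can't be serialized.

## Function imports
<a name="pipelines-step-decorator-best-import"></a>

Import the libraries that a step requires inside the function rather than outside it. If you import them at global scope, you risk an import collision when the function is serialized. For example, `sklearn.pipeline.Pipeline` could be overridden by `sagemaker.workflow.pipeline.Pipeline`.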

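## Referencing child members of a function return value
<a name="pipelines-step-decorator-best-child"></a>

If you reference child members of a `@step`-decorated function's return value, the following limitations apply:
+ If the `DelayedReturn` object represents a tuple, list, or dict, you can reference its child members with `[]`, as shown in the following example: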

  ```
  delayed_return[0]
  delayed_return["a_key"]
  delayed_return[1]["a_key"]
  ```
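+ You can't unpack a tuple or list output, because the exact length of the underlying tuple or list can't be known when you invoke the function. The following example produces an error: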

  ```
  a, b, c = func() # this raises ValueError
  ```
+ You can't iterate over a `DelayedReturn` object. The following example raises an error:

  ```
  for item in func(): # this raises a NotImplementedError
      ...
  ```
+ You can't reference arbitrary child members with '`.`'. The following example produces an error:

  ```
  delayed_return.a_child # raises AttributeError
  ```
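
These restrictions follow from the fact that, while the pipeline is being defined, the return value is a placeholder rather than a concrete Python object. The toy class below is not the SDK's `DelayedReturn`. It is a hypothetical stand-in (`LazyResult`) that merely records `[]` lookups, to illustrate why indexing can work while iteration and unpacking can't. The exception type raised by the real SDK may differ from the one used here.

```python
class LazyResult:
    """Toy placeholder for a value that only exists once the step has run."""

    def __init__(self, path=()):
        self.path = path  # sequence of keys/indexes recorded so far

    def __getitem__(self, key):
        # Indexing needs no knowledge of the real value: just record the key.
        return LazyResult(self.path + (key,))

    def __iter__(self):
        # Iteration (and therefore unpacking) needs the length, which is unknown.
        raise NotImplementedError("the length of a lazy result is not known yet")


result = LazyResult()
print(result[1]["a_key"].path)  # (1, 'a_key')

try:
    a, b, c = result
except NotImplementedError as error:
    print(f"cannot unpack: {error}")
```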

## Existing pipeline features that are not supported
<a name="pipelines-step-decorator-best-unsupported"></a>

You can't use the `@step` decorator with the following pipeline features:
+ [Pipeline step caching](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-caching.html)
+ [Property files](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-propertyfile.html#build-and-manage-propertyfile-property)

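# Pass data between steps
<a name="build-and-manage-propertyfile"></a>

When you build pipelines with Amazon SageMaker Pipelines, you might need to pass data from one step to the next. For example, you might want to use the model artifacts generated by a training step as input to a model evaluation or deployment step. You can use this functionality to create interdependent pipeline steps and build your ML workflows.

When you need to retrieve information from the output of a pipeline step, you can use `JsonGet`. `JsonGet` helps you extract information from Amazon S3 or from property files. The following sections explain the methods you can use to extract step outputs with `JsonGet`.

## Pass data between steps with Amazon S3
<a name="build-and-manage-propertyfile-s3"></a>

You can use `JsonGet` in a `ConditionStep` to fetch the JSON output directly from Amazon S3. The Amazon S3 URI can be a `Std:Join` function containing primitive strings, pipeline run variables, or pipeline parameters. The following example shows how you can use `JsonGet` in a `ConditionStep`.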

```
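# Example JSON file in an S3 bucket, generated by a processing step:
# {
#    "Output": [5, 10]
# }
from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
from sagemaker.workflow.functions import JsonGet

cond_lte = ConditionLessThanOrEqualTo(
    left=JsonGet(
        step_name="<step-name>",
        s3_uri="<s3-path-to-json>",
        json_path="Output[1]"
    ),
    right=6.0
)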
```

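If you use `JsonGet` with an Amazon S3 path in the condition step, you must explicitly add a dependency between the condition step and the step that generates the JSON output. In the following example, the condition step is created with a dependency on the processing step: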

```
cond_step = ConditionStep(
        name="<step-name>",
        conditions=[cond_lte],
        if_steps=[fail_step],
        else_steps=[register_model_step],
        depends_on=[processing_step],
)
```
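
To make the condition above concrete, the following standalone sketch evaluates the path `Output[1]` against the example JSON document and applies the same less-than-or-equal comparison. `resolve_path` is a hypothetical helper that handles only dotted keys and `[index]` suffixes. It is not how `JsonGet` is implemented, and the service resolves the value at run time.

```python
import json
import re


def resolve_path(document, json_path):
    """Resolve a tiny subset of JSON path syntax, e.g. "Output[1]" or "a.b[0]"."""
    value = document
    for token in json_path.split("."):
        match = re.fullmatch(r"([^\[\]]+)((?:\[\d+\])*)", token)
        if match is None:
            raise ValueError(f"unsupported path token: {token!r}")
        value = value[match.group(1)]
        for index in re.findall(r"\[(\d+)\]", match.group(2)):
            value = value[int(index)]
    return value


document = json.loads('{"Output": [5, 10]}')
left = resolve_path(document, "Output[1]")
condition_is_true = left <= 6.0
print(left, condition_is_true)  # 10 False
```

With this input the condition evaluates to false, so the `else_steps` branch of the `ConditionStep` would run.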

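## Pass data between steps with property files
<a name="build-and-manage-propertyfile-property"></a>

Use property files to store information from the output of a processing step. This is particularly useful when you analyze the results of a processing step to decide how a conditional step should run. The `JsonGet` function processes a property file and lets you use JsonPath notation to query the property JSON file. For more information about JsonPath notation, see the [JSONPath repo](https://github.com/json-path/JsonPath).

To store a property file for later use, you must first create a `PropertyFile` instance with the following format. The `path` parameter is the name of the JSON file to which the property file is saved. Any `output_name` must match the `output_name` of the `ProcessingOutput` that you define in your processing step. This enables the property file to capture the `ProcessingOutput` in the step.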

```
from sagemaker.workflow.properties import PropertyFile

<property_file_instance> = PropertyFile(
    name="<property_file_name>",
    output_name="<processingoutput_output_name>",
    path="<path_to_json_file>"
)
```

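When you create your `ProcessingStep` instance, add the `property_files` parameter to list all of the property files that the Amazon SageMaker Pipelines service must index. This saves the property files for later use.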

```
property_files=[<property_file_instance>]
```

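To use your property file in a condition step, add the `property_file` to the condition that you pass to your condition step, as shown in the following example, and query the JSON file for your desired property using the `json_path` parameter.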

```
cond_lte = ConditionLessThanOrEqualTo(
    left=JsonGet(
        step_name=step_eval.name,
        property_file=<property_file_instance>,
        json_path="mse"
    ),
    right=6.0
)
```

For more in-depth examples, see [Property File](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#property-file) in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable).

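# Caching pipeline steps
<a name="pipelines-caching"></a>

In Amazon SageMaker Pipelines, you can use step caching to save time and resources when you rerun pipelines. Step caching reuses the output of a previous successful run of a step (instead of recomputing it) when the step has the same configuration and inputs. This helps you achieve consistent results across pipeline reruns with identical parameters. The following topics show you how to configure and turn on step caching for your pipelines.

When you use step signature caching, Pipelines tries to find a previous run of your current pipeline step with the same values for certain attributes. If found, Pipelines propagates the outputs from the previous run rather than recomputing the step. The attributes checked are specific to the step type, and are listed in [Default cache key attributes by pipeline step type](pipelines-default-keys.md).

You must opt in to step caching. It is off by default. When you turn on step caching, you must also define a timeout. This timeout defines how old a previous run can be and still remain a candidate for reuse.

Step caching considers only successful runs. It never reuses failed runs. When multiple successful runs exist within the timeout period, Pipelines uses the result of the most recent successful run. If no successful runs match within the timeout period, Pipelines reruns the step. If the executor finds a previous run that meets the criteria but is still in progress, both steps continue running and update the cache if they succeed.

Step caching is scoped to an individual pipeline, so you can't reuse a step from another pipeline even if there is a step signature match.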
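
The lookup rules above can be summarized in a few lines of plain Python. This is a simplified model, not the service's implementation: `PastRun`, `find_cache_hit`, and the string `cache_key` (standing in for the step-type-specific attributes) are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class PastRun:
    pipeline_name: str
    cache_key: str       # stands in for the attributes that are compared
    succeeded: bool
    finished_at: datetime


def find_cache_hit(runs: List[PastRun], pipeline_name: str, cache_key: str,
                   expire_after: timedelta, now: datetime) -> Optional[PastRun]:
    """Return the most recent successful, unexpired run with the same key, if any."""
    candidates = [
        run for run in runs
        if run.pipeline_name == pipeline_name      # caching is scoped to one pipeline
        and run.cache_key == cache_key             # same configuration and inputs
        and run.succeeded                          # failed runs are never reused
        and now - run.finished_at <= expire_after  # still inside the timeout period
    ]
    return max(candidates, key=lambda run: run.finished_at, default=None)


now = datetime(2024, 1, 10, 12, 0)
history = [
    PastRun("abalone", "k1", True, now - timedelta(minutes=50)),
    PastRun("abalone", "k1", True, now - timedelta(minutes=10)),
    PastRun("abalone", "k1", False, now - timedelta(minutes=5)),
    PastRun("other", "k1", True, now - timedelta(minutes=1)),
]
hit = find_cache_hit(history, "abalone", "k1", timedelta(hours=1), now)
print(hit.finished_at if hit else "no cache hit: rerun the step")
```

In this sample history, the run that finished 10 minutes ago is reused: the newer run failed, and the newest run belongs to a different pipeline.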

Step caching is available for the following step types:
+ [Processing](build-and-manage-steps-types.md#step-type-processing)
+ [Training](build-and-manage-steps-types.md#step-type-training)
+ [Tuning](build-and-manage-steps-types.md#step-type-tuning)
+ [AutoML](build-and-manage-steps-types.md#step-type-automl)
+ [Transform](build-and-manage-steps-types.md#step-type-transform)
+ [`ClarifyCheck`](build-and-manage-steps-types.md#step-type-clarify-check)
+ [`QualityCheck`](build-and-manage-steps-types.md#step-type-quality-check)
+ [EMR](build-and-manage-steps-types.md#step-type-emr)

**Topics**
+ [Turning on step caching](pipelines-caching-enabling.md)
+ [Turning off step caching](pipelines-caching-disabling.md)
+ [Default cache key attributes by pipeline step type](pipelines-default-keys.md)
+ [Cached data access control](pipelines-access-control.md)

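# Turning on step caching
<a name="pipelines-caching-enabling"></a>

To turn on step caching, you must add a `CacheConfig` property to the step definition. `CacheConfig` properties use the following format in the pipeline definition file: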

```
{
    "CacheConfig": {
        "Enabled": true,
        "ExpireAfter": "<time>"
    }
}

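The `Enabled` field indicates whether caching is turned on for the particular step. You can set the field to `true`, which tells SageMaker AI to try to find a previous run of the step with the same attributes. Or, you can set the field to `false`, which tells SageMaker AI to run the step every time the pipeline runs. `ExpireAfter` is a string in [ISO 8601 duration](https://en.wikipedia.org/wiki/ISO_8601#Durations) format that defines the timeout period. The `ExpireAfter` duration can be a year, month, week, day, hour, or minute value. Each value consists of a number followed by a letter indicating the unit of duration. For example:
+ "30d" = 30 days
+ "5y" = 5 years
+ "T16m" = 16 minutes
+ "30dT5h" = 30 days and 5 hours.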
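
To show how such strings break down into units, here is a rough, standalone parser. It is a sketch of one plausible reading of the format (an optional leading `P`, date units before `T`, time units after it, case-insensitive), not the parser the service uses. `parse_expire_after` is a hypothetical helper, and treating a year as 365 days and a month as 30 days is an assumption made only so that the result fits in a `timedelta`.

```python
import re
from datetime import timedelta

_DURATION = re.compile(
    r"^P?(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?)?$",
    re.IGNORECASE,
)


def parse_expire_after(text):
    """Convert a duration string such as "30dT5h" or "PT1H" into a timedelta.

    Assumption for illustration: a year counts as 365 days and a month as 30 days.
    """
    match = _DURATION.match(text)
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"not a recognized duration: {text!r}")
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    return timedelta(
        days=parts["years"] * 365 + parts["months"] * 30 + parts["weeks"] * 7 + parts["days"],
        hours=parts["hours"],
        minutes=parts["minutes"],
    )


for example in ["30d", "5y", "T16m", "30dT5h", "PT1H"]:
    print(example, "->", parse_expire_after(example))
```

Note that the `T` separator matters in this reading: "T16m" is 16 minutes, whereas "16m" without the `T` would be parsed as 16 months.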

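The following discussion describes the procedure to turn on caching for new or pre-existing pipelines using the Amazon SageMaker Python SDK.

**Turn on caching for new pipelines**

For new pipelines, initialize a `CacheConfig` instance with `enable_caching=True` and provide it as an input to your pipeline step. The following example turns on caching with a 1-hour timeout period for a training step: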

```
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import CacheConfig
      
cache_config = CacheConfig(enable_caching=True, expire_after="PT1H")
estimator = Estimator(..., sagemaker_session=PipelineSession())

step_train = TrainingStep(
    name="TrainAbaloneModel",
    step_args=estimator.fit(inputs=inputs),
    cache_config=cache_config
)
```

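**Turn on caching for pre-existing pipelines**

To turn on caching for pre-existing, already-defined pipelines, turn on the `enable_caching` property for the step, and set `expire_after` to a timeout value. Lastly, update the pipeline with `pipeline.upsert()` or `pipeline.update()`. Once you run it again, the following code example turns on caching with a 1-hour timeout period for a training step: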

```
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import CacheConfig
from sagemaker.workflow.pipeline import Pipeline

cache_config = CacheConfig(enable_caching=True, expire_after="PT1H")
estimator = Estimator(..., sagemaker_session=PipelineSession())

step_train = TrainingStep(
    name="TrainAbaloneModel",
    step_args=estimator.fit(inputs=inputs),
    cache_config=cache_config
)

# define pipeline
pipeline = Pipeline(
    steps=[step_train]
)

# additional step for existing pipelines
pipeline.update()
# or, call upsert() to update the pipeline
# pipeline.upsert()
```

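Alternatively, update the cache config after you have already defined the (pre-existing) pipeline, which allows one continuous code run. The following code sample demonstrates this method: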

```
# turn on caching with timeout period of one hour
pipeline.steps[0].cache_config.enable_caching = True 
pipeline.steps[0].cache_config.expire_after = "PT1H" 

# additional step for existing pipelines
pipeline.update()
# or, call upsert() to update the pipeline
# pipeline.upsert()
```

For more detailed code examples and a discussion about how Python SDK parameters affect caching, see [Caching Configuration](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#caching-configuration) in the Amazon SageMaker Python SDK documentation.

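# Turning off step caching
<a name="pipelines-caching-disabling"></a>

A pipeline step doesn't rerun if you change any attributes that aren't listed in [Default cache key attributes by pipeline step type](pipelines-default-keys.md) for its step type. However, you might decide that you want the pipeline step to rerun anyway. In this case, you need to turn off step caching.

To turn off step caching, set the `Enabled` attribute in the step definition's `CacheConfig` property to `false`, as shown in the following code snippet: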

```
{
    "CacheConfig": {
        "Enabled": false,
        "ExpireAfter": "<time>"
    }
}
```

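Note that the `ExpireAfter` attribute is ignored when `Enabled` is `false`.

To turn off caching for a pipeline step using the Amazon SageMaker Python SDK, define the pipeline of your pipeline step, turn off the `enable_caching` property, and update the pipeline.

Once you run it again, the following code example turns off caching for a training step: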

```
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import CacheConfig
from sagemaker.workflow.pipeline import Pipeline

cache_config = CacheConfig(enable_caching=False, expire_after="PT1H")
estimator = Estimator(..., sagemaker_session=PipelineSession())

step_train = TrainingStep(
    name="TrainAbaloneModel",
    step_args=estimator.fit(inputs=inputs),
    cache_config=cache_config
)

# define pipeline
pipeline = Pipeline(
    steps=[step_train]
)

# update the pipeline
pipeline.update()
# or, call upsert() to update the pipeline
# pipeline.upsert()
```

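Alternatively, turn off the `enable_caching` property after you have already defined the pipeline, which allows one continuous code run. The following code sample demonstrates this solution: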

```
# turn off caching for the training step
pipeline.steps[0].cache_config.enable_caching = False

# update the pipeline
pipeline.update()
# or, call upsert() to update the pipeline
# pipeline.upsert()
```

For more detailed code examples and a discussion about how Python SDK parameters affect caching, see [Caching Configuration](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#caching-configuration) in the Amazon SageMaker Python SDK documentation.

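# Default cache key attributes by pipeline step type
<a name="pipelines-default-keys"></a>

When deciding whether to reuse a previous pipeline step or rerun the step, Pipelines checks to see if certain attributes have changed. If the set of attributes is different from all previous runs within the timeout period, the step runs again. These attributes include input artifacts, app or algorithm specification, and environment variables. The following list shows each pipeline step type and the attributes that, if changed, initiate a rerun of the step. For more information about which Python SDK parameters are used to create the following attributes, see [Caching Configuration](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#caching-configuration) in the Amazon SageMaker Python SDK documentation.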

## [Processing step](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateProcessingJob.html)
<a name="collapsible-caching-section-1"></a>
+ AppSpecification
+ Environment
+ ProcessingInputs. This attribute contains information about the preprocessing script.


## [Training step](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html)
<a name="collapsible-caching-section-2"></a>
+ AlgorithmSpecification
+ CheckpointConfig
+ DebugHookConfig
+ DebugRuleConfigurations
+ Environment
+ HyperParameters
+ InputDataConfig. This attribute contains information about the training script.


## [Tuning step](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html)
<a name="collapsible-caching-section-3"></a>
+ HyperParameterTuningJobConfig
+ TrainingJobDefinition. This attribute is composed of multiple child attributes, not all of which cause the step to rerun. The child attributes that could incur a rerun (if changed) are:
  + AlgorithmSpecification
  + HyperParameterRanges
  + InputDataConfig
  + StaticHyperParameters
  + TuningObjective
+ TrainingJobDefinitions

  
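## [AutoML step](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AutoMLJobConfig.html)
<a name="collapsible-caching-section-4"></a>
+ AutoMLJobConfig. This attribute is composed of multiple child attributes, not all of which cause the step to rerun. The child attributes that could incur a rerun (if changed) are: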
  + CompletionCriteria
  + CandidateGenerationConfig
  + DataSplitConfig
  + Mode
+ AutoMLJobObjective
+ InputDataConfig
+ ProblemType

  
## [Transform step](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTransformJob.html)
<a name="collapsible-caching-section-5"></a>
+ DataProcessing
+ Environment
+ ModelName
+ TransformInput

  
## [ClarifyCheck step](build-and-manage-steps-types.md#step-type-clarify-check)
<a name="collapsible-caching-section-6"></a>
+ ClarifyCheckConfig
+ CheckJobConfig
+ SkipCheck
+ RegisterNewBaseline
+ ModelPackageGroupName
+ SuppliedBaselineConstraints

  
## [QualityCheck step](build-and-manage-steps-types.md#step-type-quality-check)
<a name="collapsible-caching-section-7"></a>
+ QualityCheckConfig
+ CheckJobConfig
+ SkipCheck
+ RegisterNewBaseline
+ ModelPackageGroupName
+ SuppliedBaselineConstraints
+ SuppliedBaselineStatistics

  
## [EMR step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-emr)
<a name="collapsible-caching-section-8"></a>
+ ClusterId
+ StepConfig

  
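# Cached data access control
<a name="pipelines-access-control"></a>

When a SageMaker AI pipeline runs, it caches the parameters and metadata associated with the SageMaker AI jobs launched by the pipeline and saves them for reuse in subsequent runs. This metadata is accessible through a variety of sources in addition to cached pipeline steps, and includes the following types: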
+ `Describe*Job` 請求
+ CloudWatch Logs
+ CloudWatch Events
+ CloudWatch Metrics
+ SageMaker AI Search

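Note that access to each data source in the list is controlled by its own set of IAM permissions. Removing a particular role's access to one data source doesn't affect the level of access to the others. For example, an account admin might remove IAM permissions for `Describe*Job` requests from a caller's role. While the caller can no longer make `Describe*Job` requests, they can still retrieve the metadata from a pipeline run with cached steps as long as they have permission to run the pipeline. If an account admin wants to remove access to the metadata from a particular SageMaker AI job completely, they need to remove permissions for each of the relevant services that provide access to the data.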

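# Retry policy for pipeline steps
<a name="pipelines-retry-policy"></a>

Retry policies help you automatically retry your Pipelines steps after an error occurs. Any pipeline step can encounter exceptions, and exceptions happen for various reasons. In some cases, a retry can resolve these issues. With a retry policy for pipeline steps, you can choose whether to retry a particular pipeline step or not.

The retry policy supports only the following pipeline steps:
+ [Processing step](build-and-manage-steps-types.md#step-type-processing)
+ [Training step](build-and-manage-steps-types.md#step-type-training)
+ [Tuning step](build-and-manage-steps-types.md#step-type-tuning)
+ [AutoML step](build-and-manage-steps-types.md#step-type-automl)
+ [Create model step](build-and-manage-steps-types.md#step-type-create-model)
+ [Register model step](build-and-manage-steps-types.md#step-type-register-model)
+ [Transform step](build-and-manage-steps-types.md#step-type-transform)
+ [Notebook job step](build-and-manage-steps-types.md#step-type-notebook-job)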

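**Note**  
Jobs running inside both the tuning and AutoML steps conduct retries internally and don't retry the `SageMaker.JOB_INTERNAL_ERROR` exception type, even if a retry policy is configured. You can program your own [Retry Strategy](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_RetryStrategy.html) using the SageMaker API.

## Supported exception types for the retry policy
<a name="pipelines-retry-policy-supported-exceptions"></a>

The retry policy for pipeline steps supports the following exception types:
+ `Step.SERVICE_FAULT`: These exceptions occur when an internal server error or transient error happens when calling downstream services. Pipelines retries on this type of error automatically. With a retry policy, you can override the default retry operation for this exception type.
+ `Step.THROTTLING`: Throttling exceptions can occur while calling the downstream services. Pipelines retries on this type of error automatically. With a retry policy, you can override the default retry operation for this exception type.
+ `SageMaker.JOB_INTERNAL_ERROR`: These exceptions occur when the SageMaker AI job returns `InternalServerError`. In this case, starting a new job may fix a transient issue.
+ `SageMaker.CAPACITY_ERROR`: The SageMaker AI job may encounter Amazon EC2 `InsufficientCapacityErrors`, which leads to the SageMaker AI job's failure. You can retry by starting a new SageMaker AI job to avoid the issue.
+ `SageMaker.RESOURCE_LIMIT`: You might exceed the resource limit quota when running a SageMaker AI job. You can wait and retry running the SageMaker AI job after a short period and see if resources are released.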

## JSON schema for the retry policy
<a name="pipelines-retry-policy-json-schema"></a>

The retry policy for Pipelines has the following JSON schema:

```
"RetryPolicy": {
   "ExceptionType": [String]
   "IntervalSeconds": Integer
   "BackoffRate": Double
   "MaxAttempts": Integer
   "ExpireAfterMin": Integer
}
```
+ `ExceptionType`: This field requires the following exception types in a string array format.
  + `Step.SERVICE_FAULT`
  + `Step.THROTTLING`
  + `SageMaker.JOB_INTERNAL_ERROR`
  + `SageMaker.CAPACITY_ERROR`
  + `SageMaker.RESOURCE_LIMIT`
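+ `IntervalSeconds` (optional): The number of seconds before the first retry attempt (1 by default). `IntervalSeconds` has a maximum value of 43200 seconds (12 hours).
+ `BackoffRate` (optional): The multiplier by which the retry interval increases during each attempt (2.0 by default).
+ `MaxAttempts` (optional): A positive integer that represents the maximum number of retry attempts (5 by default). If the error recurs more times than `MaxAttempts` specifies, retries cease and normal error handling resumes. A value of 0 specifies that errors are never retried. `MaxAttempts` has a maximum value of 20.
+ `ExpireAfterMin` (optional): A positive integer that represents the maximum timespan of retry. If the error recurs more than `ExpireAfterMin` minutes after the step started, retries cease and normal error handling resumes. A value of 0 specifies that errors are never retried. `ExpireAfterMin` has a maximum value of 14,400 minutes (10 days).

**Note**  
You can specify either `MaxAttempts` or `ExpireAfterMin`, but not both. If *neither* is specified, the `MaxAttempts` default applies. If both properties are specified within one policy, the retry policy generates a validation error.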
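
To see how `IntervalSeconds`, `BackoffRate`, and `MaxAttempts` interact, the following sketch lists the wait before each retry. It assumes that the wait before retry *n* is `IntervalSeconds × BackoffRate^(n-1)`, which is one natural reading of the field descriptions above. The service's exact timing (for example, any jitter or cap) isn't specified here, and both helper functions are hypothetical.

```python
def validate_retry_policy(policy):
    """Reject a policy that sets both MaxAttempts and ExpireAfterMin."""
    if "MaxAttempts" in policy and "ExpireAfterMin" in policy:
        raise ValueError("specify MaxAttempts or ExpireAfterMin, not both")


def retry_wait_times(policy):
    """List the assumed wait (in seconds) before each retry for a MaxAttempts policy."""
    validate_retry_policy(policy)
    interval = policy.get("IntervalSeconds", 1)   # default from the schema above
    backoff = policy.get("BackoffRate", 2.0)      # default from the schema above
    attempts = policy.get("MaxAttempts", 5)       # default when neither limit is set
    return [interval * backoff ** attempt for attempt in range(attempts)]


policy = {
    "ExceptionType": ["SageMaker.CAPACITY_ERROR"],
    "IntervalSeconds": 30,
    "BackoffRate": 2.0,
    "MaxAttempts": 4,
}
print(retry_wait_times(policy))  # [30.0, 60.0, 120.0, 240.0]
```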

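# Configuring a retry policy
<a name="pipelines-configuring-retry-policy"></a>

While SageMaker Pipelines provides a robust and automated way to orchestrate machine learning workflows, you might encounter failures when running them. To handle such scenarios gracefully and improve the reliability of your pipelines, you can configure retry policies that define how and when to automatically retry specific steps after encountering an exception. The retry policy lets you specify the types of exceptions to retry, the maximum number of retry attempts, the interval between retries, and the backoff rate for increasing the retry intervals. The following section provides examples of how to configure a retry policy for a training step in your pipeline, both in JSON and using the SageMaker Python SDK.

The following is an example of a training step with a retry policy.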

```
{
    "Steps": [
        {
            "Name": "MyTrainingStep",
            "Type": "Training",
            "RetryPolicies": [
                {
                    "ExceptionType": [
                        "SageMaker.JOB_INTERNAL_ERROR",
                        "SageMaker.CAPACITY_ERROR"
                    ],
                    "IntervalSeconds": 1,
                    "BackoffRate": 2,
                    "MaxAttempts": 5
                }
            ]
        }
    ]
}
```


The following is an example of how to build a `TrainingStep` with a retry policy using the SageMaker Python SDK.

```
from sagemaker.workflow.steps import TrainingStep
from sagemaker.workflow.retry import (
    StepRetryPolicy,
    StepExceptionTypeEnum,
    SageMakerJobExceptionTypeEnum,
    SageMakerJobStepRetryPolicy
)

step_train = TrainingStep(
    name="MyTrainingStep",
    # ... other TrainingStep arguments (elided in this example) ...
    retry_policies=[
        # override the default
        StepRetryPolicy(
            exception_types=[
                StepExceptionTypeEnum.SERVICE_FAULT,
                StepExceptionTypeEnum.THROTTLING
            ],
            expire_after_mins=5,
            interval_seconds=10,
            backoff_rate=2.0
        ),
        # retry when resource limit quota gets exceeded
        SageMakerJobStepRetryPolicy(
            exception_types=[SageMakerJobExceptionTypeEnum.RESOURCE_LIMIT],
            expire_after_mins=120,
            interval_seconds=60,
            backoff_rate=2.0
        ),
        # retry when job failed due to transient error or EC2 ICE.
        SageMakerJobStepRetryPolicy(
            failure_reason_types=[
                SageMakerJobExceptionTypeEnum.INTERNAL_ERROR,
                SageMakerJobExceptionTypeEnum.CAPACITY_ERROR,
            ],
            max_attempts=10,
            interval_seconds=30,
            backoff_rate=2.0
        )
    ]
)

For more information on configuring retry behavior for certain step types, see [Amazon SageMaker Pipelines - Retry Policy](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#retry-policy) in the Amazon SageMaker Python SDK documentation.

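# Selective execution of pipeline steps
<a name="pipelines-selective-ex"></a>

As you use Pipelines to create workflows and orchestrate your ML training steps, you might need to go through multiple experimentation phases. Instead of running the full pipeline each time, you might want to repeat only certain steps. With Pipelines, you can execute pipeline steps selectively. This helps optimize your ML training. Selective execution is useful in the following scenarios:
+ You want to restart a specific step with an updated instance type, hyperparameters, or other variables while keeping the parameters from upstream steps.
+ Your pipeline fails at an intermediate step. Previous steps in the execution, such as data preparation or feature extraction, are expensive to rerun. You might need to introduce a fix and rerun certain steps manually to complete the pipeline.

Using selective execution, you can choose to run any subset of steps as long as they are connected in the directed acyclic graph (DAG) of your pipeline. The following DAG shows an example pipeline workflow: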

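![\[A directed acyclic graph (DAG) of an example pipeline.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/pipeline-full.png)


You can select steps `AbaloneTrain` and `AbaloneEval` in a selective execution, but you can't select just the `AbaloneTrain` and `AbaloneMSECond` steps because these steps aren't connected in the DAG. For non-selected steps in the workflow, the selective execution reuses the outputs from a reference pipeline execution rather than rerunning the steps. Also, non-selected steps that are downstream from the selected steps don't run in a selective execution.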
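
The connectivity rule can be checked with a short graph walk. The sketch below is illustrative only: the edge list is an assumed, simplified version of the example pipeline (the real DAG is defined by your pipeline's step dependencies), and `is_connected_selection` is a hypothetical helper, not part of the SDK.

```python
# Assumed edges for the example pipeline; the real DAG is defined by your pipeline.
EDGES = [
    ("AbaloneProcess", "AbaloneTrain"),
    ("AbaloneTrain", "AbaloneEval"),
    ("AbaloneEval", "AbaloneMSECond"),
]


def is_connected_selection(selected, edges=EDGES):
    """Return True if the selected steps form one connected piece of the DAG."""
    selected = set(selected)
    if not selected:
        return False
    # Keep only edges whose endpoints are both selected, ignoring direction.
    neighbors = {step: set() for step in selected}
    for upstream, downstream in edges:
        if upstream in selected and downstream in selected:
            neighbors[upstream].add(downstream)
            neighbors[downstream].add(upstream)
    # Walk from an arbitrary selected step and see whether every step is reached.
    start = next(iter(selected))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbors[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == selected


print(is_connected_selection(["AbaloneTrain", "AbaloneEval"]))     # True
print(is_connected_selection(["AbaloneTrain", "AbaloneMSECond"]))  # False
```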

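If you choose to run a subset of intermediate steps in your pipeline, your steps might depend on previous steps. SageMaker AI needs a reference pipeline execution from which to resource these dependencies. For example, if you choose to run the steps `AbaloneTrain` and `AbaloneEval`, you need the outputs from the `AbaloneProcess` step. You can either provide a reference execution ARN or direct SageMaker AI to use the latest pipeline execution, which is the default behavior. If you have a reference execution, you can also build the runtime parameters from your reference run and supply them to your selective execution run with overrides. For details, see [Reuse runtime parameter values from a reference execution](#pipelines-selective-ex-reuse).

In detail, you provide a configuration for your selective execution pipeline run using `SelectiveExecutionConfig`. If you include an ARN for a reference pipeline execution (with the `source_pipeline_execution_arn` argument), SageMaker AI uses the previous step dependencies from the pipeline execution you provided. If you don't include an ARN and a latest pipeline execution exists, SageMaker AI uses it as a reference by default. If you don't include an ARN and don't want SageMaker AI to use your latest pipeline execution, set `reference_latest_execution` to `False`. The pipeline execution that SageMaker AI ultimately uses as a reference, whether the latest or user-specified, must be in `Success` or `Failed` state.

The following table summarizes how SageMaker AI chooses a reference execution.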


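| The `source_pipeline_execution_arn` argument value | The `reference_latest_execution` argument value | The reference execution used | 
| --- | --- | --- | 
| A pipeline ARN | `True` or unspecified | The specified pipeline ARN | 
| A pipeline ARN | `False` | The specified pipeline ARN | 
| null or unspecified | `True` or unspecified | The latest pipeline execution | 
| null or unspecified | `False` | None. In this case, select steps without upstream dependencies | 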
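
The table can also be read as a small decision function. The following sketch simply mirrors the four rows. `choose_reference_execution` and the `latest_execution_arn` argument are hypothetical, and the second ARN is a made-up placeholder.

```python
def choose_reference_execution(source_pipeline_execution_arn=None,
                               reference_latest_execution=True,
                               latest_execution_arn=None):
    """Mirror the table above: return the ARN used as reference, or None."""
    if source_pipeline_execution_arn is not None:
        # An explicitly supplied ARN is used regardless of reference_latest_execution.
        return source_pipeline_execution_arn
    if reference_latest_execution:
        return latest_execution_arn
    # No reference at all: only steps without upstream dependencies can be selected.
    return None


explicit = "arn:aws:sagemaker:us-west-2:123123123123:pipeline/abalone/execution/123ab12cd3ef"
latest = "arn:aws:sagemaker:us-west-2:123123123123:pipeline/abalone/execution/latest-example"

print(choose_reference_execution(explicit, False, latest))          # the explicit ARN
print(choose_reference_execution(None, True, latest))               # the latest execution
print(choose_reference_execution(None, False, latest))              # None
```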

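For more information about selective execution configuration requirements, see the [sagemaker.workflow.selective_execution_config.SelectiveExecutionConfig](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#selective-execution-config) documentation.

The following discussion includes examples for the cases in which you want to specify a pipeline reference execution, use the latest pipeline execution as a reference, or run selective execution without a reference pipeline execution.

## Selective execution with a user-specified pipeline reference
<a name="pipelines-selective-ex-arn"></a>

The following example demonstrates a selective execution of the steps `AbaloneTrain` and `AbaloneEval` using a reference pipeline execution.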

```
from sagemaker.workflow.selective_execution_config import SelectiveExecutionConfig

selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn="arn:aws:sagemaker:us-west-2:123123123123:pipeline/abalone/execution/123ab12cd3ef", 
    selected_steps=["AbaloneTrain", "AbaloneEval"]
)

selective_execution = pipeline.start(
    execution_display_name=f"Sample-Selective-Execution-1",
    parameters={"MaxDepth":6, "NumRound":60},
    selective_execution_config=selective_execution_config,
)
```

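## Selective execution with the latest pipeline execution as a reference
<a name="pipelines-selective-ex-latest"></a>

The following example demonstrates a selective execution of the steps `AbaloneTrain` and `AbaloneEval` using the latest pipeline execution as a reference. Because SageMaker AI uses the latest pipeline execution by default, setting the `reference_latest_execution` argument to `True` is optional.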

```
# Prepare a new selective execution. Select the AbaloneTrain and AbaloneEval steps without providing source_pipeline_execution_arn.
selective_execution_config = SelectiveExecutionConfig(
    selected_steps=["AbaloneTrain", "AbaloneEval"],
    # optional
    reference_latest_execution=True
)

# Start pipeline execution without source_pipeline_execution_arn
pipeline.start(
    execution_display_name=f"Sample-Selective-Execution-1",
    parameters={"MaxDepth":6, "NumRound":60},
    selective_execution_config=selective_execution_config,
)
```

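## Selective execution without a reference pipeline
<a name="pipelines-selective-ex-none"></a>

The following example demonstrates a selective execution of the steps `AbaloneProcess` and `AbaloneTrain` without providing a reference ARN and with the option to use the latest pipeline run as a reference turned off. SageMaker AI permits this configuration because this subset of steps doesn't depend on previous steps.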

```
# Prepare a new selective execution. Select the AbaloneProcess and AbaloneTrain steps without providing source_pipeline_execution_arn.
selective_execution_config = SelectiveExecutionConfig(
    selected_steps=["AbaloneProcess", "AbaloneTrain"],
    reference_latest_execution=False
)

# Start pipeline execution without source_pipeline_execution_arn
pipeline.start(
    execution_display_name=f"Sample-Selective-Execution-1",
    parameters={"MaxDepth":6, "NumRound":60},
    selective_execution_config=selective_execution_config,
)
```

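## Reuse runtime parameter values from a reference execution
<a name="pipelines-selective-ex-reuse"></a>

You can build the parameters from your reference pipeline execution using `build_parameters_from_execution`, and supply the result to your selective execution pipeline. You can use the original parameters from the reference execution, or apply any overrides using the `parameter_value_overrides` argument.

The following example shows you how to build parameters from a reference execution and apply an override for the `MseThreshold` parameter.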

```
# Prepare a new selective execution.
selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn="arn:aws:sagemaker:us-west-2:123123123123:pipeline/abalone/execution/123ab12cd3ef",
    selected_steps=["AbaloneTrain", "AbaloneEval", "AbaloneMSECond"],
)
# Define a new parameters list to test.
new_parameters_mse={
    "MseThreshold": 5,
}

# Build parameters from reference execution and override with new parameters to test.
new_parameters = pipeline.build_parameters_from_execution(
    pipeline_execution_arn="arn:aws:sagemaker:us-west-2:123123123123:pipeline/abalone/execution/123ab12cd3ef",
    parameter_value_overrides=new_parameters_mse
)

# Start pipeline execution with new parameters.
execution = pipeline.start(
    selective_execution_config=selective_execution_config,
    parameters=new_parameters
)
```
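
Conceptually, the resulting parameter set can be thought of as the reference run's parameters with the overrides applied by name. The following is a plain-Python sketch of that idea under this assumption. It doesn't call the SDK, `build_parameters` is a hypothetical helper, and the reference values shown are made up for illustration.

```python
def build_parameters(reference_parameters, parameter_value_overrides=None):
    """Start from the reference run's parameters and apply overrides by name."""
    merged = dict(reference_parameters)
    merged.update(parameter_value_overrides or {})
    return merged


# Assumed parameter values of the reference execution (illustrative only).
reference_parameters = {"MaxDepth": 6, "NumRound": 60, "MseThreshold": 6}

new_parameters = build_parameters(reference_parameters, {"MseThreshold": 5})
print(new_parameters)
```

Parameters that aren't overridden keep the values they had in the reference run.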

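# Baseline calculation, drift detection and lifecycle with ClarifyCheck and QualityCheck steps in Amazon SageMaker Pipelines
<a name="pipelines-quality-clarify-baseline-lifecycle"></a>

The following topic discusses how baselines and model versions evolve in Amazon SageMaker Pipelines when using the [`ClarifyCheck`](build-and-manage-steps-types.md#step-type-clarify-check) and [`QualityCheck`](build-and-manage-steps-types.md#step-type-quality-check) steps.

For the `ClarifyCheck` step, a baseline is a single file that resides in the step properties with the suffix `constraints`. For the `QualityCheck` step, a baseline is a combination of two files that reside in the step properties: one with the suffix `statistics` and the other with the suffix `constraints`. The following topics discuss these properties, whose prefixes describe how they are used, and how they affect baseline behavior and lifecycle in these two pipeline steps. For example, the `ClarifyCheck` step always calculates and assigns the new baselines in the `CalculatedBaselineConstraints` property, and the `QualityCheck` step does the same in the `CalculatedBaselineConstraints` and `CalculatedBaselineStatistics` properties.

## Baseline calculation and registration for ClarifyCheck and QualityCheck steps
<a name="pipelines-quality-clarify-baseline-calculations"></a>

Both the `ClarifyCheck` and `QualityCheck` steps always calculate new baselines based on step inputs through the underlying processing job run. These newly calculated baselines are accessed through the properties with the prefix `CalculatedBaseline`. You can record these properties as the `ModelMetrics` of your model package in the [Model step](build-and-manage-steps-types.md#step-type-model). This model package can be registered with 5 different baselines, one for each check type: data bias, model bias, and model explainability from running the `ClarifyCheck` step, and model quality and data quality from running the `QualityCheck` step. The `register_new_baseline` parameter dictates the value set in the properties with the prefix `BaselineUsedForDriftCheck` after a step runs.

The following table of potential use cases shows the different behaviors that result from the step parameters you can set for the `ClarifyCheck` and `QualityCheck` steps: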


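| Possible use case that you might consider for selecting this configuration  | `skip_check` / `register_new_baseline` | Does the step do a drift check? | Value of step property `CalculatedBaseline` | Value of step property `BaselineUsedForDriftCheck` | 
| --- | --- | --- | --- | --- | 
| You are doing regular retraining with checks enabled to get a new model version, but you *want to carry over the previous baselines* as the `DriftCheckBaselines` in the model registry for your new model version. | False/ False | Drift check runs against existing baselines | New baselines calculated by running the step | Baseline from the latest approved model in the model registry, or the baseline supplied as a step parameter | 
| You are doing regular retraining with checks enabled to get a new model version, but you want to *refresh the `DriftCheckBaselines` in the model registry with the newly calculated baselines for your new model version*. | False/ True | Drift check runs against existing baselines | New baselines calculated by running the step | Newly calculated baseline from running the step (value of property `CalculatedBaseline`) | 
| You are initiating the pipeline to retrain a new model version because Amazon SageMaker Model Monitor detected a violation on an endpoint for a particular type of check, and you want to *skip this type of check against the previous baseline, but carry over the previous baseline as the `DriftCheckBaselines` in the model registry for your new model version*. | True/ False | No drift check | New baselines calculated by running the step | Baseline from the latest approved model in the model registry, or the baseline supplied as a step parameter | 
| You want to skip the drift check and refresh the `DriftCheckBaselines` in the model registry with the newly calculated baselines. For the cases in which this happens, see [the AWS documentation website](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/pipelines-quality-clarify-baseline-lifecycle.html).  | True/ True | No drift check | New baselines calculated by running the step | Newly calculated baseline from running the step (value of property `CalculatedBaseline`) | 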
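
The last three columns of the table depend only on the two flags, which the following sketch restates as a function. It is a summary of the table for illustration, not SDK code, and `check_step_behavior` and its result keys are hypothetical names.

```python
def check_step_behavior(skip_check, register_new_baseline):
    """Summarize the table above for one ClarifyCheck or QualityCheck step."""
    return {
        "runs_drift_check": not skip_check,
        # A new baseline is always calculated, whatever the flags are.
        "calculated_baseline": "newly calculated",
        "baseline_used_for_drift_check": (
            "newly calculated" if register_new_baseline
            else "previous (model registry or supplied)"
        ),
    }


for skip_check in (False, True):
    for register_new_baseline in (False, True):
        print(skip_check, register_new_baseline,
              check_step_behavior(skip_check, register_new_baseline))
```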

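**Note**  
If you use scientific notation in your constraint, you need to convert to float. For a preprocessing script example of how to do this, see [Create a Model Quality Baseline](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-baseline.html).

When you register a model with [Model step](build-and-manage-steps-types.md#step-type-model), you can register the `BaselineUsedForDriftCheck` property as `DriftCheckBaselines`. These baseline files can then be used by Model Monitor for model and data quality checks. In addition, these baselines can also be used in the `ClarifyCheck` and `QualityCheck` steps to compare newly trained models against the existing models that are registered in the model registry for future pipeline runs.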

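## Drift detection against previous baselines in Pipelines
<a name="pipelines-quality-clarify-baseline-drift-detection"></a>

In the case of the `QualityCheck` step, when you initiate the pipeline for regular retraining to get a new model version, you might not want to run the training step if the data quality and data bias has [Schema for Violations (constraint_violations.json file)](model-monitor-interpreting-violations.md) on the baselines of your previous approved model version. You also might not want to register the newly trained model version if the model quality, model bias, or model explainability violates the registered baseline of your previous approved model version when running the `ClarifyCheck` step. In these cases, you can enable the checks you want by setting the `skip_check` property of the corresponding check step to `False`, which results in the `ClarifyCheck` and `QualityCheck` steps failing if a violation is detected against previous baselines. The pipeline process then doesn't proceed, so the model that drifted from the baseline isn't registered. The `ClarifyCheck` and `QualityCheck` steps can get the `DriftCheckBaselines` of the latest approved model version of a given model package group against which to compare. Previous baselines can also be supplied directly through `supplied_baseline_constraints` (in addition to `supplied_baseline_statistics` if it is a `QualityCheck` step) and are always prioritized over any baselines pulled from the model package group.

## Baseline and model version lifecycle and evolution with Pipelines
<a name="pipelines-quality-clarify-baseline-evolution"></a>

By setting `register_new_baseline` of your `ClarifyCheck` and `QualityCheck` steps to `False`, your previous baseline is accessible through the step property prefix `BaselineUsedForDriftCheck`. You can then register these baselines as the `DriftCheckBaselines` in the new model version when you register a model with [Model step](build-and-manage-steps-types.md#step-type-model). Once you approve this new model version in the model registry, the `DriftCheckBaselines` in this model version become available for the `ClarifyCheck` and `QualityCheck` steps in the next pipeline process. If you want to refresh the baseline of a certain check type for future model versions, you can set `register_new_baseline` to `True` so that the properties with prefix `BaselineUsedForDriftCheck` become the newly calculated baseline. In these ways, you can preserve your preferred baselines for a model trained in the future, or refresh the baselines for drift checks when needed, managing your baseline evolution and lifecycle throughout your model training iterations.

The following diagram illustrates a model-version-centric view of the baseline evolution and lifecycle.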

![\[A model-version-centric view of the baseline evolution and lifecycle.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/pipelines/Baseline-Lifecycle.png)


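# Schedule pipeline runs
<a name="pipeline-eventbridge"></a>

You can schedule your Amazon SageMaker Pipelines executions using [Amazon EventBridge](https://docs.aws.amazon.com/eventbridge/latest/userguide/what-is-amazon-eventbridge.html). Amazon SageMaker Pipelines is supported as a target in [Amazon EventBridge](https://docs.aws.amazon.com/eventbridge/latest/userguide/what-is-amazon-eventbridge.html). This allows you to initiate the execution of your model building pipeline based on any event in your event bus. With EventBridge, you can automate your pipeline executions and respond automatically to events such as training job or endpoint status changes. Events include a new file being uploaded to your Amazon S3 bucket, a change in status of your Amazon SageMaker AI endpoint due to drift, and *Amazon Simple Notification Service* (SNS) topics.

The following Pipelines actions can be automatically initiated:  
+  `StartPipelineExecution` 

For more information on scheduling SageMaker AI jobs, see [Automating SageMaker AI with Amazon EventBridge](https://docs.aws.amazon.com/sagemaker/latest/dg/automating-sagemaker-with-eventbridge.html).

**Topics**
+ [Schedule a pipeline with Amazon EventBridge](#pipeline-eventbridge-schedule)
+ [Schedule a pipeline with the SageMaker Python SDK](#build-and-manage-scheduling)

## Schedule a pipeline with Amazon EventBridge
<a name="pipeline-eventbridge-schedule"></a>

To start a pipeline execution with Amazon CloudWatch Events, you must create an EventBridge [rule](https://docs.aws.amazon.com/eventbridge/latest/APIReference/API_Rule.html). When you create a rule for events, you specify a target action to take when EventBridge receives an event that matches the rule. When an event matches the rule, EventBridge sends the event to the specified target and initiates the action defined in the rule.

The following tutorials show how to schedule a pipeline execution with EventBridge using the EventBridge console or the AWS CLI.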

### Prerequisites
<a name="pipeline-eventbridge-schedule-prerequisites"></a>
+ A role that EventBridge can assume with the `sagemaker:StartPipelineExecution` permission. This role can be created automatically if you create a rule from the EventBridge console. Otherwise, you need to create this role yourself. For information on creating a SageMaker AI role, see [SageMaker Roles](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html).
+ An Amazon SageMaker AI pipeline to schedule. To create an Amazon SageMaker AI pipeline, see [Define a Pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/define-pipeline.html).

### 使用 EventBridge 主控台建立 EventBridge 規則
<a name="pipeline-eventbridge-schedule-console"></a>

 下列程序示範如何使用 EventBridge 主控台建立 EventBridge 規則。  

1. 導覽至 [EventBridge 主控台](https://console.aws.amazon.com/events)。

1. 選取左側的**規則**。

1.  選取 `Create Rule`。

1. 輸入規則的名稱和說明。

1.  選取啟動此規則的方式。您有下列規則選擇：
   + **事件模式**：當符合模式的事件發生時，您的規則就會啟動。您可以選擇符合特定事件類型的預先定義模式，也可以建立自訂模式。如果您選取預先定義的模式，可以編輯模式以對其進行自訂。如需有關事件模式的詳細資訊，請參閱 [CloudWatch Events 中的事件模式](https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/CloudWatchEventsandEventPatterns.html)。
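   + **Schedule**: Your rule starts regularly on a specified schedule. You can use a fixed-rate schedule that starts every specified number of minutes, hours, or days. You can also use a [cron expression](https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/ScheduledEvents.html#CronExpressions) to create a more fine-grained schedule, such as "the first Monday of each month at 8 AM". Schedules are not supported on custom or partner event buses.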

1. 選取您所需的事件匯流排。

1. 選取當事件符合您的事件模式或排程啟動時要調用的目標。最多可為每個規則新增 5 個目標。在下拉式清單中選取 `SageMaker Pipeline`。

1. 從管道下拉式清單中選取要啟動的管道。

1. 使用名稱和值對新增要傳遞至管道執行的參數。參數值可為靜態或動態。如需 Amazon SageMaker AI 管道參數的詳細資訊，請參閱 [AWS::Events::Rule SagemakerPipelineParameters](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-sagemaker-pipeline.html#aws-resource-sagemaker-pipeline-properties)。
   + 每次啟動管道時，靜態值都會傳遞給管道執行。例如，如果已在參數清單中指定 `{"Name": "Instance_type", "Value": "ml.4xlarge"}`，則每次 EventBridge 啟動管道時，它都會作為參數在 `StartPipelineExecutionRequest` 中傳遞。
   + 動態值是使用 JSON 路徑指定的。EventBridge 會剖析事件承載中的值，然後將其傳遞至管道執行。例如：*`$.detail.param.value`*

1. 選取要用於此規則的角色。您可使用現有的角色或建立新角色。

1. (可選) 新增標籤。

1. 選取 `Create` 以完成規則。

 您的規則現已生效，可用於啟動管道執行了。

### 使用 [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/events/index.html) 建立 EventBridge 規則
<a name="pipeline-eventbridge-schedule-cli"></a>

 下列程序示範如何使用 AWS CLI建立 EventBridge 規則。

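1. Create a rule to start. When you create an EventBridge rule using the AWS CLI, you have two options for how the rule starts: event pattern and schedule.
   +  **Event pattern**: Your rule starts when an event matching the pattern occurs. You can choose a predefined pattern that matches a certain type of event, or you can create a custom pattern. If you select a predefined pattern, you can edit the pattern to customize it. You can create a rule with an event pattern using the following command:

     ```
     aws events put-rule --name <RULE_NAME> --event-pattern <YOUR_EVENT_PATTERN> --description <RULE_DESCRIPTION> --role-arn <ROLE_TO_EXECUTE_PIPELINE> --tags <TAGS>
     ```
   +  **Schedule**: Your rule starts regularly on a specified schedule. You can use a fixed-rate schedule that starts every specified number of minutes, hours, or days. You can also use a cron expression to create a more fine-grained schedule, such as "the first Monday of each month at 8 AM". Schedules are not supported on custom or partner event buses. You can create a rule with a schedule using the following command: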

     ```
     aws events put-rule --name <RULE_NAME> --schedule-expression <YOUR_CRON_EXPRESSION> --description <RULE_DESCRIPTION> --role-arn <ROLE_TO_EXECUTE_PIPELINE> --tags <TAGS>
     ```

1. 新增當事件符合您的事件模式或排程啟動時要調用的目標。最多可為每個規則新增 5 個目標。  對於每個目標，您必須指定以下內容：  
   +  ARN：管道的資源 ARN。
   +  角色 ARN：EventBridge 執行管道所用的角色 ARN。
   +  參數：要傳遞的 Amazon SageMaker AI 管道參數。

1. Run the following command to add your Amazon SageMaker AI pipeline as a target of your rule using [put-targets](https://docs.aws.amazon.com/cli/latest/reference/events/put-targets.html):

   ```
   aws events put-targets --rule <RULE_NAME> --event-bus-name <EVENT_BUS_NAME> --targets "[{\"Id\": <ID>, \"Arn\": <RESOURCE_ARN>, \"RoleArn\": <ROLE_ARN>, \"SageMakerPipelineParameters\": { \"PipelineParameterList\": [{\"Name\": <NAME>, \"Value\": <VALUE>}]} }]"
   ```
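
Escaping the `--targets` JSON by hand is error-prone, so one option is to build the payload in Python and serialize it. The following is a minimal sketch, not an official helper: the function name `build_pipeline_target`, the target ID, and both ARNs are hypothetical placeholders. The key names follow the EventBridge target structure used in the command above.

```python
import json


def build_pipeline_target(target_id, pipeline_arn, role_arn, parameters):
    """Return one EventBridge target entry for a SageMaker pipeline."""
    return {
        "Id": target_id,
        "Arn": pipeline_arn,
        "RoleArn": role_arn,
        "SageMakerPipelineParameters": {
            # Parameter values are sent as strings.
            "PipelineParameterList": [
                {"Name": name, "Value": str(value)}
                for name, value in parameters.items()
            ]
        },
    }


# Hypothetical identifiers, for illustration only.
targets = [
    build_pipeline_target(
        target_id="my-pipeline-target",
        pipeline_arn="arn:aws:sagemaker:us-east-1:111122223333:pipeline/my-pipeline",
        role_arn="arn:aws:iam::111122223333:role/my-eventbridge-role",
        parameters={"ProcessingInstanceCount": 2},
    )
]

# Pass this string to `aws events put-targets --targets`.
targets_json = json.dumps(targets)
print(targets_json)
```

If you prefer an SDK call over the CLI, the same `targets` list can be passed as the `Targets` argument of the `put_targets` method on a boto3 `events` client.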

## 使用 SageMaker Python SDK 排程管道
<a name="build-and-manage-scheduling"></a>

下列各節說明如何設定存取 EventBridge 資源的許可，並使用 SageMaker Python SDK 建立管道排程。

### 所需的許可
<a name="build-and-manage-scheduling-permissions"></a>

您需要具有使用管道排程器的必要許可。完成以下步驟來設定您的許可：

1. 將下列最低權限政策連接至用於建立管道觸發條件的 IAM 角色，或使用 AWS 受管政策 `AmazonEventBridgeSchedulerFullAccess`。

------
#### [ JSON ]

****  

   ```
   {
       "Version": "2012-10-17",
       "Statement":
       [
           {
               "Action":
               [
                   "scheduler:ListSchedules",
                   "scheduler:GetSchedule",
                   "scheduler:CreateSchedule",
                   "scheduler:UpdateSchedule",
                   "scheduler:DeleteSchedule"
               ],
               "Effect": "Allow",
               "Resource":
               [
                   "*"
               ]
           },
           {
               "Effect": "Allow",
               "Action": "iam:PassRole",
               "Resource": "arn:aws:iam::*:role/*", 
               "Condition": {
                   "StringLike": {
                       "iam:PassedToService": "scheduler.amazonaws.com"
                   }
               }
           }
       ]
   }
   ```

------

1. 將服務主體 `scheduler.amazonaws.com` 新增至此角色的信任政策，以建立與 EventBridge 的信任關係。如果您在 SageMaker Studio 中啟動筆記本，請確定您將下列信任政策連接至執行角色。

------
#### [ JSON ]

****  

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "scheduler.amazonaws.com",
                    "sagemaker.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```

------
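
Before creating a schedule, it can help to confirm that the role's trust policy actually names the scheduler service principal. The following is a small offline sketch that inspects a trust policy document; the function name `trusted_services` is hypothetical, and the check covers only simple `Allow` statements with `sts:AssumeRole`, not conditions or other edge cases.

```python
import json


def trusted_services(trust_policy):
    """Collect service principals allowed to call sts:AssumeRole."""
    services = set()
    for statement in trust_policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # for example "*", which names no specific service
        service = principal.get("Service", [])
        if isinstance(service, str):
            service = [service]
        services.update(service)
    return services


# The trust policy from this section, as a JSON string.
policy = json.loads("""
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": ["scheduler.amazonaws.com", "sagemaker.amazonaws.com"]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
""")

print("scheduler.amazonaws.com" in trusted_services(policy))
```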

### 建立管道排程
<a name="build-and-manage-scheduling-create"></a>

使用 `PipelineSchedule` 建構子，您可以將管道排程為執行一次或按預定間隔執行。管道排程的類型必須為 `at`、`rate` 或 `cron`。這個排程類型集是 [EventBridge 排程選項](https://docs.aws.amazon.com/scheduler/latest/UserGuide/schedule-types.html)的延伸。如需如何使用 `PipelineSchedule` 類別的詳細資訊，請參閱 [sagemaker.workflow.triggers.PipelineSchedule](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#pipeline-schedule)。下列範例示範如何使用 `PipelineSchedule` 建立每個排程類型。

```
from datetime import datetime

from sagemaker.workflow.triggers import PipelineSchedule

# schedules a pipeline run for 12/13/2023 at time 10:15:20 UTC
my_datetime_schedule = PipelineSchedule(
    name="<schedule-name>", 
    at=datetime(2023, 12, 13, 10, 15, 20)
)

# schedules a pipeline run every 5 minutes
my_rate_schedule = PipelineSchedule(
    name="<schedule-name>", 
    rate=(5, "minutes")
)

# schedules a pipeline run at 10:15am UTC on the last Friday of each month during the years 2022 to 2023
my_cron_schedule = PipelineSchedule(
    name="<schedule-name>", 
    cron="15 10 ? * 6L 2022-2023"
)
```
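
To see how the three schedule types relate to EventBridge Scheduler, the following sketch formats each one as a schedule expression string (`at(...)`, `rate(...)`, or `cron(...)`). It is an illustration of the expression syntax only, not the SDK's implementation, and the function name `schedule_expression` is hypothetical.

```python
from datetime import datetime


def schedule_expression(at=None, rate=None, cron=None):
    """Format exactly one schedule type as an EventBridge Scheduler expression."""
    provided = [arg for arg in (at, rate, cron) if arg is not None]
    if len(provided) != 1:
        raise ValueError("Specify exactly one of at, rate, or cron")
    if at is not None:
        # One-time schedule: at(yyyy-mm-ddThh:mm:ss)
        return f"at({at.strftime('%Y-%m-%dT%H:%M:%S')})"
    if rate is not None:
        # Recurring schedule: rate(value unit)
        value, unit = rate
        return f"rate({value} {unit})"
    # Recurring schedule: cron(minutes hours day-of-month month day-of-week year)
    return f"cron({cron})"


print(schedule_expression(at=datetime(2023, 12, 13, 10, 15, 20)))
print(schedule_expression(rate=(5, "minutes")))
print(schedule_expression(cron="15 10 ? * 6L 2022-2023"))
```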

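**Note**  
If you create a one-time schedule and need to access the current time, use `datetime.utcnow()` instead of `datetime.now()`. The latter returns the local time without storing any time zone context, so an incorrect time is passed to EventBridge.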
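
As a sketch of computing a one-time run time expressed in UTC, the helper below takes the current time, converts it to UTC, adds an offset, and returns a naive `datetime` like the ones used in the examples above. The name `utc_run_time` and its parameters are hypothetical; it uses `datetime.now(timezone.utc)`, which yields the same wall-clock value as `datetime.utcnow()`.

```python
from datetime import datetime, timedelta, timezone


def utc_run_time(minutes_from_now, now=None):
    """Return a naive datetime, expressed in UTC, some minutes ahead of `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Convert any time-zone-aware input to UTC before adding the offset.
    run_at = now.astimezone(timezone.utc) + timedelta(minutes=minutes_from_now)
    # Drop tzinfo and sub-second precision to match the examples above.
    return run_at.replace(tzinfo=None, microsecond=0)


# 12:00 at UTC+02:00 is 10:00 UTC, so 15 minutes later is 10:15 UTC.
local_noon = datetime(2023, 12, 13, 12, 0, 0, tzinfo=timezone(timedelta(hours=2)))
print(utc_run_time(15, now=local_noon))
```

The returned value could then be passed as the `at` argument of `PipelineSchedule`.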

### 將觸發條件連接至您的管道
<a name="build-and-manage-scheduling-attach"></a>

若要將您的 `PipelineSchedule` 連線到您的管道，請使用觸發條件清單，在您建立的管道物件上調用 `put_triggers` 呼叫。如果您得到回應 ARN，表示已成功在您的帳戶中建立排程，而且 EventBridge 會按照指定的時間或速率開始調用目標管道。您必須指定一個角色，其具有將觸發條件連接至父管道的正確許可。如果您未提供一個，Pipelines 會擷取用來從[組態檔案](https://docs.aws.amazon.com/sagemaker/latest/dg/train-remote-decorator-config.html)建立管道的預設角色。

下列範例示範如何將排程連接至管道。

```
scheduled_pipeline = Pipeline(
    name="<pipeline-name>",
    steps=[...],
    sagemaker_session=<sagemaker-session>,
)
custom_schedule = PipelineSchedule(
    name="<schedule-name>", 
    at=datetime(year=2023, month=12, day=25, hour=10, minute=30, second=30)
)
scheduled_pipeline.put_triggers(triggers=[custom_schedule], role_arn=<role>)
```

### 描述目前的觸發條件
<a name="build-and-manage-scheduling-describe"></a>

若要擷取所建立管道觸發條件的相關資訊，您可以使用觸發條件名稱調用 `describe_trigger()` API。此命令會傳回所建立排程運算式的詳細資訊，例如其開始時間、啟用狀態和其他實用資訊。下列程式碼片段顯示範例調用：

```
scheduled_pipeline.describe_trigger(name="<schedule-name>")
```

### 清除觸發條件資源
<a name="build-and-manage-scheduling-clean"></a>

刪除您的管道之前，請先清除現有的觸發條件，以避免您的帳戶中發生資源洩漏。您應該先刪除觸發條件，再摧毀父管道。您可以透過將觸發條件名稱清單傳遞至 `delete_triggers` API 來刪除觸發條件。下列程式碼片段示範如何刪除觸發條件。

```
scheduled_pipeline.delete_triggers(trigger_names=["<schedule-name>"])
```

**注意**  
刪除觸發條件時，請注意下列限制：  
只有在 SageMaker Python SDK 中才提供透過指定觸發條件名稱刪除觸發條件的選項。在 CLI 或 `DeletePipeline` API 呼叫中刪除管道並不會刪除您的觸發條件。因此，觸發條件會變得孤立，而 SageMaker AI 會嘗試為不存在的管道啟動執行。
此外，如果您使用另一個筆記本工作階段或已刪除管道目標，請透過排程器 [CLI](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/scheduler/delete-schedule.html) 或 EventBridge 主控台清除孤立的排程。
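
The cleanup of orphaned schedules can be sketched as two small functions: one that filters schedule summaries whose target no longer exists, and one that deletes them through an EventBridge Scheduler client. This is a hedged sketch with assumptions: the summaries are assumed to have the shape returned by the scheduler `list_schedules` call (`Name`, `GroupName`, and `Target.Arn`), the target ARN is assumed to be the pipeline ARN, and all names and ARNs below are hypothetical.

```python
def find_orphaned_schedules(schedules, existing_pipeline_arns):
    """Return the schedule summaries whose target pipeline no longer exists."""
    return [
        schedule
        for schedule in schedules
        if schedule["Target"]["Arn"] not in existing_pipeline_arns
    ]


def delete_schedules(scheduler_client, schedules):
    """Delete each schedule and return the names that were deleted."""
    deleted = []
    for schedule in schedules:
        scheduler_client.delete_schedule(
            Name=schedule["Name"], GroupName=schedule["GroupName"]
        )
        deleted.append(schedule["Name"])
    return deleted


# Hypothetical schedule summaries, for illustration only.
schedules = [
    {
        "Name": "old-schedule",
        "GroupName": "default",
        "Target": {"Arn": "arn:aws:sagemaker:us-east-1:111122223333:pipeline/deleted"},
    },
    {
        "Name": "live-schedule",
        "GroupName": "default",
        "Target": {"Arn": "arn:aws:sagemaker:us-east-1:111122223333:pipeline/live"},
    },
]
live_arns = {"arn:aws:sagemaker:us-east-1:111122223333:pipeline/live"}

orphaned = find_orphaned_schedules(schedules, live_arns)
print([schedule["Name"] for schedule in orphaned])
```

In a real cleanup, `scheduler_client` would be a boto3 `scheduler` client. Review the list of orphaned schedules before deleting anything.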

# Amazon SageMaker Experiments 整合
<a name="pipelines-experiments"></a>

Amazon SageMaker Pipelines 已與 Amazon SageMaker Experiments 緊密整合。依預設，當 Pipelines 建立並執行管道時，如果下列 SageMaker Experiments 實體不存在，系統也會建立這些實體：
+ 管道實驗
+ 每次執行管道的執行群組
+ 針對在管道執行步驟中建立的每個 SageMaker AI 任務，新增至執行群組的執行

您可以跨多個管道執行比較模型訓練準確度等指標，就像比較 SageMaker AI 模型訓練實驗中多個執行群組的指標一樣。

The following sample shows the relevant parameter of the [Pipeline](https://github.com/aws/sagemaker-python-sdk/blob/v2.41.0/src/sagemaker/workflow/pipeline.py) class in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable).

```
Pipeline(
    name="MyPipeline",
    parameters=[...],
    pipeline_experiment_config=PipelineExperimentConfig(
      ExecutionVariables.PIPELINE_NAME,
      ExecutionVariables.PIPELINE_EXECUTION_ID
    ),
    steps=[...]
)
```

如果您不想為管道建立實驗和執行群組，請將 `pipeline_experiment_config` 設定為 `None`。

**注意**  
Amazon SageMaker Python SDK v2.41.0 中引入了實驗整合功能。

系統會根據您為 `pipeline_experiment_config` 的 `ExperimentName` 和 `TrialName` 參數指定的內容，適用下列命名規則：
+ 如果未指定 `ExperimentName`，管道 `name` 將用於實驗名稱。

  如果指定了 `ExperimentName`，系統會將此參數用作實驗名稱。如果存在具有此名稱的實驗，則管道建立的執行群組會新增至現有實驗。如果不存在具有該名稱的實驗，系統會建立新實驗。
+ 如果未指定 `TrialName`，系統會將管道執行 ID 用作執行群組名稱。

  如果指定了 `TrialName`，則此參數將用於執行群組名稱。如果存在具有該名稱的執行群組，則管道建立的執行會新增至現有執行群組。如果不存在具有該名稱的執行群組，系統則會建立新執行群組。
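
The naming rules above can be summarized in a few lines of Python. This is only a model of the documented behavior, not service code: the function name `plan_experiment_entities` and its arguments are hypothetical, and the sets stand in for the experiments and run groups that already exist in your account.

```python
def plan_experiment_entities(
    pipeline_name,
    execution_id,
    existing_experiments,
    existing_run_groups,
    experiment_name=None,
    trial_name=None,
):
    """Model which experiment and run group a pipeline execution would use."""
    # Fall back to the pipeline name and execution ID when nothing is specified.
    experiment = experiment_name or pipeline_name
    run_group = trial_name or execution_id
    return {
        "experiment": experiment,
        "run_group": run_group,
        # Existing entities are reused; missing ones are created.
        "creates_experiment": experiment not in existing_experiments,
        "creates_run_group": run_group not in existing_run_groups,
    }


plan = plan_experiment_entities(
    pipeline_name="MyPipeline",
    execution_id="abc123",  # hypothetical execution ID
    existing_experiments={"MyPipeline"},
    existing_run_groups=set(),
)
print(plan)
```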

**注意**  
刪除建立實體的管道時，不會刪除實驗實體。您可以透過 SageMaker Experiments API 來刪除實體。

如需如何查看與管道關聯的 SageMaker AI Experiment 實體的相關資訊，請參閱[從管道中存取實驗資料](pipelines-studio-experiments.md)。如需有關 SageMaker Experiments 的詳細資訊，請參閱[Studio Classic 中的 Amazon SageMaker Experiments](experiments.md)。

以下章節展示先前規則的範例，以及它們在管道定義檔案中的表示方式。如需有關管道定義檔案的詳細資訊，請參閱[管道概觀](pipelines-overview.md)。

**Topics**
+ [預設行為](pipelines-experiments-default.md)
+ [停用 Experiments 整合](pipelines-experiments-none.md)
+ [指定自訂實驗名稱](pipelines-experiments-custom-experiment.md)
+ [指定自訂執行群組名稱](pipelines-experiments-custom-trial.md)

# 預設行為
<a name="pipelines-experiments-default"></a>

**建立管道**

建立 SageMaker AI 管道時的預設行為是自動將其與 SageMaker Experiments 整合。如果您未指定任何自訂組態，SageMaker AI 會使用與管道相同的名稱建立實驗、使用管道執行 ID 做為名稱為每個管道執行建立執行群組，以及為作為管道步驟一部分啟動的每個 SageMaker AI 任務的每個執行群組內建立個別執行。您可以無縫地跨不同管道執行追蹤和比較指標，類似於分析模型訓練實驗的方式。下節會在定義管道時示範此預設行為，而不明確設定實驗整合。

省略了 `pipeline_experiment_config`。`ExperimentName` 預設為管道 `name`。`TrialName` 預設為執行 ID。

```
pipeline_name = f"MyPipeline"
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[...],
    steps=[step_train]
)
```

**管道定義檔案**

```
{
  "Version": "2020-12-01",
  "Parameters": [
    {
      "Name": "InputDataSource"
    },
    {
      "Name": "InstanceCount",
      "Type": "Integer",
      "DefaultValue": 1
    }
  ],
  "PipelineExperimentConfig": {
    "ExperimentName": {"Get": "Execution.PipelineName"},
    "TrialName": {"Get": "Execution.PipelineExecutionId"}
  },
  "Steps": [...]
}
```

# 停用 Experiments 整合
<a name="pipelines-experiments-none"></a>

**建立管道**

您可以在定義管道時將 `pipeline_experiment_config` 參數設定為 `None`，以停用管道與 SageMaker Experiments 的整合。如此一來，SageMaker AI 就不會自動建立實驗、執行群組或個別執行，以追蹤與管道執行相關聯的指標和成品。下列範例會將管道組態參數設定為 `None`。

```
pipeline_name = f"MyPipeline"
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[...],
    pipeline_experiment_config=None,
    steps=[step_train]
)
```

**管道定義檔案**

這與先前的預設範例相同，沒有 `PipelineExperimentConfig`。

# 指定自訂實驗名稱
<a name="pipelines-experiments-custom-experiment"></a>

雖然預設行為是在 SageMaker Experiments 中使用管道名稱做為實驗名稱，但您可以覆寫此名稱並改為指定自訂實驗名稱。如果您想要在相同實驗下將多個管道執行分組，以進行更輕鬆的分析和比較，這會很有用。執行群組名稱仍會預設為管道執行 ID，除非您也明確地為其設定自訂名稱。下節示範如何使用自訂實驗名稱建立管道，同時將執行群組名稱保留為預設執行 ID。

**建立管道**

```
pipeline_name = f"MyPipeline"
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[...],
    pipeline_experiment_config=PipelineExperimentConfig(
      "CustomExperimentName",
      ExecutionVariables.PIPELINE_EXECUTION_ID
    ),
    steps=[step_train]
)
```

**管道定義檔案**

```
{
  ...,
  "PipelineExperimentConfig": {
    "ExperimentName": "CustomExperimentName",
    "TrialName": {"Get": "Execution.PipelineExecutionId"}
  },
  "Steps": [...]
}
```

# 指定自訂執行群組名稱
<a name="pipelines-experiments-custom-trial"></a>

除了設定自訂實驗名稱之外，您也可以為 SageMaker Experiments 在管道執行期間建立的執行群組指定自訂名稱。此名稱會附加管道執行 ID，以確保唯一性。您可以指定自訂執行群組名稱，以識別和分析相同實驗內的相關管道執行。下節說明如何使用自訂執行群組名稱定義管道，同時為實驗名稱使用預設管道名稱。

**建立管道**

```
pipeline_name = f"MyPipeline"
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[...],
    pipeline_experiment_config=PipelineExperimentConfig(
      ExecutionVariables.PIPELINE_NAME,
      Join(on="-", values=["CustomTrialName", ExecutionVariables.PIPELINE_EXECUTION_ID])
    ),
    steps=[step_train]
)
```

**管道定義檔案**

```
{
  ...,
  "PipelineExperimentConfig": {
    "ExperimentName": {"Get": "Execution.PipelineName"},
    "TrialName": {
      "On": "-",
      "Values": [
         "CustomTrialName",
         {"Get": "Execution.PipelineExecutionId"}
       ]
    }
  },
  "Steps": [...]
}
```
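
To make the definition file easier to read, the sketch below resolves a `PipelineExperimentConfig` block into concrete names for a given execution. It handles only the three shapes shown in these examples (a plain string, a `Get` reference, and an `On`/`Values` join) and returns `None` when the block is absent. The function names and the sample execution values are hypothetical.

```python
def resolve_value(value, execution):
    """Resolve a string, a Get reference, or an On/Values join to a string."""
    if isinstance(value, str):
        return value
    if "Get" in value:
        return execution[value["Get"]]
    if "On" in value and "Values" in value:
        parts = [resolve_value(item, execution) for item in value["Values"]]
        return value["On"].join(parts)
    raise ValueError(f"Unsupported expression: {value!r}")


def resolve_experiment_config(definition, execution):
    """Return resolved names, or None if Experiments integration is disabled."""
    config = definition.get("PipelineExperimentConfig")
    if config is None:
        return None
    return {
        key: resolve_value(config[key], execution)
        for key in ("ExperimentName", "TrialName")
    }


definition = {
    "PipelineExperimentConfig": {
        "ExperimentName": {"Get": "Execution.PipelineName"},
        "TrialName": {
            "On": "-",
            "Values": ["CustomTrialName", {"Get": "Execution.PipelineExecutionId"}],
        },
    }
}
# Hypothetical runtime values for one execution.
execution = {
    "Execution.PipelineName": "MyPipeline",
    "Execution.PipelineExecutionId": "abc123",
}
print(resolve_experiment_config(definition, execution))
```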

# 使用本機模式執行管道
<a name="pipelines-local-mode"></a>

在受管 SageMaker AI 服務上執行管道之前，SageMaker Pipelines 本機模式為您測試訓練、處理和推論命令碼，以及[管道參數](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#pipeline-parameters)的執行時期相容性提供一種簡單方法。透過使用本機模式，您可以使用較小的資料集在本機測試 SageMaker AI 管道。這樣一來，您可以快速輕鬆地對使用者命令碼和管道定義中的錯誤進行偵錯，而不會產生使用受管服務的成本。下列主題說明如何在本機定義和執行管道。

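Pipelines local mode uses [SageMaker AI jobs local mode](https://sagemaker.readthedocs.io/en/stable/overview.html#local-mode) under the hood. This is a feature in the SageMaker Python SDK that lets you run SageMaker AI built-in or custom images locally using Docker containers. Because Pipelines local mode is built on top of SageMaker AI jobs local mode, you can expect to see the same results as if you ran those jobs separately. For example, local mode still uses Amazon S3 to upload model artifacts and processing outputs. If you want data generated by local jobs to reside on local disk, you can use the setup mentioned in [Local Mode](https://sagemaker.readthedocs.io/en/stable/overview.html#local-mode).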

管道本機模式目前支援下列步驟類型：
+ [訓練步驟](build-and-manage-steps-types.md#step-type-training)
+ [處理步驟](build-and-manage-steps-types.md#step-type-processing)
+ [轉換步驟](build-and-manage-steps-types.md#step-type-transform)
+ [模型步驟](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-model-create) (僅包含建立模型引數)
+ [條件步驟](build-and-manage-steps-types.md#step-type-condition)
+ [失敗步驟](build-and-manage-steps-types.md#step-type-fail)

與允許使用[平行組態](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#parallelism-configuration)平行執行多個步驟的受管管道服務不同，本機管道執行程式會依序執行步驟。因此，本機管道的整體執行效能可能會比在雲端上執行的管道差，這主要取決於資料集的大小、演算法以及本機電腦的功能。另請注意，在本機模式下執行的管道不會記錄在[SageMaker Experiments](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-experiments.html)中。
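
Sequential execution can be pictured as walking the pipeline DAG one step at a time. The following sketch uses the standard library's `graphlib` to produce one valid run order from step dependencies. It is an illustration of the idea, not the local executor's actual algorithm; ordering steps by their dependencies is an assumption here, and the step name `AbaloneEval` is hypothetical.

```python
from graphlib import TopologicalSorter


def sequential_order(dependencies):
    """Return one order in which steps can run one after another.

    `dependencies` maps each step name to the set of steps it depends on.
    """
    return list(TopologicalSorter(dependencies).static_order())


dependencies = {
    "AbaloneProcess": set(),
    "AbaloneTrain": {"AbaloneProcess"},
    "AbaloneEval": {"AbaloneTrain"},
}
print(sequential_order(dependencies))
```

Because nothing runs in parallel, the total run time is roughly the sum of the individual step times.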

**注意**  
Pipelines 本機模式與 XGBoost 之類的 SageMaker AI 演算法不相容。如果想要使用這些演算法，則必須在[指令碼模式](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-script-mode/sagemaker-script-mode.html)中使用。

為了在本機執行管道，與管道步驟和管道本身相關聯的 `sagemaker_session` 欄位的類型必須是 `LocalPipelineSession`。下列範例展示如何定義 SageMaker AI 管道以在本機執行。

```
import sagemaker
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.pipeline_context import LocalPipelineSession
from sagemaker.pytorch import PyTorch
from sagemaker.workflow.steps import TrainingStep
from sagemaker.workflow.pipeline import Pipeline

local_pipeline_session = LocalPipelineSession()

pytorch_estimator = PyTorch(
    sagemaker_session=local_pipeline_session,
    role=sagemaker.get_execution_role(),
    instance_type="ml.c5.xlarge",
    instance_count=1,
    framework_version="1.8.0",
    py_version="py36",
    entry_point="./entry_point.py",
)

step = TrainingStep(
    name="MyTrainingStep",
    step_args=pytorch_estimator.fit(
        inputs=TrainingInput(s3_data="s3://amzn-s3-demo-bucket/my-data/train"),
    )
)

pipeline = Pipeline(
    name="MyPipeline",
    steps=[step],
    sagemaker_session=local_pipeline_session
)

pipeline.create(
    role_arn=sagemaker.get_execution_role(), 
    description="local pipeline example"
)

# pipeline will execute locally
execution = pipeline.start()

steps = execution.list_steps()

training_job_name = steps['PipelineExecutionSteps'][0]['Metadata']['TrainingJob']['Arn']

step_outputs = local_pipeline_session.sagemaker_client.describe_training_job(TrainingJobName = training_job_name)
```

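When you are ready to run the pipeline on the managed SageMaker Pipelines service, replace `LocalPipelineSession` in the previous code snippet with `PipelineSession` (as shown in the following code example) and rerun the code.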

```
from sagemaker.workflow.pipeline_context import PipelineSession

pipeline_session = PipelineSession()
```

# 針對 Amazon SageMaker Pipelines 進行疑難排解
<a name="pipelines-troubleshooting"></a>

使用 Amazon SageMaker Pipelines 時，您可能會基於各種原因而遇到問題。本主題提供與常見錯誤及解決方法相關的資訊。

 **管道定義問題** 

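Your pipeline definition might not be formatted correctly. This can cause your execution to fail or your job to run incorrectly. These errors can be caught when the pipeline is created or when an execution occurs. If your definition doesn't validate, Pipelines returns an error message identifying the character where the JSON file is malformed. To fix this problem, review the steps created using the SageMaker AI Python SDK for accuracy.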

您只能在管道定義中包含一次步驟。因此，在同一管道中，步驟不能同時作為條件步驟*和*管道的一部分存在。

 **檢查管道日誌** 

您可以使用下列命令來檢視步驟的狀態：

```
execution.list_steps()
```

每個步驟都包含下列資訊：
+ 由管道啟動之實體的 ARN，例如 SageMaker AI 任務 ARN、模型 ARN 或模型套件 ARN。
+ The failure reason, which includes a brief explanation of the step failure.
+ 如果步驟是條件步驟，則會包含條件是否評估為真或假。  
+ 如果執行作業重複使用先前的任務執行，則 `CacheHit` 會列出來源執行項目。  

您也可以在 Amazon SageMaker Studio 介面中檢視錯誤訊息和日誌。如需有關如何在 Studio 中查看日誌的資訊，請參閱[檢視管道執行的詳細資訊](pipelines-studio-view-execution.md)。
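
When an execution has many steps, a compact summary of the `list_steps()` output is easier to scan. The sketch below assumes step records shaped like the `ListPipelineExecutionSteps` API response (`StepName`, `StepStatus`, `FailureReason`, `Metadata`, `CacheHitResult`) and accepts either a bare list or a dictionary that wraps it under `PipelineExecutionSteps`. The function name and the sample records are hypothetical.

```python
def summarize_steps(steps):
    """Extract status, failure reason, condition outcome, and cache source."""
    if isinstance(steps, dict):
        steps = steps["PipelineExecutionSteps"]
    summary = []
    for step in steps:
        metadata = step.get("Metadata", {})
        summary.append(
            {
                "name": step["StepName"],
                "status": step["StepStatus"],
                "failure_reason": step.get("FailureReason"),
                "condition_outcome": metadata.get("Condition", {}).get("Outcome"),
                "cache_source": step.get("CacheHitResult", {}).get(
                    "SourcePipelineExecutionArn"
                ),
            }
        )
    return summary


# Hypothetical step records, for illustration only.
sample_steps = [
    {"StepName": "AbaloneProcess", "StepStatus": "Succeeded", "Metadata": {}},
    {
        "StepName": "AbaloneTrain",
        "StepStatus": "Failed",
        "FailureReason": "ClientError: example failure message",
        "Metadata": {},
    },
]

for row in summarize_steps(sample_steps):
    if row["status"] == "Failed":
        print(row["name"], "->", row["failure_reason"])
```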

 **缺少許可** 

建立管道執行的角色以及在管道執行中建立每個作業的步驟都需要正確的權限。如果沒有這些許可，您可能無法如預期般提交管道執行或執行 SageMaker AI 任務。要確保正確設定許可，請參閱[IAM 存取管理](build-and-manage-access.md)。

 **工作執行錯誤** 

由於定義 SageMaker AI 任務功能的命令碼中發生問題，您可能會在執行步驟時遇到問題。每個工作都有一組 CloudWatch 日誌。要在 Studio 中檢視這些日誌，請參閱[檢視管道執行的詳細資訊](pipelines-studio-view-execution.md)。如需使用 CloudWatch 日誌搭配 SageMaker AI 的相關資訊，請參閱[Amazon SageMaker AI 的 CloudWatch Logs](logging-cloudwatch.md)。

 **屬性檔錯誤** 

如果在管道中錯誤地實作屬性檔，您可能會遇到問題。要確保屬性檔按預期實作，請參閱[在步驟之間傳遞資料](build-and-manage-propertyfile.md)。

 **將指令碼複製到 Dockerfile 中容器的問題** 

您可以將指令碼複製到容器，或透過 (估算器實體的) `entry_point` 引數或 (處理器實體的) `code` 引數來傳遞指令碼，如下列程式碼範例所示。

```
step_process = ProcessingStep(
    name="PreprocessAbaloneData",
    processor=sklearn_processor,
    inputs = [
        ProcessingInput(
            input_name='dataset',
            source=...,
            destination="/opt/ml/processing/code",
        )
    ],
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train", destination = processed_data_path),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation", destination = processed_data_path),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test", destination = processed_data_path),
    ],
    code=os.path.join(BASE_DIR, "process.py"), ## Code is passed through an argument
    cache_config = cache_config,
    job_arguments = ['--input', 'arg1']
)

sklearn_estimator = SKLearn(
    entry_point=os.path.join(BASE_DIR, "train.py"), ## Code is passed through the entry_point
    framework_version="0.23-1",
    instance_type=training_instance_type,
    role=role,
    output_path=model_path, # New
    sagemaker_session=sagemaker_session, # New
    instance_count=1, # New
    base_job_name=f"{base_job_prefix}/pilot-train",
    metric_definitions=[
        {'Name': 'train:accuracy', 'Regex': 'accuracy_train=(.*?);'},
        {'Name': 'validation:accuracy', 'Regex': 'accuracy_validation=(.*?);'}
    ],
)
```

# Pipelines 動作
<a name="pipelines-build"></a>

您可以使用 Amazon SageMaker Pipelines Python SDK 或 Amazon SageMaker Studio 中的拖放視覺化設計工具來編寫、檢視、編輯、執行和監控您的 ML 工作流程。

下列螢幕擷取畫面顯示您可以用來建立和管理 Amazon SageMaker Pipelines 的視覺化設計工具。

![\[Studio 中 Pipelines 視覺化拖放介面的螢幕擷取畫面。\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/pipelines/pipelines-studio-overview.png)


管道部署完成後，您可以檢視管道的有向無環圖 (DAG)，並使用 Amazon SageMaker Studio 管理管道執行。您可以使用 SageMaker Studio 取得與目前和歷史管道相關的資訊、比較執行項目、查看執行的 DAG、取得中繼資料資訊等。若要了解如何從 Studio 檢視管道，請參閱[檢視管道的詳細資訊](pipelines-studio-list.md)。

**Topics**
+ [定義管道](define-pipeline.md)
+ [編輯管道](edit-pipeline-before-execution.md)
+ [執行管道](run-pipeline.md)
+ [停止管道](pipelines-studio-stop.md)
+ [檢視管道的詳細資訊](pipelines-studio-list.md)
+ [檢視管道執行的詳細資訊](pipelines-studio-view-execution.md)
+ [下載管道定義檔案](pipelines-studio-download.md)
+ [從管道中存取實驗資料](pipelines-studio-experiments.md)
+ [追蹤管道的歷程](pipelines-lineage-tracking.md)

# 定義管道
<a name="define-pipeline"></a>

若要使用 Amazon SageMaker Pipelines 協調您的工作流程，您必須以 JSON 管道定義的形式產生有向無環圖 (DAG)。DAG 指定 ML 程序中涉及的不同步驟，例如資料預先處理、模型訓練、模型評估和模型部署，以及這些步驟之間的相依性和資料流程。下列主題說明如何產生管道定義。

您可以使用 SageMaker Python SDK 或 Amazon SageMaker Studio 中的視覺化拖放管道設計工具功能來產生 JSON 管道定義。下圖是您在本教學課程中建立的管道 DAG 表示法：

![\[Studio 中 Pipelines 視覺化拖放介面的螢幕擷取畫面。\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/pipelines/pipelines-studio-overview.png)


您在下列各節中定義的管道可解決迴歸問題，以根據鮑魚的物理測量值確定其年齡。如需包含本教學課程中內容的可執行 Jupyter 筆記本，請參閱[使用 Amazon SageMaker 模型建構管道協調任務](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-pipelines/tabular/abalone_build_train_deploy/sagemaker-pipelines-preprocess-train-evaluate-batch-transform.html)。

**注意**  
您可以將模型位置參照為訓練步驟的屬性，如 Github 中的端對端範例 [CustomerChurn 管道](https://github.com/aws-samples/customer-churn-sagemaker-pipelines-sample/blob/main/pipelines/customerchurn/pipeline.py)所示。

## 定義管道 (管道設計工具)
<a name="create-pipeline-designer"></a>

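The following walkthrough guides you through the steps to create a barebones pipeline using the drag-and-drop Pipeline Designer. If you need to pause or end your pipeline editing session in the visual designer at any time, click the **Export** option. This lets you download the current definition of your pipeline to your local environment. Later, when you want to resume editing, you can import the same JSON definition file into the visual designer.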

### 建立處理步驟
<a name="create-processing-step"></a>

若要建立資料處理任務步驟，請執行下列動作：

1. 遵循 [啟動 Amazon SageMaker Studio](studio-updated-launch.md) 中的指示開啟 Studio 主控台。

1. 在左側導覽窗格中，選取**管道**。

1. 選擇**建立**。

1. 選擇**空白**。

1. 在左側邊欄中，選擇**處理資料**並將其拖曳至畫布。

1. 在畫布中，選擇您新增的**處理資料**步驟。

1. 若要新增輸入資料集，請在右側邊欄的**資料 (輸入)** 下選擇**新增**，然後選取資料集。

1. 若要新增一個位置以儲存輸出資料集，請在右側邊欄的**資料 (輸出)** 下選擇**新增**，然後導覽至目的地。

1. 完成右側邊欄中的其餘欄位。如需這些標籤中欄位的相關資訊，請參閱 [sagemaker.workflow.steps.ProcessingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.ProcessingStep)。

### 建立訓練步驟
<a name="create-training-step"></a>

若要設定模型訓練步驟，請執行下列動作：

1. 在左側邊欄中，選擇**訓練模型**並將其拖曳至畫布。

1. 在畫布中，選擇您新增的**訓練模型**步驟。

1. 若要新增輸入資料集，請在右側邊欄的**資料 (輸入)** 下選擇**新增**，然後選取資料集。

1. 若要選擇一個位置以儲存您的模型成品，請在**位置 (S3 URI)** 欄位中輸入 Amazon S3 URI，或選擇**瀏覽 S3** 以導覽至目的地位置。

1. 完成右側邊欄中的其餘欄位。如需這些標籤中欄位的相關資訊，請參閱 [sagemaker.workflow.steps.TrainingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TrainingStep)。

1. 按一下游標並將其從您在上一節中新增的**處理資料**步驟拖曳至**訓練模型**步驟，以建立連線這兩個步驟的邊緣。

### 使用註冊模型步驟建立模型套件
<a name="create-register-model-step"></a>

若要使用模型註冊步驟建立模型套件，請執行下列動作：

1. 在左側邊欄中，選擇**註冊模型**並將其拖曳至畫布。

1. 在畫布中，選擇您新增的**註冊模型**步驟。

1. 若要選取要註冊的模型，請選擇**模型 (輸入)** 下的**新增**。

1. 選擇**建立模型群組**，將您的模型加入至新的模型群組。

1. Complete the remaining fields in the right sidebar. For information about the fields in these tabs, see [sagemaker.workflow.step_collections.RegisterModel](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.step_collections.RegisterModel).

1. 按一下游標並將其從您在上一節中新增的**訓練模型**步驟拖曳至**註冊模型**步驟，以建立連線這兩個步驟的邊緣。

### 使用部署模型 (端點) 步驟將模型部署至端點
<a name="create-deploy-endpoint-step"></a>

若要使用模型部署步驟部署您的模型，請執行下列動作：

1. 在左側邊欄中，選擇**部署模型 (端點)** 並將其拖曳至畫布。

1. 在畫布中，選擇您新增的**部署模型 (端點)** 步驟。

1. 若要選擇要部署的模型，請選擇**模型 (輸入)** 下的**新增**。

1. 選擇**建立端點**選項按鈕以建立新的端點。

1. 為您的端點輸入**名稱**和**描述**。

1. 按一下游標並將其從您在上一節中新增的**註冊模型**步驟拖曳至**部署模型 (端點)** 步驟，以建立連線這兩個步驟的邊緣。

1. 完成右側邊欄中的其餘欄位。

### 定義管道參數
<a name="define-pipeline-parameters"></a>

您可以設定一組管道參數，其值可以在每次執行時更新。若要定義管道參數並設定預設值，請按一下視覺化設計工具底部的齒輪圖示。

### 儲存管道
<a name="save-pipeline"></a>

在您輸入了建立管道所需的所有必要資訊之後，請按一下視覺化設計工具底部的**儲存**。這會驗證您的管道在執行時期是否有任何潛在錯誤，並通知您。在您解決了自動驗證檢查所標記的所有錯誤之前，**儲存**操作不會成功。如果您想要稍後繼續編輯，您可以將進行中管道儲存為本機環境中的 JSON 定義。您可以按一下視覺化設計工具底部的**匯出**按鈕，將管道匯出為 JSON 定義檔案。稍後，若要繼續更新您的管道，請按一下**匯入**按鈕上傳該 JSON 定義檔案。

## 定義管道 (SageMaker Python SDK)
<a name="create-pipeline-wrap"></a>

### 先決條件
<a name="define-pipeline-prereq"></a>

 若要執行下列教學課程，請完成以下操作：
+ 依照[建立筆記本執行個體](https://docs.aws.amazon.com/sagemaker/latest/dg/howitworks-create-ws.html)中所述的內容，設定筆記本執行個體。這樣，您的角色將擁有讀取和寫入 Amazon S3 的許可，並能夠在 SageMaker AI 中建立訓練、批次轉換和處理任務。
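+ Grant your notebook permissions to get and pass its own role, as shown in [Modifying a role permissions policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/roles-managingrole-editing-console.html#roles-modify_permissions-policy). Add the following JSON snippet to attach this policy to your role. Replace the example role ARN in the `Resource` element (`arn:aws:iam::111122223333:role/role-name`) with the ARN of the role used to create your notebook instance.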

------
#### [ JSON ]

****  

  ```
  {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Effect": "Allow",
              "Action": [
                  "iam:GetRole",
                  "iam:PassRole"
              ],
              "Resource": "arn:aws:iam::111122223333:role/role-name"
          }
      ]
  }
  ```

------
+  遵循[修改角色信任政策](https://docs.aws.amazon.com/IAM/latest/UserGuide/roles-managingrole-editing-cli.html#roles-managingrole_edit-trust-policy-cli)中的步驟，信任 SageMaker AI 服務主體。將下列陳述式片段新增至角色的信任關係：

  ```
  {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
          "Service": "sagemaker.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
  }

#### 設定您的環境
<a name="define-pipeline-prereq-setup"></a>

使用下列程式碼區塊建立新的 SageMaker AI 工作階段。這將返回作業階段的角色 ARN。此角色 ARN 應為您設定為先決條件的執行角色 ARN。

```
import boto3
import sagemaker
import sagemaker.session
from sagemaker.workflow.pipeline_context import PipelineSession

region = boto3.Session().region_name
sagemaker_session = sagemaker.session.Session()
role = sagemaker.get_execution_role()
default_bucket = sagemaker_session.default_bucket()

pipeline_session = PipelineSession()

model_package_group_name = f"AbaloneModelPackageGroupName"
```

### 建立管道
<a name="define-pipeline-create"></a>

**重要**  
允許 Amazon SageMaker Studio 或 Amazon SageMaker Studio Classic 建立 Amazon SageMaker 資源的自訂 IAM 政策也必須授與許可，才能將標籤新增至這些資源。需要將標籤新增至資源的許可，因為 Studio 和 Studio Classic 會自動標記它們建立的任何資源。如果 IAM 政策允許 Studio 和 Studio Classic 建立資源，但不允許標記，則在嘗試建立資源時可能會發生 "AccessDenied" 錯誤。如需詳細資訊，請參閱[提供標記 SageMaker AI 資源的許可](security_iam_id-based-policy-examples.md#grant-tagging-permissions)。  
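The [AWS managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.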

從 SageMaker AI 筆記本執行個體執行下列步驟，以建立一個管道，其中包含的步驟用於：
+ 預先處理
+ 訓練
+ 評估
+ 條件式評估
+ 模型註冊

**注意**  
您可以使用 [ExecutionVariables](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#execution-variables) 和 [Join](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#execution-variables) 函式來指定輸出位置。`ExecutionVariables` 會在執行時期解析。例如，`ExecutionVariables.PIPELINE_EXECUTION_ID` 解析目前執行的 ID，該 ID 可以用來做為不同執行間的唯一識別碼。

#### 步驟 1：下載資料集
<a name="define-pipeline-data-download"></a>

This notebook uses the UCI Machine Learning Abalone Dataset. The dataset contains the following features:
+ `length` – 測量的最長鮑魚外殼尺寸。
+ `diameter` – 與長度垂直的鮑魚直徑。
+ `height` – 鮑魚殼中鮑魚肉的高度。
+ `whole_weight` – 整個鮑魚的重量。
+ `shucked_weight` – 從鮑魚中取出的肉的重量。
+ `viscera_weight` – 鮑魚內臟經過出血處理後的重量。
+ `shell_weight` – 除肉和乾燥後鮑魚殼的重量。
+ `sex` – 鮑魚的性別。'M'、'F' 或 'I' 之一，其中 'I' 表示幼鮑。
+ `rings` – 鮑魚殼上的環數。

使用公式 `age=rings + 1.5`，可以根據鮑魚殼上的環數得到鮑魚年齡的良好近似值。不過，取得此數字是非常耗時的任務。您必須沿著螺錐切割貝殼，染色切片，然後在顯微鏡下計算環數。不過，其他的物理測量值比較容易取得。這個筆記本使用資料集，透過其他物理量測建立一個預測環數的模型。
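
As a quick worked example of the formula above, the following snippet converts a ring count (measured or predicted) into an approximate age. The function name `estimated_age` is hypothetical.

```python
def estimated_age(rings):
    """Approximate abalone age in years from the number of shell rings."""
    return rings + 1.5


# An abalone with 10 rings is roughly 11.5 years old.
print(estimated_age(10))
```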

**下載資料集**

1. 將資料集下載到帳戶的預設 Amazon S3 儲存貯體中。

   ```
   !mkdir -p data
   local_path = "data/abalone-dataset.csv"
   
   s3 = boto3.resource("s3")
   s3.Bucket(f"sagemaker-servicecatalog-seedcode-{region}").download_file(
       "dataset/abalone-dataset.csv",
       local_path
   )
   
   base_uri = f"s3://{default_bucket}/abalone"
   input_data_uri = sagemaker.s3.S3Uploader.upload(
       local_path=local_path, 
       desired_s3_uri=base_uri,
   )
   print(input_data_uri)
   ```

1. 建立模型後，下載第二個資料集以進行批次轉換。

   ```
   local_path = "data/abalone-dataset-batch.csv"
   
   s3 = boto3.resource("s3")
   s3.Bucket(f"sagemaker-servicecatalog-seedcode-{region}").download_file(
       "dataset/abalone-dataset-batch",
       local_path
   )
   
   base_uri = f"s3://{default_bucket}/abalone"
   batch_data_uri = sagemaker.s3.S3Uploader.upload(
       local_path=local_path, 
       desired_s3_uri=base_uri,
   )
   print(batch_data_uri)
   ```

#### 步驟 2：定義管道參數
<a name="define-pipeline-parameters"></a>

 此程式碼區塊會為您的管道定義下列參數：
+  `processing_instance_count` – 處理任務的執行個體計數。
+  `input_data` – 輸入資料的 Amazon S3 位置。
+  `batch_data` – 用於批次轉換之輸入資料的 Amazon S3 位置。
+  `model_approval_status` – 將已訓練模型進行 CI/CD 註冊的核准狀態。如需更多資訊，請參閱[使用 SageMaker 專案進行 MLOps 自動化](sagemaker-projects.md)。

```
from sagemaker.workflow.parameters import (
    ParameterInteger,
    ParameterString,
)

processing_instance_count = ParameterInteger(
    name="ProcessingInstanceCount",
    default_value=1
)
model_approval_status = ParameterString(
    name="ModelApprovalStatus",
    default_value="PendingManualApproval"
)
input_data = ParameterString(
    name="InputData",
    default_value=input_data_uri,
)
batch_data = ParameterString(
    name="BatchData",
    default_value=batch_data_uri,
)
```
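
Parameter defaults apply only when an execution does not override them. The following sketch models that merge in plain Python and rejects names that the pipeline does not define. It is an illustration rather than SDK code: the function name `effective_parameters` and the S3 URI are hypothetical, and the keys are the parameter names declared above.

```python
def effective_parameters(defaults, overrides=None):
    """Merge per-execution overrides into the pipeline's default values."""
    overrides = overrides or {}
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        raise KeyError(f"Unknown pipeline parameters: {unknown}")
    return {**defaults, **overrides}


defaults = {
    "ProcessingInstanceCount": 1,
    "ModelApprovalStatus": "PendingManualApproval",
    "InputData": "s3://amzn-s3-demo-bucket/abalone/abalone-dataset.csv",
}

# Override two values for a single execution.
run_parameters = effective_parameters(
    defaults,
    {"ProcessingInstanceCount": 2, "ModelApprovalStatus": "Approved"},
)
print(run_parameters)
```

With the SageMaker Python SDK, overrides of this kind are supplied as a dictionary through the `parameters` argument of `pipeline.start()`.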

#### 步驟 3：定義特徵工程的處理步驟
<a name="define-pipeline-processing-feature"></a>

本節展示如何建立處理步驟，以透過資料集準備用於進行訓練的資料。

**建立處理步驟**

1.  建立處理命令碼的目錄。

   ```
   !mkdir -p abalone
   ```

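1. In the `abalone` directory, create a file named `preprocessing.py` with the following content. This preprocessing script is passed in to the processing step for running on the input data. The training step then uses the preprocessed training features and labels to train a model. The evaluation step uses the trained model and the preprocessed test features and labels to evaluate the model. The script uses `scikit-learn` to do the following:
   +  Fill in missing `sex` categorical data and encode it so that it is suitable for training.
   +  Scale and normalize all numerical fields except for `rings` and `sex`.
   +  Split the data into training, test, and validation datasets.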

   ```
   %%writefile abalone/preprocessing.py
   import argparse
   import os
   import requests
   import tempfile
   import numpy as np
   import pandas as pd
   
   
   from sklearn.compose import ColumnTransformer
   from sklearn.impute import SimpleImputer
   from sklearn.pipeline import Pipeline
   from sklearn.preprocessing import StandardScaler, OneHotEncoder
   
   
   # Because this is a headerless CSV file, specify the column names here.
   feature_columns_names = [
       "sex",
       "length",
       "diameter",
       "height",
       "whole_weight",
       "shucked_weight",
       "viscera_weight",
       "shell_weight",
   ]
   label_column = "rings"
   
   feature_columns_dtype = {
       "sex": str,
       "length": np.float64,
       "diameter": np.float64,
       "height": np.float64,
       "whole_weight": np.float64,
       "shucked_weight": np.float64,
       "viscera_weight": np.float64,
       "shell_weight": np.float64
   }
   label_column_dtype = {"rings": np.float64}
   
   
   def merge_two_dicts(x, y):
       z = x.copy()
       z.update(y)
       return z
   
   
   if __name__ == "__main__":
       base_dir = "/opt/ml/processing"
   
       df = pd.read_csv(
           f"{base_dir}/input/abalone-dataset.csv",
           header=None, 
           names=feature_columns_names + [label_column],
           dtype=merge_two_dicts(feature_columns_dtype, label_column_dtype)
       )
       numeric_features = list(feature_columns_names)
       numeric_features.remove("sex")
       numeric_transformer = Pipeline(
           steps=[
               ("imputer", SimpleImputer(strategy="median")),
               ("scaler", StandardScaler())
           ]
       )
   
       categorical_features = ["sex"]
       categorical_transformer = Pipeline(
           steps=[
               ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
               ("onehot", OneHotEncoder(handle_unknown="ignore"))
           ]
       )
   
       preprocess = ColumnTransformer(
           transformers=[
               ("num", numeric_transformer, numeric_features),
               ("cat", categorical_transformer, categorical_features)
           ]
       )
       
       y = df.pop("rings")
       X_pre = preprocess.fit_transform(df)
       y_pre = y.to_numpy().reshape(len(y), 1)
       
       X = np.concatenate((y_pre, X_pre), axis=1)
       
       np.random.shuffle(X)
       train, validation, test = np.split(X, [int(.7*len(X)), int(.85*len(X))])
   
       
       pd.DataFrame(train).to_csv(f"{base_dir}/train/train.csv", header=False, index=False)
       pd.DataFrame(validation).to_csv(f"{base_dir}/validation/validation.csv", header=False, index=False)
       pd.DataFrame(test).to_csv(f"{base_dir}/test/test.csv", header=False, index=False)
   ```

1.  為 `SKLearnProcessor` 建立要傳入處理步驟的執行個體。

   ```
   from sagemaker.sklearn.processing import SKLearnProcessor
   
   
   framework_version = "0.23-1"
   
   sklearn_processor = SKLearnProcessor(
       framework_version=framework_version,
       instance_type="ml.m5.xlarge",
       instance_count=processing_instance_count,
       base_job_name="sklearn-abalone-process",
       sagemaker_session=pipeline_session,
       role=role,
   )
   ```

1. 建立處理步驟。此步驟會接受 `SKLearnProcessor`、輸入和輸出通道，以及您建立的 `preprocessing.py` 指令碼。這與 SageMaker AI Python SDK 中的處理器執行個體 `run` 方法非常相似。傳入 `ProcessingStep` 的 `input_data` 參數是步驟本身的輸入資料。此輸入資料會在處理器執行個體執行時使用。

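    Note the `"train"`, `"validation"`, and `"test"` named channels specified in the output configuration for the processing job. Step `Properties` such as these can be used in subsequent steps and resolve to their runtime values at runtime.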

   ```
   from sagemaker.processing import ProcessingInput, ProcessingOutput
   from sagemaker.workflow.steps import ProcessingStep
      
   
   processor_args = sklearn_processor.run(
       inputs=[
         ProcessingInput(source=input_data, destination="/opt/ml/processing/input"),  
       ],
       outputs=[
           ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
           ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
           ProcessingOutput(output_name="test", source="/opt/ml/processing/test")
       ],
       code="abalone/preprocessing.py",
   ) 
   
   step_process = ProcessingStep(
       name="AbaloneProcess",
       step_args=processor_args
   )
   ```
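
The preprocessing script splits the shuffled rows at 70% and 85% of the data, which yields training, validation, and test sets of roughly 70%, 15%, and 15%. The following pure-Python sketch reproduces that index arithmetic without NumPy so the cut points are easy to inspect. The function name `split_rows` is hypothetical, and the row count of 100 is only an example.

```python
def split_rows(rows, train_end=0.7, validation_end=0.85):
    """Split rows at the same fractional cut points as the preprocessing script."""
    first_cut = int(train_end * len(rows))
    second_cut = int(validation_end * len(rows))
    train = rows[:first_cut]
    validation = rows[first_cut:second_cut]
    test = rows[second_cut:]
    return train, validation, test


rows = list(range(100))
train, validation, test = split_rows(rows)
print(len(train), len(validation), len(test))
```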

#### 步驟 4：定義訓練步驟
<a name="define-pipeline-training"></a>

本節展示如何使用 SageMaker AI [XGBoost 演算法](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html)，根據處理步驟的訓練資料輸出訓練模型。

**定義訓練步驟**

1.  指定儲存訓練模型的模型路徑。

   ```
   model_path = f"s3://{default_bucket}/AbaloneTrain"
   ```

1. 設定 XGBoost 演算法和輸入資料集的估算器。訓練執行個體類型會傳遞至此估算器。典型的訓練指令碼：
   + 從輸入通道載入資料
   + 使用超參數設定訓練
   + 訓練模型
   + 將模型儲存至 `model_dir`，以便稍後可以託管該模型

   SageMaker AI 會在訓練任務結束時，以 `model.tar.gz` 形式將模型上傳至 Amazon S3。

   ```
   from sagemaker.estimator import Estimator
   
   
   image_uri = sagemaker.image_uris.retrieve(
       framework="xgboost",
       region=region,
       version="1.0-1",
       py_version="py3",
       instance_type="ml.m5.xlarge"
   )
   xgb_train = Estimator(
       image_uri=image_uri,
       instance_type="ml.m5.xlarge",
       instance_count=1,
       output_path=model_path,
       sagemaker_session=pipeline_session,
       role=role,
   )
   xgb_train.set_hyperparameters(
       objective="reg:linear",
       num_round=50,
       max_depth=5,
       eta=0.2,
       gamma=4,
       min_child_weight=6,
       subsample=0.7,
       silent=0
   )
   ```

1. Construct a `TrainingStep` using the estimator instance and the properties of the `ProcessingStep`. Pass the `S3Uri` of the `"train"` and `"validation"` output channels to the `TrainingStep`.

   ```
   from sagemaker.inputs import TrainingInput
   from sagemaker.workflow.steps import TrainingStep
   
   
   train_args = xgb_train.fit(
       inputs={
           "train": TrainingInput(
               s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                   "train"
               ].S3Output.S3Uri,
               content_type="text/csv"
           ),
           "validation": TrainingInput(
               s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                   "validation"
               ].S3Output.S3Uri,
               content_type="text/csv"
           )
       },
   )
   
   step_train = TrainingStep(
       name="AbaloneTrain",
       step_args = train_args
   )
   ```
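
Step properties such as `step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri` are placeholders: the S3 location does not exist until the processing job has run. The sketch below is a conceptual toy, not the SDK's implementation. `PropertyRef` and `to_request` are hypothetical names, and the `{"Get": ...}` shape is an assumption modeled on the `'Get'` entries that appear in pipeline definition JSON.

```python
class PropertyRef:
    """Toy stand-in for a step property (hypothetical, not the SDK class).

    It records *how to find* a value instead of holding the value itself.
    """

    def __init__(self, path):
        self._path = path

    def __getattr__(self, name):
        # Called only for attributes that are not found normally.
        if name.startswith("_"):
            raise AttributeError(name)
        return PropertyRef(f"{self._path}.{name}")

    def __getitem__(self, key):
        return PropertyRef(f"{self._path}[{key!r}]")

    def to_request(self):
        # Assumed placeholder shape; the service would resolve it at run time.
        return {"Get": self._path}


train_uri = (
    PropertyRef("Steps.AbaloneProcess")
    .ProcessingOutputConfig.Outputs["train"]
    .S3Output.S3Uri
)
print(train_uri.to_request())
```

The point of the design is that the pipeline definition can be built and serialized before any job runs; only the reference travels with the definition.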

#### 步驟 5：定義處理步驟進行模型評估
<a name="define-pipeline-processing-model"></a>

本節將介紹如何建立處理步驟來評估模型的準確性。此模型評估的結果會用於條件步驟，以決定要採用的執行路徑。

**定義模型評估的處理步驟**

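1. In the `/abalone` directory, create a file named `evaluation.py`. This script is used in a processing step to perform model evaluation. It takes a trained model and the test dataset as input, then produces a JSON file containing regression evaluation metrics (the mean squared error and the standard deviation of the residuals).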

   ```
   %%writefile abalone/evaluation.py
   import json
   import pathlib
   import pickle
   import tarfile
   import joblib
   import numpy as np
   import pandas as pd
   import xgboost
   
   
   from sklearn.metrics import mean_squared_error
   
   
   if __name__ == "__main__":
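       model_path = "/opt/ml/processing/model/model.tar.gz"
       with tarfile.open(model_path) as tar:
           tar.extractall(path=".")
       
       # Load the pickled booster; the context manager closes the file handle.
       with open("xgboost-model", "rb") as model_file:
           model = pickle.load(model_file)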
   
       test_path = "/opt/ml/processing/test/test.csv"
       df = pd.read_csv(test_path, header=None)
       
       y_test = df.iloc[:, 0].to_numpy()
       df.drop(df.columns[0], axis=1, inplace=True)
       
       X_test = xgboost.DMatrix(df.values)
       
       predictions = model.predict(X_test)
   
       mse = mean_squared_error(y_test, predictions)
       std = np.std(y_test - predictions)
       report_dict = {
           "regression_metrics": {
               "mse": {
                   "value": mse,
                   "standard_deviation": std
               },
           },
       }
   
       output_dir = "/opt/ml/processing/evaluation"
       pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True)
       
       evaluation_path = f"{output_dir}/evaluation.json"
       with open(evaluation_path, "w") as f:
           f.write(json.dumps(report_dict))
   ```

1.  建立一個 `ScriptProcessor` 執行個體，用來建立 `ProcessingStep`。

   ```
   from sagemaker.processing import ScriptProcessor
   
   
   script_eval = ScriptProcessor(
       image_uri=image_uri,
       command=["python3"],
       instance_type="ml.m5.xlarge",
       instance_count=1,
       base_job_name="script-abalone-eval",
       sagemaker_session=pipeline_session,
       role=role,
   )
   ```

1.  使用處理器執行個體、輸入和輸出通道以及 `evaluation.py` 指令碼建立 `ProcessingStep`。傳入：
   + 來自 `step_train` 訓練步驟的 `S3ModelArtifacts` 屬性
   + `step_process` 處理步驟的 `"test"` 輸出通道的 `S3Uri`

   這與 SageMaker AI Python SDK 中的處理器執行個體 `run` 方法非常相似。  

   ```
   from sagemaker.workflow.properties import PropertyFile
   
   
   evaluation_report = PropertyFile(
       name="EvaluationReport",
       output_name="evaluation",
       path="evaluation.json"
   )
   
   eval_args = script_eval.run(
       inputs=[
           ProcessingInput(
               source=step_train.properties.ModelArtifacts.S3ModelArtifacts,
               destination="/opt/ml/processing/model"
           ),
           ProcessingInput(
               source=step_process.properties.ProcessingOutputConfig.Outputs[
                   "test"
               ].S3Output.S3Uri,
               destination="/opt/ml/processing/test"
           )
       ],
       outputs=[
           ProcessingOutput(output_name="evaluation", source="/opt/ml/processing/evaluation"),
       ],
       code="abalone/evaluation.py",
   )
   
   step_eval = ProcessingStep(
       name="AbaloneEval",
       step_args=eval_args,
       property_files=[evaluation_report],
   )
   ```
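
To make the shape of `evaluation.json` concrete, here is a minimal standard-library sketch of the same metrics that `evaluation.py` computes. It assumes population standard deviation, which is what `np.std` uses by default. `build_report` and the sample values are hypothetical and only illustrate the calculation.

```python
import json
import statistics


def build_report(y_true, y_pred):
    """Build a report dict with the same keys that evaluation.py writes."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / len(residuals)
    std = statistics.pstdev(residuals)  # population std, like np.std's default
    return {
        "regression_metrics": {
            "mse": {"value": mse, "standard_deviation": std},
        },
    }


# Made-up labels and predictions, for illustration only.
report = build_report([10, 12, 14, 16], [9, 12, 15, 18])
print(json.dumps(report, indent=2))
```

The nested keys matter because a later step reads the value back by path (`regression_metrics.mse.value`).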

#### 步驟 6：定義 CreateModelStep 進行批次轉換
<a name="define-pipeline-create-model"></a>

**重要**  
我們建議您使用 [模型步驟](build-and-manage-steps-types.md#step-type-model) 在 SageMaker Python SDK v2.90.0 及更高版本中建立模型。`CreateModelStep` 將會繼續在 SageMaker Python SDK 的先前版本中運作，但不再受到主動支援。

本節將說明如何從訓練步驟的輸出建立 SageMaker AI 模型。此模型用於根據新資料集進行批次轉換。此步驟會傳入條件步驟，且只有在條件步驟評估為 `true` 時才會執行。

**為批次轉換定義 CreateModelStep**

1.  建立 SageMaker AI 模型。從 `step_train` 訓練步驟傳入 `S3ModelArtifacts` 屬性。

   ```
   from sagemaker.model import Model
   
   
   model = Model(
       image_uri=image_uri,
       model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
       sagemaker_session=pipeline_session,
       role=role,
   )
   ```

1. 定義 SageMaker AI 模型的模型輸入。

   ```
   from sagemaker.inputs import CreateModelInput
   
   
   inputs = CreateModelInput(
       instance_type="ml.m5.large",
       accelerator_type="ml.eia1.medium",
   )
   ```

1. 使用 `CreateModelInput` 和您定義的 SageMaker AI 模型執行個體來建立 `CreateModelStep`。

   ```
   from sagemaker.workflow.steps import CreateModelStep
   
   
   step_create_model = CreateModelStep(
       name="AbaloneCreateModel",
       model=model,
       inputs=inputs,
   )
   ```

#### 步驟 7：定義 TransformStep 以執行批次轉換
<a name="define-pipeline-transform"></a>

本節展示如何在模型訓練後建立 `TransformStep`，以根據資料集執行批次轉換。此步驟會傳入條件步驟，且只有在條件步驟評估為 `true` 時才會執行。

**定義 TransformStep 以執行批次轉換**

1. 使用適當的運算執行個體類型、執行個體計數和所需的輸出 Amazon S3 儲存貯體 URI 建立轉換器執行個體。從 `step_create_model` `CreateModel` 步驟傳入 `ModelName` 屬性。

   ```
   from sagemaker.transformer import Transformer
   
   
   transformer = Transformer(
       model_name=step_create_model.properties.ModelName,
       instance_type="ml.m5.xlarge",
       instance_count=1,
       output_path=f"s3://{default_bucket}/AbaloneTransform"
   )
   ```

1. 使用您定義的轉換器執行個體和 `batch_data` 管道參數建立 `TransformStep`。

   ```
   from sagemaker.inputs import TransformInput
   from sagemaker.workflow.steps import TransformStep
   
   
   step_transform = TransformStep(
       name="AbaloneTransform",
       transformer=transformer,
       inputs=TransformInput(data=batch_data)
   )
   ```

#### 步驟 8：定義 RegisterModel 步驟以建立模型套件
<a name="define-pipeline-register"></a>

**重要**  
我們建議您使用 [模型步驟](build-and-manage-steps-types.md#step-type-model) 在 SageMaker Python SDK v2.90.0 及更高版本中註冊模型。`RegisterModel` 將會繼續在 SageMaker Python SDK 的先前版本中運作，但不再受到主動支援。

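This section shows how to create an instance of `RegisterModel`. The result of running `RegisterModel` in a pipeline is a model package. A model package is a reusable model artifacts abstraction that packages all the components necessary for inference. It consists of an inference specification that defines the inference image to use, along with an optional model weights location. A model package group is a collection of model packages. You can use a `ModelPackageGroup` for Pipelines to add a new version and model package to the group for every pipeline execution. For more information about the model registry, see [Model Registration Deployment with Model Registry](model-registry.md).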

此步驟會傳入條件步驟，且只有在條件步驟評估為 `true` 時才會執行。

**定義 RegisterModel 步驟以建立模型套件**
+  透過用於訓練步驟的估算器執行個體來建構 `RegisterModel` 步驟。從 `step_train` 訓練步驟傳入 `S3ModelArtifacts` 屬性並指定 `ModelPackageGroup`。Pipelines 為您建立此 `ModelPackageGroup`。

  ```
  from sagemaker.model_metrics import MetricsSource, ModelMetrics 
  from sagemaker.workflow.step_collections import RegisterModel
  
  
  model_metrics = ModelMetrics(
      model_statistics=MetricsSource(
          s3_uri="{}/evaluation.json".format(
              step_eval.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"]
          ),
          content_type="application/json"
      )
  )
  step_register = RegisterModel(
      name="AbaloneRegisterModel",
      estimator=xgb_train,
      model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
      content_types=["text/csv"],
      response_types=["text/csv"],
      inference_instances=["ml.t2.medium", "ml.m5.xlarge"],
      transform_instances=["ml.m5.xlarge"],
      model_package_group_name=model_package_group_name,
      approval_status=model_approval_status,
      model_metrics=model_metrics
  )
  ```

#### 步驟 9：定義條件步驟以驗證模型準確性
<a name="define-pipeline-condition"></a>

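`ConditionStep` lets Pipelines support conditional execution in your pipeline DAG based on the conditions of step properties. In this case, you want to register a model package only if the model's accuracy, as determined by the model evaluation step, meets the required value. In this tutorial, that means the mean squared error (MSE) reported by the evaluation step is at most 6.0. If the condition holds, the pipeline also creates a SageMaker AI model and runs batch transformation on a dataset. This section shows how to define the condition step.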

**定義條件步驟以驗證模型準確性**

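1.  Define a `ConditionLessThanOrEqualTo` condition using the accuracy value found in the output of the model evaluation processing step, `step_eval`. Get this output using the property file that you indexed in the processing step and the respective JSONPath of the mean squared error value, `"mse"`.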

   ```
   from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
   from sagemaker.workflow.condition_step import ConditionStep
   from sagemaker.workflow.functions import JsonGet
   
   
   cond_lte = ConditionLessThanOrEqualTo(
       left=JsonGet(
           step_name=step_eval.name,
           property_file=evaluation_report,
           json_path="regression_metrics.mse.value"
       ),
       right=6.0
   )
   ```

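1.  Construct a `ConditionStep`. Pass in the `ConditionLessThanOrEqualTo` condition (`cond_lte`), then set the model package registration, model creation, and batch transformation steps as the next steps to run if the condition passes.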

   ```
   step_cond = ConditionStep(
       name="AbaloneMSECond",
       conditions=[cond_lte],
       if_steps=[step_register, step_create_model, step_transform],
       else_steps=[], 
   )
   ```
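
Conceptually, the condition step reads one value from the evaluation report and picks a branch. The sketch below imitates that logic in plain Python. `json_get` and `choose_branch` are hypothetical helpers, not SDK functions; the step names and the 6.0 threshold are taken from this tutorial.

```python
def json_get(document, json_path):
    """Follow a dotted path such as 'regression_metrics.mse.value'."""
    value = document
    for key in json_path.split("."):
        value = value[key]
    return value


def choose_branch(report, threshold=6.0):
    """Return the names of the steps that would run for this report."""
    mse = json_get(report, "regression_metrics.mse.value")
    if mse <= threshold:
        # Corresponds to if_steps in the ConditionStep above.
        return ["AbaloneRegisterModel", "AbaloneCreateModel", "AbaloneTransform"]
    # Corresponds to else_steps, which is empty in this tutorial.
    return []


good = {"regression_metrics": {"mse": {"value": 4.9, "standard_deviation": 2.2}}}
bad = {"regression_metrics": {"mse": {"value": 7.3, "standard_deviation": 2.7}}}
print(choose_branch(good))
print(choose_branch(bad))
```

Because lower MSE is better, the "accuracy is good enough" check is expressed as less-than-or-equal rather than greater-than.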

#### 第 10 步：建立管道
<a name="define-pipeline-pipeline"></a>

您現在已建立了所有步驟，接下來將它們合併到一個管道中。

**建立管道**

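1.  Define the `name`, `parameters`, and `steps` for the pipeline. Names must be unique within an `(account, region)` pair.
**Note**  
A step can appear only once, either in the pipeline's step list or in the if/else step lists of the condition step. It cannot appear in both.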

   ```
   from sagemaker.workflow.pipeline import Pipeline
   
   
   pipeline_name = "AbalonePipeline"
   pipeline = Pipeline(
       name=pipeline_name,
       parameters=[
           processing_instance_count,
           model_approval_status,
           input_data,
           batch_data,
       ],
       steps=[step_process, step_train, step_eval, step_cond],
   )
   ```

1.  (可選) 檢查 JSON 管道定義，以確保其格式正確。

   ```
   import json
   
   json.loads(pipeline.definition())
   ```

 此管道定義已準備好提交至 SageMaker AI。在下一個教學課程中，您會將此管道提交至 SageMaker AI 並開始執行。
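
A step may appear only once across the pipeline's step list and the condition step's if/else lists. As a rough illustration of that rule, the sketch below walks a simplified definition and reports any step name that appears more than once. The `IfSteps` and `ElseSteps` keys under `Arguments` are an assumption about the definition layout, and `duplicated_steps` is a hypothetical helper, not an SDK check.

```python
from collections import Counter


def iter_step_names(steps):
    """Yield every step name, descending into condition branches."""
    for step in steps:
        yield step["Name"]
        if step.get("Type") == "Condition":
            args = step.get("Arguments", {})
            yield from iter_step_names(args.get("IfSteps", []))
            yield from iter_step_names(args.get("ElseSteps", []))


def duplicated_steps(definition):
    """Return the sorted names of steps that appear more than once."""
    counts = Counter(iter_step_names(definition["Steps"]))
    return sorted(name for name, n in counts.items() if n > 1)


transform = {"Name": "AbaloneTransform", "Type": "Transform"}
condition = {
    "Name": "AbaloneMSECond",
    "Type": "Condition",
    "Arguments": {"IfSteps": [transform], "ElseSteps": []},
}
ok_definition = {"Steps": [{"Name": "AbaloneProcess", "Type": "Processing"}, condition]}
bad_definition = {"Steps": [transform, condition]}  # transform listed twice

print(duplicated_steps(ok_definition))
print(duplicated_steps(bad_definition))
```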

## 定義管道 (JSON)
<a name="collapsible-section-1"></a>

您也可以使用 [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_pipeline) 或 [CloudFormation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-sagemaker-pipeline.html) 來建立管道。建立管道需要管道定義，這是用以定義管道每個步驟的 JSON 物件。SageMaker SDK 提供一種建構管道定義的簡單方法，您可以與先前提到的任何 API 搭配使用來建立管道本身。在不使用 SDK 的情況下，使用者必須撰寫原始 JSON 定義來建立管道，而沒有 SageMaker Python SDK 提供的任何錯誤檢查。若要檢視管道 JSON 定義的結構描述，請參閱 [SageMaker AI 管道定義 JSON 結構描述](https://aws-sagemaker-mlops.github.io/sagemaker-model-building-pipeline-definition-JSON-schema/)。下列程式碼範例展示 SageMaker AI 管道定義 JSON 物件的範例：

```
{'Version': '2020-12-01',
 'Metadata': {},
 'Parameters': [{'Name': 'ProcessingInstanceType',
   'Type': 'String',
   'DefaultValue': 'ml.m5.xlarge'},
  {'Name': 'ProcessingInstanceCount', 'Type': 'Integer', 'DefaultValue': 1},
  {'Name': 'TrainingInstanceType',
   'Type': 'String',
   'DefaultValue': 'ml.m5.xlarge'},
  {'Name': 'ModelApprovalStatus',
   'Type': 'String',
   'DefaultValue': 'PendingManualApproval'},
  {'Name': 'ProcessedData',
   'Type': 'String',
   'DefaultValue': 'S3_URL'},
  {'Name': 'InputDataUrl',
   'Type': 'String',
   'DefaultValue': 'S3_URL'}],
 'PipelineExperimentConfig': {'ExperimentName': {'Get': 'Execution.PipelineName'},
  'TrialName': {'Get': 'Execution.PipelineExecutionId'}},
 'Steps': [{'Name': 'ReadTrainDataFromFS',
   'Type': 'Processing',
   'Arguments': {'ProcessingResources': {'ClusterConfig': {'InstanceType': 'ml.m5.4xlarge',
      'InstanceCount': 2,
      'VolumeSizeInGB': 30}},
    'AppSpecification': {'ImageUri': 'IMAGE_URI',
     'ContainerArguments': [....]},
    'RoleArn': 'ROLE',
      'ProcessingInputs': [...],
    'ProcessingOutputConfig': {'Outputs': [.....]},
    'StoppingCondition': {'MaxRuntimeInSeconds': 86400}},
   'CacheConfig': {'Enabled': True, 'ExpireAfter': '30d'}},
   ...
   ...
   ...
  }
```
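
The example above is printed as a Python dictionary (single quotes, `True`), so it has to be serialized to a JSON string before it is sent to the service. The sketch below builds the keyword arguments for the boto3 `create_pipeline` call without making the call. `build_create_pipeline_request`, the trimmed definition, and the role ARN are hypothetical placeholders for illustration.

```python
import json


def build_create_pipeline_request(name, definition, role_arn):
    """Assemble keyword arguments for the SageMaker create_pipeline API."""
    return {
        "PipelineName": name,
        # The API expects the definition as a JSON string, not a dict.
        "PipelineDefinition": json.dumps(definition),
        "RoleArn": role_arn,
    }


# A trimmed, hypothetical definition; a real one lists every step.
definition = {
    "Version": "2020-12-01",
    "Metadata": {},
    "Parameters": [
        {"Name": "ProcessingInstanceCount", "Type": "Integer", "DefaultValue": 1},
    ],
    "Steps": [
        {
            "Name": "ReadTrainDataFromFS",
            "Type": "Processing",
            "Arguments": {},
            "CacheConfig": {"Enabled": True, "ExpireAfter": "30d"},
        },
    ],
}

request = build_create_pipeline_request(
    "AbalonePipeline",
    definition,
    "arn:aws:iam::111122223333:role/ExampleRole",  # placeholder ARN
)
print(request["PipelineDefinition"])

# With credentials configured, the request could then be submitted with:
#   import boto3
#   boto3.client("sagemaker").create_pipeline(**request)
```

Serializing with `json.dumps` turns Python's `True` into JSON `true`, which is one of the details that is easy to get wrong when a definition is written by hand.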

 **後續步驟：**[執行管道](run-pipeline.md)

# 編輯管道
<a name="edit-pipeline-before-execution"></a>

若要在執行管道之前對其進行變更，請執行下列動作：

1. 請遵循[啟動 Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html) 中的指示來開啟 SageMaker Studio。

1. 在 Studio 的左側導覽窗格中，選擇 **Pipelines**。

1. 選取管道名稱，以檢視相關管道的詳細資訊。

1. 選擇**執行**索引標籤。

1. 選取管道執行的名稱。

1. 選擇**編輯**以開啟管道設計工具。

1. 視需要更新步驟或步驟組態之間的邊緣，然後按一下**儲存**。

   編輯後儲存管道會自動產生新的版本編號。

1. 選擇**執行**。

# 執行管道
<a name="run-pipeline"></a>

將管道的步驟定義為有向無環圖 (DAG) 之後，您可以執行管道，其會執行 DAG 中定義的步驟。下列逐步解說為您展示如何使用 Amazon SageMaker Studio 中的拖放視覺化編輯器或 Amazon SageMaker Python SDK 來執行 Amazon SageMaker AI 管道。

## 執行管道 (管道設計工具)
<a name="run-pipeline-designer"></a>

若要啟動管道的新執行，請執行下列動作：

------
#### [ Studio ]

1. 請遵循[啟動 Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html) 中的指示來開啟 SageMaker Studio。

1. 在左側導覽窗格中，選擇 **Pipelines** (管道)。

1. (選用) 若要依名稱篩選管道清單，請在搜尋欄位中輸入完整或部分管道名稱。

1. 選擇管道名稱以開啟管道詳細資訊檢視。

1. 選擇右上角的**視覺化編輯器**。

1. 若要從最新版本開始執行，請選擇**執行**。

1. 若要從特定版本開始執行，請遵循下列步驟：
   + 選擇底部工具列中的版本圖示以開啟版本面板。
   + 選擇您要執行的管道版本。
   + 將滑鼠懸停在版本項目上方以顯示三點功能表，然後選擇**執行**。
   + (選用) 若要檢視管道的先前版本，請從版本面板的三點功能表中選擇**預覽**。您也可以在通知列中選擇**編輯**來編輯版本。

**注意**  
如果管道失敗，狀態橫幅會顯示**失敗**狀態。對失敗步驟進行故障診斷後，請在狀態橫幅上選擇**重試**，以從該步驟繼續執行管道。

------
#### [ Studio Classic ]

1. 登入 Amazon SageMaker Studio Classic。如需詳細資訊，請參閱[啟動 Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html)。

1. 在 Studio Classic 側邊欄中，選擇**首頁**圖示 (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png))。

1. 從功能表中選取 **管道**。

1. 若要依名稱縮小管道清單，請在搜尋欄位中輸入完整或部分管道名稱。

1. 選取管道名稱。

1. 從執行清單的**執行**或**圖表**標籤中，選擇**建立執行**。

1. 輸入或更新下列必填資訊：
   + **名稱** – 在 AWS 區域中，此名稱必須是您的帳戶獨有的。
   + **處理執行個體計數** – 用於處理的執行個體數目。
   + **模型批准狀態** – 供您方便參考。
   + **InputDataUrl** – 輸入資料的 Amazon S3 URI。

1. Choose **Start**.

一旦您的管道執行，您就可以在狀態橫幅上選擇**檢視詳細資訊**，以檢視執行的詳細資訊。

若要停止執行，請在狀態橫幅上選擇**停止**。若要從停止的位置繼續執行，請在狀態橫幅上選擇**繼續**。

**注意**  
如果管道失敗，狀態橫幅會顯示**失敗**狀態。對失敗步驟進行故障診斷後，請在狀態橫幅上選擇**重試**，以從該步驟繼續執行管道。

------

## 執行管道 (SageMaker Python SDK)
<a name="run-pipeline-sdk"></a>

使用 SageMaker AI Python SDK 建立管道定義之後，您可以將其提交至 SageMaker AI 以開始執行。下列教學課程展示如何提交管道、開始執行、檢查執行的結果，以及刪除管道。

**Topics**
+ [先決條件](#run-pipeline-prereq)
+ [第 1 步：啟動管道](#run-pipeline-submit)
+ [第 2 步：檢查管道執行](#run-pipeline-examine)
+ [第 3 步：取代管道執行的預設參數](#run-pipeline-parametrized)
+ [第 4 步：停止並刪除管道執行](#run-pipeline-delete)

### 先決條件
<a name="run-pipeline-prereq"></a>

本教學課程要求如下：
+  SageMaker 筆記本執行個體。  
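+  A Pipelines pipeline definition. This tutorial assumes that you're using the pipeline definition created by completing the [Define a pipeline](define-pipeline.md) tutorial.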

### 第 1 步：啟動管道
<a name="run-pipeline-submit"></a>

首先，您需要啟動管道。

**啟動管道**

1. 檢查 JSON 管道定義，以確保其格式正確。

   ```
   import json
   
   json.loads(pipeline.definition())
   ```

1. 將管道定義提交至 Pipelines 服務以建立管道 (如果管道不存在)，或更新管道 (如果管道存在)。Pipelines 會使用傳入的角色來建立步驟中定義的所有任務。

   ```
   pipeline.upsert(role_arn=role)
   ```

1. 啟動管道執行。

   ```
   execution = pipeline.start()
   ```

### 第 2 步：檢查管道執行
<a name="run-pipeline-examine"></a>

接下來，您需要檢查管道的執行情況。

**檢查管道執行**

1.  描述管道執行狀態，以確保已成功建立和啟動管道。

   ```
   execution.describe()
   ```

1. 等候執行完成。

   ```
   execution.wait()
   ```

1. 列出執行步驟及狀態。

   ```
   execution.list_steps()
   ```

   您的輸出看起來應如以下所示：

   ```
   [{'StepName': 'AbaloneTransform',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 41, 27, 870000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 45, 50, 492000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'TransformJob': {'Arn': 'arn:aws:sagemaker:us-east-2:111122223333:transform-job/pipelines-cfvy1tjuxdq8-abalonetransform-ptyjoef3jy'}}},
    {'StepName': 'AbaloneRegisterModel',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 41, 26, 929000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 41, 28, 15000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'RegisterModel': {'Arn': 'arn:aws:sagemaker:us-east-2:111122223333:model-package/abalonemodelpackagegroupname/1'}}},
    {'StepName': 'AbaloneCreateModel',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 41, 26, 895000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 41, 27, 708000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'Model': {'Arn': 'arn:aws:sagemaker:us-east-2:111122223333:model/pipelines-cfvy1tjuxdq8-abalonecreatemodel-jl94rai0ra'}}},
    {'StepName': 'AbaloneMSECond',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 41, 25, 558000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 41, 26, 329000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'Condition': {'Outcome': 'True'}}},
    {'StepName': 'AbaloneEval',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 37, 34, 767000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 41, 18, 80000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-east-2:111122223333:processing-job/pipelines-cfvy1tjuxdq8-abaloneeval-zfraozhmny'}}},
    {'StepName': 'AbaloneTrain',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 34, 55, 867000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 37, 34, 34000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'TrainingJob': {'Arn': 'arn:aws:sagemaker:us-east-2:111122223333:training-job/pipelines-cfvy1tjuxdq8-abalonetrain-tavd6f3wdf'}}},
    {'StepName': 'AbaloneProcess',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 30, 27, 160000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 34, 48, 390000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-east-2:111122223333:processing-job/pipelines-cfvy1tjuxdq8-abaloneprocess-mgqyfdujcj'}}}]
   ```

1. 管道執行完成後，從 Amazon S3 下載產生的 `evaluation.json` 檔案以檢查報告。

   ```
   evaluation_json = sagemaker.s3.S3Downloader.read_file("{}/evaluation.json".format(
       step_eval.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"]
   ))
   json.loads(evaluation_json)
   ```
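
The `list_steps()` output is easier to scan once it is reduced to name, status, and duration. The following sketch does that for a list shaped like the sample output above. `summarize_steps` is a hypothetical helper, and the two sample entries are copied from that output with the time zone information dropped for brevity.

```python
from datetime import datetime


def summarize_steps(steps):
    """Return (name, status, seconds) tuples ordered by start time."""
    rows = []
    for step in sorted(steps, key=lambda s: s["StartTime"]):
        end = step.get("EndTime")  # assumed absent while a step is still running
        seconds = None
        if end is not None:
            seconds = round((end - step["StartTime"]).total_seconds(), 3)
        rows.append((step["StepName"], step["StepStatus"], seconds))
    return rows


sample_steps = [
    {
        "StepName": "AbaloneTransform",
        "StartTime": datetime(2020, 11, 21, 2, 41, 27, 870000),
        "EndTime": datetime(2020, 11, 21, 2, 45, 50, 492000),
        "StepStatus": "Succeeded",
    },
    {
        "StepName": "AbaloneProcess",
        "StartTime": datetime(2020, 11, 21, 2, 30, 27, 160000),
        "EndTime": datetime(2020, 11, 21, 2, 34, 48, 390000),
        "StepStatus": "Succeeded",
    },
]

for name, status, seconds in summarize_steps(sample_steps):
    print(f"{name:20s} {status:10s} {seconds}")
```

Sorting by `StartTime` puts the steps in execution order, which is the reverse of the order shown in the sample output.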

### 第 3 步：取代管道執行的預設參數
<a name="run-pipeline-parametrized"></a>

You can start additional executions of the pipeline by specifying different pipeline parameters to override the defaults.

**覆寫預設參數**

1. 建立管道執行。這會在模型批准狀態覆寫設為 “已批准” 的情況下啟動另一個管道執行。這意味著透過 `RegisterModel` 步驟產生的模型套件版本會自動準備好透過 CI/CD 管道進行部署，例如使用 SageMaker Projects。如需詳細資訊，請參閱[使用 SageMaker 專案進行 MLOps 自動化](sagemaker-projects.md)。

   ```
   execution = pipeline.start(
       parameters=dict(
           ModelApprovalStatus="Approved",
       )
   )
   ```

1. 等候執行完成。

   ```
   execution.wait()
   ```

1. 列出執行步驟及狀態。

   ```
   execution.list_steps()
   ```

1. 管道執行完成後，從 Amazon S3 下載產生的 `evaluation.json` 檔案以檢查報告。

   ```
   evaluation_json = sagemaker.s3.S3Downloader.read_file("{}/evaluation.json".format(
       step_eval.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"]
   ))
   json.loads(evaluation_json)
   ```
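
Overriding a parameter at start time replaces only that parameter; every other parameter keeps its default. The sketch below models this merge in plain Python. `resolve_parameters` is a hypothetical helper, and rejecting unknown names is a choice made by this sketch, not a statement about how the service responds.

```python
# Defaults taken from the parameters used in this tutorial.
DEFAULTS = {
    "ProcessingInstanceCount": 1,
    "ModelApprovalStatus": "PendingManualApproval",
}


def resolve_parameters(defaults, overrides=None):
    """Merge per-execution overrides on top of the pipeline defaults."""
    overrides = overrides or {}
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        # Sketch-level validation: catch typos in parameter names early.
        raise KeyError(f"Unknown pipeline parameters: {unknown}")
    return {**defaults, **overrides}


print(resolve_parameters(DEFAULTS))
print(resolve_parameters(DEFAULTS, {"ModelApprovalStatus": "Approved"}))
```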

### 第 4 步：停止並刪除管道執行
<a name="run-pipeline-delete"></a>

管道完成後，您可以停止任何正在進行的執行並刪除管道。

**停止和刪除管道執行**

1. 停止管道執行。

   ```
   execution.stop()
   ```

1. 刪除管道。

   ```
   pipeline.delete()
   ```

# 停止管道
<a name="pipelines-studio-stop"></a>

您可以在 Amazon SageMaker Studio 主控台中停止管道執行。

若要在 Amazon SageMaker Studio 主控台中停止管道執行，請根據您是使用 Studio 還是 Studio Classic 來完成下列步驟。

------
#### [ Studio ]

1. 在左側導覽窗格中，選取**管道**。

1. (選用) 若要依名稱篩選管道清單，請在搜尋欄位中輸入完整或部分管道名稱。

1. 選取管道名稱。

1. 選擇**執行**索引標籤。

1. 選取要停止的執行。

1. 選擇**停止**。若要從執行停止的位置繼續該執行，請選擇**繼續**。

------
#### [ Studio Classic ]

1. 登入 Amazon SageMaker Studio Classic。如需詳細資訊，請參閱[啟動 Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html)。

1. 在 Studio Classic 側邊欄中，選擇**首頁**圖示 (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png))。

1. 從功能表中選取 **管道**。

1. 若要依名稱縮小管道清單，請在搜尋欄位中輸入完整或部分管道名稱。

1. 若要停止管道執行，請在管道的狀態橫幅上選擇**檢視詳細資訊**，然後選擇**停止**。若要從執行停止的位置繼續該執行，請選擇**繼續**。

------

# 檢視管道的詳細資訊
<a name="pipelines-studio-list"></a>

您可以檢視 SageMaker AI 管道的詳細資訊，以了解其參數、其步驟的相依性，或監控其進度和狀態。這可協助您針對工作流程進行疑難排解或最佳化工作流程。您可以使用 Amazon SageMaker Studio 主控台存取指定管道的詳細資訊，並探索其執行歷程記錄、定義、參數和中繼資料。

或者，如果您的管道與 SageMaker AI 專案相關聯，您可以從專案的詳細資訊頁面存取管道詳細資訊。如需詳細資訊，請參閱[檢視專案資源](sagemaker-projects-resources.md)。

若要檢視 SageMaker AI 管道的詳細資訊，請根據您是使用 Studio 還是 Studio Classic 完成以下步驟。

**注意**  
當管道需要在壓縮模型檔案 (model.tar.gz) 中包含自訂指令碼，以上傳到 Amazon S3 並用來將模型部署到 SageMaker AI 端點時，會發生模型重新封裝。當 SageMaker AI 管道訓練模型並將其註冊到模型註冊庫時，*如果*訓練任務的訓練模型輸出需要包含自訂推論指令碼，則會引入重新封裝步驟。重新封裝步驟會解壓縮模型、新增新指令碼，然後重新壓縮模型。執行管道會將重新封裝步驟新增為訓練工作。

------
#### [ Studio ]

1. 請遵循[啟動 Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html) 中的指示來開啟 SageMaker Studio 主控台。

1. 在左側導覽窗格中，選取**管道**。

1. (選用) 若要依名稱篩選管道清單，請在搜尋欄位中輸入完整或部分管道名稱。

1. 選取管道名稱，以檢視相關管道的詳細資訊。

1. 選擇下列其中一個索引標籤以檢視管道詳細資訊：
   + **執行** – 與執行相關的詳細資訊。
   + **圖形** – 管道圖形，包括所有步驟。
   + **參數** – 與管道相關的執行參數和指標。
   + **資訊** – 與管道相關聯的中繼資料，例如標籤、管道 Amazon Resource Name (ARN) 和角色 ARN。您也可以從這個頁面編輯管道描述。

------
#### [ Studio Classic ]

1. 登入 Amazon SageMaker Studio Classic。如需詳細資訊，請參閱[啟動 Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html)。

1. 在 Studio Classic 側邊欄中，選擇**首頁**圖示 (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png))。

1. 從功能表中選取 **管道**。

1. 若要依名稱縮小管道清單，請在搜尋欄位中輸入完整或部分管道名稱。

1. 選取管道名稱，以檢視相關管道的詳細資訊。管道詳細資訊標籤隨即開啟，並顯示管道執行清單。您可以開始執行或選擇某個其他標籤，以取得有關管道的詳細資訊。使用**屬性檢查器** 圖示 (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/gears.png)) 選擇要顯示的資料欄。

1. 從管道詳細資料頁面中，選擇下列某個標籤，檢視有關管道的詳細資訊：
   + **執行** – 與執行相關的詳細資訊。您可以透過此標籤或**圖表**標籤建立執行項目。
   + **圖表** – 管道的 DAG。
   + **參數** – 包含模型核准狀態。
   + **設定** – 與管道相關聯的中繼資料。您可以下載管道定義檔案，並透過此標籤編輯管道名稱和描述。

------

# 檢視管道執行的詳細資訊
<a name="pipelines-studio-view-execution"></a>

您可以檢閱特定 SageMaker AI 管道執行的詳細資訊。這可協助您：
+ 識別並解決執行期間可能發生的問題，例如失敗的步驟或非預期的錯誤。
+ 比較不同管道執行的結果，以了解輸入資料或參數的變更如何影響整體工作流程。
+ 識別瓶頸和最佳化的機會。

若要檢視管道執行的詳細資訊，請根據您是使用 Studio 還是 Studio Classic 完成下列步驟。

------
#### [ Studio ]

1. 請遵循[啟動 Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html) 中的指示來開啟 SageMaker Studio 主控台。

1. 在左側導覽窗格中，選取**管道**。

1. (選用) 若要依名稱篩選管道清單，請在搜尋欄位中輸入完整或部分管道名稱。

1. 選取管道名稱，以檢視相關管道的詳細資訊。

1. 選擇**執行**索引標籤。

1. 選取要檢視的管道執行名稱。該執行的管道圖隨即出現。

1. 選擇圖形中的任何管道步驟，以查看右側邊欄中的步驟設定。

1. 選擇下列其中一個索引標籤，以檢視更多的管道詳細資訊：
   + **定義** - 管道圖，包括所有步驟。
   + **參數** – 包含模型核准狀態。
   + **詳細資訊** – 與管道相關聯的中繼資料，例如標籤、管道 Amazon Resource Name (ARN) 和角色 ARN。您也可以從這個頁面編輯管道描述。

------
#### [ Studio Classic ]

1. 登入 Amazon SageMaker Studio Classic。如需詳細資訊，請參閱[啟動 Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html)。

1. 在 Studio Classic 側邊欄中，選擇**首頁**圖示 (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png))。

1. 從功能表中選取 **管道**。

1. 若要依名稱縮小管道清單，請在搜尋欄位中輸入完整或部分管道名稱。

1. 選取管道名稱。管道的**執行**頁面隨即開啟。

1. 在**執行**頁面中，選取執行名稱以檢視執行的詳細資料。執行詳細資料標籤會開啟，並顯示管道中步驟的圖形。

1. 若要依名稱搜尋步驟，請在搜尋欄位中輸入符合步驟名稱的字元。使用圖形右下角的調整大小圖示來放大和縮小圖形、使圖形符合螢幕大小，以及將圖形展開至全螢幕。若要專注於圖形的特定部分，您可以選取圖形的空白區域，然後拖曳圖形以在該區域上置中。  
![\[\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/yosemite/execution-graph-w-input.png)

1. 選擇圖形中的其中一個管道步驟以查看該步驟的詳細資料。在前面的螢幕擷取畫面中，選擇了一個訓練步驟並顯示下列標籤：
   + **輸入** – 訓練的輸入內容。如果輸入來源來自 Amazon Simple Storage Service (Amazon S3)，請選擇連結以在 Amazon S3 主控台中檢視檔案。
   + **輸出** – 訓練的輸出，例如指標、圖表、檔案和評估結果。這些圖形是使用[追蹤器](https://sagemaker-experiments.readthedocs.io/en/latest/tracker.html#smexperiments.tracker.Tracker.log_precision_recall) API 產生的。
   + **日誌** – 由步驟產生的 Amazon CloudWatch 日誌。
   + **資訊** – 與步驟相關聯的參數和中繼資料。  
![\[\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/yosemite/execution-graph-info.png)

------

# 下載管道定義檔案
<a name="pipelines-studio-download"></a>

您可以直接從 Amazon SageMaker Studio UI 下載 SageMaker AI 管道的定義檔案。您可以將此管道定義檔案用於：
+ 備份和還原：使用下載的檔案建立管道組態的備份，讓您可以在基礎設施故障或意外變更時將其還原。
+ 版本控制：將管道定義檔案存放在來源控制系統中，以追蹤管道的變更，並視需要還原至先前的版本。
+ 程式設計互動：使用管道定義檔案做為 SageMaker SDK 或 AWS CLI的輸入。
+ 與自動化程序整合：將管道定義整合到您的 CI/CD 工作流程或其他自動化程序。

若要下載管道的定義檔案，請根據您是使用 Studio 還是 Studio Classic 完成下列步驟。

------
#### [ Studio ]

1. 請遵循[啟動 Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html) 中的指示來開啟 SageMaker Studio 主控台。

1. 在左側導覽窗格中，選取**管道**。

1. (選用) 若要依名稱篩選管道清單，請在搜尋欄位中輸入完整或部分管道名稱。

1. 選取管道名稱。**執行**頁面隨即開啟，並顯示管道執行的清單。

1. 停留在**執行**頁面，或選擇管道執行資料表左側的**圖形**、**資訊**或**參數**頁面。您可以從其中任何頁面下載管道定義。

1. 在頁面右上角，選擇垂直省略符號，然後選擇**下載管道定義 (JSON)**。

------
#### [ Studio Classic ]

1. 登入 Amazon SageMaker Studio Classic。如需詳細資訊，請參閱[啟動 Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html)。

1. 在 Studio Classic 側邊欄中，選擇**首頁**圖示 (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png))。

1. 從功能表中選取 **管道**。

1. 若要依名稱縮小管道清單，請在搜尋欄位中輸入完整或部分管道名稱。

1. 選取管道名稱。

1. 選擇 **Settings** (設定) 標籤。

1. 選擇**下載管道定義檔案**。

------
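
Once definition files are under version control, comparing two versions step by step is often more readable than a raw text diff. The sketch below is one possible approach, assuming each definition has a top-level `Steps` list whose entries carry a `Name`. `diff_steps` and the sample definitions are hypothetical.

```python
def diff_steps(old_definition, new_definition):
    """Report step names that were added, removed, or changed."""
    old = {step["Name"]: step for step in old_definition["Steps"]}
    new = {step["Name"]: step for step in new_definition["Steps"]}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }


# Hypothetical, heavily trimmed definitions for illustration.
version_1 = {
    "Steps": [
        {"Name": "AbaloneProcess", "Type": "Processing", "InstanceCount": 1},
        {"Name": "AbaloneTrain", "Type": "Training"},
    ]
}
version_2 = {
    "Steps": [
        {"Name": "AbaloneProcess", "Type": "Processing", "InstanceCount": 2},
        {"Name": "AbaloneEval", "Type": "Processing"},
    ]
}

print(diff_steps(version_1, version_2))
```

In practice the two dictionaries would come from `json.load` on two downloaded definition files.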

# 從管道中存取實驗資料
<a name="pipelines-studio-experiments"></a>

**注意**  
SageMaker Experiments 是僅在 Studio Classic 中提供的一項功能。

When you create a pipeline and specify [`pipeline_experiment_config`](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.pipeline.Pipeline.pipeline_experiment_config), Pipelines creates the following SageMaker Experiments entities by default if they don't exist:
+ 管道實驗
+ 每次執行管道的執行群組
+ 管道步驟中建立的每個 SageMaker AI 任務的執行

如需如何將實驗與管道整合的相關資訊，請參閱 [Amazon SageMaker Experiments 整合](pipelines-experiments.md)。如需 SageMaker Experiments 的詳細資訊，請參閱 [Studio Classic 中的 Amazon SageMaker Experiments](experiments.md)。

You can get the list of runs associated with a pipeline from either the pipeline executions list or the experiments list.

**從管道執行清單檢視執行清單**

1. 若要檢視管道執行清單，請遵循 [檢視管道的詳細資訊](pipelines-studio-list.md) 的 *Studio Classic* 索引標籤中的前五個步驟。

1. 在畫面右上方，選擇**篩選**圖示 (![\[Funnel or filter icon representing data filtering or narrowing down options.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/jumpstart/jumpstart-filter-icon.png))。

1. 選取**實驗**。如果在建立管道時未停用實驗整合，則實驗名稱會顯示在執行清單中。
**注意**  
[Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) 的 v2.41.0 版本中引入了實驗整合功能。依預設，使用舊版 SDK 建立的管道不會與實驗整合。

1. Select the experiment that you want in order to view its related run groups and runs.

**從實驗清單檢視執行清單**

1. 在 Studio Classic 的左側邊欄中，選擇**首頁**圖示 (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png))。

1. 從功能表中選取**實驗**。

1. 使用搜尋列或**篩選**圖示 (![\[Funnel or filter icon representing data filtering or narrowing down options.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/jumpstart/jumpstart-filter-icon.png))，將清單篩選為由管道建立的實驗。

1. 開啟實驗名稱並檢視管道建立的執行清單。

# 追蹤管道的歷程
<a name="pipelines-lineage-tracking"></a>

在本教學課程中，您會使用 Amazon SageMaker Studio 來追蹤 Amazon SageMaker AI ML 管道的歷程。

管道是由 [Amazon SageMaker 範例 GitHub 儲存庫](https://github.com/awslabs/amazon-sagemaker-examples)中的 [使用 Amazon SageMaker 模型建構管道協同運作工作](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-pipelines/tabular/abalone_build_train_deploy/sagemaker-pipelines-preprocess-train-evaluate-batch-transform.html) 筆記本建立的。如需有關如何建立管道的詳細資訊，請參閱[定義管道](define-pipeline.md)。

Studio 中的歷程跟踪圍繞有向無環圖 (DAG) 進行。DAG 代表管道中的步驟。您可以透過 DAG 追蹤從任何步驟到任何其他步驟的歷程。下圖展示管道中的步驟。這些步驟會在 Studio 中顯示為 DAG。

![\[\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/yosemite/pipeline-tutorial-steps.png)


若要在 Amazon SageMaker Studio 主控台中追蹤管道的歷程，請根據您是使用 Studio 還是 Studio Classic 完成下列步驟。

------
#### [ Studio ]

**追蹤管道的歷程**

1. 請遵循[啟動 Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html) 中的指示來開啟 SageMaker Studio 主控台。

1. 在左側導覽窗格中，選取**管道**。

1. (選用) 若要依名稱篩選管道清單，請在搜尋欄位中輸入完整或部分管道名稱。

1. 在**名稱**欄中，選取管道名稱以檢視管道的詳細資訊。

1. 選擇**執行**索引標籤。

1. 在**執行**資料表的**名稱**欄中，選取要檢視的管道執行名稱。

1. 在**執行**頁面的右上角，選擇垂直省略符號，然後選擇**下載管道定義 (JSON)**。您可以檢視此檔案以查看管道圖形的定義方式。

1. 選擇**編輯**以開啟管道設計工具。

1. 使用畫布右上角的調整大小和縮放控制項來放大和縮小圖形、使圖形符合螢幕大小，或將圖形展開至全螢幕。

1. 若要檢視您的訓練、驗證和測試資料集，請完成下列步驟：

   1. 在您的管道圖形中選擇處理步驟。

   1. 在右側邊欄中，選擇**概觀**索引標籤。

   1. 在**檔案**區段中，尋找訓練、驗證和測試資料集的 Amazon S3 路徑。

1. To view your model artifacts, complete the following steps:

   1. 在您的管道圖形中選擇訓練步驟。

   1. 在右側邊欄中，選擇**概觀**索引標籤。

   1. 在**檔案**區段中，尋找模型成品的 Amazon S3 路徑。

1. 若要尋找模型套件 ARN，請完成下列步驟：

   1. 選擇註冊模型步驟。

   1. 在右側邊欄中，選擇**概觀**索引標籤。

   1. 在**檔案**區段中，尋找模型套件的 ARN。

------
#### [ Studio Classic ]

**追蹤管道的歷程**

1. 登入 Amazon SageMaker Studio Classic。如需詳細資訊，請參閱[啟動 Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html)。

1. In the Studio Classic sidebar, choose the **Home** icon (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png)).

1. 在功能表中，選取**管道**。

1. 您可以使用**搜尋**方塊來篩選管道清單。

1. 選擇 `AbalonePipeline` 管道以檢視執行清單和管道的其他詳細資訊。

1. 在右側邊欄中選擇**屬性檢查器** 圖示 (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/gears.png)) 以開啟**資料表屬性**窗格，您可以在其中選擇要檢視的屬性。

1. 選擇**設定**標籤，然後選擇**下載管道定義檔案**。您可以檢視此檔案以查看管道圖形的定義方式。

1. 在**執行**索引標籤上，選取執行清單中的第一列，以檢視其執行圖形和執行的其他詳細資訊。請注意，此圖形與教學課程開頭顯示的圖表相符。

   使用圖形右下角的調整大小圖示來放大和縮小圖形、使圖形符合螢幕大小，或將圖形展開至全螢幕。若要專注於圖形的特定部分，您可以選取圖形的空白區域，然後拖曳圖形以在該區域上置中。圖形右下角的插頁區域顯示您在圖形中的位置。  
![\[\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/yosemite/pipeline-tutorial-execution-graph.png)

1. 在**圖表**標籤上，選擇`AbaloneProcess`步驟以檢視與此步驟相關的詳細資訊。

1. 在**輸出**標籤的**檔案**下，找到訓練、驗證和測試資料集的 Amazon S3 路徑。
**注意**  
若要取得完整路徑，請以滑鼠右鍵按一下路徑，然後選擇**複製儲存格內容**。

   ```
   s3://sagemaker-eu-west-1-acct-id/sklearn-abalone-process-2020-12-05-17-28-28-509/output/train
   s3://sagemaker-eu-west-1-acct-id/sklearn-abalone-process-2020-12-05-17-28-28-509/output/validation
   s3://sagemaker-eu-west-1-acct-id/sklearn-abalone-process-2020-12-05-17-28-28-509/output/test
   ```

1. 選擇 `AbaloneTrain` 步驟。

1. 在**輸出**標籤的**檔案**下，找到模型成品的 Amazon S3 路徑：

   ```
   s3://sagemaker-eu-west-1-acct-id/AbaloneTrain/pipelines-6locnsqz4bfu-AbaloneTrain-NtfEpI0Ahu/output/model.tar.gz
   ```

1. 選擇 `AbaloneRegisterModel` 步驟。

1. On the **Output** tab, under **Files**, find the ARN of the model package:

   ```
   arn:aws:sagemaker:eu-west-1:acct-id:model-package/abalonemodelpackagegroupname/2
   ```

------

# Kubernetes 協調
<a name="kubernetes-workflows"></a>

您可以使用 SageMaker AI Operators for Kubernetes 和 SageMaker AI Components for Kubeflow Pipelines 來協調 SageMaker 訓練和推論任務。SageMaker AI Operators for Kubernetes 可讓使用 Kubernetes 的開發人員和資料科學家更輕鬆地在 SageMaker AI 中訓練、調整和部署機器學習 (ML) 模型。SageMaker AI Components for Kubeflow Pipelines 可讓您將資料處理和訓練任務從 Kubernetes 叢集移至 SageMaker AI 的機器學習最佳化受管服務。

**Topics**
+ [SageMaker AI Operators for Kubernetes](kubernetes-sagemaker-operators.md)
+ [適用於 Kubeflow 管道的 SageMaker AI 元件](kubernetes-sagemaker-components-for-kubeflow-pipelines.md)

# SageMaker AI Operators for Kubernetes
<a name="kubernetes-sagemaker-operators"></a>

SageMaker AI Operators for Kubernetes 可讓使用 Kubernetes 的開發人員和資料科學家更輕鬆地在 SageMaker AI 中訓練、調整和部署機器學習 (ML) 模型。您可以在 Amazon Elastic Kubernetes Service (Amazon EKS) 的 Kubernetes 叢集上安裝這些 SageMaker AI Operators，以使用 Kubernetes API 和命令列 Kubernetes 工具 (例如 `kubectl`) 原生建立 SageMaker AI 任務。本指南展示如何設定和使用運算子，以透過 Kubernetes 叢集在 SageMaker AI 上執行模型訓練、超參數調整或推論 (即時和批次)。本章中的程序和準則假設您熟悉 Kubernetes 及其基本命令。

**重要**  
我們正在停止對 [SageMaker Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s/tree/master) 原始版本的開發和技術支援。  
如果您目前使用的是 [SageMaker Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s/tree/master) 的 `v1.2.2` 或以下版本，我們建議您將資源遷移到 [Amazon SageMaker 的 ACK 服務控制器](https://github.com/aws-controllers-k8s/sagemaker-controller)。ACK 服務控制器是新一代的 SageMaker Operators for Kubernetes，以 [AWS Controllers for Kubernetes (ACK)](https://aws-controllers-k8s.github.io/community/) 為基礎。  
如需與移轉步驟相關的資訊，請參閱[將資源遷移到最新的運算子](kubernetes-sagemaker-operators-migrate.md)。  
如需與終止支援 SageMaker Operators for Kubernetes 原始版本相關的常見問題的答案，請參閱[宣布終止支援 SageMaker AI Operators for Kubernetes 原始版本](kubernetes-sagemaker-operators-eos-announcement.md)

**注意**  
使用這些運算子無須額外收費。透過這些運算子使用的任何 SageMaker AI 資源都會產生費用。

## 什麼是運算子？
<a name="kubernetes-sagemaker-operators-overview"></a>

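A Kubernetes operator is an application controller that manages applications on behalf of a Kubernetes user. Controllers of the control plane include various control loops that listen to a central state manager (etcd) to regulate the state of the application they control. Examples of such applications include the [cloud-controller-manager](https://kubernetes.io/docs/concepts/architecture/cloud-controller/) and the [`kube-controller-manager`](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/). Operators typically provide a higher-level abstraction than the raw Kubernetes API, which makes it easier for users to deploy and manage applications. To add new capabilities to Kubernetes, developers can extend the Kubernetes API by creating a **custom resource** that contains their application-specific or domain-specific logic and components. Operators in Kubernetes let users invoke these custom resources natively and automate the associated workflows.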

### How do AWS Controllers for Kubernetes (ACK) work?
<a name="kubernetes-sagemaker-operators-explained"></a>

SageMaker AI Operators for Kubernetes 讓您可以透過 Kubernetes 叢集管理 SageMaker AI 中的任務。SageMaker AI Operators for Kubernetes 的最新版本是以 AWS Controllers for Kubernetes (ACK) 為基礎。ACK 包含常見的控制器執行時間、程式碼產生器，以及一組 AWS 服務特定的控制器，其中一個是 SageMaker AI 控制器。

下圖說明 ACK 的工作原理。

![\[解釋 ACK 型 SageMaker AI Operator for Kubernetes。\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/k8s-orchestration/sagemaker-operators-for-kubernetes-ack-controller.png)


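In this diagram, a Kubernetes user wants to run model training on SageMaker AI from within the Kubernetes cluster by using the Kubernetes API. The user calls `kubectl apply` and passes in a file that describes a Kubernetes custom resource for the SageMaker training job. `kubectl apply` passes this file, called a manifest, to the Kubernetes API server running in the Kubernetes controller node (step *1* in the workflow diagram). The Kubernetes API server receives the manifest with the SageMaker training job specification and determines whether the user has permission to create a custom resource of kind `sagemaker.services.k8s.aws/TrainingJob`, and whether the custom resource is properly formatted (step *2*). If the user is authorized and the custom resource is valid, the Kubernetes API server writes the custom resource to its etcd data store (step *3*) and then responds to the user that the custom resource has been created (step *4*). The SageMaker AI controller, which runs on a Kubernetes worker node in the context of a normal Kubernetes Pod, is notified (step *5*) that a new custom resource of kind `sagemaker.services.k8s.aws/TrainingJob` has been created. The SageMaker AI controller then communicates with the SageMaker API (step *6*), calling the SageMaker AI `CreateTrainingJob` API to create the training job in AWS. After communicating with the SageMaker API, the SageMaker AI controller calls the Kubernetes API server to update the status of the custom resource (step *7*) with the information it received from SageMaker. The SageMaker AI controller therefore gives developers the same information that they would have received by using the AWS SDK.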
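
The manifest that the user passes to `kubectl apply` in step *1* could look like the output of the following sketch. It assumes the `v1alpha1` version of the ACK `TrainingJob` custom resource; the function name, the instance settings, and every argument value are hypothetical placeholders, and the field names should be checked against the ACK SageMaker controller reference before use.

```shell
# Print a minimal TrainingJob manifest for the ACK SageMaker AI controller.
# Assumption: field names follow the v1alpha1 TrainingJob schema.
# Arguments: job name, execution role ARN, training image URI, S3 output path.
render_training_job_manifest() (
  job_name=$1
  role_arn=$2
  image=$3
  s3_output=$4
  cat <<EOF
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: TrainingJob
metadata:
  name: ${job_name}
spec:
  trainingJobName: ${job_name}
  roleARN: ${role_arn}
  algorithmSpecification:
    trainingImage: ${image}
    trainingInputMode: File
  outputDataConfig:
    s3OutputPath: ${s3_output}
  resourceConfig:
    instanceCount: 1
    instanceType: ml.m5.xlarge
    volumeSizeInGB: 5
  stoppingCondition:
    maxRuntimeInSeconds: 3600
EOF
)
```

Redirecting the output to a file (for example, `render_training_job_manifest ... > training-job.yaml`) produces a manifest that can then be submitted with `kubectl apply -f training-job.yaml`.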

### 權限概觀
<a name="kubernetes-sagemaker-operators-authentication"></a>

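The operators access SageMaker AI resources on your behalf. The IAM role that the operator assumes to interact with AWS resources is different from the credentials that you use to access the Kubernetes cluster. The role is also different from the role that AWS assumes when it runs your machine learning jobs.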

下列影像說明了各種驗證層。

![\[SageMaker AI Operator for Kubernetes 的各種驗證層。\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/k8s-orchestration/sagemaker-operators-for-kubernetes-authentication.png)


# 最新 SageMaker AI Operators for Kubernetes
<a name="kubernetes-sagemaker-operators-ack"></a>

本節以使用 AWS Controllers for Kubernetes (ACK) 的最新版本 SageMaker AI Operators for Kubernetes 為基礎。

**重要**  
如果您目前使用的是 [SageMaker Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s/tree/master) 的 `v1.2.2` 或以下版本，我們建議您將資源遷移到 [Amazon SageMaker 的 ACK 服務控制器](https://github.com/aws-controllers-k8s/sagemaker-controller)。ACK 服務控制器是新一代的 SageMaker Operators for Kubernetes，以 [AWS Controllers for Kubernetes (ACK)](https://aws-controllers-k8s.github.io/community/) 為基礎。  
如需與移轉步驟相關的資訊，請參閱[將資源遷移到最新的運算子](kubernetes-sagemaker-operators-migrate.md)。  
如需與終止支援 SageMaker Operators for Kubernetes 原始版本相關的常見問題的答案，請參閱[宣布終止支援 SageMaker AI Operators for Kubernetes 原始版本](kubernetes-sagemaker-operators-eos-announcement.md)

The latest version of [SageMaker AI Operators for Kubernetes](https://github.com/aws-controllers-k8s/sagemaker-controller) is based on [AWS Controllers for Kubernetes (ACK)](https://aws-controllers-k8s.github.io/community/), a framework for building Kubernetes custom controllers in which each controller communicates with an AWS service API. These controllers let Kubernetes users provision AWS resources, such as databases or message queues, by using the Kubernetes API.

使用下列步驟安裝和使用 ACK，透過 Amazon SageMaker AI 訓練、調整和部署機器學習模型。

**Topics**
+ [安裝 SageMaker AI Operators for Kubernetes](#kubernetes-sagemaker-operators-ack-install)
+ [使用 SageMaker AI Operators for Kubernetes](#kubernetes-sagemaker-operators-ack-use)
+ [參考資料](#kubernetes-sagemaker-operators-ack-reference)

## 安裝 SageMaker AI Operators for Kubernetes
<a name="kubernetes-sagemaker-operators-ack-install"></a>

若要設定 SageMaker AI Operators for Kubernetes 的最新可用版本，請參閱[使用 ACK SageMaker AI 控制器進行機器學習](https://aws-controllers-k8s.github.io/community/docs/tutorials/sagemaker-example/#setup)中的*設定*章節。

## 使用 SageMaker AI Operators for Kubernetes
<a name="kubernetes-sagemaker-operators-ack-use"></a>

如需有關如何透過 Amazon EKS 使用適用於 Amazon SageMaker AI 的 ACK 服務控制器訓練機器學習模型的教學課程，請參閱[使用 ACK SageMaker AI 控制器進行機器學習](https://aws-controllers-k8s.github.io/community/docs/tutorials/sagemaker-example/)。

如需自動擴展範例，請參閱[使用應用程式自動擴展來擴展 SageMaker AI 工作負載](https://aws-controllers-k8s.github.io/community/docs/tutorials/autoscaling-example/)

## 參考資料
<a name="kubernetes-sagemaker-operators-ack-reference"></a>

另請參閱[適用於 Amazon SageMaker AI 的 ACK 服務控制器 GitHub 儲存庫](https://github.com/aws-controllers-k8s/sagemaker-controller)或閱讀[適用於 Kubernetes 的AWS 控制器文件](https://aws-controllers-k8s.github.io/community/docs/community/overview/)。

# 舊版 SageMaker AI Operators for Kubernetes
<a name="kubernetes-sagemaker-operators-end-of-support"></a>

本節以 [SageMaker AI Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s) 的原始版本為基礎。

**重要**  
我們正在停止對 [SageMaker Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s/tree/master) 原始版本的開發和技術支援。  
如果您目前使用的是 [SageMaker Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s/tree/master) 的 `v1.2.2` 或以下版本，我們建議您將資源遷移到 [Amazon SageMaker 的 ACK 服務控制器](https://github.com/aws-controllers-k8s/sagemaker-controller)。ACK 服務控制器是新一代的 SageMaker Operators for Kubernetes，以 [AWS Controllers for Kubernetes (ACK)](https://aws-controllers-k8s.github.io/community/) 為基礎。  
如需與移轉步驟相關的資訊，請參閱[將資源遷移到最新的運算子](kubernetes-sagemaker-operators-migrate.md)。  
如需與終止支援 SageMaker Operators for Kubernetes 原始版本相關的常見問題的答案，請參閱[宣布終止支援 SageMaker AI Operators for Kubernetes 原始版本](kubernetes-sagemaker-operators-eos-announcement.md)

**Topics**
+ [安裝 SageMaker AI Operators for Kubernetes](#kubernetes-sagemaker-operators-eos-install)
+ [使用 Amazon SageMaker AI 任務](kubernetes-sagemaker-jobs.md)
+ [將資源遷移到最新的運算子](kubernetes-sagemaker-operators-migrate.md)
+ [宣布終止支援 SageMaker AI Operators for Kubernetes 原始版本](kubernetes-sagemaker-operators-eos-announcement.md)

## 安裝 SageMaker AI Operators for Kubernetes
<a name="kubernetes-sagemaker-operators-eos-install"></a>

使用下列步驟安裝和使用 SageMaker AI Operators for Kubernetes，透過 Amazon SageMaker AI 訓練、調整和部署機器學習模型。

**Topics**
+ [IAM 角色型設定和運算子部署](#iam-role-based-setup-and-operator-deployment)
+ [清除資源](#cleanup-operator-resources)
+ [刪除運算子](#delete-operators)
+ [故障診斷](#troubleshooting)
+ [每個區域的映像和 SMLog](#images-and-smlogs-in-each-region)

### IAM 角色型設定和運算子部署
<a name="iam-role-based-setup-and-operator-deployment"></a>

以下各節說明設定和部署原始版本運算子的步驟。

**警告**  
**提醒：**下列步驟不會安裝 SageMaker AI Operators for Kubernetes 的最新版本。若要安裝新 ACK 型 SageMaker AI Operators for Kubernetes，請參閱[最新 SageMaker AI Operators for Kubernetes](kubernetes-sagemaker-operators-ack.md)。

#### 先決條件
<a name="prerequisites"></a>

本指南假設已完成下列先決條件：
+ 在用來存取 Kubernetes 叢集的用戶端電腦上安裝下列工具：
  + [kubectl](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html) version 1.13 or later. Use a `kubectl` version that is within one minor version of your Amazon EKS cluster control plane. For example, a 1.13 `kubectl` client works with Kubernetes 1.13 and 1.14 clusters. Versions earlier than 1.13 do not support OpenID Connect (OIDC).
  + [eksctl](https://github.com/weaveworks/eksctl) version 0.7.0 or later 
  + [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv1.html) 1.16.232 版或更新版本 
  + (可選) [Helm](https://helm.sh/docs/intro/install/) 3.0 版本或更新版本 
  + [aws-iam-authenticator](https://docs.aws.amazon.com/eks/latest/userguide/install-aws-iam-authenticator.html) 
+ 擁有建立角色並將政策附加至角色的 IAM 許可。
+ 已建立要在其上執行運算子的 Kubernetes 叢集。應該是 Kubernetes 版本 1.13 或 1.14。對於使用 `eksctl` 自動建立的叢集，請參閱[eksctl 入門](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html)。佈建叢集需要 20–30 分鐘才能完成。
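
The "within one minor version" rule for `kubectl` can be checked mechanically. The following is a minimal sketch, assuming both versions are given as plain `major.minor` or `major.minor.patch` strings; `within_one_minor` is a hypothetical helper name, not part of any AWS or Kubernetes tool.

```shell
# Succeed when two versions share the same major version and their minor
# versions differ by at most one. Inputs must be numeric, such as 1.13 or 1.14.2.
within_one_minor() (
  client=$1
  server=$2
  client_major=${client%%.*}
  server_major=${server%%.*}
  client_rest=${client#*.}
  server_rest=${server#*.}
  client_minor=${client_rest%%.*}
  server_minor=${server_rest%%.*}
  [ "$client_major" = "$server_major" ] || exit 1
  diff=$((client_minor - server_minor))
  [ "$diff" -ge -1 ] && [ "$diff" -le 1 ]
)

within_one_minor 1.13 1.14 && echo "kubectl and cluster versions are compatible"
```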

#### 叢集範圍部署
<a name="cluster-scoped-deployment"></a>

使用 IAM 角色部署運算子之前，請先將 OpenID Connect (OIDC) 身分提供者 (IdP) 與您的角色建立關聯，以便透過 IAM 服務進行驗證。

##### 為您的叢集建立 OIDC 身分提供者
<a name="create-an-openid-connect-provider-for-your-cluster"></a>

下列指示展示如何建立 OIDC 提供者，並將其與您的 Amazon EKS 叢集相關聯。

1. 設定本機 `CLUSTER_NAME` 和 `AWS_REGION` 環境變數，如下所示：

   ```
   # Set the Region and cluster
   export CLUSTER_NAME="<your cluster name>"
   export AWS_REGION="<your region>"
   ```

1. 使用下列命令將 OIDC 提供者與叢集相關聯。如需詳細資訊，請參閱[為叢集上的服務帳戶啟用 IAM 角色](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html)。

   ```
   eksctl utils associate-iam-oidc-provider --cluster ${CLUSTER_NAME} \
         --region ${AWS_REGION} --approve
   ```

   您的輸出看起來應如以下所示：

   ```
   [_]  eksctl version 0.10.1
   [_]  using region us-east-1
   [_]  IAM OpenID Connect provider is associated with cluster "my-cluster" in "us-east-1"
   ```

現在叢集具有 OIDC 身分提供者，您可以建立角色，並授與 Kubernetes 服務帳戶許可來擔任該角色。

##### 取得 OIDC ID
<a name="get-the-oidc-id"></a>

若要設定服務帳戶，請使用下列命令取得 OIDC 發行者 URL：

```
aws eks describe-cluster --name ${CLUSTER_NAME} --region ${AWS_REGION} \
      --query cluster.identity.oidc.issuer --output text
```

此命令會傳回類似以下內容的 URL：

```
https://oidc.eks.${AWS_REGION}.amazonaws.com/id/D48675832CA65BD10A532F597OIDCID
```

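In this URL, the value `D48675832CA65BD10A532F597OIDCID` is the OIDC ID. The OIDC ID for your cluster will be different. You need this OIDC ID value to create a role.

If your output is `None`, your client version is too old to return this field. To work around this, run the following command: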

```
aws eks describe-cluster --region ${AWS_REGION} --query cluster --name ${CLUSTER_NAME} --output text | grep OIDC
```

傳回的 OIDC URL 如下：

```
OIDC https://oidc.eks.us-east-1.amazonaws.com/id/D48675832CA65BD10A532F597OIDCID
```
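
Because the OIDC ID is the last path segment of the issuer URL, it can be extracted with shell parameter expansion rather than copied by hand. The helper names below are hypothetical, and the URL is the example value from this section rather than the output of a real cluster.

```shell
# Print the OIDC ID, which is the last path segment of the issuer URL.
oidc_id_from_issuer() {
  printf '%s\n' "${1##*/}"
}

# Print the provider path (the issuer URL without the https:// scheme),
# which is the form that appears in the IAM trust policy.
oidc_provider_from_issuer() {
  printf '%s\n' "${1#https://}"
}

# Example value; in practice this would come from `aws eks describe-cluster`.
ISSUER="https://oidc.eks.us-east-1.amazonaws.com/id/D48675832CA65BD10A532F597OIDCID"
echo "OIDC ID: $(oidc_id_from_issuer "$ISSUER")"
```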

##### 建立 IAM 角色
<a name="create-an-iam-role"></a>

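1. Create a file named `trust.json` and insert the following trust relationship code block into it. Be sure to replace all `<OIDC ID>` and `<EKS Cluster region>` placeholders, and the example account number `111122223333`, with the values that correspond to your cluster and AWS account.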

   For the `aws` partition:

   ```
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>"
         },
         "Action": "sts:AssumeRoleWithWebIdentity",
         "Condition": {
           "StringEquals": {
             "oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>:aud": "sts.amazonaws.com",
             "oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>:sub": "system:serviceaccount:sagemaker-k8s-operator-system:sagemaker-k8s-operator-default"
           }
         }
       }
     ]
   }
   ```

   For the `aws-cn` partition (China Regions):

   ```
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "Federated": "arn:aws-cn:iam::111122223333:oidc-provider/oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>"
         },
         "Action": "sts:AssumeRoleWithWebIdentity",
         "Condition": {
           "StringEquals": {
             "oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>:aud": "sts.amazonaws.com",
             "oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>:sub": "system:serviceaccount:sagemaker-k8s-operator-system:sagemaker-k8s-operator-default"
           }
         }
       }
     ]
   }
   ```

1. 執行下列命令，以建立具有 `trust.json` 中定義的信任關係的角色。此角色可讓 Amazon EKS 叢集從 IAM 取得和重新整理登入資料。

   ```
   aws iam create-role --region ${AWS_REGION} --role-name <role name> --assume-role-policy-document file://trust.json --output=text
   ```

   您的輸出看起來應如以下所示：

   ```
   ROLE    arn:aws:iam::123456789012:role/my-role 2019-11-22T21:46:10Z    /       ABCDEFSFODNN7EXAMPLE   my-role
   ASSUMEROLEPOLICYDOCUMENT        2012-10-17
   STATEMENT       sts:AssumeRoleWithWebIdentity   Allow
   STRINGEQUALS    sts.amazonaws.com       system:serviceaccount:sagemaker-k8s-operator-system:sagemaker-k8s-operator-default
   PRINCIPAL       arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/
   ```

    請注意 `ROLE ARN`；您會將此值傳遞給運算子。
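
Filling in the `trust.json` placeholders by hand is error-prone because the same provider path appears three times. The following sketch renders the policy from shell variables instead. `render_trust_policy` is a hypothetical helper, and all argument values shown are examples; the optional fourth argument sets the service account namespace, and the optional fifth selects the ARN partition (`aws` or `aws-cn`).

```shell
# Print an IAM trust policy for the operator's Kubernetes service account.
# Arguments: account ID, Region, OIDC ID, [namespace], [partition].
render_trust_policy() (
  account_id=$1
  region=$2
  oidc_id=$3
  namespace=${4:-sagemaker-k8s-operator-system}
  partition=${5:-aws}
  provider="oidc.eks.${region}.amazonaws.com/id/${oidc_id}"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:${partition}:iam::${account_id}:oidc-provider/${provider}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${provider}:aud": "sts.amazonaws.com",
          "${provider}:sub": "system:serviceaccount:${namespace}:sagemaker-k8s-operator-default"
        }
      }
    }
  ]
}
EOF
)
```

For example, `render_trust_policy 111122223333 us-east-1 D48675832CA65BD10A532F597OIDCID > trust.json` writes the cluster-scoped policy, and passing `my-namespace` as a fourth argument produces the variant used for a namespace-scoped deployment.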

##### 將 AmazonSageMakerFullAccess 政策附加到此角色
<a name="attach-the-amazonsagemakerfullaccess-policy-to-the-role"></a>

若要授予角色 SageMaker AI 的存取權，請附加 [AmazonSageMakerFullAccess](https://console.aws.amazon.com/iam/home?#/policies/arn:aws:iam::aws:policy/AmazonSageMakerFullAccess) 政策。如果想要限制運算子的權限，您可以建立並附加自訂政策。

 若要附加 `AmazonSageMakerFullAccess`，請執行下列命令：

```
aws iam attach-role-policy --role-name <role name>  --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
```

Kubernetes 服務帳戶 `sagemaker-k8s-operator-default` 應該具有 `AmazonSageMakerFullAccess` 許可。當您安裝運算子時，請確認這一點。

##### 部署運算子
<a name="deploy-the-operator"></a>

部署運算子時，您可以使用 YAML 檔案或 Helm Chart。

##### 使用 YAML 部署運算子
<a name="deploy-the-operator-using-yaml"></a>

這是部署運算子的最簡單方法。程序如下：

1. 使用以下命令下載安裝程式指令碼：

   ```
   wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/release/rolebased/installer.yaml
   ```

1. 編輯 `installer.yaml` 檔案以取代 `eks.amazonaws.com/role-arn`。將此處的 ARN 取代為您建立的 OIDC 型角色的 Amazon Resource Name (ARN)。

1. Deploy the operator to the cluster by using the following command:

   ```
   kubectl apply -f installer.yaml
   ```
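
The manual edit of `installer.yaml` can also be scripted. This is a minimal sketch, assuming the annotation appears in the file as a plain `eks.amazonaws.com/role-arn: <value>` line; the exact layout of `installer.yaml` is not shown in this guide, so inspect the result before applying it. `set_role_arn` is a hypothetical helper name.

```shell
# Rewrite the value of every `eks.amazonaws.com/role-arn:` line in a manifest
# read from stdin. The ARN must not contain `|`, which is the sed delimiter.
set_role_arn() {
  sed -e "s|\(eks\.amazonaws\.com/role-arn:\).*|\1 $1|"
}

# Demonstration on a sample line; the ARNs are placeholders.
printf '    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/old-role\n' |
  set_role_arn "arn:aws:iam::111122223333:role/my-role"
```

Against the real file, the same function would be used as `set_role_arn "<role ARN>" < installer.yaml > installer.patched.yaml`, which leaves the downloaded file untouched.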

##### 使用 Helm Chart 部署運算子
<a name="deploy-the-operator-using-helm-charts"></a>

使用提供的 Helm Chart 安裝運算子。

1. 使用以下命令複製 Helm 安裝程式目錄：

   ```
   git clone https://github.com/aws/amazon-sagemaker-operator-for-k8s.git
   ```

1. 導覽至 `amazon-sagemaker-operator-for-k8s/hack/charts/installer` 資料夾。編輯 `rolebased/values.yaml` 檔案，其中包含圖表的高階參數。將此處的角色 ARN 取代為您建立的 OIDC 型角色的 Amazon Resource Name (ARN)。

1. 使用以下命令安裝 Helm Chart：

   ```
   kubectl create namespace sagemaker-k8s-operator-system
   helm install --namespace sagemaker-k8s-operator-system sagemaker-operator rolebased/
   ```

   如果您決定將運算子安裝到指定命名空間以外的其他命名空間，則需要調整 IAM 角色 `trust.json` 檔案中定義的命名空間，以確保命名空間相符。

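1. After a moment, the chart is installed with the release name `sagemaker-operator`. Because Helm 3 lists only the releases in the current namespace by default, specify the operator namespace when you verify that the installation succeeded:

   ```
   helm ls --namespace sagemaker-k8s-operator-system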
   ```

   您的輸出看起來應如以下所示：

   ```
   NAME                    NAMESPACE                       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
   sagemaker-operator      sagemaker-k8s-operator-system   1               2019-11-20 23:14:59.6777082 +0000 UTC   deployed        sagemaker-k8s-operator-0.1.0
   ```

##### 驗證運算子部署
<a name="verify-the-operator-deployment"></a>

1. 您應該可以透過執行下列命令，查看部署至叢集的每個運算子的 SageMaker AI 自訂資源定義 (CRD)：

   ```
   kubectl get crd | grep sagemaker
   ```

   您的輸出看起來應如以下所示：

   ```
   batchtransformjobs.sagemaker.aws.amazon.com         2019-11-20T17:12:34Z
   endpointconfigs.sagemaker.aws.amazon.com            2019-11-20T17:12:34Z
   hostingdeployments.sagemaker.aws.amazon.com         2019-11-20T17:12:34Z
   hyperparametertuningjobs.sagemaker.aws.amazon.com   2019-11-20T17:12:34Z
   models.sagemaker.aws.amazon.com                     2019-11-20T17:12:34Z
   trainingjobs.sagemaker.aws.amazon.com               2019-11-20T17:12:34Z
   ```

1. 確定運算子 pod 已成功執行。使用下列命令列出所有 pod：

   ```
   kubectl -n sagemaker-k8s-operator-system get pods
   ```

   您應該會看到命名空間 `sagemaker-k8s-operator-system` 中名為 `sagemaker-k8s-operator-controller-manager-*****` 的 pod，如下所示：

   ```
   NAME                                                         READY   STATUS    RESTARTS   AGE
   sagemaker-k8s-operator-controller-manager-12345678-r8abc     2/2     Running   0          23s
   ```
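
Instead of comparing the CRD listing by eye, the check can be automated. The sketch below reads the output of `kubectl get crd` on standard input and reports any of the six CRDs listed above that is absent; `missing_sagemaker_crds` is a hypothetical helper name.

```shell
# The six CRD name prefixes shown in the expected output above.
EXPECTED_CRDS="batchtransformjobs endpointconfigs hostingdeployments hyperparametertuningjobs models trainingjobs"

# Read a CRD listing on stdin and print each expected CRD that is missing.
# Returns non-zero when at least one CRD is missing.
missing_sagemaker_crds() (
  listing=$(cat)
  status=0
  for kind in $EXPECTED_CRDS; do
    if ! printf '%s\n' "$listing" | grep -q "^${kind}\.sagemaker\.aws\.amazon\.com"; then
      echo "missing: ${kind}.sagemaker.aws.amazon.com"
      status=1
    fi
  done
  exit "$status"
)
```

A typical invocation would be `kubectl get crd | missing_sagemaker_crds && echo "all SageMaker AI CRDs are installed"`.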

#### 命名空間範圍部署
<a name="namespace-scoped-deployment"></a>

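You can optionally install the operator within the scope of an individual Kubernetes namespace. In this mode, the controller monitors and reconciles resources with SageMaker AI only if the resources are created within that namespace. This gives you finer-grained control over which controller manages which resources. It is useful for deploying to multiple AWS accounts or for controlling which users have access to particular jobs.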

本指南概述如何將運算子安裝到預先定義的特定命名空間。若要將控制器部署到第二個命名空間，請從頭到尾遵循指南進行操作，並在每個步驟中變更命名空間。

##### 為您的 Amazon EKS 叢集建立 OIDC 身分提供者
<a name="create-an-openid-connect-provider-for-your-eks-cluster"></a>

下列指示展示如何建立 OIDC 提供者，並將其與您的 Amazon EKS 叢集相關聯。

1. 設定本機 `CLUSTER_NAME` 和 `AWS_REGION` 環境變數，如下所示：

   ```
   # Set the Region and cluster
   export CLUSTER_NAME="<your cluster name>"
   export AWS_REGION="<your region>"
   ```

1. 使用下列命令將 OIDC 提供者與叢集相關聯。如需詳細資訊，請參閱[為叢集上的服務帳戶啟用 IAM 角色](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html)。

   ```
   eksctl utils associate-iam-oidc-provider --cluster ${CLUSTER_NAME} \
         --region ${AWS_REGION} --approve
   ```

   您的輸出看起來應如以下所示：

   ```
   [_]  eksctl version 0.10.1
   [_]  using region us-east-1
   [_]  IAM OpenID Connect provider is associated with cluster "my-cluster" in "us-east-1"
   ```

現在叢集具有 OIDC 身分提供者，接下來建立角色，並授與 Kubernetes 服務帳戶許可來擔任該角色。

##### 取得 OIDC ID
<a name="get-your-oidc-id"></a>

若要設定服務帳戶，首先請使用下列命令取得 OpenID Connect 發行者 URL：

```
aws eks describe-cluster --name ${CLUSTER_NAME} --region ${AWS_REGION} \
      --query cluster.identity.oidc.issuer --output text
```

此命令會傳回類似以下內容的 URL：

```
https://oidc.eks.${AWS_REGION}.amazonaws.com/id/D48675832CA65BD10A532F597OIDCID
```

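In this URL, the value `D48675832CA65BD10A532F597OIDCID` is the OIDC ID. The OIDC ID for your cluster will be different. You need this OIDC ID value to create a role.

If your output is `None`, your client version is too old to return this field. To work around this, run the following command: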

```
aws eks describe-cluster --region ${AWS_REGION} --query cluster --name ${CLUSTER_NAME} --output text | grep OIDC
```

傳回的 OIDC URL 如下：

```
OIDC https://oidc.eks.us-east-1.amazonaws.com/id/D48675832CA65BD10A532F597OIDCID
```

##### 建立 IAM 角色
<a name="create-your-iam-role"></a>

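1. Create a file named `trust.json` and insert the following trust relationship code block into it. Be sure to replace all `<OIDC ID>`, `<EKS Cluster region>`, and `<Namespace>` placeholders, and the example account number `111122223333`, with the values that correspond to your cluster and AWS account. This guide uses `my-namespace` as the value for `<Namespace>`.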

   For the `aws` partition:

   ```
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>"
         },
         "Action": "sts:AssumeRoleWithWebIdentity",
         "Condition": {
           "StringEquals": {
             "oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>:aud": "sts.amazonaws.com",
             "oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>:sub": "system:serviceaccount:<Namespace>:sagemaker-k8s-operator-default"
           }
         }
       }
     ]
   }
   ```

   For the `aws-cn` partition (China Regions):

   ```
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "Federated": "arn:aws-cn:iam::111122223333:oidc-provider/oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>"
         },
         "Action": "sts:AssumeRoleWithWebIdentity",
         "Condition": {
           "StringEquals": {
             "oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>:aud": "sts.amazonaws.com",
             "oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>:sub": "system:serviceaccount:<Namespace>:sagemaker-k8s-operator-default"
           }
         }
       }
     ]
   }
   ```

1. 執行下列命令，以建立具有 `trust.json` 中定義的信任關係的角色。此角色可讓 Amazon EKS 叢集從 IAM 取得和重新整理登入資料。

   ```
   aws iam create-role --region ${AWS_REGION} --role-name <role name> --assume-role-policy-document file://trust.json --output=text
   ```

   您的輸出看起來應如以下所示：

   ```
   ROLE    arn:aws:iam::123456789012:role/my-role 2019-11-22T21:46:10Z    /       ABCDEFSFODNN7EXAMPLE   my-role
   ASSUMEROLEPOLICYDOCUMENT        2012-10-17
   STATEMENT       sts:AssumeRoleWithWebIdentity   Allow
   STRINGEQUALS    sts.amazonaws.com       system:serviceaccount:my-namespace:sagemaker-k8s-operator-default
   PRINCIPAL       arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/
   ```

請注意 `ROLE ARN`。您會將此值傳遞給運算子。

##### 將 AmazonSageMakerFullAccess 政策附加到您的角色
<a name="attach-the-amazonsagemakerfullaccess-policy-to-your-role"></a>

To give the role access to SageMaker AI, attach the [AmazonSageMakerFullAccess](https://console.aws.amazon.com/iam/home?#/policies/arn:aws:iam::aws:policy/AmazonSageMakerFullAccess) policy. If you want to limit the permissions of the operator, you can create and attach a custom policy instead.

 若要附加 `AmazonSageMakerFullAccess`，請執行下列命令：

```
aws iam attach-role-policy --role-name <role name>  --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
```

Kubernetes 服務帳戶 `sagemaker-k8s-operator-default` 應該具有 `AmazonSageMakerFullAccess` 許可。當您安裝運算子時，請確認這一點。

##### 將運算子部署到命名空間
<a name="deploy-the-operator-to-your-namespace"></a>

部署運算子時，您可以使用 YAML 檔案或 Helm Chart。

##### 使用 YAML 將運算子部署到命名空間
<a name="deploy-the-operator-to-your-namespace-using-yaml"></a>

在命名空間範圍內部署運算子分為兩個部分。第一部分是在叢集層級安裝的一組 CRD。每個 Kubernetes 叢集只需安裝一次這些資源定義。第二部分是運算子許可和部署本身。

 如果您尚未將 CRD 安裝到叢集中，請使用下列命令套用 CRD 安裝程式 YAML：

```
kubectl apply -f https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/release/rolebased/namespaced/crd.yaml
```

將運算子安裝到叢集：

1. 使用以下命令下載運算子安裝程式：

   ```
   wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/release/rolebased/namespaced/operator.yaml
   ```

1. 使用下列命令更新安裝程式 YAML，以將資源放入指定的命名空間中：

   ```
   sed -i -e 's/PLACEHOLDER-NAMESPACE/<YOUR NAMESPACE>/g' operator.yaml
   ```

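1. Edit the `operator.yaml` file to replace the `eks.amazonaws.com/role-arn` value. Replace the ARN there with the Amazon Resource Name (ARN) of the OIDC-based role that you created.

1. Deploy the operator to the cluster by using the following command: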

   ```
   kubectl apply -f operator.yaml
   ```
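
The `sed` substitution in step 2 inserts the namespace into the manifest verbatim, so a mistyped name (for example, one that contains `/` or uppercase letters) either breaks the `sed` expression or yields a manifest that Kubernetes rejects. The following sketch validates the name first. Both helper names are hypothetical; the validation rule is the standard Kubernetes requirement that a namespace name be an RFC 1123 DNS label.

```shell
# Succeed only for a valid namespace name: lowercase alphanumerics or '-',
# alphanumeric at both ends, and at most 63 characters.
is_valid_namespace() {
  [ "${#1}" -le 63 ] &&
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
}

# Replace PLACEHOLDER-NAMESPACE in a manifest read from stdin.
set_namespace() {
  is_valid_namespace "$1" || { echo "invalid namespace: $1" >&2; return 1; }
  sed -e "s/PLACEHOLDER-NAMESPACE/$1/g"
}

# Demonstration on a sample line.
printf 'namespace: PLACEHOLDER-NAMESPACE\n' | set_namespace my-namespace
```

Writing to a new file, as in `set_namespace my-namespace < operator.yaml > operator.patched.yaml`, also avoids `sed -i`, whose syntax differs between GNU and BSD `sed`.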

##### 使用 Helm Chart 將運算子部署到命名空間
<a name="deploy-the-operator-to-your-namespace-using-helm-charts"></a>

在命名空間範圍內部署運算子需要分為兩個部分。第一部分是在叢集層級安裝的一組 CRD。每個 Kubernetes 叢集只需安裝一次這些資源定義。第二部分是運算子許可和部署本身。使用 Helm Chart 時，您必須首先使用 `kubectl` 建立命名空間。

1. 使用以下命令複製 Helm 安裝程式目錄：

   ```
   git clone https://github.com/aws/amazon-sagemaker-operator-for-k8s.git
   ```

1. 導覽至 `amazon-sagemaker-operator-for-k8s/hack/charts/installer/namespaced` 資料夾。編輯 `rolebased/values.yaml` 檔案，其中包含圖表的高階參數。將此處的角色 ARN 取代為您建立的 OIDC 型角色的 Amazon Resource Name (ARN)。

1. 使用以下命令安裝 Helm Chart：

   ```
   helm install crds crd_chart/
   ```

1. 建立必要的命名空間並使用下列命令安裝運算子：

   ```
   kubectl create namespace <namespace>
   helm install --namespace <namespace> op operator_chart/
   ```

1. 片刻之後，系統會以 `sagemaker-operator` 名稱安裝圖表。執行下列命令來驗證是否安裝成功：

   ```
   helm ls --namespace <namespace>
   ```

   您的輸出看起來應如以下所示：

   ```
   NAME                    NAMESPACE                       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
   sagemaker-operator      my-namespace                    1               2019-11-20 23:14:59.6777082 +0000 UTC   deployed        sagemaker-k8s-operator-0.1.0
   ```

##### 驗證運算子已部署到命名空間
<a name="verify-the-operator-deployment-to-your-namespace"></a>

1. 您應該可以透過執行下列命令，查看部署至叢集的每個運算子的 SageMaker AI 自訂資源定義 (CRD)：

   ```
   kubectl get crd | grep sagemaker
   ```

   您的輸出看起來應如以下所示：

   ```
   batchtransformjobs.sagemaker.aws.amazon.com         2019-11-20T17:12:34Z
   endpointconfigs.sagemaker.aws.amazon.com            2019-11-20T17:12:34Z
   hostingdeployments.sagemaker.aws.amazon.com         2019-11-20T17:12:34Z
   hyperparametertuningjobs.sagemaker.aws.amazon.com   2019-11-20T17:12:34Z
   models.sagemaker.aws.amazon.com                     2019-11-20T17:12:34Z
   trainingjobs.sagemaker.aws.amazon.com               2019-11-20T17:12:34Z
   ```

1. 確定運算子 pod 已成功執行。使用下列命令列出所有 pod：

   ```
   kubectl -n my-namespace get pods
   ```

   您應該會看到命名空間 `my-namespace` 中名為 `sagemaker-k8s-operator-controller-manager-*****` 的 pod，如下所示：

   ```
   NAME                                                         READY   STATUS    RESTARTS   AGE
   sagemaker-k8s-operator-controller-manager-12345678-r8abc     2/2     Running   0          23s
   ```

#### 安裝 SageMaker AI 日誌 `kubectl` 外掛程式
<a name="install-the-amazon-sagemaker-logs-kubectl-plugin"></a>

 作為 SageMaker AI Operators for Kubernetes 的一部分，您可以針對 `kubectl` 使用 `smlogs` [外掛程式](https://kubernetes.io/docs/tasks/extend-kubectl/kubectl-plugins/)。這允許使用 `kubectl` 串流 SageMaker AI CloudWatch 日誌。`kubectl` 必須安裝到您的 [PATH](http://www.linfo.org/path_env_var.html) 上。以下命令將二進位文件放置在主目錄中的 `sagemaker-k8s-bin` 目錄中，並將該目錄新增到您的 `PATH`。

```
export os="linux"
  
wget https://amazon-sagemaker-operator-for-k8s-us-east-1.s3.amazonaws.com/kubectl-smlogs-plugin/v1/${os}.amd64.tar.gz
tar xvzf ${os}.amd64.tar.gz
  
# Move binaries to a directory in your homedir.
mkdir -p ~/sagemaker-k8s-bin
cp ./kubectl-smlogs.${os}.amd64/kubectl-smlogs ~/sagemaker-k8s-bin/.
  
# This line adds the binaries to your PATH in your .bashrc.
  
echo 'export PATH=$PATH:~/sagemaker-k8s-bin' >> ~/.bashrc
  
# Source your .bashrc to update environment variables:
source ~/.bashrc
```

請使用下列命令驗證 `kubectl` 外掛程式的安裝是否正確：

```
kubectl smlogs
```

如果 `kubectl` 外掛程式已正確安裝，則輸出應與以下類似：

```
View SageMaker AI logs via Kubernetes
  
Usage:
  smlogs [command]
  
Aliases:
  smlogs, SMLogs, Smlogs
  
Available Commands:
  BatchTransformJob       View BatchTransformJob logs via Kubernetes
  TrainingJob             View TrainingJob logs via Kubernetes
  help                    Help about any command
  
Flags:
   -h, --help   help for smlogs
  
Use "smlogs [command] --help" for more information about a command.
```

### 清除資源
<a name="cleanup-operator-resources"></a>

若要從叢集中解除安裝運算子，您必須先確定從叢集中刪除所有 SageMaker AI 資源。不這樣做會導致運算子刪除操作掛起。執行下列命令來停止所有工作：

```
# Delete all SageMaker AI jobs from Kubernetes
kubectl delete --all --all-namespaces hyperparametertuningjob.sagemaker.aws.amazon.com
kubectl delete --all --all-namespaces trainingjobs.sagemaker.aws.amazon.com
kubectl delete --all --all-namespaces batchtransformjob.sagemaker.aws.amazon.com
kubectl delete --all --all-namespaces hostingdeployment.sagemaker.aws.amazon.com
```

您應該會看到類似下列的輸出：

```
$ kubectl delete --all --all-namespaces trainingjobs.sagemaker.aws.amazon.com
trainingjobs.sagemaker.aws.amazon.com "xgboost-mnist-from-for-s3" deleted
  
$ kubectl delete --all --all-namespaces hyperparametertuningjob.sagemaker.aws.amazon.com
hyperparametertuningjob.sagemaker.aws.amazon.com "xgboost-mnist-hpo" deleted
  
$ kubectl delete --all --all-namespaces batchtransformjob.sagemaker.aws.amazon.com
batchtransformjob.sagemaker.aws.amazon.com "xgboost-mnist" deleted
  
$ kubectl delete --all --all-namespaces hostingdeployment.sagemaker.aws.amazon.com
hostingdeployment.sagemaker.aws.amazon.com "host-xgboost" deleted
```

刪除所有 SageMaker AI 任務後，請參閱[刪除運算子](#delete-operators)，以從叢集中刪除運算子。
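
Because the four delete commands differ only in the resource kind, they can be generated from a single list. The following sketch only prints the commands so that they can be reviewed before anything is deleted; the variable and function names are hypothetical.

```shell
# Resource kinds to clean up, in the same order as the commands above.
SAGEMAKER_JOB_KINDS="hyperparametertuningjob trainingjobs batchtransformjob hostingdeployment"

# Print one `kubectl delete` command per resource kind. Nothing is executed.
print_cleanup_commands() {
  for kind in $SAGEMAKER_JOB_KINDS; do
    echo "kubectl delete --all --all-namespaces ${kind}.sagemaker.aws.amazon.com"
  done
}

print_cleanup_commands
```

After reviewing the printed commands, piping them to a shell (`print_cleanup_commands | sh`) runs them against the current `kubectl` context.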

### 刪除運算子
<a name="delete-operators"></a>

#### 刪除叢集型運算子
<a name="delete-cluster-based-operators"></a>

##### 使用 YAML 安裝的運算子
<a name="operators-installed-using-yaml"></a>

若要從叢集中解除安裝運算子，請確保已從叢集中刪除所有 SageMaker AI 資源。不這樣做會導致運算子刪除操作掛起。

**注意**  
刪除叢集之前，請務必刪除叢集中的所有 SageMaker AI 資源。如需詳細資訊，請參閱[清除資源](#cleanup-operator-resources)。

刪除所有 SageMaker AI 任務後，請使用 `kubectl`，以從叢集中刪除運算子：

```
# Delete the operator and its resources
kubectl delete -f installer.yaml
```

您應該會看到類似下列的輸出：

```
$ kubectl delete -f raw-yaml/installer.yaml
namespace "sagemaker-k8s-operator-system" deleted
customresourcedefinition.apiextensions.k8s.io "batchtransformjobs.sagemaker.aws.amazon.com" deleted
customresourcedefinition.apiextensions.k8s.io "endpointconfigs.sagemaker.aws.amazon.com" deleted
customresourcedefinition.apiextensions.k8s.io "hostingdeployments.sagemaker.aws.amazon.com" deleted
customresourcedefinition.apiextensions.k8s.io "hyperparametertuningjobs.sagemaker.aws.amazon.com" deleted
customresourcedefinition.apiextensions.k8s.io "models.sagemaker.aws.amazon.com" deleted
customresourcedefinition.apiextensions.k8s.io "trainingjobs.sagemaker.aws.amazon.com" deleted
role.rbac.authorization.k8s.io "sagemaker-k8s-operator-leader-election-role" deleted
clusterrole.rbac.authorization.k8s.io "sagemaker-k8s-operator-manager-role" deleted
clusterrole.rbac.authorization.k8s.io "sagemaker-k8s-operator-proxy-role" deleted
rolebinding.rbac.authorization.k8s.io "sagemaker-k8s-operator-leader-election-rolebinding" deleted
clusterrolebinding.rbac.authorization.k8s.io "sagemaker-k8s-operator-manager-rolebinding" deleted
clusterrolebinding.rbac.authorization.k8s.io "sagemaker-k8s-operator-proxy-rolebinding" deleted
service "sagemaker-k8s-operator-controller-manager-metrics-service" deleted
deployment.apps "sagemaker-k8s-operator-controller-manager" deleted
secrets "sagemaker-k8s-operator-abcde" deleted
```

##### 使用 Helm Chart 安裝的運算子
<a name="operators-installed-using-helm-charts"></a>

若要刪除運算子 CRD，請先刪除所有執行中的工作。然後使用以下命令刪除用於部署運算子的 Helm Chart：

```
# get the helm charts
helm ls
  
# delete the charts
helm delete <chart_name>
```

#### 刪除命名空間型運算子
<a name="delete-namespace-based-operators"></a>

##### 使用 YAML 安裝的運算子
<a name="operators-installed-with-yaml"></a>

若要從叢集中解除安裝運算子，首先請確保已從叢集中刪除所有 SageMaker AI 資源。不這樣做會導致運算子刪除操作掛起。

**注意**  
刪除叢集之前，請務必刪除叢集中的所有 SageMaker AI 資源。如需詳細資訊，請參閱[清除資源](#cleanup-operator-resources)。

刪除所有 SageMaker AI 任務後，請首先使用 `kubectl` 從命名空間刪除運算子，然後從叢集中刪除 CRD。執行下列命令以從叢集中刪除運算子：

```
# Delete the operator using the same yaml file that was used to install the operator
kubectl delete -f operator.yaml
  
# Now delete the CRDs using the CRD installer yaml
kubectl delete -f https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/release/rolebased/namespaced/crd.yaml
  
# Now you can delete the namespace if you want
kubectl delete namespace <namespace>
```

##### 使用 Helm Chart 安裝的運算子
<a name="operators-installed-with-helm-charts"></a>

若要刪除運算子 CRD，請先刪除所有執行中的工作。然後使用以下命令刪除用於部署運算子的 Helm Chart：

```
# Delete the operator
helm delete <chart_name>
  
# delete the crds
helm delete crds
  
# optionally delete the namespace
kubectl delete namespace <namespace>
```

### 故障診斷
<a name="troubleshooting"></a>

#### 對失敗的工作進行偵錯
<a name="debugging-a-failed-job"></a>

請使用這些步驟來對失敗的工作進行偵錯。
+ Check the job status by running the following command:

  ```
  kubectl get <crd type> <job name>
  ```
+ 如果任務是在 SageMaker AI 中建立的，您可以使用下列命令來查看 `STATUS` 和 `SageMaker Job Name`：

  ```
  kubectl get <crd type> <job name>
  ```
+ 您可以使用以下命令，透過 `smlogs` 來查找問題的原因：

  ```
  kubectl smlogs <crd type> <job name>
  ```
+  您也可以使用下列命令，透過 `describe` 來取得與作業相關的詳細資料。輸出中有一個 `additional` 欄位，其中包含有關工作狀態的詳細資訊。

  ```
  kubectl describe <crd type> <job name>
  ```
+ 如果未在 SageMaker AI 中建立任務，請使用運算子 Pod 的日誌來尋找問題的原因，如下所示：

  ```
  $ kubectl get pods -A | grep sagemaker
  # Output:
  sagemaker-k8s-operator-system   sagemaker-k8s-operator-controller-manager-5cd7df4d74-wh22z   2/2     Running   0          3h33m
    
  $ kubectl logs -p <pod name> -c manager -n sagemaker-k8s-operator-system
  ```

#### 刪除運算子 CRD
<a name="deleting-an-operator-crd"></a>

如果刪除工作失敗，請檢查運算子是否正在執行。如果運算子沒有執行，則您必須執行以下步驟刪除終結器：

1. 在新的終端機中，使用 `kubectl edit` 在編輯器中打開工作，如下所示：

   ```
   kubectl edit <crd type> <job name>
   ```

1. 透過從檔案中移除以下兩行來編輯工作，以刪除終結器。儲存檔案，該作業將被刪除。

   ```
   finalizers:
     - sagemaker-operator-finalizer
   ```
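
As a non-interactive alternative to `kubectl edit`, the finalizers can be cleared with a JSON merge patch. The sketch below only builds and prints the `kubectl patch` command; `finalizer_patch_command` is a hypothetical helper, and the job name is the example used earlier in this guide. Note that this patch removes every finalizer on the resource, not only `sagemaker-operator-finalizer`, so it assumes that no other finalizer is set.

```shell
# Print a `kubectl patch` command that clears metadata.finalizers on one
# resource by using a JSON merge patch. Nothing is executed.
# Arguments: CRD type, job name.
finalizer_patch_command() {
  printf "kubectl patch %s %s --type=merge -p '%s'\n" \
    "$1" "$2" '{"metadata":{"finalizers":null}}'
}

finalizer_patch_command trainingjob xgboost-mnist-from-for-s3
```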

### Images and SMlogs in each Region
<a name="images-and-smlogs-in-each-region"></a>

The following table lists the available operator images and SMlogs plugins in each Region.

|  Region  |  Controller image  |  Linux SMlogs  | 
| --- | --- | --- | 
|  us-east-1  |  957583890962.dkr.ecr.us-east-1.amazonaws.com/amazon-sagemaker-operator-for-k8s:v1  |  [https://s3.us-east-1.amazonaws.com/amazon-sagemaker-operator-for-k8s-us-east-1/kubectl-smlogs-plugin/v1/linux.amd64.tar.gz](https://s3.us-east-1.amazonaws.com/amazon-sagemaker-operator-for-k8s-us-east-1/kubectl-smlogs-plugin/v1/linux.amd64.tar.gz)  | 
|  us-east-2  |  922499468684.dkr.ecr.us-east-2.amazonaws.com/amazon-sagemaker-operator-for-k8s:v1  |  [https://s3.us-east-2.amazonaws.com/amazon-sagemaker-operator-for-k8s-us-east-2/kubectl-smlogs-plugin/v1/linux.amd64.tar.gz](https://s3.us-east-2.amazonaws.com/amazon-sagemaker-operator-for-k8s-us-east-2/kubectl-smlogs-plugin/v1/linux.amd64.tar.gz)  | 
|  us-west-2  |  640106867763.dkr.ecr.us-west-2.amazonaws.com/amazon-sagemaker-operator-for-k8s:v1  |  [https://s3.us-west-2.amazonaws.com/amazon-sagemaker-operator-for-k8s-us-west-2/kubectl-smlogs-plugin/v1/linux.amd64.tar.gz](https://s3.us-west-2.amazonaws.com/amazon-sagemaker-operator-for-k8s-us-west-2/kubectl-smlogs-plugin/v1/linux.amd64.tar.gz)  | 
|  eu-west-1  |  613661167059.dkr.ecr.eu-west-1.amazonaws.com/amazon-sagemaker-operator-for-k8s:v1  |  [https://s3.eu-west-1.amazonaws.com/amazon-sagemaker-operator-for-k8s-eu-west-1/kubectl-smlogs-plugin/v1/linux.amd64.tar.gz](https://s3.eu-west-1.amazonaws.com/amazon-sagemaker-operator-for-k8s-eu-west-1/kubectl-smlogs-plugin/v1/linux.amd64.tar.gz)  | 
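
The plugin URLs in the table follow one pattern, so a lookup can be sketched as follows. `smlogs_plugin_url` is a hypothetical helper; it only accepts the four Regions listed above and makes no claim about other Regions.

```shell
# Sketch: build the Linux smlogs plugin URL for a Region listed in the table.
# smlogs_plugin_url is a hypothetical helper name.
smlogs_plugin_url() {
  case "$1" in
    us-east-1|us-east-2|us-west-2|eu-west-1)
      echo "https://s3.$1.amazonaws.com/amazon-sagemaker-operator-for-k8s-$1/kubectl-smlogs-plugin/v1/linux.amd64.tar.gz"
      ;;
    *)
      # Regions that are not in the table are rejected instead of guessed.
      echo "no smlogs plugin listed for Region: $1" >&2
      return 1
      ;;
  esac
}
```

A typical use would be `wget "$(smlogs_plugin_url us-west-2)"`.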

# Using Amazon SageMaker AI Jobs
<a name="kubernetes-sagemaker-jobs"></a>

This section is based on the original version of [SageMaker AI Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s).

**Important**  
We are stopping the development and technical support of the original version of [SageMaker Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s/tree/master).  
If you are currently using version `v1.2.2` or below of [SageMaker Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s/tree/master), we recommend migrating your resources to the [ACK service controller for Amazon SageMaker](https://github.com/aws-controllers-k8s/sagemaker-controller). The ACK service controller is a new generation of SageMaker Operators for Kubernetes based on [AWS Controllers for Kubernetes (ACK)](https://aws-controllers-k8s.github.io/community/).  
For information on the migration steps, see [Migrate resources to the latest Operators](kubernetes-sagemaker-operators-migrate.md).  
For answers to frequently asked questions on the end of support of the original version of SageMaker Operators for Kubernetes, see [Announcing the End of Support of the Original Version of SageMaker AI Operators for Kubernetes](kubernetes-sagemaker-operators-eos-announcement.md).

To run an Amazon SageMaker AI job using the Operators for Kubernetes, you can either apply a YAML file or use the supplied Helm Charts.

All sample operator jobs in the following tutorials use sample data taken from a public MNIST dataset. To run these samples, download the dataset into your Amazon S3 bucket. You can find the dataset in [Download the MNIST Dataset](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-preprocess-data-pull-data.html).

**Topics**
+ [The TrainingJob operator](#trainingjob-operator)
+ [The HyperParameterTuningJob operator](#hyperparametertuningjobs-operator)
+ [The BatchTransformJob operator](#batchtransformjobs-operator)
+ [The HostingDeployment operator](#hosting-deployment-operator)
+ [The ProcessingJob operator](#kubernetes-processing-job-operator)
+ [HostingAutoscalingPolicy (HAP) Operator](#kubernetes-hap-operator)

## The TrainingJob operator
<a name="trainingjob-operator"></a>

Training job operators reconcile your specified training job spec to SageMaker AI by launching it for you in SageMaker AI. You can learn more about SageMaker training jobs in the SageMaker AI [CreateTrainingJob API documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTrainingJob.html).

**Topics**
+ [Create a TrainingJob using a YAML file](#create-a-trainingjob-using-a-simple-yaml-file)
+ [Create a TrainingJob using a Helm Chart](#create-a-trainingjob-using-a-helm-chart)
+ [List TrainingJobs](#list-training-jobs)
+ [Describe a TrainingJob](#describe-a-training-job)
+ [View logs from TrainingJobs](#view-logs-from-training-jobs)
+ [Delete TrainingJobs](#delete-training-jobs)

### Create a TrainingJob using a YAML file
<a name="create-a-trainingjob-using-a-simple-yaml-file"></a>

1. Download the sample YAML file for training using the following command:

   ```
   wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/samples/xgboost-mnist-trainingjob.yaml
   ```

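1. Edit the `xgboost-mnist-trainingjob.yaml` file to replace the `roleArn` parameter with your `<sagemaker-execution-role>`, and `outputPath` with an Amazon S3 bucket to which the SageMaker AI execution role has write access. The `roleArn` must have permissions so that SageMaker AI can access Amazon S3, Amazon CloudWatch, and other services on your behalf. For more information on creating a SageMaker AI execution role, see [SageMaker AI Roles](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html#sagemaker-roles-createtrainingjob-perms). Apply the YAML file using the following command: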

   ```
   kubectl apply -f xgboost-mnist-trainingjob.yaml
   ```
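
The edit in the previous step can also be scripted. The following is a minimal sketch, assuming each key appears once in the file as a simple `key: value` line; `set_yaml_value` is a hypothetical helper, and the key name `s3OutputPath` is an assumption inferred from the `S 3 Output Path` field in the `describe` output, so check it against your copy of the sample file.

```shell
# Sketch: replace the value of a simple "key: value" line in a YAML file.
# Arguments: <file> <key> <new value>
# set_yaml_value is a hypothetical helper. It does not parse YAML, and the
# new value must not contain "|" or "&" because it is passed through sed.
set_yaml_value() (
  file="$1"; key="$2"; value="$3"
  tmp="$(mktemp)" || exit 1
  # Keep the indentation and the key, and replace everything after the colon.
  sed -E "s|^([[:space:]]*(- )?${key}:).*|\1 ${value}|" "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  exit "$status"
)
```

For example, `set_yaml_value xgboost-mnist-trainingjob.yaml roleArn "<sagemaker-execution-role>"` followed by `kubectl apply -f xgboost-mnist-trainingjob.yaml`.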

### Create a TrainingJob using a Helm Chart
<a name="create-a-trainingjob-using-a-helm-chart"></a>

You can use Helm Charts to run TrainingJobs.

1. Clone the GitHub repository to get the source using the following command:

   ```
   git clone https://github.com/aws/amazon-sagemaker-operator-for-k8s.git
   ```

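1. Navigate to the `amazon-sagemaker-operator-for-k8s/hack/charts/training-jobs/` folder and edit the `values.yaml` file to replace values like `rolearn` and `outputpath` with values that correspond to your account. The role ARN must have permissions so that SageMaker AI can access Amazon S3, Amazon CloudWatch, and other services on your behalf. For more information on creating a SageMaker AI execution role, see [SageMaker AI Roles](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html#sagemaker-roles-createtrainingjob-perms).

#### Create the TrainingJob
<a name="create-the-training-job"></a>

With the roles and Amazon S3 buckets replaced with appropriate values in `values.yaml`, you can create a training job using the following command: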

```
helm install . --generate-name
```

Your output should look like the following:

```
NAME: chart-12345678
LAST DEPLOYED: Wed Nov 20 23:35:49 2019
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thanks for installing the sagemaker-k8s-trainingjob.
```

#### Verify your training Helm Chart
<a name="verify-your-training-helm-chart"></a>

To verify that the Helm Chart was created successfully, run the following command:

```
helm ls
```

Your output should look like the following:

```
NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
chart-12345678        default         1               2019-11-20 23:35:49.9136092 +0000 UTC   deployed        sagemaker-k8s-trainingjob-0.1.0
rolebased-12345678    default         1               2019-11-20 23:14:59.6777082 +0000 UTC   deployed        sagemaker-k8s-operator-0.1.0
```

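`helm install` creates a `TrainingJob` Kubernetes resource. The operator launches the actual training job in SageMaker AI and updates the `TrainingJob` Kubernetes resource to reflect the status of the job in SageMaker AI. You incur charges for the SageMaker AI resources used while your job runs. You do not incur any charges once your job completes or stops.

**Note**: SageMaker AI does not allow you to update a running training job. You cannot edit any parameter and re-apply the config file. You must either change the metadata name or delete the existing job and create a new one. Similar to existing training job operators like `TFJob` in Kubeflow, `update` is not supported.

### List TrainingJobs
<a name="list-training-jobs"></a>

Use the following command to list all jobs created using the Kubernetes operator: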

```
kubectl get TrainingJob
```

The output listing all jobs should look like the following:

```
kubectl get trainingjobs
NAME                        STATUS       SECONDARY-STATUS   CREATION-TIME          SAGEMAKER-JOB-NAME
xgboost-mnist-from-for-s3   InProgress   Starting           2019-11-20T23:42:35Z   xgboost-mnist-from-for-s3-examplef11eab94e0ed4671d5a8f
```

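Training jobs continue to be listed after the job has completed or failed. You can remove a `TrainingJob` from the list by following the [Delete TrainingJobs](#delete-training-jobs) steps. Jobs that have completed or stopped do not incur any charges for SageMaker AI resources.

#### TrainingJob status values
<a name="training-job-status-values"></a>

The `STATUS` field can be one of the following values: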
+ `Completed` 
+ `InProgress` 
+ `Failed` 
+ `Stopped` 
+ `Stopping` 

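These statuses come directly from the SageMaker AI official [API documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/API_DescribeTrainingJob.html#SageMaker-DescribeTrainingJob-response-TrainingJobStatus).

In addition to the official SageMaker AI status, it is possible for `STATUS` to be `SynchronizingK8sJobWithSageMaker`. This means that the operator has not yet processed the job.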
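
Because `Completed`, `Failed`, and `Stopped` are the terminal values, a script can poll the `STATUS` column until one of them appears. The following is a minimal sketch with hypothetical helper names; it assumes `STATUS` is the second column of `kubectl get trainingjob`, as in the listing above, and it loops indefinitely if `kubectl` keeps failing.

```shell
# Print the STATUS column (second column) for one TrainingJob.
trainingjob_status() {
  kubectl get trainingjob "$1" --no-headers | awk '{print $2}'
}

# Sketch: poll until the job reaches a terminal status.
# Arguments: <job name> [poll interval in seconds, default 30]
# Prints the final status; returns 0 for Completed, 1 for Failed or Stopped.
wait_for_trainingjob() (
  while :; do
    status="$(trainingjob_status "$1")"
    case "$status" in
      Completed) echo "$status"; exit 0 ;;
      Failed|Stopped) echo "$status"; exit 1 ;;
      # InProgress, Stopping, SynchronizingK8sJobWithSageMaker: keep waiting.
      *) sleep "${2:-30}" ;;
    esac
  done
)
```

For example, `wait_for_trainingjob xgboost-mnist-from-for-s3 60` checks once a minute.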

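#### Secondary status values
<a name="secondary-status-values"></a>

The secondary statuses come directly from the SageMaker AI official [API documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/API_DescribeTrainingJob.html#SageMaker-DescribeTrainingJob-response-SecondaryStatus). They contain more granular information about the status of the job.

### Describe a TrainingJob
<a name="describe-a-training-job"></a>

You can get more details about a training job by using the `kubectl` `describe` command. This is typically used for debugging a problem or checking the parameters of a training job. To get information about your training job, use the following command: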

```
kubectl describe trainingjob xgboost-mnist-from-for-s3
```

The output for your training job should look like the following:

```
Name:         xgboost-mnist-from-for-s3
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  sagemaker.aws.amazon.com/v1
Kind:         TrainingJob
Metadata:
  Creation Timestamp:  2019-11-20T23:42:35Z
  Finalizers:
    sagemaker-operator-finalizer
  Generation:        2
  Resource Version:  23119
  Self Link:         /apis/sagemaker.aws.amazon.com/v1/namespaces/default/trainingjobs/xgboost-mnist-from-for-s3
  UID:               6d7uiui-0bef-11ea-b94e-0ed467example
Spec:
  Algorithm Specification:
    Training Image:       8256416981234.dkr.ecr.us-east-2.amazonaws.com/xgboost:1
    Training Input Mode:  File
  Hyper Parameters:
    Name:   eta
    Value:  0.2
    Name:   gamma
    Value:  4
    Name:   max_depth
    Value:  5
    Name:   min_child_weight
    Value:  6
    Name:   num_class
    Value:  10
    Name:   num_round
    Value:  10
    Name:   objective
    Value:  multi:softmax
    Name:   silent
    Value:  0
  Input Data Config:
    Channel Name:      train
    Compression Type:  None
    Content Type:      text/csv
    Data Source:
      S 3 Data Source:
        S 3 Data Distribution Type:  FullyReplicated
        S 3 Data Type:               S3Prefix
        S 3 Uri:                     https://s3-us-east-2.amazonaws.com/amzn-s3-demo-bucket/sagemaker/xgboost-mnist/train/
    Channel Name:                    validation
    Compression Type:                None
    Content Type:                    text/csv
    Data Source:
      S 3 Data Source:
        S 3 Data Distribution Type:  FullyReplicated
        S 3 Data Type:               S3Prefix
        S 3 Uri:                     https://s3-us-east-2.amazonaws.com/amzn-s3-demo-bucket/sagemaker/xgboost-mnist/validation/
  Output Data Config:
    S 3 Output Path:  s3://amzn-s3-demo-bucket/sagemaker/xgboost-mnist/xgboost/
  Region:             us-east-2
  Resource Config:
    Instance Count:     1
    Instance Type:      ml.m4.xlarge
    Volume Size In GB:  5
  Role Arn:             arn:aws:iam::12345678910:role/service-role/AmazonSageMaker-ExecutionRole
  Stopping Condition:
    Max Runtime In Seconds:  86400
  Training Job Name:         xgboost-mnist-from-for-s3-6d7fa0af0bef11eab94e0example
Status:
  Cloud Watch Log URL:           https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#logStream:group=/aws/sagemaker/TrainingJobs;prefix=<example>;streamFilter=typeLogStreamPrefix
  Last Check Time:               2019-11-20T23:44:29Z
  Sage Maker Training Job Name:  xgboost-mnist-from-for-s3-6d7fa0af0bef11eab94eexample
  Secondary Status:              Downloading
  Training Job Status:           InProgress
Events:                          <none>
```

### View logs from TrainingJobs
<a name="view-logs-from-training-jobs"></a>

Use the following command to see the logs from the `xgboost-mnist-from-for-s3` training job:

```
kubectl smlogs trainingjob xgboost-mnist-from-for-s3
```

Your output should look similar to the following. The logs from instances are ordered chronologically.

```
"xgboost-mnist-from-for-s3" has SageMaker TrainingJobName "xgboost-mnist-from-for-s3-123456789" in region "us-east-2", status "InProgress" and secondary status "Starting"
xgboost-mnist-from-for-s3-6d7fa0af0bef11eab94e0ed46example/algo-1-1574293123 2019-11-20 23:45:24.7 +0000 UTC Arguments: train
xgboost-mnist-from-for-s3-6d7fa0af0bef11eab94e0ed46example/algo-1-1574293123 2019-11-20 23:45:24.7 +0000 UTC [2019-11-20:23:45:22:INFO] Running standalone xgboost training.
xgboost-mnist-from-for-s3-6d7fa0af0bef11eab94e0ed46example/algo-1-1574293123 2019-11-20 23:45:24.7 +0000 UTC [2019-11-20:23:45:22:INFO] File size need to be processed in the node: 1122.95mb. Available memory size in the node: 8586.0mb
xgboost-mnist-from-for-s3-6d7fa0af0bef11eab94e0ed46example/algo-1-1574293123 2019-11-20 23:45:24.7 +0000 UTC [2019-11-20:23:45:22:INFO] Determined delimiter of CSV input is ','
xgboost-mnist-from-for-s3-6d7fa0af0bef11eab94e0ed46example/algo-1-1574293123 2019-11-20 23:45:24.7 +0000 UTC [23:45:22] S3DistributionType set as FullyReplicated
```

### Delete TrainingJobs
<a name="delete-training-jobs"></a>

Use the following command to stop a training job on Amazon SageMaker AI:

```
kubectl delete trainingjob xgboost-mnist-from-for-s3
```

This command removes the SageMaker training job from Kubernetes. The command returns the following output:

```
trainingjob.sagemaker.aws.amazon.com "xgboost-mnist-from-for-s3" deleted
```

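If the job is still in progress on SageMaker AI, the job stops. You do not incur any charges for SageMaker AI resources after your job stops or completes.

**Note**: SageMaker AI does not delete training jobs. Stopped jobs continue to show on the SageMaker AI console. The `delete` command takes about 2 minutes to clean up the resources from SageMaker AI.

## The HyperParameterTuningJob operator
<a name="hyperparametertuningjobs-operator"></a>

Hyperparameter tuning job operators reconcile your specified hyperparameter tuning job spec to SageMaker AI by launching it in SageMaker AI. You can learn more about SageMaker AI hyperparameter tuning jobs in the SageMaker AI [CreateHyperParameterTuningJob API documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateHyperParameterTuningJob.html).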

**Topics**
+ [Create a HyperparameterTuningJob using a YAML file](#create-a-hyperparametertuningjob-using-a-simple-yaml-file)
+ [Create a HyperparameterTuningJob using a Helm Chart](#create-a-hyperparametertuningjob-using-a-helm-chart)
+ [List HyperparameterTuningJobs](#list-hyperparameter-tuning-jobs)
+ [Describe a HyperparameterTuningJob](#describe-a-hyperparameter-tuning-job)
+ [View logs from HyperparameterTuningJobs](#view-logs-from-hyperparametertuning-jobs)
+ [Delete a HyperparameterTuningJob](#delete-hyperparametertuning-jobs)

### Create a HyperparameterTuningJob using a YAML file
<a name="create-a-hyperparametertuningjob-using-a-simple-yaml-file"></a>

1. Download the sample YAML file for the hyperparameter tuning job using the following command:

   ```
   wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/samples/xgboost-mnist-hpo.yaml
   ```

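1. Edit the `xgboost-mnist-hpo.yaml` file to replace the `roleArn` parameter with your `sagemaker-execution-role`. For the hyperparameter tuning job to succeed, you must also change the `s3InputPath` and `s3OutputPath` to values that correspond to your account. Apply the updated YAML file using the following command: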

   ```
   kubectl apply -f xgboost-mnist-hpo.yaml
   ```

### Create a HyperparameterTuningJob using a Helm Chart
<a name="create-a-hyperparametertuningjob-using-a-helm-chart"></a>

You can use Helm Charts to run hyperparameter tuning jobs.

1. Clone the GitHub repository to get the source using the following command:

   ```
   git clone https://github.com/aws/amazon-sagemaker-operator-for-k8s.git
   ```

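1. Navigate to the `amazon-sagemaker-operator-for-k8s/hack/charts/hyperparameter-tuning-jobs/` folder.

1. Edit the `values.yaml` file to replace the `roleArn` parameter with your `sagemaker-execution-role`. For the hyperparameter tuning job to succeed, you must also change the `s3InputPath` and `s3OutputPath` to values that correspond to your account.

#### Create the HyperparameterTuningJob
<a name="create-the-hpo-job"></a>

With the roles and Amazon S3 paths replaced with appropriate values in `values.yaml`, you can create a hyperparameter tuning job using the following command: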

```
helm install . --generate-name
```

Your output should look similar to the following:

```
NAME: chart-1574292948
LAST DEPLOYED: Wed Nov 20 23:35:49 2019
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thanks for installing the sagemaker-k8s-hyperparametertuningjob.
```

#### Verify chart installation
<a name="verify-chart-installation"></a>

To verify that the Helm Chart was created successfully, run the following command:

```
helm ls
```

Your output should look like the following:

```
NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
chart-1474292948        default         1               2019-11-20 23:35:49.9136092 +0000 UTC   deployed        sagemaker-k8s-hyperparametertuningjob-0.1.0
chart-1574292948        default         1               2019-11-20 23:35:49.9136092 +0000 UTC   deployed        sagemaker-k8s-trainingjob-0.1.0
rolebased-1574291698    default         1               2019-11-20 23:14:59.6777082 +0000 UTC   deployed        sagemaker-k8s-operator-0.1.0
```

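`helm install` creates a `HyperParameterTuningJob` Kubernetes resource. The operator launches the actual hyperparameter optimization job in SageMaker AI and updates the `HyperParameterTuningJob` Kubernetes resource to reflect the status of the job in SageMaker AI. You incur charges for the SageMaker AI resources used while your job runs. You do not incur any charges once your job completes or stops.

**Note**: SageMaker AI does not allow you to update a running hyperparameter tuning job. You cannot edit any parameter and re-apply the config file. You must either change the metadata name or delete the existing job and create a new one. Similar to existing training job operators like `TFJob` in Kubeflow, `update` is not supported.

### List HyperparameterTuningJobs
<a name="list-hyperparameter-tuning-jobs"></a>

Use the following command to list all jobs created using the Kubernetes operator: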

```
kubectl get hyperparametertuningjob
```

Your output should look like the following:

```
NAME                STATUS      CREATION-TIME          COMPLETED   INPROGRESS   ERRORS   STOPPED   BEST-TRAINING-JOB                               SAGEMAKER-JOB-NAME
xgboost-mnist-hpo   Completed   2019-10-17T01:15:52Z   10          0            0        0         xgboostha92f5e3cf07b11e9bf6c06d6-009-4c7a123   xgboostha92f5e3cf07b11e9bf6c123
```

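A hyperparameter tuning job continues to be listed after the job has completed or failed. You can remove a `hyperparametertuningjob` from the list by following the steps in [Delete a HyperparameterTuningJob](#delete-hyperparametertuning-jobs). Jobs that have completed or stopped do not incur any charges for SageMaker AI resources.

#### Hyperparameter tuning job status values
<a name="hyperparameter-tuning-job-status-values"></a>

The `STATUS` field can be one of the following values: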
+ `Completed` 
+ `InProgress` 
+ `Failed` 
+ `Stopped` 
+ `Stopping` 

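These statuses come directly from the SageMaker AI official [API documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/API_DescribeHyperParameterTuningJob.html#SageMaker-DescribeHyperParameterTuningJob-response-HyperParameterTuningJobStatus).

In addition to the official SageMaker AI status, it is possible for `STATUS` to be `SynchronizingK8sJobWithSageMaker`. This means that the operator has not yet processed the job.

#### Status counters
<a name="status-counters"></a>

The output has several counters, like `COMPLETED` and `INPROGRESS`. These represent how many training jobs have completed and how many are in progress, respectively. For more information about how these are determined, see [TrainingJobStatusCounters](https://docs.aws.amazon.com/sagemaker/latest/dg/API_TrainingJobStatusCounters.html) in the SageMaker API documentation.

#### Best TrainingJob
<a name="best-training-job"></a>

This column contains the name of the `TrainingJob` that best optimized the selected metric.

To see a summary of the tuned hyperparameters, run the following command: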

```
kubectl describe hyperparametertuningjob xgboost-mnist-hpo
```

To see detailed information about the `TrainingJob`, run the following command:

```
kubectl describe trainingjobs <job name>
```
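
The job name for the previous command can be read from the `BEST-TRAINING-JOB` column. The following is a minimal sketch with a hypothetical helper name; it assumes the nine-column layout shown in the listing above and prints nothing when the row has fewer columns, for example when no best training job has been reported yet.

```shell
# Sketch: print the BEST-TRAINING-JOB column for one tuning job.
# Argument: <hyperparameter tuning job name>
# best_trainingjob_name is a hypothetical helper name.
best_trainingjob_name() {
  # Only trust the eighth column when all nine columns are present.
  kubectl get hyperparametertuningjob "$1" --no-headers | awk 'NF == 9 {print $8}'
}
```

For example, `kubectl describe trainingjobs "$(best_trainingjob_name xgboost-mnist-hpo)"`.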

#### Spawned TrainingJobs
<a name="spawned-training-jobs"></a>

You can also track all 10 training jobs that the `HyperparameterTuningJob` launched in Kubernetes by running the following command:

```
kubectl get trainingjobs
```

### Describe a HyperparameterTuningJob
<a name="describe-a-hyperparameter-tuning-job"></a>

You can obtain debugging details using the `kubectl` `describe` command.

```
kubectl describe hyperparametertuningjob xgboost-mnist-hpo
```

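In addition to information about the tuning job, the SageMaker AI Operator for Kubernetes also exposes the [best training job](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-monitor.html#automatic-model-tuning-best-training-job) found by the hyperparameter tuning job in the `describe` output, as follows: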

```
Name:         xgboost-mnist-hpo
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"sagemaker.aws.amazon.com/v1","kind":"HyperparameterTuningJob","metadata":{"annotations":{},"name":"xgboost-mnist-hpo","namespace":...
API Version:  sagemaker.aws.amazon.com/v1
Kind:         HyperparameterTuningJob
Metadata:
  Creation Timestamp:  2019-10-17T01:15:52Z
  Finalizers:
    sagemaker-operator-finalizer
  Generation:        2
  Resource Version:  8167
  Self Link:         /apis/sagemaker.aws.amazon.com/v1/namespaces/default/hyperparametertuningjobs/xgboost-mnist-hpo
  UID:               a92f5e3c-f07b-11e9-bf6c-06d6f303uidu
Spec:
  Hyper Parameter Tuning Job Config:
    Hyper Parameter Tuning Job Objective:
      Metric Name:  validation:error
      Type:         Minimize
    Parameter Ranges:
      Integer Parameter Ranges:
        Max Value:     20
        Min Value:     10
        Name:          num_round
        Scaling Type:  Linear
    Resource Limits:
      Max Number Of Training Jobs:     10
      Max Parallel Training Jobs:      10
    Strategy:                          Bayesian
    Training Job Early Stopping Type:  Off
  Hyper Parameter Tuning Job Name:     xgboostha92f5e3cf07b11e9bf6c06d6
  Region:                              us-east-2
  Training Job Definition:
    Algorithm Specification:
      Training Image:       12345678910.dkr.ecr.us-east-2.amazonaws.com/xgboost:1
      Training Input Mode:  File
    Input Data Config:
      Channel Name:  train
      Content Type:  text/csv
      Data Source:
        s3DataSource:
          s3DataDistributionType:  FullyReplicated
          s3DataType:              S3Prefix
          s3Uri:                   https://s3-us-east-2.amazonaws.com/amzn-s3-demo-bucket/sagemaker/xgboost-mnist/train/
      Channel Name:                validation
      Content Type:                text/csv
      Data Source:
        s3DataSource:
          s3DataDistributionType:  FullyReplicated
          s3DataType:              S3Prefix
          s3Uri:                   https://s3-us-east-2.amazonaws.com/amzn-s3-demo-bucket/sagemaker/xgboost-mnist/validation/
    Output Data Config:
      s3OutputPath:  https://s3-us-east-2.amazonaws.com/amzn-s3-demo-bucket/sagemaker/xgboost-mnist/xgboost
    Resource Config:
      Instance Count:     1
      Instance Type:      ml.m4.xlarge
      Volume Size In GB:  5
    Role Arn:             arn:aws:iam::123456789012:role/service-role/AmazonSageMaker-ExecutionRole
    Static Hyper Parameters:
      Name:   base_score
      Value:  0.5
      Name:   booster
      Value:  gbtree
      Name:   csv_weights
      Value:  0
      Name:   dsplit
      Value:  row
      Name:   grow_policy
      Value:  depthwise
      Name:   lambda_bias
      Value:  0.0
      Name:   max_bin
      Value:  256
      Name:   max_leaves
      Value:  0
      Name:   normalize_type
      Value:  tree
      Name:   objective
      Value:  reg:linear
      Name:   one_drop
      Value:  0
      Name:   prob_buffer_row
      Value:  1.0
      Name:   process_type
      Value:  default
      Name:   rate_drop
      Value:  0.0
      Name:   refresh_leaf
      Value:  1
      Name:   sample_type
      Value:  uniform
      Name:   scale_pos_weight
      Value:  1.0
      Name:   silent
      Value:  0
      Name:   sketch_eps
      Value:  0.03
      Name:   skip_drop
      Value:  0.0
      Name:   tree_method
      Value:  auto
      Name:   tweedie_variance_power
      Value:  1.5
    Stopping Condition:
      Max Runtime In Seconds:  86400
Status:
  Best Training Job:
    Creation Time:  2019-10-17T01:16:14Z
    Final Hyper Parameter Tuning Job Objective Metric:
      Metric Name:        validation:error
      Value:
    Objective Status:     Succeeded
    Training End Time:    2019-10-17T01:20:24Z
    Training Job Arn:     arn:aws:sagemaker:us-east-2:123456789012:training-job/xgboostha92f5e3cf07b11e9bf6c06d6-009-4sample
    Training Job Name:    xgboostha92f5e3cf07b11e9bf6c06d6-009-4c7a3059
    Training Job Status:  Completed
    Training Start Time:  2019-10-17T01:18:35Z
    Tuned Hyper Parameters:
      Name:                                    num_round
      Value:                                   18
  Hyper Parameter Tuning Job Status:           Completed
  Last Check Time:                             2019-10-17T01:21:01Z
  Sage Maker Hyper Parameter Tuning Job Name:  xgboostha92f5e3cf07b11e9bf6c06d6
  Training Job Status Counters:
    Completed:            10
    In Progress:          0
    Non Retryable Error:  0
    Retryable Error:      0
    Stopped:              0
    Total Error:          0
Events:                   <none>
```

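### View logs from HyperparameterTuningJobs
<a name="view-logs-from-hyperparametertuning-jobs"></a>

Hyperparameter tuning jobs do not have logs, but all training jobs launched by them do have logs. These logs can be accessed in the same way as the logs of a normal training job. For more information, see [View logs from TrainingJobs](#view-logs-from-training-jobs).

### Delete a HyperparameterTuningJob
<a name="delete-hyperparametertuning-jobs"></a>

Use the following command to stop a hyperparameter tuning job in SageMaker AI.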

```
kubectl delete hyperparametertuningjob xgboost-mnist-hpo
```

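This command removes the hyperparameter tuning job and the associated training jobs from your Kubernetes cluster and stops them in SageMaker AI. Jobs that have stopped or completed do not incur any charges for SageMaker AI resources. SageMaker AI does not delete hyperparameter tuning jobs. Stopped jobs continue to show on the SageMaker AI console.

Your output should look like the following: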

```
hyperparametertuningjob.sagemaker.aws.amazon.com "xgboost-mnist-hpo" deleted
```

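**Note**: The delete command takes about 2 minutes to clean up the resources from SageMaker AI.

## The BatchTransformJob operator
<a name="batchtransformjobs-operator"></a>

Batch transform job operators reconcile your specified batch transform job spec to SageMaker AI by launching it in SageMaker AI. You can learn more about SageMaker AI batch transform jobs in the SageMaker AI [CreateTransformJob API documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTransformJob.html).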

**Topics**
+ [Create a BatchTransformJob using a YAML file](#create-a-batchtransformjob-using-a-simple-yaml-file)
+ [Create a BatchTransformJob using a Helm Chart](#create-a-batchtransformjob-using-a-helm-chart)
+ [List BatchTransformJobs](#list-batch-transform-jobs)
+ [Describe a BatchTransformJob](#describe-a-batch-transform-job)
+ [View logs from BatchTransformJobs](#view-logs-from-batch-transform-jobs)
+ [Delete a BatchTransformJob](#delete-a-batch-transform-job)

### Create a BatchTransformJob using a YAML file
<a name="create-a-batchtransformjob-using-a-simple-yaml-file"></a>

1. Download the sample YAML file for the batch transform job using the following command:

   ```
   wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/samples/xgboost-mnist-batchtransform.yaml
   ```

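1. Edit the `xgboost-mnist-batchtransform.yaml` file to change the necessary parameters: replace `inputdataconfig` with your input data, and `s3OutputPath` with an Amazon S3 bucket to which the SageMaker AI execution role has write access.

1. Apply the YAML file using the following command: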

   ```
   kubectl apply -f xgboost-mnist-batchtransform.yaml
   ```

### Create a BatchTransformJob using a Helm Chart
<a name="create-a-batchtransformjob-using-a-helm-chart"></a>

You can use Helm Charts to run batch transform jobs.

#### Get the Helm installer directory
<a name="get-the-helm-installer-directory"></a>

Clone the GitHub repository to get the source using the following command:

```
git clone https://github.com/aws/amazon-sagemaker-operator-for-k8s.git
```

#### Configure the Helm Chart
<a name="configure-the-helm-chart"></a>

Navigate to the `amazon-sagemaker-operator-for-k8s/hack/charts/batch-transform-jobs/` folder.

Edit the `values.yaml` file to replace `inputdataconfig` with your input data, and `outputPath` with an S3 bucket to which the SageMaker AI execution role has write access.

#### Create a BatchTransformJob
<a name="create-a-batch-transform-job"></a>

1. Use the following command to create a batch transform job:

   ```
   helm install . --generate-name
   ```

   Your output should look like the following:

   ```
   NAME: chart-1574292948
   LAST DEPLOYED: Wed Nov 20 23:35:49 2019
   NAMESPACE: default
   STATUS: deployed
   REVISION: 1
   TEST SUITE: None
   NOTES:
   Thanks for installing the sagemaker-k8s-batch-transform-job.
   ```

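1. To verify that the Helm Chart was created successfully, run the following command. The listing after the command shows sample output: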

   ```
   helm ls
   NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
   chart-1474292948        default         1               2019-11-20 23:35:49.9136092 +0000 UTC   deployed        sagemaker-k8s-batchtransformjob-0.1.0
   chart-1474292948        default         1               2019-11-20 23:35:49.9136092 +0000 UTC   deployed        sagemaker-k8s-hyperparametertuningjob-0.1.0
   chart-1574292948        default         1               2019-11-20 23:35:49.9136092 +0000 UTC   deployed        sagemaker-k8s-trainingjob-0.1.0
   rolebased-1574291698    default         1               2019-11-20 23:14:59.6777082 +0000 UTC   deployed        sagemaker-k8s-operator-0.1.0
   ```

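   The `helm install` command creates a `BatchTransformJob` Kubernetes resource. The operator launches the actual transform job in SageMaker AI and updates the `BatchTransformJob` Kubernetes resource to reflect the status of the job in SageMaker AI. You incur charges for the SageMaker AI resources used while your job runs. You do not incur any charges once your job completes or stops.

**Note**: SageMaker AI does not allow you to update a running batch transform job. You cannot edit any parameter and re-apply the config file. You must either change the metadata name or delete the existing job and create a new one. Similar to existing training job operators like `TFJob` in Kubeflow, `update` is not supported.

### List BatchTransformJobs
<a name="list-batch-transform-jobs"></a>

Use the following command to list all jobs created using the Kubernetes operator: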

```
kubectl get batchtransformjob
```

Your output should look like the following:

```
NAME                                STATUS      CREATION-TIME          SAGEMAKER-JOB-NAME
xgboost-mnist-batch-transform       Completed   2019-11-18T03:44:00Z   xgboost-mnist-a88fb19809b511eaac440aa8axgboost
```

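A batch transform job continues to be listed after the job has completed or failed. You can remove a `batchtransformjob` from the list by following the [Delete a BatchTransformJob](#delete-a-batch-transform-job) steps. Jobs that have completed or stopped do not incur any charges for SageMaker AI resources.

#### Batch transform status values
<a name="batch-transform-status-values"></a>

The `STATUS` field can be one of the following values: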
+ `Completed` 
+ `InProgress` 
+ `Failed` 
+ `Stopped` 
+ `Stopping` 

These statuses come directly from the SageMaker AI official [API documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/API_DescribeTransformJob.html#SageMaker-DescribeTransformJob-response-TransformJobStatus).

In addition to the official SageMaker AI status, it is possible for `STATUS` to be `SynchronizingK8sJobWithSageMaker`. This means that the operator has not yet processed the job.

### Describe a BatchTransformJob
<a name="describe-a-batch-transform-job"></a>

You can obtain debugging details using the `kubectl` `describe` command.

```
kubectl describe batchtransformjob xgboost-mnist-batch-transform
```

Your output should look like the following:

```
Name:         xgboost-mnist-batch-transform
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"sagemaker.aws.amazon.com/v1","kind":"BatchTransformJob","metadata":{"annotations":{},"name":"xgboost-mnist","namespace"...
API Version:  sagemaker.aws.amazon.com/v1
Kind:         BatchTransformJob
Metadata:
  Creation Timestamp:  2019-11-18T03:44:00Z
  Finalizers:
    sagemaker-operator-finalizer
  Generation:        2
  Resource Version:  21990924
  Self Link:         /apis/sagemaker.aws.amazon.com/v1/namespaces/default/batchtransformjobs/xgboost-mnist
  UID:               a88fb198-09b5-11ea-ac44-0aa8a9UIDNUM
Spec:
  Model Name:  TrainingJob-20190814SMJOb-IKEB
  Region:      us-east-1
  Transform Input:
    Content Type:  text/csv
    Data Source:
      S 3 Data Source:
        S 3 Data Type:  S3Prefix
        S 3 Uri:        s3://amzn-s3-demo-bucket/mnist_kmeans_example/input
  Transform Job Name:   xgboost-mnist-a88fb19809b511eaac440aa8a9SMJOB
  Transform Output:
    S 3 Output Path:  s3://amzn-s3-demo-bucket/mnist_kmeans_example/output
  Transform Resources:
    Instance Count:  1
    Instance Type:   ml.m4.xlarge
Status:
  Last Check Time:                2019-11-19T22:50:40Z
  Sage Maker Transform Job Name:  xgboost-mnist-a88fb19809b511eaac440aaSMJOB
  Transform Job Status:           Completed
Events:                           <none>
```

### View logs from BatchTransformJobs
<a name="view-logs-from-batch-transform-jobs"></a>

Use the following command to see the logs from the `xgboost-mnist-batch-transform` batch transform job:

```
kubectl smlogs batchtransformjob xgboost-mnist-batch-transform
```

### Delete a BatchTransformJob
<a name="delete-a-batch-transform-job"></a>

Use the following command to stop a batch transform job in SageMaker AI.

```
kubectl delete batchTransformJob xgboost-mnist-batch-transform
```

Your output should look like the following:

```
batchtransformjob.sagemaker.aws.amazon.com "xgboost-mnist-batch-transform" deleted
```

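This command removes the batch transform job from your Kubernetes cluster and stops it in SageMaker AI. Jobs that have stopped or completed do not incur any charges for SageMaker AI resources. The delete command takes about 2 minutes to clean up the resources from SageMaker AI.

**Note**: SageMaker AI does not delete batch transform jobs. Stopped jobs continue to show on the SageMaker AI console.

## The HostingDeployment operator
<a name="hosting-deployment-operator"></a>

HostingDeployment operators support creating and deleting an endpoint, as well as updating an existing endpoint, for real-time inference. The hosting deployment operator reconciles your specified hosting deployment job spec to SageMaker AI by creating models, endpoint configurations, and endpoints in SageMaker AI. You can learn more about SageMaker AI inference in the SageMaker AI [CreateEndpoint API documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateEndpoint.html).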

**Topics**
+ [Configure a HostingDeployment resource](#configure-a-hostingdeployment-resource)
+ [Create a HostingDeployment](#create-a-hostingdeployment)
+ [List HostingDeployments](#list-hostingdeployments)
+ [Describe a HostingDeployment](#describe-a-hostingdeployment)
+ [Invoking the endpoint](#invoking-the-endpoint)
+ [Update HostingDeployment](#update-hostingdeployment)
+ [Delete the HostingDeployment](#delete-the-hostingdeployment)

### Configure a HostingDeployment resource
<a name="configure-a-hostingdeployment-resource"></a>

Download the sample YAML file for the hosting deployment job using the following command:

```
wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/samples/xgboost-mnist-hostingdeployment.yaml
```

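The `xgboost-mnist-hostingdeployment.yaml` file has the following components that can be edited as required:
+ *ProductionVariants*. A production variant is a set of instances serving a single model. SageMaker AI load-balances between all production variants according to set weights.
+ *Models*. A model is the containers and execution role ARN necessary to serve a model. It requires at least a single container.
+ *Containers*. A container specifies the dataset and serving image. If you are using your own custom algorithm instead of an algorithm provided by SageMaker AI, the inference code must meet SageMaker AI requirements. For more information, see [Using Your Own Algorithms with SageMaker AI](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms.html).

### Create a HostingDeployment
<a name="create-a-hostingdeployment"></a>

To create a HostingDeployment, use `kubectl` to apply the file `hosting.yaml` with the following command. This example assumes that the edited configuration is saved as `hosting.yaml`; if you kept the downloaded file name, pass `xgboost-mnist-hostingdeployment.yaml` instead.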

```
kubectl apply -f hosting.yaml
```

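SageMaker AI creates an endpoint with the specified configuration. You incur charges for the SageMaker AI resources used during the lifetime of your endpoint. You do not incur any charges once your endpoint is deleted.

The creation process takes approximately 10 minutes.

### List HostingDeployments
<a name="list-hostingdeployments"></a>

To verify that the HostingDeployment was created, use the following command: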

```
kubectl get hostingdeployments
```

Your output should look like the following:

```
NAME           STATUS     SAGEMAKER-ENDPOINT-NAME
host-xgboost   Creating   host-xgboost-def0e83e0d5f11eaaa450aSMLOGS
```

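#### HostingDeployment status values
<a name="hostingdeployment-status-values"></a>

The status field can have one of the following values:
+ `SynchronizingK8sJobWithSageMaker`: The operator is preparing to create the endpoint.
+ `ReconcilingEndpoint`: The operator is creating, updating, or deleting endpoint resources. If the HostingDeployment remains in this state, use `kubectl describe` to see the reason in the `Additional` field.
+ `OutOfService`: The endpoint is not available to take incoming requests.
+ `Creating`: [CreateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateEndpoint.html) is running.
+ `Updating`: [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/API_UpdateEndpoint.html) or [UpdateEndpointWeightsAndCapacities](https://docs.aws.amazon.com/sagemaker/latest/dg/API_UpdateEndpointWeightsAndCapacities.html) is running.
+ `SystemUpdating`: The endpoint is undergoing maintenance and cannot be updated, deleted, or rescaled until the maintenance completes. This maintenance operation does not change any customer-specified values, such as VPC configuration, AWS KMS encryption, model, instance type, or instance count.
+ `RollingBack`: The endpoint failed to scale up or down or to change its variant weight, and is rolling back to its previous configuration. After the rollback completes, the endpoint returns to the `InService` status. This transitional status applies only to an endpoint that has autoscaling turned on and is undergoing variant weight or capacity changes, either as part of an [UpdateEndpointWeightsAndCapacities](https://docs.aws.amazon.com/sagemaker/latest/dg/API_UpdateEndpointWeightsAndCapacities.html) call or when the [UpdateEndpointWeightsAndCapacities](https://docs.aws.amazon.com/sagemaker/latest/dg/API_UpdateEndpointWeightsAndCapacities.html) operation is called explicitly.
+ `InService`: The endpoint is available to process incoming requests.
+ `Deleting`: [DeleteEndpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/API_DeleteEndpoint.html) is running.
+ `Failed`: The endpoint could not be created, updated, or rescaled. Use [DescribeEndpoint:FailureReason](https://docs.aws.amazon.com/sagemaker/latest/dg/API_DescribeEndpoint.html#SageMaker-DescribeEndpoint-response-FailureReason) for information about the failure. [DeleteEndpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/API_DeleteEndpoint.html) is the only operation that can be performed on a failed endpoint.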
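Because an endpoint passes through several of these states before it settles, a small polling helper can be convenient in scripts. The following is a minimal sketch, not part of the operator: the function name `wait_for_endpoint` is hypothetical, and it assumes that the status shown by `kubectl` is exposed in the custom resource at `.status.endpointStatus`.

```shell
# Poll a HostingDeployment until its endpoint reaches a settled state.
# Assumption: the endpoint status is exposed at .status.endpointStatus.
# Prints the final status; returns 0 for InService, 1 for a failed or
# out-of-service endpoint, and 2 if the retry budget is used up.
wait_for_endpoint() {
    name=$1
    max_tries=${2:-40}   # 40 tries x 30 s = 20 minutes by default
    interval=${3:-30}    # seconds to sleep between checks
    i=0
    while [ "$i" -lt "$max_tries" ]; do
        status=$(kubectl get hostingdeployments "$name" -o jsonpath='{.status.endpointStatus}')
        case "$status" in
            InService) echo "InService"; return 0 ;;
            Failed|OutOfService) echo "$status"; return 1 ;;
        esac
        i=$((i + 1))
        sleep "$interval"
    done
    echo "Timeout"
    return 2
}
```

For example, `wait_for_endpoint host-xgboost` blocks until the endpoint is `InService` or has failed. All other states, such as `Creating` or `ReconcilingEndpoint`, are treated as still in progress.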

### Describe a HostingDeployment
<a name="describe-a-hostingdeployment"></a>

You can obtain debugging details by using the `kubectl describe` command.

```
kubectl describe hostingdeployment
```

Your output should look like the following:

```
Name:         host-xgboost
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"sagemaker.aws.amazon.com/v1","kind":"HostingDeployment","metadata":{"annotations":{},"name":"host-xgboost","namespace":"def..."
API Version:  sagemaker.aws.amazon.com/v1
Kind:         HostingDeployment
Metadata:
  Creation Timestamp:  2019-11-22T19:40:00Z
  Finalizers:
    sagemaker-operator-finalizer
  Generation:        1
  Resource Version:  4258134
  Self Link:         /apis/sagemaker.aws.amazon.com/v1/namespaces/default/hostingdeployments/host-xgboost
  UID:               def0e83e-0d5f-11ea-aa45-0a3507uiduid
Spec:
  Containers:
    Container Hostname:  xgboost
    Image:               123456789012.dkr.ecr.us-east-2.amazonaws.com/xgboost:latest
    Model Data URL:      s3://amzn-s3-demo-bucket/inference/xgboost-mnist/model.tar.gz
  Models:
    Containers:
      xgboost
    Execution Role Arn:  arn:aws:iam::123456789012:role/service-role/AmazonSageMaker-ExecutionRole
    Name:                xgboost-model
    Primary Container:   xgboost
  Production Variants:
    Initial Instance Count:  1
    Instance Type:           ml.c5.large
    Model Name:              xgboost-model
    Variant Name:            all-traffic
  Region:                    us-east-2
Status:
  Creation Time:         2019-11-22T19:40:04Z
  Endpoint Arn:          arn:aws:sagemaker:us-east-2:123456789012:endpoint/host-xgboost-def0e83e0d5f11eaaaexample
  Endpoint Config Name:  host-xgboost-1-def0e83e0d5f11e-e08f6c510d5f11eaaa450aexample
  Endpoint Name:         host-xgboost-def0e83e0d5f11eaaa450a350733ba06
  Endpoint Status:       Creating
  Endpoint URL:          https://runtime.sagemaker.us-east-2.amazonaws.com/endpoints/host-xgboost-def0e83e0d5f11eaaaexample/invocations
  Last Check Time:       2019-11-22T19:43:57Z
  Last Modified Time:    2019-11-22T19:40:04Z
  Model Names:
    Name:   xgboost-model
    Value:  xgboost-model-1-def0e83e0d5f11-df5cc9fd0d5f11eaaa450aexample
Events:     <none>
```

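The status field provides more information through the following fields:
+ `Additional`: Additional information about the status of the hosting deployment. This field is optional and is populated only in case of an error.
+ `Creation Time`: The time when the endpoint was created in SageMaker AI.
+ `Endpoint ARN`: The SageMaker AI endpoint ARN.
+ `Endpoint Config Name`: The SageMaker AI name of the endpoint configuration.
+ `Endpoint Name`: The SageMaker AI name of the endpoint.
+ `Endpoint Status`: The status of the endpoint.
+ `Endpoint URL`: The HTTPS URL that can be used to access the endpoint. For more information, see [Deploy a Model on SageMaker AI Hosting Services](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html).
+ `FailureReason`: If a create, update, or delete command fails, the cause is shown here.
+ `Last Check Time`: The last time the operator checked the status of the endpoint.
+ `Last Modified Time`: The last time the endpoint was modified.
+ `Model Names`: Key-value pairs that map HostingDeployment model names to SageMaker AI model names.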
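The fields above can also be read in scripts rather than copied from the `describe` output. The sketch below uses two hypothetical helper names. It relies on the `status.endpointName` field of the HostingDeployment resource and on the URL pattern visible in the sample `Endpoint URL` value; treat both as assumptions to verify against your own cluster.

```shell
# Print the SageMaker AI endpoint name that the operator recorded
# for the given HostingDeployment (read from status.endpointName).
endpoint_name_of() {
    kubectl get hostingdeployments "$1" -o jsonpath='{.status.endpointName}'
}

# Build the invocation URL from a Region and an endpoint name, in the
# same form as the Endpoint URL value in the sample describe output.
endpoint_url() {
    region=$1
    endpoint=$2
    printf 'https://runtime.sagemaker.%s.amazonaws.com/endpoints/%s/invocations\n' "$region" "$endpoint"
}
```

For example, `endpoint_url us-east-2 "$(endpoint_name_of host-xgboost)"` prints the URL that an HTTP client would call.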

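### Invoke the endpoint
<a name="invoking-the-endpoint"></a>

After the endpoint status is `InService`, you can invoke the endpoint in two ways: with the AWS CLI, which handles authentication and URL request signing, or with an HTTP client such as cURL. If you use your own client, you must handle AWS Signature Version 4 URL signing and authentication yourself.

To invoke the endpoint with the AWS CLI, run the following command. Replace the Region and endpoint name with your endpoint's Region and SageMaker AI endpoint name. You can get this information from the output of `kubectl describe`.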

```
# Invoke the endpoint with mock input data.
aws sagemaker-runtime invoke-endpoint \
  --region us-east-2 \
  --endpoint-name <endpoint name> \
  --body $(seq 784 | xargs echo | sed 's/ /,/g') \
  >(cat) \
  --content-type text/csv > /dev/null
```

For example, if your Region is `us-east-2` and your endpoint name is `host-xgboost-f56b6b280d7511ea824b1299example`, the following command invokes the endpoint:

```
aws sagemaker-runtime invoke-endpoint \
  --region us-east-2 \
  --endpoint-name host-xgboost-f56b6b280d7511ea824b1299example \
  --body $(seq 784 | xargs echo | sed 's/ /,/g') \
  >(cat) \
  --content-type text/csv > /dev/null
4.95847082138
```

Here, `4.95847082138` is the model's prediction for the mock data.
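The `--body` argument in the commands above is built by a small pipeline that prints the integers 1 through 784 as one comma-separated row, presumably because a flattened 28 × 28 MNIST image has 784 values. The following sketch wraps the same pipeline in a function so that you can inspect what is sent; the name `mock_csv_row` is hypothetical.

```shell
# Print one CSV row containing the integers 1..N.
# seq prints one number per line, xargs echo joins them with spaces,
# and sed turns every space into a comma.
mock_csv_row() {
    seq "$1" | xargs echo | sed 's/ /,/g'
}

# A short row makes the format easy to see.
mock_csv_row 5
```

Calling `mock_csv_row 784` produces the same mock payload that the sample `invoke-endpoint` command passes to `--body`.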

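### Update a HostingDeployment
<a name="update-hostingdeployment"></a>

1. You can update a HostingDeployment after its status is `InService`. It might take about 10 minutes for the HostingDeployment to be in service. To verify that the status is `InService`, use the following command: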

   ```
   kubectl get hostingdeployments
   ```

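1. You can also submit an update before the status is `InService`. In that case, the operator waits until the SageMaker AI endpoint is `InService` before it applies the update.

   To apply an update, modify the `hosting.yaml` file. For example, change the `initialInstanceCount` field from 1 to 2 as follows: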

   ```
   apiVersion: sagemaker.aws.amazon.com/v1
   kind: HostingDeployment
   metadata:
     name: host-xgboost
   spec:
       region: us-east-2
       productionVariants:
           - variantName: all-traffic
             modelName: xgboost-model
             initialInstanceCount: 2
             instanceType: ml.c5.large
       models:
           - name: xgboost-model
             executionRoleArn: arn:aws:iam::123456789012:role/service-role/AmazonSageMaker-ExecutionRole
             primaryContainer: xgboost
             containers:
               - xgboost
       containers:
           - containerHostname: xgboost
             modelDataUrl: s3://amzn-s3-demo-bucket/inference/xgboost-mnist/model.tar.gz
             image: 123456789012.dkr.ecr.us-east-2.amazonaws.com/xgboost:latest
   ```

1. Save the file, then use `kubectl` to apply your update as follows. You should see the status change from `InService` to `ReconcilingEndpoint`, and then to `Updating`.

   ```
   $ kubectl apply -f hosting.yaml
   hostingdeployment.sagemaker.aws.amazon.com/host-xgboost configured
   
   $ kubectl get hostingdeployments
   NAME           STATUS                SAGEMAKER-ENDPOINT-NAME
   host-xgboost   ReconcilingEndpoint   host-xgboost-def0e83e0d5f11eaaa450a350abcdef
   
   $ kubectl get hostingdeployments
   NAME           STATUS     SAGEMAKER-ENDPOINT-NAME
   host-xgboost   Updating   host-xgboost-def0e83e0d5f11eaaa450a3507abcdef
   ```

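SageMaker AI deploys a new set of instances with your models, switches traffic to the new instances, and drains the old instances. As soon as this process begins, the status becomes `Updating`. After the update completes, the endpoint status becomes `InService`. This process takes approximately 10 minutes.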
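If you change the instance count often, the manual edit of `hosting.yaml` can be scripted. The following is a minimal sketch with a hypothetical function name. It assumes that the spec uses the `initialInstanceCount: <n>` layout shown in the sample above, and it rewrites every production variant in the file, not just one.

```shell
# Print FILE with every initialInstanceCount value replaced by COUNT.
# Note: this changes all production variants in the file.
set_instance_count() {
    file=$1
    count=$2
    sed "s/\(initialInstanceCount:\).*/\1 $count/" "$file"
}
```

For example, `set_instance_count hosting.yaml 2 > hosting-updated.yaml` writes an updated copy that you can review before running `kubectl apply -f hosting-updated.yaml`.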

### Delete the HostingDeployment
<a name="delete-the-hostingdeployment"></a>

1. Use `kubectl` to delete a HostingDeployment with the following command:

   ```
   kubectl delete hostingdeployments host-xgboost
   ```

   Your output should look like the following:

   ```
   hostingdeployment.sagemaker.aws.amazon.com "host-xgboost" deleted
   ```

1. To verify that the hosting deployment has been deleted, use the following command:

   ```
   kubectl get hostingdeployments
   No resources found.
   ```

After the endpoint is deleted, its SageMaker AI resources no longer incur charges.

## The ProcessingJob operator
<a name="kubernetes-processing-job-operator"></a>

ProcessingJob operators are used to launch Amazon SageMaker processing jobs. For more information about SageMaker Processing jobs, see [CreateProcessingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateProcessingJob.html).

**Topics**
+ [Create a ProcessingJob using a YAML file](#kubernetes-processing-job-yaml)
+ [List ProcessingJobs](#kubernetes-processing-job-list)
+ [Describe a ProcessingJob](#kubernetes-processing-job-description)
+ [Delete a ProcessingJob](#kubernetes-processing-job-delete)

### Create a ProcessingJob using a YAML file
<a name="kubernetes-processing-job-yaml"></a>

Follow these steps to create an Amazon SageMaker processing job by using a YAML file:

1. Download the `kmeans_preprocessing.py` preprocessing script.

   ```
   wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/samples/kmeans_preprocessing.py
   ```

1. In one of your Amazon Simple Storage Service (Amazon S3) buckets, create a `mnist_kmeans_example/processing_code` folder and upload the script to that folder.

1. Download the `kmeans-mnist-processingjob.yaml` file.

   ```
   wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/samples/kmeans-mnist-processingjob.yaml
   ```

1. Edit the YAML file to specify your `sagemaker-execution-role` and replace all occurrences of `amzn-s3-demo-bucket` with your S3 bucket.

   ```
   ...
   metadata:
     name: kmeans-mnist-processing
   ...
     roleArn: arn:aws:iam::<acct-id>:role/service-role/<sagemaker-execution-role>
     ...
     processingOutputConfig:
       outputs:
         ...
             s3Output:
               s3Uri: s3://<amzn-s3-demo-bucket>/mnist_kmeans_example/output/
     ...
     processingInputs:
       ...
           s3Input:
             s3Uri: s3://<amzn-s3-demo-bucket>/mnist_kmeans_example/processing_code/kmeans_preprocessing.py
   ```

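   The `sagemaker-execution-role` must have permissions so that SageMaker AI can access your S3 bucket, Amazon CloudWatch, and other services on your behalf. For more information about creating an execution role, see [SageMaker AI Roles](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html#sagemaker-roles-createtrainingjob-perms).

1. Apply the YAML file using one of the following commands.

   For cluster-scoped installation: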

   ```
   kubectl apply -f kmeans-mnist-processingjob.yaml
   ```

   For namespace-scoped installation:

   ```
   kubectl apply -f kmeans-mnist-processingjob.yaml -n <NAMESPACE>
   ```
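The upload and bucket-substitution steps above can be scripted before you apply the file. The following is a minimal sketch with hypothetical function names. It assumes that the AWS CLI is configured and that the placeholder appears in the YAML file either as `amzn-s3-demo-bucket` or as `<amzn-s3-demo-bucket>`.

```shell
# Print FILE with every occurrence of the placeholder bucket name
# replaced by BUCKET. The bracketed form is handled first so that
# no stray angle brackets remain in the output.
fill_bucket_name() {
    file=$1
    bucket=$2
    sed -e "s/<amzn-s3-demo-bucket>/$bucket/g" \
        -e "s/amzn-s3-demo-bucket/$bucket/g" "$file"
}

# Upload the preprocessing script to the folder that the job spec
# points at (mnist_kmeans_example/processing_code).
upload_processing_code() {
    bucket=$1
    aws s3 cp kmeans_preprocessing.py "s3://$bucket/mnist_kmeans_example/processing_code/kmeans_preprocessing.py"
}
```

For example, `fill_bucket_name kmeans-mnist-processingjob.yaml my-bucket > processingjob.yaml` writes an edited copy, where `my-bucket` stands in for your own bucket name. You still need to set the execution role ARN by hand.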

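### List ProcessingJobs
<a name="kubernetes-processing-job-list"></a>

Use one of the following commands to list all the jobs created using the ProcessingJob operator. `SAGEMAKER-JOB-NAME` comes from the `metadata` section of the YAML file.

For cluster-scoped installation: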

```
kubectl get ProcessingJob kmeans-mnist-processing
```

For namespace-scoped installation:

```
kubectl get ProcessingJob -n <NAMESPACE> kmeans-mnist-processing
```

Your output should look similar to the following:

```
NAME                    STATUS     CREATION-TIME        SAGEMAKER-JOB-NAME
kmeans-mnist-processing InProgress 2020-09-22T21:13:25Z kmeans-mnist-processing-7410ed52fd1811eab19a165ae9f9e385
```

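The output lists all jobs regardless of their status. To remove a job from the list, see [Delete a Processing Job](https://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-processing-job-operator.html#kubernetes-processing-job-delete).

**ProcessingJob status**
+ `SynchronizingK8sJobWithSageMaker` – The job is first submitted to the cluster. The operator has received the request and is preparing to create the processing job.
+ `Reconciling` – The operator is initializing or recovering from transient errors, along with others. If the processing job remains in this state, use the `kubectl describe` command to see the reason in the `Additional` field.
+ `InProgress | Completed | Failed | Stopping | Stopped` – The status of the SageMaker Processing job. For more information, see [DescribeProcessingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeProcessingJob.html#sagemaker-DescribeProcessingJob-response-ProcessingJobStatus).
+ `Error` – The operator cannot recover by reconciling.

Jobs that have completed, stopped, or failed do not incur further charges for SageMaker AI resources.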
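When a script waits for a processing job, it helps to group the status values above into "finished" and "still working". The following sketch shows one way to do that. The function name is hypothetical, and treating `Error` as finished is a reading of the description above, not documented behavior.

```shell
# Classify a ProcessingJob status string.
# Prints "done" for states that will not change on their own,
# "running" for states that are still progressing, and
# "unknown" for anything else.
processing_job_phase() {
    case "$1" in
        Completed|Failed|Stopped|Error) echo "done" ;;
        SynchronizingK8sJobWithSageMaker|Reconciling|InProgress|Stopping) echo "running" ;;
        *) echo "unknown" ;;
    esac
}
```

You could feed it the status of a job, for example with `kubectl get processingjob kmeans-mnist-processing -o jsonpath='{.status.processingJobStatus}'`. That field path is an assumption inferred from the `Processing Job Status` line in the `describe` output.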

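### Describe a ProcessingJob
<a name="kubernetes-processing-job-description"></a>

Use one of the following commands to get more details about a processing job. These commands are typically used to debug a problem or to check the parameters of a processing job.

For cluster-scoped installation: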

```
kubectl describe processingjob kmeans-mnist-processing
```

For namespace-scoped installation:

```
kubectl describe processingjob kmeans-mnist-processing -n <NAMESPACE>
```

The output for your processing job should look similar to the following.

```
$ kubectl describe ProcessingJob kmeans-mnist-processing
Name:         kmeans-mnist-processing
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"sagemaker.aws.amazon.com/v1","kind":"ProcessingJob","metadata":{"annotations":{},"name":"kmeans-mnist-processing",...
API Version:  sagemaker.aws.amazon.com/v1
Kind:         ProcessingJob
Metadata:
  Creation Timestamp:  2020-09-22T21:13:25Z
  Finalizers:
    sagemaker-operator-finalizer
  Generation:        2
  Resource Version:  21746658
  Self Link:         /apis/sagemaker.aws.amazon.com/v1/namespaces/default/processingjobs/kmeans-mnist-processing
  UID:               7410ed52-fd18-11ea-b19a-165ae9f9e385
Spec:
  App Specification:
    Container Entrypoint:
      python
      /opt/ml/processing/code/kmeans_preprocessing.py
    Image Uri:  763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:1.5.0-cpu-py36-ubuntu16.04
  Environment:
    Name:   MYVAR
    Value:  my_value
    Name:   MYVAR2
    Value:  my_value2
  Network Config:
  Processing Inputs:
    Input Name:  mnist_tar
    s3Input:
      Local Path:   /opt/ml/processing/input
      s3DataType:   S3Prefix
      s3InputMode:  File
      s3Uri:        s3://<s3bucket>-us-west-2/algorithms/kmeans/mnist/mnist.pkl.gz
    Input Name:     source_code
    s3Input:
      Local Path:   /opt/ml/processing/code
      s3DataType:   S3Prefix
      s3InputMode:  File
      s3Uri:        s3://<s3bucket>/mnist_kmeans_example/processing_code/kmeans_preprocessing.py
  Processing Output Config:
    Outputs:
      Output Name:  train_data
      s3Output:
        Local Path:    /opt/ml/processing/output_train/
        s3UploadMode:  EndOfJob
        s3Uri:         s3://<s3bucket>/mnist_kmeans_example/output/
      Output Name:     test_data
      s3Output:
        Local Path:    /opt/ml/processing/output_test/
        s3UploadMode:  EndOfJob
        s3Uri:         s3://<s3bucket>/mnist_kmeans_example/output/
      Output Name:     valid_data
      s3Output:
        Local Path:    /opt/ml/processing/output_valid/
        s3UploadMode:  EndOfJob
        s3Uri:         s3://<s3bucket>/mnist_kmeans_example/output/
  Processing Resources:
    Cluster Config:
      Instance Count:     1
      Instance Type:      ml.m5.xlarge
      Volume Size In GB:  20
  Region:                 us-west-2
  Role Arn:               arn:aws:iam::<acct-id>:role/m-sagemaker-role
  Stopping Condition:
    Max Runtime In Seconds:  1800
  Tags:
    Key:    tagKey
    Value:  tagValue
Status:
  Cloud Watch Log URL:             https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logStream:group=/aws/sagemaker/ProcessingJobs;prefix=kmeans-mnist-processing-7410ed52fd1811eab19a165ae9f9e385;streamFilter=typeLogStreamPrefix
  Last Check Time:                 2020-09-22T21:14:29Z
  Processing Job Status:           InProgress
  Sage Maker Processing Job Name:  kmeans-mnist-processing-7410ed52fd1811eab19a165ae9f9e385
Events:                            <none>
```

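### Delete a ProcessingJob
<a name="kubernetes-processing-job-delete"></a>

When you delete a processing job, the SageMaker Processing job is removed from Kubernetes but is not deleted from SageMaker AI. If the job status in SageMaker AI is `InProgress`, the job is stopped. Processing jobs that are stopped do not incur any charges for SageMaker AI resources. Use one of the following commands to delete a processing job.

For cluster-scoped installation: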

```
kubectl delete processingjob kmeans-mnist-processing
```

For namespace-scoped installation:

```
kubectl delete processingjob kmeans-mnist-processing -n <NAMESPACE>
```

The output for your processing job should look similar to the following.

```
processingjob.sagemaker.aws.amazon.com "kmeans-mnist-processing" deleted
```


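**Note**  
SageMaker AI does not delete the processing job. Stopped jobs continue to show in the SageMaker AI console. The `delete` command takes a few minutes to clean up the resources from SageMaker AI.

## The HostingAutoscalingPolicy (HAP) operator
<a name="kubernetes-hap-operator"></a>

The HostingAutoscalingPolicy (HAP) operator takes a list of resource IDs as input and applies the same policy to each of them. Each resource ID is a combination of an endpoint name and a variant name. The HAP operator performs two steps: it registers the resource IDs, and then it applies the scaling policy to each resource ID. `Delete` undoes both actions. You can apply the HAP to an existing SageMaker AI endpoint, or you can create a new SageMaker AI endpoint using the [HostingDeployment operator](https://docs.aws.amazon.com/sagemaker/latest/dg/hosting-deployment-operator.html#create-a-hostingdeployment). You can read more about SageMaker AI autoscaling in the [Application Autoscaling Policy documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling.html).

**Note**  
In your `kubectl` commands, you can use the short form, `hap`, in place of `hostingautoscalingpolicy`.

**Topics**
+ [Create a HostingAutoscalingPolicy using a YAML file](#kubernetes-hap-job-yaml)
+ [List HostingAutoscalingPolicies](#kubernetes-hap-list)
+ [Describe a HostingAutoscalingPolicy](#kubernetes-hap-describe)
+ [Update a HostingAutoscalingPolicy](#kubernetes-hap-update)
+ [Delete a HostingAutoscalingPolicy](#kubernetes-hap-delete)
+ [Update or delete an endpoint with a HostingAutoscalingPolicy](#kubernetes-hap-update-delete-endpoint)

### Create a HostingAutoscalingPolicy using a YAML file
<a name="kubernetes-hap-job-yaml"></a>

Use a YAML file to create a HostingAutoscalingPolicy (HAP) that applies a predefined or custom metric to one or more SageMaker AI endpoints.

Amazon SageMaker AI requires specific values in order to apply autoscaling to your variant. If these values are not specified in the YAML spec, the HAP operator applies the following default values.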

```
# Do not change
Namespace                    = "sagemaker"
# Do not change
ScalableDimension            = "sagemaker:variant:DesiredInstanceCount"
# Only one supported
PolicyType                   = "TargetTrackingScaling"
# This is the default policy name but can be changed to apply a custom policy
DefaultAutoscalingPolicyName = "SageMakerEndpointInvocationScalingPolicy"
```

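Use the following samples to create a HAP that applies a predefined or custom metric to one or more endpoints.

#### Sample 1: Apply a predefined metric to a single endpoint variant
<a name="kubernetes-hap-predefined-metric"></a>

1. Download the sample YAML file for a predefined metric using the following command: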

   ```
   wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/samples/hap-predefined-metric.yaml
   ```

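1. Edit the YAML file to specify your `endpointName`, `variantName`, and `Region`.

1. Use one of the following commands to apply a predefined metric to a single resource ID (endpoint name and variant name combination).

   For cluster-scoped installation: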

   ```
   kubectl apply -f hap-predefined-metric.yaml
   ```

   For namespace-scoped installation:

   ```
   kubectl apply -f hap-predefined-metric.yaml -n <NAMESPACE>
   ```

#### Sample 2: Apply a custom metric to a single endpoint variant
<a name="kubernetes-hap-custom-metric"></a>

1. Download the sample YAML file for a custom metric using the following command:

   ```
   wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/samples/hap-custom-metric.yaml
   ```

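1. Edit the YAML file to specify your `endpointName`, `variantName`, and `Region`.

1. Use one of the following commands to apply a custom metric to a single resource ID (endpoint name and variant name combination) in place of the recommended `SageMakerVariantInvocationsPerInstance`.

   **Note**  
   Amazon SageMaker AI does not check the validity of your YAML spec.

   For cluster-scoped installation: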

   ```
   kubectl apply -f hap-custom-metric.yaml
   ```

   For namespace-scoped installation:

   ```
   kubectl apply -f hap-custom-metric.yaml -n <NAMESPACE>
   ```

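#### Sample 3: Apply a scaling policy to multiple endpoints and variants
<a name="kubernetes-hap-scaling-policy"></a>

You can use the HAP operator to apply the same scaling policy to multiple resource IDs. A separate `scaling_policy` request is created for each resource ID (endpoint name and variant name combination).

1. Download the sample YAML file for a predefined metric using the following command: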

   ```
   wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/samples/hap-predefined-metric.yaml
   ```

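1. Edit the YAML file to specify your `Region` and multiple `endpointName` and `variantName` values.

1. Use one of the following commands to apply a predefined metric to multiple resource IDs (endpoint name and variant name combinations).

   For cluster-scoped installation: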

   ```
   kubectl apply -f hap-predefined-metric.yaml
   ```

   For namespace-scoped installation:

   ```
   kubectl apply -f hap-predefined-metric.yaml -n <NAMESPACE>
   ```

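#### Considerations for HostingAutoscalingPolicies for multiple endpoints and variants
<a name="kubernetes-hap-scaling-considerations"></a>

The following considerations apply when you use multiple resource IDs:
+ If you apply a single policy across multiple resource IDs, one PolicyARN is created per resource ID. Having five endpoints results in five PolicyARNs. When you run the `describe` command on the policy, the response shows up as one job and includes a single job status.
+ If you apply a custom metric to multiple resource IDs, the same dimension or value is used for all the resource ID (variant) values. For example, if you apply a custom metric for instances 1-5, and the endpoint variant dimension is mapped to variant 1, then all endpoints are scaled up or down when variant 1 exceeds the metric.
+ The HAP operator supports updating the list of resource IDs. If you modify, add, or delete resource IDs in the spec, the autoscaling policy is removed from the previous list of variants and applied to the newly specified resource ID combinations. Use the [`describe`](https://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-hap-operator.html#kubernetes-hap-describe) command to list the resource IDs to which the policy is currently applied.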
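To make the "endpoint name and variant name combination" concrete: Application Auto Scaling identifies a SageMaker AI variant with a resource ID of the form `endpoint/<endpoint-name>/variant/<variant-name>`. The sketch below prints those IDs for a list of pairs. The function names and the `endpoint:variant` argument convention are hypothetical, chosen only for this illustration.

```shell
# Print the Application Auto Scaling resource ID for one endpoint variant.
hap_resource_id() {
    printf 'endpoint/%s/variant/%s\n' "$1" "$2"
}

# Print one resource ID per "endpoint:variant" pair given as arguments.
hap_resource_ids() {
    for pair in "$@"; do
        # ${pair%%:*} is the text before the first colon (endpoint name);
        # ${pair#*:} is the text after it (variant name).
        hap_resource_id "${pair%%:*}" "${pair#*:}"
    done
}
```

For example, `hap_resource_ids endpoint-a:all-traffic endpoint-b:all-traffic` prints two IDs, one for each scaling policy that the operator would create.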

### List HostingAutoscalingPolicies
<a name="kubernetes-hap-list"></a>

Use one of the following commands to list all HostingAutoscalingPolicies (HAPs) created using the HAP operator.

For cluster-scoped installation:

```
kubectl get hap
```

For namespace-scoped installation:

```
kubectl get hap -n <NAMESPACE>
```

Your output should look similar to the following:

```
NAME             STATUS   CREATION-TIME
hap-predefined   Created  2021-07-13T21:32:21Z
```

Use the following command to check the status of your HostingAutoscalingPolicy (HAP).

```
kubectl get hap <job-name>
```

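One of the following values is returned:
+ `Reconciling` – Certain types of errors show the status as `Reconciling` instead of `Error`. Examples include server-side errors and endpoints in the `Creating` or `Updating` state. Check the `Additional` field in the status or the operator logs for more details.
+ `Created`
+ `Error`

**To view the autoscaling endpoint to which you applied the policy**

1. Open the Amazon SageMaker AI console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/).

1. In the left side panel, expand **Inference**.

1. Choose **Endpoints**.

1. Select the name of the endpoint of interest.

1. Scroll to the **Endpoint runtime settings** section.

### Describe a HostingAutoscalingPolicy
<a name="kubernetes-hap-describe"></a>

Use the following command to get more details about a HostingAutoscalingPolicy (HAP). These commands are typically used to debug a problem or to check the resource IDs (endpoint name and variant name combinations) of a HAP.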

```
kubectl describe hap <job-name>
```

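### Update a HostingAutoscalingPolicy
<a name="kubernetes-hap-update"></a>

The HostingAutoscalingPolicy (HAP) operator supports updates. You can edit your YAML spec to change the values and then reapply the policy. The HAP operator deletes the existing policy and applies the new policy.

### Delete a HostingAutoscalingPolicy
<a name="kubernetes-hap-delete"></a>

Use one of the following commands to delete a HostingAutoscalingPolicy (HAP) policy.

For cluster-scoped installation: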

```
kubectl delete hap hap-predefined
```

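For namespace-scoped installation:

```
kubectl delete hap hap-predefined -n <NAMESPACE>
```

This command deletes the scaling policy and deregisters the scaling target from Kubernetes. This command returns the following output: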

```
hostingautoscalingpolicies.sagemaker.aws.amazon.com "hap-predefined" deleted
```

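### Update or delete an endpoint with a HostingAutoscalingPolicy
<a name="kubernetes-hap-update-delete-endpoint"></a>

To update an endpoint that has a HostingAutoscalingPolicy (HAP), use the `kubectl delete` command to remove the HAP, update the endpoint, and then reapply the HAP.

To delete an endpoint that has a HAP, use the `kubectl delete` command to remove the HAP before you delete the endpoint.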
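The update order described above can be captured in one helper so that the steps are not run out of sequence. The following is a minimal sketch with a hypothetical function name. It assumes that the endpoint is managed through a HostingDeployment spec file and that the HAP has its own spec file.

```shell
# Remove the HAP, update the endpoint, then reapply the HAP.
# Each step runs only if the previous one succeeded.
update_endpoint_with_hap() {
    hap_name=$1       # for example, hap-predefined
    hap_file=$2       # YAML spec of the HAP
    hosting_file=$3   # updated HostingDeployment spec
    kubectl delete hap "$hap_name" &&
    kubectl apply -f "$hosting_file" &&
    kubectl apply -f "$hap_file"
}
```

For example, `update_endpoint_with_hap hap-predefined hap-predefined-metric.yaml hosting.yaml`. For a namespace-scoped installation you would add `-n <NAMESPACE>` to each `kubectl` call.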

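# Migrate resources to the latest Operators
<a name="kubernetes-sagemaker-operators-migrate"></a>

We are stopping the development and technical support of the original version of [SageMaker Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s/tree/master).

If you are currently using version `v1.2.2` or below of [SageMaker Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s/tree/master), we recommend migrating your resources to the [ACK service controller for Amazon SageMaker](https://github.com/aws-controllers-k8s/sagemaker-controller). The ACK service controller is a new generation of SageMaker Operators for Kubernetes based on [AWS Controllers for Kubernetes (ACK)](https://aws-controllers-k8s.github.io/community/).

For answers to frequently asked questions about the end of support of the original version of SageMaker Operators for Kubernetes, see [Announcing the End of Support of the Original Version of SageMaker AI Operators for Kubernetes](kubernetes-sagemaker-operators-eos-announcement.md).

Use the following steps to migrate your resources and use ACK to train, tune, and deploy machine learning models with Amazon SageMaker AI.

**Note**  
The latest SageMaker AI Operators for Kubernetes are not backwards compatible.

**Topics**
+ [Prerequisites](#migrate-resources-to-new-operators-prerequisites)
+ [Adopt resources](#migrate-resources-to-new-operators-steps)
+ [Clean up old resources](#migrate-resources-to-new-operators-cleanup)
+ [Use the new SageMaker AI Operators for Kubernetes](#migrate-resources-to-new-operators-tutorials)

## Prerequisites
<a name="migrate-resources-to-new-operators-prerequisites"></a>

To successfully migrate resources to the latest SageMaker AI Operators for Kubernetes, you must do the following:

1. Install the latest SageMaker AI Operators for Kubernetes. See [Setup](https://aws-controllers-k8s.github.io/community/docs/tutorials/sagemaker-example/#setup) in *Machine Learning with the ACK SageMaker AI Controller* for step-by-step instructions.

1. If you are using [HostingAutoscalingPolicy resources](#migrate-resources-to-new-operators-hap), install the new Application Auto Scaling operator. See [Setup](https://aws-controllers-k8s.github.io/community/docs/tutorials/autoscaling-example/#setup) in *Scale SageMaker AI Workloads with Application Auto Scaling* for step-by-step instructions. This step is optional if you are not using HostingAutoscalingPolicy resources.

If permissions are configured correctly, the ACK SageMaker AI service controller can determine the specification and status of the AWS resource and reconcile the resource as if the ACK controller had originally created it.

## Adopt resources
<a name="migrate-resources-to-new-operators-steps"></a>

The new SageMaker AI Operators for Kubernetes can adopt resources that were not originally created by the ACK service controller. For more information, see [Adopt Existing AWS Resources](https://aws-controllers-k8s.github.io/community/docs/user-docs/adopted-resource/) in the ACK documentation.

The following steps show how the new SageMaker AI Operators for Kubernetes can adopt an existing SageMaker AI endpoint. Save the following sample to a file named `adopt-endpoint-sample.yaml`.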

```
apiVersion: services.k8s.aws/v1alpha1
kind: AdoptedResource
metadata:
  name: adopt-endpoint-sample
spec:  
  aws:
    # resource to adopt, not created by ACK
    nameOrID: xgboost-endpoint
  kubernetes:
    group: sagemaker.services.k8s.aws
    kind: Endpoint
    metadata:
      # target K8s CR name
      name: xgboost-endpoint
```

Submit the custom resource (CR) using `kubectl apply`:

```
kubectl apply -f adopt-endpoint-sample.yaml
```

Use `kubectl describe` to check the status conditions of your adopted resource.

```
kubectl describe adoptedresource adopt-endpoint-sample
```

Verify that the `ACK.Adopted` condition is `True`. The output should look similar to the following example:

```
---
kind: AdoptedResource
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"services.k8s.aws/v1alpha1","kind":"AdoptedResource","metadata":{"annotations":{},"name":"xgboost-endpoint","namespace":"default"},"spec":{"aws":{"nameOrID":"xgboost-endpoint"},"kubernetes":{"group":"sagemaker.services.k8s.aws","kind":"Endpoint","metadata":{"name":"xgboost-endpoint"}}}}'
  creationTimestamp: '2021-04-27T02:49:14Z'
  finalizers:
  - finalizers.services.k8s.aws/AdoptedResource
  generation: 1
  name: adopt-endpoint-sample
  namespace: default
  resourceVersion: '12669876'
  selfLink: "/apis/services.k8s.aws/v1alpha1/namespaces/default/adoptedresources/adopt-endpoint-sample"
  uid: 35f8fa92-29dd-4040-9d0d-0b07bbd7ca0b
spec:
  aws:
    nameOrID: xgboost-endpoint
  kubernetes:
    group: sagemaker.services.k8s.aws
    kind: Endpoint
    metadata:
      name: xgboost-endpoint
status:
  conditions:
  - status: 'True'
    type: ACK.Adopted
```

Check that your resource exists in your cluster:

```
kubectl describe endpoints.sagemaker xgboost-endpoint
```
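If you adopt several resources, reading the `ACK.Adopted` condition by eye becomes tedious. The following sketch extracts just that condition with a `kubectl` JSONPath filter. The function name is hypothetical, and the path follows the `status.conditions` layout shown in the sample output above.

```shell
# Print the status of the ACK.Adopted condition for an AdoptedResource
# ("True" when the adoption succeeded).
adoption_status() {
    kubectl get adoptedresource "$1" \
        -o jsonpath='{.status.conditions[?(@.type=="ACK.Adopted")].status}'
}
```

For example, `[ "$(adoption_status adopt-endpoint-sample)" = "True" ]` succeeds only after the controller has adopted the endpoint.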

### HostingAutoscalingPolicy resources
<a name="migrate-resources-to-new-operators-hap"></a>

The `HostingAutoscalingPolicy` (HAP) resource consists of multiple Application Auto Scaling resources: `ScalableTarget` and `ScalingPolicy`. Before you adopt a HAP resource with ACK, install the [Application Auto Scaling controller](https://github.com/aws-controllers-k8s/applicationautoscaling-controller). To adopt a HAP resource, you need to adopt both the `ScalableTarget` and `ScalingPolicy` resources. You can find the resource identifiers for these resources in the status of the `HostingAutoscalingPolicy` resource (`status.ResourceIDList`).

### HostingDeployment resources
<a name="migrate-resources-to-new-operators-hosting-deployment"></a>

The `HostingDeployment` resource consists of multiple SageMaker AI resources: `Endpoint`, `EndpointConfig`, and each `Model`. If you adopt a SageMaker AI endpoint in ACK, you need to adopt the `Endpoint`, `EndpointConfig`, and each `Model` separately. You can find the `Endpoint`, `EndpointConfig`, and `Model` names in the status of the `HostingDeployment` resource (`status.endpointName`, `status.endpointConfigName`, and `status.modelNames`).
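Because each part of a HostingDeployment must be adopted on its own, it can help to generate one `AdoptedResource` manifest per resource from a template. The sketch below mirrors the `adopt-endpoint-sample.yaml` layout shown earlier. The function name and the `adopt-` prefix for the manifest name are hypothetical choices for this illustration.

```shell
# Print an AdoptedResource manifest for one SageMaker AI resource.
#   $1: ACK kind (Endpoint, EndpointConfig, or Model)
#   $2: name of the existing resource in SageMaker AI
#   $3: name for the Kubernetes custom resource
#       (lowercase letters, digits, and hyphens)
adopted_resource_manifest() {
    cat <<EOF
apiVersion: services.k8s.aws/v1alpha1
kind: AdoptedResource
metadata:
  name: adopt-$3
spec:
  aws:
    nameOrID: $2
  kubernetes:
    group: sagemaker.services.k8s.aws
    kind: $1
    metadata:
      name: $3
EOF
}
```

For example, `adopted_resource_manifest EndpointConfig <endpoint-config-name> xgboost-endpoint-config | kubectl apply -f -` submits the manifest directly, where `<endpoint-config-name>` is the value of `status.endpointConfigName`.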

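For a list of all supported SageMaker AI resources, see the [ACK API Reference](https://aws-controllers-k8s.github.io/community/reference/).

## Clean up old resources
<a name="migrate-resources-to-new-operators-cleanup"></a>

After the new SageMaker AI Operators for Kubernetes adopt your resources, you can uninstall the old operators and clean up old resources.

### Step 1: Uninstall the old operator
<a name="migrate-resources-to-new-operators-uninstall"></a>

To uninstall the old operator, see [Delete operators](kubernetes-sagemaker-operators-end-of-support.md#delete-operators).

**Warning**  
Uninstall the old operator before deleting any old resources.

### Step 2: Remove finalizers and delete old resources
<a name="migrate-resources-to-new-operators-delete-resources"></a>

**Warning**  
Before deleting old resources, be sure that you have uninstalled the old operator.

After uninstalling the old operator, you must explicitly remove the finalizers to delete old operator resources. The following sample script shows how to delete all training jobs managed by the old operator in a given namespace. You can use a similar pattern to delete the other resources after they are adopted by the new operator.

**Note**  
You must use full resource names to get resources. For example, use `kubectl get trainingjobs.sagemaker.aws.amazon.com` instead of `kubectl get trainingjob`.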

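```
# Namespace that holds the resources created by the old operator (placeholder value).
namespace=sagemaker_namespace
# Collect the names of all TrainingJob custom resources in that namespace.
training_jobs=$(kubectl get trainingjobs.sagemaker.aws.amazon.com -n "$namespace" -ojson | jq -r '.items | .[] | .metadata.name')

for job in $training_jobs
do
    echo "Deleting $job resource in $namespace namespace"
    # Clear the finalizers so that Kubernetes can remove the object, then delete it.
    kubectl patch trainingjobs.sagemaker.aws.amazon.com "$job" -n "$namespace" -p '{"metadata":{"finalizers":null}}' --type=merge
    kubectl delete trainingjobs.sagemaker.aws.amazon.com "$job" -n "$namespace"
done
```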
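The same pattern can be generalized to the other resource types of the old operator by passing the full resource name as a parameter. The following is a minimal sketch with a hypothetical function name. It uses a `kubectl` JSONPath expression instead of `jq` to list the names, which is a substitution for illustration and not the method used in the script above.

```shell
# Remove finalizers from, and then delete, every object of the given
# old-operator resource type in the given namespace.
#   $1: full resource name, for example
#       hostingdeployments.sagemaker.aws.amazon.com
#   $2: namespace
purge_old_resources() {
    resource=$1
    namespace=$2
    names=$(kubectl get "$resource" -n "$namespace" -o jsonpath='{.items[*].metadata.name}')
    for name in $names; do
        echo "Deleting $resource/$name in namespace $namespace"
        kubectl patch "$resource" "$name" -n "$namespace" --type=merge -p '{"metadata":{"finalizers":null}}'
        kubectl delete "$resource" "$name" -n "$namespace"
    done
}
```

For example, `purge_old_resources processingjobs.sagemaker.aws.amazon.com my-namespace`, where `my-namespace` is a placeholder. As with the script above, run this only after you have uninstalled the old operator and the new operator has adopted the resources.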

## Use the new SageMaker AI Operators for Kubernetes
<a name="migrate-resources-to-new-operators-tutorials"></a>

For in-depth guides on using the new SageMaker AI Operators for Kubernetes, see [Use SageMaker AI Operators for Kubernetes](kubernetes-sagemaker-operators-ack.md#kubernetes-sagemaker-operators-ack-use).

# Announcing the End of Support of the Original Version of SageMaker AI Operators for Kubernetes
<a name="kubernetes-sagemaker-operators-eos-announcement"></a>

This page announces the end of support for the original version of [SageMaker AI Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s). It also answers frequently asked questions and provides migration information about the [ACK service controller for Amazon SageMaker AI](https://github.com/aws-controllers-k8s/sagemaker-controller), a new generation of fully supported SageMaker AI Operators for Kubernetes. For general information about the new SageMaker AI Operators for Kubernetes, see [Latest SageMaker AI Operators for Kubernetes](kubernetes-sagemaker-operators-ack.md).

## End of support frequently asked questions
<a name="kubernetes-sagemaker-operators-eos-faq"></a>

**Topics**
+ [Why are we ending support for the original version of SageMaker AI Operators for Kubernetes?](#kubernetes-sagemaker-operators-eos-faq-why)
+ [Where can I find more information about the new SageMaker AI Operators for Kubernetes and ACK?](#kubernetes-sagemaker-operators-eos-faq-more)
+ [What does end of support (EOS) mean?](#kubernetes-sagemaker-operators-eos-faq-definition)
+ [How can I migrate my workload to the new SageMaker AI Operators for Kubernetes for training and inference?](#kubernetes-sagemaker-operators-eos-faq-how)
+ [Which version of ACK should I migrate to?](#kubernetes-sagemaker-operators-eos-faq-version)
+ [Are the initial SageMaker AI Operators for Kubernetes and the new operators (ACK service controller for Amazon SageMaker AI) functionally equivalent?](#kubernetes-sagemaker-operators-eos-faq-parity)

### Why are we ending support for the original version of SageMaker AI Operators for Kubernetes?

Users can now take advantage of the [ACK service controller for Amazon SageMaker AI](https://github.com/aws-controllers-k8s/sagemaker-controller). The ACK service controller is a new generation of SageMaker AI Operators for Kubernetes based on [AWS Controllers for Kubernetes](https://aws-controllers-k8s.github.io/community/) (ACK), a community-driven project optimized for production that standardizes the way AWS services are exposed through a Kubernetes operator. We are therefore announcing the end of support (EOS) for the original, non-ACK-based version of [SageMaker AI Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s). The end of support takes effect on **February 15, 2023**, along with [Amazon Elastic Kubernetes Service Kubernetes 1.21](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html#kubernetes-release-calendar).

For more information about ACK, see [ACK history and tenets](https://aws-controllers-k8s.github.io/community/docs/community/background/).

### Where can I find more information about the new SageMaker AI Operators for Kubernetes and ACK?
<a name="kubernetes-sagemaker-operators-eos-faq-more"></a>
+ For more information about the new SageMaker AI Operators for Kubernetes, see the [ACK service controller](https://github.com/aws-controllers-k8s/sagemaker-controller) for Amazon SageMaker AI GitHub repository or read the [AWS Controllers for Kubernetes documentation](https://aws-controllers-k8s.github.io/community/docs/community/overview/).
+ For a tutorial on how to train a machine learning model with the ACK service controller for Amazon SageMaker AI using Amazon EKS, see this [SageMaker AI example](https://aws-controllers-k8s.github.io/community/docs/tutorials/sagemaker-example/).

  For an autoscaling example, see [Scale SageMaker AI Workloads with Application Auto Scaling](https://aws-controllers-k8s.github.io/community/docs/tutorials/autoscaling-example/).
+ For information about AWS Controllers for Kubernetes (ACK), see the [AWS Controllers for Kubernetes](https://aws-controllers-k8s.github.io/community/) (ACK) documentation.
+ For a list of supported SageMaker AI resources, see the [ACK API Reference](https://aws-controllers-k8s.github.io/community/reference/).

### What does end of support (EOS) mean?
<a name="kubernetes-sagemaker-operators-eos-faq-definition"></a>

Users can continue to use their current operators, but we are no longer developing new features for them, and we will not release patches or security updates for any issues found. `v1.2.2` is the last release of [SageMaker AI Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s/tree/master). Users should migrate their workloads to use the [ACK service controller for Amazon SageMaker AI](https://github.com/aws-controllers-k8s/sagemaker-controller).

### How can I migrate my workload to the new SageMaker AI Operators for Kubernetes for training and inference?
<a name="kubernetes-sagemaker-operators-eos-faq-how"></a>

For information about migrating resources from the old to the new SageMaker AI Operators for Kubernetes, follow [Migrate resources to the latest Operators](kubernetes-sagemaker-operators-migrate.md).

### Which version of ACK should I migrate to?
<a name="kubernetes-sagemaker-operators-eos-faq-version"></a>

Users should migrate to the most recent released version of the [ACK service controller for Amazon SageMaker AI](https://github.com/aws-controllers-k8s/sagemaker-controller/tags).

### Are the initial SageMaker AI Operators for Kubernetes and the new operators (ACK service controller for Amazon SageMaker AI) functionally equivalent?
<a name="kubernetes-sagemaker-operators-eos-faq-parity"></a>

Yes, they are at feature parity.

The key differences between the two versions include the following:
+ The custom resource definitions (CRDs) used by the ACK-based SageMaker AI Operators for Kubernetes follow the AWS API definition, which makes them incompatible with the custom resource specifications of the original version of SageMaker AI Operators for Kubernetes. Refer to the [CRDs](https://github.com/aws-controllers-k8s/sagemaker-controller/tree/main/helm/crds) in the new controller, or use the migration guide to adopt the resources and use the new controller.
+ The `Hosting Autoscaling` policy is no longer part of the new SageMaker AI Operators for Kubernetes and has been migrated to the [Application Auto Scaling](https://github.com/aws-controllers-k8s/applicationautoscaling-controller) ACK controller. To learn how to use the Application Auto Scaling controller to configure autoscaling on SageMaker AI endpoints, follow this [autoscaling example](https://aws-controllers-k8s.github.io/community/docs/tutorials/autoscaling-example/).
+ The `HostingDeployment` resource was used to create models, endpoint configurations, and endpoints in one CRD. The new SageMaker AI Operators for Kubernetes have a separate CRD for each of these resources.

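# SageMaker AI Components for Kubeflow Pipelines
<a name="kubernetes-sagemaker-components-for-kubeflow-pipelines"></a>

With SageMaker AI components for Kubeflow Pipelines, you can create and monitor native SageMaker AI training, tuning, endpoint deployment, and batch transform jobs from your Kubeflow Pipelines. By running Kubeflow Pipelines jobs on SageMaker AI, you move data processing and training jobs from the Kubernetes cluster to SageMaker AI's machine learning-optimized managed service. This document assumes prior knowledge of Kubernetes and Kubeflow.

**Topics**
+ [What are Kubeflow Pipelines?](#what-is-kubeflow-pipelines)
+ [What are Kubeflow Pipeline components?](#kubeflow-pipeline-components)
+ [Why use SageMaker AI Components for Kubeflow Pipelines?](#why-use-sagemaker-components)
+ [SageMaker AI Components for Kubeflow Pipelines versions](#sagemaker-components-versions)
+ [List of SageMaker AI Components for Kubeflow Pipelines](#sagemaker-components-list)
+ [IAM permissions](#iam-permissions)
+ [Converting pipelines to use SageMaker AI](#converting-pipelines-to-use-amazon-sagemaker)
+ [Install Kubeflow Pipelines](kubernetes-sagemaker-components-install.md)
+ [Use SageMaker AI components](kubernetes-sagemaker-components-tutorials.md)

## What are Kubeflow Pipelines?
<a name="what-is-kubeflow-pipelines"></a>

Kubeflow Pipelines (KFP) is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers. The Kubeflow Pipelines platform consists of the following:
+ A user interface (UI) for managing and tracking experiments, jobs, and runs.
+ An engine (Argo) for scheduling multi-step ML workflows.
+ An SDK for defining and manipulating pipelines and components.
+ Notebooks for interacting with the system using the SDK.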

A pipeline is a description of a machine learning (ML) workflow, represented as a [directed acyclic graph](https://www.kubeflow.org/docs/pipelines/concepts/graph/). Each step in the workflow is expressed as a Kubeflow Pipelines [component](https://www.kubeflow.org/docs/pipelines/overview/concepts/component/), which is a Python module.
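
As a rough illustration of what the directed acyclic graph provides, the following sketch orders a few pipeline steps so that each one runs only after the steps it depends on. The step names and the `execution_order` helper are hypothetical and are not part of Kubeflow Pipelines; KFP derives the real graph from how component outputs feed component inputs. The sketch assumes Python 3.9 or later for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical step names. Each key maps a step to the set of steps it depends on.
pipeline_graph = {
    "preprocess": set(),
    "train": {"preprocess"},
    "create_model": {"train"},
    "deploy_endpoint": {"create_model"},
    "batch_transform": {"create_model"},
}


def execution_order(graph):
    """Return one valid execution order.

    TopologicalSorter raises graphlib.CycleError if the graph contains a cycle,
    which is why a pipeline has to be acyclic.
    """
    return list(TopologicalSorter(graph).static_order())


order = execution_order(pipeline_graph)
print(order)
```

The two steps that depend only on `create_model` have no ordering constraint between them, so a scheduler is free to run them in parallel.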

如需與 Kubeflow 管道相關的詳細資訊，請參閱 [Kubeflow 管道文件](https://www.kubeflow.org/docs/pipelines/)。

## 什麼是 Kubeflow 管道元件？
<a name="kubeflow-pipeline-components"></a>

Kubeflow 管道元件是用來執行 Kubeflow 管道中某個步驟的一組程式碼。元件由 Docker 映像中內建的 Python 模組表示。管道執行時，元件的容器會在執行 Kubeflow 之 Kubernetes 叢集的其中一個工作者節點上具現化，並執行您的邏輯。管道元件可以讀取先前元件的輸出，並建立管道中的下一個元件可以消耗的輸出。透過這些元件可以快速、輕鬆地為實驗和生產環境撰寫管道，而不必與基礎 Kubernetes 基礎設施互動。

您可以在 Kubeflow 管道中使用 SageMaker AI 元件。與其將您的邏輯封裝在自訂容器中，您可以使用 Kubeflow Pipelines SDK 載入組件並描述您的管道。管道執行時，您的指示會轉換為 SageMaker AI 任務或部署。然後，工作負載會在 SageMaker AI 的受管基礎設施上執行。

## 為什麼要使用適用於 Kubeflow 管道的 SageMaker AI 元件？
<a name="why-use-sagemaker-components"></a>

適用於 Kubeflow 管道的 SageMaker AI 元件為從 SageMaker AI 啟動運算密集型任務提供一種替代方法。這些元件將 SageMaker AI 與 Kubeflow 管道的可攜性和協同運作整合在一起。透過適用於 Kubeflow 管道的 SageMaker AI 元件，您可以作為 Kubeflow 管道工作流程的一部分來建立和監視 SageMaker AI 資源。管道中的每個任務都在 SageMaker AI 上執行，而不是在本機 Kubernetes 叢集上執行。如此一來，您可以利用 SageMaker AI 的關鍵功能，例如資料標記、大規模超參數調校和分散式訓練任務，或是一鍵式安全且可擴展的模型部署。來自 SageMaker AI 的任務參數、狀態、日誌和輸出都仍然可以從 Kubeflow 管道 UI 存取。

SageMaker AI 元件可將關鍵的 SageMaker AI 功能整合到您的 ML 工作流程，從準備資料到建置、訓練和部署 ML 模型。您可以建立完全使用這些元件建立的 Kubeflow 管道，或視需要將個別元件整合到您的工作流程中。元件可以有一個或兩個版本。元件的每個版本使用不同的後端。如需與這些版本相關的詳細資訊，請參閱[適用於 Kubeflow 管道的 SageMaker AI 元件版本](#sagemaker-components-versions)。

使用適用於 Kubeflow 管道的 SageMaker AI 元件無需額外收費。透過這些元件使用的任何 SageMaker AI 資源都會產生費用。

## 適用於 Kubeflow 管道的 SageMaker AI 元件版本
<a name="sagemaker-components-versions"></a>

適用於 Kubeflow 管道的 SageMaker AI 元件有兩個版本。每個版本利用不同的後端來建立和管理 SageMaker AI 上的資源。
+ 適用於 Kubeflow 管道的 SageMaker AI 元件版本 1 (v1.x 或更低版本) 使用 **[Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html)** (適用於 Python (Boto3) 的 AWS SDK) 作為後端。
+ 適用於 Kubeflow 管道的 SageMaker AI 元件版本 2 (v2.0.0-alpha2 和更高版本) 使用 [SageMaker AI Operator for Kubernetes (ACK)](https://github.com/aws-controllers-k8s/sagemaker-controller)。

  AWS 引進 [ACK](https://aws-controllers-k8s.github.io/community/)，以促進管理 AWS 雲端資源的 Kubernetes 原生方式。ACK 包含一組 AWS 服務特定的控制器，其中一個是 SageMaker AI 控制器。SageMaker AI 控制器可讓機器學習開發人員和資料科學家使用 Kubernetes 作為控制平面，以便在 SageMaker AI 中訓練、調整和部署機器學習 (ML) 模型。如需詳細資訊，請參閱 [SageMaker AI Operators for Kubernetes](https://aws-controllers-k8s.github.io/community/docs/tutorials/sagemaker-example/) 

適用於 Kubeflow 管道的 SageMaker AI 元件的兩個版本均受支援。但是，版本 2 提供了一些額外優點。尤其是：

1. 無論您是使用 Kubeflow 管道、Kubernetes CLI (`kubectl`) 還是其他 Kubeflow 應用程式 (例如筆記本)，都能從任何應用程式管理 SageMaker AI 資源，體驗保持一致。

1. 在 Kubeflow 管道工作流程之外，可以彈性地管理與監控 SageMaker AI 資源。

1. If you deployed the full [Kubeflow on AWS](https://awslabs.github.io/kubeflow-manifests/docs/about/) release, no additional setup is needed to use the SageMaker AI components, because the SageMaker AI operator is part of that deployment.

## 適用於 Kubeflow 管道的 SageMaker AI 元件清單
<a name="sagemaker-components-list"></a>

以下是適用於 Kubeflow 管道的所有 SageMaker AI 元件及其可用版本的清單。或者，您可以[在 GitHub 中找到適用於 Kubeflow 管道的所有 SageMaker AI 元件](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker#versioning)。

**注意**  
我們鼓勵使用者盡量使用 SageMaker AI 元件的版本 2。

### Ground Truth 元件
<a name="ground-truth-components"></a>
+ **Ground Truth**

  Ground Truth 元件讓您可以直接透過 Kubeflow 管道工作流程提交 SageMaker AI Ground Truth 標籤工作。    
+ **工作團隊**

  工作團隊元件讓您可以直接從 Kubeflow 管道工作流程建立 SageMaker AI 私有工作團隊任務。    

### 資料處理元件
<a name="data-processing-components"></a>
+ **處理**

  處理元件讓您可以直接從 Kubeflow 管道工作流程將處理工作提交至 SageMaker AI。    

### 訓練元件
<a name="training-components"></a>
+ **訓練**

  訓練元件讓您可以直接透過 Kubeflow 管道工作流程提交 SageMaker 訓練工作。    
+ **超參數最佳化**

  超參數最佳化元件讓您可以直接從 Kubeflow 管道工作流程將超參數調校工作提交至 SageMaker AI。    

### 推論元件
<a name="inference-components-kfp"></a>
+ **託管部署**

  託管元件讓您可以從 Kubeflow 管道工作流程使用 SageMaker AI 託管服務部署模型。    
+ **批次轉換**

  批次轉換元件讓您可以透過 Kubeflow 管道工作流程針對 SageMaker AI 中的整個資料集執行推論任務。    
+ **模型監控**

  模型監控元件讓您可以透過 Kubeflow 管道工作流程監控生產環境中 SageMaker AI 機器學習模型的品質。    

## IAM 許可
<a name="iam-permissions"></a>

使用 SageMaker AI 元件部署 Kubeflow 管道需要以下三層驗證：
+ 授予閘道節點 (可以是本機機器或遠端執行個體) 存取 Amazon Elastic Kubernetes Service (Amazon EKS) 叢集的 IAM 角色。

  存取閘道節點的使用者會擔任此角色來執行以下操作：
  + 建立 Amazon EKS 叢集並安裝 KFP
  + 建立 IAM 角色
  + 為您的範例輸入資料建立 Amazon S3 儲存貯體

  角色需要以下許可：
  + CloudWatchLogsFullAccess 
  + [AWSCloudFormationFullAccess](https://console.aws.amazon.com/iam/home?region=us-east-1#/policies/arn%3Aaws%3Aiam%3A%3Aaws%3Apolicy%2FAWSCloudFormationFullAccess) 
  + IAMFullAccess
  + AmazonS3FullAccess
  + AmazonEC2FullAccess
  + AmazonEKSAdminPolicy (使用 [Amazon EKS 身分型政策範例](https://docs.aws.amazon.com/eks/latest/userguide/security_iam_id-based-policy-examples.html)中的結構描述建立此政策) 
+ Kubernetes 管道 Pod (**kfp-example-pod-role**) 或 SageMaker AI Operator for Kubernetes 控制器 Pod 擔任的 Kubernetes IAM 執行角色，用於存取 SageMaker AI。此角色用來從 Kubernetes 建立和監控 SageMaker AI 工作。

  角色需要以下許可：
  + AmazonSageMakerFullAccess 

  您可以透過建立並連接自訂政策來限制 KFP 和控制器 Pod 的許可。
+ SageMaker AI 任務擔任的 SageMaker AI IAM 執行角色，可存取 Amazon S3 或 Amazon ECR (**kfp-example-sagemaker-execution-role**) 等 AWS 資源。

  SageMaker AI 任務使用此角色來執行以下操作：
  + 存取 SageMaker AI 資源
  + 從 Amazon S3 輸入資料
  + 將您的輸出模型儲存到 Amazon S3

  角色需要以下許可：
  + AmazonSageMakerFullAccess 
  + AmazonS3FullAccess 
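
The note above about restricting the KFP and controller Pod permissions with a custom policy can be sketched as follows. This is only an illustration under stated assumptions: the `scoped_sagemaker_policy` helper, the choice of three training-job actions, and the account ID `111122223333` are placeholders, not a recommended policy. A working policy would likely need further actions for the other job types you use, plus `iam:PassRole` on the SageMaker AI execution role.

```python
import json


def scoped_sagemaker_policy(region, account_id):
    """Build an IAM policy document limited to a few training-job actions.

    The action list is an illustrative subset, not a complete policy.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "sagemaker:CreateTrainingJob",
                    "sagemaker:DescribeTrainingJob",
                    "sagemaker:StopTrainingJob",
                ],
                # Restrict the statement to training jobs in one account and Region.
                "Resource": f"arn:aws:sagemaker:{region}:{account_id}:training-job/*",
            }
        ],
    }


policy = scoped_sagemaker_policy("us-east-1", "111122223333")
print(json.dumps(policy, indent=2))
```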

## 轉換管道以使用 SageMaker AI
<a name="converting-pipelines-to-use-amazon-sagemaker"></a>

您可以透過移植一般 Python [處理容器](https://docs.aws.amazon.com/sagemaker/latest/dg/amazon-sagemaker-containers.html)和[訓練容器](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html)，將現有的管道轉換為使用 SageMaker AI。如果使用 SageMaker AI 進行推論，您還需要將 IAM 許可連接到叢集，並將成品轉換為模型。

# 安裝 Kubeflow 管道
<a name="kubernetes-sagemaker-components-install"></a>

[Kubeflow 管道 (KFP)](https://www.kubeflow.org/docs/components/pipelines/v2/introduction/) 是 Kubeflow 的管道協調流程元件。

您可以在現有的 Amazon Elastic Kubernetes Service (Amazon EKS) 上部署 Kubeflow 管道 (KFP) 或建立新的 Amazon EKS 叢集。使用閘道節點與叢集互動。閘道節點可以是您的本機機器或 Amazon EC2 執行個體。

以下部分將引導您完成設置和設定這些資源的步驟。

**Topics**
+ [選擇安裝選項](#choose-install-option)
+ [設定您的管道許可以存取 SageMaker AI](#configure-permissions-for-pipeline)
+ [存取 KFP 使用者介面 (Kubeflow 儀表板)](#access-the-kfp-ui)

## 選擇安裝選項
<a name="choose-install-option"></a>

Kubeflow Pipelines is available either as a core component of the full Kubeflow on AWS distribution or as a standalone installation.

選取適用於您使用案例的選項：

1. [Full Kubeflow on AWS deployment](#full-kubeflow-deployment)

   若要使用 Kubeflow 管道以外的其他 Kubeflow 元件，請選擇完整的 [AWS Kubeflow 發行版](https://awslabs.github.io/kubeflow-manifests)部署。

1. [獨立 Kubeflow 管道部署](#kubeflow-pipelines-standalone)

   若要在不使用 Kubeflow 的其他元件的情況下使用 Kubeflow 管道，請獨立安裝 Kubeflow 管道。

### Full Kubeflow on AWS deployment
<a name="full-kubeflow-deployment"></a>

To install the full release of Kubeflow on AWS, choose the vanilla deployment option from the [Kubeflow on AWS deployment guide](https://awslabs.github.io/kubeflow-manifests/docs/deployment/), or any other deployment option that supports integration with various AWS services (Amazon S3, Amazon RDS, Amazon Cognito).

### 獨立 Kubeflow 管道部署
<a name="kubeflow-pipelines-standalone"></a>

本節假設您的使用者具有建立角色和定義角色政策的權限。

#### 設定閘道節點
<a name="set-up-a-gateway-node"></a>

您可以使用本機機器或 Amazon EC2 執行個體作為閘道節點。閘道節點可用來建立 Amazon EKS 叢集並存取 Kubeflow 管道使用者介面。

完成以下步驟以設定節點。

1. **Create a gateway node.**

   您可以使用現有的 Amazon EC2 執行個體，也可以使用[啟動和設定 DLAMI](https://docs.aws.amazon.com/dlami/latest/devguide/launch-config.html) 中的步驟，使用最新的 Ubuntu 18.04 DLAMI 版本建立新執行個體。

1. **Create an IAM role to grant your gateway node access to AWS resources.**

   Create an IAM role with permissions to the following resources: CloudWatch, CloudFormation, IAM, Amazon EC2, Amazon S3, Amazon EKS.

   將下列內嵌政策連接到角色：
   + CloudWatchLogsFullAccess 
   + [AWSCloudFormationFullAccess](https://console.aws.amazon.com/iam/home?region=us-east-1#/policies/arn%3Aaws%3Aiam%3A%3Aaws%3Apolicy%2FAWSCloudFormationFullAccess)
   + IAMFullAccess 
   + AmazonS3FullAccess 
   + AmazonEC2FullAccess 
   + AmazonEKSAdminPolicy (使用 [Amazon EKS 身分型政策範例](https://docs.aws.amazon.com/eks/latest/userguide/security_iam_id-based-policy-examples.html)中的結構描述建立此政策) 

   如需與將 IAM 許可新增至 IAM 角色相關的資訊，請參閱[新增和移除 IAM 身分許可](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html)。

1. **Install the following tools and clients.**

   在閘道節點上安裝並設定下列工具和資源，以存取 Amazon EKS 叢集和 KFP 使用者介面 (UI)。
   + [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html)：使用 AWS 服務的命令列工具。如需 AWS CLI 組態資訊，請參閱[設定 AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html)。
   + [aws-iam-authenticator](https://docs.aws.amazon.com/eks/latest/userguide/install-aws-iam-authenticator.html) 版本 0.1.31 和更高版本：使用 AWS IAM 憑證對 Kubernetes 叢集進行身分驗證的工具。
   + [eksctl](https://docs.aws.amazon.com/eks/latest/userguide/eksctl.html) version 0.15 or later: a command line tool for working with Amazon EKS clusters.
   + [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl): a command line tool for working with Kubernetes clusters. The version must match your Kubernetes version within one minor version.
   + [AWS SDK for Python (Boto3)](https://aws.amazon.com/sdk-for-python/).

     ```
     pip install boto3
     ```
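
The "within one minor version" rule for `kubectl` can be checked with a small script. This is a minimal sketch: the helper names `parse_major_minor` and `within_one_minor` are hypothetical, and the version strings are assumed to look like the `v1.27.3` or `v1.28.1-eks-...` forms that `kubectl version` typically reports.

```python
import re


def parse_major_minor(version):
    """Extract (major, minor) from strings such as 'v1.27.3' or '1.28+'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def within_one_minor(client_version, server_version):
    """Return True if the client is at most one minor version away from the server."""
    client_major, client_minor = parse_major_minor(client_version)
    server_major, server_minor = parse_major_minor(server_version)
    return client_major == server_major and abs(client_minor - server_minor) <= 1


print(within_one_minor("v1.27.3", "v1.28.1-eks-abc123"))
print(within_one_minor("v1.25.0", "v1.28.0"))
```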

#### 設定 Amazon EKS 叢集
<a name="set-up-anamazon-eks-cluster"></a>

1. 如果沒有現有的 Amazon EKS 叢集，請從閘道節點的命令列執行下列步驟，否則請跳過此步驟。

   1. 執行下列命令建立 1.17 或更高版本的 Amazon EKS 叢集。將 `<clustername>` 取代為任何叢集名稱。

      ```
      eksctl create cluster --name <clustername> --region us-east-1 --auto-kubeconfig --timeout=50m --managed --nodes=1
      ```

   1. 叢集建立完成後，請列出叢集節點，以確保您可以存取叢集。

      ```
      kubectl get nodes
      ```

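1. Verify that the current `kubectl` context points to your cluster with the following command. The current context is marked with an asterisk (\*) in the output.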

   ```
   kubectl config get-contexts
   
   CURRENT NAME     CLUSTER
   *   <username>@<clustername>.us-east-1.eksctl.io   <clustername>.us-east-1.eksctl.io
   ```

1. 如果所需的叢集未配置為目前的預設值，請使用下列指令更新預設值。

   ```
   aws eks update-kubeconfig --name <clustername> --region us-east-1
   ```

#### 安裝 Kubeflow 管道
<a name="install-kubeflow-pipelines"></a>

從閘道節點的終端機執行下列步驟，以在叢集上安裝 Kubeflow 管道。

1. 安裝所有[憑證管理員元件](https://cert-manager.io/docs/installation/kubectl/)。

   ```
   kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.9.1/cert-manager.yaml
   ```

1. 安裝 Kubeflow 管道。

   ```
   export PIPELINE_VERSION=2.0.0-alpha.5
   kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/cert-manager/cluster-scoped-resources?ref=$PIPELINE_VERSION"
   kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
   kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/cert-manager/dev?ref=$PIPELINE_VERSION"
   ```

1. 確保 Kubeflow 管道服務和其他相關資源正在執行中。

   ```
   kubectl -n kubeflow get all | grep pipeline
   ```

   您的輸出看起來應該如下所示。

   ```
   pod/ml-pipeline-6b88c67994-kdtjv                      1/1     Running            0          2d
   pod/ml-pipeline-persistenceagent-64d74dfdbf-66stk     1/1     Running            0          2d
   pod/ml-pipeline-scheduledworkflow-65bdf46db7-5x9qj    1/1     Running            0          2d
   pod/ml-pipeline-ui-66cc4cffb6-cmsdb                   1/1     Running            0          2d
   pod/ml-pipeline-viewer-crd-6db65ccc4-wqlzj            1/1     Running            0          2d
   pod/ml-pipeline-visualizationserver-9c47576f4-bqmx4   1/1     Running            0          2d
   service/ml-pipeline                       ClusterIP   10.100.170.170   <none>        8888/TCP,8887/TCP   2d
   service/ml-pipeline-ui                    ClusterIP   10.100.38.71     <none>        80/TCP              2d
   service/ml-pipeline-visualizationserver   ClusterIP   10.100.61.47     <none>        8888/TCP            2d
   deployment.apps/ml-pipeline                       1/1     1            1           2d
   deployment.apps/ml-pipeline-persistenceagent      1/1     1            1           2d
   deployment.apps/ml-pipeline-scheduledworkflow     1/1     1            1           2d
   deployment.apps/ml-pipeline-ui                    1/1     1            1           2d
   deployment.apps/ml-pipeline-viewer-crd            1/1     1            1           2d
   deployment.apps/ml-pipeline-visualizationserver   1/1     1            1           2d
   replicaset.apps/ml-pipeline-6b88c67994                      1         1         1       2d
   replicaset.apps/ml-pipeline-persistenceagent-64d74dfdbf     1         1         1       2d
   replicaset.apps/ml-pipeline-scheduledworkflow-65bdf46db7    1         1         1       2d
   replicaset.apps/ml-pipeline-ui-66cc4cffb6                   1         1         1       2d
   replicaset.apps/ml-pipeline-viewer-crd-6db65ccc4            1         1         1       2d
   replicaset.apps/ml-pipeline-visualizationserver-9c47576f4   1         1         1       2d
   ```

## 設定您的管道許可以存取 SageMaker AI
<a name="configure-permissions-for-pipeline"></a>

在本節中，您會建立 IAM 執行角色，授予 Kubeflow 管道 Pod 存取 SageMaker AI 服務的許可。

### SageMaker AI 元件版本 2 組態
<a name="permissions-for-SM-v2"></a>

若要執行適用於 Kubeflow 管道的 SageMaker AI 元件版本 2，您需要安裝 [SageMaker AI Operator for Kubernetes](https://github.com/aws-controllers-k8s/sagemaker-controller)，並設定角色型存取控制 (RBAC)，允許 Kubeflow 管道 Pod 在您的 Kubernetes 叢集中建立 SageMaker AI 自訂資源。

**重要**  
如果您使用 Kubeflow 管道獨立部署，請遵循本節的指示操作。如果您使用的是 AWS Kubeflow 版本 1.6.0-aws-b1.0.0 或更高版本，SageMaker AI 元件版本 2 已設定。

1. 安裝 SageMaker AI Operator for Kubernetes，以使用 SageMaker AI 元件版本 2。

   遵循[使用 ACK SageMaker AI 控制器進行機器學習教學課程](https://aws-controllers-k8s.github.io/community/docs/tutorials/sagemaker-example/#setup)的*設定*一節。

1. 針對 Kubeflow 管道 Pod 所使用的執行角色 (服務帳戶) 設定 RBAC 許可。在 Kubeflow 管道獨立部署中，管道執行會使用 `pipeline-runner` 服務帳戶在 `kubeflow` 命名空間中執行。

   1. 建立 [RoleBinding](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#rolebinding-example)，授予服務帳戶管理 SageMaker AI 自訂資源的許可。

      ```
      cat > manage_sagemaker_cr.yaml <<EOF
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: manage-sagemaker-cr
        namespace: kubeflow
      subjects:
      - kind: ServiceAccount
        name: pipeline-runner
        namespace: kubeflow
      roleRef:
        kind: ClusterRole
        name: ack-sagemaker-controller
        apiGroup: rbac.authorization.k8s.io
      EOF
      ```

      ```
      kubectl apply -f manage_sagemaker_cr.yaml
      ```

   1. 確保 RoleBinding 是透過執行以下命令建立的：

      ```
      kubectl get rolebinding manage-sagemaker-cr -n kubeflow -o yaml
      ```
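
A RoleBinding manifest depends on correct nesting: `name` and `namespace` belong under `metadata`, each subject is a list entry, and `roleRef` is a mapping. One way to make that structure explicit is to build the manifest as a Python dictionary and emit it as JSON, which `kubectl apply -f` also accepts. The `role_binding` helper below is a hypothetical sketch; the names passed to it are the ones used in this section.

```python
import json


def role_binding(name, namespace, service_account, cluster_role):
    """Build a RoleBinding that grants a ClusterRole to a service account in one namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": namespace},
        "subjects": [
            {
                "kind": "ServiceAccount",
                "name": service_account,
                "namespace": namespace,
            }
        ],
        "roleRef": {
            "kind": "ClusterRole",
            "name": cluster_role,
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }


manifest = role_binding(
    "manage-sagemaker-cr", "kubeflow", "pipeline-runner", "ack-sagemaker-controller"
)
print(json.dumps(manifest, indent=2))
```

Redirecting the printed output to a `.json` file gives a manifest that can be applied in the same way as the YAML version.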

### SageMaker AI 元件版本 1 組態
<a name="permissions-for-SM-v1"></a>

若要執行適用於 Kubeflow 管道的 SageMaker AI 元件版本 1，Kubeflow 管道 Pod 需要存取 SageMaker AI。

**重要**  
Follow this section whether you are using the full Kubeflow on AWS deployment or standalone Kubeflow Pipelines.

若要建立 IAM 執行角色，將 Kubeflow 管道 Pod 存取權授予 SageMaker AI，請遵循下列步驟：

1. 匯出您的叢集名稱 (例如*my-cluster-name*) 和叢集區域 (例如 *us-east-1*)。

   ```
   export CLUSTER_NAME=my-cluster-name
   export CLUSTER_REGION=us-east-1
   ```

1. 根據安裝類型，匯出命名空間和服務帳戶名稱。
   + For the full Kubeflow on AWS installation, export your profile `namespace` (for example, *kubeflow-user-example-com*) and *default-editor* as the service account.

     ```
     export NAMESPACE=kubeflow-user-example-com
     export KUBEFLOW_PIPELINE_POD_SERVICE_ACCOUNT=default-editor
     ```
   + 對於獨立管道部署，請將 *kubeflow* 匯出為 `namespace`，將 *pipeline-runner* 匯出為服務帳戶。

     ```
     export NAMESPACE=kubeflow
     export KUBEFLOW_PIPELINE_POD_SERVICE_ACCOUNT=pipeline-runner
     ```

1. 使用下列命令[為 Amazon EKS 叢集建立 IAM OIDC 身分提供者](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html)。

   ```
   eksctl utils associate-iam-oidc-provider --cluster ${CLUSTER_NAME} \
               --region ${CLUSTER_REGION} --approve
   ```

1. 為 KFP Pod 建立 IAM 執行角色以存取 AWS 服務 (SageMaker AI、CloudWatch)。

   ```
   eksctl create iamserviceaccount \
   --name ${KUBEFLOW_PIPELINE_POD_SERVICE_ACCOUNT} \
   --namespace ${NAMESPACE} --cluster ${CLUSTER_NAME} \
   --region ${CLUSTER_REGION} \
   --attach-policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess \
   --attach-policy-arn arn:aws:iam::aws:policy/CloudWatchLogsFullAccess \
   --override-existing-serviceaccounts \
   --approve
   ```

Once your pipeline permissions are configured to access SageMaker AI components version 1, follow the SageMaker AI components for Kubeflow Pipelines guide in the [Kubeflow on AWS documentation](https://awslabs.github.io/kubeflow-manifests/docs/amazon-sagemaker-integration/sagemaker-components-for-kubeflow-pipelines/).

## 存取 KFP 使用者介面 (Kubeflow 儀表板)
<a name="access-the-kfp-ui"></a>

Kubeflow 管道使用者介面可用來管理和追蹤叢集上的實驗、工作和執行。如需有關如何從閘道節點存取 Kubeflow 管道使用者介面的指示，請遵循本節中與您的部署選項對應的步驟。

### Full Kubeflow on AWS deployment
<a name="access-kfp-ui-full-kubeflow-deployment"></a>

Follow the instructions on the [Kubeflow on AWS website](https://awslabs.github.io/kubeflow-manifests/docs/deployment/connect-kubeflow-dashboard/) to connect to the Kubeflow dashboard and navigate to the pipelines tab.

### 獨立 Kubeflow 管道部署
<a name="access-kfp-ui-standalone-kubeflow-pipelines-deployment"></a>

遵循以下步驟，使用連接埠轉遞功能，從閘道節點存取 Kubeflow 管道使用者介面。

#### 設定連接埠轉送至 KFP 使用者介面服務
<a name="set-up-port-forwarding-to-the-kfp-ui-service"></a>

從閘道節點的命令列執行下列命令。

1. 使用以下命令驗證 KFP 使用者介面服務是否正在執行。

   ```
   kubectl -n kubeflow get service ml-pipeline-ui
   
   NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
   ml-pipeline-ui   ClusterIP   10.100.38.71   <none>        80/TCP    2d22h
   ```

1. 執行下列命令以設定 KFP 使用者介面服務的連接埠轉送。這會將 KFP 使用者介面轉送到閘道節點上的連接埠 8080，並允許您從瀏覽器存取 KFP 使用者介面。

   ```
   kubectl port-forward -n kubeflow service/ml-pipeline-ui 8080:80
   ```

   如果沒有任何活動，遠端機器的連接埠轉發會停止。如果儀表板無法取得日誌或更新，請再次執行此命令。如果命令傳回錯誤，請確保您嘗試使用的連接埠上沒有任何進程已在執行。

#### 存取 KFP 使用者介面服務
<a name="set-up-port-forwarding-to-the-kfp-ui-service-access"></a>

您存取 KFP 使用者介面的方法取決於閘道節點類型。
+ 本地機器作為閘道節點：

  1. 在瀏覽器中存取儀表板，如下所示：

     ```
     http://localhost:8080
     ```

  1. 選擇**管道**以存取管道使用者介面。
+ 作為閘道節點的 Amazon EC2 執行個體：

  1. 您需要在 Amazon EC2 執行個體上設定 SSH 通道，才能從本機機器的瀏覽器存取 Kubeflow 儀表板。

     從本機機器新的終端工作階段中，執行下列命令。將 `<public-DNS-of-gateway-node>` 取代為您在 Amazon EC2 主控台上找到的執行個體 IP 地址。您也可以使用公有 DNS。將 `<path_to_key>` 取代為用來存取閘道節點的 pem 金鑰路徑。

     ```
     public_DNS_address=<public-DNS-of-gateway-node>
     key=<path_to_key>
     
     # On Ubuntu:
     ssh -i ${key} -L 9000:localhost:8080 ubuntu@${public_DNS_address}
     
     # Or on Amazon Linux:
     ssh -i ${key} -L 9000:localhost:8080 ec2-user@${public_DNS_address}
     ```

  1. 在瀏覽器中存取儀表板。

     ```
     http://localhost:9000
     ```

  1. 選擇**管道**以存取 KFP 使用者介面。
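
If the dashboard does not load, it can help to confirm that something is actually listening on the forwarded port before debugging further. The following sketch uses only the Python standard library; the `port_is_open` helper is hypothetical, and the ports 8080 and 9000 are the ones used in the steps above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8080 is the kubectl port-forward target; 9000 is the local end of the SSH tunnel.
    for port in (8080, 9000):
        print(port, port_is_open("localhost", port))
```

A result of `False` for port 8080 on the gateway node suggests that the `kubectl port-forward` command has stopped and needs to be run again.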

#### (選用) 授予 SageMaker AI 筆記本執行個體存取 Amazon EKS 的權限，並從您的筆記本執行 KFP 管道。
<a name="add-access-to-additional-iam-users-or-roles"></a>

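A SageMaker notebook instance is a fully managed Amazon EC2 compute instance that runs the Jupyter Notebook application. You can use a notebook instance to create and manage Jupyter notebooks, then define, compile, deploy, and run your KFP pipelines using the AWS SDK for Python (Boto3) or the KFP CLI.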

1. Follow the steps in [Create a SageMaker Notebook Instance](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-setup-working-env.html) to create your notebook instance, then attach the `AmazonS3FullAccess` policy to its IAM execution role.

1. 從閘道節點的命令列執行下列命令，以擷取您建立之筆記本執行個體的 IAM 角色 ARN。以您的執行個體的名稱取代 `<instance-name>`。

   ```
   aws sagemaker describe-notebook-instance --notebook-instance-name <instance-name> --region <region> --output text --query 'RoleArn'
   ```

   此命令會以 `arn:aws:iam::<account-id>:role/<role-name>` 格式輸出 IAM 角色 ARN。記下此 ARN。

1. 執行此命令將以下政策 (AmazonSageMakerFullAccess、AmazonEKSWorkerNodePolicy、AmazonS3FullAccess) 連接到此 IAM 角色。將 `<role-name>` 取代為 ARN 中的 `<role-name>`。

   ```
   aws iam attach-role-policy --role-name <role-name> --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
   aws iam attach-role-policy --role-name <role-name> --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
   aws iam attach-role-policy --role-name <role-name> --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
   ```

1. Amazon EKS 叢集使用 IAM 角色來控制對叢集的存取。這些規則在名為 `aws-auth` 的配置對應中實現。`eksctl` 提供用於讀取和編輯 `aws-auth` 設定對應的命令。只有擁有叢集存取權的使用者才能編輯此組態對應。

   `system:masters` 是具有叢集超級使用者許可的預設使用者群組之一。將您的使用者新增至此群組，或建立具有更嚴格許可的群組。

1. Bind the role to your cluster by running the following command. Replace `<IAM-Role-arn>` with the ARN of the IAM role. `<your-username>` can be any unique username.

   ```
   eksctl create iamidentitymapping \
   --cluster <cluster-name> \
   --arn <IAM-Role-arn> \
   --group system:masters \
   --username <your-username> \
   --region <region>
   ```

1. 在 SageMaker AI 執行個體上開啟 Jupyter 筆記本，然後執行下列命令以確保其具有叢集的存取權。

   ```
   aws eks --region <region> update-kubeconfig --name <cluster-name>
   kubectl -n kubeflow get all | grep pipeline
   ```

# 使用 SageMaker AI 元件
<a name="kubernetes-sagemaker-components-tutorials"></a>

在本教學課程中，您會使用適用於 Kubeflow 管道的 SageMaker AI 元件來執行管道，以使用 Kmeans 搭配 SageMaker AI 上的 MNIST 資料集來訓練分類模型。此工作流程使用 Kubeflow 管道作為協調器，並使用 SageMaker AI 來執行工作流程的每個步驟。此範例取自現有的 [SageMaker AI 範例](https://github.com/aws/amazon-sagemaker-examples/blob/8279abfcc78bad091608a4a7135e50a0bd0ec8bb/sagemaker-python-sdk/1P_kmeans_highlevel/kmeans_mnist.ipynb)，並經過修改，以使用適用於 Kubeflow 管道的 SageMaker AI 元件。

You can define your pipeline in Python, then use the KFP dashboard, the KFP CLI, or Boto3 to compile, deploy, and run your workflows. The full code for the MNIST classification pipeline example is available in the [Kubeflow GitHub repository](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/mnist-kmeans-sagemaker#mnist-classification-with-kmeans). To use it, clone the Python files to your gateway node.

您可以在 GitHub 上找到其他 [SageMaker AI Kubeflow 管道範例](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples)。如需與所使用元件相關的資訊，請參閱 [KubeFlow 管道 GitHub 儲存庫](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker)。

若要執行分類管道範例，請建立 SageMaker AI IAM 執行角色，授予訓練任務存取 AWS 資源的許可，然後繼續執行與您的部署選項對應的步驟。

## 建立一個 SageMaker AI 執行角色
<a name="create-an-amazonsagemaker-execution-role"></a>

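The `kfp-example-sagemaker-execution-role` IAM role is a runtime role that SageMaker AI jobs assume to access AWS resources. The following commands create an IAM execution role named `kfp-example-sagemaker-execution-role`, attach two managed policies (AmazonSageMakerFullAccess, AmazonS3FullAccess), and create a trust relationship with SageMaker AI to grant SageMaker AI jobs access to those AWS resources.

You provide this role as an input parameter when running the pipeline.

Run the following commands to create the role. Note the ARN that is returned in your output.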

```
SAGEMAKER_EXECUTION_ROLE_NAME=kfp-example-sagemaker-execution-role

TRUST="{ \"Version\": \"2012-10-17\", \"Statement\": [ { \"Effect\": \"Allow\", \"Principal\": { \"Service\": \"sagemaker.amazonaws.com\" }, \"Action\": \"sts:AssumeRole\" } ] }"
aws iam create-role --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --assume-role-policy-document "$TRUST"
aws iam attach-role-policy --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam attach-role-policy --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

aws iam get-role --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --output text --query 'Role.Arn'
```
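
Hand-escaping a JSON trust policy inside a shell string is easy to get wrong, since a stray character inside the `Version` value is enough to make IAM reject the document. As an alternative sketch, the same trust policy can be generated with Python's `json` module, which guarantees well-formed output. The `sagemaker_trust_policy` helper name is hypothetical.

```python
import json


def sagemaker_trust_policy():
    """Return a trust policy that lets the SageMaker AI service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


trust_json = json.dumps(sagemaker_trust_policy())
print(trust_json)
```

The printed string can be saved to a file and passed to `aws iam create-role` with `--assume-role-policy-document file://<path>`.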

## AWS 部署時的完整 Kubeflow
<a name="run-pipelines-on-full-kubeflow-deployment"></a>

請遵循 [SageMaker 訓練管道教學課程中的指示，使用 K 平均數對 MNIST 進行分類](https://awslabs.github.io/kubeflow-manifests/docs/amazon-sagemaker-integration/sagemaker-components-for-kubeflow-pipelines/)。

## 獨立 Kubeflow 管道部署
<a name="run-pipelines-on-standalone-kubeflow-pipelines-deployment"></a>

### 準備資料集
<a name="prepare-datasets"></a>

若要執行管道，您需要將資料擷取預處理指令碼上傳至 Amazon S3 儲存貯體。此儲存貯體和此範例的所有資源都必須位於 `us-east-1` 區域中。如需與建立儲存貯體相關的資訊，請參閱[建立儲存貯體](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html)。

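From the directory that contains the `mnist-kmeans-sagemaker` folder of the Kubeflow repository you cloned on your gateway node, run the following command to upload the `kmeans_preprocessing.py` file to your Amazon S3 bucket. Change `<bucket-name>` to the name of your Amazon S3 bucket.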

```
aws s3 cp mnist-kmeans-sagemaker/kmeans_preprocessing.py s3://<bucket-name>/mnist_kmeans_example/processing_code/kmeans_preprocessing.py
```

### 編譯和部署您的管道
<a name="compile-and-deploy-your-pipeline"></a>

定義管道之後，您必須先將其編譯成中繼表示形式，然後才能將其提交至叢集上的 Kubeflow 管道服務。中繼表示形式是壓縮成 tar.gz 檔案的 YAML 檔案形式的工作流程規格。您需要 KFP SDK 來編譯管道。
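
To make the "YAML file compressed into a tar.gz file" idea concrete, the following sketch packs a placeholder workflow spec into a gzip-compressed tar archive in memory and lists its contents. It does not call the KFP compiler. The helper names, the member name `pipeline.yaml`, and the two-line YAML stand-in are assumptions for illustration only.

```python
import io
import tarfile


def pack_workflow(yaml_text, member_name="pipeline.yaml"):
    """Return gzip-compressed tar bytes containing a single YAML member."""
    data = yaml_text.encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def list_members(archive_bytes):
    """Return the member names stored in a gzip-compressed tar archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        return archive.getnames()


# Placeholder content standing in for a compiled workflow specification.
packed = pack_workflow("apiVersion: argoproj.io/v1alpha1\nkind: Workflow\n")
print(list_members(packed))
```

Reading a compiled package from disk and passing its bytes to `list_members` is a quick way to see what the compiler produced before uploading it.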

#### 安裝 KFP SDK
<a name="install-kfp-sdk"></a>

從閘道節點的命令列執行下列命令：

1. 按照 [Kubeflow 管道文件](https://www.kubeflow.org/docs/pipelines/sdk/install-sdk/)中的指示安裝 KFP SDK。

1. 使用以下命令驗證已安裝 KFP SDK：

   ```
   pip show kfp
   ```

1. 驗證已正確安裝 `dsl-compile`，如下所示：

   ```
   which dsl-compile
   ```

#### 編譯管道
<a name="compile-your-pipeline"></a>

您有三個選項可以與 Kubeflow 管道互動：KFP UI、KFP CLI 或 KFP SDK。以下各節說明使用 KFP 使用者介面和 CLI 的工作流程。

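Complete the following steps to compile your pipeline.

1. Modify your Python file with your Amazon S3 bucket name and IAM role ARN.

1. Use the `dsl-compile` command from the command line to compile your pipeline as follows. Replace `<path-to-python-file>` with the path to your pipeline and `<path-to-output>` with the location where you want your tar.gz file to be.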

   ```
   dsl-compile --py <path-to-python-file> --output <path-to-output>
   ```

#### 使用 KFP CLI 上傳並執行管道
<a name="upload-and-run-the-pipeline-using-the-kfp-cli"></a>

從閘道節點的命令列完成下列步驟。KFP 會將您的管道執行組織為實驗。您可以選擇指定實驗名稱。如果未指定，則該執行將列在**預設值**實驗下。

1. 上傳您的管道，如下所示：

   ```
   kfp pipeline upload --pipeline-name <pipeline-name> <path-to-output-tar.gz>
   ```

   您的輸出看起來應該如下所示。請記下管道 `ID`。

   ```
   Pipeline 29c3ff21-49f5-4dfe-94f6-618c0e2420fe has been submitted
   
   Pipeline Details
   ------------------
   ID           29c3ff21-49f5-4dfe-94f6-618c0e2420fe
   Name         sm-pipeline
   Description
   Uploaded at  2020-04-30T20:22:39+00:00
   ...
   ...
   ```

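1. Create a run using the following command. The KFP CLI run command currently does not support specifying input parameters while creating the run, so update your parameters in the Python pipeline file before compiling. Replace `<experiment-name>` and `<job-name>` with any names. Replace `<pipeline-id>` with the ID of your submitted pipeline. Replace `<your-role-arn>` with the ARN of `kfp-example-pod-role`. Replace `<your-bucket-name>` with the name of the Amazon S3 bucket you created.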

   ```
   kfp run submit --experiment-name <experiment-name> --run-name <job-name> --pipeline-id <pipeline-id> role_arn="<your-role-arn>" bucket_name="<your-bucket-name>"
   ```

   您也可以使用作為 `dsl-compile` 命令輸出而建立的已編譯管道套件直接提交執行。

   ```
   kfp run submit --experiment-name <experiment-name> --run-name <job-name> --package-file <path-to-output> role_arn="<your-role-arn>" bucket_name="<your-bucket-name>"
   ```

   您的輸出看起來應如以下所示：

   ```
   Creating experiment aws.
   Run 95084a2c-f18d-4b77-a9da-eba00bf01e63 is submitted
   +--------------------------------------+--------+----------+---------------------------+
   | run id                               | name   | status   | created at                |
   +======================================+========+==========+===========================+
   | 95084a2c-f18d-4b77-a9da-eba00bf01e63 | sm-job |          | 2020-04-30T20:36:41+00:00 |
   +--------------------------------------+--------+----------+---------------------------+
   ```

1. 導覽至使用者介面以檢查工作進度。

#### 使用 KFP 使用者介面上傳並執行管道
<a name="upload-and-run-the-pipeline-using-the-kfp-ui"></a>

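1. On the left panel, choose the **Pipelines** tab.

1. In the upper-right corner, choose **+ Upload pipeline**.

1. Enter the pipeline name and description.

1. Choose **Upload a file** and enter the path to the tar.gz file you created using the CLI or the AWS SDK for Python (Boto3).

1. On the left panel, choose the **Pipelines** tab.

1. Find the pipeline you created.

1. Choose **+ Create run**.

1. Enter your input parameters.

1. Choose **Run**.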

### 執行預測
<a name="running-predictions"></a>

部署分類管道後，您可以針對由部署元件建立的端點執行分類預測。使用 KFP 使用者介面檢查 `sagemaker-deploy-model-endpoint_name` 的輸出成品。下載 .tgz 檔案以擷取端點名稱，或檢查您所使用區域中的 SageMaker AI 主控台。

#### 設定執行預測的許可
<a name="configure-permissions-to-run-predictions"></a>

如果要從閘道節點執行預測，請略過本節。

**若要使用任何其他機器執行預測，請將 `sagemaker:InvokeEndpoint` 許可指派給用戶端機器使用的 IAM 角色。**

1. 在閘道節點上執行下列命令以建立 IAM 政策檔案：

   ```
   cat <<EoF > ./sagemaker-invoke.json
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "sagemaker:InvokeEndpoint"
               ],
               "Resource": "*"
           }
       ]
   }
   EoF
   ```

1. 將政策連接至用戶端節點的 IAM 角色。

   執行下列命令。將 `<your-instance-IAM-role>` 取代為 IAM 角色的名稱。將 `<path-to-sagemaker-invoke-json>` 取代為您建立的政策檔案的路徑。

   ```
   aws iam put-role-policy --role-name <your-instance-IAM-role> --policy-name sagemaker-invoke-for-worker --policy-document file://<path-to-sagemaker-invoke-json>
   ```

#### 執行預測
<a name="run-predictions"></a>

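1. From your client machine, create a Python file named `mnist-predictions.py` with the following content. Replace the `ENDPOINT_NAME` variable. The script loads the MNIST dataset, creates a CSV from those digits, then sends the CSV to the endpoint for prediction and prints the results.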

   ```
   import boto3
   import gzip
   import io
   import json
   import numpy
   import pickle
   
   ENDPOINT_NAME='<endpoint-name>'
   region = boto3.Session().region_name
   
   # S3 bucket where the original mnist data is downloaded and stored
   downloaded_data_bucket = f"jumpstart-cache-prod-{region}"
   downloaded_data_prefix = "1p-notebooks-datasets/mnist"
   
   # Download the dataset
   s3 = boto3.client("s3")
   s3.download_file(downloaded_data_bucket, f"{downloaded_data_prefix}/mnist.pkl.gz", "mnist.pkl.gz")
   
   # Load the dataset
   with gzip.open('mnist.pkl.gz', 'rb') as f:
       train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
   
   # Simple function to create a csv from our numpy array
   def np2csv(arr):
       csv = io.BytesIO()
       numpy.savetxt(csv, arr, delimiter=',', fmt='%g')
       return csv.getvalue().decode().rstrip()
   
   runtime = boto3.Session(region_name=region).client('sagemaker-runtime')
   
   payload = np2csv(train_set[0][30:31])
   
   response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                      ContentType='text/csv',
                                      Body=payload)
   result = json.loads(response['Body'].read().decode())
   print(result)
   ```

1. Run the Python file as follows:

   ```
   python mnist-predictions.py
   ```
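
The prediction script sends each image to the endpoint as one line of comma-separated pixel values. The following standard-library sketch shows the payload shape without requiring NumPy or an endpoint. The helper names and the tiny sample rows are hypothetical; the `%g` formatting mirrors the `fmt='%g'` argument used with `numpy.savetxt` in the script.

```python
def row_to_csv(values):
    """Format one row of numbers as a comma-separated line using %g formatting."""
    return ",".join("%g" % value for value in values)


def rows_to_csv(rows):
    """Join several rows with newlines, one record per line and no trailing newline."""
    return "\n".join(row_to_csv(row) for row in rows)


# Two made-up rows of three values each; a real MNIST row has 784 values.
sample = [[0.0, 0.5, 1.0], [0.25, 0.0, 0.75]]
payload = rows_to_csv(sample)
print(payload)
```

Because `%g` drops trailing zeros, whole numbers such as `1.0` are sent as `1`, which keeps the request body compact.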

### 檢視結果和日誌
<a name="view-results-and-logs"></a>

管道執行時，您可以選擇任何元件來檢查執行詳細資訊，例如輸入和輸出。這會列出已建立資源的名稱。

如果成功處理 KFP 要求並建立了 SageMaker AI 工作，則 KFP UI 中的元件日誌會提供在 SageMaker AI 中建立之任務的連結。如果成功建立工作，也會提供 CloudWatch 記錄。

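If you run too many pipeline jobs on the same cluster, you may see an error message indicating that you do not have enough pods available. To fix this, log in to your gateway node and delete the pods created by the pipelines you are not using: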

```
kubectl get pods -n kubeflow
kubectl delete pods -n kubeflow <name-of-pipeline-pod>
```

### 清除
<a name="cleanup"></a>

管道完成後，您需要清除資源。

1. 如果管道執行未正確結束，請從 KFP 儀表板中選擇**終止**來終止管道執行。

1. 如果**終止**選項不起作用，請登入閘道節點並手動終止管道執行所建立的所有 Pod，如下所示：

   ```
   kubectl get pods -n kubeflow
   kubectl delete pods -n kubeflow <name-of-pipeline-pod>
   ```

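1. Sign in to the SageMaker AI console with your AWS account. Manually stop all training, batch transform, and HPO jobs. Delete models, data buckets, and endpoints to avoid incurring any additional costs. Terminating the pipeline runs does not stop the jobs in SageMaker AI.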

# SageMaker 筆記本工作
<a name="notebook-auto-run"></a>

您可以使用 Amazon SageMaker AI 在任何 JupyterLab 環境中，從 Jupyter 筆記本以互動方式建置、訓練和部署機器學習模型。不過，在多種情況下，您可能會想要以已排程的非互動式工作來執行筆記本。例如，您可能想要建立定期稽核報告，以分析在特定時間範圍內執行的所有訓練工作，並分析將這些模型部署到生產環境中的商業價值。或者，在對一小部分資料進行資料轉換邏輯測試之後，您可能想要擴展特徵工程工作。其他常用使用案例包括：
+ 對模型漂移監控工作進行排程
+ 探索參數空間以獲得更好的模型

在這些案例中，您可以使用 SageMaker 筆記本任務建立非互動式任務 (SageMaker AI 將其做為基礎訓練任務執行)，以隨需執行或按排程執行。SageMaker 筆記本工作提供直觀的使用者介面，您可以在筆記本中選擇筆記本工作小工具 (![\[Blue icon of a calendar with a checkmark, representing a scheduled task or event.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/icons/notebook-schedule.png))，直接透過 JupyterLab 對工作進行排程。您也可以使用 SageMaker AI Python SDK 來排程任務，該 SDK 提供在管道工作流程中排程多個筆記本任務的彈性。您可以平行執行多個筆記本，並將筆記本中的儲存格參數化以自訂輸入參數。

此功能利用 Amazon EventBridge、SageMaker 訓練和 Pipelines 服務，並可在下列任何環境中於 Jupyter 筆記本內使用：
+ Studio、Studio Lab、Studio Classic 或筆記本執行個體
+ 本機設定，例如您的本機機器，您執行 JupyterLab 的位置

**先決條件**

若要對筆記本工作進行排程，請確定符合下列條件：
+ 確保您的 Jupyter 筆記本和任何初始化或啟動指令碼在代碼和軟體套件方面都是獨立的。否則，您的非互動式工作可能會產生錯誤。
+ 檢閱 [限制和考量事項](notebook-auto-run-constraints.md) 以確保您已正確設定 Jupyter 筆記本、網路設定和容器設定。
+ 確保您的筆記本可以存取所需的外部資源，例如 Amazon EMR 叢集。
+ 如果您要在本機 Jupyter 筆記本中設定筆記本工作，請完成安裝。如需說明，請參閱[安裝指南](scheduled-notebook-installation.md)。
+ 如果您連線到筆記本中的 Amazon EMR 叢集，並想要參數化 Amazon EMR 連線命令，則必須使用環境變數套用因應措施來傳遞參數。如需詳細資訊，請參閱[從您的筆記本連線至 Amazon EMR 叢集](scheduled-notebook-connect-emr.md)。
+ 如果您使用 Kerberos、LDAP 或 HTTP 基本驗證身分驗證連線至 Amazon EMR 叢集，您必須使用 AWS Secrets Manager 將安全憑證傳遞至 Amazon EMR 連線命令。如需詳細資訊，請參閱[從您的筆記本連線至 Amazon EMR 叢集](scheduled-notebook-connect-emr.md)。
+ (可選) 如果您希望使用者介面預先載入在筆記本啟動時執行的指令碼，您的管理員必須使用生命週期組態 (LCC) 來安裝指令碼。如需與如何使用 LCC 指令碼相關的資訊，請參閱[使用生命週期組態指令碼自訂筆記本執行個體](https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-lifecycle-config.html)。

# 安裝指南
<a name="scheduled-notebook-installation"></a>

以下提供在 JupyterLab 環境中使用筆記本任務所需安裝內容的相關資訊。

**對於 Amazon SageMaker Studio 和 Amazon SageMaker Studio Lab**

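If your notebook is in Amazon SageMaker Studio or Amazon SageMaker Studio Lab, you don't need to perform any additional installation, because SageMaker Notebook Jobs is built into the platform. To set up the required permissions for Studio, see [Set up policies and permissions for Studio](scheduled-notebook-policies-studio.md).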

**適用於本機 Jupyter 筆記本**

如果您想要在本機的 JupyterLab 環境中使用 SageMaker 筆記本工作，則您需要執行其他安裝。

若要安裝 SageMaker 筆記本工作，請完成下列步驟：

1. 安裝 Python 3。如需詳細資訊，請參閱[安裝 Python 3 和 Python 套件](https://www.codecademy.com/article/install-python3)。

1. 安裝 JupyterLab 第 4 版或更新版本。如需詳細資訊，請參閱 [JupyterLab SDK 文件](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html)。

1. 安裝 AWS CLI。如需詳細資訊，請參閱[安裝或更新最新版本的 AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)。

1. 安裝兩組許可。IAM 使用者需要許可才能將任務提交至 SageMaker AI，而且一旦提交，筆記本工作本身就會擔任 IAM 角色，根據工作任務的不同，該角色需要存取資源的許可。

   1. 如果尚未建立 IAM 使用者，請參閱[在 AWS 帳戶中建立 IAM 使用者](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html)。

   1. 如果尚未建立筆記本任務角色，請參閱[建立角色以將許可委派給 IAM 使用者](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html)。

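   1. Attach the necessary permissions and trust policy to your user and role. For step-by-step instructions and permission details, see [Install policies and permissions for local Jupyter environments](scheduled-notebook-policies-other.md).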

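1. Generate AWS credentials for the newly created IAM user and save them in the credentials file (`~/.aws/credentials`) of your JupyterLab environment. You can do this with the CLI command `aws configure`. For instructions, see the section *Set and view configuration settings using commands* in [Configuration and credential file settings](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).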

1. (選用) 依預設，排程器延伸模組會使用預先建置的 SageMaker AI Docker 映像檔搭配 Python 2.0。筆記本中使用的任何非預設核心都應該安裝在容器中。如果要在容器或 Docker 映像中執行筆記本，您需要建立 Amazon Elastic Container Registry (Amazon ECR) 映像。如需有關如何將 Docker 映像推送至 Amazon ECR 的相關資訊，請參閱[推送 Docker 映像](https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html)。

1. 為 SageMaker 筆記本工作新增 JupyterLab 擴充功能。您可以使用以下指令將其新增至您的 JupyterLab 環境：`pip install amazon_sagemaker_jupyter_scheduler`。您可能需要使用以下命令重新啟動 Jupyter 伺服器：`sudo systemctl restart jupyter-server`。

1. 使用以下指令啟動 JupyterLab：`jupyter lab`。

1. 驗證筆記本工作小工具 (![\[Blue icon of a calendar with a checkmark, representing a scheduled task or event.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/icons/notebook-schedule.png)) 在 Jupyter 筆記本工作列中顯示。

# 設定 Studio 的政策和許可
<a name="scheduled-notebook-policies-studio"></a>

在排程您的第一個筆記本執行之前，您將需要安裝適當的政策和許可。以下提供設定下列許可的指示：
+ 工作執行角色信任關係
+ 連接至任務執行角色的其他 IAM 許可
+ （選用） 使用自訂 KMS 金鑰的 AWS KMS 許可政策

**重要**  
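If your AWS account belongs to an organization with service control policies (SCPs) in place, your effective permissions are the logical intersection between what the SCPs allow and what your IAM role and user policies allow. For example, if your organization's SCP specifies that you can only access resources in `us-east-1` and `us-west-1`, and your policies only allow you to access resources in `us-west-1` and `us-west-2`, then ultimately you can only access resources in `us-west-1`. If you want to exercise all the permissions allowed in your role and user policies, your organization's SCPs should grant the same set of permissions as your own IAM user and role policies. For details about how to determine whether a request is allowed, see [Determining whether a request is allowed or denied within an account](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_evaluation-logic.html#policy-eval-denyallow).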
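
The intersection described in this note can be illustrated with a short Python sketch. It models only the Region example from the note, using plain sets; it is not how IAM evaluates policies internally, and the function name `effective_regions` is a hypothetical helper introduced for illustration.

```python
# Region sets taken from the example in the note above.
scp_allowed = {"us-east-1", "us-west-1"}
identity_allowed = {"us-west-1", "us-west-2"}


def effective_regions(scp, identity):
    """Return the Regions permitted by both the SCP and the identity policies."""
    return scp & identity


# Only the Region present in both sets remains accessible.
print(sorted(effective_regions(scp_allowed, identity_allowed)))
```

A Region that appears in only one of the two sets is dropped, which is why the SCP must grant at least what the identity policies grant if you want to use all of them.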

**信任關係**

若要修改信任關係，請完成下列步驟：

1. 開啟 [IAM 主控台](https://console.aws.amazon.com/iam/)。

1. 在左側面板中，選取**角色**。

1. 尋找筆記本工作的工作執行角色，並選擇角色名稱。

1. 選擇**信任關係**標籤。

1. 選擇**編輯信任政策**。

1. 複製並貼上下方政策：

   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": "sagemaker.amazonaws.com"
               },
               "Action": "sts:AssumeRole"
           },
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": "events.amazonaws.com"
               },
               "Action": "sts:AssumeRole"
           }
       ]
   }
   ```

1. 選擇**更新政策**。
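
If you prefer to script the console steps above, the following is a minimal sketch that builds the same trust policy in Python and applies it with the Boto3 IAM client call `update_assume_role_policy`. The helper names `build_trust_policy` and `apply_trust_policy` are hypothetical, and the sketch assumes that your credentials are allowed to update the role.

```python
import json


def build_trust_policy(services=("sagemaker.amazonaws.com", "events.amazonaws.com")):
    """Build a trust policy with one sts:AssumeRole statement per service principal."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service},
                "Action": "sts:AssumeRole",
            }
            for service in services
        ],
    }


def apply_trust_policy(role_name):
    """Replace the trust policy of an existing role (requires AWS credentials)."""
    import boto3  # imported here so that building the policy works without Boto3

    iam = boto3.client("iam")
    iam.update_assume_role_policy(
        RoleName=role_name,
        PolicyDocument=json.dumps(build_trust_policy()),
    )


# Inspect the document before applying it to a role.
print(json.dumps(build_trust_policy(), indent=4))
```

Calling `apply_trust_policy` overwrites the role's existing trust policy, so review the printed document first.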

## 其他 IAM 許可
<a name="scheduled-notebook-policies-add"></a>

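You might need to include additional IAM permissions in the following situations:
+ Your Studio execution role is different from the notebook job role
+ You need to access Amazon S3 resources through an S3 VPC endpoint
+ You want to use a custom KMS key to encrypt your input and output Amazon S3 buckets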

以下討論內容提供了每個案例所需的政策。

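### Permissions needed if your Studio execution role is different from the notebook job role
<a name="scheduled-notebook-policies-add-diffrole"></a>

The following JSON snippet is an example policy that you should add to both the Studio execution role and the notebook job role if you don't use the Studio execution role as the notebook job role. Review and modify this policy if you need to restrict permissions further.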

```
{
   "Version":"2012-10-17",		 	 	 
   "Statement":[
      {
         "Effect":"Allow",
         "Action":"iam:PassRole",
         "Resource":"arn:aws:iam::*:role/*",
         "Condition":{
            "StringLike":{
               "iam:PassedToService":[
                  "sagemaker.amazonaws.com",
                  "events.amazonaws.com"
               ]
            }
         }
      },
      {
         "Effect":"Allow",
         "Action":[
            "events:TagResource",
            "events:DeleteRule",
            "events:PutTargets",
            "events:DescribeRule",
            "events:PutRule",
            "events:RemoveTargets",
            "events:DisableRule",
            "events:EnableRule"
         ],
         "Resource":"*",
         "Condition":{
            "StringEquals":{
               "aws:ResourceTag/sagemaker:is-scheduling-notebook-job":"true"
            }
         }
      },
      {
         "Effect":"Allow",
         "Action":[
            "s3:CreateBucket",
            "s3:PutBucketVersioning",
            "s3:PutEncryptionConfiguration"
         ],
         "Resource":"arn:aws:s3:::sagemaker-automated-execution-*"
      },
      {
            "Sid": "S3DriverAccess",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::sagemakerheadlessexecution-*"
            ]
      },
      {
         "Effect":"Allow",
         "Action":[
            "sagemaker:ListTags"
         ],
         "Resource":[
            "arn:aws:sagemaker:*:*:user-profile/*",
            "arn:aws:sagemaker:*:*:space/*",
            "arn:aws:sagemaker:*:*:training-job/*",
            "arn:aws:sagemaker:*:*:pipeline/*"
         ]
      },
      {
         "Effect":"Allow",
         "Action":[
            "sagemaker:AddTags"
         ],
         "Resource":[
            "arn:aws:sagemaker:*:*:training-job/*",
            "arn:aws:sagemaker:*:*:pipeline/*"
         ]
      },
      {
         "Effect":"Allow",
         "Action":[
            "ec2:DescribeDhcpOptions",
            "ec2:DescribeNetworkInterfaces",
            "ec2:DescribeRouteTables",
            "ec2:DescribeSecurityGroups",
            "ec2:DescribeSubnets",
            "ec2:DescribeVpcEndpoints",
            "ec2:DescribeVpcs",
            "ecr:BatchCheckLayerAvailability",
            "ecr:BatchGetImage",
            "ecr:GetDownloadUrlForLayer",
            "ecr:GetAuthorizationToken",
            "s3:ListBucket",
            "s3:GetBucketLocation",
            "s3:GetEncryptionConfiguration",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:GetObject",
            "sagemaker:DescribeApp",
            "sagemaker:DescribeDomain",
            "sagemaker:DescribeUserProfile",
            "sagemaker:DescribeSpace",
            "sagemaker:DescribeStudioLifecycleConfig",
            "sagemaker:DescribeImageVersion",
            "sagemaker:DescribeAppImageConfig",
            "sagemaker:CreateTrainingJob",
            "sagemaker:DescribeTrainingJob",
            "sagemaker:StopTrainingJob",
            "sagemaker:Search",
            "sagemaker:CreatePipeline",
            "sagemaker:DescribePipeline",
            "sagemaker:DeletePipeline",
            "sagemaker:StartPipelineExecution"
         ],
         "Resource":"*"
      }
   ]
}
```

### 透過 S3 VPC 端點存取 Amazon S3 資源所需的許可
<a name="scheduled-notebook-policies-add-vpc"></a>

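If you run SageMaker Studio in private VPC mode and access S3 through the S3 VPC endpoint, you can add permissions to the VPC endpoint policy to control which S3 resources are accessible through the VPC endpoint. Add the following permissions to your VPC endpoint policy. If you need to restrict the permissions further, you can modify the policy; for example, you can provide a narrower specification for the `Principal` field.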

```
{
    "Sid": "S3DriverAccess",
    "Effect": "Allow",
    "Principal": "*",
    "Action": [
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket"
    ],
    "Resource": "arn:aws:s3:::sagemakerheadlessexecution-*"
}
```

如需有關如何設定 S3 VPC 端點政策的詳細資訊，請參閱[編輯 VPC 端點政策](https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-s3.html#edit-vpc-endpoint-policy-s3)。

### 使用自訂 KMS 金鑰所需的許可 (可選)
<a name="scheduled-notebook-policies-add-kms"></a>

根據預設，輸入和輸出 Amazon S3 儲存貯體使用伺服器端加密，但您可以指定自訂 KMS 金鑰來對輸出 Amazon S3 儲存貯體中的資料，以及連接到筆記本任務的儲存磁碟區進行加密。

如果您想要使用自訂 KMS 金鑰，請連接下列政策並提供您自己的 KMS 金鑰 ARN。

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
      {
         "Effect":"Allow",
         "Action":[
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey",
            "kms:CreateGrant"
         ],
         "Resource":"arn:aws:kms:us-east-1:111122223333:key/key-id"
      }
   ]
}
```

# 安裝本機 Jupyter 環境政策和許可
<a name="scheduled-notebook-policies-other"></a>

您將需要設定必要的許可和政策，才能在本機 Jupyter 環境中排程筆記本任務。IAM 使用者需要許可才能將任務提交至 SageMaker AI，而且筆記本任務本身擔任的 IAM 角色需要存取資源的許可，取決於工作任務。以下將提供如何設定必要許可和政策的指示。

您需要安裝兩組許可。下圖顯示您在本機 Jupyter 環境中排程筆記本任務的許可結構。IAM 使用者需要設定 IAM 許可，才能將任務提交至 SageMaker AI。使用者提交筆記本工作後，工作本身就會承擔 IAM 角色，根據工作任務的不同，該角色會擁有存取資源的許可。

![\[\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/notebook-jobs-permissions.png)


以下各節可協助您為 IAM 使用者和工作執行角色安裝必要的政策和許可。

## IAM 使用者許可
<a name="scheduled-notebook-policies-other-user"></a>

**將任務提交至 SageMaker AI 的許可**

若要新增提交工作的許可，請完成下列步驟：

1. 開啟 [IAM 主控台](https://console.aws.amazon.com/iam/)。

1. 在左側面板中，選取**使用者**。

1. 尋找您筆記本工作的 IAM 使用者，然後選擇使用者名稱。

1. 選擇**新增許可**，然後從下拉式功能表中選擇**建立內嵌政策**。

1. 選擇 **JSON** 標籤。

1. 複製並貼上下方政策：

   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Sid": "EventBridgeSchedule",
               "Effect": "Allow",
               "Action": [
                   "events:TagResource",
                   "events:DeleteRule",
                   "events:PutTargets",
                   "events:DescribeRule",
                   "events:EnableRule",
                   "events:PutRule",
                   "events:RemoveTargets",
                   "events:DisableRule"
               ],
               "Resource": "*",
               "Condition": {
                   "StringEquals": {
                       "aws:ResourceTag/sagemaker:is-scheduling-notebook-job": "true"
                   }
               }
           },
           {
               "Sid": "IAMPassrole",
               "Effect": "Allow",
               "Action": "iam:PassRole",
               "Resource": "arn:aws:iam::*:role/*",
               "Condition": {
                   "StringLike": {
                       "iam:PassedToService": [
                           "sagemaker.amazonaws.com",
                           "events.amazonaws.com"
                       ]
                   }
               }
           },
           {
               "Sid": "IAMListRoles",
               "Effect": "Allow",
               "Action": "iam:ListRoles",
               "Resource": "*"
           },
           {
               "Sid": "S3ArtifactsAccess",
               "Effect": "Allow",
               "Action": [
                   "s3:PutEncryptionConfiguration",
                   "s3:CreateBucket",
                   "s3:PutBucketVersioning",
                   "s3:ListBucket",
                   "s3:PutObject",
                   "s3:GetObject",
                   "s3:GetEncryptionConfiguration",
                   "s3:DeleteObject",
                   "s3:GetBucketLocation"
               ],
               "Resource": [
                   "arn:aws:s3:::sagemaker-automated-execution-*"
               ]
           },
           {
               "Sid": "S3DriverAccess",
               "Effect": "Allow",
               "Action": [
                   "s3:ListBucket",
                   "s3:GetObject",
                   "s3:GetBucketLocation"
               ],
               "Resource": [
                   "arn:aws:s3:::sagemakerheadlessexecution-*"
               ]
           },
           {
               "Sid": "SagemakerJobs",
               "Effect": "Allow",
               "Action": [
                   "sagemaker:DescribeTrainingJob",
                   "sagemaker:StopTrainingJob",
                   "sagemaker:DescribePipeline",
                   "sagemaker:CreateTrainingJob",
                   "sagemaker:DeletePipeline",
                   "sagemaker:CreatePipeline"
               ],
               "Resource": "*",
               "Condition": {
                   "StringEquals": {
                       "aws:ResourceTag/sagemaker:is-scheduling-notebook-job": "true"
                   }
               }
           },
           {
               "Sid": "AllowSearch",
               "Effect": "Allow",
               "Action": "sagemaker:Search",
               "Resource": "*"
           },
           {
               "Sid": "SagemakerTags",
               "Effect": "Allow",
               "Action": [
                   "sagemaker:ListTags",
                   "sagemaker:AddTags"
               ],
               "Resource": [
                   "arn:aws:sagemaker:*:*:pipeline/*",
                   "arn:aws:sagemaker:*:*:space/*",
                   "arn:aws:sagemaker:*:*:training-job/*",
                   "arn:aws:sagemaker:*:*:user-profile/*"
               ]
           },
           {
               "Sid": "ECRImage",
               "Effect": "Allow",
               "Action": [
                   "ecr:GetAuthorizationToken",
                   "ecr:BatchGetImage"
               ],
               "Resource": "*"
           }
       ]
   }
   ```

**AWS KMS 許可政策 （選用）**

根據預設，輸入和輸出 Amazon S3 儲存貯體使用伺服器端加密，但您可以指定自訂 KMS 金鑰來對輸出 Amazon S3 儲存貯體中的資料，以及連接到筆記本任務的儲存磁碟區進行加密。

如果您想要使用自訂 KMS 金鑰，請重複先前的指示，連接下列政策，然後提供您自己的 KMS 金鑰 ARN。

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
      {
         "Effect":"Allow",
         "Action":[
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey",
            "kms:CreateGrant"
         ],
         "Resource":"arn:aws:kms:us-east-1:111122223333:key/key-id"
      }
   ]
}
```

## 工作執行角色許可
<a name="scheduled-notebook-policies-other-job"></a>

**信任關係**

若要修改工作執行角色信任關係，請完成下列步驟：

1. 開啟 [IAM 主控台](https://console.aws.amazon.com/iam/)。

1. 在左側面板中，選取**角色**。

1. 尋找筆記本工作的工作執行角色，並選擇角色名稱。

1. 選擇**信任關係**標籤。

1. 選擇**編輯信任政策**。

1. 複製並貼上下方政策：

   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": [
                       "sagemaker.amazonaws.com",
                       "events.amazonaws.com"
                   ]
               },
               "Action": "sts:AssumeRole"
           }
       ]
   }
   ```

**額外許可**

提交之後，筆記本工作需要許可才能存取資源。下列指示展示如何新增一組最低限度的許可。如有需要，請根據筆記本工作的需要新增更多許可。若要將許可新增至工作執行角色，請完成下列步驟：

1. 開啟 [IAM 主控台](https://console.aws.amazon.com/iam/)。

1. 在左側面板中，選取**角色**。

1. 尋找筆記本工作的工作執行角色，並選擇角色名稱。

1. 選擇**新增許可**，然後從下拉式功能表中選擇**建立內嵌政策**。

1. 選擇 **JSON** 標籤。

1. 複製並貼上下方政策：

   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Sid": "PassroleForJobCreation",
               "Effect": "Allow",
               "Action": "iam:PassRole",
               "Resource": "arn:aws:iam::*:role/*",
               "Condition": {
                   "StringLike": {
                       "iam:PassedToService": "sagemaker.amazonaws.com"
                   }
               }
           },
           {
               "Sid": "S3ForStoringArtifacts",
               "Effect": "Allow",
               "Action": [
                   "s3:PutObject",
                   "s3:GetObject",
                   "s3:ListBucket",
                   "s3:GetBucketLocation"
               ],
               "Resource": "arn:aws:s3:::sagemaker-automated-execution-*"
           },
           {
               "Sid": "S3DriverAccess",
               "Effect": "Allow",
               "Action": [
                   "s3:ListBucket",
                   "s3:GetObject",
                   "s3:GetBucketLocation"
               ],
               "Resource": [
                   "arn:aws:s3:::sagemakerheadlessexecution-*"
               ]
           },
           {
               "Sid": "SagemakerJobs",
               "Effect": "Allow",
               "Action": [
                   "sagemaker:StartPipelineExecution",
                   "sagemaker:CreateTrainingJob"
               ],
               "Resource": "*"
           },
           {
               "Sid": "ECRImage",
               "Effect": "Allow",
               "Action": [
                   "ecr:GetDownloadUrlForLayer",
                   "ecr:BatchGetImage",
                   "ecr:GetAuthorizationToken",
                   "ecr:BatchCheckLayerAvailability"
               ],
               "Resource": "*"
           }
       ]
   }
   ```

1. 將許可新增至筆記本工作存取的其他資源。

1. 選擇**檢閱政策**。

1. 輸入政策的名稱。

1. 選擇**建立政策**。

# Where you can create a notebook job
<a name="create-notebook-auto-run"></a>

如果您想要建立筆記本任務，您有多個選項。以下提供 SageMaker AI 選項，供您建立筆記本任務。

您可以在 Studio UI 的 JupyterLab 筆記本中建立任務，或以程式設計方式使用 SageMaker Python SDK 建立任務：
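+ If you create your notebook job in the Studio UI, you supply details about the image and kernel, the security configuration, and any custom variables or scripts, and your job is then scheduled. For details about how to schedule your job using SageMaker Notebook Jobs, see [Create a notebook job in Studio](create-notebook-auto-run-studio.md).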
+ 若要使用 SageMaker Python SDK 建立筆記本任務，您可以使用筆記本任務步驟建立管道，並啟動隨需執行，或選擇性地使用管道排程功能來排程未來的執行。SageMaker SDK 可讓您靈活地自訂管道 - 您可以將管道擴展到具有多個筆記本任務步驟的工作流程。由於您同時建立 SageMaker 筆記本任務步驟和管道，因此您可以在 SageMaker 筆記本任務儀表板中追蹤管道執行狀態，也可以在 Studio 中檢視管道圖。如需如何使用 SageMaker Python SDK 來排程任務的詳細資訊和範例筆記本的連結，請參閱 [使用 SageMaker AI Python SDK 建立筆記本任務範例](create-notebook-auto-run-sdk.md)。

# 使用 SageMaker AI Python SDK 建立筆記本任務範例
<a name="create-notebook-auto-run-sdk"></a>

若要使用 SageMaker Python SDK 執行獨立筆記本，您需要建立筆記本任務步驟、將其連接至管道，並使用 Pipelines 提供的公用程式來隨需執行任務，或選擇性地排程一或多個未來任務。下列各節描述建立隨需或排程筆記本任務並追蹤執行的基本步驟。此外，如果您需要將參數傳遞至筆記本任務或連線到筆記本中的 Amazon EMR，請參閱下列討論 - 在這些情況下，需要額外準備 Jupyter 筆記本。您也可以套用 `NotebookJobStep` 引數子集的預設值，以便您不必在每次建立筆記本任務步驟時指定這些引數。

若要檢視示範如何使用 SageMaker AI Python SDK 排程筆記本工作的範例筆記本，請參閱[筆記本工作範例筆記本](https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-pipelines/notebook-job-step)。

**Topics**
+ [建立筆記本任務的步驟](#create-notebook-auto-run-overall)
+ [在 Studio UI 儀表板中檢視您的筆記本任務](#create-notebook-auto-run-dash)
+ [在 Studio 中檢視您的管道圖](#create-notebook-auto-run-graph)
+ [將參數傳遞到您的筆記本](#create-notebook-auto-run-passparam)
+ [在您的輸入筆記本中連線至 Amazon EMR 叢集](#create-notebook-auto-run-emr)
+ [設定預設選項](#create-notebook-auto-run-intdefaults)

## 建立筆記本任務的步驟
<a name="create-notebook-auto-run-overall"></a>

您可以建立一個立即執行或依排程執行的筆記本任務。下列指示描述這兩種方法。

**若要排程筆記本任務，請完成下列步驟：**

1. Create a `NotebookJobStep` instance. For details about `NotebookJobStep` parameters, see [sagemaker.workflow.notebook_job_step.NotebookJobStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.notebook_job_step.NotebookJobStep). At minimum, provide the following arguments, as shown in the following code snippet:
**重要**  
如果您使用 SageMaker Python SDK 排程筆記本任務，您只能指定特定映像來執行筆記本任務。如需詳細資訊，請參閱[SageMaker AI Python SDK 筆記本任務的映像限制條件](notebook-auto-run-constraints.md#notebook-auto-run-constraints-image-sdk)。

   ```
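   from sagemaker.workflow.notebook_job_step import NotebookJobStep

   # Replace the placeholder strings with your own values.
   notebook_job_step = NotebookJobStep(
       input_notebook="input-notebook",
       image_uri="image-uri",
       kernel_name="kernel-name",
   )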
   ```

1. 使用 `NotebookJobStep` 做為單一步驟建立管道，如下列程式碼片段所示：

   ```
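   from sagemaker.workflow.pipeline import Pipeline

   # Replace "pipeline-name" with your own value; sagemaker_session is an existing session object.
   pipeline = Pipeline(
       name="pipeline-name",
       steps=[notebook_job_step],
       sagemaker_session=sagemaker_session,
   )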
   ```

1. 隨需執行管道，或選擇性地排程未來的管道執行。若要啟動立即執行，請使用下列命令：

   ```
   execution = pipeline.start(
       parameters={...}
   )
   ```

   或者，您可以按預定間隔排程單一未來管道執行或多個執行。您可以在 `PipelineSchedule` 中指定排程，然後使用 `put_triggers` 將排程物件傳遞至管道。如需管道排程的詳細資訊，請參閱[使用 SageMaker Python SDK 排程管道](pipeline-eventbridge.md#build-and-manage-scheduling)。

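   The following example schedules your pipeline to run once on December 25, 2023 at 10:31:32 UTC.

   ```
   from datetime import datetime
   from sagemaker.workflow.triggers import PipelineSchedule

   my_schedule = PipelineSchedule(
       name="my-schedule",
       at=datetime(year=2023, month=12, day=25, hour=10, minute=31, second=32)
   )
   pipeline.put_triggers(triggers=[my_schedule])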
   ```

   以下範例排程您的管道在 2022 年至 2023 年每個月的最後一個星期五上午 10:15 UTC 執行。如需 Cron 型排程的詳細資訊，請參閱 [Cron 型排程](https://docs.aws.amazon.com/scheduler/latest/UserGuide/schedule-types.html#cron-based)。

   ```
   my_schedule = PipelineSchedule(  
       name="my-schedule",
       cron="15 10 ? * 6L 2022-2023"
   )  
   pipeline.put_triggers(triggers=[my_schedule])
   ```

1. (選用) 在 SageMaker 筆記本任務儀表板中檢視您的筆記本任務。您為筆記本任務步驟的 `tags` 引數提供的值會控制 Studio UI 擷取和顯示任務的方式。如需詳細資訊，請參閱[在 Studio UI 儀表板中檢視您的筆記本任務](#create-notebook-auto-run-dash)。
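
To make the cron-based schedule in step 3 easier to read, the following Python sketch labels each field of the expression. It assumes the six-field layout documented for EventBridge Scheduler cron expressions (minutes, hours, day of month, month, day of week, year). The helper `describe_cron` is hypothetical and only splits the string; it does not validate the values.

```python
# Field order assumed from the EventBridge Scheduler cron format.
CRON_FIELDS = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def describe_cron(expression):
    """Map each space-separated cron field to its name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"Expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


fields = describe_cron("15 10 ? * 6L 2022-2023")
for name, value in fields.items():
    print(f"{name}: {value}")
```

In this expression, `6L` in the day-of-week field selects the last Friday of the month, and `?` leaves the day of month unspecified.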

## 在 Studio UI 儀表板中檢視您的筆記本任務
<a name="create-notebook-auto-run-dash"></a>

如果您指定特定標籤，做為管道步驟建立的筆記本任務會出現在 Studio Notebook 任務儀表板中。

**注意**  
只有 Studio 或本機 JupyterLab 環境中建立的筆記本工作才會建立工作定義。因此，如果您使用 SageMaker Python SDK 建立筆記本任務，則不會在筆記本任務儀表板中看到任務定義。不過，您可以檢視筆記本任務，如[檢視筆記本任務](view-notebook-jobs.md)中所述。

您可以使用下列標籤控制哪些團隊成員可以檢視您的筆記本任務：
+ 若要將筆記本顯示給網域中的所有使用者設定檔或[空間](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-jl-user-guide.html)，請使用您的網域名稱新增網域標籤。範例顯示如下：
  + 金鑰：`sagemaker:domain-name`、值：`d-abcdefghij5k`
+ 若要將筆記本任務顯示給網域中的特定使用者設定檔，請同時新增使用者設定檔和網域標籤。使用者設定檔標籤的範例如下所示：
  + 金鑰：`sagemaker:user-profile-name`、值：`studio-user`
+ 若要將筆記本任務顯示給[空間](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-jl-user-guide.html)，請同時新增空間和網域標籤。空間標籤的範例如下所示：
  + 金鑰：`sagemaker:shared-space-name`、值：`my-space-name`
+ 如果您未連接任何網域或使用者設定檔或空間標籤，則 Studio UI 不會顯示管道步驟建立的筆記本任務。在這種情況下，您可以在訓練任務主控台中檢視基礎訓練任務，也可以在[管道執行清單](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-studio-view-execution.html)中檢視狀態。

一旦您設定必要的標籤以在儀表板中檢視任務，請參閱[檢視筆記本任務](view-notebook-jobs.md)以取得如何檢視任務和下載輸出的指示。
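
As a rough sketch of the tagging rules above, the following hypothetical helper assembles the tag keys listed in this section. The domain tag is always included because the user profile and space tags must be combined with it. The exact structure that the `tags` argument of `NotebookJobStep` accepts (a dictionary or a list of key-value pairs) is an assumption here; check the SDK reference for your version.

```python
def notebook_job_tags(domain_name, user_profile_name=None, shared_space_name=None):
    """Build the tags that control who sees the notebook job in the Studio dashboard."""
    tags = {"sagemaker:domain-name": domain_name}
    if user_profile_name is not None:
        tags["sagemaker:user-profile-name"] = user_profile_name
    if shared_space_name is not None:
        tags["sagemaker:shared-space-name"] = shared_space_name
    return tags


def as_key_value_list(tags):
    """Convert the dictionary to the Key/Value list form used by many AWS APIs."""
    return [{"Key": key, "Value": value} for key, value in tags.items()]


# Example values copied from the list above.
profile_tags = notebook_job_tags("d-abcdefghij5k", user_profile_name="studio-user")
print(as_key_value_list(profile_tags))
```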

## 在 Studio 中檢視您的管道圖
<a name="create-notebook-auto-run-graph"></a>

由於您的筆記本任務步驟是管道的一部分，因此您可以在 Studio 中檢視管道圖 (DAG)。在管道圖中，您可以檢視管道執行的狀態並追蹤歷程。如需詳細資訊，請參閱[檢視管道執行的詳細資訊](pipelines-studio-view-execution.md)。

## 將參數傳遞到您的筆記本
<a name="create-notebook-auto-run-passparam"></a>

如果您想要將參數傳遞至您的筆記本任務 (使用 `NotebookJobStep` 的 `parameters` 引數)，您需要準備輸入筆記本來接收參數。

Papermill 型筆記本任務執行器會搜尋標記有 `parameters` 標籤的 Jupyter 儲存格，並在此儲存格之後立即套用新參數或參數覆寫。如需詳細資訊，請參閱[對筆記本進行參數化](notebook-auto-run-troubleshoot-override.md)。

一旦您執行了此步驟，請將參數傳遞至 `NotebookJobStep`，如下列範例所示：

```
notebook_job_parameters = {
    "company": "Amazon"
}

# Replace the placeholder strings with your own values.
notebook_job_step = NotebookJobStep(
    image_uri="image-uri",
    kernel_name="kernel-name",
    role="role-name",
    input_notebook="input-notebook",
    parameters=notebook_job_parameters,
    # ...other arguments as needed
)
```

## 在您的輸入筆記本中連線至 Amazon EMR 叢集
<a name="create-notebook-auto-run-emr"></a>

如果您從 Studio 中的 Jupyter 筆記本連線至 Amazon EMR 叢集，您可能需要進一步修改 Jupyter 筆記本。請查看[從您的筆記本連線至 Amazon EMR 叢集](scheduled-notebook-connect-emr.md)，您是否需要在筆記本中執行下列任何任務：
+ **將參數傳遞到您的 Amazon EMR 連線命令。**Studio 使用 Papermill 執行筆記本。在 SparkMagic 核心中，由於 Papermill 傳遞資訊至 SparkMagic 的方式，因此您傳遞至 Amazon EMR 連線命令的參數可能無法如預期運作。
+ **將使用者憑證傳遞至經過 Kerberos、LDAP 或 HTTP 基本身分驗證的 Amazon EMR 叢集**。您必須透過 AWS Secrets Manager傳遞使用者憑證。

## 設定預設選項
<a name="create-notebook-auto-run-intdefaults"></a>

SageMaker SDK 可讓您選擇為參數子集設定預設值，以便您不必在每次建立 `NotebookJobStep` 執行個體時指定這些參數。這些參數為 `role`、`s3_root_uri`、`s3_kms_key`、`volume_kms_key`、`subnets` 和 `security_group_ids`。使用 SageMaker AI 組態檔案來設定步驟的預設值。如需 SageMaker AI 組態檔案的詳細資訊，請參閱[搭配 SageMaker Python SDK 設定和使用預設值](https://sagemaker.readthedocs.io/en/stable/overview.html#configuring-and-using-defaults-with-the-sagemaker-python-sdk)。

若要設定筆記本任務預設值，請將新的預設值套用至組態檔案的筆記本任務區段，如下列程式碼片段所示：

```
SageMaker:
  PythonSDK:
    Modules:
      NotebookJob:
        RoleArn: 'arn:aws:iam::555555555555:role/IMRole'
        S3RootUri: 's3://amzn-s3-demo-bucket/my-project'
        S3KmsKeyId: 's3kmskeyid'
        VolumeKmsKeyId: 'volumekmskeyid1'
        VpcConfig:
          SecurityGroupIds:
            - 'sg123'
          Subnets:
            - 'subnet-1234'
```

# 在 Studio 中建立筆記本任務
<a name="create-notebook-auto-run-studio"></a>

**注意**  
筆記本排程器是從 Amazon EventBridge、SageMaker 訓練和 Pipelines 服務建置的。如果筆記本工作失敗，您可能會看到與這些服務相關的錯誤。以下提供如何在 Studio UI 中建立筆記本任務的相關資訊。

SageMaker 筆記本任務提供您使用 Notebook 任務小工具建立和管理非互動式筆記本任務的工具。您可以建立工作、檢視您建立的工作，以及暫停、停止或繼續現有工作。您也可以修改筆記本排程。

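When you create your scheduled notebook job with the widget, the scheduler tries to infer a selection of default options and automatically populates the form to help you get started quickly. If you are using Studio, you can submit an on-demand job without setting any options at all. To submit a (scheduled) notebook job definition, you only need to supply the time-specific schedule information. You can customize other fields if your scheduled job requires specialized settings. If you are running a local Jupyter notebook, the scheduler extension provides a feature for you to specify your own defaults (for a subset of options) so that you don't have to manually insert the same values every time.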

建立筆記本任務時，您可以包含其他檔案，例如資料集、映像和本機指令碼。若要這樣做，請選擇**使用輸入資料夾執行任務**。筆記本任務現在將可以存取輸入檔案資料夾下的所有檔案。當筆記本任務執行時，目錄的檔案結構保持不變。

若要對筆記本工作進行排程，請完成下列步驟。

1. 開啟**建立工作**表單。

   在本機 JupyterLab 環境中，選擇任務列中的**建立筆記本工作**圖示 (![\[Blue icon of a calendar with a checkmark, representing a scheduled task or event.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/icons/notebook-schedule.png))。如果您未看到此圖示，請依照[安裝指南](scheduled-notebook-installation.md)中的指示進行安裝。

   在 Studio 中，透過以下兩種方式的其中一種開啟表單：
   + 使用**檔案瀏覽器**

     1. 在左側面板的**檔案瀏覽器**中，以滑鼠右鍵按一下要作為排程工作執行的筆記本。

     1. Choose **Create Notebook Job**.
   + 使用 Studio 筆記本
     + 在您想要作為排程工作執行的 Studio 筆記本中，選擇 Studio 工具列中的**建立筆記本工作** 圖示 (![\[Blue icon of a calendar with a checkmark, representing a scheduled task or event.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/icons/notebook-schedule.png))。

1. 填寫快顯表單。表單會顯示下列欄位：
   + **工作名稱**：您為工作指定的描述性名稱。
   + **輸入檔案**：您要排程在非互動模式下執行的筆記本的名稱。
   + **運算類型**：您要在其中執行筆記本的 Amazon EC2 執行個體的類型。
   + **參數**：您可以選擇性地指定為筆記本輸入的自訂參數。若要使用此特徵，您可以選擇性地使用 **parameters** 標籤來標記 Jupyter 筆記本中的特定儲存格，以控制參數的套用位置。如需詳細資訊，請參閱[對筆記本進行參數化](notebook-auto-run-troubleshoot-override.md)。
   + (選用) **使用輸入資料夾執行任務**：如果選取，排程任務將可以存取與**輸入檔案**位於相同資料夾中的所有檔案。
   + **其他選項**：您可以為工作指定其他自訂項目。例如，您可以指定映像或核心、輸入和輸出資料夾、任務重試和逾時選項、加密詳細資料以及自訂初始化指令碼。如需可套用之自訂項目的完整清單，請參閱[可用選項](create-notebook-auto-execution-advanced.md)。

1. 安排您的工作。您可以隨需執行或按固定排程執行筆記本。
   + 若要隨需執行 Jupyter 筆記本，請完成下列步驟：
     + 選取**立即執行**。
     + 選擇**建立**。
     + 系統隨即會顯示**筆記本工作**。選擇**重新載入**，將工作載入儀表板。
   + 若要按固定排程執行 Jupyter 筆記本，請完成下列步驟：
     + 選擇**按排程執行**。
     + 選擇**間隔**下拉式清單，然後選取間隔。間隔可選範圍從每分鐘到每月。您也可以選取**自訂排程**。
     + 根據您選擇的間隔，系統會顯示其他欄位，以協助您進一步指定所需的執行日期和時間。例如，如果您為每日執行選取**日**，則系統會顯示其他欄位供您指定所需的時間。請注意，您指定的任何時間都採用 UTC 格式。另請注意，如果選擇較小的間隔時間 (例如一分鐘)，則如果下一個工作開始時，先前的工作未完成，則工作會重疊。

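       If you select a custom schedule, use cron syntax in the expression box to specify the exact date and time of each run. Cron syntax is a space-separated list of fields, each of which represents a unit of time, from seconds to years. For help with cron syntax, choose **Get help with cron syntax** under the expression box.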
     + 選擇**建立**。
     + 系統隨即會顯示**筆記本工作定義**標籤。選擇**重新載入**，將工作定義載入儀表板。

# 設定本機筆記本的預設選項
<a name="create-notebook-auto-execution-advanced-default"></a>

**重要**  
自 2023 年 11 月 30 日起，先前的 Amazon SageMaker Studio 體驗現在命名為 Amazon SageMaker Studio Classic。下節專門介紹如何使用 Studio Classic 應用程式。如需使用已更新 Studio 體驗的資訊，請參閱 [Amazon SageMaker Studio](studio-updated.md)。  
Studio Classic 仍會針對現有工作負載進行維護，但無法再用於加入。您只能停止或刪除現有的 Studio Classic 應用程式，而且無法建立新的應用程式。建議您[將工作負載遷移至新的 Studio 體驗](studio-updated-migrate.md)。

您可以在建立筆記本任務時設定預設選項。如果您計劃使用與所提供預設值不同的選項建立多個筆記本任務，這可以節省您的時間。以下提供如何為本機筆記本設定預設選項的相關資訊。

如果必須在**建立工作**表單中手動輸入 (或貼上) 自訂值，則您可以儲存新的預設值，排程器擴充功能會在您每次建立新工作定義時插入新值。此功能適用於以下選項：
+ **角色 ARN**
+ **S3 輸入資料夾**
+ **S3 輸出資料夾**
+ **輸出加密 KMS 金鑰** (如果您開啟**設定任務加密**)
+ **任務執行個體磁碟區加密 KMS 金鑰** (如果您開啟**設定任務加密**)

如果您插入與提供的預設值不同的值，並繼續將這些值用於未來的工作執行，則此功能可幫助您節省時間。您選擇的使用者設定儲存在執行 JupyterLab 伺服器的機器上，並藉助原生 API 進行檢索。如果您為一個或多個選項提供新的預設值，而並非提供全部五個選項，則系統會針對您未自訂的選項採用先前的預設值。
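
The fallback behavior described above, where options you don't customize keep their previous defaults, can be sketched as a simple dictionary merge. The option names below are hypothetical stand-ins for the five settings in the list; the keys that the scheduler extension actually stores may differ.

```python
# Hypothetical option names; system defaults are empty strings.
SYSTEM_DEFAULTS = {
    "role_arn": "",
    "s3_input_folder": "",
    "s3_output_folder": "",
    "output_kms_key": "",
    "volume_kms_key": "",
}


def effective_defaults(previous, user_overrides):
    """Apply user overrides on top of the previous defaults, leaving other options unchanged."""
    merged = dict(previous)
    for option, value in user_overrides.items():
        if option not in previous:
            raise KeyError(f"Unknown option: {option}")
        merged[option] = value
    return merged


# Override a single option; the other four keep their previous values.
defaults = effective_defaults(
    SYSTEM_DEFAULTS,
    {"role_arn": "arn:aws:iam::111122223333:role/ExampleRole"},
)
print(defaults)
```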

下列指示為您展示如何為您的筆記本任務預覽現有預設值，設定新的預設值，以及重設預設值。

**若要為您的筆記本任務預覽現有預設值，請完成下列步驟：**

1. 遵循 [啟動 Amazon SageMaker Studio Classic](studio-launch.md) 中的指示開啟 Amazon SageMaker Studio Classic 主控台。

1. 在左側面板的**檔案瀏覽器**中，以滑鼠右鍵按一下要作為排程工作執行的筆記本。

1. Choose **Create Notebook Job**.

1. 選擇**其他選項**以展開筆記本任務設定的索引標籤。您可以在此處檢視預設設定。

**若要為您的未來筆記本任務設定新的預設值，請完成下列步驟：**

1. 遵循 [啟動 Amazon SageMaker Studio Classic](studio-launch.md) 中的指示開啟 Amazon SageMaker Studio Classic 主控台。

1. 從 Studio Classic 頂端功能表中，選擇**設定**，然後選擇**進階設定編輯器**。

1. 從**設定**下方的清單中選擇 **Amazon SageMaker 排程器**。預設可能已開啟此項目。

1. 您可以直接在此 UI 頁面或使用 JSON 編輯器更新預設設定。
   + 在 UI 中，您可以插入**角色 ARN**、**S3 輸入資料夾**、**S3 輸出資料夾**、**輸出加密 KMS 金鑰**或**任務執行個體磁碟區加密 KMS 金鑰**的新值。如果變更這些值，則在您建立下一個筆記本任務時，將在**其他選項**下看到這些欄位的新預設值。
   + (選用) 若要使用 **JSON 設定編輯器**更新使用者預設值，請完成下列步驟：

     1. 在右上角選擇 **JSON 設定編輯器**。

     1. 在左側邊欄的**設定**中，選擇 **Amazon SageMaker AI 排程器**。預設可能已開啟此項目。

        您可以在**使用者偏好設定**面板中查看目前的預設值。

        您可以在**系統預設值**面板中查看系統預設值。

     1. 若要更新預設值，請將 JSON 程式碼片段從**系統預設值**面板複製並貼上至**使用者偏好設定**面板，然後更新欄位。

     1. 如果您更新了預設值，請選擇右上角的**儲存使用者設定** 圖示 (![\[Icon of a cloud with an arrow pointing upward, representing cloud upload functionality.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/icons/Notebook_save.png))。關閉編輯器並不會儲存變更。

**如果您之前已進行過變更，現在想要重設使用者定義的預設值，請完成下列步驟：**

1. 從 Studio Classic 頂端功能表中，選擇**設定**，然後選擇**進階設定編輯器**。

1. 從**設定**下方的清單中選擇 **Amazon SageMaker 排程器**。預設可能已開啟此項目。

1. 您可以直接使用此 UI 頁面或使用 JSON 編輯器來還原預設值。
   + 在 UI 中，您可以選擇右上角的**還原至預設值**。預設值會還原為空字串。只有在您先前變更過預設值時，才能看到此選項。
   + (Optional) To restore the default settings using the **JSON Settings Editor**, complete the following steps:

     1. 在右上角選擇 **JSON 設定編輯器**。

     1. 在左側邊欄的**設定**中，選擇 **Amazon SageMaker AI 排程器**。預設可能已開啟此項目。

        您可以在**使用者偏好設定**面板中查看目前的預設值。

        您可以在**系統預設值**面板中查看系統預設值。

     1. To restore the current default settings, copy the contents of the **System Defaults** panel to the **User Preferences** panel.

     1. 選擇右上角的**儲存使用者設定**圖示 (![\[Icon of a cloud with an arrow pointing upward, representing cloud upload functionality.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/icons/Notebook_save.png))。關閉編輯器並不會儲存變更。

# 筆記本任務工作流程
<a name="create-notebook-auto-run-dag"></a>

由於筆記本任務執行您的自訂程式碼，因此您可以建立一個管道，包括一或多個筆記本任務步驟。ML 工作流程通常包含多個步驟，例如預先處理資料的處理步驟、建置模型的訓練步驟，以及模型評估步驟等。筆記本任務的一個可能用途是處理預先處理 - 您可能有一個執行資料轉換或擷取的筆記本、一個執行資料清除的 EMR 步驟，以及另一個在啟動訓練步驟之前執行輸入特徵化的筆記本任務。筆記本任務可能需要來自管道中先前步驟的資訊，或來自使用者所指定自訂的資訊，做為輸入筆記本中的參數。如需展示如何將環境變數和參數傳遞至筆記本，並從先前步驟擷取資訊的範例，請參閱[將資訊傳遞至筆記本步驟以及從中傳遞資訊](create-notebook-auto-run-dag-seq.md)。

在另一個使用案例中，其中一個筆記本任務可能會呼叫另一個筆記本，以在筆記本執行期間執行一些任務 - 在這種情況下，您需要將這些來源筆記本指定為筆記本任務步驟的相依性。如需如何呼叫另一個筆記本的相關資訊，請參閱[在您的筆記本任務中調用另一個筆記本](create-notebook-auto-run-dag-call.md)。

若要檢視示範如何使用 SageMaker AI Python SDK 排程筆記本工作的範例筆記本，請參閱[筆記本工作範例筆記本](https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-pipelines/notebook-job-step)。

# 將資訊傳遞至筆記本步驟以及從中傳遞資訊
<a name="create-notebook-auto-run-dag-seq"></a>

下列各節描述將資訊作為環境變數和參數傳遞至筆記本的方法。

## 傳遞環境變數
<a name="create-notebook-auto-run-dag-seq-env-var"></a>

Pass environment variables as a dictionary to the `environment_variables` argument of your `NotebookJobStep`, as shown in the following example:

```
environment_variables = {"RATE": 0.0001, "BATCH_SIZE": 1000}

notebook_job_step = NotebookJobStep(
    ...
    environment_variables=environment_variables,
    ...
)
```

You can read the environment variables in your notebook by using `os.getenv()`, as shown in the following example:

```
# inside your notebook
import os
print(f"ParentNotebook: env_key={os.getenv('env_key')}")
```
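
Environment variables always arrive in the notebook as strings, so numeric settings such as `RATE` and `BATCH_SIZE` from the earlier example need to be converted before use. The following is a minimal sketch; the function name `read_training_settings` and the fallback values are assumptions for illustration.

```python
import os


def read_training_settings(env=None):
    """Read RATE and BATCH_SIZE from the environment and convert them to numbers."""
    env = os.environ if env is None else env
    # Environment values are strings, so convert them explicitly.
    rate = float(env.get("RATE", "0.0001"))
    batch_size = int(env.get("BATCH_SIZE", "1000"))
    return rate, batch_size


# Inside the notebook job, call the function with no argument to use os.environ.
rate, batch_size = read_training_settings({"RATE": "0.01", "BATCH_SIZE": "32"})
print(rate, batch_size)
```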

## 傳遞參數
<a name="create-notebook-auto-run-dag-seq-param"></a>

當您將參數傳遞至 `NotebookJobStep` 執行個體中的第一個筆記本任務步驟時，您可能會選擇性地想要在 Jupyter 筆記本中標記儲存格，以指出要套用新參數或參數覆寫的位置。如需如何在 Jupyter 筆記本中標記儲存格的指示，請參閱[對筆記本進行參數化](notebook-auto-run-troubleshoot-override.md)。

您可以透過筆記本任務步驟的 `parameters` 參數傳遞參數，如下列程式碼片段所示：

```
notebook_job_parameters = {
    "company": "Amazon",
}

notebook_job_step = NotebookJobStep(
    ...
    parameters=notebook_job_parameters,
    ...
)
```

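Inside your input notebook, your parameters are applied after the cell tagged with `parameters`, or at the beginning of the notebook if you don't have a tagged cell.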

```
# this cell is in your input notebook and is tagged with 'parameters'
# your parameters and parameter overrides are applied after this cell
company='default'
```

```
# in this cell, your parameters are applied
# prints "company is Amazon"
print(f'company is {company}')
```

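## Retrieve information from a previous step
<a name="create-notebook-auto-run-dag-seq-interstep"></a>

The following discussion explains how to extract information from a previous step and pass it to your notebook job step.

**Use the `properties` attribute**

You can use the following properties with the `properties` attribute of the previous step:
+ `ComputingJobName` - The training job name
+ `ComputingJobStatus` - The training job status
+ `NotebookJobInputLocation` - The input Amazon S3 location
+ `NotebookJobOutputLocationPrefix` - The path to your training job outputs. The outputs themselves are located at `{NotebookJobOutputLocationPrefix}/{training-job-name}/output/output.tar.gz`.
+ `InputNotebookName` - The input notebook file name
+ `OutputNotebookName` - The output notebook file name (which might not exist in the training job output folder if the job fails)

The following code snippet shows how to extract parameters from the `properties` attribute.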

```
notebook_job_step2 = NotebookJobStep(
    ...
    parameters={
        "step1_JobName": notebook_job_step1.properties.ComputingJobName,
        "step1_JobStatus": notebook_job_step1.properties.ComputingJobStatus,
        "step1_NotebookJobInput": notebook_job_step1.properties.NotebookJobInputLocation,
        "step1_NotebookJobOutput": notebook_job_step1.properties.NotebookJobOutputLocationPrefix,
    },
)
```

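**Use JsonGet**

If you want to pass parameters other than the ones mentioned previously, and the JSON outputs of your previous step reside in Amazon S3, use `JsonGet`. `JsonGet` is a general mechanism that can extract data directly from JSON files in Amazon S3.

To extract data from a JSON file in Amazon S3 with `JsonGet`, complete the following steps:

1. Upload your JSON file to Amazon S3. If your data is already uploaded to Amazon S3, skip this step. The following example demonstrates uploading a JSON file to Amazon S3.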

   ```
   import json
   from sagemaker.s3 import S3Uploader
   
   output = {
       "key1": "value1", 
       "key2": [0,5,10]
   }
               
   json_output = json.dumps(output)
   
   with open("notebook_job_params.json", "w") as file:
       file.write(json_output)
   
   S3Uploader.upload(
       local_path="notebook_job_params.json",
       desired_s3_uri="s3://path/to/bucket"
   )
   ```

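1. Provide your S3 URI and the JSON path to the value you want to extract. In the following example, `JsonGet` returns an object that represents index 2 of the value associated with key `key2`, which is `10`.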

   ```
   from sagemaker.workflow.functions import Join, JsonGet
   from sagemaker.workflow.notebook_job_step import NotebookJobStep

   NotebookJobStep(
       ...
       parameters={
           # the key job_key1 returns an object representing the value 10
           "job_key1": JsonGet(
               s3_uri=Join(on="/", values=["s3:/", ..]),
               json_path="key2[2]" # value to reference in that json file
           ),
           "job_key2": "Amazon"
       }
   )
   ```
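
To make the `json_path` syntax concrete, the following sketch resolves a path such as `key2[2]` against the JSON document uploaded in step 1. It is only a local illustration of what the path selects. It is not how `JsonGet` is implemented, since `JsonGet` is resolved by the pipeline at execution time, and the helper name `resolve_json_path` is hypothetical.

```python
import json
import re


def resolve_json_path(document, json_path):
    """Follow a path such as 'key2[2]' or 'a.b[0]' through parsed JSON data."""
    value = document
    # Each match is either a key name or a bracketed list index.
    for key, index in re.findall(r"([^.\[\]]+)|\[(\d+)\]", json_path):
        value = value[key] if key else value[int(index)]
    return value


# The same content that step 1 writes to notebook_job_params.json.
document = json.loads('{"key1": "value1", "key2": [0, 5, 10]}')
print(resolve_json_path(document, "key2[2]"))  # the element at index 2 of key2
```

Running a check like this on the local JSON file before uploading it is a quick way to confirm that the path you give to `JsonGet` points at the value you expect.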

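# Invoke another notebook in your notebook job
<a name="create-notebook-auto-run-dag-call"></a>

You can set up a pipeline in which one notebook job calls another notebook. The following example sets up a pipeline with a notebook job step in which the notebook calls two other notebooks. The input notebook contains the following lines: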

```
%run 'subfolder/notebook_to_call_in_subfolder.ipynb'
%run 'notebook_to_call.ipynb'
```

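Pass these notebooks to your `NotebookJobStep` instance with `additional_dependencies`, as shown in the following snippet. Note that the paths provided for the notebooks in `additional_dependencies` are given from the root location. For information about how SageMaker AI uploads your dependent files and folders to Amazon S3, so that you can provide correct paths to your dependencies, see the description of `additional_dependencies` in [NotebookJobStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.notebook_job_step.NotebookJobStep).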

```
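from sagemaker.workflow.notebook_job_step import NotebookJobStep

# Paths are given from the root location.
input_notebook = "inputs/input_notebook.ipynb"
simple_notebook_path = "inputs/notebook_to_call.ipynb"
folder_with_sub_notebook = "inputs/subfolder"

# image_uri, kernel_name, role_name, and tags are placeholders that you define.
notebook_job_step = NotebookJobStep(
    image_uri=image_uri,
    kernel_name=kernel_name,
    role=role_name,
    input_notebook=input_notebook,
    additional_dependencies=[simple_notebook_path, folder_with_sub_notebook],
    tags=tags,
)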
```
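
A notebook that is called with `%run` but missing from `additional_dependencies` is not available when the job runs, so it can help to list the `%run` targets before you build the step. The following sketch scans cell sources for `%run` lines. The helper `find_run_targets` and its regular expression are illustrative assumptions, not part of the SageMaker Python SDK, and the paths it returns are relative to the input notebook rather than to the root location.

```python
import re

# Matches a line such as: %run 'subfolder/notebook.ipynb' (quotes optional).
RUN_MAGIC = re.compile(
    r"""^[ \t]*%run[ \t]+['"]?([^'"\n]+?)['"]?[ \t]*$""", re.MULTILINE
)


def find_run_targets(cell_sources):
    """Return the notebook paths referenced by %run lines, in order of appearance."""
    targets = []
    for source in cell_sources:
        targets.extend(RUN_MAGIC.findall(source))
    return targets


cells = [
    "%run 'subfolder/notebook_to_call_in_subfolder.ipynb'\n%run 'notebook_to_call.ipynb'\n",
    "print('done')",
]
print(find_run_targets(cells))
```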

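# Available options
<a name="create-notebook-auto-execution-advanced"></a>

The following table displays all the options you can use to customize your notebook job, whether you run the notebook job in Studio, in a local Jupyter environment, or with the SageMaker Python SDK. The table includes the type of custom option, a description, additional guidelines about how to use the option, the field name for the option in Studio (if available), and the parameter name for the notebook job step in the SageMaker Python SDK (if available).

For some options, you can also preset custom default values so that you don't have to specify them each time you set up a notebook job. For Studio, these options are **Role**, **Input folder**, **Output folder**, and **KMS Key ID**, and they are identified in the following table. If you preset custom defaults for these options, the fields are prepopulated in the **Create Job** form when you create your notebook job. For details about how to create custom defaults in Studio and local Jupyter environments, see [Set up default options for local notebooks](create-notebook-auto-execution-advanced-default.md).

The SageMaker SDK also gives you the option to set intelligent defaults so that you don't have to specify these parameters when you create a `NotebookJobStep`. These parameters are `role`, `s3_root_uri`, `s3_kms_key`, `volume_kms_key`, `subnets`, and `security_group_ids`, and they are identified in the following table. For information about how to set intelligent defaults, see [Set up default options](create-notebook-auto-run-sdk.md#create-notebook-auto-run-intdefaults).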


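| Custom option | Description | Studio-specific guidelines | Local Jupyter environment guidelines | SageMaker Python SDK guidelines | 
| --- | --- | --- | --- | --- | 
| Job name | Your job name as it should appear in the Notebook Jobs dashboard. | Field **Job name**. | Same as Studio. | Parameter notebook\_job\_name. Defaults to None. | 
| Image | The container image used to run the notebook noninteractively on the chosen compute type. | Field **Image**. This field defaults to your notebook's current image. Change this field from the default to a custom value if needed. If Studio cannot infer this value, the form displays a validation error that requires you to specify it. This image can be a custom, [bring-your-own image](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-byoi.html) or an available Amazon SageMaker image. For a list of available SageMaker images supported by the notebook scheduler, see [Amazon SageMaker images available for use with Studio Classic notebooks](notebooks-available-images.md). | Field **Image**. This field requires the ECR URI of a Docker image that can run the provided notebook on the selected compute type. By default, the scheduler extension uses a prebuilt SageMaker AI Docker image, Base Python 2.0. This is the official Python 3.8 image from DockerHub with boto3, the AWS CLI, and the Python 3 kernel. You can also provide any ECR URI that meets the notebook custom image specification. For details, see [Custom Amazon SageMaker image specifications for Amazon SageMaker Studio Classic](studio-byoi-specs.md). This image should have all the kernels and libraries that the notebook run needs. | Required. Parameter image\_uri. The URI location of a Docker image on ECR. You can use specific SageMaker Distribution images, a custom image based on those images, or your own image that has the notebook job dependencies preinstalled and meets additional requirements. For details, see [Image constraints for SageMaker AI Python SDK notebook jobs](notebook-auto-run-constraints.md#notebook-auto-run-constraints-image-sdk). | 
| Instance type | The EC2 instance type used to run the notebook job. The notebook job uses a SageMaker training job as its computing layer, so the specified instance type should be one that SageMaker training supports. | Field **Compute type**. Defaults to ml.m5.large. | Same as Studio. | Parameter instance\_type. Defaults to ml.m5.large. | 
| Kernel | The Jupyter kernel used to run the notebook job. | Field **Kernel**. This field defaults to your notebook's current kernel. Change this field from the default to a custom value if needed. If Studio cannot infer this value, the form displays a validation error that requires you to specify it. | Field **Kernel**. This kernel should be present in the image and follow the Jupyter kernel specs. This field defaults to the kernel found in the Base Python 2.0 SageMaker image. Change this field from the default to a custom value if needed. | Required. Parameter kernel\_name. This kernel should be present in the image and follow the Jupyter kernel specs. To see the kernel identifiers for your image, see (LINK). | 
| SageMaker AI session | The underlying SageMaker AI session to which SageMaker AI service calls are delegated. | N/A | N/A | Parameter sagemaker\_session. If unspecified, one is created using a default configuration chain. | 
| Role ARN | The Amazon Resource Name (ARN) of the role used with the notebook job. | Field **Role ARN**. This field defaults to the Studio execution role. Change this field from the default to a custom value if needed. If Studio cannot infer this value, the **Role ARN** field is blank. In this case, insert the ARN you want to use. | Field **Role ARN**. This field defaults to any role prefixed with SagemakerJupyterScheduler. If you have multiple roles with the prefix, the extension chooses one. Change this field from the default to a custom value if needed. For this field, you can set your own user default that is prepopulated whenever you create a new job definition. For details, see [Set up default options for local notebooks](create-notebook-auto-execution-advanced-default.md). | Parameter role. Defaults to the SageMaker AI default IAM role if the SDK is running in SageMaker notebooks or SageMaker Studio notebooks. Otherwise, it throws a ValueError. Allows intelligent defaults. | 
| Input notebook | The name of the notebook that you are scheduling to run. | Required. Field **Input file**. | Same as Studio. | Required. Parameter input\_notebook. | 
| Input folder | The folder that contains your inputs. Job inputs, including the input notebook and any optional start-up or initialization scripts, are put in this folder. | Field **Input folder**. If you don't provide a folder, the scheduler creates a default Amazon S3 bucket for your inputs. | Same as Studio. For this field, you can set your own user default that is prepopulated whenever you create a new job definition. For details, see [Set up default options for local notebooks](create-notebook-auto-execution-advanced-default.md). | N/A. The input folder is placed inside the location specified by the parameter s3\_root\_uri. | 
| Output folder | The folder that contains your outputs. Job outputs, including the output notebook and logs, are put in this folder. | Field **Output folder**. If you don't specify a folder, the scheduler creates a default Amazon S3 bucket for your outputs. | Same as Studio. For this field, you can set your own user default that is prepopulated whenever you create a new job definition. For details, see [Set up default options for local notebooks](create-notebook-auto-execution-advanced-default.md). | N/A. The output folder is placed inside the location specified by the parameter s3\_root\_uri. | 
| Parameters | A dictionary of variables and values to pass to your notebook job. | Field **Parameters**. You need to [parameterize your notebook](https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-auto-run-troubleshoot-override.html) so that it accepts parameters. | Same as Studio. | Parameter parameters. You need to [parameterize your notebook](https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-auto-run-troubleshoot-override.html) so that it accepts parameters. | 
| Additional (file or folder) dependencies | The list of file or folder dependencies that the notebook job uploads to the S3 staging folder. | Not supported. | Not supported. | Parameter additional\_dependencies. The notebook job uploads these dependencies to an S3 staging folder so that they can be used during execution. | 
| S3 root URI | The folder that contains your inputs. Job inputs, including the input notebook and any optional start-up or initialization scripts, are put in this folder. This S3 bucket must be in the same AWS account that you use to run your notebook job. | N/A. Use **Input folder** and **Output folder**. | Same as Studio. | Parameter s3\_root\_uri. Defaults to a default S3 bucket. Allows intelligent defaults. | 
| Environment variables | Any existing environment variables that you want to override, or new environment variables that you want to introduce and use in your notebook. | Field **Environment variables**. | Same as Studio. | Parameter environment\_variables. Defaults to None. | 
| Tags | A list of tags attached to the job. | N/A | N/A | Parameter tags. Defaults to None. Your tags control how the Studio UI captures and displays the job created by the pipeline. For details, see [View your notebook jobs in the Studio UI dashboard](create-notebook-auto-run-sdk.md#create-notebook-auto-run-dash). | 
| Start-up script | A script preloaded in the notebook start-up menu that you can choose to run before you run the notebook. | Field **Start-up script**. Select a Lifecycle Configuration (LCC) script that runs on the image at start-up. A start-up script runs in a shell outside of the Studio environment. Therefore, this script cannot depend on Studio local storage, environment variables, or app metadata (in `/opt/ml/metadata`). Also, if you use both a start-up script and an initialization script, the start-up script runs first. | Not supported. | Not supported. | 
| Initialization script | A path to a local script that you can run when your notebook starts up. | Field **Initialization script**. Enter the EFS file path where a local script or a Lifecycle Configuration (LCC) script is located. An initialization script is sourced from the same shell as the notebook job. This is not the case for the start-up script described previously. If you use both a start-up script and an initialization script, the start-up script runs first. | Field **Initialization script**. Enter the local file path where a local script or a Lifecycle Configuration (LCC) script is located. | Parameter initialization\_script. Defaults to None. | 
| Max retry attempts | The number of times Studio tries to rerun a failed job run. | Field **Max retry attempts**. Defaults to 1. | Same as Studio. | Parameter max\_retry\_attempts. Defaults to 1. | 
| Max run time (in seconds) | The maximum length of time, in seconds, that a notebook job can run before it is stopped. If you configure both a max run time and max retry attempts, the run time applies to each retry. If a job does not complete in this time, its status is set to Failed. | Field **Max run time (in seconds)**. Defaults to 172800 seconds (2 days). | Same as Studio. | Parameter max\_runtime\_in\_seconds. Defaults to 172800 seconds (2 days). | 
| Retry policies | A list of retry policies that govern the actions to take in case of failure. | Not supported. | Not supported. | Parameter retry\_policies. Defaults to None. | 
| Add Step or StepCollection dependencies | A list of Step or StepCollection names or instances on which the job depends. | Not supported. | Not supported. | Parameter depends\_on. Defaults to None. Use this to define explicit dependencies between steps in your pipeline graph. | 
| Volume size | The size, in GB, of the storage volume used to store input and output data during training. | Not supported. | Not supported. | Parameter volume\_size. Defaults to 30 GB. | 
| Encrypt traffic between containers | A flag that specifies whether traffic between training containers is encrypted for the training job. | N/A. Enabled by default. | N/A. Enabled by default. | Parameter encrypt\_inter\_container\_traffic. Defaults to True. | 
| Configure job encryption | An indicator that you want to encrypt your notebook job outputs, the job instance volume, or both. | Field **Configure job encryption**. Check this box to choose encryption. If the box is left unchecked, the job outputs are encrypted with the account's default KMS key and the job instance volume is not encrypted. | Same as Studio. | Not supported. | 
| Output encryption KMS key | A KMS key to use if you want to customize the encryption key used for your notebook job outputs. This field applies only if you checked **Configure job encryption**. | Field **Output encryption KMS key**. If you don't specify this field, your notebook job outputs are encrypted with SSE-KMS using the default Amazon S3 KMS key. Also, if you create the Amazon S3 bucket yourself and use encryption, your encryption method is preserved. | Same as Studio. For this field, you can set your own user default that is prepopulated whenever you create a new job definition. For details, see [Set up default options for local notebooks](create-notebook-auto-execution-advanced-default.md). | Parameter s3\_kms\_key. Defaults to None. Allows intelligent defaults. | 
| Job instance volume encryption KMS key | A KMS key to use if you want to encrypt your job instance volume. This field applies only if you checked **Configure job encryption**. | Field **Job instance volume encryption KMS key**. | Field **Job instance volume encryption KMS key**. For this field, you can set your own user default that is prepopulated whenever you create a new job definition. For details, see [Set up default options for local notebooks](create-notebook-auto-execution-advanced-default.md). | Parameter volume\_kms\_key. Defaults to None. Allows intelligent defaults. | 
| Use a Virtual Private Cloud to run this job (for VPC users) | An indicator that you want to run this job in a Virtual Private Cloud (VPC). For better security, we recommend that you use a private VPC. | Field **Use a Virtual Private Cloud to run this job**. Check this box if you want to use a VPC. At minimum, create the following VPC endpoints so that your notebook job can connect privately to those AWS resources: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/create-notebook-auto-execution-advanced.html) If you choose to use a VPC, you need to specify at least one private subnet and at least one security group in the following options. If you don't use any private subnets, you need to consider other configuration options. For details, see *Public VPC subnets not supported* in [Constraints and considerations](notebook-auto-run-constraints.md). | Same as Studio. | N/A | 
| Subnets (for VPC users) | Your subnets. This field must contain at least one and at most five subnets, and all the subnets you provide should be private. For details, see *Public VPC subnets not supported* in [Constraints and considerations](notebook-auto-run-constraints.md). | Field **Subnets**. This field defaults to the subnets associated with the Studio domain, but you can change this field if needed. | Field **Subnets**. The scheduler cannot detect your subnets, so you need to enter any subnets you configured for your VPC. | Parameter subnets. Defaults to None. Allows intelligent defaults. | 
| Security groups (for VPC users) | Your security groups. This field must contain at least one and at most 15 security groups. For details, see *Public VPC subnets not supported* in [Constraints and considerations](notebook-auto-run-constraints.md). | Field **Security groups**. This field defaults to the security groups associated with the domain VPC, but you can change this field if needed. | Field **Security groups**. The scheduler cannot detect your security groups, so you need to enter any security groups you configured for your VPC. | Parameter security\_group\_ids. Defaults to None. Allows intelligent defaults. | 
| Name | The name of the notebook job step. | N/A | N/A | Parameter name. If unspecified, it is derived from the notebook file name. | 
| Display name | Your job name as it should appear in your list of pipeline executions. | N/A | N/A | Parameter display\_name. Defaults to None. | 
| Description | A description of your job. | N/A | N/A | Parameter description. | 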

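# Parameterize your notebook
<a name="notebook-auto-run-troubleshoot-override"></a>

To pass new parameters or parameter overrides to your scheduled notebook job, you can optionally modify your Jupyter notebook if you want the new parameter values to be applied after a specific cell. When you pass a parameter, the notebook job executor uses the methodology enforced by Papermill. The executor searches for a Jupyter cell tagged with the `parameters` tag and applies the new parameters or parameter overrides immediately after this cell. If you don't have any cells tagged with `parameters`, the parameters are applied at the beginning of the notebook. If you have more than one cell tagged with `parameters`, the parameters are applied after the first cell tagged with `parameters`.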
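
The placement rule above can be sketched in a few lines. This is an illustration of the rule as described here, not Papermill's own code; it assumes cells represented as dictionaries with tags stored under `metadata.tags`, as in the notebook JSON format, and the function name `injection_index` is hypothetical.

```python
def injection_index(cells):
    """Return the position at which an injected parameters cell would be inserted."""
    for position, cell in enumerate(cells):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            # Insert directly after the first cell tagged with 'parameters'.
            return position + 1
    # No tagged cell: the parameters go at the beginning of the notebook.
    return 0


notebook_cells = [
    {"source": "import os", "metadata": {}},
    {"source": "company = 'default'", "metadata": {"tags": ["parameters"]}},
    {"source": "print(company)", "metadata": {}},
]
print(injection_index(notebook_cells))
```

Because overrides land after the tagged cell, any default assigned in that cell is replaced before later cells read it, while a default assigned further down the notebook would overwrite the injected value.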

To tag a cell in your notebook with the `parameters` tag, complete the following steps:

1. Select the cell to parameterize.

1. Choose the **Property Inspector** icon (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/gears.png)) in the right sidebar.

1. Enter **parameters** in the **Add Tag** box.

1. Choose the **\+** sign.

1. The `parameters` tag appears under **Cell Tags** with a check mark, which means the tag is applied to the cell.

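# Connect to an Amazon EMR cluster from your notebook
<a name="scheduled-notebook-connect-emr"></a>

If you connect to an Amazon EMR cluster from your Jupyter notebook in Studio, you might need to perform additional setup. In particular, the following discussion addresses two issues:
+ **Passing parameters into your Amazon EMR connection command**. In SparkMagic kernels, parameters that you pass to your Amazon EMR connection command might not work as expected because of differences in how Papermill passes parameters and how SparkMagic receives them. The workaround for this limitation is to pass parameters as environment variables. For more details about the issue and the workaround, see [Pass parameters to your EMR connection command](#scheduled-notebook-connect-emr-pass-param).
+ **Passing user credentials to Kerberos, LDAP, or HTTP Basic Auth-authenticated Amazon EMR clusters**. In interactive mode, Studio asks for credentials in a pop-up form where you can enter your sign-in credentials. In a noninteractive scheduled notebook, you have to pass them through AWS Secrets Manager. For more details about how to use AWS Secrets Manager in your scheduled notebook jobs, see [Pass user credentials to your Kerberos, LDAP, or HTTP Basic Auth-authenticated Amazon EMR cluster](#scheduled-notebook-connect-emr-credentials).

## Pass parameters to your EMR connection command
<a name="scheduled-notebook-connect-emr-pass-param"></a>

If you are using an image with the SparkMagic PySpark and Spark kernels and want to parameterize your EMR connection command, provide your parameters in the **Environment variables** field instead of the Parameters field in the Create Job form (in the **Additional Options** dropdown menu). Make sure that the EMR connection command in your Jupyter notebook reads these parameters as environment variables. For example, suppose you pass `cluster_id` as an environment variable when you create your job. Your EMR connection command should look like the following example: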

```
%%local
import os
```

```
%sm_analytics emr connect --cluster-id {os.getenv('cluster_id')} --auth-type None
```

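You need this workaround to meet the requirements of both SparkMagic and Papermill. For background, the SparkMagic kernel expects the `%%local` magic command to accompany any local variables you define. However, Papermill does not pass the `%%local` magic command along with your overrides. To work around this Papermill limitation, you must supply your parameters as environment variables in the **Environment variables** field.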
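
Since the cluster ID now comes from the environment, a missing variable would otherwise surface only as a failed connection. The following sketch builds the connection command text and fails early when the variable is absent. The helper `build_emr_connect_command` is hypothetical, and the command string simply mirrors the example above.

```python
import os


def build_emr_connect_command(env=os.environ):
    """Return the EMR connection command text, using cluster_id from the environment."""
    cluster_id = env.get("cluster_id")
    if not cluster_id:
        # Fail early with a clear message instead of connecting with "None".
        raise KeyError("cluster_id is not set in the Environment variables field")
    return f"%sm_analytics emr connect --cluster-id {cluster_id} --auth-type None"


print(build_emr_connect_command({"cluster_id": "j_abcde12345"}))
```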

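## Pass user credentials to your Kerberos, LDAP, or HTTP Basic Auth-authenticated Amazon EMR cluster
<a name="scheduled-notebook-connect-emr-credentials"></a>

To establish a secure connection to an Amazon EMR cluster that uses Kerberos, LDAP, or HTTP Basic Auth authentication, you use AWS Secrets Manager to pass user credentials to your connection command. For information about how to create a Secrets Manager secret, see [Create an AWS Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html). Your secret must contain your user name and password. You pass the secret with the `--secret` argument, as shown in the following example: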

```
%sm_analytics emr connect --cluster-id j_abcde12345 --auth-type Kerberos --secret aws_secret_id_123
```

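Your administrator can set up a flexible access policy with an attribute-based access control (ABAC) approach, which assigns access based on special tags. You can set up flexible access to create a single secret for all users in the account or a secret for each user. The following code samples demonstrate these scenarios:

**Create a single secret for all users in the account**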

------
#### [ JSON ]

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:role/service-role/AmazonSageMaker-ExecutionRole-20190101T012345"
            },
            "Action": "secretsmanager:GetSecretValue",
            "Resource": [
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes123-1a2b3c",
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes456-4d5e6f",
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes789-7g8h9i"
            ]
        }
    ]
}
```

------

**Create a different secret for each user**

You can create a different secret for each user with the `PrincipalTag` tag, as shown in the following example:

------
#### [ JSON ]

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:role/service-role/AmazonSageMaker-ExecutionRole-20190101T012345"
            },
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/user-identity": "${aws:PrincipalTag/user-identity}"
                }
            },
            "Action": "secretsmanager:GetSecretValue",
            "Resource": [
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes123-1a2b3c",
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes456-4d5e6f",
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes789-7g8h9i"
            ]
        }
    ]
}
```

------
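
If you generate these policies from code, the per-user variant differs from the shared one only by its `Condition` block. The following sketch builds the per-user policy document as a Python dictionary. The function name `per_user_secret_policy` and the default tag key `user-identity` are assumptions taken from the example above, and the ARNs are the same placeholder values.

```python
import json


def per_user_secret_policy(role_arn, secret_arns, tag_key="user-identity"):
    """Build a policy that allows GetSecretValue only when the tags match."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Condition": {
                    "StringEquals": {
                        # The secret's resource tag must equal the caller's principal tag.
                        f"aws:ResourceTag/{tag_key}": f"${{aws:PrincipalTag/{tag_key}}}"
                    }
                },
                "Action": "secretsmanager:GetSecretValue",
                "Resource": list(secret_arns),
            }
        ],
    }


policy = per_user_secret_policy(
    "arn:aws:iam::111122223333:role/service-role/AmazonSageMaker-ExecutionRole-20190101T012345",
    ["arn:aws:secretsmanager:us-west-2:111122223333:secret:aes123-1a2b3c"],
)
print(json.dumps(policy, indent=4))
```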

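# Notebook job details in Amazon SageMaker Studio
<a name="track-jobs-jobdefs"></a>

The SageMaker Notebook Jobs dashboards help you organize the job definitions that you schedule and keep track of the actual jobs that run from your job definitions. There are two important concepts to understand when you schedule notebook jobs: *job definitions* and *job runs*. A job definition is a schedule that you set to run a specific notebook. For example, you can create a job definition that runs the notebook XYZ.ipynb every Wednesday. This job definition launches the actual job runs that occur this coming Wednesday, next Wednesday, the Wednesday after that, and so on.

**Note**  
The SageMaker Python SDK notebook job step does not create job definitions. However, you can view your jobs in the Notebook Jobs dashboard. Both jobs and job definitions are available if you schedule your job in a JupyterLab environment.

The interface provides two main tabs that help you track your existing job definitions and job runs:
+ **Notebook Jobs** tab: This tab displays a list of all your job runs from your on-demand jobs and job definitions. From this tab, you can directly access the details for a single job run. For example, you can view a single job run that occurred two Wednesdays ago.
+ **Notebook Job Definitions** tab: This tab displays a list of all your job definitions. From this tab, you can directly access the details for a single job definition. For example, you can view the schedule you created to run XYZ.ipynb every Wednesday.

For details about the **Notebook Jobs** tab, see [View notebook jobs](view-notebook-jobs.md).

For details about the **Notebook Job Definitions** tab, see [View notebook job definitions](view-def-detail-notebook-auto-run.md).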

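# View notebook jobs
<a name="view-notebook-jobs"></a>

**Note**  
You can automatically view your notebook jobs if you scheduled them from the Studio UI. If you used the SageMaker Python SDK to schedule your notebook job, you need to supply additional tags when you create the notebook job step. For details, see [View your notebook jobs in the Studio UI dashboard](create-notebook-auto-run-sdk.md#create-notebook-auto-run-dash).

The following topic gives information about the **Notebook Jobs** tab and how to view the details of a single notebook job. The **Notebook Jobs** tab (which you access by choosing the **Create a notebook job** icon (![\[Blue icon of a calendar with a checkmark, representing a scheduled task or event.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/icons/notebook-schedule.png)) in the Studio toolbar) shows the history of your on-demand jobs and all the jobs that run from the job definitions you created. This tab opens after you create an on-demand job, or you can open it yourself to see a history of past and current jobs. If you select the **Job name** of any job, you can view the details for that single job in its **Job Detail** page. For more information about the **Job Detail** page, see the following section [View a single job](#view-jobs-detail-notebook-auto-run).

The **Notebook Jobs** tab includes the following information for each job:
+ **Output files**: Displays the availability of output files. This column can contain one of the following:
  + A download icon (![\[Cloud icon with downward arrow, representing download or cloud storage functionality.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/icons/File_download.png)): The output notebook and log are available for download; choose this button to download them. Note that a failed job can still generate output files if the failure occurred after the files were created. In this case, viewing the output notebook helps you identify the failure point.
  + Links to the **Notebook** and **Output log**: The notebook and output log are downloaded. Choose the links to view their contents.
  + (blank): The job was stopped by the user, or a failure occurred in the job run before it could generate output files. For example, network failures could prevent the job from starting.

  The output notebook is the result of running all cells in the notebook, and it also incorporates any new or overriding parameters or environment variables that you included. The output log captures the details of the job run to help you troubleshoot failed jobs.
+ **Created at**: The time the on-demand job or scheduled job was created.
+ **Status**: The current status of the job, which is one of the following values:
  + **In progress**: The job is running
  + **Failed**: The job failed because of configuration or notebook logic errors
  + **Stopped**: The job was stopped by the user
  + **Completed**: The job completed
+ **Actions**: This column provides shortcuts to help you stop or remove any job directly in the interface.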

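## View a single job
<a name="view-jobs-detail-notebook-auto-run"></a>

From the **Notebook Jobs** tab, you can select a job name to view the **Job Detail** page for a specific job. The **Job Detail** page includes all the details you provided in the **Create Job** form. Use this page to confirm the settings you specified when you created the job definition.

In addition, you can access shortcuts to help you perform the following actions in the page itself:
+ **Delete Job**: Remove the job from the **Notebook Jobs** tab.
+ **Stop Job**: Stop your running job.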

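# View notebook job definitions
<a name="view-def-detail-notebook-auto-run"></a>

**Note**  
If you scheduled your notebook job with the SageMaker Python SDK, skip this section. Only notebook jobs created in Studio or local JupyterLab environments create job definitions. Therefore, if you created your notebook job with the SageMaker Python SDK, you won't see job definitions in the Notebook Jobs dashboard. You can, however, view your notebook jobs as described in [View notebook jobs](view-notebook-jobs.md).

When you create a job definition, you create a schedule for a job. The **Notebook Job Definitions** tab lists these schedules, as well as information about specific notebook job definitions. For example, you might create a job definition that runs a specific notebook every minute. Once this job definition is active, you see a new job every minute in the **Notebook Jobs** tab. The following page gives information about the **Notebook Job Definitions** tab, as well as how to view a notebook job definition.

The **Notebook Job Definitions** tab displays a dashboard with all your job definitions and includes the input notebook, the creation time, the schedule, and the status for each job definition. The value in the **Status** column is one of the following values:
+ **Paused**: You paused the job definition. Studio does not initiate any jobs until you resume the definition.
+ **Active**: The schedule is on, and Studio can run the notebook according to the schedule you specified.

In addition, the **Actions** column provides shortcuts to help you perform the following tasks directly in the interface:
+ Pause: Pauses the job definition. Studio won't create any jobs until you resume the definition.
+ Delete: Removes the job definition from the **Notebook Job Definitions** tab.
+ Resume: Continues a paused job definition so that it can start jobs.

If you created a job definition but it doesn't initiate jobs, see [Job definition doesn't create jobs](notebook-auto-run-troubleshoot.md#notebook-auto-run-troubleshoot-no-jobs) in the [Troubleshooting guide](notebook-auto-run-troubleshoot.md).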

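## View a single job definition
<a name="view-job-definition-detail-page"></a>

If you select a job definition name in the **Notebook Job Definitions** tab, you see the **Job Definition** page, where you can view specific details for a job definition. Use this page to confirm the settings you specified when you created the job definition. If you don't see any jobs created from your job definition, see [Job definition doesn't create jobs](notebook-auto-run-troubleshoot.md#notebook-auto-run-troubleshoot-no-jobs) in the [Troubleshooting guide](notebook-auto-run-troubleshoot.md).

This page also contains a section that lists the jobs that run from this job definition. Viewing your jobs in the **Job Definition** page can be a more productive way to keep them organized than viewing them in the **Notebook Jobs** tab, which combines all jobs from all your job definitions.

In addition, this page provides shortcuts for the following actions:
+ **Pause/Resume**: Pause your job definition, or resume a paused definition. Note that if a job is currently running for this definition, Studio does not stop it.
+ **Run**: Run a single on-demand job from this job definition. This option also lets you specify different input parameters for your notebook before you start the job.
+ **Edit Job Definition**: Change the schedule of your job definition. You can select a different time interval, or you can opt for a custom schedule using cron syntax.
+ **Delete Job Definition**: Remove the job definition from the **Notebook Job Definitions** tab. Note that if a job is currently running for this definition, Studio does not stop it.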

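# Troubleshooting guide
<a name="notebook-auto-run-troubleshoot"></a>

Refer to this troubleshooting guide to help you debug failures that you might experience when your scheduled notebook job runs.

## Job definition doesn't create jobs
<a name="notebook-auto-run-troubleshoot-no-jobs"></a>

If your job definition does not initiate any jobs, the notebook or training job might not be displayed in the **Jobs** section of the left navigation bar in Amazon SageMaker Studio. If this is the case, you can find error messages in the **Pipelines** section of the left navigation bar in Studio. Each notebook or training job definition belongs to an execution pipeline. The following are common causes for failing to initiate notebook jobs.

**Missing permissions**
+ The role assigned to the job definition does not have a trust relationship with Amazon EventBridge. That is, EventBridge cannot assume the role.
+ The role assigned to the job definition does not have permission to call `sagemaker:StartPipelineExecution`.
+ The role assigned to the job definition does not have permission to call `sagemaker:CreateTrainingJob`.

**EventBridge quota exceeded**

If you see a `Put*` error such as the following example, you exceeded an EventBridge quota. To resolve this, you can clean up unused EventBridge rules, or ask AWS Support to increase your quota.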

```
LimitExceededException) when calling the PutRule operation: 
The requested resource exceeds the maximum number allowed
```

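For more information about EventBridge quotas, see [Amazon EventBridge quotas](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-quota.html).

**Pipeline quota limit exceeded**

If you see an error such as the following example, you exceeded the number of pipelines that you can run. To resolve this, you can clean up unused pipelines in your account, or ask AWS Support to increase your quota.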

```
ResourceLimitExceeded: The account-level service limit 
'Maximum number of pipelines allowed per account' is XXX Pipelines, 
with current utilization of XXX Pipelines and a request delta of 1 Pipelines.
```

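For more information about pipeline quotas, see [Amazon SageMaker AI endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/sagemaker.html).

**Training job limit exceeded**

If you see an error such as the following example, you exceeded the number of training jobs that you can run. To resolve this, reduce the number of training jobs in your account, or ask AWS Support to increase your quota.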

```
ResourceLimitExceeded: The account-level service limit 
'ml.m5.2xlarge for training job usage' is 0 Instances, with current 
utilization of 0 Instances and a request delta of 1 Instances. 
Please contact AWS support to request an increase for this limit.
```

For more information about training job quotas, see [Amazon SageMaker AI endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/sagemaker.html).
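
When you review many failed executions, it can help to map each error message to one of the causes above. The following sketch does this with simple substring checks. The function `diagnose_scheduling_error`, its return labels, and the `accessdenied` check are assumptions for illustration; the other substrings come from the example messages in this section, and real messages may be worded differently.

```python
def diagnose_scheduling_error(message):
    """Map an error message to one of the causes described in this guide."""
    text = message.lower()
    if "limitexceededexception" in text and "putrule" in text:
        return "EventBridge quota exceeded"
    if "maximum number of pipelines" in text:
        return "Pipeline quota limit exceeded"
    if "for training job usage" in text:
        return "Training job limit exceeded"
    if "accessdenied" in text or "not authorized" in text:
        return "Missing permissions"
    return "Unknown"


example = (
    "ResourceLimitExceeded: The account-level service limit "
    "'ml.m5.2xlarge for training job usage' is 0 Instances"
)
print(diagnose_scheduling_error(example))
```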

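## Auto visualizations disabled in SparkMagic notebooks
<a name="notebook-auto-run-troubleshoot-visualization"></a>

If your notebook uses the SparkMagic PySpark kernel and you run the notebook as a notebook job, you might see that auto visualizations are disabled in the output. Turning on auto visualization causes the kernel to hang, so the notebook job executor currently disables auto visualizations as a workaround.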

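# Constraints and considerations
<a name="notebook-auto-run-constraints"></a>

Review the following constraints to make sure your notebook jobs complete successfully. Studio uses Papermill to run notebooks. You might need to update your Jupyter notebooks to align with Papermill's requirements. There are also restrictions on the content of LCC scripts, and there are important details to understand regarding VPC configuration.

## JupyterLab version
<a name="notebook-auto-run-constraints-jpt"></a>

JupyterLab version 4.0 is supported.

## Installation of packages that require kernel restart
<a name="notebook-auto-run-constraints-pmill-pkg"></a>

Papermill does not support calling `pip install` to install packages that require a kernel restart. In this situation, use `pip install` in an initialization script. For a package installation that does not require a kernel restart, you can still include `pip install` in the notebook.

## Kernel and language names registered with Jupyter
<a name="notebook-auto-run-constraints-pmill-names"></a>

Papermill registers a translator for specific kernels and languages. If you bring your own image (BYOI), use a standard kernel name as shown in the following snippet: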

```
papermill_translators.register("python", PythonTranslator)
papermill_translators.register("R", RTranslator)
papermill_translators.register("scala", ScalaTranslator)
papermill_translators.register("julia", JuliaTranslator)
papermill_translators.register("matlab", MatlabTranslator)
papermill_translators.register(".net-csharp", CSharpTranslator)
papermill_translators.register(".net-fsharp", FSharpTranslator)
papermill_translators.register(".net-powershell", PowershellTranslator)
papermill_translators.register("pysparkkernel", PythonTranslator)
papermill_translators.register("sparkkernel", ScalaTranslator)
papermill_translators.register("sparkrkernel", RTranslator)
papermill_translators.register("bash", BashTranslator)
```

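## Parameters and environment variable limits
<a name="notebook-auto-run-constraints-var-limits"></a>

**Parameters and environment variable limits.** When you create your notebook job, it receives the parameters and environment variables you specify. You can pass up to 100 parameters. Each parameter name can be up to 256 characters long, and the associated value can be up to 2500 characters long. If you pass environment variables, you can pass up to 28 variables. The variable name and associated value can be up to 512 characters long. If you need more than 28 environment variables, set the additional environment variables in an initialization script, which has no limit on the number of environment variables you can use.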
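
These limits are easy to check before you submit a job. The following sketch reports every violation it finds. The function name `find_limit_violations` is hypothetical, and it assumes that the 512-character limit applies to the environment variable name and the value separately, which the text above does not state explicitly.

```python
MAX_PARAMETERS = 100
MAX_PARAMETER_NAME_LENGTH = 256
MAX_PARAMETER_VALUE_LENGTH = 2500
MAX_ENVIRONMENT_VARIABLES = 28
MAX_ENVIRONMENT_VARIABLE_LENGTH = 512  # assumed to apply to name and value separately


def find_limit_violations(parameters, environment_variables):
    """Return a list of human-readable problems; an empty list means all limits are met."""
    problems = []
    if len(parameters) > MAX_PARAMETERS:
        problems.append(f"{len(parameters)} parameters exceed the limit of {MAX_PARAMETERS}")
    for name, value in parameters.items():
        if len(name) > MAX_PARAMETER_NAME_LENGTH:
            problems.append(f"parameter name longer than {MAX_PARAMETER_NAME_LENGTH} characters")
        if len(str(value)) > MAX_PARAMETER_VALUE_LENGTH:
            problems.append(f"value of parameter '{name}' is too long")
    if len(environment_variables) > MAX_ENVIRONMENT_VARIABLES:
        problems.append(
            f"{len(environment_variables)} environment variables exceed "
            f"the limit of {MAX_ENVIRONMENT_VARIABLES}"
        )
    for name, value in environment_variables.items():
        if max(len(name), len(str(value))) > MAX_ENVIRONMENT_VARIABLE_LENGTH:
            problems.append(f"environment variable '{name[:20]}' is too long")
    return problems


print(find_limit_violations({"company": "Amazon"}, {"RATE": "0.0001"}))
```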

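## Viewing jobs and job definitions
<a name="notebook-auto-run-constraints-view-job"></a>

**Viewing jobs and job definitions.** If you schedule your notebook job in the Studio UI in a JupyterLab notebook, you can [view your notebook jobs](https://docs.aws.amazon.com/sagemaker/latest/dg/view-notebook-jobs.html) and your [notebook job definitions](https://docs.aws.amazon.com/sagemaker/latest/dg/view-def-detail-notebook-auto-run.html) in the Studio UI. If you scheduled your notebook job with the SageMaker Python SDK, you can view your jobs only, because the SageMaker Python SDK notebook job step does not create job definitions. To view your jobs, you also need to supply additional tags to your notebook job step instance. For details, see [View your notebook jobs in the Studio UI dashboard](create-notebook-auto-run-sdk.md#create-notebook-auto-run-dash).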

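## Image
<a name="notebook-auto-run-constraints-image"></a>

The image constraints that you need to manage depend on whether you run notebook jobs in Studio or run the SageMaker Python SDK notebook job step in a pipeline.

### Image constraints for SageMaker AI notebook jobs (Studio)
<a name="notebook-auto-run-constraints-image-studio"></a>

**Image and kernel support.** The driver that launches your notebook job assumes the following:
+ A base Python runtime environment is installed in the Studio or bring-your-own (BYO) image and is the default in the shell.
+ The base Python runtime environment includes the Jupyter client with properly configured kernelspecs.
+ The base Python runtime environment includes `pip` so that the notebook job can install system dependencies.
+ For images with multiple environments, your initialization script should switch to the proper kernel-specific environment before it installs notebook-specific packages. After you configure the kernel Python runtime environment, you should switch back to the default Python runtime environment, if it is different from the kernel runtime environment.

The driver that launches your notebook job is a bash script, and Bash v4 must be available at /bin/bash.

**Root privileges on bring-your-own images (BYOI).** You need to have root privileges on your own Studio images, either as the root user or through `sudo` access. If you are not a root user but access root privileges through `sudo`, use **1000/100** as the `UID/GID`.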

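### Image constraints for SageMaker AI Python SDK notebook jobs
<a name="notebook-auto-run-constraints-image-sdk"></a>

The notebook job step supports the following images:
+ SageMaker Distribution images listed in [Amazon SageMaker images available for use with Studio Classic notebooks](notebooks-available-images.md).
+ A custom image based on the SageMaker Distribution images in the previous list. Use a [SageMaker Distribution image](https://github.com/aws/sagemaker-distribution) as a base.
+ A custom image (BYOI) preinstalled with the notebook job dependencies, that is, [sagemaker-headless-execution-driver](https://pypi.org/project/sagemaker-headless-execution-driver/). Your image must meet the following requirements:
  + The image is preinstalled with the notebook job dependencies.
  + A base Python runtime environment is installed and is the default in the shell environment.
  + The base Python runtime environment includes the Jupyter client with properly configured kernelspecs.
  + You have root privileges, either as the root user or through `sudo` access. If you are not a root user but access root privileges through `sudo`, use **1000/100** as the `UID/GID`.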

## VPC subnets used during job creation
<a name="notebook-auto-run-constraints-vpc"></a>

If you use a VPC, Studio uses your private subnets to create your job. Specify one to five private subnets (and 1 to 15 security groups).
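
A short check of these counts before you create the job can catch configuration mistakes early. The following sketch validates only the number of subnets and security groups. It cannot tell whether a subnet is private, and the function name `find_vpc_setting_problems` and the example IDs are hypothetical.

```python
def find_vpc_setting_problems(subnets, security_group_ids):
    """Return a list of problems with the subnet and security group counts."""
    problems = []
    if not 1 <= len(subnets) <= 5:
        problems.append(f"expected 1 to 5 subnets, got {len(subnets)}")
    if not 1 <= len(security_group_ids) <= 15:
        problems.append(f"expected 1 to 15 security groups, got {len(security_group_ids)}")
    return problems


# Hypothetical IDs for illustration only.
print(find_vpc_setting_problems(["subnet-aaaa1111"], ["sg-bbbb2222"]))
```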

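If you use a VPC with private subnets, you must choose one of the following options so that the notebook job can connect to dependent services or resources:
+ If the job needs access to an AWS service that supports interface VPC endpoints, create an endpoint to connect to the service. For a list of services that support interface endpoints, see [AWS services that integrate with AWS PrivateLink](https://docs.aws.amazon.com/vpc/latest/privatelink/aws-services-privatelink-support.html). For information about creating an interface VPC endpoint, see [Access an AWS service using an interface VPC endpoint](https://docs.aws.amazon.com/vpc/latest/privatelink/create-interface-endpoint.html). At minimum, an Amazon S3 VPC gateway endpoint must be provided.
+ If the notebook job needs access to an AWS service that doesn't support interface VPC endpoints, or to a resource outside of AWS, create a NAT gateway and configure your security groups to allow outbound connections. For information about setting up a NAT gateway for your VPC, see *VPC with public and private subnets (NAT)* in the [Amazon Virtual Private Cloud User Guide](https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html).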

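## Service limits
<a name="notebook-auto-run-constraints-service-limit"></a>

Because the notebook job scheduler is built from the Pipelines, SageMaker Training, and Amazon EventBridge services, your notebook jobs are subject to their service-specific quotas. If you exceed these quotas, you might see error messages related to these services. For example, there are limits on how many pipelines you can run at one time and on how many rules you can set up for a single event bus. For more information about SageMaker AI quotas, see [Amazon SageMaker AI endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/sagemaker.html). For more information about EventBridge quotas, see [Amazon EventBridge quotas](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-quota.html).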

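# Pricing for SageMaker notebook jobs
<a name="notebook-auto-run-pricing"></a>

When you schedule notebook jobs, your Jupyter notebooks run on SageMaker training instances. After you select an **Image** and **Kernel** in the **Create Job** form, the form provides a list of available compute types. You are charged for the compute type you choose, based on the combined duration of use for all notebook jobs that run from the job definition. If you don't specify a compute type, SageMaker AI assigns you the default Amazon EC2 instance type `ml.m5.large`. For a breakdown of SageMaker pricing by compute type, see [Amazon SageMaker AI pricing](https://aws.amazon.com/sagemaker/pricing).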

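# Schedule your ML workflows
<a name="workflow-scheduling"></a>

With Amazon SageMaker AI, you can manage your entire ML workflow as you create datasets, perform data transformations, build models from data, and deploy your models to endpoints for inference. If you perform any subset of steps of your workflow periodically, you can also choose to run these steps on a schedule. For example, you might want to schedule a job in SageMaker Canvas to run a transformation on new data every hour. In another scenario, you might want to schedule a weekly job to monitor model drift in your deployed model. You can specify a recurring schedule of any time interval: you can repeat every second, minute, day, week, or month, or on the third Friday of each month at 3 PM.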
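
Several of the scheduling options in this section build on Amazon EventBridge, which expresses recurring schedules as `rate(...)` or `cron(...)` strings. The following sketch builds two such strings, assuming the EventBridge expression syntax: six cron fields, a singular unit when the rate value is 1, days of the week numbered from 1 (Sunday) to 7 (Saturday), and times in UTC. The helper names are hypothetical, and whether a particular scheduler form accepts these strings unchanged is an assumption.

```python
def rate_expression(value, unit):
    """Build a rate expression such as 'rate(1 hour)' or 'rate(7 days)'."""
    if unit not in ("minute", "hour", "day"):
        raise ValueError("unit must be 'minute', 'hour', or 'day'")
    if value < 1:
        raise ValueError("value must be a positive integer")
    suffix = "" if value == 1 else "s"  # singular unit only when the value is 1
    return f"rate({value} {unit}{suffix})"


def nth_weekday_cron(hour, minute, weekday, nth):
    """Build a cron expression for the nth given weekday of every month."""
    # Fields: minutes hours day-of-month month day-of-week year
    return f"cron({minute} {hour} ? * {weekday}#{nth} *)"


print(rate_expression(1, "hour"))      # hourly transformation job
print(rate_expression(7, "day"))       # weekly drift monitoring job
print(nth_weekday_cron(15, 0, 6, 3))   # third Friday (6) of each month at 15:00
```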

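**The following scenarios summarize the options available to you, depending on your use case.**
+ Use case 1: **Build and schedule your ML workflow in a no-code environment**. If you are new to SageMaker AI, you can use Amazon SageMaker Canvas to build your ML workflow and create scheduled runs with the Canvas UI-based scheduler.
+ Use case 2: **Build your workflow in a single Jupyter notebook and use a no-code scheduler**. Experienced ML practitioners can use code to build their ML workflow in a Jupyter notebook and use the no-code scheduling option available with the Notebook Jobs widget. If your ML workflow consists of multiple Jupyter notebooks, you can use the scheduling feature in the Pipelines Python SDK described in use case 3.
+ Use case 3: **Build and schedule your ML workflow with Pipelines**. Advanced users can use the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable), the Amazon SageMaker Pipelines visual editor, or the Amazon EventBridge scheduling options available with Pipelines. You can build an ML workflow that consists of steps that include operations with various SageMaker AI features and AWS services, such as Amazon EMR.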


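| Descriptor | Use case 1 | Use case 2 | Use case 3 | 
| --- | --- | --- | --- | 
| SageMaker AI feature | Amazon SageMaker Canvas data processing and ML workflow scheduling | Notebook Jobs schedule widget (UI) | Pipelines Python SDK scheduling options | 
| Description | With Amazon SageMaker Canvas, you can schedule automatic runs of data processing steps and, in a separate procedure, automatic dataset updates. You can also indirectly schedule your entire ML workflow by setting up a configuration that runs a batch prediction whenever a specific dataset is updated. For both automated data processing and dataset updates, SageMaker Canvas provides a basic form where you select a start time and date and a time interval between runs (or a cron expression if you schedule a data processing step). For more information about how to schedule data processing steps, see [Create a schedule to automatically process new data](canvas-data-export-schedule-job.md). For more information about how to schedule dataset and batch prediction updates, see [How to manage automations](canvas-manage-automations.md). | If you built your data processing and pipeline workflow in a single Jupyter notebook, you can use the Notebook Jobs widget to run your notebook on demand or on a schedule. The Notebook Jobs widget displays a basic form where you specify the compute type, the run schedule, and optional custom settings. You define your run schedule by selecting a time-based interval or by inserting a cron expression. The widget is automatically installed in Studio, or you can perform additional installation to use this feature in your local JupyterLab environment. For more information about notebook jobs, see [SageMaker Notebook Jobs](notebook-auto-run.md). | If you implemented your ML workflow with Pipelines, you can use the scheduling features in the SageMaker SDK. Your pipeline can include steps such as fine-tuning, data processing, and deployment. Pipelines supports two ways to schedule your pipeline. You can create an Amazon EventBridge rule, or you can define a schedule with the SageMaker SDK [PipelineSchedule](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.triggers.PipelineSchedule) constructor or the Amazon SageMaker Pipelines visual editor. For more information about the scheduling options available in Pipelines, see [Schedule pipeline runs](pipeline-eventbridge.md). | 
| Optimized for | Provides a scheduling option for a SageMaker Canvas ML workflow | Provides a UI-based scheduling option for a Jupyter notebook-based ML workflow | Provides a SageMaker SDK or EventBridge scheduling option for ML workflows | 
| Considerations | You can schedule your workflow with the Canvas no-code framework, but dataset updates and batch transform updates can handle up to 5 GB of data. | You can schedule one notebook with the UI-based scheduling form, but not multiple notebooks in the same job. To schedule multiple notebooks, use the Pipelines SDK code-based solution described in use case 3. | You can use the more advanced (SDK-based) scheduling capabilities provided by Pipelines, but you need to reference API documentation to specify the correct options rather than select from a UI-based menu of options. | 
| Recommended environment | Amazon SageMaker Canvas | Studio, local JupyterLab environment | Studio, local JupyterLab environment, any code editor | 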

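## Additional resources
<a name="workflow-scheduling-addit"></a>

**SageMaker AI offers the following additional options for scheduling your workflows.**
+ [What is Amazon EventBridge Scheduler?](https://docs.aws.amazon.com/scheduler/latest/UserGuide/what-is-scheduler.html) The scheduling options discussed in this section include prebuilt options available in SageMaker Canvas, Studio, and the SageMaker AI Python SDK. All options extend the features of Amazon EventBridge, and you can also create your own custom scheduling solution with EventBridge.
+ [Scheduled and event-based executions for Feature Processor pipelines](feature-store-feature-processor-schedule-pipeline.md). With Amazon SageMaker Feature Store Feature Processing, you can configure your feature processing pipelines to run on a schedule or as a result of another AWS service event.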

# AWS Batch support for SageMaker AI training jobs
<a name="training-job-queues"></a>

[AWS Batch job queues](https://docs.aws.amazon.com/batch/latest/userguide/job_queues.html) store and prioritize submitted jobs before they run on compute resources. You can submit SageMaker AI training jobs to a job queue to take advantage of the serverless job scheduling and prioritization tools provided by AWS Batch.

## How it works
<a name="training-job-queues-how-it-works"></a>

The following steps describe the workflow for using an AWS Batch job queue with SageMaker AI training jobs. For more detailed tutorials and example notebooks, see the [Get started](#training-job-queues-get-started) section.
+ Set up AWS Batch and any necessary permissions. For more information, see [Setting up AWS Batch](https://docs.aws.amazon.com/batch/latest/userguide/get-set-up-for-aws-batch.html) in the *AWS Batch User Guide*.
+ Create the following AWS Batch resources in the console or with the AWS CLI:
  + [Service environment](https://docs.aws.amazon.com/batch/latest/userguide/service-environments.html) - Contains the configuration parameters for integrating with SageMaker AI.
  + [SageMaker AI training job queue](https://docs.aws.amazon.com/batch/latest/userguide/create-sagemaker-job-queue.html) - Integrates with SageMaker AI to submit training jobs.
+ Configure the details and request for your SageMaker AI training job, such as your training container image. To submit a training job to an AWS Batch queue, you can use the AWS CLI, the AWS SDK for Python (Boto3), or the SageMaker AI Python SDK.
+ Submit your training jobs to the job queue. You can use the following options to submit jobs:
  + Use the AWS Batch [SubmitServiceJob](https://docs.aws.amazon.com/batch/latest/APIReference/API_SubmitServiceJob.html) API.
  + Use the [`aws_batch` module](https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/aws_batch) from the SageMaker AI Python SDK. After you create a TrainingQueue object and a model training object (such as an Estimator or ModelTrainer), you can submit training jobs to the TrainingQueue with the `queue.submit()` method.
+ After you submit your jobs, view your job queue and job status with the AWS Batch console, the AWS Batch [DescribeServiceJob](https://docs.aws.amazon.com/batch/latest/APIReference/API_DescribeServiceJob.html) API, or the SageMaker AI [DescribeTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTrainingJob.html) API.

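## Cost and availability
<a name="training-job-queues-cost-availability"></a>

For detailed pricing information about training jobs, see [Amazon SageMaker AI pricing](https://aws.amazon.com/sagemaker-ai/pricing/). With AWS Batch, you pay only for the AWS resources that you use, such as Amazon EC2 instances. For more information, see [AWS Batch pricing](https://aws.amazon.com/batch/pricing/).

You can use AWS Batch for SageMaker AI training jobs in any AWS Region where training jobs are available. For more information, see [Amazon SageMaker AI endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/sagemaker.html).

To make sure that you have the capacity you need when you need it, you can use SageMaker AI Flexible Training Plans (FTP). These plans let you reserve capacity for your training jobs. When combined with the queuing capabilities of AWS Batch, you can maximize utilization for the duration of your plan. For more information, see [Reserve training plans for your training jobs or HyperPod clusters](https://docs.aws.amazon.com/sagemaker/latest/dg/reserve-capacity-with-training-plans.html).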

## 開始使用
<a name="training-job-queues-get-started"></a>

如需如何設定 AWS Batch 任務佇列並提交 SageMaker AI 訓練任務的教學課程，請參閱 *AWS Batch 使用者指南*中的 [AWS Batch SageMaker AI 入門](https://docs.aws.amazon.com/batch/latest/userguide/getting-started-sagemaker.html)。

如需展示如何在 SageMaker AI Python SDK 中使用 `aws_batch` 模組的 Jupyter 筆記本，請參閱 [amazon-sagemaker-examples GitHub 儲存庫中的 AWS Batch for SageMaker AI 訓練任務筆記本範例](https://github.com/aws/amazon-sagemaker-examples/tree/default/build_and_train_models/sm-training-queues)。

# Amazon SageMaker 機器學習 (ML) 歷程追蹤
<a name="lineage-tracking"></a>

**重要**  
自 2023 年 11 月 30 日起，先前的 Amazon SageMaker Studio 體驗現在命名為 Amazon SageMaker Studio Classic。下節專門介紹如何使用 Studio Classic 應用程式。如需使用已更新 Studio 體驗的資訊，請參閱 [Amazon SageMaker Studio](studio-updated.md)。  
Studio Classic 仍會針對現有工作負載進行維護，但無法再用於加入。您只能停止或刪除現有的 Studio Classic 應用程式，而且無法建立新的應用程式。建議您[將工作負載遷移至新的 Studio 體驗](studio-updated-migrate.md)。

Amazon SageMaker 機器學習 (ML) 歷程追蹤會從資料準備到模型部署，建立並儲存與 ML 工作流程所有步驟相關的資訊。透過追蹤這些資訊，您可以重現工作流程步驟、追蹤模型和資料集歷程，以及建立模型控管和稽核標準。

SageMaker AI 的歷程追蹤功能可在後端運作，以追蹤與模型訓練和部署工作流程相關聯的所有中繼資料。這包括您的訓練工作、使用的資料集、管道、端點和實際模型。您可以隨時查詢歷程服務，尋找用於訓練模型的確切成品。只要您有權存取所使用的確切資料集，就可以使用這些成品重新建立相同的機器學習 (ML) 工作流程以重製模型。試用元件會追蹤訓練工作，並包含訓練工作所使用的所有參數。如果您不需要重新執行整個工作流程，您可以重製訓練工作以衍生相同的模型。

使用 SageMaker AI 歷程追蹤，資料科學家和模型建置器可以執行下列動作：
+ 保留模型發現實驗的執行歷史記錄。
+ 透過追蹤模型歷程成品來建立模型控管，以進行稽核和合規性驗證。

下圖顯示 Amazon SageMaker AI 在端對端模型訓練和部署 ML 工作流程中自動建立的範例歷程圖。

![\[\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/pipelines/PipelineLineageWorkflow.png)


**Topics**
+ [歷程追蹤實體](lineage-tracking-entities.md)
+ [Amazon SageMaker AI 建立的追蹤實體](lineage-tracking-auto-creation.md)
+ [手動建立追蹤實體](lineage-tracking-manual-creation.md)
+ [查詢歷程實體](querying-lineage-entities.md)
+ [追蹤跨帳戶歷程](xaccount-lineage-tracking.md)

# 歷程追蹤實體
<a name="lineage-tracking-entities"></a>

追蹤實體會保留端對端機器學習工作流程中所有元素的表現形式。您可以使用此表現形式來建立模型控管、重現工作流程，以及維護工作歷程記錄。

在您建立 SageMaker AI 任務 (例如處理任務、訓練任務和批次轉換任務) 時，Amazon SageMaker AI 會自動為試用元件及其相關聯的試用和實驗建立追蹤實體。除了自動追蹤之外，您還可以[手動建立追蹤實體](lineage-tracking-manual-creation.md)，為工作流程中的自訂步驟建立模型。如需詳細資訊，請參閱[Studio Classic 中的 Amazon SageMaker Experiments](experiments.md)。

SageMaker AI 還會自動為工作流程中的其他步驟建立追蹤實體，以便您可以實現端對端的工作流程追蹤。如需詳細資訊，請參閱[Amazon SageMaker AI 建立的追蹤實體](lineage-tracking-auto-creation.md)。

您可以建立其他實體來補充 SageMaker AI 建立的實體。如需詳細資訊，請參閱[手動建立追蹤實體](lineage-tracking-manual-creation.md)。

SageMaker AI 會重複使用任何現有實體，而不是建立新實體。例如，只能有一個成品具有唯一的 `SourceUri`。
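
手動建立成品時，也可以採用同樣「先查詢、再建立」的做法。以下是一個最小草稿：`get_or_create_artifact` 和 `FakeSageMakerClient` 都是為了說明而假設的名稱，後者只是記憶體內的替身，讓範例不需存取 AWS 就能執行。函式中呼叫的 `list_artifacts` (`SourceUri` 參數) 和 `create_artifact` 是 Boto3 SageMaker 用戶端的操作，實際使用時請傳入真正的用戶端。

```python
def get_or_create_artifact(sagemaker_client, name, source_uri, artifact_type):
    """Return the ARN of the artifact for `source_uri`, creating it if needed."""
    response = sagemaker_client.list_artifacts(SourceUri=source_uri)
    summaries = response.get("ArtifactSummaries", [])
    if summaries:
        # Reuse the existing artifact instead of creating a duplicate.
        return summaries[0]["ArtifactArn"]
    created = sagemaker_client.create_artifact(
        ArtifactName=name,
        Source={"SourceUri": source_uri},
        ArtifactType=artifact_type,
    )
    return created["ArtifactArn"]


class FakeSageMakerClient:
    """In-memory stand-in for the SageMaker client (demonstration only)."""

    def __init__(self):
        self.artifacts = {}

    def list_artifacts(self, SourceUri):
        arn = self.artifacts.get(SourceUri)
        return {"ArtifactSummaries": [{"ArtifactArn": arn}] if arn else []}

    def create_artifact(self, ArtifactName, Source, ArtifactType):
        arn = "arn:aws:sagemaker:us-west-2:111122223333:artifact/{}".format(
            len(self.artifacts) + 1
        )
        self.artifacts[Source["SourceUri"]] = arn
        return {"ArtifactArn": arn}


client = FakeSageMakerClient()
uri = "s3://amzn-s3-demo-bucket/train"
first = get_or_create_artifact(client, "train-data", uri, "DataSet")
second = get_or_create_artifact(client, "train-data", uri, "DataSet")
print(first == second)
```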

**查詢歷程的重要概念**
+ **歷程** - 追蹤機器學習 (ML) 工作流程中各個實體之間關係的中繼資料。
+ **查詢歷程** - 檢查歷程並探索實體之間關係的動作。
+ **歷程實體** - 組成歷程的中繼資料元素。
+ **跨帳戶歷程** - 您的機器學習 (ML) 工作流程可能跨越多個帳戶。透過跨帳戶歷程，您可以設定多個帳戶，在共用實體資源之間自動建立歷程關聯。然後，查詢歷程甚至可以從共用帳戶傳回實體。

已定義下列追蹤實體：

**實驗實體**
+ [試用元件](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrialComponent.html) - 一個機器學習試用階段。包括處理工作、訓練工作和批次轉換工作。
+ [試用](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrial.html) - 試用元件的組合，通常會產生模型。
+ [實驗](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateExperiment.html) - 一組試用，通常著重於解決特定用例。

**歷程實體**
+ [試用元件](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrialComponent.html) - 代表歷程中的處理、訓練和轉換工作。也是實驗管理的一部分。
+ [內容](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateContext.html) - 提供其他追蹤或實驗實體的邏輯群組。從概念上講，實驗和試用都屬於內容。端點和模型套件也是內容的範例。
+ [動作](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateAction.html) - 代表動作或活動。一般而言，動作至少涉及一個輸入成品或輸出成品。例如，工作流程步驟和模型部署。
+ [成品](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateArtifact.html) - 代表 URI 可定址物件或資料。成品通常是試用元件或動作的輸入或輸出。例如，資料集 (S3 儲存貯體 URI) 或映像 (Amazon ECR 登錄檔路徑)。
+ [關聯](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AddAssociation.html) - 連結其他追蹤或實驗實體，例如訓練資料位置與訓練工作之間的關聯。

  關聯具有可選的 `AssociationType` 屬性。下列是每種類型的值與建議使用方式。SageMaker AI 對其使用方式沒有設下任何限制：
  + `ContributedTo` - 此來源對目標作出貢獻或對目標的啟用發揮作用。例如，訓練資料對訓練工作作出貢獻。
  + `AssociatedWith` - 此來源與目標連接。例如，核准工作流程與模型部署相關聯。
  + `DerivedFrom` - 目標是對此來源的修改。例如，處理工作的通道輸入摘要輸出是從原始輸入衍生出來的。
  + `Produced` - 目標是由此來源產生的。例如，訓練工作產生了模型成品。
  + `SameAs` - 在不同帳戶中使用相同的歷程實體。
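
下列草稿示範如何在呼叫 `AddAssociation` 之前檢查 `AssociationType` 的值。`build_add_association_request` 是為了說明而假設的輔助函式，範例中的 ARN 也是虛構的；`SourceArn`、`DestinationArn` 和 `AssociationType` 則是 AddAssociation API 的請求參數。

```python
ASSOCIATION_TYPES = {
    "ContributedTo",
    "AssociatedWith",
    "DerivedFrom",
    "Produced",
    "SameAs",
}


def build_add_association_request(source_arn, destination_arn, association_type=None):
    """Build keyword arguments for the SageMaker AddAssociation API."""
    request = {"SourceArn": source_arn, "DestinationArn": destination_arn}
    if association_type is not None:
        if association_type not in ASSOCIATION_TYPES:
            raise ValueError("unknown association type: {}".format(association_type))
        request["AssociationType"] = association_type
    return request


# Hypothetical ARNs, for illustration only.
dataset_arn = "arn:aws:sagemaker:us-west-2:111122223333:artifact/train-data"
training_job_arn = "arn:aws:sagemaker:us-west-2:111122223333:experiment-trial-component/my-job"
model_arn = "arn:aws:sagemaker:us-west-2:111122223333:artifact/model"

# Training data contributes to the training job.
input_request = build_add_association_request(dataset_arn, training_job_arn, "ContributedTo")
# The training job produces the model artifact.
output_request = build_add_association_request(training_job_arn, model_arn, "Produced")
print(input_request["AssociationType"], output_request["AssociationType"])

# With a real Boto3 client, each request could be sent as:
#   sagemaker_client.add_association(**input_request)
```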

**一般屬性**
+ **類型屬性**

  動作、成品和內容實體分別具有*類型*屬性 `ActionType`、`ArtifactType` 和 `ContextType`。此屬性是自訂字串，可以將有意義的資訊與實體相關聯，並用作清單 API 中的篩選器。
+ **來源屬性**

  動作、成品和內容實體具有 `Source` 屬性。此屬性提供實體所代表的基礎 URI。部分範例如下：
  + 來源為 `EndpointArn` 的 `UpdateEndpoint` 動作。
  + 來源為 `ImageUri` 之處理工作的映像成品。
  + 來源為 `EndpointArn` 的 `Endpoint` 內容。
+ **中繼資料屬性**

  動作和成品實體具有可選的 `Metadata` 屬性，可提供下列資訊：
  + `ProjectId` - 例如，模型所屬的 SageMaker AI MLOps 專案的 ID。
  + `GeneratedBy` - 例如，註冊模型套件版本的 SageMaker AI 管道執行。
  + `Repository` - 例如，包含演算法的儲存庫。
  + `CommitId` - 例如，演算法版本的遞交 ID。

# Amazon SageMaker AI 建立的追蹤實體
<a name="lineage-tracking-auto-creation"></a>

如果資料可用，Amazon SageMaker AI 會自動為 SageMaker AI 工作、模型、模型套件和端點建立追蹤實體。SageMaker AI 建立的歷程實體數量沒有限制。

如需如何手動建立追蹤實體的相關資訊，請參閱[手動建立追蹤實體](lineage-tracking-manual-creation.md)。

**Topics**
+ [SageMaker AI 任務的追蹤實體](#lineage-tracking-auto-creation-jobs)
+ [模型套件追蹤實體](#lineage-tracking-auto-creation-model-package)
+ [端點追蹤實體](#lineage-tracking-auto-creation-endpoint)

## SageMaker AI 任務的追蹤實體
<a name="lineage-tracking-auto-creation-jobs"></a>

SageMaker AI 會為每個 SageMaker AI 任務建立試用元件，並與其相關聯。SageMaker AI 會建立成品來追蹤工作中繼資料以及每個成品與任務之間的關聯。

系統會為下列任務屬性建立成品，並與 SageMaker AI 任務的 Amazon Resource Name (ARN) 相關聯。成品 `SourceUri` 會在括號中列出。

**訓練工作**
+ 包含訓練演算法的映像 (`TrainingImage`)。
+ 每個輸入通道的資料來源 (`S3Uri`)。
+ 模型的位置 (`S3OutputPath`)。
+ 受管 Spot 檢查點資料的位置 (`S3Uri`)。

**處理工作**
+ 要由處理工作執行的容器 (`ImageUri`)。
+ 每個處理輸入和處理輸出的資料位置 (`S3Uri`)。

**轉換工作**
+ 要轉換的輸入資料來源 (`S3Uri`)。
+ 轉換的結果 (`S3OutputPath`)。

**注意**  
Amazon Simple Storage Service (Amazon S3) 成品根據提供給建立 API 的 Amazon S3 URI 值進行追蹤 (例如 [CreateTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html))，而不是根據 Amazon S3 金鑰和每個檔案的雜湊值或 etag 值進行追蹤。

## 模型套件追蹤實體
<a name="lineage-tracking-auto-creation-model-package"></a>

系統會建立下列實體：

**模型套件**
+ 每個模型套件群組的內容。
+ 每個模型套件的成品。
+ 每個模型套件成品與套件所屬之每個模型套件群組之間的內容關聯。
+ 用於建立模型套件版本的動作。
+ 模型套件成品與建立動作之間的關聯。
+ 模型套件成品與套件所屬的每個模型套件群組內容之間的關聯。
+ 推論容器
  + 在模型套件中定義之每個容器中使用的映像成品。
  + 每個容器中使用的模型成品。
  + 每個成品與模型套件成品之間的關聯。
+ 演算法
  + 模型套件中定義的每個演算法的成品。
  + 每個演算法所建立之模型的成品。
  + 每個成品與模型套件成品之間的關聯。

## 端點追蹤實體
<a name="lineage-tracking-auto-creation-endpoint"></a>

下列實體由 Amazon SageMaker AI 建立：

**端點**
+ 每個端點的內容
+ 建立每個端點之模型部署的動作
+ 部署到端點的每個模型的成品
+ 模型中使用的映像成品
+ 模型之模型套件的成品
+ 部署到端點之每個映像的成品
+ 每個成品與模型部署動作之間的關聯

# 手動建立追蹤實體
<a name="lineage-tracking-manual-creation"></a>

您可以為任何屬性手動建立追蹤實體，以建立模型治理、重現工作流程，以及維護工作歷程的記錄。如需 Amazon SageMaker AI 自動建立的追蹤實體的相關資訊，請參閱 [Amazon SageMaker AI 建立的追蹤實體](lineage-tracking-auto-creation.md)。以下教學課程展示在 SageMaker 訓練任務與端點之間手動建立和關聯成品，然後追蹤工作流程所需的步驟。

您可以將標籤新增至除了關聯以外的所有實體。標籤是提供自訂資訊的任意鍵值對。您能夠依標籤對清單執行篩選或排序，或執行搜尋查詢。如需詳細資訊，請參閱 *AWS 一般參考*中的[標記 AWS 資源](https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html)。

如需示範如何建立歷程實體的範例筆記本，請參閱 [Amazon SageMaker 範例 GitHub 儲存庫](https://github.com/awslabs/amazon-sagemaker-examples)中的 [Amazon SageMaker AI 歷程](https://github.com/aws/amazon-sagemaker-examples/tree/master/sagemaker-lineage)筆記本。

**Topics**
+ [手動建立實體](#lineage-tracking-manual-create)
+ [手動追蹤工作流程](#lineage-tracking-manual-track)
+ [限制](#lineage-tracking-manual-track-limits)

## 手動建立實體
<a name="lineage-tracking-manual-create"></a>

下列程序展示如何在 SageMaker AI 訓練任務與端點之間建立和關聯成品。您會執行以下步驟：

**匯入追蹤實體和關聯**

1. 匯入歷程追蹤實體。

   ```
   import sys
   !{sys.executable} -m pip install -q sagemaker
   
   from sagemaker import get_execution_role
   from sagemaker.session import Session
   from sagemaker.lineage import context, artifact, association, action
   
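   import boto3
   # Use the AWS Region configured for the default Boto3 session.
   region = boto3.Session().region_name
   boto_session = boto3.Session(region_name=region)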
   sagemaker_client = boto_session.client("sagemaker")
   ```

1. 建立輸入和輸出成品。

   ```
   code_location_arn = artifact.Artifact.create(
       artifact_name='source-code-location',
       source_uri='s3://...',
       artifact_type='code-location'
   ).artifact_arn
   
   # Similar constructs for train_data_location_arn and test_data_location_arn
   
   model_location_arn = artifact.Artifact.create(
       artifact_name='model-location',
       source_uri='s3://...',
       artifact_type='model-location'
   ).artifact_arn
   ```

1. 訓練模型並獲得代表訓練工作的 `trial_component_arn`。

1. 將輸入成品和輸出成品與訓練工作 (試用元件) 相關聯。

   ```
   import logging
   
   input_artifacts = [code_location_arn, train_data_location_arn, test_data_location_arn]
   for artifact_arn in input_artifacts:
       try:
           # Input artifacts contribute to the training job (trial component).
           association.Association.create(
               source_arn=artifact_arn,
               destination_arn=trial_component_arn,
               association_type='ContributedTo'
           )
       except Exception:
           # Log and continue, for example when the association already exists.
           logging.info('association between %s and %s already exists', artifact_arn, trial_component_arn)
   
   output_artifacts = [model_location_arn]
   for artifact_arn in output_artifacts:
       try:
           # The training job (trial component) produces the output artifacts.
           association.Association.create(
               source_arn=trial_component_arn,
               destination_arn=artifact_arn,
               association_type='Produced'
           )
       except Exception:
           logging.info('association between %s and %s already exists', trial_component_arn, artifact_arn)
   ```

1. 建立推論端點。

   ```
   predictor = mnist_estimator.deploy(initial_instance_count=1,
                                        instance_type='ml.m4.xlarge')
   ```

1. 建立端點內容。

   ```
   from sagemaker.lineage import context
   
   endpoint = sagemaker_client.describe_endpoint(EndpointName=predictor.endpoint_name)
   endpoint_arn = endpoint['EndpointArn']
   
   endpoint_context_arn = context.Context.create(
       context_name=predictor.endpoint_name,
       context_type='Endpoint',
       source_uri=endpoint_arn
   ).context_arn
   ```

1. 將訓練工作 (試用元件) 與端點內容相關聯。

   ```
   association.Association.create(
       source_arn=trial_component_arn,
       destination_arn=endpoint_context_arn
   )
   ```

## 手動追蹤工作流程
<a name="lineage-tracking-manual-track"></a>

您可以手動追蹤在上一節中建立的工作流程。

下列程序以上一個範例中的端點 Amazon Resource Name (ARN) 為起點，展示如何追蹤工作流程，回溯到用於訓練已部署至端點之模型的資料集。您會執行以下步驟：

**追蹤從端點到訓練資料來源的工作流程**

1. 匯入追蹤實體。

   ```
   import sys
   !{sys.executable} -m pip install -q sagemaker
   
   from sagemaker import get_execution_role
   from sagemaker.session import Session
   from sagemaker.lineage import context, artifact, association, action
   
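   import boto3
   # Use the AWS Region configured for the default Boto3 session.
   region = boto3.Session().region_name
   boto_session = boto3.Session(region_name=region)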
   sagemaker_client = boto_session.client("sagemaker")
   ```

1. 從端點 ARN 獲取端點內容。

   ```
   endpoint_context_arn = sagemaker_client.list_contexts(
       SourceUri=endpoint_arn)['ContextSummaries'][0]['ContextArn']
   ```

1. 透過試用元件和端點內容之間的關聯取得試用元件。

   ```
   trial_component_arn = sagemaker_client.list_associations(
       DestinationArn=endpoint_context_arn)['AssociationSummaries'][0]['SourceArn']
   ```

1. 透過訓練資料位置成品和試用元件之間的關聯取得訓練資料位置成品。

   ```
   train_data_location_artifact_arn = sagemaker_client.list_associations(
       DestinationArn=trial_component_arn, SourceType='Model')['AssociationSummaries'][0]['SourceArn']
   ```

1. 透過訓練資料位置成品取得訓練資料位置。

   ```
   train_data_location = sagemaker_client.describe_artifact(
       ArtifactArn=train_data_location_artifact_arn)['Source']['SourceUri']
   print(train_data_location)
   ```

   回應：

   ```
   s3://sagemaker-sample-data-us-east-2/mxnet/mnist/train
   ```
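
上述追蹤步驟可以整合成一個函式，並在找不到內容或關聯時提供明確的錯誤。以下是一個最小草稿：`trace_endpoint_to_input_uris` 和 `FakeSageMakerClient` 是為了說明而假設的名稱，後者只是記憶體內的替身，且範例 ARN 為簡化的虛構值；`list_contexts`、`list_associations` 和 `describe_artifact` 則是前述步驟已使用的 Boto3 操作。此草稿會傳回試用元件所有輸入成品的來源 URI，而不是只取第一個。

```python
def trace_endpoint_to_input_uris(sagemaker_client, endpoint_arn):
    """Walk from an endpoint ARN back to the source URIs of its training inputs."""
    contexts = sagemaker_client.list_contexts(SourceUri=endpoint_arn)["ContextSummaries"]
    if not contexts:
        raise LookupError("no context found for endpoint {}".format(endpoint_arn))
    endpoint_context_arn = contexts[0]["ContextArn"]

    # The trial component is the source of the association to the endpoint context.
    upstream = sagemaker_client.list_associations(
        DestinationArn=endpoint_context_arn
    )["AssociationSummaries"]
    if not upstream:
        raise LookupError("no trial component associated with the endpoint context")
    trial_component_arn = upstream[0]["SourceArn"]

    # Input artifacts are the sources of associations to the trial component.
    inputs = sagemaker_client.list_associations(
        DestinationArn=trial_component_arn
    )["AssociationSummaries"]
    return [
        sagemaker_client.describe_artifact(ArtifactArn=item["SourceArn"])["Source"]["SourceUri"]
        for item in inputs
    ]


class FakeSageMakerClient:
    """In-memory stand-in for the SageMaker client (demonstration only)."""

    sources = {"context-arn": ["trial-component-arn"],
               "trial-component-arn": ["code-arn", "train-data-arn"]}
    uris = {"code-arn": "s3://amzn-s3-demo-bucket/code",
            "train-data-arn": "s3://amzn-s3-demo-bucket/train"}

    def list_contexts(self, SourceUri):
        found = SourceUri == "endpoint-arn"
        return {"ContextSummaries": [{"ContextArn": "context-arn"}] if found else []}

    def list_associations(self, DestinationArn):
        arns = self.sources.get(DestinationArn, [])
        return {"AssociationSummaries": [{"SourceArn": arn} for arn in arns]}

    def describe_artifact(self, ArtifactArn):
        return {"Source": {"SourceUri": self.uris[ArtifactArn]}}


input_uris = trace_endpoint_to_input_uris(FakeSageMakerClient(), "endpoint-arn")
print(input_uris)
```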

## 限制
<a name="lineage-tracking-manual-track-limits"></a>

您可以在任何實體、實驗和歷程之間建立關聯，但下列項目除外：
+ 您無法在兩個實驗實體之間建立關聯。實驗實體由實驗、試用和試用元件組成。
+ 您無法建立與其他關聯之間的關聯。

如果您嘗試建立已存在的實體，就會發生錯誤。

**手動建立之歷程實體數量的上限**
+ 動作：3000
+ 成品：6000
+ 關聯：6000
+ 內容：500

Amazon SageMaker AI 自動建立的歷程實體數量沒有限制。

# 查詢歷程實體
<a name="querying-lineage-entities"></a>

Amazon SageMaker AI 會在您使用歷程實體時自動產生歷程實體的圖形。您可以查詢此資料以回答各種問題。以下提供如何在 SDK for Python 中查詢此資料的指示。

如需如何在 Amazon SageMaker Studio 中檢視已註冊模型歷程的相關資訊，請參閱[在 Studio 中檢視模型歷程詳細資訊](model-registry-lineage-view-studio.md)。

您可以查詢歷程實體，以執行下列作業：
+ 擷取建立模型時使用的所有資料集。
+ 擷取建立端點時使用的所有工作。
+ 擷取使用資料集的所有模型。
+ 擷取使用模型的所有端點。
+ 擷取從特定資料集衍生的端點。
+ 擷取建立訓練工作的管道執行。
+ 擷取實體之間的關係，以進行調查、治理和再現。
+ 擷取使用成品的所有下游試用。
+ 擷取所有使用成品的上游試用。
+ 擷取使用所提供之 S3 URI 的成品清單。
+ 擷取使用資料集成品的上游成品。
+ 擷取使用資料集成品的下游成品。
+ 擷取使用映像成品的資料集。
+ 擷取使用內容的動作。
+ 擷取使用端點的處理工作。
+ 擷取使用端點的轉換工作。
+ 擷取使用端點的試用元件。
+ 擷取與模型套件群組相關聯之管道執行的 ARN。
+ 擷取使用動作的所有成品。
+ 擷取使用模型套件核准動作的所有上游資料集。
+ 透過模型套件核准動作擷取模型套件。
+ 擷取使用端點的下游端點內容。
+ 擷取與試用元件相關聯之管道執行的 ARN。
+ 擷取使用試用元件的資料集。
+ 擷取使用試用元件的模型。
+ 探索歷程以進行視覺化。

**限制**
+ 下列區域無法使用歷程查詢：
  + 非洲 (開普敦) – af-south-1
  + 亞太區域 (雅加達) – ap-southeast-3
  + 亞太區域 (大阪) – ap-northeast-3
  + 歐洲 (米蘭) – eu-south-1
  + 歐洲 (西班牙) – eu-south-2
  + 以色列 (特拉維夫) – il-central-1
+ 目前，關係探索的最大深度限制為 10。
+ 篩選僅限於下列屬性：上次修改日期、建立日期、類型和歷程實體類型。
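
下列草稿示範如何在建構 `query_lineage` 請求時套用上述深度限制和篩選條件。`build_query_lineage_request` 是為了說明而假設的輔助函式，範例 ARN 也是虛構的；`StartArns`、`Direction`、`IncludeEdges`、`MaxDepth` 和 `Filters` (`Types`、`LineageTypes`) 則是 QueryLineage API 的請求參數，實際使用前請對照 API 參考進行確認。

```python
MAX_QUERY_DEPTH = 10
DIRECTIONS = {"Both", "Ascendants", "Descendants"}


def build_query_lineage_request(start_arns, direction="Both",
                                max_depth=MAX_QUERY_DEPTH,
                                types=None, lineage_types=None):
    """Build keyword arguments for the SageMaker QueryLineage API."""
    if direction not in DIRECTIONS:
        raise ValueError("direction must be one of {}".format(sorted(DIRECTIONS)))
    if not 1 <= max_depth <= MAX_QUERY_DEPTH:
        raise ValueError("max_depth must be between 1 and {}".format(MAX_QUERY_DEPTH))

    request = {
        "StartArns": list(start_arns),
        "Direction": direction,
        "IncludeEdges": False,
        "MaxDepth": max_depth,
    }
    filters = {}
    if types:
        filters["Types"] = list(types)  # for example, "DataSet"
    if lineage_types:
        filters["LineageTypes"] = list(lineage_types)  # for example, "Artifact"
    if filters:
        request["Filters"] = filters
    return request


# Hypothetical ARN, for illustration only.
endpoint_context_arn = "arn:aws:sagemaker:us-west-2:111122223333:context/my-endpoint"
request = build_query_lineage_request(
    [endpoint_context_arn],
    direction="Ascendants",
    max_depth=5,
    types=["DataSet"],
    lineage_types=["Artifact"],
)
print(request["Filters"])

# With a real Boto3 client, the request could be sent as:
#   response = sagemaker_client.query_lineage(**request)
```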

**Topics**
+ [開始查詢歷程實體](#querying-lineage-entities-getting-started)

## 開始查詢歷程實體
<a name="querying-lineage-entities-getting-started"></a>

開始查詢歷程實體的最簡單方式是使用下列資源：
+ [Amazon SageMaker AI SDK for Python](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/lineage/artifact.py#L397)，其中定義了許多常見使用案例。
+ 筆記本 [sagemaker-lineage-multihop-queries.ipynb](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-lineage/sagemaker-lineage-multihop-queries.ipynb)，示範如何使用 SageMaker AI 歷程 API 查詢整個歷程圖中的關係。

下列範例展示如何使用 `LineageQuery` 和 `LineageFilter` API 建構查詢，以回答有關歷程圖的問題，並擷取一些使用案例中的實體關聯。

**Example 使用 `LineageQuery` API 尋找實體關聯**  

```
import pprint

from sagemaker.session import Session
from sagemaker.lineage.context import Context, EndpointContext
from sagemaker.lineage.action import Action
from sagemaker.lineage.association import Association
from sagemaker.lineage.artifact import Artifact, ModelArtifact, DatasetArtifact

from sagemaker.lineage.query import (
    LineageQuery,
    LineageFilter,
    LineageSourceEnum,
    LineageEntityEnum,
    LineageQueryDirectionEnum,
)

# Helpers used by the examples that follow.
pp = pprint.PrettyPrinter()
sagemaker_session = Session()

# Find the endpoint context and model artifact that should be used for the lineage queries.
# `endpoint_arn` is the ARN of an existing endpoint.
contexts = Context.list(source_uri=endpoint_arn)
context_name = list(contexts)[0].context_name
endpoint_context = EndpointContext.load(context_name=context_name)
```

**Example 尋找與某個端點相關聯的所有資料集**  

```
# Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `DATASET`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.DATASET]
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and find all datasets.

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the lineage objects corresponding to the datasets
dataset_artifacts = []
for vertex in query_result.vertices:
    dataset_artifacts.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(dataset_artifacts)
```

**Example 尋找與某個端點相關聯的模型**  

```
# Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `MODEL`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.MODEL]
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and finds all model artifacts.

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the lineage objects corresponding to the model
model_artifacts = []
for vertex in query_result.vertices:
    model_artifacts.append(vertex.to_lineage_object().source.source_uri)

# The results of the `LineageQuery` API call return the ARN of the model deployed to the endpoint along with
# the S3 URI to the model.tar.gz file associated with the model
pp.pprint(model_artifacts)
```

**Example 尋找與端點相關聯的試用元件**  

```
# Define the LineageFilter to look for entities of type `TRIAL_COMPONENT` and the source of type `TRAINING_JOB`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.TRIAL_COMPONENT],
    sources=[LineageSourceEnum.TRAINING_JOB],
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and finds all trial components created by training jobs.

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the ARNs of the training jobs associated with this Endpoint
trial_components = []
for vertex in query_result.vertices:
    trial_components.append(vertex.arn)

pp.pprint(trial_components)
```

**Example 變更歷程的焦點**  
`LineageQuery` 可以修改為具有不同的 `start_arns` 來變更歷程的焦點。此外，`LineageFilter` 可以採用多個來源和實體來擴充查詢的範圍。  
我們在下面使用該模型作為歷程焦點，並找到與之相關聯的端點和資料集。  

```
# Get the ModelArtifact

model_artifact_summary = list(Artifact.list(source_uri=model_package_arn))[0]
model_artifact = ModelArtifact.load(artifact_arn=model_artifact_summary.artifact_arn)
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Find all the entities that descend from the model, i.e. the endpoint
    direction=LineageQueryDirectionEnum.DESCENDANTS,
    include_edges=False,
)

associations = []
for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Find all the entities that ascend from the model, i.e. the datasets
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(associations)
```

**Example 使用 `LineageQueryDirectionEnum.BOTH` 尋找祖系與後代關係**  
當方向設定為 `BOTH` 時，查詢會遍歷圖形，以尋找祖系 (ascendant) 和後代 (descendant) 關係。這種遍歷不僅在起始節點發生，還會在造訪的每個節點進行。例如，如果某個訓練工作執行兩次，而且這兩次執行產生的兩個模型均部署到端點，則方向設定為 `BOTH` 的查詢結果會顯示這兩個端點。這是因為模型訓練和部署使用了相同的映像。由於模型映像相同，因此 `start_arn` 和兩個端點都會出現在查詢結果中。  

```
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # This specifies that the query should look for associations both ascending and descending for the start
    direction=LineageQueryDirectionEnum.BOTH,
    include_edges=False,
)

associations = []
for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(associations)
```

**Example `LineageQuery` 中的方向 - `ASCENDANTS` 和 `DESCENDANTS`**  
若要了解歷程圖中的方向，請參考下列實體關係圖：資料集 -> 訓練工作 -> 模型 -> 端點  
端點是模型的後代 (descendant)，模型是資料集的後代。同樣地，模型是端點的祖系 (ascendant)。`direction` 參數可用來指定查詢應傳回 `start_arns` 中實體的後代還是祖系。如果 `start_arns` 包含模型且方向為 `DESCENDANTS`，則查詢會傳回端點。如果方向為 `ASCENDANTS`，則查詢會傳回資料集。  

```
# In this example, we'll look at the impact of specifying the direction as ASCENDANT or DESCENDANT in a `LineageQuery`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[
        LineageSourceEnum.ENDPOINT,
        LineageSourceEnum.MODEL,
        LineageSourceEnum.DATASET,
        LineageSourceEnum.TRAINING_JOB,
    ],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

ascendant_artifacts = []

# The lineage entity returned for the Training Job is a TrialComponent which can't be converted to a
# lineage object using the method `to_lineage_object()` so we extract the TrialComponent ARN.
for vertex in query_result.vertices:
    try:
        ascendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
    except:
        ascendant_artifacts.append(vertex.arn)

print("Ascendant artifacts : ")
pp.pprint(ascendant_artifacts)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.DESCENDANTS,
    include_edges=False,
)

descendant_artifacts = []
for vertex in query_result.vertices:
    try:
        descendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
    except:
        # Handling TrialComponents.
        descendant_artifacts.append(vertex.arn)

print("Descendant artifacts : ")
pp.pprint(descendant_artifacts)
```
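
為了更直觀地比較三種方向，下列草稿在一個小型的記憶體內圖形上模擬遍歷。這只是說明概念的玩具模型，並非 SageMaker AI 服務的實際演算法；圖形中的節點名稱和 `traverse` 函式都是假設的。圖形中兩個訓練工作共用同一個映像，因此可以看到 `BOTH` 會在每個造訪的節點上同時往兩個方向延伸。

```python
from collections import deque

# Each edge points from an upstream entity to the entity derived from it.
EDGES = [
    ("image", "training-job-1"),
    ("image", "training-job-2"),
    ("dataset", "training-job-1"),
    ("training-job-1", "model-1"),
    ("training-job-2", "model-2"),
    ("model-1", "endpoint-1"),
    ("model-2", "endpoint-2"),
]


def traverse(start, direction):
    """Return every node reachable from `start` in the given direction."""
    children, parents = {}, {}
    for source, destination in EDGES:
        children.setdefault(source, []).append(destination)
        parents.setdefault(destination, []).append(source)

    def neighbors(node):
        found = []
        if direction in ("DESCENDANTS", "BOTH"):
            found.extend(children.get(node, []))
        if direction in ("ASCENDANTS", "BOTH"):
            found.extend(parents.get(node, []))
        return found

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # With BOTH, both directions are followed at every visited node.
        for neighbor in neighbors(node):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {start}


for direction in ("DESCENDANTS", "ASCENDANTS", "BOTH"):
    print(direction, sorted(traverse("model-1", direction)))
```

在這個玩具圖形中，從 `model-1` 以 `BOTH` 遍歷時會經由共用的 `image` 節點抵達 `endpoint-2`，而單獨使用 `ASCENDANTS` 或 `DESCENDANTS` 時則不會。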

**Example SDK 輔助函式讓歷程查詢變得更輕鬆**  
`EndpointContext`、`ModelArtifact` 和 `DatasetArtifact` 類別都具有輔助函式，這些函式是 `LineageQuery` API 上的包裝函式，可以讓某些歷程查詢變得更輕鬆。以下範例展示如何使用這些輔助函式。  

```
# Find all the datasets associated with this endpoint

datasets = []
dataset_artifacts = endpoint_context.dataset_artifacts()
for dataset in dataset_artifacts:
    datasets.append(dataset.source.source_uri)
print("Datasets : ", datasets)

# Find the training jobs associated with the endpoint
training_job_artifacts = endpoint_context.training_job_arns()
training_jobs = []
for training_job in training_job_artifacts:
    training_jobs.append(training_job)
print("Training Jobs : ", training_jobs)

# Get the ARN for the pipeline execution associated with this endpoint (if any)
pipeline_execution_arn = endpoint_context.pipeline_execution_arn()
if pipeline_execution_arn:
    print(pipeline_execution_arn)

# Here we use the `ModelArtifact` class to find all the datasets and endpoints associated with the model

dataset_artifacts = model_artifact.dataset_artifacts()
endpoint_contexts = model_artifact.endpoint_contexts()

datasets = [dataset.source.source_uri for dataset in dataset_artifacts]
endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

print("Datasets associated with this model : ")
pp.pprint(datasets)

print("Endpoints associated with this model : ")
pp.pprint(endpoints)

# Here we use the `DatasetArtifact` class to find all the endpoints hosting models that were trained with a particular dataset
# Find the artifact associated with the dataset

dataset_artifact_arn = list(Artifact.list(source_uri=training_data))[0].artifact_arn
dataset_artifact = DatasetArtifact.load(artifact_arn=dataset_artifact_arn)

# Find the endpoints that used this training dataset
endpoint_contexts = dataset_artifact.endpoint_contexts()
endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

print("Endpoints associated with the training dataset {}".format(training_data))
pp.pprint(endpoints)
```

**Example 取得歷程圖視覺化圖形**  
範例筆記本 [visualizer.py](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-lineage/visualizer.py) 中提供了輔助類別 `Visualizer`，可協助繪製歷程圖。轉譯查詢回應時，系統會顯示含有來自 `StartArns` 之歷程關係的圖形。從 `StartArns` 開始，此視覺化圖形會顯示與 `query_lineage` API 動作傳回之其他歷程實體之間的關係。  

```
# Graph APIs
# Here we use the boto3 `query_lineage` API to generate the query response to plot.

from visualizer import Visualizer

query_response = sm_client.query_lineage(
    StartArns=[endpoint_context.context_arn], Direction="Ascendants", IncludeEdges=True
)

viz = Visualizer()
viz.render(query_response, "Endpoint")

query_response = sm_client.query_lineage(
    StartArns=[model_artifact.artifact_arn], Direction="Ascendants", IncludeEdges=True
)
viz.render(query_response, "Model")
```

# 追蹤跨帳戶歷程
<a name="xaccount-lineage-tracking"></a>

Amazon SageMaker AI 支援追蹤來自不同 AWS 帳戶的歷程實體。其他 AWS 帳戶可以與您共用歷程實體，您可以透過直接 API 呼叫或 SageMaker AI 歷程查詢存取這些歷程實體。

SageMaker AI 使用 [AWS Resource Access Manager](https://docs.aws.amazon.com/ram/latest/userguide/what-is.html) 來協助您安全共用歷程資源。您可以透過 [AWS RAM 主控台](https://console.aws.amazon.com/ram/home)共用資源。


## 設定跨帳戶歷程追蹤
<a name="setup-xaccount-lineage-tracking"></a>

您可以透過 Amazon SageMaker AI 中的歷程群組來分組和共用 [歷程追蹤實體](lineage-tracking-entities.md)。SageMaker AI 僅支援每個帳戶一個預設歷程群組。每當在您的帳戶中建立歷程實體時，SageMaker AI 都會建立預設歷程群組。您的帳戶擁有的每個歷程實體都會指派給此預設歷程群組。若要與其他帳戶共用歷程實體，您可以與該帳戶共用此預設歷程群組。

**注意**  
您可以共用歷程群組中的所有歷程追蹤實體，也可以不共用任何實體。

使用 AWS Resource Access Manager 主控台為您的歷程實體建立資源共用。如需詳細資訊，請參閱 *AWS Resource Access Manager 使用者指南*中的[共用 AWS 資源](https://docs.aws.amazon.com/ram/latest/userguide/getting-started-sharing.html)。

**注意**  
建立資源共用後，資源和主體可能需要幾分鐘的時間才能完成關聯。設定關聯之後，共用帳戶會收到加入資源共用的邀請。共用帳戶必須接受邀請才能存取共用資源。如需在 AWS RAM 中接受資源共用邀請的詳細資訊，請參閱 *AWS Resource Access Manager 使用者指南*中的[使用共用 AWS 資源](https://docs.aws.amazon.com/ram/latest/userguide/getting-started-shared.html)。

### 跨帳戶歷程追蹤資源政策
<a name="setup-xaccount-lineage-tracking-resource-policy"></a>

Amazon SageMaker AI 僅支援一種類型的資源政策。SageMaker AI 資源政策必須允許執行下列所有操作：

```
"sagemaker:DescribeAction"
"sagemaker:DescribeArtifact"
"sagemaker:DescribeContext"
"sagemaker:DescribeTrialComponent"
"sagemaker:AddAssociation"
"sagemaker:DeleteAssociation"
"sagemaker:QueryLineage"
```

**Example 以下是使用 AWS Resource Access Manager 為帳戶歷程群組建立資源共用時所建立的 SageMaker AI 資源政策。**  

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "FullLineageAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "111122223333"
      },
      "Action": [
        "sagemaker:DescribeAction",
        "sagemaker:DescribeArtifact",
        "sagemaker:DescribeContext",
        "sagemaker:DescribeTrialComponent",
        "sagemaker:AddAssociation",
        "sagemaker:DeleteAssociation",
        "sagemaker:QueryLineage"
      ],
      "Resource": "arn:aws:sagemaker:us-west-2:111111111111:lineage-group/sagemaker-default-lineage-group"
    }
  ]
}
```
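
若要在套用政策之前確認其涵蓋所有必要操作，可以使用類似下列草稿的檢查。`missing_lineage_actions` 是為了說明而假設的輔助函式；它只比對完整的動作名稱，不處理萬用字元 (例如 `sagemaker:*`) 或 `Deny` 陳述式，因此僅適合作為初步檢查。

```python
import json

REQUIRED_ACTIONS = {
    "sagemaker:DescribeAction",
    "sagemaker:DescribeArtifact",
    "sagemaker:DescribeContext",
    "sagemaker:DescribeTrialComponent",
    "sagemaker:AddAssociation",
    "sagemaker:DeleteAssociation",
    "sagemaker:QueryLineage",
}


def missing_lineage_actions(policy_document):
    """Return the required lineage actions that no Allow statement lists."""
    if isinstance(policy_document, str):
        policy_document = json.loads(policy_document)
    allowed = set()
    for statement in policy_document.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]  # A single action may be given as a string.
        allowed.update(actions)
    return REQUIRED_ACTIONS - allowed


# Hypothetical policy that omits sagemaker:QueryLineage.
incomplete_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": sorted(REQUIRED_ACTIONS - {"sagemaker:QueryLineage"}),
        "Resource": "*",
    }],
})
print(missing_lineage_actions(incomplete_policy))
```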

## 追蹤跨帳戶歷程實體
<a name="tracking-lineage-xaccount"></a>

透過跨帳戶歷程追蹤，您可以使用相同的 `AddAssociation` API 動作關聯不同帳戶中的歷程實體。當您關聯兩個歷程實體時，SageMaker AI 會驗證您是否擁有對兩個歷程實體執行 `AddAssociation` API 動作的許可。然後，SageMaker AI 會建立關聯。如果您沒有許可，SageMaker AI *不會*建立關聯。建立跨帳戶關聯後，您可以透過 `QueryLineage` API 動作從另一個歷程實體存取任一歷程實體。如需詳細資訊，請參閱[查詢歷程實體](querying-lineage-entities.md)。

除了自動建立歷程實體之外，如果您具有跨帳戶存取權，SageMaker AI 還會連結參考相同物件或資料的成品。如果一個帳戶的資料用於不同帳戶的歷程追蹤，SageMaker AI 會在每個帳戶中建立一個成品來追蹤該資料。若使用跨帳戶歷程，每當 SageMaker AI 建立新的成品時，都會檢查是否有其他已與您共用的成品是針對相同資料所建立。然後，SageMaker AI 會在新建立的成品和每個與您共用的成品之間建立關聯，並將 `AssociationType` 設定為 `SameAs`。接著，您可以使用 [QueryLineage](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_QueryLineage.html) API 動作，遍歷您自己帳戶中的歷程實體，以及與您共用但由不同 AWS 帳戶擁有的實體。如需詳細資訊，請參閱[查詢歷程實體](querying-lineage-entities.md)。

**Topics**
+ [從不同帳戶存取歷程資源](#tracking-lineage-xaccount-accessing-resources)
+ [授權跨帳戶查詢歷程實體](#tracking-lineage-xaccount-authorization)

### 從不同帳戶存取歷程資源
<a name="tracking-lineage-xaccount-accessing-resources"></a>

設定共用歷程的跨帳戶存取權之後，您可以透過 ARN 直接呼叫下列 SageMaker API 動作，以描述來自其他帳戶的共用歷程實體：
+ [DescribeAction](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeAction.html)
+ [DescribeArtifact](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeArtifact.html)
+ [DescribeContext](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeContext.html)
+ [DescribeTrialComponent](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTrialComponent.html)

您也可以使用下列 SageMaker API 動作，管理由不同帳戶擁有且已與您共用的歷程實體的[關聯](https://docs.aws.amazon.com/sagemaker/latest/dg/lineage-tracking-entities.html)：
+ [AddAssociation](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AddAssociation.html)
+ [DeleteAssociation](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DeleteAssociation.html)

如需示範如何使用 SageMaker AI 歷程 API 跨帳戶查詢歷程的筆記本，請參閱 [sagemaker-lineage-cross-account-with-ram.ipynb](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-lineage/sagemaker-lineage-cross-account-with-ram.ipynb)。

### 授權跨帳戶查詢歷程實體
<a name="tracking-lineage-xaccount-authorization"></a>

Amazon SageMaker AI 必須驗證您是否具有在 `StartArns` 上執行 `QueryLineage` API 動作的許可。這是透過連接至 `LineageGroup` 的資源政策來強制執行的。此動作的結果包含您有權存取的所有歷程實體，無論這些實體是由您的帳戶擁有還是由其他帳戶共用。如需詳細資訊，請參閱[查詢歷程實體](querying-lineage-entities.md)。

# 使用模型註冊庫進行模型註冊部署
<a name="model-registry"></a>

透過 Amazon SageMaker 模型註冊表，您可以執行下列操作：
+ 為生產模型製作目錄。
+ 管理模型版本。
+ 將中繼資料 (例如訓練指標) 與模型建立關聯。
+ 在已註冊的模型中檢視來自 Amazon SageMaker 模型卡的資訊。
+ 檢視模型歷程以取得可追蹤性和可重現性。
+ 定義模型可以在模型生命週期中進展的預備建構模組。
+ 管理模型的核准狀態。
+ 將模型部署到生產環境。
+ 使用 CI/CD 自動部署模型。
+ 與其他使用者共用模型。

透過建立包含不同模型版本的 SageMaker 模型註冊表模型 (套件) 群組來製作模型目錄。您可以建立模型群組，來追蹤為解決特定問題而訓練的所有模型。然後，您可以對訓練的每個模型進行註冊，模型註冊表會將其作為新的模型版本新增至模型群組。最後，您可以將模型群組進一步整理到 SageMaker 模型註冊表集合中，藉此建立模型群組類別。典型的工作流程如下所示：
+ 建立模型群組。
+ 建立對模型進行訓練的機器學習 (ML) 管道。如需與 SageMaker 管道相關的資訊，請參閱[Pipelines 動作](pipelines-build.md)。
+ 對於機器學習 (ML) 管道的每次執行，都要建立一個模型版本，並在第一步中建立的模型群組中註冊。
+ 將您的模型群組新增至一個或多個模型註冊表集合。

如需與如何建立和使用模型、模型版本和模型群組有關的詳細資訊，請參閱[模型註冊表模型、模型版本和模型群組](model-registry-models.md)。(可選) 如果您要將模型群組進一步分組為集合，請參閱[模型註冊表集合](modelcollections.md)。

# 模型註冊表模型、模型版本和模型群組
<a name="model-registry-models"></a>

SageMaker 模型註冊表結構為數個模型 (套件) 群組，每個群組中都有模型套件。您可以選擇性地將這些模型群組新增至一個或多個集合。模型群組中的每個模型套件都對應於一個訓練過的模型。每個模型套件的版本都是一個數值，從 1 開始，並每向模型群組中新增一個新模型套件，版本就遞增一次。例如，如果模型群組中新增了 5 個模型套件，則模型套件版本將分別是 1、2、3、4 和 5。
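
下列草稿示範如何根據上述編號規則找出模型群組中最新的版本。`latest_model_package_version` 和 `FakeSageMakerClient` 是為了說明而假設的名稱，後者只是記憶體內的替身，群組名稱也是虛構的；`list_model_packages` (`ModelPackageGroupName` 參數) 則是 Boto3 SageMaker 用戶端的操作。此草稿只讀取回應的第一頁，群組中的版本較多時需要另外處理分頁。

```python
def latest_model_package_version(sm_client, model_package_group_name):
    """Return the highest model package version in the group, or None if empty."""
    response = sm_client.list_model_packages(
        ModelPackageGroupName=model_package_group_name
    )
    versions = [
        summary["ModelPackageVersion"]
        for summary in response.get("ModelPackageSummaryList", [])
        if "ModelPackageVersion" in summary
    ]
    return max(versions) if versions else None


class FakeSageMakerClient:
    """In-memory stand-in for the SageMaker client (demonstration only)."""

    def __init__(self, groups):
        self.groups = groups

    def list_model_packages(self, ModelPackageGroupName):
        count = self.groups.get(ModelPackageGroupName, 0)
        # Versions start at 1 and increase by 1 for each registered package.
        return {
            "ModelPackageSummaryList": [
                {"ModelPackageVersion": version} for version in range(1, count + 1)
            ]
        }


client = FakeSageMakerClient({"my-model-group": 5, "empty-group": 0})
latest = latest_model_package_version(client, "my-model-group")
print(latest)
```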

模型套件是在模型註冊表中註冊為版本化實體的實際模型。SageMaker AI 中有兩種類型的模型套件：一種用於 AWS Marketplace，另一種用於模型註冊表。AWS Marketplace 中使用的模型套件不是可進行版本控制的實體，也不與模型註冊表中的模型群組相關聯。模型註冊表會接收您重新訓練的每個新模型，為其指定版本，並將其指派給模型註冊表內的模型群組。下列影像展示包含 25 個連續版本化模型的模型群組範例。如需 AWS Marketplace 中使用的模型套件詳細資訊，請參閱 [AWS Marketplace 中的演算法和套件](sagemaker-marketplace.md)。

模型註冊表中使用的模型套件可進行版本控制，且**必須**與模型群組相關聯。此模型套件類型的 ARN 具有以下結構：`'arn:aws:sagemaker:region:account:model-package-group/version'`

下列主題展示如何在模型註冊表中建立和使用模型、模型版本和模型群組。

**Topics**
+ [建立模型群組](model-registry-model-group.md)
+ [刪除模型群組](model-registry-delete-model-group.md)
+ [註冊模型版本](model-registry-version.md)
+ [檢視模型群組和版本](model-registry-view.md)
+ [更新模型版本的詳細資訊](model-registry-details.md)
+ [比較模型版本](model-registry-version-compare.md)
+ [檢視和管理模型群組和模型版本標籤](model-registry-tags.md)
+ [刪除模型版本](model-registry-delete-model-version.md)
+ [模型生命週期的預備建構模組](model-registry-staging-construct.md)
+ [更新模型的核准狀態](model-registry-approve.md)
+ [使用 Python 從登錄中部署模型](model-registry-deploy.md)
+ [在 Studio 中部署模型](model-registry-deploy-studio.md)
+ [跨帳戶探索能力](model-registry-ram.md)
+ [檢視模型的部署歷史記錄](model-registry-deploy-history.md)
+ [在 Studio 中檢視模型歷程詳細資訊](model-registry-lineage-view-studio.md)

# 建立模型群組
<a name="model-registry-model-group"></a>

模型群組包含不同版本的模型。您可以建立模型群組，來追蹤為解決特定問題而訓練的所有模型。使用適用於 Python (Boto3) 的 AWS SDK 或 Amazon SageMaker Studio 主控台建立模型群組。

## 建立模型群組 (Boto3)
<a name="model-registry-package-group-api"></a>

**重要**  
允許 Amazon SageMaker Studio 或 Amazon SageMaker Studio Classic 建立 Amazon SageMaker 資源的自訂 IAM 政策也必須授與許可，才能將標籤新增至這些資源。需要將標籤新增至資源的許可，因為 Studio 和 Studio Classic 會自動標記它們建立的任何資源。如果 IAM 政策允許 Studio 和 Studio Classic 建立資源，但不允許標記，則在嘗試建立資源時可能會發生 "AccessDenied" 錯誤。如需詳細資訊，請參閱[提供標記 SageMaker AI 資源的許可](security_iam_id-based-policy-examples.md#grant-tagging-permissions)。  
提供許可來建立 SageMaker 資源的 [Amazon SageMaker AI 的 AWS 受管政策](security-iam-awsmanpol.md) 已包含建立這些資源時新增標籤的許可。

若要使用 Boto3 建立模型群組，請呼叫 `create_model_package_group` API 操作並將名稱和描述指定為參數。下列範例展示如何建立模型群組。`create_model_package_group` 呼叫的回應是新模型群組的 Amazon Resource Name (ARN)。

首先，匯入所需的套件並設定 SageMaker AI Boto3 用戶端。

```
import time
import os
from sagemaker import get_execution_role, session
import boto3

region = boto3.Session().region_name

role = get_execution_role()

sm_client = boto3.client('sagemaker', region_name=region)
```

現在建立模型群組。

```
import time
model_package_group_name = "scikit-iris-detector-" + str(round(time.time()))
model_package_group_input_dict = {
 "ModelPackageGroupName" : model_package_group_name,
 "ModelPackageGroupDescription" : "Sample model package group"
}

create_model_package_group_response = sm_client.create_model_package_group(**model_package_group_input_dict)
print('ModelPackageGroup Arn : {}'.format(create_model_package_group_response['ModelPackageGroupArn']))
```
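Because Studio tags the resources it creates, it can be useful to attach your own tags when you create a Model Group from code. The following is a minimal sketch of a request builder, assuming that tags are passed through the `Tags` parameter as a list of `Key`/`Value` pairs. The helper name `build_model_group_request` and the tag values are hypothetical and are not part of the SageMaker API.

```python
def build_model_group_request(name, description, tags=None):
    """Build keyword arguments for create_model_package_group."""
    request = {
        "ModelPackageGroupName": name,
        "ModelPackageGroupDescription": description,
    }
    if tags:
        # Tags are sent as a list of {"Key": ..., "Value": ...} pairs.
        request["Tags"] = [
            {"Key": key, "Value": value} for key, value in sorted(tags.items())
        ]
    return request


request = build_model_group_request(
    "scikit-iris-detector-demo",
    "Sample model package group",
    tags={"team": "ml-platform", "stage": "dev"},  # hypothetical tag values
)
print(request)

# With a configured client, the request could then be sent like this:
# response = sm_client.create_model_package_group(**request)
```

Keeping the request in a plain dictionary makes it easy to inspect or log before any resource is created.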

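## Create a Model Group (Studio or Studio Classic)
<a name="model-registry-package-group-studio"></a>

To create a Model Group in the Amazon SageMaker Studio console, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models**.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if not selected already.

1. Choose **Register**, then choose **Model group**.

1. In the **Register model group** dialog box, enter the following information:
   + Enter the name of the new Model Group in the **Model group name** field.
   + (Optional) Enter a description for the Model Group in the **Description** field.
   + (Optional) Enter any key-value pairs you want to associate with the Model Group in the **Tags** field. For information about using tags, see [Tagging AWS resources](https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html) in the *AWS General Reference*.

1. Choose **Register model group**.

1. (Optional) In the **Models** page, choose the **Registered models** tab, then choose **Model Groups**. Confirm that your newly created Model Group appears in the list of Model Groups.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon (![\[Home icon.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then choose **Model registry**.

1. Choose **Actions**, then choose **Create model group**.

1. In the **Create model group** dialog box, enter the following information:
   + Enter the name of the new Model Group in the **Model group name** field.
   + (Optional) Enter a description for the Model Group in the **Description** field.
   + (Optional) Enter any key-value pairs you want to associate with the Model Group in the **Tags** field. For information about using tags, see [Tagging AWS resources](https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html) in the *AWS General Reference*.
   + (Optional) Choose a project with which to associate the Model Group in the **Project** field. For information about projects, see [MLOps Automation With SageMaker Projects](sagemaker-projects.md).

1. Choose **Create model group**.

------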

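# Delete a Model Group
<a name="model-registry-delete-model-group"></a>

This procedure demonstrates how to delete a Model Group in the Amazon SageMaker Studio console. When you delete a Model Group, you lose access to the model versions in the Model Group.

## Delete a Model Group (Studio or Studio Classic)
<a name="model-registry-delete-model-group-studio"></a>

**Important**  
You can only delete an empty Model Group. Before you delete a Model Group, remove its model versions, if any.

To delete a Model Group in the Amazon SageMaker Studio console, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models**.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if not selected already.

1. From the model groups list, select the check box next to the name of the Model Group that you want to delete.

1. Choose the vertical ellipsis at the top right corner of the model groups list, and then choose **Delete**.

1. In the **Delete model group** dialog box, choose **Yes, delete the model group**.

1. Choose **Delete**.

1. Confirm that the deleted Model Group no longer appears in the list of model groups.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon (![\[Home icon.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then choose **Model registry**. A list of your Model Groups appears.

1. From the model groups list, select the name of the Model Group that you want to delete.

1. In the top right corner, choose **Remove**.

1. In the confirmation dialog box, enter `REMOVE`.

1. Choose **Remove**.

------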
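The console steps above have a programmatic counterpart. Because only an empty Model Group can be deleted, one possible approach with Boto3 is to delete every model version first and then delete the group. The following is a minimal sketch under that assumption. The helper `delete_model_group` and the `FakeSageMakerClient` class are hypothetical; the fake client only exists so that the sketch can be dry-run without AWS access, and a real `boto3.client("sagemaker")` would be passed in its place.

```python
def delete_model_group(sm_client, group_name):
    """Delete all model versions in a Model Group, then the group itself."""
    arns = []
    kwargs = {"ModelPackageGroupName": group_name}
    # Collect every version ARN first so deletions do not disturb pagination.
    while True:
        page = sm_client.list_model_packages(**kwargs)
        arns.extend(s["ModelPackageArn"] for s in page["ModelPackageSummaryList"])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    for arn in arns:
        sm_client.delete_model_package(ModelPackageName=arn)
    # The group can be deleted only after it is empty.
    sm_client.delete_model_package_group(ModelPackageGroupName=group_name)
    return arns


class FakeSageMakerClient:
    """Hypothetical in-memory stand-in used only to dry-run the sketch."""

    def __init__(self, arns):
        self.arns = list(arns)
        self.calls = []

    def list_model_packages(self, ModelPackageGroupName, NextToken=None):
        return {"ModelPackageSummaryList": [{"ModelPackageArn": a} for a in self.arns]}

    def delete_model_package(self, ModelPackageName):
        self.arns.remove(ModelPackageName)
        self.calls.append(("delete_model_package", ModelPackageName))

    def delete_model_package_group(self, ModelPackageGroupName):
        if self.arns:
            raise RuntimeError("Model Group is not empty")
        self.calls.append(("delete_model_package_group", ModelPackageGroupName))


fake = FakeSageMakerClient([
    "arn:aws:sagemaker:us-east-2:123456789012:model-package/ModelGroup1/1",
    "arn:aws:sagemaker:us-east-2:123456789012:model-package/ModelGroup1/2",
])
deleted = delete_model_group(fake, "ModelGroup1")
print(deleted)
print(fake.calls[-1])
```

Deleting model versions cannot be undone, so it is worth reviewing the collected ARNs before running such a helper against a real account.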

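# Register a Model Version
<a name="model-registry-version"></a>

You can register an Amazon SageMaker AI model by creating a model version that specifies the Model Group to which it belongs. A model version must include the model artifacts (the trained weights of a model) and can optionally include the inference code for the model.

An *inference pipeline* is a SageMaker AI model composed of a linear sequence of two to fifteen containers that process inference requests. You register an inference pipeline by specifying the containers and the associated environment variables. For more information about inference pipelines, see [Inference pipelines in Amazon SageMaker AI](inference-pipelines.md).

To create a model version, including one that uses an inference pipeline, use one of the following procedures: the AWS SDK for Python (Boto3), the Amazon SageMaker Studio console, or a step in a SageMaker AI model building pipeline.

**Topics**
+ [Register a Model Version (SageMaker AI Pipelines)](#model-registry-pipeline)
+ [Register a Model Version (Boto3)](#model-registry-version-api)
+ [Register a Model Version (Studio or Studio Classic)](#model-registry-studio)
+ [Register a Model Version from a Different Account](#model-registry-version-xaccount)

## Register a Model Version (SageMaker AI Pipelines)
<a name="model-registry-pipeline"></a>

To register a model version by using a SageMaker AI model building pipeline, create a `RegisterModel` step in your pipeline. For information about creating a `RegisterModel` step as part of a pipeline, see [Step 8: Define a RegisterModel Step to Create a Model Package](define-pipeline.md#define-pipeline-register).

## Register a Model Version (Boto3)
<a name="model-registry-version-api"></a>

To register a model version by using Boto3, call the `create_model_package` API operation.

First, set up the parameter dictionary to pass to the `create_model_package` API operation.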

```
# Specify the model source
model_url = "s3://your-bucket-name/model.tar.gz"

# image_uri is assumed to be defined already: the Amazon ECR URI of the
# inference container image for this model.
modelpackage_inference_specification = {
    "InferenceSpecification": {
        "Containers": [
            {
                "Image": image_uri,
                "ModelDataUrl": model_url
            }
        ],
        "SupportedContentTypes": ["text/csv"],
        "SupportedResponseMIMETypes": ["text/csv"],
    }
}

# Alternatively, you can specify the model source like this:
# modelpackage_inference_specification["InferenceSpecification"]["Containers"][0]["ModelDataUrl"]=model_url

create_model_package_input_dict = {
    "ModelPackageGroupName" : model_package_group_name,
    "ModelPackageDescription" : "Model to detect 3 different types of irises (Setosa, Versicolour, and Virginica)",
    "ModelApprovalStatus" : "PendingManualApproval"
}
create_model_package_input_dict.update(modelpackage_inference_specification)
```

Then call the `create_model_package` API operation, passing in the parameter dictionary that you just set up.

```
create_model_package_response = sm_client.create_model_package(**create_model_package_input_dict)
model_package_arn = create_model_package_response["ModelPackageArn"]
print('ModelPackage Version ARN : {}'.format(model_package_arn))
```
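The example above registers a single container. For an inference pipeline, the `Containers` list holds one entry per container, in processing order, each with its own environment variables. The following is a minimal sketch of how such a specification could be assembled. The helper name `build_pipeline_inference_spec`, the image URIs, the S3 paths, and the environment variable are all hypothetical, and the sketch assumes that each container entry accepts an `Environment` mapping.

```python
def build_pipeline_inference_spec(containers):
    """Build an InferenceSpecification from an ordered list of containers.

    `containers` is a list of (image_uri, model_data_url, environment)
    tuples, in the order the containers should process a request.
    """
    if not 1 <= len(containers) <= 15:
        raise ValueError("expected between 1 and 15 containers")
    definitions = []
    for image_uri, model_data_url, environment in containers:
        definition = {"Image": image_uri}
        if model_data_url:
            definition["ModelDataUrl"] = model_data_url
        if environment:
            definition["Environment"] = dict(environment)
        definitions.append(definition)
    return {
        "InferenceSpecification": {
            "Containers": definitions,
            "SupportedContentTypes": ["text/csv"],
            "SupportedResponseMIMETypes": ["text/csv"],
        }
    }


# Hypothetical two-container pipeline: preprocessing followed by a predictor.
pipeline_spec = build_pipeline_inference_spec([
    ("111122223333.dkr.ecr.us-east-2.amazonaws.com/preprocess:latest",
     "s3://amzn-s3-demo-bucket/preprocess/model.tar.gz",
     {"LOG_LEVEL": "INFO"}),
    ("111122223333.dkr.ecr.us-east-2.amazonaws.com/predict:latest",
     "s3://amzn-s3-demo-bucket/predict/model.tar.gz",
     None),
])
print(len(pipeline_spec["InferenceSpecification"]["Containers"]))

# The result can be merged into the create_model_package input, for example:
# create_model_package_input_dict.update(pipeline_spec)
```

The order of the tuples matters, because the containers in an inference pipeline process each request as a linear sequence.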

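## Register a Model Version (Studio or Studio Classic)
<a name="model-registry-studio"></a>

To register a model version in the Amazon SageMaker Studio console, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models** from the menu.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups** and **My models**, if not selected already.

1. Choose **Register**. This opens the **Register model** page.

1. Follow the instructions provided in the **Register model** page.

1. After you review your choices, choose **Register**. When registration completes, you are taken to the model version **Overview** page.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon (![\[Home icon.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then choose **Model registry**.

1. Open the **Register version** form. You can do this in one of two ways:
   + Choose **Actions**, and then choose **Create model version**.
   + Select the name of the Model Group for which you want to create a model version, then choose **Create model version**.

1. In the **Register model version** form, enter the following information:
   + In the **Model package group name** dropdown list, select the Model Group name.
   + (Optional) Enter a description for your model version.
   + In the **Model approval status** dropdown list, select the version approval status.
   + (Optional) In the **Custom metadata** field, add custom tags as key-value pairs.

1. Choose **Next**.

1. In the **Inference specification** form, enter the following information:
   + Enter your inference image location.
   + Enter your model data artifacts location.
   + (Optional) Enter information about the images to use for transform and real-time inference jobs, and the supported input and output MIME types.

1. Choose **Next**.

1. (Optional) Provide details to aid endpoint recommendations.

1. Choose **Next**.

1. (Optional) Choose the model metrics that you want to include.

1. Choose **Next**.

1. Confirm that the displayed settings are correct, and then choose **Register model version**. If you subsequently see a modal window with an error message, choose **View** (next to the message) to view the source of the error.

1. Confirm that your new model version appears in the parent Model Group page.

------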

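## Register a Model Version from a Different Account
<a name="model-registry-version-xaccount"></a>

To register model versions with a Model Group created by a different AWS account, you must add a cross-account AWS Identity and Access Management resource policy to enable that account. For example, one AWS account in your organization is responsible for training models, and a different account is responsible for managing, deploying, and updating models. You create IAM resource policies and apply them to the specific account resources to which you want to grant access in this scenario. For more information about cross-account resource policies in AWS, see [Cross-account policy evaluation logic](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_evaluation-logic-cross-account.html) in the *AWS Identity and Access Management User Guide*.

To enable cross-account discoverability, which allows other accounts to view model package groups from the resource owner account, see [Cross-account discoverability](model-registry-ram.md).

**Note**  
For cross-account model deployment, you must also specify a KMS key in the [output data config](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_OutputDataConfig.html) during training so that the training output is encrypted with that key.

To enable cross-account model registry in SageMaker AI, you must provide a cross-account resource policy for the Model Group that contains the model versions. The following example creates cross-account policies for the Model Group and applies these policies to that specific resource.

The following configuration must be set in the source account that registers models cross-account in a Model Group. In this example, the source account is the model training account, which trains the model and then registers it cross-account into the Model Registry of the Model Registry account.

The example assumes that you previously defined the following variables:
+ `sm_client` - A SageMaker AI Boto3 client.
+ `model_package_group_name` - The Model Group to which you want to grant access.
+ `model_package_group_arn` - The ARN of the Model Group to which you want to grant cross-account access.
+ `bucket` - The Amazon S3 bucket where the model training artifacts are stored.

To be able to deploy a model created in a different account, the user must have a role that has access to SageMaker AI actions, such as a role with the `AmazonSageMakerFullAccess` managed policy. For information about SageMaker AI managed policies, see [AWS managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md).

### Required IAM resource policies
<a name="model-registry-version-xaccount-policies"></a>

The following diagram shows the policies required to allow cross-account model registration. As shown, these policies must be active during model training so that the model registers properly into the Model Registry account.

![\[\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/model_registry_cross_account.png)


The Amazon ECR, Amazon S3, and AWS KMS policies are demonstrated in the following code samples.

**Sample Amazon ECR policy**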

------
#### [ JSON ]

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AddPerm",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:root"
            },
            "Action": [
                "ecr:BatchGetImage",
                "ecr:Describe*"
            ]
        }
    ]
}
```

------

**Sample Amazon S3 policy**

------
#### [ JSON ]

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AddPerm",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:root"
            },
            "Action": [
                "s3:GetObject",
                "s3:GetBucketAcl",
                "s3:GetObjectAcl"
            ],
            "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*"
        }
    ]
}
```

------

**Sample AWS KMS policy**

------
#### [ JSON ]

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AddPerm",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:root"
            },
            "Action": [
                "kms:Decrypt",
                "kms:GenerateDataKey*"
            ],
            "Resource": "*"
        }
    ]
}
```

------

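### Apply resource policies to accounts
<a name="model-registry-version-xaccount-policy-usage"></a>

The following policy configuration applies the policies discussed in the previous section and must be put in the model training account.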

```
import json

import boto3

# The Model Registry account id of the Model Group
model_registry_account = "111111111111"

# The model training account id where training happens
model_training_account = "222222222222"

# 1. Create a policy for access to the ECR repository
# in the model training account for the Model Registry account Model Group
ecr_repository_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AddPerm",
        "Effect": "Allow",
        "Principal": {
            "AWS": f"arn:aws:iam::{model_registry_account}:root"
        },
        "Action": [
            "ecr:BatchGetImage",
            "ecr:Describe*"
        ]
    }]
}

# Convert the ECR policy from JSON dict to string
ecr_repository_policy = json.dumps(ecr_repository_policy)

# Set the new ECR policy
ecr = boto3.client('ecr')
response = ecr.set_repository_policy(
    registryId = model_training_account,
    repositoryName = "decision-trees-sample",
    policyText = ecr_repository_policy
)

# 2. Create a policy in the model training account for access to the S3 bucket
# where the model is present in the Model Registry account Model Group
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AddPerm",
        "Effect": "Allow",
        "Principal": {
            "AWS": f"arn:aws:iam::{model_registry_account}:root"
        },
        "Action": [
            "s3:GetObject",
            "s3:GetBucketAcl",
            "s3:GetObjectAcl"
        ],
        # f-strings are needed here so that {bucket} is substituted
        "Resource": [
            f"arn:aws:s3:::{bucket}/*",
            f"arn:aws:s3:::{bucket}"
        ]
    }]
}

# Convert the S3 policy from JSON dict to string
bucket_policy = json.dumps(bucket_policy)

# Set the new bucket policy
s3 = boto3.client("s3")
response = s3.put_bucket_policy(
    Bucket = bucket,
    Policy = bucket_policy)

# 3. Create the KMS grant for the key used during training for encryption
# in the model training account to the Model Registry account Model Group.
# kms_key_id is assumed to be defined already: the ID or ARN of that key.
client = boto3.client("kms")

response = client.create_grant(
    # The grantee is specified as the ARN of an AWS principal
    GranteePrincipal=f"arn:aws:iam::{model_registry_account}:root",
    KeyId=kms_key_id,
    Operations=[
        "Decrypt",
        "GenerateDataKey",
    ],
)
```

The following configuration must be put in the Model Registry account where the Model Group exists.

```
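import json

# The Model Registry account id of the Model Group
model_registry_account = "111111111111"

# The model training account id that registers model versions
model_training_account = "222222222222"

# region, model_package_group_name, and sm_client are assumed to be defined
# already for the Model Registry account.

# 1. Create policy to allow the model training account to access the ModelPackageGroup
model_package_group_policy = {"Version": "2012-10-17",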
    "Statement": [
        {
            "Sid": "AddPermModelPackageVersion",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{model_training_account}:root"},
            "Action": ["sagemaker:CreateModelPackage"],
            "Resource": f"arn:aws:sagemaker:{region}:{model_registry_account}:model-package/{model_package_group_name}/*"
        }
    ]
}

# Convert the policy from JSON dict to string
model_package_group_policy = json.dumps(model_package_group_policy)

# Set the new policy
response = sm_client.put_model_package_group_policy(
    ModelPackageGroupName = model_package_group_name,
    ResourcePolicy = model_package_group_policy)
```
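After a resource policy is applied, it can help to confirm that it grants what you expect before you attempt a cross-account registration. The following is a minimal sketch of such a check. The helper `policy_allows` is hypothetical, it only performs exact string matching (no wildcards or conditions), and it is not a substitute for IAM policy evaluation. The policy text is assumed to come from the `ResourcePolicy` field returned by `get_model_package_group_policy`.

```python
import json


def policy_allows(policy_text, account_id, action):
    """Return True if an Allow statement names the account root and the action."""
    principal_arn = f"arn:aws:iam::{account_id}:root"
    for statement in json.loads(policy_text).get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal")
        if not isinstance(principal, dict):
            continue
        principals = principal.get("AWS", [])
        actions = statement.get("Action", [])
        # Both fields may be a single string or a list of strings.
        if isinstance(principals, str):
            principals = [principals]
        if isinstance(actions, str):
            actions = [actions]
        if principal_arn in principals and action in actions:
            return True
    return False


# A local copy of the kind of policy created above, used for a dry run.
sample_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AddPermModelPackageVersion",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::222222222222:root"},
        "Action": ["sagemaker:CreateModelPackage"],
        "Resource": "arn:aws:sagemaker:us-east-2:111111111111:model-package/ModelGroup1/*",
    }],
})
print(policy_allows(sample_policy, "222222222222", "sagemaker:CreateModelPackage"))

# With a configured client in the Model Registry account, the applied policy
# could be read back like this:
# policy_text = sm_client.get_model_package_group_policy(
#     ModelPackageGroupName=model_package_group_name)["ResourcePolicy"]
```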

Finally, use the `create_model_package` action from the model training account to register the model package cross-account.

```
# Specify the model source (an f-string, so that {bucket} is substituted)
model_url = f"s3://{bucket}/model.tar.gz"

#Set up the parameter dictionary to pass to the create_model_package API operation
modelpackage_inference_specification =  {
    "InferenceSpecification": {
        "Containers": [
            {
                "Image": f"{model_training_account}.dkr.ecr.us-east-2.amazonaws.com/decision-trees-sample:latest",
                "ModelDataUrl": model_url
            }
        ],
        "SupportedContentTypes": [ "text/csv" ],
        "SupportedResponseMIMETypes": [ "text/csv" ],
    }
}

# Alternatively, you can specify the model source like this:
# modelpackage_inference_specification["InferenceSpecification"]["Containers"][0]["ModelDataUrl"]=model_url

create_model_package_input_dict = {
    "ModelPackageGroupName" : model_package_group_arn,
    "ModelPackageDescription" : "Model to detect 3 different types of irises (Setosa, Versicolour, and Virginica)",
    "ModelApprovalStatus" : "PendingManualApproval"
}
create_model_package_input_dict.update(modelpackage_inference_specification)

# Create the model package in the Model Registry account
create_model_package_response = sm_client.create_model_package(**create_model_package_input_dict)
model_package_arn = create_model_package_response["ModelPackageArn"]
print('ModelPackage Version ARN : {}'.format(model_package_arn))
```

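# View Model Groups and Versions
<a name="model-registry-view"></a>

Model Groups and versions help you organize your models. You can view a list of the model versions in a Model Group by using either the AWS SDK for Python (Boto3) or the Amazon SageMaker Studio console.

## View a list of model versions in a group
<a name="model-registry-view-list"></a>

You can view all of the model versions that are associated with a Model Group. If a Model Group represents all of the models that you train to address a specific machine learning (ML) problem, you can view all of those related models.

### View a list of model versions in a group (Boto3)
<a name="model-registry-view-list-api"></a>

To view the model versions associated with a Model Group by using Boto3, call the `list_model_packages` API operation, and pass the name of the Model Group as the value of the `ModelPackageGroupName` parameter. The following code lists the model versions associated with the Model Group that you created in [Create a Model Group (Boto3)](model-registry-model-group.md#model-registry-package-group-api).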

```
sm_client.list_model_packages(ModelPackageGroupName=model_package_group_name)
```
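A single `list_model_packages` call returns one page of results. When a Model Group holds many versions, the response includes a `NextToken` that you pass to the next call. The following is a minimal sketch of that loop. The generator `iter_model_versions` and the `FakeSageMakerClient` class are hypothetical; the fake client serves two canned pages so that the sketch can be dry-run without AWS access.

```python
def iter_model_versions(sm_client, group_name):
    """Yield every model package summary in a Model Group, page by page."""
    kwargs = {"ModelPackageGroupName": group_name}
    while True:
        page = sm_client.list_model_packages(**kwargs)
        yield from page["ModelPackageSummaryList"]
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token


class FakeSageMakerClient:
    """Hypothetical stand-in that serves two pages, for a dry run only."""

    def list_model_packages(self, ModelPackageGroupName, NextToken=None):
        if NextToken is None:
            return {
                "ModelPackageSummaryList": [{"ModelPackageVersion": 2}],
                "NextToken": "page-2",
            }
        return {"ModelPackageSummaryList": [{"ModelPackageVersion": 1}]}


versions = [
    summary["ModelPackageVersion"]
    for summary in iter_model_versions(FakeSageMakerClient(), "ModelGroup1")
]
print(versions)
```

With a real client, the same generator would be called as `iter_model_versions(sm_client, model_package_group_name)`.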

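### View a list of model versions in a group (Studio or Studio Classic)
<a name="model-registry-view-list-studio"></a>

To view a list of the model versions in a Model Group in the Amazon SageMaker Studio console, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models** from the menu.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if not selected already.

1. From the model groups list, choose the angle bracket to the left of the Model Group that you want to view.

1. A list of the model versions in the Model Group appears.

1. (Optional) Choose **View all**, if shown, to view additional model versions.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon (![\[Home icon.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then choose **Model registry**.

1. From the model groups list, select the name of the Model Group that you want to view.

1. A new tab appears with a list of the model versions in the Model Group.

------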

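# Update the Details of a Model Version
<a name="model-registry-details"></a>

You can view and update the details of a specific model version by using either the AWS SDK for Python (Boto3) or the Amazon SageMaker Studio console.

**Important**  
Amazon SageMaker AI integrates Model Cards into Model Registry. A model package registered in the Model Registry includes a simplified Model Card as a component of the model package. For more information, see [Model package model card schema (Studio)](#model-card-schema).

## View and Update the Details of a Model Version (Boto3)
<a name="model-registry-details-api"></a>

To view the details of a model version by using Boto3, complete the following steps.

1. Call the `list_model_packages` API operation to view the model versions in a Model Group.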

   ```
   sm_client.list_model_packages(ModelPackageGroupName="ModelGroup1")
   ```

   The response is a list of model package summaries. You can get the Amazon Resource Name (ARN) of each model version from this list.

   ```
   {'ModelPackageSummaryList': [{'ModelPackageGroupName': 'ModelGroup1',
      'ModelPackageVersion': 1,
      'ModelPackageArn': 'arn:aws:sagemaker:us-east-2:123456789012:model-package/ModelGroup1/1',
      'ModelPackageDescription': 'TestMe',
      'CreationTime': datetime.datetime(2020, 10, 29, 1, 27, 46, 46000, tzinfo=tzlocal()),
      'ModelPackageStatus': 'Completed',
      'ModelApprovalStatus': 'Approved'}],
    'ResponseMetadata': {'RequestId': '12345678-abcd-1234-abcd-aabbccddeeff',
     'HTTPStatusCode': 200,
     'HTTPHeaders': {'x-amzn-requestid': '12345678-abcd-1234-abcd-aabbccddeeff',
      'content-type': 'application/x-amz-json-1.1',
      'content-length': '349',
      'date': 'Mon, 23 Nov 2020 04:56:50 GMT'},
     'RetryAttempts': 0}}
   ```

1. Call `describe_model_package` to see the details of the model version. Pass in the ARN of a model version that you got in the output of the call to `list_model_packages`.

   ```
   sm_client.describe_model_package(ModelPackageName="arn:aws:sagemaker:us-east-2:123456789012:model-package/ModelGroup1/1")
   ```

   The output of this call is a JSON object with the model version details.

   ```
   {'ModelPackageGroupName': 'ModelGroup1',
    'ModelPackageVersion': 1,
    'ModelPackageArn': 'arn:aws:sagemaker:us-east-2:123456789012:model-package/ModelGroup1/1',
    'ModelPackageDescription': 'Test Model',
    'CreationTime': datetime.datetime(2020, 10, 29, 1, 27, 46, 46000, tzinfo=tzlocal()),
    'InferenceSpecification': {'Containers': [{'Image': '257758044811.dkr.ecr.us-east-2.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3',
       'ImageDigest': 'sha256:99fa602cff19aee33297a5926f8497ca7bcd2a391b7d600300204eef803bca66',
       'ModelDataUrl': 's3://sagemaker-us-east-2-123456789012/ModelGroup1/pipelines-0gdonccek7o9-AbaloneTrain-stmiylhtIR/output/model.tar.gz'}],
     'SupportedTransformInstanceTypes': ['ml.m5.xlarge'],
     'SupportedRealtimeInferenceInstanceTypes': ['ml.t2.medium', 'ml.m5.xlarge'],
     'SupportedContentTypes': ['text/csv'],
     'SupportedResponseMIMETypes': ['text/csv']},
    'ModelPackageStatus': 'Completed',
    'ModelPackageStatusDetails': {'ValidationStatuses': [],
     'ImageScanStatuses': []},
    'CertifyForMarketplace': False,
    'ModelApprovalStatus': 'PendingManualApproval',
    'LastModifiedTime': datetime.datetime(2020, 10, 29, 1, 28, 0, 438000, tzinfo=tzlocal()),
    'ResponseMetadata': {'RequestId': '12345678-abcd-1234-abcd-aabbccddeeff',
     'HTTPStatusCode': 200,
     'HTTPHeaders': {'x-amzn-requestid': '12345678-abcd-1234-abcd-aabbccddeeff',
      'content-type': 'application/x-amz-json-1.1',
      'content-length': '1038',
      'date': 'Mon, 23 Nov 2020 04:59:38 GMT'},
     'RetryAttempts': 0}}
   ```
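The `describe_model_package` response is a nested dictionary, so in practice you often pull out only a few fields. The following is a minimal sketch of that step. The helper `summarize_model_version` is hypothetical, and the sample dictionary is an abbreviated copy of the output shown above rather than a live response.

```python
def summarize_model_version(description):
    """Extract the fields most often needed from a describe_model_package response."""
    containers = description.get("InferenceSpecification", {}).get("Containers", [])
    return {
        "arn": description["ModelPackageArn"],
        "status": description["ModelPackageStatus"],
        "approval": description["ModelApprovalStatus"],
        "images": [c["Image"] for c in containers],
        "model_data_urls": [c["ModelDataUrl"] for c in containers if "ModelDataUrl" in c],
    }


# Abbreviated copy of the sample output above.
sample_description = {
    "ModelPackageGroupName": "ModelGroup1",
    "ModelPackageVersion": 1,
    "ModelPackageArn": "arn:aws:sagemaker:us-east-2:123456789012:model-package/ModelGroup1/1",
    "InferenceSpecification": {
        "Containers": [{
            "Image": "257758044811.dkr.ecr.us-east-2.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3",
            "ModelDataUrl": "s3://sagemaker-us-east-2-123456789012/ModelGroup1/pipelines-0gdonccek7o9-AbaloneTrain-stmiylhtIR/output/model.tar.gz",
        }],
    },
    "ModelPackageStatus": "Completed",
    "ModelApprovalStatus": "PendingManualApproval",
}

summary = summarize_model_version(sample_description)
print(summary["approval"])
print(summary["images"])
```

With a real client, the input would be the return value of `sm_client.describe_model_package(ModelPackageName=...)`.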

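### Model package model card schema (Studio)
<a name="model-card-schema"></a>

All details related to the model version are encapsulated in the model package's model card. The model card of a model package is a special usage of the Amazon SageMaker Model Card with a simplified schema. The model package model card schema is shown below.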
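To make the schema more concrete, the following is a minimal sketch that builds model card content using a few of the fields the schema defines (`intended_uses` and `business_details`) and checks two of its constraints: the `risk_rating` enumeration and the 2048-character limit on those text fields. The helper `build_model_card_content` and the sample text are hypothetical. Passing the resulting JSON string through a `ModelCard` parameter with `ModelCardContent` and `ModelCardStatus` keys is an assumption to verify against the API reference.

```python
import json

# Allowed values taken from the risk_rating definition in the schema.
RISK_RATINGS = {"High", "Medium", "Low", "Unknown"}
MAX_TEXT_LENGTH = 2048


def build_model_card_content(purpose, intended_uses, risk_rating, business_problem):
    """Return model card content as a JSON string after basic validation."""
    if risk_rating not in RISK_RATINGS:
        raise ValueError(f"risk_rating must be one of {sorted(RISK_RATINGS)}")
    for text in (purpose, intended_uses, business_problem):
        if len(text) > MAX_TEXT_LENGTH:
            raise ValueError("text fields are limited to 2048 characters")
    content = {
        "intended_uses": {
            "purpose_of_model": purpose,
            "intended_uses": intended_uses,
            "risk_rating": risk_rating,
        },
        "business_details": {
            "business_problem": business_problem,
        },
    }
    return json.dumps(content)


card_content = build_model_card_content(
    purpose="Classify iris flowers into three species.",
    intended_uses="Demonstration of model registration.",
    risk_rating="Low",
    business_problem="Sample problem for documentation purposes.",
)
print(card_content)

# Assumed usage when registering a model version (verify before relying on it):
# create_model_package_input_dict["ModelCard"] = {
#     "ModelCardContent": card_content,
#     "ModelCardStatus": "Draft",
# }
```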

#### Model package model card schema
<a name="collapsible-section-model-package-model-card-schema"></a>

```
{
  "title": "SageMakerModelCardSchema",
  "description": "Schema of a model package’s model card.",
  "version": "0.1.0",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "model_overview": {
      "description": "Overview about the model.",
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "model_creator": {
          "description": "Creator of model.",
          "type": "string",
          "maxLength": 1024
        },
        "model_artifact": {
          "description": "Location of the model artifact.",
          "type": "array",
          "maxContains": 15,
          "items": {
            "type": "string",
            "maxLength": 1024
          }
        }
      }
    },
    "intended_uses": {
      "description": "Intended usage of model.",
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "purpose_of_model": {
          "description": "Reason the model was developed.",
          "type": "string",
          "maxLength": 2048
        },
        "intended_uses": {
          "description": "Intended use cases.",
          "type": "string",
          "maxLength": 2048
        },
        "factors_affecting_model_efficiency": {
          "type": "string",
          "maxLength": 2048
        },
        "risk_rating": {
          "description": "Risk rating for model card.",
          "$ref": "#/definitions/risk_rating"
        },
        "explanations_for_risk_rating": {
          "type": "string",
          "maxLength": 2048
        }
      }
    },
    "business_details": {
      "description": "Business details of model.",
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "business_problem": {
          "description": "Business problem solved by the model.",
          "type": "string",
          "maxLength": 2048
        },
        "business_stakeholders": {
          "description": "Business stakeholders.",
          "type": "string",
          "maxLength": 2048
        },
        "line_of_business": {
          "type": "string",
          "maxLength": 2048
        }
      }
    },
    "training_details": {
      "description": "Overview about the training.",
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "objective_function": {
          "description": "The objective function for which the model is optimized.",
          "function": {
            "$ref": "#/definitions/objective_function"
          },
          "notes": {
            "type": "string",
            "maxLength": 1024
          }
        },
        "training_observations": {
          "type": "string",
          "maxLength": 1024
        },
        "training_job_details": {
          "type": "object",
          "additionalProperties": false,
          "properties": {
            "training_arn": {
              "description": "SageMaker Training job ARN.",
              "type": "string",
              "maxLength": 1024
            },
            "training_datasets": {
              "description": "Location of the model datasets.",
              "type": "array",
              "maxContains": 15,
              "items": {
                "type": "string",
                "maxLength": 1024
              }
            },
            "training_environment": {
              "type": "object",
              "additionalProperties": false,
              "properties": {
                "container_image": {
                  "description": "SageMaker training image URI.",
                  "type": "array",
                  "maxContains": 15,
                  "items": {
                    "type": "string",
                    "maxLength": 1024
                  }
                }
              }
            },
            "training_metrics": {
              "type": "array",
              "items": {
                "maxItems": 50,
                "$ref": "#/definitions/training_metric"
              }
            },
            "user_provided_training_metrics": {
              "type": "array",
              "items": {
                "maxItems": 50,
                "$ref": "#/definitions/training_metric"
              }
            },
            "hyper_parameters": {
              "type": "array",
              "items": {
                "maxItems": 100,
                "$ref": "#/definitions/training_hyper_parameter"
              }
            },
            "user_provided_hyper_parameters": {
              "type": "array",
              "items": {
                "maxItems": 100,
                "$ref": "#/definitions/training_hyper_parameter"
              }
            }
          }
        }
      }
    },
    "evaluation_details": {
      "type": "array",
      "default": [],
      "items": {
        "type": "object",
        "required": [
          "name"
        ],
        "additionalProperties": false,
        "properties": {
          "name": {
            "type": "string",
            "pattern": ".{1,63}"
          },
          "evaluation_observation": {
            "type": "string",
            "maxLength": 2096
          },
          "evaluation_job_arn": {
            "type": "string",
            "maxLength": 256
          },
          "datasets": {
            "type": "array",
            "items": {
              "type": "string",
              "maxLength": 1024
            },
            "maxItems": 10
          },
          "metadata": {
            "description": "Additional attributes associated with the evaluation results.",
            "type": "object",
            "additionalProperties": {
              "type": "string",
              "maxLength": 1024
            }
          },
          "metric_groups": {
            "type": "array",
            "default": [],
            "items": {
              "type": "object",
              "required": [
                "name",
                "metric_data"
              ],
              "properties": {
                "name": {
                  "type": "string",
                  "pattern": ".{1,63}"
                },
                "metric_data": {
                  "type": "array",
                  "items": {
                    "anyOf": [
                      {
                        "$ref": "#/definitions/simple_metric"
                      },
                      {
                        "$ref": "#/definitions/linear_graph_metric"
                      },
                      {
                        "$ref": "#/definitions/bar_chart_metric"
                      },
                      {
                        "$ref": "#/definitions/matrix_metric"
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      }
    },
    "additional_information": {
      "additionalProperties": false,
      "type": "object",
      "properties": {
        "ethical_considerations": {
          "description": "Ethical considerations for model users.",
          "type": "string",
          "maxLength": 2048
        },
        "caveats_and_recommendations": {
          "description": "Caveats and recommendations for model users.",
          "type": "string",
          "maxLength": 2048
        },
        "custom_details": {
          "type": "object",
          "additionalProperties": {
            "$ref": "#/definitions/custom_property"
          }
        }
      }
    }
  },
  "definitions": {
    "source_algorithms": {
      "type": "array",
      "minContains": 1,
      "maxContains": 1,
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": [
          "algorithm_name"
        ],
        "properties": {
          "algorithm_name": {
            "description": "The name of the algorithm used to create the model package. The algorithm must be either an algorithm resource in your SageMaker AI account or an algorithm in AWS Marketplace that you are subscribed to.",
            "type": "string",
            "maxLength": 170
          },
          "model_data_url": {
            "description": "Amazon S3 path where the model artifacts, which result from model training, are stored.",
            "type": "string",
            "maxLength": 1024
          }
        }
      }
    },
    "inference_specification": {
      "type": "object",
      "additionalProperties": false,
      "required": [
        "containers"
      ],
      "properties": {
        "containers": {
          "description": "Contains inference related information used to create model package.",
          "type": "array",
          "minContains": 1,
          "maxContains": 15,
          "items": {
            "type": "object",
            "additionalProperties": false,
            "required": [
              "image"
            ],
            "properties": {
              "model_data_url": {
                "description": "Amazon S3 path where the model artifacts, which result from model training, are stored.",
                "type": "string",
                "maxLength": 1024
              },
              "image": {
                "description": "Inference environment path. The Amazon Elastic Container Registry (Amazon ECR) path where inference code is stored.",
                "type": "string",
                "maxLength": 255
              },
              "nearest_model_name": {
                "description": "The name of a pre-trained machine learning benchmarked by an Amazon SageMaker Inference Recommender model that matches your model.",
                "type": "string"
              }
            }
          }
        }
      }
    },
    "risk_rating": {
      "description": "Risk rating of model.",
      "type": "string",
      "enum": [
        "High",
        "Medium",
        "Low",
        "Unknown"
      ]
    },
    "custom_property": {
      "description": "Additional property.",
      "type": "string",
      "maxLength": 1024
    },
    "objective_function": {
      "description": "Objective function for which the training job is optimized.",
      "additionalProperties": false,
      "properties": {
        "function": {
          "type": "string",
          "enum": [
            "Maximize",
            "Minimize"
          ]
        },
        "facet": {
          "type": "string",
          "maxLength": 63
        },
        "condition": {
          "type": "string",
          "maxLength": 63
        }
      }
    },
    "training_metric": {
      "description": "Training metric data.",
      "type": "object",
      "required": [
        "name",
        "value"
      ],
      "additionalProperties": false,
      "properties": {
        "name": {
          "type": "string",
          "pattern": ".{1,255}"
        },
        "notes": {
          "type": "string",
          "maxLength": 1024
        },
        "value": {
          "type": "number"
        }
      }
    },
    "training_hyper_parameter": {
      "description": "Training hyperparameter.",
      "type": "object",
      "required": [
        "name",
        "value"
      ],
      "additionalProperties": false,
      "properties": {
        "name": {
          "type": "string",
          "pattern": ".{1,255}"
        },
        "value": {
          "type": "string",
          "pattern": ".{1,255}"
        }
      }
    },
    "linear_graph_metric": {
      "type": "object",
      "required": [
        "name",
        "type",
        "value"
      ],
      "additionalProperties": false,
      "properties": {
        "name": {
          "type": "string",
          "pattern": ".{1,255}"
        },
        "notes": {
          "type": "string",
          "maxLength": 1024
        },
        "type": {
          "type": "string",
          "enum": [
            "linear_graph"
          ]
        },
        "value": {
          "anyOf": [
            {
              "type": "array",
              "items": {
                "type": "array",
                "items": {
                  "type": "number"
                },
                "minItems": 2,
                "maxItems": 2
              },
              "minItems": 1
            }
          ]
        },
        "x_axis_name": {
          "$ref": "#/definitions/axis_name_string"
        },
        "y_axis_name": {
          "$ref": "#/definitions/axis_name_string"
        }
      }
    },
    "bar_chart_metric": {
      "type": "object",
      "required": [
        "name",
        "type",
        "value"
      ],
      "additionalProperties": false,
      "properties": {
        "name": {
          "type": "string",
          "pattern": ".{1,255}"
        },
        "notes": {
          "type": "string",
          "maxLength": 1024
        },
        "type": {
          "type": "string",
          "enum": [
            "bar_chart"
          ]
        },
        "value": {
          "anyOf": [
            {
              "type": "array",
              "items": {
                "type": "number"
              },
              "minItems": 1
            }
          ]
        },
        "x_axis_name": {
          "$ref": "#/definitions/axis_name_array"
        },
        "y_axis_name": {
          "$ref": "#/definitions/axis_name_string"
        }
      }
    },
    "matrix_metric": {
      "type": "object",
      "required": [
        "name",
        "type",
        "value"
      ],
      "additionalProperties": false,
      "properties": {
        "name": {
          "type": "string",
          "pattern": ".{1,255}"
        },
        "notes": {
          "type": "string",
          "maxLength": 1024
        },
        "type": {
          "type": "string",
          "enum": [
            "matrix"
          ]
        },
        "value": {
          "anyOf": [
            {
              "type": "array",
              "items": {
                "type": "array",
                "items": {
                  "type": "number"
                },
                "minItems": 1,
                "maxItems": 20
              },
              "minItems": 1,
              "maxItems": 20
            }
          ]
        },
        "x_axis_name": {
          "$ref": "#/definitions/axis_name_array"
        },
        "y_axis_name": {
          "$ref": "#/definitions/axis_name_array"
        }
      }
    },
    "simple_metric": {
      "description": "Metric data.",
      "type": "object",
      "required": [
        "name",
        "type",
        "value"
      ],
      "additionalProperties": false,
      "properties": {
        "name": {
          "type": "string",
          "pattern": ".{1,255}"
        },
        "notes": {
          "type": "string",
          "maxLength": 1024
        },
        "type": {
          "type": "string",
          "enum": [
            "number",
            "string",
            "boolean"
          ]
        },
        "value": {
          "anyOf": [
            {
              "type": "number"
            },
            {
              "type": "string",
              "maxLength": 63
            },
            {
              "type": "boolean"
            }
          ]
        },
        "x_axis_name": {
          "$ref": "#/definitions/axis_name_string"
        },
        "y_axis_name": {
          "$ref": "#/definitions/axis_name_string"
        }
      }
    },
    "axis_name_array": {
      "type": "array",
      "items": {
        "type": "string",
        "maxLength": 63
      }
    },
    "axis_name_string": {
      "type": "string",
      "maxLength": 63
    }
  }
}
```
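
As an illustration of the metric definitions at the end of the schema, the following sketch checks a `linear_graph_metric` object by hand against a subset of the constraints listed above (required `name`, `type`, and `value`; no additional properties; `notes` of at most 1024 characters; `value` as a non-empty list of two-number pairs). The helper name `check_linear_graph_metric` and the sample values are hypothetical. In practice you would more likely run a JSON Schema validator against the full schema.

```python
from numbers import Number

# Properties allowed by the linear_graph_metric definition in the schema above.
ALLOWED_KEYS = {"name", "notes", "type", "value", "x_axis_name", "y_axis_name"}


def check_linear_graph_metric(metric: dict) -> list:
    """Return a list of constraint violations (empty when the metric conforms)."""
    errors = []
    for key in ("name", "type", "value"):
        if key not in metric:
            errors.append(f"missing required property: {key}")
    extra = set(metric) - ALLOWED_KEYS
    if extra:
        errors.append(f"unexpected properties: {sorted(extra)}")
    if "type" in metric and metric["type"] != "linear_graph":
        errors.append("type must be 'linear_graph'")
    if len(metric.get("notes", "")) > 1024:
        errors.append("notes longer than 1024 characters")
    points = metric.get("value")
    if points is not None:
        if not isinstance(points, list) or len(points) < 1:
            errors.append("value must be a non-empty list of [x, y] pairs")
        else:
            for point in points:
                is_pair = (
                    isinstance(point, list)
                    and len(point) == 2
                    and all(
                        isinstance(v, Number) and not isinstance(v, bool)
                        for v in point
                    )
                )
                if not is_pair:
                    errors.append(f"not an [x, y] pair of numbers: {point!r}")
    return errors
```

For example, `check_linear_graph_metric({"name": "loss", "type": "linear_graph", "value": [[0, 0.9], [1, 0.5]]})` returns an empty list, while a point with three coordinates is reported as a violation.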

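## View and Update the Details of a Model Version (Studio or Studio Classic)
<a name="model-registry-details-studio"></a>

To view and update the details of a model version, complete the following steps based on whether you use Studio or Studio Classic. In Studio Classic, you can update the approval status of a model version. For details, see [Update the Approval Status of a Model](model-registry-approve.md). In Studio, on the other hand, SageMaker AI creates a model card for the model package, and the model version UI provides options to update the details in the model card.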

------
#### [ Studio ]

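1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models** from the menu.

1. Choose the **Registered models** tab, if it is not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if it is not selected already.

1. Select the name of the model group that contains the model version you want to view.

1. In the list of model versions, select the model version you want to view.

1. Choose one of the following tabs.
   + **Train**: View or edit details related to your training job, including performance metrics, artifacts, IAM role and encryption, and containers. For more information, see [Add a training job (Studio)](model-registry-details-studio-training.md).
   + **Evaluate**: View or edit details related to your evaluation job, such as performance metrics, evaluation datasets, and security. For more information, see [Add an evaluation job (Studio)](model-registry-details-studio-evaluate.md).
   + **Audit**: View or edit high-level details related to the model's business purpose, usage, risk, and technical details such as algorithm and performance limitations. For more information, see [Update audit (governance) information (Studio)](model-registry-details-studio-audit.md).
   + **Deploy**: View or edit the location of your inference image container and the instances that make up the endpoint. For more information, see [Update deployment information (Studio)](model-registry-details-studio-deploy.md).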

------
#### [ Studio Classic ]

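1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. From the model groups list, select the name of the model group you want to view. A new tab appears with a list of the model versions in the model group.

1. In the list of model versions, select the name of the model version for which you want to view details.

1. On the model version tab that opens, choose one of the following to see details about the model version:
   + **Activity**: Shows events for the model version, such as approval status updates.
   + **Model quality**: Reports metrics related to your Model Monitor model quality checks, which compare model predictions to Ground Truth. For more information about Model Monitor model quality checks, see [Model quality](model-monitor-model-quality.md).
   + **Explainability**: Reports metrics related to your Model Monitor feature attribution checks, which compare the relative rankings of your features in training data versus live data. For more information about Model Monitor explainability checks, see [Feature attribution drift for models in production](clarify-model-monitor-feature-attribution-drift.md).
   + **Bias**: Reports metrics related to your Model Monitor bias drift checks, which compare the distribution of live data to training data. For more information about Model Monitor bias drift checks, see [Bias drift for models in production](clarify-model-monitor-bias-drift.md).
   + **Inference recommender**: Provides initial instance recommendations for optimal performance based on your model and sample payloads.
   + **Load test**: Runs load tests across your choice of instance types when you provide specific production requirements, such as latency and throughput constraints.
   + **Inference specification**: Displays the instance types for your real-time inference and transform jobs, and information about your Amazon ECR containers.
   + **Information**: Shows information such as the project the model version is associated with, the pipeline that generated the model, the model group, and the model's location in Amazon S3.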

------
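
Some of the same details can also be read programmatically. The following is a minimal sketch, assuming the AWS SDK for Python (Boto3) is installed and credentials are configured. The helper names `summarize_model_package` and `describe_version` are hypothetical; the fields they read (`ModelPackageArn`, `ModelPackageVersion`, `ModelApprovalStatus`) come from the `DescribeModelPackage` response.

```python
def summarize_model_package(description: dict) -> dict:
    """Pick a few fields from a DescribeModelPackage response."""
    return {
        "arn": description.get("ModelPackageArn"),
        "version": description.get("ModelPackageVersion"),
        "approval_status": description.get("ModelApprovalStatus"),
    }


def describe_version(model_package_arn: str) -> dict:
    """Call DescribeModelPackage for one model version and summarize it."""
    import boto3  # imported here so the pure helper above works without the SDK

    sm_client = boto3.client("sagemaker")
    # ModelPackageName accepts either the name or the ARN of the model package.
    response = sm_client.describe_model_package(ModelPackageName=model_package_arn)
    return summarize_model_package(response)
```

Calling `describe_version` with the ARN of a model version returns a small dictionary with the ARN, version number, and approval status.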

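# Add a training job (Studio)
<a name="model-registry-details-studio-training"></a>

**Important**  
As of November 30, 2023, the previous Amazon SageMaker Studio experience is now named Amazon SageMaker Studio Classic. The following section is specific to using the updated Studio experience. For information about using the Studio Classic application, see [Amazon SageMaker Studio Classic](studio.md).

You can add a training job to your model that was created externally or with SageMaker AI. If you add a SageMaker training job, SageMaker AI prepopulates the fields for all of the subpages in the **Train** tab. If you add an externally created training job, you need to add the details related to your training job manually.

**To add a training job to your model package, complete the following steps.**

1. Choose the **Train** tab.

1. Choose **Add**. If you don't see this option, you might already have a training job attached. To remove that training job, complete the instructions for removing a training job.

1. You can add a training job that you created in SageMaker AI or a training job that you created externally.

   1. To add a training job that you created in SageMaker AI, complete the following steps.

      1. Choose **SageMaker AI**.

      1. Select the radio box next to the training job that you want to add.

      1. Choose **Add**.

   1. To add a training job that you created externally, complete the following steps.

      1. Choose **Custom**.

      1. In the **Name** field, enter the name of your custom training job.

      1. Choose **Add**.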

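# Remove a training job (Studio)
<a name="model-registry-details-studio-training-remove"></a>

You can remove a training job from your model, whether it was created externally or with SageMaker AI, by completing the following steps.

**To remove a training job from your model package, complete the following steps.**

1. Choose the **Train** tab.

1. Choose the **Gear** (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/icons/Settings_squid.png)) icon under the **Train** tab.

1. Choose **Remove** next to your training job.

1. Choose **Yes, I want to remove <name of your training job>**.

1. Choose **Done**.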

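# Update training job details (Studio)
<a name="model-registry-details-studio-training-update"></a>

Complete the following steps to update the details of a training job associated with your model, whether the job was created externally or with SageMaker AI.

**To update (and view) details related to the training job:**

1. On the **Train** tab, view the status of the training job. The status is `Complete` if you added a training job to your model package and `Undefined` if not.

1. To view details related to your training job, such as performance, hyperparameters, and identifying details, choose the **Train** tab.

1. To update and view details related to model performance, complete the following steps.

   1. Choose **Performance** in the left sidebar of the **Train** tab.

   1. View the **Metrics** related to your training job. The **Performance** page lists each metric by name, value, and any notes you added about the metric.

   1. (Optional) To add notes to existing metrics, complete the following steps.

      1. Choose the vertical ellipsis at the top right corner of the model version page, and choose **Edit**.

      1. Add notes to any of the listed metrics.

      1. At the top of the model version page, choose **Save** in the **Editing Model Version...** banner.

   1. View the **Custom Metrics** related to your training job. Custom metrics are formatted similarly to metrics.

   1. (Optional) To add custom metrics, complete the following steps.

      1. Choose **Add**.

      1. Enter a name, a value, and any optional notes for your new metric.

   1. (Optional) To remove a custom metric, choose the **Trash** icon next to the metric you want to remove.

   1. In the **Observations** text box, view any notes you added about the performance of your training job.

   1. (Optional) To add or update observations, complete the following steps.

      1. Choose the vertical ellipsis at the top right corner of the model version page, and choose **Edit**.

      1. Add or update your notes in the **Observations** text box.

      1. At the top of the model version page, choose **Save** in the **Editing Model Version...** banner.

1. To update and view details related to model artifacts, complete the following steps.

   1. Choose **Artifacts** in the left sidebar of the **Train** tab.

   1. In the **Location (S3 URI)** field, view the Amazon S3 location of your training datasets.

   1. In the **Models** field, view the name and Amazon S3 locations of model artifacts from other models that you included in the training job.

   1. To update any of the fields in the **Artifacts** page, complete the following steps.

      1. Choose the vertical ellipsis at the top right of the model version page, and choose **Edit**.

      1. Enter new values in any of the fields.

      1. At the top of the model version page, choose **Save** in the **Editing Model Version...** banner.

1. To update and view details related to hyperparameters, complete the following steps.

   1. Choose **Hyperparameters** in the left sidebar of the **Train** tab.

   1. View the hyperparameters provided by SageMaker AI and the custom hyperparameters that are defined. Each hyperparameter is listed with its name and value.

   1. View the custom hyperparameters you added.

   1. (Optional) To add another custom hyperparameter, complete the following steps.

      1. Above the top right corner of the **Custom Hyperparameters** table, choose **Add**. A pair of new blank fields appears.

      1. Enter the name and value of the new custom hyperparameter. These values are saved automatically.

   1. (Optional) To remove a custom hyperparameter, choose the **Trash** icon to the right of the hyperparameter.

1. To update and view details related to the training job environment, complete the following steps.

   1. Choose **Environment** in the left sidebar of the **Train** tab.

   1. View the Amazon ECR URI locations of any training job containers added by SageMaker AI (for a SageMaker training job) or by you (for a custom training job).

   1. (Optional) To add another training job container, choose **Add**, and then enter the URI of the new training container.

1. To update and view the training job name and the Amazon Resource Name (ARN) of the training job, complete the following steps.

   1. Choose **Details** in the left sidebar of the **Train** tab.

   1. View the training job name and the ARN of the training job.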

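# Add an evaluation job (Studio)
<a name="model-registry-details-studio-evaluate"></a>

**Important**  
As of November 30, 2023, the previous Amazon SageMaker Studio experience is now named Amazon SageMaker Studio Classic. The following section is specific to using the updated Studio experience. For information about using the Studio Classic application, see [Amazon SageMaker Studio Classic](studio.md).

After you register your model, you can test it with one or more datasets to assess its performance. You can add one or more evaluation jobs from Amazon S3, or define your own evaluation job by entering all of the details manually. If you add a job from Amazon S3, SageMaker AI prepopulates the fields for all of the subpages in the **Evaluate** tab. If you define your own evaluation job, you need to add the details related to your evaluation job manually.

**To add your first evaluation job to your model package, complete the following steps.**

1. Choose the **Evaluate** tab.

1. Choose **Add**.

1. You can add an evaluation job from Amazon S3 or a custom evaluation job.

   1. To add an evaluation job with collateral from Amazon S3, complete the following steps.

      1. Choose **S3**.

      1. Enter a name for the evaluation job.

      1. Enter the Amazon S3 location of the output collateral of your evaluation job.

      1. Choose **Add**.

   1. To add a custom evaluation job, complete the following steps:

      1. Choose **Custom**.

      1. Enter a name for the evaluation job.

      1. Choose **Add**.

**To add an additional evaluation job to your model package, complete the following steps.**

1. Choose the **Evaluate** tab.

1. Choose the **Gear** (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/icons/Settings_squid.png)) icon under the **Evaluate** tab.

1. In the dialog box, choose **Add**.

1. You can add an evaluation job from Amazon S3 or a custom evaluation job.

   1. To add an evaluation job with collateral from Amazon S3, complete the following steps.

      1. Choose **S3**.

      1. Enter a name for the evaluation job.

      1. Enter the Amazon S3 location of the output collateral of your evaluation job.

      1. Choose **Add**.

   1. To add a custom evaluation job, complete the following steps:

      1. Choose **Custom**.

      1. Enter a name for the evaluation job.

      1. Choose **Add**.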

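# Remove an evaluation job (Studio)
<a name="model-registry-details-studio-evaluate-remove"></a>

You can remove an evaluation job from your model, whether it was created externally or with SageMaker AI, by completing the following steps.

**To remove an evaluation job from your model package, complete the following steps.**

1. Choose the **Evaluate** tab.

1. Choose the **Gear** (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/icons/Settings_squid.png)) icon under the **Evaluate** tab.

1. (Optional) To find your evaluation job in the list, enter a search term in the search box to narrow the list of choices.

1. Choose the radio button next to your evaluation job.

1. Choose **Remove**.

1. Choose **Yes, I want to remove <name of your evaluation job>**.

1. Choose **Done**.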

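# Update an evaluation job (Studio)
<a name="model-registry-details-studio-evaluate-update"></a>

Complete the following steps to update the details of an evaluation job associated with your model, whether the job was created externally or with SageMaker AI.

**To update (and view) details related to the evaluation job:**

1. On the **Evaluate** tab, view the status of the evaluation job. The status is `Complete` if you added an evaluation job to your model package and `Undefined` if not.

1. To view details related to your evaluation job, such as performance and artifact locations, choose the **Evaluate** tab.

1. To update and view details related to model performance during evaluation, complete the following steps.

   1. Choose **Performance** in the **Evaluate** tab sidebar.

   1. View the **Metrics** related to your evaluation job in the metrics list. The **Metrics** list displays each metric by name, value, and any notes you added about the metric.

   1. In the **Observations** text box, view any notes you added about the performance of your evaluation job.

   1. To update the **Notes** field for any metric or the **Observations** field, complete the following steps.

      1. Choose the vertical ellipsis at the top right of the model version page, and choose **Edit**.

      1. Enter notes for any metric or in the **Observations** text box.

      1. At the top of the model version page, choose **Save** in the **Editing Model Version...** banner.

1. To update and view details related to your evaluation job datasets, complete the following steps.

   1. Choose **Artifacts** in the left sidebar of the **Evaluate** page.

   1. View the datasets used in your evaluation job.

   1. (Optional) To add a dataset, choose **Add** and enter the Amazon S3 URI of the dataset.

   1. (Optional) To remove a dataset, choose the **Trash** icon next to the dataset you want to remove.

1. To view the job name and the evaluation job ARN, choose **Details**.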

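# Update audit (governance) information (Studio)
<a name="model-registry-details-studio-audit"></a>

**Important**  
As of November 30, 2023, the previous Amazon SageMaker Studio experience is now named Amazon SageMaker Studio Classic. The following section is specific to using the updated Studio experience. For information about using the Studio Classic application, see [Amazon SageMaker Studio Classic](studio.md).

Document important model details to help your organization establish a robust model governance framework. You and your team members can refer to these details so that the model is used for the appropriate use cases, its business domain and owners are known, and its risks are understood. You can also save details about how the model is expected to perform and the reasons for its performance limitations.

**To view or update details related to model governance, complete the following steps.**

1. On the **Audit** tab, view the approval status of the model card. The status can be one of the following:
   + **Draft**: The model card is still a draft.
   + **Pending approval**: The model card is waiting to be approved.
   + **Approved**: The model card is approved.

1. To update the approval status of the model card, choose the dropdown menu next to the approval status and choose the updated approval status.

1. To update and view details related to your model package risk, complete the following steps.

   1. Choose **Risk** in the left sidebar of the **Audit** tab.

   1. View the current risk rating and the explanation for the risk rating.

   1. To update the rating or explanation, complete the following steps.

      1. Choose the vertical ellipsis at the top right corner of the **Audit** page, and choose **Edit**.

      1. (Optional) Choose an updated risk rating.

      1. (Optional) Update the risk rating explanation.

      1. At the top of the model version page, choose **Save** in the **Editing Model Version...** banner.

1. To update and view details related to the usage of your model package, complete the following steps.

   1. Choose **Usage** in the left sidebar of the **Audit** tab.

   1. View the text you added in the following fields:
      + **Problem type**: The category of machine learning algorithm used to build your model.
      + **Algorithm type**: The specific algorithm used to create your model.
      + **Intended uses**: The current application of the model in your business problem.
      + **Factors affecting model efficacy**: Notes about your model's performance limitations.
      + **Recommended use**: The types of applications you can create with the model, the scenarios in which you can expect reasonable performance, or the type of data to use with the model.
      + **Ethical considerations**: A description of how your model might discriminate based on factors such as age or gender.

   1. To update any of the previously listed fields, complete the following steps.

      1. Choose the vertical ellipsis at the top right corner of the model version page, and choose **Edit**.

      1. (Optional) Use the dropdown menus for **Problem type** and **Algorithm type** to select new values, if needed.

      1. (Optional) Update the text descriptions in the remaining fields.

      1. At the top of the model version page, choose **Save** in the **Editing Model Version...** banner.

1. To update and view details related to the stakeholders of your model package, complete the following steps.

   1. Choose **Stakeholders** in the left sidebar of the **Audit** tab.

   1. View the current model owner and creator, if any.

   1. To update the model owner or creator, complete the following steps:

      1. Choose the vertical ellipsis at the top right corner of the model version page, and choose **Edit**.

      1. Update the model owner or model creator fields.

      1. At the top of the model version page, choose **Save** in the **Editing Model Version...** banner.

1. To update and view details related to the business problem that your model package addresses, complete the following steps.

   1. Choose **Business** in the left sidebar of the **Audit** tab.

   1. View the current descriptions, if any, of the business problem that the model addresses, the business problem stakeholders, and the line of business.

   1. To update any of the fields in the **Business** tab, complete the following steps.

      1. Choose the vertical ellipsis at the top right corner of the model version page, and choose **Edit**.

      1. Update the descriptions in any of the fields.

      1. At the top of the model version page, choose **Save** in the **Editing Model Version...** banner.

1. To update and view existing documentation (represented as key-value pairs) for your model, complete the following steps.

   1. Choose **Documentation** in the left sidebar of the **Audit** page.

   1. View the existing key-value pairs.

   1. To add key-value pairs, complete the following steps.

      1. Choose the vertical ellipsis at the top right corner of the model version page, and choose **Edit**.

      1. Choose **Add**.

      1. Enter a key name and its associated value.

      1. At the top of the model version page, choose **Save** in the **Editing Model Version...** banner.

   1. To remove key-value pairs, complete the following steps.

      1. Choose the vertical ellipsis at the top right corner of the model version page, and choose **Edit**.

      1. Choose the **Trash** icon next to the key-value pair you want to remove.

      1. At the top of the model version page, choose **Save** in the **Editing Model Version...** banner.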

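# Update deployment information (Studio)
<a name="model-registry-details-studio-deploy"></a>

**Important**  
As of November 30, 2023, the previous Amazon SageMaker Studio experience is now named Amazon SageMaker Studio Classic. The following section is specific to using the updated Studio experience. For information about using the Studio Classic application, see [Amazon SageMaker Studio Classic](studio.md).

After you evaluate your model's performance and determine that it is ready for production workloads, you can change the approval status of the model to initiate CI/CD deployment. For more information about approval status definitions, see [Update the Approval Status of a Model](model-registry-approve.md).

**To view or update details related to the model package deployment, complete the following steps.**

1. On the **Deploy** tab, view the model package approval status. Possible values are the following:
   + **Pending Approval**: The model is registered but not yet approved or rejected for deployment.
   + **Approved**: The model is approved for CI/CD deployment. If an EventBridge rule is in place that initiates model deployment on a model approval event, as is the case for a model built from a SageMaker AI project template, SageMaker AI also deploys the model.
   + **Rejected**: The model is rejected for deployment.

1. To update the model package approval status, choose the dropdown menu next to the approval status and choose the updated approval status.

1. In the **Containers** list, view the inference image containers.

1. In the **Instances** list, view the instances that make up your deployment endpoint.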

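# Compare Model Versions
<a name="model-registry-version-compare"></a>

As you generate model versions, you might want to compare them by viewing related model quality metrics side by side. For example, you might want to track accuracy by comparing mean squared error (MSE) values, or you might decide to remove models that perform poorly on selected measures. The following procedure shows how to set up a model version comparison in the Model Registry using Amazon SageMaker Studio Classic.

## Compare Model Versions (Amazon SageMaker Studio Classic)
<a name="model-registry-version-compare-studio"></a>

**Note**  
You can compare model versions only by using the Amazon SageMaker Studio Classic console.

To compare model versions within a model group, complete the following steps:

1. Sign in to Studio Classic. For more information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).

1. In the left navigation pane, choose the **Home** icon (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. From the model groups list, select the name of the model group you want to view. A new tab opens with a list of the model versions in the model group.

1. In the list of model versions, select the boxes next to the model versions you want to compare.

1. Choose the **Actions** dropdown menu, and then choose **Compare**. A list of model quality metrics appears for the models you selected.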
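
The MSE comparison mentioned above can also be sketched in code. The following example is an illustration under stated assumptions: it takes MSE values that you have already collected for each model version (the ARNs and numbers are made up) and orders the versions from best to worst. It does not call any SageMaker AI API.

```python
def rank_versions_by_mse(mse_by_version: dict) -> list:
    """Sort (version, MSE) pairs by mean squared error, lowest (best) first."""
    return sorted(mse_by_version.items(), key=lambda item: item[1])


# Hypothetical MSE values keyed by model package version ARN.
example_mse = {
    "arn:aws:sagemaker:us-east-1:111122223333:model-package/my-group/1": 4.8,
    "arn:aws:sagemaker:us-east-1:111122223333:model-package/my-group/2": 3.1,
    "arn:aws:sagemaker:us-east-1:111122223333:model-package/my-group/3": 3.9,
}

ranking = rank_versions_by_mse(example_mse)
best_version, best_mse = ranking[0]
```

Versions at the end of `ranking` are the candidates for removal if they perform poorly on the measure you selected.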

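# View and Manage Model Group and Model Version Tags
<a name="model-registry-tags"></a>

The Model Registry helps you view and manage tags related to your model groups. You can use tags to categorize model groups by purpose, owner, environment, or other criteria. The following instructions show how to view, add, delete, and edit your tags in the Amazon SageMaker Studio console.

**Note**  
Model packages in the SageMaker Model Registry, which are versioned, do not support tagging. Instead, you can add key-value pairs using `CustomerMetadataProperties`. Model package groups in the Model Registry do support tagging.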
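
As a sketch of the `CustomerMetadataProperties` alternative mentioned in the note, the following example builds the arguments for an `UpdateModelPackage` call that attaches key-value pairs to a model version. The helper names and the sample keys are hypothetical, and the call itself assumes Boto3 is installed and credentials are configured.

```python
def build_metadata_update(model_package_arn: str, metadata: dict) -> dict:
    """Build UpdateModelPackage arguments that attach key-value metadata."""
    return {
        "ModelPackageArn": model_package_arn,
        # Keys and values must be strings.
        "CustomerMetadataProperties": {str(k): str(v) for k, v in metadata.items()},
    }


def apply_metadata(model_package_arn: str, metadata: dict) -> dict:
    """Send the update to SageMaker AI (requires AWS credentials)."""
    import boto3  # imported here so the builder above works without the SDK

    sm_client = boto3.client("sagemaker")
    return sm_client.update_model_package(
        **build_metadata_update(model_package_arn, metadata)
    )
```

For example, `apply_metadata(arn, {"owner": "team-a", "environment": "test"})` would record the two pairs on the model version identified by `arn`.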

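## View and manage model group tags
<a name="model-registry-tags-model-group"></a>

------
#### [ Studio ]

**To view a model group tag, complete the following steps:**

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models** to display a list of your model groups.

1. Choose the **Registered models** tab, if it is not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if it is not selected already.

1. From the model groups list, select the name of the model group you want to view.

1. On the model group page, choose the **Tags** tab. View the tags associated with your model group.

**To add a model group tag, complete the following steps:**

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models** to display a list of your model groups.

1. Choose the **Registered models** tab, if it is not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if it is not selected already.

1. From the model groups list, select the name of the model group you want to edit.

1. On the model group page, choose the **Tags** tab.

1. Choose **Add/Edit tags**.

1. Above **+ Add new tag**, enter your new key in the blank **Key** field.

1. (Optional) Enter your new value in the blank **Value** field.

1. Choose **Confirm changes**.

1. Confirm that your new tag appears in the **Tags** section of the **Information** page.

**To delete a model group tag, complete the following steps:**

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models** to display a list of your model groups.

1. Choose the **Registered models** tab, if it is not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if it is not selected already.

1. From the model groups list, select the name of the model group you want to edit.

1. On the model group page, choose the **Tags** tab.

1. Choose **Add/Edit tags**.

1. Choose the **Trash** icon next to the key-value pair you want to remove.

1. Choose **Confirm changes**.

**To edit a model group tag, complete the following steps:**

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models** to display a list of your model groups.

1. Choose the **Registered models** tab, if it is not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if it is not selected already.

1. From the model groups list, select the name of the model group you want to edit.

1. On the model group page, choose the **Tags** tab.

1. Choose **Add/Edit tags**.

1. Enter a new value in the **Value** field of the key-value pair you want to edit.

1. Choose **Confirm changes**.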

------
#### [ Studio Classic ]

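**To view a model group tag, complete the following steps:**

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. From the model groups list, select the name of the model group you want to view.

1. Choose **Information**.

1. View your tags in the **Tags** section of the **Information** page.

**To add a model group tag, complete the following steps:**

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).

1. In the left navigation pane, choose the **Home** icon (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. From the model groups list, select the name of the model group you want to edit.

1. Choose **Information**.

1. If you don't have any tags, choose **Add tags**.

1. If you have pre-existing tags, choose **Manage tags** in the **Tags** section. A list of the model group's tags appears as key-value pairs.

1. Above **Add new tag**, enter your new key in the blank **Key** field.

1. (Optional) Enter your new value in the blank **Value** field.

1. Choose **Confirm changes**.

1. Confirm that your new tag appears in the **Tags** section of the **Information** page.

**To delete a model group tag, complete the following steps:**

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).

1. In the left navigation pane, choose the **Home** icon (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. From the model groups list, select the name of the model group you want to edit.

1. Choose **Information**.

1. In the **Tags** section, choose **Manage tags**. A list of the model group's tags appears as key-value pairs.

1. Choose the **Trash** icon to the right of the tag you want to remove.

1. Choose **Confirm changes**.

1. Confirm that the tag you removed no longer appears in the **Tags** section of the **Information** page.

**To edit a model group tag, complete the following steps:**

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).

1. In the left navigation pane, choose the **Home** icon (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. From the model groups list, select the name of the model group you want to edit.

1. Choose **Information**.

1. In the **Tags** section, choose **Manage tags**. A list of the model group's tags appears as key-value pairs.

1. Edit any key or value.

1. Choose **Confirm changes**.

1. Confirm that the tags in the **Tags** section of the **Information** page contain your edits.

**To assign or tag model groups to a project, complete the following steps:**

1. Use the [ListTags](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ListTags.html) API to get the tags with the keys `sagemaker:project-name` and `sagemaker:project-id` for your SageMaker AI project.

1. To apply the tags to your model package group, choose one of the following methods:
   + If you create a new model package group and want to add tags, pass your tags from Step 1 to the [CreateModelPackageGroup](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModelPackageGroup.html) API.
   + If you want to add tags to an existing model package group, use the [AddTags](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AddTags.html) API.
   + If you create your model package group through Pipelines, use the `pipeline.create()` or `pipeline.upsert()` methods, or pass your tags to the [RegisterModel](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-register-model) step.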
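
The project-tagging steps above can be sketched with Boto3 as follows. This is a minimal sketch for the existing-group case (`ListTags` followed by `AddTags`); the helper names are hypothetical, the call assumes configured credentials, and pagination of `ListTags` results is ignored for brevity.

```python
PROJECT_TAG_KEYS = ("sagemaker:project-name", "sagemaker:project-id")


def select_project_tags(tags: list) -> list:
    """Keep only the two project tags from the Tags list of a ListTags response."""
    return [tag for tag in tags if tag.get("Key") in PROJECT_TAG_KEYS]


def tag_model_group_for_project(project_arn: str, model_group_arn: str) -> None:
    """Copy the project tags from a SageMaker AI project to a model package group."""
    import boto3  # imported here so the pure helper above works without the SDK

    sm_client = boto3.client("sagemaker")
    # Step 1: read the tags on the project.
    tags = sm_client.list_tags(ResourceArn=project_arn)["Tags"]
    # Step 2: apply the project tags to the existing model package group.
    sm_client.add_tags(ResourceArn=model_group_arn, Tags=select_project_tags(tags))
```

For a new model package group, the list returned by `select_project_tags` could instead be passed as the `Tags` argument of `create_model_package_group`.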

------

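# Delete a Model Version
<a name="model-registry-delete-model-version"></a>

This procedure demonstrates how to delete a model version in the Amazon SageMaker Studio console.

## Delete a Model Version (Studio or Studio Classic)
<a name="model-registry-delete-model-version-studio"></a>

To delete a model version in the Amazon SageMaker Studio console, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models** to display a list of your model groups.

1. Choose the **Registered models** tab, if it is not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if it is not selected already.

1. From the model groups list, choose the angle bracket to the left of the model group you want to view.

1. A list of the model versions in the model group appears. If you don't see the model version that you want to delete, choose **View all**.

1. Select the check boxes next to the model versions that you want to delete.

1. Choose the vertical ellipsis above the top right corner of the table, and choose **Delete** (or **Delete model version** if you are on the model group details page).

1. In the **Delete model version** dialog box, choose **Yes, delete the model version**.

1. Choose **Delete**.

1. Confirm that the model versions you deleted no longer appear in the model group.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**. A list of your model groups appears.

1. From the model groups list, select the name of the model group that contains the model version you want to delete.

1. From the list of model versions, select the name of the model version you want to delete.

1. Choose the **Actions** dropdown menu, and choose **Remove**.

1. In the confirmation dialog, enter `REMOVE`.

1. Choose **Remove**.

1. Confirm that the model version you removed does not appear in the model group's list of model versions.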

------

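# Staging Construct for your Model Lifecycle
<a name="model-registry-staging-construct"></a>

You can use the Model Registry staging construct to define a series of stages that models progress through in your model workflows and lifecycle. This simplifies tracking and managing models as they transition through development, testing, and production stages. The following provides information about staging constructs and how to use them in your model governance.

The stage construct lets you define a series of stages and statuses that models progress through. At each stage, specific personas with the relevant permissions can update the stage status. As a model advances through the stages, its metadata is carried forward, which provides a comprehensive view of the model's lifecycle. Authorized personas at each stage can access and review this metadata to make informed decisions. This provides the following benefits:
+ Model Life Cycle permissions - Set permissions for designated personas to update a model stage state and enforce approval gates at critical transition points. Administrators can assign permissions using IAM policies and condition keys with the API. For example, you can restrict your data scientists from moving a model's lifecycle stage from "Development" to "Production". For examples, see [Set up Staging Construct Examples](model-registry-staging-construct-set-up.md).
+ Model Life Cycle events through Amazon EventBridge - You can consume lifecycle stage events using EventBridge. This lets you receive event notifications when a model changes approval or staging state, which enables integration with third-party governance tools. For an example, see [Get event notifications for ModelLifeCycle](model-registry-staging-construct-event-bridge.md).
+ Search based on Model Life Cycle fields - You can search and filter by stage and stage status using the [Search](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Search.html) API.
+ Audit trails for Model Life Cycle events - You can view the history of model approval and staging events for model lifecycle transitions.

The following topics walk you through how to set up a stage construct on the administrator side and how to update a stage status from the user side.

**Topics**
+ [Set up Staging Construct Examples](model-registry-staging-construct-set-up.md)
+ [Update a model package stage and status in Studio](model-registry-staging-construct-update-studio.md)
+ [Update a model package stage and status example (boto3)](model-registry-staging-construct-update-boto3.md)
+ [Invoke ModelLifeCycle using the AWS CLI examples](model-registry-staging-construct-cli.md)
+ [Get event notifications for ModelLifeCycle](model-registry-staging-construct-event-bridge.md)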

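# Set up Staging Construct Examples
<a name="model-registry-staging-construct-set-up"></a>

To set up stage constructs for your Amazon SageMaker Model Registry, the administrator needs to grant the relevant permissions to the intended roles. The following provides examples of how to set up stage constructs for various roles.

**Note**  
Users within an Amazon SageMaker AI domain can view all of the stages defined within the domain, but they can use only the stages for which they have permissions.

Stages are defined by the `ModelLifeCycle` parameter and have the following structure. The administrator sets up the permissions that determine which `stage` and `stageStatus` values each role can access. Users who assume a role can use the relevant `stage` and `stageStatus` and include their own `stageDescription`.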

```
ModelLifeCycle {
    stage: String # Required (e.g., Development/QA/Production)
    stageStatus: String # Required (e.g., PendingApproval/Approved/Rejected)  
    stageDescription: String # Optional
}
```

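The following table contains the predefined stage construct templates of the Model Registry. You can define your own stage constructs based on your use cases. The relevant permissions must be set up before users can use them.


| Stage | Stage status | 
| --- | --- | 
|  Proposal  |  PendingApproval  | 
|  Development  |  InProgress  | 
|  QA  |  OnHold  | 
|  PreProduction  |  Approved  | 
|  Production  |  Rejected  | 
|  Archived  |  Retired  | 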
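
As a small illustration of the template values, the following sketch builds a `ModelLifeCycle` value and rejects anything outside the predefined stage and status names (Proposal, Development, QA, PreProduction, Production, Archived; PendingApproval, InProgress, OnHold, Approved, Rejected, Retired). Restricting values to the predefined names is an assumption made for this example, because you can define your own stages. The helper name is hypothetical, and the capitalized keys follow the AWS CLI examples later in this guide.

```python
PREDEFINED_STAGES = {
    "Proposal", "Development", "QA", "PreProduction", "Production", "Archived",
}
PREDEFINED_STATUSES = {
    "PendingApproval", "InProgress", "OnHold", "Approved", "Rejected", "Retired",
}


def build_model_life_cycle(stage: str, stage_status: str, description=None) -> dict:
    """Return a ModelLifeCycle dict after checking it against the template values."""
    if stage not in PREDEFINED_STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    if stage_status not in PREDEFINED_STATUSES:
        raise ValueError(f"unknown stage status: {stage_status!r}")
    life_cycle = {"Stage": stage, "StageStatus": stage_status}
    if description is not None:
        # StageDescription is optional.
        life_cycle["StageDescription"] = description
    return life_cycle
```

If your domain defines custom stages, extend the two sets rather than removing the check, so that typos are still caught before the API call.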

The `ModelLifeCycle` parameter can be used with the following APIs:
+ [CreateModelPackage](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModelPackage.html)
+ [UpdateModelPackage](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateModelPackage.html)
+ [DescribeModelPackage](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeModelPackage.html)

------
#### [ Policy for a data scientist role ]

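The following is an example IAM policy that uses Model Life Cycle condition keys. You can modify it based on your own requirements. In this example, the role's permissions are limited to setting or defining a Model Life Cycle stage in order to:
+ Create or update a model with the stage `"Development"` and the status `"Approved"`.
+ Update a model package with the quality assurance stage, `"QA"`, and the status `"PendingApproval"`.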

```
{
    "Action" : [
        "sagemaker:UpdateModelPackage",
        "sagemaker:CreateModelPackage"
    ],
    "Resource": [
        "*"
    ],
    "Condition": {
        "StringEquals": {
            "sagemaker:ModelLifeCycle:stage" : "Development",
            "sagemaker:ModelLifeCycle:stageStatus" : "Approved"
        }
    }
},
{
    "Action" : [
        "sagemaker:UpdateModelPackage"
    ],
    "Resource": [
        "*"
    ],
    "Condition": {
        "StringEquals": {
            "sagemaker:ModelLifeCycle:stage" : "QA",
            "sagemaker:ModelLifeCycle:stageStatus" : "PendingApproval"
        }
    }
}
```

------
#### [ Policy for a quality assurance specialist ]

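The following is an example IAM policy that uses Model Life Cycle condition keys. You can modify it based on your own requirements. In this example, the role's permissions are limited to setting or defining a Model Life Cycle stage in order to:
+ Update a model package with:
  + The stage `"QA"` and the status `"Approved"` or `"Rejected"`.
  + The stage `"Production"` and the status `"PendingApproval"`.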

```
{
    "Action": [
        "sagemaker:UpdateModelPackage"
    ],
    "Resource": [
        "*"
    ],
    "Condition": {
        "StringEquals": {
            "sagemaker:ModelLifeCycle:stage": "QA",
            "sagemaker:ModelLifeCycle:stageStatus": "Approved"
        }
    }
}, {
    "Action": [
        "sagemaker:UpdateModelPackage"
    ],
    "Resource": [
        "*"
    ],
    "Condition": {
        "StringEquals": {
            "sagemaker:ModelLifeCycle:stage": "QA",
            "sagemaker:ModelLifeCycle:stageStatus": "Rejected"
        }
    }
}, {
    "Action": [
        "sagemaker:UpdateModelPackage"
    ],
    "Resource": [
        "*"
    ],
    "Condition": {
        "StringEquals": {
            "sagemaker:ModelLifeCycle:stage": "Production",
            "sagemaker:ModelLifeCycle:stageStatus": "PendingApproval"
        }
    }
}
```

------
#### [ Policy for lead engineer role ]

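The following is an example IAM policy that uses Model Life Cycle condition keys. You can modify it based on your own requirements. In this example, the role's permissions are limited to setting or defining a Model Life Cycle stage in order to:
+ Update a model package with:
  + The stage `"Production"` and the status `"Approved"` or `"Rejected"`.
  + The stage `"Development"` and the status `"PendingApproval"`.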

```
{
    "Action" : [
        "sagemaker:UpdateModelPackage"
    ],
    "Resource": [
        "*"
    ],
    "Condition": {
        "StringEquals" : {
            "sagemaker:ModelLifeCycle:stage" : "Production",
            "sagemaker:ModelLifeCycle:stageStatus" : "Approved"
        }
    }
},
{
    "Action" : [
        "sagemaker:UpdateModelPackage"
    ],
    "Resource": [
        "*"
    ],
    "Condition": {
        "StringEquals": {
            "sagemaker:ModelLifeCycle:stage" : "Production",
            "sagemaker:ModelLifeCycle:stageStatus" : "Rejected"
        }
    }
},
{
    "Action" : [
        "sagemaker:UpdateModelPackage"
    ],
    "Resource": [
        "*"
    ],
    "Condition": {
        "StringEquals": {
            "sagemaker:ModelLifeCycle:stage" : "Development",
            "sagemaker:ModelLifeCycle:stageStatus" : "PendingApproval"
        }
    }
}
```

------
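
The statements in the policies above all follow one pattern: a list of actions plus a `StringEquals` condition on the stage and stage status. The following sketch generates such statements from (actions, stage, status) tuples and wraps them in a policy document. The helper names are hypothetical, and the `"Effect": "Allow"` element and the `"Version"` value are assumptions added to make the document complete, because the fragments above do not show them.

```python
import json


def life_cycle_statement(actions: list, stage: str, stage_status: str) -> dict:
    """Build one statement limited to a single stage and stage status."""
    return {
        "Effect": "Allow",  # assumption: the fragments above omit Effect
        "Action": [f"sagemaker:{action}" for action in actions],
        "Resource": ["*"],
        "Condition": {
            "StringEquals": {
                "sagemaker:ModelLifeCycle:stage": stage,
                "sagemaker:ModelLifeCycle:stageStatus": stage_status,
            }
        },
    }


def life_cycle_policy(grants: list) -> str:
    """Render a policy document from (actions, stage, stage_status) tuples."""
    document = {
        "Version": "2012-10-17",
        "Statement": [life_cycle_statement(*grant) for grant in grants],
    }
    return json.dumps(document, indent=4)


# Example: the lead engineer permissions described above.
lead_engineer_policy = life_cycle_policy([
    (["UpdateModelPackage"], "Production", "Approved"),
    (["UpdateModelPackage"], "Production", "Rejected"),
    (["UpdateModelPackage"], "Development", "PendingApproval"),
])
```

Generating the statements from a table of grants keeps the stage and status pairs in one place, which makes it easier to review which role can move a model into which state.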

To get Amazon EventBridge notifications for any model status update, see the example in [Get event notifications for ModelLifeCycle](model-registry-staging-construct-event-bridge.md). For an example EventBridge payload that you might receive, see [SageMaker model package state change](automating-sagemaker-with-eventbridge.md#eventbridge-model-package).

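# Update a model package stage and status in Studio
<a name="model-registry-staging-construct-update-studio"></a>

To use a model package stage construct, you need to assume an execution role with the relevant permissions. The following page provides information on how to update the stage status using Amazon SageMaker Studio.

All users can view all of the stage constructs defined in the domain. To update a stage, you need the administrator to set up the relevant permissions for you to access that stage. For information on how to do this, see [Set up Staging Construct Examples](model-registry-staging-construct-set-up.md).

The following procedure takes you to the Studio UI where you can update your model package stage.

1. Sign in to Amazon SageMaker Studio. For more information, see [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. In the left navigation pane, choose **Models**.

1. Find your model.
   + You can use the tabs to find your model. For example, choose the **Registered models** or **Deployable models** tab.
   + You can use the **My models** and **Shared with me** options to find models that you created or that are shared with you.

1. Select the check box next to the model you want to update.

1. Choose the **More options** icon.

1. Choose **Update model lifecycle**. This takes you to the **Update model lifecycle** section.

1. Complete the tasks to update the stage.

   If you can't update the stage, you receive an error. Your administrator needs to set up the permissions for you to do so. For information on how to set up the permissions, see [Set up Staging Construct Examples](model-registry-staging-construct-set-up.md).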

# Update a model package stage and status example (boto3)
<a name="model-registry-staging-construct-update-boto3"></a>

To update a model package stage and status, you need to assume an execution role with the relevant permissions. The following provides an example of how to update the stage status with the [UpdateModelPackage](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateModelPackage.html) API by using the AWS SDK for Python (Boto3).

In this example, the `ModelLifeCycle` stage `"Development"` and stage status `"Approved"` condition keys for the [UpdateModelPackage](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateModelPackage.html) API action have been granted to your execution role. You can also include a description in place of `stage-description`. For more information, see [Set up Staging Construct Examples](model-registry-staging-construct-set-up.md).

```
from sagemaker import get_execution_role
import boto3

region = boto3.Session().region_name
role = get_execution_role()  # execution role that holds the ModelLifeCycle permissions
sm_client = boto3.client('sagemaker', region_name=region)

# Placeholder: replace with the ARN of the model package version to update.
model_package_arn = "model-package-arn"

model_package_update_input_dict = {
    "ModelPackageArn" : model_package_arn,
    "ModelLifeCycle" : {
        "Stage" : "Development",
        "StageStatus" : "Approved",
        "StageDescription" : "stage-description"
    }
}
model_package_update_response = sm_client.update_model_package(**model_package_update_input_dict)
```

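# Invoke ModelLifeCycle using the AWS CLI examples
<a name="model-registry-staging-construct-cli"></a>

You can use the AWS CLI to manage your AWS resources. Relevant AWS CLI commands include [search](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/sagemaker/search.html) and [list-actions](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/sagemaker/list-actions.html). The following page gives examples of how to use `ModelPackage` with these commands. For information and examples on setting up your stage construct, see [Set up Staging Construct Examples](model-registry-staging-construct-set-up.md).

The examples on this page use the following variables:
+ `region` is the Region in which your model package exists.
+ `stage-name` is the name of your defined stage.
+ `stage-status` is the name of your defined stage status.

The following are example AWS CLI commands that use ModelLifeCycle.

Search for your model packages with a *stage-name* that you have already defined.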

```
aws sagemaker search --region 'region' --resource ModelPackage --search-expression '{"Filters": [{"Name": "ModelLifeCycle.Stage","Value": "stage-name"}]}'
```

List the actions associated with `ModelLifeCycle`.

```
aws sagemaker list-actions --region 'region' --action-type ModelLifeCycle
```

Create a model package with ModelLifeCycle.

```
aws sagemaker create-model-package --model-package-group-name 'model-package-group-name' --source-uri 'source-uri' --region 'region' --model-life-cycle '{"Stage":"stage-name", "StageStatus":"stage-status", "StageDescription":"Your Staging Comment"}' 
```

Update a model package with ModelLifeCycle.

```
aws sagemaker update-model-package --model-package-arn 'model-package-arn' --region 'region' --model-life-cycle '{"Stage":"stage-name", "StageStatus":"stage-status"}'
```

Search through the ModelLifeCycle field.

```
aws sagemaker search --region 'region' --resource ModelPackage --search-expression '{"Filters": [{"Name": "ModelLifeCycle.Stage","Value": "stage-name"}]}'
```

Fetch the audit records for ModelLifeCycle updates through the [Amazon SageMaker ML Lineage Tracking](lineage-tracking.md) APIs.

```
aws sagemaker list-actions --region 'region' --action-type ModelLifeCycle
```

```
aws sagemaker describe-action --region 'region' --action-name 'action-arn or action-name'
```
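
The search commands above have a direct Boto3 counterpart. The following is a minimal sketch that builds the arguments for the `Search` API; the helper names are hypothetical, and the `ModelLifeCycle.StageStatus` filter name is an assumption made by analogy with the `ModelLifeCycle.Stage` filter used in the CLI examples.

```python
from typing import Optional


def build_stage_search(stage_name: str, stage_status: Optional[str] = None) -> dict:
    """Build Search API arguments that filter model packages by lifecycle stage."""
    filters = [
        {"Name": "ModelLifeCycle.Stage", "Operator": "Equals", "Value": stage_name}
    ]
    if stage_status is not None:
        # Assumed filter name, mirroring ModelLifeCycle.Stage.
        filters.append(
            {
                "Name": "ModelLifeCycle.StageStatus",
                "Operator": "Equals",
                "Value": stage_status,
            }
        )
    return {"Resource": "ModelPackage", "SearchExpression": {"Filters": filters}}


def search_model_packages(region: str, stage_name: str) -> list:
    """Run the search and return the first page of results (requires credentials)."""
    import boto3  # imported here so the builder above works without the SDK

    sm_client = boto3.client("sagemaker", region_name=region)
    return sm_client.search(**build_stage_search(stage_name))["Results"]
```

`search_model_packages` returns only the first page of results; a production script would follow `NextToken` to read the remaining pages.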

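# Get event notifications for ModelLifeCycle
<a name="model-registry-staging-construct-event-bridge"></a>

You can get ModelLifeCycle update notifications and events with EventBridge in your account. The following is an example EventBridge rule to configure in your account to receive ModelLifeCycle event notifications.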

```
{
  "source": ["aws.sagemaker"],
  "detail-type": ["SageMaker Model Package State Change"]
}
```

For an example EventBridge payload that you might receive, see [SageMaker model package state change](automating-sagemaker-with-eventbridge.md#eventbridge-model-package).
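
The rule above can also be created with Boto3. The following is a minimal sketch, assuming configured credentials; the helper names and the rule name passed to them are hypothetical. It only creates the rule. To deliver events somewhere, you would additionally attach a target, for example with `put_targets`.

```python
import json


def build_rule_request(rule_name: str) -> dict:
    """Build PutRule arguments for ModelLifeCycle (model package state change) events."""
    pattern = {
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
    }
    return {
        "Name": rule_name,
        "EventPattern": json.dumps(pattern),  # PutRule expects the pattern as a string
        "State": "ENABLED",
    }


def create_rule(rule_name: str) -> str:
    """Create or update the rule and return its ARN (requires AWS credentials)."""
    import boto3  # imported here so the builder above works without the SDK

    events_client = boto3.client("events")
    return events_client.put_rule(**build_rule_request(rule_name))["RuleArn"]
```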

# 更新模型的核准狀態
<a name="model-registry-approve"></a>

建立模型版本之後，通常需要先評估其效能，然後再將其部署到生產端點。如果模型版本符合需求，您可以將其核准狀態更新為 `Approved`。將狀態設定為 `Approved` 可啟動模型的 CI/CD 部署。如果模型版本不符合需求，您可以將核准狀態更新為 `Rejected`。

您可以在註冊模型版本後手動更新模型版本的核准狀態，也可以在建立 SageMaker AI 管道時建立條件步驟來對模型進行評估。如需在 SageMaker AI 管道中建立條件步驟的相關資訊，請參閱[Pipelines 步驟](build-and-manage-steps.md)。

當您使用 SageMaker AI 提供的其中一個專案範本，且模型版本的核准狀況變更時，會發生下列動作。只顯示有效的轉變。
+ `PendingManualApproval` 至 `Approved` - 針對已核准的模型版本啟動 CI/CD 部署
+ `PendingManualApproval` 至 `Rejected` - 不採取任何動作
+ `Rejected` 至 `Approved` - 針對已核准的模型版本啟動 CI/CD 部署
+ `Approved` 至 `Rejected` - 啟動 CI/CD 以部署具有 `Approved` 狀態的最新模型版本
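
The transitions listed above can be written down as a small lookup table, which is convenient when you test automation that reacts to status changes. This is an illustrative sketch only: the constant and function names are hypothetical, and the table mirrors the list above and nothing more.

```python
DEPLOY_APPROVED = "start CI/CD deployment for the approved model version"
REDEPLOY_LATEST = "start CI/CD to deploy the latest model version with Approved status"
NO_ACTION = "no action"

# (old status, new status) -> action taken by the project template
TRANSITION_ACTIONS = {
    ("PendingManualApproval", "Approved"): DEPLOY_APPROVED,
    ("PendingManualApproval", "Rejected"): NO_ACTION,
    ("Rejected", "Approved"): DEPLOY_APPROVED,
    ("Approved", "Rejected"): REDEPLOY_LATEST,
}


def action_for_transition(old_status, new_status):
    """Return the listed action, or None when the transition is not in the list."""
    return TRANSITION_ACTIONS.get((old_status, new_status))
```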

您可以使用 適用於 Python (Boto3) 的 AWS SDK 或使用 Amazon SageMaker Studio 主控台來更新模型版本的核准狀態。您也可以作為 SageMaker AI 管道中條件步驟的一部分來更新模型版本的核准狀態。如需在 SageMaker AI 管道中使用模型核准步驟的相關資訊，請參閱[管道概觀](pipelines-overview.md)。

## 更新模型的核准狀態 (Boto3)
<a name="model-registry-approve-api"></a>

在 [註冊模型版本](model-registry-version.md) 中建立模型版本時，可將 `ModelApprovalStatus` 設定為 `PendingManualApproval`。您可以透過調用 `update_model_package` 來更新模型的核准狀態。請注意，您可以撰寫程式碼來自動執行此程序，例如，根據對模型效能的某些評估結果來設定模型的核准狀態。您也可以在管道中建立一個步驟，從而在核准時自動部署新模型版本。下列程式碼片段展示如何將核准狀態手動變更為 `Approved`。

```
model_package_update_input_dict = {
    "ModelPackageArn" : model_package_arn,
    "ModelApprovalStatus" : "Approved"
}
model_package_update_response = sm_client.update_model_package(**model_package_update_input_dict)
```
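
As noted above, the approval decision can be automated. The following sketch sets the status from a single evaluation metric. It assumes `sm_client` is a SageMaker Boto3 client and that a higher metric value is better. The function names and the threshold rule are hypothetical.

```python
def decide_approval_status(metric_value, threshold):
    """Approve when the metric meets the threshold, otherwise reject."""
    return "Approved" if metric_value >= threshold else "Rejected"


def apply_approval_decision(sm_client, model_package_arn, metric_value, threshold):
    """Update the model version's approval status and return the status that was set."""
    status = decide_approval_status(metric_value, threshold)
    sm_client.update_model_package(
        ModelPackageArn=model_package_arn,
        ModelApprovalStatus=status,
        # Record why the decision was made so that reviewers can audit it later.
        ApprovalDescription="metric={} threshold={}".format(metric_value, threshold),
    )
    return status
```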

## 更新模型的核准狀態 (Studio 或 Studio Classic)
<a name="model-registry-approve-studio"></a>

若要在 Amazon SageMaker Studio 主控台中手動變更核准狀態，請根據您是使用 Studio 還是 Studio Classic 完成以下步驟。

------
#### [ Studio ]

1. 請遵循[啟動 Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html) 中的指示來開啟 SageMaker Studio 主控台。

1. 在左側導覽窗格中，選擇**模型**以顯示模型群組的清單。

1. 如果尚未選取，請選擇**已註冊模型**索引標籤。

1. 如果尚未選取，請在**已註冊模型**索引標籤的正下方選擇**模型群組**。

1. 從模型群組清單中，選擇您要檢視之模型群組左側的角度括號。

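1. The list of model versions in the model group appears. If you don't see the model version that you want to update, choose **View all** to display the complete list of model versions on the model group details page.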

1. 選取要更新的模型版本名稱。

1. **部署**索引標籤會顯示目前的核准狀態。選擇目前核准狀態旁邊的下拉式選單，然後選取更新的核准狀態。

------
#### [ Studio Classic ]

1. 登入 Amazon SageMaker Studio Classic。如需詳細資訊，請參閱[啟動 Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html)。

1. 在左側的導覽窗格中，選擇**首頁**圖示 (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png))。

1. 選擇**模型**，然後選擇**模型註冊表**。

1. 從模型群組清單中，選取要檢視的模型群組名稱。系統會開啟一個新標籤，其中包含模型群組中模型版本的清單。

1. 在模型版本清單中，選取您要更新的模型版本名稱。

1. 在**動作**下拉式功能表下，您可以選擇兩個可能的功能表選項之一來更新模型版本狀態。
   + 使用**更新狀態** 選項

     1. 在**動作**下拉式功能表下，選擇**更新狀態**下拉式功能表，然後選擇新的模型版本狀態。

     1. (可選) 在**評論**欄位中，新增其他詳細資訊。

     1. 選擇**儲存並更新**。
   + 使用**編輯**選項

     1. 在**動作**下拉式功能表下，選擇**編輯**。

     1. (可選) 在**評論**欄位中，新增其他詳細資訊。

     1. 選擇**儲存變更**。

1. 確認模型版本狀態已在模型版本頁面中更新為正確的值。

------

In the `us-east-1`, `ap-northeast-1`, `us-west-2`, and `eu-west-1` Regions, you can use the following instructions to update the approval status of logged and registered model versions:

1. 請遵循[啟動 Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html) 中的指示來開啟 SageMaker Studio 主控台。

1. 從主要導覽窗格中選擇**模型**。

1. Choose the **Logged models** tab and, if it is not already selected, select **Registered models**.

1. 選取模型，然後選擇**檢視最新版本**。

1. 選擇**控管**索引標籤。

1. **控管概觀**下的**部署**區段會顯示目前的核准狀態。從下拉式選單中選取更新的核准狀態。

# 使用 Python 從登錄中部署模型
<a name="model-registry-deploy"></a>

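After you register a model version and approve it for deployment, deploy it to a SageMaker AI endpoint for real-time inference. You can deploy the model by using the SageMaker AI SDK or the AWS SDK for Python (Boto3).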

當您建立機器學習維運 (MLOps) 專案並選擇包含模型部署的 MLOps 專案範本後，模型註冊庫中已核准的模型版本會自動部署到生產環境。如需使用 SageMaker MLOps 專案的相關資訊，請參閱[使用 SageMaker 專案進行 MLOps 自動化](sagemaker-projects.md)。

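You can also allow an AWS account to deploy model versions that were created in a different account by adding a cross-account resource policy. For example, one team in your organization might be responsible for training models, and a different team might be responsible for deploying and updating them.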

**Topics**
+ [透過登錄檔部署模型 (SageMaker SDK)](#model-registry-deploy-smsdk)
+ [透過登錄檔部署模型 (Boto3)](#model-registry-deploy-api)
+ [從其他帳戶部署模型版本](#model-registry-deploy-xaccount)

## 透過登錄檔部署模型 (SageMaker SDK)
<a name="model-registry-deploy-smsdk"></a>

若要使用 [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) 部署模型版本，請使用下列程式碼片段：

```
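import sagemaker
from sagemaker import ModelPackage

# Assumes the code runs where a SageMaker execution role is available (for example, a Studio notebook).
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

# ARN of the approved model version to deploy
model_package_arn = 'arn:aws:sagemaker:us-east-2:123456789012:model-package/modeltest/1'
model = ModelPackage(role=role,
                     model_package_arn=model_package_arn,
                     sagemaker_session=sagemaker_session)
model.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge')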

## 透過登錄檔部署模型 (Boto3)
<a name="model-registry-deploy-api"></a>

To deploy a model version by using the AWS SDK for Python (Boto3), complete the following steps:

1. 下列程式碼片段假設您已建立 SageMaker AI Boto3 用戶端 `sm_client`，以及其 ARN 存放在變數 `model_version_arn` 中的模型版本。

   Call the [create_model](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model) API operation to create a model object from the model version. Pass the Amazon Resource Name (ARN) of the model version as part of the `Containers` for the model object:

   ```
   from time import gmtime, strftime

   model_name = 'DEMO-modelregistry-model-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
   print("Model name : {}".format(model_name))
   container_list = [{'ModelPackageName': model_version_arn}]
   
   create_model_response = sm_client.create_model(
       ModelName = model_name,
       ExecutionRoleArn = role,
       Containers = container_list
   )
   print("Model arn : {}".format(create_model_response["ModelArn"]))
   ```

1. 調用 `create_endpoint_config`，建立端點組態。端點組態會指定端點要使用的 Amazon EC2 執行個體的數量和類型。

   ```
   endpoint_config_name = 'DEMO-modelregistry-EndpointConfig-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
   print(endpoint_config_name)
   create_endpoint_config_response = sm_client.create_endpoint_config(
       EndpointConfigName = endpoint_config_name,
       ProductionVariants=[{
           'InstanceType':'ml.m4.xlarge',
           'InitialVariantWeight':1,
           'InitialInstanceCount':1,
           'ModelName':model_name,
           'VariantName':'AllTraffic'}])
   ```

1. Call `create_endpoint` to create the endpoint.

   ```
   endpoint_name = 'DEMO-modelregistry-endpoint-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
   print("EndpointName={}".format(endpoint_name))
   
   create_endpoint_response = sm_client.create_endpoint(
       EndpointName=endpoint_name,
       EndpointConfigName=endpoint_config_name)
   print(create_endpoint_response['EndpointArn'])
   ```
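
Endpoint creation is asynchronous, so `create_endpoint` returns before the endpoint can serve traffic. The following polling loop is a minimal sketch, assuming `sm_client` is a SageMaker Boto3 client. The function name, polling interval, and retry limit are hypothetical choices.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30, max_polls=60, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint is InService.

    Raises RuntimeError if the endpoint reports Failed, and TimeoutError
    if it is still not InService after max_polls checks.
    """
    for _ in range(max_polls):
        status = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError("Endpoint {} failed to deploy".format(endpoint_name))
        sleep(poll_seconds)
    raise TimeoutError("Endpoint {} was not InService after {} checks".format(endpoint_name, max_polls))
```

Boto3 also provides a built-in waiter, `sm_client.get_waiter("endpoint_in_service")`, which covers the same case without a hand-written loop.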

## 從其他帳戶部署模型版本
<a name="model-registry-deploy-xaccount"></a>

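You can allow an AWS account to deploy model versions that were created in a different account by adding a cross-account resource policy. For example, one team in your organization might be responsible for training models, and a different team might be responsible for deploying and updating them. After you create the resource policy, you apply it to the specific resource to which you want to grant access. For more information about cross-account resource policies in AWS, see [Cross-account policy evaluation logic](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_evaluation-logic-cross-account.html) in the *AWS Identity and Access Management User Guide*.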

**注意**  
For cross-account model deployment, you must use a KMS key to encrypt the [output data configuration](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_OutputDataConfig.html) during training.

若要在 SageMaker AI 中啟用跨帳戶模型部署，您必須為模型群組提供跨帳戶資源政策，其中包含要部署的模型版本、模型群組推論映像所在的 Amazon ECR 儲存庫，以及儲存模型版本的 Amazon S3 儲存貯體。

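You must have a role with access to SageMaker AI actions (for example, a role with the `AmazonSageMakerFullAccess` managed policy) to deploy a model that was created in a different account. For information about SageMaker AI managed policies, see [AWS managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md).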

下列範例會為所有這三種資源建立跨帳戶政策，並將政策套用至資源。此範例也假設您先前已定義下列變數：
+ `bucket` - 存放模型版本的 Amazon S3 儲存貯體。
+ `kms_key_id` - 用來加密訓練輸出的 KMS 金鑰。
+ `sm_client` - SageMaker AI Boto3 用戶端。
+ `model_package_group_name` - 您要授予跨帳戶存取權的模型群組。
+ `model_package_group_arn` - 您要授與跨帳戶存取權的模型群組 ARN。

```
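import json

import boto3

# Besides the variables listed above, this example also expects `account`
# (the model owner's account ID) and `region` to be defined.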

# The cross-account id to grant access to
cross_account_id = "123456789012"

# Create the policy for access to the ECR repository
ecr_repository_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Sid': 'AddPerm',
        'Effect': 'Allow',
        'Principal': {
            'AWS': f'arn:aws:iam::{cross_account_id}:root'
        },
        'Action': ['ecr:*']
    }]
}

# Convert the ECR policy from JSON dict to string
ecr_repository_policy = json.dumps(ecr_repository_policy)

# Set the new ECR policy
ecr = boto3.client('ecr')
response = ecr.set_repository_policy(
    registryId = account,
    repositoryName = 'decision-trees-sample',
    policyText = ecr_repository_policy
)

# Create a policy for accessing the S3 bucket
bucket_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Sid': 'AddPerm',
        'Effect': 'Allow',
        'Principal': {
            'AWS': f'arn:aws:iam::{cross_account_id}:root'
        },
        'Action': 's3:*',
        'Resource': f'arn:aws:s3:::{bucket}/*'
    }]
}

# Convert the policy from JSON dict to string
bucket_policy = json.dumps(bucket_policy)

# Set the new policy
s3 = boto3.client('s3')
response = s3.put_bucket_policy(
    Bucket = bucket,
    Policy = bucket_policy)

# Create the KMS grant for encryption in the source account to the
# Model Registry account Model Group
client = boto3.client('kms')

response = client.create_grant(
    GranteePrincipal=f'arn:aws:iam::{cross_account_id}:root',  # KMS expects a principal ARN
    KeyId=kms_key_id,
    Operations=[
        'Decrypt',
        'GenerateDataKey',
    ],
)

# Create a policy for access to the Model Group
model_package_group_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Sid': 'AddPermModelPackageGroup',
        'Effect': 'Allow',
        'Principal': {
            'AWS': f'arn:aws:iam::{cross_account_id}:root'
        },
        'Action': ['sagemaker:DescribeModelPackageGroup'],
        'Resource': f'arn:aws:sagemaker:{region}:{account}:model-package-group/{model_package_group_name}'
    },{
        'Sid': 'AddPermModelPackageVersion',
        'Effect': 'Allow',
        'Principal': {
            'AWS': f'arn:aws:iam::{cross_account_id}:root'
        },
        'Action': ["sagemaker:DescribeModelPackage",
                   "sagemaker:ListModelPackages",
                   "sagemaker:UpdateModelPackage",
                   "sagemaker:CreateModel"],
        'Resource': f'arn:aws:sagemaker:{region}:{account}:model-package/{model_package_group_name}/*'
    }]
}

# Convert the policy from JSON dict to string
model_package_group_policy = json.dumps(model_package_group_policy)

# Set the policy to the Model Group
response = sm_client.put_model_package_group_policy(
    ModelPackageGroupName = model_package_group_name,
    ResourcePolicy = model_package_group_policy)

print('ModelPackageGroupArn : {}'.format(model_package_group_arn))

print("Success! You are all set to proceed for cross-account deployment.")
```
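
After the policies are in place, the consuming account can look up a version to deploy. The following sketch returns the most recently created approved version in the shared model group. It assumes `sm_client` is a SageMaker Boto3 client created in the consuming account and that the group is referenced by its ARN. The function name is hypothetical.

```python
def latest_approved_model_package_arn(sm_client, model_package_group_arn):
    """Return the ARN of the newest Approved version in the group, or None if there is none."""
    response = sm_client.list_model_packages(
        ModelPackageGroupName=model_package_group_arn,
        ModelApprovalStatus="Approved",
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = response.get("ModelPackageSummaryList", [])
    return summaries[0]["ModelPackageArn"] if summaries else None
```

The returned ARN can then be passed as `ModelPackageName` in the `Containers` list of `create_model`, as in the Boto3 deployment steps earlier in this topic.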

# 在 Studio 中部署模型
<a name="model-registry-deploy-studio"></a>

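After you register a model version and approve it for deployment, deploy it to an Amazon SageMaker AI endpoint for real-time inference. You can either [deploy a model from the registry with Python](model-registry-deploy.md) or deploy the model in Amazon SageMaker Studio. The following provides instructions on how to deploy your model in Studio.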

此功能不適用於 Amazon SageMaker Studio Classic。
+ 如果 Studio 是您的預設體驗，則 UI 與 [Amazon SageMaker Studio UI 概觀](studio-updated-ui.md) 中找到的映像類似。
+ 如果 Studio Classic 是您的預設體驗，則 UI 與 [Amazon SageMaker Studio Classic UI 概觀](studio-ui.md) 中找到的映像類似。

在您可以部署模型套件之前，必須符合模型套件的下列要求：
+ 可用的有效推論規格。如需詳細資訊，請參閱 [InferenceSpecification](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModelPackage.html#sagemaker-CreateModelPackage-request-InferenceSpecification)。
+ 具有核准狀態的模型。如需詳細資訊，請參閱[更新模型的核准狀態](model-registry-approve.md)。

以下提供如何在 Studio 中部署模型的指示。

**使用 Studio 部署模型**

1. 遵循 [啟動 Amazon SageMaker Studio](studio-updated-launch.md) 中的指示開啟 Studio 主控台。

1. 從主要導覽窗格中選擇**模型**。

1. 如果尚未選取，請選擇**已註冊模型**索引標籤。

1. 如果尚未選取，請在**已註冊模型**索引標籤的正下方選擇**模型群組**。

1. (選用) 如果您具有與您共用的模型，您可以選擇**我的模型**或**與我共用**。

1. 選取已註冊模型的核取方塊。如果符合上述要求，**部署**按鈕變成可供選擇。

1. 選擇**部署**以開啟**將模型部署至端點**頁面。

1. 在**端點設定**中設定部署資源。

1. 一旦驗證了設定，請選擇**部署**。然後，模型將部署到狀態為**服務中**的端點。

對於 `us-east-1`、`ap-northeast-1`、 `us-west-2`和 `eu-west-1`區域，您可以使用下列指示來部署模型：

**使用 Studio 部署模型**

1. 遵循 [啟動 Amazon SageMaker Studio](studio-updated-launch.md) 中的指示開啟 Studio 主控台。

1. 從主要導覽窗格中選擇**模型**。

1. 選擇**我的模型**索引標籤。

1. If it is not already selected, choose the **Logged models** tab.

1. 選取模型，然後選擇**檢視最新版本**。

1. 選擇**部署**，然後在 SageMaker AI 或 Amazon Bedrock 之間選取。

1. 一旦驗證了設定，請選擇**部署**。然後，模型將部署到狀態為**服務中**的端點。

# 跨帳戶探索能力
<a name="model-registry-ram"></a>

透過探索已在其他帳戶中註冊的模型套件群組，資料科學家和資料工程師可以提高資料一致性，簡化協作並減少重複工作量。使用 Amazon SageMaker 模型註冊表，您可以跨帳戶共用模型套件群組。有兩種類別的許可與共用資源相關聯：
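+ **Discoverability**: *Discoverability* is the ability of the resource consumer account to see the model package groups shared by one or more resource owner accounts. Discoverability is possible only if the resource owner attaches the necessary resource policy to the shared model package group. The resource consumer can view all shared model package groups in the AWS RAM UI and the AWS CLI.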
+ **可存取性**：*可存取性*是指資源消費者帳戶能夠使用共用的模型套件群組。例如，如果資源消費者具有必要的許可，則可以從不同的帳戶註冊或部署模型套件。

**Topics**
+ [在 Studio 中共用模型群組](model-registry-ram-studio-share.md)
+ [在 Studio 中檢視共用模型群組](model-registry-ram-studio-view.md)
+ [Accessibility](model-registry-ram-accessibility.md)
+ [設定探索能力](model-registry-ram-discover.md)
+ [檢視共用模型套件群組](model-registry-ram-view-shared.md)
+ [解除主體與資源共用的關聯，並移除資源共用](model-registry-ram-dissociate.md)
+ [提升許可和資源共用](model-registry-ram-promote.md)

# 在 Studio 中共用模型群組
<a name="model-registry-ram-studio-share"></a>

您可以使用 Studio UI 與其他 AWS 主體 (AWS 帳戶 或 AWS Organizations) 共用模型群組。這種簡化的共用程序可啟用跨團隊協作、提升最佳實務，以及促進跨團隊重複使用模型。以下將提供如何在 Studio 中共用模型群組的指示。

此功能不適用於 Amazon SageMaker Studio Classic。
+ 如果 Studio 是您的預設體驗，則 UI 與 [Amazon SageMaker Studio UI 概觀](studio-updated-ui.md) 中找到的映像類似。
+ 如果 Studio Classic 是您的預設體驗，則 UI 與 [Amazon SageMaker Studio Classic UI 概觀](studio-ui.md) 中找到的映像類似。

若要共用模型群組，您必須先確定已將下列許可新增至您要從中共用資源的執行角色。

1. [取得您的執行角色](sagemaker-roles.md#sagemaker-roles-get-execution-role)。

1. 使用下列內容[更新角色許可](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_update-role-permissions.html)：

------
#### [ JSON ]

****  

   ```
   {
       "Version":"2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "ram:ListPermissions",
                   "ram:GetPermission",
                   "ram:GetResourceShareAssociations",
                   "ram:ListResourceSharePermissions",
                   "ram:DeleteResourceShare",
                   "ram:GetResourceShareInvitations",
                   "ram:AcceptResourceShareInvitation"
               ],
               "Resource": "*"
           }
       ]
   }
   ```

------

以下提供如何與其他 AWS 主體共用模型群組的指示。

**與其他 AWS 主體共用模型群組**

1. 遵循 [啟動 Amazon SageMaker Studio](studio-updated-launch.md) 中的指示開啟 Studio 主控台。

1. 從主要導覽窗格中選擇**模型**。

1. 如果尚未選取，請選擇**已註冊模型**索引標籤。

1. 如果尚未選取，請在**已註冊模型**索引標籤的正下方選擇**模型群組**。

1. 選取已註冊模型。

1. 在右上角選擇**共用**。這會開啟**共用模型群組**區段。

   如果您在畫面底部看到錯誤訊息，則需要將適當的許可新增至您的執行角色。如需詳細資訊，請參閱上述許可。

1. 在**資源共用**下，選擇資源共用以更新或建立新的資源共用。

1. 在**受管許可**下，選擇受管許可來控制您模型的存取層級。

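   The available options include permissions that were already created for you and your own custom permissions in AWS RAM. See [Creating and using customer managed permissions](https://docs.aws.amazon.com/ram/latest/userguide/create-customer-managed-permissions.html) in the *AWS Resource Access Manager User Guide*.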

1. 在**AWS 主體**下，輸入您要共用的 AWS Organizations ARN 或 AWS 帳戶 IDs，然後選擇**新增**。您可以透過這種方式新增多個 AWS 主體。

1. 滿足最低要求時，即可存取**共用**按鈕。一旦驗證了您的設定，請選擇**共用**。

   成功共用會在畫面底部產生綠色橫幅訊息。

# 在 Studio 中檢視共用模型群組
<a name="model-registry-ram-studio-view"></a>

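You can view model groups that are shared with you or with accounts that belong to the same organization in AWS Organizations. If a model group is shared with an account that belongs to the same organization, the shared model group is approved automatically and is available for you to view in Studio. Otherwise, you must approve the pending invitation before you can view the shared model group in Studio. The following provides instructions on how to view shared model groups and accept model group sharing invitations in Studio.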

此功能不適用於 Amazon SageMaker Studio Classic。
+ 如果 Studio 是您的預設體驗，則 UI 與 [Amazon SageMaker Studio UI 概觀](studio-updated-ui.md) 中找到的映像類似。
+ 如果 Studio Classic 是您的預設體驗，則 UI 與 [Amazon SageMaker Studio Classic UI 概觀](studio-ui.md) 中找到的映像類似。

以下提供如何檢視和接受與您共用之模型群組的指示。

**檢視並接受與您共用的模型群組**

1. 遵循 [啟動 Amazon SageMaker Studio](studio-updated-launch.md) 中的指示開啟 Studio 主控台。

1. 從主要導覽窗格中選擇**模型**。

1. 如果尚未選取，請選擇**已註冊模型**索引標籤。

1. 如果尚未選取，請在**已註冊模型**索引標籤的正下方選擇**模型群組**。

1. 選擇**與我共用**以檢視與您共用的模型群組。

1. 若要接受待定的模型群組邀請：

   1. 選擇**檢視待核准**以開啟**待定邀請**清單。

   1. 如果您想要接受邀請，請選擇**接受**。

# Accessibility
<a name="model-registry-ram-accessibility"></a>

如果資源消費者具有使用共用模型套件群組的存取許可，他們可以註冊或部署模型套件群組的版本。如需資源消費者如何註冊共用模型套件群組的詳細資訊，請參閱[從其他帳戶註冊模型版本](model-registry-version.md#model-registry-version-xaccount)。如需資源消費者如何部署共用模型套件群組的詳細資訊，請參閱[從其他帳戶部署模型版本](model-registry-deploy.md#model-registry-deploy-xaccount)。

# 設定探索能力
<a name="model-registry-ram-discover"></a>

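A resource owner can configure model package group discoverability by creating a resource share and attaching a resource policy to the entity. For detailed steps on how to create a general resource share in AWS RAM, see [Create a resource share](https://docs.aws.amazon.com/ram/latest/userguide/getting-started-sharing.html#getting-started-sharing-create) in the [AWS RAM](https://docs.aws.amazon.com/ram/latest/userguide/what-is.html) documentation.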

完成下列指示，以使用 AWS RAM 主控台或模型登錄資源政策 APIs 設定模型套件群組可探索性。

------
#### [ AWS CLI ]

1. 在模型擁有者帳戶中建立資源共用。

   1. 模型擁有者會使用 SageMaker AI 資源政策 API [put-model-package-group-policy](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/put-model-package-group-policy.html) 將資源政策連接至模型套件群組，如下列命令所示。

      ```
      aws sagemaker put-model-package-group-policy \
        --model-package-group-name <model-package-group-name> \
        --resource-policy '{"Version":"2012-10-17","Statement":[{"Sid":"ExampleResourcePolicy","Effect":"Allow","Principal":<principal>,"Action":["sagemaker:DescribeModelPackage","sagemaker:ListModelPackages","sagemaker:DescribeModelPackageGroup"],"Resource":["<model-package-group-arn>","arn:aws:sagemaker:<region>:<owner-account-id>:model-package/<model-package-group-name>/*"]}]}'
      ```
**注意**  
不同的動作組合可以連接到資源政策。對於自訂政策，模型套件群組擁有者應該提升建立的許可，而且只有已連接所提升許可的實體才能探索。無法透過 AWS RAM探索或管理無法提升的資源共用。

   1. To check that AWS RAM created the resource share ARN, use the following command:

      ```
      aws ram get-resource-share-associations --association-type resource --resource-arn <model-package-group-arn>
      ```

      回應包含實體的 *resource-share-arn*。

   1. 若要檢查連接的政策許可是受管政策還是自訂政策，請使用下列命令：

      ```
      aws ram list-resource-share-permissions --resource-share-arn <resource-share-arn>
      ```

      `featureSet` 欄位可以採用值 `CREATED_FROM_POLICY` 或 `STANDARD`，其定義如下：
      + `STANDARD`：許可已存在。
      + `CREATED_FROM_POLICY`：需要提升許可，才能探索實體。如需詳細資訊，請參閱[提升許可和資源共用](model-registry-ram-promote.md)。

1. 接受模型消費者帳戶中的資源共用邀請。

   1. 模型套件群組消費者接受資源共用的邀請。若要查看所有資源邀請，請執行下列命令：

      ```
      aws ram get-resource-share-invitations
      ```

      識別狀態為 `PENDING` 的請求，並包含擁有者帳戶的帳戶 ID。

   1. 使用下列命令接受來自模型擁有者的資源共用邀請：

      ```
      aws ram accept-resource-share-invitation --resource-share-invitation-arn <resource-share-invitation-arn>
      ```

------
#### [ AWS RAM console ]

1. 登入 [AWS RAM 主控台](https://console.aws.amazon.com/ram/home)。

1. 請完成下列步驟，從模型套件群組擁有者帳戶建立資源共用。

   1. 完成下列步驟來指定資源共用詳細資訊。

      1. 在**名稱**欄位中，為您的資源新增唯一名稱。

      1. 在**資源**卡片中，選擇下拉式功能表，然後選取 **SageMaker AI 模型套件群組**。

      1. 選取模型套件群組資源共用 ARN 的核取方塊。

      1. 在**選取資源**卡片中，選取模型套件群組資源共用的核取方塊。

      1. 在**標籤**卡片中，為要新增至資源共用的標籤新增鍵值對。

      1. 選擇**下一步**。

   1. 完成下列步驟，將受管許可與資源共用建立關聯。

      1. 如果您使用受管許可，請在**受管許可**下拉式功能表中選擇受管許可。

      1. 如果您使用自訂許可，請選擇**客戶受管許可**。在此情況下，無法立即探索模型套件群組。您在建立資源共用之後必須提升許可和資源政策。如需如何提升許可和資源共用的詳細資訊，請參閱[提升許可和資源共用](model-registry-ram-promote.md)。如需如何連接自訂許可的詳細資訊，請參閱[在 AWS RAM中建立和使用客戶受管許可](https://docs.aws.amazon.com/ram/latest/userguide/create-customer-managed-permissions.html)。

      1. 選擇**下一步**。

   1. 請完成下列步驟，以將存取權授予主體。

      1. 選擇**允許與任何人共用**，以允許與您組織外的帳戶共用，或選擇**僅允許在您的組織內共用**。

      1. 在**選取主體類型**下拉式功能表中，為您要新增的主體新增主體類型和 ID。

      1. 新增並選取所選主體進行共用。

      1. 選擇**下一步**。

   1. 檢閱顯示的共用組態，然後選擇**建立資源共用**。

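1. Accept the resource share invitation from the consumer account. After the model owner creates the resource share and the principal associations, the specified resource consumer account receives an invitation to join the resource share. The resource consumer account can view and accept the invitation on the [Shared with me: Resource shares](https://console.aws.amazon.com/ram/home#SharedResourceShares:) page in the AWS RAM console. For more information about accepting and viewing resources in AWS RAM, see [Access AWS resources shared with you](https://docs.aws.amazon.com//ram/latest/userguide/working-with-shared.html).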

------
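
The consumer-side steps in the AWS CLI tab above (list the invitations, pick the pending ones from the owner account, accept them) can be sketched in Python as follows. This assumes `ram_client` is an AWS RAM Boto3 client (for example, `boto3.client("ram")`) created in the consumer account. The function name is hypothetical.

```python
def accept_pending_invitations(ram_client, owner_account_id):
    """Accept every PENDING invitation sent by owner_account_id and return the accepted ARNs."""
    accepted = []
    kwargs = {}
    while True:
        page = ram_client.get_resource_share_invitations(**kwargs)
        for invitation in page.get("resourceShareInvitations", []):
            is_pending = invitation["status"] == "PENDING"
            from_owner = invitation["senderAccountId"] == owner_account_id
            if is_pending and from_owner:
                arn = invitation["resourceShareInvitationArn"]
                ram_client.accept_resource_share_invitation(resourceShareInvitationArn=arn)
                accepted.append(arn)
        token = page.get("nextToken")
        if not token:
            return accepted
        kwargs["nextToken"] = token
```

Filtering on the sender account avoids accepting invitations from accounts you did not expect.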

# 檢視共用模型套件群組
<a name="model-registry-ram-view-shared"></a>

在資源擁有者完成先前建立資源共享的步驟，且取用者接受共享的邀請後，取用者可以使用 AWS CLI 或 AWS RAM 主控台檢視共用模型套件群組。

## AWS CLI
<a name="model-registry-ram-view-shared-cli"></a>

若要檢視共用的模型套件群組，請在模型消費者帳戶中使用下列命令：

```
aws sagemaker list-model-package-groups --cross-account-filter-option CrossAccount
```
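
The equivalent Boto3 call is sketched below. It assumes `sm_client` is a SageMaker Boto3 client created in the model consumer account. The function name is hypothetical.

```python
def list_shared_model_package_groups(sm_client):
    """Return the ARNs of all model package groups shared from other accounts."""
    arns = []
    kwargs = {"CrossAccountFilterOption": "CrossAccount"}
    while True:
        page = sm_client.list_model_package_groups(**kwargs)
        for summary in page.get("ModelPackageGroupSummaryList", []):
            arns.append(summary["ModelPackageGroupArn"])
        token = page.get("NextToken")
        if not token:
            return arns
        kwargs["NextToken"] = token
```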

## AWS RAM 主控台
<a name="model-registry-ram-view-shared-ram"></a>

在 AWS RAM 主控台中，資源擁有者和取用者可以檢視共用模型套件群組。資源擁有者可以遵循[檢視您在 AWS RAM中建立的資源共用](https://docs.aws.amazon.com/ram/latest/userguide/working-with-sharing-view-rs.html)中的步驟，來檢視與消費者共用的模型套件群組。資源消費者可以遵循[檢視與您共用的資源共用](https://docs.aws.amazon.com/ram/latest/userguide/working-with-shared-view-rs.html)中的步驟，來檢視擁有者共用的模型套件群組。

# 解除主體與資源共用的關聯，並移除資源共用
<a name="model-registry-ram-dissociate"></a>

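A resource owner can use the AWS CLI or the AWS RAM console to disassociate principals from a resource share for a set of permissions, or to delete the entire resource share. For more information about how to disassociate principals from a resource share, see [Update a resource share](https://docs.aws.amazon.com/ram/latest/userguide/working-with-sharing-update.html) in the [AWS RAM](https://docs.aws.amazon.com/ram/latest/userguide/what-is.html) documentation. For more information about how to delete a resource share, see [Deleting a resource share](https://docs.aws.amazon.com/ram/latest/userguide/working-with-sharing-delete.html) in the [AWS RAM](https://docs.aws.amazon.com/ram/latest/userguide/what-is.html) documentation.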

## AWS CLI
<a name="model-registry-ram-dissociate-cli"></a>

To disassociate principals from a resource share, use the [disassociate-resource-share](https://docs.aws.amazon.com/cli/latest/reference/ram/disassociate-resource-share.html) command as follows:

```
aws ram disassociate-resource-share --resource-share-arn <resource-share-arn> --principals <principal>
```

若要刪除資源共用，請使用如下命令 [delete-resource-share](https://docs.aws.amazon.com/cli/latest/reference/ram/delete-resource-share.html)：

```
aws ram delete-resource-share --resource-share-arn <resource-share-arn>
```

## AWS RAM 主控台
<a name="model-registry-ram-dissociate-ram"></a>

如需如何解除主體與資源共用的關聯的更多詳細資訊，請參閱 [AWS RAM](https://docs.aws.amazon.com/ram/latest/userguide/what-is.html) 文件中的[更新資源共用](https://docs.aws.amazon.com/ram/latest/userguide/working-with-sharing-update.html)。如需如何刪除資源共用的更多詳細資訊，請參閱 [AWS RAM](https://docs.aws.amazon.com/ram/latest/userguide/what-is.html) 文件中的[刪除資源共用](https://docs.aws.amazon.com/ram/latest/userguide/working-with-sharing-delete.html)。

# 提升許可和資源共用
<a name="model-registry-ram-promote"></a>

如果您使用自訂的 (客戶管理的) 許可，則需要提升許可和相關聯的資源共用，才能探索模型套件群組。完成下列步驟以提升許可和資源共用。

1. To promote your custom permission so that it can be accessed through AWS RAM, use the following command:

   ```
   aws ram promote-permission-created-from-policy --permission-arn <permission-arn>
   ```

1. 使用下列命令提升資源共用：

   ```
   aws ram promote-resource-share-created-from-policy --resource-share-arn <resource-share-arn>
   ```

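If you see an `OperationNotPermittedException` error when you perform the preceding steps, the entity is not discoverable, but it is accessible. For example, if the resource owner attaches a resource policy with an assumed-role principal such as `"Principal": {"AWS": "arn:aws:iam::3333333333:role/Role-1"}`, or if the resource policy allows `"Action": "*"`, the associated model package group can be neither promoted nor discovered.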
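
Before you run the promote commands, it can help to confirm which permissions on a resource share actually need promotion, that is, those whose `featureSet` is `CREATED_FROM_POLICY`. The following is a minimal sketch, assuming `ram_client` is an AWS RAM Boto3 client. The function name is hypothetical.

```python
def permissions_requiring_promotion(ram_client, resource_share_arn):
    """Return the ARNs of permissions on the share whose featureSet is CREATED_FROM_POLICY."""
    arns = []
    kwargs = {"resourceShareArn": resource_share_arn}
    while True:
        page = ram_client.list_resource_share_permissions(**kwargs)
        for permission in page.get("permissions", []):
            if permission.get("featureSet") == "CREATED_FROM_POLICY":
                arns.append(permission["arn"])
        token = page.get("nextToken")
        if not token:
            return arns
        kwargs["nextToken"] = token
```

An empty result suggests that the share already uses `STANDARD` permissions and needs no promotion.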

# 檢視模型的部署歷史記錄
<a name="model-registry-deploy-history"></a>

若要在 Amazon SageMaker Studio 主控台中檢視模型版本的部署，請根據您是使用 Studio 還是 Studio Classic 完成以下步驟。

------
#### [ Studio ]

**檢視模型版本的部署歷史記錄**

1. 請遵循[啟動 Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html) 中的指示來開啟 SageMaker Studio 主控台。

1. 在左側導覽窗格中，選擇**模型**以顯示模型群組的清單。

1. 如果尚未選取，請選擇**已註冊模型**索引標籤。

1. 如果尚未選取，請在**已註冊模型**索引標籤的正下方選擇**模型群組**。

1. 從模型群組清單中，選擇您要檢視之模型群組左側的角度括號。

1. The list of model versions in the model group appears. If you don't see the model version that you want to view, choose **View all**.

1. 選取要檢視的模型版本名稱。

1. 選擇**活動**索引標籤。模型版本的部署會以事件的形式顯示在活動清單中，並且其**事件類型**為 **ModelDeployment**。

------
#### [ Studio Classic ]

**檢視模型版本的部署歷史記錄**

1. 登入 Amazon SageMaker Studio Classic。如需詳細資訊，請參閱[啟動 Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html)。

1. 在左側的導覽窗格中，選擇**首頁**圖示 (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png))。

1. 選擇**模型**，然後選擇**模型註冊表**。

1. 從模型群組清單中，選取要檢視的模型群組名稱。

1. 系統會顯示一個新標籤，其中包含模型群組中模型版本的清單。

1. 在模型版本清單中，選取您要檢視其詳細資訊的模型版本名稱。

1. 在開啟的模型版本標籤上，選擇**活動**。模型版本的部署會以事件的形式顯示在活動清單中，並且其**事件類型**為 **ModelDeployment**。

------

# 在 Studio 中檢視模型歷程詳細資訊
<a name="model-registry-lineage-view-studio"></a>

您可以在 Amazon SageMaker Studio 中檢視已註冊模型的歷程詳細資訊。以下將提供如何在 Studio 中存取歷程檢視的指示。如需 Amazon SageMaker Studio 中歷程追蹤的詳細資訊，請參閱 [Amazon SageMaker 機器學習 (ML) 歷程追蹤](lineage-tracking.md)。

此功能不適用於 Amazon SageMaker Studio Classic。
+ 如果 Studio 是您的預設體驗，則 UI 與 [Amazon SageMaker Studio UI 概觀](studio-updated-ui.md) 中找到的映像類似。
+ 如果 Studio Classic 是您的預設體驗，則 UI 與 [Amazon SageMaker Studio Classic UI 概觀](studio-ui.md) 中找到的映像類似。

歷程檢視是與已註冊模型相關聯之資源的互動式視覺化。這些資源包括資料集、訓練任務、核准、模型和端點。在歷程中，您也可以檢視相關聯的資源詳細資訊，包括來源 URI、建立時間戳記和其他中繼資料。

下列功能可用於 `us-east-1`、`ap-northeast-1`、 `us-west-2`和 `eu-west-1`區域：

您可以追蹤已記錄和已註冊模型的歷程。此外，模型資源的譜系包括資料集、評估器、訓練任務、核准、模型、推論元件和端點。在歷程中，您也可以檢視相關聯的資源詳細資訊，包括來源 URI、建立時間戳記和其他中繼資料。

以下提供如何存取已註冊模型版本之歷程詳細資訊的指示。

**存取已註冊模型版本的歷程詳細資訊**

1. 遵循[啟動 Amazon SageMaker Studio](studio-updated-launch.md) 中的指示開啟 Studio 主控台。

1. 從主要導覽窗格中選擇**模型**。

1. 如果尚未選取，請選擇**已註冊模型**索引標籤。

1. 如果尚未選取，請在**已註冊模型**索引標籤的正下方選擇**模型群組**。

1. (選用) 如果您具有與您共用的模型，您可以選擇**我的模型**或**與我共用**。

1. 選取已註冊模型。

1. 如果尚未選取，請選擇**版本**索引標籤。

1. 從**版本**清單中選擇特定模型版本。

1. 選擇**歷程**頁標。

在**歷程**索引標籤中，您可以導覽與模型版本相關聯的資源。您也可以選擇資源來檢視資源詳細資訊。

請注意，歷程檢視僅用於視覺化目的。重新排列或移動此檢視中的元件不會影響實際註冊的模型資源。

對於 `us-east-1`、`ap-northeast-1`、 `us-west-2`和 `eu-west-1`區域，您可以使用下列指示來存取已記錄和已註冊模型版本的歷程詳細資訊：

1. 遵循 [啟動 Amazon SageMaker Studio](studio-updated-launch.md) 中的指示開啟 Studio 主控台。

1. 從主要導覽窗格中選擇**模型**。

1. 選擇**我的模型**索引標籤。

1. （選用） 如果您有與您共用的模型，您可以在**我建立**或**共用**之間進行選擇。

1. 選取模型，然後選擇**檢視最新版本**。

1. 選擇**歷程**頁標。

# 模型註冊表集合
<a name="modelcollections"></a>

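You can use Collections to group registered models that are related to each other and organize them in hierarchies, which improves model discoverability at scale. For example, you could categorize your models by the domain of the problem they solve, in Collections named *NLP models*, *CV models*, or *Speech recognition models*. To organize your registered models in a tree structure, you can nest Collections within each other. Any operation that you perform on a Collection, such as create, read, update, or delete, does not alter your registered models. You can manage Collections by using the Amazon SageMaker Studio UI or the Python SDK.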

模型註冊表中的**集合**標籤會顯示您帳戶中所有集合的清單。下列部分描述了如何使用**集合**標籤中的選項來執行下列操作：
+ 建立集合
+ 將模型群組新增至集合
+ 在集合之間移動模型群組
+ 從其他集合移除模型群組或集合

您在集合中執行的任何操作都不會影響它們所包含的個別模型群組的完整性，因此不會修改 Amazon S3 和 Amazon ECR 中的基礎模型群組成品。

雖然集合在組織模型方面提供了更大的靈活性，但內部表現形式會對階層的大小有一些限制。如需這些限制的摘要，請參閱[限制](modelcollections-limitations.md)。

下列主題將展示如何在模型註冊表中建立和使用集合。

**Topics**
+ [設定先決條件許可](modelcollections-permissions.md)
+ [建立集合](modelcollections-create.md)
+ [將模型群組新增至集合](modelcollections-add-models.md)
+ [從集合中移除模型群組或集合](modelcollections-remove-models.md)
+ [在集合之間移動模型群組](modelcollections-move-models.md)
+ [檢視模型群組的父集合](modelcollections-view-parent.md)
+ [限制](modelcollections-limitations.md)

# 設定先決條件許可
<a name="modelcollections-permissions"></a>

建立包含下列必要資源群組動作的自訂政策：
+ `resource-groups:CreateGroup`
+ `resource-groups:DeleteGroup`
+ `resource-groups:GetGroupQuery`
+ `resource-groups:ListGroupResources`
+ `resource-groups:Tag`
+ `tag:GetResources`

如需新增內嵌政策的指示，請參閱[新增 IAM 身分許可 (主控台)](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html#add-policies-console)。選擇政策格式時，請選擇 JSON 格式並新增下列政策：

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "resource-groups:ListGroupResources"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "resource-groups:GetGroupQuery"
            ],
            "Resource": "arn:aws:resource-groups:*:*:group/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "resource-groups:CreateGroup",
                "resource-groups:Tag"
            ],
            "Resource": "arn:aws:resource-groups:*:*:group/*",
            "Condition": {
                "ForAnyValue:StringEquals": {
                    "aws:TagKeys": "sagemaker:collection"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": "resource-groups:DeleteGroup",
            "Resource": "arn:aws:resource-groups:*:*:group/*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/sagemaker:collection": "true"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": "tag:GetResources",
            "Resource": "*"
        }
    ]
}
```

------
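
When you adapt the policy above, a quick check that it still grants every required action can catch omissions early. The following sketch compares a policy document against the action list at the top of this topic. It matches action names literally and ignores wildcards, `Resource` scoping, and `Condition` blocks, so treat it as a sanity check only. All names in it are hypothetical.

```python
REQUIRED_ACTIONS = {
    "resource-groups:CreateGroup",
    "resource-groups:DeleteGroup",
    "resource-groups:GetGroupQuery",
    "resource-groups:ListGroupResources",
    "resource-groups:Tag",
    "tag:GetResources",
}


def granted_actions(policy_document):
    """Collect the action names listed in the Allow statements of a policy document."""
    actions = set()
    for statement in policy_document.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        action = statement.get("Action", [])
        # "Action" may be a single string or a list of strings.
        if isinstance(action, str):
            action = [action]
        actions.update(action)
    return actions


def missing_collection_actions(policy_document):
    """Return the required actions that the policy does not list, sorted by name."""
    return sorted(REQUIRED_ACTIONS - granted_actions(policy_document))
```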

# 建立集合
<a name="modelcollections-create"></a>

**重要**  
允許 Amazon SageMaker Studio 或 Amazon SageMaker Studio Classic 建立 Amazon SageMaker 資源的自訂 IAM 政策也必須授與許可，才能將標籤新增至這些資源。需要將標籤新增至資源的許可，因為 Studio 和 Studio Classic 會自動標記它們建立的任何資源。如果 IAM 政策允許 Studio 和 Studio Classic 建立資源，但不允許標記，則在嘗試建立資源時可能會發生 "AccessDenied" 錯誤。如需詳細資訊，請參閱[提供標記 SageMaker AI 資源的許可](security_iam_id-based-policy-examples.md#grant-tagging-permissions)。  
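The [AWS managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that grant permissions to create SageMaker resources already include permissions to add tags while creating those resources.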

您可以在 Amazon SageMaker Studio 主控台中建立集合。若要建立集合，請根據您是使用 Studio 還是 Studio Classic 完成下列步驟。

------
#### [ Studio ]

1. 請遵循[啟動 Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html) 中的指示來開啟 SageMaker Studio 主控台。

1. 在左側導覽窗格中選擇 **Models (模型)**。

1. 如果尚未選取，請選擇**已註冊模型**索引標籤。

1. 在**已註冊模型**索引標籤的正下方，選擇**集合**。

1. (選用) 若要在另一個集合內建立集合，請導覽至您要新增集合的階層。否則，您的集合將在根層級建立。

1. 在右上角的**動作**下拉式功能表中，選擇**建立新的集合**。

1. 在對話方塊的**名稱**欄位中輸入集合的名稱。
**注意**  
如果您計劃在此集合中建立多個階層，請保持集合名稱簡短。絕對路徑 (一個從根層級開始的字串，代表集合位置) 長度必須為 256 個字元或更短。如需其他詳細資訊，請參閱[集合和模型群組標記](modelcollections-limitations.md#modelcollections-tagging)。

1. (選用) 若要將模型群組新增至集合，請完成下列步驟：

   1. 選擇**選取模型群組**。

   1. 選取要新增的模型群組。您最多可以選取 10 個。

1. 選擇**建立**。

1. 檢查並確保集合是在目前階層建立的。如果您沒有立即看到新集合，請選擇**重新整理**。

------
#### [ Studio Classic ]

1. 登入 Amazon SageMaker Studio Classic。如需詳細資訊，請參閱[啟動 Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html)。

1. 在左側的導覽窗格中，選擇**首頁**圖示 (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png))。

1. 選擇**模型**，然後選擇**模型註冊表**。

1. 選擇**集合**標籤。

1. (選用) 若要在另一個集合內建立集合，請導覽至您要新增集合的階層。否則，您的集合將在根層級建立。

1. 在右上角的**動作**下拉式功能表中，選擇**建立新的集合**。

1. 在對話方塊的**名稱**欄位中輸入集合的名稱。
**注意**  
如果您計劃在此集合中建立多個階層，請保持集合名稱簡短。絕對路徑 (一個從根層級開始的字串，代表集合位置) 長度必須為 256 個字元或更短。如需其他詳細資訊，請參閱[集合和模型群組標記](modelcollections-limitations.md#modelcollections-tagging)。

1. (選用) 若要將模型群組新增至集合，請完成下列步驟：

   1. 選擇**選取模型群組**。

   1. 選取要新增的模型群組。您最多可以選取 10 個。

1. 選擇**建立**。

1. 檢查並確保集合是在目前階層建立的。如果您沒有立即看到新集合，請選擇**重新整理**。

------

# 將模型群組新增至集合
<a name="modelcollections-add-models"></a>

您可以在 Amazon SageMaker Studio 主控台中將模型群組新增至集合。若要將模型群組新增至集合，請根據您是使用 Studio 還是 Studio Classic 完成下列步驟。

------
#### [ Studio ]

1. 請遵循[啟動 Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html) 中的指示來開啟 SageMaker Studio 主控台。

1. 在左側導覽窗格中選擇 **Models (模型)**。

1. 如果尚未選取，請選擇**已註冊模型**索引標籤。

1. Directly below the **Registered models** tab label, choose **Model Groups**, if it is not already selected.

1. 選取您要新增的模型群組旁邊的核取方塊。最多可選取 10 個模型群組。如果您選取的模型群組超過 10 個，則將模型群組新增至集合的使用者介面選項會處於非作用中狀態。

1. 選擇**建立**旁邊的垂直省略符號，然後選擇**新增至集合**。

1. 為您要將所選模型群組新增至其中的集合選取選項按鈕。

1. 選擇**新增至集合**。

1. 檢查以確保模型群組已新增至集合。在您所選模型群組的**集合**欄中，您應該會看到您已將模型群組新增至其中的集合名稱。

------
#### [ Studio Classic ]

您可以透過**模型群組**或**集合**標籤將模型群組新增至集合。

若要從**集合**標籤將一或多個模型群組新增至集合，請完成下列步驟：

1. 登入 Amazon SageMaker Studio Classic。如需詳細資訊，請參閱[啟動 Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html)。

1. 在左側的導覽窗格中，選擇**首頁**圖示 (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png))。

1. 選擇**模型**，然後選擇**模型註冊表**。

1. 選擇**集合**標籤。

1. 選取要新增模型群組的集合。如果所需的集合不在根層級，請導覽至要新增模型群組的階層。

1. 在右上角的**動作**下拉式功能表中，選擇**新增模型群組**。

1. 選取要新增的模型群組。最多可選取 10 個模型群組。如果您選取的模型群組超過 10 個，則將模型群組新增至集合的使用者介面選項會處於非作用中狀態。

1. 選擇**新增至集合**。

1. 檢查並確保模型群組已新增至目前階層。如果您沒有立即看到新模型群組，請選擇**重新整理**。

若要從**模型群組**標籤將一或多個模型群組新增至集合，請完成下列步驟：

1. 登入 Studio Classic。如需詳細資訊，請參閱[Amazon SageMaker AI 網域概觀](gs-studio-onboard.md)。

1. 在左側的導覽窗格中，選擇**首頁**圖示 (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png))。

1. 選擇**模型**，然後選擇**模型註冊表**。

1. 選擇**模型群組**頁籤。

1. 選取要新增的模型群組。您最多可以選取 10 個。如果您選取的模型群組超過 10 個，則將模型群組新增至集合的使用者介面選項會處於非作用中狀態。

1. 在右上角的**動作**下拉式功能表中，選擇**新增至集合**。

1. 在快顯對話方塊中，選擇根路徑位置 `Collections`。根位置的連結會顯示在表格上方。

1. 導覽至包含目的地集合的階層，或者您要在其中建立新集合並在新集合中新增模型的位置。

1. (選用) 若要將您的模型群組新增至現有集合，請完成下列步驟：

   1. 選取目的地集合。

   1. 選擇**新增至集合**。

1. (選用) 若要將您的模型群組新增至新集合，請完成下列步驟：

   1. 選擇**新建集合**。

   1. 輸入新集合的名稱。

   1. 選擇**建立**。

------

# 從集合中移除模型群組或集合
<a name="modelcollections-remove-models"></a>

從集合中移除模型群組或集合時，系統會從特定群組中移除模型群組或集合，而不是從模型註冊表中移除。您可以在 Amazon SageMaker Studio 主控台中從集合中移除模型群組。

若要從集合中移除一個或多個模型群組或集合，請根據您是使用 Studio 還是 Studio Classic 完成下列步驟：

------
#### [ Studio ]

1. 請遵循[啟動 Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html) 中的指示來開啟 SageMaker Studio 主控台。

1. 在左側導覽窗格中選擇 **Models (模型)**。

1. 如果尚未選取，請選擇**已註冊模型**索引標籤。

1. 在**已註冊模型**索引標籤的正下方，選擇**集合**。

1. 導覽至包含要移除之模型群組或集合的集合。

1. 選取要移除的模型群組或集合。您最多可以選取 10 個。如果您選取的模型群組或集合超過 10 個，則移除模型群組或集合的使用者介面選項將處於非作用中狀態。
**重要**  
您無法同時選取要移除的模型群組和集合。若要同時移除模型群組和集合，請先移除模型群組，然後移除集合。
**重要**  
您無法移除非空的集合。若要移除非空白的集合，請先移除其內容。

1. 在右上角的**動作**下拉式功能表中，選擇**從集合中移除 X 個項目** (其中 X 是您選取的模型群組數量)。

1. 確認您要移除選取的模型群組。

------
#### [ Studio Classic ]

1. 登入 Amazon SageMaker Studio Classic。如需詳細資訊，請參閱[啟動 Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html)。

1. 在左側的導覽窗格中，選擇**首頁**圖示 (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png))。

1. 選擇**模型**，然後選擇**模型註冊表**。

1. 選擇**集合**標籤。

1. 導覽至包含要移除之模型群組或集合的集合。

1. 選取要移除的模型群組或集合。您最多可以選取 10 個。如果您選取的模型群組或集合超過 10 個，則移除模型群組或集合的使用者介面選項將處於非作用中狀態。
**重要**  
您無法同時選取要移除的模型群組和集合。若要同時移除模型群組和集合，請先移除模型群組，然後移除集合。
**重要**  
您無法移除非空的集合。若要移除非空白的集合，請先移除其內容。

1. 在右上角的**動作**下拉式功能表中，選擇**從集合中移除 X 個項目** (其中 X 是您選取的模型群組的數量)。

1. 確認您要移除選取的模型群組。

------

# 在集合之間移動模型群組
<a name="modelcollections-move-models"></a>

您可以在 Amazon SageMaker Studio 主控台中將一或多個模型群組從一個集合移至另一個集合。

若要移動模型群組，請根據您是使用 Studio 還是 Studio Classic 完成下列步驟。

------
#### [ Studio ]

1. 請遵循[啟動 Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html) 中的指示來開啟 SageMaker Studio 主控台。

1. 在左側導覽窗格中選擇 **Models (模型)**。

1. 如果尚未選取，請選擇**已註冊模型**索引標籤。

1. 在**已註冊模型**索引標籤的正下方，選擇**集合**。

1. 導覽至包含您要移動之模型群組的集合。

1. 選取要移動的模型群組。您最多可以選取 10 個。如果您選取的模型群組超過 10 個，則移動模型群組的使用者介面選項將處於非作用中狀態。

1. 在右上角的**動作**下拉式功能表中，選擇**移動到**。

1. 在對話方塊中，選擇根路徑位置 `Collections`。根位置的連結會顯示在表格上方。

1. 導覽至包含目的地集合的階層。

1. 在表格中選取目的地集合。

1. 選擇**移至此處**。

------
#### [ Studio Classic ]

1. 登入 Amazon SageMaker Studio Classic。如需詳細資訊，請參閱[啟動 Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html)。

1. 在左側的導覽窗格中，選擇**首頁**圖示 (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png))。

1. 選擇**模型**，然後選擇**模型註冊表**。

1. 選擇**集合**標籤。

1. 導覽至包含您要移動之模型群組的集合。

1. 選取要移動的模型群組。您最多可以選取 10 個。如果您選取的模型群組超過 10 個，則移動模型群組的使用者介面選項將處於非作用中狀態。

1. 在右上角的**動作**下拉式功能表中，選擇**移動到**。

1. 在對話方塊中，選擇根路徑位置 `Collections`。根位置的連結會顯示在表格上方。

1. 導覽至包含目的地集合的階層。

1. 在表格中選取目的地集合。

1. 選擇**移至此處**。

------

# 檢視模型群組的父集合
<a name="modelcollections-view-parent"></a>

您可以在 Amazon SageMaker Studio 主控台中檢視包含特定模型群組的集合。

若要檢視包含特定模型群組的集合，請根據您是使用 Studio 還是 Studio Classic 完成以下步驟。

------
#### [ Studio ]

1. 請遵循[啟動 Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html) 中的指示來開啟 SageMaker Studio 主控台。

1. 在左側導覽窗格中選擇 **Models (模型)**。

1. 如果尚未選取，請選擇**已註冊模型**索引標籤。

1. 如果尚未選取，請在**已註冊模型**索引標籤的正下方選擇**模型群組**。

1. 檢視模型群組的**集合**欄，此欄會顯示包含此模型群組之集合的名稱。如果多個集合包含此模型群組，請選擇**集合**欄項目以顯示快顯視窗，其中會列出包含此模型群組的集合。

------
#### [ Studio Classic ]

1. 登入 Amazon SageMaker Studio Classic。如需詳細資訊，請參閱[啟動 Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html)。

1. 在左側的導覽窗格中，選擇**首頁**圖示 (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png))。

1. 選擇**模型**，然後選擇**模型註冊表**。

1. 選擇**模型群組**頁籤。

1. 在表格中找到您的模型群組。

1. 檢視模型群組的**集合**欄，此欄會顯示包含此模型群組之集合的名稱。如果多個集合包含此模型群組，請選擇**集合**欄項目以顯示快顯視窗，其中會列出包含此模型群組的集合。

------

# 限制
<a name="modelcollections-limitations"></a>

使用集合時，您可能會遇到與集合作業的標籤長度限制或速率限制有關的問題。請檢閱下列警告清單，以便在使用集合時避免與這些限制相關的問題。

**VPC 限制**
+ VPC 模式不支援集合。

**集合作業限制**
+ 一次最多可以在一個集合中新增 10 個模型群組。
+ 一次最多可以從一個集合中移除 10 個模型群組。
+ 一次最多可以將一個集合中的 10 個模型群組移動至另一個集合。
+ 除非集合為空白，否則您無法刪除集合。
+ 模型群組可以屬於多個集合，但集合只能屬於一個集合。

**標記相關限制**
+ 一個模型群組最多可以屬於 48 個集合。如需詳細資訊，請參閱下一節：[集合和模型群組標記](#modelcollections-tagging)。
+ 集合絕對路徑的長度上限為 256 個字元。由於集合名稱是使用者指定的，因此您可以控制路徑長度。如需詳細資訊，請參閱下一節：[集合和模型群組標記](#modelcollections-tagging)。

## 集合和模型群組標記
<a name="modelcollections-tagging"></a>

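SageMaker Model Registry uses tag rules and tags to internally represent your collection groupings and hierarchy. You can access these tag elements in AWS Resource Access Manager, the SageMaker SDK, and the AWS CLI, but it is important that you do not alter or delete them.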

**重要**  
請勿刪除或變更屬於您的集合或模型群組的任何標記規則或標記。否則，系統會阻止您執行集合操作！

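A tag rule is a key-value pair that SageMaker AI uses to identify a collection's location in the hierarchy. In short, the key identifies the parent collection, and the value is the path of the collection within the hierarchy. SageMaker AI allows tag values of at most 256 characters, so if you have multiple nested hierarchies, keep your collection names short.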

**重要**  
保持集合名稱簡短。任何集合絕對路徑的長度都必須為 256 個字元或更短。

另一方面，模型群組沒有標記規則，而是使用標記。模型群組的標記包括適用於包含此模型群組之所有集合的標記規則。例如，如果四個集合包含*模型群組-1*，則*模型群組-1* 會有四個標記。SageMaker AI 允許單一 AWS 資源最多有 50 個標籤。由於系統預先配置兩個標記用於一般用途，因此一個模型群組最多可以有 48 個標記。總之，一個模型群組最多可以屬於 48 個集合。
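The two limits described above (a 256-character absolute path and 48 collections per model group) can be checked before you attempt a collection operation. The following is a minimal sketch; the helper names and the `/` path separator are assumptions made for illustration and are not part of any SageMaker AI API.

```python
# Limits taken from the text above.
MAX_TAG_VALUE_LENGTH = 256      # maximum length of a collection's absolute path
MAX_TAGS_PER_RESOURCE = 50      # maximum number of tags on one AWS resource
RESERVED_TAGS = 2               # tags preallocated for general use
MAX_COLLECTIONS_PER_GROUP = MAX_TAGS_PER_RESOURCE - RESERVED_TAGS


def collection_path(*names: str) -> str:
    """Join collection names into a path string (the '/' separator is an assumption)."""
    return "/".join(names)


def path_fits(path: str) -> bool:
    """Return True if the absolute path fits within the tag value limit."""
    return len(path) <= MAX_TAG_VALUE_LENGTH


def can_join_another_collection(current_collection_count: int) -> bool:
    """Return True if a model group can be added to one more collection."""
    return current_collection_count < MAX_COLLECTIONS_PER_GROUP
```

A check like `path_fits(collection_path("team-a", "fraud", "prod"))` before creating a deeply nested collection can help you catch names that are too long early.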

# SageMaker AI 中的模型部署
<a name="model-deploy-mlops"></a>

一旦您訓練並核准用於生產的模型，請使用 SageMaker AI 將模型部署到端點以進行即時推論。SageMaker AI 提供多個推論選項，以便您可以挑選最適合自己工作負載的選項。您也可以透過選擇執行個體類型和執行個體數量來設定端點，以獲得最佳效能。如需與模型部署相關的詳細資訊，請參閱[部署用於推論的模型](deploy-model.md)。


將模型部署到生產環境之後，您可能想要探索進一步最佳化模型效能的方法，同時維持目前模型的可用性。例如，您可以設定陰影測試，在提交變更之前嘗試不同的模型或模型服務基礎設施。SageMaker AI 會以陰影模式部署新模型、容器或執行個體，並在相同端點內為其即時路由傳送一份推論請求副本。您可以記錄陰影變體的回應，以便進行比較。如需與陰影測試相關的詳細資訊，請參閱[陰影測試](shadow-tests.md)。如果您決定繼續變更模型，可以透過部署防護機制控制從目前模型到新模型的切換。您能為流量轉移程序選取藍/綠或 Canary 測試等方法，以便在更新期間維持精細控制。如需與部署防護機制相關的詳細資訊，請參閱[部署防護機制以更新生產環境中的模型](deployment-guardrails.md)。
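As an illustration of the instance type, instance count, and shadow variant choices described above, the following sketch builds a request for the SageMaker `CreateEndpointConfig` API. The function name, variant names, default instance type, and model names are hypothetical placeholders. The sketch only builds the request dictionary and does not call AWS.

```python
from typing import Any, Dict, Optional


def build_endpoint_config(
    config_name: str,
    model_name: str,
    instance_type: str = "ml.m5.xlarge",   # placeholder instance type
    instance_count: int = 1,
    shadow_model_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the CreateEndpointConfig API."""
    request: Dict[str, Any] = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "production",   # hypothetical variant name
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
                "InitialVariantWeight": 1.0,
            }
        ],
    }
    if shadow_model_name is not None:
        # The shadow variant receives a copy of the inference requests.
        request["ShadowProductionVariants"] = [
            {
                "VariantName": "shadow",       # hypothetical variant name
                "ModelName": shadow_model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
                "InitialVariantWeight": 1.0,
            }
        ]
    return request
```

With credentials configured, you could pass the result to `boto3.client("sagemaker").create_endpoint_config(**request)` and then create or update an endpoint that references the configuration.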

# SageMaker Model Monitor
<a name="model-monitor-mlops"></a>

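After your model is deployed in production, you can monitor its performance in real time with Amazon SageMaker Model Monitor. Model Monitor helps you maintain model quality by detecting violations of user-defined thresholds for data quality, model quality, bias drift, and feature attribution drift. You can also configure alerts so that you can troubleshoot violations as they arise and promptly initiate retraining. Model Monitor is integrated with SageMaker Clarify to improve visibility into potential bias.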

如需與 SageMaker Model Monitor 相關的詳細資訊，請參閱[使用 Amazon SageMaker Model Monitor 進行資料和模型品質監控](model-monitor.md)。

# 使用 SageMaker 專案進行 MLOps 自動化
<a name="sagemaker-projects"></a>

使用 SageMaker 專案搭配 CI/CD 建立端對端 ML 解決方案。

使用 SageMaker 專案建立 MLOps 解決方案，以進行協調和管理：
+ 建置用於處理、訓練和推論的自訂映像
+ 資料準備與特徵工程
+ 訓練模型
+ 評估模型
+ 部署模型
+ 監控和更新模型

**Topics**
+ [什麼是 SageMaker AI 專案？](sagemaker-projects-whatis.md)
+ [授予使用專案所需的 SageMaker Studio 許可](sagemaker-projects-studio-updates.md)
+ [使用 Amazon SageMaker Studio 或 Studio Classic 建立 MLOps 專案](sagemaker-projects-create.md)
+ [MLOps 專案範本](sagemaker-projects-templates.md)
+ [檢視專案資源](sagemaker-projects-resources.md)
+ [在 Amazon SageMaker Studio 或 Studio Classic 中更新 MLOps 專案](sagemaker-projects-update.md)
+ [使用 Amazon SageMaker Studio 或 Studio Classic 刪除 MLOps 專案](sagemaker-projects-delete.md)
+ [使用第三方 Git 儲存庫演練 SageMaker MLOps AI 專案](sagemaker-projects-walkthrough-3rdgit.md)

# 什麼是 SageMaker AI 專案？
<a name="sagemaker-projects-whatis"></a>

SageMaker 專案可協助組織為資料科學家建立開發人員環境並實現標準化，以及為 MLOps 工程師建立 CI/CD 系統。專案也可協助組織設定相依性管理、程式碼儲存庫管理、建置可重複性，以及成品共用。

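You can provision SageMaker projects by using custom templates stored in an Amazon S3 bucket, or by using templates from AWS Service Catalog or SageMaker AI. For information about AWS Service Catalog, see [What Is AWS Service Catalog](https://docs.aws.amazon.com/servicecatalog/latest/dg/what-is-service-catalog.html). With SageMaker projects, MLOps engineers and organization administrators can define their own templates or use the templates that SageMaker AI provides. The SageMaker AI-provided templates bootstrap the ML workflow with source version control, automated ML pipelines, and a set of code so that you can quickly start iterating on ML use cases.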

## 何時應該使用 SageMaker AI 專案？
<a name="sagemaker-projects-when"></a>

**重要**  
自 2024 年 9 月 9 日起，不再支援使用 AWS CodeCommit 儲存庫的專案範本。對於新專案，請從使用第三方 Git 儲存庫的可用專案範本中選取。

筆記本對於建立模型和實驗很有幫助，但共用程式碼的資料科學家和機器學習 (ML) 工程師團隊需要一種可擴展性更高的方式來維持程式碼一致性和嚴格的版本控制。

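Every organization has its own set of standards and practices that provide security and governance for its AWS environment. SageMaker AI provides a set of first-party templates for organizations that want to quickly get started with ML workflows and CI/CD. The templates include projects that use AWS-native services for CI/CD, such as AWS CodeBuild, AWS CodePipeline, and AWS CodeCommit. The templates also offer the option to create projects that use third-party tools, such as Jenkins and GitHub. For a list of the project templates that SageMaker AI provides, see [Use SageMaker AI-Provided Project Templates](sagemaker-projects-templates-sm.md).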

組織通常需要嚴格控制其佈建和管理的 MLOps 資源。此類責任需要承擔某些任務，包括設定 IAM 角色和政策、強制執行資源標籤、強制執行加密，以及跨多個帳戶對資源進行解耦。SageMaker Projects 可以透過自訂範本產品支援所有這些任務，其中組織會使用 CloudFormation 範本來定義 ML 工作流程所需的資源。資料科學家可以選擇範本來引導和預先設定其機器學習 (ML) 工作流程。

若要開始使用，建議您在 Amazon S3 儲存貯體中建立和存放自訂範本。這樣做可讓您在組織支援的任何區域中建立儲存貯體。S3 支援版本控制，因此您可以維護範本的多個版本，並視需要轉返。如需如何從 Amazon S3 儲存貯體中的範本存放區建立專案的詳細資訊，請參閱 [使用 Amazon S3 儲存貯體中的範本](sagemaker-projects-templates-custom.md#sagemaker-projects-templates-s3)。

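Alternatively, you can create custom templates as Service Catalog products and provision them in the Studio or Studio Classic UI under **Organization Templates**. Service Catalog is a service that helps organizations create and manage catalogs of products that are approved for use on AWS. For more information about creating custom templates, see [Build Custom SageMaker AI Project Templates - Best Practices](https://aws.amazon.com/blogs/machine-learning/build-custom-sagemaker-project-templates-best-practices/).

Although you can use either option, we recommend S3 buckets over Service Catalog, because you can create a bucket in any supported Region where SageMaker AI is available without having to manage the complexities of Service Catalog.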

SageMaker 專案可協助您管理 Git 儲存庫，從而提高跨團隊共同作業效率、確保程式碼一致性，並支援 CI/CD。SageMaker 專案可協助您完成下列任務：
+ 在一個專案下整理機器學習 (ML) 生命週期的所有實體。
+ 為模型訓練和部署建立一鍵式方法來設定標準機器學習 (ML) 基礎設施，並將最佳實務納入其中。
+ 建立和共用機器學習 (ML) 基礎設施範本，以處理多個使用案例。
+ 利用 SageMaker AI 提供的預先建置範本，快速開始專注於模型建置，或使用組織特定的資源和指南建立自訂範本。
+ Integrate with tools of your choice by extending the project templates. For an example, see [Create a SageMaker AI project to integrate with GitLab and GitLab Pipelines](https://aws.amazon.com/blogs/machine-learning/build-mlops-workflows-with-amazon-sagemaker-projects-gitlab-and-gitlab-pipelines/).

## SageMaker AI 專案中有什麼？
<a name="sagemaker-projects-within"></a>

客戶可以靈活地使用最適合其使用案例的資源來設定專案。以下範例展示機器學習工作流程的 MLOps 設定，包括模型訓練和部署。

![\[\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/projects/projects-ml-workflow.png)


採用 SageMaker AI 提供之範本的典型專案可能包括下列項目：
+ 一個或多個包含範例程式碼的儲存庫，用於建置和部署機器學習 (ML) 解決方案。這些都是工作範例，您可以進行修改以符合您的需求。此程式碼歸您所有，您可以利用版本控制的儲存庫來完成自己的任務。
+ SageMaker AI 管道定義資料準備、訓練、模型評估和模型部署的步驟，如下圖所示。  
![\[\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/projects/pipeline-in-project-simple.png)
+ CodePipeline 或 Jenkins 管道，每次您簽入新版本的程式碼時會執行 SageMaker AI 管道。如需 CodePipeline 的資訊，請參閱[什麼是 AWS CodePipeline。](https://docs.aws.amazon.com/codepipeline/latest/userguide/welcome.html)如需與 Jenkins 相關的資訊，請參閱 [Jenkins 使用者文件](https://www.jenkins.io/doc/)。
+ 包含模型版本的模型群組。每次您核准從 SageMaker AI 管道執行產生的模型版本時，都可以將其部署到 SageMaker AI 端點。

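Each SageMaker AI project has a unique name and ID that are applied as tags to all of the SageMaker AI and AWS resources created in the project. With the name and ID, you can view all entities associated with your project. These include: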
+ 管道
+ 已註冊模型
+ 已部署模型 (端點)
+ 資料集
+ Service Catalog 產品
+ CodePipeline 和 Jenkins 管道
+ CodeCommit 和第三方 Git 儲存庫
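Because the project name and ID are applied as tags, you can filter resources by those tags. The following is a minimal sketch that checks whether a resource's tag list belongs to a given project. The tag keys `sagemaker:project-name` and `sagemaker:project-id` are an assumption here, so verify them against the tags on your own resources. The tag list has the shape returned by the SageMaker `ListTags` API.

```python
from typing import Dict, List

# Assumed tag keys; confirm against the tags on your own project resources.
PROJECT_NAME_TAG = "sagemaker:project-name"
PROJECT_ID_TAG = "sagemaker:project-id"


def belongs_to_project(
    tags: List[Dict[str, str]], project_name: str, project_id: str
) -> bool:
    """Return True if a list of {'Key': ..., 'Value': ...} tags matches the project."""
    tag_map = {tag["Key"]: tag["Value"] for tag in tags}
    return (
        tag_map.get(PROJECT_NAME_TAG) == project_name
        and tag_map.get(PROJECT_ID_TAG) == project_id
    )
```

You could obtain the `tags` argument with `boto3.client("sagemaker").list_tags(ResourceArn=...)["Tags"]` for each candidate resource.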

## 是否必須建立專案才能使用 SageMaker AI 管道？
<a name="sagemaker-projects-need"></a>

否。SageMaker 管道是獨立實體，就像訓練任務、處理任務和其他 SageMaker AI 任務一樣。您可以使用 SageMaker Python SDK 直接在筆記本中建立、更新和執行管道，而無需使用 SageMaker AI 專案。

專案提供額外一個層級，可協助您整理程式碼，並採用生產品質系統所需的作業最佳實務。

# 授予使用專案所需的 SageMaker Studio 許可
<a name="sagemaker-projects-studio-updates"></a>

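Amazon SageMaker Studio (or Studio Classic) administrators and the Studio (or Studio Classic) users that you add to your domain can view the project templates that SageMaker AI provides and create projects with those templates. By default, administrators can view the SageMaker AI templates in the Service Catalog console. Administrators can see what another user creates if that user has permission to use SageMaker projects. Administrators can also view, in the Service Catalog console, the CloudFormation template that a SageMaker AI project template defines. For information about Service Catalog, see [What Is Service Catalog](https://docs.aws.amazon.com/servicecatalog/latest/adminguide/introduction.html) in the *Service Catalog User Guide*.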

網域的 Studio (和 Studio Classic) 使用者若設定為使用與網域相同的執行角色，預設會具有使用 SageMaker AI 專案範本建立專案的許可。

**重要**  
請勿手動建立角色。始終使用以下程序中描述的步驟透過 **Studio 設定**建立角色。

對於使用網域執行角色以外的任何角色，來檢視和使用 SageMaker AI 提供的專案範本的使用者，您需要將**專案**許可授予個別使用者設定檔，方法是在您將 Studio 使用者新增至網域時，為這些使用者開啟**啟用 Amazon SageMaker AI 專案範本和 Amazon SageMaker JumpStart**。如需此步驟的相關資訊，請參閱 [新增使用者設定檔](domain-user-profile-add.md)。

由於 SageMaker 專案由 Service Catalog 提供支援，因此您必須將需要存取 SageMaker 專案的每個角色新增至服務目錄中的 **Amazon SageMaker AI 解決方案和 ML Ops 產品**組合。您可以在**群組、角色和使用者**索引標籤中執行此操作，如下列影像所示。如果 Studio Classic 中的每個使用者設定檔都有不同的角色，您應該將每個角色新增至服務目錄。您也可以在 Studio Classic 中建立使用者設定檔時執行此操作。

## 授予新網域角色專案的存取權
<a name="sagemaker-projects-grant-access"></a>

當您變更網域的執行角色或新增具有不同角色的使用者設定檔時，您必須授予這些新角色 Service Catalog 產品組合的存取權，才能使用 SageMaker 專案。請遵循下列步驟以確保所有角色都有必要的許可：

**授予新網域角色專案的存取權**

1. 開啟 [Service Catalog 主控台](https://console.aws.amazon.com/servicecatalog/)。

1. 在左側導覽功能表中，選擇**產品組合**。

1. 選取**已匯入**區段。

1. 選取 **Amazon SageMaker 解決方案和 ML Ops 產品**。

1. 選擇**存取**索引標籤。

1. 選擇 **Grant access (授與存取權)**。

1. 在**授予存取權**對話方塊中，選取**角色**。

1. 將存取權授予網域使用者設定檔使用的所有角色，包括：
   + 網域的執行角色
   + 指派給個別使用者設定檔的任何自訂執行角色

1. 選擇**授予存取權**以確認。

**重要**  
每當您變更網域的執行角色，或使用新的執行角色新增使用者設定檔時，都必須完成此程序。若沒有此存取權，使用者將無法建立或使用 SageMaker 專案。
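The console steps above can also be scripted. The following sketch builds one request per role for the Service Catalog `AssociatePrincipalWithPortfolio` API. The function name, portfolio ID, and role ARNs are hypothetical placeholders. The sketch only prepares the requests and does not call AWS.

```python
from typing import Dict, Iterable, List


def portfolio_access_requests(
    portfolio_id: str, role_arns: Iterable[str]
) -> List[Dict[str, str]]:
    """Build one AssociatePrincipalWithPortfolio request per unique role ARN."""
    seen = set()
    requests = []
    for arn in role_arns:
        if arn in seen:
            continue  # skip roles shared by several user profiles
        seen.add(arn)
        requests.append(
            {
                "PortfolioId": portfolio_id,
                "PrincipalARN": arn,
                "PrincipalType": "IAM",
            }
        )
    return requests
```

With credentials configured, you could send each request with `boto3.client("servicecatalog").associate_principal_with_portfolio(**request)`. Include the domain execution role and every custom execution role assigned to a user profile.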

下列程序展示如何在您加入 Studio 或 Studio Classic 之後授予**專案**許可。如需加入 Studio 或 Studio Classic 的更多詳細資訊，請參閱[Amazon SageMaker AI 網域概觀](gs-studio-onboard.md)。

**若要確認您的 SageMaker AI 網域具有作用中的專案範本許可：**

1. 開啟 [SageMaker AI 主控台](https://console.aws.amazon.com/sagemaker/)。

1. 在左側導覽窗格中，選擇**管理員組態**。

1. 在**管理員組態**下，選擇**網域**。

1. 選擇網域。

1. 選擇**網域設定**標籤。

1. 在 **SageMaker 專案和 JumpStart** 下，確定已開啟下列選項：
   + **為此帳戶啟用 Amazon SageMaker AI 專案範本和 Amazon SageMaker JumpStart**
   + **為 Studio 使用者啟用 Amazon SageMaker AI 專案範本和 Amazon SageMaker JumpStart**

**檢視角色清單：**

1. 開啟 [SageMaker AI 主控台](https://console.aws.amazon.com/sagemaker/)。

1. 在左側導覽窗格中，選擇**管理員組態**。

1. 在**管理員組態**下，選擇**網域**。

1. 選擇網域。

1. 選擇**網域設定**標籤。

1. Your list of roles appears in the `Apps` card under the **Studio** tab.
**Important**  
As of July 25, additional roles are required to use project templates. Here is the complete list of roles that you should see under `Projects`:  
`AmazonSageMakerServiceCatalogProductsLaunchRole` `AmazonSageMakerServiceCatalogProductsUseRole` `AmazonSageMakerServiceCatalogProductsApiGatewayRole` `AmazonSageMakerServiceCatalogProductsCloudformationRole` `AmazonSageMakerServiceCatalogProductsCodeBuildRole` `AmazonSageMakerServiceCatalogProductsCodePipelineRole` `AmazonSageMakerServiceCatalogProductsEventsRole` `AmazonSageMakerServiceCatalogProductsFirehoseRole` `AmazonSageMakerServiceCatalogProductsGlueRole` `AmazonSageMakerServiceCatalogProductsLambdaRole` `AmazonSageMakerServiceCatalogProductsExecutionRole`  
For descriptions of these roles, see [AWS managed policies for SageMaker projects and JumpStart](security-iam-awsmanpol-sc.md).

# 使用 Amazon SageMaker Studio 或 Studio Classic 建立 MLOps 專案
<a name="sagemaker-projects-create"></a>

**重要**  
允許 Amazon SageMaker Studio 或 Amazon SageMaker Studio Classic 建立 Amazon SageMaker 資源的自訂 IAM 政策也必須授與許可，才能將標籤新增至這些資源。需要將標籤新增至資源的許可，因為 Studio 和 Studio Classic 會自動標記它們建立的任何資源。如果 IAM 政策允許 Studio 和 Studio Classic 建立資源，但不允許標記，則在嘗試建立資源時可能會發生 "AccessDenied" 錯誤。如需詳細資訊，請參閱[提供標記 SageMaker AI 資源的許可](security_iam_id-based-policy-examples.md#grant-tagging-permissions)。  
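The [AWS managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that grant permissions to create SageMaker resources already include permissions to add tags while creating those resources.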
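To illustrate the tagging requirement in the note above, the following sketch assembles a custom IAM policy document that pairs resource-creation actions with the `sagemaker:AddTags` action. The function name, the statement IDs, and the broad `"Resource": "*"` scope are assumptions for illustration only. Scope the resources down for real use.

```python
import json
from typing import Any, Dict, Iterable


def studio_policy_with_tagging(create_actions: Iterable[str]) -> Dict[str, Any]:
    """Build a policy that allows the given create actions plus tagging."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CreateResources",      # hypothetical statement ID
                "Effect": "Allow",
                "Action": sorted(create_actions),
                "Resource": "*",               # narrow this in a real policy
            },
            {
                # Without this statement, creation can fail with AccessDenied
                # because Studio tags the resources that it creates.
                "Sid": "TagOnCreate",          # hypothetical statement ID
                "Effect": "Allow",
                "Action": ["sagemaker:AddTags"],
                "Resource": "*",               # narrow this in a real policy
            },
        ],
    }


policy = studio_policy_with_tagging(
    ["sagemaker:CreateProject", "sagemaker:CreateModelPackageGroup"]
)
policy_json = json.dumps(policy, indent=2)
```

The resulting `policy_json` string can be pasted into the IAM console policy editor.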

This procedure demonstrates how to create an MLOps project by using Amazon SageMaker Studio or Studio Classic.

**先決條件**
+ 登入 Studio 或 Studio Classic 的 IAM 帳戶或 IAM Identity Center。如需詳細資訊，請參閱[Amazon SageMaker AI 網域概觀](gs-studio-onboard.md)。
+ 使用 SageMaker AI 提供的專案範本的許可。如需詳細資訊，請參閱[授予使用專案所需的 SageMaker Studio 許可](sagemaker-projects-studio-updates.md)。
+ 對 Studio Classic 使用者介面的基本熟悉程度。如需詳細資訊，請參閱[Amazon SageMaker Studio Classic UI 概觀](studio-ui.md)。

------
#### [ Studio ]

1. 請遵循[啟動 Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html) 中的指示來開啟 SageMaker Studio 主控台。

1. 在左側導覽窗格中，選擇**部署**，然後選擇**專案**。

1. 在專案清單的右上角，選擇**建立專案**。

1. 在**範本**頁面中，選擇要用於專案的範本。如需專案範本的詳細資訊，請參閱[MLOps 專案範本](sagemaker-projects-templates.md)。

1. 選擇**下一步**。

1. 在**專案詳細資訊**頁面中，輸入以下資訊：
   + **名稱**：您專案的名稱。
   + **描述**：您專案的選用描述。
   + 與所選範本相關的 Service Catalog 佈建參數的值。

1. 選擇**建立專案**，然後等待專案在**專案**清單中顯示。

1. (選用) 在 Studio 側邊欄中，選擇**管道**以檢視從您專案建立的管道。如需 Pipelines 的詳細資訊，請參閱[管道](pipelines.md)。

------
#### [ Studio Classic ]

1. 登入 Studio Classic。如需詳細資訊，請參閱[Amazon SageMaker AI 網域概觀](gs-studio-onboard.md)。

1. 在 Studio Classic 側邊欄中，選擇**首頁**圖示 (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png))。

1. 從功能表中選取**部署**，然後選取**專案**。

1. 選擇**建立專案**。

   系統隨即會開啟**建立專案**標籤，顯示可用範本的清單。

1. 如果尚未選取，請選擇 **SageMaker AI 範本**。如需專案範本的詳細資訊，請參閱[MLOps 專案範本](sagemaker-projects-templates.md)。

1. 選擇範本**模型建置、訓練和部署**。

1. 選擇**選取專案範本**。

   **建立專案**標籤會變更，以顯示**專案詳細資訊**。

1. 輸入下列資訊：
   + 對於**專案詳細資料**，輸入專案的名稱和說明。
   + (可選) 新增標籤，即用來追蹤專案的鍵值對。

1. 選擇**建立專案**，然後等待專案在**專案**清單中顯示。

------

# MLOps 專案範本
<a name="sagemaker-projects-templates"></a>

Amazon SageMaker AI 專案範本可自動為您的專案設定和實作 MLOps。SageMaker AI 專案範本是 SageMaker AI 提供給 Amazon SageMaker Studio (或 Studio Classic) 使用者的 Service Catalog 產品。在您加入或更新 Amazon SageMaker Studio (或 Studio Classic) 並啟用許可之後，這些 Service Catalog 產品會顯示在您的 Service Catalog 主控台中。如需啟用使用 SageMaker AI 專案範本的許可的相關資訊，請參閱[授予使用專案所需的 SageMaker Studio 許可](sagemaker-projects-studio-updates.md)。使用 SageMaker AI 專案範本建立一個專案，即端對端 MLOps 解決方案。

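You can use a SageMaker project template to implement image-building CI/CD. With this template, you can automate the CI/CD of images that are built and pushed to Amazon ECR. Changes to the container files in your project's source control repositories initiate the ML pipeline and deploy the latest version of your container. For more information, see the blog post [Create Amazon SageMaker projects with image building CI/CD pipelines](https://aws.amazon.com/blogs/machine-learning/create-amazon-sagemaker-projects-with-image-building-ci-cd-pipelines/).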

如果您是管理員，可以從頭開始建立自訂專案範本，也可以修改 SageMaker AI 提供的其中一個專案範本。組織中的 Studio (或 Studio Classic) 使用者可以使用這些自訂專案範本來建立專案。

**Topics**
+ [使用 SageMaker AI 提供的專案範本](sagemaker-projects-templates-sm.md)
+ [建立自訂專案範本](sagemaker-projects-templates-custom.md)

# 使用 SageMaker AI 提供的專案範本
<a name="sagemaker-projects-templates-sm"></a>

**重要**  
自 2024 年 10 月 28 日起， AWS CodeCommit 範本已移除。對於新專案，請從使用第三方 Git 儲存庫的可用專案範本中選取。

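Amazon SageMaker AI provides project templates that create the infrastructure you need to build an MLOps solution for continuous integration and continuous deployment (CI/CD) of ML models. Use these templates to process data, extract features, train and test models, register the models in SageMaker Model Registry, and deploy the models for inference. You can customize the seed code and the configuration files to suit your requirements.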

**注意**  
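Additional roles are required to use project templates. For the complete list of required roles and instructions on how to create them, see [Granting SageMaker Studio Permissions Required to Use Projects](sagemaker-projects-studio-updates.md). If you do not have the new roles, you get the error message **CodePipeline is not authorized to perform AssumeRole on role arn:aws:iam::xxx:role/service-role/AmazonSageMakerServiceCatalogProductsCodePipelineRole** when you try to create a new project, and you cannot proceed.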

SageMaker AI 專案範本為您提供下列程式碼儲存庫、工作流程自動化工具和管道階段選項：
+ **Code repositories**: Third-party Git repositories such as GitHub and Bitbucket
+ **CI/CD workflow automation**: AWS CodePipeline or Jenkins
+ **Pipeline stages**: Model building and training, model deployment, or both

下列討論內容概述您在建立 SageMaker AI 專案時可以選擇的每個範本。您也可以遵循[專案演練](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-walkthrough.html)中的[建立專案](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-walkthrough.html#sagemaker-proejcts-walkthrough-create)來檢視 Studio (或 Studio Classic) 中的可用範本。

如需有關如何建立真實專案的逐步指示，您可以遵循其中一個專案演練：
+ 如果要使用範本 [使用 CodePipeline 搭配第三方 Git 儲存庫進行模型建置、訓練和部署的 MLOps 範本](#sagemaker-projects-templates-git-code-pipeline)，請參閱[使用第三方 Git 儲存庫演練 SageMaker MLOps AI 專案](sagemaker-projects-walkthrough-3rdgit.md)。
+ 如果您想要使用範本 [使用 Jenkins 搭配第三方 Git 儲存庫進行模型建置、訓練和部署的 MLOps 範本](#sagemaker-projects-templates-git-jenkins)，請參閱[使用第三方原始檔控制和 Jenkins 建立 Amazon SageMaker 專案](https://aws.amazon.com/blogs/machine-learning/create-amazon-sagemaker-projects-using-third-party-source-control-and-jenkins/)。

## 使用 CodePipeline 搭配第三方 Git 儲存庫進行模型建置、訓練和部署的 MLOps 範本
<a name="sagemaker-projects-templates-git-code-pipeline"></a>
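+ **Code repository**: Third-party Git.
**Note**  
Establish the AWS CodeStar connection from your AWS account to your GitHub user or organization. Add a tag with the key `sagemaker` and the value `true` to this AWS CodeStar connection.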
+ **CI/CD 工作流程自動化**： AWS CodePipeline
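The tag described in the note above can also be added programmatically. The following sketch builds a request for the AWS CodeStar Connections `TagResource` API and includes a small check for whether an existing tag list already carries the required tag. The function names and the connection ARN are hypothetical. The sketch does not call AWS.

```python
from typing import Any, Dict, List

REQUIRED_TAG = {"Key": "sagemaker", "Value": "true"}


def connection_tag_request(connection_arn: str) -> Dict[str, Any]:
    """Build keyword arguments for the CodeStar Connections TagResource API."""
    return {"ResourceArn": connection_arn, "Tags": [dict(REQUIRED_TAG)]}


def has_sagemaker_tag(tags: List[Dict[str, str]]) -> bool:
    """Return True if the tag list already contains sagemaker=true."""
    return any(
        tag["Key"] == REQUIRED_TAG["Key"]
        and tag["Value"].lower() == REQUIRED_TAG["Value"]
        for tag in tags
    )
```

With credentials configured, you could apply the tag with `boto3.client("codestar-connections").tag_resource(**connection_tag_request(arn))`.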

### 模型建置和訓練
<a name="sagemaker-projects-templates-git-code-pipeline-building-training"></a>

此範本提供以下資源：
+ 與客戶指定的 Git 儲存庫建立關聯。儲存庫包含使用 Python 程式碼建立 Amazon SageMaker AI Pipeline 的範例程式碼，並顯示如何建立和更新 SageMaker AI 管道。這個儲存庫還有一個可以在 Studio (或 Studio Classic) 中開啟和執行的範例 Python 筆記本。
+ 具有來源和建置步驟的 AWS CodePipeline 管道。來源步驟指向第三方 Git 儲存庫。建置步驟會從儲存庫取得程式碼、建立和更新 SageMaker AI 管道、啟動管道執行，以及等待管道執行完成。
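+ An AWS CodeBuild project to populate the Git repositories with the seed code information. This requires an AWS CodeStar connection from your AWS account to your account on the Git repository host.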
+ 用於存放成品的 Amazon S3 儲存貯體，包括 CodePipeline 和 CodeBuild 成品，以及從 SageMaker AI 管道執行產生的任何成品。

### 模型部署
<a name="sagemaker-projects-templates-git-code-pipeline-deployment"></a>

此範本提供以下資源：
+ 與客戶指定的 Git 儲存庫建立關聯。儲存庫包含將模型部署到暫存環境和生產環境中端點的範例程式碼。
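+ An AWS CodePipeline pipeline that has source, build, deploy-to-staging, and deploy-to-production steps. The source step points to the third-party Git repository, and the build step gets the code from that repository and generates the CloudFormation stacks to deploy. The deploy-to-staging and deploy-to-production steps deploy the CloudFormation stacks to their respective environments. There is a manual approval step between the staging and production steps, so an MLOps engineer must approve the model before it is deployed to production.
+ An AWS CodeBuild project to populate the Git repositories with the seed code information. This requires an AWS CodeStar connection from your AWS account to your account on the Git repository host.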
+ 用於存放成品的 Amazon S3 儲存貯體，包括 CodePipeline 和 CodeBuild 成品，以及從 SageMaker AI 管道執行產生的任何成品。

### 模型建置、訓練和部署
<a name="sagemaker-projects-templates-git-code-pipeline-building-training-deployment"></a>

此範本提供以下資源：
+ 與一或多個客戶指定的 Git 儲存庫建立關聯。
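+ An AWS CodePipeline pipeline that has source, build, deploy-to-staging, and deploy-to-production steps. The source step points to the third-party Git repository, and the build step gets the code from that repository and generates the CloudFormation stacks to deploy. The deploy-to-staging and deploy-to-production steps deploy the CloudFormation stacks to their respective environments. There is a manual approval step between the staging and production steps, so an MLOps engineer must approve the model before it is deployed to production.
+ An AWS CodeBuild project to populate the Git repositories with the seed code information. This requires an AWS CodeStar connection from your AWS account to your account on the Git repository host.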
+ 用於存放成品的 Amazon S3 儲存貯體，包括 CodePipeline 和 CodeBuild 成品，以及從 SageMaker AI 管道執行產生的任何成品。

如前所述，請參閱[使用第三方 Git 儲存庫的專案演練](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-walkthrough-3rdgit.html)，以取得使用此範本建立真實專案的示範。

## 使用 CodePipeline 進行模型建置、訓練、部署和 Amazon SageMaker Model Monitor 的 MLOps 範本
<a name="sagemaker-projects-template-model-monitor"></a>
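+ **Code repository**: Third-party Git.
**Note**  
Establish the AWS CodeStar connection from your AWS account to your GitHub user or organization. Add a tag with the key `sagemaker` and the value `true` to this AWS CodeStar connection.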
+ **CI/CD 工作流程自動化**： AWS CodePipeline

下列範本包含額外的 Amazon SageMaker Model Monitor 範本，其中提供下列類型的監控：
+ [資料品質](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-data-quality.html) - 監控資料品質的漂移。
+ [模型品質](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality.html) - 監控模型品質指標中的漂移，例如準確性。
+ [生產中模型的偏差漂移](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-monitor-bias-drift.html) - 監控模型預測中的偏差。

### 模型建置、訓練、部署和 Amazon SageMaker Model Monitor
<a name="sagemaker-projects-template-model-monitor-training-deployment-model-monitor"></a>

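This template is an extension of the MLOps template for model building, training, and deployment with Git repositories using CodePipeline. It includes the model building, training, and deployment components of that template, plus an additional Amazon SageMaker Model Monitor template that provides the types of monitoring listed earlier in this section.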

### 監控已部署的模型
<a name="sagemaker-projects-template-model-monitor-deploy"></a>

這個 MLOps 解決方案範本用於部署一個或多個 Amazon SageMaker AI 資料品質、模型品質、模型偏差和模型可解釋性監視器，能夠監控 SageMaker AI 推論端點上已部署的模型。此範本提供以下資源：
+ 與一或多個客戶指定的 Git 儲存庫建立關聯。儲存庫包含範例 Python 程式碼，其可從 Amazon SageMaker 模型註冊表取得監視器使用的[基準](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-create-baseline.html)，並更新暫存和生產環境的範本參數。它還包含用於建立 Amazon SageMaker Model Monitors 的 CloudFormation 範本。
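+ An AWS CodePipeline pipeline that has source, build, and deploy steps. The source step points to the Git repository. The build step gets the code from that repository, gets the baseline from the Model Registry, and updates the template parameters for the staging and production environments. The deploy steps deploy the configured monitors into the staging and production environments. The manual approval step, within the `DeployStaging` stage, requires you to verify that the production SageMaker AI endpoint is `InService` before you approve and move to the `DeployProd` stage.
+ An AWS CodeBuild project to populate the Git repositories with the seed code information. This requires an AWS CodeStar connection from your AWS account to your account on the Git repository host.
+ The template stores the monitors' outputs in the same Amazon S3 bucket that the MLOps template for model building, training, and deployment creates.
+ Two Amazon EventBridge event rules initiate the Amazon SageMaker Model Monitor AWS CodePipeline pipeline every time the staging SageMaker AI endpoint is updated.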

## 使用 Jenkins 搭配第三方 Git 儲存庫進行模型建置、訓練和部署的 MLOps 範本
<a name="sagemaker-projects-templates-git-jenkins"></a>
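+ **Code repository**: Third-party Git.
**Note**  
Establish the AWS CodeStar connection from your AWS account to your GitHub user or organization. Add a tag with the key `sagemaker` and the value `true` to this AWS CodeStar connection.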
+ **CI/CD 工作流程自動化**：Jenkins

### 模型建置、訓練和部署
<a name="sagemaker-projects-templates-git-jenkins-building-training-deployment"></a>

此範本提供以下資源：
+ 與一或多個客戶指定的 Git 儲存庫建立關聯。
+ 產生 Jenkins 管道的種子程式碼，這些管道有來源、建置、部署至預備和部署至生產等步驟。來源步驟會指向客戶指定的 Git 儲存庫。建置步驟會從該儲存庫取得程式碼，並產生兩個 CloudFormation 堆疊。部署步驟會將 CloudFormation 堆疊部署到各自的環境中。預備步驟與生產步驟之間有核准步驟。
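+ An AWS CodeBuild project to populate the Git repositories with the seed code information. This requires an AWS CodeStar connection from your AWS account to your account on the Git repository host.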
+ 用於存放 SageMaker AI 專案和 SageMaker AI 管道成品的 Amazon S3 儲存貯體。

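The template creates the association between your project and the source control repositories, but you need to perform additional manual steps to establish communication between your AWS account and Jenkins. For the detailed steps, see [Create Amazon SageMaker projects using third-party source control and Jenkins](https://aws.amazon.com/blogs/machine-learning/create-amazon-sagemaker-projects-using-third-party-source-control-and-jenkins/).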

這些指示可協助您建置下圖所示的架構，以 GitHub 做為此範例中的來源控制儲存庫。如圖所示，您正在將 Git 儲存庫附加至專案以簽入和管理程式碼版本。當 Jenkins 在 Git 儲存庫中偵測到模型建置程式碼的變更時，會啟動模型建置管道。您也會將專案連線至 Jenkins，以協調模型部署步驟，這些步驟會在您核准在模型登錄檔中登錄的模型時開始，或在 Jenkins 偵測到模型部署程式碼的變更時開始。


![\[\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/projects/projects-templates-gitjenkins.png)


總而言之，這些步驟會引導您完成下列任務：

1. 建立 AWS 和 GitHub 帳戶之間的連線。

1. 建立 Jenkins 帳戶並匯入所需的外掛程式。

1. 建立 Jenkins IAM 使用者和許可政策。

1. 在 Jenkins 伺服器上設定 Jenkins IAM 使用者的 AWS 登入資料。

1. 建立 API 權杖與 Jenkins 伺服器進行通訊。

1. 使用 CloudFormation 範本來設定 EventBridge 規則，以監控新核准模型的模型登錄檔。

1. 建立 SageMaker AI 專案，該專案會使用模型建置和部署程式碼來植入您的 GitHub 儲存庫。

1. 使用模型建置種子程式碼建立 Jenkins 模型建置管道。

1. 使用模型部署種子程式碼建立 Jenkins 模型部署管道。
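The EventBridge rule in the steps above watches the model registry for newly approved models. The following sketch shows what such an event pattern might look like. The pattern is an assumption based on the general shape of SageMaker model package state-change events, not the exact contents of the CloudFormation template referenced above. The `matches` helper is a simplified stand-in for EventBridge matching that handles only exact-value lists and nested objects.

```python
from typing import Any, Dict

# Assumed event pattern for newly approved model packages.
APPROVED_MODEL_PATTERN: Dict[str, Any] = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Package State Change"],
    "detail": {"ModelApprovalStatus": ["Approved"]},
}


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified matcher: every pattern key must match the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested object: recurse into the corresponding event field.
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            # Leaf: the event value must be one of the listed values.
            return False
    return True
```

A rule with this pattern would ignore model packages that are still pending manual approval and react only when the status changes to `Approved`.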

## 映像建置、模型建置和模型部署 MLOps 範本
<a name="sagemaker-projects-templates-image-building-model-building-deployment"></a>

此範本是 [使用 CodePipeline 搭配第三方 Git 儲存庫進行模型建置、訓練和部署的 MLOps 範本](#sagemaker-projects-templates-git-code-pipeline) 的延伸。它包括該範本的模型建置、訓練和部署元件，以及下列選項：
+ Include processing image-building pipeline
+ Include training image-building pipeline
+ Include inference image-building pipeline

對於在建立專案期間選取的每個元件，可使用範本建立下列項目：
+ Amazon ECR 儲存庫
+ [SageMaker 映像](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateImage.html)
+ 包含您可以自訂之 Dockerfile 的 CodeCommit 儲存庫
+ A CodePipeline pipeline that is initiated by changes to the repository
+ 建置 Docker 映像檔並在 Amazon ECR 儲存庫中對其進行註冊的 CodeBuild 專案
+ 依排程啟動 CodePipeline 的 EventBridge 規則

CodePipeline 啟動時會建立新的 Docker 容器，並在 Amazon ECR 儲存庫中對其進行註冊。在 Amazon ECR 儲存庫中註冊新容器時，會在 SageMaker 映像中新增 `ImageVersion`。這會啟動模型建置管道，進而啟動部署管道。

如果適用，新建立的映像將用於工作流程的模型建置、訓練和部署部分。

## 更新 SageMaker 專案以使用第三方 Git 儲存庫
<a name="sagemaker-projects-templates-update"></a>

附加至 `AmazonSageMakerServiceCatalogProductsUseRole` 角色的受管政策已於 2021 年 7 月 27 日更新，以便與第三方 Git 範本搭配使用。在此日期之後加入 Amazon SageMaker Studio (或 Studio Classic) 並啟用專案範本的使用者會使用新政策。在此日期之前加入的使用者必須更新政策才能使用這些範本。使用下列其中一項來更新政策：
+ 刪除角色並切換 Studio (或 Studio Classic) 設定

  1. 在 IAM 主控台中，刪除 `AmazonSageMakerServiceCatalogProductsUseRole`。

  1. 在 Studio (或 Studio Classic) 控制面板中，選擇**編輯設定**。

  1. 切換兩個設定，然後選擇**提交**。
+ 在 IAM 主控台中，將下列許可新增至 `AmazonSageMakerServiceCatalogProductsUseRole`：

  ```
  {
      "Effect": "Allow",
      "Action": [
          "codestar-connections:UseConnection"
      ],
      "Resource": "arn:aws:codestar-connections:*:*:connection/*",
      "Condition": {
          "StringEqualsIgnoreCase": {
              "aws:ResourceTag/sagemaker": "true"
          }
      }
  },
  {
      "Effect": "Allow",
      "Action": [
          "s3:PutObjectAcl"
      ],
      "Resource": [
          "arn:aws:s3:::sagemaker-*"
      ]
  }
  ```

# 建立自訂專案範本
<a name="sagemaker-projects-templates-custom"></a>

**重要**  
自 2024 年 10 月 28 日起， AWS CodeCommit 範本已移除。對於新專案，請從使用第三方 Git 儲存庫的可用專案範本中選取。如需詳細資訊，請參閱[MLOps 專案範本](sagemaker-projects-templates.md)。

如果 SageMaker AI 提供的範本不符合您的需求 (例如，您想要在 CodePipeline 中使用多個階段或自訂核准步驟進行更複雜的協同運作)，請建立您自己的範本。

我們建議您先使用 SageMaker AI 提供的範本，瞭解如何組織程式碼和資源，並在其上進行建置。若要這麼做，請在您啟用 SageMaker AI 範本的管理員存取權之後，登入 [https://console.aws.amazon.com/servicecatalog/](https://console.aws.amazon.com/servicecatalog/)，選擇**產品組合**，然後選擇**已匯入**。如需 Service Catalog 的相關資訊，請參閱 *Service Catalog 使用指南*中的 [Service Catalog 概觀](https://docs.aws.amazon.com/servicecatalog/latest/adminguide/what-is_concepts.html)。

建立您自己的專案範本以自訂 MLOps 專案。SageMaker AI 專案範本是 Service Catalog 佈建的產品，可為您的 MLOps 專案佈建資源。

若要建立自訂專案範本，請完成下列步驟。

1. 建立組合。如需相關資訊，請參閱[步驟 3：建立 Service Catalog 組合](https://docs.aws.amazon.com/servicecatalog/latest/adminguide/getstarted-portfolio.html)。

1. 建立產品。產品是 CloudFormation 範本。您可以建立產品的多個版本。如需相關資訊，請參閱[步驟 4：建立 Service Catalog 產品](https://docs.aws.amazon.com/servicecatalog/latest/adminguide/getstarted-product.html)。

   對於要使用 SageMaker 專案的產品，請將下列參數新增至您的產品範本。

   ```
   SageMakerProjectName:
     Type: String
     Description: Name of the project
   
   SageMakerProjectId:
     Type: String
     Description: Service generated Id of the project.
   ```
**重要**  
我們建議您將 CodeCommit 儲存庫包裝到 SageMaker AI 程式碼儲存庫中，以便在 VPC 模式下可看見專案的儲存庫。範例範本和必要的新增內容會顯示在下列程式碼範例中。  
原始 (範例) 範本：  

   ```
   ModelBuildCodeCommitRepository:
       Type: AWS::CodeCommit::Repository
       Properties:
         # Max allowed length: 100 chars
         RepositoryName: !Sub sagemaker-${SageMakerProjectName}-${SageMakerProjectId}-modelbuild # max: 10+33+15+10=68
         RepositoryDescription: !Sub SageMaker Model building workflow infrastructure as code for the Project ${SageMakerProjectName}
         Code:
           S3:
             Bucket: SEEDCODE_BUCKETNAME
             Key: toolchain/model-building-workflow-v1.0.zip
           BranchName: main
   ```
要在 VPC 模式下新增的其他內容：  

   ```
   SageMakerRepository:
       Type: AWS::SageMaker::CodeRepository
       Properties:
           GitConfig:
               RepositoryUrl: !GetAtt ModelBuildCodeCommitRepository.CloneUrlHttp
               Branch: main
   ```

1. 新增啟動限制。當使用者啟動產品時，啟動限制可指定 Service Catalog 擔任的 IAM 角色。如需相關資訊，請參閱[步驟 6：新增啟動限制以指派 IAM 角色](https://docs.aws.amazon.com/servicecatalog/latest/adminguide/getstarted-launchconstraint.html)。

1. 在 [https://console.aws.amazon.com/servicecatalog/](https://console.aws.amazon.com/servicecatalog/) 上佈建產品以測試範本。如果您對範本感到滿意，請繼續執行下一個步驟，讓範本可在 Studio (或 Studio Classic) 中使用。

1. 將您在步驟 1 中建立之 Service Catalog 組合的存取權授與 Studio (或 Studio Classic) 執行角色。使用網域執行角色或具有 Studio (或 Studio Classic) 存取權的使用者角色。有關向產品組合新增角色的資訊，請參閱[步驟 7：授予最終用戶對產品組合的存取權限](https://docs.aws.amazon.com/servicecatalog/latest/adminguide/getstarted-deploy.html)。

1. 若要讓您的專案範本可在 Studio (或 Studio Classic) 的**組織範本**清單中使用，請使用您在步驟 2 中建立的 Service Catalog 產品的下列索引鍵和值來建立標籤。
   + **key**：`sagemaker:studio-visibility`
   + **值**：`true`
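The visibility tag from the final step can also be applied with the Service Catalog `UpdateProduct` API, which accepts an `AddTags` parameter. The following sketch builds that request. The function name and the product ID are hypothetical placeholders, and the sketch does not call AWS.

```python
from typing import Any, Dict

VISIBILITY_TAG = {"Key": "sagemaker:studio-visibility", "Value": "true"}


def product_visibility_request(product_id: str) -> Dict[str, Any]:
    """Build keyword arguments for the Service Catalog UpdateProduct API."""
    return {"Id": product_id, "AddTags": [dict(VISIBILITY_TAG)]}
```

With credentials configured, you could run `boto3.client("servicecatalog").update_product(**product_visibility_request("prod-example"))`, replacing the placeholder with the ID of the product that you created.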

在您完成這些步驟之後，您組織中的 Studio (或 Studio Classic) 使用者可以使用您建立的範本來建立專案，方法是遵循[使用 Amazon SageMaker Studio 或 Studio Classic 建立 MLOps 專案](sagemaker-projects-create.md)中的步驟，並在選擇範本時選擇**組織範本**。

## 使用 Amazon S3 儲存貯體中的範本
<a name="sagemaker-projects-templates-s3"></a>

您也可以使用存放在 Amazon S3 中的範本建立 SageMaker 專案。

**注意**  
Although you can use templates from AWS Service Catalog, we recommend that you store templates in an S3 bucket and create projects by using those templates.

### 管理員設定
<a name="sagemaker-projects-templates-s3-setup"></a>

在使用 S3 儲存貯體中的範本建立專案之前，請執行下列步驟。

1. [Create an S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html), and upload your templates to the bucket.

1. [在 S3 儲存貯體上設定 CORS 政策，以設定存取許可](https://docs.aws.amazon.com/AmazonS3/latest/userguide/enabling-cors-examples.html)。

1. 將下列鍵/值標籤新增至範本，讓 SageMaker AI 可以看到它們。

   ```
   sagemaker:studio-visibility : true
   ```

1. [建立網域](https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-quick-start.html)。

1. SageMaker AI 完成建立網域後，請將下列鍵值標籤新增至網域：

   ```
   sagemaker:projectS3TemplatesLocation : s3://<amzn-s3-demo-bucket>
   ```
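The two tags in the administrator setup can be applied with the AWS SDK instead of the console. The following sketch builds the requests. It assumes that the template tag is an S3 object tag, which the text above does not state explicitly. The function names, bucket name, object key, and domain ARN are hypothetical placeholders, and the sketch does not call AWS.

```python
from typing import Any, Dict


def template_object_tagging(bucket: str, key: str) -> Dict[str, Any]:
    """Build keyword arguments for the S3 PutObjectTagging API.

    PutObjectTagging replaces the object's existing tag set, so merge in
    any tags that you want to keep before sending the request.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "Tagging": {
            "TagSet": [{"Key": "sagemaker:studio-visibility", "Value": "true"}]
        },
    }


def domain_template_location_tag(domain_arn: str, bucket: str) -> Dict[str, Any]:
    """Build keyword arguments for the SageMaker AddTags API on the domain."""
    return {
        "ResourceArn": domain_arn,
        "Tags": [
            {
                "Key": "sagemaker:projectS3TemplatesLocation",
                "Value": f"s3://{bucket}",
            }
        ],
    }
```

With credentials configured, you could send these with `boto3.client("s3").put_object_tagging(**...)` and `boto3.client("sagemaker").add_tags(**...)` respectively.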

Then use the AWS console, Python, or the [CreateProject](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateProject.html) and [UpdateProject](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateProject.html) API operations to create or update a SageMaker project from a template in the S3 bucket.

------
#### [ Studio ]

**建立專案**

1. 開啟位在 [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/) 的 Amazon SageMaker AI 主控台。

1. 請遵循[啟動 Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html) 中的指示來開啟 SageMaker Studio 主控台。

1. 在左側導覽窗格中，選擇**部署**、**專案**、**建立專案**。

1. 選擇**組織範本**，然後選擇 **S3 範本**以查看可供您使用的範本。如果您沒有看到預期的範本，請通知您的管理員。

1. 選擇您要使用的範本，然後選擇**下一步**。

1. 輸入專案的名稱、選用描述和其他必要欄位。完成時，選擇 **Create (建立)**。

**更新專案**

1. 開啟位在 [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/) 的 Amazon SageMaker AI 主控台。

1. 請遵循[啟動 Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html) 中的指示來開啟 SageMaker Studio 主控台。

1. 選擇您要更新的專案。選擇**動作**，然後選擇**更新專案**。

1. 更新專案時，您可以更新範本參數或範本 URL。完成後，請選擇 **Next (下一步)**。

1. 檢閱摘要資料表中的專案更新，然後選擇**更新**。

------
#### [ Python Boto3 ]

建立 S3 儲存貯體並上傳範本後，您可以使用下列範例來建立 SageMaker 專案。

```
import boto3

# Create a SageMaker client in the Region that hosts the template bucket.
sagemaker_client = boto3.client('sagemaker', region_name='us-west-2')

# Replace <bucket_name> with the name of the S3 bucket that stores the template.
response = sagemaker_client.create_project(
    ProjectName='my-custom-project',
    ProjectDescription='SageMaker project with custom CFN template stored in S3',
    TemplateProviders=[{
        'CfnTemplateProvider': {
            'TemplateName': 'CustomProjectTemplate',
            'TemplateURL': 'https://<bucket_name>.s3.us-west-2.amazonaws.com/custom-project-template.yml',
            # Key-value pairs passed to the CloudFormation template parameters.
            'Parameters': [
                {'Key': 'ParameterKey', 'Value': 'ParameterValue'}
            ]
        }
    }]
)
print(f"Project ARN: {response['ProjectArn']}")
```

若要更新 SageMaker 專案，請參閱下列範例。

```
import boto3

# Create a SageMaker client in the Region that hosts the template bucket.
sagemaker_client = boto3.client('sagemaker', region_name='us-west-2')

# Replace <bucket_name> with the name of the S3 bucket that stores the template.
response = sagemaker_client.update_project(
    ProjectName='my-custom-project',
    ProjectDescription='SageMaker project with custom CFN template stored in S3',
    TemplateProvidersToUpdate=[{
        'CfnTemplateProvider': {
            'TemplateName': 'CustomProjectTemplate',
            'TemplateURL': 'https://<bucket_name>.s3.us-west-2.amazonaws.com/custom-project-template.yml',
            # Updated key-value pairs for the CloudFormation template parameters.
            'Parameters': [
                {'Key': 'ParameterKey', 'Value': 'ParameterValue'}
            ]
        }
    }]
)
print(f"Project ARN: {response['ProjectArn']}")
```

------

# 檢視專案資源
<a name="sagemaker-projects-resources"></a>

After you create a project, view the resources associated with the project in Amazon SageMaker Studio or Studio Classic.

------
#### [ Studio ]

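1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Deployments**, and then choose **Projects**.

1. Select the name of the project for which you want to view details. A tab with the project details appears.

On the project details page, you can view the following entities and open any of the following tabs that correspond to an entity associated with the project.
+ Repositories: Code repositories (repos) associated with this project. If you use a SageMaker AI-provided template when you create your project, it creates an AWS CodeCommit repository or a third-party Git repository. For more information about CodeCommit, see [What is AWS CodeCommit](https://docs.aws.amazon.com/codecommit/latest/userguide/welcome.html).
+ Pipelines: SageMaker AI ML pipelines that define the steps to prepare data, train, and deploy models. For information about SageMaker AI ML pipelines, see [Pipelines actions](pipelines-build.md).
+ Experiments: One or more Amazon SageMaker Autopilot experiments associated with the project. For more information about Autopilot, see [SageMaker Autopilot](autopilot-automate-model-development.md).
+ Model groups: Groups of model versions that were created by pipeline executions in the project. For information about model groups, see [Create a model group](model-registry-model-group.md).
+ Endpoints: SageMaker AI endpoints that host deployed models for real-time inference. When a model version is approved, it is deployed to an endpoint.
+ Tags: All the tags associated with the project. For more information about tags, see [Tagging AWS resources](https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html) in the *AWS General Reference*.
+ Metadata: Metadata associated with the project. This includes the template and version used, and the template launch path.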

------
#### [ Studio Classic ]

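1. Sign in to Studio Classic. For more information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).

1. In the Studio Classic sidebar, choose the **Home** icon (![\[Home icon.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Deployments** from the menu, and then select **Projects**.

1. Select the name of the project for which you want to view details.

   A tab with the project details appears.

On the project details tab, you can view the following entities associated with the project.
+ Repositories: Code repositories (repos) associated with this project. If you use a SageMaker AI-provided template when you create your project, it creates an AWS CodeCommit repository or a third-party Git repository. For more information about CodeCommit, see [What is AWS CodeCommit](https://docs.aws.amazon.com/codecommit/latest/userguide/welcome.html).
+ Pipelines: SageMaker AI ML pipelines that define the steps to prepare data, train, and deploy models. For information about SageMaker AI ML pipelines, see [Pipelines actions](pipelines-build.md).
+ Experiments: One or more Amazon SageMaker Autopilot experiments associated with the project. For more information about Autopilot, see [SageMaker Autopilot](autopilot-automate-model-development.md).
+ Model groups: Groups of model versions that were created by pipeline executions in the project. For information about model groups, see [Create a model group](model-registry-model-group.md).
+ Endpoints: SageMaker AI endpoints that host deployed models for real-time inference. When a model version is approved, it is deployed to an endpoint.
+ Settings: Settings for the project. This includes the name and description of the project, information about the project template and `SourceModelPackageGroupName`, and metadata about the project.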

------
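The same project details can also be read programmatically. The following is a minimal sketch, assuming you already have a boto3 SageMaker client; the helper name `summarize_project` and the project name in the usage comment are illustrative and not part of the original procedure.

```python
def summarize_project(sagemaker_client, project_name):
    """Return a small dict with the main fields of a SageMaker project."""
    response = sagemaker_client.describe_project(ProjectName=project_name)
    return {
        "name": response["ProjectName"],
        "id": response["ProjectId"],
        "arn": response["ProjectArn"],
        "status": response["ProjectStatus"],
        # The description is optional, so read it with .get().
        "description": response.get("ProjectDescription", ""),
    }


# Example usage (requires AWS credentials; the project name is a placeholder):
#   import boto3
#   client = boto3.client("sagemaker", region_name="us-west-2")
#   print(summarize_project(client, "my-custom-project"))
```

Passing the client in as an argument keeps the helper independent of any particular Region or credential setup.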

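# Update an MLOps project in Amazon SageMaker Studio or Studio Classic
<a name="sagemaker-projects-update"></a>

This procedure demonstrates how to update an MLOps project in Amazon SageMaker Studio or Studio Classic. Updating a project lets you modify your end-to-end ML solution. You can update the **Description**, the template version, and the template parameters.

**Prerequisites**
+ An IAM account or IAM Identity Center account to sign in to Studio or Studio Classic. For information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).
+ Basic familiarity with the Studio or Studio Classic user interface. For information about the Studio UI, see [Amazon SageMaker Studio](studio-updated.md). For information about Studio Classic, see [Amazon SageMaker Studio Classic UI overview](studio-ui.md).
+ Add the following custom inline policies to the specified roles:

  The user-created role that has `AmazonSageMakerFullAccess`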

------
#### [ JSON ]

  ```
  {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Effect": "Allow",
              "Action": [
                  "servicecatalog:CreateProvisionedProductPlan",
                  "servicecatalog:DescribeProvisionedProductPlan",
                  "servicecatalog:DeleteProvisionedProductPlan"
              ],
              "Resource": "*"
          }
      ]
  }
  ```

------

  `AmazonSageMakerServiceCatalogProductsLaunchRole`

------
#### [ JSON ]

  ```
  {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Effect": "Allow",
              "Action": [
                  "cloudformation:CreateChangeSet",
                  "cloudformation:DeleteChangeSet",
                  "cloudformation:DescribeChangeSet"
              ],
              "Resource": "arn:aws:cloudformation:*:*:stack/SC-*"
          },
          {
              "Effect": "Allow",
              "Action": [
                  "codecommit:PutRepositoryTriggers"
              ],
              "Resource": "arn:aws:codecommit:*:*:sagemaker-*"
          }
      ]
  }
  ```

------
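If you prefer to add these inline policies with a script instead of the IAM console, the following sketch shows one possible approach. It assumes a boto3 IAM client; the helper name `attach_inline_policy`, the policy names, and the user role name in the usage comment are hypothetical placeholders.

```python
import json

# Inline policy for the user-created role that has AmazonSageMakerFullAccess.
USER_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "servicecatalog:CreateProvisionedProductPlan",
            "servicecatalog:DescribeProvisionedProductPlan",
            "servicecatalog:DeleteProvisionedProductPlan",
        ],
        "Resource": "*",
    }],
}

# Inline policy for AmazonSageMakerServiceCatalogProductsLaunchRole.
LAUNCH_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudformation:CreateChangeSet",
                "cloudformation:DeleteChangeSet",
                "cloudformation:DescribeChangeSet",
            ],
            "Resource": "arn:aws:cloudformation:*:*:stack/SC-*",
        },
        {
            "Effect": "Allow",
            "Action": ["codecommit:PutRepositoryTriggers"],
            "Resource": "arn:aws:codecommit:*:*:sagemaker-*",
        },
    ],
}


def attach_inline_policy(iam_client, role_name, policy_name, policy_document):
    """Attach (or overwrite) an inline policy on an IAM role."""
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy_document),
    )


# Example usage (role and policy names are placeholders):
#   import boto3
#   iam = boto3.client("iam")
#   attach_inline_policy(iam, "MyStudioUserRole", "ProjectUpdateUser", USER_ROLE_POLICY)
#   attach_inline_policy(iam, "AmazonSageMakerServiceCatalogProductsLaunchRole",
#                        "ProjectUpdateLaunch", LAUNCH_ROLE_POLICY)
```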

To update your project in Studio or Studio Classic, complete the following steps.

------
#### [ Studio ]

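1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Deployments**, and then choose **Projects**.

1. Choose the radio button next to the project that you want to update.

1. Choose the vertical ellipsis at the upper-right corner of the projects list, and then choose **Update**.

1. Choose **Next**.

1. Review the project updates in the summary table, and then choose **Update**. It might take a few minutes for the project to update.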

------
#### [ Studio Classic ]

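**To update a project in Studio Classic**

1. Sign in to Studio Classic. For more information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).

1. In the Studio Classic sidebar, choose the **Home** icon (![\[Home icon.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Deployments** from the menu, and then select **Projects**. A list of your projects appears.

1. In the projects list, select the name of the project that you want to update.

1. From the **Actions** menu in the upper-right corner of the project tab, choose **Update**.

1. In the **Update project** dialog box, you can edit the **Description** and the listed template parameters.

1. Choose **View difference**.

   A dialog box displays your original and updated project settings. Any change in your project settings can modify or delete resources in the current project. The dialog box displays these changes as well.

1. You might need to wait a few minutes for the **Update** button to become active. Choose **Update**.

1. The project update might take a few minutes to complete. Select **Settings** in the project tab and make sure that the parameters have been updated correctly.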

------
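Because a project update can take a few minutes, a script that triggers the update may want to wait for it to finish. The following is a minimal polling sketch, assuming a boto3 SageMaker client and the `UpdateCompleted` and `UpdateFailed` values of `ProjectStatus`; the helper name, polling interval, and attempt limit are illustrative choices.

```python
import time

# Project statuses that indicate the update has finished.
TERMINAL_UPDATE_STATUSES = {"UpdateCompleted", "UpdateFailed"}


def wait_for_project_update(sagemaker_client, project_name,
                            poll_seconds=30, max_attempts=40, sleep=time.sleep):
    """Poll describe_project until the update reaches a terminal status."""
    for _ in range(max_attempts):
        status = sagemaker_client.describe_project(
            ProjectName=project_name
        )["ProjectStatus"]
        if status in TERMINAL_UPDATE_STATUSES:
            return status
        # Still in progress (or not yet started): wait before polling again.
        sleep(poll_seconds)
    raise TimeoutError(f"Project {project_name} did not finish updating in time")


# Example usage (requires AWS credentials; the project name is a placeholder):
#   import boto3
#   client = boto3.client("sagemaker", region_name="us-west-2")
#   print(wait_for_project_update(client, "my-custom-project"))
```

The `sleep` parameter exists only so that the wait can be replaced in tests; in normal use the default `time.sleep` applies.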

# Delete an MLOps project using Amazon SageMaker Studio or Studio Classic
<a name="sagemaker-projects-delete"></a>

This procedure demonstrates how to delete an MLOps project using Amazon SageMaker Studio or Studio Classic.

**Prerequisites**

**Note**  
You can only delete projects in Studio or Studio Classic that you have created. This condition is part of the Service Catalog permission `servicecatalog:TerminateProvisionedProduct` in the `AmazonSageMakerFullAccess` policy. If needed, you can update this policy to remove this condition.
+ An IAM account or IAM Identity Center account to sign in to Studio or Studio Classic. For information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).
+ Basic familiarity with the Studio or Studio Classic user interface. For information about the Studio UI, see [Amazon SageMaker Studio](studio-updated.md). For information about Studio Classic, see [Amazon SageMaker Studio Classic UI overview](studio-ui.md).

------
#### [ Studio ]

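1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Deployments**, and then choose **Projects**.

1. Choose the radio button next to the project that you want to delete.

1. Choose the vertical ellipsis at the upper-right corner of the projects list, and then choose **Delete**.

1. Review the information in the **Delete project** dialog box, and then choose **Yes, delete the project** if you still want to delete the project.

1. Choose **Delete**.

1. Your projects list appears. Confirm that your project no longer appears in the list.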

------
#### [ Studio Classic ]

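1. Sign in to Studio Classic. For more information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).

1. In the Studio Classic sidebar, choose the **Home** icon (![\[Home icon.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Deployments** from the menu, and then select **Projects**.

1. Select the target project from the dropdown list. If you don't see your project, type the project name and apply the filter to find your project.

1. After you find your project, select the project name to view its details.

1. In the **Actions** menu, choose **Delete**.

1. In the **Delete project** window, choose **Delete** to confirm your choice.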

------
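A project can also be deleted through the API. The sketch below assumes a boto3 SageMaker client; the helper name is illustrative. It returns any remaining list entries for the project together with their status, because how quickly a deleted project disappears from `list_projects` is not something this guide specifies.

```python
def delete_project_and_check(sagemaker_client, project_name):
    """Delete a project, then return (name, status) pairs still listed for it."""
    sagemaker_client.delete_project(ProjectName=project_name)
    summaries = sagemaker_client.list_projects(
        NameContains=project_name
    )["ProjectSummaryList"]
    # NameContains is a substring match, so keep only exact name matches.
    return [
        (s["ProjectName"], s["ProjectStatus"])
        for s in summaries
        if s["ProjectName"] == project_name
    ]


# Example usage (requires AWS credentials; the project name is a placeholder):
#   import boto3
#   client = boto3.client("sagemaker", region_name="us-west-2")
#   print(delete_project_and_check(client, "my-custom-project"))
```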

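# Walk through a SageMaker AI MLOps project using third-party Git repos
<a name="sagemaker-projects-walkthrough-3rdgit"></a>

**Important**  
As of November 30, 2023, the previous Amazon SageMaker Studio experience is now named Amazon SageMaker Studio Classic. The following section is specific to using the Studio Classic application. For information about using the updated Studio experience, see [Amazon SageMaker Studio](studio-updated.md).  
Studio Classic is still maintained for existing workloads but is no longer available for onboarding. You can only stop or delete existing Studio Classic applications and cannot create new ones. We recommend that you [migrate your workload to the new Studio experience](studio-updated-migrate.md).

This walkthrough uses the template [MLOps template for model building, training, and deployment with third-party Git repositories using CodePipeline](sagemaker-projects-templates-sm.md#sagemaker-projects-templates-git-code-pipeline) to demonstrate how to use MLOps projects to create a CI/CD system to build, train, and deploy models.

**Prerequisites**

To complete this walkthrough, you need:
+ An IAM or IAM Identity Center account to sign in to Studio Classic. For information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).
+ Permission to use SageMaker AI-provided project templates. For information, see [Granting SageMaker Studio permissions required to use projects](sagemaker-projects-studio-updates.md).
+ Basic familiarity with the Studio Classic user interface. For information, see [Amazon SageMaker Studio Classic UI overview](studio-ui.md).
+ Two empty GitHub repositories. You enter these repositories into the project template, which seeds them with model build and deploy code.

**Topics**
+ [Step 1: Set up the GitHub connection](#sagemaker-proejcts-walkthrough-connect-3rdgit)
+ [Step 2: Create the project](#sagemaker-proejcts-walkthrough-create-3rdgit)
+ [Step 3: Make a change in the code](#sagemaker-projects-walkthrough-change-3rdgit)
+ [Step 4: Approve the model](#sagemaker-proejcts-walkthrough-approve-3rdgit)
+ [(Optional) Step 5: Deploy the model version to production](#sagemaker-projects-walkthrough-prod-3rdgit)
+ [Step 6: Clean up resources](#sagemaker-projectcts-walkthrough-cleanup-3rdgit)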

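## Step 1: Set up the GitHub connection
<a name="sagemaker-proejcts-walkthrough-connect-3rdgit"></a>

In this step, you connect to your GitHub repositories using an [AWS CodeConnections connection](https://docs.aws.amazon.com/dtconsole/latest/userguide/welcome-connections.html). The SageMaker AI project uses this connection to access your source code repositories.

**To set up the GitHub connection:**

1. Sign in to the CodePipeline console at [https://console.aws.amazon.com/codepipeline/](https://console.aws.amazon.com/codepipeline/).

1. In the navigation pane, under **Settings**, choose **Connections**.

1. Choose **Create connection**.

1. For **Select a provider**, select **GitHub**.

1. For **Connection name**, enter a name.

1. Choose **Connect to GitHub**.

1. If the AWS Connector for GitHub app isn't already installed, choose **Install new app**.

   This displays a list of all the GitHub personal accounts and organizations to which you have access.

1. Choose the account in which you want to establish the connection for use with SageMaker projects and GitHub repositories.

1. Choose **Configure**.

1. You can optionally select specific repositories or choose **All repositories**.

1. Choose **Save**. When the app is installed, you're redirected to the **Connect to GitHub** page and the installation ID is automatically populated.

1. Choose **Connect**.

1. Add a tag with the key `sagemaker` and the value `true` to this CodeConnections connection.

1. Copy the connection ARN to save for later. You use the ARN as a parameter in the project creation step.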
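As an alternative to creating the connection in the console, the connection and its `sagemaker` tag can be created with boto3. This is a sketch under assumptions: it uses a `codeconnections` client, and the helper name and connection name are placeholders. A connection created through the API starts in a pending state, so you still need to complete the GitHub handshake (installing or selecting the app) in the console before the connection can be used.

```python
def create_github_connection(connections_client, connection_name):
    """Create a pending GitHub connection tagged for use with SageMaker projects."""
    response = connections_client.create_connection(
        ProviderType="GitHub",
        ConnectionName=connection_name,
        # The walkthrough requires this tag on the connection.
        Tags=[{"Key": "sagemaker", "Value": "true"}],
    )
    # Save this ARN; it is used as a parameter when you create the project.
    return response["ConnectionArn"]


# Example usage (requires AWS credentials; the name is a placeholder):
#   import boto3
#   connections = boto3.client("codeconnections", region_name="us-west-2")
#   print(create_github_connection(connections, "my-github-connection"))
```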

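## Step 2: Create the project
<a name="sagemaker-proejcts-walkthrough-create-3rdgit"></a>

In this step, you create a SageMaker AI MLOps project by using a SageMaker AI-provided project template to build, train, and deploy models.

**To create the SageMaker AI MLOps project**

1. Sign in to Studio. For more information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).

1. In the Studio sidebar, choose the **Home** icon (![\[Home icon.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Deployments** from the menu, and then select **Projects**.

1. Choose **Create project**.

   The **Create project** tab appears.

1. For **SageMaker AI project templates**, choose **Model building, training, and deployment with third-party Git repositories using CodePipeline**.

1. Choose **Next**.

1. Under **ModelBuild CodeRepository Info**, provide the following parameters:
   + For **Branch**, enter the branch of your Git repository to use for pipeline activities.
   + For **Full Repository Name**, enter the Git repository name in the format of *username/repository name* or *organization/repository name*.
   + For **Code Connection ARN**, enter the ARN of the CodeConnections connection that you created in Step 1.

1. Under **ModelDeploy CodeRepository Info**, provide the following parameters:
   + For **Branch**, enter the branch of your Git repository to use for pipeline activities.
   + For **Full Repository Name**, enter the Git repository name in the format of *username/repository name* or *organization/repository name*.
   + For **Code Connection ARN**, enter the ARN of the CodeConnections connection that you created in Step 1.

1. Choose **Create project**.

The project appears in the **Projects** list with a **Status** of **Created**.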
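The project can also be created from a Service Catalog product with `create_project`. The following is only a sketch: the helper name is illustrative, the product ID and provisioning artifact ID must be looked up in your own account, and the parameter keys shown in the usage comment are hypothetical placeholders, since the exact keys that the template expects are not listed in this walkthrough.

```python
def create_project_from_product(sagemaker_client, project_name, product_id,
                                provisioning_artifact_id, parameters):
    """Create a SageMaker project from a Service Catalog product."""
    response = sagemaker_client.create_project(
        ProjectName=project_name,
        ServiceCatalogProvisioningDetails={
            "ProductId": product_id,
            "ProvisioningArtifactId": provisioning_artifact_id,
            # Convert a plain dict into the Key/Value list that the API expects.
            "ProvisioningParameters": [
                {"Key": key, "Value": value} for key, value in parameters.items()
            ],
        },
    )
    return response["ProjectArn"]


# Example usage (all IDs and parameter keys below are hypothetical placeholders):
#   import boto3
#   client = boto3.client("sagemaker", region_name="us-west-2")
#   create_project_from_product(
#       client, "my-git-project", "prod-EXAMPLE", "pa-EXAMPLE",
#       {"<branch-parameter-key>": "main",
#        "<repository-name-parameter-key>": "my-org/model-build",
#        "<connection-arn-parameter-key>": "<connection ARN from Step 1>"},
#   )
```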

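## Step 3: Make a change in the code
<a name="sagemaker-projects-walkthrough-change-3rdgit"></a>

Now make a change to the pipeline code that builds the model, and commit the change to initiate a new pipeline run. The pipeline run registers a new model version.

**To make a code change**

1. In your model build GitHub repository, navigate to the `pipelines/abalone` folder. Double-click `pipeline.py` to open the code file.

1. In the `pipeline.py` file, find the line that sets the training instance type.

   ```
   training_instance_type = ParameterString(
       name="TrainingInstanceType", default_value="ml.m5.xlarge"
   )
   ```

   Open the file for editing, change `ml.m5.xlarge` to `ml.m5.large`, and then commit.

After you commit your code change, the MLOps system initiates a pipeline run that creates a new model version. In the next step, you approve the new model version to deploy it to production.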

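## Step 4: Approve the model
<a name="sagemaker-proejcts-walkthrough-approve-3rdgit"></a>

Now you approve the new model version that was created in the previous step to initiate a deployment of the model version to a SageMaker AI endpoint.

**To approve the model version**

1. In the Studio Classic sidebar, choose the **Home** icon (![\[Home icon.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Deployments** from the menu, and then select **Projects**.

1. Find the name of the project that you created in Step 2, and then double-click it to open the project tab for your project.

1. In the project tab, choose **Model groups**, and then double-click the name of the model group that appears.

   The model group tab appears.

1. In the model group tab, double-click **Version 1**. The **Version 1** tab opens. Choose **Update status**.

1. In the **Update model version status** dialog box, in the **Status** dropdown list, select **Approve**, and then choose **Update status**.

   Approving the model version causes the MLOps system to deploy the model to staging. To view the endpoint, choose the **Endpoints** tab on the project tab.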
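The approval can also be performed through the API. The sketch below assumes a boto3 SageMaker client and that you know the name of the project's model group; the helper name is illustrative. It approves the most recently created model version in that group.

```python
def approve_latest_model_version(sagemaker_client, model_package_group_name):
    """Approve the newest model version in a model group and return its ARN."""
    packages = sagemaker_client.list_model_packages(
        ModelPackageGroupName=model_package_group_name,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )["ModelPackageSummaryList"]
    if not packages:
        raise LookupError(f"No model versions found in {model_package_group_name}")
    model_package_arn = packages[0]["ModelPackageArn"]
    # Setting the status to Approved is what triggers the deployment pipeline.
    sagemaker_client.update_model_package(
        ModelPackageArn=model_package_arn,
        ModelApprovalStatus="Approved",
    )
    return model_package_arn


# Example usage (requires AWS credentials; the group name is a placeholder):
#   import boto3
#   client = boto3.client("sagemaker", region_name="us-west-2")
#   print(approve_latest_model_version(client, "<model-group-name>"))
```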

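## (Optional) Step 5: Deploy the model version to production
<a name="sagemaker-projects-walkthrough-prod-3rdgit"></a>

Now you can deploy the model version to the production environment.

**Note**  
To complete this step, you need to be an administrator in your Studio Classic domain. If you are not an administrator, skip this step.

**To deploy the model version to the production environment**

1. Sign in to the CodePipeline console at [https://console.aws.amazon.com/codepipeline/](https://console.aws.amazon.com/codepipeline/).

1. Choose **Pipelines**, and then choose the pipeline with the name **sagemaker-*projectname*-*projectid*-modeldeploy**, where *projectname* is the name of your project, and *projectid* is the ID of your project.

1. In the **DeployStaging** stage, choose **Review**.

1. In the **Review** dialog box, choose **Approve**.

   Approving the **DeployStaging** stage causes the MLOps system to deploy the model to production. To view the endpoint, choose the **Endpoints** tab on the project tab in Studio Classic.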
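The manual approval in the **DeployStaging** stage can also be submitted with the CodePipeline API. The following sketch assumes a boto3 CodePipeline client; the helper name and the approval summary text are illustrative. Because the name of the approval action isn't given in this walkthrough, the code looks for the action in the stage that currently holds an approval token.

```python
def approve_pending_action(codepipeline_client, pipeline_name,
                           stage_name="DeployStaging", summary="Approved via API"):
    """Approve the pending manual approval action in a pipeline stage."""
    state = codepipeline_client.get_pipeline_state(name=pipeline_name)
    for stage in state["stageStates"]:
        if stage["stageName"] != stage_name:
            continue
        for action in stage.get("actionStates", []):
            # Only an approval action that is waiting for review carries a token.
            token = action.get("latestExecution", {}).get("token")
            if token:
                codepipeline_client.put_approval_result(
                    pipelineName=pipeline_name,
                    stageName=stage_name,
                    actionName=action["actionName"],
                    result={"summary": summary, "status": "Approved"},
                    token=token,
                )
                return action["actionName"]
    raise LookupError(f"No pending approval found in stage {stage_name}")


# Example usage (requires AWS credentials; replace the name parts with your own):
#   import boto3
#   codepipeline = boto3.client("codepipeline", region_name="us-west-2")
#   approve_pending_action(codepipeline, "sagemaker-<projectname>-<projectid>-modeldeploy")
```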

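## Step 6: Clean up resources
<a name="sagemaker-projectcts-walkthrough-cleanup-3rdgit"></a>

To stop incurring charges, clean up the resources that were created in this walkthrough.

**Note**  
To delete the CloudFormation stacks and the Amazon S3 bucket, you need to be an administrator in Studio Classic. If you are not an administrator, ask your administrator to complete these steps.

1. In the Studio Classic sidebar, choose the **Home** icon (![\[Home icon.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Deployments** from the menu, and then select **Projects**.

1. Select the target project from the dropdown list. If you don't see your project, type the project name and apply the filter to find your project.

1. Select your project to view its details in the main panel.

1. In the **Actions** menu, choose **Delete**.

1. In the **Delete project** window, choose **Delete** to confirm your choice.

   This deletes the Service Catalog provisioned product that the project created. This includes the CodeCommit, CodePipeline, and CodeBuild resources created for the project.

1. Delete the CloudFormation stacks that the project created. There are two stacks, one for staging and one for production. The names of the stacks are **sagemaker-*projectname*-*project-id*-deploy-staging** and **sagemaker-*projectname*-*project-id*-deploy-prod**, where *projectname* is the name of your project, and *project-id* is the ID of your project.

   For information about how to delete a CloudFormation stack, see [Deleting a stack on the CloudFormation console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-delete-stack.html) in the *CloudFormation User Guide*.

1. Delete the Amazon S3 bucket that the project created. The name of the bucket is **sagemaker-project-*project-id***, where *project-id* is the ID of your project.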
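The resource names in the last two steps follow a fixed pattern, so they can be derived from the project name and ID. The sketch below assumes a boto3 CloudFormation client; the helper names and the example project values are illustrative. It only requests deletion of the two stacks and does not wait for the deletions to finish.

```python
def cleanup_resource_names(project_name, project_id):
    """Build the stack and bucket names that the project created."""
    prefix = f"sagemaker-{project_name}-{project_id}"
    return {
        "stacks": [f"{prefix}-deploy-staging", f"{prefix}-deploy-prod"],
        "bucket": f"sagemaker-project-{project_id}",
    }


def delete_deployment_stacks(cloudformation_client, project_name, project_id):
    """Request deletion of the staging and production stacks."""
    stack_names = cleanup_resource_names(project_name, project_id)["stacks"]
    for stack_name in stack_names:
        cloudformation_client.delete_stack(StackName=stack_name)
    return stack_names


# Example usage (requires AWS credentials; the name and ID are placeholders):
#   import boto3
#   cfn = boto3.client("cloudformation", region_name="us-west-2")
#   print(delete_deployment_stacks(cfn, "my-git-project", "p-abc123example"))
```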

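# Amazon SageMaker AI MLOps troubleshooting
<a name="mlopsfaq"></a>

Use the following content to troubleshoot MLOps issues in SageMaker AI. This topic provides information about common errors and how to resolve them.

## How do I delete a SageMaker AI project created from a SageMaker AI template if the deletion fails with an error about a non-empty Amazon S3 bucket or Amazon ECR repository?
<a name="collapsible-section-14"></a>

If you try to delete your SageMaker AI project and get one of the following error messages:

```
The bucket you tried to delete is not empty
```

```
The repository with name 'repository-name' in registry with id 'id' cannot be deleted because it still contains images
```

then you have non-empty Amazon S3 buckets or Amazon ECR repositories that you need to delete manually before you delete the SageMaker AI project. CloudFormation does not automatically delete non-empty Amazon S3 buckets or Amazon ECR repositories for you.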
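One way to clear these resources before you retry the deletion is to empty them with boto3. The following is a minimal sketch, assuming boto3 S3 and Amazon ECR clients; the helper names and the bucket and repository names in the usage comment are placeholders. It removes current objects only, so a bucket with versioning enabled would also need its object versions deleted.

```python
def empty_bucket(s3_client, bucket_name):
    """Delete all current objects in a bucket and return how many were removed."""
    deleted = 0
    while True:
        page = s3_client.list_objects_v2(Bucket=bucket_name)
        objects = [{"Key": item["Key"]} for item in page.get("Contents", [])]
        if not objects:
            return deleted
        response = s3_client.delete_objects(
            Bucket=bucket_name, Delete={"Objects": objects}
        )
        # Stop instead of looping forever if some objects could not be deleted.
        if response.get("Errors"):
            raise RuntimeError(f"Could not delete some objects: {response['Errors']}")
        deleted += len(objects)


def delete_all_images(ecr_client, repository_name):
    """Delete all images in a repository and return how many were removed."""
    deleted = 0
    while True:
        image_ids = ecr_client.list_images(
            repositoryName=repository_name
        ).get("imageIds", [])
        if not image_ids:
            return deleted
        response = ecr_client.batch_delete_image(
            repositoryName=repository_name, imageIds=image_ids
        )
        if response.get("failures"):
            raise RuntimeError(f"Could not delete some images: {response['failures']}")
        deleted += len(image_ids)


# Example usage (requires AWS credentials; names are placeholders):
#   import boto3
#   empty_bucket(boto3.client("s3"), "sagemaker-project-<project-id>")
#   delete_all_images(boto3.client("ecr"), "<repository-name>")
```

After the bucket and repository are empty, retry deleting the project.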