

This is a machine-translated version of the English documentation. In the event of any discrepancy or inconsistency, the English version takes precedence.

# Jupyter Notebook on Amazon EMR
<a name="emr-jupyter"></a>

[Jupyter Notebook](https://jupyter.org/) is an open-source web application that you can use to create and share documents that contain live code, equations, visualizations, and narrative text. Amazon EMR offers three options for working with Jupyter notebooks:

**Topics**
+ [EMR Studio](emr-studio-jupyter.md)
+ [EMR Notebooks](emr-jupyter-emr-managed-notebooks.md)
+ [JupyterHub](emr-jupyterhub.md)

# EMR Studio
<a name="emr-studio-jupyter"></a>

Amazon EMR Studio is a web-based integrated development environment (IDE) for fully managed [Jupyter notebooks](https://jupyter.org/) that run on Amazon EMR clusters. You can set up an EMR Studio for your team to develop, visualize, and debug applications written in R, Python, Scala, and PySpark.

We recommend EMR Studio when you work with Jupyter notebooks on Amazon EMR. For more information, see [EMR Studio](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio.html) in the *Amazon EMR Management Guide*.

# Amazon EMR Notebooks based on Jupyter Notebook
<a name="emr-jupyter-emr-managed-notebooks"></a>

EMR Notebooks is a [Jupyter Notebook](https://jupyter.org/) environment built in to the Amazon EMR console that allows you to quickly create Jupyter notebooks, attach them to Spark clusters, and then open the Jupyter Notebook editor in the console to remotely run queries and code. An EMR notebook is saved in Amazon S3 independently of clusters for durable storage, quick access, and flexibility. You can have multiple notebooks open, attach multiple notebooks to a single cluster, and re-use a notebook on different clusters.

For more information, see [EMR Notebooks](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-notebooks.html) in the *Amazon EMR Management Guide*.

# JupyterHub
<a name="emr-jupyterhub"></a>

[Jupyter Notebook](https://jupyter.org/) is an open-source web application that you can use to create and share documents that contain live code, equations, visualizations, and narrative text. [JupyterHub](https://jupyterhub.readthedocs.io/en/latest/) allows you to host multiple instances of a single-user Jupyter notebook server. When you create a cluster with JupyterHub, Amazon EMR creates a Docker container on the cluster's master node. JupyterHub, all the components required for Jupyter, and [Sparkmagic](https://github.com/jupyter-incubator/sparkmagic/blob/master/README.md) run within the container.

Sparkmagic is a library of kernels that allows Jupyter notebooks to interact with [Apache Spark](https://aws.amazon.com/big-data/what-is-spark/) running on Amazon EMR through [Apache Livy](emr-livy.md), which is a REST server for Spark. Spark and Apache Livy are installed automatically when you create a cluster with JupyterHub. The default Python 3 kernel for Jupyter is available, along with the PySpark and Spark kernels that come with Sparkmagic. You can use these kernels to run ad-hoc Spark code and interactive SQL queries using Python and Scala. You can install additional kernels within the Docker container manually. For more information, see [Installing additional kernels and libraries](emr-jupyterhub-install-kernels-libs.md).

The following diagram depicts the components of JupyterHub on Amazon EMR, with the corresponding authentication methods for notebook users and the administrator. For more information, see [Adding Jupyter Notebook users and administrators](emr-jupyterhub-user-access.md).

![\[JupyterHub architecture on EMR showing user authentication and component interactions.\]](http://docs.aws.amazon.com/zh_tw/emr/latest/ReleaseGuide/images/jupyter-arch.png)


The following table lists the version of JupyterHub included in the latest release of the Amazon EMR 7.x series, along with the components that Amazon EMR installs with JupyterHub.

For the version of components installed with JupyterHub in this release, see [Release 7.12.0 component versions](emr-7120-release.md).


**JupyterHub version information for emr-7.12.0**  

| Amazon EMR release label | JupyterHub version | Components installed with JupyterHub | 
| --- | --- | --- | 
| emr-7.12.0 | JupyterHub 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-hdfs-zkfc, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 

The following table lists the version of JupyterHub included in the latest release of the Amazon EMR 6.x series, along with the components that Amazon EMR installs with JupyterHub.

For the version of components installed with JupyterHub in this release, see [Release 6.15.0 component versions](emr-6150-release.md).


**JupyterHub version information for emr-6.15.0**  

| Amazon EMR release label | JupyterHub version | Components installed with JupyterHub | 
| --- | --- | --- | 
| emr-6.15.0 | JupyterHub 1.5.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 

The following table lists the version of JupyterHub included in the latest release of the Amazon EMR 5.x series, along with the components that Amazon EMR installs with JupyterHub.

For the version of components installed with JupyterHub in this release, see [Release 5.36.2 component versions](emr-5362-release.md).


**JupyterHub version information for emr-5.36.2**  

| Amazon EMR release label | JupyterHub version | Components installed with JupyterHub | 
| --- | --- | --- | 
| emr-5.36.2 | JupyterHub 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 

The version of the Python 3 kernel included with JupyterHub on Amazon EMR is 3.6.4.

The libraries installed within the `jupyterhub` container can vary between Amazon EMR release versions and Amazon EC2 AMI versions.

**To list installed libraries using `conda`**
+ Run the following command from the master node command line:

  ```
  sudo docker exec jupyterhub bash -c "conda list"
  ```

**To list installed libraries using `pip`**
+ Run the following command from the master node command line:

  ```
  sudo docker exec jupyterhub bash -c "pip freeze"
  ```

**Topics**
+ [Create a cluster with JupyterHub](emr-jupyterhub-launch.md)
+ [Considerations when using JupyterHub on Amazon EMR](emr-jupyterhub-considerations.md)
+ [Configuring JupyterHub](emr-jupyterhub-configure.md)
+ [Configuring persistence for notebooks in Amazon S3](emr-jupyterhub-s3.md)
+ [Connecting to the master node and notebook servers](emr-jupyterhub-connect.md)
+ [JupyterHub configuration and administration](emr-jupyterhub-administer.md)
+ [Adding Jupyter Notebook users and administrators](emr-jupyterhub-user-access.md)
+ [Installing additional kernels and libraries](emr-jupyterhub-install-kernels-libs.md)
+ [JupyterHub release history](JupyterHub-release-history.md)

# Create a cluster with JupyterHub
<a name="emr-jupyterhub-launch"></a>

You can create an Amazon EMR cluster with JupyterHub using the AWS Management Console, the AWS Command Line Interface, or the Amazon EMR API. Make sure that the cluster is not created with the option to auto-terminate after completing steps (the `--auto-terminate` option in the AWS CLI). Also, make sure that administrators and notebook users have access to the key pair that you use when you create the cluster. For more information, see [Use a key pair for SSH credentials](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-access-ssh.html) in the *Amazon EMR Management Guide*.

## Create a cluster with JupyterHub using the console
<a name="emr-jupyterhub-launch-console"></a>

Use the following procedure to create a cluster with JupyterHub installed using **Advanced Options** in the Amazon EMR console.

**To create an Amazon EMR cluster with JupyterHub installed using the Amazon EMR console**

1. Navigate to the new Amazon EMR console and select **Switch to the old console** from the side navigation. For more information about what to expect when you switch to the old console, see [Using the old console](https://docs.aws.amazon.com/emr/latest/ManagementGuide/whats-new-in-console.html#console-opt-in).

1. Choose **Create cluster**, and then choose **Go to advanced options**.

1. Under **Software Configuration**:
   + For **Release**, select emr-5.36.2, and then choose JupyterHub.
   + If you use Spark, to use the AWS Glue Data Catalog as the metastore for Spark SQL, select **Use for Spark table metadata**. For more information, see [Use the AWS Glue Data Catalog with Spark on Amazon EMR](emr-spark-glue.md).
   + For **Edit software settings**, choose **Enter configuration** and specify values, or choose **Load JSON from S3** and specify a JSON configuration file. For more information, see [Configuring JupyterHub](emr-jupyterhub-configure.md).

1. Under **Add steps (optional)**, configure any steps to run when the cluster is created, make sure that **Auto-terminate cluster after the last step is completed** is not selected, and then choose **Next**.

1. Choose **Hardware Configuration** options and choose **Next**. For more information, see [Configure cluster hardware and networking](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-instances.html) in the *Amazon EMR Management Guide*.

1. Choose **General Cluster Settings** options and choose **Next**.

1. Choose **Security Options**, specify a key pair, and then choose **Create Cluster**.

## Create a cluster with JupyterHub using the AWS CLI
<a name="emr-jupyterhub-launch-cli"></a>

To launch a cluster with JupyterHub installed, use the `aws emr create-cluster` command and specify `Name=JupyterHub` for the `--applications` option. The following example launches a JupyterHub cluster on Amazon EMR with two EC2 instances (one primary and one core instance). In addition, debugging is enabled, with logs stored in the Amazon S3 location specified by `--log-uri`. The specified key pair provides access to the Amazon EC2 instances in the cluster.

**Note**  
Linux line continuation characters (\\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace them with a caret (^).

```
aws emr create-cluster --name="MyJupyterHubCluster" --release-label emr-5.36.2 \
--applications Name=JupyterHub --log-uri s3://amzn-s3-demo-bucket/MyJupyterClusterLogs \
--use-default-roles --instance-type m5.xlarge --instance-count 2 --ec2-attributes KeyName=MyKeyPair
```

# Considerations when using JupyterHub on Amazon EMR
<a name="emr-jupyterhub-considerations"></a>

Consider the following when using JupyterHub on Amazon EMR.
+ 
**Warning**  
User notebooks and files are saved to the file system on the master node. This is ephemeral storage that does not persist through cluster termination. If this data is not backed up, it is lost when the cluster terminates. We recommend that you schedule regular backups using `cron` jobs or another means suitable for your application.  
In addition, configuration changes made within the container may not persist if the container restarts. We recommend that you script or otherwise automate container configuration so that you can reproduce customizations more readily.
+ Kerberos authentication configured using an Amazon EMR security configuration is not supported.
+ [OAuthenticator](https://github.com/jupyterhub/oauthenticator) is not supported.

# Configuring JupyterHub
<a name="emr-jupyterhub-configure"></a>

You can customize the configuration of JupyterHub on Amazon EMR, and of individual user notebooks, by connecting to the cluster master node and editing configuration files. After you change values, restart the `jupyterhub` container.

Modify properties in the following files to configure JupyterHub and individual Jupyter notebooks:
+ `jupyterhub_config.py` – By default, this file is saved in the `/etc/jupyter/conf/` directory on the master node. For more information, see [Configuration Basics](http://jupyterhub.readthedocs.io/en/latest/getting-started/config-basics.html) in the JupyterHub documentation.
+ `jupyter_notebook_config.py` – By default, this file is saved in the `/etc/jupyter/` directory and is copied to the `jupyterhub` container. For more information, see [Config file and command line options](https://jupyter-notebook.readthedocs.io/en/5.7.4/config.html) in the Jupyter Notebook documentation.

You can also customize Sparkmagic using the `jupyter-sparkmagic-conf` configuration classification, which updates values in the `config.json` file for Sparkmagic. For available settings, see [example\_config.json](https://github.com/jupyter-incubator/sparkmagic/blob/master/sparkmagic/example_config.json) on GitHub. For more information about using configuration classifications with applications in Amazon EMR, see [Configure applications](emr-configure-apps.md).

The following example launches a cluster using the AWS CLI and references a file, `MyJupyterConfig.json`, that contains the Sparkmagic configuration classification settings.

**Note**  
Linux line continuation characters (\\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace them with a caret (^).

```
aws emr create-cluster --use-default-roles --release-label emr-5.14.0 \
--applications Name=Jupyter --instance-type m4.xlarge --instance-count 3 \
--ec2-attributes KeyName=MyKey,SubnetId=subnet-1234a5b6 --configurations file://MyJupyterConfig.json
```

The following is example content for `MyJupyterConfig.json`:

```
[
    {
    "Classification":"jupyter-sparkmagic-conf",
    "Properties": {
      "kernel_python_credentials" : "{\"username\":\"diego\",\"base64_password\":\"mypass\",\"url\":\"http:\/\/localhost:8998\",\"auth\":\"None\"}"
      }
    }
]
```

**Note**  
With Amazon EMR version 5.21.0 and later, you can override cluster configurations and specify additional configuration classifications for each instance group in a running cluster. You do this by using the Amazon EMR console, the AWS Command Line Interface (AWS CLI), or the AWS SDK. For more information, see [Supplying a configuration for an instance group in a running cluster](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps-running-cluster.html).

# Configuring persistence for notebooks in Amazon S3
<a name="emr-jupyterhub-s3"></a>

You can configure a JupyterHub cluster in Amazon EMR so that notebooks saved by users persist in Amazon S3, outside of ephemeral storage on cluster EC2 instances.

You specify Amazon S3 persistence using the `jupyter-s3-conf` configuration classification when you create a cluster. For more information, see [Configure applications](emr-configure-apps.md).

Along with enabling Amazon S3 persistence using the `s3.persistence.enabled` property, you use the `s3.persistence.bucket` property to specify the bucket in Amazon S3 where notebooks are saved. Each user's notebooks are saved to a `jupyter/jupyterhub-user-name` folder in the specified bucket. The bucket must already exist in Amazon S3, and the EC2 instance profile role that you specify when you create the cluster must have permissions to the bucket (the default for this role is `EMR_EC2_DefaultRole`). For more information, see [Configure IAM roles for Amazon EMR permissions to AWS services](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-iam-roles.html).

When you launch a new cluster using the same configuration classification properties, users can open notebooks with content from the saved location.

Note that if you import files as modules into a notebook with Amazon S3 persistence enabled, the files are uploaded to Amazon S3. Files imported without Amazon S3 persistence enabled are uploaded to your JupyterHub container.

The following example enables Amazon S3 persistence. Notebooks that users save are persisted to a `s3://MyJupyterBackups/jupyter/jupyterhub-user-name` folder for each user, where `jupyterhub-user-name` is a user name, such as `diego`.

```
[
    {
        "Classification": "jupyter-s3-conf",
        "Properties": {
            "s3.persistence.enabled": "true",
            "s3.persistence.bucket": "MyJupyterBackups"
        }
    }
]
```

# Connecting to the master node and notebook servers
<a name="emr-jupyterhub-connect"></a>

JupyterHub administrators and notebook users must use an SSH tunnel to connect to the cluster's master node, and then connect to the web interfaces served by JupyterHub on the master node. For more information about setting up an SSH tunnel and using it to proxy web connections, see [Connect to the cluster](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-connect-master-node.html) in the *Amazon EMR Management Guide*.

By default, JupyterHub on Amazon EMR is available on **port 9443** on the master node. The internal JupyterHub proxy also serves notebook instances through port 9443. You can access the JupyterHub and Jupyter web interfaces using a URL with the following pattern:

**https://***MasterNodeDNS***:9443**

You can specify a different port using the `c.JupyterHub.port` property in the `jupyterhub_config.py` file. For more information, see [Networking basics](http://jupyterhub.readthedocs.io/en/latest/getting-started/networking-basics.html) in the JupyterHub documentation.
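
For example, a minimal `jupyterhub_config.py` fragment that moves JupyterHub to port 9444 (an arbitrary illustration value) might look like the following; restart the `jupyterhub` container after editing:

```python
# /etc/jupyter/conf/jupyterhub_config.py (fragment)
# The `c` configuration object is provided by JupyterHub's config loader.
# 9444 is an example value; choose any free port on the master node.
c.JupyterHub.port = 9444
```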

By default, JupyterHub on Amazon EMR uses a self-signed certificate for SSL encryption using HTTPS. Users are prompted to trust the self-signed certificate when they connect. You can use your own trusted certificate and keys. Replace the default certificate file, `server.crt`, and key file, `server.key`, in the `/etc/jupyter/conf/` directory on the master node with your own certificate and key files. Use the `c.JupyterHub.ssl_cert` and `c.JupyterHub.ssl_key` properties in the `jupyterhub_config.py` file to specify your SSL materials. For more information, see [Security settings](https://jupyterhub.readthedocs.io/en/latest/tutorial/getting-started/security-basics.html) in the JupyterHub documentation. After you update `jupyterhub_config.py`, restart the container.
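
After copying your own certificate and key into place, the corresponding `jupyterhub_config.py` fragment might look like the following sketch (the paths shown are the in-container defaults; adjust them if your files live elsewhere):

```python
# /etc/jupyter/conf/jupyterhub_config.py (fragment)
# Default in-container paths; adjust if your certificate and key are stored elsewhere.
c.JupyterHub.ssl_cert = '/etc/jupyter/conf/server.crt'
c.JupyterHub.ssl_key = '/etc/jupyter/conf/server.key'
```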

# JupyterHub configuration and administration
<a name="emr-jupyterhub-administer"></a>

JupyterHub and related components run inside a Docker container named `jupyterhub` that runs the Ubuntu operating system. There are several ways that you can administer the components running inside the container.

**Warning**  
Customizations that you perform within the container may not persist if the container restarts. We recommend that you script or otherwise automate container configuration so that you can reproduce customizations more readily.

## Administration using the command line
<a name="emr-jupyterhub-administer-cli"></a>

When connected to the master node using SSH, you can use the Docker command line interface (CLI) to issue commands, specifying the container by name (`jupyterhub`) or ID. For example, `sudo docker exec jupyterhub command` runs a command recognized by the operating system or by an application running within the container. You can use this method to add users to the operating system and to install additional applications and libraries within the Docker container. For example, the default container image includes Conda for package installation, so you can run the following command from the master node command line to install an application, Keras, within the container:

```
sudo docker exec jupyterhub conda install keras
```

## Administration by submitting steps
<a name="emr-jupyterhub-administer-steps"></a>

Steps are a way to submit work to a cluster. You can submit steps when you launch a cluster, or you can submit steps to a running cluster. You can use `command-runner.jar` to submit commands that you would run on the command line as steps. For more information, see [Work with steps using the CLI and console](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-work-with-steps.html) in the *Amazon EMR Management Guide* and [Run commands and scripts on an Amazon EMR cluster](emr-commandrunner.md).

For example, you can use the following AWS CLI command on your local computer to install Keras in the same way that you did from the master node command line in the previous example:

```
aws emr add-steps --cluster-id MyClusterID --steps Name="Command Runner",Jar="command-runner.jar",Args="/usr/bin/sudo","/usr/bin/docker","exec","jupyterhub","conda","install","keras"
```

You can also script a sequence of steps, upload the script to Amazon S3, and then use `script-runner.jar` to run the script when you create the cluster, or add the script as a step. For more information, see [Run commands and scripts on an Amazon EMR cluster](emr-commandrunner.md). For an example, see [Example: Bash script to add multiple users](emr-jupyterhub-pam-users.md#emr-jupyterhub-script-multuser).

## Administration using the REST API
<a name="emr-jupyterhub-administer-rest"></a>

Jupyter, JupyterHub, and the HTTP proxy for JupyterHub each provide REST APIs that you can use to send requests. To send a request to JupyterHub, you must pass an API token along with the request. You can use the `curl` command from the master node command line to run REST commands. For more information, see the following resources:
+ [Using JupyterHub's REST API](http://jupyterhub.readthedocs.io/en/latest/reference/rest.html) in the JupyterHub documentation, which includes instructions for generating an API token.
+ [Jupyter Notebook Server API](https://github.com/jupyter/jupyter/wiki/Jupyter-Notebook-Server-API) on GitHub
+ [configurable-http-proxy](https://github.com/jupyterhub/configurable-http-proxy) on GitHub

The following example demonstrates using the REST API for JupyterHub to get a list of users. The command passes an administrator token generated earlier, uses the default port 9443 for JupyterHub, and pipes the output through [jq](https://stedolan.github.io/jq/) for easier viewing:

```
curl -XGET -s -k https://$HOST:9443/hub/api/users \
-H "Authorization: token $admin_token" | jq .
```

# Adding Jupyter Notebook users and administrators
<a name="emr-jupyterhub-user-access"></a>

You can use one of two methods for users to authenticate to JupyterHub so that they can create notebooks and, optionally, administer JupyterHub. The easiest method is to use JupyterHub's pluggable authentication module (PAM). In addition, JupyterHub on Amazon EMR supports the [LDAP authenticator plugin for JupyterHub](https://github.com/jupyterhub/ldapauthenticator/) for obtaining user identities from an LDAP server, such as a Microsoft Active Directory server. This section provides instructions and examples for adding users with each authentication method.

JupyterHub on Amazon EMR has a default user with administrator permissions. The user name is `jovyan` and the password is `jupyter`. We strongly recommend that you replace this user with another user who has administrative permissions. You can do this using a step when you create the cluster, or by connecting to the master node when the cluster is running.

**Topics**
+ [Using PAM authentication](emr-jupyterhub-pam-users.md)
+ [Using LDAP authentication](emr-jupyterhub-ldap-users.md)
+ [User impersonation](emr-jupyterhub-user-impersonation.md)

# Using PAM authentication
<a name="emr-jupyterhub-pam-users"></a>

Creating PAM users in JupyterHub on Amazon EMR is a two-step process. The first step is to add users to the operating system running in the `jupyterhub` container on the master node, and to add a corresponding home directory for each user. The second step is to add these operating system users as JupyterHub users – a process that JupyterHub calls whitelisting. After JupyterHub users are added, they can connect to the JupyterHub URL and supply their operating system credentials for access.

When a user logs in, JupyterHub opens the notebook server instance for that user, which is saved in the user's home directory on the master node, at `/var/lib/jupyter/home/username`. If a notebook server instance does not exist, JupyterHub spawns one in the user's home directory. The sections that follow demonstrate adding users to the operating system and to JupyterHub individually, followed by an example bash script that combines the earlier steps to add multiple users.

## Adding operating system users to the container
<a name="emr-jupyterhub-system-user"></a>

The following example first uses the [useradd](https://linux.die.net/man/8/useradd) command within the container to add a single user, diego, and to create a home directory for that user. The second command uses [chpasswd](https://linux.die.net/man/8/chpasswd) to create a password of diego for this user. The commands are run from the master node command line when connected using SSH. You can also run these commands using a step, as described in [Administration by submitting steps](emr-jupyterhub-administer.md#emr-jupyterhub-administer-steps).

```
sudo docker exec jupyterhub useradd -m -s /bin/bash -N diego
sudo docker exec jupyterhub bash -c "echo diego:diego | chpasswd"
```

## Adding JupyterHub users
<a name="emr-jupyterhub-jupyterhub-user"></a>

You can add users and administrators, or only users, using the **Admin** panel in JupyterHub or the REST API.

**To add users and administrators using the admin panel in JupyterHub**

1. Connect to the master node using SSH, and log in to https://*MasterNodeDNS*:9443 with an identity that has administrator permissions.

1. Choose **Control Panel**, **Admin**.

1. Choose **Users**, **Add Users**, or choose **Admin**, **Add Admins**.

**To add users using the REST API**

1. Connect to the master node using SSH and use the following commands from the master node, or run the commands as steps.

1. Obtain an administrator token to make API requests, and replace *AdminToken* in the following step with that token.

1. Use the following command, replacing *UserName* with an operating system user created in the container.

   ```
   curl -XPOST -H "Authorization: token AdminToken" "https://$(hostname):9443/hub/api/users/UserName"
   ```

**Note**  
You are automatically added as a non-administrator JupyterHub user the first time that you log in to the JupyterHub web interface.

## Example: Bash script to add multiple users
<a name="emr-jupyterhub-script-multuser"></a>

The following example bash script ties together the earlier steps in this section to create multiple JupyterHub users. The script can be run directly on the master node, or it can be uploaded to Amazon S3 and then run as a step.

The script first establishes a list of user names and uses the `jupyterhub token` command to create an API token for the default administrator, jovyan. It then creates an operating system user in the `jupyterhub` container for each user, assigning each an initial password that is the same as their user name. Finally, it calls the REST API operation to create each user in JupyterHub, passing the token generated earlier in the script and piping the REST responses through `jq` for easier viewing.

```
# Bulk add users to container and JupyterHub with temp password of username
set -x
USERS=(shirley diego ana richard li john mary anaya)
TOKEN=$(sudo docker exec jupyterhub /opt/conda/bin/jupyterhub token jovyan | tail -1)
for i in "${USERS[@]}"; 
do 
   sudo docker exec jupyterhub useradd -m -s /bin/bash -N $i
   sudo docker exec jupyterhub bash -c "echo $i:$i | chpasswd"
   curl -XPOST --silent -k https://$(hostname):9443/hub/api/users/$i \
 -H "Authorization: token $TOKEN" | jq
done
```

Save the script to a location in Amazon S3, for example, `s3://amzn-s3-demo-bucket/createjupyterusers.sh`. Then you can use `script-runner.jar` to run it as a step.

### Example: Running the script when you create a cluster (AWS CLI)
<a name="emr-jupyterhub-multuser-createcluster"></a>

**Note**  
Linux line continuation characters (\\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace them with a caret (^).

```
aws emr create-cluster --name="MyJupyterHubCluster" --release-label emr-5.36.2 \
--applications Name=JupyterHub --log-uri s3://amzn-s3-demo-bucket/MyJupyterClusterLogs \
--use-default-roles --instance-type m5.xlarge --instance-count 2 --ec2-attributes KeyName=MyKeyPair \
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://amzn-s3-demo-bucket/createjupyterusers.sh"]
```

### Running the script on an existing cluster (AWS CLI)
<a name="emr-jupyterhub-multuser-runningcluster"></a>

**Note**  
Linux line continuation characters (\\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace them with a caret (^).

```
aws emr add-steps --cluster-id j-XXXXXXXX --steps Type=CUSTOM_JAR,\
Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://amzn-s3-demo-bucket/createjupyterusers.sh"]
```

# Using LDAP authentication
<a name="emr-jupyterhub-ldap-users"></a>

Lightweight Directory Access Protocol (LDAP) is an application protocol for querying and modifying objects that correspond to resources, such as users and computers, stored in an LDAP-compatible directory service provider, such as Active Directory or an OpenLDAP server. You can use the [LDAP authenticator plugin for JupyterHub](https://github.com/jupyterhub/ldapauthenticator/) with JupyterHub on Amazon EMR to use LDAP for user authentication. The plugin handles login sessions for LDAP users and provides user information to Jupyter. This allows users to connect to JupyterHub and notebooks using identity credentials stored in an LDAP-compatible server.

The steps in this section walk you through the following tasks to set up and enable LDAP using the LDAP authenticator plugin for JupyterHub. You perform the steps while connected to the master node command line. For more information, see [Connecting to the master node and notebook servers](emr-jupyterhub-connect.md).

1. Create an LDAP configuration file with information about the LDAP server, such as the host IP address, port, bind names, and so on.

1. Modify `/etc/jupyter/conf/jupyterhub_config.py` to enable the LDAP authenticator plugin for JupyterHub.

1. Create and run a script that configures LDAP within the `jupyterhub` container.

1. Query LDAP for users, and then create home directories within the container for each user. JupyterHub needs home directories to host notebooks.

1. Run a script that restarts JupyterHub.

**Important**  
Before you set up LDAP, test your network infrastructure to make sure that the LDAP server and the cluster's master node can communicate as required. LDAP over TLS typically uses port 389 over a plain TCP connection. If your LDAP connection uses SSL, the well-known TCP port for SSL is 636.
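
One quick way to run that connectivity test from the master node is a short TCP probe. The following sketch is illustrative only (`ldap.example.org` is a placeholder; substitute your LDAP server's IP address or resolvable host name) and checks whether the standard LDAP port 389 and the SSL port 636 accept connections:

```python
import socket

def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# ldap.example.org is a placeholder; substitute your LDAP server's
# IP address or resolvable host name.
for port in (389, 636):
    status = "open" if can_reach("ldap.example.org", port, timeout=1.0) else "unreachable"
    print(port, status)
```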

## Creating an LDAP configuration file
<a name="emr-jupyterhub-ldap-config"></a>

The examples that follow use the following placeholder configuration values. Replace these with parameters that match your implementation.
+ The LDAP server runs version 3 and is available on port 389. This is the standard, non-SSL port for LDAP.
+ The base distinguished name (DN) is `dc=example, dc=org`.

Use a text editor to create an [ldap.conf](http://manpages.ubuntu.com/manpages/bionic/man5/ldap.conf.5.html) file with contents similar to the following. Use values appropriate for your LDAP implementation. Replace *host* with the IP address or resolvable host name of your LDAP server.

```
base dc=example,dc=org
uri ldap://host
ldap_version 3
binddn cn=admin,dc=example,dc=org
bindpw admin
```

## Enabling the LDAP authenticator plugin for JupyterHub
<a name="emr-jupyterhub-ldap-plugin"></a>

Use a text editor to modify the `/etc/jupyter/conf/jupyterhub_config.py` file, adding [ldapauthenticator](https://github.com/jupyterhub/ldapauthenticator) properties similar to the following. Replace *host* with the IP address or resolvable host name of the LDAP server. This example assumes that the user objects are in an organizational unit (ou) named *people*, and uses the distinguished name components that you established earlier with `ldap.conf`.

```
c.JupyterHub.authenticator_class = 'ldapauthenticator.LDAPAuthenticator'
c.LDAPAuthenticator.use_ssl = False
c.LDAPAuthenticator.server_address = 'host' 
c.LDAPAuthenticator.bind_dn_template = 'cn={username},ou=people,dc=example,dc=org'
```

## Configuring LDAP with the container
<a name="emr-jupyterhub-ldap-container"></a>

Use a text editor to create a bash script with the following contents:

```
#!/bin/bash

# Uncomment the following lines to install LDAP client libraries only if
# using Amazon EMR release version 5.14.0. Later versions install libraries by default.
# sudo docker exec jupyterhub bash -c "sudo apt-get update"
# sudo docker exec jupyterhub bash -c "sudo apt-get -y install libnss-ldap libpam-ldap ldap-utils nscd"
 
# Copy ldap.conf
sudo docker cp ldap.conf jupyterhub:/etc/ldap/
sudo docker exec jupyterhub bash -c "cat /etc/ldap/ldap.conf"
 
# configure nss switch
sudo docker exec jupyterhub bash -c "sed -i 's/\(^passwd.*\)/\1 ldap/g' /etc/nsswitch.conf"
sudo docker exec jupyterhub bash -c "sed -i 's/\(^group.*\)/\1 ldap/g' /etc/nsswitch.conf"
sudo docker exec jupyterhub bash -c "sed -i 's/\(^shadow.*\)/\1 ldap/g' /etc/nsswitch.conf"
sudo docker exec jupyterhub bash -c "cat /etc/nsswitch.conf"
 
# configure PAM to create home directories
sudo docker exec jupyterhub bash -c "echo 'session required        pam_mkhomedir.so skel=/etc/skel umask=077' >> /etc/pam.d/common-session"
sudo docker exec jupyterhub bash -c "cat /etc/pam.d/common-session"
 
# restart nscd service
sudo docker exec jupyterhub bash -c "sudo service nscd restart"
 
# Test
sudo docker exec jupyterhub bash -c "getent passwd"

# Install ldap plugin
sudo docker exec jupyterhub bash -c "pip install jupyterhub-ldapauthenticator"
```

Save the script on the master node, and then run it from the master node command line. For example, with the script saved as `configure_ldap_client.sh`, make the file executable:

```
chmod +x configure_ldap_client.sh
```

And then run the script:

```
./configure_ldap_client.sh
```

## Adding properties to Active Directory
<a name="emr-jupyterhub-ldap-adproperties"></a>

For the JupyterHub Docker container to find each user in the database and create a proper entry, the corresponding user object in Active Directory must have the following UNIX attributes set. For more information, see *Now that the Unix Attributes Plugin is no longer available for the Active Directory Users and Computers MMC snap-in, how do I continue editing GID/UID RFC 2307 attributes?* in the article [Clarification regarding the status of Identity Management for Unix (IDMU) & NIS Server Role in Windows Server 2016 Technical Preview and beyond](https://blogs.technet.microsoft.com/activedirectoryua/2016/02/09/identity-management-for-unix-idmu-is-deprecated-in-windows-server/).
+ `homeDirectory`

  This is the location of the user's home directory, which is typically `/home/username`.
+ `gidNumber`

  This is a value greater than 60000 that is not already in use by another group. Check the `/etc/group` file for GIDs in use.
+ `uidNumber`

  This is a value greater than 60000 that is not already in use by another user. Check the `/etc/passwd` file for UIDs in use.
+ `uid`

  This value is the same as the *username*.

## Creating user home directories
<a name="emr-jupyterhub-ldap-directories"></a>

JupyterHub requires home directories in the container to authenticate LDAP users and store instance data. The following example demonstrates two users, *shirley* and *diego*, in the LDAP directory.

The first step is to query the LDAP server for each user's user ID and group ID information using [ldapsearch](http://manpages.ubuntu.com/manpages/xenial/man1/ldapsearch.1.html), as shown in the following example. Replace *host* with the IP address or resolvable host name of your LDAP server:

```
ldapsearch -x -H ldap://host \
 -D "cn=admin,dc=example,dc=org" \
 -w admin \
 -b "ou=people,dc=example,dc=org" \
 -s sub \
 "(objectclass=*)" uidNumber gidNumber
```

The `ldapsearch` command returns an LDIF-formatted response for the users *shirley* and *diego* similar to the following.

```
# extended LDIF

# LDAPv3
# base <ou=people,dc=example,dc=org> with scope subtree
# filter: (objectclass=*)
# requesting: uidNumber gidNumber sn 

# people, example.org
dn: ou=people,dc=example,dc=org

# diego, people, example.org
dn: cn=diego,ou=people,dc=example,dc=org
sn: B
uidNumber: 1001
gidNumber: 100

# shirley, people, example.org
dn: cn=shirley,ou=people,dc=example,dc=org
sn: A
uidNumber: 1002
gidNumber: 100

# search result
search: 2
result: 0 Success

# numResponses: 4
# numEntries: 3
```
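
If you have many users, extracting the `uidNumber` and `gidNumber` values from the LDIF response by hand gets tedious. The following sketch is an illustrative helper (not part of the EMR tooling) that parses an LDIF response like the one above and prints the per-user container commands used in the next step:

```python
import re

def parse_ldif_ids(ldif_text):
    """Map each user's common name (cn) to its uidNumber and gidNumber from an LDIF response."""
    users = {}
    current = None
    for line in ldif_text.splitlines():
        dn = re.match(r"dn: cn=([^,]+),", line)
        if dn:
            current = dn.group(1)
            users[current] = {}
            continue
        attr = re.match(r"(uidNumber|gidNumber): (\d+)", line)
        if attr and current is not None:
            users[current][attr.group(1)] = int(attr.group(2))
    return users

# Trimmed-down LDIF response, as returned by the ldapsearch command above.
sample = """\
dn: cn=diego,ou=people,dc=example,dc=org
uidNumber: 1001
gidNumber: 100

dn: cn=shirley,ou=people,dc=example,dc=org
uidNumber: 1002
gidNumber: 100
"""

for cn, ids in parse_ldif_ids(sample).items():
    print(f'sudo docker container exec jupyterhub bash -c "mkdir /home/{cn}"')
    print(f'sudo docker container exec jupyterhub bash -c "chown -R {ids["uidNumber"]} /home/{cn}"')
    print(f'sudo docker container exec jupyterhub bash -c "chgrp -R {ids["gidNumber"]} /home/{cn}"')
```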

Using the information from the response, run commands within the container to create a home directory for each user's common name (`cn`). Use the `uidNumber` and `gidNumber` to correct the ownership of the home directory for that user. The following example commands do this for the user *shirley*.

```
sudo docker container exec jupyterhub bash -c "mkdir /home/shirley"
sudo docker container exec jupyterhub bash -c "chown -R $uidNumber /home/shirley"
sudo docker container exec jupyterhub bash -c "sudo chgrp -R $gidNumber /home/shirley"
```

**Note**  
The LDAP authenticator for JupyterHub does not support creating local users. For more information, see the [LDAP authenticator configuration note on local user creation](https://github.com/jupyterhub/ldapauthenticator#configuration-note-on-local-user-creation).  
To create a local user manually, use the following command.  

```
sudo docker exec jupyterhub bash -c "echo 'shirley:x:$uidNumber:$gidNumber::/home/shirley:/bin/bash' >> /etc/passwd"
```

## Restarting the jupyterhub container
<a name="emr-jupyterhub-ldap-restart"></a>

To restart the `jupyterhub` container, run the following commands:

```
sudo docker stop jupyterhub
sudo docker start jupyterhub
```

# User impersonation
<a name="emr-jupyterhub-user-impersonation"></a>

A Spark job that runs within a Jupyter notebook traverses multiple applications during its execution on Amazon EMR. For example, PySpark3 code that a user runs in Jupyter is received by Sparkmagic, which uses an HTTP POST request to submit it to Livy, which then creates a Spark job to execute on the cluster using YARN.

By default, YARN jobs submitted this way run as the user `livy`, regardless of the user who initiated the job. By setting up *user impersonation*, the user ID of the notebook user is also associated with the YARN job. Rather than having jobs initiated by both `shirley` and `diego` associated with the user `livy`, jobs that each user initiates are associated with `shirley` and `diego` respectively. This helps you audit Jupyter usage and manage applications within your organization.

This configuration is supported only when the calls from Sparkmagic to Livy are unauthenticated. Applications that provide an authentication or proxy layer between Hadoop applications and Livy, such as Apache Knox Gateway, are not supported. The steps for configuring user impersonation in this section assume that JupyterHub and Livy are running on the same master node. If your application uses separate clusters, you must modify [Step 3: Create HDFS home directories for users](#Step3-UserImpersonation) so that the HDFS directories are created on the Livy master node.

**Topics**
+ [Step 1: Configure Livy](#Step1-UserImpersonation)
+ [Step 2: Add users](#Step2-UserImpersonation)
+ [Step 3: Create HDFS home directories for users](#Step3-UserImpersonation)

## Step 1: Configure Livy
<a name="Step1-UserImpersonation"></a>

You use the `livy-conf` and `core-site` configuration classifications when you create a cluster to enable Livy user impersonation, as shown in the following example. Save the configuration classifications as JSON and reference the file when you create the cluster, or specify the configuration classifications inline. For more information, see [Configure applications](emr-configure-apps.md).

```
[
  {
    "Classification": "livy-conf",
    "Properties": {
      "livy.impersonation.enabled": "true"
    }
  },
  {
    "Classification": "core-site",
    "Properties": {
      "hadoop.proxyuser.livy.groups": "*",
      "hadoop.proxyuser.livy.hosts": "*"
    }
  }
]
```
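
If you script cluster creation with the AWS SDK for Python (Boto3), the same classifications can be built in code and passed to `run_job_flow` through its `Configurations` parameter. A minimal sketch follows (the function name is illustrative):

```python
import json

def livy_impersonation_configurations():
    """Configuration classifications that enable Livy user impersonation on the cluster."""
    return [
        {
            "Classification": "livy-conf",
            "Properties": {"livy.impersonation.enabled": "true"},
        },
        {
            "Classification": "core-site",
            "Properties": {
                "hadoop.proxyuser.livy.groups": "*",
                "hadoop.proxyuser.livy.hosts": "*",
            },
        },
    ]

# Equivalent to the JSON file shown above; pass as, for example,
# boto3.client("emr").run_job_flow(..., Configurations=livy_impersonation_configurations())
print(json.dumps(livy_impersonation_configurations(), indent=2))
```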

## Step 2: Add users
<a name="Step2-UserImpersonation"></a>

Add JupyterHub users using PAM or LDAP. For more information, see [Using PAM authentication](emr-jupyterhub-pam-users.md) and [Using LDAP authentication](emr-jupyterhub-ldap-users.md).

## Step 3: Create HDFS home directories for users
<a name="Step3-UserImpersonation"></a>

You connected to the master node to create users. While still connected to the master node, copy the contents below and save them to a script file. The script creates an HDFS home directory for each JupyterHub user on the master node. The script assumes that you are using the default administrator user ID, *jovyan*.

```
#!/bin/bash

CURL="curl --silent -k"
HOST=$(curl -s http://169.254.169.254/latest/meta-data/local-hostname)

admin_token() {
    local user=jovyan
    local pwd=jupyter
    local token=$($CURL https://$HOST:9443/hub/api/authorizations/token \
        -d "{\"username\":\"$user\", \"password\":\"$pwd\"}" | jq ".token")
    if [[ $token != null ]]; then
        token=$(echo $token | sed 's/"//g')
    else
        echo "Unable to get Jupyter API Token."
        exit 1
    fi
    echo $token
}

# Get Jupyter Admin token
token=$(admin_token)

# Get list of Jupyter users
users=$(curl -XGET -s -k https://$HOST:9443/hub/api/users \
 -H "Authorization: token $token" | jq '.[].name' | sed 's/"//g')

# Create HDFS home dir 
for user in ${users[@]}; 
do
 echo "Create hdfs home dir for $user"
 hadoop fs -mkdir /user/$user
 hadoop fs -chmod 777 /user/$user
done
```

# Installing additional kernels and libraries
<a name="emr-jupyterhub-install-kernels-libs"></a>

When you create a cluster with JupyterHub on Amazon EMR, the default Python 3 kernel for Jupyter, along with the PySpark and Spark kernels for Sparkmagic, are installed on the Docker container. You can install additional kernels. You can also install additional libraries and packages and then import them in the appropriate shell.

## Installing a kernel
<a name="emr-jupyterhub-install-kernels"></a>

Kernels are installed within the Docker container. The easiest way to accomplish this is to create a bash script with the installation commands, save it to the master node, and then use the `sudo docker exec jupyterhub script_name` command to run the script within the `jupyterhub` container. The following example script installs a kernel, and then installs a few libraries for that kernel on the master node so that you can later import the libraries using the kernel in Jupyter.

```
#!/bin/bash

# Install Python 2 kernel
conda create -n py27 python=2.7 anaconda
source /opt/conda/envs/py27/bin/activate
apt-get update
apt-get install -y gcc
/opt/conda/envs/py27/bin/python -m pip install --upgrade ipykernel
/opt/conda/envs/py27/bin/python -m ipykernel install

# Install libraries for Python 2
/opt/conda/envs/py27/bin/pip install paramiko nltk scipy numpy scikit-learn pandas
```

To install the kernel and libraries within the container, open a terminal connection to the master node, save the script to `/etc/jupyter/install_kernels.sh`, and run the following command on the master node command line:

```
sudo docker exec jupyterhub bash /etc/jupyter/install_kernels.sh
```

## Using and installing libraries
<a name="emr-jupyterhub-install-libs"></a>

A core set of machine learning and data science libraries for Python 3 is pre-installed with JupyterHub on Amazon EMR. You can use `sudo docker exec jupyterhub bash -c "conda list"` and `sudo docker exec jupyterhub bash -c "pip freeze"` to list the installed libraries.

If a Spark job needs libraries on worker nodes, we recommend that you use a bootstrap action to run a script that installs the libraries when you create the cluster. Bootstrap actions run on all cluster nodes during the cluster creation process, which simplifies installation. Installing libraries on core/worker nodes after a cluster is running is more complex. We provide an example Python program in this section that shows how to install these libraries on a running cluster.

The bootstrap action and Python program examples shown in this section use a bash script saved to Amazon S3 to install the libraries on all nodes.

The script referenced in the following examples uses `pip` to install paramiko, nltk, scipy, scikit-learn, and pandas for the Python 3 kernel:

```
#!/bin/bash

sudo python3 -m pip install boto3 paramiko nltk scipy scikit-learn pandas
```

After you create the script, upload it to a location in Amazon S3, for example, `s3://amzn-s3-demo-bucket/install-my-jupyter-libraries.sh`, so that you can use it in a bootstrap action or in a Python program. For more information, see [Uploading objects](https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html) in the *Amazon Simple Storage Service User Guide*.

**To specify a bootstrap action that installs libraries on all nodes when you create a cluster using the AWS CLI**

1. Create a script similar to the earlier example and save it to a location in Amazon S3. We use the example `s3://amzn-s3-demo-bucket/install-my-jupyter-libraries.sh`.

1. Create the cluster with JupyterHub and use the `Path` argument of the `--bootstrap-actions` option to specify the script location, as in the following example:
**Note**  
Linux line continuation characters (\\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace them with a caret (^).

   ```
   aws emr create-cluster --name="MyJupyterHubCluster" --release-label emr-5.36.2 \
   --applications Name=JupyterHub --log-uri s3://amzn-s3-demo-bucket/MyJupyterClusterLogs \
   --use-default-roles --instance-type m5.xlarge --instance-count 2 --ec2-attributes KeyName=MyKeyPair \
   --bootstrap-actions Path=s3://amzn-s3-demo-bucket/install-my-jupyter-libraries.sh,Name=InstallJupyterLibs
   ```

**To specify a bootstrap action that installs libraries on all nodes when you create a cluster using the console**

1. Navigate to the new Amazon EMR console and select **Switch to the old console** from the side navigation. For more information about what to expect when you switch to the old console, see [Using the old console](https://docs.aws.amazon.com/emr/latest/ManagementGuide/whats-new-in-console.html#console-opt-in).

1. Choose **Create cluster**, and then choose **Go to advanced options**.

1. Specify the **Software and Steps** and **Hardware** settings appropriate for your application.

1. On the **General Cluster Settings** screen, expand **Bootstrap Actions**.

1. For **Add bootstrap action**, choose **Custom action**, **Configure and add**.

1. For **Name**, enter a friendly name. For **Script location**, enter the location of the script in Amazon S3 (the example we use is *s3://amzn-s3-demo-bucket/install-my-jupyter-libraries.sh*). Leave **Optional arguments** blank, and choose **Add**.

1. Specify other settings for your cluster, and choose **Next**.

1. Specify security settings, and choose **Create cluster**.

**Example Installing libraries on core nodes of a running cluster**  
After you install libraries on the master node within Jupyter, you can install libraries on running core nodes in various ways. The following example shows a Python program written to run on a local computer. When you run the Python program locally, it uses the `AWS-RunShellScript` document of AWS Systems Manager to run the example script, shown earlier in this section, that installs the libraries on the cluster's core nodes.  

```
import argparse
import time
import boto3


def install_libraries_on_core_nodes(cluster_id, script_path, emr_client, ssm_client):
    """
    Copies and runs a shell script on the core nodes in the cluster.

    :param cluster_id: The ID of the cluster.
    :param script_path: The path to the script, typically an Amazon S3 object URL.
    :param emr_client: The Boto3 Amazon EMR client.
    :param ssm_client: The Boto3 AWS Systems Manager client.
    """
    core_nodes = emr_client.list_instances(
        ClusterId=cluster_id, InstanceGroupTypes=["CORE"]
    )["Instances"]
    core_instance_ids = [node["Ec2InstanceId"] for node in core_nodes]
    print(f"Found core instances: {core_instance_ids}.")

    commands = [
        # Copy the shell script from Amazon S3 to each node instance.
        f"aws s3 cp {script_path} /home/hadoop",
        # Run the shell script to install libraries on each node instance.
        "bash /home/hadoop/install_libraries.sh",
    ]
    for command in commands:
        print(f"Sending '{command}' to core instances...")
        command_id = ssm_client.send_command(
            InstanceIds=core_instance_ids,
            DocumentName="AWS-RunShellScript",
            Parameters={"commands": [command]},
            TimeoutSeconds=3600,
        )["Command"]["CommandId"]
        while True:
            # Verify the previous step succeeded before running the next step.
            cmd_result = ssm_client.list_commands(CommandId=command_id)["Commands"][0]
            if cmd_result["StatusDetails"] == "Success":
                print("Command succeeded.")
                break
            elif cmd_result["StatusDetails"] in ["Pending", "InProgress"]:
                print(f"Command status is {cmd_result['StatusDetails']}, waiting...")
                time.sleep(10)
            else:
                print(f"Command status is {cmd_result['StatusDetails']}, quitting.")
                raise RuntimeError(
                    f"Command {command} failed to run. "
                    f"Details: {cmd_result['StatusDetails']}"
                )


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("cluster_id", help="The ID of the cluster.")
    parser.add_argument("script_path", help="The path to the script in Amazon S3.")
    args = parser.parse_args()

    emr_client = boto3.client("emr")
    ssm_client = boto3.client("ssm")

    install_libraries_on_core_nodes(
        args.cluster_id, args.script_path, emr_client, ssm_client
    )


if __name__ == "__main__":
    main()
```
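The polling loop above branches on the `StatusDetails` string returned by `list_commands`. That decision can be pulled out into a small pure function, which makes the wait/succeed/fail behavior easy to unit test without calling AWS. This is a sketch derived from the program above; the status strings are exactly the ones the loop already checks:

```python
def classify_command_status(status_details):
    """Map an SSM command StatusDetails value to a polling decision:
    'done' stops polling with success, 'wait' keeps polling, and
    'fail' means the command should be treated as an error."""
    if status_details == "Success":
        return "done"
    if status_details in ("Pending", "InProgress"):
        return "wait"
    # Anything else (e.g. "Failed", "TimedOut") aborts the run.
    return "fail"
```

Inside the loop, `classify_command_status(cmd_result["StatusDetails"])` would then drive the break, sleep, or raise.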

# JupyterHub release history
<a name="JupyterHub-release-history"></a>

The following table lists the version of JupyterHub included in each Amazon EMR release version, along with the components installed with the application. For component versions in each release, see the Component Version section for your release in [Amazon EMR 7.x release versions](emr-release-7x.md), [Amazon EMR 6.x release versions](emr-release-6x.md), or [Amazon EMR 5.x release versions](emr-release-5x.md).


**JupyterHub version information**  

| Amazon EMR release label | JupyterHub version | Components installed with JupyterHub | 
| --- | --- | --- | 
| emr-7.12.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-hdfs-zkfc, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.11.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-hdfs-zkfc, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.10.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.9.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.8.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.7.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.6.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.5.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.4.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.3.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.2.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.36.2 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.1.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.0.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.15.0 | 1.5.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.14.0 | 1.5.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.13.0 | 1.5.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.12.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.11.1 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.11.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.10.1 | 1.5.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.10.0 | 1.5.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.9.1 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.9.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.8.1 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.8.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.7.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.36.1 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.36.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.6.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.35.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.5.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.4.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.3.1 | 1.2.2 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.3.0 | 1.2.2 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.2.1 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.2.0 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.1.1 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.1.0 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.0.1 | 1.0.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.0.0 | 1.0.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.34.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.33.1 | 1.2.2 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.33.0 | 1.2.2 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.32.1 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.32.0 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.31.1 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.31.0 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.30.2 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.30.1 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.30.0 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.29.0 | 1.0.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.28.1 | 1.0.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.28.0 | 1.0.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.27.1 | 1.0.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.27.0 | 1.0.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.26.0 | 0.9.6 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.25.0 | 0.9.6 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.24.1 | 0.9.6 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.24.0 | 0.9.6 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.23.1 | 0.9.4 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.23.0 | 0.9.4 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.22.0 | 0.9.4 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.21.2 | 0.9.4 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.21.1 | 0.9.4 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.21.0 | 0.9.4 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.20.1 | 0.9.4 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.20.0 | 0.9.4 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.19.1 | 0.9.4 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.19.0 | 0.9.4 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.18.1 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.18.0 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.17.2 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.17.1 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.17.0 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.16.1 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.16.0 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.15.1 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.15.0 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.14.2 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.14.1 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.14.0 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
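For programmatic checks — for example, validating that a cluster's release label ships the JupyterHub version you expect — a few rows of the table above can be encoded as a lookup. This is a sketch containing only a handful of entries; extend it with the rows you need:

```python
# Release-label → JupyterHub version pairs taken from the table above
# (partial; add further rows as needed).
JUPYTERHUB_VERSIONS = {
    "emr-7.12.0": "1.5.0",
    "emr-7.0.0": "1.5.0",
    "emr-6.15.0": "1.5.0",
    "emr-6.12.0": "1.4.1",
    "emr-5.36.2": "1.4.1",
    "emr-5.20.0": "0.9.4",
}


def jupyterhub_version(release_label):
    """Return the JupyterHub version for a release label, or None if
    the label is not present in this (partial) lookup."""
    return JUPYTERHUB_VERSIONS.get(release_label)
```

A configuration script could use this to fail fast when a cluster is launched on a release label whose JupyterHub version is older than required.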