


# JupyterHub
<a name="emr-jupyterhub"></a>

[Jupyter Notebook](https://jupyter.org/) is an open-source web application that you can use to create and share documents that contain live code, equations, visualizations, and narrative text. [JupyterHub](https://jupyterhub.readthedocs.io/en/latest/) allows you to host multiple instances of a single-user Jupyter notebook server. When you create a cluster with JupyterHub, Amazon EMR creates a Docker container on the cluster's master node. All of the components required by JupyterHub, Jupyter, and [Sparkmagic](https://github.com/jupyter-incubator/sparkmagic/blob/master/README.md) run inside that container.

Sparkmagic is a library of kernels that allows Jupyter notebooks to communicate with [Apache Spark](https://aws.amazon.com/big-data/what-is-spark/) running on Amazon EMR through [Apache Livy](emr-livy.md), which is a REST server for Spark. When you create a cluster with JupyterHub, Spark and Apache Livy are installed automatically. The default Python 3 kernel for Jupyter is available along with the PySpark, PySpark 3, and Spark kernels that Sparkmagic provides. You can use these kernels to run ad-hoc Spark code and interactive SQL queries using Python and Scala. You can install additional kernels within the Docker container manually. For more information, see [Installing additional kernels and libraries](emr-jupyterhub-install-kernels-libs.md).

The diagram below illustrates the components of JupyterHub on Amazon EMR, along with the corresponding authentication methods for notebook users and administrators. For more information, see [Adding Jupyter notebook users and administrators](emr-jupyterhub-user-access.md).

![\[JupyterHub architecture on EMR showing user authentication and component interactions.\]](http://docs.aws.amazon.com/zh_cn/emr/latest/ReleaseGuide/images/jupyter-arch.png)


The following table lists the version of JupyterHub included in the latest release of the Amazon EMR 7.x series, along with the components that Amazon EMR installs with JupyterHub.

For the version of components installed with JupyterHub in this release, see [Release 7.12.0 component versions](emr-7120-release.md).


**JupyterHub version information for emr-7.12.0**  

| Amazon EMR release label | JupyterHub version | Components installed with JupyterHub | 
| --- | --- | --- | 
| emr-7.12.0 | JupyterHub 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-hdfs-zkfc, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 

The following table lists the version of JupyterHub included in the latest release of the Amazon EMR 6.x series, along with the components that Amazon EMR installs with JupyterHub.

For the version of components installed with JupyterHub in this release, see [Release 6.15.0 component versions](emr-6150-release.md).


**JupyterHub version information for emr-6.15.0**  

| Amazon EMR release label | JupyterHub version | Components installed with JupyterHub | 
| --- | --- | --- | 
| emr-6.15.0 | JupyterHub 1.5.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 

The following table lists the version of JupyterHub included in the latest release of the Amazon EMR 5.x series, along with the components that Amazon EMR installs with JupyterHub.

For the version of components installed with JupyterHub in this release, see [Release 5.36.2 component versions](emr-5362-release.md).


**JupyterHub version information for emr-5.36.2**  

| Amazon EMR release label | JupyterHub version | Components installed with JupyterHub | 
| --- | --- | --- | 
| emr-5.36.2 | JupyterHub 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 

The Python 3 kernel included with JupyterHub on Amazon EMR is version 3.6.4.

Libraries installed within the `jupyterhub` container may differ between Amazon EMR release versions and Amazon EC2 AMI versions.

**To list installed libraries using `conda`**
+ Run the following command on the master node command line:

  ```
  sudo docker exec jupyterhub bash -c "conda list"
  ```

**To list installed libraries using `pip`**
+ Run the following command on the master node command line:

  ```
  sudo docker exec jupyterhub bash -c "pip freeze"
  ```

**Topics**
+ [Create a cluster with JupyterHub](emr-jupyterhub-launch.md)
+ [Considerations when using JupyterHub on Amazon EMR](emr-jupyterhub-considerations.md)
+ [Configuring JupyterHub](emr-jupyterhub-configure.md)
+ [Configuring persistence for notebooks in Amazon S3](emr-jupyterhub-s3.md)
+ [Connecting to the master node and notebook servers](emr-jupyterhub-connect.md)
+ [JupyterHub configuration and administration](emr-jupyterhub-administer.md)
+ [Adding Jupyter notebook users and administrators](emr-jupyterhub-user-access.md)
+ [Installing additional kernels and libraries](emr-jupyterhub-install-kernels-libs.md)
+ [JupyterHub release history](JupyterHub-release-history.md)

# Create a cluster with JupyterHub
<a name="emr-jupyterhub-launch"></a>

You can create an Amazon EMR cluster with JupyterHub using the AWS Management Console, the AWS Command Line Interface, or the Amazon EMR API. Make sure that you do not create the cluster with the option to terminate automatically after the completion of steps (the `--auto-terminate` option in the AWS CLI). Also, make sure that administrators and notebook users have access to the key pair that you use when you create the cluster. For more information, see [Use a key pair for SSH credentials](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-access-ssh.html) in the *Amazon EMR Management Guide*.

## Create a cluster with JupyterHub using the console
<a name="emr-jupyterhub-launch-console"></a>

Use the following steps to create a cluster with JupyterHub installed using **Advanced options** in the Amazon EMR console.

**To create an Amazon EMR cluster with JupyterHub installed using the Amazon EMR console**

1. Navigate to the new Amazon EMR console and select **Switch to the old console** from the side navigation. For more information on what to expect when you switch to the old console, see [Using the old console](https://docs.aws.amazon.com/emr/latest/ManagementGuide/whats-new-in-console.html#console-opt-in).

1. Choose **Create cluster**, **Go to advanced options**.

1. Under **Software Configuration**:
   + For **Release**, choose emr-5.36.2, and then select **JupyterHub**.
   + If you use Spark and want to use the AWS Glue Data Catalog as the metastore for Spark SQL, select **Use for Spark table metadata**. For more information, see [Use the AWS Glue Data Catalog with Spark on Amazon EMR](emr-spark-glue.md).
   + For **Edit software settings**, choose **Enter configuration** and specify values, or choose **Load JSON from S3** and specify a JSON configuration file. For more information, see [Configuring JupyterHub](emr-jupyterhub-configure.md).

1. Under **Add steps (optional)**, configure steps to run after the cluster is created, make sure that **Auto-terminate cluster after the last step is completed** is not selected, and then choose **Next**.

1. Choose **Hardware Configuration** options, and then choose **Next**. For more information, see [Configure cluster hardware and networking](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-instances.html) in the *Amazon EMR Management Guide*.

1. Choose **General Cluster Settings** options, and then choose **Next**.

1. Choose **Security Options**, specify a key pair, and then choose **Create Cluster**.

## Create a cluster with JupyterHub using the AWS CLI
<a name="emr-jupyterhub-launch-cli"></a>

To launch a cluster with JupyterHub, use the `aws emr create-cluster` command and specify `Name=JupyterHub` for the `--applications` option. The following example launches a JupyterHub cluster on Amazon EMR with two EC2 instances (one master and one core instance). In addition, debugging is enabled, with logs stored in the Amazon S3 location specified by `--log-uri`. The key pair specified provides access to the Amazon EC2 instances in the cluster.

**Note**  
Linux line continuation characters (\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace them with a caret (^).

```
aws emr create-cluster --name="MyJupyterHubCluster" --release-label emr-5.36.2 \
--applications Name=JupyterHub --log-uri s3://amzn-s3-demo-bucket/MyJupyterClusterLogs \
--use-default-roles --instance-type m5.xlarge --instance-count 2 --ec2-attributes KeyName=MyKeyPair
```

# Considerations when using JupyterHub on Amazon EMR
<a name="emr-jupyterhub-considerations"></a>

Consider the following when using JupyterHub on Amazon EMR.
+ 
**Warning**  
User notebooks and files are saved to the file system on the master node. This is ephemeral storage that does not persist through cluster termination. When a cluster terminates, this data is lost if it has not been backed up. We recommend that you schedule regular backups using `cron` jobs or another means suitable for your application.  
In addition, configuration changes made within the container might not persist if the container restarts. We recommend that you script or otherwise automate container configuration so that you can reproduce customizations more readily.
+ Kerberos authentication set up using an Amazon EMR security configuration is not supported.
+ [OAuthenticator](https://github.com/jupyterhub/oauthenticator) is not supported.

# Configuring JupyterHub
<a name="emr-jupyterhub-configure"></a>

You can customize the configuration of JupyterHub on Amazon EMR and individual user notebooks by connecting to the cluster master node and editing configuration files. After you change values, restart the `jupyterhub` container.

Modify properties in the following files to configure JupyterHub and individual Jupyter notebooks:
+ `jupyterhub_config.py`: By default, this file is saved in the `/etc/jupyter/conf/` directory on the master node. For more information, see [Configuration Basics](http://jupyterhub.readthedocs.io/en/latest/getting-started/config-basics.html) in the JupyterHub documentation.
+ `jupyter_notebook_config.py`: By default, this file is saved in the `/etc/jupyter/` directory and is copied into the `jupyterhub` container as defaults. For more information, see [Config file and command line options](https://jupyter-notebook.readthedocs.io/en/5.7.4/config.html) in the Jupyter Notebook documentation.
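
As a sketch of what such a customization might look like, the following `jupyterhub_config.py` fragment designates administrators and lets them access other users' notebook servers. The user name shown is a placeholder, not part of the default setup:

```python
# Illustrative fragment for /etc/jupyter/conf/jupyterhub_config.py.
# "admin-user" is a placeholder; replace it with a real user name.
c.Authenticator.admin_users = {'admin-user'}
# Allow admins to access other users' notebook servers from the admin panel.
c.JupyterHub.admin_access = True
```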

You can also customize Sparkmagic using the `jupyter-sparkmagic-conf` configuration classification, which updates values in the `config.json` file for Sparkmagic. For more information about available settings, see [example_config.json](https://github.com/jupyter-incubator/sparkmagic/blob/master/sparkmagic/example_config.json) on GitHub. For more information about using configuration classifications with applications in Amazon EMR, see [Configure applications](emr-configure-apps.md).

The following example launches a cluster using the AWS CLI, referencing a file, `MyJupyterConfig.json`, that contains Sparkmagic configuration classification settings.

**Note**  
Linux line continuation characters (\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace them with a caret (^).

```
aws emr create-cluster --use-default-roles --release-label emr-5.14.0 \
--applications Name=Jupyter --instance-type m4.xlarge --instance-count 3 \
--ec2-attributes KeyName=MyKey,SubnetId=subnet-1234a5b6 --configurations file://MyJupyterConfig.json
```

The example contents of `MyJupyterConfig.json` are as follows:

```
[
    {
    "Classification":"jupyter-sparkmagic-conf",
    "Properties": {
      "kernel_python_credentials" : "{\"username\":\"diego\",\"base64_password\":\"mypass\",\"url\":\"http:\/\/localhost:8998\",\"auth\":\"None\"}"
      }
    }
]
```
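
Because the Sparkmagic settings are embedded as an escaped JSON string inside the classification's `Properties`, it can be easier to generate the file programmatically than to hand-escape the quotes. The following Python sketch (illustrative; the credential values are placeholders from the example above) uses `json.dumps` to produce the escaping shown:

```python
import json

# Nested Sparkmagic settings; these values are placeholders.
creds = {
    "username": "diego",
    "base64_password": "mypass",
    "url": "http://localhost:8998",
    "auth": "None",
}

# The nested settings must be embedded as an escaped JSON string.
classification = [
    {
        "Classification": "jupyter-sparkmagic-conf",
        "Properties": {"kernel_python_credentials": json.dumps(creds)},
    }
]

with open("MyJupyterConfig.json", "w") as f:
    json.dump(classification, f, indent=2)
```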

**Note**  
With Amazon EMR versions 5.21.0 and later, you can override cluster configurations and specify additional configuration classifications for each instance group in a running cluster. You do this by using the Amazon EMR console, the AWS Command Line Interface (AWS CLI), or the AWS SDK. For more information, see [Supplying a configuration for an instance group in a running cluster](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps-running-cluster.html).

# Configuring persistence for notebooks in Amazon S3
<a name="emr-jupyterhub-s3"></a>

You can configure a JupyterHub cluster in Amazon EMR so that notebooks that users save persist in Amazon S3 instead of on ephemeral storage on cluster EC2 instances.

Specify Amazon S3 persistence using the `jupyter-s3-conf` configuration classification when you create a cluster. For more information, see [Configure applications](emr-configure-apps.md).

Along with enabling Amazon S3 persistence using the `s3.persistence.enabled` property, use the `s3.persistence.bucket` property to specify the bucket in Amazon S3 where notebooks are saved. Each user's notebooks are saved to a `jupyter/jupyterhub-user-name` folder in the specified bucket. The bucket must already exist in Amazon S3, and the role of the EC2 instance profile that you specify when you create the cluster must have permissions for the bucket (by default, this role is `EMR_EC2_DefaultRole`). For more information, see [Configure IAM roles for Amazon EMR permissions to AWS services](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-iam-roles.html).

When you launch a new cluster using the same configuration classification properties, users can open notebooks with content from the saved location.

Note that if you enable Amazon S3 persistence and import files as modules into a notebook, the files are uploaded to Amazon S3. When you import files without Amazon S3 persistence enabled, they are uploaded to the `jupyterhub` container.

The following example enables Amazon S3 persistence. Notebooks that users save are stored in the `s3://MyJupyterBackups/jupyter/jupyterhub-user-name` folder for each user, where `jupyterhub-user-name` is a user name, such as `diego`.

```
[
    {
        "Classification": "jupyter-s3-conf",
        "Properties": {
            "s3.persistence.enabled": "true",
            "s3.persistence.bucket": "MyJupyterBackups"
        }
    }
]
```

# Connecting to the master node and notebook servers
<a name="emr-jupyterhub-connect"></a>

JupyterHub administrators and notebook users must use an SSH tunnel to connect to the cluster master node, and then connect to the web interfaces served by JupyterHub on the master node. For more information about setting up an SSH tunnel and using it to proxy web connections, see [Connect to the cluster](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-connect-master-node.html) in the *Amazon EMR Management Guide*.

By default, JupyterHub on Amazon EMR is available on **port 9443** on the master node. The internal JupyterHub proxy also serves notebook instances through port 9443. The web interfaces for JupyterHub and Jupyter can be reached using a URL with the following pattern:

**https://*MasterNodeDNS*:9443**

You can specify a different port using the `c.JupyterHub.port` property in the `jupyterhub_config.py` file. For more information, see [Networking basics](http://jupyterhub.readthedocs.io/en/latest/getting-started/networking-basics.html) in the JupyterHub documentation.

By default, JupyterHub on Amazon EMR uses a self-signed certificate for SSL encryption using HTTPS. Users are prompted to trust the self-signed certificate when they connect. You can use your own trusted certificate and keys. Replace the default certificate file, `server.crt`, and key file, `server.key`, in the `/etc/jupyter/conf/` directory on the master node with your own certificate and key files. Use the `c.JupyterHub.ssl_cert` and `c.JupyterHub.ssl_key` properties in the `jupyterhub_config.py` file to specify your SSL materials. For more information, see [Security settings](https://jupyterhub.readthedocs.io/en/latest/tutorial/getting-started/security-basics.html) in the JupyterHub documentation. After you update `jupyterhub_config.py`, restart the container.
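
Taken together, a `jupyterhub_config.py` fragment that changes the port and points to replacement SSL files might look like the following sketch (the certificate and key paths shown are the default locations described above; the port value is an arbitrary example):

```python
# Illustrative fragment for /etc/jupyter/conf/jupyterhub_config.py
c.JupyterHub.port = 9444  # example of a non-default port
c.JupyterHub.ssl_cert = '/etc/jupyter/conf/server.crt'
c.JupyterHub.ssl_key = '/etc/jupyter/conf/server.key'
```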

# JupyterHub configuration and administration
<a name="emr-jupyterhub-administer"></a>

JupyterHub and related components run inside a Docker container named `jupyterhub` that runs the Ubuntu operating system. There are several ways for you to administer the components running inside the container.

**Warning**  
Customizations that you perform within the container do not persist if the container restarts. We recommend that you script or otherwise automate container configuration so that you can reproduce customizations more readily.

## Administration using the command line
<a name="emr-jupyterhub-administer-cli"></a>

When connected to the master node using SSH, you can issue commands using the Docker command line interface (CLI), specifying the container by name (`jupyterhub`) or ID. For example, `sudo docker exec jupyterhub command` runs a command recognized by the operating system or an application running inside the container. You can use this method to add users to the operating system and to install additional applications and libraries within the Docker container. For example, the default container image includes Conda for installing packages, so you might run the following command on the master node command line to install the Keras application inside the container:

```
sudo docker exec jupyterhub conda install keras
```

## Administration by submitting steps
<a name="emr-jupyterhub-administer-steps"></a>

Steps are a way to submit work to a cluster. You can submit steps when you launch a cluster, or you can submit steps to a running cluster. The commands that you would run on the command line can be submitted as steps using `command-runner.jar`. For more information, see [Work with steps using the CLI and console](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-work-with-steps.html) in the *Amazon EMR Management Guide*, and [Run commands and scripts on an Amazon EMR cluster](emr-commandrunner.md).

For example, you could use the following AWS CLI command on your local computer to install Keras in the same way as from the command line of the master node in the previous example:

```
aws emr add-steps --cluster-id MyClusterID --steps Name="Command Runner",Jar="command-runner.jar",Args="/usr/bin/sudo","/usr/bin/docker","exec","jupyterhub","conda","install","keras"
```

In addition, you can script a sequence of steps, upload the script to Amazon S3, and then use `script-runner.jar` to run the script when you create the cluster, or add the script as a step. For more information, see [Run commands and scripts on an Amazon EMR cluster](emr-commandrunner.md). For an example, see [Example: Bash script to add multiple users](emr-jupyterhub-pam-users.md#emr-jupyterhub-script-multuser).

## Administration using REST APIs
<a name="emr-jupyterhub-administer-rest"></a>

JupyterHub and Jupyter, as well as the HTTP proxy that JupyterHub uses, each provide REST APIs that you can use to send requests. To send a request to JupyterHub, you must pass an API token with the request. You can use the `curl` command from the master node command line to run REST commands. For more information, see the following resources:
+ [Using JupyterHub's REST API](http://jupyterhub.readthedocs.io/en/latest/reference/rest.html) in the JupyterHub documentation, which includes instructions for generating an API token
+ [Jupyter Notebook Server API](https://github.com/jupyter/jupyter/wiki/Jupyter-Notebook-Server-API) on GitHub
+ [configurable-http-proxy](https://github.com/jupyterhub/configurable-http-proxy) on GitHub

The following example demonstrates using the JupyterHub REST API to get a list of users. The command passes an administrator token generated earlier, uses the default port 9443 for JupyterHub, and pipes the output to [jq](https://stedolan.github.io/jq/) for easier viewing:

```
curl -XGET -s -k https://$HOST:9443/hub/api/users \
-H "Authorization: token $admin_token" | jq .
```

# Adding Jupyter notebook users and administrators
<a name="emr-jupyterhub-user-access"></a>

You can use one of two methods for users to authenticate to JupyterHub so that they can create notebooks and, optionally, administer JupyterHub. The simplest method is to use JupyterHub's pluggable authentication module (PAM). In addition, JupyterHub on Amazon EMR supports the [LDAP authenticator plugin for JupyterHub](https://github.com/jupyterhub/ldapauthenticator/) for obtaining user identities from an LDAP server, such as a Microsoft Active Directory server. This section provides instructions and examples for adding users with each authentication method.

JupyterHub on Amazon EMR has a default user with administrator permissions. The user name is `jovyan` and the password is `jupyter`. We strongly recommend that you replace this user with another user who has administrator permissions. You can do this using a step when you create the cluster, or by connecting to the master node while the cluster is running.

**Topics**
+ [Using PAM authentication](emr-jupyterhub-pam-users.md)
+ [Using LDAP authentication](emr-jupyterhub-ldap-users.md)
+ [User impersonation](emr-jupyterhub-user-impersonation.md)

# Using PAM authentication
<a name="emr-jupyterhub-pam-users"></a>

Creating PAM users in JupyterHub on Amazon EMR is a two-step process. The first step is to add users to the operating system running in the `jupyterhub` container on the master node, along with a corresponding home directory for each user. The second step is to add these operating system users as JupyterHub users, a process known as *whitelisting* in JupyterHub. After JupyterHub users are added, they can connect to the JupyterHub URL and provide their operating system credentials for access.

When a user logs in, JupyterHub opens that user's notebook server instance, which is saved in the user's home directory on the master node at `/var/lib/jupyter/home/username`. If the notebook server instance does not exist, JupyterHub spawns a notebook instance in the user's home directory. The following sections demonstrate how to add users individually to the operating system and to JupyterHub, followed by a basic bash script that adds multiple users.

## Adding operating system users to the container
<a name="emr-jupyterhub-system-user"></a>

The following example first uses the [useradd](https://linux.die.net/man/8/useradd) command within the container to add a single user, diego, and create a home directory for the user. The second command uses [chpasswd](https://linux.die.net/man/8/chpasswd) to set the password diego for this user. The commands are run on the master node command line when connected using SSH. You can also run these commands using a step, as described in [Administration by submitting steps](emr-jupyterhub-administer.md#emr-jupyterhub-administer-steps) earlier.

```
sudo docker exec jupyterhub useradd -m -s /bin/bash -N diego
sudo docker exec jupyterhub bash -c "echo diego:diego | chpasswd"
```

## Adding JupyterHub users
<a name="emr-jupyterhub-jupyterhub-user"></a>

You can use the **admin** panel in JupyterHub or the REST API to add users and administrators, or users only.

**To add users and administrators using the admin panel in JupyterHub**

1. Connect to the master node using SSH, and then log in to https://*MasterNodeDNS*:9443 using an identity that has administrator permissions.

1. Choose **Control Panel**, **Admin**.

1. Choose **Users**, **Add Users**, or choose **Admins**, **Add Admins**.

**To add users using the REST API**

1. Connect to the master node using SSH and use the following commands on the master node, or run the commands as steps.

1. Get an admin token for making API requests, and use it to replace *AdminToken* in the step that follows.

1. Use the following command, replacing *UserName* with an operating system user created in the container.

   ```
   curl -XPOST -H "Authorization: token AdminToken" "https://$(hostname):9443/hub/api/users/UserName"
   ```

**Note**  
The first time you log in to the JupyterHub web interface, you are automatically added as a JupyterHub non-administrator user.

## Example: Bash script to add multiple users
<a name="emr-jupyterhub-script-multuser"></a>

The following example bash script combines the earlier steps in this section to create multiple JupyterHub users. The script can be run directly on the master node, or it can be uploaded to Amazon S3 and then run as a step.

The script first establishes a set of user names, and uses the `jupyterhub token` command to create an API token for the default administrator, jovyan. It then creates an operating system user in the `jupyterhub` container for each user, assigning each user an initial password that is the same as their user name. Finally, it calls the REST API operation to create each user in JupyterHub, passing the token generated earlier in the script and piping the REST response to `jq` for easier viewing.

```
# Bulk add users to container and JupyterHub with temp password of username
set -x
USERS=(shirley diego ana richard li john mary anaya)
TOKEN=$(sudo docker exec jupyterhub /opt/conda/bin/jupyterhub token jovyan | tail -1)
for i in "${USERS[@]}"; 
do 
   sudo docker exec jupyterhub useradd -m -s /bin/bash -N $i
   sudo docker exec jupyterhub bash -c "echo $i:$i | chpasswd"
   curl -XPOST --silent -k https://$(hostname):9443/hub/api/users/$i \
 -H "Authorization: token $TOKEN" | jq
done
```

Save this script to a location in Amazon S3, such as `s3://amzn-s3-demo-bucket/createjupyterusers.sh`. Then you can use `script-runner.jar` to run it as a step.

### Example: Running the script when you create a cluster (AWS CLI)
<a name="emr-jupyterhub-multuser-createcluster"></a>

**Note**  
Linux line continuation characters (\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace them with a caret (^).

```
aws emr create-cluster --name="MyJupyterHubCluster" --release-label emr-5.36.2 \
--applications Name=JupyterHub --log-uri s3://amzn-s3-demo-bucket/MyJupyterClusterLogs \
--use-default-roles --instance-type m5.xlarge --instance-count 2 --ec2-attributes KeyName=MyKeyPair \
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://amzn-s3-demo-bucket/createjupyterusers.sh"]
```

### Running the script on an existing cluster (AWS CLI)
<a name="emr-jupyterhub-multuser-runningcluster"></a>

**Note**  
Linux line continuation characters (\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace them with a caret (^).

```
aws emr add-steps --cluster-id j-XXXXXXXX --steps Type=CUSTOM_JAR,\
Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://amzn-s3-demo-bucket/createjupyterusers.sh"]
```

# Using LDAP authentication
<a name="emr-jupyterhub-ldap-users"></a>

Lightweight Directory Access Protocol (LDAP) is an application protocol for querying and modifying objects that correspond to resources, such as users and computers, stored in an LDAP-compatible directory service provider, such as Active Directory or an OpenLDAP server. You can use the [LDAP authenticator plugin for JupyterHub](https://github.com/jupyterhub/ldapauthenticator/) with JupyterHub on Amazon EMR to use LDAP for user authentication. The plugin handles login sessions for LDAP users and provides user information to Jupyter. This allows users to connect to JupyterHub and notebooks using identities and credentials stored in an LDAP-compatible server.

The steps in this section walk you through the following sequence for setting up and enabling LDAP using the LDAP authenticator plugin for JupyterHub. You perform the steps while connected to the master node command line. For more information, see [Connecting to the master node and notebook servers](emr-jupyterhub-connect.md).

1. Create an LDAP configuration file that contains information about your LDAP server, such as the host IP address, port, bind names, and so on.

1. Modify `/etc/jupyter/conf/jupyterhub_config.py` to enable the LDAP Authenticator plugin for JupyterHub.

1. Create and run a script that configures LDAP within the `jupyterhub` container.

1. Query LDAP for users, and then create home directories within the container for each user. JupyterHub needs home directories to host notebooks.

1. Run a script that restarts JupyterHub.

**Important**  
Before you set up LDAP, test your network infrastructure to make sure that the LDAP server and the cluster master node can communicate as required. LDAP typically uses port 389 over a plain TCP connection. If your LDAP connection uses SSL, the well-known TCP port for LDAP over SSL is 636.
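
One quick way to test reachability is a plain TCP connection attempt to the LDAP port. The following Python sketch is illustrative only; replace the host and port with your LDAP server's values (389, or 636 for LDAP over SSL):

```python
import socket

def ldap_port_open(host, port=389, timeout=5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `ldap_port_open("ldap.example.org", 389)` returns `True` only when the server accepts TCP connections on that port; it does not validate LDAP itself.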

## Creating the LDAP configuration file
<a name="emr-jupyterhub-ldap-config"></a>

The example below uses the following placeholder configuration values. Replace these with parameters that match your implementation.
+ The LDAP server runs version 3 and is available over port 389. This is the standard, non-SSL port for LDAP.
+ The base distinguished name (DN) is `dc=example, dc=org`.

Use a text editor to create an [ldap.conf](http://manpages.ubuntu.com/manpages/bionic/man5/ldap.conf.5.html) file with contents similar to the following. Use values appropriate for your LDAP implementation. Replace *host* with the IP address or resolvable host name of your LDAP server.

```
base dc=example,dc=org
uri ldap://host
ldap_version 3
binddn cn=admin,dc=example,dc=org
bindpw admin
```

## Enabling the LDAP authenticator plugin for JupyterHub
<a name="emr-jupyterhub-ldap-plugin"></a>

Use a text editor to modify the `/etc/jupyter/conf/jupyterhub_config.py` file, adding [ldapauthenticator](https://github.com/jupyterhub/ldapauthenticator) properties similar to the following. Replace *host* with the IP address or resolvable host name of the LDAP server. The example assumes that user objects are within an organizational unit (ou) named *people*, and uses the distinguished name components that you established earlier using `ldap.conf`.

```
c.JupyterHub.authenticator_class = 'ldapauthenticator.LDAPAuthenticator'
c.LDAPAuthenticator.use_ssl = False
c.LDAPAuthenticator.server_address = 'host' 
c.LDAPAuthenticator.bind_dn_template = 'cn={username},ou=people,dc=example,dc=org'
```

## Configuring LDAP within the container
<a name="emr-jupyterhub-ldap-container"></a>

Use a text editor to create a bash script with the following contents:

```
#!/bin/bash

# Uncomment the following lines to install LDAP client libraries only if
# using Amazon EMR release version 5.14.0. Later versions install libraries by default.
# sudo docker exec jupyterhub bash -c "sudo apt-get update"
# sudo docker exec jupyterhub bash -c "sudo apt-get -y install libnss-ldap libpam-ldap ldap-utils nscd"
 
# Copy ldap.conf
sudo docker cp ldap.conf jupyterhub:/etc/ldap/
sudo docker exec jupyterhub bash -c "cat /etc/ldap/ldap.conf"
 
# configure nss switch
sudo docker exec jupyterhub bash -c "sed -i 's/\(^passwd.*\)/\1 ldap/g' /etc/nsswitch.conf"
sudo docker exec jupyterhub bash -c "sed -i 's/\(^group.*\)/\1 ldap/g' /etc/nsswitch.conf"
sudo docker exec jupyterhub bash -c "sed -i 's/\(^shadow.*\)/\1 ldap/g' /etc/nsswitch.conf"
sudo docker exec jupyterhub bash -c "cat /etc/nsswitch.conf"
 
# configure PAM to create home directories
sudo docker exec jupyterhub bash -c "echo 'session required        pam_mkhomedir.so skel=/etc/skel umask=077' >> /etc/pam.d/common-session"
sudo docker exec jupyterhub bash -c "cat /etc/pam.d/common-session"
 
# restart nscd service
sudo docker exec jupyterhub bash -c "sudo service nscd restart"
 
# Test
sudo docker exec jupyterhub bash -c "getent passwd"

# Install ldap plugin
sudo docker exec jupyterhub bash -c "pip install jupyterhub-ldapauthenticator"
```

Save the script to the master node, and then run it from the master node command line. For example, for a script saved as `configure_ldap_client.sh`, make the file executable:

```
chmod +x configure_ldap_client.sh
```

And then run the script:

```
./configure_ldap_client.sh
```

## Adding properties to Active Directory
<a name="emr-jupyterhub-ldap-adproperties"></a>

To look up each user and create a corresponding entry in its database, the JupyterHub docker container needs the following UNIX attributes for corresponding user objects in Active Directory. For more information, see the section *How do I continue editing the RFC 2307 attributes now that the Unix Attributes plug-in is no longer available for the Active Directory Users and Computers MMC snap-in?* in the article [Clarification regarding the status of Identity Management for Unix (IDMU) and NIS Server Role in Windows Server 2016 Technical Preview and beyond](https://blogs.technet.microsoft.com/activedirectoryua/2016/02/09/identity-management-for-unix-idmu-is-deprecated-in-windows-server/).
+ `homeDirectory`

  This is the location of the user's home directory, which is typically `/home/username`.
+ `gidNumber`

  This is a value greater than 60000 that is not already in use by another group. Check the `/etc/group` file for GIDs in use.
+ `uidNumber`

  This is a value greater than 60000 that is not already in use by another user. Check the `/etc/passwd` file for UIDs in use.
+ `uid`

  This is the same as the *username*.
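
As a quick sanity check, the UID/GID constraints above can be expressed in a few lines of Python. This is an illustrative helper, not part of any AWS tooling:

```python
def unix_attrs_ok(uid_number, gid_number, used_uids, used_gids):
    """Check the constraints described above: each value must be greater
    than 60000 and not already in use by another user or group."""
    return (
        uid_number > 60000 and uid_number not in used_uids
        and gid_number > 60000 and gid_number not in used_gids
    )
```

The `used_uids` and `used_gids` sets would come from parsing `/etc/passwd` and `/etc/group` inside the container.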

## Creating user home directories
<a name="emr-jupyterhub-ldap-directories"></a>

JupyterHub needs home directories within the container to authenticate LDAP users and to store instance data. The following example demonstrates two users, *shirley* and *diego*, in an LDAP directory.

The first step is to query the LDAP server for each user's user ID and group ID information using [ldapsearch](http://manpages.ubuntu.com/manpages/xenial/man1/ldapsearch.1.html), as shown in the following example, replacing *host* with the IP address or resolvable host name of the LDAP server:

```
ldapsearch -x -H ldap://host \
 -D "cn=admin,dc=example,dc=org" \
 -w admin \
 -b "ou=people,dc=example,dc=org" \
 -s sub \
 "(objectclass=*)" uidNumber gidNumber
```

The `ldapsearch` command returns an LDIF-formatted response for the users *shirley* and *diego* similar to the following.

```
# extended LDIF

# LDAPv3
# base <ou=people,dc=example,dc=org> with scope subtree
# filter: (objectclass=*)
# requesting: uidNumber gidNumber sn 

# people, example.org
dn: ou=people,dc=example,dc=org

# diego, people, example.org
dn: cn=diego,ou=people,dc=example,dc=org
sn: B
uidNumber: 1001
gidNumber: 100

# shirley, people, example.org
dn: cn=shirley,ou=people,dc=example,dc=org
sn: A
uidNumber: 1002
gidNumber: 100

# search result
search: 2
result: 0 Success

# numResponses: 4
# numEntries: 3
```

Using the information from the response, run commands within the container to create a home directory for each user common name (`cn`). Use the `uidNumber` and `gidNumber` to determine the user's ownership of the home directory. The following example commands do this for the user *shirley*.

```
sudo docker container exec jupyterhub bash -c "mkdir /home/shirley"
sudo docker container exec jupyterhub bash -c "chown -R $uidNumber /home/shirley"
sudo docker container exec jupyterhub bash -c "sudo chgrp -R $gidNumber /home/shirley"
```

**Note**  
The LDAP authenticator for JupyterHub does not support creating local users. For more information, see the [configuration note on local user creation](https://github.com/jupyterhub/ldapauthenticator#configuration-note-on-local-user-creation) for the LDAP authenticator.  
To create a local user manually, use the following command.

```
sudo docker exec jupyterhub bash -c "echo 'shirley:x:$uidNumber:$gidNumber::/home/shirley:/bin/bash' >> /etc/passwd"
```

## Restarting the JupyterHub container
<a name="emr-jupyterhub-ldap-restart"></a>

Run the following commands to restart the `jupyterhub` container:

```
sudo docker stop jupyterhub
sudo docker start jupyterhub
```

# User impersonation
<a name="emr-jupyterhub-user-impersonation"></a>

Spark jobs that run within a Jupyter notebook traverse multiple applications during their execution on Amazon EMR. For example, PySpark 3 code that a user runs in Jupyter is received by Sparkmagic, which uses an HTTP POST request to submit it to Livy, which then creates a Spark job to execute on the cluster using YARN.

By default, YARN jobs submitted this way run as user `livy`, regardless of the user who initiated the job. By setting up *user impersonation*, you can have the user ID of the notebook user also be the user associated with the YARN job. Instead of jobs initiated by both `shirley` and `diego` being associated with the user `livy`, jobs that each user initiates are associated with `shirley` and `diego`, respectively. This helps you to audit Jupyter usage and manage applications within your organization.

This configuration is only supported when calls from Sparkmagic to Livy are unauthenticated. Applications that provide an authentication or proxy layer between Hadoop applications and Livy, such as Apache Knox Gateway, are not supported. The steps to configure user impersonation in this section assume that JupyterHub and Livy are running on the same master node. If your applications have separate clusters, [Step 3: Create HDFS home directories for users](#Step3-UserImpersonation) needs to be modified so that the HDFS directories are created on the Livy master node.

**Topics**
+ [Step 1: Configure Livy](#Step1-UserImpersonation)
+ [Step 2: Add users](#Step2-UserImpersonation)
+ [Step 3: Create HDFS home directories for users](#Step3-UserImpersonation)

## Step 1: Configure Livy
<a name="Step1-UserImpersonation"></a>

You use the `livy-conf` and `core-site` configuration classifications to enable Livy user impersonation when you create a cluster, as shown in the following example. Save the configuration classifications as JSON and reference them when you create the cluster, or specify the configuration classifications inline. For more information, see [Configure applications](emr-configure-apps.md).

```
[
  {
    "Classification": "livy-conf",
    "Properties": {
      "livy.impersonation.enabled": "true"
    }
  },
  {
    "Classification": "core-site",
    "Properties": {
      "hadoop.proxyuser.livy.groups": "*",
      "hadoop.proxyuser.livy.hosts": "*"
    }
  }
]
```

## Step 2: Add users
<a name="Step2-UserImpersonation"></a>

Add JupyterHub users using PAM or LDAP. For more information, see [Using PAM authentication](emr-jupyterhub-pam-users.md) and [Using LDAP authentication](emr-jupyterhub-ldap-users.md).

## Step 3: Create HDFS home directories for users
<a name="Step3-UserImpersonation"></a>

You connected to the master node to create users. While you are still connected to the master node, copy the contents below and save them to a script file. The script creates HDFS home directories on the master node for each JupyterHub user. The script assumes that you use the default administrator user ID, *jovyan*.

```
#!/bin/bash

CURL="curl --silent -k"
HOST=$(curl -s http://169.254.169.254/latest/meta-data/local-hostname)

admin_token() {
    local user=jovyan
    local pwd=jupyter
    local token=$($CURL https://$HOST:9443/hub/api/authorizations/token \
        -d "{\"username\":\"$user\", \"password\":\"$pwd\"}" | jq ".token")
    if [[ $token != null ]]; then
        token=$(echo $token | sed 's/"//g')
    else
        echo "Unable to get Jupyter API Token."
        exit 1
    fi
    echo $token
}

# Get Jupyter Admin token
token=$(admin_token)

# Get list of Jupyter users
users=$(curl -XGET -s -k https://$HOST:9443/hub/api/users \
 -H "Authorization: token $token" | jq '.[].name' | sed 's/"//g')

# Create HDFS home dir 
for user in ${users[@]}; 
do
 echo "Create hdfs home dir for $user"
 hadoop fs -mkdir /user/$user
 hadoop fs -chmod 777 /user/$user
done
```

# Installing additional kernels and libraries
<a name="emr-jupyterhub-install-kernels-libs"></a>

When you create a cluster with JupyterHub on Amazon EMR, the default Python 3 kernel for Jupyter, along with the PySpark and Spark kernels for Sparkmagic, are installed on the Docker container. You can install additional kernels. You can also install additional libraries and packages, and then import them for the appropriate shell.

## Installing a kernel
<a name="emr-jupyterhub-install-kernels"></a>

Kernels are installed within the Docker container. The easiest way to accomplish this is to create a bash script that contains installation commands, save it to the master node, and then use the `sudo docker exec jupyterhub script_name` command to run the script within the `jupyterhub` container. The following example script installs the kernel, and then installs a few libraries for the kernel on the master node so that later you can import the libraries using the kernel in Jupyter.

```
#!/bin/bash

# Install Python 2 kernel
conda create -n py27 python=2.7 anaconda
source /opt/conda/envs/py27/bin/activate
apt-get update
apt-get install -y gcc
/opt/conda/envs/py27/bin/python -m pip install --upgrade ipykernel
/opt/conda/envs/py27/bin/python -m ipykernel install

# Install libraries for Python 2
/opt/conda/envs/py27/bin/pip install paramiko nltk scipy numpy scikit-learn pandas
```

To install the kernel and libraries within the container, open a terminal connection to the master node, save the script to `/etc/jupyter/install_kernels.sh`, and then run the following command on the master node command line:

```
sudo docker exec jupyterhub bash /etc/jupyter/install_kernels.sh
```

## Using libraries and installing additional libraries
<a name="emr-jupyterhub-install-libs"></a>

A set of core machine learning and data science libraries for Python 3 are pre-installed with JupyterHub on Amazon EMR. You can see the list using `sudo docker exec jupyterhub bash -c "conda list"` and `sudo docker exec jupyterhub bash -c "pip freeze"`.

If a Spark job needs libraries on worker nodes, we recommend that you use a bootstrap action to run a script that installs the libraries when you create the cluster. Bootstrap actions run on all cluster nodes during the cluster creation process, which simplifies installation. Installing libraries on core/worker nodes after a cluster is running is more complex. We provide an example Python program in this section that shows how to install these libraries.

The bootstrap action and Python program examples demonstrated in this section both use a bash script saved to Amazon S3 to install the libraries on all nodes.

The script referenced in the following examples uses `pip` to install paramiko, nltk, scipy, scikit-learn, and pandas for the Python 3 kernel:

```
#!/bin/bash

sudo python3 -m pip install boto3 paramiko nltk scipy scikit-learn pandas
```

After you create the script, upload it to a location in Amazon S3, for example, `s3://amzn-s3-demo-bucket/install-my-jupyter-libraries.sh`, so that you can use it in your bootstrap action or Python program. For more information, see [Uploading objects](https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html) in the *Amazon Simple Storage Service User Guide*.

**To specify a bootstrap action that installs libraries on all nodes when you create a cluster using the AWS CLI**

1. Create a script similar to the preceding example and save it to a location in Amazon S3. We use the example `s3://amzn-s3-demo-bucket/install-my-jupyter-libraries.sh`.

1. Create the cluster with JupyterHub and use the `Path` argument of the `--bootstrap-actions` option to specify the script location, as the following example shows:
**Note**  
Linux line continuation characters (\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace them with a caret (^).

   ```
   aws emr create-cluster --name="MyJupyterHubCluster" --release-label emr-5.36.2 \
   --applications Name=JupyterHub --log-uri s3://amzn-s3-demo-bucket/MyJupyterClusterLogs \
   --use-default-roles --instance-type m5.xlarge --instance-count 2 --ec2-attributes KeyName=MyKeyPair \
   --bootstrap-actions Path=s3://amzn-s3-demo-bucket/install-my-jupyter-libraries.sh,Name=InstallJupyterLibs
   ```

**To specify a bootstrap action that installs libraries on all nodes when you create a cluster using the console**

1. Navigate to the new Amazon EMR console and select **Switch to the old console** from the side navigation. For more information on what to expect when you switch to the old console, see [Using the old console](https://docs.aws.amazon.com/emr/latest/ManagementGuide/whats-new-in-console.html#console-opt-in).

1. Choose **Create cluster**, **Go to advanced options**.

1. Specify the **Software and Steps** and **Hardware** settings as appropriate for your application.

1. On the **General Cluster Settings** screen, expand **Bootstrap Actions**.

1. For **Add bootstrap action**, choose **Custom action**, **Configure and add**.

1. For **Name**, enter a friendly name. For **Script location**, enter the location of your script in Amazon S3 (the example we use is *s3://amzn-s3-demo-bucket/install-my-jupyter-libraries.sh*). Leave **Optional arguments** empty, and then choose **Add**.

1. Specify other settings for your cluster, and then choose **Next**.

1. Specify security settings, and then choose **Create cluster**.

**Example Installing libraries on core nodes of a running cluster**  
After you install libraries on the master node from within Jupyter, you can install libraries on running core nodes in different ways. The following example shows a Python program written to run on a local computer. When you run the Python program locally, it uses the `AWS-RunShellScript` document of AWS Systems Manager to run the example script, shown earlier in this section, which installs libraries on the cluster's core nodes.  

```
import argparse
import time
import boto3


def install_libraries_on_core_nodes(cluster_id, script_path, emr_client, ssm_client):
    """
    Copies and runs a shell script on the core nodes in the cluster.

    :param cluster_id: The ID of the cluster.
    :param script_path: The path to the script, typically an Amazon S3 object URL.
    :param emr_client: The Boto3 Amazon EMR client.
    :param ssm_client: The Boto3 AWS Systems Manager client.
    """
    core_nodes = emr_client.list_instances(
        ClusterId=cluster_id, InstanceGroupTypes=["CORE"]
    )["Instances"]
    core_instance_ids = [node["Ec2InstanceId"] for node in core_nodes]
    print(f"Found core instances: {core_instance_ids}.")

    commands = [
        # Copy the shell script from Amazon S3 to each node instance.
        f"aws s3 cp {script_path} /home/hadoop",
        # Run the shell script to install libraries on each node instance.
        "bash /home/hadoop/install_libraries.sh",
    ]
    for command in commands:
        print(f"Sending '{command}' to core instances...")
        command_id = ssm_client.send_command(
            InstanceIds=core_instance_ids,
            DocumentName="AWS-RunShellScript",
            Parameters={"commands": [command]},
            TimeoutSeconds=3600,
        )["Command"]["CommandId"]
        while True:
            # Verify the previous step succeeded before running the next step.
            cmd_result = ssm_client.list_commands(CommandId=command_id)["Commands"][0]
            if cmd_result["StatusDetails"] == "Success":
                print(f"Command succeeded.")
                break
            elif cmd_result["StatusDetails"] in ["Pending", "InProgress"]:
                print(f"Command status is {cmd_result['StatusDetails']}, waiting...")
                time.sleep(10)
            else:
                print(f"Command status is {cmd_result['StatusDetails']}, quitting.")
                raise RuntimeError(
                    f"Command {command} failed to run. "
                    f"Details: {cmd_result['StatusDetails']}"
                )


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("cluster_id", help="The ID of the cluster.")
    parser.add_argument("script_path", help="The path to the script in Amazon S3.")
    args = parser.parse_args()

    emr_client = boto3.client("emr")
    ssm_client = boto3.client("ssm")

    install_libraries_on_core_nodes(
        args.cluster_id, args.script_path, emr_client, ssm_client
    )


if __name__ == "__main__":
    main()
```
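The polling loop above can be factored into a small reusable helper. The sketch below is illustrative only: the names `wait_for_command` and `StubSSM` are not part of the original script, and the stub client stands in for the Boto3 SSM client so the retry logic can be exercised locally without AWS credentials.

```python
import time

def wait_for_command(ssm_client, command_id, poll_seconds=10):
    """Poll SSM until the command leaves Pending/InProgress; raise on failure."""
    while True:
        cmd = ssm_client.list_commands(CommandId=command_id)["Commands"][0]
        status = cmd["StatusDetails"]
        if status == "Success":
            return cmd
        if status in ("Pending", "InProgress"):
            time.sleep(poll_seconds)
            continue
        raise RuntimeError(f"Command {command_id} ended with status {status}")

class StubSSM:
    """Minimal stand-in for the Boto3 SSM client, for local testing only."""
    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def list_commands(self, CommandId):
        # Return one command record per call, advancing through the
        # scripted sequence of statuses.
        return {"Commands": [{"CommandId": CommandId,
                              "StatusDetails": next(self._statuses)}]}

result = wait_for_command(StubSSM(["Pending", "InProgress", "Success"]),
                          "cmd-123", poll_seconds=0)
print(result["StatusDetails"])  # Success
```

In the real script, the Boto3 `ssm_client` would be passed in place of `StubSSM`; the helper's control flow is the same as the inline loop in `install_libraries_on_core_nodes`.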

# JupyterHub release history
<a name="JupyterHub-release-history"></a>

The following table lists the version of JupyterHub included in each release of Amazon EMR, along with the components installed with the application. For component versions in each release, see the "Component versions" section for your release in [Amazon EMR 7.x release versions](emr-release-7x.md), [Amazon EMR 6.x release versions](emr-release-6x.md), or [Amazon EMR 5.x release versions](emr-release-5x.md).


**JupyterHub version information**  

| Amazon EMR release label | JupyterHub version | Components installed with JupyterHub | 
| --- | --- | --- | 
| emr-7.12.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-hdfs-zkfc, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.11.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-hdfs-zkfc, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.10.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.9.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.8.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.7.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.6.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.5.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.4.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.3.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.2.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.36.2 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.1.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-7.0.0 | 1.5.0 | emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.15.0 | 1.5.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.14.0 | 1.5.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.13.0 | 1.5.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.12.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.11.1 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.11.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.10.1 | 1.5.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.10.0 | 1.5.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.9.1 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.9.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.8.1 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.8.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.7.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.36.1 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.36.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.6.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hudi, hudi-spark, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.35.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.5.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.4.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.3.1 | 1.2.2 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.3.0 | 1.2.2 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.2.1 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.2.0 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.1.1 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.1.0 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.0.1 | 1.0.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-6.0.0 | 1.0.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.34.0 | 1.4.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.33.1 | 1.2.2 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.33.0 | 1.2.2 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.32.1 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.32.0 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.31.1 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.31.0 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.30.2 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.30.1 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.30.0 | 1.1.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.29.0 | 1.0.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.28.1 | 1.0.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.28.0 | 1.0.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.27.1 | 1.0.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.27.0 | 1.0.0 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.26.0 | 0.9.6 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.25.0 | 0.9.6 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.24.1 | 0.9.6 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.24.0 | 0.9.6 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.23.1 | 0.9.4 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.23.0 | 0.9.4 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.22.0 | 0.9.4 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.21.2 | 0.9.4 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.21.1 | 0.9.4 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.21.0 | 0.9.4 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.20.1 | 0.9.4 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.20.0 | 0.9.4 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.19.1 | 0.9.4 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.19.0 | 0.9.4 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.18.1 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.18.0 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.17.2 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.17.1 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.17.0 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.16.1 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.16.0 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.15.1 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.15.0 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.14.2 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.14.1 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 
| emr-5.14.0 | 0.8.1 | aws-sagemaker-spark-sdk, emrfs, emr-goodies, emr-ddb, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, r, spark-client, spark-history-server, spark-on-yarn, spark-yarn-slave, livy-server, jupyterhub | 