

# Adding Jupyter Notebook users and administrators
<a name="emr-jupyterhub-user-access"></a>

You can use one of two methods for users to authenticate to JupyterHub so that they can create notebooks and, optionally, administer JupyterHub. The easiest method is to use JupyterHub's default authenticator, which uses Pluggable Authentication Modules (PAM) to check operating system user accounts. In addition, JupyterHub on Amazon EMR supports the [LDAP authenticator plugin for JupyterHub](https://github.com/jupyterhub/ldapauthenticator/) for obtaining user identities from an LDAP server, such as a Microsoft Active Directory server. This section provides instructions and examples for adding users with each authentication method.

JupyterHub on Amazon EMR has a default user with administrator permissions. The user name is `jovyan` and the password is `jupyter`. We strongly recommend that you replace the user with another user who has administrative permissions. You can do this using a step when you create the cluster, or by connecting to the master node when the cluster is running.

**Topics**
+ [Using PAM authentication](emr-jupyterhub-pam-users.md)
+ [Using LDAP authentication](emr-jupyterhub-ldap-users.md)
+ [User impersonation](emr-jupyterhub-user-impersonation.md)

# Using PAM authentication
<a name="emr-jupyterhub-pam-users"></a>

Creating PAM users in JupyterHub on Amazon EMR is a two-step process. The first step is to add users to the operating system running in the `jupyterhub` container on the master node, and to add a corresponding user home directory for each user. The second step is to add these operating system users as JupyterHub users—a process known as whitelisting in JupyterHub. After a JupyterHub user is added, they can connect to the JupyterHub URL and provide their operating system credentials for access.

When a user logs in, JupyterHub opens that user's notebook server instance, which is saved in the user's home directory on the master node, `/var/lib/jupyter/home/username`. If a notebook server instance doesn't exist, JupyterHub spawns a notebook instance in the user's home directory. The following sections demonstrate how to add users individually to the operating system and to JupyterHub, followed by a rudimentary bash script that adds multiple users.

## Adding an operating system user to the container
<a name="emr-jupyterhub-system-user"></a>

The following example first uses the [useradd](https://linux.die.net/man/8/useradd) command within the container to add a single user, diego, and create a home directory for that user. The second command uses [chpasswd](https://linux.die.net/man/8/chpasswd) to establish a password of diego for this user. Commands are run on the master node command line while connected using SSH. You could also run these commands using a step as described earlier in [Administration by submitting steps](emr-jupyterhub-administer.md#emr-jupyterhub-administer-steps).

```
sudo docker exec jupyterhub useradd -m -s /bin/bash -N diego
sudo docker exec jupyterhub bash -c "echo diego:diego | chpasswd"
```
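If you add users one at a time, a small wrapper can keep the two commands together. The following is a minimal sketch, not an Amazon EMR tool. The function name `add_container_user`, the `DOCKER` override used for dry runs, and the user name pattern (an assumed conservative rule) are all hypothetical. The wrapper also passes `user:password` to `chpasswd` on standard input through `docker exec -i`, so the password is not embedded in a `bash -c` string.

```shell
#!/bin/bash
# add_container_user USER PASSWORD
# Hypothetical helper that combines the useradd and chpasswd commands shown above.
# Set DOCKER=echo to print the docker commands instead of running them.
add_container_user() {
  local user="$1" password="$2"
  local -a docker=(${DOCKER:-sudo docker})

  # Assumed conservative naming rule: lowercase letter or underscore first,
  # then lowercase letters, digits, underscores, or hyphens
  if [[ ! $user =~ ^[a-z_][a-z0-9_-]*$ ]]; then
    echo "invalid user name: $user" >&2
    return 1
  fi

  # Create the user and its home directory inside the jupyterhub container
  "${docker[@]}" exec jupyterhub useradd -m -s /bin/bash -N "$user" || return 1

  # Feed "user:password" to chpasswd on stdin instead of quoting it into bash -c
  printf '%s:%s\n' "$user" "$password" | "${docker[@]}" exec -i jupyterhub chpasswd
}
```

For example, `add_container_user diego "$INITIAL_PASSWORD"` creates the same user as the commands above. `DOCKER=echo add_container_user diego x` only prints the two `docker` commands.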

## Adding a JupyterHub user
<a name="emr-jupyterhub-jupyterhub-user"></a>

You can use the **Admin** panel in JupyterHub to add users and administrators, or use the REST API to add users.

**To add users and administrators using the admin panel in JupyterHub**

1. Connect to the master node using SSH and log in to https://*MasterNodeDNS*:9443 with an identity that has administrator permissions.

1. Choose **Control Panel**, **Admin**.

1. Choose **User**, **Add Users**, or choose **Admin**, **Add Admins**.

**To add a user using the REST API**

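1. Connect to the master node using SSH so that you can run the commands in the following steps on the master node. Alternatively, run the commands as a step.

1. Acquire an administrative token to make API requests, for example with the `jupyterhub token` command used in the bash script later in this section, and replace *AdminToken* in the following step with that token.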

1. Use the following command, replacing *UserName* with an operating system user that has been created within the container.

   ```
   curl -XPOST -k -H "Authorization: token AdminToken" "https://$(hostname):9443/hub/api/users/UserName"
   ```

**Note**  
You are automatically added as a JupyterHub non-admin user when you log in to the JupyterHub web interface for the first time.
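To check whether the REST API request succeeded, you can ask `curl` for the HTTP status code instead of the response body. The following sketch assumes the usual JupyterHub REST API behavior: `201` when the user is created and `409` when the user already exists. The function names are hypothetical. Like the bulk script in this section, it passes `-k` to accept the self-signed certificate.

```shell
#!/bin/bash
# describe_create_status CODE: translate the HTTP status returned by
# POST /hub/api/users/NAME into a short message (201 and 409 are assumed meanings).
describe_create_status() {
  case "$1" in
    201) echo "created" ;;
    409) echo "already exists" ;;
    *)   echo "failed (HTTP $1)"; return 1 ;;
  esac
}

# add_hub_user USER TOKEN: hypothetical helper that creates USER through the REST API.
# -o /dev/null discards the response body and -w prints only the status code.
add_hub_user() {
  local user="$1" token="$2" code
  code=$(curl --silent -k -o /dev/null -w '%{http_code}' -XPOST \
    -H "Authorization: token $token" \
    "https://$(hostname):9443/hub/api/users/$user")
  printf '%s: ' "$user"
  describe_create_status "$code"
}
```

Under those assumptions, `add_hub_user diego "$ADMIN_TOKEN"` should report `diego: created` on the first run and `diego: already exists` on a second run.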

## Example: Bash script to add multiple users
<a name="emr-jupyterhub-script-multuser"></a>

The following sample bash script ties together the previous steps in this section to create multiple JupyterHub users. The script can be run directly on the master node, or it can be uploaded to Amazon S3 and then run as a step.

The script first establishes an array of user names, and uses the `jupyterhub token` command to create an API token for the default administrator, jovyan. It then creates an operating system user in the `jupyterhub` container for each user, assigning an initial password to each that is equal to their user name. Finally, it calls the REST API operation to create each user in JupyterHub. It passes the token generated earlier in the script and pipes the REST response to `jq` for easier viewing.

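```
#!/bin/bash
# Bulk add users to the container and JupyterHub with a temporary password equal to the user name
set -x
USERS=(shirley diego ana richard li john mary anaya)

# Create an API token for the default administrator, jovyan
TOKEN=$(sudo docker exec jupyterhub /opt/conda/bin/jupyterhub token jovyan | tail -1)

for i in "${USERS[@]}"; do
   # Create the operating system user and home directory in the container
   sudo docker exec jupyterhub useradd -m -s /bin/bash -N "$i"
   # Set the temporary password
   sudo docker exec jupyterhub bash -c "echo $i:$i | chpasswd"
   # Add the user to JupyterHub through the REST API and pretty-print the response
   curl -XPOST --silent -k "https://$(hostname):9443/hub/api/users/$i" \
      -H "Authorization: token $TOKEN" | jq
done
```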
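After the script runs, you can confirm the result by comparing the names in `USERS` with the names that JupyterHub returns from `GET /hub/api/users`. The following is a sketch with hypothetical function names, and it assumes `jq` is available on the master node.

```shell
#!/bin/bash
# list_hub_users TOKEN: print one JupyterHub user name per line (requires jq)
list_hub_users() {
  curl --silent -k "https://$(hostname):9443/hub/api/users" \
    -H "Authorization: token $1" | jq -r '.[].name'
}

# missing_users EXPECTED ACTUAL: print each name in EXPECTED (space-separated)
# that does not appear as a whole line in ACTUAL (newline-separated).
missing_users() {
  local expected="$1" actual="$2" name
  for name in $expected; do
    if ! grep -qxF -- "$name" <<< "$actual"; then
      echo "$name"
    fi
  done
}
```

Run `missing_users "${USERS[*]}" "$(list_hub_users "$TOKEN")"` in the same shell as the bulk script so that `USERS` and `TOKEN` are set. If it prints nothing, every listed user exists in JupyterHub.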

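Save the script to a location in Amazon S3 such as `s3://amzn-s3-demo-bucket/createjupyterusers.sh`. Then you can use `script-runner.jar` to run it as a step. In the following commands, replace `region` in the `script-runner.jar` path with the AWS Region of your cluster, for example `us-east-1`.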

### Example: Running the script when creating a cluster (AWS CLI)
<a name="emr-jupyterhub-multuser-createcluster"></a>

**Note**  
Linux line continuation characters (`\`) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace with a caret (^).

```
aws emr create-cluster --name="MyJupyterHubCluster" --release-label emr-5.36.2 \
--applications Name=JupyterHub --log-uri s3://amzn-s3-demo-bucket/MyJupyterClusterLogs \
--use-default-roles --instance-type m5.xlarge --instance-count 2 --ec2-attributes KeyName=MyKeyPair \
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://amzn-s3-demo-bucket/createjupyterusers.sh"]
```

### Running the script on an existing cluster (AWS CLI)
<a name="emr-jupyterhub-multuser-runningcluster"></a>

**Note**  
Linux line continuation characters (`\`) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace with a caret (^).

```
aws emr add-steps --cluster-id j-XXXXXXXX --steps Type=CUSTOM_JAR,\
Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://amzn-s3-demo-bucket/createjupyterusers.sh"]
```
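Both commands above submit the step and return immediately. If a shell script needs to wait until the step finishes, you can capture the step ID and use the AWS CLI `step-complete` waiter. The following sketch targets an existing cluster. It uses a hypothetical function name and takes the Region as an argument to build the `script-runner.jar` path.

```shell
#!/bin/bash
# run_user_script_step CLUSTER_ID REGION SCRIPT_URI
# Hypothetical helper: adds the script-runner step, waits for it to finish,
# and prints the step ID. Returns non-zero if submission or the wait fails.
run_user_script_step() {
  local cluster_id="$1" region="$2" script_uri="$3" step_id
  local jar="s3://${region}.elasticmapreduce/libs/script-runner/script-runner.jar"

  # Submit the step and keep only its ID
  step_id=$(aws emr add-steps --cluster-id "$cluster_id" \
    --steps "Type=CUSTOM_JAR,Name=CreateJupyterUsers,ActionOnFailure=CONTINUE,Jar=${jar},Args=[${script_uri}]" \
    --query 'StepIds[0]' --output text) || return 1

  # The waiter polls the step and fails if the step does not complete successfully
  aws emr wait step-complete --cluster-id "$cluster_id" --step-id "$step_id" || return 1
  echo "$step_id"
}
```

For example: `run_user_script_step j-XXXXXXXX us-east-1 s3://amzn-s3-demo-bucket/createjupyterusers.sh`.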

# Using LDAP authentication
<a name="emr-jupyterhub-ldap-users"></a>

Lightweight Directory Access Protocol (LDAP) is an application protocol for querying and modifying objects that correspond to resources such as users and computers stored in an LDAP-compatible directory service provider such as Active Directory or an OpenLDAP server. You can use the [LDAP authenticator plugin for JupyterHub](https://github.com/jupyterhub/ldapauthenticator/) with JupyterHub on Amazon EMR to use LDAP for user authentication. The plugin handles login sessions for LDAP users and provides user information to Jupyter. This lets users connect to JupyterHub and notebooks by using the credentials for their identities stored in an LDAP-compatible server.

This section walks you through the following steps to set up and enable LDAP using the LDAP Authenticator Plugin for JupyterHub. You perform the steps while connected to the master node command line. For more information, see [Connecting to the master node and Notebook servers](emr-jupyterhub-connect.md).

1. Create an LDAP configuration file with information about the LDAP server, such as the host IP address, port, binding names, and so on.

1. Modify `/etc/jupyter/conf/jupyterhub_config.py` to enable the LDAP Authenticator Plugin for JupyterHub.

1. Create and run a script that configures LDAP within the `jupyterhub` container.

1. Query LDAP for users, and then create home directories within the container for each user. JupyterHub requires home directories to host notebooks.

1. Restart the `jupyterhub` container.

**Important**  
Before you set up LDAP, test your network infrastructure to ensure that the LDAP server and the cluster master node can communicate as required. Unencrypted LDAP and LDAP with StartTLS typically use TCP port 389. If your LDAP connection uses SSL (LDAPS), the well-known TCP port is 636.
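One way to run this test from the master node without installing extra tools is to combine bash's `/dev/tcp` pseudo-device with `timeout`. The following sketch only checks that a TCP connection can be opened. It does not test an LDAP bind or certificates. The function name and the 5-second default timeout are assumptions.

```shell
#!/bin/bash
# ldap_port_open HOST PORT [SECONDS]: succeed if a TCP connection to HOST:PORT
# can be opened within SECONDS (default 5). Uses bash's /dev/tcp and coreutils timeout.
ldap_port_open() {
  local host="$1" port="$2" secs="${3:-5}"
  # HOST and PORT are passed as $0 and $1 of the inner shell rather than spliced into the string
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, `ldap_port_open host 389 && echo reachable` checks the plain LDAP port, and the same call with `636` checks LDAPS. Replace *host* with your LDAP server.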

## Create the LDAP configuration file
<a name="emr-jupyterhub-ldap-config"></a>

The example below uses the following place-holder configuration values. Replace these with parameters that match your implementation.
+ The LDAP server is running version 3 and available on port 389. This is the standard non-SSL port for LDAP.
+ The base distinguished name (DN) is `dc=example,dc=org`.

Use a text editor to create the file [ldap.conf](http://manpages.ubuntu.com/manpages/bionic/man5/ldap.conf.5.html), with contents similar to the following. Use values appropriate for your LDAP implementation. Replace *host* with the IP address or resolvable host name of your LDAP server.

```
base dc=example,dc=org
uri ldap://host
ldap_version 3
binddn cn=admin,dc=example,dc=org
bindpw admin
```
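The base DN appears in both `base` and `binddn`, so generating the file from a single domain name can prevent mismatches. The following sketch assumes that the `dc=` components correspond one-to-one to the labels of a DNS-style domain name, in the same way that `example.org` maps to `dc=example,dc=org` here. The function names are hypothetical.

```shell
#!/bin/bash
# domain_to_dn DOMAIN: convert a domain such as example.org to dc=example,dc=org
domain_to_dn() {
  local dn="" part
  local -a parts
  IFS='.' read -ra parts <<< "$1"
  for part in "${parts[@]}"; do
    dn+="${dn:+,}dc=${part}"   # add a comma before every component except the first
  done
  printf '%s\n' "$dn"
}

# render_ldap_conf HOST DOMAIN BIND_CN BIND_PASSWORD: print ldap.conf contents
# in the same layout as the example above
render_ldap_conf() {
  local host="$1" base bind_cn="$3" bind_pw="$4"
  base=$(domain_to_dn "$2")
  cat <<EOF
base ${base}
uri ldap://${host}
ldap_version 3
binddn cn=${bind_cn},${base}
bindpw ${bind_pw}
EOF
}
```

For example, `render_ldap_conf host example.org admin admin > ldap.conf` produces the file shown above. The bind password is stored in plain text, just as in the hand-written file.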

## Enable LDAP Authenticator Plugin for JupyterHub
<a name="emr-jupyterhub-ldap-plugin"></a>

Use a text editor to modify the `/etc/jupyter/conf/jupyterhub_config.py` file and add [ldapauthenticator](https://github.com/jupyterhub/ldapauthenticator) properties similar to the following. Replace *host* with the IP address or resolvable host name of the LDAP server. The example assumes that the user objects are within an organizational unit (ou) named *people*, and uses the distinguished name components that you established earlier using `ldap.conf`.

```
c.JupyterHub.authenticator_class = 'ldapauthenticator.LDAPAuthenticator'
c.LDAPAuthenticator.use_ssl = False
c.LDAPAuthenticator.server_address = 'host' 
c.LDAPAuthenticator.bind_dn_template = 'cn={username},ou=people,dc=example,dc=org'
```
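If you script this change, appending the settings only when they are missing lets you rerun the script safely. The following sketch, with a hypothetical function name, appends exactly the four settings shown above. It assumes that writing to the file requires `sudo`, which is why it uses `sudo tee -a`. Set `SUDO=` (empty) to write without `sudo`, for example when working on a copy. JupyterHub reads this file at startup, so the change takes effect only after the container restart described later in this section.

```shell
#!/bin/bash
# enable_ldap_authenticator CONFIG HOST BIND_DN_TEMPLATE
# Hypothetical helper: append the LDAPAuthenticator settings to CONFIG once.
# Set SUDO= (empty) to write without sudo.
enable_ldap_authenticator() {
  local config="$1" host="$2" template="$3"

  # Skip if LDAPAuthenticator is already referenced in the file
  if grep -q "ldapauthenticator.LDAPAuthenticator" "$config" 2>/dev/null; then
    echo "LDAPAuthenticator already configured in $config" >&2
    return 0
  fi

  ${SUDO-sudo} tee -a "$config" > /dev/null <<EOF
c.JupyterHub.authenticator_class = 'ldapauthenticator.LDAPAuthenticator'
c.LDAPAuthenticator.use_ssl = False
c.LDAPAuthenticator.server_address = '${host}'
c.LDAPAuthenticator.bind_dn_template = '${template}'
EOF
}
```

For example: `enable_ldap_authenticator /etc/jupyter/conf/jupyterhub_config.py host 'cn={username},ou=people,dc=example,dc=org'`.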

## Configure LDAP within the container
<a name="emr-jupyterhub-ldap-container"></a>

Use a text editor to create a bash script with the following contents:

```
#!/bin/bash

# Uncomment the following lines to install LDAP client libraries only if
# using Amazon EMR release version 5.14.0. Later versions install libraries by default.
# sudo docker exec jupyterhub bash -c "sudo apt-get update"
# sudo docker exec jupyterhub bash -c "sudo apt-get -y install libnss-ldap libpam-ldap ldap-utils nscd"
 
# Copy ldap.conf
sudo docker cp ldap.conf jupyterhub:/etc/ldap/
sudo docker exec jupyterhub bash -c "cat /etc/ldap/ldap.conf"
 
# configure nss switch
sudo docker exec jupyterhub bash -c "sed -i 's/\(^passwd.*\)/\1 ldap/g' /etc/nsswitch.conf"
sudo docker exec jupyterhub bash -c "sed -i 's/\(^group.*\)/\1 ldap/g' /etc/nsswitch.conf"
sudo docker exec jupyterhub bash -c "sed -i 's/\(^shadow.*\)/\1 ldap/g' /etc/nsswitch.conf"
sudo docker exec jupyterhub bash -c "cat /etc/nsswitch.conf"
 
# configure PAM to create home directories
sudo docker exec jupyterhub bash -c "echo 'session required        pam_mkhomedir.so skel=/etc/skel umask=077' >> /etc/pam.d/common-session"
sudo docker exec jupyterhub bash -c "cat /etc/pam.d/common-session"
 
# restart nscd service
sudo docker exec jupyterhub bash -c "sudo service nscd restart"
 
# Test
sudo docker exec jupyterhub bash -c "getent passwd"

# Install ldap plugin
sudo docker exec jupyterhub bash -c "pip install jupyterhub-ldapauthenticator"
```

Save the script to the master node, and then run it from the master node command line. For example, with the script saved as `configure_ldap_client.sh`, make the file executable:

```
chmod +x configure_ldap_client.sh
```

And run the script:

```
./configure_ldap_client.sh
```
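The `sed` commands in the script append `ldap` every time they run, so a second run produces entries such as `compat ldap ldap`. The following sketch is an idempotent alternative that skips lines already mentioning `ldap`. The function name is hypothetical, and the sketch relies on GNU `sed -i`.

```shell
#!/bin/bash
# add_ldap_to_nsswitch [FILE]: append "ldap" to the passwd, group, and shadow lines
# of FILE (default /etc/nsswitch.conf) unless the line already mentions ldap.
add_ldap_to_nsswitch() {
  local file="${1:-/etc/nsswitch.conf}" db
  for db in passwd group shadow; do
    # On the line starting with "db:", append " ldap" only if "ldap" is absent
    sed -i '/^'"$db"':/{/ldap/!s/$/ ldap/;}' "$file"
  done
}
```

To run it inside the container instead of the three `sed` commands, you could use `sudo docker exec jupyterhub bash -c "$(declare -f add_ldap_to_nsswitch); add_ldap_to_nsswitch"`.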

## Add attributes to Active Directory
<a name="emr-jupyterhub-ldap-adproperties"></a>

To find each user and create the appropriate entry in the database, the JupyterHub docker container requires the following UNIX properties for the corresponding user object in Active Directory. For more information, see the section *How do I continue to edit the GID/UID RFC 2307 attributes now that the Unix Attributes Plug-in is no longer available for the Active Directory Users and Computers MMC snap-in?* in the article [Clarification regarding the status of identity management for Unix (IDMU) and NIS server role in Windows Server 2016 technical preview and beyond](https://blogs.technet.microsoft.com/activedirectoryua/2016/02/09/identity-management-for-unix-idmu-is-deprecated-in-windows-server/).
+ `homeDirectory`

  This is the location of the user's home directory, which is usually `/home/username`.
+ `gidNumber`

  This is a group ID greater than 60000 that is not already used by another group. Check the `/etc/group` file for GIDs in use.
+ `uidNumber`

  This is a user ID greater than 60000 that is not already used by another user. Check the `/etc/passwd` file for UIDs in use.
+ `uid`

  This is the same as the *username*.
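To find candidate ID values above 60000, you can scan the container's account databases. The following sketch, with a hypothetical function name, prints the smallest number greater than a minimum that does not appear in a given field of a colon-separated file. Field 3 holds the UID in `/etc/passwd` and the GID in `/etc/group`.

```shell
#!/bin/bash
# next_free_id FILE FIELD MIN: print the smallest integer greater than MIN that
# is not used in FIELD of the colon-separated FILE.
next_free_id() {
  local file="$1" field="$2" min="$3"
  awk -F: -v f="$field" -v min="$min" '
    { used[$f] = 1 }                # remember every ID already in use
    END {
      id = min + 1
      while (id in used) id++       # skip IDs that are taken
      print id
    }
  ' "$file"
}
```

For example, save `sudo docker exec jupyterhub getent passwd` to `passwd.txt` and run `next_free_id passwd.txt 3 60000` for a `uidNumber` candidate. Do the same with `getent group` for a `gidNumber` candidate. After the name service switch includes LDAP, `getent` also lists accounts served by LDAP.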

## Create user home directories
<a name="emr-jupyterhub-ldap-directories"></a>

JupyterHub needs home directories within the container to authenticate LDAP users and store instance data. The following example demonstrates two users, *shirley* and *diego*, in the LDAP directory.

The first step is to query the LDAP server for each user's user id and group id information using [ldapsearch](http://manpages.ubuntu.com/manpages/xenial/man1/ldapsearch.1.html) as shown in the following example, replacing *host* with the IP address or resolvable host name of your LDAP server:

```
ldapsearch -x -H ldap://host \
 -D "cn=admin,dc=example,dc=org" \
 -w admin \
 -b "ou=people,dc=example,dc=org" \
 -s sub \
 "(objectclass=*)" uidNumber gidNumber
```

The `ldapsearch` command returns an LDIF-formatted response that looks similar to the following for users *shirley* and *diego*.

```
# extended LDIF

# LDAPv3
# base <ou=people,dc=example,dc=org> with scope subtree
# filter: (objectclass=*)
# requesting: uidNumber gidNumber sn 

# people, example.org
dn: ou=people,dc=example,dc=org

# diego, people, example.org
dn: cn=diego,ou=people,dc=example,dc=org
sn: B
uidNumber: 1001
gidNumber: 100

# shirley, people, example.org
dn: cn=shirley,ou=people,dc=example,dc=org
sn: A
uidNumber: 1002
gidNumber: 100

# search result
search: 2
result: 0 Success

# numResponses: 4
# numEntries: 3
```

Using information from the response, run commands within the container to create a home directory for each user's common name (`cn`). Use the `uidNumber` and `gidNumber` values to set ownership of the home directory for that user. The following example commands do this for the user *shirley*.

```
# uidNumber and gidNumber for shirley, from the ldapsearch output above
uidNumber=1002
gidNumber=100
sudo docker container exec jupyterhub bash -c "mkdir -p /home/shirley"
sudo docker container exec jupyterhub bash -c "chown -R $uidNumber /home/shirley"
sudo docker container exec jupyterhub bash -c "chgrp -R $gidNumber /home/shirley"
```
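Repeating these commands for every user is error-prone, so you might derive them from the `ldapsearch` output instead. The following sketch assumes the layout shown above: each user entry's DN begins with `cn=`, and `uidNumber` and `gidNumber` appear as single-line values. The function names and the `DOCKER` dry-run override are hypothetical. The sketch merges the `chown` and `chgrp` steps into a single `chown uid:gid`.

```shell
#!/bin/bash
# parse_ldif_ids: read ldapsearch LDIF output on stdin and print
# "cn uidNumber gidNumber" for each entry whose DN starts with cn=.
parse_ldif_ids() {
  awk '
    function emit() {
      if (cn != "" && uid != "" && gid != "") print cn, uid, gid
      cn = ""; uid = ""; gid = ""
    }
    /^dn: / {
      emit()                                   # a new entry starts
      if (match($0, /^dn: cn=[^,]+/)) cn = substr($0, 8, RLENGTH - 7)
      next
    }
    /^uidNumber: / { uid = $2 }
    /^gidNumber: / { gid = $2 }
    END { emit() }
  '
}

# create_ldap_homes: read "cn uid gid" lines on stdin and create each home
# directory in the jupyterhub container. Set DOCKER=echo for a dry run.
create_ldap_homes() {
  local cn uid gid
  while read -r cn uid gid; do
    ${DOCKER:-sudo docker} container exec jupyterhub bash -c \
      "mkdir -p /home/$cn && chown -R $uid:$gid /home/$cn" < /dev/null
  done
}
```

Pipe the `ldapsearch` command shown earlier into `parse_ldif_ids` to preview the list. Then add `| create_ldap_homes` to create the directories.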

**Note**  
LDAP authenticator for JupyterHub does not support local user creation. For more information, see [LDAP authenticator configuration note on local user creation](https://github.com/jupyterhub/ldapauthenticator#configuration-note-on-local-user-creation).   
To create a local user manually, use the following command, where `$uidNumber` and `$gidNumber` hold the values that `ldapsearch` returned for the user.  

```
sudo docker exec jupyterhub bash -c "echo 'shirley:x:$uidNumber:$gidNumber::/home/shirley:/bin/bash' >> /etc/passwd"
```

## Restart the JupyterHub container
<a name="emr-jupyterhub-ldap-restart"></a>

Run the following commands to restart the `jupyterhub` container:

```
sudo docker stop jupyterhub
sudo docker start jupyterhub
```
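`docker start` can return before JupyterHub is ready to accept logins. If you restart from a script, a small polling loop can wait for the hub to come back. The following sketch assumes that the unauthenticated `GET /hub/api` endpoint, which reports the JupyterHub version, is reachable on port 9443 as in the earlier examples. The function names and retry values are hypothetical.

```shell
#!/bin/bash
# wait_for ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds,
# at most ATTEMPTS times, sleeping DELAY seconds between tries.
wait_for() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    if ((i < attempts)); then sleep "$delay"; fi
  done
  return 1
}

# hub_is_up: succeed when the JupyterHub API answers over HTTPS on port 9443
hub_is_up() {
  curl --silent --fail -k -o /dev/null "https://$(hostname):9443/hub/api"
}
```

For example: `sudo docker stop jupyterhub && sudo docker start jupyterhub && wait_for 30 2 hub_is_up && echo "JupyterHub is up"`.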

# User impersonation
<a name="emr-jupyterhub-user-impersonation"></a>

A Spark job running inside a Jupyter notebook traverses multiple applications during its execution on Amazon EMR. For example, PySpark3 code that a user runs inside Jupyter is received by Sparkmagic, which uses an HTTP POST request to submit it to Livy, which then creates a Spark job to execute on the cluster using YARN.

By default, YARN jobs submitted this way run as user `livy`, regardless of the user who initiated the job. By setting up *user impersonation* you can have the user ID of the notebook user also be the user associated with the YARN job. Rather than having jobs initiated by both `shirley` and `diego` associated with the user `livy`, jobs that each user initiates are associated with `shirley` and `diego` respectively. This helps you to audit Jupyter usage and manage applications within your organization.

This configuration is supported only when calls from Sparkmagic to Livy are unauthenticated. Applications that provide an authentication or proxying layer between Hadoop applications and Livy (such as Apache Knox Gateway) are not supported. The steps to configure user impersonation in this section assume that JupyterHub and Livy run on the same master node. If JupyterHub and Livy run on separate clusters, modify [Step 3: Create HDFS home directories for users](#Step3-UserImpersonation) so that the HDFS directories are created on the Livy master node.
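To see the effect of impersonation, you can call Livy directly. Livy's REST API accepts a `proxyUser` field when you create a session with `POST /sessions`. After you complete Step 1, Livy should run that session's YARN application as the named user. The following sketch assumes that Livy listens on its default port, 8998, on the master node, and the function names are hypothetical. It is only a way to observe the behavior and does not replace the configuration steps that follow.

```shell
#!/bin/bash
# livy_session_body KIND USER: print a JSON body for Livy's POST /sessions that
# asks Livy to run the session as USER. Assumes simple user names, so no JSON escaping.
livy_session_body() {
  local kind="$1" proxy_user="$2"
  if [[ ! $proxy_user =~ ^[A-Za-z0-9_.-]+$ ]]; then
    echo "unsupported user name: $proxy_user" >&2
    return 1
  fi
  printf '{"kind": "%s", "proxyUser": "%s"}\n' "$kind" "$proxy_user"
}

# start_livy_session KIND USER: create the session on the local Livy server
start_livy_session() {
  local body
  body=$(livy_session_body "$1" "$2") || return 1
  curl --silent -X POST -H "Content-Type: application/json" \
    -d "$body" "http://localhost:8998/sessions"
}
```

For example, after `start_livy_session pyspark shirley`, `yarn application -list` should show the session's application under the user `shirley` instead of `livy`.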

**Topics**
+ [Step 1: Configure Livy](#Step1-UserImpersonation)
+ [Step 2: Add users](#Step2-UserImpersonation)
+ [Step 3: Create HDFS home directories for users](#Step3-UserImpersonation)

## Step 1: Configure Livy
<a name="Step1-UserImpersonation"></a>

To enable Livy user impersonation, use the `livy-conf` and `core-site` configuration classifications when you create a cluster, as shown in the following example. Save the configuration classification as a JSON file and then reference it when you create the cluster, or specify the configuration classification inline. For more information, see [Configure applications](emr-configure-apps.md).

```
[
  {
    "Classification": "livy-conf",
    "Properties": {
      "livy.impersonation.enabled": "true"
    }
  },
  {
    "Classification": "core-site",
    "Properties": {
      "hadoop.proxyuser.livy.groups": "*",
      "hadoop.proxyuser.livy.hosts": "*"
    }
  }
]
```
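If you save the classification above as a local JSON file, you can pass it to `aws emr create-cluster` with `--configurations file://...`. The following sketch reuses the cluster settings from the earlier `create-cluster` example and lists Spark and Livy explicitly. The function name and file name are hypothetical, so adjust the values for your environment.

```shell
#!/bin/bash
# create_impersonation_cluster CONFIG_JSON: hypothetical wrapper that creates a
# JupyterHub cluster with the livy-conf and core-site classifications in CONFIG_JSON.
create_impersonation_cluster() {
  local config_json="$1"
  if [[ ! -f $config_json ]]; then
    echo "configuration file not found: $config_json" >&2
    return 1
  fi
  aws emr create-cluster --name "MyJupyterHubCluster" --release-label emr-5.36.2 \
    --applications Name=JupyterHub Name=Spark Name=Livy \
    --use-default-roles --instance-type m5.xlarge --instance-count 2 \
    --ec2-attributes KeyName=MyKeyPair \
    --configurations "file://${config_json}"
}
```

For example: `create_impersonation_cluster ./livy-impersonation.json`.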

## Step 2: Add users
<a name="Step2-UserImpersonation"></a>

Add JupyterHub users using PAM or LDAP. For more information, see [Using PAM authentication](emr-jupyterhub-pam-users.md) and [Using LDAP authentication](emr-jupyterhub-ldap-users.md).

## Step 3: Create HDFS home directories for users
<a name="Step3-UserImpersonation"></a>

You connected to the master node to create users. While still connected to the master node, copy the following contents and save them to a script file. The script creates an HDFS home directory for each JupyterHub user on the master node. The script assumes that you are using the default administrator user ID, *jovyan*, with the default password, *jupyter*. If you changed these credentials, update the `user` and `pwd` values in the `admin_token` function.

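```
#!/bin/bash

CURL="curl --silent -k"

# Look up the master node's private host name through the instance metadata
# service, using an IMDSv2 session token
IMDS_TOKEN=$(curl -s -X PUT http://169.254.169.254/latest/api/token \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
HOST=$(curl -s -H "X-aws-ec2-metadata-token: $IMDS_TOKEN" \
    http://169.254.169.254/latest/meta-data/local-hostname)

# Request an API token for the JupyterHub administrator.
# Errors go to stderr so that they are not captured as the token.
admin_token() {
    local user=jovyan
    local pwd=jupyter
    local token
    token=$($CURL https://$HOST:9443/hub/api/authorizations/token \
        -d "{\"username\":\"$user\", \"password\":\"$pwd\"}" | jq -r ".token")
    if [[ -z $token || $token == null ]]; then
        echo "Unable to get Jupyter API Token." >&2
        return 1
    fi
    echo "$token"
}

# Get Jupyter admin token; stop if it can't be obtained
token=$(admin_token) || exit 1

# Get list of Jupyter users, one name per line
users=$($CURL https://$HOST:9443/hub/api/users \
    -H "Authorization: token $token" | jq -r '.[].name')

# Create an HDFS home directory for each user
for user in $users; do
    echo "Create hdfs home dir for $user"
    hadoop fs -mkdir -p "/user/$user"
    hadoop fs -chmod 777 "/user/$user"
done
```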