

# Using a data preparation recipe in AWS Glue Studio

 The **Data preparation recipe** transform allows you to author a data preparation recipe from scratch using an interactive grid style authoring interface. It also allows you to import an existing AWS Glue DataBrew recipe and then edit it in AWS Glue Studio. 

 The **Data Preparation Recipe** node is available from the Resource panel. You can connect the **Data Preparation Recipe** node to another node in the visual workflow, whether it is a Data source node or another transformation node. After you choose an AWS Glue DataBrew recipe and version, the applied steps in the recipe are visible in the node properties tab. 

## Prerequisites

+  If you are importing an AWS Glue DataBrew recipe, you must have the required IAM permissions as described in [Import an AWS Glue DataBrew recipe in AWS Glue Studio](glue-studio-data-preparation-import-recipe.md). 
+  A data preview session must be created. 

## Limitations

+  AWS Glue DataBrew recipes are only supported in [commercial DataBrew regions](https://docs.aws.amazon.com/general/latest/gr/databrew.html). 
+  Not all AWS Glue DataBrew recipes are supported by AWS Glue. Some recipes cannot be run in AWS Glue Studio. 
  +  Recipes with `UNION` and `JOIN` transforms are not supported. However, AWS Glue Studio already has "Join" and "Union" transform nodes, which you can use before or after a **Data Preparation Recipe** node. 
+  **Data Preparation Recipe** nodes are supported for jobs starting with AWS Glue version 4.0. This version will be auto-selected after a **Data Preparation Recipe** node is added to the job. 
+  **Data Preparation Recipe** nodes require Python. This is automatically set when the **Data Preparation Recipe** node is added to the job. 
+  Adding a new **Data Preparation Recipe** node to the visual graph will automatically restart your Data Preview session with the correct libraries to use the **Data Preparation Recipe** node. 
+  The following transforms are not supported for import or editing in a **Data Preparation Recipe** node: `GROUP_BY`, `PIVOT`, `UNPIVOT`, and `TRANSPOSE`. 
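
Before importing a recipe, it can help to scan its steps for the operations listed above. The following is a minimal sketch, not an AWS-provided tool. It assumes each recipe step is a dictionary shaped like `{"Action": {"Operation": "...", "Parameters": {...}}}`, and the function name and sample steps are hypothetical.

```python
# Operations this page lists as unsupported in a Data Preparation Recipe node.
UNSUPPORTED_OPERATIONS = {"UNION", "JOIN", "GROUP_BY", "PIVOT", "UNPIVOT", "TRANSPOSE"}


def find_unsupported_steps(recipe_steps):
    """Return (index, operation) pairs for steps that use an unsupported operation.

    Assumption: each step looks like
    {"Action": {"Operation": "...", "Parameters": {...}}}.
    """
    problems = []
    for index, step in enumerate(recipe_steps):
        operation = step.get("Action", {}).get("Operation", "").upper()
        if operation in UNSUPPORTED_OPERATIONS:
            problems.append((index, operation))
    return problems


# Hypothetical recipe steps, used only for illustration.
sample_steps = [
    {"Action": {"Operation": "RENAME",
                "Parameters": {"sourceColumn": "id", "targetColumn": "customer_id"}}},
    {"Action": {"Operation": "PIVOT", "Parameters": {}}},
]

unsupported = find_unsupported_steps(sample_steps)
print(unsupported)
```

An empty result means that none of the listed operations were found. It does not guarantee that every step in the recipe is supported.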

## Additional features


 After you select the **Data Preparation Recipe** transform and choose **Author recipe**, you can take the following additional actions. 
+  Add step – you can add additional steps to a recipe as needed by choosing the add step icon, or use the toolbar in the Preview pane by choosing an action.   
![\[The screenshot shows the add recipe icon.\]](http://docs.aws.amazon.com/glue/latest/dg/images/add-recipe-icon.png)  
![\[The screenshot shows the add recipe icon.\]](http://docs.aws.amazon.com/glue/latest/dg/images/author-recipe-toolbar.png)
+  Import recipe – choose **More**, then **Import recipe**, to import a recipe for use in your AWS Glue Studio job.   
![\[The screenshot shows the more icon.\]](http://docs.aws.amazon.com/glue/latest/dg/images/data-preparation-recipe-node-more-icon.png)  
![\[The screenshot shows the more icon.\]](http://docs.aws.amazon.com/glue/latest/dg/images/data-preparation-recipe-node-more-features.png)
+  Download as YAML – choose **More**, then **Download as YAML**, to download your recipe and save it outside of AWS Glue Studio. 
+  Download as JSON – choose **More**, then **Download as JSON**, to download your recipe and save it outside of AWS Glue Studio. 
+  Undo and redo recipe steps – You can undo and redo recipe steps in the Preview pane when working with data in the grid.   
![\[The screenshot shows the more icon.\]](http://docs.aws.amazon.com/glue/latest/dg/images/author-recipe-toolbar-undo-redo.png)

# Author and run data preparation recipes in a visual ETL AWS Glue job


 In this scenario, you can author data preparation recipes without having to first create them in DataBrew. Before you can start authoring recipes, you must: 
+  Have an active Data Preview session running. When the data preview session is READY, then **Author Recipe** will become active and you can begin authoring or editing your recipe.   
![\[The screenshot shows the Data Preview session as complete.\]](http://docs.aws.amazon.com/glue/latest/dg/images/data-preparation-recipe-data-preview-complete.png)
+  Ensure that the toggle for **Automatically import glue libraries** is enabled.   
![\[The screenshot shows the option for Automatically import glue libraries toggled on.\]](http://docs.aws.amazon.com/glue/latest/dg/images/data-preparation-recipe-automatically-import-glue-libraries.png)

   You can do this by choosing the gear icon in the Data Preview pane.   
![\[The screenshot shows the option for Automatically import glue libraries toggled on.\]](http://docs.aws.amazon.com/glue/latest/dg/images/data-preview-preferences.png)

**To author a data preparation recipe in AWS Glue Studio:**

1.  Add the **Data Preparation Recipe** transform to your job canvas. The transform must have a data source node as its parent. When you add the **Data Preparation Recipe** node, the data preview session restarts with the proper libraries and you will see the data frame being prepared.   
![\[The screenshot shows the data frame loading after adding the Data Preparation Recipe.\]](http://docs.aws.amazon.com/glue/latest/dg/images/data-preparation-preparing-dataframe.png)

1.  Once the Data Preview session is ready, the data with any previously applied steps will appear on the bottom of the screen. 

1.  Choose **Author Recipe**. This will allow you to start a new recipe in AWS Glue Studio.   
![\[The screenshot shows the Transform panel with the fields for Name and Node parents, as well as option to Author Recipe.\]](http://docs.aws.amazon.com/glue/latest/dg/images/data-preparation-recipe-transform-tab-new.png)

1.  In the **Transform** panel to the right of the job canvas, enter a name for your data preparation recipe. 

1.  On the left side, the canvas is replaced with a grid view of your data. On the right, the **Transform** panel changes to show your recipe steps. Choose **Add step** to add the first step in your recipe.   
![\[The screenshot shows the Transform panel after choosing Add Step. When you choose a column, the options will change dynamically. You can choose to sort, take an action on the column, and filter values.\]](http://docs.aws.amazon.com/glue/latest/dg/images/author-recipe-preview-data-transform-panel.png)

1.  In the **Transform** panel, choose whether to sort, take an action on the column, or filter values. For example, choose **Rename column**.   
![\[The screenshot shows the Transform panel after choosing Add Step. When you choose a column, the options will change dynamically. You can choose to sort, take an action on the column, and filter values.\]](http://docs.aws.amazon.com/glue/latest/dg/images/author-recipe-add-step.png)

1.  In the **Transform** panel on the right side, the options for renaming a column let you choose the source column to rename and enter the new column name. Once you have done so, choose **Apply**. 

    You can preview each step, undo a step, reorder steps, and use any of the action icons, such as Filter, Sort, Split, and Merge. When you perform actions in the data grid, the steps are added to the recipe in the **Transform** panel.   
![\[The screenshot shows the Preview data grid with the toolbar highlighted. You can apply an action by using any of the tools and it will be added to the recipe in the Transform panel on the right.\]](http://docs.aws.amazon.com/glue/latest/dg/images/author-recipe-preview-data-grid.png)

    If you need to make a change, you can do so in the Preview pane. For example: 
   +  Undo/redo step – undo a step by choosing the **undo** icon. You can reapply an undone step by choosing the **redo** icon.   
![\[The screenshot shows the more icon.\]](http://docs.aws.amazon.com/glue/latest/dg/images/author-recipe-toolbar-undo-redo.png)
   +  Reorder step – when you reorder a step, AWS Glue Studio will validate each step and let you know if the step is invalid. 

1.  Once you've applied a step, the Transform panel will show you all the steps in your recipe. You can clear all the steps to start over, add more steps by choosing the add icon, or choose **Done Authoring Recipe**.   
![\[The screenshot shows the Transform panel with steps added to your recipe. When done, choose Done Authoring Recipe or choose the add icon to add more steps to the recipe.\]](http://docs.aws.amazon.com/glue/latest/dg/images/author-recipe-done-authoring-recipe.png)

1.  Choose **Save** at the top right of your screen. Your recipe steps are not saved until you save your job. 

# Import an AWS Glue DataBrew recipe in AWS Glue Studio


 In AWS Glue DataBrew, a recipe is a set of data transformation steps. An AWS Glue DataBrew recipe prescribes how to transform data that has already been read. It doesn't describe where and how to read data, or where and how to write it. Reading and writing are configured in Source and Target nodes in AWS Glue Studio. For more information on recipes, see [Creating and using AWS Glue DataBrew recipes](https://docs.aws.amazon.com/databrew/latest/dg/recipes.html). 

 To use AWS Glue DataBrew recipes in AWS Glue Studio, begin by creating recipes in AWS Glue DataBrew. If you already have recipes that you want to use, you can skip this step. 

## IAM permissions for AWS Glue DataBrew


 This topic provides information to help you understand the actions and resources that an IAM administrator can use in an AWS Identity and Access Management (IAM) policy for the Data Preparation Recipe transform. 

 For additional information about security in AWS Glue, see [Access Management](https://docs.aws.amazon.com/glue/latest/dg/security.html). 

**Note**  
 The following table lists the permissions that a user needs if importing an existing AWS Glue DataBrew recipe. 


**Data Preparation Recipe transform actions**  

| Action | Description | 
| --- | --- | 
| databrew:ListRecipes | Grants permission to retrieve AWS Glue DataBrew recipes. | 
| databrew:ListRecipeVersions | Grants permission to retrieve AWS Glue DataBrew recipe versions. | 
| databrew:DescribeRecipe | Grants permission to retrieve AWS Glue DataBrew recipe description. | 



 The role you’re using to access this functionality should have a policy that allows several AWS Glue DataBrew actions. You can achieve this either by using the `AWSGlueConsoleFullAccess` policy, which includes the necessary actions, or by adding the following inline policy to your role: 

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "databrew:ListRecipes",
        "databrew:ListRecipeVersions",
        "databrew:DescribeRecipe"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
```
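
To confirm that a role can actually perform the three actions in the policy, you can call the matching DataBrew API operations. The following is a minimal sketch. It assumes a boto3 DataBrew client is passed in, and the function name `check_recipe_access` is hypothetical. If the role lacks a permission, the corresponding call is expected to raise an access-denied error from the service.

```python
def check_recipe_access(client, recipe_name=None):
    """Exercise ListRecipes, ListRecipeVersions, and DescribeRecipe.

    client: a DataBrew client, for example boto3.client("databrew").
    recipe_name: optional; defaults to the first recipe returned.
    """
    results = {}

    # databrew:ListRecipes
    recipes = client.list_recipes().get("Recipes", [])
    results["recipes"] = [recipe["Name"] for recipe in recipes]

    name = recipe_name or (results["recipes"][0] if results["recipes"] else None)
    if name is None:
        return results  # No recipes to inspect further.

    # databrew:ListRecipeVersions
    versions = client.list_recipe_versions(Name=name).get("Recipes", [])
    results["versions"] = [version.get("RecipeVersion") for version in versions]

    # databrew:DescribeRecipe
    described = client.describe_recipe(Name=name)
    results["step_count"] = len(described.get("Steps", []))
    return results


# Usage (requires boto3 and AWS credentials; not run here):
#   import boto3
#   print(check_recipe_access(boto3.client("databrew")))
```

The client is passed in as a parameter so that the function can be exercised with a stub object in tests, without AWS credentials.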




 To use the Data Preparation Recipe transform, you must also add the `iam:PassRole` action to the permissions policy. 


**Additional required permissions**  

| Action | Description | 
| --- | --- | 
| iam:PassRole | Grants the user permission to pass the approved roles. | 

Without these permissions, the following error occurs:

```
"errorCode": "AccessDenied"
"errorMessage": "User: arn:aws:sts::account_id:assumed-role/AWSGlueServiceRole is not 
authorized to perform: iam:PassRole on resource: arn:aws:iam::account_id:role/service-role/AWSGlueServiceRole 
because no identity-based policy allows the iam:PassRole action"
```
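
The DataBrew actions and `iam:PassRole` can be combined into one policy document. The following is a minimal sketch that builds such a document. Scoping `iam:PassRole` to a single role ARN is an assumption made here for illustration, and the account ID and function name are placeholders, not values from this guide.

```python
import json

# The three DataBrew actions listed in the table above.
DATABREW_ACTIONS = [
    "databrew:ListRecipes",
    "databrew:ListRecipeVersions",
    "databrew:DescribeRecipe",
]


def build_recipe_policy(glue_role_arn):
    """Build a policy document that allows recipe import and iam:PassRole.

    Assumption: PassRole is limited to the one role that the job uses.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": DATABREW_ACTIONS, "Resource": ["*"]},
            {"Effect": "Allow", "Action": ["iam:PassRole"], "Resource": [glue_role_arn]},
        ],
    }


# Placeholder account ID and role name; replace with your own values.
role_arn = "arn:aws:iam::111122223333:role/service-role/AWSGlueServiceRole"
policy = build_recipe_policy(role_arn)
print(json.dumps(policy, indent=2))
```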



## Importing an AWS Glue DataBrew recipe


**To import an AWS Glue DataBrew recipe and use it in AWS Glue Studio:**

 If you have an existing **Data Preparation Recipe** node and you want to edit the recipe steps directly in AWS Glue Studio, you will have to import the recipe steps into your AWS Glue Studio job. 

1.  Start an AWS Glue job in AWS Glue Studio with a data source. 

1.  Add the **Data Preparation Recipe** node to the job canvas.   
![\[The screenshot shows the Add node modal with data preparation recipe available for selection.\]](http://docs.aws.amazon.com/glue/latest/dg/images/glue-add-node-data-preparation-recipe.png)

1.  In the Transform panel, enter a name for your recipe. 

1.  Choose one or more parent nodes by selecting the available nodes on the canvas from the drop-down list. 

1.  Choose **Author Recipe**. If **Author Recipe** is greyed out, it remains unavailable until node parents have been selected and a data preview session has finished starting.   
![\[Author Data Preparation Recipe form with name field and node parents selection dropdown.\]](http://docs.aws.amazon.com/glue/latest/dg/images/glue-author-data-preparation-recipe.png)

1.  The data frame loads and shows you detailed information about your source data. 

    Select the **more actions** icon and choose **Import recipe**.   
![\[Data preparation interface showing "Build your Recipe" with an "Add step" button.\]](http://docs.aws.amazon.com/glue/latest/dg/images/glue-dataframe-import-recipe.png)

1.  Use the Import recipe wizard to complete the steps. In step 1, search for your recipe, select it, and choose **Next**.   
![\[Import recipe interface showing two recipes, with one selected for import.\]](http://docs.aws.amazon.com/glue/latest/dg/images/import-recipe-step-1.png)

1.  In step 2, choose your import options. You can choose to Append a new recipe to an existing recipe or Overwrite an existing recipe. Choose **Next**.   
![\[Import recipe interface showing selected recipe, version, and two imported steps.\]](http://docs.aws.amazon.com/glue/latest/dg/images/import-recipe-step-2.png)

1.  In step 3, validate the recipe steps. Once you import your AWS Glue DataBrew recipe, you can edit this recipe directly in AWS Glue Studio.   
![\[Recipe import interface showing two steps and a validation progress indicator.\]](http://docs.aws.amazon.com/glue/latest/dg/images/import-recipe-step-3.png)  
![\[Import recipe interface showing validated steps for sorting and formatting data.\]](http://docs.aws.amazon.com/glue/latest/dg/images/import-recipe-step-3-validated-2.png)

1.  After this, the steps will be imported as part of your AWS Glue job. Make necessary configuration changes in the **Job details** tab, like naming your job and adjusting allocated capacity as needed. Choose **Save** to save your job and recipe. 
**Note**  
 `JOIN`, `UNION`, `GROUP_BY`, `PIVOT`, `UNPIVOT`, and `TRANSPOSE` are not supported for recipe import, nor are they available in recipe authoring mode. 

1.  Optionally, you can finish authoring the job by adding other transformation nodes as needed and adding one or more Data target nodes. 

    If you reorder steps after you import a recipe, AWS Glue performs validation on those steps. For example, if you renamed and then deleted a column, and you moved the delete step on top, then the rename step would be invalid. You can then edit the steps to fix the validation error. 
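
The reorder example above can be illustrated with a toy checker that replays steps against the set of available column names. This is only a sketch of the idea, not how AWS Glue Studio implements validation. The step format, the function name, and the assumption that both steps refer to the column by its original name are all hypothetical.

```python
def validate_steps(columns, steps):
    """Replay steps in order and return (index, message) for each invalid step.

    Each step is a hypothetical (operation, params) pair; only DELETE and
    RENAME are modeled.
    """
    available = set(columns)
    errors = []
    for index, (operation, params) in enumerate(steps):
        if operation == "DELETE":
            needed, produced = params["column"], None
        elif operation == "RENAME":
            needed, produced = params["source"], params["target"]
        else:
            errors.append((index, f"unknown operation {operation}"))
            continue
        if needed not in available:
            errors.append((index, f"{operation}: column '{needed}' does not exist"))
            continue
        # Apply the step's effect on the set of column names.
        available.discard(needed)
        if produced is not None:
            available.add(produced)
    return errors


# Order after the delete step has been moved to the top.
reordered = [
    ("DELETE", {"column": "name"}),
    ("RENAME", {"source": "name", "target": "full_name"}),
]
errors = validate_steps(["id", "name"], reordered)
print(errors)
```

In this sketch the rename step is flagged because the column it refers to was removed by the earlier delete step. Removing or editing one of the two steps clears the error.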

# Migrating from AWS Glue DataBrew to AWS Glue Studio

 If you have recipes in AWS Glue DataBrew, use the following checklist to migrate your recipes to AWS Glue Studio. 


| If you want to | Then do this | 
| --- | --- | 
|  Allow users to retrieve AWS Glue DataBrew recipes, recipe versions, and recipe descriptions.  |  Add IAM permissions to a policy that allows your role to access the necessary actions. See [IAM permissions for AWS Glue DataBrew](glue-studio-data-preparation-import-recipe.md#glue-studio-databrew-permissions).  | 
|  Import an existing AWS Glue DataBrew recipe into AWS Glue Studio.  |  Follow the steps in [Importing an AWS Glue DataBrew recipe](glue-studio-data-preparation-import-recipe.md#glue-studio-databrew-import-steps).  | 
|  Import a recipe with JOIN and UNION.  |  Recipes with UNION and JOIN transforms are not supported. Use the Join and Union transforms in AWS Glue Studio before or after a Data Preparation Recipe node.  | 