

# Developing blueprints in AWS Glue

Your organization might have a set of similar ETL use cases that could benefit from being able to parametrize a single workflow to handle them all. To address this need, AWS Glue enables you to define *blueprints*, which you can use to generate workflows. A blueprint accepts parameters, so that from a single blueprint, a data analyst can create different workflows to handle similar ETL use cases. After you create a blueprint, you can reuse it for different departments, teams, and projects.

**Topics**
+ [Overview of blueprints in AWS Glue](blueprints-overview.md)
+ [Developing blueprints in AWS Glue](developing-blueprints.md)
+ [Registering a blueprint in AWS Glue](registering-blueprints.md)
+ [Viewing blueprints in AWS Glue](viewing_blueprints.md)
+ [Updating a blueprint in AWS Glue](updating_blueprints.md)
+ [Creating a workflow from a blueprint in AWS Glue](creating_workflow_blueprint.md)
+ [Viewing blueprint runs in AWS Glue](viewing_blueprint_runs.md)

# Overview of blueprints in AWS Glue

**Note**  
The blueprints feature is currently unavailable in the following Regions in the AWS Glue console: Asia Pacific (Jakarta) and Middle East (UAE).

AWS Glue blueprints provide a way to create and share AWS Glue workflows. When there is a complex ETL process that could be used for similar use cases, rather than creating an AWS Glue workflow for each use case, you can create a single blueprint. 

The blueprint specifies the jobs and crawlers to include in a workflow, and specifies parameters that the workflow user supplies when they run the blueprint to create a workflow. The use of parameters enables a single blueprint to generate workflows for the various similar use cases. For more information about workflows, see [Overview of workflows in AWS Glue](workflows_overview.md).

The following are example use cases for blueprints:
+ You want to partition an existing dataset. The input parameters to the blueprint are Amazon Simple Storage Service (Amazon S3) source and target paths and a list of partition columns.
+ You want to snapshot an Amazon DynamoDB table into a SQL data store like Amazon Redshift. The input parameters to the blueprint are the DynamoDB table name and an AWS Glue connection, which designates an Amazon Redshift cluster and destination database.
+ You want to convert CSV data in multiple Amazon S3 paths to Parquet. You want the AWS Glue workflow to include a separate crawler and job for each path. The input parameters are the destination database in the AWS Glue Data Catalog and a comma-delimited list of Amazon S3 paths. Note that in this case, the number of crawlers and jobs that the workflow creates is variable.

[![AWS Videos](http://img.youtube.com/vi/s3Bm8ay53Ms/0.jpg)](https://www.youtube.com/watch?v=s3Bm8ay53Ms)


**Blueprint components**  
A blueprint is a ZIP archive that contains the following components:
+ A Python layout generator script

  Contains a function that specifies the workflow *layout*—the crawlers and jobs to create for the workflow, the job and crawler properties, and the dependencies between the jobs and crawlers. The function accepts blueprint parameters and returns a workflow structure (JSON object) that AWS Glue uses to generate the workflow. Because you use a Python script to generate the workflow, you can add your own logic that is suitable for your use cases.
+ A configuration file

  Specifies the fully qualified name of the Python function that generates the workflow layout. Also specifies the names, data types, and other properties of all blueprint parameters used by the script.
+ (Optional) ETL scripts and supporting files

  As an advanced use case, you can parameterize the location of the ETL scripts that your jobs use. You can include job script files in the ZIP archive and specify a blueprint parameter for an Amazon S3 location where the scripts are to be copied to. The layout generator script can copy the ETL scripts to the designated location and specify that location as the job script location property. You can also include any libraries or other supporting files, provided that your script handles them.

![\[Box labeled Blueprint contains two smaller boxes, one labeled Python Script and the other labeled Config File.\]](http://docs.aws.amazon.com/glue/latest/dg/images/blueprint.png)


**Blueprint runs**  
When you create a workflow from a blueprint, AWS Glue runs the blueprint, which starts an asynchronous process that creates the workflow and the jobs, crawlers, and triggers that the workflow encapsulates. AWS Glue uses the blueprint run to orchestrate the creation of the workflow and its components. You can monitor the creation process by checking the blueprint run status. The blueprint run also stores the values that you supplied for the blueprint parameters.

![\[Box labeled Blueprint run contains icons labeled Workflow and Parameter Values.\]](http://docs.aws.amazon.com/glue/latest/dg/images/blueprint-run.png)


You can view blueprint runs using the AWS Glue console or AWS Command Line Interface (AWS CLI). When viewing or troubleshooting a workflow, you can always return to the blueprint run to view the blueprint parameter values that were used to create the workflow.

**Lifecycle of a blueprint**  
Blueprints are developed, tested, registered with AWS Glue, and run to create workflows. There are typically three personas involved in the blueprint lifecycle.


| Persona | Tasks | 
| --- | --- | 
| AWS Glue developer |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/glue/latest/dg/blueprints-overview.html)  | 
| AWS Glue administrator |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/glue/latest/dg/blueprints-overview.html)  | 
| Data analyst |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/glue/latest/dg/blueprints-overview.html)  | 

**See also**  
[Developing blueprints in AWS Glue](developing-blueprints.md)
[Creating a workflow from a blueprint in AWS Glue](creating_workflow_blueprint.md)
[Permissions for personas and roles for AWS Glue blueprints](blueprints-personas-permissions.md)

# Developing blueprints in AWS Glue

As an AWS Glue developer, you can create and publish blueprints that data analysts can use to generate workflows.

**Topics**
+ [

# Overview of developing blueprints
](developing-blueprints-overview.md)
+ [

# Prerequisites for developing blueprints
](developing-blueprints-prereq.md)
+ [

# Writing the blueprint code
](developing-blueprints-code.md)
+ [

# Sample blueprint project
](developing-blueprints-sample.md)
+ [

# Testing a blueprint
](developing-blueprints-testing.md)
+ [

# Publishing a blueprint
](developing-blueprints-publishing.md)
+ [

# AWS Glue blueprint classes reference
](developing-blueprints-code-classes.md)
+ [

# Blueprint samples
](developing-blueprints-samples.md)

**See also**  
[Overview of blueprints in AWS Glue](blueprints-overview.md)

# Overview of developing blueprints


The first step in your development process is to identify a common use case that would benefit from a blueprint. A typical use case involves a recurring ETL problem that you believe should be solved in a general manner. Next, design a blueprint that implements the generalized use case, and define the blueprint input parameters that tailor the generalized solution to a specific use case.

A blueprint consists of a project that contains a blueprint parameter configuration file and a script that defines the *layout* of the workflow to generate. The layout defines the jobs and crawlers (or *entities* in blueprint script terminology) to create.

You do not directly specify any triggers in the layout script. Instead you write code to specify the dependencies between the jobs and crawlers that the script creates. AWS Glue generates the triggers based on your dependency specifications. The output of the layout script is a workflow object, which contains specifications for all workflow entities.

You build your workflow object using the following AWS Glue blueprint libraries:
+ `awsglue.blueprint.base_resource` – A library of base resources used by the other blueprint libraries.
+ `awsglue.blueprint.workflow` – A library for defining a `Workflow` class.
+ `awsglue.blueprint.job` – A library for defining a `Job` class.
+ `awsglue.blueprint.crawler` – A library for defining a `Crawler` class.

The only other libraries that are supported for layout generation are those libraries that are available for the Python shell.

Before publishing your blueprint, you can use methods defined in the blueprint libraries to test the blueprint locally.

When you're ready to make the blueprint available to data analysts, you package the script, the parameter configuration file, and any supporting files, such as additional scripts and libraries, into a single deployable asset. You then upload the asset to Amazon S3 and ask an administrator to register it with AWS Glue.
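
For example, a minimal packaging sketch using the AWS CLI (the archive and bucket names are placeholders, and the file names are taken from the sample project later in this guide; your project's files may differ):

```
zip -r ConversionBlueprint.zip blueprint.cfg Layout.py Conversion.py
aws s3 cp ConversionBlueprint.zip s3://my-blueprint-bucket/
```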

For sample blueprint projects, see [Sample blueprint project](developing-blueprints-sample.md) and [Blueprint samples](developing-blueprints-samples.md).

# Prerequisites for developing blueprints


To develop blueprints, you should be familiar with using AWS Glue and writing scripts for Apache Spark ETL jobs or Python shell jobs. In addition, you must complete the following setup tasks.
+ Download four AWS Python libraries to use in your blueprint layout scripts.
+ Set up the AWS SDKs.
+ Set up the AWS CLI.

## Download the Python libraries


Download the following libraries from GitHub, and install them into your project:
+ [https://github.com/awslabs/aws-glue-blueprint-libs/tree/master/awsglue/blueprint/base\_resource.py](https://github.com/awslabs/aws-glue-blueprint-libs/tree/master/awsglue/blueprint/base_resource.py)
+ [https://github.com/awslabs/aws-glue-blueprint-libs/tree/master/awsglue/blueprint/workflow.py](https://github.com/awslabs/aws-glue-blueprint-libs/tree/master/awsglue/blueprint/workflow.py)
+ [https://github.com/awslabs/aws-glue-blueprint-libs/tree/master/awsglue/blueprint/crawler.py](https://github.com/awslabs/aws-glue-blueprint-libs/tree/master/awsglue/blueprint/crawler.py)
+ [https://github.com/awslabs/aws-glue-blueprint-libs/tree/master/awsglue/blueprint/job.py](https://github.com/awslabs/aws-glue-blueprint-libs/tree/master/awsglue/blueprint/job.py)

## Set up the AWS Java SDK


For the AWS Java SDK, you must add a `jar` file that includes the API for blueprints.

1. If you haven't already done so, set up the AWS SDK for Java.
   + For Java 1.x, follow the instructions in [Set up the AWS SDK for Java](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/setup-install.html) in the *AWS SDK for Java Developer Guide*.
   + For Java 2.x, follow the instructions in [Setting up the AWS SDK for Java 2.x](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/setup.html) in the *AWS SDK for Java 2.x Developer Guide*.

1. Download the client `jar` file that has access to the APIs for blueprints.
   + For Java 1.x: s3://awsglue-custom-blueprints-preview-artifacts/awsglue-java-sdk-preview/AWSGlueJavaClient-1.11.x.jar
   + For Java 2.x: s3://awsglue-custom-blueprints-preview-artifacts/awsglue-java-sdk-v2-preview/AwsJavaSdk-Glue-2.0.jar

1. Add the client `jar` to the front of the Java classpath to override the AWS Glue client provided by the AWS Java SDK.

   ```
   export CLASSPATH=<path-to-preview-client-jar>:$CLASSPATH
   ```

1. (Optional) Test the SDK with the following Java application. The application should output an empty list.

   Replace `accessKey` and `secretKey` with your credentials, and replace `us-east-1` with your Region.

   ```
   import com.amazonaws.auth.AWSCredentials;
   import com.amazonaws.auth.AWSCredentialsProvider;
   import com.amazonaws.auth.AWSStaticCredentialsProvider;
   import com.amazonaws.auth.BasicAWSCredentials;
   import com.amazonaws.services.glue.AWSGlue;
   import com.amazonaws.services.glue.AWSGlueClientBuilder;
   import com.amazonaws.services.glue.model.ListBlueprintsRequest;
   
   public class App{
       public static void main(String[] args) {
           AWSCredentials credentials = new BasicAWSCredentials("accessKey", "secretKey");
           AWSCredentialsProvider provider = new AWSStaticCredentialsProvider(credentials);
           AWSGlue glue = AWSGlueClientBuilder.standard().withCredentials(provider)
                   .withRegion("us-east-1").build();
           ListBlueprintsRequest request = new ListBlueprintsRequest().withMaxResults(2);
           System.out.println(glue.listBlueprints(request));
       }
   }
   ```

## Set up the AWS Python SDK


The following steps assume that you have Python version 2.7 or later, or version 3.9 or later installed on your computer.

1. Download the following boto3 wheel file. If prompted to open or save, save the file. s3://awsglue-custom-blueprints-preview-artifacts/aws-python-sdk-preview/boto3-1.17.31-py2.py3-none-any.whl

1. Download the following botocore wheel file: s3://awsglue-custom-blueprints-preview-artifacts/aws-python-sdk-preview/botocore-1.20.31-py2.py3-none-any.whl

1. Check your Python version.

   ```
   python --version
   ```

1. Depending on your Python version, enter the following commands (for Linux) to create and activate a virtual environment:
   + For Python 2.7 or later:

     ```
     python -m pip install --user virtualenv
     python -m virtualenv env
     source env/bin/activate
     ```
   + For Python 3.9 or later:

     ```
     python3 -m venv python-sdk-test
     source python-sdk-test/bin/activate
     ```

1. Install the botocore wheel file.

   ```
   python3 -m pip install <download-directory>/botocore-1.20.31-py2.py3-none-any.whl
   ```

1. Install the boto3 wheel file.

   ```
   python3 -m pip install <download-directory>/boto3-1.17.31-py2.py3-none-any.whl
   ```

1. Configure your credentials and default region in the `~/.aws/credentials` and `~/.aws/config` files. For more information, see [Configuring the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html) in the *AWS Command Line Interface User Guide*.

1. (Optional) Test your setup. The following commands should return an empty list.

   Replace `us-east-1` with your Region.

   ```
   $ python
   >>> import boto3
   >>> glue = boto3.client('glue', 'us-east-1')
   >>> glue.list_blueprints()
   ```

## Set up the preview AWS CLI


1. If you haven't already done so, install and/or update the AWS Command Line Interface (AWS CLI) on your computer. The easiest way to do this is with `pip`, the Python installer utility:

   ```
   pip install awscli --upgrade --user
   ```

   You can find complete installation instructions for the AWS CLI here: [Installing the AWS Command Line Interface](https://docs.aws.amazon.com/cli/latest/userguide/installing.html).

1. Download the AWS CLI wheel file from: s3://awsglue-custom-blueprints-preview-artifacts/awscli-preview-build/awscli-1.19.31-py2.py3-none-any.whl

1. Install the AWS CLI wheel file.

   ```
   python3 -m pip install awscli-1.19.31-py2.py3-none-any.whl
   ```

1. Run the `aws configure` command to configure your AWS credentials (access key and secret key) and default AWS Region. For information on configuring the AWS CLI, see [Configuring the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html).

1. Test the AWS CLI. The following command should return an empty list.

   Replace `us-east-1` with your Region.

   ```
   aws glue list-blueprints --region us-east-1
   ```

# Writing the blueprint code


Each blueprint project that you create must contain, at a minimum, the following files:
+ A Python layout script that defines the workflow. The script contains a function that defines the entities (jobs and crawlers) in a workflow, and the dependencies between them.
+ A configuration file, `blueprint.cfg`, which defines:
  + The full path of the workflow layout definition function.
  + The parameters that the blueprint accepts.

**Topics**
+ [Creating the blueprint layout script](developing-blueprints-code-layout.md)
+ [Creating the configuration file](developing-blueprints-code-config.md)
+ [Specifying blueprint parameters](developing-blueprints-code-parameters.md)

# Creating the blueprint layout script


The blueprint layout script must include a function that generates the entities in your workflow. You can name this function whatever you like. AWS Glue uses the configuration file to determine the fully qualified name of the function.

Your layout function does the following:
+ (Optional) Instantiates the `Job` class to create `Job` objects, and passes arguments such as `Command` and `Role`. These are job properties that you would specify if you were creating the job using the AWS Glue console or API.
+ (Optional) Instantiates the `Crawler` class to create `Crawler` objects, and passes name, role, and target arguments.
+ To indicate dependencies between the objects (workflow entities), passes the `DependsOn` and `WaitForDependencies` additional arguments to `Job()` and `Crawler()`. These arguments are explained later in this section.
+ Instantiates the `Workflow` class to create the workflow object that is returned to AWS Glue, passing a `Name` argument, an `Entities` argument, and an optional `OnSchedule` argument. The `Entities` argument specifies all of the jobs and crawlers to include in the workflow. To see how to construct an `Entities` object, see the sample project later in this section.
+ Returns the `Workflow` object.

For definitions of the `Job`, `Crawler`, and `Workflow` classes, see [AWS Glue blueprint classes reference](developing-blueprints-code-classes.md).

The layout function must accept the following input arguments.


| Argument | Description | 
| --- | --- | 
| user\_params | Python dictionary of blueprint parameter names and values. For more information, see [Specifying blueprint parameters](developing-blueprints-code-parameters.md). | 
| system\_params | Python dictionary containing two properties: region and accountId. | 

Here is a sample layout generator script in a file named `Layout.py`:

```
import argparse
import sys
import os
import json
from awsglue.blueprint.workflow import *
from awsglue.blueprint.job import *
from awsglue.blueprint.crawler import *


def generate_layout(user_params, system_params):

    etl_job = Job(Name="{}_etl_job".format(user_params['WorkflowName']),
                  Command={
                      "Name": "glueetl",
                      "ScriptLocation": user_params['ScriptLocation'],
                      "PythonVersion": "2"
                  },
                  Role=user_params['PassRole'])
    post_process_job = Job(Name="{}_post_process".format(user_params['WorkflowName']),
                            Command={
                                "Name": "pythonshell",
                                "ScriptLocation": user_params['ScriptLocation'],
                                "PythonVersion": "2"
                            },
                            Role=user_params['PassRole'],
                            DependsOn={
                                etl_job: "SUCCEEDED"
                            },
                            WaitForDependencies="AND")
    sample_workflow = Workflow(Name=user_params['WorkflowName'],
                            Entities=Entities(Jobs=[etl_job, post_process_job]))
    return sample_workflow
```

The sample script imports the required blueprint libraries and includes a `generate_layout` function that generates a workflow with two jobs. This is a very simple script. A more complex script could employ additional logic and parameters to generate a workflow with many jobs and crawlers, or even a variable number of jobs and crawlers.

## Using the DependsOn argument


The `DependsOn` argument is a dictionary representation of a dependency that this entity has on other entities within the workflow. It has the following form. 

```
DependsOn = {dependency1 : state, dependency2 : state, ...}
```

The keys in this dictionary represent the object reference, not the name, of the entity, while the values are strings that correspond to the state to watch for. AWS Glue infers the proper triggers. For the valid states, see [Condition Structure](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-trigger.html#aws-glue-api-jobs-trigger-Condition).

For example, a job might depend on the successful completion of a crawler. If you define a crawler object named `crawler2` as follows:

```
crawler2 = Crawler(Name="my_crawler", ...)
```

Then an object depending on `crawler2` would include a constructor argument such as: 

```
DependsOn = {crawler2 : "SUCCEEDED"}
```

For example:

```
job1 = Job(Name="Job1", ..., DependsOn = {crawler2 : "SUCCEEDED", ...})
```

If `DependsOn` is omitted for an entity, that entity depends on the workflow start trigger.

## Using the WaitForDependencies argument


The `WaitForDependencies` argument defines whether a job or crawler entity should wait until *all* entities on which it depends complete or until *any* completes.

The allowable values are "`AND`" or "`ANY`".
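
For example, suppose that a job should start as soon as *either* of two upstream jobs succeeds. Using hypothetical entity names in the style of the fragments in the previous section:

```
job3 = Job(Name="Job3", ...,
           DependsOn={job1: "SUCCEEDED", job2: "SUCCEEDED"},
           WaitForDependencies="ANY")
```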

## Using the OnSchedule argument


The `OnSchedule` argument for the `Workflow` class constructor is a `cron` expression that defines the starting trigger definition for a workflow.

If this argument is specified, AWS Glue creates a schedule trigger with the corresponding schedule. If it isn't specified, the starting trigger for the workflow is an on-demand trigger.
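
For example, the following fragment is a sketch of a workflow whose starting trigger fires daily at 12:00 UTC; the cron expression shown assumes the same `cron(...)` format that AWS Glue uses for time-based schedules:

```
daily_workflow = Workflow(Name=user_params['WorkflowName'],
                          Entities=Entities(Jobs=[etl_job]),
                          OnSchedule="cron(0 12 * * ? *)")
```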

# Creating the configuration file


The blueprint configuration file is a required file that defines the script entry point for generating the workflow, and the parameters that the blueprint accepts. The file must be named `blueprint.cfg`.

Here is a sample configuration file.

```
{
    "layoutGenerator": "DemoBlueprintProject.Layout.generate_layout",
    "parameterSpec" : {
           "WorkflowName" : {
                "type": "String",
                "collection": false
           },
           "WorkerType" : {
                "type": "String",
                "collection": false,
                "allowedValues": ["G1.X", "G2.X"],
                "defaultValue": "G1.X"
           },
           "Dpu" : {
                "type" : "Integer",
                "allowedValues" : [2, 4, 6],
                "defaultValue" : 2
           },
           "DynamoDBTableName": {
                "type": "String",
                "collection" : false
           },
           "ScriptLocation" : {
                "type": "String",
                "collection": false
           }
    }
}
```

The `layoutGenerator` property specifies the fully qualified name of the function in the script that generates the layout.

The `parameterSpec` property specifies the parameters that this blueprint accepts. For more information, see [Specifying blueprint parameters](developing-blueprints-code-parameters.md).

**Important**  
Your configuration file must include the workflow name as a blueprint parameter, or you must generate a unique workflow name in your layout script.
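
If you choose to generate the name in your layout script, one illustrative approach (not part of the blueprint libraries; the prefix shown is arbitrary) is to append a short random suffix so that repeated blueprint runs don't produce colliding workflow names:

```
import uuid

def unique_workflow_name(prefix):
    # Append a short random suffix so that two blueprint runs that
    # use the same prefix still produce distinct workflow names.
    return "{}_{}".format(prefix, uuid.uuid4().hex[:8])

print(unique_workflow_name("conversion"))  # e.g. conversion_3f9a1c2b
```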

# Specifying blueprint parameters


The configuration file contains blueprint parameter specifications in a `parameterSpec` JSON object. `parameterSpec` contains one or more parameter objects.

```
"parameterSpec": {
    "<parameter_name>": {
      "type": "<parameter-type>",
      "collection": true|false, 
      "description": "<parameter-description>",
      "defaultValue": "<default value for the parameter if value not specified>"
      "allowedValues": "<list of allowed values>" 
    },
    "<parameter_name>": {    
       ...
    }
  }
```

The following are the rules for coding each parameter object:
+ The parameter name and `type` are mandatory. All other properties are optional.
+ If you specify the `defaultValue` property, the parameter is optional. Otherwise the parameter is mandatory and the data analyst who is creating a workflow from the blueprint must provide a value for it.
+ If you set the `collection` property to `true`, the parameter can take a collection of values. Collections can be of any data type.
+ If you specify `allowedValues`, the AWS Glue console displays a dropdown list of values for the data analyst to choose from when creating a workflow from the blueprint.
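
To illustrate how these rules combine, the following standalone sketch mimics how parameter values are resolved against a `parameterSpec`. The helper is illustrative only; it is not part of the blueprint libraries, and AWS Glue performs its own validation:

```
def resolve_params(parameter_spec, supplied):
    # Apply defaultValue for missing optional parameters, require
    # all others, and enforce allowedValues where present.
    resolved = {}
    for name, spec in parameter_spec.items():
        if name in supplied:
            value = supplied[name]
        elif "defaultValue" in spec:
            value = spec["defaultValue"]
        else:
            raise ValueError("Missing required parameter: " + name)
        allowed = spec.get("allowedValues")
        if allowed is not None and value not in allowed:
            raise ValueError("Value {!r} not allowed for {}".format(value, name))
        resolved[name] = value
    return resolved

spec = {
    "WorkflowName": {"type": "String"},
    "WorkerType": {"type": "String", "allowedValues": ["G1.X", "G2.X"],
                   "defaultValue": "G1.X"},
}
print(resolve_params(spec, {"WorkflowName": "wf1"}))
# → {'WorkflowName': 'wf1', 'WorkerType': 'G1.X'}
```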

The following are the permitted values for `type`:


| Parameter data type | Notes | 
| --- | --- | 
| String | - | 
| Integer | - | 
| Double | - | 
| Boolean | Possible values are true and false. Generates a check box on the Create a workflow from <blueprint> page on the AWS Glue console. | 
| S3Uri | Complete Amazon S3 path, beginning with s3://. Generates a text field and Browse button on the Create a workflow from <blueprint> page. | 
| S3Bucket | Amazon S3 bucket name only. Generates a bucket picker on the Create a workflow from <blueprint> page. | 
| IAMRoleArn | Amazon Resource Name (ARN) of an AWS Identity and Access Management (IAM) role. Generates a role picker on the Create a workflow from <blueprint> page. | 
| IAMRoleName | Name of an IAM role. Generates a role picker on the Create a workflow from <blueprint> page. | 

# Sample blueprint project


Data format conversion is a frequent extract, transform, and load (ETL) use case. In typical analytic workloads, column-based file formats like Parquet or ORC are preferred over text formats like CSV or JSON. This sample blueprint enables you to convert data from CSV/JSON/etc. into Parquet for files on Amazon S3. 

This blueprint takes a list of Amazon S3 paths, supplied as a blueprint parameter, converts the data to Parquet format, and writes it to the S3 location specified by a second parameter. The layout script creates a crawler and a job for each path, uploads the ETL script in `Conversion.py` to an S3 bucket specified by a third parameter, and sets the uploaded script as the ETL script for each job. The ZIP archive for the project contains the layout script, the ETL script, and the blueprint configuration file.

For information about more sample blueprint projects, see [Blueprint samples](developing-blueprints-samples.md).

The following is the layout script, in the file `Layout.py`.

```
from awsglue.blueprint.workflow import *
from awsglue.blueprint.job import *
from awsglue.blueprint.crawler import *
import boto3

s3_client = boto3.client('s3')

# Ingest each S3 path as a Glue table in Parquet format.
def generate_layout(user_params, system_params):
    # Always give the full path for the file.
    with open("ConversionBlueprint/Conversion.py", "rb") as f:
        s3_client.upload_fileobj(f, user_params['ScriptsBucket'], "Conversion.py")
    etlScriptLocation = "s3://{}/Conversion.py".format(user_params['ScriptsBucket'])    
    crawlers = []
    jobs = []
    workflowName = user_params['WorkflowName']
    for index, path in enumerate(user_params['S3Paths']):
      tablePrefix = "source_"
      # Include the loop index in entity names so that the crawlers and
      # jobs created for different paths have unique names.
      crawler = Crawler(Name="{}_crawler_{}".format(workflowName, index),
                        Role=user_params['PassRole'],
                        DatabaseName=user_params['TargetDatabase'],
                        TablePrefix=tablePrefix,
                        Targets= {"S3Targets": [{"Path": path}]})
      crawlers.append(crawler)
      transform_job = Job(Name="{}_transform_job_{}".format(workflowName, index),
                         Command={"Name": "glueetl",
                                  "ScriptLocation": etlScriptLocation,
                                  "PythonVersion": "3"},
                         Role=user_params['PassRole'],
                         DefaultArguments={"--database_name": user_params['TargetDatabase'],
                                           "--table_prefix": tablePrefix,
                                           "--region_name": system_params['region'],
                                           "--output_path": user_params['TargetS3Location']},
                         DependsOn={crawler: "SUCCEEDED"},
                         WaitForDependencies="AND")
      jobs.append(transform_job)
    conversion_workflow = Workflow(Name=workflowName, Entities=Entities(Jobs=jobs, Crawlers=crawlers))
    return conversion_workflow
```

The following is the corresponding blueprint configuration file `blueprint.cfg`.

```
{
    "layoutGenerator": "ConversionBlueprint.Layout.generate_layout",
    "parameterSpec" : {
        "WorkflowName" : {
            "type": "String",
            "collection": false,
            "description": "Name for the workflow."
        },
        "S3Paths" : {
            "type": "S3Uri",
            "collection": true,
            "description": "List of Amazon S3 paths for data ingestion."
        },
        "PassRole" : {
            "type": "IAMRoleName",
            "collection": false,
            "description": "Choose an IAM role to be used in running the job/crawler"
        },
        "TargetDatabase": {
            "type": "String",
            "collection" : false,
            "description": "Choose a database in the Data Catalog."
        },
        "TargetS3Location": {
            "type": "S3Uri",
            "collection" : false,
            "description": "Choose an Amazon S3 output path, for example: s3://<target_path>/."
        },
        "ScriptsBucket": {
            "type": "S3Bucket",
            "collection": false,
            "description": "Provide an S3 bucket name (in the same AWS Region) to store the scripts."
        }
    }
}
```

The following script in the file `Conversion.py` is the uploaded ETL script. Note that it preserves the partitioning scheme during conversion. 

```
import sys
from pyspark.sql.functions import *
from pyspark.context import SparkContext
from awsglue.transforms import *
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
import boto3

args = getResolvedOptions(sys.argv, [
    'JOB_NAME',
    'region_name',
    'database_name',
    'table_prefix',
    'output_path'])
databaseName = args['database_name']
tablePrefix = args['table_prefix']
outputPath = args['output_path']

glue = boto3.client('glue', region_name=args['region_name'])

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args['JOB_NAME'], args)

def get_tables(database_name, table_prefix):
    tables = []
    paginator = glue.get_paginator('get_tables')
    for page in paginator.paginate(DatabaseName=database_name, Expression=table_prefix+"*"):
        tables.extend(page['TableList'])
    return tables

for table in get_tables(databaseName, tablePrefix):
    tableName = table['Name']
    partitionList = table['PartitionKeys']
    partitionKeys = []
    for partition in partitionList:
        partitionKeys.append(partition['Name'])

    # Create DynamicFrame from Catalog
    dyf = glue_context.create_dynamic_frame.from_catalog(
        name_space=databaseName,
        table_name=tableName,
        additional_options={
            'useS3ListImplementation': True
        },
        transformation_ctx='dyf'
    )

    # Resolve choice type with make_struct
    dyf = ResolveChoice.apply(
        frame=dyf,
        choice='make_struct',
        transformation_ctx='resolvechoice_' + tableName
    )

    # Drop null fields
    dyf = DropNullFields.apply(
        frame=dyf,
        transformation_ctx="dropnullfields_" + tableName
    )

    # Write DynamicFrame to S3 in glueparquet
    sink = glue_context.getSink(
        connection_type="s3",
        path=outputPath,
        enableUpdateCatalog=True,
        partitionKeys=partitionKeys
    )
    sink.setFormat("glueparquet")

    sink.setCatalogInfo(
        catalogDatabase=databaseName,
        catalogTableName=tableName[len(tablePrefix):]
    )
    sink.writeFrame(dyf)

job.commit()
```

**Note**  
Only two Amazon S3 paths can be supplied as an input to the sample blueprint. This is because AWS Glue triggers are limited to invoking only two crawler actions.

# Testing a blueprint


While you develop your code, you should perform local testing to verify that the workflow layout is correct.

Local testing doesn't generate AWS Glue jobs, crawlers, or triggers. Instead, you run the layout script locally and use the `to_json()` and `validate()` methods to print objects and find errors. These methods are available in all three classes defined in the libraries. 

There are two ways to handle the `user_params` and `system_params` arguments that AWS Glue passes to your layout function. Your test-bench code can create a dictionary of sample blueprint parameter values and pass that to the layout function as the `user_params` argument. Or, you can remove the references to `user_params` and replace them with hardcoded strings.

If your code makes use of the `region` and `accountId` properties in the `system_params` argument, you can pass in your own dictionary for `system_params`.
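
The stub below is a minimal sketch of such a dictionary, with placeholder values; at run time, AWS Glue supplies the actual Region and account ID, and the layout function name shown in the comment is hypothetical.

```python
# Placeholder values for local testing only; at run time AWS Glue supplies
# the actual Region and account ID in the system_params argument.
SYSTEM_PARAMS = {
    "region": "us-east-1",
    "accountId": "111122223333",
}

# Pass this dictionary as the system_params argument of your layout function,
# for example: my_layout(user_params=USER_PARAMS, system_params=SYSTEM_PARAMS)
print(sorted(SYSTEM_PARAMS))
```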

**To test a blueprint**

1. Start a Python interpreter in a directory with the libraries, or load the blueprint files and the supplied libraries into your preferred integrated development environment (IDE).

1. Ensure that your code imports the supplied libraries.

1. Add code to your layout function to call `validate()` or `to_json()` on any entity or on the `Workflow` object. For example, if your code creates a `Crawler` object named `mycrawler`, you can call `validate()` as follows.

   ```
   mycrawler.validate()
   ```

   You can print `mycrawler` as follows:

   ```
   print(mycrawler.to_json())
   ```

   If you call `to_json()` on an object, there is no need to also call `validate()`, because `to_json()` calls `validate()`. 

   It is most useful to call these methods on the workflow object. Assuming that your script names the workflow object `my_workflow`, validate and print the workflow object as follows.

   ```
   print(my_workflow.to_json())
   ```

   For more information about `to_json()` and `validate()`, see [Class methods](developing-blueprints-code-classes.md#developing-blueprints-code-methods).

   You can also import `pprint` and pretty-print the workflow object, as shown in the example later in this section.

1. Run the code, fix errors, and finally remove any calls to `validate()` or `to_json()`.

**Example**  
The following example shows how to construct a dictionary of sample blueprint parameters and pass it in as the `user_params` argument to layout function `generate_compaction_workflow`. It also shows how to pretty-print the generated workflow object.  

```
from pprint import pprint
from awsglue.blueprint.workflow import *
from awsglue.blueprint.job import *
from awsglue.blueprint.crawler import *
 
USER_PARAMS = {"WorkflowName": "compaction_workflow",
               "ScriptLocation": "s3://amzn-s3-demo-bucket/scripts/threaded-compaction.py",
               "PassRole": "arn:aws:iam::111122223333:role/GlueRole-ETL",
               "DatabaseName": "cloudtrial",
               "TableName": "ct_cloudtrail",
               "CoalesceFactor": 4,
               "MaxThreadWorkers": 200}
 
 
def generate_compaction_workflow(user_params: dict, system_params: dict) -> Workflow:
    compaction_job = Job(Name=f"{user_params['WorkflowName']}_etl_job",
                         Command={"Name": "glueetl",
                                  "ScriptLocation": user_params['ScriptLocation'],
                                  "PythonVersion": "3"},
                         Role="arn:aws:iam::111122223333:role/AWSGlueServiceRoleDefault",
                         DefaultArguments={"DatabaseName": user_params['DatabaseName'],
                                           "TableName": user_params['TableName'],
                                           "CoalesceFactor": user_params['CoalesceFactor'],
                                           "max_thread_workers": user_params['MaxThreadWorkers']})
 
    catalog_target = {"CatalogTargets": [{"DatabaseName": user_params['DatabaseName'], "Tables": [user_params['TableName']]}]}
 
    compacted_files_crawler = Crawler(Name=f"{user_params['WorkflowName']}_post_crawl",
                                      Targets = catalog_target,
                                      Role=user_params['PassRole'],
                                      DependsOn={compaction_job: "SUCCEEDED"},
                                      WaitForDependencies="AND",
                                      SchemaChangePolicy={"DeleteBehavior": "LOG"})
 
    compaction_workflow = Workflow(Name=user_params['WorkflowName'],
                                   Entities=Entities(Jobs=[compaction_job],
                                                     Crawlers=[compacted_files_crawler]))
    return compaction_workflow
 
generated = generate_compaction_workflow(user_params=USER_PARAMS, system_params={})
gen_dict = generated.to_json()
 
pprint(gen_dict)
```

# Publishing a blueprint


After you develop a blueprint, you must upload it to Amazon S3. You must have write permissions on the Amazon S3 bucket that you use to publish the blueprint. You must also make sure that the AWS Glue administrator, who will register the blueprint, has read access to the Amazon S3 bucket. For the suggested AWS Identity and Access Management (IAM) permissions policies for personas and roles for AWS Glue blueprints, see [Permissions for personas and roles for AWS Glue blueprints](blueprints-personas-permissions.md).

**To publish a blueprint**

1. Create the necessary scripts, resources, and blueprint configuration file.

1. Add all files to a ZIP archive and upload the ZIP file to Amazon S3. Use an S3 bucket that is in the same Region as the Region in which users will register and run the blueprint.

   You can create a ZIP file from the command line using the following command.

   ```
   zip -r folder.zip folder
   ```

1. Add a bucket policy that grants read permission to the desired AWS account. The following is a sample policy.

   ```
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "AWS": "arn:aws:iam::111122223333:root"
         },
         "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::my-blueprints/*"
       }
     ]
   }
   ```


1. Grant the IAM `s3:GetObject` permission on the Amazon S3 bucket to the AWS Glue administrator or to whoever will be registering blueprints. For a sample policy to grant to administrators, see [AWS Glue administrator permissions for blueprints](blueprints-personas-permissions.md#bp-persona-admin).
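
The ZIP step in the procedure above can also be scripted. The following is a minimal sketch using Python's standard library; the folder and archive names are placeholders for your own blueprint project.

```python
import os
import zipfile

def zip_blueprint(folder: str, archive: str) -> None:
    """Create a ZIP archive of a blueprint project folder,
    mirroring the effect of `zip -r folder.zip folder`."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                path = os.path.join(root, name)
                # Store entries relative to the folder's parent so the
                # archive contains the folder itself at the top level.
                zf.write(path, os.path.relpath(path, os.path.dirname(folder)))
```

After creating the archive, upload it to an S3 bucket in the same Region where users will register and run the blueprint.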

After you have completed local testing of your blueprint, you may also want to test a blueprint on AWS Glue. To test a blueprint on AWS Glue, it must be registered. You can limit who sees the registered blueprint using IAM authorization, or by using separate testing accounts.

**See also:**  
[Registering a blueprint in AWS Glue](registering-blueprints.md)

# AWS Glue blueprint classes reference
Blueprint classes reference

The libraries for AWS Glue blueprints define three classes that you use in your workflow layout script: `Job`, `Crawler`, and `Workflow`.

**Topics**
+ [

## Job class
](#developing-blueprints-code-jobclass)
+ [

## Crawler class
](#developing-blueprints-code-crawlerclass)
+ [

## Workflow class
](#developing-blueprints-code-workflowclass)
+ [

## Class methods
](#developing-blueprints-code-methods)

## Job class


The `Job` class represents an AWS Glue ETL job.

**Mandatory constructor arguments**  
The following are mandatory constructor arguments for the `Job` class.


| Argument name | Type | Description | 
| --- | --- | --- | 
| Name | str | Name to assign to the job. AWS Glue adds a randomly generated suffix to the name to distinguish the job from those created by other blueprint runs. | 
| Role | str | Amazon Resource Name (ARN) of the role that the job should assume while executing. | 
| Command | dict | Job command, as specified in the [JobCommand structure](aws-glue-api-jobs-job.md#aws-glue-api-jobs-job-JobCommand) in the API documentation.  | 

**Optional constructor arguments**  
The following are optional constructor arguments for the `Job` class.


| Argument name | Type | Description | 
| --- | --- | --- | 
| DependsOn | dict | List of workflow entities that the job depends on. For more information, see [Using the DependsOn argument](developing-blueprints-code-layout.md#developing-blueprints-code-layout-depends-on). | 
| WaitForDependencies | str | Indicates whether the job should wait until all entities on which it depends complete before executing or until any completes. For more information, see [Using the WaitForDependencies argument](developing-blueprints-code-layout.md#developing-blueprints-code-layout-wait-for-dependencies). Omit if the job depends on only one entity. | 
| (Job properties) | - | Any of the job properties listed in [Job structure](aws-glue-api-jobs-job.md#aws-glue-api-jobs-job-Job) in the AWS Glue API documentation (except CreatedOn and LastModifiedOn). | 

## Crawler class


The `Crawler` class represents an AWS Glue crawler.

**Mandatory constructor arguments**  
The following are mandatory constructor arguments for the `Crawler` class.


| Argument name | Type | Description | 
| --- | --- | --- | 
| Name | str | Name to assign to the crawler. AWS Glue adds a randomly generated suffix to the name to distinguish the crawler from those created by other blueprint runs. | 
| Role | str | ARN of the role that the crawler should assume while running. | 
| Targets | dict | Collection of targets to crawl. Targets class constructor arguments are defined in the [CrawlerTargets structure](aws-glue-api-crawler-crawling.md#aws-glue-api-crawler-crawling-CrawlerTargets) in the API documentation. All Targets constructor arguments are optional, but you must pass at least one.  | 

**Optional constructor arguments**  
The following are optional constructor arguments for the `Crawler` class.


| Argument name | Type | Description | 
| --- | --- | --- | 
| DependsOn | dict | List of workflow entities that the crawler depends on. For more information, see [Using the DependsOn argument](developing-blueprints-code-layout.md#developing-blueprints-code-layout-depends-on). | 
| WaitForDependencies | str | Indicates whether the crawler should wait until all entities on which it depends complete before running or until any completes. For more information, see [Using the WaitForDependencies argument](developing-blueprints-code-layout.md#developing-blueprints-code-layout-wait-for-dependencies). Omit if the crawler depends on only one entity. | 
| (Crawler properties) | - | Any of the crawler properties listed in [Crawler structure](aws-glue-api-crawler-crawling.md#aws-glue-api-crawler-crawling-Crawler) in the AWS Glue API documentation, with the following exceptions:[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/glue/latest/dg/developing-blueprints-code-classes.html) | 

## Workflow class


The `Workflow` class represents an AWS Glue workflow. The workflow layout script returns a `Workflow` object. AWS Glue creates a workflow based on this object.

**Mandatory constructor arguments**  
The following are mandatory constructor arguments for the `Workflow` class.


| Argument name | Type | Description | 
| --- | --- | --- | 
| Name | str | Name to assign to the workflow. | 
| Entities | Entities | A collection of entities (jobs and crawlers) to include in the workflow. The Entities class constructor accepts a Jobs argument, which is a list of Job objects, and a Crawlers argument, which is a list of Crawler objects. | 

**Optional constructor arguments**  
The following are optional constructor arguments for the `Workflow` class.


| Argument name | Type | Description | 
| --- | --- | --- | 
| Description | str | See [Workflow structure](aws-glue-api-workflow.md#aws-glue-api-workflow-Workflow). | 
| DefaultRunProperties | dict | See [Workflow structure](aws-glue-api-workflow.md#aws-glue-api-workflow-Workflow). | 
| OnSchedule | str | A cron expression. | 
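
As a hedged sketch, the example schedule value below is an assumption (it would run the workflow daily at 09:00 UTC); AWS cron expressions have six fields: minutes, hours, day-of-month, month, day-of-week, and year. You can sanity-check a value locally before using it:

```python
# Assumed example value for OnSchedule: daily at 09:00 UTC.
on_schedule = "cron(0 9 * * ? *)"

# Quick local sanity check on the six-field AWS cron format
# (minutes hours day-of-month month day-of-week year).
fields = on_schedule.removeprefix("cron(").removesuffix(")").split()
assert len(fields) == 6
print(fields)
```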

## Class methods


All three classes include the following methods.

**validate()**  
Validates the properties of the object. If errors are found, outputs a message and exits; generates no output if there are no errors. For the `Workflow` class, also calls itself on every entity in the workflow.

**to\_json()**  
Serializes the object to JSON. Also calls `validate()`. For the `Workflow` class, the JSON object includes job and crawler lists, and a list of triggers generated by the job and crawler dependency specifications.

# Blueprint samples


There are a number of sample blueprint projects available in the [AWS Glue blueprint GitHub repository](https://github.com/awslabs/aws-glue-blueprint-libs/tree/master/samples). These samples are for reference only and are not intended for production use.

The titles of the sample projects are:
+ Compaction: this blueprint creates a job that compacts input files into larger chunks based on desired file size.
+ Conversion: this blueprint converts input files in various standard file formats into Apache Parquet format, which is optimized for analytic workloads.
+ Crawling Amazon S3 locations: this blueprint crawls multiple Amazon S3 locations to add metadata tables to the Data Catalog.
+ Custom connection to Data Catalog: this blueprint accesses data stores using AWS Glue custom connectors, reads the records, and populates the table definitions in the AWS Glue Data Catalog based on the record schema.
+ Encoding: this blueprint converts your non-UTF files into UTF encoded files.
+ Partitioning: this blueprint creates a partitioning job that places output files into partitions based on specific partition keys.
+ Importing Amazon S3 data into a DynamoDB table: this blueprint imports data from Amazon S3 into a DynamoDB table.
+ Standard table to governed: this blueprint imports an AWS Glue Data Catalog table into a Lake Formation table.

# Registering a blueprint in AWS Glue
Registering a blueprint

After the AWS Glue developer has coded the blueprint and uploaded a ZIP archive to Amazon Simple Storage Service (Amazon S3), an AWS Glue administrator must register the blueprint. Registering the blueprint makes it available for use.

When you register a blueprint, AWS Glue copies the blueprint archive to a reserved Amazon S3 location. You can then delete the archive from the upload location.

To register a blueprint, you need read permissions on the Amazon S3 location that contains the uploaded archive. You also need the AWS Identity and Access Management (IAM) permission `glue:CreateBlueprint`. For the suggested permissions for an AWS Glue administrator who must register, view, and maintain blueprints, see [AWS Glue administrator permissions for blueprints](blueprints-personas-permissions.md#bp-persona-admin).

You can register a blueprint by using the AWS Glue console, AWS Glue API, or AWS Command Line Interface (AWS CLI).

**To register a blueprint (console)**

1. Ensure that you have read permissions (`s3:GetObject`) on the blueprint ZIP archive in Amazon S3.

1. Open the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/).

   Sign in as a user that has permissions to register a blueprint. Switch to the same AWS Region as the Amazon S3 bucket that contains the blueprint ZIP archive.

1. In the navigation pane, choose **blueprints**. Then on the **blueprints** page, choose **Add blueprint**.

1. Enter a blueprint name and optional description.

1. For **ZIP archive location (S3)**, enter the Amazon S3 path of the uploaded blueprint ZIP archive. Include the archive file name in the path and begin the path with `s3://`.

1. (Optional) Add one or more tags.

1. Choose **Add blueprint**.

   The **blueprints** page reappears and shows that the blueprint status is `CREATING`. Choose the refresh button until the status changes to `ACTIVE` or `FAILED`.

1. If the status is `FAILED`, select the blueprint, and on the **Actions** menu, choose **View**.

   The detail page shows the reason for the failure. If the error message is "Unable to access object at location..." or "Access denied on object at location...", review the following requirements:
   + The user that you are signed in as must have read permission on the blueprint ZIP archive in Amazon S3.
   + The Amazon S3 bucket that contains the ZIP archive must have a bucket policy that grants read permission on the object to your AWS account ID. For more information, see [Developing blueprints in AWS Glue](developing-blueprints.md).
   + The Amazon S3 bucket that you're using must be in the same Region as the Region that you're signed into on the console.

1. Ensure that data analysts have permissions on the blueprint.

   The suggested IAM policy for data analysts is shown in [Data analyst permissions for blueprints](blueprints-personas-permissions.md#bp-persona-analyst). This policy grants `glue:GetBlueprint` on any resource. If your policy is more fine-grained at the resource level, then grant data analysts permissions on this newly created resource.

**To register a blueprint (AWS CLI)**

1. Enter the following command.

   ```
   aws glue create-blueprint --name <blueprint-name> [--description <description>] --blueprint-location s3://<s3-path>/<archive-filename>
   ```

1. Enter the following command to check the blueprint status. Repeat the command until the status changes to `ACTIVE` or `FAILED`.

   ```
   aws glue get-blueprint --name <blueprint-name>
   ```

   If the status is `FAILED` and the error message is "Unable to access object at location..." or "Access denied on object at location...", review the following requirements:
   + The user that you are signed in as must have read permission on the blueprint ZIP archive in Amazon S3.
   + The Amazon S3 bucket containing the ZIP archive must have a bucket policy that grants read permission on the object to your AWS account ID. For more information, see [Publishing a blueprint](developing-blueprints-publishing.md).
   + The Amazon S3 bucket that you're using must be in the same Region as the Region that you're signed into on the console.

**See also:**  
[Overview of blueprints in AWS Glue](blueprints-overview.md)

# Viewing blueprints in AWS Glue
Viewing blueprints

View a blueprint to review the blueprint description, status, and parameter specifications, and to download the blueprint ZIP archive.

You can view a blueprint by using the AWS Glue console, AWS Glue API, or AWS Command Line Interface (AWS CLI).

**To view a blueprint (console)**

1. Open the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/).

1. In the navigation pane, choose **blueprints**.

1. On the **blueprints** page, select a blueprint. Then on the **Actions** menu, choose **View**.

**To view a blueprint (AWS CLI)**
+ Enter the following command to view just the blueprint name, description, and status. Replace *<blueprint-name>* with the name of the blueprint to view.

  ```
  aws glue get-blueprint --name <blueprint-name>
  ```

  The command output looks something like the following.

  ```
  {
      "Blueprint": {
          "Name": "myDemoBP",
          "CreatedOn": 1587414516.92,
          "LastModifiedOn": 1587428838.671,
          "BlueprintLocation": "s3://amzn-s3-demo-bucket1/demo/DemoBlueprintProject.zip",
          "Status": "ACTIVE"
      }
  }
  ```

  Enter the following command to also view the parameter specifications.

  ```
  aws glue get-blueprint --name <blueprint-name>  --include-parameter-spec
  ```

  The command output looks something like the following.

  ```
  {
      "Blueprint": {
          "Name": "myDemoBP",
          "CreatedOn": 1587414516.92,
          "LastModifiedOn": 1587428838.671,
          "ParameterSpec": "{\"WorkflowName\":{\"type\":\"String\",\"collection\":false,\"description\":null,\"defaultValue\":null,\"allowedValues\":null},\"PassRole\":{\"type\":\"String\",\"collection\":false,\"description\":null,\"defaultValue\":null,\"allowedValues\":null},\"DynamoDBTableName\":{\"type\":\"String\",\"collection\":false,\"description\":null,\"defaultValue\":null,\"allowedValues\":null},\"ScriptLocation\":{\"type\":\"String\",\"collection\":false,\"description\":null,\"defaultValue\":null,\"allowedValues\":null}}",
          "BlueprintLocation": "s3://awsexamplebucket1/demo/DemoBlueprintProject.zip",
          "Status": "ACTIVE"
      }
  }
  ```

  Add the `--include-blueprint` argument to include a URL in the output that you can paste into your browser to download the blueprint ZIP archive that AWS Glue stored.
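
The `ParameterSpec` value in the output above is itself a JSON-encoded string. A quick way to inspect it is to decode it locally with Python; the sample value below is abridged from the output shown earlier.

```python
import json

# Abridged ParameterSpec sample, copied from a get-blueprint response.
parameter_spec = (
    '{"WorkflowName":{"type":"String","collection":false},'
    '"PassRole":{"type":"String","collection":false}}'
)

params = json.loads(parameter_spec)
for name, spec in params.items():
    # Each entry describes one blueprint parameter.
    print(name, spec["type"], spec["collection"])
```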

**See also:**  
[Overview of blueprints in AWS Glue](blueprints-overview.md)

# Updating a blueprint in AWS Glue
Updating a blueprint

You can update a blueprint if you have a revised layout script, a revised set of blueprint parameters, or revised supporting files. Updating a blueprint creates a new version.

Updating a blueprint doesn't affect existing workflows created from the blueprint.

You can update a blueprint by using the AWS Glue console, AWS Glue API, or AWS Command Line Interface (AWS CLI).

The following procedure assumes that the AWS Glue developer has created and uploaded an updated blueprint ZIP archive to Amazon S3.

**To update a blueprint (console)**

1. Ensure that you have read permissions (`s3:GetObject`) on the blueprint ZIP archive in Amazon S3.

1. Open the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/).

   Sign in as a user that has permissions to update a blueprint. Switch to the same AWS Region as the Amazon S3 bucket that contains the blueprint ZIP archive.

1. In the navigation pane, choose **blueprints**.

1. On the **blueprints** page, select a blueprint, and on the **Actions** menu, choose **Edit**.

1. On the **Edit a blueprint** page, update the blueprint **Description** or **ZIP archive location (S3)**. Be sure to include the archive file name in the path.

1. Choose **Save**.

   The **blueprints** page reappears and shows that the blueprint status is `UPDATING`. Choose the refresh button until the status changes to `ACTIVE` or `FAILED`.

1. If the status is `FAILED`, select the blueprint, and on the **Actions** menu, choose **View**.

   The detail page shows the reason for the failure. If the error message is "Unable to access object at location..." or "Access denied on object at location...", review the following requirements:
   + The user that you are signed in as must have read permission on the blueprint ZIP archive in Amazon S3.
   + The Amazon S3 bucket that contains the ZIP archive must have a bucket policy that grants read permission on the object to your AWS account ID. For more information, see [Publishing a blueprint](developing-blueprints-publishing.md).
   + The Amazon S3 bucket that you're using must be in the same Region as the Region that you're signed into on the console.
**Note**  
If the update fails, the next blueprint run uses the latest version of the blueprint that was successfully registered or updated.

**To update a blueprint (AWS CLI)**

1. Enter the following command.

   ```
   aws glue update-blueprint --name <blueprint-name> [--description <description>] --blueprint-location s3://<s3-path>/<archive-filename>
   ```

1. Enter the following command to check the blueprint status. Repeat the command until the status changes to `ACTIVE` or `FAILED`.

   ```
   aws glue get-blueprint --name <blueprint-name>
   ```

   If the status is `FAILED` and the error message is "Unable to access object at location..." or "Access denied on object at location...", review the following requirements:
   + The user that you are signed in as must have read permission on the blueprint ZIP archive in Amazon S3.
   + The Amazon S3 bucket containing the ZIP archive must have a bucket policy that grants read permission on the object to your AWS account ID. For more information, see [Publishing a blueprint](developing-blueprints-publishing.md).
   + The Amazon S3 bucket that you're using must be in the same Region as the Region that you're signed into on the console.

**See also**  
[Overview of blueprints in AWS Glue](blueprints-overview.md)

# Creating a workflow from a blueprint in AWS Glue
Creating a workflow from a blueprint

You can create an AWS Glue workflow manually, adding one component at a time, or you can create a workflow from an AWS Glue [blueprint](blueprints-overview.md). AWS Glue includes blueprints for common use cases. Your AWS Glue developers can create additional blueprints.

**Important**  
Limit the total number of jobs, crawlers, and triggers within a workflow to 100 or less. If you include more than 100, you might get errors when trying to resume or stop workflow runs.

When you use a blueprint, you can quickly generate a workflow for a specific use case based on the generalized use case defined by the blueprint. You define the specific use case by providing values for the blueprint parameters. For example, a blueprint that partitions a dataset could have the Amazon S3 source and target paths as parameters.

AWS Glue creates a workflow from a blueprint by *running* the blueprint. The blueprint run saves the parameter values that you supplied and tracks the progress and outcome of the creation of the workflow and its components. When troubleshooting a workflow, you can view the blueprint run to determine the blueprint parameter values that were used to create it.

To create and view workflows, you require certain IAM permissions. For a suggested IAM policy, see [Data analyst permissions for blueprints](blueprints-personas-permissions.md#bp-persona-analyst).

You can create a workflow from a blueprint by using the AWS Glue console, AWS Glue API, or AWS Command Line Interface (AWS CLI).

**To create a workflow from a blueprint (console)**

1. Open the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/).

   Sign in as a user that has permissions to create a workflow.

1. In the navigation pane, choose **blueprints**.

1. Select a blueprint, and on the **Actions** menu, choose **Create workflow**. 

1. On the **Create a workflow from <blueprint-name>** page, enter the following information:  
**Blueprint parameters**  
These vary depending on the blueprint design. For questions about the parameters, ask the blueprint developer. Blueprints typically include a parameter for the workflow name.  
**IAM role**  
The role that AWS Glue assumes to create the workflow and its components. The role must have permissions to create and delete workflows, jobs, crawlers, and triggers. For a suggested policy for the role, see [Permissions for blueprint roles](blueprints-personas-permissions.md#blueprints-role-permissions).

1. Choose **Submit**.

   The **Blueprint Details** page appears, showing a list of blueprint runs at the bottom.

1. In the blueprint runs list, check the topmost blueprint run for workflow creation status. 

   The initial status is `RUNNING`. Choose the refresh button until the status goes to `SUCCEEDED` or `FAILED`. 

1. Do one of the following:
   + If the completion status is `SUCCEEDED`, you can go to the **Workflows** page, select the newly created workflow, and run it. Before running the workflow, you can review the design graph.
   + If the completion status is `FAILED`, select the blueprint run, and on the **Actions** menu, choose **View** to see the error message.

For more information on workflows and blueprints, see the following topics.
+ [Overview of workflows in AWS Glue](workflows_overview.md)
+ [Updating a blueprint in AWS Glue](updating_blueprints.md)
+ [Creating and building out a workflow manually in AWS Glue](creating_running_workflows.md)

# Viewing blueprint runs in AWS Glue
Viewing blueprint runs

View a blueprint run to see the following information:
+ Name of the workflow that was created.
+ Blueprint parameter values that were used to create the workflow.
+ Status of the workflow creation operation.

You can view a blueprint run by using the AWS Glue console, AWS Glue API, or AWS Command Line Interface (AWS CLI).

**To view a blueprint run (console)**

1. Open the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/).

1. In the navigation pane, choose **blueprints**.

1. On the **blueprints** page, select a blueprint. Then on the **Actions** menu, choose **View**.

1. At the bottom of the **Blueprint Details** page, select a blueprint run, and on the **Actions** menu, choose **View**.

**To view a blueprint run (AWS CLI)**
+ Enter the following command. Replace *<blueprint-name>* with the name of the blueprint. Replace *<blueprint-run-id>* with the blueprint run ID.

  ```
  aws glue get-blueprint-run --blueprint-name <blueprint-name> --run-id <blueprint-run-id>
  ```

**See also:**  
[Overview of blueprints in AWS Glue](blueprints-overview.md)