

We are no longer updating the Amazon Machine Learning service or accepting new users for it. This documentation remains available for existing users, but we are no longer updating it. For more information, see [What is Amazon Machine Learning](https://docs.aws.amazon.com/machine-learning/latest/dg/what-is-amazon-machine-learning.html).

# Creating an Amazon ML Datasource from Data in Amazon Redshift


If you have data stored in Amazon Redshift, you can use the **Create Datasource** wizard in the Amazon Machine Learning (Amazon ML) console to create a datasource object. When you create a datasource from Amazon Redshift data, you specify the cluster that contains your data and the SQL query that retrieves your data. Amazon ML runs the query by invoking the Amazon Redshift `UNLOAD` command on the cluster, stores the results in the Amazon Simple Storage Service (Amazon S3) location of your choice, and then uses the data stored in Amazon S3 to create the datasource. The datasource, Amazon Redshift cluster, and S3 bucket must all be in the same AWS Region.

**Note**  
 Amazon ML doesn't support creating datasources from Amazon Redshift clusters in private VPCs. The cluster must have a public IP address.

**Topics**
+ [Required Parameters for the Create Datasource Wizard](redshift-parameters.md)
+ [Creating a Datasource with Amazon Redshift Data (Console)](create-datasource-from-redshift-procedure.md)
+ [Troubleshooting Amazon Redshift Issues](troubleshooting.md)

# Required Parameters for the Create Datasource Wizard


 To allow Amazon ML to connect to your Amazon Redshift database and read data on your behalf, you must provide the following: 
+ The Amazon Redshift `ClusterIdentifier`
+ The Amazon Redshift database name
+ The Amazon Redshift database credentials (user name and password)
+ The Amazon ML Amazon Redshift AWS Identity and Access Management (IAM) role
+ The Amazon Redshift SQL query
+ (Optional) The location of the Amazon ML schema
+ The Amazon S3 staging location (where Amazon ML puts the data before it creates the datasource)

Additionally, the IAM users or roles that create Amazon Redshift datasources (whether through the console or by using the `CreateDataSourceFromRedshift` action) must have the `iam:PassRole` permission.
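The following is a minimal Python sketch of an identity-based policy that grants `iam:PassRole` for a single role. The role ARN is a hypothetical placeholder, and the `iam:PassedToService` condition, which limits the permission to passing the role to Amazon ML, is an optional tightening rather than something Amazon ML requires.

```python
import json


def pass_role_policy(role_arn: str) -> dict:
    """Build an identity-based IAM policy that allows passing `role_arn`,
    but only to the Amazon ML service."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": role_arn,
                # Optional: allow the role to be passed only to Amazon ML.
                "Condition": {
                    "StringEquals": {
                        "iam:PassedToService": "machinelearning.amazonaws.com"
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical role ARN; replace it with the ARN of your own role.
    arn = "arn:aws:iam::123456789012:role/MyAmazonMLRedshiftRole"
    print(json.dumps(pass_role_policy(arn), indent=2))
```

You would attach the printed policy to each IAM user or role that creates Amazon Redshift datasources.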

**Amazon Redshift `ClusterIdentifier`**  
 Use this case-sensitive parameter to enable Amazon ML to find and connect to your cluster. You can obtain the cluster identifier (name) from the Amazon Redshift console. For more information about clusters, see [Amazon Redshift Clusters](https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-clusters.html). 

**Amazon Redshift Database Name**  
 Use this parameter to tell Amazon ML which database in the Amazon Redshift cluster contains the data that you want to use as your datasource. 

**Amazon Redshift Database Credentials**  
 Use these parameters to specify the user name and password of the Amazon Redshift database user in whose context the SQL query runs.  
Amazon ML requires an Amazon Redshift user name and password to connect to your Amazon Redshift database. After unloading the data to Amazon S3, Amazon ML never reuses or stores your password. 

**Amazon ML Amazon Redshift Role**  
 Use this parameter to specify the name of the IAM role that Amazon ML should use to configure the security groups for the Amazon Redshift cluster and the bucket policy for the Amazon S3 staging location.  
If you don't have an IAM role that can access Amazon Redshift, Amazon ML can create a role for you. When Amazon ML creates a role, it creates and attaches a customer managed policy to an IAM role. The policy that Amazon ML creates grants Amazon ML permission to access only the cluster that you specify.  
If you already have an IAM role to access Amazon Redshift, you can type the ARN of the role or choose the role from the drop-down list. IAM roles with Amazon Redshift access are listed at the top of the list.  
The IAM role must have the following trust policy, which allows Amazon ML to assume the role. Replace the example account ID (`123456789012`) and Region (`us-east-1`) with your own values.  

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "machinelearning.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": { "aws:SourceAccount": "123456789012" },
                "ArnLike": { "aws:SourceArn": "arn:aws:machinelearning:us-east-1:123456789012:datasource/*" }
            }
        }
    ]
}
```
For more information about Customer Managed Policies, see [Customer Managed Policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_managed-vs-inline.html#customer-managed-policies) in the *IAM User Guide*.
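Because the account ID and Region in the trust policy differ from one deployment to the next, you might prefer to generate the document rather than edit it by hand. The following sketch reproduces the trust policy shown above with both values as parameters; the function name is hypothetical.

```python
import json


def amazon_ml_trust_policy(account_id: str, region: str) -> dict:
    """Return a trust policy that lets the Amazon ML service principal assume
    the role, scoped to Amazon ML datasources in one account and Region."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "machinelearning.amazonaws.com"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    # Only requests made on behalf of this account...
                    "StringEquals": {"aws:SourceAccount": account_id},
                    # ...and only for Amazon ML datasources in this Region.
                    "ArnLike": {
                        "aws:SourceArn": (
                            f"arn:aws:machinelearning:{region}:{account_id}:datasource/*"
                        )
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    # Example values from the policy above; substitute your own.
    print(json.dumps(amazon_ml_trust_policy("123456789012", "us-east-1"), indent=4))
```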

**Amazon Redshift SQL Query**  
 Use this parameter to specify the SQL SELECT query that Amazon ML runs on your Amazon Redshift database to select your data. Amazon ML uses the Amazon Redshift [UNLOAD](https://docs.aws.amazon.com/redshift/latest/dg/t_Unloading_tables.html) command to securely copy the results of your query to an Amazon S3 location.  
 Amazon ML works best when input records are in random order (shuffled). You can shuffle the results of your Amazon Redshift SQL query by using the Amazon Redshift **random()** function. For example, suppose that this is the original query:  

```
SELECT col1, col2, … FROM training_table
```
 To shuffle the records, add an `ORDER BY random()` clause to the query:  

```
SELECT col1, col2, … FROM training_table ORDER BY random()
```
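If you build queries in a script, the rewrite above can be sketched as a small helper. This is an illustration only: the function name is hypothetical, and it assumes the input is a single SELECT statement that doesn't already end with an `ORDER BY` or `LIMIT` clause.

```python
def shuffle_query(query: str) -> str:
    """Append ORDER BY random() so that Amazon Redshift returns rows in random
    order. Assumes a single SELECT without a trailing ORDER BY or LIMIT."""
    # Remove surrounding whitespace and any trailing semicolons first,
    # so the new clause becomes part of the same statement.
    body = query.strip().rstrip(";").rstrip()
    return f"{body} ORDER BY random()"
```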

**Schema Location (Optional)**  
Use this parameter to specify the Amazon S3 path to your schema for the Amazon Redshift data that Amazon ML will export.  
If you don't provide a schema for your datasource, the Amazon ML console automatically creates an Amazon ML schema based on the data schema of the Amazon Redshift SQL query. Amazon ML schemas have fewer data types than Amazon Redshift schemas, so the conversion isn't one-to-one. For the scheme that the Amazon ML console uses to convert Amazon Redshift data types to Amazon ML data types, see the conversion table on the [AWS documentation website](http://docs.aws.amazon.com/machine-learning/latest/dg/redshift-parameters.html).  
To be converted to Amazon ML `Binary` data types, the values of the Amazon Redshift Booleans in your data must be supported Amazon ML Binary values. If your Boolean data type has unsupported values, Amazon ML converts them to the most specific data type it can. For example, if an Amazon Redshift Boolean has the values `0`, `1`, and `2`, Amazon ML converts the Boolean to a `Numeric` data type. For more information about supported binary values, see [Using the AttributeType Field](creating-a-data-schema-for-amazon-ml.md#assigning-data-types).  
If Amazon ML can't determine a data type, it defaults to `Text`.  
After Amazon ML converts the schema, you can review and correct the assigned Amazon ML data types in the Create Datasource wizard, and revise the schema before Amazon ML creates the datasource. 
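The Boolean rule above can be illustrated with a short sketch. This is not Amazon ML's inference code: it only distinguishes `BINARY`, `NUMERIC`, and the `TEXT` fallback, and the set of binary values is based on the list in [Using the AttributeType Field](creating-a-data-schema-for-amazon-ml.md#assigning-data-types), which you should check before relying on it.

```python
# Binary values assumed to be supported by Amazon ML (case-insensitive);
# confirm against "Using the AttributeType Field".
BINARY_VALUES = {"1", "y", "yes", "t", "true", "0", "n", "no", "f", "false"}


def _is_number(value: str) -> bool:
    """Return True if `value` parses as a floating-point number."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def guess_boolean_column_type(values) -> str:
    """Approximate the rule for an Amazon Redshift Boolean column: BINARY if
    every value is a supported binary value, NUMERIC if every value is a
    number, otherwise TEXT. Categorical detection is not modeled."""
    normalized = [str(v).strip().lower() for v in values]
    if all(v in BINARY_VALUES for v in normalized):
        return "BINARY"
    if all(_is_number(v) for v in normalized):
        return "NUMERIC"
    return "TEXT"
```

For example, a column with the values `0`, `1`, and `2` comes out as `NUMERIC`, matching the example in the text.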

**Amazon S3 Staging Location**  
 Use this parameter to specify the name of the Amazon S3 staging location where Amazon ML stores the results of the Amazon Redshift SQL query. After creating the datasource, Amazon ML uses the data in the staging location instead of returning to Amazon Redshift.  
Because Amazon ML assumes the IAM role that you specify as the Amazon ML Amazon Redshift role, Amazon ML has permission to access any object in the specified Amazon S3 staging location. For this reason, we recommend that you store only files that don't contain sensitive information in the staging location. For example, if your root bucket is `s3://mybucket/`, we suggest that you create a location to store only the files that you want Amazon ML to access, such as `s3://mybucket/AmazonMLInput/`. 
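To follow this recommendation consistently, you could build the staging location from a bucket name and a dedicated prefix and refuse the bucket root. The following sketch does that; the function name and the default `AmazonMLInput` prefix are illustrative assumptions.

```python
def staging_location(bucket: str, prefix: str = "AmazonMLInput") -> str:
    """Return an s3:// URI for a dedicated staging prefix inside `bucket`,
    for example s3://mybucket/AmazonMLInput/ instead of s3://mybucket/."""
    if bucket.startswith("s3://"):
        bucket = bucket[len("s3://"):]
    bucket = bucket.strip("/")
    prefix = prefix.strip("/")
    if not bucket:
        raise ValueError("A bucket name is required.")
    if not prefix:
        # Staging at the bucket root would give Amazon ML access to every object.
        raise ValueError("Use a dedicated prefix, not the bucket root.")
    # The trailing slash marks the location as a folder-style prefix.
    return f"s3://{bucket}/{prefix}/"
```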

# Creating a Datasource with Amazon Redshift Data (Console)


The Amazon ML console provides two ways to create a datasource from Amazon Redshift data. You can complete the Create Datasource wizard, or, if you already have a datasource created from Amazon Redshift data, you can copy the original datasource and modify its settings. Copying a datasource is a quick way to create multiple similar datasources. 

For information about creating a datasource using the API, see [CreateDataSourceFromRedshift](https://docs.aws.amazon.com/machine-learning/latest/APIReference/API_CreateDataSourceFromRedshift.html).
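As a rough guide to how the wizard settings map onto the API, the following sketch assembles the keyword arguments for `CreateDataSourceFromRedshift` as the boto3 `machinelearning` client names them. The helper function and every example value are hypothetical; check the API reference for the authoritative field list.

```python
from typing import Optional


def redshift_datasource_request(
    datasource_id: str,
    cluster_identifier: str,
    database_name: str,
    username: str,
    password: str,
    select_sql_query: str,
    role_arn: str,
    s3_staging_location: str,
    datasource_name: Optional[str] = None,
    data_schema_uri: Optional[str] = None,
    compute_statistics: bool = True,
) -> dict:
    """Build keyword arguments for create_data_source_from_redshift."""
    data_spec = {
        "DatabaseInformation": {
            "ClusterIdentifier": cluster_identifier,  # case-sensitive
            "DatabaseName": database_name,
        },
        "DatabaseCredentials": {"Username": username, "Password": password},
        "SelectSqlQuery": select_sql_query,
        "S3StagingLocation": s3_staging_location,
    }
    if data_schema_uri:
        # Optional: the S3 path of a schema file that you created yourself.
        data_spec["DataSchemaUri"] = data_schema_uri
    request = {
        "DataSourceId": datasource_id,
        "DataSpec": data_spec,
        "RoleARN": role_arn,
        # Must be True if the datasource will be used to train an ML model.
        "ComputeStatistics": compute_statistics,
    }
    if datasource_name:
        request["DataSourceName"] = datasource_name
    return request
```

You could then pass the dictionary to the client, for example `boto3.client("machinelearning", region_name="us-east-1").create_data_source_from_redshift(**request)`, using the Region that contains your cluster and staging bucket.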

For more information about the parameters in the following procedures, see [Required Parameters for the Create Datasource Wizard](redshift-parameters.md).

**Topics**
+ [Creating a Datasource (Console)](#create-redshift-datasource)
+ [Copying a Datasource (Console)](#copy-redshift-datasource)

## Creating a Datasource (Console)


To unload data from Amazon Redshift into an Amazon ML datasource, use the Create Datasource wizard. 

**To create a datasource from data in Amazon Redshift**

1. Open the Amazon Machine Learning console at [https://console.aws.amazon.com/machinelearning/](https://console.aws.amazon.com/machinelearning/).

1. On the Amazon ML dashboard, under **Entities**, choose **Create new...**, and then choose **Datasource**.

1. On the **Input data** page, choose **Amazon Redshift**. 

1. In the Create Datasource wizard, for **Cluster identifier**, type the name of your cluster. 

1. For **Database name**, type the name of the Amazon Redshift database. 

1. For **Database user name**, type your database username. 

1. For **Database password**, type your database password. 

1. For **IAM role**, choose your IAM role. If you don't already have one, choose **Create a new role**. Amazon ML creates an IAM Amazon Redshift role for you. 

1. To test your Amazon Redshift settings, choose **Test Access** (next to **IAM role**). If Amazon ML can't connect to Amazon Redshift with the provided settings, you can't continue creating a datasource. For troubleshooting help, see [Troubleshooting Errors](troubleshooting.md#trouble-errors).

1. For **SQL query**, type your SQL query. 

1. For **Schema location**, choose whether you want Amazon ML to create a schema for you. If you have created a schema yourself, type the Amazon S3 path to your schema file. 

1. For **Amazon S3 staging location**, type the Amazon S3 path to the bucket where you want Amazon ML to put the data it unloads from Amazon Redshift. 

1. (Optional) For **Datasource name**, type a name for your datasource.

1. Choose **Verify**. Amazon ML verifies that it can connect to your Amazon Redshift database.

1. On the **Schema** page, review the data types for all attributes and correct them, as necessary.

1. Choose **Continue**.

1. If you want to use this datasource to create or evaluate an ML model, for **Do you plan to use this dataset to create or evaluate an ML model?**, choose **Yes**, and then choose your target attribute. For information about targets, see [Using the targetAttributeName Field](creating-a-data-schema-for-amazon-ml.md#using-the-targetattributename-field).

   If you want to use this datasource to generate predictions with a model that you have already created, choose **No**. 

1. Choose **Continue**.

1. For **Does your data contain an identifier?**, if your data doesn't contain a row identifier, choose **No**.

   If your data does contain a row identifier, choose **Yes**, and select the attribute that you want to use as the row identifier. For information about row identifiers, see [Using the rowID Field](creating-a-data-schema-for-amazon-ml.md#using-the-rowid-field).

1. Choose **Review**.

1. On the **Review** page, review your settings, and then choose **Finish**.

After you have created a datasource, you can use it to [create an ML model](creating-ml-model-on-the-amazon-ml-console.md). If you have already created a model, you can use the datasource to [evaluate an ML model](evaluating_models.md) or [generate predictions](interpreting_predictions.md).

## Copying a Datasource (Console)


When you want to create a datasource that is similar to an existing datasource, you can use the Amazon ML console to copy the original datasource and modify its settings. For example, you might start with an existing datasource and then modify the data schema to match your data more closely, change the SQL query used to unload data from Amazon Redshift, or specify a different AWS Identity and Access Management (IAM) role or database user to access the Amazon Redshift cluster.

**To copy and modify an Amazon Redshift datasource**

1. Open the Amazon Machine Learning console at [https://console.aws.amazon.com/machinelearning/](https://console.aws.amazon.com/machinelearning/).

1. On the Amazon ML dashboard, under **Entities**, choose **Create new...**, and then choose **Datasource**.

1. On the **Input data** page, for **Where is your data?**, choose **Amazon Redshift**. If you already have a datasource created from Amazon Redshift data, you have the option of copying settings from another datasource.

   ![Amazon S3 and Amazon Redshift icons with option to copy settings from existing datasource.](http://docs.aws.amazon.com/machine-learning/latest/dg/images/infobar.png)

   If you don't already have a datasource created from Amazon Redshift data, this option doesn't appear.

1. Choose **Find a datasource**. 

1. Select the datasource that you want to copy, and choose **Copy settings**. Amazon ML auto-populates most of the datasource settings with settings from the original datasource. It doesn't copy the database password, schema location, or datasource name from the original datasource.

1. Modify any of the auto-populated settings that you want to change. For example, if you want to change the data that Amazon ML unloads from Amazon Redshift, change the SQL query.

1. For **Database password**, type your database password. Amazon ML doesn't store or reuse your password, so you must always provide it.

1. (Optional) For **Schema location**, Amazon ML preselects **I want Amazon ML to generate a recommended schema**. If you have already created a schema, choose **I want to use the schema that I created and stored in Amazon S3** and type the path to your schema file in Amazon S3. 

1. (Optional) For **Datasource name**, type a name for your datasource. Otherwise, Amazon ML generates a new datasource name for you.

1. Choose **Verify**. Amazon ML verifies that it can connect to your Amazon Redshift database.

1. (Optional) If Amazon ML inferred the schema for you, on the **Schema** page, review the data types for all attributes and correct them, as necessary.

1. Choose **Continue**.

1. If you want to use this datasource to create or evaluate an ML model, for **Do you plan to use this dataset to create or evaluate an ML model?**, choose **Yes**, and then choose your target attribute. For information about targets, see [Using the targetAttributeName Field](creating-a-data-schema-for-amazon-ml.md#using-the-targetattributename-field).

   If you want to use this datasource to generate predictions with a model that you have already created, choose **No**. 

1. Choose **Continue**.

1. For **Does your data contain an identifier?**, if your data doesn't contain a row identifier, choose **No**.

   If your data contains a row identifier, choose **Yes**, and select the attribute that you want to use as the row identifier. For information about row identifiers, see [Using the rowID Field](creating-a-data-schema-for-amazon-ml.md#using-the-rowid-field).

1. Choose **Review**.

1. Review your settings, and then choose **Finish**.

After you have created a datasource, you can use it to [create an ML model](creating-ml-model-on-the-amazon-ml-console.md). If you have already created a model, you can use the datasource to [evaluate an ML model](evaluating_models.md) or [generate predictions](interpreting_predictions.md).
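If you script datasource creation instead of using the console, you can apply the same copy-and-modify idea to a `CreateDataSourceFromRedshift` request dictionary. The following sketch is a hypothetical helper, not an Amazon ML feature: like **Copy settings**, it drops the datasource name and schema location, and it requires the password again because Amazon ML never reuses it.

```python
import copy
from typing import Optional


def copy_datasource_request(
    original: dict, new_id: str, password: str, new_query: Optional[str] = None
) -> dict:
    """Derive a new CreateDataSourceFromRedshift request from an existing one
    without modifying the original dictionary."""
    request = copy.deepcopy(original)  # leave the original request untouched
    request["DataSourceId"] = new_id
    # Not carried over: the datasource name and the schema location.
    request.pop("DataSourceName", None)
    spec = request["DataSpec"]
    spec.pop("DataSchemaUri", None)
    # The password must always be supplied again.
    spec["DatabaseCredentials"]["Password"] = password
    if new_query is not None:
        # For example, change which data is unloaded from Amazon Redshift.
        spec["SelectSqlQuery"] = new_query
    return request
```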

# Troubleshooting Amazon Redshift Issues


As you create your Amazon Redshift datasource, ML models, and evaluations, Amazon Machine Learning (Amazon ML) reports the status of your Amazon ML objects in the Amazon ML console. If Amazon ML returns error messages, use the following information and resources to troubleshoot the issues. 

For answers to general questions about Amazon ML, see the [Amazon Machine Learning FAQs](https://aws.amazon.com/machine-learning/faqs/). You can also search for answers and post questions in the [Amazon Machine Learning forum](https://forums.aws.amazon.com/forum.jspa?forumID=194). 



**Topics**
+ [Troubleshooting Errors](#trouble-errors)
+ [Contacting AWS Support](#contacting-support)

## Troubleshooting Errors


### The format of the role is invalid. Provide a valid IAM role. For example, arn:aws:iam::YourAccountID:role/YourRedshiftRole.


**Cause**

The format of the Amazon Resource Name (ARN) of your IAM role is incorrect. 

**Solution**

In the Create Datasource wizard, correct the ARN for your role. For information about formatting role ARNs, see [IAM ARNs](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_identifiers.html#identifiers-arns) in the *IAM User Guide*. The region is optional for IAM role ARNs.
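To catch format errors before you submit the wizard, you could check the ARN against the `arn:aws:iam::YourAccountID:role/YourRedshiftRole` shape from the error message. The following sketch is an approximation: the pattern accepts an empty or filled Region field (because the Region is optional here), requires a 12-digit account ID, and allows an optional role path. It covers only the `aws` partition.

```python
import re

# arn:aws:iam:<optional Region>:<12-digit account ID>:role/<optional path/><name>
ROLE_ARN_PATTERN = re.compile(
    r"arn:aws:iam:[a-z0-9-]*:(?P<account>\d{12}):role/"
    r"(?P<name>[\w+=,.@/-]*[\w+=,.@-])",
    re.ASCII,
)


def is_valid_role_arn(arn: str) -> bool:
    """Return True if `arn` looks like a well-formed IAM role ARN."""
    return ROLE_ARN_PATTERN.fullmatch(arn) is not None
```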

### The role is invalid. Amazon ML can't assume the <role ARN> IAM role. Provide a valid IAM role and make it accessible to Amazon ML.


**Cause**

Your role isn't set up to allow Amazon ML to assume it.

**Solution**

In the [IAM console](https://console.aws.amazon.com/iam/), edit your role so that its trust policy allows the Amazon ML service principal (`machinelearning.amazonaws.com`) to assume the role. 

### This <user ARN> user is not authorized to pass the <role ARN> IAM role.


**Cause**

Your IAM user doesn't have a permissions policy that allows it to pass a role to Amazon ML.

**Solution**

Attach a permissions policy to your IAM user that grants the `iam:PassRole` permission for the role that you want to pass to Amazon ML. You can attach a permissions policy to your IAM user in the [IAM console](https://console.aws.amazon.com/iam/).

### Passing an IAM role across accounts isn't allowed. The IAM role must belong to this account.


**Cause**

You can't pass a role that belongs to another AWS account.

**Solution**

Sign in to the AWS account that you used to create the role. You can see your IAM roles in your [IAM console](https://console.aws.amazon.com/iam/).
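To check which account a role belongs to, you can read the account ID field of its ARN (the fifth colon-separated field) and compare it with the account that you are signed in to. The following sketch does that; both function names are hypothetical.

```python
def role_account_id(role_arn: str) -> str:
    """Return the account ID field of an ARN such as
    arn:aws:iam::123456789012:role/MyRole."""
    # ARN layout: arn:partition:service:region:account-id:resource
    parts = role_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not an ARN: {role_arn!r}")
    return parts[4]


def role_is_in_account(role_arn: str, account_id: str) -> bool:
    """Return True if the role ARN belongs to `account_id`."""
    return role_account_id(role_arn) == account_id
```

If you use boto3, one way to get the ID of the account you are signed in to is `boto3.client("sts").get_caller_identity()["Account"]`.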

### The specified role doesn't have permissions to perform the operation. Provide a role that has a policy that provides Amazon ML the required permissions.


**Cause**

Your IAM role doesn't have the permissions to perform the requested operation. 

**Solution**

Edit the permission policy attached to your role in the [IAM console](https://console.aws.amazon.com/iam/) to provide the required permissions. 

### Amazon ML can't configure a security group on that Amazon Redshift cluster with the specified IAM role.


**Cause**

Your IAM role doesn't have the permissions required to configure an Amazon Redshift security group. 

**Solution**

Edit the permission policy attached to your role in the [IAM console](https://console.aws.amazon.com/iam/) to provide the required permissions.

### An error occurred when Amazon ML attempted to configure a security group on your cluster. Try again later.


**Cause**

When Amazon ML tried to connect to your Amazon Redshift cluster, it encountered an issue.

**Solution**

Verify that the IAM role that you provided in the Create Datasource wizard has all of the required permissions.

### The format of the cluster ID is invalid. Cluster IDs must begin with a letter and must contain only alphanumeric characters and hyphens. They can't contain two consecutive hyphens or end with a hyphen.


**Cause**

Your Amazon Redshift cluster ID format is incorrect. 

**Solution**

In the Create Datasource wizard, correct your cluster ID so that it contains only alphanumeric characters and hyphens and doesn't contain two consecutive hyphens or end with a hyphen.
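The format rules in this error message can be expressed as a single regular expression. The following sketch checks only those rules (a leading letter, letters, digits, and single hyphens, and no trailing hyphen); it doesn't check length limits or whether the cluster exists.

```python
import re

# A letter, then letters/digits, then any number of "-" + letters/digits groups.
# This forbids consecutive hyphens and a trailing hyphen.
CLUSTER_ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*")


def is_valid_cluster_id(cluster_id: str) -> bool:
    """Return True if `cluster_id` satisfies the format rules above."""
    return CLUSTER_ID_PATTERN.fullmatch(cluster_id) is not None
```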

### There is no <Amazon Redshift cluster name> cluster, or the cluster is not in the same region as your Amazon ML service. Specify a cluster in the same region as this Amazon ML.


**Cause**

Amazon ML can't find your Amazon Redshift cluster because it's not located in the region where you are creating an Amazon ML datasource.

**Solution**

Verify that your cluster exists on the Amazon Redshift console [Clusters](https://console.aws.amazon.com/redshift/home) page, that you are creating a datasource in the same region where your Amazon Redshift cluster is located, and that the cluster ID specified in the Create Datasource wizard is correct.

### Amazon ML can't read the data in your Amazon Redshift cluster. Provide the correct Amazon Redshift cluster ID.


**Cause**

Amazon ML can't read the data in the Amazon Redshift cluster that you specified.

**Solution**

In the Create Datasource wizard, specify the correct Amazon Redshift cluster ID. Also verify that you are creating the datasource in the same region as your Amazon Redshift cluster, and that your cluster is listed on the Amazon Redshift [Clusters](https://console.aws.amazon.com/redshift/home) page.

### The <Amazon Redshift cluster name> cluster isn't publicly accessible.


**Cause**

Amazon ML can't access your cluster because the cluster is not publicly accessible and does not have a public IP address.

**Solution**

Make the cluster publicly accessible and give it a public IP address. For information about making clusters publicly accessible, see [Modifying a Cluster](https://docs.aws.amazon.com/redshift/latest/mgmt/managing-clusters-console.html#modify-cluster) in the *Amazon Redshift Management Guide*.

### The <Redshift> cluster status isn't available to Amazon ML. Use the Amazon Redshift console to view and resolve this cluster status issue. The cluster status must be "available."


**Cause**

The status of your Amazon Redshift cluster isn't `available`, so Amazon ML can't use the cluster.

**Solution**

Make sure that your cluster is available. For information on checking the status of your cluster, see [Getting an Overview of Cluster Status](https://docs.aws.amazon.com/redshift/latest/mgmt/managing-clusters-console.html#status-cluster) in the *Amazon Redshift Management Guide*. For information on rebooting the cluster so that it is available, see [Rebooting a Cluster](https://docs.aws.amazon.com/redshift/latest/mgmt/managing-clusters-console.html#reboot-cluster) in the *Amazon Redshift Management Guide*.
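To check public accessibility and cluster status (the causes of this error and the previous one) before you run the wizard, you could inspect the cluster description returned by the Amazon Redshift `DescribeClusters` API. The following sketch works on one entry of the `Clusters` list in that response, using its `PubliclyAccessible` and `ClusterStatus` fields; the function name is hypothetical.

```python
def cluster_problems(cluster: dict) -> list:
    """Return readable problems for one entry of the Clusters list returned by
    DescribeClusters; an empty list means neither issue was found."""
    problems = []
    # Amazon ML requires a publicly accessible cluster with a public IP address.
    if not cluster.get("PubliclyAccessible", False):
        problems.append("cluster isn't publicly accessible")
    # Amazon ML can use the cluster only while its status is "available".
    status = cluster.get("ClusterStatus")
    if status != "available":
        problems.append(f"cluster status is {status!r}, not 'available'")
    return problems
```

For example, you might obtain `cluster` with `boto3.client("redshift").describe_clusters(ClusterIdentifier="examplecluster")["Clusters"][0]`, where `examplecluster` is a placeholder.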

### There is no <database name> database in this cluster. Verify that the database name is correct or specify another cluster and database.


**Cause**

Amazon ML can't find the specified database in the specified cluster.

**Solution**

Verify that the database name entered in the Create Datasource wizard is correct, or specify the correct cluster and database names. 

### Amazon ML couldn't access your database. Provide a valid password for database user <user name>.


**Cause**

The password you provided in the Create Datasource wizard to allow Amazon ML to access your Amazon Redshift database is incorrect.

**Solution**

Provide the correct password for your Amazon Redshift database user. 

### An error occurred when Amazon ML attempted to validate the query.


**Cause**

There's an issue with your SQL query.

**Solution**

Verify that your query is valid SQL.

### An error occurred when executing your SQL query. Verify the database name and the provided query. Root cause: {serverMessage}.


**Cause**

Amazon Redshift was unable to run your query.

**Solution**

Verify that you specified the correct database name in the Create Datasource wizard, and that your query is valid SQL. 

### An error occurred when executing your SQL query. Root cause: {serverMessage}.


**Cause**

Amazon Redshift was unable to find the specified table.

**Solution**

Verify that the table you specified in the Create Datasource wizard is present in your Amazon Redshift cluster database, and that you entered the correct cluster ID, database name, and SQL query.

## Contacting AWS Support


If you have AWS Premium Support, you can create a technical support case at the [AWS Support Center](https://console.aws.amazon.com/support/home#). 