

# Loading data into a Neptune Analytics graph
<a name="loading-data"></a>

Neptune Analytics provides several options for loading data into a graph, supporting both RDF (Resource Description Framework) and LPG (Labeled Property Graph) models.
+  **Bulk import**  –   Designed to handle large-scale data ingestion, and the fastest way to load large volumes of data. Bulk import runs a task to load data from files in [Amazon S3](bulk-import-create-from-s3.md). This option **must** be run on an empty graph, either at creation time using the [CreateGraphUsingImportTask](https://docs.aws.amazon.com//neptune-analytics/latest/apiref/API_CreateGraphUsingImportTask.html) action, or on an existing graph using the [StartImportTask](https://docs.aws.amazon.com//neptune-analytics/latest/apiref/API_StartImportTask.html) action. 
+  **Batch load**  –   Designed to handle incremental data ingestion into existing graphs using files in Amazon S3. This can be used to add more data or to update single-cardinality property values in existing graph data. The volume of data that can be ingested in a single request is lower than what bulk import can support. 
+  **openCypher queries**  –   Add more data through [queries](query.md) when the data is not available from files in Amazon S3, or the data volume is small. This is also a more general approach for conditional inserts based on data already in the graph, and for updating the contents of the graph. 

**Warning**  
 Be cautious when loading edge files. If the same edge file is loaded twice, duplicate edges are inserted into the graph, which can lead to unintended results.   
 Also, when using the SDK/CLI command execute-query to run `neptune.load()`, it is recommended to increase the timeout window and disable retries for the SDK/CLI.   
 For more information about increasing the timeout and disabling retries, see [ExecuteQuery](query-APIs-execute-query.md). 

**Topics**
+ [Data format for loading from Amazon S3 into Neptune Analytics](loading-data-formats.md)
+ [Batch load](batch-load.md)
+ [Bulk import data into a graph](bulk-import.md)
+ [neptune.read()](neptune-read.md)

# Data format for loading from Amazon S3 into Neptune Analytics
<a name="loading-data-formats"></a>

 Neptune Analytics, just like [Neptune Database](https://docs.aws.amazon.com//neptune/latest/userguide/bulk-load-tutorial-format.html), supports four formats for loading data: 
+  [RDF (ntriples)](https://docs.aws.amazon.com//neptune/latest/userguide/bulk-load-tutorial-format-rdf.html), which is a line-based format for triples. See [Using RDF data](using-rdf-data.md) for more information on how this data is handled. 
+  [csv](https://docs.aws.amazon.com//neptune/latest/userguide/bulk-load-tutorial-format-gremlin.html) and [opencypher](https://docs.aws.amazon.com//neptune/latest/userguide/bulk-load-tutorial-format-opencypher.html), which are csv-based formats with schema restrictions. A csv file must contain a header row followed by the column values. The remaining rows are interpreted according to the corresponding header columns. The header can contain predefined system column names and user-defined column names annotated with predefined datatypes and cardinality. 
+  [Parquet](https://docs.aws.amazon.com//neptune-analytics/latest/userguide/using-Parquet-data.html), which is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides high performance compression and encoding schemes to handle complex data in bulk. The data for each column in a Parquet file is stored together. 

 It's possible to combine CSV, RDF and Parquet data in the same graph, for example by first loading CSV data and enriching it with RDF data. 

# Using CSV data
<a name="using-CSV-data"></a>

 Neptune Analytics, like [Neptune Database](https://docs.aws.amazon.com//neptune/latest/userguide/bulk-load-tutorial-format.html), supports two csv formats for loading graph data: [csv](https://docs.aws.amazon.com//neptune/latest/userguide/bulk-load-tutorial-format-gremlin.html) and [opencypher](https://docs.aws.amazon.com//neptune/latest/userguide/bulk-load-tutorial-format-opencypher.html). Both are csv-based formats with a specified schema. A csv file must contain a header row followed by the column values. The remaining rows are interpreted according to the corresponding header columns. The header can contain predefined system column names and user-defined column names, annotated with predefined datatypes and cardinality. 

## Behavioral differences from Neptune csv (opencypher) format
<a name="using-CSV-data-differences"></a>

**Edge files**:
+  The `~id` (`:ID`) column in `edge` (`relationship`) files in `CSV` (`opencypher`) format is not supported. It is ignored if provided in any of the `edge` (`relationship`) files. 

**Vertex files**:
+  Only explicitly provided labels are associated with the vertices. If the label provided is empty, the vertex is added without a label. If a row contains just the vertex id, without any labels or properties, the row is ignored and no vertex is added. For more information about vertices, see [vertices](query-openCypher-data-model.md#query-openCypher-data-model-vertices). 

**Edge or vertex files**:
+  Unlike Neptune Database, Neptune Analytics allows a vertex identifier to appear only in edge files. You can load just the edge data from files in Amazon S3 and run an algorithm over the data without providing any additional vertex information. The edges are created between vertices with the given identifiers, and those vertices have no labels or properties unless any are provided in the vertex files. For more information on vertices and what they are, see [vertices](query-openCypher-data-model.md#query-openCypher-data-model-vertices). 
+  Unlike Neptune Database, Neptune Analytics doesn't convert the `Date` type into `Datetime` type. 

## Supported column types
<a name="using-CSV-data-supported-types"></a>

### Date and Datetime
<a name="using-CSV-data-date-datetime"></a>

 The `Date` column type is supported. The following date formats are supported: `yyyy-MM-dd`, `yyyy-MM-dd[+|-]hhmm`. To include time along with date, use the `Datetime` column type instead. 

 The datetime values can either be provided in the [XSD format](https://www.w3.org/TR/xmlschema-2/) or one of the following formats: 
+ `yyyy-MM-dd`
+ `yyyy-MM-ddTHH:mm`
+ `yyyy-MM-ddTHH:mm:ss`
+ `yyyy-MM-ddTHH:mm:ssZ`
+ `yyyy-MM-ddTHH:mm:ss.SSSZ`
+ `yyyy-MM-ddTHH:mm:ss[+|-]hhmm`
+ `yyyy-MM-ddTHH:mm:ss.SSS[+|-]hhmm`
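As a quick pre-flight check before a load, the layouts above can be approximated with `strptime` patterns. This is only an illustrative sketch: the pattern list below is an assumption that mirrors the documented formats, and the loader's XSD parsing accepts additional variants.

```python
from datetime import datetime

# strptime approximations of the supported Datetime layouts listed above.
DATETIME_PATTERNS = [
    "%Y-%m-%d",
    "%Y-%m-%dT%H:%M",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%dT%H:%M:%SZ",
    "%Y-%m-%dT%H:%M:%S.%fZ",
    "%Y-%m-%dT%H:%M:%S%z",      # [+|-]hhmm offset
    "%Y-%m-%dT%H:%M:%S.%f%z",   # fractional seconds plus offset
]

def is_valid_datetime(value: str) -> bool:
    """Return True if value matches one of the supported Datetime layouts."""
    for pattern in DATETIME_PATTERNS:
        try:
            datetime.strptime(value, pattern)
            return True
        except ValueError:
            continue
    return False
```

A check like this can catch malformed values in a dataset before a batch load fails on them.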

### Vector
<a name="using-CSV-data-vector"></a>

 A new column type `Vector` is supported for associating embeddings with vertices. Since Neptune Analytics only supports one index type at this time, the property name for embeddings is currently fixed to `embedding`. If the element type of the embeddings is not floating point (FP32), it is cast to FP32. The embeddings in the `csv` files are optional when the vector index is enabled, meaning that not every node needs to be associated with an embedding. If you want to set up a vector index for the graph, choose the `vector dimension`, which specifies the number of dimensions for the vectors in the index. Changes to vector embeddings are non-atomic and unisolated (see [Vector index transaction support](vector-index.md#vector-index-transaction-support)); that is, unlike other properties, they become durable and visible to other queries immediately upon write. 

**Important**  
 The `dimension` must match the dimension of the embeddings in the vertex files. 

 For more details on loading embeddings, see [vector-index](https://docs.aws.amazon.com//neptune-analytics/latest/userguide/vector-index.html). 

### Any type
<a name="using-CSV-data-any-type"></a>

 A column type `Any` is supported in the user columns. The `Any` type is "syntactic sugar" for all of the other supported types, and is useful when a user column contains values of multiple types. The payload of an `Any` type value is a semicolon-separated list of JSON strings, such as: `"{""value"": ""10"", ""type"": ""Int""};{""value"": ""1.0"", ""type"": ""Float""}"`, where each individual JSON string has a `value` field and a `type` field. The column header of an `Any` type is `propertyname:Any`. The cardinality of an `Any` column is `set`, meaning that the column can accept multiple values. 

 Neptune Analytics supports the following types in an `Any` type: `Bool` (or `Boolean`), `Byte`, `Short`, `Int`, `Long`, `UnsignedByte`, `UnsignedShort`, `UnsignedInt`, `UnsignedLong`, `Float`, `Double`, `Date`, `dateTime`, and `String`. 

**Any type limitations**
+  `Vector` type is not supported in `Any` type. 
+  Nested `Any` type is not supported. For example, `"{""value"": "{""value"": ""10"", ""type"": ""Int""}", ""type"": ""Any""}"`. 
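To make the doubled-quote escaping of the `Any` payload concrete, it can be produced with standard CSV machinery. The sketch below builds an `Any` cell and lets the csv writer apply the quote doubling; the column name `score:Any` is illustrative, not prescribed.

```python
import csv
import io
import json

def any_cell(values):
    """Build the payload of an `Any` column: semicolon-separated JSON
    strings, each carrying a "value" field and a "type" field."""
    return ";".join(json.dumps({"value": str(v), "type": t}) for v, t in values)

# Write a vertex row with an Any-typed user column; csv.writer quotes the
# cell and doubles the embedded quotes, yielding the format shown above.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["~id", "score:Any"])
writer.writerow(["v1", any_cell([(10, "Int"), (1.0, "Float")])])
print(buf.getvalue())
```

The second row renders exactly as the payload example above, with the outer quotes and doubled inner quotes supplied by the CSV quoting rules rather than by hand.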

## Limitations and unsupported features
<a name="using-CSV-data-limitations"></a>
+  Multi-line string values are not supported. Import behavior is undefined if the dataset contains multi-line string values. 
+  Quoted string values must not have a leading space between the delimiter and the quotes. For example, the line `abc, "def"` is interpreted as a line with two fields whose string values are `abc` and ` "def"`: because of the leading space, the second field is treated as a non-quoted string, and the quotes are stored as-is in the value, giving it a size of 6 characters. The line `abc,"def"` is interpreted as a line with two fields whose string values are `abc` and `def`. 
+  `Gzip` files are not supported. 
+  Float and double values in scientific notation are currently not supported. However, `Infinity`, `INF`, `-Infinity`, `-INF`, and `NaN` (`Not-a-number`) values are supported. 
+  The maximum length of the strings supported is limited to 1,048,062 bytes. The limit is lower for strings with unicode characters since some unicode characters are represented using multiple bytes. 
+  The `allowEmptyStrings` parameter is not supported. Empty string values ("") are not treated as null or missing value, and are stored as a property value. 
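The leading-space rule above can be observed with Python's `csv` module, which happens to treat the two example lines the same way. This is an illustration of the distinction, not the loader itself.

```python
import csv
import io

# With a space after the delimiter, the quotes become part of the value.
row_with_space = next(csv.reader(io.StringIO('abc, "def"')))
# Without the space, the field is parsed as a properly quoted string.
row_quoted = next(csv.reader(io.StringIO('abc,"def"')))

print(row_with_space)  # ['abc', ' "def"']
print(row_quoted)      # ['abc', 'def']
```

Note that the mis-parsed value keeps both the leading space and the quotes, which is why its size is 6 characters.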

# Using Parquet data
<a name="using-Parquet-data"></a>

 Neptune Analytics supports importing data using the Parquet format. A Parquet file must contain a header row and the column values. The remaining rows are interpreted according to the corresponding header columns. The header should contain predefined system column names and/or user-defined column names. Aside from the header row and column values, a Parquet file also has metadata, stored inline in the file, which is used when reading and decoding the data. 

**Note**  
 Compression for Parquet format is not supported at this time. 

## System column headers
<a name="using-Parquet-data-system-column-headers"></a>

 The required and allowed system column headers are different for vertex files and edge files. Each system column can appear only once in a header. All labels are case sensitive. 

**Note**  
 The `~id` column in `edge` (`relationship`) files in `Parquet` format is not supported. It is ignored if provided in any of the `edge` (`relationship`) files. 

**Vertex headers**
+  `~id` - Required. An `id` for the vertex. 
+  `~label` - Optional. List of labels for the vertex. Each label is a string. Multiple labels can either be semicolon (`;`) separated, or a list of strings. 

**Edge headers**
+  `~from` - Required. The vertex `id` of the **from** vertex. 
+  `~to` - Required. The vertex `id` of the **to** vertex. 
+  `~label` - Optional. A label for the edge. The label is a string value. 

## Property column headers
<a name="using-Parquet-data-property-column-headers"></a>

 Unlike the property column headers of the CSV format, the property column headers of the Parquet format only need to contain the property names; type names and cardinality are not required. 

 There are, however, some special column types in the Parquet format that require annotation in the metadata, including the `Any`, `Date`, and `dateTime` types. For more details on these types, see [using CSV data](https://docs.aws.amazon.com//neptune-analytics/latest/userguide/using-CSV-data.html). The following object is an example of metadata with an `Any` type column, a `Date` type column, and a `dateTime` type column annotated: 

```
"metadata": {
    "anyTypeColumns": ["UserCol1"],
    "dateTypeColumns": ["UserCol2"],
    "dateTimeTypeColumns": ["UserCol3"]
}
```

**Note**  
 Space, comma, carriage return and newline characters are not allowed in the column headers, so property names cannot include these characters. 

**Warning**  
 Without the annotation in the metadata for the special column types, the values of these special columns will be stored as strings instead of the intended types. 

# Using RDF data
<a name="using-rdf-data"></a>

 Neptune Analytics supports importing RDF data using the n-triples format. The handling of RDF values is described below, including how RDF data is interpreted as LPG concepts and can be queried using openCypher. 

## Handling of RDF values
<a name="rdf-handling"></a>

 The handling of RDF-specific values that don't have a direct equivalent in LPG is described here. 

### IRIs
<a name="rdf-handling-iri"></a>

 Values of type IRI, like `<http://example.com/Alice>` , are stored as such. IRIs and Strings are distinct data types. 

 Calling the openCypher function `TOSTRING()` on an IRI returns a string containing the IRI wrapped inside `<>`. For example, if `x` is the IRI `<http://example.com/Alice>`, then `TOSTRING(x)` returns `"<http://example.com/Alice>"`. When serializing openCypher query results in JSON format, IRI values are included as strings in this same format. 

### Language-tagged literals
<a name="rdf-handling-language-tagged-literals"></a>

 Values like `"Hallo"@de` are treated as follows: 
+  When used as input for openCypher string functions, like `trim()`, a language-tagged string is treated as a simple string; so `trim("Hallo"@de)` is equivalent to `trim("Hallo")`. 
+  When used in comparison operations, like `x = y` or `x <> y` or `x < y` or `ORDER BY`, a language-tagged literal is “greater than” (and thus “not equal to”) the corresponding simple string: `"Hallo" < "Hallo"@de`. 

 Calling a function, such as `TOSTRING()` on a language-tagged literal, returns that literal as a string without language tag. For example, if `x` is the value `"Hallo"@de`, then `TOSTRING(x)` returns `"Hallo"`. When serializing openCypher query results in JSON format, language-tagged literals are also serialized as strings without an associated language tag. 

### Blank nodes
<a name="rdf-handling-blank-nodes"></a>

 Blank nodes in n-triples data files are replaced with globally unique IRIs at import time. 

 Loading RDF datasets that contain blank nodes is supported, but those blank nodes are represented as IRIs in the graph. When loading ntriples files, the parameter `blankNodeHandling` must be specified with the value `convertToIri`. 

 The generated IRI for a blank node has the format: `<http://aws.amazon.com/neptune/vocab/v01/BNode/scope#id>` 

 In these IRIs, `scope` is a unique identifier for the blank node scope, and `id` is the blank node identifier in the file. For example, for a blank node `_:b123`, the generated IRI could be `<http://aws.amazon.com/neptune/vocab/v01/BNode/737c0b5386448f78#b123>`. 

 The **blank node scope** (e.g. 737c0b5386448f78) is generated by Neptune Analytics and designates one file within one load operation. This means that when two different ntriples files reference the same blank node identifier, like `_:b123`, there will be two IRIs generated, namely one for each file. All references to `_:b123` within the first file will end up as references to the first IRI, like `<http://aws.amazon.com/neptune/vocab/v01/BNode/1001#b123>`, and all references within the second file will end up referring to another IRI, like `<http://aws.amazon.com/neptune/vocab/v01/BNode/1002#b123>`. 
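The per-file mapping can be sketched as follows. The scope values are generated internally by Neptune Analytics and are opaque, so the function below only illustrates the IRI format, not the actual scope derivation.

```python
def blank_node_iri(scope: str, bnode_id: str) -> str:
    """Illustrative: format of the IRI generated for a blank node, where
    `scope` identifies one file within one load operation and `bnode_id`
    is the label after the `_:` prefix."""
    return f"<http://aws.amazon.com/neptune/vocab/v01/BNode/{scope}#{bnode_id}>"

# The same _:b123 in two different files maps to two distinct IRIs.
print(blank_node_iri("1001", "b123"))  # <http://aws.amazon.com/neptune/vocab/v01/BNode/1001#b123>
print(blank_node_iri("1002", "b123"))  # <http://aws.amazon.com/neptune/vocab/v01/BNode/1002#b123>
```

Because the scope differs per file, references to the same blank node label in different files never collide.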

# Referencing IRIs in queries
<a name="rdf-handling-iri-ref"></a>

 There are two ways to reference an IRI in an openCypher query: 
+  Wrap the full IRI inside `<` and `>`. Depending on where in the query this IRI is referenced, the IRI is then provided as a String, such as `"<http://example.com/Alice>"` (when the IRI is the value of property `~id`), or wrapped in backticks, such as `` `<http://example.com/Alice>` `` (when the IRI is a label or property key). 

  ```
  CREATE (:`<http://xmlns.com/foaf/0.1/Person>` {`~id`: "<http://example.com/Alice>"})
  ```
+  Define a PREFIX at the start of the query, and inside the query reference an IRI using `prefix::suffix`. For example, after `PREFIX ex: <http://example.com/>` the reference `ex::Alice` also references the full IRI `<http://example.com/Alice>`. 

  ```
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  PREFIX ex: <http://example.com/>
  CREATE (: foaf::Person {`~id`: ex::Alice})
  ```

 Additional query examples below show the use of both full IRIs and the prefix syntax. 
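As a mental model for the PREFIX mechanism, the expansion of `prefix::suffix` references can be sketched like this. It is a simplified helper, not the engine's parser.

```python
def expand_iri(prefixes: dict, ref: str) -> str:
    """Expand a prefix::suffix reference to its full <IRI> form, the way
    a PREFIX declaration resolves inside a query (simplified sketch)."""
    prefix, sep, suffix = ref.partition("::")
    if not sep or prefix not in prefixes:
        return ref  # not a prefixed reference; leave unchanged
    base = prefixes[prefix].strip("<>")
    return f"<{base}{suffix}>"

prefixes = {"foaf": "<http://xmlns.com/foaf/0.1/>", "ex": "<http://example.com/>"}
print(expand_iri(prefixes, "ex::Alice"))     # <http://example.com/Alice>
print(expand_iri(prefixes, "foaf::Person"))  # <http://xmlns.com/foaf/0.1/Person>
```

The expansion is plain string concatenation of the prefix IRI and the suffix, which is why `ex::Alice` and the literal `<http://example.com/Alice>` are interchangeable in the examples that follow.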

# Mapping RDF triples to LPG concepts
<a name="rdf-mapping-triples"></a>

 There are three rules that define how RDF triples correspond to LPG concepts: 

```
Case     RDF triple                   ⇆    LPG concept
-----------------------------------------------------------------
Case #1  { <iri> rdf:type <iri> }     ⇆    vertex with id + label
Case #2  { <iri> <iri> "literal"}     ⇆    vertex property
Case #3  { <iri> <iri> <iri> }        ⇆    edge with label
```

**Case #1: Vertex with id and label**

 A triple like: 

```
<http://example.com/Alice> rdf:type <http://xmlns.com/foaf/0.1/Person>
```

 is equivalent to creating the vertex in openCypher like: 

```
CREATE (:`<http://xmlns.com/foaf/0.1/Person>` {`~id`: "<http://example.com/Alice>"})
```

 In this example, the vertex label `<http://xmlns.com/foaf/0.1/Person>` is interpreted and stored as an IRI. 

**Note**  
 The backtick syntax (`` ` ``) is part of openCypher and allows inserting characters that normally cannot be used in labels. Using this mechanism, it's possible to include complete IRIs in a query. 

 Using `PREFIX`, the same `CREATE` query could look like: 

```
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex: <http://example.com/>
CREATE (: foaf::Person {`~id`: ex::Alice})
```

 To match the newly created vertex based on its id: 

```
MATCH (v {`~id`: "<http://example.com/Alice>"}) RETURN v
```

 or equivalently: 

```
PREFIX ex: <http://example.com/>
MATCH (v {`~id`: ex::Alice}) RETURN v
```

 To find vertices with that RDF Class/LPG Label: 

```
MATCH (v:`<http://xmlns.com/foaf/0.1/Person>`) RETURN v
```

 or equivalently: 

```
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
MATCH (v : foaf::Person) RETURN v
```

**Case #2: Vertex property**

 A triple like: 

```
<http://example.com/Alice> <http://xmlns.com/foaf/0.1/name> "Alice Smith"
```

 is equivalent to defining, with openCypher, a node with a given `~id` and property, where both the `~id` and the property key are IRIs: 

```
CREATE ({`~id`: "<http://example.com/Alice>",
        `<http://xmlns.com/foaf/0.1/name>`: "Alice Smith" })
```

 or equivalently: 

```
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex: <http://example.com/>
CREATE ({`~id`: ex::Alice, foaf::name: "Alice Smith" })
```

 To match the vertex with that property: 

```
MATCH (v {`<http://xmlns.com/foaf/0.1/name>`: "Alice Smith"}) RETURN v
```

 or equivalently: 

```
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
MATCH (v { foaf::name : "Alice Smith"}) RETURN v
```

**Case #3: Edge**

 A triple like: 

```
<http://example.com/Alice> <http://example.com/knows> <http://example.com/Bob>
```

 is equivalent to defining, with openCypher, an edge like this, where the edge label and vertex ids are all IRIs: 

```
CREATE ({`~id`: "<http://example.com/Alice>"})
          -[:`<http://example.com/knows>`]->({`~id`: "<http://example.com/Bob>"})
```

 or equivalently: 

```
PREFIX ex: <http://example.com/>
CREATE ({`~id`: ex::Alice })-[: ex::knows ]->({`~id`: ex::Bob })
```

 To match the edges with that label: 

```
MATCH (v)-[:`<http://example.com/knows>`]->(w) RETURN v, w
```

 or equivalently: 

```
PREFIX ex: <http://example.com/>
MATCH (v)-[: ex::knows ]->(w) RETURN v, w
```

## Query Examples
<a name="rdf-query-examples"></a>

**Matching language-tagged literals**

 If this triple was loaded from a dataset: 

```
<http://example.com/German> <http://example.com/greeting> "Hallo"@de
```

 then it will **not** be matched by this query: 

```
MATCH (n) WHERE n.`<http://example.com/greeting>` = "Hallo"
```

 because the language-tagged literal `"Hallo"@de` and the string “Hallo” are not equal. For more information, see [Language-tagged literals](using-rdf-data.md#rdf-handling-language-tagged-literals). The query can use `TOSTRING()` in order to find the match: 

```
MATCH (n) WHERE TOSTRING(n.`<http://example.com/greeting>`) = "Hallo"
```

# Batch load
<a name="batch-load"></a>

 Neptune Analytics supports a `CALL` procedure `neptune.load` to load data from Amazon S3, to insert new vertices, edges, and properties, or to update single-cardinality vertex property values. It executes as a mutation query and performs atomic writes. It uses the IAM credentials of the caller to access the data in Amazon S3. See [Create your IAM role for Amazon S3 access](bulk-import-create-from-s3.md#create-iam-role-for-s3-access) to set up the permissions. 

## Request syntax
<a name="batch-load-request"></a>

 The signature of the `CALL` procedure is shown below: 

```
CALL neptune.load(
  {
    source: "string",
    region: "us-east-1",
    format: "csv",
    failOnError: true,
    concurrency: 1
  }
)
```
+  **source** (required) – An Amazon S3 URI prefix. All object names with matching prefixes are loaded. See [ Neptune Database loader reference](https://docs.aws.amazon.com//neptune/latest/userguide/load-api-reference-load.html#load-api-reference-load-parameters) for Amazon S3 URI prefix examples. The IAM user who signs the openCypher request must have permissions to list and download these objects, and must be authorized for `WriteDataViaQuery` and `DeleteDataViaQuery` actions. See [ IAM role mapping](https://docs.aws.amazon.com//neptune-analytics/latest/userguide/query-APIs-IAM-role-mappings.html) for more IAM authentication related details. 
+  **region** (required) – The AWS region where the Amazon S3 bucket is hosted. Currently, cross-region loads are not supported. 
+  **format** (required) – The data format of the Amazon S3 data to be loaded, valid options are `csv`, `opencypher`, `ntriples` or `parquet`. For more information, see [Data format for loading from Amazon S3 into Neptune Analytics](loading-data-formats.md). 
+  **ParquetType** (required if the format is `parquet`) - The data type of the Parquet format, with the only valid option being `columnar`. For more information, see [Using Parquet data](using-Parquet-data.md). 
+  **blankNodeHandling** (required when the format is `ntriples`) – The method to handle blank nodes in the dataset. Currently, only `convertToIri` is supported, meaning blank nodes are converted to unique IRIs at load time. For more information, see [Handling RDF values](https://docs.aws.amazon.com//neptune-analytics/latest/userguide/using-rdf-data.html#rdf-handling). 
+  **failOnError** (optional) default: true – If set to `true` (the default), the load process halts whenever there is an error parsing or inserting data. If set to `false`, the load process continues and commits whatever data was successfully inserted. 

   The edge or relationship data should be loaded with `failOnError` set to `true`, to avoid duplication of partially committed edges or relationships in subsequent loads. 
+  **concurrency** (optional) default: 1 – This value controls the number of threads used to run the load process, up to the maximum available. 

**Note**  
 Unlike bulk import, there is no need to pass the `role-arn` for batch load, since the IAM credentials of the signer of the openCypher query are used to download data from Amazon S3. The signer's role must have permissions to download the data from Amazon S3, and a trust relationship that allows Neptune Analytics to assume the role, so that Neptune Analytics can load the data into the graph from files in Amazon S3. 

## Response syntax
<a name="batch-load-response"></a>

 A sample response is shown below. 

```
{
    "results": [
        {        
            "totalRecords": 108070,   
            "totalDuplicates": 46521,
            "totalTimeSpentMillis": 558,
            "numThreads": 16,
            "insertErrors": 0,
            "throughputRecordsPerSec": 193673,
            "loadId": "13a60c3b-754d-c49b-4c23-06b9dd5b346b"
        }
    ]
}
```
+  `totalRecords`: The number of graph elements - vertex labels, edges, and properties - attempted for insertion. 
+  `totalDuplicates`: The count of duplicate graph elements - vertex labels or properties - encountered. These elements may have existed before the load request, or may be duplicates within the input CSV files. Each edge is treated as new, so edges are excluded from this count. 
+  `totalTimeSpentMillis`: The total time taken for downloading, parsing, and inserting data from CSV files, excluding the request queue time. 
+  `numThreads`: The number of threads utilized for downloading and inserting data. This correlates with the provided `concurrency` parameter input, reflecting any caps applied. 
+  `insertErrors`: Errors encountered during insertions, including parsing errors and Amazon S3 access issues. Error details are available in the CloudWatch logs; see the [Troubleshooting](bulk-import-troubleshooting.md) section of this document for help understanding insert errors. Concurrent modification errors may also cause insert errors, when a batch load attempts to modify a vertex property value that is being concurrently changed by another request. 
+  `throughputRecordsPerSec`: The total throughput in records per second. 
+  `loadId`: The loadId for searching errors and load summary. All batch information is published to CloudWatch logs under `/aws/neptune/import-task-logs/<graph-id>/<load-id>`. 

**Note**  
 Around 2.5 GB of Amazon S3 files can be loaded in a single request on 128 m-NCU. Larger datasets could run into `out of memory` errors. To work around that, the Amazon S3 files can be split across multiple serial batch load requests. The source argument takes a prefix, so files can be partitioned across requests by including prefixes of the file names. The limit scales linearly with m-NCUs, so for example 5 GB of Amazon S3 files can be loaded in a single request on 256 m-NCU. Also, if the dataset contains larger string values, for example, then larger volumes of data can be ingested in a single request, since such data generates fewer graph elements per byte of dataset. It is recommended to run tests with your data to determine the exact details for this process. 
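The linear scaling described in this note can be expressed as a simple rule of thumb. This is approximate guidance only; actual capacity depends on the shape of your data, so treat the result as a starting point for testing rather than a hard limit.

```python
def approx_batch_load_limit_gb(m_ncu: int, gb_per_128_mncu: float = 2.5) -> float:
    """Rough per-request batch load capacity, scaling linearly with
    provisioned m-NCUs (based on the ~2.5 GB per 128 m-NCU guideline)."""
    return gb_per_128_mncu * (m_ncu / 128)

print(approx_batch_load_limit_gb(128))  # 2.5
print(approx_batch_load_limit_gb(256))  # 5.0
```

If your dataset exceeds this estimate, split the Amazon S3 files into prefix-partitioned groups and load them in serial batch requests.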

**Important**  
 Duplicate edges are created if the same edge file content is loaded more than once. This could happen if, for example:   
+  The same Amazon S3 source or file is accidentally included for load in more than one request that succeeded. 
+  The edge data is first loaded with `failOnError` set to `false` and runs into partial errors; the errors are then fixed and the entire dataset is reloaded. All of the edges that were successfully inserted on the first request would get duplicated after the second request. 

# Bulk import data into a graph
<a name="bulk-import"></a>

 The task system in Neptune Analytics provides a powerful and flexible way to bulk import data into your graph. The `import` task is specifically designed to handle large-scale data ingestion from various data [formats](https://docs.aws.amazon.com//neptune-analytics/latest/userguide/loading-data-formats.html). 

 To initiate a bulk data import, you would first create an import task by specifying the data source, the target graph, and any necessary configuration options. This can be done through the AWS console or programmatically via the API. 

 Throughout the import process, you can monitor the progress of the import task through the user interface or via API calls. Progress reports and any errors or warnings are accessible in your CloudWatch account, allowing for close monitoring and [troubleshooting](bulk-import-troubleshooting.md) if needed. 

 Importing data through an import task is supported in two ways: 
+  During graph creation: [Create a graph from Amazon S3, a Neptune cluster, or a snapshot](bulk-import-into-a-graph.md) 
+  On an existing empty graph: [Bulk import data into an existing Neptune Analytics graph](loading-data-existing-graph.md) 

# Create a graph from Amazon S3, a Neptune cluster, or a snapshot
<a name="bulk-import-into-a-graph"></a>

 You can create a Neptune Analytics graph directly from Amazon S3 or from Neptune using the [CreateGraphUsingImportTask](https://docs.aws.amazon.com//neptune-analytics/latest/apiref/API_CreateGraphUsingImportTask.html) API. This is recommended for importing large graphs from files in Amazon S3 (>50GB of data), importing from existing Neptune clusters, or importing from existing Neptune snapshots. This API automatically analyzes the data, provisions a new graph based on the analysis, and imports data as one atomic operation using maximum available resources. 

**Note**  
 The graph is made available for querying only after the data loading is completed successfully. 

 If errors are encountered during the import process, Neptune Analytics will automatically roll back the provisioned resources, and perform the cleanup. No manual cleanup actions are needed. Error details are available in the CloudWatch logs. See [troubleshooting](bulk-import-troubleshooting.md) for more details. 

**Topics**
+ [Creating a Neptune Analytics graph from Amazon S3](bulk-import-create-from-s3.md)
+ [Creating a Neptune Analytics graph from Neptune cluster or snapshot](bulk-import-create-from-neptune.md)

# Creating a Neptune Analytics graph from Amazon S3
<a name="bulk-import-create-from-s3"></a>

 Neptune Analytics supports bulk importing of CSV, ntriples, and Parquet data directly from Amazon S3 into a Neptune Analytics graph using the `CreateGraphUsingImportTask` API. The data formats supported are listed in [Data format for loading from Amazon S3 into Neptune Analytics](loading-data-formats.md). It is recommended that you try the batch load process with a subset of your data first to validate that it is correctly formatted. Once you have validated that your data files are fully compatible with Neptune Analytics, you can prepare your full dataset and perform the bulk import using the steps below. 

 A quick summary of steps needed to import a graph from Amazon S3: 
+  [Copy the data files to an Amazon S3 bucket](#create-bucket-copy-data): Copy the data files to an Amazon Simple Storage Service bucket in the same region where you want the Neptune Analytics graph to be created. See [Data format for loading from Amazon S3 into Neptune Analytics](loading-data-formats.md) for the details of the format when loading data from Amazon S3 into Neptune Analytics. 
+  [Create your IAM role for Amazon S3 access](#create-iam-role-for-s3-access): Create an IAM role with `read` and `list` access to the bucket and a trust relationship that allows Neptune Analytics graphs to use your IAM role for importing. 
+  Use the `CreateGraphUsingImportTask` API to import from Amazon S3: Create a graph using the `CreateGraphUsingImportTask` API. This will generate a `taskId` for the operation. 
+  Use the `GetImportTask` API to get the details of the import task. The response indicates the status of the task (for example, INITIALIZING, ANALYZING_DATA, or IMPORTING). 
+  Once the task has completed successfully, you will see a `COMPLETED` status for the import task and also the `graphId` for the newly created graph. 
+  Use the `GetGraphs` API to fetch all the details about your new graph, including the ARN, endpoint, etc. 
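
The steps above can be sketched as a single CLI sequence. This is a minimal sketch; the graph name, bucket path, role ARN, and identifiers below are placeholders for your own values:

```
# 1. Start the import; the response includes a taskId.
aws neptune-graph create-graph-using-import-task \
  --graph-name 'my-graph' \
  --source "s3://my-bucket/my-dataset/" \
  --role-arn "arn:aws:iam::<account-id>:role/<role-name>" \
  --format CSV

# 2. Poll the task until it reaches a terminal status; on success the
#    response also contains the graphId of the newly created graph.
aws neptune-graph get-import-task --task-id <task-id>

# 3. Fetch the details of the new graph (ARN, endpoint, etc.).
aws neptune-graph get-graph --graph-identifier <graph-id>
```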

**Note**  
 If you're creating a private graph endpoint, the following permissions are required:   
ec2:CreateVpcEndpoint
ec2:DescribeAvailabilityZones
ec2:DescribeSecurityGroups
ec2:DescribeSubnets
ec2:DescribeVpcAttribute
ec2:DescribeVpcEndpoints
ec2:DescribeVpcs
ec2:ModifyVpcEndpoint
route53:AssociateVPCWithHostedZone
 For more information about required permissions, see [ Actions defined by Neptune Analytics](https://docs.aws.amazon.com//service-authorization/latest/reference/list_amazonneptuneanalytics.html#amazonneptuneanalytics-actions-as-permissions). 

## Copy the data files to an Amazon S3 bucket
<a name="create-bucket-copy-data"></a>

 The Amazon S3 bucket must be in the same AWS Region as the graph that loads the data. You can use the following AWS CLI command to copy the files to the bucket. 

```
aws s3 cp data-file-name s3://bucket-name/object-key-name
```

**Note**  
 In Amazon S3, an object key name is the entire path of a file, including the file name.   
 In the command   

```
aws s3 cp datafile.txt s3://examplebucket/mydirectory/datafile.txt
```
 the object key name is `mydirectory/datafile.txt`. 

 You can also use the AWS Management Console to upload files to the Amazon S3 bucket. Open the Amazon S3 [console](https://console.aws.amazon.com/s3/), and choose a bucket. In the upper-left corner, choose **Upload** to upload files. 
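
To upload an entire directory of data files under a common Amazon S3 prefix (so a single import can load all of them), you can use `aws s3 cp` with `--recursive`, or `aws s3 sync`. The bucket and directory names below are placeholders:

```
# Copy every file in the local directory to the same S3 prefix
aws s3 cp ./my-dataset/ s3://examplebucket/my-dataset/ --recursive

# Or keep the S3 prefix in sync with the local directory across re-runs
aws s3 sync ./my-dataset/ s3://examplebucket/my-dataset/
```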

## Create your IAM role for Amazon S3 access
<a name="create-iam-role-for-s3-access"></a>

 Create an IAM role with permissions to `read` and `list` the contents of your bucket. Add a trust relationship that allows Neptune Analytics to assume this role for the import task. You can do this using the AWS console, or through the CLI/SDK. 

1.  Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/). Choose **Roles**, and then choose **Create Role**. 

1.  Provide a role name. 

1.  Choose **Amazon S3** as the AWS service. 

1.  In the **permissions** section, choose `AmazonS3ReadOnlyAccess`. 
**Note**  
 This policy grants `s3:Get*` and `s3:List*` permissions to all buckets. Later steps restrict access to the role using the trust policy. The loader only requires `s3:Get*` and `s3:List*` permissions to the bucket you are loading from, so you can also restrict these permissions by the Amazon S3 resource. If your Amazon S3 bucket is encrypted, you also need to add `kms:Decrypt` permissions; this is required for data exported from Neptune Database. 

1.  On the **Trust Relationships** tab, choose **Edit trust relationship**, and paste the following trust policy. Choose **Save** to save the trust relationship. 

------
#### [ JSON ]

****  

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": [
                       "neptune-graph.amazonaws.com"
                   ]
               },
               "Action": "sts:AssumeRole"
           }
       ]
   }
   ```

------

Your IAM role is now ready for import.
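
If you prefer the CLI over the console, the same role can be created in two commands. This is a sketch that assumes the trust policy above has been saved locally as `trust-policy.json`; the role name is a placeholder:

```
# Create the role with the Neptune Analytics trust policy
aws iam create-role \
  --role-name NeptuneAnalyticsImportRole \
  --assume-role-policy-document file://trust-policy.json

# Attach the S3 read-only managed policy
aws iam attach-role-policy \
  --role-name NeptuneAnalyticsImportRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
```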

## Use the CreateGraphUsingImportTask API to import from Amazon S3
<a name="use-createGraphUsingImportTask-to-import"></a>

 You can perform this operation from the Neptune console as well as from the AWS CLI/SDK. For more information on the different parameters, see [https://docs.aws.amazon.com//neptune-analytics/latest/apiref/API_CreateGraphUsingImportTask.html](https://docs.aws.amazon.com//neptune-analytics/latest/apiref/API_CreateGraphUsingImportTask.html). 

**Via CLI/SDK**

```
aws neptune-graph create-graph-using-import-task \
  --graph-name <name> \
  --format <format> \
  --source <s3 path> \
  --role-arn <role arn> \
  [--blank-node-handling convertToIri] \
  [--fail-on-error | --no-fail-on-error] \
  [--deletion-protection | --no-deletion-protection] \
  [--public-connectivity | --no-public-connectivity] \
  [--min-provisioned-memory <value>] \
  [--max-provisioned-memory <value>] \
  [--vector-search-configuration <value>]
```
+  **Different Minimum and Maximum Provisioned Memory**: When the `--min-provisioned-memory` and `--max-provisioned-memory` values are specified differently, the graph is created with the maximum provisioned memory specified by `--max-provisioned-memory`. 
+  **Single Provisioned Memory Value**: When only one of `--min-provisioned-memory` or `--max-provisioned-memory` is provided, the graph is created with the specified memory value. 
+  **No Provisioned Memory Values**: If neither `--min-provisioned-memory` nor `--max-provisioned-memory` is provided, the graph is created with a default provisioned memory of 128 m-NCU (memory optimized Neptune Compute Units). 

 Example 1: Create a graph from Amazon S3, with no min/max provisioned memory. 

```
aws neptune-graph create-graph-using-import-task \
  --graph-name 'graph-1' \
  --source "s3://bucket-name/gremlin-format-dataset/" \
  --role-arn "arn:aws:iam::<account-id>:role/<role-name>" \
  --format CSV
```

 Example 2: Create a graph from Amazon S3, with minimum and maximum provisioned memory specified. A graph with 1024 m-NCU is created. 

```
aws neptune-graph create-graph-using-import-task \
  --graph-name 'graph-1' \
  --source "s3://bucket-name/gremlin-format-dataset/" \
  --role-arn "arn:aws:iam::<account-id>:role/<role-name>" \
  --format CSV \
  --min-provisioned-memory 128 \
  --max-provisioned-memory 1024
```

Example 3: Create a graph from Amazon S3 that does not fail on parsing errors.

```
aws neptune-graph create-graph-using-import-task \
  --graph-name 'graph-1' \
  --source "s3://bucket-name/gremlin-format-dataset/" \
  --role-arn "arn:aws:iam::<account-id>:role/<role-name>" \
  --format CSV \
  --no-fail-on-error
```

Example 4: Create a graph from Amazon S3, with 2 replicas.

```
aws neptune-graph create-graph-using-import-task \
  --graph-name 'graph-1' \
  --source "s3://bucket-name/gremlin-format-dataset/" \
  --role-arn "arn:aws:iam::<account-id>:role/<role-name>" \
  --format CSV \
  --replica-count 2
```

Example 5: Create a graph from Amazon S3 with vector search index.

**Note**  
 The `dimension` must match the dimension of the embeddings in the vertex files. 

```
aws neptune-graph create-graph-using-import-task \
  --graph-name 'graph-1' \
  --source "s3://bucket-name/gremlin-format-dataset/" \
  --role-arn "arn:aws:iam::<account-id>:role/<role-name>" \
  --format CSV \
  --replica-count 2 \
  --vector-search-configuration "{\"dimension\":768}"
```

**Via Neptune console**

1. Start the Create Graph wizard and choose **Create graph from existing source**.  
![\[Step 1 of import using console.\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/images/bulk-import/import-step-1.png)

1. Choose Amazon S3 as the source type, the minimum and maximum provisioned memory, the Amazon S3 path, and the load role ARN.  
![\[Step 2 of import using console.\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/images/bulk-import/import-step-2.png)

1. Choose the Network Settings and Replica counts.  
![\[Step 3 of import using console.\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/images/bulk-import/import-step-3.png)

1. Choose **Create graph**.

# Creating a Neptune Analytics graph from Neptune cluster or snapshot
<a name="bulk-import-create-from-neptune"></a>

 Neptune Analytics provides an easy way to bulk import data from an existing Neptune Database cluster or snapshot into a new Neptune Analytics graph, using the `CreateGraphUsingImportTask` API. Data from your source cluster or snapshot is bulk exported into an Amazon S3 bucket that you configure, analyzed to find the right memory configuration, and bulk imported into a new Neptune Analytics graph. You can check the progress of your bulk import at any time using the `GetImportTask` API as well. 

 A few things to consider while using this feature: 
+  You can only import from Neptune Database clusters and snapshots running engine version 1.3.0 or later. 
+  Import from an existing Neptune Database cluster only supports the ingest of property graph data. RDF data within a Neptune Database cluster cannot be ingested using an import task. If looking to ingest RDF data into Neptune Analytics, this data needs to be manually exported from the Neptune Database cluster to an Amazon S3 bucket before it can be ingested using an import task with an Amazon S3 bucket source. 
+  The exported data from your source Neptune Database cluster or snapshot will reside in your buckets only, and will be encrypted using a KMS key that you provide. The exported data is not directly consumable in any other way into Neptune outside of the `CreateGraphUsingImportTask` API. The exported data is not used after the lifetime of the request, and can be deleted by the user. 
+  You need to provide permissions to perform the export task on the Neptune Database cluster or snapshot, write to your Amazon S3 bucket, and use your KMS key while writing data. 
+  If your source is a Neptune Database cluster, a clone is taken from it and used for export. The original Neptune Database cluster will not be impacted. The cloned cluster is internally managed by the service and is deleted upon completion. 
+  If your source is a Neptune snapshot, a restored DBCluster is created from it, and used for export. The restored cluster is internally managed by the service and is deleted upon completion. 
+  This process is not recommended for small graphs. The export process is asynchronous, and works best for medium and large graphs with a size greater than 25 GB. For smaller graphs, a better alternative is to use the [Neptune export](https://docs.aws.amazon.com//neptune/latest/userguide/neptune-export.html) feature to generate CSV data directly from your source, upload it to Amazon S3, and then use the [Batch load](batch-load.md) API instead. 

 A quick summary of steps to import from a Neptune cluster or a Neptune snapshot: 

1.  [Obtain the ARN of your Neptune cluster or snapshot](#obtain-arn-of-neptune-cluster): This can be done from the AWS console or using the Neptune CLI. 

1.  [Create an IAM role with permissions to export from Neptune to Neptune Analytics](#iam-create-role-export-neptune-analytics): Create an IAM role that has permissions to perform an export of your Neptune graph, write to Amazon S3 and use your KMS key for writing data in Amazon S3. 

1.  Use the `CreateGraphUsingImportTask` API with source = NEPTUNE, and provide the ARN of your source, Amazon S3 path to export the data, KMS key to use for exporting data and additional arguments for your Neptune Analytics graph. This should return a `task-id`. 

1.  Use `GetImportTask` API to get the details of your task. 

## Obtain the ARN of your Neptune cluster or snapshot
<a name="obtain-arn-of-neptune-cluster"></a>

 The following instructions demonstrate how to obtain the Amazon Resource Name (ARN) for an existing Amazon Neptune database cluster or snapshot using the AWS Command Line Interface (CLI). The ARN is a unique identifier for an AWS resource, such as a Neptune cluster or snapshot, and is commonly used when interacting with AWS services programmatically or through the AWS Management Console. 

**Via the CLI:**

```
# Obtain the ARN of an existing DB cluster
aws neptune describe-db-clusters \
    --db-cluster-identifier <name> \
    --query 'DBClusters[0].DBClusterArn'

# Obtain the ARN of an existing DB cluster snapshot
aws neptune describe-db-cluster-snapshots \
    --db-cluster-snapshot-identifier <snapshot name> \
    --query 'DBClusterSnapshots[0].DBClusterSnapshotArn'
```
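
When scripting the import end to end, the ARN can be captured directly into a shell variable for use as the `--source` argument of the later `create-graph-using-import-task` call. The cluster identifier below is a placeholder:

```
# Capture the cluster ARN as plain text for later use
CLUSTER_ARN=$(aws neptune describe-db-clusters \
    --db-cluster-identifier <name> \
    --query 'DBClusters[0].DBClusterArn' \
    --output text)
echo "$CLUSTER_ARN"
```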

**Via the AWS console. The ARN can be found on the cluster details page.**

![\[Cluster details option 1.\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/images/bulk-import/cluster-details-1.png)


![\[Cluster details option 2.\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/images/bulk-import/cluster-details-2.png)


## Create an IAM role with permissions to export from Neptune to Neptune Analytics
<a name="iam-create-role-export-neptune-analytics"></a>

1.  Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/). Choose **Roles**, and then choose **Create Role**. 

1.  Provide a role name. 

1.  Choose **Amazon S3** as the AWS service. 

1.  In the **permissions** section, choose: 
   + `AmazonS3FullAccess`
   + `NeptuneFullAccess`
   + `AmazonRDSFullAccess`

1.  Also create a custom policy with at least the following permissions for the AWS KMS key used: 
   + `kms:ListGrants`
   + `kms:CreateGrant`
   + `kms:RevokeGrant`
   + `kms:DescribeKey`
   + `kms:GenerateDataKey`
   + `kms:Encrypt`
   + `kms:ReEncrypt*`
   + `kms:Decrypt`
**Note**  
 Make sure there are no resource-level `Deny` policies attached to your AWS KMS key. If there are, explicitly allow the AWS KMS permissions for the `Export` role. 

1.  On the **Trust Relationships** tab, choose **Edit trust relationship**, and paste the following trust policy. Choose **Save** to save the trust relationship. 

------
#### [ JSON ]

****  

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": [
                       "export.rds.amazonaws.com",
                       "neptune-graph.amazonaws.com"
                   ]
               },
               "Action": "sts:AssumeRole"
           }
       ]
   }
   ```

------

Your IAM role is now ready for import.

**Via CLI/SDK**

 For importing data from Neptune, the API expects additional import options as defined in [NeptuneImportOptions](https://docs.aws.amazon.com/neptune-analytics/latest/apiref/API_NeptuneImportOptions.html). 

Example 1: Create a graph from a Neptune cluster.

```
aws neptune-graph create-graph-using-import-task \
   --graph-name <graph-name> \
   --source arn:aws:rds:<region>:123456789101:cluster:neptune-cluster \
   --min-provisioned-memory 1024 \
   --max-provisioned-memory 1024 \
   --role-arn <role-arn> \
   --import-options '{"neptune": {
      "s3ExportKmsKeyId": "arn:aws:kms:<region>:<account>:key/<key>",
      "s3ExportPath": "<s3 path for exported data>"
   }}'
```

Example 2: Create a graph from a Neptune cluster with default vertex labels preserved.

```
aws neptune-graph create-graph-using-import-task \
   --graph-name <graph-name> \
   --source arn:aws:rds:<region>:123456789101:cluster:neptune-cluster \
   --min-provisioned-memory 1024 \
   --max-provisioned-memory 1024 \
   --role-arn <role-arn> \
   --import-options '{"neptune": {
      "s3ExportKmsKeyId": "arn:aws:kms:<region>:<account>:key/<key>",
      "s3ExportPath": "<s3 path for exported data>",
      "preserveDefaultVertexLabels": true
   }}'
```

Example 3: Create a graph from a Neptune cluster with default edge IDs preserved.

```
aws neptune-graph create-graph-using-import-task \
   --graph-name <graph-name> \
   --source arn:aws:rds:<region>:123456789101:cluster:neptune-cluster \
   --min-provisioned-memory 1024 \
   --max-provisioned-memory 1024 \
   --role-arn <role-arn> \
   --import-options '{"neptune": {
      "s3ExportKmsKeyId": "arn:aws:kms:<region>:<account>:key/<key>",
      "s3ExportPath": "<s3 path for exported data>",
      "preserveEdgeIds": true
   }}'
```

# Bulk import data into an existing Neptune Analytics graph
<a name="loading-data-existing-graph"></a>

 Neptune Analytics allows you to efficiently import large datasets into an already provisioned graph using the `StartImportTask` API. This API loads data from an Amazon S3 bucket directly into an **empty** Neptune Analytics graph; it is designed for loading data into existing empty graphs. 

Two common use cases for using this feature:

1.  Bulk importing data multiple times without provisioning a new graph for each dataset. This helps during the development phase of a project where datasets are being converted into Neptune Analytics compatible load formats. 

1.  Use cases where graph provisioning privileges need to be separated from data operation privileges. For example, scenarios where graph provisioning is done only by the infrastructure team, while data loading and querying is done by the data engineering team. 

 For use cases where you want to create a new graph loaded with data, use the `CreateGraphUsingImportTask` API instead. 

 For incrementally loading data from Amazon S3 you can use the loader integration with the openCypher `CALL` clause. For more information see [Batch load](batch-load.md). 

**Prerequisites**
+  An empty Amazon Neptune Analytics graph. 
+  Data stored in an Amazon S3 bucket in the same region as the graph. 
+  An IAM role with permissions to access the Amazon S3 bucket. For more information, see [Create your IAM role for Amazon S3 access](bulk-import-create-from-s3.md#create-iam-role-for-s3-access). 

**Important considerations**
+  **Data integrity**: The `StartImportTask` API is designed to work with graphs that are empty; if the import task finds that the graph is not empty, the operation fails. If your graph contains data, you can first reset the graph using the [reset-graph](https://docs.aws.amazon.com//neptune-analytics/latest/apiref/API_ResetGraph.html) API. Resetting deletes all data from the graph, so ensure you have backups if necessary. You can use the [create-graph-snapshot](https://docs.aws.amazon.com//neptune-analytics/latest/apiref/API_CreateGraphSnapshot.html) API to create a snapshot of your existing graph. 
+  **Atomic operation**: The data import is atomic; it either completes fully or does not apply at all. If the import fails, the graph is reset back to empty. 
+  **Format support**: Loading data supports the same data formats as `create-graph-using-import-task` and `neptune.load()`. This API does not support importing data from a Neptune Database cluster or snapshot. 
+  **Queries**: Queries will stop working while the import is in progress. You will get a `Cannot execute any query until bulk import is complete` error until the import finishes. 

**Steps for bulk importing data**

1.  Resetting the graph (if necessary): 

    If your graph is not empty, reset it using the following command: 

   ```
   aws neptune-graph reset-graph --graph-identifier <graph-id>
   ```
**Note**  
 This command will completely remove all existing data from your graph. It is recommended that you take a graph snapshot before performing this action. 

1.  Start the import task: 

    To load data into your Neptune graph, use the `start-import-task` command as follows: 

   ```
   aws neptune-graph start-import-task \
   --graph-identifier <graph-id> \
   --source <s3-path-to-data> \
   --format <data-format> \
   --role-arn <IAM-role-ARN> \
   [--fail-on-error | --no-fail-on-error]
   ```
   +  `graph-identifier`: The unique identifier of your Neptune graph. 
   +  `source`: An Amazon S3 URI prefix. All object names with matching prefixes are loaded. See [ Neptune loader request parameters](/neptune/latest/userguide/load-api-reference-load.html#load-api-reference-load-parameters) for Amazon S3 URI prefix examples. 
   +  `format`: The data format of the Amazon S3 data to be loaded, either `csv`, `openCypher`, or `ntriples`. For more information, see [Data formats](loading-data-formats.md). 
   +  `role-arn`: The ARN of the IAM role that Neptune Analytics can assume to access your Amazon S3 data. 
   +  `(--no-)fail-on-error`: (Optional) Whether to stop the import process when an error occurs. By default, the import stops at the first error. 
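
Putting these parameters together, a complete invocation might look like the following. All values here are placeholders for your own graph identifier, bucket, and role:

```
aws neptune-graph start-import-task \
  --graph-identifier <graph-id> \
  --source "s3://my-bucket/my-dataset/" \
  --format CSV \
  --role-arn "arn:aws:iam::<account-id>:role/<role-name>" \
  --no-fail-on-error
```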

## Troubleshooting bulk import
<a name="loading-data-existing-graph-troubleshooting"></a>

 The following troubleshooting guidance is for common errors encountered during bulk import of data into an Amazon Neptune graph database. It covers three main issues: the Amazon S3 bucket and the graph being in different regions, the IAM role used not having the correct permissions, and the bulk load files in a public Amazon S3 bucket not being made public for reading. 

**Common errors**

1. The Amazon S3 bucket and your graph are in different regions.

   Verify that your graph and the Amazon S3 bucket are in the same region. Neptune Analytics only supports loading data in the same region.

   ```
   export GRAPH_ID="<graphId>"            # Replace with your graph identifier
   export S3_BUCKET_NAME="<bucketName>"   # Replace with the S3 bucket that contains your graph data files

   # Make sure your graph and S3 bucket are in the same region
   aws neptune-graph get-graph --graph-identifier $GRAPH_ID
   aws s3api get-bucket-location --bucket $S3_BUCKET_NAME
   ```

1. The IAM role used does not have the correct permissions.

   Verify that you have created the IAM role correctly with read permission to Amazon S3 - see [ Create your IAM role for Amazon S3 access](https://docs.aws.amazon.com//neptune-analytics/latest/userguide/bulk-import-create-from-s3.html#create-iam-role-for-s3-access).

   ```
   export GRAPH_EXEC_ROLE="GraphExecutionRole"
   aws iam list-attached-role-policies --role-name $GRAPH_EXEC_ROLE
   # Output should contain "PolicyName": "AmazonS3*Access".
   ```

1. The `AssumeRole` permission is not granted to Neptune Analytics through the AssumeRolePolicy.

   Verify that you have attached the policy that allows Neptune Analytics to assume the IAM role to access the Amazon S3 bucket. See [ Create your IAM role for Amazon S3 access](https://docs.aws.amazon.com//neptune-analytics/latest/userguide/bulk-import-create-from-s3.html#create-iam-role-for-s3-access).

   ```
   export GRAPH_EXEC_ROLE="GraphExecutionRole"   # Replace with your IAM role

   # Check that Neptune Analytics can assume this role to read from the specified S3 bucket
   aws iam get-role --role-name $GRAPH_EXEC_ROLE --query 'Role.AssumeRolePolicyDocument' --output text
   # Output should contain: SERVICE neptune-graph.amazonaws.com
   ```

1.  The bulk load files are in a public Amazon S3 bucket, but the files themselves are not made public for reading. 

    When adding bulk load files to a public Amazon S3 bucket, ensure that each file's access control list (ACL) is set to allow public reads. For example, to set this through the AWS CLI: 

   ```
     aws s3 cp <FileSourceLocation> <FileTargetLocation> --acl public-read
   ```

    This setting can also be done through the Amazon S3 console or the AWS SDKs. For more details, refer to the documentation for [ Configuring ACLs](https://docs.aws.amazon.com//AmazonS3/latest/userguide/managing-acls.html). 

# Checking the details and progress of an import task
<a name="bulk-import-checking-details"></a>

 You can use the [ GetImportTask](https://docs.aws.amazon.com/neptune-analytics/latest/apiref/API_GetImportTask.html) API to track the progress and the status of your import task. 

```
aws neptune-graph get-import-task --task-id <task-id>
```

 An import task can be in one of the following states: 
+  **INITIALIZING**: The task is preparing for import, including provisioning a graph when using the `CreateGraphUsingImportTask` API. 
+  **ANALYZING_DATA**: The task is taking an initial pass through the dataset to determine the optimal configuration for the graph. 
+  **IMPORTING**: The data is being loaded into the graph. 
+  **EXPORTING**: Data is being exported from the Neptune cluster or snapshot. This is only applicable when performing an import task with a source of Neptune and through the `CreateGraphUsingImportTask` API. 
+  **ROLLING_BACK**: The import task encountered an error. Refer to the [troubleshooting](bulk-import-troubleshooting.md) section to investigate the errors. The import task will be rolled back and eventually marked as `FAILED`. 
+  **SUCCEEDED**: Graph creation and data loading have succeeded. Use the `get-graph` API to view details of the final graph. 
+  **REPROVISIONING**: A temporary state while the graph is being reconfigured during the import task. 
+  **FAILED**: Graph creation or data loading has failed. Refer to the [troubleshooting](bulk-import-troubleshooting.md) section to understand the reason for the failure. 
+  **CANCELLING**: The user has cancelled the import task, and cancellation is in progress. 
+  **CANCELLED**: The import task has been cancelled, and all resources have been released. 

 Additionally, the import task details can be used to track the progress of the load, the error count, and the graph summary. 
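
Rather than calling `get-import-task` by hand, a small shell loop can poll until a terminal state is reached. This is a sketch that relies on the `status` field of the response and the states listed above; the task ID is a placeholder:

```
TASK_ID=<task-id>
while true; do
  # Extract the status field from the get-import-task response
  STATUS=$(aws neptune-graph get-import-task \
      --task-id "$TASK_ID" \
      --query 'status' --output text)
  echo "import task status: $STATUS"
  case "$STATUS" in
    SUCCEEDED|FAILED|CANCELLED) break ;;
  esac
  sleep 60
done
```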

# Canceling an import task
<a name="bulk-import-cancelling-import"></a>

 You can cancel a running import task by using the [ CancelImportTask](https://docs.aws.amazon.com//neptune-analytics/latest/apiref/API_CancelImportTask.html) API. 

```
aws neptune-graph cancel-import-task \ 
--task-id <task-id>
```

 The import task will be canceled and all changes rolled back. The state of the import task switches to `CANCELLING` after the `cancel-import-task` API is called, and eventually becomes `CANCELLED` when the rollback finishes. You can check the current state of your import task using the [GetImportTask](https://docs.aws.amazon.com//neptune-analytics/latest/userguide/bulk-import-checking-details.html) API. 

```
aws neptune-graph get-import-task \
--task-id <task-id>
```

# Troubleshooting
<a name="bulk-import-troubleshooting"></a>

 For both bulk load and batch load, all errors and a summary of the load are sent to a CloudWatch log group in your account. To view the logs, open CloudWatch, choose **Log groups** in the left column, then search for and choose `/aws/neptune/import-task-logs/`. 

1.  **Batch Load**: The logs for each load are saved under the `/aws/neptune/import-task-logs/<graph-id>/<load-id>` CloudWatch log stream. 

1.  **Bulk Load using Import Task**: The logs are saved under `/aws/neptune/import-task-logs/<graph-id>/<task-id>` CloudWatch log stream. 
+  **S3_ACCESS_DENIED**: The server does not have permissions to list or download the given file. Fix the permissions and retry. See [Create your IAM role for Amazon S3 access](bulk-import-create-from-s3.md#create-iam-role-for-s3-access) for help setting up the Amazon S3 permissions. 
+  **LARGE_STRING_ERROR**: One or more strings exceeded the limit on string size. This data cannot be inserted as is. Update the strings exceeding the limit and retry. 
+  **PARSING_ERROR**: Error parsing the given value(s). Correct the value(s) and retry. More information on different parsing errors is provided in this section. 
+  **OUT_OF_MEMORY**: No more data can be loaded with the current m-NCU. If encountered during an import task, set a higher m-NCU and retry. If encountered during batch load, scale up the number of m-NCU and retry the batch load. 
+  **PARTITION_FULL_ERROR**: No more data can be loaded with the internal server configuration. If encountered during an import task, the import workflow changes the server configuration and retries. If encountered during batch load, reach out to the AWS service team to unblock loading of new data. 

**Common parsing errors and solutions**


| Error template | Solution | 
| --- | --- | 
|  Invalid data type encountered for header val:badtype when parsing line `[:ID,firstName:String,val:badtype,:LABEL]`.  |  Incorrect Datatype provided. Check the documentation for supported data types. See [Data formats](loading-data-formats.md) for more information.  | 
|  Multi-valued columns are not supported `firstName:String[]` when parsing line `[:ID,firstName:String[],val:String,:LABEL]`.  |  The `opencypher` format does not support multivalued user defined properties. Try using the `csv` format to insert multivalued vertex properties, or remove multivalued properties.  | 
|  Bad header for a file in '`OPEN_CYPHER`' format, could not determine node or relationship file, found system columns from '`csv`' format when parsing line `[~id,firstName:String,val:int,:LABEL]`.  |  Both the `opencypher` and `csv` format expect certain header columns to be present. Make sure you have entered them correctly.  Check the [Data formats](loading-data-formats.md) documentation for required fields by format.   | 
|  Bad header for a file in '`OPEN_CYPHER`' format, could not determine node or relationship file.  |  The header of the files does not have the required system columns. Check the [Data formats](loading-data-formats.md) for required fields by format.  | 
|  Relationship file in '`OPEN_CYPHER`' format should contain both `:START_ID` and `:END_ID` columns when parsing line `[:START_ID,firstName:String]`.  |  The header of the edge files does not have all the required system columns. Check the [Data formats](loading-data-formats.md) for required fields by format.  | 
|  Invalid data type. Found system columns from '`OPEN_CYPHER`' format `:ID` when parsing line `[:ID,firstName:String,val:Int,~label]`.  |  The `opencypher` and `csv` formats have different system column names, and they begin with `:` and `~` respectively. User defined properties cannot begin with those reserved prefixes in the respective formats. Confirm the format name and system column names, or update user defined properties to not use reserved prefixes.  | 
|  Named column name is not present for header field `:BLAH` when parsing line `[:ID,:BLAH,firstName:String]`.  |  The `opencypher` and `csv` formats have different system column names, and they begin with `:` and `~` respectively. User defined properties cannot begin with those reserved prefixes in the respective formats. Confirm the format name and system column names, or update user defined properties to not use reserved prefixes.  | 
|  System column other than `ID` cannot be stored as a property: <columnHeader>.  |  The `opencypher` and `csv` formats have different system column names, and they begin with `:` and `~` respectively. User defined properties cannot begin with those reserved prefixes in the respective formats. Confirm the format name and system column names, or update user defined properties to not use reserved prefixes.  | 
|  Duplicate user column `firstName` when parsing line `[:ID,:LABEL, firstName:String, firstName:String]`.  |  The file contains duplicate user defined property column names in the header. Remove all of the duplicate columns.  | 
|  Duplicate system column `:ID` found when parsing line `[:ID,:ID,firstName:String,:LABEL]`.  |  The file contains duplicate system column names in the header. Remove all of the duplicate columns.  | 
|  Invalid column name provided for loading embeddings: `[abcd]` for filename: someFilename. Embedding column name must be the same as their corresponding vector index name when parsing line `[:ID,firstName:String,abcd:Vector,:LABEL] in [filename]`.  |  An incorrect name is used for the vector embeddings.  | 
|  "date" type is currently not supported. "datetime" may be an alternative type.  |  Use `datetime` as the field type; the `date` type is not yet supported in Neptune Analytics.  | 
|  Headers must be non-empty.  |  Headers need to be non empty. If the file has an empty line in the beginning, remove the empty line.  | 
|  Failure encountered while parsing the `csv` file.  |  The likely reason is that the number of columns in the row doesn't match the number of columns in the header. If you don't have a value for a column, provide an empty value. For example: `123,vertex,,,`.  | 
|  Could not process value of type:`http://www.w3.org/2001/XMLSchema#int` for value: `a` when parsing line `[v1,v19683,con,a]` in [file].  |  There is a mismatch between the type of the value provided for that column in the row and the type specified in the header. In this specific case the column header is annotated with integer type, but `a` is not parseable as an integer.  | 
|  Could not load vector embedding: `[a,bc]`. Check the dimensionality for this vector.  |  The size of the vector does not match the dimension defined in the vector search configuration for the graph.  | 
|  Could not load vector embedding: `[a,NaN]`. Check the value for this vector.  |  Float and double values in scientific notation are currently not supported. Also `Infinity`, `-Infinity`, `INF`, `-INF`, and `NaN` are not recognized.  | 
|  Could not process value of type: date for value: "2024-11-22T21:40:40Z".  |   The values in columns of type 'date' must not contain time. For instance, "2024-11-22T21:40:40Z" is not a valid value for the 'date' column since it contains the time component '21:40:40Z'. Change the column type to 'dateTime' or remove the time from the column values.   | 
|   Please check if you are loading lines longer than 65536.   |   The CSV format does not support lines longer than 65536 characters. Check if some lines are unexpectedly longer than 65536 characters, and fix those. Also check for properties with long string values and consider excluding those. For files with vector embeddings, if vector embeddings are too long then consider shortening the precision of floating point values. Alternatively, try the Parquet format to ingest data with long lines.   | 
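
Several of the failures above (wrong vector dimensionality, `NaN` or `Infinity` values in embeddings) can be caught before a load is attempted. The following is an illustrative pre-check sketch, not part of any Neptune tooling:

```
import math

def embedding_is_loadable(values, expected_dim):
    """Return True if a vector embedding has the dimensionality configured
    in the graph's vector search configuration and contains only finite
    numbers. NaN, Infinity, and -Infinity are rejected by the loader."""
    if len(values) != expected_dim:
        return False
    return all(isinstance(v, (int, float)) and math.isfinite(v) for v in values)
```

For example, `embedding_is_loadable([0.1, 0.2], 2)` returns `True`, while a vector of the wrong length or one containing `float("nan")` returns `False`.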

# neptune.read()
<a name="neptune-read"></a>

 Neptune Analytics supports a `CALL` procedure, `neptune.read`, that reads data from Amazon S3 and then runs an openCypher query (read, insert, update) using that data. The procedure yields each row in the file as a declared result variable, `row`. It uses the IAM credentials of the caller to access the data in Amazon S3. See [Create your IAM role for Amazon S3 access](https://docs.aws.amazon.com//neptune-analytics/latest/userguide/bulk-import-create-from-s3.html#create-iam-role-for-s3-access) to set up the permissions. The Amazon S3 bucket must be in the same AWS Region as the Neptune Analytics graph. Currently, cross-Region reads are not supported. 

 **Syntax** 

```
CALL neptune.read(
  {
    source: "string",
    format: "parquet/csv",
    concurrency: 10
  }
)
YIELD row
...
```

**Inputs**
+  **source** (required) - Amazon S3 URI to a **single** object. An Amazon S3 prefix matching multiple objects is not supported. 
+  **format** (required) - `parquet` and `csv` are supported. 
  +  More details on the supported Parquet format can be found in [Supported Parquet column types](parquet-column-types.md). 
  +  For more information on the supported csv format, see [Gremlin load data format](https://docs.aws.amazon.com//neptune/latest/userguide/bulk-load-tutorial-format-gremlin.html). 
+  **concurrency** (optional) - Type: integer, 0 or greater. Default: 0. Specifies the number of threads used to read the file. If the value is 0, the maximum number of threads allowed by the resource is used. For Parquet, it is recommended to set this to the number of row groups. 
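
As an illustration of the inputs above, a query string can be assembled programmatically. The helper name below is hypothetical, not part of any SDK:

```
def build_read_query(s3_uri, fmt, concurrency=0):
    """Assemble a neptune.read call from the inputs described above.
    With concurrency 0 (the default), the service uses the maximum
    number of threads the resource allows."""
    return (
        'CALL neptune.read({{source: "{0}", format: "{1}", '
        'concurrency: {2}}}) YIELD row RETURN count(row)'
    ).format(s3_uri, fmt, concurrency)

# Hypothetical S3 object used for illustration only.
query = build_read_query("s3://my-bucket/data.parquet", "parquet", 4)
```

The resulting string can then be passed as the `--query-string` argument of the `execute-query` CLI operation shown later in this section.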

**Outputs**

 The `neptune.read` procedure returns: 
+  **row** - type:Map 
  +  Each row in the file, where the keys are the columns and the values are the data found in each column. 
  +  You can access each column's data using property access (`row.col`). 

# Query examples using Parquet
<a name="parquet-examples"></a>

 The following example query returns the number of rows in a given Parquet file: 

```
CALL neptune.read(
  {
    source: "<s3 path>",
    format: "parquet"
  }
)
YIELD row
RETURN count(row)
```

 You can run the query example using the `execute-query` operation in the AWS CLI by executing the following code: 

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string 'CALL neptune.read({source: "<s3 path>", 
    format: "parquet"}) YIELD row RETURN count(row)' \
  --language open_cypher \
  /tmp/out.txt
```

 A query can be flexible in what it does with rows read from a Parquet file. For example, the following query creates a node with a field being set to data found in the Parquet file: 

```
CALL neptune.read(
  {
    source: "<s3 path>",
    format: "parquet"
  }
)
YIELD row
CREATE (n {someField: row.someCol}) 
RETURN n
```

**Warning**  
 It is not considered good practice to use a clause that produces a large result set, such as `MATCH (n)`, before a `CALL` clause. This leads to a long-running query, due to the cross product between incoming solutions from prior clauses and the rows read by `neptune.read`. It is recommended to start the query with `CALL neptune.read`. 

# Supported Parquet column types
<a name="parquet-column-types"></a>

**Parquet data types:**
+  NULL 
+  BOOLEAN 
+  FLOAT 
+  DOUBLE 
+  STRING 
+  SIGNED INTEGER: INT8, INT16, INT32, INT64 
+  MAP: Only supports one-level. Does not support nested. 
+  LIST: Only supports one-level. Does not support nested. 

**Neptune-specific:**
+  A column type `Any` is supported in user columns. The `Any` type is syntactic sugar for all of the other supported types, and is useful when a user column contains values of multiple types. The payload of an `Any` value is a semicolon-separated list of JSON strings, for example: `"{""value"": ""10"", ""type"": ""Int""};{""value"": ""1.0"", ""type"": ""Float""}"`, where each individual JSON string has a `value` field and a `type` field. The column header for an `Any` column is `propertyname:Any`. The cardinality of an `Any` column is `set`, meaning that the column can accept multiple values. 
  +  Neptune Analytics supports the following types in an `Any` type: `Bool` (or `Boolean`), `Byte`, `Short`, `Int`, `Long`, `UnsignedByte`, `UnsignedShort`, `UnsignedInt`, `UnsignedLong`, `Float`, `Double`, `Date`, `dateTime`, and `String`. 
  +  `Vector` type is not supported in `Any` type. 
  +  Nested `Any` type is not supported. For example, `"{""value"": "{""value"": ""10"", ""type"": ""Int""}", ""type"": ""Any""}"`. 
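
The doubled quotation marks in the payload example above are CSV quoting; after CSV unescaping, each fragment is plain JSON, separated by semicolons. The following is a minimal decoding sketch, assuming no semicolons appear inside the values themselves (illustrative only, not Neptune tooling):

```
import json

def parse_any_payload(cell):
    """Split a CSV-unescaped Any-type cell on ';' and parse each JSON
    fragment into its {"value": ..., "type": ...} form. Assumes the
    values themselves contain no semicolons."""
    return [json.loads(part) for part in cell.split(";")]

parsed = parse_any_payload('{"value": "10", "type": "Int"};{"value": "1.0", "type": "Float"}')
# parsed[0] == {"value": "10", "type": "Int"}
```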

# Sample Parquet output
<a name="parquet-output"></a>

 Given a Parquet file like this: 

```
<s3 path>

Parquet Type:
    int8     int16       int32             int64              float      double    string
+--------+---------+-------------+----------------------+------------+------------+----------+
|   Byte |   Short |       Int   |                Long  |     Float  |    Double  | String   |
|--------+---------+-------------+----------------------+------------+------------+----------|
|   -128 |  -32768 | -2147483648 | -9223372036854775808 |    1.23456 |    1.23457 | first    |
|    127 |   32767 |  2147483647 |  9223372036854775807 |  nan       |  nan       | second   |
|      0 |       0 |           0 |                    0 | -inf       | -inf       | third    |
|      0 |       0 |           0 |                    0 |  inf       |  inf       | fourth   |
+--------+---------+-------------+----------------------+------------+------------+----------+
```

 Here is an example of the output returned by `neptune.read` using the following query: 

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.read({source: '<s3 path>', format: 'parquet'}) YIELD row RETURN row" \
  --language open_cypher \
  /tmp/out.txt

cat /tmp/out.txt

{
  "results": [{
      "row": {
        "Float": 1.23456,
        "Byte": -128,
        "Int": -2147483648,
        "Long": -9223372036854775808,
        "String": "first",
        "Short": -32768,
        "Double": 1.2345678899999999
      }
    }, {
      "row": {
        "Float": "NaN",
        "Byte": 127,
        "Int": 2147483647,
        "Long": 9223372036854775807,
        "String": "second",
        "Short": 32767,
        "Double": "NaN"
      }
    }, {
      "row": {
        "Float": "-INF",
        "Byte": 0,
        "Int": 0,
        "Long": 0,
        "String": "third",
        "Short": 0,
        "Double": "-INF"
      }
    }, {
      "row": {
        "Float": "INF",
        "Byte": 0,
        "Int": 0,
        "Long": 0,
        "String": "fourth",
        "Short": 0,
        "Double": "INF"
      }
    }]
}
```

 Currently, there is no way to set a node or edge label from a data field in a Parquet file. It is recommended that you partition the work into multiple queries, one for each label/type. 

```
CALL neptune.read({source: '<s3 path>', format: 'parquet'})
YIELD row
WHERE row.`~label` = 'airport'
CREATE (n:airport)

CALL neptune.read({source: '<s3 path>', format: 'parquet'})
YIELD row 
WHERE row.`~label` = 'country'
CREATE (n:country)
```

# Query examples using CSV
<a name="csv-examples"></a>

 In this example, the query returns the number of rows in a given CSV file: 

```
CALL neptune.read(
  {
    source: "<s3 path>",
    format: "csv"
  }
)
YIELD row
RETURN count(row)
```

 You can run the query using the `execute-query` operation in the AWS CLI: 

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string 'CALL neptune.read({source: "<s3 path>", 
    format: "csv"}) YIELD row RETURN count(row)' \
  --language open_cypher \
  /tmp/out.txt
```

 A query can be flexible in what it does with rows read from a CSV file. For instance, the following query creates a node with a field set to data from a CSV file: 

```
CALL neptune.read(
  {
    source: "<s3 path>",
    format: "csv"
  }
)
YIELD row
CREATE (n {someField: row.someCol}) 
RETURN n
```

**Warning**  
 It is not considered good practice to use a clause that produces a large result set, such as `MATCH (n)`, before a `CALL` clause. This leads to a long-running query, due to the cross product between incoming solutions from prior clauses and the rows read by `neptune.read`. It is recommended to start the query with `CALL neptune.read`. 

# Property column headers
<a name="property-column-headers"></a>

 You can specify a column for a property by using the following syntax, where the property name and type are separated by a colon (`:`). The type names are not case sensitive. If a colon appears within a property name, it must be escaped by preceding it with a backslash (`\:`). 

```
propertyname:type
```

**Note**  
 Space, comma, carriage return and newline characters are not allowed in the column headers, so property names cannot include these characters.   
 You can specify a column for an array type by adding [] to the type:   

  ```
  propertyname:type[]
  ```
 Edge properties can only have a single value, and an error occurs if an array type is specified or a second value is supplied. The following example shows the column header for a property named `age` of type `Int`.   

  ```
  age:Int
  ```
 Every row in the file would be required to have an integer in that position or be left empty. Arrays of strings are allowed, but strings in an array cannot include the semicolon (`;`) character unless it is escaped using a backslash (`\;`). 
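
To illustrate the escaping rules above, here is a small sketch that builds a header and a row with an array property, escaping semicolons inside array elements. The helper name and file columns are hypothetical, not Neptune tooling:

```
def escape_array_element(value):
    """Escape ';' inside an array element with a backslash, per the
    array syntax described above."""
    return value.replace(";", "\\;")

# Hypothetical node file: an Int property and a String[] array property.
header = ":ID,age:Int,nicknames:String[],:LABEL"
nicknames = ["Jo;e", "Joey"]
row = "n1,42,{},person".format(";".join(escape_array_element(n) for n in nicknames))
# row == r'n1,42,Jo\;e;Joey,person'
```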

# Supported CSV column types
<a name="csv-column-types"></a>
+  Bool (or Boolean) - Allowed values: `true`, `false`. Indicates a Boolean field. Any value other than `true` will be treated as `false`. 
+  FLOAT - Range: 32-bit IEEE 754 floating point including Infinity, INF, -Infinity, -INF and NaN (not-a-number). 
+  DOUBLE - Range: 64-bit IEEE 754 floating point including Infinity, INF, -Infinity, -INF and NaN (not-a-number). 
+  STRING - 
  +  Quotation marks are optional. Commas, newline, and carriage return characters are automatically escaped if they are included in a string surrounded by double quotation marks (`"`). Example: `"Hello, World"`. 
  +  To include quotation marks in a quoted string, you can escape the quotation mark by using two in a row: Example: `"Hello ""World"""`. 
  +  Arrays of strings are allowed, but strings in an array cannot include the semicolon (`;`) character unless it is escaped using a backslash (`\;`). 
  +  If you want to surround strings in an array with quotation marks, you must surround the whole array with one set of quotation marks. Example: `"String one; String 2; String 3"`. 
+  Datetime - The datetime values can be provided in either the XSD format, or one of the following formats: 
  +  yyyy-MM-dd 
  +  yyyy-MM-ddTHH:mm 
  +  yyyy-MM-ddTHH:mm:ss 
  +  yyyy-MM-ddTHH:mm:ssZ 
  +  yyyy-MM-ddTHH:mm:ss.SSSZ 
  +  yyyy-MM-ddTHH:mm:ss[+-]hhmm 
  +  yyyy-MM-ddTHH:mm:ss.SSS[+-]hhmm 
+  SIGNED INTEGER - 
  +  Byte: -128 to 127 
  +  Short: -32768 to 32767 
  +  Int: -2^31 to 2^31-1 
  +  Long: -2^63 to 2^63-1 
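
The signed integer ranges above can be checked programmatically before loading; a minimal, illustrative sketch:

```
# Inclusive (min, max) ranges for the signed integer column types above.
INT_RANGES = {
    "Byte": (-2**7, 2**7 - 1),
    "Short": (-2**15, 2**15 - 1),
    "Int": (-2**31, 2**31 - 1),
    "Long": (-2**63, 2**63 - 1),
}

def fits(type_name, value):
    """Return True if value falls within the range of the given
    signed integer column type."""
    lo, hi = INT_RANGES[type_name]
    return lo <= value <= hi
```

For example, `fits("Byte", -128)` is `True`, while `fits("Byte", 128)` is `False`.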

**Neptune-specific:**
+  A column type `Any` is supported in user columns. The `Any` type is syntactic sugar for all of the other supported types, and is useful when a user column contains values of multiple types. The payload of an `Any` value is a semicolon-separated list of JSON strings, for example: `"{""value"": ""10"", ""type"": ""Int""};{""value"": ""1.0"", ""type"": ""Float""}"`, where each individual JSON string has a `value` field and a `type` field. The column header for an `Any` column is `propertyname:Any`. The cardinality of an `Any` column is `set`, meaning that the column can accept multiple values. 
  +  Neptune Analytics supports the following types in an `Any` type: `Bool` (or `Boolean`), `Byte`, `Short`, `Int`, `Long`, `UnsignedByte`, `UnsignedShort`, `UnsignedInt`, `UnsignedLong`, `Float`, `Double`, `Date`, `dateTime`, and `String`. 
  +  `Vector` type is not supported in `Any` type. 
  +  Nested `Any` type is not supported. For example, `"{""value"": "{""value"": ""10"", ""type"": ""Int""}", ""type"": ""Any""}"`. 

# Sample CSV output
<a name="csv-output"></a>

 Given the following CSV file: 

```
<s3 path>
colA:byte,colB:short,colC:int,colD:long,colE:float,colF:double,colG:string
-128,-32768,-2147483648,-9223372036854775808,1.23456,1.23457,first
127,32767,2147483647,9223372036854775807,nan,nan,second
0,0,0,0,-inf,-inf,third
0,0,0,0,inf,inf,fourth
```

 This example shows the output returned by `neptune.read` using the following query: 

```
aws neptune-graph execute-query \
  --graph-identifier ${graphIdentifier} \
  --query-string "CALL neptune.read({source: '<s3 path>', format: 'csv'}) YIELD row RETURN row" \
  --language open_cypher \
  /tmp/out.txt

cat /tmp/out.txt
{
  "results": [{
      "row": {
        "colD": -9223372036854775808,
        "colC": -2147483648,
        "colE": 1.23456,
        "colB": -32768,
        "colF": 1.2345699999999999,
        "colG": "first",
        "colA": -128
      }
    }, {
      "row": {
        "colD": 9223372036854775807,
        "colC": 2147483647,
        "colE": "NaN",
        "colB": 32767,
        "colF": "NaN",
        "colG": "second",
        "colA": 127
      }
    }, {
      "row": {
        "colD": 0,
        "colC": 0,
        "colE": "-INF",
        "colB": 0,
        "colF": "-INF",
        "colG": "third",
        "colA": 0
      }
    }, {
      "row": {
        "colD": 0,
        "colC": 0,
        "colE": "INF",
        "colB": 0,
        "colF": "INF",
        "colG": "fourth",
        "colA": 0
      }
    }]
}
```

 Currently, there is no way to set a node or edge label from a data field in a CSV file. It is recommended that you partition the work into multiple queries, one for each label/type. 

```
CALL neptune.read({source: '<s3 path>', format: 'csv'})
YIELD row
WHERE row.`~label` = 'airport'
CREATE (n:airport)

CALL neptune.read({source: '<s3 path>', format: 'csv'})
YIELD row 
WHERE row.`~label` = 'country'
CREATE (n:country)
```