

# Querying a Neptune Graph
<a name="access-graph-queries"></a>

Neptune supports the following graph query languages to access a graph:
+ [Gremlin](https://tinkerpop.apache.org/gremlin.html), defined by [Apache TinkerPop](https://tinkerpop.apache.org/) for creating and querying property graphs.

  A query in Gremlin is a traversal made up of discrete steps, each of which follows an edge to a node.

  See [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md) to learn about using Gremlin in Neptune, and [Gremlin standards compliance in Amazon Neptune](access-graph-gremlin-differences.md) to find specific details about the Neptune implementation of Gremlin.
+ [openCypher](access-graph-opencypher.md) is a declarative query language for property graphs that was originally developed by Neo4j, then open-sourced in 2015, and contributed to the [openCypher](http://www.opencypher.org/) project under an Apache 2 open-source license. Its syntax is documented in the [openCypher spec](https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf).
+ [SPARQL](https://www.w3.org/TR/sparql11-overview/) is a declarative language based on graph pattern-matching, for querying [RDF](https://www.w3.org/2001/sw/wiki/RDF) data. It is supported by the [World Wide Web Consortium](https://www.w3.org/).

  See [Accessing the Neptune graph with SPARQL](access-graph-sparql.md) to learn about using SPARQL in Neptune, and [SPARQL standards compliance in Amazon Neptune](feature-sparql-compliance.md) to find specific details about the Neptune implementation of SPARQL.

**Note**  
Both Gremlin and openCypher can be used to query any property-graph data stored in Neptune, regardless of how it was loaded.

**Topics**
+ [Query queuing in Amazon Neptune](access-graph-queuing.md)
+ [Query plan cache in Amazon Neptune](access-graph-qpc.md)
+ [Inject a Custom ID Into a Neptune Gremlin or SPARQL Query](features-query-id.md)
+ [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md)
+ [Accessing the Neptune Graph with openCypher](access-graph-opencypher.md)
+ [Accessing the Neptune graph with SPARQL](access-graph-sparql.md)

# Query queuing in Amazon Neptune
<a name="access-graph-queuing"></a>

When developing and tuning graph applications, it can be helpful to know the implications of how queries are being queued by the database. In Amazon Neptune, query queuing occurs as follows:
+ The maximum number of queries that can be queued up per instance, regardless of the instance size, is 8,192. Any queries over that number are rejected and fail with a `ThrottlingException`.
+ The maximum number of queries that can be executing at one time is determined by the number of worker threads assigned, which is generally set to twice the number of virtual CPU cores (vCPUs) that are available.
+ Query latency includes the time a query spends in the queue as well as network round-tripping and the time it actually takes to execute.
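
To make these limits concrete, here is a minimal sketch. The 8,192 queue limit and the 2 × vCPU worker rule come from the list above; the 8-vCPU instance in the example is an arbitrary illustration.

```python
# Illustrative model of the queuing limits described above.
QUEUE_LIMIT = 8192  # maximum queued queries per instance, regardless of size

def max_concurrent_queries(vcpus: int) -> int:
    """Worker threads are generally set to twice the number of vCPUs."""
    return 2 * vcpus

def is_throttled(queued_queries: int) -> bool:
    """Queries beyond the queue limit are rejected with a ThrottlingException."""
    return queued_queries > QUEUE_LIMIT

print(max_concurrent_queries(8))  # an 8-vCPU instance runs up to 16 queries at once
print(is_throttled(9000))         # True: past the 8,192 queue limit
```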

## Determining how many queries are in your queue at a given moment
<a name="access-graph-queuing-count"></a>

The `MainRequestQueuePendingRequests` CloudWatch metric records the number of requests waiting in the input queue at five-minute granularity (see [Neptune CloudWatch Metrics](cw-metrics.md)).

For Gremlin, you can obtain a current count of queries in the queue using the `acceptedQueryCount` value returned by the [Gremlin query status API](gremlin-api-status.md). Note, however, that the `acceptedQueryCount` value returned by the [SPARQL query status API](sparql-api-status.md) includes all queries accepted since the server was started, including completed queries.
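
For example, given a parsed Gremlin status response, the queue count can be read directly. This is a sketch; the sample payload is abbreviated to the fields discussed above, and real responses contain more.

```python
# Sketch: read the current queue count from a Gremlin query status response.
def gremlin_queued_count(status: dict) -> int:
    # For the Gremlin status API, acceptedQueryCount reflects queries
    # currently in the queue (unlike SPARQL, where it is a running total).
    return status["acceptedQueryCount"]

sample = {"acceptedQueryCount": 8, "runningQueryCount": 2, "queries": []}
print(gremlin_queued_count(sample))  # 8
```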

## How query queuing can affect timeouts
<a name="access-graph-queuing-timeouts"></a>

As noted above, query latency includes the time a query spends in the queue as well as the time it takes to execute.

Because a query's timeout period is generally measured starting from when it enters the queue, a slow-moving queue can cause many queries to time out as soon as they are dequeued. To avoid this, don't queue up a large number of queries unless they can be executed rapidly.
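
A back-of-the-envelope check illustrates the point, under the simplifying assumption that queued queries drain evenly across worker threads. All numbers here are hypothetical.

```python
# Rough estimate: will a query at a given queue position time out before
# it finishes? Assumes the timeout clock starts at enqueue time and that
# the queue drains at avg_execution_ms per concurrency slot.
def times_out(position: int, concurrency: int,
              avg_execution_ms: float, timeout_ms: float) -> bool:
    estimated_wait_ms = (position / concurrency) * avg_execution_ms
    return estimated_wait_ms + avg_execution_ms > timeout_ms

# 4,000 queries ahead, 16 workers, 500 ms per query, 120-second timeout:
print(times_out(4000, 16, 500.0, 120_000.0))  # True: ~125 s of queue wait alone
```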

# Query plan cache in Amazon Neptune
<a name="access-graph-qpc"></a>

When a query is submitted to Neptune, the query string is parsed, optimized, and transformed into a query plan, which the engine then executes. Applications are often backed by common query patterns that are instantiated with different values. The query plan cache can reduce overall latency by caching query plans for such repeated patterns, avoiding the parsing and optimization steps.

The query plan cache can be used for **openCypher** queries, both parameterized and non-parameterized. It is enabled for read queries, over both HTTP and Bolt. It is **not** supported for openCypher mutation queries, or for Gremlin or SPARQL queries.

## How to force enable or disable query plan cache
<a name="access-graph-qpc-enable"></a>

The query plan cache is enabled by default for low-latency parameterized queries: a plan for a parameterized query is cached only when its latency is below the **100ms** threshold. You can override this behavior for any individual query (parameterized or not) with the query-level hint `QUERY:PLANCACHE`, which is specified in a `USING` clause and accepts `enabled` or `disabled` as its value.
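
For example, a small helper (hypothetical, not part of any Neptune SDK) can prepend the hint to an openCypher query string before it is submitted:

```python
# Sketch: prepend the plan cache query hint to an openCypher query.
# The hint syntax follows the description above.
def with_plan_cache(query: str, enabled: bool = True) -> str:
    value = "enabled" if enabled else "disabled"
    return f'Using QUERY:PLANCACHE "{value}" {query}'

print(with_plan_cache("MATCH (n) RETURN n LIMIT 1"))
# Using QUERY:PLANCACHE "enabled" MATCH (n) RETURN n LIMIT 1
```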

------
#### [ AWS CLI ]

Forcing plan to be cached or reused:

```
aws neptunedata execute-open-cypher-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "Using QUERY:PLANCACHE \"enabled\" MATCH(n) RETURN n LIMIT 1"
```

With parameters:

```
aws neptunedata execute-open-cypher-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "Using QUERY:PLANCACHE \"enabled\" RETURN \$arg" \
  --parameters '{"arg": 123}'
```

Forcing plan to be neither cached nor reused:

```
aws neptunedata execute-open-cypher-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "Using QUERY:PLANCACHE \"disabled\" MATCH(n) RETURN n LIMIT 1"
```

For more information, see [execute-open-cypher-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

# Forcing plan to be cached or reused
response = client.execute_open_cypher_query(
    openCypherQuery='Using QUERY:PLANCACHE "enabled" MATCH(n) RETURN n LIMIT 1'
)

print(response['results'])
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

Forcing plan to be cached or reused:

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=Using QUERY:PLANCACHE \"enabled\" MATCH(n) RETURN n LIMIT 1"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

Forcing plan to be cached or reused:

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=Using QUERY:PLANCACHE \"enabled\" MATCH(n) RETURN n LIMIT 1"
```

With parameters:

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=Using QUERY:PLANCACHE \"enabled\" RETURN \$arg" \
  -d "parameters={\"arg\": 123}"
```

Forcing plan to be neither cached nor reused:

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=Using QUERY:PLANCACHE \"disabled\" MATCH(n) RETURN n LIMIT 1"
```

------

## How to determine if a plan is cached or not
<a name="access-graph-qpc-status"></a>

For HTTP read queries, if a submitted query's plan was cached, the `explain` output shows details relevant to the query plan cache.

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-explain-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "Using QUERY:PLANCACHE \"enabled\" MATCH(n) RETURN n LIMIT 1" \
  --explain-mode details
```

For more information, see [execute-open-cypher-explain-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-explain-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_explain_query(
    openCypherQuery='Using QUERY:PLANCACHE "enabled" MATCH(n) RETURN n LIMIT 1',
    explainMode='details'
)

print(response['results'].read().decode('utf-8'))
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=Using QUERY:PLANCACHE \"enabled\" MATCH(n) RETURN n LIMIT 1" \
  -d "explain=details"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=Using QUERY:PLANCACHE \"enabled\" MATCH(n) RETURN n LIMIT 1" \
  -d "explain=details"
```

------

If the plan was cached, the `explain` output shows:

```
Query: <QUERY STRING>
Plan cached by request: <REQUEST ID OF FIRST TIME EXECUTION>
Plan cached at: <TIMESTAMP OF FIRST TIME EXECUTION>
Parameters: <PARAMETERS, IF QUERY IS PARAMETERIZED QUERY>
Plan cache hits: <NUMBER OF CACHE HITS FOR CACHED PLAN>
First query evaluation time: <LATENCY OF FIRST TIME EXECUTION>

The query has been executed based on a cached query plan. Detailed explain with operator runtime statistics can be obtained by running the query with plan cache disabled (using HTTP parameter planCache=disabled).
```

The explain feature is not supported when using Bolt.
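
Because these details arrive as plain text, a small sketch can extract, say, the hit counter from output shaped like the sample above. The regular expression assumes the `Plan cache hits:` label shown there.

```python
import re

# Sketch: pull the cache-hit counter out of explain output shaped like
# the sample above. Returns None when no cache information is present.
def plan_cache_hits(explain_text: str):
    match = re.search(r"Plan cache hits:\s*(\d+)", explain_text)
    return int(match.group(1)) if match else None

sample = "Query: MATCH (n) RETURN n LIMIT 1\nPlan cache hits: 12\n"
print(plan_cache_hits(sample))  # 12
```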

## Eviction
<a name="access-graph-qpc-eviction"></a>

A query plan is evicted when its time to live (TTL) expires or when the maximum number of cached query plans has been reached. A cache hit refreshes the plan's TTL. The defaults are:
+ Maximum number of cached plans per instance: 1,000.
+ TTL: 300,000 milliseconds (5 minutes). Each cache hit resets the TTL back to 5 minutes.
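
These rules can be modeled as a small TTL cache. The sketch below is an illustrative model only, not Neptune's implementation; in particular, the eviction order when the cache is full is an assumption.

```python
import time

# Illustrative model of the eviction rules above: entries expire after a
# TTL (refreshed on every hit), and the cache holds a bounded number of plans.
class PlanCache:
    def __init__(self, max_plans=1000, ttl_s=300.0, clock=time.monotonic):
        self.max_plans, self.ttl_s, self.clock = max_plans, ttl_s, clock
        self._plans = {}  # query pattern -> (plan, expiry timestamp)

    def get(self, pattern):
        entry = self._plans.get(pattern)
        if entry is None or entry[1] < self.clock():
            self._plans.pop(pattern, None)  # expired or absent
            return None
        plan = entry[0]
        self._plans[pattern] = (plan, self.clock() + self.ttl_s)  # hit refreshes TTL
        return plan

    def put(self, pattern, plan):
        if len(self._plans) >= self.max_plans:
            # make room by evicting the entry closest to expiry (an assumption)
            del self._plans[min(self._plans, key=lambda k: self._plans[k][1])]
        self._plans[pattern] = (plan, self.clock() + self.ttl_s)
```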

## Conditions causing the plan not to be cached
<a name="access-graph-qpc-conditions"></a>

The query plan cache is not used under the following conditions:

1.  When a query is submitted using the query hint `QUERY:PLANCACHE "disabled"`. To enable the query plan cache, remove `QUERY:PLANCACHE "disabled"` and re-run the query.

1.  If the submitted query is not a parameterized query and does not contain the hint `QUERY:PLANCACHE "enabled"`.

1.  If the query evaluation time exceeds the latency threshold. Such a query is considered a long-running query that would not benefit from the query plan cache, and its plan is not cached.

1.  If the query contains a pattern that doesn't return any results.
   +  For example, `MATCH (n:nonexistentLabel) return n` when there are zero nodes with the specified label.
   +  For example, `MATCH (n {name: $param}) return n` with `parameters={"param": "abcde"}` when there are zero nodes with `name=abcde`.

1.  If the query parameter is a composite type, such as a `list` or a `map`. 

   ```
   curl https://your-neptune-endpoint:port/openCypher \
     -d "query=Using QUERY:PLANCACHE \"enabled\" RETURN \$arg" \
     -d "parameters={\"arg\": [1, 2, 3]}"
   
   curl https://your-neptune-endpoint:port/openCypher \
     -d "query=Using QUERY:PLANCACHE \"enabled\" RETURN \$arg" \
     -d "parameters={\"arg\": {\"a\": 1}}"
   ```

1.  If the query parameter is a string that has not been part of a data load or data insertion operation. For example, if `CREATE (n {name: "X"})` is run to insert `"X"`, then `RETURN "X"` is cached, while `RETURN "Y"` is not, because `"Y"` has not been inserted and does not exist in the database.
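
The composite-type condition (list or map parameters) can be expressed as a quick client-side check. This is a hypothetical helper, not a Neptune API.

```python
# Sketch of the composite-parameter condition: a plan is not cached when
# any parameter is a list or a map.
def params_allow_caching(parameters: dict) -> bool:
    return not any(isinstance(v, (list, dict)) for v in parameters.values())

print(params_allow_caching({"arg": 123}))        # True
print(params_allow_caching({"arg": [1, 2, 3]}))  # False: list parameter
print(params_allow_caching({"arg": {"a": 1}}))   # False: map parameter
```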

# Inject a Custom ID Into a Neptune Gremlin or SPARQL Query
<a name="features-query-id"></a>

By default, Neptune assigns a unique `queryId` value to every query. You can use this ID to get information about a running query (see [Gremlin query status API](gremlin-api-status.md) or [SPARQL query status API](sparql-api-status.md)), or cancel it (see [Gremlin query cancellation](gremlin-api-status-cancel.md) or [SPARQL query cancellation](sparql-api-status-cancel.md)).

Neptune also lets you specify your own `queryId` value for a Gremlin or SPARQL query, either in the HTTP header, or for a SPARQL query by using the `queryId` query hint. Assigning your own `queryId` makes it easier to keep track of a query so that you can get its status or cancel it.
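
For example, the body of a Gremlin HTTP request carrying a custom `queryId` can be composed with Python's standard library. This is a sketch; the field names match the curl examples in this section.

```python
import json
import uuid

# Sketch: build the JSON body for a Gremlin HTTP request, generating a
# random queryId when none is supplied.
def gremlin_request_body(query: str, query_id: str = None) -> str:
    return json.dumps({"gremlin": query,
                       "queryId": query_id or str(uuid.uuid4())})

print(gremlin_request_body("g.V().limit(1).count()",
                           "4d5c4fae-aa30-41cf-9e1f-91e6b7dd6f47"))
```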

## Injecting a Custom `queryId` Value Using the HTTP Header
<a name="features-query-id-header"></a>

For both Gremlin and SPARQL, the HTTP header can be used to inject your own `queryId` value into a query.

**Gremlin Example**

```
curl -XPOST https://your-neptune-endpoint:port \
    -d "{\"gremlin\": \
        \"g.V().limit(1).count()\" , \
        \"queryId\":\"4d5c4fae-aa30-41cf-9e1f-91e6b7dd6f47\"  }"
```

**SPARQL Example**

```
curl https://your-neptune-endpoint:port/sparql \
    -d "query=SELECT * WHERE { ?s ?p ?o } " \
       --data-urlencode \
       "queryId=4d5c4fae-aa30-41cf-9e1f-91e6b7dd6f47"
```

## Injecting a Custom `queryId` Value Using a SPARQL Query Hint
<a name="features-query-id-hint"></a>

Here is an example of how you would use the SPARQL `queryId` query hint to inject a custom `queryId` value into a SPARQL query:

```
curl https://your-neptune-endpoint:port/sparql \
    -d "query=PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#> \
       SELECT * WHERE { hint:Query hint:queryId \"4d5c4fae-aa30-41cf-9e1f-91e6b7dd6f47\" \
       {?s ?p ?o}}"
```

## Using the `queryId` Value to Check Query Status
<a name="features-query-id-check-status"></a>

**Gremlin Example**

```
curl https://your-neptune-endpoint:port/gremlin/status \
    -d "queryId=4d5c4fae-aa30-41cf-9e1f-91e6b7dd6f47"
```

**SPARQL Example**

```
curl https://your-neptune-endpoint:port/sparql/status \
    -d "queryId=4d5c4fae-aa30-41cf-9e1f-91e6b7dd6f47"
```

# Accessing a Neptune graph with Gremlin
<a name="access-graph-gremlin"></a>

Amazon Neptune is compatible with Apache TinkerPop and Gremlin. This means that you can connect to a Neptune DB instance and use the Gremlin traversal language to query the graph (see [The Graph](https://tinkerpop.apache.org/docs/current/reference/#graph) in the Apache TinkerPop documentation). For differences in the Neptune implementation of Gremlin, see [Gremlin standards compliance](access-graph-gremlin-differences.md).

 A *traversal* in Gremlin is a series of chained steps. It starts at a vertex (or edge). It walks the graph by following the outgoing edges of each vertex and then the outgoing edges of those vertices. Each step is an operation in the traversal. For more information, see [The Traversal](https://tinkerpop.apache.org/docs/current/reference/#traversal) in the TinkerPop documentation.

Different Neptune engine versions support different Gremlin versions. Check the [engine release page](engine-releases.md) for the Neptune version you are running to determine which Gremlin release it supports, or consult the following table, which lists the earliest and latest TinkerPop versions supported by each Neptune engine version:


| Neptune Engine Version | Minimum TinkerPop Version | Maximum TinkerPop Version | 
| --- | --- | --- | 
| `1.3.2.0 and newer` | `3.7.1` | `3.7.3` | 
| `1.3.1.0` | `3.6.2` | `3.6.5` | 
| `1.3.0.0` | `3.6.2` | `3.6.4` | 
| `1.2.1.0 <= 1.2.1.2` | `3.6.2` | `3.6.2` | 
| `1.1.1.0 <= 1.2.0.2` | `3.5.5` | `3.5.6` | 
| `1.1.0.0 and older` | `(deprecated)` | `(deprecated)` | 

TinkerPop clients are usually backward compatible within a series (for example, `3.6.x` or `3.7.x`). While they can often work across series boundaries, the table above lists the recommended version combinations for the best possible experience and compatibility. Unless otherwise advised, it is generally best to adhere to these guidelines and upgrade client applications to match the version of TinkerPop you are using.
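
The within-a-series rule can be sketched as a simple version check. This is illustrative only; real compatibility is governed by the table above and TinkerPop's own guarantees.

```python
# Sketch: two TinkerPop versions are in the same series when their first
# two components match (3.7.1 and 3.7.3 are both in the 3.7.x series).
def same_series(client_version: str, server_version: str) -> bool:
    return client_version.split(".")[:2] == server_version.split(".")[:2]

print(same_series("3.7.1", "3.7.3"))  # True
print(same_series("3.6.5", "3.7.3"))  # False: crosses a series boundary
```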

When upgrading TinkerPop versions, always refer to [TinkerPop's upgrade documentation](http://tinkerpop.apache.org/docs/current/upgrade/), which identifies both new features you can take advantage of and issues to be aware of as you approach your upgrade. You should typically expect existing queries and features to work after an upgrade unless something in particular is called out as an issue to consider. Finally, note that if the version you upgrade to includes a feature newer than what Neptune supports, you may not be able to use it.

There are Gremlin language variants and support for Gremlin access in various programming languages. For more information, see [On Gremlin Language Variants](https://tinkerpop.apache.org/docs/current/reference/#gremlin-drivers-variants) in the TinkerPop documentation.

This documentation describes how to access Neptune with the following variants and programming languages:
+ [Set up the Gremlin console to connect to a Neptune DB instance](access-graph-gremlin-console.md)
+ [Using the HTTPS REST endpoint to connect to a Neptune DB instance](access-graph-gremlin-rest.md)
+ [Java-based Gremlin clients to use with Amazon Neptune](access-graph-gremlin-client.md)
+ [Using Python to connect to a Neptune DB instance](access-graph-gremlin-python.md)
+ [Using .NET to connect to a Neptune DB instance](access-graph-gremlin-dotnet.md)
+ [Using Node.js to connect to a Neptune DB instance](access-graph-gremlin-node-js.md)
+ [Using Go to connect to a Neptune DB instance](access-graph-gremlin-go.md)

As discussed in [Encrypting connections to your Amazon Neptune database with SSL/HTTPS](security-ssl.md), you must use Transport Layer Security/Secure Sockets Layer (TLS/SSL) when connecting to Neptune in all AWS Regions.

Before you begin, you must have the following:
+ A Neptune DB instance. For information about creating a Neptune DB instance, see [Creating an Amazon Neptune cluster](get-started-create-cluster.md).
+ An Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

For more information about loading data into Neptune, including prerequisites, loading formats, and load parameters, see [Loading data into Amazon Neptune](load-data.md).

**Topics**
+ [Set up the Gremlin console to connect to a Neptune DB instance](access-graph-gremlin-console.md)
+ [Using the HTTPS REST endpoint to connect to a Neptune DB instance](access-graph-gremlin-rest.md)
+ [Java-based Gremlin clients to use with Amazon Neptune](access-graph-gremlin-client.md)
+ [Using Python to connect to a Neptune DB instance](access-graph-gremlin-python.md)
+ [Using .NET to connect to a Neptune DB instance](access-graph-gremlin-dotnet.md)
+ [Using Node.js to connect to a Neptune DB instance](access-graph-gremlin-node-js.md)
+ [Using Go to connect to a Neptune DB instance](access-graph-gremlin-go.md)
+ [Using the AWS SDK to run Gremlin queries](access-graph-gremlin-sdk.md)
+ [Gremlin query hints](gremlin-query-hints.md)
+ [Gremlin query status API](gremlin-api-status.md)
+ [Gremlin query cancellation](gremlin-api-status-cancel.md)
+ [Support for Gremlin script-based sessions](access-graph-gremlin-sessions.md)
+ [Gremlin transactions in Neptune](access-graph-gremlin-transactions.md)
+ [Using the Gremlin API with Amazon Neptune](gremlin-api-reference.md)
+ [Caching query results in Amazon Neptune Gremlin](gremlin-results-cache.md)
+ [Making efficient upserts with Gremlin `mergeV()` and `mergeE()` steps](gremlin-efficient-upserts.md)
+ [Making efficient Gremlin upserts with `fold()/coalesce()/unfold()`](gremlin-efficient-upserts-pre-3.6.md)
+ [Analyzing Neptune query execution using Gremlin `explain`](gremlin-explain.md)
+ [Using Gremlin with the Neptune DFE query engine](gremlin-with-dfe.md)

# Set up the Gremlin console to connect to a Neptune DB instance
<a name="access-graph-gremlin-console"></a>

The Gremlin Console allows you to experiment with TinkerPop graphs and queries in a REPL (read-eval-print loop) environment.

## Installing the Gremlin console and connecting to it in the usual way
<a name="access-graph-gremlin-console-usual-connect"></a>

You can use the Gremlin Console to connect to a remote graph database. The following section walks you through installing and configuring the Gremlin Console to connect remotely to a Neptune DB instance. You must follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

For help connecting to Neptune with SSL/TLS (which is required), see [SSL/TLS configuration](access-graph-gremlin-java.md#access-graph-gremlin-java-ssl).

**Note**  
If you have [IAM authentication enabled](iam-auth-enable.md) on your Neptune DB cluster, follow the instructions in [Connecting to Amazon Neptune databases using IAM authentication with Gremlin console](iam-auth-connecting-gremlin-console.md) to install the Gremlin console rather than the instructions here.

**To install the Gremlin Console and connect to Neptune**

1. The Gremlin Console binaries require Java 8 or Java 11. These instructions assume usage of Java 11. You can install Java 11 on your EC2 instance as follows:
   + If you're using [Amazon Linux 2 (AL2)](https://aws.amazon.com/amazon-linux-2):

     ```
     sudo amazon-linux-extras install java-openjdk11
     ```
   + If you're using [Amazon Linux 2023 (AL2023)](https://docs.aws.amazon.com/linux/al2023/ug/what-is-amazon-linux.html):

     ```
     sudo yum install java-11-amazon-corretto-devel
     ```
   + For other distributions, use whichever of the following is appropriate:

     ```
     sudo yum install java-11-openjdk-devel
     ```

     or:

     ```
     sudo apt-get install openjdk-11-jdk
     ```

1. Enter the following to set Java 11 as the default runtime on your EC2 instance.

   ```
   sudo /usr/sbin/alternatives --config java
   ```

   When prompted, enter the number for Java 11.

1. Download the appropriate version of the Gremlin Console from the Apache web site. Check [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md) to determine which Gremlin version your version of Neptune supports. For example, if you need version 3.7.2, you can download the [Gremlin console](https://archive.apache.org/dist/tinkerpop/3.7.2/apache-tinkerpop-gremlin-console-3.7.2-bin.zip) from the [Apache TinkerPop](https://tinkerpop.apache.org/download.html) website onto your EC2 instance like this:

   ```
   wget https://archive.apache.org/dist/tinkerpop/3.7.2/apache-tinkerpop-gremlin-console-3.7.2-bin.zip
   ```

1. Unzip the Gremlin Console zip file.

   ```
   unzip apache-tinkerpop-gremlin-console-3.7.2-bin.zip
   ```

1. Change directories into the unzipped directory.

   ```
   cd apache-tinkerpop-gremlin-console-3.7.2
   ```

1. In the `conf` subdirectory of the extracted directory, create a file named `neptune-remote.yaml` with the following text. Replace *your-neptune-endpoint* with the hostname or IP address of your Neptune DB instance. The square brackets (`[ ]`) are required.
**Note**  
For information about finding the hostname of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section.

   ```
   hosts: [your-neptune-endpoint]
   port: 8182
   connectionPool: { enableSsl: true }
   serializer: { className: org.apache.tinkerpop.gremlin.util.ser.GraphBinaryMessageSerializerV1,
                 config: { serializeResultToString: true }}
   ```
**Note**  
Serializers were moved from the `gremlin-driver` module to the new `gremlin-util` module in TinkerPop 3.7.0. The package changed from `org.apache.tinkerpop.gremlin.driver.ser` to `org.apache.tinkerpop.gremlin.util.ser`.

1. In a terminal, navigate to the Gremlin Console directory (`apache-tinkerpop-gremlin-console-3.7.2`), and then enter the following command to run the Gremlin Console.

   ```
   bin/gremlin.sh
   ```

   You should see the following output:

   ```
            \,,,/
            (o o)
   -----oOOo-(3)-oOOo-----
   plugin activated: tinkerpop.server
   plugin activated: tinkerpop.utilities
   plugin activated: tinkerpop.tinkergraph
   gremlin>
   ```

   You are now at the `gremlin>` prompt. You will enter the remaining steps at this prompt.

1. At the `gremlin>` prompt, enter the following to connect to the Neptune DB instance.

   ```
   :remote connect tinkerpop.server conf/neptune-remote.yaml
   ```

1. At the `gremlin>` prompt, enter the following to switch to remote mode. This sends all Gremlin queries to the remote connection.

   ```
   :remote console
   ```

1. Enter the following to send a query to the Gremlin Graph.

   ```
   g.V().limit(1)
   ```

1. When you are finished, enter the following to exit the Gremlin Console.

   ```
   :exit
   ```

**Note**  
Use a semicolon (`;`) or a newline character (`\n`) to separate each statement.   
Each traversal preceding the final traversal must end in `next()` to be executed. Only the data from the final traversal is returned.

For more information on the Neptune implementation of Gremlin, see [Gremlin standards compliance in Amazon Neptune](access-graph-gremlin-differences.md).

## An alternate way to connect to the Gremlin console
<a name="access-graph-gremlin-console-connect"></a>

**Drawbacks of the normal connection approach**

The most common way to connect to the Gremlin console is the one explained above, using commands like this at the `gremlin>` prompt:

```
gremlin> :remote connect tinkerpop.server conf/(file name).yaml
gremlin> :remote console
```

This works well, and lets you send queries to Neptune. However, it takes the Groovy script engine out of the loop, so Neptune treats all queries as pure Gremlin. This means that the following query forms fail:

```
gremlin> 1 + 1
gremlin> x = g.V().count()
```

The closest you can get to using a variable when connected this way is to use the `result` variable maintained by the console and send the query using `:>`, like this:

```
gremlin> :remote console
==>All scripts will now be evaluated locally - type ':remote console' to return to remote mode for Gremlin Server - [krl-1-cluster.cluster-ro-cm9t6tfwbtsr.us-east-1.neptune.amazonaws.com/172.31.19.217:8182]
gremlin> :> g.V().count()
==>4249

gremlin> println(result)
[result{object=4249 class=java.lang.Long}]

gremlin> println(result['object'])
[4249]
```

 

**A different way to connect**

You can also connect to the Gremlin console in a different way, which you may find more convenient:

```
gremlin> g = traversal().withRemote('conf/neptune.properties')
```

Here `neptune.properties` takes this form:

```
gremlin.remote.remoteConnectionClass=org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection
gremlin.remote.driver.clusterFile=conf/my-cluster.yaml
gremlin.remote.driver.sourceName=g
```

The `my-cluster.yaml` file should look like this:

```
hosts: [my-cluster-abcdefghijk.us-east-1.neptune.amazonaws.com]
port: 8182
serializer: { className: org.apache.tinkerpop.gremlin.util.ser.GraphBinaryMessageSerializerV1,
              config: { serializeResultToString: false } }
connectionPool: { enableSsl: true }
```

**Note**  
Serializers were moved from the `gremlin-driver` module to the new `gremlin-util` module in TinkerPop 3.7.0. The package changed from `org.apache.tinkerpop.gremlin.driver.ser` to `org.apache.tinkerpop.gremlin.util.ser`.

Configuring the Gremlin console connection like that lets you make the following kinds of queries successfully:

```
gremlin> 1+1
==>2

gremlin> x=g.V().count().next()
==>4249

gremlin> println("The answer was ${x}")
The answer was 4249
```

You can avoid displaying the result, like this:

```
gremlin> x=g.V().count().next();[]
gremlin> println(x)
4249
```

All the usual ways of querying (without the terminal step) continue to work. For example:

```
gremlin> g.V().count()
==>4249
```

You can even use the [`io()` step](https://tinkerpop.apache.org/docs/current/reference/#io-step) to load a file with this kind of connection.

## IAM authentication
<a name="access-graph-gremlin-console-iam"></a>

Neptune supports [IAM authentication](iam-auth-enable.md) to control access to your DB cluster. If you have IAM authentication enabled, you need to use Signature Version 4 signing to authenticate your requests. For detailed instructions and code examples for connecting from the Gremlin console, see [Connecting to Amazon Neptune databases using IAM authentication with Gremlin console](iam-auth-connecting-gremlin-console.md).

# Using the HTTPS REST endpoint to connect to a Neptune DB instance
<a name="access-graph-gremlin-rest"></a>

Amazon Neptune provides an HTTPS endpoint for Gremlin queries. The REST interface is compatible with whatever Gremlin version your DB cluster is using (see the [engine release page](engine-releases.md) of the Neptune engine version you are running to determine which Gremlin release it supports).

**Note**  
As discussed in [Encrypting connections to your Amazon Neptune database with SSL/HTTPS](security-ssl.md), Neptune now requires that you connect using HTTPS instead of HTTP. In addition, Neptune does not currently support HTTP/2 for REST API requests. Clients must use HTTP/1.1 when connecting to endpoints.

The following instructions walk you through connecting to the Gremlin endpoint using the `curl` command and HTTPS. You must follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

The HTTPS endpoint for Gremlin queries to a Neptune DB instance is `https://your-neptune-endpoint:port/gremlin`.

**Note**  
For information about finding the hostname of your Neptune DB instance, see [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md).

## To connect to Neptune using the HTTPS REST endpoint
<a name="access-graph-gremlin-rest-connect"></a>

The following examples show how to submit a Gremlin query to the REST endpoint. You can use the AWS SDK, the AWS CLI, or **curl**.

------
#### [ AWS CLI ]

```
aws neptunedata execute-gremlin-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --gremlin-query "g.V().limit(1)"
```

For more information, see [execute-gremlin-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-gremlin-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
import json
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_gremlin_query(
    gremlinQuery='g.V().limit(1)',
    serializer='application/vnd.gremlin-v3.0+json;types=false'
)

print(json.dumps(response['result'], indent=2))
```

For AWS SDK examples in other languages like Java, .NET, and more, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d '{"gremlin":"g.V().limit(1)"}'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).
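Under the hood, **awscurl** computes an AWS Signature Version 4 over the request before sending it. The following Python sketch shows the shape of that computation for a Gremlin POST; the credentials, endpoint, and Region are placeholders, and in practice you should rely on **awscurl** or an SDK signer rather than hand-rolling this:

```python
# A minimal sketch of the SigV4 signing that awscurl performs under the hood.
# All credential and endpoint values here are placeholders.
import datetime
import hashlib
import hmac

def _sign(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def _signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    # Derive the signing key: date -> region -> service -> "aws4_request".
    k_date = _sign(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _sign(k_date, region)
    k_service = _sign(k_region, service)
    return _sign(k_service, "aws4_request")

def sigv4_headers(host, region, payload, access_key, secret_key):
    service = "neptune-db"  # the service name Neptune IAM auth signs against
    now = datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    # Canonical request for POST /gremlin with only host and x-amz-date signed.
    payload_hash = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    signed_headers = "host;x-amz-date"
    canonical_request = "\n".join(
        ["POST", "/gremlin", "", canonical_headers, signed_headers, payload_hash]
    )

    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    signature = hmac.new(
        _signing_key(secret_key, date_stamp, region, service),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()

    return {
        "x-amz-date": amz_date,
        "Authorization": (
            f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
            f"SignedHeaders={signed_headers}, Signature={signature}"
        ),
    }

headers = sigv4_headers(
    "your-neptune-endpoint:8182", "us-east-1",
    '{"gremlin":"g.V().limit(1)"}', "AKIDEXAMPLE", "examplesecretkey")
print(headers["Authorization"][:16])
```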

------
#### [ curl ]

The following example uses **curl** to submit a Gremlin query through HTTP **POST**. The query is submitted in JSON format in the body of the post as the `gremlin` property.

```
curl -X POST -d '{"gremlin":"g.V().limit(1)"}' https://your-neptune-endpoint:port/gremlin
```

Although HTTP **POST** requests are recommended for sending Gremlin queries, it is also possible to use HTTP **GET** requests:

```
curl -G "https://your-neptune-endpoint:port/gremlin?gremlin=g.V().count()"
```

------

These examples return the first vertex in the graph by using the `g.V().limit(1)` traversal (the **GET** example returns a vertex count instead). You can query for something else by replacing it with another Gremlin traversal.

**Important**  
By default, the REST endpoint returns all results in a single JSON result set. If this result set is too large, an `OutOfMemoryError` exception can occur on the Neptune DB instance.  
You can avoid this by enabling chunked responses (results returned in a series of separate responses). See [Use optional HTTP trailing headers to enable multi-part Gremlin responses](access-graph-gremlin-rest-trailing-headers.md).

**Note**  
Neptune does not support the `bindings` property.
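Because `bindings` isn't accepted, any parameter has to be inlined into the Gremlin string itself before the request is sent. A minimal Python sketch of that inlining follows; note that inlining bypasses the injection protection that bindings would normally provide, so the escaping shown is illustrative and untrusted input should be sanitized carefully:

```python
import json

def gremlin_payload(vertex_id: str) -> str:
    # Instead of {"gremlin": "g.V(vid)", "bindings": {"vid": ...}} -- which
    # Neptune rejects -- inline the value into the query text itself.
    # Escape backslashes and single quotes so the value cannot break out
    # of the Gremlin string literal.
    safe_id = vertex_id.replace("\\", "\\\\").replace("'", "\\'")
    return json.dumps({"gremlin": f"g.V('{safe_id}')"})

print(gremlin_payload("CustomId1"))
```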

# Use optional HTTP trailing headers to enable multi-part Gremlin responses
<a name="access-graph-gremlin-rest-trailing-headers"></a>

By default, the HTTP response to Gremlin queries is returned in a single JSON result set. In the case of a very large result set, this can cause an `OutOfMemoryError` exception on the DB instance.

However, you can enable *chunked* responses (responses that are returned in multiple separate parts). You do this by including a transfer-encoding (TE) trailers header (`te: trailers`) in your request. See [the MDN page about TE request headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/TE) for more information about TE headers.

When a response is returned in multiple parts, it can be hard to diagnose a problem that occurs after the first part is received, since the first part arrives with an HTTP status code of `200` (OK). A subsequent failure usually results in a message body containing a corrupt response, at the end of which Neptune appends an error message.

To make detection and diagnosis of this kind of failure easier, Neptune also includes two new header fields within the trailing headers of every response chunk:
+ `X-Neptune-Status`  –   contains the response code followed by a short name. For instance, on success the trailing header is `X-Neptune-Status: 200 OK`. In the case of failure, the response code is one of the [Neptune engine error codes](errors-engine-codes.md), such as `X-Neptune-Status: 500 TimeLimitExceededException`.
+ `X-Neptune-Detail`  –   is empty for successful requests. In the case of errors, it contains the JSON error message. Because only ASCII characters are allowed in HTTP header values, the JSON string is URL encoded.
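A client that has collected the trailing headers of a chunk could interpret them as follows. This is a minimal sketch: the trailer values shown are illustrative, and the `X-Neptune-Detail` JSON is URL-decoded as described above:

```python
from urllib.parse import unquote

def parse_neptune_trailers(trailers: dict) -> tuple:
    # Split "500 TimeLimitExceededException" into (500, "TimeLimitExceededException"),
    # and URL-decode the JSON error detail, which Neptune percent-encodes
    # because HTTP header values are restricted to ASCII.
    code_str, _, name = trailers.get("X-Neptune-Status", "").partition(" ")
    detail = unquote(trailers.get("X-Neptune-Detail", ""))
    return int(code_str), name, detail

# Example trailing headers from a failed chunk (values are illustrative):
code, name, detail = parse_neptune_trailers({
    "X-Neptune-Status": "500 TimeLimitExceededException",
    "X-Neptune-Detail": "%7B%22detailedMessage%22%3A%22A%20timeout%20occurred%22%7D",
})
print(code, name, detail)
```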

**Note**  
Neptune does not currently support `gzip` compression of chunked responses. If the client requests both chunked encoding and compression at the same time, Neptune skips the compression.

# Java-based Gremlin clients to use with Amazon Neptune
<a name="access-graph-gremlin-client"></a>

You can use either of two open-source Java-based Gremlin clients with Amazon Neptune: the [Apache TinkerPop Java Gremlin client](https://search.maven.org/artifact/org.apache.tinkerpop/gremlin-driver), or the [Gremlin client for Amazon Neptune](https://search.maven.org/artifact/software.amazon.neptune/gremlin-client).

## Apache TinkerPop Java Gremlin client
<a name="access-graph-gremlin-java-driver"></a>

The Apache TinkerPop Java [gremlin-driver](https://tinkerpop.apache.org/docs/current/reference/#gremlin-java) is the standard, official Gremlin client that works with any TinkerPop-enabled graph database. Use this client when you need maximum compatibility with the broader TinkerPop development space, when you're working with multiple graph database systems, or when you don't require the advanced cluster management and load balancing features specific to Neptune. This client is also suitable for simple applications that connect to a single Neptune instance or when you prefer to handle load balancing at the infrastructure level rather than within the client.

**Important**  
Choosing the correct Apache TinkerPop Gremlin driver version is critical for compatibility with your Neptune engine version. Using an incompatible version can result in connection failures or unexpected behavior. For detailed version compatibility information, see [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md).

**Note**  
The table that helps you determine the correct Apache TinkerPop version to use with Neptune was located on this page for many years. It has been moved to [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md), where it serves as a central reference for all programming languages that TinkerPop supports.

## Gremlin Java client for Amazon Neptune
<a name="access-graph-neptune-gremlin-client"></a>

The Gremlin client for Amazon Neptune is an [open-source Java-based Gremlin client](https://github.com/aws/neptune-gremlin-client) that acts as a drop-in replacement for the standard TinkerPop Java client.

The Neptune Gremlin client is optimized for Neptune clusters. It lets you manage traffic distribution across multiple instances in a cluster, and adapts to changes in cluster topology when you add or remove a replica. You can even configure the client to distribute requests across a subset of instances in your cluster, based on role, instance type, availability zone (AZ), or tags associated with instances.

The [latest version of the Neptune Gremlin Java client](https://search.maven.org/artifact/software.amazon.neptune/gremlin-client) is available on Maven Central.

For more information about the Neptune Gremlin Java client, see [this blog post](https://aws.amazon.com/blogs/database/load-balance-graph-queries-using-the-amazon-neptune-gremlin-client/). For code samples and demos, check out the [client's GitHub project](https://github.com/aws/neptune-gremlin-client).

When choosing the version of the Neptune Gremlin client, you need to consider the underlying TinkerPop version in relation to your Neptune engine version. Refer to the compatibility table at [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md) to determine the correct TinkerPop version for your Neptune engine, then use the following table to select the appropriate Neptune Gremlin client version:


**Neptune Gremlin client version compatibility**  

| Neptune Gremlin client version | TinkerPop version | 
| --- | --- | 
| 3.x | 3.7.x (AWS SDK for Java 2.x/1.x) | 
| 2.1.x | 3.7.x (AWS SDK for Java 1.x) | 
| 2.0.x | 3.6.x | 
| 1.12 | 3.5.x | 
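For illustration, the table above can be expressed as a simple lookup. This is a sketch: the version pairs are a snapshot of the table, not a live source of truth:

```python
# The compatibility table above, expressed as a lookup.
# Keys are Neptune Gremlin client lines; values are TinkerPop lines.
CLIENT_COMPATIBILITY = {
    "3.x": "3.7.x",
    "2.1.x": "3.7.x",
    "2.0.x": "3.6.x",
    "1.12": "3.5.x",
}

def clients_for_tinkerpop(tinkerpop_version: str) -> list:
    """Return the Neptune Gremlin client lines compatible with a TinkerPop line."""
    return [c for c, tp in CLIENT_COMPATIBILITY.items() if tp == tinkerpop_version]

print(clients_for_tinkerpop("3.7.x"))
```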

# Using a Java client to connect to a Neptune DB instance
<a name="access-graph-gremlin-java"></a>

The following section walks you through running a complete Java sample that connects to a Neptune DB instance and performs a Gremlin traversal using the Apache TinkerPop Gremlin client.

These instructions must be followed from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

**To connect to Neptune using Java**

1. Install Apache Maven on your EC2 instance. If using Amazon Linux 2023 (preferred), use:

   ```
   sudo dnf update -y
   sudo dnf install maven -y
   ```

   If using Amazon Linux 2, download the latest binary from [https://maven.apache.org/download.cgi](https://maven.apache.org/download.cgi):

   ```
   sudo yum remove maven -y
   wget https://dlcdn.apache.org/maven/maven-3/<version>/binaries/apache-maven-<version>-bin.tar.gz
   sudo tar -xzf apache-maven-<version>-bin.tar.gz -C /opt/
   sudo ln -sf /opt/apache-maven-<version> /opt/maven
   echo 'export MAVEN_HOME=/opt/maven' >> ~/.bashrc
   echo 'export PATH=$MAVEN_HOME/bin:$PATH' >> ~/.bashrc
   source ~/.bashrc
   ```

1. **Install Java.** The Gremlin libraries need Java 8 or 11. You can install Java 11 as follows:
   + If you're using [Amazon Linux 2 (AL2)](https://aws.amazon.com/amazon-linux-2):

     ```
     sudo amazon-linux-extras install java-openjdk11
     ```
   + If you're using [Amazon Linux 2023 (AL2023)](https://docs.aws.amazon.com/linux/al2023/ug/what-is-amazon-linux.html):

     ```
     sudo yum install java-11-amazon-corretto-devel
     ```
   + For other distributions, use whichever of the following is appropriate:

     ```
     sudo yum install java-11-openjdk-devel
     ```

     or:

     ```
     sudo apt-get install openjdk-11-jdk
     ```

1. **Set Java 11 as the default runtime on your EC2 instance:** Enter the following command:

   ```
   sudo /usr/sbin/alternatives --config java
   ```

   When prompted, enter the number for Java 11.

1. **Create a new directory named `gremlinjava`:**

   ```
   mkdir gremlinjava
   cd gremlinjava
   ```

1. In the `gremlinjava` directory, create a `pom.xml` file, and then open it in a text editor:

   ```
   nano pom.xml
   ```

1. Copy the following into the `pom.xml` file and save it:

   ```
   <project xmlns="http://maven.apache.org/POM/4.0.0"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
     <properties>
       <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
     </properties>
     <modelVersion>4.0.0</modelVersion>
     <groupId>com.amazonaws</groupId>
     <artifactId>GremlinExample</artifactId>
     <packaging>jar</packaging>
     <version>1.0-SNAPSHOT</version>
     <name>GremlinExample</name>
     <url>https://maven.apache.org</url>
     <dependencies>
       <dependency>
         <groupId>org.apache.tinkerpop</groupId>
         <artifactId>gremlin-driver</artifactId>
         <version>3.7.2</version>
       </dependency>
       <dependency>
         <groupId>org.slf4j</groupId>
         <artifactId>slf4j-jdk14</artifactId>
         <version>1.7.25</version>
       </dependency>
     </dependencies>
     <build>
       <plugins>
         <plugin>
           <groupId>org.apache.maven.plugins</groupId>
           <artifactId>maven-compiler-plugin</artifactId>
           <version>3.8.1</version>
           <configuration>
             <source>11</source>
             <target>11</target>
           </configuration>
         </plugin>
           <plugin>
             <groupId>org.codehaus.mojo</groupId>
             <artifactId>exec-maven-plugin</artifactId>
             <version>1.3</version>
             <configuration>
               <executable>java</executable>
               <arguments>
                 <argument>-classpath</argument>
                 <classpath/>
                 <argument>com.amazonaws.App</argument>
               </arguments>
               <mainClass>com.amazonaws.App</mainClass>
               <complianceLevel>1.11</complianceLevel>
               <killAfter>-1</killAfter>
             </configuration>
           </plugin>
       </plugins>
     </build>
   </project>
   ```
**Note**  
If you are modifying an existing Maven project, the required dependency is the `gremlin-driver` artifact in the preceding code.

1. Create subdirectories for the example source code (`src/main/java/com/amazonaws/`) by typing the following at the command line:

   ```
   mkdir -p src/main/java/com/amazonaws/
   ```

1. In the `src/main/java/com/amazonaws/` directory, create a file named `App.java`, and then open it in a text editor.

   ```
   nano src/main/java/com/amazonaws/App.java
   ```

1. Copy the following into the `App.java` file. Replace *your-neptune-endpoint* with the address of your Neptune DB instance. Do *not* include the `https://` prefix in the `addContactPoint` method.
**Note**  
For information about finding the hostname of your Neptune DB instance, see [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md).

   ```
   package com.amazonaws;
   import org.apache.tinkerpop.gremlin.driver.Cluster;
   import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
   import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
   import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;
   import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
   import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
   import org.apache.tinkerpop.gremlin.structure.T;
   
   public class App
   {
     public static void main( String[] args )
     {
       Cluster.Builder builder = Cluster.build();
       builder.addContactPoint("your-neptune-endpoint");
       builder.port(8182);
       builder.enableSsl(true);
   
       Cluster cluster = builder.create();
   
       GraphTraversalSource g = traversal().withRemote(DriverRemoteConnection.using(cluster));
   
       // Add a vertex.
       // Note that a Gremlin terminal step, e.g. iterate(), is required to make a request to the remote server.
       // The full list of Gremlin terminal steps is at https://tinkerpop.apache.org/docs/current/reference/#terminal-steps
       g.addV("Person").property("Name", "Justin").iterate();
   
       // Add a vertex with a user-supplied ID.
       g.addV("Custom Label").property(T.id, "CustomId1").property("name", "Custom id vertex 1").iterate();
       g.addV("Custom Label").property(T.id, "CustomId2").property("name", "Custom id vertex 2").iterate();
   
       g.addE("Edge Label").from(__.V("CustomId1")).to(__.V("CustomId2")).iterate();
   
    // This gets the vertices only.
    GraphTraversal t = g.V().limit(3).elementMap();

    t.forEachRemaining(
      e -> System.out.println(e)
    );
   
       cluster.close();
     }
   }
   ```

   For help connecting to Neptune with SSL/TLS (which is required), see [SSL/TLS configuration](#access-graph-gremlin-java-ssl).

1. Compile and run the sample using the following Maven command:

   ```
   mvn compile exec:exec
   ```

The preceding example returns a map of the keys and values of each property for the first three vertices in the graph by using the `g.V().limit(3).elementMap()` traversal. To query for something else, replace it with another Gremlin traversal ending with one of the appropriate terminal methods.

**Note**  
A Gremlin terminal step, such as `toList()` or the `forEachRemaining()` call in the example, is required to submit the traversal to the server for evaluation. If you don't include a terminal step, the query is not submitted to the Neptune DB instance.  
You must also append an appropriate terminal step when you add a vertex or edge, such as when you use the `addV()` step.

The following methods submit the query to the Neptune DB instance:
+ `toList()`
+ `toSet()`
+ `next()`
+ `nextTraverser()`
+ `iterate()`

## SSL/TLS configuration for Gremlin Java client
<a name="access-graph-gremlin-java-ssl"></a>

Neptune requires SSL/TLS to be enabled by default. Typically, if the Java driver is configured with `enableSsl(true)`, it can connect to Neptune without having to set up a `trustStore()` or `keyStore()` with a local copy of a certificate.

However, if the instance with which you are connecting doesn't have an internet connection through which to verify a public certificate, or if the certificate you're using isn't public, you can take the following steps to configure a local certificate copy:

**Setting up a local certificate copy to enable SSL/TLS**

1. Download and install [keytool](https://docs.oracle.com/javase/9/tools/keytool.htm#JSWOR-GUID-5990A2E4-78E3-47B7-AE75-6D1826259549) from Oracle. This will make setting up the local key store much easier.

1. Download the `SFSRootCAG2.pem` CA certificate (the Gremlin Java SDK requires a certificate to verify the remote certificate):

   ```
   wget https://www.amazontrust.com/repository/SFSRootCAG2.pem
   ```

1. Create a key store in either JKS or PKCS12 format. This example uses JKS. Answer the questions that follow at the prompt. The password that you create here will be needed later:

   ```
   keytool -genkey -alias (host name) -keyalg RSA -keystore server.jks
   ```

1. Import the `SFSRootCAG2.pem` file that you downloaded into the newly created key store:

   ```
   keytool -import -keystore server.jks -file SFSRootCAG2.pem
   ```

1. Configure the `Cluster` object programmatically:

   ```
   Cluster cluster = Cluster.build("(your neptune endpoint)")
                            .port(8182)
                            .enableSsl(true)
                            .keyStore("server.jks")
                            .keyStorePassword("(the password from step 3)")
                            .create();
   ```

   You can do the same thing in a configuration file if you want, as you might do with the Gremlin console:

   ```
   hosts: [(your neptune endpoint)]
   port: 8182
   connectionPool: { enableSsl: true, keyStore: server.jks, keyStorePassword: (the password from step 3) }
   serializer: { className: org.apache.tinkerpop.gremlin.util.ser.GraphBinaryMessageSerializerV1, config: { serializeResultToString: true }}
   ```

## IAM authentication
<a name="access-graph-gremlin-java-iam"></a>

Neptune supports [IAM authentication](iam-auth-enable.md) to control access to your DB cluster. If you have IAM authentication enabled, you need to use Signature Version 4 signing to authenticate your requests. For detailed instructions and code examples for connecting from a Java client, see [Connecting to Amazon Neptune databases using IAM with Gremlin Java](iam-auth-connecting-gremlin-java.md).

# Java example of connecting to a Neptune DB instance with re-connect logic
<a name="access-graph-gremlin-java-reconnect-example"></a>

The following Java example demonstrates how to use the Gremlin client with reconnect logic that recovers from an unexpected disconnect.

It has the following dependencies:

```
<dependency>
    <groupId>org.apache.tinkerpop</groupId>
    <artifactId>gremlin-driver</artifactId>
    <version>${gremlin.version}</version>
</dependency>

<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>amazon-neptune-sigv4-signer</artifactId>
    <version>${sig4.signer.version}</version>
</dependency>

<dependency>
    <groupId>com.evanlennick</groupId>
    <artifactId>retry4j</artifactId>
    <version>0.15.0</version>
</dependency>
```

Here is the sample code:

**Important**  
 The `CallExecutor` from Retry4j may not be thread-safe. Consider having each thread use its own `CallExecutor` instance, or use a different retrying library. 

**Note**  
 The following example has been updated to use `requestInterceptor()`, which was added in TinkerPop 3.6.6. Prior to version 3.6.6, the example used `handshakeInterceptor()`, which was deprecated in that release. 

```
public static void main(String args[]) {
  boolean useIam = true;

  // Create Gremlin cluster and traversal source
  Cluster.Builder builder = Cluster.build()
         .addContactPoint(System.getenv("neptuneEndpoint"))
         .port(Integer.parseInt(System.getenv("neptunePort")))
         .enableSsl(true)
         .minConnectionPoolSize(1)
         .maxConnectionPoolSize(1)
         .serializer(Serializers.GRAPHBINARY_V1D0)
         .reconnectInterval(2000);

  if (useIam) {
      builder.requestInterceptor( r -> {
         try {
            NeptuneNettyHttpSigV4Signer sigV4Signer =
                        new NeptuneNettyHttpSigV4Signer("(your region)", new DefaultAWSCredentialsProviderChain());
            sigV4Signer.signRequest(r);
         } catch (NeptuneSigV4SignerException e) {
            throw new RuntimeException("Exception occurred while signing the request", e);
         }
         return r;
      });
   }

  Cluster cluster = builder.create();

  GraphTraversalSource g = AnonymousTraversalSource
      .traversal()
      .withRemote(DriverRemoteConnection.using(cluster));

  // Configure retries
  RetryConfig retryConfig = new RetryConfigBuilder()
      .retryOnCustomExceptionLogic(getRetryLogic())
      .withDelayBetweenTries(1000, ChronoUnit.MILLIS)
      .withMaxNumberOfTries(5)
      .withFixedBackoff()
      .build();

  @SuppressWarnings("unchecked")
  CallExecutor<Object> retryExecutor = new CallExecutorBuilder<Object>()
      .config(retryConfig)
      .build();

  // Do lots of queries
  for (int i = 0; i < 100; i++){
    String id = String.valueOf(i);

    @SuppressWarnings("unchecked")
    Callable<Object> query = () -> g.V(id)
        .fold()
        .coalesce(
            unfold(),
            addV("Person").property(T.id, id))
        .id().next();

    // Retry query
    // If there are connection failures, the Java Gremlin client will automatically
    // attempt to reconnect in the background, so all we have to do is wait and retry.
    Status<Object> status = retryExecutor.execute(query);

    System.out.println(status.getResult().toString());
  }

  cluster.close();
}

private static Function<Exception, Boolean> getRetryLogic() {

  return e -> {

    Class<? extends Exception> exceptionClass = e.getClass();

    StringWriter stringWriter = new StringWriter();
    e.printStackTrace(new PrintWriter(stringWriter));
    String message = stringWriter.toString();

    if (RemoteConnectionException.class.isAssignableFrom(exceptionClass)){
      System.out.println("Retrying because RemoteConnectionException");
      return true;
    }

    // Check for connection issues
    if (message.contains("Timed out while waiting for an available host") ||
        message.contains("Timed-out") && message.contains("waiting for connection on Host") ||
        message.contains("Connection to server is no longer active") ||
        message.contains("Connection reset by peer") ||
        message.contains("SSLEngine closed already") ||
        message.contains("Pool is shutdown") ||
        message.contains("ExtendedClosedChannelException") ||
        message.contains("Broken pipe") ||
        message.contains(System.getenv("neptuneEndpoint")))
    {
      System.out.println("Retrying because connection issue");
      return true;
    };

    // Concurrent writes can sometimes trigger a ConcurrentModificationException.
    // In these circumstances you may want to backoff and retry.
    if (message.contains("ConcurrentModificationException")) {
      System.out.println("Retrying because ConcurrentModificationException");
      return true;
    }

    // If the primary fails over to a new instance, existing connections to the old primary will
    // throw a ReadOnlyViolationException. You may want to back off and retry.
    if (message.contains("ReadOnlyViolationException")) {
      System.out.println("Retrying because ReadOnlyViolationException");
      return true;
    }

    System.out.println("Not a retriable error");
    return false;
  };
}
```
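The retry-classification idea in `getRetryLogic()` carries over to other languages. Here is a minimal Python sketch of the same message-based classification; the fragment list mirrors the Java example and is illustrative, not exhaustive:

```python
# Message fragments that typically indicate a transient, retriable failure.
RETRIABLE_FRAGMENTS = (
    "Timed out while waiting for an available host",
    "Connection to server is no longer active",
    "Connection reset by peer",
    "SSLEngine closed already",
    "Pool is shutdown",
    "ExtendedClosedChannelException",
    "Broken pipe",
    "ConcurrentModificationException",  # concurrent writes can conflict; back off and retry
    "ReadOnlyViolationException",       # stale connection to the old primary after failover
)

def is_retriable(exc: Exception) -> bool:
    """Decide whether a failed Gremlin query is worth retrying."""
    message = str(exc)
    return any(fragment in message for fragment in RETRIABLE_FRAGMENTS)

print(is_retriable(ConnectionError("Connection reset by peer")))
print(is_retriable(ValueError("malformed query")))
```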

# Using Python to connect to a Neptune DB instance
<a name="access-graph-gremlin-python"></a>

**Important**  
Choosing the correct Apache TinkerPop Gremlin driver version is critical for compatibility with your Neptune engine version. Using an incompatible version can result in connection failures or unexpected behavior. For detailed version compatibility information, see [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md).

The following section walks you through running a Python sample that connects to an Amazon Neptune DB instance and performs a Gremlin traversal.

You must follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

Before you begin, do the following:
+ Download and install Python 3.6 or later from the [Python.org website](https://www.python.org/downloads/).
+ Verify that you have **pip** installed. If you don't have **pip** or you're not sure, see [Do I need to install pip?](https://pip.pypa.io/en/stable/installing/#do-i-need-to-install-pip) in the **pip** documentation.
+ (Python 2 only) If your Python installation does not already have it, install the `futures` backport: `pip install futures`. Python 3.2 and later include `concurrent.futures` in the standard library.



**To connect to Neptune using Python**

1. Enter the following to install the `gremlinpython` package:

   ```
   pip install --user gremlinpython
   ```

1. Create a file named `gremlinexample.py`, and then open it in a text editor.

1. Copy the following into the `gremlinexample.py` file. Replace *your-neptune-endpoint* with the address of your Neptune DB cluster and *your-neptune-port* with the port of your Neptune DB cluster (default: 8182).

   For information about finding the address of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section.

    The example below demonstrates how to connect with Gremlin Python. 

   ```
   from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
   from gremlin_python.process.anonymous_traversal import traversal
   
   database_url = "wss://your-neptune-endpoint:your-neptune-port/gremlin"
   
   remoteConn = DriverRemoteConnection(database_url, "g")
   
   g = traversal().withRemote(remoteConn)
   
   print(g.inject(1).toList())
   remoteConn.close()
   ```

1. Enter the following command to run the sample:

   ```
   python gremlinexample.py
   ```

   The Gremlin query at the end of this example injects the constant value `1` and returns it as a list (`g.inject(1).toList()`), which verifies the connection. The list is then printed with the standard Python `print` function.
**Note**  
The final part of the Gremlin query, `toList()`, is required to submit the traversal to the server for evaluation. If you don't include that method or another equivalent method, the query is not submitted to the Neptune DB instance.

   The following methods submit the query to the Neptune DB instance:
   + `toList()`
   + `toSet()`
   + `next()`
   + `nextTraverser()`
   + `iterate()`

   

   The preceding example uses the `g.inject(1).toList()` traversal to confirm connectivity without touching your data. To query for something else, replace it with another Gremlin traversal with one of the appropriate ending methods.

## IAM authentication
<a name="access-graph-gremlin-python-iam"></a>

Neptune supports [IAM authentication](iam-auth-enable.md) to control access to your DB cluster. If you have IAM authentication enabled, you need to use Signature Version 4 signing to authenticate your requests. For detailed instructions and code examples for connecting from a Python client, see [Connecting to Amazon Neptune databases using IAM authentication with Gremlin Python](gremlin-python-iam-auth.md).

# Using .NET to connect to a Neptune DB instance
<a name="access-graph-gremlin-dotnet"></a>

**Important**  
Choosing the correct Apache TinkerPop Gremlin driver version is critical for compatibility with your Neptune engine version. Using an incompatible version can result in connection failures or unexpected behavior. For detailed version compatibility information, see [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md).

The following section contains a code example written in C# that connects to a Neptune DB instance and performs a Gremlin traversal.

Connections to Amazon Neptune must be from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance. This sample code was tested on an Amazon EC2 instance running Ubuntu.

Before you begin, do the following:
+ Install .NET on the Amazon EC2 instance. To get instructions for installing .NET on multiple operating systems, including Windows, Linux, and macOS, see [Get Started with .NET](https://www.microsoft.com/net/learn/get-started/).
+ Install Gremlin.NET by running `dotnet add package gremlin.net` in your project directory. For more information, see [Gremlin.NET](https://tinkerpop.apache.org/docs/current/reference/#gremlin-DotNet) in the TinkerPop documentation.



**To connect to Neptune using Gremlin.NET**

1. Create a new .NET project.

   ```
   dotnet new console -o gremlinExample
   ```

1. Change directories into the new project directory.

   ```
   cd gremlinExample
   ```

1. Copy the following into the `Program.cs` file. Replace *your-neptune-endpoint* with the address of your Neptune DB instance.

   For information about finding the address of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section.

   ```
   using System;
   using System.Threading.Tasks;
   using System.Collections.Generic;
   using Gremlin.Net;
   using Gremlin.Net.Driver;
   using Gremlin.Net.Driver.Remote;
   using Gremlin.Net.Structure;
   using static Gremlin.Net.Process.Traversal.AnonymousTraversalSource;
   namespace gremlinExample
   {
     class Program
     {
       static void Main(string[] args)
       {
         try
         {
           var endpoint = "your-neptune-endpoint";
           // This uses the default Neptune and Gremlin port, 8182
           var gremlinServer = new GremlinServer(endpoint, 8182, enableSsl: true );
           var gremlinClient = new GremlinClient(gremlinServer);
           var remoteConnection = new DriverRemoteConnection(gremlinClient, "g");
           var g = Traversal().WithRemote(remoteConnection);
           g.AddV("Person").Property("Name", "Justin").Iterate();
           g.AddV("Custom Label").Property("name", "Custom id vertex 1").Iterate();
           g.AddV("Custom Label").Property("name", "Custom id vertex 2").Iterate();
           var output = g.V().Limit<Vertex>(3).ToList();
           foreach(var item in output) {
               Console.WriteLine(item);
           }
         }
         catch (Exception e)
         {
             Console.WriteLine("{0}", e);
         }
       }
     }
   }
   ```

1. Enter the following command to run the sample:

   ```
   dotnet run
   ```

   The Gremlin query at the end of this example returns up to three vertices as a list (`g.V().Limit<Vertex>(3).ToList()`), which is then printed to the console.
**Note**  
The final part of the Gremlin query, `ToList()`, is required to submit the traversal to the server for evaluation. If you don't include that method or another equivalent method, the query is not submitted to the Neptune DB instance.

   The following methods submit the query to the Neptune DB instance:
   + `ToList()`
   + `ToSet()`
   + `Next()`
   + `NextTraverser()`
   + `Iterate()`

   Use `Next()` if you need the query results to be serialized and returned, or `Iterate()` if you don't.

   The preceding example returns a list by using the `g.V().Limit(3).ToList()` traversal. To query for something else, replace it with another Gremlin traversal with one of the appropriate ending methods.

## IAM authentication
<a name="access-graph-gremlin-dotnet-iam"></a>

Neptune supports [IAM authentication](iam-auth-enable.md) to control access to your DB cluster. If you have IAM authentication enabled, you need to use Signature Version 4 signing to authenticate your requests. For detailed instructions and code examples for connecting from a .NET client, see [Connecting to Amazon Neptune databases using IAM authentication with Gremlin .NET](gremlin-dotnet-iam-auth.md).

# Using Node.js to connect to a Neptune DB instance
<a name="access-graph-gremlin-node-js"></a>

**Important**  
Choosing the correct Apache TinkerPop Gremlin driver version is critical for compatibility with your Neptune engine version. Using an incompatible version can result in connection failures or unexpected behavior. For detailed version compatibility information, see [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md).

The following section walks you through running a Node.js sample that connects to an Amazon Neptune DB instance and performs a Gremlin traversal.

You must follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

Before you begin, do the following:
+ Verify that Node.js version 8.11 or higher is installed. If it is not, download and install Node.js from the [Nodejs.org website](https://nodejs.org).

**To connect to Neptune using Node.js**

1. Enter the following to install the `gremlin-javascript` package:

   ```
   npm install gremlin
   ```

1. Create a file named `gremlinexample.js` and open it in a text editor.

1. Copy the following into the `gremlinexample.js` file. Replace *your-neptune-endpoint* with the address of your Neptune DB instance.

   For information about finding the address of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section.

   ```
   const gremlin = require('gremlin');
   const DriverRemoteConnection = gremlin.driver.DriverRemoteConnection;
   const Graph = gremlin.structure.Graph;
   
   const dc = new DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin', {});
   
   const graph = new Graph();
   const g = graph.traversal().withRemote(dc);
   
   g.V().limit(1).count().next().
       then(data => {
           console.log(data);
           dc.close();
       }).catch(error => {
           console.log('ERROR', error);
           dc.close();
       });
   ```

1. Enter the following command to run the sample:

   ```
   node gremlinexample.js
   ```

The preceding example returns the count of a single vertex in the graph by using the `g.V().limit(1).count().next()` traversal. To query for something else, replace it with another Gremlin traversal with one of the appropriate ending methods.

**Note**  
The final part of the Gremlin query, `next()`, is required to submit the traversal to the server for evaluation. If you don't include that method or another equivalent method, the query is not submitted to the Neptune DB instance.

The following methods submit the query to the Neptune DB instance:
+ `toList()`
+ `toSet()`
+ `next()`
+ `nextTraverser()`
+ `iterate()`

Use `next()` if you need the query results to be serialized and returned, or `iterate()` if you don't.

**Important**  
This is a standalone Node.js example. If you are planning to run code like this in an AWS Lambda function, see [Lambda function examples](lambda-functions-examples.md) for details about using JavaScript efficiently in a Neptune Lambda function.

## IAM authentication
<a name="access-graph-gremlin-nodejs-iam"></a>

Neptune supports [IAM authentication](iam-auth-enable.md) to control access to your DB cluster. If you have IAM authentication enabled, you need to use Signature Version 4 signing to authenticate your requests. For detailed instructions and code examples for connecting from a JavaScript client, see [Connecting to Amazon Neptune databases using IAM authentication with Gremlin JavaScript](gremlin-javascript-iam-auth.md).

# Using Go to connect to a Neptune DB instance
<a name="access-graph-gremlin-go"></a>

**Important**  
Choosing the correct Apache TinkerPop Gremlin driver version is critical for compatibility with your Neptune engine version. Using an incompatible version can result in connection failures or unexpected behavior. For detailed version compatibility information, see [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md).

**Note**  
The gremlingo 3.5.x versions are backwards compatible with TinkerPop 3.4.x versions as long as you only use 3.4.x features in the Gremlin queries you write.

The following section walks you through running a Go sample that connects to an Amazon Neptune DB instance and performs a Gremlin traversal.

You must follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

Before you begin, do the following:
+ Download and install Go 1.17 or later from the [go.dev](https://go.dev/dl/) website.

**To connect to Neptune using Go**

1. Starting from an empty directory, initialize a new Go module:

   ```
   go mod init example.com/gremlinExample
   ```

1. Add gremlin-go as a dependency of your new module:

   ```
   go get github.com/apache/tinkerpop/gremlin-go/v3/driver
   ```

1. Create a file named `gremlinExample.go` and then open it in a text editor.

1. Copy the following into the `gremlinExample.go` file, replacing *`(your neptune endpoint)`* with the address of your Neptune DB instance:

   ```
   package main
   
   import (
     "fmt"
     gremlingo "github.com/apache/tinkerpop/gremlin-go/v3/driver"
   )
   
   func main() {
     // Creating the connection to the server.
     driverRemoteConnection, err := gremlingo.NewDriverRemoteConnection("wss://(your neptune endpoint):8182/gremlin",
       func(settings *gremlingo.DriverRemoteConnectionSettings) {
         settings.TraversalSource = "g"
       })
     if err != nil {
       fmt.Println(err)
       return
     }
     // Cleanup
     defer driverRemoteConnection.Close()
   
     // Creating graph traversal
     g := gremlingo.Traversal_().WithRemote(driverRemoteConnection)
   
     // Perform traversal
     results, err := g.V().Limit(2).ToList()
     if err != nil {
       fmt.Println(err)
       return
     }
     // Print results
     for _, r := range results {
       fmt.Println(r.GetString())
     }
   }
   ```
**Note**  
The Neptune TLS certificate format is not currently supported on Go 1.18+ with macOS, and may give an x509 error when trying to initiate a connection. For local testing, this can be skipped by adding `"crypto/tls"` to the imports and modifying the `DriverRemoteConnection` settings as follows:  

   ```
   // Creating the connection to the server.
   driverRemoteConnection, err := gremlingo.NewDriverRemoteConnection("wss://your-neptune-endpoint:8182/gremlin",
     func(settings *gremlingo.DriverRemoteConnectionSettings) {
         settings.TraversalSource = "g"
         settings.TlsConfig = &tls.Config{InsecureSkipVerify: true}
     })
   ```

1. Enter the following command to run the sample:

   ```
   go run gremlinExample.go
   ```

The Gremlin query at the end of this example returns up to two vertices (`g.V().Limit(2)`) in a slice. The slice is then iterated over and each result is printed with the standard `fmt.Println` function.

**Note**  
The final part of the Gremlin query, `ToList()`, is required to submit the traversal to the server for evaluation. If you don't include that method or another equivalent method, the query is not submitted to the Neptune DB instance.

The following methods submit the query to the Neptune DB instance:
+ `ToList()`
+ `ToSet()`
+ `Next()`
+ `GetResultSet()`
+ `Iterate()`

The preceding example returns the first two vertices in the graph by using the `g.V().Limit(2).ToList()` traversal. To query for something else, replace it with another Gremlin traversal with one of the appropriate ending methods.

## IAM authentication
<a name="access-graph-gremlin-go-iam"></a>

Neptune supports [IAM authentication](iam-auth-enable.md) to control access to your DB cluster. If you have IAM authentication enabled, you need to use Signature Version 4 signing to authenticate your requests. For detailed instructions and code examples for connecting from a Go client, see [Connecting to Amazon Neptune databases using IAM authentication with Gremlin Go](gremlin-go-iam-auth.md).

# Using the AWS SDK to run Gremlin queries
<a name="access-graph-gremlin-sdk"></a>

With the AWS SDK, you can run Gremlin queries against your Neptune graph using a programming language of your choice. The Neptune data API SDK (service name `neptunedata`) provides the [ExecuteGremlinQuery](https://docs.aws.amazon.com/neptune/latest/data-api/API_ExecuteGremlinQuery.html) action for submitting Gremlin queries.

You must run these examples from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB cluster, or from a location that has network connectivity to your cluster endpoint.

Direct links to the API reference documentation for the `neptunedata` service in each SDK language can be found below:


| Programming language | neptunedata API reference | 
| --- | --- | 
| C++ | [https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-neptunedata/html/annotated.html](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-neptunedata/html/annotated.html) | 
| Go | [https://docs.aws.amazon.com/sdk-for-go/api/service/neptunedata/](https://docs.aws.amazon.com/sdk-for-go/api/service/neptunedata/) | 
| Java | [https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/neptunedata/package-summary.html](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/neptunedata/package-summary.html) | 
| JavaScript | [https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-aws-sdk-client-neptunedata/](https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-aws-sdk-client-neptunedata/) | 
| Kotlin | [https://sdk.amazonaws.com/kotlin/api/latest/neptunedata/index.html](https://sdk.amazonaws.com/kotlin/api/latest/neptunedata/index.html) | 
| .NET | [https://docs.aws.amazon.com/sdkfornet/v3/apidocs/items/Neptunedata/NNeptunedata.html](https://docs.aws.amazon.com/sdkfornet/v3/apidocs/items/Neptunedata/NNeptunedata.html) | 
| PHP | [https://docs.aws.amazon.com/aws-sdk-php/v3/api/namespace-Aws.Neptunedata.html](https://docs.aws.amazon.com/aws-sdk-php/v3/api/namespace-Aws.Neptunedata.html) | 
| Python | [https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/neptunedata.html](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/neptunedata.html) | 
| Ruby | [https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/Neptunedata.html](https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/Neptunedata.html) | 
| Rust | [https://crates.io/crates/aws-sdk-neptunedata](https://crates.io/crates/aws-sdk-neptunedata) | 
| CLI | [https://docs.aws.amazon.com/cli/latest/reference/neptunedata/](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/) | 

## Gremlin AWS SDK examples
<a name="access-graph-gremlin-sdk-examples"></a>

The following examples show how to set up a `neptunedata` client, run a Gremlin query, and print the results. Replace *YOUR\_NEPTUNE\_HOST* and *YOUR\_NEPTUNE\_PORT* with the endpoint and port of your Neptune DB cluster.

**Client-side timeout and retry configuration**  
The SDK client timeout controls how long the *client* waits for a response. It does not control how long the query runs on the server. If the client times out before the server finishes, the query may continue running on Neptune while the client has no way to retrieve the results.  
We recommend setting the client-side read timeout to `0` (no timeout) or to a value that is at least a few seconds longer than the server-side [neptune\_query\_timeout](parameters.md#parameters-db-cluster-parameters-neptune_query_timeout) setting on your Neptune DB cluster. This lets Neptune control when queries time out.  
We also recommend setting the maximum retry attempts to `1` (no retries). If the SDK retries a query that is still running on the server, it can result in duplicate operations. This is especially important for mutation queries, where a retry could cause unintended duplicate writes.

------
#### [ Python ]

1. Follow the [installation instructions](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html) to install Boto3.

1. Create a file named `gremlinExample.py` and paste the following code:

   ```
   import boto3
   import json
   from botocore.config import Config
   
   # Disable the client-side read timeout and retries so that
   # Neptune's server-side neptune_query_timeout controls query duration.
   client = boto3.client(
       'neptunedata',
       endpoint_url='https://YOUR_NEPTUNE_HOST:YOUR_NEPTUNE_PORT',
       config=Config(read_timeout=None, retries={'total_max_attempts': 1})
   )
   
   # Use the untyped GraphSON v3 serializer for a cleaner JSON response.
   response = client.execute_gremlin_query(
       gremlinQuery='g.V().limit(1)',
       serializer='application/vnd.gremlin-v3.0+json;types=false'
   )
   
   print(json.dumps(response['result'], indent=2))
   ```

1. Run the example: `python gremlinExample.py`

------
#### [ Java ]

1. Follow the [installation instructions](https://docs.aws.amazon.com//sdk-for-java/latest/developer-guide/setup.html) to set up the AWS SDK for Java.

1. Use the following code to set up a `NeptunedataClient`, run a Gremlin query, and print the result:

   ```
   import java.net.URI;
   import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
   import software.amazon.awssdk.core.retry.RetryPolicy;
   import software.amazon.awssdk.services.neptunedata.NeptunedataClient;
   import software.amazon.awssdk.services.neptunedata.model.ExecuteGremlinQueryRequest;
   import software.amazon.awssdk.services.neptunedata.model.ExecuteGremlinQueryResponse;
   
   // Disable client-side retries, and leave the API call timeout unset
   // (the SDK applies none by default) so that Neptune's server-side
   // neptune_query_timeout controls query duration.
   NeptunedataClient client = NeptunedataClient.builder()
       .endpointOverride(URI.create("https://YOUR_NEPTUNE_HOST:YOUR_NEPTUNE_PORT"))
       .overrideConfiguration(ClientOverrideConfiguration.builder()
           .retryPolicy(RetryPolicy.none())
           .build())
       .build();
   
   // Use the untyped GraphSON v3 serializer for a cleaner JSON response.
   ExecuteGremlinQueryRequest request = ExecuteGremlinQueryRequest.builder()
       .gremlinQuery("g.V().limit(1)")
       .serializer("application/vnd.gremlin-v3.0+json;types=false")
       .build();
   
   ExecuteGremlinQueryResponse response = client.executeGremlinQuery(request);
   
   System.out.println(response.result().toString());
   ```

------
#### [ JavaScript ]

1. Follow the [installation instructions](https://docs.aws.amazon.com/sdk-for-javascript/v3/developer-guide/getting-started-nodejs.html) to set up the AWS SDK for JavaScript. Install the neptunedata client and HTTP handler packages: `npm install @aws-sdk/client-neptunedata @smithy/node-http-handler`.

1. Create a file named `gremlinExample.js` and paste the following code. Because the code uses ES module syntax and top-level `await`, make sure your `package.json` sets `"type": "module"` (or name the file `gremlinExample.mjs` and run it by that name).

   ```
   import { NeptunedataClient, ExecuteGremlinQueryCommand } from "@aws-sdk/client-neptunedata";
   import { NodeHttpHandler } from "@smithy/node-http-handler";
   
   const config = {
       endpoint: "https://YOUR_NEPTUNE_HOST:YOUR_NEPTUNE_PORT",
       // Disable the client-side request timeout so that
       // Neptune's server-side neptune_query_timeout controls query duration.
       requestHandler: new NodeHttpHandler({
           requestTimeout: 0
       }),
       maxAttempts: 1
   };
   
   const client = new NeptunedataClient(config);
   
   // Use the untyped GraphSON v3 serializer for a cleaner JSON response.
   const input = {
       gremlinQuery: "g.V().limit(1)",
       serializer: "application/vnd.gremlin-v3.0+json;types=false"
   };
   
   const command = new ExecuteGremlinQueryCommand(input);
   const response = await client.send(command);
   
   console.log(JSON.stringify(response, null, 2));
   ```

1. Run the example: `node gremlinExample.js`

------

# Gremlin query hints
<a name="gremlin-query-hints"></a>

You can use query hints to specify optimization and evaluation strategies for a particular Gremlin query in Amazon Neptune. 

Query hints are specified by adding a `withSideEffect` step to the query with the following syntax.

```
g.withSideEffect(hint, value)
```
+ *hint* – Identifies the type of the hint to apply.
+ *value* – Determines the behavior of the system aspect under consideration.

For example, the following shows how to include a `repeatMode` hint in a Gremlin traversal.

**Note**  
All Gremlin query hints side effects are prefixed with `Neptune#`.

```
g.withSideEffect('Neptune#repeatMode', 'DFS').V("3").repeat(out()).times(10).limit(1).path()
```

The preceding query instructs the Neptune engine to traverse the graph *depth first* (`DFS`) rather than *breadth first* (`BFS`), which is the Neptune default.

The following sections provide more information about the available query hints and their usage.

**Topics**
+ [Gremlin repeatMode query hint](gremlin-query-hints-repeatMode.md)
+ [Gremlin noReordering query hint](gremlin-query-hints-noReordering.md)
+ [Gremlin typePromotion query hint](gremlin-query-hints-typePromotion.md)
+ [Gremlin useDFE query hint](gremlin-query-hints-useDFE.md)
+ [Gremlin query hints for using the results cache](gremlin-query-hints-results-cache.md)

# Gremlin repeatMode query hint
<a name="gremlin-query-hints-repeatMode"></a>

The Neptune `repeatMode` query hint specifies how the Neptune engine evaluates the `repeat()` step in a Gremlin traversal: breadth first, depth first, or chunked depth first.

The evaluation mode of the `repeat()` step is important when it is used to find or follow a path, rather than simply repeating a step a limited number of times.

## Syntax
<a name="gremlin-query-hints-repeatMode-syntax"></a>

The `repeatMode` query hint is specified by adding a `withSideEffect` step to the query.

```
g.withSideEffect('Neptune#repeatMode', 'mode').gremlin-traversal
```

**Note**  
All Gremlin query hints side effects are prefixed with `Neptune#`.

**Available Modes**
+ `BFS`

  Breadth-First Search

  Default execution mode for the `repeat()` step. This gets all sibling nodes before going deeper along the path.

  This version is memory-intensive and frontiers can get very large. There is a higher risk that the query will run out of memory and be cancelled by the Neptune engine. This most closely matches other Gremlin implementations.
+ `DFS`

  Depth-First Search

  Follows each path to the maximum depth before moving on to the next solution.

  This uses less memory. It may provide better performance in situations like finding a single path from a starting point out multiple hops.
+ `CHUNKED_DFS`

  Chunked Depth-First Search

  A hybrid approach that explores the graph depth-first in chunks of 1,000 nodes, rather than 1 node (`DFS`) or all nodes (`BFS`).

  The Neptune engine will get up to 1,000 nodes at each level before following the path deeper.

  This is a balanced approach between speed and memory usage. 

  It is also useful if you want to use `BFS`, but the query is using too much memory.



## Example
<a name="gremlin-query-hints-repeatMode-example"></a>

The following section describes the effect of the repeat mode on a Gremlin traversal.

In Neptune the default mode for the `repeat()` step is to perform a breadth-first (`BFS`) execution strategy for all traversals. 

In most cases, the TinkerGraph implementation uses the same execution strategy, but in some cases it alters the execution of a traversal. 

For example, the TinkerGraph implementation modifies the following query.

```
g.V("3").repeat(out()).times(10).limit(1).path()
```

The `repeat()` step in this traversal is "unrolled" into the following traversal, which results in a depth-first (`DFS`) strategy.

```
g.V(<id>).out().out().out().out().out().out().out().out().out().out().limit(1).path()
```

**Important**  
The Neptune query engine does not do this automatically.

Breadth-first (`BFS`) is the default execution strategy, and is similar to TinkerGraph in most cases. However, there are certain cases where depth-first (`DFS`) strategies are preferable.

 

**BFS (Default)**  
Breadth-first (BFS) is the default execution strategy for the `repeat()` operator.

```
g.V("3").repeat(out()).times(10).limit(1).path()
```

The Neptune engine fully explores the frontiers for the first nine hops before finding a solution ten hops out. This is effective in many cases, such as a shortest-path query.

However, for the preceding example, the traversal would be much faster using the depth-first (`DFS`) mode for the `repeat()` operator.

**DFS**  
The following query uses the depth-first (`DFS`) mode for the `repeat()` operator.

```
g.withSideEffect("Neptune#repeatMode", "DFS").V("3").repeat(out()).times(10).limit(1)
```

This follows each individual solution out to the maximum depth before exploring the next solution. 
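
The memory trade-off between the two modes can be illustrated without a Neptune connection. The following Python sketch uses a synthetic binary-tree graph with illustrative visit counting; it is not a model of Neptune internals. It finds one fixed-depth path with each strategy and counts how many nodes each one touches:

```python
# Illustrative sketch only: a synthetic graph shaped like a binary tree,
# with nodes numbered 0..62 (levels 0 through 5).
def out_neighbors(node, max_node=62):
    return [n for n in (2 * node + 1, 2 * node + 2) if n <= max_node]

def first_path_bfs(start, depth):
    """Expand entire frontiers level by level, as the default BFS mode does."""
    frontier, visited = [[start]], 0
    for _ in range(depth):
        next_frontier = []
        for path in frontier:
            for nbr in out_neighbors(path[-1]):
                visited += 1
                next_frontier.append(path + [nbr])
        frontier = next_frontier          # the whole frontier stays in memory
    return frontier[0], visited

def first_path_dfs(start, depth):
    """Follow one path to full depth before trying siblings, like DFS mode."""
    stack, visited = [[start]], 0
    while stack:
        path = stack.pop()
        if len(path) - 1 == depth:
            return path, visited
        for nbr in reversed(out_neighbors(path[-1])):
            visited += 1
            stack.append(path + [nbr])

bfs_path, bfs_visited = first_path_bfs(0, 5)
dfs_path, dfs_visited = first_path_dfs(0, 5)
print(bfs_visited, dfs_visited)   # BFS touches 62 nodes, DFS only 10
```

For a single-path query like the `limit(1)` example above, DFS reaches a full-depth solution after touching far fewer intermediate solutions, which is why it tends to use less memory.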

# Gremlin noReordering query hint
<a name="gremlin-query-hints-noReordering"></a>

When you submit a Gremlin traversal, the Neptune query engine investigates the structure of the traversal and reorders parts of the query, trying to minimize the amount of work required for evaluation and query response time. For example, a traversal with multiple constraints, such as multiple `has()` steps, is typically not evaluated in the given order. Instead it is reordered after the query is checked with static analysis.

The Neptune query engine tries to identify which constraint is more selective and runs that one first. This often results in better performance, but the order in which Neptune chooses to evaluate the query might not always be optimal.

If you know the exact characteristics of the data and want to manually dictate the order of the query execution, you can use the Neptune `noReordering` query hint to specify that the traversal be evaluated in the order given.

## Syntax
<a name="gremlin-query-hints-noReordering-syntax"></a>

The `noReordering` query hint is specified by adding a `withSideEffect` step to the query.

```
g.withSideEffect('Neptune#noReordering', true or false).gremlin-traversal
```

**Note**  
All Gremlin query hints side effects are prefixed with `Neptune#`.

**Available Values**
+ `true`
+ `false`
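
To see why the engine's selectivity-based reordering usually helps, consider a local sketch in plain Python. The property names, values, and counts below are made up for illustration; the point is only that applying the most selective `has()`-style predicate first leaves later predicates far less work:

```python
# Synthetic data: 'status' matches half the items (not selective),
# while 'sku' matches exactly 10 of 10,000 (highly selective).
vertices = [{"status": "active" if i % 2 == 0 else "closed",
             "sku": "sku-%d" % (i % 1000)} for i in range(10_000)]

def run_filters(items, predicates):
    """Apply has()-style predicates in the given order, counting evaluations."""
    checks = 0
    for predicate in predicates:
        kept = []
        for item in items:
            checks += 1
            if predicate(item):
                kept.append(item)
        items = kept
    return items, checks

by_status = lambda v: v["status"] == "active"
by_sku = lambda v: v["sku"] == "sku-42"

# The order the query was written in: least selective predicate first.
given_order, checks_given = run_filters(vertices, [by_status, by_sku])
# The order the engine would likely pick: most selective predicate first.
reordered, checks_reordered = run_filters(vertices, [by_sku, by_status])

print(checks_given, checks_reordered)   # 15000 vs 10010 predicate evaluations
```

Both orders return the same vertices; only the amount of work differs. The `noReordering` hint is for the cases where you know your data better than the static analysis does and the engine's chosen order turns out to be the slower one.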

# Gremlin typePromotion query hint
<a name="gremlin-query-hints-typePromotion"></a>

When you submit a Gremlin traversal that filters on a numerical value or range, the Neptune query engine must normally use type promotion when it executes the query. This means that it has to examine values of every type that could hold the value you are filtering on.

For example, if you are filtering for values equal to 55, the engine must look for integers equal to 55, long integers equal to 55L, floats equal to 55.0, and so forth. Each type promotion requires an additional lookup on storage, which can cause an apparently simple query to take an unexpectedly long time to complete.

Let's say you are searching for all vertices with a `customerAge` property greater than 5:

```
g.V().has('customerAge', gt(5))
```

To execute that traversal thoroughly, Neptune must expand the query to examine every numeric type that the value you are querying for could be promoted to. In this case, the `gt` filter has to be applied for any integer over 5, any long over 5L, any float over 5.0, and any double over 5.0. Because each of these type promotions requires an additional lookup on storage, you will see multiple filters per numeric filter when you run the [Gremlin `profile` API](gremlin-profile-api.md) for this query, and it will take significantly longer to complete than you might expect.

Often type promotion is unnecessary because you know in advance that you only need to find values of one specific type. When this is the case, you can speed up your queries dramatically by using the `typePromotion` query hint to turn off type promotion.

## Syntax
<a name="gremlin-query-hints-typePromotion-syntax"></a>

The `typePromotion` query hint is specified by adding a `withSideEffect` step to the query.

```
g.withSideEffect('Neptune#typePromotion', true or false).gremlin-traversal
```

**Note**  
All Gremlin query hints side effects are prefixed with `Neptune#`.

**Available Values**
+ `true`
+ `false`

To turn off type promotion for the query above, you would use:

```
g.withSideEffect('Neptune#typePromotion', false).V().has('customerAge', gt(5))
```
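
The cost described above can be modeled locally. The following Python sketch is not Neptune's storage layout; it is a toy index keyed by `(type, value)` that shows why equality across numeric types requires one storage lookup per candidate type:

```python
# Toy model only: an index keyed by (type name, value). Three vertices store
# the same numeric quantity under different types.
index = {
    ("int", 55): ["v1"],
    ("long", 55): ["v2"],
    ("double", 55.0): ["v3"],
}

def lookup_exact(type_name, value):
    """One storage lookup: finds only values stored with exactly this type."""
    return index.get((type_name, value), [])

def lookup_with_promotion(value):
    """Type promotion: one lookup per numeric type the value could have."""
    hits = []
    for type_name in ("int", "long", "float", "double"):
        promoted = float(value) if type_name in ("float", "double") else value
        hits += lookup_exact(type_name, promoted)
    return hits

print(lookup_exact("int", 55))      # single lookup finds int-typed values only
print(lookup_with_promotion(55))    # four lookups find all numerically equal values
```

Turning off type promotion with the hint corresponds to the single `lookup_exact` call: one lookup, but it only finds values stored as the one type you query with, which is why the hint is safe only when you know all stored values share that type.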

# Gremlin useDFE query hint
<a name="gremlin-query-hints-useDFE"></a>

Use this query hint to enable use of the DFE, Neptune's alternative query engine, for executing a query. By default Neptune does not use the DFE unless this query hint is set to `true`, because the [neptune\_dfe\_query\_engine](parameters.md#parameters-instance-parameters-neptune_dfe_query_engine) instance parameter defaults to `viaQueryHint`. If you set that instance parameter to `enabled`, the DFE engine is used for all queries except those that have the `useDFE` query hint set to `false`.

Example of enabling the DFE for a query:

```
g.withSideEffect('Neptune#useDFE', true).V().out()
```

# Gremlin query hints for using the results cache
<a name="gremlin-query-hints-results-cache"></a>

The following query hints can be used when the [query results cache](gremlin-results-cache.md) is enabled.

## Gremlin `enableResultCache` query hint
<a name="gremlin-query-hints-results-cache-enableResultCache"></a>

The `enableResultCache` query hint with a value of `true` causes query results to be returned from the cache if they have already been cached. If not, the query returns new results and caches them until they are cleared from the cache. For example:

```
g.with('Neptune#enableResultCache', true)
 .V().has('genre','drama').in('likes')
```

Later, you can access the cached results by issuing exactly the same query again.

If the value of this query hint is `false`, or if it isn't present, query results are not cached. However, setting it to `false` does not clear existing cached results. To clear cached results, use the `invalidateResultCacheKey` or `invalidateResultCache` hint.

## Gremlin `enableResultCacheWithTTL` query hint
<a name="gremlin-query-hints-results-cache-enableResultCacheWithTTL"></a>

The `enableResultCacheWithTTL` query hint also returns cached results if there are any, without affecting the TTL of results already in the cache. If there are currently no cached results, the query returns new results and caches them for the time to live (TTL) specified by the `enableResultCacheWithTTL` query hint. That time to live is specified in seconds. For example, the following query specifies a time to live of sixty seconds:

```
g.with('Neptune#enableResultCacheWithTTL', 60)
 .V().has('genre','drama').in('likes')
```

Before the 60-second time-to-live is over, you can use the same query (here, `g.V().has('genre','drama').in('likes')`) with either the `enableResultCache` or the `enableResultCacheWithTTL` query hint to access the cached results.

**Note**  
The time to live specified with `enableResultCacheWithTTL` does not affect results that have already been cached.  
If results were previously cached using `enableResultCache`, the cache must first be explicitly cleared before `enableResultCacheWithTTL` generates new results and caches them for the TTL that it specifies.  
If results were previously cached using `enableResultCacheWithTTL`, that previous TTL must first expire before `enableResultCacheWithTTL` generates new results and caches them for the TTL that it specifies.

After the time to live has passed, the cached results for the query are cleared, and a subsequent instance of the same query then returns new results. If `enableResultCacheWithTTL` is attached to that subsequent query, the new results are cached with the TTL that it specifies.
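
The TTL rules above (a cache hit does not refresh an entry's TTL, and an expired entry is replaced by fresh results cached under the new TTL) can be sketched as a small in-process cache. This is an illustration of the described semantics, not Neptune's implementation; the injectable clock exists only to make the behavior easy to demonstrate:

```python
class ResultCacheSketch:
    """Illustrative sketch of the TTL semantics described above."""

    def __init__(self, clock):
        self._clock = clock
        self._entries = {}   # query string -> (results, expiry time)

    def get_or_run(self, query, run, ttl):
        now = self._clock()
        if query in self._entries:
            results, expiry = self._entries[query]
            if now < expiry:
                return results            # hit: existing TTL is not refreshed
            del self._entries[query]      # TTL passed: entry is cleared
        results = run()
        self._entries[query] = (results, now + ttl)
        return results

now = [0.0]                # a fake clock we can advance by hand
runs = [0]                 # counts how often the "query" really executes

def run_query():
    runs[0] += 1
    return ["alice", "bob"]

cache = ResultCacheSketch(clock=lambda: now[0])
cache.get_or_run("g.V().has('genre','drama').in('likes')", run_query, ttl=60)
now[0] = 30.0              # within the TTL: served from the cache
cache.get_or_run("g.V().has('genre','drama').in('likes')", run_query, ttl=60)
now[0] = 61.0              # TTL passed: runs again and re-caches for 60s
cache.get_or_run("g.V().has('genre','drama').in('likes')", run_query, ttl=60)
print(runs[0])             # 2
```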

## Gremlin `invalidateResultCacheKey` query hint
<a name="gremlin-query-hints-results-cache-invalidateResultCacheKey"></a>

The `invalidateResultCacheKey` query hint can take a `true` or `false` value. A `true` value clears the cached results for the query to which `invalidateResultCacheKey` is attached. For example, the following causes results cached for the query key `g.V().has('genre','drama').in('likes')` to be cleared:

```
g.with('Neptune#invalidateResultCacheKey', true)
 .V().has('genre','drama').in('likes')
```

The example query above does not cause its new results to be cached. You can include `enableResultCache` (or `enableResultCacheWithTTL`) in the same query if you want to cache the new results after clearing the existing cached ones:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#invalidateResultCacheKey', true)
 .V().has('genre','drama').in('likes')
```

## Gremlin `invalidateResultCache` query hint
<a name="gremlin-query-hints-results-cache-invalidateResultCache"></a>

The `invalidateResultCache` query hint can take a `true` or `false` value. A `true` value causes all results in the results cache to be cleared. For example:

```
g.with('Neptune#invalidateResultCache', true)
 .V().has('genre','drama').in('likes')
```

The example query above does not cause its results to be cached. You can include `enableResultCache` (or `enableResultCacheWithTTL`) in the same query if you want to cache new results after completely clearing the existing cache:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#invalidateResultCache', true)
 .V().has('genre','drama').in('likes')
```

## Gremlin `numResultsCached` query hint
<a name="gremlin-query-hints-results-cache-numResultsCached"></a>

The `numResultsCached` query hint can only be used with queries that contain `iterate()`, and it specifies the maximum number of results to cache for the query to which it is attached. Note that the results cached when `numResultsCached` is present are not returned, only cached.

For example, the following query specifies that up to 100 of its results should be cached, but none of those cached results returned:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#numResultsCached', 100)
 .V().has('genre','drama').in('likes').iterate()
```

You can then use a query like the following to retrieve a range of the cached results (here, the first ten):

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#numResultsCached', 100)
 .V().has('genre','drama').in('likes').range(0, 10)
```

## Gremlin `noCacheExceptions` query hint
<a name="gremlin-query-hints-results-cache-noCacheExceptions"></a>

The `noCacheExceptions` query hint can take a `true` or `false` value. A `true` value causes any exceptions related to the results cache to be suppressed. For example:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#noCacheExceptions', true)
 .V().has('genre','drama').in('likes')
```

In particular, this suppresses the `QueryLimitExceededException`, which is raised if the results of a query are too large to fit in the results cache.

# Gremlin query status API
<a name="gremlin-api-status"></a>

You can list all active Gremlin queries or get the status of a specific query. The underlying HTTP endpoint for both operations is `https://your-neptune-endpoint:port/gremlin/status`.

## Listing active Gremlin queries
<a name="gremlin-api-status-list"></a>

To list all active Gremlin queries, call the endpoint with no `queryId` parameter.

### Request parameters
<a name="gremlin-api-status-list-request"></a>
+ **includeWaiting** (*optional*)   –   If set to `true`, the response includes waiting queries in addition to running queries.

### Response syntax
<a name="gremlin-api-status-list-response"></a>

```
{
  "acceptedQueryCount": integer,
  "runningQueryCount": integer,
  "queries": [
    {
      "queryId": "guid",
      "queryEvalStats": {
        "waited": integer,
        "elapsed": integer,
        "cancelled": boolean
      },
      "queryString": "string"
    }
  ]
}
```
+ **acceptedQueryCount**   –   The number of queries that have been accepted but not yet completed, including queries in the queue.
+ **runningQueryCount**   –   The number of currently running Gremlin queries.
+ **queries**   –   A list of the current Gremlin queries.
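
Because `acceptedQueryCount` includes queued queries, you can derive the current queue depth from a status response. The following is a minimal Python sketch, assuming (as the field descriptions above imply) that accepted = running + waiting:

```python
def queue_depth(status):
    """Accepted-but-not-running (waiting) queries in a status response."""
    return status["acceptedQueryCount"] - status["runningQueryCount"]

status = {"acceptedQueryCount": 9, "runningQueryCount": 1, "queries": []}
print(queue_depth(status))  # 8
```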

### Example
<a name="gremlin-api-status-list-example"></a>

------
#### [ AWS CLI ]

```
aws neptunedata list-gremlin-queries \
  --endpoint-url https://your-neptune-endpoint:port
```

For more information, see [list-gremlin-queries](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/list-gremlin-queries.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.list_gremlin_queries()

print(response)
```

For AWS SDK examples in other languages, such as Java and .NET, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/status \
  --region us-east-1 \
  --service neptune-db
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/gremlin/status
```

------

The following output shows a single running query.

```
{
  "acceptedQueryCount": 9,
  "runningQueryCount": 1,
  "queries": [
    {
      "queryId": "fb34cd3e-f37c-4d12-9cf2-03bb741bf54f",
      "queryEvalStats": {
        "waited": 0,
        "elapsed": 23,
        "cancelled": false
      },
      "queryString": "g.V().out().count()"
    }
  ]
}
```

## Getting the status of a specific Gremlin query
<a name="gremlin-api-status-get-single"></a>

To get the status of a specific Gremlin query, provide the `queryId` parameter.

### Request parameters
<a name="gremlin-api-status-get-request"></a>
+ **queryId** (*required*)   –   The ID of the Gremlin query. Neptune automatically assigns this ID value to each query, or you can assign your own ID (see [Inject a Custom ID Into a Neptune Gremlin or SPARQL Query](features-query-id.md)).

### Response syntax
<a name="gremlin-api-status-get-response-syntax"></a>

```
{
  "queryId": "guid",
  "queryString": "string",
  "queryEvalStats": {
    "waited": integer,
    "elapsed": integer,
    "cancelled": boolean,
    "subqueries": document
  }
}
```
+ **queryId**   –   The ID of the query.
+ **queryString**   –   The submitted query. This is truncated to 1024 characters if it is longer than that.
+ **queryEvalStats**   –   Statistics for the query, including `waited` (wait time in milliseconds), `elapsed` (run time in milliseconds), `cancelled` (whether the query was cancelled), and `subqueries` (the number of subqueries).

### Example
<a name="gremlin-api-status-get-example"></a>

------
#### [ AWS CLI ]

```
aws neptunedata get-gremlin-query-status \
  --endpoint-url https://your-neptune-endpoint:port \
  --query-id "fb34cd3e-f37c-4d12-9cf2-03bb741bf54f"
```

For more information, see [get-gremlin-query-status](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/get-gremlin-query-status.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.get_gremlin_query_status(
    queryId='fb34cd3e-f37c-4d12-9cf2-03bb741bf54f'
)

print(response)
```

For AWS SDK examples in other languages, such as Java and .NET, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/status/fb34cd3e-f37c-4d12-9cf2-03bb741bf54f \
  --region us-east-1 \
  --service neptune-db
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/gremlin/status/fb34cd3e-f37c-4d12-9cf2-03bb741bf54f
```

------

The following is an example response.

```
{
  "queryId": "fb34cd3e-f37c-4d12-9cf2-03bb741bf54f",
  "queryString": "g.V().out().count()",
  "queryEvalStats": {
    "waited": 0,
    "elapsed": 23,
    "cancelled": false
  }
}
```

# Gremlin query cancellation
<a name="gremlin-api-status-cancel"></a>

To cancel a running Gremlin query, use HTTP `GET` or `POST` to make a request to the `https://your-neptune-endpoint:port/gremlin/status` endpoint with the `cancelQuery` parameter.

## Gremlin query cancellation request parameters
<a name="gremlin-api-status-cancel-request"></a>
+ **cancelQuery**   –   Required for cancellation. This parameter has no corresponding value.
+ **queryId**   –   The ID of the running Gremlin query to cancel.

## Gremlin query cancellation example
<a name="gremlin-api-status-cancel-example"></a>

The following is an example of cancelling a query.

------
#### [ AWS CLI ]

```
aws neptunedata cancel-gremlin-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --query-id "fb34cd3e-f37c-4d12-9cf2-03bb741bf54f"
```

For more information, see [cancel-gremlin-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/cancel-gremlin-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.cancel_gremlin_query(
    queryId='fb34cd3e-f37c-4d12-9cf2-03bb741bf54f'
)

print(response)
```

For AWS SDK examples in other languages, such as Java and .NET, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/status \
  --region us-east-1 \
  --service neptune-db \
  --data-urlencode "cancelQuery" \
  --data-urlencode "queryId=fb34cd3e-f37c-4d12-9cf2-03bb741bf54f"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/gremlin/status \
  --data-urlencode "cancelQuery" \
  --data-urlencode "queryId=fb34cd3e-f37c-4d12-9cf2-03bb741bf54f"
```

------

Successful cancellation returns HTTP `200` OK.

# Support for Gremlin script-based sessions
<a name="access-graph-gremlin-sessions"></a>

You can use Gremlin sessions with implicit transactions in Amazon Neptune. For information about Gremlin sessions, see [Considering Sessions](http://tinkerpop.apache.org/docs/current/reference/#sessions) in the Apache TinkerPop documentation. The sections below describe how to use Gremlin sessions with Java.

**Important**  
Currently, the longest time Neptune can keep a script-based session open is 10 minutes. If you don't close a session before that, the session times out and everything in it is rolled back.

**Topics**
+ [Gremlin sessions on the Gremlin console](#access-graph-gremlin-sessions-console)
+ [Gremlin sessions in the Gremlin Language Variant](#access-graph-gremlin-sessions-glv)

## Gremlin sessions on the Gremlin console
<a name="access-graph-gremlin-sessions-console"></a>

If you create a remote connection on the Gremlin Console without the `session` parameter, the remote connection is created in *sessionless* mode. In this mode, each request that is submitted to the server is treated as a complete transaction in itself, and no state is saved between requests. If a request fails, only that request is rolled back.

If you create a remote connection that *does* use the `session` parameter, you create a script-based session that lasts until you close the remote connection. Every session is identified by a unique UUID that the console generates and returns to you.

The following is an example of one console call that creates a session. After queries are submitted, another call closes the session and commits the queries.

**Note**  
The Gremlin client must always be closed to release server-side resources.

```
gremlin> :remote connect tinkerpop.server conf/neptune-remote.yaml session
  . . .
  . . .
gremlin> :remote close
```

For more information and examples, see [Sessions](http://tinkerpop.apache.org/docs/current/reference/#console-sessions) in the TinkerPop documentation.

All the queries that you run during a session form a single transaction that isn't committed until all the queries succeed and you close the remote connection. If a query fails, or if you don't close the connection within the maximum session lifetime that Neptune supports, the session transaction is not committed, and all the queries in it are rolled back.

## Gremlin sessions in the Gremlin Language Variant
<a name="access-graph-gremlin-sessions-glv"></a>

When using a Gremlin Language Variant (GLV), you need to create a `SessionedClient` object to issue multiple queries in a single transaction, as in the following example.

```
Cluster cluster = Cluster.open();
Client client = cluster.connect("sessionName");   // creates a SessionedClient
try {
   ...
   ...
} finally {
  // Always close. If there are no errors, the transaction is committed;
  // otherwise, it's rolled back.
  client.close();
  cluster.close();
}
```

The call to `cluster.connect("sessionName")` in the preceding example creates the `SessionedClient` object according to the configuration options set for the cluster in question. The *sessionName* string that you pass to the `connect` method becomes the unique name of the session. To avoid collisions, use a UUID for the name.

The client starts a session transaction when it is initialized. All the queries that you run during the session form a single transaction that is committed only when you call `client.close()`. Again, if a single query fails, or if you don't close the connection within the maximum session lifetime that Neptune supports, the session transaction fails, and all the queries in it are rolled back.

**Note**  
The Gremlin client must always be closed to release server-side resources.

Alternatively, you can use Gremlin's `tx()` syntax to control the transaction explicitly, as in the following example:

```
GraphTraversalSource g = traversal().withRemote(conn);

Transaction tx = g.tx();

// Spawn a GraphTraversalSource from the Transaction.
// Traversals spawned from gtx are executed within a single transaction.
GraphTraversalSource gtx = tx.begin();
try {
  gtx.addV("person").iterate();
  gtx.addV("software").iterate();

  tx.commit();
} finally {
    if (tx.isOpen()) {
        tx.rollback();
    }
}
```

# Gremlin transactions in Neptune
<a name="access-graph-gremlin-transactions"></a>

There are several contexts within which Gremlin [transactions](transactions.md) are executed. When working with Gremlin, it is important to understand the context you are working in and its implications:
+ **`Script-based`**   –   Requests are made using text-based Gremlin strings, like this:
  + Using the Java driver and `Client.submit(string)`.
  + Using the Gremlin console and `:remote connect`.
  + Using the HTTP API.
+ **`Bytecode-based`**   –   Requests are made using serialized Gremlin bytecode, which is typical of [Gremlin Language Variants](https://tinkerpop.apache.org/docs/current/reference/#gremlin-drivers-variants) (GLVs).

  For example, using the Java driver, `g = traversal().withRemote(...)`.

In either of these contexts, a request can additionally be sent either sessionless or bound to a session.

**Note**  
 Gremlin transactions must always either be committed or rolled back, so that server-side resources can be released. In the event of an error during the transaction, it is important to retry the entire transaction and not just the particular request that failed. 

## Sessionless requests
<a name="access-graph-gremlin-transactions-sessionless"></a>

When a request is sessionless, it is equivalent to a single transaction.

For scripts, the implication is that one or more Gremlin statements sent in a single request commit or roll back as a single transaction. For example:

```
Cluster cluster = Cluster.open();
Client client = cluster.connect(); // sessionless
// 3 vertex additions in one request/transaction:
client.submit("g.addV();g.addV();g.addV()").all().get();
```

For bytecode, a sessionless request is made for each traversal spawned and executed from `g`:

```
GraphTraversalSource g = traversal().withRemote(...);

// 3 vertex additions in three individual requests/transactions:
g.addV().iterate();
g.addV().iterate();
g.addV().iterate();

// 3 vertex additions in one single request/transaction:
g.addV().addV().addV().iterate();
```

## Requests bound to a session
<a name="access-graph-gremlin-transactions-session-bound"></a>

When bound to a session, multiple requests can be applied within the context of a single transaction.

For scripts, the implication is that there is no need to concatenate all of the graph operations into a single string:

```
Cluster cluster = Cluster.open();
Client client = cluster.connect(sessionName); // session
try {
    // 3 vertex additions in one request/transaction:
    client.submit("g.addV();g.addV();g.addV()").all().get();
} finally {
    client.close();
}

try {
    // 3 vertex additions in three requests, but one transaction:
    client.submit("g.addV()").all().get(); // starts a new transaction with the same sessionName
    client.submit("g.addV()").all().get();
    client.submit("g.addV()").all().get();
} finally {
    client.close();
}
```

For script-based sessions, closing the client with `client.close()` commits the transaction. There is no explicit rollback command available in script-based sessions. To force a rollback, you can cause the transaction to fail by issuing a query such as `g.inject(0).fail('rollback')` before closing the client.

**Note**  
A query like `g.inject(0).fail('rollback')`, used to intentionally throw an error to force a rollback, produces an exception on the client. Catch and discard the resulting exception before closing the client.
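
The catch-and-discard pattern can be sketched in Python. This is an illustration, not official driver code; it assumes a session-bound client object whose blocking `submit(...)` call raises when the server reports an error (as gremlinpython's `Client` does):

```python
def force_rollback(client):
    """Fail the session transaction on purpose, then close the client.

    `client` is any session-bound Gremlin client whose blocking
    submit(...) raises when the server reports an error (for example,
    gremlinpython's Client created with a session).
    """
    try:
        client.submit("g.inject(0).fail('rollback')").all().result()
    except Exception:
        # Expected: the deliberate failure rolls the transaction back.
        pass
    finally:
        client.close()  # release the session's server-side resources
```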

For bytecode, the transaction can be explicitly controlled and the session managed transparently. Gremlin Language Variants (GLV) support Gremlin's `tx()` syntax to `commit()` or `rollback()` a transaction as follows:

```
GraphTraversalSource g = traversal().withRemote(conn);

Transaction tx = g.tx();

// Spawn a GraphTraversalSource from the Transaction.
// Traversals spawned from gtx are executed within a single transaction.
GraphTraversalSource gtx = tx.begin();
try {
    gtx.addV("person").iterate();
    gtx.addV("software").iterate();

    tx.commit();
} finally {
    if (tx.isOpen()) {
        tx.rollback();
    }
}
```

Although the example above is written in Java, you can also use this `tx()` syntax in other languages. For language-specific transaction syntax, see the Transactions section of the Apache TinkerPop documentation for [Java](https://tinkerpop.apache.org/docs/current/reference/#gremlin-java-transactions), [Python](https://tinkerpop.apache.org/docs/current/reference/#gremlin-python-transactions), [JavaScript](https://tinkerpop.apache.org/docs/current/reference/#gremlin-javascript-transactions), [.NET](https://tinkerpop.apache.org/docs/current/reference/#gremlin-dotnet-transactions), and [Go](https://tinkerpop.apache.org/docs/current/reference/#gremlin-go-transactions).

**Warning**  
Sessionless read-only queries are executed under [SNAPSHOT](transactions-isolation-levels.md) isolation, but read-only queries run within an explicit transaction are executed under [SERIALIZABLE](transactions-isolation-levels.md) isolation. The read-only queries executed under `SERIALIZABLE` isolation incur higher overhead and can block or get blocked by concurrent writes, unlike those run under `SNAPSHOT` isolation.

## Timeout behavior for bytecode commit and rollback
<a name="access-graph-gremlin-transactions-commit-rollback-timeout"></a>

When you use bytecode-based transactions with the `tx()` syntax, the `commit()` and `rollback()` operations are not subject to query timeout settings. Neither the global `neptune_query_timeout` parameter nor per-query timeout values set through `evaluationTimeout` apply to these operations. On the server, `commit()` and `rollback()` run without a time limit until they complete or encounter an error.

On the client side, the Gremlin driver's `tx.commit()` and `tx.rollback()` calls will not complete until the server responds. Depending on the language, this might manifest as a blocking call or an unresolved async operation. No driver provides a built-in timeout setting that bounds these calls. Consult the API documentation for your specific Gremlin Language Variant for details on concurrency behavior around these transaction features.

**Important**  
If a `commit()` or `rollback()` call takes longer than expected, it might be blocked by lock contention from a concurrent transaction. For more information about lock conflicts, see [Conflict Resolution Using Lock-Wait Timeouts](transactions-neptune.md#transactions-neptune-conflicts).

If you need to bound the time your application waits for a `commit()` or `rollback()`, you can use your language's concurrency features to apply a client-side timeout. If the client-side timeout fires, the server continues processing the operation. The server-side operation holds a worker thread until it completes. After a client-side timeout, close the connection and create a new one rather than reusing the existing connection, because the transaction state is indeterminate.
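
As a sketch of that approach, the following Python helper bounds a blocking commit using a thread pool; the `commit` callable stands in for your driver's `tx.commit()`. This is an illustrative pattern, not part of any Gremlin driver:

```python
import concurrent.futures

def commit_with_timeout(commit, timeout_seconds):
    """Run a blocking commit callable, giving up after timeout_seconds.

    Returns True if the commit returned in time, False on timeout. On
    timeout the server may still be working, so discard the connection
    instead of reusing it.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(commit)
    try:
        future.result(timeout=timeout_seconds)
        return True
    except concurrent.futures.TimeoutError:
        return False
    finally:
        # Don't wait for the worker; the abandoned commit keeps running
        # in the background until the server responds.
        pool.shutdown(wait=False)
```

On a `False` return, close the underlying connection and open a new one, as described above.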

### Server-side transaction cleanup
<a name="access-graph-gremlin-transactions-server-side-cleanup"></a>

If a client disconnects or abandons a transaction without committing or rolling back, Neptune has server-side mechanisms that eventually clean up the orphaned transaction:
+ **Session timeout**   –   Bytecode-based sessions that remain idle for longer than the maximum session lifetime (10 minutes) are closed, and any open transaction is rolled back.
+ **Connection idle timeout**   –   Neptune closes WebSocket connections that are idle for approximately 20 minutes. When the connection closes, the server rolls back any open transaction associated with that connection.

These cleanup mechanisms are safety nets. We recommend that you always explicitly commit or roll back transactions when you are finished with them.

# Using the Gremlin API with Amazon Neptune
<a name="gremlin-api-reference"></a>

**Note**  
Amazon Neptune does not support the `bindings` property.

Gremlin HTTPS requests all use a single endpoint: `https://your-neptune-endpoint:port/gremlin`. All Neptune connections must use HTTPS.

You can connect the Gremlin Console to a Neptune graph directly through WebSockets.

For more information about connecting to the Gremlin endpoint, see [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md).

The Amazon Neptune implementation of Gremlin has specific details and differences that you need to consider. For more information, see [Gremlin standards compliance in Amazon Neptune](access-graph-gremlin-differences.md).

For information about the Gremlin language and traversals, see [The Traversal](https://tinkerpop.apache.org/docs/current/reference/#traversal) in the Apache TinkerPop documentation.

# Caching query results in Amazon Neptune Gremlin
<a name="gremlin-results-cache"></a>

Amazon Neptune supports a results cache for Gremlin queries.

You can enable the query results cache and then use a query hint to cache the results of a Gremlin read-only query.

Any re-run of the query then retrieves the cached results with low latency and no I/O costs, as long as they are still in the cache. This works for queries submitted both to the HTTP endpoint and over WebSockets, either as byte-code or in string form.

**Note**  
Queries sent to the profile endpoint are not cached even when the query cache is enabled.

You can control how the Neptune query results cache behaves in several ways. For example:
+ You can get cached results paginated, in blocks.
+ You can specify the time-to-live (TTL) for specified queries.
+ You can clear the cache for specified queries.
+ You can clear the entire cache.
+ You can be notified when results are too large to fit in the cache.

The cache is maintained using a least-recently-used (LRU) policy, meaning that once the space allotted to the cache is full, the least-recently-used results are removed to make room when new results are being cached.
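
As an illustration of LRU semantics only (not Neptune's internal implementation), the policy behaves like this minimal Python sketch:

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU cache: reads refresh recency, inserts evict the oldest entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)     # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop least recently used
```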

**Important**  
The query-results cache is not available on `t3.medium` or `t4g.medium` instance types.

## Enabling the query results cache in Neptune
<a name="gremlin-results-cache-enabling"></a>

The query results cache can be enabled on all instances in a cluster or per instance. To enable the results cache on every instance in a cluster, set the `neptune_result_cache` parameter to `1` in the cluster's DB cluster parameter group. To enable it on a specific instance, set the `neptune_result_cache` parameter to `1` in that instance's DB parameter group. An instance-level parameter setting overrides the cluster-level value.

A restart of any affected instance is required for the results cache parameter setting to take effect. Although you can enable the results cache on all instances in a cluster through the DB cluster parameter group, each instance maintains its own cache; the query results cache is not a cluster-wide cache.

Once the results cache is enabled, Neptune sets aside a portion of instance memory for caching query results. The larger the instance type and the more memory available, the more memory Neptune sets aside for the cache.

If the results cache memory fills up, Neptune automatically drops least-recently-used (LRU) cached results to make way for new ones.

You can check the current status of the results cache using the [Instance Status](access-graph-status.md) command.

## Using hints to cache query results
<a name="gremlin-results-cache-using"></a>

Once the query results cache is enabled, you use query hints to control query caching. All the examples below apply to the same query traversal, namely:

```
g.V().has('genre','drama').in('likes')
```

### Using `enableResultCache`
<a name="using-enableResultCache"></a>

With the query results cache enabled, you can cache the results of a Gremlin query using the `enableResultCache` query hint, as follows:

```
g.with('Neptune#enableResultCache', true)
 .V().has('genre','drama').in('likes')
```

Neptune then returns the query results to you, and also caches them. Later, you can access the cached results by issuing exactly the same query again:

```
g.with('Neptune#enableResultCache', true)
 .V().has('genre','drama').in('likes')
```

The cache key that identifies the cached results is the query string itself, namely:

```
g.V().has('genre','drama').in('likes')
```

### Using `enableResultCacheWithTTL`
<a name="using-enableResultCacheWithTTL"></a>

You can specify how long query results are cached by using the `enableResultCacheWithTTL` query hint. For example, the following query specifies that the cached results should expire after 120 seconds:

```
g.with('Neptune#enableResultCacheWithTTL', 120)
 .V().has('genre','drama').in('likes')
```

Again, the cache key that identifies the cached results is the base query string:

```
g.V().has('genre','drama').in('likes')
```

And again, you can access the cached results using that query string with the `enableResultCache` query hint:

```
g.with('Neptune#enableResultCache', true)
 .V().has('genre','drama').in('likes')
```

If 120 or more seconds have passed since the results were cached, that query will return new results, and cache them, without any time-to-live.

You can also access the cached results by issuing the same query again with the `enableResultCacheWithTTL` query hint. For example:

```
g.with('Neptune#enableResultCacheWithTTL', 140)
 .V().has('genre','drama').in('likes')
```

Until 120 seconds have passed (that is, the TTL currently in effect), this new query using the `enableResultCacheWithTTL` query hint returns the cached results. After 120 seconds, it would return new results and cache them with a time-to-live of 140 seconds.

**Note**  
If results for a query key are already cached, then the same query key with `enableResultCacheWithTTL` does not generate new results and has no effect on the time-to-live of the currently cached results.  
If results were previously cached using `enableResultCache`, the cache must first be cleared before `enableResultCacheWithTTL` generates new results and caches them for the TTL that it specifies.  
If results were previously cached using `enableResultCacheWithTTL`, that previous TTL must first expire before `enableResultCacheWithTTL` generates new results and caches them for the TTL that it specifies.
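
The rules in the note above can be summarized as a small Python sketch (an illustration of the semantics, not Neptune's implementation; the cache is a plain dict and `run` stands in for query execution):

```python
import time

def lookup_or_run(cache, key, run, ttl=None, now=None):
    """Sketch of the TTL rules above (illustrative, not Neptune's code).

    - An unexpired hit is returned as-is; a TTL on the new request does
      not refresh the existing entry's expiry.
    - An entry cached without a TTL never expires on its own.
    """
    now = time.time() if now is None else now
    entry = cache.get(key)
    if entry is not None and (entry["expires"] is None or entry["expires"] > now):
        return entry["results"]
    results = run()
    cache[key] = {
        "results": results,
        "expires": now + ttl if ttl is not None else None,
    }
    return results
```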

### Using `invalidateResultCacheKey`
<a name="using-invalidateResultCacheKey"></a>

You can use the `invalidateResultCacheKey` query hint to clear cached results for one particular query. For example:

```
g.with('Neptune#invalidateResultCacheKey', true)
 .V().has('genre','drama').in('likes')
```

That query clears the cache for the query key, `g.V().has('genre','drama').in('likes')`, and returns new results for that query.

You can also combine `invalidateResultCacheKey` with `enableResultCache` or `enableResultCacheWithTTL`. For example, the following query clears the current cached results, caches new results, and returns them:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#invalidateResultCacheKey', true)
 .V().has('genre','drama').in('likes')
```

### Using `invalidateResultCache`
<a name="using-invalidateResultCache"></a>

You can use the `invalidateResultCache` query hint to clear all cached results in the query result cache. For example:

```
g.with('Neptune#invalidateResultCache', true)
 .V().has('genre','drama').in('likes')
```

That query clears the entire result cache and returns new results for the query.

You can also combine `invalidateResultCache` with `enableResultCache` or `enableResultCacheWithTTL`. For example, the following query clears the entire results cache, caches new results for this query, and returns them:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#invalidateResultCache', true)
 .V().has('genre','drama').in('likes')
```

## Paginating cached query results
<a name="gremlin-results-cache-paginating"></a>

Suppose you have already cached a large number of results like this:

```
g.with('Neptune#enableResultCache', true)
 .V().has('genre','drama').in('likes')
```

Now suppose you issue the following range query:

```
g.with('Neptune#enableResultCache', true)
 .V().has('genre','drama').in('likes').range(0,10)
```

Neptune first looks for the full cache key, namely `g.V().has('genre','drama').in('likes').range(0,10)`. If that key doesn't exist, Neptune next looks to see if there is a key for that query string without the range (namely `g.V().has('genre','drama').in('likes')`). When it finds that key, Neptune then fetches the first ten results from its cache, as the range specifies.
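
That fallback lookup can be sketched as string handling on the query key. This is only an illustration of the matching rules just described, not how Neptune implements them:

```python
import re

def fetch_range(cache, query):
    """Exact key first; otherwise strip a trailing .range(lo, hi) and
    slice the base query's cached results."""
    if query in cache:
        return cache[query]
    m = re.match(r"(?P<base>.+)\.range\((?P<lo>\d+),\s*(?P<hi>\d+)\)$", query)
    if m and m.group("base") in cache:
        lo, hi = int(m.group("lo")), int(m.group("hi"))
        return cache[m.group("base")][lo:hi]
    return None
```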

**Note**  
If you use the `invalidateResultCacheKey` query hint with a query that has a range at the end, Neptune clears the cache for a query without the range if it doesn't find an exact match for the query with the range.

### Using `numResultsCached` with `.iterate()`
<a name="gremlin-results-cache-paginating-numResultsCached"></a>

Using the `numResultsCached` query hint, you can populate the results cache without returning all the results being cached, which can be useful when you prefer to paginate a large number of results.

The `numResultsCached` query hint only works with queries that end with `iterate()`.

For example, if you want to cache the first 50 results of the sample query:

```
g.with("Neptune#enableResultCache", true)
 .with("Neptune#numResultsCached", 50)
 .V().has('genre','drama').in('likes').iterate()
```

In this case the query key in the cache is: `g.with("Neptune#numResultsCached", 50).V().has('genre','drama').in('likes')`. You can now retrieve the first ten of the cached results with this query:

```
g.with("Neptune#enableResultCache", true)
 .with("Neptune#numResultsCached", 50)
 .V().has('genre','drama').in('likes').range(0, 10)
```

And, you can retrieve the next ten results from the query as follows:

```
g.with("Neptune#enableResultCache", true)
 .with("Neptune#numResultsCached", 50)
 .V().has('genre','drama').in('likes').range(10, 20)
```

Don't forget to include the `numResultsCached` hint! It is an essential part of the query key, so it must be present in order to access the cached results.

**Some things to keep in mind when using `numResultsCached`**
+ **The number you supply with `numResultsCached` is applied at the end of the query.**   This means, for example, that the following query actually caches results in the range `(1000, 1500)`:

  ```
  g.with("Neptune#enableResultCache", true)
   .with("Neptune#numResultsCached", 500)
   .V().range(1000, 2000).iterate()
  ```
+ **The number you supply with `numResultsCached` specifies the maximum number of results to cache.**   This means, for example, that the following query actually caches results in the range `(1000, 2000)`:

  ```
  g.with("Neptune#enableResultCache", true)
   .with("Neptune#numResultsCached", 100000)
   .V().range(1000, 2000).iterate()
  ```
+ **Results cached by queries that end with `.range().iterate()` have their own range.**   For example, suppose you cache results using a query like this:

  ```
  g.with("Neptune#enableResultCache", true)
   .with("Neptune#numResultsCached", 500)
   .V().range(1000, 2000).iterate()
  ```

  To retrieve the first 100 results from the cache, you would write a query like this:

  ```
  g.with("Neptune#enableResultCache", true)
   .with("Neptune#numResultsCached", 500)
   .V().range(1000, 2000).range(0, 100)
  ```

  Those hundred results would be equivalent to results from the base query in the range `(1000, 1100)`.
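
Taken together, these rules mean the cached window for a query ending in `.range(lo, hi).iterate()` can be computed as follows (a sketch of the arithmetic, with positions expressed relative to the base query):

```python
def cached_window(lo, hi, num_results_cached):
    """Positions (relative to the base query) of the slice that ends up
    in the cache: the numResultsCached hint caps how many results are
    kept, counting from the start of the range."""
    return lo, min(hi, lo + num_results_cached)

print(cached_window(1000, 2000, 500))     # (1000, 1500)
print(cached_window(1000, 2000, 100000))  # (1000, 2000)
```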

## The query cache keys used to locate cached results
<a name="gremlin-results-cache-query-keys"></a>

After the results of a query have been cached, subsequent queries with the same *query cache key* retrieve results from the cache rather than generating new ones. The query cache key of a query is evaluated as follows:

1. All the cache-related query hints are ignored, except for `numResultsCached`.

1. A final `iterate()` step is ignored.

1. The rest of the query is ordered according to its byte-code representation.

The resulting string is matched against an index of the query results already in the cache to determine whether there is a cache hit for the query.

For example, take this query:

```
g.withSideEffect('Neptune#typePromotion', false).with("Neptune#enableResultCache", true)
 .with("Neptune#numResultsCached", 50)
 .V().has('genre','drama').in('likes').iterate()
```

It will be stored as the byte-code version of this:

```
g.withSideEffect('Neptune#typePromotion', false)
 .with("Neptune#numResultsCached", 50)
 .V().has('genre','drama').in('likes')
```
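
A string-level Python sketch of this normalization follows. Neptune operates on the byte-code representation rather than on strings, so this is only an approximation that reproduces the effect on the example above:

```python
import re

# Cache hints that are stripped from the key; numResultsCached is kept.
STRIPPED_HINTS = (
    "enableResultCacheWithTTL",
    "enableResultCache",
    "invalidateResultCacheKey",
    "invalidateResultCache",
    "noCacheExceptions",
)

def cache_key(query):
    """Drop cache-related hints (except numResultsCached) and a final
    .iterate() step from a query string."""
    for hint in STRIPPED_HINTS:
        query = re.sub(
            r'\.?with\([\'"]Neptune#%s[\'"],\s*[^)]*\)' % hint, "", query)
    return re.sub(r"\.iterate\(\)\s*$", "", query)
```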

## Exceptions related to the results cache
<a name="gremlin-results-cache-exceptions"></a>

If the results of a query that you are trying to cache are too large to fit in the cache memory even after removing everything previously cached, Neptune raises a `QueryLimitExceededException` fault. No results are returned, and the exception generates the following error message:

```
The result size is larger than the allocated cache,
      please refer to results cache best practices for options to rerun the query.
```

You can suppress this message using the `noCacheExceptions` query hint, as follows:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#noCacheExceptions', true)
 .V().has('genre','drama').in('likes')
```

# Making efficient upserts with Gremlin `mergeV()` and `mergeE()` steps
<a name="gremlin-efficient-upserts"></a>

An upsert (or conditional insert) reuses a vertex or edge if it already exists, or creates it if it doesn't. Efficient upserts can make a significant difference in the performance of Gremlin queries.

Upserts allow you to write idempotent insert operations: no matter how many times you run such an operation, the overall outcome is the same. This is useful in highly concurrent write scenarios where concurrent modifications to the same part of the graph can force one or more transactions to roll back with a `ConcurrentModificationException`, thereby necessitating retries.

For example, the following query upserts a vertex by using the supplied `Map` to first try to find a vertex with a `T.id` of `"v-1"`. If that vertex is found, it is returned. If it is not found, a vertex with that ID, along with the label and property supplied in the `onCreate` clause, is created.

```
g.mergeV([(id):'v-1']).
  option(onCreate, [(label): 'PERSON', 'email': 'person-1@example.org'])
```

## Batching upserts to improve throughput
<a name="gremlin-upserts-batching"></a>

For high throughput write scenarios, you can chain `mergeV()` and `mergeE()` steps together to upsert vertices and edges in batches. Batching reduces the transactional overhead of upserting large numbers of vertices and edges. You can then further improve throughput by upserting batch requests in parallel using multiple clients.

As a rule of thumb we recommend upserting approximately 200 records per batch request. A record is a single vertex or edge label or property. A vertex with a single label and 4 properties, for example, creates 5 records. An edge with a label and a single property creates 2 records. If you wanted to upsert batches of vertices, each with a single label and 4 properties, you should start with a batch size of 40, because `200 / (1 + 4) = 40`.
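As a sketch, the record arithmetic above can be captured in a small helper (the function names are illustrative, not part of any Neptune or TinkerPop API):

```python
def records_per_vertex(num_labels: int, num_properties: int) -> int:
    """A record is a single vertex or edge label or property."""
    return num_labels + num_properties

def suggested_batch_size(records_per_element: int,
                         target_records: int = 200) -> int:
    """Start with roughly 200 records per batch request."""
    return target_records // records_per_element

# A vertex with one label and 4 properties creates 5 records,
# so a starting batch size is 200 / 5 = 40 vertices per request.
print(suggested_batch_size(records_per_vertex(1, 4)))  # 40
```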

You can experiment with the batch size. 200 records per batch is a good starting point, but the ideal batch size may be higher or lower depending on your workload. Note, however, that Neptune may limit the overall number of Gremlin steps per request. This limit is not documented, but to be on the safe side, try to ensure that your requests contain no more than 1,500 Gremlin steps. Neptune may reject large batch requests with more than 1,500 steps.

To increase throughput, you can upsert batches in parallel using multiple clients (see [Creating Efficient Multithreaded Gremlin Writes](best-practices-gremlin-multithreaded-writes.md)). The number of clients should be the same as the number of worker threads on your Neptune writer instance, which is typically 2 x the number of vCPUs on the server. For instance, an `r5.8xlarge` instance has 32 vCPUs and 64 worker threads. For high-throughput write scenarios using an `r5.8xlarge`, you would use 64 clients writing batch upserts to Neptune in parallel.

Each client should submit a batch request and wait for the request to complete before submitting another request. Although the multiple clients run in parallel, each individual client submits requests in a serial fashion. This ensures that the server is supplied with a steady stream of requests that occupy all the worker threads without flooding the server-side request queue (see [Sizing DB instances in a Neptune DB cluster](feature-overview-db-clusters.md#feature-overview-sizing-instances)).
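The pattern described above (parallel clients, each submitting serially) can be sketched as follows; `submit_batch` is a placeholder for your Gremlin client's blocking submit call, and the helper itself is illustrative rather than a Neptune API:

```python
import queue
import threading

def run_batches(batches, submit_batch, num_clients):
    """Each client submits one batch at a time and waits for it to
    complete before taking the next; clients run in parallel."""
    work = queue.Queue()
    for batch in batches:
        work.put(batch)

    def client_loop():
        while True:
            try:
                batch = work.get_nowait()
            except queue.Empty:
                return
            submit_batch(batch)  # blocks until the request completes
            work.task_done()

    threads = [threading.Thread(target=client_loop) for _ in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```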

## Try to avoid steps that generate multiple traversers
<a name="gremlin-upserts-single-traverser"></a>

When a Gremlin step executes, it takes an incoming traverser, and emits one or more output traversers. The number of traversers emitted by a step determines the number of times the next step is executed.

Typically, when performing batch operations you want each operation, such as upsert vertex A, to execute once, so that the sequence of operations looks like this: upsert vertex A, then upsert vertex B, then upsert vertex C, and so on. As long as a step creates or modifies only one element, it emits only one traverser, and the steps that represent the next operation are executed only once. If, on the other hand, an operation creates or modifies more than one element, it emits multiple traversers, which in turn cause the subsequent steps to be executed multiple times, once per emitted traverser. This can result in the database performing unnecessary additional work, and in some cases can result in the creation of unwanted additional vertices, edges or property values.

An example of how things can go wrong is with a query like `g.V().addV()`. This simple query adds a vertex for every vertex found in the graph, because `V()` emits a traverser for each vertex in the graph and each of those traversers triggers a call to `addV()`.
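As a rough way to reason about this, the number of times each step runs equals the number of traversers emitted by the previous step. The following toy model only does the counting; it is not a Gremlin implementation:

```python
def executions(traversers_emitted_per_step):
    """Given how many traversers each step emits per incoming traverser,
    return how many times each step executes, starting from one traverser."""
    counts = []
    incoming = 1
    for emitted in traversers_emitted_per_step:
        counts.append(incoming)
        incoming *= emitted
    return counts

# g.V().addV() on a graph with 100 vertices:
# V() runs once and emits 100 traversers, so addV() runs 100 times.
print(executions([100, 1]))  # [1, 100]
```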

See [Mixing upserts and inserts](#gremlin-upserts-and-inserts) for ways to deal with operations that can emit multiple traversers.

## Upserting vertices
<a name="gremlin-upserts-vertices"></a>

The `mergeV()` step is specifically designed for upserting vertices. It takes as an argument a `Map` that represents elements to match for existing vertices in the graph, and if an element is not found, uses that `Map` to create a new vertex. The step also allows you to alter the behavior in the event of a creation or a match, where the `option()` modulator can be applied with `Merge.onCreate` and `Merge.onMatch` tokens to control those respective behaviors. See the TinkerPop [Reference Documentation](https://tinkerpop.apache.org/docs/current/reference/#mergevertex-step) for further information about how to use this step.

You can use a vertex ID to determine whether a specific vertex exists. This is the preferred approach, because Neptune optimizes upserts for highly concurrent use cases around IDs. As an example, the following query creates a vertex with a given vertex ID if it doesn't already exist, or reuses it if it does:

```
g.mergeV([(T.id): 'v-1']).
    option(onCreate, [(T.label): 'PERSON', email: 'person-1@example.org', age: 21]).
    option(onMatch, [age: 22]).
  id()
```

Note that this query ends with an `id()` step. While not strictly necessary for the purpose of upserting the vertex, adding an `id()` step to the end of an upsert query ensures that the server doesn't serialize all the vertex properties back to the client, which helps reduce the locking cost of the query.

Alternatively, you can use a vertex property to identify a vertex:

```
g.mergeV([email: 'person-1@example.org']).
    option(onCreate, [(T.label): 'PERSON', age: 21]).
    option(onMatch, [age: 22]).
  id()
```

If possible, use your own user-supplied IDs to create vertices, and use these IDs to determine whether a vertex exists during an upsert operation. This lets Neptune optimize the upserts. An ID-based upsert can be significantly more efficient than a property-based upsert when concurrent modifications are common.

### Chaining vertex upserts
<a name="gremlin-upserts-vertices-chaining"></a>

You can chain vertex upserts together to insert them in a batch:

```
g.V('v-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-1')
                         .property('email', 'person-1@example.org'))
 .V('v-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-2')
                         .property('email', 'person-2@example.org'))
 .V('v-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-3')
                         .property('email', 'person-3@example.org'))
 .id()
```

Alternatively, you can use this `mergeV()` syntax:

```
g.mergeV([(T.id): 'v-1', (T.label): 'PERSON', email: 'person-1@example.org']).
  mergeV([(T.id): 'v-2', (T.label): 'PERSON', email: 'person-2@example.org']).
  mergeV([(T.id): 'v-3', (T.label): 'PERSON', email: 'person-3@example.org'])
```

However, because this form of the query includes elements in the search criteria that are superfluous to the basic lookup by `id`, it isn't as efficient as the previous query.
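If you generate batch requests from application data, you might build the chained form programmatically. The sketch below is purely illustrative (in practice, prefer parameterized requests through your Gremlin client over string concatenation):

```python
def build_batch_upsert(people):
    """Build a chained mergeV() query from (id, email) records,
    matching on T.id only and supplying the rest through onCreate."""
    steps = []
    for pid, email in people:
        steps.append(
            f"mergeV([(T.id): '{pid}'])."
            f"option(onCreate, [(T.label): 'PERSON', email: '{email}'])"
        )
    return "g." + ".".join(steps) + ".id()"

print(build_batch_upsert([("v-1", "person-1@example.org"),
                          ("v-2", "person-2@example.org")]))
```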

## Upserting edges
<a name="gremlin-upserts-edges"></a>

The `mergeE()` step is specifically designed for upserting edges. It takes a `Map` as an argument that represents elements to match for existing edges in the graph and if an element is not found, uses that `Map` to create a new edge. The step also allows you to alter the behavior in the event of a creation or a match, where the `option()` modulator can be applied with `Merge.onCreate` and `Merge.onMatch` tokens to control those respective behaviors. See the TinkerPop [Reference Documentation](https://tinkerpop.apache.org/docs/current/reference/#mergeedge-step) for further information about how to use this step.

You can use edge IDs to upsert edges in the same way you upsert vertices using custom vertex IDs. Again, this is the preferred approach because it allows Neptune to optimize the query. For example, the following query creates an edge based on its edge ID if it doesn't already exist, or reuses it if it does. The query also uses the IDs of the `Direction.from` and `Direction.to` vertices if it needs to create a new edge:

```
g.mergeE([(T.id): 'e-1']).
    option(onCreate, [(from): 'v-1', (to): 'v-2', weight: 1.0]).
    option(onMatch, [weight: 0.5]).
  id()
```

Note that this query ends with an `id()` step. While not strictly necessary for the purpose of upserting the edge, adding an `id()` step to the end of an upsert query ensures that the server doesn't serialize all the edge properties back to the client, which helps reduce the locking cost of the query.

Many applications use custom vertex IDs, but leave Neptune to generate edge IDs. If you don't know the ID of an edge, but you do know the `from` and `to` vertex IDs, you can use this kind of query to upsert an edge:

```
g.mergeE([(from): 'v-1', (to): 'v-2', (T.label): 'KNOWS']).
  id()
```

All vertices referenced by `mergeE()` must exist for the step to create the edge.

### Chaining edge upserts
<a name="gremlin-upserts-edges-chaining"></a>

As with vertex upserts, it's straightforward to chain `mergeE()` steps together for batch requests:

```
g.mergeE([(from): 'v-1', (to): 'v-2', (T.label): 'KNOWS']).
  mergeE([(from): 'v-2', (to): 'v-3', (T.label): 'KNOWS']).
  mergeE([(from): 'v-3', (to): 'v-4', (T.label): 'KNOWS']).
  id()
```

## Combining vertex and edge upserts
<a name="gremlin-upserts-vertexes-and-edges"></a>

Sometimes you may want to upsert both vertices and the edges that connect them. You can mix the batch examples presented here. The following example upserts 3 vertices and 2 edges:

```
g.mergeV([(id):'v-1']).
    option(onCreate, [(label): 'PERSON', 'email': 'person-1@example.org']).
  mergeV([(id):'v-2']).
    option(onCreate, [(label): 'PERSON', 'email': 'person-2@example.org']).
  mergeV([(id):'v-3']).
    option(onCreate, [(label): 'PERSON', 'email': 'person-3@example.org']).
  mergeE([(from): 'v-1', (to): 'v-2', (T.label): 'KNOWS']).
  mergeE([(from): 'v-2', (to): 'v-3', (T.label): 'KNOWS']).
  id()
```

## Mixing upserts and inserts
<a name="gremlin-upserts-and-inserts"></a>

Upserts typically proceed one element at a time. If you stick to the upsert patterns presented here, each upsert operation emits a single traverser, which causes the subsequent operation to be executed just once.

However, sometimes you may want to mix upserts with inserts. This can be the case, for example, if you use edges to represent instances of actions or events. A request might use upserts to ensure that all necessary vertices exist, and then use inserts to add edges. With requests of this kind, pay attention to the potential number of traversers being emitted from each operation.

Consider the following example, which mixes upserts and inserts to add edges that represent events into the graph:

```
// Fully optimized, but inserts too many edges
g.mergeV([(id):'p-1']).
    option(onCreate, [(label): 'PERSON', 'email': 'person-1@example.org']).
  mergeV([(id):'p-2']).
    option(onCreate, [(label): 'PERSON', 'email': 'person-2@example.org']).
  mergeV([(id):'p-3']).
    option(onCreate, [(label): 'PERSON', 'email': 'person-3@example.org']).
  mergeV([(T.id): 'c-1', (T.label): 'CITY', name: 'city-1']).
  V('p-1', 'p-2').
  addE('FOLLOWED').to(V('p-1')).
  V('p-1', 'p-2', 'p-3').
  addE('VISITED').to(V('c-1')).
  id()
```

The query should insert 5 edges: 2 FOLLOWED edges and 3 VISITED edges. However, the query as written inserts 8 edges: 2 FOLLOWED and 6 VISITED. The reason for this is that the operation that inserts the 2 FOLLOWED edges emits 2 traversers, causing the subsequent insert operation, which inserts 3 edges, to be executed twice.

The fix is to add a `fold()` step after each operation that can potentially emit more than one traverser:

```
g.mergeV([(T.id): 'p-1', (T.label): 'PERSON', email: 'person-1@example.org']).
  mergeV([(T.id): 'p-2', (T.label): 'PERSON', email: 'person-2@example.org']).
  mergeV([(T.id): 'p-3', (T.label): 'PERSON', email: 'person-3@example.org']).
  mergeV([(T.id): 'c-1', (T.label): 'CITY', name: 'city-1']).
  V('p-1', 'p-2').
  addE('FOLLOWED').
    to(V('p-1')).
  fold().
  V('p-1', 'p-2', 'p-3').
  addE('VISITED').
    to(V('c-1')).
  id()
```

Here we’ve inserted a `fold()` step after the operation that inserts FOLLOWED edges. This results in a single traverser, which then causes the subsequent operation to be executed only once.
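A quick check of the traverser arithmetic for both versions of the query:

```python
# Without fold(): the FOLLOWED insert emits 2 traversers, so the
# VISITED insert (3 edges per execution) runs twice.
followed = 2
visited_per_execution = 3
without_fold = followed + followed * visited_per_execution  # 2 + 6 = 8

# With a fold() after the FOLLOWED insert, a single traverser reaches
# the VISITED insert, so it runs once.
with_fold = followed + 1 * visited_per_execution            # 2 + 3 = 5

print(without_fold, with_fold)  # 8 5
```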

The downside of this approach is that the query is now not fully optimized, because `fold()` is not optimized. The insert operation that follows `fold()` will now also not be optimized.

If you need to use `fold()` to reduce the number of traversers on behalf of subsequent steps, try to order your operations so that the least expensive ones occupy the non-optimized part of the query.

## Setting cardinality
<a name="gremlin-upserts-setting-cardinality"></a>

The default cardinality for vertex properties in Neptune is `set`, which means that when you use `mergeV()`, the values supplied in the map are all given that cardinality. To use `single` cardinality, you must specify it explicitly. Starting in TinkerPop 3.7.0, there is new syntax that allows the cardinality to be supplied as part of the map, as shown in the following example:

```
g.mergeV([(T.id): '1234']).
  option(onMatch, ['age': single(20), 'name': single('alice'), 'city': set('miami')])
```

Alternatively, you may set the cardinality as a default for that `option` as follows:

```
// age and name are set to single cardinality by default
g.mergeV([(T.id): '1234']).
  option(onMatch, ['age': 22, 'name': 'alice', 'city': set('boston')], single)
```

There are fewer options for setting cardinality in `mergeV()` prior to version 3.7.0. The general approach is to fall back to the `property()` step, as follows:

```
g.mergeV([(T.id): '1234']). 
  option(onMatch, sideEffect(property(single,'age', 20).
  property(set,'city','miami')).constant([:]))
```

**Note**  
This approach only works when `mergeV()` is used as a start step. You therefore can't chain `mergeV()` steps that use this syntax within a single traversal, because a `mergeV()` that follows the start step produces an error if the incoming traverser is a graph element. In that case, break your `mergeV()` calls into multiple requests, each of which can be a start step.

# Making efficient Gremlin upserts with `fold()/coalesce()/unfold()`
<a name="gremlin-efficient-upserts-pre-3.6"></a>

An upsert (or conditional insert) reuses a vertex or edge if it already exists, or creates it if it doesn't. Efficient upserts can make a significant difference in the performance of Gremlin queries.

This page shows how to use the `fold()/coalesce()/unfold()` Gremlin pattern to make efficient upserts. However, with the release of TinkerPop version 3.6.x, introduced in Neptune engine version [1.2.1.0](engine-releases-1.2.1.0.md), the new `mergeV()` and `mergeE()` steps are preferable in most cases. The `fold()/coalesce()/unfold()` pattern described here may still be useful in some complex situations, but in general, use `mergeV()` and `mergeE()` if you can, as described in [Making efficient upserts with Gremlin `mergeV()` and `mergeE()` steps](gremlin-efficient-upserts.md).

Upserts allow you to write idempotent insert operations: no matter how many times you run such an operation, the overall outcome is the same. This is useful in highly concurrent write scenarios where concurrent modifications to the same part of the graph can force one or more transactions to roll back with a `ConcurrentModificationException`, thereby necessitating a retry.

For example, the following query upserts a vertex by first looking for the specified vertex in the dataset, and then folding the results into a list. In the first traversal supplied to the `coalesce()` step, the query then unfolds this list. If the unfolded list is not empty, the results are emitted from the `coalesce()`. If, however, the `unfold()` returns an empty collection because the vertex does not currently exist, `coalesce()` moves on to evaluate the second traversal with which it has been supplied, and in this second traversal the query creates the missing vertex.

```
g.V('v-1').fold()
          .coalesce(
             unfold(),
             addV('Person').property(id, 'v-1')
                           .property('email', 'person-1@example.org')
           )
```

## Use an optimized form of `coalesce()` for upserts
<a name="gremlin-upserts-pre-3.6-coalesce"></a>

Neptune can optimize the `fold().coalesce(unfold(), ...)` idiom to make high-throughput updates, but this optimization only works if both parts of the `coalesce()` return either a vertex or an edge but nothing else. If you try to return something different, such as a property, from any part of the `coalesce()`, the Neptune optimization does not occur. The query may succeed, but it will not perform as well as an optimized version, particularly against large datasets.

Because unoptimized upsert queries increase execution times and reduce throughput, it's worth using the Gremlin `explain` endpoint to determine whether an upsert query is fully optimized. When reviewing `explain` plans, look for lines that begin with `+ not converted into Neptune steps` and `WARNING: >>`. For example:

```
+ not converted into Neptune steps: [FoldStep, CoalesceStep([[UnfoldStep], [AddEdgeSte...
WARNING: >> FoldStep << is not supported natively yet
```

These warnings can help you identify the parts of a query that are preventing it from being fully optimized.

Sometimes it isn't possible to optimize a query fully. In these situations you should try to put the steps that cannot be optimized at the end of the query, thereby allowing the engine to optimize as many steps as possible. This technique is used in some of the batch upsert examples, where all optimized upserts for a set of vertices or edges are performed before any additional, potentially unoptimized modifications are applied to the same vertices or edges.

## Batching upserts to improve throughput
<a name="gremlin-upserts-pre-3.6-batching"></a>

For high throughput write scenarios, you can chain upsert steps together to upsert vertices and edges in batches. Batching reduces the transactional overhead of upserting large numbers of vertices and edges. You can then further improve throughput by upserting batch requests in parallel using multiple clients.

As a rule of thumb we recommend upserting approximately 200 records per batch request. A record is a single vertex or edge label or property. A vertex with a single label and 4 properties, for example, creates 5 records. An edge with a label and a single property creates 2 records. If you wanted to upsert batches of vertices, each with a single label and 4 properties, you should start with a batch size of 40, because `200 / (1 + 4) = 40`.

You can experiment with the batch size. 200 records per batch is a good starting point, but the ideal batch size may be higher or lower depending on your workload. Note, however, that Neptune may limit the overall number of Gremlin steps per request. This limit is not documented, but to be on the safe side, try to ensure that your requests contain no more than 1,500 Gremlin steps. Neptune may reject large batch requests with more than 1,500 steps.

To increase throughput, you can upsert batches in parallel using multiple clients (see [Creating Efficient Multithreaded Gremlin Writes](best-practices-gremlin-multithreaded-writes.md)). The number of clients should be the same as the number of worker threads on your Neptune writer instance, which is typically 2 x the number of vCPUs on the server. For instance, an `r5.8xlarge` instance has 32 vCPUs and 64 worker threads. For high-throughput write scenarios using an `r5.8xlarge`, you would use 64 clients writing batch upserts to Neptune in parallel.

Each client should submit a batch request and wait for the request to complete before submitting another request. Although the multiple clients run in parallel, each individual client submits requests in a serial fashion. This ensures that the server is supplied with a steady stream of requests that occupy all the worker threads without flooding the server-side request queue (see [Sizing DB instances in a Neptune DB cluster](feature-overview-db-clusters.md#feature-overview-sizing-instances)).

## Try to avoid steps that generate multiple traversers
<a name="gremlin-upserts-pre-3.6-single-traverser"></a>

When a Gremlin step executes, it takes an incoming traverser, and emits one or more output traversers. The number of traversers emitted by a step determines the number of times the next step is executed.

Typically, when performing batch operations you want each operation, such as upsert vertex A, to execute once, so that the sequence of operations looks like this: upsert vertex A, then upsert vertex B, then upsert vertex C, and so on. As long as a step creates or modifies only one element, it emits only one traverser, and the steps that represent the next operation are executed only once. If, on the other hand, an operation creates or modifies more than one element, it emits multiple traversers, which in turn cause the subsequent steps to be executed multiple times, once per emitted traverser. This can result in the database performing unnecessary additional work, and in some cases can result in the creation of unwanted additional vertices, edges or property values.

An example of how things can go wrong is with a query like `g.V().addV()`. This simple query adds a vertex for every vertex found in the graph, because `V()` emits a traverser for each vertex in the graph and each of those traversers triggers a call to `addV()`.

See [Mixing upserts and inserts](#gremlin-upserts-pre-3.6-and-inserts) for ways to deal with operations that can emit multiple traversers.

## Upserting vertices
<a name="gremlin-upserts-pre-3.6-vertices"></a>

You can use a vertex ID to determine whether a corresponding vertex exists. This is the preferred approach, because Neptune optimizes upserts for highly concurrent use cases around IDs. As an example, the following query creates a vertex with a given vertex ID if it doesn't already exist, or reuses it if it does:

```
g.V('v-1')
 .fold()
  .coalesce(unfold(),
            addV('Person').property(id, 'v-1')
                          .property('email', 'person-1@example.org'))
  .id()
```

Note that this query ends with an `id()` step. While not strictly necessary for the purpose of upserting the vertex, adding an `id()` step to the end of an upsert query ensures that the server doesn't serialize all the vertex properties back to the client, which helps reduce the locking cost of the query.

Alternatively, you can use a vertex property to determine whether the vertex exists:

```
g.V()
 .hasLabel('Person')
 .has('email', 'person-1@example.org')
 .fold()
 .coalesce(unfold(),
           addV('Person').property('email', 'person-1@example.org'))
 .id()
```

If possible, use your own user-supplied IDs to create vertices, and use these IDs to determine whether a vertex exists during an upsert operation. This lets Neptune optimize upserts around the IDs. An ID-based upsert can be significantly more efficient than a property-based upsert in highly concurrent modification scenarios.

### Chaining vertex upserts
<a name="gremlin-upserts-pre-3.6-vertices-chaining"></a>

You can chain vertex upserts together to insert them in a batch:

```
g.V('v-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-1')
                         .property('email', 'person-1@example.org'))
 .V('v-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-2')
                         .property('email', 'person-2@example.org'))
 .V('v-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-3')
                         .property('email', 'person-3@example.org'))
 .id()
```

## Upserting edges
<a name="gremlin-upserts-pre-3.6-edges"></a>

You can use edge IDs to upsert edges in the same way you upsert vertices using custom vertex IDs. Again, this is the preferred approach because it allows Neptune to optimize the query. For example, the following query creates an edge based on its edge ID if it doesn't already exist, or reuses it if it does. The query also uses the IDs of the `from` and `to` vertices if it needs to create a new edge.

```
g.E('e-1')
 .fold()
 .coalesce(unfold(),
           addE('KNOWS').from(V('v-1'))
                        .to(V('v-2'))
                        .property(id, 'e-1'))
 .id()
```

Many applications use custom vertex IDs, but leave Neptune to generate edge IDs. If you don't know the ID of an edge, but you do know the `from` and `to` vertex IDs, you can use this formulation to upsert an edge:

```
g.V('v-1')
 .outE('KNOWS')
 .where(inV().hasId('v-2'))
 .fold()
 .coalesce(unfold(),
           addE('KNOWS').from(V('v-1'))
                        .to(V('v-2')))
 .id()
```

Note that the vertex step in the `where()` clause should be `inV()` (or `outV()` if you've used `inE()` to find the edge), not `otherV()`. Do not use `otherV()` here, or the query will not be optimized and performance will suffer. For example, Neptune would not optimize the following query:

```
// Unoptimized upsert, because of otherV()
g.V('v-1')
 .outE('KNOWS')
 .where(otherV().hasId('v-2'))
 .fold()
 .coalesce(unfold(),
           addE('KNOWS').from(V('v-1'))
                        .to(V('v-2')))
 .id()
```

If you don't know the edge or vertex IDs up front, you can upsert using vertex properties:

```
g.V()
 .hasLabel('Person')
 .has('name', 'person-1')
 .outE('LIVES_IN')
 .where(inV().hasLabel('City').has('name', 'city-1'))
 .fold()
 .coalesce(unfold(),
           addE('LIVES_IN').from(V().hasLabel('Person')
                                    .has('name', 'person-1'))
                           .to(V().hasLabel('City')
                                  .has('name', 'city-1')))
 .id()
```

As with vertex upserts, it's preferable to use ID-based edge upserts using either an edge ID or `from` and `to` vertex IDs, rather than property-based upserts, so that Neptune can fully optimize the upsert.

### Checking for `from` and `to` vertex existence
<a name="gremlin-upserts-pre-3.6-edges-checking"></a>

Note the construction of the steps that create a new edge: `addE().from().to()`. This construction ensures that the query checks the existence of both the `from` and the `to` vertex. If either of these does not exist, the query returns an error as follows:

```
{
  "detailedMessage": "Encountered a traverser that does not map to a value for child...
  "code": "IllegalArgumentException",
  "requestId": "..."
}
```

If it's possible that either the `from` or the `to` vertex doesn't exist, you should attempt to upsert them before upserting the edge between them. See [Combining vertex and edge upserts](#gremlin-upserts-pre-3.6-vertexes-and-edges).

There's an alternative construction for creating an edge that you shouldn't use: `V().addE().to()`. It only adds an edge if the `from` vertex exists. If the `to` vertex doesn't exist, the query generates an error, as described previously, but if the `from` vertex doesn't exist, the query silently fails to insert an edge, without generating any error. For example, the following upsert completes without upserting an edge if the `from` vertex doesn't exist:

```
// Will not insert edge if from vertex does not exist
g.V('v-1')
 .outE('KNOWS')
 .where(inV().hasId('v-2'))
 .fold()
 .coalesce(unfold(),
           V('v-1').addE('KNOWS')
                   .to(V('v-2')))
 .id()
```

### Chaining edge upserts
<a name="gremlin-upserts-pre-3.6-edges-chaining"></a>

If you want to chain edge upserts together to create a batch request, you must begin each upsert with a vertex lookup, even if you already know the edge IDs.

If you do already know the IDs of the edges you want to upsert, and the IDs of the `from` and `to` vertices, you can use this formulation:

```
g.V('v-1')
 .outE('KNOWS')
 .hasId('e-1')
 .fold()
 .coalesce(unfold(),
           V('v-1').addE('KNOWS')
                   .to(V('v-2'))
                   .property(id, 'e-1'))
 .V('v-3')
 .outE('KNOWS')
 .hasId('e-2').fold()
 .coalesce(unfold(),
           V('v-3').addE('KNOWS')
                   .to(V('v-4'))
                   .property(id, 'e-2'))
 .V('v-5')
 .outE('KNOWS')
 .hasId('e-3')
 .fold()
 .coalesce(unfold(),
           V('v-5').addE('KNOWS')
                   .to(V('v-6'))
                   .property(id, 'e-3'))
 .id()
```

Perhaps the most common batch edge upsert scenario is that you know the `from` and `to` vertex IDs, but don't know the IDs of the edges you want to upsert. In that case, use the following formulation:

```
g.V('v-1')
 .outE('KNOWS')
 .where(inV().hasId('v-2'))
 .fold()
 .coalesce(unfold(),
           V('v-1').addE('KNOWS')
                   .to(V('v-2')))

 .V('v-3')
 .outE('KNOWS')
 .where(inV().hasId('v-4'))
 .fold()
 .coalesce(unfold(),
           V('v-3').addE('KNOWS')
                   .to(V('v-4')))
 .V('v-5')
 .outE('KNOWS')
 .where(inV().hasId('v-6'))
 .fold()
 .coalesce(unfold(),
           V('v-5').addE('KNOWS').to(V('v-6')))
 .id()
```

If you know the IDs of the edges you want to upsert, but don't know the IDs of the `from` and `to` vertices (this is unusual), you can use this formulation:

```
g.V()
 .hasLabel('Person')
 .has('email', 'person-1@example.org')
 .outE('KNOWS')
 .hasId('e-1')
 .fold()
 .coalesce(unfold(),
           V().hasLabel('Person')
              .has('email', 'person-1@example.org')
              .addE('KNOWS')
              .to(V().hasLabel('Person')
                     .has('email', 'person-2@example.org'))
               .property(id, 'e-1'))
 .V()
 .hasLabel('Person')
 .has('email', 'person-3@example.org')
 .outE('KNOWS')
 .hasId('e-2')
 .fold()
 .coalesce(unfold(),
           V().hasLabel('Person')
              .has('email', 'person-3@example.org')
              .addE('KNOWS')
              .to(V().hasLabel('Person')
                     .has('email', 'person-4@example.org'))
              .property(id, 'e-2'))
 .V()
 .hasLabel('Person')
 .has('email', 'person-5@example.org')
 .outE('KNOWS')
 .hasId('e-3')
 .fold()
 .coalesce(unfold(),
           V().hasLabel('Person')
              .has('email', 'person-5@example.org')
              .addE('KNOWS')
              .to(V().hasLabel('Person')
                     .has('email', 'person-6@example.org'))
               .property(id, 'e-3'))
 .id()
```

## Combining vertex and edge upserts
<a name="gremlin-upserts-pre-3.6-vertexes-and-edges"></a>

Sometimes you may want to upsert both vertices and the edges that connect them. You can mix the batch examples presented here. The following example upserts 3 vertices and 2 edges:

```
g.V('p-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-1')
                         .property('email', 'person-1@example.org'))
 .V('p-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-2')
                          .property('email', 'person-2@example.org'))
 .V('c-1')
 .fold()
 .coalesce(unfold(),
           addV('City').property(id, 'c-1')
                       .property('name', 'city-1'))
 .V('p-1')
 .outE('LIVES_IN')
 .where(inV().hasId('c-1'))
 .fold()
 .coalesce(unfold(),
           V('p-1').addE('LIVES_IN')
                   .to(V('c-1')))
 .V('p-2')
 .outE('LIVES_IN')
 .where(inV().hasId('c-1'))
 .fold()
 .coalesce(unfold(),
           V('p-2').addE('LIVES_IN')
                   .to(V('c-1')))
 .id()
```

## Mixing upserts and inserts
<a name="gremlin-upserts-pre-3.6-and-inserts"></a>

Upserts typically proceed one element at a time. If you stick to the upsert patterns presented here, each upsert operation emits a single traverser, which causes the subsequent operation to be executed just once.

However, sometimes you may want to mix upserts with inserts. This can be the case, for example, if you use edges to represent instances of actions or events. A request might use upserts to ensure that all necessary vertices exist, and then use inserts to add edges. With requests of this kind, pay attention to the potential number of traversers being emitted from each operation.

Consider the following example, which mixes upserts and inserts to add edges that represent events into the graph:

```
// Fully optimized, but inserts too many edges
g.V('p-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-1')
                         .property('email', 'person-1@example.org'))
 .V('p-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-2')
                          .property('email', 'person-2@example.org'))
 .V('p-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-3')
                          .property('email', 'person-3@example.org'))
 .V('c-1')
 .fold()
 .coalesce(unfold(),
           addV('City').property(id, 'c-1')
                       .property('name', 'city-1'))
 .V('p-1', 'p-2')
 .addE('FOLLOWED')
 .to(V('p-3'))
 .V('p-1', 'p-2', 'p-3')
 .addE('VISITED')
 .to(V('c-1'))
 .id()
```

The query should insert 5 edges: 2 FOLLOWED edges and 3 VISITED edges. However, the query as written inserts 8 edges: 2 FOLLOWED and 6 VISITED. The reason for this is that the operation that inserts the 2 FOLLOWED edges emits 2 traversers, causing the subsequent insert operation, which inserts 3 edges, to be executed twice.
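The traverser arithmetic can be sketched in plain Python (a hypothetical simulation for illustration only, not Neptune code):

```python
# Hypothetical simulation of Gremlin traverser multiplication.
# A step runs once per incoming traverser; V('p-1', 'p-2') emits
# two traversers, so the subsequent addE step executes twice.

def v_step(ids):
    # g.V(id1, id2, ...) emits one traverser per matched vertex
    return list(ids)

edges = []

# .V('p-1', 'p-2').addE('FOLLOWED') -- executed once, inserts 2 edges
followed_traversers = v_step(['p-1', 'p-2'])
for t in followed_traversers:
    edges.append((t, 'FOLLOWED'))

# .V('p-1', 'p-2', 'p-3').addE('VISITED') -- executed once PER incoming
# traverser (2 of them); each execution inserts 3 edges
for _ in followed_traversers:              # 2 incoming traversers
    for t in v_step(['p-1', 'p-2', 'p-3']):
        edges.append((t, 'VISITED'))

print(len([e for e in edges if e[1] == 'FOLLOWED']))  # 2
print(len([e for e in edges if e[1] == 'VISITED']))   # 6
print(len(edges))                                     # 8
```

This reproduces the 2 + 6 = 8 edge count described above.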

The fix is to add a `fold()` step after each operation that can potentially emit more than one traverser:

```
g.V('p-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-1')
                         .property('email', 'person-1@example.org'))
 .V('p-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-2')
                         .property('email', 'person-2@example.org'))
 .V('p-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-3')
                         .property('email', 'person-3@example.org'))
 .V('c-1')
 .fold()
 .coalesce(unfold(),
           addV('City').property(id, 'c-1')
                       .property('name', 'city-1'))
 .V('p-1', 'p-2')
 .addE('FOLLOWED')
 .to(V('p-3'))
 .fold()
 .V('p-1', 'p-2', 'p-3')
 .addE('VISITED')
 .to(V('c-1'))
 .id()
```

Here we’ve inserted a `fold()` step after the operation that inserts FOLLOWED edges. This results in a single traverser, which then causes the subsequent operation to be executed only once.

The downside of this approach is that the query is no longer fully optimized: `fold()` is not an optimized step, so the insert operation that follows it is not optimized either.

If you need to use `fold()` to reduce the number of traversers on behalf of subsequent steps, try to order your operations so that the least expensive ones occupy the non-optimized part of the query.

## Upserts that modify existing vertices and edges
<a name="gremlin-upserts-pre-3.6-that-modify"></a>

Sometimes you want to create a vertex or edge if it doesn’t exist, and then add or update a property to it, regardless of whether it is a new or existing vertex or edge.

To add or modify a property, use the `property()` step. Use this step outside the `coalesce()` step. If you try to modify the property of an existing vertex or edge inside the `coalesce()` step, the query may not be optimized by the Neptune query engine.

The following query adds or updates a counter property on each upserted vertex. Each `property()` step has single cardinality to ensure that the new values replace any existing values, rather than being added to a set of existing values.

```
g.V('v-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-1')
                         .property('email', 'person-1@example.org'))
 .property(single, 'counter', 1)
 .V('v-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-2')
                         .property('email', 'person-2@example.org'))
 .property(single, 'counter', 2)
 .V('v-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-3')
                         .property('email', 'person-3@example.org'))
 .property(single, 'counter', 3)
 .id()
```

If you have a property value, such as a `lastUpdated` timestamp value, that applies to all upserted elements, you can add or update it at the end of the query:

```
g.V('v-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-1')
                         .property('email', 'person-1@example.org'))
 .V('v-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-2')
                         .property('email', 'person-2@example.org'))
 .V('v-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-3')
                         .property('email', 'person-3@example.org'))
 .V('v-1', 'v-2', 'v-3')
 .property(single, 'lastUpdated', datetime('2020-02-08'))
 .id()
```

If there are additional conditions that determine whether or not a vertex or edge should be further modified, you can use a `has()` step to filter the elements to which a modification will be applied. The following example uses a `has()` step to filter upserted vertices based on the value of their `version` property. The query then sets the `version` of any vertex whose `version` is less than 3 to 3:

```
g.V('v-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-1')
                         .property('email', 'person-1@example.org')
                         .property('version', 3))
 .V('v-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-2')
                         .property('email', 'person-2@example.org')
                         .property('version', 3))
 .V('v-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-3')
                         .property('email', 'person-3@example.org')
                         .property('version', 3))
 .V('v-1', 'v-2', 'v-3')
 .has('version', lt(3))
 .property(single, 'version', 3)
 .id()
```

# Analyzing Neptune query execution using Gremlin `explain`
<a name="gremlin-explain"></a>

Amazon Neptune has added a Gremlin feature named *explain*. This feature is a self-service tool for understanding the execution approach taken by the Neptune engine. You invoke it by adding an `explain` parameter to an HTTP call that submits a Gremlin query.

The `explain` feature provides information about the logical structure of query execution plans. You can use this information to identify potential evaluation and execution bottlenecks and tune your query, as explained in [Tuning Gremlin queries](gremlin-traversal-tuning.md). You can also use [query hints](gremlin-query-hints.md) to improve query execution plans.

**Topics**
+ [Understanding how Gremlin queries work in Neptune](gremlin-explain-background.md)
+ [Using the Gremlin `explain` API in Neptune](gremlin-explain-api.md)
+ [Gremlin `profile` API in Neptune](gremlin-profile-api.md)
+ [Tuning Gremlin queries using `explain` and `profile`](gremlin-traversal-tuning.md)
+ [Native Gremlin step support in Amazon Neptune](gremlin-step-support.md)

# Understanding how Gremlin queries work in Neptune
<a name="gremlin-explain-background"></a>

To take full advantage of the Gremlin `explain` and `profile` reports in Amazon Neptune, it is helpful to understand some background information about Gremlin queries.

**Topics**
+ [Gremlin statements in Neptune](gremlin-explain-background-statements.md)
+ [How Neptune processes Gremlin queries using statement indexes](gremlin-explain-background-indexing-examples.md)
+ [How Gremlin queries are processed in Neptune](gremlin-explain-background-querying.md)

# Gremlin statements in Neptune
<a name="gremlin-explain-background-statements"></a>

Property graph data in Amazon Neptune is composed of four-position (quad) statements. Each of these statements represents an individual atomic unit of property graph data. For more information, see [Neptune Graph Data Model](feature-overview-data-model.md). Similar to the Resource Description Framework (RDF) data model, these four positions are as follows:
+ `subject (S)`
+ `predicate (P)`
+ `object (O)`
+ `graph (G)`

Each statement is an assertion about one or more resources. For example, a statement can assert the existence of a relationship between two resources, or it can attach a property (key-value pair) to some resource.

You can think of the predicate as the verb of the statement, describing the type of relationship or property. The object is the target of the relationship, or the value of the property. The graph position is optional and can be used in many different ways. For the Neptune property graph (PG) data, it is either unused (null graph) or it is used to represent the identifier for an edge. A set of statements with shared resource identifiers creates a graph.
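As an illustration only (this is not Neptune's storage format), the quad model can be sketched as plain Python 4-tuples, with `'~'` standing in for the null graph and `'~label'` for the reserved label predicate:

```python
# Illustrative sketch: modeling Neptune property-graph statements as
# (S, P, O, G) tuples. '~' represents the null graph and '~label' the
# reserved label predicate.

def vertex_label(vertex_id, label):
    # asserts that a vertex exists and has the given label
    return (vertex_id, '~label', label, '~')

def edge(from_id, edge_label, to_id, edge_id):
    # asserts a relationship; the edge ID occupies the graph position
    return (from_id, edge_label, to_id, edge_id)

def prop(element_id, key, value):
    # attaches a key-value property to a vertex or edge
    return (element_id, key, value, '~')

statements = [
    vertex_label('v1', 'Person'),     # g.addV("Person").property(id, "v1")
    edge('v1', 'knows', 'v2', 'e1'),  # g.addE("knows").from(V("v1"))...
    prop('v1', 'name', 'John'),       # g.V("v1").property("name", "John")
]
for s in statements:
    print(s)
```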

There are three classes of statements in the Neptune property graph data model:

**Topics**
+ [Vertex Label Statements](#gremlin-explain-background-vertex-labels)
+ [Edge Statements](#gremlin-explain-background-edge-statements)
+ [Property Statements](#gremlin-explain-background-property-statements)

## Gremlin Vertex Label Statements
<a name="gremlin-explain-background-vertex-labels"></a>

Vertex label statements in Neptune serve two purposes:
+ They track the labels for a vertex.
+ The presence of at least one of these statements is what implies the existence of a particular vertex in the graph.

The subject of these statements is a vertex identifier, and the object is a label, both of which are specified by the user. You use a special fixed predicate for these statements, displayed as `<~label>`, and a default graph identifier (the null graph), displayed as `<~>`.

For example, consider the following `addV` traversal.

```
g.addV("Person").property(id, "v1")
```

This traversal results in the following statement being added to the graph.

```
StatementEvent[Added(<v1> <~label> <Person> <~>) .]
```

## Gremlin Edge Statements
<a name="gremlin-explain-background-edge-statements"></a>

A Gremlin edge statement is what implies the existence of an edge between two vertices in a graph in Neptune. The subject (S) of an edge statement is the source `from` vertex. The predicate (P) is a user-supplied edge label. The object (O) is the target `to` vertex. The graph (G) is a user-supplied edge identifier.

For example, consider the following `addE` traversal.

```
g.addE("knows").from(V("v1")).to(V("v2")).property(id, "e1")
```

The traversal results in the following statement being added to the graph.

```
StatementEvent[Added(<v1> <knows> <v2> <e1>) .]
```

## Gremlin Property Statements
<a name="gremlin-explain-background-property-statements"></a>

A Gremlin property statement in Neptune asserts an individual property value for a vertex or edge. The subject is a user-supplied vertex or edge identifier. The predicate is the property name (key), and the object is the individual property value. The graph (G) is again the default graph identifier, the null graph, displayed as `<~>`.

Consider the following vertex property example.

```
g.V("v1").property("name", "John")
```

This statement results in the following.

```
StatementEvent[Added(<v1> <name> "John" <~>) .]
```

Property statements differ from others in that their object is a primitive value (a `string`, `date`, `byte`, `short`, `int`, `long`, `float`, or `double`). Their object is not a resource identifier that could be used as the subject of another assertion.

For multi-properties, each individual property value in the set receives its own statement.

```
g.V("v1").property(set, "phone", "956-424-2563").property(set, "phone", "956-354-3692")
```

This results in the following.

```
StatementEvent[Added(<v1> <phone> "956-424-2563" <~>) .]
StatementEvent[Added(<v1> <phone> "956-354-3692" <~>) .]
```

Edge properties are handled similarly to vertex properties, but use the edge identifier in the (S) position. For example, adding a property to an edge:

```
g.E("e1").property("weight", 0.8)
```

This results in the following statement being added to the graph.

```
StatementEvent[Added(<e1> <weight> 0.8 <~>) .]
```

# How Neptune processes Gremlin queries using statement indexes
<a name="gremlin-explain-background-indexing-examples"></a>

Statements are accessed in Amazon Neptune by way of three statement indexes, as detailed in [How Statements Are Indexed in Neptune](feature-overview-storage-indexing.md). Neptune extracts a statement *pattern* from a Gremlin query in which some positions are known, and the rest are left for discovery by index search.

Neptune assumes that the size of the property graph schema is not large. This means that the number of distinct edge labels and property names is fairly low, resulting in a low total number of distinct predicates. Neptune tracks distinct predicates in a separate index. It uses this cache of predicates to do a union scan of `{ all P x POGS }` rather than use an OSGP index. Avoiding the need for a reverse traversal OSGP index saves both storage space and load throughput.
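The union-scan idea can be sketched in toy Python (an illustrative model of the approach, not Neptune internals; the index contents are made up):

```python
# Toy sketch: answering a reverse traversal (all in-edges of a vertex)
# without an OSGP index, by scanning a POGS index once per known predicate.

pogs_index = {
    # predicate -> (O, G, S) entries, standing in for POGS key ranges
    'knows': [('v1', 'e1', 'v2'), ('v1', 'e2', 'v5')],
    'likes': [('v1', 'e3', 'v7'), ('v9', 'e4', 'v8')],
}
predicates = list(pogs_index)   # Neptune caches the distinct predicates

def in_edges(vertex_id):
    # union scan over { all P x POGS }: one range lookup per predicate
    results = []
    for p in predicates:
        for (o, g, s) in pogs_index[p]:
            if o == vertex_id:
                results.append((s, p, o, g))
    return results

print(in_edges('v1'))
```

The cost grows with the number of distinct predicates, which is why a graph with many edge labels and property names can make reverse traversals expensive.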

The Neptune Gremlin `explain`/`profile` API lets you obtain the predicate count in your graph, so you can determine whether your application violates Neptune's assumption that the property-graph schema is small.

The following examples help illustrate how Neptune uses indexes to process Gremlin queries.

**Question: What are the labels of vertex `v1`?**

```
  Gremlin code:      g.V('v1').label()
  Pattern:           (<v1>, <~label>, ?, ?)
  Known positions:   SP
  Lookup positions:  OG
  Index:             SPOG
  Key range:         <v1>:<~label>:*
```

**Question: What are the 'knows' out-edges of vertex `v1`?**

```
  Gremlin code:      g.V('v1').out('knows')
  Pattern:           (<v1>, <knows>, ?, ?)
  Known positions:   SP
  Lookup positions:  OG
  Index:             SPOG
  Key range:         <v1>:<knows>:*
```

**Question: Which vertices have a `Person` vertex label?**

```
  Gremlin code:      g.V().hasLabel('Person')
  Pattern:           (?, <~label>, <Person>, <~>)
  Known positions:   POG
  Lookup positions:  S
  Index:             POGS
  Key range:         <~label>:<Person>:<~>:*
```

**Question: What are the from/to vertices of a given edge `e1`?**

```
  Gremlin code:      g.E('e1').bothV()
  Pattern:           (?, ?, ?, <e1>)
  Known positions:   G
  Lookup positions:  SPO
  Index:             GPSO
  Key range:         <e1>:*
```

One statement index that Neptune does **not** have is a reverse traversal OSGP index. This index could be used to gather all incoming edges across all edge labels, as in the following example.

**Question: What are the incoming adjacent vertices of vertex `v1`?**

```
  Gremlin code:      g.V('v1').in()
  Pattern:           (?, ?, <v1>, ?)
  Known positions:   O
  Lookup positions:  SPG
  Index:             OSGP  // <-- Index does not exist
```
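To make the key-range idea concrete, here is a toy sketch (plain Python, not Neptune internals) of prefix lookups against sorted SPOG and POGS indexes, using the patterns from two of the examples above:

```python
import bisect

# Toy model: each statement index is a sorted list of reordered quads.
quads = [
    ('v1', '~label', 'Person', '~'),
    ('v1', 'knows', 'v2', 'e1'),
    ('v2', '~label', 'Person', '~'),
]

spog = sorted(quads)                                   # (S, P, O, G)
pogs = sorted((p, o, g, s) for s, p, o, g in quads)    # (P, O, G, S)

def prefix_scan(index, prefix):
    # Range scan: return every entry whose leading positions match prefix.
    lo = bisect.bisect_left(index, prefix)
    return [row for row in index[lo:] if row[:len(prefix)] == prefix]

# g.V('v1').label(): known SP, key range <v1>:<~label>:* on SPOG
print(prefix_scan(spog, ('v1', '~label')))

# g.V().hasLabel('Person'): known POG, key range <~label>:<Person>:<~>:* on POGS
print(prefix_scan(pogs, ('~label', 'Person', '~')))
```

The second scan finds both `Person` vertices (`v1` and `v2`) with a single range lookup, which is exactly what the POGS index makes cheap.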

# How Gremlin queries are processed in Neptune
<a name="gremlin-explain-background-querying"></a>

In Amazon Neptune, a more complex traversal can be represented as a series of patterns. Each pattern defines named variables, and sharing a variable across patterns creates a join. This is shown in the following example.

**Question: What is the two-hop neighborhood of vertex `v1`?**

```
  Gremlin code:      g.V('v1').out('knows').out('knows').path()
  Pattern:           (?1=<v1>, <knows>, ?2, ?) X Pattern(?2, <knows>, ?3, ?)

  The pattern produces a three-column relation (?1, ?2, ?3) like this:
                     ?1     ?2     ?3
                     ================
                     v1     v2     v3
                     v1     v2     v4
                     v1     v5     v6
```

By sharing the `?2` variable across the two patterns (at the O position in the first pattern and the S position of the second pattern), you create a join from the first hop neighbors to the second hop neighbors. Each Neptune solution has bindings for the three named variables, which can be used to re-create a [TinkerPop Traverser](http://tinkerpop.apache.org/docs/current/reference/#_the_traverser) (including path information).
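A hypothetical Python sketch of this join (the edge list is made up to match the relation shown in the example; `?2` binds the O position of the first pattern to the S position of the second):

```python
# Toy illustration of joining two 'knows' patterns on the shared ?2 variable.
knows = [('v1', 'v2'), ('v1', 'v5'), ('v2', 'v3'), ('v2', 'v4'), ('v5', 'v6')]

# Pattern 1: (?1=<v1>, <knows>, ?2)   Pattern 2: (?2, <knows>, ?3)
solutions = [
    (s1, o1, o2)                        # bindings for (?1, ?2, ?3)
    for (s1, o1) in knows if s1 == 'v1'
    for (s2, o2) in knows if s2 == o1   # join: ?2 shared across patterns
]
for row in solutions:
    print(row)
```

This produces the three-column relation `('v1','v2','v3')`, `('v1','v2','v4')`, `('v1','v5','v6')`.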


The first step in Gremlin query processing is to parse the query into a TinkerPop [Traversal](http://tinkerpop.apache.org/docs/current/reference/#traversal) object, composed of a series of TinkerPop [steps](http://tinkerpop.apache.org/docs/current/reference/#graph-traversal-steps). These steps, which are part of the open-source [Apache TinkerPop project](http://tinkerpop.apache.org/), are both the logical and physical operators that compose a Gremlin traversal in the reference implementation: they represent the model of the query, and they are also executable operators that produce solutions according to the semantics of the operator they represent. For example, `.V()` is both represented and executed by the TinkerPop [GraphStep](http://tinkerpop.apache.org/docs/current/reference/#graph-step).

Because these off-the-shelf TinkerPop steps are executable, such a TinkerPop Traversal can execute any Gremlin query and produce the correct answer. However, when executed against a large graph, TinkerPop steps can sometimes be very inefficient and slow. Instead of using them, Neptune tries to convert the traversal into a declarative form composed of groups of patterns, as described previously.

Neptune doesn't currently support all Gremlin operators (steps) in its native query engine. So it tries to collapse as many steps as possible down into a single `NeptuneGraphQueryStep`, which contains the declarative logical query plan for all the steps that have been converted. Ideally, all steps are converted. But when a step is encountered that can't be converted, Neptune breaks out of native execution and defers all query execution from that point forward to the TinkerPop steps. It doesn't try to weave in and out of native execution.

After the steps are translated into a logical query plan, Neptune runs a series of query optimizers that rewrite the query plan based on static analysis and estimated cardinalities. These optimizers do things like reorder operators based on range counts, prune unnecessary or redundant operators, rearrange filters, push operators into different groups, and so on.

After an optimized query plan is produced, Neptune creates a pipeline of physical operators that do the work of executing the query. This includes reading data from the statement indices, performing joins of various types, filtering, ordering, and so on. The pipeline produces a solution stream that is then converted back into a stream of TinkerPop Traverser objects.

## Serialization of query results
<a name="gremlin-explain-background-querying-serialization"></a>

Amazon Neptune currently relies on the TinkerPop response message serializers to convert query results (TinkerPop Traversers) into the serialized data to be sent over the wire back to the client. These serialization formats tend to be quite verbose.

For example, to serialize the result of a vertex query such as `g.V().limit(1)`, the Neptune query engine must perform a single search to produce the query result. However, the `GraphSON` serializer would perform a large number of additional searches to package the vertex into the serialization format. It would have to perform one search to get the label, one to get the property keys, and one search per property key for the vertex to get all the values for each key.

Some of the serialization formats are more efficient, but all require additional searches. Additionally, the TinkerPop serializers don't try to avoid duplicated searches, often resulting in many searches being repeated unnecessarily.

This makes it very important to write your queries so that they ask specifically just for the information they need. For example, `g.V().limit(1).id()` would return just the vertex ID and eliminate all the additional serializer searches. The [Gremlin `profile` API in Neptune](gremlin-profile-api.md) allows you to see how many search calls are made during query execution and during serialization.

# Using the Gremlin `explain` API in Neptune
<a name="gremlin-explain-api"></a>

The Amazon Neptune Gremlin `explain` API returns the query plan that would be executed if a specified query were run. Because the API doesn't actually run the query, the plan is returned almost instantaneously.

It differs from the TinkerPop `.explain()` step in that it reports information specific to the Neptune engine.

## Information contained in a Gremlin `explain` report
<a name="gremlin-explain-api-results"></a>

An `explain` report contains the following information:
+ The query string as requested.
+ **The original traversal.** This is the TinkerPop Traversal object produced by parsing the query string into TinkerPop steps. It is equivalent to the original query produced by running `.explain()` on the query against the TinkerPop TinkerGraph.
+ **The converted traversal.** This is the Neptune Traversal produced by converting the TinkerPop Traversal into the Neptune logical query plan representation. In many cases the entire TinkerPop traversal is converted into two Neptune steps: one that executes the entire query (`NeptuneGraphQueryStep`) and one that converts the Neptune query engine output back into TinkerPop Traversers (`NeptuneTraverserConverterStep`).
+ **The optimized traversal.** This is the optimized version of the Neptune query plan after it has been run through a series of static work-reducing optimizers that rewrite the query based on static analysis and estimated cardinalities. These optimizers do things like reorder operators based on range counts, prune unnecessary or redundant operators, rearrange filters, push operators into different groups, and so on.
+ **The predicate count.** Because of the Neptune indexing strategy described earlier, having a large number of different predicates can cause performance problems. This is especially true for queries that use reverse traversal operators with no edge label (`.in` or `.both`). If such operators are used and the predicate count is high enough, the `explain` report displays a warning message.
+ **DFE information.** When the DFE alternative engine is enabled, the following traversal components may show up in the optimized traversal:
  + **`DFEStep`**   –   A Neptune optimized DFE step in the traversal that contains a child `DFENode`. `DFEStep` represents the part of the query plan that is executed in the DFE engine.
  + **`DFENode`**   –   Contains the intermediate representation as one or more child `DFEJoinGroupNodes`.
  + **`DFEJoinGroupNode`**   –   Represents a join of one or more `DFENode` or `DFEJoinGroupNode` elements.
  + **`NeptuneInterleavingStep`**   –   A Neptune optimized DFE step in the traversal that contains a child `DFEStep`.

    Also contains a `stepInfo` element that contains information about the traversal, such as the frontier element, the path elements used, and so on. This information is used to process the child `DFEStep`.

  An easy way to find out if your query is being evaluated by DFE is to check whether the `explain` output contains a `DFEStep`. Any part of the traversal that is not part of the `DFEStep` will not be executed by DFE and will be executed by the TinkerPop engine.

  See [Example with DFE enabled](#gremlin-explain-dfe) for a sample report.

## Gremlin `explain` syntax
<a name="gremlin-explain-api-syntax"></a>

The syntax of the `explain` API is the same as that for the HTTP API for query, except that it uses `/gremlin/explain` as the endpoint instead of `/gremlin`, as in the following examples.

------
#### [ AWS CLI ]

```
aws neptunedata execute-gremlin-explain-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --gremlin-query "g.V().limit(1)"
```

For more information, see [execute-gremlin-explain-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-gremlin-explain-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_gremlin_explain_query(
    gremlinQuery='g.V().limit(1)'
)

print(response['output'])
```

For AWS SDK examples in other languages like Java, .NET, and more, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/explain \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d '{"gremlin":"g.V().limit(1)"}'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl -X POST https://your-neptune-endpoint:port/gremlin/explain \
  -d '{"gremlin":"g.V().limit(1)"}'
```

------

The preceding query would produce the following output.

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============
g.V().limit(1)

Original Traversal
==================
[GraphStep(vertex,[]), RangeGlobalStep(0,1)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
        }, finishers=[limit(1)], annotations={path=[Vertex(?1):GraphStep], maxVarId=3}
    },
    NeptuneTraverserConverterStep
]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .], {estimatedCardinality=INFINITY}
        }, finishers=[limit(1)], annotations={path=[Vertex(?1):GraphStep], maxVarId=3}
    },
    NeptuneTraverserConverterStep
]

Predicates
==========
# of predicates: 18
```

## Unconverted TinkerPop Steps
<a name="gremlin-explain-unconverted-steps"></a>

Ideally, all TinkerPop steps in a traversal have native Neptune operator coverage. When this isn't the case, Neptune falls back on TinkerPop step execution for gaps in its operator coverage. If a traversal uses a step for which Neptune does not yet have native coverage, the `explain` report displays a warning showing where the gap occurred.

When a step without a corresponding native Neptune operator is encountered, the entire traversal from that point forward is run using TinkerPop steps, even if subsequent steps do have native Neptune operators.

The exception to this is when Neptune full-text search is invoked. `NeptuneSearchStep` implements steps without native equivalents as full-text search steps.

## Example of `explain` output where all steps in a query have native equivalents
<a name="gremlin-explain-all-steps-converted"></a>

The following is an example `explain` report for a query where all steps have native equivalents:

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============
g.V().out()

Original Traversal
==================
[GraphStep(vertex,[]), VertexStep(OUT,vertex)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
            PatternNode[(?1, ?5, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .]
            PatternNode[(?3, <~label>, ?4, <~>) . project ask .]
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep], maxVarId=7}
    },
    NeptuneTraverserConverterStep
]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, ?5, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=INFINITY}
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep], maxVarId=7}
    },
    NeptuneTraverserConverterStep
]

Predicates
==========
# of predicates: 18
```

## Example where some steps in a query do not have native equivalents
<a name="gremlin-explain-not-all-steps-converted"></a>

Neptune handles both `GraphStep` and `VertexStep` natively, but if you introduce a `FoldStep` and `UnfoldStep`, the resulting `explain` output is different:

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============
g.V().fold().unfold().out()

Original Traversal
==================
[GraphStep(vertex,[]), FoldStep, UnfoldStep, VertexStep(OUT,vertex)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
        }, annotations={path=[Vertex(?1):GraphStep], maxVarId=3}
    },
    NeptuneTraverserConverterStep
]
+ not converted into Neptune steps: [FoldStep, UnfoldStep, VertexStep(OUT,vertex)]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .], {estimatedCardinality=INFINITY}
        }, annotations={path=[Vertex(?1):GraphStep], maxVarId=3}
    },
    NeptuneTraverserConverterStep,
    NeptuneMemoryTrackerStep
]
+ not converted into Neptune steps: [FoldStep, UnfoldStep, VertexStep(OUT,vertex)]

WARNING: >> FoldStep << is not supported natively yet
```

In this case, the `FoldStep` breaks the query out of native execution. Even the subsequent `VertexStep` is no longer handled natively, because it appears downstream of the `FoldStep` and `UnfoldStep`.

For performance and cost savings, formulate traversals so that as much work as possible is done natively inside the Neptune query engine, rather than by the TinkerPop step implementations.
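
Where the `Fold`/`Unfold` pair adds nothing to the result (as in the query above, where `fold().unfold()` regroups and then re-emits the same vertices), dropping it lets every step convert to native Neptune steps. A sketch, assuming the fold/unfold was not needed for a barrier or side effect:

```
g.V().out()
```

With this form, both the `GraphStep` and the `VertexStep` appear inside `NeptuneGraphQueryStep` in the `explain` output, with no "not converted into Neptune steps" line.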

## Example of a query that uses Neptune full-text-search
<a name="gremlin-explain-full-text-search-steps"></a>

The following query uses Neptune full-text search:

```
g.withSideEffect("Neptune#fts.endpoint", "some_endpoint")
  .V()
  .tail(100)
  .has("name", "Neptune#fts mark*")
  .has("Person", "name", "Neptune#fts mark*")
```

The `.has("name", "Neptune#fts mark*")` step limits the search to vertices that have a `name` property, while `.has("Person", "name", "Neptune#fts mark*")` limits the search to vertices that have a `name` property and the label `Person`. This results in the following traversal in the `explain` report:

```
Final Traversal
[NeptuneGraphQueryStep(Vertex) {
    JoinGroupNode {
        PatternNode[(?1, termid(1,URI), ?2, termid(0,URI)) . project distinct ?1 .], {estimatedCardinality=INFINITY}
    }, annotations={path=[Vertex(?1):GraphStep], maxVarId=4}
}, NeptuneTraverserConverterStep, NeptuneTailGlobalStep(10), NeptuneTinkerpopTraverserConverterStep, NeptuneSearchStep {
    JoinGroupNode {
        SearchNode[(idVar=?3, query=mark*, field=name) . project ask .], {endpoint=some_endpoint}
    }
    JoinGroupNode {
        SearchNode[(idVar=?3, query=mark*, field=name) . project ask .], {endpoint=some_endpoint}
    }
}]
```

## Example of using `explain` when the DFE is enabled
<a name="gremlin-explain-dfe"></a>

The following is an example of an `explain` report when the DFE alternative query engine is enabled:

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============

g.V().as("a").out().has("name", "josh").out().in().where(eq("a"))


Original Traversal
==================
[GraphStep(vertex,[])@[a], VertexStep(OUT,vertex), HasStep([name.eq(josh)]), VertexStep(OUT,vertex), VertexStep(IN,vertex), WherePredicateStep(eq(a))]

Converted Traversal
===================
Neptune steps:
[
    DFEStep(Vertex) {
      DFENode {
        DFEJoinGroupNode[ children={
          DFEPatternNode[(?1, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, ?2, <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph>) . project DISTINCT[?1] {rangeCountEstimate=unknown}],
          DFEPatternNode[(?1, ?3, ?4, ?5) . project ALL[?1, ?4] graphFilters=(!= <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph> . ), {rangeCountEstimate=unknown}]
        }, {rangeCountEstimate=unknown}
        ]
      } [Vertex(?1):GraphStep@[a], Vertex(?4):VertexStep]
    } ,
    NeptuneTraverserConverterDFEStep
]
+ not converted into Neptune steps: HasStep([name.eq(josh)]),
Neptune steps:
[
    NeptuneInterleavingStep {
      StepInfo[joinVars=[?7, ?1], frontierElement=Vertex(?7):HasStep, pathElements={a=(last,Vertex(?1):GraphStep@[a])}, listPathElement={}, indexTime=0ms],
      DFEStep(Vertex) {
        DFENode {
          DFEJoinGroupNode[ children={
            DFEPatternNode[(?7, ?8, ?9, ?10) . project ALL[?7, ?9] graphFilters=(!= <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph> . ), {rangeCountEstimate=unknown}],
            DFEPatternNode[(?12, ?11, ?9, ?13) . project ALL[?9, ?12] graphFilters=(!= <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph> . ), {rangeCountEstimate=unknown}]
          }, {rangeCountEstimate=unknown}
          ]
        } [Vertex(?9):VertexStep, Vertex(?12):VertexStep]
      } 
    }
]
+ not converted into Neptune steps: WherePredicateStep(eq(a)),
Neptune steps:
[
    DFECleanupStep
]


Optimized Traversal
===================
Neptune steps:
[
    DFEStep(Vertex) {
      DFENode {
        DFEJoinGroupNode[ children={
          DFEPatternNode[(?1, ?3, ?4, ?5) . project ALL[?1, ?4] graphFilters=(!= defaultGraph[526] . ), {rangeCountEstimate=9223372036854775807}]
        }, {rangeCountEstimate=unknown}
        ]
      } [Vertex(?1):GraphStep@[a], Vertex(?4):VertexStep]
    } ,
    NeptuneTraverserConverterDFEStep
]
+ not converted into Neptune steps: NeptuneHasStep([name.eq(josh)]),
Neptune steps:
[
    NeptuneMemoryTrackerStep,
    NeptuneInterleavingStep {
      StepInfo[joinVars=[?7, ?1], frontierElement=Vertex(?7):HasStep, pathElements={a=(last,Vertex(?1):GraphStep@[a])}, listPathElement={}, indexTime=0ms],
      DFEStep(Vertex) {
        DFENode {
          DFEJoinGroupNode[ children={
            DFEPatternNode[(?7, ?8, ?9, ?10) . project ALL[?7, ?9] graphFilters=(!= defaultGraph[526] . ), {rangeCountEstimate=9223372036854775807}],
            DFEPatternNode[(?12, ?11, ?9, ?13) . project ALL[?9, ?12] graphFilters=(!= defaultGraph[526] . ), {rangeCountEstimate=9223372036854775807}]
          }, {rangeCountEstimate=unknown}
          ]
        } [Vertex(?9):VertexStep, Vertex(?12):VertexStep]
      } 
    }
]
+ not converted into Neptune steps: WherePredicateStep(eq(a)),
Neptune steps:
[
    DFECleanupStep
]


WARNING: >> [NeptuneHasStep([name.eq(josh)]), WherePredicateStep(eq(a))] << (or one of the children for each step) is not supported natively yet

Predicates
==========
# of predicates: 8
```

See [Information in `explain`](#gremlin-explain-api-results) for a description of the DFE-specific sections in the report.

# Gremlin `profile` API in Neptune
<a name="gremlin-profile-api"></a>

The Neptune Gremlin `profile` API runs a specified Gremlin traversal, collects various metrics about the run, and produces a profile report as output.

It differs from the TinkerPop `.profile()` step in that it reports information specific to the Neptune engine.

The profile report includes the following information about the query plan:
+ The physical operator pipeline
+ The index operations for query execution and serialization
+ The size of the result

The `profile` API uses an extended version of the HTTP API syntax for queries, with `/gremlin/profile` as the endpoint instead of `/gremlin`.

## Parameters specific to Neptune Gremlin `profile`
<a name="gremlin-profile-api-parameters"></a>
+ **profile.results** – `boolean`, allowed values: `TRUE` and `FALSE`, default value: `TRUE`.

  If true, the query results are gathered and displayed as part of the `profile` report. If false, only the result count is displayed.
+ **profile.chop** – `int`, default value: 250.

  If non-zero, the results string is truncated at that number of characters. Truncation does not prevent all results from being captured; it only limits the size of the string in the profile report. If set to zero, the string contains all the results.
+ **profile.serializer** – `string`, default value: `<null>`.

  If non-null, the gathered results are returned in a serialized response message in the format specified by this parameter. The number of index operations necessary to produce that response message is reported along with the size in bytes to be sent to the client.

  Allowed values are `<null>`, any valid MIME type, or any TinkerPop driver `Serializers` enum value:

  ```
  "application/json" or "GRAPHSON"
  "application/vnd.gremlin-v1.0+json" or "GRAPHSON_V1"
  "application/vnd.gremlin-v1.0+json;types=false" or "GRAPHSON_V1_UNTYPED"
  "application/vnd.gremlin-v2.0+json" or "GRAPHSON_V2"
  "application/vnd.gremlin-v2.0+json;types=false" or "GRAPHSON_V2_UNTYPED"
  "application/vnd.gremlin-v3.0+json" or "GRAPHSON_V3"
  "application/vnd.gremlin-v3.0+json;types=false" or "GRAPHSON_V3_UNTYPED"
  "application/vnd.graphbinary-v1.0" or "GRAPHBINARY_V1"
  ```
+ **profile.indexOps** – `boolean`, allowed values: `TRUE` and `FALSE`, default value: `FALSE`.

  If true, shows a detailed report of all index operations that took place during query execution and serialization. Warning: This report can be verbose.
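
As a sketch (not an official client), the parameters above are passed as top-level keys in the JSON body of the POST to `/gremlin/profile`, as the `awscurl` and `curl` examples later in this section show. The query string and parameter values below are illustrative placeholders:

```python
import json

# Build the request body for a POST to /gremlin/profile.
# The "profile.*" keys are the parameters documented above.
payload = {
    "gremlin": "g.V().limit(10)",                # hypothetical query
    "profile.results": True,                     # include query results in the report
    "profile.chop": 0,                           # 0 = don't truncate the results string
    "profile.serializer": "application/vnd.gremlin-v3.0+json",
    "profile.indexOps": False,                   # omit the verbose index-operation report
}
body = json.dumps(payload)
print(body)
```

Send `body` as the POST payload to your cluster's `/gremlin/profile` endpoint using your preferred HTTP client, signing the request if IAM authentication is enabled.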



## Sample output of Neptune Gremlin `profile`
<a name="gremlin-profile-sample-output"></a>

The following is a sample `profile` query.

------
#### [ AWS CLI ]

```
aws neptunedata execute-gremlin-profile-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --gremlin-query 'g.V().hasLabel("airport").has("code", "AUS").emit().repeat(in().simplePath()).times(2).limit(100)' \
  --serializer "application/vnd.gremlin-v3.0+json"
```

For more information, see [execute-gremlin-profile-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-gremlin-profile-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_gremlin_profile_query(
    gremlinQuery='g.V().hasLabel("airport").has("code", "AUS").emit().repeat(in().simplePath()).times(2).limit(100)',
    serializer='application/vnd.gremlin-v3.0+json'
)

print(response['output'])
```

For AWS SDK examples in other languages like Java, .NET, and more, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/profile \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d '{"gremlin":"g.V().hasLabel(\"airport\").has(\"code\", \"AUS\").emit().repeat(in().simplePath()).times(2).limit(100)", "profile.serializer":"application/vnd.gremlin-v3.0+json"}'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl -X POST https://your-neptune-endpoint:port/gremlin/profile \
  -d '{"gremlin":"g.V().hasLabel(\"airport\").has(\"code\", \"AUS\").emit().repeat(in().simplePath()).times(2).limit(100)", "profile.serializer":"application/vnd.gremlin-v3.0+json"}'
```

------

This query generates the following `profile` report when executed on the air-routes sample graph from the blog post, [Let Me Graph That For You – Part 1 – Air Routes](https://aws.amazon.com/blogs/database/let-me-graph-that-for-you-part-1-air-routes/).

```
*******************************************************
                Neptune Gremlin Profile
*******************************************************

Query String
==================
g.V().hasLabel("airport").has("code", "AUS").emit().repeat(in().simplePath()).times(2).limit(100)

Original Traversal
==================
[GraphStep(vertex,[]), HasStep([~label.eq(airport), code.eq(AUS)]), RepeatStep(emit(true),[VertexStep(IN,vertex), PathFilterStep(simple), RepeatEndStep],until(loops(2))), RangeGlobalStep(0,100)]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <code>, "AUS", ?) . project ?1 .], {estimatedCardinality=1, indexTime=84, hashJoin=true, joinTime=3, actualTotalOutput=1}
            PatternNode[(?1, <~label>, ?2=<airport>, <~>) . project ask .], {estimatedCardinality=3374, indexTime=29, hashJoin=true, joinTime=0, actualTotalOutput=61}
            RepeatNode {
                Repeat {
                    PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) . SimplePathFilter(?1, ?3)) .], {hashJoin=true, estimatedCardinality=50148, indexTime=0, joinTime=3}
                }
                Emit {
                    Filter(true)
                }
                LoopsCondition {
                    LoopsFilter([?1, ?3],eq(2))
                }
            }, annotations={repeatMode=BFS, emitFirst=true, untilFirst=false, leftVar=?1, rightVar=?3}
        }, finishers=[limit(100)], annotations={path=[Vertex(?1):GraphStep, Repeat[Vertex(?3):VertexStep]], joinStats=true, optimizationTime=495, maxVarId=7, executionTime=323}
    },
    NeptuneTraverserConverterStep
]

Physical Pipeline
=================
NeptuneGraphQueryStep
    |-- StartOp
    |-- JoinGroupOp
        |-- SpoolerOp(100)
        |-- DynamicJoinOp(PatternNode[(?1, <code>, "AUS", ?) . project ?1 .], {estimatedCardinality=1, indexTime=84, hashJoin=true})
        |-- SpoolerOp(100)
        |-- DynamicJoinOp(PatternNode[(?1, <~label>, ?2=<airport>, <~>) . project ask .], {estimatedCardinality=3374, indexTime=29, hashJoin=true})
        |-- RepeatOp
            |-- <upstream input> (Iteration 0) [visited=1, output=1 (until=0, emit=1), next=1]
            |-- BindingSetQueue (Iteration 1) [visited=61, output=61 (until=0, emit=61), next=61]
                |-- SpoolerOp(100)
                |-- DynamicJoinOp(PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) . SimplePathFilter(?1, ?3)) .], {hashJoin=true, estimatedCardinality=50148, indexTime=0})
            |-- BindingSetQueue (Iteration 2) [visited=38, output=38 (until=38, emit=0), next=0]
                |-- SpoolerOp(100)
                |-- DynamicJoinOp(PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) . SimplePathFilter(?1, ?3)) .], {hashJoin=true, estimatedCardinality=50148, indexTime=0})
        |-- LimitOp(100)

Runtime (ms)
============
Query Execution:  392.686
Serialization:   2636.380

Traversal Metrics
=================
Step                                                               Count  Traversers       Time (ms)    % Dur
-------------------------------------------------------------------------------------------------------------
NeptuneGraphQueryStep(Vertex)                                        100         100         314.162    82.78
NeptuneTraverserConverterStep                                        100         100          65.333    17.22
                                            >TOTAL                     -           -         379.495        -

Repeat Metrics
==============
Iteration  Visited   Output    Until     Emit     Next
------------------------------------------------------
        0        1        1        0        1        1
        1       61       61        0       61       61
        2       38       38       38        0        0
------------------------------------------------------
               100      100       38       62       62

Predicates
==========
# of predicates: 16

WARNING: reverse traversal with no edge label(s) - .in() / .both() may impact query performance

Results
=======
Count: 100
Output: [v[3], v[3600], v[3614], v[4], v[5], v[6], v[7], v[8], v[9], v[10], v[11], v[12], v[47], v[49], v[136], v[13], v[15], v[16], v[17], v[18], v[389], v[20], v[21], v[22], v[23], v[24], v[25], v[26], v[27], v[28], v[416], v[29], v[30], v[430], v[31], v[9...
Response serializer: GRYO_V3D0
Response size (bytes): 23566

Index Operations
================
Query execution:
    # of statement index ops: 3
    # of unique statement index ops: 3
    Duplication ratio: 1.0
    # of terms materialized: 0
Serialization:
    # of statement index ops: 200
    # of unique statement index ops: 140
    Duplication ratio: 1.43
    # of terms materialized: 393
```

In addition to the query plans returned by Neptune `explain`, the `profile` results include runtime statistics about query execution. Each join operation is tagged with the time it took to perform its join and the actual number of solutions that passed through it.

The `profile` output includes the time taken during the core query execution phase, as well as the serialization phase if the `profile.serializer` option was specified.

The breakdown of the index operations performed during each phase is also included at the bottom of the `profile` output.
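
For instance, the `Duplication ratio` lines in the `Index Operations` section are simply total statement index operations divided by unique statement index operations. A sketch of the arithmetic, using the serialization-phase numbers from the sample report above:

```python
# Serialization-phase figures from the sample Index Operations section.
statement_index_ops = 200
unique_statement_index_ops = 140

# Duplication ratio = total ops / unique ops, rounded as in the report.
duplication_ratio = round(statement_index_ops / unique_statement_index_ops, 2)
print(duplication_ratio)
```

A ratio well above 1.0 indicates that the same statements are being looked up repeatedly, which caching can partly absorb.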

Note that consecutive runs of the same query may show different run times and index-operation counts because of caching.

For queries using the `repeat()` step, a breakdown of the frontier on each iteration is available if the `repeat()` step was pushed down as part of a `NeptuneGraphQueryStep`.
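
To illustrate how to read that breakdown, the totals row of the `Repeat Metrics` table in the sample report above is just the column-wise sum of the per-iteration rows:

```python
# (visited, output, until, emit, next) for iterations 0-2,
# taken from the sample Repeat Metrics table above.
iterations = [
    (1, 1, 0, 1, 1),
    (61, 61, 0, 61, 61),
    (38, 38, 38, 0, 0),
]

# The totals row sums each column across iterations.
totals = tuple(sum(column) for column in zip(*iterations))
print(totals)
```

Here 38 traversers satisfied the `until` condition at depth 2, while 62 were emitted along the way, which together account for the 100 results returned.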

## Differences in `profile` reports when DFE is enabled
<a name="gremlin-profile-dfe-output"></a>

When the Neptune DFE alternative query engine is enabled, `profile` output is somewhat different:

**Optimized Traversal:** This section is similar to the one in `explain` output, but contains additional information, including the types of DFE operators that were considered in planning and their associated worst-case and best-case cost estimates.

**Physical Pipeline:** This section captures the operators that are used to execute the query. `DFESubQuery` elements abstract the physical plan that is used by DFE to execute the portion of the plan it is responsible for. The `DFESubQuery` elements are unfolded in the following section where DFE statistics are listed.

**DFEQueryEngine Statistics:** This section shows up only when at least part of the query is executed by DFE. It outlines various runtime statistics that are specific to DFE, and contains a detailed breakdown of the time spent in the various parts of the query execution, by `DFESubQuery`.

Nested subqueries in different `DFESubQuery` elements are flattened in this section, and each is marked with a header that starts with `subQuery=` followed by a unique identifier.

**Traversal Metrics:** This section shows step-level traversal metrics. When the DFE engine runs all or part of the query, it displays metrics for `DFEStep` and/or `NeptuneInterleavingStep`. See [Tuning Gremlin queries using `explain` and `profile`](gremlin-traversal-tuning.md).

**Note**  
DFE is an experimental feature released under lab mode, so the exact format of the `profile` output is still subject to change.

## Sample `profile` output when the Neptune Dataflow engine (DFE) is enabled
<a name="gremlin-profile-sample-dfe-output"></a>

When the DFE engine is used to run Gremlin queries, the output of the [Gremlin `profile` API](#gremlin-profile-api) is formatted as shown in the following example.

Query:

------
#### [ AWS CLI ]

```
aws neptunedata execute-gremlin-profile-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --gremlin-query "g.withSideEffect('Neptune#useDFE', true).V().has('code', 'ATL').out()"
```

For more information, see [execute-gremlin-profile-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-gremlin-profile-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_gremlin_profile_query(
    gremlinQuery="g.withSideEffect('Neptune#useDFE', true).V().has('code', 'ATL').out()"
)

print(response['output'])
```

For AWS SDK examples in other languages like Java, .NET, and more, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/profile \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d '{"gremlin":"g.withSideEffect('"'"'Neptune#useDFE'"'"', true).V().has('"'"'code'"'"', '"'"'ATL'"'"').out()"}'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl -X POST https://your-neptune-endpoint:port/gremlin/profile \
  -d '{"gremlin":"g.withSideEffect('"'"'Neptune#useDFE'"'"', true).V().has('"'"'code'"'"', '"'"'ATL'"'"').out()"}'
```

------

```
    *******************************************************
                    Neptune Gremlin Profile
    *******************************************************

    Query String
    ==================
    g.withSideEffect('Neptune#useDFE', true).V().has('code', 'ATL').out()

    Original Traversal
    ==================
    [GraphStep(vertex,[]), HasStep([code.eq(ATL)]), VertexStep(OUT,vertex)]

    Optimized Traversal
    ===================
    Neptune steps:
    [
        DFEStep(Vertex) {
          DFENode {
            DFEJoinGroupNode[null](
              children=[
                DFEPatternNode((?1, vp://code[419430926], ?4, defaultGraph[526]) . project DISTINCT[?1] objectFilters=(in(ATL[452987149]) . ), {rangeCountEstimate=1},
                  opInfo=(type=PipelineJoin, cost=(exp=(in=1.00,out=1.00,io=0.00,comp=0.00,mem=0.00),wc=(in=1.00,out=1.00,io=0.00,comp=0.00,mem=0.00)),
                    disc=(type=PipelineScan, cost=(exp=(in=1.00,out=1.00,io=0.00,comp=0.00,mem=34.00),wc=(in=1.00,out=1.00,io=0.00,comp=0.00,mem=34.00))))),
                DFEPatternNode((?1, ?5, ?6, ?7) . project ALL[?1, ?6] graphFilters=(!= defaultGraph[526] . ), {rangeCountEstimate=9223372036854775807})],
              opInfo=[
                OperatorInfoWithAlternative[
                  rec=(type=PipelineJoin, cost=(exp=(in=1.00,out=27.76,io=0.00,comp=0.00,mem=0.00),wc=(in=1.00,out=27.76,io=0.00,comp=0.00,mem=0.00)),
                    disc=(type=PipelineScan, cost=(exp=(in=1.00,out=27.76,io=Infinity,comp=0.00,mem=295147905179352830000.00),wc=(in=1.00,out=27.76,io=Infinity,comp=0.00,mem=295147905179352830000.00)))),
                  alt=(type=PipelineScan, cost=(exp=(in=1.00,out=27.76,io=Infinity,comp=0.00,mem=295147905179352830000.00),wc=(in=1.00,out=27.76,io=Infinity,comp=0.00,mem=295147905179352830000.00)))]])
          } [Vertex(?1):GraphStep, Vertex(?6):VertexStep]
        } ,
        NeptuneTraverserConverterDFEStep,
        DFECleanupStep
    ]


    Physical Pipeline
    =================
    DFEStep
        |-- DFESubQuery1

    DFEQueryEngine Statistics
    =================
    DFESubQuery1
    ╔════╤════════╤════════╤═══════════════════════╤══════════════════════════════════════════════════════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤════════╤═══════════╗
    ║ ID │ Out #1 │ Out #2 │ Name                  │ Arguments                                                                                                    │ Mode │ Units In │ Units Out │ Ratio  │ Time (ms) ║
    ╠════╪════════╪════════╪═══════════════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪════════╪═══════════╣
    ║ 0  │ 1      │ -      │ DFESolutionInjection  │ solutions=[]                                                                                                 │ -    │ 0        │ 1         │ 0.00   │ 0.01      ║
    ║    │        │        │                       │ outSchema=[]                                                                                                 │      │          │           │        │           ║
    ╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 1  │ 2      │ -      │ DFEChunkLocalSubQuery │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#089f43e3-4d71-4259-8d19-254ff63cee04/graph_1 │ -    │ 1        │ 1         │ 1.00   │ 0.02      ║
    ╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 2  │ 3      │ -      │ DFEChunkLocalSubQuery │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#089f43e3-4d71-4259-8d19-254ff63cee04/graph_2 │ -    │ 1        │ 242       │ 242.00 │ 0.02      ║
    ╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 3  │ 4      │ -      │ DFEMergeChunks        │ -                                                                                                            │ -    │ 242      │ 242       │ 1.00   │ 0.01      ║
    ╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 4  │ -      │ -      │ DFEDrain              │ -                                                                                                            │ -    │ 242      │ 0         │ 0.00   │ 0.01      ║
    ╚════╧════════╧════════╧═══════════════════════╧══════════════════════════════════════════════════════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧════════╧═══════════╝


    subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#089f43e3-4d71-4259-8d19-254ff63cee04/graph_1
    ╔════╤════════╤════════╤══════════════════════╤═════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
    ║ ID │ Out #1 │ Out #2 │ Name                 │ Arguments                                                   │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
    ╠════╪════════╪════════╪══════════════════════╪═════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
    ║ 0  │ 1      │ -      │ DFEPipelineScan      │ pattern=Node(?1) with property 'code' as ?4 and label 'ALL' │ -    │ 0        │ 1         │ 0.00  │ 0.22      ║
    ║    │        │        │                      │ inlineFilters=[(?4 IN ["ATL"])]                             │      │          │           │       │           ║
    ║    │        │        │                      │ patternEstimate=1                                           │      │          │           │       │           ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
    ║ 1  │ 2      │ -      │ DFEMergeChunks       │ -                                                           │ -    │ 1        │ 1         │ 1.00  │ 0.02      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
    ║ 2  │ 4      │ -      │ DFERelationalJoin    │ joinVars=[]                                                 │ -    │ 2        │ 1         │ 0.50  │ 0.09      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
    ║ 3  │ 2      │ -      │ DFESolutionInjection │ solutions=[]                                                │ -    │ 0        │ 1         │ 0.00  │ 0.01      ║
    ║    │        │        │                      │ outSchema=[]                                                │      │          │           │       │           ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
    ║ 4  │ -      │ -      │ DFEDrain             │ -                                                           │ -    │ 1        │ 0         │ 0.00  │ 0.01      ║
    ╚════╧════════╧════════╧══════════════════════╧═════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝


    subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#089f43e3-4d71-4259-8d19-254ff63cee04/graph_2
    ╔════╤════════╤════════╤══════════════════════╤═════════════════════════════════════╤══════╤══════════╤═══════════╤════════╤═══════════╗
    ║ ID │ Out #1 │ Out #2 │ Name                 │ Arguments                           │ Mode │ Units In │ Units Out │ Ratio  │ Time (ms) ║
    ╠════╪════════╪════════╪══════════════════════╪═════════════════════════════════════╪══════╪══════════╪═══════════╪════════╪═══════════╣
    ║ 0  │ 1      │ -      │ DFESolutionInjection │ solutions=[]                        │ -    │ 0        │ 1         │ 0.00   │ 0.01      ║
    ║    │        │        │                      │ outSchema=[?1]                      │      │          │           │        │           ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 1  │ 2      │ 3      │ DFETee               │ -                                   │ -    │ 1        │ 2         │ 2.00   │ 0.01      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 2  │ 4      │ -      │ DFEDistinctColumn    │ column=?1                           │ -    │ 1        │ 1         │ 1.00   │ 0.21      ║
    ║    │        │        │                      │ ordered=false                       │      │          │           │        │           ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 3  │ 5      │ -      │ DFEHashIndexBuild    │ vars=[?1]                           │ -    │ 1        │ 1         │ 1.00   │ 0.03      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 4  │ 5      │ -      │ DFEPipelineJoin      │ pattern=Edge((?1)-[?7:?5]->(?6))    │ -    │ 1        │ 242       │ 242.00 │ 0.51      ║
    ║    │        │        │                      │ constraints=[]                      │      │          │           │        │           ║
    ║    │        │        │                      │ patternEstimate=9223372036854775807 │      │          │           │        │           ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 5  │ 6      │ 7      │ DFESync              │ -                                   │ -    │ 243      │ 243       │ 1.00   │ 0.02      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 6  │ 8      │ -      │ DFEForwardValue      │ -                                   │ -    │ 1        │ 1         │ 1.00   │ 0.01      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 7  │ 8      │ -      │ DFEForwardValue      │ -                                   │ -    │ 242      │ 242       │ 1.00   │ 0.02      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 8  │ 9      │ -      │ DFEHashIndexJoin     │ -                                   │ -    │ 243      │ 242       │ 1.00   │ 0.31      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 9  │ -      │ -      │ DFEDrain             │ -                                   │ -    │ 242      │ 0         │ 0.00   │ 0.01      ║
    ╚════╧════════╧════════╧══════════════════════╧═════════════════════════════════════╧══════╧══════════╧═══════════╧════════╧═══════════╝


    Runtime (ms)
    ============
    Query Execution: 11.744

    Traversal Metrics
    =================
    Step                                                               Count  Traversers       Time (ms)    % Dur
    -------------------------------------------------------------------------------------------------------------
    DFEStep(Vertex)                                                      242         242          10.849    95.48
    NeptuneTraverserConverterDFEStep                                     242         242           0.514     4.52
                                                >TOTAL                     -           -          11.363        -

    Predicates
    ==========
    # of predicates: 18

    Results
    =======
    Count: 242


    Index Operations
    ================
    Query execution:
        # of statement index ops: 0
        # of terms materialized: 0
```

**Note**  
Because the DFE engine is an experimental feature released in lab mode, the exact format of the `profile` output is subject to change.

# Tuning Gremlin queries using `explain` and `profile`
<a name="gremlin-traversal-tuning"></a>

You can often tune your Gremlin queries in Amazon Neptune to get better performance, using the information available to you in the reports you get from the Neptune [explain](gremlin-explain-api.md) and [profile](gremlin-profile-api.md) APIs. To do so, it helps to understand how Neptune processes Gremlin traversals.

**Important**  
A change was made in TinkerPop version 3.4.11 that improves the correctness of query processing, but that can currently have a serious impact on query performance in some cases.  
For example, a query of this sort may run significantly slower:  

```
g.V().hasLabel('airport').
  order().
    by(out().count(),desc).
  limit(10).
  out()
```
Because of the TinkerPop 3.4.11 change, the vertices after the `limit()` step are now fetched in a non-optimal way. To avoid this, you can modify the query by adding a `barrier()` step at any point after the `order().by()`. For example:  

```
g.V().hasLabel('airport').
  order().
    by(out().count(),desc).
  limit(10).
  barrier().
  out()
```
TinkerPop 3.4.11 was enabled in Neptune [engine version 1.0.5.0](engine-releases-1.0.5.0.md).

## Understanding Gremlin traversal processing in Neptune
<a name="gremlin-traversal-processing"></a>

When a Gremlin traversal is sent to Neptune, there are three main processes that transform the traversal into an underlying execution plan for the engine to execute. These are parsing, conversion, and optimization:

![\[3 processes transform a Gremlin query into an execution plan.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/Gremlin_traversal_processing.png)


### The traversal parsing process
<a name="gremlin-traversal-processing-parsing"></a>

The first step in processing a traversal is to parse it into a common language. In Neptune, that common language is the set of TinkerPop steps that are part of the [TinkerPop API](http://tinkerpop.apache.org/javadocs/3.4.8/full/org/apache/tinkerpop/gremlin/process/traversal/Step.html). Each of these steps represents a unit of computation within the traversal.

You can send a Gremlin traversal to Neptune either as a string or as bytecode. The REST endpoint and the Java client driver `submit()` method send traversals as strings, as in this example:

```
client.submit("g.V()")
```

Applications and language drivers using [Gremlin language variants (GLV)](https://tinkerpop.apache.org/docs/current/tutorials/gremlin-language-variants/) send traversals in bytecode.
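For illustration, a Gremlin language variant such as Gremlin-Python builds the traversal as bytecode on the client and sends that bytecode to the server. The following sketch assumes the `gremlinpython` package is installed and uses a placeholder endpoint:

```
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# The endpoint below is a placeholder for your cluster's Gremlin endpoint.
conn = DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin', 'g')
g = traversal().withRemote(conn)

# This traversal is serialized and sent to Neptune as bytecode, not as a string.
results = g.V().has('code', 'ANC').toList()
conn.close()
```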

### The traversal conversion process
<a name="gremlin-traversal-processing-conversion"></a>

The second step in processing a traversal is to convert its TinkerPop steps into a set of converted and non-converted Neptune steps. Most steps in the Apache TinkerPop Gremlin query language are converted to Neptune-specific steps that are optimized to run on the underlying Neptune engine. When a TinkerPop step without a Neptune equivalent is encountered in a traversal, that step and all subsequent steps in the traversal are processed by the TinkerPop query engine.

For more information about what steps can be converted under what circumstances, see [Gremlin step support](gremlin-step-support.md).

### The traversal optimization process
<a name="gremlin-traversal-processing-optimization"></a>

The final step in traversal processing is to run the series of converted and non-converted steps through the optimizer, to try to determine the best execution plan. The output of this optimization is the execution plan that the Neptune engine processes.

## Using the Neptune Gremlin `explain` API to tune queries
<a name="gremlin-traversal-tuning-explain"></a>

The Neptune `explain` API is not the same as the Gremlin `explain()` step. It returns the final execution plan that the Neptune engine would process when executing the query. Because it does not actually execute the query, it returns the same plan regardless of the parameter values used, and its output contains no statistics about actual execution.

Consider the following simple traversal that finds all the airport vertices for Anchorage:

```
g.V().has('code','ANC')
```

There are two ways you can run this traversal through the Neptune `explain` API. The first way is to make a REST call to the explain endpoint, like this:

```
curl -X POST https://your-neptune-endpoint:port/gremlin/explain -d "{\"gremlin\":\"g.V().has('code','ANC')\"}"
```

The second way is to use the Neptune workbench's [%%gremlin](notebooks-magics.md#notebooks-cell-magics-gremlin) cell magic with the `explain` parameter. This passes the traversal contained in the cell body to the Neptune `explain` API and then displays the resulting output when you run the cell:

```
%%gremlin explain

g.V().has('code','ANC')
```

The resulting `explain` API output describes Neptune's execution plan for the traversal. As you can see in the image below, the plan includes each of the three stages of the processing pipeline:

![\[Explain API output for a simple Gremlin traversal.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/Gremlin_explain_output_1.png)


### Tuning a traversal by looking at steps that are not converted
<a name="gremlin-traversal-tuning-explain-non-converted-steps"></a>

One of the first things to look for in the Neptune `explain` API output is Gremlin steps that were not converted to Neptune native steps. In a query plan, when a step is encountered that cannot be converted to a Neptune native step, it and all subsequent steps in the plan are processed by the Gremlin server.

In the example above, all steps in the traversal were converted. Let's examine `explain` API output for this traversal:

```
g.V().has('code','ANC').out().choose(hasLabel('airport'), values('code'), constant('Not an airport'))
```

As you can see in the image below, Neptune could not convert the `choose()` step:

![\[Explain API output in which not all steps can be converted.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/Gremlin_explain_output_2.png)


There are several things you could do to tune the performance of the traversal. The first would be to rewrite it in such a way as to eliminate the step that could not be converted. Another would be to move the step to the end of the traversal so that all other steps can be converted to native ones.

A query plan with steps that are not converted does not always need to be tuned. If the steps that cannot be converted are at the end of the traversal, and are related to how output is formatted rather than how the graph is traversed, they may have little effect on performance.
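As an illustration of the first approach, if the `constant('Not an airport')` fallback is not actually needed, the `choose()` step could be replaced by a filter so that every step converts. This is only a sketch, and it changes the results by dropping non-airport vertices rather than labeling them:

```
g.V().has('code','ANC').out().
  hasLabel('airport').
  values('code')
```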

### Tuning a traversal by looking at steps that do not use indexes
<a name="gremlin-traversal-tuning-explain-unindexed-lookups"></a>

Another thing to look for when examining output from the Neptune `explain` API is steps that do not use indexes. The following traversal finds all airports with flights that land in Anchorage:

```
g.V().has('code','ANC').in().values('code')
```

Output from the explain API for this traversal is:

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============

g.V().has('code','ANC').in().values('code')

Original Traversal
==================
[GraphStep(vertex,[]), HasStep([code.eq(ANC)]), VertexStep(IN,vertex), PropertiesStep([code],value)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(PropertyValue) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
            PatternNode[(?1, <code>, "ANC", ?) . project ask .]
            PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .]
            PatternNode[(?3, <~label>, ?4, <~>) . project ask .]
            PatternNode[(?3, ?7, ?8, <~>) . project ?3,?8 . ContainsFilter(?7 in (<code>)) .]
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, PropertyValue(?8):PropertiesStep], maxVarId=9}
    },
    NeptuneTraverserConverterStep
]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(PropertyValue) {
        JoinGroupNode {
            PatternNode[(?1, <code>, "ANC", ?) . project ?1 .], {estimatedCardinality=1}
            PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=INFINITY}
            PatternNode[(?3, ?7=<code>, ?8, <~>) . project ?3,?8 .], {estimatedCardinality=7564}
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, PropertyValue(?8):PropertiesStep], maxVarId=9}
    },
    NeptuneTraverserConverterStep
]

Predicates
==========
# of predicates: 26

WARNING: reverse traversal with no edge label(s) - .in() / .both() may impact query performance
```

The `WARNING` message at the bottom of the output occurs because the `in()` step in the traversal cannot be handled using one of the three indexes that Neptune maintains (see [How Statements Are Indexed in Neptune](feature-overview-storage-indexing.md) and [Gremlin statements in Neptune](gremlin-explain-background-statements.md)). Because the `in()` step contains no edge filter, it cannot be resolved using the `SPOG`, `POGS`, or `GPSO` index. Instead, Neptune must perform a union scan to find the requested vertices, which is much less efficient.

There are two ways to tune the traversal in this situation. The first is to add one or more filtering criteria to the `in()` step so that an indexed lookup can be used to resolve the query. For the example above, this might be:

```
g.V().has('code','ANC').in('route').values('code')
```

Output from the Neptune `explain` API for the revised traversal no longer contains the `WARNING` message:

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============

g.V().has('code','ANC').in('route').values('code')

Original Traversal
==================
[GraphStep(vertex,[]), HasStep([code.eq(ANC)]), VertexStep(IN,[route],vertex), PropertiesStep([code],value)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(PropertyValue) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
            PatternNode[(?1, <code>, "ANC", ?) . project ask .]
            PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) . ContainsFilter(?5 in (<route>)) .]
            PatternNode[(?3, <~label>, ?4, <~>) . project ask .]
            PatternNode[(?3, ?7, ?8, <~>) . project ?3,?8 . ContainsFilter(?7 in (<code>)) .]
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, PropertyValue(?8):PropertiesStep], maxVarId=9}
    },
    NeptuneTraverserConverterStep
]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(PropertyValue) {
        JoinGroupNode {
            PatternNode[(?1, <code>, "ANC", ?) . project ?1 .], {estimatedCardinality=1}
            PatternNode[(?3, ?5=<route>, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=32042}
            PatternNode[(?3, ?7=<code>, ?8, <~>) . project ?3,?8 .], {estimatedCardinality=7564}
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, PropertyValue(?8):PropertiesStep], maxVarId=9}
    },
    NeptuneTraverserConverterStep
]

Predicates
==========
# of predicates: 26
```

Another option if you are running many traversals of this kind is to run them in a Neptune DB cluster that has the optional `OSGP` index enabled (see [Enabling an OSGP Index](feature-overview-storage-indexing.md#feature-overview-storage-indexing-osgp)). Enabling an `OSGP` index has drawbacks:
+ It must be enabled in a DB cluster before any data is loaded.
+ Insertion rates for vertices and edges may slow by up to 23%.
+ Storage usage will increase by around 20%.
+ Read queries that scatter requests across all indexes may have increased latencies.

Having an `OSGP` index makes a lot of sense for a restricted set of query patterns, but unless you are running those frequently, it is usually preferable to try to ensure that the traversals you write can be resolved using the three primary indexes.

### Using a large number of predicates
<a name="gremlin-traversal-tuning-explain-many-predicates"></a>

Neptune treats each edge label and each distinct vertex or edge property name in your graph as a predicate, and is designed by default to work with a relatively low number of distinct predicates. When you have more than a few thousand predicates in your graph data, performance can degrade.

Neptune `explain` output will warn you if this is the case:

```
Predicates
==========
# of predicates: 9549
WARNING: high predicate count (# of distinct property names and edge labels)
```

If it is not convenient to rework your data model to reduce the number of labels and properties, and therefore the number of predicates, the best way to tune traversals is to run them in a DB cluster that has the `OSGP` index enabled, as discussed above.
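As an illustration (the label and property names here are hypothetical), a data model that encodes a value in each property name creates a new predicate for every name, whereas moving that value into the property's data keeps the predicate count fixed:

```
// One new predicate per sensor, so the predicate count grows with the data:
g.addV('reading').property('temp_sensor_17', 20.1)

// Two fixed predicates ('sensor' and 'value'), however many sensors exist:
g.addV('reading').
  property('sensor', 'temp_sensor_17').
  property('value', 20.1)
```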

## Using the Neptune Gremlin `profile` API to tune traversals
<a name="gremlin-traversal-tuning-profile"></a>

The Neptune `profile` API is quite different from the Gremlin `profile()` step. Like the `explain` API, its output includes the query plan that the Neptune engine uses when executing the traversal. In addition, the `profile` output includes actual execution statistics for the traversal, given how its parameters are set.

Again, take the simple traversal that finds all airport vertices for Anchorage:

```
g.V().has('code','ANC')
```

As with the `explain` API, you can invoke the `profile` API using a REST call:

```
curl -X POST https://your-neptune-endpoint:port/gremlin/profile -d "{\"gremlin\":\"g.V().has('code','ANC')\"}"
```

You can also use the Neptune workbench's [%%gremlin](notebooks-magics.md#notebooks-cell-magics-gremlin) cell magic with the `profile` parameter. This passes the traversal contained in the cell body to the Neptune `profile` API and then displays the resulting output when you run the cell:

```
%%gremlin profile

g.V().has('code','ANC')
```

The resulting `profile` API output contains both Neptune's execution plan for the traversal and statistics about the plan's execution, as you can see in this image:

![\[An example of Neptune profile API output.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/Gremlin_profile_output_1.png)


In `profile` output, the execution plan section only contains the final execution plan for the traversal, not the intermediate steps. The pipeline section contains the physical pipeline operations that were performed as well as the actual time (in milliseconds) that traversal execution took. The runtime metric is extremely helpful in comparing the times that two different versions of a traversal take as you are optimizing them.

**Note**  
The initial runtime of a traversal is generally longer than subsequent runtimes, because the first one causes the relevant data to be cached.

The third section of the `profile` output contains execution statistics and the results of the traversal. To see how this information can be useful in tuning a traversal, consider the following traversal, which finds every airport whose name begins with "Anchora", and all the airports reachable in two hops from those airports, returning airport codes, flight routes, and distances:

```
%%gremlin profile

g.withSideEffect("Neptune#fts.endpoint", "your-OpenSearch-endpoint-URL").
    V().has("city", "Neptune#fts Anchora~").
    repeat(outE('route').inV().simplePath()).times(2).
    project('Destination', 'Route').
        by('code').
        by(path().by('code').by('dist'))
```

### Traversal metrics in Neptune `profile` API output
<a name="gremlin-traversal-tuning-profile-traversal-metrics"></a>

The first set of metrics that is available in all `profile` output is the traversal metrics. These are similar to the Gremlin `profile()` step metrics, with a few differences:

```
Traversal Metrics
=================
Step                                                               Count  Traversers       Time (ms)    % Dur
-------------------------------------------------------------------------------------------------------------
NeptuneGraphQueryStep(Vertex)                                       3856        3856          91.701     9.09
NeptuneTraverserConverterStep                                       3856        3856          38.787     3.84
ProjectStep([Destination, Route],[value(code), ...                  3856        3856         878.786    87.07
  PathStep([value(code), value(dist)])                              3856        3856         601.359
                                            >TOTAL                     -           -        1009.274        -
```

The first column of the traversal-metrics table lists the steps executed by the traversal. The first two steps are generally the Neptune-specific steps, `NeptuneGraphQueryStep` and `NeptuneTraverserConverterStep`.

`NeptuneGraphQueryStep` represents the execution time for the entire portion of the traversal that could be converted and executed natively by the Neptune engine.

`NeptuneTraverserConverterStep` represents the process of converting the output of those converted steps into TinkerPop traversers, which allows any steps that could not be converted to be processed, or the results to be returned in a TinkerPop-compatible format.

In the example above, there are non-converted steps, so each of these TinkerPop steps (`ProjectStep`, `PathStep`) appears as a row in the table.

The second column in the table, `Count`, reports the number of *represented* traversers that passed through the step, while the third column, `Traversers`, reports the actual number of traverser objects that passed through it, as explained in the [TinkerPop profile step documentation](https://tinkerpop.apache.org/docs/current/reference/#profile-step).

In our example there are 3,856 vertices and 3,856 traversers returned by the `NeptuneGraphQueryStep`, and these numbers remain the same throughout the remaining processing because `ProjectStep` and `PathStep` are formatting the results, not filtering them.

**Note**  
Unlike TinkerPop, the Neptune engine does not optimize performance by *bulking* in its `NeptuneGraphQueryStep` and `NeptuneTraverserConverterStep` steps. Bulking is the TinkerPop operation that combines traversers on the same vertex to reduce operational overhead, and it is what causes the `Count` and `Traversers` numbers to differ. Because bulking only occurs in steps that Neptune delegates to TinkerPop, and not in steps that Neptune handles natively, the `Count` and `Traversers` columns seldom differ.

The `Time` column reports the number of milliseconds that the step took, and the `% Dur` column reports what percentage of the total processing time the step took. These are the metrics that tell you where to focus your tuning efforts, by showing which steps took the most time.

### Index operation metrics in Neptune `profile` API output
<a name="gremlin-traversal-tuning-profile-index-operations"></a>

Another set of metrics in the output of the Neptune profile API is the index operations:

```
Index Operations
================
Query execution:
    # of statement index ops: 23191
    # of unique statement index ops: 5960
    Duplication ratio: 3.89
    # of terms materialized: 0
```

These report:
+ The total number of index lookups.
+ The number of unique index lookups performed.
+ The ratio of total index lookups to unique ones. A lower ratio indicates less redundancy.
+ The number of terms materialized from the term dictionary.

### Repeat metrics in Neptune `profile` API output
<a name="gremlin-traversal-tuning-profile-repeat-metrics"></a>

If your traversal uses a `repeat()` step as in the example above, then a section containing repeat metrics appears in the `profile` output:

```
Repeat Metrics
==============
Iteration  Visited   Output    Until     Emit     Next
------------------------------------------------------
        0        2        0        0        0        2
        1       53        0        0        0       53
        2     3856     3856     3856        0        0
------------------------------------------------------
              3911     3856     3856        0       55
```

These report:
+ The loop count for a row (the `Iteration` column).
+ The number of elements visited by the loop (the `Visited` column).
+ The number of elements output by the loop (the `Output` column).
+ The number of elements output by the loop because they satisfied the `until()` condition (the `Until` column).
+ The number of elements emitted by the loop (the `Emit` column).
+ The number of elements passed from the loop to the subsequent loop (the `Next` column).

These repeat metrics are very helpful in understanding the branching factor of your traversal, to get a feeling for how much work is being done by the database. You can use these numbers to diagnose performance problems, especially when the same traversal performs dramatically differently with different parameters.
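For example, adding an `emit()` modulator to a similar traversal would output intermediate elements at every iteration, which would appear as nonzero values in the `Emit` column of these metrics (a sketch using the same airport data):

```
g.V().has('code','ANC').
  repeat(out('route').simplePath()).times(2).emit()
```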

### Full-text search metrics in Neptune `profile` API output
<a name="gremlin-traversal-tuning-profile-fts-metrics"></a>

When a traversal uses a [full-text search](full-text-search.md) lookup, as in the example above, then a section containing the full-text search (FTS) metrics appears in the `profile` output:

```
FTS Metrics
==============
SearchNode[(idVar=?1, query=Anchora~, field=city) . project ?1 .],
    {endpoint=your-OpenSearch-endpoint-URL, incomingSolutionsThreshold=1000, estimatedCardinality=INFINITY,
    remoteCallTimeSummary=[total=65, avg=32.500000, max=37, min=28],
    remoteCallTime=65, remoteCalls=2, joinTime=0, indexTime=0, remoteResults=2}

    2 result(s) produced from SearchNode above
```

This shows the query sent to the ElasticSearch (ES) cluster and reports several metrics about the interaction with ElasticSearch that can help you pinpoint performance problems relating to full-text search:
+ Summary information about the calls into the ElasticSearch index:
  + The total number of milliseconds required by all remoteCalls to satisfy the query (`total`).
  + The average number of milliseconds spent in a remoteCall (`avg`).
  + The minimum number of milliseconds spent in a remoteCall (`min`).
  + The maximum number of milliseconds spent in a remoteCall (`max`).
+ Total time consumed by remoteCalls to ElasticSearch (`remoteCallTime`).
+ The number of remoteCalls made to ElasticSearch (`remoteCalls`).
+ The number of milliseconds spent in joins of ElasticSearch results (`joinTime`).
+ The number of milliseconds spent in index lookups (`indexTime`).
+ The total number of results returned by ElasticSearch (`remoteResults`).

# Native Gremlin step support in Amazon Neptune
<a name="gremlin-step-support"></a>

The Amazon Neptune engine does not currently have full native support for all Gremlin steps, as explained in [Tuning Gremlin queries](gremlin-traversal-tuning.md). Current support falls into four categories:
+ [Gremlin steps that can always be converted to native Neptune engine operations](#gremlin-steps-always)
+ [Gremlin steps that can be converted to native Neptune engine operations in some cases](#gremlin-steps-sometimes) 
+ [Gremlin steps that are never converted to native Neptune engine operations](#gremlin-steps-never) 
+ [Gremlin steps that are not supported in Neptune at all](#neptune-gremlin-steps-unsupported) 

## Gremlin steps that can always be converted to native Neptune engine operations
<a name="gremlin-steps-always"></a>

Many Gremlin steps can be converted to native Neptune engine operations as long as they meet the following conditions:
+ They are not preceded in the query by a step that cannot be converted.
+ Their parent step, if any, can be converted.
+ All their child traversals, if any, can be converted.

The following Gremlin steps are always converted to native Neptune engine operations if they meet those conditions:
+ [and( )](http://tinkerpop.apache.org/docs/current/reference/#and-step)
+ [as( )](http://tinkerpop.apache.org/docs/current/reference/#as-step)
+ [count( )](http://tinkerpop.apache.org/docs/current/reference/#count-step)
+ [E( )](http://tinkerpop.apache.org/docs/current/reference/#graph-step)
+ [emit( )](http://tinkerpop.apache.org/docs/current/reference/#emit-step)
+ [explain( )](http://tinkerpop.apache.org/docs/current/reference/#explain-step)
+ [group( )](http://tinkerpop.apache.org/docs/current/reference/#group-step)
+ [groupCount( )](http://tinkerpop.apache.org/docs/current/reference/#groupcount-step)
+ [identity( )](http://tinkerpop.apache.org/docs/current/reference/#identity-step)
+ [is( )](http://tinkerpop.apache.org/docs/current/reference/#is-step)
+ [key( )](http://tinkerpop.apache.org/docs/current/reference/#key-step)
+ [label( )](http://tinkerpop.apache.org/docs/current/reference/#label-step)
+ [limit( )](http://tinkerpop.apache.org/docs/current/reference/#limit-step)
+ [local( )](http://tinkerpop.apache.org/docs/current/reference/#local-step)
+ [loops( )](http://tinkerpop.apache.org/docs/current/reference/#loops-step)
+ [not( )](http://tinkerpop.apache.org/docs/current/reference/#not-step)
+ [or( )](http://tinkerpop.apache.org/docs/current/reference/#or-step)
+ [profile( )](http://tinkerpop.apache.org/docs/current/reference/#profile-step)
+ [properties( )](http://tinkerpop.apache.org/docs/current/reference/#properties-step)
+ [subgraph( )](http://tinkerpop.apache.org/docs/current/reference/#subgraph-step)
+ [until( )](http://tinkerpop.apache.org/docs/current/reference/#until-step)
+ [V( )](http://tinkerpop.apache.org/docs/current/reference/#graph-step)
+ [value( )](http://tinkerpop.apache.org/docs/current/reference/#value-step)
+ [valueMap( )](http://tinkerpop.apache.org/docs/current/reference/#valuemap-step)
+ [values( )](http://tinkerpop.apache.org/docs/current/reference/#values-step)

## Gremlin steps that can be converted to native Neptune engine operations in some cases
<a name="gremlin-steps-sometimes"></a>

Some Gremlin steps can be converted to native Neptune engine operations in some situations but not in others:
+ [addE( )](http://tinkerpop.apache.org/docs/current/reference/#addedge-step)   –   The `addE()` step can generally be converted to a native Neptune engine operation, unless it is immediately followed by a `property()` step containing a traversal as a key.
+ [addV( )](http://tinkerpop.apache.org/docs/current/reference/#addvertex-step)   –   The `addV()` step can generally be converted to a native Neptune engine operation, unless it is immediately followed by a `property()` step containing a traversal as a key, or unless multiple labels are assigned.
+ [aggregate( )](http://tinkerpop.apache.org/docs/current/reference/#store-step)   –   The `aggregate()` step can generally be converted to a native Neptune engine operation, unless the step is used in a child traversal or sub-traversal, or unless the value being stored is something other than a vertex, edge, id, label or property value.

  In the example below, `aggregate()` is not converted because it is being used in a child traversal:

  ```
  g.V().has('code','ANC').as('a')
       .project('flights').by(select('a')
       .outE().aggregate('x'))
  ```

  In this example, `aggregate()` is not converted because what is stored is the `min()` of a value:

  ```
  g.V().has('code','ANC').outE().aggregate('x').by(values('dist').min())
  ```
+ [barrier( )](http://tinkerpop.apache.org/docs/current/reference/#barrier-step)   –   The `barrier()` step can generally be converted to a native Neptune engine operation, unless the step following it is not converted.
+ [cap( )](http://tinkerpop.apache.org/docs/current/reference/#cap-step)   –   The only case in which the `cap()` step is converted is when it is combined with the `unfold()` step to return an unfolded version of an aggregate of vertex, edge, id, or property values. In this example, `cap()` is converted because it is followed by `.unfold()`:

  ```
  g.V().has('airport','country','IE').aggregate('airport').limit(2)
       .cap('airport').unfold()
  ```

  However, if you remove the `.unfold()`, `cap()` will not be converted:

  ```
  g.V().has('airport','country','IE').aggregate('airport').limit(2)
       .cap('airport')
  ```
+ [coalesce( )](http://tinkerpop.apache.org/docs/current/reference/#coalesce-step)   –   The only case where the `coalesce()` step is converted is when it follows the [Upsert pattern](http://tinkerpop.apache.org/docs/current/recipes/#element-existence) recommended on the [TinkerPop recipes page](http://tinkerpop.apache.org/docs/current/recipes/). Other `coalesce()` patterns are not converted. Conversion is limited to the case where all child traversals can be converted, they all produce the same type of output (vertex, edge, id, value, key, or label), they all traverse to a new element, and none of them contains a `repeat()` step.
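  For reference, the convertible upsert pattern looks like this (a sketch of the TinkerPop element-existence recipe, using the airport data as an example):

  ```
  g.V().has('code','ANC').fold().
    coalesce(unfold(),
             addV('airport').property('code','ANC'))
  ```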
+ [constant( )](http://tinkerpop.apache.org/docs/current/reference/#constant-step)   –   The `constant()` step is currently only converted if it is used within a `sack().by()` part of a traversal to assign a constant value, like this:

  ```
  g.V().has('code','ANC').sack(assign).by(constant(10)).out().limit(2)
  ```
+ [cyclicPath( )](http://tinkerpop.apache.org/docs/current/reference/#cyclicpath-step)   –   The `cyclicPath()` step can generally be converted to a native Neptune engine operation, unless the step is used with `by()`, `from()`, or `to()` modulators. In the following queries, for example, `cyclicPath()` is not converted:

  ```
  g.V().has('code','ANC').as('a').out().out().cyclicPath().by('code')
  g.V().has('code','ANC').as('a').out().out().cyclicPath().from('a')
  g.V().has('code','ANC').as('a').out().out().cyclicPath().to('a')
  ```
+ [drop( )](http://tinkerpop.apache.org/docs/current/reference/#drop-step)   –   The `drop()` step can generally be converted to a native Neptune engine operation, unless the step is used inside a `sideEffect(`) or `optional()` step.
+ [fold( )](http://tinkerpop.apache.org/docs/current/reference/#fold-step)   –   There are only two situations where the `fold()` step can be converted, namely when it is used in the [Upsert pattern](http://tinkerpop.apache.org/docs/current/recipes/#element-existence) recommended on the [TinkerPop recipes page](http://tinkerpop.apache.org/docs/current/recipes/), and when it is used in a `group().by()` context like this:

  ```
  g.V().has('code','ANC').out().group().by().by(values('code', 'city').fold())
  ```
+  [has( )](http://tinkerpop.apache.org/docs/current/reference/#has-step)   –   The `has()` step can generally be converted to a native Neptune engine operation, provided that queries using `T` use the predicate `P.eq`, `P.neq`, or `P.contains`. Variations of `has()` that imply those instances of `P`, such as `hasId('id1234')` (which is equivalent to `has(T.id, P.eq('id1234'))`), convert to native operations as well.
+ [id( )](http://tinkerpop.apache.org/docs/current/reference/#id-step)   –   The `id()` step is converted unless it is used on a property, like this:

  ```
  g.V().has('code','ANC').properties('code').id()
  ```
+  [mergeE()](https://tinkerpop.apache.org/docs/current/reference/#mergeedge-step)   –   The `mergeE()` step can be converted to a native Neptune engine operation if its parameters (the merge condition, `onCreate`, and `onMatch`) are constant (either `null`, a constant `Map`, or a `select()` of a `Map`). All examples in [upserting edges](https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-efficient-upserts.html#gremlin-upserts-edges) can be converted.
+  [mergeV()](https://tinkerpop.apache.org/docs/current/reference/#mergevertex-step)   –   The `mergeV()` step can be converted to a native Neptune engine operation if its parameters (the merge condition, `onCreate`, and `onMatch`) are constant (either `null`, a constant `Map`, or a `select()` of a `Map`). All examples in [upserting vertices](https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-efficient-upserts.html#gremlin-upserts-vertices) can be converted.
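
  As an illustration, a vertex upsert with constant `Map` parameters (a sketch; the vertex ID and property names here are hypothetical) can be expressed with `mergeV()` like this:

  ```
  g.mergeV([(T.id): 'v1']).
    option(onCreate, [(T.label): 'person', name: 'alice']).
    option(onMatch, [updated: true])
  ```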
+ [order( )](http://tinkerpop.apache.org/docs/current/reference/#order-step)   –   The `order()` step can generally be converted to a native Neptune engine operation, unless one of the following is true:
  + The `order()` step is within a nested child traversal, like this:

    ```
    g.V().has('code','ANC').where(V().out().order().by(id))
    ```
  + Local ordering is being used, as for example with `order(local)`.
  + A custom comparator is being used in the `by()` modulator, as in this use of `sack()`:

    ```
    g.withSack(0).
      V().has('code','ANC').
          repeat(outE().sack(sum).by('dist').inV()).times(2).limit(10).
          order().by(sack())
    ```
  + There are multiple orderings on the same element.
+ [project( )](http://tinkerpop.apache.org/docs/current/reference/#project-step)   –   The `project()` step can generally be converted to a native Neptune engine operation, unless the number of `by()` statements following the `project()` does not match the number of labels specified, as here:

  ```
  g.V().has('code','ANC').project('x', 'y').by(id)
  ```
+ [range( )](http://tinkerpop.apache.org/docs/current/reference/#range-step)   –   The `range()` step is only converted when the lower end of the range in question is zero (for example, `range(0,3)`).
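
  For example, under this rule the first of the following traversals (a hypothetical sketch) can be converted, while the second cannot:

  ```
  // Lower bound is zero: converted
  g.V().has('code','ANC').out().range(0, 3)

  // Lower bound is not zero: not converted
  g.V().has('code','ANC').out().range(2, 5)
  ```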
+ [repeat( )](http://tinkerpop.apache.org/docs/current/reference/#repeat-step)   –   The `repeat()` step can generally be converted to a native Neptune engine operation, unless it is nested within another `repeat()` step, like this:

  ```
  g.V().has('code','ANC').repeat(out().repeat(out()).times(2)).times(2)
  ```
+ [sack( )](http://tinkerpop.apache.org/docs/current/reference/#sack-step)   –   The `sack()` step can generally be converted to a native Neptune engine operation, except in the following cases:
  + If a non-numeric sack operator is being used.
  + If a numeric sack operator other than `+`, `-`, `mult`, `div`, `min` and `max` is being used.
  + If `sack()` is used inside a `where()` step to filter based on a sack value, as here:

    ```
    g.V().has('code','ANC').sack(assign).by(values('code')).where(sack().is('ANC'))
    ```
+ [sum( )](http://tinkerpop.apache.org/docs/current/reference/#sum-step)   –   The `sum()` step can generally be converted to a native Neptune engine operation, but not when used to calculate a global summation, like this:

  ```
  g.V().has('code','ANC').outE('routes').values('dist').sum()
  ```
+ [union( )](http://tinkerpop.apache.org/docs/current/reference/#union-step)   –   The `union()` step can be converted to a native Neptune engine operation as long as it is the last step in the query aside from the terminal step.
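
  For example (a sketch using the air-routes sample data), the first query below keeps `union()` as the final step and so can be converted, while the second follows it with further traversal steps and cannot:

  ```
  // union() is the last step: can be converted
  g.V().has('code','ANC').union(out().values('code'), in().values('code'))

  // Steps follow union(): not converted
  g.V().has('code','ANC').union(out(), in()).out().values('code')
  ```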
+ [unfold( )](http://tinkerpop.apache.org/docs/current/reference/#unfold-step)   –   The `unfold()` step can only be converted to a native Neptune engine operation in two cases: when it is used in the [Upsert pattern](http://tinkerpop.apache.org/docs/current/recipes/#element-existence) recommended on the [TinkerPop recipes page](http://tinkerpop.apache.org/docs/current/recipes/), and when it is used together with `cap()`, like this:

  ```
  g.V().has('airport','country','IE').aggregate('airport').limit(2)
       .cap('airport').unfold()
  ```
+ [where( )](http://tinkerpop.apache.org/docs/current/reference/#where-step)   –   The `where()` step can generally be converted to a native Neptune engine operation, except in the following cases:
  + When `by()` modulations are used, like this:

    ```
    g.V().hasLabel('airport').as('a')
         .where(gt('a')).by('runways')
    ```
  + When comparison operators other than `eq`, `neq`, `within`, and `without` are used.
  + When user-supplied aggregations are used.

## Gremlin steps that are never converted to native Neptune engine operations
<a name="gremlin-steps-never"></a>

The following Gremlin steps are supported in Neptune but are never converted to native Neptune engine operations. Instead, they are executed by the Gremlin server.
+ [choose( )](http://tinkerpop.apache.org/docs/current/reference/#choose-step)
+ [coin( )](http://tinkerpop.apache.org/docs/current/reference/#coin-step)
+ [inject( )](http://tinkerpop.apache.org/docs/current/reference/#inject-step)
+ [match( )](http://tinkerpop.apache.org/docs/current/reference/#match-step)
+ [math( )](http://tinkerpop.apache.org/docs/current/reference/#math-step)
+ [max( )](http://tinkerpop.apache.org/docs/current/reference/#max-step)
+ [mean( )](http://tinkerpop.apache.org/docs/current/reference/#mean-step)
+ [min( )](http://tinkerpop.apache.org/docs/current/reference/#min-step)
+ [option( )](http://tinkerpop.apache.org/docs/current/reference/#option-step)
+ [optional( )](http://tinkerpop.apache.org/docs/current/reference/#optional-step)
+ [path( )](http://tinkerpop.apache.org/docs/current/reference/#path-step)
+ [propertyMap( )](http://tinkerpop.apache.org/docs/current/reference/#propertymap-step)
+ [sample( )](http://tinkerpop.apache.org/docs/current/reference/#sample-step)
+ [skip( )](http://tinkerpop.apache.org/docs/current/reference/#skip-step)
+ [tail( )](http://tinkerpop.apache.org/docs/current/reference/#tail-step)
+ [timeLimit( )](http://tinkerpop.apache.org/docs/current/reference/#timelimit-step)
+ [tree( )](http://tinkerpop.apache.org/docs/current/reference/#tree-step)

## Gremlin steps that are not supported in Neptune at all
<a name="neptune-gremlin-steps-unsupported"></a>

The following Gremlin steps are not supported at all in Neptune. In most cases this is because they require a `GraphComputer`, which Neptune does not currently support.
+ [connectedComponent( )](http://tinkerpop.apache.org/docs/current/reference/#connectedcomponent-step)
+ [io( )](http://tinkerpop.apache.org/docs/current/reference/#io-step)
+ [shortestPath( )](http://tinkerpop.apache.org/docs/current/reference/#shortestpath-step)
+ [withComputer( )](http://tinkerpop.apache.org/docs/current/reference/#with-step)
+ [pageRank( )](http://tinkerpop.apache.org/docs/current/reference/#pagerank-step)
+ [peerPressure( )](http://tinkerpop.apache.org/docs/current/reference/#peerpressure-step)
+ [program( )](http://tinkerpop.apache.org/docs/current/reference/#program-step)

The `io()` step is partially supported: it can be used to `read()` from a URL, but not to `write()`.

# Using Gremlin with the Neptune DFE query engine
<a name="gremlin-with-dfe"></a>

If you enable Neptune's [alternative query engine](neptune-dfe-engine.md), known as the DFE, in [lab mode](features-lab-mode.md) (by setting the `neptune_lab_mode` DB cluster parameter to `DFEQueryEngine=enabled`), then Neptune translates read-only Gremlin queries (traversals) into an intermediate logical representation and runs them on the DFE engine whenever possible.

However, the DFE does not yet support all Gremlin steps. When a step can't be run natively on the DFE, Neptune falls back on TinkerPop to run the step. The `explain` and `profile` reports include warnings when this happens.

# Gremlin step coverage in DFE
<a name="gremlin-step-coverage-in-DFE"></a>

The Gremlin DFE is a lab-mode feature that can be enabled either through the cluster parameter or with the `Neptune#useDFE` query hint. For more information, see [Using Gremlin with the Neptune DFE query engine](https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-with-dfe.html).
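
For example, you can opt a single query into the DFE engine with the query hint like this (a sketch; substitute your own traversal):

```
g.with('Neptune#useDFE', true).
  V().has('code','ANC').out().values('code')
```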

The following steps are available in Gremlin DFE.

## Path and traversal steps
<a name="DFE-path-and-traversal"></a>

 [asDate()](https://tinkerpop.apache.org/docs/current/reference/#asDate-step), [barrier()](https://tinkerpop.apache.org/docs/current/reference/#barrier-step), [call()](https://tinkerpop.apache.org/docs/current/reference/#call-step), [cap()](https://tinkerpop.apache.org/docs/current/reference/#cap-step), [dateAdd()](https://tinkerpop.apache.org/docs/current/reference/#dateadd-step), [dateDiff()](https://tinkerpop.apache.org/docs/current/reference/#datediff-step), [disjunct()](https://tinkerpop.apache.org/docs/current/reference/#disjunct-step), [drop()](https://tinkerpop.apache.org/docs/current/reference/#drop-step), [fail()](https://tinkerpop.apache.org/docs/current/reference/#fail-step), [filter()](https://tinkerpop.apache.org/docs/current/reference/#filter-step), [flatMap()](https://tinkerpop.apache.org/docs/current/reference/#flatmap-step), [id()](https://tinkerpop.apache.org/docs/current/reference/#id-step), [identity()](https://tinkerpop.apache.org/docs/current/reference/#identity-step), [index()](https://tinkerpop.apache.org/docs/current/reference/#index-step), [intersect()](https://tinkerpop.apache.org/docs/current/reference/#intersect-step), [inject()](https://tinkerpop.apache.org/docs/current/reference/#inject-step), [label()](https://tinkerpop.apache.org/docs/current/reference/#label-step), [length()](https://tinkerpop.apache.org/docs/current/reference/#length-step), [loops()](https://tinkerpop.apache.org/docs/current/reference/#loops-step), [map()](https://tinkerpop.apache.org/docs/current/reference/#map-step), [order()](https://tinkerpop.apache.org/docs/current/reference/#order-step), [order(local)](https://tinkerpop.apache.org/docs/current/reference/#order-step), [path()](https://tinkerpop.apache.org/docs/current/reference/#path-step), [project()](https://tinkerpop.apache.org/docs/current/reference/#project-step), [range()](https://tinkerpop.apache.org/docs/current/reference/#range-step), 
[repeat()](https://tinkerpop.apache.org/docs/current/reference/#repeat-step), [reverse()](https://tinkerpop.apache.org/docs/current/reference/#reverse-step), [sack()](https://tinkerpop.apache.org/docs/current/reference/#sack-step), [sample()](https://tinkerpop.apache.org/docs/current/reference/#sample-step), [select()](https://tinkerpop.apache.org/docs/current/reference/#select-step), [sideEffect()](https://tinkerpop.apache.org/docs/current/reference/#sideeffect-step), [split()](https://tinkerpop.apache.org/docs/current/reference/#split-step), [unfold()](https://tinkerpop.apache.org/docs/current/reference/#unfold-step), [union()](https://tinkerpop.apache.org/docs/current/reference/#union-step) 

## Aggregate and collection steps
<a name="DFE-aggregate-and-collection"></a>

[aggregate(global)](https://tinkerpop.apache.org/docs/current/reference/#aggregate-step), [combine()](https://tinkerpop.apache.org/docs/current/reference/#combine-step), [count()](https://tinkerpop.apache.org/docs/current/reference/#count-step), [dedup()](https://tinkerpop.apache.org/docs/current/reference/#dedup-step), [dedup(local)](https://tinkerpop.apache.org/docs/current/reference/#dedup-step), [fold()](https://tinkerpop.apache.org/docs/current/reference/#fold-step), [group()](https://tinkerpop.apache.org/docs/current/reference/#group-step), [groupCount()](https://tinkerpop.apache.org/docs/current/reference/#groupcount-step)

## Mathematical steps
<a name="DFE-mathematical"></a>

 [max()](https://tinkerpop.apache.org/docs/current/reference/#max-step), [mean()](https://tinkerpop.apache.org/docs/current/reference/#mean-step), [min()](https://tinkerpop.apache.org/docs/current/reference/#min-step), [sum()](https://tinkerpop.apache.org/docs/current/reference/#sum-step) 

## Element steps
<a name="DFE-element"></a>

[otherV()](https://tinkerpop.apache.org/docs/current/reference/#otherv-step), [elementMap()](https://tinkerpop.apache.org/docs/current/reference/#elementmap-step), [element()](https://tinkerpop.apache.org/docs/current/reference/#element-step), [V()](https://tinkerpop.apache.org/docs/current/reference/#graph-step), [out(), in(), both(), outE(), inE(), bothE(), outV(), inV(), bothV(), otherV()](https://tinkerpop.apache.org/docs/current/reference/#vertex-step)

## Property steps
<a name="DFE-property"></a>

 [properties()](https://tinkerpop.apache.org/docs/current/reference/#properties-step), [key()](https://tinkerpop.apache.org/docs/current/reference/#key-step), [valueMap()](https://tinkerpop.apache.org/docs/current/reference/#propertymap-step), [value()](https://tinkerpop.apache.org/docs/current/reference/#value-step) 

## Filter steps
<a name="DFE-filter"></a>

 [and()](https://tinkerpop.apache.org/docs/current/reference/#and-step), [coalesce()](https://tinkerpop.apache.org/docs/current/reference/#coalesce-step), [coin()](https://tinkerpop.apache.org/docs/current/reference/#coin-step), [has()](https://tinkerpop.apache.org/docs/current/reference/#has-step), [is()](https://tinkerpop.apache.org/docs/current/reference/#is-step), [local()](https://tinkerpop.apache.org/docs/current/reference/#local-step), [none()](https://tinkerpop.apache.org/docs/current/reference/#none-step), [not()](https://tinkerpop.apache.org/docs/current/reference/#not-step), [or()](https://tinkerpop.apache.org/docs/current/reference/#or-step), [where()](https://tinkerpop.apache.org/docs/current/reference/#where-step) 

## String manipulation steps
<a name="DFE-string-manipulation"></a>

 [concat()](https://tinkerpop.apache.org/docs/current/reference/#concat-step), [lTrim()](https://tinkerpop.apache.org/docs/current/reference/#lTrim-step), [rTrim()](https://tinkerpop.apache.org/docs/current/reference/#rtrim-step), [substring()](https://tinkerpop.apache.org/docs/current/reference/#substring-step), [toLower()](https://tinkerpop.apache.org/docs/current/reference/#toLower-step), [toUpper()](https://tinkerpop.apache.org/docs/current/reference/#toUpper-step), [trim()](https://tinkerpop.apache.org/docs/current/reference/#trim-step) 

## Predicates
<a name="DFE-predicates"></a>
+ [Compare: eq, neq, lt, lte, gt, gte](https://tinkerpop.apache.org/docs/current/reference/#a-note-on-predicates)
+ [Contains: within, without](https://tinkerpop.apache.org/docs/current/reference/#a-note-on-predicates)
+ [TextP: endingWith, containing, notStartingWith, notEndingWith, notContaining](https://tinkerpop.apache.org/docs/current/reference/#a-note-on-predicates)
+ [P: and, or, between, outside, inside](https://tinkerpop.apache.org/docs/current/reference/#a-note-on-predicates)

## Limitations
<a name="gremlin-with-dfe-limitations"></a>

`repeat()` traversals that contain `limit()`, step labels (`as()`), or `dedup()` are not yet supported on the DFE:

```
// With limit() inside the repeat traversal
g.V().has('code','AGR').repeat(out().limit(5)).until(has('code','FRA'))

// With labels inside the repeat traversal
g.V().has('code','AGR').repeat(out().as('a')).until(has('code','FRA'))

// With dedup() inside the repeat traversal
g.V().has('code','AGR').repeat(out().dedup()).until(has('code','FRA'))
```

`path()` with nested `repeat()` steps, or with branching steps, is not yet supported:

```
// Path with branching steps
g.V().has('code','AGR').union(identity(), outE().inV()).path().by('code')

// Path with a nested repeat
g.V().has('code','AGR').repeat(out().union(identity(), out())).path().by('code')
```

## Query planning interleaving
<a name="gremlin-with-dfe-interleaving"></a>

When the translation process encounters a Gremlin step that has no corresponding native DFE operator, it tries, before falling back to TinkerPop, to find other intermediate query parts that can run natively on the DFE engine. It does this by applying interleaving logic to the top-level traversal, so that supported steps are used wherever possible.

Any such intermediate, non-prefix query translation is represented using `NeptuneInterleavingStep` in the `explain` and `profile` outputs.

For performance comparisons, you might want to turn off interleaving in a query while still using the DFE engine to run the prefix part, or you might want to use only the TinkerPop engine for non-prefix query execution. You can do either by using the `disableInterleaving` query hint.

Just as the [useDFE](gremlin-query-hints-useDFE.md) query hint with a value of `false` prevents a query from being run on the DFE at all, the `disableInterleaving` query hint with a value of `true` turns off DFE interleaving for translation of a query. For example:

```
g.with('Neptune#disableInterleaving', true)
 .V().has('genre','drama').in('likes')
```

## Updated Gremlin `explain` and `profile` output
<a name="gremlin-with-dfe-explain-update"></a>

Gremlin [explain](gremlin-explain.md) provides details about the optimized traversal that Neptune uses to run a query. See the [sample DFE `explain` output](gremlin-explain-api.md#gremlin-explain-dfe) for an example of what `explain` output looks like when the DFE engine is enabled.

The [Gremlin `profile` API](gremlin-profile-api.md) runs a specified Gremlin traversal, collects various metrics about the run, and produces a profile report that contains details about the optimized query plan and the runtime statistics of various operators. See [sample DFE `profile` output](gremlin-profile-api.md#gremlin-profile-sample-dfe-output) for an example of what `profile` output looks like when the DFE engine is enabled.

**Note**  
Because the DFE engine is an experimental feature released in lab mode, the exact format of the `explain` and `profile` output is subject to change.

# Accessing the Neptune Graph with openCypher
<a name="access-graph-opencypher"></a>

Neptune supports building graph applications using openCypher, currently one of the most popular query languages for developers working with graph databases. Developers, business analysts, and data scientists like openCypher’s SQL-inspired syntax because it provides a familiar structure to compose queries for graph applications.

**openCypher** is a declarative query language for property graphs that was originally developed by Neo4j, then open-sourced in 2015, and contributed to the [openCypher](http://www.opencypher.org/) project under an Apache 2 open-source license. Its syntax is documented in the [Cypher Query Language Reference, Version 9](https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf).

For the limitations and differences in Neptune support of the openCypher specification, see [openCypher specification compliance in Amazon Neptune](feature-opencypher-compliance.md).

**Note**  
The current Neo4j implementation of the Cypher query language has diverged in some ways from the openCypher specification. If you are migrating current Neo4j Cypher code to Neptune, see [Neptune compatibility with Neo4j](migration-compatibility.md) and [Rewriting Cypher queries to run in openCypher on Neptune](migration-opencypher-rewrites.md) for help.

Starting with engine release 1.1.1.0, openCypher is available for production use in Neptune.

## Gremlin vs. openCypher: similarities and differences
<a name="access-graph-opencypher-overview-with-gremlin"></a>

Gremlin and openCypher are both property-graph query languages, and they are complementary in many ways.

Gremlin was designed to appeal to programmers and fit seamlessly into code. As a result, Gremlin is imperative by design, whereas openCypher's declarative syntax may feel more familiar for people with SQL or SPARQL experience. Gremlin might seem more natural to a data scientist using Python in a Jupyter notebook, whereas openCypher might seem more intuitive to a business user with some SQL background.

The nice thing is that **you don't have to choose** between Gremlin and openCypher in Neptune. Queries in either language can operate on the same graph, regardless of which of the two languages was used to enter that data. You may find it more convenient to use Gremlin for some things and openCypher for others, depending on what you're doing.

Gremlin uses an imperative syntax that lets you control how you move through your graph in a series of steps, each of which takes in a stream of data, performs some action on it (using a filter, map, and so forth), and then outputs the results to the next step. A Gremlin query commonly takes the form, `g.V()`, followed by additional steps.

In openCypher, you use a declarative syntax, inspired by SQL, that specifies a pattern of nodes and relationships to find in your graph using a motif syntax (like `()-[]->()`). An openCypher query often starts with a `MATCH` clause, followed by other clauses such as `WHERE`, `WITH`, and `RETURN`.
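
For a quick side-by-side feel of the two styles (a sketch assuming the air-routes sample data's labels and property names), here is the same question phrased in Gremlin:

```
g.V().has('airport', 'code', 'ANC').out('route').values('code')
```

and in openCypher:

```
MATCH (a:airport {code: 'ANC'})-[:route]->(d) RETURN d.code
```

Both return the codes of airports reachable in one hop from Anchorage.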

# Getting started using openCypher
<a name="access-graph-opencypher-overview-getting-started"></a>

You can query property-graph data in Neptune using openCypher regardless of how it was loaded, but you can't use openCypher to query data loaded as RDF.

The [Neptune bulk loader](bulk-load.md) accepts property-graph data in a [CSV format for Gremlin](bulk-load-tutorial-format-gremlin.md), and in a [CSV format for openCypher](bulk-load-tutorial-format-opencypher.md). Also, of course, you can add property data to your graph using Gremlin and/or openCypher queries.

There are many online tutorials available for learning the Cypher query language. The few quick examples here can give you a feel for the language, but by far the best and easiest way to get started using openCypher to query your Neptune graph is with the openCypher notebooks in the [Neptune workbench](graph-notebooks.md). The workbench is open source, and is hosted on GitHub at [https://github.com/aws-samples/amazon-neptune-samples](https://github.com/aws-samples/amazon-neptune-samples/).

You'll find the openCypher notebooks in the GitHub [Neptune graph-notebook repository](https://github.com/aws/graph-notebook/tree/main/src/graph_notebook/notebooks). In particular, check out the [Air-routes visualization](https://github.com/aws/graph-notebook/blob/main/src/graph_notebook/notebooks/02-Visualization/Air-Routes-openCypher.ipynb) and [English Premier League](https://github.com/aws/graph-notebook/blob/main/src/graph_notebook/notebooks/02-Visualization/EPL-openCypher.ipynb) notebooks for openCypher.

Data processed by openCypher takes the form of an unordered series of key/value maps. The main way to refine, manipulate, and augment these maps is to use clauses that perform tasks such as pattern matching, insertion, update, and deletion on the key/value pairs.

There are several clauses in openCypher for finding data patterns in the graph, of which `MATCH` is the most common. `MATCH` lets you specify the pattern of nodes, relationships, and filters that you want to look for in your graph. For example:
+ **Get all nodes**

  ```
  MATCH (n) RETURN n
  ```
+ **Find connected nodes**

  ```
  MATCH (n)-[r]->(d) RETURN n, r, d
  ```
+ **Find a path**

  ```
  MATCH p=(n)-[r]->(d) RETURN p
  ```
+ **Get all nodes with a label**

  ```
  MATCH (n:airport) RETURN n
  ```

Note that the first query above returns every single node in your graph, and the next two return every node that has a relationship, which is not generally recommended. In almost all cases, you want to narrow down the data being returned, which you can do by specifying node or relationship labels and properties, as in the fourth example.
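
For example (a sketch assuming the air-routes property names), adding a label, a property filter, and an explicit list of returned values keeps the result small:

```
MATCH (a:airport {code: 'ANC'}) RETURN a.city, a.desc
```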

You can find a handy cheat sheet for openCypher syntax in the Neptune [GitHub samples repository](https://github.com/aws-samples/amazon-neptune-samples/tree/master/opencypher/Cheatsheet.md).

# Neptune openCypher status servlet and status endpoint
<a name="access-graph-opencypher-status"></a>

The openCypher status endpoint provides access to information about queries that are currently running on the server or waiting to run. It also lets you cancel those queries. The endpoint is:

```
https://(the server):(the port number)/openCypher/status
```

You can use the HTTP `GET` and `POST` methods to get current status from the server, or to cancel a query. You can also use the `DELETE` method to cancel a running or waiting query.

## Parameters for status requests
<a name="access-graph-opencypher-status-parameters"></a>

**Status query parameters**
+ **`includeWaiting`** (`true` or `false`)   –   When set to `true` and no other parameters are present, causes status information to be returned for waiting queries as well as for running queries.
+ **`cancelQuery`**   –   Used only with `GET` and `POST` methods, to indicate that this is a cancelation request. The `DELETE` method does not need this parameter.

  The value of the `cancelQuery` parameter is not used, but when `cancelQuery` is present, the `queryId` parameter is required, to identify which query to cancel.
+ **`queryId`**   –   Contains the ID of a specific query.

  When used with the `GET` or `POST` method and the `cancelQuery` parameter is not present, `queryId` causes status information to be returned for the specific query it identifies. If the `cancelQuery` parameter is present, then the specific query that `queryId` identifies is canceled.

  When used with the `DELETE` method, `queryId` always indicates a specific query to be canceled.
+ **`silent`**   –   Used only when canceling a query. If set to `true`, the cancelation happens silently.
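
For example, to cancel a query silently with the `DELETE` method (a sketch; substitute your endpoint and the ID of the query you want to cancel):

```
curl -X DELETE \
  "https://your-neptune-endpoint:port/openCypher/status?queryId=your-query-id&silent=true"
```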

## Status request response fields
<a name="access-graph-opencypher-status-response-fields"></a>

**Status response fields if the ID of a specific query is not provided**
+ **acceptedQueryCount**   –   The number of queries that have been accepted but not yet completed, including queries in the queue.
+ **runningQueryCount**   –   The number of currently running openCypher queries.
+ **queries**   –   A list of the current openCypher queries.

**Status response fields for a specific query**
+ **queryId**   –   A GUID that identifies the query. Neptune automatically assigns this ID value to each query, or you can assign your own ID (see [Inject a Custom ID Into a Neptune Gremlin or SPARQL Query](features-query-id.md)).
+ **queryString**   –   The submitted query. This is truncated to 1024 characters if it is longer than that.
+ **queryEvalStats**   –   Statistics for this query:
  + **waited**   –   Indicates how long the query waited, in milliseconds.
  + **elapsed**   –   The number of milliseconds the query has been running so far.
  + **cancelled**   –   `true` if the query has been cancelled, or `false` if it has not.

## Examples of status requests and responses
<a name="access-graph-opencypher-status-samples"></a>
+ **Request for the status of all queries, including those waiting:**

------
#### [ AWS CLI ]

  ```
  aws neptunedata get-open-cypher-query-status \
    --endpoint-url https://your-neptune-endpoint:port \
    --include-waiting
  ```

  For more information, see [get-open-cypher-query-status](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/get-open-cypher-query-status.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

  ```
  import boto3
  from botocore.config import Config
  
  client = boto3.client(
      'neptunedata',
      endpoint_url='https://your-neptune-endpoint:port',
      config=Config(read_timeout=None, retries={'total_max_attempts': 1})
  )
  
  response = client.get_open_cypher_query_status(
      includeWaiting=True
  )
  
  print(response)
  ```

  For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

  ```
  awscurl https://your-neptune-endpoint:port/openCypher/status \
    --region us-east-1 \
    --service neptune-db \
    -X POST \
    -d "includeWaiting=true"
  ```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

  ```
  curl https://your-neptune-endpoint:port/openCypher/status \
    --data-urlencode "includeWaiting=true"
  ```

------

  *Response:*

  ```
  {
    "acceptedQueryCount" : 0,
    "runningQueryCount" : 0,
    "queries" : [ ]
  }
  ```
+ **Request for the status of running queries, *not* including those waiting:**

------
#### [ AWS CLI ]

  ```
  aws neptunedata get-open-cypher-query-status \
    --endpoint-url https://your-neptune-endpoint:port
  ```

  For more information, see [get-open-cypher-query-status](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/get-open-cypher-query-status.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

  ```
  import boto3
  from botocore.config import Config
  
  client = boto3.client(
      'neptunedata',
      endpoint_url='https://your-neptune-endpoint:port',
      config=Config(read_timeout=None, retries={'total_max_attempts': 1})
  )
  
  response = client.get_open_cypher_query_status()
  
  print(response)
  ```

  For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

  ```
  awscurl https://your-neptune-endpoint:port/openCypher/status \
    --region us-east-1 \
    --service neptune-db
  ```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

  ```
  curl https://your-neptune-endpoint:port/openCypher/status
  ```

------

  *Response:*

  ```
  {
    "acceptedQueryCount" : 0,
    "runningQueryCount" : 0,
    "queries" : [ ]
  }
  ```
+ **Request for the status of a single query:**

------
#### [ AWS CLI ]

  ```
  aws neptunedata get-open-cypher-query-status \
    --endpoint-url https://your-neptune-endpoint:port \
    --query-id eadc6eea-698b-4a2f-8554-5270ab17ebee
  ```

  For more information, see [get-open-cypher-query-status](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/get-open-cypher-query-status.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

  ```
  import boto3
  from botocore.config import Config
  
  client = boto3.client(
      'neptunedata',
      endpoint_url='https://your-neptune-endpoint:port',
      config=Config(read_timeout=None, retries={'total_max_attempts': 1})
  )
  
  response = client.get_open_cypher_query_status(
      queryId='eadc6eea-698b-4a2f-8554-5270ab17ebee'
  )
  
  print(response)
  ```

  For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

  ```
  awscurl https://your-neptune-endpoint:port/openCypher/status \
    --region us-east-1 \
    --service neptune-db \
    -X POST \
    -d "queryId=eadc6eea-698b-4a2f-8554-5270ab17ebee"
  ```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

  ```
  curl https://your-neptune-endpoint:port/openCypher/status \
    --data-urlencode "queryId=eadc6eea-698b-4a2f-8554-5270ab17ebee"
  ```

------

  *Response:*

  ```
  {
    "queryId" : "eadc6eea-698b-4a2f-8554-5270ab17ebee",
    "queryString" : "MATCH (n1)-[:knows]->(n2), (n2)-[:knows]->(n3), (n3)-[:knows]->(n4), (n4)-[:knows]->(n5), (n5)-[:knows]->(n6), (n6)-[:knows]->(n7), (n7)-[:knows]->(n8), (n8)-[:knows]->(n9), (n9)-[:knows]->(n10) RETURN COUNT(n1);",
    "queryEvalStats" : {
      "waited" : 0,
      "elapsed" : 23463,
      "cancelled" : false
    }
  }
  ```
+ **Request to cancel a query:**

------
#### [ AWS CLI ]

  ```
  aws neptunedata cancel-open-cypher-query \
    --endpoint-url https://your-neptune-endpoint:port \
    --query-id f43ce17b-db01-4d37-a074-c76d1c26d7a9
  ```

  For more information, see [cancel-open-cypher-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/cancel-open-cypher-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

  ```
  import boto3
  from botocore.config import Config
  
  client = boto3.client(
      'neptunedata',
      endpoint_url='https://your-neptune-endpoint:port',
      config=Config(read_timeout=None, retries={'total_max_attempts': 1})
  )
  
  response = client.cancel_open_cypher_query(
      queryId='f43ce17b-db01-4d37-a074-c76d1c26d7a9'
  )
  
  print(response)
  ```

  For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

  ```
  awscurl https://your-neptune-endpoint:port/openCypher/status \
    --region us-east-1 \
    --service neptune-db \
    -X POST \
    -d "cancelQuery" \
    -d "queryId=f43ce17b-db01-4d37-a074-c76d1c26d7a9"
  ```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

  1. Using `POST`:

  ```
  curl -X POST https://your-neptune-endpoint:port/openCypher/status \
    --data-urlencode "cancelQuery" \
    --data-urlencode "queryId=f43ce17b-db01-4d37-a074-c76d1c26d7a9"
  ```

  2. Using `GET`:

  ```
  curl -X GET https://your-neptune-endpoint:port/openCypher/status \
    --data-urlencode "cancelQuery" \
    --data-urlencode "queryId=588af350-cfde-4222-bee6-b9cedc87180d"
  ```

  3. Using `DELETE`:

  ```
  curl -X DELETE \
    "https://your-neptune-endpoint:port/openCypher/status?queryId=b9a516d1-d25c-4301-bb80-10b2743ecf0e"
  ```

------

  *Response:*

  ```
  {
    "status" : "200 OK",
    "payload" : true
  }
  ```
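The `queryEvalStats` object in the status response above reports `waited` and `elapsed` in milliseconds. Here is a minimal sketch that parses a sample status response (the JSON literal below is abridged from the example above):

```python
import json

# Abridged /openCypher/status response for a single query.
status_json = '''
{
  "queryId" : "eadc6eea-698b-4a2f-8554-5270ab17ebee",
  "queryEvalStats" : {
    "waited" : 0,
    "elapsed" : 23463,
    "cancelled" : false
  }
}
'''

status = json.loads(status_json)
stats = status["queryEvalStats"]

# waited and elapsed are reported in milliseconds.
print(f"query {status['queryId']} has run for {stats['elapsed'] / 1000:.3f} s "
      f"(waited {stats['waited']} ms, cancelled: {stats['cancelled']})")
```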

# The Amazon Neptune openCypher HTTPS endpoint
<a name="access-graph-opencypher-queries"></a>

**Topics**
+ [openCypher read and write queries on the HTTPS endpoint](#access-graph-opencypher-queries-read-write)
+ [The default openCypher JSON results format](#access-graph-opencypher-queries-results-simple-JSON)
+ [Optional HTTP trailing headers for multi-part openCypher responses](#optional-http-trailing-headers)

**Note**  
Neptune does not currently support HTTP/2 for REST API requests. Clients must use HTTP/1.1 when connecting to endpoints.

## openCypher read and write queries on the HTTPS endpoint
<a name="access-graph-opencypher-queries-read-write"></a>

The openCypher HTTPS endpoint supports read and update queries using both the `GET` and `POST` methods. The `DELETE` and `PUT` methods are not supported.

The following instructions walk you through connecting to the openCypher endpoint using the `curl` command and HTTPS. You must follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

The syntax is:

```
HTTPS://(the server):(the port number)/openCypher
```

Here is a sample read query:

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "MATCH (n1) RETURN n1"
```

For more information, see [execute-open-cypher-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_query(
    openCypherQuery='MATCH (n1) RETURN n1'
)

print(response['results'])
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=MATCH (n1) RETURN n1"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=MATCH (n1) RETURN n1"
```

------

Here is a sample write/update query:

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "CREATE (n:Person { age: 25 })"
```

For more information, see [execute-open-cypher-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_query(
    openCypherQuery='CREATE (n:Person { age: 25 })'
)

print(response['results'])
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=CREATE (n:Person { age: 25 })"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=CREATE (n:Person { age: 25 })"
```

------

## The default openCypher JSON results format
<a name="access-graph-opencypher-queries-results-simple-JSON"></a>

The following JSON format is returned by default, or by setting the request header explicitly to `Accept: application/json`. This format is designed to be easily parsed into objects using native-language features of most libraries.

The JSON document that is returned contains one field, `results`, which contains the query return values. The examples below show the JSON formatting for common values.

**Value response example:**

```
{
  "results": [
    {
      "count(a)": 121
    }
  ]
}
```

**Node response example:**

```
{
  "results": [
    {
      "a": {
        "~id": "22",
        "~entityType": "node",
        "~labels": [
          "airport"
        ],
        "~properties": {
          "desc": "Seattle-Tacoma",
          "lon": -122.30899810791,
          "runways": 3,
          "type": "airport",
          "country": "US",
          "region": "US-WA",
          "lat": 47.4490013122559,
          "elev": 432,
          "city": "Seattle",
          "icao": "KSEA",
          "code": "SEA",
          "longest": 11901
        }
      }
    }
  ]
}
```

**Relationship response example:**

```
{
  "results": [
    {
      "r": {
        "~id": "7389",
        "~entityType": "relationship",
        "~start": "22",
        "~end": "151",
        "~type": "route",
        "~properties": {
          "dist": 956
        }
      }
    }
  ]
}
```

**Path response example:**

```
{
  "results": [
    {
      "p": [
        {
          "~id": "22",
          "~entityType": "node",
          "~labels": [
            "airport"
          ],
          "~properties": {
            "desc": "Seattle-Tacoma",
            "lon": -122.30899810791,
            "runways": 3,
            "type": "airport",
            "country": "US",
            "region": "US-WA",
            "lat": 47.4490013122559,
            "elev": 432,
            "city": "Seattle",
            "icao": "KSEA",
            "code": "SEA",
            "longest": 11901
          }
        },
        {
          "~id": "7389",
          "~entityType": "relationship",
          "~start": "22",
          "~end": "151",
          "~type": "route",
          "~properties": {
            "dist": 956
          }
        },
        {
          "~id": "151",
          "~entityType": "node",
          "~labels": [
            "airport"
          ],
          "~properties": {
            "desc": "Ontario International Airport",
            "lon": -117.600997924805,
            "runways": 2,
            "type": "airport",
            "country": "US",
            "region": "US-CA",
            "lat": 34.0559997558594,
            "elev": 944,
            "city": "Ontario",
            "icao": "KONT",
            "code": "ONT",
            "longest": 12198
          }
        }
      ]
    }
  ]
}
```
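Each element in these responses carries an `~entityType` field, so a result can be walked generically. Here is a minimal sketch (using a trimmed version of the path response above) that summarizes each element of a path:

```python
import json

# A trimmed version of the path response shown above.
response = json.loads('''
{
  "results": [
    {
      "p": [
        { "~id": "22",   "~entityType": "node", "~labels": ["airport"],
          "~properties": { "code": "SEA" } },
        { "~id": "7389", "~entityType": "relationship", "~start": "22",
          "~end": "151", "~type": "route", "~properties": { "dist": 956 } },
        { "~id": "151",  "~entityType": "node", "~labels": ["airport"],
          "~properties": { "code": "ONT" } }
      ]
    }
  ]
}
''')

def describe(element):
    # Nodes carry ~labels; relationships carry ~type, ~start, and ~end.
    if element["~entityType"] == "node":
        return f"node {element['~id']} ({', '.join(element['~labels'])})"
    return f"{element['~type']} {element['~start']}->{element['~end']}"

path = response["results"][0]["p"]
print(" | ".join(describe(e) for e in path))
# → node 22 (airport) | route 22->151 | node 151 (airport)
```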

## Optional HTTP trailing headers for multi-part openCypher responses
<a name="optional-http-trailing-headers"></a>

 This feature is available starting with Neptune engine release [1.4.5.0](https://docs.aws.amazon.com/releases/release-1.4.5.0.xml). 

 The HTTP response to openCypher queries and updates is typically returned in multiple chunks. When a failure occurs after the initial response chunks have been sent (with an HTTP status code of 200), it can be challenging to diagnose the issue. By default, Neptune reports such failures by appending an error message to the response body, which can leave the body malformed because part of the response has already been streamed. 

**Using trailing headers**  
 To improve error detection and diagnosis, you can enable trailing headers by including a transfer-encoding trailers header (`TE: trailers`) in your request. Doing this causes Neptune to include two additional header fields in the trailing headers of the response chunks: 
+  `X-Neptune-Status` – contains the response code followed by a short name. For example, on success the trailing header is `X-Neptune-Status: 200 OK`. On failure, the response code is a Neptune engine error code, such as `X-Neptune-Status: 500 TimeLimitExceededException`. 
+  `X-Neptune-Detail` – is empty for successful requests. For errors, it contains the JSON error message. Because only ASCII characters are allowed in HTTP header values, the JSON string is URL encoded. The error message is also still appended to the response body. 

 For more information, see the [MDN page about TE request headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/TE). 
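Because the `X-Neptune-Detail` value is URL encoded, it can be decoded with standard-library tools before being parsed as JSON. Here is a minimal sketch in Python, using an abridged version of the encoded value shown in the example below:

```python
import json
from urllib.parse import unquote_plus

# An abridged X-Neptune-Detail trailer value (URL-encoded JSON error message).
encoded = ("%7B%22code%22%3A%22TimeLimitExceededException%22%2C"
           "%22message%22%3A%22Operation+terminated+%28deadline+exceeded%29%22%7D")

# unquote_plus reverses both the %XX escapes and the '+' space encoding.
error = json.loads(unquote_plus(encoded))
print(error["code"], "-", error["message"])
# → TimeLimitExceededException - Operation terminated (deadline exceeded)
```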

**openCypher trailing headers usage example**  
 This example demonstrates how trailing headers help diagnose a query that exceeds its time limit: 

```
curl --raw 'https://your-neptune-endpoint:port/openCypher' \
-H 'TE: trailers' \
-d 'query=MATCH(n) RETURN n.firstName'
 
 
Output:
< HTTP/1.1 200 OK
< transfer-encoding: chunked
< trailer: X-Neptune-Status, X-Neptune-Detail
< content-type: application/json;charset=UTF-8
< 
< 
{
  "results": [{
      "n.firstName": "Hossein"
    }, {
      "n.firstName": "Jan"
    }, {
      "n.firstName": "Miguel"
    }, {
      "n.firstName": "Eric"
    }, 
{"detailedMessage":"Operation terminated (deadline exceeded)",
"code":"TimeLimitExceededException",
"requestId":"a7e9d2aa-fbb7-486e-8447-2ef2a8544080",
"message":"Operation terminated (deadline exceeded)"}
0
X-Neptune-Status: 500 TimeLimitExceededException
X-Neptune-Detail: %7B%22detailedMessage%22%3A%22Operation+terminated+%28deadline+exceeded%29%22%2C%22code%22%3A%22TimeLimitExceededException%22%2C%22requestId%22%3A%22a7e9d2aa-fbb7-486e-8447-2ef2a8544080%22%2C%22message%22%3A%22Operation+terminated+%28deadline+exceeded%29%22%7D
```

**Response breakdown:**  
 The previous example shows how an openCypher response with trailing headers can help diagnose query failures. It has four sequential parts: (1) the initial headers, whose 200 OK status indicates that streaming has begun; (2) the partial JSON results streamed before the failure, which leave the body invalid; (3) the appended error message describing the timeout; and (4) the trailing headers carrying the final status (`500 TimeLimitExceededException`) and the detailed error information. 

# Using the AWS SDK to run openCypher queries
<a name="access-graph-opencypher-sdk"></a>

With the AWS SDK, you can run openCypher queries against your Neptune graph using a programming language of your choice. The Neptune data API SDK (service name `neptunedata`) provides the [ExecuteOpenCypherQuery](https://docs.aws.amazon.com/neptune/latest/data-api/API_ExecuteOpenCypherQuery.html) action for submitting openCypher queries.

You must run these examples from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB cluster, or from a location that has network connectivity to your cluster endpoint.

Direct links to the API reference documentation for the `neptunedata` service in each SDK language can be found below:


| Programming language | neptunedata API reference | 
| --- | --- | 
| C++ | [https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-neptunedata/html/annotated.html](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-neptunedata/html/annotated.html) | 
| Go | [https://docs.aws.amazon.com/sdk-for-go/api/service/neptunedata/](https://docs.aws.amazon.com/sdk-for-go/api/service/neptunedata/) | 
| Java | [https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/neptunedata/package-summary.html](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/neptunedata/package-summary.html) | 
| JavaScript | [https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-aws-sdk-client-neptunedata/](https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-aws-sdk-client-neptunedata/) | 
| Kotlin | [https://sdk.amazonaws.com/kotlin/api/latest/neptunedata/index.html](https://sdk.amazonaws.com/kotlin/api/latest/neptunedata/index.html) | 
| .NET | [https://docs.aws.amazon.com/sdkfornet/v3/apidocs/items/Neptunedata/NNeptunedata.html](https://docs.aws.amazon.com/sdkfornet/v3/apidocs/items/Neptunedata/NNeptunedata.html) | 
| PHP | [https://docs.aws.amazon.com/aws-sdk-php/v3/api/namespace-Aws.Neptunedata.html](https://docs.aws.amazon.com/aws-sdk-php/v3/api/namespace-Aws.Neptunedata.html) | 
| Python | [https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/neptunedata.html](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/neptunedata.html) | 
| Ruby | [https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/Neptunedata.html](https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/Neptunedata.html) | 
| Rust | [https://crates.io/crates/aws-sdk-neptunedata](https://crates.io/crates/aws-sdk-neptunedata) | 
| CLI | [https://docs.aws.amazon.com/cli/latest/reference/neptunedata/](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/) | 

## openCypher AWS SDK examples
<a name="access-graph-opencypher-sdk-examples"></a>

The following examples show how to set up a `neptunedata` client, run an openCypher query, and print the results. Replace *YOUR\_NEPTUNE\_HOST* and *YOUR\_NEPTUNE\_PORT* with the endpoint and port of your Neptune DB cluster.

**Client-side timeout and retry configuration**  
The SDK client timeout controls how long the *client* waits for a response. It does not control how long the query runs on the server. If the client times out before the server finishes, the query may continue running on Neptune while the client has no way to retrieve the results.  
We recommend setting the client-side read timeout to `0` (no timeout) or to a value that is at least a few seconds longer than the server-side [neptune\_query\_timeout](parameters.md#parameters-db-cluster-parameters-neptune_query_timeout) setting on your Neptune DB cluster. This lets Neptune control when queries time out.  
We also recommend setting the maximum retry attempts to `1` (no retries). If the SDK retries a query that is still running on the server, it can result in duplicate operations. This is especially important for mutation queries, where a retry could cause unintended duplicate writes.

------
#### [ Python ]

1. Follow the [installation instructions](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html) to install Boto3.

1. Create a file named `openCypherExample.py` and paste the following code:

   ```
   import boto3
   import json
   from botocore.config import Config
   
   # Disable the client-side read timeout and retries so that
   # Neptune's server-side neptune_query_timeout controls query duration.
   client = boto3.client(
       'neptunedata',
       endpoint_url='https://YOUR_NEPTUNE_HOST:YOUR_NEPTUNE_PORT',
       config=Config(read_timeout=None, retries={'total_max_attempts': 1})
   )
   
   response = client.execute_open_cypher_query(
       openCypherQuery='MATCH (n) RETURN n LIMIT 1'
   )
   
   print(json.dumps(response['results'], indent=2))
   ```

1. Run the example: `python openCypherExample.py`

------
#### [ Java ]

1. Follow the [installation instructions](https://docs.aws.amazon.com//sdk-for-java/latest/developer-guide/setup.html) to set up the AWS SDK for Java.

1. Use the following code to set up a `NeptunedataClient`, run an openCypher query, and print the result:

   ```
   import java.net.URI;
   import java.time.Duration;
   import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
   import software.amazon.awssdk.core.retry.RetryPolicy;
   import software.amazon.awssdk.services.neptunedata.NeptunedataClient;
   import software.amazon.awssdk.services.neptunedata.model.ExecuteOpenCypherQueryRequest;
   import software.amazon.awssdk.services.neptunedata.model.ExecuteOpenCypherQueryResponse;
   
   // Disable the client-side timeout and retries so that
   // Neptune's server-side neptune_query_timeout controls query duration.
   NeptunedataClient client = NeptunedataClient.builder()
       .endpointOverride(URI.create("https://YOUR_NEPTUNE_HOST:YOUR_NEPTUNE_PORT"))
       .overrideConfiguration(ClientOverrideConfiguration.builder()
           .apiCallTimeout(Duration.ZERO)
           .retryPolicy(RetryPolicy.none())
           .build())
       .build();
   
   ExecuteOpenCypherQueryRequest request = ExecuteOpenCypherQueryRequest.builder()
       .openCypherQuery("MATCH (n) RETURN n LIMIT 1")
       .build();
   
   ExecuteOpenCypherQueryResponse response = client.executeOpenCypherQuery(request);
   
   System.out.println(response.results().toString());
   ```

------
#### [ JavaScript ]

1. Follow the [installation instructions](https://docs.aws.amazon.com//sdk-for-javascript/v3/developer-guide/getting-started-nodejs.html) to set up the AWS SDK for JavaScript. Install the neptunedata client package: `npm install @aws-sdk/client-neptunedata`.

1. Create a file named `openCypherExample.js` and paste the following code:

   ```
   import { NeptunedataClient, ExecuteOpenCypherQueryCommand } from "@aws-sdk/client-neptunedata";
   import { NodeHttpHandler } from "@smithy/node-http-handler";
   
   const config = {
       endpoint: "https://YOUR_NEPTUNE_HOST:YOUR_NEPTUNE_PORT",
       // Disable the client-side request timeout so that
       // Neptune's server-side neptune_query_timeout controls query duration.
       requestHandler: new NodeHttpHandler({
           requestTimeout: 0
       }),
       maxAttempts: 1
   };
   
   const client = new NeptunedataClient(config);
   
   const input = {
       openCypherQuery: "MATCH (n) RETURN n LIMIT 1"
   };
   
   const command = new ExecuteOpenCypherQueryCommand(input);
   const response = await client.send(command);
   
   console.log(JSON.stringify(response, null, 2));
   ```

1. Run the example: `node openCypherExample.js`

------

# Using the Bolt protocol to make openCypher queries to Neptune
<a name="access-graph-opencypher-bolt"></a>

[Bolt](https://boltprotocol.org/) is a statement-oriented client/server protocol initially developed by Neo4j and licensed under the Creative Commons 3.0 [Attribution-ShareAlike](https://creativecommons.org/licenses/by-sa/3.0/) license. It is client-driven, meaning that the client always initiates message exchanges.

To connect to Neptune using Neo4j's Bolt drivers, replace the URL and port number with those of your cluster, using the `bolt` URI scheme. If you have a single Neptune instance running, use the read/write endpoint. If multiple instances are running, two drivers are recommended: one for the writer and another for all the read replicas. If you have only the default two endpoints, a read/write driver and a read-only driver are sufficient, but if you have custom endpoints as well, consider creating a driver instance for each one.

**Note**  
Although the Bolt specification states that Bolt can connect using either TCP or WebSockets, Neptune only supports TCP connections for Bolt.

Neptune allows up to 1000 concurrent Bolt connections on all instance sizes except for t3.medium and t4g.medium. On t3.medium and t4g.medium instances only 512 connections are allowed.

For examples of openCypher queries in various languages that use the Bolt drivers, see the Neo4j [Drivers & Language Guides](https://neo4j.com/developer/language-guides/) documentation.

**Important**  
The Neo4j Bolt drivers for Python, .NET, JavaScript, and Go did not initially support the automatic renewal of AWS Signature v4 authentication tokens. This means that after the signature expired (often in 5 minutes), the driver failed to authenticate, and subsequent requests failed. The Python, .NET, JavaScript, and Go examples below were all affected by this issue.  
See [Neo4j Python driver issue \#834](https://github.com/neo4j/neo4j-python-driver/issues/834), [Neo4j .NET driver issue \#664](https://github.com/neo4j/neo4j-dotnet-driver/issues/664), [Neo4j JavaScript driver issue \#993](https://github.com/neo4j/neo4j-javascript-driver/issues/993), and [Neo4j Go driver issue \#429](https://github.com/neo4j/neo4j-go-driver/issues/429) for more information.  
As of driver version 5.8.0, a new preview re-authentication API was released for the Go driver (see [v5.8.0 - Feedback wanted on re-authentication](https://github.com/neo4j/neo4j-go-driver/discussions/482)).

## Using Bolt with Java to connect to Neptune
<a name="access-graph-opencypher-bolt-java"></a>

You can download a driver for whatever version you want to use from the Maven [MVN repository](https://mvnrepository.com/artifact/org.neo4j.driver/neo4j-java-driver), or can add this dependency to your project:

```
<dependency>
  <groupId>org.neo4j.driver</groupId>
  <artifactId>neo4j-java-driver</artifactId>
  <version>4.3.3</version>
</dependency>
```

Then, to connect to Neptune in Java using one of these Bolt drivers, create a driver instance for the primary/writer instance in your cluster using code like the following:

```
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Config;
import org.neo4j.driver.Config.TrustStrategy;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;

final Driver driver =
  GraphDatabase.driver("bolt://(your cluster endpoint URL):(your cluster port)",
    AuthTokens.none(),
    Config.builder().withEncryption()
                    .withTrustStrategy(TrustStrategy.trustSystemCertificates())
                    .build());
```

If you have one or more reader replicas, you can similarly create a driver instance for them using code like this:

```
final Driver read_only_driver =              // (without connection timeout)
  GraphDatabase.driver("bolt://(your cluster endpoint URL):(your cluster port)",
      Config.builder().withEncryption()
                      .withTrustStrategy(TrustStrategy.trustSystemCertificates())
                      .build());
```

Or, with a timeout:

```
final Driver read_only_timeout_driver =      // (with connection timeout)
  GraphDatabase.driver("bolt://(your cluster endpoint URL):(your cluster port)",
    Config.builder().withConnectionTimeout(30, TimeUnit.SECONDS)
                    .withEncryption()
                    .withTrustStrategy(TrustStrategy.trustSystemCertificates())
                    .build());
```

If you have custom endpoints, it may also be worthwhile to create a driver instance for each one.

## A Python openCypher query example using Bolt
<a name="access-graph-opencypher-bolt-python"></a>

Here is how to make an openCypher query in Python using Bolt:

```
python -m pip install neo4j
```

```
from neo4j import GraphDatabase
uri = "bolt://(your cluster endpoint URL):(your cluster port)"
driver = GraphDatabase.driver(uri, auth=("username", "password"), encrypted=True)
```

Note that the `auth` parameters are ignored.
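The snippet above only creates a driver; to actually run a query, open a session on it. The following sketch is illustrative (the `count_nodes` helper is not part of the driver API) and assumes the `driver` created above:

```python
def count_nodes(driver):
    # Run a simple openCypher query over a Bolt session and
    # return the node count from the single result record.
    with driver.session() as session:
        record = session.run("MATCH (n) RETURN count(n) AS cnt").single()
        return record["cnt"]

# Usage, with the driver from the previous snippet:
#   print(count_nodes(driver))
#   driver.close()
```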

## A .NET openCypher query example using Bolt
<a name="access-graph-opencypher-bolt-dotnet"></a>

To make an openCypher query in .NET using Bolt, the first step is to install the Neo4j driver using NuGet. To make synchronous calls, use the `.Simple` version, like this:

```
Install-Package Neo4j.Driver.Simple -Version 4.3.0
```

```
using Neo4j.Driver;

namespace hello
{
  // This example creates a node and reads a node in a Neptune
  // Cluster where IAM Authentication is not enabled.
  public class HelloWorldExample : IDisposable
  {
    private bool _disposed = false;
    private readonly IDriver _driver;
    private static string url = "bolt://(your cluster endpoint URL):(your cluster port)";
    private static string createNodeQuery = "CREATE (a:Greeting) SET a.message = 'HelloWorldExample'";
    private static string readNodeQuery = "MATCH(n:Greeting) RETURN n.message";

    ~HelloWorldExample() => Dispose(false);

    public HelloWorldExample(string uri)
    {
      _driver = GraphDatabase.Driver(uri, AuthTokens.None, o => o.WithEncryptionLevel(EncryptionLevel.Encrypted));
    }

    public void createNode()
    {
      // Open a session
      using (var session = _driver.Session())
      {
         // Run the query in a write transaction
        var greeting = session.WriteTransaction(tx =>
        {
          var result = tx.Run(createNodeQuery);
          // Consume the result
          return result.Consume();
        });

        // The output will look like this:
        //   ResultSummary{Query=`CREATE (a:Greeting) SET a.message = 'HelloWorldExample".....
        Console.WriteLine(greeting);
      }
    }

    public void retrieveNode()
    {
      // Open a session
      using (var session = _driver.Session())
      {
        // Run the query in a read transaction
        var greeting = session.ReadTransaction(tx =>
        {
          var result = tx.Run(readNodeQuery);
          // Consume the result. Read the single node
          // created in a previous step.
          return result.Single()[0].As<string>();
        });
        // The output will look like this:
        //   HelloWorldExample
        Console.WriteLine(greeting);
      }
    }

    public void Dispose()
    {
      Dispose(true);
      GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
      if (_disposed)
        return;
      if (disposing)
      {
        _driver?.Dispose();
      }
      _disposed = true;
    }

    public static void Main()
    {
      using (var apiCaller = new HelloWorldExample(url))
      {
        apiCaller.createNode();
        apiCaller.retrieveNode();
      }
    }
  }
}
```

## A Java openCypher query example using Bolt with IAM authentication
<a name="access-graph-opencypher-bolt-java-iam-auth"></a>

The Java code below shows how to make openCypher queries in Java using Bolt with IAM authentication. The JavaDoc comment describes its usage. Once a driver instance is available, you can use it to make multiple authenticated requests.

```
package software.amazon.neptune.bolt;

import com.amazonaws.DefaultRequest;
import com.amazonaws.Request;
import com.amazonaws.auth.AWS4Signer;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.http.HttpMethodName;
import com.google.gson.Gson;
import lombok.Builder;
import lombok.Getter;
import lombok.NonNull;
import org.neo4j.driver.Value;
import org.neo4j.driver.Values;
import org.neo4j.driver.internal.security.InternalAuthToken;
import org.neo4j.driver.internal.value.StringValue;

import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import static com.amazonaws.auth.internal.SignerConstants.AUTHORIZATION;
import static com.amazonaws.auth.internal.SignerConstants.HOST;
import static com.amazonaws.auth.internal.SignerConstants.X_AMZ_DATE;
import static com.amazonaws.auth.internal.SignerConstants.X_AMZ_SECURITY_TOKEN;

/**
 * Use this class instead of `AuthTokens.basic` when working with an IAM
 * auth-enabled server. It works the same as `AuthTokens.basic` when using
 * static credentials, and avoids making requests with an expired signature
 * when using temporary credentials. Internally, it generates a new signature
 * on every invocation (this may change in a future implementation).
 *
 * Note that authentication happens only the first time for a pooled connection.
 *
 * Typical usage:
 *
 * NeptuneAuthToken authToken = NeptuneAuthToken.builder()
 *     .credentialsProvider(credentialsProvider)
 *     .region("aws region")
 *     .url("cluster endpoint url")
 *     .build();
 *
 * Driver driver = GraphDatabase.driver(
 *     authToken.getUrl(),
 *     authToken,
 *     config
 * );
 */

public class NeptuneAuthToken extends InternalAuthToken {
  private static final String SCHEME = "basic";
  private static final String REALM = "realm";
  private static final String SERVICE_NAME = "neptune-db";
  private static final String HTTP_METHOD_HDR = "HttpMethod";
  private static final String DUMMY_USERNAME = "username";
  @NonNull
  private final String region;
  @NonNull
  @Getter
  private final String url;
  @NonNull
  private final AWSCredentialsProvider credentialsProvider;
  private final Gson gson = new Gson();

  @Builder
  private NeptuneAuthToken(
      @NonNull final String region,
      @NonNull final String url,
      @NonNull final AWSCredentialsProvider credentialsProvider
  ) {
      // The superclass caches the result of toMap(), which we don't want
      super(Collections.emptyMap());
      this.region = region;
      this.url = url;
      this.credentialsProvider = credentialsProvider;
  }

  @Override
  public Map<String, Value> toMap() {
    final Map<String, Value> map = new HashMap<>();
    map.put(SCHEME_KEY, Values.value(SCHEME));
    map.put(PRINCIPAL_KEY, Values.value(DUMMY_USERNAME));
    map.put(CREDENTIALS_KEY, new StringValue(getSignedHeader()));
    map.put(REALM_KEY, Values.value(REALM));

    return map;
  }

  private String getSignedHeader() {
    final Request<Void> request = new DefaultRequest<>(SERVICE_NAME);
    request.setHttpMethod(HttpMethodName.GET);
    request.setEndpoint(URI.create(url));
    // Comment out the following line if you're using an engine version older than 1.2.0.0
    request.setResourcePath("/opencypher");

    final AWS4Signer signer = new AWS4Signer();
    signer.setRegionName(region);
    signer.setServiceName(request.getServiceName());
    signer.sign(request, credentialsProvider.getCredentials());

    return getAuthInfoJson(request);
  }

  private String getAuthInfoJson(final Request<Void> request) {
    final Map<String, Object> obj = new HashMap<>();
    obj.put(AUTHORIZATION, request.getHeaders().get(AUTHORIZATION));
    obj.put(HTTP_METHOD_HDR, request.getHttpMethod());
    obj.put(X_AMZ_DATE, request.getHeaders().get(X_AMZ_DATE));
    obj.put(HOST, request.getHeaders().get(HOST));
    obj.put(X_AMZ_SECURITY_TOKEN, request.getHeaders().get(X_AMZ_SECURITY_TOKEN));

    return gson.toJson(obj);
  }
}
```

## A Python openCypher query example using Bolt with IAM authentication
<a name="access-graph-opencypher-bolt-python-iam-auth"></a>

The Python class below lets you make openCypher queries using Bolt with IAM authentication:

```
import json

from neo4j import Auth
from botocore.awsrequest import AWSRequest
from botocore.credentials import Credentials
from botocore.auth import (
  SigV4Auth,
  _host_from_url,
)

SCHEME = "basic"
REALM = "realm"
SERVICE_NAME = "neptune-db"
DUMMY_USERNAME = "username"
HTTP_METHOD_HDR = "HttpMethod"
HTTP_METHOD = "GET"
AUTHORIZATION = "Authorization"
X_AMZ_DATE = "X-Amz-Date"
X_AMZ_SECURITY_TOKEN = "X-Amz-Security-Token"
HOST = "Host"


class NeptuneAuthToken(Auth):
  def __init__(
    self,
    credentials: Credentials,
    region: str,
    url: str,
    **parameters
  ):
    # Do NOT add "/opencypher" in the line below if you're using an engine version older than 1.2.0.0
    request = AWSRequest(method=HTTP_METHOD, url=url + "/opencypher")
    request.headers.add_header("Host", _host_from_url(request.url))
    sigv4 = SigV4Auth(credentials, SERVICE_NAME, region)
    sigv4.add_auth(request)

    auth_obj = {
      hdr: request.headers[hdr]
      for hdr in [AUTHORIZATION, X_AMZ_DATE, X_AMZ_SECURITY_TOKEN, HOST]
    }
    auth_obj[HTTP_METHOD_HDR] = request.method
    creds: str = json.dumps(auth_obj)
    super().__init__(SCHEME, DUMMY_USERNAME, creds, REALM, **parameters)
```

You use this class to create a driver as follows:

```
from neo4j import GraphDatabase

# creds is a botocore Credentials object (for example, from
# boto3.Session().get_credentials())
authToken = NeptuneAuthToken(creds, REGION, URL)
driver = GraphDatabase.driver(URL, auth=authToken, encrypted=True)
```

## A Node.js example using IAM authentication and Bolt
<a name="access-graph-opencypher-bolt-nodejs-iam-auth"></a>

The Node.js code below uses the AWS SDK for JavaScript version 3 and ES6 syntax to create a driver that authenticates requests:

```
import neo4j from "neo4j-driver";
import { HttpRequest }  from "@smithy/protocol-http";
import { defaultProvider } from "@aws-sdk/credential-provider-node";
import { SignatureV4 } from "@smithy/signature-v4";
import crypto from "@aws-crypto/sha256-js";
const { Sha256 } = crypto;
import assert from "node:assert";

const region = "us-west-2";
const serviceName = "neptune-db";
const host = "(your cluster endpoint URL)";
const port = 8182;
const protocol = "bolt";
const hostPort = host + ":" + port;
const url = protocol + "://" + hostPort;
const createQuery = "CREATE (n:Greeting {message: 'Hello'}) RETURN ID(n)";
const readQuery = "MATCH(n:Greeting) WHERE ID(n) = $id RETURN n.message";

async function signedHeader() {
  const req = new HttpRequest({
    method: "GET",
    protocol: protocol,
    hostname: host,
    port: port,
    // Comment out the following line if you're using an engine version older than 1.2.0.0
    path: "/opencypher",
    headers: {
      host: hostPort
    }
  });

  const signer = new SignatureV4({
    credentials: defaultProvider(),
    region: region,
    service: serviceName,
    sha256: Sha256
  });

  return signer.sign(req, { unsignableHeaders: new Set(["x-amz-content-sha256"]) })
    .then((signedRequest) => {
      const authInfo = {
        "Authorization": signedRequest.headers["authorization"],
        "HttpMethod": signedRequest.method,
        "X-Amz-Date": signedRequest.headers["x-amz-date"],
        "Host": signedRequest.headers["host"],
        "X-Amz-Security-Token": signedRequest.headers["x-amz-security-token"]
      };
      return JSON.stringify(authInfo);
    });
}

async function createDriver() {
  let authToken = { scheme: "basic", realm: "realm", principal: "username", credentials: await signedHeader() };

  return neo4j.driver(url, authToken, {
      encrypted: "ENCRYPTION_ON",
      trust: "TRUST_SYSTEM_CA_SIGNED_CERTIFICATES",
      maxConnectionPoolSize: 1,
      // logging: neo4j.logging.console("debug")
    }
  );
}

async function unmanagedTxn(driver) {
  const session = driver.session();
  const tx = session.beginTransaction();
  try {
    const created = await tx.run(createQuery);
    const matched = await tx.run(readQuery, { id: created.records[0].get(0) });
    const msg = matched.records[0].get("n.message");
    assert.equal(msg, "Hello");
    await tx.commit();
  } catch (err) {
    // The transaction will be rolled back, now handle the error.
    console.log(err);
  } finally {
    await session.close();
  }
}

const driver = await createDriver();
try {
  await unmanagedTxn(driver);
} catch (err) {
  console.log(err);
} finally {
  await driver.close();
}
```

## A .NET openCypher query example using Bolt with IAM authentication
<a name="access-graph-opencypher-bolt-dotnet-iam-auth"></a>

To enable IAM authentication in .NET, you need to sign a request when establishing the connection. The example below shows how to create a `NeptuneAuthToken` helper to generate an authentication token:

```
using Amazon.Runtime;
using Amazon.Util;
using Neo4j.Driver;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;
using System.Web;

namespace Hello
{
  /*
   * Use this class instead of `AuthTokens.None` when working with an IAM-auth-enabled server.
   *
   * Note that authentication happens only the first time for a pooled connection.
   *
   * Typical usage:
   *
   * var authToken = new NeptuneAuthToken(AccessKey, SecretKey, Region).GetAuthToken(Host);
   * _driver = GraphDatabase.Driver(Url, authToken, o => o.WithEncryptionLevel(EncryptionLevel.Encrypted));
   */

  public class NeptuneAuthToken
  {
    private const string ServiceName = "neptune-db";
    private const string Scheme = "basic";
    private const string Realm = "realm";
    private const string DummyUserName = "username";
    private const string Algorithm = "AWS4-HMAC-SHA256";
    private const string AWSRequest = "aws4_request";

    private readonly string _accessKey;
    private readonly string _secretKey;
    private readonly string _region;

    private readonly string _emptyPayloadHash;

    private readonly SHA256 _sha256;


    public NeptuneAuthToken(string awsKey = null, string secretKey = null, string region = null)
    {
      var awsCredentials = awsKey == null || secretKey == null
        ? FallbackCredentialsFactory.GetCredentials().GetCredentials()
        : null;

      _accessKey = awsKey ?? awsCredentials.AccessKey;
      _secretKey = secretKey ?? awsCredentials.SecretKey;
      _region = region ?? FallbackRegionFactory.GetRegionEndpoint().SystemName; //ex: us-east-1

      _sha256 = SHA256.Create();
      _emptyPayloadHash = Hash(Array.Empty<byte>());
    }

    public IAuthToken GetAuthToken(string url)
    {
      return AuthTokens.Custom(DummyUserName, GetCredentials(url), Realm, Scheme);
    }

    /******************** AWS SIGNING FUNCTIONS *********************/
    private string Hash(byte[] bytesToHash)
    {
      return ToHexString(_sha256.ComputeHash(bytesToHash));
    }

    private static byte[] HmacSHA256(byte[] key, string data)
    {
      return new HMACSHA256(key).ComputeHash(Encoding.UTF8.GetBytes(data));
    }

    private byte[] GetSignatureKey(string dateStamp)
    {
      var kSecret = Encoding.UTF8.GetBytes($"AWS4{_secretKey}");
      var kDate = HmacSHA256(kSecret, dateStamp);
      var kRegion = HmacSHA256(kDate, _region);
      var kService = HmacSHA256(kRegion, ServiceName);
      return HmacSHA256(kService, AWSRequest);
    }

    private static string ToHexString(byte[] array)
    {
      return Convert.ToHexString(array).ToLowerInvariant();
    }

    private string GetCredentials(string url)
    {
      var request = new HttpRequestMessage
      {
        Method = HttpMethod.Get,
        RequestUri = new Uri($"https://{url}/opencypher")
      };

      var signedrequest = Sign(request);

      var headers = new Dictionary<string, object>
      {
        [HeaderKeys.AuthorizationHeader] = signedrequest.Headers.GetValues(HeaderKeys.AuthorizationHeader).FirstOrDefault(),
        ["HttpMethod"] = HttpMethod.Get.ToString(),
        [HeaderKeys.XAmzDateHeader] = signedrequest.Headers.GetValues(HeaderKeys.XAmzDateHeader).FirstOrDefault(),
        // Host should be capitalized, not like in Amazon.Util.HeaderKeys.HostHeader
        ["Host"] = signedrequest.Headers.GetValues(HeaderKeys.HostHeader).FirstOrDefault(),
      };

      return JsonSerializer.Serialize(headers);
    }

    private HttpRequestMessage Sign(HttpRequestMessage request)
    {
      var now = DateTimeOffset.UtcNow;
      var amzdate = now.ToString("yyyyMMddTHHmmssZ");
      var datestamp = now.ToString("yyyyMMdd");

      if (request.Headers.Host == null)
      {
        request.Headers.Host = $"{request.RequestUri.Host}:{request.RequestUri.Port}";
      }

      request.Headers.Add(HeaderKeys.XAmzDateHeader, amzdate);

      var canonicalQueryParams = GetCanonicalQueryParams(request);

      var canonicalRequest = new StringBuilder();
      canonicalRequest.Append(request.Method + "\n");
      canonicalRequest.Append(request.RequestUri.AbsolutePath + "\n");
      canonicalRequest.Append(canonicalQueryParams + "\n");

      var signedHeadersList = new List<string>();
      foreach (var header in request.Headers.OrderBy(a => a.Key.ToLowerInvariant()))
      {
        canonicalRequest.Append(header.Key.ToLowerInvariant());
        canonicalRequest.Append(':');
        canonicalRequest.Append(string.Join(",", header.Value.Select(s => s.Trim())));
        canonicalRequest.Append('\n');
        signedHeadersList.Add(header.Key.ToLowerInvariant());
      }
      canonicalRequest.Append('\n');

      var signedHeaders = string.Join(";", signedHeadersList);
      canonicalRequest.Append(signedHeaders + "\n");
      canonicalRequest.Append(_emptyPayloadHash);

      var credentialScope = $"{datestamp}/{_region}/{ServiceName}/{AWSRequest}";
      var stringToSign = $"{Algorithm}\n{amzdate}\n{credentialScope}\n"
        + Hash(Encoding.UTF8.GetBytes(canonicalRequest.ToString()));

      var signing_key = GetSignatureKey(datestamp);
      var signature = ToHexString(HmacSHA256(signing_key, stringToSign));

      request.Headers.TryAddWithoutValidation(HeaderKeys.AuthorizationHeader,
        $"{Algorithm} Credential={_accessKey}/{credentialScope}, SignedHeaders={signedHeaders}, Signature={signature}");

      return request;
    }

    private static string GetCanonicalQueryParams(HttpRequestMessage request)
    {
      var querystring = HttpUtility.ParseQueryString(request.RequestUri.Query);

      // Query params must be escaped in upper case (i.e. "%2C", not "%2c").
      var queryParams = querystring.AllKeys.OrderBy(a => a)
        .Select(key => $"{key}={Uri.EscapeDataString(querystring[key])}");
      return string.Join("&", queryParams);
    }
  }
}
```

Here is how to make an openCypher query in .NET using Bolt with IAM authentication. The example below uses the `NeptuneAuthToken` helper:

```
using Neo4j.Driver;

namespace Hello
{
  public class HelloWorldExample
  {
    private const string Host = "(your hostname):8182";
    private const string Url = $"bolt://{Host}";
    private const string CreateNodeQuery = "CREATE (a:Greeting) SET a.message = 'HelloWorldExample'";
    private const string ReadNodeQuery = "MATCH(n:Greeting) RETURN n.message";

    private const string AccessKey = "(your access key)";
    private const string SecretKey = "(your secret key)";
    private const string Region = "(your AWS region)"; // e.g. "us-west-2"

    private readonly IDriver _driver;

    public HelloWorldExample()
    {
      var authToken = new NeptuneAuthToken(AccessKey, SecretKey, Region).GetAuthToken(Host);

      // Note that when the connection is reinitialized after max connection lifetime
      // has been reached, the signature token could have already been expired (usually 5 min)
      // You can face exceptions like:
      //   `Unexpected server exception 'Signature expired: XXXX is now earlier than YYYY (ZZZZ - 5 min.)`
      _driver = GraphDatabase.Driver(Url, authToken, o =>
                o.WithMaxConnectionLifetime(TimeSpan.FromMinutes(60)).WithEncryptionLevel(EncryptionLevel.Encrypted));
    }

    public async Task CreateNode()
    {
      // Open a session
      using (var session = _driver.AsyncSession())
      {
        // Run the query in a write transaction
        var greeting = await session.WriteTransactionAsync(async tx =>
        {
          var result = await tx.RunAsync(CreateNodeQuery);
          // Consume the result
          return await result.ConsumeAsync();
        });

        // The output will look like this:
        //   ResultSummary{Query=`CREATE (a:Greeting) SET a.message = 'HelloWorldExample".....
        Console.WriteLine(greeting.Query);
      }
    }

    public async Task RetrieveNode()
    {
      // Open a session
      using (var session = _driver.AsyncSession())
      {
        // Run the query in a read transaction
        var greeting = await session.ReadTransactionAsync(async tx =>
        {
          var result = await tx.RunAsync(ReadNodeQuery);
          var records = await result.ToListAsync();

          // Consume the result. Read the single node
          // created in a previous step.
          return records[0].Values.First().Value;
        });
        // The output will look like this:
        //   HelloWorldExample
        Console.WriteLine(greeting);
      }
    }
  }
}
```

This example can be launched by running the code below on `.NET 6` or `.NET 7` with the following packages:
+ **`Neo4j`**`.Driver=4.3.0`
+ **`AWSSDK`**`.Core=3.7.102.1`

```
namespace Hello
{
  class Program
  {
    static async Task Main()
    {
      var apiCaller = new HelloWorldExample();

      await apiCaller.CreateNode();
      await apiCaller.RetrieveNode();
    }
  }
}
```

## A Golang openCypher query example using Bolt with IAM authentication
<a name="access-graph-opencypher-bolt-golang-iam-auth"></a>

The Golang package below shows how to make openCypher queries in the Go language using Bolt with IAM authentication:

```
package main

import (
  "context"
  "encoding/json"
  "fmt"
  "github.com/aws/aws-sdk-go/aws/credentials"
  "github.com/aws/aws-sdk-go/aws/signer/v4"
  "github.com/neo4j/neo4j-go-driver/v5/neo4j"
  "log"
  "net/http"
  "os"
  "time"
)

const (
  ServiceName   = "neptune-db"
  DummyUsername = "username"
)

// Find node by id using Go driver
func findNode(ctx context.Context, region string, hostAndPort string, nodeId string) (string, error) {
  req, err := http.NewRequest(http.MethodGet, "https://"+hostAndPort+"/opencypher", nil)

  if err != nil {
    return "", fmt.Errorf("error creating request, %v", err)
  }

  // credentials must have been exported as environment variables
  signer := v4.NewSigner(credentials.NewEnvCredentials())
  _, err = signer.Sign(req, nil, ServiceName, region, time.Now())

  if err != nil {
    return "", fmt.Errorf("error signing request: %v", err)
  }

  hdrs := []string{"Authorization", "X-Amz-Date", "X-Amz-Security-Token"}
  hdrMap := make(map[string]string)
  for _, h := range hdrs {
    hdrMap[h] = req.Header.Get(h)
  }

  hdrMap["Host"] = req.Host
  hdrMap["HttpMethod"] = req.Method

  password, err := json.Marshal(hdrMap)
  if err != nil {
    return "", fmt.Errorf("error creating JSON, %v", err)
  }
  authToken := neo4j.BasicAuth(DummyUsername, string(password), "")
  // +s enables encryption with a full certificate check
  // Use +ssc to disable client side TLS verification
  driver, err := neo4j.NewDriverWithContext("bolt+s://"+hostAndPort+"/opencypher", authToken)
  if err != nil {
    return "", fmt.Errorf("error creating driver, %v", err)
  }

  defer driver.Close(ctx)

  if err := driver.VerifyConnectivity(ctx); err != nil {
    log.Fatalf("failed to verify connection, %v", err)
  }

  config := neo4j.SessionConfig{}

  session := driver.NewSession(ctx, config)
  defer session.Close(ctx)

  result, err := session.Run(
    ctx,
    "MATCH (n) WHERE ID(n) = $id RETURN n",
    map[string]any{"id": nodeId},
  )
  if err != nil {
    return "", fmt.Errorf("error running query, %v", err)
  }

  if !result.Next(ctx) {
    return "", fmt.Errorf("node not found")
  }

  n, found := result.Record().Get("n")
  if !found {
    return "", fmt.Errorf("node not found")
  }

  return fmt.Sprintf("+%v\n", n), nil
}

func main() {
  if len(os.Args) < 3 {
    log.Fatal("Usage: go run main.go (region) (host and port)")
  }
  region := os.Args[1]
  hostAndPort := os.Args[2]
  ctx := context.Background()

  res, err := findNode(ctx, region, hostAndPort, "72c2e8c1-7d5f-5f30-10ca-9d2bb8c4afbc")
  if err != nil {
    log.Fatal(err)
  }
  fmt.Println(res)
}
```

## Bolt connection behavior in Neptune
<a name="access-graph-opencypher-bolt-connections"></a>

Here are some things to keep in mind about Neptune Bolt connections:
+ Because Bolt connections are created at the TCP layer, you can't use an [Application Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html) in front of them, as you can with an HTTP endpoint.
+ The port that Neptune uses for Bolt connections is your DB cluster's port.
+ Based on the Bolt preamble passed to it, the Neptune server selects the highest appropriate Bolt version (1, 2, 3, or 4.0).
+ The maximum number of connections to the Neptune server that a client can have open at any point in time is 1,000.
+ If the client doesn't close a connection after a query, that connection can be used to execute the next query. If a connection is idle for 20 minutes, however, the server closes it automatically.
+ If IAM authentication is not enabled, you can use `AuthTokens.none()` rather than supplying a dummy user name and password. For example, in Java:

  ```
  GraphDatabase.driver("bolt://(your cluster endpoint URL):(your cluster port)", AuthTokens.none(),
      Config.builder().withEncryption().withTrustStrategy(TrustStrategy.trustSystemCertificates()).build());
  ```
+ When IAM authentication is enabled, a Bolt connection is always closed by the server a few minutes after it has been open for 10 days, if it hasn't already been closed for some other reason.
+ If the client sends a query for execution over a connection without having consumed the results of a previous query, the new query is discarded. To discard the previous results instead, the client must send a reset message over the connection.
+ Only one transaction at a time can be created on a given connection.
+ If an exception occurs during a transaction, the Neptune server rolls back the transaction and closes the connection. In this case, the driver creates a new connection for the next query.
+ Be aware that sessions are not thread-safe. Multiple parallel operations must use multiple separate sessions.
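Because a SigV4 signature expires after about 5 minutes (as noted in the .NET example's comments above), a cached auth token can go stale between connection attempts when connections are pooled or re-established. Below is a minimal Python sketch of one way to handle this; the `TokenCache` class and the 60-second safety margin are illustrative, not a Neptune API:

```python
import time

SIGNATURE_TTL_SECONDS = 5 * 60  # SigV4 signatures typically expire after about 5 minutes
SAFETY_MARGIN_SECONDS = 60      # illustrative margin: re-sign well before expiry


class TokenCache:
    """Caches a signed auth token and rebuilds it before the signature expires.

    build_token is whatever callable produces a fresh token for your driver
    (for example, one of the NeptuneAuthToken helpers shown above).
    """

    def __init__(self, build_token):
        self._build_token = build_token
        self._token = None
        self._signed_at = 0.0

    def get(self, now=None):
        now = time.time() if now is None else now
        age = now - self._signed_at
        if self._token is None or age > SIGNATURE_TTL_SECONDS - SAFETY_MARGIN_SECONDS:
            # Token is missing or close to expiry: sign a fresh one
            self._token = self._build_token()
            self._signed_at = now
        return self._token
```

Each new Bolt connection (for example, after the 10-day disconnect or an idle timeout) can then call `get()` to obtain a token that is still within its signature window.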

# Examples of openCypher parameterized queries
<a name="opencypher-parameterized-queries"></a>

Neptune supports parameterized openCypher queries. This lets you use the same query structure multiple times with different arguments. Since the query structure doesn't change, Neptune can cache its abstract syntax tree (AST) rather than having to parse it multiple times.

## Example of an openCypher parameterized query using the HTTPS endpoint
<a name="opencypher-http-parameterized-queries"></a>

Below is an example of using a parameterized query with the Neptune openCypher HTTPS endpoint. The query is:

```
MATCH (n {name: $name, age: $age})
RETURN n
```

The parameters are defined as follows:

```
parameters={"name": "john", "age": 20}
```

You can submit the parameterized query like this:

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "MATCH (n {name: \$name, age: \$age}) RETURN n" \
  --parameters '{"name": "john", "age": 20}'
```

For more information, see [execute-open-cypher-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_query(
    openCypherQuery='MATCH (n {name: $name, age: $age}) RETURN n',
    parameters='{"name": "john", "age": 20}'
)

print(response['results'])
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=MATCH (n {name: \$name, age: \$age}) RETURN n" \
  -d 'parameters={"name": "john", "age": 20}'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

Using `POST`:

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=MATCH (n {name: \$name, age: \$age}) RETURN n" \
  -d "parameters={\"name\": \"john\", \"age\": 20}"
```

Using `GET` (URL-encoded):

```
curl -X GET \
  "https://your-neptune-endpoint:port/openCypher?query=MATCH%20%28n%20%7Bname:\$name,age:\$age%7D%29%20RETURN%20n&parameters=%7B%22name%22:%22john%22,%22age%22:20%7D"
```

Using `DIRECT POST`:

```
curl -H "Content-Type: application/opencypher" \
  "https://your-neptune-endpoint:port/openCypher?parameters=%7B%22name%22:%22john%22,%22age%22:20%7D" \
  -d "MATCH (n {name: \$name, age: \$age}) RETURN n"
```
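Rather than hand-building the percent-encoded `GET` URL shown above, you can generate it with Python's standard library. A sketch (the endpoint is a placeholder):

```python
import json
from urllib.parse import quote, urlencode

query = "MATCH (n {name: $name, age: $age}) RETURN n"
parameters = {"name": "john", "age": 20}

# quote_via=quote escapes spaces as %20 (not '+') and percent-encodes
# the $, {, and " characters that openCypher queries and JSON contain
query_string = urlencode(
    {"query": query, "parameters": json.dumps(parameters)},
    quote_via=quote,
)
url = "https://your-neptune-endpoint:port/openCypher?" + query_string
```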

------

## Examples of openCypher parameterized queries using Bolt
<a name="opencypher-bolt-parameterized-queries"></a>

Here is a Python example of an openCypher parameterized query using the Bolt protocol:

```
from neo4j import GraphDatabase
uri = "bolt://[neptune-endpoint-url]:8182"
driver = GraphDatabase.driver(uri, auth=("", ""))

def match_name_and_age(tx, name, age):
  # Parameterized Query
  tx.run("MATCH (n {name: $name, age: $age}) RETURN n", name=name, age=age)

with driver.session() as session:
  # Parameters
  session.read_transaction(match_name_and_age, "john", 20)

driver.close()
```

Here is a Java example of an openCypher parameterized query using the Bolt protocol:

```
Driver driver = GraphDatabase.driver("bolt+s://(your cluster endpoint URL):8182");
HashMap<String, Object> parameters = new HashMap<>();
parameters.put("name", "john");
parameters.put("age", 20);
String queryString = "MATCH (n {name: $name, age: $age}) RETURN n";
Result result = driver.session().run(queryString, parameters);
```

# openCypher data model
<a name="access-graph-opencypher-data-model"></a>

The Neptune openCypher engine builds on the same property-graph model as Gremlin. In particular:
+ Every node has one or more labels. If you insert a node without labels, a default label named `vertex` is attached. If you try to delete all of a node's labels, an error is thrown.
+ A relationship is an entity that has exactly one relationship type and that forms a unidirectional connection between two nodes (that is, *from* one of the nodes *to* the other).
+ Both nodes and relationships can have properties, but don't have to. Neptune supports nodes and relationships with zero properties.
+ Neptune does not support metaproperties, which are not included in the openCypher specification either.
+ Properties in your graph can be multi-valued if they were created using Gremlin. That is, a node or relationship property can hold a set of different values rather than only one. Neptune has extended openCypher semantics to handle multi-valued properties gracefully.

Supported data types are documented in [openCypher data format](bulk-load-tutorial-format-opencypher.md). However, we do not recommend inserting `Array` property values into an openCypher graph at present. Although it is possible to insert an array property value using the bulk loader, the current Neptune openCypher release treats it as a set of multi-valued properties instead of as a single list value.
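Since a property may come back as a single scalar or, if it was written with Gremlin set-cardinality or loaded from an array column, as several values, client code often benefits from normalizing property values to lists. A minimal sketch (the result shape here is illustrative, not the exact driver serialization):

```python
def as_list(value):
    """Normalize a property value to a list.

    Multi-valued properties can be returned as a list of values, while
    single-valued properties come back as a scalar; strings count as
    scalars here, not sequences.
    """
    if value is None:
        return []
    if isinstance(value, (list, tuple, set)):
        return list(value)
    return [value]


# Illustrative node properties as they might appear in a query result
props = {"name": "john", "phone": ["555-0100", "555-0199"]}
normalized = {key: as_list(value) for key, value in props.items()}
```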

Below is the list of data types supported in this release:
+ `Bool`
+ `Byte`
+ `Short`
+ `Int` 
+ `Long`
+ `Float` (Includes plus and minus Infinity and NaN, but not INF)
+ `Double` (Includes plus and minus Infinity and NaN, but not INF)
+ `DateTime` 
+ `String`

# The openCypher `explain` feature
<a name="access-graph-opencypher-explain"></a>

The openCypher `explain` feature is a self-service tool in Amazon Neptune that helps you understand the execution approach taken by the Neptune engine. To invoke it, you add an `explain=mode` parameter to an openCypher [HTTPS](access-graph-opencypher-queries.md) request, where the `mode` value can be one of the following:

+ **`static`**   –   In `static` mode, `explain` prints only the static structure of the query plan. It doesn't actually run the query.
+ **`dynamic`**   –   In `dynamic` mode, `explain` also runs the query, and includes dynamic aspects of the query plan. These may include the number of intermediate bindings flowing through the operators, the ratio of incoming bindings to outgoing bindings, and the total time taken by each operator.
+ **`details`**   –   In `details` mode, `explain` prints the information shown in dynamic mode plus additional details, such as the actual openCypher query string and the estimated range count for the pattern underlying a join operator.

For example, using `POST` with `dynamic` mode:

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-explain-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "MATCH (n) RETURN n LIMIT 1" \
  --explain-mode dynamic
```

For more information, see [execute-open-cypher-explain-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-explain-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_explain_query(
    openCypherQuery='MATCH (n) RETURN n LIMIT 1',
    explainMode='dynamic'
)

print(response['results'].read().decode('utf-8'))
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=MATCH (n) RETURN n LIMIT 1" \
  -d "explain=dynamic"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=MATCH (n) RETURN n LIMIT 1" \
  -d "explain=dynamic"
```

------

## Limitations for openCypher `explain` in Neptune
<a name="access-graph-opencypher-explain-limitations"></a>

The current release of openCypher explain has the following limitations:
+ Explain plans are currently only available for queries that perform read-only operations. Queries that perform any sort of mutation, such as `CREATE`, `DELETE`, `MERGE`, `SET` and so on, are not supported.
+ Operators and output for a specific plan may change in future releases.

## DFE operators in openCypher `explain` output
<a name="access-graph-opencypher-dfe-operators"></a>

To use the information that the openCypher `explain` feature provides, you need to understand some details about how the [DFE query engine](neptune-dfe-engine.md) works (DFE being the engine that Neptune uses to process openCypher queries).

The DFE engine translates every query into a pipeline of operators. Starting from the first operator, intermediate solutions flow from one operator to the next through this operator pipeline. Each row in the explain table represents a result, up to the point of evaluation.

The operators that can appear in a DFE query plan are as follows:

**DFEApply**   –   Executes the function specified in the arguments section, on the value stored in the specified variable

**DFEBindRelation**   –   Binds together variables with the specified names

**DFEChunkLocalSubQuery**   –   This is a non-blocking operation that acts as a wrapper around subqueries being performed.

**DFEDistinctColumn**   –   Returns the distinct subset of the input values based on the variable specified.

**DFEDistinctRelation**   –   Returns the distinct subset of the input solutions based on the variable specified.

**DFEDrain**   –   Appears at the end of a subquery to act as a termination step for that subquery. The number of solutions is recorded as `Units In`. `Units Out` is always zero.

**DFEForwardValue**   –   Copies all input chunks directly as output chunks to be passed to its downstream operator.

**DFEGroupByHashIndex**   –   Performs a group-by operation over the input solutions based on a previously computed hash index (using the `DFEHashIndexBuild` operation). As an output, the given input is extended by a column containing a group key for every input solution.

**DFEHashIndexBuild**   –   Builds a hash index over a set of variables as a side-effect. This hash index is typically reused in later operations. See `DFEHashIndexJoin` or `DFEGroupByHashIndex` for where this hash index might be used.

**DFEHashIndexJoin**   –   Performs a join over the incoming solutions against a previously built hash index. See `DFEHashIndexBuild` for where this hash index might be built.

**DFEJoinExists**   –   Takes a left and right hand input relation, and retains values from the left relation that have a corresponding value in the right relation as defined by the given join variables. 

**DFELoopSubQuery**   –   This is a non-blocking operation that acts as a wrapper for a subquery, allowing it to be run repeatedly for use in loops.

**DFEMergeChunks**   –   This is a blocking operation that combines chunks from its upstream operator into a single chunk of solutions to pass to its downstream operator (inverse of `DFESplitChunks`).

**DFEMinus**   –   Takes a left and right hand input relation, and retains values from the left relation that do not have a corresponding value in the right relation as defined by the given join variables. If there is no overlap in variables across both relations, then this operator simply returns the left hand input relation.

**DFENotExists**   –   Takes a left and right hand input relation, and retains values from the left relation that do not have a corresponding value in the right relation as defined by the given join variables. If there is no overlap in variables across both relations, then this operator returns an empty relation.

**DFEOptionalJoin**   –   Performs a left outer join (also called an OPTIONAL join): solutions from the left-hand side that have at least one join partner on the right-hand side are joined, and solutions from the left-hand side without a join partner on the right-hand side are forwarded as is. This is a blocking operation.

**DFEPipelineJoin**   –   Joins the input against the tuple pattern defined by the `pattern` argument.

**DFEPipelineRangeCount**   –   Counts the number of solutions matching a given pattern, and returns a single one-ary solution containing the count value.

**DFEPipelineScan**   –   Scans the database for the given `pattern` argument, with or without a given filter on column(s).

**DFEProject**   –   Takes multiple input columns and projects only the desired columns.

**DFEReduce**   –   Performs the specified aggregation function on specified variables.

**DFERelationalJoin**   –   Joins the input of the previous operator based on the specified pattern keys using a merge join. This is a blocking operation.

**DFERouteChunks**   –   Takes input chunks from its singular incoming edge and routes those chunks along its multiple outgoing edges.

**DFESelectRows**   –   This operator selectively takes rows from its left input relation to forward to its downstream operator. The rows are selected based on the row identifiers supplied in the operator's right input relation.

**DFESerialize**   –   Serializes a query’s final results into a JSON string serialization, mapping each input solution to the appropriate variable name. For node and edge results, these results are serialized into a map of entity properties and metadata.

**DFESort**   –   Takes an input relation and produces a sorted relation based on the provided sort key.

**DFESplitByGroup**   –   Splits each single input chunk from one incoming edge into smaller output chunks corresponding to row groups identified by row IDs from the corresponding input chunk from the other incoming edge.

**DFESplitChunks**   –   Splits each single input chunk into smaller output chunks (inverse of `DFEMergeChunks`).

**DFEStreamingGroupByHashIndex**   –   Streaming version of `DFEGroupByHashIndex`.

**DFEStreamingHashIndexBuild**   –   Streaming version of `DFEHashIndexBuild`.

**DFESubquery**   –   This operator appears at the beginning of all plans and encapsulates the portions of the plan that are run on the [DFE engine](neptune-dfe-engine.md), which is the entire plan for openCypher.

**DFESymmetricHashJoin**   –   Joins the input of the previous operator based on the specified pattern keys using a hash join. This is a non-blocking operation.

**DFESync**   –   This operator is a synchronization operator supporting non-blocking plans. It takes solutions from two incoming edges and forwards these solutions to the appropriate downstream edges. For synchronization purposes, the inputs along one of these edges may be buffered internally. 

**DFETee**   –   This is a branching operator that sends the same set of solutions to multiple operators.

**DFETermResolution**   –   Performs a localize or globalize operation on its inputs, resulting in columns of either localized or globalized identifiers respectively.

**DFEUnfold**   –   Unfolds lists of values from an input column into the output column as individual elements.

**DFEUnion**   –   Takes two or more input relations and produces a union of those relations using the desired output schema.

**SolutionInjection**   –   Appears before everything else in the explain output, with a value of 1 in the Units Out column. However, it serves as a no-op, and doesn't actually inject any solutions into the DFE engine.

**TermResolution**   –   Appears at the end of plans and translates objects from the Neptune engine into openCypher objects.

## Columns in openCypher `explain` output
<a name="access-graph-opencypher-explain-columns"></a>

The query plan information that Neptune generates as openCypher explain output contains tables with one operator per row. The table has the following columns:

**ID**   –   The numeric ID of this operator in the plan.

**Out #1** (and **Out #2**)   –   The ID(s) of the operator(s) that are downstream from this operator. There can be at most two downstream operators.

**Name**   –   The name of this operator.

**Arguments**   –   Any relevant details for the operator. This includes things like input schema, output schema, pattern (for `PipelineScan` and `PipelineJoin`), and so on.

**Mode**   –   A label describing fundamental operator behavior. This column is mostly blank (`-`). One exception is `TermResolution`, where mode can be `id2value_opencypher`, indicating a resolution from ID to openCypher value.

**Units In**   –   The number of solutions passed as input to this operator. Operators without upstream operators, such as `DFEPipelineScan`, `SolutionInjection`, and a `DFESubquery` with no static value injected, have a value of zero here.

**Units Out**   –   The number of solutions produced as output of this operator. `DFEDrain` is a special case, where the number of solutions being drained is recorded in `Units In` and `Units Out` is always zero.

**Ratio**   –   The ratio of `Units Out` to `Units In`.

**Time (ms)**   –   The CPU time consumed by this operator, in milliseconds.
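
Because the default ASCII output uses a fixed set of columns, these rows can be extracted programmatically. The following is a minimal Python sketch (not an official Neptune API; it assumes the box-drawing table format shown in the examples in this section, and collects rows from all tables in the output together):

```python
def parse_explain_rows(explain_text):
    """Extract operator rows from Neptune's ASCII explain tables.

    Each data row starts with the box-drawing character '║' and its
    cells are separated by '│'. Header rows (ID cell == 'ID') and
    continuation lines (blank ID cell) are skipped, so only the first
    line of each operator is kept.
    """
    columns = ["ID", "Out #1", "Out #2", "Name", "Arguments",
               "Mode", "Units In", "Units Out", "Ratio", "Time (ms)"]
    rows = []
    for line in explain_text.splitlines():
        if not line.startswith("║"):
            continue
        cells = [c.strip() for c in line.strip("║").split("│")]
        if len(cells) != len(columns) or cells[0] in ("", "ID"):
            continue
        rows.append(dict(zip(columns, cells)))
    return rows
```

For example, `max(parse_explain_rows(output), key=lambda r: float(r["Time (ms)"]))` picks out the most expensive operator in a plan.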

## A basic example of openCypher explain output
<a name="access-graph-opencypher-explain-basic-example"></a>

The following is a basic example of openCypher `explain` output. The query is a single-node lookup in the air routes dataset for the node with airport code `ATL`. It invokes `explain` using the `details` mode and the default ASCII output format.

To invoke `explain` for this query:

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-explain-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "MATCH (n {code: 'ATL'}) RETURN n" \
  --explain-mode details
```

For more information, see [execute-open-cypher-explain-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-explain-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_explain_query(
    openCypherQuery="MATCH (n {code: 'ATL'}) RETURN n",
    explainMode='details'
)

print(response['results'].read().decode('utf-8'))
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=MATCH (n {code: 'ATL'}) RETURN n" \
  -d "explain=details"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=MATCH (n {code: 'ATL'}) RETURN n" \
  -d "explain=details"
```

------

The `explain` output:

```
Query:
MATCH (n {code: 'ATL'}) RETURN n

╔════╤════════╤════════╤═══════════════════╤════════════════════╤═════════════════════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name              │ Arguments          │ Mode                │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════╪════════════════════╪═════════════════════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]     │ -                   │ 0        │ 1         │ 0.00  │ 0         ║
╟────┼────────┼────────┼───────────────────┼────────────────────┼─────────────────────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFESubquery       │ subQuery=subQuery1 │ -                   │ 0        │ 1         │ 0.00  │ 4.00      ║
╟────┼────────┼────────┼───────────────────┼────────────────────┼─────────────────────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ -      │ -      │ TermResolution    │ vars=[?n]          │ id2value_opencypher │ 1        │ 1         │ 1.00  │ 2.00      ║
╚════╧════════╧════════╧═══════════════════╧════════════════════╧═════════════════════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery1
╔════╤════════╤════════╤═══════════════════════╤══════════════════════════════════════════════════════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                  │ Arguments                                                                                                    │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFEPipelineScan       │ pattern=Node(?n) with property 'code' as ?n_code2 and label 'ALL'                                            │ -    │ 0        │ 1         │ 0.00  │ 0.21      ║
║    │        │        │                       │ inlineFilters=[(?n_code2 IN ["ATL"^^xsd:string])]                                                            │      │          │           │       │           ║
║    │        │        │                       │ patternEstimate=1                                                                                            │      │          │           │       │           ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFEChunkLocalSubQuery │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#9d84f97c-c3b0-459a-98d5-955a8726b159/graph_1 │ -    │ 1        │ 1         │ 1.00  │ 0.04      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 3      │ -      │ DFEProject            │ columns=[?n]                                                                                                 │ -    │ 1        │ 1         │ 1.00  │ 0.04      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ -      │ -      │ DFEDrain              │ -                                                                                                            │ -    │ 1        │ 0         │ 0.00  │ 0.03      ║
╚════╧════════╧════════╧═══════════════════════╧══════════════════════════════════════════════════════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#9d84f97c-c3b0-459a-98d5-955a8726b159/graph_1
╔════╤════════╤════════╤══════════════════════╤════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                 │ Arguments                                                  │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪══════════════════════╪════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFESolutionInjection │ outSchema=[?n, ?n_code2]                                   │ -    │ 0        │ 1         │ 0.00  │ 0.02      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ 3      │ DFETee               │ -                                                          │ -    │ 1        │ 2         │ 2.00  │ 0.02      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 4      │ -      │ DFEDistinctColumn    │ column=?n                                                  │ -    │ 1        │ 1         │ 1.00  │ 0.20      ║
║    │        │        │                      │ ordered=false                                              │      │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ 5      │ -      │ DFEHashIndexBuild    │ vars=[?n]                                                  │ -    │ 1        │ 1         │ 1.00  │ 0.04      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 4  │ 5      │ -      │ DFEPipelineJoin      │ pattern=Node(?n) with property 'ALL' and label '?n_label1' │ -    │ 1        │ 1         │ 1.00  │ 0.25      ║
║    │        │        │                      │ patternEstimate=3506                                       │      │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 5  │ 6      │ 7      │ DFESync              │ -                                                          │ -    │ 2        │ 2         │ 1.00  │ 0.02      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 6  │ 8      │ -      │ DFEForwardValue      │ -                                                          │ -    │ 1        │ 1         │ 1.00  │ 0.01      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 7  │ 8      │ -      │ DFEForwardValue      │ -                                                          │ -    │ 1        │ 1         │ 1.00  │ 0.01      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 8  │ 9      │ -      │ DFEHashIndexJoin     │ -                                                          │ -    │ 2        │ 1         │ 0.50  │ 0.35      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 9  │ -      │ -      │ DFEDrain             │ -                                                          │ -    │ 1        │ 0         │ 0.00  │ 0.02      ║
╚════╧════════╧════════╧══════════════════════╧════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝
```

At the top level, `SolutionInjection` appears before everything else, with 1 unit out. Note that it doesn't actually inject any solutions; you can see that the next operator, `DFESubquery`, has 0 units in.

After `SolutionInjection` at the top level come the `DFESubquery` and `TermResolution` operators. `DFESubquery` encapsulates the parts of the query execution plan that are pushed down to the [DFE engine](neptune-dfe-engine.md) (for openCypher queries, that is the entire query plan). All the operators in the query plan are nested inside `subQuery1`, which is referenced by `DFESubquery`. The only exception is `TermResolution`, which materializes internal IDs into fully serialized openCypher objects.

All the operators that are pushed down to the DFE engine have names that start with the `DFE` prefix. Because the whole openCypher query plan is executed by the DFE, every operator except the final `TermResolution` operator starts with `DFE`.

Inside `subQuery1`, there can be zero or more `DFEChunkLocalSubQuery` or `DFELoopSubQuery` operators, each of which encapsulates a part of the pushed-down execution plan that runs in a memory-bounded fashion. The `DFEChunkLocalSubQuery` here contains one `DFESolutionInjection` that serves as the input to the subquery. To find the table for that subquery in the output, search for the graph URI given in the `subQuery=` argument in the `Arguments` column of the `DFEChunkLocalSubQuery` or `DFELoopSubQuery` operator.
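
That lookup can be automated by splitting the ASCII output into its component tables first. The following is a minimal sketch (an illustrative helper, assuming the default ASCII format, in which each nested table is introduced by a line that begins with `subQuery`):

```python
def split_explain_tables(explain_text):
    """Split explain output into named tables.

    The top-level table is stored under the key 'top'; each nested
    table is stored under its introducing line (e.g. 'subQuery1' or
    'subQuery=<graph URI>'). Lines inside tables start with box-drawing
    characters, never with 'subQuery', so they don't trigger a split.
    """
    tables = {"top": []}
    current = "top"
    for line in explain_text.splitlines():
        if line.startswith("subQuery"):
            current = line.strip()
            tables[current] = []
        else:
            tables[current].append(line)
    return {name: "\n".join(body).strip() for name, body in tables.items()}
```

Given a `DFEChunkLocalSubQuery` operator's `subQuery=` value, you can then look up the matching table directly in the returned dict.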

In `subQuery1`, `DFEPipelineScan` with `ID` 0 scans the database for the specified `pattern`. The pattern scans for any entity that has the property `code`, saved as the variable `?n_code2`, over all labels (you could filter on a specific label by changing `n` to `n:airport` in the query). The `inlineFilters` argument shows the filtering for the `code` property equalling `ATL`.

Next, the `DFEChunkLocalSubQuery` operator joins the intermediate results of a subquery that contains `DFEPipelineJoin`. This ensures that `?n` is actually a node, since the previous `DFEPipelineScan` scans for any entity with the `code` property.

## Example of `explain` output for a relationship lookup with a limit
<a name="access-graph-opencypher-explain-example-2"></a>

This query looks for relationships of type `route` between two anonymous nodes, and returns at most 10 of them. Again, the `explain` mode is `details` and the output format is the default ASCII format.

Here, `DFEPipelineScan` scans for edges that start from anonymous node `?anon_node7` and end at another anonymous node `?anon_node21`, with a relationship type saved as `?p_type1`. There is a filter for `?p_type1` being `el://route` (where `el` stands for edge label), which corresponds to `[p:route]` in the query string.

`DFEDrain` collects the output solutions with a limit of 10, as shown in its `Arguments` column. `DFEDrain` terminates once the limit is reached or all solutions have been produced, whichever happens first.
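
Conceptually, that drain-with-limit behavior is equivalent to truncating the upstream solution stream, as in this illustrative Python sketch (not Neptune's implementation):

```python
from itertools import islice

def drain(solutions, limit=None):
    """Consume upstream solutions, stopping early once `limit`
    results have been collected (or when the stream is exhausted,
    whichever happens first)."""
    return list(solutions if limit is None else islice(solutions, limit))
```

Because `islice` stops pulling from the iterator after `limit` items, upstream work past the limit is never requested, which mirrors why the operator can terminate early.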

To invoke `explain` for this query:

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-explain-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "MATCH ()-[p:route]->() RETURN p LIMIT 10" \
  --explain-mode details
```

For more information, see [execute-open-cypher-explain-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-explain-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_explain_query(
    openCypherQuery='MATCH ()-[p:route]->() RETURN p LIMIT 10',
    explainMode='details'
)

print(response['results'].read().decode('utf-8'))
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=MATCH ()-[p:route]->() RETURN p LIMIT 10" \
  -d "explain=details"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=MATCH ()-[p:route]->() RETURN p LIMIT 10" \
  -d "explain=details"
```

------

The `explain` output:

```
Query:
MATCH ()-[p:route]->() RETURN p LIMIT 10

╔════╤════════╤════════╤═══════════════════╤════════════════════╤═════════════════════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name              │ Arguments          │ Mode                │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════╪════════════════════╪═════════════════════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]     │ -                   │ 0        │ 1         │ 0.00  │ 0         ║
╟────┼────────┼────────┼───────────────────┼────────────────────┼─────────────────────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFESubquery       │ subQuery=subQuery1 │ -                   │ 0        │ 10        │ 0.00  │ 5.00      ║
╟────┼────────┼────────┼───────────────────┼────────────────────┼─────────────────────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ -      │ -      │ TermResolution    │ vars=[?p]          │ id2value_opencypher │ 10       │ 10        │ 1.00  │ 1.00      ║
╚════╧════════╧════════╧═══════════════════╧════════════════════╧═════════════════════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery1
╔════╤════════╤════════╤═════════════════╤═══════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name            │ Arguments                                                 │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═════════════════╪═══════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFEPipelineScan │ pattern=Edge((?anon_node7)-[?p:?p_type1]->(?anon_node21)) │ -    │ 0        │ 1000      │ 0.00  │ 0.66      ║
║    │        │        │                 │ inlineFilters=[(?p_type1 IN [<el://route>])]              │      │          │           │       │           ║
║    │        │        │                 │ patternEstimate=26219                                     │      │          │           │       │           ║
╟────┼────────┼────────┼─────────────────┼───────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFEProject      │ columns=[?p]                                              │ -    │ 1000     │ 1000      │ 1.00  │ 0.14      ║
╟────┼────────┼────────┼─────────────────┼───────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ -      │ -      │ DFEDrain        │ limit=10                                                  │ -    │ 1000     │ 0         │ 0.00  │ 0.11      ║
╚════╧════════╧════════╧═════════════════╧═══════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝
```

## Example of `explain` output for a value expression function
<a name="access-graph-opencypher-explain-example-3"></a>

The query, which uses the `labels()` value expression function, is:

```
MATCH (a) RETURN DISTINCT labels(a)
```

In the `explain` output below, `DFEPipelineScan` (ID 0) scans all nodes along with their labels. This corresponds to `MATCH (a)`.

`DFEChunkLocalSubQuery` (ID 1) aggregates the labels of each node `?a`. This corresponds to `labels(a)`, which you can see through the `DFEApply` and `DFEReduce` operators.

`DFEBindRelation` (ID 2) renames the generic column `?__gen_labelsOfa2` to `?labels(a)`.

`DFEDistinctRelation` (ID 4) retains only the distinct labels (multiple `airport` nodes would each produce the same duplicate value `["airport"]` for `labels(a)`). This corresponds to `DISTINCT labels(a)`.

To invoke `explain` for this query:

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-explain-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "MATCH (a) RETURN DISTINCT labels(a)" \
  --explain-mode details
```

For more information, see [execute-open-cypher-explain-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-explain-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_explain_query(
    openCypherQuery='MATCH (a) RETURN DISTINCT labels(a)',
    explainMode='details'
)

print(response['results'].read().decode('utf-8'))
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=MATCH (a) RETURN DISTINCT labels(a)" \
  -d "explain=details"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=MATCH (a) RETURN DISTINCT labels(a)" \
  -d "explain=details"
```

------

The `explain` output:

```
Query:
MATCH (a) RETURN DISTINCT labels(a)

╔════╤════════╤════════╤═══════════════════╤════════════════════╤═════════════════════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name              │ Arguments          │ Mode                │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════╪════════════════════╪═════════════════════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]     │ -                   │ 0        │ 1         │ 0.00  │ 0         ║
╟────┼────────┼────────┼───────────────────┼────────────────────┼─────────────────────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFESubquery       │ subQuery=subQuery1 │ -                   │ 0        │ 5         │ 0.00  │ 81.00     ║
╟────┼────────┼────────┼───────────────────┼────────────────────┼─────────────────────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ -      │ -      │ TermResolution    │ vars=[?labels(a)]  │ id2value_opencypher │ 5        │ 5         │ 1.00  │ 1.00      ║
╚════╧════════╧════════╧═══════════════════╧════════════════════╧═════════════════════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery1
╔════╤════════╤════════╤═══════════════════════╤══════════════════════════════════════════════════════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                  │ Arguments                                                                                                    │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFEPipelineScan       │ pattern=Node(?a) with property 'ALL' and label '?a_label1'                                                   │ -    │ 0        │ 3750      │ 0.00  │ 26.77     ║
║    │        │        │                       │ patternEstimate=3506                                                                                         │      │          │           │       │           ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFEChunkLocalSubQuery │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#8b314f55-2cc7-456a-a48a-c76a0465cfab/graph_1 │ -    │ 3750     │ 3750      │ 1.00  │ 0.04      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 3      │ -      │ DFEBindRelation       │ inputVars=[?a, ?__gen_labelsOfa2, ?__gen_labelsOfa2]                                                         │ -    │ 3750     │ 3750      │ 1.00  │ 0.08      ║
║    │        │        │                       │ outputVars=[?a, ?__gen_labelsOfa2, ?labels(a)]                                                               │      │          │           │       │           ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ 4      │ -      │ DFEProject            │ columns=[?labels(a)]                                                                                         │ -    │ 3750     │ 3750      │ 1.00  │ 0.05      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 4  │ 5      │ -      │ DFEDistinctRelation   │ -                                                                                                            │ -    │ 3750     │ 5         │ 0.00  │ 2.78      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 5  │ -      │ -      │ DFEDrain              │ -                                                                                                            │ -    │ 5        │ 0         │ 0.00  │ 0.03      ║
╚════╧════════╧════════╧═══════════════════════╧══════════════════════════════════════════════════════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#8b314f55-2cc7-456a-a48a-c76a0465cfab/graph_1
╔════╤════════╤════════╤══════════════════════╤════════════════════════════════════════════════════════════╤══════════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                 │ Arguments                                                  │ Mode     │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪══════════════════════╪════════════════════════════════════════════════════════════╪══════════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFESolutionInjection │ outSchema=[?a]                                             │ -        │ 0        │ 3750      │ 0.00  │ 0.02      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ 3      │ DFETee               │ -                                                          │ -        │ 3750     │ 7500      │ 2.00  │ 0.02      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 4      │ -      │ DFEProject           │ columns=[?a]                                               │ -        │ 3750     │ 3750      │ 1.00  │ 0.04      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ 17     │ -      │ DFEOptionalJoin      │ -                                                          │ -        │ 7500     │ 3750      │ 0.50  │ 0.44      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 4  │ 5      │ -      │ DFEDistinctRelation  │ -                                                          │ -        │ 3750     │ 3750      │ 1.00  │ 2.23      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 5  │ 6      │ -      │ DFEDistinctColumn    │ column=?a                                                  │ -        │ 3750     │ 3750      │ 1.00  │ 1.50      ║
║    │        │        │                      │ ordered=false                                              │          │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 6  │ 7      │ -      │ DFEPipelineJoin      │ pattern=Node(?a) with property 'ALL' and label '?a_label3' │ -        │ 3750     │ 3750      │ 1.00  │ 10.58     ║
║    │        │        │                      │ patternEstimate=3506                                       │          │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 7  │ 8      │ 9      │ DFETee               │ -                                                          │ -        │ 3750     │ 7500      │ 2.00  │ 0.02      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 8  │ 10     │ -      │ DFEBindRelation      │ inputVars=[?a_label3]                                      │ -        │ 3750     │ 3750      │ 1.00  │ 0.04      ║
║    │        │        │                      │ outputVars=[?100]                                          │          │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 9  │ 11     │ -      │ DFEBindRelation      │ inputVars=[?a, ?a_label3, ?100]                            │ -        │ 7500     │ 3750      │ 0.50  │ 0.07      ║
║    │        │        │                      │ outputVars=[?a, ?a_label3, ?100]                           │          │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 10 │ 9      │ -      │ DFETermResolution    │ column=?100                                                │ id2value │ 3750     │ 3750      │ 1.00  │ 7.60      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 11 │ 12     │ -      │ DFEBindRelation      │ inputVars=[?a, ?a_label3, ?100]                            │ -        │ 3750     │ 3750      │ 1.00  │ 0.06      ║
║    │        │        │                      │ outputVars=[?a, ?100, ?a_label3]                           │          │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 12 │ 13     │ -      │ DFEApply             │ functor=nodeLabel(?a_label3)                               │ -        │ 3750     │ 3750      │ 1.00  │ 0.55      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 13 │ 14     │ -      │ DFEProject           │ columns=[?a, ?a_label3_alias4]                             │ -        │ 3750     │ 3750      │ 1.00  │ 0.05      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 14 │ 15     │ -      │ DFEMergeChunks       │ -                                                          │ -        │ 3750     │ 3750      │ 1.00  │ 0.02      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 15 │ 16     │ -      │ DFEReduce            │ functor=collect(?a_label3_alias4)                          │ -        │ 3750     │ 3750      │ 1.00  │ 6.37      ║
║    │        │        │                      │ segmentationKey=[?a]                                       │          │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 16 │ 3      │ -      │ DFEMergeChunks       │ -                                                          │ -        │ 3750     │ 3750      │ 1.00  │ 0.03      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 17 │ -      │ -      │ DFEDrain             │ -                                                          │ -        │ 3750     │ 0         │ 0.00  │ 0.02      ║
╚════╧════════╧════════╧══════════════════════╧════════════════════════════════════════════════════════════╧══════════╧══════════╧═══════════╧═══════╧═══════════╝
```

# Example of `explain` output for a mathematical value expression function
<a name="access-graph-opencypher-explain-example-4"></a>

In this example, `RETURN abs(-10)` performs a simple evaluation, taking the absolute value of a constant, `-10`.

`DFEChunkLocalSubQuery` (ID 1) performs a solution injection for the static value `-10`, which is stored in the variable `?100`.

`DFEApply` (ID 2) is the operator that executes the absolute value function `abs()` on the static value stored in the `?100` variable.

To invoke `explain` for this query:

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-explain-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "RETURN abs(-10)" \
  --explain-mode details
```

For more information, see [execute-open-cypher-explain-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-explain-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_explain_query(
    openCypherQuery='RETURN abs(-10)',
    explainMode='details'
)

print(response['results'].read().decode('utf-8'))
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=RETURN abs(-10)" \
  -d "explain=details"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=RETURN abs(-10)" \
  -d "explain=details"
```

------

The `explain` output:

```
Query:
RETURN abs(-10)

╔════╤════════╤════════╤═══════════════════╤═══════════════════════╤═════════════════════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name              │ Arguments             │ Mode                │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════╪═══════════════════════╪═════════════════════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]        │ -                   │ 0        │ 1         │ 0.00  │ 0         ║
╟────┼────────┼────────┼───────────────────┼───────────────────────┼─────────────────────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFESubquery       │ subQuery=subQuery1    │ -                   │ 0        │ 1         │ 0.00  │ 4.00      ║
╟────┼────────┼────────┼───────────────────┼───────────────────────┼─────────────────────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ -      │ -      │ TermResolution    │ vars=[?_internalVar1] │ id2value_opencypher │ 1        │ 1         │ 1.00  │ 1.00      ║
╚════╧════════╧════════╧═══════════════════╧═══════════════════════╧═════════════════════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery1
╔════╤════════╤════════╤═══════════════════════╤══════════════════════════════════════════════════════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                  │ Arguments                                                                                                    │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFESolutionInjection  │ outSchema=[]                                                                                                 │ -    │ 0        │ 1         │ 0.00  │ 0.01      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFEChunkLocalSubQuery │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#c4cc6148-cce3-4561-93c0-deb91f257356/graph_1 │ -    │ 1        │ 1         │ 1.00  │ 0.03      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 3      │ -      │ DFEApply              │ functor=abs(?100)                                                                                            │ -    │ 1        │ 1         │ 1.00  │ 0.26      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ 4      │ -      │ DFEBindRelation       │ inputVars=[?_internalVar2, ?_internalVar2]                                                                   │ -    │ 1        │ 1         │ 1.00  │ 0.04      ║
║    │        │        │                       │ outputVars=[?_internalVar2, ?_internalVar1]                                                                  │      │          │           │       │           ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 4  │ 5      │ -      │ DFEProject            │ columns=[?_internalVar1]                                                                                     │ -    │ 1        │ 1         │ 1.00  │ 0.06      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 5  │ -      │ -      │ DFEDrain              │ -                                                                                                            │ -    │ 1        │ 0         │ 0.00  │ 0.05      ║
╚════╧════════╧════════╧═══════════════════════╧══════════════════════════════════════════════════════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝

subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#c4cc6148-cce3-4561-93c0-deb91f257356/graph_1
╔════╤════════╤════════╤══════════════════════╤═════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                 │ Arguments                           │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪══════════════════════╪═════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFESolutionInjection │ solutions=[?100 -> [-10^^<LONG>]]   │ -    │ 0        │ 1         │ 0.00  │ 0.01      ║
║    │        │        │                      │ outSchema=[?100]                    │      │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 3      │ -      │ DFERelationalJoin    │ joinVars=[]                         │ -    │ 2        │ 1         │ 0.50  │ 0.18      ║
╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 1      │ -      │ DFESolutionInjection │ outSchema=[]                        │ -    │ 0        │ 1         │ 0.00  │ 0.01      ║
╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ -      │ -      │ DFEDrain             │ -                                   │ -    │ 1        │ 0         │ 0.00  │ 0.02      ║
╚════╧════════╧════════╧══════════════════════╧═════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝
```
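When comparing plans like the ones above, it can help to pull the operator rows out of the `explain` text programmatically. The following stdlib-only Python sketch is not an official Neptune tool, and its parsing heuristics are assumptions about the table layout shown here; it extracts one record per operator row and totals the `Time (ms)` column:

```python
# Minimal sketch: parse operator rows from Neptune openCypher explain
# output and total the "Time (ms)" column. Illustrative only -- the
# column positions are assumed from the table layout shown above.

def parse_explain_rows(text):
    """Yield one dict per operator row (header and continuation rows are skipped)."""
    for line in text.splitlines():
        if not line.startswith("\u2551"):            # '║' frames data rows
            continue
        cells = [c.strip() for c in line.strip("\u2551").split("\u2502")]
        if len(cells) < 10 or not cells[0].isdigit():
            continue                                 # header or continuation row
        yield {
            "id": int(cells[0]),
            "name": cells[3],
            "units_in": int(cells[-4]),
            "units_out": int(cells[-3]),
            "time_ms": float(cells[-1]),
        }

# A few rows copied from the top-level table of the abs(-10) example:
sample = """
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]        │ -                   │ 0        │ 1         │ 0.00  │ 0         ║
║ 1  │ 2      │ -      │ DFESubquery       │ subQuery=subQuery1    │ -                   │ 0        │ 1         │ 0.00  │ 4.00      ║
║ 2  │ -      │ -      │ TermResolution    │ vars=[?_internalVar1] │ id2value_opencypher │ 1        │ 1         │ 1.00  │ 1.00      ║
"""

rows = list(parse_explain_rows(sample))
print([r["name"] for r in rows])                 # operator names in plan order
print(sum(r["time_ms"] for r in rows))           # 5.0
```

The continuation rows (those with a blank `ID` cell, which carry extra `Arguments` lines) are skipped, so each operator is counted once.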

# Example of `explain` output for a variable-length path (VLP) query
<a name="access-graph-opencypher-explain-example-5"></a>

This is an example of a more complex query plan for handling a variable-length path query. This example only shows part of the `explain` output, for clarity.

In `subQuery1`, `DFEPipelineScan` (ID 0) and `DFEChunkLocalSubQuery` (ID 1), which injects the `...graph_1` subquery, are responsible for scanning for a node with the `YPO` code.

In `subQuery1`, `DFEChunkLocalSubQuery` (ID 2), which injects the `...graph_2` subquery, is responsible for scanning for a node with the `LAX` code.

In `subQuery1`, `DFEChunkLocalSubQuery` (ID 3) injects the `...graph_3` subquery, which contains `DFELoopSubQuery` (ID 17), which in turn injects the `...graph_5` subquery. This operation is responsible for resolving the `-[*2]->` variable-length pattern in the query string between two nodes.
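Conceptually, the `-[*2]->` pattern asks the engine to enumerate all directed paths of exactly two hops between the matched endpoints. The following stdlib-only Python sketch illustrates that idea in the abstract (the graph data here is hypothetical, and this is not how Neptune's loop subquery is implemented):

```python
# Conceptual illustration of what a -[*2]-> pattern resolves:
# all directed paths of exactly two hops from start to end.
# The adjacency data below is hypothetical sample data.
graph = {
    "YPO": ["YYZ", "YVR"],
    "YYZ": ["LAX", "JFK"],
    "YVR": ["LAX"],
}

def paths_of_length(graph, start, end, hops):
    """All directed paths from start to end with exactly `hops` edges."""
    if hops == 0:
        return [[start]] if start == end else []
    out = []
    for nxt in graph.get(start, []):
        for tail in paths_of_length(graph, nxt, end, hops - 1):
            out.append([start] + tail)
    return out

print(paths_of_length(graph, "YPO", "LAX", 2))
# [['YPO', 'YYZ', 'LAX'], ['YPO', 'YVR', 'LAX']]
```

In the real query plan this enumeration is driven by `DFELoopSubQuery`, which re-runs its injected subquery once per hop.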

To invoke `explain` for this query:

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-explain-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "MATCH p=(a {code: 'YPO'})-[*2]->(b{code: 'LAX'}) return p" \
  --explain-mode details
```

For more information, see [execute-open-cypher-explain-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-explain-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_explain_query(
    openCypherQuery="MATCH p=(a {code: 'YPO'})-[*2]->(b{code: 'LAX'}) return p",
    explainMode='details'
)

print(response['results'].read().decode('utf-8'))
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=MATCH p=(a {code: 'YPO'})-[*2]->(b{code: 'LAX'}) return p" \
  -d "explain=details"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=MATCH p=(a {code: 'YPO'})-[*2]->(b{code: 'LAX'}) return p" \
  -d "explain=details"
```

------

The `explain` output:

```
Query:
MATCH p=(a {code: 'YPO'})-[*2]->(b{code: 'LAX'}) return p

╔════╤════════╤════════╤═══════════════════╤════════════════════╤═════════════════════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name              │ Arguments          │ Mode                │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════╪════════════════════╪═════════════════════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]     │ -                   │ 0        │ 1         │ 0.00  │ 0         ║
╟────┼────────┼────────┼───────────────────┼────────────────────┼─────────────────────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFESubquery       │ subQuery=subQuery1 │ -                   │ 0        │ 0         │ 0.00  │ 84.00     ║
╟────┼────────┼────────┼───────────────────┼────────────────────┼─────────────────────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ -      │ -      │ TermResolution    │ vars=[?p]          │ id2value_opencypher │ 0        │ 0         │ 0.00  │ 0         ║
╚════╧════════╧════════╧═══════════════════╧════════════════════╧═════════════════════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery1
╔════╤════════╤════════╤═══════════════════════╤══════════════════════════════════════════════════════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                  │ Arguments                                                                                                    │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFEPipelineScan       │ pattern=Node(?a) with property 'code' as ?a_code7 and label 'ALL'                                            │ -    │ 0        │ 1         │ 0.00  │ 0.68      ║
║    │        │        │                       │ inlineFilters=[(?a_code7 IN ["YPO"^^xsd:string])]                                                            │      │          │           │       │           ║
║    │        │        │                       │ patternEstimate=1                                                                                            │      │          │           │       │           ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFEChunkLocalSubQuery │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#cc05129f-d07e-4622-bbe3-9e99558eca46/graph_1 │ -    │ 1        │ 1         │ 1.00  │ 0.03      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 3      │ -      │ DFEChunkLocalSubQuery │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#cc05129f-d07e-4622-bbe3-9e99558eca46/graph_2 │ -    │ 1        │ 1         │ 1.00  │ 0.02      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ 4      │ -      │ DFEChunkLocalSubQuery │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#cc05129f-d07e-4622-bbe3-9e99558eca46/graph_3 │ -    │ 1        │ 0         │ 0.00  │ 0.04      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 4  │ 5      │ -      │ DFEBindRelation       │ inputVars=[?__gen_path6, ?anon_rel26, ?b_code8, ?b, ?a_code7, ?a, ?__gen_path6]                              │ -    │ 0        │ 0         │ 0.00  │ 0.10      ║
║    │        │        │                       │ outputVars=[?__gen_path6, ?anon_rel26, ?b_code8, ?b, ?a_code7, ?a, ?p]                                       │      │          │           │       │           ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 5  │ 6      │ -      │ DFEProject            │ columns=[?p]                                                                                                 │ -    │ 0        │ 0         │ 0.00  │ 0.05      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 6  │ -      │ -      │ DFEDrain              │ -                                                                                                            │ -    │ 0        │ 0         │ 0.00  │ 0.02      ║
╚════╧════════╧════════╧═══════════════════════╧══════════════════════════════════════════════════════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#cc05129f-d07e-4622-bbe3-9e99558eca46/graph_1
╔════╤════════╤════════╤══════════════════════╤════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                 │ Arguments                                                  │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪══════════════════════╪════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFESolutionInjection │ outSchema=[?a, ?a_code7]                                   │ -    │ 0        │ 1         │ 0.00  │ 0.01      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ 3      │ DFETee               │ -                                                          │ -    │ 1        │ 2         │ 2.00  │ 0.01      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 4      │ -      │ DFEDistinctColumn    │ column=?a                                                  │ -    │ 1        │ 1         │ 1.00  │ 0.25      ║
║    │        │        │                      │ ordered=false                                              │      │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ 5      │ -      │ DFEHashIndexBuild    │ vars=[?a]                                                  │ -    │ 1        │ 1         │ 1.00  │ 0.05      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 4  │ 5      │ -      │ DFEPipelineJoin      │ pattern=Node(?a) with property 'ALL' and label '?a_label1' │ -    │ 1        │ 1         │ 1.00  │ 0.47      ║
║    │        │        │                      │ patternEstimate=3506                                       │      │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 5  │ 6      │ 7      │ DFESync              │ -                                                          │ -    │ 2        │ 2         │ 1.00  │ 0.04      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 6  │ 8      │ -      │ DFEForwardValue      │ -                                                          │ -    │ 1        │ 1         │ 1.00  │ 0.01      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 7  │ 8      │ -      │ DFEForwardValue      │ -                                                          │ -    │ 1        │ 1         │ 1.00  │ 0.01      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 8  │ 9      │ -      │ DFEHashIndexJoin     │ -                                                          │ -    │ 2        │ 1         │ 0.50  │ 0.26      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 9  │ -      │ -      │ DFEDrain             │ -                                                          │ -    │ 1        │ 0         │ 0.00  │ 0.02      ║
╚════╧════════╧════════╧══════════════════════╧════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#cc05129f-d07e-4622-bbe3-9e99558eca46/graph_2
╔════╤════════╤════════╤══════════════════════╤═══════════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                 │ Arguments                                                         │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪══════════════════════╪═══════════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFEPipelineScan      │ pattern=Node(?b) with property 'code' as ?b_code8 and label 'ALL' │ -    │ 0        │ 1         │ 0.00  │ 0.38      ║
║    │        │        │                      │ inlineFilters=[(?b_code8 IN ["LAX"^^xsd:string])]                 │      │          │           │       │           ║
║    │        │        │                      │ patternEstimate=1                                                 │      │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼───────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFEMergeChunks       │ -                                                                 │ -    │ 1        │ 1         │ 1.00  │ 0.02      ║
╟────┼────────┼────────┼──────────────────────┼───────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 4      │ -      │ DFERelationalJoin    │ joinVars=[]                                                       │ -    │ 2        │ 1         │ 0.50  │ 0.19      ║
╟────┼────────┼────────┼──────────────────────┼───────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ 2      │ -      │ DFESolutionInjection │ outSchema=[?a, ?a_code7]                                          │ -    │ 0        │ 1         │ 0.00  │ 0         ║
╟────┼────────┼────────┼──────────────────────┼───────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 4  │ -      │ -      │ DFEDrain             │ -                                                                 │ -    │ 1        │ 0         │ 0.00  │ 0.01      ║
╚════╧════════╧════════╧══════════════════════╧═══════════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#cc05129f-d07e-4622-bbe3-9e99558eca46/graph_3
╔════╤════════╤════════╤═══════════════════════╤══════════════════════════════════════════════════════════════════════════════════════════════════════════════╤══════════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                  │ Arguments                                                                                                    │ Mode     │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════╪══════════╪══════════╪═══════════╪═══════╪═══════════╣
...
║ 17 │ 18     │ -      │ DFELoopSubQuery       │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#cc05129f-d07e-4622-bbe3-9e99558eca46/graph_5 │ -        │ 1        │ 2         │ 2.00  │ 0.31      ║
...
```

# Transactions in Neptune openCypher
<a name="access-graph-opencypher-transactions"></a>

The openCypher implementation in Amazon Neptune uses the [transaction semantics defined by Neptune](transactions-neptune.md). However, the isolation levels provided through the Bolt driver have some specific implications for Bolt transaction semantics, as described in the following sections.

## Read-only Bolt transaction queries
<a name="access-graph-opencypher-transactions-ro"></a>

There are various ways that read-only queries can be processed, with different transaction models and isolation levels, as follows:

### Implicit read-only transaction queries
<a name="access-graph-opencypher-transactions-ro-implicit"></a>

Here is an example of a read-only implicit transaction:

```
public void executeReadImplicitTransaction()
{
  // end point
  final String END_POINT = "(End Point URL)";

  // read query
  final String READ_QUERY = "MATCH (n) RETURN n limit 10";

  // create the driver
  final Driver driver = GraphDatabase.driver(END_POINT, AuthTokens.none(),
          Config.builder().withEncryption()
                          .withTrustStrategy(TrustStrategy.trustSystemCertificates())
                          .build());

  // create the session config
  SessionConfig sessionConfig = SessionConfig.builder()
                                             .withFetchSize(1000)
                                             .withDefaultAccessMode(AccessMode.READ)
                                             .build();

  // create a session and run the query in read access mode
  final Session session = driver.session(sessionConfig);
  session.readTransaction(new TransactionWork<String>()
    {
      final StringBuilder resultCollector = new StringBuilder();

      @Override
      public String execute(final Transaction tx)
      {
        // execute the query
        Result queryResult = tx.run(READ_QUERY);

        // Read the result
        for (Record record : queryResult.list())
        {
          for (String key : record.keys())
          {
            resultCollector.append(key)
                           .append(":")
                           .append(record.get(key).asNode().toString());
          }
        }
        return resultCollector.toString();
      }

    }
  );

  // close the session
  session.close();

  // close the driver
  driver.close();
}
```

Because read-replicas only accept read-only queries, all queries against read-replicas execute as read-implicit transactions regardless of the access mode set in the session configuration. Neptune evaluates read-implicit transactions as [read-only queries](transactions-neptune.md#transactions-neptune-read-only) under `SNAPSHOT` isolation semantics.

In case of failure, read-implicit transactions are retried by default.
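This retry behavior is what a managed transaction function gives you: on a transient failure, the driver re-invokes the transaction work rather than surfacing the error immediately. The following stdlib-only Python sketch illustrates the pattern in the abstract; it is not the Bolt driver's actual retry implementation, and the names and parameters here are purely illustrative:

```python
import time

# Illustrative sketch of managed-transaction retry semantics: the
# transaction work is re-invoked on transient failure, the way a
# read-implicit transaction is retried by default. This is NOT the
# Bolt driver's real retry logic; names here are hypothetical.

class TransientError(Exception):
    """Stand-in for a driver-reported retryable failure."""

def run_with_retry(work, max_attempts=3, backoff_s=0.0):
    """Call work() until it succeeds or attempts are exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except TransientError:
            if attempt == max_attempts:
                raise                    # give up: surface the failure
            time.sleep(backoff_s)        # real drivers back off with jitter

# Usage: a "transaction" that fails twice, then succeeds.
calls = {"n": 0}

def flaky_read():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("connection reset")
    return ["row-1", "row-2"]

print(run_with_retry(flaky_read))        # succeeds on the third attempt
```

By contrast, autocommit queries (described next) run the statement once and leave any retry to the application.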

### Autocommit read-only transaction queries
<a name="access-graph-opencypher-transactions-ro-autocommit"></a>

Here is an example of a read-only autocommit transaction:

```
public void executeAutoCommitTransaction()
{
  // end point
  final String END_POINT = "(End Point URL)";

  // read query
  final String READ_QUERY = "MATCH (n) RETURN n limit 10";

  // Create the session config.
  final SessionConfig sessionConfig = SessionConfig
    .builder()
    .withFetchSize(1000)
    .withDefaultAccessMode(AccessMode.READ)
    .build();

  // create the driver
  final Driver driver = GraphDatabase.driver(END_POINT, AuthTokens.none(),
    Config.builder()
          .withEncryption()
          .withTrustStrategy(TrustStrategy.trustSystemCertificates())
          .build());

  // result collector
  final StringBuilder resultCollector = new StringBuilder();

  // create a session
  final Session session = driver.session(sessionConfig);

  // run the query
  final Result queryResult = session.run(READ_QUERY);
  for (final Record record : queryResult.list())
  {
    for (String key : record.keys())
    {
      resultCollector.append(key)
                     .append(":")
                     .append(record.get(key).asNode().toString());
    }
  }

  // close the session
  session.close();

  // close the driver
  driver.close();
}
```

If the access mode is set to `READ` in the session configuration, Neptune evaluates autocommit transaction queries as [read-only queries](transactions-neptune.md#transactions-neptune-read-only) under `SNAPSHOT` isolation semantics. Note that read-replicas only accept read-only queries.

If you don't pass in a session configuration, autocommit queries are processed by default with mutation query isolation, so it is important to pass in a session configuration that explicitly sets the access mode to `READ`.

In case of failure, read-only autocommit queries are not re-tried.

### Explicit read-only transaction queries
<a name="access-graph-opencypher-transactions-ro-explicit"></a>

Here is an example of an explicit read-only transaction:

```
public void executeReadExplicitTransaction()
{
  // end point
  final String END_POINT = "(End Point URL)";

  // read query
  final String READ_QUERY = "MATCH (n) RETURN n limit 10";

  // Create the session config.
  final SessionConfig sessionConfig = SessionConfig
    .builder()
    .withFetchSize(1000)
    .withDefaultAccessMode(AccessMode.READ)
    .build();

  // create the driver
  final Driver driver = GraphDatabase.driver(END_POINT, AuthTokens.none(),
    Config.builder()
          .withEncryption()
          .withTrustStrategy(TrustStrategy.trustSystemCertificates())
          .build());

  // result collector
  final StringBuilder resultCollector = new StringBuilder();

  // create a session
  final Session session = driver.session(sessionConfig);

  // begin transaction
  final Transaction tx = session.beginTransaction();

  // run the query on transaction
  final List<Record> list = tx.run(READ_QUERY).list();

  // read the result
  for (final Record record : list)
  {
    for (String key : record.keys())
    {
      resultCollector
        .append(key)
        .append(":")
        .append(record.get(key).asNode().toString());
    }
  }

  // commit the transaction; to roll back instead, use tx.rollback();
  tx.commit();

  // close the session
  session.close();

  // close the driver
  driver.close();
}
```

If the access mode is set to `READ` in the session configuration, Neptune evaluates explicit read-only transactions as [read-only queries](transactions-neptune.md#transactions-neptune-read-only) under `SNAPSHOT` isolation semantics. Note that read-replicas only accept read-only queries.

If you don't pass in a session configuration, explicit read-only transactions are processed by default with mutation query isolation, so it is important to pass in a session configuration that explicitly sets the access mode to `READ`.

In case of failure, read-only explicit queries are retried by default.

## Mutation Bolt transaction queries
<a name="access-graph-opencypher-transactions-wr"></a>

As with read-only queries, there are various ways that mutation queries can be processed, with different transaction models and isolation levels, as follows:

### Implicit mutation transaction queries
<a name="access-graph-opencypher-transactions-wr-implicit"></a>

Here is an example of an implicit mutation transaction:

```
public void executeWriteImplicitTransaction()
{
  // end point
  final String END_POINT = "(End Point URL)";

  // create node with label as label and properties.
  final String WRITE_QUERY = "CREATE (n:label {name : 'foo'})";

  // Read the vertex created with label as label.
  final String READ_QUERY = "MATCH (n:label) RETURN n";

  // create the driver
  final Driver driver = GraphDatabase.driver(END_POINT, AuthTokens.none(),
    Config.builder()
          .withEncryption()
          .withTrustStrategy(TrustStrategy.trustSystemCertificates())
          .build());

  // create the session config
  SessionConfig sessionConfig = SessionConfig
    .builder()
    .withFetchSize(1000)
    .withDefaultAccessMode(AccessMode.WRITE)
    .build();

  final StringBuilder resultCollector = new StringBuilder();

  // run the query as access mode write
  driver.session(sessionConfig).writeTransaction(new TransactionWork<String>()
  {
    @Override
    public String execute(final Transaction tx)
    {
      // execute the write query and consume the result.
      tx.run(WRITE_QUERY).consume();

      // read the vertex written in the same transaction
      final List<Record> list = tx.run(READ_QUERY).list();

      // read the result
      for (final Record record : list)
      {
        for (String key : record.keys())
        {
          resultCollector
            .append(key)
            .append(":")
            .append(record.get(key).asNode().toString());
        }
      }
      return resultCollector.toString();
    }
  }); // at the end, the transaction is automatically committed.

  // close the driver.
  driver.close();
}
```

Reads made as part of mutation queries are executed under `READ COMMITTED` isolation with the usual guarantees for [Neptune mutation transactions](transactions-neptune.md#transactions-neptune-mutation).

Whether or not you specifically pass in a session configuration, the transaction is always treated as a write transaction.

For conflicts, see [Conflict Resolution Using Lock-Wait Timeouts](transactions-neptune.md#transactions-neptune-conflicts).

### Autocommit mutation transaction queries
<a name="access-graph-opencypher-transactions-wr-autocommit"></a>

Mutation autocommit queries inherit the same behavior as mutation implicit transactions.

If you do not pass in a session configuration, the transaction is treated as a write transaction by default.

In case of failure, mutation autocommit queries are not automatically retried.

### Explicit mutation transaction queries
<a name="access-graph-opencypher-transactions-wr-explicit"></a>

Here is an example of an explicit mutation transaction:

```
public void executeWriteExplicitTransaction()
{
  // end point
  final String END_POINT = "(End Point URL)";

  // create node with label as label and properties.
  final String WRITE_QUERY = "CREATE (n:label {name : 'foo'})";

  // Read the vertex created with label as label.
  final String READ_QUERY = "MATCH (n:label) RETURN n";

  // create the driver
  final Driver driver = GraphDatabase.driver(END_POINT, AuthTokens.none(),
    Config.builder()
          .withEncryption()
          .withTrustStrategy(TrustStrategy.trustSystemCertificates())
          .build());

  // create the session config
  SessionConfig sessionConfig = SessionConfig
    .builder()
    .withFetchSize(1000)
    .withDefaultAccessMode(AccessMode.WRITE)
    .build();

  final StringBuilder resultCollector = new StringBuilder();

  final Session session = driver.session(sessionConfig);

  // run the query as access mode write
  final Transaction tx = session.beginTransaction();

  // execute the write query and consume the result.
  tx.run(WRITE_QUERY).consume();

  // read the result from the previous write query in a same transaction.
  final List<Record> list = tx.run(READ_QUERY).list();

  // read the result
  for (final Record record : list)
  {
    for (String key : record.keys())
    {
      resultCollector
        .append(key)
        .append(":")
        .append(record.get(key).asNode().toString());
    }
  }

  // commit the transaction; to roll back instead, use tx.rollback();
  tx.commit();

  // close the session
  session.close();

  // close the driver.
  driver.close();
}
```

Explicit mutation queries inherit the same behavior as implicit mutation transactions.

If you do not pass in a session configuration, the transaction is treated as a write transaction by default.

For conflicts, see [Conflict Resolution Using Lock-Wait Timeouts](transactions-neptune.md#transactions-neptune-conflicts).

# openCypher query hints
<a name="opencypher-query-hints"></a>

**Important**  
 openCypher query hints are only available in engine release [1.3.2.0](https://docs.aws.amazon.com//neptune/latest/userguide/engine-releases-1.3.2.0.html) and later. 

 In Amazon Neptune, you can use the `USING` clause to specify query hints for openCypher queries. These hints allow you to control optimization and evaluation strategies. 

 The syntax for query hints is: 

```
USING {scope}:{hint} {value}
```

1.  `{scope}` defines the scope to which the hint applies: `Query` or `Clause`. 

    A scope value of `Query` means that the query hint applies to the whole query (query-level). 

    A scope value of `Clause` means that the query hint applies to the clause the hint precedes (clause-level). 

1.  `{hint}` is the name of the query hint being applied. 

1.  `{value}` is the argument for the `{hint}`. 

 The values are case-insensitive. 

 For example, to enable the query plan cache for a query: 

```
Using QUERY:PLANCACHE "enabled" 
MATCH (a:Person {firstName: "Erin", lastName: $lastName})
 RETURN a
```
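From application code, applying a hint is just prepending the `USING` line to the query text before sending it to the endpoint. A minimal Python sketch of that composition (the `with_hint` helper is illustrative, not a Neptune API):

```python
def with_hint(query, scope, hint, value):
    """Prepend a USING {scope}:{hint} {value} line to an openCypher query.

    String values (such as PLANCACHE's "enabled") are quoted; numeric
    values (such as a TIMEOUTMILLISECONDS argument) are left bare.
    This helper is a hypothetical convenience, not part of any driver.
    """
    rendered = f'"{value}"' if isinstance(value, str) else str(value)
    return f"USING {scope}:{hint} {rendered}\n{query}"

print(with_hint("MATCH (n) RETURN n LIMIT 1", "QUERY", "PLANCACHE", "enabled"))
print(with_hint("MATCH (n) RETURN n LIMIT 1", "QUERY", "TIMEOUTMILLISECONDS", 100))
```

The resulting string can then be passed as the query body to any of the access methods shown in the topics below.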

**Note**  
 Currently, the **Query**-scope query hints **PLANCACHE**, **TIMEOUTMILLISECONDS**, and **assumeConsistentDataTypes** are supported. They are documented in the topics below. 

**Topics**
+ [openCypher query plan cache hint](opencypher-query-hints-qpc-hint.md)
+ [AssumeConsistentDataTypes hint](opencypher-query-hints-AssumeConsistentDataTypes.md)
+ [openCypher query timeout hint](opencypher-query-hints-timeout-hint.md)

# openCypher query plan cache hint
<a name="opencypher-query-hints-qpc-hint"></a>

 Query plan cache behavior can be overridden on a per-query (parameterized or not) basis by query-level query hint `QUERY:PLANCACHE`. It needs to be used with the `USING` clause. The query hint accepts `enabled` or `disabled` as a value. For more information on query plan cache, see [Query plan cache in Amazon Neptune](access-graph-qpc.md). 

------
#### [ AWS CLI ]

Forcing plan to be cached or reused:

```
aws neptunedata execute-open-cypher-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "Using QUERY:PLANCACHE \"enabled\" MATCH(n) RETURN n LIMIT 1"
```

With parameters:

```
aws neptunedata execute-open-cypher-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "Using QUERY:PLANCACHE \"enabled\" RETURN \$arg" \
  --parameters '{"arg": 123}'
```

Forcing plan to be neither cached nor reused:

```
aws neptunedata execute-open-cypher-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "Using QUERY:PLANCACHE \"disabled\" MATCH(n) RETURN n LIMIT 1"
```

For more information, see [execute-open-cypher-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

# Forcing plan to be cached or reused
response = client.execute_open_cypher_query(
    openCypherQuery='Using QUERY:PLANCACHE "enabled" MATCH(n) RETURN n LIMIT 1'
)

print(response['results'])
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

Forcing plan to be cached or reused:

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=Using QUERY:PLANCACHE \"enabled\" MATCH(n) RETURN n LIMIT 1"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

Forcing plan to be cached or reused:

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=Using QUERY:PLANCACHE \"enabled\" MATCH(n) RETURN n LIMIT 1"
```

With parameters:

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=Using QUERY:PLANCACHE \"enabled\" RETURN \$arg" \
  -d "parameters={\"arg\": 123}"
```

Forcing plan to be neither cached nor reused:

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=Using QUERY:PLANCACHE \"disabled\" MATCH(n) RETURN n LIMIT 1"
```

------

# AssumeConsistentDataTypes hint
<a name="opencypher-query-hints-AssumeConsistentDataTypes"></a>

 openCypher matches numerical datatypes (int, byte, short, long, and so on) under type promotion semantics. For instance, when looking up all properties with an input value of 10 stored as a short, type promotion also matches properties that store 10 as a long. In some cases, this type casting adds overhead and produces query plans that are less efficient than they would be without it. In particular, when datatypes are used consistently in the data (for example, when every person's age is stored as a long), type promotion adds overhead without affecting the query result. 

 For cases where you know that the numeric property values stored in the database are of a consistent type, you can use a query hint called `assumeConsistentDataTypes` (with the value `true` or `false`; the default is `false`). When this hint is supplied with a value of `true`, the engine assumes that property values are always long or double and skips type promotion. Numerical values specified in the query are treated as long values (for non-floating-point values) or double values (for floating-point values). 

 If the data is consistently using a single datatype (e.g. all ages are stored as `long`), then using the `assumeConsistentDataTypes` hint can optimize the query by skipping unnecessary equality checks for different numeric types. However, if the data has inconsistent datatypes for the same property, then using the hint may cause some results to be missed, as the query will only match the single datatype that the hint assumes. 

```
# Database loaded with following openCypher CSV's

# File 1
:ID,age:Int
n1,20
n2,25

# File 2
:ID,age:Long
n3,25


# Example (no hint)
MATCH (n:Person) 
WHERE n.age >= 25
RETURN n

# Result
n2
n3

Returns all persons whose age is >= 25; matching values >= 25 can have any of these datatypes,
i.e. byte, short, int, long, double or float

-----------------------------------------------------------------------------------

# Example (with hint present)
USING QUERY:assumeConsistentDataTypes "true"
MATCH (n:Person)
WHERE n.age >= 25
RETURN n

# Result
n3

Returns only "n3" and not "n2". The reason is that even though the numerical value
matches (25), the datatype is "int" and is considered a non-match.
```
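The two matching modes above can be modeled in a few lines of Python. This is an illustrative simulation of the semantics (the row data mirrors the CSVs above; nothing here is Neptune code):

```python
# Rows as (node id, stored age value, stored datatype), mirroring the CSVs above.
rows = [("n1", 20, "int"), ("n2", 25, "int"), ("n3", 25, "long")]

def matching_nodes(threshold, assume_consistent_types=False):
    """Return node ids whose age >= threshold under the two matching modes."""
    if assume_consistent_types:
        # With the hint: only values stored as the assumed type (long) match.
        return [rid for rid, age, t in rows if t == "long" and age >= threshold]
    # Without the hint: type promotion compares numerically across datatypes.
    return [rid for rid, age, t in rows if age >= threshold]

print(matching_nodes(25))        # ['n2', 'n3']
print(matching_nodes(25, True))  # ['n3']
```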

 The difference can also be validated via the explain output. 

 Without the hint: 

```
# Query
MATCH (n)
WHERE n.age = 20
RETURN n

# Explain Snippet
╔═════╤══════════╤══════════╤══════════════════════════════╤═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╤════════╤════════════╤══════════════╤═════════╤══════════════╗
║ ID │ Out #1 │ Out #2 │ Name                   │ Arguments                                                                                                                            │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠═════╪══════════╪══════════╪══════════════════════════════╪═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╪════════╪════════════╪══════════════╪═════════╪══════════════╣
║ 0  │ 1      │ -      │ DFEPipelineScan (DFX)  │ pattern=Node(?n) with property 'age' as ?n_age2 and label 'ALL'                                                                      │ -    │ 0        │ 1         │ 0.00  │ 0.10      ║
║    │        │        │                        │ inlineFilters=[(?n_age2 IN ["20"^^xsd:byte, "20"^^xsd:int, "20"^^xsd:long, "20"^^xsd:short, "20.0"^^xsd:double, "20.0"^^xsd:float])] │      │          │           │       │           ║
║    │        │        │                        │ patternEstimate=1                                                                                                                    │      │          │           │       │           ║
╟─────┼──────────┼──────────┼──────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────┼────────────┼──────────────┼─────────┼──────────────╢

# The inlineFilters field contains all numeric types
```

 With the hint: 

```
# Query
MATCH (n)
WHERE n.age = 20
RETURN n

# Explain Snippet
╔═════╤══════════╤══════════╤══════════════════════════════╤═════════════════════════════════════════════════════════════════════════════════╤════════╤════════════╤══════════════╤═════════╤══════════════╗
║ ID │ Out #1 │ Out #2 │ Name                   │ Arguments                                                       │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠═════╪══════════╪══════════╪══════════════════════════════╪═════════════════════════════════════════════════════════════════════════════════╪════════╪════════════╪══════════════╪═════════╪══════════════╣
║ 0  │ 1      │ -      │ DFEPipelineScan (DFX)  │ pattern=Node(?n) with property 'age' as ?n_age2 and label 'ALL' │ -    │ 0        │ 1         │ 0.00  │ 0.07      ║
║    │        │        │                        │ inlineFilters=[(?n_age2 IN ["20"^^xsd:long])]                   │      │          │           │       │           ║
║    │        │        │                        │ patternEstimate=1                                               │      │          │           │       │           ║
╟─────┼──────────┼──────────┼──────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────┼────────┼────────────┼──────────────┼─────────┼──────────────╢

# The inlineFilters field only contains the long datatype
```

# openCypher query timeout hint
<a name="opencypher-query-hints-timeout-hint"></a>

 Query timeout behavior can be configured on a per-query basis with the query-level hint `QUERY:TIMEOUTMILLISECONDS`. It must be used with the `USING` clause. The hint accepts a non-negative long value. 

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "USING QUERY:TIMEOUTMILLISECONDS 100 MATCH(n) RETURN n LIMIT 1"
```

For more information, see [execute-open-cypher-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_query(
    openCypherQuery='USING QUERY:TIMEOUTMILLISECONDS 100 MATCH(n) RETURN n LIMIT 1'
)

print(response['results'])
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=USING QUERY:TIMEOUTMILLISECONDS 100 MATCH(n) RETURN n LIMIT 1"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=USING QUERY:TIMEOUTMILLISECONDS 100 MATCH(n) RETURN n LIMIT 1"
```

------

 Query timeout behavior uses the minimum of the cluster-level timeout and the query-level timeout. See the examples below to understand query timeout behavior. For more information on the cluster-level query timeout, see [neptune\_query\_timeout](https://docs.aws.amazon.com/neptune/latest/userguide/parameters.html#parameters-db-cluster-parameters-neptune_query_timeout). 

```
# Suppose `neptune_query_timeout` is 10000 ms and query-level timeout is set to 100 ms
# It will consider 100 ms as the final timeout 

curl https://your-neptune-endpoint:port/openCypher \
  -d "query=USING QUERY:TIMEOUTMILLISECONDS 100 MATCH(n) RETURN n LIMIT 1"

# Suppose `neptune_query_timeout` is 100 ms and query-level timeout is set to 10000 ms
# It will still consider 100 ms as the final timeout 

curl https://your-neptune-endpoint:port/openCypher \
  -d "query=USING QUERY:TIMEOUTMILLISECONDS 10000 MATCH(n) RETURN n LIMIT 1"
```
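That min-of-the-two behavior can be modeled in a few lines of Python (illustrative only; the function name is ours, not a Neptune API):

```python
def effective_timeout_ms(cluster_timeout_ms, query_hint_ms=None):
    """Neptune applies the smaller of the cluster-level timeout and the
    query-level TIMEOUTMILLISECONDS hint; with no hint, the cluster value wins."""
    if query_hint_ms is None:
        return cluster_timeout_ms
    return min(cluster_timeout_ms, query_hint_ms)

print(effective_timeout_ms(10000, 100))   # 100: hint is smaller
print(effective_timeout_ms(100, 10000))   # 100: cluster setting is smaller
print(effective_timeout_ms(100))          # 100: no hint supplied
```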

# Neptune openCypher restrictions
<a name="access-graph-opencypher-limitations"></a>

The Amazon Neptune release of openCypher does not yet support everything specified in the [Cypher Query Language Reference, Version 9](https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf), as detailed in [openCypher specification compliance](feature-opencypher-compliance.md). Future releases are expected to address many of these limitations.

# Neptune openCypher exceptions
<a name="access-graph-opencypher-exceptions"></a>

When working with openCypher on Amazon Neptune, a variety of exceptions may occur. Below are common exceptions you may receive, either from the HTTPS endpoint or from the Bolt driver (all exceptions from the Bolt driver are reported as Server State Exceptions):


| HTTP code | Error message | Retriable? | Remedy | 
| --- | --- | --- | --- | 
| 400 | *(syntax error, propagated directly from the openCypher parser)* | No | Correct query syntax, then retry. | 
| 500 | `Operation terminated (out of memory)` | Yes | Rework the query to add additional filtering criteria to reduce required memory | 
| 500 | Operation terminated (deadline exceeded) | Yes | Increase the query timeout in the DB cluster parameter group, or [retry the request](https://docs.aws.amazon.com/general/latest/gr/api-retries.html). | 
| 500 | Operation terminated (cancelled by user) | Yes | Retry the request. | 
| 500 | Database reset is in progress. Please retry the query after the cluster is available. | Yes | Retry when the reset is completed. | 
| 500 | Operation failed due to conflicting concurrent operations (please retry). Transactions are currently rolling back. | Yes | Retry using an [exponential backoff and retry strategy](best-practices-opencypher-retry-logic.md). | 
| 400 | *(operation name)* operation/feature unsupported Exception | No | The specified operation is not supported. | 
| 400 | openCypher update attempted on a read-only replica | No | Change the target end point to the writer end point. | 
| 400 | MalformedQueryException (Neptune does not show the internal parser state) | No | Correct query syntax and retry. | 
| 400 | Cannot delete node, because it still has relationships. To delete this node, you must first delete its relationships. | No | Instead of using `MATCH (n) DELETE n` use `MATCH(n) DETACH DELETE(n)` | 
| 400 | Invalid operation: attempting to remove the last label of a node. A node must have at least one label. | No | Neptune requires all nodes to have at least one label, and if nodes are created without an explicit label, a default label `vertex` is assigned. Change the query and/or application logic so as not to delete the last label. A singleton label of a node can be updated by setting a new label and then removing the old label. | 
| 500 | Max number of request have breached, ConfiguredQueueCapacity=\$1\$1 for connId = \$1\$1 | Yes | Currently only 8,192 concurrent requests can be processed, regardless of the stack and protocol. | 
| 500 | Max connection limit breached. | Yes | Only 1000 concurrent Bolt connections per instance are allowed (for HTTP there is no limit). | 
| 400 | Expected a [one of: Node, Relationship or Path] and got a Literal | No | Check that you are passing the correct argument(s), correct query syntax, and retry. | 
| 400 | Property value must be a simple literal. Or: Expected Map for Set properties but didn't find one. | No | A SET clause only accepts simple literals, not composite types. | 
| 400 | Entity found passed for deletion is not found | No | Check that the entity you are trying to delete exists in the database.  | 
| 400 | User does not have access to the database. | No | Check the policy on the IAM role being used. | 
| 400 | There is no token passed as part of the request | No | A properly signed token must be passed as part of the query request on an IAM enabled cluster. | 
| 400 | Error message is propagated. | No | Contact AWS Support with the Request Id. | 
| 500 | Operation terminated (internal error) | Yes | Contact AWS Support with the Request Id. | 
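For the errors marked retriable above, client code typically wraps query execution in an exponential-backoff loop. A minimal stdlib-only sketch of that pattern (the function and simulated error messages are illustrative, not a Neptune API):

```python
import random
import time

# Substrings of server error messages that are worth retrying (see table above).
RETRIABLE_MESSAGES = (
    "Operation terminated (deadline exceeded)",
    "Operation terminated (out of memory)",
    "Transactions are currently rolling back",
)

def run_with_backoff(execute, max_attempts=5, base_delay=0.1):
    """Retry execute() on retriable errors with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return execute()
        except RuntimeError as err:
            if not any(msg in str(err) for msg in RETRIABLE_MESSAGES):
                raise  # non-retriable: propagate immediately
            if attempt == max_attempts - 1:
                raise  # retries exhausted
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

# Example: a simulated query that fails once with a retriable error, then succeeds.
calls = {"n": 0}
def flaky_query():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("Operation terminated (deadline exceeded)")
    return "ok"

print(run_with_backoff(flaky_query))  # succeeds on the second attempt
```

In a real application, `execute` would submit the query over HTTPS or Bolt and raise on the error responses listed in the table.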

# openCypher extensions in Amazon Neptune
<a name="access-graph-opencypher-extensions"></a>

 Amazon Neptune supports the openCypher specification reference version 9. See [openCypher specification compliance in Amazon Neptune](feature-opencypher-compliance.md) for details. Additionally, Amazon Neptune supports the features listed here. Unless specific versions are mentioned, these features are available in both Neptune Database and Neptune Analytics. 

## Query-time S3 data access
<a name="opencypher-compliance-neptune-read"></a>

Available in Neptune Database 1.4.7.0 and up.

Neptune supports the `neptune.read()` function to read CSV or Parquet data from Amazon S3 directly within openCypher queries. Unlike the bulk loader, which imports data before querying, `neptune.read()` accesses Amazon S3 data at query execution time.

For complete documentation, see [neptune.read()](access-graph-opencypher-21-extensions-s3-read.md).

## The Neptune-specific `join()` function
<a name="opencypher-compliance-join-function"></a>

Available in Neptune Database and Neptune Analytics.

Neptune implements a `join()` function that is not present in the openCypher specification. It creates a string literal from a list of string literals and a string delimiter. It takes two arguments:
+ The first argument is a list of string literals.
+ The second argument is the delimiter string, which can consist of zero, one, or more than one characters.

Example:

```
join(["abc", "def", "ghi"], ", ")    // Returns "abc, def, ghi"
```

## The Neptune-specific `removeKeyFromMap()` function
<a name="opencypher-compliance-removeKeyFromMap-function"></a>

Available in Neptune Database and Neptune Analytics.

Neptune implements a `removeKeyFromMap()` function that is not present in the openCypher specification. It removes a specified key from a map and returns the resulting new map.

The function takes two arguments:
+ The first argument is the map from which to remove the key.
+ The second argument is the key to remove from the map.

The `removeKeyFromMap()` function is particularly useful in situations where you want to set values for a node or relationship by unwinding a list of maps. For example:

```
UNWIND [{`~id`: 'id1', name: 'john'}, {`~id`: 'id2', name: 'jim'}] as val
CREATE (n {`~id`: val.`~id`})
SET n = removeKeyFromMap(val, '~id')
```

## Custom ID values for node and relationship properties
<a name="opencypher-compliance-custom-ids"></a>

Available in Neptune Database 1.2.0.2 and up, and Neptune Analytics.

Starting in [engine release 1.2.0.2](engine-releases-1.2.0.2.md), Neptune has extended the openCypher specification so that you can now specify the `id` values for nodes and relationships in `CREATE`, `MERGE`, and `MATCH` clauses. This lets you assign user-friendly strings instead of system-generated UUIDs to identify nodes and relationships.

In Neptune Analytics, custom ID values are not available for edges.

**Warning**  
This extension to the openCypher specification is backward incompatible, because `~id` is now considered a reserved property name. If you are already using `~id` as a property in your data and queries, you will need to migrate the existing property to a new property key and remove the old one. See [What to do if you're currently using `~id` as a property](#opencypher-compliance-custom-ids-migrating).

Here is an example showing how to create nodes and relationships that have custom IDs:

```
CREATE (n {`~id`: 'fromNode', name: 'john'})
  -[:knows {`~id`: 'john-knows->jim', since: 2020}]
  ->(m {`~id`: 'toNode', name: 'jim'})
```

If you try to create a custom ID that is already in use, Neptune throws a `DuplicateDataException` error.

Here is an example of using a custom ID in a `MATCH` clause:

```
MATCH (n {`~id`: 'id1'})
RETURN n
```

Here is an example of using custom IDs in a `MERGE` clause:

```
MATCH (n {name: 'john'}), (m {name: 'jim'})
MERGE (n)-[r {`~id`: 'john->jim'}]->(m)
RETURN r
```

### What to do if you're currently using `~id` as a property
<a name="opencypher-compliance-custom-ids-migrating"></a>

With [engine release 1.2.0.2](engine-releases-1.2.0.2.md), the `~id` key in openCypher clauses is now treated as `id` instead of as a property. This means that if you have a property named `~id`, you can no longer access it.

If you're using an `~id` property, before upgrading to engine release `1.2.0.2` or later you must first migrate the existing `~id` property to a new property key and then remove the `~id` property. For example, the query below:
+ Creates a new property named 'newId' for all nodes,
+ copies the value of the '~id' property into the 'newId' property,
+ and removes the '~id' property from the data

```
MATCH (n)
WHERE exists(n.`~id`)
SET n.newId = n.`~id`
REMOVE n.`~id`
```

The same thing needs to be done for any relationships in the data that have an `~id` property.

You will also have to change any queries you're using that reference an `~id` property. For example, this query:

```
MATCH (n)
WHERE n.`~id` = 'some-value'
RETURN n
```

...would change to this:

```
MATCH (n)
WHERE n.newId = 'some-value'
RETURN n
```

## CALL subquery support in Neptune
<a name="call-subquery-support"></a>

 Available in Neptune Database 1.4.1.0 and up, and Neptune Analytics. 

 Amazon Neptune supports `CALL` subqueries. A `CALL` subquery is a part of the main query that runs in an isolated scope for each input to the `CALL` subquery. 

 For example, suppose a graph contains data about persons, their friends, and the cities they have lived in. We can retrieve the two largest cities that each friend of a person has lived in by using a `CALL` subquery: 

```
MATCH (person:Person)-[:knows]->(friend) 
CALL { 
  WITH friend 
  MATCH (friend)-[:lived_in]->(city) 
  RETURN city 
  ORDER BY city.population DESC
  LIMIT 2 
} 
RETURN person, friend, city
```

 In this example, the query part inside `CALL { ... }` is executed for each `friend` matched by the preceding `MATCH` clause. When the inner query is executed, the `ORDER BY` and `LIMIT` clauses are local to the cities where a specific friend lived, so we obtain (at most) two cities per friend. 

 All query clauses are available inside `CALL` subqueries, including nested `CALL` subqueries. Some restrictions on the first `WITH` clause and on the emitted variables exist and are explained below. 

### Scope of variables inside CALL subquery
<a name="variable-scope-inside-call-subquery"></a>

 The variables from clauses before the `CALL` subquery that are used inside it must be imported by the initial `WITH` clause. Unlike regular `WITH` clauses, this clause can only contain a list of variables; it doesn't allow aliasing and can't be used together with `DISTINCT`, `ORDER BY`, `WHERE`, `SKIP`, or `LIMIT`. 

### Variables returned from CALL subquery
<a name="variables-returned-call-subquery"></a>

 The variables that are emitted from the `CALL` subquery are specified with the final `RETURN` clause. Note that the emitted variables cannot overlap with variables before the `CALL` subquery. 

### Limitations
<a name="call-subquery-limitations"></a>

 As of now, updates inside of a `CALL` subquery are not supported. 

## Neptune openCypher functions
<a name="opencypher-compliance-new-functions"></a>

 Available in Neptune Database 1.4.1.0 and up, and Neptune Analytics. 

**textIndexOf**

 `textIndexOf(text :: STRING, lookup :: STRING, from = 0 :: INTEGER?, to = -1 :: INTEGER?) :: (INTEGER?)` 

 Returns the index of the first occurrence of `lookup` in the range of `text` starting at offset `from` (inclusive), through offset `to` (exclusive). If `to` is -1, the range continues to the end of `text`. Indexing is zero-based, and is expressed in Unicode scalar values (non-surrogate code points). 

```
RETURN textIndexOf('Amazon Neptune', 'e')
{
  "results": [{
      "textIndexOf('Amazon Neptune', 'e')": 8
    }]
}
```
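As a sketch of the semantics (not the Neptune implementation), the behavior maps onto Python's `str.find` with a range; note that `str.find` returns -1 when the substring is absent, which may differ from what Neptune returns:

```python
# Illustrative sketch of textIndexOf semantics: zero-based index of the
# first occurrence of lookup within [from, to). to = -1 means "end of text".
def text_index_of(text, lookup, start=0, to=-1):
    end = len(text) if to == -1 else to
    return text.find(lookup, start, end)  # -1 here when not found

print(text_index_of("Amazon Neptune", "e"))     # 8, as in the example above
print(text_index_of("Amazon Neptune", "e", 9))  # 13, the second 'e'
```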

**collToSet**

 `collToSet(values :: LIST OF ANY?) :: (LIST? OF ANY?)` 

 Returns a new list containing only the unique elements from the original list. The order of the original list is **maintained** (e.g., `[1, 6, 5, 1, 5]` returns `[1, 6, 5]`). 

```
RETURN collToSet([1, 6, 5, 1, 1, 5])
{
  "results": [{
      "collToSet([1, 6, 5, 1, 1, 5])": [1, 6, 5]
    }]
}
```
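The order-preserving deduplication described above can be sketched in one line of Python (an illustration, not Neptune's implementation):

```python
# Illustrative: collToSet keeps the first occurrence of each element and
# preserves the original order.
def coll_to_set(values):
    return list(dict.fromkeys(values))  # dicts preserve insertion order

print(coll_to_set([1, 6, 5, 1, 1, 5]))  # [1, 6, 5]
```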

**collSubtract**

 `collSubtract(first :: LIST OF ANY?, second :: LIST OF ANY?) :: (LIST? OF ANY?)` 

 Returns a new list containing all the unique elements of `first` excluding elements from `second`. 

```
RETURN collSubtract([2, 5, 1, 0], [1, 5])
{
  "results": [{
      "collSubtract([2, 5, 1, 0], [1, 5])": [0, 2]
    }]
}
```

**collIntersection**

 `collIntersection(first :: LIST? OF ANY?, second :: LIST? OF ANY?) :: (LIST? OF ANY?)` 

 Returns a new list containing all the unique elements of the intersection of `first` and `second`. 

```
RETURN collIntersection([2, 5, 1, 0], [1, 5])
{
  "results": [{
      "collIntersection([2, 5, 1, 0], [1, 5])": [1, 5]
    }]
}
```
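Both functions behave like set operations. A Python sketch (note that Neptune's result order, as the examples above show, need not match the input order, so this sketch sorts for determinism):

```python
# Illustrative set semantics for collSubtract and collIntersection.
def coll_subtract(first, second):
    return sorted(set(first) - set(second))

def coll_intersection(first, second):
    return sorted(set(first) & set(second))

print(coll_subtract([2, 5, 1, 0], [1, 5]))      # [0, 2]
print(coll_intersection([2, 5, 1, 0], [1, 5]))  # [1, 5]
```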

## Sorting functions
<a name="sorting-functions"></a>

 The following sections define functions to sort collections. These functions take (in some cases optional) `config` map arguments, or a list of multiple such maps, that define the sort key and/or the sort direction: 

```
{ key: STRING, order: STRING }
```

 Here `key` is the map key or node property whose value is used for sorting, and `order` is either "`asc`" or "`desc`" (case insensitive), specifying an ascending or descending sort respectively. By default, sorting is performed in ascending order. 

**collSort**

 `collSort(coll :: LIST OF ANY, config :: MAP?) :: (LIST? OF ANY?)` 

 Returns a new sorted list containing the elements from the `coll` input list. 

```
RETURN collSort([5, 3, 1], {order: 'asc'})
{
  "results": [{
      "collSort([5, 3, 1], {order: 'asc'})": [1, 3, 5]
    }]
}
```

**collSortMaps**

 `collSortMaps(coll :: LIST OF MAP, config :: MAP) :: (LIST? OF ANY?)` 

 Returns a list of maps sorted by the value of the specified `key` property. 

```
RETURN collSortMaps([{name: 'Alice', age: 25}, {name: 'Bob', age: 35}, {name: 'Charlie', age: 18}], {key: 'age', order: 'desc'}) as x
{
  "results": [{
      "x": [{
          "age": 35,
          "name": "Bob"
        }, {
          "age": 25,
          "name": "Alice"
        }, {
          "age": 18,
          "name": "Charlie"
        }]
    }]
}
```

**collSortMulti**

```
collSortMulti(coll :: LIST OF MAP?, 
configs = [] :: LIST OF MAP, 
limit = -1 :: INTEGER?, 
skip = 0 :: INTEGER?) :: (LIST? OF ANY?)
```

 Returns a list of maps sorted by the value of the specified `key` properties, optionally applying limit and skip. 

```
RETURN collSortMulti([{name: 'Alice', age: 25}, {name: 'Bob', age: 35}, {name: 'Charlie', age: 18}], [{key: 'age', order: 'desc'}, {key:'name'}]) as x
{
  "results": [{
      "x": [{
          "age": 35,
          "name": "Bob"
        }, {
          "age": 25,
          "name": "Alice"
        }, {
          "age": 18,
          "name": "Charlie"
        }]
    }]
}
```
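The multi-key sort with optional `skip`/`limit` can be sketched using Python's stable sort, applying the sort keys in reverse order (an illustration of the semantics, not Neptune's implementation):

```python
# Illustrative multi-key sort: apply the sort keys in reverse order and
# rely on Python's stable sort, then apply skip/limit.
def coll_sort_multi(coll, configs, limit=-1, skip=0):
    result = list(coll)
    for cfg in reversed(configs):
        result.sort(key=lambda m: m[cfg["key"]],
                    reverse=cfg.get("order", "asc").lower() == "desc")
    end = None if limit == -1 else skip + limit
    return result[skip:end]

people = [{"name": "Alice", "age": 25},
          {"name": "Bob", "age": 35},
          {"name": "Charlie", "age": 18}]
# Sort by age descending, with name ascending as the tie-breaker
print(coll_sort_multi(people, [{"key": "age", "order": "desc"}, {"key": "name"}]))
```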

**collSortNodes**

 `collSortNodes(coll :: LIST OF NODE, config :: MAP) :: (LIST? OF NODE?)` 

 Returns a sorted version of the `coll` input list, sorting the node elements by the values of their respective `key` properties. 

```
create (n:person {name: 'Alice', age: 23}), (m:person {name: 'Eve', age: 21}), (o:person {name:'Bob', age:25})
{"results":[]}

match (n:person) with collect(n) as people return collSortNodes(people, {key: 'name', order: 'desc'})
{
  "results": [{
      "collSortNodes(people, {key: 'name', order: 'desc'})": [{
          "~id": "e599240a-8c23-4337-8aa8-f603c8fb5488",
          "~entityType": "node",
          "~labels": ["person"],
          "~properties": {
            "age": 21,
            "name": "Eve"
          }
        }, {
          "~id": "8a6ef785-59e3-4a0b-a0ff-389655a9c4e6",
          "~entityType": "node",
          "~labels": ["person"],
          "~properties": {
            "age": 25,
            "name": "Bob"
          }
        }, {
          "~id": "466bc826-f47f-452c-8a27-6b7bdf7ae9b4",
          "~entityType": "node",
          "~labels": ["person"],
          "~properties": {
            "age": 23,
            "name": "Alice"
          }
        }]
    }]
}

match (n:person) with collect(n) as people return collSortNodes(people, {key: 'age'})
{
  "results": [{
      "collSortNodes(people, {key: 'age'})": [{
          "~id": "e599240a-8c23-4337-8aa8-f603c8fb5488",
          "~entityType": "node",
          "~labels": ["person"],
          "~properties": {
            "age": 21,
            "name": "Eve"
          }
        }, {
          "~id": "466bc826-f47f-452c-8a27-6b7bdf7ae9b4",
          "~entityType": "node",
          "~labels": ["person"],
          "~properties": {
            "age": 23,
            "name": "Alice"
          }
        }, {
          "~id": "8a6ef785-59e3-4a0b-a0ff-389655a9c4e6",
          "~entityType": "node",
          "~labels": ["person"],
          "~properties": {
            "age": 25,
            "name": "Bob"
          }
        }]
    }]
}
```

## Temporal functions
<a name="temporal-functions"></a>

 Temporal functions are available from Neptune version [1.4.5.0](https://docs.aws.amazon.com/releases/release-1.4.5.0.xml) and up. 

### day
<a name="temporal-functions-day"></a>

 `day(temporal :: (datetime | date)) :: (LONG)` 

 Returns the `day` of the month from a `datetime` or `date` value. For `datetime`: values are normalized to UTC based on input before extracting the day. For `date`: day is extracted based on the timezone. 

 The `datetime` input is available in both Neptune Database and Neptune Analytics: 

```
RETURN day(datetime('2021-06-03T01:48:14Z'))
{
  "results": [{
      "day(datetime('2021-06-03T01:48:14Z'))": 3
    }]
}
```

 Here, the `datetime` is normalized to UTC, so midnight at +08:00 shifts back to June 2. 

```
RETURN day(datetime('2021-06-03T00:00:00+08:00'))
{
  "results": [{
      "day(datetime('2021-06-03T00:00:00+08:00'))": 2
    }]
}
```

 The `date` input is available only in Neptune Analytics: 

```
RETURN day(date('2021-06-03Z'))
{
  "results": [{
      "day(date('2021-06-03Z'))": 3
    }]
}
```

 The `date` preserves timezone, keeping June 3. 

```
RETURN day(date('2021-06-03+08:00'))
{
  "results": [{
      "day(date('2021-06-03+08:00'))": 3
    }]
}
```
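The UTC normalization for `datetime` inputs can be reproduced with Python's standard library:

```python
from datetime import datetime, timezone

# Illustrative: Neptune normalizes datetime values to UTC before
# extracting the day, so midnight at +08:00 falls on the previous day.
dt = datetime.fromisoformat("2021-06-03T00:00:00+08:00")
print(dt.astimezone(timezone.utc).day)  # 2 — June 2 in UTC
```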

### month
<a name="temporal-functions-month"></a>

 `month(temporal :: (datetime | date)) :: (LONG)` 

 Returns the month from a `datetime` or `date` value (1-12). For `datetime`: values are normalized to UTC based on input before extracting the month. For `date`: month is extracted based on the timezone. 

 The `datetime` input is available in both Neptune Database and Neptune Analytics: 

```
RETURN month(datetime('2021-06-03T01:48:14Z'))
{
  "results": [{
      "month(datetime('2021-06-03T01:48:14Z'))": 6
    }]
}
```

 Here, the `datetime` is normalized to UTC, so midnight at +08:00 shifts back to May 31. 

```
RETURN month(datetime('2021-06-01T00:00:00+08:00'))
{
  "results": [{
      "month(datetime('2021-06-01T00:00:00+08:00'))": 5
    }]
}
```

 The `date` input is available only in Neptune Analytics: 

```
RETURN month(date('2021-06-03Z'))
{
  "results": [{
      "month(date('2021-06-03Z'))": 6
    }]
}
```

 The `date` preserves timezone, keeping June 1. 

```
RETURN month(date('2021-06-01+08:00'))
{
  "results": [{
      "month(date('2021-06-01+08:00'))": 6
    }]
}
```

### year
<a name="temporal-functions-year"></a>

 `year(temporal :: (datetime | date)) :: (LONG)` 

 Returns the year from a `datetime` or `date` value. For `datetime`: the values are normalized to UTC based on input before extracting the year. For `date`: the year is extracted based on the timezone. 

 The `datetime` input is available in both Neptune Database and Neptune Analytics: 

```
RETURN year(datetime('2021-06-03T01:48:14Z'))
{
  "results": [{
      "year(datetime('2021-06-03T01:48:14Z'))": 2021
    }]
}
```

 Here, the `datetime` is normalized to UTC, so midnight at +08:00 shifts back to December 31, 2020. 

```
RETURN year(datetime('2021-01-01T00:00:00+08:00'))
{
  "results": [{
      "year(datetime('2021-01-01T00:00:00+08:00'))": 2020
    }]
}
```

 The `date` input is available only in Neptune Analytics: 

```
RETURN year(date('2021-06-03Z'))
{
  "results": [{
      "year(date('2021-06-03Z'))": 2021
    }]
}
```

 The `date` preserves the timezone, keeping the year 2021. 

```
RETURN year(date('2021-01-01+08:00'))
{
  "results": [{
      "year(date('2021-01-01+08:00'))": 2021
    }]
}
```

### Neptune openCypher functions
<a name="openCypher-functions"></a>

 Available in Neptune Database 1.4.6.0 and up, and Neptune Analytics. 

#### reduce()
<a name="openCypher-functions-reduce"></a>

 Reduce sequentially processes each list element by combining it with a running total or ‘accumulator.’ Starting with an initial value, it updates the accumulator after each operation and uses that updated value in the next iteration. 

 `for i in (0, ..., n): acc = acc X list[i], where X denotes any binary operator` 

 Once all elements have been processed, it returns the final accumulated result. 

 A typical `reduce()` structure is `reduce(accumulator = initial, variable IN list | expression)`. 

**Type specifications:**  
+ `initial` - starting value for the accumulator :: (LONG | FLOAT | STRING | LIST? OF (STRING, LONG, FLOAT))
+ `list` - the input list :: LIST OF T, where T matches the type of `initial`
+ `variable` - represents each element in the input list
+ `expression` - only the `+` and `*` operators are supported
+ return type - same type as `initial`

**Restrictions:**  
 Currently, the `reduce()` expression only supports: 
+  Numeric Multiplication 
+  Numeric Addition 
+  String Concatenation 
+  List Concatenation 

 They are represented by the `+` or `*` operator. The expression must be a binary expression of the form `accumulator + variable` or `accumulator * variable`. 
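The four supported shapes correspond directly to `functools.reduce` in Python (illustrative equivalents, not Neptune code):

```python
from functools import reduce

# Illustrative equivalents of the four supported reduce() operations.
print(reduce(lambda acc, n: acc + n, [1, 2, 3], 0))         # 6   (numeric addition)
print(reduce(lambda acc, n: acc * n, [1, 2, 3], 1))         # 6   (numeric multiplication)
print(reduce(lambda acc, s: acc + s, ["A", "B", "C"], ""))  # ABC (string concatenation)
print(reduce(lambda acc, x: acc + [x], [1, 2, 3], []))      # [1, 2, 3] (list concatenation)
```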

**Overflow handling:**  
 Neptune detects numeric overflow during the `reduce()` evaluation and responds differently based on the data type: 

```
LONG (signed 64‑bit)
--------------------
• Valid range: –9 223 372 036 854 775 808 … 9 223 372 036 854 775 807  
• If any intermediate or final value falls outside this range,
  Neptune aborts the query with a long overflow error message.
  
FLOAT (IEEE‑754 double)
-----------------------
• Largest finite value ≈ 1.79 × 10^308  
• Larger results overflow to INF
  Once `INF` is produced, it propagates through the remainder
  of the reduction.
```
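Python floats are IEEE-754 doubles, so the FLOAT overflow behavior described above is directly observable:

```python
import math

# Illustrative: the sum exceeds the largest finite double (~1.79e308),
# overflowing to inf, which then propagates through further arithmetic.
s = 9.0e307 + 8.0e307 + 1.0e307
print(math.isinf(s))  # True
print(s + 1.0)        # still inf
```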

**Examples:**  
See the following examples for the reduce() function.

```
1. Long Addition:
RETURN reduce(sum = 0, n IN [1, 2, 3] | sum + n)
{
  "results": [{
      "reduce(sum = 0, n IN [1, 2, 3] | sum + n)": 6
    }]
}

2. String Concatenation:
RETURN reduce(str = "", x IN ["A", "B", "C"] | str + x) 
{
  "results": [{
      "reduce(str = "", x IN ["A", "B", "C"] | str + x)": "ABC"
    }]
}

3. List Combination:
RETURN reduce(lst = [], x IN [1, 2, 3] | lst + x)
{
  "results": [{
      "reduce(lst = [], x IN [1, 2, 3] | lst + x)": [1, 2, 3]
    }]
}

4. Float Addition:
RETURN reduce(total = 0.0, x IN [1.5, 2.5, 3.5] | total + x) 
{
  "results": [{
      "reduce(total = 0.0, x IN [1.5, 2.5, 3.5] | total + x)": 7.5
    }]
}

5. Long Multiplication:
RETURN reduce(product = 1, n IN [1, 2, 3] | product * n)
{
  "results": [{
      "reduce(product = 1, n IN [1, 2, 3] | product * n)": 6
    }]
}

6. Float Multiplication:
RETURN reduce(product = 1.0, n IN [1.5, 2.5, 3.5] | product * n)
{
  "results": [{
      "reduce(product = 1.0, n IN [1.5, 2.5, 3.5] | product * n)": 13.125
    }]
}

7. Long Overflow (Exception):
RETURN reduce(s = 9223372036854775807, x IN [2, 3] | s * x) AS result
{
"results": [{
    "reduce(s = 9223372036854775807, x IN [2, 3] | s * x) AS result": long overflow
    }]
}

8. Float Overflow:
RETURN reduce(s = 9.0e307, x IN [8.0e307, 1.0e307] | s + x) AS result
{
"results": [{
    "reduce(s = 9.0e307, x IN [8.0e307, 1.0e307] | s + x) AS result": INF
    }]
}
```

# neptune.read()
<a name="access-graph-opencypher-21-extensions-s3-read"></a>

 Neptune supports a `CALL` procedure, `neptune.read`, to read data from Amazon S3 and then run an openCypher query (read, insert, update) using the data. The procedure yields each row in the file as a declared result variable `row`. It uses the IAM credentials of the caller to access the data in Amazon S3; see [Managing permissions for neptune.read()](access-graph-opencypher-21-extensions-s3-read-permissions.md) to set up the permissions. The Amazon S3 bucket must be in the same AWS Region as the Neptune instance. Currently, cross-Region reads are not supported. 

 **Syntax** 

```
CALL neptune.read(
  {
    source: "string",
    format: "parquet/csv",
    concurrency: 10
  }
)
YIELD row
...
```

**Inputs**
+  **source** (required) - Amazon S3 URI to a **single** object. An Amazon S3 prefix matching multiple objects is not supported. 
+  **format** (required) - `parquet` and `csv` are supported. 
  +  More details on the supported Parquet format can be found in [Supported Parquet column types](access-graph-opencypher-21-extensions-s3-read-parquet.md#access-graph-opencypher-21-extensions-s3-read-parquet-column-types). 
  +  For more information on the supported csv format, see [Gremlin load data format](bulk-load-tutorial-format-gremlin.md). 
+  **concurrency** (optional) - Type: integer, 0 or greater. Default: 0. Specifies the number of threads used to read the file. If the value is 0, the maximum number of threads allowed by the resource is used. For Parquet, it is recommended to set this to the number of row groups. 

**Outputs**

 `neptune.read()` returns: 
+  **row** - Type: Map 
  +  Each row in the file, where the keys are the columns and the values are the data found in each column. 
  +  You can access each column's data with property-access syntax (`row.col`). 

## Best practices for neptune.read()
<a name="access-graph-opencypher-21-extensions-s3-read-best-practices"></a>

Neptune S3 read operations can be memory-intensive. Please use instance types well-suited for production workloads as outlined in [Choosing instance types for Amazon Neptune](instance-types.md).

Memory usage and performance of `neptune.read()` requests are affected by a variety of factors like file size, number of columns, number of rows, and file format. Depending on structure, small files (e.g., CSV files 100MB or under, Parquet files 20MB or under) may work reliably on most production-suited instance types, whereas larger files may require substantial memory that smaller instance types cannot provide.

When testing this feature, it is recommended to start with small files and scale gradually to ensure your read workload can be accommodated by your instance size. If you notice `neptune.read()` requests leading to out-of-memory exceptions or instance restarts, consider splitting your files into smaller chunks, reducing file complexity, or upgrading to larger instance types.

# Query examples using parquet
<a name="access-graph-opencypher-21-extensions-s3-read-parquet"></a>

The following example query returns the number of rows in a given Parquet file:

```
CALL neptune.read(
  {
    source: "<s3 path>",
    format: "parquet"
  }
)
YIELD row
RETURN count(row)
```

You can run the query example using the `execute-open-cypher-query` operation in the AWS CLI by executing the following code:

```
aws neptunedata execute-open-cypher-query \
--open-cypher-query "CALL neptune.read({source: '<s3 path>', format: 'parquet'}) YIELD row RETURN count(row)" \
--endpoint-url https://my-cluster-name.cluster-abcdefgh1234.us-east-1.neptune.amazonaws.com:8182
```

A query can be flexible in what it does with rows read from a Parquet file. For example, the following query creates a node with a field being set to data found in the Parquet file:

```
CALL neptune.read(
  {
    source: "<s3 path>",
    format: "parquet"
  }
)
YIELD row
CREATE (n {someField: row.someCol}) 
RETURN n
```

**Warning**  
It is not considered good practice to use a clause that produces a large result set, like `MATCH (n)`, before a `CALL` clause. Doing so leads to a long-running query, due to the cross product between incoming solutions from prior clauses and the rows read by `neptune.read()`. It's recommended to start the query with `CALL neptune.read(...)`.

## Supported Parquet column types
<a name="access-graph-opencypher-21-extensions-s3-read-parquet-column-types"></a>

**Parquet Data Types:**
+ NULL
+ BOOLEAN
+ FLOAT
+ DOUBLE
+ STRING
+ SIGNED INTEGER: INT8, INT16, INT32, INT64
+ MAP: Only one level is supported; nested maps are not.
+ LIST: Only one level is supported; nested lists are not.

**Neptune-specific data types:**

Unlike the property column headers of the CSV format, the property column headers of the Parquet format only need the property names; there is no need to include type names or cardinality.

There are, however, some special column types in the Parquet format that require annotation in the metadata: the Any, Date, dateTime, and Geometry types. The following object is an example of the required metadata annotation for files containing columns of these special types:

```
"metadata": {
    "anyTypeColumns": ["UserCol1"],
    "dateTypeColumns": ["UserCol2"],
    "dateTimeTypeColumns": ["UserCol3"],
    "geometryTypeColumns": ["UserCol4"]
}
```

Below are details on the expected payload associated with these types:
+ A column of type Any is supported in the user columns. The Any type is syntactic sugar for all of the other supported types, and is useful when a single column contains values of multiple types. The payload of an Any type value is a semicolon-separated list of JSON strings, such as `{"value": "10", "type": "Int"};{"value": "1.0", "type": "Float"}`, where each JSON string has a value field and a type field. The cardinality of an Any column is set, meaning that the column can accept multiple values. 
  + Neptune supports the following types in an Any type: Bool (or Boolean), Byte, Short, Int, Long, UnsignedByte, UnsignedShort, UnsignedInt, UnsignedLong, Float, Double, Date, dateTime, String, and Geometry.
  + Vector type is not supported in Any type.
  + Nested Any type is not supported. For example, `{"value": {"value": "10", "type": "Int"}, "type": "Any"}`.
+ Columns of type Date and Datetime are supported in the user columns. The payload of these columns must be provided as strings following the XSD format or one of the formats below: 
  + yyyy-MM-dd
  + yyyy-MM-ddTHH:mm
  + yyyy-MM-ddTHH:mm:ss
  + yyyy-MM-ddTHH:mm:ssZ
  + yyyy-MM-ddTHH:mm:ss.SSSZ
  + yyyy-MM-ddTHH:mm:ss[+|-]hhmm
  + yyyy-MM-ddTHH:mm:ss.SSS[+|-]hhmm
+ A Geometry column type is supported in the user columns. The payload of these columns must only contain Geometry primitives of type Point, provided as strings in Well-known text (WKT) format. For example, POINT (30 10) would be a valid Geometry value.
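A payload in the Any-type shape described above can be assembled with ordinary JSON tooling. A Python sketch (the `"value"`/`"type"` field names come from the description above; the data is invented):

```python
import json

# Illustrative: build the semicolon-separated Any-type payload.
entries = [{"value": "10", "type": "Int"}, {"value": "1.0", "type": "Float"}]
payload = ";".join(json.dumps(e, separators=(", ", ": ")) for e in entries)
print(payload)
```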

## Sample parquet output
<a name="sample-parquet-output"></a>

Given a Parquet file like this:

```
<s3 path>

Parquet Type:
    int8     int16       int32             int64              float      double    string
+--------+---------+-------------+----------------------+------------+------------+----------+
|   Byte |   Short |       Int   |                Long  |     Float  |    Double  | String   |
|--------+---------+-------------+----------------------+------------+------------+----------|
|   -128 |  -32768 | -2147483648 | -9223372036854775808 |    1.23456 |    1.23457 | first    |
|    127 |   32767 |  2147483647 |  9223372036854775807 |  nan       |  nan       | second   |
|      0 |       0 |           0 |                    0 | -inf       | -inf       | third    |
|      0 |       0 |           0 |                    0 |  inf       |  inf       | fourth   |
+--------+---------+-------------+----------------------+------------+------------+----------+
```

Here is an example of the output returned by neptune.read using the following query:

```
aws neptunedata execute-open-cypher-query \
--open-cypher-query "CALL neptune.read({source: '<s3 path>', format: 'parquet'}) YIELD row RETURN row" \
--endpoint-url https://my-cluster-name.cluster-abcdefgh1234.us-east-1.neptune.amazonaws.com:8182
```

```
{
 "results": [{
 "row": {
 "Float": 1.23456,
 "Byte": -128,
 "Int": -2147483648,
 "Long": -9223372036854775808,
 "String": "first",
 "Short": -32768,
 "Double": 1.2345678899999999
 }
 }, {
 "row": {
 "Float": "NaN",
 "Byte": 127,
 "Int": 2147483647,
 "Long": 9223372036854775807,
 "String": "second",
 "Short": 32767,
 "Double": "NaN"
 }
 }, {
 "row": {
 "Float": "-INF",
 "Byte": 0,
 "Int": 0,
 "Long": 0,
 "String": "third",
 "Short": 0,
 "Double": "-INF"
 }
 }, {
 "row": {
 "Float": "INF",
 "Byte": 0,
 "Int": 0,
 "Long": 0,
 "String": "fourth",
 "Short": 0,
 "Double": "INF"
 }
 }]
}
```

Currently, there is no way to set a node or edge label from a data field coming from a Parquet file. It is recommended that you partition the work into multiple queries, one for each label/type.

```
CALL neptune.read({source: '<s3 path>', format: 'parquet'})
 YIELD row 
WHERE row.`~label` = 'airport'
CREATE (n:airport)

CALL neptune.read({source: '<s3 path>', format: 'parquet'})
YIELD row 
WHERE row.`~label` = 'country'
CREATE (n:country)
```

# Query examples using CSV
<a name="access-graph-opencypher-21-extensions-s3-read-csv"></a>

In this example, the query returns the number of rows in a given CSV file:

```
CALL neptune.read(
  {
    source: "<s3 path>",
    format: "csv"
  }
)
YIELD row
RETURN count(row)
```

You can run the query example using the execute-open-cypher-query operation in the AWS CLI by executing the following code:

```
aws neptunedata execute-open-cypher-query \
--open-cypher-query "CALL neptune.read({source: '<s3 path>', format: 'csv'}) YIELD row RETURN count(row)" \
--endpoint-url https://my-cluster-name.cluster-abcdefgh1234.us-east-1.neptune.amazonaws.com:8182
```

A query can be flexible in what it does with rows read from a CSV file. For instance, the following query creates a node with a field set to data from a CSV file:

```
CALL neptune.read(
  {
    source: "<s3 path>",
    format: "csv"
  }
)
YIELD row
CREATE (n {someField: row.someCol}) 
RETURN n
```

**Warning**  
It is not considered good practice to use a clause that produces a large result set, like `MATCH (n)`, before a `CALL` clause. Doing so leads to a long-running query, due to the cross product between incoming solutions from prior clauses and the rows read by `neptune.read()`. It's recommended to start the query with `CALL neptune.read(...)`.

## Property column headers
<a name="property-column-headers"></a>

You can specify a column for a property by using the following syntax. The type names are not case sensitive. If a colon appears within a property name, it must be escaped by preceding it with a backslash: `\:`.

```
propertyname:type
```

**Note**  
Space, comma, carriage return and newline characters are not allowed in the column headers, so property names cannot include these characters.
You can specify a column for an array type by adding `[]` to the type:  

  ```
  propertyname:type[]
  ```
Edge properties can only have a single value; specifying an array type or a second value causes an error. The following example shows the column header for a property named age of type Int:  

  ```
  age:Int
  ```

Every row in the file would be required to have an integer in that position or be left empty. Arrays of strings are allowed, but strings in an array cannot include the semicolon (`;`) character unless it is escaped using a backslash (`\;`).

## Supported CSV column types
<a name="supported-csv-column-types"></a>
+ **BOOL (or BOOLEAN)** - Allowed values: true, false. Indicates a Boolean field. Any value other than true will be treated as false.
+ **FLOAT** - Range: 32-bit IEEE 754 floating point including Infinity, INF, -Infinity, -INF and NaN (not-a-number).
+ **DOUBLE** - Range: 64-bit IEEE 754 floating point including Infinity, INF, -Infinity, -INF and NaN (not-a-number).
+ **STRING** - 
  + Quotation marks are optional. Commas, newline, and carriage return characters are automatically escaped if they are included in a string surrounded by double quotation marks ("). Example: "Hello, World".
  + To include quotation marks in a quoted string, you can escape the quotation mark by using two in a row: Example: "Hello ""World""".
  + Arrays of strings are allowed, but strings in an array cannot include the semicolon (;) character unless it is escaped using a backslash (\;).
  + If you want to surround strings in an array with quotation marks, you must surround the whole array with one set of quotation marks. Example: "String one; String 2; String 3".
+ **DATE, DATETIME** - The datetime values can be provided in either the XSD format, or one of the following formats: 
  + yyyy-MM-dd
  + yyyy-MM-ddTHH:mm
  + yyyy-MM-ddTHH:mm:ss
  + yyyy-MM-ddTHH:mm:ssZ
  + yyyy-MM-ddTHH:mm:ss.SSSZ
  + yyyy-MM-ddTHH:mm:ss[+|-]hhmm
  + yyyy-MM-ddTHH:mm:ss.SSS[+|-]hhmm
+ **SIGNED INTEGER** - 
  + Byte: -128 to 127
  + Short: -32768 to 32767
  + Int: -2^31 to 2^31-1
  + Long: -2^63 to 2^63-1
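The string quoting rules above follow standard CSV conventions, which can be checked with Python's `csv` module:

```python
import csv
import io

# Illustrative: doubled quotes escape a quote inside a quoted field, and
# commas are allowed inside quoted fields — matching the rules above.
raw = '"Hello, World","Hello ""World"""\n'
row = next(csv.reader(io.StringIO(raw)))
print(row)  # ['Hello, World', 'Hello "World"']
```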

**Neptune-specific column types:**
+ A column of type Any is supported in the user columns. The Any type is syntactic sugar for all of the other supported types, and is useful when a single column contains values of multiple types. The payload of an Any type value is a semicolon-separated list of JSON strings, such as `{"value": "10", "type": "Int"};{"value": "1.0", "type": "Float"}`, where each JSON string has a value field and a type field. The column header of an Any type is `propertyname:Any`. The cardinality of an Any column is set, meaning that the column can accept multiple values. 
  + Neptune supports the following types in an Any type: Bool (or Boolean), Byte, Short, Int, Long, UnsignedByte, UnsignedShort, UnsignedInt, UnsignedLong, Float, Double, Date, dateTime, String, and Geometry.
  + Vector type is not supported in Any type.
  + Nested Any type is not supported. For example, `{"value": {"value": "10", "type": "Int"}, "type": "Any"}`.
+ A Geometry column type is supported in the user columns. The payload of these columns must only contain Geometry primitives of type Point, provided as strings in Well-known text (WKT) format. For example, POINT (30 10) would be a valid Geometry value.

## Sample CSV output
<a name="sample-csv-output"></a>

Given the following CSV file:

```
<s3 path>
colA:byte,colB:short,colC:int,colD:long,colE:float,colF:double,colG:string
-128,-32768,-2147483648,-9223372036854775808,1.23456,1.23457,first
127,32767,2147483647,9223372036854775807,nan,nan,second
0,0,0,0,-inf,-inf,third
0,0,0,0,inf,inf,fourth
```

This example shows the output returned by neptune.read using the following query:

```
aws neptunedata execute-open-cypher-query \
--open-cypher-query "CALL neptune.read({source: '<s3 path>', format: 'csv'}) YIELD row RETURN row" \
--endpoint-url https://my-cluster-name.cluster-abcdefgh1234.us-east-1.neptune.amazonaws.com:8182
```

```
{
  "results": [{
      "row": {
        "colD": -9223372036854775808,
        "colC": -2147483648,
        "colE": 1.23456,
        "colB": -32768,
        "colF": 1.2345699999999999,
        "colG": "first",
        "colA": -128
      }
    }, {
      "row": {
        "colD": 9223372036854775807,
        "colC": 2147483647,
        "colE": "NaN",
        "colB": 32767,
        "colF": "NaN",
        "colG": "second",
        "colA": 127
      }
    }, {
      "row": {
        "colD": 0,
        "colC": 0,
        "colE": "-INF",
        "colB": 0,
        "colF": "-INF",
        "colG": "third",
        "colA": 0
      }
    }, {
      "row": {
        "colD": 0,
        "colC": 0,
        "colE": "INF",
        "colB": 0,
        "colF": "INF",
        "colG": "fourth",
        "colA": 0
      }
    }]
}
```

Currently, there is no way to set a node or edge label from a data field coming from a CSV file. It is recommended that you partition the work into multiple queries, one for each label/type.

```
CALL neptune.read({source: '<s3 path>', format: 'csv'})
 YIELD row 
WHERE row.`~label` = 'airport'
CREATE (n:airport)

CALL neptune.read({source: '<s3 path>', format: 'csv'})
YIELD row 
WHERE row.`~label` = 'country'
CREATE (n:country)
```

# Managing permissions for neptune.read()
<a name="access-graph-opencypher-21-extensions-s3-read-permissions"></a>

## Required IAM Policies
<a name="access-graph-opencypher-21-extensions-s3-read-permissions-iam"></a>

To execute openCypher queries that use `neptune.read()`, you must have the appropriate permissions to access data in your Neptune database. Read-only queries require the `ReadDataViaQuery` action. Queries that modify data require `WriteDataViaQuery` for insertions or `DeleteDataViaQuery` for deletions. The example below grants all three actions on the specified cluster.

Additionally, you need permissions to access the S3 bucket containing your data files. The NeptuneS3Access policy statement grants the required S3 permissions:
+ **`s3:ListBucket`**: Required to verify bucket existence and list contents.
+ **`s3:GetObject`**: Required to access the specified object so its content can be read for integration into openCypher queries.

If your S3 bucket uses server-side encryption with AWS KMS, you must also grant KMS permissions. The NeptuneS3KMSAccess policy statement allows Neptune to decrypt data and generate data keys when accessing encrypted S3 objects. The condition restricts KMS operations to requests originating from S3 and RDS services in your region.
+ **`kms:Decrypt`**: Required to perform decryption of the encrypted object so its data can be read by Neptune.
+ **`kms:GenerateDataKey`**: Also required by the S3 API used to retrieve objects to be read.

```
{
  "Sid": "NeptuneQueryAccess",
  "Effect": "Allow",
  "Action": [
      "neptune-db:ReadDataViaQuery",
      "neptune-db:WriteDataViaQuery",
      "neptune-db:DeleteDataViaQuery"
  ],
  "Resource": "arn:aws:neptune-db:<REGION>:<AWS_ACCOUNT_ID>:<CLUSTER_RESOURCE_ID>/*"
},
{
  "Sid": "NeptuneS3Access",
  "Effect": "Allow",
  "Action": [
      "s3:ListBucket",
      "s3:GetObject"
  ],
  "Resource": [
      "arn:aws:s3:::neptune-read-bucket",
      "arn:aws:s3:::neptune-read-bucket/*"
  ]
},
{
  "Sid": "NeptuneS3KMSAccess",
  "Effect": "Allow",
  "Action": [
      "kms:Decrypt",
      "kms:GenerateDataKey"
  ],
  "Resource": "arn:aws:kms:<REGION>:<AWS_ACCOUNT_ID>:key/<KEY_ID>",
  "Condition": {
      "StringEquals": {
        "kms:ViaService": [
            "s3.<REGION>.amazonaws.com",
            "rds.<REGION>.amazonaws.com"
        ]
      }
  }
}
```
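The statements above are fragments; they belong in the `Statement` array of a complete IAM policy document. The following sketch assembles them programmatically so the placeholders stay in one place (the region, account ID, cluster resource ID, bucket name, and KMS key ID passed in below are illustrative values, not real resources):

```python
import json

# Assemble the three statement fragments into a complete IAM policy document.
# All arguments are placeholders for your own resource identifiers.
def build_neptune_read_policy(region, account_id, cluster_id, bucket, kms_key_id):
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "NeptuneQueryAccess",
                "Effect": "Allow",
                "Action": [
                    "neptune-db:ReadDataViaQuery",
                    "neptune-db:WriteDataViaQuery",
                    "neptune-db:DeleteDataViaQuery",
                ],
                "Resource": f"arn:aws:neptune-db:{region}:{account_id}:{cluster_id}/*",
            },
            {
                "Sid": "NeptuneS3Access",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            },
            {
                "Sid": "NeptuneS3KMSAccess",
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
                "Resource": f"arn:aws:kms:{region}:{account_id}:key/{kms_key_id}",
                "Condition": {
                    "StringEquals": {
                        "kms:ViaService": [
                            f"s3.{region}.amazonaws.com",
                            f"rds.{region}.amazonaws.com",
                        ]
                    }
                },
            },
        ],
    }

policy_json = json.dumps(
    build_neptune_read_policy(
        "us-east-1", "123456789012", "cluster-ABC123", "neptune-read-bucket", "abcd-1234"
    ),
    indent=2,
)
```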

## Important prerequisites
<a name="access-graph-opencypher-21-extensions-s3-read-permissions-prerequisites"></a>

These permissions and prerequisites ensure secure and reliable integration of S3 data into openCypher queries, while maintaining proper access controls and data protection measures.
+ **IAM authentication**: This feature is only supported for Neptune clusters with IAM authentication enabled. See [Securing your Amazon Neptune database](security.md) for detailed instructions on how to create and connect to IAM authentication-enabled clusters.
+ **VPC endpoint**:
  + A Gateway-type VPC endpoint for Amazon S3 is required to allow Neptune to communicate with Amazon S3.
  + To use custom AWS KMS encryption in the query, an Interface-type VPC endpoint for AWS KMS is required to allow Neptune to communicate with AWS KMS.
  + For detailed instructions for how to configure this endpoint, see [Creating the Amazon S3 VPC Endpoint](bulk-load-tutorial-IAM.md).

# Spatial Data
<a name="access-graph-opencypher-22-spatial-data"></a>

Amazon Neptune now supports spatial queries, allowing you to store and analyze geometric data in your graph. While commonly used for geographic locations (like coordinates on a map), spatial features work with any two-dimensional data where position and proximity matter. Use this feature to answer questions like "Which stores are within 5 miles of this customer?", "Find all delivery routes that intersect with this service area," or "Which components in this floor plan overlap with the HVAC zone?" Neptune implements spatial support using industry-standard Spatial Types functions that work with points, polygons, and other geometric shapes. You can store spatial data as properties on nodes and edges, then use spatial functions to calculate distances, check if points fall within boundaries, or find overlapping regions, all within your openCypher queries.

**Common use cases**:
+ **Geographic applications**: Location-based recommendations, geofencing, route planning, and territory analysis
+ **Facility and space management**: Floor plan layouts, equipment placement, and zone coverage
+ **Network topology**: Physical infrastructure mapping, coverage areas, and service boundaries
+ **Design and CAD**: Component positioning, collision detection, and spatial relationships in 2D designs
+ **Game development**: Character positioning, collision detection, and area-of-effect calculations

The Spatial Types implementation in Amazon Neptune follows the ISO/IEC 13249-3:2016 standard, as many other databases do. The [Spatial Functions](access-graph-opencypher-22-spatial-functions.md) are available in the openCypher query language.

## Coordinate system
<a name="access-graph-opencypher-22-spatial-data-coordinate-system"></a>

Neptune uses a single Spatial Reference Identifier (SRID) for an entire database. A homogeneous coordinate system reduces user errors in querying and improves database performance. The first release (1.4.7.0) supports the Cartesian coordinate system, also referred to as SRID 0.

The Neptune implementation of SRID 0 is compatible with longitude and latitude values. Use `ST_DistanceSpheroid` to calculate distances based on WGS84/SRID 4326.

The current implementation supports storing 3-dimensional coordinates. The Spatial Functions currently only support using the x- and y-axis (2-dimensional) coordinates. The z-axis coordinates are currently not supported by the available Spatial Functions.

## Storing location data
<a name="storing-spatial-data"></a>

Store location data on nodes and edges using the Geometry property type. Create Geometry values from Well-Known Text (WKT) format, a standard way to represent geographic shapes as text. For example, to store a point location:

```
CREATE (n:airport {code: 'ATL', location: ST_GeomFromText('POINT (-84.4281 33.6367)')})
```

When working with geographic coordinates, the first argument (x) represents longitude and the second argument (y) represents latitude. This follows the standard coordinate order used in spatial databases and the ISO 19125 standard.

**Note**  
 Neptune now supports a new data type called "Geometry". The Geometry property of a node or an edge can be created from a WKT string using the `ST_GeomFromText` function.  
Neptune will automatically store Points data in a specialized spatial index to improve the performance of the Spatial Types functions. For instance, `ST_Contains` used to find the points within a polygon is accelerated by the specialized spatial index.  
[Wikipedia page for Well-Known Text representation of geometry](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry)

## Loading spatial data in bulk
<a name="loading-spatial-data-bulk"></a>

When bulk loading data, specify the Geometry type in your CSV header. Neptune will parse WKT strings and create the appropriate Geometry properties:

```
:ID,:LABEL,code:String,city:String,location:Geometry
21,airport,ATL,Atlanta,POINT (-84.42810059 33.63669968)
32,airport,ANC,Anchorage,POINT (-149.9960022 61.17440033)
43,airport,AUS,Austin,POINT (-97.66989899 30.19449997)
```

For complete CSV format details, see [openCypher bulk load format](bulk-load-tutorial-format-opencypher.md).
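Because WKT strings for shapes other than points contain commas, it is safest to produce the load file with a CSV library rather than string concatenation, so quoting is handled for you. A minimal sketch (the sample rows mirror the header above):

```python
import csv
import io

# Write a bulk-load CSV with a Geometry-typed column.
# The WKT string goes into the cell as-is; Neptune parses it at load time.
rows = [
    ("21", "airport", "ATL", "Atlanta", "POINT (-84.42810059 33.63669968)"),
    ("32", "airport", "ANC", "Anchorage", "POINT (-149.9960022 61.17440033)"),
    ("43", "airport", "AUS", "Austin", "POINT (-97.66989899 30.19449997)"),
]
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([":ID", ":LABEL", "code:String", "city:String", "location:Geometry"])
writer.writerows(rows)
print(buf.getvalue())
```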

## Querying spatial data
<a name="querying-spatial-data"></a>

The following query examples use the [air-routes dataset](https://github.com/krlawrence/graph/tree/main/sample-data) to demonstrate how to use Spatial Functions in Neptune.

If your data has separate latitude and longitude properties instead of a Geometry property, you can convert them to points at query time. Find the 10 nearest airports to a given location:

```
MATCH (a:airport)
WITH a, ST_GeomFromText('POINT (' + a.lon + ' ' + a.lat + ')') AS airportLocation
WITH a, airportLocation, ST_Distance(ST_GeomFromText('POINT (-84.4281 33.6367)'), airportLocation) AS distance
WHERE distance IS NOT NULL
RETURN a.code, a.city, distance
ORDER BY distance ASC
LIMIT 10
```

If location values are already stored as Geometry points (for example, created with `ST_Point` or `ST_GeomFromText`), you can use those property values directly:

1. Set the property

   ```
   MATCH (a:airport)
   SET a.location = ST_GeomFromText('POINT (' + a.lon + ' ' + a.lat + ')')
   ```

1. Query using ST_Distance:

   ```
   MATCH (a:airport)
   WHERE a.location IS NOT NULL
   WITH a, ST_Distance(ST_GeomFromText('POINT (-84.4281 33.6367)'), a.location) AS distance
   RETURN a.code, a.city, distance
   ORDER BY distance ASC
   LIMIT 10
   ```

### Using the Bolt driver
<a name="querying-spatial-data-bolt"></a>

Most query methods return Geometry values as WKT strings, which are human-readable. If you're using the Bolt driver, Geometry values are returned in WKB (Well-Known Binary) format for efficiency. Convert WKB to a Geometry object in your application:

```
try (Session session = driver.session()) {
    Result result = session.run("MATCH (n:airport {code: 'ATL'}) RETURN n.location as geom");
    
    Record record = result.single();
    byte[] wkbBytes = record.get("geom").asByteArray();
    
    // Convert WKB to Geometry object using JTS library
    WKBReader wkbReader = new WKBReader();
    Geometry geom = wkbReader.read(wkbBytes);
}
```
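For reference, a WKB point is a byte-order flag, a 4-byte geometry-type code (1 for Point), and two IEEE 754 doubles. If you prefer not to pull in a geometry library, a minimal sketch of decoding just the point case looks like this (standard WKB layout; not a full parser for other geometry types):

```python
import struct

def parse_wkb_point(wkb: bytes):
    # Byte 0: 0 = big-endian, 1 = little-endian.
    endian = "<" if wkb[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", wkb, 1)
    if geom_type != 1:  # 1 is the WKB type code for Point
        raise ValueError(f"not a WKB point: type {geom_type}")
    x, y = struct.unpack_from(endian + "dd", wkb, 5)
    return x, y

# Round-trip check: encode the ATL coordinates, then decode them.
atl = struct.pack("<BIdd", 1, 1, -84.4281, 33.6367)
```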

# Spatial Functions
<a name="access-graph-opencypher-22-spatial-functions"></a>

The following spatial functions are available in Neptune openCypher for working with geometry data types:
+ [ST_Point](access-graph-opencypher-22-spatial-functions-st-point.md)
+ [ST_GeomFromText](access-graph-opencypher-22-spatial-functions-st-geomfromtext.md)
+ [ST_AsText](access-graph-opencypher-22-spatial-functions-st-astext.md)
+ [ST_GeometryType](access-graph-opencypher-22-spatial-functions-st-geometrytype.md)
+ [ST_Equals](access-graph-opencypher-22-spatial-functions-st-equals.md)
+ [ST_Contains](access-graph-opencypher-22-spatial-functions-st-contains.md)
+ [ST_Intersects](access-graph-opencypher-22-spatial-functions-st-intersect.md)
+ [ST_Distance](access-graph-opencypher-22-spatial-functions-st-distance.md)
+ [ST_DistanceSpheroid](access-graph-opencypher-22-spatial-functions-st-distancespheroid.md)
+ [ST_Envelope](access-graph-opencypher-22-spatial-functions-st-envelope.md)
+ [ST_Buffer](access-graph-opencypher-22-spatial-functions-st-buffer.md)

# ST_Point
<a name="access-graph-opencypher-22-spatial-functions-st-point"></a>

ST_Point returns a point from the input coordinate values.

**Syntax**

```
ST_Point(x, y, z)
```

**Arguments**
+ `x` - A value of data type DOUBLE PRECISION that represents a first coordinate.
+ `y` - A value of data type DOUBLE PRECISION that represents a second coordinate.
+ `z` - (optional) A value of data type DOUBLE PRECISION that represents a third coordinate. It is stored, but not used by the currently available spatial functions.

**Coordinate order**

When working with geographic coordinates, the first argument (`x`) represents **longitude** and the second argument (`y`) represents **latitude**. This follows the standard coordinate order used in spatial databases and the ISO 19125 standard.

```
// Correct: longitude first, latitude second
ST_Point(-84.4281, 33.6367)  // Atlanta airport

// Incorrect: latitude first, longitude second
ST_Point(33.6367, -84.4281)  // This will return NaN in distance calculations
```

**Valid coordinate ranges**

For geographic data, ensure coordinates fall within valid ranges:
+ Longitude (`x`): -180 to 180
+ Latitude (`y`): -90 to 90

Coordinates outside these ranges will return `NaN` (Not a Number) when used with distance calculation functions like `ST_DistanceSpheroid`.
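Because out-of-range or swapped coordinates surface only later as `NaN` in distance calculations, it can pay to validate values before writing them. A minimal sketch (the helper name is ours, not a Neptune API):

```python
def validate_lon_lat(lon: float, lat: float) -> None:
    # Longitude is the x coordinate, latitude the y coordinate (ISO 19125 order).
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude {lon} outside [-180, 180]")
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude {lat} outside [-90, 90]")

validate_lon_lat(-84.4281, 33.6367)  # Atlanta airport: passes silently
```

Note that swapped coordinates are not always caught this way; Atlanta's values swapped still fall inside both ranges, so range checks complement, rather than replace, careful argument ordering.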

**Return type**

GEOMETRY of subtype POINT

If x or y is null, then null is returned.

**Examples**

The following constructs a point geometry from the input coordinates.

```
RETURN ST_Point(5.0, 7.0); 
POINT(5 7)
```

# ST_GeomFromText
<a name="access-graph-opencypher-22-spatial-functions-st-geomfromtext"></a>

ST_GeomFromText constructs a geometry object from a well-known text (WKT) representation of an input geometry.

**Syntax**

```
ST_GeomFromText(wkt_string)
```

**Arguments**
+ `wkt_string` - A value of data type STRING that is a WKT representation of a geometry.

**Return type**

GEOMETRY

If wkt_string is null, then null is returned.

If wkt_string is not valid, then a BadRequestException is returned.

**Examples**

```
RETURN ST_GeomFromText('POLYGON((0 0,0 1,1 1,1 0,0 0))')             
POLYGON((0 0,0 1,1 1,1 0,0 0))
```

# ST_AsText
<a name="access-graph-opencypher-22-spatial-functions-st-astext"></a>

ST_AsText returns the well-known text (WKT) representation of an input geometry.

**Syntax**

```
ST_AsText(geo)
```

**Arguments**
+ `geo` - A value of data type GEOMETRY, or an expression that evaluates to a GEOMETRY.

**Return type**

STRING

If geo is null, then null is returned.

If the input parameter is not a Geometry, then a BadRequestException is returned.

If the result is larger than a 64-KB STRING, then an error is returned.

**Examples**

```
RETURN ST_AsText(ST_GeomFromText('POLYGON((0 0,0 1,1 1,1 0,0 0))'))             
POLYGON((0 0,0 1,1 1,1 0,0 0))
```

# ST_GeometryType
<a name="access-graph-opencypher-22-spatial-functions-st-geometrytype"></a>

ST_GeometryType returns the type of the geometry as a string.

**Syntax**

```
ST_GeometryType(geom)
```

**Arguments**
+ `geom` - A value of data type GEOMETRY or an expression that evaluates to a GEOMETRY type.

**Return type**

STRING

If geom is null, then null is returned.

If the input parameter is not a Geometry, then a BadRequestException is returned.

**Examples**

```
RETURN ST_GeometryType(ST_GeomFromText('LINESTRING(77.29 29.07,77.42 29.26,77.27 29.31,77.29 29.07)'));
ST_LineString
```

# ST_Equals
<a name="access-graph-opencypher-22-spatial-functions-st-equals"></a>

ST_Equals returns true if the 2D projections of the input geometries are topologically equal. Geometries are considered topologically equal if they have equal point sets. In topologically equal geometries, the order of vertices may differ while maintaining this equality.

**Syntax**

```
ST_Equals(geom1, geom2)
```

**Arguments**
+ `geom1` - A value of data type GEOMETRY or an expression that evaluates to a GEOMETRY type.
+ `geom2` - A value of data type GEOMETRY or an expression that evaluates to a GEOMETRY type. This value is compared with geom1 to determine if it is equal to geom1.

**Return type**

BOOLEAN

If geom1 or geom2 is null, then null is returned.

If geom1 or geom2 are not Geometries, then a BadRequestException is returned.

**Examples**

```
RETURN ST_Equals(
    ST_GeomFromText('POLYGON ((0 2,1 1,0 -1,0 2))'), 
    ST_GeomFromText('POLYGON((-1 3,2 1,0 -3,-1 3))'));
false
```

The following checks if the two linestrings are geometrically equal.

```
RETURN ST_Equals(
    ST_GeomFromText('LINESTRING (1 0, 10 0)'), 
    ST_GeomFromText('LINESTRING(1 0,5 0,10 0)'));
true
```
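The second result holds because the two point sets are identical: the middle vertex (5 0) lies on the segment from (1 0) to (10 0), so it adds nothing. A sketch of that check (our helper, not a Neptune function): drop any interior vertex that is collinear with, and between, its neighbors, then compare the normalized lines.

```python
def normalize(line):
    # Remove interior vertices that lie on the segment joining their neighbors.
    out = [line[0]]
    for a, b, c in zip(line, line[1:], line[2:]):
        # Zero cross product means a, b, c are collinear.
        cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
        between = min(a[0], c[0]) <= b[0] <= max(a[0], c[0]) and \
                  min(a[1], c[1]) <= b[1] <= max(a[1], c[1])
        if not (cross == 0 and between):
            out.append(b)
    out.append(line[-1])
    return out
```

With this, `normalize([(1, 0), (5, 0), (10, 0)])` reduces to the two-vertex line, mirroring why `ST_Equals` returns true above.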

# ST_Contains
<a name="access-graph-opencypher-22-spatial-functions-st-contains"></a>

ST_Contains returns true if the 2D projection of the first input geometry contains the 2D projection of the second input geometry. Geometry A contains geometry B if every point in B is a point in A, and their interiors have nonempty intersection. ST_Contains(A, B) is equivalent to ST_Within(B, A).

**Syntax**

```
ST_Contains(geom1, geom2)
```

**Arguments**
+ `geom1` - A value of type GEOMETRY or an expression that evaluates to a GEOMETRY type.
+ `geom2` - A value of type GEOMETRY or an expression that evaluates to a GEOMETRY type. This value is compared with geom1 to determine if it is contained within geom1.

**Return type**

BOOLEAN

If geom1 or geom2 is null, then null is returned.

If the input parameter is not a Geometry, then a BadRequestException is returned.

**Examples**

```
RETURN ST_Contains(
    ST_GeomFromText('POLYGON((0 2,1 1,0 -1,0 2))'), 
    ST_GeomFromText('POLYGON((-1 3,2 1,0 -3,-1 3))'));
false
```

# ST_Intersects
<a name="access-graph-opencypher-22-spatial-functions-st-intersect"></a>

ST_Intersects returns true if the 2D projections of the two input geometries have at least one point in common.

**Syntax**

```
ST_Intersects(geom1, geom2)
```

**Arguments**
+ `geom1` - A value of data type GEOMETRY or an expression that evaluates to a GEOMETRY type.
+ `geom2` - A value of data type GEOMETRY or an expression that evaluates to a GEOMETRY type.

**Return type**

BOOLEAN

If geom1 or geom2 is null, then null is returned.

If the input parameter is not a Geometry, then a BadRequestException is returned.

**Examples**

```
RETURN ST_Intersects(
    ST_GeomFromText('POLYGON((0 0,10 0,10 10,0 10,0 0),(2 2,2 5,5 5,5 2,2 2))'), 
    ST_GeomFromText('MULTIPOINT((4 4),(6 6))'));
true
```
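In that example the point (4 4) falls inside the interior ring (the hole), so only (6 6) actually lies in the polygon; one shared point is enough for `ST_Intersects` to return true. A sketch of the even-odd (ray-casting) membership test behind that reasoning (our helper for the interior case only; boundary semantics need the full topological predicates):

```python
# Even-odd (ray-casting) point-in-ring test; rings are lists of (x, y) vertices.
def point_in_ring(pt, ring):
    x, y = pt
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray at height y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# The polygon from the example: a 10x10 square with a square hole.
outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (2, 5), (5, 5), (5, 2)]

def in_polygon_with_hole(pt):
    return point_in_ring(pt, outer) and not point_in_ring(pt, hole)
```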

# ST_Distance
<a name="access-graph-opencypher-22-spatial-functions-st-distance"></a>

For input geometries, ST_Distance returns the minimum Euclidean distance between the 2D projections of the two input geometry values.

**Syntax**

```
ST_Distance(geo1, geo2)
```

**Arguments**
+ `geo1` - A value of data type GEOMETRY, or an expression that evaluates to a GEOMETRY type.
+ `geo2` - A value of data type GEOMETRY, or an expression that evaluates to a GEOMETRY.

**Return type**

DOUBLE PRECISION in the same units as the input geometries.

If geo1 or geo2 is null, then null is returned.

If the input parameter is not a Geometry, then a BadRequestException is returned.

**Examples**

```
RETURN ST_Distance(
    ST_GeomFromText('POLYGON((0 2,1 1,0 -1,0 2))'), 
    ST_GeomFromText('POLYGON((-1 -3,-2 -1,0 -3,-1 -3))'));
1.4142135623731
```
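In this example the minimum is attained between the vertex (0 -1) of the first polygon and the edge from (-2 -1) to (0 -3) of the second, which is why the result is √2 rather than a vertex-to-vertex distance. A sketch of the point-to-segment calculation that produces it (our helper, based on inspecting the example geometries):

```python
import math

def point_segment_distance(p, a, b):
    # Project p onto segment ab, clamping the projection to the endpoints.
    ax, ay = a
    bx, by = b
    px, py = p
    abx, aby = bx - ax, by - ay
    t = ((px - ax) * abx + (py - ay) * aby) / (abx * abx + aby * aby)
    t = max(0.0, min(1.0, t))
    return math.dist(p, (ax + t * abx, ay + t * aby))

d = point_segment_distance((0, -1), (-2, -1), (0, -3))
```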

# ST_DistanceSpheroid
<a name="access-graph-opencypher-22-spatial-functions-st-distancespheroid"></a>

ST_DistanceSpheroid returns the minimum distance in meters between two longitude/latitude geometries. The spheroid is WGS84 (SRID 4326).

**Syntax**

```
ST_DistanceSpheroid(geom1, geom2);
```

**Arguments**
+ `geom1` - A value of data type GEOMETRY or an expression that evaluates to a GEOMETRY type.
+ `geom2` - A value of data type GEOMETRY or an expression that evaluates to a GEOMETRY type.

**Return type**

FLOAT

If geom1 or geom2 is null, then null is returned.

**Examples**

```
RETURN ST_DistanceSpheroid(
    ST_GeomFromText('POINT(-110 42)'),
    ST_GeomFromText('POINT(-118 38)'))
814278.77
```
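As a sanity check on that figure, the spherical (haversine) distance with a mean Earth radius of 6,371 km lands within a fraction of a percent of the spheroidal result; the small gap is the WGS84 flattening that ST_DistanceSpheroid accounts for. A sketch of the spherical approximation (our helper; Neptune itself computes on the ellipsoid):

```python
import math

def haversine_m(lon1, lat1, lon2, lat2, r=6_371_000.0):
    # Great-circle distance on a sphere of radius r, in meters.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

d = haversine_m(-110, 42, -118, 38)  # close to the 814278.77 m spheroidal result
```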

# ST_Envelope
<a name="access-graph-opencypher-22-spatial-functions-st-envelope"></a>

ST_Envelope returns the minimum bounding box of the input geometry, as follows:
+ If the input geometry is empty, the returned geometry will be POINT EMPTY.
+ If the minimum bounding box of the input geometry degenerates to a point, the returned geometry is a point.
+ If none of the preceding is true, the function returns a counter-clockwise-oriented polygon whose vertices are the corners of the minimum bounding box.

For all nonempty input, the function operates on the 2D projection of the input geometry.

**Syntax**

```
ST_Envelope(geom)
```

**Arguments**
+ `geom` - A value of data type GEOMETRY or an expression that evaluates to a GEOMETRY type.

**Return type**

GEOMETRY

If geom is null, then null is returned.

**Examples**

```
RETURN ST_Envelope(ST_GeomFromText("POLYGON ((2 1, 4 3, 6 1, 5 5, 3 4, 2 1))"))
POLYGON ((2 1, 6 1, 6 5, 2 5, 2 1))
```
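The bounding box is simply the coordinate extremes of the input vertices, emitted as a counter-clockwise ring closed by repeating the first vertex. A sketch reproducing the example's result (our helper, not Neptune's implementation):

```python
def envelope(coords):
    # Minimum bounding box of a list of (x, y) vertices.
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    # Counter-clockwise ring starting at the lower-left corner, closed.
    return [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]

ring = envelope([(2, 1), (4, 3), (6, 1), (5, 5), (3, 4)])
```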

# ST_Buffer
<a name="access-graph-opencypher-22-spatial-functions-st-buffer"></a>

ST_Buffer returns a 2D geometry that represents all points whose distance from the input geometry, projected on the xy-Cartesian plane, is less than or equal to the input distance.

**Syntax**

```
ST_Buffer(geom, distance, number_of_segments_per_quarter_circle)
```

**Arguments**
+ `geom` - A value of data type GEOMETRY or an expression that evaluates to a GEOMETRY type.
+ `distance` - A value of data type DOUBLE PRECISION that represents distance (or radius) of the buffer.
+ `number_of_segments_per_quarter_circle` - A value of data type INTEGER (should be larger or equal to 0). This value determines the number of points to approximate a quarter circle around each vertex of the input geometry. Negative values default to zero. The default is 8.

**Return type**

GEOMETRY

The ST_Buffer function returns two-dimensional (2D) geometry in the xy-Cartesian plane.

**Examples**

```
RETURN ST_Buffer(ST_GeomFromText('LINESTRING (1 2,5 2,5 8)'), 2, 4);
POLYGON ((3 4, 3 8, 3.1522409349774265 8.76536686473018,
         3.585786437626905 9.414213562373096, 4.234633135269821 9.847759065022574,
         5 10, 5.765366864730179 9.847759065022574,
         6.414213562373095 9.414213562373096, 6.847759065022574 8.76536686473018,
         7 8, 7 2, 6.847759065022574 1.2346331352698203,
         6.414213562373095 0.5857864376269051, 5.765366864730179 0.1522409349774265,
         5 0, 1 0, 0.2346331352698193 0.152240934977427,
         -0.4142135623730954 0.5857864376269051,
         -0.8477590650225737 1.2346331352698208, -1 2.0000000000000004,
         -0.8477590650225735 2.7653668647301797,
         -0.4142135623730949 3.414213562373095,
         0.2346331352698206 3.8477590650225735, 1 4, 3 4))
```

The following returns the buffer of the input point geometry, which approximates a circle. Because the command specifies 8 as the number of segments per quarter circle, the function uses eight segments to approximate each quarter of the circle.

```
RETURN ST_Buffer(ST_GeomFromText('POINT (1 1)'), 1.0, 8);
POLYGON ((2 1, 1.9807852804032304 0.8049096779838718,
     1.9238795325112867 0.6173165676349102, 1.8314696123025453 0.4444297669803978,
     1.7071067811865475 0.2928932188134525, 1.5555702330196022 0.1685303876974548,
     1.3826834323650898 0.0761204674887133, 1.1950903220161284 0.0192147195967696,
     1 0, 0.8049096779838718 0.0192147195967696, 0.6173165676349103 0.0761204674887133,
    0.444429766980398 0.1685303876974545, 0.2928932188134525 0.2928932188134524,
     0.1685303876974546 0.4444297669803978, 0.0761204674887133 0.6173165676349102,
     0.0192147195967696 0.8049096779838714, 0 0.9999999999999999,
     0.0192147195967696 1.1950903220161284, 0.0761204674887132 1.3826834323650896,
     0.1685303876974545 1.555570233019602, 0.2928932188134523 1.7071067811865475,
     0.4444297669803978 1.8314696123025453, 0.6173165676349097 1.9238795325112865,
     0.8049096779838714 1.9807852804032304, 0.9999999999999998 2,
     1.1950903220161284 1.9807852804032304, 1.38268343236509 1.9238795325112865,
     1.5555702330196017 1.8314696123025453, 1.7071067811865475 1.7071067811865477,
     1.8314696123025453 1.5555702330196022, 1.9238795325112865 1.3826834323650905,
     1.9807852804032304 1.1950903220161286, 2 1))
```
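For a point input, the buffer is a regular polygon approximating a circle: 4 × n distinct vertices for n segments per quarter circle, plus one repeated vertex to close the ring, which is why the output above has 33 coordinate pairs. A sketch generating such a ring (pure geometry; vertex ordering may differ from Neptune's):

```python
import math

def point_buffer_ring(cx, cy, radius, segments_per_quarter=8):
    # Approximate a circle with 4 * n segments, then close the ring.
    n = 4 * segments_per_quarter
    ring = [(cx + radius * math.cos(2 * math.pi * k / n),
             cy + radius * math.sin(2 * math.pi * k / n)) for k in range(n)]
    ring.append(ring[0])
    return ring

ring = point_buffer_ring(1.0, 1.0, 1.0)  # 33 vertices, all at distance 1 from (1, 1)
```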

# Accessing the Neptune graph with SPARQL
<a name="access-graph-sparql"></a>

SPARQL is a query language for the Resource Description Framework (RDF), which is a graph data format designed for the web. Amazon Neptune is compatible with SPARQL 1.1. This means that you can connect to a Neptune DB instance and query the graph using the query language described in the [SPARQL 1.1 Query Language](https://www.w3.org/TR/sparql11-query/) specification.

 A query in SPARQL consists of a `SELECT` clause to specify the variables to return and a `WHERE` clause to specify which data to match in the graph. If you are unfamiliar with SPARQL queries, see [Writing Simple Queries](https://www.w3.org/TR/sparql11-query/#WritingSimpleQueries) in the [SPARQL 1.1 Query Language](https://www.w3.org/TR/sparql11-query/).

**Important**  
To load data, `SPARQL UPDATE INSERT` may work well for a small dataset, but if you need to load a substantial amount of data from a file, see [Using the Amazon Neptune bulk loader to ingest data](bulk-load.md).

For more information about the specifics of Neptune's SPARQL implementation, see [SPARQL standards compliance](feature-sparql-compliance.md).
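All of the tools below ultimately speak the SPARQL 1.1 Protocol over HTTP: a query is a form-encoded POST to the cluster's `/sparql` endpoint. A minimal sketch of building such a request (the endpoint host and port are placeholders; clusters with IAM authentication additionally require Signature Version 4 signing, which is not shown):

```python
import urllib.request
from urllib.parse import urlencode

def build_sparql_request(endpoint, query):
    # SPARQL 1.1 Protocol: query sent as an application/x-www-form-urlencoded POST.
    data = urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
        method="POST",
    )

req = build_sparql_request(
    "https://your-neptune-endpoint:8182/sparql",   # placeholder host
    "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10",
)
# urllib.request.urlopen(req) would execute it from inside the VPC.
```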

Before you begin, you must have the following:
+ A Neptune DB instance. For information about creating a Neptune DB instance, see [Creating an Amazon Neptune cluster](get-started-create-cluster.md).
+ An Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

**Topics**
+ [Using the RDF4J console to connect to a Neptune DB instance](access-graph-sparql-rdf4j-console.md)
+ [Using RDF4J Workbench to connect to a Neptune DB instance](access-graph-sparql-rdf4j-workbench.md)
+ [Using Java to connect to a Neptune DB instance](access-graph-sparql-java.md)
+ [SPARQL HTTP API](sparql-api-reference.md)
+ [SPARQL query hints](sparql-query-hints.md)
+ [SPARQL DESCRIBE behavior with respect to the default graph](sparql-default-describe.md)
+ [SPARQL query status API](sparql-api-status.md)
+ [SPARQL query cancellation](sparql-api-status-cancel.md)
+ [Using the SPARQL 1.1 Graph Store HTTP Protocol (GSP) in Amazon Neptune](sparql-graph-store-protocol.md)
+ [Analyzing Neptune query execution using SPARQL `explain`](sparql-explain.md)
+ [SPARQL federated queries in Neptune using the `SERVICE` extension](sparql-service.md)

# Using the RDF4J console to connect to a Neptune DB instance
<a name="access-graph-sparql-rdf4j-console"></a>



The RDF4J Console allows you to experiment with Resource Description Framework (RDF) graphs and queries in a REPL (read-eval-print loop) environment. 

You can add a remote graph database as a repository and query it from the RDF4J Console. This section walks you through the configuration of the RDF4J Console to connect remotely to a Neptune DB instance.

**To connect to Neptune using the RDF4J Console**

1. Download the RDF4J SDK from the [Download page](http://rdf4j.org/download/) on the RDF4J website.

1. Unzip the RDF4J SDK zip file.

1. In a terminal, navigate to the RDF4J SDK directory, and then enter the following command to run the RDF4J Console:

   ```
   bin/console.sh
   ```

   You should see output similar to the following:

   ```
   14:11:51.126 [main] DEBUG o.e.r.c.platform.PlatformFactory - os.name = linux
   14:11:51.130 [main] DEBUG o.e.r.c.platform.PlatformFactory - Detected Posix platform
   Connected to default data directory
   RDF4J Console 3.6.1
   
   3.6.1
   Type 'help' for help.
   >
   ```

   You are now at the `>` prompt. This is the general prompt for the RDF4J Console. You use this prompt for setting up repositories and other operations. A repository has its own prompt for running queries.

1. At the `>` prompt, enter the following to create a SPARQL repository for your Neptune DB instance:

    

   ```
   create sparql
   ```

1. The RDF4J Console prompts you for values for the variables required to connect to the SPARQL endpoint.

   ```
   Please specify values for the following variables:
   ```

   Specify the following values:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/neptune/latest/userguide/access-graph-sparql-rdf4j-console.html)

   For information about finding the address of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section.

   If the operation is successful, you see the following message:

    

   ```
   Repository created
   ```

1. At the `>` prompt, enter the following to connect to the Neptune DB instance:

   ```
   open neptune
   ```

   If the operation is successful, you see the following message:

    

   ```
   Opened repository 'neptune'
   ```

   You are now at the `neptune>` prompt. At this prompt, you can run queries against the Neptune graph.

    
**Note**  
Now that you have added the repository, the next time you run `bin/console.sh`, you can immediately run the `open neptune` command to connect to the Neptune DB instance.

1. At the `neptune>` prompt, enter the following to run a SPARQL query that returns up to 10 of the triples (subject-predicate-object) in the graph by using the `?s ?p ?o` query with a limit of 10. To query for something else, replace the text after the `sparql` command with another SPARQL query.

   ```
   sparql select ?s ?p ?o where {?s ?p ?o} limit 10
   ```

# Using RDF4J Workbench to connect to a Neptune DB instance
<a name="access-graph-sparql-rdf4j-workbench"></a>

This section walks you through connecting to an Amazon Neptune DB instance using RDF4J Workbench and RDF4J Server. RDF4J Server is required because it acts as a proxy between the Neptune SPARQL HTTP REST endpoint and RDF4J Workbench. 

RDF4J Workbench provides an easy interface for experimenting with a graph, including loading local files. For information, see the [Add section](https://rdf4j.org/documentation/tools/server-workbench/#add) in the RDF4J documentation.

**Prerequisites**  
Before you begin, do the following:
+ Install Java 1.8 or later.
+ Install RDF4J Server and RDF4J Workbench. For information, see [Installing RDF4J Server and RDF4J Workbench](https://rdf4j.org/documentation/tools/server-workbench/#installing-rdf4j-server-and-rdf4j-workbench).

**To use RDF4J Workbench to connect to Neptune**

1. In a web browser, navigate to the URL where the RDF4J Workbench web app is deployed. For example, if you are using Apache Tomcat, the URL is: `http://ec2-hostname:8080/rdf4j-workbench/`.

1. If you are asked to **Connect to RDF4J Server**, verify that **RDF4J Server** is installed, running, and that the server URL is correct. Then, proceed to the next step.

1. In the left pane, choose **New repository**.

   In **New repository**:
   + In the **Type** drop-down list, choose **SPARQL endpoint proxy**.
   + For **ID**, type **neptune**.
   + For **Title**, type **Neptune DB instance**.

   Choose **Next**.

1. In **New repository**:
   + For **SPARQL query endpoint URL**, type `https://your-neptune-endpoint:port/sparql`.
   + For **SPARQL update endpoint URL**, type `https://your-neptune-endpoint:port/sparql`.

   For information about finding the address of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section. 

   Choose **Create**.

1. The **neptune** repository now appears in the list of repositories. It might take a few minutes before you can use the new repository.

1. In the **Id** column of the table, choose the **neptune** link.

1. In the left pane, choose **Query**. 

    
**Note**  
If the menu items under **Explore** are disabled, you might need to reconnect to the RDF4J Server and choose the **neptune** repository again.  
You can do this by using the **[change]** links in the upper-right corner.

1. In the query field, type the following SPARQL query, and then choose **Execute**.

    

   ```
   select ?s ?p ?o where {?s ?p ?o} limit 10
   ```

    

The preceding example returns up to 10 of the triples (subject-predicate-object) in the graph by using the `?s ?p ?o` query with a limit of 10. 

# Using Java to connect to a Neptune DB instance
<a name="access-graph-sparql-java"></a>

This section walks you through the running of a complete Java sample that connects to an Amazon Neptune DB instance and performs a SPARQL query.

Follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

**To connect to Neptune using Java**

1. Install Apache Maven on your EC2 instance. If using Amazon Linux 2023 (preferred), use:

   ```
   sudo dnf update -y
   sudo dnf install maven -y
   ```

   If using Amazon Linux 2, download the latest binary from [https://maven.apache.org/download.cgi](https://maven.apache.org/download.cgi):

   ```
   sudo yum remove maven -y
   wget https://dlcdn.apache.org/maven/maven-3/<version>/binaries/apache-maven-<version>-bin.tar.gz
   sudo tar -xzf apache-maven-<version>-bin.tar.gz -C /opt/
   sudo ln -sf /opt/apache-maven-<version> /opt/maven
   echo 'export MAVEN_HOME=/opt/maven' >> ~/.bashrc
   echo 'export PATH=$MAVEN_HOME/bin:$PATH' >> ~/.bashrc
   source ~/.bashrc
   ```

1. This example was tested with Java 8 only. Enter the following to install Java 8 on your EC2 instance:

   ```
   sudo yum install java-1.8.0-devel
   ```

1. Enter the following to set Java 8 as the default runtime on your EC2 instance:

   ```
   sudo /usr/sbin/alternatives --config java
   ```

   When prompted, enter the number for Java 8.

1. Enter the following to set Java 8 as the default compiler on your EC2 instance: 

   ```
   sudo /usr/sbin/alternatives --config javac
   ```

   When prompted, enter the number for Java 8.

1. In a new directory, create a `pom.xml` file, and then open it in a text editor.

1. Copy the following into the `pom.xml` file and save it (you can usually adjust the version numbers to the latest stable version):

   ```
   <project xmlns="https://maven.apache.org/POM/4.0.0" xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="https://maven.apache.org/POM/4.0.0 https://maven.apache.org/maven-v4_0_0.xsd">
     <modelVersion>4.0.0</modelVersion>
     <groupId>com.amazonaws</groupId>
     <artifactId>RDFExample</artifactId>
     <packaging>jar</packaging>
     <version>1.0-SNAPSHOT</version>
     <name>RDFExample</name>
     <url>https://maven.apache.org</url>
     <dependencies>
       <dependency>
         <groupId>org.eclipse.rdf4j</groupId>
         <artifactId>rdf4j-runtime</artifactId>
         <version>3.6</version>
       </dependency>
     </dependencies>
     <build>
       <plugins>
         <plugin>
             <groupId>org.codehaus.mojo</groupId>
             <artifactId>exec-maven-plugin</artifactId>
             <version>1.2.1</version>
             <configuration>
               <mainClass>com.amazonaws.App</mainClass>
             </configuration>
         </plugin>
         <plugin>
           <groupId>org.apache.maven.plugins</groupId>
           <artifactId>maven-compiler-plugin</artifactId>
           <configuration>
             <source>1.8</source>
             <target>1.8</target>
           </configuration>
         </plugin>
       </plugins>
     </build>
   </project>
   ```
**Note**  
If you are modifying an existing Maven project, the required dependency is the `rdf4j-runtime` entry in the `<dependencies>` section of the preceding code.

1. To create subdirectories for the example source code (`src/main/java/com/amazonaws/`), enter the following at the command line:

   ```
   mkdir -p src/main/java/com/amazonaws/
   ```

1. In the `src/main/java/com/amazonaws/` directory, create a file named `App.java`, and then open it in a text editor.

1. Copy the following into the `App.java` file. Replace *your-neptune-endpoint* with the address of your Neptune DB instance.
**Note**  
For information about finding the hostname of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section. 

   ```
   package com.amazonaws;
   
    import org.eclipse.rdf4j.repository.Repository;
    import org.eclipse.rdf4j.repository.RepositoryConnection;
    import org.eclipse.rdf4j.repository.sparql.SPARQLRepository;
    import org.eclipse.rdf4j.query.TupleQuery;
    import org.eclipse.rdf4j.query.TupleQueryResult;
    import org.eclipse.rdf4j.query.BindingSet;
    import org.eclipse.rdf4j.query.QueryLanguage;
    import org.eclipse.rdf4j.model.Value;
   
   public class App
   {
       public static void main( String[] args )
       {
           String sparqlEndpoint = "https://your-neptune-endpoint:port/sparql";
           Repository repo = new SPARQLRepository(sparqlEndpoint);
        repo.init(); // initialize() is deprecated in RDF4J 3.x
   
           try (RepositoryConnection conn = repo.getConnection()) {
              String queryString = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } limit 10";
   
              TupleQuery tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString);
   
              try (TupleQueryResult result = tupleQuery.evaluate()) {
                 while (result.hasNext()) {  // iterate over the result
                      BindingSet bindingSet = result.next();
   
                      Value s = bindingSet.getValue("s");
                      Value p = bindingSet.getValue("p");
                      Value o = bindingSet.getValue("o");
   
                      System.out.print(s);
                      System.out.print("\t");
                      System.out.print(p);
                      System.out.print("\t");
                      System.out.println(o);
                 }
              }
           }
       }
   }
   ```

1. Use the following Maven command to compile and run the sample:

   ```
   mvn compile exec:java
   ```

The preceding example uses the `?s ?p ?o` pattern with `limit 10` to return up to 10 of the triples (subject-predicate-object) in the graph. To query for something else, replace the value of `queryString` with another SPARQL query.

The iteration over the results prints the value of each variable returned. Each `Value` object is converted to a `String` and then printed. If you change the variables in the `SELECT` clause of the query, you must modify the code to read the new variable names from each `BindingSet`.

# SPARQL HTTP API
<a name="sparql-api-reference"></a>

SPARQL HTTP requests are accepted at the following endpoint: `https://your-neptune-endpoint:port/sparql`

For more information about connecting to Amazon Neptune with SPARQL, see [Accessing the Neptune graph with SPARQL](access-graph-sparql.md).

For more information about the SPARQL protocol and query language, see the [SPARQL 1.1 Protocol](https://www.w3.org/TR/sparql11-protocol/#protocol) and the [SPARQL 1.1 Query Language](https://www.w3.org/TR/sparql11-query/) specification.

The following topics provide information about SPARQL RDF serialization formats and how to use the SPARQL HTTP API with Neptune.

**Contents**
+ [Using the HTTP REST endpoint to connect to a Neptune DB instance](access-graph-sparql-http-rest.md)
+ [Optional HTTP trailing headers for multi-part SPARQL responses](access-graph-sparql-http-trailing-headers.md)
+ [RDF media types used by SPARQL in Neptune](sparql-media-type-support.md)
  + [RDF serialization formats used by Neptune SPARQL](sparql-media-type-support.md#sparql-serialization-formats)
  + [SPARQL result serialization formats used by Neptune SPARQL](sparql-media-type-support.md#sparql-serialization-formats-neptune-output)
  + [Media-Types that Neptune can use to import RDF data](sparql-media-type-support.md#sparql-serialization-formats-input)
  + [Media-Types that Neptune can use to export query results](sparql-media-type-support.md#sparql-serialization-formats-output)
+ [Using SPARQL UPDATE LOAD to import data into Neptune](sparql-api-reference-update-load.md)
+ [Using SPARQL UPDATE UNLOAD to delete data from Neptune](sparql-api-reference-unload.md)

# Using the HTTP REST endpoint to connect to a Neptune DB instance
<a name="access-graph-sparql-http-rest"></a>

**Note**  
Neptune does not currently support HTTP/2 for REST API requests. Clients must use HTTP/1.1 when connecting to endpoints.

The following instructions walk you through connecting to the SPARQL endpoint using the **curl** command, connecting through HTTPS, and using HTTP syntax. Follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

The HTTP endpoint for SPARQL queries to a Neptune DB instance is:  `https://your-neptune-endpoint:port/sparql`.

**Note**  
For information about finding the hostname of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section.

Amazon Neptune provides an HTTP endpoint for SPARQL queries. The REST interface is compatible with SPARQL version 1.1.

**QUERY Using HTTP POST**  
The following example uses **curl** to submit a SPARQL **`QUERY`** through HTTP **POST**.

```
curl -X POST --data-binary 'query=select ?s ?p ?o where {?s ?p ?o} limit 10' https://your-neptune-endpoint:port/sparql
```

The preceding example uses the `?s ?p ?o` pattern with `limit 10` to return up to 10 of the triples (subject-predicate-object) in the graph. To query for something else, replace it with another SPARQL query.

**Note**  
The default MIME media type of a response is `application/sparql-results+json` for `SELECT` and `ASK` queries.  
The default MIME type of a response is `application/n-quads` for `CONSTRUCT` and `DESCRIBE` queries.  
For a list of the media types used by Neptune for serialization, see [RDF serialization formats used by Neptune SPARQL](sparql-media-type-support.md#sparql-serialization-formats).
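Because `SELECT` results default to `application/sparql-results+json`, a client can consume the response with any JSON library. The following is a minimal Python sketch of parsing that format; the payload shown is a fabricated sample of the documented structure, not output captured from a live cluster:

```python
import json

# A SELECT response in application/sparql-results+json form
# (illustrative sample; a real response comes from the
# https://your-neptune-endpoint:port/sparql endpoint).
response_body = """
{
  "head": { "vars": ["s", "p", "o"] },
  "results": { "bindings": [
    { "s": {"type": "uri", "value": "https://test.com/s"},
      "p": {"type": "uri", "value": "https://test.com/p"},
      "o": {"type": "uri", "value": "https://test.com/o"} }
  ] }
}
"""

def rows(body):
    """Yield one dict of variable -> value per result row."""
    doc = json.loads(body)
    variables = doc["head"]["vars"]
    for binding in doc["results"]["bindings"]:
        # A variable can be unbound in a row, hence the .get()
        yield {v: binding.get(v, {}).get("value") for v in variables}

for row in rows(response_body):
    print(row["s"], row["p"], row["o"], sep="\t")
```

Each binding also carries a `type` field (`uri`, `literal`, or `bnode`) that a fuller client would inspect before using the value.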

**UPDATE Using HTTP POST**  
The following example uses **curl** to submit a SPARQL **`UPDATE`** through HTTP **POST**.

```
curl -X POST --data-binary 'update=INSERT DATA { <https://test.com/s> <https://test.com/p> <https://test.com/o> . }' https://your-neptune-endpoint:port/sparql
```

The preceding example inserts the following triple into the SPARQL default graph: `<https://test.com/s> <https://test.com/p> <https://test.com/o>`
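When you build the same request programmatically instead of with **curl**, the update statement must be form-encoded under the `update` key, since the SPARQL 1.1 protocol accepts `POST` bodies of type `application/x-www-form-urlencoded`. A small Python sketch of just the encoding step:

```python
from urllib.parse import urlencode

# The INSERT DATA statement from the curl example above.
update = "INSERT DATA { <https://test.com/s> <https://test.com/p> <https://test.com/o> . }"

# Form-encode it under the 'update' key so that IRIs, braces,
# and spaces survive transport in the POST body.
body = urlencode({"update": update})
print(body)
```

The resulting string is what you would send as the request body, with a `Content-Type: application/x-www-form-urlencoded` header, to `https://your-neptune-endpoint:port/sparql`.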

# Optional HTTP trailing headers for multi-part SPARQL responses
<a name="access-graph-sparql-http-trailing-headers"></a>

The HTTP response to SPARQL queries and updates is often returned in more than one part or chunk. It can be hard to diagnose a failure that occurs after a query or update begins sending these chunks, especially since the first one arrives with an HTTP status code of `200`.

Unless you explicitly request trailing headers, Neptune can only report such a failure by appending an error message to the response body, which leaves the body malformed for its declared media type.

To make detection and diagnosis of this kind of problem easier, you can include a transfer-encoding (TE) trailers header (`te: trailers`) in your request (see, for example, [the MDN page about TE request headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/TE)). Doing this will cause Neptune to include two new header fields within the trailing headers of the response chunks:
+ `X-Neptune-Status`  –   contains the response code followed by a short name. For instance, in the case of success the trailing header would be `X-Neptune-Status: 200 OK`. In the case of failure, the response code would be a [Neptune engine error code](errors-engine-codes.md), such as `X-Neptune-Status: 500 TimeLimitExceededException`.
+ `X-Neptune-Detail`  –   is empty for successful requests. In the case of errors, it contains the JSON error message. Because only ASCII characters are allowed in HTTP header values, the JSON string is URL encoded. The error message is also still appended to the response message body.
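Because the JSON in `X-Neptune-Detail` is URL encoded, a client must decode it before parsing. A sketch of that step (the trailer value below is a fabricated example of the general shape described above, not a value captured from Neptune):

```python
import json
from urllib.parse import unquote

# Hypothetical URL-encoded X-Neptune-Detail trailer value.
x_neptune_detail = (
    "%7B%22code%22%3A%22TimeLimitExceededException%22%2C"
    "%22detailedMessage%22%3A%22Query%20timed%20out%22%7D"
)

def decode_detail(value):
    """URL-decode the trailer and parse the embedded JSON error."""
    return json.loads(unquote(value)) if value else None

error = decode_detail(x_neptune_detail)
print(error["code"], "-", error["detailedMessage"])
```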

# RDF media types used by SPARQL in Neptune
<a name="sparql-media-type-support"></a>

Resource Description Framework (RDF) data can be serialized in many different ways, most of which SPARQL can consume or output:

## RDF serialization formats used by Neptune SPARQL
<a name="sparql-serialization-formats"></a>
+ **RDF/XML**  –   XML serialization of RDF, defined in [RDF 1.1 XML Syntax](https://www.w3.org/TR/rdf-syntax-grammar/). Media type: `application/rdf+xml`. Typical file extension: `.rdf`.
+ **N-Triples**  –   A line-based, plain-text format for encoding an RDF graph, defined in [RDF 1.1 N-Triples](https://www.w3.org/TR/n-triples/). Media type: `application/n-triples` (or the older `text/plain`). Typical file extension: `.nt`.
+ **N-Quads**  –   A line-based, plain-text format for encoding an RDF graph, defined in [RDF 1.1 N-Quads](https://www.w3.org/TR/n-quads/). It is an extension of N-Triples. Media type: `application/n-quads`, or `text/x-nquads` when encoded with 7-bit US-ASCII. Typical file extension: `.nq`.
+ **Turtle**  –   A textual syntax for RDF defined in [RDF 1.1 Turtle](https://www.w3.org/TR/turtle/) that allows an RDF graph to be completely written in a compact and natural text form, with abbreviations for common usage patterns and datatypes. Turtle provides levels of compatibility with the N-Triples format as well as with SPARQL's triple-pattern syntax. Media type: `text/turtle`. Typical file extension: `.ttl`.
+ **TriG**  –   A textual syntax for RDF defined in [RDF 1.1 TriG](https://www.w3.org/TR/trig/) that extends the Turtle format to support named graphs. Media type: `application/trig`. Typical file extension: `.trig`.
+ **N3 (Notation3)**  –   An assertion and logic language defined in [Notation3 (N3): A readable RDF syntax](https://www.w3.org/TeamSubmission/n3/). N3 extends the RDF data model by adding formulae (literals which are graphs themselves), variables, logical implication, and functional predicates, and provides a textual syntax alternative to RDF/XML. Media type: `text/n3`. Typical file extension: `.n3`.
+ **JSON-LD**  –   A data serialization and messaging format defined in [JSON-LD 1.0](https://www.w3.org/TR/json-ld/). Media type: `application/ld+json`. Typical file extension: `.jsonld`.
+ **TriX**  –   A serialization of RDF in XML, defined in [TriX: RDF Triples in XML](https://www.hpl.hp.com/techreports/2004/HPL-2004-56.html). Media type: `application/trix`. Typical file extension: `.trix`.
+ **SPARQL JSON Results**  –   A serialization of RDF using the [SPARQL 1.1 Query Results JSON Format](https://www.w3.org/TR/sparql11-results-json). Media type: `application/sparql-results+json`. Typical file extension: `.srj`.
+ **RDF4J Binary Format**  –   A binary format for encoding RDF data, documented in [RDF4J Binary RDF Format](https://rdf4j.org/documentation/reference/rdf4j-binary). Media type: `application/x-binary-rdf`.

## SPARQL result serialization formats used by Neptune SPARQL
<a name="sparql-serialization-formats-neptune-output"></a>
+ **SPARQL XML Results**  –   An XML format for the variable binding and boolean results formats provided by the SPARQL query language, defined in [SPARQL Query Results XML Format (Second Edition)](https://www.w3.org/TR/rdf-sparql-XMLres/). Media type: `application/sparql-results+xml`. Typical file extension: `.srx`.
+ **SPARQL CSV and TSV Results**  –   The use of comma-separated values and tab-separated values to express SPARQL query results from `SELECT` queries, defined in [SPARQL 1.1 Query Results CSV and TSV Formats](https://www.w3.org/TR/sparql11-results-csv-tsv/). Media type: `text/csv` for comma-separated values, and `text/tab-separated-values` for tab-separated values. Typical file extensions: `.csv` for comma-separated values, and `.tsv` for tab-separated values.
+ **Binary Results Table**  –   A binary format for encoding the output of SPARQL queries. Media type: `application/x-binary-rdf-results-table`.
+ **SPARQL JSON Results**  –   A serialization of RDF using the [SPARQL 1.1 Query Results JSON Format](https://www.w3.org/TR/sparql11-results-json/). Media type: `application/sparql-results+json`.

## Media-Types that Neptune can use to import RDF data
<a name="sparql-serialization-formats-input"></a>

**Media-types supported by the [Neptune bulk-loader](bulk-load.md)**
+ [N-Triples](https://www.w3.org/TR/n-triples/)
+ [N-Quads](https://www.w3.org/TR/n-quads/)
+ [RDF/XML](https://www.w3.org/TR/rdf-syntax-grammar/)
+ [Turtle](https://www.w3.org/TR/turtle/)

**Media-types that SPARQL UPDATE LOAD can import**
+ [N-Triples](https://www.w3.org/TR/n-triples/)
+ [N-Quads](https://www.w3.org/TR/n-quads/)
+ [RDF/XML](https://www.w3.org/TR/rdf-syntax-grammar/)
+ [Turtle](https://www.w3.org/TR/turtle/)
+ [TriG](https://www.w3.org/TR/trig/)
+ [N3](https://www.w3.org/TeamSubmission/n3/)
+ [JSON-LD](https://www.w3.org/TR/json-ld/)

## Media-Types that Neptune can use to export query results
<a name="sparql-serialization-formats-output"></a>

To specify the output format for a SPARQL query response, send an `"Accept: media-type"` header with the query request. For example:

```
curl -H "Accept: application/n-quads" ...
```

**RDF media-types that SPARQL SELECT can output from Neptune**
+ [SPARQL JSON Results](https://www.w3.org/TR/sparql11-results-json) (This is the default)
+ [SPARQL XML Results](https://www.w3.org/TR/rdf-sparql-XMLres/)
+ **Binary Results Table** (media type: `application/x-binary-rdf-results-table`)
+ [Comma-Separated Values (CSV)](https://www.w3.org/TR/sparql11-results-csv-tsv/)
+ [Tab-Separated Values (TSV)](https://www.w3.org/TR/sparql11-results-csv-tsv/)

**RDF media-types that SPARQL ASK can output from Neptune**
+ [SPARQL JSON Results](https://www.w3.org/TR/sparql11-results-json) (This is the default)
+ [SPARQL XML Results](https://www.w3.org/TR/rdf-sparql-XMLres/)
+ **Boolean** (media type: `text/boolean`, meaning "true" or "false")

**RDF media-types that SPARQL CONSTRUCT can output from Neptune**
+ [N-Quads](https://www.w3.org/TR/n-quads/) (This is the default)
+ [RDF/XML](https://www.w3.org/TR/rdf-syntax-grammar/)
+ [JSON-LD](https://www.w3.org/TR/json-ld/)
+ [N-Triples](https://www.w3.org/TR/n-triples/)
+ [Turtle](https://www.w3.org/TR/turtle/)
+ [N3](https://www.w3.org/TeamSubmission/n3/)
+ [TriX](https://www.hpl.hp.com/techreports/2004/HPL-2004-56.html)
+ [TriG](https://www.w3.org/TR/trig/)
+ [SPARQL JSON Results](https://www.w3.org/TR/sparql11-results-json)
+ [RDF4J Binary RDF Format](https://rdf4j.org/documentation/reference/rdf4j-binary)

**RDF media-types that SPARQL DESCRIBE can output from Neptune**
+ [N-Quads](https://www.w3.org/TR/n-quads/) (This is the default)
+ [RDF/XML](https://www.w3.org/TR/rdf-syntax-grammar/)
+ [JSON-LD](https://www.w3.org/TR/json-ld/)
+ [N-Triples](https://www.w3.org/TR/n-triples/)
+ [Turtle](https://www.w3.org/TR/turtle/)
+ [N3](https://www.w3.org/TeamSubmission/n3/)
+ [TriX](https://www.hpl.hp.com/techreports/2004/HPL-2004-56.html)
+ [TriG](https://www.w3.org/TR/trig/)
+ [SPARQL JSON Results](https://www.w3.org/TR/sparql11-results-json)
+ [RDF4J Binary RDF Format](https://rdf4j.org/documentation/reference/rdf4j-binary)
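The defaults listed above can be captured in a small lookup that chooses an `Accept` value per query form. A sketch (the helper name is illustrative; the mapping simply restates the defaults this page documents):

```python
# Default response media type for each SPARQL query form,
# as documented above.
DEFAULT_MEDIA_TYPE = {
    "SELECT": "application/sparql-results+json",
    "ASK": "application/sparql-results+json",
    "CONSTRUCT": "application/n-quads",
    "DESCRIBE": "application/n-quads",
}

def accept_header(query_form, media_type=None):
    """Build the Accept header, falling back to the documented default."""
    return {"Accept": media_type or DEFAULT_MEDIA_TYPE[query_form.upper()]}

print(accept_header("construct"))
print(accept_header("select", "text/csv"))
```

Passing an explicit media type overrides the default, which is how you would request, for example, Turtle from a `CONSTRUCT` query.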

# Using SPARQL UPDATE LOAD to import data into Neptune
<a name="sparql-api-reference-update-load"></a>

The syntax of the SPARQL UPDATE LOAD command is specified in the [SPARQL 1.1 Update recommendation](https://www.w3.org/TR/sparql11-update/#load):

```
LOAD SILENT (URL of data to be loaded) INTO GRAPH (named graph into which to load the data)
```
+ **`SILENT`**   –   (*Optional*) Causes the operation to return success even if there was an error during processing.

  This can be useful when a single transaction contains multiple statements like `"LOAD ...; LOAD ...; UNLOAD ...; LOAD ...;"` and you want the transaction to complete even if some of the remote data could not be processed.
+ *URL of data to be loaded*   –   (*Required*) Specifies a remote data file containing data to be loaded into a graph.

  The remote file must have one of the following extensions:
  + `.nt` for N-Triples.
  + `.nq` for N-Quads.
  + `.trig` for TriG.
  + `.rdf` for RDF/XML.
  + `.ttl` for Turtle.
  + `.n3` for N3.
  + `.jsonld` for JSON-LD.
+ **`INTO GRAPH`** *(named graph into which to load the data)*   –   (*Optional*) Specifies the graph into which the data should be loaded.

  Neptune associates every triple with a named graph. You can specify the default named graph using the fallback named-graph URI, `http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph`, like this:

  ```
  INTO GRAPH <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph>
  ```

**Note**  
When you need to load a lot of data, we recommend that you use the Neptune bulk loader rather than UPDATE LOAD. For more information about the bulk loader, see [Using the Amazon Neptune bulk loader to ingest data](bulk-load.md).

You can use `SPARQL UPDATE LOAD` to load data directly from Amazon S3, or from files obtained from a self-hosted web server. The resources to be loaded must reside in the same region as the Neptune server, and the endpoint for the resources must be allowed in the VPC. For information about creating an Amazon S3 endpoint, see [Creating an Amazon S3 VPC Endpoint](bulk-load-data.md#bulk-load-prereqs-s3).

All `SPARQL UPDATE LOAD` URIs must start with `https://`. This includes Amazon S3 URLs.

In contrast to the Neptune bulk loader, a call to `SPARQL UPDATE LOAD` is fully transactional.

**Loading files directly from Amazon S3 into Neptune using SPARQL UPDATE LOAD**

Because Neptune does not allow you to pass an IAM role to Amazon S3 when using SPARQL UPDATE LOAD, either the Amazon S3 bucket in question must be public or you must use a [pre-signed Amazon S3 URL](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html) in the LOAD query.

To generate a pre-signed URL for an Amazon S3 file, you can use an AWS CLI command like this:

```
aws s3 presign --expires-in (number of seconds) s3://(bucket name)/(path to file of data to load)
```

Then you can use the resulting pre-signed URL in your `LOAD` command:

```
curl https://(a Neptune endpoint URL):8182/sparql \
  --data-urlencode 'update=load (pre-signed URL of the remote Amazon S3 file of data to be loaded) \
                           into graph (named graph)'
```

For more information, see [Authenticating Requests: Using Query Parameters](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html). The [Boto3 documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-presigned-urls.html) shows how to use a Python script to generate a presigned URL.

Also, the content type of the files to be loaded must be set correctly.

1. Set the content type of the files when you upload them into Amazon S3 by using the `--content-type` parameter, like this:

   ```
   aws s3 cp test.nt s3://bucket-name/my-plain-text-input/test.nt --content-type text/plain
   aws s3 cp test.rdf s3://bucket-name/my-rdf-input/test.rdf --content-type application/rdf+xml
   ```

1. Confirm that the media-type information is actually present by requesting the headers of an uploaded file:

   ```
   curl -I https://s3.amazonaws.com/bucket-name/my-rdf-input/test.rdf
   ```

   The output of this command should include the `Content-Type` value that you set when uploading the file.

1. Then you can use the `SPARQL UPDATE LOAD` command to import these files into Neptune:

   ```
   curl https://your-neptune-endpoint:port/sparql \
     -d "update=LOAD <https://s3.amazonaws.com/bucket-name/my-rdf-input/test.rdf>"
   ```

The steps above work only for a public Amazon S3 bucket, or for a bucket that you access using a [pre-signed Amazon S3 URL](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html) in the LOAD query.
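When uploading many files, you can derive the right content type from the file extension before uploading. A sketch of such a helper (the function name is illustrative; the mapping restates the media types this page lists for each format):

```python
from pathlib import Path

# Content types for the RDF file extensions that UPDATE LOAD accepts.
CONTENT_TYPES = {
    ".nt": "text/plain",
    ".nq": "application/n-quads",
    ".trig": "application/trig",
    ".rdf": "application/rdf+xml",
    ".ttl": "text/turtle",
    ".n3": "text/n3",
    ".jsonld": "application/ld+json",
}

def s3_cp_command(path, bucket, prefix):
    """Build an 'aws s3 cp' command with the matching content type."""
    content_type = CONTENT_TYPES[Path(path).suffix]
    return (f"aws s3 cp {path} s3://{bucket}/{prefix}/{Path(path).name} "
            f"--content-type {content_type}")

print(s3_cp_command("test.rdf", "bucket-name", "my-rdf-input"))
```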

You can also set up a web proxy server inside your VPC to load files from a private Amazon S3 bucket, as shown in the following procedure:

**Using a web server to load files into Neptune with SPARQL UPDATE LOAD**

1. Install a web server on a machine running within the VPC that is hosting Neptune and the files to be loaded. For example, using Amazon Linux, you might install Apache as follows:

   ```
   sudo yum install httpd mod_ssl
   sudo /usr/sbin/apachectl start
   ```

1. Define the MIME type(s) of the RDF file content that you are going to load. SPARQL uses the `Content-type` header sent by the web server to determine the input format of the content, so you must define the relevant MIME types for the web server.

   For example, suppose you use the following file extensions to identify file formats:
   + `.nt` for N-Triples.
   + `.nq` for N-Quads.
   + `.trig` for TriG.
   + `.rdf` for RDF/XML.
   + `.ttl` for Turtle.
   + `.n3` for N3.
   + `.jsonld` for JSON-LD.

   If you are using Apache 2 as the web server, you would edit the file `/etc/mime.types` and add the following types:

   ```
    text/plain nt
    application/n-quads nq
    application/trig trig
    application/rdf+xml rdf
    application/x-turtle ttl
    text/rdf+n3 n3
    application/ld+json jsonld
   ```

1. Confirm that the MIME-type mapping works. Once you have your web server up and running and hosting RDF files in the format(s) of your choice, you can test the configuration by sending a request to the web server from your local host.

   For instance, you might send a request such as this:

   ```
   curl -v http://localhost:80/test.rdf
   ```

   Then, in the detailed output from `curl`, you should see a line such as:

   ```
   Content-Type: application/rdf+xml
   ```

   This shows that the content-type mapping was defined successfully.

1. You are now ready to load data using the SPARQL UPDATE command:

   ```
   curl https://your-neptune-endpoint:port/sparql \
       -d "update=LOAD <http://web_server_private_ip:80/test.rdf>"
   ```

**Note**  
Using `SPARQL UPDATE LOAD` can trigger a timeout on the web server when the source file being loaded is large. Neptune processes the file data as it is streamed in, and for a big file that can take longer than the timeout configured on the server. This in turn may cause the server to close the connection, which can result in the following error message when Neptune encounters an unexpected EOF in the stream:  

```
{
  "detailedMessage":"Invalid syntax in the specified file",
  "code":"InvalidParameterException"
}
```
If you receive this message and don't believe your source file contains invalid syntax, try increasing the timeout settings on the web server. You can also diagnose the problem by enabling debug logs on the server and looking for timeouts.
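The same extension-to-type mapping that step 2 adds to `/etc/mime.types` can be reproduced client-side with Python's standard `mimetypes` module, for example to double-check which `Content-Type` a given file should be served with (a sketch; the registrations mirror the Apache configuration above):

```python
import mimetypes

# Mirror the /etc/mime.types entries from the Apache example.
for media_type, ext in [
    ("text/plain", ".nt"),
    ("application/n-quads", ".nq"),
    ("application/trig", ".trig"),
    ("application/rdf+xml", ".rdf"),
    ("application/x-turtle", ".ttl"),
    ("text/rdf+n3", ".n3"),
    ("application/ld+json", ".jsonld"),
]:
    mimetypes.add_type(media_type, ext)

# guess_type returns (media_type, encoding) for a file name.
print(mimetypes.guess_type("test.rdf")[0])
```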

# Using SPARQL UPDATE UNLOAD to delete data from Neptune
<a name="sparql-api-reference-unload"></a>

Neptune also provides a custom SPARQL operation, `UNLOAD`, for removing data that is specified in a remote source. `UNLOAD` can be regarded as a counterpart to the `LOAD` operation. Its syntax is:

```
UNLOAD SILENT (URL of the remote data to be unloaded) FROM GRAPH (named graph from which to remove the data)
```
+ **`SILENT`**   –   (*Optional*) Causes the operation to return success even if there was an error when processing the data.

  This can be useful when a single transaction contains multiple statements like `"LOAD ...; LOAD ...; UNLOAD ...; LOAD ...;"` and you want the transaction to complete even if some of the remote data could not be processed.
+ *URL of the remote data to be unloaded*   –   (*Required*) Specifies a remote data file containing data to be unloaded from a graph.

  The remote file must have one of the following extensions (these are the same formats that UPDATE LOAD supports):
  + `.nt` for N-Triples.
  + `.nq` for N-Quads.
  + `.trig` for TriG.
  + `.rdf` for RDF/XML.
  + `.ttl` for Turtle.
  + `.n3` for N3.
  + `.jsonld` for JSON-LD.

  All the data that this file contains will be removed from your DB cluster by the `UNLOAD` operation.

  Any Amazon S3 authentication must be included in the URL for the data to unload. You can pre-sign an Amazon S3 file and then use the resulting URL to access it securely. For example:

  ```
  aws s3 presign --expires-in (number of seconds) s3://(bucket name)/(path to file of data to unload)
  ```

  Then:

  ```
  curl https://(a Neptune endpoint URL):8182/sparql \
    --data-urlencode 'update=unload (pre-signed URL of the remote Amazon S3 data to be unloaded) \
                             from graph (named graph)'
  ```

  For more information, see [Authenticating Requests: Using Query Parameters](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html).
+ **`FROM GRAPH`** *(named graph from which to remove the data)*   –   (*Optional*) Specifies the named graph from which the remote data should be unloaded.

  Neptune associates every triple with a named graph. You can specify the default named graph using the fallback named-graph URI, `http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph`, like this:

  ```
  FROM GRAPH <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph>
  ```

In the same way that `LOAD` corresponds to `INSERT DATA { (inline data) }`, `UNLOAD` corresponds to `DELETE DATA { (inline data) }`. Like `DELETE DATA`, `UNLOAD` does not work on data that contains blank nodes.

For example, suppose a local web server serves a file named `data.nt` that contains the following two triples:

```
<http://example.org/resource#a> <http://example.org/resource#p> <http://example.org/resource#b> .
<http://example.org/resource#a> <http://example.org/resource#p> <http://example.org/resource#c> .
```

The following `UNLOAD` command would delete those two triples from the named graph, `<http://example.org/graph1>`:

```
UNLOAD <http://localhost:80/data.nt> FROM GRAPH <http://example.org/graph1>
```

This would have the same effect as using the following `DELETE DATA` command:

```
DELETE DATA {
  GRAPH <http://example.org/graph1> {
    <http://example.org/resource#a> <http://example.org/resource#p> <http://example.org/resource#b> .
    <http://example.org/resource#a> <http://example.org/resource#p> <http://example.org/resource#c> .
  }
}
```

**Exceptions thrown by the `UNLOAD` command**
+ **`InvalidParameterException`**   –   There were blank nodes in the data. *HTTP status*: 400 Bad Request.

  *Message*: `Blank nodes are not allowed for UNLOAD`

   
+ **`InvalidParameterException`**   –   There was broken syntax in the data. *HTTP status*: 400 Bad Request.

  *Message*: `Invalid syntax in the specified file.`

   
+ **`UnloadUrlAccessDeniedException`**   –   Access was denied. *HTTP status*: 400 Bad Request.

  *Message*: `Update failure: Endpoint (Neptune endpoint) reported access denied error. Please verify access.`

   
+ **`BadRequestException`**   –   The remote data cannot be retrieved. *HTTP status*: 400 Bad Request.

  *Message*: *(depends on the HTTP response).*

# SPARQL query hints
<a name="sparql-query-hints"></a>

You can use query hints to specify optimization and evaluation strategies for a particular SPARQL query in Amazon Neptune. 

Query hints are expressed using additional triple patterns that are embedded in the SPARQL query with the following parts:

```
scope hint value
```
+ *scope* – Determines the part of the query that the query hint applies to, such as a certain group in the query or the full query.
+ *hint* – Identifies the type of the hint to apply.
+ *value* – Determines the behavior of the system aspect under consideration.

The query hints and scopes are exposed as predefined terms in the Amazon Neptune namespace `http://aws.amazon.com/neptune/vocab/v01/QueryHints#`. The examples in this section include the namespace as a `hint` prefix that is defined and included in the query:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
```

For example, the following shows how to include a `joinOrder` hint in a `SELECT` query:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
SELECT ... {
 hint:Query hint:joinOrder "Ordered" .
 ...
}
```

The preceding query instructs the Neptune engine to evaluate joins in the query in the *given* order and disables any automatic reordering.

Consider the following when using query hints:
+ You can combine different query hints in a single query. For example, you can use the `bottomUp` query hint to annotate a subquery for bottom-up evaluation and a `joinOrder` query hint to fix the join order inside the subquery.
+ You can use the same query hint multiple times, in different non-overlapping scopes.
+ Query hints are only hints. Although the query engine generally tries to honor a given query hint, it can also ignore it.
+ Query hints are semantics-preserving. Adding a query hint does not change the output of the query, except possibly the result order when no ordering guarantees are given (that is, when the result order is not explicitly enforced by using `ORDER BY`).
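Mechanically, adding a hint amounts to prepending the namespace prefix and inserting one extra triple pattern at the top of the relevant group. A sketch of such a helper (`add_query_hint` is an illustrative name, not a Neptune API, and this naive version assumes the first `{` opens the group you want to annotate):

```python
HINT_PREFIX = "PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>"

def add_query_hint(query, scope, hint, value):
    """Prepend the hint namespace and inject 'scope hint value .'
    right after the first '{' of the query body."""
    pattern = f'{scope} hint:{hint} "{value}" .'
    head, brace, tail = query.partition("{")
    return f"{HINT_PREFIX}\n{head}{brace}\n {pattern}{tail}"

query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
print(add_query_hint(query, "hint:Query", "joinOrder", "Ordered"))
```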

The following sections provide more information about the available query hints and their usage in Neptune.

**Topics**
+ [Scope of SPARQL query hints in Neptune](#sparql-query-hints-scope)
+ [The `joinOrder` SPARQL query hint](sparql-query-hints-joinOrder.md)
+ [The `evaluationStrategy` SPARQL query hint](sparql-query-hints-evaluationStrategy.md)
+ [The `queryTimeout` SPARQL query hint](sparql-query-hints-queryTimeout.md)
+ [The `rangeSafe` SPARQL query hint](sparql-query-hints-rangeSafe.md)
+ [The `queryId` SPARQL Query Hint](sparql-query-hints-queryId.md)
+ [The `useDFE` SPARQL query hint](sparql-query-hints-useDFE.md)
+ [SPARQL query hints used with DESCRIBE](sparql-query-hints-for-describe.md)

## Scope of SPARQL query hints in Neptune
<a name="sparql-query-hints-scope"></a>

The following table shows the available scopes, associated hints, and descriptions for SPARQL query hints in Amazon Neptune. The `hint` prefix in these entries represents the Neptune namespace for hints:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
```


| Scope | Supported Hint | Description | 
| --- | --- | --- | 
| hint:Query | [joinOrder](sparql-query-hints-joinOrder.md) | The query hint applies to the whole query. | 
| hint:Query | [queryTimeout](sparql-query-hints-queryTimeout.md) | The time-out value applies to the entire query. | 
| hint:Query | [rangeSafe](sparql-query-hints-rangeSafe.md) | Type promotion is disabled for the entire query. | 
| hint:Query | [queryId](sparql-query-hints-queryId.md) | The query ID value applies to the entire query. | 
| hint:Query | [useDFE](sparql-query-hints-useDFE.md) | Use of the DFE is enabled (or disabled) for the entire query. | 
| hint:Group | [joinOrder](sparql-query-hints-joinOrder.md) | The query hint applies to the top-level elements in the specified group, but not to nested elements (such as subqueries) or parent elements. | 
| hint:SubQuery | [evaluationStrategy](sparql-query-hints-evaluationStrategy.md) | The hint is specified and applied to a nested SELECT subquery. The subquery is evaluated independently, without considering solutions computed before the subquery. | 

# The `joinOrder` SPARQL query hint
<a name="sparql-query-hints-joinOrder"></a>

When you submit a SPARQL query, the Amazon Neptune query engine investigates the structure of the query. It reorders parts of the query and tries to minimize the amount of work required for evaluation and query response time.

For example, a sequence of connected triple patterns is typically not evaluated in the given order. It is reordered using heuristics and statistics such as the selectivity of the individual patterns and how they are connected through shared variables. Additionally, if your query contains more complex patterns such as subqueries, FILTERs, or complex OPTIONAL or MINUS blocks, the Neptune query engine reorders them where possible, aiming for an efficient evaluation order.

For more complex queries, the order in which Neptune chooses to evaluate the query might not always be optimal. For instance, Neptune might miss instance data-specific characteristics (such as hitting power nodes in the graph) that emerge during query evaluation.

If you know the exact characteristics of the data and want to manually dictate the order of the query execution, use the Neptune `joinOrder` query hint to specify that the query be evaluated in the given order.

## `joinOrder` SPARQL hint syntax
<a name="sparql-query-hints-joinOrder-syntax"></a>

The `joinOrder` query hint is specified as a triple pattern included in a SPARQL query.

For clarity, the following syntax uses a `hint` prefix defined and included in the query to specify the Neptune query-hint namespace:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
scope hint:joinOrder "Ordered" .
```

**Available Scopes**
+ `hint:Query`
+ `hint:Group`

For more information about query hint scopes, see [Scope of SPARQL query hints in Neptune](sparql-query-hints.md#sparql-query-hints-scope).

## `joinOrder` SPARQL hint example
<a name="sparql-query-hints-joinOrder-example"></a>

This section shows a query written with and without the `joinOrder` query hint and related optimizations.

For this example, assume that the dataset contains the following:
+ A single person named `John` that `:likes` 1,000 persons, including `Jane`.
+ A single person named `Jane` that `:likes` 10 persons, including `John`.

**No Query Hint**  
The following SPARQL query extracts all the pairs of people named `John` and `Jane` who both like each other from a set of social networking data:

```
PREFIX : <https://example.com/>
SELECT ?john ?jane {
  ?person1 :name "Jane" .
  ?person1 :likes ?person2 .
  ?person2 :name "John" .
  ?person2 :likes ?person1 .
}
```

The Neptune query engine might evaluate the statements in a different order than written. For example, it might choose to evaluate in the following order:

1. Find all persons named `John`.

1. Find all persons connected to `John` by a `:likes` edge.

1. Filter this set by persons named `Jane`.

1. Check that the remaining persons also have a `:likes` edge back to `John`.

According to the dataset, evaluating in this order results in 1,000 entities being extracted in the second step. The third step narrows this down to the single node, `Jane`. The final step then determines that `Jane` also `:likes` the `John` node.

**Query Hint**  
It would be favorable to start with the `Jane` node because she has only 10 outgoing `:likes` edges. This reduces the amount of work during the evaluation of the query by avoiding the extraction of the 1,000 entities during the second step.

The following example uses the **joinOrder** query hint to ensure that the `Jane` node and its outgoing edges are processed first by disabling all automatic join reordering for the query:

```
PREFIX : <https://example.com/>
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
SELECT ?john ?jane {
  hint:Query hint:joinOrder "Ordered" .
  ?person1 :name "Jane" .
  ?person1 :likes ?person2 .
  ?person2 :name "John" .
  ?person2 :likes ?person1 .
}
```

An applicable real-world scenario might be a social network application in which persons in the network are classified as either influencers with many connections or normal users with few connections. In such a scenario, you could ensure that the normal user (`Jane`) is processed before the influencer (`John`) in a query like the preceding example.

**Query Hint and Reorder**  
You can take this example one step further. If you know that the `:name` attribute is unique to a single node, you could speed up the query by reordering and using the `joinOrder` query hint. This step ensures that the unique nodes are extracted first.

```
PREFIX : <https://example.com/>
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
SELECT ?john ?jane {
  hint:Query hint:joinOrder "Ordered" .
  ?person1 :name "Jane" .
  ?person2 :name "John" .
  ?person1 :likes ?person2 .
  ?person2 :likes ?person1 .
}
```

In this case, you can reduce the query to the following single actions in each step:

1. Find the single person node with `:name` `Jane`.

1. Find the single person node with `:name` `John`.

1. Check that the first node is connected to the second with a `:likes` edge.

1. Check that the second node is connected to the first with a `:likes` edge.

**Important**  
If you choose the wrong order, the `joinOrder` query hint can lead to significant performance drops. For example, the preceding example would be inefficient if the `:name` attributes were not unique. If 100 nodes were named `Jane` and 1,000 nodes were named `John`, then the query would end up checking 100 × 1,000 (100,000) pairs for `:likes` edges.
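
The cost difference can be sketched with a quick back-of-the-envelope calculation. The counts below are the hypothetical ones from this example, not measured values:

```python
# Hypothetical fan-outs from the example dataset above.
johns, janes = 1, 1            # unique :name values
likes_from_john = 1000         # John :likes 1,000 persons
likes_from_jane = 10           # Jane :likes 10 persons

# Starting from John: expand all of John's :likes edges first.
work_from_john = johns * likes_from_john   # 1,000 intermediate results

# Starting from Jane (the joinOrder example): expand Jane's edges first.
work_from_jane = janes * likes_from_jane   # 10 intermediate results

# Worst case from the Important note: non-unique names.
pairs_checked = 100 * 1000     # 100 Janes x 1,000 Johns = 100,000 pairs
```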

# The `evaluationStrategy` SPARQL query hint
<a name="sparql-query-hints-evaluationStrategy"></a>

The `evaluationStrategy` query hint tells the Amazon Neptune query engine that the fragment of the query annotated should be evaluated from the bottom up, as an independent unit. This means that no solutions from previous evaluation steps are used to compute the query fragment. The query fragment is evaluated as a standalone unit, and its produced solutions are joined with the remainder of the query after it is computed.

Using the `evaluationStrategy` query hint implies a blocking (non-pipelined) query plan, meaning that the solutions of the fragment annotated with the query hint are materialized and buffered in main memory. Using this query hint might significantly increase the amount of main memory needed to evaluate the query, especially if the annotated query fragment computes a large number of results.

## `evaluationStrategy` SPARQL hint syntax
<a name="sparql-query-hints-evaluationStrategy-syntax"></a>

The `evaluationStrategy` query hint is specified as a triple pattern included in a SPARQL query.

For clarity, the following syntax uses a `hint` prefix defined and included in the query to specify the Neptune query-hint namespace:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
hint:SubQuery hint:evaluationStrategy "BottomUp" .
```

**Available Scopes**
+ `hint:SubQuery`

**Note**  
This query hint is supported only in nested subqueries.

For more information about query hint scopes, see [Scope of SPARQL query hints in Neptune](sparql-query-hints.md#sparql-query-hints-scope).

## `evaluationStrategy` SPARQL hint example
<a name="sparql-query-hints-evaluationStrategy-example"></a>

This section shows a query written with and without the `evaluationStrategy` query hint and related optimizations.

For this example, assume that the dataset has the following characteristics:
+ It contains 1,000 edges labeled `:connectedTo`.
+ Each `component` node is connected to an average of 100 other `component` nodes.
+ The typical number of four-hop cyclical connections between nodes is around 100.

As a typical example, the `evaluationStrategy` hint can be helpful to optimize query patterns containing cycles.

**No Query Hint**  
The following SPARQL query extracts all `component` nodes that are cyclically connected to each other via four hops:

```
PREFIX : <https://example.com/>
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
SELECT * {
  ?component1 :connectedTo ?component2 .
  ?component2 :connectedTo ?component3 .
  ?component3 :connectedTo ?component4 .
  ?component4 :connectedTo ?component1 .
}
```

The approach of the Neptune query engine is to evaluate this query using the following steps:
+ Extract all 1,000 `connectedTo` edges in the graph.
+ Expand by 100x (the number of outgoing `connectedTo` edges from component2).

  Intermediate results: 100,000 nodes.
+ Expand by 100x (the number of outgoing `connectedTo` edges from component3).

  Intermediate results: 10,000,000 nodes.
+ Scan the 10,000,000 nodes for the cycle close.

This results in a streaming query plan that uses a constant amount of main memory.

**Query Hint and Subqueries**  
You might want to trade off main memory space for accelerated computation. By rewriting the query using an `evaluationStrategy` query hint, you can force the engine to compute a join between two smaller, materialized subsets.

```
PREFIX : <https://example.com/>
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
SELECT * {
  {
    SELECT * WHERE {
      hint:SubQuery hint:evaluationStrategy "BottomUp" .
      ?component1 :connectedTo ?component2 .
      ?component2 :connectedTo ?component3 .
    }
  }
  {
    SELECT * WHERE {
      hint:SubQuery hint:evaluationStrategy "BottomUp" .
      ?component3 :connectedTo ?component4 .
      ?component4 :connectedTo ?component1 .
    }
  }
}
```

Instead of evaluating the triple patterns in sequence while iteratively using results from the previous triple pattern as input for the upcoming patterns, the `evaluationStrategy` hint causes the two subqueries to be evaluated independently. Both subqueries produce 100,000 nodes for intermediate results, which are then joined together to form the final output. 

In particular, when you run Neptune on the larger instance types, temporarily storing these two 100,000-node subsets in main memory increases memory usage in return for significantly faster evaluation.

# The `queryTimeout` SPARQL query hint
<a name="sparql-query-hints-queryTimeout"></a>

The `queryTimeout` query hint specifies a timeout that is shorter than the `neptune_query_timeout` value set in the DB parameter group.

If the query terminates as a result of this hint, a `TimeLimitExceededException` is thrown, with an `Operation terminated (deadline exceeded)` message.

## `queryTimeout` SPARQL hint syntax
<a name="sparql-query-hints-queryTimeout-syntax"></a>

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
SELECT ... WHERE {
    hint:Query hint:queryTimeout 10 .
    # OR
    hint:Query hint:queryTimeout "10" .
    # OR
    hint:Query hint:queryTimeout "10"^^xsd:integer .
 ...
}
```

The time-out value is expressed in milliseconds.

The time-out value must be smaller than the `neptune_query_timeout` value set in the DB parameter group. Otherwise, a `MalformedQueryException` exception is thrown with a `Malformed query: Query hint 'queryTimeout' must be less than neptune_query_timeout DB Parameter Group` message.

The `queryTimeout` query hint should be specified in the `WHERE` clause of the main query, or in the `WHERE` clause of one of the subqueries as shown in the example below.

It must be set only once across all queries, subqueries, and SPARQL update sections (such as INSERT and DELETE). Otherwise, a `MalformedQueryException` exception is thrown with a `Malformed query: Query hint 'queryTimeout' must be set only once` message.

**Available Scopes**

The `queryTimeout` hint can be applied both to SPARQL queries and updates.
+ In a SPARQL query, it can appear in the WHERE clause of the main query or a subquery.
+ In a SPARQL update, it can be set in the INSERT, DELETE, or WHERE clause. If there are multiple update clauses, it can only be set in one of them.

For more information about query hint scopes, see [Scope of SPARQL query hints in Neptune](sparql-query-hints.md#sparql-query-hints-scope).

## `queryTimeout` SPARQL hint example
<a name="sparql-query-hints-queryTimeout-example"></a>

Here is an example of using `hint:queryTimeout` in the main `WHERE` clause of an `UPDATE` query:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
INSERT {
    ?s ?p ?o
} WHERE {
    hint:Query hint:queryTimeout 100 .
    ?s ?p ?o .
}
```

Here, the `hint:queryTimeout` is in the `WHERE` clause of a subquery:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
SELECT * {
   ?s ?p ?o .
   {
      SELECT ?s WHERE {
         hint:Query hint:queryTimeout 100 .
         ?s ?p1 ?o1 .
      }
   }
}
```

# The `rangeSafe` SPARQL query hint
<a name="sparql-query-hints-rangeSafe"></a>

Use this query hint to turn off type promotion for a SPARQL query.

When you submit a SPARQL query that includes a `FILTER` over a numerical value or range, the Neptune query engine must normally use type promotion when it executes the query. This means that it has to examine values of every type that could hold the value you are filtering on.

For example, if you are filtering for values equal to 55, the engine must look for integers equal to 55, long integers equal to 55L, floats equal to 55.0, and so forth. Each type promotion requires an additional lookup on storage, which can cause an apparently simple query to take an unexpectedly long time to complete.

Often type promotion is unnecessary because you know in advance that you only need to find values of one specific type. When this is the case, you can speed up your queries dramatically by using the `rangeSafe` query hint to turn off type promotion.

## `rangeSafe` SPARQL hint syntax
<a name="sparql-query-hints-rangeSafe-syntax"></a>

The `rangeSafe` query hint takes a value of `true` to turn off type promotion, or `false` (the default) to leave it enabled.

**Example.** The following example shows how to turn off type promotion when filtering for an integer value of `o` greater than 1:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
SELECT * {
   ?s ?p ?o .
   hint:Prior hint:rangeSafe 'true' .
   FILTER (?o > '1'^^<http://www.w3.org/2001/XMLSchema#int>)
}
```

# The `queryId` SPARQL Query Hint
<a name="sparql-query-hints-queryId"></a>

Use this query hint to assign your own queryId value to a SPARQL query.

Example:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
SELECT * WHERE {
  hint:Query hint:queryId "4d5c4fae-aa30-41cf-9e1f-91e6b7dd6f47"
  { ?s ?p ?o }
}
```

The value you assign must be unique across all queries in the Neptune DB.

# The `useDFE` SPARQL query hint
<a name="sparql-query-hints-useDFE"></a>

Use this query hint to enable use of the DFE engine for executing a query. By default, Neptune does not use the DFE unless this query hint is set to `true`, because the [neptune\_dfe\_query\_engine](parameters.md#parameters-instance-parameters-neptune_dfe_query_engine) instance parameter defaults to `viaQueryHint`. If you set that instance parameter to `enabled`, the DFE engine is used for all queries except those that have the `useDFE` query hint set to `false`.

Example of enabling use of the DFE for a query:

```
PREFIX : <https://example.com/>
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>

SELECT ?john ?jane
{
  hint:Query hint:useDFE true .
  ?person1 :name "Jane" .
  ?person1 :likes ?person2 .
  ?person2 :name "John" .
  ?person2 :likes ?person1 .
}
```

# SPARQL query hints used with DESCRIBE
<a name="sparql-query-hints-for-describe"></a>

A SPARQL `DESCRIBE` query provides a flexible mechanism for requesting resource descriptions. However, the SPARQL specifications do not define the precise semantics of `DESCRIBE`.

Starting with [engine release 1.2.0.2](engine-releases-1.2.0.2.md), Neptune supports several different `DESCRIBE` modes and algorithms that are suited to different situations.

This sample dataset can help illustrate the different modes:

```
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <https://example.com/> .

:JaneDoe :firstName "Jane" .
:JaneDoe :knows :JohnDoe .
:JohnDoe :firstName "John" .
:JaneDoe :knows _:b1 .
_:b1 :knows :RichardRoe .

:RichardRoe :knows :JaneDoe .
:RichardRoe :firstName "Richard" .

_:s1 rdf:type rdf:Statement .
_:s1 rdf:subject :JaneDoe .
_:s1 rdf:predicate :knows .
_:s1 rdf:object :JohnDoe .
_:s1 :knowsFrom "Berlin" .

:ref_s2 rdf:type rdf:Statement .
:ref_s2 rdf:subject :JaneDoe .
:ref_s2 rdf:predicate :knows .
:ref_s2 rdf:object :JohnDoe .
:ref_s2 :knowsSince 1988 .
```

The examples below assume that a description of the resource `:JaneDoe` is being requested using a SPARQL query like this:

```
DESCRIBE <https://example.com/JaneDoe>
```

## The `describeMode` SPARQL query hint
<a name="sparql-query-hints-describeMode"></a>

The `hint:describeMode` SPARQL query hint is used to select one of the following SPARQL `DESCRIBE` modes supported by Neptune:

### The `ForwardOneStep` DESCRIBE mode
<a name="sparql-query-hints-describeMode-ForwardOneStep"></a>

You invoke the `ForwardOneStep` mode with the `describeMode` query hint like this:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
DESCRIBE <https://example.com/JaneDoe>
{
  hint:Query hint:describeMode "ForwardOneStep"
}
```

The `ForwardOneStep` mode only returns the attributes and forward links of the resource to be described. In the example case, this means it returns the triples that have `:JaneDoe`, the resource to be described, as subject:

```
:JaneDoe :firstName "Jane" .
:JaneDoe :knows :JohnDoe .
:JaneDoe :knows _:b301990159 .
```

Note that a DESCRIBE query may return triples with blank nodes, such as `_:b301990159`. Blank-node IDs are not stable: they differ from the IDs in the input dataset and can change from one response to the next.

### The `SymmetricOneStep` DESCRIBE mode
<a name="sparql-query-hints-describeMode-SymmetricOneStep"></a>

`SymmetricOneStep` is the default DESCRIBE mode if you don't provide a query hint. You can also invoke it explicitly with the `describeMode` query hint like this:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
DESCRIBE <https://example.com/JaneDoe>
{
  hint:Query hint:describeMode "SymmetricOneStep"
}
```

Under `SymmetricOneStep` semantics, `DESCRIBE` returns the attributes, forward links, and reverse links of the resource to be described:

```
:JaneDoe :firstName "Jane" .
:JaneDoe :knows :JohnDoe .
:JaneDoe :knows _:b318767375 .

_:b318767631 rdf:subject :JaneDoe .

:RichardRoe :knows :JaneDoe .

:ref_s2 rdf:subject :JaneDoe .
```

### The Concise Bounded Description (`CBD`) DESCRIBE mode
<a name="sparql-query-hints-describeMode-CBD"></a>

The Concise Bounded Description (`CBD`) mode is invoked using the `describeMode` query hint like this:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
DESCRIBE <https://example.com/JaneDoe>
{
  hint:Query hint:describeMode "CBD"
}
```

Under `CBD` semantics, `DESCRIBE` returns the Concise Bounded Description (as [defined by W3C](http://www.w3.org/Submission/CBD)) of the resource to be described:

```
:JaneDoe :firstName "Jane" .
:JaneDoe :knows :JohnDoe .
:JaneDoe :knows _:b285212943 .
_:b285212943 :knows :RichardRoe .

_:b285213199 rdf:subject :JaneDoe .
_:b285213199 rdf:type rdf:Statement .
_:b285213199 rdf:predicate :knows .
_:b285213199 rdf:object :JohnDoe .
_:b285213199 :knowsFrom "Berlin" .

:ref_s2 rdf:subject :JaneDoe .
```

The Concise Bounded Description of an RDF resource (that is, a node in an RDF graph) is the smallest subgraph centered around that node that can stand alone. In practice this means that if you think of this graph as a tree, with the designated node as the root, there are no blank nodes (bnodes) as leaves of that tree. Since bnodes can't be addressed externally or used in subsequent queries, it's not enough for browsing the graph just to find the next single hop(s) from the current node. You also have to go far enough to find something that can be used in subsequent queries (that is, something other than a bnode).

#### Computing the CBD
<a name="sparql-query-hints-describeMode-CBD-computing"></a>

Given a particular node (the starting node or root) in the source RDF graph, the CBD of that node is computed as follows:

1. Include in the subgraph all statements in the source graph where the *subject* of the statement is the starting node.

1. Recursively, for all statements in the subgraph thus far that have a blank node *object*, include in the subgraph all statements in the source graph where the *subject* of the statement is that blank node, and which are not already included in the subgraph.

1. Recursively, for all statements included in the subgraph thus far, for all reifications of these statements in the source graph, include the CBD beginning from the `rdf:Statement` node of each reification.

This results in a subgraph where the *object* nodes are either IRI references or literals, or blank nodes not serving as the *subject* of any statement in the graph. Note that the CBD cannot be computed using a single SPARQL SELECT or CONSTRUCT query.
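The first two steps above (subject expansion plus blank-node closure) can be sketched in a few lines of Python over an in-memory set of triples. This is an illustrative implementation, not Neptune's: blank nodes are marked with a `_:` prefix, and the reification step (step 3) is omitted for brevity.

```python
def cbd(root, triples):
    """Concise Bounded Description (steps 1-2): all statements whose
    subject is the root, expanded recursively through blank-node objects."""
    result, frontier = set(), {root}
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            if s == node and (s, p, o) not in result:
                result.add((s, p, o))
                if o.startswith("_:"):   # blank-node object: keep expanding
                    frontier.add(o)
    return result

# Triples inspired by the sample dataset above.
data = {
    (":JaneDoe", ":firstName", '"Jane"'),
    (":JaneDoe", ":knows", ":JohnDoe"),
    (":JaneDoe", ":knows", "_:b1"),
    ("_:b1", ":knows", ":RichardRoe"),
    (":RichardRoe", ":knows", ":JaneDoe"),
}

desc = cbd(":JaneDoe", data)
```

The blank node `_:b1` is expanded into the description, while the reverse link from `:RichardRoe` is excluded, matching the forward-only nature of CBD.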

### The Symmetric Concise Bounded Description (`SCBD`) DESCRIBE mode
<a name="sparql-query-hints-describeMode-SCBD"></a>

The Symmetric Concise Bounded Description (`SCBD`) mode is invoked using the `describeMode` query hint like this:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
DESCRIBE <https://example.com/JaneDoe>
{
  hint:Query hint:describeMode "SCBD"
}
```

Under `SCBD` semantics, `DESCRIBE` returns the Symmetric Concise Bounded Description of the resource (as defined by W3C in [Describing Linked Datasets with the VoID Vocabulary](http://www.w3.org/TR/void/)):

```
:JaneDoe :firstName "Jane" .
:JaneDoe :knows :JohnDoe .
:JaneDoe :knows _:b335544591 .
_:b335544591 :knows :RichardRoe .

:RichardRoe :knows :JaneDoe .

_:b335544847 rdf:subject :JaneDoe .
_:b335544847 rdf:type rdf:Statement .
_:b335544847 rdf:predicate :knows .
_:b335544847 rdf:object :JohnDoe .
_:b335544847 :knowsFrom "Berlin" .

:ref_s2 rdf:subject :JaneDoe .
```

The advantage of CBD and SCBD over the `ForwardOneStep` and `SymmetricOneStep` modes is that blank nodes are always expanded to include their representation. This may be an important advantage because you can't query a blank node using SPARQL. In addition, CBD and SCBD modes also consider reifications.

Note that the `describeMode` query hint can also be part of a `WHERE` clause:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
DESCRIBE ?s
WHERE {
  hint:Query hint:describeMode "CBD" .
  ?s rdf:type <https://example.com/Person>
}
```

## The `describeIterationLimit` SPARQL query hint
<a name="sparql-query-hints-describeIterationLimit"></a>

The `hint:describeIterationLimit` SPARQL query hint provides an **optional** constraint on the maximum number of iterative expansions to be performed for iterative DESCRIBE algorithms such as CBD and SCBD.

DESCRIBE limits are ANDed together. Therefore, if both the iteration limit and the statements limit are specified, then both limits must be met before the DESCRIBE query is cut off.

The default value is 5. Set it to zero (0) to specify no limit on the number of iterative expansions.

## The `describeStatementLimit` SPARQL query hint
<a name="sparql-query-hints-describeStatementLimit"></a>

The `hint:describeStatementLimit` SPARQL query hint provides an **optional** constraint on the maximum number of statements that may be present in a DESCRIBE query response. It is only applied for iterative DESCRIBE algorithms such as CBD and SCBD.

DESCRIBE limits are ANDed together. Therefore, if both the iteration limit and the statements limit are specified, then both limits must be met before the DESCRIBE query is cut off.

The default value is 5000. Set it to zero (0) to specify no limit on the number of statements returned.

# SPARQL DESCRIBE behavior with respect to the default graph
<a name="sparql-default-describe"></a>

The SPARQL [DESCRIBE](https://www.w3.org/TR/sparql11-query/#describe) query form lets you retrieve information about resources without knowing the structure of the data and without having to compose a query. How this information is assembled is left up to the SPARQL implementation. Neptune provides [several query hints](sparql-query-hints-for-describe.md) that invoke different modes and algorithms for `DESCRIBE` to use.

In Neptune's implementation, regardless of the mode, `DESCRIBE` only uses data present in the [SPARQL default graph](feature-sparql-compliance.md#sparql-default-graph). This is consistent with the way SPARQL treats datasets (see [Specifying RDF Datasets](https://www.w3.org/TR/sparql11-query/#specifyingDataset) in the SPARQL specification).

In Neptune, the default graph contains all unique triples in the union of all named graphs in the database, unless particular named graphs are specified using `FROM` and/or `FROM NAMED` clauses. All RDF data in Neptune is stored in a named graph. If a triple is inserted without a named-graph context, Neptune stores it in a named graph designated `http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph`.

When one or more named graphs are specified using the `FROM` clause, the default graph is the union of all unique triples in those named graphs. If there is no `FROM` clause and there are one or more `FROM NAMED` clauses, then the default graph is empty.
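These rules can be sketched as a small Python function over a set of quads. This is an illustrative model, not Neptune's implementation; the `DNG` constant is Neptune's designated graph for triples inserted without a named-graph context:

```python
def default_graph(quads, from_graphs=(), from_named=()):
    """Compute the SPARQL default graph per the rules above.
    quads: set of (s, p, o, g) tuples, g being the named-graph IRI."""
    if from_graphs:    # FROM g1 FROM g2 ...: union of those graphs
        return {(s, p, o) for s, p, o, g in quads if g in from_graphs}
    if from_named:     # only FROM NAMED: the default graph is empty
        return set()
    # No dataset clause: union of all unique triples in all named graphs.
    return {(s, p, o) for s, p, o, g in quads}

DNG = "http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph"
quads = {
    ("ex:s", "ex:p1", '"a"', "ex:g1"),
    ("ex:s", "ex:p2", '"c"', "ex:g1"),
    ("ex:s", "ex:p3", '"b"', "ex:g2"),
    ("ex:s", "ex:p2", '"c"', "ex:g2"),
    ("ex:s", "ex:p3", '"d"', DNG),   # triple inserted without a context
}
```

Note that the duplicate triple `ex:s ex:p2 "c"` appears once in the no-clause case, because the default graph contains unique triples.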

## SPARQL `DESCRIBE` examples
<a name="sparql-default-describe-examples"></a>

Consider the following data:

```
PREFIX ex: <https://example.com/>

GRAPH ex:g1 {
    ex:s ex:p1 "a" .
    ex:s ex:p2 "c" .
}

GRAPH ex:g2 {
    ex:s ex:p3 "b" .
    ex:s ex:p2 "c" .
}

ex:s ex:p3 "d" .
```

For this query:

```
PREFIX ex: <https://example.com/>
DESCRIBE ?s
FROM ex:g1
FROM NAMED ex:g2
WHERE {
  GRAPH ex:g2 { ?s ?p "b" . }
}
```

Neptune would return:

```
ex:s ex:p1 "a" .
ex:s ex:p2 "c" .
```

Here, the graph pattern `GRAPH ex:g2 { ?s ?p "b" }` is evaluated first, resulting in bindings for `?s`, and then the `DESCRIBE` part is evaluated over the default graph, which is now just `ex:g1`.

However, for this query:

```
PREFIX ex: <https://example.com/>
DESCRIBE ?s 
FROM NAMED ex:g1 
WHERE { 
  GRAPH ex:g1 { ?s ?p "a" . } 
}
```

Neptune would return nothing, because when a `FROM NAMED` clause is present without any `FROM` clause, the default graph is empty.

In the following query, `DESCRIBE` is used with no `FROM` or `FROM NAMED` clause present:

```
PREFIX ex: <https://example.com/>
DESCRIBE ?s 
WHERE { 
  GRAPH ex:g1 { ?s ?p "a" . } 
}
```

In this situation, the default graph is composed of all the unique triples in the union of all the named graphs in the database (formally, the RDF merge), so Neptune would return:

```
ex:s ex:p1 "a" . 
ex:s ex:p2 "c" . 
ex:s ex:p3 "b" .
ex:s ex:p3 "d" .
```

# SPARQL query status API
<a name="sparql-api-status"></a>

To get the status of SPARQL queries, use HTTP `GET` or `POST` to make a request to the `https://your-neptune-endpoint:port/sparql/status` endpoint. 

## SPARQL query status request parameters
<a name="sparql-api-status-get-request"></a>

**queryId (optional)**  
The ID of a running SPARQL query. If specified, only the status of that query is returned.

## SPARQL query status response syntax
<a name="sparql-api-status-get-response-syntax"></a>

```
{
    "acceptedQueryCount": integer,
    "runningQueryCount": integer,
    "queries": [
      {
        "queryId":"guid",
        "queryEvalStats":
          {
            "subqueries": integer,
            "elapsed": integer,
            "cancelled": boolean
          },
        "queryString": "string"
      }
    ]
}
```

## SPARQL query status response values
<a name="sparql-api-status-get-response-values"></a>

**acceptedQueryCount**  
The number of queries accepted since the last restart of the Neptune engine.

**runningQueryCount**  
The number of currently running SPARQL queries.

**queries**  
A list of the current SPARQL queries.

**queryId**  
A GUID ID for the query. Neptune automatically assigns this ID value to each query, or you can assign your own (see [Inject a Custom ID Into a Neptune Gremlin or SPARQL Query](features-query-id.md)).

**queryEvalStats**  
Statistics for this query.

**subqueries**  
Number of subqueries in this query.

**elapsed**  
The number of milliseconds the query has been running so far.

**cancelled**  
True indicates that the query was cancelled.

**queryString**  
The submitted query.

## SPARQL query status example
<a name="sparql-api-status-get-example"></a>

The following is an example of the status command using `curl` and HTTP `GET`.

```
curl https://your-neptune-endpoint:port/sparql/status
```

This output shows a single running query.

```
{
    "acceptedQueryCount":9,
    "runningQueryCount":1,
    "queries": [
        {
            "queryId":"fb34cd3e-f37c-4d12-9cf2-03bb741bf54f",
            "queryEvalStats":
                {
                    "subqueries": 0,
                    "elapsed": 29256,
                    "cancelled": false
                },
            "queryString": "SELECT ?s ?p ?o WHERE {?s ?p ?o}"
        }
    ]
}
```

# SPARQL query cancellation
<a name="sparql-api-status-cancel"></a>

To cancel a running SPARQL query, use HTTP `GET` or `POST` to make a request to the `https://your-neptune-endpoint:port/sparql/status` endpoint with the parameters described below.

## SPARQL query cancellation request parameters
<a name="sparql-api-status-cancel-request"></a>

**cancelQuery**  
(Required) Tells the status command to cancel a query. This parameter does not take a value.

**queryId**  
(Required) The ID of the running SPARQL query to cancel.

**silent**  
(Optional) If `silent=true` then the running query is cancelled and the HTTP response code is 200. If `silent` is not present or `silent=false`, the query is cancelled with an HTTP 500 status code.

## SPARQL query cancellation examples
<a name="sparql-api-status-cancel-example"></a>

**Example 1: Cancellation with `silent=false`**  
The following is an example of the status command using `curl` to cancel a query with the `silent` parameter set to `false`:

```
curl https://your-neptune-endpoint:port/sparql/status \
  -d "cancelQuery" \
  -d "queryId=4d5c4fae-aa30-41cf-9e1f-91e6b7dd6f47" \
  -d "silent=false"
```

Unless the query has already started streaming results, the cancelled query would then return an HTTP 500 code with a response like this:

```
{
  "code": "CancelledByUserException",
  "requestId": "4d5c4fae-aa30-41cf-9e1f-91e6b7dd6f47",
  "detailedMessage": "Operation terminated (cancelled by user)"
}
```

If the query already returned an HTTP 200 code (OK) and started streaming results before being cancelled, the cancellation exception information is sent to the regular output stream.

**Example 2: Cancellation with `silent=true`**  
The following is an example of the same status command as above except with the `silent` parameter now set to `true`:

```
curl https://your-neptune-endpoint:port/sparql/status \
  -d "cancelQuery" \
  -d "queryId=4d5c4fae-aa30-41cf-9e1f-91e6b7dd6f47" \
  -d "silent=true"
```

This command would return the same response as when `silent=false`, but the cancelled query would now return an HTTP 200 code with a response like this:

```
{
  "head" : {
    "vars" : [ "s", "p", "o" ]
  },
  "results" : {
    "bindings" : [ ]
  }
}
```

# Using the SPARQL 1.1 Graph Store HTTP Protocol (GSP) in Amazon Neptune
<a name="sparql-graph-store-protocol"></a>

In the [SPARQL 1.1 Graph Store HTTP Protocol](https://www.w3.org/TR/sparql11-http-rdf-update/) recommendation, the W3C defined an HTTP protocol for managing RDF graphs. It defines operations for removing, creating, and replacing RDF graph content as well as for adding RDF statements to existing content.

The graph-store protocol (GSP) provides a convenient way to manipulate your entire graph without having to write complex SPARQL queries.

Neptune fully supports this protocol.

The endpoint for the graph-store protocol (GSP) is:

```
https://your-neptune-cluster:port/sparql/gsp/
```

To access the default graph with GSP, use:

```
https://your-neptune-cluster:port/sparql/gsp/?default
```

To access a named graph with GSP, use:

```
https://your-neptune-cluster:port/sparql/gsp/?graph=named-graph-URI
```
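
When you build these URLs programmatically, the named-graph URI must be URL-encoded. A minimal Python sketch (the helper name and endpoint value are placeholders):

```python
from urllib.parse import quote

# Hypothetical helper: build a Graph Store Protocol URL for a Neptune endpoint.
def gsp_url(endpoint, graph=None):
    base = f"https://{endpoint}/sparql/gsp/"
    if graph is None:
        return base + "?default"                     # the default graph
    return base + "?graph=" + quote(graph, safe="")  # a named graph

print(gsp_url("your-neptune-cluster:8182"))
print(gsp_url("your-neptune-cluster:8182", "urn:votes:2019"))
```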

## Special details of the Neptune GSP implementation
<a name="sparql-graph-store-protocol-special"></a>

Neptune fully implements the [W3C recommendation](https://www.w3.org/TR/sparql11-http-rdf-update/) that defines GSP. However, there are a few situations that the specification doesn't cover.

One of these is the case where a `PUT` or `POST` request specifies one or more named graphs in the request body that differ from the graph specified by the request URL. This can only happen when the RDF format of the request body supports named graphs, as is the case with `Content-Type: application/n-quads` or `Content-Type: application/trig`.

In this situation, Neptune adds or updates all the named graphs present in the body, as well as the named graph specified in the URL.

For example, suppose that starting with an empty database, you send a `PUT` request to upsert votes into three graphs. One, named `urn:votes`, contains all votes from all election years. Two others, named `urn:votes:2005` and `urn:votes:2019`, contain votes from specific election years. The request and its payload look like this:

```
PUT "http://your-Neptune-cluster:port/sparql/gsp/?graph=urn:votes"
  Host: example.com
  Content-Type: application/n-quads

  PAYLOAD:

  <urn:JohnDoe> <urn:votedFor> <urn:Labour> <urn:votes:2005> .
  <urn:JohnDoe> <urn:votedFor> <urn:Conservative> <urn:votes:2019> .
  <urn:JaneSmith> <urn:votedFor> <urn:LiberalDemocrats> <urn:votes:2005> .
  <urn:JaneSmith> <urn:votedFor> <urn:Conservative> <urn:votes:2019> .
```

After the request is executed, the data in the database looks like this:

```
<urn:JohnDoe>   <urn:votedFor> <urn:Labour>           <urn:votes:2005> .
<urn:JohnDoe>   <urn:votedFor> <urn:Conservative>     <urn:votes:2019> .
<urn:JaneSmith> <urn:votedFor> <urn:LiberalDemocrats> <urn:votes:2005> .
<urn:JaneSmith> <urn:votedFor> <urn:Conservative>     <urn:votes:2019> .
<urn:JohnDoe>   <urn:votedFor> <urn:Labour>           <urn:votes> .
<urn:JohnDoe>   <urn:votedFor> <urn:Conservative>     <urn:votes> .
<urn:JaneSmith> <urn:votedFor> <urn:LiberalDemocrats> <urn:votes> .
<urn:JaneSmith> <urn:votedFor> <urn:Conservative>     <urn:votes> .
```
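
The resulting state can be modeled as a set union: each quad keeps the graph named in the body, and each triple is additionally stored under the graph named in the URL. A toy Python model of this behavior (simplified tuples, not real N-Quads parsing):

```python
# Toy model of Neptune's handling of a PUT whose N-Quads body names graphs
# that differ from the ?graph= parameter in the request URL.
url_graph = "urn:votes"
body_quads = [
    ("urn:JohnDoe",   "urn:votedFor", "urn:Labour",           "urn:votes:2005"),
    ("urn:JohnDoe",   "urn:votedFor", "urn:Conservative",     "urn:votes:2019"),
    ("urn:JaneSmith", "urn:votedFor", "urn:LiberalDemocrats", "urn:votes:2005"),
    ("urn:JaneSmith", "urn:votedFor", "urn:Conservative",     "urn:votes:2019"),
]

# Every quad is stored under its own named graph, and each triple is also
# stored under the graph named in the URL.
stored = set(body_quads) | {(s, p, o, url_graph) for s, p, o, _ in body_quads}
print(len(stored))  # 8, matching the database contents shown above
```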

Another ambiguous situation is where more than one graph is specified in the request URL itself, using any of `PUT`, `POST`, `GET`, or `DELETE`. For example:

```
POST "http://your-Neptune-cluster:port/sparql/gsp/?graph=urn:votes:2005&graph=urn:votes:2019"
```

Or:

```
GET "http://your-Neptune-cluster:port/sparql/gsp/?default&graph=urn:votes:2019"
```

In this situation, Neptune returns an HTTP 400 with a message indicating that only one graph can be specified in the request URL.
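
This validation can be modeled with a few lines of Python (a toy check, not Neptune's actual implementation), using the standard library to count the graphs named in a request URL:

```python
from urllib.parse import urlparse, parse_qs

# Toy check mirroring Neptune's validation: a GSP request URL may name
# at most one graph (either ?default or a single ?graph=...).
def graph_count(url):
    params = parse_qs(urlparse(url).query, keep_blank_values=True)
    n = len(params.get("graph", []))
    if "default" in params:
        n += 1
    return n

bad = "https://host:8182/sparql/gsp/?graph=urn:votes:2005&graph=urn:votes:2019"
print(graph_count(bad))  # 2 -> Neptune would return HTTP 400
```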

# Analyzing Neptune query execution using SPARQL `explain`
<a name="sparql-explain"></a>

Amazon Neptune has added a SPARQL feature named *explain*. This feature is a self-service tool for understanding the execution approach taken by the Neptune engine. You invoke it by adding an `explain` parameter to an HTTP call that submits a SPARQL query.

The `explain` feature provides information about the logical structure of query execution plans. You can use this information to identify potential evaluation and execution bottlenecks. You can then use [query hints](sparql-query-hints.md) to improve your query execution plans.

**Topics**
+ [How the SPARQL query engine works in Neptune](sparql-explain-engine.md)
+ [How to use SPARQL `explain` to analyze Neptune query execution](sparql-explain-using.md)
+ [Examples of invoking SPARQL `explain` in Neptune](sparql-explain-examples.md)
+ [Neptune SPARQL `explain` operators](sparql-explain-operators.md)
+ [Limitations of SPARQL `explain` in Neptune](sparql-explain-limitations.md)

# How the SPARQL query engine works in Neptune
<a name="sparql-explain-engine"></a>

To use the information that the SPARQL `explain` feature provides, you need to understand some details about how the Amazon Neptune SPARQL query engine works.

The engine translates every SPARQL query into a pipeline of operators. Starting from the first operator, intermediate solutions known as *binding lists* flow through this operator pipeline. You can think of a binding list as a table in which the table headers are a subset of the variables used in the query. Each row in the table represents a result, up to the point of evaluation.

Let's assume that two namespace prefixes have been defined for our data:

```
  @prefix ex:   <http://example.com> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
```

The following would be an example of a simple binding list in this context:

```
  ?person       | ?firstName
  ------------------------------------------------------
  ex:JaneDoe    | "Jane"
  ex:JohnDoe    | "John"
  ex:RichardRoe | "Richard"
```

For each of three people, the list binds the `?person` variable to an identifier of the person, and the `?firstName` variable to the person's first name.

In the general case, variables can remain unbound, if, for example, there is an `OPTIONAL` selection of a variable in a query for which no value is present in the data.

The `PipelineJoin` operator is an example of a Neptune query engine operator present in the `explain` output. It takes as input an incoming binding set from the previous operator and joins it against a triple pattern, say `(?person, foaf:lastName, ?lastName)`. This operation uses the bindings for the `?person` variable in its input stream, substitutes them into the triple pattern, and looks up triples from the database.

When executed in the context of the incoming bindings from the previous table, `PipelineJoin` would evaluate three lookups, namely the following:

```
  (ex:JaneDoe,    foaf:lastName, ?lastName)
  (ex:JohnDoe,    foaf:lastName, ?lastName)
  (ex:RichardRoe, foaf:lastName, ?lastName)
```

This approach is called *as-bound* evaluation. The solutions from this lookup are joined back against the incoming solutions, adding the detected `?lastName` binding to each. Assuming that a last name is found for all three people, the operator produces an outgoing binding list that looks something like this:

```
  ?person       | ?firstName | ?lastName
  ---------------------------------------
  ex:JaneDoe    | "Jane"     | "Doe"
  ex:JohnDoe    | "John"     | "Doe"
  ex:RichardRoe | "Richard"  | "Roe"
```

This outgoing binding list then serves as input for the next operator in the pipeline. At the end, the output of the last operator in the pipeline defines the query result.
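
As a rough illustration (toy data and a simplified matcher, not Neptune internals), the as-bound evaluation performed by `PipelineJoin` can be sketched like this:

```python
# Toy in-memory triples: (subject, predicate, object)
triples = {
    ("ex:JaneDoe",    "foaf:knows",    "ex:JohnDoe"),
    ("ex:JaneDoe",    "foaf:knows",    "ex:RichardRoe"),
    ("ex:JohnDoe",    "foaf:lastName", "Doe"),
    ("ex:RichardRoe", "foaf:lastName", "Roe"),
    ("ex:JaneDoe",    "foaf:lastName", "Doe"),
}

def pipeline_join(bindings, pattern):
    """As-bound join: substitute each incoming binding into the pattern,
    look up matching triples, and extend the binding with new variables."""
    out = []
    for b in bindings:
        # Replace bound variables in the pattern with their values.
        subst = tuple(b.get(t, t) if t.startswith("?") else t for t in pattern)
        for triple in triples:
            if all(pat.startswith("?") or pat == val
                   for pat, val in zip(subst, triple)):
                extended = dict(b)
                for pat, val in zip(subst, triple):
                    if pat.startswith("?"):
                        extended[pat] = val
                out.append(extended)
    return out

incoming = [{"?person": "ex:JohnDoe"}, {"?person": "ex:RichardRoe"}]
result = pipeline_join(incoming, ("?person", "foaf:lastName", "?lastName"))
print(result)
```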

Operator pipelines are often linear, in the sense that every operator emits solutions for a single connected operator. However, in some cases, they can have more complex structures. For example, a `UNION` operator in a SPARQL query is mapped to a `Copy` operation. This operation duplicates the bindings and forwards the copies into two subplans, one for the left side and the other for the right side of the `UNION`.

For more information about operators, see [Neptune SPARQL `explain` operators](sparql-explain-operators.md).

# How to use SPARQL `explain` to analyze Neptune query execution
<a name="sparql-explain-using"></a>

The SPARQL `explain` feature is a self-service tool in Amazon Neptune that helps you understand the execution approach taken by the Neptune engine. To invoke `explain`, you pass a parameter to an HTTP or HTTPS request in the form `explain=mode`.

The mode value can be one of `static`, `dynamic`, or `details`:
+ In *static* mode, `explain` prints only the static structure of the query plan.
+ In *dynamic* mode, `explain` also includes dynamic aspects of the query plan. These aspects might include the number of intermediate bindings flowing through the operators, the ratio of incoming bindings to outgoing bindings, and the total time taken by operators.
+ In *details* mode, `explain` prints the information shown in `dynamic` mode plus additional details such as the actual SPARQL query string and the estimated range count for the pattern underlying a join operator.

Neptune supports using `explain` with all three SPARQL query access protocols listed in the [W3C SPARQL 1.1 Protocol](https://www.w3.org/TR/sparql11-protocol/#query-operation) specification, namely:

1. HTTP GET

1. HTTP POST using URL-encoded parameters

1. HTTP POST using text parameters

For information about the SPARQL query engine, see [How the SPARQL query engine works in Neptune](sparql-explain-engine.md).

For information about the kind of output produced by invoking SPARQL `explain`, see [Examples of invoking SPARQL `explain` in Neptune](sparql-explain-examples.md).

# Examples of invoking SPARQL `explain` in Neptune
<a name="sparql-explain-examples"></a>

The examples in this section show the various kinds of output you can produce by invoking the SPARQL `explain` feature to analyze query execution in Amazon Neptune.

**Topics**
+ [Understanding Explain Output](#sparql-explain-example-output)
+ [Example of details mode output](#sparql-explain-example-details)
+ [Example of static mode output](#sparql-explain-example-static)
+ [Different ways of encoding parameters](#sparql-explain-example-parameters)
+ [Other output types besides text/plain](#sparql-explain-output-options)
+ [Example of SPARQL `explain` output when the DFE is enabled](#sparql-explain-output-dfe)

## Understanding Explain Output
<a name="sparql-explain-example-output"></a>

In this example, Jane Doe knows two people, namely John Doe and Richard Roe:

```
@prefix ex: <http://example.com> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

ex:JaneDoe foaf:knows ex:JohnDoe .
ex:JohnDoe foaf:firstName "John" .
ex:JohnDoe foaf:lastName "Doe" .
ex:JaneDoe foaf:knows ex:RichardRoe .
ex:RichardRoe foaf:firstName "Richard" .
ex:RichardRoe foaf:lastName "Roe" .
```

To determine the first names of all the people whom Jane Doe knows, you can write the following query:

```
 curl http(s)://your_server:your_port/sparql \
   -d "query=PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX ex: <http://example.com> \
       SELECT ?firstName WHERE { ex:JaneDoe foaf:knows ?person . ?person foaf:firstName ?firstName }" \
   -H "Accept: text/csv"
```

This simple query returns the following:

```
firstName
John
Richard
```

Next, change the `curl` command to invoke `explain` by adding `-d "explain=dynamic"` and using the default output type instead of `text/csv`:

```
 curl http(s)://your_server:your_port/sparql \
   -d "query=PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX ex: <http://example.com> \
       SELECT ?firstName WHERE { ex:JaneDoe foaf:knows ?person . ?person foaf:firstName ?firstName }" \
   -d "explain=dynamic"
```

The query now returns output in pretty-printed ASCII format (HTTP content type `text/plain`), which is the default output type:

```
╔════╤════════╤════════╤═══════════════════╤═══════════════════════════════════════════════════════╤══════════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name              │ Arguments                                             │ Mode     │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════╪═══════════════════════════════════════════════════════╪══════════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]                                        │ -        │ 0        │ 1         │ 0.00  │ 0         ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ PipelineJoin      │ pattern=distinct(ex:JaneDoe, foaf:knows, ?person)     │ -        │ 1        │ 2         │ 2.00  │ 1         ║
║    │        │        │                   │ joinType=join                                         │          │          │           │       │           ║
║    │        │        │                   │ joinProjectionVars=[?person]                          │          │          │           │       │           ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 3      │ -      │ PipelineJoin      │ pattern=distinct(?person, foaf:firstName, ?firstName) │ -        │ 2        │ 2         │ 1.00  │ 1         ║
║    │        │        │                   │ joinType=join                                         │          │          │           │       │           ║
║    │        │        │                   │ joinProjectionVars=[?person, ?firstName]              │          │          │           │       │           ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ 4      │ -      │ Projection        │ vars=[?firstName]                                     │ retain   │ 2        │ 2         │ 1.00  │ 0         ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 4  │ -      │ -      │ TermResolution    │ vars=[?firstName]                                     │ id2value │ 2        │ 2         │ 1.00  │ 1         ║
╚════╧════════╧════════╧═══════════════════╧═══════════════════════════════════════════════════════╧══════════╧══════════╧═══════════╧═══════╧═══════════╝
```

For details about the operations in the `Name` column and their arguments, see [explain operators](sparql-explain-operators.md).

The following describes the output row by row:

1. The first step in the main query always uses the `SolutionInjection` operator to inject a solution. The solution is then expanded to the final result through the evaluation process.

   In this case, it injects the so-called universal solution `{ }`. In the presence of `VALUES` or `BIND` clauses, this step might also inject more complex variable bindings to start with.

   The `Units Out` column indicates that this single solution flows out of the operator. The `Out #1` column specifies the operator into which this operator feeds the result. In this example, all operators are connected to the operator that follows in the table.

1. The second step is a `PipelineJoin`. It receives as input the single universal (fully unconstrained) solution produced by the previous operator (`Units In := 1`). It joins it against the tuple pattern defined by its `pattern` argument. This corresponds to a simple lookup for the pattern. In this case, the triple pattern is defined as the following:

   ```
   distinct( ex:JaneDoe, foaf:knows, ?person )
   ```

   The `joinType := join` argument indicates that this is a normal join (other types include `optional` joins, `existence check` joins, and so on).

   The `distinct` wrapper around the pattern indicates that only distinct matches (no duplicates) are extracted from the database, projected onto the variable listed in `joinProjectionVars := ?person`.

   The fact that the `Units Out` column value is 2 indicates that there are two solutions flowing out. Specifically, these are the bindings for the `?person` variable, reflecting the two people that the data shows that Jane Doe knows:

   ```
    ?person
    -------------
    ex:JohnDoe
    ex:RichardRoe
   ```

1. The two solutions from stage 2 flow as input (`Units In := 2`) into the second `PipelineJoin`. This operator joins the two previous solutions with the following triple pattern:

   ```
   distinct(?person, foaf:firstName, ?firstName)
   ```

   The `?person` variable is known to be bound either to `ex:JohnDoe` or to `ex:RichardRoe` by the operator's incoming solutions. Given that, the `PipelineJoin` extracts the first names, John and Richard. The two outgoing solutions (`Units Out := 2`) are then as follows:

   ```
    ?person       | ?firstName
    ---------------------------
    ex:JohnDoe    | John
    ex:RichardRoe | Richard
   ```

1. The next projection operator takes as input the two solutions from stage 3 (`Units In := 2`) and projects onto the `?firstName` variable. This eliminates all other variable bindings in the mappings and passes on the two bindings (`Units Out := 2`):

   ```
    ?firstName
    ----------
    John
    Richard
   ```

1. To improve performance, Neptune operates where possible on internal identifiers that it assigns to terms such as URIs and string literals, rather than on the strings themselves. The final operator, `TermResolution`, performs a mapping from these internal identifiers back to the corresponding term strings.

   In regular (non-explain) query evaluation, the result computed by the last operator is then serialized into the requested serialization format and streamed to the client.
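
As a toy illustration (the numeric IDs and dictionary are invented for this sketch), `TermResolution` in `id2value` mode can be thought of as a final dictionary lookup:

```python
# Toy id2value mapping: Neptune internally joins on numeric term IDs and
# resolves them back to term strings only at the end (illustrative values).
dictionary = {101: "ex:JohnDoe", 102: "John", 103: "Richard"}

def term_resolution(bindings, variables):
    """Map internal IDs back to term strings for the given variables."""
    return [{v: dictionary[b[v]] if v in variables else b[v] for v in b}
            for b in bindings]

resolved = term_resolution([{"?firstName": 102}, {"?firstName": 103}],
                           {"?firstName"})
print(resolved)  # [{'?firstName': 'John'}, {'?firstName': 'Richard'}]
```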

## Example of details mode output
<a name="sparql-explain-example-details"></a>

Suppose that you run the same query as in the previous example, but in *details* mode instead of *dynamic* mode:

```
 curl http(s)://your_server:your_port/sparql \
   -d "query=PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX ex: <http://example.com> \
       SELECT ?firstName WHERE { ex:JaneDoe foaf:knows ?person . ?person foaf:firstName ?firstName }" \
   -d "explain=details"
```

As this example shows, the output is the same with some additional details such as the query string at the top of the output, and the `patternEstimate` count for the `PipelineJoin` operator:

```
Query:
PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX ex: <http://example.com>
SELECT ?firstName WHERE { ex:JaneDoe foaf:knows ?person . ?person foaf:firstName ?firstName }

╔════╤════════╤════════╤═══════════════════╤═══════════════════════════════════════════════════════╤══════════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name              │ Arguments                                             │ Mode     │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════╪═══════════════════════════════════════════════════════╪══════════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]                                        │ -        │ 0        │ 1         │ 0.00  │ 0         ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ PipelineJoin      │ pattern=distinct(ex:JaneDoe, foaf:knows, ?person)     │ -        │ 1        │ 2         │ 2.00  │ 13        ║
║    │        │        │                   │ joinType=join                                         │          │          │           │       │           ║
║    │        │        │                   │ joinProjectionVars=[?person]                          │          │          │           │       │           ║
║    │        │        │                   │ patternEstimate=2                                     │          │          │           │       │           ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 3      │ -      │ PipelineJoin      │ pattern=distinct(?person, foaf:firstName, ?firstName) │ -        │ 2        │ 2         │ 1.00  │ 3         ║
║    │        │        │                   │ joinType=join                                         │          │          │           │       │           ║
║    │        │        │                   │ joinProjectionVars=[?person, ?firstName]              │          │          │           │       │           ║
║    │        │        │                   │ patternEstimate=2                                     │          │          │           │       │           ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ 4      │ -      │ Projection        │ vars=[?firstName]                                     │ retain   │ 2        │ 2         │ 1.00  │ 1         ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 4  │ -      │ -      │ TermResolution    │ vars=[?firstName]                                     │ id2value │ 2        │ 2         │ 1.00  │ 7         ║
╚════╧════════╧════════╧═══════════════════╧═══════════════════════════════════════════════════════╧══════════╧══════════╧═══════════╧═══════╧═══════════╝
```

## Example of static mode output
<a name="sparql-explain-example-static"></a>

Suppose that you run the same query as in the previous example, but in *static* mode (the default) instead of *details* mode:

```
 curl http(s)://your_server:your_port/sparql \
   -d "query=PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX ex: <http://example.com> \
       SELECT ?firstName WHERE { ex:JaneDoe foaf:knows ?person . ?person foaf:firstName ?firstName }" \
   -d "explain=static"
```

As this example shows, the output is the same, except that it omits the last three columns:

```
╔════╤════════╤════════╤═══════════════════╤═══════════════════════════════════════════════════════╤══════════╗
║ ID │ Out #1 │ Out #2 │ Name              │ Arguments                                             │ Mode     ║
╠════╪════════╪════════╪═══════════════════╪═══════════════════════════════════════════════════════╪══════════╣
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]                                        │ -        ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────╢
║ 1  │ 2      │ -      │ PipelineJoin      │ pattern=distinct(ex:JaneDoe, foaf:knows, ?person)     │ -        ║
║    │        │        │                   │ joinType=join                                         │          ║
║    │        │        │                   │ joinProjectionVars=[?person]                          │          ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────╢
║ 2  │ 3      │ -      │ PipelineJoin      │ pattern=distinct(?person, foaf:firstName, ?firstName) │ -        ║
║    │        │        │                   │ joinType=join                                         │          ║
║    │        │        │                   │ joinProjectionVars=[?person, ?firstName]              │          ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────╢
║ 3  │ 4      │ -      │ Projection        │ vars=[?firstName]                                     │ retain   ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────╢
║ 4  │ -      │ -      │ TermResolution    │ vars=[?firstName]                                     │ id2value ║
╚════╧════════╧════════╧═══════════════════╧═══════════════════════════════════════════════════════╧══════════╝
```

## Different ways of encoding parameters
<a name="sparql-explain-example-parameters"></a>

The following example queries illustrate two different ways to encode parameters when invoking SPARQL `explain`.

**Using URL encoding** – This example uses URL encoding of parameters, and specifies *dynamic* output:

```
curl -XGET "http(s)://your_server:your_port/sparql?query=SELECT%20*%20WHERE%20%7B%20%3Fs%20%3Fp%20%3Fo%20%7D%20LIMIT%20%31&explain=dynamic"
```

**Specifying the parameters directly** – This is the same as the previous query except that it passes the parameters through POST directly:

```
 curl http(s)://your_server:your_port/sparql \
   -d "query=SELECT * WHERE { ?s ?p ?o } LIMIT 1" \
   -d "explain=dynamic"
```
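
If you are scripting these calls, Python's standard `urllib.parse.urlencode` produces the URL-encoded form used by the GET variant above (server and port are placeholders, as in the curl examples):

```python
from urllib.parse import urlencode

# Build the URL-encoded query string for the GET variant of the request.
params = {"query": "SELECT * WHERE { ?s ?p ?o } LIMIT 1",
          "explain": "dynamic"}
encoded = urlencode(params)
url = f"https://your_server:your_port/sparql?{encoded}"
print(encoded)
```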

## Other output types besides text/plain
<a name="sparql-explain-output-options"></a>

The preceding examples use the default `text/plain` output type. Neptune can also format SPARQL `explain` output in two other MIME-type formats, namely `text/csv` and `text/html`. You invoke them by setting the HTTP `Accept` header, which you can do using the `-H` flag in `curl`, as follows:

```
  -H "Accept: output type"
```

Here are some examples:

**`text/csv` Output**  
This query calls for CSV MIME-type output by specifying `-H "Accept: text/csv"`:

```
 curl http(s)://your_server:your_port/sparql \
   -d "query=SELECT * WHERE { ?s ?p ?o } LIMIT 1" \
   -d "explain=dynamic" \
   -H "Accept: text/csv"
```

The CSV format, which is handy for importing into a spreadsheet or database, separates the fields in each `explain` row with semicolons (`;`), like this:

```
ID;Out #1;Out #2;Name;Arguments;Mode;Units In;Units Out;Ratio;Time (ms)
0;1;-;SolutionInjection;solutions=[{}];-;0;1;0.00;0
1;2;-;PipelineJoin;pattern=distinct(?s, ?p, ?o),joinType=join,joinProjectionVars=[?s, ?p, ?o];-;1;6;6.00;1
2;3;-;Projection;vars=[?s, ?p, ?o];retain;6;6;1.00;2
3;-;-;Slice;limit=1;-;1;1;1.00;1
```
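
Because the fields are semicolon-separated, this output can be loaded with any CSV parser by overriding the delimiter. For example, in Python (using the sample output above):

```python
import csv
import io

# The explain CSV output shown above, as a string.
sample = """ID;Out #1;Out #2;Name;Arguments;Mode;Units In;Units Out;Ratio;Time (ms)
0;1;-;SolutionInjection;solutions=[{}];-;0;1;0.00;0
1;2;-;PipelineJoin;pattern=distinct(?s, ?p, ?o),joinType=join,joinProjectionVars=[?s, ?p, ?o];-;1;6;6.00;1
2;3;-;Projection;vars=[?s, ?p, ?o];retain;6;6;1.00;2
3;-;-;Slice;limit=1;-;1;1;1.00;1
"""

# Parse with a semicolon delimiter, then find the slowest operator.
rows = list(csv.DictReader(io.StringIO(sample), delimiter=";"))
slowest = max(rows, key=lambda r: int(r["Time (ms)"]))
print(slowest["Name"])  # Projection (2 ms in this sample)
```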

 

**`text/html` Output**  
If you specify `-H "Accept: text/html"`, then `explain` generates an HTML table:

```
<!DOCTYPE html>
<html>
  <body>
    <table border="1px">
      <thead>
        <tr>
          <th>ID</th>
          <th>Out #1</th>
          <th>Out #2</th>
          <th>Name</th>
          <th>Arguments</th>
          <th>Mode</th>
          <th>Units In</th>
          <th>Units Out</th>
          <th>Ratio</th>
          <th>Time (ms)</th>
        </tr>
      </thead>

      <tbody>
        <tr>
          <td>0</td>
          <td>1</td>
          <td>-</td>
          <td>SolutionInjection</td>
          <td>solutions=[{}]</td>
          <td>-</td>
          <td>0</td>
          <td>1</td>
          <td>0.00</td>
          <td>0</td>
        </tr>

        <tr>
          <td>1</td>
          <td>2</td>
          <td>-</td>
          <td>PipelineJoin</td>
          <td>pattern=distinct(?s, ?p, ?o)<br>
              joinType=join<br>
              joinProjectionVars=[?s, ?p, ?o]</td>
          <td>-</td>
          <td>1</td>
          <td>6</td>
          <td>6.00</td>
          <td>1</td>
        </tr>

        <tr>
          <td>2</td>
          <td>3</td>
          <td>-</td>
          <td>Projection</td>
          <td>vars=[?s, ?p, ?o]</td>
          <td>retain</td>
          <td>6</td>
          <td>6</td>
          <td>1.00</td>
          <td>2</td>
        </tr>

        <tr>
          <td>3</td>
          <td>-</td>
          <td>-</td>
          <td>Slice</td>
          <td>limit=1</td>
          <td>-</td>
          <td>1</td>
          <td>1</td>
          <td>1.00</td>
          <td>1</td>
        </tr>
      </tbody>
    </table>
  </body>
</html>
```

The HTML renders in a browser something like the following:

![\[Sample of SPARQL Explain HTML output.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/sparql-explain-dynamic-html-output.png)


## Example of SPARQL `explain` output when the DFE is enabled
<a name="sparql-explain-output-dfe"></a>

The following is an example of SPARQL `explain` output when the Neptune DFE alternative query engine is enabled:

```
╔════╤════════╤════════╤═══════════════════╤═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╤══════════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name              │ Arguments                                                                                                                                                                                                               │ Mode     │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════╪═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╪══════════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]                                                                                                                                                                                                          │ -        │ 0        │ 1         │ 0.00  │ 0         ║
╟────┼────────┼────────┼───────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ HashIndexBuild    │ solutionSet=solutionSet1                                                                                                                                                                                                │ -        │ 1        │ 1         │ 1.00  │ 22        ║
║    │        │        │                   │ joinVars=[]                                                                                                                                                                                                             │          │          │           │       │           ║
║    │        │        │                   │ sourceType=pipeline                                                                                                                                                                                                     │          │          │           │       │           ║
╟────┼────────┼────────┼───────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 3      │ -      │ DFENode           │ DFE Stats=                                                                                                                                                                                                                    │ -        │ 101      │ 100       │ 0.99  │ 32        ║
║    │        │        │                   │ ====> DFE execution time (measured by DFEQueryEngine)                                                                                                                                                                   │          │          │           │       │           ║
║    │        │        │                   │ accepted [micros]=127                                                                                                                                                                                                   │          │          │           │       │           ║
║    │        │        │                   │ ready [micros]=2                                                                                                                                                                                                        │          │          │           │       │           ║
║    │        │        │                   │ running [micros]=5627                                                                                                                                                                                                   │          │          │           │       │           ║
║    │        │        │                   │ finished [micros]=0                                                                                                                                                                                                     │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ ===> DFE execution time (measured in DFENode)                                                                                                                                                                           │          │          │           │       │           ║
║    │        │        │                   │ -> setupTime [ms]=1                                                                                                                                                                                                     │          │          │           │       │           ║
║    │        │        │                   │ -> executionTime [ms]=14                                                                                                                                                                                                │          │          │           │       │           ║
║    │        │        │                   │ -> resultReadTime [ms]=0                                                                                                                                                                                                │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ ===> Static analysis statistics                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ --> 35907 micros spent in parser.                                                                                                                                                                                       │          │          │           │       │           ║
║    │        │        │                   │ --> 7643 micros spent in range count estimation                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ --> 2895 micros spent in value resolution                                                                                                                                                                               │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ --> 39974925 micros spent in optimizer loop                                                                                                                                                                             │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ DFEJoinGroupNode[ children={                                                                                                                                                                                            │          │          │           │       │           ║
║    │        │        │                   │   DFEPatternNode[(?1, TERM[117442062], ?2, ?3) . project DISTINCT[?1, ?2] {rangeCountEstimate=100},                                                                                                                     │          │          │           │       │           ║
║    │        │        │                   │     OperatorInfoWithAlternative[                                                                                                                                                                                        │          │          │           │       │           ║
║    │        │        │                   │       rec=OperatorInfo[                                                                                                                                                                                                 │          │          │           │       │           ║
║    │        │        │                   │         type=INCREMENTAL_PIPELINE_JOIN,                                                                                                                                                                                 │          │          │           │       │           ║
║    │        │        │                   │         costEstimates=OperatorCostEstimates[                                                                                                                                                                            │          │          │           │       │           ║
║    │        │        │                   │           costEstimate=OperatorCostEstimate[in=1.0000,out=100.0000,io=0.0002,comp=0.0000,mem=0],                                                                                                                        │          │          │           │       │           ║
║    │        │        │                   │           worstCaseCostEstimate=OperatorCostEstimate[in=1.0000,out=100.0000,io=0.0002,comp=0.0000,mem=0]]],                                                                                                             │          │          │           │       │           ║
║    │        │        │                   │       alt=OperatorInfo[                                                                                                                                                                                                 │          │          │           │       │           ║
║    │        │        │                   │         type=INCREMENTAL_HASH_JOIN,                                                                                                                                                                                     │          │          │           │       │           ║
║    │        │        │                   │         costEstimates=OperatorCostEstimates[                                                                                                                                                                            │          │          │           │       │           ║
║    │        │        │                   │           costEstimate=OperatorCostEstimate[in=1.0000,out=100.0000,io=0.0003,comp=0.0000,mem=3212],                                                                                                                     │          │          │           │       │           ║
║    │        │        │                   │           worstCaseCostEstimate=OperatorCostEstimate[in=1.0000,out=100.0000,io=0.0003,comp=0.0000,mem=3212]]]]],                                                                                                        │          │          │           │       │           ║
║    │        │        │                   │   DFEPatternNode[(?1, TERM[150997262], ?4, ?5) . project DISTINCT[?1, ?4] {rangeCountEstimate=100},                                                                                                                     │          │          │           │       │           ║
║    │        │        │                   │     OperatorInfoWithAlternative[                                                                                                                                                                                        │          │          │           │       │           ║
║    │        │        │                   │       rec=OperatorInfo[                                                                                                                                                                                                 │          │          │           │       │           ║
║    │        │        │                   │         type=INCREMENTAL_HASH_JOIN,                                                                                                                                                                                     │          │          │           │       │           ║
║    │        │        │                   │         costEstimates=OperatorCostEstimates[                                                                                                                                                                            │          │          │           │       │           ║
║    │        │        │                   │           costEstimate=OperatorCostEstimate[in=100.0000,out=100.0000,io=0.0003,comp=0.0000,mem=6400],                                                                                                                   │          │          │           │       │           ║
║    │        │        │                   │           worstCaseCostEstimate=OperatorCostEstimate[in=100.0000,out=100.0000,io=0.0003,comp=0.0000,mem=6400]]],                                                                                                        │          │          │           │       │           ║
║    │        │        │                   │       alt=OperatorInfo[                                                                                                                                                                                                 │          │          │           │       │           ║
║    │        │        │                   │         type=INCREMENTAL_PIPELINE_JOIN,                                                                                                                                                                                 │          │          │           │       │           ║
║    │        │        │                   │         costEstimates=OperatorCostEstimates[                                                                                                                                                                            │          │          │           │       │           ║
║    │        │        │                   │           costEstimate=OperatorCostEstimate[in=100.0000,out=100.0000,io=0.0010,comp=0.0000,mem=0],                                                                                                                      │          │          │           │       │           ║
║    │        │        │                   │           worstCaseCostEstimate=OperatorCostEstimate[in=100.0000,out=100.0000,io=0.0010,comp=0.0000,mem=0]]]]]                                                                                                          │          │          │           │       │           ║
║    │        │        │                   │ },                                                                                                                                                                                                                      │          │          │           │       │           ║
║    │        │        │                   │ ]                                                                                                                                                                                                                       │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ ===> DFE configuration:                                                                                                                                                                                                 │          │          │           │       │           ║
║    │        │        │                   │ solutionChunkSize=5000                                                                                                                                                                                                  │          │          │           │       │           ║
║    │        │        │                   │ ouputQueueSize=20                                                                                                                                                                                                       │          │          │           │       │           ║
║    │        │        │                   │ numComputeCores=3                                                                                                                                                                                                       │          │          │           │       │           ║
║    │        │        │                   │ maxParallelIO=10                                                                                                                                                                                                        │          │          │           │       │           ║
║    │        │        │                   │ numInitialPermits=12                                                                                                                                                                                                    │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ ====> DFE configuration (reported back)                                                                                                                                                                                 │          │          │           │       │           ║
║    │        │        │                   │ numComputeCores=3                                                                                                                                                                                                       │          │          │           │       │           ║
║    │        │        │                   │ maxParallelIO=2                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ numInitialPermits=12                                                                                                                                                                                                    │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ ===> Statistics & operator histogram                                                                                                                                                                                    │          │          │           │       │           ║
║    │        │        │                   │ ==> Statistics                                                                                                                                                                                                          │          │          │           │       │           ║
║    │        │        │                   │ -> 3741 / 3668 micros total elapsed (incl. wait / excl. wait)                                                                                                                                                           │          │          │           │       │           ║
║    │        │        │                   │ -> 3741 / 3 millis total elapse (incl. wait / excl. wait)                                                                                                                                                               │          │          │           │       │           ║
║    │        │        │                   │ -> 3741 / 0 secs total elapsed (incl. wait / excl. wait)                                                                                                                                                                │          │          │           │       │           ║
║    │        │        │                   │ ==> Operator histogram                                                                                                                                                                                                  │          │          │           │       │           ║
║    │        │        │                   │ -> 47.66% of total time (excl. wait): pipelineScan (2 instances)                                                                                                                                                        │          │          │           │       │           ║
║    │        │        │                   │ -> 10.99% of total time (excl. wait): merge (1 instances)                                                                                                                                                               │          │          │           │       │           ║
║    │        │        │                   │ -> 41.17% of total time (excl. wait): symmetricHashJoin (1 instances)                                                                                                                                                   │          │          │           │       │           ║
║    │        │        │                   │ -> 0.19% of total time (excl. wait): drain (1 instances)                                                                                                                                                                │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ nodeId | out0   | out1 | opName            | args                                             | rowsIn | rowsOut | chunksIn | chunksOut | elapsed* | outWait | outBlocked | ratio    | rate* [M/s] | rate [M/s] | %     │          │          │           │       │           ║
║    │        │        │                   │ ------ | ------ | ---- | ----------------- | ------------------------------------------------ | ------ | ------- | -------- | --------- | -------- | ------- | ---------- | -------- | ----------- | ---------- | ----- │          │          │           │       │           ║
║    │        │        │                   │ node_0 | node_2 | -    | pipelineScan      | (?1, TERM[117442062], ?2, ?3) DISTINCT [?1, ?2]  | 0      | 100     | 0        | 1         | 874      | 0       | 0          | Infinity | 0.1144      | 0.1144     | 23.83 │          │          │           │       │           ║
║    │        │        │                   │ node_1 | node_2 | -    | pipelineScan      | (?1, TERM[150997262], ?4, ?5) DISTINCT [?1, ?4]  | 0      | 100     | 0        | 1         | 874      | 0       | 0          | Infinity | 0.1144      | 0.1144     | 23.83 │          │          │           │       │           ║
║    │        │        │                   │ node_2 | node_4 | -    | symmetricHashJoin |                                                  | 200    | 100     | 2        | 2         | 1510     | 73      | 0          | 0.50     | 0.0662      | 0.0632     | 41.17 │          │          │           │       │           ║
║    │        │        │                   │ node_3 | -      | -    | drain             |                                                  | 100    | 0       | 1        | 0         | 7        | 0       | 0          | 0.00     | 0.0000      | 0.0000     | 0.19  │          │          │           │       │           ║
║    │        │        │                   │ node_4 | node_3 | -    | merge             |                                                  | 100    | 100     | 2        | 1         | 403      | 0       | 0          | 1.00     | 0.2481      | 0.2481     | 10.99 │          │          │           │       │           ║
╟────┼────────┼────────┼───────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ 4      │ -      │ HashIndexJoin     │ solutionSet=solutionSet1                                                                                                                                                                                                │ -        │ 100      │ 100       │ 1.00  │ 4         ║
║    │        │        │                   │ joinType=join                                                                                                                                                                                                           │          │          │           │       │           ║
╟────┼────────┼────────┼───────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 4  │ 5      │ -      │ Distinct          │ vars=[?s, ?o, ?o1]                                                                                                                                                                                                      │ -        │ 100      │ 100       │ 1.00  │ 9         ║
╟────┼────────┼────────┼───────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 5  │ 6      │ -      │ Projection        │ vars=[?s, ?o, ?o1]                                                                                                                                                                                                      │ retain   │ 100      │ 100       │ 1.00  │ 2         ║
╟────┼────────┼────────┼───────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 6  │ -      │ -      │ TermResolution    │ vars=[?s, ?o, ?o1]                                                                                                                                                                                                      │ id2value │ 100      │ 100       │ 1.00  │ 11        ║
╚════╧════════╧════════╧═══════════════════╧═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╧══════════╧══════════╧═══════════╧═══════╧═══════════╝
```

# Neptune SPARQL `explain` operators
<a name="sparql-explain-operators"></a>

The following sections describe the operators and parameters for the SPARQL `explain` feature currently available in Amazon Neptune.

**Important**  
The SPARQL `explain` feature is still being refined. The operators and parameters documented here might change in future versions.

**Topics**
+ [`Aggregation` operator](#sparql-explain-operator-aggregation)
+ [`ConditionalRouting` operator](#sparql-explain-operator-conditional-routing)
+ [`Copy` operator](#sparql-explain-operator-copy)
+ [`DFENode` operator](#sparql-explain-operator-dfenode)
+ [`Distinct` operator](#sparql-explain-operator-distinct)
+ [`Federation` operator](#sparql-explain-operator-federation)
+ [`Filter` operator](#sparql-explain-operator-filter)
+ [`HashIndexBuild` operator](#sparql-explain-operator-hash-index-build)
+ [`HashIndexJoin` operator](#sparql-explain-operator-hash-index-join)
+ [`MergeJoin` operator](#sparql-explain-operator-merge-join)
+ [`NamedSubquery` operator](#sparql-explain-operator-named-subquery)
+ [`PipelineJoin` operator](#sparql-explain-operator-pipeline-join)
+ [`PipelineCountJoin` operator](#sparql-explain-operator-pipeline-count-join)
+ [`PipelinedHashIndexJoin` operator](#sparql-explain-operator-pipeline-hash-index-join)
+ [`Projection` operator](#sparql-explain-operator-projection)
+ [`PropertyPath` operator](#sparql-explain-operator-property-path)
+ [`TermResolution` operator](#sparql-explain-operator-term-resolution)
+ [`Slice` operator](#sparql-explain-operator-slice)
+ [`SolutionInjection` operator](#sparql-explain-operator-solution-injection)
+ [`Sort` operator](#sparql-explain-operator-sort)
+ [`VariableAlignment` operator](#sparql-explain-operator-variable-alignment)

## `Aggregation` operator
<a name="sparql-explain-operator-aggregation"></a>

Performs one or more aggregations, implementing the semantics of SPARQL aggregation operators such as `count`, `max`, `min`, and `sum`.

`Aggregation` comes with optional grouping using `groupBy` clauses, and optional `having` constraints.

**Arguments**
+ `groupBy` – (*Optional*) Provides a `groupBy` clause that specifies the sequence of expressions according to which the incoming solutions are grouped.
+ `aggregates` – (*Required*) Specifies an ordered list of aggregation expressions.
+ `having` – (*Optional*) Adds constraints to filter on groups, as implied by the `having` clause in the SPARQL query.
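
For illustration (this example query is hypothetical, not drawn from Neptune output), a query of the following shape would typically be planned with an `Aggregation` operator carrying `groupBy`, `aggregates`, and `having` arguments:

```sparql
# Grouping, an aggregate expression, and a HAVING constraint together
# exercise all three Aggregation arguments.
SELECT ?type (COUNT(?s) AS ?count)
WHERE { ?s a ?type }
GROUP BY ?type            # -> groupBy argument
HAVING (COUNT(?s) > 10)   # -> having argument
```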

## `ConditionalRouting` operator
<a name="sparql-explain-operator-conditional-routing"></a>

Routes incoming solutions based on a given condition. Solutions that satisfy the condition are routed to the operator ID referenced by `Out #1`, whereas solutions that do not are routed to the operator referenced by `Out #2`.

**Arguments**
+ `condition` – (*Required*) The routing condition.

## `Copy` operator
<a name="sparql-explain-operator-copy"></a>

Delegates the solution stream according to the specified mode.

**Modes**
+ `forward` – Forwards the solutions to the downstream operator identified by `Out #1`. 
+ `duplicate` – Duplicates the solutions and forwards them to each of the two operators identified by `Out #1` and `Out #2`.

`Copy` has no arguments.

## `DFENode` operator
<a name="sparql-explain-operator-dfenode"></a>

This operator is an abstraction of the plan that is run by the DFE alternative query engine. The detailed DFE plan is outlined in the arguments for this operator. The argument is currently overloaded to contain the detailed runtime statistics of the DFE plan, including the time spent in the various steps of query execution by the DFE.

The optimized logical abstract syntax tree (AST) for the DFE query plan is printed with information about the operator types that were considered while planning, along with the associated best- and worst-case costs of running those operators. The AST currently consists of the following types of nodes:
+ `DFEJoinGroupNode` –  Represents a join of one or more `DFEPatternNodes`.
+ `DFEPatternNode` –  Encapsulates a pattern that is used to project matching tuples out of the underlying database.

The `Statistics & Operator histogram` sub-section contains details about the execution time of the `DataflowOp` plan and a breakdown of the CPU time used by each operator. Below this is a table that prints detailed runtime statistics for the plan executed by the DFE.

**Note**  
Because the DFE is an experimental feature released in lab mode, the exact format of its `explain` output may change.

## `Distinct` operator
<a name="sparql-explain-operator-distinct"></a>

Computes the distinct projection on a subset of the variables, eliminating duplicates. As a result, the number of solutions flowing in is larger than or equal to the number of solutions flowing out.

**Arguments**
+ `vars` – (*Required*) The variables to which to apply the `Distinct` projection.
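
For illustration (a hypothetical example, not drawn from Neptune output), a `SELECT DISTINCT` query would typically be planned with a `Distinct` operator whose `vars` argument lists the projected variables:

```sparql
# Duplicate (?s, ?o) bindings are eliminated by a Distinct operator
# with vars=[?s, ?o].
SELECT DISTINCT ?s ?o
WHERE { ?s ?p ?o }
```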

## `Federation` operator
<a name="sparql-explain-operator-federation"></a>

Passes a specified query to a specified remote SPARQL endpoint.

**Arguments**
+ `endpoint` – (*Required*) The endpoint URL in the SPARQL `SERVICE` statement. This can be a constant string, or if the query endpoint is determined based on a variable within the same query, it can be the variable name.
+ `query` – (*Required*) The reconstructed query string to be sent to the remote endpoint. The engine adds default prefixes to this query even when the client doesn't specify any.
+ `silent` – (*Required*) A Boolean that indicates whether the `SILENT` keyword appeared after the `SERVICE` keyword. `SILENT` tells the engine not to fail the whole query even if the remote `SERVICE` portion fails.
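For example, a query along the following lines (the endpoint URL and IRIs are hypothetical) produces a `Federation` operator with `silent` set to `true`:

```
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?friend WHERE {
    SERVICE SILENT <http://example.org/sparql> {
        <http://example.org/alice> foaf:knows ?friend .
    }
}
```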

## `Filter` operator
<a name="sparql-explain-operator-filter"></a>

Filters the incoming solutions. Only those solutions that satisfy the filter condition are forwarded to the upstream operator, and all others are dropped.

**Arguments**
+ `condition` – (*Required*) The filter condition.
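As a sketch (the vocabulary is hypothetical), the `FILTER` clause in a query like the following is what this operator evaluates:

```
PREFIX ex: <http://example.org/>

SELECT ?person ?age WHERE {
    ?person ex:age ?age .
    FILTER (?age >= 21)
}
```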

## `HashIndexBuild` operator
<a name="sparql-explain-operator-hash-index-build"></a>

Takes a list of bindings and spools them into a hash index whose name is defined by the `solutionSet` argument. Typically, subsequent operators perform joins against this solution set, referring to it by that name.

**Arguments**
+ `solutionSet` – (*Required*) The name of the hash index solution set.
+ `sourceType` – (*Required*) The type of the source from which the bindings to store in the hash index are obtained:
  + `pipeline` – Spools the incoming solutions from the downstream operator in the operator pipeline into the hash index.
  + `binding set` – Spools the fixed binding set specified by the `sourceBindingSet` argument into the hash index.
+ `sourceBindingSet` – (*Optional*) If the `sourceType` argument value is `binding set`, this argument specifies the static binding set to be spooled into the hash index.

## `HashIndexJoin` operator
<a name="sparql-explain-operator-hash-index-join"></a>

Joins the incoming solutions against the hash index solution set identified by the `solutionSet` argument.

**Arguments**
+ `solutionSet` – (*Required*) Name of the solution set to join against. This must be a hash index that has been constructed in a prior step using the `HashIndexBuild` operator.
+ `joinType` – (*Required*) The type of join to be performed:
  + `join` – A normal join, requiring an exact match between all shared variables.
  + `optional` – An `optional` join that uses the SPARQL `OPTIONAL` operator semantics.
  + `minus` – A `minus` operation that retains only those mappings for which no join partner exists, using the SPARQL `MINUS` operator semantics.
  + `existence check` – Checks whether there is a join partner or not, and binds the `existenceCheckResultVar` variable to the result of this check.
+ `constraints` – (*Optional*) Additional join constraints that are considered during the join. Joins that do not satisfy these constraints are discarded.
+ `existenceCheckResultVar` – (*Optional*) Only used for joins where `joinType` equals `existence check` (see the `joinType` argument earlier).
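As an illustrative sketch, a `MINUS` clause such as the following is the kind of construct that can be evaluated as a join with `joinType` set to `minus` (whether the plan uses `HashIndexJoin` or another join operator depends on the optimizer):

```
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person WHERE {
    ?person a foaf:Person .
    MINUS { ?person foaf:knows ?someone . }
}
```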

## `MergeJoin` operator
<a name="sparql-explain-operator-merge-join"></a>

A merge join over multiple solution sets, as identified by the `solutionSets` argument.

**Arguments**
+ `solutionSets` – (*Required*) The solution sets to join together.

## `NamedSubquery` operator
<a name="sparql-explain-operator-named-subquery"></a>

Triggers evaluation of the subquery identified by the `subQuery` argument and spools the result into the solution set specified by the `solutionSet` argument. The incoming solutions for the operator are forwarded to the subquery and then to the next operator.

**Arguments**
+ `subQuery` – (*Required*) Name of the subquery to evaluate. The subquery is rendered explicitly in the output.
+ `solutionSet` – (*Required*) The name of the solution set in which to store the subquery result.
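For example, a subquery like the one below (the vocabulary is hypothetical) can be evaluated separately, with its result spooled into a named solution set for the outer query to join against:

```
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?name WHERE {
    {
        SELECT ?person WHERE {
            ?person foaf:knows ?friend .
        }
        GROUP BY ?person
        HAVING (COUNT(?friend) > 10)
    }
    ?person foaf:name ?name .
}
```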

## `PipelineJoin` operator
<a name="sparql-explain-operator-pipeline-join"></a>

Receives as input the output of the previous operator and joins it against the tuple pattern defined by the `pattern` argument.

**Arguments**
+ `pattern` – (*Required*) The pattern that underlies the join, in the form of a subject-predicate-object tuple, optionally extended with a graph component. If `distinct` is specified for the pattern, the join extracts only distinct solutions over the projection variables specified by the `projectionVars` argument, rather than all matching solutions.
+ `inlineFilters` – (*Optional*) A set of filters to be applied to the variables in the pattern. The pattern is evaluated in conjunction with these filters.
+ `joinType` – (*Required*) The type of join to be performed:
  + `join` – A normal join, requiring an exact match between all shared variables.
  + `optional` – An `optional` join that uses the SPARQL `OPTIONAL` operator semantics.
  + `minus` – A `minus` operation that retains only those mappings for which no join partner exists, using the SPARQL `MINUS` operator semantics.
  + `existence check` – Checks whether there is a join partner or not, and binds the `existenceCheckResultVar` variable to the result of this check.
+ `constraints` – (*Optional*) Additional join constraints that are considered during the join. Joins that do not satisfy these constraints are discarded.
+ `projectionVars` – (*Optional*) The projection variables. Used in combination with `distinct := true` to enforce the extraction of distinct projections over a specified set of variables.
+ `cutoffLimit` – (*Optional*) A cutoff limit for the number of join partners extracted. Although there is no limit by default, you can set this to 1 when performing joins to implement `FILTER (NOT) EXISTS` clauses, where it is sufficient to prove or disprove that there is a join partner.
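As a sketch of the `cutoffLimit` case, a `FILTER NOT EXISTS` clause such as the following only needs to disprove the existence of a single join partner, so a cutoff limit of 1 is sufficient:

```
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person WHERE {
    ?person a foaf:Person .
    FILTER NOT EXISTS { ?person foaf:knows ?friend . }
}
```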

## `PipelineCountJoin` operator
<a name="sparql-explain-operator-pipeline-count-join"></a>

Variant of the `PipelineJoin`. Instead of joining, it just counts the matching join partners and binds the count to the variable specified by the `countVar` argument.

**Arguments**
+ `countVar` – (*Required*) The variable to which the count result, namely the number of join partners, should be bound.
+ `pattern` – (*Required*) The pattern that underlies the join, in the form of a subject-predicate-object tuple, optionally extended with a graph component. If `distinct` is specified for the pattern, the join extracts only distinct solutions over the projection variables specified by the `projectionVars` argument, rather than all matching solutions.
+ `inlineFilters` – (*Optional*) A set of filters to be applied to the variables in the pattern. The pattern is evaluated in conjunction with these filters.
+ `joinType` – (*Required*) The type of join to be performed:
  + `join` – A normal join, requiring an exact match between all shared variables.
  + `optional` – An `optional` join that uses the SPARQL `OPTIONAL` operator semantics.
  + `minus` – A `minus` operation that retains only those mappings for which no join partner exists, using the SPARQL `MINUS` operator semantics.
  + `existence check` – Checks whether there is a join partner or not, and binds the `existenceCheckResultVar` variable to the result of this check.
+ `constraints` – (*Optional*) Additional join constraints that are considered during the join. Joins that do not satisfy these constraints are discarded.
+ `projectionVars` – (*Optional*) The projection variables. Used in combination with `distinct := true` to enforce the extraction of distinct projections over a specified set of variables.
+ `cutoffLimit` – (*Optional*) A cutoff limit for the number of join partners extracted. Although there is no limit by default, you can set this to 1 when performing joins to implement `FILTER (NOT) EXISTS` clauses, where it is sufficient to prove or disprove that there is a join partner.

## `PipelinedHashIndexJoin` operator
<a name="sparql-explain-operator-pipeline-hash-index-join"></a>

This is an all-in-one build hash index and join operator. It takes a list of bindings, spools them into a hash index, and then joins the incoming solutions against the hash index.

**Arguments**
+ `sourceType` – (*Required*) The type of the source from which the bindings to store in the hash index are obtained:
  + `pipeline` – Causes `PipelinedHashIndexJoin` to spool the incoming solutions from the downstream operator in the operator pipeline into the hash index.
  + `binding set` – Causes `PipelinedHashIndexJoin` to spool the fixed binding set specified by the `sourceBindingSet` argument into the hash index.
+ `sourceSubQuery` – (*Optional*) If the `sourceType` argument value is `pipeline`, this argument specifies the subquery that is evaluated and spooled into the hash index.
+ `sourceBindingSet` – (*Optional*) If the `sourceType` argument value is `binding set`, this argument specifies the static binding set to be spooled into the hash index.
+ `joinType` – (*Required*) The type of join to be performed:
  + `join` – A normal join, requiring an exact match between all shared variables.
  + `optional` – An `optional` join that uses the SPARQL `OPTIONAL` operator semantics.
  + `minus` – A `minus` operation that retains only those mappings for which no join partner exists, using the SPARQL `MINUS` operator semantics.
  + `existence check` – Checks whether there is a join partner or not, and binds the `existenceCheckResultVar` variable to the result of this check.
+ `existenceCheckResultVar` – (*Optional*) Only used for joins where `joinType` equals `existence check` (see the `joinType` argument earlier).

## `Projection` operator
<a name="sparql-explain-operator-projection"></a>

Projects over a subset of the variables. The number of solutions flowing in equals the number of solutions flowing out, but the shape of the solution differs, depending on the mode setting.

**Modes**
+ `retain` – Retain in solutions only the variables that are specified by the `vars` argument.
+ `drop` – Drop all the variables that are specified by the `vars` argument.

**Arguments**
+ `vars` – (*Required*) The variables to retain or drop, depending on the mode setting.

## `PropertyPath` operator
<a name="sparql-explain-operator-property-path"></a>

Enables recursive property paths such as `+` or `*`. Neptune implements a fixed-point iteration approach based on a template specified by the `iterationTemplate` argument. Known left-side or right-side variables are bound in the template for every fixed-point iteration, until no more new solutions can be found.

**Arguments**
+ `iterationTemplate` – (*Required*) Name of the subquery template used to implement the fixed-point iteration.
+ `leftTerm` – (*Required*) The term (variable or constant) on the left side of the property path.
+ `rightTerm` – (*Required*) The term (variable or constant) on the right side of the property path.
+ `lowerBound` – (*Required*) The lower bound for fixed-point iteration (either `0` for `*` queries, or `1` for `+` queries).
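For example (the starting IRI is hypothetical), the `+` path in the following query maps to a `PropertyPath` operator with `lowerBound` set to `1`, whereas `foaf:knows*` would set it to `0`:

```
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?reachable WHERE {
    <http://example.org/alice> foaf:knows+ ?reachable .
}
```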

## `TermResolution` operator
<a name="sparql-explain-operator-term-resolution"></a>

Translates internal string identifier values back to their corresponding external strings, or translates external strings to internal string identifier values, depending on the mode.

**Modes**
+ `value2id` – Maps terms such as literals and URIs to corresponding internal ID values (encoding to internal values).
+ `id2value` – Maps internal ID values to the corresponding terms such as literals and URIs (decoding of internal values).

**Arguments**
+ `vars` – (*Required*) Specifies the variables whose strings or internal string IDs should be mapped.

## `Slice` operator
<a name="sparql-explain-operator-slice"></a>

Implements a slice over the incoming solution stream, using the semantics of SPARQL’s `LIMIT` and `OFFSET` clauses.

**Arguments**
+ `limit` – (*Optional*) A limit on the solutions to be forwarded.
+ `offset` – (*Optional*) The offset at which solutions are evaluated for forwarding.
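For example, the following query corresponds to a `Slice` with `offset` set to `20` and `limit` set to `10`:

```
SELECT ?s ?p ?o WHERE {
    ?s ?p ?o .
}
OFFSET 20
LIMIT 10
```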

## `SolutionInjection` operator
<a name="sparql-explain-operator-solution-injection"></a>

Receives no input. Statically injects solutions into the query plan and records them in the `solutions` argument.

Query plans always begin with this static injection. If static solutions to inject can be derived from the query itself by combining various sources of static bindings (for example, from `VALUES` or `BIND` clauses), then the `SolutionInjection` operator injects these derived static solutions. In the simplest case, these reflect bindings that are implied by an outer `VALUES` clause.

If no static solutions can be derived from the query, `SolutionInjection` injects the empty, so-called universal solution, which is expanded and multiplied throughout the query-evaluation process.

**Arguments**
+ `solutions` – (*Required*) The sequence of solutions injected by the operator.
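As a sketch, the `VALUES` clause in a query like the following (the IRIs are hypothetical) yields static solutions that `SolutionInjection` can inject at the start of the plan:

```
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?name WHERE {
    VALUES ?person { <http://example.org/alice> <http://example.org/bob> }
    ?person foaf:name ?name .
}
```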

## `Sort` operator
<a name="sparql-explain-operator-sort"></a>

Sorts the solution set using specified sort conditions.

**Arguments**
+ `sortOrder` – (*Required*) An ordered list of variables, each annotated with an `ASC` (ascending) or `DESC` (descending) identifier, used sequentially to sort the solution set.
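For example, the `ORDER BY` clause in the following sketch (hypothetical vocabulary) corresponds to a sort order of `DESC(?age), ASC(?name)`:

```
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?name ?age WHERE {
    ?person foaf:name ?name ;
            foaf:age ?age .
}
ORDER BY DESC(?age) ASC(?name)
```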

## `VariableAlignment` operator
<a name="sparql-explain-operator-variable-alignment"></a>

Inspects solutions one by one, performing alignment on each one over two variables: a specified `sourceVar` and a specified `targetVar`.

If `sourceVar` and `targetVar` in a solution have the same value, the variables are considered aligned and the solution is forwarded, with the redundant `sourceVar` projected out.

If the variables bind to different values, the solution is filtered out entirely.

**Arguments**
+ `sourceVar` – (*Required*) The source variable, to be compared to the target variable. If alignment succeeds in a solution, meaning that the two variables have the same value, the source variable is projected out.
+ `targetVar` – (*Required*) The target variable, with which the source variable is compared. Is retained even when alignment succeeds.

# Limitations of SPARQL `explain` in Neptune
<a name="sparql-explain-limitations"></a>

The Neptune SPARQL `explain` feature currently has the following limitations.

**Neptune Currently Supports Explain Only in SPARQL SELECT Queries**  
To inspect the evaluation process of other query forms, such as `ASK`, `CONSTRUCT`, `DESCRIBE`, and SPARQL UPDATE queries, transform them into a corresponding `SELECT` query. Then use `explain` on that `SELECT` query instead.

For example, to obtain `explain` information about an `ASK WHERE {...}` query, run the corresponding `SELECT WHERE {...} LIMIT 1` query with `explain`.

Similarly, for a `CONSTRUCT {...} WHERE {...}` query, drop the `CONSTRUCT {...}` part and run a `SELECT` query with `explain` on the second `WHERE {...}` clause. Evaluating the second `WHERE` clause generally reveals the main challenges of processing the `CONSTRUCT` query, because solutions flowing out of the second `WHERE` into the `CONSTRUCT` template generally only require straightforward substitution.

**Explain Operators May Change in Future Releases**  
The SPARQL `explain` operators and their parameters may change in future releases.

**Explain Output May Change in Future Releases**  
For example, column headers could change, and more columns might be added to the tables.

# SPARQL federated queries in Neptune using the `SERVICE` extension
<a name="sparql-service"></a>

Amazon Neptune fully supports the SPARQL federated query extension that uses the `SERVICE` keyword. (For more information, see [SPARQL 1.1 Federated Query](https://www.w3.org/TR/sparql11-federated-query/).)

The `SERVICE` keyword instructs the SPARQL query engine to execute a portion of the query against a remote SPARQL endpoint and compose the final query result. Only `READ` operations are possible. `WRITE` and `DELETE` operations are not supported. Neptune can only run federated queries against SPARQL endpoints that are accessible within its virtual private cloud (VPC). However, you can also use a reverse proxy in the VPC to make an external data source accessible within the VPC.

**Note**  
When SPARQL `SERVICE` is used to federate a query across two or more Neptune clusters in the same VPC, the security groups must be configured to allow those clusters to communicate with each other.

**Important**  
SPARQL 1.1 Federation makes service requests on your behalf when passing queries and parameters to external SPARQL endpoints. It is your responsibility to verify that the external SPARQL endpoints satisfy your application's data handling and security requirements.

## Example of a Neptune federated query
<a name="sparql-service-example-1"></a>

The following simple example shows how SPARQL federated queries work.

Suppose that a customer sends the following query to *Neptune-1* at `http://neptune-1:8182/sparql`.

```
SELECT * WHERE {
   ?person rdf:type foaf:Person .
   SERVICE <http://neptune-2:8182/sparql> {
       ?person foaf:knows ?friend .
    }
}
```

1. *Neptune-1* evaluates the first query pattern (*Q-1*) which is `?person rdf:type foaf:Person`, uses the results to resolve `?person` in *Q-2* (`?person foaf:knows ?friend`), and forwards the resulting pattern to *Neptune-2* at `http://neptune-2:8182/sparql`.

1. *Neptune-2* evaluates *Q-2* and sends the results back to *Neptune-1*.

1. *Neptune-1* joins the solutions for both patterns and sends the results back to the customer.

This flow is shown in the following diagram.

![\[Flow diagram showing SPARQL federated query patterns being evaluated and responses sent back to client.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/federated.png)


**Note**  
"By default, the optimizer determines at what point in query execution that the `SERVICE` instruction is executed. You can override this placement using the [joinOrder](sparql-query-hints-joinOrder.md) query hint.

## Access control for federated queries in Neptune
<a name="sparql-service-auth"></a>

Neptune uses AWS Identity and Access Management (IAM) for authentication and authorization. Access control for a federated query can involve more than one Neptune DB instance. These instances might have different requirements for access control. In certain circumstances, this can limit your ability to make a federated query.

Consider the simple example presented in the previous section. *Neptune-1* calls *Neptune-2* with the same credentials it was called with.
+ If *Neptune-1* requires IAM authentication and authorization, but *Neptune-2* does not, all you need is appropriate IAM permissions for *Neptune-1* to make the federated query.
+ If *Neptune-1* and *Neptune-2* both require IAM authentication and authorization, you need to attach IAM permissions for both databases to make the federated query. Both clusters must also be in the same AWS account and the same AWS Region; cross-Region and cross-account federated queries are not currently supported.
+ If *Neptune-1* is not IAM-enabled but *Neptune-2* is, however, you can't make a federated query at all. The reason is that *Neptune-1* can't retrieve your IAM credentials and pass them on to *Neptune-2* to authorize the second part of the query.