

# Querying Neptune Analytics
<a name="query"></a>

Neptune Analytics currently supports only the openCypher query language for accessing a graph. openCypher is a declarative query language for property graphs. It was originally developed by Neo4j, open-sourced in 2015, and contributed to the [openCypher](https://opencypher.org/) project under an Apache 2.0 open-source license. Its syntax is documented in the [openCypher specification](https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf).

**Topics**
+ [Query APIs](query-APIs.md)
+ [Query plan cache](query-plan-cache.md)
+ [Concurrency and query queuing in Neptune Analytics](query-concurrency-queuing.md)
+ [Query explain](query-explain.md)
+ [Statistics](query-statistics.md)
+ [Exceptions](query-exceptions.md)
+ [Neptune Analytics openCypher data model](query-openCypher-data-model.md)
+ [Neptune Analytics OpenCypher specification compliance](query-openCypher-standards-compliance.md)
+ [Transaction isolation levels in Neptune Analytics](query-isolation-level.md)

# Query APIs
<a name="query-APIs"></a>

The Neptune Analytics data API provides support for data operations including query execution, query status checking, query cancellation, and graph summarizing via the HTTPS endpoint, the AWS CLI, and the SDK.

**Topics**
+ [ExecuteQuery](query-APIs-execute-query.md)
+ [ListQueries](query-APIs-list-queries.md)
+ [GetQuery](query-APIs-get-query.md)
+ [CancelQuery](query-APIs-cancel-query.md)
+ [GraphSummary](query-APIs-graph-summary.md)
+ [IAM role mappings](query-APIs-IAM-role-mappings.md)

# ExecuteQuery
<a name="query-APIs-execute-query"></a>

ExecuteQuery runs queries against a Neptune Analytics graph. Supported language: openCypher.

## ExecuteQuery inputs
<a name="query-APIs-execute-query-input"></a>
+ graph-identifier (required)

  Type: `String`

  The identifier representing a graph.
+ region (required)

  Type: `String`

  The region where the graph is present.
+ query-string (required)

  Type: `String`

  Default: none

  A string representing a query.
+ language (required)

  Type: `Enum`

  Default: none

  The query language the query is written in. Currently, only `OPEN_CYPHER` is supported.
+ parameters (optional)

  Type: `Map`

  A map from `String` to `String` where the key is the parameter name and the value is the parameter value.
+ plan-cache (optional)

  Type: `Enum`

  Query plan cache is a feature that saves the query plan and reuses it on successive executions of the same query, reducing query latency. The query plan cache works for both read-only and mutation queries. The plan cache is an LRU cache with a five-minute TTL and a capacity of 1000 plans. It supports the following values:
  + `AUTO`: The engine automatically decides whether to cache the query plan. If the query is parameterized and its runtime is shorter than 100ms, the query plan is automatically cached.
  + `ENABLED`: The query plan is cached regardless of the query runtime. The plan cache uses the query string as the key, so a query that differs even slightly (for example, in its constants) cannot reuse the cached plan of a similar query.
  + `DISABLED`: The query plan cache is not used.

  For more information on the query plan cache, see [Query plan cache](query-plan-cache.md).
+ explain-mode (optional)

  Type: `Enum`

  The explain-mode parameter lets you get a query explain instead of the actual query results. A query explain can be used to gather insights about query execution, such as planning decisions, time spent on each operator, and the number of records flowing through the plan. If this parameter is not set, the query is executed normally and the result is returned. The acceptable values for query explain are:
  + `STATIC`: Returns a query explain without executing the query. This can give an estimate of what the query plan looks like without actually executing the query. The static query plan may differ from the actual query plan. Actual queries may make planning decisions based on runtime statistics, which may not be considered when fetching a static query plan. A static query plan is useful when it is necessary to observe a plan for a query that either does not complete or runs for too long.
  + `DETAILS`: Returns a detailed query plan that shows what the running query did. This includes information such as operator runtimes, the number of records flowing through the plan, runtime planning decisions, and more. If a query does not succeed without explain (`NONE` mode), it will not succeed in `DETAILS` mode either. In this instance, you would want to use `STATIC` mode.

  For more information on query explain and its output, see [Query explain](query-explain.md).
+ query-timeout-milliseconds (optional)

  Type: `Integer`

  If specified, provides an upper bound on the query runtime. This parameter overrides the graph's default timeout (30 minutes). Neptune Analytics graphs have a maximum query runtime of 60 minutes. If the specified timeout is greater than the maximum query runtime, the query only runs for the maximum query runtime.
  + Using the default settings, any CLI or SDK request times out in 60 seconds and attempts a retry. If you are running queries that can take longer than 60 seconds, it is recommended to set the CLI/SDK timeout to `0` (no timeout), or to a much larger value, to avoid unnecessary retries.

     It is also recommended to set `MAX_ATTEMPTS` for CLI/SDK to `1` for `execute_query` to avoid any retries by CLI/SDK. 

     For the Boto client, set the `read_timeout` to `None`, and the `total_max_attempts` to `1`. 

    ```
    import boto3
    from botocore.config import Config
    n = boto3.client('neptune-graph', 
                     config=(Config(retries={"total_max_attempts": 1, "mode": "standard"}, read_timeout=None)))
    ```

     For the CLI, set the `--cli-read-timeout` parameter to `0` for no timeout, and set the environment variable `AWS_MAX_ATTEMPTS` to `1` to prevent retries. 

    ```
    export AWS_MAX_ATTEMPTS=1
    ```

    ```
    aws neptune-graph execute-query \
    --graph-identifier <graph-id> \
    --region <region> \
    --query-string "MATCH (p:Person)-[r:KNOWS]->(p1) RETURN *;" \
    --cli-read-timeout 0 \
    --language open_cypher /tmp/out.txt
    ```
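The timeout rules above (a 30-minute graph default, capped at a 60-minute maximum) can be sketched as a small helper. This is an illustration of the documented behavior, not the service's implementation:

```python
# Sketch of the effective query timeout rules described above (illustrative only).
DEFAULT_TIMEOUT_MS = 30 * 60 * 1000   # graph default: 30 minutes
MAX_RUNTIME_MS = 60 * 60 * 1000       # hard maximum: 60 minutes

def effective_timeout_ms(requested_ms=None):
    """Return the timeout the engine would apply to a query."""
    if requested_ms is None:
        return DEFAULT_TIMEOUT_MS             # no override: graph default applies
    return min(requested_ms, MAX_RUNTIME_MS)  # capped at the maximum runtime
```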

## ExecuteQuery examples
<a name="query-APIs-execute-query-examples"></a>

------
#### [ AWS CLI ]

```
# Sample query
aws neptune-graph execute-query \
--graph-identifier <graph-id> \
--region <region> \
--query-string "MATCH (p:Person)-[r:KNOWS]->(p1) RETURN *;" \
--language open_cypher \
/tmp/out.txt

# Sample query that prints directly to the console.
aws neptune-graph execute-query \
--graph-identifier <graph-id> \
--region <region> \
--query-string "MATCH (p:Person)-[r:KNOWS]->(p1) RETURN *;" \
--language open_cypher \
/dev/stdout

# parameters supported
query-string [REQUIRED] : String
language [REQUIRED] : open_cypher
explain-mode [OPTIONAL] : static | details
query-timeout-milliseconds [OPTIONAL] : Integer
plan-cache [OPTIONAL] : enabled | disabled | auto
parameters [OPTIONAL] : Map
```

------
#### [ AWSCURL ]

```
# Sample query
awscurl -X POST "https://<graph-id>.<endpoint>/queries" \
-H "Content-Type: application/x-www-form-urlencoded" \
--region <region> \
--service neptune-graph \
-d "query=MATCH (p:Person)-[r:KNOWS]->(p1) RETURN *;"
```

------

## ExecuteQuery output
<a name="query-APIs-execute-query-output"></a>

```
{
  "results": [{
      "p": {
        "~id": "fa1ef9b0-fa32-4b37-8051-78f2bf0e0d63",
        "~entityType": "node",
        "~labels": ["Person"],
        "~properties": {
          "name": "Simone"
        }
      },
      "p1": {
        "~id": "edaded10-b22b-4818-a22e-ddebfcf37acb",
        "~entityType": "node",
        "~labels": ["Person"],
        "~properties": {
          "name": "Mirro"
        }
      },
      "r": {
        "~id": "neptune_reserved_1_1154145192329347075",
        "~entityType": "relationship",
        "~start": "fa1ef9b0-fa32-4b37-8051-78f2bf0e0d63",
        "~end": "edaded10-b22b-4818-a22e-ddebfcf37acb",
        "~type": "KNOWS",
        "~properties": {}
      }
    }]
}
```
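To show how a response of this shape can be consumed, here is a minimal sketch that splits one result row into node and relationship bindings using the `~entityType` discriminator. The field names come from the sample response above; the helper itself is hypothetical:

```python
def split_entities(result_row):
    """Partition one result row into node and relationship bindings."""
    nodes, relationships = {}, {}
    for name, entity in result_row.items():
        if entity.get("~entityType") == "node":
            nodes[name] = entity
        elif entity.get("~entityType") == "relationship":
            relationships[name] = entity
    return nodes, relationships

# Trimmed version of the sample response row above.
row = {
    "p": {"~id": "a", "~entityType": "node", "~labels": ["Person"],
          "~properties": {"name": "Simone"}},
    "r": {"~id": "e1", "~entityType": "relationship", "~start": "a",
          "~end": "b", "~type": "KNOWS", "~properties": {}},
}
nodes, rels = split_entities(row)
```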

## Parameterized queries
<a name="query-APIs-execute-query-parameterized-queries"></a>

Neptune Analytics supports parameterized openCypher queries. This allows you to use the same query structure multiple times with different arguments. Since the query structure doesn't change, Neptune Analytics tries to cache the plan for these parameterized queries that run in less than 100 milliseconds.

The following is an example of using a parameterized query with the Neptune openCypher HTTPS endpoint. The query is:

```
MATCH (n {name: $name, age: $age})
RETURN n
```

The parameters are defined as follows:

```
parameters={"name": "john", "age": 20}
```
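Because the HTTPS endpoint accepts `application/x-www-form-urlencoded` bodies, a request body that carries both the query and a JSON-encoded parameters map can be built with the standard library. This is a sketch of the encoding only; it does not sign or send the request:

```python
import json
from urllib.parse import urlencode

query = "MATCH (n {name: $name, age: $age}) RETURN n"
parameters = {"name": "john", "age": 20}

# Form-encode the body: the query text plus a JSON-encoded parameters map.
body = urlencode({"query": query, "parameters": json.dumps(parameters)})
```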

------
#### [ AWS CLI ]

```
# Sample query
aws neptune-graph execute-query \
--graph-identifier <graph-id> \
--region <region> \
--query-string "MATCH (n {name: \$name, age: \$age}) RETURN n" \
--parameters "{\"name\": \"john\", \"age\": 20}" \
--language open_cypher /tmp/out.txt
```

------
#### [ AWSCURL ]

```
# Sample query
awscurl -X POST "https://<graph-id>.<endpoint>/queries" \
-H "Content-Type: application/x-www-form-urlencoded" \
--region <region> \
--service neptune-graph \
-d "query=MATCH (n {name: \$name, age: \$age}) RETURN n;&parameters={\"name\": \"john\", \"age\": 20}"
```

------

# ListQueries
<a name="query-APIs-list-queries"></a>

The ListQueries API fetches the list of running, waiting, and cancelling queries on the graph.

## ListQueries syntax
<a name="query-APIs-list-queries-syntax"></a>

```
aws neptune-graph list-queries \
    --graph-identifier <graph-id> \
    --region <region> \
    --max-results <result_count> \
    --state [all | running | waiting | cancelling]
```

## ListQueries inputs
<a name="query-APIs-list-queries-inputs"></a>
+ graph-identifier (required)

  Type: `String`

  Identifier representing your graph.
+ region (required)

  Type: `String`

  Region where the graph is present.
+ max-results (required)

  Type: `Integer`

  The maximum number of results to be fetched by the API.
+ state (optional)

  Type: `String`

  Supported values: `all` | `running` | `waiting` | `cancelling`

  If the `state` parameter is not specified, the API fetches queries in all states.

## ListQueries outputs
<a name="query-APIs-list-queries-outputs"></a>

```
# Sample Response
{
    "queries": [
        {
            "id": "130ab841-8b4b-46c3-afbe-af00274c7fd9",
            "queryString": "MATCH p=(n)-[*]-(m) RETURN p;",
            "waited": 0,
            "elapsed": 1686,
            "state": "RUNNING"
        }
    ]
}
```

The output contains a list of query objects, each containing:
+ id: `String` - representing the unique identifier of the query.
+ queryString: `String` - The actual query text. The queryString may be truncated if the actual query string is too long.
+ waited: `Integer` - The time in milliseconds for which the query has waited in the waiting queue before being picked up by a worker thread.
+ elapsed: `Integer` - The time in milliseconds representing the running time of the query.
+ state: Current state of the query (running | waiting | cancelling).

The default list order is queries that are `running`, followed by `waiting` and `cancelling`.
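The ordering described above can be reproduced on the client side; as a sketch, here is how a combined listing might be sorted locally (hypothetical helper, using the state values from the sample response):

```python
# Order queries the way the API lists them: running, then waiting, then cancelling.
STATE_ORDER = {"RUNNING": 0, "WAITING": 1, "CANCELLING": 2}

def sort_queries(queries):
    """Sort query objects by state priority, preserving order within a state."""
    return sorted(queries, key=lambda q: STATE_ORDER.get(q["state"], 99))

queries = [
    {"id": "q2", "state": "WAITING"},
    {"id": "q1", "state": "RUNNING"},
    {"id": "q3", "state": "CANCELLING"},
]
ordered = sort_queries(queries)
```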

## ListQueries Examples
<a name="query-APIs-list-queries-examples"></a>

------
#### [ AWS CLI ]

```
aws neptune-graph list-queries \
    --graph-identifier <graph-id> \
    --region us-east-1 \
    --max-results 200 \
    --state waiting
```

------
#### [ AWSCURL ]

```
awscurl -X GET "https://<graph-id>.<endpoint>/queries?state=WAITING&maxResults=200" \
   -H "Content-Type: application/x-www-form-urlencoded" \
   --region us-east-1 \
   --service neptune-graph
```

------

# GetQuery
<a name="query-APIs-get-query"></a>

The GetQuery API can be used to get the status of a specific query request.

## GetQuery inputs
<a name="query-APIs-get-query-inputs"></a>
+ graph-identifier (required)

  Type: `String`

  The identifier representing a graph.
+ region (required)

  Type: `String`

  The region where the graph is present.
+ query-id (required)

  Type: `String`

  The id of the query request for which you want to get information.

## GetQuery outputs
<a name="query-APIs-get-query-outputs"></a>
+ id: The same id used in this request.
+ queryString: Non-truncated query string associated to this `query-id`.
+ waited: Time in milliseconds this query request had to wait to be executed.
+ elapsed: Time in milliseconds the query spent while in execution.
+ state: Current state of the query (running | waiting | cancelling).

```
{
    "id" : "d6873456-40a7-44d7-be5c-46b4acfdc171",
    "queryString" : "UNWIND range(1,100000) AS i MATCH (n) RETURN i, n",
    "waited" : 1,
    "elapsed" : 8645,
    "state" : "RUNNING"
}
```
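Combining the fields above, the total time a request has spent in the system so far is its queue wait plus its execution time. A trivial sketch (the helper name is hypothetical):

```python
def total_time_ms(query_status):
    """Queue wait plus execution time, both in milliseconds."""
    return query_status["waited"] + query_status["elapsed"]

# Trimmed version of the sample GetQuery output above.
status = {"id": "d6873456-40a7-44d7-be5c-46b4acfdc171",
          "waited": 1, "elapsed": 8645, "state": "RUNNING"}
```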

## GetQuery examples
<a name="query-APIs-get-query-examples"></a>

------
#### [ AWS CLI ]

```
aws neptune-graph get-query \
    --graph-identifier <graph-id> \
    --region <region> \
    --query-id <query-id>
```

------
#### [ AWSCURL ]

```
awscurl -X GET "https://<graph-id>.<endpoint>/queries/<query-id>" \
   -H "Content-Type: application/x-www-form-urlencoded" \
   --region us-east-1 \
   --service neptune-graph
```

------

# CancelQuery
<a name="query-APIs-cancel-query"></a>

CancelQuery cancels a specific query request.

## CancelQuery inputs
<a name="query-APIs-cancel-query-inputs"></a>
+ graph-identifier (required)

  Type: `String`

  The identifier representing a graph.
+ region (required)

  Type: `String`

  The region where the graph is present.
+ query-id (required)

  Type: `String`

  The id of the query request that you want to cancel.

## CancelQuery outputs
<a name="query-APIs-cancel-query-outputs"></a>

CancelQuery does not have any output.

## CancelQuery examples
<a name="query-APIs-cancel-query-examples"></a>

------
#### [ AWS CLI ]

```
aws neptune-graph cancel-query \
    --graph-identifier <graph-id> \
    --region <region> \
    --query-id <query-id>
```

------
#### [ AWSCURL ]

```
awscurl -X DELETE "https://<graph-id>.<endpoint>/queries/<query-id>"  --region us-east-1 --service neptune-graph
```

------

# GraphSummary
<a name="query-APIs-graph-summary"></a>

You can use the GetGraphSummary API to quickly gain a high-level understanding of your graph's data, size, and content. In a graph application, this API can be used to improve search results by providing discovered node or edge labels as part of the search.

The GetGraphSummary API retrieves a read-only list of node and edge labels and property keys, along with counts of nodes, edges, and properties. The API also accepts an optional parameter named `mode`, which can take one of two values: `basic` (the default) and `detailed`. The detailed graph summary response contains two additional fields, `nodeStructures` and `edgeStructures`.

## GetGraphSummary inputs
<a name="query-APIs-graph-summary-inputs"></a>

GetGraphSummary accepts two inputs:
+ graph-identifier (required) - The unique identifier of the graph.
+ mode (optional) - Can be `basic` or `detailed`.

## GetGraphSummary outputs
<a name="query-APIs-graph-summary-outputs"></a>

The response contains the following fields:
+ `version` - The version of this graph summary response.
+ `lastStatisticsComputationTime` - The timestamp, in ISO 8601 format, of the time at which Neptune Analytics last computed statistics.
+ `graphSummary`
  + `numNodes` - The number of nodes in the graph.
  + `numEdges` - The number of edges in the graph.
  + `numNodeLabels` - The number of distinct node labels in the graph.
  + `numEdgeLabels` - The number of distinct edge labels in the graph.
  + `nodeLabels` - List of distinct node labels in the graph.
  + `edgeLabels` - List of distinct edge labels in the graph.
  + `numNodeProperties` - The number of distinct node properties in the graph.
  + `numEdgeProperties` - The number of distinct edge properties in the graph.
  + `nodeProperties` - List of distinct node properties in the graph along with the count of nodes where each property is used.
  + `edgeProperties` - List of distinct edge properties in the graph along with the count of edges where each property is used.
  + `totalNodePropertyValues` - Total number of usages of all node properties.
  + `totalEdgePropertyValues` - Total number of usages of all edge properties.
  + `nodeStructures` (only present for mode=detailed) - Contains a list of node structures, each containing the following fields:
    + `count` - Number of nodes that have this specific structure.
    + `nodeProperties` - List of node properties present in this specific structure.
    + `distinctOutgoingEdgeLabels` - List of distinct outgoing edge labels present in this specific structure.
  + `edgeStructures` (only present for mode=detailed) - Contains a list of edge structures each containing the following fields:
    + `count` - Number of edges that have this specific structure.
    + `edgeProperties` - List of edge properties present in this specific structure.

## GetGraphSummary examples
<a name="query-APIs-graph-summary-examples"></a>

------
#### [ AWS CLI ]

```
# Sample query
aws neptune-graph get-graph-summary \
--graph-identifier <graph-id> \
--region <region> \
--mode detailed

# parameters supported
mode [Optional] : basic | detailed
```

------
#### [ AWSCURL ]

```
# Sample query
awscurl "https://<graph-id>.<endpoint>/summary" \
--region <region> \
--service neptune-graph
```

------

Sample output payload:

```
# this is the graph summary with "mode=detailed" 
{
    "version": "v1",
    "lastStatisticsComputationTime": "2024-01-25T19:50:42+00:00",
    "graphSummary": {
        "numNodes": 3749,
        "numEdges": 57645,
        "numNodeLabels": 4,
        "numEdgeLabels": 2,
        "nodeLabels": [
            "continent",
            "country",
            "version",
            "airport"
        ],
        "edgeLabels": [
            "contains",
            "route"
        ],
        "numNodeProperties": 14,
        "numEdgeProperties": 1,
        "nodeProperties": [
            {
                "code": 3749
            },
            {
                "desc": 3749
            },
            {
                "type": 3749
            },
            {
                "city": 3504
            },
            {
                "country": 3504
            },
            {
                "elev": 3504
            },
            {
                "icao": 3504
            },
            {
                "lat": 3504
            },
            {
                "lon": 3504
            },
            {
                "longest": 3504
            },
            {
                "region": 3504
            },
            {
                "runways": 3504
            },
            {
                "author": 1
            },
            {
                "date": 1
            }
        ],
        "edgeProperties": [
            {
                "dist": 50637
            }
        ],
        "totalNodePropertyValues": 42785,
        "totalEdgePropertyValues": 50637,
        "nodeStructures": [          // will not be present with mode=basic
            {
                "count": 3475,
                "nodeProperties": [
                    "city",
                    "code",
                    "country",
                    "desc",
                    "elev",
                    "icao",
                    "lat",
                    "lon",
                    "longest",
                    "region",
                    "runways",
                    "type"
                ],
                "distinctOutgoingEdgeLabels": [
                    "route"
                ]
            },
            {
                "count": 238,
                "nodeProperties": [
                    "code",
                    "desc",
                    "type"
                ],
                "distinctOutgoingEdgeLabels": [
                    "contains"
                ]
            },
            {
                "count": 29,
                "nodeProperties": [
                    "city",
                    "code",
                    "country",
                    "desc",
                    "elev",
                    "icao",
                    "lat",
                    "lon",
                    "longest",
                    "region",
                    "runways",
                    "type"
                ],
                "distinctOutgoingEdgeLabels": []
            },
            {
                "count": 6,
                "nodeProperties": [
                    "code",
                    "desc",
                    "type"
                ],
                "distinctOutgoingEdgeLabels": []
            },
            {
                "count": 1,
                "nodeProperties": [
                    "author",
                    "code",
                    "date",
                    "desc",
                    "type"
                ],
                "distinctOutgoingEdgeLabels": []
            }
        ],
        "edgeStructures": [          // will not be present with mode=basic
            {
                "count": 50637,
                "edgeProperties": [
                    "dist"
                ]
            }
        ]
    }
}
```
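One way to sanity-check a summary response: `totalNodePropertyValues` should equal the sum of the per-property usage counts in `nodeProperties`. A small sketch against a trimmed version of the payload above (the helper is hypothetical, not part of the API):

```python
def total_property_values(property_list):
    """Sum the usage counts from a nodeProperties/edgeProperties list."""
    return sum(count for entry in property_list for count in entry.values())

# nodeProperties from the sample payload above.
node_properties = [{"code": 3749}, {"desc": 3749}, {"type": 3749},
                   {"city": 3504}, {"country": 3504}, {"elev": 3504},
                   {"icao": 3504}, {"lat": 3504}, {"lon": 3504},
                   {"longest": 3504}, {"region": 3504}, {"runways": 3504},
                   {"author": 1}, {"date": 1}]
```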

# IAM role mappings
<a name="query-APIs-IAM-role-mappings"></a>

When you call Neptune Analytics API methods on a graph, the user or role making the calls must have an attached IAM policy that grants permissions for the actions you want to take. You set those permissions in the policy using the corresponding IAM actions. You can also restrict the actions that can be taken using [IAM condition keys](https://docs.aws.amazon.com//neptune/latest/userguide/iam-data-condition-keys.html).

Most IAM actions have the same name as the API methods they correspond to, but some methods in the data API have different names, and some IAM actions are shared by more than one method. The table below lists data API methods and their corresponding IAM actions.


| Data API operation name | IAM correspondences | 
| --- | --- | 
|  ListQueries  |  Action: ListQueries  | 
|  GetQuery  |  Action: GetQueryStatus  | 
|  CancelQuery  |  Action: CancelQuery  | 
|  GetGraphSummary  |  Action: GetGraphSummary  | 
|  ExecuteQuery  |  Action: ReadDataViaQuery Action: WriteDataViaQuery Action: DeleteDataViaQuery  | 

For more information, see [ Actions, resources and condition keys for Neptune Analytics](https://docs.aws.amazon.com//service-authorization/latest/reference/list_amazonneptuneanalytics.html).

# Query plan cache
<a name="query-plan-cache"></a>

When a query is submitted to Neptune Analytics, the query string is parsed and translated into a query plan, which is then optimized and executed by the engine. Applications are often backed by common query patterns that are instantiated with different values, and the query plan cache reduces the latency of those common query patterns. It does this by storing a parameterized version of frequently used query plans (at most 1000 at any point), which are reused and instantiated with new parameter values, if any are provided.

**Why use the query plan cache?**

Reusing a query plan can reduce latency, because later executions skip the parsing and optimization steps.

**Where can it be used?**

The query plan cache can be used for all types of queries. By default, it automatically caches plans for low-latency parameterized queries whose execution time is less than 100ms.

**How to force enable/disable the query plan cache?**

For read-only queries, the query plan cache is enabled by default for low-latency queries. A plan is cached only when its latency is below the 100ms threshold. You can override this behavior on a per-query basis with the `--plan-cache` parameter, which accepts `enabled` or `disabled` as a value.

```
# Forcing plan to be cached or reused
% aws neptune-graph execute-query \
   --graph-identifier <graph-id> \
   --query-string "MATCH (n) RETURN n LIMIT 1" \
   --region <region> \
   --plan-cache "enabled" \
   --language open_cypher /tmp/out.txt
    
% aws neptune-graph execute-query \
   --graph-identifier <graph-id> \
   --query-string "RETURN \$arg" \
   --region <region> \
   --plan-cache "enabled" \
   --parameters "{\"arg\": 123}" \
   --language open_cypher /tmp/out.txt
```

**How to check if a plan is cached?**

To check whether a plan is cached, use `explain`. For read-only queries, if the query was submitted and the plan was cached, explain shows details relevant to the query plan cache.

```
% aws neptune-graph execute-query \
   --graph-identifier <graph-id> \
   --query-string "MATCH (n) RETURN n LIMIT 1" \
   --region <region> \
   --plan-cache "enabled" \
   --explain-mode "static" \
   --language open_cypher /tmp/out.txt
```

```
Query: <QUERY STRING>
Plan cached by request: <REQUEST ID OF FIRST TIME EXECUTION>
Plan cached at: <TIMESTAMP OF FIRST TIME EXECUTION>
Parameters: <PARAMETERS IF QUERY IS PARAMETERIZED QUERY>
Plan cache hits: <NUMBER OF CACHE HITS FOR CACHED PLAN>
First query evaluation time: <LATENCY OF FIRST TIME EXECUTION>
```

This output indicates that the query was executed based on a cached query plan. A detailed explain with operator runtime statistics can be obtained by running the query with the plan cache disabled (using the HTTP parameter `planCache=disabled`).

**Note**  
For a mutation query, explain is not yet supported.

**Eviction**

A query plan is evicted when its cache TTL expires or when the maximum number of cached query plans is reached. When a cached plan gets a hit, its TTL is refreshed. The defaults are:
+ The maximum number of plans cached per instance is 1000.
+ TTL: 300,000 milliseconds (5 minutes). A cache hit refreshes the TTL back to 5 minutes.
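The eviction behavior described here (a capacity-bounded LRU cache whose TTL is refreshed on every hit) can be sketched as follows. This is an illustration of the policy, not the engine's implementation:

```python
import time
from collections import OrderedDict

class PlanCache:
    """Toy LRU cache with a TTL that is refreshed on every hit."""

    def __init__(self, capacity=1000, ttl_seconds=300):
        self.capacity, self.ttl = capacity, ttl_seconds
        self._entries = OrderedDict()   # query string -> (plan, expiry)

    def get(self, query_string):
        entry = self._entries.get(query_string)
        if entry is None or entry[1] < time.monotonic():
            self._entries.pop(query_string, None)   # expired or missing
            return None
        # Hit: refresh the TTL and mark as most recently used.
        self._entries[query_string] = (entry[0], time.monotonic() + self.ttl)
        self._entries.move_to_end(query_string)
        return entry[0]

    def put(self, query_string, plan):
        self._entries[query_string] = (plan, time.monotonic() + self.ttl)
        self._entries.move_to_end(query_string)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)       # evict least recently used
```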

**Conditions when a query plan is not cached**

The following conditions prevent a query plan from being cached:
+ The query is submitted with the query-specific parameter `--plan-cache "disabled"`.
  + If caching is wanted, you can rerun the query without `--plan-cache "disabled"`.
+ The query evaluation time is larger than the latency threshold. The plan is not cached because a long-running query is considered not to benefit from the query plan cache.
+ The query contains a pattern that does not return any results.
  + For example, `MATCH (n:nonexistentLabel) RETURN n` when there are zero nodes with the specified label.
  + For example, `MATCH (n {name: $param}) RETURN n` with `parameters={"param": "abcde"}` when there are zero nodes with name=abcde.
+ A query parameter is a composite type (list, map).

  ```
  aws neptune-graph execute-query \
     --graph-identifier <graph-id> \
     --query-string "RETURN \$arg" \
     --region <region> \
     --plan-cache "enabled" \
     --parameters "{\"arg\": [1, 2, 3]}" \
     --language open_cypher /tmp/out.txt
  
   aws neptune-graph execute-query \
     --graph-identifier <graph-id> \
     --query-string "RETURN \$arg" \
     --region <region> \
     --plan-cache "enabled" \
     --parameters "{\"arg\": {\"a\": 1}}" \
     --language open_cypher /tmp/out.txt
  ```
+ The query parameter is a string that has not been part of a data load or data insertion.
  + Suppose `CREATE (n {name: "X"})` was run to insert "X".
  + `RETURN "X"` is cached, while `RETURN "Y"` isn't, because "Y" has not been inserted and does not exist in the database.

# Mitigation for query plan cache issue
<a name="engine-releases-mitigation"></a>

We have detected an issue in the query plan cache when `skip` or `limit` is used in an inner `WITH` clause and is parameterized. For example:

```
MATCH (n:Person)
WHERE n.age > $age
WITH n skip $skip LIMIT $limit 
RETURN n.name, n.age

parameters={"age": 21, "skip": 2, "limit": 3}
```

In this case, the parameter values for skip and limit from the first plan are applied to subsequent queries too, leading to unexpected results.

**Mitigation**

To prevent this issue, add the HTTP parameter `planCache=disabled` or the CLI parameter `--plan-cache "disabled"` when submitting a query that includes a parameterized skip and/or limit sub-clause. Alternatively, you can hard-code the values into the query, or add a random comment to create a new plan for each request.

**Option 1:** Using request parameter

Curl example

```
curl -k https://<endpoint>:8182/opencypher -d 'query=MATCH (n:Person) WHERE n.age > $age WITH n skip $skip LIMIT $limit RETURN n.name, n.age' -d 'parameters={"age": 21, "skip": 2, "limit": 3}' -d planCache=disabled
```

SDK example

```
aws neptune-graph execute-query \
   --graph-identifier <graph-id> \
   --query-string "MATCH (n:Person) WHERE n.age > \$age WITH n skip \$skip LIMIT \$limit RETURN n.name, n.age" \
   --region <region> \
   --plan-cache "disabled" \
   --language open_cypher
```

**Option 2:** Using hard-coded values for skip and limit

```
MATCH (n:Person)
WHERE n.age > $age
WITH n skip 2 LIMIT 3
RETURN n.name, n.age

parameters={"age": 21}
```

**Option 3:** Using a random comment

```
MATCH (n:Person)
WHERE n.age > $age
WITH n skip $skip LIMIT $limit 
RETURN n.name, n.age // 411357f6-00d2-4f03-92ce-060d8e037c0b

parameters={"age": 21, "skip": 2, "limit": 3}
```
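Appending the random comment can be automated on the client side before the query string is submitted. A minimal sketch (the helper name is hypothetical):

```python
import uuid

def with_fresh_plan(query_string):
    """Append a unique trailing comment so the plan-cache key never matches."""
    return f"{query_string} // {uuid.uuid4()}"

q = with_fresh_plan("MATCH (n:Person) WHERE n.age > $age "
                    "WITH n skip $skip LIMIT $limit RETURN n.name, n.age")
```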

# Concurrency and query queuing in Neptune Analytics
<a name="query-concurrency-queuing"></a>

When developing and tuning graph applications, it can be helpful to understand the implications of sending parallel requests to a Neptune Analytics graph and how queries are queued.

## Concurrency
<a name="query-concurrency"></a>

All queries submitted to a Neptune Analytics graph enter a FIFO queue. The number of worker threads that process queries from this queue is determined by the graph size: specifically, the number of m-NCUs divided by four. For example, a 128 m-NCU graph has 32 worker threads, and a 16 m-NCU graph has 4.

The effective concurrency you can expect depends on the nature of your workload:
+ **Compute-bound workloads** (openCypher queries, graph algorithms) – Plan for approximately 1 concurrent query per 8 m-NCU. These workloads operate primarily on in-memory data and are CPU-intensive, so each query fully utilizes a vCPU for the duration of its execution.
+ **I/O-bound workloads** (bulk data loading from Amazon S3, `neptune.read()` operations) – Plan for up to 1 concurrent query per 4 m-NCU. These workloads spend significant time waiting on I/O, which allows the CPU to service other requests during wait periods.
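The sizing guidance above can be expressed as a small calculator. This sketch just encodes the ratios stated in the text (worker threads = m-NCU / 4, roughly one concurrent compute-bound query per 8 m-NCU, up to one I/O-bound query per 4 m-NCU); actual concurrency varies, as noted below:

```python
def worker_threads(m_ncu):
    """Worker threads available to process the query queue (m-NCU / 4)."""
    return m_ncu // 4

def planned_concurrency(m_ncu, workload="compute"):
    """Rough planning figure for concurrent queries, per the guidance above."""
    divisor = 8 if workload == "compute" else 4   # I/O-bound: up to 1 per 4 m-NCU
    return m_ncu // divisor
```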

Some operations can consume multiple worker threads. This is particularly important for queries that use graph algorithms. When an algorithm has a `concurrency` parameter greater than `1`, the request attempts to consume up to that many threads.

**Note**  
These are guidelines for planning and sizing purposes, not guarantees. Actual concurrency varies based on query complexity, graph structure, and data access patterns. Monitor query queue depth as the primary indicator of back pressure and adjust your graph size accordingly.

## Query queuing
<a name="query-queuing"></a>

Query queuing occurs when the number of concurrent requests exceeds the available worker threads (m-NCU / 4). Queued queries wait in FIFO order until a worker thread becomes available.

The maximum number of queries that can be queued per graph, regardless of graph size, is 8,192. Any queries beyond that limit are rejected with a `ThrottlingException`.

Query latency includes the time a query spends in the queue, network round-trip time, and the actual execution time.

## Monitoring ongoing and queued requests
<a name="query-monitoring-queued"></a>

Neptune Analytics provides a [ListQueries](https://docs.aws.amazon.com/neptune-analytics/latest/apiref/API_ListQueries.html) API that you can use to see any active or queued queries. To see all actively executing queries and all queries in the queue, use the `state` parameter set to `ALL`. By default, the ListQueries API only displays actively running queries. The following is an example AWS CLI call:

```
aws neptune-graph list-queries --graph-identifier g-12345abcde \
    --state ALL \
    --max-results 100
```

Requests are marked with a status of `RUNNING`, `WAITING`, or `CANCELING`. Queries in a `WAITING` state are queued.

Queued requests can also be monitored using the `NumQueuedRequestsPerSec` CloudWatch metric. This metric reports the number of requests that were queued over time.
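As an illustration, the states returned by ListQueries can be tallied client-side to estimate queue depth. The response shape below is a simplified stand-in for the actual ListQueries output; in practice you would obtain it from a call such as `boto3.client("neptune-graph").list_queries(...)`:

```python
from collections import Counter

# Hypothetical, simplified ListQueries response for illustration.
response = {
    "queries": [
        {"id": "q-1", "state": "RUNNING"},
        {"id": "q-2", "state": "WAITING"},
        {"id": "q-3", "state": "WAITING"},
    ]
}

# Tally query states; WAITING entries are the queued queries.
counts = Counter(q["state"] for q in response["queries"])
print(counts["WAITING"])  # 2 queued queries
```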

## How query queuing can affect timeouts
<a name="query-queuing-timeouts"></a>

Query latency includes the time a query spends in the queue as well as the time it takes to execute.

Because a query's timeout period is generally measured starting from when it enters the queue, a slow-moving queue can make many queries time out as soon as they are dequeued. To avoid this, don't queue a large number of queries unless they can be executed rapidly.
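One way to keep the server-side queue shallow is to cap in-flight submissions on the client. The sketch below uses a semaphore as a local gate; `submit_query` and `MAX_IN_FLIGHT` are hypothetical stand-ins for your actual ExecuteQuery call and sizing:

```python
import threading

# Client-side throttle sketch: block locally when too many submissions are
# already outstanding, instead of growing the server-side queue and risking
# timeouts on dequeue.
MAX_IN_FLIGHT = 8
gate = threading.BoundedSemaphore(MAX_IN_FLIGHT)

def run_query(submit_query, query_string):
    """Run submit_query under the gate. At most MAX_IN_FLIGHT calls proceed
    concurrently; additional callers wait locally until a slot frees up."""
    with gate:
        return submit_query(query_string)
```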

# Query explain
<a name="query-explain"></a>

The openCypher `explain` feature helps you understand how a query is executed. It is typically used for query performance analysis.

## Explain inputs
<a name="query-explain-inputs"></a>

 To invoke `explain`, pass the `explain-mode` parameter to an ExecuteQuery request, specifying the desired level of detail. The value can be one of the following: 
+  `static` - In static mode, `explain` doesn't run the query, but instead prints only the static structure of the query plan. 
+  `details` - In details mode, `explain` runs the query, and includes dynamic aspects of the query plan. These may include the number of intermediate bindings flowing through the operators, the ratio of incoming bindings to outgoing bindings, and the total time taken by each operator. Additional details, such as the actual openCypher query string and the estimated range count for the pattern underlying a join operator, are also shown. 

 The following code examples provide the `explain-mode` when using either the AWS CLI or AWSCURL. 

------
#### [ AWS CLI ]

```
aws neptune-graph execute-query \
--region <region> \
--graph-identifier <graph-id> \
--query-string <query-string> \
--explain-mode <explain-mode> \
--language open_cypher /tmp/out.txt
```

------
#### [ AWSCURL ]

```
awscurl -X POST "https://<graph-id>.<endpoint>/queries" \
-H "Content-Type: application/x-www-form-urlencoded" \
--region <region> \
--service neptune-graph \
-d "query=<query>&explain=<mode>"
```

------

## Explain outputs
<a name="query-explain-outputs"></a>

 **DFE operators in openCypher explain output** 

 To use the information that the openCypher explain feature provides, you need to understand some details about how the DFE query engine works (DFE being the engine that Neptune uses to process openCypher queries). 

 The DFE engine translates every query into a pipeline of operators. Starting from the first operator, intermediate solutions flow from one operator to the next through this operator pipeline. Each row in the explain table represents a result, up to the point of evaluation. The operators that can appear in a DFE query plan are as follows: 
+ DFEApply – Executes the function specified by functor in the arguments section, on the value stored in the specified variable
+ DFEAlgoWriteProperty – Explain operator for the property-writing portion of mutate algorithm invocations.
+ DFEBFSAlgo – Explain operator for invocations of the Breadth First Search algorithm, which searches for nodes from a starting vertex (or starting vertices, also called multi-source BFS) in a graph in breadth-first order.
+ DFEBindRelation – Binds together variables with the specified names.
+ DFEChunkLocalSubQuery – This is a non-blocking operation that acts as a wrapper around subqueries being performed.
+ DFEClosenessCentralityAlgo – Explain operator for invocations of the Closeness Centrality algorithm, which computes a metric that can be used as a positive measure of how close a given node is to all other nodes or how central it is in the graph.
+ DFECommonNeighborsAlgo – Explain operator for invocations of the Common Neighbors algorithm, which counts the number of common neighbors of two input nodes.
+ DFECreateConstant – Extends the given input relation with new columns containing constant values.
+ DFEDegreeAlgo – Explain operator for invocations of the Degree algorithm, which calculates the number of edges that are incident to a vertex.
+ DFEDistinctColumn – Returns the distinct subset of the input values based on the variable specified.
+ DFEDistinctRelation – Returns the distinct subset of the input solutions based on the variable specified.
+ DFEDrain – Appears at the end of a subquery to act as a termination step for that subquery. The number of solutions is recorded as Units In. Units Out is always zero.
+ DFEForwardValue – Copies all input chunks directly as output chunks to be passed to its downstream operator.
+ DFEGroupByHashIndex – This is a blocking operation that organizes the rows of a relation according to a set of variables, outputting a single group identifier column that is one-to-one with the rows of the input relation. Groups here are defined by the join variables used to build the hash index (See DFEHashIndexBuild for where this hash index might be built.)
+ DFEHashIndexBuild – Builds a hash index over a set of variables as a side-effect. This hash index is typically reused in later operations. (See DFEHashIndexJoin for where this hash index might be used.)
+ DFEHashIndexJoin – Performs a join over the incoming solutions against a previously built hash index. (See DFEHashIndexBuild for where this hash index might be built.)
+ DFEJaccardSimilarityAlgo – Explain operator for invocations of the Jaccard similarity algorithm, which measures the similarity between two sets of nodes.
+ DFEJoinExists – Takes a left and right hand input relation, and retains values from the left relation that have a corresponding value in the right relation as defined by the given join variables.
+ DFELabelPropagationAlgo – Explain operator for invocations of the Label Propagation algorithm, which is used for community detection.
+ DFELoopSubQuery – This is a non-blocking operation that acts as a wrapper for a subquery, allowing it to be run repeatedly for use in loops.
+ DFEMergeChunks – This is a blocking operation that combines chunks from its upstream operator into a single chunk of solutions to pass to its downstream operator (inverse of DFESplitChunks).
+ DFEMinus – Takes a left and right hand input relation, and retains values from the left relation that do not have a corresponding value in the right relation as defined by the given join variables. If there is no overlap in join variables across both relations, then this operator simply returns the left hand input relation as is.
+ DFENotExists – Takes a left and right hand input relation, and retains values from the left relation that do not have a corresponding value in the right relation as defined by the given join variables. If there is no overlap in join variables, then this operator will return an empty relation.
+ DFEOptionalJoin – Performs the optional join A OPTIONAL B ≡ (A JOIN B) UNION (A MINUS_NE B). This is a blocking operation.
+ DFEOverlapSimilarityAlgo – Explain operator for invocations of the Overlap Similarity algorithm, which measures the overlap between the neighbors of two nodes.
+ DFEPageRankAlgo – Explain operator for invocations of the Page Rank algorithm, which calculates a score for a given node based on the number, quality, and importance of the edges pointing to that node.
+ DFEPipelineJoin – Joins the input against the tuple pattern defined by the pattern argument.
+ DFEPipelineRangeCount – Counts the number of solutions matching a given pattern, and returns a single solution containing the count value.
+ DFEPipelineScan – Scans the database for the given pattern argument, with or without a given filter on column(s).
+ DFEProject – Takes multiple input columns and projects only the desired columns.
+ DFEReduce – Performs the specified aggregation function on specified variables.
+ DFERelationalJoin – Joins the input of the previous operator based on the specified pattern keys using a merge join. This is a blocking operation.
+ DFERouteChunks – Takes input chunks from its singular incoming edge and routes those chunks along its multiple outgoing edges.
+ DFESCCAlgo – Explain operator for invocations of the Strongly Connected Components algorithm, which calculates the maximally connected subgraphs of a directed graph where every node is reachable from every other node.
+ DFESelectRows – This operator selectively takes rows from its left input relation to forward to its downstream operator. The rows are selected based on the row identifiers supplied in the operator’s right input relation.
+ DFESerialize – Serializes a query’s final results into a JSON string serialization, mapping each input solution to the appropriate variable name. For node and edge results, these results are serialized into a map of entity properties and metadata.
+ DFESort – Takes an input relation and produces a sorted relation based on the provided sort key.
+ DFESplitByGroup – Splits each single input chunk from one incoming edge into smaller output chunks corresponding to row groups identified by row ids from the corresponding input chunk from the other incoming edge.
+ DFESplitChunks – Splits each single input chunk into smaller output chunks (inverse of DFEMergeChunks).
+ DFESSSPAlgo – Explain operator for invocations of the single source shortest path (SSSP) algorithms (Delta-stepping and Bellman-ford).
+ DFEStreamingHashIndexBuild – Streaming version of DFEHashIndexBuild.
+ DFEStreamingGroupByHashIndex – Streaming version of DFEGroupByHashIndex.
+ DFESubquery – This operator appears at the beginning of all plans and encapsulates the portions of the plan that are run on the DFE engine, which is the entire plan for openCypher.
+ DFESymmetricHashJoin – Joins the input of the previous operator based on the specified pattern keys using a hash join. This is a non-blocking operation.
+ DFESync – This operator is a synchronization operator supporting non-blocking plans. It takes solutions from two incoming edges and forwards these solutions to the appropriate downstream edges. For synchronization purposes, the inputs along one of these edges may be buffered internally.
+ DFETee – This is a branching operator that sends the same set of solutions to multiple operators.
+ DFETermResolution – Performs a localize or globalize operation on its inputs, resulting in columns of either localized or globalized identifiers respectively.
+ DFETopKSSSPAlgo – Explain operator for invocations of the TopK hop-limited single source (weighted) shortest path algorithm, which finds the single-source weighted shortest paths from a source node to its neighbors out to the distance specified by maxDepth.
+ DFETotalNeighborsAlgo – Explain operator for invocations of the Total Neighbors algorithm, which counts the total number of unique neighbors of two input vertices.
+ DFEUnfold – Unfolds lists of values from an input column into the output column as individual elements.
+ DFEUnion – Takes two or more input relations and produces a union of those relations using the desired output schema.
+ DFEVSSAlgo – Explain operator for invocations of the Vector similarity search algorithms, which find similar vectors based on the distance to each other.
+ DFEWCCAlgo – Explain operator for invocations of the Weakly Connected Components algorithm, which finds the weakly-connected components in a directed graph.
+ SolutionInjection – Appears before everything else in the explain output, with a value of one in the `Units Out` column. However, it serves as a no-op, and doesn't actually inject any solutions into the DFE engine.
+ TermResolution – Appears at the end of plans and translates objects from the Neptune engine into openCypher objects.

 **Columns in openCypher `explain` output** 

 The query plan information generated as openCypher explain output contains tables with one operator per row. The table has the following columns: 
+ ID – The numeric ID of this operator in the plan.
+ Out #1 (and Out #2) – The ID(s) of operator(s) that are downstream from this operator. There can be at most two downstream operators.
+ Name – The name of this operator.
+ Arguments – Any relevant details for the operator. This includes things like input schema, output schema, pattern (for `PipelineScan` and `PipelineJoin`), and so on.
+ Mode – A label describing fundamental operator behavior. This column is mostly blank (-). One exception is `TermResolution`, where mode can be `id2value_opencypher`, indicating a resolution from ID to openCypher value.
+ Units In – The number of solutions passed as input to this operator. Operators without upstream operators, such as `DFEPipelineScan`, `SolutionInjection`, and a `DFESubquery` with no static value injected, have a value of zero here.
+ Units Out – The number of solutions produced as output of this operator. `DFEDrain` is a special case, where the number of solutions being drained is recorded in `Units In` and `Units Out` is always zero.
+ Ratio – The ratio of `Units Out` to `Units In`.
+ Time (ms) – The CPU time consumed by this operator, in milliseconds.

**Note**  
 Depending on the level of detail selected via the explain mode parameter, some of these columns may not appear in the output. 

## Explain examples
<a name="query-explain-example"></a>

 The following is a basic example of openCypher `explain` output. The query is a single-node lookup in the air routes dataset for a node with the airport code `ATL` that invokes `explain` using the details mode: 

```
## sample query
aws neptune-graph execute-query \
--region <region> \
--graph-identifier <graph-id> \
--query-string "MATCH (n {code: 'ATL'}) RETURN n" \
--explain-mode details \
--language open_cypher /tmp/out.txt

## output
Query:
MATCH (n {code: 'ATL'}) RETURN n

╔════╤════════╤════════╤═══════════════════════╤════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                  │ Arguments          │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════════╪════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ SolutionInjection     │ solutions=[{}]     │ -    │ 0        │ 1         │ 0.00  │ 0         ║
╟────┼────────┼────────┼───────────────────────┼────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ -      │ -      │ DFESubquery           │ subQuery=subQuery1 │ -    │ 0        │ 0         │ 0.00  │ 8.00      ║
╟────┼────────┼────────┼───────────────────────┼────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║    │        │        │ Summed execution time │                    │      │          │           │       │ 8.00      ║
╚════╧════════╧════════╧═══════════════════════╧════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery1
╔════╤════════╤════════╤════════════════════════╤══════════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                   │ Arguments                                                        │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪════════════════════════╪══════════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFEPipelineScan (DFX)  │ pattern=project ?n ?n_code2 (?n,code,?n_code2) [VERTEX_PROPERTY] │ -    │ 0        │ 1         │ 0.00  │ 0.03      ║
║    │        │        │                        │ inlineFilters=[(?n_code2 IN ["ATL"^^xsd:string])]                │      │          │           │       │           ║
║    │        │        │                        │ patternEstimate=1                                                │      │          │           │       │           ║
╟────┼────────┼────────┼────────────────────────┼──────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFEProject (DFX)       │ columns=[?n]                                                     │ -    │ 1        │ 1         │ 1.00  │ 0.03      ║
╟────┼────────┼────────┼────────────────────────┼──────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 3      │ -      │ DFESerialize (DFX)     │ columnsToSerialize=[?n]                                          │ -    │ 1        │ 0         │ 0.00  │ 0.08      ║
╟────┼────────┼────────┼────────────────────────┼──────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ -      │ -      │ DFEDrain (DFX)         │ -                                                                │ -    │ 0        │ 0         │ 0.00  │ 0         ║
╟────┼────────┼────────┼────────────────────────┼──────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║    │        │        │ Summed execution time  │                                                                  │      │          │           │       │ 0.15      ║
╚════╧════════╧════════╧════════════════════════╧══════════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝
```

 At the top-level, `SolutionInjection` appears before everything else, with 1 unit out. Note that it doesn't actually inject any solutions. You can see that the next operator, `DFESubquery`, has 0 units in. 

 After `SolutionInjection` at the top-level is the `DFESubquery` operator. `DFESubquery` encapsulates the parts of the query execution plan that are pushed to the DFE engine (for openCypher queries, the entire query plan is executed by the DFE). All the operators in the query plan are nested inside `subQuery1` that is referenced by `DFESubquery`. 

 All the operators that are pushed down to the DFE engine have names that start with a `DFE` prefix. As mentioned above, the whole openCypher query plan is executed by the DFE, so as a result, all of the operators start with `DFE`. 

 Inside `subQuery1`, there can be zero (as in this case) or more `DFEChunkLocalSubQuery` or `DFELoopSubQuery` operators that encapsulate a part of the pushed execution plan that is executed in a memory-bounded mechanism. A `DFEChunkLocalSubQuery` contains one `SolutionInjection` that is used as an input to the subquery. To find the table for that subquery in the output, search for the `subQuery=graph URI` specified in the `Arguments` column for the `DFEChunkLocalSubQuery` or `DFELoopSubQuery` operator. 

 In `subQuery1`, `DFEPipelineScan` with `ID` 0 scans the database for a specified `pattern`. The pattern scans for vertices `?n` with property `code` saved as a variable `?n_code2`. The `inlineFilters` argument shows the filtering for the `code` property equaling `ATL`. 

 Next, the `DFEProject` operator propagates forward only the `?n` variable we’re interested in. Finally, the `DFESerialize` operator performs result serialization, transforming the input solutions into a readable format. 

# Statistics
<a name="query-statistics"></a>

Neptune Analytics uses statistics similar to those in [Neptune Database](https://docs.aws.amazon.com//neptune/latest/userguide/neptune-dfe-statistics.html) for planning query execution. Computing these statistics is an integrated part of the Neptune Analytics storage system. There are a number of differences in statistics features and usage between Neptune Analytics and Neptune Database: 

1.  Initial statistics generation is performed as part of either the initial import task or an initial data load occurring before any query-driven updates. Subsequently, statistics re-computation is triggered automatically based on the amount of update operations performed by the database. 

1.  Like Neptune Database, Neptune Analytics has a size limit for statistics data, beyond which statistics are disabled. The number of predicate statistics may not exceed one million (the same as Neptune Database). There is no hard limit on the number of characteristic sets present in the underlying data. However, beyond 10,000 characteristic sets, the system begins to merge statistics data in order to limit the overall size of data being managed. 

1.  Statistics generation is fully managed by the storage system. There are no APIs to disable or re-compute statistics. 

1.  There are no CloudWatch metrics relating to statistics generation. 

# Exceptions
<a name="query-exceptions"></a>

The following table lists query-side exceptions that can be encountered while executing a query.


| Neptune Analytics error code | HTTP status | Retriable | Description | 
| --- | --- | --- | --- | 
|  ValidationException  |  400  |  No  |  Something is wrong with the request input; for example, a malformed query.  | 
|  AccessDeniedException  |  403  |  No  |  User is not authorized to perform the requested operation.  | 
|  ResourceNotFoundException  |  404  |  No  |  Requested resource is not available.  | 
|  ThrottlingException  |  429  |  Yes  |  The server has received too many concurrent requests.  | 
|  InternalServerErrorException  |  500  |  Yes  |  The server failed to process the request for an unknown reason.  | 
|  UnprocessableException  |  422  |  No  |  The request cannot be processed for a known reason; for example, the query timed out.  | 
|  ConflictException  |  409  |  Yes  |  Concurrently running queries attempted to modify the same resources or data records, and the conflict could not be resolved automatically. Retry with an exponential back-off strategy.  | 
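A client can honor the Retriable column by retrying only the 429, 500, and 409 responses with exponential back-off. The following is an illustrative sketch in which the error code is taken from the exception's class name; `call` stands in for the actual ExecuteQuery invocation:

```python
import random
import time

# Retriable error codes from the table above.
RETRIABLE = {"ThrottlingException", "InternalServerErrorException", "ConflictException"}

def execute_with_retry(call, max_attempts=5, base_delay=0.1):
    """Invoke `call` (a stand-in for your ExecuteQuery request), retrying
    retriable errors with jittered exponential back-off."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as err:
            # Non-retriable error, or out of attempts: propagate to the caller.
            if type(err).__name__ not in RETRIABLE or attempt == max_attempts - 1:
                raise
            # Sleep base_delay * 2^attempt, with +/-50% jitter.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```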

# Neptune Analytics openCypher data model
<a name="query-openCypher-data-model"></a>

 For details on the openCypher data model, refer to the Neptune Database [documentation](https://docs.aws.amazon.com//neptune/latest/userguide/access-graph-opencypher-data-model.html). There are some differences in the modeling of vertices without labels: Neptune Database adds a default label to a vertex if one is not explicitly provided, and all but the last label of a vertex can be deleted. 

## What is a vertex?
<a name="query-openCypher-data-model-vertices"></a>

 Unlike Neptune Database, Neptune Analytics allows loading just edges (as well as loading both vertices and edges) and can still run algorithms and queries from that starting point. This is useful if, for example, your main interest is loading edge data from a CSV file and running an algorithm over the data without needing to provide any additional vertex information. This has some implications for how vertices are managed. For the Neptune Analytics query engine, a vertex implicitly exists if it has an explicit label, a property, or an edge. Likewise, a vertex is implicitly deleted if all its labels, properties, and edges are removed. Unlike Neptune Database, Neptune Analytics stores a label for a vertex only if one is explicitly provided by the user, and all labels of a vertex can be deleted. 

 This affects some common openCypher queries. An attempt to create a vertex that has neither a label, nor properties, nor edges has no effect. That is, queries such as `CREATE (n)` or `CREATE (n {`~id`: "xyz"})` do not add any vertices to the graph. `CREATE (n {key:value})`, where `key` is different from ``~id``, creates a vertex with the property `key`, and `CREATE (n)-[:knows]->(m)` creates two vertices with one shared edge. 

 `CREATE (n {key:value})`, where key is different from ``~id``, creates a vertex with the property `key`, and a subsequent `MATCH (n)` will discover that vertex. A query such as `MATCH (n {key:value}) REMOVE n.key` will remove the only property for the (edge- and label-less) vertex, which implicitly deletes the vertex. A subsequent `MATCH (n)` query will not return that vertex. Likewise, `CREATE (n:Label)` adds a vertex with the label `Label` (and no other properties or edges). Now, `MATCH (n) REMOVE n:Label` deletes the only label of the vertex, which implicitly deletes the vertex. 

 Similarly, `CREATE (n)-[:knows]->(m)` creates two nodes and one edge. `MATCH (n)` will discover those two vertices. Now, `MATCH (n)-[r:knows]->(m) DELETE r` deletes that edge and implicitly deletes the two vertices. Those two vertices are no longer returned when running a `MATCH (n)` query. 

 Merges on empty vertices, `MERGE (n)` or `MERGE (n {`~id`: "xyz"})`, are not permitted and throw an exception. `MERGE (n {key:value})` creates a vertex with property `key` if a matching vertex does not exist. 

 The following table illustrates the differences between Neptune Database and Neptune Analytics. 


| Query (run on empty graph) | Neptune Database | Neptune Analytics | 
| --- | --- | --- | 
|  `CREATE (n)`  |  Adds a vertex with label "vertex" to the graph. Each repeat request adds a new vertex to the graph.  |  No change to the graph, query returns without exception. Repeat requests similarly do not change the graph, and query returns without exception.  | 
|  `CREATE (n {`~id`: "xyz"})`  |  Adds a vertex with id "xyz" and label "vertex" to the graph. Repeat request fails with exception.  |  No change to the graph, query returns without exception. Repeat requests similarly do not change the graph, and query returns without exception.  | 
|  `CREATE (n {key:value})`  |  Adds a vertex with label "vertex" and property "key" to the graph.  |  Adds a vertex with property "key" to the graph. This vertex has no label.  | 
|  `CREATE (n {key:value})` `MATCH (n {key:value}) REMOVE n.key`  |  The REMOVE query removes the "key" property on the vertex. The graph contains a vertex with label "vertex" but no property. `MATCH (n)` returns the vertex.  |  The remove query removes the property on the vertex, and as a side effect the vertex gets deleted from the graph. `MATCH (n)` does not return the vertex.  | 
|  `CREATE (n:Label {`~id`: "xyz", key:value})` `MATCH (n {`~id`: "xyz"}) REMOVE n:Label`  |  The REMOVE query errors out; the last label on a vertex cannot be deleted.  |  The REMOVE query removes the label. The graph contains a vertex with id "xyz" and property "key".  | 
|  `CREATE (n)-[:knows]->(m)`  |  Adds two vertices with label "vertex" and an edge with label "knows" to the graph. `MATCH (n)` returns both those vertices.  |  Adds an edge between two new vertices to the graph. `MATCH (n)` returns both those vertices.  | 
|  `CREATE (n)-[:knows]->(m)` `MATCH (n)-[r:knows]->(n) DELETE r`  |  Deletes the edge. The graph contains two isolated vertices. `MATCH (n)` returns both those vertices.  |  Deletes the edge, and as a side effect the two vertices also get deleted from the graph. The graph is now empty. `MATCH (n)` does not return the two vertices.  | 
|  `MERGE (n)`  |  Adds a vertex with label "vertex" if graph is empty. Matches all vertices in a non-empty graph.  |  Throws an exception.  | 
|  `MERGE (n {`~id`: "xyz"})`  |  Adds a vertex with label "vertex" and id "xyz" if one does not exist in the graph. Matches vertex with id "xyz".  |  Throws an exception.  | 
|  `MERGE (n {key:value})`  |  Adds a vertex with label "vertex" and property "key" to the graph, if such a vertex does not already exist.  |  Adds a vertex with property "key" to the graph, if such a vertex does not already exist. This vertex has no label.  | 
|  `MERGE (n)-[knows]->(m)`  |  Adds two vertices with label "vertex" and an edge with label "knows" to the graph, if an edge with label knows does not exist. `MATCH (n)` returns both those vertices.  |  Adds an edge between two new vertices to the graph, if an edge with label "knows" does not exist. The two vertices have no label. `MATCH (n)` returns both those vertices.  | 

**Note**  
 A workaround for the implicit deletion of a vertex when all of its labels, properties, and edges are removed is to assign immutable labels to all vertices. That way, deleting all the properties, edges, or mutable labels of a vertex does not lead to implicit deletion; a vertex is not deleted until it is explicitly deleted.   
 Likewise, a workaround for no-op vertex create queries is to always create a vertex with a label or a property. To combine this with the previous point, always create a vertex with an immutable label. Extending this to bulk or batch loads, include all vertices in some vertex files and assign a property or an immutable label to every vertex. 

# Neptune Analytics OpenCypher specification compliance
<a name="query-openCypher-standards-compliance"></a>

 Refer to the Neptune Database documentation found [here](https://docs.aws.amazon.com//neptune/latest/userguide/feature-opencypher-compliance.html) for openCypher specification compliance, with the exception that Neptune Analytics does not support custom edge IDs. 

 Amazon Neptune also supports several features beyond the scope of the openCypher specification. Refer to [ OpenCypher extensions in Amazon Neptune](https://docs.aws.amazon.com//neptune/latest/userguide/access-graph-opencypher-extensions.html) for details. 

## Vertex and edge IDs
<a name="query-openCypher-standards-compliance-vertex-and-edge-ids"></a>

**Custom IDs for vertices**

 Neptune Analytics supports both querying and creating vertices with custom IDs. See [ openCypher custom IDs](https://docs.aws.amazon.com/neptune/latest/userguide/feature-opencypher-compliance.html#opencypher-compliance-custom-ids) for more details. 

**Custom IDs for edges**

 Neptune Analytics does not support edge creation with custom edge IDs. Custom IDs are not permitted in CREATE or MERGE clauses. Edges are assigned IDs by Neptune Analytics, using a reserved prefix `neptune_reserved_`. Edges can be queried by their server-assigned IDs, just as in [Neptune Database](https://docs.aws.amazon.com/neptune/latest/userguide/feature-opencypher-compliance.html#opencypher-compliance-custom-ids).

```
# Supported
MATCH (n)-[r:knows {`~id`: 'neptune_reserved_1_123456789'}]->(m)
RETURN r

# Unsupported
CREATE (n:Person {name: 'John'})-[:knows {`~id`: 'john-knows->jim'}]->(m:Person {name: 'Jim'})

# Unsupported
MERGE (n)-[r:knows {`~id`: 'neptune_reserved_1_123456789'}]->(m)
RETURN r
```

 Server-assigned IDs are recycled: after an edge is deleted, a newly created edge could be assigned the same ID. 

**Note**  
 Edges can be assigned new IDs if the graph gets restructured, and the older IDs then become invalid; a reassigned older ID matches no edges. It is not recommended to store these IDs externally for long-term querying purposes. 

## IRIs and language-tagged literals
<a name="query-openCypher-standards-compliance-iri"></a>

 Neptune Analytics supports values that are of type IRI or language-tagged literal. See [Handling RDF values](https://docs.aws.amazon.com//neptune-analytics/latest/userguide/using-rdf-data.html#rdf-handling) for more information. 

## OpenCypher reduce() function
<a name="query-openCypher-standards-compliance-reduce"></a>

 Reduce sequentially processes each list element by combining it with a running total or ‘accumulator.’ Starting from an initial value, it updates the accumulator after each operation and uses that updated value in the next iteration. Once all elements have been processed, it returns the final accumulated result. 

**A typical reduce() structure**  
`reduce(accumulator = initial, variable IN list | expression)`

**Type specifications:**
+  initial: starting value for the accumulator - (LONG | FLOAT | STRING | LIST OF (STRING, LONG, FLOAT)). 
+  list: the input list - LIST OF T where T matches the initial type. 
+  variable : represents each element in the input list. 
+  expression : Only supports the `+` operator. 
+  return : The return will be the same type as the initial type. 

**Restrictions:**  
 The reduce() expression currently supports only addition and concatenation (string or list), both represented by the `+` operator. The expression must be a binary expression of the form `accumulator + variable`. 
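
The following illustrative queries show this restriction; any operator other than `+` in the reduce() expression is rejected:

```
# Supported: binary addition of the accumulator and the variable
RETURN reduce(sum = 0, n IN [1, 2, 3] | sum + n)

# Unsupported: multiplication is neither addition nor concatenation
RETURN reduce(product = 1, n IN [1, 2, 3] | product * n)
```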

**Examples**  
The following examples show the different supported input types:

```
Long Addition:
RETURN reduce(sum = 0, n IN [1, 2, 3] | sum + n)
{
  "results": [{
      "reduce(sum = 0, n IN [1, 2, 3] | sum + n)": 6
    }]
}
```

```
String Concatenation:
RETURN reduce(str = "", x IN ["A", "B", "C"] | str + x)
{
  "results": [{
      "reduce(str = \"\", x IN [\"A\", \"B\", \"C\"] | str + x)": "ABC"
    }]
}
```

```
List Combination:
RETURN reduce(lst = [], x IN [1, 2, 3] | lst + x)
{
  "results": [{
      "reduce(lst = [], x IN [1, 2, 3] | lst + x)": [1, 2, 3]
    }]
}
```

```
Float Addition:
RETURN reduce(total = 0.0, x IN [1.5, 2.5, 3.5] | total + x)
{
  "results": [{
      "reduce(total = 0.0, x IN [1.5, 2.5, 3.5] | total + x)": 7.5
    }]
}
```
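
Because the list element type must match the initial type, mixed-type combinations are not supported. For example, the following illustrative query would be rejected:

```
# Unsupported: the initial value is LONG but the list elements are STRING
RETURN reduce(sum = 0, x IN ["A", "B", "C"] | sum + x)
```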

# Transaction isolation levels in Neptune Analytics
<a name="query-isolation-level"></a>

 Neptune Analytics differs in some respects from the transaction isolation levels supported by [Neptune Database](https://docs.aws.amazon.com//neptune/latest/userguide/transactions-neptune.html). 

 **Read-only query isolation in Neptune Analytics:** Neptune Analytics evaluates read-only queries under snapshot isolation, just like Neptune Database. 

 **Mutation query isolation in Neptune Analytics:** Reads for mutation queries are normally executed under snapshot isolation, unlike in Neptune Database. This isolation is weaker than what Neptune Database provides, because the conditions for proceeding to a write that the query satisfied in its snapshot could have changed concurrently before the query commits. 

 For some specific steps, such as node/relationship deletion or conditional creation of new data using the MERGE step, reads also observe concurrent writes, to avoid inconsistencies. Below are some examples where concurrent execution of queries one and two always leads to a consistent state. In example 1, at most one vertex is created. In example 2, the age is set to 10 or 11, not both. And in example 3, either the vertex is fully deleted, or the age is set to 11 without any deletion or removal of other properties. 

```
# EXAMPLE 1
Query 1: MERGE (m:Person {ssn: '123456789'})
Query 2: MERGE (n:Person {ssn: '123456789'})
```

```
# EXAMPLE 2
Query 1: MATCH (n {ssn : '123456789'}) SET n.age=10
Query 2: MATCH (n {ssn : '123456789'}) SET n.age=11
```

```
# EXAMPLE 3
Query 1: MATCH (n {ssn : '123456789'}) DETACH DELETE n
Query 2: MATCH (n {ssn : '123456789'}) SET n.age = 11
```

 **Conflict detection:** Unlike in Neptune Database, conflicts are evaluated more precisely, over individual graph elements (properties or edges) rather than over a range of data. Queries one and two in example 4 do not conflict when run concurrently, because they search and merge on different property values ('lname1' and 'lname2'). Queries one and two in example 5 also merge on different property-value sets, but they can still conflict when run concurrently because they share a property value (firstName: 'fname'). 

```
# EXAMPLE 4
Query 1: MERGE (n {lastName: 'lname1'})
Query 2: MERGE (n {lastName: 'lname2'})
```

```
# EXAMPLE 5
Query 1: MERGE (n {firstName: 'fname', lastName: 'lname1'})
Query 2: MERGE (n {firstName: 'fname', lastName: 'lname2'})
```

 **Vector embeddings:** Unlike other graph updates, changes (inserts, deletes, and updates) to vector embeddings are neither atomic nor isolated (see [Vector index transaction support](vector-index.md#vector-index-transaction-support)). Changes to vector embeddings become durable on write and are visible to all other queries, even if the query that made them fails later. If a query updates vector embeddings and also makes other changes to the graph, only the latter are atomic and isolated. 