

# Querying a Neptune Graph
<a name="access-graph-queries"></a>

Neptune supports the following graph query languages to access a graph:
+ [Gremlin](https://tinkerpop.apache.org/gremlin.html), defined by [Apache TinkerPop](https://tinkerpop.apache.org/) for creating and querying property graphs.

  A query in Gremlin is a traversal made up of discrete steps, each of which follows an edge to a node.

  See [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md) to learn about using Gremlin in Neptune, and [Gremlin standards compliance in Amazon Neptune](access-graph-gremlin-differences.md) to find specific details about the Neptune implementation of Gremlin.
+ [openCypher](access-graph-opencypher.md) is a declarative query language for property graphs that was originally developed by Neo4j, then open-sourced in 2015, and contributed to the [openCypher](http://www.opencypher.org/) project under an Apache 2 open-source license. Its syntax is documented in the [openCypher spec](https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf).
+ [SPARQL](https://www.w3.org/TR/sparql11-overview/) is a declarative language based on graph pattern-matching, for querying [RDF](https://www.w3.org/2001/sw/wiki/RDF) data. It is supported by the [World Wide Web Consortium](https://www.w3.org/).

  See [Accessing the Neptune graph with SPARQL](access-graph-sparql.md) to learn about using SPARQL in Neptune, and [SPARQL standards compliance in Amazon Neptune](feature-sparql-compliance.md) to find specific details about the Neptune implementation of SPARQL.

**Note**  
Both Gremlin and openCypher can be used to query any property-graph data stored in Neptune, regardless of how it was loaded.

**Topics**
+ [Query queuing in Amazon Neptune](access-graph-queuing.md)
+ [Query plan cache in Amazon Neptune](access-graph-qpc.md)
+ [Inject a Custom ID Into a Neptune Gremlin or SPARQL Query](features-query-id.md)
+ [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md)
+ [Accessing the Neptune Graph with openCypher](access-graph-opencypher.md)
+ [Accessing the Neptune graph with SPARQL](access-graph-sparql.md)

# Query queuing in Amazon Neptune
<a name="access-graph-queuing"></a>

When developing and tuning graph applications, it can be helpful to know the implications of how queries are being queued by the database. In Amazon Neptune, query queuing occurs as follows:
+ The maximum number of queries that can be queued up per instance, regardless of the instance size, is 8,192. Any queries over that number are rejected and fail with a `ThrottlingException`.
+ The maximum number of queries that can be executing at one time is determined by the number of worker threads assigned, which is generally set to twice the number of virtual CPU cores (vCPUs) that are available.
+ Query latency includes the time a query spends in the queue as well as network round-tripping and the time it actually takes to execute.
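
To make these limits concrete, here is a minimal sketch. The 8,192 queue limit and the 2 × vCPU worker rule come from the list above; the 8-vCPU instance in the example is an arbitrary illustration.

```python
# Illustrative model of the queuing limits described above.
QUEUE_LIMIT = 8192  # maximum queued queries per instance, regardless of size

def max_concurrent_queries(vcpus: int) -> int:
    """Worker threads are generally set to twice the number of vCPUs."""
    return 2 * vcpus

def is_throttled(queued_queries: int) -> bool:
    """Queries beyond the queue limit are rejected with a ThrottlingException."""
    return queued_queries > QUEUE_LIMIT

print(max_concurrent_queries(8))  # an 8-vCPU instance runs up to 16 queries at once
print(is_throttled(9000))         # True: past the 8,192 queue limit
```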

## Determining how many queries are in your queue at a given moment
<a name="access-graph-queuing-count"></a>

The `MainRequestQueuePendingRequests` CloudWatch metric records the number of requests waiting in the input queue at five-minute granularity (see [Neptune CloudWatch Metrics](cw-metrics.md)).

For Gremlin, you can obtain a current count of queries in the queue using the `acceptedQueryCount` value returned by the [Gremlin query status API](gremlin-api-status.md). Note, however, that the `acceptedQueryCount` value returned by the [SPARQL query status API](sparql-api-status.md) includes all queries accepted since the server was started, including completed queries.
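
For example, given a parsed Gremlin status response, the queue count can be read directly. This is a sketch; the sample payload is abbreviated to the fields discussed above, and real responses contain more.

```python
# Sketch: read the current queue count from a Gremlin query status response.
def gremlin_queued_count(status: dict) -> int:
    # For the Gremlin status API, acceptedQueryCount reflects queries
    # currently in the queue (unlike SPARQL, where it is a running total).
    return status["acceptedQueryCount"]

sample = {"acceptedQueryCount": 8, "runningQueryCount": 2, "queries": []}
print(gremlin_queued_count(sample))  # 8
```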

## How query queuing can affect timeouts
<a name="access-graph-queuing-timeouts"></a>

As noted above, query latency includes the time a query spends in the queue as well as the time it takes to execute.

Because a query's timeout period is generally measured starting from when it enters the queue, a slow-moving queue can cause many queries to time out as soon as they are dequeued. To avoid this, don't queue up a large number of queries unless they can be executed rapidly.
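
A back-of-the-envelope check illustrates the point, under the simplifying assumption that queued queries drain evenly across worker threads. All numbers here are hypothetical.

```python
# Rough estimate: will a query at a given queue position time out before
# it finishes? Assumes the timeout clock starts at enqueue time and that
# the queue drains at avg_execution_ms per concurrency slot.
def times_out(position: int, concurrency: int,
              avg_execution_ms: float, timeout_ms: float) -> bool:
    estimated_wait_ms = (position / concurrency) * avg_execution_ms
    return estimated_wait_ms + avg_execution_ms > timeout_ms

# 4,000 queries ahead, 16 workers, 500 ms per query, 120-second timeout:
print(times_out(4000, 16, 500.0, 120_000.0))  # True: ~125 s of queue wait alone
```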

# Query plan cache in Amazon Neptune
<a name="access-graph-qpc"></a>

When a query is submitted to Neptune, the query string is parsed, optimized, and transformed into a query plan, which the engine then executes. Applications are often backed by common query patterns that are instantiated with different values. The query plan cache can reduce overall latency by caching query plans for such repeated patterns, avoiding the parsing and optimization steps.

The query plan cache can be used for **openCypher** queries, both parameterized and non-parameterized. It is enabled for read queries, over both HTTP and Bolt. It is **not** supported for openCypher mutation queries, or for Gremlin or SPARQL queries.

## How to force enable or disable query plan cache
<a name="access-graph-qpc-enable"></a>

The query plan cache is enabled by default for low-latency parameterized queries: a plan for a parameterized query is cached only when its latency is below the **100ms** threshold. You can override this behavior for any individual query (parameterized or not) with the query-level hint `QUERY:PLANCACHE`, which is specified in a `USING` clause and accepts `enabled` or `disabled` as its value.
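
For example, a small helper (hypothetical, not part of any Neptune SDK) can prepend the hint to an openCypher query string before it is submitted:

```python
# Sketch: prepend the plan cache query hint to an openCypher query.
# The hint syntax follows the description above.
def with_plan_cache(query: str, enabled: bool = True) -> str:
    value = "enabled" if enabled else "disabled"
    return f'Using QUERY:PLANCACHE "{value}" {query}'

print(with_plan_cache("MATCH (n) RETURN n LIMIT 1"))
# Using QUERY:PLANCACHE "enabled" MATCH (n) RETURN n LIMIT 1
```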

------
#### [ AWS CLI ]

Forcing plan to be cached or reused:

```
aws neptunedata execute-open-cypher-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "Using QUERY:PLANCACHE \"enabled\" MATCH(n) RETURN n LIMIT 1"
```

With parameters:

```
aws neptunedata execute-open-cypher-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "Using QUERY:PLANCACHE \"enabled\" RETURN \$arg" \
  --parameters '{"arg": 123}'
```

Forcing plan to be neither cached nor reused:

```
aws neptunedata execute-open-cypher-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "Using QUERY:PLANCACHE \"disabled\" MATCH(n) RETURN n LIMIT 1"
```

For more information, see [execute-open-cypher-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

# Forcing plan to be cached or reused
response = client.execute_open_cypher_query(
    openCypherQuery='Using QUERY:PLANCACHE "enabled" MATCH(n) RETURN n LIMIT 1'
)

print(response['results'])
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

Forcing plan to be cached or reused:

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=Using QUERY:PLANCACHE \"enabled\" MATCH(n) RETURN n LIMIT 1"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

Forcing plan to be cached or reused:

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=Using QUERY:PLANCACHE \"enabled\" MATCH(n) RETURN n LIMIT 1"
```

With parameters:

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=Using QUERY:PLANCACHE \"enabled\" RETURN \$arg" \
  -d "parameters={\"arg\": 123}"
```

Forcing plan to be neither cached nor reused:

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=Using QUERY:PLANCACHE \"disabled\" MATCH(n) RETURN n LIMIT 1"
```

------

## How to determine if a plan is cached or not
<a name="access-graph-qpc-status"></a>

For HTTP read queries, if a submitted query's plan was cached, the `explain` output shows details relevant to the query plan cache.

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-explain-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "Using QUERY:PLANCACHE \"enabled\" MATCH(n) RETURN n LIMIT 1" \
  --explain-mode details
```

For more information, see [execute-open-cypher-explain-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-explain-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_explain_query(
    openCypherQuery='Using QUERY:PLANCACHE "enabled" MATCH(n) RETURN n LIMIT 1',
    explainMode='details'
)

print(response['results'].read().decode('utf-8'))
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=Using QUERY:PLANCACHE \"enabled\" MATCH(n) RETURN n LIMIT 1" \
  -d "explain=details"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=Using QUERY:PLANCACHE \"enabled\" MATCH(n) RETURN n LIMIT 1" \
  -d "explain=details"
```

------

If the plan was cached, the `explain` output shows:

```
Query: <QUERY STRING>
Plan cached by request: <REQUEST ID OF FIRST TIME EXECUTION>
Plan cached at: <TIMESTAMP OF FIRST TIME EXECUTION>
Parameters: <PARAMETERS, IF QUERY IS PARAMETERIZED QUERY>
Plan cache hits: <NUMBER OF CACHE HITS FOR CACHED PLAN>
First query evaluation time: <LATENCY OF FIRST TIME EXECUTION>

The query has been executed based on a cached query plan. Detailed explain with operator runtime statistics can be obtained by running the query with plan cache disabled (using HTTP parameter planCache=disabled).
```

The explain feature is not supported when using Bolt.
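
Because these details arrive as plain text, a small sketch can extract, say, the hit counter from output shaped like the sample above. The regular expression assumes the `Plan cache hits:` label shown there.

```python
import re

# Sketch: pull the cache-hit counter out of explain output shaped like
# the sample above. Returns None when no cache information is present.
def plan_cache_hits(explain_text: str):
    match = re.search(r"Plan cache hits:\s*(\d+)", explain_text)
    return int(match.group(1)) if match else None

sample = "Query: MATCH (n) RETURN n LIMIT 1\nPlan cache hits: 12\n"
print(plan_cache_hits(sample))  # 12
```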

## Eviction
<a name="access-graph-qpc-eviction"></a>

A query plan is evicted when its time to live (TTL) expires or when the maximum number of cached query plans has been reached. A cache hit refreshes the plan's TTL. The defaults are:
+ Maximum number of cached plans per instance: 1,000.
+ TTL: 300,000 milliseconds (5 minutes). Each cache hit resets the TTL back to 5 minutes.
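
These rules can be modeled as a small TTL cache. The sketch below is an illustrative model only, not Neptune's implementation; in particular, the eviction order when the cache is full is an assumption.

```python
import time

# Illustrative model of the eviction rules above: entries expire after a
# TTL (refreshed on every hit), and the cache holds a bounded number of plans.
class PlanCache:
    def __init__(self, max_plans=1000, ttl_s=300.0, clock=time.monotonic):
        self.max_plans, self.ttl_s, self.clock = max_plans, ttl_s, clock
        self._plans = {}  # query pattern -> (plan, expiry timestamp)

    def get(self, pattern):
        entry = self._plans.get(pattern)
        if entry is None or entry[1] < self.clock():
            self._plans.pop(pattern, None)  # expired or absent
            return None
        plan = entry[0]
        self._plans[pattern] = (plan, self.clock() + self.ttl_s)  # hit refreshes TTL
        return plan

    def put(self, pattern, plan):
        if len(self._plans) >= self.max_plans:
            # make room by evicting the entry closest to expiry (an assumption)
            del self._plans[min(self._plans, key=lambda k: self._plans[k][1])]
        self._plans[pattern] = (plan, self.clock() + self.ttl_s)
```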

## Conditions causing the plan not to be cached
<a name="access-graph-qpc-conditions"></a>

The query plan cache is not used under the following conditions:

1.  When a query is submitted using the query hint `QUERY:PLANCACHE "disabled"`. To enable the query plan cache, remove `QUERY:PLANCACHE "disabled"` and re-run the query.

1.  If the submitted query is not a parameterized query and does not contain the hint `QUERY:PLANCACHE "enabled"`.

1.  If the query evaluation time exceeds the latency threshold. Such a query is considered a long-running query that would not benefit from the query plan cache, and its plan is not cached.

1.  If the query contains a pattern that doesn't return any results.
   +  For example, `MATCH (n:nonexistentLabel) return n` when there are zero nodes with the specified label.
   +  For example, `MATCH (n {name: $param}) return n` with `parameters={"param": "abcde"}` when there are zero nodes with `name=abcde`.

1.  If the query parameter is a composite type, such as a `list` or a `map`. 

   ```
   curl https://your-neptune-endpoint:port/openCypher \
     -d "query=Using QUERY:PLANCACHE \"enabled\" RETURN \$arg" \
     -d "parameters={\"arg\": [1, 2, 3]}"
   
   curl https://your-neptune-endpoint:port/openCypher \
     -d "query=Using QUERY:PLANCACHE \"enabled\" RETURN \$arg" \
     -d "parameters={\"arg\": {\"a\": 1}}"
   ```

1.  If the query parameter is a string that has not been part of a data load or data insertion operation. For example, if `CREATE (n {name: "X"})` is run to insert `"X"`, then `RETURN "X"` is cached, while `RETURN "Y"` is not, because `"Y"` has not been inserted and does not exist in the database.
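
The composite-type condition (list or map parameters) can be expressed as a quick client-side check. This is a hypothetical helper, not a Neptune API.

```python
# Sketch of the composite-parameter condition: a plan is not cached when
# any parameter is a list or a map.
def params_allow_caching(parameters: dict) -> bool:
    return not any(isinstance(v, (list, dict)) for v in parameters.values())

print(params_allow_caching({"arg": 123}))        # True
print(params_allow_caching({"arg": [1, 2, 3]}))  # False: list parameter
print(params_allow_caching({"arg": {"a": 1}}))   # False: map parameter
```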

# Inject a Custom ID Into a Neptune Gremlin or SPARQL Query
<a name="features-query-id"></a>

By default, Neptune assigns a unique `queryId` value to every query. You can use this ID to get information about a running query (see [Gremlin query status API](gremlin-api-status.md) or [SPARQL query status API](sparql-api-status.md)), or cancel it (see [Gremlin query cancellation](gremlin-api-status-cancel.md) or [SPARQL query cancellation](sparql-api-status-cancel.md)).

Neptune also lets you specify your own `queryId` value for a Gremlin or SPARQL query, either in the HTTP header, or for a SPARQL query by using the `queryId` query hint. Assigning your own `queryId` makes it easier to keep track of a query so that you can get its status or cancel it.
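
For example, the body of a Gremlin HTTP request carrying a custom `queryId` can be composed with Python's standard library. This is a sketch; the field names match the curl examples in this section.

```python
import json
import uuid

# Sketch: build the JSON body for a Gremlin HTTP request, generating a
# random queryId when none is supplied.
def gremlin_request_body(query: str, query_id: str = None) -> str:
    return json.dumps({"gremlin": query,
                       "queryId": query_id or str(uuid.uuid4())})

print(gremlin_request_body("g.V().limit(1).count()",
                           "4d5c4fae-aa30-41cf-9e1f-91e6b7dd6f47"))
```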

## Injecting a Custom `queryId` Value Using the HTTP Header
<a name="features-query-id-header"></a>

For both Gremlin and SPARQL, the HTTP header can be used to inject your own `queryId` value into a query.

**Gremlin Example**

```
curl -XPOST https://your-neptune-endpoint:port \
    -d "{\"gremlin\": \
        \"g.V().limit(1).count()\" , \
        \"queryId\":\"4d5c4fae-aa30-41cf-9e1f-91e6b7dd6f47\"  }"
```

**SPARQL Example**

```
curl https://your-neptune-endpoint:port/sparql \
    -d "query=SELECT * WHERE { ?s ?p ?o } " \
       --data-urlencode \
       "queryId=4d5c4fae-aa30-41cf-9e1f-91e6b7dd6f47"
```

## Injecting a Custom `queryId` Value Using a SPARQL Query Hint
<a name="features-query-id-hint"></a>

Here is an example of how you would use the SPARQL `queryId` query hint to inject a custom `queryId` value into a SPARQL query:

```
curl https://your-neptune-endpoint:port/sparql \
    -d "query=PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#> \
       SELECT * WHERE { hint:Query hint:queryId \"4d5c4fae-aa30-41cf-9e1f-91e6b7dd6f47\" \
       {?s ?p ?o}}"
```

## Using the `queryId` Value to Check Query Status
<a name="features-query-id-check-status"></a>

**Gremlin Example**

```
curl https://your-neptune-endpoint:port/gremlin/status \
    -d "queryId=4d5c4fae-aa30-41cf-9e1f-91e6b7dd6f47"
```

**SPARQL Example**

```
curl https://your-neptune-endpoint:port/sparql/status \
    -d "queryId=4d5c4fae-aa30-41cf-9e1f-91e6b7dd6f47"
```

# Accessing a Neptune graph with Gremlin
<a name="access-graph-gremlin"></a>

Amazon Neptune is compatible with Apache TinkerPop and Gremlin. This means that you can connect to a Neptune DB instance and use the Gremlin traversal language to query the graph (see [The Graph](https://tinkerpop.apache.org/docs/current/reference/#graph) in the Apache TinkerPop documentation). For differences in the Neptune implementation of Gremlin, see [Gremlin standards compliance](access-graph-gremlin-differences.md).

 A *traversal* in Gremlin is a series of chained steps. It starts at a vertex (or edge). It walks the graph by following the outgoing edges of each vertex and then the outgoing edges of those vertices. Each step is an operation in the traversal. For more information, see [The Traversal](https://tinkerpop.apache.org/docs/current/reference/#traversal) in the TinkerPop documentation.

Different Neptune engine versions support different Gremlin versions. Check the [engine release page](engine-releases.md) for the Neptune version you are running to determine which Gremlin release it supports, or consult the following table, which lists the earliest and latest TinkerPop versions supported by each Neptune engine version:


| Neptune Engine Version | Minimum TinkerPop Version | Maximum TinkerPop Version | 
| --- | --- | --- | 
| `1.3.2.0 and newer` | `3.7.1` | `3.7.3` | 
| `1.3.1.0` | `3.6.2` | `3.6.5` | 
| `1.3.0.0` | `3.6.2` | `3.6.4` | 
| `1.2.1.0 <= 1.2.1.2` | `3.6.2` | `3.6.2` | 
| `1.1.1.0 <= 1.2.0.2` | `3.5.5` | `3.5.6` | 
| `1.1.0.0 and older` | `(deprecated)` | `(deprecated)` | 

TinkerPop clients are usually backward compatible within a series (for example, `3.6.x` or `3.7.x`). While they can often work across series boundaries, the table above lists the recommended version combinations for the best possible experience and compatibility. Unless otherwise advised, it is generally best to adhere to these guidelines and upgrade client applications to match the version of TinkerPop you are using.
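
The within-a-series rule can be sketched as a simple version check. This is illustrative only; real compatibility is governed by the table above and TinkerPop's own guarantees.

```python
# Sketch: two TinkerPop versions are in the same series when their first
# two components match (3.7.1 and 3.7.3 are both in the 3.7.x series).
def same_series(client_version: str, server_version: str) -> bool:
    return client_version.split(".")[:2] == server_version.split(".")[:2]

print(same_series("3.7.1", "3.7.3"))  # True
print(same_series("3.6.5", "3.7.3"))  # False: crosses a series boundary
```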

When upgrading TinkerPop versions, always refer to [TinkerPop's upgrade documentation](http://tinkerpop.apache.org/docs/current/upgrade/), which identifies both new features you can take advantage of and issues to be aware of as you approach your upgrade. You should typically expect existing queries and features to work after an upgrade unless something in particular is called out as an issue to consider. Finally, note that if the version you upgrade to includes a feature newer than what Neptune supports, you may not be able to use it.

There are Gremlin language variants and support for Gremlin access in various programming languages. For more information, see [On Gremlin Language Variants](https://tinkerpop.apache.org/docs/current/reference/#gremlin-drivers-variants) in the TinkerPop documentation.

This documentation describes how to access Neptune with the following variants and programming languages:
+ [Set up the Gremlin console to connect to a Neptune DB instance](access-graph-gremlin-console.md)
+ [Using the HTTPS REST endpoint to connect to a Neptune DB instance](access-graph-gremlin-rest.md)
+ [Java-based Gremlin clients to use with Amazon Neptune](access-graph-gremlin-client.md)
+ [Using Python to connect to a Neptune DB instance](access-graph-gremlin-python.md)
+ [Using .NET to connect to a Neptune DB instance](access-graph-gremlin-dotnet.md)
+ [Using Node.js to connect to a Neptune DB instance](access-graph-gremlin-node-js.md)
+ [Using Go to connect to a Neptune DB instance](access-graph-gremlin-go.md)

As discussed in [Encrypting connections to your Amazon Neptune database with SSL/HTTPS](security-ssl.md), you must use Transport Layer Security/Secure Sockets Layer (TLS/SSL) when connecting to Neptune in all AWS Regions.

Before you begin, you must have the following:
+ A Neptune DB instance. For information about creating a Neptune DB instance, see [Creating an Amazon Neptune cluster](get-started-create-cluster.md).
+ An Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

For more information about loading data into Neptune, including prerequisites, loading formats, and load parameters, see [Loading data into Amazon Neptune](load-data.md).

**Topics**
+ [Set up the Gremlin console to connect to a Neptune DB instance](access-graph-gremlin-console.md)
+ [Using the HTTPS REST endpoint to connect to a Neptune DB instance](access-graph-gremlin-rest.md)
+ [Java-based Gremlin clients to use with Amazon Neptune](access-graph-gremlin-client.md)
+ [Using Python to connect to a Neptune DB instance](access-graph-gremlin-python.md)
+ [Using .NET to connect to a Neptune DB instance](access-graph-gremlin-dotnet.md)
+ [Using Node.js to connect to a Neptune DB instance](access-graph-gremlin-node-js.md)
+ [Using Go to connect to a Neptune DB instance](access-graph-gremlin-go.md)
+ [Using the AWS SDK to run Gremlin queries](access-graph-gremlin-sdk.md)
+ [Gremlin query hints](gremlin-query-hints.md)
+ [Gremlin query status API](gremlin-api-status.md)
+ [Gremlin query cancellation](gremlin-api-status-cancel.md)
+ [Support for Gremlin script-based sessions](access-graph-gremlin-sessions.md)
+ [Gremlin transactions in Neptune](access-graph-gremlin-transactions.md)
+ [Using the Gremlin API with Amazon Neptune](gremlin-api-reference.md)
+ [Caching query results in Amazon Neptune Gremlin](gremlin-results-cache.md)
+ [Making efficient upserts with Gremlin `mergeV()` and `mergeE()` steps](gremlin-efficient-upserts.md)
+ [Making efficient Gremlin upserts with `fold()/coalesce()/unfold()`](gremlin-efficient-upserts-pre-3.6.md)
+ [Analyzing Neptune query execution using Gremlin `explain`](gremlin-explain.md)
+ [Using Gremlin with the Neptune DFE query engine](gremlin-with-dfe.md)

# Set up the Gremlin console to connect to a Neptune DB instance
<a name="access-graph-gremlin-console"></a>

The Gremlin Console allows you to experiment with TinkerPop graphs and queries in a REPL (read-eval-print loop) environment.

## Installing the Gremlin console and connecting to it in the usual way
<a name="access-graph-gremlin-console-usual-connect"></a>

You can use the Gremlin Console to connect to a remote graph database. The following section walks you through installing and configuring the Gremlin Console to connect remotely to a Neptune DB instance. You must follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

For help connecting to Neptune with SSL/TLS (which is required), see [SSL/TLS configuration](access-graph-gremlin-java.md#access-graph-gremlin-java-ssl).

**Note**  
If you have [IAM authentication enabled](iam-auth-enable.md) on your Neptune DB cluster, follow the instructions in [Connecting to Amazon Neptune databases using IAM authentication with Gremlin console](iam-auth-connecting-gremlin-console.md) to install the Gremlin console rather than the instructions here.

**To install the Gremlin Console and connect to Neptune**

1. The Gremlin Console binaries require Java 8 or Java 11. These instructions assume usage of Java 11. You can install Java 11 on your EC2 instance as follows:
   + If you're using [Amazon Linux 2 (AL2)](https://aws.amazon.com/amazon-linux-2):

     ```
     sudo amazon-linux-extras install java-openjdk11
     ```
   + If you're using [Amazon Linux 2023 (AL2023)](https://docs.aws.amazon.com/linux/al2023/ug/what-is-amazon-linux.html):

     ```
     sudo yum install java-11-amazon-corretto-devel
     ```
   + For other distributions, use whichever of the following is appropriate:

     ```
     sudo yum install java-11-openjdk-devel
     ```

     or:

     ```
     sudo apt-get install openjdk-11-jdk
     ```

1. Enter the following to set Java 11 as the default runtime on your EC2 instance.

   ```
   sudo /usr/sbin/alternatives --config java
   ```

   When prompted, enter the number for Java 11.

1. Download the appropriate version of the Gremlin Console from the Apache web site. Check [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md) to determine which Gremlin version your version of Neptune supports. For example, if you need version 3.7.2, you can download the [Gremlin console](https://archive.apache.org/dist/tinkerpop/3.7.2/apache-tinkerpop-gremlin-console-3.7.2-bin.zip) from the [Apache TinkerPop](https://tinkerpop.apache.org/download.html) website onto your EC2 instance like this:

   ```
   wget https://archive.apache.org/dist/tinkerpop/3.7.2/apache-tinkerpop-gremlin-console-3.7.2-bin.zip
   ```

1. Unzip the Gremlin Console zip file.

   ```
   unzip apache-tinkerpop-gremlin-console-3.7.2-bin.zip
   ```

1. Change directories into the unzipped directory.

   ```
   cd apache-tinkerpop-gremlin-console-3.7.2
   ```

1. In the `conf` subdirectory of the extracted directory, create a file named `neptune-remote.yaml` with the following text. Replace *your-neptune-endpoint* with the hostname or IP address of your Neptune DB instance. The square brackets (`[ ]`) are required.
**Note**  
For information about finding the hostname of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section.

   ```
   hosts: [your-neptune-endpoint]
   port: 8182
   connectionPool: { enableSsl: true }
   serializer: { className: org.apache.tinkerpop.gremlin.util.ser.GraphBinaryMessageSerializerV1,
                 config: { serializeResultToString: true }}
   ```
**Note**  
Serializers were moved from the `gremlin-driver` module to the new `gremlin-util` module in TinkerPop 3.7.0. The package changed from `org.apache.tinkerpop.gremlin.driver.ser` to `org.apache.tinkerpop.gremlin.util.ser`.

1. In a terminal, navigate to the Gremlin Console directory (`apache-tinkerpop-gremlin-console-3.7.2`), and then enter the following command to run the Gremlin Console.

   ```
   bin/gremlin.sh
   ```

   You should see the following output:

   ```
            \,,,/
            (o o)
   -----oOOo-(3)-oOOo-----
   plugin activated: tinkerpop.server
   plugin activated: tinkerpop.utilities
   plugin activated: tinkerpop.tinkergraph
   gremlin>
   ```

   You are now at the `gremlin>` prompt. You will enter the remaining steps at this prompt.

1. At the `gremlin>` prompt, enter the following to connect to the Neptune DB instance.

   ```
   :remote connect tinkerpop.server conf/neptune-remote.yaml
   ```

1. At the `gremlin>` prompt, enter the following to switch to remote mode. This sends all Gremlin queries to the remote connection.

   ```
   :remote console
   ```

1. Enter the following to send a query to the Gremlin Graph.

   ```
   g.V().limit(1)
   ```

1. When you are finished, enter the following to exit the Gremlin Console.

   ```
   :exit
   ```

**Note**  
Use a semicolon (`;`) or a newline character (`\n`) to separate each statement.   
Each traversal preceding the final traversal must end in `next()` to be executed. Only the data from the final traversal is returned.

For more information on the Neptune implementation of Gremlin, see [Gremlin standards compliance in Amazon Neptune](access-graph-gremlin-differences.md).

## An alternate way to connect to the Gremlin console
<a name="access-graph-gremlin-console-connect"></a>

**Drawbacks of the normal connection approach**

The most common way to connect to the Gremlin console is the one explained above, using commands like this at the `gremlin>` prompt:

```
gremlin> :remote connect tinkerpop.server conf/(file name).yaml
gremlin> :remote console
```

This works well, and lets you send queries to Neptune. However, it takes the Groovy script engine out of the loop, so Neptune treats all queries as pure Gremlin. This means that the following query forms fail:

```
gremlin> 1 + 1
gremlin> x = g.V().count()
```

The closest you can get to using a variable when connected this way is to use the `result` variable maintained by the console and send the query using `:>`, like this:

```
gremlin> :remote console
==>All scripts will now be evaluated locally - type ':remote console' to return to remote mode for Gremlin Server - [krl-1-cluster.cluster-ro-cm9t6tfwbtsr.us-east-1.neptune.amazonaws.com/172.31.19.217:8182]
gremlin> :> g.V().count()
==>4249

gremlin> println(result)
[result{object=4249 class=java.lang.Long}]

gremlin> println(result['object'])
[4249]
```

 

**A different way to connect**

You can also connect to the Gremlin console in a different way, which you may find more convenient:

```
gremlin> g = traversal().withRemote('conf/neptune.properties')
```

Here `neptune.properties` takes this form:

```
gremlin.remote.remoteConnectionClass=org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection
gremlin.remote.driver.clusterFile=conf/my-cluster.yaml
gremlin.remote.driver.sourceName=g
```

The `my-cluster.yaml` file should look like this:

```
hosts: [my-cluster-abcdefghijk.us-east-1.neptune.amazonaws.com]
port: 8182
serializer: { className: org.apache.tinkerpop.gremlin.util.ser.GraphBinaryMessageSerializerV1,
              config: { serializeResultToString: false } }
connectionPool: { enableSsl: true }
```

**Note**  
Serializers were moved from the `gremlin-driver` module to the new `gremlin-util` module in TinkerPop 3.7.0. The package changed from `org.apache.tinkerpop.gremlin.driver.ser` to `org.apache.tinkerpop.gremlin.util.ser`.

Configuring the Gremlin console connection like that lets you make the following kinds of queries successfully:

```
gremlin> 1+1
==>2

gremlin> x=g.V().count().next()
==>4249

gremlin> println("The answer was ${x}")
The answer was 4249
```

You can avoid displaying the result, like this:

```
gremlin> x=g.V().count().next();[]
gremlin> println(x)
4249
```

All the usual ways of querying (without the terminal step) continue to work. For example:

```
gremlin> g.V().count()
==>4249
```

You can even use the [`io()` step](https://tinkerpop.apache.org/docs/current/reference/#io-step) to load a file with this kind of connection.

## IAM authentication
<a name="access-graph-gremlin-console-iam"></a>

Neptune supports [IAM authentication](iam-auth-enable.md) to control access to your DB cluster. If you have IAM authentication enabled, you need to use Signature Version 4 signing to authenticate your requests. For detailed instructions and code examples for connecting from the Gremlin console, see [Connecting to Amazon Neptune databases using IAM authentication with Gremlin console](iam-auth-connecting-gremlin-console.md).

# Using the HTTPS REST endpoint to connect to a Neptune DB instance
<a name="access-graph-gremlin-rest"></a>

Amazon Neptune provides an HTTPS endpoint for Gremlin queries. The REST interface is compatible with whatever Gremlin version your DB cluster is using (see the [engine release page](engine-releases.md) of the Neptune engine version you are running to determine which Gremlin release it supports).

**Note**  
As discussed in [Encrypting connections to your Amazon Neptune database with SSL/HTTPS](security-ssl.md), Neptune now requires that you connect using HTTPS instead of HTTP. In addition, Neptune does not currently support HTTP/2 for REST API requests. Clients must use HTTP/1.1 when connecting to endpoints.

The following instructions walk you through connecting to the Gremlin endpoint using the `curl` command and HTTPS. You must follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

The HTTPS endpoint for Gremlin queries to a Neptune DB instance is `https://your-neptune-endpoint:port/gremlin`.

**Note**  
For information about finding the hostname of your Neptune DB instance, see [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md).

## To connect to Neptune using the HTTPS REST endpoint
<a name="access-graph-gremlin-rest-connect"></a>

The following examples show how to submit a Gremlin query to the REST endpoint. You can use the AWS SDK, the AWS CLI, or **curl**.

------
#### [ AWS CLI ]

```
aws neptunedata execute-gremlin-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --gremlin-query "g.V().limit(1)"
```

For more information, see [execute-gremlin-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-gremlin-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
import json
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_gremlin_query(
    gremlinQuery='g.V().limit(1)',
    serializer='application/vnd.gremlin-v3.0+json;types=false'
)

print(json.dumps(response['result'], indent=2))
```

For AWS SDK examples in other languages like Java, .NET, and more, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d '{"gremlin":"g.V().limit(1)"}'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).
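Under the hood, **awscurl** computes an AWS Signature Version 4 over the request before sending it. The following Python sketch shows the shape of that computation for a Gremlin POST; the credentials, endpoint, and Region are placeholders, and in practice you should rely on **awscurl** or an SDK signer rather than hand-rolling this:

```python
# A minimal sketch of the SigV4 signing that awscurl performs under the hood.
# All credential and endpoint values here are placeholders.
import datetime
import hashlib
import hmac

def _sign(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def _signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    # Derive the signing key: date -> region -> service -> "aws4_request".
    k_date = _sign(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _sign(k_date, region)
    k_service = _sign(k_region, service)
    return _sign(k_service, "aws4_request")

def sigv4_headers(host, region, payload, access_key, secret_key):
    service = "neptune-db"  # the service name Neptune IAM auth signs against
    now = datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    # Canonical request for POST /gremlin with only host and x-amz-date signed.
    payload_hash = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    signed_headers = "host;x-amz-date"
    canonical_request = "\n".join(
        ["POST", "/gremlin", "", canonical_headers, signed_headers, payload_hash]
    )

    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    signature = hmac.new(
        _signing_key(secret_key, date_stamp, region, service),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()

    return {
        "x-amz-date": amz_date,
        "Authorization": (
            f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
            f"SignedHeaders={signed_headers}, Signature={signature}"
        ),
    }

headers = sigv4_headers(
    "your-neptune-endpoint:8182", "us-east-1",
    '{"gremlin":"g.V().limit(1)"}', "AKIDEXAMPLE", "examplesecretkey")
print(headers["Authorization"][:16])
```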

------
#### [ curl ]

The following example uses **curl** to submit a Gremlin query through HTTP **POST**. The query is submitted in JSON format in the body of the post as the `gremlin` property.

```
curl -X POST -d '{"gremlin":"g.V().limit(1)"}' https://your-neptune-endpoint:port/gremlin
```

Although HTTP **POST** requests are recommended for sending Gremlin queries, it is also possible to use HTTP **GET** requests:

```
curl -G "https://your-neptune-endpoint:port/gremlin?gremlin=g.V().count()"
```

------

These examples return the first vertex in the graph by using the `g.V().limit(1)` traversal (the **GET** example returns a vertex count instead). You can query for something else by replacing it with another Gremlin traversal.

**Important**  
By default, the REST endpoint returns all results in a single JSON result set. If this result set is too large, an `OutOfMemoryError` exception can occur on the Neptune DB instance.  
You can avoid this by enabling chunked responses (results returned in a series of separate responses). See [Use optional HTTP trailing headers to enable multi-part Gremlin responses](access-graph-gremlin-rest-trailing-headers.md).

**Note**  
Neptune does not support the `bindings` property.
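Because `bindings` isn't accepted, any parameter has to be inlined into the Gremlin string itself before the request is sent. A minimal Python sketch of that inlining follows; note that inlining bypasses the injection protection that bindings would normally provide, so the escaping shown is illustrative and untrusted input should be sanitized carefully:

```python
import json

def gremlin_payload(vertex_id: str) -> str:
    # Instead of {"gremlin": "g.V(vid)", "bindings": {"vid": ...}} -- which
    # Neptune rejects -- inline the value into the query text itself.
    # Escape backslashes and single quotes so the value cannot break out
    # of the Gremlin string literal.
    safe_id = vertex_id.replace("\\", "\\\\").replace("'", "\\'")
    return json.dumps({"gremlin": f"g.V('{safe_id}')"})

print(gremlin_payload("CustomId1"))
```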

# Use optional HTTP trailing headers to enable multi-part Gremlin responses
<a name="access-graph-gremlin-rest-trailing-headers"></a>

By default, the HTTP response to Gremlin queries is returned in a single JSON result set. In the case of a very large result set, this can cause an `OutOfMemoryError` exception on the DB instance.

However, you can enable *chunked* responses (responses that are returned in multiple separate parts). You do this by including a transfer-encoding (TE) trailers header (`te: trailers`) in your request. See [the MDN page about TE request headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/TE) for more information about TE headers.

When a response is returned in multiple parts, it can be hard to diagnose a problem that occurs after the first part is received, since the first part arrives with an HTTP status code of `200` (OK). A subsequent failure usually results in a message body containing a corrupt response, at the end of which Neptune appends an error message.

To make detection and diagnosis of this kind of failure easier, Neptune also includes two new header fields within the trailing headers of every response chunk:
+ `X-Neptune-Status`  –   contains the response code followed by a short name. For instance, on success the trailing header is `X-Neptune-Status: 200 OK`. In the case of failure, the response code is one of the [Neptune engine error codes](errors-engine-codes.md), such as `X-Neptune-Status: 500 TimeLimitExceededException`.
+ `X-Neptune-Detail`  –   is empty for successful requests. In the case of errors, it contains the JSON error message. Because only ASCII characters are allowed in HTTP header values, the JSON string is URL encoded.
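A client that has collected the trailing headers of a chunk could interpret them as follows. This is a minimal sketch: the trailer values shown are illustrative, and the `X-Neptune-Detail` JSON is URL-decoded as described above:

```python
from urllib.parse import unquote

def parse_neptune_trailers(trailers: dict) -> tuple:
    # Split "500 TimeLimitExceededException" into (500, "TimeLimitExceededException"),
    # and URL-decode the JSON error detail, which Neptune percent-encodes
    # because HTTP header values are restricted to ASCII.
    code_str, _, name = trailers.get("X-Neptune-Status", "").partition(" ")
    detail = unquote(trailers.get("X-Neptune-Detail", ""))
    return int(code_str), name, detail

# Example trailing headers from a failed chunk (values are illustrative):
code, name, detail = parse_neptune_trailers({
    "X-Neptune-Status": "500 TimeLimitExceededException",
    "X-Neptune-Detail": "%7B%22detailedMessage%22%3A%22A%20timeout%20occurred%22%7D",
})
print(code, name, detail)
```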

**Note**  
Neptune does not currently support `gzip` compression of chunked responses. If the client requests both chunked encoding and compression at the same time, Neptune skips the compression.

# Java-based Gremlin clients to use with Amazon Neptune
<a name="access-graph-gremlin-client"></a>

You can use either of two open-source Java-based Gremlin clients with Amazon Neptune: the [Apache TinkerPop Java Gremlin client](https://search.maven.org/artifact/org.apache.tinkerpop/gremlin-driver), or the [Gremlin client for Amazon Neptune](https://search.maven.org/artifact/software.amazon.neptune/gremlin-client).

## Apache TinkerPop Java Gremlin client
<a name="access-graph-gremlin-java-driver"></a>

The Apache TinkerPop Java [gremlin-driver](https://tinkerpop.apache.org/docs/current/reference/#gremlin-java) is the standard, official Gremlin client that works with any TinkerPop-enabled graph database. Use this client when you need maximum compatibility with the broader TinkerPop development space, when you're working with multiple graph database systems, or when you don't require the advanced cluster management and load balancing features specific to Neptune. This client is also suitable for simple applications that connect to a single Neptune instance or when you prefer to handle load balancing at the infrastructure level rather than within the client.

**Important**  
Choosing the correct Apache TinkerPop Gremlin driver version is critical for compatibility with your Neptune engine version. Using an incompatible version can result in connection failures or unexpected behavior. For detailed version compatibility information, see [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md).

**Note**  
The table that helps you determine the correct Apache TinkerPop version to use with Neptune was located on this page for many years. It has been moved to [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md), where it serves as a central reference for all programming languages that TinkerPop supports.

## Gremlin Java client for Amazon Neptune
<a name="access-graph-neptune-gremlin-client"></a>

The Gremlin client for Amazon Neptune is an [open-source Java-based Gremlin client](https://github.com/aws/neptune-gremlin-client) that acts as a drop-in replacement for the standard TinkerPop Java client.

The Neptune Gremlin client is optimized for Neptune clusters. It lets you manage traffic distribution across multiple instances in a cluster, and adapts to changes in cluster topology when you add or remove a replica. You can even configure the client to distribute requests across a subset of instances in your cluster, based on role, instance type, availability zone (AZ), or tags associated with instances.

The [latest version of the Neptune Gremlin Java client](https://search.maven.org/artifact/software.amazon.neptune/gremlin-client) is available on Maven Central.

For more information about the Neptune Gremlin Java client, see [this blog post](https://aws.amazon.com/blogs/database/load-balance-graph-queries-using-the-amazon-neptune-gremlin-client/). For code samples and demos, check out the [client's GitHub project](https://github.com/aws/neptune-gremlin-client).

When choosing the version of the Neptune Gremlin client, you need to consider the underlying TinkerPop version in relation to your Neptune engine version. Refer to the compatibility table at [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md) to determine the correct TinkerPop version for your Neptune engine, then use the following table to select the appropriate Neptune Gremlin client version:


**Neptune Gremlin client version compatibility**  

| Neptune Gremlin client version | TinkerPop version | 
| --- | --- | 
| 3.x | 3.7.x (AWS SDK for Java 2.x/1.x) | 
| 2.1.x | 3.7.x (AWS SDK for Java 1.x) | 
| 2.0.x | 3.6.x | 
| 1.12 | 3.5.x | 
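For illustration, the table above can be expressed as a simple lookup. This is a sketch: the version pairs are a snapshot of the table, not a live source of truth:

```python
# The compatibility table above, expressed as a lookup.
# Keys are Neptune Gremlin client lines; values are TinkerPop lines.
CLIENT_COMPATIBILITY = {
    "3.x": "3.7.x",
    "2.1.x": "3.7.x",
    "2.0.x": "3.6.x",
    "1.12": "3.5.x",
}

def clients_for_tinkerpop(tinkerpop_version: str) -> list:
    """Return the Neptune Gremlin client lines compatible with a TinkerPop line."""
    return [c for c, tp in CLIENT_COMPATIBILITY.items() if tp == tinkerpop_version]

print(clients_for_tinkerpop("3.7.x"))
```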

# Using a Java client to connect to a Neptune DB instance
<a name="access-graph-gremlin-java"></a>

The following section walks you through running a complete Java sample that connects to a Neptune DB instance and performs a Gremlin traversal using the Apache TinkerPop Gremlin client.

These instructions must be followed from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

**To connect to Neptune using Java**

1. Install Apache Maven on your EC2 instance. If using Amazon Linux 2023 (preferred), use:

   ```
   sudo dnf update -y
   sudo dnf install maven -y
   ```

   If using Amazon Linux 2, download the latest binary from [https://maven.apache.org/download.cgi](https://maven.apache.org/download.cgi):

   ```
   sudo yum remove maven -y
   wget https://dlcdn.apache.org/maven/maven-3/<version>/binaries/apache-maven-<version>-bin.tar.gz
   sudo tar -xzf apache-maven-<version>-bin.tar.gz -C /opt/
   sudo ln -sf /opt/apache-maven-<version> /opt/maven
   echo 'export MAVEN_HOME=/opt/maven' >> ~/.bashrc
   echo 'export PATH=$MAVEN_HOME/bin:$PATH' >> ~/.bashrc
   source ~/.bashrc
   ```

1. **Install Java.** The Gremlin libraries need Java 8 or 11. You can install Java 11 as follows:
   + If you're using [Amazon Linux 2 (AL2)](https://aws.amazon.com/amazon-linux-2):

     ```
     sudo amazon-linux-extras install java-openjdk11
     ```
   + If you're using [Amazon Linux 2023 (AL2023)](https://docs.aws.amazon.com/linux/al2023/ug/what-is-amazon-linux.html):

     ```
     sudo yum install java-11-amazon-corretto-devel
     ```
   + For other distributions, use whichever of the following is appropriate:

     ```
     sudo yum install java-11-openjdk-devel
     ```

     or:

     ```
     sudo apt-get install openjdk-11-jdk
     ```

1. **Set Java 11 as the default runtime on your EC2 instance:** Enter the following command:

   ```
   sudo /usr/sbin/alternatives --config java
   ```

   When prompted, enter the number for Java 11.

1. **Create a new directory named `gremlinjava`:**

   ```
   mkdir gremlinjava
   cd gremlinjava
   ```

1. In the `gremlinjava` directory, create a `pom.xml` file, and then open it in a text editor:

   ```
   nano pom.xml
   ```

1. Copy the following into the `pom.xml` file and save it:

   ```
   <project xmlns="http://maven.apache.org/POM/4.0.0"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
     <properties>
       <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
     </properties>
     <modelVersion>4.0.0</modelVersion>
     <groupId>com.amazonaws</groupId>
     <artifactId>GremlinExample</artifactId>
     <packaging>jar</packaging>
     <version>1.0-SNAPSHOT</version>
     <name>GremlinExample</name>
     <url>https://maven.apache.org</url>
     <dependencies>
       <dependency>
         <groupId>org.apache.tinkerpop</groupId>
         <artifactId>gremlin-driver</artifactId>
         <version>3.7.2</version>
       </dependency>
       <dependency>
         <groupId>org.slf4j</groupId>
         <artifactId>slf4j-jdk14</artifactId>
         <version>1.7.25</version>
       </dependency>
     </dependencies>
     <build>
       <plugins>
         <plugin>
           <groupId>org.apache.maven.plugins</groupId>
           <artifactId>maven-compiler-plugin</artifactId>
           <version>3.8.1</version>
           <configuration>
             <source>11</source>
             <target>11</target>
           </configuration>
         </plugin>
           <plugin>
             <groupId>org.codehaus.mojo</groupId>
             <artifactId>exec-maven-plugin</artifactId>
             <version>1.3</version>
             <configuration>
               <executable>java</executable>
               <arguments>
                 <argument>-classpath</argument>
                 <classpath/>
                 <argument>com.amazonaws.App</argument>
               </arguments>
               <mainClass>com.amazonaws.App</mainClass>
               <complianceLevel>1.11</complianceLevel>
               <killAfter>-1</killAfter>
             </configuration>
           </plugin>
       </plugins>
     </build>
   </project>
   ```
**Note**  
If you are modifying an existing Maven project, the required dependency is the `gremlin-driver` artifact in the preceding code.

1. Create subdirectories for the example source code (`src/main/java/com/amazonaws/`) by typing the following at the command line:

   ```
   mkdir -p src/main/java/com/amazonaws/
   ```

1. In the `src/main/java/com/amazonaws/` directory, create a file named `App.java`, and then open it in a text editor.

   ```
   nano src/main/java/com/amazonaws/App.java
   ```

1. Copy the following into the `App.java` file. Replace *your-neptune-endpoint* with the address of your Neptune DB instance. Do *not* include the `https://` prefix in the `addContactPoint` method.
**Note**  
For information about finding the hostname of your Neptune DB instance, see [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md).

   ```
   package com.amazonaws;
   import org.apache.tinkerpop.gremlin.driver.Cluster;
   import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
   import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
   import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;
   import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
   import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
   import org.apache.tinkerpop.gremlin.structure.T;
   
   public class App
   {
     public static void main( String[] args )
     {
       Cluster.Builder builder = Cluster.build();
       builder.addContactPoint("your-neptune-endpoint");
       builder.port(8182);
       builder.enableSsl(true);
   
       Cluster cluster = builder.create();
   
       GraphTraversalSource g = traversal().withRemote(DriverRemoteConnection.using(cluster));
   
       // Add a vertex.
       // Note that a Gremlin terminal step, e.g. iterate(), is required to make a request to the remote server.
       // The full list of Gremlin terminal steps is at https://tinkerpop.apache.org/docs/current/reference/#terminal-steps
       g.addV("Person").property("Name", "Justin").iterate();
   
       // Add a vertex with a user-supplied ID.
       g.addV("Custom Label").property(T.id, "CustomId1").property("name", "Custom id vertex 1").iterate();
       g.addV("Custom Label").property(T.id, "CustomId2").property("name", "Custom id vertex 2").iterate();
   
       g.addE("Edge Label").from(__.V("CustomId1")).to(__.V("CustomId2")).iterate();
   
    // This gets the vertices only.
    GraphTraversal t = g.V().limit(3).elementMap();

    t.forEachRemaining(
      e -> System.out.println(e)
    );
   
       cluster.close();
     }
   }
   ```

   For help connecting to Neptune with SSL/TLS (which is required), see [SSL/TLS configuration](#access-graph-gremlin-java-ssl).

1. Compile and run the sample using the following Maven command:

   ```
   mvn compile exec:exec
   ```

The preceding example returns a map of the keys and values of each property for the first three vertices in the graph by using the `g.V().limit(3).elementMap()` traversal. To query for something else, replace it with another Gremlin traversal ending with one of the appropriate terminal methods.

**Note**  
A Gremlin terminal step, such as `toList()` or the `forEachRemaining()` call in the example, is required to submit the traversal to the server for evaluation. If you don't include a terminal step, the query is not submitted to the Neptune DB instance.  
You must also append an appropriate terminal step when you add a vertex or edge, such as when you use the `addV()` step.

The following methods submit the query to the Neptune DB instance:
+ `toList()`
+ `toSet()`
+ `next()`
+ `nextTraverser()`
+ `iterate()`

## SSL/TLS configuration for Gremlin Java client
<a name="access-graph-gremlin-java-ssl"></a>

Neptune requires SSL/TLS to be enabled by default. Typically, if the Java driver is configured with `enableSsl(true)`, it can connect to Neptune without having to set up a `trustStore()` or `keyStore()` with a local copy of a certificate.

However, if the instance with which you are connecting doesn't have an internet connection through which to verify a public certificate, or if the certificate you're using isn't public, you can take the following steps to configure a local certificate copy:

**Setting up a local certificate copy to enable SSL/TLS**

1. Download and install [keytool](https://docs.oracle.com/javase/9/tools/keytool.htm#JSWOR-GUID-5990A2E4-78E3-47B7-AE75-6D1826259549) from Oracle. This will make setting up the local key store much easier.

1. Download the `SFSRootCAG2.pem` CA certificate (the Gremlin Java SDK requires a certificate to verify the remote certificate):

   ```
   wget https://www.amazontrust.com/repository/SFSRootCAG2.pem
   ```

1. Create a key store in either JKS or PKCS12 format. This example uses JKS. Answer the questions that follow at the prompt. The password that you create here will be needed later:

   ```
   keytool -genkey -alias (host name) -keyalg RSA -keystore server.jks
   ```

1. Import the `SFSRootCAG2.pem` file that you downloaded into the newly created key store:

   ```
   keytool -import -keystore server.jks -file SFSRootCAG2.pem
   ```

1. Configure the `Cluster` object programmatically:

   ```
   Cluster cluster = Cluster.build("(your neptune endpoint)")
                            .port(8182)
                            .enableSsl(true)
                            .keyStore("server.jks")
                            .keyStorePassword("(the password from step 3)")
                            .create();
   ```

   You can do the same thing in a configuration file if you want, as you might do with the Gremlin console:

   ```
   hosts: [(your neptune endpoint)]
   port: 8182
   connectionPool: { enableSsl: true, keyStore: server.jks, keyStorePassword: (the password from step 3) }
   serializer: { className: org.apache.tinkerpop.gremlin.util.ser.GraphBinaryMessageSerializerV1, config: { serializeResultToString: true }}
   ```

## IAM authentication
<a name="access-graph-gremlin-java-iam"></a>

Neptune supports [IAM authentication](iam-auth-enable.md) to control access to your DB cluster. If you have IAM authentication enabled, you need to use Signature Version 4 signing to authenticate your requests. For detailed instructions and code examples for connecting from a Java client, see [Connecting to Amazon Neptune databases using IAM with Gremlin Java](iam-auth-connecting-gremlin-java.md).

# Java example of connecting to a Neptune DB instance with re-connect logic
<a name="access-graph-gremlin-java-reconnect-example"></a>

The following Java example demonstrates how to use the Gremlin client with reconnect logic that recovers from an unexpected disconnect.

It has the following dependencies:

```
<dependency>
    <groupId>org.apache.tinkerpop</groupId>
    <artifactId>gremlin-driver</artifactId>
    <version>${gremlin.version}</version>
</dependency>

<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>amazon-neptune-sigv4-signer</artifactId>
    <version>${sig4.signer.version}</version>
</dependency>

<dependency>
    <groupId>com.evanlennick</groupId>
    <artifactId>retry4j</artifactId>
    <version>0.15.0</version>
</dependency>
```

Here is the sample code:

**Important**  
 The `CallExecutor` from Retry4j may not be thread-safe. Consider having each thread use its own `CallExecutor` instance, or use a different retrying library. 

**Note**  
 The following example has been updated to use `requestInterceptor()`, which was added in TinkerPop 3.6.6. Prior to version 3.6.6, the example used `handshakeInterceptor()`, which was deprecated in that release. 

```
public static void main(String args[]) {
  boolean useIam = true;

  // Create Gremlin cluster and traversal source
  Cluster.Builder builder = Cluster.build()
         .addContactPoint(System.getenv("neptuneEndpoint"))
         .port(Integer.parseInt(System.getenv("neptunePort")))
         .enableSsl(true)
         .minConnectionPoolSize(1)
         .maxConnectionPoolSize(1)
         .serializer(Serializers.GRAPHBINARY_V1D0)
         .reconnectInterval(2000);

  if (useIam) {
      builder.requestInterceptor( r -> {
         try {
            NeptuneNettyHttpSigV4Signer sigV4Signer =
                        new NeptuneNettyHttpSigV4Signer("(your region)", new DefaultAWSCredentialsProviderChain());
            sigV4Signer.signRequest(r);
         } catch (NeptuneSigV4SignerException e) {
            throw new RuntimeException("Exception occurred while signing the request", e);
         }
         return r;
      });
   }

  Cluster cluster = builder.create();

  GraphTraversalSource g = AnonymousTraversalSource
      .traversal()
      .withRemote(DriverRemoteConnection.using(cluster));

  // Configure retries
  RetryConfig retryConfig = new RetryConfigBuilder()
      .retryOnCustomExceptionLogic(getRetryLogic())
      .withDelayBetweenTries(1000, ChronoUnit.MILLIS)
      .withMaxNumberOfTries(5)
      .withFixedBackoff()
      .build();

  @SuppressWarnings("unchecked")
  CallExecutor<Object> retryExecutor = new CallExecutorBuilder<Object>()
      .config(retryConfig)
      .build();

  // Do lots of queries
  for (int i = 0; i < 100; i++){
    String id = String.valueOf(i);

    @SuppressWarnings("unchecked")
    Callable<Object> query = () -> g.V(id)
        .fold()
        .coalesce(
            unfold(),
            addV("Person").property(T.id, id))
        .id().next();

    // Retry query
    // If there are connection failures, the Java Gremlin client will automatically
    // attempt to reconnect in the background, so all we have to do is wait and retry.
    Status<Object> status = retryExecutor.execute(query);

    System.out.println(status.getResult().toString());
  }

  cluster.close();
}

private static Function<Exception, Boolean> getRetryLogic() {

  return e -> {

    Class<? extends Exception> exceptionClass = e.getClass();

    StringWriter stringWriter = new StringWriter();
    e.printStackTrace(new PrintWriter(stringWriter));
    String message = stringWriter.toString();

    if (RemoteConnectionException.class.isAssignableFrom(exceptionClass)){
      System.out.println("Retrying because RemoteConnectionException");
      return true;
    }

    // Check for connection issues
    if (message.contains("Timed out while waiting for an available host") ||
        message.contains("Timed-out") && message.contains("waiting for connection on Host") ||
        message.contains("Connection to server is no longer active") ||
        message.contains("Connection reset by peer") ||
        message.contains("SSLEngine closed already") ||
        message.contains("Pool is shutdown") ||
        message.contains("ExtendedClosedChannelException") ||
        message.contains("Broken pipe") ||
        message.contains(System.getenv("neptuneEndpoint")))
    {
      System.out.println("Retrying because connection issue");
      return true;
    };

    // Concurrent writes can sometimes trigger a ConcurrentModificationException.
    // In these circumstances you may want to backoff and retry.
    if (message.contains("ConcurrentModificationException")) {
      System.out.println("Retrying because ConcurrentModificationException");
      return true;
    }

    // If the primary fails over to a new instance, existing connections to the old primary will
    // throw a ReadOnlyViolationException. You may want to back off and retry.
    if (message.contains("ReadOnlyViolationException")) {
      System.out.println("Retrying because ReadOnlyViolationException");
      return true;
    }

    System.out.println("Not a retriable error");
    return false;
  };
}
```
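The retry-classification idea in `getRetryLogic()` carries over to other languages. Here is a minimal Python sketch of the same message-based classification; the fragment list mirrors the Java example and is illustrative, not exhaustive:

```python
# Message fragments that typically indicate a transient, retriable failure.
RETRIABLE_FRAGMENTS = (
    "Timed out while waiting for an available host",
    "Connection to server is no longer active",
    "Connection reset by peer",
    "SSLEngine closed already",
    "Pool is shutdown",
    "ExtendedClosedChannelException",
    "Broken pipe",
    "ConcurrentModificationException",  # concurrent writes can conflict; back off and retry
    "ReadOnlyViolationException",       # stale connection to the old primary after failover
)

def is_retriable(exc: Exception) -> bool:
    """Decide whether a failed Gremlin query is worth retrying."""
    message = str(exc)
    return any(fragment in message for fragment in RETRIABLE_FRAGMENTS)

print(is_retriable(ConnectionError("Connection reset by peer")))
print(is_retriable(ValueError("malformed query")))
```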

# Using Python to connect to a Neptune DB instance
<a name="access-graph-gremlin-python"></a>

**Important**  
Choosing the correct Apache TinkerPop Gremlin driver version is critical for compatibility with your Neptune engine version. Using an incompatible version can result in connection failures or unexpected behavior. For detailed version compatibility information, see [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md).

The following section walks you through running a Python sample that connects to an Amazon Neptune DB instance and performs a Gremlin traversal.

You must follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

Before you begin, do the following:
+ Download and install Python 3.6 or later from the [Python.org website](https://www.python.org/downloads/).
+ Verify that you have **pip** installed. If you don't have **pip** or you're not sure, see [Do I need to install pip?](https://pip.pypa.io/en/stable/installing/#do-i-need-to-install-pip) in the **pip** documentation.
+ (Python 2 only) If your Python installation does not already have it, install the `futures` backport: `pip install futures`. Python 3.2 and later include `concurrent.futures` in the standard library.



**To connect to Neptune using Python**

1. Enter the following to install the `gremlinpython` package:

   ```
   pip install --user gremlinpython
   ```

1. Create a file named `gremlinexample.py`, and then open it in a text editor.

1. Copy the following into the `gremlinexample.py` file. Replace *your-neptune-endpoint* with the address of your Neptune DB cluster and *your-neptune-port* with the port of your Neptune DB cluster (default: 8182).

   For information about finding the address of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section.

    The example below demonstrates how to connect with Gremlin Python. 

   ```
   from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
   from gremlin_python.process.anonymous_traversal import traversal
   
   database_url = "wss://your-neptune-endpoint:your-neptune-port/gremlin"
   
   remoteConn = DriverRemoteConnection(database_url, "g")
   
   g = traversal().withRemote(remoteConn)
   
   print(g.inject(1).toList())
   remoteConn.close()
   ```

1. Enter the following command to run the sample:

   ```
   python gremlinexample.py
   ```

   The Gremlin query at the end of this example injects the constant value `1` and returns it as a list (`g.inject(1).toList()`), which verifies the connection. The list is then printed with the standard Python `print` function.
**Note**  
The final part of the Gremlin query, `toList()`, is required to submit the traversal to the server for evaluation. If you don't include that method or another equivalent method, the query is not submitted to the Neptune DB instance.

   The following methods submit the query to the Neptune DB instance:
   + `toList()`
   + `toSet()`
   + `next()`
   + `nextTraverser()`
   + `iterate()`

   

   The preceding example uses the `g.inject(1).toList()` traversal to confirm connectivity without touching your data. To query for something else, replace it with another Gremlin traversal with one of the appropriate ending methods.

## IAM authentication
<a name="access-graph-gremlin-python-iam"></a>

Neptune supports [IAM authentication](iam-auth-enable.md) to control access to your DB cluster. If you have IAM authentication enabled, you need to use Signature Version 4 signing to authenticate your requests. For detailed instructions and code examples for connecting from a Python client, see [Connecting to Amazon Neptune databases using IAM authentication with Gremlin Python](gremlin-python-iam-auth.md).

# Using .NET to connect to a Neptune DB instance
<a name="access-graph-gremlin-dotnet"></a>

**Important**  
Choosing the correct Apache TinkerPop Gremlin driver version is critical for compatibility with your Neptune engine version. Using an incompatible version can result in connection failures or unexpected behavior. For detailed version compatibility information, see [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md).

The following section contains a code example written in C# that connects to a Neptune DB instance and performs a Gremlin traversal.

Connections to Amazon Neptune must be from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance. This sample code was tested on an Amazon EC2 instance running Ubuntu.

Before you begin, do the following:
+ Install .NET on the Amazon EC2 instance. To get instructions for installing .NET on multiple operating systems, including Windows, Linux, and macOS, see [Get Started with .NET](https://www.microsoft.com/net/learn/get-started/).
+ Install Gremlin.NET by running `dotnet add package gremlin.net` in your project directory. For more information, see [Gremlin.NET](https://tinkerpop.apache.org/docs/current/reference/#gremlin-DotNet) in the TinkerPop documentation.



**To connect to Neptune using Gremlin.NET**

1. Create a new .NET project.

   ```
   dotnet new console -o gremlinExample
   ```

1. Change directories into the new project directory.

   ```
   cd gremlinExample
   ```

1. Copy the following into the `Program.cs` file. Replace *your-neptune-endpoint* with the address of your Neptune DB instance.

   For information about finding the address of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section.

   ```
   using System;
   using System.Threading.Tasks;
   using System.Collections.Generic;
   using Gremlin.Net;
   using Gremlin.Net.Driver;
   using Gremlin.Net.Driver.Remote;
   using Gremlin.Net.Structure;
   using static Gremlin.Net.Process.Traversal.AnonymousTraversalSource;
   namespace gremlinExample
   {
     class Program
     {
       static void Main(string[] args)
       {
         try
         {
           var endpoint = "your-neptune-endpoint";
           // This uses the default Neptune and Gremlin port, 8182
           var gremlinServer = new GremlinServer(endpoint, 8182, enableSsl: true );
           var gremlinClient = new GremlinClient(gremlinServer);
           var remoteConnection = new DriverRemoteConnection(gremlinClient, "g");
           var g = Traversal().WithRemote(remoteConnection);
           g.AddV("Person").Property("Name", "Justin").Iterate();
           g.AddV("Custom Label").Property("name", "Custom id vertex 1").Iterate();
           g.AddV("Custom Label").Property("name", "Custom id vertex 2").Iterate();
           var output = g.V().Limit<Vertex>(3).ToList();
           foreach(var item in output) {
               Console.WriteLine(item);
           }
         }
         catch (Exception e)
         {
             Console.WriteLine("{0}", e);
         }
       }
     }
   }
   ```

1. Enter the following command to run the sample:

   ```
   dotnet run
   ```

   The Gremlin query at the end of this example returns up to three vertices as a list (`g.V().Limit<Vertex>(3).ToList()`), which is then printed to the console.
**Note**  
The final part of the Gremlin query, `ToList()`, is required to submit the traversal to the server for evaluation. If you don't include that method or another equivalent method, the query is not submitted to the Neptune DB instance.

   The following methods submit the query to the Neptune DB instance:
   + `ToList()`
   + `ToSet()`
   + `Next()`
   + `NextTraverser()`
   + `Iterate()`

   Use `Next()` if you need the query results to be serialized and returned, or `Iterate()` if you don't.

   The preceding example returns a list by using the `g.V().Limit(3).ToList()` traversal. To query for something else, replace it with another Gremlin traversal with one of the appropriate ending methods.

## IAM authentication
<a name="access-graph-gremlin-dotnet-iam"></a>

Neptune supports [IAM authentication](iam-auth-enable.md) to control access to your DB cluster. If you have IAM authentication enabled, you need to use Signature Version 4 signing to authenticate your requests. For detailed instructions and code examples for connecting from a .NET client, see [Connecting to Amazon Neptune databases using IAM authentication with Gremlin .NET](gremlin-dotnet-iam-auth.md).

# Using Node.js to connect to a Neptune DB instance
<a name="access-graph-gremlin-node-js"></a>

**Important**  
Choosing the correct Apache TinkerPop Gremlin driver version is critical for compatibility with your Neptune engine version. Using an incompatible version can result in connection failures or unexpected behavior. For detailed version compatibility information, see [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md).

The following section walks you through running a Node.js sample that connects to an Amazon Neptune DB instance and performs a Gremlin traversal.

You must follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

Before you begin, do the following:
+ Verify that Node.js version 8.11 or higher is installed. If it is not, download and install Node.js from the [Nodejs.org website](https://nodejs.org).

**To connect to Neptune using Node.js**

1. Enter the following to install the `gremlin-javascript` package:

   ```
   npm install gremlin
   ```

1. Create a file named `gremlinexample.js` and open it in a text editor.

1. Copy the following into the `gremlinexample.js` file. Replace *your-neptune-endpoint* with the address of your Neptune DB instance.

   For information about finding the address of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section.

   ```
   const gremlin = require('gremlin');
   const DriverRemoteConnection = gremlin.driver.DriverRemoteConnection;
   const Graph = gremlin.structure.Graph;
   
   const dc = new DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin', {});
   
   const graph = new Graph();
   const g = graph.traversal().withRemote(dc);
   
   g.V().limit(1).count().next().
       then(data => {
           console.log(data);
           dc.close();
       }).catch(error => {
           console.log('ERROR', error);
           dc.close();
       });
   ```

1. Enter the following command to run the sample:

   ```
   node gremlinexample.js
   ```

The preceding example returns the count of a single vertex in the graph by using the `g.V().limit(1).count().next()` traversal. To query for something else, replace it with another Gremlin traversal with one of the appropriate ending methods.

**Note**  
The final part of the Gremlin query, `next()`, is required to submit the traversal to the server for evaluation. If you don't include that method or another equivalent method, the query is not submitted to the Neptune DB instance.

The following methods submit the query to the Neptune DB instance:
+ `toList()`
+ `toSet()`
+ `next()`
+ `nextTraverser()`
+ `iterate()`

Use `next()` if you need the query results to be serialized and returned, or `iterate()` if you don't.

**Important**  
This is a standalone Node.js example. If you are planning to run code like this in an AWS Lambda function, see [Lambda function examples](lambda-functions-examples.md) for details about using JavaScript efficiently in a Neptune Lambda function.

## IAM authentication
<a name="access-graph-gremlin-nodejs-iam"></a>

Neptune supports [IAM authentication](iam-auth-enable.md) to control access to your DB cluster. If you have IAM authentication enabled, you need to use Signature Version 4 signing to authenticate your requests. For detailed instructions and code examples for connecting from a JavaScript client, see [Connecting to Amazon Neptune databases using IAM authentication with Gremlin JavaScript](gremlin-javascript-iam-auth.md).

# Using Go to connect to a Neptune DB instance
<a name="access-graph-gremlin-go"></a>

**Important**  
Choosing the correct Apache TinkerPop Gremlin driver version is critical for compatibility with your Neptune engine version. Using an incompatible version can result in connection failures or unexpected behavior. For detailed version compatibility information, see [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md).

**Note**  
The gremlingo 3.5.x versions are backwards compatible with TinkerPop 3.4.x versions as long as you only use 3.4.x features in the Gremlin queries you write.

The following section walks you through running a Go sample that connects to an Amazon Neptune DB instance and performs a Gremlin traversal.

You must follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

Before you begin, do the following:
+ Download and install Go 1.17 or later from the [go.dev](https://go.dev/dl/) website.

**To connect to Neptune using Go**

1. Starting from an empty directory, initialize a new Go module:

   ```
   go mod init example.com/gremlinExample
   ```

1. Add gremlin-go as a dependency of your new module:

   ```
   go get github.com/apache/tinkerpop/gremlin-go/v3/driver
   ```

1. Create a file named `gremlinExample.go` and then open it in a text editor.

1. Copy the following into the `gremlinExample.go` file, replacing *`(your neptune endpoint)`* with the address of your Neptune DB instance:

   ```
   package main
   
   import (
     "fmt"
     gremlingo "github.com/apache/tinkerpop/gremlin-go/v3/driver"
   )
   
   func main() {
     // Creating the connection to the server.
     driverRemoteConnection, err := gremlingo.NewDriverRemoteConnection("wss://(your neptune endpoint):8182/gremlin",
       func(settings *gremlingo.DriverRemoteConnectionSettings) {
         settings.TraversalSource = "g"
       })
     if err != nil {
       fmt.Println(err)
       return
     }
     // Cleanup
     defer driverRemoteConnection.Close()
   
     // Creating graph traversal
     g := gremlingo.Traversal_().WithRemote(driverRemoteConnection)
   
     // Perform traversal
     results, err := g.V().Limit(2).ToList()
     if err != nil {
       fmt.Println(err)
       return
     }
     // Print results
     for _, r := range results {
       fmt.Println(r.GetString())
     }
   }
   ```
**Note**  
The Neptune TLS certificate format is not currently supported on Go 1.18+ with macOS, and may give an x509 error when trying to initiate a connection. For local testing, this can be skipped by adding `"crypto/tls"` to the imports and modifying the `DriverRemoteConnection` settings as follows:  

   ```
   // Creating the connection to the server.
   driverRemoteConnection, err := gremlingo.NewDriverRemoteConnection("wss://your-neptune-endpoint:8182/gremlin",
     func(settings *gremlingo.DriverRemoteConnectionSettings) {
         settings.TraversalSource = "g"
         settings.TlsConfig = &tls.Config{InsecureSkipVerify: true}
     })
   ```

1. Enter the following command to run the sample:

   ```
   go run gremlinExample.go
   ```

The Gremlin query at the end of this example returns up to two vertices (`g.V().Limit(2)`) in a slice. The slice is then iterated over and each result is printed with the standard `fmt.Println` function.

**Note**  
The final part of the Gremlin query, `ToList()`, is required to submit the traversal to the server for evaluation. If you don't include that method or another equivalent method, the query is not submitted to the Neptune DB instance.

The following methods submit the query to the Neptune DB instance:
+ `ToList()`
+ `ToSet()`
+ `Next()`
+ `GetResultSet()`
+ `Iterate()`

The preceding example returns the first two vertices in the graph by using the `g.V().Limit(2).ToList()` traversal. To query for something else, replace it with another Gremlin traversal with one of the appropriate ending methods.

## IAM authentication
<a name="access-graph-gremlin-go-iam"></a>

Neptune supports [IAM authentication](iam-auth-enable.md) to control access to your DB cluster. If you have IAM authentication enabled, you need to use Signature Version 4 signing to authenticate your requests. For detailed instructions and code examples for connecting from a Go client, see [Connecting to Amazon Neptune databases using IAM authentication with Gremlin Go](gremlin-go-iam-auth.md).

# Using the AWS SDK to run Gremlin queries
<a name="access-graph-gremlin-sdk"></a>

With the AWS SDK, you can run Gremlin queries against your Neptune graph using a programming language of your choice. The Neptune data API SDK (service name `neptunedata`) provides the [ExecuteGremlinQuery](https://docs.aws.amazon.com/neptune/latest/data-api/API_ExecuteGremlinQuery.html) action for submitting Gremlin queries.

You must run these examples from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB cluster, or from a location that has network connectivity to your cluster endpoint.

Direct links to the API reference documentation for the `neptunedata` service in each SDK language can be found below:


| Programming language | neptunedata API reference | 
| --- | --- | 
| C++ | [https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-neptunedata/html/annotated.html](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-neptunedata/html/annotated.html) | 
| Go | [https://docs.aws.amazon.com/sdk-for-go/api/service/neptunedata/](https://docs.aws.amazon.com/sdk-for-go/api/service/neptunedata/) | 
| Java | [https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/neptunedata/package-summary.html](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/neptunedata/package-summary.html) | 
| JavaScript | [https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-aws-sdk-client-neptunedata/](https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-aws-sdk-client-neptunedata/) | 
| Kotlin | [https://sdk.amazonaws.com/kotlin/api/latest/neptunedata/index.html](https://sdk.amazonaws.com/kotlin/api/latest/neptunedata/index.html) | 
| .NET | [https://docs.aws.amazon.com/sdkfornet/v3/apidocs/items/Neptunedata/NNeptunedata.html](https://docs.aws.amazon.com/sdkfornet/v3/apidocs/items/Neptunedata/NNeptunedata.html) | 
| PHP | [https://docs.aws.amazon.com/aws-sdk-php/v3/api/namespace-Aws.Neptunedata.html](https://docs.aws.amazon.com/aws-sdk-php/v3/api/namespace-Aws.Neptunedata.html) | 
| Python | [https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/neptunedata.html](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/neptunedata.html) | 
| Ruby | [https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/Neptunedata.html](https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/Neptunedata.html) | 
| Rust | [https://crates.io/crates/aws-sdk-neptunedata](https://crates.io/crates/aws-sdk-neptunedata) | 
| CLI | [https://docs.aws.amazon.com/cli/latest/reference/neptunedata/](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/) | 

## Gremlin AWS SDK examples
<a name="access-graph-gremlin-sdk-examples"></a>

The following examples show how to set up a `neptunedata` client, run a Gremlin query, and print the results. Replace *YOUR\_NEPTUNE\_HOST* and *YOUR\_NEPTUNE\_PORT* with the endpoint and port of your Neptune DB cluster.

**Client-side timeout and retry configuration**  
The SDK client timeout controls how long the *client* waits for a response. It does not control how long the query runs on the server. If the client times out before the server finishes, the query may continue running on Neptune while the client has no way to retrieve the results.  
We recommend setting the client-side read timeout to `0` (no timeout) or to a value that is at least a few seconds longer than the server-side [neptune\_query\_timeout](parameters.md#parameters-db-cluster-parameters-neptune_query_timeout) setting on your Neptune DB cluster. This lets Neptune control when queries time out.  
We also recommend setting the maximum retry attempts to `1` (no retries). If the SDK retries a query that is still running on the server, it can result in duplicate operations. This is especially important for mutation queries, where a retry could cause unintended duplicate writes.

------
#### [ Python ]

1. Follow the [installation instructions](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html) to install Boto3.

1. Create a file named `gremlinExample.py` and paste the following code:

   ```
   import boto3
   import json
   from botocore.config import Config
   
   # Disable the client-side read timeout and retries so that
   # Neptune's server-side neptune_query_timeout controls query duration.
   client = boto3.client(
       'neptunedata',
       endpoint_url='https://YOUR_NEPTUNE_HOST:YOUR_NEPTUNE_PORT',
       config=Config(read_timeout=None, retries={'total_max_attempts': 1})
   )
   
   # Use the untyped GraphSON v3 serializer for a cleaner JSON response.
   response = client.execute_gremlin_query(
       gremlinQuery='g.V().limit(1)',
       serializer='application/vnd.gremlin-v3.0+json;types=false'
   )
   
   print(json.dumps(response['result'], indent=2))
   ```

1. Run the example: `python gremlinExample.py`

------
#### [ Java ]

1. Follow the [installation instructions](https://docs.aws.amazon.com//sdk-for-java/latest/developer-guide/setup.html) to set up the AWS SDK for Java.

1. Use the following code to set up a `NeptunedataClient`, run a Gremlin query, and print the result:

   ```
   import java.net.URI;
   import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
   import software.amazon.awssdk.core.retry.RetryPolicy;
   import software.amazon.awssdk.services.neptunedata.NeptunedataClient;
   import software.amazon.awssdk.services.neptunedata.model.ExecuteGremlinQueryRequest;
   import software.amazon.awssdk.services.neptunedata.model.ExecuteGremlinQueryResponse;
   
   // Disable client-side retries, and leave the API call timeout unset
   // (the SDK applies none by default) so that Neptune's server-side
   // neptune_query_timeout controls query duration.
   NeptunedataClient client = NeptunedataClient.builder()
       .endpointOverride(URI.create("https://YOUR_NEPTUNE_HOST:YOUR_NEPTUNE_PORT"))
       .overrideConfiguration(ClientOverrideConfiguration.builder()
           .retryPolicy(RetryPolicy.none())
           .build())
       .build();
   
   // Use the untyped GraphSON v3 serializer for a cleaner JSON response.
   ExecuteGremlinQueryRequest request = ExecuteGremlinQueryRequest.builder()
       .gremlinQuery("g.V().limit(1)")
       .serializer("application/vnd.gremlin-v3.0+json;types=false")
       .build();
   
   ExecuteGremlinQueryResponse response = client.executeGremlinQuery(request);
   
   System.out.println(response.result().toString());
   ```

------
#### [ JavaScript ]

1. Follow the [installation instructions](https://docs.aws.amazon.com/sdk-for-javascript/v3/developer-guide/getting-started-nodejs.html) to set up the AWS SDK for JavaScript. Install the neptunedata client and HTTP handler packages: `npm install @aws-sdk/client-neptunedata @smithy/node-http-handler`.

1. Create a file named `gremlinExample.js` and paste the following code. Because the code uses ES module syntax and top-level `await`, make sure your `package.json` sets `"type": "module"` (or name the file `gremlinExample.mjs` and run it by that name).

   ```
   import { NeptunedataClient, ExecuteGremlinQueryCommand } from "@aws-sdk/client-neptunedata";
   import { NodeHttpHandler } from "@smithy/node-http-handler";
   
   const config = {
       endpoint: "https://YOUR_NEPTUNE_HOST:YOUR_NEPTUNE_PORT",
       // Disable the client-side request timeout so that
       // Neptune's server-side neptune_query_timeout controls query duration.
       requestHandler: new NodeHttpHandler({
           requestTimeout: 0
       }),
       maxAttempts: 1
   };
   
   const client = new NeptunedataClient(config);
   
   // Use the untyped GraphSON v3 serializer for a cleaner JSON response.
   const input = {
       gremlinQuery: "g.V().limit(1)",
       serializer: "application/vnd.gremlin-v3.0+json;types=false"
   };
   
   const command = new ExecuteGremlinQueryCommand(input);
   const response = await client.send(command);
   
   console.log(JSON.stringify(response, null, 2));
   ```

1. Run the example: `node gremlinExample.js`

------

# Gremlin query hints
<a name="gremlin-query-hints"></a>

You can use query hints to specify optimization and evaluation strategies for a particular Gremlin query in Amazon Neptune. 

Query hints are specified by adding a `withSideEffect` step to the query with the following syntax.

```
g.withSideEffect(hint, value)
```
+ *hint* – Identifies the type of the hint to apply.
+ *value* – Determines the behavior of the system aspect under consideration.

For example, the following shows how to include a `repeatMode` hint in a Gremlin traversal.

**Note**  
All Gremlin query hints side effects are prefixed with `Neptune#`.

```
g.withSideEffect('Neptune#repeatMode', 'DFS').V("3").repeat(out()).times(10).limit(1).path()
```

The preceding query instructs the Neptune engine to traverse the graph *depth first* (`DFS`) rather than *breadth first* (`BFS`), which is the Neptune default.

The following sections provide more information about the available query hints and their usage.

**Topics**
+ [Gremlin repeatMode query hint](gremlin-query-hints-repeatMode.md)
+ [Gremlin noReordering query hint](gremlin-query-hints-noReordering.md)
+ [Gremlin typePromotion query hint](gremlin-query-hints-typePromotion.md)
+ [Gremlin useDFE query hint](gremlin-query-hints-useDFE.md)
+ [Gremlin query hints for using the results cache](gremlin-query-hints-results-cache.md)

# Gremlin repeatMode query hint
<a name="gremlin-query-hints-repeatMode"></a>

The Neptune `repeatMode` query hint specifies how the Neptune engine evaluates the `repeat()` step in a Gremlin traversal: breadth first, depth first, or chunked depth first.

The evaluation mode of the `repeat()` step is important when it is used to find or follow a path, rather than simply repeating a step a limited number of times.

## Syntax
<a name="gremlin-query-hints-repeatMode-syntax"></a>

The `repeatMode` query hint is specified by adding a `withSideEffect` step to the query.

```
g.withSideEffect('Neptune#repeatMode', 'mode').gremlin-traversal
```

**Note**  
All Gremlin query hints side effects are prefixed with `Neptune#`.

**Available Modes**
+ `BFS`

  Breadth-First Search

  Default execution mode for the `repeat()` step. This gets all sibling nodes before going deeper along the path.

  This version is memory-intensive and frontiers can get very large. There is a higher risk that the query will run out of memory and be cancelled by the Neptune engine. This most closely matches other Gremlin implementations.
+ `DFS`

  Depth-First Search

  Follows each path to the maximum depth before moving on to the next solution.

  This uses less memory. It may provide better performance in situations like finding a single path from a starting point out multiple hops.
+ `CHUNKED_DFS`

  Chunked Depth-First Search

  A hybrid approach that explores the graph depth-first in chunks of 1,000 nodes, rather than 1 node (`DFS`) or all nodes (`BFS`).

  The Neptune engine will get up to 1,000 nodes at each level before following the path deeper.

  This is a balanced approach between speed and memory usage. 

  It is also useful if you want to use `BFS`, but the query is using too much memory.



## Example
<a name="gremlin-query-hints-repeatMode-example"></a>

The following section describes the effect of the repeat mode on a Gremlin traversal.

In Neptune the default mode for the `repeat()` step is to perform a breadth-first (`BFS`) execution strategy for all traversals. 

In most cases, the TinkerGraph implementation uses the same execution strategy, but in some cases it alters the execution of a traversal. 

For example, the TinkerGraph implementation modifies the following query.

```
g.V("3").repeat(out()).times(10).limit(1).path()
```

The `repeat()` step in this traversal is "unrolled" into the following traversal, which results in a depth-first (`DFS`) strategy.

```
g.V(<id>).out().out().out().out().out().out().out().out().out().out().limit(1).path()
```

**Important**  
The Neptune query engine does not do this automatically.

Breadth-first (`BFS`) is the default execution strategy, and is similar to TinkerGraph in most cases. However, there are certain cases where depth-first (`DFS`) strategies are preferable.

 

**BFS (Default)**  
Breadth-first (BFS) is the default execution strategy for the `repeat()` operator.

```
g.V("3").repeat(out()).times(10).limit(1).path()
```

The Neptune engine fully explores the frontiers for the first nine hops before finding a solution ten hops out. This is effective in many cases, such as a shortest-path query.

However, for the preceding example, the traversal would be much faster using the depth-first (`DFS`) mode for the `repeat()` operator.

**DFS**  
The following query uses the depth-first (`DFS`) mode for the `repeat()` operator.

```
g.withSideEffect("Neptune#repeatMode", "DFS").V("3").repeat(out()).times(10).limit(1)
```

This follows each individual solution out to the maximum depth before exploring the next solution. 
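
The memory trade-off between the two modes can be illustrated without a Neptune connection. The following Python sketch uses a synthetic binary-tree graph with illustrative visit counting; it is not a model of Neptune internals. It finds one fixed-depth path with each strategy and counts how many nodes each one touches:

```python
# Illustrative sketch only: a synthetic graph shaped like a binary tree,
# with nodes numbered 0..62 (levels 0 through 5).
def out_neighbors(node, max_node=62):
    return [n for n in (2 * node + 1, 2 * node + 2) if n <= max_node]

def first_path_bfs(start, depth):
    """Expand entire frontiers level by level, as the default BFS mode does."""
    frontier, visited = [[start]], 0
    for _ in range(depth):
        next_frontier = []
        for path in frontier:
            for nbr in out_neighbors(path[-1]):
                visited += 1
                next_frontier.append(path + [nbr])
        frontier = next_frontier          # the whole frontier stays in memory
    return frontier[0], visited

def first_path_dfs(start, depth):
    """Follow one path to full depth before trying siblings, like DFS mode."""
    stack, visited = [[start]], 0
    while stack:
        path = stack.pop()
        if len(path) - 1 == depth:
            return path, visited
        for nbr in reversed(out_neighbors(path[-1])):
            visited += 1
            stack.append(path + [nbr])

bfs_path, bfs_visited = first_path_bfs(0, 5)
dfs_path, dfs_visited = first_path_dfs(0, 5)
print(bfs_visited, dfs_visited)   # BFS touches 62 nodes, DFS only 10
```

For a single-path query like the `limit(1)` example above, DFS reaches a full-depth solution after touching far fewer intermediate solutions, which is why it tends to use less memory.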

# Gremlin noReordering query hint
<a name="gremlin-query-hints-noReordering"></a>

When you submit a Gremlin traversal, the Neptune query engine investigates the structure of the traversal and reorders parts of the query, trying to minimize the amount of work required for evaluation and query response time. For example, a traversal with multiple constraints, such as multiple `has()` steps, is typically not evaluated in the given order. Instead it is reordered after the query is checked with static analysis.

The Neptune query engine tries to identify which constraint is more selective and runs that one first. This often results in better performance, but the order in which Neptune chooses to evaluate the query might not always be optimal.

If you know the exact characteristics of the data and want to manually dictate the order of the query execution, you can use the Neptune `noReordering` query hint to specify that the traversal be evaluated in the order given.

## Syntax
<a name="gremlin-query-hints-noReordering-syntax"></a>

The `noReordering` query hint is specified by adding a `withSideEffect` step to the query.

```
g.withSideEffect('Neptune#noReordering', true or false).gremlin-traversal
```

**Note**  
All Gremlin query hints side effects are prefixed with `Neptune#`.

**Available Values**
+ `true`
+ `false`
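
To see why the engine's selectivity-based reordering usually helps, consider a local sketch in plain Python. The property names, values, and counts below are made up for illustration; the point is only that applying the most selective `has()`-style predicate first leaves later predicates far less work:

```python
# Synthetic data: 'status' matches half the items (not selective),
# while 'sku' matches exactly 10 of 10,000 (highly selective).
vertices = [{"status": "active" if i % 2 == 0 else "closed",
             "sku": "sku-%d" % (i % 1000)} for i in range(10_000)]

def run_filters(items, predicates):
    """Apply has()-style predicates in the given order, counting evaluations."""
    checks = 0
    for predicate in predicates:
        kept = []
        for item in items:
            checks += 1
            if predicate(item):
                kept.append(item)
        items = kept
    return items, checks

by_status = lambda v: v["status"] == "active"
by_sku = lambda v: v["sku"] == "sku-42"

# The order the query was written in: least selective predicate first.
given_order, checks_given = run_filters(vertices, [by_status, by_sku])
# The order the engine would likely pick: most selective predicate first.
reordered, checks_reordered = run_filters(vertices, [by_sku, by_status])

print(checks_given, checks_reordered)   # 15000 vs 10010 predicate evaluations
```

Both orders return the same vertices; only the amount of work differs. The `noReordering` hint is for the cases where you know your data better than the static analysis does and the engine's chosen order turns out to be the slower one.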

# Gremlin typePromotion query hint
<a name="gremlin-query-hints-typePromotion"></a>

When you submit a Gremlin traversal that filters on a numerical value or range, the Neptune query engine must normally use type promotion when it executes the query. This means that it has to examine values of every type that could hold the value you are filtering on.

For example, if you are filtering for values equal to 55, the engine must look for integers equal to 55, long integers equal to 55L, floats equal to 55.0, and so forth. Each type promotion requires an additional lookup on storage, which can cause an apparently simple query to take an unexpectedly long time to complete.

Let's say you are searching for all vertices with a `customerAge` property greater than 5:

```
g.V().has('customerAge', gt(5))
```

To execute that traversal thoroughly, Neptune must expand the query to examine every numeric type that the value you are querying for could be promoted to. In this case, the `gt` filter has to be applied for any integer over 5, any long over 5L, any float over 5.0, and any double over 5.0. Because each of these type promotions requires an additional lookup on storage, you will see multiple filters per numeric filter when you run the [Gremlin `profile` API](gremlin-profile-api.md) for this query, and it will take significantly longer to complete than you might expect.

Often type promotion is unnecessary because you know in advance that you only need to find values of one specific type. When this is the case, you can speed up your queries dramatically by using the `typePromotion` query hint to turn off type promotion.

## Syntax
<a name="gremlin-query-hints-typePromotion-syntax"></a>

The `typePromotion` query hint is specified by adding a `withSideEffect` step to the query.

```
g.withSideEffect('Neptune#typePromotion', true or false).gremlin-traversal
```

**Note**  
All Gremlin query hints side effects are prefixed with `Neptune#`.

**Available Values**
+ `true`
+ `false`

To turn off type promotion for the query above, you would use:

```
g.withSideEffect('Neptune#typePromotion', false).V().has('customerAge', gt(5))
```
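
The cost described above can be modeled locally. The following Python sketch is not Neptune's storage layout; it is a toy index keyed by `(type, value)` that shows why equality across numeric types requires one storage lookup per candidate type:

```python
# Toy model only: an index keyed by (type name, value). Three vertices store
# the same numeric quantity under different types.
index = {
    ("int", 55): ["v1"],
    ("long", 55): ["v2"],
    ("double", 55.0): ["v3"],
}

def lookup_exact(type_name, value):
    """One storage lookup: finds only values stored with exactly this type."""
    return index.get((type_name, value), [])

def lookup_with_promotion(value):
    """Type promotion: one lookup per numeric type the value could have."""
    hits = []
    for type_name in ("int", "long", "float", "double"):
        promoted = float(value) if type_name in ("float", "double") else value
        hits += lookup_exact(type_name, promoted)
    return hits

print(lookup_exact("int", 55))      # single lookup finds int-typed values only
print(lookup_with_promotion(55))    # four lookups find all numerically equal values
```

Turning off type promotion with the hint corresponds to the single `lookup_exact` call: one lookup, but it only finds values stored as the one type you query with, which is why the hint is safe only when you know all stored values share that type.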

# Gremlin useDFE query hint
<a name="gremlin-query-hints-useDFE"></a>

Use this query hint to enable use of the DFE, Neptune's alternative query engine, for executing a query. By default Neptune does not use the DFE unless this query hint is set to `true`, because the [neptune\_dfe\_query\_engine](parameters.md#parameters-instance-parameters-neptune_dfe_query_engine) instance parameter defaults to `viaQueryHint`. If you set that instance parameter to `enabled`, the DFE engine is used for all queries except those that have the `useDFE` query hint set to `false`.

Example of enabling the DFE for a query:

```
g.withSideEffect('Neptune#useDFE', true).V().out()
```

# Gremlin query hints for using the results cache
<a name="gremlin-query-hints-results-cache"></a>

The following query hints can be used when the [query results cache](gremlin-results-cache.md) is enabled.

## Gremlin `enableResultCache` query hint
<a name="gremlin-query-hints-results-cache-enableResultCache"></a>

The `enableResultCache` query hint with a value of `true` causes query results to be returned from the cache if they have already been cached. If not, the query returns new results and caches them until they are cleared from the cache. For example:

```
g.with('Neptune#enableResultCache', true)
 .V().has('genre','drama').in('likes')
```

Later, you can access the cached results by issuing exactly the same query again.

If the value of this query hint is `false`, or if it isn't present, query results are not cached. However, setting it to `false` does not clear existing cached results. To clear cached results, use the `invalidateResultCacheKey` or `invalidateResultCache` hint.

## Gremlin `enableResultCacheWithTTL` query hint
<a name="gremlin-query-hints-results-cache-enableResultCacheWithTTL"></a>

The `enableResultCacheWithTTL` query hint also returns cached results if there are any, without affecting the TTL of results already in the cache. If there are currently no cached results, the query returns new results and caches them for the time to live (TTL) specified by the `enableResultCacheWithTTL` query hint. That time to live is specified in seconds. For example, the following query specifies a time to live of sixty seconds:

```
g.with('Neptune#enableResultCacheWithTTL', 60)
 .V().has('genre','drama').in('likes')
```

Before the 60-second time-to-live is over, you can use the same query (here, `g.V().has('genre','drama').in('likes')`) with either the `enableResultCache` or the `enableResultCacheWithTTL` query hint to access the cached results.

**Note**  
The time to live specified with `enableResultCacheWithTTL` does not affect results that have already been cached.  
If results were previously cached using `enableResultCache`, the cache must first be explicitly cleared before `enableResultCacheWithTTL` generates new results and caches them for the TTL that it specifies.  
If results were previously cached using `enableResultCacheWithTTL`, that previous TTL must first expire before `enableResultCacheWithTTL` generates new results and caches them for the TTL that it specifies.

After the time to live has passed, the cached results for the query are cleared, and a subsequent instance of the same query then returns new results. If `enableResultCacheWithTTL` is attached to that subsequent query, the new results are cached with the TTL that it specifies.
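
The TTL rules above (a cache hit does not refresh an entry's TTL, and an expired entry is replaced by fresh results cached under the new TTL) can be sketched as a small in-process cache. This is an illustration of the described semantics, not Neptune's implementation; the injectable clock exists only to make the behavior easy to demonstrate:

```python
class ResultCacheSketch:
    """Illustrative sketch of the TTL semantics described above."""

    def __init__(self, clock):
        self._clock = clock
        self._entries = {}   # query string -> (results, expiry time)

    def get_or_run(self, query, run, ttl):
        now = self._clock()
        if query in self._entries:
            results, expiry = self._entries[query]
            if now < expiry:
                return results            # hit: existing TTL is not refreshed
            del self._entries[query]      # TTL passed: entry is cleared
        results = run()
        self._entries[query] = (results, now + ttl)
        return results

now = [0.0]                # a fake clock we can advance by hand
runs = [0]                 # counts how often the "query" really executes

def run_query():
    runs[0] += 1
    return ["alice", "bob"]

cache = ResultCacheSketch(clock=lambda: now[0])
cache.get_or_run("g.V().has('genre','drama').in('likes')", run_query, ttl=60)
now[0] = 30.0              # within the TTL: served from the cache
cache.get_or_run("g.V().has('genre','drama').in('likes')", run_query, ttl=60)
now[0] = 61.0              # TTL passed: runs again and re-caches for 60s
cache.get_or_run("g.V().has('genre','drama').in('likes')", run_query, ttl=60)
print(runs[0])             # 2
```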

## Gremlin `invalidateResultCacheKey` query hint
<a name="gremlin-query-hints-results-cache-invalidateResultCacheKey"></a>

The `invalidateResultCacheKey` query hint can take a `true` or `false` value. A `true` value clears the cached results for the query to which `invalidateResultCacheKey` is attached. For example, the following causes results cached for the query key `g.V().has('genre','drama').in('likes')` to be cleared:

```
g.with('Neptune#invalidateResultCacheKey', true)
 .V().has('genre','drama').in('likes')
```

The example query above does not cause its new results to be cached. You can include `enableResultCache` (or `enableResultCacheWithTTL`) in the same query if you want to cache the new results after clearing the existing cached ones:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#invalidateResultCacheKey', true)
 .V().has('genre','drama').in('likes')
```

## Gremlin `invalidateResultCache` query hint
<a name="gremlin-query-hints-results-cache-invalidateResultCache"></a>

The `invalidateResultCache` query hint can take a `true` or `false` value. A `true` value causes all results in the results cache to be cleared. For example:

```
g.with('Neptune#invalidateResultCache', true)
 .V().has('genre','drama').in('likes')
```

The example query above does not cause its results to be cached. You can include `enableResultCache` (or `enableResultCacheWithTTL`) in the same query if you want to cache new results after completely clearing the existing cache:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#invalidateResultCache', true)
 .V().has('genre','drama').in('likes')
```

## Gremlin `numResultsCached` query hint
<a name="gremlin-query-hints-results-cache-numResultsCached"></a>

The `numResultsCached` query hint can only be used with queries that contain `iterate()`, and it specifies the maximum number of results to cache for the query to which it is attached. Note that the results cached when `numResultsCached` is present are not returned, only cached.

For example, the following query specifies that up to 100 of its results should be cached, but none of those cached results returned:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#numResultsCached', 100)
 .V().has('genre','drama').in('likes').iterate()
```

You can then use a query like the following to retrieve a range of the cached results (here, the first ten):

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#numResultsCached', 100)
 .V().has('genre','drama').in('likes').range(0, 10)
```

## Gremlin `noCacheExceptions` query hint
<a name="gremlin-query-hints-results-cache-noCacheExceptions"></a>

The `noCacheExceptions` query hint can take a `true` or `false` value. A `true` value causes any exceptions related to the results cache to be suppressed. For example:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#noCacheExceptions', true)
 .V().has('genre','drama').in('likes')
```

In particular, this suppresses the `QueryLimitExceededException`, which is raised if the results of a query are too large to fit in the results cache.

# Gremlin query status API
<a name="gremlin-api-status"></a>

You can list all active Gremlin queries or get the status of a specific query. The underlying HTTP endpoint for both operations is `https://your-neptune-endpoint:port/gremlin/status`.

## Listing active Gremlin queries
<a name="gremlin-api-status-list"></a>

To list all active Gremlin queries, call the endpoint with no `queryId` parameter.

### Request parameters
<a name="gremlin-api-status-list-request"></a>
+ **includeWaiting** (*optional*)   –   If set to `true`, the response includes waiting queries in addition to running queries.

### Response syntax
<a name="gremlin-api-status-list-response"></a>

```
{
  "acceptedQueryCount": integer,
  "runningQueryCount": integer,
  "queries": [
    {
      "queryId": "guid",
      "queryEvalStats": {
        "waited": integer,
        "elapsed": integer,
        "cancelled": boolean
      },
      "queryString": "string"
    }
  ]
}
```
+ **acceptedQueryCount**   –   The number of queries that have been accepted but not yet completed, including queries in the queue.
+ **runningQueryCount**   –   The number of currently running Gremlin queries.
+ **queries**   –   A list of the current Gremlin queries.
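
Because `acceptedQueryCount` includes queued queries, you can derive the current queue depth from a status response. The following is a minimal Python sketch, assuming (as the field descriptions above imply) that accepted = running + waiting:

```python
def queue_depth(status):
    """Accepted-but-not-running (waiting) queries in a status response."""
    return status["acceptedQueryCount"] - status["runningQueryCount"]

status = {"acceptedQueryCount": 9, "runningQueryCount": 1, "queries": []}
print(queue_depth(status))  # 8
```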

### Example
<a name="gremlin-api-status-list-example"></a>

------
#### [ AWS CLI ]

```
aws neptunedata list-gremlin-queries \
  --endpoint-url https://your-neptune-endpoint:port
```

For more information, see [list-gremlin-queries](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/list-gremlin-queries.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.list_gremlin_queries()

print(response)
```

For AWS SDK examples in other languages, such as Java and .NET, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/status \
  --region us-east-1 \
  --service neptune-db
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/gremlin/status
```

------

The following output shows a single running query.

```
{
  "acceptedQueryCount": 9,
  "runningQueryCount": 1,
  "queries": [
    {
      "queryId": "fb34cd3e-f37c-4d12-9cf2-03bb741bf54f",
      "queryEvalStats": {
        "waited": 0,
        "elapsed": 23,
        "cancelled": false
      },
      "queryString": "g.V().out().count()"
    }
  ]
}
```

## Getting the status of a specific Gremlin query
<a name="gremlin-api-status-get-single"></a>

To get the status of a specific Gremlin query, provide the `queryId` parameter.

### Request parameters
<a name="gremlin-api-status-get-request"></a>
+ **queryId** (*required*)   –   The ID of the Gremlin query. Neptune automatically assigns this ID value to each query, or you can assign your own ID (see [Inject a Custom ID Into a Neptune Gremlin or SPARQL Query](features-query-id.md)).

### Response syntax
<a name="gremlin-api-status-get-response-syntax"></a>

```
{
  "queryId": "guid",
  "queryString": "string",
  "queryEvalStats": {
    "waited": integer,
    "elapsed": integer,
    "cancelled": boolean,
    "subqueries": document
  }
}
```
+ **queryId**   –   The ID of the query.
+ **queryString**   –   The submitted query. This is truncated to 1024 characters if it is longer than that.
+ **queryEvalStats**   –   Statistics for the query, including `waited` (wait time in milliseconds), `elapsed` (run time in milliseconds), `cancelled` (whether the query was cancelled), and `subqueries` (the number of subqueries).

### Example
<a name="gremlin-api-status-get-example"></a>

------
#### [ AWS CLI ]

```
aws neptunedata get-gremlin-query-status \
  --endpoint-url https://your-neptune-endpoint:port \
  --query-id "fb34cd3e-f37c-4d12-9cf2-03bb741bf54f"
```

For more information, see [get-gremlin-query-status](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/get-gremlin-query-status.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.get_gremlin_query_status(
    queryId='fb34cd3e-f37c-4d12-9cf2-03bb741bf54f'
)

print(response)
```

For AWS SDK examples in other languages, such as Java and .NET, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/status/fb34cd3e-f37c-4d12-9cf2-03bb741bf54f \
  --region us-east-1 \
  --service neptune-db
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/gremlin/status/fb34cd3e-f37c-4d12-9cf2-03bb741bf54f
```

------

The following is an example response.

```
{
  "queryId": "fb34cd3e-f37c-4d12-9cf2-03bb741bf54f",
  "queryString": "g.V().out().count()",
  "queryEvalStats": {
    "waited": 0,
    "elapsed": 23,
    "cancelled": false
  }
}
```

# Gremlin query cancellation
<a name="gremlin-api-status-cancel"></a>

To cancel a running Gremlin query, use HTTP `GET` or `POST` to make a request to the `https://your-neptune-endpoint:port/gremlin/status` endpoint with the `cancelQuery` parameter.

## Gremlin query cancellation request parameters
<a name="gremlin-api-status-cancel-request"></a>
+ **cancelQuery**   –   Required for cancellation. This parameter has no corresponding value.
+ **queryId**   –   The ID of the running Gremlin query to cancel.

## Gremlin query cancellation example
<a name="gremlin-api-status-cancel-example"></a>

The following is an example of cancelling a query.

------
#### [ AWS CLI ]

```
aws neptunedata cancel-gremlin-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --query-id "fb34cd3e-f37c-4d12-9cf2-03bb741bf54f"
```

For more information, see [cancel-gremlin-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/cancel-gremlin-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.cancel_gremlin_query(
    queryId='fb34cd3e-f37c-4d12-9cf2-03bb741bf54f'
)

print(response)
```

For AWS SDK examples in other languages, such as Java and .NET, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/status \
  --region us-east-1 \
  --service neptune-db \
  --data-urlencode "cancelQuery" \
  --data-urlencode "queryId=fb34cd3e-f37c-4d12-9cf2-03bb741bf54f"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/gremlin/status \
  --data-urlencode "cancelQuery" \
  --data-urlencode "queryId=fb34cd3e-f37c-4d12-9cf2-03bb741bf54f"
```

------

Successful cancellation returns HTTP `200` OK.

# Support for Gremlin script-based sessions
<a name="access-graph-gremlin-sessions"></a>

You can use Gremlin sessions with implicit transactions in Amazon Neptune. For information about Gremlin sessions, see [Considering Sessions](http://tinkerpop.apache.org/docs/current/reference/#sessions) in the Apache TinkerPop documentation. The sections below describe how to use Gremlin sessions with Java.

**Important**  
Currently, the longest time Neptune can keep a script-based session open is 10 minutes. If you don't close a session before that, the session times out and everything in it is rolled back.

**Topics**
+ [Gremlin sessions on the Gremlin console](#access-graph-gremlin-sessions-console)
+ [Gremlin sessions in the Gremlin Language Variant](#access-graph-gremlin-sessions-glv)

## Gremlin sessions on the Gremlin console
<a name="access-graph-gremlin-sessions-console"></a>

If you create a remote connection on the Gremlin Console without the `session` parameter, the remote connection is created in *sessionless* mode. In this mode, each request that is submitted to the server is treated as a complete transaction in itself, and no state is saved between requests. If a request fails, only that request is rolled back.

If you create a remote connection that *does* use the `session` parameter, you create a script-based session that lasts until you close the remote connection. Every session is identified by a unique UUID that the console generates and returns to you.

The following is an example of one console call that creates a session. After queries are submitted, another call closes the session and commits the queries.

**Note**  
The Gremlin client must always be closed to release server-side resources.

```
gremlin> :remote connect tinkerpop.server conf/neptune-remote.yaml session
  . . .
  . . .
gremlin> :remote close
```

For more information and examples, see [Sessions](http://tinkerpop.apache.org/docs/current/reference/#console-sessions) in the TinkerPop documentation.

All the queries that you run during a session form a single transaction that isn't committed until all the queries succeed and you close the remote connection. If a query fails, or if you don't close the connection within the maximum session lifetime that Neptune supports, the session transaction is not committed, and all the queries in it are rolled back.

## Gremlin sessions in the Gremlin Language Variant
<a name="access-graph-gremlin-sessions-glv"></a>

When using a Gremlin Language Variant (GLV), you need to create a `SessionedClient` object to issue multiple queries in a single transaction, as in the following example.

```
Cluster cluster = Cluster.open();
Client client = cluster.connect("sessionName");   // creates a SessionedClient
try {
   ...
   ...
} finally {
  // Always close. If there are no errors, the transaction is committed;
  // otherwise, it's rolled back.
  client.close();
  cluster.close();
}
```

The call to `cluster.connect("sessionName")` in the preceding example creates the `SessionedClient` object according to the configuration options set for the cluster in question. The *sessionName* string that you pass to the `connect` method becomes the unique name of the session. To avoid collisions, use a UUID for the name.

The client starts a session transaction when it is initialized. All the queries that you run during the session form a single transaction that is committed only when you call `client.close()`. Again, if a single query fails, or if you don't close the connection within the maximum session lifetime that Neptune supports, the session transaction fails, and all the queries in it are rolled back.

**Note**  
The Gremlin client must always be closed to release server-side resources.

Alternatively, you can use Gremlin's `tx()` syntax to control the transaction explicitly, as in the following example:

```
GraphTraversalSource g = traversal().withRemote(conn);

Transaction tx = g.tx();

// Spawn a GraphTraversalSource from the Transaction.
// Traversals spawned from gtx are executed within a single transaction.
GraphTraversalSource gtx = tx.begin();
try {
  gtx.addV("person").iterate();
  gtx.addV("software").iterate();

  tx.commit();
} finally {
    if (tx.isOpen()) {
        tx.rollback();
    }
}
```

# Gremlin transactions in Neptune
<a name="access-graph-gremlin-transactions"></a>

There are several contexts within which Gremlin [transactions](transactions.md) are executed. When working with Gremlin, it is important to understand the context you are working in and its implications:
+ **`Script-based`**   –   Requests are made using text-based Gremlin strings, like this:
  + Using the Java driver and `Client.submit(string)`.
  + Using the Gremlin console and `:remote connect`.
  + Using the HTTP API.
+ **`Bytecode-based`**   –   Requests are made using serialized Gremlin bytecode, which is typical of [Gremlin Language Variants](https://tinkerpop.apache.org/docs/current/reference/#gremlin-drivers-variants) (GLVs).

  For example, using the Java driver, `g = traversal().withRemote(...)`.

In either of these contexts, a request can additionally be sent either sessionless or bound to a session.

**Note**  
 Gremlin transactions must always either be committed or rolled back, so that server-side resources can be released. In the event of an error during the transaction, it is important to retry the entire transaction and not just the particular request that failed. 

## Sessionless requests
<a name="access-graph-gremlin-transactions-sessionless"></a>

When a request is sessionless, it is equivalent to a single transaction.

For scripts, the implication is that one or more Gremlin statements sent in a single request commit or roll back as a single transaction. For example:

```
Cluster cluster = Cluster.open();
Client client = cluster.connect(); // sessionless
// 3 vertex additions in one request/transaction:
client.submit("g.addV();g.addV();g.addV()").all().get();
```

For bytecode, a sessionless request is made for each traversal spawned and executed from `g`:

```
GraphTraversalSource g = traversal().withRemote(...);

// 3 vertex additions in three individual requests/transactions:
g.addV().iterate();
g.addV().iterate();
g.addV().iterate();

// 3 vertex additions in one single request/transaction:
g.addV().addV().addV().iterate();
```

## Requests bound to a session
<a name="access-graph-gremlin-transactions-session-bound"></a>

When bound to a session, multiple requests can be applied within the context of a single transaction.

For scripts, the implication is that there is no need to concatenate all of the graph operations into a single string:

```
Cluster cluster = Cluster.open();
Client client = cluster.connect(sessionName); // session
try {
    // 3 vertex additions in one request/transaction:
    client.submit("g.addV();g.addV();g.addV()").all().get();
} finally {
    client.close();
}

try {
    // 3 vertex additions in three requests, but one transaction:
    client.submit("g.addV()").all().get(); // starts a new transaction with the same sessionName
    client.submit("g.addV()").all().get();
    client.submit("g.addV()").all().get();
} finally {
    client.close();
}
```

For script-based sessions, closing the client with `client.close()` commits the transaction. There is no explicit rollback command available in script-based sessions. To force a rollback, you can cause the transaction to fail by issuing a query such as `g.inject(0).fail('rollback')` before closing the client.

**Note**  
A query like `g.inject(0).fail('rollback')`, used to intentionally throw an error to force a rollback, produces an exception on the client. Catch and discard the resulting exception before closing the client.
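
The catch-and-discard pattern can be sketched in Python. This is an illustration, not official driver code; it assumes a session-bound client object whose blocking `submit(...)` call raises when the server reports an error (as gremlinpython's `Client` does):

```python
def force_rollback(client):
    """Fail the session transaction on purpose, then close the client.

    `client` is any session-bound Gremlin client whose blocking
    submit(...) raises when the server reports an error (for example,
    gremlinpython's Client created with a session).
    """
    try:
        client.submit("g.inject(0).fail('rollback')").all().result()
    except Exception:
        # Expected: the deliberate failure rolls the transaction back.
        pass
    finally:
        client.close()  # release the session's server-side resources
```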

For bytecode, the transaction can be explicitly controlled and the session managed transparently. Gremlin Language Variants (GLV) support Gremlin's `tx()` syntax to `commit()` or `rollback()` a transaction as follows:

```
GraphTraversalSource g = traversal().withRemote(conn);

Transaction tx = g.tx();

// Spawn a GraphTraversalSource from the Transaction.
// Traversals spawned from gtx are executed within a single transaction.
GraphTraversalSource gtx = tx.begin();
try {
    gtx.addV("person").iterate();
    gtx.addV("software").iterate();

    tx.commit();
} finally {
    if (tx.isOpen()) {
        tx.rollback();
    }
}
```

Although the example above is written in Java, you can also use this `tx()` syntax in other languages. For language-specific transaction syntax, see the Transactions section of the Apache TinkerPop documentation for [Java](https://tinkerpop.apache.org/docs/current/reference/#gremlin-java-transactions), [Python](https://tinkerpop.apache.org/docs/current/reference/#gremlin-python-transactions), [JavaScript](https://tinkerpop.apache.org/docs/current/reference/#gremlin-javascript-transactions), [.NET](https://tinkerpop.apache.org/docs/current/reference/#gremlin-dotnet-transactions), and [Go](https://tinkerpop.apache.org/docs/current/reference/#gremlin-go-transactions).

**Warning**  
Sessionless read-only queries are executed under [SNAPSHOT](transactions-isolation-levels.md) isolation, but read-only queries run within an explicit transaction are executed under [SERIALIZABLE](transactions-isolation-levels.md) isolation. The read-only queries executed under `SERIALIZABLE` isolation incur higher overhead and can block or get blocked by concurrent writes, unlike those run under `SNAPSHOT` isolation.

## Timeout behavior for bytecode commit and rollback
<a name="access-graph-gremlin-transactions-commit-rollback-timeout"></a>

When you use bytecode-based transactions with the `tx()` syntax, the `commit()` and `rollback()` operations are not subject to query timeout settings. Neither the global `neptune_query_timeout` parameter nor per-query timeout values set through `evaluationTimeout` apply to these operations. On the server, `commit()` and `rollback()` run without a time limit until they complete or encounter an error.

On the client side, the Gremlin driver's `tx.commit()` and `tx.rollback()` calls will not complete until the server responds. Depending on the language, this might manifest as a blocking call or an unresolved async operation. No driver provides a built-in timeout setting that bounds these calls. Consult the API documentation for your specific Gremlin Language Variant for details on concurrency behavior around these transaction features.

**Important**  
If a `commit()` or `rollback()` call takes longer than expected, it might be blocked by lock contention from a concurrent transaction. For more information about lock conflicts, see [Conflict Resolution Using Lock-Wait Timeouts](transactions-neptune.md#transactions-neptune-conflicts).

If you need to bound the time your application waits for a `commit()` or `rollback()`, you can use your language's concurrency features to apply a client-side timeout. If the client-side timeout fires, the server continues processing the operation. The server-side operation holds a worker thread until it completes. After a client-side timeout, close the connection and create a new one rather than reusing the existing connection, because the transaction state is indeterminate.
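
As a sketch of that approach, the following Python helper bounds a blocking commit using a thread pool; the `commit` callable stands in for your driver's `tx.commit()`. This is an illustrative pattern, not part of any Gremlin driver:

```python
import concurrent.futures

def commit_with_timeout(commit, timeout_seconds):
    """Run a blocking commit callable, giving up after timeout_seconds.

    Returns True if the commit returned in time, False on timeout. On
    timeout the server may still be working, so discard the connection
    instead of reusing it.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(commit)
    try:
        future.result(timeout=timeout_seconds)
        return True
    except concurrent.futures.TimeoutError:
        return False
    finally:
        # Don't wait for the worker; the abandoned commit keeps running
        # in the background until the server responds.
        pool.shutdown(wait=False)
```

On a `False` return, close the underlying connection and open a new one, as described above.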

### Server-side transaction cleanup
<a name="access-graph-gremlin-transactions-server-side-cleanup"></a>

If a client disconnects or abandons a transaction without committing or rolling back, Neptune has server-side mechanisms that eventually clean up the orphaned transaction:
+ **Session timeout**   –   Bytecode-based sessions that remain idle for longer than the maximum session lifetime (10 minutes) are closed, and any open transaction is rolled back.
+ **Connection idle timeout**   –   Neptune closes WebSocket connections that are idle for approximately 20 minutes. When the connection closes, the server rolls back any open transaction associated with that connection.

These cleanup mechanisms are safety nets. We recommend that you always explicitly commit or roll back transactions when you are finished with them.

# Using the Gremlin API with Amazon Neptune
<a name="gremlin-api-reference"></a>

**Note**  
Amazon Neptune does not support the `bindings` property.

Gremlin HTTPS requests all use a single endpoint: `https://your-neptune-endpoint:port/gremlin`. All Neptune connections must use HTTPS.

You can connect the Gremlin Console to a Neptune graph directly through WebSockets.

For more information about connecting to the Gremlin endpoint, see [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md).

The Amazon Neptune implementation of Gremlin has specific details and differences that you need to consider. For more information, see [Gremlin standards compliance in Amazon Neptune](access-graph-gremlin-differences.md).

For information about the Gremlin language and traversals, see [The Traversal](https://tinkerpop.apache.org/docs/current/reference/#traversal) in the Apache TinkerPop documentation.

# Caching query results in Amazon Neptune Gremlin
<a name="gremlin-results-cache"></a>

Amazon Neptune supports a results cache for Gremlin queries.

You can enable the query results cache and then use a query hint to cache the results of a Gremlin read-only query.

Any re-run of the query then retrieves the cached results with low latency and no I/O costs, as long as they are still in the cache. This works for queries submitted both to the HTTP endpoint and over WebSockets, either as byte-code or in string form.

**Note**  
Queries sent to the profile endpoint are not cached even when the query cache is enabled.

You can control how the Neptune query results cache behaves in several ways. For example:
+ You can get cached results paginated, in blocks.
+ You can specify the time-to-live (TTL) for specified queries.
+ You can clear the cache for specified queries.
+ You can clear the entire cache.
+ You can be notified when results are too large to fit in the cache.

The cache is maintained using a least-recently-used (LRU) policy, meaning that once the space allotted to the cache is full, the least-recently-used results are removed to make room when new results are being cached.
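
As an illustration of LRU semantics only (not Neptune's internal implementation), the policy behaves like this minimal Python sketch:

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU cache: reads refresh recency, inserts evict the oldest entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)     # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop least recently used
```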

**Important**  
The query-results cache is not available on `t3.medium` or `t4g.medium` instance types.

## Enabling the query results cache in Neptune
<a name="gremlin-results-cache-enabling"></a>

The query results cache can be enabled on all instances in a cluster or per instance. To enable the results cache on every instance in a cluster, set the `neptune_result_cache` parameter to `1` in the cluster's DB cluster parameter group. To enable it on a specific instance, set the `neptune_result_cache` parameter to `1` in that instance's DB parameter group. An instance-level parameter setting overrides the cluster-level value.

A restart of any affected instance is required for the results cache parameter setting to take effect. Although you can enable the results cache on all instances in a cluster through the DB cluster parameter group, each instance maintains its own cache; the query results cache is not a cluster-wide cache.

Once the results cache is enabled, Neptune sets aside a portion of instance memory for caching query results. The larger the instance type and the more memory available, the more memory Neptune sets aside for the cache.

If the results cache memory fills up, Neptune automatically drops least-recently-used (LRU) cached results to make way for new ones.

You can check the current status of the results cache using the [Instance Status](access-graph-status.md) command.

## Using hints to cache query results
<a name="gremlin-results-cache-using"></a>

Once the query results cache is enabled, you use query hints to control query caching. All the examples below apply to the same query traversal, namely:

```
g.V().has('genre','drama').in('likes')
```

### Using `enableResultCache`
<a name="using-enableResultCache"></a>

With the query results cache enabled, you can cache the results of a Gremlin query using the `enableResultCache` query hint, as follows:

```
g.with('Neptune#enableResultCache', true)
 .V().has('genre','drama').in('likes')
```

Neptune then returns the query results to you, and also caches them. Later, you can access the cached results by issuing exactly the same query again:

```
g.with('Neptune#enableResultCache', true)
 .V().has('genre','drama').in('likes')
```

The cache key that identifies the cached results is the query string itself, namely:

```
g.V().has('genre','drama').in('likes')
```

### Using `enableResultCacheWithTTL`
<a name="using-enableResultCacheWithTTL"></a>

You can specify how long query results are cached by using the `enableResultCacheWithTTL` query hint. For example, the following query specifies that the cached results should expire after 120 seconds:

```
g.with('Neptune#enableResultCacheWithTTL', 120)
 .V().has('genre','drama').in('likes')
```

Again, the cache key that identifies the cached results is the base query string:

```
g.V().has('genre','drama').in('likes')
```

And again, you can access the cached results using that query string with the `enableResultCache` query hint:

```
g.with('Neptune#enableResultCache', true)
 .V().has('genre','drama').in('likes')
```

If 120 or more seconds have passed since the results were cached, that query will return new results, and cache them, without any time-to-live.

You can also access the cached results by issuing the same query again with the `enableResultCacheWithTTL` query hint. For example:

```
g.with('Neptune#enableResultCacheWithTTL', 140)
 .V().has('genre','drama').in('likes')
```

Until 120 seconds have passed (that is, the TTL currently in effect), this new query using the `enableResultCacheWithTTL` query hint returns the cached results. After 120 seconds, it would return new results and cache them with a time-to-live of 140 seconds.

**Note**  
If results for a query key are already cached, then the same query key with `enableResultCacheWithTTL` does not generate new results and has no effect on the time-to-live of the currently cached results.  
If results were previously cached using `enableResultCache`, the cache must first be cleared before `enableResultCacheWithTTL` generates new results and caches them for the TTL that it specifies.  
If results were previously cached using `enableResultCacheWithTTL`, that previous TTL must first expire before `enableResultCacheWithTTL` generates new results and caches them for the TTL that it specifies.
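
The rules in the note above can be summarized as a small Python sketch (an illustration of the semantics, not Neptune's implementation; the cache is a plain dict and `run` stands in for query execution):

```python
import time

def lookup_or_run(cache, key, run, ttl=None, now=None):
    """Sketch of the TTL rules above (illustrative, not Neptune's code).

    - An unexpired hit is returned as-is; a TTL on the new request does
      not refresh the existing entry's expiry.
    - An entry cached without a TTL never expires on its own.
    """
    now = time.time() if now is None else now
    entry = cache.get(key)
    if entry is not None and (entry["expires"] is None or entry["expires"] > now):
        return entry["results"]
    results = run()
    cache[key] = {
        "results": results,
        "expires": now + ttl if ttl is not None else None,
    }
    return results
```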

### Using `invalidateResultCacheKey`
<a name="using-invalidateResultCacheKey"></a>

You can use the `invalidateResultCacheKey` query hint to clear cached results for one particular query. For example:

```
g.with('Neptune#invalidateResultCacheKey', true)
 .V().has('genre','drama').in('likes')
```

That query clears the cache for the query key, `g.V().has('genre','drama').in('likes')`, and returns new results for that query.

You can also combine `invalidateResultCacheKey` with `enableResultCache` or `enableResultCacheWithTTL`. For example, the following query clears the current cached results, caches new results, and returns them:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#invalidateResultCacheKey', true)
 .V().has('genre','drama').in('likes')
```

### Using `invalidateResultCache`
<a name="using-invalidateResultCache"></a>

You can use the `invalidateResultCache` query hint to clear all cached results in the query result cache. For example:

```
g.with('Neptune#invalidateResultCache', true)
 .V().has('genre','drama').in('likes')
```

That query clears the entire result cache and returns new results for the query.

You can also combine `invalidateResultCache` with `enableResultCache` or `enableResultCacheWithTTL`. For example, the following query clears the entire results cache, caches new results for this query, and returns them:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#invalidateResultCache', true)
 .V().has('genre','drama').in('likes')
```

## Paginating cached query results
<a name="gremlin-results-cache-paginating"></a>

Suppose you have already cached a large number of results like this:

```
g.with('Neptune#enableResultCache', true)
 .V().has('genre','drama').in('likes')
```

Now suppose you issue the following range query:

```
g.with('Neptune#enableResultCache', true)
 .V().has('genre','drama').in('likes').range(0,10)
```

Neptune first looks for the full cache key, namely `g.V().has('genre','drama').in('likes').range(0,10)`. If that key doesn't exist, Neptune next looks to see if there is a key for that query string without the range (namely `g.V().has('genre','drama').in('likes')`). When it finds that key, Neptune then fetches the first ten results from its cache, as the range specifies.
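
That fallback lookup can be sketched as string handling on the query key. This is only an illustration of the matching rules just described, not how Neptune implements them:

```python
import re

def fetch_range(cache, query):
    """Exact key first; otherwise strip a trailing .range(lo, hi) and
    slice the base query's cached results."""
    if query in cache:
        return cache[query]
    m = re.match(r"(?P<base>.+)\.range\((?P<lo>\d+),\s*(?P<hi>\d+)\)$", query)
    if m and m.group("base") in cache:
        lo, hi = int(m.group("lo")), int(m.group("hi"))
        return cache[m.group("base")][lo:hi]
    return None
```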

**Note**  
If you use the `invalidateResultCacheKey` query hint with a query that has a range at the end, Neptune clears the cache for a query without the range if it doesn't find an exact match for the query with the range.

### Using `numResultsCached` with `.iterate()`
<a name="gremlin-results-cache-paginating-numResultsCached"></a>

Using the `numResultsCached` query hint, you can populate the results cache without returning all the results being cached, which can be useful when you prefer to paginate a large number of results.

The `numResultsCached` query hint only works with queries that end with `iterate()`.

For example, if you want to cache the first 50 results of the sample query:

```
g.with("Neptune#enableResultCache", true)
 .with("Neptune#numResultsCached", 50)
 .V().has('genre','drama').in('likes').iterate()
```

In this case the query key in the cache is: `g.with("Neptune#numResultsCached", 50).V().has('genre','drama').in('likes')`. You can now retrieve the first ten of the cached results with this query:

```
g.with("Neptune#enableResultCache", true)
 .with("Neptune#numResultsCached", 50)
 .V().has('genre','drama').in('likes').range(0, 10)
```

And, you can retrieve the next ten results from the query as follows:

```
g.with("Neptune#enableResultCache", true)
 .with("Neptune#numResultsCached", 50)
 .V().has('genre','drama').in('likes').range(10, 20)
```

Don't forget to include the `numResultsCached` hint! It is an essential part of the query key, so it must be present in order to access the cached results.

**Some things to keep in mind when using `numResultsCached`**
+ **The number you supply with `numResultsCached` is applied at the end of the query.**   This means, for example, that the following query actually caches results in the range `(1000, 1500)`:

  ```
  g.with("Neptune#enableResultCache", true)
   .with("Neptune#numResultsCached", 500)
   .V().range(1000, 2000).iterate()
  ```
+ **The number you supply with `numResultsCached` specifies the maximum number of results to cache.**   This means, for example, that the following query actually caches results in the range `(1000, 2000)`:

  ```
  g.with("Neptune#enableResultCache", true)
   .with("Neptune#numResultsCached", 100000)
   .V().range(1000, 2000).iterate()
  ```
+ **Results cached by queries that end with `.range().iterate()` have their own range.**   For example, suppose you cache results using a query like this:

  ```
  g.with("Neptune#enableResultCache", true)
   .with("Neptune#numResultsCached", 500)
   .V().range(1000, 2000).iterate()
  ```

  To retrieve the first 100 results from the cache, you would write a query like this:

  ```
  g.with("Neptune#enableResultCache", true)
   .with("Neptune#numResultsCached", 500)
   .V().range(1000, 2000).range(0, 100)
  ```

  Those hundred results would be equivalent to results from the base query in the range `(1000, 1100)`.
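
Taken together, these rules mean the cached window for a query ending in `.range(lo, hi).iterate()` can be computed as follows (a sketch of the arithmetic, with positions expressed relative to the base query):

```python
def cached_window(lo, hi, num_results_cached):
    """Positions (relative to the base query) of the slice that ends up
    in the cache: the numResultsCached hint caps how many results are
    kept, counting from the start of the range."""
    return lo, min(hi, lo + num_results_cached)

print(cached_window(1000, 2000, 500))     # (1000, 1500)
print(cached_window(1000, 2000, 100000))  # (1000, 2000)
```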

## The query cache keys used to locate cached results
<a name="gremlin-results-cache-query-keys"></a>

After the results of a query have been cached, subsequent queries with the same *query cache key* retrieve results from the cache rather than generating new ones. The query cache key of a query is evaluated as follows:

1. All the cache-related query hints are ignored, except for `numResultsCached`.

1. A final `iterate()` step is ignored.

1. The rest of the query is ordered according to its byte-code representation.

The resulting string is matched against an index of the query results already in the cache to determine whether there is a cache hit for the query.

For example, take this query:

```
g.withSideEffect('Neptune#typePromotion', false).with("Neptune#enableResultCache", true)
 .with("Neptune#numResultsCached", 50)
 .V().has('genre','drama').in('likes').iterate()
```

It will be stored as the byte-code version of this:

```
g.withSideEffect('Neptune#typePromotion', false)
 .with("Neptune#numResultsCached", 50)
 .V().has('genre','drama').in('likes')
```
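
A string-level Python sketch of this normalization follows. Neptune operates on the byte-code representation rather than on strings, so this is only an approximation that reproduces the effect on the example above:

```python
import re

# Cache hints that are stripped from the key; numResultsCached is kept.
STRIPPED_HINTS = (
    "enableResultCacheWithTTL",
    "enableResultCache",
    "invalidateResultCacheKey",
    "invalidateResultCache",
    "noCacheExceptions",
)

def cache_key(query):
    """Drop cache-related hints (except numResultsCached) and a final
    .iterate() step from a query string."""
    for hint in STRIPPED_HINTS:
        query = re.sub(
            r'\.?with\([\'"]Neptune#%s[\'"],\s*[^)]*\)' % hint, "", query)
    return re.sub(r"\.iterate\(\)\s*$", "", query)
```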

## Exceptions related to the results cache
<a name="gremlin-results-cache-exceptions"></a>

If the results of a query that you are trying to cache are too large to fit in the cache memory even after removing everything previously cached, Neptune raises a `QueryLimitExceededException` fault. No results are returned, and the exception generates the following error message:

```
The result size is larger than the allocated cache,
      please refer to results cache best practices for options to rerun the query.
```

You can suppress this message using the `noCacheExceptions` query hint, as follows:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#noCacheExceptions', true)
 .V().has('genre','drama').in('likes')
```

# Making efficient upserts with Gremlin `mergeV()` and `mergeE()` steps
<a name="gremlin-efficient-upserts"></a>

An upsert (or conditional insert) reuses a vertex or edge if it already exists, or creates it if it doesn't. Efficient upserts can make a significant difference in the performance of Gremlin queries.

Upserts allow you to write idempotent insert operations: no matter how many times you run such an operation, the overall outcome is the same. This is useful in highly concurrent write scenarios where concurrent modifications to the same part of the graph can force one or more transactions to roll back with a `ConcurrentModificationException`, thereby necessitating retries.

For example, the following query upserts a vertex by using the supplied `Map` to first try to find a vertex with a `T.id` of `"v-1"`. If that vertex is found, it is returned. If it is not found, a vertex with that ID, along with the label and property supplied in the `onCreate` clause, is created.

```
g.mergeV([(id):'v-1']).
  option(onCreate, [(label): 'PERSON', 'email': 'person-1@example.org'])
```

## Batching upserts to improve throughput
<a name="gremlin-upserts-batching"></a>

For high throughput write scenarios, you can chain `mergeV()` and `mergeE()` steps together to upsert vertices and edges in batches. Batching reduces the transactional overhead of upserting large numbers of vertices and edges. You can then further improve throughput by upserting batch requests in parallel using multiple clients.

As a rule of thumb we recommend upserting approximately 200 records per batch request. A record is a single vertex or edge label or property. A vertex with a single label and 4 properties, for example, creates 5 records. An edge with a label and a single property creates 2 records. If you wanted to upsert batches of vertices, each with a single label and 4 properties, you should start with a batch size of 40, because `200 / (1 + 4) = 40`.
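As a sketch, the record arithmetic above can be captured in a small helper (the function names are illustrative, not part of any Neptune or TinkerPop API):

```python
def records_per_vertex(num_labels: int, num_properties: int) -> int:
    """A record is a single vertex or edge label or property."""
    return num_labels + num_properties

def suggested_batch_size(records_per_element: int,
                         target_records: int = 200) -> int:
    """Start with roughly 200 records per batch request."""
    return target_records // records_per_element

# A vertex with one label and 4 properties creates 5 records,
# so a starting batch size is 200 / 5 = 40 vertices per request.
print(suggested_batch_size(records_per_vertex(1, 4)))  # 40
```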

You can experiment with the batch size. 200 records per batch is a good starting point, but the ideal batch size may be higher or lower depending on your workload. Note, however, that Neptune may limit the overall number of Gremlin steps per request. This limit is not documented, but to be on the safe side, try to ensure that your requests contain no more than 1,500 Gremlin steps. Neptune may reject large batch requests with more than 1,500 steps.

To increase throughput, you can upsert batches in parallel using multiple clients (see [Creating Efficient Multithreaded Gremlin Writes](best-practices-gremlin-multithreaded-writes.md)). The number of clients should be the same as the number of worker threads on your Neptune writer instance, which is typically 2 x the number of vCPUs on the server. For instance, an `r5.8xlarge` instance has 32 vCPUs and 64 worker threads. For high-throughput write scenarios using an `r5.8xlarge`, you would use 64 clients writing batch upserts to Neptune in parallel.

Each client should submit a batch request and wait for the request to complete before submitting another request. Although the multiple clients run in parallel, each individual client submits requests in a serial fashion. This ensures that the server is supplied with a steady stream of requests that occupy all the worker threads without flooding the server-side request queue (see [Sizing DB instances in a Neptune DB cluster](feature-overview-db-clusters.md#feature-overview-sizing-instances)).
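The pattern described above (parallel clients, each submitting serially) can be sketched as follows; `submit_batch` is a placeholder for your Gremlin client's blocking submit call, and the helper itself is illustrative rather than a Neptune API:

```python
import queue
import threading

def run_batches(batches, submit_batch, num_clients):
    """Each client submits one batch at a time and waits for it to
    complete before taking the next; clients run in parallel."""
    work = queue.Queue()
    for batch in batches:
        work.put(batch)

    def client_loop():
        while True:
            try:
                batch = work.get_nowait()
            except queue.Empty:
                return
            submit_batch(batch)  # blocks until the request completes
            work.task_done()

    threads = [threading.Thread(target=client_loop) for _ in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```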

## Try to avoid steps that generate multiple traversers
<a name="gremlin-upserts-single-traverser"></a>

When a Gremlin step executes, it takes an incoming traverser, and emits one or more output traversers. The number of traversers emitted by a step determines the number of times the next step is executed.

Typically, when performing batch operations you want each operation, such as upsert vertex A, to execute once, so that the sequence of operations looks like this: upsert vertex A, then upsert vertex B, then upsert vertex C, and so on. As long as a step creates or modifies only one element, it emits only one traverser, and the steps that represent the next operation are executed only once. If, on the other hand, an operation creates or modifies more than one element, it emits multiple traversers, which in turn cause the subsequent steps to be executed multiple times, once per emitted traverser. This can result in the database performing unnecessary additional work, and in some cases can result in the creation of unwanted additional vertices, edges or property values.

An example of how things can go wrong is with a query like `g.V().addV()`. This simple query adds a vertex for every vertex found in the graph, because `V()` emits a traverser for each vertex in the graph and each of those traversers triggers a call to `addV()`.
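As a rough way to reason about this, the number of times each step runs equals the number of traversers emitted by the previous step. The following toy model only does the counting; it is not a Gremlin implementation:

```python
def executions(traversers_emitted_per_step):
    """Given how many traversers each step emits per incoming traverser,
    return how many times each step executes, starting from one traverser."""
    counts = []
    incoming = 1
    for emitted in traversers_emitted_per_step:
        counts.append(incoming)
        incoming *= emitted
    return counts

# g.V().addV() on a graph with 100 vertices:
# V() runs once and emits 100 traversers, so addV() runs 100 times.
print(executions([100, 1]))  # [1, 100]
```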

See [Mixing upserts and inserts](#gremlin-upserts-and-inserts) for ways to deal with operations that can emit multiple traversers.

## Upserting vertices
<a name="gremlin-upserts-vertices"></a>

The `mergeV()` step is specifically designed for upserting vertices. It takes as an argument a `Map` that represents elements to match for existing vertices in the graph, and if an element is not found, uses that `Map` to create a new vertex. The step also allows you to alter the behavior in the event of a creation or a match, where the `option()` modulator can be applied with `Merge.onCreate` and `Merge.onMatch` tokens to control those respective behaviors. See the TinkerPop [Reference Documentation](https://tinkerpop.apache.org/docs/current/reference/#mergevertex-step) for further information about how to use this step.

You can use a vertex ID to determine whether a specific vertex exists. This is the preferred approach, because Neptune optimizes upserts for highly concurrent use cases around IDs. As an example, the following query creates a vertex with a given vertex ID if it doesn't already exist, or reuses it if it does:

```
g.mergeV([(T.id): 'v-1']).
    option(onCreate, [(T.label): 'PERSON', email: 'person-1@example.org', age: 21]).
    option(onMatch, [age: 22]).
  id()
```

Note that this query ends with an `id()` step. While not strictly necessary for the purpose of upserting the vertex, adding an `id()` step to the end of an upsert query ensures that the server doesn't serialize all the vertex properties back to the client, which helps reduce the locking cost of the query.

Alternatively, you can use a vertex property to identify a vertex:

```
g.mergeV([email: 'person-1@example.org']).
    option(onCreate, [(T.label): 'PERSON', age: 21]).
    option(onMatch, [age: 22]).
  id()
```

If possible, use your own user-supplied IDs to create vertices, and use these IDs to determine whether a vertex exists during an upsert operation. This lets Neptune optimize the upserts. An ID-based upsert can be significantly more efficient than a property-based upsert when concurrent modifications are common.

### Chaining vertex upserts
<a name="gremlin-upserts-vertices-chaining"></a>

You can chain vertex upserts together to insert them in a batch:

```
g.V('v-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-1')
                         .property('email', 'person-1@example.org'))
 .V('v-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-2')
                         .property('email', 'person-2@example.org'))
 .V('v-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-3')
                         .property('email', 'person-3@example.org'))
 .id()
```

Alternatively, you can use this `mergeV()` syntax:

```
g.mergeV([(T.id): 'v-1', (T.label): 'PERSON', email: 'person-1@example.org']).
  mergeV([(T.id): 'v-2', (T.label): 'PERSON', email: 'person-2@example.org']).
  mergeV([(T.id): 'v-3', (T.label): 'PERSON', email: 'person-3@example.org'])
```

However, because this form of the query includes elements in the search criteria that are superfluous to the basic lookup by `id`, it isn't as efficient as the previous query.
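If you generate batch requests from application data, you might build the chained form programmatically. The sketch below is purely illustrative (in practice, prefer parameterized requests through your Gremlin client over string concatenation):

```python
def build_batch_upsert(people):
    """Build a chained mergeV() query from (id, email) records,
    matching on T.id only and supplying the rest through onCreate."""
    steps = []
    for pid, email in people:
        steps.append(
            f"mergeV([(T.id): '{pid}'])."
            f"option(onCreate, [(T.label): 'PERSON', email: '{email}'])"
        )
    return "g." + ".".join(steps) + ".id()"

print(build_batch_upsert([("v-1", "person-1@example.org"),
                          ("v-2", "person-2@example.org")]))
```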

## Upserting edges
<a name="gremlin-upserts-edges"></a>

The `mergeE()` step is specifically designed for upserting edges. It takes a `Map` as an argument that represents elements to match for existing edges in the graph and if an element is not found, uses that `Map` to create a new edge. The step also allows you to alter the behavior in the event of a creation or a match, where the `option()` modulator can be applied with `Merge.onCreate` and `Merge.onMatch` tokens to control those respective behaviors. See the TinkerPop [Reference Documentation](https://tinkerpop.apache.org/docs/current/reference/#mergeedge-step) for further information about how to use this step.

You can use edge IDs to upsert edges in the same way you upsert vertices using custom vertex IDs. Again, this is the preferred approach because it allows Neptune to optimize the query. For example, the following query creates an edge based on its edge ID if it doesn't already exist, or reuses it if it does. The query also uses the IDs of the `Direction.from` and `Direction.to` vertices if it needs to create a new edge:

```
g.mergeE([(T.id): 'e-1']).
    option(onCreate, [(from): 'v-1', (to): 'v-2', weight: 1.0]).
    option(onMatch, [weight: 0.5]).
  id()
```

Note that this query ends with an `id()` step. While not strictly necessary for the purpose of upserting the edge, adding an `id()` step to the end of an upsert query ensures that the server doesn't serialize all the edge properties back to the client, which helps reduce the locking cost of the query.

Many applications use custom vertex IDs, but leave Neptune to generate edge IDs. If you don't know the ID of an edge, but you do know the `from` and `to` vertex IDs, you can use this kind of query to upsert an edge:

```
g.mergeE([(from): 'v-1', (to): 'v-2', (T.label): 'KNOWS']).
  id()
```

All vertices referenced by `mergeE()` must exist for the step to create the edge.

### Chaining edge upserts
<a name="gremlin-upserts-edges-chaining"></a>

As with vertex upserts, it's straightforward to chain `mergeE()` steps together for batch requests:

```
g.mergeE([(from): 'v-1', (to): 'v-2', (T.label): 'KNOWS']).
  mergeE([(from): 'v-2', (to): 'v-3', (T.label): 'KNOWS']).
  mergeE([(from): 'v-3', (to): 'v-4', (T.label): 'KNOWS']).
  id()
```

## Combining vertex and edge upserts
<a name="gremlin-upserts-vertexes-and-edges"></a>

Sometimes you may want to upsert both vertices and the edges that connect them. You can mix the batch examples presented here. The following example upserts 3 vertices and 2 edges:

```
g.mergeV([(id):'v-1']).
    option(onCreate, [(label): 'PERSON', 'email': 'person-1@example.org']).
  mergeV([(id):'v-2']).
    option(onCreate, [(label): 'PERSON', 'email': 'person-2@example.org']).
  mergeV([(id):'v-3']).
    option(onCreate, [(label): 'PERSON', 'email': 'person-3@example.org']).
  mergeE([(from): 'v-1', (to): 'v-2', (T.label): 'KNOWS']).
  mergeE([(from): 'v-2', (to): 'v-3', (T.label): 'KNOWS']).
  id()
```

## Mixing upserts and inserts
<a name="gremlin-upserts-and-inserts"></a>

Upserts typically proceed one element at a time. If you stick to the upsert patterns presented here, each upsert operation emits a single traverser, which causes the subsequent operation to be executed just once.

However, sometimes you may want to mix upserts with inserts. This can be the case, for example, if you use edges to represent instances of actions or events. A request might use upserts to ensure that all necessary vertices exist, and then use inserts to add edges. With requests of this kind, pay attention to the potential number of traversers being emitted from each operation.

Consider the following example, which mixes upserts and inserts to add edges that represent events into the graph:

```
// Fully optimized, but inserts too many edges
g.mergeV([(id):'p-1']).
    option(onCreate, [(label): 'PERSON', 'email': 'person-1@example.org']).
  mergeV([(id):'p-2']).
    option(onCreate, [(label): 'PERSON', 'email': 'person-2@example.org']).
  mergeV([(id):'p-3']).
    option(onCreate, [(label): 'PERSON', 'email': 'person-3@example.org']).
  mergeV([(T.id): 'c-1', (T.label): 'CITY', name: 'city-1']).
  V('p-1', 'p-2').
  addE('FOLLOWED').to(V('p-1')).
  V('p-1', 'p-2', 'p-3').
  addE('VISITED').to(V('c-1')).
  id()
```

The query should insert 5 edges: 2 FOLLOWED edges and 3 VISITED edges. However, the query as written inserts 8 edges: 2 FOLLOWED and 6 VISITED. The reason for this is that the operation that inserts the 2 FOLLOWED edges emits 2 traversers, causing the subsequent insert operation, which inserts 3 edges, to be executed twice.

The fix is to add a `fold()` step after each operation that can potentially emit more than one traverser:

```
g.mergeV([(T.id): 'p-1', (T.label): 'PERSON', email: 'person-1@example.org']).
  mergeV([(T.id): 'p-2', (T.label): 'PERSON', email: 'person-2@example.org']).
  mergeV([(T.id): 'p-3', (T.label): 'PERSON', email: 'person-3@example.org']).
  mergeV([(T.id): 'c-1', (T.label): 'CITY', name: 'city-1']).
  V('p-1', 'p-2').
  addE('FOLLOWED').
    to(V('p-1')).
  fold().
  V('p-1', 'p-2', 'p-3').
  addE('VISITED').
    to(V('c-1')).
  id()
```

Here we’ve inserted a `fold()` step after the operation that inserts FOLLOWED edges. This results in a single traverser, which then causes the subsequent operation to be executed only once.
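A quick check of the traverser arithmetic for both versions of the query:

```python
# Without fold(): the FOLLOWED insert emits 2 traversers, so the
# VISITED insert (3 edges per execution) runs twice.
followed = 2
visited_per_execution = 3
without_fold = followed + followed * visited_per_execution  # 2 + 6 = 8

# With a fold() after the FOLLOWED insert, a single traverser reaches
# the VISITED insert, so it runs once.
with_fold = followed + 1 * visited_per_execution            # 2 + 3 = 5

print(without_fold, with_fold)  # 8 5
```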

The downside of this approach is that the query is now not fully optimized, because `fold()` is not optimized. The insert operation that follows `fold()` will now also not be optimized.

If you need to use `fold()` to reduce the number of traversers on behalf of subsequent steps, try to order your operations so that the least expensive ones occupy the non-optimized part of the query.

## Setting cardinality
<a name="gremlin-upserts-setting-cardinality"></a>

The default cardinality for vertex properties in Neptune is `set`, which means that when you use `mergeV()`, the values supplied in the map are all given that cardinality. To use `single` cardinality, you must specify it explicitly. Starting in TinkerPop 3.7.0, there is new syntax that allows the cardinality to be supplied as part of the map, as shown in the following example:

```
g.mergeV([(T.id): '1234']).
  option(onMatch, ['age': single(20), 'name': single('alice'), 'city': set('miami')])
```

Alternatively, you may set the cardinality as a default for that `option` as follows:

```
// age and name are set to single cardinality by default
g.mergeV([(T.id): '1234']).
  option(onMatch, ['age': 22, 'name': 'alice', 'city': set('boston')], single)
```

There are fewer options for setting cardinality in `mergeV()` prior to version 3.7.0. The general approach is to fall back to the `property()` step, as follows:

```
g.mergeV([(T.id): '1234']). 
  option(onMatch, sideEffect(property(single,'age', 20).
  property(set,'city','miami')).constant([:]))
```

**Note**  
This approach only works when `mergeV()` is used as a start step. You therefore can't chain `mergeV()` steps that use this syntax within a single traversal, because a `mergeV()` that follows the start step produces an error if the incoming traverser is a graph element. In that case, break your `mergeV()` calls into multiple requests, each of which can be a start step.

# Making efficient Gremlin upserts with `fold()/coalesce()/unfold()`
<a name="gremlin-efficient-upserts-pre-3.6"></a>

An upsert (or conditional insert) reuses a vertex or edge if it already exists, or creates it if it doesn't. Efficient upserts can make a significant difference in the performance of Gremlin queries.

This page shows how to use the `fold()/coalesce()/unfold()` Gremlin pattern to make efficient upserts. However, with the release of TinkerPop version 3.6.x, introduced in Neptune engine version [1.2.1.0](engine-releases-1.2.1.0.md), the new `mergeV()` and `mergeE()` steps are preferable in most cases. The `fold()/coalesce()/unfold()` pattern described here may still be useful in some complex situations, but in general, use `mergeV()` and `mergeE()` if you can, as described in [Making efficient upserts with Gremlin `mergeV()` and `mergeE()` steps](gremlin-efficient-upserts.md).

Upserts allow you to write idempotent insert operations: no matter how many times you run such an operation, the overall outcome is the same. This is useful in highly concurrent write scenarios where concurrent modifications to the same part of the graph can force one or more transactions to roll back with a `ConcurrentModificationException`, thereby necessitating a retry.

For example, the following query upserts a vertex by first looking for the specified vertex in the dataset, and then folding the results into a list. In the first traversal supplied to the `coalesce()` step, the query then unfolds this list. If the unfolded list is not empty, the results are emitted from the `coalesce()`. If, however, the `unfold()` returns an empty collection because the vertex does not currently exist, `coalesce()` moves on to evaluate the second traversal with which it has been supplied, and in this second traversal the query creates the missing vertex.

```
g.V('v-1').fold()
          .coalesce(
             unfold(),
             addV('Person').property(id, 'v-1')
                           .property('email', 'person-1@example.org')
           )
```

## Use an optimized form of `coalesce()` for upserts
<a name="gremlin-upserts-pre-3.6-coalesce"></a>

Neptune can optimize the `fold().coalesce(unfold(), ...)` idiom to make high-throughput updates, but this optimization only works if both parts of the `coalesce()` return either a vertex or an edge but nothing else. If you try to return something different, such as a property, from any part of the `coalesce()`, the Neptune optimization does not occur. The query may succeed, but it will not perform as well as an optimized version, particularly against large datasets.

Because unoptimized upsert queries increase execution times and reduce throughput, it's worth using the Gremlin `explain` endpoint to determine whether an upsert query is fully optimized. When reviewing `explain` plans, look for lines that begin with `+ not converted into Neptune steps` and `WARNING: >>`. For example:

```
+ not converted into Neptune steps: [FoldStep, CoalesceStep([[UnfoldStep], [AddEdgeSte...
WARNING: >> FoldStep << is not supported natively yet
```

These warnings can help you identify the parts of a query that are preventing it from being fully optimized.

Sometimes it isn't possible to optimize a query fully. In these situations you should try to put the steps that cannot be optimized at the end of the query, thereby allowing the engine to optimize as many steps as possible. This technique is used in some of the batch upsert examples, where all optimized upserts for a set of vertices or edges are performed before any additional, potentially unoptimized modifications are applied to the same vertices or edges.

## Batching upserts to improve throughput
<a name="gremlin-upserts-pre-3.6-batching"></a>

For high throughput write scenarios, you can chain upsert steps together to upsert vertices and edges in batches. Batching reduces the transactional overhead of upserting large numbers of vertices and edges. You can then further improve throughput by upserting batch requests in parallel using multiple clients.

As a rule of thumb we recommend upserting approximately 200 records per batch request. A record is a single vertex or edge label or property. A vertex with a single label and 4 properties, for example, creates 5 records. An edge with a label and a single property creates 2 records. If you wanted to upsert batches of vertices, each with a single label and 4 properties, you should start with a batch size of 40, because `200 / (1 + 4) = 40`.

You can experiment with the batch size. 200 records per batch is a good starting point, but the ideal batch size may be higher or lower depending on your workload. Note, however, that Neptune may limit the overall number of Gremlin steps per request. This limit is not documented, but to be on the safe side, try to ensure that your requests contain no more than 1,500 Gremlin steps. Neptune may reject large batch requests with more than 1,500 steps.

To increase throughput, you can upsert batches in parallel using multiple clients (see [Creating Efficient Multithreaded Gremlin Writes](best-practices-gremlin-multithreaded-writes.md)). The number of clients should be the same as the number of worker threads on your Neptune writer instance, which is typically 2 x the number of vCPUs on the server. For instance, an `r5.8xlarge` instance has 32 vCPUs and 64 worker threads. For high-throughput write scenarios using an `r5.8xlarge`, you would use 64 clients writing batch upserts to Neptune in parallel.

Each client should submit a batch request and wait for the request to complete before submitting another request. Although the multiple clients run in parallel, each individual client submits requests in a serial fashion. This ensures that the server is supplied with a steady stream of requests that occupy all the worker threads without flooding the server-side request queue (see [Sizing DB instances in a Neptune DB cluster](feature-overview-db-clusters.md#feature-overview-sizing-instances)).

## Try to avoid steps that generate multiple traversers
<a name="gremlin-upserts-pre-3.6-single-traverser"></a>

When a Gremlin step executes, it takes an incoming traverser, and emits one or more output traversers. The number of traversers emitted by a step determines the number of times the next step is executed.

Typically, when performing batch operations you want each operation, such as upsert vertex A, to execute once, so that the sequence of operations looks like this: upsert vertex A, then upsert vertex B, then upsert vertex C, and so on. As long as a step creates or modifies only one element, it emits only one traverser, and the steps that represent the next operation are executed only once. If, on the other hand, an operation creates or modifies more than one element, it emits multiple traversers, which in turn cause the subsequent steps to be executed multiple times, once per emitted traverser. This can result in the database performing unnecessary additional work, and in some cases can result in the creation of unwanted additional vertices, edges or property values.

An example of how things can go wrong is with a query like `g.V().addV()`. This simple query adds a vertex for every vertex found in the graph, because `V()` emits a traverser for each vertex in the graph and each of those traversers triggers a call to `addV()`.

See [Mixing upserts and inserts](#gremlin-upserts-pre-3.6-and-inserts) for ways to deal with operations that can emit multiple traversers.

## Upserting vertices
<a name="gremlin-upserts-pre-3.6-vertices"></a>

You can use a vertex ID to determine whether a corresponding vertex exists. This is the preferred approach, because Neptune optimizes upserts for highly concurrent use cases around IDs. As an example, the following query creates a vertex with a given vertex ID if it doesn't already exist, or reuses it if it does:

```
g.V('v-1')
 .fold()
  .coalesce(unfold(),
            addV('Person').property(id, 'v-1')
                          .property('email', 'person-1@example.org'))
  .id()
```

Note that this query ends with an `id()` step. While not strictly necessary for the purpose of upserting the vertex, adding an `id()` step to the end of an upsert query ensures that the server doesn't serialize all the vertex properties back to the client, which helps reduce the locking cost of the query.

Alternatively, you can use a vertex property to determine whether the vertex exists:

```
g.V()
 .hasLabel('Person')
 .has('email', 'person-1@example.org')
 .fold()
 .coalesce(unfold(),
           addV('Person').property('email', 'person-1@example.org'))
 .id()
```

If possible, use your own user-supplied IDs to create vertices, and use these IDs to determine whether a vertex exists during an upsert operation. This lets Neptune optimize upserts around the IDs. An ID-based upsert can be significantly more efficient than a property-based upsert in highly concurrent modification scenarios.

### Chaining vertex upserts
<a name="gremlin-upserts-pre-3.6-vertices-chaining"></a>

You can chain vertex upserts together to insert them in a batch:

```
g.V('v-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-1')
                         .property('email', 'person-1@example.org'))
 .V('v-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-2')
                         .property('email', 'person-2@example.org'))
 .V('v-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-3')
                         .property('email', 'person-3@example.org'))
 .id()
```

## Upserting edges
<a name="gremlin-upserts-pre-3.6-edges"></a>

You can use edge IDs to upsert edges in the same way you upsert vertices using custom vertex IDs. Again, this is the preferred approach because it allows Neptune to optimize the query. For example, the following query creates an edge based on its edge ID if it doesn't already exist, or reuses it if it does. The query also uses the IDs of the `from` and `to` vertices if it needs to create a new edge.

```
g.E('e-1')
 .fold()
 .coalesce(unfold(),
           addE('KNOWS').from(V('v-1'))
                        .to(V('v-2'))
                        .property(id, 'e-1'))
 .id()
```

Many applications use custom vertex IDs, but leave Neptune to generate edge IDs. If you don't know the ID of an edge, but you do know the `from` and `to` vertex IDs, you can use this formulation to upsert an edge:

```
g.V('v-1')
 .outE('KNOWS')
 .where(inV().hasId('v-2'))
 .fold()
 .coalesce(unfold(),
           addE('KNOWS').from(V('v-1'))
                        .to(V('v-2')))
 .id()
```

Note that the vertex step in the `where()` clause should be `inV()` (or `outV()` if you've used `inE()` to find the edge), not `otherV()`. Do not use `otherV()` here, or the query will not be optimized and performance will suffer. For example, Neptune would not optimize the following query:

```
// Unoptimized upsert, because of otherV()
g.V('v-1')
 .outE('KNOWS')
 .where(otherV().hasId('v-2'))
 .fold()
 .coalesce(unfold(),
           addE('KNOWS').from(V('v-1'))
                        .to(V('v-2')))
 .id()
```

If you don't know the edge or vertex IDs up front, you can upsert using vertex properties:

```
g.V()
 .hasLabel('Person')
 .has('name', 'person-1')
 .outE('LIVES_IN')
 .where(inV().hasLabel('City').has('name', 'city-1'))
 .fold()
 .coalesce(unfold(),
           addE('LIVES_IN').from(V().hasLabel('Person')
                                    .has('name', 'person-1'))
                           .to(V().hasLabel('City')
                                  .has('name', 'city-1')))
 .id()
```

As with vertex upserts, it's preferable to use ID-based edge upserts using either an edge ID or `from` and `to` vertex IDs, rather than property-based upserts, so that Neptune can fully optimize the upsert.

### Checking for `from` and `to` vertex existence
<a name="gremlin-upserts-pre-3.6-edges-checking"></a>

Note the construction of the steps that create a new edge: `addE().from().to()`. This construction ensures that the query checks the existence of both the `from` and the `to` vertex. If either of these does not exist, the query returns an error as follows:

```
{
  "detailedMessage": "Encountered a traverser that does not map to a value for child...
  "code": "IllegalArgumentException",
  "requestId": "..."
}
```

If it's possible that either the `from` or the `to` vertex doesn't exist, you should attempt to upsert them before upserting the edge between them. See [Combining vertex and edge upserts](#gremlin-upserts-pre-3.6-vertexes-and-edges).

There's an alternative construction for creating an edge that you shouldn't use: `V().addE().to()`. It only adds an edge if the `from` vertex exists. If the `to` vertex doesn't exist, the query generates an error, as described previously, but if the `from` vertex doesn't exist, the query silently fails to insert an edge, without generating any error. For example, the following upsert completes without upserting an edge if the `from` vertex doesn't exist:

```
// Will not insert edge if from vertex does not exist
g.V('v-1')
 .outE('KNOWS')
 .where(inV().hasId('v-2'))
 .fold()
 .coalesce(unfold(),
           V('v-1').addE('KNOWS')
                   .to(V('v-2')))
 .id()
```

### Chaining edge upserts
<a name="gremlin-upserts-pre-3.6-edges-chaining"></a>

If you want to chain edge upserts together to create a batch request, you must begin each upsert with a vertex lookup, even if you already know the edge IDs.

If you do already know the IDs of the edges you want to upsert, and the IDs of the `from` and `to` vertices, you can use this formulation:

```
g.V('v-1')
 .outE('KNOWS')
 .hasId('e-1')
 .fold()
 .coalesce(unfold(),
           V('v-1').addE('KNOWS')
                   .to(V('v-2'))
                   .property(id, 'e-1'))
 .V('v-3')
 .outE('KNOWS')
 .hasId('e-2').fold()
 .coalesce(unfold(),
           V('v-3').addE('KNOWS')
                   .to(V('v-4'))
                   .property(id, 'e-2'))
 .V('v-5')
 .outE('KNOWS')
 .hasId('e-3')
 .fold()
 .coalesce(unfold(),
           V('v-5').addE('KNOWS')
                   .to(V('v-6'))
                   .property(id, 'e-3'))
 .id()
```

Perhaps the most common batch edge upsert scenario is that you know the `from` and `to` vertex IDs, but don't know the IDs of the edges you want to upsert. In that case, use the following formulation:

```
g.V('v-1')
 .outE('KNOWS')
 .where(inV().hasId('v-2'))
 .fold()
 .coalesce(unfold(),
           V('v-1').addE('KNOWS')
                   .to(V('v-2')))

 .V('v-3')
 .outE('KNOWS')
 .where(inV().hasId('v-4'))
 .fold()
 .coalesce(unfold(),
           V('v-3').addE('KNOWS')
                   .to(V('v-4')))
 .V('v-5')
 .outE('KNOWS')
 .where(inV().hasId('v-6'))
 .fold()
 .coalesce(unfold(),
           V('v-5').addE('KNOWS').to(V('v-6')))
 .id()
```

If you know the IDs of the edges you want to upsert, but don't know the IDs of the `from` and `to` vertices (this is unusual), you can use this formulation:

```
g.V()
 .hasLabel('Person')
 .has('email', 'person-1@example.org')
 .outE('KNOWS')
 .hasId('e-1')
 .fold()
 .coalesce(unfold(),
           V().hasLabel('Person')
              .has('email', 'person-1@example.org')
              .addE('KNOWS')
              .to(V().hasLabel('Person')
                     .has('email', 'person-2@example.org'))
               .property(id, 'e-1'))
 .V()
 .hasLabel('Person')
 .has('email', 'person-3@example.org')
 .outE('KNOWS')
 .hasId('e-2')
 .fold()
 .coalesce(unfold(),
           V().hasLabel('Person')
              .has('email', 'person-3@example.org')
              .addE('KNOWS')
              .to(V().hasLabel('Person')
                     .has('email', 'person-4@example.org'))
              .property(id, 'e-2'))
 .V()
 .hasLabel('Person')
 .has('email', 'person-5@example.org')
 .outE('KNOWS')
 .hasId('e-3')
 .fold()
 .coalesce(unfold(),
           V().hasLabel('Person')
              .has('email', 'person-5@example.org')
              .addE('KNOWS')
              .to(V().hasLabel('Person')
                     .has('email', 'person-6@example.org'))
               .property(id, 'e-3'))
 .id()
```

## Combining vertex and edge upserts
<a name="gremlin-upserts-pre-3.6-vertexes-and-edges"></a>

Sometimes you may want to upsert both vertices and the edges that connect them. You can mix the batch examples presented here. The following example upserts 3 vertices and 2 edges:

```
g.V('p-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-1')
                         .property('email', 'person-1@example.org'))
 .V('p-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-2')
                          .property('email', 'person-2@example.org'))
 .V('c-1')
 .fold()
 .coalesce(unfold(),
           addV('City').property(id, 'c-1')
                       .property('name', 'city-1'))
 .V('p-1')
 .outE('LIVES_IN')
 .where(inV().hasId('c-1'))
 .fold()
 .coalesce(unfold(),
           V('p-1').addE('LIVES_IN')
                   .to(V('c-1')))
 .V('p-2')
 .outE('LIVES_IN')
 .where(inV().hasId('c-1'))
 .fold()
 .coalesce(unfold(),
           V('p-2').addE('LIVES_IN')
                   .to(V('c-1')))
 .id()
```

## Mixing upserts and inserts
<a name="gremlin-upserts-pre-3.6-and-inserts"></a>

Upserts typically proceed one element at a time. If you stick to the upsert patterns presented here, each upsert operation emits a single traverser, which causes the subsequent operation to be executed just once.

However, sometimes you may want to mix upserts with inserts. This can be the case, for example, if you use edges to represent instances of actions or events. A request might use upserts to ensure that all necessary vertices exist, and then use inserts to add edges. With requests of this kind, pay attention to the potential number of traversers being emitted from each operation.

Consider the following example, which mixes upserts and inserts to add edges that represent events into the graph:

```
// Fully optimized, but inserts too many edges
g.V('p-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-1')
                         .property('email', 'person-1@example.org'))
 .V('p-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-2')
                          .property('email', 'person-2@example.org'))
 .V('p-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-3')
                          .property('email', 'person-3@example.org'))
 .V('c-1')
 .fold()
 .coalesce(unfold(),
           addV('City').property(id, 'c-1')
                       .property('name', 'city-1'))
 .V('p-1', 'p-2')
 .addE('FOLLOWED')
 .to(V('p-3'))
 .V('p-1', 'p-2', 'p-3')
 .addE('VISITED')
 .to(V('c-1'))
 .id()
```

The query should insert 5 edges: 2 FOLLOWED edges and 3 VISITED edges. However, the query as written inserts 8 edges: 2 FOLLOWED and 6 VISITED. The reason for this is that the operation that inserts the 2 FOLLOWED edges emits 2 traversers, causing the subsequent insert operation, which inserts 3 edges, to be executed twice.
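The traverser arithmetic can be sketched in plain Python (a hypothetical simulation for illustration only, not Neptune code):

```python
# Hypothetical simulation of Gremlin traverser multiplication.
# A step runs once per incoming traverser; V('p-1', 'p-2') emits
# two traversers, so the subsequent addE step executes twice.

def v_step(ids):
    # g.V(id1, id2, ...) emits one traverser per matched vertex
    return list(ids)

edges = []

# .V('p-1', 'p-2').addE('FOLLOWED') -- executed once, inserts 2 edges
followed_traversers = v_step(['p-1', 'p-2'])
for t in followed_traversers:
    edges.append((t, 'FOLLOWED'))

# .V('p-1', 'p-2', 'p-3').addE('VISITED') -- executed once PER incoming
# traverser (2 of them); each execution inserts 3 edges
for _ in followed_traversers:              # 2 incoming traversers
    for t in v_step(['p-1', 'p-2', 'p-3']):
        edges.append((t, 'VISITED'))

print(len([e for e in edges if e[1] == 'FOLLOWED']))  # 2
print(len([e for e in edges if e[1] == 'VISITED']))   # 6
print(len(edges))                                     # 8
```

This reproduces the 2 + 6 = 8 edge count described above.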

The fix is to add a `fold()` step after each operation that can potentially emit more than one traverser:

```
g.V('p-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-1')
                         .property('email', 'person-1@example.org'))
 .V('p-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-2')
                         .property('email', 'person-2@example.org'))
 .V('p-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-3')
                         .property('email', 'person-3@example.org'))
 .V('c-1')
 .fold()
 .coalesce(unfold(),
           addV('City').property(id, 'c-1')
                       .property('name', 'city-1'))
 .V('p-1', 'p-2')
 .addE('FOLLOWED')
 .to(V('p-3'))
 .fold()
 .V('p-1', 'p-2', 'p-3')
 .addE('VISITED')
 .to(V('c-1'))
 .id()
```

Here we’ve inserted a `fold()` step after the operation that inserts FOLLOWED edges. This results in a single traverser, which then causes the subsequent operation to be executed only once.

The downside of this approach is that the query is no longer fully optimized: `fold()` is not an optimized step, so the insert operation that follows it is not optimized either.

If you need to use `fold()` to reduce the number of traversers on behalf of subsequent steps, try to order your operations so that the least expensive ones occupy the non-optimized part of the query.

## Upserts that modify existing vertices and edges
<a name="gremlin-upserts-pre-3.6-that-modify"></a>

Sometimes you want to create a vertex or edge if it doesn’t exist, and then add or update a property to it, regardless of whether it is a new or existing vertex or edge.

To add or modify a property, use the `property()` step. Use this step outside the `coalesce()` step. If you try to modify the property of an existing vertex or edge inside the `coalesce()` step, the query may not be optimized by the Neptune query engine.

The following query adds or updates a counter property on each upserted vertex. Each `property()` step has single cardinality to ensure that the new values replace any existing values, rather than being added to a set of existing values.

```
g.V('v-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-1')
                         .property('email', 'person-1@example.org'))
 .property(single, 'counter', 1)
 .V('v-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-2')
                         .property('email', 'person-2@example.org'))
 .property(single, 'counter', 2)
 .V('v-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-3')
                         .property('email', 'person-3@example.org'))
 .property(single, 'counter', 3)
 .id()
```

If you have a property value, such as a `lastUpdated` timestamp value, that applies to all upserted elements, you can add or update it at the end of the query:

```
g.V('v-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-1')
                         .property('email', 'person-1@example.org'))
 .V('v-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-2')
                         .property('email', 'person-2@example.org'))
 .V('v-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-3')
                         .property('email', 'person-3@example.org'))
 .V('v-1', 'v-2', 'v-3')
 .property(single, 'lastUpdated', datetime('2020-02-08'))
 .id()
```

If there are additional conditions that determine whether or not a vertex or edge should be further modified, you can use a `has()` step to filter the elements to which a modification will be applied. The following example uses a `has()` step to filter upserted vertices based on the value of their `version` property. The query then sets the `version` of any vertex whose `version` is less than 3 to 3:

```
g.V('v-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-1')
                         .property('email', 'person-1@example.org')
                         .property('version', 3))
 .V('v-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-2')
                         .property('email', 'person-2@example.org')
                         .property('version', 3))
 .V('v-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-3')
                         .property('email', 'person-3@example.org')
                         .property('version', 3))
 .V('v-1', 'v-2', 'v-3')
 .has('version', lt(3))
 .property(single, 'version', 3)
 .id()
```

# Analyzing Neptune query execution using Gremlin `explain`
<a name="gremlin-explain"></a>

Amazon Neptune has added a Gremlin feature named *explain*. This feature is a self-service tool for understanding the execution approach taken by the Neptune engine. You invoke it by adding an `explain` parameter to an HTTP call that submits a Gremlin query.

The `explain` feature provides information about the logical structure of query execution plans. You can use this information to identify potential evaluation and execution bottlenecks and tune your query, as explained in [Tuning Gremlin queries](gremlin-traversal-tuning.md). You can also use [query hints](gremlin-query-hints.md) to improve query execution plans.

**Topics**
+ [Understanding how Gremlin queries work in Neptune](gremlin-explain-background.md)
+ [Using the Gremlin `explain` API in Neptune](gremlin-explain-api.md)
+ [Gremlin `profile` API in Neptune](gremlin-profile-api.md)
+ [Tuning Gremlin queries using `explain` and `profile`](gremlin-traversal-tuning.md)
+ [Native Gremlin step support in Amazon Neptune](gremlin-step-support.md)

# Understanding how Gremlin queries work in Neptune
<a name="gremlin-explain-background"></a>

To take full advantage of the Gremlin `explain` and `profile` reports in Amazon Neptune, it is helpful to understand some background information about Gremlin queries.

**Topics**
+ [Gremlin statements in Neptune](gremlin-explain-background-statements.md)
+ [How Neptune processes Gremlin queries using statement indexes](gremlin-explain-background-indexing-examples.md)
+ [How Gremlin queries are processed in Neptune](gremlin-explain-background-querying.md)

# Gremlin statements in Neptune
<a name="gremlin-explain-background-statements"></a>

Property graph data in Amazon Neptune is composed of four-position (quad) statements. Each of these statements represents an individual atomic unit of property graph data. For more information, see [Neptune Graph Data Model](feature-overview-data-model.md). Similar to the Resource Description Framework (RDF) data model, these four positions are as follows:
+ `subject (S)`
+ `predicate (P)`
+ `object (O)`
+ `graph (G)`

Each statement is an assertion about one or more resources. For example, a statement can assert the existence of a relationship between two resources, or it can attach a property (key-value pair) to some resource.

You can think of the predicate as the verb of the statement, describing the type of relationship or property. The object is the target of the relationship, or the value of the property. The graph position is optional and can be used in many different ways. For the Neptune property graph (PG) data, it is either unused (null graph) or it is used to represent the identifier for an edge. A set of statements with shared resource identifiers creates a graph.
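As an illustration only (this is not Neptune's storage format), the quad model can be sketched as plain Python 4-tuples, with `'~'` standing in for the null graph and `'~label'` for the reserved label predicate:

```python
# Illustrative sketch: modeling Neptune property-graph statements as
# (S, P, O, G) tuples. '~' represents the null graph and '~label' the
# reserved label predicate.

def vertex_label(vertex_id, label):
    # asserts that a vertex exists and has the given label
    return (vertex_id, '~label', label, '~')

def edge(from_id, edge_label, to_id, edge_id):
    # asserts a relationship; the edge ID occupies the graph position
    return (from_id, edge_label, to_id, edge_id)

def prop(element_id, key, value):
    # attaches a key-value property to a vertex or edge
    return (element_id, key, value, '~')

statements = [
    vertex_label('v1', 'Person'),     # g.addV("Person").property(id, "v1")
    edge('v1', 'knows', 'v2', 'e1'),  # g.addE("knows").from(V("v1"))...
    prop('v1', 'name', 'John'),       # g.V("v1").property("name", "John")
]
for s in statements:
    print(s)
```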

There are three classes of statements in the Neptune property graph data model:

**Topics**
+ [Vertex Label Statements](#gremlin-explain-background-vertex-labels)
+ [Edge Statements](#gremlin-explain-background-edge-statements)
+ [Property Statements](#gremlin-explain-background-property-statements)

## Gremlin Vertex Label Statements
<a name="gremlin-explain-background-vertex-labels"></a>

Vertex label statements in Neptune serve two purposes:
+ They track the labels for a vertex.
+ The presence of at least one of these statements is what implies the existence of a particular vertex in the graph.

The subject of these statements is a vertex identifier, and the object is a label, both of which are specified by the user. You use a special fixed predicate for these statements, displayed as `<~label>`, and a default graph identifier (the null graph), displayed as `<~>`.

For example, consider the following `addV` traversal.

```
g.addV("Person").property(id, "v1")
```

This traversal results in the following statement being added to the graph.

```
StatementEvent[Added(<v1> <~label> <Person> <~>) .]
```

## Gremlin Edge Statements
<a name="gremlin-explain-background-edge-statements"></a>

A Gremlin edge statement is what implies the existence of an edge between two vertices in a graph in Neptune. The subject (S) of an edge statement is the source `from` vertex. The predicate (P) is a user-supplied edge label. The object (O) is the target `to` vertex. The graph (G) is a user-supplied edge identifier.

For example, consider the following `addE` traversal.

```
g.addE("knows").from(V("v1")).to(V("v2")).property(id, "e1")
```

The traversal results in the following statement being added to the graph.

```
StatementEvent[Added(<v1> <knows> <v2> <e1>) .]
```

## Gremlin Property Statements
<a name="gremlin-explain-background-property-statements"></a>

A Gremlin property statement in Neptune asserts an individual property value for a vertex or edge. The subject is a user-supplied vertex or edge identifier. The predicate is the property name (key), and the object is the individual property value. The graph (G) is again the default graph identifier, the null graph, displayed as `<~>`.

Consider the following vertex property example.

```
g.V("v1").property("name", "John")
```

This statement results in the following.

```
StatementEvent[Added(<v1> <name> "John" <~>) .]
```

Property statements differ from others in that their object is a primitive value (a `string`, `date`, `byte`, `short`, `int`, `long`, `float`, or `double`). Their object is not a resource identifier that could be used as the subject of another assertion.

For multi-properties, each individual property value in the set receives its own statement.

```
g.V("v1").property(set, "phone", "956-424-2563").property(set, "phone", "956-354-3692")
```

This results in the following.

```
StatementEvent[Added(<v1> <phone> "956-424-2563" <~>) .]
StatementEvent[Added(<v1> <phone> "956-354-3692" <~>) .]
```

Edge properties are handled similarly to vertex properties, but use the edge identifier in the (S) position. For example, adding a property to an edge:

```
g.E("e1").property("weight", 0.8)
```

This results in the following statement being added to the graph.

```
StatementEvent[Added(<e1> <weight> 0.8 <~>) .]
```

# How Neptune processes Gremlin queries using statement indexes
<a name="gremlin-explain-background-indexing-examples"></a>

Statements are accessed in Amazon Neptune by way of three statement indexes, as detailed in [How Statements Are Indexed in Neptune](feature-overview-storage-indexing.md). Neptune extracts a statement *pattern* from a Gremlin query in which some positions are known, and the rest are left for discovery by index search.

Neptune assumes that the size of the property graph schema is not large. This means that the number of distinct edge labels and property names is fairly low, resulting in a low total number of distinct predicates. Neptune tracks distinct predicates in a separate index. It uses this cache of predicates to do a union scan of `{ all P x POGS }` rather than use an OSGP index. Avoiding the need for a reverse traversal OSGP index saves both storage space and load throughput.
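The union-scan idea can be sketched in toy Python (an illustrative model of the approach, not Neptune internals; the index contents are made up):

```python
# Toy sketch: answering a reverse traversal (all in-edges of a vertex)
# without an OSGP index, by scanning a POGS index once per known predicate.

pogs_index = {
    # predicate -> (O, G, S) entries, standing in for POGS key ranges
    'knows': [('v1', 'e1', 'v2'), ('v1', 'e2', 'v5')],
    'likes': [('v1', 'e3', 'v7'), ('v9', 'e4', 'v8')],
}
predicates = list(pogs_index)   # Neptune caches the distinct predicates

def in_edges(vertex_id):
    # union scan over { all P x POGS }: one range lookup per predicate
    results = []
    for p in predicates:
        for (o, g, s) in pogs_index[p]:
            if o == vertex_id:
                results.append((s, p, o, g))
    return results

print(in_edges('v1'))
```

The cost grows with the number of distinct predicates, which is why a graph with many edge labels and property names can make reverse traversals expensive.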

The Neptune Gremlin `explain`/`profile` API lets you obtain the predicate count in your graph, so you can determine whether your application violates Neptune's assumption that the property-graph schema is small.

The following examples help illustrate how Neptune uses indexes to process Gremlin queries.

**Question: What are the labels of vertex `v1`?**

```
  Gremlin code:      g.V('v1').label()
  Pattern:           (<v1>, <~label>, ?, ?)
  Known positions:   SP
  Lookup positions:  OG
  Index:             SPOG
  Key range:         <v1>:<~label>:*
```

**Question: What are the 'knows' out-edges of vertex `v1`?**

```
  Gremlin code:      g.V('v1').out('knows')
  Pattern:           (<v1>, <knows>, ?, ?)
  Known positions:   SP
  Lookup positions:  OG
  Index:             SPOG
  Key range:         <v1>:<knows>:*
```

**Question: Which vertices have a `Person` vertex label?**

```
  Gremlin code:      g.V().hasLabel('Person')
  Pattern:           (?, <~label>, <Person>, <~>)
  Known positions:   POG
  Lookup positions:  S
  Index:             POGS
  Key range:         <~label>:<Person>:<~>:*
```

**Question: What are the from/to vertices of a given edge `e1`?**

```
  Gremlin code:      g.E('e1').bothV()
  Pattern:           (?, ?, ?, <e1>)
  Known positions:   G
  Lookup positions:  SPO
  Index:             GPSO
  Key range:         <e1>:*
```

One statement index that Neptune does **not** have is a reverse traversal OSGP index. This index could be used to gather all incoming edges across all edge labels, as in the following example.

**Question: What are the incoming adjacent vertices of vertex `v1`?**

```
  Gremlin code:      g.V('v1').in()
  Pattern:           (?, ?, <v1>, ?)
  Known positions:   O
  Lookup positions:  SPG
  Index:             OSGP  // <-- Index does not exist
```
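To make the key-range idea concrete, here is a toy sketch (plain Python, not Neptune internals) of prefix lookups against sorted SPOG and POGS indexes, using the patterns from two of the examples above:

```python
import bisect

# Toy model: each statement index is a sorted list of reordered quads.
quads = [
    ('v1', '~label', 'Person', '~'),
    ('v1', 'knows', 'v2', 'e1'),
    ('v2', '~label', 'Person', '~'),
]

spog = sorted(quads)                                   # (S, P, O, G)
pogs = sorted((p, o, g, s) for s, p, o, g in quads)    # (P, O, G, S)

def prefix_scan(index, prefix):
    # Range scan: return every entry whose leading positions match prefix.
    lo = bisect.bisect_left(index, prefix)
    return [row for row in index[lo:] if row[:len(prefix)] == prefix]

# g.V('v1').label(): known SP, key range <v1>:<~label>:* on SPOG
print(prefix_scan(spog, ('v1', '~label')))

# g.V().hasLabel('Person'): known POG, key range <~label>:<Person>:<~>:* on POGS
print(prefix_scan(pogs, ('~label', 'Person', '~')))
```

The second scan finds both `Person` vertices (`v1` and `v2`) with a single range lookup, which is exactly what the POGS index makes cheap.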

# How Gremlin queries are processed in Neptune
<a name="gremlin-explain-background-querying"></a>

In Amazon Neptune, a more complex traversal can be represented as a series of patterns. Each pattern defines named variables, and sharing a variable across patterns creates a join. This is shown in the following example.

**Question: What is the two-hop neighborhood of vertex `v1`?**

```
  Gremlin code:      g.V('v1').out('knows').out('knows').path()
  Pattern:           (?1=<v1>, <knows>, ?2, ?) X Pattern(?2, <knows>, ?3, ?)

  The pattern produces a three-column relation (?1, ?2, ?3) like this:
                     ?1     ?2     ?3
                     ================
                     v1     v2     v3
                     v1     v2     v4
                     v1     v5     v6
```

By sharing the `?2` variable across the two patterns (at the O position in the first pattern and the S position of the second pattern), you create a join from the first hop neighbors to the second hop neighbors. Each Neptune solution has bindings for the three named variables, which can be used to re-create a [TinkerPop Traverser](http://tinkerpop.apache.org/docs/current/reference/#_the_traverser) (including path information).
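A hypothetical Python sketch of this join (the edge list is made up to match the relation shown in the example; `?2` binds the O position of the first pattern to the S position of the second):

```python
# Toy illustration of joining two 'knows' patterns on the shared ?2 variable.
knows = [('v1', 'v2'), ('v1', 'v5'), ('v2', 'v3'), ('v2', 'v4'), ('v5', 'v6')]

# Pattern 1: (?1=<v1>, <knows>, ?2)   Pattern 2: (?2, <knows>, ?3)
solutions = [
    (s1, o1, o2)                        # bindings for (?1, ?2, ?3)
    for (s1, o1) in knows if s1 == 'v1'
    for (s2, o2) in knows if s2 == o1   # join: ?2 shared across patterns
]
for row in solutions:
    print(row)
```

This produces the three-column relation `('v1','v2','v3')`, `('v1','v2','v4')`, `('v1','v5','v6')`.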


The first step in Gremlin query processing is to parse the query into a TinkerPop [Traversal](http://tinkerpop.apache.org/docs/current/reference/#traversal) object, composed of a series of TinkerPop [steps](http://tinkerpop.apache.org/docs/current/reference/#graph-traversal-steps). These steps, which are part of the open-source [Apache TinkerPop project](http://tinkerpop.apache.org/), are both the logical and physical operators that compose a Gremlin traversal in the reference implementation: they represent the model of the query, and they are also executable operators that produce solutions according to the semantics of the operator they represent. For example, `.V()` is both represented and executed by the TinkerPop [GraphStep](http://tinkerpop.apache.org/docs/current/reference/#graph-step).

Because these off-the-shelf TinkerPop steps are executable, such a TinkerPop Traversal can execute any Gremlin query and produce the correct answer. However, when executed against a large graph, TinkerPop steps can sometimes be very inefficient and slow. Instead of using them, Neptune tries to convert the traversal into a declarative form composed of groups of patterns, as described previously.

Neptune doesn't currently support all Gremlin operators (steps) in its native query engine. So it tries to collapse as many steps as possible down into a single `NeptuneGraphQueryStep`, which contains the declarative logical query plan for all the steps that have been converted. Ideally, all steps are converted. But when a step is encountered that can't be converted, Neptune breaks out of native execution and defers all query execution from that point forward to the TinkerPop steps. It doesn't try to weave in and out of native execution.

After the steps are translated into a logical query plan, Neptune runs a series of query optimizers that rewrite the query plan based on static analysis and estimated cardinalities. These optimizers do things like reorder operators based on range counts, prune unnecessary or redundant operators, rearrange filters, push operators into different groups, and so on.

After an optimized query plan is produced, Neptune creates a pipeline of physical operators that do the work of executing the query. This includes reading data from the statement indices, performing joins of various types, filtering, ordering, and so on. The pipeline produces a solution stream that is then converted back into a stream of TinkerPop Traverser objects.

## Serialization of query results
<a name="gremlin-explain-background-querying-serialization"></a>

Amazon Neptune currently relies on the TinkerPop response message serializers to convert query results (TinkerPop Traversers) into the serialized data to be sent over the wire back to the client. These serialization formats tend to be quite verbose.

For example, to serialize the result of a vertex query such as `g.V().limit(1)`, the Neptune query engine must perform a single search to produce the query result. However, the `GraphSON` serializer would perform a large number of additional searches to package the vertex into the serialization format. It would have to perform one search to get the label, one to get the property keys, and one search per property key for the vertex to get all the values for each key.

Some of the serialization formats are more efficient, but all require additional searches. Additionally, the TinkerPop serializers don't try to avoid duplicated searches, often resulting in many searches being repeated unnecessarily.

This makes it very important to write your queries so that they ask specifically just for the information they need. For example, `g.V().limit(1).id()` would return just the vertex ID and eliminate all the additional serializer searches. The [Gremlin `profile` API in Neptune](gremlin-profile-api.md) allows you to see how many search calls are made during query execution and during serialization.

# Using the Gremlin `explain` API in Neptune
<a name="gremlin-explain-api"></a>

The Amazon Neptune Gremlin `explain` API returns the query plan that would be executed if a specified query were run. Because the API doesn't actually run the query, the plan is returned almost instantaneously.

It differs from the TinkerPop `.explain()` step in that it reports information specific to the Neptune engine.

## Information contained in a Gremlin `explain` report
<a name="gremlin-explain-api-results"></a>

An `explain` report contains the following information:
+ The query string as requested.
+ **The original traversal.** This is the TinkerPop Traversal object produced by parsing the query string into TinkerPop steps. It is equivalent to the original query produced by running `.explain()` on the query against the TinkerPop TinkerGraph.
+ **The converted traversal.** This is the Neptune Traversal produced by converting the TinkerPop Traversal into the Neptune logical query plan representation. In many cases the entire TinkerPop traversal is converted into two Neptune steps: one that executes the entire query (`NeptuneGraphQueryStep`) and one that converts the Neptune query engine output back into TinkerPop Traversers (`NeptuneTraverserConverterStep`).
+ **The optimized traversal.** This is the optimized version of the Neptune query plan after it has been run through a series of static work-reducing optimizers that rewrite the query based on static analysis and estimated cardinalities. These optimizers do things like reorder operators based on range counts, prune unnecessary or redundant operators, rearrange filters, push operators into different groups, and so on.
+ **The predicate count.** Because of the Neptune indexing strategy described earlier, having a large number of different predicates can cause performance problems. This is especially true for queries that use reverse traversal operators with no edge label (`.in` or `.both`). If such operators are used and the predicate count is high enough, the `explain` report displays a warning message.
+ **DFE information.** When the DFE alternative engine is enabled, the following traversal components may show up in the optimized traversal:
  + **`DFEStep`**   –   A Neptune optimized DFE step in the traversal that contains a child `DFENode`. `DFEStep` represents the part of the query plan that is executed in the DFE engine.
  + **`DFENode`**   –   Contains the intermediate representation as one or more child `DFEJoinGroupNodes`.
  + **`DFEJoinGroupNode`**   –   Represents a join of one or more `DFENode` or `DFEJoinGroupNode` elements.
  + **`NeptuneInterleavingStep`**   –   A Neptune optimized DFE step in the traversal that contains a child `DFEStep`.

    Also contains a `stepInfo` element that contains information about the traversal, such as the frontier element, the path elements used, and so on. This information is used to process the child `DFEStep`.

  An easy way to find out if your query is being evaluated by DFE is to check whether the `explain` output contains a `DFEStep`. Any part of the traversal that is not part of the `DFEStep` will not be executed by DFE and will be executed by the TinkerPop engine.

  See [Example with DFE enabled](#gremlin-explain-dfe) for a sample report.

## Gremlin `explain` syntax
<a name="gremlin-explain-api-syntax"></a>

The syntax of the `explain` API is the same as that for the HTTP API for query, except that it uses `/gremlin/explain` as the endpoint instead of `/gremlin`, as in the following examples.

------
#### [ AWS CLI ]

```
aws neptunedata execute-gremlin-explain-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --gremlin-query "g.V().limit(1)"
```

For more information, see [execute-gremlin-explain-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-gremlin-explain-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_gremlin_explain_query(
    gremlinQuery='g.V().limit(1)'
)

print(response['output'])
```

For AWS SDK examples in other languages like Java, .NET, and more, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/explain \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d '{"gremlin":"g.V().limit(1)"}'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl -X POST https://your-neptune-endpoint:port/gremlin/explain \
  -d '{"gremlin":"g.V().limit(1)"}'
```

------

The preceding query would produce the following output.

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============
g.V().limit(1)

Original Traversal
==================
[GraphStep(vertex,[]), RangeGlobalStep(0,1)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
        }, finishers=[limit(1)], annotations={path=[Vertex(?1):GraphStep], maxVarId=3}
    },
    NeptuneTraverserConverterStep
]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .], {estimatedCardinality=INFINITY}
        }, finishers=[limit(1)], annotations={path=[Vertex(?1):GraphStep], maxVarId=3}
    },
    NeptuneTraverserConverterStep
]

Predicates
==========
# of predicates: 18
```

## Unconverted TinkerPop Steps
<a name="gremlin-explain-unconverted-steps"></a>

Ideally, all TinkerPop steps in a traversal have native Neptune operator coverage. When this isn't the case, Neptune falls back on TinkerPop step execution for gaps in its operator coverage. If a traversal uses a step for which Neptune does not yet have native coverage, the `explain` report displays a warning showing where the gap occurred.

When a step without a corresponding native Neptune operator is encountered, the entire traversal from that point forward is run using TinkerPop steps, even if subsequent steps do have native Neptune operators.

The exception to this is when Neptune full-text search is invoked. `NeptuneSearchStep` implements steps without native equivalents as full-text search steps.

## Example of `explain` output where all steps in a query have native equivalents
<a name="gremlin-explain-all-steps-converted"></a>

The following is an example `explain` report for a query where all steps have native equivalents:

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============
g.V().out()

Original Traversal
==================
[GraphStep(vertex,[]), VertexStep(OUT,vertex)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
            PatternNode[(?1, ?5, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .]
            PatternNode[(?3, <~label>, ?4, <~>) . project ask .]
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep], maxVarId=7}
    },
    NeptuneTraverserConverterStep
]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, ?5, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=INFINITY}
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep], maxVarId=7}
    },
    NeptuneTraverserConverterStep
]

Predicates
==========
# of predicates: 18
```

## Example where some steps in a query do not have native equivalents
<a name="gremlin-explain-not-all-steps-converted"></a>

Neptune handles both `GraphStep` and `VertexStep` natively, but if you introduce a `FoldStep` and `UnfoldStep`, the resulting `explain` output is different:

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============
g.V().fold().unfold().out()

Original Traversal
==================
[GraphStep(vertex,[]), FoldStep, UnfoldStep, VertexStep(OUT,vertex)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
        }, annotations={path=[Vertex(?1):GraphStep], maxVarId=3}
    },
    NeptuneTraverserConverterStep
]
+ not converted into Neptune steps: [FoldStep, UnfoldStep, VertexStep(OUT,vertex)]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .], {estimatedCardinality=INFINITY}
        }, annotations={path=[Vertex(?1):GraphStep], maxVarId=3}
    },
    NeptuneTraverserConverterStep,
    NeptuneMemoryTrackerStep
]
+ not converted into Neptune steps: [FoldStep, UnfoldStep, VertexStep(OUT,vertex)]

WARNING: >> FoldStep << is not supported natively yet
```

In this case, the `FoldStep` breaks the query out of native execution. Even the subsequent `VertexStep` is no longer handled natively, because it appears downstream of the `FoldStep` and `UnfoldStep`.

For performance and cost savings, formulate traversals so that as much work as possible is done natively inside the Neptune query engine, rather than by the TinkerPop step implementations.
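
Where the `Fold`/`Unfold` pair adds nothing to the result (as in the query above, where `fold().unfold()` regroups and then re-emits the same vertices), dropping it lets every step convert to native Neptune steps. A sketch, assuming the fold/unfold was not needed for a barrier or side effect:

```
g.V().out()
```

With this form, both the `GraphStep` and the `VertexStep` appear inside `NeptuneGraphQueryStep` in the `explain` output, with no "not converted into Neptune steps" line.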

## Example of a query that uses Neptune full-text-search
<a name="gremlin-explain-full-text-search-steps"></a>

The following query uses Neptune full-text search:

```
g.withSideEffect("Neptune#fts.endpoint", "some_endpoint")
  .V()
  .tail(100)
  .has("name", "Neptune#fts mark*")
  .has("Person", "name", "Neptune#fts mark*")
```

The `.has("name", "Neptune#fts mark*")` step limits the search to vertices that have a `name` property, while `.has("Person", "name", "Neptune#fts mark*")` limits the search to vertices that have a `name` property and the label `Person`. This results in the following traversal in the `explain` report:

```
Final Traversal
[NeptuneGraphQueryStep(Vertex) {
    JoinGroupNode {
        PatternNode[(?1, termid(1,URI), ?2, termid(0,URI)) . project distinct ?1 .], {estimatedCardinality=INFINITY}
    }, annotations={path=[Vertex(?1):GraphStep], maxVarId=4}
}, NeptuneTraverserConverterStep, NeptuneTailGlobalStep(10), NeptuneTinkerpopTraverserConverterStep, NeptuneSearchStep {
    JoinGroupNode {
        SearchNode[(idVar=?3, query=mark*, field=name) . project ask .], {endpoint=some_endpoint}
    }
    JoinGroupNode {
        SearchNode[(idVar=?3, query=mark*, field=name) . project ask .], {endpoint=some_endpoint}
    }
}]
```

## Example of using `explain` when the DFE is enabled
<a name="gremlin-explain-dfe"></a>

The following is an example of an `explain` report when the DFE alternative query engine is enabled:

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============

g.V().as("a").out().has("name", "josh").out().in().where(eq("a"))


Original Traversal
==================
[GraphStep(vertex,[])@[a], VertexStep(OUT,vertex), HasStep([name.eq(josh)]), VertexStep(OUT,vertex), VertexStep(IN,vertex), WherePredicateStep(eq(a))]

Converted Traversal
===================
Neptune steps:
[
    DFEStep(Vertex) {
      DFENode {
        DFEJoinGroupNode[ children={
          DFEPatternNode[(?1, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, ?2, <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph>) . project DISTINCT[?1] {rangeCountEstimate=unknown}],
          DFEPatternNode[(?1, ?3, ?4, ?5) . project ALL[?1, ?4] graphFilters=(!= <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph> . ), {rangeCountEstimate=unknown}]
        }, {rangeCountEstimate=unknown}
        ]
      } [Vertex(?1):GraphStep@[a], Vertex(?4):VertexStep]
    } ,
    NeptuneTraverserConverterDFEStep
]
+ not converted into Neptune steps: HasStep([name.eq(josh)]),
Neptune steps:
[
    NeptuneInterleavingStep {
      StepInfo[joinVars=[?7, ?1], frontierElement=Vertex(?7):HasStep, pathElements={a=(last,Vertex(?1):GraphStep@[a])}, listPathElement={}, indexTime=0ms],
      DFEStep(Vertex) {
        DFENode {
          DFEJoinGroupNode[ children={
            DFEPatternNode[(?7, ?8, ?9, ?10) . project ALL[?7, ?9] graphFilters=(!= <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph> . ), {rangeCountEstimate=unknown}],
            DFEPatternNode[(?12, ?11, ?9, ?13) . project ALL[?9, ?12] graphFilters=(!= <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph> . ), {rangeCountEstimate=unknown}]
          }, {rangeCountEstimate=unknown}
          ]
        } [Vertex(?9):VertexStep, Vertex(?12):VertexStep]
      } 
    }
]
+ not converted into Neptune steps: WherePredicateStep(eq(a)),
Neptune steps:
[
    DFECleanupStep
]


Optimized Traversal
===================
Neptune steps:
[
    DFEStep(Vertex) {
      DFENode {
        DFEJoinGroupNode[ children={
          DFEPatternNode[(?1, ?3, ?4, ?5) . project ALL[?1, ?4] graphFilters=(!= defaultGraph[526] . ), {rangeCountEstimate=9223372036854775807}]
        }, {rangeCountEstimate=unknown}
        ]
      } [Vertex(?1):GraphStep@[a], Vertex(?4):VertexStep]
    } ,
    NeptuneTraverserConverterDFEStep
]
+ not converted into Neptune steps: NeptuneHasStep([name.eq(josh)]),
Neptune steps:
[
    NeptuneMemoryTrackerStep,
    NeptuneInterleavingStep {
      StepInfo[joinVars=[?7, ?1], frontierElement=Vertex(?7):HasStep, pathElements={a=(last,Vertex(?1):GraphStep@[a])}, listPathElement={}, indexTime=0ms],
      DFEStep(Vertex) {
        DFENode {
          DFEJoinGroupNode[ children={
            DFEPatternNode[(?7, ?8, ?9, ?10) . project ALL[?7, ?9] graphFilters=(!= defaultGraph[526] . ), {rangeCountEstimate=9223372036854775807}],
            DFEPatternNode[(?12, ?11, ?9, ?13) . project ALL[?9, ?12] graphFilters=(!= defaultGraph[526] . ), {rangeCountEstimate=9223372036854775807}]
          }, {rangeCountEstimate=unknown}
          ]
        } [Vertex(?9):VertexStep, Vertex(?12):VertexStep]
      } 
    }
]
+ not converted into Neptune steps: WherePredicateStep(eq(a)),
Neptune steps:
[
    DFECleanupStep
]


WARNING: >> [NeptuneHasStep([name.eq(josh)]), WherePredicateStep(eq(a))] << (or one of the children for each step) is not supported natively yet

Predicates
==========
# of predicates: 8
```

See [Information in `explain`](#gremlin-explain-api-results) for a description of the DFE-specific sections in the report.

# Gremlin `profile` API in Neptune
<a name="gremlin-profile-api"></a>

The Neptune Gremlin `profile` API runs a specified Gremlin traversal, collects various metrics about the run, and produces a profile report as output.

It differs from the TinkerPop `.profile()` step in that it reports information specific to the Neptune engine.

The profile report includes the following information about the query plan:
+ The physical operator pipeline
+ The index operations for query execution and serialization
+ The size of the result

The `profile` API uses an extended version of the HTTP API syntax for queries, with `/gremlin/profile` as the endpoint instead of `/gremlin`.

## Parameters specific to Neptune Gremlin `profile`
<a name="gremlin-profile-api-parameters"></a>
+ **profile.results** – `boolean`, allowed values: `TRUE` and `FALSE`, default value: `TRUE`.

  If true, the query results are gathered and displayed as part of the `profile` report. If false, only the result count is displayed.
+ **profile.chop** – `int`, default value: 250.

  If non-zero, the results string is truncated at that number of characters. Truncation does not prevent all results from being captured; it only limits the size of the string in the profile report. If set to zero, the string contains all the results.
+ **profile.serializer** – `string`, default value: `<null>`.

  If non-null, the gathered results are returned in a serialized response message in the format specified by this parameter. The number of index operations necessary to produce that response message is reported along with the size in bytes to be sent to the client.

  Allowed values are `<null>`, any valid MIME type, or any TinkerPop driver `Serializers` enum value:

  ```
  "application/json" or "GRAPHSON"
  "application/vnd.gremlin-v1.0+json" or "GRAPHSON_V1"
  "application/vnd.gremlin-v1.0+json;types=false" or "GRAPHSON_V1_UNTYPED"
  "application/vnd.gremlin-v2.0+json" or "GRAPHSON_V2"
  "application/vnd.gremlin-v2.0+json;types=false" or "GRAPHSON_V2_UNTYPED"
  "application/vnd.gremlin-v3.0+json" or "GRAPHSON_V3"
  "application/vnd.gremlin-v3.0+json;types=false" or "GRAPHSON_V3_UNTYPED"
  "application/vnd.graphbinary-v1.0" or "GRAPHBINARY_V1"
  ```
+ **profile.indexOps** – `boolean`, allowed values: `TRUE` and `FALSE`, default value: `FALSE`.

  If true, shows a detailed report of all index operations that took place during query execution and serialization. Warning: This report can be verbose.
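
As a sketch (not an official client), the parameters above are passed as top-level keys in the JSON body of the POST to `/gremlin/profile`, as the `awscurl` and `curl` examples later in this section show. The query string and parameter values below are illustrative placeholders:

```python
import json

# Build the request body for a POST to /gremlin/profile.
# The "profile.*" keys are the parameters documented above.
payload = {
    "gremlin": "g.V().limit(10)",                # hypothetical query
    "profile.results": True,                     # include query results in the report
    "profile.chop": 0,                           # 0 = don't truncate the results string
    "profile.serializer": "application/vnd.gremlin-v3.0+json",
    "profile.indexOps": False,                   # omit the verbose index-operation report
}
body = json.dumps(payload)
print(body)
```

Send `body` as the POST payload to your cluster's `/gremlin/profile` endpoint using your preferred HTTP client, signing the request if IAM authentication is enabled.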



## Sample output of Neptune Gremlin `profile`
<a name="gremlin-profile-sample-output"></a>

The following is a sample `profile` query.

------
#### [ AWS CLI ]

```
aws neptunedata execute-gremlin-profile-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --gremlin-query 'g.V().hasLabel("airport").has("code", "AUS").emit().repeat(in().simplePath()).times(2).limit(100)' \
  --serializer "application/vnd.gremlin-v3.0+json"
```

For more information, see [execute-gremlin-profile-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-gremlin-profile-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_gremlin_profile_query(
    gremlinQuery='g.V().hasLabel("airport").has("code", "AUS").emit().repeat(in().simplePath()).times(2).limit(100)',
    serializer='application/vnd.gremlin-v3.0+json'
)

print(response['output'])
```

For AWS SDK examples in other languages like Java, .NET, and more, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/profile \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d '{"gremlin":"g.V().hasLabel(\"airport\").has(\"code\", \"AUS\").emit().repeat(in().simplePath()).times(2).limit(100)", "profile.serializer":"application/vnd.gremlin-v3.0+json"}'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl -X POST https://your-neptune-endpoint:port/gremlin/profile \
  -d '{"gremlin":"g.V().hasLabel(\"airport\").has(\"code\", \"AUS\").emit().repeat(in().simplePath()).times(2).limit(100)", "profile.serializer":"application/vnd.gremlin-v3.0+json"}'
```

------

This query generates the following `profile` report when executed on the air-routes sample graph from the blog post, [Let Me Graph That For You – Part 1 – Air Routes](https://aws.amazon.com/blogs/database/let-me-graph-that-for-you-part-1-air-routes/).

```
*******************************************************
                Neptune Gremlin Profile
*******************************************************

Query String
==================
g.V().hasLabel("airport").has("code", "AUS").emit().repeat(in().simplePath()).times(2).limit(100)

Original Traversal
==================
[GraphStep(vertex,[]), HasStep([~label.eq(airport), code.eq(AUS)]), RepeatStep(emit(true),[VertexStep(IN,vertex), PathFilterStep(simple), RepeatEndStep],until(loops(2))), RangeGlobalStep(0,100)]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <code>, "AUS", ?) . project ?1 .], {estimatedCardinality=1, indexTime=84, hashJoin=true, joinTime=3, actualTotalOutput=1}
            PatternNode[(?1, <~label>, ?2=<airport>, <~>) . project ask .], {estimatedCardinality=3374, indexTime=29, hashJoin=true, joinTime=0, actualTotalOutput=61}
            RepeatNode {
                Repeat {
                    PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) . SimplePathFilter(?1, ?3)) .], {hashJoin=true, estimatedCardinality=50148, indexTime=0, joinTime=3}
                }
                Emit {
                    Filter(true)
                }
                LoopsCondition {
                    LoopsFilter([?1, ?3],eq(2))
                }
            }, annotations={repeatMode=BFS, emitFirst=true, untilFirst=false, leftVar=?1, rightVar=?3}
        }, finishers=[limit(100)], annotations={path=[Vertex(?1):GraphStep, Repeat[Vertex(?3):VertexStep]], joinStats=true, optimizationTime=495, maxVarId=7, executionTime=323}
    },
    NeptuneTraverserConverterStep
]

Physical Pipeline
=================
NeptuneGraphQueryStep
    |-- StartOp
    |-- JoinGroupOp
        |-- SpoolerOp(100)
        |-- DynamicJoinOp(PatternNode[(?1, <code>, "AUS", ?) . project ?1 .], {estimatedCardinality=1, indexTime=84, hashJoin=true})
        |-- SpoolerOp(100)
        |-- DynamicJoinOp(PatternNode[(?1, <~label>, ?2=<airport>, <~>) . project ask .], {estimatedCardinality=3374, indexTime=29, hashJoin=true})
        |-- RepeatOp
            |-- <upstream input> (Iteration 0) [visited=1, output=1 (until=0, emit=1), next=1]
            |-- BindingSetQueue (Iteration 1) [visited=61, output=61 (until=0, emit=61), next=61]
                |-- SpoolerOp(100)
                |-- DynamicJoinOp(PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) . SimplePathFilter(?1, ?3)) .], {hashJoin=true, estimatedCardinality=50148, indexTime=0})
            |-- BindingSetQueue (Iteration 2) [visited=38, output=38 (until=38, emit=0), next=0]
                |-- SpoolerOp(100)
                |-- DynamicJoinOp(PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) . SimplePathFilter(?1, ?3)) .], {hashJoin=true, estimatedCardinality=50148, indexTime=0})
        |-- LimitOp(100)

Runtime (ms)
============
Query Execution:  392.686
Serialization:   2636.380

Traversal Metrics
=================
Step                                                               Count  Traversers       Time (ms)    % Dur
-------------------------------------------------------------------------------------------------------------
NeptuneGraphQueryStep(Vertex)                                        100         100         314.162    82.78
NeptuneTraverserConverterStep                                        100         100          65.333    17.22
                                            >TOTAL                     -           -         379.495        -

Repeat Metrics
==============
Iteration  Visited   Output    Until     Emit     Next
------------------------------------------------------
        0        1        1        0        1        1
        1       61       61        0       61       61
        2       38       38       38        0        0
------------------------------------------------------
               100      100       38       62       62

Predicates
==========
# of predicates: 16

WARNING: reverse traversal with no edge label(s) - .in() / .both() may impact query performance

Results
=======
Count: 100
Output: [v[3], v[3600], v[3614], v[4], v[5], v[6], v[7], v[8], v[9], v[10], v[11], v[12], v[47], v[49], v[136], v[13], v[15], v[16], v[17], v[18], v[389], v[20], v[21], v[22], v[23], v[24], v[25], v[26], v[27], v[28], v[416], v[29], v[30], v[430], v[31], v[9...
Response serializer: GRYO_V3D0
Response size (bytes): 23566

Index Operations
================
Query execution:
    # of statement index ops: 3
    # of unique statement index ops: 3
    Duplication ratio: 1.0
    # of terms materialized: 0
Serialization:
    # of statement index ops: 200
    # of unique statement index ops: 140
    Duplication ratio: 1.43
    # of terms materialized: 393
```

In addition to the query plans returned by Neptune `explain`, the `profile` results include runtime statistics about query execution. Each join operation is tagged with the time it took to perform its join and the actual number of solutions that passed through it.

The `profile` output includes the time taken during the core query execution phase, as well as the serialization phase if the `profile.serializer` option was specified.

The breakdown of the index operations performed during each phase is also included at the bottom of the `profile` output.
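
For instance, the `Duplication ratio` lines in the `Index Operations` section are simply total statement index operations divided by unique statement index operations. A sketch of the arithmetic, using the serialization-phase numbers from the sample report above:

```python
# Serialization-phase figures from the sample Index Operations section.
statement_index_ops = 200
unique_statement_index_ops = 140

# Duplication ratio = total ops / unique ops, rounded as in the report.
duplication_ratio = round(statement_index_ops / unique_statement_index_ops, 2)
print(duplication_ratio)
```

A ratio well above 1.0 indicates that the same statements are being looked up repeatedly, which caching can partly absorb.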

Note that consecutive runs of the same query may show different run times and index-operation counts because of caching.

For queries using the `repeat()` step, a breakdown of the frontier on each iteration is available if the `repeat()` step was pushed down as part of a `NeptuneGraphQueryStep`.
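
To illustrate how to read that breakdown, the totals row of the `Repeat Metrics` table in the sample report above is just the column-wise sum of the per-iteration rows:

```python
# (visited, output, until, emit, next) for iterations 0-2,
# taken from the sample Repeat Metrics table above.
iterations = [
    (1, 1, 0, 1, 1),
    (61, 61, 0, 61, 61),
    (38, 38, 38, 0, 0),
]

# The totals row sums each column across iterations.
totals = tuple(sum(column) for column in zip(*iterations))
print(totals)
```

Here 38 traversers satisfied the `until` condition at depth 2, while 62 were emitted along the way, which together account for the 100 results returned.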

## Differences in `profile` reports when DFE is enabled
<a name="gremlin-profile-dfe-output"></a>

When the Neptune DFE alternative query engine is enabled, `profile` output is somewhat different:

**Optimized Traversal:** This section is similar to the one in `explain` output, but contains additional information, including the types of DFE operators that were considered in planning and their associated worst-case and best-case cost estimates.

**Physical Pipeline:** This section captures the operators that are used to execute the query. `DFESubQuery` elements abstract the physical plan that is used by DFE to execute the portion of the plan it is responsible for. The `DFESubQuery` elements are unfolded in the following section where DFE statistics are listed.

**DFEQueryEngine Statistics:** This section shows up only when at least part of the query is executed by DFE. It outlines various runtime statistics that are specific to DFE, and contains a detailed breakdown of the time spent in the various parts of the query execution, by `DFESubQuery`.

Nested subqueries in different `DFESubQuery` elements are flattened in this section, and each is marked with a header that starts with `subQuery=` followed by a unique identifier.

**Traversal Metrics:** This section shows step-level traversal metrics. When the DFE engine runs all or part of the query, it displays metrics for `DFEStep` and/or `NeptuneInterleavingStep`. See [Tuning Gremlin queries using `explain` and `profile`](gremlin-traversal-tuning.md).

**Note**  
DFE is an experimental feature released under lab mode, so the exact format of the `profile` output is still subject to change.

## Sample `profile` output when the Neptune Dataflow engine (DFE) is enabled
<a name="gremlin-profile-sample-dfe-output"></a>

When the DFE engine is used to run Gremlin queries, the output of the [Gremlin `profile` API](#gremlin-profile-api) is formatted as shown in the following example.

Query:

------
#### [ AWS CLI ]

```
aws neptunedata execute-gremlin-profile-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --gremlin-query "g.withSideEffect('Neptune#useDFE', true).V().has('code', 'ATL').out()"
```

For more information, see [execute-gremlin-profile-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-gremlin-profile-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_gremlin_profile_query(
    gremlinQuery="g.withSideEffect('Neptune#useDFE', true).V().has('code', 'ATL').out()"
)

print(response['output'])
```

For AWS SDK examples in other languages like Java, .NET, and more, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/profile \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d '{"gremlin":"g.withSideEffect('"'"'Neptune#useDFE'"'"', true).V().has('"'"'code'"'"', '"'"'ATL'"'"').out()"}'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl -X POST https://your-neptune-endpoint:port/gremlin/profile \
  -d '{"gremlin":"g.withSideEffect('"'"'Neptune#useDFE'"'"', true).V().has('"'"'code'"'"', '"'"'ATL'"'"').out()"}'
```

------

```
    *******************************************************
                    Neptune Gremlin Profile
    *******************************************************

    Query String
    ==================
    g.withSideEffect('Neptune#useDFE', true).V().has('code', 'ATL').out()

    Original Traversal
    ==================
    [GraphStep(vertex,[]), HasStep([code.eq(ATL)]), VertexStep(OUT,vertex)]

    Optimized Traversal
    ===================
    Neptune steps:
    [
        DFEStep(Vertex) {
          DFENode {
            DFEJoinGroupNode[null](
              children=[
                DFEPatternNode((?1, vp://code[419430926], ?4, defaultGraph[526]) . project DISTINCT[?1] objectFilters=(in(ATL[452987149]) . ), {rangeCountEstimate=1},
                  opInfo=(type=PipelineJoin, cost=(exp=(in=1.00,out=1.00,io=0.00,comp=0.00,mem=0.00),wc=(in=1.00,out=1.00,io=0.00,comp=0.00,mem=0.00)),
                    disc=(type=PipelineScan, cost=(exp=(in=1.00,out=1.00,io=0.00,comp=0.00,mem=34.00),wc=(in=1.00,out=1.00,io=0.00,comp=0.00,mem=34.00))))),
                DFEPatternNode((?1, ?5, ?6, ?7) . project ALL[?1, ?6] graphFilters=(!= defaultGraph[526] . ), {rangeCountEstimate=9223372036854775807})],
              opInfo=[
                OperatorInfoWithAlternative[
                  rec=(type=PipelineJoin, cost=(exp=(in=1.00,out=27.76,io=0.00,comp=0.00,mem=0.00),wc=(in=1.00,out=27.76,io=0.00,comp=0.00,mem=0.00)),
                    disc=(type=PipelineScan, cost=(exp=(in=1.00,out=27.76,io=Infinity,comp=0.00,mem=295147905179352830000.00),wc=(in=1.00,out=27.76,io=Infinity,comp=0.00,mem=295147905179352830000.00)))),
                  alt=(type=PipelineScan, cost=(exp=(in=1.00,out=27.76,io=Infinity,comp=0.00,mem=295147905179352830000.00),wc=(in=1.00,out=27.76,io=Infinity,comp=0.00,mem=295147905179352830000.00)))]])
          } [Vertex(?1):GraphStep, Vertex(?6):VertexStep]
        } ,
        NeptuneTraverserConverterDFEStep,
        DFECleanupStep
    ]


    Physical Pipeline
    =================
    DFEStep
        |-- DFESubQuery1

    DFEQueryEngine Statistics
    =================
    DFESubQuery1
    ╔════╤════════╤════════╤═══════════════════════╤══════════════════════════════════════════════════════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤════════╤═══════════╗
    ║ ID │ Out #1 │ Out #2 │ Name                  │ Arguments                                                                                                    │ Mode │ Units In │ Units Out │ Ratio  │ Time (ms) ║
    ╠════╪════════╪════════╪═══════════════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪════════╪═══════════╣
    ║ 0  │ 1      │ -      │ DFESolutionInjection  │ solutions=[]                                                                                                 │ -    │ 0        │ 1         │ 0.00   │ 0.01      ║
    ║    │        │        │                       │ outSchema=[]                                                                                                 │      │          │           │        │           ║
    ╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 1  │ 2      │ -      │ DFEChunkLocalSubQuery │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#089f43e3-4d71-4259-8d19-254ff63cee04/graph_1 │ -    │ 1        │ 1         │ 1.00   │ 0.02      ║
    ╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 2  │ 3      │ -      │ DFEChunkLocalSubQuery │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#089f43e3-4d71-4259-8d19-254ff63cee04/graph_2 │ -    │ 1        │ 242       │ 242.00 │ 0.02      ║
    ╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 3  │ 4      │ -      │ DFEMergeChunks        │ -                                                                                                            │ -    │ 242      │ 242       │ 1.00   │ 0.01      ║
    ╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 4  │ -      │ -      │ DFEDrain              │ -                                                                                                            │ -    │ 242      │ 0         │ 0.00   │ 0.01      ║
    ╚════╧════════╧════════╧═══════════════════════╧══════════════════════════════════════════════════════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧════════╧═══════════╝


    subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#089f43e3-4d71-4259-8d19-254ff63cee04/graph_1
    ╔════╤════════╤════════╤══════════════════════╤═════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
    ║ ID │ Out #1 │ Out #2 │ Name                 │ Arguments                                                   │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
    ╠════╪════════╪════════╪══════════════════════╪═════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
    ║ 0  │ 1      │ -      │ DFEPipelineScan      │ pattern=Node(?1) with property 'code' as ?4 and label 'ALL' │ -    │ 0        │ 1         │ 0.00  │ 0.22      ║
    ║    │        │        │                      │ inlineFilters=[(?4 IN ["ATL"])]                             │      │          │           │       │           ║
    ║    │        │        │                      │ patternEstimate=1                                           │      │          │           │       │           ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
    ║ 1  │ 2      │ -      │ DFEMergeChunks       │ -                                                           │ -    │ 1        │ 1         │ 1.00  │ 0.02      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
    ║ 2  │ 4      │ -      │ DFERelationalJoin    │ joinVars=[]                                                 │ -    │ 2        │ 1         │ 0.50  │ 0.09      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
    ║ 3  │ 2      │ -      │ DFESolutionInjection │ solutions=[]                                                │ -    │ 0        │ 1         │ 0.00  │ 0.01      ║
    ║    │        │        │                      │ outSchema=[]                                                │      │          │           │       │           ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
    ║ 4  │ -      │ -      │ DFEDrain             │ -                                                           │ -    │ 1        │ 0         │ 0.00  │ 0.01      ║
    ╚════╧════════╧════════╧══════════════════════╧═════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝


    subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#089f43e3-4d71-4259-8d19-254ff63cee04/graph_2
    ╔════╤════════╤════════╤══════════════════════╤═════════════════════════════════════╤══════╤══════════╤═══════════╤════════╤═══════════╗
    ║ ID │ Out #1 │ Out #2 │ Name                 │ Arguments                           │ Mode │ Units In │ Units Out │ Ratio  │ Time (ms) ║
    ╠════╪════════╪════════╪══════════════════════╪═════════════════════════════════════╪══════╪══════════╪═══════════╪════════╪═══════════╣
    ║ 0  │ 1      │ -      │ DFESolutionInjection │ solutions=[]                        │ -    │ 0        │ 1         │ 0.00   │ 0.01      ║
    ║    │        │        │                      │ outSchema=[?1]                      │      │          │           │        │           ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 1  │ 2      │ 3      │ DFETee               │ -                                   │ -    │ 1        │ 2         │ 2.00   │ 0.01      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 2  │ 4      │ -      │ DFEDistinctColumn    │ column=?1                           │ -    │ 1        │ 1         │ 1.00   │ 0.21      ║
    ║    │        │        │                      │ ordered=false                       │      │          │           │        │           ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 3  │ 5      │ -      │ DFEHashIndexBuild    │ vars=[?1]                           │ -    │ 1        │ 1         │ 1.00   │ 0.03      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 4  │ 5      │ -      │ DFEPipelineJoin      │ pattern=Edge((?1)-[?7:?5]->(?6))    │ -    │ 1        │ 242       │ 242.00 │ 0.51      ║
    ║    │        │        │                      │ constraints=[]                      │      │          │           │        │           ║
    ║    │        │        │                      │ patternEstimate=9223372036854775807 │      │          │           │        │           ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 5  │ 6      │ 7      │ DFESync              │ -                                   │ -    │ 243      │ 243       │ 1.00   │ 0.02      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 6  │ 8      │ -      │ DFEForwardValue      │ -                                   │ -    │ 1        │ 1         │ 1.00   │ 0.01      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 7  │ 8      │ -      │ DFEForwardValue      │ -                                   │ -    │ 242      │ 242       │ 1.00   │ 0.02      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 8  │ 9      │ -      │ DFEHashIndexJoin     │ -                                   │ -    │ 243      │ 242       │ 1.00   │ 0.31      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 9  │ -      │ -      │ DFEDrain             │ -                                   │ -    │ 242      │ 0         │ 0.00   │ 0.01      ║
    ╚════╧════════╧════════╧══════════════════════╧═════════════════════════════════════╧══════╧══════════╧═══════════╧════════╧═══════════╝


    Runtime (ms)
    ============
    Query Execution: 11.744

    Traversal Metrics
    =================
    Step                                                               Count  Traversers       Time (ms)    % Dur
    -------------------------------------------------------------------------------------------------------------
    DFEStep(Vertex)                                                      242         242          10.849    95.48
    NeptuneTraverserConverterDFEStep                                     242         242           0.514     4.52
                                                >TOTAL                     -           -          11.363        -

    Predicates
    ==========
    # of predicates: 18

    Results
    =======
    Count: 242


    Index Operations
    ================
    Query execution:
        # of statement index ops: 0
        # of terms materialized: 0
```

**Note**  
Because the DFE engine is an experimental feature released in lab mode, the exact format of the `profile` output is subject to change.

# Tuning Gremlin queries using `explain` and `profile`
<a name="gremlin-traversal-tuning"></a>

You can often tune your Gremlin queries in Amazon Neptune to get better performance, using the information available to you in the reports you get from the Neptune [explain](gremlin-explain-api.md) and [profile](gremlin-profile-api.md) APIs. To do so, it helps to understand how Neptune processes Gremlin traversals.

**Important**  
A change was made in TinkerPop version 3.4.11 that improves the correctness of query processing, but that can currently have a serious impact on query performance in some cases.  
For example, a query of this sort may run significantly slower:  

```
g.V().hasLabel('airport').
  order().
    by(out().count(),desc).
  limit(10).
  out()
```
Because of the TinkerPop 3.4.11 change, the vertices after the `limit()` step are now fetched in a non-optimal way. To avoid this, you can modify the query by adding a `barrier()` step at any point after the `order().by()`. For example:  

```
g.V().hasLabel('airport').
  order().
    by(out().count(),desc).
  limit(10).
  barrier().
  out()
```
TinkerPop 3.4.11 was enabled in Neptune [engine version 1.0.5.0](engine-releases-1.0.5.0.md).

## Understanding Gremlin traversal processing in Neptune
<a name="gremlin-traversal-processing"></a>

When a Gremlin traversal is sent to Neptune, there are three main processes that transform the traversal into an underlying execution plan for the engine to execute. These are parsing, conversion, and optimization:

![\[3 processes transform a Gremlin query into an execution plan.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/Gremlin_traversal_processing.png)


### The traversal parsing process
<a name="gremlin-traversal-processing-parsing"></a>

The first step in processing a traversal is to parse it into a common language. In Neptune, that common language is the set of TinkerPop steps that are part of the [TinkerPop API](http://tinkerpop.apache.org/javadocs/3.4.8/full/org/apache/tinkerpop/gremlin/process/traversal/Step.html). Each of these steps represents a unit of computation within the traversal.

You can send a Gremlin traversal to Neptune either as a string or as bytecode. The REST endpoint and the Java client driver `submit()` method send traversals as strings, as in this example:

```
client.submit("g.V()")
```

Applications and language drivers using [Gremlin language variants (GLV)](https://tinkerpop.apache.org/docs/current/tutorials/gremlin-language-variants/) send traversals in bytecode.
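For illustration, a Gremlin language variant such as Gremlin-Python builds the traversal as bytecode on the client and sends that bytecode to the server. The following sketch assumes the `gremlinpython` package is installed and uses a placeholder endpoint:

```
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# The endpoint below is a placeholder for your cluster's Gremlin endpoint.
conn = DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin', 'g')
g = traversal().withRemote(conn)

# This traversal is serialized and sent to Neptune as bytecode, not as a string.
results = g.V().has('code', 'ANC').toList()
conn.close()
```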

### The traversal conversion process
<a name="gremlin-traversal-processing-conversion"></a>

The second step in processing a traversal is to convert its TinkerPop steps into a set of converted and non-converted Neptune steps. Most steps in the Apache TinkerPop Gremlin query language are converted to Neptune-specific steps that are optimized to run on the underlying Neptune engine. When a TinkerPop step without a Neptune equivalent is encountered in a traversal, that step and all subsequent steps in the traversal are processed by the TinkerPop query engine.

For more information about what steps can be converted under what circumstances, see [Gremlin step support](gremlin-step-support.md).

### The traversal optimization process
<a name="gremlin-traversal-processing-optimization"></a>

The final step in traversal processing is to run the series of converted and non-converted steps through the optimizer, to try to determine the best execution plan. The output of this optimization is the execution plan that the Neptune engine processes.

## Using the Neptune Gremlin `explain` API to tune queries
<a name="gremlin-traversal-tuning-explain"></a>

The Neptune `explain` API is not the same as the Gremlin `explain()` step. It returns the final execution plan that the Neptune engine would process when executing the query. Because it does not actually execute the query, it returns the same plan regardless of the parameter values used, and its output contains no statistics about actual execution.

Consider the following simple traversal that finds all the airport vertices for Anchorage:

```
g.V().has('code','ANC')
```

There are two ways you can run this traversal through the Neptune `explain` API. The first way is to make a REST call to the explain endpoint, like this:

```
curl -X POST https://your-neptune-endpoint:port/gremlin/explain -d "{\"gremlin\":\"g.V().has('code','ANC')\"}"
```

The second way is to use the Neptune workbench's [%%gremlin](notebooks-magics.md#notebooks-cell-magics-gremlin) cell magic with the `explain` parameter. This passes the traversal contained in the cell body to the Neptune `explain` API and then displays the resulting output when you run the cell:

```
%%gremlin explain

g.V().has('code','ANC')
```

The resulting `explain` API output describes Neptune's execution plan for the traversal. As you can see in the image below, the plan includes each of the three stages of the processing pipeline:

![\[Explain API output for a simple Gremlin traversal.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/Gremlin_explain_output_1.png)


### Tuning a traversal by looking at steps that are not converted
<a name="gremlin-traversal-tuning-explain-non-converted-steps"></a>

One of the first things to look for in the Neptune `explain` API output is Gremlin steps that were not converted to Neptune native steps. In a query plan, when a step is encountered that cannot be converted to a Neptune native step, it and all subsequent steps in the plan are processed by the Gremlin server.

In the example above, all steps in the traversal were converted. Let's examine `explain` API output for this traversal:

```
g.V().has('code','ANC').out().choose(hasLabel('airport'), values('code'), constant('Not an airport'))
```

As you can see in the image below, Neptune could not convert the `choose()` step:

![\[Explain API output in which not all steps can be converted.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/Gremlin_explain_output_2.png)


There are several things you could do to tune the performance of the traversal. The first would be to rewrite it in such a way as to eliminate the step that could not be converted. Another would be to move the step to the end of the traversal so that all other steps can be converted to native ones.

A query plan with steps that are not converted does not always need to be tuned. If the steps that cannot be converted are at the end of the traversal, and are related to how output is formatted rather than how the graph is traversed, they may have little effect on performance.
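As an illustration of the first approach, if the `constant('Not an airport')` fallback is not actually needed, the `choose()` step could be replaced by a filter so that every step converts. This is only a sketch, and it changes the results by dropping non-airport vertices rather than labeling them:

```
g.V().has('code','ANC').out().
  hasLabel('airport').
  values('code')
```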

### Tuning a traversal by looking at steps that do not use indexes
<a name="gremlin-traversal-tuning-explain-unindexed-lookups"></a>

Another thing to look for when examining output from the Neptune `explain` API is steps that do not use indexes. The following traversal finds all airports with flights that land in Anchorage:

```
g.V().has('code','ANC').in().values('code')
```

Output from the explain API for this traversal is:

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============

g.V().has('code','ANC').in().values('code')

Original Traversal
==================
[GraphStep(vertex,[]), HasStep([code.eq(ANC)]), VertexStep(IN,vertex), PropertiesStep([code],value)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(PropertyValue) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
            PatternNode[(?1, <code>, "ANC", ?) . project ask .]
            PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .]
            PatternNode[(?3, <~label>, ?4, <~>) . project ask .]
            PatternNode[(?3, ?7, ?8, <~>) . project ?3,?8 . ContainsFilter(?7 in (<code>)) .]
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, PropertyValue(?8):PropertiesStep], maxVarId=9}
    },
    NeptuneTraverserConverterStep
]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(PropertyValue) {
        JoinGroupNode {
            PatternNode[(?1, <code>, "ANC", ?) . project ?1 .], {estimatedCardinality=1}
            PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=INFINITY}
            PatternNode[(?3, ?7=<code>, ?8, <~>) . project ?3,?8 .], {estimatedCardinality=7564}
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, PropertyValue(?8):PropertiesStep], maxVarId=9}
    },
    NeptuneTraverserConverterStep
]

Predicates
==========
# of predicates: 26

WARNING: reverse traversal with no edge label(s) - .in() / .both() may impact query performance
```

The `WARNING` message at the bottom of the output occurs because the `in()` step in the traversal cannot be handled using one of the three indexes that Neptune maintains (see [How Statements Are Indexed in Neptune](feature-overview-storage-indexing.md) and [Gremlin statements in Neptune](gremlin-explain-background-statements.md)). Because the `in()` step contains no edge filter, it cannot be resolved using the `SPOG`, `POGS`, or `GPSO` index. Instead, Neptune must perform a union scan to find the requested vertices, which is much less efficient.

There are two ways to tune the traversal in this situation. The first is to add one or more filtering criteria to the `in()` step so that an indexed lookup can be used to resolve the query. For the example above, this might be:

```
g.V().has('code','ANC').in('route').values('code')
```

Output from the Neptune `explain` API for the revised traversal no longer contains the `WARNING` message:

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============

g.V().has('code','ANC').in('route').values('code')

Original Traversal
==================
[GraphStep(vertex,[]), HasStep([code.eq(ANC)]), VertexStep(IN,[route],vertex), PropertiesStep([code],value)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(PropertyValue) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
            PatternNode[(?1, <code>, "ANC", ?) . project ask .]
            PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) . ContainsFilter(?5 in (<route>)) .]
            PatternNode[(?3, <~label>, ?4, <~>) . project ask .]
            PatternNode[(?3, ?7, ?8, <~>) . project ?3,?8 . ContainsFilter(?7 in (<code>)) .]
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, PropertyValue(?8):PropertiesStep], maxVarId=9}
    },
    NeptuneTraverserConverterStep
]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(PropertyValue) {
        JoinGroupNode {
            PatternNode[(?1, <code>, "ANC", ?) . project ?1 .], {estimatedCardinality=1}
            PatternNode[(?3, ?5=<route>, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=32042}
            PatternNode[(?3, ?7=<code>, ?8, <~>) . project ?3,?8 .], {estimatedCardinality=7564}
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, PropertyValue(?8):PropertiesStep], maxVarId=9}
    },
    NeptuneTraverserConverterStep
]

Predicates
==========
# of predicates: 26
```

Another option if you are running many traversals of this kind is to run them in a Neptune DB cluster that has the optional `OSGP` index enabled (see [Enabling an OSGP Index](feature-overview-storage-indexing.md#feature-overview-storage-indexing-osgp)). Enabling an `OSGP` index has drawbacks:
+ It must be enabled in a DB cluster before any data is loaded.
+ Insertion rates for vertices and edges may slow by up to 23%.
+ Storage usage will increase by around 20%.
+ Read queries that scatter requests across all indexes may have increased latencies.

Having an `OSGP` index makes a lot of sense for a restricted set of query patterns, but unless you are running those frequently, it is usually preferable to try to ensure that the traversals you write can be resolved using the three primary indexes.

### Using a large number of predicates
<a name="gremlin-traversal-tuning-explain-many-predicates"></a>

Neptune treats each edge label and each distinct vertex or edge property name in your graph as a predicate, and is designed by default to work with a relatively low number of distinct predicates. When you have more than a few thousand predicates in your graph data, performance can degrade.

Neptune `explain` output will warn you if this is the case:

```
Predicates
==========
# of predicates: 9549
WARNING: high predicate count (# of distinct property names and edge labels)
```

If it is not convenient to rework your data model to reduce the number of labels and properties, and therefore the number of predicates, the best way to tune traversals is to run them in a DB cluster that has the `OSGP` index enabled, as discussed above.
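As an illustration (the label and property names here are hypothetical), a data model that encodes a value in each property name creates a new predicate for every name, whereas moving that value into the property's data keeps the predicate count fixed:

```
// One new predicate per sensor, so the predicate count grows with the data:
g.addV('reading').property('temp_sensor_17', 20.1)

// Two fixed predicates ('sensor' and 'value'), however many sensors exist:
g.addV('reading').
  property('sensor', 'temp_sensor_17').
  property('value', 20.1)
```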

## Using the Neptune Gremlin `profile` API to tune traversals
<a name="gremlin-traversal-tuning-profile"></a>

The Neptune `profile` API is quite different from the Gremlin `profile()` step. Like the `explain` API, its output includes the query plan that the Neptune engine uses when executing the traversal. In addition, the `profile` output includes actual execution statistics for the traversal, given how its parameters are set.

Again, take the simple traversal that finds all airport vertices for Anchorage:

```
g.V().has('code','ANC')
```

As with the `explain` API, you can invoke the `profile` API using a REST call:

```
curl -X POST https://your-neptune-endpoint:port/gremlin/profile -d "{\"gremlin\":\"g.V().has('code','ANC')\"}"
```

You can also use the Neptune workbench's [%%gremlin](notebooks-magics.md#notebooks-cell-magics-gremlin) cell magic with the `profile` parameter. This passes the traversal contained in the cell body to the Neptune `profile` API and then displays the resulting output when you run the cell:

```
%%gremlin profile

g.V().has('code','ANC')
```

The resulting `profile` API output contains both Neptune's execution plan for the traversal and statistics about the plan's execution, as you can see in this image:

![\[An example of Neptune profile API output.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/Gremlin_profile_output_1.png)


In `profile` output, the execution plan section only contains the final execution plan for the traversal, not the intermediate steps. The pipeline section contains the physical pipeline operations that were performed as well as the actual time (in milliseconds) that traversal execution took. The runtime metric is extremely helpful in comparing the times that two different versions of a traversal take as you are optimizing them.

**Note**  
The initial runtime of a traversal is generally longer than subsequent runtimes, because the first one causes the relevant data to be cached.

The third section of the `profile` output contains execution statistics and the results of the traversal. To see how this information can be useful in tuning a traversal, consider the following traversal, which finds every airport whose name begins with "Anchora", and all the airports reachable in two hops from those airports, returning airport codes, flight routes, and distances:

```
%%gremlin profile

g.withSideEffect("Neptune#fts.endpoint", "your-OpenSearch-endpoint-URL").
    V().has("city", "Neptune#fts Anchora~").
    repeat(outE('route').inV().simplePath()).times(2).
    project('Destination', 'Route').
        by('code').
        by(path().by('code').by('dist'))
```

### Traversal metrics in Neptune `profile` API output
<a name="gremlin-traversal-tuning-profile-traversal-metrics"></a>

The first set of metrics that is available in all `profile` output is the traversal metrics. These are similar to the Gremlin `profile()` step metrics, with a few differences:

```
Traversal Metrics
=================
Step                                                               Count  Traversers       Time (ms)    % Dur
-------------------------------------------------------------------------------------------------------------
NeptuneGraphQueryStep(Vertex)                                       3856        3856          91.701     9.09
NeptuneTraverserConverterStep                                       3856        3856          38.787     3.84
ProjectStep([Destination, Route],[value(code), ...                  3856        3856         878.786    87.07
  PathStep([value(code), value(dist)])                              3856        3856         601.359
                                            >TOTAL                     -           -        1009.274        -
```

The first column of the traversal-metrics table lists the steps executed by the traversal. The first two steps are generally the Neptune-specific steps, `NeptuneGraphQueryStep` and `NeptuneTraverserConverterStep`.

`NeptuneGraphQueryStep` represents the execution time for the entire portion of the traversal that could be converted and executed natively by the Neptune engine.

`NeptuneTraverserConverterStep` represents the process of converting the output of those converted steps into TinkerPop traversers, which allows any steps that could not be converted to be processed, or the results to be returned in a TinkerPop-compatible format.

In the example above, there are non-converted steps, so each of these TinkerPop steps (`ProjectStep`, `PathStep`) appears as a row in the table.

The second column in the table, `Count`, reports the number of *represented* traversers that passed through the step, while the third column, `Traversers`, reports the actual number of traverser objects that passed through it, as explained in the [TinkerPop profile step documentation](https://tinkerpop.apache.org/docs/current/reference/#profile-step).

In our example there are 3,856 vertices and 3,856 traversers returned by the `NeptuneGraphQueryStep`, and these numbers remain the same throughout the remaining processing because `ProjectStep` and `PathStep` are formatting the results, not filtering them.

**Note**  
Unlike TinkerPop, the Neptune engine does not optimize performance by *bulking* in its `NeptuneGraphQueryStep` and `NeptuneTraverserConverterStep` steps. Bulking is the TinkerPop operation that combines traversers on the same vertex to reduce operational overhead, and it is what causes the `Count` and `Traversers` numbers to differ. Because bulking only occurs in steps that Neptune delegates to TinkerPop, and not in steps that Neptune handles natively, the `Count` and `Traversers` columns seldom differ.

The `Time` column reports the number of milliseconds that the step took, and the `% Dur` column reports what percentage of the total processing time the step took. These are the metrics that tell you where to focus your tuning efforts, by showing which steps took the most time.

### Index operation metrics in Neptune `profile` API output
<a name="gremlin-traversal-tuning-profile-index-operations"></a>

Another set of metrics in the output of the Neptune profile API is the index operations:

```
Index Operations
================
Query execution:
    # of statement index ops: 23191
    # of unique statement index ops: 5960
    Duplication ratio: 3.89
    # of terms materialized: 0
```

These report:
+ The total number of index lookups.
+ The number of unique index lookups performed.
+ The ratio of total index lookups to unique ones. A lower ratio indicates less redundancy.
+ The number of terms materialized from the term dictionary.

### Repeat metrics in Neptune `profile` API output
<a name="gremlin-traversal-tuning-profile-repeat-metrics"></a>

If your traversal uses a `repeat()` step as in the example above, then a section containing repeat metrics appears in the `profile` output:

```
Repeat Metrics
==============
Iteration  Visited   Output    Until     Emit     Next
------------------------------------------------------
        0        2        0        0        0        2
        1       53        0        0        0       53
        2     3856     3856     3856        0        0
------------------------------------------------------
              3911     3856     3856        0       55
```

These report:
+ The loop count for a row (the `Iteration` column).
+ The number of elements visited by the loop (the `Visited` column).
+ The number of elements output by the loop (the `Output` column).
+ The number of elements output by the loop because they satisfied the `until()` condition (the `Until` column).
+ The number of elements emitted by the loop (the `Emit` column).
+ The number of elements passed from the loop to the subsequent loop (the `Next` column).

These repeat metrics are very helpful in understanding the branching factor of your traversal, to get a feeling for how much work is being done by the database. You can use these numbers to diagnose performance problems, especially when the same traversal performs dramatically differently with different parameters.
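For example, adding an `emit()` modulator to a similar traversal would output intermediate elements at every iteration, which would appear as nonzero values in the `Emit` column of these metrics (a sketch using the same airport data):

```
g.V().has('code','ANC').
  repeat(out('route').simplePath()).times(2).emit()
```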

### Full-text search metrics in Neptune `profile` API output
<a name="gremlin-traversal-tuning-profile-fts-metrics"></a>

When a traversal uses a [full-text search](full-text-search.md) lookup, as in the example above, then a section containing the full-text search (FTS) metrics appears in the `profile` output:

```
FTS Metrics
==============
SearchNode[(idVar=?1, query=Anchora~, field=city) . project ?1 .],
    {endpoint=your-OpenSearch-endpoint-URL, incomingSolutionsThreshold=1000, estimatedCardinality=INFINITY,
    remoteCallTimeSummary=[total=65, avg=32.500000, max=37, min=28],
    remoteCallTime=65, remoteCalls=2, joinTime=0, indexTime=0, remoteResults=2}

    2 result(s) produced from SearchNode above
```

This shows the query sent to the ElasticSearch (ES) cluster and reports several metrics about the interaction with ElasticSearch that can help you pinpoint performance problems relating to full-text search:
+ Summary information about the calls into the ElasticSearch index:
  + The total number of milliseconds required by all remoteCalls to satisfy the query (`total`).
  + The average number of milliseconds spent in a remoteCall (`avg`).
  + The minimum number of milliseconds spent in a remoteCall (`min`).
  + The maximum number of milliseconds spent in a remoteCall (`max`).
+ Total time consumed by remoteCalls to ElasticSearch (`remoteCallTime`).
+ The number of remoteCalls made to ElasticSearch (`remoteCalls`).
+ The number of milliseconds spent in joins of ElasticSearch results (`joinTime`).
+ The number of milliseconds spent in index lookups (`indexTime`).
+ The total number of results returned by ElasticSearch (`remoteResults`).

# Native Gremlin step support in Amazon Neptune
<a name="gremlin-step-support"></a>

The Amazon Neptune engine does not currently have full native support for all Gremlin steps, as explained in [Tuning Gremlin queries](gremlin-traversal-tuning.md). Current support falls into four categories:
+ [Gremlin steps that can always be converted to native Neptune engine operations](#gremlin-steps-always)
+ [Gremlin steps that can be converted to native Neptune engine operations in some cases](#gremlin-steps-sometimes) 
+ [Gremlin steps that are never converted to native Neptune engine operations](#gremlin-steps-never) 
+ [Gremlin steps that are not supported in Neptune at all](#neptune-gremlin-steps-unsupported) 

## Gremlin steps that can always be converted to native Neptune engine operations
<a name="gremlin-steps-always"></a>

Many Gremlin steps can be converted to native Neptune engine operations as long as they meet the following conditions:
+ They are not preceded in the query by a step that cannot be converted.
+ Their parent step, if any, can be converted.
+ All their child traversals, if any, can be converted.

The following Gremlin steps are always converted to native Neptune engine operations if they meet those conditions:
+ [and( )](http://tinkerpop.apache.org/docs/current/reference/#and-step)
+ [as( )](http://tinkerpop.apache.org/docs/current/reference/#as-step)
+ [count( )](http://tinkerpop.apache.org/docs/current/reference/#count-step)
+ [E( )](http://tinkerpop.apache.org/docs/current/reference/#graph-step)
+ [emit( )](http://tinkerpop.apache.org/docs/current/reference/#emit-step)
+ [explain( )](http://tinkerpop.apache.org/docs/current/reference/#explain-step)
+ [group( )](http://tinkerpop.apache.org/docs/current/reference/#group-step)
+ [groupCount( )](http://tinkerpop.apache.org/docs/current/reference/#groupcount-step)
+ [identity( )](http://tinkerpop.apache.org/docs/current/reference/#identity-step)
+ [is( )](http://tinkerpop.apache.org/docs/current/reference/#is-step)
+ [key( )](http://tinkerpop.apache.org/docs/current/reference/#key-step)
+ [label( )](http://tinkerpop.apache.org/docs/current/reference/#label-step)
+ [limit( )](http://tinkerpop.apache.org/docs/current/reference/#limit-step)
+ [local( )](http://tinkerpop.apache.org/docs/current/reference/#local-step)
+ [loops( )](http://tinkerpop.apache.org/docs/current/reference/#loops-step)
+ [not( )](http://tinkerpop.apache.org/docs/current/reference/#not-step)
+ [or( )](http://tinkerpop.apache.org/docs/current/reference/#or-step)
+ [profile( )](http://tinkerpop.apache.org/docs/current/reference/#profile-step)
+ [properties( )](http://tinkerpop.apache.org/docs/current/reference/#properties-step)
+ [subgraph( )](http://tinkerpop.apache.org/docs/current/reference/#subgraph-step)
+ [until( )](http://tinkerpop.apache.org/docs/current/reference/#until-step)
+ [V( )](http://tinkerpop.apache.org/docs/current/reference/#graph-step)
+ [value( )](http://tinkerpop.apache.org/docs/current/reference/#value-step)
+ [valueMap( )](http://tinkerpop.apache.org/docs/current/reference/#valuemap-step)
+ [values( )](http://tinkerpop.apache.org/docs/current/reference/#values-step)

## Gremlin steps that can be converted to native Neptune engine operations in some cases
<a name="gremlin-steps-sometimes"></a>

Some Gremlin steps can be converted to native Neptune engine operations in some situations but not in others:
+ [addE( )](http://tinkerpop.apache.org/docs/current/reference/#addedge-step)   –   The `addE()` step can generally be converted to a native Neptune engine operation, unless it is immediately followed by a `property()` step containing a traversal as a key.
+ [addV( )](http://tinkerpop.apache.org/docs/current/reference/#addvertex-step)   –   The `addV()` step can generally be converted to a native Neptune engine operation, unless it is immediately followed by a `property()` step containing a traversal as a key, or unless multiple labels are assigned.
+ [aggregate( )](http://tinkerpop.apache.org/docs/current/reference/#store-step)   –   The `aggregate()` step can generally be converted to a native Neptune engine operation, unless the step is used in a child traversal or sub-traversal, or unless the value being stored is something other than a vertex, edge, id, label or property value.

  In the example below, `aggregate()` is not converted because it is being used in a child traversal:

  ```
  g.V().has('code','ANC').as('a')
       .project('flights').by(select('a')
       .outE().aggregate('x'))
  ```

  In this example, `aggregate()` is not converted because what is stored is the `min()` of a value:

  ```
  g.V().has('code','ANC').outE().aggregate('x').by(values('dist').min())
  ```
+ [barrier( )](http://tinkerpop.apache.org/docs/current/reference/#barrier-step)   –   The `barrier()` step can generally be converted to a native Neptune engine operation, unless the step following it is not converted.
+ [cap( )](http://tinkerpop.apache.org/docs/current/reference/#cap-step)   –   The only case in which the `cap()` step is converted is when it is combined with the `unfold()` step to return an unfolded version of an aggregate of vertex, edge, id, or property values. In this example, `cap()` is converted because it is followed by `.unfold()`:

  ```
  g.V().has('airport','country','IE').aggregate('airport').limit(2)
       .cap('airport').unfold()
  ```

  However, if you remove the `.unfold()`, `cap()` will not be converted:

  ```
  g.V().has('airport','country','IE').aggregate('airport').limit(2)
       .cap('airport')
  ```
+ [coalesce( )](http://tinkerpop.apache.org/docs/current/reference/#coalesce-step)   –   The only case where the `coalesce()` step is converted is when it follows the [Upsert pattern](http://tinkerpop.apache.org/docs/current/recipes/#element-existence) recommended on the [TinkerPop recipes page](http://tinkerpop.apache.org/docs/current/recipes/). Other `coalesce()` patterns are not converted. Conversion is limited to the case where all child traversals can be converted, they all produce the same type of output (vertex, edge, id, value, key, or label), they all traverse to a new element, and none of them contains a `repeat()` step.
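  For reference, the convertible upsert pattern looks like this (a sketch of the TinkerPop element-existence recipe, using the airport data as an example):

  ```
  g.V().has('code','ANC').fold().
    coalesce(unfold(),
             addV('airport').property('code','ANC'))
  ```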
+ [constant( )](http://tinkerpop.apache.org/docs/current/reference/#constant-step)   –   The `constant()` step is currently only converted if it is used within a `sack().by()` part of a traversal to assign a constant value, like this:

  ```
  g.V().has('code','ANC').sack(assign).by(constant(10)).out().limit(2)
  ```
+ [cyclicPath( )](http://tinkerpop.apache.org/docs/current/reference/#cyclicpath-step)   –   The `cyclicPath()` step can generally be converted to a native Neptune engine operation, unless the step is used with `by()`, `from()`, or `to()` modulators. In the following queries, for example, `cyclicPath()` is not converted:

  ```
  g.V().has('code','ANC').as('a').out().out().cyclicPath().by('code')
  g.V().has('code','ANC').as('a').out().out().cyclicPath().from('a')
  g.V().has('code','ANC').as('a').out().out().cyclicPath().to('a')
  ```
+ [drop( )](http://tinkerpop.apache.org/docs/current/reference/#drop-step)   –   The `drop()` step can generally be converted to a native Neptune engine operation, unless the step is used inside a `sideEffect(`) or `optional()` step.
+ [fold( )](http://tinkerpop.apache.org/docs/current/reference/#fold-step)   –   There are only two situations where the `fold()` step can be converted, namely when it is used in the [Upsert pattern](http://tinkerpop.apache.org/docs/current/recipes/#element-existence) recommended on the [TinkerPop recipes page](http://tinkerpop.apache.org/docs/current/recipes/), and when it is used in a `group().by()` context like this:

  ```
  g.V().has('code','ANC').out().group().by().by(values('code', 'city').fold())
  ```
+  [has( )](http://tinkerpop.apache.org/docs/current/reference/#has-step)   –   The `has()` step can generally be converted to a native Neptune engine operation, provided that queries using `T` use the predicate `P.eq`, `P.neq`, or `P.contains`. Variations of `has()` that imply those instances of `P`, such as `hasId('id1234')` (which is equivalent to `has(T.id, P.eq('id1234'))`), convert to native operations as well.
+ [id( )](http://tinkerpop.apache.org/docs/current/reference/#id-step)   –   The `id()` step is converted unless it is used on a property, like this:

  ```
  g.V().has('code','ANC').properties('code').id()
  ```
+  [mergeE()](https://tinkerpop.apache.org/docs/current/reference/#mergeedge-step)   –   The `mergeE()` step can be converted to a native Neptune engine operation if its parameters (the merge condition, `onCreate`, and `onMatch`) are constant (either `null`, a constant `Map`, or a `select()` of a `Map`). All examples in [upserting edges](https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-efficient-upserts.html#gremlin-upserts-edges) can be converted.
+  [mergeV()](https://tinkerpop.apache.org/docs/current/reference/#mergevertex-step)   –   The `mergeV()` step can be converted to a native Neptune engine operation if its parameters (the merge condition, `onCreate`, and `onMatch`) are constant (either `null`, a constant `Map`, or a `select()` of a `Map`). All examples in [upserting vertices](https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-efficient-upserts.html#gremlin-upserts-vertices) can be converted.
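
  As an illustration, a vertex upsert with constant `Map` parameters (a sketch; the vertex ID and property names here are hypothetical) can be expressed with `mergeV()` like this:

  ```
  g.mergeV([(T.id): 'v1']).
    option(onCreate, [(T.label): 'person', name: 'alice']).
    option(onMatch, [updated: true])
  ```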
+ [order( )](http://tinkerpop.apache.org/docs/current/reference/#order-step)   –   The `order()` step can generally be converted to a native Neptune engine operation, unless one of the following is true:
  + The `order()` step is within a nested child traversal, like this:

    ```
    g.V().has('code','ANC').where(V().out().order().by(id))
    ```
  + Local ordering is being used, as for example with `order(local)`.
  + A custom comparator is being used in the `by()` modulator, as in this use of `sack()`:

    ```
    g.withSack(0).
      V().has('code','ANC').
          repeat(outE().sack(sum).by('dist').inV()).times(2).limit(10).
          order().by(sack())
    ```
  + There are multiple orderings on the same element.
+ [project( )](http://tinkerpop.apache.org/docs/current/reference/#project-step)   –   The `project()` step can generally be converted to a native Neptune engine operation, unless the number of `by()` statements following the `project()` does not match the number of labels specified, as here:

  ```
  g.V().has('code','ANC').project('x', 'y').by(id)
  ```
+ [range( )](http://tinkerpop.apache.org/docs/current/reference/#range-step)   –   The `range()` step is only converted when the lower end of the range in question is zero (for example, `range(0,3)`).
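
  For example, under this rule the first of the following traversals (a hypothetical sketch) can be converted, while the second cannot:

  ```
  // Lower bound is zero: converted
  g.V().has('code','ANC').out().range(0, 3)

  // Lower bound is not zero: not converted
  g.V().has('code','ANC').out().range(2, 5)
  ```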
+ [repeat( )](http://tinkerpop.apache.org/docs/current/reference/#repeat-step)   –   The `repeat()` step can generally be converted to a native Neptune engine operation, unless it is nested within another `repeat()` step, like this:

  ```
  g.V().has('code','ANC').repeat(out().repeat(out()).times(2)).times(2)
  ```
+ [sack( )](http://tinkerpop.apache.org/docs/current/reference/#sack-step)   –   The `sack()` step can generally be converted to a native Neptune engine operation, except in the following cases:
  + If a non-numeric sack operator is being used.
  + If a numeric sack operator other than `+`, `-`, `mult`, `div`, `min` and `max` is being used.
  + If `sack()` is used inside a `where()` step to filter based on a sack value, as here:

    ```
    g.V().has('code','ANC').sack(assign).by(values('code')).where(sack().is('ANC'))
    ```
+ [sum( )](http://tinkerpop.apache.org/docs/current/reference/#sum-step)   –   The `sum()` step can generally be converted to a native Neptune engine operation, but not when used to calculate a global summation, like this:

  ```
  g.V().has('code','ANC').outE('routes').values('dist').sum()
  ```
+ [union( )](http://tinkerpop.apache.org/docs/current/reference/#union-step)   –   The `union()` step can be converted to a native Neptune engine operation as long as it is the last step in the query aside from the terminal step.
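
  For example (a sketch using the air-routes sample data), the first query below keeps `union()` as the final step and so can be converted, while the second follows it with further traversal steps and cannot:

  ```
  // union() is the last step: can be converted
  g.V().has('code','ANC').union(out().values('code'), in().values('code'))

  // Steps follow union(): not converted
  g.V().has('code','ANC').union(out(), in()).out().values('code')
  ```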
+ [unfold( )](http://tinkerpop.apache.org/docs/current/reference/#unfold-step)   –   The `unfold()` step can only be converted to a native Neptune engine operation in two cases: when it is used in the [Upsert pattern](http://tinkerpop.apache.org/docs/current/recipes/#element-existence) recommended on the [TinkerPop recipes page](http://tinkerpop.apache.org/docs/current/recipes/), and when it is used together with `cap()`, like this:

  ```
  g.V().has('airport','country','IE').aggregate('airport').limit(2)
       .cap('airport').unfold()
  ```
+ [where( )](http://tinkerpop.apache.org/docs/current/reference/#where-step)   –   The `where()` step can generally be converted to a native Neptune engine operation, except in the following cases:
  + When `by()` modulations are used, like this:

    ```
    g.V().hasLabel('airport').as('a')
         .where(gt('a')).by('runways')
    ```
  + When comparison operators other than `eq`, `neq`, `within`, and `without` are used.
  + When user-supplied aggregations are used.

## Gremlin steps that are never converted to native Neptune engine operations
<a name="gremlin-steps-never"></a>

The following Gremlin steps are supported in Neptune but are never converted to native Neptune engine operations. Instead, they are executed by the Gremlin server.
+ [choose( )](http://tinkerpop.apache.org/docs/current/reference/#choose-step)
+ [coin( )](http://tinkerpop.apache.org/docs/current/reference/#coin-step)
+ [inject( )](http://tinkerpop.apache.org/docs/current/reference/#inject-step)
+ [match( )](http://tinkerpop.apache.org/docs/current/reference/#match-step)
+ [math( )](http://tinkerpop.apache.org/docs/current/reference/#math-step)
+ [max( )](http://tinkerpop.apache.org/docs/current/reference/#max-step)
+ [mean( )](http://tinkerpop.apache.org/docs/current/reference/#mean-step)
+ [min( )](http://tinkerpop.apache.org/docs/current/reference/#min-step)
+ [option( )](http://tinkerpop.apache.org/docs/current/reference/#option-step)
+ [optional( )](http://tinkerpop.apache.org/docs/current/reference/#optional-step)
+ [path( )](http://tinkerpop.apache.org/docs/current/reference/#path-step)
+ [propertyMap( )](http://tinkerpop.apache.org/docs/current/reference/#propertymap-step)
+ [sample( )](http://tinkerpop.apache.org/docs/current/reference/#sample-step)
+ [skip( )](http://tinkerpop.apache.org/docs/current/reference/#skip-step)
+ [tail( )](http://tinkerpop.apache.org/docs/current/reference/#tail-step)
+ [timeLimit( )](http://tinkerpop.apache.org/docs/current/reference/#timelimit-step)
+ [tree( )](http://tinkerpop.apache.org/docs/current/reference/#tree-step)

## Gremlin steps that are not supported in Neptune at all
<a name="neptune-gremlin-steps-unsupported"></a>

The following Gremlin steps are not supported at all in Neptune. In most cases this is because they require a `GraphComputer`, which Neptune does not currently support.
+ [connectedComponent( )](http://tinkerpop.apache.org/docs/current/reference/#connectedcomponent-step)
+ [io( )](http://tinkerpop.apache.org/docs/current/reference/#io-step)
+ [shortestPath( )](http://tinkerpop.apache.org/docs/current/reference/#shortestpath-step)
+ [withComputer( )](http://tinkerpop.apache.org/docs/current/reference/#with-step)
+ [pageRank( )](http://tinkerpop.apache.org/docs/current/reference/#pagerank-step)
+ [peerPressure( )](http://tinkerpop.apache.org/docs/current/reference/#peerpressure-step)
+ [program( )](http://tinkerpop.apache.org/docs/current/reference/#program-step)

The `io()` step is partially supported: it can be used to `read()` from a URL, but not to `write()`.

# Using Gremlin with the Neptune DFE query engine
<a name="gremlin-with-dfe"></a>

If you enable Neptune's [alternative query engine](neptune-dfe-engine.md), known as the DFE, in [lab mode](features-lab-mode.md) (by setting the `neptune_lab_mode` DB cluster parameter to `DFEQueryEngine=enabled`), then Neptune translates read-only Gremlin queries (traversals) into an intermediate logical representation and runs them on the DFE engine whenever possible.

However, the DFE does not yet support all Gremlin steps. When a step can't be run natively on the DFE, Neptune falls back on TinkerPop to run the step. The `explain` and `profile` reports include warnings when this happens.

# Gremlin step coverage in DFE
<a name="gremlin-step-coverage-in-DFE"></a>

The Gremlin DFE is a lab-mode feature that can be enabled either through the cluster parameter or with the `Neptune#useDFE` query hint. For more information, see [Using Gremlin with the Neptune DFE query engine](https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-with-dfe.html).
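
For example, you can opt a single query into the DFE engine with the query hint like this (a sketch; substitute your own traversal):

```
g.with('Neptune#useDFE', true).
  V().has('code','ANC').out().values('code')
```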

The following steps are available in Gremlin DFE.

## Path and traversal steps
<a name="DFE-path-and-traversal"></a>

 [asDate()](https://tinkerpop.apache.org/docs/current/reference/#asDate-step), [barrier()](https://tinkerpop.apache.org/docs/current/reference/#barrier-step), [call()](https://tinkerpop.apache.org/docs/current/reference/#call-step), [cap()](https://tinkerpop.apache.org/docs/current/reference/#cap-step), [dateAdd()](https://tinkerpop.apache.org/docs/current/reference/#dateadd-step), [dateDiff()](https://tinkerpop.apache.org/docs/current/reference/#datediff-step), [disjunct()](https://tinkerpop.apache.org/docs/current/reference/#disjunct-step), [drop()](https://tinkerpop.apache.org/docs/current/reference/#drop-step), [fail()](https://tinkerpop.apache.org/docs/current/reference/#fail-step), [filter()](https://tinkerpop.apache.org/docs/current/reference/#filter-step), [flatMap()](https://tinkerpop.apache.org/docs/current/reference/#flatmap-step), [id()](https://tinkerpop.apache.org/docs/current/reference/#id-step), [identity()](https://tinkerpop.apache.org/docs/current/reference/#identity-step), [index()](https://tinkerpop.apache.org/docs/current/reference/#index-step), [intersect()](https://tinkerpop.apache.org/docs/current/reference/#intersect-step), [inject()](https://tinkerpop.apache.org/docs/current/reference/#inject-step), [label()](https://tinkerpop.apache.org/docs/current/reference/#label-step), [length()](https://tinkerpop.apache.org/docs/current/reference/#length-step), [loops()](https://tinkerpop.apache.org/docs/current/reference/#loops-step), [map()](https://tinkerpop.apache.org/docs/current/reference/#map-step), [order()](https://tinkerpop.apache.org/docs/current/reference/#order-step), [order(local)](https://tinkerpop.apache.org/docs/current/reference/#order-step), [path()](https://tinkerpop.apache.org/docs/current/reference/#path-step), [project()](https://tinkerpop.apache.org/docs/current/reference/#project-step), [range()](https://tinkerpop.apache.org/docs/current/reference/#range-step), 
[repeat()](https://tinkerpop.apache.org/docs/current/reference/#repeat-step), [reverse()](https://tinkerpop.apache.org/docs/current/reference/#reverse-step), [sack()](https://tinkerpop.apache.org/docs/current/reference/#sack-step), [sample()](https://tinkerpop.apache.org/docs/current/reference/#sample-step), [select()](https://tinkerpop.apache.org/docs/current/reference/#select-step), [sideEffect()](https://tinkerpop.apache.org/docs/current/reference/#sideeffect-step), [split()](https://tinkerpop.apache.org/docs/current/reference/#split-step), [unfold()](https://tinkerpop.apache.org/docs/current/reference/#unfold-step), [union()](https://tinkerpop.apache.org/docs/current/reference/#union-step) 

## Aggregate and collection steps
<a name="DFE-aggregate-and-collection"></a>

[aggregate(global)](https://tinkerpop.apache.org/docs/current/reference/#aggregate-step), [combine()](https://tinkerpop.apache.org/docs/current/reference/#combine-step), [count()](https://tinkerpop.apache.org/docs/current/reference/#count-step), [dedup()](https://tinkerpop.apache.org/docs/current/reference/#dedup-step), [dedup(local)](https://tinkerpop.apache.org/docs/current/reference/#dedup-step), [fold()](https://tinkerpop.apache.org/docs/current/reference/#fold-step), [group()](https://tinkerpop.apache.org/docs/current/reference/#group-step), [groupCount()](https://tinkerpop.apache.org/docs/current/reference/#groupcount-step)

## Mathematical steps
<a name="DFE-mathematical"></a>

 [max()](https://tinkerpop.apache.org/docs/current/reference/#max-step), [mean()](https://tinkerpop.apache.org/docs/current/reference/#mean-step), [min()](https://tinkerpop.apache.org/docs/current/reference/#min-step), [sum()](https://tinkerpop.apache.org/docs/current/reference/#sum-step) 

## Element steps
<a name="DFE-element"></a>

[otherV()](https://tinkerpop.apache.org/docs/current/reference/#otherv-step), [elementMap()](https://tinkerpop.apache.org/docs/current/reference/#elementmap-step), [element()](https://tinkerpop.apache.org/docs/current/reference/#element-step), [V()](https://tinkerpop.apache.org/docs/current/reference/#graph-step), [out(), in(), both(), outE(), inE(), bothE(), outV(), inV(), bothV(), otherV()](https://tinkerpop.apache.org/docs/current/reference/#vertex-step)

## Property steps
<a name="DFE-property"></a>

 [properties()](https://tinkerpop.apache.org/docs/current/reference/#properties-step), [key()](https://tinkerpop.apache.org/docs/current/reference/#key-step), [valueMap()](https://tinkerpop.apache.org/docs/current/reference/#propertymap-step), [value()](https://tinkerpop.apache.org/docs/current/reference/#value-step) 

## Filter steps
<a name="DFE-filter"></a>

 [and()](https://tinkerpop.apache.org/docs/current/reference/#and-step), [coalesce()](https://tinkerpop.apache.org/docs/current/reference/#coalesce-step), [coin()](https://tinkerpop.apache.org/docs/current/reference/#coin-step), [has()](https://tinkerpop.apache.org/docs/current/reference/#has-step), [is()](https://tinkerpop.apache.org/docs/current/reference/#is-step), [local()](https://tinkerpop.apache.org/docs/current/reference/#local-step), [none()](https://tinkerpop.apache.org/docs/current/reference/#none-step), [not()](https://tinkerpop.apache.org/docs/current/reference/#not-step), [or()](https://tinkerpop.apache.org/docs/current/reference/#or-step), [where()](https://tinkerpop.apache.org/docs/current/reference/#where-step) 

## String manipulation steps
<a name="DFE-string-manipulation"></a>

 [concat()](https://tinkerpop.apache.org/docs/current/reference/#concat-step), [lTrim()](https://tinkerpop.apache.org/docs/current/reference/#lTrim-step), [rTrim()](https://tinkerpop.apache.org/docs/current/reference/#rtrim-step), [substring()](https://tinkerpop.apache.org/docs/current/reference/#substring-step), [toLower()](https://tinkerpop.apache.org/docs/current/reference/#toLower-step), [toUpper()](https://tinkerpop.apache.org/docs/current/reference/#toUpper-step), [trim()](https://tinkerpop.apache.org/docs/current/reference/#trim-step) 

## Predicates
<a name="DFE-predicates"></a>
+ [Compare: eq, neq, lt, lte, gt, gte](https://tinkerpop.apache.org/docs/current/reference/#a-note-on-predicates)
+ [Contains: within, without](https://tinkerpop.apache.org/docs/current/reference/#a-note-on-predicates)
+ [TextP: endingWith, containing, notStartingWith, notEndingWith, notContaining](https://tinkerpop.apache.org/docs/current/reference/#a-note-on-predicates)
+ [P: and, or, between, outside, inside](https://tinkerpop.apache.org/docs/current/reference/#a-note-on-predicates)

## Limitations
<a name="gremlin-with-dfe-limitations"></a>

`repeat()` traversals that contain `limit()`, step labels (`as()`), or `dedup()` are not yet supported on the DFE:

```
// With limit() inside the repeat traversal
g.V().has('code','AGR').repeat(out().limit(5)).until(has('code','FRA'))

// With labels inside the repeat traversal
g.V().has('code','AGR').repeat(out().as('a')).until(has('code','FRA'))

// With dedup() inside the repeat traversal
g.V().has('code','AGR').repeat(out().dedup()).until(has('code','FRA'))
```

`path()` with nested `repeat()` steps, or with branching steps, is not yet supported:

```
// Path with branching steps
g.V().has('code','AGR').union(identity(), outE().inV()).path().by('code')

// Path with a nested repeat
g.V().has('code','AGR').repeat(out().union(identity(), out())).path().by('code')
```

## Query planning interleaving
<a name="gremlin-with-dfe-interleaving"></a>

When the translation process encounters a Gremlin step that has no corresponding native DFE operator, it tries, before falling back to TinkerPop, to find other intermediate query parts that can run natively on the DFE engine. It does this by applying interleaving logic to the top-level traversal, so that supported steps are used wherever possible.

Any such intermediate, non-prefix query translation is represented using `NeptuneInterleavingStep` in the `explain` and `profile` outputs.

For performance comparisons, you might want to turn off interleaving in a query while still using the DFE engine to run the prefix part, or you might want to use only the TinkerPop engine for non-prefix query execution. You can do either by using the `disableInterleaving` query hint.

Just as the [useDFE](gremlin-query-hints-useDFE.md) query hint with a value of `false` prevents a query from being run on the DFE at all, the `disableInterleaving` query hint with a value of `true` turns off DFE interleaving for translation of a query. For example:

```
g.with('Neptune#disableInterleaving', true)
 .V().has('genre','drama').in('likes')
```

## Updated Gremlin `explain` and `profile` output
<a name="gremlin-with-dfe-explain-update"></a>

Gremlin [explain](gremlin-explain.md) provides details about the optimized traversal that Neptune uses to run a query. See the [sample DFE `explain` output](gremlin-explain-api.md#gremlin-explain-dfe) for an example of what `explain` output looks like when the DFE engine is enabled.

The [Gremlin `profile` API](gremlin-profile-api.md) runs a specified Gremlin traversal, collects various metrics about the run, and produces a profile report that contains details about the optimized query plan and the runtime statistics of various operators. See [sample DFE `profile` output](gremlin-profile-api.md#gremlin-profile-sample-dfe-output) for an example of what `profile` output looks like when the DFE engine is enabled.

**Note**  
Because the DFE engine is an experimental feature released in lab mode, the exact format of the `explain` and `profile` output is subject to change.

# Accessing the Neptune Graph with openCypher
<a name="access-graph-opencypher"></a>

Neptune supports building graph applications using openCypher, currently one of the most popular query languages for developers working with graph databases. Developers, business analysts, and data scientists like openCypher’s SQL-inspired syntax because it provides a familiar structure to compose queries for graph applications.

**openCypher** is a declarative query language for property graphs that was originally developed by Neo4j, then open-sourced in 2015, and contributed to the [openCypher](http://www.opencypher.org/) project under an Apache 2 open-source license. Its syntax is documented in the [Cypher Query Language Reference, Version 9](https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf).

For the limitations and differences in Neptune support of the openCypher specification, see [openCypher specification compliance in Amazon Neptune](feature-opencypher-compliance.md).

**Note**  
The current Neo4j implementation of the Cypher query language has diverged in some ways from the openCypher specification. If you are migrating current Neo4j Cypher code to Neptune, see [Neptune compatibility with Neo4j](migration-compatibility.md) and [Rewriting Cypher queries to run in openCypher on Neptune](migration-opencypher-rewrites.md) for help.

Starting with engine release 1.1.1.0, openCypher is available for production use in Neptune.

## Gremlin vs. openCypher: similarities and differences
<a name="access-graph-opencypher-overview-with-gremlin"></a>

Gremlin and openCypher are both property-graph query languages, and they are complementary in many ways.

Gremlin was designed to appeal to programmers and fit seamlessly into code. As a result, Gremlin is imperative by design, whereas openCypher's declarative syntax may feel more familiar for people with SQL or SPARQL experience. Gremlin might seem more natural to a data scientist using Python in a Jupyter notebook, whereas openCypher might seem more intuitive to a business user with some SQL background.

The nice thing is that **you don't have to choose** between Gremlin and openCypher in Neptune. Queries in either language can operate on the same graph, regardless of which of the two languages was used to enter that data. You may find it more convenient to use Gremlin for some things and openCypher for others, depending on what you're doing.

Gremlin uses an imperative syntax that lets you control how you move through your graph in a series of steps, each of which takes in a stream of data, performs some action on it (using a filter, map, and so forth), and then outputs the results to the next step. A Gremlin query commonly takes the form, `g.V()`, followed by additional steps.

In openCypher, you use a declarative syntax, inspired by SQL, that specifies a pattern of nodes and relationships to find in your graph using a motif syntax (like `()-[]->()`). An openCypher query often starts with a `MATCH` clause, followed by other clauses such as `WHERE`, `WITH`, and `RETURN`.
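
For a quick side-by-side feel of the two styles (a sketch assuming the air-routes sample data's labels and property names), here is the same question phrased in Gremlin:

```
g.V().has('airport', 'code', 'ANC').out('route').values('code')
```

and in openCypher:

```
MATCH (a:airport {code: 'ANC'})-[:route]->(d) RETURN d.code
```

Both return the codes of airports reachable in one hop from Anchorage.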

# Getting started using openCypher
<a name="access-graph-opencypher-overview-getting-started"></a>

You can query property-graph data in Neptune using openCypher regardless of how it was loaded, but you can't use openCypher to query data loaded as RDF.

The [Neptune bulk loader](bulk-load.md) accepts property-graph data in a [CSV format for Gremlin](bulk-load-tutorial-format-gremlin.md), and in a [CSV format for openCypher](bulk-load-tutorial-format-opencypher.md). Also, of course, you can add property data to your graph using Gremlin and/or openCypher queries.

There are many online tutorials available for learning the Cypher query language. The few quick examples here can give you a feel for the language, but by far the best and easiest way to get started using openCypher to query your Neptune graph is with the openCypher notebooks in the [Neptune workbench](graph-notebooks.md). The workbench is open source, and is hosted on GitHub at [https://github.com/aws-samples/amazon-neptune-samples](https://github.com/aws-samples/amazon-neptune-samples/).

You'll find the openCypher notebooks in the GitHub [Neptune graph-notebook repository](https://github.com/aws/graph-notebook/tree/main/src/graph_notebook/notebooks). In particular, check out the [Air-routes visualization](https://github.com/aws/graph-notebook/blob/main/src/graph_notebook/notebooks/02-Visualization/Air-Routes-openCypher.ipynb) and [English Premier League](https://github.com/aws/graph-notebook/blob/main/src/graph_notebook/notebooks/02-Visualization/EPL-openCypher.ipynb) notebooks for openCypher.

Data processed by openCypher takes the form of an unordered series of key/value maps. The main way to refine, manipulate, and augment these maps is to use clauses that perform tasks such as pattern matching, insertion, update, and deletion on the key/value pairs.

There are several clauses in openCypher for finding data patterns in the graph, of which `MATCH` is the most common. `MATCH` lets you specify the pattern of nodes, relationships, and filters that you want to look for in your graph. For example:
+ **Get all nodes**

  ```
  MATCH (n) RETURN n
  ```
+ **Find connected nodes**

  ```
  MATCH (n)-[r]->(d) RETURN n, r, d
  ```
+ **Find a path**

  ```
  MATCH p=(n)-[r]->(d) RETURN p
  ```
+ **Get all nodes with a label**

  ```
  MATCH (n:airport) RETURN n
  ```

Note that the first query above returns every single node in your graph, and the next two return every node that has a relationship, which is not generally recommended. In almost all cases, you want to narrow down the data being returned, which you can do by specifying node or relationship labels and properties, as in the fourth example.
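
For example (a sketch assuming the air-routes property names), adding a label, a property filter, and an explicit list of returned values keeps the result small:

```
MATCH (a:airport {code: 'ANC'}) RETURN a.city, a.desc
```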

You can find a handy cheat sheet for openCypher syntax in the Neptune [GitHub samples repository](https://github.com/aws-samples/amazon-neptune-samples/tree/master/opencypher/Cheatsheet.md).

# Neptune openCypher status servlet and status endpoint
<a name="access-graph-opencypher-status"></a>

The openCypher status endpoint provides access to information about queries that are currently running on the server or waiting to run. It also lets you cancel those queries. The endpoint is:

```
https://(the server):(the port number)/openCypher/status
```

You can use the HTTP `GET` and `POST` methods to get current status from the server, or to cancel a query. You can also use the `DELETE` method to cancel a running or waiting query.

## Parameters for status requests
<a name="access-graph-opencypher-status-parameters"></a>

**Status query parameters**
+ **`includeWaiting`** (`true` or `false`)   –   When set to `true` and no other parameters are present, causes status information to be returned for waiting queries as well as for running queries.
+ **`cancelQuery`**   –   Used only with `GET` and `POST` methods, to indicate that this is a cancelation request. The `DELETE` method does not need this parameter.

  The value of the `cancelQuery` parameter is not used, but when `cancelQuery` is present, the `queryId` parameter is required, to identify which query to cancel.
+ **`queryId`**   –   Contains the ID of a specific query.

  When used with the `GET` or `POST` method and the `cancelQuery` parameter is not present, `queryId` causes status information to be returned for the specific query it identifies. If the `cancelQuery` parameter is present, then the specific query that `queryId` identifies is canceled.

  When used with the `DELETE` method, `queryId` always indicates a specific query to be canceled.
+ **`silent`**   –   Used only when canceling a query. If set to `true`, the cancelation happens silently.
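
For example, to cancel a query silently with the `DELETE` method (a sketch; substitute your endpoint and the ID of the query you want to cancel):

```
curl -X DELETE \
  "https://your-neptune-endpoint:port/openCypher/status?queryId=your-query-id&silent=true"
```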

## Status request response fields
<a name="access-graph-opencypher-status-response-fields"></a>

**Status response fields if the ID of a specific query is not provided**
+ **acceptedQueryCount**   –   The number of queries that have been accepted but not yet completed, including queries in the queue.
+ **runningQueryCount**   –   The number of currently running openCypher queries.
+ **queries**   –   A list of the current openCypher queries.

**Status response fields for a specific query**
+ **queryId**   –   A GUID that identifies the query. Neptune automatically assigns this ID value to each query, or you can assign your own ID (see [Inject a Custom ID Into a Neptune Gremlin or SPARQL Query](features-query-id.md)).
+ **queryString**   –   The submitted query. This is truncated to 1024 characters if it is longer than that.
+ **queryEvalStats**   –   Statistics for this query:
  + **waited**   –   Indicates how long the query waited, in milliseconds.
  + **elapsed**   –   The number of milliseconds the query has been running so far.
  + **cancelled**   –   `true` if the query has been cancelled, or `false` if it has not.

## Examples of status requests and responses
<a name="access-graph-opencypher-status-samples"></a>
+ **Request for the status of all queries, including those waiting:**

------
#### [ AWS CLI ]

  ```
  aws neptunedata get-open-cypher-query-status \
    --endpoint-url https://your-neptune-endpoint:port \
    --include-waiting
  ```

  For more information, see [get-open-cypher-query-status](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/get-open-cypher-query-status.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

  ```
  import boto3
  from botocore.config import Config
  
  client = boto3.client(
      'neptunedata',
      endpoint_url='https://your-neptune-endpoint:port',
      config=Config(read_timeout=None, retries={'total_max_attempts': 1})
  )
  
  response = client.get_open_cypher_query_status(
      includeWaiting=True
  )
  
  print(response)
  ```

  For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

  ```
  awscurl https://your-neptune-endpoint:port/openCypher/status \
    --region us-east-1 \
    --service neptune-db \
    -X POST \
    -d "includeWaiting=true"
  ```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

  ```
  curl https://your-neptune-endpoint:port/openCypher/status \
    --data-urlencode "includeWaiting=true"
  ```

------

  *Response:*

  ```
  {
    "acceptedQueryCount" : 0,
    "runningQueryCount" : 0,
    "queries" : [ ]
  }
  ```
+ **Request for the status of running queries, *not* including those waiting:**

------
#### [ AWS CLI ]

  ```
  aws neptunedata get-open-cypher-query-status \
    --endpoint-url https://your-neptune-endpoint:port
  ```

  For more information, see [get-open-cypher-query-status](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/get-open-cypher-query-status.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

  ```
  import boto3
  from botocore.config import Config
  
  client = boto3.client(
      'neptunedata',
      endpoint_url='https://your-neptune-endpoint:port',
      config=Config(read_timeout=None, retries={'total_max_attempts': 1})
  )
  
  response = client.get_open_cypher_query_status()
  
  print(response)
  ```

  For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

  ```
  awscurl https://your-neptune-endpoint:port/openCypher/status \
    --region us-east-1 \
    --service neptune-db
  ```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

  ```
  curl https://your-neptune-endpoint:port/openCypher/status
  ```

------

  *Response:*

  ```
  {
    "acceptedQueryCount" : 0,
    "runningQueryCount" : 0,
    "queries" : [ ]
  }
  ```
+ **Request for the status of a single query:**

------
#### [ AWS CLI ]

  ```
  aws neptunedata get-open-cypher-query-status \
    --endpoint-url https://your-neptune-endpoint:port \
    --query-id eadc6eea-698b-4a2f-8554-5270ab17ebee
  ```

  For more information, see [get-open-cypher-query-status](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/get-open-cypher-query-status.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

  ```
  import boto3
  from botocore.config import Config
  
  client = boto3.client(
      'neptunedata',
      endpoint_url='https://your-neptune-endpoint:port',
      config=Config(read_timeout=None, retries={'total_max_attempts': 1})
  )
  
  response = client.get_open_cypher_query_status(
      queryId='eadc6eea-698b-4a2f-8554-5270ab17ebee'
  )
  
  print(response)
  ```

  For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

  ```
  awscurl https://your-neptune-endpoint:port/openCypher/status \
    --region us-east-1 \
    --service neptune-db \
    -X POST \
    -d "queryId=eadc6eea-698b-4a2f-8554-5270ab17ebee"
  ```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

  ```
  curl https://your-neptune-endpoint:port/openCypher/status \
    --data-urlencode "queryId=eadc6eea-698b-4a2f-8554-5270ab17ebee"
  ```

------

  *Response:*

  ```
  {
    "queryId" : "eadc6eea-698b-4a2f-8554-5270ab17ebee",
    "queryString" : "MATCH (n1)-[:knows]->(n2), (n2)-[:knows]->(n3), (n3)-[:knows]->(n4), (n4)-[:knows]->(n5), (n5)-[:knows]->(n6), (n6)-[:knows]->(n7), (n7)-[:knows]->(n8), (n8)-[:knows]->(n9), (n9)-[:knows]->(n10) RETURN COUNT(n1);",
    "queryEvalStats" : {
      "waited" : 0,
      "elapsed" : 23463,
      "cancelled" : false
    }
  }
  ```
+ **Request to cancel a query:**

------
#### [ AWS CLI ]

  ```
  aws neptunedata cancel-open-cypher-query \
    --endpoint-url https://your-neptune-endpoint:port \
    --query-id f43ce17b-db01-4d37-a074-c76d1c26d7a9
  ```

  For more information, see [cancel-open-cypher-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/cancel-open-cypher-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

  ```
  import boto3
  from botocore.config import Config
  
  client = boto3.client(
      'neptunedata',
      endpoint_url='https://your-neptune-endpoint:port',
      config=Config(read_timeout=None, retries={'total_max_attempts': 1})
  )
  
  response = client.cancel_open_cypher_query(
      queryId='f43ce17b-db01-4d37-a074-c76d1c26d7a9'
  )
  
  print(response)
  ```

  For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

  ```
  awscurl https://your-neptune-endpoint:port/openCypher/status \
    --region us-east-1 \
    --service neptune-db \
    -X POST \
    -d "cancelQuery" \
    -d "queryId=f43ce17b-db01-4d37-a074-c76d1c26d7a9"
  ```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

  1. Using `POST`:

  ```
  curl -X POST https://your-neptune-endpoint:port/openCypher/status \
    --data-urlencode "cancelQuery" \
    --data-urlencode "queryId=f43ce17b-db01-4d37-a074-c76d1c26d7a9"
  ```

  2. Using `GET`:

  ```
  curl -X GET https://your-neptune-endpoint:port/openCypher/status \
    --data-urlencode "cancelQuery" \
    --data-urlencode "queryId=588af350-cfde-4222-bee6-b9cedc87180d"
  ```

  3. Using `DELETE`:

  ```
  curl -X DELETE \
    "https://your-neptune-endpoint:port/openCypher/status?queryId=b9a516d1-d25c-4301-bb80-10b2743ecf0e"
  ```

------

  *Response:*

  ```
  {
    "status" : "200 OK",
    "payload" : true
  }
  ```
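The `queryEvalStats` object in the status response above reports `waited` and `elapsed` in milliseconds. Here is a minimal sketch that parses a sample status response (the JSON literal below is abridged from the example above):

```python
import json

# Abridged /openCypher/status response for a single query.
status_json = '''
{
  "queryId" : "eadc6eea-698b-4a2f-8554-5270ab17ebee",
  "queryEvalStats" : {
    "waited" : 0,
    "elapsed" : 23463,
    "cancelled" : false
  }
}
'''

status = json.loads(status_json)
stats = status["queryEvalStats"]

# waited and elapsed are reported in milliseconds.
print(f"query {status['queryId']} has run for {stats['elapsed'] / 1000:.3f} s "
      f"(waited {stats['waited']} ms, cancelled: {stats['cancelled']})")
```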

# The Amazon Neptune openCypher HTTPS endpoint
<a name="access-graph-opencypher-queries"></a>

**Topics**
+ [openCypher read and write queries on the HTTPS endpoint](#access-graph-opencypher-queries-read-write)
+ [The default openCypher JSON results format](#access-graph-opencypher-queries-results-simple-JSON)
+ [Optional HTTP trailing headers for multi-part openCypher responses](#optional-http-trailing-headers)

**Note**  
Neptune does not currently support HTTP/2 for REST API requests. Clients must use HTTP/1.1 when connecting to endpoints.

## openCypher read and write queries on the HTTPS endpoint
<a name="access-graph-opencypher-queries-read-write"></a>

The openCypher HTTPS endpoint supports read and update queries using both the `GET` and `POST` methods. The `DELETE` and `PUT` methods are not supported.

The following instructions walk you through connecting to the openCypher endpoint using the `curl` command and HTTPS. You must follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

The syntax is:

```
HTTPS://(the server):(the port number)/openCypher
```

Here is a sample read query:

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "MATCH (n1) RETURN n1"
```

For more information, see [execute-open-cypher-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_query(
    openCypherQuery='MATCH (n1) RETURN n1'
)

print(response['results'])
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=MATCH (n1) RETURN n1"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=MATCH (n1) RETURN n1"
```

------

Here is a sample write/update query:

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "CREATE (n:Person { age: 25 })"
```

For more information, see [execute-open-cypher-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_query(
    openCypherQuery='CREATE (n:Person { age: 25 })'
)

print(response['results'])
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=CREATE (n:Person { age: 25 })"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=CREATE (n:Person { age: 25 })"
```

------

## The default openCypher JSON results format
<a name="access-graph-opencypher-queries-results-simple-JSON"></a>

The following JSON format is returned by default, or by setting the request header explicitly to `Accept: application/json`. This format is designed to be easily parsed into objects using native-language features of most libraries.

The JSON document that is returned contains one field, `results`, which contains the query return values. The examples below show the JSON formatting for common values.

**Value response example:**

```
{
  "results": [
    {
      "count(a)": 121
    }
  ]
}
```

**Node response example:**

```
{
  "results": [
    {
      "a": {
        "~id": "22",
        "~entityType": "node",
        "~labels": [
          "airport"
        ],
        "~properties": {
          "desc": "Seattle-Tacoma",
          "lon": -122.30899810791,
          "runways": 3,
          "type": "airport",
          "country": "US",
          "region": "US-WA",
          "lat": 47.4490013122559,
          "elev": 432,
          "city": "Seattle",
          "icao": "KSEA",
          "code": "SEA",
          "longest": 11901
        }
      }
    }
  ]
}
```

**Relationship response example:**

```
{
  "results": [
    {
      "r": {
        "~id": "7389",
        "~entityType": "relationship",
        "~start": "22",
        "~end": "151",
        "~type": "route",
        "~properties": {
          "dist": 956
        }
      }
    }
  ]
}
```

**Path response example:**

```
{
  "results": [
    {
      "p": [
        {
          "~id": "22",
          "~entityType": "node",
          "~labels": [
            "airport"
          ],
          "~properties": {
            "desc": "Seattle-Tacoma",
            "lon": -122.30899810791,
            "runways": 3,
            "type": "airport",
            "country": "US",
            "region": "US-WA",
            "lat": 47.4490013122559,
            "elev": 432,
            "city": "Seattle",
            "icao": "KSEA",
            "code": "SEA",
            "longest": 11901
          }
        },
        {
          "~id": "7389",
          "~entityType": "relationship",
          "~start": "22",
          "~end": "151",
          "~type": "route",
          "~properties": {
            "dist": 956
          }
        },
        {
          "~id": "151",
          "~entityType": "node",
          "~labels": [
            "airport"
          ],
          "~properties": {
            "desc": "Ontario International Airport",
            "lon": -117.600997924805,
            "runways": 2,
            "type": "airport",
            "country": "US",
            "region": "US-CA",
            "lat": 34.0559997558594,
            "elev": 944,
            "city": "Ontario",
            "icao": "KONT",
            "code": "ONT",
            "longest": 12198
          }
        }
      ]
    }
  ]
}
```
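Each element in these responses carries an `~entityType` field, so a result can be walked generically. Here is a minimal sketch (using a trimmed version of the path response above) that summarizes each element of a path:

```python
import json

# A trimmed version of the path response shown above.
response = json.loads('''
{
  "results": [
    {
      "p": [
        { "~id": "22",   "~entityType": "node", "~labels": ["airport"],
          "~properties": { "code": "SEA" } },
        { "~id": "7389", "~entityType": "relationship", "~start": "22",
          "~end": "151", "~type": "route", "~properties": { "dist": 956 } },
        { "~id": "151",  "~entityType": "node", "~labels": ["airport"],
          "~properties": { "code": "ONT" } }
      ]
    }
  ]
}
''')

def describe(element):
    # Nodes carry ~labels; relationships carry ~type, ~start, and ~end.
    if element["~entityType"] == "node":
        return f"node {element['~id']} ({', '.join(element['~labels'])})"
    return f"{element['~type']} {element['~start']}->{element['~end']}"

path = response["results"][0]["p"]
print(" | ".join(describe(e) for e in path))
# → node 22 (airport) | route 22->151 | node 151 (airport)
```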

## Optional HTTP trailing headers for multi-part openCypher responses
<a name="optional-http-trailing-headers"></a>

 This feature is available starting with Neptune engine release [1.4.5.0](https://docs.aws.amazon.com/releases/release-1.4.5.0.xml). 

 The HTTP response to openCypher queries and updates is typically returned in multiple chunks. When a failure occurs after the initial response chunks have been sent (with an HTTP status code of 200), it can be challenging to diagnose the issue. By default, Neptune reports such failures by appending an error message to the response body, which can leave the body malformed because part of the response has already been streamed. 

**Using trailing headers**  
 To improve error detection and diagnosis, you can enable trailing headers by including a transfer-encoding trailers header (`TE: trailers`) in your request. Doing this causes Neptune to include two additional header fields in the trailing headers of the response chunks: 
+  `X-Neptune-Status` – contains the response code followed by a short name. For example, on success the trailing header is `X-Neptune-Status: 200 OK`. On failure, the response code is a Neptune engine error code, such as `X-Neptune-Status: 500 TimeLimitExceededException`. 
+  `X-Neptune-Detail` – is empty for successful requests. For errors, it contains the JSON error message. Because only ASCII characters are allowed in HTTP header values, the JSON string is URL encoded. The error message is also still appended to the response body. 

 For more information, see the [MDN page about TE request headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/TE). 
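Because the `X-Neptune-Detail` value is URL encoded, it can be decoded with standard-library tools before being parsed as JSON. Here is a minimal sketch in Python, using an abridged version of the encoded value shown in the example below:

```python
import json
from urllib.parse import unquote_plus

# An abridged X-Neptune-Detail trailer value (URL-encoded JSON error message).
encoded = ("%7B%22code%22%3A%22TimeLimitExceededException%22%2C"
           "%22message%22%3A%22Operation+terminated+%28deadline+exceeded%29%22%7D")

# unquote_plus reverses both the %XX escapes and the '+' space encoding.
error = json.loads(unquote_plus(encoded))
print(error["code"], "-", error["message"])
# → TimeLimitExceededException - Operation terminated (deadline exceeded)
```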

**openCypher trailing headers usage example**  
 This example demonstrates how trailing headers help diagnose a query that exceeds its time limit: 

```
curl --raw 'https://your-neptune-endpoint:port/openCypher' \
-H 'TE: trailers' \
-d 'query=MATCH(n) RETURN n.firstName'
 
 
Output:
< HTTP/1.1 200 OK
< transfer-encoding: chunked
< trailer: X-Neptune-Status, X-Neptune-Detail
< content-type: application/json;charset=UTF-8
< 
< 
{
  "results": [{
      "n.firstName": "Hossein"
    }, {
      "n.firstName": "Jan"
    }, {
      "n.firstName": "Miguel"
    }, {
      "n.firstName": "Eric"
    }, 
{"detailedMessage":"Operation terminated (deadline exceeded)",
"code":"TimeLimitExceededException",
"requestId":"a7e9d2aa-fbb7-486e-8447-2ef2a8544080",
"message":"Operation terminated (deadline exceeded)"}
0
X-Neptune-Status: 500 TimeLimitExceededException
X-Neptune-Detail: %7B%22detailedMessage%22%3A%22Operation+terminated+%28deadline+exceeded%29%22%2C%22code%22%3A%22TimeLimitExceededException%22%2C%22requestId%22%3A%22a7e9d2aa-fbb7-486e-8447-2ef2a8544080%22%2C%22message%22%3A%22Operation+terminated+%28deadline+exceeded%29%22%7D
```

**Response breakdown:**  
 The previous example shows how an openCypher response with trailing headers can help diagnose query failures. It has four sequential parts: (1) the initial headers, whose 200 OK status indicates that streaming has begun; (2) the partial JSON results streamed before the failure, which leave the body invalid; (3) the appended error message describing the timeout; and (4) the trailing headers carrying the final status (`500 TimeLimitExceededException`) and the detailed error information. 

# Using the AWS SDK to run openCypher queries
<a name="access-graph-opencypher-sdk"></a>

With the AWS SDK, you can run openCypher queries against your Neptune graph using a programming language of your choice. The Neptune data API SDK (service name `neptunedata`) provides the [ExecuteOpenCypherQuery](https://docs.aws.amazon.com/neptune/latest/data-api/API_ExecuteOpenCypherQuery.html) action for submitting openCypher queries.

You must run these examples from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB cluster, or from a location that has network connectivity to your cluster endpoint.

Direct links to the API reference documentation for the `neptunedata` service in each SDK language can be found below:


| Programming language | neptunedata API reference | 
| --- | --- | 
| C++ | [https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-neptunedata/html/annotated.html](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-neptunedata/html/annotated.html) | 
| Go | [https://docs.aws.amazon.com/sdk-for-go/api/service/neptunedata/](https://docs.aws.amazon.com/sdk-for-go/api/service/neptunedata/) | 
| Java | [https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/neptunedata/package-summary.html](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/neptunedata/package-summary.html) | 
| JavaScript | [https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-aws-sdk-client-neptunedata/](https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-aws-sdk-client-neptunedata/) | 
| Kotlin | [https://sdk.amazonaws.com/kotlin/api/latest/neptunedata/index.html](https://sdk.amazonaws.com/kotlin/api/latest/neptunedata/index.html) | 
| .NET | [https://docs.aws.amazon.com/sdkfornet/v3/apidocs/items/Neptunedata/NNeptunedata.html](https://docs.aws.amazon.com/sdkfornet/v3/apidocs/items/Neptunedata/NNeptunedata.html) | 
| PHP | [https://docs.aws.amazon.com/aws-sdk-php/v3/api/namespace-Aws.Neptunedata.html](https://docs.aws.amazon.com/aws-sdk-php/v3/api/namespace-Aws.Neptunedata.html) | 
| Python | [https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/neptunedata.html](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/neptunedata.html) | 
| Ruby | [https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/Neptunedata.html](https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/Neptunedata.html) | 
| Rust | [https://crates.io/crates/aws-sdk-neptunedata](https://crates.io/crates/aws-sdk-neptunedata) | 
| CLI | [https://docs.aws.amazon.com/cli/latest/reference/neptunedata/](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/) | 

## openCypher AWS SDK examples
<a name="access-graph-opencypher-sdk-examples"></a>

The following examples show how to set up a `neptunedata` client, run an openCypher query, and print the results. Replace *YOUR\_NEPTUNE\_HOST* and *YOUR\_NEPTUNE\_PORT* with the endpoint and port of your Neptune DB cluster.

**Client-side timeout and retry configuration**  
The SDK client timeout controls how long the *client* waits for a response. It does not control how long the query runs on the server. If the client times out before the server finishes, the query may continue running on Neptune while the client has no way to retrieve the results.  
We recommend setting the client-side read timeout to `0` (no timeout) or to a value that is at least a few seconds longer than the server-side [neptune\_query\_timeout](parameters.md#parameters-db-cluster-parameters-neptune_query_timeout) setting on your Neptune DB cluster. This lets Neptune control when queries time out.  
We also recommend setting the maximum retry attempts to `1` (no retries). If the SDK retries a query that is still running on the server, it can result in duplicate operations. This is especially important for mutation queries, where a retry could cause unintended duplicate writes.

------
#### [ Python ]

1. Follow the [installation instructions](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html) to install Boto3.

1. Create a file named `openCypherExample.py` and paste the following code:

   ```
   import boto3
   import json
   from botocore.config import Config
   
   # Disable the client-side read timeout and retries so that
   # Neptune's server-side neptune_query_timeout controls query duration.
   client = boto3.client(
       'neptunedata',
       endpoint_url='https://YOUR_NEPTUNE_HOST:YOUR_NEPTUNE_PORT',
       config=Config(read_timeout=None, retries={'total_max_attempts': 1})
   )
   
   response = client.execute_open_cypher_query(
       openCypherQuery='MATCH (n) RETURN n LIMIT 1'
   )
   
   print(json.dumps(response['results'], indent=2))
   ```

1. Run the example: `python openCypherExample.py`

------
#### [ Java ]

1. Follow the [installation instructions](https://docs.aws.amazon.com//sdk-for-java/latest/developer-guide/setup.html) to set up the AWS SDK for Java.

1. Use the following code to set up a `NeptunedataClient`, run an openCypher query, and print the result:

   ```
   import java.net.URI;
   import java.time.Duration;
   import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
   import software.amazon.awssdk.core.retry.RetryPolicy;
   import software.amazon.awssdk.services.neptunedata.NeptunedataClient;
   import software.amazon.awssdk.services.neptunedata.model.ExecuteOpenCypherQueryRequest;
   import software.amazon.awssdk.services.neptunedata.model.ExecuteOpenCypherQueryResponse;
   
   // Disable the client-side timeout and retries so that
   // Neptune's server-side neptune_query_timeout controls query duration.
   NeptunedataClient client = NeptunedataClient.builder()
       .endpointOverride(URI.create("https://YOUR_NEPTUNE_HOST:YOUR_NEPTUNE_PORT"))
       .overrideConfiguration(ClientOverrideConfiguration.builder()
           .apiCallTimeout(Duration.ZERO)
           .retryPolicy(RetryPolicy.none())
           .build())
       .build();
   
   ExecuteOpenCypherQueryRequest request = ExecuteOpenCypherQueryRequest.builder()
       .openCypherQuery("MATCH (n) RETURN n LIMIT 1")
       .build();
   
   ExecuteOpenCypherQueryResponse response = client.executeOpenCypherQuery(request);
   
   System.out.println(response.results().toString());
   ```

------
#### [ JavaScript ]

1. Follow the [installation instructions](https://docs.aws.amazon.com//sdk-for-javascript/v3/developer-guide/getting-started-nodejs.html) to set up the AWS SDK for JavaScript. Install the neptunedata client package: `npm install @aws-sdk/client-neptunedata`.

1. Create a file named `openCypherExample.js` and paste the following code:

   ```
   import { NeptunedataClient, ExecuteOpenCypherQueryCommand } from "@aws-sdk/client-neptunedata";
   import { NodeHttpHandler } from "@smithy/node-http-handler";
   
   const config = {
       endpoint: "https://YOUR_NEPTUNE_HOST:YOUR_NEPTUNE_PORT",
       // Disable the client-side request timeout so that
       // Neptune's server-side neptune_query_timeout controls query duration.
       requestHandler: new NodeHttpHandler({
           requestTimeout: 0
       }),
       maxAttempts: 1
   };
   
   const client = new NeptunedataClient(config);
   
   const input = {
       openCypherQuery: "MATCH (n) RETURN n LIMIT 1"
   };
   
   const command = new ExecuteOpenCypherQueryCommand(input);
   const response = await client.send(command);
   
   console.log(JSON.stringify(response, null, 2));
   ```

1. Run the example: `node openCypherExample.js`

------

# Using the Bolt protocol to make openCypher queries to Neptune
<a name="access-graph-opencypher-bolt"></a>

[Bolt](https://boltprotocol.org/) is a statement-oriented client/server protocol initially developed by Neo4j and licensed under the Creative Commons 3.0 [Attribution-ShareAlike](https://creativecommons.org/licenses/by-sa/3.0/) license. It is client-driven, meaning that the client always initiates message exchanges.

To connect to Neptune using Neo4j's Bolt drivers, replace the URL and port number with those of your cluster, using the `bolt` URI scheme. If you have a single Neptune instance running, use the read/write endpoint. If multiple instances are running, two drivers are recommended: one for the writer and another for all the read replicas. If you have only the default two endpoints, a read/write driver and a read-only driver are sufficient, but if you have custom endpoints as well, consider creating a driver instance for each one.

**Note**  
Although the Bolt specification states that Bolt can connect using either TCP or WebSockets, Neptune only supports TCP connections for Bolt.

Neptune allows up to 1000 concurrent Bolt connections on all instance sizes except for t3.medium and t4g.medium. On t3.medium and t4g.medium instances only 512 connections are allowed.

For examples of openCypher queries in various languages that use the Bolt drivers, see the Neo4j [Drivers & Language Guides](https://neo4j.com/developer/language-guides/) documentation.

**Important**  
The Neo4j Bolt drivers for Python, .NET, JavaScript, and Go did not initially support the automatic renewal of AWS Signature v4 authentication tokens. This means that after the signature expired (often in 5 minutes), the driver failed to authenticate, and subsequent requests failed. The Python, .NET, JavaScript, and Go examples below were all affected by this issue.  
See [Neo4j Python driver issue \#834](https://github.com/neo4j/neo4j-python-driver/issues/834), [Neo4j .NET driver issue \#664](https://github.com/neo4j/neo4j-dotnet-driver/issues/664), [Neo4j JavaScript driver issue \#993](https://github.com/neo4j/neo4j-javascript-driver/issues/993), and [Neo4j Go driver issue \#429](https://github.com/neo4j/neo4j-go-driver/issues/429) for more information.  
As of driver version 5.8.0, a new preview re-authentication API was released for the Go driver (see [v5.8.0 - Feedback wanted on re-authentication](https://github.com/neo4j/neo4j-go-driver/discussions/482)).

## Using Bolt with Java to connect to Neptune
<a name="access-graph-opencypher-bolt-java"></a>

You can download a driver for whatever version you want to use from the Maven [MVN repository](https://mvnrepository.com/artifact/org.neo4j.driver/neo4j-java-driver), or can add this dependency to your project:

```
<dependency>
  <groupId>org.neo4j.driver</groupId>
  <artifactId>neo4j-java-driver</artifactId>
  <version>4.3.3</version>
</dependency>
```

Then, to connect to Neptune in Java using one of these Bolt drivers, create a driver instance for the primary/writer instance in your cluster using code like the following:

```
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Config;
import org.neo4j.driver.Config.TrustStrategy;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;

final Driver driver =
  GraphDatabase.driver("bolt://(your cluster endpoint URL):(your cluster port)",
    AuthTokens.none(),
    Config.builder().withEncryption()
                    .withTrustStrategy(TrustStrategy.trustSystemCertificates())
                    .build());
```

If you have one or more reader replicas, you can similarly create a driver instance for them using code like this:

```
final Driver read_only_driver =              // (without connection timeout)
  GraphDatabase.driver("bolt://(your cluster endpoint URL):(your cluster port)",
      Config.builder().withEncryption()
                      .withTrustStrategy(TrustStrategy.trustSystemCertificates())
                      .build());
```

Or, with a timeout:

```
final Driver read_only_timeout_driver =      // (with connection timeout)
  GraphDatabase.driver("bolt://(your cluster endpoint URL):(your cluster port)",
    Config.builder().withConnectionTimeout(30, TimeUnit.SECONDS)
                    .withEncryption()
                    .withTrustStrategy(TrustStrategy.trustSystemCertificates())
                    .build());
```

If you have custom endpoints, it may also be worthwhile to create a driver instance for each one.

## A Python openCypher query example using Bolt
<a name="access-graph-opencypher-bolt-python"></a>

Here is how to make an openCypher query in Python using Bolt:

```
python -m pip install neo4j
```

```
from neo4j import GraphDatabase
uri = "bolt://(your cluster endpoint URL):(your cluster port)"
driver = GraphDatabase.driver(uri, auth=("username", "password"), encrypted=True)
```

Note that the `auth` parameters are ignored.
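The snippet above only creates a driver; to actually run a query, open a session on it. The following sketch is illustrative (the `count_nodes` helper is not part of the driver API) and assumes the `driver` created above:

```python
def count_nodes(driver):
    # Run a simple openCypher query over a Bolt session and
    # return the node count from the single result record.
    with driver.session() as session:
        record = session.run("MATCH (n) RETURN count(n) AS cnt").single()
        return record["cnt"]

# Usage, with the driver from the previous snippet:
#   print(count_nodes(driver))
#   driver.close()
```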

## A .NET openCypher query example using Bolt
<a name="access-graph-opencypher-bolt-dotnet"></a>

To make an openCypher query in .NET using Bolt, the first step is to install the Neo4j driver using NuGet. To make synchronous calls, use the `.Simple` version, like this:

```
Install-Package Neo4j.Driver.Simple -Version 4.3.0
```

```
using Neo4j.Driver;

namespace hello
{
  // This example creates a node and reads a node in a Neptune
  // Cluster where IAM Authentication is not enabled.
  public class HelloWorldExample : IDisposable
  {
    private bool _disposed = false;
    private readonly IDriver _driver;
    private static string url = "bolt://(your cluster endpoint URL):(your cluster port)";
    private static string createNodeQuery = "CREATE (a:Greeting) SET a.message = 'HelloWorldExample'";
    private static string readNodeQuery = "MATCH(n:Greeting) RETURN n.message";

    ~HelloWorldExample() => Dispose(false);

    public HelloWorldExample(string uri)
    {
      _driver = GraphDatabase.Driver(uri, AuthTokens.None, o => o.WithEncryptionLevel(EncryptionLevel.Encrypted));
    }

    public void createNode()
    {
      // Open a session
      using (var session = _driver.Session())
      {
         // Run the query in a write transaction
        var greeting = session.WriteTransaction(tx =>
        {
          var result = tx.Run(createNodeQuery);
          // Consume the result
          return result.Consume();
        });

        // The output will look like this:
        //   ResultSummary{Query=`CREATE (a:Greeting) SET a.message = 'HelloWorldExample".....
        Console.WriteLine(greeting);
      }
    }

    public void retrieveNode()
    {
      // Open a session
      using (var session = _driver.Session())
      {
        // Run the query in a read transaction
        var greeting = session.ReadTransaction(tx =>
        {
          var result = tx.Run(readNodeQuery);
          // Consume the result. Read the single node
          // created in a previous step.
          return result.Single()[0].As<string>();
        });
        // The output will look like this:
        //   HelloWorldExample
        Console.WriteLine(greeting);
      }
    }

    public void Dispose()
    {
      Dispose(true);
      GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
      if (_disposed)
        return;
      if (disposing)
      {
        _driver?.Dispose();
      }
      _disposed = true;
    }

    public static void Main()
    {
      using (var apiCaller = new HelloWorldExample(url))
      {
        apiCaller.createNode();
        apiCaller.retrieveNode();
      }
    }
  }
}
```

## A Java openCypher query example using Bolt with IAM authentication
<a name="access-graph-opencypher-bolt-java-iam-auth"></a>

The Java code below shows how to make openCypher queries in Java using Bolt with IAM authentication. The JavaDoc comment describes its usage. Once a driver instance is available, you can use it to make multiple authenticated requests.

```
package software.amazon.neptune.bolt;

import com.amazonaws.DefaultRequest;
import com.amazonaws.Request;
import com.amazonaws.auth.AWS4Signer;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.http.HttpMethodName;
import com.google.gson.Gson;
import lombok.Builder;
import lombok.Getter;
import lombok.NonNull;
import org.neo4j.driver.Value;
import org.neo4j.driver.Values;
import org.neo4j.driver.internal.security.InternalAuthToken;
import org.neo4j.driver.internal.value.StringValue;

import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import static com.amazonaws.auth.internal.SignerConstants.AUTHORIZATION;
import static com.amazonaws.auth.internal.SignerConstants.HOST;
import static com.amazonaws.auth.internal.SignerConstants.X_AMZ_DATE;
import static com.amazonaws.auth.internal.SignerConstants.X_AMZ_SECURITY_TOKEN;

/**
 * Use this class instead of `AuthTokens.basic` when working with an IAM
 * auth-enabled server. It works the same as `AuthTokens.basic` when using
 * static credentials, and avoids making requests with an expired signature
 * when using temporary credentials. Internally, it generates a new signature
 * on every invocation (this may change in a future implementation).
 *
 * Note that authentication happens only the first time for a pooled connection.
 *
 * Typical usage:
 *
 * NeptuneAuthToken authToken = NeptuneAuthToken.builder()
 *     .credentialsProvider(credentialsProvider)
 *     .region("aws region")
 *     .url("cluster endpoint url")
 *     .build();
 *
 * Driver driver = GraphDatabase.driver(
 *     authToken.getUrl(),
 *     authToken,
 *     config
 * );
 */

public class NeptuneAuthToken extends InternalAuthToken {
  private static final String SCHEME = "basic";
  private static final String REALM = "realm";
  private static final String SERVICE_NAME = "neptune-db";
  private static final String HTTP_METHOD_HDR = "HttpMethod";
  private static final String DUMMY_USERNAME = "username";
  @NonNull
  private final String region;
  @NonNull
  @Getter
  private final String url;
  @NonNull
  private final AWSCredentialsProvider credentialsProvider;
  private final Gson gson = new Gson();

  @Builder
  private NeptuneAuthToken(
      @NonNull final String region,
      @NonNull final String url,
      @NonNull final AWSCredentialsProvider credentialsProvider
  ) {
      // The superclass caches the result of toMap(), which we don't want
      super(Collections.emptyMap());
      this.region = region;
      this.url = url;
      this.credentialsProvider = credentialsProvider;
  }

  @Override
  public Map<String, Value> toMap() {
    final Map<String, Value> map = new HashMap<>();
    map.put(SCHEME_KEY, Values.value(SCHEME));
    map.put(PRINCIPAL_KEY, Values.value(DUMMY_USERNAME));
    map.put(CREDENTIALS_KEY, new StringValue(getSignedHeader()));
    map.put(REALM_KEY, Values.value(REALM));

    return map;
  }

  private String getSignedHeader() {
    final Request<Void> request = new DefaultRequest<>(SERVICE_NAME);
    request.setHttpMethod(HttpMethodName.GET);
    request.setEndpoint(URI.create(url));
    // Comment out the following line if you're using an engine version older than 1.2.0.0
    request.setResourcePath("/opencypher");

    final AWS4Signer signer = new AWS4Signer();
    signer.setRegionName(region);
    signer.setServiceName(request.getServiceName());
    signer.sign(request, credentialsProvider.getCredentials());

    return getAuthInfoJson(request);
  }

  private String getAuthInfoJson(final Request<Void> request) {
    final Map<String, Object> obj = new HashMap<>();
    obj.put(AUTHORIZATION, request.getHeaders().get(AUTHORIZATION));
    obj.put(HTTP_METHOD_HDR, request.getHttpMethod());
    obj.put(X_AMZ_DATE, request.getHeaders().get(X_AMZ_DATE));
    obj.put(HOST, request.getHeaders().get(HOST));
    obj.put(X_AMZ_SECURITY_TOKEN, request.getHeaders().get(X_AMZ_SECURITY_TOKEN));

    return gson.toJson(obj);
  }
}
```

## A Python openCypher query example using Bolt with IAM authentication
<a name="access-graph-opencypher-bolt-python-iam-auth"></a>

The Python class below lets you make openCypher queries using Bolt with IAM authentication:

```
import json

from neo4j import Auth
from botocore.awsrequest import AWSRequest
from botocore.credentials import Credentials
from botocore.auth import (
  SigV4Auth,
  _host_from_url,
)

SCHEME = "basic"
REALM = "realm"
SERVICE_NAME = "neptune-db"
DUMMY_USERNAME = "username"
HTTP_METHOD_HDR = "HttpMethod"
HTTP_METHOD = "GET"
AUTHORIZATION = "Authorization"
X_AMZ_DATE = "X-Amz-Date"
X_AMZ_SECURITY_TOKEN = "X-Amz-Security-Token"
HOST = "Host"


class NeptuneAuthToken(Auth):
  def __init__(
    self,
    credentials: Credentials,
    region: str,
    url: str,
    **parameters
  ):
    # Do NOT add "/opencypher" in the line below if you're using an engine version older than 1.2.0.0
    request = AWSRequest(method=HTTP_METHOD, url=url + "/opencypher")
    request.headers.add_header("Host", _host_from_url(request.url))
    sigv4 = SigV4Auth(credentials, SERVICE_NAME, region)
    sigv4.add_auth(request)

    auth_obj = {
      hdr: request.headers[hdr]
      for hdr in [AUTHORIZATION, X_AMZ_DATE, X_AMZ_SECURITY_TOKEN, HOST]
    }
    auth_obj[HTTP_METHOD_HDR] = request.method
    creds: str = json.dumps(auth_obj)
    super().__init__(SCHEME, DUMMY_USERNAME, creds, REALM, **parameters)
```

You use this class to create a driver as follows:

```
from neo4j import GraphDatabase

# creds is a botocore Credentials object (for example, from
# boto3.Session().get_credentials())
authToken = NeptuneAuthToken(creds, REGION, URL)
driver = GraphDatabase.driver(URL, auth=authToken, encrypted=True)
```

## A Node.js example using IAM authentication and Bolt
<a name="access-graph-opencypher-bolt-nodejs-iam-auth"></a>

The Node.js code below uses the AWS SDK for JavaScript version 3 and ES6 syntax to create a driver that authenticates requests:

```
import neo4j from "neo4j-driver";
import { HttpRequest }  from "@smithy/protocol-http";
import { defaultProvider } from "@aws-sdk/credential-provider-node";
import { SignatureV4 } from "@smithy/signature-v4";
import crypto from "@aws-crypto/sha256-js";
const { Sha256 } = crypto;
import assert from "node:assert";

const region = "us-west-2";
const serviceName = "neptune-db";
const host = "(your cluster endpoint URL)";
const port = 8182;
const protocol = "bolt";
const hostPort = host + ":" + port;
const url = protocol + "://" + hostPort;
const createQuery = "CREATE (n:Greeting {message: 'Hello'}) RETURN ID(n)";
const readQuery = "MATCH(n:Greeting) WHERE ID(n) = $id RETURN n.message";

async function signedHeader() {
  const req = new HttpRequest({
    method: "GET",
    protocol: protocol,
    hostname: host,
    port: port,
    // Comment out the following line if you're using an engine version older than 1.2.0.0
    path: "/opencypher",
    headers: {
      host: hostPort
    }
  });

  const signer = new SignatureV4({
    credentials: defaultProvider(),
    region: region,
    service: serviceName,
    sha256: Sha256
  });

  return signer.sign(req, { unsignableHeaders: new Set(["x-amz-content-sha256"]) })
    .then((signedRequest) => {
      const authInfo = {
        "Authorization": signedRequest.headers["authorization"],
        "HttpMethod": signedRequest.method,
        "X-Amz-Date": signedRequest.headers["x-amz-date"],
        "Host": signedRequest.headers["host"],
        "X-Amz-Security-Token": signedRequest.headers["x-amz-security-token"]
      };
      return JSON.stringify(authInfo);
    });
}

async function createDriver() {
  let authToken = { scheme: "basic", realm: "realm", principal: "username", credentials: await signedHeader() };

  return neo4j.driver(url, authToken, {
      encrypted: "ENCRYPTION_ON",
      trust: "TRUST_SYSTEM_CA_SIGNED_CERTIFICATES",
      maxConnectionPoolSize: 1,
      // logging: neo4j.logging.console("debug")
    }
  );
}

async function unmanagedTxn(driver) {
  const session = driver.session();
  const tx = session.beginTransaction();
  try {
    const created = await tx.run(createQuery);
    const matched = await tx.run(readQuery, { id: created.records[0].get(0) });
    const msg = matched.records[0].get("n.message");
    assert.equal(msg, "Hello");
    await tx.commit();
  } catch (err) {
    // The transaction will be rolled back, now handle the error.
    console.log(err);
  } finally {
    await session.close();
  }
}

const driver = await createDriver();
try {
  await unmanagedTxn(driver);
} catch (err) {
  console.log(err);
} finally {
  await driver.close();
}
```

## A .NET openCypher query example using Bolt with IAM authentication
<a name="access-graph-opencypher-bolt-dotnet-iam-auth"></a>

To enable IAM authentication in .NET, you need to sign a request when establishing the connection. The example below shows how to create a `NeptuneAuthToken` helper to generate an authentication token:

```
using Amazon.Runtime;
using Amazon.Util;
using Neo4j.Driver;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;
using System.Web;

namespace Hello
{
  /*
   * Use this class instead of `AuthTokens.None` when working with an IAM-auth-enabled server.
   *
   * Note that authentication happens only the first time for a pooled connection.
   *
   * Typical usage:
   *
   * var authToken = new NeptuneAuthToken(AccessKey, SecretKey, Region).GetAuthToken(Host);
   * _driver = GraphDatabase.Driver(Url, authToken, o => o.WithEncryptionLevel(EncryptionLevel.Encrypted));
   */

  public class NeptuneAuthToken
  {
    private const string ServiceName = "neptune-db";
    private const string Scheme = "basic";
    private const string Realm = "realm";
    private const string DummyUserName = "username";
    private const string Algorithm = "AWS4-HMAC-SHA256";
    private const string AWSRequest = "aws4_request";

    private readonly string _accessKey;
    private readonly string _secretKey;
    private readonly string _region;

    private readonly string _emptyPayloadHash;

    private readonly SHA256 _sha256;


    public NeptuneAuthToken(string awsKey = null, string secretKey = null, string region = null)
    {
      var awsCredentials = awsKey == null || secretKey == null
        ? FallbackCredentialsFactory.GetCredentials().GetCredentials()
        : null;

      _accessKey = awsKey ?? awsCredentials.AccessKey;
      _secretKey = secretKey ?? awsCredentials.SecretKey;
      _region = region ?? FallbackRegionFactory.GetRegionEndpoint().SystemName; //ex: us-east-1

      _sha256 = SHA256.Create();
      _emptyPayloadHash = Hash(Array.Empty<byte>());
    }

    public IAuthToken GetAuthToken(string url)
    {
      return AuthTokens.Custom(DummyUserName, GetCredentials(url), Realm, Scheme);
    }

    /******************** AWS SIGNING FUNCTIONS *********************/
    private string Hash(byte[] bytesToHash)
    {
      return ToHexString(_sha256.ComputeHash(bytesToHash));
    }

    private static byte[] HmacSHA256(byte[] key, string data)
    {
      return new HMACSHA256(key).ComputeHash(Encoding.UTF8.GetBytes(data));
    }

    private byte[] GetSignatureKey(string dateStamp)
    {
      var kSecret = Encoding.UTF8.GetBytes($"AWS4{_secretKey}");
      var kDate = HmacSHA256(kSecret, dateStamp);
      var kRegion = HmacSHA256(kDate, _region);
      var kService = HmacSHA256(kRegion, ServiceName);
      return HmacSHA256(kService, AWSRequest);
    }

    private static string ToHexString(byte[] array)
    {
      return Convert.ToHexString(array).ToLowerInvariant();
    }

    private string GetCredentials(string url)
    {
      var request = new HttpRequestMessage
      {
        Method = HttpMethod.Get,
        RequestUri = new Uri($"https://{url}/opencypher")
      };

      var signedrequest = Sign(request);

      var headers = new Dictionary<string, object>
      {
        [HeaderKeys.AuthorizationHeader] = signedrequest.Headers.GetValues(HeaderKeys.AuthorizationHeader).FirstOrDefault(),
        ["HttpMethod"] = HttpMethod.Get.ToString(),
        [HeaderKeys.XAmzDateHeader] = signedrequest.Headers.GetValues(HeaderKeys.XAmzDateHeader).FirstOrDefault(),
        // Host should be capitalized, not like in Amazon.Util.HeaderKeys.HostHeader
        ["Host"] = signedrequest.Headers.GetValues(HeaderKeys.HostHeader).FirstOrDefault(),
      };

      return JsonSerializer.Serialize(headers);
    }

    private HttpRequestMessage Sign(HttpRequestMessage request)
    {
      var now = DateTimeOffset.UtcNow;
      var amzdate = now.ToString("yyyyMMddTHHmmssZ");
      var datestamp = now.ToString("yyyyMMdd");

      if (request.Headers.Host == null)
      {
        request.Headers.Host = $"{request.RequestUri.Host}:{request.RequestUri.Port}";
      }

      request.Headers.Add(HeaderKeys.XAmzDateHeader, amzdate);

      var canonicalQueryParams = GetCanonicalQueryParams(request);

      var canonicalRequest = new StringBuilder();
      canonicalRequest.Append(request.Method + "\n");
      canonicalRequest.Append(request.RequestUri.AbsolutePath + "\n");
      canonicalRequest.Append(canonicalQueryParams + "\n");

      var signedHeadersList = new List<string>();
      foreach (var header in request.Headers.OrderBy(a => a.Key.ToLowerInvariant()))
      {
        canonicalRequest.Append(header.Key.ToLowerInvariant());
        canonicalRequest.Append(':');
        canonicalRequest.Append(string.Join(",", header.Value.Select(s => s.Trim())));
        canonicalRequest.Append('\n');
        signedHeadersList.Add(header.Key.ToLowerInvariant());
      }
      canonicalRequest.Append('\n');

      var signedHeaders = string.Join(";", signedHeadersList);
      canonicalRequest.Append(signedHeaders + "\n");
      canonicalRequest.Append(_emptyPayloadHash);

      var credentialScope = $"{datestamp}/{_region}/{ServiceName}/{AWSRequest}";
      var stringToSign = $"{Algorithm}\n{amzdate}\n{credentialScope}\n"
        + Hash(Encoding.UTF8.GetBytes(canonicalRequest.ToString()));

      var signing_key = GetSignatureKey(datestamp);
      var signature = ToHexString(HmacSHA256(signing_key, stringToSign));

      request.Headers.TryAddWithoutValidation(HeaderKeys.AuthorizationHeader,
        $"{Algorithm} Credential={_accessKey}/{credentialScope}, SignedHeaders={signedHeaders}, Signature={signature}");

      return request;
    }

    private static string GetCanonicalQueryParams(HttpRequestMessage request)
    {
      var querystring = HttpUtility.ParseQueryString(request.RequestUri.Query);

      // Query params must be escaped in upper case (i.e. "%2C", not "%2c").
      var queryParams = querystring.AllKeys.OrderBy(a => a)
        .Select(key => $"{key}={Uri.EscapeDataString(querystring[key])}");
      return string.Join("&", queryParams);
    }
  }
}
```

Here is how to make an openCypher query in .NET using Bolt with IAM authentication. The example below uses the `NeptuneAuthToken` helper:

```
using Neo4j.Driver;

namespace Hello
{
  public class HelloWorldExample
  {
    private const string Host = "(your hostname):8182";
    private const string Url = $"bolt://{Host}";
    private const string CreateNodeQuery = "CREATE (a:Greeting) SET a.message = 'HelloWorldExample'";
    private const string ReadNodeQuery = "MATCH(n:Greeting) RETURN n.message";

    private const string AccessKey = "(your access key)";
    private const string SecretKey = "(your secret key)";
    private const string Region = "(your AWS region)"; // e.g. "us-west-2"

    private readonly IDriver _driver;

    public HelloWorldExample()
    {
      var authToken = new NeptuneAuthToken(AccessKey, SecretKey, Region).GetAuthToken(Host);

      // Note that when the connection is reinitialized after max connection lifetime
      // has been reached, the signature token could have already been expired (usually 5 min)
      // You can face exceptions like:
      //   `Unexpected server exception 'Signature expired: XXXX is now earlier than YYYY (ZZZZ - 5 min.)`
      _driver = GraphDatabase.Driver(Url, authToken, o =>
                o.WithMaxConnectionLifetime(TimeSpan.FromMinutes(60)).WithEncryptionLevel(EncryptionLevel.Encrypted));
    }

    public async Task CreateNode()
    {
      // Open a session
      using (var session = _driver.AsyncSession())
      {
        // Run the query in a write transaction
        var greeting = await session.WriteTransactionAsync(async tx =>
        {
          var result = await tx.RunAsync(CreateNodeQuery);
          // Consume the result
          return await result.ConsumeAsync();
        });

        // The output will look like this:
        //   ResultSummary{Query=`CREATE (a:Greeting) SET a.message = 'HelloWorldExample".....
        Console.WriteLine(greeting.Query);
      }
    }

    public async Task RetrieveNode()
    {
      // Open a session
      using (var session = _driver.AsyncSession())
      {
        // Run the query in a read transaction
        var greeting = await session.ReadTransactionAsync(async tx =>
        {
          var result = await tx.RunAsync(ReadNodeQuery);
          var records = await result.ToListAsync();

          // Consume the result. Read the single node
          // created in a previous step.
          return records[0].Values.First().Value;
        });
        // The output will look like this:
        //   HelloWorldExample
        Console.WriteLine(greeting);
      }
    }
  }
}
```

This example can be launched by running the code below on `.NET 6` or `.NET 7` with the following packages:
+ **`Neo4j`**`.Driver=4.3.0`
+ **`AWSSDK`**`.Core=3.7.102.1`

```
namespace Hello
{
  class Program
  {
    static async Task Main()
    {
      var apiCaller = new HelloWorldExample();

      await apiCaller.CreateNode();
      await apiCaller.RetrieveNode();
    }
  }
}
```

## A Golang openCypher query example using Bolt with IAM authentication
<a name="access-graph-opencypher-bolt-golang-iam-auth"></a>

The Golang package below shows how to make openCypher queries in the Go language using Bolt with IAM authentication:

```
package main

import (
  "context"
  "encoding/json"
  "fmt"
  "github.com/aws/aws-sdk-go/aws/credentials"
  "github.com/aws/aws-sdk-go/aws/signer/v4"
  "github.com/neo4j/neo4j-go-driver/v5/neo4j"
  "log"
  "net/http"
  "os"
  "time"
)

const (
  ServiceName   = "neptune-db"
  DummyUsername = "username"
)

// Find node by id using Go driver
func findNode(ctx context.Context, region string, hostAndPort string, nodeId string) (string, error) {
  req, err := http.NewRequest(http.MethodGet, "https://"+hostAndPort+"/opencypher", nil)

  if err != nil {
    return "", fmt.Errorf("error creating request, %v", err)
  }

  // credentials must have been exported as environment variables
  signer := v4.NewSigner(credentials.NewEnvCredentials())
  _, err = signer.Sign(req, nil, ServiceName, region, time.Now())

  if err != nil {
    return "", fmt.Errorf("error signing request: %v", err)
  }

  hdrs := []string{"Authorization", "X-Amz-Date", "X-Amz-Security-Token"}
  hdrMap := make(map[string]string)
  for _, h := range hdrs {
    hdrMap[h] = req.Header.Get(h)
  }

  hdrMap["Host"] = req.Host
  hdrMap["HttpMethod"] = req.Method

  password, err := json.Marshal(hdrMap)
  if err != nil {
    return "", fmt.Errorf("error creating JSON, %v", err)
  }
  authToken := neo4j.BasicAuth(DummyUsername, string(password), "")
  // +s enables encryption with a full certificate check
  // Use +ssc to disable client side TLS verification
  driver, err := neo4j.NewDriverWithContext("bolt+s://"+hostAndPort+"/opencypher", authToken)
  if err != nil {
    return "", fmt.Errorf("error creating driver, %v", err)
  }

  defer driver.Close(ctx)

  if err := driver.VerifyConnectivity(ctx); err != nil {
    log.Fatalf("failed to verify connection, %v", err)
  }

  config := neo4j.SessionConfig{}

  session := driver.NewSession(ctx, config)
  defer session.Close(ctx)

  result, err := session.Run(
    ctx,
    "MATCH (n) WHERE ID(n) = $id RETURN n",
    map[string]any{"id": nodeId},
  )
  if err != nil {
    return "", fmt.Errorf("error running query, %v", err)
  }

  if !result.Next(ctx) {
    return "", fmt.Errorf("node not found")
  }

  n, found := result.Record().Get("n")
  if !found {
    return "", fmt.Errorf("node not found")
  }

  return fmt.Sprintf("+%v\n", n), nil
}

func main() {
  if len(os.Args) < 3 {
    log.Fatal("Usage: go run main.go (region) (host and port)")
  }
  region := os.Args[1]
  hostAndPort := os.Args[2]
  ctx := context.Background()

  res, err := findNode(ctx, region, hostAndPort, "72c2e8c1-7d5f-5f30-10ca-9d2bb8c4afbc")
  if err != nil {
    log.Fatal(err)
  }
  fmt.Println(res)
}
```

## Bolt connection behavior in Neptune
<a name="access-graph-opencypher-bolt-connections"></a>

Here are some things to keep in mind about Neptune Bolt connections:
+ Because Bolt connections are created at the TCP layer, you can't use an [Application Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html) in front of them, as you can with an HTTP endpoint.
+ The port that Neptune uses for Bolt connections is your DB cluster's port.
+ Based on the Bolt preamble passed to it, the Neptune server selects the highest appropriate Bolt version (1, 2, 3, or 4.0).
+ The maximum number of connections to the Neptune server that a client can have open at any point in time is 1,000.
+ If the client doesn't close a connection after a query, that connection can be used to execute the next query. If a connection is idle for 20 minutes, however, the server closes it automatically.
+ If IAM authentication is not enabled, you can use `AuthTokens.none()` rather than supplying a dummy user name and password. For example, in Java:

  ```
  GraphDatabase.driver("bolt://(your cluster endpoint URL):(your cluster port)", AuthTokens.none(),
      Config.builder().withEncryption().withTrustStrategy(TrustStrategy.trustSystemCertificates()).build());
  ```
+ When IAM authentication is enabled, a Bolt connection is always closed by the server a few minutes after it has been open for 10 days, if it hasn't already been closed for some other reason.
+ If the client sends a query for execution over a connection without having consumed the results of a previous query, the new query is discarded. To discard the previous results instead, the client must send a reset message over the connection.
+ Only one transaction at a time can be created on a given connection.
+ If an exception occurs during a transaction, the Neptune server rolls back the transaction and closes the connection. In this case, the driver creates a new connection for the next query.
+ Be aware that sessions are not thread-safe. Multiple parallel operations must use multiple separate sessions.
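Because a SigV4 signature expires after about 5 minutes (as noted in the .NET example's comments above), a cached auth token can go stale between connection attempts when connections are pooled or re-established. Below is a minimal Python sketch of one way to handle this; the `TokenCache` class and the 60-second safety margin are illustrative, not a Neptune API:

```python
import time

SIGNATURE_TTL_SECONDS = 5 * 60  # SigV4 signatures typically expire after about 5 minutes
SAFETY_MARGIN_SECONDS = 60      # illustrative margin: re-sign well before expiry


class TokenCache:
    """Caches a signed auth token and rebuilds it before the signature expires.

    build_token is whatever callable produces a fresh token for your driver
    (for example, one of the NeptuneAuthToken helpers shown above).
    """

    def __init__(self, build_token):
        self._build_token = build_token
        self._token = None
        self._signed_at = 0.0

    def get(self, now=None):
        now = time.time() if now is None else now
        age = now - self._signed_at
        if self._token is None or age > SIGNATURE_TTL_SECONDS - SAFETY_MARGIN_SECONDS:
            # Token is missing or close to expiry: sign a fresh one
            self._token = self._build_token()
            self._signed_at = now
        return self._token
```

Each new Bolt connection (for example, after the 10-day disconnect or an idle timeout) can then call `get()` to obtain a token that is still within its signature window.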

# Examples of openCypher parameterized queries
<a name="opencypher-parameterized-queries"></a>

Neptune supports parameterized openCypher queries. This lets you use the same query structure multiple times with different arguments. Since the query structure doesn't change, Neptune can cache its abstract syntax tree (AST) rather than having to parse it multiple times.

## Example of an openCypher parameterized query using the HTTPS endpoint
<a name="opencypher-http-parameterized-queries"></a>

Below is an example of using a parameterized query with the Neptune openCypher HTTPS endpoint. The query is:

```
MATCH (n {name: $name, age: $age})
RETURN n
```

The parameters are defined as follows:

```
parameters={"name": "john", "age": 20}
```

You can submit the parameterized query like this:

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "MATCH (n {name: \$name, age: \$age}) RETURN n" \
  --parameters '{"name": "john", "age": 20}'
```

For more information, see [execute-open-cypher-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_query(
    openCypherQuery='MATCH (n {name: $name, age: $age}) RETURN n',
    parameters='{"name": "john", "age": 20}'
)

print(response['results'])
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=MATCH (n {name: \$name, age: \$age}) RETURN n" \
  -d 'parameters={"name": "john", "age": 20}'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

Using `POST`:

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=MATCH (n {name: \$name, age: \$age}) RETURN n" \
  -d "parameters={\"name\": \"john\", \"age\": 20}"
```

Using `GET` (URL-encoded):

```
curl -X GET \
  "https://your-neptune-endpoint:port/openCypher?query=MATCH%20%28n%20%7Bname:\$name,age:\$age%7D%29%20RETURN%20n&parameters=%7B%22name%22:%22john%22,%22age%22:20%7D"
```

Using `DIRECT POST`:

```
curl -H "Content-Type: application/opencypher" \
  "https://your-neptune-endpoint:port/openCypher?parameters=%7B%22name%22:%22john%22,%22age%22:20%7D" \
  -d "MATCH (n {name: \$name, age: \$age}) RETURN n"
```
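Rather than hand-building the percent-encoded `GET` URL shown above, you can generate it with Python's standard library. A sketch (the endpoint is a placeholder):

```python
import json
from urllib.parse import quote, urlencode

query = "MATCH (n {name: $name, age: $age}) RETURN n"
parameters = {"name": "john", "age": 20}

# quote_via=quote escapes spaces as %20 (not '+') and percent-encodes
# the $, {, and " characters that openCypher queries and JSON contain
query_string = urlencode(
    {"query": query, "parameters": json.dumps(parameters)},
    quote_via=quote,
)
url = "https://your-neptune-endpoint:port/openCypher?" + query_string
```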

------

## Examples of openCypher parameterized queries using Bolt
<a name="opencypher-bolt-parameterized-queries"></a>

Here is a Python example of an openCypher parameterized query using the Bolt protocol:

```
from neo4j import GraphDatabase
uri = "bolt://[neptune-endpoint-url]:8182"
driver = GraphDatabase.driver(uri, auth=("", ""))

def match_name_and_age(tx, name, age):
  # Parameterized Query
  tx.run("MATCH (n {name: $name, age: $age}) RETURN n", name=name, age=age)

with driver.session() as session:
  # Parameters
  session.read_transaction(match_name_and_age, "john", 20)

driver.close()
```

Here is a Java example of an openCypher parameterized query using the Bolt protocol:

```
Driver driver = GraphDatabase.driver("bolt+s://(your cluster endpoint URL):8182");
HashMap<String, Object> parameters = new HashMap<>();
parameters.put("name", "john");
parameters.put("age", 20);
String queryString = "MATCH (n {name: $name, age: $age}) RETURN n";
Result result = driver.session().run(queryString, parameters);
```

# openCypher data model
<a name="access-graph-opencypher-data-model"></a>

The Neptune openCypher engine builds on the same property-graph model as Gremlin. In particular:
+ Every node has one or more labels. If you insert a node without labels, a default label named `vertex` is attached. If you try to delete all of a node's labels, an error is thrown.
+ A relationship is an entity that has exactly one relationship type and that forms a unidirectional connection between two nodes (that is, *from* one of the nodes *to* the other).
+ Both nodes and relationships can have properties, but don't have to. Neptune supports nodes and relationships with zero properties.
+ Neptune does not support metaproperties, which are not included in the openCypher specification either.
+ Properties in your graph can be multi-valued if they were created using Gremlin. That is, a node or relationship property can hold a set of different values rather than only one. Neptune has extended openCypher semantics to handle multi-valued properties gracefully.

Supported data types are documented in [openCypher data format](bulk-load-tutorial-format-opencypher.md). However, we do not recommend inserting `Array` property values into an openCypher graph at present. Although it is possible to insert an array property value using the bulk loader, the current Neptune openCypher release treats it as a set of multi-valued properties instead of as a single list value.
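Since a property may come back as a single scalar or, if it was written with Gremlin set-cardinality or loaded from an array column, as several values, client code often benefits from normalizing property values to lists. A minimal sketch (the result shape here is illustrative, not the exact driver serialization):

```python
def as_list(value):
    """Normalize a property value to a list.

    Multi-valued properties can be returned as a list of values, while
    single-valued properties come back as a scalar; strings count as
    scalars here, not sequences.
    """
    if value is None:
        return []
    if isinstance(value, (list, tuple, set)):
        return list(value)
    return [value]


# Illustrative node properties as they might appear in a query result
props = {"name": "john", "phone": ["555-0100", "555-0199"]}
normalized = {key: as_list(value) for key, value in props.items()}
```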

Below is the list of data types supported in this release:
+ `Bool`
+ `Byte`
+ `Short`
+ `Int` 
+ `Long`
+ `Float` (Includes plus and minus Infinity and NaN, but not INF)
+ `Double` (Includes plus and minus Infinity and NaN, but not INF)
+ `DateTime` 
+ `String`

# The openCypher `explain` feature
<a name="access-graph-opencypher-explain"></a>

The openCypher `explain` feature is a self-service tool in Amazon Neptune that helps you understand the execution approach taken by the Neptune engine. To invoke it, you add an `explain=mode` parameter to an openCypher [HTTPS](access-graph-opencypher-queries.md) request, where the `mode` value can be one of the following:

+ **`static`**   –   In `static` mode, `explain` prints only the static structure of the query plan. It doesn't actually run the query.
+ **`dynamic`**   –   In `dynamic` mode, `explain` also runs the query, and includes dynamic aspects of the query plan. These may include the number of intermediate bindings flowing through the operators, the ratio of incoming bindings to outgoing bindings, and the total time taken by each operator.
+ **`details`**   –   In `details` mode, `explain` prints the information shown in dynamic mode plus additional details, such as the actual openCypher query string and the estimated range count for the pattern underlying a join operator.

For example, using `POST` with `dynamic` mode:

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-explain-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "MATCH (n) RETURN n LIMIT 1" \
  --explain-mode dynamic
```

For more information, see [execute-open-cypher-explain-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-explain-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_explain_query(
    openCypherQuery='MATCH (n) RETURN n LIMIT 1',
    explainMode='dynamic'
)

print(response['results'].read().decode('utf-8'))
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=MATCH (n) RETURN n LIMIT 1" \
  -d "explain=dynamic"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=MATCH (n) RETURN n LIMIT 1" \
  -d "explain=dynamic"
```

------

## Limitations for openCypher `explain` in Neptune
<a name="access-graph-opencypher-explain-limitations"></a>

The current release of openCypher explain has the following limitations:
+ Explain plans are currently only available for queries that perform read-only operations. Queries that perform any sort of mutation, such as `CREATE`, `DELETE`, `MERGE`, `SET` and so on, are not supported.
+ Operators and output for a specific plan may change in future releases.

## DFE operators in openCypher `explain` output
<a name="access-graph-opencypher-dfe-operators"></a>

To use the information that the openCypher `explain` feature provides, you need to understand some details about how the [DFE query engine](neptune-dfe-engine.md) works (DFE being the engine that Neptune uses to process openCypher queries).

The DFE engine translates every query into a pipeline of operators. Starting from the first operator, intermediate solutions flow from one operator to the next through this operator pipeline. Each row in the explain table represents a result, up to the point of evaluation.

The operators that can appear in a DFE query plan are as follows:

**DFEApply**   –   Executes the function specified in the arguments section, on the value stored in the specified variable

**DFEBindRelation**   –   Binds together variables with the specified names

**DFEChunkLocalSubQuery**   –   This is a non-blocking operation that acts as a wrapper around subqueries being performed.

**DFEDistinctColumn**   –   Returns the distinct subset of the input values based on the variable specified.

**DFEDistinctRelation**   –   Returns the distinct subset of the input solutions based on the variable specified.

**DFEDrain**   –   Appears at the end of a subquery to act as a termination step for that subquery. The number of solutions is recorded as `Units In`. `Units Out` is always zero.

**DFEForwardValue**   –   Copies all input chunks directly as output chunks to be passed to its downstream operator.

**DFEGroupByHashIndex**   –   Performs a group-by operation over the input solutions based on a previously computed hash index (using the `DFEHashIndexBuild` operation). As an output, the given input is extended by a column containing a group key for every input solution.

**DFEHashIndexBuild**   –   Builds a hash index over a set of variables as a side-effect. This hash index is typically reused in later operations. See `DFEHashIndexJoin` or `DFEGroupByHashIndex` for where this hash index might be used.

**DFEHashIndexJoin**   –   Performs a join over the incoming solutions against a previously built hash index. See `DFEHashIndexBuild` for where this hash index might be built.

**DFEJoinExists**   –   Takes a left and right hand input relation, and retains values from the left relation that have a corresponding value in the right relation as defined by the given join variables. 

**DFELoopSubQuery**   –   This is a non-blocking operation that acts as a wrapper for a subquery, allowing it to be run repeatedly for use in loops.

**DFEMergeChunks**   –   This is a blocking operation that combines chunks from its upstream operator into a single chunk of solutions to pass to its downstream operator (inverse of `DFESplitChunks`).

**DFEMinus**   –   Takes a left and right hand input relation, and retains values from the left relation that do not have a corresponding value in the right relation as defined by the given join variables. If there is no overlap in variables across both relations, then this operator simply returns the left hand input relation.

**DFENotExists**   –   Takes a left and right hand input relation, and retains values from the left relation that do not have a corresponding value in the right relation as defined by the given join variables. If there is no overlap in variables across both relations, then this operator returns an empty relation.

**DFEOptionalJoin**   –   Performs a left outer join (also called an OPTIONAL join): solutions from the left-hand side that have at least one join partner on the right-hand side are joined, and solutions from the left-hand side without a join partner on the right-hand side are forwarded as is. This is a blocking operation.

**DFEPipelineJoin**   –   Joins the input against the tuple pattern defined by the `pattern` argument.

**DFEPipelineRangeCount**   –   Counts the number of solutions matching a given pattern, and returns a single one-ary solution containing the count value.

**DFEPipelineScan**   –   Scans the database for the given `pattern` argument, with or without a given filter on column(s).

**DFEProject**   –   Takes multiple input columns and projects only the desired columns.

**DFEReduce**   –   Performs the specified aggregation function on specified variables.

**DFERelationalJoin**   –   Joins the input of the previous operator based on the specified pattern keys using a merge join. This is a blocking operation.

**DFERouteChunks**   –   Takes input chunks from its singular incoming edge and routes those chunks along its multiple outgoing edges.

**DFESelectRows**   –   This operator selectively takes rows from its left input relation to forward to its downstream operator. The rows are selected based on the row identifiers supplied in the operator's right input relation.

**DFESerialize**   –   Serializes a query’s final results into a JSON string serialization, mapping each input solution to the appropriate variable name. For node and edge results, these results are serialized into a map of entity properties and metadata.

**DFESort**   –   Takes an input relation and produces a sorted relation based on the provided sort key.

**DFESplitByGroup**   –   Splits each single input chunk from one incoming edge into smaller output chunks corresponding to row groups identified by row IDs from the corresponding input chunk from the other incoming edge.

**DFESplitChunks**   –   Splits each single input chunk into smaller output chunks (inverse of `DFEMergeChunks`).

**DFEStreamingGroupByHashIndex**   –   Streaming version of `DFEGroupByHashIndex`.

**DFEStreamingHashIndexBuild**   –   Streaming version of `DFEHashIndexBuild`.

**DFESubquery**   –   This operator appears at the beginning of all plans and encapsulates the portions of the plan that are run on the [DFE engine](neptune-dfe-engine.md), which is the entire plan for openCypher.

**DFESymmetricHashJoin**   –   Joins the input of the previous operator based on the specified pattern keys using a hash join. This is a non-blocking operation.

**DFESync**   –   This operator is a synchronization operator supporting non-blocking plans. It takes solutions from two incoming edges and forwards these solutions to the appropriate downstream edges. For synchronization purposes, the inputs along one of these edges may be buffered internally. 

**DFETee**   –   This is a branching operator that sends the same set of solutions to multiple operators.

**DFETermResolution**   –   Performs a localize or globalize operation on its inputs, resulting in columns of either localized or globalized identifiers respectively.

**DFEUnfold**   –   Unfolds lists of values from an input column into the output column as individual elements.

**DFEUnion**   –   Takes two or more input relations and produces a union of those relations using the desired output schema.

**SolutionInjection**   –   Appears before everything else in the explain output, with a value of 1 in the Units Out column. However, it serves as a no-op, and doesn't actually inject any solutions into the DFE engine.

**TermResolution**   –   Appears at the end of plans and translates objects from the Neptune engine into openCypher objects.

## Columns in openCypher `explain` output
<a name="access-graph-opencypher-explain-columns"></a>

The query plan information that Neptune generates as openCypher explain output contains tables with one operator per row. The table has the following columns:

**ID**   –   The numeric ID of this operator in the plan.

**Out #1** (and **Out #2**)   –   The ID(s) of the operator(s) that are downstream from this operator. There can be at most two downstream operators.

**Name**   –   The name of this operator.

**Arguments**   –   Any relevant details for the operator. This includes things like input schema, output schema, pattern (for `PipelineScan` and `PipelineJoin`), and so on.

**Mode**   –   A label describing fundamental operator behavior. This column is mostly blank (`-`). One exception is `TermResolution`, where mode can be `id2value_opencypher`, indicating a resolution from ID to openCypher value.

**Units In**   –   The number of solutions passed as input to this operator. Operators without upstream operators, such as `DFEPipelineScan`, `SolutionInjection`, and a `DFESubquery` with no static value injected, have a value of zero here.

**Units Out**   –   The number of solutions produced as output of this operator. `DFEDrain` is a special case, where the number of solutions being drained is recorded in `Units In` and `Units Out` is always zero.

**Ratio**   –   The ratio of `Units Out` to `Units In`.

**Time (ms)**   –   The CPU time consumed by this operator, in milliseconds.
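
Because the default ASCII output uses a fixed set of columns, these rows can be extracted programmatically. The following is a minimal Python sketch (not an official Neptune API; it assumes the box-drawing table format shown in the examples in this section, and collects rows from all tables in the output together):

```python
def parse_explain_rows(explain_text):
    """Extract operator rows from Neptune's ASCII explain tables.

    Each data row starts with the box-drawing character '║' and its
    cells are separated by '│'. Header rows (ID cell == 'ID') and
    continuation lines (blank ID cell) are skipped, so only the first
    line of each operator is kept.
    """
    columns = ["ID", "Out #1", "Out #2", "Name", "Arguments",
               "Mode", "Units In", "Units Out", "Ratio", "Time (ms)"]
    rows = []
    for line in explain_text.splitlines():
        if not line.startswith("║"):
            continue
        cells = [c.strip() for c in line.strip("║").split("│")]
        if len(cells) != len(columns) or cells[0] in ("", "ID"):
            continue
        rows.append(dict(zip(columns, cells)))
    return rows
```

For example, `max(parse_explain_rows(output), key=lambda r: float(r["Time (ms)"]))` picks out the most expensive operator in a plan.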

## A basic example of openCypher explain output
<a name="access-graph-opencypher-explain-basic-example"></a>

The following is a basic example of openCypher `explain` output. The query is a single-node lookup in the air routes dataset for the node with airport code `ATL`. It invokes `explain` using the `details` mode and the default ASCII output format.

To invoke `explain` for this query:

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-explain-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "MATCH (n {code: 'ATL'}) RETURN n" \
  --explain-mode details
```

For more information, see [execute-open-cypher-explain-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-explain-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_explain_query(
    openCypherQuery="MATCH (n {code: 'ATL'}) RETURN n",
    explainMode='details'
)

print(response['results'].read().decode('utf-8'))
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=MATCH (n {code: 'ATL'}) RETURN n" \
  -d "explain=details"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=MATCH (n {code: 'ATL'}) RETURN n" \
  -d "explain=details"
```

------

The `explain` output:

```
Query:
MATCH (n {code: 'ATL'}) RETURN n

╔════╤════════╤════════╤═══════════════════╤════════════════════╤═════════════════════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name              │ Arguments          │ Mode                │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════╪════════════════════╪═════════════════════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]     │ -                   │ 0        │ 1         │ 0.00  │ 0         ║
╟────┼────────┼────────┼───────────────────┼────────────────────┼─────────────────────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFESubquery       │ subQuery=subQuery1 │ -                   │ 0        │ 1         │ 0.00  │ 4.00      ║
╟────┼────────┼────────┼───────────────────┼────────────────────┼─────────────────────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ -      │ -      │ TermResolution    │ vars=[?n]          │ id2value_opencypher │ 1        │ 1         │ 1.00  │ 2.00      ║
╚════╧════════╧════════╧═══════════════════╧════════════════════╧═════════════════════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery1
╔════╤════════╤════════╤═══════════════════════╤══════════════════════════════════════════════════════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                  │ Arguments                                                                                                    │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFEPipelineScan       │ pattern=Node(?n) with property 'code' as ?n_code2 and label 'ALL'                                            │ -    │ 0        │ 1         │ 0.00  │ 0.21      ║
║    │        │        │                       │ inlineFilters=[(?n_code2 IN ["ATL"^^xsd:string])]                                                            │      │          │           │       │           ║
║    │        │        │                       │ patternEstimate=1                                                                                            │      │          │           │       │           ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFEChunkLocalSubQuery │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#9d84f97c-c3b0-459a-98d5-955a8726b159/graph_1 │ -    │ 1        │ 1         │ 1.00  │ 0.04      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 3      │ -      │ DFEProject            │ columns=[?n]                                                                                                 │ -    │ 1        │ 1         │ 1.00  │ 0.04      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ -      │ -      │ DFEDrain              │ -                                                                                                            │ -    │ 1        │ 0         │ 0.00  │ 0.03      ║
╚════╧════════╧════════╧═══════════════════════╧══════════════════════════════════════════════════════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#9d84f97c-c3b0-459a-98d5-955a8726b159/graph_1
╔════╤════════╤════════╤══════════════════════╤════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                 │ Arguments                                                  │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪══════════════════════╪════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFESolutionInjection │ outSchema=[?n, ?n_code2]                                   │ -    │ 0        │ 1         │ 0.00  │ 0.02      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ 3      │ DFETee               │ -                                                          │ -    │ 1        │ 2         │ 2.00  │ 0.02      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 4      │ -      │ DFEDistinctColumn    │ column=?n                                                  │ -    │ 1        │ 1         │ 1.00  │ 0.20      ║
║    │        │        │                      │ ordered=false                                              │      │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ 5      │ -      │ DFEHashIndexBuild    │ vars=[?n]                                                  │ -    │ 1        │ 1         │ 1.00  │ 0.04      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 4  │ 5      │ -      │ DFEPipelineJoin      │ pattern=Node(?n) with property 'ALL' and label '?n_label1' │ -    │ 1        │ 1         │ 1.00  │ 0.25      ║
║    │        │        │                      │ patternEstimate=3506                                       │      │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 5  │ 6      │ 7      │ DFESync              │ -                                                          │ -    │ 2        │ 2         │ 1.00  │ 0.02      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 6  │ 8      │ -      │ DFEForwardValue      │ -                                                          │ -    │ 1        │ 1         │ 1.00  │ 0.01      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 7  │ 8      │ -      │ DFEForwardValue      │ -                                                          │ -    │ 1        │ 1         │ 1.00  │ 0.01      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 8  │ 9      │ -      │ DFEHashIndexJoin     │ -                                                          │ -    │ 2        │ 1         │ 0.50  │ 0.35      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 9  │ -      │ -      │ DFEDrain             │ -                                                          │ -    │ 1        │ 0         │ 0.00  │ 0.02      ║
╚════╧════════╧════════╧══════════════════════╧════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝
```

At the top level, `SolutionInjection` appears before everything else, with 1 unit out. Note that it doesn't actually inject any solutions; you can see that the next operator, `DFESubquery`, has 0 units in.

After `SolutionInjection` at the top level come the `DFESubquery` and `TermResolution` operators. `DFESubquery` encapsulates the parts of the query execution plan that are pushed down to the [DFE engine](neptune-dfe-engine.md) (for openCypher queries, that is the entire query plan). All the operators in the query plan are nested inside `subQuery1`, which is referenced by `DFESubquery`. The only exception is `TermResolution`, which materializes internal IDs into fully serialized openCypher objects.

All the operators that are pushed down to the DFE engine have names that start with the `DFE` prefix. Because the whole openCypher query plan is executed by the DFE, every operator except the final `TermResolution` operator starts with `DFE`.

Inside `subQuery1`, there can be zero or more `DFEChunkLocalSubQuery` or `DFELoopSubQuery` operators, each of which encapsulates a part of the pushed-down execution plan that runs in a memory-bounded fashion. The `DFEChunkLocalSubQuery` here contains one `DFESolutionInjection` that serves as the input to the subquery. To find the table for that subquery in the output, search for the graph URI given in the `subQuery=` argument in the `Arguments` column of the `DFEChunkLocalSubQuery` or `DFELoopSubQuery` operator.
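
That lookup can be automated by splitting the ASCII output into its component tables first. The following is a minimal sketch (an illustrative helper, assuming the default ASCII format, in which each nested table is introduced by a line that begins with `subQuery`):

```python
def split_explain_tables(explain_text):
    """Split explain output into named tables.

    The top-level table is stored under the key 'top'; each nested
    table is stored under its introducing line (e.g. 'subQuery1' or
    'subQuery=<graph URI>'). Lines inside tables start with box-drawing
    characters, never with 'subQuery', so they don't trigger a split.
    """
    tables = {"top": []}
    current = "top"
    for line in explain_text.splitlines():
        if line.startswith("subQuery"):
            current = line.strip()
            tables[current] = []
        else:
            tables[current].append(line)
    return {name: "\n".join(body).strip() for name, body in tables.items()}
```

Given a `DFEChunkLocalSubQuery` operator's `subQuery=` value, you can then look up the matching table directly in the returned dict.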

In `subQuery1`, `DFEPipelineScan` with `ID` 0 scans the database for the specified `pattern`. The pattern scans for any entity that has the property `code`, saved as the variable `?n_code2`, over all labels (you could filter on a specific label by changing `n` to `n:airport` in the query). The `inlineFilters` argument shows the filtering for the `code` property equalling `ATL`.

Next, the `DFEChunkLocalSubQuery` operator joins the intermediate results of a subquery that contains `DFEPipelineJoin`. This ensures that `?n` is actually a node, since the previous `DFEPipelineScan` scans for any entity with the `code` property.

## Example of `explain` output for a relationship lookup with a limit
<a name="access-graph-opencypher-explain-example-2"></a>

This query looks for relationships of type `route` between two anonymous nodes, and returns at most 10 of them. Again, the `explain` mode is `details` and the output format is the default ASCII format.

Here, `DFEPipelineScan` scans for edges that start from anonymous node `?anon_node7` and end at another anonymous node `?anon_node21`, with a relationship type saved as `?p_type1`. There is a filter for `?p_type1` being `el://route` (where `el` stands for edge label), which corresponds to `[p:route]` in the query string.

`DFEDrain` collects the output solutions with a limit of 10, as shown in its `Arguments` column. `DFEDrain` terminates once the limit is reached or all solutions have been produced, whichever happens first.
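
Conceptually, that drain-with-limit behavior is equivalent to truncating the upstream solution stream, as in this illustrative Python sketch (not Neptune's implementation):

```python
from itertools import islice

def drain(solutions, limit=None):
    """Consume upstream solutions, stopping early once `limit`
    results have been collected (or when the stream is exhausted,
    whichever happens first)."""
    return list(solutions if limit is None else islice(solutions, limit))
```

Because `islice` stops pulling from the iterator after `limit` items, upstream work past the limit is never requested, which mirrors why the operator can terminate early.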

To invoke `explain` for this query:

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-explain-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "MATCH ()-[p:route]->() RETURN p LIMIT 10" \
  --explain-mode details
```

For more information, see [execute-open-cypher-explain-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-explain-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_explain_query(
    openCypherQuery='MATCH ()-[p:route]->() RETURN p LIMIT 10',
    explainMode='details'
)

print(response['results'].read().decode('utf-8'))
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=MATCH ()-[p:route]->() RETURN p LIMIT 10" \
  -d "explain=details"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=MATCH ()-[p:route]->() RETURN p LIMIT 10" \
  -d "explain=details"
```

------

The `explain` output:

```
Query:
MATCH ()-[p:route]->() RETURN p LIMIT 10

╔════╤════════╤════════╤═══════════════════╤════════════════════╤═════════════════════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name              │ Arguments          │ Mode                │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════╪════════════════════╪═════════════════════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]     │ -                   │ 0        │ 1         │ 0.00  │ 0         ║
╟────┼────────┼────────┼───────────────────┼────────────────────┼─────────────────────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFESubquery       │ subQuery=subQuery1 │ -                   │ 0        │ 10        │ 0.00  │ 5.00      ║
╟────┼────────┼────────┼───────────────────┼────────────────────┼─────────────────────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ -      │ -      │ TermResolution    │ vars=[?p]          │ id2value_opencypher │ 10       │ 10        │ 1.00  │ 1.00      ║
╚════╧════════╧════════╧═══════════════════╧════════════════════╧═════════════════════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery1
╔════╤════════╤════════╤═════════════════╤═══════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name            │ Arguments                                                 │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═════════════════╪═══════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFEPipelineScan │ pattern=Edge((?anon_node7)-[?p:?p_type1]->(?anon_node21)) │ -    │ 0        │ 1000      │ 0.00  │ 0.66      ║
║    │        │        │                 │ inlineFilters=[(?p_type1 IN [<el://route>])]              │      │          │           │       │           ║
║    │        │        │                 │ patternEstimate=26219                                     │      │          │           │       │           ║
╟────┼────────┼────────┼─────────────────┼───────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFEProject      │ columns=[?p]                                              │ -    │ 1000     │ 1000      │ 1.00  │ 0.14      ║
╟────┼────────┼────────┼─────────────────┼───────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ -      │ -      │ DFEDrain        │ limit=10                                                  │ -    │ 1000     │ 0         │ 0.00  │ 0.11      ║
╚════╧════════╧════════╧═════════════════╧═══════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝
```

## Example of `explain` output for a value expression function
<a name="access-graph-opencypher-explain-example-3"></a>

The query, which uses the `labels()` value expression function, is:

```
MATCH (a) RETURN DISTINCT labels(a)
```

In the `explain` output below, `DFEPipelineScan` (ID 0) scans all nodes along with their labels. This corresponds to `MATCH (a)`.

`DFEChunkLocalSubQuery` (ID 1) aggregates the labels of each node `?a`. This corresponds to `labels(a)`, which you can see through the `DFEApply` and `DFEReduce` operators.

`DFEBindRelation` (ID 2) renames the generic column `?__gen_labelsOfa2` to `?labels(a)`.

`DFEDistinctRelation` (ID 4) retains only the distinct labels (multiple `airport` nodes would each produce the same duplicate value `["airport"]` for `labels(a)`). This corresponds to `DISTINCT labels(a)`.

To invoke `explain` for this query:

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-explain-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "MATCH (a) RETURN DISTINCT labels(a)" \
  --explain-mode details
```

For more information, see [execute-open-cypher-explain-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-explain-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_explain_query(
    openCypherQuery='MATCH (a) RETURN DISTINCT labels(a)',
    explainMode='details'
)

print(response['results'].read().decode('utf-8'))
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=MATCH (a) RETURN DISTINCT labels(a)" \
  -d "explain=details"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=MATCH (a) RETURN DISTINCT labels(a)" \
  -d "explain=details"
```

------

The `explain` output:

```
Query:
MATCH (a) RETURN DISTINCT labels(a)

╔════╤════════╤════════╤═══════════════════╤════════════════════╤═════════════════════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name              │ Arguments          │ Mode                │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════╪════════════════════╪═════════════════════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]     │ -                   │ 0        │ 1         │ 0.00  │ 0         ║
╟────┼────────┼────────┼───────────────────┼────────────────────┼─────────────────────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFESubquery       │ subQuery=subQuery1 │ -                   │ 0        │ 5         │ 0.00  │ 81.00     ║
╟────┼────────┼────────┼───────────────────┼────────────────────┼─────────────────────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ -      │ -      │ TermResolution    │ vars=[?labels(a)]  │ id2value_opencypher │ 5        │ 5         │ 1.00  │ 1.00      ║
╚════╧════════╧════════╧═══════════════════╧════════════════════╧═════════════════════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery1
╔════╤════════╤════════╤═══════════════════════╤══════════════════════════════════════════════════════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                  │ Arguments                                                                                                    │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFEPipelineScan       │ pattern=Node(?a) with property 'ALL' and label '?a_label1'                                                   │ -    │ 0        │ 3750      │ 0.00  │ 26.77     ║
║    │        │        │                       │ patternEstimate=3506                                                                                         │      │          │           │       │           ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFEChunkLocalSubQuery │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#8b314f55-2cc7-456a-a48a-c76a0465cfab/graph_1 │ -    │ 3750     │ 3750      │ 1.00  │ 0.04      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 3      │ -      │ DFEBindRelation       │ inputVars=[?a, ?__gen_labelsOfa2, ?__gen_labelsOfa2]                                                         │ -    │ 3750     │ 3750      │ 1.00  │ 0.08      ║
║    │        │        │                       │ outputVars=[?a, ?__gen_labelsOfa2, ?labels(a)]                                                               │      │          │           │       │           ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ 4      │ -      │ DFEProject            │ columns=[?labels(a)]                                                                                         │ -    │ 3750     │ 3750      │ 1.00  │ 0.05      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 4  │ 5      │ -      │ DFEDistinctRelation   │ -                                                                                                            │ -    │ 3750     │ 5         │ 0.00  │ 2.78      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 5  │ -      │ -      │ DFEDrain              │ -                                                                                                            │ -    │ 5        │ 0         │ 0.00  │ 0.03      ║
╚════╧════════╧════════╧═══════════════════════╧══════════════════════════════════════════════════════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#8b314f55-2cc7-456a-a48a-c76a0465cfab/graph_1
╔════╤════════╤════════╤══════════════════════╤════════════════════════════════════════════════════════════╤══════════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                 │ Arguments                                                  │ Mode     │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪══════════════════════╪════════════════════════════════════════════════════════════╪══════════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFESolutionInjection │ outSchema=[?a]                                             │ -        │ 0        │ 3750      │ 0.00  │ 0.02      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ 3      │ DFETee               │ -                                                          │ -        │ 3750     │ 7500      │ 2.00  │ 0.02      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 4      │ -      │ DFEProject           │ columns=[?a]                                               │ -        │ 3750     │ 3750      │ 1.00  │ 0.04      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ 17     │ -      │ DFEOptionalJoin      │ -                                                          │ -        │ 7500     │ 3750      │ 0.50  │ 0.44      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 4  │ 5      │ -      │ DFEDistinctRelation  │ -                                                          │ -        │ 3750     │ 3750      │ 1.00  │ 2.23      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 5  │ 6      │ -      │ DFEDistinctColumn    │ column=?a                                                  │ -        │ 3750     │ 3750      │ 1.00  │ 1.50      ║
║    │        │        │                      │ ordered=false                                              │          │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 6  │ 7      │ -      │ DFEPipelineJoin      │ pattern=Node(?a) with property 'ALL' and label '?a_label3' │ -        │ 3750     │ 3750      │ 1.00  │ 10.58     ║
║    │        │        │                      │ patternEstimate=3506                                       │          │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 7  │ 8      │ 9      │ DFETee               │ -                                                          │ -        │ 3750     │ 7500      │ 2.00  │ 0.02      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 8  │ 10     │ -      │ DFEBindRelation      │ inputVars=[?a_label3]                                      │ -        │ 3750     │ 3750      │ 1.00  │ 0.04      ║
║    │        │        │                      │ outputVars=[?100]                                          │          │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 9  │ 11     │ -      │ DFEBindRelation      │ inputVars=[?a, ?a_label3, ?100]                            │ -        │ 7500     │ 3750      │ 0.50  │ 0.07      ║
║    │        │        │                      │ outputVars=[?a, ?a_label3, ?100]                           │          │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 10 │ 9      │ -      │ DFETermResolution    │ column=?100                                                │ id2value │ 3750     │ 3750      │ 1.00  │ 7.60      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 11 │ 12     │ -      │ DFEBindRelation      │ inputVars=[?a, ?a_label3, ?100]                            │ -        │ 3750     │ 3750      │ 1.00  │ 0.06      ║
║    │        │        │                      │ outputVars=[?a, ?100, ?a_label3]                           │          │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 12 │ 13     │ -      │ DFEApply             │ functor=nodeLabel(?a_label3)                               │ -        │ 3750     │ 3750      │ 1.00  │ 0.55      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 13 │ 14     │ -      │ DFEProject           │ columns=[?a, ?a_label3_alias4]                             │ -        │ 3750     │ 3750      │ 1.00  │ 0.05      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 14 │ 15     │ -      │ DFEMergeChunks       │ -                                                          │ -        │ 3750     │ 3750      │ 1.00  │ 0.02      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 15 │ 16     │ -      │ DFEReduce            │ functor=collect(?a_label3_alias4)                          │ -        │ 3750     │ 3750      │ 1.00  │ 6.37      ║
║    │        │        │                      │ segmentationKey=[?a]                                       │          │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 16 │ 3      │ -      │ DFEMergeChunks       │ -                                                          │ -        │ 3750     │ 3750      │ 1.00  │ 0.03      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 17 │ -      │ -      │ DFEDrain             │ -                                                          │ -        │ 3750     │ 0         │ 0.00  │ 0.02      ║
╚════╧════════╧════════╧══════════════════════╧════════════════════════════════════════════════════════════╧══════════╧══════════╧═══════════╧═══════╧═══════════╝
```

# Example of `explain` output for a mathematical value expression function
<a name="access-graph-opencypher-explain-example-4"></a>

In this example, `RETURN abs(-10)` performs a simple evaluation, taking the absolute value of a constant, `-10`.

`DFEChunkLocalSubQuery` (ID 1) performs a solution injection for the static value `-10`, which is stored in the variable `?100`.

`DFEApply` (ID 2) is the operator that executes the absolute value function `abs()` on the static value stored in the `?100` variable.

To invoke `explain` for this query:

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-explain-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "RETURN abs(-10)" \
  --explain-mode details
```

For more information, see [execute-open-cypher-explain-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-explain-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_explain_query(
    openCypherQuery='RETURN abs(-10)',
    explainMode='details'
)

print(response['results'].read().decode('utf-8'))
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=RETURN abs(-10)" \
  -d "explain=details"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=RETURN abs(-10)" \
  -d "explain=details"
```

------

The `explain` output:

```
Query:
RETURN abs(-10)

╔════╤════════╤════════╤═══════════════════╤═══════════════════════╤═════════════════════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name              │ Arguments             │ Mode                │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════╪═══════════════════════╪═════════════════════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]        │ -                   │ 0        │ 1         │ 0.00  │ 0         ║
╟────┼────────┼────────┼───────────────────┼───────────────────────┼─────────────────────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFESubquery       │ subQuery=subQuery1    │ -                   │ 0        │ 1         │ 0.00  │ 4.00      ║
╟────┼────────┼────────┼───────────────────┼───────────────────────┼─────────────────────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ -      │ -      │ TermResolution    │ vars=[?_internalVar1] │ id2value_opencypher │ 1        │ 1         │ 1.00  │ 1.00      ║
╚════╧════════╧════════╧═══════════════════╧═══════════════════════╧═════════════════════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery1
╔════╤════════╤════════╤═══════════════════════╤══════════════════════════════════════════════════════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                  │ Arguments                                                                                                    │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFESolutionInjection  │ outSchema=[]                                                                                                 │ -    │ 0        │ 1         │ 0.00  │ 0.01      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFEChunkLocalSubQuery │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#c4cc6148-cce3-4561-93c0-deb91f257356/graph_1 │ -    │ 1        │ 1         │ 1.00  │ 0.03      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 3      │ -      │ DFEApply              │ functor=abs(?100)                                                                                            │ -    │ 1        │ 1         │ 1.00  │ 0.26      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ 4      │ -      │ DFEBindRelation       │ inputVars=[?_internalVar2, ?_internalVar2]                                                                   │ -    │ 1        │ 1         │ 1.00  │ 0.04      ║
║    │        │        │                       │ outputVars=[?_internalVar2, ?_internalVar1]                                                                  │      │          │           │       │           ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 4  │ 5      │ -      │ DFEProject            │ columns=[?_internalVar1]                                                                                     │ -    │ 1        │ 1         │ 1.00  │ 0.06      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 5  │ -      │ -      │ DFEDrain              │ -                                                                                                            │ -    │ 1        │ 0         │ 0.00  │ 0.05      ║
╚════╧════════╧════════╧═══════════════════════╧══════════════════════════════════════════════════════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝

subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#c4cc6148-cce3-4561-93c0-deb91f257356/graph_1
╔════╤════════╤════════╤══════════════════════╤═════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                 │ Arguments                           │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪══════════════════════╪═════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFESolutionInjection │ solutions=[?100 -> [-10^^<LONG>]]   │ -    │ 0        │ 1         │ 0.00  │ 0.01      ║
║    │        │        │                      │ outSchema=[?100]                    │      │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 3      │ -      │ DFERelationalJoin    │ joinVars=[]                         │ -    │ 2        │ 1         │ 0.50  │ 0.18      ║
╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 1      │ -      │ DFESolutionInjection │ outSchema=[]                        │ -    │ 0        │ 1         │ 0.00  │ 0.01      ║
╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ -      │ -      │ DFEDrain             │ -                                   │ -    │ 1        │ 0         │ 0.00  │ 0.02      ║
╚════╧════════╧════════╧══════════════════════╧═════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝
```
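When comparing plans like the ones above, it can help to pull the operator rows out of the `explain` text programmatically. The following stdlib-only Python sketch is not an official Neptune tool, and its parsing heuristics are assumptions about the table layout shown here; it extracts one record per operator row and totals the `Time (ms)` column:

```python
# Minimal sketch: parse operator rows from Neptune openCypher explain
# output and total the "Time (ms)" column. Illustrative only -- the
# column positions are assumed from the table layout shown above.

def parse_explain_rows(text):
    """Yield one dict per operator row (header and continuation rows are skipped)."""
    for line in text.splitlines():
        if not line.startswith("\u2551"):            # '║' frames data rows
            continue
        cells = [c.strip() for c in line.strip("\u2551").split("\u2502")]
        if len(cells) < 10 or not cells[0].isdigit():
            continue                                 # header or continuation row
        yield {
            "id": int(cells[0]),
            "name": cells[3],
            "units_in": int(cells[-4]),
            "units_out": int(cells[-3]),
            "time_ms": float(cells[-1]),
        }

# A few rows copied from the top-level table of the abs(-10) example:
sample = """
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]        │ -                   │ 0        │ 1         │ 0.00  │ 0         ║
║ 1  │ 2      │ -      │ DFESubquery       │ subQuery=subQuery1    │ -                   │ 0        │ 1         │ 0.00  │ 4.00      ║
║ 2  │ -      │ -      │ TermResolution    │ vars=[?_internalVar1] │ id2value_opencypher │ 1        │ 1         │ 1.00  │ 1.00      ║
"""

rows = list(parse_explain_rows(sample))
print([r["name"] for r in rows])                 # operator names in plan order
print(sum(r["time_ms"] for r in rows))           # 5.0
```

The continuation rows (those with a blank `ID` cell, which carry extra `Arguments` lines) are skipped, so each operator is counted once.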

# Example of `explain` output for a variable-length path (VLP) query
<a name="access-graph-opencypher-explain-example-5"></a>

This is an example of a more complex query plan for handling a variable-length path query. This example only shows part of the `explain` output, for clarity.

In `subQuery1`, `DFEPipelineScan` (ID 0) and `DFEChunkLocalSubQuery` (ID 1), which injects the `...graph_1` subquery, are responsible for scanning for a node with the `YPO` code.

In `subQuery1`, `DFEChunkLocalSubQuery` (ID 2), which injects the `...graph_2` subquery, is responsible for scanning for a node with the `LAX` code.

In `subQuery1`, `DFEChunkLocalSubQuery` (ID 3) injects the `...graph_3` subquery, which contains `DFELoopSubQuery` (ID 17), which in turn injects the `...graph_5` subquery. This operation is responsible for resolving the `-[*2]->` variable-length pattern in the query string between two nodes.
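Conceptually, the `-[*2]->` pattern asks the engine to enumerate all directed paths of exactly two hops between the matched endpoints. The following stdlib-only Python sketch illustrates that idea in the abstract (the graph data here is hypothetical, and this is not how Neptune's loop subquery is implemented):

```python
# Conceptual illustration of what a -[*2]-> pattern resolves:
# all directed paths of exactly two hops from start to end.
# The adjacency data below is hypothetical sample data.
graph = {
    "YPO": ["YYZ", "YVR"],
    "YYZ": ["LAX", "JFK"],
    "YVR": ["LAX"],
}

def paths_of_length(graph, start, end, hops):
    """All directed paths from start to end with exactly `hops` edges."""
    if hops == 0:
        return [[start]] if start == end else []
    out = []
    for nxt in graph.get(start, []):
        for tail in paths_of_length(graph, nxt, end, hops - 1):
            out.append([start] + tail)
    return out

print(paths_of_length(graph, "YPO", "LAX", 2))
# [['YPO', 'YYZ', 'LAX'], ['YPO', 'YVR', 'LAX']]
```

In the real query plan this enumeration is driven by `DFELoopSubQuery`, which re-runs its injected subquery once per hop.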

To invoke `explain` for this query:

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-explain-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "MATCH p=(a {code: 'YPO'})-[*2]->(b{code: 'LAX'}) return p" \
  --explain-mode details
```

For more information, see [execute-open-cypher-explain-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-explain-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_explain_query(
    openCypherQuery="MATCH p=(a {code: 'YPO'})-[*2]->(b{code: 'LAX'}) return p",
    explainMode='details'
)

print(response['results'].read().decode('utf-8'))
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=MATCH p=(a {code: 'YPO'})-[*2]->(b{code: 'LAX'}) return p" \
  -d "explain=details"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=MATCH p=(a {code: 'YPO'})-[*2]->(b{code: 'LAX'}) return p" \
  -d "explain=details"
```

------

The `explain` output:

```
Query:
MATCH p=(a {code: 'YPO'})-[*2]->(b{code: 'LAX'}) return p

╔════╤════════╤════════╤═══════════════════╤════════════════════╤═════════════════════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name              │ Arguments          │ Mode                │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════╪════════════════════╪═════════════════════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]     │ -                   │ 0        │ 1         │ 0.00  │ 0         ║
╟────┼────────┼────────┼───────────────────┼────────────────────┼─────────────────────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFESubquery       │ subQuery=subQuery1 │ -                   │ 0        │ 0         │ 0.00  │ 84.00     ║
╟────┼────────┼────────┼───────────────────┼────────────────────┼─────────────────────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ -      │ -      │ TermResolution    │ vars=[?p]          │ id2value_opencypher │ 0        │ 0         │ 0.00  │ 0         ║
╚════╧════════╧════════╧═══════════════════╧════════════════════╧═════════════════════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery1
╔════╤════════╤════════╤═══════════════════════╤══════════════════════════════════════════════════════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                  │ Arguments                                                                                                    │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFEPipelineScan       │ pattern=Node(?a) with property 'code' as ?a_code7 and label 'ALL'                                            │ -    │ 0        │ 1         │ 0.00  │ 0.68      ║
║    │        │        │                       │ inlineFilters=[(?a_code7 IN ["YPO"^^xsd:string])]                                                            │      │          │           │       │           ║
║    │        │        │                       │ patternEstimate=1                                                                                            │      │          │           │       │           ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFEChunkLocalSubQuery │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#cc05129f-d07e-4622-bbe3-9e99558eca46/graph_1 │ -    │ 1        │ 1         │ 1.00  │ 0.03      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 3      │ -      │ DFEChunkLocalSubQuery │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#cc05129f-d07e-4622-bbe3-9e99558eca46/graph_2 │ -    │ 1        │ 1         │ 1.00  │ 0.02      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ 4      │ -      │ DFEChunkLocalSubQuery │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#cc05129f-d07e-4622-bbe3-9e99558eca46/graph_3 │ -    │ 1        │ 0         │ 0.00  │ 0.04      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 4  │ 5      │ -      │ DFEBindRelation       │ inputVars=[?__gen_path6, ?anon_rel26, ?b_code8, ?b, ?a_code7, ?a, ?__gen_path6]                              │ -    │ 0        │ 0         │ 0.00  │ 0.10      ║
║    │        │        │                       │ outputVars=[?__gen_path6, ?anon_rel26, ?b_code8, ?b, ?a_code7, ?a, ?p]                                       │      │          │           │       │           ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 5  │ 6      │ -      │ DFEProject            │ columns=[?p]                                                                                                 │ -    │ 0        │ 0         │ 0.00  │ 0.05      ║
╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 6  │ -      │ -      │ DFEDrain              │ -                                                                                                            │ -    │ 0        │ 0         │ 0.00  │ 0.02      ║
╚════╧════════╧════════╧═══════════════════════╧══════════════════════════════════════════════════════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#cc05129f-d07e-4622-bbe3-9e99558eca46/graph_1
╔════╤════════╤════════╤══════════════════════╤════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                 │ Arguments                                                  │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪══════════════════════╪════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFESolutionInjection │ outSchema=[?a, ?a_code7]                                   │ -    │ 0        │ 1         │ 0.00  │ 0.01      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ 3      │ DFETee               │ -                                                          │ -    │ 1        │ 2         │ 2.00  │ 0.01      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 4      │ -      │ DFEDistinctColumn    │ column=?a                                                  │ -    │ 1        │ 1         │ 1.00  │ 0.25      ║
║    │        │        │                      │ ordered=false                                              │      │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ 5      │ -      │ DFEHashIndexBuild    │ vars=[?a]                                                  │ -    │ 1        │ 1         │ 1.00  │ 0.05      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 4  │ 5      │ -      │ DFEPipelineJoin      │ pattern=Node(?a) with property 'ALL' and label '?a_label1' │ -    │ 1        │ 1         │ 1.00  │ 0.47      ║
║    │        │        │                      │ patternEstimate=3506                                       │      │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 5  │ 6      │ 7      │ DFESync              │ -                                                          │ -    │ 2        │ 2         │ 1.00  │ 0.04      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 6  │ 8      │ -      │ DFEForwardValue      │ -                                                          │ -    │ 1        │ 1         │ 1.00  │ 0.01      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 7  │ 8      │ -      │ DFEForwardValue      │ -                                                          │ -    │ 1        │ 1         │ 1.00  │ 0.01      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 8  │ 9      │ -      │ DFEHashIndexJoin     │ -                                                          │ -    │ 2        │ 1         │ 0.50  │ 0.26      ║
╟────┼────────┼────────┼──────────────────────┼────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 9  │ -      │ -      │ DFEDrain             │ -                                                          │ -    │ 1        │ 0         │ 0.00  │ 0.02      ║
╚════╧════════╧════════╧══════════════════════╧════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#cc05129f-d07e-4622-bbe3-9e99558eca46/graph_2
╔════╤════════╤════════╤══════════════════════╤═══════════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                 │ Arguments                                                         │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪══════════════════════╪═══════════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ DFEPipelineScan      │ pattern=Node(?b) with property 'code' as ?b_code8 and label 'ALL' │ -    │ 0        │ 1         │ 0.00  │ 0.38      ║
║    │        │        │                      │ inlineFilters=[(?b_code8 IN ["LAX"^^xsd:string])]                 │      │          │           │       │           ║
║    │        │        │                      │ patternEstimate=1                                                 │      │          │           │       │           ║
╟────┼────────┼────────┼──────────────────────┼───────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ DFEMergeChunks       │ -                                                                 │ -    │ 1        │ 1         │ 1.00  │ 0.02      ║
╟────┼────────┼────────┼──────────────────────┼───────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 4      │ -      │ DFERelationalJoin    │ joinVars=[]                                                       │ -    │ 2        │ 1         │ 0.50  │ 0.19      ║
╟────┼────────┼────────┼──────────────────────┼───────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ 2      │ -      │ DFESolutionInjection │ outSchema=[?a, ?a_code7]                                          │ -    │ 0        │ 1         │ 0.00  │ 0         ║
╟────┼────────┼────────┼──────────────────────┼───────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
║ 4  │ -      │ -      │ DFEDrain             │ -                                                                 │ -    │ 1        │ 0         │ 0.00  │ 0.01      ║
╚════╧════════╧════════╧══════════════════════╧═══════════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝


subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#cc05129f-d07e-4622-bbe3-9e99558eca46/graph_3
╔════╤════════╤════════╤═══════════════════════╤══════════════════════════════════════════════════════════════════════════════════════════════════════════════╤══════════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                  │ Arguments                                                                                                    │ Mode     │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════╪══════════╪══════════╪═══════════╪═══════╪═══════════╣
...
║ 17 │ 18     │ -      │ DFELoopSubQuery       │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#cc05129f-d07e-4622-bbe3-9e99558eca46/graph_5 │ -        │ 1        │ 2         │ 2.00  │ 0.31      ║
...
```

# Transactions in Neptune openCypher
<a name="access-graph-opencypher-transactions"></a>

The openCypher implementation in Amazon Neptune uses the [transaction semantics defined by Neptune](transactions-neptune.md). However, the isolation levels provided through the Bolt driver have some specific implications for Bolt transaction semantics, as described in the following sections.

## Read-only Bolt transaction queries
<a name="access-graph-opencypher-transactions-ro"></a>

There are various ways that read-only queries can be processed, with different transaction models and isolation levels, as follows:

### Implicit read-only transaction queries
<a name="access-graph-opencypher-transactions-ro-implicit"></a>

Here is an example of a read-only implicit transaction:

```
public void executeReadImplicitTransaction()
{
  // end point
  final String END_POINT = "(End Point URL)";

  // read query
  final String READ_QUERY = "MATCH (n) RETURN n limit 10";

  // create the driver
  final Driver driver = GraphDatabase.driver(END_POINT, AuthTokens.none(),
          Config.builder().withEncryption()
                          .withTrustStrategy(TrustStrategy.trustSystemCertificates())
                          .build());

  // create the session config
  SessionConfig sessionConfig = SessionConfig.builder()
                                             .withFetchSize(1000)
                                             .withDefaultAccessMode(AccessMode.READ)
                                             .build();

  // create a session and run the query in read access mode
  final Session session = driver.session(sessionConfig);
  session.readTransaction(new TransactionWork<String>()
    {
      final StringBuilder resultCollector = new StringBuilder();

      @Override
      public String execute(final Transaction tx)
      {
        // execute the query
        Result queryResult = tx.run(READ_QUERY);

        // Read the result
        for (Record record : queryResult.list())
        {
          for (String key : record.keys())
          {
            resultCollector.append(key)
                           .append(":")
                           .append(record.get(key).asNode().toString());
          }
        }
        return resultCollector.toString();
      }

    }
  );

  // close the session
  session.close();

  // close the driver
  driver.close();
}
```

Because read-replicas only accept read-only queries, all queries against read-replicas execute as read-implicit transactions regardless of the access mode set in the session configuration. Neptune evaluates read-implicit transactions as [read-only queries](transactions-neptune.md#transactions-neptune-read-only) under `SNAPSHOT` isolation semantics.

In case of failure, read-implicit transactions are retried by default.
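This retry behavior is what a managed transaction function gives you: on a transient failure, the driver re-invokes the transaction work rather than surfacing the error immediately. The following stdlib-only Python sketch illustrates the pattern in the abstract; it is not the Bolt driver's actual retry implementation, and the names and parameters here are purely illustrative:

```python
import time

# Illustrative sketch of managed-transaction retry semantics: the
# transaction work is re-invoked on transient failure, the way a
# read-implicit transaction is retried by default. This is NOT the
# Bolt driver's real retry logic; names here are hypothetical.

class TransientError(Exception):
    """Stand-in for a driver-reported retryable failure."""

def run_with_retry(work, max_attempts=3, backoff_s=0.0):
    """Call work() until it succeeds or attempts are exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except TransientError:
            if attempt == max_attempts:
                raise                    # give up: surface the failure
            time.sleep(backoff_s)        # real drivers back off with jitter

# Usage: a "transaction" that fails twice, then succeeds.
calls = {"n": 0}

def flaky_read():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("connection reset")
    return ["row-1", "row-2"]

print(run_with_retry(flaky_read))        # succeeds on the third attempt
```

By contrast, autocommit queries (described next) run the statement once and leave any retry to the application.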

### Autocommit read-only transaction queries
<a name="access-graph-opencypher-transactions-ro-autocommit"></a>

Here is an example of a read-only autocommit transaction:

```
public void executeAutoCommitTransaction()
{
  // end point
  final String END_POINT = "(End Point URL)";

  // read query
  final String READ_QUERY = "MATCH (n) RETURN n limit 10";

  // Create the session config.
  final SessionConfig sessionConfig = SessionConfig
    .builder()
    .withFetchSize(1000)
    .withDefaultAccessMode(AccessMode.READ)
    .build();

  // create the driver
  final Driver driver = GraphDatabase.driver(END_POINT, AuthTokens.none(),
    Config.builder()
          .withEncryption()
          .withTrustStrategy(TrustStrategy.trustSystemCertificates())
          .build());

  // result collector
  final StringBuilder resultCollector = new StringBuilder();

  // create a session
  final Session session = driver.session(sessionConfig);

  // run the query
  final Result queryResult = session.run(READ_QUERY);
  for (final Record record : queryResult.list())
  {
    for (String key : record.keys())
    {
      resultCollector.append(key)
                     .append(":")
                     .append(record.get(key).asNode().toString());
    }
  }

  // close the session
  session.close();

  // close the driver
  driver.close();
}
```

If the access mode is set to `READ` in the session configuration, Neptune evaluates autocommit transaction queries as [read-only queries](transactions-neptune.md#transactions-neptune-read-only) under `SNAPSHOT` isolation semantics. Note that read-replicas only accept read-only queries.

If you don't pass in a session configuration, autocommit queries are processed by default with mutation query isolation, so it is important to pass in a session configuration that explicitly sets the access mode to `READ`.

In case of failure, read-only autocommit queries are not re-tried.

### Explicit read-only transaction queries
<a name="access-graph-opencypher-transactions-ro-explicit"></a>

Here is an example of an explicit read-only transaction:

```
public void executeReadExplicitTransaction()
{
  // end point
  final String END_POINT = "(End Point URL)";

  // read query
  final String READ_QUERY = "MATCH (n) RETURN n limit 10";

  // Create the session config.
  final SessionConfig sessionConfig = SessionConfig
    .builder()
    .withFetchSize(1000)
    .withDefaultAccessMode(AccessMode.READ)
    .build();

  // create the driver
  final Driver driver = GraphDatabase.driver(END_POINT, AuthTokens.none(),
    Config.builder()
          .withEncryption()
          .withTrustStrategy(TrustStrategy.trustSystemCertificates())
          .build());

  // result collector
  final StringBuilder resultCollector = new StringBuilder();

  // create a session
  final Session session = driver.session(sessionConfig);

  // begin transaction
  final Transaction tx = session.beginTransaction();

  // run the query on transaction
  final List<Record> list = tx.run(READ_QUERY).list();

  // read the result
  for (final Record record : list)
  {
    for (String key : record.keys())
    {
      resultCollector
        .append(key)
        .append(":")
        .append(record.get(key).asNode().toString());
    }
  }

  // commit the transaction; to roll back instead, use tx.rollback();
  tx.commit();

  // close the session
  session.close();

  // close the driver
  driver.close();
}
```

If the access mode is set to `READ` in the session configuration, Neptune evaluates explicit read-only transactions as [read-only queries](transactions-neptune.md#transactions-neptune-read-only) under `SNAPSHOT` isolation semantics. Note that read-replicas only accept read-only queries.

If you don't pass in a session configuration, explicit read-only transactions are processed by default with mutation query isolation, so it is important to pass in a session configuration that explicitly sets the access mode to `READ`.

In case of failure, read-only explicit queries are retried by default.

## Mutation Bolt transaction queries
<a name="access-graph-opencypher-transactions-wr"></a>

As with read-only queries, there are various ways that mutation queries can be processed, with different transaction models and isolation levels, as follows:

### Implicit mutation transaction queries
<a name="access-graph-opencypher-transactions-wr-implicit"></a>

Here is an example of an implicit mutation transaction:

```
public void executeWriteImplicitTransaction()
{
  // end point
  final String END_POINT = "(End Point URL)";

  // create node with label as label and properties.
  final String WRITE_QUERY = "CREATE (n:label {name : 'foo'})";

  // Read the vertex created with label as label.
  final String READ_QUERY = "MATCH (n:label) RETURN n";

  // create the driver
  final Driver driver = GraphDatabase.driver(END_POINT, AuthTokens.none(),
    Config.builder()
          .withEncryption()
          .withTrustStrategy(TrustStrategy.trustSystemCertificates())
          .build());

  // create the session config
  SessionConfig sessionConfig = SessionConfig
    .builder()
    .withFetchSize(1000)
    .withDefaultAccessMode(AccessMode.WRITE)
    .build();

  final StringBuilder resultCollector = new StringBuilder();

  // run the query as access mode write
  driver.session(sessionConfig).writeTransaction(new TransactionWork<String>()
  {
    @Override
    public String execute(final Transaction tx)
    {
      // execute the write query and consume the result.
      tx.run(WRITE_QUERY).consume();

      // read the vertex written in the same transaction
      final List<Record> list = tx.run(READ_QUERY).list();

      // read the result
      for (final Record record : list)
      {
        for (String key : record.keys())
        {
          resultCollector
            .append(key)
            .append(":")
            .append(record.get(key).asNode().toString());
        }
      }
      return resultCollector.toString();
    }
  }); // at the end, the transaction is automatically committed.

  // close the driver.
  driver.close();
}
```

Reads made as part of mutation queries are executed under `READ COMMITTED` isolation with the usual guarantees for [Neptune mutation transactions](transactions-neptune.md#transactions-neptune-mutation).

Whether or not you specifically pass in a session configuration, the transaction is always treated as a write transaction.

For conflicts, see [Conflict Resolution Using Lock-Wait Timeouts](transactions-neptune.md#transactions-neptune-conflicts).

### Autocommit mutation transaction queries
<a name="access-graph-opencypher-transactions-wr-autocommit"></a>

Mutation autocommit queries inherit the same behavior as mutation implicit transactions.

If you do not pass in a session configuration, the transaction is treated as a write transaction by default.

In case of failure, mutation autocommit queries are not automatically retried.

### Explicit mutation transaction queries
<a name="access-graph-opencypher-transactions-wr-explicit"></a>

Here is an example of an explicit mutation transaction:

```
public void executeWriteExplicitTransaction()
{
  // end point
  final String END_POINT = "(End Point URL)";

  // create node with label as label and properties.
  final String WRITE_QUERY = "CREATE (n:label {name : 'foo'})";

  // Read the vertex created with label as label.
  final String READ_QUERY = "MATCH (n:label) RETURN n";

  // create the driver
  final Driver driver = GraphDatabase.driver(END_POINT, AuthTokens.none(),
    Config.builder()
          .withEncryption()
          .withTrustStrategy(TrustStrategy.trustSystemCertificates())
          .build());

  // create the session config
  SessionConfig sessionConfig = SessionConfig
    .builder()
    .withFetchSize(1000)
    .withDefaultAccessMode(AccessMode.WRITE)
    .build();

  final StringBuilder resultCollector = new StringBuilder();

  final Session session = driver.session(sessionConfig);

  // run the query as access mode write
  final Transaction tx = session.beginTransaction();

  // execute the write query and consume the result.
  tx.run(WRITE_QUERY).consume();

  // read the result from the previous write query in a same transaction.
  final List<Record> list = tx.run(READ_QUERY).list();

  // read the result
  for (final Record record : list)
  {
    for (String key : record.keys())
    {
      resultCollector
        .append(key)
        .append(":")
        .append(record.get(key).asNode().toString());
    }
  }

  // commit the transaction; to roll back instead, use tx.rollback();
  tx.commit();

  // close the session
  session.close();

  // close the driver.
  driver.close();
}
```

Explicit mutation queries inherit the same behavior as implicit mutation transactions.

If you do not pass in a session configuration, the transaction is treated as a write transaction by default.

For conflicts, see [Conflict Resolution Using Lock-Wait Timeouts](transactions-neptune.md#transactions-neptune-conflicts).

# openCypher query hints
<a name="opencypher-query-hints"></a>

**Important**  
 openCypher query hints are only available in engine release [1.3.2.0](https://docs.aws.amazon.com//neptune/latest/userguide/engine-releases-1.3.2.0.html) and later. 

 In Amazon Neptune, you can use the `USING` clause to specify query hints for openCypher queries. These hints allow you to control optimization and evaluation strategies. 

 The syntax for query hints is: 

```
USING {scope}:{hint} {value}
```

1.  `{scope}` defines the scope to which the hint applies: `Query` or `Clause`. 

    A scope value of `Query` means that the query hint applies to the whole query (query-level). 

    A scope value of `Clause` means that the query hint applies to the clause the hint precedes (clause-level). 

1.  `{hint}` is the name of the query hint being applied. 

1.  `{value}` is the argument for the `{hint}`. 

 The values are case-insensitive. 

 For example, to enable the query plan cache for a query: 

```
Using QUERY:PLANCACHE "enabled" 
MATCH (a:Person {firstName: "Erin", lastName: $lastName})
 RETURN a
```
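From application code, applying a hint is just prepending the `USING` line to the query text before sending it to the endpoint. A minimal Python sketch of that composition (the `with_hint` helper is illustrative, not a Neptune API):

```python
def with_hint(query, scope, hint, value):
    """Prepend a USING {scope}:{hint} {value} line to an openCypher query.

    String values (such as PLANCACHE's "enabled") are quoted; numeric
    values (such as a TIMEOUTMILLISECONDS argument) are left bare.
    This helper is a hypothetical convenience, not part of any driver.
    """
    rendered = f'"{value}"' if isinstance(value, str) else str(value)
    return f"USING {scope}:{hint} {rendered}\n{query}"

print(with_hint("MATCH (n) RETURN n LIMIT 1", "QUERY", "PLANCACHE", "enabled"))
print(with_hint("MATCH (n) RETURN n LIMIT 1", "QUERY", "TIMEOUTMILLISECONDS", 100))
```

The resulting string can then be passed as the query body to any of the access methods shown in the topics below.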

**Note**  
 Currently, the **Query**-scope query hints **PLANCACHE**, **TIMEOUTMILLISECONDS**, and **assumeConsistentDataTypes** are supported. They are documented in the topics below. 

**Topics**
+ [openCypher query plan cache hint](opencypher-query-hints-qpc-hint.md)
+ [AssumeConsistentDataTypes hint](opencypher-query-hints-AssumeConsistentDataTypes.md)
+ [openCypher query timeout hint](opencypher-query-hints-timeout-hint.md)

# openCypher query plan cache hint
<a name="opencypher-query-hints-qpc-hint"></a>

 Query plan cache behavior can be overridden on a per-query (parameterized or not) basis by query-level query hint `QUERY:PLANCACHE`. It needs to be used with the `USING` clause. The query hint accepts `enabled` or `disabled` as a value. For more information on query plan cache, see [Query plan cache in Amazon Neptune](access-graph-qpc.md). 

------
#### [ AWS CLI ]

Forcing plan to be cached or reused:

```
aws neptunedata execute-open-cypher-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "Using QUERY:PLANCACHE \"enabled\" MATCH(n) RETURN n LIMIT 1"
```

With parameters:

```
aws neptunedata execute-open-cypher-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "Using QUERY:PLANCACHE \"enabled\" RETURN \$arg" \
  --parameters '{"arg": 123}'
```

Forcing plan to be neither cached nor reused:

```
aws neptunedata execute-open-cypher-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "Using QUERY:PLANCACHE \"disabled\" MATCH(n) RETURN n LIMIT 1"
```

For more information, see [execute-open-cypher-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

# Forcing plan to be cached or reused
response = client.execute_open_cypher_query(
    openCypherQuery='Using QUERY:PLANCACHE "enabled" MATCH(n) RETURN n LIMIT 1'
)

print(response['results'])
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

Forcing plan to be cached or reused:

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=Using QUERY:PLANCACHE \"enabled\" MATCH(n) RETURN n LIMIT 1"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

Forcing plan to be cached or reused:

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=Using QUERY:PLANCACHE \"enabled\" MATCH(n) RETURN n LIMIT 1"
```

With parameters:

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=Using QUERY:PLANCACHE \"enabled\" RETURN \$arg" \
  -d "parameters={\"arg\": 123}"
```

Forcing plan to be neither cached nor reused:

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=Using QUERY:PLANCACHE \"disabled\" MATCH(n) RETURN n LIMIT 1"
```

------

# AssumeConsistentDataTypes hint
<a name="opencypher-query-hints-AssumeConsistentDataTypes"></a>

 openCypher matches numerical datatypes (int, byte, short, long, and so on) under type promotion semantics. For instance, when looking up all properties with an input value of 10 stored as a short, type promotion also matches properties that store 10 as a long. In some cases, this type casting adds overhead and produces query plans that are less efficient than they would be without it. In particular, when datatypes are used consistently in the data (for example, when every person's age is stored as a long), type promotion adds overhead without affecting the query result. 

 For cases where you know that the numeric property values stored in the database are of a consistent type, you can use a query hint called `assumeConsistentDataTypes` (with the value `true` or `false`; the default is `false`). When this hint is supplied with a value of `true`, the engine assumes that property values are always long or double and skips type promotion. Numerical values specified in the query are treated as long values (for non-floating-point values) or double values (for floating-point values). 

 If the data is consistently using a single datatype (e.g. all ages are stored as `long`), then using the `assumeConsistentDataTypes` hint can optimize the query by skipping unnecessary equality checks for different numeric types. However, if the data has inconsistent datatypes for the same property, then using the hint may cause some results to be missed, as the query will only match the single datatype that the hint assumes. 

```
# Database loaded with following openCypher CSV's

# File 1
:ID,age:Int
n1,20
n2,25

# File 2
:ID,age:Long
n3,25


# Example (no hint)
MATCH (n:Person) 
WHERE n.age >= 25
RETURN n

# Result
n2
n3

Returns all persons whose age is >= 25; matching values >= 25 can have any of these datatypes,
i.e. byte, short, int, long, double or float

-----------------------------------------------------------------------------------

# Example (with hint present)
USING QUERY:assumeConsistentDataTypes "true"
MATCH (n:Person)
WHERE n.age >= 25
RETURN n

# Result
n3

Returns only "n3" and not "n2". The reason is that even though the numerical value
matches (25), the datatype is "int" and is considered a non-match.
```
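The two matching modes above can be modeled in a few lines of Python. This is an illustrative simulation of the semantics (the row data mirrors the CSVs above; nothing here is Neptune code):

```python
# Rows as (node id, stored age value, stored datatype), mirroring the CSVs above.
rows = [("n1", 20, "int"), ("n2", 25, "int"), ("n3", 25, "long")]

def matching_nodes(threshold, assume_consistent_types=False):
    """Return node ids whose age >= threshold under the two matching modes."""
    if assume_consistent_types:
        # With the hint: only values stored as the assumed type (long) match.
        return [rid for rid, age, t in rows if t == "long" and age >= threshold]
    # Without the hint: type promotion compares numerically across datatypes.
    return [rid for rid, age, t in rows if age >= threshold]

print(matching_nodes(25))        # ['n2', 'n3']
print(matching_nodes(25, True))  # ['n3']
```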

 The difference can also be validated via the explain output. 

 Without the hint: 

```
# Query
MATCH (n)
WHERE n.age = 20
RETURN n

# Explain Snippet
╔═════╤══════════╤══════════╤══════════════════════════════╤═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╤════════╤════════════╤══════════════╤═════════╤══════════════╗
║ ID │ Out #1 │ Out #2 │ Name                   │ Arguments                                                                                                                            │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠═════╪══════════╪══════════╪══════════════════════════════╪═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╪════════╪════════════╪══════════════╪═════════╪══════════════╣
║ 0  │ 1      │ -      │ DFEPipelineScan (DFX)  │ pattern=Node(?n) with property 'age' as ?n_age2 and label 'ALL'                                                                      │ -    │ 0        │ 1         │ 0.00  │ 0.10      ║
║    │        │        │                        │ inlineFilters=[(?n_age2 IN ["20"^^xsd:byte, "20"^^xsd:int, "20"^^xsd:long, "20"^^xsd:short, "20.0"^^xsd:double, "20.0"^^xsd:float])] │      │          │           │       │           ║
║    │        │        │                        │ patternEstimate=1                                                                                                                    │      │          │           │       │           ║
╟─────┼──────────┼──────────┼──────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────┼────────────┼──────────────┼─────────┼──────────────╢

# The inlineFilters field contains all numeric types
```

 With the hint: 

```
# Query
MATCH (n)
WHERE n.age = 20
RETURN n

# Explain Snippet
╔═════╤══════════╤══════════╤══════════════════════════════╤═════════════════════════════════════════════════════════════════════════════════╤════════╤════════════╤══════════════╤═════════╤══════════════╗
║ ID │ Out #1 │ Out #2 │ Name                   │ Arguments                                                       │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠═════╪══════════╪══════════╪══════════════════════════════╪═════════════════════════════════════════════════════════════════════════════════╪════════╪════════════╪══════════════╪═════════╪══════════════╣
║ 0  │ 1      │ -      │ DFEPipelineScan (DFX)  │ pattern=Node(?n) with property 'age' as ?n_age2 and label 'ALL' │ -    │ 0        │ 1         │ 0.00  │ 0.07      ║
║    │        │        │                        │ inlineFilters=[(?n_age2 IN ["20"^^xsd:long])]                   │      │          │           │       │           ║
║    │        │        │                        │ patternEstimate=1                                               │      │          │           │       │           ║
╟─────┼──────────┼──────────┼──────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────┼────────┼────────────┼──────────────┼─────────┼──────────────╢

# The inlineFilters field only contains the long datatype
```

# openCypher query timeout hint
<a name="opencypher-query-hints-timeout-hint"></a>

 Query timeout behavior can be configured on a per-query basis with the query-level hint `QUERY:TIMEOUTMILLISECONDS`. It must be used with the `USING` clause. The hint accepts a non-negative long value. 

------
#### [ AWS CLI ]

```
aws neptunedata execute-open-cypher-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --open-cypher-query "USING QUERY:TIMEOUTMILLISECONDS 100 MATCH(n) RETURN n LIMIT 1"
```

For more information, see [execute-open-cypher-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-open-cypher-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_open_cypher_query(
    openCypherQuery='USING QUERY:TIMEOUTMILLISECONDS 100 MATCH(n) RETURN n LIMIT 1'
)

print(response['results'])
```

For AWS SDK examples in other languages, see [AWS SDK](access-graph-opencypher-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/openCypher \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d "query=USING QUERY:TIMEOUTMILLISECONDS 100 MATCH(n) RETURN n LIMIT 1"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/openCypher \
  -d "query=USING QUERY:TIMEOUTMILLISECONDS 100 MATCH(n) RETURN n LIMIT 1"
```

------

 Query timeout behavior uses the minimum of the cluster-level timeout and the query-level timeout. See the examples below to understand query timeout behavior. For more information on the cluster-level query timeout, see [neptune\_query\_timeout](https://docs.aws.amazon.com/neptune/latest/userguide/parameters.html#parameters-db-cluster-parameters-neptune_query_timeout). 

```
# Suppose `neptune_query_timeout` is 10000 ms and query-level timeout is set to 100 ms
# It will consider 100 ms as the final timeout 

curl https://your-neptune-endpoint:port/openCypher \
  -d "query=USING QUERY:TIMEOUTMILLISECONDS 100 MATCH(n) RETURN n LIMIT 1"

# Suppose `neptune_query_timeout` is 100 ms and query-level timeout is set to 10000 ms
# It will still consider 100 ms as the final timeout 

curl https://your-neptune-endpoint:port/openCypher \
  -d "query=USING QUERY:TIMEOUTMILLISECONDS 10000 MATCH(n) RETURN n LIMIT 1"
```
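That min-of-the-two behavior can be modeled in a few lines of Python (illustrative only; the function name is ours, not a Neptune API):

```python
def effective_timeout_ms(cluster_timeout_ms, query_hint_ms=None):
    """Neptune applies the smaller of the cluster-level timeout and the
    query-level TIMEOUTMILLISECONDS hint; with no hint, the cluster value wins."""
    if query_hint_ms is None:
        return cluster_timeout_ms
    return min(cluster_timeout_ms, query_hint_ms)

print(effective_timeout_ms(10000, 100))   # 100: hint is smaller
print(effective_timeout_ms(100, 10000))   # 100: cluster setting is smaller
print(effective_timeout_ms(100))          # 100: no hint supplied
```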

# Neptune openCypher restrictions
<a name="access-graph-opencypher-limitations"></a>

The Amazon Neptune release of openCypher does not yet support everything specified in the [Cypher Query Language Reference, Version 9](https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf), as detailed in [openCypher specification compliance](feature-opencypher-compliance.md). Future releases are expected to address many of these limitations.

# Neptune openCypher exceptions
<a name="access-graph-opencypher-exceptions"></a>

When working with openCypher on Amazon Neptune, a variety of exceptions may occur. Below are common exceptions you may receive, either from the HTTPS endpoint or from the Bolt driver (all exceptions from the Bolt driver are reported as Server State Exceptions):


| HTTP code | Error message | Retriable? | Remedy | 
| --- | --- | --- | --- | 
| 400 | *(syntax error, propagated directly from the openCypher parser)* | No | Correct query syntax, then retry. | 
| 500 | `Operation terminated (out of memory)` | Yes | Rework the query to add additional filtering criteria to reduce required memory | 
| 500 | Operation terminated (deadline exceeded) | Yes | Increase the query timeout in the DB cluster parameter group, or [retry the request](https://docs.aws.amazon.com/general/latest/gr/api-retries.html). | 
| 500 | Operation terminated (cancelled by user) | Yes | Retry the request. | 
| 500 | Database reset is in progress. Please retry the query after the cluster is available. | Yes | Retry when the reset is completed. | 
| 500 | Operation failed due to conflicting concurrent operations (please retry). Transactions are currently rolling back. | Yes | Retry using an [exponential backoff and retry strategy](best-practices-opencypher-retry-logic.md). | 
| 400 | *(operation name)* operation/feature unsupported Exception | No | The specified operation is not supported. | 
| 400 | openCypher update attempted on a read-only replica | No | Change the target end point to the writer end point. | 
| 400 | MalformedQueryException (Neptune does not show the internal parser state) | No | Correct query syntax and retry. | 
| 400 | Cannot delete node, because it still has relationships. To delete this node, you must first delete its relationships. | No | Instead of using `MATCH (n) DELETE n` use `MATCH(n) DETACH DELETE(n)` | 
| 400 | Invalid operation: attempting to remove the last label of a node. A node must have at least one label. | No | Neptune requires all nodes to have at least one label, and if nodes are created without an explicit label, a default label `vertex` is assigned. Change the query and/or application logic so as not to delete the last label. A singleton label of a node can be updated by setting a new label and then removing the old label. | 
| 500 | Max number of request have breached, ConfiguredQueueCapacity=\$1\$1 for connId = \$1\$1 | Yes | Currently only 8,192 concurrent requests can be processed, regardless of the stack and protocol. | 
| 500 | Max connection limit breached. | Yes | Only 1000 concurrent Bolt connections per instance are allowed (for HTTP there is no limit). | 
| 400 | Expected a [one of: Node, Relationship or Path] and got a Literal | No | Check that you are passing the correct argument(s), correct query syntax, and retry. | 
| 400 | Property value must be a simple literal. Or: Expected Map for Set properties but didn't find one. | No | A SET clause only accepts simple literals, not composite types. | 
| 400 | Entity found passed for deletion is not found | No | Check that the entity you are trying to delete exists in the database.  | 
| 400 | User does not have access to the database. | No | Check the policy on the IAM role being used. | 
| 400 | There is no token passed as part of the request | No | A properly signed token must be passed as part of the query request on an IAM enabled cluster. | 
| 400 | Error message is propagated. | No | Contact AWS Support with the Request Id. | 
| 500 | Operation terminated (internal error) | Yes | Contact AWS Support with the Request Id. | 
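For the errors marked retriable above, client code typically wraps query execution in an exponential-backoff loop. A minimal stdlib-only sketch of that pattern (the function and simulated error messages are illustrative, not a Neptune API):

```python
import random
import time

# Substrings of server error messages that are worth retrying (see table above).
RETRIABLE_MESSAGES = (
    "Operation terminated (deadline exceeded)",
    "Operation terminated (out of memory)",
    "Transactions are currently rolling back",
)

def run_with_backoff(execute, max_attempts=5, base_delay=0.1):
    """Retry execute() on retriable errors with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return execute()
        except RuntimeError as err:
            if not any(msg in str(err) for msg in RETRIABLE_MESSAGES):
                raise  # non-retriable: propagate immediately
            if attempt == max_attempts - 1:
                raise  # retries exhausted
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

# Example: a simulated query that fails once with a retriable error, then succeeds.
calls = {"n": 0}
def flaky_query():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("Operation terminated (deadline exceeded)")
    return "ok"

print(run_with_backoff(flaky_query))  # succeeds on the second attempt
```

In a real application, `execute` would submit the query over HTTPS or Bolt and raise on the error responses listed in the table.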

# openCypher extensions in Amazon Neptune
<a name="access-graph-opencypher-extensions"></a>

 Amazon Neptune supports the openCypher specification reference version 9. See [openCypher specification compliance in Amazon Neptune](feature-opencypher-compliance.md) for details. Additionally, Amazon Neptune supports the features listed here. Unless specific versions are mentioned, these features are available in both Neptune Database and Neptune Analytics. 

## Query-time S3 data access
<a name="opencypher-compliance-neptune-read"></a>

Available in Neptune Database 1.4.7.0 and up.

Neptune supports the `neptune.read()` function to read CSV or Parquet data from Amazon S3 directly within openCypher queries. Unlike the bulk loader, which imports data before querying, `neptune.read()` accesses Amazon S3 data at query execution time.

For complete documentation, see [neptune.read()](access-graph-opencypher-21-extensions-s3-read.md).

## The Neptune-specific `join()` function
<a name="opencypher-compliance-join-function"></a>

Available in Neptune Database and Neptune Analytics.

Neptune implements a `join()` function that is not present in the openCypher specification. It creates a string literal from a list of string literals and a string delimiter. It takes two arguments:
+ The first argument is a list of string literals.
+ The second argument is the delimiter string, which can consist of zero, one, or more than one characters.

Example:

```
join(["abc", "def", "ghi"], ", ")    // Returns "abc, def, ghi"
```

## The Neptune-specific `removeKeyFromMap()` function
<a name="opencypher-compliance-removeKeyFromMap-function"></a>

Available in Neptune Database and Neptune Analytics.

Neptune implements a `removeKeyFromMap()` function that is not present in the openCypher specification. It removes a specified key from a map and returns the resulting new map.

The function takes two arguments:
+ The first argument is the map from which to remove the key.
+ The second argument is the key to remove from the map.

The `removeKeyFromMap()` function is particularly useful in situations where you want to set values for a node or relationship by unwinding a list of maps. For example:

```
UNWIND [{`~id`: 'id1', name: 'john'}, {`~id`: 'id2', name: 'jim'}] as val
CREATE (n {`~id`: val.`~id`})
SET n = removeKeyFromMap(val, '~id')
```

## Custom ID values for node and relationship properties
<a name="opencypher-compliance-custom-ids"></a>

Available in Neptune Database 1.2.0.2 and up, and Neptune Analytics.

Starting in [engine release 1.2.0.2](engine-releases-1.2.0.2.md), Neptune has extended the openCypher specification so that you can now specify the `id` values for nodes and relationships in `CREATE`, `MERGE`, and `MATCH` clauses. This lets you assign user-friendly strings instead of system-generated UUIDs to identify nodes and relationships.

In Neptune Analytics, custom ID values are not available for edges.

**Warning**  
This extension to the openCypher specification is backward incompatible, because `~id` is now considered a reserved property name. If you are already using `~id` as a property in your data and queries, you will need to migrate the existing property to a new property key and remove the old one. See [What to do if you're currently using `~id` as a property](#opencypher-compliance-custom-ids-migrating).

Here is an example showing how to create nodes and relationships that have custom IDs:

```
CREATE (n {`~id`: 'fromNode', name: 'john'})
  -[:knows {`~id`: 'john-knows->jim', since: 2020}]
  ->(m {`~id`: 'toNode', name: 'jim'})
```

If you try to create a custom ID that is already in use, Neptune throws a `DuplicateDataException` error.

Here is an example of using a custom ID in a `MATCH` clause:

```
MATCH (n {`~id`: 'id1'})
RETURN n
```

Here is an example of using custom IDs in a `MERGE` clause:

```
MATCH (n {name: 'john'}), (m {name: 'jim'})
MERGE (n)-[r {`~id`: 'john->jim'}]->(m)
RETURN r
```

### What to do if you're currently using `~id` as a property
<a name="opencypher-compliance-custom-ids-migrating"></a>

With [engine release 1.2.0.2](engine-releases-1.2.0.2.md), the `~id` key in openCypher clauses is now treated as `id` instead of as a property. This means that if you have a property named `~id`, you can no longer access it.

If you're using an `~id` property, before upgrading to engine release `1.2.0.2` or later you must first migrate the existing `~id` property to a new property key and then remove the `~id` property. For example, the query below:
+ Creates a new property named 'newId' for all nodes,
+ copies the value of the '~id' property into the 'newId' property,
+ and removes the '~id' property from the data

```
MATCH (n)
WHERE exists(n.`~id`)
SET n.newId = n.`~id`
REMOVE n.`~id`
```

The same thing needs to be done for any relationships in the data that have an `~id` property.

You will also have to change any queries you're using that reference an `~id` property. For example, this query:

```
MATCH (n)
WHERE n.`~id` = 'some-value'
RETURN n
```

...would change to this:

```
MATCH (n)
WHERE n.newId = 'some-value'
RETURN n
```

## CALL subquery support in Neptune
<a name="call-subquery-support"></a>

 Available in Neptune Database 1.4.1.0 and up, and Neptune Analytics. 

 Amazon Neptune supports `CALL` subqueries. A `CALL` subquery is a part of the main query that runs in an isolated scope for each input to the `CALL` subquery. 

 For example, suppose a graph contains data about persons, their friends, and the cities they have lived in. We can retrieve the two largest cities that each friend of a person has lived in by using a `CALL` subquery: 

```
MATCH (person:Person)-[:knows]->(friend) 
CALL { 
  WITH friend 
  MATCH (friend)-[:lived_in]->(city) 
  RETURN city 
  ORDER BY city.population DESC
  LIMIT 2 
} 
RETURN person, friend, city
```

 In this example, the query part inside `CALL { ... }` is executed for each `friend` matched by the preceding `MATCH` clause. When the inner query is executed, the `ORDER BY` and `LIMIT` clauses are local to the cities where a specific friend lived, so we obtain (at most) two cities per friend. 

 All query clauses are available inside `CALL` subqueries, including nested `CALL` subqueries. Some restrictions on the first `WITH` clause and on the emitted variables exist and are explained below. 

### Scope of variables inside CALL subquery
<a name="variable-scope-inside-call-subquery"></a>

 The variables from clauses before the `CALL` subquery that are used inside it must be imported by the initial `WITH` clause. Unlike regular `WITH` clauses, this clause can only contain a list of variables; it doesn't allow aliasing and can't be used together with `DISTINCT`, `ORDER BY`, `WHERE`, `SKIP`, or `LIMIT`. 

### Variables returned from CALL subquery
<a name="variables-returned-call-subquery"></a>

 The variables that are emitted from the `CALL` subquery are specified with the final `RETURN` clause. Note that the emitted variables cannot overlap with variables before the `CALL` subquery. 

### Limitations
<a name="call-subquery-limitations"></a>

 As of now, updates inside of a `CALL` subquery are not supported. 

## Neptune openCypher functions
<a name="opencypher-compliance-new-functions"></a>

 Available in Neptune Database 1.4.1.0 and up, and Neptune Analytics. 

**textIndexOf**

 `textIndexOf(text :: STRING, lookup :: STRING, from = 0 :: INTEGER?, to = -1 :: INTEGER?) :: (INTEGER?)` 

 Returns the index of the first occurrence of `lookup` in the range of `text` starting at offset `from` (inclusive), through offset `to` (exclusive). If `to` is -1, the range continues to the end of `text`. Indexing is zero-based, and is expressed in Unicode scalar values (non-surrogate code points). 

```
RETURN textIndexOf('Amazon Neptune', 'e')
{
  "results": [{
      "textIndexOf('Amazon Neptune', 'e')": 8
    }]
}
```
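As a sketch of the semantics (not the Neptune implementation), the behavior maps onto Python's `str.find` with a range; note that `str.find` returns -1 when the substring is absent, which may differ from what Neptune returns:

```python
# Illustrative sketch of textIndexOf semantics: zero-based index of the
# first occurrence of lookup within [from, to). to = -1 means "end of text".
def text_index_of(text, lookup, start=0, to=-1):
    end = len(text) if to == -1 else to
    return text.find(lookup, start, end)  # -1 here when not found

print(text_index_of("Amazon Neptune", "e"))     # 8, as in the example above
print(text_index_of("Amazon Neptune", "e", 9))  # 13, the second 'e'
```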

**collToSet**

 `collToSet(values :: LIST OF ANY?) :: (LIST? OF ANY?)` 

 Returns a new list containing only the unique elements from the original list. The order of the original list is **maintained** (e.g., `[1, 6, 5, 1, 5]` returns `[1, 6, 5]`). 

```
RETURN collToSet([1, 6, 5, 1, 1, 5])
{
  "results": [{
      "collToSet([1, 6, 5, 1, 1, 5])": [1, 6, 5]
    }]
}
```
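The order-preserving deduplication described above can be sketched in one line of Python (an illustration, not Neptune's implementation):

```python
# Illustrative: collToSet keeps the first occurrence of each element and
# preserves the original order.
def coll_to_set(values):
    return list(dict.fromkeys(values))  # dicts preserve insertion order

print(coll_to_set([1, 6, 5, 1, 1, 5]))  # [1, 6, 5]
```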

**collSubtract**

 `collSubtract(first :: LIST OF ANY?, second :: LIST OF ANY?) :: (LIST? OF ANY?)` 

 Returns a new list containing all the unique elements of `first` excluding elements from `second`. 

```
RETURN collSubtract([2, 5, 1, 0], [1, 5])
{
  "results": [{
      "collSubtract([2, 5, 1, 0], [1, 5])": [0, 2]
    }]
}
```

**collIntersection**

 `collIntersection(first :: LIST? OF ANY?, second :: LIST? OF ANY?) :: (LIST? OF ANY?)` 

 Returns a new list containing all the unique elements of the intersection of `first` and `second`. 

```
RETURN collIntersection([2, 5, 1, 0], [1, 5])
{
  "results": [{
      "collIntersection([2, 5, 1, 0], [1, 5])": [1, 5]
    }]
}
```
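Both functions behave like set operations. A Python sketch (note that Neptune's result order, as the examples above show, need not match the input order, so this sketch sorts for determinism):

```python
# Illustrative set semantics for collSubtract and collIntersection.
def coll_subtract(first, second):
    return sorted(set(first) - set(second))

def coll_intersection(first, second):
    return sorted(set(first) & set(second))

print(coll_subtract([2, 5, 1, 0], [1, 5]))      # [0, 2]
print(coll_intersection([2, 5, 1, 0], [1, 5]))  # [1, 5]
```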

## Sorting functions
<a name="sorting-functions"></a>

 The following sections define functions to sort collections. These functions take (in some cases optional) `config` map arguments, or a list of multiple such maps, that define the sort key and/or the sort direction: 

```
{ key: STRING, order: STRING }
```

 Here `key` is the map key or node property whose value is used for sorting, and `order` is either "`asc`" or "`desc`" (case insensitive), specifying an ascending or descending sort respectively. By default, sorting is performed in ascending order. 

**collSort**

 `collSort(coll :: LIST OF ANY, config :: MAP?) :: (LIST? OF ANY?)` 

 Returns a new sorted list containing the elements from the `coll` input list. 

```
RETURN collSort([5, 3, 1], {order: 'asc'})
{
  "results": [{
      "collSort([5, 3, 1], {order: 'asc'})": [1, 3, 5]
    }]
}
```

**collSortMaps**

 `collSortMaps(coll :: LIST OF MAP, config :: MAP) :: (LIST? OF ANY?)` 

 Returns a list of maps sorted by the value of the specified `key` property. 

```
RETURN collSortMaps([{name: 'Alice', age: 25}, {name: 'Bob', age: 35}, {name: 'Charlie', age: 18}], {key: 'age', order: 'desc'}) as x
{
  "results": [{
      "x": [{
          "age": 35,
          "name": "Bob"
        }, {
          "age": 25,
          "name": "Alice"
        }, {
          "age": 18,
          "name": "Charlie"
        }]
    }]
}
```

**collSortMulti**

```
collSortMulti(coll :: LIST OF MAP?, 
configs = [] :: LIST OF MAP, 
limit = -1 :: INTEGER?, 
skip = 0 :: INTEGER?) :: (LIST? OF ANY?)
```

 Returns a list of maps sorted by the value of the specified `key` properties, optionally applying limit and skip. 

```
RETURN collSortMulti([{name: 'Alice', age: 25}, {name: 'Bob', age: 35}, {name: 'Charlie', age: 18}], [{key: 'age', order: 'desc'}, {key:'name'}]) as x
{
  "results": [{
      "x": [{
          "age": 35,
          "name": "Bob"
        }, {
          "age": 25,
          "name": "Alice"
        }, {
          "age": 18,
          "name": "Charlie"
        }]
    }]
}
```
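The multi-key sort with optional `skip`/`limit` can be sketched using Python's stable sort, applying the sort keys in reverse order (an illustration of the semantics, not Neptune's implementation):

```python
# Illustrative multi-key sort: apply the sort keys in reverse order and
# rely on Python's stable sort, then apply skip/limit.
def coll_sort_multi(coll, configs, limit=-1, skip=0):
    result = list(coll)
    for cfg in reversed(configs):
        result.sort(key=lambda m: m[cfg["key"]],
                    reverse=cfg.get("order", "asc").lower() == "desc")
    end = None if limit == -1 else skip + limit
    return result[skip:end]

people = [{"name": "Alice", "age": 25},
          {"name": "Bob", "age": 35},
          {"name": "Charlie", "age": 18}]
# Sort by age descending, with name ascending as the tie-breaker
print(coll_sort_multi(people, [{"key": "age", "order": "desc"}, {"key": "name"}]))
```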

**collSortNodes**

 `collSortNodes(coll :: LIST OF NODE, config :: MAP) :: (LIST? OF NODE?)` 

 Returns a sorted version of the `coll` input list, sorting the node elements by the values of their respective `key` properties. 

```
create (n:person {name: 'Alice', age: 23}), (m:person {name: 'Eve', age: 21}), (o:person {name:'Bob', age:25})
{"results":[]}

match (n:person) with collect(n) as people return collSortNodes(people, {key: 'name', order: 'desc'})
{
  "results": [{
      "collSortNodes(people, {key: 'name', order: 'desc'})": [{
          "~id": "e599240a-8c23-4337-8aa8-f603c8fb5488",
          "~entityType": "node",
          "~labels": ["person"],
          "~properties": {
            "age": 21,
            "name": "Eve"
          }
        }, {
          "~id": "8a6ef785-59e3-4a0b-a0ff-389655a9c4e6",
          "~entityType": "node",
          "~labels": ["person"],
          "~properties": {
            "age": 25,
            "name": "Bob"
          }
        }, {
          "~id": "466bc826-f47f-452c-8a27-6b7bdf7ae9b4",
          "~entityType": "node",
          "~labels": ["person"],
          "~properties": {
            "age": 23,
            "name": "Alice"
          }
        }]
    }]
}

match (n:person) with collect(n) as people return collSortNodes(people, {key: 'age'})
{
  "results": [{
      "collSortNodes(people, {key: 'age'})": [{
          "~id": "e599240a-8c23-4337-8aa8-f603c8fb5488",
          "~entityType": "node",
          "~labels": ["person"],
          "~properties": {
            "age": 21,
            "name": "Eve"
          }
        }, {
          "~id": "466bc826-f47f-452c-8a27-6b7bdf7ae9b4",
          "~entityType": "node",
          "~labels": ["person"],
          "~properties": {
            "age": 23,
            "name": "Alice"
          }
        }, {
          "~id": "8a6ef785-59e3-4a0b-a0ff-389655a9c4e6",
          "~entityType": "node",
          "~labels": ["person"],
          "~properties": {
            "age": 25,
            "name": "Bob"
          }
        }]
    }]
}
```

## Temporal functions
<a name="temporal-functions"></a>

 Temporal functions are available from Neptune version [1.4.5.0](https://docs.aws.amazon.com/releases/release-1.4.5.0.xml) and up. 

### day
<a name="temporal-functions-day"></a>

 `day(temporal :: (datetime | date)) :: (LONG)` 

 Returns the `day` of the month from a `datetime` or `date` value. For `datetime`: values are normalized to UTC based on input before extracting the day. For `date`: day is extracted based on the timezone. 

 The `datetime` input is available in both Neptune Database and Neptune Analytics: 

```
RETURN day(datetime('2021-06-03T01:48:14Z'))
{
  "results": [{
      "day(datetime('2021-06-03T01:48:14Z'))": 3
    }]
}
```

 Here, the `datetime` is normalized to UTC, so midnight at +08:00 shifts back to June 2. 

```
RETURN day(datetime('2021-06-03T00:00:00+08:00'))
{
  "results": [{
      "day(datetime('2021-06-03T00:00:00+08:00'))": 2
    }]
}
```

 The `date` input is available only in Neptune Analytics: 

```
RETURN day(date('2021-06-03Z'))
{
  "results": [{
      "day(date('2021-06-03Z'))": 3
    }]
}
```

 The `date` preserves timezone, keeping June 3. 

```
RETURN day(date('2021-06-03+08:00'))
{
  "results": [{
      "day(date('2021-06-03+08:00'))": 3
    }]
}
```
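The UTC normalization for `datetime` inputs can be reproduced with Python's standard library:

```python
from datetime import datetime, timezone

# Illustrative: Neptune normalizes datetime values to UTC before
# extracting the day, so midnight at +08:00 falls on the previous day.
dt = datetime.fromisoformat("2021-06-03T00:00:00+08:00")
print(dt.astimezone(timezone.utc).day)  # 2 — June 2 in UTC
```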

### month
<a name="temporal-functions-month"></a>

 `month(temporal :: (datetime | date)) :: (LONG)` 

 Returns the month from a `datetime` or `date` value (1-12). For `datetime`: values are normalized to UTC based on input before extracting the month. For `date`: month is extracted based on the timezone. 

 The `datetime` input is available in both Neptune Database and Neptune Analytics: 

```
RETURN month(datetime('2021-06-03T01:48:14Z'))
{
  "results": [{
      "month(datetime('2021-06-03T01:48:14Z'))": 6
    }]
}
```

 Here, the `datetime` is normalized to UTC, so midnight at +08:00 shifts back to May 31. 

```
RETURN month(datetime('2021-06-01T00:00:00+08:00'))
{
  "results": [{
      "month(datetime('2021-06-01T00:00:00+08:00'))": 5
    }]
}
```

 The `date` input is available only in Neptune Analytics: 

```
RETURN month(date('2021-06-03Z'))
{
  "results": [{
      "month(date('2021-06-03Z'))": 6
    }]
}
```

 The `date` preserves timezone, keeping June 1. 

```
RETURN month(date('2021-06-01+08:00'))
{
  "results": [{
      "month(date('2021-06-01+08:00'))": 6
    }]
}
```

### year
<a name="temporal-functions-year"></a>

 `year(temporal :: (datetime | date)) :: (LONG)` 

 Returns the year from a `datetime` or `date` value. For `datetime`: the values are normalized to UTC based on input before extracting the year. For `date`: the year is extracted based on the timezone. 

 The `datetime` input is available in both Neptune Database and Neptune Analytics: 

```
RETURN year(datetime('2021-06-03T01:48:14Z'))
{
  "results": [{
      "year(datetime('2021-06-03T01:48:14Z'))": 2021
    }]
}
```

 Here, the `datetime` is normalized to UTC, so midnight at +08:00 shifts back to December 31, 2020. 

```
RETURN year(datetime('2021-01-01T00:00:00+08:00'))
{
  "results": [{
      "year(datetime('2021-01-01T00:00:00+08:00'))": 2020
    }]
}
```

 The `date` input is available only in Neptune Analytics: 

```
RETURN year(date('2021-06-03Z'))
{
  "results": [{
      "year(date('2021-06-03Z'))": 2021
    }]
}
```

 The `date` preserves the timezone, keeping the year 2021. 

```
RETURN year(date('2021-01-01+08:00'))
{
  "results": [{
      "year(date('2021-01-01+08:00'))": 2021
    }]
}
```

### Neptune openCypher functions
<a name="openCypher-functions"></a>

 Available in Neptune Database 1.4.6.0 and up, and Neptune Analytics. 

#### reduce()
<a name="openCypher-functions-reduce"></a>

 Reduce sequentially processes each list element by combining it with a running total or ‘accumulator.’ Starting with an initial value, it updates the accumulator after each operation and uses that updated value in the next iteration. 

 `for i in (0, ..., n): acc = acc X list[i], where X denotes any binary operator` 

 Once all elements have been processed, it returns the final accumulated result. 

 A typical `reduce()` structure is `reduce(accumulator = initial, variable IN list | expression)`. 

**Type specifications:**  
+ `initial` - starting value for the accumulator :: (LONG | FLOAT | STRING | LIST? OF (STRING, LONG, FLOAT))
+ `list` - the input list :: LIST OF T, where T matches the type of `initial`
+ `variable` - represents each element in the input list
+ `expression` - only the `+` and `*` operators are supported
+ return type - same type as `initial`

**Restrictions:**  
 Currently, the `reduce()` expression only supports: 
+  Numeric Multiplication 
+  Numeric Addition 
+  String Concatenation 
+  List Concatenation 

 They are represented by the `+` or `*` operator. The expression must be a binary expression of the form `accumulator + variable` or `accumulator * variable`. 
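The four supported shapes correspond directly to `functools.reduce` in Python (illustrative equivalents, not Neptune code):

```python
from functools import reduce

# Illustrative equivalents of the four supported reduce() operations.
print(reduce(lambda acc, n: acc + n, [1, 2, 3], 0))         # 6   (numeric addition)
print(reduce(lambda acc, n: acc * n, [1, 2, 3], 1))         # 6   (numeric multiplication)
print(reduce(lambda acc, s: acc + s, ["A", "B", "C"], ""))  # ABC (string concatenation)
print(reduce(lambda acc, x: acc + [x], [1, 2, 3], []))      # [1, 2, 3] (list concatenation)
```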

**Overflow handling:**  
 Neptune detects numeric overflow during the `reduce()` evaluation and responds differently based on the data type: 

```
LONG (signed 64‑bit)
--------------------
• Valid range: –9 223 372 036 854 775 808 … 9 223 372 036 854 775 807  
• If any intermediate or final value falls outside this range,
  Neptune aborts the query with a long overflow error message.
  
FLOAT (IEEE‑754 double)
-----------------------
• Largest finite value ≈ 1.79 × 10^308  
• Larger results overflow to INF
  Once `INF` is produced, it propagates through the remainder
  of the reduction.
```
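Python floats are IEEE-754 doubles, so the FLOAT overflow behavior described above is directly observable:

```python
import math

# Illustrative: the sum exceeds the largest finite double (~1.79e308),
# overflowing to inf, which then propagates through further arithmetic.
s = 9.0e307 + 8.0e307 + 1.0e307
print(math.isinf(s))  # True
print(s + 1.0)        # still inf
```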

**Examples:**  
See the following examples for the reduce() function.

```
1. Long Addition:
RETURN reduce(sum = 0, n IN [1, 2, 3] | sum + n)
{
  "results": [{
      "reduce(sum = 0, n IN [1, 2, 3] | sum + n)": 6
    }]
}

2. String Concatenation:
RETURN reduce(str = "", x IN ["A", "B", "C"] | str + x) 
{
  "results": [{
      "reduce(str = "", x IN ["A", "B", "C"] | str + x)": "ABC"
    }]
}

3. List Combination:
RETURN reduce(lst = [], x IN [1, 2, 3] | lst + x)
{
  "results": [{
      "reduce(lst = [], x IN [1, 2, 3] | lst + x)": [1, 2, 3]
    }]
}

4. Float Addition:
RETURN reduce(total = 0.0, x IN [1.5, 2.5, 3.5] | total + x) 
{
  "results": [{
      "reduce(total = 0.0, x IN [1.5, 2.5, 3.5] | total + x)": 7.5
    }]
}

5. Long Multiplication:
RETURN reduce(product = 1, n IN [1, 2, 3] | product * n)
{
  "results": [{
      "reduce(product = 1, n IN [1, 2, 3] | product * n)": 6
    }]
}

6. Float Multiplication:
RETURN reduce(product = 1.0, n IN [1.5, 2.5, 3.5] | product * n)
{
  "results": [{
      "reduce(product = 1.0, n IN [1.5, 2.5, 3.5] | product * n)": 13.125
    }]
}

7. Long Overflow (Exception):
RETURN reduce(s = 9223372036854775807, x IN [2, 3] | s * x) AS result
{
"results": [{
    "reduce(s = 9223372036854775807, x IN [2, 3] | s * x) AS result": long overflow
    }]
}

8. Float Overflow:
RETURN reduce(s = 9.0e307, x IN [8.0e307, 1.0e307] | s + x) AS result
{
"results": [{
    "reduce(s = 9.0e307, x IN [8.0e307, 1.0e307] | s + x) AS result": INF
    }]
}
```

# neptune.read()
<a name="access-graph-opencypher-21-extensions-s3-read"></a>

 Neptune supports a `CALL` procedure, `neptune.read`, to read data from Amazon S3 and then run an openCypher query (read, insert, update) using the data. The procedure yields each row in the file as a declared result variable `row`. It uses the IAM credentials of the caller to access the data in Amazon S3; see [Managing permissions for neptune.read()](access-graph-opencypher-21-extensions-s3-read-permissions.md) to set up the permissions. The Amazon S3 bucket must be in the same AWS Region as the Neptune instance. Currently, cross-Region reads are not supported. 

 **Syntax** 

```
CALL neptune.read(
  {
    source: "string",
    format: "parquet/csv",
    concurrency: 10
  }
)
YIELD row
...
```

**Inputs**
+  **source** (required) - Amazon S3 URI to a **single** object. An Amazon S3 prefix matching multiple objects is not supported. 
+  **format** (required) - `parquet` and `csv` are supported. 
  +  More details on the supported Parquet format can be found in [Supported Parquet column types](access-graph-opencypher-21-extensions-s3-read-parquet.md#access-graph-opencypher-21-extensions-s3-read-parquet-column-types). 
  +  For more information on the supported csv format, see [Gremlin load data format](bulk-load-tutorial-format-gremlin.md). 
+  **concurrency** (optional) - Type: integer, 0 or greater. Default: 0. Specifies the number of threads used to read the file. If the value is 0, the maximum number of threads allowed by the resource is used. For Parquet, it is recommended to set this to the number of row groups. 

**Outputs**

 `neptune.read()` returns: 
+  **row** - Type: Map 
  +  Each row in the file, where the keys are the columns and the values are the data found in each column. 
  +  You can access each column's data with property-access syntax (`row.col`). 

## Best practices for neptune.read()
<a name="access-graph-opencypher-21-extensions-s3-read-best-practices"></a>

Neptune S3 read operations can be memory-intensive. Please use instance types well-suited for production workloads as outlined in [Choosing instance types for Amazon Neptune](instance-types.md).

Memory usage and performance of `neptune.read()` requests are affected by a variety of factors like file size, number of columns, number of rows, and file format. Depending on structure, small files (e.g., CSV files 100MB or under, Parquet files 20MB or under) may work reliably on most production-suited instance types, whereas larger files may require substantial memory that smaller instance types cannot provide.

When testing this feature, it is recommended to start with small files and scale gradually to ensure your read workload can be accommodated by your instance size. If you notice `neptune.read()` requests leading to out-of-memory exceptions or instance restarts, consider splitting your files into smaller chunks, reducing file complexity, or upgrading to larger instance types.

# Query examples using parquet
<a name="access-graph-opencypher-21-extensions-s3-read-parquet"></a>

The following example query returns the number of rows in a given Parquet file:

```
CALL neptune.read(
  {
    source: "<s3 path>",
    format: "parquet"
  }
)
YIELD row
RETURN count(row)
```

You can run the query example using the `execute-open-cypher-query` operation in the AWS CLI by executing the following code:

```
aws neptunedata execute-open-cypher-query \
--open-cypher-query "CALL neptune.read({source: '<s3 path>', format: 'parquet'}) YIELD row RETURN count(row)" \
--endpoint-url https://my-cluster-name.cluster-abcdefgh1234.us-east-1.neptune.amazonaws.com:8182
```

A query can be flexible in what it does with rows read from a Parquet file. For example, the following query creates a node with a field being set to data found in the Parquet file:

```
CALL neptune.read(
  {
    source: "<s3 path>",
    format: "parquet"
  }
)
YIELD row
CREATE (n {someField: row.someCol}) 
RETURN n
```

**Warning**  
It is not considered good practice to use a clause that produces a large result set, like `MATCH (n)`, before a `CALL` clause. Doing so leads to a long-running query, due to the cross product between incoming solutions from prior clauses and the rows read by `neptune.read()`. It's recommended to start the query with `CALL neptune.read(...)`.

## Supported Parquet column types
<a name="access-graph-opencypher-21-extensions-s3-read-parquet-column-types"></a>

**Parquet Data Types:**
+ NULL
+ BOOLEAN
+ FLOAT
+ DOUBLE
+ STRING
+ SIGNED INTEGER: INT8, INT16, INT32, INT64
+ MAP: Only one level is supported; nested maps are not.
+ LIST: Only one level is supported; nested lists are not.

**Neptune-specific data types:**

Unlike the property column headers of the CSV format, the property column headers of the Parquet format only need the property names; there is no need to include type names or cardinality.

There are, however, some special column types in the Parquet format that require annotation in the metadata: the Any, Date, dateTime, and Geometry types. The following object is an example of the required metadata annotation for files containing columns of these special types:

```
"metadata": {
    "anyTypeColumns": ["UserCol1"],
    "dateTypeColumns": ["UserCol2"],
    "dateTimeTypeColumns": ["UserCol3"],
    "geometryTypeColumns": ["UserCol4"]
}
```

Below are details on the expected payload associated with these types:
+ A column of type Any is supported in the user columns. The Any type is syntactic sugar for all of the other supported types, and is useful when a single column contains values of multiple types. The payload of an Any type value is a semicolon-separated list of JSON strings, such as `{"value": "10", "type": "Int"};{"value": "1.0", "type": "Float"}`, where each JSON string has a value field and a type field. The cardinality of an Any column is set, meaning that the column can accept multiple values. 
  + Neptune supports the following types in an Any type: Bool (or Boolean), Byte, Short, Int, Long, UnsignedByte, UnsignedShort, UnsignedInt, UnsignedLong, Float, Double, Date, dateTime, String, and Geometry.
  + Vector type is not supported in Any type.
  + Nested Any type is not supported. For example, `{"value": {"value": "10", "type": "Int"}, "type": "Any"}`.
+ Columns of type Date and Datetime are supported in the user columns. The payload of these columns must be provided as strings following the XSD format or one of the formats below: 
  + yyyy-MM-dd
  + yyyy-MM-ddTHH:mm
  + yyyy-MM-ddTHH:mm:ss
  + yyyy-MM-ddTHH:mm:ssZ
  + yyyy-MM-ddTHH:mm:ss.SSSZ
  + yyyy-MM-ddTHH:mm:ss[+|-]hhmm
  + yyyy-MM-ddTHH:mm:ss.SSS[+|-]hhmm
+ A Geometry column type is supported in the user columns. The payload of these columns must only contain Geometry primitives of type Point, provided as strings in Well-known text (WKT) format. For example, POINT (30 10) would be a valid Geometry value.
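A payload in the Any-type shape described above can be assembled with ordinary JSON tooling. A Python sketch (the `"value"`/`"type"` field names come from the description above; the data is invented):

```python
import json

# Illustrative: build the semicolon-separated Any-type payload.
entries = [{"value": "10", "type": "Int"}, {"value": "1.0", "type": "Float"}]
payload = ";".join(json.dumps(e, separators=(", ", ": ")) for e in entries)
print(payload)
```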

## Sample parquet output
<a name="sample-parquet-output"></a>

Given a Parquet file like this:

```
<s3 path>

Parquet Type:
    int8     int16       int32             int64              float      double    string
+--------+---------+-------------+----------------------+------------+------------+----------+
|   Byte |   Short |       Int   |                Long  |     Float  |    Double  | String   |
|--------+---------+-------------+----------------------+------------+------------+----------|
|   -128 |  -32768 | -2147483648 | -9223372036854775808 |    1.23456 |    1.23457 | first    |
|    127 |   32767 |  2147483647 |  9223372036854775807 |  nan       |  nan       | second   |
|      0 |       0 |           0 |                    0 | -inf       | -inf       | third    |
|      0 |       0 |           0 |                    0 |  inf       |  inf       | fourth   |
+--------+---------+-------------+----------------------+------------+------------+----------+
```

Here is an example of the output returned by neptune.read using the following query:

```
aws neptunedata execute-open-cypher-query \
--open-cypher-query "CALL neptune.read({source: '<s3 path>', format: 'parquet'}) YIELD row RETURN row" \
--endpoint-url https://my-cluster-name.cluster-abcdefgh1234.us-east-1.neptune.amazonaws.com:8182
```

```
{
 "results": [{
 "row": {
 "Float": 1.23456,
 "Byte": -128,
 "Int": -2147483648,
 "Long": -9223372036854775808,
 "String": "first",
 "Short": -32768,
 "Double": 1.2345678899999999
 }
 }, {
 "row": {
 "Float": "NaN",
 "Byte": 127,
 "Int": 2147483647,
 "Long": 9223372036854775807,
 "String": "second",
 "Short": 32767,
 "Double": "NaN"
 }
 }, {
 "row": {
 "Float": "-INF",
 "Byte": 0,
 "Int": 0,
 "Long": 0,
 "String": "third",
 "Short": 0,
 "Double": "-INF"
 }
 }, {
 "row": {
 "Float": "INF",
 "Byte": 0,
 "Int": 0,
 "Long": 0,
 "String": "fourth",
 "Short": 0,
 "Double": "INF"
 }
 }]
}
```

Currently, there is no way to set a node or edge label from a data field coming from a Parquet file. It is recommended that you partition the work into multiple queries, one for each label/type.

```
CALL neptune.read({source: '<s3 path>', format: 'parquet'})
 YIELD row 
WHERE row.`~label` = 'airport'
CREATE (n:airport)

CALL neptune.read({source: '<s3 path>', format: 'parquet'})
YIELD row 
WHERE row.`~label` = 'country'
CREATE (n:country)
```

# Query examples using CSV
<a name="access-graph-opencypher-21-extensions-s3-read-csv"></a>

In this example, the query returns the number of rows in a given CSV file:

```
CALL neptune.read(
  {
    source: "<s3 path>",
    format: "csv"
  }
)
YIELD row
RETURN count(row)
```

You can run the query example using the execute-open-cypher-query operation in the AWS CLI by executing the following code:

```
aws neptunedata execute-open-cypher-query \
--open-cypher-query "CALL neptune.read({source: '<s3 path>', format: 'csv'}) YIELD row RETURN count(row)" \
--endpoint-url https://my-cluster-name.cluster-abcdefgh1234.us-east-1.neptune.amazonaws.com:8182
```

A query can be flexible in what it does with rows read from a CSV file. For instance, the following query creates a node with a field set to data from a CSV file:

```
CALL neptune.read(
  {
    source: "<s3 path>",
    format: "csv"
  }
)
YIELD row
CREATE (n {someField: row.someCol}) 
RETURN n
```

**Warning**  
It is not considered good practice to use a clause that produces a large result set, like `MATCH (n)`, before a `CALL` clause. Doing so leads to a long-running query, due to the cross product between incoming solutions from prior clauses and the rows read by `neptune.read()`. It's recommended to start the query with `CALL neptune.read(...)`.

## Property column headers
<a name="property-column-headers"></a>

You can specify a column for a property by using the following syntax. The type names are not case sensitive. If a colon appears within a property name, it must be escaped by preceding it with a backslash: `\:`.

```
propertyname:type
```

**Note**  
Space, comma, carriage return and newline characters are not allowed in the column headers, so property names cannot include these characters.
You can specify a column for an array type by adding `[]` to the type:  

  ```
  propertyname:type[]
  ```
Edge properties can only have a single value; specifying an array type or a second value causes an error. The following example shows the column header for a property named age of type Int:  

  ```
  age:Int
  ```

Every row in the file would be required to have an integer in that position or be left empty. Arrays of strings are allowed, but strings in an array cannot include the semicolon (`;`) character unless it is escaped using a backslash (`\;`).

## Supported CSV column types
<a name="supported-csv-column-types"></a>
+ **BOOL (or BOOLEAN)** - Allowed values: true, false. Indicates a Boolean field. Any value other than true will be treated as false.
+ **FLOAT** - Range: 32-bit IEEE 754 floating point including Infinity, INF, -Infinity, -INF and NaN (not-a-number).
+ **DOUBLE** - Range: 64-bit IEEE 754 floating point including Infinity, INF, -Infinity, -INF and NaN (not-a-number).
+ **STRING** - 
  + Quotation marks are optional. Commas, newline, and carriage return characters are automatically escaped if they are included in a string surrounded by double quotation marks ("). Example: "Hello, World".
  + To include quotation marks in a quoted string, you can escape the quotation mark by using two in a row: Example: "Hello ""World""".
  + Arrays of strings are allowed, but strings in an array cannot include the semicolon (;) character unless it is escaped using a backslash (\;).
  + If you want to surround strings in an array with quotation marks, you must surround the whole array with one set of quotation marks. Example: "String one; String 2; String 3".
+ **DATE, DATETIME** - The datetime values can be provided in either the XSD format, or one of the following formats: 
  + yyyy-MM-dd
  + yyyy-MM-ddTHH:mm
  + yyyy-MM-ddTHH:mm:ss
  + yyyy-MM-ddTHH:mm:ssZ
  + yyyy-MM-ddTHH:mm:ss.SSSZ
  + yyyy-MM-ddTHH:mm:ss[+|-]hhmm
  + yyyy-MM-ddTHH:mm:ss.SSS[+|-]hhmm
+ **SIGNED INTEGER** - 
  + Byte: -128 to 127
  + Short: -32768 to 32767
  + Int: -2^31 to 2^31-1
  + Long: -2^63 to 2^63-1
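The string quoting rules above follow standard CSV conventions, which can be checked with Python's `csv` module:

```python
import csv
import io

# Illustrative: doubled quotes escape a quote inside a quoted field, and
# commas are allowed inside quoted fields — matching the rules above.
raw = '"Hello, World","Hello ""World"""\n'
row = next(csv.reader(io.StringIO(raw)))
print(row)  # ['Hello, World', 'Hello "World"']
```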

**Neptune-specific column types:**
+ A column of type Any is supported in the user columns. The Any type is syntactic sugar for all of the other supported types, and is useful when a single column contains values of multiple types. The payload of an Any type value is a semicolon-separated list of JSON strings, such as `{"value": "10", "type": "Int"};{"value": "1.0", "type": "Float"}`, where each JSON string has a value field and a type field. The column header of an Any type is `propertyname:Any`. The cardinality of an Any column is set, meaning that the column can accept multiple values. 
  + Neptune supports the following types in an Any type: Bool (or Boolean), Byte, Short, Int, Long, UnsignedByte, UnsignedShort, UnsignedInt, UnsignedLong, Float, Double, Date, dateTime, String, and Geometry.
  + Vector type is not supported in Any type.
  + Nested Any type is not supported. For example, `{"value": {"value": "10", "type": "Int"}, "type": "Any"}`.
+ A Geometry column type is supported in the user columns. The payload of these columns must only contain Geometry primitives of type Point, provided as strings in Well-known text (WKT) format. For example, POINT (30 10) would be a valid Geometry value.

## Sample CSV output
<a name="sample-csv-output"></a>

Given the following CSV file:

```
<s3 path>
colA:byte,colB:short,colC:int,colD:long,colE:float,colF:double,colG:string
-128,-32768,-2147483648,-9223372036854775808,1.23456,1.23457,first
127,32767,2147483647,9223372036854775807,nan,nan,second
0,0,0,0,-inf,-inf,third
0,0,0,0,inf,inf,fourth
```

This example shows the output returned by neptune.read using the following query:

```
aws neptunedata execute-open-cypher-query \
--open-cypher-query "CALL neptune.read({source: '<s3 path>', format: 'csv'}) YIELD row RETURN row" \
--endpoint-url https://my-cluster-name.cluster-abcdefgh1234.us-east-1.neptune.amazonaws.com:8182
```

```
{
  "results": [{
      "row": {
        "colD": -9223372036854775808,
        "colC": -2147483648,
        "colE": 1.23456,
        "colB": -32768,
        "colF": 1.2345699999999999,
        "colG": "first",
        "colA": -128
      }
    }, {
      "row": {
        "colD": 9223372036854775807,
        "colC": 2147483647,
        "colE": "NaN",
        "colB": 32767,
        "colF": "NaN",
        "colG": "second",
        "colA": 127
      }
    }, {
      "row": {
        "colD": 0,
        "colC": 0,
        "colE": "-INF",
        "colB": 0,
        "colF": "-INF",
        "colG": "third",
        "colA": 0
      }
    }, {
      "row": {
        "colD": 0,
        "colC": 0,
        "colE": "INF",
        "colB": 0,
        "colF": "INF",
        "colG": "fourth",
        "colA": 0
      }
    }]
}
```

Currently, there is no way to set a node or edge label from a data field coming from a CSV file. It is recommended that you partition the work into multiple queries, one for each label/type.

```
CALL neptune.read({source: '<s3 path>', format: 'csv'})
 YIELD row 
WHERE row.`~label` = 'airport'
CREATE (n:airport)

CALL neptune.read({source: '<s3 path>', format: 'csv'})
YIELD row 
WHERE row.`~label` = 'country'
CREATE (n:country)
```

# Managing permissions for neptune.read()
<a name="access-graph-opencypher-21-extensions-s3-read-permissions"></a>

## Required IAM Policies
<a name="access-graph-opencypher-21-extensions-s3-read-permissions-iam"></a>

To execute openCypher queries that use `neptune.read()`, you must have the appropriate permissions to access data in your Neptune database. Read-only queries require the `ReadDataViaQuery` action. Queries that modify data require `WriteDataViaQuery` for insertions or `DeleteDataViaQuery` for deletions. The example below grants all three actions on the specified cluster.

Additionally, you need permissions to access the S3 bucket containing your data files. The NeptuneS3Access policy statement grants the required S3 permissions:
+ **`s3:ListBucket`**: Required to verify bucket existence and list contents.
+ **`s3:GetObject`**: Required to access the specified object so its content can be read for integration into openCypher queries.

If your S3 bucket uses server-side encryption with AWS KMS, you must also grant KMS permissions. The NeptuneS3KMSAccess policy statement allows Neptune to decrypt data and generate data keys when accessing encrypted S3 objects. The condition restricts KMS operations to requests originating from S3 and RDS services in your region.
+ **`kms:Decrypt`**: Required to perform decryption of the encrypted object so its data can be read by Neptune.
+ **`kms:GenerateDataKey`**: Also required by the S3 API used to retrieve objects to be read.

```
{
  "Sid": "NeptuneQueryAccess",
  "Effect": "Allow",
  "Action": [
      "neptune-db:ReadDataViaQuery",
      "neptune-db:WriteDataViaQuery",
      "neptune-db:DeleteDataViaQuery"
  ],
  "Resource": "arn:aws:neptune-db:<REGION>:<AWS_ACCOUNT_ID>:<CLUSTER_RESOURCE_ID>/*"
},
{
  "Sid": "NeptuneS3Access",
  "Effect": "Allow",
  "Action": [
      "s3:ListBucket",
      "s3:GetObject"
  ],
  "Resource": [
      "arn:aws:s3:::neptune-read-bucket",
      "arn:aws:s3:::neptune-read-bucket/*"
  ]
},
{
  "Sid": "NeptuneS3KMSAccess",
  "Effect": "Allow",
  "Action": [
      "kms:Decrypt",
      "kms:GenerateDataKey"
  ],
  "Resource": "arn:aws:kms:<REGION>:<AWS_ACCOUNT_ID>:key/<KEY_ID>",
  "Condition": {
      "StringEquals": {
        "kms:ViaService": [
            "s3.<REGION>.amazonaws.com",
            "rds.<REGION>.amazonaws.com"
        ]
      }
  }
}
```
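The statements above are fragments; they belong in the `Statement` array of a complete IAM policy document. The following sketch assembles them programmatically so the placeholders stay in one place (the region, account ID, cluster resource ID, bucket name, and KMS key ID passed in below are illustrative values, not real resources):

```python
import json

# Assemble the three statement fragments into a complete IAM policy document.
# All arguments are placeholders for your own resource identifiers.
def build_neptune_read_policy(region, account_id, cluster_id, bucket, kms_key_id):
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "NeptuneQueryAccess",
                "Effect": "Allow",
                "Action": [
                    "neptune-db:ReadDataViaQuery",
                    "neptune-db:WriteDataViaQuery",
                    "neptune-db:DeleteDataViaQuery",
                ],
                "Resource": f"arn:aws:neptune-db:{region}:{account_id}:{cluster_id}/*",
            },
            {
                "Sid": "NeptuneS3Access",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            },
            {
                "Sid": "NeptuneS3KMSAccess",
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
                "Resource": f"arn:aws:kms:{region}:{account_id}:key/{kms_key_id}",
                "Condition": {
                    "StringEquals": {
                        "kms:ViaService": [
                            f"s3.{region}.amazonaws.com",
                            f"rds.{region}.amazonaws.com",
                        ]
                    }
                },
            },
        ],
    }

policy_json = json.dumps(
    build_neptune_read_policy(
        "us-east-1", "123456789012", "cluster-ABC123", "neptune-read-bucket", "abcd-1234"
    ),
    indent=2,
)
```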

## Important prerequisites
<a name="access-graph-opencypher-21-extensions-s3-read-permissions-prerequisites"></a>

These permissions and prerequisites ensure secure and reliable integration of S3 data into openCypher queries, while maintaining proper access controls and data protection measures.
+ **IAM authentication**: This feature is only supported for Neptune clusters with IAM authentication enabled. See [Securing your Amazon Neptune database](security.md) for detailed instructions on how to create and connect to IAM authentication-enabled clusters.
+ **VPC endpoint**:
  + A Gateway-type VPC endpoint for Amazon S3 is required to allow Neptune to communicate with Amazon S3.
  + To use custom AWS KMS encryption in the query, an Interface-type VPC endpoint for AWS KMS is required to allow Neptune to communicate with AWS KMS.
  + For detailed instructions for how to configure this endpoint, see [Creating the Amazon S3 VPC Endpoint](bulk-load-tutorial-IAM.md).

# Spatial Data
<a name="access-graph-opencypher-22-spatial-data"></a>

Amazon Neptune now supports spatial queries, allowing you to store and analyze geometric data in your graph. While commonly used for geographic locations (like coordinates on a map), spatial features work with any two-dimensional data where position and proximity matter. Use this feature to answer questions like "Which stores are within 5 miles of this customer?", "Find all delivery routes that intersect with this service area," or "Which components in this floor plan overlap with the HVAC zone?" Neptune implements spatial support using industry-standard Spatial Types functions that work with points, polygons, and other geometric shapes. You can store spatial data as properties on nodes and edges, then use spatial functions to calculate distances, check if points fall within boundaries, or find overlapping regions, all within your openCypher queries.

**Common use cases**:
+ **Geographic applications**: Location-based recommendations, geofencing, route planning, and territory analysis
+ **Facility and space management**: Floor plan layouts, equipment placement, and zone coverage
+ **Network topology**: Physical infrastructure mapping, coverage areas, and service boundaries
+ **Design and CAD**: Component positioning, collision detection, and spatial relationships in 2D designs
+ **Game development**: Character positioning, collision detection, and area-of-effect calculations

The Spatial Types implementation in Amazon Neptune follows the ISO/IEC 13249-3:2016 standard, as many other databases do. The [Spatial Functions](access-graph-opencypher-22-spatial-functions.md) are available in the openCypher query language.

## Coordinate system
<a name="access-graph-opencypher-22-spatial-data-coordinate-system"></a>

Neptune uses a single Spatial Reference Identifier (SRID) for an entire database. A homogeneous coordinate system reduces user errors in querying and improves database performance. The first release (1.4.7.0) supports the Cartesian coordinate system, also referred to as SRID 0.

The Neptune implementation of SRID 0 is compatible with longitude and latitude values. Use `ST_DistanceSpheroid` to calculate distances based on WGS84/SRID 4326.

The current implementation supports storing 3-dimensional coordinates. The Spatial Functions currently only support using the x- and y-axis (2-dimensional) coordinates. The z-axis coordinates are currently not supported by the available Spatial Functions.

## Storing location data
<a name="storing-spatial-data"></a>

Store location data on nodes and edges using the Geometry property type. Create Geometry values from Well-Known Text (WKT) format, a standard way to represent geographic shapes as text. For example, to store a point location:

```
CREATE (n:airport {code: 'ATL', location: ST_GeomFromText('POINT (-84.4281 33.6367)')})
```

When working with geographic coordinates, the first argument (x) represents longitude and the second argument (y) represents latitude. This follows the standard coordinate order used in spatial databases and the ISO 19125 standard.

**Note**  
 Neptune now supports a new data type called "Geometry". The Geometry property of a node or an edge can be created from a WKT string using the `ST_GeomFromText` function.  
Neptune will automatically store Points data in a specialized spatial index to improve the performance of the Spatial Types functions. For instance, `ST_Contains` used to find the points within a polygon is accelerated by the specialized spatial index.  
[Wikipedia page for Well-Known Text representation of geometry](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry)

## Loading spatial data in bulk
<a name="loading-spatial-data-bulk"></a>

When bulk loading data, specify the Geometry type in your CSV header. Neptune will parse WKT strings and create the appropriate Geometry properties:

```
:ID,:LABEL,code:String,city:String,location:Geometry
21,airport,ATL,Atlanta,POINT (-84.42810059 33.63669968)
32,airport,ANC,Anchorage,POINT (-149.9960022 61.17440033)
43,airport,AUS,Austin,POINT (-97.66989899 30.19449997)
```

For complete CSV format details, see [openCypher bulk load format](bulk-load-tutorial-format-opencypher.md).
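Because WKT strings for shapes other than points contain commas, it is safest to produce the load file with a CSV library rather than string concatenation, so quoting is handled for you. A minimal sketch (the sample rows mirror the header above):

```python
import csv
import io

# Write a bulk-load CSV with a Geometry-typed column.
# The WKT string goes into the cell as-is; Neptune parses it at load time.
rows = [
    ("21", "airport", "ATL", "Atlanta", "POINT (-84.42810059 33.63669968)"),
    ("32", "airport", "ANC", "Anchorage", "POINT (-149.9960022 61.17440033)"),
    ("43", "airport", "AUS", "Austin", "POINT (-97.66989899 30.19449997)"),
]
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([":ID", ":LABEL", "code:String", "city:String", "location:Geometry"])
writer.writerows(rows)
print(buf.getvalue())
```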

## Querying spatial data
<a name="querying-spatial-data"></a>

The following query examples use the [air-routes dataset](https://github.com/krlawrence/graph/tree/main/sample-data) to demonstrate how to use Spatial Functions in Neptune.

If your data has separate latitude and longitude properties instead of a Geometry property, you can convert them to points at query time. Find the 10 nearest airports to a given location:

```
MATCH (a:airport)
WITH a, ST_GeomFromText('POINT (' + a.lon + ' ' + a.lat + ')') AS airportLocation
WITH a, airportLocation, ST_Distance(ST_GeomFromText('POINT (-84.4281 33.6367)'), airportLocation) AS distance
WHERE distance IS NOT NULL
RETURN a.code, a.city, distance
ORDER BY distance ASC
LIMIT 10
```

If location values are already stored as Geometry points (for example, created with `ST_Point` or `ST_GeomFromText`), you can use those property values directly:

1. Set the property

   ```
   MATCH (a:airport)
   SET a.location = ST_GeomFromText('POINT (' + a.lon + ' ' + a.lat + ')')
   ```

1. Query using ST_Distance:

   ```
   MATCH (a:airport)
   WHERE a.location IS NOT NULL
   WITH a, ST_Distance(ST_GeomFromText('POINT (-84.4281 33.6367)'), a.location) AS distance
   RETURN a.code, a.city, distance
   ORDER BY distance ASC
   LIMIT 10
   ```

### Using the Bolt driver
<a name="querying-spatial-data-bolt"></a>

Most query methods return Geometry values as WKT strings, which are human-readable. If you're using the Bolt driver, Geometry values are returned in WKB (Well-Known Binary) format for efficiency. Convert WKB to a Geometry object in your application:

```
try (Session session = driver.session()) {
    Result result = session.run("MATCH (n:airport {code: 'ATL'}) RETURN n.location as geom");
    
    Record record = result.single();
    byte[] wkbBytes = record.get("geom").asByteArray();
    
    // Convert WKB to Geometry object using JTS library
    WKBReader wkbReader = new WKBReader();
    Geometry geom = wkbReader.read(wkbBytes);
}
```
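For reference, a WKB point is a byte-order flag, a 4-byte geometry-type code (1 for Point), and two IEEE 754 doubles. If you prefer not to pull in a geometry library, a minimal sketch of decoding just the point case looks like this (standard WKB layout; not a full parser for other geometry types):

```python
import struct

def parse_wkb_point(wkb: bytes):
    # Byte 0: 0 = big-endian, 1 = little-endian.
    endian = "<" if wkb[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", wkb, 1)
    if geom_type != 1:  # 1 is the WKB type code for Point
        raise ValueError(f"not a WKB point: type {geom_type}")
    x, y = struct.unpack_from(endian + "dd", wkb, 5)
    return x, y

# Round-trip check: encode the ATL coordinates, then decode them.
atl = struct.pack("<BIdd", 1, 1, -84.4281, 33.6367)
```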

# Spatial Functions
<a name="access-graph-opencypher-22-spatial-functions"></a>

The following spatial functions are available in Neptune openCypher for working with geometry data types:
+ [ST_Point](access-graph-opencypher-22-spatial-functions-st-point.md)
+ [ST_GeomFromText](access-graph-opencypher-22-spatial-functions-st-geomfromtext.md)
+ [ST_AsText](access-graph-opencypher-22-spatial-functions-st-astext.md)
+ [ST_GeometryType](access-graph-opencypher-22-spatial-functions-st-geometrytype.md)
+ [ST_Equals](access-graph-opencypher-22-spatial-functions-st-equals.md)
+ [ST_Contains](access-graph-opencypher-22-spatial-functions-st-contains.md)
+ [ST_Intersects](access-graph-opencypher-22-spatial-functions-st-intersect.md)
+ [ST_Distance](access-graph-opencypher-22-spatial-functions-st-distance.md)
+ [ST_DistanceSpheroid](access-graph-opencypher-22-spatial-functions-st-distancespheroid.md)
+ [ST_Envelope](access-graph-opencypher-22-spatial-functions-st-envelope.md)
+ [ST_Buffer](access-graph-opencypher-22-spatial-functions-st-buffer.md)

# ST_Point
<a name="access-graph-opencypher-22-spatial-functions-st-point"></a>

ST_Point returns a point from the input coordinate values.

**Syntax**

```
ST_Point(x, y, z)
```

**Arguments**
+ `x` - A value of data type DOUBLE PRECISION that represents a first coordinate.
+ `y` - A value of data type DOUBLE PRECISION that represents a second coordinate.
+ `z` - (optional) A value of data type DOUBLE PRECISION that represents a third coordinate. It is stored, but not used by the currently available spatial functions.

**Coordinate order**

When working with geographic coordinates, the first argument (`x`) represents **longitude** and the second argument (`y`) represents **latitude**. This follows the standard coordinate order used in spatial databases and the ISO 19125 standard.

```
// Correct: longitude first, latitude second
ST_Point(-84.4281, 33.6367)  // Atlanta airport

// Incorrect: latitude first, longitude second
ST_Point(33.6367, -84.4281)  // This will return NaN in distance calculations
```

**Valid coordinate ranges**

For geographic data, ensure coordinates fall within valid ranges:
+ Longitude (`x`): -180 to 180
+ Latitude (`y`): -90 to 90

Coordinates outside these ranges will return `NaN` (Not a Number) when used with distance calculation functions like `ST_DistanceSpheroid`.
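Because out-of-range or swapped coordinates surface only later as `NaN` in distance calculations, it can pay to validate values before writing them. A minimal sketch (the helper name is ours, not a Neptune API):

```python
def validate_lon_lat(lon: float, lat: float) -> None:
    # Longitude is the x coordinate, latitude the y coordinate (ISO 19125 order).
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude {lon} outside [-180, 180]")
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude {lat} outside [-90, 90]")

validate_lon_lat(-84.4281, 33.6367)  # Atlanta airport: passes silently
```

Note that swapped coordinates are not always caught this way; Atlanta's values swapped still fall inside both ranges, so range checks complement, rather than replace, careful argument ordering.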

**Return type**

GEOMETRY of subtype POINT

If x or y is null, then null is returned.

**Examples**

The following constructs a point geometry from the input coordinates.

```
RETURN ST_Point(5.0, 7.0); 
POINT(5 7)
```

# ST_GeomFromText
<a name="access-graph-opencypher-22-spatial-functions-st-geomfromtext"></a>

ST_GeomFromText constructs a geometry object from a well-known text (WKT) representation of an input geometry.

**Syntax**

```
ST_GeomFromText(wkt_string)
```

**Arguments**
+ `wkt_string` - A value of data type STRING that is a WKT representation of a geometry.

**Return type**

GEOMETRY

If wkt_string is null, then null is returned.

If wkt_string is not valid, then a BadRequestException is returned.

**Examples**

```
RETURN ST_GeomFromText('POLYGON((0 0,0 1,1 1,1 0,0 0))')             
POLYGON((0 0,0 1,1 1,1 0,0 0))
```

# ST_AsText
<a name="access-graph-opencypher-22-spatial-functions-st-astext"></a>

ST_AsText returns the well-known text (WKT) representation of an input geometry.

**Syntax**

```
ST_AsText(geo)
```

**Arguments**
+ `geo` - A value of data type GEOMETRY, or an expression that evaluates to a GEOMETRY.

**Return type**

STRING

If geo is null, then null is returned.

If the input parameter is not a Geometry, then a BadRequestException is returned.

If the result is larger than a 64-KB STRING, then an error is returned.

**Examples**

```
RETURN ST_AsText(ST_GeomFromText('POLYGON((0 0,0 1,1 1,1 0,0 0))'))             
POLYGON((0 0,0 1,1 1,1 0,0 0))
```

# ST_GeometryType
<a name="access-graph-opencypher-22-spatial-functions-st-geometrytype"></a>

ST_GeometryType returns the type of the geometry as a string.

**Syntax**

```
ST_GeometryType(geom)
```

**Arguments**
+ `geom` - A value of data type GEOMETRY or an expression that evaluates to a GEOMETRY type.

**Return type**

STRING

If geom is null, then null is returned.

If the input parameter is not a Geometry, then a BadRequestException is returned.

**Examples**

```
RETURN ST_GeometryType(ST_GeomFromText('LINESTRING(77.29 29.07,77.42 29.26,77.27 29.31,77.29 29.07)'));
ST_LineString
```

# ST_Equals
<a name="access-graph-opencypher-22-spatial-functions-st-equals"></a>

ST_Equals returns true if the 2D projections of the input geometries are topologically equal. Geometries are considered topologically equal if they have equal point sets. In topologically equal geometries, the order of vertices may differ while maintaining this equality.

**Syntax**

```
ST_Equals(geom1, geom2)
```

**Arguments**
+ `geom1` - A value of data type GEOMETRY or an expression that evaluates to a GEOMETRY type.
+ `geom2` - A value of data type GEOMETRY or an expression that evaluates to a GEOMETRY type. This value is compared with geom1 to determine if it is equal to geom1.

**Return type**

BOOLEAN

If geom1 or geom2 is null, then null is returned.

If geom1 or geom2 are not Geometries, then a BadRequestException is returned.

**Examples**

```
RETURN ST_Equals(
    ST_GeomFromText('POLYGON ((0 2,1 1,0 -1,0 2))'), 
    ST_GeomFromText('POLYGON((-1 3,2 1,0 -3,-1 3))'));
false
```

The following checks if the two linestrings are geometrically equal.

```
RETURN ST_Equals(
    ST_GeomFromText('LINESTRING (1 0, 10 0)'), 
    ST_GeomFromText('LINESTRING(1 0,5 0,10 0)'));
true
```
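The second result holds because the two point sets are identical: the middle vertex (5 0) lies on the segment from (1 0) to (10 0), so it adds nothing. A sketch of that check (our helper, not a Neptune function): drop any interior vertex that is collinear with, and between, its neighbors, then compare the normalized lines.

```python
def normalize(line):
    # Remove interior vertices that lie on the segment joining their neighbors.
    out = [line[0]]
    for a, b, c in zip(line, line[1:], line[2:]):
        # Zero cross product means a, b, c are collinear.
        cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
        between = min(a[0], c[0]) <= b[0] <= max(a[0], c[0]) and \
                  min(a[1], c[1]) <= b[1] <= max(a[1], c[1])
        if not (cross == 0 and between):
            out.append(b)
    out.append(line[-1])
    return out
```

With this, `normalize([(1, 0), (5, 0), (10, 0)])` reduces to the two-vertex line, mirroring why `ST_Equals` returns true above.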

# ST_Contains
<a name="access-graph-opencypher-22-spatial-functions-st-contains"></a>

ST_Contains returns true if the 2D projection of the first input geometry contains the 2D projection of the second input geometry. Geometry A contains geometry B if every point in B is a point in A, and their interiors have nonempty intersection. ST_Contains(A, B) is equivalent to ST_Within(B, A).

**Syntax**

```
ST_Contains(geom1, geom2)
```

**Arguments**
+ `geom1` - A value of type GEOMETRY or an expression that evaluates to a GEOMETRY type.
+ `geom2` - A value of type GEOMETRY or an expression that evaluates to a GEOMETRY type. This value is compared with geom1 to determine if it is contained within geom1.

**Return type**

BOOLEAN

If geom1 or geom2 is null, then null is returned.

If the input parameter is not a Geometry, then a BadRequestException is returned.

**Examples**

```
RETURN ST_Contains(
    ST_GeomFromText('POLYGON((0 2,1 1,0 -1,0 2))'), 
    ST_GeomFromText('POLYGON((-1 3,2 1,0 -3,-1 3))'));
false
```

# ST_Intersects
<a name="access-graph-opencypher-22-spatial-functions-st-intersect"></a>

ST_Intersects returns true if the 2D projections of the two input geometries have at least one point in common.

**Syntax**

```
ST_Intersects(geom1, geom2)
```

**Arguments**
+ `geom1` - A value of data type GEOMETRY or an expression that evaluates to a GEOMETRY type.
+ `geom2` - A value of data type GEOMETRY or an expression that evaluates to a GEOMETRY type.

**Return type**

BOOLEAN

If geom1 or geom2 is null, then null is returned.

If the input parameter is not a Geometry, then a BadRequestException is returned.

**Examples**

```
RETURN ST_Intersects(
    ST_GeomFromText('POLYGON((0 0,10 0,10 10,0 10,0 0),(2 2,2 5,5 5,5 2,2 2))'), 
    ST_GeomFromText('MULTIPOINT((4 4),(6 6))'));
true
```
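In that example the point (4 4) falls inside the interior ring (the hole), so only (6 6) actually lies in the polygon; one shared point is enough for `ST_Intersects` to return true. A sketch of the even-odd (ray-casting) membership test behind that reasoning (our helper for the interior case only; boundary semantics need the full topological predicates):

```python
# Even-odd (ray-casting) point-in-ring test; rings are lists of (x, y) vertices.
def point_in_ring(pt, ring):
    x, y = pt
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray at height y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# The polygon from the example: a 10x10 square with a square hole.
outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (2, 5), (5, 5), (5, 2)]

def in_polygon_with_hole(pt):
    return point_in_ring(pt, outer) and not point_in_ring(pt, hole)
```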

# ST_Distance
<a name="access-graph-opencypher-22-spatial-functions-st-distance"></a>

For input geometries, ST_Distance returns the minimum Euclidean distance between the 2D projections of the two input geometry values.

**Syntax**

```
ST_Distance(geo1, geo2)
```

**Arguments**
+ `geo1` - A value of data type GEOMETRY, or an expression that evaluates to a GEOMETRY type.
+ `geo2` - A value of data type GEOMETRY, or an expression that evaluates to a GEOMETRY.

**Return type**

DOUBLE PRECISION in the same units as the input geometries.

If geo1 or geo2 is null, then null is returned.

If the input parameter is not a Geometry, then a BadRequestException is returned.

**Examples**

```
RETURN ST_Distance(
    ST_GeomFromText('POLYGON((0 2,1 1,0 -1,0 2))'), 
    ST_GeomFromText('POLYGON((-1 -3,-2 -1,0 -3,-1 -3))'));
1.4142135623731
```
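In this example the minimum is attained between the vertex (0 -1) of the first polygon and the edge from (-2 -1) to (0 -3) of the second, which is why the result is √2 rather than a vertex-to-vertex distance. A sketch of the point-to-segment calculation that produces it (our helper, based on inspecting the example geometries):

```python
import math

def point_segment_distance(p, a, b):
    # Project p onto segment ab, clamping the projection to the endpoints.
    ax, ay = a
    bx, by = b
    px, py = p
    abx, aby = bx - ax, by - ay
    t = ((px - ax) * abx + (py - ay) * aby) / (abx * abx + aby * aby)
    t = max(0.0, min(1.0, t))
    return math.dist(p, (ax + t * abx, ay + t * aby))

d = point_segment_distance((0, -1), (-2, -1), (0, -3))
```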

# ST_DistanceSpheroid
<a name="access-graph-opencypher-22-spatial-functions-st-distancespheroid"></a>

ST_DistanceSpheroid returns the minimum distance in meters between two longitude/latitude geometries. The spheroid is WGS84 (SRID 4326).

**Syntax**

```
ST_DistanceSpheroid(geom1, geom2);
```

**Arguments**
+ `geom1` - A value of data type GEOMETRY or an expression that evaluates to a GEOMETRY type.
+ `geom2` - A value of data type GEOMETRY or an expression that evaluates to a GEOMETRY type.

**Return type**

FLOAT

If geom1 or geom2 is null, then null is returned.

**Examples**

```
RETURN ST_DistanceSpheroid(
    ST_GeomFromText('POINT(-110 42)'),
    ST_GeomFromText('POINT(-118 38)'))
814278.77
```
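As a sanity check on that figure, the spherical (haversine) distance with a mean Earth radius of 6,371 km lands within a fraction of a percent of the spheroidal result; the small gap is the WGS84 flattening that ST_DistanceSpheroid accounts for. A sketch of the spherical approximation (our helper; Neptune itself computes on the ellipsoid):

```python
import math

def haversine_m(lon1, lat1, lon2, lat2, r=6_371_000.0):
    # Great-circle distance on a sphere of radius r, in meters.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

d = haversine_m(-110, 42, -118, 38)  # close to the 814278.77 m spheroidal result
```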

# ST_Envelope
<a name="access-graph-opencypher-22-spatial-functions-st-envelope"></a>

ST_Envelope returns the minimum bounding box of the input geometry, as follows:
+ If the input geometry is empty, the returned geometry will be POINT EMPTY.
+ If the minimum bounding box of the input geometry degenerates to a point, the returned geometry is a point.
+ If none of the preceding is true, the function returns a counter-clockwise-oriented polygon whose vertices are the corners of the minimum bounding box.

For all nonempty input, the function operates on the 2D projection of the input geometry.

**Syntax**

```
ST_Envelope(geom)
```

**Arguments**
+ `geom` - A value of data type GEOMETRY or an expression that evaluates to a GEOMETRY type.

**Return type**

GEOMETRY

If geom is null, then null is returned.

**Examples**

```
RETURN ST_Envelope(ST_GeomFromText("POLYGON ((2 1, 4 3, 6 1, 5 5, 3 4, 2 1))"))
POLYGON ((2 1, 6 1, 6 5, 2 5, 2 1))
```
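The bounding box is simply the coordinate extremes of the input vertices, emitted as a counter-clockwise ring closed by repeating the first vertex. A sketch reproducing the example's result (our helper, not Neptune's implementation):

```python
def envelope(coords):
    # Minimum bounding box of a list of (x, y) vertices.
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    # Counter-clockwise ring starting at the lower-left corner, closed.
    return [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]

ring = envelope([(2, 1), (4, 3), (6, 1), (5, 5), (3, 4)])
```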

# ST_Buffer
<a name="access-graph-opencypher-22-spatial-functions-st-buffer"></a>

ST_Buffer returns a 2D geometry that represents all points whose distance from the input geometry, projected on the xy-Cartesian plane, is less than or equal to the input distance.

**Syntax**

```
ST_Buffer(geom, distance, number_of_segments_per_quarter_circle)
```

**Arguments**
+ `geom` - A value of data type GEOMETRY or an expression that evaluates to a GEOMETRY type.
+ `distance` - A value of data type DOUBLE PRECISION that represents distance (or radius) of the buffer.
+ `number_of_segments_per_quarter_circle` - A value of data type INTEGER (should be larger or equal to 0). This value determines the number of points to approximate a quarter circle around each vertex of the input geometry. Negative values default to zero. The default is 8.

**Return type**

GEOMETRY

The ST_Buffer function returns two-dimensional (2D) geometry in the xy-Cartesian plane.

**Examples**

```
RETURN ST_Buffer(ST_GeomFromText('LINESTRING (1 2,5 2,5 8)'), 2, 4);
POLYGON ((3 4, 3 8, 3.1522409349774265 8.76536686473018,
         3.585786437626905 9.414213562373096, 4.234633135269821 9.847759065022574,
         5 10, 5.765366864730179 9.847759065022574,
         6.414213562373095 9.414213562373096, 6.847759065022574 8.76536686473018,
         7 8, 7 2, 6.847759065022574 1.2346331352698203,
         6.414213562373095 0.5857864376269051, 5.765366864730179 0.1522409349774265,
         5 0, 1 0, 0.2346331352698193 0.152240934977427,
         -0.4142135623730954 0.5857864376269051,
         -0.8477590650225737 1.2346331352698208, -1 2.0000000000000004,
         -0.8477590650225735 2.7653668647301797,
         -0.4142135623730949 3.414213562373095,
         0.2346331352698206 3.8477590650225735, 1 4, 3 4))
```

The following returns the buffer of the input point geometry, which approximates a circle. Because the command specifies 8 as the number of segments per quarter circle, the function uses eight segments to approximate each quarter of the circle.

```
RETURN ST_Buffer(ST_GeomFromText('POINT (1 1)'), 1.0, 8);
POLYGON ((2 1, 1.9807852804032304 0.8049096779838718,
     1.9238795325112867 0.6173165676349102, 1.8314696123025453 0.4444297669803978,
     1.7071067811865475 0.2928932188134525, 1.5555702330196022 0.1685303876974548,
     1.3826834323650898 0.0761204674887133, 1.1950903220161284 0.0192147195967696,
     1 0, 0.8049096779838718 0.0192147195967696, 0.6173165676349103 0.0761204674887133,
    0.444429766980398 0.1685303876974545, 0.2928932188134525 0.2928932188134524,
     0.1685303876974546 0.4444297669803978, 0.0761204674887133 0.6173165676349102,
     0.0192147195967696 0.8049096779838714, 0 0.9999999999999999,
     0.0192147195967696 1.1950903220161284, 0.0761204674887132 1.3826834323650896,
     0.1685303876974545 1.555570233019602, 0.2928932188134523 1.7071067811865475,
     0.4444297669803978 1.8314696123025453, 0.6173165676349097 1.9238795325112865,
     0.8049096779838714 1.9807852804032304, 0.9999999999999998 2,
     1.1950903220161284 1.9807852804032304, 1.38268343236509 1.9238795325112865,
     1.5555702330196017 1.8314696123025453, 1.7071067811865475 1.7071067811865477,
     1.8314696123025453 1.5555702330196022, 1.9238795325112865 1.3826834323650905,
     1.9807852804032304 1.1950903220161286, 2 1))
```
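For a point input, the buffer is a regular polygon approximating a circle: 4 × n distinct vertices for n segments per quarter circle, plus one repeated vertex to close the ring, which is why the output above has 33 coordinate pairs. A sketch generating such a ring (pure geometry; vertex ordering may differ from Neptune's):

```python
import math

def point_buffer_ring(cx, cy, radius, segments_per_quarter=8):
    # Approximate a circle with 4 * n segments, then close the ring.
    n = 4 * segments_per_quarter
    ring = [(cx + radius * math.cos(2 * math.pi * k / n),
             cy + radius * math.sin(2 * math.pi * k / n)) for k in range(n)]
    ring.append(ring[0])
    return ring

ring = point_buffer_ring(1.0, 1.0, 1.0)  # 33 vertices, all at distance 1 from (1, 1)
```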

# Accessing the Neptune graph with SPARQL
<a name="access-graph-sparql"></a>

SPARQL is a query language for the Resource Description Framework (RDF), which is a graph data format designed for the web. Amazon Neptune is compatible with SPARQL 1.1. This means that you can connect to a Neptune DB instance and query the graph using the query language described in the [SPARQL 1.1 Query Language](https://www.w3.org/TR/sparql11-query/) specification.

 A query in SPARQL consists of a `SELECT` clause to specify the variables to return and a `WHERE` clause to specify which data to match in the graph. If you are unfamiliar with SPARQL queries, see [Writing Simple Queries](https://www.w3.org/TR/sparql11-query/#WritingSimpleQueries) in the [SPARQL 1.1 Query Language](https://www.w3.org/TR/sparql11-query/).

**Important**  
To load data, `SPARQL UPDATE INSERT` may work well for a small dataset, but if you need to load a substantial amount of data from a file, see [Using the Amazon Neptune bulk loader to ingest data](bulk-load.md).

For more information about the specifics of Neptune's SPARQL implementation, see [SPARQL standards compliance](feature-sparql-compliance.md).
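All of the tools below ultimately speak the SPARQL 1.1 Protocol over HTTP: a query is a form-encoded POST to the cluster's `/sparql` endpoint. A minimal sketch of building such a request (the endpoint host and port are placeholders; clusters with IAM authentication additionally require Signature Version 4 signing, which is not shown):

```python
import urllib.request
from urllib.parse import urlencode

def build_sparql_request(endpoint, query):
    # SPARQL 1.1 Protocol: query sent as an application/x-www-form-urlencoded POST.
    data = urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
        method="POST",
    )

req = build_sparql_request(
    "https://your-neptune-endpoint:8182/sparql",   # placeholder host
    "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10",
)
# urllib.request.urlopen(req) would execute it from inside the VPC.
```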

Before you begin, you must have the following:
+ A Neptune DB instance. For information about creating a Neptune DB instance, see [Creating an Amazon Neptune cluster](get-started-create-cluster.md).
+ An Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

**Topics**
+ [Using the RDF4J console to connect to a Neptune DB instance](access-graph-sparql-rdf4j-console.md)
+ [Using RDF4J Workbench to connect to a Neptune DB instance](access-graph-sparql-rdf4j-workbench.md)
+ [Using Java to connect to a Neptune DB instance](access-graph-sparql-java.md)
+ [SPARQL HTTP API](sparql-api-reference.md)
+ [SPARQL query hints](sparql-query-hints.md)
+ [SPARQL DESCRIBE behavior with respect to the default graph](sparql-default-describe.md)
+ [SPARQL query status API](sparql-api-status.md)
+ [SPARQL query cancellation](sparql-api-status-cancel.md)
+ [Using the SPARQL 1.1 Graph Store HTTP Protocol (GSP) in Amazon Neptune](sparql-graph-store-protocol.md)
+ [Analyzing Neptune query execution using SPARQL `explain`](sparql-explain.md)
+ [SPARQL federated queries in Neptune using the `SERVICE` extension](sparql-service.md)

# Using the RDF4J console to connect to a Neptune DB instance
<a name="access-graph-sparql-rdf4j-console"></a>



The RDF4J Console allows you to experiment with Resource Description Framework (RDF) graphs and queries in a REPL (read-eval-print loop) environment. 

You can add a remote graph database as a repository and query it from the RDF4J Console. This section walks you through the configuration of the RDF4J Console to connect remotely to a Neptune DB instance.

**To connect to Neptune using the RDF4J Console**

1. Download the RDF4J SDK from the [Download page](http://rdf4j.org/download/) on the RDF4J website.

1. Unzip the RDF4J SDK zip file.

1. In a terminal, navigate to the RDF4J SDK directory, and then enter the following command to run the RDF4J Console:

   ```
   bin/console.sh
   ```

   You should see output similar to the following:

   ```
   14:11:51.126 [main] DEBUG o.e.r.c.platform.PlatformFactory - os.name = linux
   14:11:51.130 [main] DEBUG o.e.r.c.platform.PlatformFactory - Detected Posix platform
   Connected to default data directory
   RDF4J Console 3.6.1
   
   3.6.1
   Type 'help' for help.
   >
   ```

   You are now at the `>` prompt. This is the general prompt for the RDF4J Console. You use this prompt for setting up repositories and other operations. A repository has its own prompt for running queries.

1. At the `>` prompt, enter the following to create a SPARQL repository for your Neptune DB instance:

    

   ```
   create sparql
   ```

1. The RDF4J Console prompts you for values for the variables required to connect to the SPARQL endpoint.

   ```
   Please specify values for the following variables:
   ```

   Specify the following values:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/neptune/latest/userguide/access-graph-sparql-rdf4j-console.html)

   For information about finding the address of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section.

   If the operation is successful, you see the following message:

    

   ```
   Repository created
   ```

1. At the `>` prompt, enter the following to connect to the Neptune DB instance:

   ```
   open neptune
   ```

   If the operation is successful, you see the following message:

    

   ```
   Opened repository 'neptune'
   ```

   You are now at the `neptune>` prompt. At this prompt, you can run queries against the Neptune graph.

    
**Note**  
Now that you have added the repository, the next time you run `bin/console.sh`, you can immediately run the `open neptune` command to connect to the Neptune DB instance.

1. At the `neptune>` prompt, enter the following to run a SPARQL query that returns up to 10 of the triples (subject-predicate-object) in the graph by using the `?s ?p ?o` query with a limit of 10. To query for something else, replace the text after the `sparql` command with another SPARQL query.

   ```
   sparql select ?s ?p ?o where {?s ?p ?o} limit 10
   ```

# Using RDF4J Workbench to connect to a Neptune DB instance
<a name="access-graph-sparql-rdf4j-workbench"></a>

This section walks you through connecting to an Amazon Neptune DB instance using RDF4J Workbench and RDF4J Server. RDF4J Server is required because it acts as a proxy between the Neptune SPARQL HTTP REST endpoint and RDF4J Workbench. 

RDF4J Workbench provides an easy interface for experimenting with a graph, including loading local files. For information, see the [Add section](https://rdf4j.org/documentation/tools/server-workbench/#add) in the RDF4J documentation.

**Prerequisites**  
Before you begin, do the following:
+ Install Java 1.8 or later.
+ Install RDF4J Server and RDF4J Workbench. For information, see [Installing RDF4J Server and RDF4J Workbench](https://rdf4j.org/documentation/tools/server-workbench/#installing-rdf4j-server-and-rdf4j-workbench).

**To use RDF4J Workbench to connect to Neptune**

1. In a web browser, navigate to the URL where the RDF4J Workbench web app is deployed. For example, if you are using Apache Tomcat, the URL is: `http://ec2-hostname:8080/rdf4j-workbench/`.

1. If you are asked to **Connect to RDF4J Server**, verify that **RDF4J Server** is installed, running, and that the server URL is correct. Then, proceed to the next step.

1. In the left pane, choose **New repository**.

   In **New repository**:
   + In the **Type** drop-down list, choose **SPARQL endpoint proxy**.
   + For **ID**, type **neptune**.
   + For **Title**, type **Neptune DB instance**.

   Choose **Next**.

1. In **New repository**:
   + For **SPARQL query endpoint URL**, type `https://your-neptune-endpoint:port/sparql`.
   + For **SPARQL update endpoint URL**, type `https://your-neptune-endpoint:port/sparql`.

   For information about finding the address of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section. 

   Choose **Create**.

1. The **neptune** repository now appears in the list of repositories. It might take a few minutes before you can use the new repository.

1. In the **Id** column of the table, choose the **neptune** link.

1. In the left pane, choose **Query**. 

    
**Note**  
If the menu items under **Explore** are disabled, you might need to reconnect to the RDF4J Server and choose the **neptune** repository again.  
You can do this by using the **[change]** links in the upper-right corner.

1. In the query field, type the following SPARQL query, and then choose **Execute**.

    

   ```
   select ?s ?p ?o where {?s ?p ?o} limit 10
   ```

    

The preceding example returns up to 10 of the triples (subject-predicate-object) in the graph by using the `?s ?p ?o` query with a limit of 10. 

# Using Java to connect to a Neptune DB instance
<a name="access-graph-sparql-java"></a>

This section walks you through the running of a complete Java sample that connects to an Amazon Neptune DB instance and performs a SPARQL query.

Follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

**To connect to Neptune using Java**

1. Install Apache Maven on your EC2 instance. If using Amazon Linux 2023 (preferred), use:

   ```
   sudo dnf update -y
   sudo dnf install maven -y
   ```

   If using Amazon Linux 2, download the latest binary from [https://maven.apache.org/download.cgi](https://maven.apache.org/download.cgi):

   ```
   sudo yum remove maven -y
   wget https://dlcdn.apache.org/maven/maven-3/<version>/binaries/apache-maven-<version>-bin.tar.gz
   sudo tar -xzf apache-maven-<version>-bin.tar.gz -C /opt/
   sudo ln -sf /opt/apache-maven-<version> /opt/maven
   echo 'export MAVEN_HOME=/opt/maven' >> ~/.bashrc
   echo 'export PATH=$MAVEN_HOME/bin:$PATH' >> ~/.bashrc
   source ~/.bashrc
   ```

1. This example was tested with Java 8 only. Enter the following to install Java 8 on your EC2 instance:

   ```
   sudo yum install java-1.8.0-devel
   ```

1. Enter the following to set Java 8 as the default runtime on your EC2 instance:

   ```
   sudo /usr/sbin/alternatives --config java
   ```

   When prompted, enter the number for Java 8.

1. Enter the following to set Java 8 as the default compiler on your EC2 instance: 

   ```
   sudo /usr/sbin/alternatives --config javac
   ```

   When prompted, enter the number for Java 8.

1. In a new directory, create a `pom.xml` file, and then open it in a text editor.

1. Copy the following into the `pom.xml` file and save it (you can usually adjust the version numbers to the latest stable version):

   ```
   <project xmlns="https://maven.apache.org/POM/4.0.0" xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="https://maven.apache.org/POM/4.0.0 https://maven.apache.org/maven-v4_0_0.xsd">
     <modelVersion>4.0.0</modelVersion>
     <groupId>com.amazonaws</groupId>
     <artifactId>RDFExample</artifactId>
     <packaging>jar</packaging>
     <version>1.0-SNAPSHOT</version>
     <name>RDFExample</name>
     <url>https://maven.apache.org</url>
     <dependencies>
       <dependency>
         <groupId>org.eclipse.rdf4j</groupId>
         <artifactId>rdf4j-runtime</artifactId>
         <version>3.6</version>
       </dependency>
     </dependencies>
     <build>
       <plugins>
         <plugin>
             <groupId>org.codehaus.mojo</groupId>
             <artifactId>exec-maven-plugin</artifactId>
             <version>1.2.1</version>
             <configuration>
               <mainClass>com.amazonaws.App</mainClass>
             </configuration>
         </plugin>
         <plugin>
           <groupId>org.apache.maven.plugins</groupId>
           <artifactId>maven-compiler-plugin</artifactId>
           <configuration>
             <source>1.8</source>
             <target>1.8</target>
           </configuration>
         </plugin>
       </plugins>
     </build>
   </project>
   ```
**Note**  
If you are modifying an existing Maven project, the required dependency is the `rdf4j-runtime` entry in the `<dependencies>` section of the preceding code.

1. To create subdirectories for the example source code (`src/main/java/com/amazonaws/`), enter the following at the command line:

   ```
   mkdir -p src/main/java/com/amazonaws/
   ```

1. In the `src/main/java/com/amazonaws/` directory, create a file named `App.java`, and then open it in a text editor.

1. Copy the following into the `App.java` file. Replace *your-neptune-endpoint* with the address of your Neptune DB instance.
**Note**  
For information about finding the hostname of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section. 

   ```
   package com.amazonaws;
   
    import org.eclipse.rdf4j.repository.Repository;
    import org.eclipse.rdf4j.repository.RepositoryConnection;
    import org.eclipse.rdf4j.repository.sparql.SPARQLRepository;
    import org.eclipse.rdf4j.query.TupleQuery;
    import org.eclipse.rdf4j.query.TupleQueryResult;
    import org.eclipse.rdf4j.query.BindingSet;
    import org.eclipse.rdf4j.query.QueryLanguage;
    import org.eclipse.rdf4j.model.Value;
   
   public class App
   {
       public static void main( String[] args )
       {
           String sparqlEndpoint = "https://your-neptune-endpoint:port/sparql";
           Repository repo = new SPARQLRepository(sparqlEndpoint);
        repo.init(); // initialize() is deprecated in RDF4J 3.x
   
           try (RepositoryConnection conn = repo.getConnection()) {
              String queryString = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } limit 10";
   
              TupleQuery tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString);
   
              try (TupleQueryResult result = tupleQuery.evaluate()) {
                 while (result.hasNext()) {  // iterate over the result
                      BindingSet bindingSet = result.next();
   
                      Value s = bindingSet.getValue("s");
                      Value p = bindingSet.getValue("p");
                      Value o = bindingSet.getValue("o");
   
                      System.out.print(s);
                      System.out.print("\t");
                      System.out.print(p);
                      System.out.print("\t");
                      System.out.println(o);
                 }
              }
           }
       }
   }
   ```

1. Use the following Maven command to compile and run the sample:

   ```
   mvn compile exec:java
   ```

The preceding example uses the `?s ?p ?o` pattern with `limit 10` to return up to 10 of the triples (subject-predicate-object) in the graph. To query for something else, replace the value of `queryString` with another SPARQL query.

The iteration over the results prints the value of each variable returned. Each `Value` object is converted to a `String` and then printed. If you change the variables in the `SELECT` clause of the query, you must modify the code to read the new variable names from each `BindingSet`.

# SPARQL HTTP API
<a name="sparql-api-reference"></a>

SPARQL HTTP requests are accepted at the following endpoint: `https://your-neptune-endpoint:port/sparql`

For more information about connecting to Amazon Neptune with SPARQL, see [Accessing the Neptune graph with SPARQL](access-graph-sparql.md).

For more information about the SPARQL protocol and query language, see the [SPARQL 1.1 Protocol](https://www.w3.org/TR/sparql11-protocol/#protocol) and the [SPARQL 1.1 Query Language](https://www.w3.org/TR/sparql11-query/) specification.

The following topics provide information about SPARQL RDF serialization formats and how to use the SPARQL HTTP API with Neptune.

**Contents**
+ [Using the HTTP REST endpoint to connect to a Neptune DB instance](access-graph-sparql-http-rest.md)
+ [Optional HTTP trailing headers for multi-part SPARQL responses](access-graph-sparql-http-trailing-headers.md)
+ [RDF media types used by SPARQL in Neptune](sparql-media-type-support.md)
  + [RDF serialization formats used by Neptune SPARQL](sparql-media-type-support.md#sparql-serialization-formats)
  + [SPARQL result serialization formats used by Neptune SPARQL](sparql-media-type-support.md#sparql-serialization-formats-neptune-output)
  + [Media-Types that Neptune can use to import RDF data](sparql-media-type-support.md#sparql-serialization-formats-input)
  + [Media-Types that Neptune can use to export query results](sparql-media-type-support.md#sparql-serialization-formats-output)
+ [Using SPARQL UPDATE LOAD to import data into Neptune](sparql-api-reference-update-load.md)
+ [Using SPARQL UPDATE UNLOAD to delete data from Neptune](sparql-api-reference-unload.md)

# Using the HTTP REST endpoint to connect to a Neptune DB instance
<a name="access-graph-sparql-http-rest"></a>

**Note**  
Neptune does not currently support HTTP/2 for REST API requests. Clients must use HTTP/1.1 when connecting to endpoints.

The following instructions walk you through connecting to the SPARQL endpoint using the **curl** command, connecting through HTTPS, and using HTTP syntax. Follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

The HTTP endpoint for SPARQL queries to a Neptune DB instance is:  `https://your-neptune-endpoint:port/sparql`.

**Note**  
For information about finding the hostname of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section.

Amazon Neptune provides an HTTP endpoint for SPARQL queries. The REST interface is compatible with SPARQL version 1.1.

**QUERY Using HTTP POST**  
The following example uses **curl** to submit a SPARQL **`QUERY`** through HTTP **POST**.

```
curl -X POST --data-binary 'query=select ?s ?p ?o where {?s ?p ?o} limit 10' https://your-neptune-endpoint:port/sparql
```

The preceding example uses the `?s ?p ?o` pattern with `limit 10` to return up to 10 of the triples (subject-predicate-object) in the graph. To query for something else, replace it with another SPARQL query.

**Note**  
The default MIME media type of a response is `application/sparql-results+json` for `SELECT` and `ASK` queries.  
The default MIME type of a response is `application/n-quads` for `CONSTRUCT` and `DESCRIBE` queries.  
For a list of the media types used by Neptune for serialization, see [RDF serialization formats used by Neptune SPARQL](sparql-media-type-support.md#sparql-serialization-formats).
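Because `SELECT` results default to `application/sparql-results+json`, a client can consume the response with any JSON library. The following is a minimal Python sketch of parsing that format; the payload shown is a fabricated sample of the documented structure, not output captured from a live cluster:

```python
import json

# A SELECT response in application/sparql-results+json form
# (illustrative sample; a real response comes from the
# https://your-neptune-endpoint:port/sparql endpoint).
response_body = """
{
  "head": { "vars": ["s", "p", "o"] },
  "results": { "bindings": [
    { "s": {"type": "uri", "value": "https://test.com/s"},
      "p": {"type": "uri", "value": "https://test.com/p"},
      "o": {"type": "uri", "value": "https://test.com/o"} }
  ] }
}
"""

def rows(body):
    """Yield one dict of variable -> value per result row."""
    doc = json.loads(body)
    variables = doc["head"]["vars"]
    for binding in doc["results"]["bindings"]:
        # A variable can be unbound in a row, hence the .get()
        yield {v: binding.get(v, {}).get("value") for v in variables}

for row in rows(response_body):
    print(row["s"], row["p"], row["o"], sep="\t")
```

Each binding also carries a `type` field (`uri`, `literal`, or `bnode`) that a fuller client would inspect before using the value.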

**UPDATE Using HTTP POST**  
The following example uses **curl** to submit a SPARQL **`UPDATE`** through HTTP **POST**.

```
curl -X POST --data-binary 'update=INSERT DATA { <https://test.com/s> <https://test.com/p> <https://test.com/o> . }' https://your-neptune-endpoint:port/sparql
```

The preceding example inserts the following triple into the SPARQL default graph: `<https://test.com/s> <https://test.com/p> <https://test.com/o>`
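When you build the same request programmatically instead of with **curl**, the update statement must be form-encoded under the `update` key, since the SPARQL 1.1 protocol accepts `POST` bodies of type `application/x-www-form-urlencoded`. A small Python sketch of just the encoding step:

```python
from urllib.parse import urlencode

# The INSERT DATA statement from the curl example above.
update = "INSERT DATA { <https://test.com/s> <https://test.com/p> <https://test.com/o> . }"

# Form-encode it under the 'update' key so that IRIs, braces,
# and spaces survive transport in the POST body.
body = urlencode({"update": update})
print(body)
```

The resulting string is what you would send as the request body, with a `Content-Type: application/x-www-form-urlencoded` header, to `https://your-neptune-endpoint:port/sparql`.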

# Optional HTTP trailing headers for multi-part SPARQL responses
<a name="access-graph-sparql-http-trailing-headers"></a>

The HTTP response to SPARQL queries and updates is often returned in more than one part or chunk. It can be hard to diagnose a failure that occurs after a query or update begins sending these chunks, especially since the first one arrives with an HTTP status code of `200`.

Unless you explicitly request trailing headers, Neptune can only report such a failure by appending an error message to the response body, which leaves the body malformed for its declared media type.

To make detection and diagnosis of this kind of problem easier, you can include a transfer-encoding (TE) trailers header (`te: trailers`) in your request (see, for example, [the MDN page about TE request headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/TE)). Doing this will cause Neptune to include two new header fields within the trailing headers of the response chunks:
+ `X-Neptune-Status`  –   contains the response code followed by a short name. For instance, in the case of success the trailing header would be `X-Neptune-Status: 200 OK`. In the case of failure, the response code would be a [Neptune engine error code](errors-engine-codes.md), such as `X-Neptune-Status: 500 TimeLimitExceededException`.
+ `X-Neptune-Detail`  –   is empty for successful requests. In the case of errors, it contains the JSON error message. Because only ASCII characters are allowed in HTTP header values, the JSON string is URL encoded. The error message is also still appended to the response message body.
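Because the JSON in `X-Neptune-Detail` is URL encoded, a client must decode it before parsing. A sketch of that step (the trailer value below is a fabricated example of the general shape described above, not a value captured from Neptune):

```python
import json
from urllib.parse import unquote

# Hypothetical URL-encoded X-Neptune-Detail trailer value.
x_neptune_detail = (
    "%7B%22code%22%3A%22TimeLimitExceededException%22%2C"
    "%22detailedMessage%22%3A%22Query%20timed%20out%22%7D"
)

def decode_detail(value):
    """URL-decode the trailer and parse the embedded JSON error."""
    return json.loads(unquote(value)) if value else None

error = decode_detail(x_neptune_detail)
print(error["code"], "-", error["detailedMessage"])
```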

# RDF media types used by SPARQL in Neptune
<a name="sparql-media-type-support"></a>

Resource Description Framework (RDF) data can be serialized in many different ways, most of which SPARQL can consume or output:

## RDF serialization formats used by Neptune SPARQL
<a name="sparql-serialization-formats"></a>
+ **RDF/XML**  –   XML serialization of RDF, defined in [RDF 1.1 XML Syntax](https://www.w3.org/TR/rdf-syntax-grammar/). Media type: `application/rdf+xml`. Typical file extension: `.rdf`.
+ **N-Triples**  –   A line-based, plain-text format for encoding an RDF graph, defined in [RDF 1.1 N-Triples](https://www.w3.org/TR/n-triples/). Media type: `application/n-triples` (or the older `text/plain`). Typical file extension: `.nt`.
+ **N-Quads**  –   A line-based, plain-text format for encoding an RDF graph, defined in [RDF 1.1 N-Quads](https://www.w3.org/TR/n-quads/). It is an extension of N-Triples. Media type: `application/n-quads`, or `text/x-nquads` when encoded with 7-bit US-ASCII. Typical file extension: `.nq`.
+ **Turtle**  –   A textual syntax for RDF defined in [RDF 1.1 Turtle](https://www.w3.org/TR/turtle/) that allows an RDF graph to be completely written in a compact and natural text form, with abbreviations for common usage patterns and datatypes. Turtle provides levels of compatibility with the N-Triples format as well as with SPARQL's triple-pattern syntax. Media type: `text/turtle`. Typical file extension: `.ttl`.
+ **TriG**  –   A textual syntax for RDF defined in [RDF 1.1 TriG](https://www.w3.org/TR/trig/) that extends the Turtle format to support named graphs. Media type: `application/trig`. Typical file extension: `.trig`.
+ **N3 (Notation3)**  –   An assertion and logic language defined in [Notation3 (N3): A readable RDF syntax](https://www.w3.org/TeamSubmission/n3/). N3 extends the RDF data model by adding formulae (literals which are graphs themselves), variables, logical implication, and functional predicates, and provides a textual syntax alternative to RDF/XML. Media type: `text/n3`. Typical file extension: `.n3`.
+ **JSON-LD**  –   A data serialization and messaging format defined in [JSON-LD 1.0](https://www.w3.org/TR/json-ld/). Media type: `application/ld+json`. Typical file extension: `.jsonld`.
+ **TriX**  –   A serialization of RDF in XML, defined in [TriX: RDF Triples in XML](https://www.hpl.hp.com/techreports/2004/HPL-2004-56.html). Media type: `application/trix`. Typical file extension: `.trix`.
+ **SPARQL JSON Results**  –   A serialization of RDF using the [SPARQL 1.1 Query Results JSON Format](https://www.w3.org/TR/sparql11-results-json). Media type: `application/sparql-results+json`. Typical file extension: `.srj`.
+ **RDF4J Binary Format**  –   A binary format for encoding RDF data, documented in [RDF4J Binary RDF Format](https://rdf4j.org/documentation/reference/rdf4j-binary). Media type: `application/x-binary-rdf`.

## SPARQL result serialization formats used by Neptune SPARQL
<a name="sparql-serialization-formats-neptune-output"></a>
+ **SPARQL XML Results**  –   An XML format for the variable binding and boolean results formats provided by the SPARQL query language, defined in [SPARQL Query Results XML Format (Second Edition)](https://www.w3.org/TR/rdf-sparql-XMLres/). Media type: `application/sparql-results+xml`. Typical file extension: `.srx`.
+ **SPARQL CSV and TSV Results**  –   The use of comma-separated values and tab-separated values to express SPARQL query results from `SELECT` queries, defined in [SPARQL 1.1 Query Results CSV and TSV Formats](https://www.w3.org/TR/sparql11-results-csv-tsv/). Media type: `text/csv` for comma-separated values, and `text/tab-separated-values` for tab-separated values. Typical file extensions: `.csv` for comma-separated values, and `.tsv` for tab-separated values.
+ **Binary Results Table**  –   A binary format for encoding the output of SPARQL queries. Media type: `application/x-binary-rdf-results-table`.
+ **SPARQL JSON Results**  –   A serialization of RDF using the [SPARQL 1.1 Query Results JSON Format](https://www.w3.org/TR/sparql11-results-json/). Media type: `application/sparql-results+json`.

## Media-Types that Neptune can use to import RDF data
<a name="sparql-serialization-formats-input"></a>

**Media-types supported by the [Neptune bulk-loader](bulk-load.md)**
+ [N-Triples](https://www.w3.org/TR/n-triples/)
+ [N-Quads](https://www.w3.org/TR/n-quads/)
+ [RDF/XML](https://www.w3.org/TR/rdf-syntax-grammar/)
+ [Turtle](https://www.w3.org/TR/turtle/)

**Media-types that SPARQL UPDATE LOAD can import**
+ [N-Triples](https://www.w3.org/TR/n-triples/)
+ [N-Quads](https://www.w3.org/TR/n-quads/)
+ [RDF/XML](https://www.w3.org/TR/rdf-syntax-grammar/)
+ [Turtle](https://www.w3.org/TR/turtle/)
+ [TriG](https://www.w3.org/TR/trig/)
+ [N3](https://www.w3.org/TeamSubmission/n3/)
+ [JSON-LD](https://www.w3.org/TR/json-ld/)

## Media-Types that Neptune can use to export query results
<a name="sparql-serialization-formats-output"></a>

To specify the output format for a SPARQL query response, send an `"Accept: media-type"` header with the query request. For example:

```
curl -H "Accept: application/n-quads" ...
```

**RDF media-types that SPARQL SELECT can output from Neptune**
+ [SPARQL JSON Results](https://www.w3.org/TR/sparql11-results-json) (This is the default)
+ [SPARQL XML Results](https://www.w3.org/TR/rdf-sparql-XMLres/)
+ **Binary Results Table** (media type: `application/x-binary-rdf-results-table`)
+ [Comma-Separated Values (CSV)](https://www.w3.org/TR/sparql11-results-csv-tsv/)
+ [Tab-Separated Values (TSV)](https://www.w3.org/TR/sparql11-results-csv-tsv/)

**RDF media-types that SPARQL ASK can output from Neptune**
+ [SPARQL JSON Results](https://www.w3.org/TR/sparql11-results-json) (This is the default)
+ [SPARQL XML Results](https://www.w3.org/TR/rdf-sparql-XMLres/)
+ **Boolean** (media type: `text/boolean`, meaning "true" or "false")

**RDF media-types that SPARQL CONSTRUCT can output from Neptune**
+ [N-Quads](https://www.w3.org/TR/n-quads/) (This is the default)
+ [RDF/XML](https://www.w3.org/TR/rdf-syntax-grammar/)
+ [JSON-LD](https://www.w3.org/TR/json-ld/)
+ [N-Triples](https://www.w3.org/TR/n-triples/)
+ [Turtle](https://www.w3.org/TR/turtle/)
+ [N3](https://www.w3.org/TeamSubmission/n3/)
+ [TriX](https://www.hpl.hp.com/techreports/2004/HPL-2004-56.html)
+ [TriG](https://www.w3.org/TR/trig/)
+ [SPARQL JSON Results](https://www.w3.org/TR/sparql11-results-json)
+ [RDF4J Binary RDF Format](https://rdf4j.org/documentation/reference/rdf4j-binary)

**RDF media-types that SPARQL DESCRIBE can output from Neptune**
+ [N-Quads](https://www.w3.org/TR/n-quads/) (This is the default)
+ [RDF/XML](https://www.w3.org/TR/rdf-syntax-grammar/)
+ [JSON-LD](https://www.w3.org/TR/json-ld/)
+ [N-Triples](https://www.w3.org/TR/n-triples/)
+ [Turtle](https://www.w3.org/TR/turtle/)
+ [N3](https://www.w3.org/TeamSubmission/n3/)
+ [TriX](https://www.hpl.hp.com/techreports/2004/HPL-2004-56.html)
+ [TriG](https://www.w3.org/TR/trig/)
+ [SPARQL JSON Results](https://www.w3.org/TR/sparql11-results-json)
+ [RDF4J Binary RDF Format](https://rdf4j.org/documentation/reference/rdf4j-binary)
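The defaults listed above can be captured in a small lookup that chooses an `Accept` value per query form. A sketch (the helper name is illustrative; the mapping simply restates the defaults this page documents):

```python
# Default response media type for each SPARQL query form,
# as documented above.
DEFAULT_MEDIA_TYPE = {
    "SELECT": "application/sparql-results+json",
    "ASK": "application/sparql-results+json",
    "CONSTRUCT": "application/n-quads",
    "DESCRIBE": "application/n-quads",
}

def accept_header(query_form, media_type=None):
    """Build the Accept header, falling back to the documented default."""
    return {"Accept": media_type or DEFAULT_MEDIA_TYPE[query_form.upper()]}

print(accept_header("construct"))
print(accept_header("select", "text/csv"))
```

Passing an explicit media type overrides the default, which is how you would request, for example, Turtle from a `CONSTRUCT` query.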

# Using SPARQL UPDATE LOAD to import data into Neptune
<a name="sparql-api-reference-update-load"></a>

The syntax of the SPARQL UPDATE LOAD command is specified in the [SPARQL 1.1 Update recommendation](https://www.w3.org/TR/sparql11-update/#load):

```
LOAD SILENT (URL of data to be loaded) INTO GRAPH (named graph into which to load the data)
```
+ **`SILENT`**   –   (*Optional*) Causes the operation to return success even if there was an error during processing.

  This can be useful when a single transaction contains multiple statements like `"LOAD ...; LOAD ...; UNLOAD ...; LOAD ...;"` and you want the transaction to complete even if some of the remote data could not be processed.
+ *URL of data to be loaded*   –   (*Required*) Specifies a remote data file containing data to be loaded into a graph.

  The remote file must have one of the following extensions:
  + `.nt` for N-Triples.
  + `.nq` for N-Quads.
  + `.trig` for TriG.
  + `.rdf` for RDF/XML.
  + `.ttl` for Turtle.
  + `.n3` for N3.
  + `.jsonld` for JSON-LD.
+ **`INTO GRAPH`** *(named graph into which to load the data)*   –   (*Optional*) Specifies the graph into which the data should be loaded.

  Neptune associates every triple with a named graph. You can specify the default named graph using the fallback named-graph URI, `http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph`, like this:

  ```
  INTO GRAPH <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph>
  ```

**Note**  
When you need to load a lot of data, we recommend that you use the Neptune bulk loader rather than UPDATE LOAD. For more information about the bulk loader, see [Using the Amazon Neptune bulk loader to ingest data](bulk-load.md).

You can use `SPARQL UPDATE LOAD` to load data directly from Amazon S3, or from files obtained from a self-hosted web server. The resources to be loaded must reside in the same region as the Neptune server, and the endpoint for the resources must be allowed in the VPC. For information about creating an Amazon S3 endpoint, see [Creating an Amazon S3 VPC Endpoint](bulk-load-data.md#bulk-load-prereqs-s3).

All `SPARQL UPDATE LOAD` URIs must start with `https://`. This includes Amazon S3 URLs.

In contrast to the Neptune bulk loader, a call to `SPARQL UPDATE LOAD` is fully transactional.

**Loading files directly from Amazon S3 into Neptune using SPARQL UPDATE LOAD**

Because Neptune does not allow you to pass an IAM role to Amazon S3 when using SPARQL UPDATE LOAD, either the Amazon S3 bucket in question must be public or you must use a [pre-signed Amazon S3 URL](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html) in the LOAD query.

To generate a pre-signed URL for an Amazon S3 file, you can use an AWS CLI command like this:

```
aws s3 presign --expires-in (number of seconds) s3://(bucket name)/(path to file of data to load)
```

Then you can use the resulting pre-signed URL in your `LOAD` command:

```
curl https://(a Neptune endpoint URL):8182/sparql \
  --data-urlencode 'update=load (pre-signed URL of the remote Amazon S3 file of data to be loaded) \
                           into graph (named graph)'
```

For more information, see [Authenticating Requests: Using Query Parameters](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html). The [Boto3 documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-presigned-urls.html) shows how to use a Python script to generate a presigned URL.

Also, the content type of the files to be loaded must be set correctly.

1. Set the content type of the files when you upload them into Amazon S3 by using the `--content-type` parameter, like this:

   ```
   aws s3 cp test.nt s3://bucket-name/my-plain-text-input/test.nt --content-type text/plain
   aws s3 cp test.rdf s3://bucket-name/my-rdf-input/test.rdf --content-type application/rdf+xml
   ```

1. Confirm that the media-type information is actually present by requesting the headers of an uploaded file:

   ```
   curl -I https://s3.amazonaws.com/bucket-name/my-rdf-input/test.rdf
   ```

   The output of this command should include the `Content-Type` value that you set when uploading the file.

1. Then you can use the `SPARQL UPDATE LOAD` command to import these files into Neptune:

   ```
   curl https://your-neptune-endpoint:port/sparql \
     -d "update=LOAD <https://s3.amazonaws.com/bucket-name/my-rdf-input/test.rdf>"
   ```

The steps above work only for a public Amazon S3 bucket, or for a bucket that you access using a [pre-signed Amazon S3 URL](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html) in the LOAD query.
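When uploading many files, you can derive the right content type from the file extension before uploading. A sketch of such a helper (the function name is illustrative; the mapping restates the media types this page lists for each format):

```python
from pathlib import Path

# Content types for the RDF file extensions that UPDATE LOAD accepts.
CONTENT_TYPES = {
    ".nt": "text/plain",
    ".nq": "application/n-quads",
    ".trig": "application/trig",
    ".rdf": "application/rdf+xml",
    ".ttl": "text/turtle",
    ".n3": "text/n3",
    ".jsonld": "application/ld+json",
}

def s3_cp_command(path, bucket, prefix):
    """Build an 'aws s3 cp' command with the matching content type."""
    content_type = CONTENT_TYPES[Path(path).suffix]
    return (f"aws s3 cp {path} s3://{bucket}/{prefix}/{Path(path).name} "
            f"--content-type {content_type}")

print(s3_cp_command("test.rdf", "bucket-name", "my-rdf-input"))
```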

You can also set up a web proxy server inside your VPC to load files from a private Amazon S3 bucket, as shown in the following procedure:

**Using a web server to load files into Neptune with SPARQL UPDATE LOAD**

1. Install a web server on a machine running within the VPC that is hosting Neptune and the files to be loaded. For example, using Amazon Linux, you might install Apache as follows:

   ```
   sudo yum install httpd mod_ssl
   sudo /usr/sbin/apachectl start
   ```

1. Define the MIME type(s) of the RDF file content that you are going to load. SPARQL uses the `Content-type` header sent by the web server to determine the input format of the content, so you must define the relevant MIME types for the web server.

   For example, suppose you use the following file extensions to identify file formats:
   + `.nt` for N-Triples.
   + `.nq` for N-Quads.
   + `.trig` for TriG.
   + `.rdf` for RDF/XML.
   + `.ttl` for Turtle.
   + `.n3` for N3.
   + `.jsonld` for JSON-LD.

   If you are using Apache 2 as the web server, you would edit the file `/etc/mime.types` and add the following types:

   ```
    text/plain nt
    application/n-quads nq
    application/trig trig
    application/rdf+xml rdf
    application/x-turtle ttl
    text/rdf+n3 n3
    application/ld+json jsonld
   ```

1. Confirm that the MIME-type mapping works. Once you have your web server up and running and hosting RDF files in the format(s) of your choice, you can test the configuration by sending a request to the web server from your local host.

   For instance, you might send a request such as this:

   ```
   curl -v http://localhost:80/test.rdf
   ```

   Then, in the detailed output from `curl`, you should see a line such as:

   ```
   Content-Type: application/rdf+xml
   ```

   This shows that the content-type mapping was defined successfully.

1. You are now ready to load data using the SPARQL UPDATE command:

   ```
   curl https://your-neptune-endpoint:port/sparql \
       -d "update=LOAD <http://web_server_private_ip:80/test.rdf>"
   ```

**Note**  
Using `SPARQL UPDATE LOAD` can trigger a timeout on the web server when the source file being loaded is large. Neptune processes the file data as it is streamed in, and for a big file that can take longer than the timeout configured on the server. This in turn may cause the server to close the connection, which can result in the following error message when Neptune encounters an unexpected EOF in the stream:  

```
{
  "detailedMessage":"Invalid syntax in the specified file",
  "code":"InvalidParameterException"
}
```
If you receive this message and don't believe your source file contains invalid syntax, try increasing the timeout settings on the web server. You can also diagnose the problem by enabling debug logs on the server and looking for timeouts.
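The same extension-to-type mapping that step 2 adds to `/etc/mime.types` can be reproduced client-side with Python's standard `mimetypes` module, for example to double-check which `Content-Type` a given file should be served with (a sketch; the registrations mirror the Apache configuration above):

```python
import mimetypes

# Mirror the /etc/mime.types entries from the Apache example.
for media_type, ext in [
    ("text/plain", ".nt"),
    ("application/n-quads", ".nq"),
    ("application/trig", ".trig"),
    ("application/rdf+xml", ".rdf"),
    ("application/x-turtle", ".ttl"),
    ("text/rdf+n3", ".n3"),
    ("application/ld+json", ".jsonld"),
]:
    mimetypes.add_type(media_type, ext)

# guess_type returns (media_type, encoding) for a file name.
print(mimetypes.guess_type("test.rdf")[0])
```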

# Using SPARQL UPDATE UNLOAD to delete data from Neptune
<a name="sparql-api-reference-unload"></a>

Neptune also provides a custom SPARQL operation, `UNLOAD`, for removing data that is specified in a remote source. `UNLOAD` can be regarded as a counterpart to the `LOAD` operation. Its syntax is:

```
UNLOAD SILENT (URL of the remote data to be unloaded) FROM GRAPH (named graph from which to remove the data)
```
+ **`SILENT`**   –   (*Optional*) Causes the operation to return success even if there was an error when processing the data.

  This can be useful when a single transaction contains multiple statements like `"LOAD ...; LOAD ...; UNLOAD ...; LOAD ...;"` and you want the transaction to complete even if some of the remote data could not be processed.
+ *URL of the remote data to be unloaded*   –   (*Required*) Specifies a remote data file containing data to be unloaded from a graph.

  The remote file must have one of the following extensions (these are the same formats that UPDATE LOAD supports):
  + `.nt` for N-Triples.
  + `.nq` for N-Quads.
  + `.trig` for TriG.
  + `.rdf` for RDF/XML.
  + `.ttl` for Turtle.
  + `.n3` for N3.
  + `.jsonld` for JSON-LD.

  All the data that this file contains will be removed from your DB cluster by the `UNLOAD` operation.

  Any Amazon S3 authentication must be included in the URL for the data to unload. You can pre-sign an Amazon S3 file and then use the resulting URL to access it securely. For example:

  ```
  aws s3 presign --expires-in (number of seconds) s3://(bucket name)/(path to file of data to unload)
  ```

  Then:

  ```
  curl https://(a Neptune endpoint URL):8182/sparql \
    --data-urlencode 'update=unload (pre-signed URL of the remote Amazon S3 data to be unloaded) \
                             from graph (named graph)'
  ```

  For more information, see [Authenticating Requests: Using Query Parameters](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html).
+ **`FROM GRAPH`** *(named graph from which to remove the data)*   –   (*Optional*) Specifies the named graph from which the remote data should be unloaded.

  Neptune associates every triple with a named graph. You can specify the default named graph using the fallback named-graph URI, `http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph`, like this:

  ```
  FROM GRAPH <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph>
  ```

In the same way that `LOAD` corresponds to `INSERT DATA { (inline data) }`, `UNLOAD` corresponds to `DELETE DATA { (inline data) }`. Like `DELETE DATA`, `UNLOAD` does not work on data that contains blank nodes.

For example, suppose a local web server serves a file named `data.nt` that contains the following two triples:

```
<http://example.org/resource#a> <http://example.org/resource#p> <http://example.org/resource#b> .
<http://example.org/resource#a> <http://example.org/resource#p> <http://example.org/resource#c> .
```

The following `UNLOAD` command would delete those two triples from the named graph, `<http://example.org/graph1>`:

```
UNLOAD <http://localhost:80/data.nt> FROM GRAPH <http://example.org/graph1>
```

This would have the same effect as using the following `DELETE DATA` command:

```
DELETE DATA {
  GRAPH <http://example.org/graph1> {
    <http://example.org/resource#a> <http://example.org/resource#p> <http://example.org/resource#b> .
    <http://example.org/resource#a> <http://example.org/resource#p> <http://example.org/resource#c> .
  }
}
```

**Exceptions thrown by the `UNLOAD` command**
+ **`InvalidParameterException`**   –   There were blank nodes in the data. *HTTP status*: 400 Bad Request.

  *Message*: `Blank nodes are not allowed for UNLOAD`

   
+ **`InvalidParameterException`**   –   There was broken syntax in the data. *HTTP status*: 400 Bad Request.

  *Message*: `Invalid syntax in the specified file.`

   
+ **`UnloadUrlAccessDeniedException`**   –   Access was denied. *HTTP status*: 400 Bad Request.

  *Message*: `Update failure: Endpoint (Neptune endpoint) reported access denied error. Please verify access.`

   
+ **`BadRequestException`**   –   The remote data cannot be retrieved. *HTTP status*: 400 Bad Request.

  *Message*: *(depends on the HTTP response).*

# SPARQL query hints
<a name="sparql-query-hints"></a>

You can use query hints to specify optimization and evaluation strategies for a particular SPARQL query in Amazon Neptune. 

Query hints are expressed using additional triple patterns that are embedded in the SPARQL query with the following parts:

```
scope hint value
```
+ *scope* – Determines the part of the query that the query hint applies to, such as a certain group in the query or the full query.
+ *hint* – Identifies the type of the hint to apply.
+ *value* – Determines the behavior of the system aspect under consideration.

The query hints and scopes are exposed as predefined terms in the Amazon Neptune namespace `http://aws.amazon.com/neptune/vocab/v01/QueryHints#`. The examples in this section include the namespace as a `hint` prefix that is defined and included in the query:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
```

For example, the following shows how to include a `joinOrder` hint in a `SELECT` query:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
SELECT ... {
 hint:Query hint:joinOrder "Ordered" .
 ...
}
```

The preceding query instructs the Neptune engine to evaluate joins in the query in the *given* order and disables any automatic reordering.

Consider the following when using query hints:
+ You can combine different query hints in a single query. For example, you can use the `bottomUp` query hint to annotate a subquery for bottom-up evaluation and a `joinOrder` query hint to fix the join order inside the subquery.
+ You can use the same query hint multiple times, in different non-overlapping scopes.
+ Query hints are only hints. Although the query engine generally tries to honor a given query hint, it can also ignore it.
+ Query hints are semantics-preserving. Adding a query hint does not change the output of the query, except possibly the result order when no ordering guarantees are given (that is, when the result order is not explicitly enforced by using `ORDER BY`).
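Mechanically, adding a hint amounts to prepending the namespace prefix and inserting one extra triple pattern at the top of the relevant group. A sketch of such a helper (`add_query_hint` is an illustrative name, not a Neptune API, and this naive version assumes the first `{` opens the group you want to annotate):

```python
HINT_PREFIX = "PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>"

def add_query_hint(query, scope, hint, value):
    """Prepend the hint namespace and inject 'scope hint value .'
    right after the first '{' of the query body."""
    pattern = f'{scope} hint:{hint} "{value}" .'
    head, brace, tail = query.partition("{")
    return f"{HINT_PREFIX}\n{head}{brace}\n {pattern}{tail}"

query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
print(add_query_hint(query, "hint:Query", "joinOrder", "Ordered"))
```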

The following sections provide more information about the available query hints and their usage in Neptune.

**Topics**
+ [Scope of SPARQL query hints in Neptune](#sparql-query-hints-scope)
+ [The `joinOrder` SPARQL query hint](sparql-query-hints-joinOrder.md)
+ [The `evaluationStrategy` SPARQL query hint](sparql-query-hints-evaluationStrategy.md)
+ [The `queryTimeout` SPARQL query hint](sparql-query-hints-queryTimeout.md)
+ [The `rangeSafe` SPARQL query hint](sparql-query-hints-rangeSafe.md)
+ [The `queryId` SPARQL Query Hint](sparql-query-hints-queryId.md)
+ [The `useDFE` SPARQL query hint](sparql-query-hints-useDFE.md)
+ [SPARQL query hints used with DESCRIBE](sparql-query-hints-for-describe.md)

## Scope of SPARQL query hints in Neptune
<a name="sparql-query-hints-scope"></a>

The following table shows the available scopes, associated hints, and descriptions for SPARQL query hints in Amazon Neptune. The `hint` prefix in these entries represents the Neptune namespace for hints:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
```


| Scope | Supported Hint | Description | 
| --- | --- | --- | 
| hint:Query | [joinOrder](sparql-query-hints-joinOrder.md) | The query hint applies to the whole query. | 
| hint:Query | [queryTimeout](sparql-query-hints-queryTimeout.md) | The time-out value applies to the entire query. | 
| hint:Query | [rangeSafe](sparql-query-hints-rangeSafe.md) | Type promotion is disabled for the entire query. | 
| hint:Query | [queryId](sparql-query-hints-queryId.md) | The query ID value applies to the entire query. | 
| hint:Query | [useDFE](sparql-query-hints-useDFE.md) | Use of the DFE is enabled (or disabled) for the entire query. | 
| hint:Group | [joinOrder](sparql-query-hints-joinOrder.md) | The query hint applies to the top-level elements in the specified group, but not to nested elements (such as subqueries) or parent elements. | 
| hint:SubQuery | [evaluationStrategy](sparql-query-hints-evaluationStrategy.md) | The hint is specified and applied to a nested SELECT subquery. The subquery is evaluated independently, without considering solutions computed before the subquery. | 

# The `joinOrder` SPARQL query hint
<a name="sparql-query-hints-joinOrder"></a>

When you submit a SPARQL query, the Amazon Neptune query engine investigates the structure of the query. It reorders parts of the query and tries to minimize the amount of work required for evaluation and query response time.

For example, a sequence of connected triple patterns is typically not evaluated in the given order. It is reordered using heuristics and statistics such as the selectivity of the individual patterns and how they are connected through shared variables. Additionally, if your query contains more complex patterns such as subqueries, FILTERs, or complex OPTIONAL or MINUS blocks, the Neptune query engine reorders them where possible, aiming for an efficient evaluation order.

For more complex queries, the order in which Neptune chooses to evaluate the query might not always be optimal. For instance, Neptune might miss instance data-specific characteristics (such as hitting power nodes in the graph) that emerge during query evaluation.

If you know the exact characteristics of the data and want to manually dictate the order of the query execution, use the Neptune `joinOrder` query hint to specify that the query be evaluated in the given order.

## `joinOrder` SPARQL hint syntax
<a name="sparql-query-hints-joinOrder-syntax"></a>

The `joinOrder` query hint is specified as a triple pattern included in a SPARQL query.

For clarity, the following syntax uses a `hint` prefix defined and included in the query to specify the Neptune query-hint namespace:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
scope hint:joinOrder "Ordered" .
```

**Available Scopes**
+ `hint:Query`
+ `hint:Group`

For more information about query hint scopes, see [Scope of SPARQL query hints in Neptune](sparql-query-hints.md#sparql-query-hints-scope).

## `joinOrder` SPARQL hint example
<a name="sparql-query-hints-joinOrder-example"></a>

This section shows a query written with and without the `joinOrder` query hint and related optimizations.

For this example, assume that the dataset contains the following:
+ A single person named `John` that `:likes` 1,000 persons, including `Jane`.
+ A single person named `Jane` that `:likes` 10 persons, including `John`.

**No Query Hint**  
The following SPARQL query extracts all the pairs of people named `John` and `Jane` who both like each other from a set of social networking data:

```
PREFIX : <https://example.com/>
SELECT ?john ?jane {
  ?person1 :name "Jane" .
  ?person1 :likes ?person2 .
  ?person2 :name "John" .
  ?person2 :likes ?person1 .
}
```

The Neptune query engine might evaluate the statements in a different order than written. For example, it might choose to evaluate in the following order:

1. Find all persons named `John`.

1. Find all persons connected to `John` by a `:likes` edge.

1. Filter this set by persons named `Jane`.

1. Check that the remaining persons also have a `:likes` edge back to `John`.

According to the dataset, evaluating in this order results in 1,000 entities being extracted in the second step. The third step narrows this down to the single node, `Jane`. The final step then determines that `Jane` also `:likes` the `John` node.

**Query Hint**  
It would be favorable to start with the `Jane` node because she has only 10 outgoing `:likes` edges. This reduces the amount of work during the evaluation of the query by avoiding the extraction of the 1,000 entities during the second step.

The following example uses the **joinOrder** query hint to ensure that the `Jane` node and its outgoing edges are processed first by disabling all automatic join reordering for the query:

```
PREFIX : <https://example.com/>
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
SELECT ?john ?jane {
  hint:Query hint:joinOrder "Ordered" .
  ?person1 :name "Jane" .
  ?person1 :likes ?person2 .
  ?person2 :name "John" .
  ?person2 :likes ?person1 .
}
```

An applicable real-world scenario might be a social network application in which persons in the network are classified as either influencers with many connections or normal users with few connections. In such a scenario, you could ensure that the normal user (`Jane`) is processed before the influencer (`John`) in a query like the preceding example.

**Query Hint and Reorder**  
You can take this example one step further. If you know that the `:name` attribute is unique to a single node, you could speed up the query by reordering and using the `joinOrder` query hint. This step ensures that the unique nodes are extracted first.

```
PREFIX : <https://example.com/>
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
SELECT ?john ?jane {
  hint:Query hint:joinOrder "Ordered" .
  ?person1 :name "Jane" .
  ?person2 :name "John" .
  ?person1 :likes ?person2 .
  ?person2 :likes ?person1 .
}
```

In this case, you can reduce the query to the following single actions in each step:

1. Find the single person node with `:name` `Jane`.

1. Find the single person node with `:name` `John`.

1. Check that the first node is connected to the second with a `:likes` edge.

1. Check that the second node is connected to the first with a `:likes` edge.

**Important**  
If you choose the wrong order, the `joinOrder` query hint can lead to significant performance drops. For example, the preceding example would be inefficient if the `:name` attributes were not unique. If 100 nodes were named `Jane` and 1,000 nodes were named `John`, then the query would end up checking 100 × 1,000 (100,000) pairs for `:likes` edges.
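
The cost difference can be sketched with a quick back-of-the-envelope calculation. The counts below are the hypothetical ones from this example, not measured values:

```python
# Hypothetical fan-outs from the example dataset above.
johns, janes = 1, 1            # unique :name values
likes_from_john = 1000         # John :likes 1,000 persons
likes_from_jane = 10           # Jane :likes 10 persons

# Starting from John: expand all of John's :likes edges first.
work_from_john = johns * likes_from_john   # 1,000 intermediate results

# Starting from Jane (the joinOrder example): expand Jane's edges first.
work_from_jane = janes * likes_from_jane   # 10 intermediate results

# Worst case from the Important note: non-unique names.
pairs_checked = 100 * 1000     # 100 Janes x 1,000 Johns = 100,000 pairs
```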

# The `evaluationStrategy` SPARQL query hint
<a name="sparql-query-hints-evaluationStrategy"></a>

The `evaluationStrategy` query hint tells the Amazon Neptune query engine that the fragment of the query annotated should be evaluated from the bottom up, as an independent unit. This means that no solutions from previous evaluation steps are used to compute the query fragment. The query fragment is evaluated as a standalone unit, and its produced solutions are joined with the remainder of the query after it is computed.

Using the `evaluationStrategy` query hint implies a blocking (non-pipelined) query plan, meaning that the solutions of the fragment annotated with the query hint are materialized and buffered in main memory. Using this query hint might significantly increase the amount of main memory needed to evaluate the query, especially if the annotated query fragment computes a large number of results.

## `evaluationStrategy` SPARQL hint syntax
<a name="sparql-query-hints-evaluationStrategy-syntax"></a>

The `evaluationStrategy` query hint is specified as a triple pattern included in a SPARQL query.

For clarity, the following syntax uses a `hint` prefix defined and included in the query to specify the Neptune query-hint namespace:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
hint:SubQuery hint:evaluationStrategy "BottomUp" .
```

**Available Scopes**
+ `hint:SubQuery`

**Note**  
This query hint is supported only in nested subqueries.

For more information about query hint scopes, see [Scope of SPARQL query hints in Neptune](sparql-query-hints.md#sparql-query-hints-scope).

## `evaluationStrategy` SPARQL hint example
<a name="sparql-query-hints-evaluationStrategy-example"></a>

This section shows a query written with and without the `evaluationStrategy` query hint and related optimizations.

For this example, assume that the dataset has the following characteristics:
+ It contains 1,000 edges labeled `:connectedTo`.
+ Each `component` node is connected to an average of 100 other `component` nodes.
+ The typical number of four-hop cyclical connections between nodes is around 100.

As a typical example, the `evaluationStrategy` hint can be helpful to optimize query patterns containing cycles.

**No Query Hint**  
The following SPARQL query extracts all `component` nodes that are cyclically connected to each other via four hops:

```
PREFIX : <https://example.com/>
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
SELECT * {
  ?component1 :connectedTo ?component2 .
  ?component2 :connectedTo ?component3 .
  ?component3 :connectedTo ?component4 .
  ?component4 :connectedTo ?component1 .
}
```

The approach of the Neptune query engine is to evaluate this query using the following steps:
+ Extract all 1,000 `connectedTo` edges in the graph.
+ Expand by 100x (the number of outgoing `connectedTo` edges from component2).

  Intermediate results: 100,000 nodes.
+ Expand by 100x (the number of outgoing `connectedTo` edges from component3).

  Intermediate results: 10,000,000 nodes.
+ Scan the 10,000,000 nodes for the cycle close.

This results in a streaming query plan that uses a constant amount of main memory.

**Query Hint and Subqueries**  
You might want to trade off main memory space for accelerated computation. By rewriting the query using an `evaluationStrategy` query hint, you can force the engine to compute a join between two smaller, materialized subsets.

```
PREFIX : <https://example.com/>
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
SELECT * {
  {
    SELECT * WHERE {
      hint:SubQuery hint:evaluationStrategy "BottomUp" .
      ?component1 :connectedTo ?component2 .
      ?component2 :connectedTo ?component3 .
    }
  }
  {
    SELECT * WHERE {
      hint:SubQuery hint:evaluationStrategy "BottomUp" .
      ?component3 :connectedTo ?component4 .
      ?component4 :connectedTo ?component1 .
    }
  }
}
```

Instead of evaluating the triple patterns in sequence while iteratively using results from the previous triple pattern as input for the upcoming patterns, the `evaluationStrategy` hint causes the two subqueries to be evaluated independently. Both subqueries produce 100,000 nodes for intermediate results, which are then joined together to form the final output. 

In particular, when you run Neptune on the larger instance types, temporarily storing these two 100,000-node subsets in main memory increases memory usage in return for significantly faster evaluation.

# The `queryTimeout` SPARQL query hint
<a name="sparql-query-hints-queryTimeout"></a>

The `queryTimeout` query hint specifies a timeout that is shorter than the `neptune_query_timeout` value set in the DB parameter group.

If the query terminates as a result of this hint, a `TimeLimitExceededException` is thrown, with an `Operation terminated (deadline exceeded)` message.

## `queryTimeout` SPARQL hint syntax
<a name="sparql-query-hints-queryTimeout-syntax"></a>

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
SELECT ... WHERE {
    hint:Query hint:queryTimeout 10 .
    # OR
    hint:Query hint:queryTimeout "10" .
    # OR
    hint:Query hint:queryTimeout "10"^^xsd:integer .
 ...
}
```

The time-out value is expressed in milliseconds.

The time-out value must be smaller than the `neptune_query_timeout` value set in the DB parameter group. Otherwise, a `MalformedQueryException` exception is thrown with a `Malformed query: Query hint 'queryTimeout' must be less than neptune_query_timeout DB Parameter Group` message.

The `queryTimeout` query hint should be specified in the `WHERE` clause of the main query, or in the `WHERE` clause of one of the subqueries as shown in the example below.

It must be set only once across all queries, subqueries, and SPARQL update sections (such as INSERT and DELETE). Otherwise, a `MalformedQueryException` exception is thrown with a `Malformed query: Query hint 'queryTimeout' must be set only once` message.

**Available Scopes**

The `queryTimeout` hint can be applied both to SPARQL queries and updates.
+ In a SPARQL query, it can appear in the WHERE clause of the main query or a subquery.
+ In a SPARQL update, it can be set in the INSERT, DELETE, or WHERE clause. If there are multiple update clauses, it can only be set in one of them.

For more information about query hint scopes, see [Scope of SPARQL query hints in Neptune](sparql-query-hints.md#sparql-query-hints-scope).

## `queryTimeout` SPARQL hint example
<a name="sparql-query-hints-queryTimeout-example"></a>

Here is an example of using `hint:queryTimeout` in the main `WHERE` clause of an `UPDATE` query:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
INSERT {
    ?s ?p ?o
} WHERE {
    hint:Query hint:queryTimeout 100 .
    ?s ?p ?o .
}
```

Here, the `hint:queryTimeout` is in the `WHERE` clause of a subquery:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
SELECT * {
   ?s ?p ?o .
   {
      SELECT ?s WHERE {
         hint:Query hint:queryTimeout 100 .
         ?s ?p1 ?o1 .
      }
   }
}
```

# The `rangeSafe` SPARQL query hint
<a name="sparql-query-hints-rangeSafe"></a>

Use this query hint to turn off type promotion for a SPARQL query.

When you submit a SPARQL query that includes a `FILTER` over a numerical value or range, the Neptune query engine must normally use type promotion when it executes the query. This means that it has to examine values of every type that could hold the value you are filtering on.

For example, if you are filtering for values equal to 55, the engine must look for integers equal to 55, long integers equal to 55L, floats equal to 55.0, and so forth. Each type promotion requires an additional lookup on storage, which can cause an apparently simple query to take an unexpectedly long time to complete.

Often type promotion is unnecessary because you know in advance that you only need to find values of one specific type. When this is the case, you can speed up your queries dramatically by using the `rangeSafe` query hint to turn off type promotion.

## `rangeSafe` SPARQL hint syntax
<a name="sparql-query-hints-rangeSafe-syntax"></a>

The `rangeSafe` query hint takes a value of `true` to turn off type promotion, or `false` (the default) to leave it enabled.

**Example.** The following example shows how to turn off type promotion when filtering for an integer value of `o` greater than 1:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
SELECT * {
   ?s ?p ?o .
   hint:Prior hint:rangeSafe 'true' .
   FILTER (?o > '1'^^<http://www.w3.org/2001/XMLSchema#int>)
}
```

# The `queryId` SPARQL Query Hint
<a name="sparql-query-hints-queryId"></a>

Use this query hint to assign your own queryId value to a SPARQL query.

Example:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
SELECT * WHERE {
  hint:Query hint:queryId "4d5c4fae-aa30-41cf-9e1f-91e6b7dd6f47"
  { ?s ?p ?o }
}
```

The value you assign must be unique across all queries in the Neptune DB.

# The `useDFE` SPARQL query hint
<a name="sparql-query-hints-useDFE"></a>

Use this query hint to enable use of the DFE engine for executing a query. By default, Neptune does not use the DFE unless this query hint is set to `true`, because the [neptune\_dfe\_query\_engine](parameters.md#parameters-instance-parameters-neptune_dfe_query_engine) instance parameter defaults to `viaQueryHint`. If you set that instance parameter to `enabled`, the DFE engine is used for all queries except those that have the `useDFE` query hint set to `false`.

Example of enabling use of the DFE for a query:

```
PREFIX : <https://example.com/>
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>

SELECT ?john ?jane
{
  hint:Query hint:useDFE true .
  ?person1 :name "Jane" .
  ?person1 :likes ?person2 .
  ?person2 :name "John" .
  ?person2 :likes ?person1 .
}
```

# SPARQL query hints used with DESCRIBE
<a name="sparql-query-hints-for-describe"></a>

A SPARQL `DESCRIBE` query provides a flexible mechanism for requesting resource descriptions. However, the SPARQL specifications do not define the precise semantics of `DESCRIBE`.

Starting with [engine release 1.2.0.2](engine-releases-1.2.0.2.md), Neptune supports several different `DESCRIBE` modes and algorithms that are suited to different situations.

This sample dataset can help illustrate the different modes:

```
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <https://example.com/> .

:JaneDoe :firstName "Jane" .
:JaneDoe :knows :JohnDoe .
:JohnDoe :firstName "John" .
:JaneDoe :knows _:b1 .
_:b1 :knows :RichardRoe .

:RichardRoe :knows :JaneDoe .
:RichardRoe :firstName "Richard" .

_:s1 rdf:type rdf:Statement .
_:s1 rdf:subject :JaneDoe .
_:s1 rdf:predicate :knows .
_:s1 rdf:object :JohnDoe .
_:s1 :knowsFrom "Berlin" .

:ref_s2 rdf:type rdf:Statement .
:ref_s2 rdf:subject :JaneDoe .
:ref_s2 rdf:predicate :knows .
:ref_s2 rdf:object :JohnDoe .
:ref_s2 :knowsSince 1988 .
```

The examples below assume that a description of the resource `:JaneDoe` is being requested using a SPARQL query like this:

```
DESCRIBE <https://example.com/JaneDoe>
```

## The `describeMode` SPARQL query hint
<a name="sparql-query-hints-describeMode"></a>

The `hint:describeMode` SPARQL query hint is used to select one of the following SPARQL `DESCRIBE` modes supported by Neptune:

### The `ForwardOneStep` DESCRIBE mode
<a name="sparql-query-hints-describeMode-ForwardOneStep"></a>

You invoke the `ForwardOneStep` mode with the `describeMode` query hint like this:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
DESCRIBE <https://example.com/JaneDoe>
{
  hint:Query hint:describeMode "ForwardOneStep"
}
```

The `ForwardOneStep` mode only returns the attributes and forward links of the resource to be described. In the example case, this means it returns the triples that have `:JaneDoe`, the resource to be described, as subject:

```
:JaneDoe :firstName "Jane" .
:JaneDoe :knows :JohnDoe .
:JaneDoe :knows _:b301990159 .
```

Note that a DESCRIBE query may return triples with blank nodes, such as `_:b301990159`. Blank-node IDs are not stable: they differ from the IDs in the input dataset and can change from one response to the next.

### The `SymmetricOneStep` DESCRIBE mode
<a name="sparql-query-hints-describeMode-SymmetricOneStep"></a>

`SymmetricOneStep` is the default DESCRIBE mode if you don't provide a query hint. You can also invoke it explicitly with the `describeMode` query hint like this:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
DESCRIBE <https://example.com/JaneDoe>
{
  hint:Query hint:describeMode "SymmetricOneStep"
}
```

Under `SymmetricOneStep` semantics, `DESCRIBE` returns the attributes, forward links, and reverse links of the resource to be described:

```
:JaneDoe :firstName "Jane" .
:JaneDoe :knows :JohnDoe .
:JaneDoe :knows _:b318767375 .

_:b318767631 rdf:subject :JaneDoe .

:RichardRoe :knows :JaneDoe .

:ref_s2 rdf:subject :JaneDoe .
```

### The Concise Bounded Description (`CBD`) DESCRIBE mode
<a name="sparql-query-hints-describeMode-CBD"></a>

The Concise Bounded Description (`CBD`) mode is invoked using the `describeMode` query hint like this:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
DESCRIBE <https://example.com/JaneDoe>
{
  hint:Query hint:describeMode "CBD"
}
```

Under `CBD` semantics, `DESCRIBE` returns the Concise Bounded Description (as [defined by W3C](http://www.w3.org/Submission/CBD)) of the resource to be described:

```
:JaneDoe :firstName "Jane" .
:JaneDoe :knows :JohnDoe .
:JaneDoe :knows _:b285212943 .
_:b285212943 :knows :RichardRoe .

_:b285213199 rdf:subject :JaneDoe .
_:b285213199 rdf:type rdf:Statement .
_:b285213199 rdf:predicate :knows .
_:b285213199 rdf:object :JohnDoe .
_:b285213199 :knowsFrom "Berlin" .

:ref_s2 rdf:subject :JaneDoe .
```

The Concise Bounded Description of an RDF resource (that is, a node in an RDF graph) is the smallest subgraph centered around that node that can stand alone. In practice this means that if you think of this graph as a tree, with the designated node as the root, there are no blank nodes (bnodes) as leaves of that tree. Since bnodes can't be addressed externally or used in subsequent queries, it's not enough for browsing the graph just to find the next single hop(s) from the current node. You also have to go far enough to find something that can be used in subsequent queries (that is, something other than a bnode).

#### Computing the CBD
<a name="sparql-query-hints-describeMode-CBD-computing"></a>

Given a particular node (the starting node or root) in the source RDF graph, the CBD of that node is computed as follows:

1. Include in the subgraph all statements in the source graph where the *subject* of the statement is the starting node.

1. Recursively, for all statements in the subgraph thus far that have a blank node *object*, include in the subgraph all statements in the source graph where the *subject* of the statement is that blank node, and which are not already included in the subgraph.

1. Recursively, for all statements included in the subgraph thus far, for all reifications of these statements in the source graph, include the CBD beginning from the `rdf:Statement` node of each reification.

This results in a subgraph where the *object* nodes are either IRI references or literals, or blank nodes not serving as the *subject* of any statement in the graph. Note that the CBD cannot be computed using a single SPARQL SELECT or CONSTRUCT query.
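The first two steps above (subject expansion plus blank-node closure) can be sketched in a few lines of Python over an in-memory set of triples. This is an illustrative implementation, not Neptune's: blank nodes are marked with a `_:` prefix, and the reification step (step 3) is omitted for brevity.

```python
def cbd(root, triples):
    """Concise Bounded Description (steps 1-2): all statements whose
    subject is the root, expanded recursively through blank-node objects."""
    result, frontier = set(), {root}
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            if s == node and (s, p, o) not in result:
                result.add((s, p, o))
                if o.startswith("_:"):   # blank-node object: keep expanding
                    frontier.add(o)
    return result

# Triples inspired by the sample dataset above.
data = {
    (":JaneDoe", ":firstName", '"Jane"'),
    (":JaneDoe", ":knows", ":JohnDoe"),
    (":JaneDoe", ":knows", "_:b1"),
    ("_:b1", ":knows", ":RichardRoe"),
    (":RichardRoe", ":knows", ":JaneDoe"),
}

desc = cbd(":JaneDoe", data)
```

The blank node `_:b1` is expanded into the description, while the reverse link from `:RichardRoe` is excluded, matching the forward-only nature of CBD.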

### The Symmetric Concise Bounded Description (`SCBD`) DESCRIBE mode
<a name="sparql-query-hints-describeMode-SCBD"></a>

The Symmetric Concise Bounded Description (`SCBD`) mode is invoked using the `describeMode` query hint like this:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
DESCRIBE <https://example.com/JaneDoe>
{
  hint:Query hint:describeMode "SCBD"
}
```

Under `SCBD` semantics, `DESCRIBE` returns the Symmetric Concise Bounded Description of the resource (as defined by W3C in [Describing Linked Datasets with the VoID Vocabulary](http://www.w3.org/TR/void/)):

```
:JaneDoe :firstName "Jane" .
:JaneDoe :knows :JohnDoe .
:JaneDoe :knows _:b335544591 .
_:b335544591 :knows :RichardRoe .

:RichardRoe :knows :JaneDoe .

_:b335544847 rdf:subject :JaneDoe .
_:b335544847 rdf:type rdf:Statement .
_:b335544847 rdf:predicate :knows .
_:b335544847 rdf:object :JohnDoe .
_:b335544847 :knowsFrom "Berlin" .

:ref_s2 rdf:subject :JaneDoe .
```

The advantage of CBD and SCBD over the `ForwardOneStep` and `SymmetricOneStep` modes is that blank nodes are always expanded to include their representation. This may be an important advantage because you can't query a blank node using SPARQL. In addition, CBD and SCBD modes also consider reifications.

Note that the `describeMode` query hint can also be part of a `WHERE` clause:

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>
DESCRIBE ?s
WHERE {
  hint:Query hint:describeMode "CBD" .
  ?s rdf:type <https://example.com/Person>
}
```

## The `describeIterationLimit` SPARQL query hint
<a name="sparql-query-hints-describeIterationLimit"></a>

The `hint:describeIterationLimit` SPARQL query hint provides an **optional** constraint on the maximum number of iterative expansions to be performed for iterative DESCRIBE algorithms such as CBD and SCBD.

DESCRIBE limits are ANDed together. Therefore, if both the iteration limit and the statements limit are specified, then both limits must be met before the DESCRIBE query is cut off.

The default value is 5. Set it to zero (0) to specify no limit on the number of iterative expansions.

## The `describeStatementLimit` SPARQL query hint
<a name="sparql-query-hints-describeStatementLimit"></a>

The `hint:describeStatementLimit` SPARQL query hint provides an **optional** constraint on the maximum number of statements that may be present in a DESCRIBE query response. It is only applied for iterative DESCRIBE algorithms such as CBD and SCBD.

DESCRIBE limits are ANDed together. Therefore, if both the iteration limit and the statements limit are specified, then both limits must be met before the DESCRIBE query is cut off.

The default value is 5000. Set it to zero (0) to specify no limit on the number of statements returned.

# SPARQL DESCRIBE behavior with respect to the default graph
<a name="sparql-default-describe"></a>

The SPARQL [DESCRIBE](https://www.w3.org/TR/sparql11-query/#describe) query form lets you retrieve information about resources without knowing the structure of the data and without having to compose a query. How this information is assembled is left up to the SPARQL implementation. Neptune provides [several query hints](sparql-query-hints-for-describe.md) that invoke different modes and algorithms for `DESCRIBE` to use.

In Neptune's implementation, regardless of the mode, `DESCRIBE` only uses data present in the [SPARQL default graph](feature-sparql-compliance.md#sparql-default-graph). This is consistent with the way SPARQL treats datasets (see [Specifying RDF Datasets](https://www.w3.org/TR/sparql11-query/#specifyingDataset) in the SPARQL specification).

In Neptune, the default graph contains all unique triples in the union of all named graphs in the database, unless particular named graphs are specified using `FROM` and/or `FROM NAMED` clauses. All RDF data in Neptune is stored in a named graph. If a triple is inserted without a named-graph context, Neptune stores it in a named graph designated `http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph`.

When one or more named graphs are specified using the `FROM` clause, the default graph is the union of all unique triples in those named graphs. If there is no `FROM` clause and there are one or more `FROM NAMED` clauses, then the default graph is empty.
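These rules can be sketched as a small Python function over a set of quads. This is an illustrative model, not Neptune's implementation; the `DNG` constant is Neptune's designated graph for triples inserted without a named-graph context:

```python
def default_graph(quads, from_graphs=(), from_named=()):
    """Compute the SPARQL default graph per the rules above.
    quads: set of (s, p, o, g) tuples, g being the named-graph IRI."""
    if from_graphs:    # FROM g1 FROM g2 ...: union of those graphs
        return {(s, p, o) for s, p, o, g in quads if g in from_graphs}
    if from_named:     # only FROM NAMED: the default graph is empty
        return set()
    # No dataset clause: union of all unique triples in all named graphs.
    return {(s, p, o) for s, p, o, g in quads}

DNG = "http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph"
quads = {
    ("ex:s", "ex:p1", '"a"', "ex:g1"),
    ("ex:s", "ex:p2", '"c"', "ex:g1"),
    ("ex:s", "ex:p3", '"b"', "ex:g2"),
    ("ex:s", "ex:p2", '"c"', "ex:g2"),
    ("ex:s", "ex:p3", '"d"', DNG),   # triple inserted without a context
}
```

Note that the duplicate triple `ex:s ex:p2 "c"` appears once in the no-clause case, because the default graph contains unique triples.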

## SPARQL `DESCRIBE` examples
<a name="sparql-default-describe-examples"></a>

Consider the following data:

```
PREFIX ex: <https://example.com/>

GRAPH ex:g1 {
    ex:s ex:p1 "a" .
    ex:s ex:p2 "c" .
}

GRAPH ex:g2 {
    ex:s ex:p3 "b" .
    ex:s ex:p2 "c" .
}

ex:s ex:p3 "d" .
```

For this query:

```
PREFIX ex: <https://example.com/>
DESCRIBE ?s
FROM ex:g1
FROM NAMED ex:g2
WHERE {
  GRAPH ex:g2 { ?s ?p "b" . }
}
```

Neptune would return:

```
ex:s ex:p1 "a" .
ex:s ex:p2 "c" .
```

Here, the graph pattern `GRAPH ex:g2 { ?s ?p "b" }` is evaluated first, resulting in bindings for `?s`, and then the `DESCRIBE` part is evaluated over the default graph, which is now just `ex:g1`.

However, for this query:

```
PREFIX ex: <https://example.com/>
DESCRIBE ?s 
FROM NAMED ex:g1 
WHERE { 
  GRAPH ex:g1 { ?s ?p "a" . } 
}
```

Neptune would return nothing, because when a `FROM NAMED` clause is present without any `FROM` clause, the default graph is empty.

In the following query, `DESCRIBE` is used with no `FROM` or `FROM NAMED` clause present:

```
PREFIX ex: <https://example.com/>
DESCRIBE ?s 
WHERE { 
  GRAPH ex:g1 { ?s ?p "a" . } 
}
```

In this situation, the default graph is composed of all the unique triples in the union of all the named graphs in the database (formally, the RDF merge), so Neptune would return:

```
ex:s ex:p1 "a" . 
ex:s ex:p2 "c" . 
ex:s ex:p3 "b" .
ex:s ex:p3 "d" .
```

# SPARQL query status API
<a name="sparql-api-status"></a>

To get the status of SPARQL queries, use HTTP `GET` or `POST` to make a request to the `https://your-neptune-endpoint:port/sparql/status` endpoint. 

## SPARQL query status request parameters
<a name="sparql-api-status-get-request"></a>

**queryId (optional)**  
The ID of a running SPARQL query. If specified, only the status of that query is returned.

## SPARQL query status response syntax
<a name="sparql-api-status-get-response-syntax"></a>

```
{
    "acceptedQueryCount": integer,
    "runningQueryCount": integer,
    "queries": [
      {
        "queryId":"guid",
        "queryEvalStats":
          {
            "subqueries": integer,
            "elapsed": integer,
            "cancelled": boolean
          },
        "queryString": "string"
      }
    ]
}
```

## SPARQL query status response values
<a name="sparql-api-status-get-response-values"></a>

**acceptedQueryCount**  
The number of queries accepted since the last restart of the Neptune engine.

**runningQueryCount**  
The number of currently running SPARQL queries.

**queries**  
A list of the current SPARQL queries.

**queryId**  
A GUID ID for the query. Neptune automatically assigns this ID value to each query, or you can assign your own (see [Inject a Custom ID Into a Neptune Gremlin or SPARQL Query](features-query-id.md)).

**queryEvalStats**  
Statistics for this query.

**subqueries**  
Number of subqueries in this query.

**elapsed**  
The number of milliseconds the query has been running so far.

**cancelled**  
True indicates that the query was cancelled.

**queryString**  
The submitted query.

## SPARQL query status example
<a name="sparql-api-status-get-example"></a>

The following is an example of the status command using `curl` and HTTP `GET`.

```
curl https://your-neptune-endpoint:port/sparql/status
```

This output shows a single running query.

```
{
    "acceptedQueryCount":9,
    "runningQueryCount":1,
    "queries": [
        {
            "queryId":"fb34cd3e-f37c-4d12-9cf2-03bb741bf54f",
            "queryEvalStats":
                {
                    "subqueries": 0,
                    "elapsed": 29256,
                    "cancelled": false
                },
            "queryString": "SELECT ?s ?p ?o WHERE {?s ?p ?o}"
        }
    ]
}
```

# SPARQL query cancellation
<a name="sparql-api-status-cancel"></a>

To cancel a running SPARQL query, use HTTP `GET` or `POST` to make a request to the `https://your-neptune-endpoint:port/sparql/status` endpoint with the parameters described below.

## SPARQL query cancellation request parameters
<a name="sparql-api-status-cancel-request"></a>

**cancelQuery**  
(Required) Tells the status command to cancel a query. This parameter does not take a value.

**queryId**  
(Required) The ID of the running SPARQL query to cancel.

**silent**  
(Optional) If `silent=true` then the running query is cancelled and the HTTP response code is 200. If `silent` is not present or `silent=false`, the query is cancelled with an HTTP 500 status code.

## SPARQL query cancellation examples
<a name="sparql-api-status-cancel-example"></a>

**Example 1: Cancellation with `silent=false`**  
The following is an example of the status command using `curl` to cancel a query with the `silent` parameter set to `false`:

```
curl https://your-neptune-endpoint:port/sparql/status \
  -d "cancelQuery" \
  -d "queryId=4d5c4fae-aa30-41cf-9e1f-91e6b7dd6f47" \
  -d "silent=false"
```

Unless the query has already started streaming results, the cancelled query would then return an HTTP 500 code with a response like this:

```
{
  "code": "CancelledByUserException",
  "requestId": "4d5c4fae-aa30-41cf-9e1f-91e6b7dd6f47",
  "detailedMessage": "Operation terminated (cancelled by user)"
}
```

If the query already returned an HTTP 200 code (OK) and started streaming results before being cancelled, the cancellation exception information is sent to the regular output stream.

**Example 2: Cancellation with `silent=true`**  
The following is an example of the same status command as above except with the `silent` parameter now set to `true`:

```
curl https://your-neptune-endpoint:port/sparql/status \
  -d "cancelQuery" \
  -d "queryId=4d5c4fae-aa30-41cf-9e1f-91e6b7dd6f47" \
  -d "silent=true"
```

This command would return the same response as when `silent=false`, but the cancelled query would now return an HTTP 200 code with a response like this:

```
{
  "head" : {
    "vars" : [ "s", "p", "o" ]
  },
  "results" : {
    "bindings" : [ ]
  }
}
```

# Using the SPARQL 1.1 Graph Store HTTP Protocol (GSP) in Amazon Neptune
<a name="sparql-graph-store-protocol"></a>

In the [SPARQL 1.1 Graph Store HTTP Protocol](https://www.w3.org/TR/sparql11-http-rdf-update/) recommendation, the W3C defined an HTTP protocol for managing RDF graphs. It defines operations for removing, creating, and replacing RDF graph content as well as for adding RDF statements to existing content.

The graph-store protocol (GSP) provides a convenient way to manipulate your entire graph without having to write complex SPARQL queries.

Neptune fully supports this protocol.

The endpoint for the graph-store protocol (GSP) is:

```
https://your-neptune-cluster:port/sparql/gsp/
```

To access the default graph with GSP, use:

```
https://your-neptune-cluster:port/sparql/gsp/?default
```

To access a named graph with GSP, use:

```
https://your-neptune-cluster:port/sparql/gsp/?graph=named-graph-URI
```
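
When you build these URLs programmatically, the named-graph URI must be URL-encoded. A minimal Python sketch (the helper name and endpoint value are placeholders):

```python
from urllib.parse import quote

# Hypothetical helper: build a Graph Store Protocol URL for a Neptune endpoint.
def gsp_url(endpoint, graph=None):
    base = f"https://{endpoint}/sparql/gsp/"
    if graph is None:
        return base + "?default"                     # the default graph
    return base + "?graph=" + quote(graph, safe="")  # a named graph

print(gsp_url("your-neptune-cluster:8182"))
print(gsp_url("your-neptune-cluster:8182", "urn:votes:2019"))
```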

## Special details of the Neptune GSP implementation
<a name="sparql-graph-store-protocol-special"></a>

Neptune fully implements the [W3C recommendation](https://www.w3.org/TR/sparql11-http-rdf-update/) that defines GSP. However, there are a few situations that the specification doesn't cover.

One of these is the case where a `PUT` or `POST` request specifies one or more named graphs in the request body that differ from the graph specified by the request URL. This can only happen when the RDF format of the request body supports named graphs, as is the case with `Content-Type: application/n-quads` or `Content-Type: application/trig`.

In this situation, Neptune adds or updates all the named graphs present in the body, as well as the named graph specified in the URL.

For example, suppose that starting with an empty database, you send a `PUT` request to upsert votes into three graphs. One, named `urn:votes`, contains all votes from all election years. Two others, named `urn:votes:2005` and `urn:votes:2019`, contain votes from specific election years. The request and its payload look like this:

```
PUT "http://your-Neptune-cluster:port/sparql/gsp/?graph=urn:votes"
  Host: example.com
  Content-Type: application/n-quads

  PAYLOAD:

  <urn:JohnDoe> <urn:votedFor> <urn:Labour> <urn:votes:2005> .
  <urn:JohnDoe> <urn:votedFor> <urn:Conservative> <urn:votes:2019> .
  <urn:JaneSmith> <urn:votedFor> <urn:LiberalDemocrats> <urn:votes:2005> .
  <urn:JaneSmith> <urn:votedFor> <urn:Conservative> <urn:votes:2019> .
```

After the request is executed, the data in the database looks like this:

```
<urn:JohnDoe>   <urn:votedFor> <urn:Labour>           <urn:votes:2005> .
<urn:JohnDoe>   <urn:votedFor> <urn:Conservative>     <urn:votes:2019> .
<urn:JaneSmith> <urn:votedFor> <urn:LiberalDemocrats> <urn:votes:2005> .
<urn:JaneSmith> <urn:votedFor> <urn:Conservative>     <urn:votes:2019> .
<urn:JohnDoe>   <urn:votedFor> <urn:Labour>           <urn:votes> .
<urn:JohnDoe>   <urn:votedFor> <urn:Conservative>     <urn:votes> .
<urn:JaneSmith> <urn:votedFor> <urn:LiberalDemocrats> <urn:votes> .
<urn:JaneSmith> <urn:votedFor> <urn:Conservative>     <urn:votes> .
```
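
The resulting state can be modeled as a set union: each quad keeps the graph named in the body, and each triple is additionally stored under the graph named in the URL. A toy Python model of this behavior (simplified tuples, not real N-Quads parsing):

```python
# Toy model of Neptune's handling of a PUT whose N-Quads body names graphs
# that differ from the ?graph= parameter in the request URL.
url_graph = "urn:votes"
body_quads = [
    ("urn:JohnDoe",   "urn:votedFor", "urn:Labour",           "urn:votes:2005"),
    ("urn:JohnDoe",   "urn:votedFor", "urn:Conservative",     "urn:votes:2019"),
    ("urn:JaneSmith", "urn:votedFor", "urn:LiberalDemocrats", "urn:votes:2005"),
    ("urn:JaneSmith", "urn:votedFor", "urn:Conservative",     "urn:votes:2019"),
]

# Every quad is stored under its own named graph, and each triple is also
# stored under the graph named in the URL.
stored = set(body_quads) | {(s, p, o, url_graph) for s, p, o, _ in body_quads}
print(len(stored))  # 8, matching the database contents shown above
```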

Another ambiguous situation is where more than one graph is specified in the request URL itself, using any of `PUT`, `POST`, `GET`, or `DELETE`. For example:

```
POST "http://your-Neptune-cluster:port/sparql/gsp/?graph=urn:votes:2005&graph=urn:votes:2019"
```

Or:

```
GET "http://your-Neptune-cluster:port/sparql/gsp/?default&graph=urn:votes:2019"
```

In this situation, Neptune returns an HTTP 400 with a message indicating that only one graph can be specified in the request URL.
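
This validation can be modeled with a few lines of Python (a toy check, not Neptune's actual implementation), using the standard library to count the graphs named in a request URL:

```python
from urllib.parse import urlparse, parse_qs

# Toy check mirroring Neptune's validation: a GSP request URL may name
# at most one graph (either ?default or a single ?graph=...).
def graph_count(url):
    params = parse_qs(urlparse(url).query, keep_blank_values=True)
    n = len(params.get("graph", []))
    if "default" in params:
        n += 1
    return n

bad = "https://host:8182/sparql/gsp/?graph=urn:votes:2005&graph=urn:votes:2019"
print(graph_count(bad))  # 2 -> Neptune would return HTTP 400
```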

# Analyzing Neptune query execution using SPARQL `explain`
<a name="sparql-explain"></a>

Amazon Neptune has added a SPARQL feature named *explain*. This feature is a self-service tool for understanding the execution approach taken by the Neptune engine. You invoke it by adding an `explain` parameter to an HTTP call that submits a SPARQL query.

The `explain` feature provides information about the logical structure of query execution plans. You can use this information to identify potential evaluation and execution bottlenecks. You can then use [query hints](sparql-query-hints.md) to improve your query execution plans.

**Topics**
+ [How the SPARQL query engine works in Neptune](sparql-explain-engine.md)
+ [How to use SPARQL `explain` to analyze Neptune query execution](sparql-explain-using.md)
+ [Examples of invoking SPARQL `explain` in Neptune](sparql-explain-examples.md)
+ [Neptune SPARQL `explain` operators](sparql-explain-operators.md)
+ [Limitations of SPARQL `explain` in Neptune](sparql-explain-limitations.md)

# How the SPARQL query engine works in Neptune
<a name="sparql-explain-engine"></a>

To use the information that the SPARQL `explain` feature provides, you need to understand some details about how the Amazon Neptune SPARQL query engine works.

The engine translates every SPARQL query into a pipeline of operators. Starting from the first operator, intermediate solutions known as *binding lists* flow through this operator pipeline. You can think of a binding list as a table in which the table headers are a subset of the variables used in the query. Each row in the table represents a result, up to the point of evaluation.

Let's assume that two namespace prefixes have been defined for our data:

```
  @prefix ex:   <http://example.com> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
```

The following would be an example of a simple binding list in this context:

```
  ?person       | ?firstName
  ------------------------------------------------------
  ex:JaneDoe    | "Jane"
  ex:JohnDoe    | "John"
  ex:RichardRoe | "Richard"
```

For each of three people, the list binds the `?person` variable to an identifier of the person, and the `?firstName` variable to the person's first name.

In the general case, variables can remain unbound, if, for example, there is an `OPTIONAL` selection of a variable in a query for which no value is present in the data.

The `PipelineJoin` operator is an example of a Neptune query engine operator present in the `explain` output. It takes as input an incoming binding set from the previous operator and joins it against a triple pattern, say `(?person, foaf:lastName, ?lastName)`. This operation uses the bindings for the `?person` variable in its input stream, substitutes them into the triple pattern, and looks up triples from the database.

When executed in the context of the incoming bindings from the previous table, `PipelineJoin` would evaluate three lookups, namely the following:

```
  (ex:JaneDoe,    foaf:lastName, ?lastName)
  (ex:JohnDoe,    foaf:lastName, ?lastName)
  (ex:RichardRoe, foaf:lastName, ?lastName)
```

This approach is called *as-bound* evaluation. The solutions from this lookup are joined back against the incoming solutions, adding the detected `?lastName` binding to each. Assuming that a last name is found for all three people, the operator produces an outgoing binding list that looks something like this:

```
  ?person       | ?firstName | ?lastName
  ---------------------------------------
  ex:JaneDoe    | "Jane"     | "Doe"
  ex:JohnDoe    | "John"     | "Doe"
  ex:RichardRoe | "Richard"  | "Roe"
```

This outgoing binding list then serves as input for the next operator in the pipeline. At the end, the output of the last operator in the pipeline defines the query result.
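
As a rough illustration (toy data and a simplified matcher, not Neptune internals), the as-bound evaluation performed by `PipelineJoin` can be sketched like this:

```python
# Toy in-memory triples: (subject, predicate, object)
triples = {
    ("ex:JaneDoe",    "foaf:knows",    "ex:JohnDoe"),
    ("ex:JaneDoe",    "foaf:knows",    "ex:RichardRoe"),
    ("ex:JohnDoe",    "foaf:lastName", "Doe"),
    ("ex:RichardRoe", "foaf:lastName", "Roe"),
    ("ex:JaneDoe",    "foaf:lastName", "Doe"),
}

def pipeline_join(bindings, pattern):
    """As-bound join: substitute each incoming binding into the pattern,
    look up matching triples, and extend the binding with new variables."""
    out = []
    for b in bindings:
        # Replace bound variables in the pattern with their values.
        subst = tuple(b.get(t, t) if t.startswith("?") else t for t in pattern)
        for triple in triples:
            if all(pat.startswith("?") or pat == val
                   for pat, val in zip(subst, triple)):
                extended = dict(b)
                for pat, val in zip(subst, triple):
                    if pat.startswith("?"):
                        extended[pat] = val
                out.append(extended)
    return out

incoming = [{"?person": "ex:JohnDoe"}, {"?person": "ex:RichardRoe"}]
result = pipeline_join(incoming, ("?person", "foaf:lastName", "?lastName"))
print(result)
```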

Operator pipelines are often linear, in the sense that every operator emits solutions for a single connected operator. However, in some cases, they can have more complex structures. For example, a `UNION` operator in a SPARQL query is mapped to a `Copy` operation. This operation duplicates the bindings and forwards the copies into two subplans, one for the left side and the other for the right side of the `UNION`.

For more information about operators, see [Neptune SPARQL `explain` operators](sparql-explain-operators.md).

# How to use SPARQL `explain` to analyze Neptune query execution
<a name="sparql-explain-using"></a>

The SPARQL `explain` feature is a self-service tool in Amazon Neptune that helps you understand the execution approach taken by the Neptune engine. To invoke `explain`, you pass a parameter to an HTTP or HTTPS request in the form `explain=mode`.

The mode value can be one of `static`, `dynamic`, or `details`:
+ In *static* mode, `explain` prints only the static structure of the query plan.
+ In *dynamic* mode, `explain` also includes dynamic aspects of the query plan. These aspects might include the number of intermediate bindings flowing through the operators, the ratio of incoming bindings to outgoing bindings, and the total time taken by operators.
+ In *details* mode, `explain` prints the information shown in `dynamic` mode plus additional details such as the actual SPARQL query string and the estimated range count for the pattern underlying a join operator.

Neptune supports using `explain` with all three SPARQL query access protocols listed in the [W3C SPARQL 1.1 Protocol](https://www.w3.org/TR/sparql11-protocol/#query-operation) specification, namely:

1. HTTP GET

1. HTTP POST using URL-encoded parameters

1. HTTP POST using text parameters

For information about the SPARQL query engine, see [How the SPARQL query engine works in Neptune](sparql-explain-engine.md).

For information about the kind of output produced by invoking SPARQL `explain`, see [Examples of invoking SPARQL `explain` in Neptune](sparql-explain-examples.md).

# Examples of invoking SPARQL `explain` in Neptune
<a name="sparql-explain-examples"></a>

The examples in this section show the various kinds of output you can produce by invoking the SPARQL `explain` feature to analyze query execution in Amazon Neptune.

**Topics**
+ [Understanding Explain Output](#sparql-explain-example-output)
+ [Example of details mode output](#sparql-explain-example-details)
+ [Example of static mode output](#sparql-explain-example-static)
+ [Different ways of encoding parameters](#sparql-explain-example-parameters)
+ [Other output types besides text/plain](#sparql-explain-output-options)
+ [Example of SPARQL `explain` output when the DFE is enabled](#sparql-explain-output-dfe)

## Understanding Explain Output
<a name="sparql-explain-example-output"></a>

In this example, Jane Doe knows two people, namely John Doe and Richard Roe:

```
@prefix ex: <http://example.com> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

ex:JaneDoe foaf:knows ex:JohnDoe .
ex:JohnDoe foaf:firstName "John" .
ex:JohnDoe foaf:lastName "Doe" .
ex:JaneDoe foaf:knows ex:RichardRoe .
ex:RichardRoe foaf:firstName "Richard" .
ex:RichardRoe foaf:lastName "Roe" .
```

To determine the first names of all the people whom Jane Doe knows, you can write the following query:

```
 curl http(s)://your_server:your_port/sparql \
   -d "query=PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX ex: <http://example.com> \
       SELECT ?firstName WHERE { ex:JaneDoe foaf:knows ?person . ?person foaf:firstName ?firstName }" \
   -H "Accept: text/csv"
```

This simple query returns the following:

```
firstName
John
Richard
```

Next, change the `curl` command to invoke `explain` by adding `-d "explain=dynamic"` and using the default output type instead of `text/csv`:

```
 curl http(s)://your_server:your_port/sparql \
   -d "query=PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX ex: <http://example.com> \
       SELECT ?firstName WHERE { ex:JaneDoe foaf:knows ?person . ?person foaf:firstName ?firstName }" \
   -d "explain=dynamic"
```

The query now returns output in pretty-printed ASCII format (HTTP content type `text/plain`), which is the default output type:

```
╔════╤════════╤════════╤═══════════════════╤═══════════════════════════════════════════════════════╤══════════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name              │ Arguments                                             │ Mode     │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════╪═══════════════════════════════════════════════════════╪══════════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]                                        │ -        │ 0        │ 1         │ 0.00  │ 0         ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ PipelineJoin      │ pattern=distinct(ex:JaneDoe, foaf:knows, ?person)     │ -        │ 1        │ 2         │ 2.00  │ 1         ║
║    │        │        │                   │ joinType=join                                         │          │          │           │       │           ║
║    │        │        │                   │ joinProjectionVars=[?person]                          │          │          │           │       │           ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 3      │ -      │ PipelineJoin      │ pattern=distinct(?person, foaf:firstName, ?firstName) │ -        │ 2        │ 2         │ 1.00  │ 1         ║
║    │        │        │                   │ joinType=join                                         │          │          │           │       │           ║
║    │        │        │                   │ joinProjectionVars=[?person, ?firstName]              │          │          │           │       │           ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ 4      │ -      │ Projection        │ vars=[?firstName]                                     │ retain   │ 2        │ 2         │ 1.00  │ 0         ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 4  │ -      │ -      │ TermResolution    │ vars=[?firstName]                                     │ id2value │ 2        │ 2         │ 1.00  │ 1         ║
╚════╧════════╧════════╧═══════════════════╧═══════════════════════════════════════════════════════╧══════════╧══════════╧═══════════╧═══════╧═══════════╝
```

For details about the operations in the `Name` column and their arguments, see [explain operators](sparql-explain-operators.md).

The following describes the output row by row:

1. The first step in the main query always uses the `SolutionInjection` operator to inject a solution. The solution is then expanded to the final result through the evaluation process.

   In this case, it injects the so-called universal solution `{ }`. In the presence of `VALUES` or `BIND` clauses, this step might also inject more complex variable bindings to start with.

   The `Units Out` column indicates that this single solution flows out of the operator. The `Out #1` column specifies the operator into which this operator feeds the result. In this example, all operators are connected to the operator that follows in the table.

1. The second step is a `PipelineJoin`. It receives as input the single universal (fully unconstrained) solution produced by the previous operator (`Units In := 1`). It joins it against the tuple pattern defined by its `pattern` argument. This corresponds to a simple lookup for the pattern. In this case, the triple pattern is defined as the following:

   ```
   distinct( ex:JaneDoe, foaf:knows, ?person )
   ```

   The `joinType := join` argument indicates that this is a normal join (other types include `optional` joins, `existence check` joins, and so on).

   The `distinct` wrapper around the pattern indicates that only distinct matches (no duplicates) are extracted from the database, projected onto the variable listed in `joinProjectionVars := ?person`.

   The fact that the `Units Out` column value is 2 indicates that there are two solutions flowing out. Specifically, these are the bindings for the `?person` variable, reflecting the two people that the data shows that Jane Doe knows:

   ```
    ?person
    -------------
    ex:JohnDoe
    ex:RichardRoe
   ```

1. The two solutions from stage 2 flow as input (`Units In := 2`) into the second `PipelineJoin`. This operator joins the two previous solutions with the following triple pattern:

   ```
   distinct(?person, foaf:firstName, ?firstName)
   ```

   The `?person` variable is known to be bound either to `ex:JohnDoe` or to `ex:RichardRoe` by the operator's incoming solutions. Given that, the `PipelineJoin` extracts the first names, John and Richard. The two outgoing solutions (`Units Out := 2`) are then as follows:

   ```
    ?person       | ?firstName
    ---------------------------
    ex:JohnDoe    | John
    ex:RichardRoe | Richard
   ```

1. The next projection operator takes as input the two solutions from stage 3 (`Units In := 2`) and projects onto the `?firstName` variable. This eliminates all other variable bindings in the mappings and passes on the two bindings (`Units Out := 2`):

   ```
    ?firstName
    ----------
    John
    Richard
   ```

1. To improve performance, Neptune operates where possible on internal identifiers that it assigns to terms such as URIs and string literals, rather than on the strings themselves. The final operator, `TermResolution`, performs a mapping from these internal identifiers back to the corresponding term strings.

   In regular (non-explain) query evaluation, the result computed by the last operator is then serialized into the requested serialization format and streamed to the client.
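
As a toy illustration (the numeric IDs and dictionary are invented for this sketch), `TermResolution` in `id2value` mode can be thought of as a final dictionary lookup:

```python
# Toy id2value mapping: Neptune internally joins on numeric term IDs and
# resolves them back to term strings only at the end (illustrative values).
dictionary = {101: "ex:JohnDoe", 102: "John", 103: "Richard"}

def term_resolution(bindings, variables):
    """Map internal IDs back to term strings for the given variables."""
    return [{v: dictionary[b[v]] if v in variables else b[v] for v in b}
            for b in bindings]

resolved = term_resolution([{"?firstName": 102}, {"?firstName": 103}],
                           {"?firstName"})
print(resolved)  # [{'?firstName': 'John'}, {'?firstName': 'Richard'}]
```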

## Example of details mode output
<a name="sparql-explain-example-details"></a>

Suppose that you run the same query as in the previous example, but in *details* mode instead of *dynamic* mode:

```
 curl http(s)://your_server:your_port/sparql \
   -d "query=PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX ex: <http://example.com> \
       SELECT ?firstName WHERE { ex:JaneDoe foaf:knows ?person . ?person foaf:firstName ?firstName }" \
   -d "explain=details"
```

As this example shows, the output is the same with some additional details such as the query string at the top of the output, and the `patternEstimate` count for the `PipelineJoin` operator:

```
Query:
PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX ex: <http://example.com>
SELECT ?firstName WHERE { ex:JaneDoe foaf:knows ?person . ?person foaf:firstName ?firstName }

╔════╤════════╤════════╤═══════════════════╤═══════════════════════════════════════════════════════╤══════════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name              │ Arguments                                             │ Mode     │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════╪═══════════════════════════════════════════════════════╪══════════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]                                        │ -        │ 0        │ 1         │ 0.00  │ 0         ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ PipelineJoin      │ pattern=distinct(ex:JaneDoe, foaf:knows, ?person)     │ -        │ 1        │ 2         │ 2.00  │ 13        ║
║    │        │        │                   │ joinType=join                                         │          │          │           │       │           ║
║    │        │        │                   │ joinProjectionVars=[?person]                          │          │          │           │       │           ║
║    │        │        │                   │ patternEstimate=2                                     │          │          │           │       │           ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 3      │ -      │ PipelineJoin      │ pattern=distinct(?person, foaf:firstName, ?firstName) │ -        │ 2        │ 2         │ 1.00  │ 3         ║
║    │        │        │                   │ joinType=join                                         │          │          │           │       │           ║
║    │        │        │                   │ joinProjectionVars=[?person, ?firstName]              │          │          │           │       │           ║
║    │        │        │                   │ patternEstimate=2                                     │          │          │           │       │           ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ 4      │ -      │ Projection        │ vars=[?firstName]                                     │ retain   │ 2        │ 2         │ 1.00  │ 1         ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 4  │ -      │ -      │ TermResolution    │ vars=[?firstName]                                     │ id2value │ 2        │ 2         │ 1.00  │ 7         ║
╚════╧════════╧════════╧═══════════════════╧═══════════════════════════════════════════════════════╧══════════╧══════════╧═══════════╧═══════╧═══════════╝
```

## Example of static mode output
<a name="sparql-explain-example-static"></a>

Suppose that you run the same query as in the previous example, but in *static* mode (the default) instead of *details* mode:

```
 curl http(s)://your_server:your_port/sparql \
   -d "query=PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX ex: <http://example.com> \
       SELECT ?firstName WHERE { ex:JaneDoe foaf:knows ?person . ?person foaf:firstName ?firstName }" \
   -d "explain=static"
```

As this example shows, the output is the same, except that it omits the last three columns:

```
╔════╤════════╤════════╤═══════════════════╤═══════════════════════════════════════════════════════╤══════════╗
║ ID │ Out #1 │ Out #2 │ Name              │ Arguments                                             │ Mode     ║
╠════╪════════╪════════╪═══════════════════╪═══════════════════════════════════════════════════════╪══════════╣
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]                                        │ -        ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────╢
║ 1  │ 2      │ -      │ PipelineJoin      │ pattern=distinct(ex:JaneDoe, foaf:knows, ?person)     │ -        ║
║    │        │        │                   │ joinType=join                                         │          ║
║    │        │        │                   │ joinProjectionVars=[?person]                          │          ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────╢
║ 2  │ 3      │ -      │ PipelineJoin      │ pattern=distinct(?person, foaf:firstName, ?firstName) │ -        ║
║    │        │        │                   │ joinType=join                                         │          ║
║    │        │        │                   │ joinProjectionVars=[?person, ?firstName]              │          ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────╢
║ 3  │ 4      │ -      │ Projection        │ vars=[?firstName]                                     │ retain   ║
╟────┼────────┼────────┼───────────────────┼───────────────────────────────────────────────────────┼──────────╢
║ 4  │ -      │ -      │ TermResolution    │ vars=[?firstName]                                     │ id2value ║
╚════╧════════╧════════╧═══════════════════╧═══════════════════════════════════════════════════════╧══════════╝
```

## Different ways of encoding parameters
<a name="sparql-explain-example-parameters"></a>

The following example queries illustrate two different ways to encode parameters when invoking SPARQL `explain`.

**Using URL encoding** – This example uses URL encoding of parameters, and specifies *dynamic* output:

```
curl -XGET "http(s)://your_server:your_port/sparql?query=SELECT%20*%20WHERE%20%7B%20%3Fs%20%3Fp%20%3Fo%20%7D%20LIMIT%20%31&explain=dynamic"
```

**Specifying the parameters directly** – This is the same as the previous query except that it passes the parameters through POST directly:

```
 curl http(s)://your_server:your_port/sparql \
   -d "query=SELECT * WHERE { ?s ?p ?o } LIMIT 1" \
   -d "explain=dynamic"
```
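
If you are scripting these calls, Python's standard `urllib.parse.urlencode` produces the URL-encoded form used by the GET variant above (server and port are placeholders, as in the curl examples):

```python
from urllib.parse import urlencode

# Build the URL-encoded query string for the GET variant of the request.
params = {"query": "SELECT * WHERE { ?s ?p ?o } LIMIT 1",
          "explain": "dynamic"}
encoded = urlencode(params)
url = f"https://your_server:your_port/sparql?{encoded}"
print(encoded)
```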

## Other output types besides text/plain
<a name="sparql-explain-output-options"></a>

The preceding examples use the default `text/plain` output type. Neptune can also format SPARQL `explain` output in two other MIME-type formats, namely `text/csv` and `text/html`. You invoke them by setting the HTTP `Accept` header, which you can do using the `-H` flag in `curl`, as follows:

```
  -H "Accept: output type"
```

Here are some examples:

**`text/csv` Output**  
This query calls for CSV MIME-type output by specifying `-H "Accept: text/csv"`:

```
 curl http(s)://your_server:your_port/sparql \
   -d "query=SELECT * WHERE { ?s ?p ?o } LIMIT 1" \
   -d "explain=dynamic" \
   -H "Accept: text/csv"
```

The CSV format, which is handy for importing into a spreadsheet or database, separates the fields in each `explain` row with semicolons (`;`), like this:

```
ID;Out #1;Out #2;Name;Arguments;Mode;Units In;Units Out;Ratio;Time (ms)
0;1;-;SolutionInjection;solutions=[{}];-;0;1;0.00;0
1;2;-;PipelineJoin;pattern=distinct(?s, ?p, ?o),joinType=join,joinProjectionVars=[?s, ?p, ?o];-;1;6;6.00;1
2;3;-;Projection;vars=[?s, ?p, ?o];retain;6;6;1.00;2
3;-;-;Slice;limit=1;-;1;1;1.00;1
```
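
Because the fields are semicolon-separated, this output can be loaded with any CSV parser by overriding the delimiter. For example, in Python (using the sample output above):

```python
import csv
import io

# The explain CSV output shown above, as a string.
sample = """ID;Out #1;Out #2;Name;Arguments;Mode;Units In;Units Out;Ratio;Time (ms)
0;1;-;SolutionInjection;solutions=[{}];-;0;1;0.00;0
1;2;-;PipelineJoin;pattern=distinct(?s, ?p, ?o),joinType=join,joinProjectionVars=[?s, ?p, ?o];-;1;6;6.00;1
2;3;-;Projection;vars=[?s, ?p, ?o];retain;6;6;1.00;2
3;-;-;Slice;limit=1;-;1;1;1.00;1
"""

# Parse with a semicolon delimiter, then find the slowest operator.
rows = list(csv.DictReader(io.StringIO(sample), delimiter=";"))
slowest = max(rows, key=lambda r: int(r["Time (ms)"]))
print(slowest["Name"])  # Projection (2 ms in this sample)
```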

 

**`text/html` Output**  
If you specify `-H "Accept: text/html"`, then `explain` generates an HTML table:

```
<!DOCTYPE html>
<html>
  <body>
    <table border="1px">
      <thead>
        <tr>
          <th>ID</th>
          <th>Out #1</th>
          <th>Out #2</th>
          <th>Name</th>
          <th>Arguments</th>
          <th>Mode</th>
          <th>Units In</th>
          <th>Units Out</th>
          <th>Ratio</th>
          <th>Time (ms)</th>
        </tr>
      </thead>

      <tbody>
        <tr>
          <td>0</td>
          <td>1</td>
          <td>-</td>
          <td>SolutionInjection</td>
          <td>solutions=[{}]</td>
          <td>-</td>
          <td>0</td>
          <td>1</td>
          <td>0.00</td>
          <td>0</td>
        </tr>

        <tr>
          <td>1</td>
          <td>2</td>
          <td>-</td>
          <td>PipelineJoin</td>
          <td>pattern=distinct(?s, ?p, ?o)<br>
              joinType=join<br>
              joinProjectionVars=[?s, ?p, ?o]</td>
          <td>-</td>
          <td>1</td>
          <td>6</td>
          <td>6.00</td>
          <td>1</td>
        </tr>

        <tr>
          <td>2</td>
          <td>3</td>
          <td>-</td>
          <td>Projection</td>
          <td>vars=[?s, ?p, ?o]</td>
          <td>retain</td>
          <td>6</td>
          <td>6</td>
          <td>1.00</td>
          <td>2</td>
        </tr>

        <tr>
          <td>3</td>
          <td>-</td>
          <td>-</td>
          <td>Slice</td>
          <td>limit=1</td>
          <td>-</td>
          <td>1</td>
          <td>1</td>
          <td>1.00</td>
          <td>1</td>
        </tr>
      </tbody>
    </table>
  </body>
</html>
```

The HTML renders in a browser something like the following:

![\[Sample of SPARQL Explain HTML output.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/sparql-explain-dynamic-html-output.png)


## Example of SPARQL `explain` output when the DFE is enabled
<a name="sparql-explain-output-dfe"></a>

The following is an example of SPARQL `explain` output when the Neptune DFE alternative query engine is enabled:

```
╔════╤════════╤════════╤═══════════════════╤═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╤══════════╤══════════╤═══════════╤═══════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name              │ Arguments                                                                                                                                                                                                               │ Mode     │ Units In │ Units Out │ Ratio │ Time (ms) ║
╠════╪════════╪════════╪═══════════════════╪═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╪══════════╪══════════╪═══════════╪═══════╪═══════════╣
║ 0  │ 1      │ -      │ SolutionInjection │ solutions=[{}]                                                                                                                                                                                                          │ -        │ 0        │ 1         │ 0.00  │ 0         ║
╟────┼────────┼────────┼───────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 1  │ 2      │ -      │ HashIndexBuild    │ solutionSet=solutionSet1                                                                                                                                                                                                │ -        │ 1        │ 1         │ 1.00  │ 22        ║
║    │        │        │                   │ joinVars=[]                                                                                                                                                                                                             │          │          │           │       │           ║
║    │        │        │                   │ sourceType=pipeline                                                                                                                                                                                                     │          │          │           │       │           ║
╟────┼────────┼────────┼───────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 2  │ 3      │ -      │ DFENode           │ DFE Stats=                                                                                                                                                                                                                    │ -        │ 101      │ 100       │ 0.99  │ 32        ║
║    │        │        │                   │ ====> DFE execution time (measured by DFEQueryEngine)                                                                                                                                                                   │          │          │           │       │           ║
║    │        │        │                   │ accepted [micros]=127                                                                                                                                                                                                   │          │          │           │       │           ║
║    │        │        │                   │ ready [micros]=2                                                                                                                                                                                                        │          │          │           │       │           ║
║    │        │        │                   │ running [micros]=5627                                                                                                                                                                                                   │          │          │           │       │           ║
║    │        │        │                   │ finished [micros]=0                                                                                                                                                                                                     │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ ===> DFE execution time (measured in DFENode)                                                                                                                                                                           │          │          │           │       │           ║
║    │        │        │                   │ -> setupTime [ms]=1                                                                                                                                                                                                     │          │          │           │       │           ║
║    │        │        │                   │ -> executionTime [ms]=14                                                                                                                                                                                                │          │          │           │       │           ║
║    │        │        │                   │ -> resultReadTime [ms]=0                                                                                                                                                                                                │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ ===> Static analysis statistics                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ --> 35907 micros spent in parser.                                                                                                                                                                                       │          │          │           │       │           ║
║    │        │        │                   │ --> 7643 micros spent in range count estimation                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ --> 2895 micros spent in value resolution                                                                                                                                                                               │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ --> 39974925 micros spent in optimizer loop                                                                                                                                                                             │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ DFEJoinGroupNode[ children={                                                                                                                                                                                            │          │          │           │       │           ║
║    │        │        │                   │   DFEPatternNode[(?1, TERM[117442062], ?2, ?3) . project DISTINCT[?1, ?2] {rangeCountEstimate=100},                                                                                                                     │          │          │           │       │           ║
║    │        │        │                   │     OperatorInfoWithAlternative[                                                                                                                                                                                        │          │          │           │       │           ║
║    │        │        │                   │       rec=OperatorInfo[                                                                                                                                                                                                 │          │          │           │       │           ║
║    │        │        │                   │         type=INCREMENTAL_PIPELINE_JOIN,                                                                                                                                                                                 │          │          │           │       │           ║
║    │        │        │                   │         costEstimates=OperatorCostEstimates[                                                                                                                                                                            │          │          │           │       │           ║
║    │        │        │                   │           costEstimate=OperatorCostEstimate[in=1.0000,out=100.0000,io=0.0002,comp=0.0000,mem=0],                                                                                                                        │          │          │           │       │           ║
║    │        │        │                   │           worstCaseCostEstimate=OperatorCostEstimate[in=1.0000,out=100.0000,io=0.0002,comp=0.0000,mem=0]]],                                                                                                             │          │          │           │       │           ║
║    │        │        │                   │       alt=OperatorInfo[                                                                                                                                                                                                 │          │          │           │       │           ║
║    │        │        │                   │         type=INCREMENTAL_HASH_JOIN,                                                                                                                                                                                     │          │          │           │       │           ║
║    │        │        │                   │         costEstimates=OperatorCostEstimates[                                                                                                                                                                            │          │          │           │       │           ║
║    │        │        │                   │           costEstimate=OperatorCostEstimate[in=1.0000,out=100.0000,io=0.0003,comp=0.0000,mem=3212],                                                                                                                     │          │          │           │       │           ║
║    │        │        │                   │           worstCaseCostEstimate=OperatorCostEstimate[in=1.0000,out=100.0000,io=0.0003,comp=0.0000,mem=3212]]]]],                                                                                                        │          │          │           │       │           ║
║    │        │        │                   │   DFEPatternNode[(?1, TERM[150997262], ?4, ?5) . project DISTINCT[?1, ?4] {rangeCountEstimate=100},                                                                                                                     │          │          │           │       │           ║
║    │        │        │                   │     OperatorInfoWithAlternative[                                                                                                                                                                                        │          │          │           │       │           ║
║    │        │        │                   │       rec=OperatorInfo[                                                                                                                                                                                                 │          │          │           │       │           ║
║    │        │        │                   │         type=INCREMENTAL_HASH_JOIN,                                                                                                                                                                                     │          │          │           │       │           ║
║    │        │        │                   │         costEstimates=OperatorCostEstimates[                                                                                                                                                                            │          │          │           │       │           ║
║    │        │        │                   │           costEstimate=OperatorCostEstimate[in=100.0000,out=100.0000,io=0.0003,comp=0.0000,mem=6400],                                                                                                                   │          │          │           │       │           ║
║    │        │        │                   │           worstCaseCostEstimate=OperatorCostEstimate[in=100.0000,out=100.0000,io=0.0003,comp=0.0000,mem=6400]]],                                                                                                        │          │          │           │       │           ║
║    │        │        │                   │       alt=OperatorInfo[                                                                                                                                                                                                 │          │          │           │       │           ║
║    │        │        │                   │         type=INCREMENTAL_PIPELINE_JOIN,                                                                                                                                                                                 │          │          │           │       │           ║
║    │        │        │                   │         costEstimates=OperatorCostEstimates[                                                                                                                                                                            │          │          │           │       │           ║
║    │        │        │                   │           costEstimate=OperatorCostEstimate[in=100.0000,out=100.0000,io=0.0010,comp=0.0000,mem=0],                                                                                                                      │          │          │           │       │           ║
║    │        │        │                   │           worstCaseCostEstimate=OperatorCostEstimate[in=100.0000,out=100.0000,io=0.0010,comp=0.0000,mem=0]]]]]                                                                                                          │          │          │           │       │           ║
║    │        │        │                   │ },                                                                                                                                                                                                                      │          │          │           │       │           ║
║    │        │        │                   │ ]                                                                                                                                                                                                                       │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ ===> DFE configuration:                                                                                                                                                                                                 │          │          │           │       │           ║
║    │        │        │                   │ solutionChunkSize=5000                                                                                                                                                                                                  │          │          │           │       │           ║
║    │        │        │                   │ ouputQueueSize=20                                                                                                                                                                                                       │          │          │           │       │           ║
║    │        │        │                   │ numComputeCores=3                                                                                                                                                                                                       │          │          │           │       │           ║
║    │        │        │                   │ maxParallelIO=10                                                                                                                                                                                                        │          │          │           │       │           ║
║    │        │        │                   │ numInitialPermits=12                                                                                                                                                                                                    │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ ====> DFE configuration (reported back)                                                                                                                                                                                 │          │          │           │       │           ║
║    │        │        │                   │ numComputeCores=3                                                                                                                                                                                                       │          │          │           │       │           ║
║    │        │        │                   │ maxParallelIO=2                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ numInitialPermits=12                                                                                                                                                                                                    │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ ===> Statistics & operator histogram                                                                                                                                                                                    │          │          │           │       │           ║
║    │        │        │                   │ ==> Statistics                                                                                                                                                                                                          │          │          │           │       │           ║
║    │        │        │                   │ -> 3741 / 3668 micros total elapsed (incl. wait / excl. wait)                                                                                                                                                           │          │          │           │       │           ║
║    │        │        │                   │ -> 3741 / 3 millis total elapse (incl. wait / excl. wait)                                                                                                                                                               │          │          │           │       │           ║
║    │        │        │                   │ -> 3741 / 0 secs total elapsed (incl. wait / excl. wait)                                                                                                                                                                │          │          │           │       │           ║
║    │        │        │                   │ ==> Operator histogram                                                                                                                                                                                                  │          │          │           │       │           ║
║    │        │        │                   │ -> 47.66% of total time (excl. wait): pipelineScan (2 instances)                                                                                                                                                        │          │          │           │       │           ║
║    │        │        │                   │ -> 10.99% of total time (excl. wait): merge (1 instances)                                                                                                                                                               │          │          │           │       │           ║
║    │        │        │                   │ -> 41.17% of total time (excl. wait): symmetricHashJoin (1 instances)                                                                                                                                                   │          │          │           │       │           ║
║    │        │        │                   │ -> 0.19% of total time (excl. wait): drain (1 instances)                                                                                                                                                                │          │          │           │       │           ║
║    │        │        │                   │                                                                                                                                                                                                                         │          │          │           │       │           ║
║    │        │        │                   │ nodeId | out0   | out1 | opName            | args                                             | rowsIn | rowsOut | chunksIn | chunksOut | elapsed* | outWait | outBlocked | ratio    | rate* [M/s] | rate [M/s] | %     │          │          │           │       │           ║
║    │        │        │                   │ ------ | ------ | ---- | ----------------- | ------------------------------------------------ | ------ | ------- | -------- | --------- | -------- | ------- | ---------- | -------- | ----------- | ---------- | ----- │          │          │           │       │           ║
║    │        │        │                   │ node_0 | node_2 | -    | pipelineScan      | (?1, TERM[117442062], ?2, ?3) DISTINCT [?1, ?2]  | 0      | 100     | 0        | 1         | 874      | 0       | 0          | Infinity | 0.1144      | 0.1144     | 23.83 │          │          │           │       │           ║
║    │        │        │                   │ node_1 | node_2 | -    | pipelineScan      | (?1, TERM[150997262], ?4, ?5) DISTINCT [?1, ?4]  | 0      | 100     | 0        | 1         | 874      | 0       | 0          | Infinity | 0.1144      | 0.1144     | 23.83 │          │          │           │       │           ║
║    │        │        │                   │ node_2 | node_4 | -    | symmetricHashJoin |                                                  | 200    | 100     | 2        | 2         | 1510     | 73      | 0          | 0.50     | 0.0662      | 0.0632     | 41.17 │          │          │           │       │           ║
║    │        │        │                   │ node_3 | -      | -    | drain             |                                                  | 100    | 0       | 1        | 0         | 7        | 0       | 0          | 0.00     | 0.0000      | 0.0000     | 0.19  │          │          │           │       │           ║
║    │        │        │                   │ node_4 | node_3 | -    | merge             |                                                  | 100    | 100     | 2        | 1         | 403      | 0       | 0          | 1.00     | 0.2481      | 0.2481     | 10.99 │          │          │           │       │           ║
╟────┼────────┼────────┼───────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 3  │ 4      │ -      │ HashIndexJoin     │ solutionSet=solutionSet1                                                                                                                                                                                                │ -        │ 100      │ 100       │ 1.00  │ 4         ║
║    │        │        │                   │ joinType=join                                                                                                                                                                                                           │          │          │           │       │           ║
╟────┼────────┼────────┼───────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 4  │ 5      │ -      │ Distinct          │ vars=[?s, ?o, ?o1]                                                                                                                                                                                                      │ -        │ 100      │ 100       │ 1.00  │ 9         ║
╟────┼────────┼────────┼───────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 5  │ 6      │ -      │ Projection        │ vars=[?s, ?o, ?o1]                                                                                                                                                                                                      │ retain   │ 100      │ 100       │ 1.00  │ 2         ║
╟────┼────────┼────────┼───────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┼──────────┼───────────┼───────┼───────────╢
║ 6  │ -      │ -      │ TermResolution    │ vars=[?s, ?o, ?o1]                                                                                                                                                                                                      │ id2value │ 100      │ 100       │ 1.00  │ 11        ║
╚════╧════════╧════════╧═══════════════════╧═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╧══════════╧══════════╧═══════════╧═══════╧═══════════╝
```

# Neptune SPARQL `explain` operators
<a name="sparql-explain-operators"></a>

The following sections describe the operators and parameters for the SPARQL `explain` feature currently available in Amazon Neptune.

**Important**  
The SPARQL `explain` feature is still being refined. The operators and parameters documented here might change in future versions.

**Topics**
+ [`Aggregation` operator](#sparql-explain-operator-aggregation)
+ [`ConditionalRouting` operator](#sparql-explain-operator-conditional-routing)
+ [`Copy` operator](#sparql-explain-operator-copy)
+ [`DFENode` operator](#sparql-explain-operator-dfenode)
+ [`Distinct` operator](#sparql-explain-operator-distinct)
+ [`Federation` operator](#sparql-explain-operator-federation)
+ [`Filter` operator](#sparql-explain-operator-filter)
+ [`HashIndexBuild` operator](#sparql-explain-operator-hash-index-build)
+ [`HashIndexJoin` operator](#sparql-explain-operator-hash-index-join)
+ [`MergeJoin` operator](#sparql-explain-operator-merge-join)
+ [`NamedSubquery` operator](#sparql-explain-operator-named-subquery)
+ [`PipelineJoin` operator](#sparql-explain-operator-pipeline-join)
+ [`PipelineCountJoin` operator](#sparql-explain-operator-pipeline-count-join)
+ [`PipelinedHashIndexJoin` operator](#sparql-explain-operator-pipeline-hash-index-join)
+ [`Projection` operator](#sparql-explain-operator-projection)
+ [`PropertyPath` operator](#sparql-explain-operator-property-path)
+ [`TermResolution` operator](#sparql-explain-operator-term-resolution)
+ [`Slice` operator](#sparql-explain-operator-slice)
+ [`SolutionInjection` operator](#sparql-explain-operator-solution-injection)
+ [`Sort` operator](#sparql-explain-operator-sort)
+ [`VariableAlignment` operator](#sparql-explain-operator-variable-alignment)

## `Aggregation` operator
<a name="sparql-explain-operator-aggregation"></a>

Performs one or more aggregations, implementing the semantics of SPARQL aggregation operators such as `count`, `max`, `min`, and `sum`.

`Aggregation` comes with optional grouping using `groupBy` clauses, and optional `having` constraints.

**Arguments**
+ `groupBy` – (*Optional*) Provides a `groupBy` clause that specifies the sequence of expressions according to which the incoming solutions are grouped.
+ `aggregates` – (*Required*) Specifies an ordered list of aggregation expressions.
+ `having` – (*Optional*) Adds constraints to filter on groups, as implied by the `having` clause in the SPARQL query.
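
For illustration (this example query is hypothetical, not drawn from Neptune output), a query of the following shape would typically be planned with an `Aggregation` operator carrying `groupBy`, `aggregates`, and `having` arguments:

```sparql
# Grouping, an aggregate expression, and a HAVING constraint together
# exercise all three Aggregation arguments.
SELECT ?type (COUNT(?s) AS ?count)
WHERE { ?s a ?type }
GROUP BY ?type            # -> groupBy argument
HAVING (COUNT(?s) > 10)   # -> having argument
```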

## `ConditionalRouting` operator
<a name="sparql-explain-operator-conditional-routing"></a>

Routes incoming solutions based on a given condition. Solutions that satisfy the condition are routed to the operator ID referenced by `Out #1`, whereas solutions that do not are routed to the operator referenced by `Out #2`.

**Arguments**
+ `condition` – (*Required*) The routing condition.

## `Copy` operator
<a name="sparql-explain-operator-copy"></a>

Delegates the solution stream according to the specified mode.

**Modes**
+ `forward` – Forwards the solutions to the downstream operator identified by `Out #1`. 
+ `duplicate` – Duplicates the solutions and forwards them to each of the two operators identified by `Out #1` and `Out #2`.

`Copy` has no arguments.

## `DFENode` operator
<a name="sparql-explain-operator-dfenode"></a>

This operator is an abstraction of the plan that is run by the DFE alternative query engine. The detailed DFE plan is outlined in the arguments for this operator. The argument is currently overloaded to contain the detailed runtime statistics of the DFE plan, including the time spent in the various steps of query execution by the DFE.

The optimized logical abstract syntax tree (AST) for the DFE query plan is printed with information about the operator types that were considered while planning, along with the associated best- and worst-case costs of running those operators. The AST currently consists of the following types of nodes:
+ `DFEJoinGroupNode` –  Represents a join of one or more `DFEPatternNodes`.
+ `DFEPatternNode` –  Encapsulates a pattern that is used to project matching tuples out of the underlying database.

The `Statistics & Operator histogram` sub-section contains details about the execution time of the `DataflowOp` plan and a breakdown of the CPU time used by each operator. Below this is a table that prints detailed runtime statistics for the plan executed by the DFE.

**Note**  
Because the DFE is an experimental feature released in lab mode, the exact format of its `explain` output may change.

## `Distinct` operator
<a name="sparql-explain-operator-distinct"></a>

Computes the distinct projection on a subset of the variables, eliminating duplicates. As a result, the number of solutions flowing in is larger than or equal to the number of solutions flowing out.

**Arguments**
+ `vars` – (*Required*) The variables to which to apply the `Distinct` projection.
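
For illustration (a hypothetical example, not drawn from Neptune output), a `SELECT DISTINCT` query would typically be planned with a `Distinct` operator whose `vars` argument lists the projected variables:

```sparql
# Duplicate (?s, ?o) bindings are eliminated by a Distinct operator
# with vars=[?s, ?o].
SELECT DISTINCT ?s ?o
WHERE { ?s ?p ?o }
```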

## `Federation` operator
<a name="sparql-explain-operator-federation"></a>

Passes a specified query to a specified remote SPARQL endpoint.

**Arguments**
+ `endpoint` – (*Required*) The endpoint URL in the SPARQL `SERVICE` statement. This can be a constant string, or if the query endpoint is determined based on a variable within the same query, it can be the variable name.
+ `query` – (*Required*) The reconstructed query string to be sent to the remote endpoint. The engine adds default prefixes to this query even when the client doesn't specify any.
+ `silent` – (*Required*) A Boolean that indicates whether the `SILENT` keyword appeared after the `SERVICE` keyword. `SILENT` tells the engine not to fail the whole query even if the remote `SERVICE` portion fails.
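For example, a query along the following lines (the endpoint URL and IRIs are hypothetical) produces a `Federation` operator with `silent` set to `true`:

```
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?friend WHERE {
    SERVICE SILENT <http://example.org/sparql> {
        <http://example.org/alice> foaf:knows ?friend .
    }
}
```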

## `Filter` operator
<a name="sparql-explain-operator-filter"></a>

Filters the incoming solutions. Only those solutions that satisfy the filter condition are forwarded to the upstream operator, and all others are dropped.

**Arguments**
+ `condition` – (*Required*) The filter condition.
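As a sketch (the vocabulary is hypothetical), the `FILTER` clause in a query like the following is what this operator evaluates:

```
PREFIX ex: <http://example.org/>

SELECT ?person ?age WHERE {
    ?person ex:age ?age .
    FILTER (?age >= 21)
}
```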

## `HashIndexBuild` operator
<a name="sparql-explain-operator-hash-index-build"></a>

Takes a list of bindings and spools them into a hash index whose name is defined by the `solutionSet` argument. Typically, subsequent operators perform joins against this solution set, referring to it by that name.

**Arguments**
+ `solutionSet` – (*Required*) The name of the hash index solution set.
+ `sourceType` – (*Required*) The type of the source from which the bindings to store in the hash index are obtained:
  + `pipeline` – Spools the incoming solutions from the downstream operator in the operator pipeline into the hash index.
  + `binding set` – Spools the fixed binding set specified by the `sourceBindingSet` argument into the hash index.
+ `sourceBindingSet` – (*Optional*) If the `sourceType` argument value is `binding set`, this argument specifies the static binding set to be spooled into the hash index.

## `HashIndexJoin` operator
<a name="sparql-explain-operator-hash-index-join"></a>

Joins the incoming solutions against the hash index solution set identified by the `solutionSet` argument.

**Arguments**
+ `solutionSet` – (*Required*) Name of the solution set to join against. This must be a hash index that has been constructed in a prior step using the `HashIndexBuild` operator.
+ `joinType` – (*Required*) The type of join to be performed:
  + `join` – A normal join, requiring an exact match between all shared variables.
  + `optional` – An `optional` join that uses the SPARQL `OPTIONAL` operator semantics.
  + `minus` – A `minus` operation that retains only those mappings for which no join partner exists, using the SPARQL `MINUS` operator semantics.
  + `existence check` – Checks whether there is a join partner or not, and binds the `existenceCheckResultVar` variable to the result of this check.
+ `constraints` – (*Optional*) Additional join constraints that are considered during the join. Joins that do not satisfy these constraints are discarded.
+ `existenceCheckResultVar` – (*Optional*) Only used for joins where `joinType` equals `existence check` (see the `joinType` argument earlier).
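As an illustrative sketch, a `MINUS` clause such as the following is the kind of construct that can be evaluated as a join with `joinType` set to `minus` (whether the plan uses `HashIndexJoin` or another join operator depends on the optimizer):

```
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person WHERE {
    ?person a foaf:Person .
    MINUS { ?person foaf:knows ?someone . }
}
```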

## `MergeJoin` operator
<a name="sparql-explain-operator-merge-join"></a>

A merge join over multiple solution sets, as identified by the `solutionSets` argument.

**Arguments**
+ `solutionSets` – (*Required*) The solution sets to join together.

## `NamedSubquery` operator
<a name="sparql-explain-operator-named-subquery"></a>

Triggers evaluation of the subquery identified by the `subQuery` argument and spools the result into the solution set specified by the `solutionSet` argument. The incoming solutions for the operator are forwarded to the subquery and then to the next operator.

**Arguments**
+ `subQuery` – (*Required*) Name of the subquery to evaluate. The subquery is rendered explicitly in the output.
+ `solutionSet` – (*Required*) The name of the solution set in which to store the subquery result.
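For example, a subquery like the one below (the vocabulary is hypothetical) can be evaluated separately, with its result spooled into a named solution set for the outer query to join against:

```
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?name WHERE {
    {
        SELECT ?person WHERE {
            ?person foaf:knows ?friend .
        }
        GROUP BY ?person
        HAVING (COUNT(?friend) > 10)
    }
    ?person foaf:name ?name .
}
```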

## `PipelineJoin` operator
<a name="sparql-explain-operator-pipeline-join"></a>

Receives as input the output of the previous operator and joins it against the tuple pattern defined by the `pattern` argument.

**Arguments**
+ `pattern` – (*Required*) The pattern that underlies the join, in the form of a subject-predicate-object tuple, optionally extended with a graph component. If `distinct` is specified for the pattern, the join extracts only distinct solutions over the projection variables specified by the `projectionVars` argument, rather than all matching solutions.
+ `inlineFilters` – (*Optional*) A set of filters to be applied to the variables in the pattern. The pattern is evaluated in conjunction with these filters.
+ `joinType` – (*Required*) The type of join to be performed:
  + `join` – A normal join, requiring an exact match between all shared variables.
  + `optional` – An `optional` join that uses the SPARQL `OPTIONAL` operator semantics.
  + `minus` – A `minus` operation that retains only those mappings for which no join partner exists, using the SPARQL `MINUS` operator semantics.
  + `existence check` – Checks whether there is a join partner or not, and binds the `existenceCheckResultVar` variable to the result of this check.
+ `constraints` – (*Optional*) Additional join constraints that are considered during the join. Joins that do not satisfy these constraints are discarded.
+ `projectionVars` – (*Optional*) The projection variables. Used in combination with `distinct := true` to enforce the extraction of distinct projections over a specified set of variables.
+ `cutoffLimit` – (*Optional*) A cutoff limit for the number of join partners extracted. Although there is no limit by default, you can set this to 1 when performing joins to implement `FILTER (NOT) EXISTS` clauses, where it is sufficient to prove or disprove that there is a join partner.
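As a sketch of the `cutoffLimit` case, a `FILTER NOT EXISTS` clause such as the following only needs to disprove the existence of a single join partner, so a cutoff limit of 1 is sufficient:

```
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person WHERE {
    ?person a foaf:Person .
    FILTER NOT EXISTS { ?person foaf:knows ?friend . }
}
```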

## `PipelineCountJoin` operator
<a name="sparql-explain-operator-pipeline-count-join"></a>

Variant of the `PipelineJoin`. Instead of joining, it just counts the matching join partners and binds the count to the variable specified by the `countVar` argument.

**Arguments**
+ `countVar` – (*Required*) The variable to which the count result, namely the number of join partners, should be bound.
+ `pattern` – (*Required*) The pattern that underlies the join, in the form of a subject-predicate-object tuple, optionally extended with a graph component. If `distinct` is specified for the pattern, the join extracts only distinct solutions over the projection variables specified by the `projectionVars` argument, rather than all matching solutions.
+ `inlineFilters` – (*Optional*) A set of filters to be applied to the variables in the pattern. The pattern is evaluated in conjunction with these filters.
+ `joinType` – (*Required*) The type of join to be performed:
  + `join` – A normal join, requiring an exact match between all shared variables.
  + `optional` – An `optional` join that uses the SPARQL `OPTIONAL` operator semantics.
  + `minus` – A `minus` operation that retains only those mappings for which no join partner exists, using the SPARQL `MINUS` operator semantics.
  + `existence check` – Checks whether there is a join partner or not, and binds the `existenceCheckResultVar` variable to the result of this check.
+ `constraints` – (*Optional*) Additional join constraints that are considered during the join. Joins that do not satisfy these constraints are discarded.
+ `projectionVars` – (*Optional*) The projection variables. Used in combination with `distinct := true` to enforce the extraction of distinct projections over a specified set of variables.
+ `cutoffLimit` – (*Optional*) A cutoff limit for the number of join partners extracted. Although there is no limit by default, you can set this to 1 when performing joins to implement `FILTER (NOT) EXISTS` clauses, where it is sufficient to prove or disprove that there is a join partner.

## `PipelinedHashIndexJoin` operator
<a name="sparql-explain-operator-pipeline-hash-index-join"></a>

This is an all-in-one build hash index and join operator. It takes a list of bindings, spools them into a hash index, and then joins the incoming solutions against the hash index.

**Arguments**
+ `sourceType` – (*Required*) The type of the source from which the bindings to store in the hash index are obtained:
  + `pipeline` – Causes `PipelinedHashIndexJoin` to spool the incoming solutions from the downstream operator in the operator pipeline into the hash index.
  + `binding set` – Causes `PipelinedHashIndexJoin` to spool the fixed binding set specified by the `sourceBindingSet` argument into the hash index.
+ `sourceSubQuery` – (*Optional*) If the `sourceType` argument value is `pipeline`, this argument specifies the subquery that is evaluated and spooled into the hash index.
+ `sourceBindingSet` – (*Optional*) If the `sourceType` argument value is `binding set`, this argument specifies the static binding set to be spooled into the hash index.
+ `joinType` – (*Required*) The type of join to be performed:
  + `join` – A normal join, requiring an exact match between all shared variables.
  + `optional` – An `optional` join that uses the SPARQL `OPTIONAL` operator semantics.
  + `minus` – A `minus` operation that retains only those mappings for which no join partner exists, using the SPARQL `MINUS` operator semantics.
  + `existence check` – Checks whether there is a join partner or not, and binds the `existenceCheckResultVar` variable to the result of this check.
+ `existenceCheckResultVar` – (*Optional*) Only used for joins where `joinType` equals `existence check` (see the `joinType` argument earlier).

## `Projection` operator
<a name="sparql-explain-operator-projection"></a>

Projects over a subset of the variables. The number of solutions flowing in equals the number of solutions flowing out, but the shape of the solution differs, depending on the mode setting.

**Modes**
+ `retain` – Retain in solutions only the variables that are specified by the `vars` argument.
+ `drop` – Drop all the variables that are specified by the `vars` argument.

**Arguments**
+ `vars` – (*Required*) The variables to retain or drop, depending on the mode setting.

## `PropertyPath` operator
<a name="sparql-explain-operator-property-path"></a>

Enables recursive property paths such as `+` or `*`. Neptune implements a fixed-point iteration approach based on a template specified by the `iterationTemplate` argument. Known left-side or right-side variables are bound in the template for every fixed-point iteration, until no more new solutions can be found.

**Arguments**
+ `iterationTemplate` – (*Required*) Name of the subquery template used to implement the fixed-point iteration.
+ `leftTerm` – (*Required*) The term (variable or constant) on the left side of the property path.
+ `rightTerm` – (*Required*) The term (variable or constant) on the right side of the property path.
+ `lowerBound` – (*Required*) The lower bound for fixed-point iteration (either `0` for `*` queries, or `1` for `+` queries).
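For example (the starting IRI is hypothetical), the `+` path in the following query maps to a `PropertyPath` operator with `lowerBound` set to `1`, whereas `foaf:knows*` would set it to `0`:

```
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?reachable WHERE {
    <http://example.org/alice> foaf:knows+ ?reachable .
}
```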

## `TermResolution` operator
<a name="sparql-explain-operator-term-resolution"></a>

Translates internal string identifier values back to their corresponding external strings, or translates external strings to internal string identifier values, depending on the mode.

**Modes**
+ `value2id` – Maps terms such as literals and URIs to corresponding internal ID values (encoding to internal values).
+ `id2value` – Maps internal ID values to the corresponding terms such as literals and URIs (decoding of internal values).

**Arguments**
+ `vars` – (*Required*) Specifies the variables whose strings or internal string IDs should be mapped.

## `Slice` operator
<a name="sparql-explain-operator-slice"></a>

Implements a slice over the incoming solution stream, using the semantics of SPARQL’s `LIMIT` and `OFFSET` clauses.

**Arguments**
+ `limit` – (*Optional*) A limit on the solutions to be forwarded.
+ `offset` – (*Optional*) The offset at which solutions are evaluated for forwarding.
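For example, the following query corresponds to a `Slice` with `offset` set to `20` and `limit` set to `10`:

```
SELECT ?s ?p ?o WHERE {
    ?s ?p ?o .
}
OFFSET 20
LIMIT 10
```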

## `SolutionInjection` operator
<a name="sparql-explain-operator-solution-injection"></a>

Receives no input. Statically injects solutions into the query plan and records them in the `solutions` argument.

Query plans always begin with this static injection. If static solutions to inject can be derived from the query itself by combining various sources of static bindings (for example, from `VALUES` or `BIND` clauses), then the `SolutionInjection` operator injects these derived static solutions. In the simplest case, these reflect bindings that are implied by an outer `VALUES` clause.

If no static solutions can be derived from the query, `SolutionInjection` injects the empty, so-called universal solution, which is expanded and multiplied throughout the query-evaluation process.

**Arguments**
+ `solutions` – (*Required*) The sequence of solutions injected by the operator.
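As a sketch, the `VALUES` clause in a query like the following (the IRIs are hypothetical) yields static solutions that `SolutionInjection` can inject at the start of the plan:

```
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?name WHERE {
    VALUES ?person { <http://example.org/alice> <http://example.org/bob> }
    ?person foaf:name ?name .
}
```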

## `Sort` operator
<a name="sparql-explain-operator-sort"></a>

Sorts the solution set using specified sort conditions.

**Arguments**
+ `sortOrder` – (*Required*) An ordered list of variables, each annotated with an `ASC` (ascending) or `DESC` (descending) identifier, used sequentially to sort the solution set.
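For example, the `ORDER BY` clause in the following sketch (hypothetical vocabulary) corresponds to a sort order of `DESC(?age), ASC(?name)`:

```
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?name ?age WHERE {
    ?person foaf:name ?name ;
            foaf:age ?age .
}
ORDER BY DESC(?age) ASC(?name)
```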

## `VariableAlignment` operator
<a name="sparql-explain-operator-variable-alignment"></a>

Inspects solutions one by one, performing alignment on each one over two variables: a specified `sourceVar` and a specified `targetVar`.

If `sourceVar` and `targetVar` in a solution have the same value, the variables are considered aligned and the solution is forwarded, with the redundant `sourceVar` projected out.

If the variables bind to different values, the solution is filtered out entirely.

**Arguments**
+ `sourceVar` – (*Required*) The source variable, to be compared to the target variable. If alignment succeeds in a solution, meaning that the two variables have the same value, the source variable is projected out.
+ `targetVar` – (*Required*) The target variable, with which the source variable is compared. Is retained even when alignment succeeds.

# Limitations of SPARQL `explain` in Neptune
<a name="sparql-explain-limitations"></a>

The Neptune SPARQL `explain` feature currently has the following limitations.

**Neptune Currently Supports Explain Only in SPARQL SELECT Queries**  
To inspect the evaluation process of other query forms, such as `ASK`, `CONSTRUCT`, `DESCRIBE`, and SPARQL UPDATE queries, transform them into a corresponding `SELECT` query. Then use `explain` on that `SELECT` query instead.

For example, to obtain `explain` information about an `ASK WHERE {...}` query, run the corresponding `SELECT WHERE {...} LIMIT 1` query with `explain`.

Similarly, for a `CONSTRUCT {...} WHERE {...}` query, drop the `CONSTRUCT {...}` part and run a `SELECT` query with `explain` on the second `WHERE {...}` clause. Evaluating the second `WHERE` clause generally reveals the main challenges of processing the `CONSTRUCT` query, because solutions flowing out of the second `WHERE` into the `CONSTRUCT` template generally only require straightforward substitution.

**Explain Operators May Change in Future Releases**  
The SPARQL `explain` operators and their parameters may change in future releases.

**Explain Output May Change in Future Releases**  
For example, column headers could change, and more columns might be added to the tables.

# SPARQL federated queries in Neptune using the `SERVICE` extension
<a name="sparql-service"></a>

Amazon Neptune fully supports the SPARQL federated query extension that uses the `SERVICE` keyword. (For more information, see [SPARQL 1.1 Federated Query](https://www.w3.org/TR/sparql11-federated-query/).)

The `SERVICE` keyword instructs the SPARQL query engine to execute a portion of the query against a remote SPARQL endpoint and compose the final query result. Only `READ` operations are possible. `WRITE` and `DELETE` operations are not supported. Neptune can only run federated queries against SPARQL endpoints that are accessible within its virtual private cloud (VPC). However, you can also use a reverse proxy in the VPC to make an external data source accessible within the VPC.

**Note**  
When SPARQL `SERVICE` is used to federate a query across two or more Neptune clusters in the same VPC, the security groups must be configured to allow those clusters to communicate with each other.

**Important**  
SPARQL 1.1 Federation makes service requests on your behalf when passing queries and parameters to external SPARQL endpoints. It is your responsibility to verify that the external SPARQL endpoints satisfy your application's data handling and security requirements.

## Example of a Neptune federated query
<a name="sparql-service-example-1"></a>

The following simple example shows how SPARQL federated queries work.

Suppose that a customer sends the following query to *Neptune-1* at `http://neptune-1:8182/sparql`.

```
SELECT * WHERE {
   ?person rdf:type foaf:Person .
   SERVICE <http://neptune-2:8182/sparql> {
       ?person foaf:knows ?friend .
    }
}
```

1. *Neptune-1* evaluates the first query pattern (*Q-1*) which is `?person rdf:type foaf:Person`, uses the results to resolve `?person` in *Q-2* (`?person foaf:knows ?friend`), and forwards the resulting pattern to *Neptune-2* at `http://neptune-2:8182/sparql`.

1. *Neptune-2* evaluates *Q-2* and sends the results back to *Neptune-1*.

1. *Neptune-1* joins the solutions for both patterns and sends the results back to the customer.

This flow is shown in the following diagram.

![\[Flow diagram showing SPARQL federated query patterns being evaluated and responses sent back to client.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/federated.png)


**Note**  
"By default, the optimizer determines at what point in query execution that the `SERVICE` instruction is executed. You can override this placement using the [joinOrder](sparql-query-hints-joinOrder.md) query hint.

## Access control for federated queries in Neptune
<a name="sparql-service-auth"></a>

Neptune uses AWS Identity and Access Management (IAM) for authentication and authorization. Access control for a federated query can involve more than one Neptune DB instance. These instances might have different requirements for access control. In certain circumstances, this can limit your ability to make a federated query.

Consider the simple example presented in the previous section. *Neptune-1* calls *Neptune-2* with the same credentials it was called with.
+ If *Neptune-1* requires IAM authentication and authorization, but *Neptune-2* does not, all you need is appropriate IAM permissions for *Neptune-1* to make the federated query.
+ If *Neptune-1* and *Neptune-2* both require IAM authentication and authorization, you need to attach IAM permissions for both databases to make the federated query. Both clusters must also be in the same AWS account and the same AWS Region; cross-Region and cross-account federated queries are not currently supported.
+ If *Neptune-1* is not IAM-enabled but *Neptune-2* is, however, you can't make a federated query at all. The reason is that *Neptune-1* can't retrieve your IAM credentials and pass them on to *Neptune-2* to authorize the second part of the query.