

# Gremlin inference queries in Neptune ML
<a name="machine-learning-gremlin-inference-queries"></a>

As described in [Neptune ML capabilities](machine-learning.md#machine-learning-capabilities), Neptune ML supports training models that can do the following kinds of inference tasks:
+ **Node classification**   –   Predicts the categorical feature of a vertex property.
+ **Node regression**   –   Predicts a numerical property of a vertex.
+ **Edge classification**   –   Predicts the categorical feature of an edge property.
+ **Edge regression**   –   Predicts a numerical property of an edge.
+ **Link prediction**   –   Predicts destination nodes given a source node and outgoing edge, or source nodes given a destination node and incoming edge.

We can illustrate these different tasks with examples that use the [MovieLens 100k dataset](https://grouplens.org/datasets/movielens/100k/) provided by [GroupLens Research](https://grouplens.org/datasets/movielens/). This dataset consists of movies, users, and ratings of the movies by the users, from which we've created a property graph like this: 

![\[Sample movie property graph using the MovieLens 100k dataset\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/movie_property_graph_example.png)


**Node classification**: In the dataset above, `Genre` is a vertex type which is connected to vertex type `Movie` by edge `included_in`. However, if we tweak the dataset to make `Genre` a [categorical](https://en.wikipedia.org/wiki/Categorical_variable) feature for vertex type `Movie`, then the problem of inferring `Genre` for new movies added to our knowledge graph can be solved using node classification models.

**Node regression**: If we consider the vertex type `Rating`, which has properties such as `timestamp` and `score`, then the problem of inferring the numerical value of `score` for a `Rating` vertex can be solved using node regression models.

**Edge classification**: Similarly, for a `Rated` edge, if we have a property `Scale` that can take one of the values `Love`, `Like`, `Dislike`, `Neutral`, or `Hate`, then the problem of inferring `Scale` for the `Rated` edges of new movies or ratings can be solved using edge classification models.

**Edge regression**: Similarly, for the same `Rated` edge, if we have a property `Score` that holds a numerical value for the rating, then `Score` can be inferred using edge regression models.

**Link prediction**: Problems such as finding the top ten users most likely to rate a given movie, or the top ten movies that a given user is most likely to rate, fall under link prediction.

**Note**  
For Neptune ML use cases, we provide a rich set of notebooks designed to give you a hands-on understanding of each use case. You can create these notebooks along with your Neptune cluster when you use the [Neptune ML CloudFormation template](machine-learning-quick-start.md) to create a Neptune ML cluster. The notebooks are also available on [GitHub](https://github.com/aws/graph-notebook/tree/main/src/graph_notebook/notebooks/04-Machine-Learning).

**Topics**
+ [Neptune ML predicates used in Gremlin inference queries](machine-learning-gremlin-inference-query-predicates.md)
+ [Gremlin node classification queries in Neptune ML](machine-learning-gremlin-vertex-classification-queries.md)
+ [Gremlin node regression queries in Neptune ML](machine-learning-gremlin-vertex-regression-queries.md)
+ [Gremlin edge classification queries in Neptune ML](machine-learning-gremlin-edge-classification-queries.md)
+ [Gremlin edge regression queries in Neptune ML](machine-learning-gremlin-edge-regression.md)
+ [Gremlin link prediction queries using link-prediction models in Neptune ML](machine-learning-gremlin-link-prediction-queries.md)
+ [List of exceptions for Neptune ML Gremlin inference queries](machine-learning-gremlin-exceptions.md)

# Neptune ML predicates used in Gremlin inference queries
<a name="machine-learning-gremlin-inference-query-predicates"></a>

## `Neptune#ml.deterministic`
<a name="machine-learning-gremlin-inference-neptune-ml-deterministic-predicate"></a>

This predicate is an option for inductive inference queries — that is, for queries that include the [`Neptune#ml.inductiveInference`](#machine-learning-gremlin-inference-neptune-ml-inductiveInference) predicate.

When using inductive inference, the Neptune engine creates the appropriate subgraph to evaluate the trained GNN model, and the requirements of this subgraph depend on parameters of the final model. Specifically, the `num-layer` parameter determines the number of traversal hops from the target nodes or edges, and the `fanouts` parameter specifies how many neighbors to sample at each hop (see [HPO parameters](machine-learning-customizing-hyperparams.md)).

By default, inductive inference queries run in non-deterministic mode, in which Neptune builds the neighborhood randomly. When making predictions, this random-neighbor sampling can result in different predictions from one invocation to the next.

When you include `Neptune#ml.deterministic` in an inductive inference query, the Neptune engine attempts to sample neighbors in a deterministic way so that multiple invocations of the same query return the same results every time. The results can't be guaranteed to be completely deterministic, however, because changes to the underlying graph and artifacts of distributed systems can still introduce fluctuations.

You include the `Neptune#ml.deterministic` predicate in a query like this:

```
.with("Neptune#ml.deterministic")
```

If the `Neptune#ml.deterministic` predicate is included in a query that doesn't also include `Neptune#ml.inductiveInference`, it is simply ignored.

## `Neptune#ml.disableInductiveInferenceMetadataCache`
<a name="machine-learning-gremlin-disableInductiveInferenceMetadataCache-predicate"></a>

This predicate is an option for inductive inference queries — that is, for queries that include the [`Neptune#ml.inductiveInference`](#machine-learning-gremlin-inference-neptune-ml-inductiveInference) predicate.

For inductive inference queries, Neptune uses a metadata file stored in Amazon S3 to decide the number of hops and the fanout while building the neighborhood. Neptune normally caches this model metadata to avoid fetching the file from Amazon S3 repeatedly. You can disable caching by including the `Neptune#ml.disableInductiveInferenceMetadataCache` predicate in the query. Although fetching the metadata directly from Amazon S3 is slower, disabling the cache is useful when the SageMaker AI endpoint has been updated after retraining or transformation and the cached metadata is stale.

You include the `Neptune#ml.disableInductiveInferenceMetadataCache` predicate in a query like this:

```
.with("Neptune#ml.disableInductiveInferenceMetadataCache")
```

Here is how a sample query might look in a Jupyter notebook:

```
%%gremlin
g.with("Neptune#ml.endpoint", "ep1")
 .with("Neptune#ml.iamRoleArn", "arn:aws:iam::123456789012:role/NeptuneMLRole")
 .with("Neptune#ml.disableInductiveInferenceMetadataCache")
 .V('101').properties("rating")
 .with("Neptune#ml.regression")
 .with("Neptune#ml.inductiveInference")
```

## `Neptune#ml.endpoint`
<a name="machine-learning-gremlin-inference-neptune-ml-endpoint-predicate"></a>

The `Neptune#ml.endpoint` predicate is used in a `with()` step to specify the inference endpoint, if necessary:

```
 .with("Neptune#ml.endpoint", "the model's SageMaker AI inference endpoint")
```

You can identify the endpoint either by its `id` or its URL. For example:

```
 .with( "Neptune#ml.endpoint", "node-classification-movie-lens-endpoint" )
```

Or:

```
 .with( "Neptune#ml.endpoint", "https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/node-classification-movie-lens-endpoint/invocations" )
```

**Note**  
If you [set the `neptune_ml_endpoint` parameter](machine-learning-cluster-setup.md#machine-learning-set-inference-endpoint-cluster-parameter) in your Neptune DB cluster parameter group to the endpoint `id` or URL, you don't need to include the `Neptune#ml.endpoint` predicate in each query.
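
For example, if the endpoint is stored in the parameter group, a node classification query can reduce to a sketch like this (the vertex IDs and property key here are illustrative):

```
g.V("movie_1", "movie_2", "movie_3")
 .properties("genre").with("Neptune#ml.classification")
```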

## `Neptune#ml.iamRoleArn`
<a name="machine-learning-gremlin-inference-neptune-ml-iamRoleArn-predicate"></a>

`Neptune#ml.iamRoleArn` is used in a `with()` step to specify the ARN of the SageMaker AI execution IAM role, if necessary:

```
 .with("Neptune#ml.iamRoleArn", "the ARN for the SageMaker AI execution IAM role")
```

For information about how to create the SageMaker AI execution IAM role, see [Create a custom NeptuneSageMakerIAMRole role](machine-learning-manual-setup.md#ml-manual-setup-sm-role).

**Note**  
If you [set the `neptune_ml_iam_role` parameter](machine-learning-cluster-setup.md#machine-learning-enabling-create-param-group) in your Neptune DB cluster parameter group to the ARN of your SageMaker AI execution IAM role, you don't need to include the `Neptune#ml.iamRoleArn` predicate in each query.

## `Neptune#ml.inductiveInference`
<a name="machine-learning-gremlin-inference-neptune-ml-inductiveInference"></a>

Transductive inference is enabled by default in Gremlin. To make a [real-time inductive inference](machine-learning-overview-evolving-data.md#inductive-vs-transductive-inference) query, include the `Neptune#ml.inductiveInference` predicate like this:

```
.with("Neptune#ml.inductiveInference")
```

If your graph is dynamic, inductive inference is often the best choice, but if your graph is static, transductive inference is faster and more efficient.

## `Neptune#ml.limit`
<a name="machine-learning-gremlin-inference-neptune-ml-limit-predicate"></a>

The `Neptune#ml.limit` predicate optionally limits the number of results returned per entity:

```
 .with( "Neptune#ml.limit", 2 )
```

By default, the limit is 1, and the maximum number that can be set is 100.
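
For instance, here is a sketch of a classification query that returns up to the top two predicted classes for each vertex (the endpoint, role, vertex IDs, and property key are illustrative):

```
g.with("Neptune#ml.endpoint", "node-classification-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn", "arn:aws:iam::0123456789:role/sagemaker-role")
 .with("Neptune#ml.limit", 2)
 .V("movie_1")
 .properties("genre").with("Neptune#ml.classification")
```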

## `Neptune#ml.threshold`
<a name="machine-learning-gremlin-inference-neptune-ml-threshold-predicate"></a>

The `Neptune#ml.threshold` predicate optionally establishes a cutoff threshold for result scores:

```
 .with( "Neptune#ml.threshold", 0.5D )
```

This lets you discard all results with scores below the specified threshold.
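
For example, here is a sketch of a query that keeps only predictions with a confidence score of at least 0.7 (the endpoint, role, vertex IDs, and property key are illustrative):

```
g.with("Neptune#ml.endpoint", "node-classification-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn", "arn:aws:iam::0123456789:role/sagemaker-role")
 .with("Neptune#ml.threshold", 0.7D)
 .V("movie_1", "movie_2")
 .properties("genre").with("Neptune#ml.classification")
```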

## `Neptune#ml.classification`
<a name="machine-learning-gremlin-inference-neptune-ml-classification-predicate"></a>

The `Neptune#ml.classification` predicate is attached to the `properties()` step to establish that the properties need to be fetched from the SageMaker AI endpoint of the node classification model:

```
 .properties( "property key of the node classification model" ).with( "Neptune#ml.classification" )
```

## `Neptune#ml.regression`
<a name="machine-learning-gremlin-inference-neptune-ml-regression-predicate"></a>

The `Neptune#ml.regression` predicate is attached to the `properties()` step to establish that the properties need to be fetched from the SageMaker AI endpoint of the node regression model:

```
 .properties( "property key of the node regression model" ).with( "Neptune#ml.regression" )
```

## `Neptune#ml.prediction`
<a name="machine-learning-gremlin-inference-neptune-ml-prediction-predicate"></a>

The `Neptune#ml.prediction` predicate is attached to `in()` and `out()` steps to establish that this is a link-prediction query:

```
 .in("edge label of the link prediction model").with("Neptune#ml.prediction").hasLabel("target node label")
```

## `Neptune#ml.score`
<a name="machine-learning-gremlin-inference-neptune-ml-score-predicate"></a>

The `Neptune#ml.score` predicate is used in Gremlin node or edge classification queries to fetch an ML confidence score. Pass it together with the property key in the `properties()` step to obtain the confidence score for a node or edge classification prediction.

You can find a node classification example with [other node classification examples](machine-learning-gremlin-vertex-classification-queries.md#machine-learning-gremlin-node-class-other-queries), and an edge classification example in the [edge classification section](machine-learning-gremlin-edge-classification-queries.md).

# Gremlin node classification queries in Neptune ML
<a name="machine-learning-gremlin-vertex-classification-queries"></a>

For Gremlin node classification in Neptune ML:
+ The model is trained on one property of the vertices. The set of unique values of this property is referred to as the set of node classes, or simply, classes.
+ The node class, or categorical property value, of a vertex can be inferred from the node classification model. This is useful when this property is not already attached to the vertex.
+ In order to fetch one or more classes from a node classification model, you need to use the `with()` step with the predicate `Neptune#ml.classification` to configure the `properties()` step. The output format is similar to what you would expect if those were vertex properties.

**Note**  
Node classification only works with string property values. That means that numerical property values such as `0` or `1` are not supported, although the string equivalents `"0"` and `"1"` are. Similarly, the Boolean property values `true` and `false` don't work, but `"true"` and `"false"` do.

Here is a sample node classification query:

```
g.with( "Neptune#ml.endpoint","node-classification-movie-lens-endpoint" )
 .with( "Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role" )
 .with( "Neptune#ml.limit", 2 )
 .with( "Neptune#ml.threshold", 0.5D )
 .V( "movie_1", "movie_2", "movie_3" )
 .properties("genre").with("Neptune#ml.classification")
```

The output of this query would look something like the following:

```
==>vp[genre->Action]
==>vp[genre->Crime]
==>vp[genre->Comedy]
```

In the query above, the `V()` and `properties()` steps are used as follows:

The `V()` step contains the set of vertices for which you want to fetch the classes from the node-classification model:

```
 .V( "movie_1", "movie_2", "movie_3" )
```

The `properties()` step contains the key on which the model was trained, and has `.with("Neptune#ml.classification")` to indicate that this is a node classification ML inference query.

Multiple property keys are not currently supported in a `properties().with("Neptune#ml.classification")` step. For example, the following query results in an exception:

```
g.with("Neptune#ml.endpoint", "node-classification-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .V( "movie_1", "movie_2", "movie_3" )
 .properties("genre", "other_label").with("Neptune#ml.classification")
```

For the specific error message, see the [list of Neptune ML exceptions](machine-learning-gremlin-exceptions.md).

A `properties().with("Neptune#ml.classification")` step can be used in combination with any of the following steps:
+ `value()`
+ `value().is()`
+ `hasValue()`
+ `has(value,"")`
+ `key()`
+ `key().is()`
+ `hasKey()`
+ `has(key,"")`
+ `path()`
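
For example, here is a sketch of combining a classification step with `path()` to trace each predicted class back to its vertex (the endpoint, role, vertex IDs, and property key are illustrative):

```
g.with("Neptune#ml.endpoint", "node-classification-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn", "arn:aws:iam::0123456789:role/sagemaker-role")
 .V("movie_1", "movie_2", "movie_3")
 .properties("genre").with("Neptune#ml.classification")
 .path()
```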

## Other node-classification queries
<a name="machine-learning-gremlin-node-class-other-queries"></a>

If both the inference endpoint and the corresponding IAM role have been saved in your DB cluster parameter group, a node-classification query can be as simple as this:

```
g.V("movie_1", "movie_2", "movie_3").properties("genre").with("Neptune#ml.classification")
```

You can mix vertex properties and classes in a query using the `union()` step:

```
g.with("Neptune#ml.endpoint","node-classification-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .V( "movie_1", "movie_2", "movie_3" )
 .union(
   properties("genre").with("Neptune#ml.classification"),
   properties("genre")
 )
```

You can also make an unbounded query such as this:

```
g.with("Neptune#ml.endpoint","node-classification-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .V()
 .properties("genre").with("Neptune#ml.classification")
```

You can retrieve the node classes together with vertices using the `select()` step together with the `as()` step:

```
g.with("Neptune#ml.endpoint","node-classification-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .V( "movie_1", "movie_2", "movie_3" ).as("vertex")
 .properties("genre").with("Neptune#ml.classification").as("properties")
 .select("vertex","properties")
```

You can also filter on node classes, as illustrated in these examples:

```
g.with("Neptune#ml.endpoint", "node-classification-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .V( "movie_1", "movie_2", "movie_3" )
 .properties("genre").with("Neptune#ml.classification")
 .has(value, "Horror")

g.with("Neptune#ml.endpoint","node-classification-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .V( "movie_1", "movie_2", "movie_3" )
 .properties("genre").with("Neptune#ml.classification")
 .has(value, P.eq("Action"))

g.with("Neptune#ml.endpoint","node-classification-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .V( "movie_1", "movie_2", "movie_3" )
 .properties("genre").with("Neptune#ml.classification")
 .has(value, P.within("Action", "Horror"))
```

You can get a node classification confidence score using the `Neptune#ml.score` predicate:

```
g.with("Neptune#ml.endpoint","node-classification-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .V( "movie_1", "movie_2", "movie_3" )
 .properties("genre", "Neptune#ml.score").with("Neptune#ml.classification")
```

The response would look like this:

```
==>vp[genre->Action]
==>vp[Neptune#ml.score->0.01234567]
==>vp[genre->Crime]
==>vp[Neptune#ml.score->0.543210]
==>vp[genre->Comedy]
==>vp[Neptune#ml.score->0.10101]
```

## Using inductive inference in a node classification query
<a name="machine-learning-gremlin-node-class-inductive"></a>

Suppose you add a new node to an existing graph in a Jupyter notebook, like this:

```
%%gremlin
g.addV('label1').property(id,'101').as('newV')
 .V('1').as('oldV1')
 .V('2').as('oldV2')
 .addE('eLabel1').from('newV').to('oldV1')
 .addE('eLabel2').from('oldV2').to('newV')
```

You could then use an inductive inference query to get a genre and confidence score that reflected the new node:

```
%%gremlin
g.with("Neptune#ml.endpoint", "nc-ep")
 .with("Neptune#ml.iamRoleArn", "arn:aws:iam::123456789012:role/NeptuneMLRole")
 .V('101').properties("genre", "Neptune#ml.score")
 .with("Neptune#ml.classification")
 .with("Neptune#ml.inductiveInference")
```

If you ran the query several times, however, you might get somewhat different results:

```
# First time
==>vp[genre->Action]
==>vp[Neptune#ml.score->0.12345678]

# Second time
==>vp[genre->Action]
==>vp[Neptune#ml.score->0.21365921]
```

You could make the same query deterministic:

```
%%gremlin
g.with("Neptune#ml.endpoint", "nc-ep")
 .with("Neptune#ml.iamRoleArn", "arn:aws:iam::123456789012:role/NeptuneMLRole")
 .V('101').properties("genre", "Neptune#ml.score")
 .with("Neptune#ml.classification")
 .with("Neptune#ml.inductiveInference")
 .with("Neptune#ml.deterministic")
```

In that case, the results would be roughly the same every time:

```
# First time
==>vp[genre->Action]
==>vp[Neptune#ml.score->0.12345678]
# Second time
==>vp[genre->Action]
==>vp[Neptune#ml.score->0.12345678]
```

# Gremlin node regression queries in Neptune ML
<a name="machine-learning-gremlin-vertex-regression-queries"></a>

Node regression is similar to node classification, except that the value inferred from the regression model for each node is numeric. You can use the same Gremlin queries for node regression as for node classification except for the following differences:
+ Again, in Neptune ML, nodes refer to vertices.
+ The `properties()` step takes the form `properties().with("Neptune#ml.regression")` instead of `properties().with("Neptune#ml.classification")`.
+ The `"Neptune#ml.limit"` and `"Neptune#ml.threshold"` predicates are not applicable.
+ When you filter on the value, you have to specify a numeric value.

Here is a sample node regression query:

```
g.with("Neptune#ml.endpoint","node-regression-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn", "arn:aws:iam::0123456789:role/sagemaker-role")
 .V("movie_1","movie_2","movie_3")
 .properties("revenue").with("Neptune#ml.regression")
```

You can filter on the value inferred using a regression model, as illustrated in the following examples:

```
g.with("Neptune#ml.endpoint","node-regression-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .V("movie_1","movie_2","movie_3")
 .properties("revenue").with("Neptune#ml.regression")
 .value().is(P.gte(1600000))

g.with("Neptune#ml.endpoint","node-regression-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .V("movie_1","movie_2","movie_3")
 .properties("revenue").with("Neptune#ml.regression")
 .hasValue(P.lte(1600000D))
```

## Using inductive inference in a node regression query
<a name="machine-learning-gremlin-node-regress-inductive"></a>

Suppose you add a new node to an existing graph in a Jupyter notebook, like this:

```
%%gremlin
g.addV('label1').property(id,'101').as('newV')
 .V('1').as('oldV1')
 .V('2').as('oldV2')
 .addE('eLabel1').from('newV').to('oldV1')
 .addE('eLabel2').from('oldV2').to('newV')
```

You could then use an inductive inference query to get a rating that took into account the new node:

```
%%gremlin
g.with("Neptune#ml.endpoint", "nr-ep")
 .with("Neptune#ml.iamRoleArn", "arn:aws:iam::123456789012:role/NeptuneMLRole")
 .V('101').properties("rating")
 .with("Neptune#ml.regression")
 .with("Neptune#ml.inductiveInference")
```

Because the query is not deterministic, it might return somewhat different results if you run it several times, based on the neighborhood:

```
# First time
==>vp[rating->9.1]

# Second time
==>vp[rating->8.9]
```

If you need more consistent results, you could make the query deterministic:

```
%%gremlin
g.with("Neptune#ml.endpoint", "nc-ep")
 .with("Neptune#ml.iamRoleArn", "arn:aws:iam::123456789012:role/NeptuneMLRole")
 .V('101').properties("rating")
 .with("Neptune#ml.regression")
 .with("Neptune#ml.inductiveInference")
 .with("Neptune#ml.deterministic")
```

Now the results will be roughly the same every time:

```
# First time
==>vp[rating->9.1]

# Second time
==>vp[rating->9.1]
```

# Gremlin edge classification queries in Neptune ML
<a name="machine-learning-gremlin-edge-classification-queries"></a>

For Gremlin edge classification in Neptune ML:
+ The model is trained on one property of the edges. The set of unique values of this property is referred to as a set of classes.
+ The class or categorical property value of an edge can be inferred from the edge classification model, which is useful when this property is not already attached to the edge.
+ In order to fetch one or more classes from an edge classification model, you need to use the `with()` step with the `Neptune#ml.classification` predicate to configure the `properties()` step. The output format is similar to what you would expect if those were edge properties.

**Note**  
Edge classification only works with string property values. That means that numerical property values such as `0` or `1` are not supported, although the string equivalents `"0"` and `"1"` are. Similarly, the Boolean property values `true` and `false` don't work, but `"true"` and `"false"` do.

Here is an example of an edge classification query that requests a confidence score using the `Neptune#ml.score` predicate:

```
g.with("Neptune#ml.endpoint","edge-classification-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .E("relationship_1","relationship_2","relationship_3")
 .properties("knows_by", "Neptune#ml.score").with("Neptune#ml.classification")
```

The response would look like this:

```
==>p[knows_by->"Family"]
==>p[Neptune#ml.score->0.01234567]
==>p[knows_by->"Friends"]
==>p[Neptune#ml.score->0.543210]
==>p[knows_by->"Colleagues"]
==>p[Neptune#ml.score->0.10101]
```

## Syntax of a Gremlin edge classification query
<a name="machine-learning-gremlin-edge-classification-syntax"></a>

For a simple graph where `User` is the head and tail node, and `Relationship` is the edge that connects them, an example edge classification query is:

```
g.with("Neptune#ml.endpoint","edge-classification-social-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .E("relationship_1","relationship_2","relationship_3")
 .properties("knows_by").with("Neptune#ml.classification")
```

The output of this query would look something like the following:

```
==>p[knows_by->"Family"]
==>p[knows_by->"Friends"]
==>p[knows_by->"Colleagues"]
```

In the query above, the `E()` and `properties()` steps are used as follows:
+ The `E()` step contains the set of edges for which you want to fetch the classes from the edge-classification model:

  ```
  .E("relationship_1","relationship_2","relationship_3")
  ```
+ The `properties()` step contains the key on which the model was trained, and has `.with("Neptune#ml.classification")` to indicate that this is an edge classification ML inference query.

Multiple property keys are not currently supported in a `properties().with("Neptune#ml.classification")` step. For example, the following query results in an exception being thrown:

```
g.with("Neptune#ml.endpoint","edge-classification-social-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .E("relationship_1","relationship_2","relationship_3")
 .properties("knows_by", "other_label").with("Neptune#ml.classification")
```

For specific error messages, see [List of exceptions for Neptune ML Gremlin inference queries](machine-learning-gremlin-exceptions.md).

A `properties().with("Neptune#ml.classification")` step can be used in combination with any of the following steps:
+ `value()`
+ `value().is()`
+ `hasValue()`
+ `has(value,"")`
+ `key()`
+ `key().is()`
+ `hasKey()`
+ `has(key,"")`
+ `path()`
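
For instance, here is a sketch of filtering predicted edge classes with `hasValue()` (the endpoint, role, edge IDs, and property key are illustrative):

```
g.with("Neptune#ml.endpoint", "edge-classification-social-endpoint")
 .with("Neptune#ml.iamRoleArn", "arn:aws:iam::0123456789:role/sagemaker-role")
 .E("relationship_1", "relationship_2", "relationship_3")
 .properties("knows_by").with("Neptune#ml.classification")
 .hasValue("Family")
```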

## Using inductive inference in an edge classification query
<a name="machine-learning-gremlin-edge-class-inductive"></a>

Suppose you add a new edge to an existing graph in a Jupyter notebook, like this:

```
%%gremlin
g.V('1').as('fromV')
.V('2').as('toV')
.addE('eLabel1').from('fromV').to('toV').property(id, 'e101')
```

You could then use an inductive inference query to get a scale that took into account the new edge:

```
%%gremlin
g.with("Neptune#ml.endpoint", "ec-ep")
 .with("Neptune#ml.iamRoleArn", "arn:aws:iam::123456789012:role/NeptuneMLRole")
 .E('e101').properties("scale", "Neptune#ml.score")
 .with("Neptune#ml.classification")
 .with("Neptune#ml.inductiveInference")
```

Because the query is not deterministic, the results would vary somewhat if you run it multiple times, based on the random neighborhood:

```
# First time
==>vp[scale->Like]
==>vp[Neptune#ml.score->0.12345678]

# Second time
==>vp[scale->Like]
==>vp[Neptune#ml.score->0.21365921]
```

If you need more consistent results, you could make the query deterministic:

```
%%gremlin
g.with("Neptune#ml.endpoint", "ec-ep")
 .with("Neptune#ml.iamRoleArn", "arn:aws:iam::123456789012:role/NeptuneMLRole")
 .E('e101').properties("scale", "Neptune#ml.score")
 .with("Neptune#ml.classification")
 .with("Neptune#ml.inductiveInference")
 .with("Neptune#ml.deterministic")
```

Now the results will be more or less the same every time you run the query:

```
# First time
==>vp[scale->Like]
==>vp[Neptune#ml.score->0.12345678]

# Second time
==>vp[scale->Like]
==>vp[Neptune#ml.score->0.12345678]
```

# Gremlin edge regression queries in Neptune ML
<a name="machine-learning-gremlin-edge-regression"></a>

Edge regression is similar to edge classification, except that the value inferred from the ML model is numeric. For edge regression, Neptune ML supports the same queries as for classification.

Key points to note are:
+ You need to use the ML predicate `"Neptune#ml.regression"` to configure the `properties()` step for this use-case.
+ The `"Neptune#ml.limit"` and `"Neptune#ml.threshold"` predicates are not applicable in this use-case.
+ For filtering on the value, you need to specify the value as numerical.

## Syntax of a Gremlin edge regression query
<a name="machine-learning-gremlin-edge-regression-syntax"></a>

For a simple graph where `User` is the head node, `Movie` is the tail node, and `Rated` is the edge that connects them, here is an example edge regression query that finds the numeric rating value, referred to as score here, for the edge `Rated`:

```
g.with("Neptune#ml.endpoint","edge-regression-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .E("rating_1","rating_2","rating_3")
 .properties("score").with("Neptune#ml.regression")
```

You can also filter on a value inferred from the ML regression model. For the existing `Rated` edges (from `User` to `Movie`) identified by `"rating_1"`, `"rating_2"`, and `"rating_3"`, where the `score` property is not present, you can use a query like the following to return only the edges whose inferred `score` is greater than or equal to 9:

```
g.with("Neptune#ml.endpoint","edge-regression-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .E("rating_1","rating_2","rating_3")
 .properties("score").with("Neptune#ml.regression")
 .value().is(P.gte(9))
```

## Using inductive inference in an edge regression query
<a name="machine-learning-gremlin-edge-regression-inductive"></a>

Suppose you add a new edge to an existing graph in a Jupyter notebook, like this:

```
%%gremlin
g.V('1').as('fromV')
.V('2').as('toV')
.addE('eLabel1').from('fromV').to('toV').property(id, 'e101')
```

You could then use an inductive inference query to get a score that took into account the new edge:

```
%%gremlin
g.with("Neptune#ml.endpoint", "er-ep")
 .with("Neptune#ml.iamRoleArn", "arn:aws:iam::123456789012:role/NeptuneMLRole")
 .E('e101').properties("score")
 .with("Neptune#ml.regression")
 .with("Neptune#ml.inductiveInference")
```

Because the query is not deterministic, the results would vary somewhat if you run it multiple times, based on the random neighborhood:

```
# First time
==>ep[score->96]

# Second time
==>ep[score->91]
```

If you need more consistent results, you could make the query deterministic:

```
%%gremlin
g.with("Neptune#ml.endpoint", "er-ep")
 .with("Neptune#ml.iamRoleArn", "arn:aws:iam::123456789012:role/NeptuneMLRole")
 .E('e101').properties("score")
 .with("Neptune#ml.regression")
 .with("Neptune#ml.inductiveInference")
 .with("Neptune#ml.deterministic")
```

Now the results will be more or less the same every time you run the query:

```
# First time
==>ep[score->96]

# Second time
==>ep[score->96]
```

# Gremlin link prediction queries using link-prediction models in Neptune ML
<a name="machine-learning-gremlin-link-prediction-queries"></a>

Link-prediction models can solve problems such as the following:
+ **Head-node prediction**: Given a vertex and an edge label, which source vertices are likely to link to that vertex through such an edge?
+ **Tail-node prediction**: Given a vertex and an edge label, which destination vertices is that vertex likely to link to through such an edge?

**Note**  
Edge prediction is not yet supported in Neptune ML.

For the examples below, consider a simple graph with the vertices `User` and `Movie` that are linked by the edge `Rated`.

Here is a sample head-node prediction query, used to predict the top five users most likely to rate the movies, `"movie_1"`, `"movie_2"`, and `"movie_3"`:

```
g.with("Neptune#ml.endpoint","node-prediction-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .with("Neptune#ml.limit", 5)
 .V("movie_1", "movie_2", "movie_3")
 .in("rated").with("Neptune#ml.prediction").hasLabel("user")
```

Here is a similar one for tail-node prediction, used to predict the top five movies that user `"user_1"` is likely to rate:

```
g.with("Neptune#ml.endpoint","node-prediction-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .V("user_1")
 .out("rated").with("Neptune#ml.prediction").hasLabel("movie")
```

Both the edge label and the predicted vertex label are required. If either is omitted, an exception is thrown. For example, the following query without a predicted vertex label throws an exception:

```
g.with("Neptune#ml.endpoint","node-prediction-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .V("user_1")
 .out("rated").with("Neptune#ml.prediction")
```

Similarly, the following query without an edge label throws an exception:

```
g.with("Neptune#ml.endpoint","node-prediction-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .V("user_1")
 .out().with("Neptune#ml.prediction").hasLabel("movie")
```

For the specific error messages that these exceptions return, see the [list of Neptune ML exceptions](machine-learning-gremlin-exceptions.md).

## Other link-prediction queries
<a name="machine-learning-gremlin-other-link-prediction-queries"></a>

You can use the `select()` step with the `as()` step to output the predicted vertices together with the input vertices:

```
g.with("Neptune#ml.endpoint","node-prediction-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .V("movie_1").as("source")
 .in("rated").with("Neptune#ml.prediction").hasLabel("user").as("target")
 .select("source","target")

g.with("Neptune#ml.endpoint","node-prediction-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .V("user_1").as("source")
 .out("rated").with("Neptune#ml.prediction").hasLabel("movie").as("target")
 .select("source","target")
```

You can make unbounded queries, like these:

```
g.with("Neptune#ml.endpoint","node-prediction-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .V("user_1")
 .out("rated").with("Neptune#ml.prediction").hasLabel("movie")

g.with("Neptune#ml.endpoint","node-prediction-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .V("movie_1")
 .in("rated").with("Neptune#ml.prediction").hasLabel("user")
```
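Unbounded queries like these can return a large number of predicted vertices. Just as in the head-node example earlier, you can cap the number of predictions returned by adding the `Neptune#ml.limit` predicate, here limiting the result to the top ten movies:

```
g.with("Neptune#ml.endpoint","node-prediction-movie-lens-endpoint")
 .with("Neptune#ml.iamRoleArn","arn:aws:iam::0123456789:role/sagemaker-role")
 .with("Neptune#ml.limit", 10)
 .V("user_1")
 .out("rated").with("Neptune#ml.prediction").hasLabel("movie")
```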

## Using inductive inference in a link prediction query
<a name="machine-learning-gremlin-link-predict-inductive"></a>

Suppose you were to add new nodes to an existing graph in a Jupyter notebook, like this:

```
%%gremlin
g.addV('label1').property(id,'101').as('newV1')
 .addV('label2').property(id,'102').as('newV2')
 .V('1').as('oldV1')
 .V('2').as('oldV2')
 .addE('eLabel1').from('newV1').to('oldV1')
 .addE('eLabel2').from('oldV2').to('newV2')
```

You could then use an inductive inference query to predict the tail node, taking the new nodes into account:

```
%%gremlin
g.with("Neptune#ml.endpoint", "lp-ep")
 .with("Neptune#ml.iamRoleArn", "arn:aws:iam::123456789012:role/NeptuneMLRole")
 .V('101').out("eLabel1")
 .with("Neptune#ml.prediction")
 .with("Neptune#ml.inductiveInference")
 .hasLabel("label2")
```

Result:

```
==>V[2]
```

Similarly, you could use an inductive inference query to predict the head node, taking the new nodes into account:

```
%%gremlin
g.with("Neptune#ml.endpoint", "lp-ep")
 .with("Neptune#ml.iamRoleArn", "arn:aws:iam::123456789012:role/NeptuneMLRole")
 .V('102').in("eLabel2")
 .with("Neptune#ml.prediction")
 .with("Neptune#ml.inductiveInference")
 .hasLabel("label1")
```

Result:

```
==>V[1]
```

# List of exceptions for Neptune ML Gremlin inference queries
<a name="machine-learning-gremlin-exceptions"></a>

This is a comprehensive list of exceptions that can occur when executing Neptune ML Gremlin inference queries. These exceptions cover a range of issues, from problems with the specified IAM role or endpoint to unsupported Gremlin steps and limits on the number of ML inference calls per Gremlin query. Each entry includes the detailed message that the exception returns.
+ **`BadRequestException`**   –   The credentials for the supplied role cannot be loaded.

  *Message*: `Unable to load credentials for role: the specified IAM Role ARN.`
+ **`BadRequestException`**   –   The specified IAM role is not authorized to invoke the SageMaker AI endpoint.

  *Message*: `User: the specified IAM Role ARN is not authorized to perform: sagemaker:InvokeEndpoint on resource: the specified endpoint.`
+ **`BadRequestException`**   –   The specified endpoint does not exist.

  *Message*: `Endpoint the specified endpoint not found.`
+ **`InternalFailureException`**   –   Unable to fetch Neptune ML real-time inductive inference metadata from Amazon S3.

  *Message*: `Unable to fetch Neptune ML - Real-Time Inductive Inference metadata from S3. Check the permissions of the S3 bucket or if the Neptune instance can connect to S3.`
+ **`InternalFailureException`**   –   Neptune ML cannot find the metadata file for real-time inductive inference in Amazon S3.

  *Message*: `Neptune ML cannot find the metadata file for Real-Time Inductive Inference in S3.`
+ **`InvalidParameterException`**   –   The specified endpoint is not syntactically valid.

  *Message*: `Invalid endpoint provided for external service query.`
+ **`InvalidParameterException`**   –   The specified SageMaker execution IAM Role ARN is not syntactically valid.

  *Message*: `Invalid IAM role ARN provided for external service query.`
+ **`InvalidParameterException`**   –   Multiple property keys are specified in the `properties()` step in a query.

  *Message*: `ML inference queries are currently supported for one property key.`
+ **`InvalidParameterException`**   –   Multiple edge labels are specified in a query.

  *Message*: `ML inference are currently supported only with one edge label.`
+ **`InvalidParameterException`**   –   Multiple vertex label constraints are specified in a query.

  *Message*: `ML inference are currently supported only with one vertex label constraint.`
+ **`InvalidParameterException`**   –   Both `Neptune#ml.classification` and `Neptune#ml.regression` predicates are present in the same query.

  *Message*: `Both regression and classification ML predicates cannot be specified in the query.`
+ **`InvalidParameterException`**   –   More than one edge label was specified in the `in()` or `out()` step in a link-prediction query.

  *Message*: `ML inference are currently supported only with one edge label.`
+ **`InvalidParameterException`**   –   More than one property key was specified with `Neptune#ml.score`.

  *Message*: `Neptune ML inference queries are currently supported for one property key and one Neptune#ml.score property key.`
+ **`MissingParameterException`**   –   The endpoint was not specified in the query or as a DB cluster parameter.

  *Message*: `No endpoint provided for external service query.`
+ **`MissingParameterException`**   –   The SageMaker AI execution IAM role was not specified in the query or as a DB cluster parameter.

  *Message*: `No IAM role ARN provided for external service query.`
+ **`MissingParameterException`**   –   The property key is missing from the `properties()` step in a query.

  *Message*: `Property key needs to be specified using properties() step for ML inference queries.`
+ **`MissingParameterException`**   –   No edge label was specified in the `in()` or `out()` step of a link-prediction query.

  *Message*: `Edge label needs to be specified while using in() or out() step for ML inference queries.`
+ **`MissingParameterException`**   –   No property key was specified with `Neptune#ml.score`.

  *Message*: `Property key needs to be specified along with Neptune#ml.score property key while using the properties() step for Neptune ML inference queries.`
+ **`UnsupportedOperationException`**   –   The `both()` step is used in a link-prediction query.

  *Message*: `ML inference queries are currently not supported with both() step.`
+ **`UnsupportedOperationException`**   –   No predicted vertex label was specified in the `has()` step with the `in()` or `out()` step in a link-prediction query.

  *Message*: `Predicted vertex label needs to be specified using has() step for ML inference queries.`
+ **`UnsupportedOperationException`**   –   Gremlin ML inductive inference queries are not currently supported with unoptimized steps.

  *Message*: `Neptune ML - Real-Time Inductive Inference queries are currently not supported with Gremlin steps which are not optimized for Neptune. Check the Neptune User Guide for a list of Neptune-optimized steps.`
+ **`UnsupportedOperationException`**   –   Neptune ML inference queries are not currently supported inside a `repeat` step.

  *Message*: `Neptune ML inference queries are currently not supported inside a repeat step.`
+ **`UnsupportedOperationException`**   –   No more than one Neptune ML inference query is currently supported per Gremlin query.

  *Message*: `Neptune ML inference queries are currently supported only with one ML inference query per gremlin query.`
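When a query fails with one of these exceptions, the Neptune HTTP endpoint reports it as a JSON error response containing `code` and `detailedMessage` fields. The sketch below shows one way a client might branch on those fields; the payload itself is a hypothetical example, not output captured from a real cluster:

```python
import json

# Hypothetical example of the JSON error body Neptune returns when the
# SageMaker endpoint parameter is missing from an ML inference query.
payload = '''{
    "requestId": "0c4c5f3a-0000-0000-0000-000000000000",
    "code": "MissingParameterException",
    "detailedMessage": "No endpoint provided for external service query."
}'''

error = json.loads(payload)

# Branch on the exception type to decide how to handle the failure.
if error["code"] == "MissingParameterException":
    print("Fix the query parameters:", error["detailedMessage"])
```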