


# Querying Lineage Entities
<a name="querying-lineage-entities"></a>

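Amazon SageMaker AI automatically generates graphs of lineage entities as you use them. You can query this data to answer a variety of questions. The following sections describe how to query this data with the SDK for Python.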

For information about how to view the lineage of a registered model in Amazon SageMaker Studio, see [View model lineage details in Studio](model-registry-lineage-view-studio.md).

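You can query your lineage entities to do the following:
+ Retrieve all datasets that went into the creation of a model.
+ Retrieve all jobs that went into the creation of an endpoint.
+ Retrieve all models that use a dataset.
+ Retrieve all endpoints that use a model.
+ Retrieve which endpoints are derived from a certain dataset.
+ Retrieve the pipeline execution that created a training job.
+ Retrieve the relationships between entities for investigation, governance, and reproducibility.
+ Retrieve all downstream trials that use the artifact.
+ Retrieve all upstream trials that use the artifact.
+ Retrieve a list of artifacts that use the provided S3 URI.
+ Retrieve upstream artifacts that use the dataset artifact.
+ Retrieve downstream artifacts that use the dataset artifact.
+ Retrieve datasets that use the image artifact.
+ Retrieve actions that use the context.
+ Retrieve processing jobs that use the endpoint.
+ Retrieve transform jobs that use the endpoint.
+ Retrieve trial components that use the endpoint.
+ Retrieve the ARN of the pipeline execution associated with the model package group.
+ Retrieve all artifacts that use the action.
+ Retrieve all upstream datasets that use the model package approval action.
+ Retrieve the model package from the model package approval action.
+ Retrieve downstream endpoint contexts that use the endpoint.
+ Retrieve the ARN of the pipeline execution associated with the trial component.
+ Retrieve datasets that use the trial component.
+ Retrieve models that use the trial component.
+ Explore your lineage for visualization.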

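**Limitations**
+ Lineage querying is not available in the following Regions:
  + Africa (Cape Town) - af-south-1
  + Asia Pacific (Jakarta) - ap-southeast-3
  + Asia Pacific (Osaka) - ap-northeast-3
  + Europe (Milan) - eu-south-1
  + Europe (Spain) - eu-south-2
  + Israel (Tel Aviv) - il-central-1
+ The maximum depth of relationships to discover is currently limited to 10.
+ Filtering is limited to the following properties: last modified date, created date, type, and lineage entity type.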
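
As a rough illustration of how these limits map onto a request, the following sketch builds keyword arguments for the boto3 `query_lineage` call and rejects a depth above 10. The helper name `build_lineage_request` and the example ARN are hypothetical and not part of any SDK. The parameter names (`StartArns`, `Direction`, `IncludeEdges`, `MaxDepth`, `Filters`) follow the `QueryLineage` API. The function only assembles a dictionary and makes no AWS call.

```python
from datetime import datetime, timezone

MAX_LINEAGE_DEPTH = 10  # current service limit listed above
VALID_DIRECTIONS = ("Both", "Ascendants", "Descendants")


def build_lineage_request(start_arns, direction="Both", max_depth=MAX_LINEAGE_DEPTH,
                          types=None, lineage_types=None,
                          created_after=None, modified_after=None):
    """Hypothetical helper: build kwargs for a boto3 `query_lineage` call."""
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"direction must be one of {VALID_DIRECTIONS}")
    if max_depth > MAX_LINEAGE_DEPTH:
        raise ValueError(f"max_depth cannot exceed {MAX_LINEAGE_DEPTH}")

    # Only the supported filter properties are exposed: type, lineage entity
    # type, created date, and last modified date.
    filters = {}
    if types:
        filters["Types"] = list(types)
    if lineage_types:
        filters["LineageTypes"] = list(lineage_types)
    if created_after:
        filters["CreatedAfter"] = created_after
    if modified_after:
        filters["ModifiedAfter"] = modified_after

    request = {
        "StartArns": list(start_arns),
        "Direction": direction,
        "IncludeEdges": True,
        "MaxDepth": max_depth,
    }
    if filters:
        request["Filters"] = filters
    return request


request = build_lineage_request(
    ["arn:aws:sagemaker:us-east-1:111122223333:context/example-endpoint"],  # hypothetical ARN
    direction="Ascendants",
    lineage_types=["Artifact"],
    created_after=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
# With a boto3 SageMaker client named `sm_client` (assumed to exist):
# response = sm_client.query_lineage(**request)
```

Validating the depth locally is a design convenience in this sketch; the service enforces its own limit regardless.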

**Topics**
+ [Getting Started with Querying Lineage Entities](#querying-lineage-entities-getting-started)

## Getting Started with Querying Lineage Entities
<a name="querying-lineage-entities-getting-started"></a>

The easiest way to get started is through either of the following:
+ The [Amazon SageMaker AI SDK for Python](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/lineage/artifact.py#L397), which defines many common use cases.
+ The notebook [sagemaker-lineage-multihop-queries.ipynb](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-lineage/sagemaker-lineage-multihop-queries.ipynb), which demonstrates how to use the SageMaker AI Lineage APIs to query relationships across the lineage graph.

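The following examples show how to use the `LineageQuery` and `LineageFilter` APIs to construct queries that answer questions about the lineage graph and extract entity relationships for a few use cases.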

**Example Using the `LineageQuery` API to find entity associations**  

```
import pprint

import sagemaker
from sagemaker.lineage.context import Context, EndpointContext
from sagemaker.lineage.action import Action
from sagemaker.lineage.association import Association
from sagemaker.lineage.artifact import Artifact, ModelArtifact, DatasetArtifact

from sagemaker.lineage.query import (
    LineageQuery,
    LineageFilter,
    LineageSourceEnum,
    LineageEntityEnum,
    LineageQueryDirectionEnum,
)

# Pretty-printer and SageMaker session used by the examples that follow.
pp = pprint.PrettyPrinter()
sagemaker_session = sagemaker.Session()

# Find the endpoint context to use as the starting point for the lineage queries.
# `endpoint_arn` is the ARN of an existing endpoint that you supply.
contexts = Context.list(source_uri=endpoint_arn)
context_name = list(contexts)[0].context_name
endpoint_context = EndpointContext.load(context_name=context_name)
```

**Example Find all the datasets associated with an endpoint**  

```
# Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `DATASET`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.DATASET]
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and finds all datasets.

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the lineage objects corresponding to the datasets
dataset_artifacts = []
for vertex in query_result.vertices:
    dataset_artifacts.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(dataset_artifacts)
```

**Example Find the models associated with an endpoint**  

```
# Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `MODEL`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.MODEL]
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and finds all models.

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the lineage objects corresponding to the model
model_artifacts = []
for vertex in query_result.vertices:
    model_artifacts.append(vertex.to_lineage_object().source.source_uri)

# The results of the `LineageQuery` API call return the ARN of the model deployed to the endpoint along with
# the S3 URI to the model.tar.gz file associated with the model
pp.pprint(model_artifacts)
```

**Example Find the trial components associated with an endpoint**  

```
# Define the LineageFilter to look for entities of type `TRIAL_COMPONENT` and the source of type `TRAINING_JOB`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.TRIAL_COMPONENT],
    sources=[LineageSourceEnum.TRAINING_JOB],
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and finds all trial components that originate from training jobs.

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the ARNs of the training jobs associated with this Endpoint
trial_components = []
for vertex in query_result.vertices:
    trial_components.append(vertex.arn)

pp.pprint(trial_components)
```

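**Example Changing the focal point of lineage**  
You can modify the `LineageQuery` to use different `start_arns`, which changes the focal point of the lineage. In addition, the `LineageFilter` can take multiple sources and entities to widen the scope of the query.  
In the following example, the model is the focal point of the lineage, and the queries find the endpoints and datasets associated with it.  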

```
# Get the ModelArtifact

model_artifact_summary = list(Artifact.list(source_uri=model_package_arn))[0]
model_artifact = ModelArtifact.load(artifact_arn=model_artifact_summary.artifact_arn)
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Find all the entities that descend from the model, i.e. the endpoint
    direction=LineageQueryDirectionEnum.DESCENDANTS,
    include_edges=False,
)

associations = []
for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Find all the entities that ascend from the model, i.e. the datasets
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(associations)
```

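**Example Using `LineageQueryDirectionEnum.BOTH` to find ascendant and descendant relationships**  
When the direction is set to `BOTH`, the query traverses the graph to find both ascendant and descendant relationships. This traversal takes place not only from the starting node, but from every node that is visited. For example, if a training job is run twice and both models generated by the training job are deployed to endpoints, the result of a query with the direction set to `BOTH` shows both endpoints. This is because the same image is used for training and deploying the models. Because the image is common to the models, the `start_arn` and both endpoints appear in the query result.  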

```
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # This specifies that the query should look for associations both ascending and descending for the start
    direction=LineageQueryDirectionEnum.BOTH,
    include_edges=False,
)

associations = []
for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(associations)
```
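
To make the `BOTH` traversal described above more concrete, here is a small, self-contained sketch on a toy graph. It does not call SageMaker AI: the node names and edges are hypothetical, and `traverse_both` is an illustrative helper rather than an SDK function. It also ignores filters and the depth limit that the real service applies. The sketch follows edges in both directions from every visited node, which is why a shared image connects one model to the other model's endpoint.

```python
from collections import deque

# Hypothetical lineage edges (source -> destination).
# One image is shared by two training jobs, each producing a deployed model.
EDGES = [
    ("image", "training-job-1"),
    ("image", "training-job-2"),
    ("training-job-1", "model-1"),
    ("training-job-2", "model-2"),
    ("model-1", "endpoint-1"),
    ("model-2", "endpoint-2"),
]


def traverse_both(start, edges):
    """Breadth-first walk that follows edges in both directions from every visited node."""
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)  # descendant step
        neighbors.setdefault(dst, set()).add(src)  # ascendant step

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


reached = traverse_both("model-1", EDGES)
endpoints = sorted(name for name in reached if name.startswith("endpoint"))
print(endpoints)  # ['endpoint-1', 'endpoint-2']
```

Starting from `model-1`, the walk climbs to `training-job-1` and the shared `image`, then descends through `training-job-2` and `model-2` to `endpoint-2`.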

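**Example Directions in `LineageQuery`: ascendants versus descendants**  
To understand direction in the lineage graph, consider the following entity relationship chain: Dataset -> Training Job -> Model -> Endpoint  
The endpoint is a descendant of the model, and the model is a descendant of the dataset. Similarly, the model is an ascendant of the endpoint. Use the `direction` parameter to specify whether the query returns entities that are descendants or ascendants of the entities in `start_arns`. If `start_arns` contains a model and the direction is `DESCENDANTS`, the query returns the endpoint. If the direction is `ASCENDANTS`, the query returns the dataset.  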

```
# In this example, we'll look at the impact of specifying the direction as ASCENDANT or DESCENDANT in a `LineageQuery`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[
        LineageSourceEnum.ENDPOINT,
        LineageSourceEnum.MODEL,
        LineageSourceEnum.DATASET,
        LineageSourceEnum.TRAINING_JOB,
    ],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

ascendant_artifacts = []

# The lineage entity returned for the Training Job is a TrialComponent which can't be converted to a
# lineage object using the method `to_lineage_object()` so we extract the TrialComponent ARN.
for vertex in query_result.vertices:
    try:
        ascendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
    except Exception:
        ascendant_artifacts.append(vertex.arn)

print("Ascendant artifacts : ")
pp.pprint(ascendant_artifacts)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.DESCENDANTS,
    include_edges=False,
)

descendant_artifacts = []
for vertex in query_result.vertices:
    try:
        descendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
    except Exception:
        # Handling TrialComponents.
        descendant_artifacts.append(vertex.arn)

print("Descendant artifacts : ")
pp.pprint(descendant_artifacts)
```
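
The difference between the two directions can also be sketched without any AWS calls. The following toy example encodes the Dataset -> Training Job -> Model -> Endpoint chain as plain tuples and walks it one way at a time. The `walk` helper and the node names are hypothetical, and no filter is applied, so the ascendant walk returns the training job as well as the dataset.

```python
# Hypothetical chain from the text: dataset -> training job -> model -> endpoint
EDGES = [
    ("dataset", "training-job"),
    ("training-job", "model"),
    ("model", "endpoint"),
]


def walk(start, edges, direction):
    """Return the entities reachable from `start` by following edges in one direction only."""
    if direction not in ("ASCENDANTS", "DESCENDANTS"):
        raise ValueError("direction must be 'ASCENDANTS' or 'DESCENDANTS'")

    # Build an adjacency list oriented for the requested direction.
    adjacency = {}
    for src, dst in edges:
        if direction == "DESCENDANTS":
            adjacency.setdefault(src, []).append(dst)
        else:
            adjacency.setdefault(dst, []).append(src)

    found = []
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, []):
            if nxt not in found:
                found.append(nxt)
                stack.append(nxt)
    return found


descendants = walk("model", EDGES, "DESCENDANTS")
ascendants = walk("model", EDGES, "ASCENDANTS")
print(descendants)  # ['endpoint']
print(ascendants)   # ['training-job', 'dataset']
```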

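**Example SDK helper functions that simplify lineage queries**  
The classes `EndpointContext`, `ModelArtifact`, and `DatasetArtifact` have helper functions that wrap the `LineageQuery` API to make certain lineage queries easier to use. The following example shows how to use these helper functions.  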

```
# Find all the datasets associated with this endpoint

datasets = []
dataset_artifacts = endpoint_context.dataset_artifacts()
for dataset in dataset_artifacts:
    datasets.append(dataset.source.source_uri)
print("Datasets : ", datasets)

# Find the training jobs associated with the endpoint
training_job_artifacts = endpoint_context.training_job_arns()
training_jobs = []
for training_job in training_job_artifacts:
    training_jobs.append(training_job)
print("Training Jobs : ", training_jobs)

# Get the ARN for the pipeline execution associated with this endpoint (if any)
pipeline_execution_arn = endpoint_context.pipeline_execution_arn()
if pipeline_execution_arn:
    print(pipeline_execution_arn)

# Here we use the `ModelArtifact` class to find all the datasets and endpoints associated with the model

dataset_artifacts = model_artifact.dataset_artifacts()
endpoint_contexts = model_artifact.endpoint_contexts()

datasets = [dataset.source.source_uri for dataset in dataset_artifacts]
endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

print("Datasets associated with this model : ")
pp.pprint(datasets)

print("Endpoints associated with this model : ")
pp.pprint(endpoints)

# Here we use the `DatasetArtifact` class to find all the endpoints hosting models that were trained with a particular dataset
# Find the artifact associated with the dataset

dataset_artifact_arn = list(Artifact.list(source_uri=training_data))[0].artifact_arn
dataset_artifact = DatasetArtifact.load(artifact_arn=dataset_artifact_arn)

# Find the endpoints that used this training dataset
endpoint_contexts = dataset_artifact.endpoint_contexts()
endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

print("Endpoints associated with the training dataset {}".format(training_data))
pp.pprint(endpoints)
```

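**Example Getting a lineage graph visualization**  
A helper class, `Visualizer`, is provided in [visualizer.py](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-lineage/visualizer.py), which accompanies the sample notebook, to help plot the lineage graph. When the query response is rendered, a graph with the lineage relationships from the `StartArns` is displayed. Starting from the `StartArns`, the visualization shows the relationships with the other lineage entities returned by the `query_lineage` API operation.  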

```
# Graph APIs
# Here we use the boto3 `query_lineage` API to generate the query response to plot.

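import boto3
from visualizer import Visualizer

# boto3 SageMaker client used for the `query_lineage` calls below.
sm_client = boto3.client("sagemaker")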

query_response = sm_client.query_lineage(
    StartArns=[endpoint_context.context_arn], Direction="Ascendants", IncludeEdges=True
)

viz = Visualizer()
viz.render(query_response, "Endpoint")

query_response = sm_client.query_lineage(
    StartArns=[model_artifact.artifact_arn], Direction="Ascendants", IncludeEdges=True
)
viz.render(query_response, "Model")
```
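
If a plot is not needed, the same response can be inspected as plain data. The following is a minimal sketch, assuming the response contains the `Vertices` and `Edges` lists returned by `query_lineage` when `IncludeEdges=True`. The helper `summarize_lineage_response` and the sample ARNs are hypothetical, and the sample dictionary stands in for a real response so that the snippet runs without AWS access.

```python
def summarize_lineage_response(response):
    """Map each ARN to its vertex type and group edges by their source ARN."""
    types = {
        vertex["Arn"]: vertex.get("Type", "Unknown")
        for vertex in response.get("Vertices", [])
    }
    outgoing = {}
    for edge in response.get("Edges", []):
        outgoing.setdefault(edge["SourceArn"], []).append(
            (edge.get("AssociationType", "Unknown"), edge["DestinationArn"])
        )
    return types, outgoing


# Hand-written stand-in for a `query_lineage` response (hypothetical ARNs).
sample_response = {
    "Vertices": [
        {"Arn": "arn:example:dataset", "Type": "DataSet", "LineageType": "Artifact"},
        {"Arn": "arn:example:model", "Type": "Model", "LineageType": "Artifact"},
    ],
    "Edges": [
        {
            "SourceArn": "arn:example:dataset",
            "DestinationArn": "arn:example:model",
            "AssociationType": "ContributedTo",
        },
    ],
}

types, outgoing = summarize_lineage_response(sample_response)
for source, links in outgoing.items():
    for association, destination in links:
        print(f"{types[source]} --{association}--> {types.get(destination, 'Unknown')}")
# DataSet --ContributedTo--> Model
```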