

本文為英文版的機器翻譯版本，如內容有任何歧義或不一致之處，概以英文版為準。

# 查詢歷程實體
<a name="querying-lineage-entities"></a>

Amazon SageMaker AI 會在您使用歷程實體時自動產生歷程實體的圖形。您可以查詢此資料以回答各種問題。以下提供如何在 SDK for Python 中查詢此資料的指示。

如需如何在 Amazon SageMaker Studio 中檢視已註冊模型歷程的相關資訊，請參閱[在 Studio 中檢視模型歷程詳細資訊](model-registry-lineage-view-studio.md)。

您可以查詢歷程實體，以執行下列作業：
+ 擷取建立模型時使用的所有資料集。
+ 擷取建立端點時使用的所有工作。
+ 擷取使用資料集的所有模型。
+ 擷取使用模型的所有端點。
+ 擷取從特定資料集衍生的端點。
+ 擷取建立訓練工作的管道執行。
+ 擷取實體之間的關係，以進行調查、治理和再現。
+ 擷取使用成品的所有下游試用。
+ 擷取所有使用成品的上游試用。
+ 擷取使用所提供之 S3 URI 的成品清單。
+ 擷取使用資料集成品的上游成品。
+ 擷取使用資料集成品的下游成品。
+ 擷取使用映像成品的資料集。
+ 擷取使用內容的動作。
+ 擷取使用端點的處理工作。
+ 擷取使用端點的轉換工作。
+ 擷取使用端點的試用元件。
+ 擷取與模型套件群組相關聯之管道執行的 ARN。
+ 擷取使用動作的所有成品。
+ 擷取使用模型套件核准動作的所有上游資料集。
+ 透過模型套件核准動作擷取模型套件。
+ 擷取使用端點的下游端點內容。
+ 擷取與試用元件相關聯之管道執行的 ARN。
+ 擷取使用試用元件的資料集。
+ 擷取使用試用元件的模型。
+ 探索歷程以進行視覺化。

**限制**
+ 下列區域無法使用歷程查詢：
  + 非洲 (開普敦) – af-south
  + 亞太區域 (雅加達) – ap-southeast-3
  + 亞太區域 (大阪) - (ap-northeast-3)
  + 歐洲 (米蘭) – eu-south-1
  + 歐洲 (西班牙) – eu-south-2
  + 以色列 (特拉維夫) – il-central-1
+ 目前，關係探索的最大深度限制為 10。
+ 篩選僅限於下列屬性：上次修改日期、建立日期、類型和歷程實體類型。

**Topics**
+ [開始查詢歷程實體](#querying-lineage-entities-getting-started)

## 開始查詢歷程實體
<a name="querying-lineage-entities-getting-started"></a>

開始查詢歷程實體的最簡單方式是：
+ [Amazon SageMaker AI SDK for Python](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/lineage/artifact.py#L397)，其中定義了許多常見使用案例。
+ 如需示範如何使用 SageMaker AI 歷程 API 查詢整個歷程圖中關係的筆記本，請參閱 [sagemaker-lineage-multihop-queries.ipynb](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-lineage/sagemaker-lineage-multihop-queries.ipynb)。

下列範例展示如何使用 `LineageQuery` 和 `LineageFilter` API 建構查詢，以回答有關歷程圖的問題，並擷取一些使用案例中的實體關聯。

**Example 使用 `LineageQuery` API 尋找實體關聯**  

```
from sagemaker.lineage.context import Context, EndpointContext
from sagemaker.lineage.action import Action
from sagemaker.lineage.association import Association
from sagemaker.lineage.artifact import Artifact, ModelArtifact, DatasetArtifact

from sagemaker.lineage.query import (
    LineageQuery,
    LineageFilter,
    LineageSourceEnum,
    LineageEntityEnum,
    LineageQueryDirectionEnum,
)
# Find the endpoint context and model artifact that should be used for the lineage queries.

contexts = Context.list(source_uri=endpoint_arn)
context_name = list(contexts)[0].context_name
endpoint_context = EndpointContext.load(context_name=context_name)
```

**Example 尋找與某個端點相關聯的所有資料集**  

```
# Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `DATASET`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.DATASET]
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and find all datasets.

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the lineage objects corresponding to the datasets
dataset_artifacts = []
for vertex in query_result.vertices:
    dataset_artifacts.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(dataset_artifacts)
```

**Example 尋找與某個端點相關聯的模型**  

```
# Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `MODEL`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.MODEL]
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and find all datasets.

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the lineage objects corresponding to the model
model_artifacts = []
for vertex in query_result.vertices:
    model_artifacts.append(vertex.to_lineage_object().source.source_uri)

# The results of the `LineageQuery` API call return the ARN of the model deployed to the endpoint along with
# the S3 URI to the model.tar.gz file associated with the model
pp.pprint(model_artifacts)
```

**Example 尋找與端點相關聯的試用元件**  

```
# Define the LineageFilter to look for entities of type `TRIAL_COMPONENT` and the source of type `TRAINING_JOB`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.TRIAL_COMPONENT],
    sources=[LineageSourceEnum.TRAINING_JOB],
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and find all datasets.

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the ARNs of the training jobs associated with this Endpoint
trial_components = []
for vertex in query_result.vertices:
    trial_components.append(vertex.arn)

pp.pprint(trial_components)
```

**Example 變更歷程的焦點**  
`LineageQuery` 可以修改為具有不同的 `start_arns` 來變更歷程的焦點。此外，`LineageFilter` 可以採用多個來源和實體來擴充查詢的範圍。  
我們在下面使用該模型作為歷程焦點，並找到與之相關聯的端點和資料集。  

```
# Get the ModelArtifact

model_artifact_summary = list(Artifact.list(source_uri=model_package_arn))[0]
model_artifact = ModelArtifact.load(artifact_arn=model_artifact_summary.artifact_arn)
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Find all the entities that descend from the model, i.e. the endpoint
    direction=LineageQueryDirectionEnum.DESCENDANTS,
    include_edges=False,
)

associations = []
for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Find all the entities that ascend from the model, i.e. the datasets
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(associations)
```

**Example 是用 `LineageQueryDirectionEnum.BOTH` 尋找遞增與遞減關係**  
當方向設定為 `BOTH` 時，查詢會遍歷圖形，以尋找遞增和遞減關係。這種遍歷不僅在起始節點發生，還會在造訪的每個節點進行。例如，如果某個訓練工作執行兩次，而且訓練工作產生的兩個模型均部署到端點，則查詢結果的方向會設定為 `BOTH`，以顯示兩個端點。這是因為模型訓練和部署是用了相同的映像。由於模型映像是相同的，因此 `start_arn` 和兩個端點都會出現在查詢結果中。  

```
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # This specifies that the query should look for associations both ascending and descending for the start
    direction=LineageQueryDirectionEnum.BOTH,
    include_edges=False,
)

associations = []
for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(associations)
```

**Example `LineageQuery` 中的方向 - `ASCENDANTS` 和 `DESCENDANTS`**  
要了解在歷程圖中的方向，可採取以下實體關係圖：資料集-> 訓練工作 -> 模型-> 端點  
從模型到端點是遞減，從模型到資料集也是遞減。與此類似，從端點到模型是遞增。`direction` 參數可用來指定查詢應傳回 `start_arns` 中實體的遞減還是遞增實體。如果 `start_arns` 包含模型且方向為 `DESCENDANTS`，則查詢會傳回端點。如果方向為 `ASCENDANTS`，則查詢會傳回資料集。  

```
# In this example, we'll look at the impact of specifying the direction as ASCENDANT or DESCENDANT in a `LineageQuery`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[
        LineageSourceEnum.ENDPOINT,
        LineageSourceEnum.MODEL,
        LineageSourceEnum.DATASET,
        LineageSourceEnum.TRAINING_JOB,
    ],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

ascendant_artifacts = []

# The lineage entity returned for the Training Job is a TrialComponent which can't be converted to a
# lineage object using the method `to_lineage_object()` so we extract the TrialComponent ARN.
for vertex in query_result.vertices:
    try:
        ascendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
    except:
        ascendant_artifacts.append(vertex.arn)

print("Ascendant artifacts : ")
pp.pprint(ascendant_artifacts)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.DESCENDANTS,
    include_edges=False,
)

descendant_artifacts = []
for vertex in query_result.vertices:
    try:
        descendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
    except:
        # Handling TrialComponents.
        descendant_artifacts.append(vertex.arn)

print("Descendant artifacts : ")
pp.pprint(descendant_artifacts)
```

**Example SDK 輔助函式讓歷程查詢變得更輕鬆**  
`EndpointContext`、`ModelArtifact` 和 `DatasetArtifact` 類別都具有輔助函式，這些函式是 `LineageQuery` API 上的包裝函式，可以讓某些歷程查詢變得更輕鬆。以下範例展示如何使用這些輔助函式。  

```
# Find all the datasets associated with this endpoint

datasets = []
dataset_artifacts = endpoint_context.dataset_artifacts()
for dataset in dataset_artifacts:
    datasets.append(dataset.source.source_uri)
print("Datasets : ", datasets)

# Find the training jobs associated with the endpoint
training_job_artifacts = endpoint_context.training_job_arns()
training_jobs = []
for training_job in training_job_artifacts:
    training_jobs.append(training_job)
print("Training Jobs : ", training_jobs)

# Get the ARN for the pipeline execution associated with this endpoint (if any)
pipeline_executions = endpoint_context.pipeline_execution_arn()
if pipeline_executions:
    for pipeline in pipelines_executions:
        print(pipeline)

# Here we use the `ModelArtifact` class to find all the datasets and endpoints associated with the model

dataset_artifacts = model_artifact.dataset_artifacts()
endpoint_contexts = model_artifact.endpoint_contexts()

datasets = [dataset.source.source_uri for dataset in dataset_artifacts]
endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

print("Datasets associated with this model : ")
pp.pprint(datasets)

print("Endpoints associated with this model : ")
pp.pprint(endpoints)

# Here we use the `DatasetArtifact` class to find all the endpoints hosting models that were trained with a particular dataset
# Find the artifact associated with the dataset

dataset_artifact_arn = list(Artifact.list(source_uri=training_data))[0].artifact_arn
dataset_artifact = DatasetArtifact.load(artifact_arn=dataset_artifact_arn)

# Find the endpoints that used this training dataset
endpoint_contexts = dataset_artifact.endpoint_contexts()
endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

print("Endpoints associated with the training dataset {}".format(training_data))
pp.pprint(endpoints)
```

**Example 取得歷程圖視覺化圖形**  
範例筆記本 [visualizer.py](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-lineage/visualizer.py) 中提供了一個輔助函式類別 `Visualizer`，能夠幫助歷程圖出圖。彩現查詢回應時，系統會顯示含有來自 `StartArns` 之歷程關係的圖形。從`StartArns` 開始，此視覺化圖形會顯示與 `query_lineage` API 動作中傳回之其他歷程實體之間的關係。  

```
# Graph APIs
# Here we use the boto3 `query_lineage` API to generate the query response to plot.

from visualizer import Visualizer

query_response = sm_client.query_lineage(
    StartArns=[endpoint_context.context_arn], Direction="Ascendants", IncludeEdges=True
)

viz = Visualizer()
viz.render(query_response, "Endpoint")
        
        query_response = sm_client.query_lineage(
    StartArns=[model_artifact.artifact_arn], Direction="Ascendants", IncludeEdges=True
)
viz.render(query_response, "Model")
```