

This content is a machine translation of the English-language original. If there is any ambiguity or inconsistency, the English version prevails.

# Getting started
<a name="nova-sagemaker-inference-getting-started"></a>

This guide explains how to deploy a custom Amazon Nova model to a SageMaker real-time endpoint, configure inference parameters, and invoke the model for testing.

## Prerequisites
<a name="nova-sagemaker-inference-prerequisites"></a>

The following are prerequisites for deploying Amazon Nova models on SageMaker inference:
+ An AWS account - if you don't already have one, see [Create an AWS account](https://docs.aws.amazon.com//sagemaker/latest/dg/gs-set-up.html#sign-up-for-aws).
+ Required IAM permissions - make sure your IAM user or role has the following managed policies attached:
  + `AmazonSageMakerFullAccess`
  + `AmazonS3FullAccess`
+ Required SDK/CLI versions - the following SDK versions have been tested and validated with Amazon Nova models on SageMaker inference:
  + SageMaker Python SDK v3.0.0 or later (`sagemaker>=3.0.0`) for resource-based API methods
  + Boto3 version 1.35.0 or later (`boto3>=1.35.0`) for direct API calls. The examples in this guide use this approach.
+ Increased service quotas - request increases to the SageMaker service quotas for the instance types you plan to use (for example, `ml.p5.48xlarge for endpoint usage`). For the list of supported instance types, see [Supported models and instances](nova-model-sagemaker-inference.md#nova-sagemaker-inference-supported). To request a quota increase, see [Requesting a quota increase](https://docs.aws.amazon.com//servicequotas/latest/userguide/request-quota-increase.html). For information about SageMaker instance quotas, see [SageMaker endpoints and quotas](https://docs.aws.amazon.com//general/latest/gr/sagemaker.html).
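
As an optional pre-flight check, you can look up your current applied quotas through the Service Quotas API before requesting an increase. This is a sketch, not part of the official setup: the `find_quota` helper is a plain filter over quota records, and the lookup uses the standard boto3 `service-quotas` client; the exact quota names in your account (such as `ml.p5.48xlarge for endpoint usage`) may vary.

```python
def find_quota(quotas, keyword):
    """Pure helper: return quota records whose QuotaName mentions the keyword."""
    return [q for q in quotas if keyword.lower() in q["QuotaName"].lower()]

def current_sagemaker_quotas(keyword, region="us-east-1"):
    """Look up applied SageMaker quotas matching a keyword, e.g. 'ml.p5.48xlarge'."""
    import boto3  # imported here so find_quota stays usable without boto3 installed
    client = boto3.client("service-quotas", region_name=region)
    quotas = []
    for page in client.get_paginator("list_service_quotas").paginate(ServiceCode="sagemaker"):
        quotas.extend(page["Quotas"])
    return find_quota(quotas, keyword)
```

For example, `current_sagemaker_quotas("ml.p5.48xlarge")` returns the quota records for that instance type; a `Value` of 0 means you still need to request an increase.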

## Step 1: Configure AWS credentials
<a name="nova-sagemaker-inference-step1"></a>

Configure your AWS credentials using one of the following methods:

**Option 1: AWS CLI (recommended)**

```
aws configure
```

When prompted, enter your AWS access key, secret key, and default Region.

**Option 2: AWS credentials file**

Create or edit `~/.aws/credentials`:

```
[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
```

**Option 3: Environment variables**

```
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
```

**Note**  
For more information about AWS credentials, see [Configuration and credential file settings](https://docs.aws.amazon.com//cli/latest/userguide/cli-configure-files.html).

**Initialize the AWS clients**

Create a Python script or notebook with the following code to initialize the AWS SDK and verify your credentials:

```
import boto3

# AWS Configuration - Update these for your environment
REGION = "us-east-1"  # Supported regions: us-east-1, us-west-2
AWS_ACCOUNT_ID = "YOUR_ACCOUNT_ID"  # Replace with your AWS account ID

# Initialize AWS clients using default credential chain
sagemaker = boto3.client('sagemaker', region_name=REGION)
sts = boto3.client('sts')

# Verify credentials
try:
    identity = sts.get_caller_identity()
    print(f"Successfully authenticated to AWS Account: {identity['Account']}")
    
    if identity['Account'] != AWS_ACCOUNT_ID:
        print(f"Warning: Connected to account {identity['Account']}, expected {AWS_ACCOUNT_ID}")

except Exception as e:
    print(f"Failed to authenticate: {e}")
    print("Please verify your credentials are configured correctly.")
```

If authentication succeeds, you should see output confirming your AWS account ID.

## Step 2: Create a SageMaker execution role
<a name="nova-sagemaker-inference-step2"></a>

A SageMaker execution role is an IAM role that grants SageMaker permission to access AWS resources on your behalf, such as the Amazon S3 bucket that holds your model artifacts and CloudWatch for logging.

**Create an execution role**

**Note**  
Creating an IAM role requires the `iam:CreateRole` and `iam:AttachRolePolicy` permissions. Make sure your IAM user or role has these permissions before proceeding.

The following code creates an IAM role with the permissions required to deploy Amazon Nova custom models:

```
import json

# Create SageMaker Execution Role
role_name = f"SageMakerInference-ExecutionRole-{AWS_ACCOUNT_ID}"

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole"
        }
    ]
}

iam = boto3.client('iam', region_name=REGION)

# Create the role
role_response = iam.create_role(
    RoleName=role_name,
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description='SageMaker execution role with S3 and SageMaker access'
)

# Attach required policies
iam.attach_role_policy(
    RoleName=role_name,
    PolicyArn='arn:aws:iam::aws:policy/AmazonSageMakerFullAccess'
)

iam.attach_role_policy(
    RoleName=role_name,
    PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess'
)

SAGEMAKER_EXECUTION_ROLE_ARN = role_response['Role']['Arn']
print(f"Created SageMaker execution role: {SAGEMAKER_EXECUTION_ROLE_ARN}")
```

**Use an existing execution role (optional)**

If you already have a SageMaker execution role, you can use it instead:

```
# Replace with your existing role ARN
SAGEMAKER_EXECUTION_ROLE_ARN = "arn:aws:iam::YOUR_ACCOUNT_ID:role/YOUR_EXISTING_ROLE_NAME"
```

To find existing SageMaker roles in your account:

```
iam = boto3.client('iam', region_name=REGION)
response = iam.list_roles()
sagemaker_roles = [role for role in response['Roles'] if 'SageMaker' in role['RoleName']]
for role in sagemaker_roles:
    print(f"{role['RoleName']}: {role['Arn']}")
```

**Important**  
The execution role must have a trust relationship with `sagemaker.amazonaws.com` and permissions to access Amazon S3 and SageMaker resources.

For more information about SageMaker execution roles, see [SageMaker roles](https://docs.aws.amazon.com//sagemaker/latest/dg/sagemaker-roles.html).
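
If you are reusing an existing role, you can confirm its trust relationship programmatically. This is a minimal sketch: the `trusts_sagemaker` check is a pure function over the trust policy document, and `iam.get_role` is the standard call for fetching it (boto3 returns the policy already decoded as a dict).

```python
def trusts_sagemaker(trust_policy):
    """Return True if the trust policy allows sagemaker.amazonaws.com to assume the role."""
    for stmt in trust_policy.get("Statement", []):
        if stmt.get("Effect") != "Allow" or stmt.get("Action") != "sts:AssumeRole":
            continue
        service = stmt.get("Principal", {}).get("Service", [])
        services = [service] if isinstance(service, str) else service
        if "sagemaker.amazonaws.com" in services:
            return True
    return False

def role_trusts_sagemaker(role_name):
    """Fetch a role's trust policy and check it."""
    import boto3  # imported here so the pure check above needs no dependencies
    iam = boto3.client("iam")
    policy = iam.get_role(RoleName=role_name)["Role"]["AssumeRolePolicyDocument"]
    return trusts_sagemaker(policy)
```

If `role_trusts_sagemaker(...)` returns False for your existing role, update its trust policy before using it as `SAGEMAKER_EXECUTION_ROLE_ARN`.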

## Step 3: Configure model parameters
<a name="nova-sagemaker-inference-step3"></a>

Configure the deployment parameters for your Amazon Nova model. These settings control model behavior, resource allocation, and inference characteristics. For the list of supported instance types and the supported CONTEXT_LENGTH and MAX_CONCURRENCY values for each, see [Supported models and instances](nova-model-sagemaker-inference.md#nova-sagemaker-inference-supported).

**Required parameters**
+ `IMAGE`: The Docker container image URI for the Amazon Nova inference container. This is provided by AWS.
+ `CONTEXT_LENGTH`: The model context length.
+ `MAX_CONCURRENCY`: The maximum number of sequences per iteration; sets a limit on how many individual user requests (prompts) can be processed concurrently within a single batch on the GPU. Range: integer greater than 0.

**Optional generation parameters**
+ `DEFAULT_TEMPERATURE`: Controls randomness during generation. Range: 0.0 to 2.0 (0.0 = deterministic; higher = more random).
+ `DEFAULT_TOP_P`: Nucleus sampling threshold for token selection. Range: 1e-10 to 1.0.
+ `DEFAULT_TOP_K`: Limits token selection to the top K tokens. Range: integer -1 or higher (-1 = no limit).
+ `DEFAULT_MAX_NEW_TOKENS`: The maximum number of tokens to generate in a response (that is, the maximum output tokens). Range: integer 1 or higher.
+ `DEFAULT_LOGPROBS`: The number of log probabilities to return per token. Range: integer 1 to 20.

**Configure your deployment**

```
# AWS Configuration
REGION = "us-east-1"  # Must match region from Step 1

# ECR Account mapping by region
ECR_ACCOUNT_MAP = {
    "us-east-1": "708977205387",
    "us-west-2": "176779409107"
}

# Container Image
IMAGE = f"{ECR_ACCOUNT_MAP[REGION]}.dkr.ecr.{REGION}.amazonaws.com/nova-inference-repo:SM-Inference-latest"
print(f"IMAGE = {IMAGE}")

# Model Parameters
CONTEXT_LENGTH = "16000"       # Maximum total context length
MAX_CONCURRENCY = "2"          # Maximum concurrent sequences

# Optional: Default generation parameters (uncomment to use)
DEFAULT_TEMPERATURE = "0.0"   # Deterministic output
DEFAULT_TOP_P = "1.0"         # Consider all tokens
# DEFAULT_TOP_K = "50"        # Uncomment to limit to top 50 tokens
# DEFAULT_MAX_NEW_TOKENS = "2048"  # Uncomment to set max output tokens
# DEFAULT_LOGPROBS = "1"      # Uncomment to enable log probabilities

# Build environment variables for the container
environment = {
    'CONTEXT_LENGTH': CONTEXT_LENGTH,
    'MAX_CONCURRENCY': MAX_CONCURRENCY,
}

# Add optional parameters if defined
if 'DEFAULT_TEMPERATURE' in globals():
    environment['DEFAULT_TEMPERATURE'] = DEFAULT_TEMPERATURE
if 'DEFAULT_TOP_P' in globals():
    environment['DEFAULT_TOP_P'] = DEFAULT_TOP_P
if 'DEFAULT_TOP_K' in globals():
    environment['DEFAULT_TOP_K'] = DEFAULT_TOP_K
if 'DEFAULT_MAX_NEW_TOKENS' in globals():
    environment['DEFAULT_MAX_NEW_TOKENS'] = DEFAULT_MAX_NEW_TOKENS
if 'DEFAULT_LOGPROBS' in globals():
    environment['DEFAULT_LOGPROBS'] = DEFAULT_LOGPROBS

print("Environment configuration:")
for key, value in environment.items():
    print(f"  {key}: {value}")
```

**Configure deployment-specific parameters**

Now configure the parameters specific to your Amazon Nova model deployment, including the model artifact location and instance type selection.

**Set a deployment identifier**

```
# Deployment identifier - use a descriptive name for your use case
JOB_NAME = "my-nova-deployment"
```

**Specify the model artifact location**

Provide the Amazon S3 URI where your trained Amazon Nova model artifacts are stored. This should be the output location of your model training or fine-tuning job.

```
# S3 location of your trained Nova model artifacts
# Replace with your model's S3 URI - must end with /
MODEL_S3_LOCATION = "s3://your-bucket-name/path/to/model/artifacts/"
```
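
Before continuing, it can help to validate the URI format and confirm the prefix actually contains objects. The sketch below is illustrative, not part of the official walkthrough: the parser is a pure format check (the trailing slash requirement mirrors the comment in the snippet above), and the existence check uses the standard `list_objects_v2` call.

```python
def parse_model_s3_uri(uri):
    """Split a model artifact URI into (bucket, prefix); raise if malformed."""
    if not uri.startswith("s3://") or not uri.endswith("/"):
        raise ValueError("MODEL_S3_LOCATION must look like s3://bucket/prefix/ (trailing slash required)")
    bucket, _, prefix = uri[len("s3://"):].partition("/")
    return bucket, prefix

def model_artifacts_exist(uri, region="us-east-1"):
    """Return True if at least one object exists under the prefix."""
    import boto3  # imported here so the parser above stays dependency-free
    bucket, prefix = parse_model_s3_uri(uri)
    s3 = boto3.client("s3", region_name=region)
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return resp.get("KeyCount", 0) > 0
```

Running `model_artifacts_exist(MODEL_S3_LOCATION)` before creating the model catches "model artifacts not found" failures early, rather than mid-deployment.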

**Select the model variant and instance type**

```
# Configure model variant and instance type
TESTCASE = {
    "model": "lite2",              # Options: micro, lite, lite2
    "instance": "ml.p5.48xlarge"   # Refer to "Supported models and instances" section
}

# Generate resource names
INSTANCE_TYPE = TESTCASE["instance"]
MODEL_NAME = JOB_NAME + "-" + TESTCASE["model"] + "-" + INSTANCE_TYPE.replace(".", "-")
ENDPOINT_CONFIG_NAME = MODEL_NAME + "-Config"
ENDPOINT_NAME = MODEL_NAME + "-Endpoint"

print(f"Model Name: {MODEL_NAME}")
print(f"Endpoint Config: {ENDPOINT_CONFIG_NAME}")
print(f"Endpoint Name: {ENDPOINT_NAME}")
```

**Naming conventions**

The code automatically generates consistent names for your AWS resources:
+ Model name: `{JOB_NAME}-{model}-{instance-type}`
+ Endpoint configuration: `{MODEL_NAME}-Config`
+ Endpoint name: `{MODEL_NAME}-Endpoint`
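
A long `JOB_NAME` can push the generated names past the limits the SageMaker API enforces (63 characters, alphanumerics and hyphens only). The helper below is a hedged convenience, not an official utility; the regex is my reading of the name pattern documented for SageMaker resources.

```python
import re

# Pattern documented for SageMaker resource names (models, endpoints, endpoint configs)
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")

def validate_sagemaker_name(name):
    """Raise early if a generated resource name would be rejected by the SageMaker API."""
    if not NAME_PATTERN.match(name):
        raise ValueError(
            f"Invalid SageMaker resource name (max 63 chars, alphanumerics and hyphens only): {name!r}"
        )
    return name
```

Calling `validate_sagemaker_name(ENDPOINT_NAME)` (and likewise for the model and config names) surfaces naming problems before any create call is made.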

## Step 4: Create the SageMaker model and endpoint configuration
<a name="nova-sagemaker-inference-step4"></a>

In this step, you create two essential resources: a SageMaker model object that references your Amazon Nova model artifacts, and an endpoint configuration that defines how the model is deployed.

**SageMaker model**: A model object that encapsulates the inference container image, the model artifact location, and the environment configuration. It is a reusable resource that can be deployed to multiple endpoints.

**Endpoint configuration**: Defines the infrastructure settings for the deployment, including instance type, instance count, and model variants. This lets you manage deployment settings separately from the model itself.

**Create the SageMaker model**

The following code creates a SageMaker model that references your Amazon Nova model artifacts:

```
try:
    model_response = sagemaker.create_model(
        ModelName=MODEL_NAME,
        PrimaryContainer={
            'Image': IMAGE,
            'ModelDataSource': {
                'S3DataSource': {
                    'S3Uri': MODEL_S3_LOCATION,
                    'S3DataType': 'S3Prefix',
                    'CompressionType': 'None'
                }
            },
            'Environment': environment
        },
        ExecutionRoleArn=SAGEMAKER_EXECUTION_ROLE_ARN,
        EnableNetworkIsolation=True
    )
    print("Model created successfully!")
    print(f"Model ARN: {model_response['ModelArn']}")
    
except sagemaker.exceptions.ClientError as e:
    print(f"Error creating model: {e}")
```

Key parameters:
+ `ModelName`: A unique identifier for the model
+ `Image`: The Docker container image URI for Amazon Nova inference
+ `ModelDataSource`: The Amazon S3 location of the model artifacts
+ `Environment`: The environment variables configured in step 3
+ `ExecutionRoleArn`: The IAM role from step 2
+ `EnableNetworkIsolation`: Set to True for enhanced security (prevents the container from making outbound network calls)

**Create the endpoint configuration**

Next, create the endpoint configuration that defines the deployment infrastructure:

```
# Create Endpoint Configuration
try:
    production_variant = {
        'VariantName': 'primary',
        'ModelName': MODEL_NAME,
        'InitialInstanceCount': 1,
        'InstanceType': INSTANCE_TYPE,
    }
    
    config_response = sagemaker.create_endpoint_config(
        EndpointConfigName=ENDPOINT_CONFIG_NAME,
        ProductionVariants=[production_variant]
    )
    print("Endpoint configuration created successfully!")
    print(f"Config ARN: {config_response['EndpointConfigArn']}")
    
except sagemaker.exceptions.ClientError as e:
    print(f"Error creating endpoint configuration: {e}")
```

Key parameters:
+ `VariantName`: An identifier for this model variant (use 'primary' for a single-model deployment)
+ `ModelName`: References the model created above
+ `InitialInstanceCount`: The number of instances to deploy (start with 1 and scale later as needed)
+ `InstanceType`: The ML instance type selected in step 3

**Verify resource creation**

You can verify that your resources were created successfully:

```
# Describe the model
model_info = sagemaker.describe_model(ModelName=MODEL_NAME)
print(f"Model Status: {model_info['ModelName']} created")

# Describe the endpoint configuration
config_info = sagemaker.describe_endpoint_config(EndpointConfigName=ENDPOINT_CONFIG_NAME)
print(f"Endpoint Config Status: {config_info['EndpointConfigName']} created")
```

## Step 5: Deploy the endpoint
<a name="nova-sagemaker-inference-step5"></a>

The next step is to deploy your Amazon Nova model by creating a SageMaker real-time endpoint. The endpoint hosts your model and provides a secure HTTPS endpoint for making inference requests.

Endpoint creation typically takes 15-30 minutes while AWS provisions the infrastructure, downloads the model artifacts, and initializes the inference container.

**Create the endpoint**

```
import time

try:
    endpoint_response = sagemaker.create_endpoint(
        EndpointName=ENDPOINT_NAME,
        EndpointConfigName=ENDPOINT_CONFIG_NAME
    )
    print("Endpoint creation initiated successfully!")
    print(f"Endpoint ARN: {endpoint_response['EndpointArn']}")
except Exception as e:
    print(f"Error creating endpoint: {e}")
```

**Monitor endpoint creation**

The following code polls the endpoint status until deployment completes:

```
# Monitor endpoint creation progress
print("Waiting for endpoint creation to complete...")
print("This typically takes 15-30 minutes...\n")

while True:
    try:
        response = sagemaker.describe_endpoint(EndpointName=ENDPOINT_NAME)
        status = response['EndpointStatus']
        
        if status == 'Creating':
            print(f"⏳ Status: {status} - Provisioning infrastructure and loading model...")
        elif status == 'InService':
            print(f"✅ Status: {status}")
            print("\nEndpoint creation completed successfully!")
            print(f"Endpoint Name: {ENDPOINT_NAME}")
            print(f"Endpoint ARN: {response['EndpointArn']}")
            break
        elif status == 'Failed':
            print(f"❌ Status: {status}")
            print(f"Failure Reason: {response.get('FailureReason', 'Unknown')}")
            print("\nFull response:")
            print(response)
            break
        else:
            print(f"Status: {status}")
        
    except Exception as e:
        print(f"Error checking endpoint status: {e}")
        break
    
    time.sleep(30)  # Check every 30 seconds
```
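
As an alternative to the hand-rolled polling loop above, boto3 ships a built-in `endpoint_in_service` waiter for the SageMaker client. The sketch below wraps it; the `WaiterConfig` values shown (30-second delay, 120 attempts, so up to an hour of waiting) are my suggested settings, not a requirement.

```python
def max_wait_seconds(delay=30, max_attempts=120):
    """Pure helper: total wall-clock time a WaiterConfig allows."""
    return delay * max_attempts

def wait_for_endpoint(endpoint_name, delay=30, max_attempts=120, region="us-east-1"):
    """Block until the endpoint is InService; raises WaiterError on failure or timeout."""
    import boto3  # imported here so max_wait_seconds stays dependency-free
    sm = boto3.client("sagemaker", region_name=region)
    waiter = sm.get_waiter("endpoint_in_service")
    waiter.wait(
        EndpointName=endpoint_name,
        WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts},
    )
    return sm.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
```

The explicit loop above gives richer progress output; the waiter is more convenient in scripts where you just want to block until `InService`.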

**Confirm the endpoint is ready**

Once the endpoint is InService, you can verify its configuration:

```
# Get detailed endpoint information
endpoint_info = sagemaker.describe_endpoint(EndpointName=ENDPOINT_NAME)

print("\n=== Endpoint Details ===")
print(f"Endpoint Name: {endpoint_info['EndpointName']}")
print(f"Endpoint ARN: {endpoint_info['EndpointArn']}")
print(f"Status: {endpoint_info['EndpointStatus']}")
print(f"Creation Time: {endpoint_info['CreationTime']}")
print(f"Last Modified: {endpoint_info['LastModifiedTime']}")

# Get endpoint config for instance type details
endpoint_config_name = endpoint_info['EndpointConfigName']
endpoint_config = sagemaker.describe_endpoint_config(EndpointConfigName=endpoint_config_name)

# Display production variant details
for variant in endpoint_info['ProductionVariants']:
    print(f"\nProduction Variant: {variant['VariantName']}")
    print(f"  Current Instance Count: {variant['CurrentInstanceCount']}")
    print(f"  Desired Instance Count: {variant['DesiredInstanceCount']}")
    # Get instance type from endpoint config
    for config_variant in endpoint_config['ProductionVariants']:
        if config_variant['VariantName'] == variant['VariantName']:
            print(f"  Instance Type: {config_variant['InstanceType']}")
            break
```

**Troubleshoot endpoint creation failures**

Common failure causes:
+ **Insufficient capacity**: The requested instance type is not available in your Region
  + Resolution: Try a different instance type or request a quota increase
+ **IAM permissions**: The execution role is missing required permissions
  + Resolution: Verify the role can access the Amazon S3 model artifacts and has the required SageMaker permissions
+ **Model artifacts not found**: The Amazon S3 URI is incorrect or inaccessible
  + Resolution: Verify the Amazon S3 URI, check the bucket permissions, and make sure you are in the correct Region
+ **Resource limits**: The endpoint or instances exceed account limits
  + Resolution: Request a service quota increase through Service Quotas or AWS Support

**Note**  
If you need to delete a failed endpoint and start over:  

```
sagemaker.delete_endpoint(EndpointName=ENDPOINT_NAME)
```

## Step 6: Invoke the endpoint
<a name="nova-sagemaker-inference-step6"></a>

Once your endpoint is InService, you can send inference requests to generate predictions from your Amazon Nova model. SageMaker supports synchronous endpoints (real-time streaming and non-streaming modes) and asynchronous endpoints (Amazon S3-based batch processing).

**Set up the runtime client**

Create a SageMaker Runtime client with appropriate timeout settings:

```
import json
import boto3
import botocore
from botocore.exceptions import ClientError

# Configure client with appropriate timeouts
config = botocore.config.Config(
    read_timeout=120,      # Maximum time to wait for response
    connect_timeout=10,    # Maximum time to establish connection
    retries={'max_attempts': 3}  # Number of retry attempts
)

# Create SageMaker Runtime client
runtime_client = boto3.client('sagemaker-runtime', config=config, region_name=REGION)
```

**Create a general-purpose inference function**

The following function handles both streaming and non-streaming requests:

```
def invoke_nova_endpoint(request_body):
    """
    Invoke Nova endpoint with automatic streaming detection.
    
    Args:
        request_body (dict): Request payload containing prompt and parameters
    
    Returns:
        dict: Response from the model (for non-streaming requests)
        None: For streaming requests (prints output directly)
    """
    body = json.dumps(request_body)
    is_streaming = request_body.get("stream", False)
    
    try:
        print(f"Invoking endpoint ({'streaming' if is_streaming else 'non-streaming'})...")
        
        if is_streaming:
            response = runtime_client.invoke_endpoint_with_response_stream(
                EndpointName=ENDPOINT_NAME,
                ContentType='application/json',
                Body=body
            )
            
            event_stream = response['Body']
            for event in event_stream:
                if 'PayloadPart' in event:
                    chunk = event['PayloadPart']
                    if 'Bytes' in chunk:
                        data = chunk['Bytes'].decode()
                        print("Chunk:", data)
        else:
            # Non-streaming inference
            response = runtime_client.invoke_endpoint(
                EndpointName=ENDPOINT_NAME,
                ContentType='application/json',
                Accept='application/json',
                Body=body
            )
            
            response_body = response['Body'].read().decode('utf-8')
            result = json.loads(response_body)
            print("✅ Response received successfully")
            return result
    
    except ClientError as e:
        error_code = e.response['Error']['Code']
        error_message = e.response['Error']['Message']
        print(f"❌ AWS Error: {error_code} - {error_message}")
    except Exception as e:
        print(f"❌ Unexpected error: {str(e)}")
```

**Example 1: Non-streaming chat completion**

Use the chat format for conversational interactions:

```
# Non-streaming chat request
chat_request = {
    "messages": [
        {"role": "user", "content": "Hello! How are you?"}
    ],
    "max_tokens": 100,
    "max_completion_tokens": 100,  # Alternative to max_tokens
    "stream": False,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
    "logprobs": True,
    "top_logprobs": 3,
    "reasoning_effort": "low",  # Options: "low", "high"
    "allowed_token_ids": None,  # List of allowed token IDs
    "truncate_prompt_tokens": None,  # Truncate prompt to this many tokens
    "stream_options": None
}

response = invoke_nova_endpoint(chat_request)
```

**Example response:**

```
{
    "id": "chatcmpl-123456",
    "object": "chat.completion",
    "created": 1234567890,
    "model": "default",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! I'm doing well, thank you for asking. I'm here and ready to help you with any questions or tasks you might have. How can I assist you today?"
            },
            "logprobs": {
                "content": [
                    {
                        "token": "Hello",
                        "logprob": -0.123,
                        "top_logprobs": [
                            {"token": "Hello", "logprob": -0.123},
                            {"token": "Hi", "logprob": -2.456},
                            {"token": "Hey", "logprob": -3.789}
                        ]
                    }
                    # Additional tokens...
                ]
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 12,
        "completion_tokens": 28,
        "total_tokens": 40
    }
}
```

**Example 2: Simple text completion**

Use the completion format for simple text generation:

```
# Simple completion request
completion_request = {
    "prompt": "The capital of France is",
    "max_tokens": 50,
    "stream": False,
    "temperature": 0.0,
    "top_p": 1.0,
    "top_k": -1,  # -1 means no limit
    "logprobs": 3,  # Number of log probabilities to return
    "allowed_token_ids": None,  # List of allowed token IDs
    "truncate_prompt_tokens": None,  # Truncate prompt to this many tokens
    "stream_options": None
}

response = invoke_nova_endpoint(completion_request)
```

**Example response:**

```
{
    "id": "cmpl-789012",
    "object": "text_completion",
    "created": 1234567890,
    "model": "default",
    "choices": [
        {
            "text": " Paris.",
            "index": 0,
            "logprobs": {
                "tokens": [" Paris", "."],
                "token_logprobs": [-0.001, -0.002],
                "top_logprobs": [
                    {" Paris": -0.001, " London": -5.234, " Rome": -6.789},
                    {".": -0.002, ",": -4.567, "!": -7.890}
                ]
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 6,
        "completion_tokens": 2,
        "total_tokens": 8
    }
}
```

**Example 3: Streaming chat completion**

```
# Streaming chat request
streaming_request = {
    "messages": [
        {"role": "user", "content": "Tell me a short story about a robot"}
    ],
    "max_tokens": 200,
    "stream": True,
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 40,
    "logprobs": True,
    "top_logprobs": 2,
    "reasoning_effort": "high",  # For more detailed reasoning
    "stream_options": {"include_usage": True}
}

invoke_nova_endpoint(streaming_request)
```

**Example streaming output:**

```
Chunk: data: {"id":"chatcmpl-029ca032-fa01-4868-80b7-c4cb1af90fb9","object":"chat.completion.chunk","created":1772060532,"model":"default","choices":[{"index":0,"delta":{"role":"assistant","content":"","reasoning_content":null},"logprobs":null,"finish_reason":null}],"prompt_token_ids":null}
Chunk: data: {"id":"chatcmpl-029ca032-fa01-4868-80b7-c4cb1af90fb9","object":"chat.completion.chunk","created":1772060532,"model":"default","choices":[{"index":0,"delta":{"content":" Once","reasoning_content":null},"logprobs":{"content":[{"token":"\u2581Once","logprob":-0.6078429222106934,"bytes":[226,150,129,79,110,99,101],"top_logprobs":[{"token":"\u2581Once","logprob":-0.6078429222106934,"bytes":[226,150,129,79,110,99,101]},{"token":"\u2581In","logprob":-0.7864127159118652,"bytes":[226,150,129,73,110]}]}]},"finish_reason":null,"token_ids":null}]}
Chunk: data: {"id":"chatcmpl-029ca032-fa01-4868-80b7-c4cb1af90fb9","object":"chat.completion.chunk","created":1772060532,"model":"default","choices":[{"index":0,"delta":{"content":" upon","reasoning_content":null},"logprobs":{"content":[{"token":"\u2581upon","logprob":-0.0012345,"bytes":[226,150,129,117,112,111,110],"top_logprobs":[{"token":"\u2581upon","logprob":-0.0012345,"bytes":[226,150,129,117,112,111,110]},{"token":"\u2581a","logprob":-6.789,"bytes":[226,150,129,97]}]}]},"finish_reason":null,"token_ids":null}]}
Chunk: data: {"id":"chatcmpl-029ca032-fa01-4868-80b7-c4cb1af90fb9","object":"chat.completion.chunk","created":1772060532,"model":"default","choices":[{"index":0,"delta":{"content":" a","reasoning_content":null},"logprobs":{"content":[{"token":"\u2581a","logprob":-0.0001234,"bytes":[226,150,129,97],"top_logprobs":[{"token":"\u2581a","logprob":-0.0001234,"bytes":[226,150,129,97]},{"token":"\u2581time","logprob":-9.123,"bytes":[226,150,129,116,105,109,101]}]}]},"finish_reason":null,"token_ids":null}]}
Chunk: data: {"id":"chatcmpl-029ca032-fa01-4868-80b7-c4cb1af90fb9","object":"chat.completion.chunk","created":1772060532,"model":"default","choices":[{"index":0,"delta":{"content":" time","reasoning_content":null},"logprobs":{"content":[{"token":"\u2581time","logprob":-0.0023456,"bytes":[226,150,129,116,105,109,101],"top_logprobs":[{"token":"\u2581time","logprob":-0.0023456,"bytes":[226,150,129,116,105,109,101]},{"token":",","logprob":-6.012,"bytes":[44]}]}]},"finish_reason":null,"token_ids":null}]}

# Additional chunks...

Chunk: data: {"id":"chatcmpl-029ca032-fa01-4868-80b7-c4cb1af90fb9","object":"chat.completion.chunk","created":1772060532,"model":"default","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":15,"completion_tokens":87,"total_tokens":102}}
Chunk: data: [DONE]
```
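
The raw chunks above follow the server-sent-events convention: each payload is a `data: <json>` line, and the stream ends with `data: [DONE]`. A hedged sketch of how you might assemble the streamed deltas into the full assistant message; the field path (`choices[0].delta.content`) matches the example output above. Note that `PayloadPart` boundaries can split a line across chunks, so a production parser should buffer partial lines rather than assume each chunk is complete.

```python
import json

def accumulate_stream(chunks):
    """Assemble assistant text from 'data: ...' SSE chunk strings."""
    text = []
    for chunk in chunks:
        for line in chunk.splitlines():
            line = line.strip()
            if not line.startswith("data:"):
                continue  # skip blank lines and non-data fields
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                return "".join(text)  # end-of-stream sentinel
            event = json.loads(payload)
            for choice in event.get("choices", []):
                content = choice.get("delta", {}).get("content")
                if content:
                    text.append(content)
    return "".join(text)
```

In the streaming branch of `invoke_nova_endpoint`, you could collect each decoded `chunk['Bytes']` string into a list and pass it to `accumulate_stream` instead of printing the raw chunks.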

**Example 4: Multimodal chat completion**

Use the multimodal format for image and text inputs:

```
# Multimodal chat request (if supported by your model)
multimodal_request = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
            ]
        }
    ],
    "max_tokens": 150,
    "temperature": 0.3,
    "top_p": 0.8,
    "stream": False
}

response = invoke_nova_endpoint(multimodal_request)
```

**Example response:**

```
{
    "id": "chatcmpl-345678",
    "object": "chat.completion",
    "created": 1234567890,
    "model": "default",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The image shows..."
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 1250,
        "completion_tokens": 45,
        "total_tokens": 1295
    }
}
```

## Step 7: Clean up resources (optional)
<a name="nova-sagemaker-inference-step7"></a>

To avoid incurring unnecessary charges, delete the AWS resources you created in this tutorial. SageMaker endpoints incur charges while running, even when you are not actively making inference requests.

**Important**  
Deleting resources is permanent and cannot be undone. Make sure you no longer need these resources before proceeding.

**Delete the endpoint**

```
import boto3

# Initialize SageMaker client
sagemaker = boto3.client('sagemaker', region_name=REGION)

try:
    print("Deleting endpoint...")
    sagemaker.delete_endpoint(EndpointName=ENDPOINT_NAME)
    print(f"✅ Endpoint '{ENDPOINT_NAME}' deletion initiated")
    print("Charges will stop once deletion completes (typically 2-5 minutes)")
except Exception as e:
    print(f"❌ Error deleting endpoint: {e}")
```

**Note**  
Endpoint deletion is asynchronous. You can monitor the deletion status:  

```
import time

print("Monitoring endpoint deletion...")
while True:
    try:
        response = sagemaker.describe_endpoint(EndpointName=ENDPOINT_NAME)
        status = response['EndpointStatus']
        print(f"Status: {status}")
        time.sleep(10)
    except sagemaker.exceptions.ClientError as e:
        if e.response['Error']['Code'] == 'ValidationException':
            print("✅ Endpoint successfully deleted")
            break
        else:
            print(f"Error: {e}")
            break
```

**Delete the endpoint configuration**

After deleting the endpoint, remove the endpoint configuration:

```
try:
    print("Deleting endpoint configuration...")
    sagemaker.delete_endpoint_config(EndpointConfigName=ENDPOINT_CONFIG_NAME)
    print(f"✅ Endpoint configuration '{ENDPOINT_CONFIG_NAME}' deleted")
except Exception as e:
    print(f"❌ Error deleting endpoint configuration: {e}")
```

**Delete the model**

Remove the SageMaker model object:

```
try:
    print("Deleting model...")
    sagemaker.delete_model(ModelName=MODEL_NAME)
    print(f"✅ Model '{MODEL_NAME}' deleted")
except Exception as e:
    print(f"❌ Error deleting model: {e}")
```