Get started - Amazon Nova

Get started

This guide describes how to deploy a custom Amazon Nova model on a SageMaker real-time endpoint, configure inference parameters, and invoke the model for testing.

Prerequisites

To deploy Amazon Nova models on SageMaker inference, you must meet the following prerequisites:

  • An AWS account: If you do not have one, see Create an AWS account.

  • Required IAM permissions: Ensure that your IAM user or role has the following managed policies attached:

    • AmazonSageMakerFullAccess

    • AmazonS3FullAccess

  • Required SDK/CLI versions: The following SDK versions have been tested and validated with Amazon Nova models on SageMaker inference:

    • For the resource-based API approach: SageMaker Python SDK v3.0.0+ (sagemaker>=3.0.0)

    • For direct API calls: Boto3 version 1.35.0+ (boto3>=1.35.0). The examples in this guide use this approach.
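Before continuing, you can confirm that the installed packages meet these minimums. A minimal sketch (the dotted-version comparison is illustrative and does not handle pre-release suffixes; `pip show boto3` works just as well):

```python
from importlib.metadata import version, PackageNotFoundError

def meets_minimum(installed: str, required: str) -> bool:
    """Compare dotted version strings numerically, e.g. '1.35.10' >= '1.35.0'."""
    to_tuple = lambda v: tuple(int(p) for p in v.split(".")[:3])
    return to_tuple(installed) >= to_tuple(required)

for pkg, minimum in [("boto3", "1.35.0"), ("sagemaker", "3.0.0")]:
    try:
        installed = version(pkg)
        status = "OK" if meets_minimum(installed, minimum) else f"upgrade needed (>= {minimum})"
        print(f"{pkg} {installed}: {status}")
    except PackageNotFoundError:
        print(f"{pkg} is not installed (pip install '{pkg}>={minimum}')")
```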

Step 1: Configure AWS credentials

Configure your AWS credentials using one of the following methods:

Option 1: AWS CLI (recommended)

aws configure

When prompted, enter your AWS access key ID, secret access key, and default region name.

Option 2: AWS credentials file

Create or edit ~/.aws/credentials:

[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY

Option 3: Environment variables

export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key

Note

For more information about AWS credentials, see Configuration and credential file settings.

Initialize AWS clients

Create a Python script or notebook file with the following code to initialize the AWS SDK and verify your credentials:

import boto3

# AWS Configuration - Update these for your environment
REGION = "us-east-1"  # Supported regions: us-east-1, us-west-2
AWS_ACCOUNT_ID = "YOUR_ACCOUNT_ID"  # Replace with your AWS account ID

# Initialize AWS clients using default credential chain
sagemaker = boto3.client('sagemaker', region_name=REGION)
sts = boto3.client('sts')

# Verify credentials
try:
    identity = sts.get_caller_identity()
    print(f"Successfully authenticated to AWS Account: {identity['Account']}")
    if identity['Account'] != AWS_ACCOUNT_ID:
        print(f"Warning: Connected to account {identity['Account']}, expected {AWS_ACCOUNT_ID}")
except Exception as e:
    print(f"Failed to authenticate: {e}")
    print("Please verify your credentials are configured correctly.")

If authentication succeeds, the output includes a confirmation with your AWS account ID.

Step 2: Create a SageMaker execution role

A SageMaker execution role is an IAM role that grants SageMaker permission to access AWS resources on your behalf (for example, the Amazon S3 bucket that stores your model artifacts, and CloudWatch for logging).

Create an execution role

Note

Creating an IAM role requires the iam:CreateRole and iam:AttachRolePolicy permissions. Make sure your IAM user or role has these permissions before you proceed.

The following code creates an IAM role with the permissions required to deploy custom Amazon Nova models:

import json

# Create SageMaker Execution Role
role_name = f"SageMakerInference-ExecutionRole-{AWS_ACCOUNT_ID}"

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole"
        }
    ]
}

iam = boto3.client('iam', region_name=REGION)

# Create the role
role_response = iam.create_role(
    RoleName=role_name,
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description='SageMaker execution role with S3 and SageMaker access'
)

# Attach required policies
iam.attach_role_policy(
    RoleName=role_name,
    PolicyArn='arn:aws:iam::aws:policy/AmazonSageMakerFullAccess'
)
iam.attach_role_policy(
    RoleName=role_name,
    PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess'
)

SAGEMAKER_EXECUTION_ROLE_ARN = role_response['Role']['Arn']
print(f"Created SageMaker execution role: {SAGEMAKER_EXECUTION_ROLE_ARN}")

Use an existing execution role (optional)

If you already have a SageMaker execution role, you can reuse it directly:

# Replace with your existing role ARN
SAGEMAKER_EXECUTION_ROLE_ARN = "arn:aws:iam::YOUR_ACCOUNT_ID:role/YOUR_EXISTING_ROLE_NAME"

To find existing SageMaker roles in your account:

iam = boto3.client('iam', region_name=REGION)
response = iam.list_roles()
sagemaker_roles = [role for role in response['Roles'] if 'SageMaker' in role['RoleName']]
for role in sagemaker_roles:
    print(f"{role['RoleName']}: {role['Arn']}")
Important

The execution role must have a trust relationship with sagemaker.amazonaws.com and permissions to access Amazon S3 and SageMaker resources.

For more information about SageMaker execution roles, see SageMaker Roles.

Step 3: Configure model parameters

Configure deployment parameters for your Amazon Nova model. These settings control model behavior, resource allocation, and inference characteristics.

Required parameters

  • IMAGE: The Docker container image URI for the Amazon Nova inference container. This URI is provided by AWS.

  • CONTEXT_LENGTH: The model context length.

  • MAX_CONCURRENCY: The maximum number of sequences per iteration; limits how many independent user requests (prompts) can be processed concurrently in a single batch on the GPU. Valid values: integers greater than 0.

Optional generation parameters

  • DEFAULT_TEMPERATURE: Controls the randomness of generated content. Valid range: 0.0 to 2.0 (0.0 = deterministic; higher values increase randomness).

  • DEFAULT_TOP_P: Nucleus sampling token-selection threshold. Valid range: 1e-10 to 1.0.

  • DEFAULT_TOP_K: Limits token selection to the K most probable tokens. Valid values: integers greater than or equal to -1 (-1 = no limit).

  • DEFAULT_MAX_NEW_TOKENS: The maximum number of tokens generated in the response (that is, the maximum output tokens). Valid values: integers greater than or equal to 1.

  • DEFAULT_LOGPROBS: The number of log probabilities to return per token. Valid values: integers from 1 to 20.
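The ranges above can be checked client-side before building the container environment. A hypothetical validation helper (not part of the Nova container; the function name and signature are illustrative):

```python
def validate_generation_params(temperature=None, top_p=None, top_k=None,
                               max_new_tokens=None, logprobs=None):
    """Return a list of error strings for out-of-range generation parameters."""
    errors = []
    if temperature is not None and not (0.0 <= temperature <= 2.0):
        errors.append(f"temperature {temperature} outside [0.0, 2.0]")
    if top_p is not None and not (1e-10 <= top_p <= 1.0):
        errors.append(f"top_p {top_p} outside [1e-10, 1.0]")
    if top_k is not None and (not isinstance(top_k, int) or top_k < -1):
        errors.append(f"top_k {top_k} must be an integer >= -1 (-1 = no limit)")
    if max_new_tokens is not None and (not isinstance(max_new_tokens, int) or max_new_tokens < 1):
        errors.append(f"max_new_tokens {max_new_tokens} must be an integer >= 1")
    if logprobs is not None and (not isinstance(logprobs, int) or not 1 <= logprobs <= 20):
        errors.append(f"logprobs {logprobs} must be an integer in [1, 20]")
    return errors

# Valid configuration produces no errors
print(validate_generation_params(temperature=0.7, top_p=0.9, top_k=50))  # []
```

Running a check like this before deployment surfaces configuration mistakes immediately rather than at container startup.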

Configure the deployment

# AWS Configuration
REGION = "us-east-1"  # Must match region from Step 1

# ECR account mapping by region
ECR_ACCOUNT_MAP = {
    "us-east-1": "708977205387",
    "us-west-2": "176779409107"
}

# Container image - Replace with the image URI provided by your AWS contact
# Two image tags are available (both point to the same image):
IMAGE_LATEST = f"{ECR_ACCOUNT_MAP[REGION]}.dkr.ecr.{REGION}.amazonaws.com/nova-inference-repo:SM-Inference-latest"
IMAGE_VERSIONED = f"{ECR_ACCOUNT_MAP[REGION]}.dkr.ecr.{REGION}.amazonaws.com/nova-inference-repo:v1.0.0"

# Use the versioned tag for production deployments (recommended)
IMAGE = IMAGE_VERSIONED

print(f"IMAGE = {IMAGE}")
print("Available tags:")
print(f"  Latest: {IMAGE_LATEST}")
print(f"  Versioned: {IMAGE_VERSIONED}")

# Model parameters
CONTEXT_LENGTH = "8000"      # Maximum total context length
MAX_CONCURRENCY = "16"       # Maximum concurrent sequences

# Optional: Default generation parameters (uncomment to use)
DEFAULT_TEMPERATURE = "0.0"  # Deterministic output
DEFAULT_TOP_P = "1.0"        # Consider all tokens
# DEFAULT_TOP_K = "50"             # Uncomment to limit to top 50 tokens
# DEFAULT_MAX_NEW_TOKENS = "2048"  # Uncomment to set max output tokens
# DEFAULT_LOGPROBS = "1"           # Uncomment to enable log probabilities

# Build environment variables for the container
environment = {
    'CONTEXT_LENGTH': CONTEXT_LENGTH,
    'MAX_CONCURRENCY': MAX_CONCURRENCY,
}

# Add optional parameters if defined
if 'DEFAULT_TEMPERATURE' in globals():
    environment['DEFAULT_TEMPERATURE'] = DEFAULT_TEMPERATURE
if 'DEFAULT_TOP_P' in globals():
    environment['DEFAULT_TOP_P'] = DEFAULT_TOP_P
if 'DEFAULT_TOP_K' in globals():
    environment['DEFAULT_TOP_K'] = DEFAULT_TOP_K
if 'DEFAULT_MAX_NEW_TOKENS' in globals():
    environment['DEFAULT_MAX_NEW_TOKENS'] = DEFAULT_MAX_NEW_TOKENS
if 'DEFAULT_LOGPROBS' in globals():
    environment['DEFAULT_LOGPROBS'] = DEFAULT_LOGPROBS

print("Environment configuration:")
for key, value in environment.items():
    print(f"  {key}: {value}")

Configure deployment-specific parameters

Next, configure parameters specific to your Amazon Nova model deployment, including the model artifact location and the instance type.

Set the deployment identifier

# Deployment identifier - use a descriptive name for your use case
JOB_NAME = "my-nova-deployment"

Specify the model artifact location

Provide the Amazon S3 URI of your trained Amazon Nova model artifacts. This should be the output location of your model training or fine-tuning job.

# S3 location of your trained Nova model artifacts
# Replace with your model's S3 URI - must end with /
MODEL_S3_LOCATION = "s3://your-bucket-name/path/to/model/artifacts/"

Select the model variant and instance type

# Configure model variant and instance type
TESTCASE = {
    "model": "micro",             # Options: micro, lite, lite2
    "instance": "ml.g5.12xlarge"  # Refer to "Supported models and instances" section
}

# Generate resource names
INSTANCE_TYPE = TESTCASE["instance"]
MODEL_NAME = JOB_NAME + "-" + TESTCASE["model"] + "-" + INSTANCE_TYPE.replace(".", "-")
ENDPOINT_CONFIG_NAME = MODEL_NAME + "-Config"
ENDPOINT_NAME = MODEL_NAME + "-Endpoint"

print(f"Model Name: {MODEL_NAME}")
print(f"Endpoint Config: {ENDPOINT_CONFIG_NAME}")
print(f"Endpoint Name: {ENDPOINT_NAME}")

Naming convention

The code automatically generates consistent names for your AWS resources:

  • Model name: {JOB_NAME}-{model}-{instance-type}

  • Endpoint configuration: {MODEL_NAME}-Config

  • Endpoint name: {MODEL_NAME}-Endpoint
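The convention above can be captured in a small helper (hypothetical; the deployment code in this guide builds the same strings inline):

```python
def build_resource_names(job_name: str, model: str, instance_type: str) -> dict:
    """Derive SageMaker resource names from the deployment identifier,
    model variant, and instance type (dots in the instance type become dashes)."""
    model_name = f"{job_name}-{model}-{instance_type.replace('.', '-')}"
    return {
        "model_name": model_name,
        "endpoint_config_name": f"{model_name}-Config",
        "endpoint_name": f"{model_name}-Endpoint",
    }

names = build_resource_names("my-nova-deployment", "micro", "ml.g5.12xlarge")
print(names["model_name"])  # my-nova-deployment-micro-ml-g5-12xlarge
```

Replacing the dots matters because SageMaker resource names must match the pattern [a-zA-Z0-9](-*[a-zA-Z0-9])*, which does not allow periods.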

Step 4: Create the SageMaker model and endpoint configuration

In this step, you create two core resources: a SageMaker model object that references your Amazon Nova model artifacts, and an endpoint configuration that defines how the model is deployed.

SageMaker model: A model object that encapsulates the inference container image, the model artifact location, and the environment configuration. This resource is reusable and can be deployed to multiple endpoints.

Endpoint configuration: Defines the infrastructure settings for the deployment, including instance type, instance count, and model variants. It lets you manage deployment settings separately from the model itself.

Create the SageMaker model

The following code creates a SageMaker model that references your Amazon Nova model artifacts:

try:
    model_response = sagemaker.create_model(
        ModelName=MODEL_NAME,
        PrimaryContainer={
            'Image': IMAGE,
            'ModelDataSource': {
                'S3DataSource': {
                    'S3Uri': MODEL_S3_LOCATION,
                    'S3DataType': 'S3Prefix',
                    'CompressionType': 'None'
                }
            },
            'Environment': environment
        },
        ExecutionRoleArn=SAGEMAKER_EXECUTION_ROLE_ARN,
        EnableNetworkIsolation=True
    )
    print("Model created successfully!")
    print(f"Model ARN: {model_response['ModelArn']}")
except sagemaker.exceptions.ClientError as e:
    print(f"Error creating model: {e}")

Key parameters:

  • ModelName: A unique identifier for the model

  • Image: The Docker container image URI for Amazon Nova inference

  • ModelDataSource: The Amazon S3 location of the model artifacts

  • Environment: The environment variables configured in Step 3

  • ExecutionRoleArn: The ARN of the IAM role created in Step 2

  • EnableNetworkIsolation: Set to True for enhanced security (prevents the container from making outbound network calls)

Create the endpoint configuration

Next, create the endpoint configuration that defines the deployment infrastructure:

# Create Endpoint Configuration
try:
    production_variant = {
        'VariantName': 'primary',
        'ModelName': MODEL_NAME,
        'InitialInstanceCount': 1,
        'InstanceType': INSTANCE_TYPE,
    }
    config_response = sagemaker.create_endpoint_config(
        EndpointConfigName=ENDPOINT_CONFIG_NAME,
        ProductionVariants=[production_variant]
    )
    print("Endpoint configuration created successfully!")
    print(f"Config ARN: {config_response['EndpointConfigArn']}")
except sagemaker.exceptions.ClientError as e:
    print(f"Error creating endpoint configuration: {e}")

Key parameters:

  • VariantName: Identifier for this model variant (use "primary" for single-model deployments)

  • ModelName: References the model created above

  • InitialInstanceCount: The number of instances to deploy (start with 1 and scale as needed)

  • InstanceType: The ML instance type selected in Step 3

Verify resource creation

You can verify that the resources were created successfully:

# Describe the model
model_info = sagemaker.describe_model(ModelName=MODEL_NAME)
print(f"Model Status: {model_info['ModelName']} created")

# Describe the endpoint configuration
config_info = sagemaker.describe_endpoint_config(EndpointConfigName=ENDPOINT_CONFIG_NAME)
print(f"Endpoint Config Status: {config_info['EndpointConfigName']} created")

Step 5: Deploy the endpoint

Next, deploy your Amazon Nova model by creating a SageMaker real-time endpoint. The endpoint hosts your model and provides a secure HTTPS endpoint for inference requests.

Endpoint creation typically takes 15-30 minutes, during which AWS provisions the infrastructure, downloads the model artifacts, and initializes the inference container.

Create the endpoint

import time

try:
    endpoint_response = sagemaker.create_endpoint(
        EndpointName=ENDPOINT_NAME,
        EndpointConfigName=ENDPOINT_CONFIG_NAME
    )
    print("Endpoint creation initiated successfully!")
    print(f"Endpoint ARN: {endpoint_response['EndpointArn']}")
except Exception as e:
    print(f"Error creating endpoint: {e}")

Monitor endpoint creation progress

The following code polls the endpoint status until deployment completes:

# Monitor endpoint creation progress
print("Waiting for endpoint creation to complete...")
print("This typically takes 15-30 minutes...\n")

while True:
    try:
        response = sagemaker.describe_endpoint(EndpointName=ENDPOINT_NAME)
        status = response['EndpointStatus']
        if status == 'Creating':
            print(f"⏳ Status: {status} - Provisioning infrastructure and loading model...")
        elif status == 'InService':
            print(f"✅ Status: {status}")
            print("\nEndpoint creation completed successfully!")
            print(f"Endpoint Name: {ENDPOINT_NAME}")
            print(f"Endpoint ARN: {response['EndpointArn']}")
            break
        elif status == 'Failed':
            print(f"❌ Status: {status}")
            print(f"Failure Reason: {response.get('FailureReason', 'Unknown')}")
            print("\nFull response:")
            print(response)
            break
        else:
            print(f"Status: {status}")
    except Exception as e:
        print(f"Error checking endpoint status: {e}")
        break
    time.sleep(30)  # Check every 30 seconds

Verify endpoint readiness

After the endpoint status changes to InService, you can verify its configuration:

# Get detailed endpoint information
endpoint_info = sagemaker.describe_endpoint(EndpointName=ENDPOINT_NAME)

print("\n=== Endpoint Details ===")
print(f"Endpoint Name: {endpoint_info['EndpointName']}")
print(f"Endpoint ARN: {endpoint_info['EndpointArn']}")
print(f"Status: {endpoint_info['EndpointStatus']}")
print(f"Creation Time: {endpoint_info['CreationTime']}")
print(f"Last Modified: {endpoint_info['LastModifiedTime']}")

# Get endpoint config for instance type details
endpoint_config_name = endpoint_info['EndpointConfigName']
endpoint_config = sagemaker.describe_endpoint_config(EndpointConfigName=endpoint_config_name)

# Display production variant details
for variant in endpoint_info['ProductionVariants']:
    print(f"\nProduction Variant: {variant['VariantName']}")
    print(f"  Current Instance Count: {variant['CurrentInstanceCount']}")
    print(f"  Desired Instance Count: {variant['DesiredInstanceCount']}")
    # Get instance type from endpoint config
    for config_variant in endpoint_config['ProductionVariants']:
        if config_variant['VariantName'] == variant['VariantName']:
            print(f"  Instance Type: {config_variant['InstanceType']}")
            break

Troubleshoot endpoint creation failures

Common failure causes:

  • Insufficient capacity: The requested instance type is not currently available in your region

    • Resolution: Try a different instance type, or submit a quota increase request

  • IAM permission issues: The execution role lacks the required permissions

    • Resolution: Verify that the role can access the Amazon S3 model artifacts and has the required SageMaker permissions

  • Model artifacts not found: The Amazon S3 URI is incorrect or inaccessible

    • Resolution: Verify the Amazon S3 URI, check the bucket permissions, and confirm that you are working in the correct region

  • Resource limit exceeded: Your account has exceeded the service limits for endpoints or instances

    • Resolution: Submit a quota increase request through Service Quotas or AWS Support
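When automating redeploys, the FailureReason string returned by describe_endpoint can be matched against these causes. A rough, illustrative triage sketch (the keyword patterns are assumptions, not documented API behavior):

```python
def suggest_fix(failure_reason: str) -> str:
    """Map a SageMaker endpoint FailureReason string to a suggested next step."""
    reason = failure_reason.lower()
    if "capacity" in reason or "insufficient" in reason:
        return "Try a different instance type or request a quota increase."
    if "access" in reason or "denied" in reason or "not authorized" in reason:
        return "Check the execution role's S3 and SageMaker permissions."
    if "s3" in reason or "model data" in reason:
        return "Verify the model artifact S3 URI, bucket permissions, and region."
    if "quota" in reason or "limit" in reason:
        return "Request a quota increase via Service Quotas or AWS Support."
    return "Inspect the full FailureReason and the CloudWatch logs."

print(suggest_fix("Insufficient capacity for instance type ml.g5.12xlarge"))
```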

Note

To delete a failed endpoint and redeploy:

sagemaker.delete_endpoint(EndpointName=ENDPOINT_NAME)

Step 6: Invoke the endpoint

After the endpoint status is InService, you can send inference requests to generate predictions from your Amazon Nova model. SageMaker supports two types of endpoint invocation: synchronous (real-time invocation with streaming and non-streaming modes) and asynchronous (Amazon S3-based batch processing).

Set up the runtime client

Create a SageMaker Runtime client with appropriate timeout settings:

import json
import boto3
import botocore
from botocore.exceptions import ClientError

# Configure client with appropriate timeouts
config = botocore.config.Config(
    read_timeout=120,            # Maximum time to wait for response
    connect_timeout=10,          # Maximum time to establish connection
    retries={'max_attempts': 3}  # Number of retry attempts
)

# Create SageMaker Runtime client
runtime_client = boto3.client('sagemaker-runtime', config=config, region_name=REGION)

Write a general-purpose inference function

The following function handles both streaming and non-streaming requests:

def invoke_nova_endpoint(request_body):
    """
    Invoke Nova endpoint with automatic streaming detection.

    Args:
        request_body (dict): Request payload containing prompt and parameters

    Returns:
        dict: Response from the model (for non-streaming requests)
        None: For streaming requests (prints output directly)
    """
    body = json.dumps(request_body)
    is_streaming = request_body.get("stream", False)
    try:
        print(f"Invoking endpoint ({'streaming' if is_streaming else 'non-streaming'})...")
        if is_streaming:
            response = runtime_client.invoke_endpoint_with_response_stream(
                EndpointName=ENDPOINT_NAME,
                ContentType='application/json',
                Body=body
            )
            event_stream = response['Body']
            for event in event_stream:
                if 'PayloadPart' in event:
                    chunk = event['PayloadPart']
                    if 'Bytes' in chunk:
                        data = chunk['Bytes'].decode()
                        print("Chunk:", data)
        else:
            # Non-streaming inference
            response = runtime_client.invoke_endpoint(
                EndpointName=ENDPOINT_NAME,
                ContentType='application/json',
                Accept='application/json',
                Body=body
            )
            response_body = response['Body'].read().decode('utf-8')
            result = json.loads(response_body)
            print("✅ Response received successfully")
            return result
    except ClientError as e:
        error_code = e.response['Error']['Code']
        error_message = e.response['Error']['Message']
        print(f"❌ AWS Error: {error_code} - {error_message}")
    except Exception as e:
        print(f"❌ Unexpected error: {str(e)}")

Example 1: Non-streaming chat completion

Use the chat format for multi-turn interactions:

# Non-streaming chat request
chat_request = {
    "messages": [
        {"role": "user", "content": "Hello! How are you?"}
    ],
    "max_tokens": 100,
    "max_completion_tokens": 100,   # Alternative to max_tokens
    "stream": False,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
    "logprobs": True,
    "top_logprobs": 3,
    "allowed_token_ids": None,      # List of allowed token IDs
    "truncate_prompt_tokens": None, # Truncate prompt to this many tokens
    "stream_options": None
}

response = invoke_nova_endpoint(chat_request)

Example 2: Simple text completion

Use the completion format for basic text generation:

# Simple completion request
completion_request = {
    "prompt": "The capital of France is",
    "max_tokens": 50,
    "stream": False,
    "temperature": 0.0,
    "top_p": 1.0,
    "top_k": -1,                    # -1 means no limit
    "logprobs": 3,                  # Number of log probabilities to return
    "allowed_token_ids": None,      # List of allowed token IDs
    "truncate_prompt_tokens": None, # Truncate prompt to this many tokens
    "stream_options": None
}

response = invoke_nova_endpoint(completion_request)

Example 3: Streaming chat completion

# Streaming chat request
streaming_request = {
    "messages": [
        {"role": "user", "content": "Tell me a short story about a robot"}
    ],
    "max_tokens": 200,
    "stream": True,
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 40,
    "logprobs": True,
    "top_logprobs": 2,
    "stream_options": {"include_usage": True}
}

invoke_nova_endpoint(streaming_request)

Example 4: Multimodal chat completion

Use the multimodal format for mixed image and text inputs:

# Multimodal chat request (if supported by your model)
multimodal_request = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
            ]
        }
    ],
    "max_tokens": 150,
    "temperature": 0.3,
    "top_p": 0.8,
    "stream": False
}

response = invoke_nova_endpoint(multimodal_request)

Step 7: Clean up resources (optional)

To avoid unnecessary charges, delete the AWS resources you created in this tutorial. SageMaker endpoints accrue charges while they are running, even when you are not actively sending inference requests.

Important

Deleting resources is permanent and cannot be undone. Confirm that you no longer need these resources before you proceed.

Delete the endpoint

import boto3

# Initialize SageMaker client
sagemaker = boto3.client('sagemaker', region_name=REGION)

try:
    print("Deleting endpoint...")
    sagemaker.delete_endpoint(EndpointName=ENDPOINT_NAME)
    print(f"✅ Endpoint '{ENDPOINT_NAME}' deletion initiated")
    print("Charges will stop once deletion completes (typically 2-5 minutes)")
except Exception as e:
    print(f"❌ Error deleting endpoint: {e}")
Note

Endpoint deletion is asynchronous. You can monitor the deletion status:

import time

print("Monitoring endpoint deletion...")
while True:
    try:
        response = sagemaker.describe_endpoint(EndpointName=ENDPOINT_NAME)
        status = response['EndpointStatus']
        print(f"Status: {status}")
        time.sleep(10)
    except sagemaker.exceptions.ClientError as e:
        if e.response['Error']['Code'] == 'ValidationException':
            print("✅ Endpoint successfully deleted")
            break
        else:
            print(f"Error: {e}")
            break

Delete the endpoint configuration

After the endpoint deletion completes, delete the endpoint configuration:

try:
    print("Deleting endpoint configuration...")
    sagemaker.delete_endpoint_config(EndpointConfigName=ENDPOINT_CONFIG_NAME)
    print(f"✅ Endpoint configuration '{ENDPOINT_CONFIG_NAME}' deleted")
except Exception as e:
    print(f"❌ Error deleting endpoint configuration: {e}")

Delete the model

Delete the SageMaker model object:

try:
    print("Deleting model...")
    sagemaker.delete_model(ModelName=MODEL_NAME)
    print(f"✅ Model '{MODEL_NAME}' deleted")
except Exception as e:
    print(f"❌ Error deleting model: {e}")