

# API reference
<a name="nova-sagemaker-inference-api-reference"></a>

Amazon Nova models on SageMaker use the standard SageMaker Runtime API for inference. For complete API documentation, see [Test a deployed model](https://docs.aws.amazon.com//sagemaker/latest/dg/realtime-endpoints-test-endpoints.html).

## Endpoint invocation
<a name="nova-sagemaker-inference-api-invocation"></a>

Amazon Nova models on SageMaker support two invocation methods:
+ **Synchronous invocation**: Use the [InvokeEndpoint API](https://docs.aws.amazon.com//sagemaker/latest/APIReference/API_runtime_InvokeEndpoint.html) for real-time, non-streaming inference requests.
+ **Streaming invocation**: Use the [InvokeEndpointWithResponseStream API](https://docs.aws.amazon.com//sagemaker/latest/APIReference/API_runtime_InvokeEndpointWithResponseStream.html) for real-time streaming inference requests.
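
As a minimal sketch of the synchronous path, the helper below sends one JSON request through `invoke_endpoint` and decodes the reply. The client is passed in so that the function does not depend on how credentials or Regions are configured. The helper name `invoke_chat`, the endpoint name, and the `application/json` content type are assumptions made for illustration, not values stated in this reference.

```python
import json


def invoke_chat(client, endpoint_name, payload):
    """Send one non-streaming request and return the decoded JSON response.

    `client` is expected to be a SageMaker Runtime client, for example
    boto3.client("sagemaker-runtime"). The content type is an assumption.
    """
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(payload),
    )
    # The response body is a file-like stream: read it fully, then decode.
    return json.loads(response["Body"].read())
```

With boto3 installed, a call might look like `invoke_chat(boto3.client("sagemaker-runtime"), "my-nova-endpoint", {"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 100})`, where `my-nova-endpoint` is a hypothetical endpoint name.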

## Request format
<a name="nova-sagemaker-inference-api-request"></a>

Amazon Nova models support three request formats:

**Chat completion format**

Use this format for conversational interactions:

```
{
  "messages": [
    {"role": "user", "content": "string"}
  ],
  "max_tokens": integer,
  "max_completion_tokens": integer,
  "stream": boolean,
  "temperature": float,
  "top_p": float,
  "top_k": integer,
  "logprobs": boolean,
  "top_logprobs": integer,
  "reasoning_effort": "low" | "high",
  "allowed_token_ids": [integer],
  "truncate_prompt_tokens": integer,
  "stream_options": {
    "include_usage": boolean
  }
}
```

**Text completion format**

Use this format for simple text generation:

```
{
  "prompt": "string",
  "max_tokens": integer,
  "stream": boolean,
  "temperature": float,
  "top_p": float,
  "top_k": integer,
  "logprobs": integer,
  "allowed_token_ids": [integer],
  "truncate_prompt_tokens": integer,
  "stream_options": {
    "include_usage": boolean
  }
}
```

**Multimodal chat completion format**

Use this format for mixed image and text input:

```
{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
      ]
    }
  ],
  "max_tokens": integer,
  "temperature": float,
  "top_p": float,
  "stream": boolean
}
```
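
The `image_url` entry in the format above carries the image inline as a base64 data URL. The following sketch shows one way to build such a message from raw image bytes. The helper name `image_message` and its `mime_type` default are hypothetical and exist only for illustration.

```python
import base64


def image_message(image_bytes, question, mime_type="image/jpeg"):
    """Build one multimodal user message from raw image bytes and a question."""
    # Encode the bytes as base64 text so they can travel inside a JSON string.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dictionary can be placed directly in the `messages` array of a chat completion request.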

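**Request parameters**
+ `messages` (array): Used with the chat completion format. An array of message objects, each with `role` and `content` fields. A string `content` is plain text input; an array `content` is multimodal input.
+ `prompt` (string): Used with the text completion format. The input text to generate from.
+ `max_tokens` (integer): The maximum number of tokens to generate in the response. Valid range: ≥ 1.
+ `max_completion_tokens` (integer): An alternative to `max_tokens` for chat completions. The maximum number of completion tokens to generate.
+ `temperature` (float): Controls the randomness of the generated output. Valid range: 0.0 to 2.0 (0.0 = deterministic, 2.0 = maximum randomness).
+ `top_p` (float): The nucleus sampling threshold. Valid range: 1e-10 to 1.0.
+ `top_k` (integer): Limits token selection to the K most probable tokens. Valid range: ≥ -1 (-1 = no limit).
+ `stream` (boolean): Whether to stream the response. `true` streams the response; `false` returns it in a single reply.
+ `logprobs` (boolean/integer): Chat completions take a boolean. Text completions take an integer that sets how many log probabilities to return. Valid range: 1 to 20.
+ `top_logprobs` (integer): The number of most probable tokens for which to return log probabilities (chat completions only).
+ `reasoning_effort` (string): The reasoning effort level. Options: `low`, `high` (chat completions on Nova 2 Lite custom models only).
+ `allowed_token_ids` (array): A list of token IDs that the model is allowed to generate. Use it to restrict output to specific tokens.
+ `truncate_prompt_tokens` (integer): If the prompt exceeds the limit, truncates it to this number of tokens.
+ `stream_options` (object): Options for streaming responses. Contains the boolean `include_usage`, which adds token usage information to the streaming response.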
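
The documented ranges can be checked on the client before a request is sent. The sketch below is one possible pre-flight check, not part of the service API: the function name `validate_params` is hypothetical, and it covers only the ranges listed above.

```python
def validate_params(payload):
    """Return a list of range problems found in a request payload.

    An empty list means no documented range was violated.
    """
    errors = []
    is_chat = "messages" in payload  # chat format uses messages, text uses prompt

    def check_range(name, low=None, high=None):
        value = payload.get(name)
        if value is None:
            return
        if (low is not None and value < low) or (high is not None and value > high):
            errors.append(f"{name}={value!r} is outside the documented range")

    check_range("max_tokens", low=1)
    check_range("temperature", low=0.0, high=2.0)
    check_range("top_p", low=1e-10, high=1.0)
    check_range("top_k", low=-1)

    logprobs = payload.get("logprobs")
    if logprobs is not None:
        if is_chat:
            if not isinstance(logprobs, bool):
                errors.append("logprobs must be a boolean for chat completions")
        # bool is a subclass of int in Python, so reject it explicitly here.
        elif isinstance(logprobs, bool) or not isinstance(logprobs, int) or not 1 <= logprobs <= 20:
            errors.append("logprobs must be an integer from 1 to 20 for text completions")

    effort = payload.get("reasoning_effort")
    if effort is not None and effort not in ("low", "high"):
        errors.append("reasoning_effort must be 'low' or 'high'")
    return errors
```

For example, a text completion payload with `"temperature": 2.5` yields one error message, while a payload that stays within every listed range yields an empty list.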

## Response format
<a name="nova-sagemaker-inference-api-response"></a>

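The response format depends on the invocation method and the request type:

**Chat completion response (non-streaming)**

Returned for synchronous chat completion requests: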

```
{
  "id": "chatcmpl-123e4567-e89b-12d3-a456-426614174000",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "nova-micro-custom",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking. How can I help you today?",
        "refusal": null,
        "reasoning": null,
        "reasoning_content": null
      },
      "logprobs": {
        "content": [
          {
            "token": "Hello",
            "logprob": -0.31725305,
            "bytes": [72, 101, 108, 108, 111],
            "top_logprobs": [
              {
                "token": "Hello",
                "logprob": -0.31725305,
                "bytes": [72, 101, 108, 108, 111]
              },
              {
                "token": "Hi",
                "logprob": -1.3190403,
                "bytes": [72, 105]
              }
            ]
          }
        ]
      },
      "finish_reason": "stop",
      "stop_reason": null,
      "token_ids": [9906, 0, 358, 2157, 1049, 11, 1309, 345, 369, 6464, 13]
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  },
  "prompt_token_ids": [9906, 0, 358]
}
```
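
A response shaped like the one above can be reduced to the few fields most callers need. This is a small sketch that assumes the response has already been decoded into a Python dictionary; the function name `summarize_chat_response` and the keys of its return value are hypothetical.

```python
def summarize_chat_response(response):
    """Pull the assistant text, reasoning, finish reason, and token total
    out of a decoded non-streaming chat completion response."""
    choice = response["choices"][0]
    message = choice["message"]
    usage = response.get("usage") or {}  # tolerate a missing or null usage block
    return {
        "text": message.get("content"),
        "reasoning": message.get("reasoning"),
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }
```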

**Text completion response (non-streaming)**

Returned for synchronous text completion requests:

```
{
  "id": "cmpl-123e4567-e89b-12d3-a456-426614174000",
  "object": "text_completion",
  "created": 1677652288,
  "model": "nova-micro-custom",
  "choices": [
    {
      "index": 0,
      "text": "Paris, the capital and most populous city of France.",
      "logprobs": {
        "tokens": ["Paris", ",", " the", " capital"],
        "token_logprobs": [-0.31725305, -0.07918124, -0.12345678, -0.23456789],
        "top_logprobs": [
          {
            "Paris": -0.31725305,
            "London": -1.3190403,
            "Rome": -2.1234567
          },
          {
            ",": -0.07918124,
            " is": -1.2345678
          }
        ]
      },
      "finish_reason": "stop",
      "stop_reason": null,
      "prompt_token_ids": [464, 6864, 315, 4881, 374],
      "token_ids": [3915, 11, 279, 6864, 323, 1455, 95551, 3363, 315, 4881, 13]
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 11,
    "total_tokens": 16,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}
```
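
In the text completion format, `tokens` and `token_logprobs` are parallel arrays. The sketch below pairs them and converts each log probability to a probability with `exp`. It assumes the response is already a decoded dictionary; the function name `token_probabilities` is hypothetical.

```python
import math


def token_probabilities(response):
    """Return (token, probability) pairs for the first choice.

    Returns an empty list when log probabilities were not requested.
    """
    logprobs = response["choices"][0].get("logprobs")
    if not logprobs:
        return []
    # The two arrays are aligned by position, so zip pairs them directly.
    return [
        (token, math.exp(logprob))
        for token, logprob in zip(logprobs["tokens"], logprobs["token_logprobs"])
    ]
```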

**Chat completion streaming response**

Returned for streaming chat completion requests. The response arrives as server-sent events (SSE):

```
data: {
  "id": "chatcmpl-123e4567-e89b-12d3-a456-426614174000",
  "object": "chat.completion.chunk",
  "created": 1677652288,
  "model": "nova-micro-custom",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": "Hello",
        "refusal": null,
        "reasoning": null,
        "reasoning_content": null
      },
      "logprobs": {
        "content": [
          {
            "token": "Hello",
            "logprob": -0.31725305,
            "bytes": [72, 101, 108, 108, 111],
            "top_logprobs": [
              {
                "token": "Hello",
                "logprob": -0.31725305,
                "bytes": [72, 101, 108, 108, 111]
              }
            ]
          }
        ]
      },
      "finish_reason": null,
      "stop_reason": null
    }
  ],
  "usage": null,
  "prompt_token_ids": null
}

data: {
  "id": "chatcmpl-123e4567-e89b-12d3-a456-426614174000",
  "object": "chat.completion.chunk",
  "created": 1677652288,
  "model": "nova-micro-custom",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "! I'm"
      },
      "logprobs": null,
      "finish_reason": null,
      "stop_reason": null
    }
  ],
  "usage": null
}

data: {
  "id": "chatcmpl-123e4567-e89b-12d3-a456-426614174000",
  "object": "chat.completion.chunk",
  "created": 1677652288,
  "model": "nova-micro-custom",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}

data: [DONE]
```
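
A client has to reassemble these events from raw bytes, and a single event can be split across two network chunks. The following sketch buffers incoming bytes, decodes each `data:` line, and joins the `delta.content` fragments. It assumes that each event arrives as one `data:` line of compact JSON (the example above is pretty-printed for readability); the function names are hypothetical.

```python
import json

_DONE = object()  # sentinel for the "data: [DONE]" terminator


def _parse_line(line):
    """Decode one SSE line; return None for blank or non-data lines."""
    line = line.strip()
    if not line.startswith(b"data:"):
        return None
    data = line[len(b"data:"):].strip()
    if not data:
        return None
    if data == b"[DONE]":
        return _DONE
    return json.loads(data)


def iter_sse_events(chunks):
    """Yield decoded events from an iterable of byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Only complete lines are parsed; a partial line stays in the buffer.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            event = _parse_line(line)
            if event is _DONE:
                return
            if event is not None:
                yield event
    # Handle a final line that was not newline-terminated.
    event = _parse_line(buffer)
    if event is not None and event is not _DONE:
        yield event


def collect_chat_text(chunks):
    """Concatenate the delta.content fragments of a streamed chat completion."""
    parts = []
    for event in iter_sse_events(chunks):
        for choice in event.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

Buffering on newline boundaries is the main design choice here: it keeps the parser correct even when a chunk ends in the middle of a JSON object.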

**Text completion streaming response**

Returned for streaming text completion requests:

```
data: {
  "id": "cmpl-123e4567-e89b-12d3-a456-426614174000",
  "object": "text_completion",
  "created": 1677652288,
  "model": "nova-micro-custom",
  "choices": [
    {
      "index": 0,
      "text": "Paris",
      "logprobs": {
        "tokens": ["Paris"],
        "token_logprobs": [-0.31725305],
        "top_logprobs": [
          {
            "Paris": -0.31725305,
            "London": -1.3190403
          }
        ]
      },
      "finish_reason": null,
      "stop_reason": null
    }
  ],
  "usage": null
}

data: {
  "id": "cmpl-123e4567-e89b-12d3-a456-426614174000",
  "object": "text_completion",
  "created": 1677652288,
  "model": "nova-micro-custom",
  "choices": [
    {
      "index": 0,
      "text": ", the capital",
      "logprobs": null,
      "finish_reason": null,
      "stop_reason": null
    }
  ],
  "usage": null
}

data: {
  "id": "cmpl-123e4567-e89b-12d3-a456-426614174000",
  "object": "text_completion",
  "created": 1677652288,
  "model": "nova-micro-custom",
  "choices": [
    {
      "index": 0,
      "text": "",
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 11,
    "total_tokens": 16
  }
}

data: [DONE]
```
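
With boto3, `invoke_endpoint_with_response_stream` returns a `Body` event stream whose events carry raw bytes under `PayloadPart`. The sketch below reads that stream and yields the `text` fragments of a text completion. It rests on several assumptions that this reference does not confirm: the `application/json` content type, that `stream` must be set to `true` in the body, and that each event is a single `data:` line of compact JSON. The function name is hypothetical.

```python
import json


def stream_text_completion(client, endpoint_name, payload):
    """Yield text fragments from a streaming text completion request.

    `client` is expected to be a SageMaker Runtime client, for example
    boto3.client("sagemaker-runtime").
    """
    response = client.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        ContentType="application/json",  # assumption
        Body=json.dumps({**payload, "stream": True}),  # assumption
    )
    buffer = b""
    for event in response["Body"]:
        part = event.get("PayloadPart")
        if not part:
            continue
        buffer += part["Bytes"]
        # A payload part may end mid-line, so parse only complete lines.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            line = line.strip()
            if not line.startswith(b"data:"):
                continue
            data = line[len(b"data:"):].strip()
            if not data:
                continue
            if data == b"[DONE]":
                return
            for choice in json.loads(data).get("choices", []):
                if choice.get("text"):
                    yield choice["text"]
```

A caller could print fragments as they arrive with `for piece in stream_text_completion(client, "my-nova-endpoint", {"prompt": "The capital of France is"}): print(piece, end="")`, where `my-nova-endpoint` is a hypothetical endpoint name.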

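**Response fields**
+ `id`: A unique identifier for the completion.
+ `object`: The type of object returned (`chat.completion`, `text_completion`, or `chat.completion.chunk`).
+ `created`: The Unix timestamp at which the completion was generated.
+ `model`: The model used to generate the completion.
+ `choices`: An array of completion results.
+ `usage`: Token usage information, including prompt tokens, completion tokens, and total tokens.
+ `logprobs`: Log probability information for tokens (returned only when requested).
+ `finish_reason`: The reason the model stopped generating (`stop`, `length`, or `content_filter`).
+ `delta`: The incremental content in a streaming response.
+ `reasoning`: The reasoning content when `reasoning_effort` is used.
+ `token_ids`: An array of token IDs for the generated text.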
For complete API documentation, see the [InvokeEndpoint API reference](https://docs.aws.amazon.com//sagemaker/latest/APIReference/API_runtime_InvokeEndpoint.html) and the [InvokeEndpointWithResponseStream API reference](https://docs.aws.amazon.com//sagemaker/latest/APIReference/API_runtime_InvokeEndpointWithResponseStream.html).