# On-demand inference
<a name="on-demand-inference"></a>

On-demand inference provides serverless access to Amazon Nova models without provisioned capacity. This mode scales automatically to handle your workload, and you are charged based on usage.

## Benefits
<a name="on-demand-benefits"></a>

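On-demand inference offers several benefits:
+ **No capacity planning:** Scales automatically to match demand
+ **Pay-per-use:** You are charged only for the tokens processed
+ **Instant availability:** No provisioning or warm-up time required
+ **Cost-effective:** Well suited to variable or unpredictable workloads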

## Using on-demand inference
<a name="on-demand-usage"></a>

On-demand inference is the default mode for Amazon Nova models. Simply specify the model ID when you make an API call.

```
import boto3

bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')

response = bedrock.converse(
    modelId='us.amazon.nova-2-lite-v1:0',
    messages=[
        {
            'role': 'user',
            'content': [{'text': 'Hello, Nova!'}]
        }
    ]
)

# Print the response text
content_list = response["output"]["message"]["content"]
text = next((item["text"] for item in content_list if "text" in item), None)
if text is not None:
    print(text)
```

## Pricing
<a name="on-demand-pricing"></a>

On-demand inference is billed based on the number of input and output tokens processed. For current pricing details, see [Amazon Bedrock pricing](https://aws.amazon.com/bedrock/pricing/).
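
Because billing follows token counts, it can help to estimate the cost of a single call from the `usage` field that the Converse API returns alongside the output. The following is a minimal sketch rather than an official calculator: `estimate_cost` is a hypothetical helper, the sample response is hand-written, and the per-1,000-token prices are placeholders, not real Amazon Nova rates. Take actual rates from the pricing page.

```python
def estimate_cost(response, input_price_per_1k, output_price_per_1k):
    """Estimate the cost of one Converse call from its reported token usage.

    Hypothetical helper: the prices are supplied by the caller and are
    expressed per 1,000 tokens.
    """
    usage = response.get("usage", {})
    input_tokens = usage.get("inputTokens", 0)
    output_tokens = usage.get("outputTokens", 0)
    input_cost = (input_tokens / 1000) * input_price_per_1k
    output_cost = (output_tokens / 1000) * output_price_per_1k
    return input_cost + output_cost


# Hand-written fragment shaped like a Converse response (assumption for illustration)
sample_response = {
    "usage": {"inputTokens": 1200, "outputTokens": 300, "totalTokens": 1500}
}

# Placeholder prices, NOT actual Amazon Nova pricing
cost = estimate_cost(sample_response, input_price_per_1k=0.001, output_price_per_1k=0.004)
print(f"Estimated cost: ${cost:.6f}")
```

In practice, you would pass the `response` returned by `bedrock.converse(...)` in place of `sample_response`.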

## Quotas and limits
<a name="on-demand-limits"></a>

On-demand inference has default quotas that vary by model and Region. To request a quota increase, use the [Service Quotas console](https://console.aws.amazon.com/servicequotas/).
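
When a request exceeds a quota, the service typically rejects it with a throttling error, so client code often retries with exponential backoff. The following is a minimal sketch under stated assumptions: `call_with_backoff` is a hypothetical helper, not part of boto3, and it assumes the error carries a botocore-style `response["Error"]["Code"]` value of `ThrottlingException`. The retry count and delays are illustrative defaults.

```python
import random
import time

# Error codes treated as retryable (assumption: throttling surfaces under this code)
RETRYABLE_CODES = {"ThrottlingException"}


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying throttling errors with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # botocore's ClientError exposes the parsed error as exc.response
            response = getattr(exc, "response", None)
            code = response.get("Error", {}).get("Code") if isinstance(response, dict) else None
            # Re-raise anything that is not throttling, or when attempts are exhausted
            if code not in RETRYABLE_CODES or attempt == max_attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))
```

For example, `call_with_backoff(lambda: bedrock.converse(modelId=..., messages=...))` would wrap the call shown earlier. boto3 also has built-in retry modes that can be configured through `botocore.config.Config`, which may be sufficient without a custom helper.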