
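Adaptive thinking

Adaptive thinking is the recommended way to use extended thinking with Claude Opus 4.6. Instead of manually setting a thinking token budget, adaptive thinking lets Claude dynamically decide when and how much to think based on the complexity of each request. Adaptive thinking reliably delivers better performance than extended thinking with a fixed budget_tokens, so we recommend moving to adaptive thinking to get the most intelligent responses from Claude Opus 4.6. No beta header is required.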

The following models are supported.

Model	Model ID
Claude Opus 4.7	`anthropic.claude-opus-4-7`
Claude Mythos Preview	`anthropic.claude-mythos-preview`
Claude Opus 4.6	`anthropic.claude-opus-4-6-v1`
Claude Sonnet 4.6	`anthropic.claude-sonnet-4-6`

Note

Claude Opus 4.7 and Claude Mythos Preview support adaptive thinking only. Manual extended thinking (thinking.type: "enabled" with budget_tokens) is not supported on these models and returns a 400 error.

thinking.type: "enabled" and budget_tokens are deprecated on Claude Opus 4.6 and Claude Sonnet 4.6 and will be removed in a future model release. Use thinking.type: "adaptive" together with the effort parameter instead.

Older models (Claude Sonnet 4.5, Claude Opus 4.5, and so on) do not support adaptive thinking and require thinking.type: "enabled" with budget_tokens.
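
The support rules in this note can be sketched as a small lookup. This is a minimal illustration, not part of any SDK: the helper name thinking_config_for, the default budget of 4000 tokens, and the handling of a cross-Region prefix such as "us." are all assumptions made for the example. The model IDs come from the table above.

```python
# Model IDs copied from the supported-models table.
ADAPTIVE_ONLY = {
    "anthropic.claude-opus-4-7",
    "anthropic.claude-mythos-preview",
}
ADAPTIVE_RECOMMENDED = {
    "anthropic.claude-opus-4-6-v1",
    "anthropic.claude-sonnet-4-6",
}


def thinking_config_for(model_id: str, budget_tokens: int = 4000) -> dict:
    """Hypothetical helper: pick a `thinking` object for a model ID.

    The default budget_tokens value is an arbitrary placeholder.
    """
    # Assumption: drop an inference-profile prefix such as "us." so that
    # "us.anthropic.claude-opus-4-6-v1" matches the table entry.
    start = model_id.find("anthropic.")
    base_id = model_id[start:] if start >= 0 else model_id

    if base_id in ADAPTIVE_ONLY or base_id in ADAPTIVE_RECOMMENDED:
        return {"type": "adaptive"}
    # Older models: manual extended thinking with a fixed budget.
    return {"type": "enabled", "budget_tokens": budget_tokens}
```

Any model ID that is not in the table falls through to the manual configuration, which matches the statement that older models require thinking.type: "enabled" with budget_tokens.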

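How adaptive thinking works

In adaptive mode, Claude evaluates the complexity of each request and decides whether to think and how much. At the default effort level (high), Claude almost always thinks. At lower effort levels, Claude may skip thinking for simpler problems.

Adaptive thinking also automatically enables interleaved thinking (beta). This means Claude can think between tool calls, which makes it especially effective for agentic workflows.

Set thinking.type to "adaptive" in your API request.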

CLI


aws bedrock-runtime invoke-model \
--model-id "us.anthropic.claude-opus-4-6-v1" \
--body '{
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 16000,
"thinking": {
"type": "adaptive"
},
"messages": [
{
"role": "user",
"content": "Three players A, B, C play a game. Each has a jar with 100 balls numbered 1-100. Simultaneously, each draws one ball. A beats B if As number > Bs number (mod 100, treating 100 as 0 for comparison). Similarly for B vs C and C vs A. The overall winner is determined by majority of pairwise wins (ties broken randomly). Is there a mixed strategy Nash equilibrium where each player draws uniformly? If not, characterize the equilibrium."
}
]
}' \
--cli-binary-format raw-in-base64-out \
output.json && cat output.json | jq '.content[] | {type, thinking: .thinking[0:200], text}'

Python


import boto3
import json

bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-2'
)

response = bedrock_runtime.invoke_model(
    modelId="us.anthropic.claude-opus-4-6-v1",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 16000,
        "thinking": {
            "type": "adaptive"
        },
        "messages": [{
            "role": "user",
            "content": "Explain why the sum of two even numbers is always even."
        }]
    })
)

response_body = json.loads(response["body"].read())

for block in response_body["content"]:
    if block["type"] == "thinking":
        print(f"\nThinking: {block['thinking']}")
    elif block["type"] == "text":
        print(f"\nResponse: {block['text']}")

TypeScript


import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

async function main() {
    const client = new BedrockRuntimeClient({});

    const command = new InvokeModelCommand({
        modelId: "us.anthropic.claude-opus-4-6-v1",
        body: JSON.stringify({
            anthropic_version: "bedrock-2023-05-31",
            max_tokens: 16000,
            thinking: {
                type: "adaptive"
            },
            messages: [{
                role: "user",
                content: "Explain why the sum of two even numbers is always even."
            }]
        })
    });

    const response = await client.send(command);
    const responseBody = JSON.parse(new TextDecoder().decode(response.body));

    for (const block of responseBody.content) {
        if (block.type === "thinking") {
            console.log(`\nThinking: ${block.thinking}`);
        } else if (block.type === "text") {
            console.log(`\nResponse: ${block.text}`);
        }
    }
}

main().catch(console.error);

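Adaptive thinking with the effort parameter

You can combine adaptive thinking with the effort parameter to guide how much Claude thinks. The effort level acts as a soft guideline for Claude's thinking allocation.

Effort level	Thinking behavior
`max`	Claude always thinks, with no constraint on thinking depth. Claude Opus 4.6 only; requests that use `max` on other models return an error.
`high` (default)	Claude always thinks. Provides deep reasoning for complex tasks.
`medium`	Claude uses moderate thinking. May skip thinking for very simple queries.
`low`	Claude minimizes thinking. Skips thinking for simple tasks where speed matters most.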

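Important

The effort parameter must be placed inside a separate output_config object in the request body, not inside the thinking object. Placing effort inside thinking results in a ValidationException.

The following example shows how to set the effort level when using the InvokeModel API.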


{
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 16000,
    "thinking": {
        "type": "adaptive"
    },
    "output_config": {
        "effort": "high"
    },
    "messages": [{
        "role": "user",
        "content": "Your prompt here"
    }]
}
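
A Python sketch of the same request body follows, with a basic check on the effort value before the request is sent. The function name build_body and the client-side validation are assumptions added for illustration; the field names and the four effort levels come from the text above.

```python
import json

# Effort levels listed in the table above.
EFFORT_LEVELS = {"low", "medium", "high", "max"}


def build_body(prompt: str, effort: str = "high", max_tokens: int = 16000) -> str:
    """Hypothetical helper: build an InvokeModel body with adaptive thinking."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"Unknown effort level: {effort!r}")
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        # `thinking` carries only the mode; effort does not go here.
        "thinking": {"type": "adaptive"},
        # effort lives in its own top-level output_config object.
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(body)
```

The returned string can be passed as the body argument of invoke_model, as in the Python example earlier on this page. The check does not cover the rule that `max` is limited to Claude Opus 4.6; the service reports that case as an error.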

Using adaptive thinking with the Converse API

When you use the Converse API, pass the thinking and effort parameters inside additionalModelRequestFields. The following example shows adaptive thinking at the default effort level.


import boto3, json

bedrock_runtime = boto3.client(service_name='bedrock-runtime', region_name='us-east-2')

response = bedrock_runtime.converse(
    modelId="us.anthropic.claude-opus-4-6-v1",
    messages=[{
        "role": "user",
        "content": [{"text": "Explain why the sum of two even numbers is always even."}]
    }],
    additionalModelRequestFields={
        "thinking": {
            "type": "adaptive"
        }
    }
)

print(json.dumps(response["output"], indent=2, default=str))

To specify an effort level, add the effort field inside a separate output_config object in additionalModelRequestFields.


# Reuses the bedrock_runtime client created in the previous example.
response = bedrock_runtime.converse(
    modelId="us.anthropic.claude-opus-4-6-v1",
    messages=[{
        "role": "user",
        "content": [{"text": "What is 2 + 2?"}]
    }],
    additionalModelRequestFields={
        "thinking": {
            "type": "adaptive"
        },
        "output_config": {
            "effort": "low"
        }
    }
)
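
To read the result, the thinking and the final answer can be separated from the Converse response. The sketch below assumes the response follows the Converse content-block shape, where reasoning arrives in a reasoningContent block and the answer in a text block. The helper name split_converse_content is hypothetical, and the sample dictionary is hand-written for illustration, not real model output.

```python
def split_converse_content(response: dict) -> tuple[list[str], list[str]]:
    """Hypothetical helper: return (thinking_texts, answer_texts)."""
    thinking, answer = [], []
    for block in response["output"]["message"]["content"]:
        if "reasoningContent" in block:
            # Assumed shape: {"reasoningContent": {"reasoningText": {"text": ...}}}
            reasoning_text = block["reasoningContent"].get("reasoningText", {})
            if "text" in reasoning_text:
                thinking.append(reasoning_text["text"])
        elif "text" in block:
            answer.append(block["text"])
    return thinking, answer


# Hand-written sample in the assumed response shape (not real output).
sample_response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {"reasoningContent": {"reasoningText": {
                    "text": "2 + 2 is basic arithmetic.",
                    "signature": "placeholder",
                }}},
                {"text": "4"},
            ],
        }
    }
}

thinking_parts, answer_parts = split_converse_content(sample_response)
```

At a low effort level Claude may skip thinking entirely, so callers should expect thinking_parts to be empty for simple prompts.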

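Prompt caching

Consecutive requests that use adaptive thinking preserve prompt cache breakpoints. However, switching between adaptive and enabled/disabled thinking modes breaks cache breakpoints for messages. System prompts and tool definitions remain cached regardless of mode changes.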
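
One way to reason about this rule in application code is to compare the thinking mode of two consecutive request bodies. The following is only a sketch of the rule as stated here: the function names are hypothetical, treating a missing thinking object as "disabled" is an assumption, and a switch between enabled and disabled is reported as unknown because this section does not describe it.

```python
from typing import Optional


def thinking_mode(body: dict) -> str:
    # Assumption: a request without a `thinking` object counts as "disabled".
    return body.get("thinking", {}).get("type", "disabled")


def message_cache_survives(previous: dict, current: dict) -> Optional[bool]:
    """Hypothetical check: do message cache breakpoints survive this request pair?"""
    prev_adaptive = thinking_mode(previous) == "adaptive"
    cur_adaptive = thinking_mode(current) == "adaptive"
    if prev_adaptive and cur_adaptive:
        return True   # consecutive adaptive requests preserve breakpoints
    if prev_adaptive != cur_adaptive:
        return False  # switching to or from adaptive breaks message breakpoints
    return None       # neither request is adaptive: not covered by this section
```

A result of False applies to message breakpoints only; system prompts and tool definitions stay cached across mode changes.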

Tuning thinking behavior

If Claude thinks more often or less often than you want, you can add guidance to your system prompt.


Extended thinking adds latency and should only be used when it
will meaningfully improve answer quality — typically for problems
that require multi-step reasoning. When in doubt, respond directly.

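Warning

Steering Claude to think less often may reduce quality on tasks that benefit from reasoning. Measure the impact on your specific workloads before deploying prompt-based tuning to production. Consider testing with lower effort levels first.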
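
As a sketch of how such guidance might be attached to a request, the example below places it in the top-level system field of an InvokeModel body. The function name build_tuned_body is hypothetical, and the guidance string is a lightly adapted copy of the sample prompt above.

```python
import json

# Adapted from the sample system prompt in this section.
THINKING_GUIDANCE = (
    "Extended thinking adds latency and should only be used when it "
    "will meaningfully improve answer quality, typically for problems "
    "that require multi-step reasoning. When in doubt, respond directly."
)


def build_tuned_body(prompt: str, guidance: str = THINKING_GUIDANCE) -> str:
    """Hypothetical helper: adaptive thinking plus system-prompt guidance."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 16000,
        "system": guidance,  # guidance travels as the system prompt
        "thinking": {"type": "adaptive"},
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(body)
```

Comparing responses produced with and without the guidance on a representative set of prompts is one way to measure the impact before deploying to production.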
