
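Compaction

Tip

Server-side compaction is the recommended way to manage context in long-running conversations and agentic workflows, because it handles context management automatically with minimal integration work.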

Note

Compaction is currently in beta. To use this feature, include the beta header compact-2026-01-12 in your API requests. Compaction is not currently supported in the Converse API, but it is supported with InvokeModel.

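Compaction extends the effective context length for long-running conversations and tasks by automatically summarizing older context as it approaches the context window limit. It is a good fit for:

Chat-based, multi-turn conversations where you want users to stay in a single chat for a long time
Task-oriented prompts that require a lot of follow-up work (often tool use) and may exceed the 200K context window

Compaction is supported on the following models: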

Model	Model ID
Claude Sonnet 4.6	`anthropic.claude-sonnet-4-6`
Claude Opus 4.6	`anthropic.claude-opus-4-6-v1`

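Note

The top-level input_tokens and output_tokens in the usage field do not include compaction iteration usage; they reflect the sum of all non-compaction iterations. To calculate the total tokens consumed and billed for a request, sum across all entries in the usage.iterations array.

If you previously relied on usage.input_tokens and usage.output_tokens for cost tracking or auditing, update your tracking logic to aggregate across usage.iterations when compaction is enabled. The iterations array is only present when a new compaction is triggered during the request. Re-applying a previous compaction block incurs no additional compaction cost, and the top-level usage fields remain accurate in that case.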
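
The aggregation described in this note can be sketched as a small helper. This is a minimal illustration, assuming the usage object has already been parsed into a Python dict; the function name total_billed_tokens is hypothetical and not part of any SDK. The sample numbers are taken from the usage example later on this page.

```python
from typing import Any, Dict


def total_billed_tokens(usage: Dict[str, Any]) -> Dict[str, int]:
    """Sum input and output tokens across every entry in usage["iterations"].

    Falls back to the top-level fields when no iterations array is present,
    which is the case when no new compaction was triggered during the request.
    """
    iterations = usage.get("iterations")
    if not iterations:
        return {
            "input_tokens": usage.get("input_tokens", 0),
            "output_tokens": usage.get("output_tokens", 0),
        }
    return {
        "input_tokens": sum(it.get("input_tokens", 0) for it in iterations),
        "output_tokens": sum(it.get("output_tokens", 0) for it in iterations),
    }


sample_usage = {
    "input_tokens": 45000,
    "output_tokens": 1234,
    "iterations": [
        {"type": "compaction", "input_tokens": 180000, "output_tokens": 3500},
        {"type": "message", "input_tokens": 23000, "output_tokens": 1000},
    ],
}
print(total_billed_tokens(sample_usage))
# {'input_tokens': 203000, 'output_tokens': 4500}
```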

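How compaction works

When compaction is enabled, Claude automatically summarizes the conversation when it reaches the configured token threshold. The API:

Detects when input tokens exceed the specified trigger threshold.
Generates a summary of the current conversation.
Creates a compaction block containing the summary.
Continues the response with the compacted context.

On subsequent requests, append the response to your messages. The API automatically drops all message blocks that precede the compaction block and continues the conversation from the summary.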
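
To make the dropping behavior concrete, the following sketch mimics it on the client side. It is only an illustration of the effect described above: the API performs this step itself, so you do not need to trim messages before sending them. The function name trim_to_last_compaction is hypothetical.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def trim_to_last_compaction(messages: List[Message]) -> List[Message]:
    """Return the history with everything before the latest compaction block removed.

    If no compaction block exists, the history is returned unchanged (as a copy).
    """
    # Walk backwards so the most recent compaction block wins.
    for msg_index in range(len(messages) - 1, -1, -1):
        content = messages[msg_index].get("content")
        if not isinstance(content, list):
            continue  # plain-string content cannot hold a compaction block
        for block_index in range(len(content) - 1, -1, -1):
            block = content[block_index]
            if isinstance(block, dict) and block.get("type") == "compaction":
                # Keep the compaction block and whatever follows it.
                first = {**messages[msg_index], "content": content[block_index:]}
                return [first] + messages[msg_index + 1:]
    return list(messages)


history = [
    {"role": "user", "content": "Help me build a website"},
    {"role": "assistant", "content": [
        {"type": "compaction", "content": "Summary of the conversation..."},
        {"type": "text", "text": "Based on our conversation so far..."},
    ]},
    {"role": "user", "content": "Add a contact form"},
]
print(len(trim_to_last_compaction(history)))  # 2
```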

Basic usage

Enable compaction by adding the compact_20260112 strategy to context_management.edits in your Messages API request.

CLI


aws bedrock-runtime invoke-model \
    --model-id "us.anthropic.claude-opus-4-6-v1" \
    --body '{
        "anthropic_version": "bedrock-2023-05-31",
        "anthropic_beta": ["compact-2026-01-12"],
        "max_tokens": 4096,
        "messages": [
            {
                "role": "user",
                "content": "Help me build a website"
            }
        ],
        "context_management": {
            "edits": [
                {
                    "type": "compact_20260112"
                }
            ]
        }
    }' \
    --cli-binary-format raw-in-base64-out \
    /tmp/response.json

echo "Response:"
cat /tmp/response.json | jq '.content[] | {type, text: .text[0:500]}'

Python


import boto3
import json

bedrock_runtime = boto3.client(service_name='bedrock-runtime')

messages = [{"role": "user", "content": "Help me build a website"}]

response = bedrock_runtime.invoke_model(
    modelId="us.anthropic.claude-opus-4-6-v1",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "anthropic_beta": ["compact-2026-01-12"],
        "max_tokens": 4096,
        "messages": messages,
        "context_management": {
            "edits": [
                {
                    "type": "compact_20260112"
                }
            ]
        }
    })
)

response_body = json.loads(response["body"].read())

# Append the response (including any compaction block) to continue the conversation
messages.append({"role": "assistant", "content": response_body["content"]})

for block in response_body["content"]:
    if block.get("type") == "compaction":
        print(f"[COMPACTION]: {block['content'][:200]}...")
    elif block.get("type") == "text":
        print(f"[RESPONSE]: {block['text']}")

TypeScript


import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

async function main() {
    const client = new BedrockRuntimeClient({});

    const messages: Array<{role: string, content: string | object[]}> = [
        { role: "user", content: "Help me build a website" }
    ];

    const command = new InvokeModelCommand({
        modelId: "us.anthropic.claude-opus-4-6-v1",
        body: JSON.stringify({
            anthropic_version: "bedrock-2023-05-31",
            anthropic_beta: ["compact-2026-01-12"],
            max_tokens: 4096,
            messages,
            context_management: {
                edits: [
                    {
                        type: "compact_20260112"
                    }
                ]
            }
        })
    });

    const response = await client.send(command);
    const responseBody = JSON.parse(new TextDecoder().decode(response.body));

    // Append response to continue conversation
    messages.push({ role: "assistant", content: responseBody.content });

    for (const block of responseBody.content) {
        if (block.type === "compaction") {
            console.log(`[COMPACTION]: ${block.content.substring(0, 200)}...`);
        } else if (block.type === "text") {
            console.log(`[RESPONSE]: ${block.text}`);
        }
    }
}

main().catch(console.error);
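
The samples above send a single request. For a multi-turn conversation, the same request can be wrapped in a helper that keeps appending to one message list. This is a minimal sketch: send_turn is a hypothetical name, and the client argument is assumed to be a bedrock-runtime client like the one created in the Python sample.

```python
import json
from typing import Any, Dict, List


def send_turn(client: Any, model_id: str,
              messages: List[Dict[str, Any]], user_text: str) -> Dict[str, Any]:
    """Send one user turn with compaction enabled and record the reply in messages."""
    messages.append({"role": "user", "content": user_text})
    response = client.invoke_model(
        modelId=model_id,
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "anthropic_beta": ["compact-2026-01-12"],
            "max_tokens": 4096,
            "messages": messages,
            "context_management": {"edits": [{"type": "compact_20260112"}]},
        }),
    )
    response_body = json.loads(response["body"].read())
    # Append the full content, including any compaction block, so the next
    # request continues from the summary.
    messages.append({"role": "assistant", "content": response_body["content"]})
    return response_body
```

Calling send_turn repeatedly with the same messages list carries the conversation forward; any compaction block returned by the API stays in the history for later requests.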

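Parameters

Parameter	Type	Default	Description
`type`	string	Required	Must be `"compact_20260112"`.
`trigger`	object	150,000 tokens	When to trigger compaction. Must be at least 50,000 tokens.
`pause_after_compaction`	boolean	`false`	Whether to pause after generating the compaction summary.
`instructions`	string	`null`	Custom summarization prompt. Completely replaces the default prompt when provided.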
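
The parameters in the table can be assembled with a small builder that also enforces the documented 50,000-token minimum before a request is sent. This is a sketch only: build_compaction_edit and MIN_TRIGGER_TOKENS are hypothetical names, and the check is a client-side convenience, not a description of how the API validates input.

```python
from typing import Any, Dict, Optional

MIN_TRIGGER_TOKENS = 50_000  # minimum stated in the parameters table


def build_compaction_edit(trigger_tokens: Optional[int] = None,
                          pause_after_compaction: bool = False,
                          instructions: Optional[str] = None) -> Dict[str, Any]:
    """Build one entry for context_management.edits, omitting unset options."""
    edit: Dict[str, Any] = {"type": "compact_20260112"}
    if trigger_tokens is not None:
        if trigger_tokens < MIN_TRIGGER_TOKENS:
            raise ValueError(
                f"trigger must be at least {MIN_TRIGGER_TOKENS} tokens, got {trigger_tokens}"
            )
        edit["trigger"] = {"type": "input_tokens", "value": trigger_tokens}
    if pause_after_compaction:
        edit["pause_after_compaction"] = True
    if instructions is not None:
        edit["instructions"] = instructions
    return edit


context_management = {"edits": [build_compaction_edit(trigger_tokens=100_000)]}
print(context_management)
```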

Trigger configuration

Use the trigger parameter to configure when compaction is triggered.


import boto3
import json

bedrock_runtime = boto3.client(service_name='bedrock-runtime')

response = bedrock_runtime.invoke_model(
    modelId="us.anthropic.claude-opus-4-6-v1",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "anthropic_beta": ["compact-2026-01-12"],
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": "Help me build a website"}],
        "context_management": {
            "edits": [
                {
                    "type": "compact_20260112",
                    "trigger": {
                        "type": "input_tokens",
                        "value": 100000
                    }
                }
            ]
        }
    })
)

response_body = json.loads(response["body"].read())
print(response_body["content"][-1]["text"])

Custom summarization instructions

By default, compaction uses the following summarization prompt:


You have written a partial transcript for the initial task above. Please write a summary of the transcript. The purpose of this summary is to provide continuity so you can continue to make progress towards solving the task in a future context, where the raw history above may not be accessible and will be replaced with this summary. Write down anything that would be helpful, including the state, next steps, learnings etc. You must wrap your summary in a <summary></summary> block.

You can replace this prompt entirely by providing custom instructions through the instructions parameter. Custom instructions do not supplement the default prompt; they completely replace it.


import boto3
import json

bedrock_runtime = boto3.client(service_name='bedrock-runtime')

response = bedrock_runtime.invoke_model(
    modelId="us.anthropic.claude-opus-4-6-v1",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "anthropic_beta": ["compact-2026-01-12"],
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": "Help me build a website"}],
        "context_management": {
            "edits": [
                {
                    "type": "compact_20260112",
                    "instructions": "Focus on preserving code snippets, variable names, and technical decisions."
                }
            ]
        }
    })
)

response_body = json.loads(response["body"].read())
print(response_body["content"][-1]["text"])

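Pausing after compaction

Use pause_after_compaction to pause the API after it generates the compaction summary. This lets you add additional content blocks (for example, to preserve recent messages or specific instruction-oriented messages) before the API continues with the response.

When enabled, the API returns a message with the compaction stop reason after generating the compaction block.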


import boto3
import json

bedrock_runtime = boto3.client(service_name='bedrock-runtime')

messages = [{"role": "user", "content": "Help me build a website"}]

response = bedrock_runtime.invoke_model(
    modelId="us.anthropic.claude-opus-4-6-v1",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "anthropic_beta": ["compact-2026-01-12"],
        "max_tokens": 4096,
        "messages": messages,
        "context_management": {
            "edits": [
                {
                    "type": "compact_20260112",
                    "pause_after_compaction": True
                }
            ]
        }
    })
)

response_body = json.loads(response["body"].read())

# Check if compaction triggered a pause
if response_body.get("stop_reason") == "compaction":
    # Response contains only the compaction block
    messages.append({"role": "assistant", "content": response_body["content"]})

    # Continue the request
    response = bedrock_runtime.invoke_model(
        modelId="us.anthropic.claude-opus-4-6-v1",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "anthropic_beta": ["compact-2026-01-12"],
            "max_tokens": 4096,
            "messages": messages,
            "context_management": {
                "edits": [{"type": "compact_20260112"}]
            }
        })
    )
    response_body = json.loads(response["body"].read())

print(response_body["content"][-1]["text"])

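Working with compaction blocks

When compaction is triggered, the API returns a compaction block at the start of the assistant response.

A long-running conversation may result in multiple compactions. The last compaction block reflects the final state of the prompt and replaces the content before it with the generated summary.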


{
  "content": [
    {
      "type": "compaction",
      "content": "Summary of the conversation: The user requested help building a web scraper..."
    },
    {
      "type": "text",
      "text": "Based on our conversation so far..."
    }
  ]
}

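Streaming

When you stream responses with compaction enabled, you receive a content_block_start event when compaction begins. The compaction block streams differently from text blocks: you receive a content_block_start event, then a single content_block_delta containing the complete summary content (no intermediate streaming), and then a content_block_stop event.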
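
The event sequence above can be handled with a small collector. This sketch assumes the stream events have already been decoded into Python dicts. It also assumes, without confirmation from this page, that the single compaction delta carries the summary in a content field and that text deltas use the text_delta type with a text field. The function name collect_stream is hypothetical.

```python
from typing import Any, Dict, Iterable, Optional, Tuple


def collect_stream(events: Iterable[Dict[str, Any]]) -> Tuple[Optional[str], str]:
    """Return (compaction_summary, response_text) from decoded stream events."""
    summary: Optional[str] = None
    text_parts = []
    block_types: Dict[int, str] = {}  # content block index -> block type

    for event in events:
        event_type = event.get("type")
        if event_type == "content_block_start":
            block_types[event["index"]] = event["content_block"]["type"]
        elif event_type == "content_block_delta":
            delta = event["delta"]
            if block_types.get(event["index"]) == "compaction":
                # Assumption: the one compaction delta holds the whole summary
                # under "content".
                summary = delta.get("content")
            elif delta.get("type") == "text_delta":
                text_parts.append(delta["text"])
        # content_block_stop needs no handling in this sketch.
    return summary, "".join(text_parts)


sample_events = [
    {"type": "content_block_start", "index": 0, "content_block": {"type": "compaction"}},
    {"type": "content_block_delta", "index": 0, "delta": {"content": "Summary so far"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_start", "index": 1, "content_block": {"type": "text"}},
    {"type": "content_block_delta", "index": 1, "delta": {"type": "text_delta", "text": "Hello"}},
    {"type": "content_block_delta", "index": 1, "delta": {"type": "text_delta", "text": " again"}},
    {"type": "content_block_stop", "index": 1},
]
print(collect_stream(sample_events))  # ('Summary so far', 'Hello again')
```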

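Prompt caching

You can add a cache_control breakpoint to a compaction block to cache the full system prompt together with the summarized content. The original compacted content is ignored. When compaction is triggered, a cache miss may occur on the subsequent request.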


{
    "role": "assistant",
    "content": [
        {
            "type": "compaction",
            "content": "[summary text]",
            "cache_control": {"type": "ephemeral"}
        },
        {
            "type": "text",
            "text": "Based on our conversation..."
        }
    ]
}
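
A breakpoint like the one shown above can be added to a stored assistant message before it is sent back. This is a minimal sketch; mark_compaction_for_caching is a hypothetical helper name, and it returns a copy so the original history is left untouched.

```python
import copy
from typing import Any, Dict


def mark_compaction_for_caching(message: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the message with cache_control set on each compaction block."""
    marked = copy.deepcopy(message)
    content = marked.get("content")
    if isinstance(content, list):
        for block in content:
            if isinstance(block, dict) and block.get("type") == "compaction":
                block["cache_control"] = {"type": "ephemeral"}
    return marked


assistant_message = {
    "role": "assistant",
    "content": [
        {"type": "compaction", "content": "[summary text]"},
        {"type": "text", "text": "Based on our conversation..."},
    ],
}
print(mark_compaction_for_caching(assistant_message)["content"][0])
```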

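Understanding usage

Compaction requires an additional sampling step, which counts toward rate limits and billing. The API returns detailed usage information in the response.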


{
  "usage": {
    "input_tokens": 45000,
    "output_tokens": 1234,
    "iterations": [
      {
        "type": "compaction",
        "input_tokens": 180000,
        "output_tokens": 3500
      },
      {
        "type": "message",
        "input_tokens": 23000,
        "output_tokens": 1000
      }
    ]
  }
}

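The iterations array shows the usage for each sampling iteration. When compaction occurs, a compaction iteration appears, followed by the main message iteration. The token counts of the final iteration reflect the effective context size after compaction.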
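
To see how much of a request went to compaction versus the main response, the iterations can be grouped by type. This is a sketch under the assumption that usage has been parsed into a dict; usage_by_iteration_type and final_iteration are hypothetical helper names.

```python
from typing import Any, Dict, Optional


def usage_by_iteration_type(usage: Dict[str, Any]) -> Dict[str, Dict[str, int]]:
    """Group token counts from usage["iterations"] by iteration type."""
    totals: Dict[str, Dict[str, int]] = {}
    for iteration in usage.get("iterations") or []:
        bucket = totals.setdefault(
            iteration.get("type", "unknown"),
            {"input_tokens": 0, "output_tokens": 0},
        )
        bucket["input_tokens"] += iteration.get("input_tokens", 0)
        bucket["output_tokens"] += iteration.get("output_tokens", 0)
    return totals


def final_iteration(usage: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the last iteration, whose counts reflect the post-compaction context."""
    iterations = usage.get("iterations") or []
    return iterations[-1] if iterations else None


usage = {
    "iterations": [
        {"type": "compaction", "input_tokens": 180000, "output_tokens": 3500},
        {"type": "message", "input_tokens": 23000, "output_tokens": 1000},
    ]
}
print(usage_by_iteration_type(usage)["compaction"])
# {'input_tokens': 180000, 'output_tokens': 3500}
print(final_iteration(usage)["input_tokens"])  # 23000
```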
