

# What are Automated Reasoning checks in Amazon Bedrock Guardrails?

## What Automated Reasoning checks do


A key challenge with large language models (LLMs) is ensuring the accuracy of their responses. Without validation, LLMs can produce hallucinations or inaccurate information that undermines trust. Automated Reasoning checks in Amazon Bedrock Guardrails help solve this problem by using mathematical techniques to validate natural language content against policies you define.

Unlike traditional guardrail components that block or filter content based on pattern matching, Automated Reasoning checks use formal logic to provide structured feedback about *why* a response is correct or incorrect. This feedback can be used to steer an LLM toward generating content that is provably consistent with your policy. Specifically, Automated Reasoning checks can:
+ **Detect factually incorrect statements** in LLM responses by mathematically proving that generated content contradicts your policy rules.
+ **Highlight unstated assumptions** where a response is consistent with your policy but doesn't address all relevant rules, indicating the response may be incomplete.
+ **Provide mathematically verifiable explanations** for why accurate statements are correct, citing the specific policy rules and variable assignments that support the conclusion.

These capabilities make Automated Reasoning checks different from other Amazon Bedrock Guardrails components. Content filters and topic policies act as binary gates — they block or allow content. Automated Reasoning checks act as a verification layer that provides detailed, actionable feedback you can use to improve responses programmatically.

## When to use Automated Reasoning checks


Automated Reasoning checks are most valuable when you need to demonstrate the factual basis for an LLM's response. Consider using them when your application involves:
+ **Regulated industries** such as healthcare, human resources, and financial services, where incorrect information can have legal or compliance consequences.
+ **Complex rule sets** such as mortgage approvals, zoning laws, insurance eligibility, or employee benefits, where multiple conditions interact to determine an outcome.
+ **Compliance scenarios** that require auditable AI responses with mathematically verifiable proof that the response is consistent with your policies.
+ **Customer-facing applications** where incorrect guidance could erode trust, such as chatbots that answer questions about company policies, product eligibility, or service terms.

## What Automated Reasoning checks don't do


To set the right expectations, be aware of the following limitations:
+ **No prompt injection protection.** Automated Reasoning checks validate exactly what you send them. If malicious or manipulated content is provided as input, the validation is performed on that content as-is. To detect and block prompt injection attacks, use [Content filters](https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-components.html#guardrails-content-filters) in combination with Automated Reasoning checks.
+ **No off-topic detection.** Automated Reasoning only analyzes text that is relevant to the policy. It ignores unrelated content and cannot tell you whether a response is off-topic. To detect off-topic responses, use [topic policies](https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-components.html#guardrails-topic-policies).
+ **No streaming support.** Automated Reasoning checks do not support streaming APIs. You must validate complete responses.
+ **English only.** Automated Reasoning checks currently support English (US) only.
+ **Scope limited to your policy.** A `VALID` result guarantees validity only for the parts of the input captured through policy variables. Statements that fall outside the scope of your policy's variables are not validated. For example, "I can submit my homework late because I have a fake doctor's note" might be deemed valid if the policy has no variable to capture whether the doctor's note is fake.

Automated Reasoning checks complement other Amazon Bedrock Guardrails features like content filters and topic policies. For the best protection, use them together. For more information, see [Guardrail components](https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-components.html).

## End-to-end workflow overview


Using Automated Reasoning checks involves four phases: creating a policy, testing it, deploying it in a guardrail, and integrating it into your application.

```
Source Document ──► Extracted Policy ──► Testing ──► Deployment ──► Integration
    (rules)          (formal logic)      (verify)    (guardrail)    (validate responses
                                                                     and act on feedback)
```

1. **Create a policy.** Upload a source document that contains the rules you want to enforce. Automated Reasoning extracts formal logic rules and a schema of variables from your document. A fidelity report is automatically generated that measures how accurately the extracted policy represents your source documents, with coverage and accuracy scores and detailed grounding that links each rule and variable back to the specific statements in your source content. Review the extracted policy and fidelity report to ensure the policy captures your rules correctly. For more information, see [Create your Automated Reasoning policy](create-automated-reasoning-policy.md).

1. **Test and refine.** Tests help ensure that your policy can accurately validate generated content even as you make changes to the policy itself. Create tests that mimic the questions your users will ask and the responses your LLM might generate. Automated Reasoning checks use foundation models to translate natural language into logic. Use generated scenarios to validate rule correctness and QnA tests to validate the accuracy of the natural-language-to-logic translation. Refine your policy based on test results. For more information, see [Test an Automated Reasoning policy](test-automated-reasoning-policy.md).

1. **Deploy.** Save an immutable version of your tested policy and attach it to a guardrail. You can automate deployment using CloudFormation or CI/CD pipelines. For more information, see [Deploy your Automated Reasoning policy in your application](deploy-automated-reasoning-policy.md).

1. **Integrate.** At runtime, Automated Reasoning findings are returned through the APIs that support an Amazon Bedrock Guardrails configuration: `Converse`, `InvokeModel`, `InvokeAgent`, and `RetrieveAndGenerate`, as well as the standalone `ApplyGuardrail` API. Inspect the findings to decide whether to serve the response, rewrite it using the feedback, or ask the user for clarification. Automated Reasoning checks operate in *detect mode* only — they return findings and feedback rather than blocking content. For more information on how to integrate Automated Reasoning checks in your application, see [Integrate Automated Reasoning checks in your application](integrate-automated-reasoning-checks.md). For more information on the permissions required to enable Automated Reasoning checks, see [Permissions for Automated Reasoning policies with ApplyGuardrail](guardrail-automated-reasoning-permissions.md).
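
The standalone `ApplyGuardrail` path can be sketched in Python with boto3. The guardrail identifier and version below are placeholders, and the response path in the final comment is an assumption about where findings appear, not a guaranteed shape:

```python
# Sketch: package a question/answer pair for output-side validation with
# ApplyGuardrail. The guardrail ID and version are placeholder values.

def build_apply_guardrail_request(guardrail_id, guardrail_version, query, answer):
    """Build the kwargs for a bedrock-runtime apply_guardrail call."""
    return {
        "guardrailIdentifier": guardrail_id,
        "guardrailVersion": guardrail_version,
        "source": "OUTPUT",  # validate model output, not user input
        "content": [
            {"text": {"text": query, "qualifiers": ["query"]}},
            {"text": {"text": answer, "qualifiers": ["guard_content"]}},
        ],
    }

request = build_apply_guardrail_request(
    "gr-EXAMPLE1234", "1",
    "Am I eligible for parental leave?",
    "Yes, you are eligible for parental leave.",
)

# To send the request (requires AWS credentials and a configured guardrail):
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.apply_guardrail(**request)
# Findings location is an assumption; inspect response["assessments"] to confirm.
```

Building the request separately from sending it keeps the payload easy to unit test and log for audit purposes.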

## Availability and language support


Automated Reasoning checks in Amazon Bedrock Guardrails are generally available in the following Regions:
+ US East (N. Virginia)
+ US West (Oregon)
+ US East (Ohio)
+ EU (Frankfurt)
+ EU (Paris)
+ EU (Ireland)

Automated Reasoning checks currently support English (US) only.

## Limitations and considerations


Before implementing Automated Reasoning checks, be aware of these technical limitations:
+ **Document complexity.** Source documents should be well-structured with clear, unambiguous rules. Highly complex documents with nested conditions or contradictory statements may not extract cleanly into formal logic. Input documents are limited to 5 MB in size and 50,000 characters. You can split larger documents and merge each section into your policy. Images and tables in documents also impact the number of input characters.
+ **Processing time.** Automated Reasoning checks validation adds latency to your application responses. Plan for additional processing time, especially for complex policies with many variables. The number of variables in a policy directly contributes to increases in validation latency.
+ **Policy scope.** To create policies that are easier to maintain, each policy should focus on a specific domain (for example, HR, finance, legal) rather than trying to cover multiple unrelated areas in a single policy.
+ **Variable and rule limits.** Policies with excessive numbers of variables or overly complex rule interactions may hit processing limits or return `TOO_COMPLEX` results. See [Amazon Bedrock limits documentation](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#limits_bedrock) and [Validation results reference](automated-reasoning-checks-concepts.md#ar-concept-validation-results).
+ **Natural language dependency.** The accuracy of validation depends on how well natural language in user prompts and model responses can be translated to your policy's formal logic variables. Automated Reasoning checks use foundational models to translate natural language into logic representations. Variable descriptions influence the quality of this translation.
+ **Non-linear arithmetic.** Automated Reasoning checks might time out or return `TOO_COMPLEX` if constraints involve reasoning with non-linear arithmetic (for example, irrational numbers or exponents).

## Pricing


Automated Reasoning checks in Amazon Bedrock Guardrails are charged based on the number of validation requests processed. For current pricing information, see the [Amazon Bedrock pricing page](https://aws.amazon.com/bedrock/pricing/).

Charges are incurred for each validation request, regardless of the result (for example, `VALID`, `INVALID`, `TRANSLATION_AMBIGUOUS`). To optimize costs:
+ Use appropriate confidence thresholds to balance accuracy with processing requirements.
+ Consider caching validation results for identical or similar queries when appropriate for your use case.
+ Monitor usage patterns and adjust policies to reduce unnecessary validation requests.
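
The caching suggestion can be sketched as a small wrapper. `validate_fn` is a placeholder for whatever function calls the guardrail API, and the normalization here is deliberately simple:

```python
import hashlib

# Sketch: cache validation results for repeated identical inputs so duplicate
# queries don't incur duplicate validation charges.

class ValidationCache:
    def __init__(self, validate_fn):
        self._validate_fn = validate_fn  # placeholder for the real API call
        self._cache = {}

    def _key(self, query, answer):
        # Normalize whitespace and case so trivially different strings hit the cache.
        text = " ".join(query.lower().split()) + "\n" + " ".join(answer.lower().split())
        return hashlib.sha256(text.encode()).hexdigest()

    def validate(self, query, answer):
        key = self._key(query, answer)
        if key not in self._cache:
            self._cache[key] = self._validate_fn(query, answer)
        return self._cache[key]

calls = []
def fake_validate(q, a):
    calls.append((q, a))
    return "VALID"

cache = ValidationCache(fake_validate)
cache.validate("Am I eligible?", "Yes.")
cache.validate("am i  eligible?", "yes.")  # normalizes to the same key: cache hit
# len(calls) == 1
```

Whether caching is appropriate depends on your use case; responses that depend on user-specific context generally should not be cached.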

## Cross-region inference for policy operations


Automated Reasoning utilizes cross-region inference to optimize the performance and availability of policy creation and testing operations. Specific API operations automatically distribute processing across AWS Regions within your geographic boundary to ensure reliable service delivery.

The following Automated Reasoning API operations employ cross-region inference:
+ `StartAutomatedReasoningPolicyBuildWorkflow` — Invoked during policy creation and compilation from source documents.
+ `StartAutomatedReasoningPolicyTestWorkflow` — Invoked during policy validation and testing procedures.

These operations invoke large language models to extract formal logic rules from source documents and translate natural language constructs into structured logical representations. To ensure optimal performance and availability, request processing is distributed according to the following geographic routing:
+ **United States Regions:** API requests originating from US East (N. Virginia), US West (Oregon), or US East (Ohio) may be processed in any supported US Region.
+ **European Union Regions:** API requests originating from EU (Frankfurt), EU (Paris), or EU (Ireland) may be processed in any supported EU Region.

**Important**  
Customer data remains within the originating geographic boundary (United States or European Union) and is processed in accordance with AWS data residency commitments. Cross-region inference routes requests exclusively within the same geographic boundary to optimize performance and service availability.

Cross-region inference operates transparently without requiring customer configuration. API functionality remains consistent regardless of the specific Region that processes the request.

**Topics**
+ [What Automated Reasoning checks do](#automated-reasoning-what-it-does)
+ [When to use Automated Reasoning checks](#automated-reasoning-when-to-use)
+ [What Automated Reasoning checks don't do](#automated-reasoning-what-it-doesnt-do)
+ [End-to-end workflow overview](#automated-reasoning-workflow-overview)
+ [Availability and language support](#automated-reasoning-availability)
+ [Limitations and considerations](#automated-reasoning-limitations)
+ [Pricing](#automated-reasoning-pricing)
+ [Cross-region inference for policy operations](#automated-reasoning-cross-region-inference)
+ [Automated Reasoning checks concepts](automated-reasoning-checks-concepts.md)
+ [Create your Automated Reasoning policy](create-automated-reasoning-policy.md)
+ [Automated Reasoning policy best practices](automated-reasoning-policy-best-practices.md)
+ [Test an Automated Reasoning policy](test-automated-reasoning-policy.md)
+ [Troubleshoot and refine your Automated Reasoning policy](address-failed-automated-reasoning-tests.md)
+ [Use Kiro CLI with an Automated Reasoning policy](kiro-cli-automated-reasoning-policy.md)
+ [Deploy your Automated Reasoning policy in your application](deploy-automated-reasoning-policy.md)
+ [Integrate Automated Reasoning checks in your application](integrate-automated-reasoning-checks.md)

# Automated Reasoning checks concepts

This page describes the building blocks of Automated Reasoning checks. Understanding these concepts will help you create effective policies, interpret test results, and debug issues. For a high-level overview of what Automated Reasoning checks do and when to use them, see [What Automated Reasoning checks do](#automated-reasoning-what-it-does).

## Policies


An Automated Reasoning *policy* is a resource in your AWS account that contains a set of formal logic rules, a schema of variables, and optional custom types. The policy encodes the business rules, regulations, or guidelines that you want to validate LLM responses against.

Policies are created from source documents — such as HR handbooks, compliance manuals, or product specifications — that describe the rules in natural language. When you create a policy, Automated Reasoning checks extract the rules and variables from your document and translate them into formal logic that can be mathematically verified.

The relationship between policies, guardrails, and your application is as follows:

```
Source Document ──► Automated Reasoning Policy ──► Guardrail ──► Your Application
  (natural          (rules + variables +           (references     (calls guardrail
   language)         custom types)                  a policy        APIs to validate
                                                    version)        LLM responses)
```

Key characteristics of policies:
+ Each policy is identified by an Amazon Resource Name (ARN) and exists in a specific AWS Region.
+ Policies have a `DRAFT` version (called "Working Draft" in the console) that you edit during development, and numbered immutable versions that you create for deployment.
+ A guardrail can reference the DRAFT policy or a specific numbered version. Using a numbered version means you can update the `DRAFT` without affecting your deployed guardrail.
+ Each policy should focus on a specific domain (for example, HR benefits, loan eligibility, product return rules) rather than trying to cover multiple unrelated areas.

For step-by-step instructions on creating a policy, see [Create your Automated Reasoning policy](create-automated-reasoning-policy.md).

## Fidelity report


A *fidelity report* measures how accurately an extracted policy represents the source documents it was generated from. The report is automatically generated when you create a policy from a source document, and provides two key scores along with detailed grounding information that links every rule and variable back to specific statements in your source content.

The fidelity report is designed to help non-technical subject matter experts explore and validate a policy without needing to understand formal logic. In the console, the **Source Document** tab displays the fidelity report as a table of numbered atomic statements extracted from your document, showing which rules and variables each statement grounds. You can filter by specific rules or variables and search for content within the statements.

The fidelity report includes two scores, each ranging from 0.0 to 1.0:
+ **Coverage score** — Indicates how well the policy covers the statements in the source documents. A higher score means more of the source content is represented in the policy.
+ **Accuracy score** — Indicates how faithfully the policy rules represent the source material. A higher score means the extracted rules more closely match the intent of the original document.

Beyond the aggregate scores, the fidelity report provides detailed grounding for each rule and variable in the policy:
+ **Rule reports** — For each rule, the report identifies the specific statements from the source documents that support it (grounding statements), explains how those statements justify the rule (grounding justifications), and provides an individual accuracy score with a justification.
+ **Variable reports** — For each variable, the report identifies the source statements that support the variable definition, explains the justification, and provides an individual accuracy score.
+ **Document sources** — The source documents are broken down into atomic statements — individual, indivisible facts extracted from the text. The document content is annotated with line numbers so you can trace each rule and variable back to the exact location in the original document.
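
For illustration, a policy reviewer might triage rule reports by their individual accuracy scores. The dict structure below is a simplified, hypothetical stand-in for the actual report format; only the idea of per-rule accuracy scores is taken from the description above:

```python
# Sketch: flag extracted rules whose individual accuracy score is low, so
# reviewers can start with the least faithful extractions. The report
# structure is hypothetical.

def low_accuracy_rules(rule_reports, threshold=0.8):
    """Return rule IDs below the accuracy threshold, worst first."""
    flagged = [r for r in rule_reports if r["accuracyScore"] < threshold]
    return [r["ruleId"] for r in sorted(flagged, key=lambda r: r["accuracyScore"])]

reports = [
    {"ruleId": "R1", "accuracyScore": 0.95},
    {"ruleId": "R2", "accuracyScore": 0.60},
    {"ruleId": "R3", "accuracyScore": 0.75},
]
print(low_accuracy_rules(reports))  # ['R2', 'R3']
```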

## Rules


Rules are the core of an Automated Reasoning policy. Each rule is a formal logic expression that captures a relationship between variables. Rules are expressed using a subset of [SMT-LIB](https://smtlib.cs.uiowa.edu/) syntax, a standard format for formal logic that Automated Reasoning checks use for mathematical verification.

Most rules should follow an *if-then* (implicative) format. This means rules should have a condition (the "if" part) and a conclusion (the "then" part), connected by the implication operator `=>`.

**Well-formed rules (if-then format):**

```
;; If the employee is full-time AND has worked for more than 12 months,
;; then they are eligible for parental leave.
(=> (and isFullTime (> tenureMonths 12)) eligibleForParentalLeave)

;; If the loan amount is greater than 500,000, then a co-signer is required.
(=> (> loanAmount 500000) requiresCosigner)
```

**Bare assertions (rules without an if-then structure) create axioms — statements that are always true.** This is useful for checking boundary conditions, such as account balances having positive values, but it can also make certain conditions logically impossible and lead to unexpected `IMPOSSIBLE` results during validation. For example, the bare assertion `(= eligibleForParentalLeave true)` means Automated Reasoning checks treat it as a fact that the user is eligible for parental leave. Any input that mentions not being eligible would produce a validation result of `IMPOSSIBLE` because it contradicts this axiom.

```
;; GOOD: Useful to check impossible conditions such as 
;; negative account balance
(>= accountBalance 0)

;; BAD: This asserts eligibility as always true, regardless of conditions.
eligibleForParentalLeave
```
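
The contrast above can be illustrated with ordinary boolean logic. This is a minimal Python sketch of the truth-functional behavior only; real validation uses SMT solvers, not Python booleans:

```python
# Illustrative only: shows why the if-then rule tolerates ineligible
# employees while the bare assertion does not.

def implies(p, q):
    """Logical implication: (=> p q) is false only when p is true and q is false."""
    return (not p) or q

def if_then_rule(is_full_time, tenure_months, eligible):
    # (=> (and isFullTime (> tenureMonths 12)) eligibleForParentalLeave)
    return implies(is_full_time and tenure_months > 12, eligible)

def bare_assertion(eligible):
    # eligibleForParentalLeave asserted as an axiom: always expected to hold
    return eligible

# A part-time employee who is not eligible:
print(if_then_rule(False, 24, False))  # True: consistent with the if-then rule
print(bare_assertion(False))           # False: contradicts the axiom (IMPOSSIBLE)
```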

Rules support the following logic operators:


| Operator | Meaning | Example | 
| --- | --- | --- | 
| => | Implication (if-then) | (=> isFullTime eligibleForBenefits) | 
| and | Logical AND | (and isFullTime (> tenure 12)) | 
| or | Logical OR | (or isVeteran isTeacher) | 
| not | Logical NOT | (not isTerminated) | 
| = | Equality | (= employmentType FULL_TIME) | 
| >, <, >=, <= | Comparison | (>= creditScore 700) | 

For best practices on writing effective rules, see [Automated Reasoning policy best practices](automated-reasoning-policy-best-practices.md).

## Variables


Variables represent the concepts in your domain that Automated Reasoning checks use to translate natural language into formal logic and to evaluate rules. Each variable has a name, a type, and a description.

Automated Reasoning checks support the following variable types:


| Type | Description | Example | 
| --- | --- | --- | 
| bool | True or false value | isFullTime — Whether the employee works full-time | 
| int | Whole number | tenureMonths — Number of months the employee has worked | 
| real | Decimal number | interestRate — Annual interest rate as a decimal (0.05 means 5%) | 
| Custom type (enum) | One value from a defined set | leaveType — One of: PARENTAL, MEDICAL, BEREAVEMENT, PERSONAL | 

### The critical role of variable descriptions


Variable descriptions are the single most important factor in translation accuracy. When Automated Reasoning checks translate natural language into formal logic, they use variable descriptions to determine which variables correspond to concepts mentioned in the text. Vague or incomplete descriptions lead to `TRANSLATION_AMBIGUOUS` results or incorrect variable assignments.

**Example: How descriptions affect translation**

Consider a user asking: "I've been working here for 2 years. Am I eligible for parental leave?"


| Vague description (likely to fail) | Detailed description (likely to succeed) | 
| --- | --- | 
| tenureMonths: "How long the employee has worked." | tenureMonths: "The number of complete months the employee has been continuously employed. When users mention years of service, convert to months (for example, 2 years = 24 months). Set to 0 for new hires." | 

With the vague description, Automated Reasoning checks may not know to convert "2 years" to 24 months, or may not assign the variable at all. With the detailed description, the translation is unambiguous.

Good variable descriptions should:
+ Explain what the variable represents in plain language.
+ Specify the unit and format (for example, "in months", "as a decimal where 0.15 means 15%").
+ Include non-obvious synonyms and alternative phrasings that users might use (for example, "Set to true when users mention being 'full-time' or working full hours").
+ Describe boundary conditions (for example, "Set to 0 for new hires").
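
As a rough sketch, the checklist above can be applied mechanically to a variable definition. The dict shape and the heuristic checks are purely illustrative, not an actual API payload or tool:

```python
# Sketch: a variable definition whose description follows the checklist
# above (unit, conversion guidance, boundary condition).

tenure_months = {
    "name": "tenureMonths",
    "type": "int",
    "description": (
        "The number of complete months the employee has been continuously "
        "employed. When users mention years of service, convert to months "
        "(for example, 2 years = 24 months). Set to 0 for new hires."
    ),
}

def description_checklist(var):
    """Crude keyword heuristics for the checklist items; illustrative only."""
    d = var["description"].lower()
    return {
        "mentions_unit": "month" in d,
        "mentions_conversion": "convert" in d or "for example" in d,
        "mentions_boundary": "new hire" in d or "set to 0" in d,
    }

print(description_checklist(tenure_months))
# {'mentions_unit': True, 'mentions_conversion': True, 'mentions_boundary': True}
```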

## Custom types (enums)


Custom types define a set of named values that a variable can take. They are equivalent to enumerations (enums) in programming languages. Use custom types when a variable represents a category with a fixed set of possible values.

**Examples:**


| Type name | Possible values | Use case | 
| --- | --- | --- | 
| LeaveType | PARENTAL, MEDICAL, BEREAVEMENT, PERSONAL | Categorize the type of leave an employee is requesting | 
| Severity | CRITICAL, MAJOR, MINOR | Classify the severity of an issue or incident | 

**When to use enums vs. booleans:**
+ Use enums when the values are *mutually exclusive* — a variable can only be one value at a time. For example, `leaveType` can be PARENTAL or MEDICAL, but not both simultaneously.
+ Use separate boolean variables when states can *co-exist*. For example, a person can be both a veteran and a teacher. Using an enum `customerType = {VETERAN, TEACHER}` would force a choice between them, creating a logical contradiction when both apply. Instead, use two booleans: `isVeteran` and `isTeacher`.

**Tip**  
If it's possible for a variable not to have any value from the enum, include an `OTHER` or `NONE` value. This prevents translation issues when the input doesn't match any of the defined values.
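
The enum guidance can be sketched in plain Python, with `OTHER` as the fallback value. The `classify_leave` helper is purely illustrative; in the real system, translation is performed by foundation models:

```python
from enum import Enum

# Sketch: leave types are mutually exclusive, so one enum works; veteran and
# teacher can co-exist, so they stay separate booleans. OTHER catches inputs
# outside the defined set.

class LeaveType(Enum):
    PARENTAL = "PARENTAL"
    MEDICAL = "MEDICAL"
    BEREAVEMENT = "BEREAVEMENT"
    PERSONAL = "PERSONAL"
    OTHER = "OTHER"  # fallback so unmatched inputs still translate

def classify_leave(text):
    """Toy keyword matcher standing in for model-based translation."""
    for member in LeaveType:
        if member is not LeaveType.OTHER and member.value.lower() in text.lower():
            return member
    return LeaveType.OTHER

print(classify_leave("I need parental leave"))  # LeaveType.PARENTAL
print(classify_leave("sabbatical request"))     # LeaveType.OTHER
```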

## Translation: from natural language to formal logic


Translation is the process by which Automated Reasoning checks convert natural language (user questions and LLM responses) into formal logic expressions that can be mathematically verified against your policy rules. Understanding this process is key to debugging issues and creating effective policies.

Automated Reasoning checks validate content in two distinct steps:

1. **Translate** — Automated Reasoning checks use foundation models (LLMs) to translate the natural language input into formal logic. This step maps concepts in the text to your policy's variables and expresses the relationships as logical statements. Because this step uses LLMs, it may *contain errors*. Automated Reasoning checks use multiple LLMs to translate the input text, then use the semantic equivalence of the redundant translations to set a confidence score. The quality of the translation depends on how well your variable descriptions match the language used in the input.

1. **Validate** — Automated Reasoning checks use mathematical techniques (through SMT solvers) to check whether the translated logic is consistent with your policy rules. This step *is mathematically sound* — if the translation is correct, the validation result will be consistent.

**Important**  
This two-step distinction is critical for debugging. If you are certain the rules in your policy are correct, then when a test fails or returns unexpected results, the issue is most likely in step 1 (translation), not step 2 (validation). The mathematical validation is sound: if the translation correctly captures the meaning of the input, the validation result will be correct. Focus your debugging efforts on improving variable descriptions and ensuring the translation assigns the right variables with the right values.

**Example: Translation in action**

Given a policy with variables `isFullTime` (bool), `tenureMonths` (int), and `eligibleForParentalLeave` (bool), and the input:
+ **Question:** "I'm a full-time employee and I've been here for 18 months. Can I take parental leave?"
+ **Answer:** "Yes, you are eligible for parental leave."

Step 1 (translate) produces:

```
Premises: isFullTime = true, tenureMonths = 18
Claims: eligibleForParentalLeave = true
```

Step 2 (validate) checks these assignments against the policy rule `(=> (and isFullTime (> tenureMonths 12)) eligibleForParentalLeave)` and confirms the claim is `VALID`.
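
The two steps can be mirrored in a simplified Python sketch. The translation output is hard-coded here (in the real system foundation models produce it), and the single-rule boolean check is a simplification of SMT-based validation:

```python
# Step 1 output (translation), hard-coded for this sketch:
premises = {"isFullTime": True, "tenureMonths": 18}
claims = {"eligibleForParentalLeave": True}

# Step 2 (validation): check the claim against the policy rule
# (=> (and isFullTime (> tenureMonths 12)) eligibleForParentalLeave)
# using plain boolean logic instead of an SMT solver.
rule_fires = premises["isFullTime"] and premises["tenureMonths"] > 12

if rule_fires and claims["eligibleForParentalLeave"]:
    result = "VALID"        # claim is entailed by the rule
elif rule_fires and not claims["eligibleForParentalLeave"]:
    result = "INVALID"      # claim contradicts what the rule entails
else:
    result = "SATISFIABLE"  # rule does not determine the claim either way

print(result)  # VALID
```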

To improve translation accuracy:
+ Write detailed variable descriptions that cover how users refer to concepts in everyday language.
+ Remove duplicate or near-duplicate variables that could confuse the translation (for example, `tenureMonths` and `monthsOfService`).
+ Delete unused variables that aren't referenced by any rules — they add noise to the translation process.
+ Use question-and-answer tests to validate translation accuracy with realistic user inputs. For more information, see [Test an Automated Reasoning policy](test-automated-reasoning-policy.md).

## Findings and validation results


When Automated Reasoning checks validate content, they produce a set of *findings*. Each finding represents a factual claim extracted from the input, along with the validation result, the variable assignments used, and the policy rules that support the conclusion. The overall (aggregated) result is determined by sorting findings in order of severity and selecting the worst result. The severity order from worst to best is: `TRANSLATION_AMBIGUOUS`, `IMPOSSIBLE`, `INVALID`, `SATISFIABLE`, `VALID`.
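
The aggregation rule can be sketched directly from the severity order:

```python
# Sketch: derive the overall result from individual finding types using the
# severity order described above (worst first).

SEVERITY = ["TRANSLATION_AMBIGUOUS", "IMPOSSIBLE", "INVALID", "SATISFIABLE", "VALID"]

def overall_result(finding_types):
    """Return the worst finding type present, per the severity ranking."""
    present = [t for t in SEVERITY if t in finding_types]
    return present[0] if present else None

print(overall_result(["VALID", "SATISFIABLE", "INVALID"]))  # INVALID
print(overall_result(["VALID", "VALID"]))                   # VALID
```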

### Structure of a finding


The result type determines which fields are present in the finding. See the [Validation results reference](#ar-concept-validation-results) section for an in-depth description of each finding type. However, most finding types share a common `translation` object that contains the following components:

`premises`  
Context, assumptions, or conditions extracted from the input that affect how a claim should be evaluated. In question-and-answer formats, the premise is often the question itself. Answers can also contain premises that establish constraints. For example, in "I'm a full-time employee with 18 months of service," the premises are `isFullTime = true` and `tenureMonths = 18`.

`claims`  
Factual statements that Automated Reasoning checks evaluate for accuracy. In a question-and-answer format, the claim is typically the answer. For example, in "Yes, you are eligible for parental leave," the claim is `eligibleForParentalLeave = true`.

`confidence`  
A score from 0.0 to 1.0 representing how certain Automated Reasoning checks are about the translation from natural language to formal logic. Higher scores indicate greater certainty. A confidence of 1.0 means all translation models agreed on the same interpretation.

`untranslatedPremises`  
References to portions of the original input text that correspond to premises but could not be translated into formal logic. These highlight parts of the input that Automated Reasoning recognized as relevant but couldn't map to policy variables.

`untranslatedClaims`  
References to portions of the original input text that correspond to claims but could not be translated into formal logic. A `VALID` result only covers the translated claims — untranslated claims are not validated.

### Validation results reference


Each finding is exactly one of the following types. The type determines the meaning of the result, the fields available in the finding, and the recommended action for your application. All finding types that include a `translation` field also include a `logicWarning` field that is present when the translation contains logical issues independent of the policy rules (for example, statements that are always true or always false).


| Result | Finding fields | Recommended action | 
| --- | --- | --- | 
| VALID |  `translation` — The translated premises, claims, confidence score, and any untranslated references. `supportingRules` — The policy rules that prove the claims are correct. Each rule includes its identifier and the policy version ARN. `claimsTrueScenario` — A scenario (set of variable assignments) demonstrating how the claims are logically true.  | Serve the response to the user. Log supportingRules and claimsTrueScenario for audit purposes — they provide mathematically verifiable proof of validity. Check untranslatedPremises and untranslatedClaims for parts of the input that were not validated. | 
| INVALID |  `translation` — The translated premises, claims, confidence score, and any untranslated references. `contradictingRules` — The policy rules that the claims violate. Each rule includes its identifier and the policy version ARN.  | Do not serve the response. Use translation (to see what was claimed) and contradictingRules (to see which rules were violated) to rewrite the response or block it. In a rewriting loop, pass the contradicting rules and incorrect claims to the LLM to generate a corrected response. | 
| SATISFIABLE |  `translation` — The translated premises, claims, confidence score, and any untranslated references. `claimsTrueScenario` — A scenario demonstrating how the claims could be logically true. `claimsFalseScenario` — A scenario demonstrating how the claims could be logically false under different conditions.  | Compare claimsTrueScenario and claimsFalseScenario to identify the missing conditions. Rewrite the response to include the additional information needed to make it VALID, ask the user for clarification about the missing conditions, or serve the response with a caveat that it may be incomplete. | 
| IMPOSSIBLE |  `translation` — The translated premises, claims, confidence score, and any untranslated references. Inspect the premises to identify contradictions. `contradictingRules` — The policy rules that conflict with the premises or with each other. If populated, the contradiction may be in the policy itself.  | Check whether the input contains contradictory statements (for example, "I'm full-time and also part-time"). If the input is valid, the contradiction is likely in your policy — check contradictingRules and review the quality report. See [Troubleshoot and refine your Automated Reasoning policy](address-failed-automated-reasoning-tests.md). | 
| TRANSLATION\_AMBIGUOUS |  Does not contain a `translation` object. Instead provides: `options` — The competing logical interpretations (up to 2). Each option contains its own `translations` with premises, claims, and confidence. Compare options to see where models disagreed. `differenceScenarios` — Scenarios (up to 2) that illustrate how the different interpretations differ in meaning, with variable assignments highlighting the practical impact of the ambiguity.  | Inspect options to understand the disagreement. Improve variable descriptions to reduce ambiguity, merge or remove overlapping variables, or ask the user for clarification. You can also adjust the confidence threshold — see [Confidence thresholds](#ar-concept-confidence-thresholds). | 
| TOO\_COMPLEX |  Does not contain a `translation`, rules, or scenarios. The input exceeded processing capacity due to volume or complexity.  | Shorten the input by breaking it into smaller pieces, or simplify your policy by reducing the number of variables and avoiding complex arithmetic (for example, exponents or irrational numbers). You can also split your policy into smaller, more focused policies. | 
| NO\_TRANSLATIONS |  Does not contain a `translation`, rules, or scenarios. May appear alongside other findings if only part of the input could be translated.  | A NO\_TRANSLATIONS finding is included in the output whenever one of the other findings includes untranslated premises or claims. Look through the other findings to see which portions of the input were not translated. If the content should be relevant, add variables to your policy to capture the missing concepts. If the content is off-topic, consider using topic policies to filter it before it reaches Automated Reasoning checks. | 

**Note**  
A `VALID` result covers only the parts of the input captured through policy variables in the translated premises and claims. Statements that fall outside the scope of your policy's variables are not validated. For example, "I can submit my homework late because I have a fake doctor's note" might be deemed valid if the policy has no variable to capture whether the doctor's note is fake. Automated Reasoning checks will likely include "fake doctor's note" as an untranslated premise in its finding. Treat untranslated content and `NO_TRANSLATIONS` findings as a warning signal.
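The finding-handling guidance in the table above can be sketched as a simple dispatcher. The finding shape used below (a `result` field plus optional rule and scenario fields) is an illustrative assumption, not the exact API response schema:

```python
def handle_finding(finding):
    """Decide what to do with an Automated Reasoning finding.

    Returns an action string: "serve", "block", "caveat", or "review".
    The finding dict shape here is illustrative, not the exact API schema.
    """
    result = finding.get("result")
    if result == "VALID":
        # Log supportingRules and claimsTrueScenario for audit purposes.
        return "serve"
    if result == "INVALID":
        # Feed contradictingRules back to the LLM in a rewriting loop,
        # or block the response outright.
        return "block"
    if result == "SATISFIABLE":
        # Compare the true/false scenarios; serve with a caveat or ask
        # the user about the missing conditions.
        return "caveat"
    # IMPOSSIBLE, TRANSLATION_AMBIGUOUS, TOO_COMPLEX, and NO_TRANSLATIONS
    # all call for human or policy-level attention.
    return "review"


findings = [
    {"result": "VALID", "supportingRules": [{"id": "r1"}]},
    {"result": "INVALID", "contradictingRules": [{"id": "r2"}]},
]
print([handle_finding(f) for f in findings])  # ['serve', 'block']
```

Remember that even a "serve" decision should still check for untranslated premises and claims, as the note above describes.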

## Confidence thresholds


Automated Reasoning checks use multiple foundation models to translate natural language into formal logic. Each model produces its own translation independently. The *confidence score* represents the level of agreement among these translations — specifically, the percentage of models that produced semantically equivalent interpretations.

The *confidence threshold* is a value you set (from 0.0 to 1.0) that determines the minimum level of agreement required for a translation to be considered reliable enough to validate. It controls the trade-off between coverage and accuracy:
+ **Higher threshold** (for example, 0.9): Requires strong agreement among translation models. Produces fewer findings but with higher accuracy. More inputs will be flagged as `TRANSLATION_AMBIGUOUS`.
+ **Lower threshold** (for example, 0.5): Accepts translations with less agreement. Produces more findings but with a higher risk of incorrect translations. Fewer inputs will be flagged as `TRANSLATION_AMBIGUOUS`.

**How the threshold works:**

1. Multiple foundation models each translate the input independently.

1. Translations that are supported by a percentage of models equal to or above the threshold become high-confidence findings with a definitive result (`VALID`, `INVALID`, etc.).

1. If one or more translations fall below the threshold, Automated Reasoning checks surface an additional `TRANSLATION_AMBIGUOUS` finding. This finding includes details about the disagreements between the models, which you can use to improve your variable descriptions or ask the user for clarification.
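The threshold mechanics in the steps above amount to comparing each translation's level of model agreement against the configured threshold. A minimal sketch (the agreement numbers are made up for illustration):

```python
def classify_translations(agreements, threshold):
    """Split translations into confident and ambiguous groups.

    agreements: fraction of translation models (0.0-1.0) that produced
    each semantically equivalent interpretation. Illustrative only.
    """
    confident = [a for a in agreements if a >= threshold]
    ambiguous = [a for a in agreements if a < threshold]
    return confident, ambiguous


# Two interpretations: 80% of models agreed on one, 20% on another.
# With a 0.7 threshold, the first yields a definitive finding and the
# second surfaces as TRANSLATION_AMBIGUOUS.
confident, ambiguous = classify_translations([0.8, 0.2], threshold=0.7)
print(len(confident), len(ambiguous))  # 1 1
```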

**Tip**  
Start with the default threshold and adjust based on your testing results. If you see too many `TRANSLATION_AMBIGUOUS` results for inputs that should be unambiguous, focus on improving your variable descriptions rather than lowering the threshold. Lowering the threshold may reduce `TRANSLATION_AMBIGUOUS` results but increases the risk of incorrect validations.

# Create your Automated Reasoning policy
Create your Automated Reasoning policy

When you create an Automated Reasoning policy, your source document is translated into a set of formal logic rules and a schema of variables and types. This page walks you through preparing your document, creating the policy, and reviewing the results.

Amazon Bedrock encrypts your Automated Reasoning policy using AWS Key Management Service (KMS). By default, Amazon Bedrock uses a service-owned key. You can optionally specify a customer managed KMS key for additional control over the encryption of your policy data.

To test and use your Automated Reasoning policy, ensure you have [the appropriate permissions](https://docs.aws.amazon.com/bedrock/latest/userguide/guardrail-automated-reasoning-permissions.html).

## Prepare your source document


Before you open the console or call the API, prepare the document that Automated Reasoning will use to extract rules and variables. The quality of your policy depends directly on the quality of this input.

### Document structure and clarity


Automated Reasoning checks work best with documents that contain clear, unambiguous rules. Each rule should state a condition and an outcome. Avoid vague language, subjective criteria, or rules that depend on external context not present in the document.

**Example: Clear vs. vague rules**


| Clear (good for extraction) | Vague (poor for extraction) | 
| --- | --- | 
| "Full-time employees with at least 12 months of continuous service are eligible for parental leave." | "Eligible employees may apply for parental leave subject to manager approval." | 
| "Refund requests must be submitted within 30 days of purchase. Items must be in original packaging." | "Refunds are handled on a case-by-case basis." | 

### Size limits and splitting large documents


Source documents are limited to 5 MB in size and 50,000 characters. Images and tables in documents also count toward the character limit.

If your document exceeds these limits, or if it covers multiple unrelated domains, split it into focused sections. For example, split an employee handbook into separate documents for leave policies, benefits eligibility, and expense reimbursement. Create your policy with the first section, then use iterative policy building (described later on this page) to merge additional sections into the same policy.
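A quick pre-flight check against these limits (5 MB, 50,000 characters) can be scripted before upload; a minimal sketch:

```python
MAX_BYTES = 5 * 1024 * 1024  # 5 MB source document size limit
MAX_CHARS = 50_000           # character limit


def check_source_document(text: str) -> list[str]:
    """Return a list of limit violations for a candidate source document."""
    problems = []
    if len(text.encode("utf-8")) > MAX_BYTES:
        problems.append("document exceeds the 5 MB size limit")
    if len(text) > MAX_CHARS:
        problems.append("document exceeds the 50,000 character limit")
    return problems


print(check_source_document("x" * 60_000))
# ['document exceeds the 50,000 character limit']
```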

### Pre-process complex documents


Documents that contain a lot of boilerplate, legal disclaimers, or content unrelated to the rules you want to enforce will produce noisy policies with unnecessary variables and rules. Before uploading, consider:
+ Removing headers, footers, table of contents, and appendices that don't contain rules.
+ Extracting only the sections that contain the rules relevant to your use case.
+ Simplifying complex tables into plain text statements where possible.

**Tip**  
Start with a focused subset of your rules. Create and test the policy thoroughly, then gradually add more content in subsequent iterations. This approach helps you identify and resolve issues early and makes troubleshooting easier.

### (Optional) Use an LLM to rewrite documents as logical rules


For documents that contain narrative prose, legal language, or complex formatting, consider using a frontier model with advanced reasoning capabilities to rewrite the content as clear, logical rules before uploading it to Automated Reasoning checks. This one-off preprocessing step converts text into a format that Automated Reasoning checks can extract from more accurately, resulting in higher-quality policies with fewer unused variables and bare assertions.

**Note**  
Always review the LLM's output against your original document before using it as source text.

There are two approaches to LLM preprocessing, depending on the complexity of your document and how much control you want over the extraction.

#### Approach 1: Plain text rule extraction


Ask the LLM to rewrite the document as a numbered list of if-then rules. This approach is straightforward and works well for short, focused documents where the rules are relatively clear in the source.

**Example prompt:**

```
You are a logical reasoning expert. Your task is to analyze the provided
source text and rewrite it as a set of clear, logical rules using if-then
statements.

Instructions:
1. Extract the key relationships, conditions, and outcomes from the source text.
2. Convert these into logical implications using "if-then" format.
3. Use clear, precise language that captures the original meaning.
4. Number each rule for easy reference.
5. Ensure rules are mutually consistent and non-contradictory.

Format:
- Rule [N]: If [condition], then [consequence].
- Use "and" to combine multiple conditions.
- Use "or" for alternative conditions.
- Include negations when relevant: If not [condition], then [consequence].

Example:
Source: "Students who complete all assignments and attend at least 80% of
classes will pass the course."
Rule 1: If a student completes all assignments and attends at least 80% of
classes, then they will pass the course.

Source Text:
[Paste your document here]
```
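One way to run a prompt like this programmatically is through the Amazon Bedrock Converse API. The sketch below keeps the prompt assembly separate from the boto3 call; the model ID is an assumption — substitute whichever model is available in your account and Region:

```python
def build_extraction_prompt(instructions: str, source_text: str) -> str:
    """Combine the rule-extraction instructions with the source document."""
    return f"{instructions}\n\nSource Text:\n{source_text}"


def extract_rules(prompt: str,
                  model_id: str = "anthropic.claude-3-5-sonnet-20240620-v1:0") -> str:
    """Send the prompt through the Bedrock Converse API.

    The default model ID above is an assumption; use any model you
    have access to in your Region.
    """
    import boto3  # imported here so the prompt helper stays dependency-free

    client = boto3.client("bedrock-runtime")
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]


# Pass the full extraction prompt shown above as the instructions.
prompt = build_extraction_prompt(
    "Rewrite the text as numbered if-then rules.",
    "Students who complete all assignments pass the course.",
)
```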

#### Approach 2: Structured rule extraction


For complex or lengthy documents, ask the LLM to extract rules as structured JSON with metadata for each rule. This approach produces richer output that helps you audit which parts of the document each rule came from, how confident the extraction is, and which rules are inferred rather than directly stated. It also asks the LLM to generate sanity rules — common-sense boundary constraints such as "age must be non-negative" — which translate directly into the boundary rules that Automated Reasoning policies use. For more information on boundary rules, see [Validate ranges for numerical values](automated-reasoning-policy-best-practices.md#bp-validate-ranges).

**Example prompt:**

```
You are a logical reasoning expert. Extract formal logical rules from the
provided text.

Output Format:
For each rule, provide:
- Rule ID: [unique identifier]
- Conditions: [ALL preconditions — preserve compound conditions with AND/OR/NOT]
- Consequence: [the outcome/action]
- Confidence: [high/medium/low based on text clarity]
- Source Reference: [quote or paraphrase from source]
- Rule Type: [explicit/implicit/sanity]

Critical Guidelines:
1. PRESERVE ALL CONDITIONS: Do not drop or simplify conditions.
2. PRESERVE LOGICAL OPERATORS: Maintain AND, OR, NOT relationships exactly.
3. PRESERVE QUANTIFIERS: Keep "all", "any", "at least", numeric thresholds.
4. PRESERVE EXCEPTIONS: Include "unless", "except when" clauses.
5. Make implicit conditions explicit only when clearly implied by context.
6. Use consistent terminology across rules.
7. Flag ambiguities such as unclear, incomplete, or contradictory statements.
8. Add sanity rules for common-sense constraints:
   - Numeric ranges (e.g., "age must be between 0 and 150")
   - Temporal constraints (e.g., "start date must be before end date")
   - Physical limits (e.g., "quantity cannot be negative")
   - Mutual exclusivity (e.g., "status cannot be both active and inactive")

Output Requirements:
- Produce final JSON only (no text or markdown).
- Use the following JSON keys:
  - "rules" for the rules array
  - "ambiguities" for the ambiguities array

Source Text:
[Paste your document here]
```

After running the structured extraction, review the JSON output. Pay special attention to:
+ Rules with `confidence: low` — these may need manual verification against the source document.
+ Rules with `ruleType: implicit` — these were inferred rather than directly stated. Verify they accurately reflect the intent of the source.
+ The `ambiguities` array — these highlight areas where the source document is unclear and may need rewriting before extraction.

Convert the reviewed JSON rules into plain text if-then statements for use as your source document when creating the Automated Reasoning policy.
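This conversion can be a small script. The camelCase keys below (`conditions`, `consequence`, `ruleType`) match the field names referenced in the review checklist above, but the exact keys your LLM emits may vary, so treat this as a sketch:

```python
def rules_to_if_then(extraction: dict) -> str:
    """Render extracted JSON rules as numbered if-then statements.

    Assumes camelCase keys in each rule; adjust the key names to
    whatever your LLM actually produces.
    """
    lines = []
    for i, rule in enumerate(extraction.get("rules", []), start=1):
        lines.append(
            f"Rule {i}: If {rule['conditions']}, then {rule['consequence']}."
        )
    return "\n".join(lines)


extraction = {
    "rules": [
        {
            "ruleId": "R1",
            "conditions": "a student completes all assignments and attends at least 80% of classes",
            "consequence": "they will pass the course",
            "confidence": "high",
            "ruleType": "explicit",
        }
    ],
    "ambiguities": [],
}
print(rules_to_if_then(extraction))
```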

## Write effective instructions


When creating a policy, you can provide optional instructions that guide how Automated Reasoning processes your source document. While optional, good instructions significantly improve the quality of the extracted rules and variables.

Effective instructions should cover three things:

1. **Describe the use case.** Explain what your application does and what type of content the policy will validate. For example: "This policy will validate an HR chatbot that answers employee questions about leave of absence eligibility."

1. **Describe the types of questions users will ask.** Give examples of realistic user questions. For example: "Users will ask questions like 'Am I eligible for parental leave if I've worked here for 9 months?' or 'How many days of bereavement leave can I take?'"

1. **Focus the extraction.** If your document covers multiple topics, tell Automated Reasoning checks which parts to focus on and which to ignore. For example: "Focus on sections 3 through 5 which cover leave policies. Ignore the general company overview in section 1 and the organizational chart in section 2."

**Example instruction:**

```
This policy will validate HR questions about leave eligibility. The document
has sections on different leave types (parental, medical, bereavement, personal).
Users will ask questions like "Am I eligible for parental leave if I've worked
here for 9 months?" or "Can part-time employees take bereavement leave?"
Focus on the eligibility criteria for each leave type. Capture variables that
help determine whether an employee is eligible for a specific type of leave.
```

## Create a policy in the console


1. In the left navigation, choose **Automated Reasoning**, and then choose **Create policy**.

1. Enter a **Name** for the policy.

1. (Optional) Enter a **Description** for the policy.

1. <a name="source-document-step"></a>For **Source**, provide the document that describes the rules and policies of your knowledge domain. Do the following:

   1. For **Ingest method**, do one of the following:

      1. Select **Upload document**, then select **Choose file**. Upload a PDF document of the source content.

      1. Select **Enter text**. Paste or enter your source content.

   1. (Recommended) For **Instructions**, provide guidance on how to process your source document. See [Write effective instructions](#write-effective-instructions) for what to include.

1. (Optional) For **Tags**, choose **Add new tag** to tag your policy.

1. (Optional) For **Encryption**, choose a KMS key to encrypt your policy. You can use the default service-owned key or select a customer managed key.

1. Choose **Create policy**.

**Tip**  
If your application expects a specific set of variables, you can pre-define the schema before importing content. Use the `CreateAutomatedReasoningPolicy` API or CloudFormation to create a policy with a `policyDefinition` that contains your desired variables and types but no rules. Then use [Iterative policy building](#iterative-policy-building) to import your source document. Automated Reasoning will use your predefined schema as a starting point and add rules that reference your variables.
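A pre-defined schema passed as `policyDefinition` might look like the fragment below. The top-level keys match the build workflow example later on this page; the per-variable field names and `type` values are assumptions — verify them against the `CreateAutomatedReasoningPolicy` API reference:

```json
{
  "version": "1.0",
  "types": [],
  "rules": [],
  "variables": [
    {
      "name": "isFullTime",
      "type": "BOOLEAN",
      "description": "Whether the employee works full-time."
    },
    {
      "name": "tenureMonths",
      "type": "INTEGER",
      "description": "Months of continuous service."
    }
  ]
}
```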

## Create a policy using the API


An Automated Reasoning policy is a resource in your AWS account identified by an Amazon Resource Name (ARN). Creating a policy through the API is a two-step process: first create the policy resource, then start a build workflow to extract rules from your document.

### Step 1: Create the policy resource


Use the `CreateAutomatedReasoningPolicy` API to create the policy resource.

`name` (required)  
The name of the policy. Must be unique within your AWS account and Region.

`description` (optional)  
A description of the policy's purpose.

`policyDefinition` (optional)  
An initial policy definition with rules, variables, and custom types. Use this if you already have a schema you want to start from.

`kmsKeyId` (optional)  
The KMS key identifier for encrypting the policy. If not specified, Amazon Bedrock uses a service-owned key.

`tags` (optional)  
Tags to associate with the policy.

`clientRequestToken` (optional)  
An idempotency token to ensure the operation completes no more than once.

**Example:**

```
aws bedrock create-automated-reasoning-policy \
  --name "MyHRPolicy" \
  --description "Validates HR chatbot responses about leave eligibility" \
  --kms-key-id arn:aws:kms:us-east-1:111122223333:key/12345678-1234-1234-1234-123456789012
```

Example response:

```
{
  "createdAt": "2025-07-21T14:43:52.692Z",
  "definitionHash": "f16ba1ceca36e1d21adce559481add6a...",
  "name": "MyHRPolicy",
  "policyArn": "arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/lnq5hhz70wgk",
  "updatedAt": "2025-07-21T14:43:52.692Z",
  "version": "DRAFT"
}
```

### Step 2: Start a build workflow to extract rules


Use the `StartAutomatedReasoningPolicyBuildWorkflow` API with the policy ARN from step 1 to extract rules and variables from your source document.

`policyArn` (required)  
The ARN of the policy resource created in step 1.

`buildWorkflowType` (required)  
Set to `INGEST_CONTENT` to extract rules from a document.

`sourceContent` (required)  
Contains the document to process and an optional starting policy definition.

**Example:**

```
# Encode your PDF to base64
PDF_BASE64=$(base64 -i your-policy.pdf | tr -d '\n')

# Start the build workflow
aws bedrock start-automated-reasoning-policy-build-workflow \
  --policy-arn arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/lnq5hhz70wgk \
  --build-workflow-type INGEST_CONTENT \
  --source-content "{
    \"policyDefinition\": {
      \"version\": \"1.0\",
      \"types\": [],
      \"rules\": [],
      \"variables\": []
    },
    \"workflowContent\": {
      \"documents\": [
        {
          \"document\": \"$PDF_BASE64\",
          \"documentContentType\": \"pdf\",
          \"documentName\": \"HR Leave Policy\",
          \"documentDescription\": \"Validates HR chatbot responses about leave eligibility. Users ask questions like 'Am I eligible for parental leave?'\"
        }
      ]
    }
  }"
```

Example response:

```
{
  "policyArn": "arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/lnq5hhz70wgk",
  "buildWorkflowId": "d40fa7fc-351e-47d8-a338-53e4b3b1c690"
}
```

Check the build status with `ListAutomatedReasoningPolicyBuildWorkflows`:

```
aws bedrock list-automated-reasoning-policy-build-workflows \
  --policy-arn arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/lnq5hhz70wgk
```
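Because the build workflow is asynchronous, you will typically poll until it finishes. The sketch below keeps the polling loop separate from the AWS call so it can be exercised with a stub; `IN_PROGRESS` appears later on this page, but the other terminal status names are assumptions to verify against the API reference:

```python
import time


def wait_for_build(fetch_status, poll_seconds=30, max_polls=120):
    """Poll a status callable until the build leaves IN_PROGRESS.

    fetch_status: zero-argument callable returning the workflow status
    string, for example a wrapper around the list-build-workflows call.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status != "IN_PROGRESS":
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("build workflow did not finish in time")


# Stubbed usage: a real callable would invoke the Bedrock API.
statuses = iter(["IN_PROGRESS", "COMPLETED"])
print(wait_for_build(lambda: next(statuses), poll_seconds=0))  # COMPLETED
```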

## Review the extracted policy


After a build completes, review the extracted policy definition before you start testing. Catching issues at this stage saves time compared to discovering them through failed tests later.

In the console, open your policy and go to the **Definitions** page. Via the API, use `GetAutomatedReasoningPolicyBuildWorkflowResultAssets` with `--asset-type POLICY_DEFINITION` to retrieve the extracted definition, and `--asset-type QUALITY_REPORT` to retrieve the quality report. You can see a full list of the assets produced during the workflow, such as the fidelity report, using the `--asset-type ASSET_MANIFEST` parameter.

Check for the following issues:

1. **Unused variables.** In the console, look for warning indicators next to variables. These flag variables that aren't referenced by any rules. Delete unused variables — they add noise to the translation process and can cause `TRANSLATION_AMBIGUOUS` results. In the API, unused variables are listed in the `QUALITY_REPORT` asset.

1. **Duplicate or near-duplicate variables.** Scan the variable list for variables with overlapping meanings, such as `tenureMonths` and `monthsOfService`. Duplicate variables confuse the translation process because Automated Reasoning checks can't determine which one to use for a given concept. Merge or delete duplicates.

1. **Bare assertions (rules not in if-then format).** Skim the rules and look for rules that aren't in if-then format, such as `(= eligibleForParentalLeave true)`. Bare assertions create axioms — statements that are always true — which make certain conditions logically impossible and lead to unexpected `IMPOSSIBLE` results during validation. Rewrite them as conditionals (for example, `(=> (and isFullTime (>= tenureMonths 12)) eligibleForParentalLeave)`) or delete them. Bare assertions are appropriate only for boundary conditions like `(>= accountBalance 0)`.

1. **Conflicting rules.** The quality report flags rules that contradict each other. Conflicting rules cause your policy to return `IMPOSSIBLE` for all validation requests that involve the conflicting rules. Resolve conflicts by merging the rules or deleting one of them.

1. **Missing rules or variables.** Compare the extracted policy against your source document. If important rules or concepts are missing, you can add them manually or re-create the policy with better instructions.

**Tip**  
The quality report also identifies disjoint rule sets — groups of rules that don't share any variables. Disjoint rule sets aren't necessarily a problem (your policy may cover independent topics), but they can indicate that variables are missing connections between related rules.

## Review the fidelity report


When you create a policy from a source document, a fidelity report is automatically generated alongside the extracted policy. The fidelity report measures how accurately the policy represents your source content and provides detailed grounding that links each rule and variable back to specific statements in the document. For more information about fidelity report concepts, see [Fidelity report](automated-reasoning-checks-concepts.md#ar-concept-fidelity-report).

### Review the fidelity report in the console


In the console, open your policy and choose the **Source Document** tab (next to **Definitions**). The **Source Content** view displays each atomic statement extracted from your document as a numbered row in a table. Each row shows:
+ The statement number and extracted text.
+ The source **Document** the statement came from.
+ The number of **Rules** grounded by that statement.
+ The number of **Variables** grounded by that statement.

Use the **Rules** and **Variables** dropdown filters at the top of the table to focus on statements that ground a specific rule or variable. Use the search bar to find specific content within the extracted statements.

If you edit the policy after the initial extraction — for example, by modifying rules or adding variables — choose the **Regenerate** button to update the fidelity report so it reflects your current policy definition.

### Review the fidelity report using the API


Use `GetAutomatedReasoningPolicyBuildWorkflowResultAssets` with `--asset-type FIDELITY_REPORT` to retrieve the fidelity report. To regenerate the report after making policy changes, use `StartAutomatedReasoningPolicyBuildWorkflow` with the build workflow type `GENERATE_FIDELITY_REPORT` and provide the source documents in the `generateFidelityReportContent` field. The workflow re-analyzes the documents against the current policy definition and produces a new fidelity report. You can also retrieve the original source documents from a previous build workflow using `--asset-type SOURCE_DOCUMENT` with the `--asset-id` parameter (obtain the asset ID from the asset manifest).

### What to look for


When reviewing the fidelity report from the APIs, pay attention to:
+ **Low coverage score.** A low coverage score indicates that significant portions of your source document were not captured in the policy. Look for statements with 0 rules and 0 variables in the source content view to identify which parts of the document were missed, and consider using iterative policy building to add the missing content. See [Iterative policy building](#iterative-policy-building).
+ **Low accuracy score on individual rules.** Each rule has its own accuracy score and justification. Rules with low accuracy scores may not faithfully represent the source material. Use the **Rules** filter to isolate the grounding statements for a specific rule and compare them against the rule's formal logic to identify misinterpretations.
+ **Ungrounded rules or variables.** Rules or variables that lack grounding statements may have been inferred rather than directly extracted from the document. Verify that these are correct or remove them if they don't reflect your intent.
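Programmatically, the ungrounded-rules check above can be a small scan over the retrieved report. The JSON shape used here (a `rules` array whose entries carry `groundingStatements`) is a hypothetical structure for illustration — inspect your actual `FIDELITY_REPORT` asset for the real field names:

```python
def find_ungrounded_rules(report: dict) -> list[str]:
    """Return IDs of rules that have no grounding statements.

    The report structure is hypothetical; adapt the key names to the
    FIDELITY_REPORT asset your build workflow actually produces.
    """
    return [
        rule["id"]
        for rule in report.get("rules", [])
        if not rule.get("groundingStatements")
    ]


report = {
    "rules": [
        {"id": "r1", "groundingStatements": ["statement 3"]},
        {"id": "r2", "groundingStatements": []},
    ]
}
print(find_ungrounded_rules(report))  # ['r2']
```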

**Tip**  
The fidelity report is especially useful for collaboration with domain experts who authored the source document. Share the **Source Document** view with them so they can verify that the policy correctly captures their intent without needing to read the formal logic rules directly.

## Iterative policy building


For complex domains, build your policy incrementally rather than trying to capture everything in a single document upload. Start with a focused subset of your rules, create and test the policy, then add more content in subsequent iterations.

### Add content in the console


1. Open your Automated Reasoning policy in the console.

1. On the **Definitions** page, choose **Import**.

1. Select the option to merge the new content with the existing policy definition.

1. Upload or paste the additional source content.

1. Review the updated policy definition and resolve any new conflicts or duplicates.

### Add content using the API


Call `StartAutomatedReasoningPolicyBuildWorkflow` with `INGEST_CONTENT`, passing the complete current policy definition alongside the new document. You must include the full existing definition — rules, variables, and types — so that the new content is merged with the existing policy rather than replacing it.

```
# First, retrieve the current policy definition
aws bedrock get-automated-reasoning-policy \
  --policy-arn arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/lnq5hhz70wgk

# Encode the new document
PDF_BASE64=$(base64 -i additional-rules.pdf | tr -d '\n')

# Start a build workflow with the existing definition + new document
aws bedrock start-automated-reasoning-policy-build-workflow \
  --policy-arn arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/lnq5hhz70wgk \
  --build-workflow-type INGEST_CONTENT \
  --source-content "{
    \"policyDefinition\": EXISTING_POLICY_DEFINITION_JSON,
    \"workflowContent\": {
      \"documents\": [
        {
          \"document\": \"$PDF_BASE64\",
          \"documentContentType\": \"pdf\",
          \"documentName\": \"Additional Benefits Rules\",
          \"documentDescription\": \"Additional rules covering medical and bereavement leave eligibility.\"
        }
      ]
    }
  }"
```

**Important**  
The API supports a maximum of 2 build workflows per policy, with only 1 allowed to be `IN_PROGRESS` at any time. If you need to start a new build and already have 2 workflows, delete an old one first using `DeleteAutomatedReasoningPolicyBuildWorkflow`.

## KMS permissions for Automated Reasoning policies


If you specify a customer managed KMS key to encrypt your Automated Reasoning policy, you must configure permissions that allow Amazon Bedrock to use the key on your behalf.

### Key policy permissions


Add the following statement to your KMS key policy to allow Amazon Bedrock to use the key for Automated Reasoning policies:

```
{
  "Sid": "PermissionsForAutomatedReasoningPolicy",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::111122223333:user/role"
  },
  "Action": [
    "kms:Decrypt",
    "kms:DescribeKey",
    "kms:GenerateDataKey"
  ],
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "kms:EncryptionContext:aws:bedrock:automated-reasoning-policy": [
        "arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/policy-id",
        "arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/policy-id:*"
      ],
      "kms:ViaService": "bedrock.us-east-1.amazonaws.com"
    }
  }
}
```

### IAM permissions


Your IAM principal must have the following permissions to use a customer managed KMS key with Automated Reasoning policies:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowKMSForAutomatedReasoningPolicy",
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:DescribeKey",
        "kms:GenerateDataKey"
      ],
      "Resource": "arn:aws:kms:us-east-1:111122223333:key/key-id",
      "Condition": {
        "StringEquals": {
          "kms:EncryptionContext:aws:bedrock:automated-reasoning-policy": [
            "arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/policy-id",
            "arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/policy-id:*"
          ],
          "kms:ViaService": "bedrock.us-east-1.amazonaws.com"
        }
      }
    }
  ]
}
```

### Encryption context


Amazon Bedrock uses encryption context to provide additional security for your Automated Reasoning policies. The encryption context is a set of key-value pairs used as additional authenticated data when encrypting and decrypting your policy.

For Automated Reasoning policies, Amazon Bedrock uses the following encryption context:
+ **Key:** `aws:bedrock:automated-reasoning-policy`
+ **Value:** The Amazon Resource Name (ARN) of your Automated Reasoning policy

# Automated Reasoning policy best practices
Policy best practices

This page consolidates best practices for creating and maintaining Automated Reasoning policies. Read this before creating your first policy and refer back to it when debugging issues. For the conceptual foundations behind these practices, see [Automated Reasoning checks concepts](automated-reasoning-checks-concepts.md). For step-by-step creation instructions, see [Create your Automated Reasoning policy](create-automated-reasoning-policy.md).

## Start simple and iterate


The most common mistake when creating an Automated Reasoning policy is trying to capture an entire complex document in a single pass. Instead, start with a focused subset of your rules and build incrementally.

1. Pick a single, well-defined section of your source document (for example, parental leave eligibility from an HR handbook).

1. Create a policy from that section and review the extracted rules and variables.

1. Write tests that cover the key scenarios for that section.

1. Fix any issues before adding more content.

1. Use iterative policy building to merge additional sections one at a time. For more information, see [Iterative policy building](create-automated-reasoning-policy.md#iterative-policy-building).

This approach has two advantages: it makes issues easier to isolate (you know which section introduced a problem), and it keeps the policy manageable during development. A policy with 10 well-tested rules is more useful than one with 100 untested rules.

## Pre-process documents with an LLM


For documents that are lengthy, contain narrative prose, or mix rules with non-rule content (such as legal disclaimers or organizational background), run the document through an LLM before uploading it to Automated Reasoning checks. Ask the LLM to extract the content as explicit if-then rules. This preprocessing step significantly improves the quality of the extracted policy because Automated Reasoning checks work best with clear, declarative statements rather than unstructured text.

When writing your preprocessing prompt, include the following instructions for the LLM:
+ Extract rules in if-then format with clear conditions and consequences.
+ Preserve all conditions, logical operators (AND, OR, NOT), quantifiers ("at least", "at most"), and exception clauses ("unless", "except when").
+ Add sanity rules for common-sense constraints — such as "account balance cannot be negative" or "credit score must be between 300 and 850" — which translate into boundary rules in your policy (see [Validate ranges for numerical values](#bp-validate-ranges)).

**Important**  
Always review the LLM's output against your original document before using it as source text. LLMs can hallucinate rules not present in the source, misinterpret conditions, or drop important exceptions. The preprocessing step is a starting point — not a substitute for human review.

For detailed prompt templates and a step-by-step preprocessing workflow, see [(Optional) Use an LLM to rewrite documents as logical rules](create-automated-reasoning-policy.md#preprocess-with-llm).

## Use implications (=>) to structure rules


The if-then format (using the `=>` implication operator) is the single most important rule-writing pattern. Every rule that expresses a conditional relationship should use this format.


| Good: Implication | Bad: Bare assertion | 
| --- | --- | 
| (=> (and isFullTime (> tenureMonths 12)) eligibleForParentalLeave) | eligibleForParentalLeave | 
| (=> (> loanAmount 500000) requiresCosigner) | requiresCosigner | 

Bare assertions (rules without an if-then structure) create axioms — statements that are always true. The assertion `eligibleForParentalLeave` tells Automated Reasoning checks that parental leave eligibility is always true, regardless of any conditions. Any input that says the user is *not* eligible would return `IMPOSSIBLE` because it contradicts this axiom.

Bare assertions are appropriate only for boundary conditions that should always hold, such as:

```
;; Account balance can never be negative
(>= accountBalance 0)

;; Interest rate is always between 0 and 1
(and (>= interestRate 0) (<= interestRate 1))
```

If you find bare assertions in your extracted policy, rewrite them as conditionals or delete them. For more information on reviewing your extracted policy, see [Review the extracted policy](create-automated-reasoning-policy.md#review-extracted-policy).
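To see why an axiom behaves this way, here is a minimal Python sketch (not the service's actual solver) that treats each rule and each input as an assignment of truth values and reports a contradiction when they disagree:

```python
# Minimal consistency check: an axiom fixes a variable's value forever,
# so any observation that assigns the opposite value is a contradiction
# (the service would report this as IMPOSSIBLE).

def consistent(axioms, observation):
    """Return True if the observation can coexist with the axioms.

    Each axiom and the observation is a dict of variable -> bool.
    Consistency means no variable is forced to two different values.
    """
    combined = {}
    for assignment in axioms + [observation]:
        for var, value in assignment.items():
            if var in combined and combined[var] != value:
                return False  # contradiction
            combined[var] = value
    return True

# Bare assertion: eligibleForParentalLeave is always true.
axioms = [{"eligibleForParentalLeave": True}]

# An input saying the user is NOT eligible contradicts the axiom.
print(consistent(axioms, {"eligibleForParentalLeave": False}))  # False
print(consistent(axioms, {"eligibleForParentalLeave": True}))   # True
```

An implication, by contrast, constrains the consequence only when its condition holds, so both eligible and non-eligible inputs remain representable.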

## Write comprehensive variable descriptions


Variable descriptions are the primary factor in translation accuracy. When Automated Reasoning checks translates natural language into formal logic, it uses variable descriptions to determine which variables correspond to concepts mentioned in the text. Vague or incomplete descriptions are the most common cause of `TRANSLATION_AMBIGUOUS` results.

A good variable description should answer four questions:

1. **What does this variable represent?** Explain the concept in plain language.

1. **What unit or format does it use?** Specify units (months, dollars, percentage as decimal) and any conversion rules.

1. **How might users refer to this concept?** Include synonyms, alternative phrasings, and common ways users express this concept in everyday language.

1. **What are the boundary conditions?** Describe edge cases, default values, and what the variable means when set to specific values.

**Example: Before and after**


| Vague (causes translation failures) | Detailed (translates reliably) | 
| --- | --- | 
| tenureMonths: "How long the employee has worked." | tenureMonths: "The number of complete months the employee has been continuously employed. When users mention years of service, convert to months (for example, 2 years = 24 months). Set to 0 for new hires who have not yet completed their first month." | 
| isFullTime: "Full-time status." | isFullTime: "Whether the employee works full-time (true) or part-time (false). Set to true when users mention being 'full-time', working 'full hours', or working 40 or more hours per week. Set to false when users mention being 'part-time', working 'reduced hours', or working fewer than 40 hours per week." | 
| interestRate: "The interest rate." | interestRate: "The annual interest rate expressed as a decimal value, where 0.05 means 5% and 0.15 means 15%. When users mention a percentage like '5%', convert to the decimal form (0.05)." | 

## Use booleans for non-exclusive states


When modeling states that can co-exist, use separate boolean variables instead of a single enum. A person can be both a veteran and a teacher. Using an enum `customerType = {VETERAN, TEACHER}` forces a choice between them, creating a logical contradiction when both apply.


| Good: Separate booleans | Bad: Enum for non-exclusive states | 
| --- | --- | 
|  `isVeteran` (bool): "Whether the customer is a military veteran." `isTeacher` (bool): "Whether the customer is a teacher."  |  `customerType` (enum: VETERAN, TEACHER, STUDENT): "The type of customer." Problem: A customer who is both a veteran and a teacher cannot be represented.  | 

Reserve enums for truly mutually exclusive categories where only one value can apply at a time, such as `leaveType = {PARENTAL, MEDICAL, BEREAVEMENT}` (an employee can only request one type of leave at a time). For more information on custom types, see [Custom types (enums)](automated-reasoning-checks-concepts.md#ar-concept-custom-types).
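The distinction can be sketched with Python's own type system (hypothetical names, for illustration only): independent booleans for states that co-exist, an `Enum` for states that exclude each other:

```python
from dataclasses import dataclass
from enum import Enum

class LeaveType(Enum):
    # Mutually exclusive: a leave request carries exactly one value.
    PARENTAL = "PARENTAL"
    MEDICAL = "MEDICAL"
    BEREAVEMENT = "BEREAVEMENT"

@dataclass
class Customer:
    # Co-existing states: each boolean varies independently.
    is_veteran: bool
    is_teacher: bool

# A veteran who is also a teacher is representable with booleans...
both = Customer(is_veteran=True, is_teacher=True)
print(both.is_veteran and both.is_teacher)  # True

# ...but a single enum field could hold only one of the two labels,
# which is exactly the failure mode described above.
request = LeaveType.PARENTAL
print(request is LeaveType.PARENTAL)  # True
```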

## Specify units and formats in variable descriptions


Ambiguity about units is a common source of translation errors. If a user says "I've worked here for 2 years" and your variable is `tenureMonths`, the translation needs to know to convert years to months. If your variable description doesn't specify the unit, the translation may assign `tenureMonths = 2` instead of `tenureMonths = 24`.

Always specify:
+ The unit of measurement (months, days, dollars, percentage).
+ The format (decimal vs. percentage, date format, currency).
+ Conversion rules for common alternative expressions (for example, "2 years = 24 months").

**Examples:**
+ `loanAmount`: "The total loan amount in US dollars. When users mention amounts in thousands (for example, '500K'), convert to the full number (500000)."
+ `submissionDate`: "The number of days after the due date that the submission was made. A value of 0 means the submission was on time. Positive values indicate late submissions."
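As an illustration of the kind of conversion a description must spell out, the following Python sketch (a hypothetical helper, not part of the service) normalizes tenure phrases to months:

```python
import re

def tenure_months_from_text(text):
    """Extract tenure in months from phrases like '2 years' or '18 months'.

    Returns None when no tenure phrase is found.
    """
    match = re.search(r"(\d+)\s*(year|month)s?", text, re.IGNORECASE)
    if not match:
        return None
    amount, unit = int(match.group(1)), match.group(2).lower()
    # The conversion rule the variable description must state explicitly.
    return amount * 12 if unit == "year" else amount

print(tenure_months_from_text("I've worked here for 2 years"))  # 24
print(tenure_months_from_text("about 18 months of service"))    # 18
```

Without the "2 years = 24 months" rule in the description, the translation has no basis for this conversion and may assign the raw number instead.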

## Validate ranges for numerical values


For numerical variables, add boundary rules that constrain the valid range. This prevents logically impossible scenarios and helps Automated Reasoning checks produce more meaningful results.

```
;; Account balance cannot be negative
(>= accountBalance 0)

;; Interest rate must be between 0 and 1 (0% to 100%)
(and (>= interestRate 0) (<= interestRate 1))

;; Credit score ranges from 300 to 850
(and (>= creditScore 300) (<= creditScore 850))

;; Tenure in months cannot be negative
(>= tenureMonths 0)
```

Without these boundary rules, Automated Reasoning checks might consider scenarios with negative account balances or credit scores above 1000, which are meaningless in your domain. Boundary rules are one of the few cases where bare assertions (rules not in if-then format) are appropriate.

## Use intermediate variables for abstraction


When multiple rules share a common condition, extract that condition into an intermediate boolean variable. This simplifies your rules and makes the policy easier to maintain.

**Example: Membership tiers**

Instead of repeating the membership condition in every benefit rule:

```
;; Without intermediate variable (repetitive)
(=> (and (> purchaseTotal 1000) (> accountAge 12)) eligibleForFreeShipping)
(=> (and (> purchaseTotal 1000) (> accountAge 12)) eligibleForPrioritySupport)
(=> (and (> purchaseTotal 1000) (> accountAge 12)) eligibleForEarlyAccess)
```

Define an intermediate variable and reference it:

```
;; With intermediate variable (cleaner)
(=> (and (> purchaseTotal 1000) (> accountAge 12)) isPremiumMember)
(=> isPremiumMember eligibleForFreeShipping)
(=> isPremiumMember eligibleForPrioritySupport)
(=> isPremiumMember eligibleForEarlyAccess)
```

This pattern makes it easier to update the membership criteria later — you only need to change one rule instead of three.
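The same refactoring looks like this in plain Python (hypothetical names): hoist the shared condition into one intermediate value instead of repeating it in every rule.

```python
def benefits(purchase_total, account_age_months):
    """Derive benefit eligibility from the shared membership condition."""
    # Intermediate value: the membership criteria live in one place.
    is_premium_member = purchase_total > 1000 and account_age_months > 12
    return {
        "free_shipping": is_premium_member,
        "priority_support": is_premium_member,
        "early_access": is_premium_member,
    }

print(benefits(1500, 24))  # all True
print(benefits(500, 24))   # all False
```

Changing the membership criteria now touches a single line, just as updating the policy touches a single rule.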

## Use enums for categorization


When a variable represents a category with a fixed set of mutually exclusive values, use a custom type (enum) instead of multiple booleans or a string. Enums constrain the possible values and make rules clearer.


| Good: Enum | Avoid: Multiple booleans for exclusive states | 
| --- | --- | 
|  Type: `LeaveType = {PARENTAL, MEDICAL, BEREAVEMENT, PERSONAL}` Variable: `leaveType` (LeaveType) Rule: `(=> (= leaveType PARENTAL) (>= leaveDays 60))`  |  `isParentalLeave` (bool) `isMedicalLeave` (bool) `isBereavementLeave` (bool) Problem: Nothing prevents multiple booleans from being true simultaneously.  | 

**Tip**  
Include an `OTHER` or `NONE` value in your enum if it's possible for the input to not match any of the defined categories. This prevents translation issues when the input doesn't fit neatly into one of the defined values.

## Keep logic declarative, not procedural


Automated Reasoning policies describe *what is true*, not *how to compute it*. Avoid writing rules that look like code with sequential steps or precedence logic.


| Good: Declarative | Avoid: Procedural thinking | 
| --- | --- | 
|  "If the employee is full-time and has more than 12 months of tenure, then they are eligible for parental leave." This states a fact about the relationship between conditions and outcomes.  |  "First check if the employee is full-time. If yes, then check tenure. If tenure is greater than 12 months, set eligibility to true." This describes a procedure, not a logical relationship.  | 

Similarly, avoid encoding precedence or priority between rules. In formal logic, all rules apply simultaneously. If you need to express that one condition overrides another, encode it explicitly in the rule conditions:

```
;; GOOD: Explicit exception handling
;; General rule: full-time employees with 12+ months get parental leave
(=> (and isFullTime (> tenureMonths 12) (not isOnProbation))
    eligibleForParentalLeave)

;; BAD: Trying to encode precedence
;; "Rule 1 takes priority over Rule 2" — this concept doesn't exist
;; in formal logic. Instead, combine the conditions into a single rule.
```

## Naming conventions


Consistent naming makes policies easier to read, maintain, and debug. Follow these conventions:
+ **Boolean variables:** Use the `is` or `has` prefix. For example: `isFullTime`, `hasDirectDeposit`, `isEligibleForLeave`.
+ **Numerical variables:** Include the unit in the name. For example: `tenureMonths`, `loanAmountUSD`, `creditScore`.
+ **Enum types:** Use PascalCase for type names and UPPER_SNAKE_CASE for values. For example: `LeaveType = {PARENTAL, MEDICAL, BEREAVEMENT}`.
+ **Variables:** Use camelCase. For example: `tenureMonths`, `isFullTime`, `leaveType`.

Avoid abbreviations that might be ambiguous. Use `tenureMonths` instead of `tenMo`, and `isFullTime` instead of `ft`. Clear names help both human reviewers and the translation process.

## Common anti-patterns


The following patterns frequently cause issues in Automated Reasoning policies. If you encounter unexpected test results, check whether your policy contains any of these anti-patterns.

### Axioms instead of implications


As described in [Use implications (=>) to structure rules](#bp-use-implications), bare assertions create axioms that are always true. This is the most common anti-pattern and the most damaging — it makes entire categories of inputs return `IMPOSSIBLE`.

**Symptom:** Tests that should return `VALID` or `INVALID` return `IMPOSSIBLE` instead.

**Fix:** Find bare assertions in your rules and rewrite them as implications, or delete them if they don't represent boundary conditions.

### Overlapping variables


Having two variables that represent the same or similar concepts (for example, `tenureMonths` and `monthsOfService`) confuses the translation process. Automated Reasoning checks can't determine which variable to use for a given concept, leading to inconsistent translations and `TRANSLATION_AMBIGUOUS` results.

**Symptom:** Tests return `TRANSLATION_AMBIGUOUS` even with clear, unambiguous input text.

**Fix:** Merge overlapping variables into a single variable with a comprehensive description. Update all rules that reference the deleted variable.

### Overly complex policies


Policies with too many variables, deeply nested conditions, or non-linear arithmetic can exceed processing limits and return `TOO_COMPLEX` results.

**Symptom:** Tests return `TOO_COMPLEX` or time out.

**Fix:** Simplify the policy. Remove unused variables, break complex rules into simpler ones using intermediate variables, and avoid non-linear arithmetic (exponents, irrational numbers). If your domain is genuinely complex, consider splitting it into multiple focused policies.

### Contradictory rules


Rules that contradict each other make it impossible for Automated Reasoning checks to reach a conclusion. For example, one rule says full-time employees are eligible for leave, while another says employees in their first year are not eligible — without specifying what happens to full-time employees in their first year.

**Symptom:** Tests return `IMPOSSIBLE` for inputs that involve the conflicting rules.

**Fix:** Check the quality report for conflicting rules. Resolve conflicts by merging the rules into a single rule with explicit conditions, or by deleting one of the conflicting rules. For more information, see [Review the extracted policy](create-automated-reasoning-policy.md#review-extracted-policy).

### Unused variables


Variables that aren't referenced by any rules add noise to the translation process. The translation may assign values to unused variables, wasting processing capacity and potentially causing `TRANSLATION_AMBIGUOUS` results when the unused variable competes with a similar active variable.

**Symptom:** Unexpected `TRANSLATION_AMBIGUOUS` results, or translations that assign values to variables that don't affect any rules.

**Fix:** Delete unused variables. In the console, look for warning indicators next to variables. Via the API, check the quality report from `GetAutomatedReasoningPolicyBuildWorkflowResultAssets` with `--asset-type QUALITY_REPORT`.

### Missing enum values


If your enum doesn't include a value for every possible category that users might mention, the translation may fail or produce unexpected results when the input doesn't match any defined value.

**Symptom:** Tests return `TRANSLATION_AMBIGUOUS` or `NO_TRANSLATIONS` when the input mentions a category not in the enum.

**Fix:** Add an `OTHER` or `NONE` value to your enum to handle inputs that don't match the defined categories. Update the enum value descriptions to clarify when each value applies.

# Test an Automated Reasoning policy
Test an Automated Reasoning policy

Testing validates that your policy's rules are correct and that Automated Reasoning checks can accurately translate natural language into formal logic. You test a policy by sending natural language statements for validation, then inspecting the feedback to ensure the translation uses the right variables and that the rules produce the expected results.

There are two complementary testing approaches: generated scenarios and question-and-answer (QnA) tests. Each targets a different part of the validation pipeline. The recommended workflow is to start with scenarios to validate rule correctness, then add QnA tests to validate translation accuracy.

## Testing strategy: scenarios vs. QnA tests


Automated Reasoning checks validate content in two steps: first, foundation models translate natural language into formal logic; then, mathematical techniques verify the logic against your policy rules. Each testing approach targets a different step in this pipeline.

### Generated scenarios (test rule correctness)


Generated scenarios test the *semantics encoded in your policy rules* directly. They remove the uncertainty of natural language translation from the equation, isolating whether the rules themselves are correct.

Scenarios are generated from your policy rules and represent situations that are logically possible given those rules. They are sorted to surface the most likely-to-be-wrong scenarios first. For each scenario, you review the variable assignments and decide:
+ **Thumbs up** — The scenario is realistic and should indeed be possible. Save it as a `SATISFIABLE` test.
+ **Thumbs down** — Something is off. The scenario shouldn't be possible given your domain knowledge. Provide natural language feedback explaining why, and Automated Reasoning checks will attempt to deduce the necessary rule changes.

**Example:** Your policy says full-time employees with 12+ months of tenure are eligible for parental leave. A generated scenario might show `isFullTime = true, tenureMonths = 3, eligibleForParentalLeave = true`. If this scenario shouldn't be possible (because 3 months is less than 12), you'd give it a thumbs down and explain that employees need at least 12 months of tenure. This indicates a missing or incorrect rule.

Use scenarios as your *first* testing step. They help you catch rule issues before you invest time writing QnA tests.

### QnA tests (test translation accuracy)


QnA tests validate the *full pipeline end-to-end*: natural language translation and rule validation together. They mimic real user interactions and catch translation issues that scenarios can't detect.

Each QnA test consists of:
+ An **input** (optional) — The question a user might ask your application.
+ An **output** — The response your foundation model might generate.
+ An **expected result** — The validation result you expect (for example, `VALID` or `INVALID`).

**Example:** For the same parental leave policy, a QnA test might be: input = "I've been working here for 2 years full-time. Can I take parental leave?", output = "Yes, you are eligible for parental leave.", expected result = `VALID`. This tests whether Automated Reasoning checks correctly translates "2 years" to `tenureMonths = 24` and "full-time" to `isFullTime = true`.

**Tip**  
Create tests that cover both valid and invalid scenarios. For example, if your policy states "Employees need 1 year of service for parental leave," create tests for responses that correctly state this rule *and* tests for responses that incorrectly state a different requirement.

### Recommended testing workflow


1. **Generate and review scenarios.** Start here to validate that your rules are correct. Fix any rule issues before proceeding.

1. **Write QnA tests for key use cases.** Focus on the questions your users are most likely to ask and the responses your LLM is most likely to generate. Include edge cases and boundary conditions.

1. **Run all tests.** Check that both scenarios and QnA tests pass.

1. **Iterate.** If tests fail, determine whether the issue is in the rules (fix the policy) or in the translation (improve variable descriptions). For more information, see [Troubleshoot and refine your Automated Reasoning policy](address-failed-automated-reasoning-tests.md).

## Generate test scenarios automatically in the console


1. Go to the Automated Reasoning policy that you want to test (for example, **MyHrPolicy**).

1. Choose **View tests**, then select **Generate**.

1. In the **Generate scenarios** dialog, review the generated scenario and the related rules. Each scenario shows a set of variable assignments that are logically possible given your policy rules. Evaluate whether the scenario is realistic in your domain:
   + If the scenario could happen in your domain (it is *satisfiable*), select the thumbs up icon. This saves the scenario as a test that expects a `SATISFIABLE` result.
   + If the scenario shouldn't be possible, select the thumbs down icon. Provide an annotation explaining why — for example, "Employees need at least 12 months of tenure for parental leave, but this scenario shows 3 months with eligibility." Automated Reasoning checks uses your feedback to deduce rule changes that would prevent this scenario.
   + If you want a different scenario, choose **Regenerate scenario**.
**Tip**  
To inspect the formal logic version of the scenario, enable **Show SMT-LIB**. This is useful for understanding exactly which rules and variable assignments are involved.

1. Select **Save and close** to save the test, or **Save and add another** to continue reviewing scenarios.

1. If you provided annotations (thumbs down feedback) to any scenarios, choose **Apply annotations**. Automated Reasoning checks will start a build workflow to apply the changes to your policy based on your feedback.

1. On the **Review policy changes** screen, review the proposed changes to your policy's rules, variables, and variable types. Then select **Accept changes**.

## Generate test scenarios automatically using the API


Use the `GetAutomatedReasoningPolicyNextScenario` API to fetch generated test scenarios based on your policy's rules.

`policyArn` (required)  
The ARN of the Automated Reasoning policy.

`buildWorkflowId` (required)  
The identifier of the build workflow for the generated scenarios. Retrieve the latest build workflow using the `ListAutomatedReasoningPolicyBuildWorkflows` API.

**Example:**

```
aws bedrock get-automated-reasoning-policy-next-scenario \
  --policy-arn "arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/lnq5hhz70wgk" \
  --build-workflow-id d40fa7fc-351e-47d8-a338-53e4b3b1c690
```

The response includes a generated scenario with variable assignments and the related policy rules. Review the scenario and use the `CreateAutomatedReasoningPolicyTestCase` API to save it as a test, or use the annotation APIs to provide feedback if the scenario reveals a rule issue.

## Create a QnA test manually in the console


1. Go to the Automated Reasoning policy that you want to test (for example, **MyHrPolicy**).

1. Choose **View tests**, then select **Add**.

1. In the **Add tests** dialog, do the following:

   1. For **Input** (optional), enter the question a user might ask. For **Output**, enter the response your foundation model might provide. Together these form a QnA pair that tests how your policy validates real user interactions.

   1. Choose the result you expect from the test (such as **Valid** or **Invalid**).

   1. (Optional) Select a **Confidence threshold**, which is the minimum confidence level for logic validation. Automated Reasoning checks uses multiple LLMs to translate natural language into findings. It returns only findings supported by a significant percentage of the LLM translations. The confidence threshold defines the minimum percentage of support needed for a translation to become a finding with a validity result. Findings below the threshold are surfaced as `TRANSLATION_AMBIGUOUS`.

1. Select **Save** to create the test.
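The confidence-threshold behavior described in the steps above can be sketched as follows (a simplified model, not the service's implementation), where `support_fraction` stands for the share of translator models that agree on an interpretation:

```python
def classify(support_fraction, threshold, result):
    """Return the finding result for one interpretation.

    An interpretation becomes a finding with a validity result only
    when enough translations agree on it; otherwise it surfaces as
    TRANSLATION_AMBIGUOUS.
    """
    if support_fraction >= threshold:
        return result
    return "TRANSLATION_AMBIGUOUS"

print(classify(0.9, 0.8, "VALID"))  # VALID
print(classify(0.5, 0.8, "VALID"))  # TRANSLATION_AMBIGUOUS
```

Raising the threshold therefore trades fewer, higher-confidence findings for more `TRANSLATION_AMBIGUOUS` results.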

## Create a QnA test using the API


Use the `CreateAutomatedReasoningPolicyTestCase` API to create a test programmatically.

`policyArn` (required)  
The ARN of the Automated Reasoning policy.

`queryContent` (optional)  
The input query or prompt that generated the content, such as the user question. This provides context for the validation.

`guardContent` (required)  
The output content to validate — the foundation model response that will be checked for accuracy.

`expectedAggregatedFindingsResult` (optional)  
The expected validation result (for example, `VALID` or `INVALID`). The actual result is determined by sorting findings in order of severity and selecting the worst result. The severity order from worst to best is: `TRANSLATION_AMBIGUOUS`, `IMPOSSIBLE`, `INVALID`, `SATISFIABLE`, `VALID`.

`confidenceThreshold` (optional)  
The minimum confidence level for logic validation.

**Example:**

```
aws bedrock create-automated-reasoning-policy-test-case \
  --policy-arn "arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/lnq5hhz70wgk" \
  --query-content "Can I take a leave of absence if I'm a part-time employee?" \
  --guard-content "No, only full-time employees are eligible for leave of absence." \
  --expected-aggregated-findings-result "VALID" \
  --confidence-threshold 0.8
```

Example response:

```
{
  "testCaseId": "test-12345abcde",
  "policyArn": "arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/lnq5hhz70wgk"
}
```

## Run tests


### Run tests in the console


1. Go to the Automated Reasoning policy that you want to validate (for example, **MyHrPolicy**).

1. Choose **View tests**.

1. Do one of the following:
   + To run all tests, choose **Validate all tests**.
   + To run a single test, select the **Action** button next to the test and choose **Validate**.

### Run tests using the API


Use the `StartAutomatedReasoningPolicyTestWorkflow` API to run tests and the `GetAutomatedReasoningPolicyTestResult` API to retrieve results.

`policyArn` (required)  
The ARN of the Automated Reasoning policy.

`buildWorkflowId` (required)  
The identifier of the build workflow to execute the tests against. Retrieve the latest build workflow using the `ListAutomatedReasoningPolicyBuildWorkflows` API.

`testCaseIds` (optional)  
A list of test identifiers to run. If not provided, all tests for the policy are run.

**Example:**

```
# Run tests
aws bedrock start-automated-reasoning-policy-test-workflow \
  --policy-arn "arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/lnq5hhz70wgk" \
  --build-workflow-id d40fa7fc-351e-47d8-a338-53e4b3b1c690

# Get results for a specific test
aws bedrock get-automated-reasoning-policy-test-result \
  --policy-arn "arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/lnq5hhz70wgk" \
  --build-workflow-id d40fa7fc-351e-47d8-a338-53e4b3b1c690 \
  --test-case-id test-12345abcde
```

The response includes detailed test results with validation findings and execution status. To list all test results for a build workflow, use the `ListAutomatedReasoningPolicyTestResults` API.

## Understand test results


When a test finishes, you receive a set of *findings*. Each finding represents a factual claim extracted from your test input, along with the validation result, the variable assignments used, and the policy rules that support the conclusion. For a detailed description of finding structure and all validation result types, see [Findings and validation results](automated-reasoning-checks-concepts.md#ar-concept-findings).

### Anatomy of a test result


Each test result includes:
+ **Expected result** — The result you set when creating the test.
+ **Actual result** — The aggregated result from running the test. This is determined by sorting findings in order of severity and selecting the worst result. The severity order from worst to best is: `TRANSLATION_AMBIGUOUS`, `IMPOSSIBLE`, `INVALID`, `SATISFIABLE`, `VALID`. For example, a test with two `VALID` findings and one `IMPOSSIBLE` finding has an aggregated result of `IMPOSSIBLE`.
+ **Execution result** — Whether the test passed (expected and actual results match) or failed.
+ **Findings** — The individual validation results. Each finding contains the translated premises and claims, a confidence score, variable assignments, and the policy rules that support the conclusion.
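The aggregation rule above can be sketched in a few lines of Python (a model of the documented behavior, not the service's code):

```python
# Severity order from the documentation, worst first.
SEVERITY = ["TRANSLATION_AMBIGUOUS", "IMPOSSIBLE", "INVALID",
            "SATISFIABLE", "VALID"]

def aggregate(findings):
    """Return the worst (lowest-severity-index) result among the findings."""
    return min(findings, key=SEVERITY.index)

# Two VALID findings and one IMPOSSIBLE finding aggregate to IMPOSSIBLE.
print(aggregate(["VALID", "VALID", "IMPOSSIBLE"]))  # IMPOSSIBLE
```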

### Practical interpretation of results


The following table summarizes what each validation result means in practice and what action to take when you see it in a test. For the full reference including finding fields and detailed descriptions, see [Validation results reference](automated-reasoning-checks-concepts.md#ar-concept-validation-results).


| Result | What it means | What to do | 
| --- | --- | --- | 
| VALID | The claims in the response are mathematically proven correct given the premises and your policy rules. The finding includes supportingRules that prove the claims and a claimsTrueScenario demonstrating how the claims are true. | If this is the expected result, the test passes. Check untranslatedPremises and untranslatedClaims for parts of the input that were not validated — a VALID result only covers the translated claims. | 
| INVALID | The claims contradict your policy rules. The finding includes contradictingRules showing which rules were violated. | If this is the expected result, the test passes. If unexpected, check whether the rules are correct or whether the translation assigned the wrong variables. Review the contradictingRules to understand which rules caused the result. | 
| SATISFIABLE | The claims are consistent with your policy but don't address all relevant rules. The response is correct under some conditions but not all. The finding includes both a claimsTrueScenario and a claimsFalseScenario showing the conditions under which the claims are true and false. | Compare the two scenarios to identify the missing conditions. This typically means the response is incomplete — it's not wrong, but it doesn't mention all the requirements. Consider whether your test should expect SATISFIABLE or whether the response should be more complete. | 
| IMPOSSIBLE | Automated Reasoning checks can't evaluate the claims because the premises are contradictory or the policy itself contains conflicting rules. | Check whether the test input contains contradictory statements (for example, "I'm full-time and also part-time"). If the input is valid, the contradiction is likely in your policy — check the quality report for conflicting rules. See [Troubleshoot and refine your Automated Reasoning policy](address-failed-automated-reasoning-tests.md). | 
| TRANSLATION_AMBIGUOUS | The translation from natural language to formal logic was ambiguous. The multiple LLMs used for translation disagreed on how to interpret the input. The finding includes the alternative interpretations to help you understand the disagreement. | This is usually a variable description issue. Review the alternative interpretations to understand where the disagreement is, then improve the relevant variable descriptions. Common causes: overlapping variables, vague descriptions, or ambiguous input text. See [Troubleshoot and refine your Automated Reasoning policy](address-failed-automated-reasoning-tests.md). | 
| TOO_COMPLEX | The input contains too much information for Automated Reasoning checks to process within its latency limits. | Simplify the test input. If the issue persists, your policy may be too complex — consider splitting it into multiple focused policies or simplifying rules that involve non-linear arithmetic. | 
| NO_TRANSLATIONS | The input couldn't be translated into formal logic. This typically means the input is not relevant to your policy's domain, or the policy doesn't have variables to model the concepts in the input. | If the input should be relevant to your policy, add the missing variables and update your rules. If the input is genuinely off-topic, this result is expected — your application should handle off-topic content separately (for example, using topic policies). | 

### Debugging tips for failed tests


When a test fails (the actual result doesn't match the expected result), use the following approach to diagnose the issue:

1. **Check the translation first.** Look at the premises and claims in the finding. Are the right variables assigned? Are the values correct? If the translation is wrong, the issue is in your variable descriptions, not your rules. For example, if "2 years" was translated to `tenureMonths = 2` instead of `tenureMonths = 24`, the variable description needs to specify the unit conversion.

1. **Check the rules.** If the translation looks correct, the issue is in your policy rules. Look at the `supportingRules` or `contradictingRules` in the finding to identify which rules are involved. Compare them against your source document.

1. **Check for untranslated content.** Look at `untranslatedPremises` and `untranslatedClaims`. If important parts of the input were not translated, you may need to add variables to capture those concepts.

1. **Check the confidence score.** A low confidence score indicates the translation models disagreed. This suggests the variable descriptions are ambiguous for this type of input.

For detailed troubleshooting guidance, see [Troubleshoot and refine your Automated Reasoning policy](address-failed-automated-reasoning-tests.md).
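
The unit-conversion pitfall from step 1 can be sketched in a few lines. This is a hypothetical helper (not part of any SDK) showing the normalization a variable description must spell out explicitly:

```python
import re

def tenure_to_months(text):
    """Normalize a tenure phrase like '2 years' or '18 months' to months.

    Illustrative only: this mirrors the conversion rule a variable
    description should state, so "2 years" maps to 24, never 2.
    """
    match = re.search(r"(\d+)\s*(year|month)", text)
    if not match:
        return None
    value, unit = int(match.group(1)), match.group(2)
    return value * 12 if unit == "year" else value

# "2 years" must translate to tenureMonths = 24, not tenureMonths = 2
assert tenure_to_months("2 years of service") == 24
assert tenure_to_months("18 months") == 18
```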

# Troubleshoot and refine your Automated Reasoning policy
Troubleshoot and refine your policy

When an Automated Reasoning policy test fails — the actual result doesn't match the expected result — the issue is either in the translation (natural language was mapped to the wrong variables or values) or in the rules (the policy logic doesn't match your domain). This page provides a systematic approach to diagnosing and fixing both types of issues.

Before you start troubleshooting, make sure you understand the two-step validation process (translate, then validate) described in [Translation: from natural language to formal logic](automated-reasoning-checks-concepts.md#ar-concept-translation). This distinction is the key to efficient debugging.

**Note**  
**Tutorial video:** For a step-by-step walkthrough of refining and troubleshooting an Automated Reasoning policy, watch the following tutorial:  
[Tutorial Demo 3 - Refining the Automated Reasoning policy](https://youtu.be/YmohVGWr_PA)

## Debugging workflow


When a test fails, use the actual result to identify the type of issue and jump to the relevant section.


| Actual result | Likely cause | Where to look | 
| --- | --- | --- | 
| TRANSLATION_AMBIGUOUS | The translation models disagreed on how to interpret the input. Usually caused by overlapping variables, vague descriptions, or ambiguous input text. | [Fix translation issues](#fix-translation-issues) | 
| NO_TRANSLATIONS | The input couldn't be mapped to any policy variables. Either the input is off-topic or the policy is missing variables for the concepts mentioned. | [Fix translation issues](#fix-translation-issues) | 
| TOO_COMPLEX | The input or policy exceeds processing limits. Often caused by non-linear arithmetic or policies with too many interacting rules. | [Limitations and considerations](guardrails-automated-reasoning-checks.md#automated-reasoning-limitations) | 
| IMPOSSIBLE | The premises contradict each other, or the policy itself contains conflicting rules. | [Fix impossible results](#fix-impossible-results) | 
| VALID, INVALID, or SATISFIABLE (but not what you expected) | Check the translation in the finding first. If the right variables are assigned with the right values, the issue is in your rules. If the translation is wrong, the issue is in your variable descriptions. | Translation wrong: [Fix translation issues](#fix-translation-issues). Rules wrong: [Fix rule issues](#fix-rule-issues). | 

**Tip**  
Always check the translation first. In most cases, the mathematical validation (step 2) is correct — the issue is in how the natural language was translated to formal logic (step 1). Fixing variable descriptions is faster and less risky than changing rules.

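The routing in the table above can be expressed as a small triage helper. This is a sketch of the decision logic only; the result names come from the table, and the translation check is a judgment the caller makes by inspecting the finding:

```python
# Map each actual result to the section of this page that covers it.
TRIAGE = {
    "TRANSLATION_AMBIGUOUS": "Fix translation issues",
    "NO_TRANSLATIONS": "Fix translation issues",
    "TOO_COMPLEX": "Limitations and considerations",
    "IMPOSSIBLE": "Fix impossible results",
}

def triage(actual_result, translation_correct=None):
    """Return where to look for a failed test.

    For VALID/INVALID/SATISFIABLE mismatches, first judge whether the
    translation in the finding is correct (always check it first).
    """
    if actual_result in TRIAGE:
        return TRIAGE[actual_result]
    if translation_correct is None:
        return "Check the translation in the finding first"
    return "Fix rule issues" if translation_correct else "Fix translation issues"

assert triage("IMPOSSIBLE") == "Fix impossible results"
assert triage("INVALID", translation_correct=True) == "Fix rule issues"
```
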
## Fix translation issues


Translation issues occur when Automated Reasoning checks can't reliably map natural language to your policy's variables. The most visible symptom is a `TRANSLATION_AMBIGUOUS` result, but translation issues can also cause incorrect `VALID`, `INVALID`, or `SATISFIABLE` results when the wrong variables or values are assigned.

### Diagnose TRANSLATION_AMBIGUOUS results


A `TRANSLATION_AMBIGUOUS` finding includes two key fields that help you understand the disagreement:
+ `options` — The competing logical interpretations (up to 2). Each option contains its own translation with premises, claims, and confidence. Compare the options to see where the translation models disagreed.
+ `differenceScenarios` — Scenarios (up to 2) that illustrate how the different interpretations differ in meaning, with variable assignments highlighting the practical impact of the ambiguity.

Examine these fields to identify the specific source of ambiguity, then apply the appropriate fix from the following list.
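
A quick way to spot the source of ambiguity is to diff the variable assignments across the two options. The option shape here is a simplified assumption (a flat dict of premise variable to value), not the exact API response format:

```python
def disagreeing_variables(option_a, option_b):
    """Return the variables whose values (or presence) differ between
    two translation options from a TRANSLATION_AMBIGUOUS finding.

    Options are modeled as dicts of variable name -> value; the real
    finding structure is richer, but the comparison idea is the same.
    """
    names = set(option_a) | set(option_b)
    return sorted(n for n in names if option_a.get(n) != option_b.get(n))

# One model mapped "2 years of service" to tenureMonths, the other to an
# overlapping monthsOfService variable -- a merge candidate.
opt1 = {"isFullTime": True, "tenureMonths": 24}
opt2 = {"isFullTime": True, "monthsOfService": 24}
assert disagreeing_variables(opt1, opt2) == ["monthsOfService", "tenureMonths"]
```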

### Overlapping variable definitions


When multiple variables could reasonably represent the same concept, the translation models disagree on which one to use.

**Symptom:** The `options` in the `TRANSLATION_AMBIGUOUS` finding show the same concept assigned to different variables. For example, one option assigns "2 years of service" to `tenureMonths = 24` while the other assigns it to `monthsOfService = 24`.

**Fix:** Merge the overlapping variables into a single variable with a comprehensive description. Update all rules that reference the deleted variable to use the remaining one.

**Example:**


| Before (overlapping) | After (merged) | 
| --- | --- | 
|  `tenureMonths`: "How long the employee has worked in months." `monthsOfService`: "The employee's months of service."  |  `tenureMonths`: "The number of complete months the employee has been continuously employed. When users mention years of service, convert to months (for example, 2 years = 24 months). This variable captures all references to employment duration, length of service, time at the company, or seniority." (Delete `monthsOfService` and update rules.)  | 

### Incomplete variable descriptions


Variable descriptions that lack detail about how users refer to concepts in everyday language make it difficult to map input to the correct variable.

**Symptom:** The `options` show the correct variable but with different values, or the translation assigns a value that doesn't match what the user said. For example, "2 years" is translated to `tenureMonths = 2` instead of `tenureMonths = 24`.

**Fix:** Update the variable description to include unit conversion rules, synonyms, and alternative phrasings. See [Write comprehensive variable descriptions](automated-reasoning-policy-best-practices.md#bp-variable-descriptions) for detailed guidance.

**Example:**


| Before (incomplete) | After (comprehensive) | 
| --- | --- | 
| `isFullTime`: "Full-time status." | `isFullTime`: "Whether the employee works full-time (true) or part-time (false). Set to true when users mention being 'full-time', working 'full hours', or working 40 or more hours per week. Set to false when users mention being 'part-time', working 'reduced hours', or working fewer than 40 hours per week." | 

### Inconsistent value formatting


Translation ambiguity can occur when the system is unsure how to format values such as numbers, dates, or percentages.

**Symptom:** The `options` show the same variable but with different value formats. For example, one option translates "5%" to `interestRate = 5` while the other translates it to `interestRate = 0.05`.

**Fix:** Update the variable description to specify the expected format and include conversion rules. See [Specify units and formats in variable descriptions](automated-reasoning-policy-best-practices.md#bp-units-formats).

### Ambiguous input text


Sometimes the input itself is genuinely ambiguous — it contains vague pronouns, unclear references, or statements that can be interpreted multiple ways.

**Symptom:** The `options` show fundamentally different interpretations of the same text. For example, "Can they take leave?" could refer to any employee type.

**Fix:** If this is a test, rewrite the input to be more specific. At runtime, your application should ask the user for clarification when it receives a `TRANSLATION_AMBIGUOUS` result. For integration patterns, see [Integrate Automated Reasoning checks in your application](integrate-automated-reasoning-checks.md).

### Adjust the confidence threshold


If you see `TRANSLATION_AMBIGUOUS` results for inputs that are borderline ambiguous, you can adjust the confidence threshold. Lowering the threshold allows translations with less model agreement to proceed to validation, reducing `TRANSLATION_AMBIGUOUS` results but increasing the risk of incorrect translations.

**Important**  
Adjusting the threshold should be a last resort. In most cases, improving variable descriptions or removing overlapping variables is a better fix because it addresses the root cause. For more information on how thresholds work, see [Confidence thresholds](automated-reasoning-checks-concepts.md#ar-concept-confidence-thresholds).
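
The trade-off can be illustrated with a simplified model of the gate. The actual scoring is internal to the service; this sketch only shows why lowering the threshold trades fewer `TRANSLATION_AMBIGUOUS` results for more risk of a wrong interpretation:

```python
def gate_translation(confidence, threshold):
    """Simplified model: a translation whose model agreement falls below
    the threshold is reported as ambiguous instead of proceeding to
    validation. Illustrative only -- not the service's actual algorithm.
    """
    if confidence >= threshold:
        return "PROCEED_TO_VALIDATION"
    return "TRANSLATION_AMBIGUOUS"

# Lowering the threshold lets a borderline translation through -- at the
# risk that the chosen interpretation is wrong.
assert gate_translation(0.72, 0.80) == "TRANSLATION_AMBIGUOUS"
assert gate_translation(0.72, 0.70) == "PROCEED_TO_VALIDATION"
```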

## Fix rule issues


Rule issues occur when the translation is correct but the policy logic doesn't match your domain. You've confirmed that the right variables are assigned with the right values, but the validation result is still wrong.

### Getting VALID when you expected INVALID


The policy doesn't have a rule that prohibits the claim. The response contradicts your domain knowledge, but the policy allows it.

**Diagnosis:** Look at the `supportingRules` in the finding. These are the rules that prove the claim is valid. Check whether these rules are correct or whether a rule is missing.

**Common causes and fixes:**
+ **Missing rule.** Your policy doesn't have a rule that covers this condition. Add a new rule that captures the constraint. For example, if the policy allows parental leave for all full-time employees but should require 12 months of tenure, add: `(=> (and isFullTime (<= tenureMonths 12)) (not eligibleForParentalLeave))`
+ **Rule is too permissive.** An existing rule allows more than it should. Edit the rule to add the missing condition. For example, change `(=> isFullTime eligibleForParentalLeave)` to `(=> (and isFullTime (> tenureMonths 12)) eligibleForParentalLeave)`
+ **Missing variable.** The policy doesn't have a variable to capture a relevant concept. Add the variable, write a clear description, and create rules that reference it.
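
The effect of tightening a too-permissive rule can be checked by hand. Reading each rule's antecedent as "the conditions under which the claim is supported" (a simplification of the formal semantics), a sketch shows the edited rule no longer supports eligibility for short-tenure employees:

```python
def permissive_rule_supports(is_full_time, tenure_months):
    # Original rule: (=> isFullTime eligibleForParentalLeave)
    # Full-time status alone is enough to support the eligibility claim.
    return is_full_time

def fixed_rule_supports(is_full_time, tenure_months):
    # Edited rule:
    # (=> (and isFullTime (> tenureMonths 12)) eligibleForParentalLeave)
    return is_full_time and tenure_months > 12

# A full-time employee with 6 months of tenure: the permissive rule
# wrongly supports the claim; the edited rule does not.
assert permissive_rule_supports(True, 6) is True
assert fixed_rule_supports(True, 6) is False
assert fixed_rule_supports(True, 24) is True
```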

### Getting INVALID when you expected VALID


The policy has a rule that incorrectly prohibits the claim.

**Diagnosis:** Look at the `contradictingRules` in the finding. These are the rules that disprove the claim. Check whether these rules are correct.

**Common causes and fixes:**
+ **Rule is too restrictive.** An existing rule blocks a valid scenario. Edit the rule to relax the condition or add an exception. For example, if the rule requires 24 months of tenure but the policy should require only 12, update the threshold.
+ **Rule was misextracted.** Automated Reasoning checks misinterpreted your source document. Edit the rule to match the intended logic, or delete it and add a correct rule manually.

### Getting SATISFIABLE when you expected VALID


The response is correct under some conditions but not all. The policy has additional rules that the response doesn't address.

**Diagnosis:** Compare the `claimsTrueScenario` and `claimsFalseScenario` in the finding. The difference between them shows the conditions that the response doesn't mention.

**Common causes and fixes:**
+ **Response is incomplete.** The test output doesn't mention all the conditions required by the policy. Update the test output to include the missing conditions, or change the expected result to `SATISFIABLE` if incomplete responses are acceptable for your use case.
+ **Policy has unnecessary rules.** The policy requires conditions that aren't relevant to this scenario. Review whether the additional rules should apply and remove them if they don't.
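
Comparing the two scenarios is a straightforward diff. Modeling each scenario as a flat dict of variable to value (a simplified view of the finding), the variables that change between them are exactly the conditions the response leaves open:

```python
def unaddressed_conditions(claims_true_scenario, claims_false_scenario):
    """Return the variables that take different values across the
    claimsTrueScenario and claimsFalseScenario of a SATISFIABLE finding.
    Scenario shape (flat dict) is a simplified assumption.
    """
    names = set(claims_true_scenario) | set(claims_false_scenario)
    return sorted(n for n in names
                  if claims_true_scenario.get(n) != claims_false_scenario.get(n))

true_scn = {"isFullTime": True, "tenureMonths": 24}
false_scn = {"isFullTime": True, "tenureMonths": 6}
# The response never stated the employee's tenure, so the claim holds in
# one scenario and fails in the other.
assert unaddressed_conditions(true_scn, false_scn) == ["tenureMonths"]
```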

## Fix impossible results


An `IMPOSSIBLE` result means Automated Reasoning checks can't evaluate the claims because the premises are contradictory or the policy itself contains conflicting rules. There are two distinct causes.

### Contradictions in the input


The test input contains statements that contradict each other. For example, "I'm a full-time employee and also part-time" sets `isFullTime = true` and `isFullTime = false` simultaneously, which is logically impossible.

**Diagnosis:** Inspect the `translation` premises in the finding. Look for variables that are assigned contradictory values.

**Fix:** If this is a test, rewrite the input to remove the contradiction. At runtime, your application should handle `IMPOSSIBLE` results by asking the user to clarify their input.
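
Spotting contradictory premises amounts to finding a variable assigned more than one value. Modeling premises as (variable, value) pairs, which is a simplified view of the finding's translation section:

```python
def contradictory_premises(premises):
    """Return variables assigned more than one distinct value in the
    premises. Premises are modeled as (variable, value) pairs -- a
    simplified view of the finding's translation section.
    """
    seen, conflicts = {}, set()
    for name, value in premises:
        if name in seen and seen[name] != value:
            conflicts.add(name)
        seen.setdefault(name, value)
    return sorted(conflicts)

# "I'm a full-time employee and also part-time"
premises = [("isFullTime", True), ("isFullTime", False), ("tenureMonths", 24)]
assert contradictory_premises(premises) == ["isFullTime"]
```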

### Conflicts in the policy


The policy contains rules that contradict each other, making it impossible for Automated Reasoning checks to reach a conclusion for inputs that involve the conflicting rules.

**Diagnosis:** If the input is valid (no contradictory premises), the issue is in the policy. Check the `contradictingRules` field in the finding to identify which rules conflict. Also check the quality report (see [Use the quality report](#use-quality-report)) — it flags conflicting rules automatically.

**Common causes and fixes:**
+ **Contradictory rules.** Two rules reach opposite conclusions for the same conditions. For example, one rule says full-time employees are eligible for leave, while another says employees in their first year are not eligible, without specifying what happens to full-time employees in their first year. Merge the rules into a single rule with explicit conditions: `(=> (and isFullTime (> tenureMonths 12)) eligibleForLeave)`
+ **Bare assertions.** A bare assertion like `(= eligibleForLeave true)` makes it impossible for any input to claim the user is *not* eligible. Rewrite bare assertions as implications. See [Use implications (=>) to structure rules](automated-reasoning-policy-best-practices.md#bp-use-implications).
+ **Circular dependencies.** Rules that depend on each other in a way that creates logical loops. Simplify the rules to break the cycle, or use intermediate variables to make the logic explicit.

## Use annotations to repair your policy


Annotations are targeted corrections you apply to your policy when tests fail. Instead of manually editing rules and variables, you can use annotations to describe the change you want and let Automated Reasoning checks apply it. Annotations are available through both the console and the API.

### Apply annotations in the console


1. Open the failed test and review the findings to understand the issue.

1. Modify the test conditions (for example, add a premise or change the expected result) and rerun the test. If the modified test returns the result you expect, you can apply this modification as an annotation.

1. Choose **Apply annotations**. Automated Reasoning checks starts a build workflow to apply the changes to your policy based on your feedback.

1. On the **Review policy changes** screen, review the proposed changes to your policy's rules, variables, and types. Then select **Accept changes**.

### Apply annotations using the API


Use the `StartAutomatedReasoningPolicyBuildWorkflow` API with `REFINE_POLICY` to apply annotations programmatically. Pass the complete current policy definition alongside the annotations.

Annotation types include:
+ **Variable annotations:** `addVariable`, `updateVariable`, `deleteVariable` — Add missing variables, improve descriptions, or remove duplicates.
+ **Rule annotations:** `addRule`, `updateRule`, `deleteRule`, `addRuleFromNaturalLanguage` — Fix incorrect rules, add missing rules, or remove conflicting rules. Use `addRuleFromNaturalLanguage` to describe a rule in plain English and let Automated Reasoning checks convert it to formal logic.
+ **Type annotations:** `addType`, `updateType`, `deleteType` — Manage custom types (enums).
+ **Feedback annotations:** `updateFromRulesFeedback`, `updateFromScenarioFeedback` — Provide natural language feedback about specific rules or scenarios and let Automated Reasoning checks deduce the necessary changes.
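
An annotations list combines these operation types, one operation per entry. The field names for `addVariable` and `addRuleFromNaturalLanguage` follow the CLI example below; the `deleteVariable` shape is an assumption for illustration:

```python
# Sketch of an annotations payload. One operation per annotation entry.
annotations = [
    {"addVariable": {
        "name": "tenureMonths",
        "type": "int",
        "description": "The number of complete months the employee has "
                       "been continuously employed. Convert years to "
                       "months (2 years = 24 months).",
    }},
    {"addRuleFromNaturalLanguage": {
        "naturalLanguage": "If an employee is full-time and has more than "
                           "12 months of tenure, then they are eligible "
                           "for parental leave.",
    }},
    # Assumed shape -- remove a duplicate variable during a merge.
    {"deleteVariable": {"name": "monthsOfService"}},
]

assert all(len(a) == 1 for a in annotations)  # one operation per entry
```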

**Example: Add a missing variable and rule using annotations**

```
aws bedrock start-automated-reasoning-policy-build-workflow \
  --policy-arn "arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/lnq5hhz70wgk" \
  --build-workflow-type REFINE_POLICY \
  --source-content "{
    \"policyDefinition\": EXISTING_POLICY_DEFINITION_JSON,
    \"workflowContent\": {
      \"policyRepairAssets\": {
        \"annotations\": [
          {
            \"addVariable\": {
              \"name\": \"tenureMonths\",
              \"type\": \"int\",
              \"description\": \"The number of complete months the employee has been continuously employed. When users mention years of service, convert to months (for example, 2 years = 24 months).\"
            }
          },
          {
            \"addRuleFromNaturalLanguage\": {
              \"naturalLanguage\": \"If an employee is full-time and has more than 12 months of tenure, then they are eligible for parental leave.\"
            }
          }
        ]
      }
    }
  }"
```

### Annotation examples


**Example 1: Fix a missing tenure requirement**

Problem: The policy approves parental leave for all full-time employees, but the source document requires more than 12 months of tenure.


| Before | After annotation | 
| --- | --- | 
|  Rule: `(=> isFullTime eligibleForParentalLeave)` No `tenureMonths` variable.  |  New variable: `tenureMonths` (int) — "The number of complete months the employee has been continuously employed." Updated rule: `(=> (and isFullTime (> tenureMonths 12)) eligibleForParentalLeave)`  | 

**Example 2: Fix overlapping variables causing TRANSLATION_AMBIGUOUS**

Problem: Two variables (`tenureMonths` and `monthsOfService`) represent the same concept, causing inconsistent translations.

Annotations:

1. `deleteVariable` for `monthsOfService`.

1. `updateVariable` for `tenureMonths` with an improved description that covers all the ways users might refer to employment duration.

1. `updateRule` for any rules that referenced `monthsOfService`, changing them to use `tenureMonths`.

**Example 3: Fix a bare assertion causing IMPOSSIBLE results**

Problem: The rule `(= eligibleForParentalLeave true)` is a bare assertion that makes it impossible for any input to claim the user is not eligible.

Annotations:

1. `deleteRule` for the bare assertion.

1. `addRuleFromNaturalLanguage`: "If an employee is full-time and has more than 12 months of tenure, then they are eligible for parental leave."

## Use the quality report


The quality report is generated after each build workflow and identifies structural issues in your policy that can cause test failures. In the console, quality report issues are surfaced as warnings on the **Definitions** page. Via the API, use `GetAutomatedReasoningPolicyBuildWorkflowResultAssets` with `--asset-type QUALITY_REPORT`.

The quality report flags the following issues:

### Conflicting rules


Two or more rules reach contradictory conclusions for the same set of conditions. Conflicting rules cause your policy to return `IMPOSSIBLE` for all validation requests that involve the conflicting rules.

**Example:** Rule A says `(=> isFullTime eligibleForLeave)` and Rule B says `(=> (<= tenureMonths 6) (not eligibleForLeave))`. For a full-time employee with 3 months of tenure, Rule A says eligible and Rule B says not eligible — a contradiction.

**Fix:** Merge the rules into a single rule with explicit conditions: `(=> (and isFullTime (> tenureMonths 6)) eligibleForLeave)`. Or delete one of the conflicting rules if it was misextracted.
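
The conflict in this example can be demonstrated by enumeration. This brute-force sketch (the service uses a solver, not enumeration) finds the inputs for which no value of `eligibleForLeave` satisfies both rules:

```python
from itertools import product

def rule_a(is_full_time, tenure, eligible):
    # (=> isFullTime eligibleForLeave)
    return (not is_full_time) or eligible

def rule_b(is_full_time, tenure, eligible):
    # (=> (<= tenureMonths 6) (not eligibleForLeave))
    return (not tenure <= 6) or (not eligible)

def conflicting_inputs():
    """Enumerate sample assignments and report those where no value of
    eligibleForLeave satisfies both rules -- the signature of a conflict.
    """
    conflicts = []
    for ft, tenure in product([True, False], [3, 12]):
        if not any(rule_a(ft, tenure, e) and rule_b(ft, tenure, e)
                   for e in (True, False)):
            conflicts.append((ft, tenure))
    return conflicts

# Full-time with 3 months of tenure: Rule A forces eligible = true while
# Rule B forces eligible = false, so no assignment satisfies both.
assert conflicting_inputs() == [(True, 3)]
```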

### Unused variables


Variables that aren't referenced by any rules. Unused variables add noise to the translation process and can cause `TRANSLATION_AMBIGUOUS` results when they compete with similar active variables for the same concept.

**Fix:** Delete unused variables unless you plan to add rules that reference them in a future iteration.

### Unused type values


Values in a custom type (enum) that aren't referenced by any rules. For example, if your `LeaveType` enum has values PARENTAL, MEDICAL, BEREAVEMENT, and PERSONAL, but no rule references PERSONAL, it's flagged as unused.

**Fix:** Either add rules that reference the unused value, or remove it from the enum. Unused values can cause translation issues if the input mentions the concept but no rule handles it.
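
Finding unused enum values is essentially a set difference. This sketch models rules as plain formal-logic strings and treats a value as used if any rule mentions it (a simplification of what the quality report computes):

```python
def unused_values(enum_values, rules_text):
    """Return enum values not referenced by any rule. Rules are modeled
    as plain strings; a value is 'used' if it appears in at least one.
    """
    return sorted(v for v in enum_values
                  if not any(v in rule for rule in rules_text))

leave_types = ["PARENTAL", "MEDICAL", "BEREAVEMENT", "PERSONAL"]
rules = [
    "(=> (= leaveType PARENTAL) requiresTenureCheck)",
    "(=> (= leaveType MEDICAL) requiresDoctorNote)",
    "(=> (= leaveType BEREAVEMENT) autoApproved)",
]
# No rule references PERSONAL, so the quality report would flag it.
assert unused_values(leave_types, rules) == ["PERSONAL"]
```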

### Disjoint rule sets


Groups of rules that don't share any variables. Disjoint rule sets aren't necessarily a problem — your policy may intentionally cover independent topics (for example, leave eligibility and expense reimbursement). However, they can indicate that variables are missing connections between related rules.

**When to act:** If the disjoint rule sets should be related (for example, they both deal with employee benefits but use different variable names for the same concept), merge the overlapping variables to connect them. If the rule sets are genuinely independent, no action is needed.

## Use Kiro CLI for policy refinement


Kiro CLI provides an interactive chat interface for diagnosing and fixing policy issues. It can load your policy definition and quality report, explain why tests are failing, suggest changes, and apply annotations — all through natural language conversation.

Kiro CLI is particularly useful for:
+ **Understanding failures.** Ask Kiro CLI to load a failing test and explain why it's not returning the expected result. Kiro CLI will analyze the policy definition, the test findings, and the quality report to identify the root cause.
+ **Resolving quality report issues.** Ask Kiro CLI to summarize the quality report and suggest fixes for conflicting rules, unused variables, and overlapping variable descriptions.
+ **Suggesting rule changes.** Describe the behavior you expect and ask Kiro CLI to propose the necessary variable and rule changes. Review the suggestions and instruct Kiro CLI to apply them as annotations.

**Example workflow:**

```
You: The test with ID test-12345 is not returning the expected result.
     Can you load the test definition and findings, look at the policy
     definition, and explain why this test is failing?

Kiro: [analyzes the test and policy] The test expects VALID but gets
      INVALID because rule R3 requires 24 months of tenure, while the
      test input specifies 18 months. The source document says 12 months.
      Rule R3 appears to have been misextracted.

You: Can you suggest changes to fix this?

Kiro: I suggest updating rule R3 to change the tenure threshold from 24
      to 12 months. Here's the updated rule: ...

You: Looks good. Can you use the annotation APIs to submit these changes?

Kiro: [applies annotations via the API]
```

For complete instructions on setting up and using Kiro CLI with Automated Reasoning policies, see [Use Kiro CLI with an Automated Reasoning policy](kiro-cli-automated-reasoning-policy.md).

# Use Kiro CLI with an Automated Reasoning policy
Use Kiro CLI with an Automated Reasoning policy

You can use Kiro CLI to ask questions about your Automated Reasoning policies, understand the behavior of the various rules, and request changes that address failing tests or ambiguities in the policy itself. Kiro CLI is particularly useful for the iterative refinement workflow described in [Troubleshoot and refine your Automated Reasoning policy](address-failed-automated-reasoning-tests.md) because it can load your policy definition, analyze test results, and apply annotations through natural language conversation.

## Prerequisites


To use Kiro CLI with your Automated Reasoning policies, you must first complete the following steps:
+ Install the latest version of [Kiro CLI](https://kiro.dev/cli/).
+ Install the latest version of the AWS CLI.
+ Create an Automated Reasoning policy using a document through the console or APIs. To get started quickly, use the built-in sample Homework policy from the console. For more information, see [Create your Automated Reasoning policy](create-automated-reasoning-policy.md).
+ Familiarize yourself with Automated Reasoning checks concepts, particularly policies, rules, variables, and findings. For more information, see [Automated Reasoning checks concepts](automated-reasoning-checks-concepts.md).
+ Copy the content of the contextual prompt provided in [Automated Reasoning policy API context prompt](#kiro-cli-context-prompt) and save it in a Markdown file in your project folder. This prompt helps Kiro CLI use the Automated Reasoning policy control plane and test API correctly.

**Note**  
For the prompt examples below, we use the sample Homework policy. The prompts should work just as well with other policies; simply change the topic accordingly.

**Note**  
Automated Reasoning policies can be complex and require Kiro CLI to reason through intricate logical constructs. For best performance, we recommend using larger LLMs such as Anthropic Claude Sonnet 4.5. To change the model in Kiro CLI, use the `/model` command.

## Getting started


You need the ARN of the Automated Reasoning policy you created to start the workflow with Kiro CLI.

1. Using the console, open your Automated Reasoning policy and from the **Policy Overview** page, open the **Policy details** tab.

1. In the **Policy details** tab, find the policy ARN and copy it to your clipboard.

1. Using the terminal, start a Kiro CLI session with the following command:

   ```
   kiro-cli
   ```

1. With your first prompt, ask Kiro to look for the instructions Markdown file you copied from this page as part of the prerequisites. For example:

   ```
   We will be using Automated Reasoning checks control plane APIs. I have saved an instructions file called your_file_name.md in this folder. Read this file as it will give you the context you need to work with the APIs.
   ```

1. After Kiro CLI has loaded and understood Automated Reasoning checks' APIs, ask it to load the latest build of your policy and start exploring it. Use a variation of the following prompt with the ARN you copied:

   ```
   Load the policy assets for the latest build of the policy with ARN YOUR_POLICY_ARN. Make sure you understand the policy with all its rules and variables. Give a high-level description of the policy and the type of content it is capable of validating.
   ```

At this point, Kiro CLI should provide you with a brief description of the policy's rules and variables. Kiro CLI should also load the policy quality report and summarize issues like unused types and variables.

## Resolving policy issues


You can use Kiro CLI to resolve policy issues reported in the quality report. First, ask Kiro to give you a summary of the quality report:

```
Can you give me a summary of the quality report for this policy?
```

The quality report includes unused variables, conflicting rules, disjoint rule sets, and other potential issues with the policy. For more information about interpreting the quality report, see [Use the quality report](address-failed-automated-reasoning-tests.md#use-quality-report).

Conflicting rules will cause your policy to respond with `IMPOSSIBLE` to validation requests that involve the conflicting rules. For more information about conflicting rules and how to resolve them, see [Conflicts in the policy](address-failed-automated-reasoning-tests.md#fix-impossible-policy-conflicts). You can ask Kiro CLI to explain the conflict and propose a solution:

```
Can you look at the conflicting rules, explain how they are used in the policy, why they conflict, and suggest a change such as deleting one of the rules or merging the logic from the two into a single rule?
```

Unused variables can cause validations to return `TRANSLATION_AMBIGUOUS` results. For more information about why unused variables cause issues, see [Unused variables](automated-reasoning-policy-best-practices.md#bp-anti-unused-variables). You can ask Kiro CLI to help with this issue:

```
I see the quality report lists some unused variables, can you get rid of them?
```

Similarly, semantically overlapping variables can cause validations to return `TRANSLATION_AMBIGUOUS` results. For more information about overlapping variables and how to fix them, see [Overlapping variables](automated-reasoning-policy-best-practices.md#bp-anti-overlapping-variables) and [Overlapping variable definitions](address-failed-automated-reasoning-tests.md#fix-overlapping-variables). You can ask Kiro CLI to help with this issue:

```
Automated Reasoning checks translate input natural language into logical statements that use the schema of variables from the policy. Variables that are semantically similar - ambiguous - can cause issues with inconsistent translations. Can you take a look at the schema of variables and help me identify variables that have potentially overlapping meanings? If you find any, suggest changes like removing one of them or merging them. Variable changes are also likely to require corresponding rule changes.
```

**Note**  
After processing some changes, Kiro CLI will ask for confirmation to apply them. At this point, you can use the Bedrock Console user interface to review the proposed changes in a diff screen. If you use the console to review and approve the changes, do not forget to tell Kiro CLI to reload the latest build of the policy definition.

## Interacting with a policy


You can use Kiro CLI to explore your policy. For example, you could ask Kiro CLI to summarize policy rules related to a specific area. Using the sample homework policy as an example, you could ask Kiro CLI to tell you about the rules that constrain math homework specifically:

```
Can you tell me about the rules that constrain math homework? Explain the rules themselves and how they fit in the broader policy.
```

You can also use Kiro CLI to add functionality to the policy. Automated Reasoning policies often work around some key output variables that capture the user intent. For example, the sample homework submission policy uses an `isSubmissionAcceptable` variable as the output of its rules. You can use Kiro CLI to alter the policy to capture other possible user questions. When adding new rules, follow the best practices described in [Automated Reasoning policy best practices](automated-reasoning-policy-best-practices.md), particularly [Use implications (=>) to structure rules](automated-reasoning-policy-best-practices.md#bp-use-implications).

## Editing or revising your policy


You can ask Kiro CLI whether the policy is capable of validating specific questions from your users. For example, the homework policy is focused on determining whether a homework submission is acceptable. You can expect a positive answer from Kiro CLI with this question:

```
Can this policy validate answers about whether homework submissions are acceptable?
```

If you ask about delaying homework submission in the sample Homework policy, Kiro CLI will explain that the policy can only partially answer these questions:

```
Can this policy validate answers about permissible delays in homework submissions?
```

Kiro CLI will explain what the policy can and cannot do:

```
> Yes, but with limitations. The policy can validate whether a late submission is acceptable, but it cannot answer general questions about permissible delays.

What it CAN validate:
- Whether a specific late submission is acceptable given the circumstances
- Example: "Student submitted 2 days late with a doctor's note because they were sick before the deadline" → The policy will determine if this is acceptable (it would be)

What it CANNOT validate:
- General questions like "How many days late can a student submit homework?"
- Questions about delay policies without specific submission details
- Hypothetical scenarios without concrete variable values

The late submission logic:
The policy allows late submissions (submissionDate > 0) ONLY if:
1. The student has a valid doctor's note (hasDoctorNote = true), AND
2. The student was sick before the original deadline (wasSickPriorToDeadline = true)

Without both conditions met, any submission after the due date (submissionDate > 0) is automatically rejected.

So it's designed to validate specific submission instances against the rules, not to answer policy interpretation questions.
```

When you discover something the policy cannot do, you can ask Kiro CLI to update the policy to capture this information using the following prompt:

```
I would like the policy to also validate answers about how many days homework submissions can be delayed. Homework submissions accompanied by a doctor note can be up to 7 days late. Suggest the necessary variables and rules to capture these new questions.
```

In response to this prompt, Kiro CLI will suggest a set of variables and rules that can be added to the policy to validate the new type of questions. Review the suggestions and if they align with your intent, you can instruct Kiro CLI to use Automated Reasoning checks' annotation APIs to make these changes to the policy:

```
Looks good. Can you use the annotation APIs to submit these changes to the policy.
```

Once Kiro CLI confirms the annotations are ready, you can open your policy in the console to review the annotations. If the annotations are correct, choose **Apply Annotations**.

After applying the annotations, ask Kiro CLI to reload the latest build of the policy to ensure Kiro CLI is working with a current copy:

```
I applied the annotations. Reload the latest build of the policy.
```

## Address failing tests


A good way to test that your Automated Reasoning policy can validate natural language generated by your application is to use tests. After creating test Q&As with their expected results, you can use Kiro CLI to understand why a test did not return the expected result and adjust the policy. For more information about creating and running tests, see [Test an Automated Reasoning policy](test-automated-reasoning-policy.md). For a systematic approach to diagnosing test failures without Kiro CLI, see [Troubleshoot and refine your Automated Reasoning policy](address-failed-automated-reasoning-tests.md).

1. As a first step, ask Kiro CLI to load the failed test and explain why it is not returning the expected result based on the policy definition. Use the console or APIs to copy the test ID for your failing test. In the console, the test ID is available both in the table that lists tests and the detail page for each test.

   ```
   The test with ID YOUR_TEST_ID is not returning the expected result. Can you load the test definition and findings, look at the policy definition, and explain why this test is failing.
   ```

1. The explanation from Kiro CLI will give you direction on whether the policy is behaving correctly (in which case you should change the expected result for the test) or the policy itself is wrong. You can ask Kiro CLI to suggest changes to the policy so that the test returns the expected result:

   ```
   Can you suggest changes to the policy to ensure this test returns the expected result? Explain why you are suggesting these changes. Only create rules in if/then format.
   ```
**Note**  
When suggesting rule changes, Kiro CLI may try to overfit to the specific example and create rules that are not useful in other use cases. Check the test output and give Kiro CLI guidance to focus it on the right problem. For guidance on writing effective rules, see [Automated Reasoning policy best practices](automated-reasoning-policy-best-practices.md).  
For example, asking Kiro CLI to change the sample Homework policy so that the `SATISFIABLE` test returns `VALID` may lead it to suggest adding axioms that make the test always pass, such as a rule like `(= isHomeworkSubmissionAcceptable false)`, which forces the value to always be false. While this technically fixes the failing test, it is detrimental to the overall policy functionality. By analyzing the scenarios returned with the `SATISFIABLE` result, you can give Kiro CLI better guidance: either create a new rule that covers only the constraints specified in the test, or update the existing rules to check only those constraints.

1. Once you are happy with the suggested changes, ask Kiro CLI to submit the annotations and review them using the console user interface:

   ```
   Looks good. Can you start a build workflow to apply these changes to the policy.
   ```

1. After applying the changes and moving on to the next failing test, ask Kiro CLI to reload the latest build of the policy:

   ```
   I applied the changes. Reload the latest build of the policy.
   ```

## Next steps


Once you are happy with the Automated Reasoning policy, you can deploy it for use in Amazon Bedrock Guardrails. For more information, see [Deploy your Automated Reasoning policy in your application](deploy-automated-reasoning-policy.md).

After deploying your policy, see [Integrate Automated Reasoning checks in your application](integrate-automated-reasoning-checks.md) for guidance on using Automated Reasoning checks at runtime to validate LLM responses and act on the feedback.

## Automated Reasoning policy API context prompt


Copy the following content and save it in a Markdown file in your project folder for Kiro CLI. This prompt provides Kiro CLI with the context it needs to work with the Automated Reasoning policy APIs correctly.

```
# Automated Reasoning Policy APIs and Workflows

## Table of Contents

### Core APIs
- Policy Management
- Policy Versions
- Build Workflows
- Test Management
- Annotations & Scenarios

### Build Workflow Types
- INGEST_CONTENT Workflow
- REFINE_POLICY Workflow
- IMPORT_POLICY Workflow
- GENERATE_FIDELITY_REPORT Workflow

### Annotation Type Reference
- Type Management Annotations
- Variable Management Annotations
- Rule Management Annotations
- Natural Language Rule Creation
- Feedback-Based Updates

### Common Workflows
1. Getting Started (New Policy)
2. Building Policy from Document
3. Policy Development Cycle
4. REFINE_POLICY Workflow (Annotation-Based)

### Testing Workflow
1. Primary Approach: Scenarios API (Recommended)
2. Secondary Approach: Test Cases (User Experience)
3. Test Result Analysis and Troubleshooting

### Build Workflow Monitoring
- Check Build Status
- List Build History
- Best Practice: Clean Build Management
- Troubleshooting Build Failures

### Build Workflow Assets
- Asset Types
- Understanding Conflicting Rules
- Understanding Disjoint Rule Sets
- Advanced Quality Report Analysis

### Additional Topics
- Policy Version Export
- Key Concepts
- Important Format Requirements
- Policy Modeling Best Practices
- ARN Formats

## Core APIs

### Policy Management
- `create-automated-reasoning-policy` - Create initial policy (returns policy ARN). Supports optional `--description`, `--kms-key-id` (for encryption with a customer managed AWS KMS key), `--tags` (up to 200 tags), and `--client-request-token` (idempotency token).
- `get-automated-reasoning-policy` - Retrieve policy (DRAFT version by default with unversioned ARN). Returns `policyId`, `definitionHash`, and `kmsKeyArn` (if a KMS key was provided at creation).
- `update-automated-reasoning-policy` - Update DRAFT policy with new definition. Accepts optional `--name` and `--description` updates alongside `--policy-definition` (required).
- `delete-automated-reasoning-policy` - Delete policy. Supports optional `--force` flag: when true, deletes the policy and all its artifacts (versions, test cases, test results) without validation; when false (default), validates that all artifacts have been deleted first.
- `list-automated-reasoning-policies` - List all policies. Supports optional `--policy-arn` filter to list only versions of a specific policy.

### Policy Versions
- `create-automated-reasoning-policy-version` - Snapshot DRAFT into numbered version. Requires `--last-updated-definition-hash` (concurrency token from get/create/update response). Supports optional `--tags` (up to 200 tags) and `--client-request-token`.
- `export-automated-reasoning-policy-version` - Export specific policy version definition including rules, variables, and types.

### Build Workflows
- `start-automated-reasoning-policy-build-workflow` - Start build process. Valid `--build-workflow-type` values: `INGEST_CONTENT`, `REFINE_POLICY`, `IMPORT_POLICY`, `GENERATE_FIDELITY_REPORT`. Supports optional `--client-request-token` (idempotency token, passed as header).
- `get-automated-reasoning-policy-build-workflow` - Get build workflow status. Status values: `SCHEDULED`, `CANCEL_REQUESTED`, `PREPROCESSING`, `BUILDING`, `TESTING`, `COMPLETED`, `FAILED`, `CANCELLED`.
- `cancel-automated-reasoning-policy-build-workflow` - Cancel running build
- `delete-automated-reasoning-policy-build-workflow` - Delete build workflow. Requires `--last-updated-at` (concurrency token timestamp).
- `list-automated-reasoning-policy-build-workflows` - List build workflows
- `get-automated-reasoning-policy-build-workflow-result-assets` - Get compiled policy assets. Requires `--asset-type`. Valid asset types: `BUILD_LOG`, `QUALITY_REPORT`, `POLICY_DEFINITION`, `GENERATED_TEST_CASES`, `POLICY_SCENARIOS`, `FIDELITY_REPORT`, `ASSET_MANIFEST`, `SOURCE_DOCUMENT`. Supports optional `--asset-id` (required when retrieving `SOURCE_DOCUMENT` assets if multiple source documents were used; obtain from the `ASSET_MANIFEST`).

### Test Management
- `create-automated-reasoning-policy-test-case` - Create test case. Requires `--guard-content` and `--expected-aggregated-findings-result`. Supports optional `--query-content`, `--confidence-threshold` (Double, 0 to 1, minimum confidence level for logic validation), and `--client-request-token`.
- `get-automated-reasoning-policy-test-case` - Get test case details (includes `confidenceThreshold` if set)
- `update-automated-reasoning-policy-test-case` - Update test case. Requires `--guard-content`, `--expected-aggregated-findings-result`, and `--last-updated-at` (concurrency token). Supports optional `--query-content`, `--confidence-threshold`, and `--client-request-token`.
- `delete-automated-reasoning-policy-test-case` - Delete test case. Requires `--last-updated-at` (concurrency token).
- `list-automated-reasoning-policy-test-cases` - List test cases
- `start-automated-reasoning-policy-test-workflow` - Run tests against a completed build. Requires `--build-workflow-id` (the build workflow must show COMPLETED status). Supports optional `--test-case-ids` (array of test case IDs to run; if not provided, all tests for the policy are run) and `--client-request-token`.
- `get-automated-reasoning-policy-test-result` - Get test result for a specific test case. Requires `--build-workflow-id` and `--test-case-id`.
- `list-automated-reasoning-policy-test-results` - List test results. Requires `--build-workflow-id`.

### Annotations & Scenarios
- `get-automated-reasoning-policy-annotations` - Get policy annotations for a build workflow. Requires `--build-workflow-id`. Returns `annotations`, `annotationSetHash` (concurrency token), `buildWorkflowId`, `name`, `policyArn`, and `updatedAt`.
- `update-automated-reasoning-policy-annotations` - Update annotations for a build workflow. Requires `--build-workflow-id`, `--annotations` (array of annotation objects, max 10), and `--last-updated-annotation-set-hash` (concurrency token from get-annotations response). Returns updated `annotationSetHash`.
- `get-automated-reasoning-policy-next-scenario` - Get next test scenario

**Important**: Do NOT use `get-automated-reasoning-policy-annotations` or `update-automated-reasoning-policy-annotations` for the `REFINE_POLICY` workflow. Annotations are passed directly in the `start-automated-reasoning-policy-build-workflow` call.

## Build Workflow Types

1. **INGEST_CONTENT** - Process documents to create/extract policy rules
2. **REFINE_POLICY** - Refine and improve existing policies using annotations
3. **IMPORT_POLICY** - Import policies from external sources
4. **GENERATE_FIDELITY_REPORT** - Generate a fidelity report for the policy

### INGEST_CONTENT Workflow
- **Purpose**: Extract policy rules from documents (PDF/TXT)
- **Input**: Documents + optional existing policy definition
- **Use Cases**: Document-to-policy conversion, incremental policy building
- **Content Structure**: `workflowContent.documents[]`

**CRITICAL: Complete Policy Definition for Incremental Building**

When adding documents to an existing policy, you must include the complete current policy definition:

```json
// CORRECT - Incremental policy building
{
  "policyDefinition": {
    "version": "1.0",
    "types": [/* ALL existing types */],
    "rules": [/* ALL existing rules */],
    "variables": [/* ALL existing variables */]
  },
  "workflowContent": {
    "documents": [/* New documents to process */]
  }
}
```
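One way to assemble this request body programmatically is to merge the current policy definition with the new documents. A minimal Python sketch (the `existing` definition and the PDF bytes are placeholders; in practice, use the output of `get-automated-reasoning-policy` and your real document):

```python
import base64
import json

def build_ingest_payload(policy_definition: dict, pdf_bytes: bytes, name: str) -> dict:
    """Assemble sourceContent for an INGEST_CONTENT build workflow.

    The COMPLETE existing policy definition goes in, not just the changes.
    """
    return {
        "policyDefinition": policy_definition,
        "workflowContent": {
            "documents": [{
                "document": base64.b64encode(pdf_bytes).decode("ascii"),
                "documentContentType": "pdf",
                "documentName": name,
            }]
        },
    }

# Placeholder definition; replace with the real DRAFT definition
existing = {"version": "1.0", "types": [], "rules": [], "variables": []}
payload = build_ingest_payload(existing, b"%PDF-1.4 ...", "Company Policy Document")
print(json.dumps(payload)[:40])
```

The resulting JSON can be written to a file and passed to the CLI with `--cli-input-json`, as shown in the "Building Policy from Document" workflow below.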

### REFINE_POLICY Workflow
- **Purpose**: Iteratively improve policies with targeted modifications
- **Input**: Policy definition + annotations for specific changes
- **Use Cases**: Kiro CLI suggestions, test-driven improvements, feedback-based refinement
- **Content Structure**: `workflowContent.policyRepairAssets.annotations[]`

**CRITICAL: Complete Policy Definition Required**

ALL build workflows require the COMPLETE existing policy definition in the `policyDefinition` section, not just the changes you want to make.

**REFINE_POLICY Annotation Types:**

**Top-Level Annotations:**
- **Type Management**: `addType`, `updateType`, `deleteType`
- **Variable Management**: `addVariable`, `updateVariable`, `deleteVariable`
- **Rule Management**: `addRule`, `updateRule`, `deleteRule`
- **Natural Language Rules**: `addRuleFromNaturalLanguage`
- **Feedback-Based Updates**: `updateFromRulesFeedback`, `updateFromScenarioFeedback`

**Sub-Operations (only within `updateType`):**
- `addTypeValue`, `updateTypeValue`, `deleteTypeValue` - Used to modify values within an existing custom type

**Important**: Only create rules in if/then format.

## Annotation Type Reference

### Type Management Annotations

#### `addType` - Create New Custom Type
```json
{
  "addType": {
    "name": "ApprovalStatus",
    "description": "Status values for approval requests",
    "values": [
      {
        "value": "PENDING",
        "description": "Request is awaiting approval"
      },
      {
        "value": "APPROVED",
        "description": "Request has been approved"
      },
      {
        "value": "REJECTED",
        "description": "Request has been rejected"
      }
    ]
  }
}
```

#### `updateType` - Modify Existing Custom Type
```json
{
  "updateType": {
    "name": "ApprovalStatus",
    "newName": "RequestStatus",
    "description": "Updated status values for all request types",
    "values": [
      {
        "addTypeValue": {
          "value": "ESCALATED",
          "description": "Request escalated to higher authority"
        }
      },
      {
        "updateTypeValue": {
          "value": "PENDING",
          "newValue": "WAITING",
          "description": "Request is waiting for review"
        }
      },
      {
        "deleteTypeValue": {
          "value": "REJECTED"
        }
      }
    ]
  }
}
```

#### `deleteType` - Remove Custom Type
```json
{
  "deleteType": {
    "name": "ObsoleteType"
  }
}
```

### Variable Management Annotations

#### `addVariable` - Create New Variable
```json
{
  "addVariable": {
    "name": "requestAmount",
    "type": "real",
    "description": "The monetary amount of the approval request in USD"
  }
}
```

#### `updateVariable` - Modify Existing Variable
```json
{
  "updateVariable": {
    "name": "requestAmount",
    "newName": "approvalAmount",
    "description": "The monetary amount requiring approval in USD (updated description)"
  }
}
```

#### `deleteVariable` - Remove Variable
```json
{
  "deleteVariable": {
    "name": "obsoleteVariable"
  }
}
```

### Rule Management Annotations

#### `addRule` - Create New Rule (SMT-LIB)
```json
{
  "addRule": {
    "expression": "(=> (and (= userRole MANAGER) (< requestAmount 10000)) (not approvalRequired))"
  }
}
```

#### `updateRule` - Modify Existing Rule
```json
{
  "updateRule": {
    "ruleId": "A1B2C3D4E5F6",
    "expression": "(=> (and (= userRole MANAGER) (< requestAmount 5000)) (not approvalRequired))"
  }
}
```

#### `deleteRule` - Remove Rule
```json
{
  "deleteRule": {
    "ruleId": "G7H8I9J0K1L2"
  }
}
```

### Natural Language Rule Creation

#### `addRuleFromNaturalLanguage` - Convert Natural Language to Rule
```json
{
  "addRuleFromNaturalLanguage": {
    "naturalLanguage": "Managers can approve expense requests up to $5,000 without additional authorization. Senior managers can approve up to $25,000."
  }
}
```

### Feedback-Based Updates

#### `updateFromRulesFeedback` - Improve Rules Based on Performance
```json
{
  "updateFromRulesFeedback": {
    "ruleIds": ["A1B2C3D4E5F6", "G7H8I9J0K1L2"],
    "feedback": "These rules are too restrictive for emergency scenarios. Add exception handling for urgent requests with proper escalation paths."
  }
}
```

#### `updateFromScenarioFeedback` - Improve Based on Test Scenarios
```json
{
  "updateFromScenarioFeedback": {
    "ruleIds": ["A1B2C3D4E5F6"],
    "scenarioExpression": "(and (= requestType EMERGENCY) (= userRole MANAGER) (> requestAmount 10000))",
    "feedback": "Emergency requests should have different approval thresholds. Current rule blocks legitimate emergency expenses."
  }
}
```
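Putting the pieces together, a `REFINE_POLICY` request wraps the annotations in the `workflowContent.policyRepairAssets.annotations[]` structure alongside the complete policy definition. A minimal sketch (the `addRule` annotation shown is a hypothetical example following the shapes above):

```python
def build_refine_payload(policy_definition: dict, annotations: list) -> dict:
    """Assemble sourceContent for a REFINE_POLICY build workflow.

    The complete existing policy definition is required alongside the annotations.
    """
    return {
        "policyDefinition": policy_definition,
        "workflowContent": {
            "policyRepairAssets": {"annotations": annotations}
        },
    }

# Hypothetical annotation in the documented addRule (if/then) form
annotation = {"addRule": {
    "expression": "(=> (= userRole MANAGER) (not approvalRequired))"
}}
payload = build_refine_payload(
    {"version": "1.0", "types": [], "rules": [], "variables": []},
    [annotation],
)
print("policyRepairAssets" in payload["workflowContent"])  # True
```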

**Important**: Do NOT use `get-automated-reasoning-policy-annotations` or `update-automated-reasoning-policy-annotations` for the `REFINE_POLICY` workflow. Annotations are passed directly in the `start-automated-reasoning-policy-build-workflow` call.

## Common Workflows

### 1. Getting Started (New Policy)

**CRITICAL: Always Create Policy First**

You must create a policy before starting any build workflows.

```bash
# Step 1: Create initial policy (REQUIRED FIRST STEP)
aws bedrock create-automated-reasoning-policy \
  --region us-west-2 \
  --name "YourPolicyName"

# Step 2: Extract the policyArn from the response above, then start build workflow
aws bedrock start-automated-reasoning-policy-build-workflow \
  --region us-west-2 \
  --policy-arn "arn:aws:bedrock:us-west-2:123456789012:automated-reasoning-policy/abcd1234efgh" \
  --build-workflow-type INGEST_CONTENT \
  --source-content <policy-definition>

# Step 3: Get build results
aws bedrock get-automated-reasoning-policy-build-workflow-result-assets \
  --region us-west-2 \
  --policy-arn "arn:aws:bedrock:us-west-2:123456789012:automated-reasoning-policy/abcd1234efgh" \
  --build-workflow-id <workflow-id>
```

### 2. Building Policy from Document

**RECOMMENDED: Using CLI Input JSON File**

```bash
# Step 1: Encode PDF to base64 and create JSON file with base64 content
PDF_BASE64=$(base64 -i your-policy.pdf | tr -d '\n')

cat > ingest-policy.json << EOF
{
  "policyArn": "arn:aws:bedrock:us-west-2:123456789012:automated-reasoning-policy/your-actual-policy-id",
  "buildWorkflowType": "INGEST_CONTENT",
  "sourceContent": {
    "policyDefinition": {
      "version": "1.0",
      "types": [],
      "rules": [],
      "variables": []
    },
    "workflowContent": {
      "documents": [
        {
          "document": "$PDF_BASE64",
          "documentContentType": "pdf",
          "documentName": "Company Policy Document",
          "documentDescription": "Main policy document containing business rules and organizational guidelines."
        }
      ]
    }
  }
}
EOF

# Step 2: Use the JSON file
aws bedrock start-automated-reasoning-policy-build-workflow \
  --region us-west-2 \
  --cli-input-json file://ingest-policy.json
```

### 3. Policy Development Cycle

```bash
# 1. Import/process policy definition
aws bedrock start-automated-reasoning-policy-build-workflow \
  --build-workflow-type IMPORT_POLICY

# 2. Update DRAFT with processed definition
aws bedrock update-automated-reasoning-policy \
  --policy-arn <unversioned-arn> \
  --policy-definition <build-output>

# 3. Create versioned snapshot of DRAFT (definitionHash from step 2 response)
aws bedrock create-automated-reasoning-policy-version \
  --policy-arn <unversioned-arn> \
  --last-updated-definition-hash <definition-hash>
```
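Because build workflows are asynchronous, a common pattern is to poll `get-automated-reasoning-policy-build-workflow` until a terminal status is reached. A generic polling sketch (the `get_status` callable stands in for the actual API call; the simulated run at the bottom is for illustration only):

```python
import time

TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}

def wait_for_build(get_status, poll_seconds: float = 10.0, max_polls: int = 360) -> str:
    """Poll a status callable until the build workflow reaches a terminal state."""
    for _ in range(max_polls):
        status = get_status()  # e.g., wraps get-automated-reasoning-policy-build-workflow
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("build workflow did not finish in time")

# Simulated run: the workflow moves through PREPROCESSING -> BUILDING -> COMPLETED
statuses = iter(["PREPROCESSING", "BUILDING", "COMPLETED"])
print(wait_for_build(lambda: next(statuses), poll_seconds=0))  # COMPLETED
```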

## Testing Workflow

### Primary Approach: Scenarios API (Recommended)

Use `get-automated-reasoning-policy-next-scenario` for comprehensive policy validation.

The Scenarios API is superior for testing because it:
- Tests formal logic directly - validates that the policy rules work correctly
- Generates scenarios with AI - provides comprehensive coverage of edge cases and rule interactions
- Targets specific rules - tests individual rules and their combinations
- Avoids translation issues - does not depend on natural language translation of test content
- Generates tests intelligently - scenarios are derived from a deep understanding of the policy logic

```bash
# Generate intelligent test scenarios automatically
aws bedrock get-automated-reasoning-policy-next-scenario \
  --policy-arn "arn:aws:bedrock:region:account:automated-reasoning-policy/policy-id" \
  --build-workflow-id "workflow-123"
```

### Secondary Approach: Test Cases (User Experience)

Use manual test cases to validate natural language translation.

```bash
# Create test cases for natural language validation
aws bedrock create-automated-reasoning-policy-test-case \
  --policy-arn "arn:aws:bedrock:region:account:automated-reasoning-policy/policy-id" \
  --guard-content "It is 2:30 PM on a clear day" \
  --query-content "What color should the sky be?" \
  --expected-aggregated-findings-result "VALID" \
  --confidence-threshold 0.8
```

### Test Result Analysis and Troubleshooting

**Understanding Test Results:**

**Scenarios API Results:**
- `expectedResult: SATISFIABLE` - Policy logic works correctly
- API errors or logic conflicts - Policy needs fixing with REFINE_POLICY

**Common Test Case Failure Modes:**

1. **TRANSLATION_AMBIGUOUS**
   - Problem: AI can't map natural language to policy variables
   - Solution: Improve variable descriptions with more natural language synonyms

2. **SATISFIABLE when expecting VALID**
   - Problem: Your expected result label is likely WRONG, not the policy
   - SATISFIABLE = "This scenario is logically consistent with the policy rules"
   - VALID = "This is the correct/expected answer according to the policy"
   - Solution: Change `expectedAggregatedFindingsResult` from `VALID` to `SATISFIABLE`

3. **Empty testFindings arrays**
   - Problem: Translation issues, not rule violations
   - Solution: Focus on improving natural language descriptions, not policy logic

**Valid values for `expectedAggregatedFindingsResult`:**
- `VALID` - The claims are true, implied by the premises and the policy
- `INVALID` - The claims are false, not implied by the premises and policy
- `SATISFIABLE` - The claims can be true or false depending on assumptions
- `IMPOSSIBLE` - Automated Reasoning can't make a statement (e.g., conflicting policy rules)
- `TRANSLATION_AMBIGUOUS` - Ambiguity in translation prevented validity checking
- `TOO_COMPLEX` - Input too complex for Automated Reasoning to process within latency limits
- `NO_TRANSLATION` - Some or all of the input wasn't translated into logic
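When scripting test runs, it helps to separate translation problems from genuine logic mismatches before changing the policy. A small triage sketch based on the failure modes above (the suggested next steps are heuristics, not API output):

```python
TRANSLATION_ISSUES = {"TRANSLATION_AMBIGUOUS", "NO_TRANSLATION", "TOO_COMPLEX"}

def triage_test_result(expected: str, actual: str) -> str:
    """Classify a test outcome and suggest the next debugging step."""
    if actual == expected:
        return "pass"
    if actual in TRANSLATION_ISSUES:
        # Translation problem: improve descriptions, not policy logic
        return "improve variable/type descriptions"
    if expected == "VALID" and actual == "SATISFIABLE":
        # The policy may be right and the expected label wrong
        return "check expected label"
    return "review policy rules (consider REFINE_POLICY)"

print(triage_test_result("VALID", "SATISFIABLE"))  # check expected label
```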

### Running Tests Against a Build

After creating test cases, run them against a completed build workflow:

```bash
# Run all tests against a completed build
aws bedrock start-automated-reasoning-policy-test-workflow \
  --policy-arn "arn:aws:bedrock:region:account:automated-reasoning-policy/policy-id" \
  --build-workflow-id "workflow-123"

# Run specific tests only
aws bedrock start-automated-reasoning-policy-test-workflow \
  --policy-arn "arn:aws:bedrock:region:account:automated-reasoning-policy/policy-id" \
  --build-workflow-id "workflow-123" \
  --test-case-ids '["A1B2C3D4E5F6"]'

# Get result for a specific test case
aws bedrock get-automated-reasoning-policy-test-result \
  --policy-arn "arn:aws:bedrock:region:account:automated-reasoning-policy/policy-id" \
  --build-workflow-id "workflow-123" \
  --test-case-id "A1B2C3D4E5F6"

# List all test results for a build
aws bedrock list-automated-reasoning-policy-test-results \
  --policy-arn "arn:aws:bedrock:region:account:automated-reasoning-policy/policy-id" \
  --build-workflow-id "workflow-123"
```

## Build Workflow Monitoring

**Critical Build Limits**: The API supports a maximum of 2 build workflows per policy, with only 1 allowed to be in progress at any time. When a build workflow completes, you can instruct the user to review the output in the console.

### Check Build Status

```bash
aws bedrock get-automated-reasoning-policy-build-workflow \
  --policy-arn "arn:aws:bedrock:region:account:automated-reasoning-policy/policy-id" \
  --build-workflow-id "workflow-123"
```

### List Build History

```bash
aws bedrock list-automated-reasoning-policy-build-workflows \
  --policy-arn "arn:aws:bedrock:region:account:automated-reasoning-policy/policy-id" \
  --max-results 50
```

### Best Practice: Clean Build Management

```bash
# 1. Check existing builds before starting new ones
aws bedrock list-automated-reasoning-policy-build-workflows \
  --policy-arn <policy-arn> \
  --max-results 10

# 2. Delete old/completed builds if you have 2 already
aws bedrock delete-automated-reasoning-policy-build-workflow \
  --policy-arn <policy-arn> \
  --build-workflow-id "old-workflow-id" \
  --last-updated-at "2025-11-15T00:41:18.608000+00:00"

# 3. Now start your new build
aws bedrock start-automated-reasoning-policy-build-workflow \
  --policy-arn <policy-arn> \
  --build-workflow-type INGEST_CONTENT \
  --source-content <content>
```

## Build Workflow Assets

After a build workflow completes successfully, you can retrieve various assets, and you can ask the user to review the build diff in the Automated Reasoning checks console.

### Asset Types

#### 1. ASSET_MANIFEST - Index of All Assets

```bash
aws bedrock get-automated-reasoning-policy-build-workflow-result-assets \
  --policy-arn "arn:aws:bedrock:region:account:automated-reasoning-policy/policy-id" \
  --build-workflow-id "workflow-123" \
  --asset-type "ASSET_MANIFEST"
```

**What it contains:**
- A manifest listing all available assets and their IDs for the build workflow
- Use this to discover asset IDs needed for retrieving assets

#### 2. POLICY_DEFINITION - The Main Output

```bash
aws bedrock get-automated-reasoning-policy-build-workflow-result-assets \
  --policy-arn "arn:aws:bedrock:region:account:automated-reasoning-policy/policy-id" \
  --build-workflow-id "workflow-123" \
  --asset-type "POLICY_DEFINITION"
```

**What it contains:**
- Compiled policy with extracted/refined rules, variables, and types
- SMT-LIB expressions for all rules
- Complete policy structure ready for deployment

#### 3. BUILD_LOG - Build Process Details

```bash
aws bedrock get-automated-reasoning-policy-build-workflow-result-assets \
  --policy-arn "arn:aws:bedrock:region:account:automated-reasoning-policy/policy-id" \
  --build-workflow-id "workflow-123" \
  --asset-type "BUILD_LOG"
```

**What it shows:**
- Document processing steps - What content was analyzed
- Extraction results - What rules, variables, and types were found
- Processing warnings - Content that couldn't be interpreted
- Success/failure status for each extraction step

#### 4. QUALITY_REPORT - Policy Quality Analysis

```bash
aws bedrock get-automated-reasoning-policy-build-workflow-result-assets \
  --policy-arn "arn:aws:bedrock:region:account:automated-reasoning-policy/policy-id" \
  --build-workflow-id "workflow-123" \
  --asset-type "QUALITY_REPORT"
```

**What it contains:**
- Conflicting rules - Rules that contradict each other
- Unused variables - Variables not referenced by any rules
- Unused type values - Enum values not used in rules
- Disjoint rule sets - Groups of rules that don't interact

#### 5. GENERATED_TEST_CASES - Auto-Generated Tests

```bash
aws bedrock get-automated-reasoning-policy-build-workflow-result-assets \
  --policy-arn "arn:aws:bedrock:region:account:automated-reasoning-policy/policy-id" \
  --build-workflow-id "workflow-123" \
  --asset-type "GENERATED_TEST_CASES"
```

**What it contains:**
- Automatically generated test cases based on the policy rules

#### 6. POLICY_SCENARIOS - Policy Test Scenarios

```bash
aws bedrock get-automated-reasoning-policy-build-workflow-result-assets \
  --policy-arn "arn:aws:bedrock:region:account:automated-reasoning-policy/policy-id" \
  --build-workflow-id "workflow-123" \
  --asset-type "POLICY_SCENARIOS"
```

**What it contains:**
- AI-generated scenarios for comprehensive policy validation

#### 7. FIDELITY_REPORT - Policy Fidelity Analysis

```bash
aws bedrock get-automated-reasoning-policy-build-workflow-result-assets \
  --policy-arn "arn:aws:bedrock:region:account:automated-reasoning-policy/policy-id" \
  --build-workflow-id "workflow-123" \
  --asset-type "FIDELITY_REPORT"
```

**What it contains:**
- Fidelity analysis results from a GENERATE_FIDELITY_REPORT build workflow

#### 8. SOURCE_DOCUMENT - Original Source Documents

```bash
# Requires --asset-id obtained from the ASSET_MANIFEST
aws bedrock get-automated-reasoning-policy-build-workflow-result-assets \
  --policy-arn "arn:aws:bedrock:region:account:automated-reasoning-policy/policy-id" \
  --build-workflow-id "workflow-123" \
  --asset-type "SOURCE_DOCUMENT" \
  --asset-id "a1b2c3d4-e5f6-4a7b-8c9d-e0f1a2b3c4d5"
```

**What it contains:**
- The original source document used in the build workflow
- The `--asset-id` parameter is required because multiple source documents may have been used
```

# Deploy your Automated Reasoning policy in your application
Deploy your Automated Reasoning policy

After you've tested your Automated Reasoning policy and are satisfied with its performance, you can deploy it for use in your application with Amazon Bedrock Guardrails. This page covers the full deployment workflow: saving an immutable version, attaching it to a guardrail, automating deployment with CloudFormation, and integrating into CI/CD pipelines.

## Save a version of your Automated Reasoning policy


When you're done testing your policy, create an immutable version. Immutable versions ensure that the policy attached to your guardrail doesn't change unexpectedly when you continue editing the DRAFT. Each version is identified by a numeric version number (1, 2, 3, ...) and cannot be modified after creation.
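The `lastUpdatedDefinitionHash` parameter acts as an optimistic-concurrency token: a version is created only if the DRAFT hasn't changed since you read it. The check behaves roughly like this local simulation (illustrative, not the service implementation):

```python
class DraftPolicy:
    """Local simulation of the lastUpdatedDefinitionHash concurrency check."""

    def __init__(self, definition_hash: str):
        self.definition_hash = definition_hash
        self.version_count = 0

    def update_definition(self, new_hash: str) -> None:
        """Editing the DRAFT changes its definition hash."""
        self.definition_hash = new_hash

    def create_version(self, last_updated_definition_hash: str) -> int:
        """Create a numbered version only if the caller saw the current DRAFT."""
        if last_updated_definition_hash != self.definition_hash:
            raise RuntimeError("stale hash: re-read the DRAFT first")
        self.version_count += 1
        return self.version_count

draft = DraftPolicy("583463f0")
print(draft.create_version("583463f0"))  # 1
draft.update_definition("9a1c22ee")      # someone edits the DRAFT
# draft.create_version("583463f0")       # would now raise: the hash is stale
```

This is why the example below retrieves the hash with `get-automated-reasoning-policy` immediately before creating the version.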

### Using the console


1. In the left navigation, choose **Automated Reasoning**.

1. Choose the Automated Reasoning policy that you want to use with your application.

1. Choose **Save as new version**. You can use this version of your policy with your guardrail.

### Using the API


Use the `CreateAutomatedReasoningPolicyVersion` API to create an immutable version of your Automated Reasoning policy.

#### Request parameters


`policyArn` (required)  
The Amazon Resource Name (ARN) of the Automated Reasoning policy for which to create a version.

`lastUpdatedDefinitionHash` (required)  
The hash of the policy definition for the new version. Retrieve this hash from the `GetAutomatedReasoningPolicy` API. This ensures you're versioning the exact policy definition you tested.

#### Example


```
# Get the current definition hash
aws bedrock get-automated-reasoning-policy \
  --policy-arn "arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/lnq5hhz70wgk" \
  --query "definitionHash" --output text

# Create the version
aws bedrock create-automated-reasoning-policy-version \
  --policy-arn "arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/lnq5hhz70wgk" \
  --last-updated-definition-hash "583463f067a8a4f49fc1206b4642fd40..."
```

Example response:

```
{
  "policyArn": "arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/lnq5hhz70wgk",
  "version": "1",
  "name": "MyHRPolicy"
}
```

## Add your Automated Reasoning policy to your guardrail


Once you have a saved version of your Automated Reasoning policy, add it to a guardrail. The guardrail is the runtime component that your application calls to validate LLM responses. You can add an Automated Reasoning policy to a new or existing guardrail.

### Using the console


1. In the left navigation, choose **Guardrails**, then choose **Create guardrail** (or select an existing guardrail and choose **Edit**).

1. When you get to the **Add Automated Reasoning checks** screen, choose **Enable Automated Reasoning policy**.

1. For **Policy name**, choose a saved version of an Automated Reasoning policy, then choose **Next**.

1. Finish creating or updating your guardrail.

### Using the API


Use the `CreateGuardrail` or `UpdateGuardrail` API to add an Automated Reasoning policy to your guardrail. Include the `automatedReasoningConfig` parameter with the versioned policy ARN.

#### Request parameters


`automatedReasoningConfig`  
The configuration for Automated Reasoning checks in Amazon Bedrock Guardrails.

`policyArn` (required)  
The ARN of the Automated Reasoning policy version to use with your guardrail. Use the versioned ARN (ending in `:1`, `:2`, etc.), not the unversioned ARN.

#### Example


```
aws bedrock create-guardrail \
  --name "HR-Policy-Guardrail" \
  --description "Guardrail for HR policy validation" \
  --automated-reasoning-policy-config policies="arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/lnq5hhz70wgk:1" \
  --cross-region-config '{"guardrailProfileIdentifier": "us.guardrail.v1:0"}' \
  --blocked-input-messaging "I cannot process this request." \
  --blocked-outputs-messaging "I cannot provide this response."
```

**Important**  
Use the versioned policy ARN (for example, `arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/lnq5hhz70wgk:1`). If you use the unversioned ARN, the API returns an error. Create a version first using `CreateAutomatedReasoningPolicyVersion`.

**Important**  
Guardrails that use Automated Reasoning checks require a cross-Region inference profile. Include the `--cross-region-config` parameter with a `guardrailProfileIdentifier` that matches your Region prefix (for example, `us.guardrail.v1:0` for US Regions or `eu.guardrail.v1:0` for EU Regions). If you omit this parameter, the API returns a `ValidationException`.

## Export a policy version for deployment


To deploy a policy through CloudFormation or a CI/CD pipeline, you need the policy definition JSON. Use the `ExportAutomatedReasoningPolicyVersion` API to export the complete policy definition — including all rules, variables, and custom types — from a saved version.

The exported definition is the same format accepted by the CloudFormation `AWS::Bedrock::AutomatedReasoningPolicy` resource's `PolicyDefinition` property. This makes it straightforward to move a policy from the interactive console workflow to automated deployment.

```
# Export the policy definition from version 1
aws bedrock export-automated-reasoning-policy-version \
  --policy-arn "arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/lnq5hhz70wgk:1" \
  --query "policyDefinition" \
  --output json > policy-definition.json
```

The exported JSON contains the following structure:

```
{
  "version": "1.0",
  "variables": [
    {
      "name": "isFullTime",
      "type": "BOOL",
      "description": "Whether the employee works full-time (true) or part-time (false)."
    },
    {
      "name": "tenureMonths",
      "type": "INT",
      "description": "The number of complete months the employee has been continuously employed."
    }
  ],
  "rules": [
    {
      "id": "A1B2C3D4E5F6",
      "expression": "(=> (and isFullTime (> tenureMonths 12)) eligibleForParentalLeave)"
    }
  ],
  "types": []
}
```

Store this file in version control alongside your CloudFormation templates. When you update your policy, export the new version and update the file to trigger a deployment.
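To decide whether a fresh export actually warrants a commit and redeployment, you can compare the exported definition against the copy in version control. The following is a minimal sketch; the comparison logic is illustrative, not an AWS API:

```python
import json


def definition_changed(old_def: dict, new_def: dict) -> bool:
    """Return True when rules, variables, or types differ between two
    exported policy definitions (dict key order and whitespace ignored)."""
    keys = ("rules", "variables", "types")
    return any(
        json.dumps(old_def.get(k, []), sort_keys=True)
        != json.dumps(new_def.get(k, []), sort_keys=True)
        for k in keys
    )
```

If the function returns `True`, overwrite `policy-definition.json` with the new export and commit to trigger your pipeline.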

## Automate deployment with CloudFormation


Use CloudFormation to deploy your Automated Reasoning policy and guardrail as infrastructure as code. The `AWS::Bedrock::AutomatedReasoningPolicy` resource creates a policy with a policy definition that you export from the API or console. Combined with `AWS::Bedrock::Guardrail`, you can deploy the complete validation stack in a single template.

**Note**  
CloudFormation creates the policy resource with the policy definition you provide. It does not run a build workflow or extract rules from source documents. You must first create and test your policy interactively (using the console, API, or Kiro CLI), then export the tested policy definition for use in your template. For more information, see [Export a policy version for deployment](#export-policy-version).

For the complete property reference of the policy resource, see [AWS::Bedrock::AutomatedReasoningPolicy](https://docs.aws.amazon.com/AWSCloudFormation/latest/TemplateReference/aws-resource-bedrock-automatedreasoningpolicy.html) in the *CloudFormation Template Reference*.

### Example: Deploy a policy and guardrail


The following CloudFormation template creates an Automated Reasoning policy with a policy definition and a guardrail that references it. Replace the policy definition with the JSON exported from your tested policy.

```
AWSTemplateFormatVersion: '2010-09-09'
Description: Deploy an Automated Reasoning policy and guardrail

Parameters:
  PolicyName:
    Type: String
    Default: MyHRPolicy
    Description: Name of the Automated Reasoning policy
  GuardrailName:
    Type: String
    Default: HR-Policy-Guardrail
    Description: Name of the guardrail

Resources:
  AutomatedReasoningPolicy:
    Type: AWS::Bedrock::AutomatedReasoningPolicy
    Properties:
      Name: !Ref PolicyName
      Description: Validates HR chatbot responses about leave eligibility
      PolicyDefinition:
        Version: '1.0'
        Variables:
          - Name: isFullTime
            Type: BOOL
            Description: >-
              Whether the employee works full-time (true) or part-time (false).
              Set to true when users mention being full-time or working 40+ hours
              per week.
          - Name: tenureMonths
            Type: INT
            Description: >-
              The number of complete months the employee has been continuously
              employed. When users mention years of service, convert to months
              (for example, 2 years = 24 months).
          - Name: eligibleForParentalLeave
            Type: BOOL
            Description: >-
              Whether the employee is eligible for parental leave based on
              employment status and tenure.
        Rules:
          - Id: A1B2C3D4E5F6
            Expression: >-
              (=> (and isFullTime (> tenureMonths 12))
              eligibleForParentalLeave)
          - Id: G7H8I9J0K1L2
            Expression: >-
              (=> (or (not isFullTime) (<= tenureMonths 12))
              (not eligibleForParentalLeave))
        Types: []
      Tags:
        - Key: Environment
          Value: Production
        - Key: Team
          Value: HR

  Guardrail:
    Type: AWS::Bedrock::Guardrail
    Properties:
      Name: !Ref GuardrailName
      Description: Guardrail with Automated Reasoning checks for HR policy
      BlockedInputMessaging: I cannot process this request.
      BlockedOutputsMessaging: I cannot provide this response.
      AutomatedReasoningPolicyConfig:
        Policies:
          - !GetAtt AutomatedReasoningPolicy.PolicyArn
      CrossRegionConfig:
        GuardrailProfileArn: !Sub "arn:aws:bedrock:${AWS::Region}:${AWS::AccountId}:guardrail-profile/us.guardrail.v1:0"

Outputs:
  PolicyArn:
    Description: ARN of the Automated Reasoning policy
    Value: !GetAtt AutomatedReasoningPolicy.PolicyArn
  PolicyId:
    Description: ID of the Automated Reasoning policy
    Value: !GetAtt AutomatedReasoningPolicy.PolicyId
  GuardrailId:
    Description: ID of the guardrail
    Value: !Ref Guardrail
```

**Tip**  
For production deployments, keep the policy definition in a separate JSON file and reference it with the `AWS::Include` transform or load it as a template parameter. This keeps your template clean and makes it easier to update the policy definition independently.

**Important**  
Guardrails that use Automated Reasoning checks require a cross-Region inference profile. The `CrossRegionConfig` property specifies the guardrail profile ARN for your Region. Replace the Region prefix (`us`) with the appropriate prefix for your deployment Region (for example, `eu` for EU Regions). If you omit this property, the guardrail creation fails.

### Example: Deploy with a customer managed KMS key


To encrypt your policy with a customer managed KMS key, add the `KmsKeyId` property. You must also configure the key policy to allow Amazon Bedrock to use the key. For the required key policy permissions, see [KMS permissions for Automated Reasoning policies](create-automated-reasoning-policy.md#automated-reasoning-policy-kms-permissions).

```
  AutomatedReasoningPolicy:
    Type: AWS::Bedrock::AutomatedReasoningPolicy
    Properties:
      Name: !Ref PolicyName
      Description: Validates HR chatbot responses about leave eligibility
      KmsKeyId: !GetAtt PolicyEncryptionKey.Arn
      PolicyDefinition:
        # ... policy definition ...
      Tags:
        - Key: Environment
          Value: Production
```

**Important**  
Changing the `KmsKeyId` property requires replacement of the resource. CloudFormation will delete the existing policy and create a new one with a new ARN. Update any guardrails that reference the old policy ARN.

## Next steps


After deploying your policy and guardrail, integrate Automated Reasoning checks into your application to validate LLM responses at runtime. For more information, see [Integrate Automated Reasoning checks in your application](integrate-automated-reasoning-checks.md).

# Integrate Automated Reasoning checks in your application
Integrate Automated Reasoning checks

After you deploy your Automated Reasoning policy in a guardrail (see [Deploy your Automated Reasoning policy in your application](deploy-automated-reasoning-policy.md)), you can use it at runtime to validate LLM responses and act on the feedback. This page explains how to call the validation API, interpret the findings programmatically, and implement common integration patterns such as rewriting invalid responses and asking clarifying questions.

Automated Reasoning checks operate in *detect mode* only — they return findings and feedback rather than blocking content. Your application is responsible for deciding what to do with the findings: serve the response, rewrite it, ask for clarification, or fall back to a default behavior.

## Integration overview


At runtime, the integration follows this flow:

```
User question ──► LLM generates response ──► ApplyGuardrail validates response
                                                        │
                                              ┌─────────┴─────────┐
                                              │                   │
                                            VALID              Not VALID
                                              │                   │
                                              ▼                   ▼
                                        Serve response     Inspect findings
                                        to user                  │
                                                        ┌────────┴────────┐
                                                        │                 │
                                                   OTHER FINDING     TRANSLATION_
                                                      TYPES       AMBIGUOUS / SATISFIABLE
                                                        │                 │
                                                        ▼                 ▼
                                                   Rewrite using    Ask user for
                                                   AR feedback      clarification
                                                        │                 │
                                                        ▼                 ▼
                                                   Validate again   Validate with
                                                                    clarified input
```

Automated Reasoning findings are returned through any API that supports an Amazon Bedrock Guardrails configuration:
+ `ApplyGuardrail` — Standalone validation API. Use this when you want to validate content independently of the LLM invocation. This is the recommended approach for Automated Reasoning checks because it gives you full control over what content is validated and when.
+ `Converse` and `InvokeModel` — LLM invocation APIs with guardrail configuration. Automated Reasoning findings are returned in the `trace` field of the response.
+ `InvokeAgent` and `RetrieveAndGenerate` — Agent and knowledge base APIs with guardrail configuration.

This page focuses on the `ApplyGuardrail` API because it provides the most flexibility for implementing the rewriting and clarification patterns described below. For information about using guardrails with the other APIs, see [Use a guardrail](https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-use.html).

## Open-source rewriting chatbot sample


For a complete, production-style implementation of the patterns described on this page, see the [Automated Reasoning checks rewriting chatbot](https://github.com/aws-samples/amazon-bedrock-samples/tree/main/responsible_ai/automated-reasoning-rewriting-chatbot) on GitHub. This sample application demonstrates:
+ An iterative rewriting loop where invalid responses are automatically corrected based on AR feedback.
+ Follow-up questions when the LLM needs additional context from the user to rewrite accurately.
+ A timeout mechanism that automatically resumes processing when users don't respond to clarification questions.
+ Policy context injection into LLM prompts so the LLM can reference the full policy rules during rewriting.
+ JSON audit logging of every validation iteration for compliance and debugging.

The sample uses a Python/Flask backend with a React frontend and communicates with Amazon Bedrock for LLM inference and Amazon Bedrock Guardrails for validation through the `ApplyGuardrail` API.

**Note**  
The sample application includes the policy content directly in the LLM generation prompts to support any Automated Reasoning policy without requiring document uploads. In a production deployment, you would typically use RAG content or feed the LLM the original natural language document instead of the Automated Reasoning policy source code.

## Call ApplyGuardrail with Automated Reasoning checks


Use the `ApplyGuardrail` API to validate content against your guardrail. The API accepts one or more content blocks and returns an assessment that includes Automated Reasoning findings.

### Request structure


`guardrailIdentifier` (required)  
The guardrail ID or ARN. Use the guardrail that has your Automated Reasoning policy attached.

`guardrailVersion` (required)  
The guardrail version number (for example, `1`). Use a numbered version for production workloads, not `DRAFT`.

`source` (required)  
Set to `OUTPUT` when validating LLM responses. Set to `INPUT` when validating user prompts. For Automated Reasoning checks, you typically validate the LLM output.

`content` (required)  
An array of content blocks to validate. Each block contains a `text` field with the content to check. You can pass the user question and the LLM response as separate content blocks, or combine them into a single block.

### Example: Validate an LLM response using the AWS CLI


```
aws bedrock-runtime apply-guardrail \
  --guardrail-identifier "your-guardrail-id" \
  --guardrail-version "1" \
  --source OUTPUT \
  --content '[
    {
      "text": {
        "text": "User: Am I eligible for parental leave if I have been working here for 2 years full-time?\nAssistant: Yes, you are eligible for parental leave."
      }
    }
  ]'
```

### Example: Validate an LLM response using Python (boto3)


```
import boto3
import json

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier="your-guardrail-id",
    guardrailVersion="1",
    source="OUTPUT",
    content=[
        {
            "text": {
                "text": (
                    "User: Am I eligible for parental leave if I have been "
                    "working here for 2 years full-time?\n"
                    "Assistant: Yes, you are eligible for parental leave."
                )
            }
        }
    ],
)

# The AR findings are in the assessments
for assessment in response.get("assessments", []):
    ar_assessment = assessment.get("automatedReasoningPolicy", {})
    findings = ar_assessment.get("findings", [])
    for finding in findings:
        # Each finding is a union — exactly one key is present
        # Possible keys: valid, invalid, satisfiable, impossible,
        #                translationAmbiguous, tooComplex, noTranslations
        print(json.dumps(finding, indent=2, default=str))
```

### Response structure


The `ApplyGuardrail` response includes an `assessments` array. Each assessment contains an `automatedReasoningPolicy` object with a `findings` array. Each finding is a union type — exactly one of the following keys is present:
+ `valid`
+ `invalid`
+ `satisfiable`
+ `impossible`
+ `translationAmbiguous`
+ `tooComplex`
+ `noTranslations`

For a detailed description of each finding type and its fields, see [Findings and validation results](automated-reasoning-checks-concepts.md#ar-concept-findings).

## Interpret AR findings at runtime


To act on Automated Reasoning findings programmatically, your application needs to extract the finding type, the translation details, and the supporting or contradicting rules. The following sections explain how to parse each part of a finding.

### Determine the finding type


Each finding is a union — exactly one key is present. Check which key exists to determine the finding type:

```
def get_finding_type(finding):
    """Return the finding type and its data from an AR finding union."""
    for finding_type in [
        "valid", "invalid", "satisfiable", "impossible",
        "translationAmbiguous", "tooComplex", "noTranslations"
    ]:
        if finding_type in finding:
            return finding_type, finding[finding_type]
    return None, None
```

### Read the translation


Most finding types include a `translation` object that shows how Automated Reasoning checks translated the natural language input into formal logic. The translation contains:
+ `premises` — The conditions extracted from the input (for example, `isFullTime = true`, `tenureMonths = 24`).
+ `claims` — The assertions to validate (for example, `eligibleForParentalLeave = true`).
+ `untranslatedPremises` — Parts of the input that could not be mapped to policy variables. These parts are not validated.
+ `untranslatedClaims` — Claims that could not be mapped to policy variables.

Check `untranslatedPremises` and `untranslatedClaims` to understand the scope of the validation. A `VALID` result only covers the translated claims — untranslated content is not verified.
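A small helper can summarize that coverage before you trust a `VALID` result. This is a sketch assuming the translation fields are lists, as shown above:

```python
def translation_coverage(translation: dict) -> dict:
    """Summarize how much of the input was mapped to policy variables
    and therefore actually validated."""
    translated = (
        len(translation.get("premises", []))
        + len(translation.get("claims", []))
    )
    untranslated = (
        len(translation.get("untranslatedPremises", []))
        + len(translation.get("untranslatedClaims", []))
    )
    return {
        "translated": translated,
        "untranslated": untranslated,
        # Only a fully translated input is fully covered by the result
        "fully_translated": untranslated == 0,
    }
```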

### Read the supporting or contradicting rules


Depending on the finding type, the finding includes rules that explain the result:
+ `valid` findings include `supportingRules` — the policy rules that prove the claims are correct.
+ `invalid` findings include `contradictingRules` — the policy rules that the claims violate.
+ `satisfiable` findings include both a `claimsTrueScenario` and a `claimsFalseScenario` — showing the conditions under which the claims are true and false.

These rules and scenarios are the key inputs for the rewriting pattern described in [Rewrite invalid responses using AR feedback](#rewrite-invalid-responses).
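Extracting the explanation fields can be sketched as a simple dispatch on the finding type; the field names follow the list above:

```python
def explanation_for(finding_type: str, finding_data: dict) -> dict:
    """Pull the fields that explain an AR result, depending on its type."""
    if finding_type == "valid":
        return {"supportingRules": finding_data.get("supportingRules", [])}
    if finding_type == "invalid":
        return {"contradictingRules": finding_data.get("contradictingRules", [])}
    if finding_type == "satisfiable":
        return {
            "claimsTrueScenario": finding_data.get("claimsTrueScenario"),
            "claimsFalseScenario": finding_data.get("claimsFalseScenario"),
        }
    # Other finding types carry no rule-level explanation
    return {}
```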

### Determine the aggregate result


A single validation request can return multiple findings. To determine the overall result, sort findings by severity and select the worst. The severity order from worst to best is: `tooComplex` and `translationAmbiguous` (tied), then `impossible`, `invalid`, `satisfiable`, `valid`, and finally `noTranslations`.

```
SEVERITY_ORDER = {
    "tooComplex": 0,
    "translationAmbiguous": 0,
    "impossible": 1,
    "invalid": 2,
    "satisfiable": 3,
    "valid": 4,
    "noTranslations": 5, 
}

def get_aggregate_result(findings):
    """Return the worst finding type from a list of findings."""
    worst = None
    worst_severity = float("inf")
    for finding in findings:
        finding_type, _ = get_finding_type(finding)
        severity = SEVERITY_ORDER.get(finding_type, 0)
        if severity < worst_severity:
            worst_severity = severity
            worst = finding_type
    return worst
```

## Handle validation outcomes in your application


Use the aggregate result to decide what your application does next. The following table summarizes the recommended action for each result type.


| Result | What it means | Recommended action | 
| --- | --- | --- | 
| valid | The response is mathematically proven correct given the premises and your policy rules. | Serve the response to the user. Log the finding for audit purposes (see [Build an audit trail](#build-audit-trail)). | 
| invalid | The response contradicts your policy rules. The contradictingRules field identifies which rules were violated. | Rewrite the response using the AR feedback (see [Rewrite invalid responses using AR feedback](#rewrite-invalid-responses)). If rewriting fails after multiple attempts, block the response and return a fallback message. | 
| satisfiable | The response is correct under some conditions but not all. It's not wrong, but it's incomplete — it doesn't mention all the requirements. | Rewrite the response to include the missing conditions. Use the claimsFalseScenario to identify what's missing. Alternatively, you can let your LLM ask the user clarifying questions. | 
| impossible | The premises are contradictory, or the policy contains conflicting rules. | Ask the user to clarify their input (see [Ask clarifying questions](#ask-clarifying-questions)). If the issue persists, it may indicate a policy problem — review the quality report. | 
| translationAmbiguous | The input has multiple valid interpretations. The translation models disagreed on how to map the natural language to policy variables. | Ask the user for clarification to resolve the ambiguity. Use the options and differenceScenarios fields to generate targeted clarifying questions. | 
| tooComplex | The input exceeds processing limits for logical analysis. | Simplify the input by breaking it into smaller parts, or return a fallback message explaining that the response could not be verified. | 
| noTranslations | The input is not relevant to your policy's domain. No policy variables could be mapped. | The content is off-topic for this policy. Serve the response without AR validation, or use other guardrail components (such as topic policies) to handle off-topic content. | 
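The table above can be collapsed into a dispatch map that drives your application logic. The action names here are illustrative placeholders, not an AWS API:

```python
# Map each aggregate AR result to an application-defined action
ACTIONS = {
    "valid": "serve",
    "invalid": "rewrite",
    "satisfiable": "rewrite",
    "impossible": "clarify",
    "translationAmbiguous": "clarify",
    "tooComplex": "fallback",
    "noTranslations": "serve_unvalidated",
}


def next_action(aggregate_result: str) -> str:
    """Return the recommended action for the worst finding type."""
    return ACTIONS.get(aggregate_result, "fallback")
```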

## Rewrite invalid responses using AR feedback


The most powerful integration pattern for Automated Reasoning checks is the *rewriting loop*: when a response is `invalid` or `satisfiable`, your application constructs a prompt that includes the original response, the specific findings, and the policy rules, then asks the LLM to rewrite the response to be consistent with the policy. The rewritten response is validated again, and the loop continues until the response is `valid` or a maximum number of iterations is reached.

### Rewriting loop flow


```
LLM generates initial response
         │
         ▼
Validate with ApplyGuardrail ◄──────────────────┐
         │                                       │
         ▼                                       │
   ┌─────┴─────┐                                 │
   │           │                                 │
 VALID     Not VALID                             │
   │           │                                 │
   ▼           ▼                                 │
 Done    Construct rewriting prompt              │
         with findings + rules                   │
              │                                  │
              ▼                                  │
         LLM rewrites response                   │
              │                                  │
              ▼                                  │
         Max iterations? ──── No ────────────────┘
              │
             Yes
              │
              ▼
         Return best response
         with warning
```
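The loop above can be sketched as follows. Here `validate` and `rewrite` are placeholders for your `ApplyGuardrail` call and LLM invocation; only the control flow is shown:

```python
def rewriting_loop(initial_response, validate, rewrite, max_iterations=3):
    """Validate a response and rewrite it from AR feedback until it is
    valid or the iteration budget is exhausted.

    `validate` returns (finding_type, finding_data); `rewrite` takes the
    current response plus the finding and returns a new response.
    """
    response = initial_response
    history = []
    for _ in range(max_iterations):
        finding_type, finding_data = validate(response)
        history.append((response, finding_type))
        if finding_type == "valid":
            return response, history
        response = rewrite(response, finding_type, finding_data)
    # Budget exhausted: caller should attach a warning or fall back
    return response, history
```

The `history` list doubles as the per-iteration record recommended for audit logging.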

### Construct the rewriting prompt


The rewriting prompt should include three pieces of information from the AR findings:

1. The original response that failed validation.

1. The specific finding — including the translated premises, claims, and the contradicting or supporting rules.

1. An instruction to rewrite the response so that it is consistent with the policy rules.

**Example rewriting prompt template:**

```
The following response was checked against our policy and found to be
{finding_type}.

Original response:
{original_response}

The validation found the following issue:
- Premises (what was understood from the input): {premises}
- Claims (what was asserted): {claims}
- Contradicting rules: {contradicting_rules}

Please rewrite the response so that it is consistent with the policy document. 
Keep the same helpful tone and answer the user's question
accurately based on the rules. If you cannot provide an accurate answer
without more information, explain what additional information is needed.
```
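Filling the template from a finding might look like the following sketch. The template string mirrors the one above; the helper and its parameters are illustrative, not part of the `ApplyGuardrail` API:

```python
import json

REWRITE_TEMPLATE = (
    "The following response was checked against our policy and found to be "
    "{finding_type}.\n\n"
    "Original response:\n{original_response}\n\n"
    "The validation found the following issue:\n"
    "- Premises (what was understood from the input): {premises}\n"
    "- Claims (what was asserted): {claims}\n"
    "- Contradicting rules: {contradicting_rules}\n\n"
    "Please rewrite the response so that it is consistent with the policy "
    "document. Keep the same helpful tone and answer the user's question "
    "accurately based on the rules."
)


def build_rewriting_prompt(finding_type, finding_data, original_response):
    """Fill the rewriting template from an 'invalid' finding."""
    translation = finding_data.get("translation", {})
    return REWRITE_TEMPLATE.format(
        finding_type=finding_type,
        original_response=original_response,
        premises=json.dumps(translation.get("premises", []), default=str),
        claims=json.dumps(translation.get("claims", []), default=str),
        contradicting_rules=json.dumps(
            finding_data.get("contradictingRules", []), default=str
        ),
    )
```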

**Tip**  
Always include the Retrieval Augmented Generation (RAG) content or the policy rules in your rewriting requests so the LLM has all the context it needs when rewriting. The rewriting prompt template provides the specific finding details, while the system prompt provides the broader policy context. This dual-context approach is demonstrated in the [open-source rewriting chatbot sample](https://github.com/aws-samples/amazon-bedrock-samples/tree/main/responsible_ai/automated-reasoning-rewriting-chatbot).

### Rewriting best practices

+ **Set a maximum iteration count.** The rewriting loop should have a hard limit (typically 2–5 iterations) to prevent infinite loops. If the response is still not `valid` after the maximum iterations, return the best response with a warning or fall back to a default message.
+ **Process findings in priority order.** When multiple findings are returned, address the most severe finding first. The severity order is: `translationAmbiguous`, `impossible`, `invalid`, `satisfiable`, `valid`.
+ **Include policy context in the system prompt.** The LLM needs access to either the source document or the full policy rules to rewrite accurately. You can use an [Amazon Bedrock Knowledge Base](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html) to include your documents in the generation request, or use the `ExportAutomatedReasoningPolicyVersion` API to retrieve the policy definition and format it for the LLM.
+ **Log each iteration.** Record the original response, the findings, the rewriting prompt, and the rewritten response for each iteration. This audit trail is valuable for debugging and compliance (see [Build an audit trail](#build-audit-trail)).

## Ask clarifying questions


When Automated Reasoning checks return `translationAmbiguous`, `satisfiable`, or `impossible` results, the LLM may not have enough information to rewrite the response accurately. In these cases, your application can ask the user for clarification, then incorporate the answers into the next validation attempt.

### When to ask for clarification

+ **`translationAmbiguous`** — The input has multiple valid interpretations. The `options` field shows the competing interpretations, and the `differenceScenarios` field shows how they differ in practice. Use these to generate targeted questions about the specific ambiguity.
+ **`satisfiable`** — The response is correct under some conditions but not all. The `claimsFalseScenario` shows the conditions under which the response would be incorrect. Ask the user about those specific conditions.
+ **`impossible`** — The input contains contradictory statements. Ask the user to clarify the contradiction.
+ **Rewriting fails** — If the LLM cannot rewrite the response to be `valid` after multiple attempts, it may need additional context from the user. Ask the LLM to generate clarifying questions based on the findings.

### Clarification pattern


The clarification flow works as follows:

1. Extract the ambiguous variables or missing conditions from the AR findings.

1. Generate clarifying questions — either programmatically from the finding fields, or by asking the LLM to formulate questions based on the findings.

1. Present the questions to the user and collect answers.

1. Incorporate the answers into the context and generate a new response.

1. Validate the new response with `ApplyGuardrail`.

**Example: Generate clarifying questions from a `satisfiable` finding**

```
def generate_clarifying_questions(finding_data, user_question):
    """Ask the LLM to generate clarifying questions from a SATISFIABLE finding."""
    claims_true = json.dumps(
        finding_data.get("claimsTrueScenario", {}), indent=2, default=str
    )
    claims_false = json.dumps(
        finding_data.get("claimsFalseScenario", {}), indent=2, default=str
    )

    prompt = (
        f"A user asked: {user_question}\n\n"
        f"The answer is correct when these conditions hold:\n{claims_true}\n\n"
        f"But incorrect when these conditions hold:\n{claims_false}\n\n"
        f"Generate 1-3 short, specific questions to ask the user to determine "
        f"which conditions apply to their situation. Format each question on "
        f"its own line."
    )

    return generate_response(prompt, "You are a helpful assistant.")
```

## Build an audit trail


Automated Reasoning findings provide mathematically verifiable proof of validity. For regulated industries and compliance scenarios, this proof is a key differentiator — you can demonstrate that an AI response was verified against specific policy rules with specific variable assignments, not just pattern-matched or probabilistically assessed.

To build an effective audit trail, log the following information for each validation request:
+ **Timestamp and request ID.** When the validation occurred and a unique identifier for the request.
+ **Input content.** The user question and LLM response that were validated.
+ **Finding type and details.** The validation result (`valid`, `invalid`, etc.), the translated premises and claims, and the supporting or contradicting rules.
+ **Action taken.** What your application did with the finding — served the response, rewrote it, asked for clarification, or blocked it.
+ **Rewriting history.** If the response was rewritten, log each iteration: the original response, the rewriting prompt, the rewritten response, and the validation result for each iteration.
+ **Policy version.** The guardrail version and policy version used for validation. This ensures you can reproduce the validation result later.

**Example: Audit log entry structure**

```
{
  "timestamp": "2025-07-21T14:30:00Z",
  "request_id": "req-abc123",
  "guardrail_id": "your-guardrail-id",
  "guardrail_version": "1",
  "user_question": "Am I eligible for parental leave?",
  "llm_response": "Yes, you are eligible for parental leave.",
  "validation_result": "valid",
  "findings": [
    {
      "type": "valid",
      "premises": "isFullTime = true, tenureMonths = 24",
      "claims": "eligibleForParentalLeave = true",
      "supporting_rules": ["A1B2C3D4E5F6"]
    }
  ],
  "action_taken": "served_response",
  "rewrite_iterations": 0
}
```
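An entry like the one above can be produced with a small helper. The field names mirror the example and are application-defined, not part of any AWS API; this sketch appends JSON Lines to a local file:

```python
import json
import time
import uuid


def write_audit_entry(path, guardrail_id, guardrail_version, user_question,
                      llm_response, result, findings, action_taken,
                      rewrite_iterations=0):
    """Append one audit entry (JSON Lines) for a validation event."""
    entry = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "request_id": f"req-{uuid.uuid4().hex[:8]}",
        "guardrail_id": guardrail_id,
        "guardrail_version": guardrail_version,
        "user_question": user_question,
        "llm_response": llm_response,
        "validation_result": result,
        "findings": findings,
        "action_taken": action_taken,
        "rewrite_iterations": rewrite_iterations,
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry, default=str) + "\n")
    return entry
```

In production you would ship these entries to your durable log store instead of a local file.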

**Tip**  
Store audit logs in a durable, tamper-evident store such as Amazon CloudWatch Logs or Amazon S3 with Object Lock enabled. For compliance scenarios, consider a centralized, queryable log store so you can analyze audit logs across your organization.