

# Running and managing resiliency assessments in AWS Resilience Hub

When your application changes, run a resiliency assessment. The assessment compares the configuration of each Application Component (AppComponent) against your resiliency policy and makes alarm, standard operating procedure (SOP), and test recommendations. These recommendations can help speed up your recovery procedures.

Alarm recommendations help you set up alarms that detect outages. SOP recommendations provide scripts that manage common recovery processes, such as recovery from a backup. Test recommendations suggest ways to verify that your configurations work properly. For example, you can test whether your application recovers through automatic recovery processes, such as automatic scaling or load balancing, when network issues occur. You can test whether application alarms are triggered when resources reach their limits. You can also test how well SOPs work under the conditions that you specify.

**Topics**
+ [Running resiliency assessments in AWS Resilience Hub](run-assessment.md)
+ [Reviewing assessment reports](review-assessment.md)
+ [Deleting resiliency assessments](delete-assessment.md)

# Running resiliency assessments in AWS Resilience Hub


You can run a resiliency assessment from the **Actions** menu or from the **Assessments** tab of your application. For more information about applications, see [Describing and managing AWS Resilience Hub Applications](applications.md).

**To run a resiliency assessment from the Actions menu**

1. In the left navigation menu, choose **Applications**.

1. Choose an application from the **Applications** table.

1. From the **Actions** menu, choose **Assess resiliency**.

1. In the **Run resiliency assessment** dialog, enter a unique name for the assessment or use the generated name.

1. Choose **Run**.

   To review the assessment report, choose the **Assessments** tab of your application. For more information, see [Reviewing assessment reports](review-assessment.md).

**To run a resiliency assessment from the Assessments tab**

You can run a new resiliency assessment when your application or resiliency policy changes.

1. In the left navigation menu, choose **Applications**.

1. Choose an application from the **Applications** table. 

1. Choose the **Assessments** tab.

1. Choose **Run resiliency assessment**.

1. In the **Run resiliency assessment** dialog, enter a unique name for the assessment or use the generated name.

1. Choose **Run**.

   To review the assessment report, choose the **Assessments** tab of your application. For more information, see [Reviewing assessment reports](review-assessment.md).
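If you prefer to start assessments from a script, the following minimal sketch does the equivalent of the procedures above with the AWS SDK for Python (Boto3). It assumes a `resiliencehub` client that exposes `start_app_assessment` and `describe_app_assessment`, that `release` is the published application version to assess, and that `Success` and `Failed` are the terminal assessment states. The helper name `run_assessment` and the polling values are illustrative choices, not part of AWS Resilience Hub. The client is passed in as a parameter so that a stub can replace it in tests.

```python
import time
import uuid

# Terminal states assumed from the Resilience Hub API's assessmentStatus values.
TERMINAL_STATES = ("Success", "Failed")


def run_assessment(client, app_arn, assessment_name,
                   poll_seconds=30, timeout_seconds=3600):
    """Start a resiliency assessment and block until it reaches a terminal state.

    `client` is assumed to be boto3.client("resiliencehub") or a compatible stub.
    Returns the final assessment description (a dict).
    """
    response = client.start_app_assessment(
        appArn=app_arn,
        appVersion="release",           # assumption: assess the published version
        assessmentName=assessment_name,
        clientToken=str(uuid.uuid4()),  # unique token that makes the request idempotent
    )
    assessment_arn = response["assessment"]["assessmentArn"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        assessment = client.describe_app_assessment(
            assessmentArn=assessment_arn)["assessment"]
        status = assessment["assessmentStatus"]
        if status in TERMINAL_STATES:
            return assessment
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{assessment_arn} is still {status}")
        time.sleep(poll_seconds)  # wait before checking the status again
```

For example, `run_assessment(boto3.client("resiliencehub"), app_arn, "nightly-assessment")` returns the final assessment description, and its `assessmentStatus` tells you whether the run succeeded.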

# Reviewing assessment reports


You can find assessment reports on the **Assessments** tab of your application.

**To find an assessment report**

1. In the left navigation menu, choose **Applications**.

1. In **Applications**, open an application.

1. On the **Assessments** tab, choose an assessment report from the **Resiliency assessments** section.

When you open the report, you see the following:
+ An overview of the assessment report
+ Recommendations to improve resiliency
+ Recommendations to set up alarms, SOPs, and tests
+ Options to create and manage tags that you can use to search and filter your AWS resources
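To find assessment reports programmatically, you can page through the assessments of an application. This sketch assumes the Boto3 `resiliencehub` client's `list_app_assessments` operation, which returns an `assessmentSummaries` list and a `nextToken` when more results exist. The helper name `list_assessments` and its client-side `status` filter are hypothetical.

```python
def list_assessments(client, app_arn, status=None):
    """Collect assessment summaries for an application across all result pages.

    `client` is assumed to be boto3.client("resiliencehub") or a compatible stub.
    If `status` is given (for example "Success"), only summaries whose
    assessmentStatus matches are returned; this filter runs client-side.
    """
    summaries = []
    kwargs = {"appArn": app_arn}
    while True:
        page = client.list_app_assessments(**kwargs)
        summaries.extend(page.get("assessmentSummaries", []))
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token  # request the next page of results
    if status is not None:
        summaries = [s for s in summaries if s.get("assessmentStatus") == status]
    return summaries
```

Each summary is expected to include the `assessmentArn` that you need to describe or delete a specific report.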

## Assessment report


This section provides an overview of the assessment report. AWS Resilience Hub lists each disruption type and the associated Application Components. It also compares the estimated RTO and RPO of each Application Component with the targets in your resiliency policy and determines whether the Application Component can achieve them.

**Overview**

Shows the name of the application, the name of the resiliency policy, and the creation date of the report.

**Detected resource drifts**

This section lists the resources that were added to or removed from your input sources after the latest version of the application was published. Choose **Reimport input sources** to reimport all input sources, including the ones that contain drifted resources, on the **Input sources** tab. Choose **Publish and assess** to include the updated resources in the application and receive an accurate resiliency assessment.

You can identify drifted input sources using the following information:
+ **Logical ID** – Indicates the logical ID of the resource. A logical ID is a name used to identify resources in your AWS CloudFormation stack, Terraform state file, myApplications application, or AWS Resource Groups.
+ **Change** – Indicates if an input resource was **Added** or **Removed**.
+ **Source name** – Indicates the resource name. Choose a source name to view its details in the service that it was imported from. For example, if you choose a source name that was imported from an AWS CloudFormation stack, you are redirected to the stack details page on the AWS CloudFormation console. The link is not available for manually added input sources.
+ **Resource type** – Indicates the resource type.
+ **Account** – Indicates the AWS account that owns the physical resource.
+ **Region** – Indicates the AWS Region where the resource is located.

**RTO**

Shows a graphical representation of whether the application is estimated to meet the recovery time objective (RTO) in your resiliency policy. RTO is the amount of time that an application can be down without causing significant damage to the organization. The assessment provides an estimated workload RTO.

**RPO**

Shows a graphical representation of whether the application is estimated to meet the recovery point objective (RPO) in your resiliency policy. RPO is the amount of time over which data can be lost before significant harm to the business occurs. The assessment provides an estimated workload RPO.

**Details**

Provides detailed descriptions of each disruption type on the **All results** and **Application compliance drifts** tabs. The **All results** tab shows all disruptions, including compliance drifts, and the **Application compliance drifts** tab shows only compliance drifts. The disruption types are **Application**, cloud infrastructure (**Infrastructure** and **Availability Zone**), and **Region**. For each disruption type, the following information is provided:
+ **AppComponent**

  The components that make up the application, each of which groups related resources. For example, your application might have a database or compute component.
+ **Estimated RTO**

  Indicates whether your application's configuration can meet the RTO target in your resiliency policy. We provide two values, our **Estimated RTO** and your **Targeted RTO**. For example, if you see **2h** under **Targeted RTO** and **40m** under **Estimated Workload RTO**, it indicates that we estimate a workload RTO of 40 minutes, while the RTO target in your resiliency policy is two hours. We base our estimated workload RTO calculation on the configuration, not the policy. As a result, a multi-Availability Zone database has the same estimated workload RTO for an Availability Zone failure, no matter which policy you select.
+ **RTO drift**

  Indicates the duration by which your application has drifted from the estimated workload RTO of the previous successful assessment. We provide two values, our **Estimated RTO** and **RTO drift**. For example, if you see **2h** value under **Estimated RTO** and **40m** under **RTO drift**, it indicates that your application drifts from the estimated workload RTO of the previous successful assessment by 40 minutes.
+ **Estimated RPO**

  Shows the **Estimated Workload RPO** that AWS Resilience Hub calculates for each Application Component, compared with the **Targeted RPO** that you set in your resiliency policy. For example, you might set the RPO target for Availability Zone failures in your resiliency policy to one hour. For an Amazon Aurora database, the estimated RPO might be close to zero, because Aurora commits each transaction only after it succeeds on four out of six storage nodes that span multiple Availability Zones. For point-in-time restore, the estimate might be five minutes.

  Region is the only disruption type for which you can choose not to set RTO and RPO targets. For some applications, it is useful to plan for recovery when there is a crucial dependency on an AWS service that might become unavailable across an entire Region.

  If you set RTO and RPO targets for the Region, you receive an estimated recovery time and operational recommendations for Regional failures.
+ **RPO drift**

  Indicates the duration by which your application has drifted from the estimated workload RPO of the previous successful assessment. We provide two values, our **Estimated RPO** and **RPO drift**. For example, if you see **2h** value under **Estimated RPO** and **40m** under **RPO drift**, it indicates that your application drifts from the estimated workload RPO of the previous successful assessment by 40 minutes.
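The comparisons in the **Details** table reduce to simple duration arithmetic. The following sketch reproduces the examples above: a 40-minute estimated RTO meets a 2-hour target, and if the previous successful assessment estimated 1h20m, a current estimate of 2h is a 40-minute drift. The duration strings (`2h`, `40m`, `1h20m`) are modeled on the values shown in the console; the parser, the function names, and the use of a signed difference for drift are assumptions for illustration, because AWS Resilience Hub computes these values itself.

```python
import re

# Seconds per unit for console-style durations such as "2h" or "40m" (assumed format).
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(value):
    """Convert a duration such as '2h', '40m', or '1h20m' to seconds."""
    compact = value.replace(" ", "")
    parts = re.findall(r"(\d+)([smhd])", compact)
    # Reject strings with leftover characters, such as '2x' or '40 minutes'.
    if not parts or "".join(n + u for n, u in parts) != compact:
        raise ValueError(f"unrecognized duration: {value!r}")
    return sum(int(n) * _UNITS[u] for n, u in parts)


def meets_target(estimated, target):
    """True when an estimated RTO or RPO is within the policy target."""
    return to_seconds(estimated) <= to_seconds(target)


def drift_seconds(current_estimate, previous_estimate):
    """Signed change from the previous successful assessment's estimate."""
    return to_seconds(current_estimate) - to_seconds(previous_estimate)
```

With these helpers, `meets_target("40m", "2h")` is `True` and `drift_seconds("2h", "1h20m")` is `2400` seconds, which is 40 minutes.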

# Reviewing resiliency recommendations


Resiliency recommendations evaluate Application Components and recommend how to optimize them for estimated workload RTO and RPO, cost, or minimal changes.

With AWS Resilience Hub, you can optimize resiliency by choosing one of the following recommended options, which are explained under **Why you should choose this option**:

**Note**  
AWS Resilience Hub provides up to three recommended options.
If you set Regional RTO and RPO targets, AWS Resilience Hub displays **Optimize for Region RTO/RPO** in the recommended options. If Regional RTO and RPO targets are not set, **Optimize for Availability Zone (AZ) RTO/RPO** is displayed. For more information about setting Regional RTO/RPO targets while creating resiliency policies, see [Creating resiliency policies](create-policy.md).
Estimated workload RTO and estimated workload RPO values for the applications and their configurations are determined by considering the amount of data and individual AppComponents. However, these values are only estimates. Test your application, for example with AWS Fault Injection Service (AWS FIS), to determine its actual recovery times.

**Optimize for Availability Zone RTO/RPO**

The lowest possible estimated workload recovery times (RTO/RPO) during an Availability Zone (AZ) disruption. If your configuration can't be changed sufficiently to meet the RTO and RPO targets, you're informed about the lowest estimated workload AZ recovery times that bring your configuration as close as possible to meeting the policy.

**Optimize for Region RTO/RPO**

The lowest possible estimated workload recovery times (RTO/RPO) during a Regional disruption. If your configuration can't be changed sufficiently to meet the RTO and RPO targets, you're informed about the lowest estimated workload Region recovery times that bring your configuration as close as possible to meeting the policy.

**Optimize for cost**

The lowest cost that you can incur and still meet your resiliency policy. If your configuration can't be changed sufficiently to meet the optimization goals, you're informed about the lowest cost that brings your configuration as close as possible to meeting the policy.

**Optimize for minimal changes**

The minimum changes required to achieve your policy targets. If your configuration can't be changed sufficiently to meet the optimization goals, you're informed about the changes that bring your configuration as close as possible to meeting the policy.

The following items are included in the optimization category breakdowns:
+ **Description**

  Describes the configurations suggested by AWS Resilience Hub.
+ **Changes**

  A list of the tasks required to switch to the suggested configuration.
+ **Base cost**

  The estimated cost associated with the recommended changes.

  **Note**  
  **Base cost** varies with usage and does not include any discounts or offers from the Enterprise Discount Program (EDP).
+ **Estimated Workload RTO and RPO**

  The estimated workload RTO and estimated workload RPO after changes.
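If you retrieve resiliency recommendations through the API instead of the console, you can map each option back to the categories above. This sketch assumes a response shaped like the output of the Boto3 `resiliencehub` client's `list_app_component_recommendations` operation, where each entry in `componentRecommendations` has an `appComponentName` and a list of `configRecommendations` with `optimizationType`, an optional `cost`, and `suggestedChanges` fields. The mapping from `optimizationType` values to console option names is an assumption; verify it against the API reference before relying on it.

```python
# Assumed mapping from API optimizationType values to console option names.
OPTION_LABELS = {
    "BestAZRecovery": "Optimize for Availability Zone RTO/RPO",
    "BestRegionRecovery": "Optimize for Region RTO/RPO",
    "LeastCost": "Optimize for cost",
    "LeastChange": "Optimize for minimal changes",
}


def summarize_options(response):
    """Return (AppComponent, option name, base cost, changes) tuples from a response dict."""
    rows = []
    for component in response.get("componentRecommendations", []):
        for option in component.get("configRecommendations", []):
            opt_type = option.get("optimizationType", "")
            # Fall back to the raw API value for types not in the assumed mapping.
            label = OPTION_LABELS.get(opt_type, opt_type)
            cost = option.get("cost")
            cost_text = (f"{cost['amount']:.2f} {cost['currency']}/{cost['frequency']}"
                         if cost else "n/a")
            rows.append((component["appComponentName"], label, cost_text,
                         option.get("suggestedChanges", [])))
    return rows
```

Printing these rows gives a compact view of the **Changes** and **Base cost** for each option, which can help when comparing options across many AppComponents.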

AWS Resilience Hub evaluates whether an Application Component (AppComponent) can comply with a resiliency policy. If the AppComponent does not comply with a resiliency policy and AWS Resilience Hub cannot make any recommendations to facilitate compliance, it might be because the recovery time targets cannot be met within the constraints of the AppComponent. Examples of AppComponent constraints include resource type, storage size, and resource configuration.

To facilitate the compliance of the AppComponent with the resiliency policy, change the resource type of the AppComponent or update the resiliency policy to align with what the resource can deliver.

# Reviewing operational recommendations


Operational recommendations help you set up alarms, SOPs, and AWS FIS experiments by using AWS CloudFormation templates.

AWS Resilience Hub provides AWS CloudFormation template files that you can download to manage the application's infrastructure as code. As a result, we supply recommendations in AWS CloudFormation so that you can add them to your application code. If an AWS CloudFormation template file is larger than 1 MB or contains more than 500 resources, AWS Resilience Hub splits it into multiple files, each of which is no larger than 1 MB and contains up to 500 resources. The name of each file is appended with `partXofY`, where `X` is the file's number in the sequence and `Y` is the total number of files. For example, if the template file `big-app-template5-Alarm-104849185070-us-west-2.yaml` is divided into four files, the file names are as follows:
+ `big-app-template5-Alarm-104849185070-us-west-2-part1of4.yaml`
+ `big-app-template5-Alarm-104849185070-us-west-2-part2of4.yaml`
+ `big-app-template5-Alarm-104849185070-us-west-2-part3of4.yaml`
+ `big-app-template5-Alarm-104849185070-us-west-2-part4of4.yaml`
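The naming convention for split templates is mechanical, so you can generate or parse the part names in a deployment script. The helper names `part_file_names` and `part_number` are hypothetical; they only implement the `partXofY` suffix described above and do not decide how AWS Resilience Hub splits a template.

```python
import os
import re

# Matches the "-partXofY" suffix before the file extension.
_PART_SUFFIX = re.compile(r"-part(\d+)of(\d+)(\.[^.]+)?$")


def part_file_names(template_name, total_parts):
    """Return the file names for a template divided into `total_parts` files."""
    if total_parts < 1:
        raise ValueError("total_parts must be at least 1")
    if total_parts == 1:
        return [template_name]  # assumption: an unsplit template keeps its name
    stem, ext = os.path.splitext(template_name)
    return [f"{stem}-part{i}of{total_parts}{ext}" for i in range(1, total_parts + 1)]


def part_number(file_name):
    """Return (X, Y) parsed from a '-partXofY' name, or (1, 1) if it has none."""
    match = _PART_SUFFIX.search(file_name)
    if match is None:
        return (1, 1)
    return (int(match.group(1)), int(match.group(2)))
```

Sorting downloaded files with `sorted(files, key=part_number)` keeps the parts in numeric order, so `part10of12` correctly follows `part9of12`.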

To deploy large AWS CloudFormation templates with the AWS CLI or API, provide an Amazon Simple Storage Service (Amazon S3) URI for the template instead of a local file.

In AWS Resilience Hub, you can perform the following actions:
+ You can provision the selected alarms, SOPs, and AWS FIS experiments. To provision alarms, SOPs, and AWS FIS experiments, select the appropriate recommendation and enter a unique name. AWS Resilience Hub creates a template based on your selected recommendations. In **Templates**, you can access your created templates through an Amazon Simple Storage Service (Amazon S3) URL.
+ You can include or exclude selected alarms, SOPs, and AWS FIS experiments that were recommended for your application at any time. For more information, see [Including or excluding operational recommendations](exclude-recommend.md).
+ You can also search, create, add, remove, and manage tags for an application, and see all the tags associated with it.
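The provisioning action also has an API counterpart. This minimal sketch assumes the Boto3 `resiliencehub` client's `create_recommendation_template` operation, with `CfnYaml` as the template format and `Alarm`, `Sop`, and `Test` as the recommendation types. `recommendation_ids` stands for the IDs of the recommendations you selected, and the wrapper name `provision_recommendations` is hypothetical.

```python
import uuid


def provision_recommendations(client, assessment_arn, name,
                              recommendation_ids=None,
                              types=("Alarm", "Sop", "Test"),
                              template_format="CfnYaml"):
    """Ask Resilience Hub to build a CloudFormation template from recommendations.

    `client` is assumed to be boto3.client("resiliencehub") or a compatible stub.
    Returns the recommendationTemplate description, which includes its status.
    """
    kwargs = {
        "assessmentArn": assessment_arn,
        "name": name,
        "format": template_format,          # assumed values: "CfnYaml" or "CfnJson"
        "recommendationTypes": list(types),
        "clientToken": str(uuid.uuid4()),   # unique token that makes the request idempotent
    }
    if recommendation_ids:
        # Limit the template to the recommendations you selected.
        kwargs["recommendationIds"] = list(recommendation_ids)
    return client.create_recommendation_template(**kwargs)["recommendationTemplate"]
```

The returned description is expected to point to the Amazon S3 location of the generated template once it is ready, matching the URL shown under **Templates** in the console.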

# Including or excluding operational recommendations


AWS Resilience Hub lets you include or exclude, at any time, the alarms, SOPs, and AWS FIS experiments (tests) that were recommended to improve the resiliency score of your application. Including or excluding operational recommendations affects the resiliency score of your application only after you run a new assessment. Therefore, we recommend that you run an assessment to get the updated resiliency score and understand its impact on your application.

For more information about restricting permissions to include or exclude recommendations per application, see [Limiting permissions to include or exclude AWS Resilience Hub recommendations](include-exclude-limit-permissions.md).

**To include or exclude operational recommendations from applications**

1. In the left navigation menu, choose **Applications**.

1. In **Applications**, open an application.

1. Choose **Assessments** and select an assessment from the **Resiliency assessments** table. If you don't have an assessment, complete the procedure in [Running resiliency assessments in AWS Resilience Hub](run-assessment.md) and then return to this step.

1. Choose the **Operational recommendations** tab.

1. To include or exclude operational recommendations from your application, complete the following procedures:

**To include or exclude recommended alarms from your application**

1. To exclude alarms, complete the following steps:

   1. On the **Alarms** tab, in the **Alarms** table, select all the alarms in the **Not implemented** state that you want to exclude. You can identify the current implementation state of an alarm from the **State** column.

   1. From **Actions**, choose **Exclude selected**.

   1. In the **Exclude recommendations** dialog, optionally select one of the following reasons, and then choose **Exclude selected** to exclude the selected alarms from the application.
      + **Already implemented** – Choose this option if you have already implemented these alarms in an AWS service, such as Amazon CloudWatch, or with a third-party service provider.
      + **Not relevant** – Choose this option if the alarms do not suit your business requirements.
      + **Too complicated to implement** – Choose this option if you think these alarms are too complicated to implement.
      + **Other** – Choose this option to specify any other reason for excluding the recommendation.

1. To include alarms, complete the following steps:

   1. On the **Alarms** tab, in the **Alarms** table, select all the alarms in the **Excluded** state that you want to include. You can identify the current implementation state of an alarm from the **State** column.

   1. From **Actions**, choose **Include selected**.

   1. In the **Include recommendations** dialog, choose **Include selected** to include all the selected alarms in your application.

**To include or exclude recommended standard operating procedures (SOPs) from your application**

1. To exclude recommended SOPs, complete the following steps:

   1. On the **Standard operating procedures** tab, in the **SOPs** table, select all the SOPs in the **Implemented** or **Not implemented** state that you want to exclude. You can identify the current implementation state of an SOP from the **State** column.

   1. From **Actions**, choose **Exclude selected**.

   1. In the **Exclude recommendations** dialog, optionally select one of the following reasons, and then choose **Exclude selected** to exclude the selected SOPs from the application.
      + **Already implemented** – Choose this option if you have already implemented these SOPs in an AWS service or with a third-party service provider.
      + **Not relevant** – Choose this option if the SOPs do not suit your business requirements.
      + **Too complicated to implement** – Choose this option if you think these SOPs are too complicated to implement.
      + **None** – Choose this option if you don't want to specify the reason.

1. To include SOPs, complete the following steps:

   1. On the **Standard operating procedures** tab, in the **SOPs** table, select all the SOPs in the **Excluded** state that you want to include. You can identify the current implementation state of an SOP from the **State** column.

   1. From **Actions**, choose **Include selected**.

   1. In the **Include recommendations** dialog, choose **Include selected** to include all the selected SOPs in your application.

**To include or exclude recommended tests from your application**

1. To exclude recommended tests, complete the following steps:

   1. On the **Fault injection experiment templates** tab, in the **Fault injection experiment templates** table, select all the tests in the **Implemented** or **Not implemented** state that you want to exclude. You can identify the current implementation state of a test from the **State** column.

   1. From **Actions**, choose **Exclude selected**.

   1. In the **Exclude recommendations** dialog, optionally select one of the following reasons, and then choose **Exclude selected** to exclude the selected AWS FIS experiments from the application.
      + **Already implemented** – Choose this option if you have already implemented these tests in an AWS service or with a third-party service provider.
      + **Not relevant** – Choose this option if the tests do not suit your business requirements.
      + **Too complicated to implement** – Choose this option if you think these tests are too complicated to implement.
      + **None** – Choose this option if you don't want to specify the reason.

1. To include recommended tests, complete the following steps:

   1. On the **Fault injection experiment templates** tab, in the **Fault injection experiment templates** table, select all the tests in the **Excluded** state that you want to include. You can identify the current implementation state of a test from the **State** column.

   1. From **Actions**, choose **Include selected**.

   1. In the **Include recommendations** dialog, choose **Include selected** to include all the selected tests in your application.
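You can also include or exclude recommendations in bulk. This sketch assumes the Boto3 `resiliencehub` client's `batch_update_recommendation_status` operation, where each request entry carries an `entryId`, the recommendation's `referenceId` and `item`, an `excluded` flag, and an optional `excludeReason`. The mapping from the dialog's reasons to `AlreadyImplemented`, `NotRelevant`, and `ComplexityOfImplementation` is an assumption, and the helper names `build_entries` and `update_recommendations` are hypothetical. As with the console, run a new assessment afterward to see the effect on your resiliency score.

```python
# Assumed mapping from the console's exclusion reasons to API excludeReason values.
# "Other" and "None" have no mapping, so no reason is sent for them.
EXCLUDE_REASONS = {
    "Already implemented": "AlreadyImplemented",
    "Not relevant": "NotRelevant",
    "Too complicated to implement": "ComplexityOfImplementation",
}


def build_entries(recommendations, excluded, reason=None):
    """Build request entries that include (excluded=False) or exclude recommendations.

    Each recommendation is assumed to be a dict holding the `referenceId` and
    `item` values of the corresponding recommendation in the assessment.
    """
    api_reason = EXCLUDE_REASONS.get(reason) if excluded else None
    entries = []
    for index, rec in enumerate(recommendations):
        entry = {
            "entryId": str(index),  # only has to be unique within this request
            "referenceId": rec["referenceId"],
            "item": rec["item"],
            "excluded": excluded,
        }
        if api_reason:
            entry["excludeReason"] = api_reason
        entries.append(entry)
    return entries


def update_recommendations(client, app_arn, recommendations, excluded, reason=None):
    """Send one batch update and return the entries that failed."""
    response = client.batch_update_recommendation_status(
        appArn=app_arn,
        requestEntries=build_entries(recommendations, excluded, reason),
    )
    return response.get("failedEntries", [])
```

An empty list from `update_recommendations` means every entry was accepted; inspect any returned entries before you rerun the assessment.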

# Deleting resiliency assessments


You can delete resiliency assessments on the **Assessments** tab of your application.

**To delete a resiliency assessment**

1. In the left navigation menu, choose **Applications**.

1. In **Applications**, open an application. 

1. In **Assessments**, choose an assessment report in the **Resiliency assessments** table.

1. To confirm the deletion, choose **Delete**.

   The report no longer appears in the **Resiliency assessments** table.
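To clean up several assessments at once, for example the ones that failed, you can combine the listing step with deletion. This sketch assumes the Boto3 `resiliencehub` client's `delete_app_assessment` operation and summaries shaped like the `assessmentSummaries` returned by `list_app_assessments`. The helper name `delete_failed_assessments` is hypothetical; review which assessments match before you run it against a real application.

```python
import uuid


def delete_failed_assessments(client, summaries):
    """Delete every assessment whose status is 'Failed' and return their ARNs.

    `client` is assumed to be boto3.client("resiliencehub") or a compatible stub;
    `summaries` are assumed to look like list_app_assessments' assessmentSummaries.
    """
    deleted = []
    for summary in summaries:
        if summary.get("assessmentStatus") != "Failed":
            continue  # keep successful, pending, and in-progress assessments
        client.delete_app_assessment(
            assessmentArn=summary["assessmentArn"],
            clientToken=str(uuid.uuid4()),  # unique token that makes the request idempotent
        )
        deleted.append(summary["assessmentArn"])
    return deleted
```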