

# Managing standard operating procedures
<a name="sops"></a>

A standard operating procedure (SOP) is a prescriptive set of steps designed to efficiently recover your application in the event of an outage or alarm. Prepare, test, and measure your SOPs in advance to ensure timely recovery in the event of an operational outage. 

Based on your Application Components, AWS Resilience Hub recommends the SOPs you should prepare. AWS Resilience Hub works with Systems Manager to automate the steps of your SOPs by providing a number of SSM documents you can use as the basis for those SOPs.

For example, AWS Resilience Hub may recommend an SOP for adding disk space based on an existing SSM Automation document. To run this SSM document, you require a specific IAM role with the correct permissions. AWS Resilience Hub creates metadata in your application indicating which SSM automation document to run in the case of disk shortage, and which IAM role is required to run that SSM document. This metadata is then saved in an SSM parameter.

In addition to configuring the SSM automation, it is a best practice to test it with an AWS FIS experiment. AWS Resilience Hub therefore also provides an AWS FIS experiment that calls the SSM automation document. This way, you can proactively test your application to make sure that the SOP you've created does its intended job.

AWS Resilience Hub provides its recommendations in the form of an AWS CloudFormation template that you can add to your application code base. This template provides:
+ The IAM role with the permissions required to run the SOP.
+ An AWS FIS experiment you can use to test the SOP.
+ An SSM parameter that contains application metadata indicating which SSM document to run as the SOP, which IAM role to run it with, and on which resource. For example: `$(DocumentName) for SOP $(HandleCrisisA) on $(ResourceA)`.
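
To make the three pieces concrete, the following Python sketch assembles a template that contains one resource of each kind. It is an illustration only, not the template that AWS Resilience Hub generates: the logical IDs, the `FisRoleArn` template parameter, the parameter value format, and the function name `build_sop_template` are all assumptions made for this example, and a real SOP role also needs permissions that match the SOP's actions.

```python
import json


def build_sop_template(document_name: str, parameter_name: str) -> dict:
    """Return an illustrative CloudFormation template with the three SOP pieces."""
    # ARN of an SSM document owned by the account that deploys the stack.
    document_arn = {
        "Fn::Sub": "arn:${AWS::Partition}:ssm:${AWS::Region}:${AWS::AccountId}:document/"
        + document_name
    }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        # Hypothetical input: an existing role that AWS FIS can assume.
        "Parameters": {"FisRoleArn": {"Type": "String"}},
        "Resources": {
            # 1. The IAM role that Systems Manager assumes to run the SOP.
            #    Permission policies are omitted; they depend on the SOP.
            "SopRole": {
                "Type": "AWS::IAM::Role",
                "Properties": {
                    "AssumeRolePolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [{
                            "Effect": "Allow",
                            "Principal": {"Service": "ssm.amazonaws.com"},
                            "Action": "sts:AssumeRole",
                        }],
                    }
                },
            },
            # 2. An AWS FIS experiment that starts the SSM automation document.
            "SopExperiment": {
                "Type": "AWS::FIS::ExperimentTemplate",
                "Properties": {
                    "Description": "Test the SOP " + document_name,
                    "RoleArn": {"Ref": "FisRoleArn"},
                    "StopConditions": [{"Source": "none"}],
                    "Targets": {},
                    "Actions": {
                        "runSop": {
                            "ActionId": "aws:ssm:start-automation-execution",
                            "Parameters": {
                                "documentArn": document_arn,
                                "maxDuration": "PT10M",
                            },
                        }
                    },
                    "Tags": {"Name": "sop-test-" + document_name},
                },
            },
            # 3. The SSM parameter that records which document and role to use.
            #    The value format here is made up for the example.
            "SopParameter": {
                "Type": "AWS::SSM::Parameter",
                "Properties": {
                    "Name": parameter_name,
                    "Type": "String",
                    "Value": {
                        "Fn::Sub": "document=" + document_name + ";role=${SopRole.Arn}"
                    },
                },
            },
        },
    }
```

Calling `json.dumps(build_sop_template("MyApp-AddDiskSpace", "/myapp/sop/disk-space"), indent=2)` produces JSON that you can compare against the generated template to see how the role, the experiment, and the parameter relate to one another.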

Creating an SOP may require some trial and error. Running a resiliency assessment against your application and generating an AWS CloudFormation template from the AWS Resilience Hub recommendations is a good start. Use the template to create an AWS CloudFormation stack, then use the SSM parameters and their default values in your SOP. Run the SOP and see what refinements you need to make.

Because all applications have differing requirements, the default list of SSM documents that AWS Resilience Hub provides will not be sufficient for all of your needs. You can, however, copy the default SSM documents and use them as a basis to create your own custom documents tailored for your application. You can also create your own entirely new SSM documents. If you create your own SSM documents instead of modifying the defaults, you must associate them with SSM parameters, so the correct SSM document is called when the SOP runs. 

When you've finalized your SOP by creating the necessary SSM documents and updating the parameter and document associations as necessary, add the SSM documents directly to your code base, and make any subsequent changes or customizations there. That way, every time you deploy your application, you'll also deploy the most up-to-date SOP.

**Topics**
+ [Building an SOP based on AWS Resilience Hub recommendations](building-sops.md)
+ [Creating a custom SSM document](create-custom-ssm-doc.md)
+ [Using a custom SSM document instead of the default](using-different-ssm-doc.md)
+ [Testing SOPs](testing-sops.md)
+ [Viewing standard operating procedures](view-sops.md)

# Building an SOP based on AWS Resilience Hub recommendations
<a name="building-sops"></a>

To build an SOP based on AWS Resilience Hub recommendations, you need an AWS Resilience Hub application with a resiliency policy attached to it, and you need to have run a resiliency assessment against that application. The resiliency assessment generates the recommendations for your SOP.

You then create an AWS CloudFormation template for the recommended SOPs and include it in your code base.

**To create an AWS CloudFormation template for the SOP recommendations**

1. Open the AWS Resilience Hub console.

1. In the navigation pane, choose **Applications**.

1. From the list of applications, choose the application you want to create an SOP for.

1. Choose the **Assessments** tab.

1. Select an assessment from the **Resiliency assessments** table. If you don't have an assessment, complete the procedure in [Running resiliency assessments in AWS Resilience Hub](run-assessment.md) and then return to this step.

1. Under **Operational recommendations**, choose **Standard operating procedures**.

1. Select all the SOP recommendations you want to include.

1. Choose **Create CloudFormation template**. Creating the AWS CloudFormation template can take a few minutes.

   Complete the following procedure to include the SOP recommendations in your code base.

**To include the AWS Resilience Hub recommendations in your code base**

1. In **Operational recommendations**, choose **Templates**.

1. In the list of templates, choose the name of the SOP template you just created.

   You can identify the SOPs that are implemented in your application using the following information:
   + **SOP name** – Name of the SOP that you have defined for your application.
   + **Description** – Describes the objective of the SOP.
   + **SSM document** – Amazon S3 URL of the SSM document that contains the SOP definition.
   + **Test run** – Amazon S3 URL of the document that contains the results of the latest test.
   + **Source template** – Provides the Amazon Resource Name (ARN) of the AWS CloudFormation stack that contains the SOP details.

1. Under **Template details**, choose the link in **Templates S3 Path** to open the template object in the Amazon S3 console.

1. In the Amazon S3 console, from the **Objects** table, choose the SOP folder link.

1. To copy the Amazon S3 path, select the check box in front of the JSON file and choose **Copy URL**.

1. Create an AWS CloudFormation stack from the AWS CloudFormation console. For more information about creating an AWS CloudFormation stack, see [https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html). 

   When you create the AWS CloudFormation stack, provide the Amazon S3 path that you copied in the previous step.
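
If you prefer to script the last step rather than use the console, the following Python sketch shows one possible approach. It assumes a boto3 CloudFormation client is passed in, and that the stack needs IAM capabilities acknowledged because the SOP template defines an IAM role. The function names and the stack name in the usage note are hypothetical.

```python
def build_create_stack_request(stack_name: str, template_url: str) -> dict:
    """Build keyword arguments for the CloudFormation CreateStack API."""
    if not template_url.startswith("https://"):
        raise ValueError("expected the HTTPS object URL copied from the Amazon S3 console")
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        # Assumption: the template creates IAM roles, so acknowledge IAM capabilities.
        "Capabilities": ["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
    }


def create_sop_stack(cfn_client, stack_name: str, template_url: str) -> str:
    """Create the stack with the given client and return the new stack ID."""
    request = build_create_stack_request(stack_name, template_url)
    response = cfn_client.create_stack(**request)
    return response["StackId"]
```

For example, with boto3 installed and credentials configured, `create_sop_stack(boto3.client("cloudformation"), "my-app-sops", copied_url)` starts stack creation, where `copied_url` is the URL you copied from the Amazon S3 console.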

# Creating a custom SSM document
<a name="create-custom-ssm-doc"></a>

To fully automate the recovery of your application, you may need to create a custom SSM document for your SOP in the Systems Manager console. You can use an existing SSM document as a base and modify it, or you can create a new SSM document.

For detailed information on using Systems Manager to create an SSM document, see [Walkthrough: Using Document Builder to create a custom runbook](https://docs.aws.amazon.com/systems-manager/latest/userguide/automation-walk-document-builder.html).

For information about SSM document syntax, see [SSM document syntax](https://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-doc-syntax.html).

For information about automating SSM document actions, see [Systems Manager automation actions reference](https://docs.aws.amazon.com/systems-manager/latest/userguide/automation-actions.html).
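
As a rough illustration of what a custom document can look like, the following Python sketch builds a minimal Automation runbook (schema version `0.3`) with a single `aws:executeAwsApi` step and registers it through a boto3 Systems Manager client. The recovery step (rebooting one Amazon EC2 instance), the parameter names, and the function names are assumptions chosen for the example, not steps that AWS Resilience Hub prescribes.

```python
import json


def build_runbook(description: str) -> str:
    """Return the JSON content of a minimal SSM Automation document."""
    document = {
        "schemaVersion": "0.3",
        "description": description,
        # The role that Automation assumes while it runs the steps.
        "assumeRole": "{{ AutomationAssumeRole }}",
        "parameters": {
            "AutomationAssumeRole": {
                "type": "String",
                "description": "ARN of the IAM role that runs this SOP.",
            },
            "InstanceId": {
                "type": "String",
                "description": "ID of the instance to recover.",
            },
        },
        "mainSteps": [
            {
                # Hypothetical recovery step: reboot the affected instance.
                "name": "RebootInstance",
                "action": "aws:executeAwsApi",
                "inputs": {
                    "Service": "ec2",
                    "Api": "RebootInstances",
                    "InstanceIds": ["{{ InstanceId }}"],
                },
            }
        ],
    }
    return json.dumps(document, indent=2)


def register_runbook(ssm_client, name: str, content: str) -> None:
    """Create the Automation document in Systems Manager."""
    ssm_client.create_document(
        Name=name,
        Content=content,
        DocumentType="Automation",
        DocumentFormat="JSON",
    )
```

Keeping the document content in a function like this makes it easy to store the runbook in your code base and review changes to it alongside the application.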

# Using a custom SSM document instead of the default
<a name="using-different-ssm-doc"></a>

To replace the SSM document AWS Resilience Hub suggested for your SOP with a custom document you've created, work directly in your code base. In addition to adding your new custom SSM automation document, you'll also:

1. Add the IAM permissions required to run the automation.

1. Add an AWS FIS experiment to test your SSM document.

1. Add an SSM parameter that points to the automation document you want to use as the SOP.

Generally, it's most efficient to work with the suggested default values in AWS Resilience Hub and customize them as necessary. For example, add or remove permissions as necessary for the IAM role, change the AWS FIS experiment setup to point to the new SSM document, or change the SSM parameter to point to your new SSM document. 

# Testing SOPs
<a name="testing-sops"></a>

As previously mentioned, it is a best practice to add AWS FIS experiments to your CI/CD pipelines so that your SOPs are tested regularly. Regular testing helps ensure that the SOPs are ready to use if an outage occurs.

Test both AWS Resilience Hub-provided and custom SOPs.
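
One way to wire such a test into a pipeline is to start the experiment and wait for it to finish, failing the pipeline stage if the experiment does not complete. The following Python sketch assumes a boto3 AWS FIS client is passed in. The function name, the polling interval, and the timeout are illustrative choices.

```python
import time
import uuid

# Experiment states that mean the run has not finished yet.
IN_PROGRESS = {"pending", "initiating", "running", "stopping"}


def run_sop_experiment(fis_client, template_id, poll_seconds=15.0,
                       timeout_seconds=1800.0, sleep=time.sleep):
    """Start an AWS FIS experiment and return True only if it completes."""
    started = fis_client.start_experiment(
        clientToken=str(uuid.uuid4()),  # idempotency token for the request
        experimentTemplateId=template_id,
    )
    experiment_id = started["experiment"]["id"]

    waited = 0.0
    while waited <= timeout_seconds:
        state = fis_client.get_experiment(id=experiment_id)["experiment"]["state"]
        if state["status"] not in IN_PROGRESS:
            # Any finished state other than "completed" counts as a failed test.
            return state["status"] == "completed"
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f"experiment {experiment_id} did not finish in time")
```

In a pipeline step, you could call `run_sop_experiment(boto3.client("fis"), template_id)` and exit with a non-zero status when it returns `False`, so that a broken SOP blocks the deployment.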

# Viewing standard operating procedures
<a name="view-sops"></a>

**To view the implemented SOPs from applications**

1. In the left navigation menu, choose **Applications**.

1. In **Applications**, open an application.

1. Choose the **Standard operating procedures** tab.

   In the **Standard operating procedures summary** section, the **Implemented standard operating procedures** table displays the list of SOPs that are generated from SOP recommendations.

   You can identify your SOPs by the following:
   + **SOP name** – Name of the SOP that you have defined for your application.
   + **SSM document** – Amazon S3 URL of the SSM document that contains the SOP definition.
   + **Description** – Describes the objective of the SOP.
   + **Test run** – Amazon S3 URL of the document that contains the results of the latest test.
   + **Reference ID** – Identifier of the referenced SOP recommendation.
   + **Resource ID** – Identifier of the resource for which the SOP recommendation is implemented.

**To view the recommended SOPs from assessments**

1. In the left navigation menu, choose **Applications**.

1. Select an application from the **Applications** table. 

   To find an application, enter the application name in the **Find applications** box.

1. Choose the **Assessments** tab.

   In the **Resiliency assessments** table, you can identify your assessments using the following information:
   + **Name** – Name that you provided for the assessment when you created it.
   + **Status** – Indicates the execution state of the assessment.
   + **Compliance status** – Indicates if the assessment is compliant with the resiliency policy.
   + **Resiliency drift status** – Indicates whether your application has drifted from the previous successful assessment.
   + **App version** – Version of your application.
   + **Invoker** – Indicates the role that invoked the assessment.
   + **Start time** – Indicates the start time of the assessment.
   + **End time** – Indicates the end time of the assessment.
   + **ARN** – The Amazon Resource Name (ARN) of the assessment.

1. Select an assessment from the **Resiliency assessments** table.

1. Choose the **Operational recommendations** tab.

1. Choose the **Standard operating procedures** tab.

   In the **Standard operating procedures** table, you can understand more about the recommended SOPs using the following information:
   + **Name** – Name of the recommended SOP.
   + **Description** – Describes the objective of the SOP.
   + **State** – Indicates the current implementation state of the SOP: **Implemented**, **Not implemented**, or **Excluded**.
   + **Configuration** – Indicates whether there are any pending configuration dependencies that need to be addressed.
   + **Type** – Indicates the type of SOP.
   + **AppComponent** – Indicates the Application Components (AppComponents) that are associated with this SOP. For more information about supported AppComponents, see [Grouping resources in an AppComponent](https://docs.aws.amazon.com/resilience-hub/latest/userguide/AppComponent.grouping.html?icmpid=docs_resiliencehub_help_panel_operational_recommendations_alarms).
   + **Reference ID** – Indicates the logical identifier of the AWS CloudFormation stack event in AWS CloudFormation.
   + **Recommendation ID** – Indicates the logical identifier of the AWS CloudFormation stack resource in AWS CloudFormation.