

# Incident management
<a name="ams-adv-manage-incidents"></a>

**Topics**
+ [What is incident management?](what-is-incident-mgmt.md)
+ [Incident management service commitments](incident-serv-commits.md)
+ [Incident management examples](incident-mgmt-examples.md)

Incidents are AWS service performance issues that impact your managed environment, as determined by AWS Managed Services (AMS) or by you. Incidents identified by the AMS team are first received as "events", which are changes in system state captured by monitoring. If a configured threshold is breached, the event triggers an alarm, also called an alert. The AMS operations team determines whether the event is non-impacting, an incident (a service interruption or degradation), or a problem (the underlying root cause of one or more resolved incidents).

The AMS team also receives incidents identified by you through the Support center or programmatically using the [AWS Support API](https://docs.aws.amazon.com/awssupport/latest/user/Welcome.html) with the service code `sentinel-report-incident`.
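
As an illustration of the programmatic path, the following minimal sketch assembles the parameters for the Support API `CreateCase` operation with the AMS service code. The helper name `build_incident_request`, the sample subject and body, the email address, and the category code are hypothetical; confirm valid severity and category codes for your account with the `DescribeSeverityLevels` and `DescribeServices` operations before sending a real request.

```python
from typing import Any, Dict, List, Optional

# Service code for AMS incidents, as given in the text above.
AMS_INCIDENT_SERVICE_CODE = "sentinel-report-incident"


def build_incident_request(
    subject: str,
    body: str,
    severity_code: str,
    category_code: str,
    cc_emails: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for the Support API CreateCase operation."""
    if not subject.strip() or not body.strip():
        raise ValueError("subject and body must not be empty")
    request: Dict[str, Any] = {
        "subject": subject,
        "communicationBody": body,
        "serviceCode": AMS_INCIDENT_SERVICE_CODE,
        "severityCode": severity_code,
        "categoryCode": category_code,
    }
    if cc_emails:
        request["ccEmailAddresses"] = list(cc_emails)
    return request


request = build_incident_request(
    subject="Scheduled backup failed",
    body="The nightly backup for the example stack did not run.",
    severity_code="high",   # assumed value; confirm with DescribeSeverityLevels
    category_code="other",  # placeholder; confirm with DescribeServices
    cc_emails=["ops@example.com"],
)
print(request["serviceCode"])
# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("support", region_name="us-east-1").create_case(**request)
```

The sketch only builds the request, so it runs without AWS credentials; the commented call shows where an SDK client would take over.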

After the AMS operations team receives your incident, the team reviews it to ensure that it is not better classified as a service request. If it should be a service request, it's immediately reclassified, the AMS service request team takes over, and you are notified. If the receiving operator can resolve the incident, the operator takes steps to resolve it immediately. AMS operators consult internal documentation for a resolution and, if needed, escalate the incident to other support resources until the incident is resolved. To be kept informed at each step of the incident resolution process, be sure to fill in the **CC Emails** option. If you connect by federation, log in before following the link in the email that AMS sends. After the incident is resolved, the AMS operations team documents the incident and its resolution for future use.

If an incident resolution requires infrastructure changes, a security review might be needed. Infrastructure changes that might require a security review include those related to IAM, resource-based policies, or risk approvals. Those types of incidents require an AMS Operations engineer to create an RFC before making the change, and your approval of that RFC is required. For example, if the incident resolution requires an update to an IAM policy, there is an AMS security review, and then an AMS Operations engineer creates an RFC with the Management | Advanced stack components | Identity and Access Management (IAM) | Update entity or policy change type (ct-27tuth19k52b4) and waits for you to approve the RFC before proceeding.

**Note**  
AMS now allows incident resolution that requires infrastructure changes to be made without the additional step of RFC approval. If the changes needed to resolve the incident do NOT require a security review (the change is not related to IAM, or resource-based policy, or risk approvals), AMS can make the changes based on your approval received in the incident, without needing separate approval in an RFC.
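
The approval rule described above can be summarized in a small decision function. This is only a sketch of the rule as stated in this section, not AMS tooling: the area labels (`iam`, `resource-based-policy`, `risk-approval`, `ebs-volume-resize`) and the function name `approval_path` are hypothetical.

```python
from typing import Iterable

# Hypothetical labels for the change areas that this section says
# require a security review.
SECURITY_REVIEW_AREAS = frozenset({"iam", "resource-based-policy", "risk-approval"})


def approval_path(change_areas: Iterable[str]) -> str:
    """Return where your approval is collected for an incident-driven change.

    "rfc"      : a security review applies, so AMS creates an RFC for approval.
    "incident" : no security review applies, so approval in the incident suffices.
    """
    if SECURITY_REVIEW_AREAS & set(change_areas):
        return "rfc"
    return "incident"


print(approval_path({"iam"}))                # rfc
print(approval_path({"ebs-volume-resize"}))  # incident
```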

For definitions of incident management terms, see [AMS Key Terms](https://docs.aws.amazon.com/managedservices/latest/userguide/key-terms.html).

To understand the escalation path of incidents, see [Getting help](https://docs.aws.amazon.com/managedservices/latest/userguide/faq-get-help.html).

For a description of AMS response to incidents, see [AMS incident response](https://docs.aws.amazon.com/managedservices/latest/userguide/sec-incident-response.html).

# What is incident management?
<a name="what-is-incident-mgmt"></a>

Incident management is the process AMS uses to record, act on, communicate progress of, and provide notification of, active incidents.

The goal of the incident management process is to ensure that normal operation of your managed service is restored as quickly as possible, the business impact is minimized, and all concerned parties are kept informed.

Examples of incidents include (but are not restricted to) loss of or degradation of network connectivity, a non-responsive process or API, or a scheduled task not being performed (for example, a failed backup).

The following graphic depicts the workflow of an incident reported by you to AMS.

![\[Incident management workflow between AMS operations and the customer with a customer-reported incident.\]](http://docs.aws.amazon.com/managedservices/latest/userguide/images/incident-mgmt-workflow-customer.png)


This graphic depicts the workflow of an incident reported by AMS to you.

![\[Incident management workflow between AMS operations and the customer with a CloudWatch-detected incident.\]](http://docs.aws.amazon.com/managedservices/latest/userguide/images/incident-mgmt-workflow-ams.png)


## Incident priority
<a name="incident-priority"></a>

Incidents created in the AWS Support Center console or through the Support API (SAPI) have different classifications than incidents created in the AMS console. The AMS console uses the following three priority levels:
+ Low: Non-critical functions of your business service, or application, related to AWS or AMS resources are impacted.
+ Medium: A business service or application related to AWS and/or AMS resources is moderately impacted and is functioning in a degraded state.
+ High: Your business is significantly impacted. Critical functions of your application related to AWS and/or AMS resources are unavailable. Reserved for the most critical outages affecting production systems.

**Note**  
The AWS Support Console offers five levels of incident priority that we translate to the three AMS levels.

## Problem vs incident
<a name="what-is-problem-mgmt"></a>

When AMS believes that an incident reveals a larger defect or misconfiguration and could recur, it is considered a problem rather than just an incident. In such cases, AMS undertakes analyses of the problem and offers suggestions to resolve the problem.

# Incident management service commitments
<a name="incident-serv-commits"></a>


**Incident management service commitments**  

| Event or action | Service commitment measurement | 
| --- | --- | 
| Case 1: An event with known impact is generated. AMS opens an incident and informs you. Case 2: AMS contacts you to confirm the impact of the event. You confirm the event is an incident. Case 3: You notice an issue and submit an incident report. | Clock for incident response and incident resolution starts when: Case 1: AMS creates an incident. Case 2: You confirm the alert is an incident. Case 3: You submit an incident. Service commitments depend on the priority of the incident created. | 
| If you submit the incident, AMS sends a response to acknowledge it. If AMS creates the incident on your behalf, a separate incident response is not sent. | Clock for incident resolution continues ticking. Clock for incident response time stops when AMS sends the incident acknowledgement.  Time spent waiting for inputs from you is excluded from incident resolution time calculations. For incidents that AMS creates, the initial response time is the time of the creation of the initial incident notification to you. | 
| For the resources / services in question, AMS checks the health to verify if: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/managedservices/latest/userguide/incident-serv-commits.html) If an incident you submit is not correctly prioritized, AMS re-prioritizes it. If AMS changes an incident priority, a notification is sent to you along with reasoning behind the priority change. In certain cases, an issue you submitted may not qualify as an incident, depending on the cause. In those cases, AMS closes the incident and sends you a notification explaining the reason why. Irrespective of the event categorization, AMS works with you to assist as needed. To understand the rules for incident categorization, see [Incident priority](what-is-incident-mgmt.md#incident-priority). | In case incident priority changes, the service commitment for the new priority is applicable; clock continues ticking. In cases when an incident is closed because it does not meet the definition of an incident, service commitments are not applicable; clock stops. | 
| AMS works on the incident to resolve it within service commitment. In certain cases, if AMS determines that unavailable stack(s) or resource(s) cannot be resolved in a timely manner, AMS will offer Infrastructure Restore as an option for resolution. Infrastructure Restore involves re-deploying existing stack(s), based on the templates of the impacted stack(s), and initiating a data restore based on the last known restore point (EBS/RDS snapshot), unless otherwise specified by you. Ephemeral data on individual EC2 instances will be lost. If you do not authorize an Infrastructure Restore as recommended by AWS, you will not be eligible for a service credit for the associated Incident Resolution Time Service Commitment.  | Clock stops when: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/managedservices/latest/userguide/incident-serv-commits.html) | 
| Occasionally, AMS needs clarification or action from you to keep incident resolution efforts moving forward, unless you have a pre-defined, approved action. In those cases, AMS communicates with you to resolve the incident. | Clock stops when: AMS is waiting for a response or action from you. Clock restarts when: AMS receives the response from you or the action AMS requires of you is completed. | 

**Note**  
For a complete list of service commitments, download the [AMS Service Level Agreement](samples/ams_sla.zip).
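
To make the clock rules in the table concrete, the following sketch computes an incident resolution time that excludes periods spent waiting for input from you. It is an illustration only, not how AMS measures its commitments: the function name, the sample timestamps, and the assumption that waiting periods do not overlap one another are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

Interval = Tuple[datetime, datetime]


def resolution_time(
    clock_start: datetime,
    resolved_at: datetime,
    customer_waits: Iterable[Interval],
) -> timedelta:
    """Elapsed resolution time, excluding time spent waiting on the customer.

    Assumes the waiting intervals do not overlap one another.
    """
    if resolved_at < clock_start:
        raise ValueError("resolved_at must not precede clock_start")
    elapsed = resolved_at - clock_start
    for wait_start, wait_end in customer_waits:
        # Only count the part of the wait that falls inside the clock window.
        paused_from = max(wait_start, clock_start)
        paused_until = min(wait_end, resolved_at)
        if paused_until > paused_from:
            elapsed -= paused_until - paused_from
    return elapsed


start = datetime(2024, 1, 1, 9, 0)      # incident submitted; clock starts
resolved = datetime(2024, 1, 1, 13, 0)  # incident resolved; clock stops
waits = [(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 11, 30))]
print(resolution_time(start, resolved, waits))  # 2:30:00
```

In this example, four hours pass between submission and resolution, but 90 minutes of that time are spent waiting for a customer response, so 2 hours 30 minutes count toward the resolution time.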

# Incident management examples
<a name="incident-mgmt-examples"></a>


**Topics**
+ [Incident testing](#incident-testing)
+ [Reporting incidents](gui-ex-report-incident.md)
+ [Monitoring and updating incidents](mon-update-incident-console.md)
+ [Managing incidents with the AWS Support API](report-manage-incidents-api.md)
+ [Responding to AMS-generated incidents](respond-to-sent-generated-incident.md)

The following examples describe using the AMS console to submit an incident. After you submit an incident, the AMS team works with you to resolve it per your Service Level Agreement (SLA).

## Incident testing
<a name="incident-testing"></a>

When testing AMS incident submissions, we ask that you include this flag in the subject text: **AMSTestNoOpsActionRequired**. The flag lets AMS know that the incident submission is only for testing. When AMS operations engineers see the flag, they do not respond to the incident submission.
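
If you script your test submissions, a small helper can keep the flag consistent. The following sketch is hypothetical: the function names are invented for illustration, and placing the flag at the start of the subject is an assumption, because this section only requires that the flag appear somewhere in the subject text.

```python
# Flag from the text above; AMS operations engineers do not act on
# submissions whose subject contains it.
TEST_FLAG = "AMSTestNoOpsActionRequired"


def mark_as_test(subject: str) -> str:
    """Return the subject with the test flag added, if it is not already present."""
    if TEST_FLAG in subject:
        return subject
    return f"{TEST_FLAG} {subject}".strip()


def is_test_incident(subject: str) -> bool:
    """Report whether a subject carries the test flag."""
    return TEST_FLAG in subject


print(mark_as_test("Backup failed"))  # AMSTestNoOpsActionRequired Backup failed
```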

# Reporting incidents
<a name="gui-ex-report-incident"></a>

Use the AMS console to report an incident. It's important to create a new incident for each new issue or question. When opening cases related to old inquiries, it's helpful to include the related case number so we can refer to previous correspondence.
**Note**  
If case correspondence strays from the original issue, an AMS operator might ask you to report a new incident.

To report an incident using the AMS console:

1. From the left navigation, choose **Incidents**.

   The **Incidents** list opens:  
![\[Incidents page with options to create an incident and view open incidents.\]](http://docs.aws.amazon.com/managedservices/latest/userguide/images/guiIncidentlistOpenPnC.png)

   If your incident list is empty, the **Clear filter** option resets the filter to **Any status**.

   If you know you want to use phone or chat, click **Create incident in Support Center** to open the incident **Create** page in the Support Center Console, auto-populated with the AMS service type.
**Important**  
Phone calls initiated with Support are recorded to help improve response. If the call drops, you must call back through the Support Center case; AWS has no mechanism for calling you back. 
Phone and chat support is designed to help with support cases, incidents, and service requests, not RFC or security issues.
For RFC issues, use the correspondence option on the relevant RFC details page, to reach an AMS engineer.
For security issues, create a high-priority (P1 or P2) support case. The live chat feature is not for security events.  
![\[Incidents page showing a list of resolved incidents with their creation dates, subjects, and IDs.\]](http://docs.aws.amazon.com/managedservices/latest/userguide/images/guiIncidentList2.png)

1. If you want to find an existing incident, select an incident status filter in the drop-down list.    
<a name="sr-filter-options"></a>[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/managedservices/latest/userguide/gui-ex-report-incident.html)

1. Choose **Create**.

   The **Create an incident** page opens:  
![\[Incident details form with priority options, access issues dropdown, and input fields for subject and details.\]](http://docs.aws.amazon.com/managedservices/latest/userguide/images/guiIncidentCreate5.png)

1. Select a **Priority**:
   + **Low**: Non-critical functions of your business service or application related to AWS/AMS resources are impacted.
   + **Medium**: A business service or application related to AWS/AMS resources is moderately impacted and functioning in a degraded state.
   + **High**: Your business is significantly impacted. Critical functions of your application related to AWS/AMS resources are unavailable. Reserved for the most critical outages affecting production systems.

1. Select a **Category**.
**Note**  
If you are going to test incident functionality, then add the no-action flag (AMSTestNoOpsActionRequired) to your incident title.

1. Enter information for:
   + **Subject**: A descriptive title for the incident report.
   + **CC emails**: A list of email addresses for people you want informed about the incident report and resolution.
   + **Details**: A comprehensive description of the incident, the systems impacted, and the expected outcome of the resolution. Answer the pre-set questions, or delete them and enter any relevant information.

   To add an attachment, choose **Add Attachment**, browse to the attachment you want, and click **Open**. To delete the attachment, click the Delete icon: ![\[Blue circular icon with a white X symbol in the center.\]](http://docs.aws.amazon.com/managedservices/latest/userguide/images/icon-delete-attachment.png).

1. Choose **Submit**.

   A details page opens with information on the incident—such as **Type**, **Subject**, **Created**, **ID**, and **Status**—and a **Correspondence** area that includes the description of the request you created.

   Click **Reply** to open a correspondence area and provide additional details or updates in status.

   Click **Close Case** when the incident has been resolved.

   Click **Load More** if there is more correspondence than will fit on one page.

   Don't forget to rate the communication!  
![\[Correspondence section showing a test message from Amazon Web Services with rating stars.\]](http://docs.aws.amazon.com/managedservices/latest/userguide/images/guiSRcorrespond.png)

   Your incident displays on the **Incidents** list page.

**YouTube Video**: [ How do I raise an incident from the AWS Managed Services console?](https://www.youtube.com/watch?v=A51d8Qm_MvA&list=PLhr1KZpdzukc_VXASRqOUSM5AJgtHat6-&index=6&t=46s)

# Monitoring and updating incidents
<a name="mon-update-incident-console"></a>

You can update, monitor, and review incident reports and service requests, both called cases, by using the AMS console, or programmatically by using the Support API. For information on using the Support API, see the [DescribeCases](https://docs.aws.amazon.com/awssupport/latest/APIReference/API_DescribeCases.html) operation.
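
For the programmatic path, the following sketch assembles parameters for the `DescribeCases` operation. The helper name `build_describe_cases_request` and the sample case ID are hypothetical, and the sketch stops short of calling AWS so that it runs without credentials.

```python
from typing import Any, Dict, Iterable, Optional


def build_describe_cases_request(
    case_ids: Optional[Iterable[str]] = None,
    include_resolved: bool = False,
    include_communications: bool = True,
) -> Dict[str, Any]:
    """Assemble keyword arguments for the Support API DescribeCases operation."""
    request: Dict[str, Any] = {
        "includeResolvedCases": include_resolved,
        "includeCommunications": include_communications,
    }
    if case_ids:
        request["caseIdList"] = list(case_ids)
    return request


# Hypothetical case ID, for illustration only.
request = build_describe_cases_request(case_ids=["case-EXAMPLE-ID"], include_resolved=True)
print(sorted(request))
# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("support", region_name="us-east-1").describe_cases(**request)
```

Setting `includeResolvedCases` to `True` is what makes resolved cases appear in the response; by default, the operation returns only cases that are not resolved.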

To monitor a case (an incident or a service request) using the AMS console, follow these steps.

1. In the AMS console **Incident reports** or **Service requests** dashboard, browse to a case and choose the **Subject** to open a details page with current status and correspondences.  
![\[Incident detail card showing type, status, subject, ID, creation date, and priority.\]](http://docs.aws.amazon.com/managedservices/latest/userguide/images/guiIncidentDetail.png)  
![\[Service request detail showing type, subject, creation date, ID, and resolved status.\]](http://docs.aws.amazon.com/managedservices/latest/userguide/images/guiSRdetail.png)

   When a reported incident or service request case is updated by the AMS operations team, you receive an email and a link to the incident in the AMS console so you can respond. You can't respond to incident correspondence by replying to the email.
**Important**  
You must have entered an email address to receive notifications of state change for a service request or incident case. Notifications only go to the email address added to the case when it's created.  
The link in the notification email will not work unless you are using an email server on your AMS federated network. However, you can respond to the correspondence by going to your AMS console and using the case details page.

1.  If there are many cases in the list, you can use the **Filter** option:
   + **All open** (default): Use this filter to see all cases that have not been resolved.
   + **Unassigned**: Use if you've just submitted the case and have not received any notice that the case state has changed. Note that incidents and service requests are addressed with different urgency, depending on the submitted priority (incidents) or your service level agreement (service requests).
   + **Open**: Use if you have received notice that the case is "Pending Amazon" action; this means that the case has been assigned but work has not yet begun.
   + **Reopened**: Use if you have received notice that the case was reopened after having been resolved.
   + **Work in progress**: Use if you have received notice that an operator has begun to work on the case.
   + **Pending customer action**: Use if you have received an operator request for action on your part.
   + **Customer action completed**: Use if you have received notice that your action on the case has been processed.
   + **Resolved**: Use to view cases that you know have been resolved. Resolved cases are maintained in history for twelve months.
   + **Any status**: Use this filter to see all cases, regardless of status.

1. To check the latest status, refresh the page.

1. If there are so many correspondences that they do not all appear on the page, choose **Load More**.

1. To provide an update to the case status, choose **Reply**, enter the new correspondence, and then choose **Submit**.

1. To close out the case after it has been resolved to your satisfaction, choose **Close case**.

   Be sure to rate the service through the 1-5 star rating to let AMS know how we're doing!
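
The filter options in step 2 of the procedure above follow a simple rule: **Any status** matches every case, **All open** matches every case that is not resolved, and each remaining filter matches a single status. The following sketch models that rule over a hypothetical in-memory list of cases; the dictionary layout, the sample IDs, and the function name are assumptions for illustration and do not reflect how the AMS console stores cases.

```python
from typing import Dict, List

Case = Dict[str, str]


def filter_cases(cases: List[Case], filter_name: str = "All open") -> List[Case]:
    """Return the cases that a console-style status filter would show."""
    if filter_name == "Any status":
        return list(cases)
    if filter_name == "All open":
        return [case for case in cases if case["status"] != "Resolved"]
    return [case for case in cases if case["status"] == filter_name]


# Hypothetical sample data.
cases = [
    {"id": "1", "status": "Unassigned"},
    {"id": "2", "status": "Work in progress"},
    {"id": "3", "status": "Resolved"},
]
print([case["id"] for case in filter_cases(cases)])              # ['1', '2']
print([case["id"] for case in filter_cases(cases, "Resolved")])  # ['3']
```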

# Managing incidents with the AWS Support API
<a name="report-manage-incidents-api"></a>

The [AWS Support API](https://docs.aws.amazon.com/awssupport/latest/user/Welcome.html) enables you to create incidents and add correspondence to them throughout investigations of your issues and interactions with AWS Support staff. The AWS Support API models much of the behavior of the [AWS Support Center](https://console.aws.amazon.com/support/home#/). For more details about how you can use this AWS Support service, see [Programming an AWS Support Case](https://docs.aws.amazon.com/awssupport/latest/user/Case_Life_Cycle.html#crebopbatecase).

**Note**  
When using the AWS Support API, or SAPI, for AMS Advanced incidents, use this service code: `sentinel-report-incident`.
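
Adding correspondence to an existing incident uses the `AddCommunicationToCase` operation. The following sketch builds its parameters; the helper name, the sample case ID, the message text, and the email address are hypothetical, and the sketch does not call AWS.

```python
from typing import Any, Dict, List, Optional


def build_correspondence_request(
    case_id: str,
    message: str,
    cc_emails: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for the Support API AddCommunicationToCase operation."""
    if not message.strip():
        raise ValueError("message must not be empty")
    request: Dict[str, Any] = {"caseId": case_id, "communicationBody": message}
    if cc_emails:
        request["ccEmailAddresses"] = list(cc_emails)
    return request


# Hypothetical case ID, for illustration only.
request = build_correspondence_request(
    case_id="case-EXAMPLE-ID",
    message="The backup job succeeded on retry; the incident can be closed.",
    cc_emails=["ops@example.com"],
)
print(request["caseId"])
# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("support", region_name="us-east-1").add_communication_to_case(**request)
```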

# Responding to AMS-generated incidents
<a name="respond-to-sent-generated-incident"></a>

AMS proactively monitors your resources; for more information, see [Monitoring and event management](https://docs.aws.amazon.com/managedservices/latest/userguide/monitoring.html). Sometimes AMS identifies and creates an incident case, most often to notify you of an incident. In the event that action is required on your part to resolve an incident, AMS sends a notification to the contact information you have provided for the account. You respond to this incident in the same way as you would any other incident. You would usually respond to incidents via the AMS console; in some cases, contact by email or phone is required.

**Note**  
AMS sends communications to the primary email address on your AWS account; we recommend adding an alternate Operations contact email alias to facilitate the incident management process. This is covered during the AMS onboarding process and in the related onboarding documentation. If you provided AMS with non-resource-based contacts during onboarding, and informed your CSDM of them, those contacts are used. For example, you could provide a list of contacts named "SecurityContacts" to your CSDMs/CAs to use for security-related incidents or notifications. Contact tags on your instances and resources are used for AMS-generated incidents if you have given your CSDM consent to use tag information.  
To learn more about this notification service, see [Notifications](https://docs.aws.amazon.com/managedservices/latest/userguide/notifications.html).