

 This whitepaper is for historical reference only. Some content might be outdated and some links might not be available.

# High availability and scalability on AWS
<a name="high-availability-and-scalability-on-aws"></a>

 Most providers of real-time communications align with service levels that provide availability from 99.9% to 99.999%. Depending on the degree of high availability (HA) that you want, you must take increasingly sophisticated measures along the full lifecycle of the application. AWS recommends following these guidelines to achieve a robust degree of high availability: 
+  Design the system to have no single point of failure. Use automated monitoring, failure detection, and failover mechanisms for both stateless and stateful components 
  +  Single points of failure (SPOF) are commonly eliminated with an N+1 or 2N redundancy configuration, where N+1 is achieved via load balancing among *active–active* nodes, and 2N is achieved by a pair of nodes in an *active–standby* configuration. 
  +  AWS has several methods for achieving HA through both approaches, such as a scalable, load-balanced cluster or an *active–standby* pair. 
+  Correctly instrument and test system availability. 
+  Prepare operating procedures for manual mechanisms to respond to, mitigate, and recover from the failure. 

 This section focuses on how to achieve no single point of failure using capabilities available on AWS. Specifically, this section describes a subset of core AWS capabilities and design patterns that allow you to build highly available real-time communication applications. 

# Floating IP pattern for HA between active–standby stateful servers
<a name="floating-ip-pattern-for-ha-between-activestandby-stateful-servers"></a>

 The floating IP design pattern is a well-known mechanism to achieve automatic failover between an active and standby pair of hardware nodes (media servers). A static secondary virtual IP address is assigned to the active node. Continuous monitoring between the active and standby nodes detects failure. If the active node fails, the monitoring script assigns the virtual IP to the ready standby node and the standby node takes over the primary active function. In this way, the virtual IP floats between the active and standby node. 

## Applicability in RTC solutions
<a name="applicability-in-rtc-solutions"></a>

 It is not always possible to have multiple active instances of the same component in service, such as an active–active cluster of N nodes. In those cases, an active–standby configuration provides the best mechanism for HA. For example, the stateful components in an RTC solution, such as a media server, conferencing server, SBC, or database server, are well suited to an active–standby setup. An SBC or media server has many long-running sessions or channels active at a given time, and if the active SBC instance fails, endpoints can reconnect to the standby node without any client-side configuration change, because the floating IP moves with the active role. 

### Implementation on AWS
<a name="implementation-on-aws"></a>

 You can implement this pattern on AWS using core capabilities in Amazon Elastic Compute Cloud (Amazon EC2), Amazon EC2 API, Elastic IP addresses, and support on Amazon EC2 for secondary private IP addresses. 

To implement the floating IP pattern on AWS:

1.  Launch two EC2 instances to assume the roles of primary and secondary nodes, where the primary is assumed to be in *active* state by default. 

1.  Assign an additional secondary private IP address to the primary EC2 instance. 

1.  Associate an Elastic IP address, which acts like a virtual IP (VIP), with the secondary private address. External endpoints use this Elastic IP address to access the application. 

1.  Configure the operating system (OS) to add the secondary IP address as an alias on the primary network interface. 

1.  The application must bind to this elastic IP address. In the case of Asterisk software, you can configure the binding through advanced Asterisk SIP settings. 

1.  Run a monitoring script (custom, Keepalived on Linux, Corosync, and so on) on each node to monitor the state of the peer node. If the current active node fails, the peer detects the failure and invokes the Amazon EC2 API to reassign the secondary private IP address to itself. 

   Therefore, the application that was listening on the VIP associated with the secondary private IP address becomes available to endpoints via the standby node. 
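The reassignment in step 6 can be sketched as follows. This is an illustrative outline, not code from the whitepaper: the ENI ID and IP address are placeholders, and the actual EC2 call is shown as a comment because it requires AWS credentials and real resources.

```python
def build_reassignment_request(standby_eni_id, floating_private_ip):
    """Build EC2 AssignPrivateIpAddresses parameters that move the
    secondary private IP (and the Elastic IP mapped to it) to the
    standby node's network interface."""
    return {
        "NetworkInterfaceId": standby_eni_id,
        "PrivateIpAddresses": [floating_private_ip],
        # AllowReassignment lets EC2 take the address away from the
        # failed peer even though it is still assigned there.
        "AllowReassignment": True,
    }

# With credentials configured, the standby's monitoring script would run
# (values are hypothetical):
# import boto3
# ec2 = boto3.client("ec2")
# ec2.assign_private_ip_addresses(
#     **build_reassignment_request("eni-0abc", "10.0.0.100"))
# The OS must also alias the address on the interface, for example:
#   ip addr add 10.0.0.100/24 dev eth0
```

The standby node then answers traffic sent to the VIP as soon as the reassignment completes.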

![\[A diagram depicting failover between stateful EC2 instances using an elastic IP address.\]](http://docs.aws.amazon.com/whitepapers/latest/real-time-communication-on-aws/images/failover-stateful.jpg)


#### Benefits
<a name="benefits"></a>

 This approach is a reliable low-budget solution that protects against failures at the EC2 instance, infrastructure, or application level. 

##### Limitations and extensibility
<a name="limitations-and-extensibility"></a>

 This design pattern is typically limited to a single Availability Zone. It can be implemented across two Availability Zones with a variation: the floating Elastic IP address is re-associated between the active and standby nodes in different Availability Zones using the Elastic IP address re-association API. In the failover implementation shown in the preceding figure, calls in progress are dropped and endpoints must reconnect. You can extend this implementation with replication of the underlying session data to provide seamless failover of sessions and media continuity as well. 

#### Load balancing for scalability and HA with WebRTC and SIP
<a name="load-balancing-for-scalability-and-ha-with-webrtc-and-sip"></a>

 Load balancing a cluster of active instances based on predefined rules, such as round robin, affinity, or latency, is a design pattern widely popularized by the stateless nature of HTTP requests. Load balancing is also a viable option for many RTC application components. 

 The load balancer acts as the reverse proxy or entry point for requests to the desired application, which itself is configured to run on multiple active nodes simultaneously. At any given point in time, the load balancer directs a user request to one of the active nodes in the defined cluster. Load balancers perform health checks against the nodes in their target cluster and do not send incoming requests to a node that fails the health check. Load balancing therefore provides a fundamental degree of high availability. Also, because a load balancer performs active and passive health checks against all cluster nodes at frequent intervals, failover is near instantaneous. 

 The decision about which node to direct a request to is based on rules defined in the load balancer, including: 
+  Round robin 
+  Session or IP affinity, which ensures that multiple requests within a session or from the same IP are sent to the same node in the cluster 
+  Latency based 
+  Load based 
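The interplay of round-robin selection and health checks described above can be illustrated with a toy sketch. This is not how ELB is implemented; it only shows why a health-checked cluster keeps serving requests when a node fails.

```python
import itertools

class RoundRobinBalancer:
    """Toy round-robin balancer with health-check awareness.
    Managed services (ELB/ALB/NLB) provide this logic for you."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(nodes)
        self._cycle = itertools.cycle(self.nodes)

    def mark_unhealthy(self, node):
        # A failed health check removes the node from rotation.
        self.healthy.discard(node)

    def route(self):
        # Advance round robin, skipping nodes that failed health checks.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes in cluster")

# Example: with node "b" unhealthy, requests alternate between "a" and "c".
lb = RoundRobinBalancer(["a", "b", "c"])
lb.mark_unhealthy("b")
```

Session or IP affinity would add a lookup table mapping a session key to a previously chosen node before falling back to round robin.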

## Applicability in RTC architectures
<a name="applicability-in-rtc-architectures"></a>

 The WebRTC protocol makes it possible for WebRTC gateways to be easily load balanced via an HTTP-based load balancer, such as [Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) (ELB) with an [Application Load Balancer](https://aws.amazon.com/elasticloadbalancing/application-load-balancer/) (ALB) or [Network Load Balancer](https://aws.amazon.com/elasticloadbalancing/network-load-balancer/) (NLB). With most SIP implementations relying on transport over both Transmission Control Protocol (TCP) and User Datagram Protocol (UDP), you need network- or connection-level load balancing that supports both TCP- and UDP-based traffic. 

## Load balancing on AWS for WebRTC using Application Load Balancer and Auto Scaling
<a name="load-balancing-on-aws-for-webrtc-using-application-load-balancer-and-auto-scaling"></a>

 In the case of WebRTC-based communications, Elastic Load Balancing provides a fully managed, highly available, and scalable load balancer to serve as the entry point for requests, which are then directed to a target cluster of EC2 instances associated with Elastic Load Balancing. Because WebRTC requests are stateless, you can use Amazon EC2 Auto Scaling to provide fully automated and controllable scalability, elasticity, and high availability. 

 The Application Load Balancer provides a fully managed load balancing service that is scalable and highly available across multiple Availability Zones. It supports load balancing of the WebSocket requests that handle signaling for WebRTC applications, with bidirectional communication between the client and server over a long-running TCP connection. The Application Load Balancer also supports content-based routing and [sticky sessions](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/sticky-sessions.html), which route requests from the same client to the same target using load-balancer-generated cookies. If you enable sticky sessions, the same target receives each request and can use the cookie to recover the session context. 
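As a minimal sketch, the sticky-session behavior described above is enabled through target group attributes in the ELBv2 API. The attribute keys below are from that API; the target group ARN and the boto3 call are shown as hedged comments because they require real resources.

```python
def sticky_session_attributes(duration_seconds=3600):
    """Target-group attributes that enable ALB load-balancer-generated
    (lb_cookie) stickiness, so WebRTC signaling requests from the same
    client return to the same target."""
    return [
        {"Key": "stickiness.enabled", "Value": "true"},
        {"Key": "stickiness.type", "Value": "lb_cookie"},
        {"Key": "stickiness.lb_cookie.duration_seconds",
         "Value": str(duration_seconds)},
    ]

# Applying the attributes (ARN is a placeholder):
# import boto3
# elbv2 = boto3.client("elbv2")
# elbv2.modify_target_group_attributes(
#     TargetGroupArn="arn:aws:elasticloadbalancing:...",
#     Attributes=sticky_session_attributes())
```

The duration bounds how long the cookie keeps a client pinned to one target before the load balancer may pick a new one.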

The following figure shows the target topology. 

![\[A diagram depicting WebRTC scalability and high availability architecture.\]](http://docs.aws.amazon.com/whitepapers/latest/real-time-communication-on-aws/images/webrtc-scalability.png)


## Implementation for SIP using Network Load Balancer or an AWS Marketplace product
<a name="implementation-for-sip-using-network-load-balancer-or-aws-marketplace-product"></a>

 In the case of SIP-based communications, the connections are made over TCP or UDP, with the majority of RTC applications using UDP. If SIP over TCP is the signaling protocol of choice, you can use the Network Load Balancer for fully managed, highly available, scalable, and high-performance load balancing. 

 A Network Load Balancer operates at the connection level (Layer 4), routing connections to targets such as Amazon EC2 instances, containers, and IP addresses based on IP protocol data. Ideal for TCP or UDP traffic load balancing, network load balancing is capable of handling millions of requests per second while maintaining ultra-low latencies. It is integrated with other popular AWS services, such as Amazon EC2 Auto Scaling, [Amazon Elastic Container Service](https://aws.amazon.com/ecs/) (Amazon ECS), [Amazon Elastic Kubernetes Service](https://aws.amazon.com/eks/) (Amazon EKS), and [AWS CloudFormation](https://aws.amazon.com/cloudformation/). 
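A Layer 4 listener for SIP traffic can be sketched with the ELBv2 `CreateListener` parameters below. This is an assumed configuration, not from the whitepaper; port 5060 is the conventional SIP port, and the ARNs are placeholders.

```python
def sip_udp_listener_params(load_balancer_arn, target_group_arn, port=5060):
    """Listener parameters that forward SIP-over-UDP connections
    through a Network Load Balancer to a target group of SIP servers."""
    return {
        "LoadBalancerArn": load_balancer_arn,
        "Protocol": "UDP",          # use "TCP" for SIP/TCP signaling
        "Port": port,               # 5060 is conventional for SIP
        "DefaultActions": [
            {"Type": "forward", "TargetGroupArn": target_group_arn},
        ],
    }

# With real resources provisioned:
# import boto3
# boto3.client("elbv2").create_listener(
#     **sip_udp_listener_params(nlb_arn, sip_target_group_arn))
```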

 For SIP connections, another option is to use [AWS Marketplace](https://aws.amazon.com/marketplace) commercial off-the-shelf (COTS) software. AWS Marketplace offers many products that can handle UDP and other types of Layer 4 connection load balancing. COTS products typically include support for high availability and commonly integrate with features such as Amazon EC2 Auto Scaling to further enhance availability and scalability. The following figure shows the target topology: 

![\[A diagram depicting SIP-based RTC scalability with AWS Marketplace product.\]](http://docs.aws.amazon.com/whitepapers/latest/real-time-communication-on-aws/images/sip-based-rtc-scalability.jpg)


# Cross-Region DNS-based load balancing and failover
<a name="cross-region-dns-based-load-balancing-and-failover"></a>

 [Amazon Route 53](https://aws.amazon.com/route53/) provides a global DNS service that can be used as a public or private endpoint for RTC clients to register and connect with media applications. With Amazon Route 53, DNS health checks can be configured to route traffic to healthy endpoints or to independently monitor the health of your application. 

The Amazon Route 53 Traffic Flow feature makes it easy for you to manage traffic globally through a variety of routing types, including latency-based routing, geo DNS, geoproximity, and weighted round robin—all of which can be combined with DNS Failover to enable a variety of low-latency, fault-tolerant architectures. The Amazon Route 53 Traffic Flow simple visual editor allows you to manage how your end users are routed to your application’s endpoints—whether in a single AWS Region or distributed around the globe. 

 In the case of global deployments, the latency-based routing policy in Route 53 is especially useful to direct customers to the nearest point of presence for a media server to improve the quality of service associated with real-time media exchanges. 

 Note that to force a failover to a new DNS address, client caches must expire or be flushed. Also, DNS changes can lag as they propagate across global DNS servers. You can manage the refresh interval for DNS lookups with the time-to-live (TTL) attribute, which you configure when setting up DNS policies. 
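A DNS failover pair with a short TTL can be sketched as Route 53 resource record sets. This is an illustrative outline; the record name, addresses, and health check ID are placeholders, and the `change_resource_record_sets` call is shown only as a comment.

```python
def failover_record(name, ip_address, role, health_check_id=None, ttl=60):
    """A Route 53 failover record set. role is "PRIMARY" or "SECONDARY";
    a short TTL bounds how long resolvers may cache the old address
    after failover."""
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{name}-{role.lower()}",
        "Failover": role,
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip_address}],
    }
    if health_check_id:
        # The primary record should reference a health check so Route 53
        # can detect failure and answer with the secondary instead.
        record["HealthCheckId"] = health_check_id
    return record

# primary = failover_record("sip.example.com.", "198.51.100.10",
#                           "PRIMARY", health_check_id="hc-placeholder")
# secondary = failover_record("sip.example.com.", "198.51.100.11", "SECONDARY")
# import boto3
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId=zone_id,  # configured elsewhere
#     ChangeBatch={"Changes": [
#         {"Action": "UPSERT", "ResourceRecordSet": r}
#         for r in (primary, secondary)]})
```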

 To reach global users quickly or to meet the requirements of using a single public IP, AWS Global Accelerator can also be used for cross-Region failover. [AWS Global Accelerator](https://aws.amazon.com/global-accelerator/?blogs-global-accelerator.sort-by=item.additionalFields.createdDate&blogs-global-accelerator.sort-order=desc&aws-global-accelerator-wn.sort-by=item.additionalFields.postDateTime&aws-global-accelerator-wn.sort-order=desc) is a networking service that improves availability and performance for applications with both local and global reach. AWS Global Accelerator provides static IP addresses that act as a fixed entry point to your application endpoints, such as your Application Load Balancers, Network Load Balancers, or Amazon EC2 instances in single or multiple AWS Regions. It uses the AWS global network to optimize the path from your users to your applications, improving performance, such as the latency of your TCP and UDP traffic. 

AWS Global Accelerator continually monitors the health of your application endpoints and automatically redirects traffic to the nearest healthy endpoints when current endpoints become unhealthy. For additional security requirements, Accelerated Site-to-Site VPN uses AWS Global Accelerator to improve the performance of VPN connections by intelligently routing traffic through the AWS global network and AWS edge locations. 
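The cross-Region binding behind this behavior can be sketched with Global Accelerator `CreateEndpointGroup` parameters: one endpoint group per Region, each pointing at a Regional ALB or NLB. The ARNs and Region below are placeholders, not values from the whitepaper.

```python
def endpoint_group_request(listener_arn, region, endpoint_arns,
                           traffic_dial=100.0):
    """Global Accelerator endpoint group binding Regional load balancers
    to an accelerator listener. Health checks against these endpoints
    drive automatic cross-Region failover; TrafficDialPercentage can
    shift traffic away from a Region gradually."""
    return {
        "ListenerArn": listener_arn,
        "EndpointGroupRegion": region,
        "TrafficDialPercentage": traffic_dial,
        "EndpointConfigurations": [
            {"EndpointId": arn, "Weight": 128}  # equal weights
            for arn in endpoint_arns
        ],
    }

# Repeated per Region against a real accelerator:
# import boto3
# ga = boto3.client("globalaccelerator")
# ga.create_endpoint_group(
#     IdempotencyToken="token-placeholder",
#     **endpoint_group_request(listener_arn, "us-east-1", [alb_arn]))
```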

![\[A diagram depicting inter-Region high availability design using AWS Global Accelerator or Amazon Route 53.\]](http://docs.aws.amazon.com/whitepapers/latest/real-time-communication-on-aws/images/inter-region-ha-design.png)


# Data durability and HA with persistent storage
<a name="data-durability-and-ha-with-persistent-storage"></a>

 Most RTC applications rely on persistent storage to store and access data for authentication, authorization, accounting (session data, call detail records, etc.), operational monitoring, and logging. In a traditional data center, ensuring high availability and durability for the persistent storage components (databases, file systems, and so on) typically requires heavy lifting via the setup of a storage area network (SAN), Redundant Array of Independent Disks (RAID) design, and processes for backup, restore, and failover processing. The AWS Cloud greatly simplifies and enhances traditional data center practices around data durability and availability. 

 For object storage and file storage, AWS services like [Amazon Simple Storage Service](https://aws.amazon.com/s3/) (Amazon S3) and [Amazon Elastic File System](https://aws.amazon.com/efs/) (Amazon EFS) provide managed high availability and scalability. Amazon S3 has a data durability of 99.999999999% (11 nines). 

 For transactional data storage, you can take advantage of the fully managed Amazon Relational Database Service (Amazon RDS), which supports Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle, and Microsoft SQL Server with high availability deployments. For the registrar function, subscriber profiles, or accounting records such as call detail records (CDRs), Amazon RDS provides a fault-tolerant, highly available, and scalable option. 
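The high availability deployment mentioned above corresponds to the Multi-AZ option on an RDS instance, which maintains a synchronously replicated standby in a second Availability Zone. The sketch below shows the core `CreateDBInstance` parameters; the identifier and instance class are illustrative choices, not prescriptions.

```python
def multi_az_db_request(identifier, engine="postgres"):
    """Core parameters for a Multi-AZ RDS instance suitable for CDR or
    subscriber-profile storage. MultiAZ=True provisions a standby
    replica in another Availability Zone with automatic failover."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": engine,
        "DBInstanceClass": "db.m5.large",  # illustrative size
        "AllocatedStorage": 100,           # GiB, illustrative
        "MultiAZ": True,
    }

# import boto3
# boto3.client("rds").create_db_instance(
#     MasterUsername=admin_user, MasterUserPassword=admin_password,
#     **multi_az_db_request("rtc-cdr-db"))
```

On failover, RDS repoints the instance's DNS name at the standby, so applications reconnect without a configuration change.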

# Dynamic scaling with AWS Lambda, Amazon Route 53, and Amazon EC2 Auto Scaling
<a name="dynamic-scaling-with-aws-lambda-amazon-route-53-and-aws-auto-scaling"></a>

AWS allows you to chain features together and incorporate custom serverless functions based on infrastructure events. One design pattern with many versatile uses in RTC applications is the combination of automatic scaling lifecycle hooks with [Amazon CloudWatch Events](https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/WhatIsCloudWatchEvents.html), Amazon Route 53, and [AWS Lambda](https://aws.amazon.com/lambda/) functions. An AWS Lambda function can embed any action or logic. The following figure demonstrates how these features, chained together, can enhance system reliability and scalability with automation. 

![\[A diagram depicting automatic scaling with dynamic updates to Amazon Route 53 .\]](http://docs.aws.amazon.com/whitepapers/latest/real-time-communication-on-aws/images/auto-scaling-dynamic-updates.jpg)


# Highly available WebRTC with Amazon Kinesis Video Streams
<a name="highly-available-webrtc-with-kinesis-video-streams"></a>

[Amazon Kinesis Video Streams](https://aws.amazon.com/kinesis/video-streams/?nc=sn&loc=0&amazon-kinesis-video-streams-resources-blog.sort-by=item.additionalFields.createdDate&amazon-kinesis-video-streams-resources-blog.sort-order=desc) offers real-time media streaming via WebRTC, allowing users to capture, process, and store media streams for playback, analytics, and machine learning. These streams are highly available, scalable, and compliant with WebRTC standards. Amazon Kinesis Video Streams includes a WebRTC signaling endpoint for fast peer discovery and secure connection establishment, along with managed Session Traversal Utilities for NAT (STUN) and Traversal Using Relays around NAT (TURN) endpoints for real-time exchange of media between peers. It also includes a free open-source SDK that integrates directly with camera firmware to enable secure communication with Amazon Kinesis Video Streams endpoints, allowing for peer discovery and media streaming. Finally, it provides client libraries for Android, iOS, and JavaScript that allow WebRTC-compliant mobile and web players to securely discover and connect with a camera device for media streaming and two-way communication. 
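The signaling setup described above can be sketched with the Kinesis Video Streams API parameters below. The channel name is a placeholder, and the boto3 calls are shown as comments since they require AWS credentials.

```python
def signaling_channel_request(channel_name, ttl_seconds=60):
    """Parameters for creating a Kinesis Video Streams WebRTC signaling
    channel. SINGLE_MASTER means one master (for example, a camera)
    exchanges signaling messages with one or more viewers."""
    return {
        "ChannelName": channel_name,
        "ChannelType": "SINGLE_MASTER",
        "SingleMasterConfiguration": {
            # How long undelivered signaling messages are retained.
            "MessageTtlSeconds": ttl_seconds,
        },
    }

# import boto3
# kvs = boto3.client("kinesisvideo")
# channel = kvs.create_signaling_channel(
#     **signaling_channel_request("rtc-demo-channel"))
# endpoints = kvs.get_signaling_channel_endpoint(
#     ChannelARN=channel["ChannelARN"],
#     SingleMasterChannelEndpointConfiguration={
#         "Protocols": ["WSS", "HTTPS"], "Role": "VIEWER"})
```

A client then connects to the returned WebSocket (WSS) endpoint to exchange SDP offers and ICE candidates with the master peer.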

# Highly available SIP trunking with Amazon Chime Voice Connector
<a name="highly-available-sip-trunking-with-amazon-chime-voice-connector"></a>

[Amazon Chime Voice Connector](https://docs.aws.amazon.com/chime-sdk/latest/ag/voice-connectors.html) delivers a pay-as-you-go SIP trunking service that enables companies to make and receive secure, inexpensive phone calls with their phone systems. Amazon Chime Voice Connector is a low-cost alternative to service provider SIP trunks or Integrated Services Digital Network (ISDN) Primary Rate Interfaces (PRIs). You can enable inbound calling, outbound calling, or both. 

The service uses the AWS network to deliver a highly available calling experience across multiple AWS Regions. You can stream audio from SIP trunking telephone calls, or forwarded SIP-based media recording (SIPREC) feeds to Amazon Kinesis Video Streams to gain insights from business calls in real time. You can quickly build applications for audio analytics through integration with [Amazon Transcribe](https://aws.amazon.com/transcribe/) and other common machine learning libraries. 