

# Create fleet infrastructure with an Amazon EC2 Auto Scaling group

This section explains how to create an Amazon EC2 Auto Scaling fleet.

Use the CloudFormation YAML template below to create an Amazon EC2 Auto Scaling (Auto Scaling) group, an Amazon Virtual Private Cloud (Amazon VPC) with two subnets, an instance profile, and an instance access role. These resources are required to launch instances in the subnets using Auto Scaling.

Review and update the list of instance types in the template to fit your rendering needs.

For a complete explanation of the resources and parameters used in the CloudFormation YAML template, see the [Deadline Cloud resource type reference](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/AWS_Deadline.html) in the *AWS CloudFormation User Guide*.

**To create an Amazon EC2 Auto Scaling fleet**

1. Use the following example to create a CloudFormation template that defines the `FarmId`, `FleetId`, and `AMIId` parameters. Save the template to a `.yaml` file on your local computer.

   ```
   AWSTemplateFormatVersion: 2010-09-09
   Description: AWS Deadline Cloud customer-managed fleet
   Parameters:
     FarmId:
       Type: String
       Description: Farm ID
     FleetId:
       Type: String
       Description: Fleet ID
     AMIId:
       Type: String
       Description: AMI ID for launching workers
   Resources:
     deadlineVPC:
       Type: 'AWS::EC2::VPC'
       Properties:
         CidrBlock: 100.100.0.0/16
     deadlineWorkerSecurityGroup:
       Type: 'AWS::EC2::SecurityGroup'
       Properties:
         GroupDescription: !Join
           - ' '
           - - Security group created for Deadline Cloud workers in the fleet
             - !Ref FleetId
         GroupName: !Join
           - ''
           - - deadlineWorkerSecurityGroup-
             - !Ref FleetId
         SecurityGroupEgress:
           - CidrIp: 0.0.0.0/0
             IpProtocol: '-1'
         SecurityGroupIngress: []
         VpcId: !Ref deadlineVPC
     deadlineIGW:
       Type: 'AWS::EC2::InternetGateway'
       Properties: {}
     deadlineVPCGatewayAttachment:
       Type: 'AWS::EC2::VPCGatewayAttachment'
       Properties:
         VpcId: !Ref deadlineVPC
         InternetGatewayId: !Ref deadlineIGW
     deadlinePublicRouteTable:
       Type: 'AWS::EC2::RouteTable'
       Properties:
         VpcId: !Ref deadlineVPC
     deadlinePublicRoute:
       Type: 'AWS::EC2::Route'
       Properties:
         RouteTableId: !Ref deadlinePublicRouteTable
         DestinationCidrBlock: 0.0.0.0/0
         GatewayId: !Ref deadlineIGW
       DependsOn:
         - deadlineIGW
         - deadlineVPCGatewayAttachment
     deadlinePublicSubnet0:
       Type: 'AWS::EC2::Subnet'
       Properties:
         VpcId: !Ref deadlineVPC
         CidrBlock: 100.100.16.0/22
         AvailabilityZone: !Join
           - ''
           - - !Ref 'AWS::Region'
             - a
     deadlineSubnetRouteTableAssociation0:
       Type: 'AWS::EC2::SubnetRouteTableAssociation'
       Properties:
         RouteTableId: !Ref deadlinePublicRouteTable
         SubnetId: !Ref deadlinePublicSubnet0
     deadlinePublicSubnet1:
       Type: 'AWS::EC2::Subnet'
       Properties:
         VpcId: !Ref deadlineVPC
         CidrBlock: 100.100.20.0/22
         AvailabilityZone: !Join
           - ''
           - - !Ref 'AWS::Region'
             - c
     deadlineSubnetRouteTableAssociation1:
       Type: 'AWS::EC2::SubnetRouteTableAssociation'
       Properties:
         RouteTableId: !Ref deadlinePublicRouteTable
         SubnetId: !Ref deadlinePublicSubnet1
     deadlineInstanceAccessAccessRole:
       Type: 'AWS::IAM::Role'
       Properties:
         RoleName: !Join
           - '-'
           - - deadline
             - InstanceAccess
             - !Ref FleetId
         AssumeRolePolicyDocument:
           Statement:
             - Effect: Allow
               Principal:
                 Service: ec2.amazonaws.com
               Action:
                 - 'sts:AssumeRole'
         Path: /     
         ManagedPolicyArns:
           - 'arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy'
           - 'arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore'
           - 'arn:aws:iam::aws:policy/AWSDeadlineCloud-WorkerHost'  
     deadlineInstanceProfile:
       Type: 'AWS::IAM::InstanceProfile'
       Properties:
         Path: /
         Roles:
           - !Ref deadlineInstanceAccessAccessRole
     deadlineLaunchTemplate:
       Type: 'AWS::EC2::LaunchTemplate'
       Properties:
         LaunchTemplateName: !Join
           - ''
           - - deadline-LT-
             - !Ref FleetId
         LaunchTemplateData:
           NetworkInterfaces:
             - DeviceIndex: 0
               AssociatePublicIpAddress: true
               Groups:
                 - !Ref deadlineWorkerSecurityGroup
               DeleteOnTermination: true
           ImageId: !Ref AMIId
           InstanceInitiatedShutdownBehavior: terminate
           IamInstanceProfile:
             Arn: !GetAtt
               - deadlineInstanceProfile
               - Arn
           MetadataOptions:
             HttpTokens: required
             HttpEndpoint: enabled
   
     deadlineAutoScalingGroup:
       Type: 'AWS::AutoScaling::AutoScalingGroup'
       Properties:
         AutoScalingGroupName: !Join
           - ''
           - - deadline-ASG-autoscalable-
             - !Ref FleetId
         MinSize: 0
         MaxSize: 10
         VPCZoneIdentifier:
           - !Ref deadlinePublicSubnet0
           - !Ref deadlinePublicSubnet1
         NewInstancesProtectedFromScaleIn: true
         MixedInstancesPolicy:
           InstancesDistribution:
             OnDemandBaseCapacity: 0
             OnDemandPercentageAboveBaseCapacity: 0
             SpotAllocationStrategy: capacity-optimized
             OnDemandAllocationStrategy: lowest-price
           LaunchTemplate:
             LaunchTemplateSpecification:
               LaunchTemplateId: !Ref deadlineLaunchTemplate
               Version: !GetAtt
                 - deadlineLaunchTemplate
                 - LatestVersionNumber
             Overrides:
               - InstanceType: m5.large
               - InstanceType: m5d.large
               - InstanceType: m5a.large
               - InstanceType: m5ad.large
               - InstanceType: m5n.large
               - InstanceType: m5dn.large
               - InstanceType: m4.large
               - InstanceType: m3.large
               - InstanceType: r5.large
               - InstanceType: r5d.large
               - InstanceType: r5a.large
               - InstanceType: r5ad.large
               - InstanceType: r5n.large
               - InstanceType: r5dn.large
               - InstanceType: r4.large
         MetricsCollection:
           - Granularity: 1Minute
             Metrics:
               - GroupMinSize
               - GroupMaxSize
               - GroupDesiredCapacity
               - GroupInServiceInstances
               - GroupTotalInstances
               - GroupInServiceCapacity
               - GroupTotalCapacity
   ```

1. Open the CloudFormation console at [https://console.aws.amazon.com/cloudformation](https://console.aws.amazon.com/cloudformation/).

   Use the CloudFormation console to create a stack using the instructions for uploading the template file that you created. For more information, see [Creating a stack on the CloudFormation console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html) in the *AWS CloudFormation User Guide*.
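
If you prefer to script the stack creation instead of using the console, the request can be sketched in Python. This is a minimal sketch and not part of the procedure above: the helper names `build_stack_request` and `create_fleet_stack` are hypothetical, and it assumes a client object such as `boto3.client("cloudformation")` with credentials already configured. Because the template sets a custom `RoleName` on an IAM role, the request acknowledges the `CAPABILITY_NAMED_IAM` capability.

```python
def build_stack_request(stack_name, template_body, farm_id, fleet_id, ami_id):
    """Build the keyword arguments for a CloudFormation CreateStack call."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        # Parameter keys must match the Parameters section of the template
        "Parameters": [
            {"ParameterKey": "FarmId", "ParameterValue": farm_id},
            {"ParameterKey": "FleetId", "ParameterValue": fleet_id},
            {"ParameterKey": "AMIId", "ParameterValue": ami_id},
        ],
        # The template names its IAM role explicitly, so this capability is needed
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def create_fleet_stack(cfn_client, **request_args):
    """Create the stack using any client that exposes create_stack().

    cfn_client is assumed to be boto3.client("cloudformation") in real use.
    """
    return cfn_client.create_stack(**build_stack_request(**request_args))
```

With boto3 installed, a call might look like `create_fleet_stack(boto3.client("cloudformation"), stack_name="deadline-fleet", template_body=open("fleet.yaml").read(), farm_id="FARM_ID", fleet_id="FLEET_ID", ami_id="AMI_ID")`, where the stack name and file name are placeholders.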

**Note**  
Credentials from the IAM role that is attached to your worker's Amazon EC2 instance are available to *all* processes running on that worker, including jobs. Grant the worker only the least privileges that it needs to operate: `deadline:CreateWorker` and `deadline:AssumeFleetRoleForWorker`.
The worker agent obtains credentials for the queue role and configures them for use by running jobs. The Amazon EC2 instance profile role shouldn't include permissions that are needed by your jobs.

## Auto scale your Amazon EC2 fleet with Deadline Cloud scale recommendation feature

Deadline Cloud uses an Amazon EC2 Auto Scaling (Auto Scaling) group to scale the Amazon EC2 customer-managed fleet (CMF) automatically. To make your fleet auto scale, you need to configure the fleet mode and deploy the required infrastructure in your account. The infrastructure that you deploy works for all fleets, so you only need to set it up once.

The basic workflow is as follows. You configure your fleet mode to auto scale, and Deadline Cloud then sends an EventBridge event for that fleet whenever the recommended fleet size changes. Each event contains the fleet ID, the recommended fleet size, and other metadata. An EventBridge rule filters the relevant events, and a Lambda function consumes them. The Lambda function integrates with the Amazon EC2 Auto Scaling `AutoScalingGroup` to scale the Amazon EC2 fleet automatically.
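
The filtering step in this workflow can be pictured with a simplified matcher. This is a sketch only: `matches_pattern` is a hypothetical helper that models just exact matching on top-level string fields, which is all that the event pattern in the template later in this section relies on. Real EventBridge patterns also support nested fields and other operators.

```python
# Top-level fields that the rule matches on, copied from the template's EventPattern
EVENT_PATTERN = {
    "source": ["aws.deadline"],
    "detail-type": ["Fleet Size Recommendation Change"],
}


def matches_pattern(event, pattern=EVENT_PATTERN):
    """Return True if, for every pattern field, the event's value is in the allowed list."""
    return all(event.get(field) in allowed for field, allowed in pattern.items())


recommendation = {
    "source": "aws.deadline",
    "detail-type": "Fleet Size Recommendation Change",
    "detail": {"fleetId": "fleet-12345678900000000000000000000000", "newFleetSize": 5},
}
unrelated = {"source": "aws.ec2", "detail-type": "EC2 Instance State-change Notification"}

print(matches_pattern(recommendation))  # True
print(matches_pattern(unrelated))       # False
```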

### Set fleet mode to `EVENT_BASED_AUTO_SCALING`

Configure your fleet mode to `EVENT_BASED_AUTO_SCALING`. You can use the console to do this, or use the AWS CLI to directly call the `CreateFleet` or `UpdateFleet` API. After the mode is configured, Deadline Cloud starts sending EventBridge events whenever the recommended fleet size changes.
+ Example `UpdateFleet` command: 

  ```
  aws deadline update-fleet \
    --farm-id FARM_ID \
    --fleet-id FLEET_ID \
    --configuration file://configuration.json
  ```
+ Example `CreateFleet` command: 

  ```
  aws deadline create-fleet \
    --farm-id FARM_ID \
    --display-name "Fleet name" \
    --max-worker-count 10 \
    --configuration file://configuration.json
  ```

The following is an example of the `configuration.json` file used in the preceding CLI commands (`--configuration file://configuration.json`).
+ To enable Auto Scaling on your fleet, set the mode to `EVENT_BASED_AUTO_SCALING`.
+ The `workerCapabilities` are the default values assigned to the CMF when you created it. You can change these values if you need to increase the resources available to your CMF.

After you configure the fleet mode, Deadline Cloud starts emitting fleet size recommendation events for that fleet.

```
{
    "customerManaged": {
        "mode": "EVENT_BASED_AUTO_SCALING",       
        "workerCapabilities": {
            "vCpuCount": {
                "min": 1,
                "max": 4
            },
            "memoryMiB": {
                "min": 1024,
                "max": 4096
            },
            "osFamily": "linux",
            "cpuArchitectureType": "x86_64"
        }
    }
}
```
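
As a small illustration, the same file can be generated from Python rather than written by hand. The helper names `build_fleet_configuration` and `write_configuration` and their defaults are hypothetical; the default values mirror the example above, and the range check is an added convenience rather than a documented requirement.

```python
import json


def build_fleet_configuration(min_vcpus=1, max_vcpus=4,
                              min_memory_mib=1024, max_memory_mib=4096,
                              os_family="linux", cpu_architecture="x86_64"):
    """Return a customer-managed fleet configuration with event-based auto scaling."""
    if min_vcpus > max_vcpus or min_memory_mib > max_memory_mib:
        raise ValueError("minimum values must not exceed maximum values")
    return {
        "customerManaged": {
            "mode": "EVENT_BASED_AUTO_SCALING",
            "workerCapabilities": {
                "vCpuCount": {"min": min_vcpus, "max": max_vcpus},
                "memoryMiB": {"min": min_memory_mib, "max": max_memory_mib},
                "osFamily": os_family,
                "cpuArchitectureType": cpu_architecture,
            },
        }
    }


def write_configuration(path, configuration):
    """Write the configuration as JSON so the CLI can read it with file://."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(configuration, handle, indent=4)
```

Calling `write_configuration("configuration.json", build_fleet_configuration())` produces a file equivalent to the example, which you can then pass to `--configuration file://configuration.json`.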

### Deploy the Auto Scaling stack using the CloudFormation template

You can set up an EventBridge rule to filter events, a Lambda function to consume the events and control Auto Scaling, and an Amazon SQS queue to store unprocessed events. Use the following CloudFormation template to deploy all of these resources in one stack. After the resources are deployed successfully, you can submit a job, and the fleet scales up automatically.

```
Resources:
  AutoScalingLambda:
    Type: 'AWS::Lambda::Function'
    Properties:
      Code:
        ZipFile: |-
          """
          This lambda is configured to handle "Fleet Size Recommendation Change"
          messages. It will handle all such events, and requires
          that the ASG is named based on the fleet id. It will scale up/down the fleet
          based on the recommended fleet size in the message.
          
          Example EventBridge message:
          {
              "version": "0",
              "id": "6a7e8feb-b491-4cf7-a9f1-bf3703467718",
              "detail-type": "Fleet Size Recommendation Change",
              "source": "aws.deadline",
              "account": "111122223333",
              "time": "2017-12-22T18:43:48Z",
              "region": "us-west-1",
              "resources": [],
              "detail": {
                  "farmId": "farm-12345678900000000000000000000000",
                  "fleetId": "fleet-12345678900000000000000000000000",
                  "oldFleetSize": 1,
                  "newFleetSize": 5
              }
          }
          """
          
          import json
          import boto3
          import logging

          logger = logging.getLogger()
          logger.setLevel(logging.INFO)

          auto_scaling_client = boto3.client("autoscaling")

          def lambda_handler(event, context):
              logger.info(event)
              event_detail = event["detail"]
              fleet_id = event_detail["fleetId"]
              desired_capacity = event_detail["newFleetSize"]

              asg_name = f"deadline-ASG-autoscalable-{fleet_id}"
              auto_scaling_client.set_desired_capacity(
                  AutoScalingGroupName=asg_name,
                  DesiredCapacity=desired_capacity,
                  HonorCooldown=False,
              )

              return {
                  'statusCode': 200,
                  'body': json.dumps(f'Successfully set desired_capacity for {asg_name} to {desired_capacity}')
              }
      Handler: index.lambda_handler
      Role: !GetAtt 
        - AutoScalingLambdaServiceRole
        - Arn
      Runtime: python3.11
    DependsOn:
      - AutoScalingLambdaServiceRoleDefaultPolicy
      - AutoScalingLambdaServiceRole
  AutoScalingEventRule:
    Type: 'AWS::Events::Rule'
    Properties:
      EventPattern:
        source:
          - aws.deadline
        detail-type:
          - Fleet Size Recommendation Change
      State: ENABLED
      Targets:
        - Arn: !GetAtt 
            - AutoScalingLambda
            - Arn
          DeadLetterConfig:
            Arn: !GetAtt 
              - UnprocessedAutoScalingEventQueue
              - Arn
          Id: Target0
          RetryPolicy:
            MaximumRetryAttempts: 15
  AutoScalingEventRuleTargetPermission:
    Type: 'AWS::Lambda::Permission'
    Properties:
      Action: 'lambda:InvokeFunction'
      FunctionName: !GetAtt 
        - AutoScalingLambda
        - Arn
      Principal: events.amazonaws.com
      SourceArn: !GetAtt 
        - AutoScalingEventRule
        - Arn
  AutoScalingLambdaServiceRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action: 'sts:AssumeRole'
            Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
        Version: 2012-10-17
      ManagedPolicyArns:
        - !Join 
          - ''
          - - 'arn:'
            - !Ref 'AWS::Partition'
            - ':iam::aws:policy/service-role/AWSLambdaBasicExecutionRole'
  AutoScalingLambdaServiceRoleDefaultPolicy:
    Type: 'AWS::IAM::Policy'
    Properties:
      PolicyDocument:
        Statement:
          - Action: 'autoscaling:SetDesiredCapacity'
            Effect: Allow
            Resource: '*'
        Version: 2012-10-17
      PolicyName: AutoScalingLambdaServiceRoleDefaultPolicy
      Roles:
        - !Ref AutoScalingLambdaServiceRole
  UnprocessedAutoScalingEventQueue:
    Type: 'AWS::SQS::Queue'
    Properties:
      QueueName: deadline-unprocessed-autoscaling-events
    UpdateReplacePolicy: Delete
    DeletionPolicy: Delete
  UnprocessedAutoScalingEventQueuePolicy:
    Type: 'AWS::SQS::QueuePolicy'
    Properties:
      PolicyDocument:
        Statement:
          - Action: 'sqs:SendMessage'
            Condition:
              ArnEquals:
                'aws:SourceArn': !GetAtt 
                  - AutoScalingEventRule
                  - Arn
            Effect: Allow
            Principal:
              Service: events.amazonaws.com
            Resource: !GetAtt 
              - UnprocessedAutoScalingEventQueue
              - Arn
        Version: 2012-10-17
      Queues:
        - !Ref UnprocessedAutoScalingEventQueue
```
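
Before deploying, it can help to exercise the handler logic locally with a stub in place of the Auto Scaling client. The following is a sketch only: `FakeAutoScalingClient` and `scale_fleet` are hypothetical names, and `scale_fleet` repeats the Lambda function's logic with the client passed in as an argument so that it runs without AWS credentials. It also shows why the Auto Scaling group must be named `deadline-ASG-autoscalable-` followed by the fleet ID, as in the fleet infrastructure template.

```python
class FakeAutoScalingClient:
    """Records calls instead of contacting Amazon EC2 Auto Scaling."""

    def __init__(self):
        self.calls = []

    def set_desired_capacity(self, **kwargs):
        self.calls.append(kwargs)


def scale_fleet(event, client):
    """Mirror of the Lambda handler: map a recommendation event to an ASG update."""
    detail = event["detail"]
    # The ASG name is derived from the fleet ID, so the two templates must agree
    asg_name = f"deadline-ASG-autoscalable-{detail['fleetId']}"
    client.set_desired_capacity(
        AutoScalingGroupName=asg_name,
        DesiredCapacity=detail["newFleetSize"],
        HonorCooldown=False,
    )
    return asg_name


sample_event = {
    "detail-type": "Fleet Size Recommendation Change",
    "source": "aws.deadline",
    "detail": {
        "farmId": "farm-12345678900000000000000000000000",
        "fleetId": "fleet-12345678900000000000000000000000",
        "oldFleetSize": 1,
        "newFleetSize": 5,
    },
}

client = FakeAutoScalingClient()
asg_name = scale_fleet(sample_event, client)
print(asg_name, client.calls[0]["DesiredCapacity"])
```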

## Perform a fleet health check

After creating your fleet, build a custom health check to ensure that your fleet remains healthy and free of stalled instances, which helps prevent unnecessary costs. A health check can lower the risk of an accidental change to your Amazon Machine Image, launch template, or network configuration going undetected. See [Deploying a Deadline Cloud fleet health check](https://github.com/aws-deadline/deadline-cloud-samples/tree/mainline/cloudformation/farm_templates/cmf_templates) on GitHub.