

# AWS DeepRacer reward function reference

The following is the technical reference for the AWS DeepRacer reward function.

**Topics**
+ [Input parameters of the AWS DeepRacer reward function](deepracer-reward-function-input.md)
+ [AWS DeepRacer reward function examples](deepracer-reward-function-examples.md)

# Input parameters of the AWS DeepRacer reward function

The AWS DeepRacer reward function takes a dictionary object as the input. 

```
def reward_function(params):
    
    reward = ...

    return float(reward)
```

The `params` dictionary object contains the following key-value pairs:

```
{
    "all_wheels_on_track": Boolean,        # flag to indicate if the agent is on the track
    "x": float,                            # agent's x-coordinate in meters
    "y": float,                            # agent's y-coordinate in meters
    "closest_objects": [int, int],         # zero-based indices of the two closest objects to the agent's current position of (x, y)
    "closest_waypoints": [int, int],       # zero-based indices of the two nearest waypoints
    "distance_from_center": float,         # distance in meters from the track center
    "is_crashed": Boolean,                 # flag to indicate whether the agent has crashed
    "is_left_of_center": Boolean,          # flag to indicate whether the agent is to the left of the track center
    "is_offtrack": Boolean,                # flag to indicate whether the agent has gone off track
    "is_reversed": Boolean,                # flag to indicate if the agent is driving clockwise (True) or counterclockwise (False)
    "heading": float,                      # agent's yaw in degrees
    "objects_distance": [float, ],         # list of the objects' distances in meters between 0 and track_length in relation to the starting line
    "objects_heading": [float, ],          # list of the objects' headings in degrees between -180 and 180
    "objects_left_of_center": [Boolean, ], # list of Boolean flags indicating whether each object is left of the center (True) or not (False)
    "objects_location": [(float, float),], # list of object locations [(x,y), ...]
    "objects_speed": [float, ],            # list of the objects' speeds in meters per second
    "progress": float,                     # percentage of track completed
    "speed": float,                        # agent's speed in meters per second (m/s)
    "steering_angle": float,               # agent's steering angle in degrees
    "steps": int,                          # number of steps completed
    "track_length": float,                 # track length in meters
    "track_width": float,                  # width of the track in meters
    "waypoints": [(float, float), ]        # list of (x,y) as milestones along the track center
}
```
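
For quick experimentation outside the AWS DeepRacer console, a reward function can be called locally with a hand-built `params` dictionary. The following is a minimal sketch of that idea. The `sample_params` name and all of its values are made up for illustration and are not output from a real simulation, and only a subset of the keys is filled in.

```python
def reward_function(params):
    '''
    Toy reward: full reward on track, near-zero reward otherwise
    '''
    if params['all_wheels_on_track']:
        reward = 1.0
    else:
        reward = 1e-3

    return float(reward)


# Hypothetical values for illustration only, not taken from a real simulation
sample_params = {
    "all_wheels_on_track": True,
    "x": 0.5,
    "y": 0.1,
    "closest_waypoints": [0, 1],
    "distance_from_center": 0.1,
    "is_left_of_center": True,
    "is_offtrack": False,
    "heading": 12.0,
    "progress": 10.0,
    "speed": 1.5,
    "steering_angle": -5.0,
    "steps": 30,
    "track_width": 0.6,
    "waypoints": [(0.0, 0.0), (1.0, 0.0)],
}

print(reward_function(sample_params))
```

Changing individual values in the dictionary and calling the function again is a cheap way to check each branch before training.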

A more detailed technical reference of the input parameters is as follows. 

## all\_wheels\_on\_track


**Type**: `Boolean`

**Range**: `(True:False)`

A `Boolean` flag to indicate whether the agent is on-track or off-track. It's off-track (`False`) if any of its wheels are outside of the track borders. It's on-track (`True`) if all of the wheels are inside the two track borders. The following illustration shows that the agent is on-track. 

![\[\]](http://docs.aws.amazon.com/deepracer/latest/developerguide/images/deepracer-reward-function-input-all_wheels_on_track-true.png)


The following illustration shows that the agent is off-track.

![\[\]](http://docs.aws.amazon.com/deepracer/latest/developerguide/images/deepracer-reward-function-input-all_wheels_on_track-false.png)


**Example**: *A reward function using the `all_wheels_on_track` parameter*

```
def reward_function(params):
    #############################################################################
    '''
    Example of using all_wheels_on_track and speed
    '''

    # Read input variables
    all_wheels_on_track = params['all_wheels_on_track']
    speed = params['speed']

    # Set the speed threshold based on your action space
    SPEED_THRESHOLD = 1.0

    if not all_wheels_on_track:
        # Penalize if the car goes off track
        reward = 1e-3
    elif speed < SPEED_THRESHOLD:
        # Penalize if the car goes too slow
        reward = 0.5
    else:
        # High reward if the car stays on track and goes fast
        reward = 1.0

    return float(reward)
```

## closest\_waypoints


**Type**: `[int, int]`

**Range**: `[(0:Max-1),(1:Max-1)]`

The zero-based indices of the two neighboring `waypoint`s closest to the agent's current position of `(x, y)`. The distance is measured as the Euclidean distance from the center of the agent. The first element refers to the closest waypoint behind the agent and the second element refers to the closest waypoint in front of the agent. `Max` is the length of the waypoints list. In the illustration shown in [waypoints](#reward-function-input-waypoints), the `closest_waypoints` would be `[16, 17]`.

**Example**: A reward function using the `closest_waypoints` parameter.

The following example reward function demonstrates how to use `waypoints` and `closest_waypoints` as well as `heading` to calculate immediate rewards.

AWS DeepRacer supports the following libraries: math, random, NumPy, SciPy, and Shapely. To use one, add an import statement, `import supported library`, above your function definition, `def function_name(parameters)`.

```
# Place import statement outside of function (supported libraries: math, random, numpy, scipy, and shapely)
# Example imports of available libraries
#
# import math
# import random
# import numpy
# import scipy
# import shapely

import math

def reward_function(params):
    ###############################################################################
    '''
    Example of using waypoints and heading to make the car point in the right direction
    '''

    # Read input variables
    waypoints = params['waypoints']
    closest_waypoints = params['closest_waypoints']
    heading = params['heading']

    # Initialize the reward with typical value
    reward = 1.0

    # Calculate the direction of the center line based on the closest waypoints
    next_point = waypoints[closest_waypoints[1]]
    prev_point = waypoints[closest_waypoints[0]]

    # Calculate the direction in radians with atan2(dy, dx); the result is in the range (-pi, pi)
    track_direction = math.atan2(next_point[1] - prev_point[1], next_point[0] - prev_point[0])
    # Convert to degrees
    track_direction = math.degrees(track_direction)

    # Calculate the difference between the track direction and the heading direction of the car
    direction_diff = abs(track_direction - heading)
    if direction_diff > 180:
        direction_diff = 360 - direction_diff

    # Penalize the reward if the difference is too large
    DIRECTION_THRESHOLD = 10.0
    if direction_diff > DIRECTION_THRESHOLD:
        reward *= 0.5

    return float(reward)
```

## closest\_objects


**Type**: `[int, int]`

**Range**: `[(0:len(objects_location)-1), (0:len(objects_location)-1)]`

 The zero-based indices of the two closest objects to the agent's current position of (x, y). The first index refers to the closest object behind the agent, and the second index refers to the closest object in front of the agent. If there is only one object, both indices are 0. 
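
As a rough illustration, the second index can be combined with `objects_location` to measure the gap to the object ahead. This is a sketch, not an official example: the `MIN_GAP` threshold is an arbitrary illustrative value, and the early return for an empty object list is a defensive assumption about tracks without objects.

```python
import math

def reward_function(params):
    '''
    Example of using closest_objects to keep a gap to the object ahead
    '''
    # Read input variables
    objects_location = params['objects_location']
    agent_x = params['x']
    agent_y = params['y']

    # Initialize the reward with typical value
    reward = 1.0

    # Assumption: with no objects on the track there is nothing to avoid
    if not objects_location:
        return float(reward)

    # The second index is the closest object in front of the agent
    _, next_object_index = params['closest_objects']
    next_x, next_y = objects_location[next_object_index]

    # Euclidean distance in meters between the agent and that object
    distance = math.hypot(next_x - agent_x, next_y - agent_y)

    # Illustrative threshold in meters, not an official value
    MIN_GAP = 0.5
    if distance < MIN_GAP:
        reward *= 0.2

    return float(reward)
```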

## distance\_from\_center


**Type**: `float`

**Range**: `0:~track_width/2`

Displacement, in meters, between the agent center and the track center. The observable maximum displacement occurs when any of the agent's wheels are outside a track border and, depending on the width of the track border, can be slightly smaller or larger than half the `track_width`.

![\[\]](http://docs.aws.amazon.com/deepracer/latest/developerguide/images/deepracer-reward-function-input-distance_from_center.png)


**Example:** *A reward function using the `distance_from_center` parameter*

```
def reward_function(params):
    #################################################################################
    '''
    Example of using distance from the center
    '''

    # Read input variable
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Penalize if the car is too far away from the center
    marker_1 = 0.1 * track_width
    marker_2 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    else:
        reward = 1e-3  # likely crashed/ close to off track

    return float(reward)
```

## heading


**Type**: `float`

**Range**: `-180:+180`

Heading direction, in degrees, of the agent with respect to the x-axis of the coordinate system.

![\[\]](http://docs.aws.amazon.com/deepracer/latest/developerguide/images/deepracer-reward-function-input-heading.png)


**Example:** *A reward function using the `heading` parameter*

For more information, see [`closest_waypoints`](#reward-function-input-closest_waypoints).
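
Because `heading` wraps around at ±180 degrees, comparing it with another direction needs care: a heading of 170 and a direction of -170 are only 20 degrees apart. The following sketch isolates that calculation. `heading_difference` is a hypothetical helper name, not part of the AWS DeepRacer API.

```python
def heading_difference(heading, target_direction):
    '''
    Smallest absolute angle, in degrees, between two directions
    that are each expressed in the -180:+180 range
    '''
    diff = abs(target_direction - heading) % 360.0
    # Angles above 180 are shorter when measured the other way around
    if diff > 180.0:
        diff = 360.0 - diff
    return diff
```

The result is always between 0 and 180, so it can be compared directly with a threshold such as `DIRECTION_THRESHOLD`.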

## is\_crashed


**Type**: `Boolean`

**Range**: `(True:False)`

A Boolean flag to indicate whether the agent has crashed into another object (`True`) or not (`False`) as a termination status. 

## is\_left\_of\_center


**Type**: `Boolean`

**Range**: `[True : False]`

A `Boolean` flag to indicate whether the agent is to the left of the track center (`True`) or to the right (`False`).
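
Since `distance_from_center` is unsigned, one possible use of this flag is to turn it into a signed lateral offset. The sketch below assumes a convention of positive values to the left and negative values to the right. `signed_offset` is a hypothetical helper name.

```python
def signed_offset(params):
    '''
    Lateral offset in meters from the track center:
    positive when left of center, negative when right of center
    '''
    distance = params['distance_from_center']
    if params['is_left_of_center']:
        return distance
    return -distance
```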

## is\_offtrack


**Type**: `Boolean`

**Range**: `(True:False)`

A `Boolean` flag to indicate whether the agent has gone off track (`True`) or not (`False`) as a termination status.
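
A minimal sketch of how the two termination flags, `is_crashed` and `is_offtrack`, might be used together follows. The reward values are illustrative choices, not recommended settings.

```python
def reward_function(params):
    '''
    Example of using is_crashed and is_offtrack
    '''
    # Read input variables
    is_crashed = params['is_crashed']
    is_offtrack = params['is_offtrack']

    if is_crashed or is_offtrack:
        # Near-zero reward when the agent has crashed or left the track
        reward = 1e-3
    else:
        reward = 1.0

    return float(reward)
```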

## is\_reversed


**Type**: `Boolean`

**Range**: `[True:False]`

A `Boolean` flag to indicate whether the agent is driving clockwise (`True`) or counterclockwise (`False`).

It's used when you enable direction change for each episode. 

## objects\_distance


**Type**: `[float, … ]`

**Range**: `[(0:track_length), … ]`

A list of the objects' distances from the starting line. The ith element measures the distance in meters between the ith object and the starting line along the track center line.

**Note**  
`abs(var1 - var2)` is how close the car is to an object along the track center line, where `var1 = params["objects_distance"][index]` and `var2 = params["progress"] / 100 * params["track_length"]` (`progress` is a percentage, so it is divided by 100).  
To get the indices of the closest object in front of the vehicle and the closest object behind the vehicle, use the `closest_objects` parameter.
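
The calculation in the note can be sketched as follows. Two assumptions are made here that the reference does not state: the agent's `progress` is measured from the starting line, and the track is a loop, so the gap wraps around at `track_length`. `track_gap_to_object` is a hypothetical helper name.

```python
def track_gap_to_object(params, index):
    '''
    Distance in meters along the track center line from the agent
    forward to the object at the given index
    '''
    track_length = params['track_length']

    # progress is a percentage (0:100), so convert it to meters
    # Assumption: progress is measured from the starting line
    agent_distance = params['progress'] / 100.0 * track_length
    object_distance = params['objects_distance'][index]

    # Assumption: looped track, so an object "behind" the agent
    # is reached after wrapping around the starting line
    return (object_distance - agent_distance) % track_length
```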

## objects\_heading


**Type**: `[float, … ]`

**Range**: `[(-180:180), … ]`

List of the headings of objects in degrees. The ith element measures the heading of the ith object. For stationary objects, their headings are 0. For a bot vehicle, the corresponding element's value is the vehicle's heading angle.

## objects\_left\_of\_center


**Type**: `[Boolean, … ]`

**Range**: `[True|False, … ]`

List of Boolean flags. The ith element value indicates whether the ith object is to the left (True) or right (False) side of the track center. 

## objects\_location


**Type**: `[(x,y), … ]`

**Range**: `[(0:N,0:N), … ]`

A list of all object locations. Each location is a tuple of ([x, y](#reward-function-input-x_y)).

The size of the list equals the number of objects on the track. The objects can be stationary obstacles or moving bot vehicles.
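
As an illustration, the list can be scanned to find the object nearest to the agent by straight-line distance, regardless of whether it is ahead or behind. This is a sketch. `nearest_object` is a hypothetical helper name, and returning `(None, None)` for an empty list is a choice made here.

```python
import math

def nearest_object(params):
    '''
    Return (index, distance in meters) of the object nearest to the agent,
    or (None, None) when there are no objects on the track
    '''
    agent_x = params['x']
    agent_y = params['y']
    locations = params['objects_location']

    if not locations:
        return None, None

    # Euclidean distance from the agent to every object
    distances = [math.hypot(x - agent_x, y - agent_y) for x, y in locations]
    index = min(range(len(distances)), key=distances.__getitem__)
    return index, distances[index]
```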

## objects\_speed


**Type**: `[float, … ]`

**Range**: `[(0:12.0), … ]`

List of speeds (meters per second) for the objects on the track. For stationary objects, their speeds are 0. For a bot vehicle, the value is the speed you set in training. 

## progress


**Type**: `float`

**Range**: `0:100`

Percentage of track completed.

**Example:** *A reward function using the `progress` parameter*

For more information, see [steps](#reward-function-input-steps).

## speed


**Type**: `float`

**Range**: `0.0:5.0`

The observed speed of the agent, in meters per second (m/s).

![\[\]](http://docs.aws.amazon.com/deepracer/latest/developerguide/images/deepracer-reward-function-input-speed.png)


**Example:** *A reward function using the `speed` parameter*

For more information, see [all\_wheels\_on\_track](#reward-function-input-all_wheels_on_track).

## steering\_angle


**Type**: `float`

**Range**: `-30:30`

Steering angle, in degrees, of the front wheels from the center line of the agent. The negative sign (-) means steering to the right and the positive sign (+) means steering to the left. The agent center line is not necessarily parallel with the track center line, as shown in the following illustration.

![\[\]](http://docs.aws.amazon.com/deepracer/latest/developerguide/images/deepracer-reward-function-steering.png)


**Example:** *A reward function using the `steering_angle` parameter*

```
def reward_function(params):
    '''
    Example of using steering angle
    '''

    # Read input variable
    abs_steering = abs(params['steering_angle']) # We don't care whether it is left or right steering

    # Initialize the reward with typical value
    reward = 1.0

    # Penalize if the car steers too much, to prevent zigzag
    ABS_STEERING_THRESHOLD = 20.0
    if abs_steering > ABS_STEERING_THRESHOLD:
        reward *= 0.8

    return float(reward)
```

## steps


**Type**: `int`

**Range**: `0:Nstep`

Number of steps completed. A step corresponds to an action taken by the agent following the current policy.

**Example:** *A reward function using the `steps` parameter*

```
def reward_function(params):
    #############################################################################
    '''
    Example of using steps and progress
    '''

    # Read input variable
    steps = params['steps']
    progress = params['progress']

    # Total number of steps we want the car to take to finish the lap; it varies with the track length
    TOTAL_NUM_STEPS = 300

    # Initialize the reward with typical value
    reward = 1.0

    # Give additional reward every 100 steps if the car is progressing faster than expected
    if (steps % 100) == 0 and progress > (steps / TOTAL_NUM_STEPS) * 100:
        reward += 10.0

    return float(reward)
```

## track\_length


**Type**: `float`

**Range**: `[0:Lmax]`

The track length in meters. `Lmax` is track-dependent.
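
One possible use of `track_length` is to convert `progress`, which is a percentage, into meters. The sketch below estimates the distance still to cover in the current lap. `remaining_distance` is a hypothetical helper name.

```python
def remaining_distance(params):
    '''
    Approximate distance in meters left to complete the lap
    '''
    # progress is a percentage in the range 0:100
    remaining_fraction = (100.0 - params['progress']) / 100.0
    return remaining_fraction * params['track_length']
```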

## track\_width


**Type**: `float`

**Range**: `0:Dtrack`

Track width in meters.

![\[\]](http://docs.aws.amazon.com/deepracer/latest/developerguide/images/deepracer-reward-function-input-track_width.png)


**Example:** *A reward function using the `track_width` parameter*

```
def reward_function(params):
    #############################################################################
    '''
    Example of using track width
    '''

    # Read input variable
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Calculate the distance from each border
    distance_from_border = 0.5 * track_width - distance_from_center

    # Reward higher if the car stays inside the track borders
    if distance_from_border >= 0.05:
        reward = 1.0
    else:
        reward = 1e-3 # Low reward if too close to the border or goes off the track

    return float(reward)
```

## x, y


**Type**: `float`

**Range**: `0:N`

Location, in meters, of the agent center along the x and y axes, of the simulated environment containing the track. The origin is at the lower-left corner of the simulated environment.

![\[\]](http://docs.aws.amazon.com/deepracer/latest/developerguide/images/deepracer-reward-function-input-x-y.png)
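
Because `x`, `y`, and `waypoints` share the same coordinate system, they can be combined directly. The following sketch measures the straight-line distance from the agent to the next waypoint. `distance_to_next_waypoint` is a hypothetical helper name.

```python
import math

def distance_to_next_waypoint(params):
    '''
    Euclidean distance in meters from the agent center to the
    closest waypoint in front of the agent
    '''
    # The second element of closest_waypoints is the waypoint ahead
    next_index = params['closest_waypoints'][1]
    waypoint_x, waypoint_y = params['waypoints'][next_index]
    return math.hypot(waypoint_x - params['x'], waypoint_y - params['y'])
```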


## waypoints


**Type**: `list` of `[float, float]`

**Range**: `[[xw_0, yw_0] … [xw_Max-1, yw_Max-1]]`

An ordered list of track-dependent `Max` milestones along the track center. Each milestone is described by a coordinate of (`xw_i`, `yw_i`). For a looped track, the first and last waypoints are the same. For a straight or other non-looped track, the first and last waypoints are different.

![\[\]](http://docs.aws.amazon.com/deepracer/latest/developerguide/images/deepracer-reward-function-input-waypoints.png)


**Example:** *A reward function using the `waypoints` parameter*

For more information, see [`closest_waypoints`](#reward-function-input-closest_waypoints).
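
Since the waypoints are ordered along the track center, summing the straight segments between consecutive waypoints gives an approximation of the center line length. This is a sketch under the assumption that waypoints are spaced closely enough for straight segments to be a fair approximation. `center_line_length` is a hypothetical helper name.

```python
import math

def center_line_length(waypoints):
    '''
    Approximate length in meters of the track center line,
    measured as the sum of straight segments between waypoints
    '''
    total = 0.0
    # Pair each waypoint with the one that follows it
    for (x1, y1), (x2, y2) in zip(waypoints, waypoints[1:]):
        total += math.hypot(x2 - x1, y2 - y1)
    return total
```

For a looped track, the first and last waypoints coincide, so the closing segment is already part of the list and no extra step is needed.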

# AWS DeepRacer reward function examples

The following are some examples of AWS DeepRacer reward functions.

**Topics**
+ [Example 1: Follow the center line in time trials](#deepracer-reward-function-example-0)
+ [Example 2: Stay inside the two borders in time trials](#deepracer-reward-function-example-1)
+ [Example 3: Prevent zig-zag in time trials](#deepracer-reward-function-example-2)
+ [Example 4: Stay in one lane without crashing into stationary obstacles or moving vehicles](#deepracer-reward-function-example-3)

## Example 1: Follow the center line in time trials


 This example determines how far away the agent is from the center line, and gives higher reward if it is closer to the center of the track, encouraging the agent to closely follow the center line. 

```
def reward_function(params):
    '''
    Example of rewarding the agent to follow center line
    '''
    
    # Read input parameters
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Calculate 3 markers that are increasingly further away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed/ close to off track

    return float(reward)
```

## Example 2: Stay inside the two borders in time trials


 This example simply gives high rewards if the agent stays inside the borders, and lets the agent figure out the best path to finish a lap. It's easy to program and understand, but likely takes longer to converge. 

```
def reward_function(params):
    '''
    Example of rewarding the agent to stay inside the two borders of the track
    '''
    
    # Read input parameters
    all_wheels_on_track = params['all_wheels_on_track']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    
    # Give a very low reward by default
    reward = 1e-3

    # Give a high reward if no wheels go off the track and 
    # the car is somewhere in between the track borders 
    if all_wheels_on_track and (0.5*track_width - distance_from_center) >= 0.05:
        reward = 1.0

    # Always return a float value
    return float(reward)
```

## Example 3: Prevent zig-zag in time trials


 This example incentivizes the agent to follow the center line but penalizes with lower reward if it steers too much, which helps prevent zig-zag behavior. The agent learns to drive smoothly in the simulator and likely keeps the same behavior when deployed to the physical vehicle. 

```
def reward_function(params):
    '''
    Example of penalizing steering, which helps mitigate zig-zag behaviors
    '''
    
    # Read input parameters
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    abs_steering = abs(params['steering_angle']) # Only need the absolute steering angle

    # Calculate 3 markers that are farther and farther away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed/ close to off track

    # Steering penalty threshold, change the number based on your action space setting
    ABS_STEERING_THRESHOLD = 15

    # Penalize reward if the car is steering too much
    if abs_steering > ABS_STEERING_THRESHOLD:
        reward *= 0.8

    return float(reward)
```

## Example 4: Stay in one lane without crashing into stationary obstacles or moving vehicles


 

This reward function rewards the agent for staying inside the track's borders and penalizes the agent for getting too close to objects in front of it. The agent can move from lane to lane to avoid crashes. The total reward is a weighted sum of the reward and the penalty. The example gives more weight to the penalty in an effort to avoid crashes. Experiment with different weights to train for different behavior outcomes.

 

```
import math


def reward_function(params):
    '''
    Example of rewarding the agent to stay inside two borders
    and penalizing getting too close to the objects in front
    '''
    # Read input parameters
    all_wheels_on_track = params['all_wheels_on_track']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    objects_location = params['objects_location']
    agent_x = params['x']
    agent_y = params['y']
    _, next_object_index = params['closest_objects']
    objects_left_of_center = params['objects_left_of_center']
    is_left_of_center = params['is_left_of_center']

    # Initialize reward with a small number but not zero
    # because zero means off-track or crashed
    reward = 1e-3

    # Reward if the agent stays inside the two borders of the track
    if all_wheels_on_track and (0.5 * track_width - distance_from_center) >= 0.05:
        reward_lane = 1.0
    else:
        reward_lane = 1e-3

    # Penalize if the agent is too close to the next object
    reward_avoid = 1.0

    # Distance to the next object
    next_object_loc = objects_location[next_object_index]
    distance_closest_object = math.sqrt((agent_x - next_object_loc[0])**2 + (agent_y - next_object_loc[1])**2)

    # Decide if the agent and the next object are in the same lane
    is_same_lane = objects_left_of_center[next_object_index] == is_left_of_center
    if is_same_lane:
        if 0.5 <= distance_closest_object < 0.8:
            reward_avoid *= 0.5
        elif 0.3 <= distance_closest_object < 0.5:
            reward_avoid *= 0.2
        elif distance_closest_object < 0.3:
            reward_avoid = 1e-3  # Likely crashed

    # Calculate reward by putting different weights on
    # the two aspects above
    reward += 1.0 * reward_lane + 4.0 * reward_avoid

    return float(reward)
```