# AWS DeepRacer reward function reference

The following is the technical reference for the AWS DeepRacer reward function.

**Topics**
+ [AWS DeepRacer reward function input parameters](deepracer-reward-function-input.md)
+ [AWS DeepRacer reward function examples](deepracer-reward-function-examples.md)

# AWS DeepRacer reward function input parameters

The AWS DeepRacer reward function takes a dictionary object as its input.

```
def reward_function(params):
    reward = ...  # compute the reward from the values in params
    return float(reward)
```

The `params` dictionary object contains the following key-value pairs:

```
{
    "all_wheels_on_track": Boolean,        # flag to indicate if the agent is on the track
    "x": float,                            # agent's x-coordinate in meters
    "y": float,                            # agent's y-coordinate in meters
    "closest_objects": [int, int],         # zero-based indices of the two closest objects to the agent's current position of (x, y).
    "closest_waypoints": [int, int],       # indices of the two nearest waypoints.
    "distance_from_center": float,         # distance in meters from the track center
    "is_crashed": Boolean,                 # Boolean flag to indicate whether the agent has crashed.
    "is_left_of_center": Boolean,          # flag to indicate whether the agent is left of the track center (True) or not (False).
    "is_offtrack": Boolean,                # Boolean flag to indicate whether the agent has gone off track.
    "is_reversed": Boolean,                # flag to indicate if the agent is driving clockwise (True) or counterclockwise (False).
    "heading": float,                      # agent's yaw in degrees
    "objects_distance": [float, ],         # list of the objects' distances in meters between 0 and track_length in relation to the starting line.
    "objects_heading": [float, ],          # list of the objects' headings in degrees between -180 and 180.
    "objects_left_of_center": [Boolean, ], # list of Boolean flags indicating whether each object is left of the center (True) or not (False).
    "objects_location": [(float, float),], # list of object locations [(x,y), ...].
    "objects_speed": [float, ],            # list of the objects' speeds in meters per second.
    "progress": float,                     # percentage of track completed
    "speed": float,                        # agent's speed in meters per second (m/s)
    "steering_angle": float,               # agent's steering angle in degrees
    "steps": int,                          # number of steps completed
    "track_length": float,                 # track length in meters.
    "track_width": float,                  # width of the track in meters
    "waypoints": [(float, float), ]        # list of (x,y) as milestones along the track center
}
```

The following is a more detailed technical reference for the input parameters.

## all_wheels_on_track

**Type:** `Boolean`
**Range:** `(True:False)`

A `Boolean` flag that indicates whether the agent is on-track or off-track. The agent is off-track (`False`) if any of its wheels are outside of the track borders. It is on-track (`True`) if all of its wheels are inside the two track borders. The following illustration shows that the agent is on-track.

![Agent with all wheels on the track](http://docs.aws.amazon.com/ja_jp/deepracer/latest/developerguide/images/deepracer-reward-function-input-all_wheels_on_track-true.png)

The following illustration shows that the agent is off-track.

![Agent with a wheel off the track](http://docs.aws.amazon.com/ja_jp/deepracer/latest/developerguide/images/deepracer-reward-function-input-all_wheels_on_track-false.png)

**Example:** *A reward function using the `all_wheels_on_track` parameter*

```
def reward_function(params):
    #############################################################################
    '''
    Example of using all_wheels_on_track and speed
    '''

    # Read input variables
    all_wheels_on_track = params['all_wheels_on_track']
    speed = params['speed']

    # Set the speed threshold based on your action space
    SPEED_THRESHOLD = 1.0

    if not all_wheels_on_track:
        # Penalize if the car goes off track
        reward = 1e-3
    elif speed < SPEED_THRESHOLD:
        # Penalize if the car goes too slow
        reward = 0.5
    else:
        # High reward if the car stays on track and goes fast
        reward = 1.0

    return float(reward)
```

## closest_waypoints

**Type:** `[int, int]`
**Range:** `[(0:Max-1),(1:Max-1)]`

The zero-based indices of the two neighboring `waypoint`s closest to the agent's current position of `(x, y)`. The distance is measured by the Euclidean distance from the center of the agent. The first element refers to the closest waypoint behind the agent, and the second element refers to the closest waypoint in front of the agent. `Max` is the length of the waypoints list. In the illustration shown in [Waypoints](#reward-function-input-waypoints), `closest_waypoints` would be `[16, 17]`.

**Example:**
*A reward function using the `closest_waypoints` parameter.* The following example reward function demonstrates how to use `waypoints`, `closest_waypoints`, and `heading` to calculate immediate rewards.

AWS DeepRacer supports the following libraries: math, random, NumPy, SciPy, and Shapely. To use one, add an import statement, `import supported library`, above your function definition, `def function_name(parameters)`.

```
# Place import statement outside of function (supported libraries: math, random, numpy, scipy, and shapely)
# Example imports of available libraries
#
# import math
# import random
# import numpy
# import scipy
# import shapely

import math

def reward_function(params):
    ###############################################################################
    '''
    Example of using waypoints and heading to make the car point in the right direction
    '''

    # Read input variables
    waypoints = params['waypoints']
    closest_waypoints = params['closest_waypoints']
    heading = params['heading']

    # Initialize the reward with typical value
    reward = 1.0

    # Calculate the direction of the center line based on the closest waypoints
    next_point = waypoints[closest_waypoints[1]]
    prev_point = waypoints[closest_waypoints[0]]

    # Calculate the direction in radians with atan2(dy, dx); the result is between -pi and pi
    track_direction = math.atan2(next_point[1] - prev_point[1], next_point[0] - prev_point[0])
    # Convert to degrees
    track_direction = math.degrees(track_direction)

    # Calculate the difference between the track direction and the heading direction of the car
    direction_diff = abs(track_direction - heading)
    if direction_diff > 180:
        direction_diff = 360 - direction_diff

    # Penalize the reward if the difference is too large
    DIRECTION_THRESHOLD = 10.0
    if direction_diff > DIRECTION_THRESHOLD:
        reward *= 0.5

    return float(reward)
```

## closest_objects

**Type:** `[int, int]`
**Range:** `[(0:len(objects_location)-1), (0:len(objects_location)-1)]`

The zero-based indices of the two objects closest to the agent's current position of (x, y). The first index refers to the closest object behind the agent, and the second index refers to the closest object in front of the agent. If there is only one object, both indices are 0.

## distance_from_center

**Type:**
`float`
**Range:** `0:~track_width/2`

The displacement, in meters, between the agent center and the track center. The maximum observable displacement occurs when any of the agent's wheels are outside a track border and, depending on the width of the track border, can be slightly smaller or larger than half of `track_width`.

![distance_from_center illustration](http://docs.aws.amazon.com/ja_jp/deepracer/latest/developerguide/images/deepracer-reward-function-input-distance_from_center.png)

**Example:** *A reward function using the `distance_from_center` parameter*

```
def reward_function(params):
    #################################################################################
    '''
    Example of using distance from the center
    '''

    # Read input variables
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Penalize if the car is too far away from the center
    marker_1 = 0.1 * track_width
    marker_2 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    else:
        reward = 1e-3  # likely crashed/ close to off track

    return float(reward)
```

## heading

**Type:** `float`
**Range:** `-180:+180`

The heading direction, in degrees, of the agent with respect to the x-axis of the coordinate system.

![heading illustration](http://docs.aws.amazon.com/ja_jp/deepracer/latest/developerguide/images/deepracer-reward-function-input-heading.png)

**Example:** *A reward function using the `heading` parameter*

For more information, see [`closest_waypoints`](#reward-function-input-closest_waypoints).

## is_crashed

**Type:** `Boolean`
**Range:** `(True:False)`

A Boolean flag that indicates whether the agent has crashed into another object (`True`) or not (`False`) as a termination status.

## is_left_of_center

**Type:** `Boolean`
**Range:** `[True : False]`

A `Boolean` flag that indicates whether the agent is on the left side of the track center (`True`) or on the right side (`False`).

## is_offtrack

**Type:** `Boolean`
**Range:** `(True:False)`

A Boolean flag that indicates whether the agent has gone off track (`True`) or not (`False`) as a termination status.

## is_reversed

**Type:** `Boolean`
**Range:** `[True:False]`

A Boolean flag that indicates whether the agent is driving clockwise (`True`) or counterclockwise (`False`). It is used when you enable a change of direction for each episode.

## objects_distance

**Type:** `[float, … ]`
**Range:** `[(0:track_length), … ]`

A list of the distances of the objects in the environment, measured relative to the starting line. The ith element measures the distance, in meters, between the ith object and the starting line along the track center line.

**Note**

How close the car is to an object, measured along the track, is `abs(var1 - var2)`, where `var1 = params["objects_distance"][index]` and `var2 = params["progress"] / 100 * params["track_length"]` (`progress` is a percentage, so it is divided by 100).

To get the indices of the closest object in front of the vehicle and the closest object behind the vehicle, use the `closest_objects` parameter.

## objects_heading

**Type:** `[float, … ]`
**Range:** `[(-180:180), … ]`

A list of the headings of the objects, in degrees. The ith element measures the heading of the ith object. For stationary objects, the heading is 0. For bot vehicles, the corresponding element's value is the vehicle's heading angle.

## objects_left_of_center

**Type:** `[Boolean, … ]`
**Range:** `[True|False, … ]`

A list of Boolean flags. The ith element's value indicates whether the ith object is on the left side (`True`) or the right side (`False`) of the track center.

## objects_location

**Type:** `[(x,y), … ]`
**Range:** `[(0:N,0:N), … ]`

A list of all object locations; each location is a tuple of ([x, y](#reward-function-input-x_y)). The size of the list equals the number of objects on the track. Note that the objects can be stationary obstacles or moving bot vehicles.

## objects_speed

**Type:** `[float, … ]`
**Range:** `[(0:12.0), … ]`

A list of the speeds, in meters per second, of the objects on the track. For stationary objects, the speed is 0. For bot vehicles, the value is the speed you set in training.

## progress

**Type:** `float`
**Range:** `0:100`

The percentage of the track completed.

**Example:** *A reward function using the `progress` parameter*

For more information, see [steps](#reward-function-input-steps).

## speed

**Type:** `float`
**Range:** `0.0:5.0`

The observed speed of the agent, in meters per second (m/s).

![speed illustration](http://docs.aws.amazon.com/ja_jp/deepracer/latest/developerguide/images/deepracer-reward-function-input-speed.png)

**Example:** *A reward function using the `speed` parameter*

For more information, see [all_wheels_on_track](#reward-function-input-all_wheels_on_track).

## steering_angle

**Type:** `float`
**Range:** `-30:30`

The steering angle, in degrees, of the front wheels from the center line of the agent. A negative sign (-) means steering to the right, and a positive sign (+) means steering to the left. The agent center line is not necessarily parallel with the track center line, as shown in the following illustration.

![steering_angle illustration](http://docs.aws.amazon.com/ja_jp/deepracer/latest/developerguide/images/deepracer-reward-function-steering.png)

**Example:** *A reward function using the `steering_angle` parameter*

```
def reward_function(params):
    '''
    Example of using steering angle
    '''

    # Read input variable
    abs_steering = abs(params['steering_angle'])  # We don't care whether it is left or right steering

    # Initialize the reward with typical value
    reward = 1.0

    # Penalize if the car steers too much, to prevent zigzag driving
    ABS_STEERING_THRESHOLD = 20.0
    if abs_steering > ABS_STEERING_THRESHOLD:
        reward *= 0.8

    return float(reward)
```

## steps

**Type:** `int`
**Range:** `0:Nstep`

The number of steps completed. A step corresponds to one action taken by the agent following the current policy.

**Example:** *A reward function using the `steps` parameter*

```
def reward_function(params):
    #############################################################################
    '''
    Example of using steps and progress
    '''

    # Read input variables
    steps = params['steps']
    progress = params['progress']

    # Total number of steps we want the car to take to finish the lap; it varies with the track length
    TOTAL_NUM_STEPS = 300

    # Initialize the reward with typical value
    reward = 1.0

    # Give additional reward if the car passes every 100 steps faster than expected
    if (steps % 100) == 0 and progress > (steps / TOTAL_NUM_STEPS) * 100:
        reward += 10.0

    return float(reward)
```

## track_length

**Type:** `float`
**Range:** `[0:Lmax]`

The track length, in meters. `Lmax` is track-dependent.

## track_width

**Type:** `float`
**Range:** `0:Dtrack`

The track width, in meters.

![track_width illustration](http://docs.aws.amazon.com/ja_jp/deepracer/latest/developerguide/images/deepracer-reward-function-input-track_width.png)

**Example:** *A reward function using the `track_width` parameter*

```
def reward_function(params):
    #############################################################################
    '''
    Example of using track width
    '''

    # Read input variables
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Calculate the distance from the nearer track border
    distance_from_border = 0.5 * track_width - distance_from_center

    # Reward higher if the car stays inside the track borders
    if distance_from_border >= 0.05:
        reward = 1.0
    else:
        reward = 1e-3  # Low reward if too close to the border or goes off the track

    return float(reward)
```

## x, y

**Type:** `float`
**Range:** `0:N`

The location, in meters, of the agent center along the x and y axes of the simulated environment containing the track. The origin is at the lower-left corner of the simulated environment.

![x, y illustration](http://docs.aws.amazon.com/ja_jp/deepracer/latest/developerguide/images/deepracer-reward-function-input-x-y.png)

## waypoints

**Type:** `list` of `[float, float]`
**Range:** [[x<sub>w,0</sub>, y<sub>w,0</sub>] … [x<sub>w,Max-1</sub>, y<sub>w,Max-1</sub>]]

An ordered list of track-dependent `Max` milestones along the track center. Each milestone is described by a coordinate of (x<sub>w,i</sub>, y<sub>w,i</sub>). For a looped track, the first and last waypoints are the same. For a straight or other non-looped track, the first and last waypoints are different.

![waypoints illustration](http://docs.aws.amazon.com/ja_jp/deepracer/latest/developerguide/images/deepracer-reward-function-input-waypoints.png)

**Example:** *A reward function using the `waypoints` parameter*

For more information, see [`closest_waypoints`](#reward-function-input-closest_waypoints).

# AWS DeepRacer reward function examples

The following are some examples of AWS DeepRacer reward functions.

**Topics**
+ [Example 1: Follow the center line in time trials](#deepracer-reward-function-example-0)
+ [Example 2: Stay inside the two borders in time trials](#deepracer-reward-function-example-1)
+ [Example 3: Prevent zig-zag driving in time trials](#deepracer-reward-function-example-2)
+ [Example 4: Stay in one lane without crashing into stationary obstacles or moving vehicles](#deepracer-reward-function-example-3)

## Example 1: Follow the center line in time trials

This example determines how far away the agent is from the center line and gives a higher reward when it is closer to the center of the track, encouraging the agent to closely follow the center line.

```
def reward_function(params):
    '''
    Example of rewarding the agent to follow center line
    '''

    # Read input parameters
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Calculate 3 markers that are increasingly further away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed/ close to off track

    return float(reward)
```

## Example 2: Stay inside the two borders in time trials

This example gives a high reward when the agent stays inside the borders and lets the agent figure out the best path to finish a lap. It is easy to program and understand, but it may take longer to converge.

```
def reward_function(params):
    '''
    Example of rewarding the agent to stay inside the two borders of the track
    '''

    # Read input parameters
    all_wheels_on_track = params['all_wheels_on_track']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']

    # Give a very low reward by default
    reward = 1e-3

    # Give a high reward if no wheels go off the track and
    # the car is somewhere in between the track borders
    if all_wheels_on_track and (0.5*track_width - distance_from_center) >= 0.05:
        reward = 1.0

    # Always return a float value
    return float(reward)
```

## Example 3: Prevent zig-zag driving in time trials

This example gives the agent an incentive to follow the center line but lowers the reward when it steers too much, which helps prevent zig-zag driving. If the agent learns to drive smoothly in the simulator, it is likely to keep the same behavior when deployed to a physical vehicle.

```
def reward_function(params):
    '''
    Example of penalizing steering, which helps mitigate zig-zag behaviors
    '''

    # Read input parameters
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    abs_steering = abs(params['steering_angle'])  # Only need the absolute steering angle

    # Calculate 3 markers that are farther and farther away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed/ close to off track

    # Steering penalty threshold; change the number based on your action space setting
    ABS_STEERING_THRESHOLD = 15

    # Penalize reward if the car is steering too much
    if abs_steering > ABS_STEERING_THRESHOLD:
        reward *= 0.8

    return float(reward)
```

## Example 4: Stay in one lane without crashing into stationary obstacles or moving vehicles

This reward function rewards the agent for staying between the track borders and penalizes it for getting too close to objects in front of it. The agent can move from lane to lane to avoid crashes. The total reward is a weighted sum of the lane reward and the avoidance reward. This example puts more weight on avoidance in order to prevent crashes. Experiment with different weights to train for different behavior outcomes.

```
import math

def reward_function(params):
    '''
    Example of rewarding the agent to stay inside two borders
    and penalizing getting too close to the objects in front
    '''

    all_wheels_on_track = params['all_wheels_on_track']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    objects_location = params['objects_location']
    agent_x = params['x']
    agent_y = params['y']
    _, next_object_index = params['closest_objects']
    objects_left_of_center = params['objects_left_of_center']
    is_left_of_center = params['is_left_of_center']

    # Initialize reward with a small number but not zero
    # because zero means off-track or crashed
    reward = 1e-3

    # Reward if the agent stays inside the two borders of the track
    if all_wheels_on_track and (0.5 * track_width - distance_from_center) >= 0.05:
        reward_lane = 1.0
    else:
        reward_lane = 1e-3

    # Penalize if the agent is too close to the next object
    reward_avoid = 1.0

    # Distance to the next object
    next_object_loc = objects_location[next_object_index]
    distance_closest_object = math.sqrt((agent_x - next_object_loc[0])**2 + (agent_y - next_object_loc[1])**2)

    # Decide if the agent and the next object are in the same lane
    is_same_lane = objects_left_of_center[next_object_index] == is_left_of_center

    if is_same_lane:
        if 0.5 <= distance_closest_object < 0.8:
            reward_avoid *= 0.5
        elif 0.3 <= distance_closest_object < 0.5:
            reward_avoid *= 0.2
        elif distance_closest_object < 0.3:
            reward_avoid = 1e-3  # Likely crashed

    # Calculate reward by putting different weights on
    # the two aspects above
    reward += 1.0 * reward_lane + 4.0 * reward_avoid

    return float(reward)
```
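Example 4 measures the straight-line (Euclidean) distance to the next object. The note under `objects_distance` describes an along-track measure instead, built from `objects_distance`, `progress`, and `track_length`. The following is a minimal sketch of that variant. It is not an official AWS DeepRacer example. It assumes that `progress / 100 * track_length` (where `progress` is a percentage) approximates the agent's distance from the starting line. The helper name `along_track_gap` and the 0.5 m and 1.0 m thresholds are hypothetical and chosen only for illustration.

```python
def along_track_gap(params):
    """Hypothetical helper: distance in meters, measured along the track
    center line, from the agent to the closest object in front of it.

    Assumption: progress (0-100) maps to the agent's distance from the
    starting line as progress / 100 * track_length.
    """
    track_length = params['track_length']
    agent_distance = params['progress'] / 100.0 * track_length
    # The second index of closest_objects is the closest object in front
    _, next_object_index = params['closest_objects']
    object_distance = params['objects_distance'][next_object_index]
    # Modulo keeps the gap positive when the object in front has already
    # crossed the starting line on a looped track
    return (object_distance - agent_distance) % track_length


def reward_function(params):
    '''
    Sketch: lower the reward when the next object in the same lane is close
    along the track. The thresholds are illustrative, not tuned values.
    '''
    reward = 1.0
    _, next_object_index = params['closest_objects']
    same_lane = params['objects_left_of_center'][next_object_index] == params['is_left_of_center']
    gap = along_track_gap(params)
    if same_lane and gap < 0.5:
        reward = 1e-3  # Likely about to crash
    elif same_lane and gap < 1.0:
        reward *= 0.5
    return float(reward)
```

For example, with `track_length = 20.0`, `progress = 95.0`, and the next object at an `objects_distance` of `0.3`, the gap is `(0.3 - 19.0) % 20.0`, or about 1.3 m. Without the modulo, the same situation would produce a large negative value.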