# Implementing reward functions

## Overview

A reward function (also called a scorer or grader) is the core component that evaluates model responses and provides the feedback signal for training. It must be implemented as a Lambda function that accepts model responses and returns reward scores.

## Interface format

Your reward function must accept and return data in the following format:

**Sample training input**

```
{
  "messages": [
    {
      "role": "user",
      "content": "Do you have a dedicated security team?"
    }
  ],
  "reference_answer": {
    "compliant": "No",
    "explanation": "As an AI developed by Company, I do not have a traditional security team..."
  }
}
```

**Sample payload for the reward Lambda**

The container automatically transforms your data before sending it to the Lambda function. It does the following:

1. Generates a model response for each prompt
1. Appends the assistant turn (the generated response) to the messages array
1. Adds a unique `id` field for tracking

Your Lambda function receives data in this transformed format:

```
{
  "id": "123",
  "messages": [
    {
      "role": "user",
      "content": "Do you have a dedicated security team?"
    },
    {
      "role": "assistant",
      "content": "As an AI developed by Amazon, I do not have a dedicated security team..."
    }
  ],
  # Following section will be same as your training dataset sample
  "reference_answer": {
    "compliant": "No",
    "explanation": "As an AI developed by Company, I do not have a traditional security team..."
  }
}
```

**Reward Lambda contract**

```
def lambda_handler(event, context):
    return lambda_grader(event)


def lambda_grader(samples: list[dict]) -> list[dict]:
    """
    Args:
        samples: List of dictionaries in OpenAI format

        Example input sample:
        {
            "id": "123",
            "messages": [
                {
                    "role": "user",
                    "content": "Do you have a dedicated security team?"
                },
                {
                    "role": "assistant",
                    "content": "As an AI developed by Company, I do not have a dedicated security team..."
                }
            ],
            # This section will be same as your training dataset
            "reference_answer": {
                "compliant": "No",
                "explanation": "As an AI developed by Company, I do not have a traditional security team..."
            }
        }

    Returns:
        List of dictionaries with reward scores:
        {
            "id": str,                       # Same id as input sample
            "aggregate_reward_score": float, # Overall score for the sample
            "metrics_list": [                # OPTIONAL: Component scores
                {
                    "name": str,   # Name of the component score
                    "value": float, # Value of the component score
                    "type": str    # "Reward" or "Metric"
                }
            ]
        }
    """
```

## Input and output fields

### Input fields

| Field | Description | Notes |
| --- | --- | --- |
| id | Unique identifier for the sample | Echoed back in the output. String format |
| messages | Ordered chat history in OpenAI format | Array of message objects |
| messages[].role | Speaker of the message | Common values: "user", "assistant", "system" |
| messages[].content | Text content of the message | Plain string |
| \*\*metadata | Free-form information to aid grading | Object; optional fields passed through from the training data |

### Output fields

| Field | Description | Notes |
| --- | --- | --- |
| id | Same identifier as the input sample | Must match the input |
| aggregate\_reward\_score | Overall score for the sample | Float (for example, 0.0–1.0 or a task-defined range) |
| metrics\_list | Component scores that make up the aggregate | Array of metric objects |

## Technical constraints

+ **Timeout limit** – Each Lambda invocation can run for at most 15 minutes
+ **Concurrency** – Must handle `rollout_worker_replicas * 64` concurrent requests
+ **Reliability** – Must implement proper error handling and consistently return valid scores
+ **Performance** – Optimize for fast execution (seconds, not minutes) to enable efficient training

**Best practices**

+ Minimize external API calls
+ Use efficient algorithms and data structures
+ Implement retry logic for transient failures
+ Cache reusable computations
+ Test thoroughly before training to ensure error-free execution

## Using custom reward functions

Implement a custom reward function when you have task-specific evaluation criteria:

+ **Define evaluation criteria** – Decide what makes a good response for your task
+ **Implement the Lambda function** – Create a Lambda function that follows the interface format
+ **Test locally** – Verify that your function returns correct scores for sample inputs
+ **Deploy to AWS** – Deploy your Lambda and note its ARN
+ **Configure the recipe** – Add the Lambda ARN to the `reward_lambda_arn` field of your recipe
+ **Test with a small dataset** – Run RFT with minimal data to verify the integration

## IAM permissions

### Required permissions

Your SageMaker execution role must have permission to invoke the Lambda function. Add this policy to the SageMaker execution role:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "lambda:InvokeFunction"
      ],
      "Resource": "arn:aws:lambda:region:account-id:function:function-name"
    }
  ]
}
```

### Lambda execution role

The execution role of your Lambda function needs basic Lambda execution permissions:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}
```

Additional permissions: If your Lambda function accesses other AWS services (for example, S3 for reference data or DynamoDB for logging), add those permissions to the Lambda execution role.

## Example: LLM-as-a-judge reward function

This example shows how to use an Amazon Bedrock model as a judge that evaluates model responses by comparing them against reference answers. The Lambda template provides a framework in which customers implement the calls to Amazon Bedrock that perform the judge evaluation. The Lambda function keeps the same input/output contract as other reward functions.

### Implementation

This Lambda function implements a two-stage evaluation process: `lambda_handler` extracts the model response and the reference answer from each incoming sample, and the `lambda_graded` function then calls Amazon Bedrock to score the semantic similarity between them. The implementation includes error handling that automatically retries transient failures, and it supports flexible reference answer formats (plain strings and structured dictionaries).

**Implementation details:**

+ **Retry logic**: Implements exponential backoff for throttling exceptions to handle Bedrock API rate limits (with the default of three attempts, the waits are 1 s and 2 s)
+ **Error handling**: Returns a score of 0.0 for failed evaluations instead of raising an exception
+ **Deterministic scoring**: Uses temperature=0.0 to keep scores consistent across evaluations
+ **Flexible reference formats**: Automatically handles both string and dictionary reference answers
+ **Score clamping**: Ensures that all scores fall within the valid [0.0, 1.0] range
+ **Model agnostic**: Change JUDGE\_MODEL\_ID to use any Amazon Bedrock model (Nova, Llama, Mistral, and so on)

```
"""
LLM Judge Lambda POC - Working implementation using Amazon Bedrock
"""
import json
import time

import boto3

bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')

JUDGE_MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

SYSTEM_PROMPT = "You must output ONLY a number between 0.0 and 1.0. No explanations, no text, just the number."

JUDGE_PROMPT_TEMPLATE = """Compare the following two responses and rate how similar they are on a scale of 0.0 to 1.0, where:
- 1.0 means the responses are semantically equivalent (same meaning, even if worded differently)
- 0.5 means the responses are partially similar
- 0.0 means the responses are completely different or contradictory

Response A: {response_a}

Response B: {response_b}

Output ONLY a number between 0.0 and 1.0. No explanations."""


def lambda_graded(response_a: str, response_b: str, max_retries: int = 3) -> float:
    """Call Bedrock to compare responses and return a similarity score in [0.0, 1.0]."""
    prompt = JUDGE_PROMPT_TEMPLATE.format(response_a=response_a, response_b=response_b)

    for attempt in range(max_retries):
        try:
            response = bedrock_runtime.converse(
                modelId=JUDGE_MODEL_ID,
                messages=[{"role": "user", "content": [{"text": prompt}]}],
                system=[{"text": SYSTEM_PROMPT}],
                inferenceConfig={"temperature": 0.0, "maxTokens": 10}
            )
            print(f"Bedrock call successful: {response}")
            output = response['output']['message']['content'][0]['text'].strip()
            score = float(output)
            print(f"Score parsed: {score}")
            # Clamp to the valid range
            return max(0.0, min(1.0, score))
        except Exception as e:
            if "ThrottlingException" in str(e) and attempt < max_retries - 1:
                # Exponential backoff: 1 s, then 2 s
                time.sleep(2 ** attempt)
            else:
                print(f"Bedrock call failed: {e}")
                # Failed evaluations score 0.0 so the output stays a valid float
                return 0.0
    return 0.0


def lambda_handler(event, context):
    """AWS Lambda handler - processes samples from RFTEvalInvoker."""
    try:
        samples = event if isinstance(event, list) else [event]
        results = []

        for sample in samples:
            sample_id = sample.get("id", "unknown")
            messages = sample.get("messages", [])

            # Extract assistant response (response A). The generated response is
            # appended last, so search from the end of the conversation.
            response_a = ""
            for msg in reversed(messages):
                if msg.get("role") in ["assistant", "nova_assistant"]:
                    response_a = msg.get("content", "")
                    break

            # Extract reference answer from root level (no longer in metadata)
            reference_answer = sample.get("reference_answer", "")

            # Handle both string and dict reference_answer formats
            if isinstance(reference_answer, dict):
                # If reference_answer is a dict, extract the explanation or compliant field
                response_b = reference_answer.get("explanation", reference_answer.get("compliant", ""))
            else:
                response_b = reference_answer

            if not response_a or not response_b:
                results.append({
                    "id": sample_id,
                    "aggregate_reward_score": 0.0,
                    "metrics_list": [{"name": "similarity_score", "value": 0.0, "type": "Metric"}]
                })
                continue

            # Get similarity score
            score = lambda_graded(response_a, response_b)

            results.append({
                "id": sample_id,
                "aggregate_reward_score": score,
                "metrics_list": [
                    {
                        "name": "similarity_score",
                        "value": score,
                        "type": "Metric"
                    }
                ]
            })

        return {"statusCode": 200, "body": json.dumps(results)}

    except Exception as e:
        print(f"Error: {e}")
        return {"statusCode": 500, "body": json.dumps({"error": str(e)})}
```

### Input format

The Lambda receives the same input format as other reward functions:

```
{
  "id": "sample-001",
  "messages": [
    {
      "role": "user",
      "content": "Do you have a dedicated security team?"
    },
    {
      "role": "assistant",
      "content": "As an AI developed by Amazon, I don't have a dedicated security team..."
    }
  ],
  "reference_answer": {
    "compliant": "No",
    "explanation": "As an AI developed by Company, I do not have a traditional security team..."
  },
  "my_custom_field": "custom_value"
}
```

### Output format

```
{
  "id": "sample-001",
  "aggregate_reward_score": 0.85,
  "metrics_list": [
    {
      "name": "similarity_score",
      "value": 0.85,
      "type": "Metric"
    }
  ]
}
```

### Deployment considerations

You might also need to adjust the prompt template and inference parameters to match the capabilities and API format of your chosen model.

+ **IAM permissions**: The Lambda execution role must have the `bedrock:InvokeModel` permission for your chosen model
+ **Timeout**: Set the Lambda timeout to at least 60 seconds to accommodate Bedrock API latency and retries
+ **Region**: Deploy in a Region where your chosen Bedrock model is available
+ **Cost**: Monitor Bedrock API usage, because each evaluation makes one API call per sample
+ **Throughput**: For large-scale evaluations, request increased Bedrock quotas to avoid throttling

**Increasing Bedrock throughput**

If you encounter throttling during evaluation, increase your Bedrock model quotas:

+ Navigate to the AWS Service Quotas console
+ Search for "Bedrock" and select your Region
+ Find the quota for your chosen model (for example, "Invocations per minute for Claude 3.5 Sonnet")
+ Choose "Request quota increase" and specify the throughput you need
+ Provide a justification for the increase (for example, "RFT evaluation workload")

The Lambda's built-in retry logic handles occasional throttling, but sustained high-volume evaluation requires an appropriate quota increase.

**Required IAM policy:**

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/*"
    }
  ]
}
```
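
**Local contract check (illustrative)**

The "test locally" step and the reliability requirement described earlier both come down to one question: does every result carry a matching `id` and a valid numeric score? The sketch below is one possible way to check that before deploying. It is not part of any AWS SDK: `validate_reward_output` and `exact_match_grader` are hypothetical names introduced here, and the checks assume only the field names and the `"Reward"`/`"Metric"` type values shown in the reward Lambda contract.

```python
import math

# Metric types named in the reward Lambda contract
VALID_METRIC_TYPES = {"Reward", "Metric"}


def _is_number(value) -> bool:
    """True for finite ints/floats; bool is rejected even though it subclasses int."""
    return (
        not isinstance(value, bool)
        and isinstance(value, (int, float))
        and math.isfinite(value)
    )


def validate_reward_output(samples: list[dict], results: list[dict]) -> list[str]:
    """Return a list of contract violations; an empty list means no problems were found."""
    errors = []
    if len(results) != len(samples):
        errors.append(f"expected {len(samples)} results, got {len(results)}")

    expected_ids = {sample.get("id") for sample in samples}
    for i, result in enumerate(results):
        if result.get("id") not in expected_ids:
            errors.append(f"result {i}: id {result.get('id')!r} matches no input sample")
        if not _is_number(result.get("aggregate_reward_score")):
            errors.append(f"result {i}: aggregate_reward_score must be a finite number")
        # metrics_list is optional in the contract
        for metric in result.get("metrics_list", []):
            if not isinstance(metric.get("name"), str):
                errors.append(f"result {i}: metric name must be a string")
            if not _is_number(metric.get("value")):
                errors.append(f"result {i}: metric value must be a finite number")
            if metric.get("type") not in VALID_METRIC_TYPES:
                errors.append(f"result {i}: metric type must be 'Reward' or 'Metric'")
    return errors


def exact_match_grader(samples: list[dict]) -> list[dict]:
    """Hypothetical stand-in grader: 1.0 if the last message equals the reference string."""
    results = []
    for sample in samples:
        reply = sample["messages"][-1]["content"].strip().lower()
        reference = str(sample.get("reference_answer", "")).strip().lower()
        score = 1.0 if reply == reference else 0.0
        results.append({
            "id": sample["id"],
            "aggregate_reward_score": score,
            "metrics_list": [{"name": "exact_match", "value": score, "type": "Reward"}],
        })
    return results


if __name__ == "__main__":
    demo_samples = [{
        "id": "123",
        "messages": [
            {"role": "user", "content": "What is 2 + 2?"},
            {"role": "assistant", "content": "4"},
        ],
        "reference_answer": "4",
    }]
    problems = validate_reward_output(demo_samples, exact_match_grader(demo_samples))
    print(problems or "output matches the contract")
```

To check your own grader, replace `exact_match_grader` with it and pass samples shaped like the transformed payload. A grader that returns `None` as a score, or a metric type other than the two listed, shows up as an entry in the returned list rather than as a failure during training.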