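# AWS Glue for Ray end of support

**Important**

AWS Glue for Ray will no longer be open to new customers starting April 30, 2026. If you would like to use AWS Glue for Ray, sign up prior to that date. Existing customers can continue to use the service as normal. For capabilities similar to AWS Glue for Ray, explore Amazon EKS. For more information, see [AWS Glue for Ray end of support](https://docs.aws.amazon.com/glue/latest/dg/awsglue-ray-jobs-availability-change.html).

After careful consideration, we have decided to close AWS Glue for Ray to new customers starting April 30, 2026. AWS continues to invest in security and availability improvements for AWS Glue for Ray. Note that we do not plan to introduce new features to AWS Glue for Ray, except for security and availability enhancements.

As an alternative to AWS Glue for Ray, we recommend Amazon Elastic Kubernetes Service (Amazon EKS). Amazon EKS is a fully managed, certified Kubernetes-conformant service that simplifies the process of building, securing, operating, and maintaining Kubernetes clusters on AWS. It is a highly customizable option that relies on the open-source KubeRay operator to deploy and manage Ray clusters on Kubernetes. This gives you better resource utilization, simpler infrastructure management, and full support for Ray features.

## Migrating Ray jobs to Amazon Elastic Kubernetes Service

This section describes the steps for migrating from AWS Glue for Ray to Ray on Amazon EKS. The steps apply to two migration scenarios:

+ **Standard migration (x86/amd64)**: In these use cases, the migration strategy uses the open-source Ray container for a basic implementation and runs scripts directly on the base container.
+ **ARM64 migration**: In these use cases, the migration strategy supports custom container builds for ARM64-specific dependencies and architecture requirements.

### Prerequisites for migration

Install the following CLI tools: **aws**, **kubectl**, **eksctl**, **helm**, and Python 3.9+. You need these tools to provision and manage your Ray on EKS environment. **eksctl** simplifies creating and managing EKS clusters. **kubectl** is the standard Kubernetes CLI for deploying and troubleshooting workloads on your cluster. **helm** installs and manages KubeRay, the operator that runs Ray on Kubernetes. You need Python 3.9+ for Ray itself and for running job submission scripts locally.

#### Install eksctl

Follow the instructions in [Installation options for Eksctl](https://docs.aws.amazon.com/eks/latest/eksctl/installation.html), or use the following instructions.

For macOS:

```
brew tap weaveworks/tap
brew install weaveworks/tap/eksctl
```

For Linux:

```
curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp

# Move the extracted binary to /usr/local/bin
sudo mv /tmp/eksctl /usr/local/bin

# Test the installation
eksctl version
```

#### Install kubectl

Follow the instructions in [Set up kubectl and eksctl](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html), or use the following instructions.

For macOS:

```
brew install kubectl
```

For Linux:

```
curl -LO \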
  "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin/
```

#### Install helm

Follow the instructions in [Installing Helm](https://helm.sh/docs/intro/install/), or use the following instructions.

For macOS:

```
brew install helm
```

For Linux:

```
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
```

### Step 1. Build or choose a Docker image for Ray

**Option 1: Use the official Ray image (no build required)**

This option uses the [official Ray Docker images on Docker Hub](https://hub.docker.com/u/rayproject), such as `rayproject/ray:2.4.0-py39`. The Ray project maintains these images.

**Note**

This image is amd64-only. Use this option if your dependencies are compatible with amd64 and you don't need ARM-specific builds.

**Option 2: Build and publish your own arm64 Ray 2.4.0 image**

This option is useful when you use Graviton (ARM) nodes, which is consistent with what AWS Glue for Ray uses internally. You can build a custom image pinned to the same dependency versions as AWS Glue for Ray to reduce compatibility mismatches. Create a Dockerfile locally:

```
# Build an ARM64 image
FROM --platform=linux/arm64 python:3.9-slim-bullseye

# Handy tools: wget for KubeRay probes; CA certs; keep image small
RUN apt-get update && apt-get install -y --no-install-recommends \
      wget ca-certificates \
 && rm -rf /var/lib/apt/lists/*

# Keep pip/setuptools modern enough for wheels resolution
RUN python -m pip install -U "pip<24" "setuptools<70" wheel

# ---- Install Ray 2.4.0 (ARM64 / Py3.9) and Glue-like dependencies ----
# 1) Download the exact Ray 2.4.0 wheel for aarch64 (no network at runtime)
RUN python -m pip download --only-binary=:all: --no-deps --dest /tmp/wheels ray==2.4.0

# 2) Core libs used in Glue (pin to Glue-era versions)
#    + the dashboard & jobs API dependencies compatible with Ray 2.4.0.
#    (Pins matter: newer major versions break 2.4.0's dashboard.)
RUN python -m pip install --no-cache-dir \
      /tmp/wheels/ray-2.4.0-*.whl \
      "pyarrow==11.0.0" \
      "pandas==1.5.3" \
      "boto3==1.26.133" \
      "botocore==1.29.133" \
      "numpy==1.24.3" \
      "fsspec==2023.4.0" \
      "protobuf<4" \
      # --- dashboard / jobs server deps ---
      "aiohttp==3.8.5" \
      "aiohttp-cors==0.7.0" \
      "yarl<1.10" "multidict<7.0" "frozenlist<1.4" "aiosignal<1.4" "async_timeout<5" \
      "pydantic<2" \
      "opencensus<0.12" \
      "prometheus_client<0.17" \
      # --- needed for s3:// URIs in the runtime env (for example, py_modules) ---
      "smart_open[s3]==6.4.0"

ENV PYTHONUNBUFFERED=1
WORKDIR /app

# Optional: print the Ray version and CPU architecture at container start.
# KubeRay overrides the start command; this is just a harmless default.
CMD ["python","-c","import ray,platform; print('Ray', ray.__version__, 'on', platform.machine())"]
```

Build the image for ARM64 and push it to Amazon ECR:

```
# Set environment variables
export AWS_REGION=us-east-1
export AWS_ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
export REPO=ray-2-4-arm64
export IMAGE=${AWS_ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com/${REPO}:v1

# Create the repository (ignore the error if it already exists) and log in
aws ecr create-repository --repository-name $REPO --region $AWS_REGION >/dev/null 2>&1 || true
aws ecr get-login-password --region $AWS_REGION \
  | docker login --username AWS --password-stdin ${AWS_ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com

# Enable Buildx (for cross-builds on non-ARM hosts)
docker buildx create --name multi --driver docker-container --use 2>/dev/null || true

# Build & push the ARM64 image
docker buildx build \
  --platform linux/arm64 \
  -t "$IMAGE" \
  . --push

# Inspect the pushed image remotely and check that its platform is linux/arm64
docker buildx imagetools inspect "$IMAGE"
```

When the image is pushed, reference it in your RayCluster spec and schedule its pods onto ARM64 nodes with `nodeSelector: { kubernetes.io/arch: arm64 }`:

```
spec:
  rayVersion: "2.4.0"
  headGroupSpec:
    template:
      spec:
        nodeSelector:
          kubernetes.io/arch: arm64
        containers:
          - name: ray-head
            # The $IMAGE URI that you pushed above
            image: <AWS_ACCOUNT>.dkr.ecr.<AWS_REGION>.amazonaws.com/ray-2-4-arm64:v1
```
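If you maintain several RayCluster specs (for example, one per former AWS Glue job), you might generate them instead of editing YAML by hand. The following Python sketch shows one way to do that. It assumes you pass the same account, Region, repository, and tag values used in the build script. The helper names `ecr_image_uri` and `arm64_ray_cluster`, the cluster name `glue-like-arm64`, and the account ID `111122223333` are hypothetical placeholders. The output is JSON, which `kubectl apply -f` accepts in place of YAML.

```python
import json


def ecr_image_uri(account: str, region: str, repo: str, tag: str) -> str:
    """Build the image URI in the same format as the IMAGE variable in the build script."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"


def arm64_ray_cluster(name: str, namespace: str, image: str) -> dict:
    """Return a minimal, head-only RayCluster manifest whose pod runs on arm64 nodes."""
    pod_spec = {
        # The image is built for linux/arm64 only, so pin the pod to arm64 nodes.
        "nodeSelector": {"kubernetes.io/arch": "arm64"},
        "containers": [{"name": "ray-head", "image": image}],
    }
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayCluster",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "rayVersion": "2.4.0",
            "headGroupSpec": {
                "rayStartParams": {},  # empty: keep KubeRay's defaults
                "template": {"spec": pod_spec},
            },
        },
    }


if __name__ == "__main__":
    # Placeholder values; replace them with your own account, Region, repo, and tag.
    uri = ecr_image_uri("111122223333", "us-east-1", "ray-2-4-arm64", "v1")
    # JSON is valid YAML, so kubectl apply -f accepts this output directly.
    print(json.dumps(arm64_ray_cluster("glue-like-arm64", "ray", uri), indent=2))
```

For example, you could redirect the output to a file such as `raycluster-arm64.json` and pass that file to `kubectl apply -f`. Generating the manifest keeps the image URI and the `nodeSelector` in sync, which matters because an arm64-only image fails to start on amd64 nodes.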
### Step 2. Convert AWS Glue for Ray job configuration to Ray on Amazon EKS

AWS Glue for Ray jobs support a set of job arguments that configure workers, dependencies, memory, and logging. When you migrate to Amazon EKS with KubeRay, translate these arguments into RayCluster spec fields or into Ray job runtime environment settings.

#### Job argument mapping

**AWS Glue for Ray arguments and their Ray on EKS equivalents**

| AWS Glue for Ray argument | What it does in AWS Glue for Ray | Ray on Amazon EKS equivalent |
| --- | --- | --- |
| `--min-workers` | Minimum number of workers that the job must be allocated. | `workerGroupSpecs[].minReplicas` in your RayCluster. |
| `--working-dir` | Distributes a zip file (S3 URI) to all nodes. | Ray runtime env `working_dir` if you submit from local files; `py_modules` pointing to the S3 zip if you use S3 artifacts. |
| `--s3-py-modules` | Adds Python wheels/dists from S3. | Ray runtime env: `py_modules: ["s3://.../xxx.whl", ...]` |
| `--pip-install` | Installs extra PyPI packages for the job. | Ray runtime env: `pip: ["pkg==ver", ...]`, passed through the Ray job CLI `--runtime-env-json` or the RayJob `runtimeEnvYAML`. |
| `--object_store_memory_head` | Percentage of memory for the head node's Plasma store. | `headGroupSpec.rayStartParams.object-store-memory` in your RayCluster. AWS Glue uses a percentage, whereas Ray expects a value in bytes. |
| `--object_store_memory_worker` | Percentage of memory for the worker nodes' Plasma store. | Same as above, but set in each worker group's `rayStartParams.object-store-memory` (in bytes). |
| `--object_spilling_config` | Configures Ray object spilling. | `headGroupSpec.rayStartParams.object-spilling-config` |
| `--logging_configuration` | AWS Glue managed logs (CloudWatch, S3). | Check pod stdout/stderr with `kubectl -n ray logs <pod-name> --follow`. You can also check logs in the Ray dashboard (port-forward to port 8265), where task and job logs are available. |

#### Job configuration mapping

**AWS Glue for Ray job configurations and their Ray on EKS equivalents**

| Configuration | What it does in AWS Glue for Ray | Ray on EKS equivalent |
| --- | --- | --- |
| Worker type | Sets the predefined worker type that is allowed when the job runs. Defaults to Z.2X (8 vCPU, 64 GB RAM). | The node group instance type in EKS (for example, r7g.2xlarge ≈ 8 vCPU / 64 GB for ARM, or r7a.2xlarge for x86). |
| Maximum number of workers | The number of workers that you want AWS Glue to allocate to this job. | Set `workerGroupSpecs[].maxReplicas` to the same number that you used in AWS Glue. This is the upper bound for autoscaling when the Ray autoscaler is enabled (`enableInTreeAutoscaling: true` in the RayCluster spec). Similarly, set `minReplicas` as the lower bound. You can start with `replicas: 0` and `minReplicas: 0`. |
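You can script the conversions described in the argument tables: the percentage-to-bytes conversion for `--object_store_memory_head` and `--object_store_memory_worker`, the mapping of `--pip-install` and `--s3-py-modules` onto a runtime environment, and the worker bounds. The following is a minimal sketch, not an official converter. It assumes that the AWS Glue percentage should be applied to the memory limit of the Ray container (for example, the `2Gi` limit in a pod spec). It also assumes that your AWS Glue job passes `--pip-install` and `--s3-py-modules` as comma-separated lists. All function names are hypothetical.

```python
import json

GIB = 1024**3  # bytes in one GiB (the Kubernetes "Gi" suffix)


def object_store_bytes(percent: float, container_memory_gib: float) -> int:
    """Convert a Glue-style object store percentage into the byte count Ray expects.

    Assumption: the percentage applies to the Ray container's memory limit.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    return int(container_memory_gib * GIB * percent / 100)


def ray_start_params(percent: float, container_memory_gib: float) -> dict:
    """rayStartParams entry for a head or worker group (KubeRay expects string values)."""
    return {"object-store-memory": str(object_store_bytes(percent, container_memory_gib))}


def _split_csv(value: str) -> list:
    """Split a comma-separated argument value, dropping blanks and surrounding spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]


def runtime_env_from_glue_args(job_args: dict) -> dict:
    """Map --pip-install and --s3-py-modules onto a Ray runtime_env dict."""
    env = {}
    if "--pip-install" in job_args:
        env["pip"] = _split_csv(job_args["--pip-install"])
    if "--s3-py-modules" in job_args:
        env["py_modules"] = _split_csv(job_args["--s3-py-modules"])
    return env


def worker_group_bounds(min_workers: int, max_workers: int) -> dict:
    """Replica fields for a worker group from --min-workers and the Glue maximum workers."""
    if min_workers > max_workers:
        raise ValueError("min_workers cannot exceed max_workers")
    return {"replicas": min_workers, "minReplicas": min_workers, "maxReplicas": max_workers}


if __name__ == "__main__":
    glue_args = {"--pip-install": "pandas==1.5.3,pyarrow==11.0.0"}
    # A JSON string that you could pass to `ray job submit --runtime-env-json`.
    print(json.dumps(runtime_env_from_glue_args(glue_args)))
    # 30% of a container limited to 2Gi of memory.
    print(ray_start_params(30, 2))
```

The values in `rayStartParams` are strings, which is why `ray_start_params` converts the byte count with `str()` before it goes into the RayCluster YAML.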
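### Step 3. Set up Amazon EKS

You can create a new Amazon EKS cluster or reuse an existing one. If you use an existing cluster, skip the cluster-creation command and go straight to adding node groups, setting up IRSA, and installing KubeRay.

#### Create an Amazon EKS cluster

**Note**

If you have an existing Amazon EKS cluster, skip the command that creates a new cluster and only add the node groups.

```
# Environment variables
export AWS_REGION=us-east-1
export AWS_ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
export CLUSTER=ray-eks
export NS=ray   # namespace for your Ray jobs (you can reuse another if you like)

# Create a cluster (OIDC is required for IRSA)
eksctl create cluster \
  --name $CLUSTER \
  --region $AWS_REGION \
  --with-oidc \
  --managed
```

#### Add node groups

```
# ARM/Graviton (matches Glue's typical runtime):
eksctl create nodegroup \
  --cluster $CLUSTER \
  --region $AWS_REGION \
  --name arm64-ng \
  --node-type m7g.large \
  --nodes 2 --nodes-min 1 --nodes-max 5 \
  --managed \
  --node-labels "workload=ray"

# x86/amd64 (use if your image is amd64-only):
eksctl create nodegroup \
  --cluster $CLUSTER \
  --region $AWS_REGION \
  --name amd64-ng \
  --node-type m5.large \
  --nodes 2 --nodes-min 1 --nodes-max 5 \
  --managed \
  --node-labels "workload=ray"
```

**Note**

If you use an existing Amazon EKS cluster, make sure that it has an IAM OIDC provider, which IRSA requires. The `--with-oidc` flag only applies when you create a cluster. For an existing cluster, run `eksctl utils associate-iam-oidc-provider --cluster $CLUSTER --region $AWS_REGION --approve`.

#### Create a namespace + IAM roles for service accounts (IRSA) for S3

A Kubernetes namespace is a logical grouping of resources (pods, services, roles, and so on). You can create a new namespace or reuse an existing one. You also need an IAM policy for S3 that mirrors your AWS Glue job's access. Use the same custom permissions that your AWS Glue job role has (typically S3 read/write on specific buckets). To grant these permissions in Amazon EKS, in the same way that AWSGlueServiceRole does for AWS Glue, create a service account (IRSA) bound to this IAM policy. For instructions on setting up the service account, see [IAM roles for service accounts](https://docs.aws.amazon.com/eks/latest/eksctl/iamserviceaccounts.html).

```
# Create (or reuse) the namespace
kubectl create namespace $NS || true
```

Save the following policy document as `example.json`, replacing `YOUR-BUCKET` with your bucket name:

```
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
    "Resource": [
      "arn:aws:s3:::YOUR-BUCKET",
      "arn:aws:s3:::YOUR-BUCKET/*"
    ]
  }]
}
```

```
# Create the IAM policy and wire up IRSA:
aws iam create-policy \
  --policy-name RayS3Policy \
  --policy-document file://example.json || true

# Create a service account (IRSA) bound to that policy.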
eksctl create iamserviceaccount \
  --cluster $CLUSTER \
  --region $AWS_REGION \
  --namespace $NS \
  --name ray-s3-access \
  --attach-policy-arn arn:aws:iam::${AWS_ACCOUNT}:policy/RayS3Policy \
  --approve \
  --override-existing-serviceaccounts
```

#### Install the KubeRay operator (controller in K8s)

```
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm upgrade --install kuberay-operator kuberay/kuberay-operator \
  --namespace kuberay-system \
  --create-namespace

# Validate that the operator pod is Running
kubectl -n kuberay-system get pods
```

### Step 4. Launch a Ray cluster

Create a YAML file that defines the Ray cluster. The following is an example configuration (`raycluster.yaml`):

```
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: glue-like
  namespace: ray   # must match $NS
spec:
  rayVersion: "2.4.0"
  headGroupSpec:
    template:
      spec:
        nodeSelector:
          kubernetes.io/arch: amd64
        serviceAccountName: ray-s3-access
        containers:
          - name: ray-head
            image: rayproject/ray:2.4.0-py39
            imagePullPolicy: Always
            resources:
              requests: { cpu: "1", memory: "2Gi" }
              limits: { cpu: "1", memory: "2Gi" }
  workerGroupSpecs:
    - groupName: workers
      replicas: 0      # start with just a head (like a small Glue dev job) and increase replicas later
      minReplicas: 0
      maxReplicas: 5
      template:
        spec:
          nodeSelector:
            kubernetes.io/arch: amd64
          serviceAccountName: ray-s3-access
          containers:
            - name: ray-worker
              image: rayproject/ray:2.4.0-py39
              imagePullPolicy: Always
              resources:
                requests: { cpu: "1", memory: "2Gi" }
                limits: { cpu: "1", memory: "2Gi" }
```

#### Deploy the Ray cluster on your Amazon EKS cluster

```
kubectl apply -n $NS -f raycluster.yaml

# Validate that the head pod reaches the READY / Running state
kubectl -n $NS get pods -l ray.io/cluster=glue-like -w
```

If you need to modify the deployed YAML, first delete the cluster, and then reapply the updated YAML:

```
kubectl -n $NS delete raycluster glue-like
kubectl -n $NS apply -f raycluster.yaml
```

#### Access the Ray dashboard

You can access the Ray dashboard by enabling port forwarding with kubectl:

```
# Get the head service name
SVC=$(kubectl -n $NS get svc -l ray.io/cluster=glue-like,ray.io/node-type=head -o jsonpath='{.items[0].metadata.name}')

# Make the Ray dashboard accessible
# at http://localhost:8265 on your local machine.
kubectl -n $NS port-forward svc/$SVC 8265:8265
```

### Step 5. Submit a Ray job

To submit a Ray job, use the Ray job CLI. The CLI version can be newer than the cluster's Ray version and is backward compatible. As a prerequisite, save your job script in a local file, such as `job.py`, and keep the port-forward from Step 4 running in a separate terminal.

```
python3 -m venv ~/raycli && source ~/raycli/bin/activate
pip install "ray[default]==2.49.2"

# Submit your Ray job, supplying all the Python dependencies that you added to your Glue job
ray job submit --address http://127.0.0.1:8265 --working-dir . \
  --runtime-env-json '{ "pip": ["boto3==1.28.*","pyarrow==12.*","pandas==2.0.*"] }' \
  -- python job.py
```

You can monitor the job in the Ray dashboard.
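If you previously started AWS Glue job runs from an orchestration script, you might prefer to submit from Python instead of the CLI. Ray's `ray.job_submission.JobSubmissionClient` provides `submit_job`, `get_job_status`, and `get_job_logs` for this. The following sketch mirrors the CLI command above and assumes that the port-forward from Step 4 is running and that `ray[default]` is installed as shown. The helper names `build_runtime_env`, `wait_until_done`, and `submit_and_wait` are hypothetical, as are the polling interval and timeout defaults.

```python
import time
from typing import Callable, Iterable

# Terminal states reported by Ray's job status API.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def build_runtime_env(pip: Iterable[str], working_dir: str = ".") -> dict:
    """Runtime env equivalent to `--working-dir .` plus `--runtime-env-json '{"pip": [...]}'`."""
    return {"working_dir": working_dir, "pip": list(pip)}


def wait_until_done(
    get_status: Callable[[], str], poll_s: float = 5.0, timeout_s: float = 3600.0
) -> str:
    """Poll get_status() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout_s} seconds")
        time.sleep(poll_s)


def submit_and_wait(address: str = "http://127.0.0.1:8265") -> str:
    """Submit job.py through the dashboard port-forward and wait for it to finish."""
    # Imported here so that the helpers above work even without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(address)
    job_id = client.submit_job(
        entrypoint="python job.py",
        runtime_env=build_runtime_env(["boto3==1.28.*", "pyarrow==12.*", "pandas==2.0.*"]),
    )
    # get_job_status() returns a JobStatus enum; .value is its string name.
    final_status = wait_until_done(lambda: client.get_job_status(job_id).value)
    print(client.get_job_logs(job_id))
    return final_status
```

Call `submit_and_wait()` from your own script. It returns the final status string, such as `SUCCEEDED` or `FAILED`, so a caller can fail a pipeline step in the same way that a failed AWS Glue job run would.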