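# What is AWS Glue?

AWS Glue is a serverless data integration service that makes it easy for analytics users to discover, prepare, move, and integrate data from multiple sources. You can use it for analytics, machine learning, and application development. It also includes additional productivity and data operations tooling for authoring and running jobs and for implementing business workflows.

With AWS Glue, you can discover and connect to more than 70 diverse data sources and manage your data in a centralized data catalog. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes. You can also immediately search and query cataloged data using Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.

AWS Glue consolidates major data integration capabilities into a single service. These include data discovery, modern ETL, cleansing, transforming, and centralized cataloging. It is also serverless, which means there is no infrastructure to manage. With flexible support for ETL, ELT, and streaming workloads in one service, AWS Glue supports users across different workloads and user types.

AWS Glue also makes it easy to integrate data across your architecture. It integrates with AWS analytics services and Amazon S3 data lakes. AWS Glue has integration interfaces and job-authoring tools that are easy to use for all users, from developers to business users, with tailored solutions for varied technical skill sets.

[![AWS Videos](http://img.youtube.com/vi/u14iVEc-C6E/0.jpg)](http://www.youtube.com/watch?v=u14iVEc-C6E)

Because AWS Glue scales on demand, it helps you focus on high-value activities that maximize the value of your data. It scales for any data size and supports all data types and schema variances. To increase agility and optimize costs, AWS Glue provides built-in high availability and pay-as-you-go billing.

For pricing information, see [AWS Glue pricing](https://aws.amazon.com/glue/pricing).

**AWS Glue Studio**

AWS Glue Studio is a graphical interface that makes it easy to create, run, and monitor data integration jobs in AWS Glue. You can visually compose data transformation workflows and seamlessly run them on the Apache Spark–based serverless ETL engine in AWS Glue.

With AWS Glue Studio, you can create and manage jobs that gather, transform, and clean data. You can also use AWS Glue Studio to troubleshoot and edit job scripts.

**Topics**
+ [AWS Glue features](#glue-features-summary)
+ [Discover innovations in AWS Glue](#innovations-in-glue)
+ [Getting started with AWS Glue](#getting-started-with-glue)
+ [Accessing AWS Glue](#accessing-aws-glue)
+ [Related services](#what-is-glue-related-services)
+ [AWS Glue for Ray end of support](awsglue-ray-jobs-availability-change.md)

## AWS Glue features

AWS Glue features fall into three major categories:
+ Discover and organize data
+ Transform, prepare, and clean data for analysis
+ Build and monitor data pipelines

**Discover and organize data**
+ **Unify and search across multiple data stores** – Store, index, and search across multiple data sources and sinks by cataloging all your data in AWS.
+ **Automatically discover data** – Use AWS Glue crawlers to automatically infer schema information and integrate it into your AWS Glue Data Catalog.
+ **Manage schemas and permissions** – Validate and control access to your databases and tables.
+ **Connect to a wide variety of data sources** – Use AWS Glue connections to tap into multiple data sources, both on premises and on AWS, to build your data lake.

**Transform, prepare, and clean data for analysis**
+ **Visually transform data with a job canvas interface** – Define your ETL process in the visual job editor and automatically generate the code to extract, transform, and load your data.
+ **Build complex ETL pipelines with simple job scheduling** – Invoke AWS Glue jobs on a schedule, on demand, or based on an event.
+ **Clean and transform streaming data in transit** – Enable continuous data consumption, and clean and transform the data while it is in transit. This makes it available for analysis in your target data store within seconds.
+ **Deduplicate and cleanse data with built-in machine learning** – With the `FindMatches`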
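feature, you can clean and prepare your data for analysis without becoming a machine learning expert. This feature deduplicates records and finds records that are imperfect matches of each other.
+ **Built-in job notebooks** – AWS Glue job notebooks provide serverless notebooks that need only minimal setup in AWS Glue, so you can get started quickly.
+ **Edit, debug, and test ETL code** – With AWS Glue interactive sessions, you can interactively explore and prepare data. You can explore, experiment on, and process data interactively using the IDE or notebook of your choice.
+ **Define, detect, and remediate sensitive data** – AWS Glue sensitive data detection lets you define, identify, and process sensitive data in your data pipeline and in your data lake.

**Build and monitor data pipelines**
+ **Automatically scale based on workload** – Dynamically scale resources up and down based on workload. Workers are assigned to jobs only when needed.
+ **Automate jobs with event-based triggers** – Start crawlers or AWS Glue jobs with event-based triggers, and design a chain of dependent jobs and crawlers.
+ **Run and monitor jobs** – Run AWS Glue jobs with your choice of engine, Spark or Ray. Monitor them with automated monitoring tools, AWS Glue job run insights, and AWS CloudTrail. Improve your monitoring of Spark-backed jobs with the Apache Spark UI.
+ **Define workflows for ETL and integration activities** – Define workflows for ETL and integration activities for multiple crawlers, jobs, and triggers.

## Discover innovations in AWS Glue

Learn about the latest innovations in AWS Glue, and hear how customers use AWS Glue to enable self-service data preparation across their organization.

[![AWS Videos](http://img.youtube.com/vi/cDDPg_XxPqc/0.jpg)](http://www.youtube.com/watch?v=cDDPg_XxPqc)

Learn how customers scale AWS Glue beyond a traditional setup, and how they configure AWS Glue for job monitoring and performance.

[![AWS Videos](http://img.youtube.com/vi/ce6t3FqB_Z4/0.jpg)](http://www.youtube.com/watch?v=ce6t3FqB_Z4)

## Getting started with AWS Glue

We recommend that you start with the following sections:
+ [Overview of using AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/start-console-overview.html)
+ [AWS Glue concepts](https://docs.aws.amazon.com/glue/latest/dg/components-key-concepts.html)
+ [Setting up IAM permissions for AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/set-up-iam.html)
+ [Getting started with the AWS Glue Data Catalog](https://docs.aws.amazon.com/glue/latest/dg/start-data-catalog.html)
+ [Authoring jobs in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/author-job-glue.html)
+ [Getting started with AWS Glue interactive sessions](https://docs.aws.amazon.com/glue/latest/dg/interactive-sessions.html)
+ [Orchestration in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/etl-jobs.html)

## Accessing AWS Glue

You can create, view, and manage your AWS Glue jobs using the following interfaces:
+ **AWS Glue console** – Provides a web interface for you to create, view, and manage your AWS Glue jobs. To access the console, see [https://console.aws.amazon.com/glue](https://console.aws.amazon.com/glue).
+ **AWS Glue Studio** – Provides a graphical interface for you to create and edit your AWS Glue jobs visually. For more information, see [Building visual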
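ETL jobs](author-job-glue.md).
+ **AWS Glue section of the AWS CLI Reference** – Provides AWS CLI commands that you can use with AWS Glue. For more information, see [AWS CLI Reference for AWS Glue](https://docs.aws.amazon.com/cli/latest/reference/glue/index.html).
+ **AWS Glue API** – Provides a complete API reference for developers. For more information, see [AWS Glue API](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api.html).

## Related services

Users of AWS Glue also use:
+ **[AWS Lake Formation](https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html)** – An authorization layer that provides fine-grained access control to resources in the AWS Glue Data Catalog.
+ **[AWS Glue DataBrew](https://docs.aws.amazon.com/databrew/latest/dg/what-is.html)** – A visual data preparation tool that you can use to clean and normalize data without writing any code.

# AWS Glue for Ray end of support

**Important**
AWS Glue for Ray will no longer be open to new customers starting April 30, 2026. To use AWS Glue for Ray, sign up before that date. Existing customers can continue to use the service as normal. For capabilities similar to AWS Glue for Ray, explore Amazon EKS. For more information, see [AWS Glue for Ray end of support](https://docs.aws.amazon.com/glue/latest/dg/awsglue-ray-jobs-availability-change.html).

After careful consideration, we decided to close AWS Glue for Ray to new customers starting April 30, 2026. AWS continues to invest in security and availability improvements for AWS Glue for Ray. Note, however, that we do not plan to introduce new features beyond security and availability enhancements.

As an alternative to AWS Glue for Ray, we recommend Amazon Elastic Kubernetes Service. Amazon Elastic Kubernetes Service is a fully managed, certified Kubernetes-conformant service that simplifies building, securing, operating, and maintaining Kubernetes clusters on AWS. It is a highly customizable option that relies on the open source KubeRay Operator to deploy and manage Ray clusters on Kubernetes, which improves resource utilization, simplifies infrastructure management, and fully supports Ray features.

## Migrating a Ray job to Amazon Elastic Kubernetes Service

This section describes the steps to migrate from AWS Glue for Ray to Ray on Amazon Elastic Kubernetes Service. The steps apply to the following two migration scenarios:
+ **Standard migration (x86/amd64)**: For this use case, the migration strategy uses the open source Ray container as the base implementation and runs scripts directly on the base container.
+ **ARM64 migration**: For this use case, the migration strategy supports custom container builds to meet ARM64-specific dependencies and architecture requirements.

### Prerequisites for migration

Install the following CLI tools: **aws**, **kubectl**, **eksctl**, **helm**, and Python 3.9+. These tools are required to provision and manage your Ray environment on EKS:
+ **eksctl** simplifies creating and managing EKS clusters.
+ **kubectl** is the standard Kubernetes CLI for deploying and troubleshooting workloads on your cluster.
+ **helm** is used to install and manage KubeRay (the operator that runs Ray on Kubernetes).
+ Python 3.9+ is required by Ray itself and is used to run job submission scripts locally.

#### Install eksctl

Follow the instructions in [Installation options for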
Eksctl](https://docs.aws.amazon.com/eks/latest/eksctl/installation.html), or install it as follows.

For macOS:

```
brew tap weaveworks/tap
brew install weaveworks/tap/eksctl
```

For Linux:

```
curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp

# Move the extracted binary to /usr/local/bin
sudo mv /tmp/eksctl /usr/local/bin

# Test the installation
eksctl version
```

#### Install kubectl

Follow the instructions in [Set up kubectl and eksctl](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html), or install it as follows.

For macOS:

```
brew install kubectl
```

For Linux:

```
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin/
```

#### Install helm

Follow the instructions in [Installing Helm](https://helm.sh/docs/intro/install/), or install it as follows.

For macOS:

```
brew install helm
```

For Linux:

```
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
```

### Step 1: Build or choose a Docker image for Ray

**Option 1: Use the official Ray image (no build required)**

This option uses an official Ray Docker image from [Docker Hub](https://hub.docker.com/u/rayproject), such as `rayproject/ray:2.4.0-py39`, which is maintained by the Ray project.

**Note**
This image supports amd64 only. Use this option if your dependencies are compatible with amd64 and you do not need ARM-specific builds.

**Option 2: Build and publish your own arm64 Ray 2.4.0 image**

This option is useful when you run on Graviton (ARM) nodes, which matches what AWS Glue for Ray uses internally. You can create a custom image pinned to the same dependency versions as AWS Glue for Ray, which reduces compatibility mismatches.

Create a Dockerfile locally:

```
# Build an ARM64 image
FROM --platform=linux/arm64 python:3.9-slim-bullseye

# Handy tools: wget for KubeRay probes; CA certs; keep image small
RUN apt-get update && apt-get install -y --no-install-recommends \
    wget ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Keep pip/setuptools modern enough for wheels resolution
RUN python -m pip install -U "pip<24" "setuptools<70" wheel

# ---- Install Ray 2.4.0 (ARM64 / Py3.9) and Glue-like dependencies ----
# 1) Download the exact Ray 2.4.0 wheel for aarch64 (no network at runtime)
RUN python -m pip \
    download --only-binary=:all: --no-deps --dest /tmp/wheels ray==2.4.0

# 2) Core libs used in Glue (pin to Glue-era versions)
#    + the dashboard & jobs API dependencies compatible with Ray 2.4.0.
#    (Pins matter: newer major versions break 2.4.0's dashboard.)
RUN python -m pip install --no-cache-dir \
    /tmp/wheels/ray-2.4.0-*.whl \
    "pyarrow==11.0.0" \
    "pandas==1.5.3" \
    "boto3==1.26.133" \
    "botocore==1.29.133" \
    "numpy==1.24.3" \
    "fsspec==2023.4.0" \
    "protobuf<4" \
    # --- dashboard / jobs server deps ---
    "aiohttp==3.8.5" \
    "aiohttp-cors==0.7.0" \
    "yarl<1.10" "multidict<7.0" "frozenlist<1.4" "aiosignal<1.4" "async_timeout<5" \
    "pydantic<2" \
    "opencensus<0.12" \
    "prometheus_client<0.17" \
    # --- needed if using py_modules ---
    "smart_open[s3]==6.4.0"

# Optional: prove Ray & arch at container start
ENV PYTHONUNBUFFERED=1
WORKDIR /app

# KubeRay overrides the start command; this is just a harmless default
CMD ["python","-c","import ray,platform; print('Ray', ray.__version__, 'on', platform.machine())"]
```

Then build the image and push it to Amazon ECR:

```
# Set environment variables
export AWS_REGION=us-east-1
export AWS_ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
export REPO=ray-2-4-arm64
export IMAGE=${AWS_ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com/${REPO}:v1

# Create repository and login
aws ecr create-repository --repository-name $REPO >/dev/null 2>&1 || true
aws ecr get-login-password --region $AWS_REGION \
  | docker login --username AWS --password-stdin ${AWS_ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com

# Enable Buildx (for cross-builds on non-ARM hosts)
docker buildx create --name multi --driver docker-container --use 2>/dev/null || true

# Build & push ARM64 image
docker buildx build \
  --platform linux/arm64 \
  -t "$IMAGE" \
  . \
  --push

# Verify the image architecture remotely
# (the first jq already emits the manifest as JSON, so the second jq reads it directly)
aws ecr batch-get-image \
  --repository-name $REPO \
  --image-ids imageTag=v1 \
  --accepted-media-types application/vnd.docker.distribution.manifest.v2+json \
  | jq -r '.images[0].imageManifest' \
  | jq -r '.config.digest'
```

When you are done, reference this ARM64 image in your RayCluster spec, together with `nodeSelector: { kubernetes.io/arch: arm64 }`.

```
spec:
  rayVersion: "2.4.0"
  headGroupSpec:
    template:
      spec:
        containers:
        - name: ray-head
          image:
```

### Step 2: Convert the AWS Glue for Ray job configuration to Ray on Amazon Elastic Kubernetes Service

AWS Glue for Ray jobs support a set of job parameters that configure workers, dependencies, memory, and logging. When you migrate to Amazon Elastic Kubernetes Service with KubeRay, you need to translate these parameters into RayCluster spec fields or Ray job runtime environment settings.

#### Job parameter mapping

**Mapping AWS Glue for Ray parameters to their Ray on EKS equivalents**

| AWS Glue for Ray parameter | What it does in AWS Glue for Ray | Ray on Amazon Elastic Kubernetes Service equivalent |
| --- | --- | --- |
| `--min-workers` | Minimum number of workers that the job must allocate. | `workerGroupSpecs[].minReplicas` in your RayCluster |
| `--working-dir` | Distributes a zip (S3 URI) to all nodes. | Use the Ray runtime environment `working_dir` if you submit from local files; to point to a zip artifact in S3, use `py_modules` |
| `--s3-py-modules` | Adds Python wheels/dists from S3. | Use the Ray runtime environment: `py_modules: ["s3://.../xxx.whl", ...]` |
| `--pip-install` | Installs extra PyPI packages for the job. | Ray runtime environment: `pip: ["pkg==ver", ...]` (Ray Job CLI `--runtime-env-json` or RayJob `runtimeEnvYAML`). |
| `--object_store_memory_head` | Percentage of memory used for the Plasma store on the head node. | `headGroupSpec.rayStartParams.object-store-memory` in your RayCluster. Note that this value must be in bytes: AWS Glue uses a percentage, while Ray uses bytes. |
| `--object_store_memory_worker` | Percentage of memory used for the Plasma store on worker nodes. | Same as above, but set in each worker group's `rayStartParams.object-store-memory` (bytes). |
| `--object_spilling_config` | Configures Ray object spilling. | `headGroupSpec.rayStartParams.object-spilling-config` |
| `--logging_configuration` | AWS Glue managed logs (CloudWatch, S3). | Check pod stdout/stderr with `kubectl -n ray logs --follow`. Check logs from the Ray dashboard (port-forward to :8265), where you can also see task and job logs. |

#### Job configuration mapping

**Mapping AWS Glue for Ray job configuration to its Ray on EKS equivalents**

| Configuration | What it does in AWS Glue for Ray | Ray on EKS equivalent |
| --- | --- | --- |
| Worker type | Sets the predefined worker type that is allowed when the job runs. Defaults to Z.2X (8 vCPU, 64 GB RAM). | Node group instance type in EKS (for example, r7g.2xlarge ≈ 8 vCPU/64 GB for ARM, r7a.2xlarge for x86). |
| Maximum number of workers | The number of workers that you want AWS Glue to allocate to this job. | Set `workerGroupSpecs[].maxReplicas` to the same number you used in AWS Glue. This is the upper bound for autoscaling. Similarly, set `minReplicas` as the lower bound. You can start with `replicas: 0`, `minReplicas: 0`. |

### Step 3: Set up Amazon Elastic Kubernetes Service

You can create a new Amazon Elastic Kubernetes Service cluster or reuse an existing one. If you use an existing cluster, skip the cluster creation commands and go straight to adding a node group, setting up IRSA, and installing KubeRay.

#### Create an Amazon Elastic Kubernetes Service cluster

**Note**
If you already have an Amazon Elastic Kubernetes Service cluster, skip the command that creates a new cluster and only add a node group.

```
# Environment Variables
export AWS_REGION=us-east-1
export CLUSTER=ray-eks
export NS=ray # namespace for your Ray jobs (you can reuse another if you like)

# Create a cluster (OIDC is required for IRSA)
eksctl create cluster \
  --name $CLUSTER \
  --region $AWS_REGION \
  --with-oidc \
  --managed
```

#### Add a node group

```
# ARM/Graviton (matches Glue's typical runtime):
eksctl create nodegroup \
  --cluster $CLUSTER \
  --region $AWS_REGION \
  --name arm64-ng \
  --node-type m7g.large \
  --nodes 2 --nodes-min 1 --nodes-max 5 \
  --managed \
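  --node-labels "workload=ray"

# x86/amd64 (use if your image is amd64-only):
eksctl create nodegroup \
  --cluster $CLUSTER \
  --region $AWS_REGION \
  --name amd64-ng \
  --node-type m5.large \
  --nodes 2 --nodes-min 1 --nodes-max 5 \
  --managed \
  --node-labels "workload=ray"
```

**Note**
If you use an existing Amazon Elastic Kubernetes Service cluster, make sure that OIDC is enabled on it, because IRSA requires it. The `--with-oidc` flag belongs to `eksctl create cluster`; for a cluster that already exists, you can associate an OIDC provider with `eksctl utils associate-iam-oidc-provider --cluster $CLUSTER --approve`.

#### Create a namespace + IAM role for service accounts (IRSA) for S3

A Kubernetes namespace is a logical grouping of resources (pods, services, roles, and so on). You can create a namespace or reuse an existing one.

You also need to create an IAM policy for S3 that mirrors the access of your AWS Glue job. Use the same custom permissions as your AWS Glue job role (typically S3 read/write access to specific buckets). To grant Amazon Elastic Kubernetes Service permissions similar to AWSGlueServiceRole, create a service account (IRSA) bound to this IAM policy. For instructions on setting up this service account, see [IAM Roles for Service Accounts](https://docs.aws.amazon.com/eks/latest/eksctl/iamserviceaccounts.html).

```
# Create (or reuse) namespace
kubectl create namespace $NS || true
```

Save the following policy document as `example.json`, the file name that the `aws iam create-policy` command in this section reads. Replace `YOUR-BUCKET` with your bucket name.

```
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
    "Resource": [
      "arn:aws:s3:::YOUR-BUCKET",
      "arn:aws:s3:::YOUR-BUCKET/*"
    ]
  }]
}
```

```
# AWS_ACCOUNT is used in the policy ARN below; set it here if you skipped Option 2 in Step 1
export AWS_ACCOUNT=$(aws sts get-caller-identity --query Account --output text)

# Create the IAM policy and wire IRSA:
aws iam create-policy \
  --policy-name RayS3Policy \
  --policy-document file://example.json || true

# Create a service account (IRSA) bound to that policy.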
eksctl create iamserviceaccount \
  --cluster $CLUSTER \
  --region $AWS_REGION \
  --namespace $NS \
  --name ray-s3-access \
  --attach-policy-arn arn:aws:iam::${AWS_ACCOUNT}:policy/RayS3Policy \
  --approve \
  --override-existing-serviceaccounts
```

#### Install the KubeRay operator (the controller that runs Ray on K8s)

```
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm upgrade --install kuberay-operator kuberay/kuberay-operator \
  --namespace kuberay-system \
  --create-namespace

# Validate that the operator pod is Running
kubectl -n kuberay-system get pods
```

### Step 4: Spin up a Ray cluster

Create a YAML file that defines the Ray cluster. The following is a sample configuration (raycluster.yaml):

```
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: glue-like
  namespace: ray
spec:
  rayVersion: "2.4.0"
  headGroupSpec:
    template:
      spec:
        nodeSelector:
          kubernetes.io/arch: amd64
        serviceAccountName: ray-s3-access
        containers:
        - name: ray-head
          image: rayproject/ray:2.4.0-py39
          imagePullPolicy: Always
          resources:
            requests: { cpu: "1", memory: "2Gi" }
            limits: { cpu: "1", memory: "2Gi" }
  workerGroupSpecs:
  - groupName: workers
    replicas: 0 # start with just a head (like a small Glue dev job) and tune the number of replicas later
    minReplicas: 0
    maxReplicas: 5
    template:
      spec:
        nodeSelector:
          kubernetes.io/arch: amd64
        serviceAccountName: ray-s3-access
        containers:
        - name: ray-worker
          image: rayproject/ray:2.4.0-py39
          imagePullPolicy: Always
          resources:
            requests: { cpu: "1", memory: "2Gi" }
            limits: { cpu: "1", memory: "2Gi" }
```

#### Deploy the Ray cluster on the Amazon Elastic Kubernetes Service cluster

```
kubectl apply -n $NS -f raycluster.yaml

# Validate that the head pod turns to READY/RUNNING state
kubectl -n $NS get pods -l ray.io/cluster=glue-like -w
```

If you need to modify the deployed YAML, delete the cluster first and then re-apply the updated YAML:

```
kubectl -n $NS delete raycluster glue-like
kubectl -n $NS apply -f raycluster.yaml
```

#### Access the Ray dashboard

You can access the Ray dashboard by enabling port forwarding with kubectl:

```
# Get service
SVC=$(kubectl -n $NS get svc -l ray.io/cluster=glue-like,ray.io/node-type=head -o jsonpath='{.items[0].metadata.name}')

# Make the Ray
# dashboard accessible at http://localhost:8265 on your local machine.
kubectl -n $NS port-forward svc/$SVC 8265:8265
```

### Step 5: Submit a Ray job

To submit a Ray job, use the Ray Jobs CLI. The CLI version can be newer than the Ray version on the cluster and is backward compatible. As a prerequisite, store your job script in a local file, for example `job.py`.

```
python3 -m venv ~/raycli && source ~/raycli/bin/activate
pip install "ray[default]==2.49.2"

# Submit your Ray job, supplying all Python dependencies that were added to your Glue job
ray job submit --address http://127.0.0.1:8265 --working-dir . \
  --runtime-env-json '{ "pip": ["boto3==1.28.*","pyarrow==12.*","pandas==2.0.*"] }' \
  -- python job.py
```

You can monitor the job on the Ray dashboard.
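
The parameter mapping in Step 2 points out that AWS Glue for Ray expresses object store memory as a percentage, while Ray's `object-store-memory` start parameter takes bytes. The following Python sketch shows one way to do that conversion when you fill in `rayStartParams`. It assumes the percentage is applied to the container memory limit from your RayCluster spec (for example, `2Gi`), which may not match how AWS Glue computes it internally. The function names are hypothetical helpers and are not part of Ray or KubeRay.

```python
GIB = 1024 ** 3  # bytes in one GiB, matching Kubernetes "Gi" units


def object_store_memory_bytes(container_memory_gib: float, percent: float) -> int:
    """Convert a Glue-style percentage into a byte count.

    Assumption: the percentage is taken against the container memory limit.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be greater than 0 and at most 100")
    return int(container_memory_gib * GIB * percent / 100)


def ray_start_params(container_memory_gib: float, percent: float) -> dict:
    """Build a rayStartParams mapping; the value is rendered as a string."""
    value = object_store_memory_bytes(container_memory_gib, percent)
    return {"object-store-memory": str(value)}


if __name__ == "__main__":
    # Example: a 2 GiB head container with 30 percent reserved for the object store
    print(ray_start_params(2, 30))
```

You could paste the resulting value under `rayStartParams` in the head group or in a worker group of `raycluster.yaml`, then check on the Ray dashboard that the object store size is what you expect.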