Using Amazon S3 Express One Zone with AWS Glue

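With AWS Glue version 5.1 and later, you can read and write data in Amazon S3 Express One Zone directory buckets from your ETL jobs. S3 Express One Zone is a high-performance, single-Availability Zone Amazon S3 storage class that delivers consistent single-digit millisecond data access for latency-sensitive applications.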

Prerequisites

Before you can use S3 Express One Zone with AWS Glue, you must have the following:

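An AWS Glue job that runs version 5.1 or later.
An S3 directory bucket created in the same Region as your AWS Glue job. Directory buckets do not support cross-Region access. For more information, see Creating directory buckets in the Amazon S3 User Guide.
The s3express:CreateSession permission on the IAM role. When S3 Express One Zone performs an action on a directory bucket, it calls CreateSession on your behalf.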

IAM permissions

Add the following permission to the IAM role of your AWS Glue job to allow access to S3 Express One Zone directory buckets:


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3express:CreateSession",
            "Resource": "arn:aws:s3express:*:*:bucket/EXAMPLE-BUCKET--az-id--x-s3"
        }
    ]
}

Replace EXAMPLE-BUCKET with the name of your directory bucket, and replace az-id with the Availability Zone ID (for example, use1-az4).
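If you manage several directory buckets, you might prefer to generate this policy rather than edit it by hand. The following is a minimal sketch under that assumption. The function name build_create_session_policy is hypothetical and is not part of any AWS SDK; it only fills the two placeholders into the policy shown above.

```python
import json


def build_create_session_policy(bucket_base_name: str, az_id: str) -> dict:
    """Return an IAM policy document that allows s3express:CreateSession
    on a single directory bucket (hypothetical helper, not an AWS API)."""
    # Directory bucket names follow the pattern <base-name>--<az-id>--x-s3.
    bucket_name = f"{bucket_base_name}--{az_id}--x-s3"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3express:CreateSession",
                # Region and account ID stay as wildcards, as in the policy above.
                "Resource": f"arn:aws:s3express:*:*:bucket/{bucket_name}",
            }
        ],
    }


policy = build_create_session_policy("EXAMPLE-BUCKET", "use1-az4")
print(json.dumps(policy, indent=4))
```

The printed JSON can be pasted into the IAM role as an inline policy. If you want a tighter policy, you could replace the wildcards with your own Region and account ID.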

Reading and writing data

AWS Glue version 5.1 and later supports access to S3 Express One Zone directory buckets through the s3:// and s3a:// URI schemes. No additional configuration is required.
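Because the URI scheme and the bucket name suffix are the only things that distinguish a directory bucket path, it can help to validate a path before a job starts. The following sketch uses only the Python standard library. The function name parse_directory_bucket_uri and the checks it performs are illustrative assumptions based on the EXAMPLE-BUCKET--az-id--x-s3 naming pattern shown earlier, not an AWS Glue API.

```python
from urllib.parse import urlparse

SUPPORTED_SCHEMES = {"s3", "s3a"}
DIRECTORY_BUCKET_SUFFIX = "--x-s3"


def parse_directory_bucket_uri(uri: str) -> dict:
    """Split a directory bucket URI into scheme, bucket, AZ ID, and key prefix.

    Raises ValueError if the URI does not look like a directory bucket path.
    Hypothetical helper for illustration only.
    """
    parsed = urlparse(uri)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"Unsupported URI scheme: {parsed.scheme!r}")

    bucket = parsed.netloc
    if not bucket.endswith(DIRECTORY_BUCKET_SUFFIX):
        raise ValueError(f"Not a directory bucket name: {bucket!r}")

    # Drop the --x-s3 suffix, then take the AZ ID after the last "--".
    base_and_az = bucket[: -len(DIRECTORY_BUCKET_SUFFIX)]
    base_name, separator, az_id = base_and_az.rpartition("--")
    if not separator or not base_name or not az_id:
        raise ValueError(f"Missing Availability Zone ID in: {bucket!r}")

    return {
        "scheme": parsed.scheme,
        "bucket": bucket,
        "az_id": az_id,
        "prefix": parsed.path.lstrip("/"),
    }


info = parse_directory_bucket_uri("s3://EXAMPLE-BUCKET--use1-az4--x-s3/my-data/")
print(info)
```

A check like this fails fast on a mistyped path, for example a general purpose bucket name passed where a directory bucket is expected, before any Spark work is scheduled.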

The following example shows how to read data from and write data to an S3 Express One Zone directory bucket in an AWS Glue ETL job:


import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

# S3 Express One Zone directory bucket path
express_path = "s3://EXAMPLE-BUCKET--use1-az4--x-s3/my-data/"

# Read data from S3 Express One Zone
df = spark.read.parquet(express_path)

# Write data to S3 Express One Zone
df.write.mode("overwrite").parquet(express_path + "output/")

You can also use DynamicFrames with S3 Express One Zone:


# Read with DynamicFrame
dynamicFrame = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": [express_path]},
    format="parquet"
)

# Write with DynamicFrame
glueContext.write_dynamic_frame.from_options(
    frame=dynamicFrame,
    connection_type="s3",
    connection_options={"path": express_path + "output/"},
    format="parquet"
)
