# Amazon Athena로 Amazon S3 인벤토리 쿼리 Athena를 사용할 수 있는 모든 리전에서 Amazon Athena를 사용하여 표준 SQL 쿼리로 Amazon S3 인벤토리 파일을 쿼리할 수 있습니다. AWS 리전 가용성을 확인하려면 [AWS 리전 표](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/)를 참조하세요. Athena는 [Apache 최적화된 행 열 형식(ORC)](https://orc.apache.org/), [https://parquet.apache.org/](https://parquet.apache.org/) 또는 쉼표로 구분된 값(CSV) 형식의 Amazon S3 인벤토리 파일을 쿼리할 수 있습니다. Athena를 사용하여 인벤토리 파일을 쿼리할 경우 ORC 형식 또는 Parquet 형식 인벤토리 파일을 사용하는 것이 좋습니다. ORC 및 Parquet 형식은 더 빠른 쿼리 성능을 제공하지만 쿼리 비용이 더 저렴합니다. ORC와 Parquet은 자체 설명이 가능한 유형 인식 열 형식의 파일 형식으로, [http://hadoop.apache.org/](http://hadoop.apache.org/)용으로 설계되었습니다. 이 컬럼 방식의 형식을 통해 리더가 현재 쿼리에 필요한 컬럼만 읽고 압축 해제하며 처리합니다. Amazon S3 인벤토리에 대한 ORC 및 Parquet 형식은 모든 AWS 리전에서 사용할 수 있습니다. **Athena를 사용하여 Amazon S3 인벤토리 파일을 쿼리하려면** 1. Athena 테이블을 만듭니다. 테이블 생성에 대한 자세한 내용은 *Amazon Athena 사용 설명서*의 [Amazon Athena에서 테이블 생성](https://docs.aws.amazon.com/athena/latest/ug/creating-tables.html)을 참조하세요. 1. 쿼리하려는 인벤토리 보고서가 ORC 형식인지, Parquet 형식인지, CSV 형식인지에 따라 다음 샘플 쿼리 템플릿 중 하나를 사용하여 쿼리를 생성합니다. + Athena를 사용하여 ORC 형식의 인벤토리 보고서를 쿼리하는 경우 다음 샘플 쿼리를 템플릿으로 사용합니다. 다음 쿼리 예제에는 ORC 형식 인벤토리 보고서의 모든 선택적 필드가 포함되어 있습니다. 이 샘플 쿼리를 사용하려면 다음을 수행하세요. + `your_table_name`을 생성한 Athena 테이블의 이름으로 바꿉니다. + 쿼리가 인벤토리에 대해 선택한 필드와 일치하도록 인벤토리에 대해 선택하지 않은 선택적 필드를 제거합니다. + 다음 버킷 이름과 인벤토리 위치(구성 ID)를 구성에 맞게 적절하게 바꿉니다. `s3://amzn-s3-demo-bucket/config-ID/hive/` + `2022-01-01-00-00` 아래의 `projection.dt.range` 날짜를 Athena에서 데이터를 파티션하는 시간 범위의 첫날로 바꿉니다. 자세한 내용은 [Athena에서 데이터 파티셔닝](https://docs.aws.amazon.com/athena/latest/ug/partitions.html)을 참조하세요. ``` CREATE EXTERNAL TABLE your_table_name ( bucket string, key string, version_id string, is_latest boolean, is_delete_marker boolean, size bigint, last_modified_date timestamp, e_tag string, storage_class string, is_multipart_uploaded boolean, replication_status string, encryption_status string, object_lock_retain_until_date bigint, object_lock_mode string, object_lock_legal_hold_status string, intelligent_tiering_access_tier string, bucket_key_status string, checksum_algorithm string, object_access_control_list string, object_owner string, lifecycle_expiration_date timestamp ) PARTITIONED BY ( dt string ) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/config-ID/hive/' TBLPROPERTIES ( "projection.enabled" = "true", "projection.dt.type" = "date", "projection.dt.format" = "yyyy-MM-dd-HH-mm", "projection.dt.range" = "2022-01-01-00-00,NOW", "projection.dt.interval" = "1", "projection.dt.interval.unit" = "HOURS" ); ``` + Athena를 사용하여 Parquet 형식의 인벤토리 보고서를 쿼리하는 경우 ORC 형식의 보고서에 대한 샘플 쿼리를 사용합니다. 단, `ROW FORMAT SERDE` 문에서 ORC SerDe 대신 다음 Parquet SerDe를 사용하세요. ``` ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' ``` + Athena를 사용하여 CSV 형식의 인벤토리 보고서를 쿼리하는 경우 다음 샘플 쿼리를 템플릿으로 사용합니다. 다음 쿼리 예제에는 CSV 형식 인벤토리 보고서의 모든 선택적 필드가 포함되어 있습니다. 이 샘플 쿼리를 사용하려면 다음을 수행하세요. + `your_table_name`을 생성한 Athena 테이블의 이름으로 바꿉니다. + 쿼리가 인벤토리에 대해 선택한 필드와 일치하도록 인벤토리에 대해 선택하지 않은 선택적 필드를 제거합니다. + 다음 버킷 이름과 인벤토리 위치(구성 ID)를 구성에 맞게 적절하게 바꿉니다. `s3://amzn-s3-demo-bucket/config-ID/hive/` + `2022-01-01-00-00` 아래의 `projection.dt.range` 날짜를 Athena에서 데이터를 파티션하는 시간 범위의 첫날로 바꿉니다. 자세한 내용은 [Athena에서 데이터 파티셔닝](https://docs.aws.amazon.com/athena/latest/ug/partitions.html)을 참조하세요. ``` CREATE EXTERNAL TABLE your_table_name ( bucket string, key string, version_id string, is_latest boolean, is_delete_marker boolean, size string, last_modified_date string, e_tag string, storage_class string, is_multipart_uploaded boolean, replication_status string, encryption_status string, object_lock_retain_until_date string, object_lock_mode string, object_lock_legal_hold_status string, intelligent_tiering_access_tier string, bucket_key_status string, checksum_algorithm string, object_access_control_list string, object_owner string, lifecycle_expiration_date string ) PARTITIONED BY ( dt string ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/config-ID/hive/' TBLPROPERTIES ( "projection.enabled" = "true", "projection.dt.type" = "date", "projection.dt.format" = "yyyy-MM-dd-HH-mm", "projection.dt.range" = "2022-01-01-00-00,NOW", "projection.dt.interval" = "1", "projection.dt.interval.unit" = "HOURS" ); ``` 1. 이제 다음 예제와 같이 인벤토리에서 다양한 쿼리를 실행할 수 있습니다. 각 `user input placeholder`를 사용자의 정보로 바꿉니다. ``` # Get a list of the latest inventory report dates available. SELECT DISTINCT dt FROM your_table_name ORDER BY 1 DESC limit 10; # Get the encryption status for a provided report date. SELECT encryption_status, count(*) FROM your_table_name WHERE dt = 'YYYY-MM-DD-HH-MM' GROUP BY encryption_status; # Get the encryption status for inventory report dates in the provided range. SELECT dt, encryption_status, count(*) FROM your_table_name WHERE dt > 'YYYY-MM-DD-HH-MM' AND dt < 'YYYY-MM-DD-HH-MM' GROUP BY dt, encryption_status; ``` 인벤토리 보고서에 객체 액세스 제어 목록(ACL) 필드를 추가하도록 S3 인벤토리를 구성하면 보고서에 객체 ACL 필드의 값이 base64로 인코딩된 문자열로 표시됩니다. 객체 ACL 필드의 디코딩된 값을 JSON으로 가져오려면 Athena를 사용해 이 필드를 쿼리하면 됩니다. 다음 쿼리 예제를 참조하세요. 객체 ACL 필드에 대한 자세한 내용은 [객체 ACL 필드 작업](objectacl.md) 단원을 참조하세요. ``` # Get the S3 keys that have Object ACL grants with public access. WITH grants AS ( SELECT key, CAST( json_extract(from_utf8(from_base64(object_access_control_list)), '$.grants') AS ARRAY(MAP(VARCHAR, VARCHAR)) ) AS grants_array FROM your_table_name ) SELECT key, grants_array, grant FROM grants, UNNEST(grants_array) AS t(grant) WHERE element_at(grant, 'uri') = 'http://acs.amazonaws.com/groups/global/AllUsers' ``` ``` # Get the S3 keys that have Object ACL grantees in addition to the object owner. WITH grants AS (SELECT key, from_utf8(from_base64(object_access_control_list)) AS object_access_control_list, object_owner, CAST(json_extract(from_utf8(from_base64(object_access_control_list)), '$.grants') AS ARRAY(MAP(VARCHAR, VARCHAR))) AS grants_array FROM your_table_name) SELECT key, grant, objectowner FROM grants, UNNEST(grants_array) AS t(grant) WHERE cardinality(grants_array) > 1 AND element_at(grant, 'canonicalId') != object_owner; ``` ``` # Get the S3 keys with READ permission that is granted in the Object ACL. WITH grants AS ( SELECT key, CAST( json_extract(from_utf8(from_base64(object_access_control_list)), '$.grants') AS ARRAY(MAP(VARCHAR, VARCHAR)) ) AS grants_array FROM your_table_name ) SELECT key, grants_array, grant FROM grants, UNNEST(grants_array) AS t(grant) WHERE element_at(grant, 'permission') = 'READ'; ``` ``` # Get the S3 keys that have Object ACL grants to a specific canonical user ID. WITH grants AS ( SELECT key, CAST( json_extract(from_utf8(from_base64(object_access_control_list)), '$.grants') AS ARRAY(MAP(VARCHAR, VARCHAR)) ) AS grants_array FROM your_table_name ) SELECT key, grants_array, grant FROM grants, UNNEST(grants_array) AS t(grant) WHERE element_at(grant, 'canonicalId') = 'user-canonical-id'; ``` ``` # Get the number of grantees on the Object ACL. SELECT key, object_access_control_list, json_array_length(json_extract(object_access_control_list,'$.grants')) AS grants_count FROM your_table_name; ``` Athena 사용에 대한 자세한 내용은 [Amazon Athena 사용 설명서](https://docs.aws.amazon.com/athena/latest/ug/)를 참조하십시오.