本文為英文版的機器翻譯版本，如內容有任何歧義或不一致之處，概以英文版為準。 # 查詢 AWS 服務日誌本節包含數個使用 Amazon Athena 查詢熱門資料集的程序，例如 AWS CloudTrail 日誌、Amazon CloudFront 日誌、Classic Load Balancer 日誌、Application Load Balancer 日誌、Amazon VPC 流程日誌和 Network Load Balancer 日誌。本節中的任務使用 Athena 主控台，但您也可以使用其他工具，例如 [Athena JDBC 驅動程式](connect-with-jdbc.md)、[AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/athena/) 或 [Amazon Athena API 參考](https://docs.aws.amazon.com/athena/latest/APIReference/)。如需有關使用 AWS CloudFormation 在 Athena 中自動建立 AWS 服務日誌資料表、分割區和範例查詢的資訊，請參閱 AWS 大數據部落格中的[自動化 AWS 服務日誌資料表建立和使用 Amazon Athena 進行查詢](https://aws.amazon.com/blogs/big-data/automating-aws-service-logs-table-creation-and-querying-them-with-amazon-athena/)。如需使用適用於的 Python 程式庫 AWS Glue 來建立常見架構以在 Athena 中處理 AWS 服務日誌和查詢它們的相關資訊，請參閱[使用 Amazon Athena 輕鬆查詢 AWS 服務日誌](https://aws.amazon.com/blogs/big-data/easily-query-aws-service-logs-using-amazon-athena/)。本節中的主題假設您已設定適當許可來存取 Athena，以及應包含要查詢的資料的 Amazon S3 儲存貯體。如需詳細資訊，請參閱[設定、管理和以程式設計方式存取](setting-up.md)及[開始使用](getting-started.md)。 **Topics** + [Application Load Balancer](application-load-balancer-logs.md) + [Elastic Load Balancing](elasticloadbalancer-classic-logs.md) + [CloudFront](cloudfront-logs.md) + [CloudTrail](cloudtrail-logs.md) + [Amazon EMR](emr-logs.md) + [Global Accelerator](querying-global-accelerator-flow-logs.md) + [GuardDuty](querying-guardduty.md) + [Network Firewall](querying-network-firewall-logs.md) + [Network Load Balancer](networkloadbalancer-classic-logs.md) + [Route 53](querying-r53-resolver-logs.md) + [Amazon SES](querying-ses-logs.md) + [Amazon VPC](vpc-flow-logs.md) + [AWS WAF](waf-logs.md) # 查詢 Application Load Balancer 日誌 Application Load Balancer 是 Elastic Load Balancing 的一個負載平衡選項，可實施使用容器的微服務部署中的流量分配。查詢 Application Load Balancer 日誌可讓您查看流量來源、延遲，以及往返於 Elastic Load Balancing 執行個體與後端應用程式傳輸的位元組。如需詳細資訊，請參閱《Application Load Balancer 使用者指南**》中的 [Application Load Balancer 的存取日誌](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html)和 [Application Load Balancer 的連線日誌](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-connection-logs.html)。 ## 先決條件 + 啟用[存取日誌記錄](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html)或[連線日誌記錄](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-connection-logs.html)，以便 Application Load Balancer 日誌可以儲存到您的 Amazon S3 儲存貯體中。 + 用於保存您將為 Athena 建立的資料表的資料庫。若要建立資料庫，您可以使用 Athena 或 AWS Glue 主控台。如需詳細資訊，請參閱本指南中的 [在 Athena 中建立資料庫](creating-databases.md)，或《 AWS 開發人員指南》[https://docs.aws.amazon.com/glue/latest/dg/console-databases.html](https://docs.aws.amazon.com/glue/latest/dg/console-databases.html)中的*在AWS Glue Glue 主控台上使用資料庫*。 **Topics** + [先決條件](#application-load-balancer-logs-prerequisites) + [建立 ALB 存取日誌的資料表](create-alb-access-logs-table.md) + [使用分割區投影在 Athena 中建立 ALB 存取日誌的資料表](create-alb-access-logs-table-partition-projection.md) + [ALB 存取日誌的查詢範例](query-alb-access-logs-examples.md) + [建立 ALB 連線日誌的資料表](create-alb-connection-logs-table.md) + [使用分割區投影在 Athena 中建立 ALB 連線日誌的資料表](create-alb-connection-logs-table-partition-projection.md) + [ALB 連線日誌的查詢範例](query-alb-connection-logs-examples.md) + [其他資源](application-load-balancer-logs-additional-resources.md) # 建立 ALB 存取日誌的資料表 1. 複製下列 `CREATE TABLE` 陳述式，並將其貼到 Athena 主控台查詢編輯器，然後針對自己的日誌項目要求視需要進行修改。如需有關 Athena 主控台入門的資訊，請參閱[開始使用](getting-started.md)。將 `LOCATION` 子句中的路徑取代為您的 Amazon S3 存取日誌資料夾位置。如需有關存取日誌檔案位置的詳細資訊，請參閱《Application Load Balancer 使用者指南**》中的[存取日誌檔案](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html#access-log-file-format)。如需有關每個日誌檔案欄位的資訊，請參閱《Application Load Balancer 使用者指南**》中的[存取日誌項目](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html#access-log-entry-format)。 **注意** 下列範例 `CREATE TABLE` 陳述式包含最近新增的 `classification`、`classification_reason` 和 `conn_trace_id` (「可追溯性 ID」或 TID) 資料欄。若要為不含這些項目的 Application Load Balancer 存取日誌建立資料表，請從 `CREATE TABLE` 陳述式移除相應的資料欄，然後據此修改規則表達式。 ``` CREATE EXTERNAL TABLE IF NOT EXISTS alb_access_logs ( type string, time string, elb string, client_ip string, client_port int, target_ip string, target_port int, request_processing_time double, target_processing_time double, response_processing_time double, elb_status_code int, target_status_code string, received_bytes bigint, sent_bytes bigint, request_verb string, request_url string, request_proto string, user_agent string, ssl_cipher string, ssl_protocol string, target_group_arn string, trace_id string, domain_name string, chosen_cert_arn string, matched_rule_priority string, request_creation_time string, actions_executed string, redirect_url string, lambda_error_reason string, target_port_list string, target_status_code_list string, classification string, classification_reason string, conn_trace_id string ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1', 'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\\s]+?)\" \"([^\\s]+)\" \"([^ ]*)\" \"([^ ]*)\" ?([^ ]*)? ?( .*)?' ) LOCATION 's3://amzn-s3-demo-bucket/access-log-folder-path/' ``` **注意** 我們建議在 `input.regex` 參數的結尾一律保留模式 *`?( .*)?`*，以便在新增新的 ALB 日誌欄位時，處理未來的日誌項目。 1. 在 Athena 主控台中執行查詢。查詢完成之後，Athena 會註冊 `alb_access_logs` 資料表，讓其中的資料可供您發出查詢。 # 使用分割區投影在 Athena 中建立 ALB 存取日誌的資料表由於 ALB 存取日誌具有您可以事先指定其分割區結構描述的已知結構，您可以使用 Athena 分割區投影功能減少查詢執行時期並自動化分割區管理。分割區投影會在新增資料時自動新增分割區。因此您無需使用 `ALTER TABLE ADD PARTITION` 手動新增分割區。以下範例 `CREATE TABLE` 陳述式會從指定日期到現在，對單一 AWS 區域的 ALB 存取日誌自動使用分割區投影。該陳述式基於上一節中的範例，但新增了 `PARTITIONED BY` 和 `TBLPROPERTIES` 子句以啟用分割區投影。在 `LOCATION` 和 `storage.location.template` 子句中，請將預留位置取代為識別 ALB 存取日誌的 Amazon S3 儲存貯體位置的值。如需有關存取日誌檔案位置的詳細資訊，請參閱《Application Load Balancer 使用者指南**》中的[存取日誌檔案](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html#access-log-file-format)。對於 `projection.day.range`，請用您要使用的開始日期取代 *2022*/*01*/*01*。成功執行查詢之後，您可以查詢資料表。您無須執行 `ALTER TABLE ADD PARTITION` 就能載入分割區。如需有關每個日誌檔案欄位的資訊，請參閱[存取日誌項目](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html#access-log-entry-format)。 ``` CREATE EXTERNAL TABLE IF NOT EXISTS alb_access_logs ( type string, time string, elb string, client_ip string, client_port int, target_ip string, target_port int, request_processing_time double, target_processing_time double, response_processing_time double, elb_status_code int, target_status_code string, received_bytes bigint, sent_bytes bigint, request_verb string, request_url string, request_proto string, user_agent string, ssl_cipher string, ssl_protocol string, target_group_arn string, trace_id string, domain_name string, chosen_cert_arn string, matched_rule_priority string, request_creation_time string, actions_executed string, redirect_url string, lambda_error_reason string, target_port_list string, target_status_code_list string, classification string, classification_reason string, conn_trace_id string ) PARTITIONED BY ( day STRING ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1', 'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\\s]+?)\" \"([^\\s]+)\" \"([^ ]*)\" \"([^ ]*)\" ?([^ ]*)? ?( .*)?' ) LOCATION 's3://amzn-s3-demo-bucket/AWSLogs//elasticloadbalancing//' TBLPROPERTIES ( "projection.enabled" = "true", "projection.day.type" = "date", "projection.day.range" = "2022/01/01,NOW", "projection.day.format" = "yyyy/MM/dd", "projection.day.interval" = "1", "projection.day.interval.unit" = "DAYS", "storage.location.template" = "s3://amzn-s3-demo-bucket/AWSLogs//elasticloadbalancing//${day}" ) ``` 如需有關分割區投影的詳細資訊，請參閱[透過 Amazon Athena 使用分割區投影](partition-projection.md)。 **注意** 我們建議在 `input.regex` 參數的結尾一律保留模式 *?( .\$1)?*，以便在新增新的 ALB 日誌欄位時，處理未來的日誌項目。 # ALB 存取日誌的查詢範例以下查詢會計算負載平衡器收到的 HTTP GET 請求，並依用戶端 IP 地址分組： ``` SELECT COUNT(request_verb) AS count, request_verb, client_ip FROM alb_access_logs GROUP BY request_verb, client_ip LIMIT 100; ``` 另一個查詢顯示 Safari 瀏覽器使用者造訪的 URL： ``` SELECT request_url FROM alb_access_logs WHERE user_agent LIKE '%Safari%' LIMIT 10; ``` 下列查詢顯示具有 ELB 狀態碼值大於或等於 500 的記錄。 ``` SELECT * FROM alb_access_logs WHERE elb_status_code >= 500 ``` 下列範例說明如何以 `datetime` 剖析日誌： ``` SELECT client_ip, sum(received_bytes) FROM alb_access_logs WHERE parse_datetime(time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z') BETWEEN parse_datetime('2018-05-30-12:00:00','yyyy-MM-dd-HH:mm:ss') AND parse_datetime('2018-05-31-00:00:00','yyyy-MM-dd-HH:mm:ss') GROUP BY client_ip; ``` 下列查詢會查詢對指定日期的所有 ALB 存取日誌使用分割區投影的資料表。 ``` SELECT * FROM alb_access_logs WHERE day = '2022/02/12' ``` # 建立 ALB 連線日誌的資料表 1. 複製下列範例 `CREATE TABLE` 陳述式，並將其貼到 Athena 主控台查詢編輯器，然後針對自己的日誌項目要求視需要進行修改。如需有關 Athena 主控台入門的資訊，請參閱[開始使用](getting-started.md)。將 `LOCATION` 子句中的路徑取代為您的 Amazon S3 連線日誌資料夾位置。如需有關連線日誌檔案位置的詳細資訊，請參閱《Application Load Balancer 使用者指南**》中的[連線日誌檔案](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-connection-logs.html#connection-log-file-format)。如需有關每個日誌檔案欄位的資訊，請參閱[連線日誌項目](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-connection-logs.html#connection-log-entry-format)。 ``` CREATE EXTERNAL TABLE IF NOT EXISTS alb_connection_logs ( time string, client_ip string, client_port int, listener_port int, tls_protocol string, tls_cipher string, tls_handshake_latency double, leaf_client_cert_subject string, leaf_client_cert_validity string, leaf_client_cert_serial_number string, tls_verify_status string, conn_trace_id string ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1', 'input.regex' = '([^ ]*) ([^ ]*) ([0-9]*) ([0-9]*) ([A-Za-z0-9.-]*) ([^ ]*) ([-.0-9]*) \"([^\"]*)\" ([^ ]*) ([^ ]*) ([^ ]*) ?([^ ]*)?( .*)?' ) LOCATION 's3://amzn-s3-demo-bucket/connection-log-folder-path/' ``` 1. 在 Athena 主控台中執行查詢。查詢完成之後，Athena 會註冊 `alb_connection_logs` 資料表，讓其中的資料可供您發出查詢。 # 使用分割區投影在 Athena 中建立 ALB 連線日誌的資料表由於 ALB 連線日誌具有您可以事先指定其分割區名稱的已知結構，您可以使用 Athena 分割區投影功能減少查詢執行時期並自動化分割區管理。分割區投影會在新增資料時自動新增分割區。因此您無需使用 `ALTER TABLE ADD PARTITION` 手動新增分割區。以下範例 `CREATE TABLE` 陳述式會從指定日期到現在，對單一 AWS 區域的 ALB 連線日誌自動使用分割區投影。該陳述式基於上一節中的範例，但新增了 `PARTITIONED BY` 和 `TBLPROPERTIES` 子句以啟用分割區投影。在 `LOCATION` 和 `storage.location.template` 子句中，請將預留位置取代為識別 ALB 連線日誌的 Amazon S3 儲存貯體位置的值。如需有關連線日誌檔案位置的詳細資訊，請參閱《Application Load Balancer 使用者指南**》中的[連線日誌檔案](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-connection-logs.html#connection-log-file-format)。對於 `projection.day.range`，請用您要使用的開始日期取代 *2023*/*01*/*01*。成功執行查詢之後，您可以查詢資料表。您無須執行 `ALTER TABLE ADD PARTITION` 就能載入分割區。如需有關每個日誌檔案欄位的資訊，請參閱[連線日誌項目](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-connection-logs.html#connection-log-entry-format)。 ``` CREATE EXTERNAL TABLE IF NOT EXISTS alb_connection_logs ( time string, client_ip string, client_port int, listener_port int, tls_protocol string, tls_cipher string, tls_handshake_latency double, leaf_client_cert_subject string, leaf_client_cert_validity string, leaf_client_cert_serial_number string, tls_verify_status string, conn_trace_id string ) PARTITIONED BY ( day STRING ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1', 'input.regex' = '([^ ]*) ([^ ]*) ([0-9]*) ([0-9]*) ([A-Za-z0-9.-]*) ([^ ]*) ([-.0-9]*) \"([^\"]*)\" ([^ ]*) ([^ ]*) ([^ ]*) ?([^ ]*)?( .*)?' ) LOCATION 's3://amzn-s3-demo-bucket/AWSLogs//elasticloadbalancing//' TBLPROPERTIES ( "projection.enabled" = "true", "projection.day.type" = "date", "projection.day.range" = "2023/01/01,NOW", "projection.day.format" = "yyyy/MM/dd", "projection.day.interval" = "1", "projection.day.interval.unit" = "DAYS", "storage.location.template" = "s3://amzn-s3-demo-bucket/AWSLogs//elasticloadbalancing//${day}" ) ``` 如需有關分割區投影的詳細資訊，請參閱[透過 Amazon Athena 使用分割區投影](partition-projection.md)。 # ALB 連線日誌的查詢範例下列查詢會計算 `tls_verify_status` 的值並非 `'Success'` 的出現次數，並依用戶端 IP 位址分組： ``` SELECT DISTINCT client_ip, count() AS count FROM alb_connection_logs WHERE tls_verify_status != 'Success' GROUP BY client_ip ORDER BY count() DESC; ``` 下列查詢會搜尋 `tls_handshake_latency` 的值在指定時間範圍內超過 2 秒的出現次數： ``` SELECT * FROM alb_connection_logs WHERE ( parse_datetime(time, 'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z') BETWEEN parse_datetime('2024-01-01-00:00:00', 'yyyy-MM-dd-HH:mm:ss') AND parse_datetime('2024-03-20-00:00:00', 'yyyy-MM-dd-HH:mm:ss') ) AND (tls_handshake_latency >= 2.0); ``` # 其他資源如需有關使用 ALB 日誌的詳細資訊，請參閱以下資源。 + *AWS 知識中心*中的[如何使用 Amazon Athena 分析 Application Load Balancer 存取日誌？](https://repost.aws/knowledge-center/athena-analyze-access-logs)。 + 如需有關 Elastic Load Balancing 中的 HTTP 狀態碼的資訊，請參閱《*Application Load Balancers 使用者指南*》中的[對 Application Load Balancer 進行疑難排解](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-troubleshooting.html)。 + *AWS 大數據部落格*中的[使用 AWS Glue 自訂分類器和 Amazon Athena 更有效地編目和分析 Application Load Balancer 日誌](https://aws.amazon.com/blogs/big-data/catalog-and-analyze-application-load-balancer-logs-more-efficiently-with-aws-glue-custom-classifiers-and-amazon-athena/)。 # 查詢 Classic Load Balancer 日誌使用 Classic Load Balancer 日誌來分析和了解往返於 Elastic Load Balancing 執行個體與後端應用程式的流量模式。您可以查看流量的來源、延遲和已傳輸的位元組。分析 Elastic Load Balancing 日誌之前，請對其進行設定，以在目的地 Amazon S3 儲存貯體中儲存。如需詳細資訊，請參閱[為您的 Classic Load Balancer 啟用存取日誌](https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/enable-access-logs.html)。 **若要為 Elastic Load Balancing 日誌建立資料表** 1. 複製下列 DDL 陳述式，並將其貼到 Athena 主控台。檢查 Elastic Load Balancing 日誌記錄的[語法](https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/access-log-collection.html#access-log-entry-format)。您可能需要更新以下查詢，為記錄的最新版本包含欄和 Regex 語法。 ``` CREATE EXTERNAL TABLE IF NOT EXISTS elb_logs ( timestamp string, elb_name string, request_ip string, request_port int, backend_ip string, backend_port int, request_processing_time double, backend_processing_time double, client_response_time double, elb_response_code string, backend_response_code string, received_bytes bigint, sent_bytes bigint, request_verb string, url string, protocol string, user_agent string, ssl_cipher string, ssl_protocol string ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1', 'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \\\"([^ ]*) ([^ ]*) (- |[^ ]*)\\\" (\"[^\"]*\") ([A-Z0-9-]+) ([A-Za-z0-9.-]*)$' ) LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/AWS_account_ID/elasticloadbalancing/'; ``` 1. 修改 `LOCATION` Amazon S3 儲存貯體，以指定您的 Elastic Load Balancing 日誌的目的地。 1. 在 Athena 主控台中執行查詢。查詢完成之後，Athena 會註冊 `elb_logs` 資料表，讓其中的資料可供您查詢。如需詳細資訊，請參閱[查詢範例](#query-elb-classic-example)。 ## 查詢範例使用與以下範例類似的查詢。它會列出傳回 `4XX` 或 `5XX` 錯誤回應碼的後端應用程式伺服器。使用 `LIMIT` 運算子來限制一次查詢的日誌數量。 ``` SELECT timestamp, elb_name, backend_ip, backend_response_code FROM elb_logs WHERE backend_response_code LIKE '4%' OR backend_response_code LIKE '5%' LIMIT 100; ``` 使用後續的查詢來將依後端 IP 地址和 Elastic Load Balancing 執行個體名稱分組的所有交易的回應時間加總。 ``` SELECT sum(backend_processing_time) AS total_ms, elb_name, backend_ip FROM elb_logs WHERE backend_ip <> '' GROUP BY backend_ip, elb_name LIMIT 100; ``` 如需詳細資訊，請參閱[使用 Athena 在 S3 中分析資料](https://aws.amazon.com/blogs/big-data/analyzing-data-in-s3-using-amazon-athena/)。 # 查詢 Amazon CloudFront 日誌您可以設定 Amazon CloudFront CDN 來將 Web 分佈存取日誌匯出至 Amazon Simple Storage Service。使用這些日誌來探索使用者的瀏覽模式，其涵蓋 CloudFront 提供的所有 Web 屬性。開始查詢日誌之前，請在您偏好的 CloudFront 分佈中啟用 Web 分佈存取日誌。如需相關資訊，請參閱《Amazon CloudFront 開發人員指南》**中的[存取日誌](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/AccessLogs.html)。記下儲存這些日誌的 Amazon S3 儲存貯體。 **Topics** + [建立 CloudFront 標準日誌的資料表 (舊版)](create-cloudfront-table-standard-logs.md) + [使用手動分割 (JSON) 在 Athena 中建立 CloudFront 日誌的資料表](create-cloudfront-table-manual-json.md) + [使用手動分割 (Parquet) 在 Athena 中建立 CloudFront 日誌的資料表](create-cloudfront-table-manual-parquet.md) + [使用分割區投影 (JSON) 在 Athena 中建立 CloudFront 日誌的資料表](create-cloudfront-table-partition-json.md) + [使用分割區投影 (Parquet) 在 Athena 中建立 CloudFront 日誌的資料表](create-cloudfront-table-partition-parquet.md) + [建立 CloudFront 即時日誌的資料表](create-cloudfront-table-real-time-logs.md) + [其他資源](cloudfront-logs-additional-resources.md) # 建立 CloudFront 標準日誌的資料表 (舊版) **注意** 下列程序適用於 CloudFront 中的 Web 分佈存取日誌。不適用於來自 RTMP 分佈的串流日誌。 **建立 CloudFront 標準日誌檔案欄位的資料表** 1. 複製下列範例 DDL 陳述式，並將其貼到 Athena 主控台查詢編輯器。範例陳述式使用《**Amazon CloudFront 開發人員指南》中[標準日誌檔欄位](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/AccessLogs.html#BasicDistributionFileFormat)章節中所述的日誌檔欄位。為存放日誌的 Amazon S3 儲存貯體修改 `LOCATION`。如需有關使用查詢編輯器的資訊，請參閱[開始使用](getting-started.md)。此查詢可指定 `ROW FORMAT DELIMITED` 和 `FIELDS TERMINATED BY '\t'`，表示欄位以定位字元分隔。對於 `ROW FORMAT DELIMITED`，Athena 預設使用 [LazySimpleSerDe](lazy-simple-serde.md)。因為在 Athena 中它是保留字，所以使用反引號 (') 逸出 `date` 資料欄。如需相關資訊，請參閱[在查詢中逸出預留關鍵字](reserved-words.md)。 ``` CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_standard_logs ( `date` DATE, time STRING, x_edge_location STRING, sc_bytes BIGINT, c_ip STRING, cs_method STRING, cs_host STRING, cs_uri_stem STRING, sc_status INT, cs_referrer STRING, cs_user_agent STRING, cs_uri_query STRING, cs_cookie STRING, x_edge_result_type STRING, x_edge_request_id STRING, x_host_header STRING, cs_protocol STRING, cs_bytes BIGINT, time_taken FLOAT, x_forwarded_for STRING, ssl_protocol STRING, ssl_cipher STRING, x_edge_response_result_type STRING, cs_protocol_version STRING, fle_status STRING, fle_encrypted_fields INT, c_port INT, time_to_first_byte FLOAT, x_edge_detailed_result_type STRING, sc_content_type STRING, sc_content_len BIGINT, sc_range_start BIGINT, sc_range_end BIGINT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 's3://amzn-s3-demo-bucket/' TBLPROPERTIES ( 'skip.header.line.count'='2' ) ``` 1. 在 Athena 主控台中執行查詢。查詢完成之後，Athena 會註冊 `cloudfront_standard_logs` 資料表，讓其中的資料可供您發出查詢。 ## 查詢範例以下查詢加總 CloudFront 在 2018 年 6 月 9 日到 6 月 11 日之間提供的位元組數。以雙引號圍住日期欄名稱，因為它是保留字。 ``` SELECT SUM(bytes) AS total_bytes FROM cloudfront_standard_logs WHERE "date" BETWEEN DATE '2018-06-09' AND DATE '2018-06-11' LIMIT 100; ``` 若要從查詢結果中除去重複的資料列 (例如，重複的空白資料列)，您可以使用 `SELECT DISTINCT` 陳述式，如下列範例所示。 ``` SELECT DISTINCT * FROM cloudfront_standard_logs LIMIT 10; ``` # 使用手動分割 (JSON) 在 Athena 中建立 CloudFront 日誌的資料表 **使用 JSON 格式建立 CloudFront 標準日誌檔案欄位的資料表** 1. 複製下列範例 DDL 陳述式，並將其貼到 Athena 主控台查詢編輯器。範例陳述式使用《**Amazon CloudFront 開發人員指南》中[標準日誌檔欄位](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/AccessLogs.html#BasicDistributionFileFormat)章節中所述的日誌檔欄位。為存放日誌的 Amazon S3 儲存貯體修改 `LOCATION`。此查詢會搭配下列 SerDe 屬性使用 OpenX JSON SerDe，以在 Athena 中正確讀取 JSON 欄位。 ``` CREATE EXTERNAL TABLE `cf_logs_manual_partition_json`( `date` string , `time` string , `x-edge-location` string , `sc-bytes` string , `c-ip` string , `cs-method` string , `cs(host)` string , `cs-uri-stem` string , `sc-status` string , `cs(referer)` string , `cs(user-agent)` string , `cs-uri-query` string , `cs(cookie)` string , `x-edge-result-type` string , `x-edge-request-id` string , `x-host-header` string , `cs-protocol` string , `cs-bytes` string , `time-taken` string , `x-forwarded-for` string , `ssl-protocol` string , `ssl-cipher` string , `x-edge-response-result-type` string , `cs-protocol-version` string , `fle-status` string , `fle-encrypted-fields` string , `c-port` string , `time-to-first-byte` string , `x-edge-detailed-result-type` string , `sc-content-type` string , `sc-content-len` string , `sc-range-start` string , `sc-range-end` string ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( 'paths'='c-ip,c-port,cs(Cookie),cs(Host),cs(Referer),cs(User-Agent),cs-bytes,cs-method,cs-protocol,cs-protocol-version,cs-uri-query,cs-uri-stem,date,fle-encrypted-fields,fle-status,sc-bytes,sc-content-len,sc-content-type,sc-range-end,sc-range-start,sc-status,ssl-cipher,ssl-protocol,time,time-taken,time-to-first-byte,x-edge-detailed-result-type,x-edge-location,x-edge-request-id,x-edge-response-result-type,x-edge-result-type,x-forwarded-for,x-host-header') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/' ``` 1. 在 Athena 主控台中執行查詢。查詢完成之後，Athena 會註冊 `cf_logs_manual_partition_json` 資料表，讓其中的資料可供您發出查詢。 ## 查詢範例下列查詢加總 CloudFront 在 2025 年 1 月 15 日提供的位元組數。 ``` SELECT sum(cast("sc-bytes" as BIGINT)) as sc FROM cf_logs_manual_partition_json WHERE "date"='2025-01-15' ``` 若要從查詢結果中除去重複的資料列 (例如，重複的空白資料列)，您可以使用 `SELECT DISTINCT` 陳述式，如下列範例所示。 ``` SELECT DISTINCT * FROM cf_logs_manual_partition_json ``` # 使用手動分割 (Parquet) 在 Athena 中建立 CloudFront 日誌的資料表 **使用 Parquet 格式建立 CloudFront 標準日誌檔案欄位的資料表** 1. 複製下列範例 DDL 陳述式，並將其貼到 Athena 主控台查詢編輯器。範例陳述式使用《**Amazon CloudFront 開發人員指南》中[標準日誌檔欄位](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/AccessLogs.html#BasicDistributionFileFormat)章節中所述的日誌檔欄位。此查詢會搭配下列 SerDe 屬性使用 ParquetHiveSerDe，以在 Athena 中正確讀取 Parquet 欄位。 ``` CREATE EXTERNAL TABLE `cf_logs_manual_partition_parquet`( `date` string, `time` string, `x_edge_location` string, `sc_bytes` string, `c_ip` string, `cs_method` string, `cs_host` string, `cs_uri_stem` string, `sc_status` string, `cs_referer` string, `cs_user_agent` string, `cs_uri_query` string, `cs_cookie` string, `x_edge_result_type` string, `x_edge_request_id` string, `x_host_header` string, `cs_protocol` string, `cs_bytes` string, `time_taken` string, `x_forwarded_for` string, `ssl_protocol` string, `ssl_cipher` string, `x_edge_response_result_type` string, `cs_protocol_version` string, `fle_status` string, `fle_encrypted_fields` string, `c_port` string, `time_to_first_byte` string, `x_edge_detailed_result_type` string, `sc_content_type` string, `sc_content_len` string, `sc_range_start` string, `sc_range_end` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/' ``` 1. 在 Athena 主控台中執行查詢。查詢完成之後，Athena 會註冊 `cf_logs_manual_partition_parquet` 資料表，讓其中的資料可供您發出查詢。 ## 查詢範例下列查詢加總 CloudFront 在 2025 年 1 月 19 日提供的位元組數。 ``` SELECT sum(cast("sc_bytes" as BIGINT)) as sc FROM cf_logs_manual_partition_parquet WHERE "date"='2025-01-19' ``` 若要從查詢結果中除去重複的資料列 (例如，重複的空白資料列)，您可以使用 `SELECT DISTINCT` 陳述式，如下列範例所示。 ``` SELECT DISTINCT * FROM cf_logs_manual_partition_parquet ``` # 使用分割區投影 (JSON) 在 Athena 中建立 CloudFront 日誌的資料表您可以使用 Athena 分割區投影功能減少查詢執行時期並自動化分割區管理。分割區投影會在新增資料時自動新增分割區。因此您無需使用 `ALTER TABLE ADD PARTITION` 手動新增分割區。下列 CREATE TABLE 陳述式範例會從指定 CloudFront 分佈到現在，對單一 AWS 區域的 CloudFront 日誌自動使用分割區投影。成功執行查詢之後，您可以查詢資料表。 ``` CREATE EXTERNAL TABLE `cloudfront_logs_pp`( `date` string, `time` string, `x-edge-location` string, `sc-bytes` string, `c-ip` string, `cs-method` string, `cs(host)` string, `cs-uri-stem` string, `sc-status` string, `cs(referer)` string, `cs(user-agent)` string, `cs-uri-query` string, `cs(cookie)` string, `x-edge-result-type` string, `x-edge-request-id` string, `x-host-header` string, `cs-protocol` string, `cs-bytes` string, `time-taken` string, `x-forwarded-for` string, `ssl-protocol` string, `ssl-cipher` string, `x-edge-response-result-type` string, `cs-protocol-version` string, `fle-status` string, `fle-encrypted-fields` string, `c-port` string, `time-to-first-byte` string, `x-edge-detailed-result-type` string, `sc-content-type` string, `sc-content-len` string, `sc-range-start` string, `sc-range-end` string) PARTITIONED BY( distributionid string, year int, month int, day int, hour int ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( 'paths'='c-ip,c-port,cs(Cookie),cs(Host),cs(Referer),cs(User-Agent),cs-bytes,cs-method,cs-protocol,cs-protocol-version,cs-uri-query,cs-uri-stem,date,fle-encrypted-fields,fle-status,sc-bytes,sc-content-len,sc-content-type,sc-range-end,sc-range-start,sc-status,ssl-cipher,ssl-protocol,time,time-taken,time-to-first-byte,x-edge-detailed-result-type,x-edge-location,x-edge-request-id,x-edge-response-result-type,x-edge-result-type,x-forwarded-for,x-host-header') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/AWS_ACCOUNT_ID/CloudFront/' TBLPROPERTIES ( 'projection.distributionid.type'='enum', 'projection.distributionid.values'='E2Oxxxxxxxxxxx', 'projection.day.range'='01,31', 'projection.day.type'='integer', 'projection.day.digits'='2', 'projection.enabled'='true', 'projection.month.range'='01,12', 'projection.month.type'='integer', 'projection.month.digits'='2', 'projection.year.range'='2025,2026', 'projection.year.type'='integer', 'projection.hour.range'='00,23', 'projection.hour.type'='integer', 'projection.hour.digits'='2', 'storage.location.template'='s3://amzn-s3-demo-bucket/AWSLogs/AWS_ACCOUNT_ID/CloudFront/${distributionid}/${year}/${month}/${day}/${hour}/') ``` 以下是先前範例中所用屬性的一些考量事項。 + **資料表名稱** – 資料表名稱 *`cloudfront_logs_pp`* 為可取代。您可以將其變更為您偏好的任何名稱。 + **位置** – 修改 `s3://amzn-s3-demo-bucket/AWSLogs/AWS_ACCOUNT_ID/` 來指向您的 Amazon S3 儲存貯體。 + **分佈 ID** – 對於 `projection.distributionid.values`，如果您使用逗號分隔分佈 ID，則可以指定多個分佈 ID。例如，**、**。 + **年份範圍** – 在 `projection.year.range` 中，您可以根據資料定義年份範圍。例如，您可以將其調整為任何期間，例如 *2025 年*、*2026 年*。 **注意** 包含空的分割區 (例如未來日期的分割區，例如：2025-2040 年) 可能會影響查詢效能。不過，分割區投影旨在有效地處理未來日期。為了維持最佳效能，請確保仔細管理分割區，並盡可能避免過多的空的分割區。 + **儲存位置範本** – 您必須確保根據下列 CloudFront 分割結構和 S3 路徑正確更新 `storage.location.template`。 **** [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/zh_tw/athena/latest/ug/create-cloudfront-table-partition-json.html) 在確認 CloudFront 分割結構和 S3 結構符合所需的模式後，請更新 `storage.location.template`，如下所示： ``` 'storage.location.template'='s3://amzn-s3-demo-bucket/AWSLogs/account_id/CloudFront/${distributionid}/folder2/${year}/${month}/${day}/${hour}/folder3/' ``` **注意** `storage.location.template` 的適當組態對於確保正確的資料儲存和擷取至關重要。 # 使用分割區投影 (Parquet) 在 Athena 中建立 CloudFront 日誌的資料表下列 CREATE TABLE 陳述式範例會從指定 CloudFront 分佈到現在，對單一 AWS 區域的 Parquet 格式 CloudFront 日誌自動使用分割區投影。成功執行查詢之後，您可以查詢資料表。 ``` CREATE EXTERNAL TABLE `cloudfront_logs_parquet_pp`( `date` string, `time` string, `x_edge_location` string, `sc_bytes` string, `c_ip` string, `cs_method` string, `cs_host` string, `cs_uri_stem` string, `sc_status` string, `cs_referer` string, `cs_user_agent` string, `cs_uri_query` string, `cs_cookie` string, `x_edge_result_type` string, `x_edge_request_id` string, `x_host_header` string, `cs_protocol` string, `cs_bytes` string, `time_taken` string, `x_forwarded_for` string, `ssl_protocol` string, `ssl_cipher` string, `x_edge_response_result_type` string, `cs_protocol_version` string, `fle_status` string, `fle_encrypted_fields` string, `c_port` string, `time_to_first_byte` string, `x_edge_detailed_result_type` string, `sc_content_type` string, `sc_content_len` string, `sc_range_start` string, `sc_range_end` string) PARTITIONED BY( distributionid string, year int, month int, day int, hour int ) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/AWS_ACCOUNT_ID/CloudFront/' TBLPROPERTIES ( 'projection.distributionid.type'='enum', 'projection.distributionid.values'='E3OK0LPUNWWO3', 'projection.day.range'='01,31', 'projection.day.type'='integer', 'projection.day.digits'='2', 'projection.enabled'='true', 'projection.month.range'='01,12', 'projection.month.type'='integer', 'projection.month.digits'='2', 'projection.year.range'='2019,2025', 'projection.year.type'='integer', 'projection.hour.range'='01,12', 'projection.hour.type'='integer', 'projection.hour.digits'='2', 'storage.location.template'='s3://amzn-s3-demo-bucket/AWSLogs/AWS_ACCOUNT_ID/CloudFront/${distributionid}/${year}/${month}/${day}/${hour}/') ``` 以下是先前範例中所用屬性的一些考量事項。 + **資料表名稱** – 資料表名稱 *`cloudfront_logs_pp`* 為可取代。您可以將其變更為您偏好的任何名稱。 + **位置** – 修改 `s3://amzn-s3-demo-bucket/AWSLogs/AWS_ACCOUNT_ID/` 來指向您的 Amazon S3 儲存貯體。 + **分佈 ID** – 對於 `projection.distributionid.values`，如果您使用逗號分隔分佈 ID，則可以指定多個分佈 ID。例如，**、**。 + **年份範圍** – 在 `projection.year.range` 中，您可以根據資料定義年份範圍。例如，您可以將其調整為任何期間，例如 *2025 年*、*2026 年*。 **注意** 包含空的分割區 (例如未來日期的分割區，例如：2025-2040 年) 可能會影響查詢效能。不過，分割區投影旨在有效地處理未來日期。為了維持最佳效能，請確保仔細管理分割區，並盡可能避免過多的空的分割區。 + **儲存位置範本** – 您必須確保根據下列 CloudFront 分割結構和 S3 路徑正確更新 `storage.location.template`。 **** [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/zh_tw/athena/latest/ug/create-cloudfront-table-partition-parquet.html) 在確認 CloudFront 分割結構和 S3 結構符合所需的模式後，請更新 `storage.location.template`，如下所示： ``` 'storage.location.template'='s3://amzn-s3-demo-bucket/AWSLogs/account_id/CloudFront/${distributionid}/folder2/${year}/${month}/${day}/${hour}/folder3/' ``` **注意** `storage.location.template` 的適當組態對於確保正確的資料儲存和擷取至關重要。 # 建立 CloudFront 即時日誌的資料表 **建立 CloudFront 即時日誌檔案欄位的資料表** 1. 複製下列範例 DDL 陳述式，並將其貼到 Athena 主控台查詢編輯器。範例陳述式使用《Amazon CloudFront 開發人員指南**》中[即時日誌](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/real-time-logs.html)章節中所述的日誌檔案欄位。為存放日誌的 Amazon S3 儲存貯體修改 `LOCATION`。如需有關使用查詢編輯器的資訊，請參閱[開始使用](getting-started.md)。此查詢可指定 `ROW FORMAT DELIMITED` 和 `FIELDS TERMINATED BY '\t'`，表示欄位以定位字元分隔。對於 `ROW FORMAT DELIMITED`，Athena 預設使用 [LazySimpleSerDe](lazy-simple-serde.md)。因為在 Athena 中它是保留字，所以使用反引號 (') 逸出 `timestamp` 資料欄。如需相關資訊，請參閱[在查詢中逸出預留關鍵字](reserved-words.md)。下列範例包含所有可用的欄位。您可以註解排除或移除不需要的欄位。 ``` CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_real_time_logs ( `timestamp` STRING, c_ip STRING, time_to_first_byte BIGINT, sc_status BIGINT, sc_bytes BIGINT, cs_method STRING, cs_protocol STRING, cs_host STRING, cs_uri_stem STRING, cs_bytes BIGINT, x_edge_location STRING, x_edge_request_id STRING, x_host_header STRING, time_taken BIGINT, cs_protocol_version STRING, c_ip_version STRING, cs_user_agent STRING, cs_referer STRING, cs_cookie STRING, cs_uri_query STRING, x_edge_response_result_type STRING, x_forwarded_for STRING, ssl_protocol STRING, ssl_cipher STRING, x_edge_result_type STRING, fle_encrypted_fields STRING, fle_status STRING, sc_content_type STRING, sc_content_len BIGINT, sc_range_start STRING, sc_range_end STRING, c_port BIGINT, x_edge_detailed_result_type STRING, c_country STRING, cs_accept_encoding STRING, cs_accept STRING, cache_behavior_path_pattern STRING, cs_headers STRING, cs_header_names STRING, cs_headers_count BIGINT, primary_distribution_id STRING, primary_distribution_dns_name STRING, origin_fbl STRING, origin_lbl STRING, asn STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 's3://amzn-s3-demo-bucket/' TBLPROPERTIES ( 'skip.header.line.count'='2' ) ``` 1. 在 Athena 主控台中執行查詢。查詢完成之後，Athena 會註冊 `cloudfront_real_time_logs` 資料表，讓其中的資料可供您發出查詢。 # 其他資源如需有關使用 Athena 查詢 CloudFront 日誌的詳細資訊，請參閱 [AWS 大數據部落格](https://aws.amazon.com/blogs/big-data/)中的下列貼文。 [使用 Amazon Athena 輕鬆查詢 AWS 服務日誌](https://aws.amazon.com/blogs/big-data/easily-query-aws-service-logs-using-amazon-athena/) (2019 年 5 月 29 日）。 [大規模分析 Amazon CloudFront 存取日誌](https://aws.amazon.com/blogs/big-data/analyze-your-amazon-cloudfront-access-logs-at-scale/) (2018 年 12 月 21 日)。 [建置無伺服器架構，使用 AWS Lambda、Amazon Athena 和 Amazon Managed Service for Apache Flink (2017 年 5 月 26 日）來分析 Amazon CloudFront 存取日誌 Amazon Athena](https://aws.amazon.com/blogs/big-data/build-a-serverless-architecture-to-analyze-amazon-cloudfront-access-logs-using-aws-lambda-amazon-athena-and-amazon-kinesis-analytics/)。 # 查詢 AWS CloudTrail 日誌 AWS CloudTrail 是一項服務，可記錄 Amazon Web Services 帳戶的 AWS API 呼叫和事件。 CloudTrail 日誌包含對進行任何 API 呼叫的詳細資訊 AWS 服務，包括主控台。CloudTrail 會產生加密的日誌檔案，並將它們存放在 Amazon S3 中。如需詳細資訊，請參閱《AWS CloudTrail 使用者指南》[https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html)。 **注意** 如果要跨帳戶、區域和日期對 CloudTrail 事件資訊執行 SQL 查詢，請考慮使用 CloudTrail Lake。CloudTrail Lake 是建立線索的 AWS 替代方案，可將企業的資訊彙總到單一可搜尋的事件資料存放區。其不使用 Amazon S3 儲存貯體儲存，而是將事件儲存在資料湖中，進而允許更豐富、更快的查詢。您可以使用它來建立 SQL 查詢，以便跨組織、區域並在自訂的時間範圍內搜尋事件。因為您在 CloudTrail 主控台本身中執行 CloudTrail Lake 查詢，因此使用 CloudTrail Lake 不需要 Athena。如需詳細資訊，請參閱 [CloudTrail Lake](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-lake.html) 文件。搭配使用 Athena 與 CloudTrail 日誌是一種功能強大的方式，可增強 AWS 服務活動的分析。例如，您可以使用查詢來識別趨勢，並依屬性 (例如來源 IP 地址或使用者) 進一步隔離活動。常見的應用是使用 CloudTrail 日誌來分析營運活動的安全性與合規性。如需詳細範例的資訊，請參閱 AWS 大數據部落格文章、[使用和 Amazon Athena 分析安全性、合規性 AWS CloudTrail 和操作活動](https://aws.amazon.com/blogs/big-data/aws-cloudtrail-and-amazon-athena-dive-deep-to-analyze-security-compliance-and-operational-activity/)。您可以使用 Athena，指定日誌檔案的 `LOCATION`，直接從 Amazon S3 查詢這些日誌檔案。有以下兩種做法： + 從 CloudTrail 主控台直接建立 CloudTrail 日誌檔案的資料表。 + 在 Athena 主控台中手動建立 CloudTrail 日誌檔案的資料表。 **Topics** + [了解 CloudTrail 日誌和 Athena 資料表](create-cloudtrail-table-understanding.md) + [使用 CloudTrail 主控台建立 CloudTrail 日誌的 Athena 資料表](create-cloudtrail-table-ct.md) + [使用手動分割在 Athena 中建立 CloudTrail 日誌的資料表](create-cloudtrail-table.md) + [使用手動分割為整個組織的追蹤建立資料表](create-cloudtrail-table-org-wide-trail.md) + [使用分割區投影在 Athena 中建立 CloudTrail 日誌的資料表](create-cloudtrail-table-partition-projection.md) + [CloudTrail 日誌查詢範例](query-examples-cloudtrail-logs.md) # 了解 CloudTrail 日誌和 Athena 資料表開始建立資料表之前，您應該更深入了解 CloudTrail 及其如何存放資料。這可協助您建立所需資料表，不論您的建立方式是透過 CloudTrail 主控台或 Athena。 CloudTrail 將日誌以壓縮的 gzip 格式 (`*.json.gz`) 儲存在 JSON 文字檔案中。日誌檔案的位置取決於您設定追蹤的方式、您正在記錄的 AWS 區域或區域，以及其他因素。如需有關日誌存放位置、JSON 結構、日誌檔案內容的詳細資訊，請參閱《[AWS CloudTrail 使用者指南](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html)》中的下列主題： + [尋找 CloudTrail 日誌檔案](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-find-log-files.html) + [CloudTrail 日誌檔案範例](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-log-file-examples.html) + [CloudTrail 記錄內容](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-event-reference-record-contents.html) + [CloudTrail 事件參考](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-event-reference.html) 若要收集日誌並儲存到 Amazon S3 中，請從 AWS 管理主控台啟用 CloudTrail。如需詳細資訊，請參閱《AWS CloudTrail 使用者指南》**中的[建立追蹤](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-create-a-trail-using-the-console-first-time.html)。 # 使用 CloudTrail 主控台建立 CloudTrail 日誌的 Athena 資料表您可以建立非分割 Athena 資料表，以直接從 CloudTrail 主控台查詢 CloudTrail 日誌。從 CloudTrail 主控台建立 Athena 資料表時，您必須使用具有足夠許可的角色登入，才能在 Athena 中建立資料表。 **注意** 您無法使用 CloudTrail 主控台建立組織線索日誌的 Athena 資料表。而是要使用 Athena 主控台手動建立資料表，以便指定正確的儲存位置。如需有關組織線索的資訊，請參閱《AWS CloudTrail 使用者指南》**中的[為組織建立追蹤](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/creating-trail-organization.html)。 + 如需有關設定 Athena 許可的資訊，請參閱 [設定、管理和以程式設計方式存取](setting-up.md)。 + 如需建立包含分割區之資料表的相關資訊，請參閱[使用手動分割在 Athena 中建立 CloudTrail 日誌的資料表](create-cloudtrail-table.md)。 **若要使用 CloudTrail 主控台為 CloudTrail 線索建立 Athena 資料表** 1. 透過 [https://console.aws.amazon.com/cloudtrail/](https://console.aws.amazon.com/cloudtrail/) 開啟 CloudTrail 主控台。 1. 在導覽窗格中，選擇 **Event history** (事件歷史記錄)。 1. 選擇 **Create Athena table** (建立 Athena 資料表)。 ![\[選擇 Create Athena table (建立 Athena 資料表)\]](http://docs.aws.amazon.com/zh_tw/athena/latest/ug/images/cloudtrail-logs-create-athena-table.png) 1. 對於 **Storage location** (儲存位置)，請使用向下箭頭選取存放日誌檔案的 Amazon S3 儲存貯體，以供查詢線索。 **注意** 若要尋找與線索相關聯的儲存貯體名稱，請選擇 CloudTrail 導覽窗格中的 **Trails** (線索)，然後檢視該線索的 **S3 bucket** (S3 儲存貯體) 資料欄。若要查看儲存貯體的 Amazon S3 位置，請在 **S3 bucket** (S3 儲存貯體) 資料欄中選擇儲存貯體的連結。這會開啟 Amazon S3 主控台，以前往 CloudTrail 儲存貯體位置。 1. 選擇 **Create Table** (建立資料表)。建立的資料表會使用包含 Amazon S3 儲存貯體名稱的預設名稱。 # 使用手動分割在 Athena 中建立 CloudTrail 日誌的資料表您可以在 Athena 主控台中手動建立 CloudTrail 日誌檔案的資料表，然後在 Athena 中執行查詢。 **若要使用 Athena 主控台為 CloudTrail 線索建立 Athena 資料表** 1. 複製下列 DDL 陳述式，並將其貼到 Athena 主控台查詢編輯器，然後根據您的要求進行修改。請注意，由於 CloudTrail 日誌檔案並非公有 API 呼叫的堆疊追蹤排序，因此日誌檔案中的欄位不會以任何特定順序出現。 ``` CREATE EXTERNAL TABLE cloudtrail_logs ( eventversion STRING, useridentity STRUCT< type:STRING, principalid:STRING, arn:STRING, accountid:STRING, invokedby:STRING, accesskeyid:STRING, username:STRING, onbehalfof: STRUCT< userid: STRING, identitystorearn: STRING>, sessioncontext:STRUCT< attributes:STRUCT< mfaauthenticated:STRING, creationdate:STRING>, sessionissuer:STRUCT< type:STRING, principalid:STRING, arn:STRING, accountid:STRING, username:STRING>, ec2roledelivery:string, webidfederationdata: STRUCT< federatedprovider: STRING, attributes: map> > >, eventtime STRING, eventsource STRING, eventname STRING, awsregion STRING, sourceipaddress STRING, useragent STRING, errorcode STRING, errormessage STRING, requestparameters STRING, responseelements STRING, additionaleventdata STRING, requestid STRING, eventid STRING, resources ARRAY>, eventtype STRING, apiversion STRING, readonly STRING, recipientaccountid STRING, serviceeventdetails STRING, sharedeventid STRING, vpcendpointid STRING, vpcendpointaccountid STRING, eventcategory STRING, addendum STRUCT< reason:STRING, updatedfields:STRING, originalrequestid:STRING, originaleventid:STRING>, sessioncredentialfromconsole STRING, edgedevicedetails STRING, tlsdetails STRUCT< tlsversion:STRING, ciphersuite:STRING, clientprovidedhostheader:STRING> ) PARTITIONED BY (region string, year string, month string, day string) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/Account_ID/'; ``` **注意** 我們建議使用範例中顯示的 `org.apache.hive.hcatalog.data.JsonSerDe`。雖然存在 `com.amazon.emr.hive.serde.CloudTrailSerde`，但目前不會處理一些較新的 CloudTrail 欄位。 1. (選用) 移除資料表不再需要的任何欄位。如果您只需要讀取特定的資料欄集，則您的資料表定義可排除其他資料欄。 1. 修改 `s3://amzn-s3-demo-bucket/AWSLogs/Account_ID/` 來指向包含您要查詢的日誌資料的 Amazon S3 儲存貯體。範例中使用特定帳戶日誌的 `LOCATION` 值，但您可以使用適合您應用程式的具體程度。例如： + 若要分析多個帳戶的資料，您可以復原 `LOCATION` 指標，使用 `LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/'` 來指定所有 `AWSLogs`。 + 若要分析特定日期、帳戶和區域的資料，請使用 `LOCATION 's3://amzn-s3-demo-bucket/123456789012/CloudTrail/us-east-1/2016/03/14/'.` + 若要分析網路活動資料，而不是管理事件，請將 `LOCATION` 子句中的 `/CloudTrail/` 取代為 `/CloudTrail-NetworkActivity/`。當您使用 Athena 進行查詢時，使用物件階層的最高層級可給您最大的靈活性。 1. 確認列出的欄位正確。如需有關 CloudTrail 記錄中欄位的完整清單的詳細資訊，請參閱 [CloudTrail 記錄內容](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-event-reference-record-contents.html)。步驟 1 中的範例 `CREATE TABLE` 陳述式使用 [Hive JSON SerDe](hive-json-serde.md)。在這個範例中，欄位 `requestparameters`、`responseelements` 和 `additionaleventdata` 在查詢中列為類型 `STRING`，但採用 JSON 格式時為 `STRUCT` 資料類型。因此，若要取得這些欄位的資料，請使用 `JSON_EXTRACT` 函數。如需詳細資訊，請參閱[從字串擷取 JSON 資料](extracting-data-from-JSON.md)。為了改善效能，範例會依 AWS 區域年、月和日分割資料。 1. 在 Athena 主控台執行 `CREATE TABLE` 陳述式。 1. 使用 [ALTER TABLE ADD PARTITION](alter-table-add-partition.md) 命令載入分割區，以便進行查詢，如下列範例所示。 ``` ALTER TABLE table_name ADD PARTITION (region='us-east-1', year='2019', month='02', day='01') LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/Account_ID/CloudTrail/us-east-1/2019/02/01/' ``` # 使用手動分割為整個組織的追蹤建立資料表若要在 Athena 為整個組織的 CloudTrail 日誌檔建立資料表，請依循[使用手動分割在 Athena 中建立 CloudTrail 日誌的資料表](create-cloudtrail-table.md)所述的步驟在下列程序中修改。 **為整個組織的 CloudTrail 日誌建立 Athena 資料表** 1. 在 `CREATE TABLE` 陳述式中，修改 `LOCATION` 子句以加入組織 ID，如下列範例所示： ``` LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/organization_id/' ``` 1. 在 `PARTITIONED BY` 子句中，以字串形式新增帳戶 ID 項目，如下列範例所示： ``` PARTITIONED BY (account string, region string, year string, month string, day string) ``` 上述步驟的合併結果如以下範例所示： ``` ... PARTITIONED BY (account string, region string, year string, month string, day string) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/organization_id/Account_ID/CloudTrail/' ``` 1. 在 `ALTER TABLE` 陳述式中的 `ADD PARTITION` 子句內加入帳戶 ID，如下列範例所示： ``` ALTER TABLE table_name ADD PARTITION (account='111122223333', region='us-east-1', year='2022', month='08', day='08') ``` 1. 在 `ALTER TABLE` 陳述式中的 `LOCATION` 子句內加入組織 ID、帳戶 ID，以及您要新增的分割區，如下列範例所示： ``` LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/organization_id/Account_ID/CloudTrail/us-east-1/2022/08/08/' ``` 上述步驟的合併結果如以下 `ALTER TABLE` 陳述式範例所示： ``` ALTER TABLE table_name ADD PARTITION (account='111122223333', region='us-east-1', year='2022', month='08', day='08') LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/organization_id/111122223333/CloudTrail/us-east-1/2022/08/08/' ``` 請注意，在大型組織中，使用此方法來手動為每個組織帳戶 ID 新增和維護分割區可能會很麻煩。在這種情況下，請考慮使用 CloudTrail Lake 而不是 Athena。在這種情況下，CloudTrail Lake 可提供下列優勢： + 自動彙總整個組織的日誌 + 不需要設定或維護分割區或 Athena 資料表 + 查詢會在 CloudTrail 主控台中直接執行 + 使用 SQL 相容查詢語言如需詳細資訊，請參閱*AWS CloudTrail 《使用者指南*》中的[使用 AWS CloudTrail Lake](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-lake.html)。 # 使用分割區投影在 Athena 中建立 CloudTrail 日誌的資料表由於 CloudTrail 日誌具有您可以事先指定其分割區配置的已知結構，您可以使用 Athena 分割區投影功能來減少查詢執行時間並自動化分割區管理。分割區投影會在新增資料時自動新增分割區。因此您無需使用 `ALTER TABLE ADD PARTITION` 手動新增分割區。以下 `CREATE TABLE` 陳述式範例會從指定日期到現在，對單一 AWS 區域的 CloudTrail 日誌自動使用分割區投影。在 `LOCATION` 和 `storage.location.template` 子句中，請以對應的相同值取代*儲存貯體*、*account-id* 和 *aws-region* 預留位置。對於 `projection.timestamp.range`，請用您要使用的開始日期取代 *2020*/*01*/*01*。成功執行查詢之後，您可以查詢資料表。您無須執行 `ALTER TABLE ADD PARTITION` 就能載入分割區。 ``` CREATE EXTERNAL TABLE cloudtrail_logs_pp( eventversion STRING, useridentity STRUCT< type: STRING, principalid: STRING, arn: STRING, accountid: STRING, invokedby: STRING, accesskeyid: STRING, username: STRING, onbehalfof: STRUCT< userid: STRING, identitystorearn: STRING>, sessioncontext: STRUCT< attributes: STRUCT< mfaauthenticated: STRING, creationdate: STRING>, sessionissuer: STRUCT< type: STRING, principalid: STRING, arn: STRING, accountid: STRING, username: STRING>, ec2roledelivery:string, webidfederationdata: STRUCT< federatedprovider: STRING, attributes: map> > >, eventtime STRING, eventsource STRING, eventname STRING, awsregion STRING, sourceipaddress STRING, useragent STRING, errorcode STRING, errormessage STRING, requestparameters STRING, responseelements STRING, additionaleventdata STRING, requestid STRING, eventid STRING, readonly STRING, resources ARRAY>, eventtype STRING, apiversion STRING, recipientaccountid STRING, serviceeventdetails STRING, sharedeventid STRING, vpcendpointid STRING, vpcendpointaccountid STRING, eventcategory STRING, addendum STRUCT< reason:STRING, updatedfields:STRING, originalrequestid:STRING, originaleventid:STRING>, sessioncredentialfromconsole STRING, edgedevicedetails STRING, tlsdetails STRUCT< tlsversion:STRING, ciphersuite:STRING, clientprovidedhostheader:STRING> ) PARTITIONED BY ( `timestamp` string) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/account-id/CloudTrail/aws-region' TBLPROPERTIES ( 'projection.enabled'='true', 'projection.timestamp.format'='yyyy/MM/dd', 'projection.timestamp.interval'='1', 'projection.timestamp.interval.unit'='DAYS', 'projection.timestamp.range'='2020/01/01,NOW', 'projection.timestamp.type'='date', 'storage.location.template'='s3://amzn-s3-demo-bucket/AWSLogs/account-id/CloudTrail/aws-region/${timestamp}') ``` 如需有關分割區投影的詳細資訊，請參閱[透過 Amazon Athena 使用分割區投影](partition-projection.md)。 # CloudTrail 日誌查詢範例以下範例顯示一個查詢的片段，該查詢從為 CloudTrail 事件日誌建立的資料表傳回所有匿名 (未簽署的) 請求。此查詢選擇 `useridentity.accountid` 是匿名且未指定 `useridentity.arn` 的那些請求： ``` SELECT * FROM cloudtrail_logs WHERE eventsource = 's3.amazonaws.com' AND eventname in ('GetObject') AND useridentity.accountid = 'anonymous' AND useridentity.arn IS NULL AND requestparameters LIKE '%[your bucket name ]%'; ``` 如需詳細資訊，請參閱 AWS 大數據部落格文章[使用和 Amazon Athena 分析安全性、合規性 AWS CloudTrail 和營運活動](https://aws.amazon.com/blogs/big-data/aws-cloudtrail-and-amazon-athena-dive-deep-to-analyze-security-compliance-and-operational-activity/)。 ## 查詢 CloudTrail 日誌中的巢狀欄位由於 `userIdentity` 和 `resources` 欄位是巢狀資料類型，查詢它們需要特殊的處理。 `userIdentity` 物件由巢狀 `STRUCT` 類型組成。如以下範例所示，可以使用點來分隔欄位，以查詢這些類型： ``` SELECT eventsource, eventname, useridentity.sessioncontext.attributes.creationdate, useridentity.sessioncontext.sessionissuer.arn FROM cloudtrail_logs WHERE useridentity.sessioncontext.sessionissuer.arn IS NOT NULL ORDER BY eventsource, eventname LIMIT 10 ``` `resources` 欄位是 `STRUCT` 物件的陣列。請對這些陣列使用 `CROSS JOIN UNNEST`，解除陣列巢狀，以便於查詢其物件。以下範例會傳回資源 ARN 以 `example/datafile.txt` 結尾的所有資料行。為了可讀性，[replace](https://prestodb.io/docs/current/functions/string.html#replace) 函數會從 ARN 移除初始 `arn:aws:s3:::` 子字串。 ``` SELECT awsregion, replace(unnested.resources_entry.ARN,'arn:aws:s3:::') as s3_resource, eventname, eventtime, useragent FROM cloudtrail_logs t CROSS JOIN UNNEST(t.resources) unnested (resources_entry) WHERE unnested.resources_entry.ARN LIKE '%example/datafile.txt' ORDER BY eventtime ``` 下列範例為 `DeleteBucket` 事件的查詢範例。該查詢會從 `resources` 物件擷取儲存貯體的名稱及儲存貯體隸屬的帳戶 ID。 ``` SELECT awsregion, replace(unnested.resources_entry.ARN,'arn:aws:s3:::') as deleted_bucket, eventtime AS time_deleted, useridentity.username, unnested.resources_entry.accountid as bucket_acct_id FROM cloudtrail_logs t CROSS JOIN UNNEST(t.resources) unnested (resources_entry) WHERE eventname = 'DeleteBucket' ORDER BY eventtime ``` 如需有關解除巢狀的詳細資訊，請參閱[篩選陣列](filtering-arrays.md)。 ## 查詢 CloudTrail 日誌的秘訣探索 CloudTrail 日誌資料時，請考慮下列事項： + 查詢日誌之前，確認您的日誌資料表看起來和 [使用手動分割在 Athena 中建立 CloudTrail 日誌的資料表](create-cloudtrail-table.md) 中的相同。如果它不是第一個資料表，使用下列命令刪除現有的資料表：`DROP TABLE cloudtrail_logs`。 + 捨棄現有資料表後，再重新建立。如需詳細資訊，請參閱[使用手動分割在 Athena 中建立 CloudTrail 日誌的資料表](create-cloudtrail-table.md)。確認 Athena 查詢中列出的欄位正確。如需有關 CloudTrail 記錄中欄位的完整清單的資訊，請參閱 [CloudTrail 記錄內容](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-event-reference-record-contents.html)。如果您的查詢包含 JSON 格式的欄位，例如 `STRUCT`，請從 JSON 擷取資料。如需詳細資訊，請參閱[從字串擷取 JSON 資料](extracting-data-from-JSON.md)。針對 CloudTrail 資料表發出查詢的部分建議。 + 首先，查看哪些使用者呼叫了哪些 API 操作以及從哪些來源 IP 地址進行呼叫。 + 以下列基本 SQL 查詢當做您的範本。將查詢貼到 Athena 主控台然後執行。 ``` SELECT useridentity.arn, eventname, sourceipaddress, eventtime FROM cloudtrail_logs LIMIT 100; ``` + 修改查詢，進一步探索您的資料。 + 為了改善效能，加入 `LIMIT` 子句來傳回資料列的指定部分。 # 查詢 Amazon EMR 日誌在 Amazon EMR 上執行的 Amazon EMR 和大數據應用程式會產生日誌檔案。日誌檔案會被寫入[主節點](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-master-core-task-nodes.html)，且您也可以將 Amazon EMR 設定為自動將日誌檔案存檔至 Amazon S3。您可以使用 Amazon Athena 來查詢這些日誌，以識別應用程式和叢集的事件和趨勢。如需有關 Amazon EMR 中的日誌檔案類型和將其儲存至 Amazon S3 的詳細資訊，請參閱《Amazon EMR 管理指南》**中的[檢視日誌檔案](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-view-web-log-files.html)。 **Topics** + [根據 Amazon EMR 日誌檔案建立和查詢基本資料表](emr-create-table.md) + [根據 Amazon EMR 日誌建立和查詢分割的資料表](emr-create-table-partitioned.md) # 根據 Amazon EMR 日誌檔案建立和查詢基本資料表以下範例會根據儲存至 `s3://aws-logs-123456789012-us-west-2/elasticmapreduce/j-2ABCDE34F5GH6/elasticmapreduce/` 的日誌檔建立基本資料表 `myemrlogs`。在以下範例中使用的 Amazon S3 位置會反映由 Amazon Web Services 帳戶 *123456789012* 在區域 *us-west-2* 中建立之 EMR 叢集的預設日誌位置的模式。如果您使用自訂位置，模式則為 s3://amzn-s3-demo-bucket/*ClusterID*。如需建立分割的資料表以盡可能提升查詢效能並減少資料傳輸的詳細資訊，請參閱 [根據 Amazon EMR 日誌建立和查詢分割的資料表](emr-create-table-partitioned.md)。 ``` CREATE EXTERNAL TABLE `myemrlogs`( `data` string COMMENT 'from deserializer') ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://aws-logs-123456789012-us-west-2/elasticmapreduce/j-2ABCDE34F5GH6' ``` ## 查詢範例以下範例查詢可在上一個範例建立的 `myemrlogs` 資料表上執行。 **Example – 查詢 ERROR、WARN、INFO、EXCEPTION、FATAL 或 DEBUG 發生次數的步驟日誌** ``` SELECT data, "$PATH" FROM "default"."myemrlogs" WHERE regexp_like("$PATH",'s-86URH188Z6B1') AND regexp_like(data, 'ERROR|WARN|INFO|EXCEPTION|FATAL|DEBUG') limit 100; ``` **Example – 查詢 ERROR、WARN、INFO、EXCEPTION、FATAL 或 DEBUG 的特定執行個體日誌 i-00b3c0a839ece0a9c** ``` SELECT "data", "$PATH" AS filepath FROM "default"."myemrlogs" WHERE regexp_like("$PATH",'i-00b3c0a839ece0a9c') AND regexp_like("$PATH",'state') AND regexp_like(data, 'ERROR|WARN|INFO|EXCEPTION|FATAL|DEBUG') limit 100; ``` **Example – 查詢 ERROR、WARN、INFO、EXCEPTION、FATAL 或 DEBUG 的 Presto 應用程式日誌** ``` SELECT "data", "$PATH" AS filepath FROM "default"."myemrlogs" WHERE regexp_like("$PATH",'presto') AND regexp_like(data, 'ERROR|WARN|INFO|EXCEPTION|FATAL|DEBUG') limit 100; ``` **Example – 查詢 ERROR、WARN、INFO、EXCEPTION、FATAL 或 DEBUG 的 Namenode 應用程式日誌** ``` SELECT "data", "$PATH" AS filepath FROM "default"."myemrlogs" WHERE regexp_like("$PATH",'namenode') AND regexp_like(data, 'ERROR|WARN|INFO|EXCEPTION|FATAL|DEBUG') limit 100; ``` **Example – 依日期和小時查詢 ERROR、WARN、INFO、EXCEPTION、FATAL 或 DEBUG 的所有日誌** ``` SELECT distinct("$PATH") AS filepath FROM "default"."myemrlogs" WHERE regexp_like("$PATH",'2019-07-23-10') AND regexp_like(data, 'ERROR|WARN|INFO|EXCEPTION|FATAL|DEBUG') limit 100; ``` # 根據 Amazon EMR 日誌建立和查詢分割的資料表這些範例使用相同的日誌位置來建立 Athena 資料表，但資料表會進行分割，然後為每個日誌位置建立分割區。如需詳細資訊，請參閱[分割您的資料](partitions.md)。以下查詢會建立名為 `mypartitionedemrlogs` 的分割資料表： ``` CREATE EXTERNAL TABLE `mypartitionedemrlogs`( `data` string COMMENT 'from deserializer') partitioned by (logtype string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://aws-logs-123456789012-us-west-2/elasticmapreduce/j-2ABCDE34F5GH6' ``` 以下查詢陳述式接著會根據 Amazon EMR 在 Amazon S3 中所建立之不同日誌類型的子目錄，建立資料表分割區： ``` ALTER TABLE mypartitionedemrlogs ADD PARTITION (logtype='containers') LOCATION 's3://aws-logs-123456789012-us-west-2/elasticmapreduce/j-2ABCDE34F5GH6/containers/' ``` ``` ALTER TABLE mypartitionedemrlogs ADD PARTITION (logtype='hadoop-mapreduce') LOCATION 's3://aws-logs-123456789012-us-west-2/elasticmapreduce/j-2ABCDE34F5GH6/hadoop-mapreduce/' ``` ``` ALTER TABLE mypartitionedemrlogs ADD PARTITION (logtype='hadoop-state-pusher') LOCATION 's3://aws-logs-123456789012-us-west-2/elasticmapreduce/j-2ABCDE34F5GH6/hadoop-state-pusher/' ``` ``` ALTER TABLE mypartitionedemrlogs ADD PARTITION (logtype='node') LOCATION 's3://aws-logs-123456789012-us-west-2/elasticmapreduce/j-2ABCDE34F5GH6/node/' ``` ``` ALTER TABLE mypartitionedemrlogs ADD PARTITION (logtype='steps') LOCATION 's3://aws-logs-123456789012-us-west-2/elasticmapreduce/j-2ABCDE34F5GH6/steps/' ``` 建立分割區後，您可以在資料表上執行 `SHOW PARTITIONS` 查詢，以便確認： ``` SHOW PARTITIONS mypartitionedemrlogs; ``` ## 查詢範例以下範例示範特定日誌項目的查詢會使用上述範例所建立的資料表和分割區。 **Example – 查詢 ERROR 或 WARN 的容器分割區中的應用程式 application\$11561661818238\$10002 日誌** ``` SELECT data, "$PATH" FROM "default"."mypartitionedemrlogs" WHERE logtype='containers' AND regexp_like("$PATH",'application_1561661818238_0002') AND regexp_like(data, 'ERROR|WARN') limit 100; ``` **Example – 查詢任務 job\$11561661818238\$10004 的 hadoop-Mapreduce 分割區和失敗的 Reduce 次數** ``` SELECT data, "$PATH" FROM "default"."mypartitionedemrlogs" WHERE logtype='hadoop-mapreduce' AND regexp_like(data,'job_1561661818238_0004|Failed Reduces') limit 100; ``` **Example – 在查詢 ID 056e0609-33e1-4611-956c-7a31b42d2663 的節點分割區中查詢 Hive 日誌** ``` SELECT data, "$PATH" FROM "default"."mypartitionedemrlogs" WHERE logtype='node' AND regexp_like("$PATH",'hive') AND regexp_like(data,'056e0609-33e1-4611-956c-7a31b42d2663') limit 100; ``` **Example – 在應用程式 1567660019320\$10001\$101\$1000001 的節點分割區中查詢 resourcemanager 日誌** ``` SELECT data, "$PATH" FROM "default"."mypartitionedemrlogs" WHERE logtype='node' AND regexp_like(data,'resourcemanager') AND regexp_like(data,'1567660019320_0001_01_000001') limit 100 ``` # 查詢 AWS Global Accelerator 流程日誌您可以使用 AWS Global Accelerator 建立加速器，透過 AWS 全球網路將網路流量導向最佳端點。如需 Global Accelerator 的詳細資訊，請參閱[什麼是 AWS Global Accelerator](https://docs.aws.amazon.com/global-accelerator/latest/dg/what-is-global-accelerator.html) 。 Global Accelerator 流程日誌可讓您擷取加速器中進出網路界面的 IP 地址流量相關資訊。流程日誌資料會發佈至 Amazon S3，您可以在其中擷取和檢視資料。如需詳細資訊，請參閱 [AWS Global Accelerator中的流程日誌](https://docs.aws.amazon.com/global-accelerator/latest/dg/monitoring-global-accelerator.flow-logs.html)。您可以使用 Athena 建立在 Amazon S3 中指定其位置的資料表來查詢 Global Accelerator 流程日誌。 **若要建立 Global Accelerator 流程日誌的資料表** 1. 複製下列 DDL 陳述式，並將其貼到 Athena 主控台。此查詢會指定 *ROW FORMAT DELIMITED* 並省略指定 [SerDe](serde-reference.md)，這代表該查詢使用 [`LazySimpleSerDe`](lazy-simple-serde.md)。在這個查詢中，欄位以空格結尾。 ``` CREATE EXTERNAL TABLE IF NOT EXISTS aga_flow_logs ( version string, account string, acceleratorid string, clientip string, clientport int, gip string, gipport int, endpointip string, endpointport int, protocol string, ipaddresstype string, numpackets bigint, numbytes int, starttime int, endtime int, action string, logstatus string, agasourceip string, agasourceport int, endpointregion string, agaregion string, direction string, vpc_id string, reject_reason string ) PARTITIONED BY (dt string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' LOCATION 's3://amzn-s3-demo-bucket/prefix/AWSLogs/account_id/globalaccelerator/region/' TBLPROPERTIES ("skip.header.line.count"="1"); ``` 1. 修改 `LOCATION` 值來指向包含日誌資料的 Amazon S3 儲存貯體。 ``` 's3://amzn-s3-demo-bucket/prefix/AWSLogs/account_id/globalaccelerator/region_code/' ``` 1. 在 Athena 主控台中執行查詢。查詢完成之後，Athena 會註冊 `aga_flow_logs` 資料表，讓其中的資料可供查詢。 1. 建立分割區以讀取資料，如下列查詢範例所示。該查詢會針對指定日期建立單一分割區。將預留位置換成日期和位置。 ``` ALTER TABLE aga_flow_logs ADD PARTITION (dt='YYYY-MM-dd') LOCATION 's3://amzn-s3-demo-bucket/prefix/AWSLogs/account_id/globalaccelerator/region_code/YYYY/MM/dd'; ``` ## AWS Global Accelerator 流程日誌的範例查詢 **Example – 列出透過特定節點傳遞的請求** 下列範例查詢列出了透過 LHR 節點傳遞的要求。使用 `LIMIT` 運算子來一次限制查詢的日誌數量。 ``` SELECT clientip, agaregion, protocol, action FROM aga_flow_logs WHERE agaregion LIKE 'LHR%' LIMIT 100; ``` **Example – 列出了接收最多 HTTPS 請求的端點 IP 地址** 若要查看哪些端點 IP 位址正在接收最多的 HTTPS 要求數目，請使用下列查詢。此查詢會計算 HTTPS 連接埠 443 所接收的封包數、依目的地 IP 地址分組，然後傳回前 10 個 IP 地址。 ``` SELECT SUM(numpackets) AS packetcount, endpointip FROM aga_flow_logs WHERE endpointport = 443 GROUP BY endpointip ORDER BY packetcount DESC LIMIT 10; ``` # 查詢 Amazon GuardDuty 調查結果 [Amazon GuardDuty](https://aws.amazon.com/guardduty/) 是一種安全監控服務，可協助識別您 AWS 環境中的非預期和可能未經授權或惡意活動。當偵測到未預期的和潛在的惡意活動時，GuardDuty 會產生安全[問題清單](https://docs.aws.amazon.com/guardduty/latest/ug/guardduty_findings.html)，您可將其匯出至 Amazon S3 以進行儲存和分析。將問題清單匯出至 Amazon S3 之後，您可以使用 Athena 來查詢它們。本文將示範如何在 Athena 中為您的 GuardDuty 問題清單建立資料表並進行查詢。如需有關 Amazon GuardDuty 的詳細資訊，請參閱《[Amazon GuardDuty 使用者指南](https://docs.aws.amazon.com/guardduty/latest/ug/)》。 ## 先決條件 + 啟用 GuardDuty 功能以將問題清單匯出至 Amazon S3。如需相關步驟，請參閱《Amazon GuardDuty 使用者指南》中的[匯出問題清單](https://docs.aws.amazon.com/guardduty/latest/ug/guardduty_exportfindings.html)。 ## 在 Athena 中為 GuardDuty 調查結果建立資料表若要從 Athena 查詢您的 GuardDuty 問題清單，您必須為它們建立一個資料表。 **若要在 Athena 中為 GuardDuty 問題清單建立資料表** 1. 前往 [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home) 開啟 Athena 主控台。 1. 將下列 DDL 陳述式複製到 Athena 主控台。修改 `LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/account-id/GuardDuty/'` 中的數值，以指向 Amazon S3 中的 GuardDuty 問題清單。 ``` CREATE EXTERNAL TABLE `gd_logs` ( `schemaversion` string, `accountid` string, `region` string, `partition` string, `id` string, `arn` string, `type` string, `resource` string, `service` string, `severity` string, `createdat` string, `updatedat` string, `title` string, `description` string) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/account-id/GuardDuty/' TBLPROPERTIES ('has_encrypted_data'='true') ``` **注意** SerDe 預期每筆 JSON 文件都以單行文字表示，而且沒有行終止字元分隔記錄中的欄位。如果 JSON 文字是美化過的列印格式，則在建立資料表後嘗試在其中查詢時可能會收到下列錯誤訊息：HIVE\$1CURSOR\$1ERROR: Row is not a valid JSON Object (HIVE\$1CURSOR\$1ERROR：資料列不是有效的 JSON 物件) 或 HIVE\$1CURSOR\$1ERROR: JsonParseException: Unexpected end-of-input: expected close marker for OBJECT (HIVE\$1CURSOR\$1ERROR：JsonParseException：非預期的輸入結束：預期為 OBJECT 的關閉標記)。如需詳細資訊，請參閱 GitHub 上 OpenX SerDe 文件中的 [JSON 資料檔案](https://github.com/rcongiu/Hive-JSON-Serde#json-data-files)。 1. 在 Athena 主控台中執行查詢來註冊 `gd_logs` 資料表。查詢完成時，您就可以從 Athena 查詢問題清單。 ## 查詢範例下列範例說明如何從 Athena 查詢 GuardDuty 問題清單。 **Example – DNS 資料外洩** 下列查詢會傳回可能透過 DNS 查詢外洩資料之 Amazon EC2 執行個體的相關資訊。 ``` SELECT title, severity, type, id AS FindingID, accountid, region, createdat, updatedat, json_extract_scalar(service, '$.count') AS Count, json_extract_scalar(resource, '$.instancedetails.instanceid') AS InstanceID, json_extract_scalar(service, '$.action.actiontype') AS DNS_ActionType, json_extract_scalar(service, '$.action.dnsrequestaction.domain') AS DomainName, json_extract_scalar(service, '$.action.dnsrequestaction.protocol') AS protocol, json_extract_scalar(service, '$.action.dnsrequestaction.blocked') AS blocked FROM gd_logs WHERE type = 'Trojan:EC2/DNSDataExfiltration' ORDER BY severity DESC ``` **Example – 未經授權的 IAM 使用者存取** 下列查詢會傳回所有區域的 IAM 委託人的所有 `UnauthorizedAccess:IAMUser` 問題清單類型。 ``` SELECT title, severity, type, id, accountid, region, createdat, updatedat, json_extract_scalar(service, '$.count') AS Count, json_extract_scalar(resource, '$.accesskeydetails.username') AS IAMPrincipal, json_extract_scalar(service,'$.action.awsapicallaction.api') AS APIActionCalled FROM gd_logs WHERE type LIKE '%UnauthorizedAccess:IAMUser%' ORDER BY severity desc; ``` ## 查詢 GuardDuty 問題清單的秘訣當您建立查詢時，請記住以下幾點。 + 若要從巢狀 JSON 欄位中提取資料，請使用 Presto `json_extract` 或 `json_extract_scalar` 函數。如需詳細資訊，請參閱[從字串擷取 JSON 資料](extracting-data-from-JSON.md)。 + 確保 JSON 欄位中的所有字元都是小寫。 + 如需下載查詢結果的相關資訊，請參閱[使用 Athena 主控台下載查詢結果檔案](saving-query-results.md)。 # 查詢 AWS Network Firewall 日誌 AWS Network Firewall 是一項受管服務，可用來部署 Amazon Virtual Private Cloud 執行個體的基本網路保護。與搭配 AWS Network Firewall 運作 AWS Firewall Manager ，讓您可以根據 AWS Network Firewall 規則建置政策，然後將這些政策集中套用至您的 VPCs和帳戶。如需的詳細資訊 AWS Network Firewall，請參閱 [AWS Network Firewall](https://aws.amazon.com/network-firewall/)。您可以為轉送至防火牆具狀態規則引擎的流量設定 AWS Network Firewall 記錄。記錄功能提供詳細的網路流量資訊，包括有狀態引擎接收封包的時間、詳細的封包資訊，以及對封包採取的任何有狀態規則動作。日誌會發佈至您已設定的日誌目的地，其中您可以擷取並檢視它們。如需詳細資訊，請參閱《AWS Network Firewall 開發人員指南》**中的[記錄來自 AWS Network Firewall的網路流量](https://docs.aws.amazon.com/network-firewall/latest/developerguide/firewall-logging.html)。 **Topics** + [建立並查詢提醒日誌的資料表](querying-network-firewall-logs-sample-alert-logs-table.md) + [建立並查詢 Netflow 日誌的資料表](querying-network-firewall-logs-sample-netflow-logs-table.md) # 建立並查詢提醒日誌的資料表 1. 修改下列範例 DDL 陳述式，以符合提醒日誌的結構。您可能需要更新陳述式，以加入最新版本日誌的資料欄。如需詳細資訊，請參閱《AWS Network Firewall 開發人員指南》**中的[防火牆日誌內容](https://docs.aws.amazon.com/network-firewall/latest/developerguide/firewall-logging.html#firewall-logging-contents)。 ``` CREATE EXTERNAL TABLE network_firewall_alert_logs ( firewall_name string, availability_zone string, event_timestamp string, event struct< timestamp:string, flow_id:bigint, event_type:string, src_ip:string, src_port:int, dest_ip:string, dest_port:int, proto:string, app_proto:string, sni:string, tls_inspected:boolean, tls_error:struct< error_message:string>, revocation_check:struct< leaf_cert_fpr:string, status:string, action:string>, alert:struct< alert_id:string, alert_type:string, action:string, signature_id:int, rev:int, signature:string, category:string, severity:int, rule_name:string, alert_name:string, alert_severity:string, alert_description:string, file_name:string, file_hash:string, packet_capture:string, reference_links:array >, src_country:string, dest_country:string, src_hostname:string, dest_hostname:string, user_agent:string, url:string > ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' LOCATION 's3://amzn-s3-demo-bucket/path_to_alert_logs_folder/'; ``` 1. 修改 `LOCATION` 子句，以在 Amazon S3 中指定日誌的資料夾。 1. 在 Athena 查詢編輯器中，執行 `CREATE TABLE` 查詢。查詢完成之後，Athena 會註冊 `network_firewall_alert_logs` 資料表，讓其所指向的資料可供您查詢。 ## 查詢範例本節中的範例提醒日誌查詢會篩選執行 TLS 檢查且提醒嚴重等級為 2 或更高級別的事件。查詢會使用別名來建立輸出資料欄標題，以顯示資料欄所屬的 `struct`。例如，`event.alert.category` 欄位的資料欄標題是 `event_alert_category` 而不只是 `category`。若要進一步自訂資料欄名稱，您可以依照您的偏好設定修改別名。例如，您可以使用底線或其他分隔符號來分隔 `struct` 名稱和欄位名稱。請記得根據您的資料表定義和查詢結果中要包含的欄位來修改資料欄名稱和 `struct` 參考。 ``` SELECT firewall_name, availability_zone, event_timestamp, event.timestamp AS event_timestamp, event.flow_id AS event_flow_id, event.event_type AS event_type, event.src_ip AS event_src_ip, event.src_port AS event_src_port, event.dest_ip AS event_dest_ip, event.dest_port AS event_dest_port, event.proto AS event_protol, event.app_proto AS event_app_proto, event.sni AS event_sni, event.tls_inspected AS event_tls_inspected, event.tls_error.error_message AS event_tls_error_message, event.revocation_check.leaf_cert_fpr AS event_revocation_leaf_cert, event.revocation_check.status AS event_revocation_check_status, event.revocation_check.action AS event_revocation_check_action, event.alert.alert_id AS event_alert_alert_id, event.alert.alert_type AS event_alert_alert_type, event.alert.action AS event_alert_action, event.alert.signature_id AS event_alert_signature_id, event.alert.rev AS event_alert_rev, event.alert.signature AS event_alert_signature, event.alert.category AS event_alert_category, event.alert.severity AS event_alert_severity, event.alert.rule_name AS event_alert_rule_name, event.alert.alert_name AS event_alert_alert_name, event.alert.alert_severity AS event_alert_alert_severity, event.alert.alert_description AS event_alert_alert_description, event.alert.file_name AS event_alert_file_name, event.alert.file_hash AS event_alert_file_hash, event.alert.packet_capture AS event_alert_packet_capture, event.alert.reference_links AS event_alert_reference_links, event.src_country AS event_src_country, event.dest_country AS event_dest_country, event.src_hostname AS event_src_hostname, event.dest_hostname AS event_dest_hostname, event.user_agent AS event_user_agent, event.url AS event_url FROM network_firewall_alert_logs WHERE event.alert.severity >= 2 AND event.tls_inspected = true LIMIT 10; ``` # 建立並查詢 Netflow 日誌的資料表 1. 修改下列範例 DDL 陳述式，以符合 Netflow 日誌的結構。您可能需要更新陳述式，以加入最新版本日誌的資料欄。如需詳細資訊，請參閱《AWS Network Firewall 開發人員指南》**中的[防火牆日誌內容](https://docs.aws.amazon.com/network-firewall/latest/developerguide/firewall-logging.html#firewall-logging-contents)。 ``` CREATE EXTERNAL TABLE network_firewall_netflow_logs ( firewall_name string, availability_zone string, event_timestamp string, event struct< timestamp:string, flow_id:bigint, event_type:string, src_ip:string, src_port:int, dest_ip:string, dest_port:int, proto:string, app_proto:string, tls_inspected:boolean, netflow:struct< pkts:int, bytes:bigint, start:string, `end`:string, age:int, min_ttl:int, max_ttl:int, tcp_flags:struct< syn:boolean, fin:boolean, rst:boolean, psh:boolean, ack:boolean, urg:boolean > > > ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' LOCATION 's3://amzn-s3-demo-bucket/path_to_netflow_logs_folder/'; ``` 1. 修改 `LOCATION` 子句，以在 Amazon S3 中指定日誌的資料夾。 1. 在 Athena 查詢編輯器中，執行 `CREATE TABLE` 查詢。查詢完成之後，Athena 會註冊 `network_firewall_netflow_logs` 資料表，讓其所指向的資料可供您查詢。 ## 查詢範例本節中的範例 Netflow 日誌查詢會篩選執行 TLS 檢查的事件。查詢會使用別名來建立輸出資料欄標題，以顯示資料欄所屬的 `struct`。例如，`event.netflow.bytes` 欄位的資料欄標題是 `event_netflow_bytes` 而不只是 `bytes`。若要進一步自訂資料欄名稱，您可以依照您的偏好設定修改別名。例如，您可以使用底線或其他分隔符號來分隔 `struct` 名稱和欄位名稱。請記得根據您的資料表定義和查詢結果中要包含的欄位來修改資料欄名稱和 `struct` 參考。 ``` SELECT event.src_ip AS event_src_ip, event.dest_ip AS event_dest_ip, event.proto AS event_proto, event.app_proto AS event_app_proto, event.tls_inspected AS event_tls_inspected, event.netflow.pkts AS event_netflow_pkts, event.netflow.bytes AS event_netflow_bytes, event.netflow.tcp_flags.syn AS event_netflow_tcp_flags_syn FROM network_firewall_netflow_logs WHERE event.tls_inspected = true ``` # 查詢 Network Load Balancer 日誌使用 Athena 分析和處理 Network Load Balancer 的日誌。這些日誌會接收傳送至 Network Load Balancer 之 Transport Layer Security (TLS) 的詳細資訊。您可以使用這些存取日誌來分析流量模式和排除問題。分析 Network Load Balancer 存取日誌之前，請對其啟用和設定，以在目的地 Amazon S3 儲存貯體中儲存。如需詳細資訊，以及有關每個 Network Load Balancer 存取日誌項目的資訊，請參閱[存取您的 Network Load Balancer 日誌](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-access-logs.html)。 **若要為 Network Load Balancer 日誌建立資料表** 1. 複製下列 DDL 陳述式，並將其貼到 Athena 主控台。檢查 Network Load Balancer 日誌記錄的[語法](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-access-logs.html#access-log-file-format)。視需要更新陳述式，以包含與日誌記錄對應的資料欄和 regex。 ``` CREATE EXTERNAL TABLE IF NOT EXISTS nlb_tls_logs ( type string, version string, time string, elb string, listener_id string, client_ip string, client_port int, target_ip string, target_port int, tcp_connection_time_ms double, tls_handshake_time_ms double, received_bytes bigint, sent_bytes bigint, incoming_tls_alert int, cert_arn string, certificate_serial string, tls_cipher_suite string, tls_protocol_version string, tls_named_group string, domain_name string, alpn_fe_protocol string, alpn_be_protocol string, alpn_client_preference_list string, tls_connection_creation_time string ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1', 'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*):([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-0-9]*) ([-0-9]*) ([-0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ?([^ ]*)?( .*)?' ) LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/AWS_account_ID/elasticloadbalancing/region'; ``` 1. 修改 `LOCATION` Amazon S3 儲存貯體，以指定您的 Network Load Balancer 日誌的目的地。 1. 在 Athena 主控台中執行查詢。查詢完成之後，Athena 會註冊 `nlb_tls_logs` 資料表，讓其中的資料可供您查詢。 ## 查詢範例若要查看憑證使用的次數，請使用類似此範例的查詢： ``` SELECT count(*) AS ct, cert_arn FROM "nlb_tls_logs" GROUP BY cert_arn; ``` 下列查詢顯示使用早於 TLS 1.3 版本的使用者數量： ``` SELECT tls_protocol_version, COUNT(tls_protocol_version) AS num_connections, client_ip FROM "nlb_tls_logs" WHERE tls_protocol_version < 'tlsv13' GROUP BY tls_protocol_version, client_ip; ``` 使用下列查詢，來識別需要很長 TLS 交握時間的連接： ``` SELECT * FROM "nlb_tls_logs" ORDER BY tls_handshake_time_ms DESC LIMIT 10; ``` 使用下列查詢可以識別和計算在過去 30 天內交涉的 TLS 通訊協定版本和密碼套件。 ``` SELECT tls_cipher_suite, tls_protocol_version, COUNT(*) AS ct FROM "nlb_tls_logs" WHERE from_iso8601_timestamp(time) > current_timestamp - interval '30' day AND NOT tls_protocol_version = '-' GROUP BY tls_cipher_suite, tls_protocol_version ORDER BY ct DESC; ``` # 查詢 Amazon Route 53 Resolver 查詢日誌您可以為 Amazon Route 53 Resolver 查詢日誌建立 Athena 資料表，並從 Athena 查詢它們。 Route 53 Resolver 查詢記錄功能用於記錄資源在 VPC 內所做的 DNS 查詢、使用傳入解析程式端點的內部部署資源、使用遞迴 DNS 解析之傳出解析程式端點的查詢，以及使用 Route 53 Resolver DNS 防火牆規則封鎖、允許或監控網域清單的查詢。如需有關 Resolver 查詢記錄的詳細資訊，請參閱《Amazon Route 53 開發人員指南》**中的 [Resolver 查詢記錄](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/resolver-query-logs.html)。如需有關日誌中各欄位的資訊，請參閱《Amazon Route 53 開發人員指南》**中的[在 Resolver 查詢日誌中出現的值](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/resolver-query-logs-format.html)。 **Topics** + [建立 Resolver 查詢日誌的資料表](querying-r53-resolver-logs-creating-the-table.md) + [使用分割區投影](querying-r53-resolver-logs-partitioning-example.md) + [查詢範例](querying-r53-resolver-logs-example-queries.md) # 建立 Resolver 查詢日誌的資料表您可以使用 Athena 主控台中的查詢編輯器，建立並查詢 Route 53 Resolver 查詢日誌的資料表。 **若要建立並查詢 Route 53 Resolver 查詢日誌的 Athena 資料表** 1. 前往 [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home) 開啟 Athena 主控台。 1. 在 Athena 查詢編輯器中，輸入以下 `CREATE TABLE` 陳述式。使用對應於 Amazon S3 中 Resolver 日誌位置的值取代 `LOCATION` 子句值。 ``` CREATE EXTERNAL TABLE r53_rlogs ( version string, account_id string, region string, vpc_id string, query_timestamp string, query_name string, query_type string, query_class string, rcode string, answers array< struct< Rdata: string, Type: string, Class: string> >, srcaddr string, srcport int, transport string, srcids struct< instance: string, resolver_endpoint: string >, firewall_rule_action string, firewall_rule_group_id string, firewall_domain_list_id string ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/aws_account_id/vpcdnsquerylogs/{vpc-id}/' ``` 由於 Resolver 查詢日誌資料是 JSON 格式，CREATE TABLE 陳述式會使用 [JSON SerDe 程式庫](json-serde.md)分析資料。 **注意** SerDe 預期每筆 JSON 文件都以單行文字表示，而且沒有行終止字元分隔記錄中的欄位。如果 JSON 文字是美化過的列印格式，則在建立資料表後嘗試在其中查詢時可能會收到下列錯誤訊息：HIVE\$1CURSOR\$1ERROR: Row is not a valid JSON Object (HIVE\$1CURSOR\$1ERROR：資料列不是有效的 JSON 物件) 或 HIVE\$1CURSOR\$1ERROR: JsonParseException: Unexpected end-of-input: expected close marker for OBJECT (HIVE\$1CURSOR\$1ERROR：JsonParseException：非預期的輸入結束：預期為 OBJECT 的關閉標記)。如需詳細資訊，請參閱 GitHub 上 OpenX SerDe 文件中的 [JSON 資料檔案](https://github.com/rcongiu/Hive-JSON-Serde#json-data-files)。 1. 選擇 **Run query** (執行查詢)。該陳述式會建立名為 `r53_rlogs` 的 Athena 資料表，其資料欄表示 Resolver 日誌資料中的各欄位。 1. 在 Athena 主控台查詢編輯器中，執行以下查詢，以確認您的資料表已建立。 ``` SELECT * FROM "r53_rlogs" LIMIT 10 ``` # 使用分割區投影下列範例顯示使用分割區投影並依據 vpc 和日期分割的 Resolver 查詢日誌 `CREATE TABLE` 陳述式。如需有關分割區投影的詳細資訊，請參閱[透過 Amazon Athena 使用分割區投影](partition-projection.md)。 ``` CREATE EXTERNAL TABLE r53_rlogs ( version string, account_id string, region string, vpc_id string, query_timestamp string, query_name string, query_type string, query_class string, rcode string, answers array< struct< Rdata: string, Type: string, Class: string> >, srcaddr string, srcport int, transport string, srcids struct< instance: string, resolver_endpoint: string >, firewall_rule_action string, firewall_rule_group_id string, firewall_domain_list_id string ) PARTITIONED BY ( `date` string, `vpc` string ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/route53-query-logging/AWSLogs/aws_account_id/vpcdnsquerylogs/' TBLPROPERTIES( 'projection.enabled' = 'true', 'projection.vpc.type' = 'enum', 'projection.vpc.values' = 'vpc-6446ae02', 'projection.date.type' = 'date', 'projection.date.range' = '2023/06/26,NOW', 'projection.date.format' = 'yyyy/MM/dd', 'projection.date.interval' = '1', 'projection.date.interval.unit' = 'DAYS', 'storage.location.template' = 's3://amzn-s3-demo-bucket/route53-query-logging/AWSLogs/aws_account_id/vpcdnsquerylogs/${vpc}/${date}/' ) ``` # 查詢範例以下範例顯示您可以從 Athena 對 Resolver 查詢日誌執行的部分查詢。 ## 範例 1 – 以遞減的 query\$1timestamp 順序查詢日誌以下查詢以遞減的 `query_timestamp` 順序顯示日誌結果。 ``` SELECT * FROM "r53_rlogs" ORDER BY query_timestamp DESC ``` ## 範例 2 – 在指定的開始和結束時間內查詢日誌以下查詢會查詢 2020 年 9 月 24 日午夜到早上 8 點之間的日誌。根據您的需求替換開始和結束時間。 ``` SELECT query_timestamp, srcids.instance, srcaddr, srcport, query_name, rcode FROM "r53_rlogs" WHERE (parse_datetime(query_timestamp,'yyyy-MM-dd''T''HH:mm:ss''Z') BETWEEN parse_datetime('2020-09-24-00:00:00','yyyy-MM-dd-HH:mm:ss') AND parse_datetime('2020-09-24-00:08:00','yyyy-MM-dd-HH:mm:ss')) ORDER BY query_timestamp DESC ``` ## 範例 3 – 根據指定的 DNS 查詢名稱模式查詢日誌以下查詢會選取其查詢名稱包括 "example.com" 字串的記錄。 ``` SELECT query_timestamp, srcids.instance, srcaddr, srcport, query_name, rcode, answers FROM "r53_rlogs" WHERE query_name LIKE '%example.com%' ORDER BY query_timestamp DESC ``` ## 範例 4 – 查詢無答案的日誌請求以下查詢會選取請求未收到答案的日誌項目。 ``` SELECT query_timestamp, srcids.instance, srcaddr, srcport, query_name, rcode, answers FROM "r53_rlogs" WHERE cardinality(answers) = 0 ``` ## 範例 5 – 查詢有特定答案的日誌以下查詢會顯示其中的 `answer.Rdata` 值有指定 IP 地址的日誌。 ``` SELECT query_timestamp, srcids.instance, srcaddr, srcport, query_name, rcode, answer.Rdata FROM "r53_rlogs" CROSS JOIN UNNEST(r53_rlogs.answers) as st(answer) WHERE answer.Rdata='203.0.113.16'; ``` # 查詢 Amazon SES 事件日誌您可以使用 Amazon Athena 來查詢 [Amazon Simple Email Service](https://aws.amazon.com/ses/) (Amazon SES) 事件日誌。 Amazon SES 是一種電子郵件平台，提供便利且符合經濟效益的方式，讓您能使用自有的電子郵件地址和網域傳送及接收電子郵件。您可以使用事件、指標和統計資料，以細微的程度監控 Amazon SES 傳送活動。依據您定義的特徵，您可以將 Amazon SES 事件發布至 [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/)、[Amazon Data Firehose](https://aws.amazon.com/kinesis/data-firehose/) 或 [Amazon Simple Notification Service](https://aws.amazon.com/sns/)。將資訊存放在 Amazon S3 之後，您就能從 Amazon Athena 進行查詢。如需 Amazon SES 日誌的範例 Athena `CREATE TABLE`陳述式，包括如何在 Amazon SES 事件日誌資料中建立檢視和扁平化巢狀陣列的步驟，請參閱 AWS 部落格文章中的「使用 Amazon Athena 查詢 SES 事件日誌」使用 Analytics Services 分析 Amazon SES 事件資料。 [ Amazon SES AWS](https://aws.amazon.com/blogs/messaging-and-targeting/analyzing-amazon-ses-event-data-with-aws-analytics-services/) # 查詢 Amazon VPC 流程日誌 Amazon Virtual Private Cloud 流量日誌擷取傳入和傳出 VPC 中網路界面之 IP 流量的相關資訊。使用日誌來調查網路流量模式，並識別整個 VPC 網路所遭受的威脅和風險。若要查詢 Amazon VPC 流程日誌，您有下列兩種選擇： **** + **Amazon VPC 主控台** – 使用 Amazon VPC 主控台中的 Athena 整合功能來產生 CloudFormation 範本，以建立具有分割的 Athena 資料庫、工作群組和流程日誌資料表。此範本也會建立一組[預先定義流程日誌查詢](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs-athena.html#predefined-queries)，您可以使用其取得關於流經 VPC 流量的可行見解。如需有關此方法的詳細資訊，請參閱《Amazon VPC 使用者指南》**中的[使用 Amazon Athena 查詢流程日誌](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs-athena.html)。 + **Amazon Athena 主控台** – 直接在 Athena 主控台中建立資料表和查詢。如需詳細資訊，請繼續閱讀此頁內容。在 Athena 中開始查詢日誌之前，[請啟用 VPC 流程日誌](https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/flow-logs.html)，並將它們設定為儲存到您的 Amazon S3 儲存貯體。建立日誌之後，讓它們執行幾分鐘以收集一些資料。日誌是以 Athena 可讓您直接查詢的 GZIP 壓縮格式建立。建立 VPC 流程日誌時，若您想要指定要在流程日誌中傳回的欄位以及欄位出現的順序，請使用自訂格式。如需有關流程日誌記錄的詳細資訊，請參閱《Amazon VPC 使用者指南》**中的[流程日誌記錄](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html#flow-log-records)。 ## 考量和限制在 Athena 中為 Amazon VPC 流程日誌建立資料表時，請記住以下幾點： + 預設情況下，在 Athena 中，Parquet 將會依名稱存取欄。如需詳細資訊，請參閱[處理結構描述更新](handling-schema-updates-chapter.md)。 + 使用流程日誌記錄中的名稱作為 Athena 中的資料欄名稱。Athena 結構描述中的資料欄名稱應與 Amazon VPC 流程日誌中的欄位名稱完全相符，但有以下差異： + 在 Athena 資料欄名稱中，將 Amazon VPC 日誌欄位名稱中的連字號替換為底線。如需有關 Athena 中資料庫名稱、資料表名稱和資料欄名稱可接受字元的資訊，請參閱 [為資料庫、資料表和資料欄命名](tables-databases-columns-names.md)。 + 透過反引號將流程日誌記錄名稱逸出為 Athena 中的[保留關鍵字](reserved-words.md)。 + VPC 流程日誌是 AWS 帳戶特定的。當您將日誌檔案發布到 Amazon S3 時，Amazon VPC 在 Amazon S3 中建立的路徑會包含用來建立流量日誌的 AWS 帳戶的 ID。如需詳細資訊，請參閱《Amazon VPC 使用者指南》**中的[發布流量日誌至 Amazon S3](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs-s3.html)。 **Topics** + [考量和限制](#vpc-flow-logs-common-considerations) + [建立 Amazon VPC 流程日誌的資料表並進行查詢](vpc-flow-logs-create-table-statement.md) + [以 Apache Parquet 格式為流程日誌建立資料表](vpc-flow-logs-parquet.md) + [使用分割區投影為 Amazon VPC 流程日誌建立和查詢資料表](vpc-flow-logs-partition-projection.md) + [使用分割區投影，以 Apache Parquet 格式為流程日誌建立資料表](vpc-flow-logs-partition-projection-parquet-example.md) + [其他資源](query-examples-vpc-logs-additional-resources.md) # 建立 Amazon VPC 流程日誌的資料表並進行查詢以下程序會建立 Amazon VPC 流程日誌的 Amazon VPC 資料表。當您使用自訂格式建立流程日誌時，需要建立資料表，其欄位符合您在建立流程日誌時指定的欄位，而順序與您指定它們的順序相同。 **為 Amazon VPC 流量日誌建立 Athena 資料表** 1. 在 Athena 主控台查詢編輯器中輸入類似下列的 DDL 陳述式，遵循 [考量和限制](vpc-flow-logs.md#vpc-flow-logs-common-considerations) 區段中的準則。陳述式範例會建立擁有 Amazon VPC 流程日誌版本 2 至 5 (如[流程日誌記錄](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html#flow-log-records)所記載) 之資料欄的資料表。如果您使用不同組的資料欄或資料欄順序，請據此修改陳述式。 ``` CREATE EXTERNAL TABLE IF NOT EXISTS `vpc_flow_logs` ( version int, account_id string, interface_id string, srcaddr string, dstaddr string, srcport int, dstport int, protocol bigint, packets bigint, bytes bigint, start bigint, `end` bigint, action string, log_status string, vpc_id string, subnet_id string, instance_id string, tcp_flags int, type string, pkt_srcaddr string, pkt_dstaddr string, region string, az_id string, sublocation_type string, sublocation_id string, pkt_src_aws_service string, pkt_dst_aws_service string, flow_direction string, traffic_path int ) PARTITIONED BY (`date` date) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' LOCATION 's3://amzn-s3-demo-bucket/prefix/AWSLogs/{account_id}/vpcflowlogs/{region_code}/' TBLPROPERTIES ("skip.header.line.count"="1"); ``` 請注意以下重點： + 該查詢指定 `ROW FORMAT DELIMITED`，且省略不指定 SerDe。這表示該查詢會使用 [適用於 CSV、TSV 和自訂分隔檔案的 Lazy Simple SerDe](lazy-simple-serde.md)。在這個查詢中，欄位以空格結尾。 + `PARTITIONED BY` 子句使用 `date` 類型。這樣就可以在查詢中使用數學運算子來選取特定日期之前或之後的項目。 **注意** 因為 `date` 是 DDL 陳述式中保留的關鍵字，它會溢出反引號字元。如需詳細資訊，請參閱[在查詢中逸出預留關鍵字](reserved-words.md)。 + 針對具有不同自訂格式的 VPC 流程日誌，請修改欄位，使其符合您建立流程日誌時指定的欄位。 1. 修改 `LOCATION 's3://amzn-s3-demo-bucket/prefix/AWSLogs/{account_id}/vpcflowlogs/{region_code}/'` 來指向包含日誌資料的 Amazon S3 儲存貯體。 1. 在 Athena 主控台中執行查詢。查詢完成之後，Athena 會註冊 `vpc_flow_logs` 資料表，讓其中的資料可供您發出查詢。 1. 建立能夠讀取資料的分割區，如下列查詢範例所示。這個查詢針對指定的日期建立單一分割區。視需要將預留位置換成日期和位置。 **注意** 這個查詢針對您指定的日期，只建立單一分割區。若要自動執行程序，請使用指令碼以執行此查詢和依此針對 `year/month/day` 建立分割區，或使用指定[分割區投影](vpc-flow-logs-partition-projection.md)的 `CREATE TABLE` 陳述式。 ``` ALTER TABLE vpc_flow_logs ADD PARTITION (`date`='YYYY-MM-dd') LOCATION 's3://amzn-s3-demo-bucket/prefix/AWSLogs/{account_id}/vpcflowlogs/{region_code}/YYYY/MM/dd'; ``` ## vpc\$1flow\$1logs 資料表的查詢範例使用 Athena 主控台中的查詢編輯器在您建立的資料表上執行 SQL 陳述式。您可以儲存查詢、檢視先前的查詢，或以 CSV 格式下載查詢結果。在下列範例中，將 `vpc_flow_logs` 替換為您的資料表名稱。根據自己的需求修改欄值和其他變數。下列範例查詢會列出指定日期的最多 100 個流程日誌。 ``` SELECT * FROM vpc_flow_logs WHERE date = DATE('2020-05-04') LIMIT 100; ``` 以下查詢列出所有拒絕的 TCP 連線，並使用新建立的日期新分割欄 `date`，以從中擷取發生這些事件是星期幾。 ``` SELECT day_of_week(date) AS day, date, interface_id, srcaddr, action, protocol FROM vpc_flow_logs WHERE action = 'REJECT' AND protocol = 6 LIMIT 100; ``` 若要查看您的哪一個伺服器接收最多的 HTTPS 請求，請使用下列查詢。它計算 HTTPS 連接埠 443 所接收的封包數、依目的地 IP 地址分組，然後傳回上週的前 10 個。 ``` SELECT SUM(packets) AS packetcount, dstaddr FROM vpc_flow_logs WHERE dstport = 443 AND date > current_date - interval '7' day GROUP BY dstaddr ORDER BY packetcount DESC LIMIT 10; ``` # 以 Apache Parquet 格式為流程日誌建立資料表以下程序會以 Parquet 格式建立 Amazon VPC 流程日誌的 Amazon VPC 資料表。 **為 Amazon VPC 流量日誌以 Parquet 格式建立 Athena 資料表** 1. 在 Athena 主控台查詢編輯器中輸入類似下列的 DDL 陳述式，遵循 [考量和限制](vpc-flow-logs.md#vpc-flow-logs-common-considerations) 區段中的準則。陳述式範例會建立擁有 Amazon VPC 流程日誌版本 2 至 5 (如 Parquet 格式、Hive 每小時已分割的[流程日誌記錄](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html#flow-log-records)所記載) 之資料欄的資料表。如果您沒有小時分割區，請從 `PARTITIONED BY` 子句中移除 `hour`。 ``` CREATE EXTERNAL TABLE IF NOT EXISTS vpc_flow_logs_parquet ( version int, account_id string, interface_id string, srcaddr string, dstaddr string, srcport int, dstport int, protocol bigint, packets bigint, bytes bigint, start bigint, `end` bigint, action string, log_status string, vpc_id string, subnet_id string, instance_id string, tcp_flags int, type string, pkt_srcaddr string, pkt_dstaddr string, region string, az_id string, sublocation_type string, sublocation_id string, pkt_src_aws_service string, pkt_dst_aws_service string, flow_direction string, traffic_path int ) PARTITIONED BY ( `aws-account-id` string, `aws-service` string, `aws-region` string, `year` string, `month` string, `day` string, `hour` string ) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/prefix/AWSLogs/' TBLPROPERTIES ( 'EXTERNAL'='true', 'skip.header.line.count'='1' ) ``` 1. 修改 `LOCATION 's3://amzn-s3-demo-bucket/prefix/AWSLogs/'` 來指向包含日誌資料的 Amazon S3 路徑。 1. 在 Athena 主控台中執行查詢。 1. 如果您的資料採用 Hive 相容的格式，請在 Athena 主控台中執行以下命令來更新和載入中繼存放區中的 Hive 分割區。查詢完成後，您可以在 `vpc_flow_logs_parquet` 資料表中查詢資料。 ``` MSCK REPAIR TABLE vpc_flow_logs_parquet ``` 如果您沒有使用 Hive 相容的資料，請執行 [ALTER TABLE ADD PARTITION](alter-table-add-partition.md) 來載入分割區。如需有關使用 Athena 查詢 Parquet 格式的 Amazon VPC 流程日誌的詳細資訊，請參閱 *AWS 大數據部落格*中的[使用 Apache Parquet 格式的 VPC 流程日誌最佳化效能並降低網路分析成本](https://aws.amazon.com/blogs/big-data/optimize-performance-and-reduce-costs-for-network-analytics-with-vpc-flow-logs-in-apache-parquet-format/)。 # 使用分割區投影為 Amazon VPC 流程日誌建立和查詢資料表使用如下 `CREATE TABLE` 陳述句，以建立資料表、分割資料表，並使用[分割區投影](partition-projection.md)自動填入分割區。將範例中的資料表名稱 `test_table_vpclogs` 替換為您的資料表名稱。編輯 `LOCATION` 子句以指定包含 Amazon VPC 日誌資料的 Amazon S3 儲存貯體。以下 `CREATE TABLE` 陳述式適用於以非 Hive 樣式分割格式交付的 VPC 流程日誌。此範例允許多帳戶彙總。如果您要將多個帳戶的 VPC 流程日誌集中到一個 Amazon S3 儲存貯體，則必須在 Amazon S3 路徑中輸入帳戶 ID。 ``` CREATE EXTERNAL TABLE IF NOT EXISTS test_table_vpclogs ( version int, account_id string, interface_id string, srcaddr string, dstaddr string, srcport int, dstport int, protocol bigint, packets bigint, bytes bigint, start bigint, `end` bigint, action string, log_status string, vpc_id string, subnet_id string, instance_id string, tcp_flags int, type string, pkt_srcaddr string, pkt_dstaddr string, az_id string, sublocation_type string, sublocation_id string, pkt_src_aws_service string, pkt_dst_aws_service string, flow_direction string, traffic_path int ) PARTITIONED BY (accid string, region string, day string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' LOCATION '$LOCATION_OF_LOGS' TBLPROPERTIES ( "skip.header.line.count"="1", "projection.enabled" = "true", "projection.accid.type" = "enum", "projection.accid.values" = "$ACCID_1,$ACCID_2", "projection.region.type" = "enum", "projection.region.values" = "$REGION_1,$REGION_2,$REGION_3", "projection.day.type" = "date", "projection.day.range" = "$START_RANGE,NOW", "projection.day.format" = "yyyy/MM/dd", "storage.location.template" = "s3://amzn-s3-demo-bucket/AWSLogs/${accid}/vpcflowlogs/${region}/${day}" ) ``` ## test\$1table\$1vpclogs 的查詢範例下列查詢範例會查詢由前面 `CREATE TABLE` 陳述式所建立的 `test_table_vpclogs`。將查詢中的 `test_table_vpclogs` 替換為您自己的資料表名稱。根據自己的需求修改欄值和其他變數。若要在指定時段內依時間順序傳回前 100 個存取日誌項目，請執行如下查詢。 ``` SELECT * FROM test_table_vpclogs WHERE day >= '2021/02/01' AND day < '2021/02/28' ORDER BY day ASC LIMIT 100 ``` 若要查看哪個伺服器在指定時段內接收前十個 HTTP 封包，請執行如下查詢。查詢計算 HTTPS 連接埠 443 所接收的封包數、依目的地 IP 地址分組，然後傳回上週的前 10 個查詢。 ``` SELECT SUM(packets) AS packetcount, dstaddr FROM test_table_vpclogs WHERE dstport = 443 AND day >= '2021/03/01' AND day < '2021/03/31' GROUP BY dstaddr ORDER BY packetcount DESC LIMIT 10 ``` 若要傳回在指定時段內建立的日誌，請執行如下查詢。 ``` SELECT interface_id, srcaddr, action, protocol, to_iso8601(from_unixtime(start)) AS start_time, to_iso8601(from_unixtime("end")) AS end_time FROM test_table_vpclogs WHERE DAY >= '2021/04/01' AND DAY < '2021/04/30' ``` 若要傳回指定時段之間的來源 IP 地址的存取日誌，請執行如下查詢。 ``` SELECT * FROM test_table_vpclogs WHERE srcaddr = '10.117.1.22' AND day >= '2021/02/01' AND day < '2021/02/28' ``` 若要列出拒絕的 TCP 連接，請執行如下查詢。 ``` SELECT day, interface_id, srcaddr, action, protocol FROM test_table_vpclogs WHERE action = 'REJECT' AND protocol = 6 AND day >= '2021/02/01' AND day < '2021/02/28' LIMIT 10 ``` 若要傳回以 `10.117` 開頭的 IP 地址範圍的存取日誌，請執行如下查詢。 ``` SELECT * FROM test_table_vpclogs WHERE split_part(srcaddr,'.', 1)='10' AND split_part(srcaddr,'.', 2) ='117' ``` 若要傳回特定時間範圍之間的目的地 IP 地址的存取日誌，請執行如下查詢。 ``` SELECT * FROM test_table_vpclogs WHERE dstaddr = '10.0.1.14' AND day >= '2021/01/01' AND day < '2021/01/31' ``` # 使用分割區投影，以 Apache Parquet 格式為流程日誌建立資料表下列 VPC 流程日誌的資料分割投影 `CREATE TABLE` 陳述式採用 Apache Parquet 格式，與 Hive 不相容，並按小時和日期而非按日期進行分割。將範例中的資料表名稱 `test_table_vpclogs_parquet` 替換為您的資料表名稱。編輯 `LOCATION` 子句以指定包含 Amazon VPC 日誌資料的 Amazon S3 儲存貯體。 ``` CREATE EXTERNAL TABLE IF NOT EXISTS test_table_vpclogs_parquet ( version int, account_id string, interface_id string, srcaddr string, dstaddr string, srcport int, dstport int, protocol bigint, packets bigint, bytes bigint, start bigint, `end` bigint, action string, log_status string, vpc_id string, subnet_id string, instance_id string, tcp_flags int, type string, pkt_srcaddr string, pkt_dstaddr string, az_id string, sublocation_type string, sublocation_id string, pkt_src_aws_service string, pkt_dst_aws_service string, flow_direction string, traffic_path int ) PARTITIONED BY (region string, date string, hour string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/prefix/AWSLogs/{account_id}/vpcflowlogs/' TBLPROPERTIES ( "EXTERNAL"="true", "skip.header.line.count" = "1", "projection.enabled" = "true", "projection.region.type" = "enum", "projection.region.values" = "us-east-1,us-west-2,ap-south-1,eu-west-1", "projection.date.type" = "date", "projection.date.range" = "2021/01/01,NOW", "projection.date.format" = "yyyy/MM/dd", "projection.hour.type" = "integer", "projection.hour.range" = "00,23", "projection.hour.digits" = "2", "storage.location.template" = "s3://amzn-s3-demo-bucket/prefix/AWSLogs/${account_id}/vpcflowlogs/${region}/${date}/${hour}" ) ``` # 其他資源如需有關使用 Athena 分析 VPC 流程日誌的詳細資訊，請參閱以下 AWS 大數據部落格文章。 + [使用點選式 Amazon Athena 整合分析 VPC 流程日誌](https://aws.amazon.com/blogs/networking-and-content-delivery/analyze-vpc-flow-logs-with-point-and-click-amazon-athena-integration/) + [使用 Amazon Athena 和 Quick 分析 VPC 流程日誌](https://aws.amazon.com/blogs/big-data/analyzing-vpc-flow-logs-using-amazon-athena-and-amazon-quicksight/) + [使用 Apache Parquet 格式的 VPC 流程日誌，最佳化效能並降低網路分析成本](https://aws.amazon.com/blogs/big-data/optimize-performance-and-reduce-costs-for-network-analytics-with-vpc-flow-logs-in-apache-parquet-format/) # 查詢 AWS WAF 日誌 AWS WAF 是一種 Web 應用程式防火牆，可讓您監控和控制受保護 Web 應用程式從用戶端收到的 HTTP 和 HTTPS 請求。您可以透過在 Web 存取控制清單 (ACL) 中設定規則來定義如何處理 AWS WAF Web 請求。然後，您可以透過將 Web ACL 與 Web 應用程式建立關聯來保護 Web 應用程式。您可以使用保護的 Web 應用程式資源範例 AWS WAF 包括 Amazon CloudFront 分佈、Amazon API Gateway REST APIs 和 Application Load Balancer。如需的詳細資訊 AWS WAF，請參閱《 *AWS WAF 開發人員指南*[AWS WAF](https://docs.aws.amazon.com/waf/latest/developerguide/waf-chapter.html)》中的。 AWS WAF 日誌包含 Web ACL 所分析流量的相關資訊，例如從您的 AWS 資源 AWS WAF 接收請求的時間、請求的詳細資訊，以及每個請求相符之規則的動作。您可以設定 AWS WAF Web ACL 將日誌發佈到多個目的地之一，您可以在其中查詢和檢視它們。如需有關設定 Web ACL 記錄和 AWS WAF 日誌內容的詳細資訊，請參閱*AWS WAF 《開發人員指南*》中的[記錄 AWS WAF Web ACL 流量](https://docs.aws.amazon.com/waf/latest/developerguide/logging.html)。如需有關如何使用 Athena 分析 AWS WAF 日誌以深入了解威脅偵測和潛在安全攻擊的資訊，請參閱 AWS 網路與內容交付部落格文章[如何使用 Amazon Athena 查詢來分析 AWS WAF 日誌並提供威脅偵測所需的可見性](https://aws.amazon.com/blogs/networking-and-content-delivery/how-to-use-amazon-athena-queries-to-analyze-aws-waf-logs-and-provide-the-visibility-needed-for-threat-detection/)。如需如何將 AWS WAF 日誌彙總到中央資料湖儲存庫並使用 Athena 查詢日誌的範例，請參閱 AWS 大數據部落格文章[使用 OpenSearch Service、Amazon Athena 和 Quick 分析 AWS WAF 日誌](https://aws.amazon.com/blogs/big-data/analyzing-aws-waf-logs-with-amazon-es-amazon-athena-and-amazon-quicksight/)。本主題提供分割區投影、手動分割以及不使用任何分割的範例 `CREATE TABLE` 陳述式。 **注意** 所以本主題中的 `CREATE TABLE` 陳述式可同時用於 v1 和 v2 AWS WAF 日誌。在 v1 中，`webaclid` 欄位含有一個 ID。在 v2 中，`webaclid` 欄位含有一個完整的 ARN。此 `CREATE TABLE` 陳述式藉由使用 `string` 資料類型，以不可知的方式處理此內容。 **Topics** + [使用分割區投影在 Athena 中建立 AWS WAF S3 日誌的資料表](create-waf-table-partition-projection.md) + [使用手動分割區在 Athena 中建立 AWS WAF S3 日誌的資料表](create-waf-table-manual-partition.md) + [建立 AWS WAF 日誌的資料表而不分割](create-waf-table.md) + [AWS WAF 日誌的範例查詢](query-examples-waf-logs.md) # 使用分割區投影在 Athena 中建立 AWS WAF S3 日誌的資料表由於 AWS WAF 日誌具有已知的結構，您可以預先指定其分割區配置，因此您可以使用 Athena 分割區[投影功能來減少查詢執行時間並自動化分割區](partition-projection.md)管理。分割區投影會在新增資料時自動新增分割區。因此您無需使用 `ALTER TABLE ADD PARTITION` 手動新增分割區。下列範例`CREATE TABLE`陳述式會自動在 AWS WAF 指定日期到四個不同 AWS 區域的日誌上使用分割區投影。本範例中的 `PARTITION BY` 子句按區域和日期進行分割，但您可以根據自己的要求進行修改。視需要修改欄位，以符合您的日誌輸出。在 `LOCATION`和 `storage.location.template`子句中，將 *amzn-s3-demo-bucket* 和 *AWS\$1ACCOUNT\$1NUMBER* 預留位置取代為可識別 AWS WAF 日誌 Amazon S3 儲存貯體位置的值。對於 `projection.day.range`，請用您要使用的開始日期取代 *2021*/*01*/*01*。成功執行查詢之後，您可以查詢資料表。您無須執行 `ALTER TABLE ADD PARTITION` 就能載入分割區。 ``` CREATE EXTERNAL TABLE `waf_logs_partition_projection`( `timestamp` bigint, `formatversion` int, `webaclid` string, `terminatingruleid` string, `terminatingruletype` string, `action` string, `terminatingrulematchdetails` array>>, `httpsourcename` string, `httpsourceid` string, `rulegrouplist` array>>>,nonterminatingmatchingrules:array>>,challengeresponse:struct,captcharesponse:struct>>,excludedrules:string>>, `ratebasedrulelist` array>, `nonterminatingmatchingrules` array>>,challengeresponse:struct,captcharesponse:struct>>, `requestheadersinserted` array>, `responsecodesent` string, `httprequest` struct>,uri:string,args:string,httpversion:string,httpmethod:string,requestid:string,fragment:string,scheme:string,host:string>, `labels` array>, `captcharesponse` struct, `challengeresponse` struct, `ja3fingerprint` string, `ja4fingerprint` string, `oversizefields` string, `requestbodysize` int, `requestbodysizeinspectedbywaf` int) PARTITIONED BY ( `log_time` string) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/AWS_ACCOUNT_NUMBER/WAFLogs/cloudfront/testui/' TBLPROPERTIES ( 'projection.enabled'='true', 'projection.log_time.format'='yyyy/MM/dd/HH/mm', 'projection.log_time.interval'='1', 'projection.log_time.interval.unit'='minutes', 'projection.log_time.range'='2025/01/01/00/00,NOW', 'projection.log_time.type'='date', 'storage.location.template'='s3://amzn-s3-demo-bucket/AWSLogs/AWS_ACCOUNT_NUMBER/WAFLogs/cloudfront/testui/${log_time}') ``` **注意** 範例中 `LOCATION`子句中的路徑格式是標準格式，但可能會根據您實作的 AWS WAF 組態而有所不同。例如，下列範例 AWS WAF 日誌路徑適用於 CloudFront 分佈： ``` s3://amzn-s3-demo-bucket/AWSLogs/AWS_ACCOUNT_NUMBER/WAFLogs/cloudfront/cloudfronyt/2025/01/01/00/00/ ``` 如果您在建立或查詢 AWS WAF 日誌資料表時遇到問題，請確認日誌資料或[聯絡人支援](https://console.aws.amazon.com/support/home/)的位置。如需有關分割區投影的詳細資訊，請參閱[透過 Amazon Athena 使用分割區投影](partition-projection.md)。 # 使用手動分割區在 Athena 中建立 AWS WAF S3 日誌的資料表本節說明如何使用手動分割區為 AWS WAF 日誌建立資料表。在 `LOCATION`和 `storage.location.template`子句中，將 *amzn-s3-demo-bucket* 和 *AWS\$1ACCOUNT\$1NUMBER* 預留位置取代為可識別 AWS WAF 日誌 Amazon S3 儲存貯體位置的值。 ``` CREATE EXTERNAL TABLE `waf_logs_manual_partition`( `timestamp` bigint, `formatversion` int, `webaclid` string, `terminatingruleid` string, `terminatingruletype` string, `action` string, `terminatingrulematchdetails` array>>, `httpsourcename` string, `httpsourceid` string, `rulegrouplist` array>>>,nonterminatingmatchingrules:array>>,challengeresponse:struct,captcharesponse:struct>>,excludedrules:string>>, `ratebasedrulelist` array>, `nonterminatingmatchingrules` array>>,challengeresponse:struct,captcharesponse:struct>>, `requestheadersinserted` array>, `responsecodesent` string, `httprequest` struct>,uri:string,args:string,httpversion:string,httpmethod:string,requestid:string,fragment:string,scheme:string,host:string>, `labels` array>, `captcharesponse` struct, `challengeresponse` struct, `ja3fingerprint` string, `ja4fingerprint` string, `oversizefields` string, `requestbodysize` int, `requestbodysizeinspectedbywaf` int) PARTITIONED BY ( `year` string, `month` string, `day` string, `hour` string, `min` string) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/AWS_ACCOUNT_NUMBER/WAFLogs/cloudfront/webacl/' ``` # 建立 AWS WAF 日誌的資料表而不分割本節說明如何為 AWS WAF 日誌建立資料表，而無需分割或分割投影。 **注意** 出於效能和成本考量，我們不建議使用非分割結構描述進行查詢。如需詳細資訊，請參閱 AWS 大數據部落格中的 [Amazon Athena 的前 10 個效能調校秘訣](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/)。 **建立 AWS WAF 資料表** 1. 複製下列 DDL 陳述式，並將其貼到 Athena 主控台。視需要修改欄位，以符合您的日誌輸出。修改 Amazon S3 儲存貯體的 `LOCATION` 以與存放日誌的儲存貯體相對應。此查詢會使用 [OpenX JSON SerDe](openx-json-serde.md)。 **注意** SerDe 預期每筆 JSON 文件都以單行文字表示，而且沒有行終止字元分隔記錄中的欄位。如果 JSON 文字是美化過的列印格式，則在建立資料表後嘗試在其中查詢時可能會收到下列錯誤訊息：HIVE\$1CURSOR\$1ERROR: Row is not a valid JSON Object (HIVE\$1CURSOR\$1ERROR：資料列不是有效的 JSON 物件) 或 HIVE\$1CURSOR\$1ERROR: JsonParseException: Unexpected end-of-input: expected close marker for OBJECT (HIVE\$1CURSOR\$1ERROR：JsonParseException：非預期的輸入結束：預期為 OBJECT 的關閉標記)。如需詳細資訊，請參閱 GitHub 上 OpenX SerDe 文件中的 [JSON 資料檔案](https://github.com/rcongiu/Hive-JSON-Serde#json-data-files)。 ``` CREATE EXTERNAL TABLE `waf_logs`( `timestamp` bigint, `formatversion` int, `webaclid` string, `terminatingruleid` string, `terminatingruletype` string, `action` string, `terminatingrulematchdetails` array < struct < conditiontype: string, sensitivitylevel: string, location: string, matcheddata: array < string > > >, `httpsourcename` string, `httpsourceid` string, `rulegrouplist` array < struct < rulegroupid: string, terminatingrule: struct < ruleid: string, action: string, rulematchdetails: array < struct < conditiontype: string, sensitivitylevel: string, location: string, matcheddata: array < string > > > >, nonterminatingmatchingrules: array < struct < ruleid: string, action: string, overriddenaction: string, rulematchdetails: array < struct < conditiontype: string, sensitivitylevel: string, location: string, matcheddata: array < string > > >, challengeresponse: struct < responsecode: string, solvetimestamp: string >, captcharesponse: struct < responsecode: string, solvetimestamp: string > > >, excludedrules: string > >, `ratebasedrulelist` array < struct < ratebasedruleid: string, limitkey: string, maxrateallowed: int > >, `nonterminatingmatchingrules` array < struct < ruleid: string, action: string, rulematchdetails: array < struct < conditiontype: string, sensitivitylevel: string, location: string, matcheddata: array < string > > >, challengeresponse: struct < responsecode: string, solvetimestamp: string >, captcharesponse: struct < responsecode: string, solvetimestamp: string > > >, `requestheadersinserted` array < struct < name: string, value: string > >, `responsecodesent` string, `httprequest` struct < clientip: string, country: string, headers: array < struct < name: string, value: string > >, uri: string, args: string, httpversion: string, httpmethod: string, requestid: string >, `labels` array < struct < name: string > >, `captcharesponse` struct < responsecode: string, solvetimestamp: string, failureReason: string >, `challengeresponse` struct < responsecode: string, solvetimestamp: string, failureReason: string >, `ja3Fingerprint` string, `oversizefields` string, `requestbodysize` int, `requestbodysizeinspectedbywaf` int ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/prefix/' ``` 1. 在 Athena 主控台查詢編輯器中執行 `CREATE EXTERNAL TABLE` 陳述式。這會註冊 `waf_logs` 資料表，並讓其中的資料可用於從 Athena 進行查詢。 # AWS WAF 日誌的範例查詢本節中有許多範例查詢使用先前建立的分割區投影資料表。請根據您的需求修改範例中的資料表名稱、資料欄值及其他變數。若要改善查詢的效能並降低成本，請在篩選條件中新增分割區資料欄。 **Topics** + [計數推薦網站、IP 位址或相符規則](query-examples-waf-logs-count.md) + [使用日期和時間的查詢](query-examples-waf-logs-date-time.md) + [查詢封鎖的請求或地址](query-examples-waf-logs-blocked-requests.md) # 計數推薦網站、IP 位址或相符規則本節中的範例會查詢感興趣的日誌項目計數。 + [Count the number of referrers that contain a specified term](#waf-example-count-referrers-with-specified-term) + [Count all matched IP addresses in the last 10 days that have matched excluded rules](#waf-example-count-matched-ip-addresses) + [Group all counted managed rules by the number of times matched](#waf-example-group-managed-rules-by-times-matched) + [Group all counted custom rules by number of times matched](#waf-example-group-custom-rules-by-times-matched) **Example – 計算包含指定字詞的 Referrer 數量** 以下查詢會計算在指定的日期範圍內包含 "amazon" 一詞的 Referrer 數量。 ``` WITH test_dataset AS (SELECT header FROM waf_logs CROSS JOIN UNNEST(httprequest.headers) AS t(header) WHERE "date" >= '2021/03/01' AND "date" < '2021/03/31') SELECT COUNT(*) referer_count FROM test_dataset WHERE LOWER(header.name)='referer' AND header.value LIKE '%amazon%' ``` **Example – 計算過去 10 天內符合排除規則的所有相符 IP 地址** 以下查詢會計算過去 10 天內 IP 地址符合規則群組中排除規則的次數。 ``` WITH test_dataset AS (SELECT * FROM waf_logs CROSS JOIN UNNEST(rulegrouplist) AS t(allrulegroups)) SELECT COUNT(*) AS count, "httprequest"."clientip", "allrulegroups"."excludedrules", "allrulegroups"."ruleGroupId" FROM test_dataset WHERE allrulegroups.excludedrules IS NOT NULL AND from_unixtime(timestamp/1000) > now() - interval '10' day GROUP BY "httprequest"."clientip", "allrulegroups"."ruleGroupId", "allrulegroups"."excludedrules" ORDER BY count DESC ``` **Example – 依相符的次數對所有計數的受管規則進行分組** 如果您在 2022 年 10 月 27 日之前將規則群組規則動作設定為 Web ACL 組態中的計數，則會將 Web ACL JSON 中的覆寫 AWS WAF 儲存為 `excludedRules`。現在，將規則覆寫為計數的 JSON 設定位於 `ruleActionOverrides` 設定中。如需詳細資訊，請參閱《*AWS WAF 開發人員指南*》中的[規則群組中的動作覆寫](https://docs.aws.amazon.com/waf/latest/developerguide/web-acl-rule-group-override-options.html)。若要從新的日誌結構擷取「計數」模式下的受管規則，請查詢 `ruleGroupList` 區段中的 `nonTerminatingMatchingRules` 而非 `excludedRules` 欄位，如下列範例所示。 ``` SELECT count(*) AS count, httpsourceid, httprequest.clientip, t.rulegroupid, t.nonTerminatingMatchingRules FROM "waf_logs" CROSS JOIN UNNEST(rulegrouplist) AS t(t) WHERE action <> 'BLOCK' AND cardinality(t.nonTerminatingMatchingRules) > 0 GROUP BY t.nonTerminatingMatchingRules, action, httpsourceid, httprequest.clientip, t.rulegroupid ORDER BY "count" DESC Limit 50 ``` **Example – 依相符的次數對所有計數的自訂規則進行分組** 下列查詢會依符合的次數，將所有計數的自訂規則分組。 ``` SELECT count(*) AS count, httpsourceid, httprequest.clientip, t.ruleid, t.action FROM "waf_logs" CROSS JOIN UNNEST(nonterminatingmatchingrules) AS t(t) WHERE action <> 'BLOCK' AND cardinality(nonTerminatingMatchingRules) > 0 GROUP BY t.ruleid, t.action, httpsourceid, httprequest.clientip ORDER BY "count" DESC Limit 50 ``` 如需有關自訂規則和受管規則群組的日誌位置的資訊，請參閱《*AWS WAF 開發人員指南*》中的[監控和調校](https://docs.aws.amazon.com/waf/latest/developerguide/web-acl-testing-activities.html)。 # 使用日期和時間的查詢本節中的範例包括使用日期和時間值的查詢。 + [Return the timestamp field in human-readable ISO 8601 format](#waf-example-return-human-readable-timestamp) + [Return records from the last 24 hours](#waf-example-return-records-last-24-hours) + [Return records for a specified date range and IP address](#waf-example-return-records-date-range-and-ip) + [For a specified date range, count the number of IP addresses in five minute intervals](#waf-example-count-ip-addresses-in-date-range) + [Count the number of X-Forwarded-For IP in the last 10 days](#waf-example-count-x-forwarded-for-ip) **Example – 以人類看得懂的 ISO 8601 格式傳回時間戳記欄位** 以下查詢會使用 `from_unixtime` 和 `to_iso8601` 函數，以人類看得懂的 ISO 8601 格式傳回 `timestamp` 欄位 (例如 `2019-12-13T23:40:12.000Z`，而非 `1576280412771`) 查詢也會傳回 HTTP 來源名稱、來源 ID 和請求。 ``` SELECT to_iso8601(from_unixtime(timestamp / 1000)) as time_ISO_8601, httpsourcename, httpsourceid, httprequest FROM waf_logs LIMIT 10; ``` **Example – 傳回過去 24 小時的記錄** 以下查詢會使用 `WHERE` 子句中的篩選條件，傳回過去 24 小時記錄的 HTTP 來源名稱、HTTP 來源 ID 和 HTTP 請求欄位。 ``` SELECT to_iso8601(from_unixtime(timestamp/1000)) AS time_ISO_8601, httpsourcename, httpsourceid, httprequest FROM waf_logs WHERE from_unixtime(timestamp/1000) > now() - interval '1' day LIMIT 10; ``` **Example – 傳回指定日期範圍和 IP 地址的記錄** 以下查詢會列出指定用戶端 IP 地址的指定日期範圍內的記錄。 ``` SELECT * FROM waf_logs WHERE httprequest.clientip='53.21.198.66' AND "date" >= '2021/03/01' AND "date" < '2021/03/31' ``` **Example – 如果是指定的日期範圍，則會計算每隔五分鐘的 IP 地址數量** 以下查詢會針對特定日期範圍，計算每隔五分鐘的 IP 地址數量。 ``` WITH test_dataset AS (SELECT format_datetime(from_unixtime((timestamp/1000) - ((minute(from_unixtime(timestamp / 1000))%5) * 60)),'yyyy-MM-dd HH:mm') AS five_minutes_ts, "httprequest"."clientip" FROM waf_logs WHERE "date" >= '2021/03/01' AND "date" < '2021/03/31') SELECT five_minutes_ts,"clientip",count(*) ip_count FROM test_dataset GROUP BY five_minutes_ts,"clientip" ``` **Example – 計算過去 10 天內的 X-Forwarded-For IP 數量** 以下查詢會篩選請求標頭，並計算過去 10 天內的 X-Forwarded-For IP 數量。 ``` WITH test_dataset AS (SELECT header FROM waf_logs CROSS JOIN UNNEST (httprequest.headers) AS t(header) WHERE from_unixtime("timestamp"/1000) > now() - interval '10' DAY) SELECT header.value AS ip, count(*) AS COUNT FROM test_dataset WHERE header.name='X-Forwarded-For' GROUP BY header.value ORDER BY COUNT DESC ``` 如需有關日期和時間函數的詳細資訊，請參閱 Trino 文件中的 [Date and time functions and operators](https://trino.io/docs/current/functions/datetime.html) (日期和時間函數和運算子)。 # 查詢封鎖的請求或地址本節中的範例會查詢封鎖的請求或地址。 + [Extract the top 100 IP addresses blocked by a specified rule type](#waf-example-extract-top-100-blocked-ip-by-rule) + [Count the number of times a request from a specified country has been blocked](#waf-example-count-request-blocks-from-country) + [Count the number of times a request has been blocked, grouping by specific attributes](#waf-example-count-request-blocks-by-attribute) + [Count the number of times a specific terminating rule ID has been matched](#waf-example-count-terminating-rule-id-matches) + [Retrieve the top 100 IP addresses blocked during a specified date range](#waf-example-top-100-ip-addresses-blocked-for-date-range) **Example – 擷取遭到指定規則類型封鎖的前 100 個 IP 地址** 以下查詢會擷取並計算在指定日期範圍內，已遭到 `RATE_BASED` 終止規則封鎖的前 100 個 IP 地址。 ``` SELECT COUNT(httpRequest.clientIp) as count, httpRequest.clientIp FROM waf_logs WHERE terminatingruletype='RATE_BASED' AND action='BLOCK' and "date" >= '2021/03/01' AND "date" < '2021/03/31' GROUP BY httpRequest.clientIp ORDER BY count DESC LIMIT 100 ``` **Example – 計算來自指定國家/地區遭到封鎖的請求次數** 以下查詢會計算來自愛爾蘭 (IE) 的 IP 地址，並遭 `RATE_BASED` 終止規則封鎖的請求次數。 ``` SELECT COUNT(httpRequest.country) as count, httpRequest.country FROM waf_logs WHERE terminatingruletype='RATE_BASED' AND httpRequest.country='IE' GROUP BY httpRequest.country ORDER BY count LIMIT 100; ``` **Example – 計算遭封鎖的請求次數 (依特定屬性分組)** 以下查詢會計算遭封鎖的請求次數，並以 WebACL、RuleId、ClientIP 和 HTTP Request URI 分組。 ``` SELECT COUNT(*) AS count, webaclid, terminatingruleid, httprequest.clientip, httprequest.uri FROM waf_logs WHERE action='BLOCK' GROUP BY webaclid, terminatingruleid, httprequest.clientip, httprequest.uri ORDER BY count DESC LIMIT 100; ``` **Example – 計算與特定終止規則 ID 相符的次數。** 以下查詢會計算與特定終止規則 ID (`WHERE terminatingruleid='e9dd190d-7a43-4c06-bcea-409613d9506e'`) 相符的次數。查詢接著會以 WebACL、Action、ClientIP 和 HTTP Request URI 將結果分組。 ``` SELECT COUNT(*) AS count, webaclid, action, httprequest.clientip, httprequest.uri FROM waf_logs WHERE terminatingruleid='e9dd190d-7a43-4c06-bcea-409613d9506e' GROUP BY webaclid, action, httprequest.clientip, httprequest.uri ORDER BY count DESC LIMIT 100; ``` **Example – 擷取指定日期範圍內遭到封鎖的前 100 個 IP 地址** 以下查詢會擷取在指定日期範圍內，已遭到封鎖的前 100 個 IP 地址。該查詢也會列出 IP 地址遭到封鎖的次數。 ``` SELECT "httprequest"."clientip", "count"(*) "ipcount", "httprequest"."country" FROM waf_logs WHERE "action" = 'BLOCK' and "date" >= '2021/03/01' AND "date" < '2021/03/31' GROUP BY "httprequest"."clientip", "httprequest"."country" ORDER BY "ipcount" DESC limit 100 ``` 如需有關查詢 Amazon S3 日誌的資訊，請參閱下列主題： + AWS 知識中心中的[如何使用 Athena 分析 Amazon S3 伺服器存取日誌？](https://aws.amazon.com/premiumsupport/knowledge-center/analyze-logs-athena/) + 《Amazon Simple Storage Service 使用者指南》中的[使用 Amazon Athena 查詢請求的 Amazon S3 存取日誌](https://docs.aws.amazon.com/AmazonS3/latest/dev/using-s3-access-logs-to-identify-requests.html#querying-s3-access-logs-for-requests) + 《Amazon Simple Storage Service 使用者指南》中的[使用 AWS CloudTrail 識別 Amazon S3 請求](https://docs.aws.amazon.com/AmazonS3/latest/dev/cloudtrail-request-identification.html)