

# 使用分区投影功能创建和查询 Amazon VPC 流日志表
<a name="vpc-flow-logs-partition-projection"></a>

使用如下所示的 `CREATE TABLE` 语句创建表、对表进行分区并使用[分区投影](partition-projection.md)自动填充分区。将示例中的表名称 `test_table_vpclogs` 替换为您的表名称。编辑 `LOCATION` 子句以指定包含 Amazon VPC 日志数据的 Amazon S3 存储桶。

以下 `CREATE TABLE` 语句适用于以非 Hive 样式的分区格式传送的 VPC 流日志。该示例支持多账户聚合。要将来自多个账户的 VPC 流日志集中到一个 Amazon S3 存储桶中，必须在 Amazon S3 路径中输入账户 ID。

```
CREATE EXTERNAL TABLE IF NOT EXISTS test_table_vpclogs (
  version int,
  account_id string,
  interface_id string,
  srcaddr string,
  dstaddr string,
  srcport int,
  dstport int,
  protocol bigint,
  packets bigint,
  bytes bigint,
  start bigint,
  `end` bigint,
  action string,
  log_status string,
  vpc_id string,
  subnet_id string,
  instance_id string,
  tcp_flags int,
  type string,
  pkt_srcaddr string,
  pkt_dstaddr string,
  az_id string,
  sublocation_type string,
  sublocation_id string,
  pkt_src_aws_service string,
  pkt_dst_aws_service string,
  flow_direction string,
  traffic_path int
)
PARTITIONED BY (accid string, region string, day string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION '$LOCATION_OF_LOGS'
TBLPROPERTIES
(
"skip.header.line.count"="1",
"projection.enabled" = "true",
"projection.accid.type" = "enum",
"projection.accid.values" = "$ACCID_1,$ACCID_2",
"projection.region.type" = "enum",
"projection.region.values" = "$REGION_1,$REGION_2,$REGION_3",
"projection.day.type" = "date",
"projection.day.range" = "$START_RANGE,NOW",
"projection.day.format" = "yyyy/MM/dd",
"storage.location.template" = "s3://amzn-s3-demo-bucket/AWSLogs/${accid}/vpcflowlogs/${region}/${day}"
)
```

## test\$1table\$1vpclogs 表的查询示例
<a name="query-examples-vpc-logs-pp"></a>

以下示例查询将查询之前 `CREATE TABLE` 语句创建的 `test_table_vpclogs`。将查询中的 `test_table_vpclogs` 替换为您自己的表名称。根据您的要求修改列值和其他变量。

若要在指定时间段内按时间顺序返回前 100 个访问日志条目，请运行如下所示的查询。

```
SELECT *
FROM test_table_vpclogs
WHERE day >= '2021/02/01' AND day < '2021/02/28'
ORDER BY day ASC
LIMIT 100
```

若要查看哪个服务器在指定时间段内接收前十个 HTTP 数据包，请运行如下所示的查询。该查询计算在 HTTPS 端口 443 上接收的数据包数，按目标 IP 地址对其进行分组，并返回上一周的前 10 个条目。

```
SELECT SUM(packets) AS packetcount, 
       dstaddr
FROM test_table_vpclogs
WHERE dstport = 443
  AND day >= '2021/03/01'
  AND day < '2021/03/31'
GROUP BY dstaddr
ORDER BY packetcount DESC
LIMIT 10
```

若要返回在指定时间段内创建的日志，请运行如下所示的查询。

```
SELECT interface_id,
       srcaddr,
       action,
       protocol,
       to_iso8601(from_unixtime(start)) AS start_time,
       to_iso8601(from_unixtime("end")) AS end_time
FROM test_table_vpclogs
WHERE DAY >= '2021/04/01'
  AND DAY < '2021/04/30'
```

若要返回指定时间段内源 IP 地址的访问日志，请运行如下所示的查询。

```
SELECT *
FROM test_table_vpclogs
WHERE srcaddr = '10.117.1.22'
  AND day >= '2021/02/01'
  AND day < '2021/02/28'
```

若要列出已被拒绝的 TCP 连接，请运行如下所示的查询。

```
SELECT day,
       interface_id,
       srcaddr,
       action,
       protocol
FROM test_table_vpclogs
WHERE action = 'REJECT' AND protocol = 6 AND day >= '2021/02/01' AND day < '2021/02/28'
LIMIT 10
```

若要返回以 `10.117` 开头的 IP 地址范围的访问日志，请运行如下所示的查询。

```
SELECT *
FROM test_table_vpclogs
WHERE split_part(srcaddr,'.', 1)='10'
  AND split_part(srcaddr,'.', 2) ='117'
```

若要返回某个时间范围内目标 IP 地址的访问日志，请运行如下所示的查询。

```
SELECT *
FROM test_table_vpclogs
WHERE dstaddr = '10.0.1.14'
  AND day >= '2021/01/01'
  AND day < '2021/01/31'
```