

# Tutorial: Stream JSON Log Files to Amazon S3 Using Kinesis Agent for Windows

This tutorial presents detailed steps for setting up a data pipeline using Amazon Kinesis Agent for Microsoft Windows (Kinesis Agent for Windows). 

The tutorial includes the following steps:
+ Using Kinesis Agent for Windows to stream JSON-formatted log files to [Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/) via [Amazon Data Firehose](https://docs.aws.amazon.com/firehose/latest/dev/). For information about Kinesis Agent for Windows, see [What Is Amazon Kinesis Agent for Microsoft Windows?](what-is-kinesis-agent-windows.md).
+ Enhancing the log data before streaming using object decoration. For more information, see [Configuring Sink Decorations](sink-object-declarations.md#configuring-kinesis-agent-windows-decoration-configuration).
+ Using [Amazon Athena](https://docs.aws.amazon.com/athena/latest/ug/) to search for particular kinds of log records.

**Prerequisites**  
If you don't already have an AWS account, follow the instructions in [Setting Up an AWS account](getting-started.md#getting-started-setting-up) to get one.

**Topics**
+ [Step 1: Configure AWS services](kaw-ds2s3-tutorial-step1.md)
+ [Step 2: Install, Configure, and Run Kinesis Agent for Windows](kaw-ds2s3-tutorial-step2.md)
+ [Step 3: Query the Log Data in Amazon S3](kaw-ds2s3-tutorial-step3.md)
+ [Next Steps](#kaw-ds2s3-tutorial-step4-next)

# Step 1: Configure AWS services


Follow these steps to prepare your environment for streaming log data to Amazon Simple Storage Service (Amazon S3) using Amazon Kinesis Agent for Microsoft Windows. For more information and prerequisites, see [Tutorial: Stream JSON Log Files to Amazon S3 Using Kinesis Agent for Windows](directory-source-to-s3-tutorial.md).

Use the AWS Management Console to configure AWS Identity and Access Management (IAM), Amazon S3, Firehose, and Amazon Elastic Compute Cloud (Amazon EC2) to prepare for streaming log data from an EC2 instance to Amazon S3.

**Topics**
+ [Configure IAM Policies and Roles](#kaw-ds2s3-tutorial-step1.1)
+ [Create the Amazon S3 Bucket](#kaw-ds2s3-tutorial-step1.2)
+ [Create the Firehose Delivery Stream](#kaw-ds2s3-tutorial-step1.3)
+ [Create the Amazon EC2 Instance to Run Kinesis Agent for Windows](#kaw-ds2s3-tutorial-step1.4)
+ [Next Steps](#kaw-ds2s3-tutorial-next)

## Configure IAM Policies and Roles


Create the following policy, which authorizes Kinesis Agent for Windows to stream records to a specific Firehose delivery stream:

Replace the example Region, *us-east-1*, with the name of the AWS Region where you will create the Firehose delivery stream. Also replace the example account ID, *123456789012*, with the 12-digit ID of the AWS account where you will create the delivery stream.


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "firehose:PutRecord",
                "firehose:PutRecordBatch"
            ],
            "Resource": "arn:aws:firehose:us-east-1:123456789012:deliverystream/log-delivery-stream"
        }
    ]
}
```

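
Only the Region, account ID, and stream name vary in this policy. If you script your setup instead of using the console, a small helper (a sketch for illustration; not part of the console procedure) can fill in the placeholders and emit valid policy JSON:

```python
import json

def make_firehose_policy(region, account_id, stream_name):
    """Fill the Region, account ID, and stream name into the
    delivery-stream access policy shown above."""
    arn = f"arn:aws:firehose:{region}:{account_id}:deliverystream/{stream_name}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": ["firehose:PutRecord", "firehose:PutRecordBatch"],
                "Resource": arn,
            }
        ],
    }
    return json.dumps(policy, indent=4)
```

The resulting string can be pasted directly into the **JSON** tab of the policy editor.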

To find your AWS account ID, in the navigation bar, choose **Support**, and then **Support Center**. The 12-digit account ID for the currently signed-in account appears in the **Support Center** navigation pane.

Create the policy using the following procedure. Name the policy `log-delivery-stream-access-policy`. 

**To create a policy using the JSON policy editor**

1. Sign in to the AWS Management Console and open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation pane on the left side, choose **Policies**. 

   If this is your first time choosing **Policies**, the **Welcome to Managed Policies** page appears. Choose **Get Started**.

1. At the top of the page, choose **Create policy**.

1. Choose the **JSON** tab.

1. Enter a JSON policy document. For details about the IAM policy language, see [IAM JSON Policy Reference](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies.html) in the *IAM User Guide*. 

1. When you are finished, choose **Review policy**. The [Policy Validator](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_policy-validator.html) reports any syntax errors.
**Note**  
You can switch between the **Visual editor** and **JSON** tabs any time. However, if you make changes or choose **Review policy** in the **Visual editor** tab, IAM might restructure your policy to optimize it for the visual editor. For more information, see [Policy Restructuring](https://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_policies.html#troubleshoot_viseditor-restructure) in the *IAM User Guide*.

1. On the **Review policy** page, enter a **Name** and a **Description** (optional) for the policy that you are creating. Review the policy **Summary** to see the permissions that are granted by your policy. Then choose **Create policy** to save your work.

**To create the role that gives Firehose access to an S3 bucket**

1. Using the previous procedure, create a policy named `firehose-s3-access-policy` that is defined using the following JSON.

   Replace the following in your IAM policy example:
   + The example Amazon S3 bucket name, *amzn-s3-demo-bucket*, with a unique bucket name where the logs will be stored.
   + The example Region, *us-east-1*, with the AWS Region where the CloudWatch Logs log group and log stream will be created. Firehose uses these to log any errors that occur while streaming the data to Amazon S3.
   + The example AWS account ID, *123456789012*, with the 12-digit ID of the account where the log group and log stream will be created.


   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "s3:AbortMultipartUpload",
                   "s3:GetBucketLocation",
                   "s3:GetObject",
                   "s3:ListBucket",
                   "s3:ListBucketMultipartUploads",
                   "s3:PutObject"
               ],
               "Resource": [
                   "arn:aws:s3:::amzn-s3-demo-bucket",
                   "arn:aws:s3:::amzn-s3-demo-bucket/*"
               ]
           },
           {
               "Effect": "Allow",
               "Action": [
                   "logs:PutLogEvents"
               ],
               "Resource": [
                   "arn:aws:logs:us-east-1:123456789012:log-group:firehose-error-log-group:log-stream:firehose-error-log-stream"
               ]
           }
       ]
   }
   ```


1. In the navigation pane of the IAM console, choose **Roles**, and then choose **Create role**.

1. Choose the **AWS service** role type, and then choose the **Kinesis** service.

1. Choose **Firehose** for the use case, and then choose **Next: Permissions**.

1. In the search box, enter **firehose-s3-access-policy**, choose that policy, and then choose **Next: Review**.

1. In the **Role name** box, enter **firehose-s3-access-role**.

1. Choose **Create role**.

**To create the role to associate with the instance profile for the EC2 instance that will run Kinesis Agent for Windows**

1. In the navigation pane of the IAM console, choose **Roles**, and then choose **Create role**.

1. Choose the **AWS service** role type, and then choose **EC2**.

1. Choose **Next: Permissions**.

1. In the search box, enter **log-delivery-stream-access-policy**.

1. Choose the policy, and then choose **Next: Review**.

1. In the **Role name** box, enter **kinesis-agent-instance-role**.

1. Choose **Create role**.

## Create the Amazon S3 Bucket


 Create the S3 bucket where Firehose streams the logs. 

**To create the S3 bucket for log storage**

1. Open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/).

1. Choose **Create bucket**.

1. In the **Bucket name** box, enter the unique S3 bucket name that you chose in [Configure IAM Policies and Roles](#kaw-ds2s3-tutorial-step1.1).

1. Choose the Region where the bucket should be created. This is typically the same Region where you intend to create the Firehose delivery stream and the Amazon EC2 instance.

1. Choose **Create**.

## Create the Firehose Delivery Stream


Create the Firehose delivery stream that will store streamed records in Amazon S3.

**To create the Firehose delivery stream**

1. Open the Firehose console at [https://console.aws.amazon.com/firehose/](https://console.aws.amazon.com/firehose/).

1. Choose **Create Delivery Stream**.

1. In the **Delivery stream name** box, enter **log-delivery-stream**.

1. For the **Source**, choose **Direct PUT or other sources**.  
![\[Screenshot demonstrating how to specify a source when creating a Firehose delivery stream.\]](http://docs.aws.amazon.com/kinesis-agent-windows/latest/userguide/images/fh-create-delivery-stream-1.png)

1. Choose **Next**.

1. Choose **Next** again.

1. For the destination, choose **Amazon S3**.

1. For the **S3 bucket**, choose the name of the bucket that you created in [Create the Amazon S3 Bucket](#kaw-ds2s3-tutorial-step1.2).  
![\[Screenshot demonstrating how to specify the destination when creating a Firehose delivery stream.\]](http://docs.aws.amazon.com/kinesis-agent-windows/latest/userguide/images/fh-create-delivery-stream-2.png)

1. Choose **Next**.

1. In the **Buffer interval** box, enter **60**.

1. Under **IAM role**, choose **Create new or choose**.

1. For **IAM role**, choose `firehose-s3-access-role`.

1. Choose **Allow**.  
![\[Screenshot demonstrating how to configure options and security when creating a Firehose delivery stream.\]](http://docs.aws.amazon.com/kinesis-agent-windows/latest/userguide/images/fh-create-delivery-stream-3.png)

1. Choose **Next**.

1. Choose **Create delivery stream**.

## Create the Amazon EC2 Instance to Run Kinesis Agent for Windows


Create the EC2 instance that uses Kinesis Agent for Windows to stream log records via Firehose.

**To create the EC2 instance**

1. Open the Amazon EC2 console at [https://console.aws.amazon.com/ec2/](https://console.aws.amazon.com/ec2/).

1. Follow the instructions in [Getting Started with Amazon EC2 Windows Instances](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/EC2_GetStarted.html), using the following additional steps:
   + For the **IAM role** for the instance, choose `kinesis-agent-instance-role`.
   + If you don't already have a public internet-connected virtual private cloud (VPC), follow the instructions in [Setting Up with Amazon EC2](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/get-set-up-for-amazon-ec2.html) in the *Amazon EC2 User Guide*.
   + Create or use a security group that limits access to the instance from only your computer, or only your organization's computers. For more information, see [Setting Up with Amazon EC2](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/get-set-up-for-amazon-ec2.html) in the *Amazon EC2 User Guide*.
   + If you specify an existing key pair, be sure to have access to the private key for the key pair. Or, create a new key pair and save the private key in a safe place.
   + Before continuing, wait until the instance is running and has completed two out of two health checks.
   + Your instance requires a public IP address. If one hasn't been allocated, follow the instructions at [Elastic IP Addresses](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/elastic-ip-addresses-eip.html) in the *Amazon EC2 User Guide*.

## Next Steps


[Step 2: Install, Configure, and Run Kinesis Agent for Windows](kaw-ds2s3-tutorial-step2.md)

# Step 2: Install, Configure, and Run Kinesis Agent for Windows

In this step, you use the AWS Management Console to remotely connect to the instance that you launched in [Create the Amazon EC2 Instance to Run Kinesis Agent for Windows](kaw-ds2s3-tutorial-step1.md#kaw-ds2s3-tutorial-step1.4). You then install Amazon Kinesis Agent for Microsoft Windows on the instance, create and deploy the configuration file for Kinesis Agent for Windows, and start the `AWSKinesisTap` service.

1. Remotely connect to the instance via Remote Desktop Protocol (RDP) by following the instructions in [Step 2: Connect to Your Instance](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/EC2_GetStarted.html#ec2-connect-to-instance-windows) in the *Amazon EC2 User Guide*.

1. On the instance, use Windows Server Manager to disable Microsoft Internet Explorer Enhanced Security Configuration for users and administrators. For more information, see [How To Turn Off Internet Explorer Enhanced Security Configuration](https://blogs.technet.microsoft.com/chenley/2011/03/10/how-to-turn-off-internet-explorer-enhanced-security-configuration/) on the Microsoft TechNet website. 

1. On the instance, install and configure Kinesis Agent for Windows. For more information, see [Installing Kinesis Agent for Windows](getting-started.md#getting-started-installation).

1. On the instance, use Notepad to create a Kinesis Agent for Windows configuration file. Save the file to `%PROGRAMFILES%\Amazon\AWSKinesisTap\appsettings.json`. Add the following content to the configuration file:

   ```
   {
     "Sources": [
       {
         "Id": "JsonLogSource",
         "SourceType": "DirectorySource",
         "RecordParser": "SingleLineJson",
         "Directory": "C:\\LogSource\\",
         "FileNameFilter": "*.log",
         "InitialPosition": 0
       }
     ],
     "Sinks": [
       {
         "Id": "FirehoseLogStream",
         "SinkType": "KinesisFirehose",
         "StreamName": "log-delivery-stream",
         "Region": "us-east-1",
         "Format": "json",
         "ObjectDecoration": "ComputerName={ComputerName};DT={timestamp:yyyy-MM-dd HH:mm:ss}"
       }
     ],
     "Pipes": [
       {
         "Id": "JsonLogSourceToFirehoseLogStream",
         "SourceRef": "JsonLogSource",
         "SinkRef": "FirehoseLogStream"
       }
     ]
   }
   ```

   This file configures Kinesis Agent for Windows to send JSON-formatted log records from files in the `C:\LogSource\` directory (the *source*) to a Firehose delivery stream named `log-delivery-stream` (the *sink*). Before each log record is streamed to Firehose, it is enhanced with two extra key-value pairs that contain the name of the computer and a timestamp.
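
   The effect of the `ObjectDecoration` setting can be sketched in Python (an illustration only, not the agent's actual implementation; the key names and timestamp format follow the configuration above):

   ```python
   import json
   import socket
   from datetime import datetime, timezone

   def decorate(line, computer_name=None, now=None):
       """Add ComputerName and DT key-value pairs to one JSON log record,
       mimicking the ObjectDecoration setting in appsettings.json."""
       record = json.loads(line)
       record["ComputerName"] = computer_name or socket.gethostname()
       stamp = now or datetime.now(timezone.utc)
       record["DT"] = stamp.strftime("%Y-%m-%d %H:%M:%S")
       return json.dumps(record)
   ```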

1. Create the `c:\LogSource\` directory, and use Notepad to create a `test.log` file in that directory with the following content:

   ```
   { "Message": "Copasetic message 1", "Severity": "Information" }
   { "Message": "Copasetic message 2", "Severity": "Information" }
   { "Message": "Problem message 2", "Severity": "Error" }
   { "Message": "Copasetic message 3", "Severity": "Information" }
   ```
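
   Each line in this file must be one complete JSON object, because the source uses the `SingleLineJson` record parser. A quick way to verify a log file before deploying it (a sketch, not part of the tutorial):

   ```python
   import json

   # Every line must parse on its own for the SingleLineJson record parser.
   lines = [
       '{ "Message": "Copasetic message 1", "Severity": "Information" }',
       '{ "Message": "Copasetic message 2", "Severity": "Information" }',
       '{ "Message": "Problem message 2", "Severity": "Error" }',
       '{ "Message": "Copasetic message 3", "Severity": "Information" }',
   ]
   records = [json.loads(line) for line in lines]
   ```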

1. In an elevated PowerShell session, use the following command to start the `AWSKinesisTap` service: 

   ```
   Start-Service -ServiceName AWSKinesisTap
   ```

1. Using File Explorer, browse to the `%PROGRAMDATA%\Amazon\AWSKinesisTap\logs` directory. Open the most recent log file. The log file should look similar to the following:

   ```
   2018-09-28 23:51:02.2472 Amazon.KinesisTap.Hosting.LogManager INFO Registered factory Amazon.KinesisTap.AWS.AWSEventSinkFactory.
   2018-09-28 23:51:02.2784 Amazon.KinesisTap.Hosting.LogManager INFO Registered factory Amazon.KinesisTap.Windows.PerformanceCounterSinkFactory.
   2018-09-28 23:51:02.5753 Amazon.KinesisTap.Hosting.LogManager INFO Registered factory Amazon.KinesisTap.Core.DirectorySourceFactory.
   2018-09-28 23:51:02.5909 Amazon.KinesisTap.Hosting.LogManager INFO Registered factory Amazon.KinesisTap.ExchangeSource.ExchangeSourceFactory.
   2018-09-28 23:51:02.5909 Amazon.KinesisTap.Hosting.LogManager INFO Registered factory Amazon.KinesisTap.Uls.UlsSourceFactory.
   2018-09-28 23:51:02.5909 Amazon.KinesisTap.Hosting.LogManager INFO Registered factory Amazon.KinesisTap.Windows.WindowsSourceFactory.
   2018-09-28 23:51:02.9347 Amazon.KinesisTap.Hosting.LogManager INFO Registered factory Amazon.KinesisTap.Core.Pipes.PipeFactory.
   2018-09-28 23:51:03.5128 Amazon.KinesisTap.Hosting.LogManager INFO Registered factory Amazon.KinesisTap.AutoUpdate.AutoUpdateFactory.
   2018-09-28 23:51:03.5440 Amazon.KinesisTap.Hosting.LogManager INFO Performance counter sink  started.
   2018-09-28 23:51:03.7628 Amazon.KinesisTap.Hosting.LogManager INFO KinesisFirehoseSink id FirehoseLogStream for StreamName log-delivery-stream started.
   2018-09-28 23:51:03.7784 Amazon.KinesisTap.Hosting.LogManager INFO Connected source JsonLogSource to sink FirehoseLogStream
   2018-09-28 23:51:03.7940 Amazon.KinesisTap.Hosting.LogManager INFO DirectorySource id JsonLogSource watching directory C:\LogSource\ with filter *.log started.
   ```

   This log file indicates that the service has started and that log records are being collected from the `C:\LogSource\` directory. Each line is parsed as a single JSON object, key-value pairs for the computer name and timestamp are added to it, and the record is then streamed to Firehose.

1. In a minute or two, navigate to the Amazon S3 bucket that you created in [Create the Amazon S3 Bucket](kaw-ds2s3-tutorial-step1.md#kaw-ds2s3-tutorial-step1.2) using the AWS Management Console. Be sure that you have chosen the correct Region on the console. 

   In that bucket, there is a folder for the current year. Open that folder to reveal a folder for the current month. Open that folder to reveal a folder for the current day. Open that folder to reveal a folder for the current hour (in UTC). Open that folder to reveal one or more items that start with the name `log-delivery-stream`.   
![\[Screenshot demonstrating browsing for the log records in Amazon S3.\]](http://docs.aws.amazon.com/kinesis-agent-windows/latest/userguide/images/s3-view-log-stream.png)
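
   This year/month/day/hour folder structure is Firehose's default S3 key prefix, derived from the delivery time in UTC. As a sketch:

   ```python
   from datetime import datetime, timezone

   def default_s3_prefix(when=None):
       """Firehose's default S3 key prefix is year/month/day/hour in UTC."""
       when = when or datetime.now(timezone.utc)
       return when.strftime("%Y/%m/%d/%H/")
   ```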

1. Open the contents of the latest item to confirm that the log records have been successfully stored in Amazon S3 with the desired enhancements. If everything is configured correctly, the contents look similar to the following:

   ```
   {"Message":"Copasetic message 1","Severity":"Information","ComputerName":"EC2AMAZ-ABCDEFGH","DT":"2018-09-28 23:51:04"}
   {"Message":"Copasetic message 2","Severity":"Information","ComputerName":"EC2AMAZ-ABCDEFGH","DT":"2018-09-28 23:51:04"}
   {"Message":"Problem message 2","Severity":"Error","ComputerName":"EC2AMAZ-ABCDEFGH","DT":"2018-09-28 23:51:04"}
   {"Message":"Copasetic message 3","Severity":"Information","ComputerName":"EC2AMAZ-ABCDEFGH","DT":"2018-09-28 23:51:04"}
   ```
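
   The same kind of severity filter that Athena applies later in this tutorial can also be run locally against a downloaded delivery file (a sketch; the field names come from the decorated records above):

   ```python
   import json

   def filter_by_severity(lines, severity):
       """Return the parsed records whose Severity matches,
       for example to spot Error records in a delivery file."""
       records = (json.loads(line) for line in lines)
       return [r for r in records if r.get("Severity") == severity]
   ```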

1. For information about resolving any of the following issues, see [Troubleshooting Amazon Kinesis Agent for Microsoft Windows](troubleshooting-kinesis-agent-windows.md):
   + The Kinesis Agent for Windows log file contains errors.
   + Expected folders or items in Amazon S3 do not exist.
   + The contents of an Amazon S3 item are incorrect.

## Next Steps


[Step 3: Query the Log Data in Amazon S3](kaw-ds2s3-tutorial-step3.md)

# Step 3: Query the Log Data in Amazon S3


In the final step of this Amazon Kinesis Agent for Microsoft Windows [tutorial](directory-source-to-s3-tutorial.md), you use Amazon Athena to query the log data stored in Amazon Simple Storage Service (Amazon S3).

1. Open the Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home).

1. Choose the plus sign (**+**) in the Athena query window to create a new query window.  
![\[Screenshot demonstrating how to create a new query window in Athena.\]](http://docs.aws.amazon.com/kinesis-agent-windows/latest/userguide/images/athena-new-query.png)

1. Enter the following text into the query window:

   ```
   CREATE DATABASE logdatabase;
   
   CREATE EXTERNAL TABLE logdatabase.logs (
     Message string,
     Severity string,
     ComputerName string,
     DT timestamp
   )
   ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
   LOCATION 's3://bucket/year/month/day/hour/';
   
   SELECT * FROM logdatabase.logs;
   SELECT * FROM logdatabase.logs WHERE severity = 'Error';
   ```

   Replace *`bucket`* with the name of the bucket that you created in [Create the Amazon S3 Bucket](kaw-ds2s3-tutorial-step1.md#kaw-ds2s3-tutorial-step1.2). Replace *`year`*, *`month`*, *`day`* and *`hour`* with the year, month, day, and hour when the Amazon S3 log file was created in UTC.
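
   The `LOCATION` value can be assembled from the bucket name and the UTC time components (a hypothetical helper for illustration):

   ```python
   def athena_location(bucket, year, month, day, hour):
       """Build the LOCATION value for the external table from the
       bucket name and the UTC delivery time components."""
       return f"s3://{bucket}/{year:04d}/{month:02d}/{day:02d}/{hour:02d}/"
   ```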

1. Select the text for the `CREATE DATABASE` statement, and then choose **Run query**. This creates the log database in Athena.

1. Select the text for the `CREATE EXTERNAL TABLE` statement, and then choose **Run query**. This creates an Athena table that references the S3 bucket with the log data, mapping the schema for the JSON to the schema for the Athena table.

1. Select the text for the first `SELECT` statement, and then choose **Run query**. This displays all the rows in the table.  
![\[Screenshot demonstrating querying log records using Athena.\]](http://docs.aws.amazon.com/kinesis-agent-windows/latest/userguide/images/athena-first-select.png)

1. Select the text for the second `SELECT` statement, and then choose **Run query**. This displays only the rows in the table that represent log records with an `Error`-level severity. This kind of query finds interesting log records from a potentially large set of log records.  
![\[Screenshot demonstrating how to query for specific kinds of records in Athena.\]](http://docs.aws.amazon.com/kinesis-agent-windows/latest/userguide/images/athena-second-query.png)

## Next Steps


Use the AWS Management Console to clean up the resources created during the tutorial:

1. Terminate the EC2 instance (see step 3 in [Getting Started with Amazon EC2 Windows Instances](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/EC2_GetStarted.html#ec2-connect-to-instance-windows)).
**Important**  
If you launched an instance that was not within the [AWS Free Tier](https://aws.amazon.com/free/), you are charged for the instance until you terminate it.

1. Delete the Firehose delivery stream.

   1. Open the Firehose console at [https://console.aws.amazon.com/firehose/](https://console.aws.amazon.com/firehose/).

   1. Choose the delivery stream that you created.

   1. Choose **Delete**.

   1. Choose **Delete delivery stream**.

1. Delete the S3 bucket. For instructions, see [How Do I Delete an S3 Bucket?](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/delete-bucket.html) in the *Amazon Simple Storage Service User Guide*.

For more information, see the following topics:
+ [Configuring Amazon Kinesis Agent for Microsoft Windows](configuring-kinesis-agent-windows.md)
+ [What Is Amazon Kinesis Data Firehose?](https://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html)
+ [What Is Amazon S3?](https://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html)
+ [What is Amazon Athena?](https://docs.aws.amazon.com/athena/latest/ug/what-is.html)