

AWS Application Discovery Service is no longer open to new customers. Alternatively, use AWS Transform, which provides similar capabilities. For more information, see [AWS Application Discovery Service availability change](https://docs.aws.amazon.com/application-discovery/latest/userguide/application-discovery-service-availability-change.html).

# Exploring data in Amazon Athena
<a name="explore-data"></a>

Data exploration in Amazon Athena allows you to analyze, in one place, the data that Discovery Agents collect from all of your discovered on-premises servers. After data exploration in Amazon Athena is enabled from the Migration Hub console (or by using the StartContinuousExport API) and data collection for agents is turned on, the data collected by agents is automatically stored in your S3 bucket at regular intervals.

You can then visit Amazon Athena to run predefined queries to analyze the time-series system performance for each server, the types of processes running on each server, and the network dependencies between servers. In addition, you can write your own custom queries in Amazon Athena, upload additional existing data sources such as configuration management database (CMDB) exports, and associate the discovered servers with the actual business applications. You can also integrate the Athena database with Amazon QuickSight to visualize the query outputs and perform additional analysis.

The topics in this section describe the ways that you can work with your data in Athena to assess and plan for migrating your local environment to AWS.

# Turning on data exploration in Amazon Athena
<a name="ce-prep-agents"></a>

Data exploration in Amazon Athena is enabled by turning on Continuous Export using the Migration Hub console or an API call from the AWS CLI. You must turn on data exploration before you can see and start exploring your discovered data in Amazon Athena. 

When you turn on Continuous Export, the service-linked role `AWSServiceRoleForApplicationDiscoveryServiceContinuousExport` is automatically used by your account. For more information about this service-linked role, see [Service-linked role permissions for Application Discovery Service](service-linked-role-permissions.md). 

The following instructions show how to turn on data exploration in Amazon Athena by using the console and the AWS CLI.

------
#### [ Turn on with the console ]

Data exploration in Amazon Athena is enabled when Continuous Export is implicitly turned on, either when you choose **Start data collection** or when you turn on the **Data exploration in Amazon Athena** toggle on the **Data Collectors** page of the Migration Hub console.

**To turn on data exploration in Amazon Athena from the console**

1. In the navigation pane, choose **Data Collectors**.

1. Choose the **Agents** tab.

1. Choose **Start data collection**, or if data collection is already turned on, choose the **Data exploration in Amazon Athena** toggle.

1. In the dialog box generated from the previous step, select the check box to agree to the associated costs, and then choose **Continue** or **Enable**.

**Note**  
Your agents are now running in continuous export mode, which enables you to see and work with your discovered data in Amazon Athena. The first time that this is enabled, it might take up to 30 minutes for your data to appear in Amazon Athena.

------
#### [ Turn on with the AWS CLI ]

Data exploration in Amazon Athena is enabled by explicitly turning on Continuous Export through an API call from the AWS CLI. To do this, the AWS CLI must first be installed in your environment.

**To install the AWS CLI and turn on data exploration in Amazon Athena**

1. Install the AWS CLI for your operating system (Linux, macOS, or Windows). See the [AWS Command Line Interface User Guide](https://docs.aws.amazon.com/cli/latest/userguide/) for instructions.

1. Open the Command prompt (Windows) or Terminal (Linux or macOS).

   1. Type `aws configure` and press Enter.

   1. Enter your AWS access key ID and AWS secret access key.

   1. Enter `us-west-2` for the Default Region Name.

   1. Enter `text` for Default Output Format.

1. Type the following command:

   ```
   aws discovery start-continuous-export
   ```
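
   To confirm that Continuous Export started, you can call the `describe-continuous-exports` command and check that the export status is reported as `ACTIVE`:

   ```
   aws discovery describe-continuous-exports
   ```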

**Note**  
Your agents are now running in continuous export mode, which enables you to see and work with your discovered data in Amazon Athena. The first time that this is enabled, it might take up to 30 minutes for your data to appear in Amazon Athena.

------

# Exploring data directly in Amazon Athena
<a name="explore-direct-in-ate"></a>

After you turn on data exploration in Amazon Athena, you can begin exploring and working with detailed current data that was discovered by your agents by querying the data directly in Athena. You can use the data to generate spreadsheets, run a cost analysis, port the query to a visualization program to diagram network dependencies, and more.

The following instructions explain how to explore your agent data directly in the Athena console. If you don't have any data in Athena or haven't enabled data exploration in Amazon Athena, a dialog box prompts you to enable data exploration in Amazon Athena, as explained in [Turning on data exploration in Amazon Athena](ce-prep-agents.md).

**To explore agent-discovered data directly in Athena**

1. In the AWS Migration Hub console, choose **Servers** in the navigation pane.

1. To open the Amazon Athena console, choose **Explore data in Amazon Athena**. 

1. On the **Query Editor** page, in the navigation pane under **Database**, make sure that **application\_discovery\_service\_database** is selected.
**Note**  
Under **Tables**, the following tables represent the datasets grouped by the agents.  
**os\_info\_agent**
**network\_interface\_agent**
**sys\_performance\_agent**
**processes\_agent**
**inbound\_connection\_agent**
**outbound\_connection\_agent**
**id\_mapping\_agent**

1. Query the data in the Amazon Athena console by writing and running SQL queries in the Athena Query Editor. For example, you can use the following query to see all of the discovered server IP addresses. 

   ```
   SELECT * FROM network_interface_agent;
   ```

   For more example queries, see [Using predefined queries in Amazon Athena](predefined-queries.md).
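
   As another example sketch, you could count how many process records each agent has collected; this assumes that the `processes_agent` table includes an `agent_id` column on each row.

   ```
   SELECT agent_id, count(*) AS process_count
   FROM processes_agent
   GROUP BY agent_id
   ORDER BY process_count DESC;
   ```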

# Visualizing Amazon Athena data
<a name="port-query-to-visualization"></a>

To visualize your data, you can port a query to a visualization program such as Amazon QuickSight or to other open-source visualization tools such as Cytoscape, yEd, or Gephi. Use these tools to render network diagrams, summary charts, and other graphical representations. When you use this method, you connect to Athena through the visualization program so that it can access your collected data as a source to produce the visualization.

**To visualize your Amazon Athena data using QuickSight**

1. Sign in to [Amazon QuickSight](https://aws.amazon.com/quicksight/).

1. Choose **Connect to another data source or upload a file**.

1. Choose **Athena**. The **New Athena data source** dialog box displays.

1. Enter a name in the **Data source name** field.

1. Choose **Create data source**.

1. Select the **Agents-servers-os** table in the **Choose your table** dialog box and choose **Select**.

1. In the **Finish dataset creation** dialog box, select **Import to SPICE for quicker analytics**, and choose **Visualize**.

   Your visualization is rendered.

# Using predefined queries in Amazon Athena
<a name="predefined-queries"></a>

This section contains a set of predefined queries that perform typical use cases, such as TCO analysis and network visualization. You can use these queries as is or modify them to suit your needs.

**To use a predefined query**

1. In the AWS Migration Hub console, choose **Servers** in the navigation pane.

1. To open the Amazon Athena console, choose **Explore data in Amazon Athena**. 

1. On the **Query Editor** page, in the navigation pane under **Database**, make sure that **application\_discovery\_service\_database** is selected.

1. Choose the plus (**\+**) sign in the Query Editor to create a tab for a new query.

1. Copy one of the queries from [Predefined queries](#pq-query-examples).

1. Paste the query into the query pane of the new query tab you just created.

1. Choose **Run Query**.

## Predefined queries
<a name="pq-query-examples"></a>

Choose a title to see information about the query.

### Obtain IP addresses and hostnames for servers
<a name="pq-helper-function"></a>

This view helper function retrieves IP addresses and hostnames for a given server. You can use this view in other queries. For information about how to create a view, see [CREATE VIEW](https://docs.aws.amazon.com/athena/latest/ug/create-view.html) in the *Amazon Athena User Guide*.

```
CREATE OR REPLACE VIEW hostname_ip_helper AS 
SELECT DISTINCT
  "os"."host_name"
, "nic"."agent_id"
, "nic"."ip_address"
FROM
  os_info_agent os
, network_interface_agent nic
WHERE ("os"."agent_id" = "nic"."agent_id");
```
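
After you create the view, you can use it to look up the hostname for a specific address. The IP address in the following sketch is a placeholder; substitute an address from your own environment.

```
SELECT host_name, agent_id
FROM hostname_ip_helper
WHERE ip_address = '10.0.0.5';
```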

### Identify servers with or without agents
<a name="pq-agents-installed-or-not"></a>

This query can help you perform data validation. If you've deployed agents on a number of servers in your network, you can use this query to determine whether there are other servers in your network without agents deployed on them. In this query, we look at the inbound and outbound network traffic, and filter the traffic for private IP addresses only (that is, IP addresses starting with `192`, `10`, or `172`).

```
SELECT DISTINCT "destination_ip" "IP Address" ,
         (CASE
    WHEN (
    (SELECT "count"(*)
    FROM network_interface_agent
    WHERE ("ip_address" = "destination_ip") ) = 0) THEN
        'no'
        WHEN (
        (SELECT "count"(*)
        FROM network_interface_agent
        WHERE ("ip_address" = "destination_ip") ) > 0) THEN
            'yes' END) "agent_running"
    FROM outbound_connection_agent
WHERE ((("destination_ip" LIKE '192.%')
        OR ("destination_ip" LIKE '10.%'))
        OR ("destination_ip" LIKE '172.%'))
UNION
SELECT DISTINCT "source_ip" "IP ADDRESS" ,
         (CASE
    WHEN (
    (SELECT "count"(*)
    FROM network_interface_agent
    WHERE ("ip_address" = "source_ip") ) = 0) THEN
        'no'
        WHEN (
        (SELECT "count"(*)
        FROM network_interface_agent
        WHERE ("ip_address" = "source_ip") ) > 0) THEN
            'yes' END) "agent_running"
    FROM inbound_connection_agent
WHERE ((("source_ip" LIKE '192.%')
        OR ("source_ip" LIKE '10.%'))
        OR ("source_ip" LIKE '172.%'));
```
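
If you're interested only in the addresses that appear to have no agent, a more direct variant filters with `NOT IN` against the `network_interface_agent` table. The following sketch covers the outbound side only:

```
SELECT DISTINCT "destination_ip" "IP Address"
FROM outbound_connection_agent
WHERE ("destination_ip" LIKE '192.%'
        OR "destination_ip" LIKE '10.%'
        OR "destination_ip" LIKE '172.%')
    AND "destination_ip" NOT IN
    (SELECT "ip_address" FROM network_interface_agent);
```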

### Analyze system performance data for servers with agents
<a name="pq-agents-server-performance"></a>

You can use this query to analyze system performance and utilization pattern data for your on-premises servers that have agents installed on them. The query combines the `sys_performance_agent` table with the `os_info_agent` table to identify the hostname for each server. This query returns the time-series utilization data (in 15-minute intervals) for all the servers where agents are running.

```
SELECT "OS"."os_name" "OS Name" ,
    "OS"."os_version" "OS Version" ,
    "OS"."host_name" "Host Name" ,
     "SP"."agent_id" ,
     "SP"."total_num_cores" "Number of Cores" ,
     "SP"."total_num_cpus" "Number of CPU" ,
     "SP"."total_cpu_usage_pct" "CPU Percentage" ,
     "SP"."total_disk_size_in_gb" "Total Storage (GB)" ,
     "SP"."total_disk_free_size_in_gb" "Free Storage (GB)" ,
     ("SP"."total_disk_size_in_gb" - "SP"."total_disk_free_size_in_gb") "Used Storage" ,
     "SP"."total_ram_in_mb" "Total RAM (MB)" ,
     ("SP"."total_ram_in_mb" - "SP"."free_ram_in_mb") "Used RAM (MB)" ,
     "SP"."free_ram_in_mb" "Free RAM (MB)" ,
     "SP"."total_disk_read_ops_per_sec" "Disk Read IOPS" ,
     "SP"."total_disk_bytes_written_per_sec_in_kbps" "Disk Write IOPS" ,
     "SP"."total_network_bytes_read_per_sec_in_kbps" "Network Reads (kbps)" ,
     "SP"."total_network_bytes_written_per_sec_in_kbps" "Network Write (kbps)"
FROM "sys_performance_agent" "SP" , "OS_INFO_agent" "OS"
WHERE ("SP"."agent_id" = "OS"."agent_id") limit 10;
```
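
Because the table contains time-series rows, you might prefer aggregated utilization over raw samples. The following sketch reports average and peak CPU utilization per host across all collected intervals:

```
SELECT "OS"."host_name" "Host Name",
       avg("SP"."total_cpu_usage_pct") "Average CPU %",
       max("SP"."total_cpu_usage_pct") "Peak CPU %"
FROM "sys_performance_agent" "SP", "os_info_agent" "OS"
WHERE ("SP"."agent_id" = "OS"."agent_id")
GROUP BY "OS"."host_name";
```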

### Track outbound communication between servers based on port number and process details
<a name="pq-analyze-outbound-connections"></a>

This query gets the details on the outbound traffic for each service, along with the port number and process details. 

Before running the query, if you have not already done so, you must create the `iana_service_ports_import` table that contains the IANA port registry database downloaded from IANA. For information about how to create this table, see [Creating the IANA port registry import table](#pq-create-iana-import-table).

After the `iana_service_ports_import` table is created, create two view helper functions for tracking outbound traffic. For information about how to create a view, see [CREATE VIEW](https://docs.aws.amazon.com/athena/latest/ug/create-view.html) in the *Amazon Athena User Guide*. 

**To create outbound tracking helper functions**

1. Open the Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home).

1. Create the `valid_outbound_ips_helper` view, using the following helper function that lists all distinct outbound destination IP addresses.

   ```
   CREATE OR REPLACE VIEW valid_outbound_ips_helper AS 
   SELECT DISTINCT "destination_ip"
   FROM outbound_connection_agent;
   ```

1. Create the `outbound_query_helper` view, using the following helper function that determines the frequency of communication for outbound traffic.

   ```
   CREATE OR REPLACE VIEW outbound_query_helper AS
   SELECT "agent_id" ,
            "source_ip" ,
            "destination_ip" ,
            "destination_port" ,
            "agent_assigned_process_id" ,
            "count"(*) "frequency"
   FROM outbound_connection_agent
   WHERE (("ip_version" = 'IPv4')
           AND ("destination_ip" IN 
       (SELECT *
       FROM valid_outbound_ips_helper )))
   GROUP BY  "agent_id", "source_ip", "destination_ip", "destination_port", "agent_assigned_process_id";
   ```

1. After you create the `iana_service_ports_import` table and your two helper functions, you can run the following query to get the details on the outbound traffic for each service, along with the port number and process details.

   ```
   SELECT hip1.host_name "Source Host Name",
            outbound_connections_results0.source_ip "Source IP Address",
            hip2.host_name "Destination Host Name",
            outbound_connections_results0.destination_ip "Destination IP Address",
            outbound_connections_results0.frequency "Connection Frequency",
            outbound_connections_results0.destination_port "Destination Communication Port",
            outbound_connections_results0.servicename "Process Service Name",
            outbound_connections_results0.description "Process Service Description"
   FROM 
       (SELECT DISTINCT o.source_ip,
            o.destination_ip,
            o.frequency,
            o.destination_port,
            ianap.servicename,
            ianap.description
       FROM outbound_query_helper o, iana_service_ports_import ianap
       WHERE o.destination_port = TRY_CAST(ianap.portnumber AS integer)) AS outbound_connections_results0 LEFT OUTER
   JOIN hostname_ip_helper hip1
       ON outbound_connections_results0.source_ip = hip1.ip_address LEFT OUTER
   JOIN hostname_ip_helper hip2
       ON outbound_connections_results0.destination_ip = hip2.ip_address
   ```

### Track inbound communication between servers based on port number and process details
<a name="pq-analyze-inbound-connections"></a>

This query gets information about inbound traffic for each service, along with the port number and process details.

Before running this query, if you have not already done so, you must create the `iana_service_ports_import` table that contains the IANA port registry database downloaded from IANA. For information about how to create this table, see [Creating the IANA port registry import table](#pq-create-iana-import-table).

After the `iana_service_ports_import` table is created, create two view helper functions for tracking inbound traffic. For information about how to create a view, see [CREATE VIEW](https://docs.aws.amazon.com/athena/latest/ug/create-view.html) in the *Amazon Athena User Guide*. 

**To create import tracking helper functions**

1. Open the Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home).

1. Create the `valid_inbound_ips_helper` view, using the following helper function that lists all distinct inbound source IP addresses.

   ```
   CREATE OR REPLACE VIEW valid_inbound_ips_helper AS 
   SELECT DISTINCT "source_ip"
   FROM inbound_connection_agent;
   ```

1. Create the `inbound_query_helper` view, using the following helper function that determines the frequency of communication for inbound traffic.

   ```
   CREATE OR REPLACE VIEW inbound_query_helper AS 
   SELECT "agent_id" ,
            "source_ip" ,
            "destination_ip" ,
            "destination_port" ,
            "agent_assigned_process_id" ,
            "count"(*) "frequency"
   FROM inbound_connection_agent
   WHERE (("ip_version" = 'IPv4')
           AND ("source_ip" IN 
       (SELECT *
       FROM valid_inbound_ips_helper )))
   GROUP BY  "agent_id", "source_ip", "destination_ip", "destination_port", "agent_assigned_process_id";
   ```

1. After you create the `iana_service_ports_import` table and your two helper functions, you can run the following query to get the details on the inbound traffic for each service, along with the port number and process details.

   ```
   SELECT hip1.host_name "Source Host Name",
            inbound_connections_results0.source_ip "Source IP Address",
            hip2.host_name "Destination Host Name",
            inbound_connections_results0.destination_ip "Destination IP Address",
            inbound_connections_results0.frequency "Connection Frequency",
            inbound_connections_results0.destination_port "Destination Communication Port",
            inbound_connections_results0.servicename "Process Service Name",
            inbound_connections_results0.description "Process Service Description"
   FROM 
       (SELECT DISTINCT i.source_ip,
            i.destination_ip,
            i.frequency,
            i.destination_port,
            ianap.servicename,
            ianap.description
       FROM inbound_query_helper i, iana_service_ports_import ianap
       WHERE i.destination_port = TRY_CAST(ianap.portnumber AS integer)) AS inbound_connections_results0 LEFT OUTER
   JOIN hostname_ip_helper hip1
       ON inbound_connections_results0.source_ip = hip1.ip_address LEFT OUTER
   JOIN hostname_ip_helper hip2
       ON inbound_connections_results0.destination_ip = hip2.ip_address
   ```

### Identify running software from port number
<a name="pq-identify-software"></a>

This query identifies the running software based on port numbers.

Before running this query, if you have not already done so, you must create the `iana_service_ports_import` table that contains the IANA port registry database downloaded from IANA. For information about how to create this table, see [Creating the IANA port registry import table](#pq-create-iana-import-table).

Run the following query to identify the running software based on port numbers.

```
SELECT o.host_name "Host Name",
       ianap.servicename "Service",
       ianap.description "Description",
       con.destination_port,
       con.cnt_dest_port "Destination Port Count"
FROM   (SELECT agent_id,
               destination_ip,
               destination_port,
               Count(destination_port) cnt_dest_port
        FROM   inbound_connection_agent
        GROUP  BY agent_id,
                  destination_ip,
                  destination_port) con,
       (SELECT agent_id,
               host_name,
               Max("timestamp")
        FROM   os_info_agent
        GROUP  BY agent_id,
                  host_name) o,
       iana_service_ports_import ianap
WHERE  ianap.transportprotocol = 'tcp'
       AND con.destination_ip NOT LIKE '172%'
       AND con.destination_port = ianap.portnumber
       AND con.agent_id = o.agent_id
ORDER BY cnt_dest_port DESC;
```

## Creating the IANA port registry import table
<a name="pq-create-iana-import-table"></a>

Some of the predefined queries require a table named `iana_service_ports_import` that contains information downloaded from Internet Assigned Numbers Authority (IANA).

**To create the iana\_service\_ports\_import table**

1. Download the IANA port registry database **CSV** file from [Service Name and Transport Protocol Port Number Registry](https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml) on *iana.org*.

1. Upload the file to Amazon S3. For more information, see [How Do I Upload Files and Folders to an S3 Bucket?](https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html).

1. Create a new table in Athena named `iana_service_ports_import`. For instructions, see [Create a Table](https://docs.aws.amazon.com/athena/latest/ug/getting-started.html#step-2-create-a-table) in the *Amazon Athena User Guide*. In the following example, you need to replace `my_bucket_name` with the name of the S3 bucket that you uploaded the CSV file to in the previous step.

   ```
   CREATE EXTERNAL TABLE IF NOT EXISTS iana_service_ports_import (
            ServiceName STRING,
            PortNumber INT,
            TransportProtocol STRING,
            Description STRING,
            Assignee STRING,
            Contact STRING,
            RegistrationDate STRING,
            ModificationDate STRING,
            Reference STRING,
            ServiceCode STRING,
            UnauthorizedUseReported STRING,
            AssignmentNotes STRING
   )
   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   WITH SERDEPROPERTIES (
     'serialization.format' = ',',
     'quoteChar' = '"',
     'field.delim' = ','
   ) LOCATION 's3://my_bucket_name/'
   TBLPROPERTIES ('has_encrypted_data'='false', 'skip.header.line.count'='1');
   ```
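
   After the table is created, you can run a quick sanity check to confirm that the CSV file loaded as expected. For example:

   ```
   SELECT servicename, portnumber, transportprotocol, description
   FROM iana_service_ports_import
   WHERE transportprotocol = 'tcp'
   LIMIT 10;
   ```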