Data collection - AWS Transform

Data collection

Discovery tool collection schedule

After your initial discovery collection, the discovery tool continues to run on this schedule:

  • VMware discovery – every hour

  • Hyper-V discovery – every hour

The discovery tool also collects OS metrics through the following independent modules, each with its own schedule:

  • Database discovery – once a day

  • Network metrics – every 15 seconds; collection might be less frequent in large environments

  • Server performance metrics – every 10 minutes

  • Storage performance metrics – every 10 minutes

  • Server provisioning data – daily

  • Storage provisioning data – daily

  • Network interfaces – daily

  • Running processes – hourly
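
To get a feel for how much data each schedule produces, the intervals above can be written down as a small lookup table. This is only an illustrative sketch: the dictionary keys and the runs_per_window helper are hypothetical names, not identifiers used by the discovery tool.

```python
from datetime import timedelta

# Intervals copied from the schedule above. The keys are illustrative
# labels, not names used by the discovery tool.
COLLECTION_INTERVALS = {
    "vmware_discovery": timedelta(hours=1),
    "hyperv_discovery": timedelta(hours=1),
    "database_discovery": timedelta(days=1),
    "network_metrics": timedelta(seconds=15),  # may be longer in large environments
    "server_performance_metrics": timedelta(minutes=10),
    "storage_performance_metrics": timedelta(minutes=10),
    "server_provisioning_data": timedelta(days=1),
    "storage_provisioning_data": timedelta(days=1),
    "network_interfaces": timedelta(days=1),
    "running_processes": timedelta(hours=1),
}


def runs_per_window(interval: timedelta, window: timedelta = timedelta(days=1)) -> int:
    """Return how many collections fit in the window at a fixed interval."""
    return window // interval


for module, interval in COLLECTION_INTERVALS.items():
    print(f"{module}: {runs_per_window(interval)} runs per day")
```

At the nominal intervals, a 10-minute module runs 144 times a day and the network module 5,760 times a day.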

You can start or stop each OS metrics module independently, or trigger an immediate collection by using Collect data now.

To control a module manually, choose one of the following from the Actions menu:

  • Start – Enables the discovery module.

  • Stop – Disables the discovery module.

  • Collect data now – Starts discovery immediately. Use this option, for example, after you make a change in your network.

These actions apply per module. You can control OS metrics modules individually.

OS data collection attempts

When a new server is discovered, the discovery tool attempts each configured credential for each IP address and the hostname. After the discovery tool finds a valid credential, it continues to use that credential unless you add a new credential.

After a collection failure, the discovery tool retries the collection of networking data for that server after 3 minutes, 30 minutes, 2 hours, and then 6 hours. After 4 failed attempts, the discovery tool continues to try all configured credentials once every 6 hours.
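
The retry pattern described above can be modeled as four stepped delays followed by a fixed 6-hour delay. The sketch below is not the discovery tool's implementation. In particular, the text does not say whether each delay is measured from the original failure or from the previous attempt, so the code assumes the previous attempt.

```python
from datetime import timedelta
from itertools import chain, islice, repeat

# Delays taken from the text above. Assumption: each delay is the wait
# after the previous failed attempt, not the time since the first failure.
INITIAL_RETRY_DELAYS = [
    timedelta(minutes=3),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=6),
]
STEADY_STATE_DELAY = timedelta(hours=6)


def retry_delays():
    """Return an endless iterator: four stepped delays, then 6 hours repeatedly."""
    return chain(INITIAL_RETRY_DELAYS, repeat(STEADY_STATE_DELAY))


# Show the waits before the first six retries.
first_six = list(islice(retry_delays(), 6))
print([str(delay) for delay in first_six])
```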

Discovered inventory

After you configure a discovery source, the Number of discovered servers value in the Discovery tool status frame begins to increment. The discovery status for the configured source changes to Enabled in the Collection module frame. The inventory page shows servers from all configured sources: VMware VMs, Hyper-V VMs, and imported bare metal servers. Each server shows its source and collection status per module.

Navigate to the Discovered inventory page to see the servers that the discovery tool has found. From this page, choose Download inventory to download a ZIP file (discovery_tool_export.zip) that contains up to 28 days of collected data, including MPA files for all configured sources, performance utilization data, database information, and server-to-server communication information.

You can download the ZIP file while the discovery tool is still collecting data. In that case, the file contains partial results. Upload this file to Migration assessment to obtain a business case for migration.
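
Before uploading, it can be useful to look inside the export. The following sketch lists every CSV file in the archive and loads its rows by using only the Python standard library. The internal layout of discovery_tool_export.zip and the UTF-8 encoding are assumptions, and read_csv_members is a hypothetical helper name.

```python
import csv
import io
import zipfile


def read_csv_members(zip_source):
    """Return {member name: list of row dicts} for every CSV file in the archive.

    zip_source can be a file path or a binary file-like object.
    """
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip members that are not CSV files
            with archive.open(name) as raw:
                # Assumption: the CSV files are UTF-8 encoded (a BOM is tolerated).
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                tables[name] = list(csv.DictReader(text))
    return tables


# Usage, with the file name given above:
# tables = read_csv_members("discovery_tool_export.zip")
# for name, rows in tables.items():
#     print(name, len(rows))
```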

Data points collected

The discovery tool gathers comprehensive data across VMware, Hyper-V, OS metrics, database, and network components. The following sections detail the specific data points collected for each component.

VMware data collection

This table describes the VMware virtual machine information collected by the discovery tool:

Name Type Category Sample Value
vm_name String VM Info "w2k22-snmpd-v2-en-us-mssql-2022-testcase4-1"
vm_id String VM Info "vm-30920"
vm_uuid String VM Info "4201ecf8-cc44-ee7e-01da-34dfb2acf6c0"
powerstate String VM Info "poweredOn"
host String VM Info "esxi-70-node1.testlab.local"
primary_ip_address String VM Info "192.168.0.52"
cpus Integer VM Info 2
memory Integer VM Info 4096
total_disk_capacity_mib Integer VM Info 32768
os_according_to_the_configuration_file String VM Info "Microsoft Windows Server 2016 or later (64-bit)"
max_cpu_usage_pct_dec Float VM Performance 79.33
avg_cpu_usage_pct_dec Float VM Performance 45.06
max_ram_usage_pct_dec Float VM Performance 63.99
avg_ram_utl_pct_dec Float VM Performance 29.27

Hyper-V data collection

This table describes the Hyper-V virtual machine information collected by the discovery tool:

Name Type Category Sample Value
vm_name String VM Info "win2022-hyperv-test-01"
vm_id String VM Info "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
powerstate String VM Info "Running"
cpus Integer VM Info 4
memory_mb Integer VM Info 8192
disk_paths String Disk "C:\\VMs\\disk1.vhdx"
disk_size_gb Float Disk 127.0
network_adapters String Network "00:15:5D:01:02:03"
ip_addresses String Network "10.0.1.50"
host_name String Host "hyperv-host-01.example.com"
host_os_version String Host "Windows Server 2022 Datacenter"
cluster_name String Host "FailoverCluster01"
hypervisor String VM Info "Hyper-V"

Bare metal data

Bare metal servers are not auto-discovered. They are imported through a CSV file. The discovery tool does not collect hypervisor-level data for bare metal servers. Instead, it collects database, network, and OS metrics data by using the OS credentials associated with each server during import.

Discovery tool's OS-related data

OS metrics data collection

The discovery tool collects OS-level metrics from servers through SSH (Linux) and WinRM (Windows). Data is collected across six sub-modules and exported into six CSV files.

Server inventory (server_inventory.csv)

Combines server provisioning (hardware and OS configuration) with aggregated storage performance. Collected every 24 hours.

Name Type Category Sample Value
server_id String Server Info "vm-web-server-01"
server_name String Server Info "web-server-01"
resource_type String Server Info "virtual_machine"
power_state String Server Info "Running"
os_type String Server Info "Linux"
os_name String Server Info "Amazon Linux"
os_version String Server Info "2023"
primary_hostname String Server Info "web-server-01.example.com"
primary_ip_address String Server Info "10.0.2.101"
netmask String Server Info "255.255.255.0"
total_num_network_cards Integer Server Info 2
total_num_disks Integer Server Info 1
cpu_count Integer Server Info 4
total_memory_gb Float Server Info 15.88
server_uuid String Server Info "4201ecf8-cc44-ee7e-01da-34dfb2acf6c0"
smbios_uuid String Server Info "4201ecf8-cc44-ee7e-01da-34dfb2acf6c0"
cluster_name String Server Info "production-cluster-01"
hypervisor_object_id String Server Info "vm-30920"
hypervisor_type String Server Info "VMware"
hypervisor_version String Server Info "8.0.0"
hypervisor_hostname String Server Info "esxi-node1.example.com"
hypervisor_host_id String Server Info "host-1234"
hypervisor_id String Server Info "4201ecf8-cc44-ee7e-01da-34dfb2acf6c0"
disk_read_iops_avg Float Storage Performance 12.5
disk_read_iops_peak Float Storage Performance 245.0
disk_write_iops_avg Float Storage Performance 8.3
disk_write_iops_peak Float Storage Performance 180.0
disk_total_iops_avg Float Storage Performance 20.8
disk_total_iops_peak Float Storage Performance 425.0
disk_read_throughput_avg_mbps Float Storage Performance 1.2
disk_read_throughput_peak_mbps Float Storage Performance 24.5
disk_write_throughput_avg_mbps Float Storage Performance 0.8
disk_write_throughput_peak_mbps Float Storage Performance 18.0
disk_total_throughput_avg_mbps Float Storage Performance 2.0
disk_total_throughput_peak_mbps Float Storage Performance 42.5
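
In the sample values above, each total average equals the read average plus the write average (for example, 12.5 + 8.3 = 20.8 IOPS). If you post-process the export, a quick sanity check along these lines can catch rows that were damaged in transit. This is an observation about the sample values only, not a documented guarantee, and totals_match is a hypothetical helper. Peaks are deliberately not checked, because a peak total need not equal the sum of separate read and write peaks in general.

```python
import math

# Sample values copied from the table above.
sample_row = {
    "disk_read_iops_avg": 12.5,
    "disk_write_iops_avg": 8.3,
    "disk_total_iops_avg": 20.8,
    "disk_read_throughput_avg_mbps": 1.2,
    "disk_write_throughput_avg_mbps": 0.8,
    "disk_total_throughput_avg_mbps": 2.0,
}

# (read column, write column, total column) triples to compare.
AVERAGE_TRIPLES = [
    ("disk_read_iops_avg", "disk_write_iops_avg", "disk_total_iops_avg"),
    (
        "disk_read_throughput_avg_mbps",
        "disk_write_throughput_avg_mbps",
        "disk_total_throughput_avg_mbps",
    ),
]


def totals_match(row, tolerance=0.05):
    """Return True if every total average is read + write within the tolerance."""
    return all(
        math.isclose(row[read] + row[write], row[total], abs_tol=tolerance)
        for read, write, total in AVERAGE_TRIPLES
    )


print(totals_match(sample_row))
```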

Server performance metrics (server_performance_metrics.csv)

CPU, memory, and network throughput utilization. Sampled every 10 minutes, aggregated over 28 days.

Name Type Category Sample Value
server_id String Server Info "vm-web-server-01"
data_source String Server Info "OS"
cpu_utilization_avg_pct Float CPU 45.06
cpu_utilization_peak_pct Float CPU 79.33
cpu_count Integer CPU 4
memory_total_gb Float Memory 15.88
memory_utilization_avg_pct Float Memory 29.27
memory_utilization_peak_pct Float Memory 63.99
network_in_avg_mbps Float Network 0.52
network_in_peak_mbps Float Network 12.3
network_out_avg_mbps Float Network 0.31
network_out_peak_mbps Float Network 8.7
network_total_avg_mbps Float Network 0.83
network_total_peak_mbps Float Network 21.0
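
The avg and peak columns above summarize many individual samples. One sample every 10 minutes over 28 days is 28 × 24 × 6 = 4,032 samples per metric. The sketch below shows one plausible reduction, a plain arithmetic mean and a maximum. Whether the discovery tool aggregates exactly this way is an assumption, and the sample readings are made up for illustration.

```python
from statistics import fmean

SAMPLES_PER_28_DAYS = 28 * 24 * 6  # one sample every 10 minutes


def aggregate(samples):
    """Reduce utilization samples to (average, peak), rounded to 2 decimals.

    Assumption: average is the arithmetic mean and peak is the maximum.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    return round(fmean(samples), 2), round(max(samples), 2)


cpu_samples = [40.0, 45.0, 50.0, 79.33]  # hypothetical CPU readings in percent
average, peak = aggregate(cpu_samples)
print(f"avg={average} peak={peak} (a full window would hold {SAMPLES_PER_28_DAYS} samples)")
```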

Storage performance (server_storage_performance.csv)

Per-volume disk I/O and space utilization. Sampled every 10 minutes, aggregated over 28 days.

Name Type Category Sample Value
server_id String Server Info "vm-web-server-01"
data_source String Server Info "OS"
disk_volume_id String Volume Info "/dev/nvme0n1p1"
disk_mount_point String Volume Info "/"
file_system String Volume Info "xfs"
disk_total_gb Float Disk Space 30.0
disk_used_gb Float Disk Space 12.5
disk_free_gb Float Disk Space 17.5
disk_read_iops_avg Float Disk I/O 12.5
disk_read_iops_peak Float Disk I/O 245.0
disk_write_iops_avg Float Disk I/O 8.3
disk_write_iops_peak Float Disk I/O 180.0
disk_total_iops_avg Float Disk I/O 20.8
disk_total_iops_peak Float Disk I/O 425.0
disk_read_throughput_avg_mbps Float Disk Throughput 1.2
disk_read_throughput_peak_mbps Float Disk Throughput 24.5
disk_write_throughput_avg_mbps Float Disk Throughput 0.8
disk_write_throughput_peak_mbps Float Disk Throughput 18.0
disk_total_throughput_avg_mbps Float Disk Throughput 2.0
disk_total_throughput_peak_mbps Float Disk Throughput 42.5
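
The disk space columns above are enough to derive a per-volume utilization percentage, which the export does not appear to provide directly. A minimal sketch follows. volume_usage_pct is a hypothetical helper, and it assumes the values arrive as strings, as they would from a CSV reader.

```python
def volume_usage_pct(row):
    """Return used space as a percentage of total, or None if total is not positive."""
    total_gb = float(row["disk_total_gb"])
    used_gb = float(row["disk_used_gb"])
    if total_gb <= 0:
        return None  # avoid dividing by zero for empty or unreported volumes
    return round(used_gb / total_gb * 100, 1)


# Values from the sample row above: 12.5 GB used of 30.0 GB.
sample_volume = {"disk_total_gb": "30.0", "disk_used_gb": "12.5", "disk_free_gb": "17.5"}
print(volume_usage_pct(sample_volume))  # prints 41.7
```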

Storage configuration (storage_config.csv)

Physical disk hardware details. Collected every 24 hours.

Name Type Category Sample Value
server_id String Server Info "vm-web-server-01"
disk_controller_id String Disk Info "/dev/sda"
vmdk_vhd_file_name String Disk Info "web-server-01.vmdk"
disk_volume_type String Disk Info "Virtual"
disk_provisioned_gb Float Disk Info 30.0
disk_device_type String Disk Info "SCSI HDD"
disk_interface_type String Disk Info "SCSI"
disk_protocol String Disk Info "LSI Logic SAS"

Network interfaces (network_interfaces.csv)

Network adapter configuration. Collected every 24 hours.

Name Type Category Sample Value
server_id String Server Info "vm-web-server-01"
interface_name String Interface Info "eth0"
interface_index Integer Interface Info 2
mac_address String Interface Info "0A:1B:2C:3D:4E:5F"
adapter_type String Interface Info "vmxnet3"
virtual_network_name String Interface Info "VM Network"
virtual_network_id String Interface Info "dvportgroup-1234"
virtual_switch String Interface Info "vSwitch0"
ipv4_address String IP Config "10.0.2.101"
ipv4_subnet_mask String IP Config "255.255.255.0"
ipv4_gateway String IP Config "10.0.2.1"
ipv6_address String IP Config "fe80::a1b:2cff:fe3d:4e5f"
ipv6_prefix_length Integer IP Config 64
ipv6_gateway String IP Config "fe80::1"
dns_servers String IP Config "10.0.0.2"
dhcp_enabled Boolean IP Config false
interface_status String Interface Info "Up"
vlan_id Integer Interface Info 100
is_primary Boolean Interface Info true
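
When you review the exported interface data, the IPv4 fields above can be cross-checked with Python's standard ipaddress module, for example to derive the subnet or to confirm that the gateway sits on it. This is a sketch for post-processing the export, not part of the discovery tool, and the helper names are hypothetical.

```python
import ipaddress


def subnet_of(address, netmask):
    """Return the IPv4 network that contains the address, given a dotted netmask."""
    # strict=False lets us pass a host address instead of the network address.
    return ipaddress.ip_network(f"{address}/{netmask}", strict=False)


def gateway_on_subnet(address, netmask, gateway):
    """Return True if the gateway lies in the same subnet as the interface."""
    return ipaddress.ip_address(gateway) in subnet_of(address, netmask)


# Values from the sample row above.
print(subnet_of("10.0.2.101", "255.255.255.0"))  # prints 10.0.2.0/24
print(gateway_on_subnet("10.0.2.101", "255.255.255.0", "10.0.2.1"))  # prints True
```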

Running processes (process_metrics.csv)

Snapshot of running processes. Collected every hour, deduplicated over 28 days.

Name Type Category Sample Value
server_id String Server Info "vm-web-server-01"
process_name String Process Info "sshd"
process_id Integer Process Info 1234
process_command_line String Process Info "/usr/sbin/sshd -D"
process_user String Process Info "root"
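
Hourly snapshots repeat most long-running processes, which is why the export is deduplicated. The sketch below shows the general idea. The deduplication key is an assumption: it leaves out process_id on the theory that a restarted service gets a new ID but is still the same workload. The discovery tool's actual key is not documented here.

```python
def deduplicate(snapshots):
    """Keep the first occurrence of each distinct process per server.

    Assumption: a process is identified by server, name, command line,
    and user. process_id is ignored because it changes across restarts.
    """
    seen = set()
    unique = []
    for row in snapshots:
        key = (
            row["server_id"],
            row["process_name"],
            row["process_command_line"],
            row["process_user"],
        )
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


sshd = {
    "server_id": "vm-web-server-01",
    "process_name": "sshd",
    "process_id": 1234,
    "process_command_line": "/usr/sbin/sshd -D",
    "process_user": "root",
}
# The same daemon seen in two hourly snapshots, the second after a restart.
snapshots = [sshd, dict(sshd, process_id=5678)]
print(len(deduplicate(snapshots)))  # prints 1
```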

Network collection

The Network collection module helps you discover dependencies among servers in your on-premises data center. This network data accelerates your migration planning by providing visibility into how applications communicate across servers.

This module collects network data for servers from all configured sources, including VMware, Hyper-V, and bare metal. It uses WinRM to collect data from Windows servers and uses SSH, SNMPv2, and SNMPv3 to collect data from Linux servers.

Network data collection

The Network collection module captures TCP IPv4 connections in ESTABLISHED or TIME_WAIT state. These data points are collected:

  • Source IP, port, process ID, and process name

  • Target IP, port, process ID, and process name

  • State (ESTABLISHED or TIME_WAIT)

  • Transport protocol (TCP)

  • IP version (IPv4)

  • Count (number of times this unique connection was observed)
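
The Count data point implies that repeated observations of the same connection are folded into a single record. A minimal sketch of that folding follows. The dictionary keys and the choice of what makes a connection "unique" (source IP, target IP, target port, and state, ignoring the short-lived source port) are assumptions made for illustration.

```python
from collections import Counter

TRACKED_STATES = {"ESTABLISHED", "TIME_WAIT"}


def count_connections(observations):
    """Count how often each unique TCP IPv4 connection was observed.

    Assumption: uniqueness is (source IP, target IP, target port, state).
    """
    counts = Counter()
    for obs in observations:
        if obs["protocol"] != "TCP" or obs["ip_version"] != 4:
            continue  # only TCP over IPv4 is captured
        if obs["state"] not in TRACKED_STATES:
            continue  # for example, LISTEN sockets are ignored
        key = (obs["source_ip"], obs["target_ip"], obs["target_port"], obs["state"])
        counts[key] += 1
    return counts


web_to_db = {
    "source_ip": "10.0.2.101", "source_port": 50432,
    "target_ip": "10.0.2.50", "target_port": 1433,
    "state": "ESTABLISHED", "protocol": "TCP", "ip_version": 4,
}
observations = [
    web_to_db,
    dict(web_to_db, source_port=50433),  # same dependency, new source port
    dict(web_to_db, state="LISTEN"),     # filtered out
]
for key, count in count_connections(observations).items():
    print(key, count)
```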

Database collection

The Database collection module gathers database (SQL Server) information from Windows servers across all configured sources, including VMware, Hyper-V, and bare metal. The module uses the WinRM protocol to remotely connect to each Windows server and run PowerShell queries to get information about all installed SQL Server services (components) on the server by using WMI namespaces, registry, and file properties.

A SQL Server component is a specific service or feature instance installed as part of a SQL Server deployment on a Windows server. The discovery tool collects Database Engine, Analysis Services, Reporting Services, and Integration Services.

Database data collection

The Database collection module gathers SQL Server component information. This table describes key database data points collected:

Name Type Category Sample Value
Engine Type String Component sql_server
Is Engine Component Boolean Component Y
Status String Service Running, Stopped, StartPending
Version String Service 2015.131.5026.0
Edition String Service Developer Edition (64-bit)
SQL Service Name String Service MsDtsServer130, Mssql
SQL Service Type String Service SQL Server service, Integration Services service
Instance Name String Instance MSSQLSERVER
Display Name String Service SQL Server (MSSQLSERVER2017)
Start Mode String Service Automatic, Manual, Disabled
Service Account Name String Service NT Service/MsDtsServer130
Is Clustered Boolean Configuration N
Note

Full format includes all service types. MPA format includes only database engine components. Some fields might be unavailable, depending on the SQL service type and configuration.
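
The difference between the two formats can be pictured as a filter on the Is Engine Component field. The sketch below assumes the rows are dictionaries keyed by the names in the table above and that the flag is "Y" or "N", as in the sample values. engine_components_only is a hypothetical helper, not part of the discovery tool.

```python
def engine_components_only(components):
    """Return only the rows flagged as database engine components."""
    return [row for row in components if row.get("Is Engine Component") == "Y"]


# Hypothetical rows in the full format, using sample values from the table.
full_format = [
    {
        "Engine Type": "sql_server",
        "Is Engine Component": "Y",
        "SQL Service Type": "SQL Server service",
        "Instance Name": "MSSQLSERVER",
    },
    {
        "Engine Type": "sql_server",
        "Is Engine Component": "N",
        "SQL Service Type": "Integration Services service",
        "SQL Service Name": "MsDtsServer130",
    },
]

for row in engine_components_only(full_format):
    print(row["SQL Service Type"], row.get("Instance Name"))
```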