

# Log management
<a name="LogManagement"></a>

CloudWatch Logs provides advanced log management capabilities that help you organize, transform, and analyze your log data more effectively. These features include [cross-account and cross-region data centralization](CloudWatchLogs_Centralization.md), [automatic data discovery and schema management](data-source-discovery-management.md), [log transformation during ingestion](CloudWatch-Logs-Transformation.md), and [enhanced analytics with facets for interactive log exploration](CloudWatchLogs-Facets.md).

**Topics**
+ [Data source discovery and management](data-source-discovery-management.md)
+ [Features enabled by data sources](features-enabled-by-data-sources.md)
+ [Supported AWS services for data sources](supported-aws-services-data-sources.md)
+ [Supported third-party sources for data sources](supported-third-party-sources-data-sources.md)

# Data source discovery and management
<a name="data-source-discovery-management"></a>

CloudWatch Logs automatically discovers and categorizes your log data by data source and type, making it easier to understand and manage your logs at scale. This feature provides schema discovery for AWS vended sources such as Amazon VPC Flow Logs, CloudTrail, and Route 53, as well as third-party security tools.

The Logs Management console provides a high-level view of your logs organized by data source and type, rather than just log groups. This organization helps you:
+ View logs categorized by AWS services, third-party sources (such as Okta or CrowdStrike), and custom sources
+ Understand the schema and structure of your log data automatically
+ Create field index policies based on discovered schema fields
+ Manage logs more efficiently across different data sources
+ Query logs by different data sources

When you [enable CloudWatch Logs logging for supported AWS services](AWS-logs-and-resource-policy.md), CloudWatch Logs automatically applies the appropriate schema to your logs. This automatic schema application helps maintain consistency and provides immediate insights into your log structure.

## What is CloudWatch Logs Data Sources?
<a name="what-is-cloudwatch-data-sources"></a>

CloudWatch Logs Data Sources is a feature that provides a new way to organize and categorize your log data based on the source that generates the logs. While CloudWatch Logs traditionally uses log groups to organize logs, Data Sources offers an additional layer of organization that groups logs by their originating service and log type.

### How Data Sources work
<a name="how-data-sources-work"></a>

Data Sources provide service-based log organization and simplified discovery across your AWS infrastructure. You can easily locate logs from specific services and filter by log type without needing to know individual log group names or structures.

For third-party sources and optionally for application log sources, Data Sources work with CloudWatch pipelines to categorize your logs. When you configure a pipeline to ingest and transform logs, you specify the data source name and type. CloudWatch Logs then automatically categorizes all logs that the pipeline processes. For more information, see [CloudWatch pipelines](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch-pipelines.html) in the *Amazon CloudWatch User Guide*.

Data Sources categorize logs using two key identifiers:
+ **Data Source Name**: The AWS service, third-party source, or application that generates the logs (for example, Route 53, Amazon VPC, CloudTrail, Okta SSO, or CrowdStrike Falcon).
+ **Data Source Type**: The specific type of log generated by that service. 

A schema defines the structure of log data, including what fields are present and how information is organized. A single data source can produce multiple types of logs with different schemas and purposes. For example, the AWS CloudTrail data source has two types: management events (which track control plane operations like creating or deleting resources) and data events (which track data plane operations like S3 object access). Each type has a different schema because they capture different kinds of information.
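As a concrete illustration of the CloudTrail example, CloudTrail records carry a boolean `managementEvent` field that distinguishes control plane from data plane activity. The following sketch classifies records by that field; it is an illustrative helper, not part of the Data Sources API, and the sample records are hypothetical:

```python
def cloudtrail_event_type(record: dict) -> str:
    """Classify a CloudTrail record as a management or data event,
    assuming the standard `managementEvent` boolean is present."""
    return "management" if record.get("managementEvent", True) else "data"

# A control-plane call such as CreateBucket is a management event...
mgmt = {"eventName": "CreateBucket", "managementEvent": True}
# ...while an S3 object read such as GetObject is a data event.
data = {"eventName": "GetObject", "managementEvent": False}
```

Because the two event types capture different kinds of information, each maps to its own data source type with its own schema.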

## How to get started
<a name="how-to-get-started-data-sources"></a>

CloudWatch Logs categorizes your logs into data sources based on their origin. The method depends on the type of logs you're working with:

### AWS service logs
<a name="aws-service-logs"></a>

Logs from [supported AWS services](supported-aws-services-data-sources.md) are automatically grouped by data source without any configuration required. CloudWatch Logs recognizes these logs and applies the appropriate data source name and type based on the originating service.

### Third-party logs
<a name="third-party-logs"></a>

Third-party logs require pipelines for data source categorization. When you configure a pipeline to ingest logs from supported third-party sources such as Microsoft Office 365, Okta, CrowdStrike, or Palo Alto Networks, you specify the [data source name and type](supported-third-party-sources-data-sources.md) in the pipeline configuration. CloudWatch Logs automatically categorizes all logs that the pipeline processes using those identifiers. 

Pipelines can optionally transform third-party logs into Open Cybersecurity Schema Framework (OCSF) format for standardized security event analysis. When OCSF transformation is enabled, the data source name and type are automatically determined based on the OCSF schema mapping. Without OCSF transformation, you specify the data source name and type in the pipeline configuration.

### Application logs
<a name="application-logs"></a>

For custom application logs, you can categorize them by data source using one of these methods:
+ **Log group tags** - Add tags to your log groups using the keys `cw:datasource:name` and `cw:datasource:type` to specify the data source name and type, respectively, for all logs ingested into the log group. Tag values can be up to 64 characters and may contain only lowercase letters, numbers, and underscores. They must start with a letter or a number and may not contain double underscores (`__`).
+ **Pipeline configuration** - Configure data source information through log processing pipelines when ingesting your application logs.

**Note**  
Data source names cannot start with "aws" or "amazon" to avoid conflicts with AWS service logs.
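The tag value rules above can be captured in a small validator. This is an illustrative sketch only, assuming the constraints exactly as documented; it is not an AWS API:

```python
import re

# Tag values: up to 64 chars of lowercase letters, digits, and underscores,
# starting with a letter or digit.
_TAG_VALUE_RE = re.compile(r"^[a-z0-9][a-z0-9_]{0,63}$")

def is_valid_data_source_tag(value: str, is_name: bool = False) -> bool:
    """Check a cw:datasource:name or cw:datasource:type tag value
    against the documented rules (illustrative helper)."""
    if not _TAG_VALUE_RE.fullmatch(value):
        return False
    if "__" in value:  # double underscores are not allowed
        return False
    # Data source names may not start with "aws" or "amazon".
    if is_name and (value.startswith("aws") or value.startswith("amazon")):
        return False
    return True
```

For example, `payment_service` would be accepted as a name, while `aws_payment` or `payment__service` would be rejected.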

## System fields
<a name="system-fields"></a>

CloudWatch Logs automatically adds three system fields to logs that are categorized by data source. These fields serve as default facets:
+ `@data_source_name` - Contains the name of the data source, or "Unknown" if not determined
+ `@data_source_type` - Contains the type of the data source, or "Unknown" if not determined
+ `@data_format` - Indicates the format of the log data

When the data source name or type cannot be determined, these fields are set to "Unknown". Data sources with "Unknown" values are still visible in facets and in the data sources table under **Log Management** in the console, allowing you to identify uncategorized logs and the log groups they come from.

The `@data_format` field can contain one of the following values:
+ `Default` - Logs ingested without modification.
+ `Custom` - Logs processed through pipeline processors or logs ingested into a log group with data source name/type tags.
+ `OCSF-<version>` - Logs processed with OCSF (Open Cybersecurity Schema Framework) processors in pipelines.
+ `AWS-OTEL-LOG-V<version>` - OpenTelemetry logs ingested through the CloudWatch OTLP endpoint.
+ `AWS-OTEL-TRACE-V<version>` - OpenTelemetry traces ingested through the CloudWatch OTLP endpoint.

These system fields enable you to filter and query your logs based on their source and format, making it easier to work with logs from different origins and processing pipelines.
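As a minimal sketch of how these fields behave, the following groups log events by data source name and type, applying the same "Unknown" fallback described above. The sample events are hypothetical:

```python
from collections import Counter

def summarize_by_data_source(events: list[dict]) -> Counter:
    """Count log events by (@data_source_name, @data_source_type),
    falling back to "Unknown" when a field is absent — mirroring how
    CloudWatch Logs labels uncategorized logs."""
    counts: Counter = Counter()
    for event in events:
        name = event.get("@data_source_name", "Unknown")
        dtype = event.get("@data_source_type", "Unknown")
        counts[(name, dtype)] += 1
    return counts

events = [
    {"@data_source_name": "amazon_vpc", "@data_source_type": "flow",
     "@data_format": "Default"},
    {"@data_source_name": "amazon_vpc", "@data_source_type": "flow",
     "@data_format": "Default"},
    {"@message": "uncategorized application log"},  # no system fields
]
summary = summarize_by_data_source(events)
```

Here the two VPC Flow Logs events are grouped together, and the uncategorized event lands under `("Unknown", "Unknown")`.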

## Accessing Data Sources
<a name="accessing-data-sources"></a>

### Console
<a name="console-access"></a>

In the CloudWatch Logs console, you use the **Log Management** tab to access your data sources. CloudWatch Logs automatically consolidates your log data by data sources and types, continuously discovering newly ingested data. From the data sources list, you can create pipelines and define field indexes and facets.

### AWS CLI
<a name="aws-cli-access"></a>

Use the following command to list distinct data sources and types of logs in your account:

```
aws logs list-aggregate-log-group-summaries --group-by DATA_SOURCE_NAME_AND_TYPE
```

## Relationship to log groups
<a name="relationship-to-log-groups"></a>

Data sources complement rather than replace log groups. Your logs continue to be stored in log groups as before, but now they're also automatically tagged with data source information. This dual organization allows you to:
+ Use log groups for fine-grained access control and retention policies
+ Use data sources for service-based log discovery and analysis
+ Query logs using either organizational method depending on your needs

Data sources make it easier to work with logs at scale by providing a service-centric view of your log data across your AWS infrastructure.

# Features enabled by data sources
<a name="features-enabled-by-data-sources"></a>

Data sources enable advanced log processing and analytics capabilities through field discovery and consistent data structures.
+ **Facets**: Facets are indexed log fields that provide interactive filtering and analysis without writing queries. CloudWatch Logs automatically creates facets for data source name and type, and you can create facet policies on discovered fields to accelerate troubleshooting. Facets display value distributions and counts in CloudWatch Logs Insights, making it easy to identify patterns through point-and-click exploration.
+ **Pipelines**: Create transformation pipelines that apply to all logs from a specific data source name and type. This allows you to define consistent processing rules for logs from the same source.
+ **Field discovery**: CloudWatch Logs automatically discovers fields and their data types for each data source name and type combination based on pipeline processors. For AWS managed logs, field structures are predefined. For application logs, we recommend maintaining consistent log formats to maximize compatibility with analytics tools such as Amazon S3 tables that require well-defined field structures.

You can view the complete list of fields and their types for any data source using the `GetLogFields` API:

```
aws logs get-log-fields --data-source-name <name> --data-source-type <type>
```

This field discovery and consistency enables advanced analytics and integrations, as external tools can work with predictable field structures when processing your log data.
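To take advantage of this predictability in your own application logs, you can check records against an expected field map before ingestion. The schema and record below are hypothetical, chosen only to illustrate the consistency check:

```python
# Hypothetical expected schema for an application data source.
EXPECTED_FIELDS = {"request_id": str, "status_code": int, "latency_ms": float}

def conforms(record: dict, schema: dict) -> bool:
    """Return True if the record contains every expected field with the
    expected type — a quick way to keep application log formats
    consistent, as recommended for downstream analytics."""
    return all(
        field in record and isinstance(record[field], ftype)
        for field, ftype in schema.items()
    )

good = {"request_id": "abc-123", "status_code": 200, "latency_ms": 12.5}
bad = {"request_id": "abc-124", "status_code": "200"}  # wrong type, missing field
```

A record like `good` passes the check, while `bad` fails because `status_code` is a string and `latency_ms` is absent.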

# Supported AWS services for data sources
<a name="supported-aws-services-data-sources"></a>

The following table lists the AWS services that are automatically categorized by CloudWatch Logs as data sources:


| Data Source Name (@data_source_name field) | Data Source Type (@data_source_type field) | 
| --- | --- | 
| amazon_api_gateway | access | 
| amazon_bedrock_agentcore | browser_usage | 
| amazon_bedrock_agentcore | code_interpreter_application | 
| amazon_bedrock_agentcore | code_interpreter_usage | 
| amazon_bedrock_agentcore | gateway_application | 
| amazon_bedrock_agentcore | identity_workload_application | 
| amazon_bedrock_agentcore | memory_application | 
| amazon_bedrock_agentcore | online_evaluation_config | 
| amazon_bedrock_agentcore | runtime_application | 
| amazon_bedrock_agentcore | runtime_usage | 
| amazon_bedrock_agents | application | 
| amazon_bedrock_agents | event | 
| amazon_bedrock_knowledge_bases | application | 
| amazon_cloudfront | access | 
| amazon_cloudfront | connection | 
| amazon_cloudwatch | rum_app_monitor | 
| amazon_cognito | user_pool | 
| amazon_ec2 | verified_access | 
| amazon_eks | api_server | 
| amazon_eks | audit | 
| amazon_eks | authenticator | 
| amazon_eks | controller_manager | 
| amazon_eks | scheduler | 
| amazon_elasticache | cluster | 
| amazon_eventbridge | eventbus_error | 
| amazon_eventbridge | eventbus_info | 
| amazon_eventbridge | pipes_execution | 
| amazon_interactive_video_service | chat | 
| amazon_managed_prometheus | scraper | 
| amazon_managed_prometheus | workspace | 
| amazon_msk | broker | 
| amazon_msk | connect | 
| amazon_opensearch_service | pipeline | 
| amazon_q_business | events | 
| amazon_q_business | sync_job | 
| amazon_q_connect | events | 
| amazon_route53 | global_resolver_query | 
| amazon_route53 | hosted_zones | 
| amazon_route53 | profiles_resolver_query | 
| amazon_route53 | resolver_query | 
| amazon_sagemaker | workteam_activity | 
| amazon_ses | ingress_endpoints | 
| amazon_ses | rule_sets | 
| amazon_ses | traffic_policy | 
| amazon_vpc | flow | 
| amazon_vpc | route_server_peer | 
| amazon_vpc_lattice | access | 
| amazon_vpc_lattice | resource_access | 
| amazon_workmail | access_control | 
| amazon_workmail | authentication | 
| amazon_workmail | personal_access | 
| amazon_workmail | workmail_access | 
| amazon_workmail | workmail_availability | 
| aws_b2b_data_interchange | execution | 
| aws_backup | data_access | 
| aws_backup | hypervisor | 
| aws_clean_rooms | analysis | 
| aws_client_vpn | connection | 
| aws_client_vpn | event | 
| aws_cloudtrail | data | 
| aws_cloudtrail | management | 
| aws_elemental_mediapackage | egress_access | 
| aws_elemental_mediapackage | ingress_access | 
| aws_elemental_mediatailor | ad_decision | 
| aws_elemental_mediatailor | manifest | 
| aws_elemental_mediatailor | transcode | 
| aws_entity_resolution | id_mapping_workflow | 
| aws_entity_resolution | matching_workflow | 
| aws_iot_fleetwise | error | 
| aws_mainframe_modernization | batch_job | 
| aws_mainframe_modernization | config | 
| aws_mainframe_modernization | console | 
| aws_mainframe_modernization | dataset_import | 
| aws_network_firewall | alert | 
| aws_network_firewall | flow | 
| aws_network_firewall | tls | 
| aws_nlb | access | 
| aws_pcs | job_completion | 
| aws_pcs | scheduler | 
| aws_security_hub_cspm | asff_finding | 
| aws_shield | protection_flow | 
| aws_step_functions | express | 
| aws_step_functions | standard | 
| aws_transfer_family | server | 
| aws_waf | access | 

# Supported third-party sources for data sources
<a name="supported-third-party-sources-data-sources"></a>

The following table lists the third-party sources that are automatically categorized by CloudWatch Logs as data sources when ingested through pipelines:


| Data Source Name (@data_source_name field) | Data Source Type (@data_source_type field) | 
| --- | --- | 
| crowdstrike_falcon | detection_finding | 
| crowdstrike_falcon | process_activity | 
| github_auditlogs | account_change | 
| github_auditlogs | api_activity | 
| github_auditlogs | entity_management | 
| microsoft_entraid | account_change | 
| microsoft_entraid | authentication | 
| microsoft_entraid | entity_management | 
| microsoft_entraid | user_access_management | 
| microsoft_office365 | account_change | 
| microsoft_office365 | application_lifecycle | 
| microsoft_office365 | authentication | 
| microsoft_office365 | compliance_finding | 
| microsoft_office365 | detection_finding | 
| microsoft_office365 | email_activity | 
| microsoft_office365 | file_hosting_activity | 
| microsoft_office365 | group_management | 
| microsoft_office365 | incident_finding | 
| microsoft_office365 | user_access_management | 
| microsoft_office365 | vulnerability_finding | 
| microsoft_office365 | web_resources_activity | 
| microsoft_windows | account_change | 
| microsoft_windows | authentication | 
| microsoft_windows | entity_management | 
| microsoft_windows | event_log_activity | 
| microsoft_windows | file_system_activity | 
| microsoft_windows | group_management | 
| microsoft_windows | kernel_activity | 
| okta_auth0 | api_activity | 
| okta_auth0 | authentication | 
| okta_sso | api_activity | 
| okta_sso | authentication | 
| okta_sso | detection_finding | 
| okta_sso | entity_management | 
| paloaltonetworks_nextgenerationfirewall | authentication | 
| paloaltonetworks_nextgenerationfirewall | detection_finding | 
| paloaltonetworks_nextgenerationfirewall | network_activity | 
| paloaltonetworks_nextgenerationfirewall | process_activity | 
| sentinelone_endpointsecurity | dns_activity | 
| sentinelone_endpointsecurity | file_system_activity | 
| sentinelone_endpointsecurity | http_activity | 
| sentinelone_endpointsecurity | process_activity | 
| servicenow_cmdb | api_activity | 
| servicenow_cmdb | datastore_activity | 
| servicenow_cmdb | entity_management | 
| wiz_cnapp | api_activity | 
| wiz_cnapp | authentication | 
| wiz_cnapp | compliance_finding | 
| wiz_cnapp | detection_finding | 
| wiz_cnapp | vulnerability_finding | 
| zscaler_internetaccess | authentication | 
| zscaler_internetaccess | dns_activity | 
| zscaler_internetaccess | http_activity | 
| zscaler_internetaccess | network_activity | 