

# Data Validation
<a name="data-validation"></a>

Data validation is a crucial early step in the forecast creation process that ensures your input data meets the quality standards required for forecasting. It runs a series of checks on your data and surfaces errors that must be fixed before forecast creation can proceed, helping you identify and resolve issues early.

The data validation step is preceded by a set of preprocessing activities that prepare the data based on your plan settings or definition. Preprocessing includes the following:
+ *Aggregation to align with forecast granularity.* For example:
  + If your forecast granularity is set to weekly, daily demand history data will be aggregated to weekly totals.
  + If your demand history contains product, site, customer, and channel dimensions, but your forecast granularity is set to product-site level, the system will aggregate sales across all customers and channels for each product-site combination.
+ *Data transformations from Demand Plan settings.* These transformations are based on your Demand Planning configuration settings. For example, if you have configured the system to ignore negative values, these will be handled accordingly.
+ *Product lineage consideration*. The system takes into account product relationships, such as predecessor-successor pairs or product alternatives, as defined in your configuration.
+ *Supplementary time series transformation*. The system transforms supplementary time series data into demand drivers that can influence forecast generation. These demand drivers provide additional context for the activities above.
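The aggregation step can be illustrated with a small sketch. This is not the service's implementation; it is a hypothetical example (invented records and column names) showing how daily demand across customers would roll up to weekly, product-site totals:

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily demand records: (product_id, site_id, customer_id, order_date, qty).
records = [
    ("P1", "S1", "C1", date(2024, 1, 1), 10),
    ("P1", "S1", "C2", date(2024, 1, 2), 5),   # same ISO week, different customer
    ("P1", "S1", "C1", date(2024, 1, 8), 7),   # next ISO week
]

# Aggregate to product-site granularity and weekly buckets:
# sum across all customers (and channels), keyed by ISO year/week.
weekly = defaultdict(int)
for product, site, _customer, d, qty in records:
    iso_year, iso_week, _ = d.isocalendar()
    weekly[(product, site, iso_year, iso_week)] += qty

print(weekly[("P1", "S1", 2024, 1)])  # 15 -- both customers' demand rolled up
print(weekly[("P1", "S1", 2024, 2)])  # 7
```

Note how the customer dimension disappears after aggregation: with a product-site forecast granularity, per-customer detail is summed away before validation runs.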

**Topics**
+ [Data Validation Process](data-validation-process.md)
+ [Data Validation Report Access](data-validation-report-access.md)
+ [Data Validation Error Export](data-validation-error-export.md)
+ [Data Validation Rules](data-validation-rules.md)

# Data Validation Process
<a name="data-validation-process"></a>

After the preprocessing described above completes, the data validation process begins. Data validation consists of three steps:

1. Data Structure Validation - This step checks that all required tables and columns exist and contain data before any transformation begins, confirming that your data tables are properly set up. For the list of required entities, see [Demand Planning](required_entities.md).

1. Data Quality Validation - This step ensures that data content is complete and error-free. It checks for:
   + Missing values in essential fields
   + Invalid data formats and dates
   + Data completeness required for building forecast input

   This ensures all necessary data is present and valid before proceeding with transformations.

1. Forecasting Eligibility Validation - This step ensures that sufficient data is provided to create a forecast, including:
   + Minimum historical data requirements
   + Time series length limitations
   + Other algorithm-specific constraints

   This stage ensures that your data is suitable for generating forecasts.
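The Data Quality checks in step 2 can be sketched in code. This is a minimal illustration, not the service's implementation; the row layout and `validate_rows` helper are invented, and the date window comes from the Date Range rule in the rules table below:

```python
from datetime import datetime

# Validity window used by the Date Range rule (per the rules table).
MIN_DATE = datetime(1900, 1, 1)
MAX_DATE = datetime(2050, 12, 31)

def validate_rows(rows, mandatory_columns):
    """Collect per-row errors: missing mandatory values and out-of-range dates."""
    errors = []
    for i, row in enumerate(rows):
        for col in mandatory_columns:
            if row.get(col) in (None, ""):
                errors.append((i, col, "missing value"))
        order_date = row.get("order_date")
        if isinstance(order_date, datetime) and not (MIN_DATE <= order_date <= MAX_DATE):
            errors.append((i, "order_date", "date out of range"))
    return errors

rows = [
    {"product_id": "P1", "order_date": datetime(2024, 3, 1), "final_quantity_requested": 5},
    {"product_id": "",   "order_date": datetime(1899, 12, 31), "final_quantity_requested": 2},
]
errs = validate_rows(rows, ["product_id", "final_quantity_requested"])
print(errs)  # the second row fails both the null check and the date-range check
```

Per-row failures like these are exactly the kind of errors that surface in the downloadable validation report described later.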

Even a single validation failure will stop the forecast creation process. You must work with your data administrator to correct the underlying data issues, then choose **Retry** to try forecast creation again.

# Data Validation Report Access
<a name="data-validation-report-access"></a>

When creating a forecast for the first time, navigate to the **Demand Planning** module in AWS Supply Chain and choose **Create a Plan**. The system guides you through three steps: Data Ingestion, Plan Configuration, and finally, Forecast Generation. After completing data ingestion and plan configuration, choose **Generate Forecast** to initiate data validation. Each new forecast generation creates a fresh validation report based on the current state of your data.

Data Structure validation failures (such as missing tables or columns) appear as banner messages at the top of your screen. These fundamental issues must be resolved before proceeding. After data structure validation passes, the system proceeds with Data Quality and Forecasting Eligibility validations. Any failures in these stages are detailed in the validation report, accessible by choosing **Data Validations**.

## Subsequent Forecast Creation
<a name="subsequent-forecast"></a>

For subsequent forecasts, choose **Generate Forecast**. You will see a banner displaying three steps, with data validation as the first step. The same validation behavior applies. Structural issues appear as banners, while other validation failures are available in the detailed report.

## Report Content
<a name="report-content"></a>

The Data Validation Issues report provides a comprehensive view of Data Quality and Forecasting Eligibility validation failures that need to be addressed. The report displays the following:
+ Dataset: Identifies the specific dataset where the issue occurs
+ Rule: Describes the type of validation that failed
+ Error Date/Time: Shows when the error was detected
+ Status Message: Provides detailed information about the records affected and recommended actions

To help navigate and resolve these issues, you can do the following:
+ Use the search box to find specific types of errors
+ Filter by dataset using the drop-down menu
+ Download a detailed report containing all validation failures
+ View **Records affected** for each validation to understand the scope of the issue

# Data Validation Error Export
<a name="data-validation-error-export"></a>

Error records can be exported by choosing **Download** on the **Data Validation** report page, for validations that check individual data points.

**Note**  
The export option is not available when the validation is checking structural, systemic, or aggregate-level requirements. 

Export is available for the following:
+ Validation checks for content or quality of existing data
+ Validations that involve checking for missing or invalid values in existing fields
+ Data Quality Validations (such as null checks and date range validations)

**Note**  
The system limits error record downloads to a maximum of 10,000 rows. If the total error count exceeds this limit, a notification appears on the screen. Work with your data administrator to review and resolve all errors in the source table.

Export is not available for the following:
+ Validation checks for structural elements (such as table existence or column presence)
+ Validations that involve system-level constraints (such as size limits, counts, and thresholds)
+ Forecasting eligibility checks (such as time series limits or active product counts)

# Data Validation Rules
<a name="data-validation-rules"></a>

The validations performed prior to forecast creation are below. For more information, see [Demand Planning](required_entities.md).


| Rule Type | Rule | Datasets | Description | Export error records? | 
| --- | --- | --- | --- | --- | 
| Data Structure Validation | Mandatory columns existence validation | Product, Outbound order line, Supplementary time series |  Verifies presence of critical columns in required datasets. Outbound order line: product_id, order_date, final_quantity_requested. Product: id, description. Also verifies presence of critical columns in recommended datasets, if provided. Supplementary time series: id, order_date, time_series_name, time_series_value  | No | 
| Data Structure Validation | Granularity columns existence validation | Product, Outbound order line |  Verifies presence of columns set as forecast granularity, if set in the demand plan settings. Outbound order line: product_id, ship_from_site_id, ship_to_site_id, ship_to_site_address_city, ship_to_address_state, ship_to_address_country, channel_id, customer_tpartner_id. Product: id, product_group_id, product_type, brand_name, color, display_desc, parent_product_id  | No | 
| Data Structure Validation | Active product's history validation | Product, Outbound order line, Product Alternate | Verifies that there is at least one active product that has history on its own or through product lineage | No | 
| Data Quality Validation | Missing values in mandatory columns validation | Product, Outbound order line, Supplementary time series | Checks for null or empty values in the mandatory columns specified in the Mandatory columns existence check | Yes | 
| Data Quality Validation | Missing values in granularity columns validation | Product, Outbound order line | Checks for null or empty values in the granularity columns specified in the Granularity columns existence check | Yes | 
| Data Quality Validation | Date Range validation | OutboundOrderLine, SupplementaryTimeSeries | The order_date column in the dataset must contain dates in a valid time range: from 01/01/1900 00:00:00 to 12/31/2050 00:00:00.  | Yes | 
| Forecasting Eligibility Validation | Timeseries per Predictor validation | OutboundOrderLine |  The timeseries per predictor must not exceed 5,000,000. "Timeseries per predictor" is calculated by taking the count of unique values for the product_id column and for each of the forecast granularity columns, then taking the product of all those counts.  | No | 
| Forecasting Eligibility Validation | Count of active products validation | Product | The number of active products with records in the OOL dataset must not exceed 800,000. | No | 
| Forecasting Eligibility Validation | Historical data sufficiency validation | Outbound order line |  Verifies that at least one product in the dataset has sufficient historical demand data to generate reliable forecasts. The forecast horizon must be no greater than 1/3 of the time range in the dataset (when training a new auto predictor) or 1/4 of the time range in the dataset (when training an existing auto predictor). There is also a global maximum forecast horizon of 500.  | No | 
| Forecasting Eligibility Validation | Row Count validation | Partitioned OutboundOrderLine | The number of records in the partitioned OOL dataset must not exceed 3,000,000,000. Certain forecast models have smaller limits, which are also checked here if those models are used. | No | 
| Forecasting Eligibility Validation | Maximum Timeseries validation | Partitioned OutboundOrderLine |  The number of distinct timeseries must not exceed the model's limit, if there is one. "Distinct timeseries" is defined as the number of distinct rows in the dataset when product_id plus all forecast granularity columns are considered.  | No | 
| Forecasting Eligibility Validation |  Data Density validation  | Partitioned OutboundOrderLine |  The data density of the dataset must be at least 5. Data density is defined as (total number of rows in the dataset) / (number of distinct products in the dataset); in other words, the average number of rows per product. The rule applies only when Prophet is selected as the forecasting algorithm.  | No | 
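The derived metrics in the eligibility rules above can be sketched in code. This is an illustrative example with invented rows and a hypothetical `site_id` granularity column, not the service's implementation:

```python
# Illustrative rows at product + site forecast granularity (hypothetical data).
rows = [
    {"product_id": "P1", "site_id": "S1"},
    {"product_id": "P1", "site_id": "S2"},
    {"product_id": "P2", "site_id": "S1"},
    {"product_id": "P2", "site_id": "S1"},  # duplicate combination
]
granularity = ["site_id"]

# Timeseries per predictor: product of the unique-value counts of
# product_id and each granularity column (limit: 5,000,000).
ts_per_predictor = len({r["product_id"] for r in rows})
for col in granularity:
    ts_per_predictor *= len({r[col] for r in rows})

# Distinct timeseries: distinct (product_id + granularity columns) combinations.
distinct_ts = len({(r["product_id"], *(r[c] for c in granularity)) for r in rows})

# Data density: average rows per product (must be >= 5 when Prophet is used).
density = len(rows) / len({r["product_id"] for r in rows})

print(ts_per_predictor, distinct_ts, density)  # 4 3 2.0
```

Note that timeseries per predictor (an upper bound from the cross product of dimension values) can exceed distinct timeseries (the combinations actually present in the data), as it does here.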