

Amazon Forecast is no longer available to new customers. Existing customers of Amazon Forecast can continue to use the service as normal. [Learn more](https://aws.amazon.com/blogs/machine-learning/transition-your-amazon-forecast-usage-to-amazon-sagemaker-canvas/)

# Importing Datasets
<a name="howitworks-datasets-groups"></a>

*Datasets* contain the data used to train a [predictor](howitworks-predictor.md). You create one or more Amazon Forecast datasets and import your training data into them. A *dataset group* is a collection of complementary datasets that detail a set of changing parameters over a series of time. After creating a dataset group, you use it to train a predictor. 

Each dataset group can have up to three datasets, one of each [dataset](#howitworks-dataset-domainstypes) type: target time series, related time series, and item metadata.

To create and manage Forecast datasets and dataset groups, you can use the Forecast console, AWS Command Line Interface (AWS CLI), or AWS SDK.

For example Forecast datasets, see the [Amazon Forecast Sample GitHub repository](https://github.com/aws-samples/amazon-forecast-samples).

**Topics**
+ [Datasets](#howitworks-dataset)
+ [Dataset Groups](#howitworks-datasetgroup)
+ [Resolving Conflicts in Data Collection Frequency](#howitworks-data-alignment)
+ [Using Related Time Series Datasets](related-time-series-datasets.md)
+ [Using Item Metadata Datasets](item-metadata-datasets.md)
+ [Predefined Dataset Domains and Dataset Types](howitworks-domains-ds-types.md)
+ [Updating Data](updating-data.md)
+ [Handling Missing Values](howitworks-missing-values.md)
+ [Dataset Guidelines for Forecast](dataset-import-guidelines-troubleshooting.md)

## Datasets
<a name="howitworks-dataset"></a>

To create and manage Forecast datasets, you can use the Forecast APIs, including the [CreateDataset](API_CreateDataset.md) and [DescribeDataset](API_DescribeDataset.md) operations. For a complete list of Forecast APIs, see [API Reference](api-reference.md).

When creating a dataset, you provide information, such as the following:
+ The frequency/interval at which you recorded your data. For example, you might aggregate and record retail item sales every week. In the [Getting Started](getting-started.md) exercise, you use the average electricity used per hour.
+ The prediction format (the *domain*) and dataset type (within the domain). A dataset domain specifies which type of forecast you'd like to perform, while a dataset type helps you organize your training data into Forecast-friendly categories.
+ The dataset *schema*. A schema maps the column headers of your dataset. For instance, when monitoring demand, you might have collected hourly data on the sales of an item at multiple stores. In this case, your schema would define the order, from left to right, in which timestamp, location, and hourly sales appear in your training data file. Schemas also define each column's data type, such as `string` or `integer`.
+ Geolocation and time zone information. The geolocation attribute is defined within the schema with the attribute type `geolocation`. Time zone information is defined with the [CreateDatasetImportJob](API_CreateDatasetImportJob.md) operation. Both geolocation and time zone data must be included to enable the [Weather Index](weather.md).
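
As a minimal sketch of how these pieces fit together, the following Python snippet assembles the parameters for a `CreateDataset` request. The dataset name is hypothetical, and the parameter names and enum values (`Domain`, `DatasetType`, `DataFrequency`, and the `Attributes` key of `Schema`) are assumptions based on the `CreateDataset` API; confirm them against the [CreateDataset](API_CreateDataset.md) reference before use. The snippet only builds and prints the request; it does not call the service.

```python
import json

# Schema: one entry per column, in the order the columns appear in the data file.
schema = {
    "Attributes": [
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "demand", "AttributeType": "float"},
    ]
}

create_dataset_request = {
    "DatasetName": "retail_demand_example",  # hypothetical name
    "Domain": "RETAIL",                      # the prediction format (domain)
    "DatasetType": "TARGET_TIME_SERIES",     # the dataset type within the domain
    "DataFrequency": "W",                    # data recorded weekly
    "Schema": schema,
}

print(json.dumps(create_dataset_request, indent=2))

# With AWS credentials configured, the request could be sent with the
# AWS SDK for Python (not executed here):
#   import boto3
#   boto3.client("forecast").create_dataset(**create_dataset_request)
```

Keeping the request as a plain dictionary makes it easy to review the frequency, domain, and schema before anything is created.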

Each column in your Forecast dataset represents either a forecast *dimension* or *feature*. Forecast dimensions describe the aspects of your data that do not change over time, such as `store` or `location`. Forecast features include any parameters in your data that vary across time, such as `price` or `promotion`. Some dimensions, like `timestamp` or `item_id`, are required in target time series and related time series datasets.

### Dataset Domains and Dataset Types
<a name="howitworks-dataset-domainstypes"></a>

When you create a Forecast dataset, you choose a domain and a dataset type. Forecast provides domains for a number of use cases, such as forecasting retail demand or web traffic. You can also create a custom domain. For a complete list of Forecast domains, see [Predefined Dataset Domains and Dataset Types](howitworks-domains-ds-types.md).

Within each domain, Forecast users can specify the following types of datasets:
+ Target time series dataset (required) – Use this dataset type when your training data is a time series *and* it includes the field that you want to generate a forecast for. This field is called the *target field*.
+ Related time series dataset (optional) – Choose this dataset type when your training data is a time series, but it *doesn't* include the target field. For instance, if you're forecasting item demand, a related time series dataset might have `price` as a field, but not `demand`.
+ Item metadata dataset (optional) – Choose this dataset type when your training data *isn't* time-series data, but includes metadata information about the items in the target time series or related time series datasets. For instance, if you're forecasting item demand, an item metadata dataset might have `color` or `brand` as dimensions. 

  Forecast only considers the data provided by an item metadata dataset type when you use the [CNN-QR](aws-forecast-algo-cnnqr.md) or [DeepAR+](aws-forecast-recipe-deeparplus.md) algorithm.

  Item metadata is especially useful in coldstart forecasting scenarios, in which you have little direct historical data with which to make predictions, but do have historical data on items with similar metadata attributes. When you include item metadata, Forecast creates coldstart forecasts based on similar time series, which can create a more accurate forecast. 

Depending on the information in your training data and what you want to forecast, you might create more than one dataset. 

For example, suppose that you want to generate a forecast for the demand of retail items, such as shoes and socks. You might create the following datasets in the RETAIL domain:
+ Target time series dataset – Includes the historical time-series demand data for the retail items (`item_id`, `timestamp`, and the target field `demand`). Because it designates the target field that you want to forecast, you must have at least one target time series dataset in a dataset group.

  You can also add up to ten other dimensions to a target time series dataset. If you include only a target time series dataset in your dataset group, you can create forecasts at either the item level or the forecast dimension level of granularity only. For more information, see [CreatePredictor](API_CreatePredictor.md).
+ Related time series dataset – Includes historical time-series data other than the target field, such as `price` or `revenue`. Because related time series data must be mappable to target time series data, each related time series dataset must contain the same identifying fields. In the RETAIL domain, these would be `item_id` and `timestamp`.

  A related time series dataset might contain data that refines the forecasts made from your target time series dataset. For example, you might include `price` data in your related time series dataset for the future dates that you want to generate a forecast for. This way, Forecast can make predictions with an additional dimension of context. For more information, see [Using Related Time Series Datasets](related-time-series-datasets.md).
+ Item metadata dataset – Includes metadata for the retail items. Examples of metadata include `brand`, `category`, `color`, and `genre`.

**Example Dataset with a Forecast Dimension**

Continuing with the preceding example, imagine that you want to forecast the demand for shoes and socks based on a store's previous sales. In the following target time series dataset, `store` is a time-series forecast dimension, while `demand` is the target field. Socks are sold in two store locations (NYC and SFO), and shoes are sold only in ORD.

The first three rows of this table contain the first available sales data for the NYC, SFO, and ORD stores. The last three rows contain the last recorded sales data for each store. The `...` row represents all of the item sales data recorded between the first and last entries.


| `timestamp` | `item_id` | `store` | `demand` | 
| --- | --- | --- | --- | 
| 2019-01-01 | socks | NYC |  25  | 
| 2019-01-05 | socks | SFO | 45 | 
| 2019-02-01 | shoes | ORD | 10 | 
| ... | 
| 2019-06-01 | socks | NYC | 100 | 
| 2019-06-05 | socks | SFO | 5 | 
| 2019-07-01 | shoes | ORD | 50 | 

### Dataset Schema
<a name="howitworks-dataset-schema"></a>

Each dataset requires a schema, a user-provided JSON mapping of the fields in your training data. This is where you list both the required and optional dimensions and features that you want to include in your dataset.

If your dataset includes a geolocation attribute, define the attribute within the schema with the attribute type `geolocation`. For more information, see [Adding Geolocation information](weather.md#adding-geolocation). In order to apply the [Weather Index](weather.md), you must include a geolocation attribute in your target time series and any related time series datasets.

Some domains have optional dimensions that we recommend including. Optional dimensions are listed in the descriptions of each domain later in this guide. For an example, see [RETAIL Domain](retail-domain.md). All optional dimensions take the data type `string`.

The following is the schema for the example target time series dataset above.

```
{
     "attributes": [
        {
           "AttributeName": "timestamp",
           "AttributeType": "timestamp"
        },
        {
           "AttributeName": "item_id",
           "AttributeType": "string"
        },
        {
           "AttributeName": "store",
           "AttributeType": "string"
        },
        {
           "AttributeName": "demand",
           "AttributeType": "float"
        }
    ]
}
```

When you upload your training data to the dataset that uses this schema, Forecast assumes that the `timestamp` field is column 1, the `item_id` field is column 2, the `store` field is column 3, and the `demand` field, the *target* field, is column 4.
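
The positional mapping described above can be sketched in Python. This is an illustration only, not how Forecast parses files internally: the `parse_rows` helper and the `CONVERTERS` table are hypothetical, and the timestamp format is assumed to be `yyyy-MM-dd`, as in the example table.

```python
import csv
import io
from datetime import datetime

schema = {
    "attributes": [
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "store", "AttributeType": "string"},
        {"AttributeName": "demand", "AttributeType": "float"},
    ]
}

# One converter per attribute type (timestamp format is an assumption).
CONVERTERS = {
    "timestamp": lambda value: datetime.strptime(value, "%Y-%m-%d"),
    "string": str,
    "integer": int,
    "float": float,
}

def parse_rows(csv_text, schema):
    """Map each CSV column to the schema attribute at the same position."""
    attributes = schema["attributes"]
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if len(record) != len(attributes):
            raise ValueError(f"expected {len(attributes)} columns, got {len(record)}")
        rows.append({
            attribute["AttributeName"]: CONVERTERS[attribute["AttributeType"]](value)
            for attribute, value in zip(attributes, record)
        })
    return rows

sample = "2019-01-01,socks,NYC,25\n2019-01-05,socks,SFO,45\n"
parsed = parse_rows(sample, schema)
print(parsed[0]["store"], parsed[0]["demand"])  # prints: NYC 25.0
```

Because the file has no header row in this sketch, the schema order is the only thing that ties a value to a field name, which is why the column order in your data must match the schema.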

For the related time series dataset type, all related features must have a float or integer attribute type. For the item metadata dataset type, all features must have a string attribute type. For more information, see [SchemaAttribute](API_SchemaAttribute.md).

**Note**  
An `AttributeName` and `AttributeType` pair is required for every column in the dataset. Forecast reserves a number of names that can't be used as the name of a schema attribute. For the list of reserved names, see [Reserved Field Names](reserved-field-names.md).

## Dataset Groups
<a name="howitworks-datasetgroup"></a>

A *dataset group* is a collection of one to three complementary datasets, one of each dataset type. You import datasets to a dataset group, then use the dataset group to train a predictor.

Forecast includes the following operations to create dataset groups and add datasets to them:
+ [CreateDatasetGroup](API_CreateDatasetGroup.md)
+ [UpdateDatasetGroup](API_UpdateDatasetGroup.md)
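
Before you call these operations, it can help to check that the datasets you plan to group follow the rules above. The following Python sketch is one way to do that. The `check_dataset_group` helper is hypothetical, and the type names are assumed to match the `DatasetType` values used by `CreateDataset`.

```python
ALLOWED_TYPES = {"TARGET_TIME_SERIES", "RELATED_TIME_SERIES", "ITEM_METADATA"}

def check_dataset_group(dataset_types):
    """Return a list of problems for the dataset types planned for one group."""
    problems = []
    unknown = set(dataset_types) - ALLOWED_TYPES
    if unknown:
        problems.append(f"unknown dataset types: {sorted(unknown)}")
    # A group holds at most one dataset of each type.
    for dataset_type in sorted(ALLOWED_TYPES):
        if dataset_types.count(dataset_type) > 1:
            problems.append(f"more than one {dataset_type} dataset")
    # The target time series dataset is the only required type.
    if "TARGET_TIME_SERIES" not in dataset_types:
        problems.append("a target time series dataset is required")
    return problems

print(check_dataset_group(["TARGET_TIME_SERIES", "ITEM_METADATA"]))  # prints: []
print(check_dataset_group(["RELATED_TIME_SERIES"]))
```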

## Resolving Conflicts in Data Collection Frequency
<a name="howitworks-data-alignment"></a>

Forecast can train predictors with data that doesn't align with the data frequency you specify in the [CreateDataset](API_CreateDataset.md) operation. For example, you can import data recorded in hourly intervals even though some of the data isn't timestamped at the top of the hour (02:20, 02:45). Forecast uses the data frequency you specify to learn about your data. Then Forecast aggregates the data during predictor training. For more information, see [Data aggregation for different forecast frequencies](data-aggregation.md). 
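
To make the idea of aggregation concrete, the following Python sketch groups off-the-hour records into hourly buckets. It assumes that values within the same hour are summed; that choice and the `aggregate_hourly` helper are illustrative assumptions, not a description of how Forecast aggregates internally. See the data aggregation page for the actual behavior.

```python
from collections import defaultdict
from datetime import datetime

def aggregate_hourly(records):
    """Sum values whose timestamps fall within the same hour."""
    totals = defaultdict(float)
    for timestamp, value in records:
        # Truncate the timestamp to the top of its hour.
        bucket = timestamp.replace(minute=0, second=0, microsecond=0)
        totals[bucket] += value
    return dict(sorted(totals.items()))

records = [
    (datetime(2019, 1, 1, 2, 20), 3.0),
    (datetime(2019, 1, 1, 2, 45), 4.0),
    (datetime(2019, 1, 1, 3, 0), 5.0),
]
hourly = aggregate_hourly(records)
for bucket, total in hourly.items():
    print(bucket.isoformat(), total)
```

Here the 02:20 and 02:45 records land in the 02:00 bucket, and the 03:00 record stays in its own bucket.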

# Using Related Time Series Datasets
<a name="related-time-series-datasets"></a>

A related time series dataset includes time-series data that isn't included in a target time series dataset and might improve the accuracy of your predictor.

For example, in the demand forecasting domain, a target time series dataset would contain `timestamp` and `item_id` dimensions, while a complementary related time series dataset also includes the following supplementary features: `item price`, `promotion`, and `weather`.

A related time series dataset can contain up to 10 forecast dimensions (the same ones in your target time series dataset) and up to 13 related time-series features.

**Python notebooks**  
For a step-by-step guide on using related time-series datasets, see [Incorporating Related Time Series](https://github.com/aws-samples/amazon-forecast-samples/blob/master/notebooks/advanced/Incorporating_Related_Time_Series_dataset_to_your_Predictor/Incorporating_Related_Time_Series_dataset_to_your_Predictor.ipynb).

**Topics**
+ [Historical and Forward-looking Related Time Series](#related-time-series-historical-futurelooking)
+ [Related Time Series Dataset Validation](#related-time-series-dataset-validation)
+ [Example: Forward-looking Related Time Series File](#related-time-series-example)
+ [Example: Forecasting Granularity](#related-time-series-granularity)
+ [Legacy Predictors and Related Time Series](#related-time-series-legacy)

## Historical and Forward-looking Related Time Series
<a name="related-time-series-historical-futurelooking"></a>

**Note**  
 A related time series that contains any values within the forecast horizon is treated as a forward-looking time series. 

 Related time series come in two forms: 
+  **Historical time series**: time series *without* data points within the forecast horizon. 
+  **Forward-looking time series**: time series *with* data points within the forecast horizon. 

Historical related time series contain data points up to the forecast horizon, and do not contain any data points within the forecast horizon. Forward-looking related time series contain data points up to *and* within the forecast horizon. 

![\[Time series graph showing target, forward-looking, and historical related data with forecast window.\]](http://docs.aws.amazon.com/forecast/latest/dg/images/short-long-rts.png)
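
The distinction can be expressed as a simple date comparison. The following Python sketch assumes that the forecast horizon begins immediately after the last timestamp in the target time series; the `classify_related_series` helper is hypothetical.

```python
from datetime import date

def classify_related_series(related_last, target_last):
    """Classify a related time series by where its data ends."""
    # Any data point after the last target timestamp lies within the forecast horizon.
    if related_last > target_last:
        return "forward-looking"
    return "historical"

print(classify_related_series(date(2019, 7, 11), date(2019, 7, 1)))  # prints: forward-looking
print(classify_related_series(date(2019, 7, 1), date(2019, 7, 1)))   # prints: historical
```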


## Related Time Series Dataset Validation
<a name="related-time-series-dataset-validation"></a>

A related time series dataset has the following restrictions:
+ It can't include the target value from the target time series.
+ It must include `item_id` and `timestamp` dimensions, and at least one related feature (such as `price`).
+ Related time series feature data must be of the `integer` or `float` data types.
+ In order to use the entire target time series, all items from the target time series dataset must also be included in the related time series dataset. If a related time series only contains a subset of items from the target time series, then the model creation and forecast generation will be limited to that specific subset of items.

   For example, if the target time series contains 1000 items and the related time series dataset only contains 100 items, then the model and forecasts will be based on only those 100 items. 
+ The frequency at which data is recorded in the related time series dataset must match the interval at which you want to generate forecasts (the forecasting *granularity*).

  For example, if you want to generate forecasts at a weekly granularity, the frequency at which data is recorded in the related time series must also be weekly, even if the frequency at which data is recorded in the target time series is daily.
+ The data for each item in the related time series dataset must start on or before the beginning `timestamp` of the corresponding `item_id` in the target time series dataset.

  For example, if the target time series data for `socks` starts at 2019-01-01 and the target time series data for `shoes` starts at 2019-02-01, the related time series data for `socks` must begin on or before 2019-01-01 and the data for `shoes` must begin on or before 2019-02-01.
+ For forward-looking related time series datasets, the last timestamp for every item must be on the last timestamp in the user-designated forecast window (called the *forecast horizon*).

  In the example related time series file below, the `timestamp` data for both socks and shoes must end on or after 2019-07-01 (the last recorded timestamp) *plus* the forecast horizon. If data frequency in the target time series is daily and the forecast horizon is 10 days, daily data points must be provided in the forward-looking related time series file until 2019-07-11.
+ For historical related time series datasets, the last timestamp for every item must match the last timestamp in the target time series.

  In the example related time series file below, the `timestamp` data for both socks and shoes must end on 2019-07-01 (the last recorded timestamp).
+ The Forecast dimensions provided in the related time series dataset must be either equal to or a subset of the dimensions designated in the target time series dataset.
+  Related time series cannot have missing values. For information on missing values in a related time series dataset, see [Handling Missing Values](howitworks-missing-values.md). 
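
Some of these restrictions can be checked before you import data. The following Python sketch covers the start-date and end-date rules for daily, forward-looking related data. It is a simplified illustration: the `check_forward_looking_coverage` helper is hypothetical, items are keyed by `item_id` only, and the required end date is taken to be the last timestamp in the target time series plus the forecast horizon.

```python
from datetime import date, timedelta

def check_forward_looking_coverage(target_span, related_span, horizon_days):
    """Return a list of problems found for daily, forward-looking related data.

    Both arguments map an item identifier to a (first_date, last_date) tuple.
    """
    problems = []
    # Required end date: last target timestamp plus the forecast horizon.
    last_target_date = max(last for _, last in target_span.values())
    required_end = last_target_date + timedelta(days=horizon_days)
    for item_id, (target_first, _) in target_span.items():
        if item_id not in related_span:
            problems.append(f"{item_id}: not in related data, so it would be excluded")
            continue
        related_first, related_last = related_span[item_id]
        if related_first > target_first:
            problems.append(f"{item_id}: related data starts after {target_first}")
        if related_last < required_end:
            problems.append(f"{item_id}: related data ends before {required_end}")
    return problems

target_span = {
    "socks": (date(2019, 1, 1), date(2019, 6, 5)),
    "shoes": (date(2019, 2, 1), date(2019, 7, 1)),
}
related_span = {
    "socks": (date(2019, 1, 1), date(2019, 7, 11)),
    "shoes": (date(2019, 2, 1), date(2019, 7, 11)),
}
problems = check_forward_looking_coverage(target_span, related_span, horizon_days=10)
print(problems)  # prints: []
```

With a 10-day horizon and a last target timestamp of 2019-07-01, every item needs related data through 2019-07-11, which the sample data satisfies.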

## Example: Forward-looking Related Time Series File
<a name="related-time-series-example"></a>

The following table shows a correctly configured related time series dataset file. For this example, assume the following:
+ The last data point was recorded in the target time series dataset on 2019-07-01.
+  The forecast horizon is 10 days. 
+ The forecast granularity is daily (`D`). 

A `...` row indicates all of the data points between the preceding and following rows.


| `timestamp` | `item_id` | `store` | `price` | 
| --- | --- | --- | --- | 
| 2019-01-01 | socks | NYC | 10 | 
| 2019-01-02 | socks | NYC | 10 | 
| 2019-01-03 | socks | NYC | 15 | 
| ... | 
| 2019-06-01 | socks | NYC | 10 | 
| ... | 
| 2019-07-01 | socks | NYC | 10 | 
| ... | 
| 2019-07-11 | socks | NYC | 20 | 
| 2019-01-05 | socks | SFO | 45 | 
| ... | 
| 2019-06-05 | socks | SFO | 10 | 
| ... | 
| 2019-07-01 | socks | SFO | 10 | 
| ... | 
| 2019-07-11 | socks | SFO | 30 | 
| 2019-02-01 | shoes | ORD | 50 | 
| ... | 
| 2019-07-01 | shoes | ORD | 75 | 
| ... | 
| 2019-07-11 | shoes | ORD | 60 | 

## Example: Forecasting Granularity
<a name="related-time-series-granularity"></a>

The following table shows compatible data recording frequencies for target time series and related time series to forecast at a weekly granularity. Because data in a related time series dataset can't be aggregated, Forecast accepts only a related time series data frequency that is the same as the chosen forecasting granularity.


| Target Input Data Frequency | Related Time Series Frequency | Forecasting Granularity | Supported by Forecast? | 
| --- | --- | --- | --- | 
| Daily | Weekly | Weekly | Yes | 
| Weekly | Weekly | Weekly | Yes | 
| N/A | Weekly | Weekly | Yes | 
| Daily | Daily | Weekly | No | 
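
The rule behind this table can be sketched in a few lines of Python. The `related_frequency_is_supported` helper is hypothetical and looks only at the related time series frequency, because that is the constraint stated above; it does not model how the target time series is aggregated.

```python
def related_frequency_is_supported(related_frequency, forecast_granularity):
    """Related data is not aggregated, so its frequency must equal the forecast granularity."""
    return related_frequency == forecast_granularity

# (target frequency, related frequency, forecast granularity) rows from the table above.
rows = [
    ("Daily", "Weekly", "Weekly"),
    ("Weekly", "Weekly", "Weekly"),
    (None, "Weekly", "Weekly"),
    ("Daily", "Daily", "Weekly"),
]
results = [related_frequency_is_supported(related, forecast) for _, related, forecast in rows]
print(results)  # prints: [True, True, True, False]
```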

## Legacy Predictors and Related Time Series
<a name="related-time-series-legacy"></a>

**Note**  
To upgrade an existing predictor to AutoPredictor, see [Upgrading to AutoPredictor](howitworks-predictor.md#upgrading-autopredictor).

When using a legacy predictor, you can use a related time series dataset when training a predictor with the [CNN-QR](aws-forecast-algo-cnnqr.md), [DeepAR+](aws-forecast-recipe-deeparplus.md), and [Prophet](aws-forecast-recipe-prophet.md) algorithms. [NPTS](aws-forecast-recipe-npts.md), [ARIMA](aws-forecast-recipe-arima.md), and [ETS](aws-forecast-recipe-ets.md) do not accept related time series data.

The following table shows the types of related time series each Amazon Forecast algorithm accepts. 


|  | CNN-QR | DeepAR+ | Prophet | NPTS | ARIMA | ETS | 
| --- | --- | --- | --- | --- | --- | --- | 
|  Historical related time series  | ![\[Yes\]](http://docs.aws.amazon.com/forecast/latest/dg/images/icon-yes.png)  | ![\[No\]](http://docs.aws.amazon.com/forecast/latest/dg/images/icon-no.png)  | ![\[No\]](http://docs.aws.amazon.com/forecast/latest/dg/images/icon-no.png)  | ![\[No\]](http://docs.aws.amazon.com/forecast/latest/dg/images/icon-no.png)  | ![\[No\]](http://docs.aws.amazon.com/forecast/latest/dg/images/icon-no.png)  | ![\[No\]](http://docs.aws.amazon.com/forecast/latest/dg/images/icon-no.png)  | 
|  Forward-looking related time series  | ![\[Yes\]](http://docs.aws.amazon.com/forecast/latest/dg/images/icon-yes.png)  | ![\[Yes\]](http://docs.aws.amazon.com/forecast/latest/dg/images/icon-yes.png)  | ![\[Yes\]](http://docs.aws.amazon.com/forecast/latest/dg/images/icon-yes.png)  | ![\[No\]](http://docs.aws.amazon.com/forecast/latest/dg/images/icon-no.png)  | ![\[No\]](http://docs.aws.amazon.com/forecast/latest/dg/images/icon-no.png)  | ![\[No\]](http://docs.aws.amazon.com/forecast/latest/dg/images/icon-no.png)  | 

 When using AutoML, you can provide both historical and forward-looking related time series data, and Forecast will only use those time series where applicable. 

 If you provide *forward-looking* related time series data, Forecast uses the related data with CNN-QR, DeepAR+, and Prophet, and does not use it with NPTS, ARIMA, and ETS. If you provide *historical* related time series data, Forecast uses the related data with CNN-QR, and does not use it with DeepAR+, Prophet, NPTS, ARIMA, and ETS. 
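
The support table above can be captured as a small lookup, which might be useful when you decide which related data to prepare. This Python sketch only restates the table; the `algorithms_using_related` helper and the `RELATED_SUPPORT` mapping are hypothetical names.

```python
ALGORITHMS = ["CNN-QR", "DeepAR+", "Prophet", "NPTS", "ARIMA", "ETS"]

# Algorithms that use each kind of related time series, per the table above.
RELATED_SUPPORT = {
    "historical": {"CNN-QR"},
    "forward-looking": {"CNN-QR", "DeepAR+", "Prophet"},
}

def algorithms_using_related(kind):
    """List the algorithms that use the given kind of related time series."""
    return [name for name in ALGORITHMS if name in RELATED_SUPPORT[kind]]

print(algorithms_using_related("historical"))       # prints: ['CNN-QR']
print(algorithms_using_related("forward-looking"))  # prints: ['CNN-QR', 'DeepAR+', 'Prophet']
```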

# Using Item Metadata Datasets
<a name="item-metadata-datasets"></a>

An *item metadata dataset* contains categorical data that provides valuable context for the items in a target time-series dataset. Unlike related time-series datasets, item metadata datasets provide information that is static. That is, the data values remain constant over time, like an item's color or brand. Item metadata datasets are optional additions to your dataset groups. You can use an item metadata dataset only if every item in your target time-series dataset is present in the corresponding item metadata dataset.

Item metadata might include the brand, color, model, category, place of origin, or other supplemental feature of a particular item. For example, an item metadata dataset might provide context for some of the demand data found in a target time-series dataset that represents the sales of black Amazon e-readers with 32 GB of storage. Because these characteristics don't change from day-to-day or hour-to-hour, they belong in an item metadata dataset.

Item metadata is useful for discovering and tracking descriptive patterns across your time-series data. If you include an item metadata dataset in your dataset group, Forecast can train the model to make more accurate predictions based on similarities across items. For example, you might find that virtual assistant products made by Amazon are more likely to sell out than those created by other companies, and then plan your supply chain accordingly.

Item metadata is especially useful in coldstart forecasting scenarios, in which you have no historical data with which to make predictions, but do have historical data on items with similar metadata attributes. The item metadata enables Forecast to leverage similar items to your coldstart items to produce a forecast.

When you include item metadata, Forecast creates coldstart forecasts based on similar time series, which can create a more accurate forecast. Coldstart forecasts are generated for items that are in the item metadata dataset but not in the trailing time series. First, Forecast generates forecasts for the non-coldstart items, which are items with historical data in the trailing time series. Next, for each coldstart item, its nearest neighbors are found using the item metadata dataset. Then, these nearest neighbors are used to create a coldstart forecast.
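
As a rough illustration of which items are treated as coldstart items, the following Python sketch compares the item identifiers in the two datasets. It assumes that the time series referred to above is the target time series, and it does not attempt to reproduce the nearest-neighbor step. The `split_items` helper is hypothetical.

```python
def split_items(target_item_ids, metadata_item_ids):
    """Compare item identifiers in the target time series and item metadata datasets."""
    target = set(target_item_ids)
    metadata = set(metadata_item_ids)
    return {
        # Items with history but no metadata: the metadata dataset must cover these.
        "missing_metadata": sorted(target - metadata),
        # Items with metadata but no history: candidates for coldstart forecasts.
        "coldstart": sorted(metadata - target),
    }

result = split_items(["1", "2"], ["1", "2", "3"])
print(result)  # prints: {'missing_metadata': [], 'coldstart': ['3']}
```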

Each row in an item metadata dataset can contain up to 10 metadata fields, one of which must be an identification field to match the metadata to an item in the target time series. As with all dataset types, the values of each field are designated by a dataset schema.

**Python notebooks**  
For a step-by-step guide on using item metadata, see [Incorporating Item Metadata](https://github.com/aws-samples/amazon-forecast-samples/blob/master/notebooks/advanced/Incorporating_Item_Metadata_Dataset_to_your_Predictor/Incorporating_Item_Metadata_Dataset_to_your_Predictor.ipynb).

**Topics**
+ [Example: Item Metadata File and Schema](#item-metadata-example)
+ [Legacy Predictors and Item Metadata](#item-metadata-legacy)
+ [See Also](#item-metadata-see-also)

## Example: Item Metadata File and Schema
<a name="item-metadata-example"></a>

The following table shows a section of a correctly configured item metadata dataset file that describes Amazon e-readers. For this example, assume that the header row represents the dataset's schema, and that each listed item is in a corresponding target time-series dataset.


| `item_id` | `brand` | `model` | `color` | `waterproof` | 
| --- | --- | --- | --- | --- | 
| 1 | amazon | paperwhite | black | yes | 
| 2 | amazon | paperwhite | blue | yes | 
| 3 | amazon | base_model | black | no | 
| 4 | amazon | base_model | white | no | 
| ... | 

The following is the same information represented in CSV format.

```
1,amazon,paperwhite,black,yes
2,amazon,paperwhite,blue,yes
3,amazon,base_model,black,no
4,amazon,base_model,white,no
...
```

The following is the schema for this example dataset.

```
{
     "attributes": [
        {
           "AttributeName": "item_id",
           "AttributeType": "string"
        },
        {
           "AttributeName": "brand",
           "AttributeType": "string"
        },
        {
           "AttributeName": "model",
           "AttributeType": "string"
        },
        {
           "AttributeName": "color",
           "AttributeType": "string"
        },
        {
           "AttributeName": "waterproof",
           "AttributeType": "string"
        }
    ]
}
```
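
Because every item metadata field uses the `string` attribute type, a schema like this one can be generated from a list of column names. The following Python sketch shows one way to do that; the `item_metadata_schema` helper is hypothetical, and the 10-field limit comes from the description of item metadata datasets earlier on this page.

```python
import json

MAX_METADATA_FIELDS = 10  # limit stated for item metadata datasets

def item_metadata_schema(column_names):
    """Build a schema in which every column is a string attribute."""
    if len(column_names) > MAX_METADATA_FIELDS:
        raise ValueError(f"at most {MAX_METADATA_FIELDS} metadata fields are allowed")
    return {
        "attributes": [
            {"AttributeName": name, "AttributeType": "string"}
            for name in column_names
        ]
    }

generated = item_metadata_schema(["item_id", "brand", "model", "color", "waterproof"])
print(json.dumps(generated, indent=2))
```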

## Legacy Predictors and Item Metadata
<a name="item-metadata-legacy"></a>

**Note**  
To upgrade an existing predictor to AutoPredictor, see [Upgrading to AutoPredictor](howitworks-predictor.md#upgrading-autopredictor).

When using a legacy predictor, you can use item metadata when training a predictor with the [CNN-QR](aws-forecast-algo-cnnqr.md) or [DeepAR+](aws-forecast-recipe-deeparplus.md) algorithms. When using AutoML, you can provide item metadata, and Forecast uses it only where applicable.

## See Also
<a name="item-metadata-see-also"></a>

For an in-depth walkthrough on using item metadata datasets, see [Incorporating Item Metadata Datasets into Your Predictor](https://github.com/aws-samples/amazon-forecast-samples/blob/master/notebooks/advanced/Incorporating_Item_Metadata_Dataset_to_your_Predictor/Incorporating_Item_Metadata_Dataset_to_your_Predictor.ipynb) in the [Amazon Forecast Samples GitHub Repository](https://github.com/aws-samples/amazon-forecast-samples).

# Predefined Dataset Domains and Dataset Types
<a name="howitworks-domains-ds-types"></a>

To train a predictor, you create one or more datasets, add them to a dataset group, and provide the dataset group for training.

For each dataset that you create, you associate a dataset domain and a dataset type. A *dataset domain* specifies a pre-defined dataset schema for a common use case, and does not impact model algorithms or hyperparameters.

Amazon Forecast supports the following dataset domains:
+ [RETAIL Domain](retail-domain.md) – For retail demand forecasting
+ [INVENTORY_PLANNING Domain](inv-planning-domain.md) – For supply chain and inventory planning
+ [EC2 CAPACITY Domain](ec2-capacity-domain.md) – For forecasting Amazon Elastic Compute Cloud (Amazon EC2) capacity 
+ [WORK_FORCE Domain](workforce-domain.md) – For work force planning 
+ [WEB_TRAFFIC Domain](webtraffic-domain.md) – For estimating future web traffic 
+ [METRICS Domain](metrics-domain.md) – For forecasting metrics, such as revenue and cash flow
+ [CUSTOM Domain](custom-domain.md) – For all other types of time-series forecasting

Each domain can have one to three *dataset types*. The dataset types that you create for a domain are based on the type of data that you have and what you want to include in training.

Each domain requires a target time series dataset, and optionally supports the related time series and item metadata dataset types.

The dataset types are:
+ Target time series – The only required dataset type. This type defines the *target* field that you want to generate forecasts for. For example, if you want to forecast the sales for a set of products, then you must create a dataset of historical time-series data for each of the products that you want to forecast. Similarly, you can create a target time series dataset for metrics, such as revenue, cash flow, and sales, that you might want to forecast.
+ Related time series – Time-series data that is related to the target time series data. For example, price is related to product sales data, so you might provide it as a related time series.
+ Item metadata – Metadata that is applicable to the target time-series data. For example, if you are forecasting sales for a particular product, attributes of the product—such as brand, color, and genre—will be part of item metadata. When predicting EC2 capacity for EC2 instances, metadata might include the CPU and memory of the instance types.

For each dataset type, your input data must contain certain required fields. You can also include optional fields that Amazon Forecast suggests that you include.

The following examples show how to choose a dataset domain and corresponding dataset types.

**Example 1: Dataset Types in the RETAIL Domain**  
If you are a retailer interested in forecasting demand for items, you might create the following datasets in the RETAIL domain:  
+ Target time series is the required dataset of historical time-series demand (sales) data for each item (each product a retailer sells). In the RETAIL domain, this dataset type requires that the dataset includes the `item_id`, `timestamp`, and the `demand` fields. The `demand` field is the forecast target, and is typically the number of items sold by the retailer in a particular week or day.
+ Optionally, a dataset of the related time series type. In the RETAIL domain, this type can include optional, but suggested, time-series information such as `price`, `inventory_onhand`, and `webpage_hits`.
+ Optionally, a dataset of the item metadata type. In the RETAIL domain, Amazon Forecast suggests providing metadata information related to the items that you provided in target time series, such as `brand`, `color`, `category`, and `genre`.

**Example 2: Dataset Types in the METRICS Domain**  
If you want to forecast key metrics for your organization—such as revenue, sales and cash flow—you can provide Amazon Forecast with the following datasets:  
+ The target time series dataset that provides historical time-series data for the metric that you want to forecast. If your interest is to forecast the revenue of all of the business units in your organization, you can create a `target time series` dataset with the `metric`, `business unit`, and `metric_value` fields.
+ If you have any metadata for each metric that isn't required, such as `category` or `location`, you might provide datasets of the related time series and item metadata type.
At a minimum, you must provide a target time series dataset for Forecast to generate forecasts for your target metrics.

**Example 3: Dataset Types in the CUSTOM Domain**  
The training data for your forecasting application might not fit into any of the Amazon Forecast domains. If that's the case, choose the CUSTOM domain. You must provide the target time series dataset, but you can add your own custom fields.  
The [Getting Started](getting-started.md) exercise forecasts electricity usage for a client. The electricity usage training data doesn't fit into any of the dataset domains, so we used the CUSTOM domain. In the exercise, we use only one dataset type, the target time series type. We map the data fields to the minimum fields required by the dataset type.

# RETAIL Domain
<a name="retail-domain"></a>

The RETAIL domain supports the following dataset types. For each dataset type, we list required and optional fields. For information on how to map the fields to columns in your training data, see [Dataset Domains and Dataset Types](howitworks-datasets-groups.md#howitworks-dataset-domainstypes).

**Topics**
+ [Target Time Series Dataset Type](#target-time-series-type-retail-domain)
+ [Related Time Series Dataset Type](#related-time-series-type-retail-domain)
+ [Item Metadata Dataset Type](#item-metadata-type-retail-domain)

## Target Time Series Dataset Type
<a name="target-time-series-type-retail-domain"></a>

The target time series is the historical time series data for each item or product sold by the retail organization. The following fields are required: 
+ `item_id` (string) – A unique identifier for the item or product that you want to predict the demand for.
+ `timestamp` (timestamp)
+ `demand` (float) – The number of sales for that item at the timestamp. This is also the *target* field for which Amazon Forecast generates a forecast.

The following dimension is optional and can be used to change forecasting granularity:
+ `location` (string) – The location of the store where the item was sold. Use this dimension only if you have multiple stores or locations.

Ideally, only these required fields and optional dimensions should be included. Other additional time series information should be included in a related time series dataset.
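
As a rough illustration, the fields above could be expressed as a schema for the `CreateDataset` operation. This is a minimal sketch that assumes the `Attributes`, `AttributeName`, and `AttributeType` schema layout. The helper name `retail_target_schema` is hypothetical and not part of any Forecast SDK.

```python
import json


def retail_target_schema(include_location=False):
    """Build a schema dict for a RETAIL target time series dataset.

    The attribute order must match the column order of the input file.
    """
    attributes = [
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "demand", "AttributeType": "float"},
    ]
    if include_location:
        # Optional dimension; only useful with multiple stores or locations.
        attributes.append({"AttributeName": "location", "AttributeType": "string"})
    return {"Attributes": attributes}


if __name__ == "__main__":
    print(json.dumps(retail_target_schema(include_location=True), indent=2))
```

The resulting dictionary could then be passed as the schema when you create the dataset.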

## Related Time Series Dataset Type
<a name="related-time-series-type-retail-domain"></a>

You can provide Amazon Forecast with related time series datasets, such as the price or the number of web hits the item received on a particular date. Providing additional relevant information can help improve forecast accuracy. The following fields are required: 
+ `item_id` (string)
+ `timestamp` (timestamp)

The following fields are optional and might be useful in improving forecast results:
+ `price` (float) – The price of the item at the time of the timestamp.
+ `promotion_applied` (integer; 1=true, 0=false) – A flag that specifies whether there was a marketing promotion for that item at the timestamp.

In addition to the required and suggested optional fields, your training data can include other fields. To include other fields in the dataset, provide the fields in a schema when you create the dataset.

## Item Metadata Dataset Type
<a name="item-metadata-type-retail-domain"></a>

This dataset provides Amazon Forecast with information about metadata (attributes) of the items whose demand is being forecast. The following fields are required:
+ `item_id` (string)

The following fields are optional and might be useful in improving forecast results:
+ `category` (string)
+ `brand` (string)
+ `color` (string)
+ `genre` (string)

In addition to the required and suggested optional fields, your training data can include other fields. To include other fields in the dataset, provide the fields in a schema when you create the dataset.

# CUSTOM Domain
<a name="custom-domain"></a>

The CUSTOM domain supports the following dataset types. For each dataset type, we list required and optional fields. For information on how to map the fields to columns in your training data, see [Dataset Domains and Dataset Types](howitworks-datasets-groups.md#howitworks-dataset-domainstypes).

**Topics**
+ [Target Time Series Dataset Type](#target-time-series-type-custom-domain)
+ [Related Time Series Dataset Type](#related-time-series-type-custom-domain)
+ [Item Metadata Dataset Type](#item-metadata-type-custom-domain)

## Target Time Series Dataset Type
<a name="target-time-series-type-custom-domain"></a>

The following fields are required:
+ `item_id` (string)
+ `timestamp` (timestamp)
+ `target_value` (float) – This is the `target` field for which Amazon Forecast generates a forecast.

Ideally, only these required fields should be included. Other additional time series information should be included in a related time series dataset.

## Related Time Series Dataset Type
<a name="related-time-series-type-custom-domain"></a>

The following fields are required:
+ `item_id` (string)
+ `timestamp` (timestamp)

In addition to the required fields, your training data can include other fields. To include other fields in the dataset, provide the fields in a schema when you create the dataset.

## Item Metadata Dataset Type
<a name="item-metadata-type-custom-domain"></a>

The following field is required:
+ `item_id` (string)

The following field is optional and might be useful in improving forecast results:
+ `category` (string)

In addition to the required and suggested optional fields, your training data can include other fields. To include other fields in the dataset, provide the fields in a schema when you create the dataset.

# INVENTORY\_PLANNING Domain
<a name="inv-planning-domain"></a>

Use the INVENTORY\_PLANNING domain for forecasting demand for raw materials and determining how much inventory of a particular item to stock. It supports the following dataset types. For each dataset type, we list required and optional fields. For information on how to map the fields to columns in your training data, see [Dataset Domains and Dataset Types](howitworks-datasets-groups.md#howitworks-dataset-domainstypes).

**Topics**
+ [Target Time Series Dataset Type](#target-time-series-type-inv-planning-domain)
+ [Related Time Series Dataset Type](#related-time-series-type-related-time-series-domain)
+ [Item Metadata Dataset Type](#item-metadata-type-related-time-series-domain)

## Target Time Series Dataset Type
<a name="target-time-series-type-inv-planning-domain"></a>

The following fields are required: 
+ `item_id` (string)
+ `timestamp` (timestamp)
+ `demand` (float) – This is the `target` field for which Amazon Forecast generates a forecast.

The following dimension is optional and can be used to change forecasting granularity:
+ `location` (string) – The location of the distribution center where the item is stocked. This should only be used if you have multiple stores/locations.

Ideally, only these required fields and optional dimensions should be included. Other additional time series information should be included in a related time series dataset.

## Related Time Series Dataset Type
<a name="related-time-series-type-related-time-series-domain"></a>

The following fields are required: 
+ `item_id` (string)
+ `timestamp` (timestamp)

The following fields are optional and might be useful in improving forecast results:
+ `price` (float) – The price of the item.

In addition to the required and suggested optional fields, your training data can include other fields. To include other fields in the dataset, provide the fields in a schema when you create the dataset.

## Item Metadata Dataset Type
<a name="item-metadata-type-related-time-series-domain"></a>

The following fields are required: 
+ `item_id` (string)

The following fields are optional and might be useful in improving forecast results:
+ `category` (string) – The category of the item.
+ `brand` (string) – The brand of the item.
+ `lead_time` (string) – The lead time, in days, to manufacture the item.
+ `order_cycle` (string) – The order cycle starts when work begins and ends when the item is ready for delivery.
+ `safety_stock` (string) – The minimum amount of stock to keep on hand for that item.

In addition to the required and suggested optional fields, your training data can include other fields. To include other fields in the dataset, provide the fields in a schema when you create the dataset.

# EC2 CAPACITY Domain
<a name="ec2-capacity-domain"></a>

Use the EC2 CAPACITY domain for forecasting Amazon EC2 capacity. It supports the following dataset types. For each dataset type, we list required and optional fields. For information on how to map the fields to columns in your training data, see [Dataset Domains and Dataset Types](howitworks-datasets-groups.md#howitworks-dataset-domainstypes).

## Target Time Series Dataset Type
<a name="target-time-series-type-ec2-capacity-domain"></a>

The following fields are required:
+ `instance_type` (string) – The type of instance (for example, c5.xlarge).
+ `timestamp` (timestamp)
+ `number_of_instances` (integer) – The number of instances of that particular instance type that were consumed at the timestamp. This is the `target` field for which Amazon Forecast generates a forecast.

The following dimension is optional and can be used to change forecasting granularity:
+ `location` (string) – You can provide an AWS Region, such as us-west-2 or us-east-1. This should only be used if you're modeling multiple Regions.

Ideally, only these required and suggested optional fields should be included. Other additional time series information should be included in a related time series dataset.

## Related Time Series Dataset Type
<a name="related-time-series-type-ec2-capacity-domain"></a>

The following fields are required: 
+ `instance_type` (string)
+ `timestamp` (timestamp)

In addition to the required fields, your training data can include other fields. To include other fields in the dataset, provide the fields in a schema when you create the dataset.

# WORK\_FORCE Domain
<a name="workforce-domain"></a>

Use the WORK\_FORCE domain to forecast workforce demand. It supports the following dataset types. For each dataset type, we list required and optional fields. For information on how to map the fields to columns in your training data, see [Dataset Domains and Dataset Types](howitworks-datasets-groups.md#howitworks-dataset-domainstypes).

**Topics**
+ [Target Time Series Dataset Type](#target-time-series-type-workforce-domain)
+ [Related Time Series Dataset Type](#related-time-series-type-workforce-domain)
+ [Item Metadata Dataset Type](#item-metadata-type-workforce-domain)

## Target Time Series Dataset Type
<a name="target-time-series-type-workforce-domain"></a>

The following fields are required: 
+ `workforce_type` (string) – The type of work force labor being forecast. For example, call center demand or fulfillment center labor demand.
+ `timestamp` (timestamp)
+ `workforce_demand` (float) – This is the `target` field for which Amazon Forecast generates a forecast.

The following dimension is optional and can be used to change forecasting granularity:
+ `location` (string) – The location where the workforce resources are sought. Use this dimension only if you have multiple stores or locations.

Ideally, only these required fields and optional dimensions should be included. Other additional time series information should be included in a related time series dataset.

## Related Time Series Dataset Type
<a name="related-time-series-type-workforce-domain"></a>

The following fields are required: 
+ `workforce_type` (string)
+ `timestamp` (timestamp)

In addition to the required fields, your training data can include other fields. To include other fields in the dataset, provide the fields in a schema when you create the dataset.

## Item Metadata Dataset Type
<a name="item-metadata-type-workforce-domain"></a>

The following field is required: 
+ `workforce_type` (string)

The following fields are optional and might be useful in improving forecast results:
+ `wages` (float) – The average wages for that particular workforce type.
+ `shift_length` (string) – The length of the shift.
+ `location` (string) – The location of the workforce.

In addition to the required and suggested optional fields, your training data can include other fields. To include other fields in the dataset, provide the fields in a schema when you create the dataset.

# WEB\_TRAFFIC Domain
<a name="webtraffic-domain"></a>

Use the WEB\_TRAFFIC domain to forecast web traffic to a web property or a set of web properties. It supports the following dataset types. For each dataset type, we list required and optional fields. For information on how to map the fields to columns in your training data, see [Dataset Domains and Dataset Types](howitworks-datasets-groups.md#howitworks-dataset-domainstypes).

**Topics**
+ [Target Time Series Dataset Type](#target-time-series-type-webtraffic-domain)
+ [Related Time Series Dataset Type](#related-time-series-type-webtraffic-domain)
+ [Item Metadata Dataset Type](#idem-metadata-type-webtraffic-domain)

## Target Time Series Dataset Type
<a name="target-time-series-type-webtraffic-domain"></a>

The following fields are required: 
+ `item_id` (string) – A unique identifier for each web property being forecast.
+ `timestamp` (timestamp)
+ `value` (float) – This is the `target` field for which Amazon Forecast generates a forecast.

Ideally, only these required fields should be included. Other additional time series information should be included in a related time series dataset.

## Related Time Series Dataset Type
<a name="related-time-series-type-webtraffic-domain"></a>

The following fields are required: 
+ `item_id` (string)
+ `timestamp` (timestamp)

In addition to the required fields, your training data can include other fields. To include other fields in the dataset, provide the fields in a schema when you create the dataset.

## Item Metadata Dataset Type
<a name="idem-metadata-type-webtraffic-domain"></a>

The following field is required: 
+ `item_id` (string)

The following field is optional and might be useful in improving forecast results:
+ `category` (string)

In addition to the required and suggested optional fields, your training data can include other fields. To include other fields in the dataset, provide the fields in a schema when you create the dataset.

# METRICS Domain
<a name="metrics-domain"></a>

Use the METRICS domain for forecasting metrics, such as revenue, sales, and cash flow. It supports the following dataset types. For each dataset type, we list required and optional fields. For information on how to map the fields to columns in your training data, see [Dataset Domains and Dataset Types](howitworks-datasets-groups.md#howitworks-dataset-domainstypes).

**Topics**
+ [Target Time Series Dataset Type](#target-time-series-type-metrics-domain)
+ [Related Time Series Dataset Type](#related-time-series-type-metrics-domain)
+ [Item Metadata Dataset Type](#item-metadata-type-metrics-domain)

## Target Time Series Dataset Type
<a name="target-time-series-type-metrics-domain"></a>

The following fields are required: 
+ `metric_name` (string)
+ `timestamp` (timestamp)
+ `metric_value` (float) – This is the `target` field for which Amazon Forecast generates a forecast (for example, the amount of revenue generated on a particular day).

Ideally, only these required fields should be included. Other additional time series information should be included in a related time series dataset.

## Related Time Series Dataset Type
<a name="related-time-series-type-metrics-domain"></a>

The following fields are required: 
+ `metric_name` (string)
+ `timestamp` (timestamp)

In addition to the required fields, your training data can include other fields. To include other fields in the dataset, provide the fields in a schema when you create the dataset.

## Item Metadata Dataset Type
<a name="item-metadata-type-metrics-domain"></a>

The following field is required: 
+ `metric_name` (string)

The following field is optional and might be useful in improving forecast results:
+ `category` (string)

In addition to the required and suggested optional fields, your training data can include other fields. To include other fields in the dataset, provide the fields in a schema when you create the dataset.
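
To summarize the required target time series fields across the domains described above, the following sketch collects them in a lookup table and checks a schema against it. The dictionary keys are assumed to match the domain names used when creating a dataset, and the helper `missing_required_fields` is a hypothetical name used only for illustration.

```python
# Required target time series fields per domain, as listed in the sections above.
# The domain keys are an assumption about how each domain is spelled in requests.
REQUIRED_TARGET_FIELDS = {
    "RETAIL": ["item_id", "timestamp", "demand"],
    "CUSTOM": ["item_id", "timestamp", "target_value"],
    "INVENTORY_PLANNING": ["item_id", "timestamp", "demand"],
    "EC2_CAPACITY": ["instance_type", "timestamp", "number_of_instances"],
    "WORK_FORCE": ["workforce_type", "timestamp", "workforce_demand"],
    "WEB_TRAFFIC": ["item_id", "timestamp", "value"],
    "METRICS": ["metric_name", "timestamp", "metric_value"],
}


def missing_required_fields(domain, schema):
    """Return the required target fields that are absent from the schema."""
    names = {attr["AttributeName"] for attr in schema["Attributes"]}
    return [field for field in REQUIRED_TARGET_FIELDS[domain] if field not in names]


if __name__ == "__main__":
    schema = {
        "Attributes": [
            {"AttributeName": "metric_name", "AttributeType": "string"},
            {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        ]
    }
    print(missing_required_fields("METRICS", schema))  # ['metric_value']
```

A check like this can catch a missing field locally before you attempt to create the dataset.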

# Updating Data
<a name="updating-data"></a>

As you collect new data, you will want to import that data into Forecast. To do so, you have two options: replacement and incremental updates. A replacement dataset import job overwrites all existing data with the newly imported data. An incremental update appends the newly imported data to the dataset.

After importing your new data, you can use an existing predictor to generate a forecast for that data.

**Topics**
+ [Import modes](#idsi)
+ [Updating existing datasets](#idsi-console)
+ [Updating forecasts](#update-data-new-forecasts)

## Import modes
<a name="idsi"></a>

To configure how Amazon Forecast adds new data to an existing dataset, you specify the import mode for your dataset import job. The default import mode is `FULL`. You can only configure the import mode by using the Amazon Forecast API.
+ To overwrite all existing data in your dataset, specify `FULL` in the [CreateDatasetImportJob](API_CreateDatasetImportJob.md) API operation.
+ To append the records to the existing data in your dataset, specify `INCREMENTAL` in the [CreateDatasetImportJob](API_CreateDatasetImportJob.md) API operation. If an existing record and an imported record have the same timeseries ID (item ID, dimension, and timestamp), then the existing record is replaced with the newly imported record. Amazon Forecast always uses the record with the most recent timestamp.

If you have not imported a dataset, the incremental option is not available. The default import mode is a full replacement.
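
The difference between the two import modes can be sketched with an in-memory model. This is only an illustration of the behavior described above, not how Forecast stores data. The function `apply_import` and the use of `location` as the example dimension are assumptions made for this sketch.

```python
def record_key(record):
    """Time series ID used to detect duplicates: item ID, dimension, and timestamp."""
    return (record["item_id"], record.get("location"), record["timestamp"])


def apply_import(existing, incoming, mode="FULL"):
    """Return the dataset contents after an import job (in-memory model only).

    existing is None if no data has been imported yet.
    """
    if mode == "FULL":
        # A full import replaces everything that was in the dataset.
        return list(incoming)
    if mode != "INCREMENTAL":
        raise ValueError(f"Unknown import mode: {mode}")
    if existing is None:
        raise ValueError("INCREMENTAL is not available before the first import")
    merged = {record_key(r): r for r in existing}
    for record in incoming:
        # An imported record replaces an existing record with the same key.
        merged[record_key(record)] = record
    return list(merged.values())


if __name__ == "__main__":
    old = [
        {"item_id": "A", "timestamp": "2019-08-21", "demand": 5},
        {"item_id": "A", "timestamp": "2019-08-22", "demand": 7},
    ]
    new = [
        {"item_id": "A", "timestamp": "2019-08-22", "demand": 9},
        {"item_id": "A", "timestamp": "2019-08-23", "demand": 4},
    ]
    print(len(apply_import(old, new, "FULL")))         # 2
    print(len(apply_import(old, new, "INCREMENTAL")))  # 3
```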

### Incremental import mode guidelines
<a name="idsi-incremental"></a>

When you perform an incremental dataset import, you cannot change the timestamp format, data format, or geolocation data. To change any of these items, you need to perform a full dataset import.

## Updating existing datasets
<a name="idsi-console"></a>

**Important**  
By default, a dataset import job replaces any existing data in the dataset that you imported into. You can change this by specifying the dataset import job's [Import modes](#idsi).

To update a dataset, create a dataset import job for the dataset and specify the import mode.

------
#### [ CLI ]

To update a dataset, use the `create-dataset-import-job` command. For `--import-mode`, specify `FULL` to replace existing data or `INCREMENTAL` to add to it. For more information, see [Import modes](#idsi).

The following code shows how to create a dataset import job that incrementally imports new data into a dataset.

```
aws forecast create-dataset-import-job \
                        --dataset-import-job-name dataset_import_job_name \
                        --dataset-arn dataset_arn \
                        --data-source '{"S3Config":{"KMSKeyArn":"string", "Path":"string", "RoleArn":"string"}}' \
                        --import-mode INCREMENTAL
```

------
#### [ Python ]

To update a dataset, use the `create_dataset_import_job` method. For `ImportMode`, specify `FULL` to replace existing data or `INCREMENTAL` to add to it. For more information, see [Import modes](#idsi).

```
import boto3

forecast = boto3.client('forecast')

# boto3 parameter names are case sensitive and use PascalCase.
response = forecast.create_dataset_import_job(
    DatasetImportJobName = 'YourImportJob',
    DatasetArn = 'dataset_arn',
    DataSource = {"S3Config":{"KMSKeyArn":"string", "Path":"string", "RoleArn":"string"}},
    ImportMode = 'INCREMENTAL'  # use 'FULL' to replace existing data instead
)
```

------

## Updating forecasts
<a name="update-data-new-forecasts"></a>

As you collect new data, you might want to use it to generate new forecasts. Forecast does not automatically retrain a predictor when you import an updated dataset. You can generate a new forecast from the updated data with your existing predictor, without training a new predictor. For instance, if you collect daily sales data and want to include new data points in your forecast, you could import the updated data and generate a forecast from it. For newly imported data to influence the trained model itself, you must manually retrain the predictor.

**To generate a forecast from the new data:**

1. Upload the new data to an Amazon S3 bucket. Your new data should contain only the data added since your last dataset import.

1. Create an **Incremental** dataset import job with the new data. The new data is appended to the existing data and the forecast is generated from the updated data. If your new data file contains both previously imported data and new data, create a **Full** dataset import job.

1. Create a new forecast using the existing predictor.

1. Retrieve the forecast as usual.
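
The steps above can be sketched with the AWS SDK for Python (Boto3). This is a minimal sketch that assumes the new data is already uploaded to Amazon S3 (step 1) and that `client` is a Forecast client such as `boto3.client('forecast')`. The function name `import_and_forecast` and its arguments are hypothetical.

```python
import time


def import_and_forecast(client, dataset_arn, predictor_arn, s3_path, role_arn,
                        job_name, forecast_name, poll_seconds=30):
    """Incrementally import new data, then create a forecast with an existing predictor."""
    # Step 2: create an incremental dataset import job for the new data.
    job = client.create_dataset_import_job(
        DatasetImportJobName=job_name,
        DatasetArn=dataset_arn,
        DataSource={"S3Config": {"Path": s3_path, "RoleArn": role_arn}},
        ImportMode="INCREMENTAL",
    )
    job_arn = job["DatasetImportJobArn"]

    # Wait for the import job to finish before requesting a forecast.
    while True:
        status = client.describe_dataset_import_job(DatasetImportJobArn=job_arn)["Status"]
        if status == "ACTIVE":
            break
        if status in ("CREATE_FAILED", "CREATE_STOPPED"):
            raise RuntimeError(f"Dataset import job ended with status {status}")
        time.sleep(poll_seconds)

    # Step 3: create a new forecast using the existing predictor.
    forecast = client.create_forecast(
        ForecastName=forecast_name,
        PredictorArn=predictor_arn,
    )
    return forecast["ForecastArn"]
```

The returned forecast ARN can then be used to retrieve the forecast as usual (step 4). A production version would likely add a timeout to the polling loop.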

# Handling Missing Values
<a name="howitworks-missing-values"></a>

A common issue in time-series forecasting data is the presence of missing values. Your data might contain missing values for a number of reasons, including measurement failures, formatting problems, human errors, or a lack of information to record. For instance, if you're forecasting product demand for a retail store and an item is sold out or unavailable, there would be no sales data to record while that item is out of stock. If prevalent enough, missing values can significantly impact a model's accuracy.

Amazon Forecast provides a number of filling methods to handle missing values in your target time series and related time series datasets. Filling is the process of adding standardized values to missing entries in your dataset.

Forecast supports the following filling methods:
+ **Middle filling** – Fills any missing values between the item start and item end date of a dataset.
+ **Back filling** – Fills any missing values between the last recorded data point and global end date of a dataset.
+ **Future filling (related time series only)** – Fills any missing values between the global end date and the end of the forecast horizon.

The following image provides a visual representation of different filling methods.

![\[Timeline showing three items with varying durations and fill methods between global start and end dates.\]](http://docs.aws.amazon.com/forecast/latest/dg/images/Filling_types.PNG)
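
To make the distinction between middle filling and back filling concrete, the following sketch applies both to a single item. It is an illustration only and assumes time is modeled as consecutive integer steps. The function name `fill_series` and its parameters are hypothetical. Future filling is omitted, and steps before the item's first record are left untouched.

```python
def fill_series(observed, global_end, middle_value=0.0, back_value=0.0):
    """Fill gaps for one item.

    observed: dict mapping an integer time step to a recorded value.
    global_end: last time step across all items in the dataset.
    """
    item_start, item_end = min(observed), max(observed)
    filled = {}
    for step in range(item_start, global_end + 1):
        if step in observed:
            filled[step] = observed[step]
        elif step < item_end:
            # Middle filling: a gap between the item's first and last record.
            filled[step] = middle_value
        else:
            # Back filling: after the item's last record, up to the global end date.
            filled[step] = back_value
    return filled


if __name__ == "__main__":
    print(fill_series({1: 5, 3: 7}, global_end=5, middle_value=0, back_value=-1))
    # {1: 5, 2: 0, 3: 7, 4: -1, 5: -1}
```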


## Choosing Filling Logic
<a name="choosing-missing-values"></a>

When choosing a filling logic, you should consider how the logic will be interpreted by your model. For instance, in a retail scenario, recording 0 sales of an available item is different from recording 0 sales of an unavailable item, as the latter does not imply a lack of customer interest in the item. Because of this, `0` filling in the target time series might cause the predictor to be under-biased in its predictions, while `NaN` filling might ignore actual occurrences of available items recording 0 sales and cause the predictor to be over-biased.

The following time-series graphs illustrate how choosing the wrong filling value can significantly affect the accuracy of your model. Graphs A and B plot the demand for an item that is partially out-of-stock, with the black lines representing actual sales data. Missing values in A1 are filled with `0`, leading to relatively under-biased predictions (represented by the dotted lines) in A2. In contrast, missing values in B1 are filled with `NaN`, which leads to predictions that are more exact in B2.

![\[Time-series graphs comparing item demand predictions with different filling values for missing data.\]](http://docs.aws.amazon.com/forecast/latest/dg/images/filling_values.PNG)


For a list of supported filling logic, see the following section.

## Target Time Series and Related Time Series Filling Logic
<a name="filling-restrictions"></a>

You can perform filling on both target time series and related time series datasets. Each dataset type has different filling guidelines and restrictions.


**Filling Guidelines**  

| Dataset type | Filling by default? | Supported filling methods | Default filling logic | Accepted filling logic | 
| --- | --- | --- | --- | --- | 
| Target time series | Yes | Middle and back filling | 0 |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/forecast/latest/dg/howitworks-missing-values.html)  | 
| Related time series | No | Middle, back, and future filling | No default |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/forecast/latest/dg/howitworks-missing-values.html)  | 

**Important**  
For both target and related time series datasets, `mean`, `median`, `min`, and `max` are calculated based on a rolling window of the 64 most recent data entries before the missing values.
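
As an illustration of the rolling window described above, the following sketch computes a fill value from the 64 most recent entries that precede a missing value. It is a simplified model of the rule, not the service's implementation, and the name `fill_value` is hypothetical.

```python
import statistics

WINDOW = 64  # number of most recent data entries considered

FILL_FUNCTIONS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
    "max": max,
}


def fill_value(history, logic):
    """Compute a fill value from the entries recorded before the missing value.

    history: recorded values in chronological order (oldest first).
    logic: one of "mean", "median", "min", or "max".
    """
    window = history[-WINDOW:]  # keep only the most recent entries
    return FILL_FUNCTIONS[logic](window)


if __name__ == "__main__":
    history = list(range(100))          # 0, 1, ..., 99
    print(fill_value(history, "min"))   # 36, the oldest value inside the window
    print(fill_value(history, "max"))   # 99
```

Note that values older than the window, such as 0 through 35 in this example, do not influence the result.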

## Missing Value Syntax
<a name="filling-syntax"></a>

To perform missing value filling, specify the types of filling to implement when you call the [CreatePredictor](API_CreatePredictor.md) operation. Filling logic is specified in [FeaturizationMethod](API_FeaturizationMethod.md) objects.

The following excerpt demonstrates a correctly formatted `FeaturizationMethod` object for a target time series attribute and related time series attribute (`target_value` and `price` respectively).

To set a filling method to a specific value, set the fill parameter to `value` and define the value in a corresponding `_value` parameter. As shown below, backfilling for the related time series is set to a value of 2 with the following: `"backfill": "value"` and `"backfill_value":"2"`. 

```
[
    {
        "AttributeName": "target_value",
        "FeaturizationPipeline": [
            {
                "FeaturizationMethodName": "filling",
                "FeaturizationMethodParameters": {
                    "aggregation": "sum",
                    "middlefill": "zero",
                    "backfill": "zero"
                }
            }
        ]
    },
    {
        "AttributeName": "price",
        "FeaturizationPipeline": [
            {
                "FeaturizationMethodName": "filling",
                "FeaturizationMethodParameters": {
                    "middlefill": "median",
                    "backfill": "value",
                    "backfill_value": "2",
                    "futurefill": "max"
                }
            }
        ]
    }
]
```
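
If you build these objects programmatically, a small helper can apply the `value` and `_value` convention for you. The following is a sketch under that assumption. The function `filling_featurization` is a hypothetical name, and it encodes numeric fills as strings to mirror the excerpt above.

```python
def filling_featurization(attribute_name, **fills):
    """Build one featurization entry that uses the "filling" method.

    Each keyword is a parameter name (for example, middlefill) mapped to either
    a filling logic name such as "median" or a number to fill with.
    """
    params = {}
    for method, logic in fills.items():
        if isinstance(logic, (int, float)):
            # A specific number: set the method to "value" and add <method>_value.
            params[method] = "value"
            params[f"{method}_value"] = str(logic)
        else:
            params[method] = logic
    return {
        "AttributeName": attribute_name,
        "FeaturizationPipeline": [
            {
                "FeaturizationMethodName": "filling",
                "FeaturizationMethodParameters": params,
            }
        ],
    }


if __name__ == "__main__":
    featurizations = [
        filling_featurization("target_value", aggregation="sum",
                              middlefill="zero", backfill="zero"),
        filling_featurization("price", middlefill="median", backfill=2,
                              futurefill="max"),
    ]
    print(featurizations[1]["FeaturizationPipeline"][0]["FeaturizationMethodParameters"])
```

The second entry reproduces the `price` configuration from the excerpt, including `"backfill_value": "2"`.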

# Dataset Guidelines for Forecast
<a name="dataset-import-guidelines-troubleshooting"></a>

Consult the following guidelines if Amazon Forecast fails to import your dataset, or if your dataset doesn't function as expected.

**Timestamp Format**  
For Year (`Y`), Month (`M`), Week (`W`), and Day (`D`) collection frequencies, Forecast supports the `yyyy-MM-dd` timestamp format (for example, `2019-08-21`) and, optionally, the `yyyy-MM-dd HH:mm:ss` format (for example, `2019-08-21 15:00:00`).  
For Hour (`H`) and Minute (for example, `15min`) frequencies, Forecast supports only the `yyyy-MM-dd HH:mm:ss` format (for example, `2019-08-21 15:00:00`).  
Guideline: Change the timestamp format for the collection frequency of your dataset to the supported format.
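
A local check along these lines can catch unsupported timestamps before you import. This is a minimal sketch that assumes the two formats described above. The function name `is_supported_timestamp` is hypothetical, and any frequency other than `Y`, `M`, `W`, or `D` is treated as an hourly or minute-level frequency.

```python
from datetime import datetime

DATE_ONLY = "%Y-%m-%d"            # yyyy-MM-dd
DATE_TIME = "%Y-%m-%d %H:%M:%S"   # yyyy-MM-dd HH:mm:ss
DAILY_OR_COARSER = {"Y", "M", "W", "D"}


def is_supported_timestamp(value, frequency):
    """Return True if value matches a format accepted for the given frequency."""
    if frequency in DAILY_OR_COARSER:
        formats = (DATE_ONLY, DATE_TIME)
    else:
        formats = (DATE_TIME,)
    for fmt in formats:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        # strptime tolerates unpadded fields such as "2019-8-21";
        # round-tripping rejects them.
        if parsed.strftime(fmt) == value:
            return True
    return False


if __name__ == "__main__":
    print(is_supported_timestamp("2019-08-21", "D"))           # True
    print(is_supported_timestamp("2019-08-21", "H"))           # False
    print(is_supported_timestamp("2019-08-21 15:00:00", "H"))  # True
```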

**Amazon S3 File or Bucket**  
When you import a dataset, you can specify either the path to a CSV or Parquet file in your Amazon Simple Storage Service (Amazon S3) bucket that contains your data or the name of the S3 bucket that contains your data. If you specify a CSV or Parquet file, Forecast imports just that file. If you specify an S3 bucket, Forecast imports all of the CSV or Parquet files in the bucket up to 10,000 files. If you import multiple files by specifying a bucket name, all CSV or Parquet files must conform to the specified schema.  
Guideline: Specify a specific file or an S3 bucket using the following syntax:  
`s3://bucket-name/example-object.csv`  
`s3://bucket-name/example-object.parquet`  
`s3://bucket-name/prefix/`  
`s3://bucket-name`  
Parquet files can have the extension .parquet, .parq, .pqt, or no extension at all.

**Full Dataset Updates**  
Your first dataset import is always a full import. Subsequent imports can be either full or incremental updates. You must use the Forecast API to specify the import mode.  
With a full update, all existing data is replaced with the newly imported data. Because full dataset import jobs are not aggregated, your most recent dataset import is the one that is used when training a predictor or generating a forecast.  
Guideline: Create an incremental dataset update to append your new data to the existing data. Otherwise, ensure that your most recent dataset import contains all of the data you want to model, and not just the new data collected since the previous import.

**Incremental Dataset Updates**  
Settings such as the timestamp format, data format, and geolocation format are read from the currently active dataset. You do not need to include this information with an incremental dataset import. If you include these settings, they must match the originally provided values.   
Guideline: Perform a full dataset import to change any of these values.

**Attribute Order**  
The order of attributes specified in the schema definition must match the column order in the CSV or Parquet file that you are importing. For example, if you defined `timestamp` as the first attribute, then `timestamp` must also be the first column in the input file.   
Guideline: Verify that the columns in the input file are in the same order as the schema attributes that you created. 
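
One way to avoid ordering mistakes is to generate the CSV file directly from the schema. The following sketch assumes the `Attributes` and `AttributeName` schema layout and writes no header row, in line with the Dataset Header guideline in this topic. The function name `rows_to_csv` is hypothetical.

```python
import csv
import io


def rows_to_csv(records, schema):
    """Serialize records as CSV with columns in the schema's attribute order."""
    columns = [attr["AttributeName"] for attr in schema["Attributes"]]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for record in records:
        # Pick values by attribute name so the dict's key order does not matter.
        writer.writerow([record[name] for name in columns])
    return buffer.getvalue()


if __name__ == "__main__":
    schema = {
        "Attributes": [
            {"AttributeName": "item_id", "AttributeType": "string"},
            {"AttributeName": "timestamp", "AttributeType": "timestamp"},
            {"AttributeName": "demand", "AttributeType": "float"},
        ]
    }
    records = [{"demand": 5.0, "timestamp": "2019-08-21", "item_id": "A"}]
    print(rows_to_csv(records, schema), end="")  # A,2019-08-21,5.0
```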

**Weather Index**  
In order to apply the Weather Index, you must include a [geolocation attribute](weather.md#adding-geolocation) in your target time series and any related time series datasets. You also need to specify [time zones](weather.md#specifying-timezones) for your target time series timestamps.  
Guideline: Ensure that your datasets include a geolocation attribute and that your timestamps have an assigned time zone. For more information, refer to the Weather Index [Conditions and Restrictions](weather.md#weather-conditions-restrictions).

**Dataset Header**  
A dataset header in your input CSV may cause a validation error. We recommend omitting a header for CSV files.  
Guideline: Delete the dataset header and try the import again.  
A dataset header is required for Parquet files. 

**Dataset Status**  
Before you can import training data with the [CreateDatasetImportJob](API_CreateDatasetImportJob.md) operation, the `Status` of the dataset must be `ACTIVE`.   
Guideline: Use the [DescribeDataset](API_DescribeDataset.md) operation to get the dataset's status. If the creation or update of the dataset failed, check the formatting of your dataset file and attempt to create it again.
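
The status check can be automated by polling `DescribeDataset` before you start an import job. This is a sketch that assumes `client` is a Forecast client such as `boto3.client('forecast')`. The function name `wait_until_dataset_active` and its polling parameters are hypothetical.

```python
import time


def wait_until_dataset_active(client, dataset_arn, poll_seconds=10, max_attempts=30):
    """Poll the dataset status until it is ACTIVE.

    Returns True once the dataset is ACTIVE, False if it is still not ACTIVE
    after max_attempts checks, and raises if creation or update failed.
    """
    for _ in range(max_attempts):
        status = client.describe_dataset(DatasetArn=dataset_arn)["Status"]
        if status == "ACTIVE":
            return True
        if status.endswith("_FAILED"):
            raise RuntimeError(f"Dataset is in status {status}")
        time.sleep(poll_seconds)
    return False
```

Calling this helper before `create_dataset_import_job` avoids submitting an import against a dataset that is not ready.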

**Default File Format**  
The default file format is CSV. 

**File Format and Delimiter**  
Forecast supports only the comma-separated values (CSV) file format and Parquet format. You can't separate values using tabs, spaces, colons, or any other characters.  
Guideline: Convert your dataset to CSV format (using only commas as your delimiter) or Parquet format and try importing the file again. 

**File Name**  
File names must contain at least one alphabetic character. Files with names that are only numeric can't be imported.  
Guideline: Rename your input data file to include at least one alphabetic character and try importing the file again. 

**Partitioned Parquet Data**  
Forecast does not read partitioned Parquet files.

**What-if analysis Dataset Requirements**  
What-if analyses require CSV datasets. The `TimeSeriesSelector` parameter of the [CreateWhatIfAnalysis](API_CreateWhatIfAnalysis.md) operation and the `TimeSeriesReplacementsDataSource` parameter of the [CreateWhatIfForecast](API_CreateWhatIfForecast.md) operation do not accept Parquet files.