

# Content Domain 2: Exploratory Data Analysis
<a name="machine-learning-specialty-01-domain2"></a>

**Topics**
+ [Task 2.1: Sanitize and prepare data for modeling](#machine-learning-specialty-01-domain2-task1)
+ [Task 2.2: Perform feature engineering](#machine-learning-specialty-01-domain2-task2)
+ [Task 2.3: Analyze and visualize data for ML](#machine-learning-specialty-01-domain2-task3)

## Task 2.1: Sanitize and prepare data for modeling
<a name="machine-learning-specialty-01-domain2-task1"></a>
+ Identify and handle missing data, corrupt data, and stop words.
+ Format, normalize, augment, and scale data.
+ Determine whether there is sufficient labeled data.
  + Identify mitigation strategies.
  + Use data labeling tools (for example, Amazon Mechanical Turk).
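
As an illustration of the missing-data and scaling objectives above, here is a minimal sketch using only the Python standard library. The function names (`impute_mean`, `min_max_scale`, `standardize`) and the sample `ages` list are hypothetical and are not part of the exam guide. Mean imputation is only one of several possible strategies.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Center values on zero and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical sample column with one missing entry.
ages = [20, None, 40, 60]
filled = impute_mean(ages)        # [20, 40, 40, 60]
scaled = min_max_scale(filled)    # [0.0, 0.5, 0.5, 1.0]
z_scores = standardize(filled)
```

Min-max scaling preserves the shape of the distribution but is sensitive to outliers. Standardization is often preferred when a model assumes roughly zero-centered inputs.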

## Task 2.2: Perform feature engineering
<a name="machine-learning-specialty-01-domain2-task2"></a>
+ Identify and extract features from datasets, including from data sources such as text, speech, images, and public datasets.
+ Analyze and evaluate feature engineering concepts (for example, binning, tokenization, outliers, synthetic features, one-hot encoding, reducing dimensionality of data).
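
Three of the feature engineering concepts named above (tokenization, binning, and one-hot encoding) can be sketched in a few lines of standard-library Python. The helper names, the bin edges, and the category labels below are assumptions chosen for illustration. Production pipelines would typically rely on a dedicated library instead.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bin_value(value, edges, labels):
    """Return the label of the first bin whose upper edge exceeds value.

    `edges` holds the upper bounds of every bin except the last,
    so len(labels) == len(edges) + 1.
    """
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over known categories."""
    return [1 if value == c else 0 for c in categories]


# Hypothetical age bins: under 18, 18 to 64, 65 and over.
age_labels = ["minor", "adult", "senior"]
age_group = bin_value(35, [18, 65], age_labels)   # "adult"
encoded = one_hot(age_group, age_labels)          # [0, 1, 0]
tokens = tokenize("Feature engineering isn't magic.")
```

Binning turns a continuous feature into a categorical one, and one-hot encoding then makes that category usable by models that expect numeric input.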

## Task 2.3: Analyze and visualize data for ML
<a name="machine-learning-specialty-01-domain2-task3"></a>
+ Create graphs (for example, scatter plots, time series, histograms, box plots).
+ Interpret descriptive statistics (for example, correlation, summary statistics, p-value).
+ Perform cluster analysis (for example, hierarchical, diagnosis, elbow plot, cluster size).
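
To make the correlation and elbow plot objectives above concrete, the sketch below computes a Pearson correlation coefficient and the within-cluster sum of squares (WCSS) for several values of k. It is a simplified, assumption-laden example: the data is one-dimensional, the k-means initialization is a deterministic spread over the sorted points rather than a standard scheme such as k-means++, and all names are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def kmeans_1d_wcss(points, k, iters=20):
    """Run a basic 1-D k-means and return the within-cluster sum of squares."""
    pts = sorted(points)
    # Deterministic initialization (an assumption made for reproducibility):
    # take the midpoint element of each of k equal slices of the sorted data.
    centroids = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return sum((p - centroids[j]) ** 2
               for j, c in enumerate(clusters) for p in c)


# Hypothetical data with three visually obvious groups.
data = [1, 2, 3, 11, 12, 13, 21, 22, 23]
wcss_by_k = {k: kmeans_1d_wcss(data, k) for k in range(1, 5)}
r = pearson([1, 2, 3], [2, 4, 6])
```

Plotting `wcss_by_k` against k gives the elbow plot. For this data the WCSS falls sharply up to k = 3 and barely changes afterward, which suggests three clusters.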