

# Content Domain 1: Data Engineering
<a name="machine-learning-specialty-01-domain1"></a>

**Topics**
+ [Task 1.1: Create data repositories for ML](#machine-learning-specialty-01-domain1-task1)
+ [Task 1.2: Identify and implement a data ingestion solution](#machine-learning-specialty-01-domain1-task2)
+ [Task 1.3: Identify and implement a data transformation solution](#machine-learning-specialty-01-domain1-task3)

## Task 1.1: Create data repositories for ML
<a name="machine-learning-specialty-01-domain1-task1"></a>
+ Identify data sources (for example, content and location, primary sources such as user data).
+ Determine storage mediums (for example, databases, Amazon S3, Amazon Elastic File System [Amazon EFS], Amazon Elastic Block Store [Amazon EBS]).

## Task 1.2: Identify and implement a data ingestion solution
<a name="machine-learning-specialty-01-domain1-task2"></a>
+ Identify data job styles and job types (for example, batch load, streaming).
+ Orchestrate data ingestion pipelines (batch-based ML workloads and streaming-based ML workloads).
  + Amazon Kinesis
  + Amazon Data Firehose
  + Amazon EMR
  + AWS Glue
  + Amazon Managed Service for Apache Flink
+ Schedule jobs.
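
The two job styles differ in what the job can see when it runs. In a batch job the complete dataset is available before processing starts. In a streaming job, the pattern that services such as Amazon Kinesis and Amazon Managed Service for Apache Flink are built for, records arrive one at a time and the result is updated incrementally. The library-free Python sketch below shows that difference on a simple mean. It does not call any AWS API, and the function and class names (`batch_mean`, `StreamingMean`, `consume`) are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List


def batch_mean(records: List[float]) -> float:
    """Batch style (hypothetical helper): the whole dataset is in hand before processing."""
    if not records:
        raise ValueError("batch is empty")
    return sum(records) / len(records)


class StreamingMean:
    """Streaming style (hypothetical helper): state is updated as each record arrives."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        # Running mean: past records never need to be kept in memory.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


def consume(stream: Iterable[float]) -> List[float]:
    """Feed a (possibly unbounded) iterable into the streaming aggregator."""
    agg = StreamingMean()
    return [agg.update(v) for v in stream]


if __name__ == "__main__":
    data = [2.0, 4.0, 6.0, 8.0]
    print(batch_mean(data))  # 5.0
    print(consume(data))     # [2.0, 3.0, 4.0, 5.0]
```

Both styles reach the same final value here. The streaming version, however, also exposes an up-to-date answer after every record, and it never needs the full dataset in memory.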

## Task 1.3: Identify and implement a data transformation solution
<a name="machine-learning-specialty-01-domain1-task3"></a>
+ Transform data in transit (ETL, AWS Glue, Amazon EMR, AWS Batch).
+ Handle ML-specific data by using MapReduce (for example, Apache Hadoop, Apache Spark, Apache Hive).
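
The MapReduce programming model splits a job into three phases. A map phase emits intermediate key/value pairs. A shuffle then groups those pairs by key, and a reduce phase combines each group into a result. The single-process Python sketch below walks through these phases on a hypothetical token-counting task. It only mimics the programming model and does not use the Apache Hadoop, Apache Spark, or Apache Hive APIs. The function names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (token, 1) pair for every lower-cased token in every input line."""
    for line in lines:
        for token in line.lower().split():
            yield token, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key, as a framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's list of values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def map_reduce(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases (hypothetical driver for illustration)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    docs = ["cat dog", "Dog bird", "cat cat"]
    print(map_reduce(docs))  # {'cat': 3, 'dog': 2, 'bird': 1}
```

On a real cluster, many map and reduce tasks run in parallel across nodes, and the shuffle moves intermediate data over the network. Here everything runs in one process so the data flow is easy to follow.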