# Content Domain 1: Data Engineering

**Topics**

+ [Task 1.1: Create data repositories for ML](#machine-learning-specialty-01-domain1-task1)
+ [Task 1.2: Identify and implement a data ingestion solution](#machine-learning-specialty-01-domain1-task2)
+ [Task 1.3: Identify and implement a data transformation solution](#machine-learning-specialty-01-domain1-task3)

## Task 1.1: Create data repositories for ML

+ Identify data sources (for example, content and location, primary sources such as user data).
+ Determine storage mediums (for example, databases, Amazon S3, Amazon Elastic File System [Amazon EFS], Amazon Elastic Block Store [Amazon EBS]).

## Task 1.2: Identify and implement a data ingestion solution

+ Identify data job styles and job types (for example, batch load, streaming).
+ Orchestrate data ingestion pipelines (batch-based ML workloads and streaming-based ML workloads).
    + Amazon Kinesis
    + Amazon Data Firehose
    + Amazon EMR
    + AWS Glue
    + Amazon Managed Service for Apache Flink
+ Schedule jobs.

## Task 1.3: Identify and implement a data transformation solution

+ Transform data in transit (ETL, AWS Glue, Amazon EMR, AWS Batch).
+ Handle ML-specific data by using MapReduce (for example, Apache Hadoop, Apache Spark, Apache Hive).
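
The MapReduce pattern named in Task 1.3 can be illustrated with a toy, single-process sketch in plain Python. This is not Hadoop, Spark, or Hive code, and it runs on one machine. It only shows the map, shuffle, and reduce phases that those frameworks distribute across a cluster. The function names (`map_phase`, `shuffle`, `reduce_phase`, `count_labels`) and the sample records are hypothetical, chosen for illustration: the job counts examples per class label, a common check for class imbalance before training.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit one (label, 1) pair for each input record."""
    yield (record["label"], 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def count_labels(records):
    """Run map, shuffle, and reduce in sequence on an in-memory dataset."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample data: three labeled training examples.
records = [{"label": "cat"}, {"label": "dog"}, {"label": "cat"}]
label_counts = count_labels(records)
print(label_counts)  # {'cat': 2, 'dog': 1}
```

In a real framework, the map and reduce functions keep this shape, while the framework handles partitioning the input, moving grouped keys between workers, and retrying failed tasks.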