

# Content Domain 3: Modeling


**Topics**
+ [Task 3.1: Frame business problems as ML problems](#machine-learning-specialty-01-domain3-task1)
+ [Task 3.2: Select the appropriate model(s) for a given ML problem](#machine-learning-specialty-01-domain3-task2)
+ [Task 3.3: Train ML models](#machine-learning-specialty-01-domain3-task3)
+ [Task 3.4: Perform hyperparameter optimization](#machine-learning-specialty-01-domain3-task4)
+ [Task 3.5: Evaluate ML models](#machine-learning-specialty-01-domain3-task5)

## Task 3.1: Frame business problems as ML problems

+ Determine when to use and when not to use ML.
+ Know the difference between supervised and unsupervised learning.
+ Select from among classification, regression, forecasting, clustering, recommendation, and foundation models.

## Task 3.2: Select the appropriate model(s) for a given ML problem

+ XGBoost, logistic regression, k-means, linear regression, decision trees, random forests, RNN, CNN, ensemble, transfer learning, and large language models (LLMs)
+ Express the intuition behind models.
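
To make the intuition behind one of these models concrete, the sketch below implements k-means with Lloyd's algorithm in plain Python: it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points until the centroids stop changing. The function name `kmeans`, its parameters, and the toy 2-D points are hypothetical illustrations, not part of any AWS or SageMaker API; in practice you would use a library or managed implementation.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Hypothetical helper: Lloyd's algorithm, alternating assignment and update steps."""
    rng = random.Random(seed)
    # Initialize centroids with k data points sampled without replacement.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: the update step no longer moves any centroid
        centroids = new_centroids
    return centroids, labels


# Toy data (assumed): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

On this toy data the three points near the origin end up sharing one label and the three points near (10, 10) share the other; which group gets label 0 depends on the random initialization.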

## Task 3.3: Train ML models

+ Split data between training and validation (for example, cross-validation).
+ Understand optimization techniques for ML training (for example, gradient descent, loss functions, convergence).
+ Choose appropriate compute resources (for example, GPU or CPU, distributed or non-distributed).
  + Choose appropriate compute platforms (Spark or non-Spark).
+ Update and retrain models.
  + Batch or real-time/online
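
The training topics above (data splitting, gradient descent, loss functions, convergence) fit together in a small example. The sketch below assumes a one-feature linear model `y ≈ w*x + b`, fits it by batch gradient descent on mean squared error (MSE) loss, stops when the loss improvement drops below a tolerance, and scores it with k-fold cross-validation. The helpers `k_fold_indices`, `fit_linear_gd`, and `cross_val_mse`, their default learning rate and tolerance, and the noise-free toy data are all hypothetical choices for illustration.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices once, then yield (train, validation) index lists for each of k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


def fit_linear_gd(xs, ys, lr=0.01, tol=1e-9, max_iter=100_000):
    """Fit y ≈ w*x + b by batch gradient descent on MSE loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    prev_loss = float("inf")
    for _ in range(max_iter):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n
        # Convergence check: stop when the loss improves by less than tol (or goes up).
        if prev_loss - loss < tol:
            break
        prev_loss = loss
        # Gradients of the MSE loss with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def cross_val_mse(xs, ys, k=5):
    """Average validation MSE over k folds, refitting the model on each training split."""
    scores = []
    for train, val in k_fold_indices(len(xs), k):
        w, b = fit_linear_gd([xs[i] for i in train], [ys[i] for i in train])
        scores.append(sum((w * xs[i] + b - ys[i]) ** 2 for i in val) / len(val))
    return sum(scores) / k


# Toy data (assumed): exactly y = 2x + 1, so validation error should be near zero.
xs = [float(x) for x in range(10)]
ys = [2 * x + 1 for x in xs]
print(fit_linear_gd(xs, ys))  # approximately (2.0, 1.0)
print(cross_val_mse(xs, ys))
```

The learning rate matters here: too large a step makes the loss grow instead of shrink, which this sketch treats as a stopping condition rather than a success.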

## Task 3.4: Perform hyperparameter optimization

+ Perform regularization.
  + Dropout
  + L1/L2
+ Perform cross-validation.
+ Initialize models.
+ Understand neural network architecture (layers and nodes), learning rate, and activation functions.
+ Understand tree-based models (number of trees, number of levels).
+ Understand linear models (learning rate).
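
The regularization techniques listed above can be sketched in a few lines. The example below assumes a penalty of `l1*sum|w| + l2*sum(w^2)` added to the training loss, shows how an L2 penalty shrinks a one-feature, no-intercept linear model fit by gradient descent, and implements training-time "inverted" dropout, which zeroes each activation with probability `p_drop` and rescales the survivors so their expected value is unchanged. All function names, the learning rate, and the toy data are hypothetical; deep learning frameworks provide their own dropout layers and weight-decay options.

```python
import random


def penalty(weights, l1=0.0, l2=0.0):
    """Regularization term added to the training loss: l1*sum|w| + l2*sum(w^2)."""
    return l1 * sum(abs(w) for w in weights) + l2 * sum(w * w for w in weights)


def penalty_grad(weights, l1=0.0, l2=0.0):
    """(Sub)gradient of the penalty; the L1 term uses sign(0) = 0."""
    def sign(w):
        return (w > 0) - (w < 0)
    return [l1 * sign(w) + 2 * l2 * w for w in weights]


def fit_ridge_1d(xs, ys, l2, lr=0.01, steps=5000):
    """Gradient descent on mean((w*x - y)^2) + l2*w^2 for a single weight w."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        grad = 2 * sum((w * x - y) * x for x, y in zip(xs, ys)) / n
        grad += penalty_grad([w], l2=l2)[0]
        w -= lr * grad
    return w


def inverted_dropout(activations, p_drop, rng):
    """Zero each unit with probability p_drop; scale kept units by 1/(1 - p_drop)."""
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Toy data (assumed): y = 2x exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(fit_ridge_1d(xs, ys, l2=0.0))  # approximately 2.0 (no regularization)
print(fit_ridge_1d(xs, ys, l2=7.5))  # approximately 1.0 (weight shrunk toward zero)
print(inverted_dropout([1.0, 1.0, 1.0, 1.0], 0.5, random.Random(0)))
```

For this no-intercept case the regularized optimum is `mean(x*y) / (mean(x^2) + l2)`, which is 15 / (7.5 + 7.5) = 1.0 for the toy data, so a larger `l2` trades training fit for a smaller weight. At inference time dropout is switched off and activations pass through unchanged.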

## Task 3.5: Evaluate ML models

+ Avoid overfitting or underfitting.
  + Detect and handle bias and variance.
+ Evaluate metrics (for example, area under the receiver operating characteristic curve [AUC-ROC], accuracy, precision, recall, root mean square error [RMSE], F1 score).
+ Interpret confusion matrices.
+ Perform offline and online model evaluation (A/B testing).
+ Compare models by using metrics (for example, time to train a model, quality of model, engineering costs).
+ Perform cross-validation.
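
The evaluation metrics above can all be computed from a confusion matrix, from predicted scores, or from regression residuals. The sketch below assumes binary labels encoded as 1 (positive) and 0 (negative), a hypothetical 0.5 decision threshold, and made-up example scores. It computes AUC-ROC as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count as one half), which is equivalent to the area under the ROC curve. The helper names are illustrative only.

```python
import math


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 1 and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC-ROC as P(score of random positive > score of random negative), ties = 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root mean square error for regression predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Example data (assumed): 3 positives, 5 negatives, and model scores in [0, 1].
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold

print(confusion_matrix(y_true, y_pred))        # (2, 1, 1, 4)
print(classification_metrics(y_true, y_pred))  # accuracy 0.75; precision, recall, F1 about 0.667
print(roc_auc(y_true, scores))                 # 13/15, about 0.867
print(rmse([3.0, 5.0], [2.0, 7.0]))            # sqrt(2.5), about 1.581
```

Unlike accuracy, precision, recall, and F1, AUC-ROC does not depend on the chosen threshold, which is why it is often used to compare classifiers before a threshold is tuned for the business cost of false positives versus false negatives.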