AWS ML - Part 3

May 26, 2023

Domain 3: Modeling

3.1 Frame business problems as machine learning problems.

• Determine when to use/when not to use ML

• Know the difference between supervised and unsupervised learning

• Select from among classification, regression, forecasting, clustering, recommendation, etc.

3.2 Select the appropriate model(s) for a given machine learning problem.

• XGBoost, logistic regression, K-means, linear regression, decision trees, random forests, RNN, CNN, ensemble, transfer learning

• Express intuition behind models

3.3 Train machine learning models.

• Train/validation/test split, cross-validation

• Optimizer, gradient descent, loss functions, local minima, convergence, batches, probability, etc.

• Compute choice (GPU vs. CPU, distributed vs. non-distributed, platform [Spark vs. non-Spark])

• Model updates and retraining

o Batch vs. real-time/online
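
As a study aid for the first item in 3.3, here is a minimal sketch of a train/validation/test split and of k-fold cross-validation index generation. It uses only the Python standard library. The function names `train_val_test_split` and `k_fold_indices`, along with the default fractions, are illustrative assumptions and are not part of the exam guide or any AWS API.

```python
import random


def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle a copy of items and cut it into train, validation, and test lists.

    Hypothetical helper for illustration; fractions are assumptions.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; every index is held out exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size
```

In practice the test set is held back until the end, while the k folds are used to compare models or hyperparameters on the remaining data.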

3.4 Perform hyperparameter optimization.

• Regularization

o Dropout

o L1/L2

• Cross-validation

• Model initialization

• Neural network architecture (layers/nodes), learning rate, activation functions

• Tree-based models (# of trees, # of levels)

• Linear models (learning rate)
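
To make two of the hyperparameters in 3.4 concrete (learning rate and L2 regularization for a linear model), here is a rough sketch of batch gradient descent on a one-feature linear regression. The function name `fit_linear_gd`, the default values, and the choice to penalize only the weight (not the bias) are assumptions made for illustration.

```python
def fit_linear_gd(xs, ys, learning_rate=0.05, l2=0.0, epochs=2000):
    """Fit y ~ w*x + b by batch gradient descent on MSE + l2 * w**2.

    Hypothetical helper: l2 is the regularization strength and
    learning_rate scales each update step.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean squared error, plus the L2 penalty term on w.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n + 2 * l2 * w
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # generated from y = 2x + 1

w_plain, b_plain = fit_linear_gd(xs, ys)        # no regularization
w_reg, b_reg = fit_linear_gd(xs, ys, l2=1.0)    # L2 shrinks the weight toward zero
```

On this toy data the unregularized fit recovers the slope of 2, while the L2 penalty pulls the weight toward zero. A learning rate that is too large would make the updates diverge instead of converge, which is why it is tuned as a hyperparameter.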

3.5 Evaluate machine learning models.

• Avoid overfitting/underfitting (detect and handle bias and variance)

• Metrics (AUC-ROC, accuracy, precision, recall, RMSE, F1 score)

• Confusion matrix

• Offline and online model evaluation, A/B testing

• Compare models using metrics (time to train a model, quality of model, engineering costs)

• Cross-validation
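
Several of the metrics listed in 3.5 follow directly from the confusion matrix. The sketch below computes them for binary labels (1 = positive, 0 = negative) and adds RMSE for regression. The function names and the convention of returning 0.0 when a denominator is zero are assumptions for illustration, not something the exam guide prescribes.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Assumed convention: report 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def rmse(y_true, y_pred):
    """Root mean squared error for regression predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

Precision asks how many predicted positives were correct, recall asks how many actual positives were found, and F1 is their harmonic mean, which is why it is often preferred over accuracy on imbalanced classes.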