AWS ML - Part 3
Domain 3: Modeling
3.1 Frame business problems as machine learning problems.
Determine when to use/when not to use ML
Know the difference between supervised and unsupervised learning
Select from among classification, regression, forecasting, clustering, recommendation, etc.
3.2 Select the appropriate model(s) for a given machine learning problem.
XGBoost, logistic regression, k-means, linear regression, decision trees, random forests, RNN, CNN, ensemble, transfer learning
Express intuition behind models
3.3 Train machine learning models.
Train/validation/test split, cross-validation
Optimizer, gradient descent, loss functions, local minima, convergence, batches, probability, etc.
Compute choice (GPU vs. CPU, distributed vs. non-distributed, platform [Spark vs. non-Spark])
Model updates and retraining
o Batch vs. real-time/online
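As a rough illustration of how the 3.3 terms fit together (optimizer, gradient descent, loss function, batches, convergence), here is a minimal sketch of mini-batch gradient descent fitting a line by mean squared error. The function names, the learning rate, the batch size and the toy data are all assumptions chosen for the example, not anything prescribed by the exam guide.

```python
import random


def mse(xs, ys, w, b):
    """Mean squared error of the line y = w*x + b on the given points."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line_minibatch_gd(xs, ys, lr=0.05, epochs=2000, batch_size=2, seed=0):
    """Fit y ~ w*x + b by minimising MSE with mini-batch gradient descent.

    lr, epochs and batch_size are illustrative defaults (assumptions).
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0                      # simple zero initialisation
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)               # new batch composition every epoch
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grad_w = 0.0
            grad_b = 0.0
            for i in batch:
                error = (w * xs[i] + b) - ys[i]
                grad_w += 2 * error * xs[i]   # d(error^2)/dw
                grad_b += 2 * error           # d(error^2)/db
            # Step against the average gradient of the batch.
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
    return w, b


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2 * x + 1 for x in xs]         # noise-free toy data
    w, b = fit_line_minibatch_gd(xs, ys)
    print(w, b, mse(xs, ys, w, b))
```

A learning rate that is too large makes the updates overshoot and diverge, while one that is too small slows convergence; this loss is convex, so local minima are not an issue here, unlike in neural networks.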
3.4 Perform hyperparameter optimization.
Regularization
o Dropout
o L1/L2
Cross-validation
Model initialization
Neural network architecture (layers/nodes), learning rate, activation functions
Tree-based models (# of trees, # of levels)
Linear models (learning rate)
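Cross-validation is commonly used to compare hyperparameter settings, so a small sketch of how k-fold splits are formed may help. The helper below is hypothetical (the name k_fold_indices and its signature are assumptions for illustration); it only produces index lists and leaves model training to the caller.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs so each sample is held out exactly once.

    When n_samples is not divisible by k, the first folds get one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


if __name__ == "__main__":
    for train_idx, val_idx in k_fold_indices(10, 3):
        print("train:", train_idx, "validate:", val_idx)
```

In practice the data would usually be shuffled first (or stratified by class), and each candidate hyperparameter setting would be scored by the average validation metric across the k folds.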
3.5 Evaluate machine learning models.
Avoid overfitting/underfitting (detect and handle bias and variance)
Metrics (AUC-ROC, accuracy, precision, recall, RMSE, F1 score)
Confusion matrix
Offline and online model evaluation, A/B testing
Compare models using metrics (time to train a model, quality of model, engineering costs)
Cross-validation
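To make the 3.5 metrics concrete, here is a minimal sketch that derives accuracy, precision, recall and F1 from binary confusion-matrix counts, plus RMSE for regression. The function names and the convention that label 1 is the positive class are assumptions for the example; AUC-ROC is left out because it needs predicted scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def rmse(y_true, y_pred):
    """Root mean squared error for regression predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(confusion_counts(y_true, y_pred))
    print(classification_metrics(y_true, y_pred))
    print(rmse([0.0, 0.0], [3.0, 4.0]))
```

Precision answers "of the predicted positives, how many were right", recall answers "of the actual positives, how many were found", and F1 is their harmonic mean, which is why it is often preferred over accuracy on imbalanced classes.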