AWS ML - Part 1

May 26, 2023

Recommended AWS knowledge

The target candidate should have the following knowledge:

• The ability to express the intuition behind basic ML algorithms

• Experience performing basic hyperparameter optimization

• Experience with ML and deep learning frameworks

• The ability to follow model-training best practices

• The ability to follow deployment best practices

• The ability to follow operational best practices
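To make "basic hyperparameter optimization" concrete, here is a minimal sketch of exhaustive grid search using only the Python standard library. The toy model (a line fitted by gradient descent), the data, the grid values, and the names `grid_search` and `train_and_validate` are all hypothetical and chosen for illustration. The key ideas are that each candidate setting is trained on training data and scored on held-out validation data, and the setting with the best validation score wins.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]

def grid_search(evaluate: Callable[[Params], float],
                param_grid: Dict[str, List[float]]
                ) -> Tuple[Params, float, List[Tuple[Params, float]]]:
    """Evaluate every combination in param_grid; a lower score is better."""
    names = sorted(param_grid)
    results = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, evaluate(params)))
    best_params, best_score = min(results, key=lambda r: r[1])
    return best_params, best_score, results

# Hypothetical toy data: y = 2x + 1, split into training and validation points.
TRAIN_X = [i / 10 for i in range(10)]
TRAIN_Y = [2 * x + 1 for x in TRAIN_X]
VALID_X = [0.05 + i / 10 for i in range(10)]
VALID_Y = [2 * x + 1 for x in VALID_X]

def train_and_validate(params: Params) -> float:
    """Fit w and b by gradient descent on training MSE; return validation MSE."""
    lr, epochs = params["learning_rate"], int(params["epochs"])
    w = b = 0.0
    n = len(TRAIN_X)
    for _ in range(epochs):
        # Errors use the current w and b, so both updates are simultaneous.
        errors = [w * x + b - y for x, y in zip(TRAIN_X, TRAIN_Y)]
        w -= lr * 2 / n * sum(e * x for e, x in zip(errors, TRAIN_X))
        b -= lr * 2 / n * sum(errors)
    return sum((w * x + b - y) ** 2 for x, y in zip(VALID_X, VALID_Y)) / len(VALID_X)

best, score, all_results = grid_search(
    train_and_validate,
    {"learning_rate": [0.001, 0.01, 0.1], "epochs": [10, 100, 500]},
)
print(best, score)
```

Grid search is the simplest strategy but grows multiplicatively with every added hyperparameter. Managed services such as SageMaker automatic model tuning replace the exhaustive loop with strategies such as random or Bayesian search.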

Domain 1: Data Engineering

1.1 Create data repositories for machine learning.

• Identify data sources (e.g., content and location, primary sources such as user data)

• Determine storage mediums (e.g., databases, data lakes, S3, EFS, EBS)
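When S3 serves as a data lake, objects are commonly laid out under Hive-style `key=value` prefixes so that query engines (for example Athena reading table definitions from the Glue Data Catalog) can skip partitions that a query does not need. The sketch below only builds such object keys; the prefix, dataset name, file name, and the helper `partitioned_key` are hypothetical, and no AWS call is made.

```python
from datetime import datetime, timezone

def partitioned_key(prefix: str, dataset: str, event_time: datetime, filename: str) -> str:
    """Return an S3 object key with Hive-style year=/month=/day= partition folders."""
    return (f"{prefix}/{dataset}/"
            f"year={event_time:%Y}/month={event_time:%m}/day={event_time:%d}/"
            f"{filename}")

key = partitioned_key("raw", "clickstream",
                      datetime(2023, 5, 26, tzinfo=timezone.utc), "part-0000.parquet")
print(key)  # raw/clickstream/year=2023/month=05/day=26/part-0000.parquet
```

Partitioning by event date suits time-range queries; the right partition keys depend on how the data will actually be filtered.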

1.2 Identify and implement a data ingestion solution.

• Data job styles/types (batch load, streaming)

• Data ingestion pipelines (batch-based ML workloads and streaming-based ML workloads)

o Kinesis

o Kinesis Analytics

o Kinesis Firehose

o EMR

o Glue

• Job scheduling
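Batch loads move a bounded dataset on a schedule, while streaming ingestion receives records continuously. Kinesis Data Firehose sits between the two: it buffers incoming stream records and delivers them in batches to a destination such as S3 once a buffer size or buffer interval is reached. The class below is a hypothetical, single-process sketch of that buffering idea only; it counts records rather than bytes and checks the time limit only when a record arrives, whereas a real service also flushes on a timer.

```python
import time
from typing import Callable, List, Optional

class MicroBatchBuffer:
    """Buffer streaming records and hand them to a sink in batches.

    A batch is flushed when it holds max_records records or when the oldest
    buffered record has waited at least max_seconds (checked on each put).
    """

    def __init__(self, max_records: int, max_seconds: float,
                 sink: Callable[[List[str]], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_records = max_records
        self.max_seconds = max_seconds
        self.sink = sink
        self.clock = clock
        self.records: List[str] = []
        self.first_arrival: Optional[float] = None

    def put(self, record: str) -> None:
        if not self.records:
            self.first_arrival = self.clock()
        self.records.append(record)
        waited = self.clock() - self.first_arrival
        if len(self.records) >= self.max_records or waited >= self.max_seconds:
            self.flush()

    def flush(self) -> None:
        """Deliver whatever is buffered (also call this on shutdown)."""
        if self.records:
            self.sink(self.records)
            self.records = []
            self.first_arrival = None

# Demo with a fake clock so the output is deterministic.
now = [0.0]
delivered: List[List[str]] = []
buffer = MicroBatchBuffer(max_records=3, max_seconds=60.0,
                          sink=delivered.append, clock=lambda: now[0])
for event in ["e1", "e2", "e3", "e4"]:
    buffer.put(event)      # e1..e3 flush on size; e4 stays buffered
now[0] = 61.0
buffer.put("e5")           # e4 has waited 61 s, so e4 and e5 flush on time
print(delivered)           # [['e1', 'e2', 'e3'], ['e4', 'e5']]
```

Injecting the clock keeps the size/time trade-off testable: larger buffers mean fewer, bigger deliveries at the cost of higher latency.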

1.3 Identify and implement a data transformation solution.

• Transforming data in transit (ETL: Glue, EMR, AWS Batch)

• Handle ML-specific data using MapReduce (Hadoop, Spark, Hive)
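The MapReduce model behind Hadoop (and generalized by Spark) splits a job into a map step that emits key/value pairs, a shuffle that groups pairs by key, and a reduce step that combines each group. The sketch below runs all three phases in a single Python process on a word-count example; the function names are illustrative, and a real framework distributes the map and reduce work across nodes and performs the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple

def map_phase(line: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine every value seen for one key."""
    return key, sum(values)

def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

print(word_count(["the cat sat", "The dog sat"]))
# {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

On a Spark RDD the same job is roughly `lines.flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(operator.add)`, with Spark handling the shuffle.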

Domain 2: Exploratory Data Analysis

2.1 Sanitize and prepare data for modeling.

• Identify and handle missing data, corrupt data, stop words, etc.

• Formatting, normalizing, augmenting, and scaling data

• Labeled data (recognizing when you have enough labeled data and identifying mitigation strategies, e.g., data labeling tools such as Mechanical Turk, or manual labor)
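Several of the cleaning steps above fit in a few lines of plain Python. This is a minimal sketch, assuming a numeric column that arrives as strings and a short text field; the tiny stop-word set and all function names are hypothetical. Unparseable or non-finite values are treated as corrupt and converted to missing, missing values are imputed with the median, and numeric values are then rescaled with min-max scaling or standardization (z-scores).

```python
import math
import statistics
from typing import List, Optional

# Tiny illustrative stop-word list; real projects use a larger curated list.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}

def parse_or_none(raw: str) -> Optional[float]:
    """Treat unparseable or non-finite values (corrupt data) as missing."""
    try:
        value = float(raw)
    except ValueError:
        return None
    return value if math.isfinite(value) else None

def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace missing values with the median of the observed values."""
    median = statistics.median([v for v in values if v is not None])
    return [median if v is None else v for v in values]

def remove_stop_words(text: str) -> List[str]:
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values: List[float]) -> List[float]:
    """Shift to mean 0 and scale to (population) standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]

raw_column = ["1", "", "3", "oops", "10"]
clean = impute_median([parse_or_none(r) for r in raw_column])
print(clean)                                       # [1.0, 3.0, 3.0, 3.0, 10.0]
print(min_max_scale(clean))
print(remove_stop_words("The cat is on the mat"))  # ['cat', 'on', 'mat']
```

In a real pipeline the median, min/max, and mean/standard deviation should be computed on the training split only and then reused for validation and test data, so that no information leaks from held-out data into training.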