AWS ML - Part 1
Recommended AWS knowledge
The target candidate should have the following knowledge:
The ability to express the intuition behind basic ML algorithms
Experience performing basic hyperparameter optimization
Experience with ML and deep learning frameworks
The ability to follow model-training best practices
The ability to follow deployment best practices
The ability to follow operational best practices
Domain 1: Data Engineering
1.1 Create data repositories for machine learning.
Identify data sources (e.g., content and location, primary sources such as user data)
Determine storage media (e.g., databases, data lakes, S3, EFS, EBS)
1.2 Identify and implement a data ingestion solution.
Data job styles/types (batch load, streaming)
Data ingestion pipelines (batch-based and streaming-based ML workloads)
o Kinesis
o Kinesis Analytics
o Kinesis Firehose
o EMR
o Glue
Job scheduling
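The difference between the batch and streaming job styles listed above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the `value` field, and the `buffer_size` parameter are hypothetical, and no AWS API is used. The buffering loop is loosely analogous to how a streaming delivery service groups records before handing them downstream.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, float]  # hypothetical record shape: {"value": <number>}


def process_batch(records: List[Record]) -> float:
    """Batch style: all records are available up front; compute one result."""
    return sum(r["value"] for r in records) / len(records)


def process_stream(records: Iterable[Record], buffer_size: int) -> Iterator[float]:
    """Streaming style: records arrive one at a time; emit a result per full buffer."""
    buffer: List[Record] = []
    for record in records:
        buffer.append(record)
        if len(buffer) == buffer_size:
            yield process_batch(buffer)
            buffer = []
    if buffer:  # flush the final, partially filled buffer
        yield process_batch(buffer)


if __name__ == "__main__":
    data = [{"value": float(v)} for v in [1, 2, 3, 4, 5]]
    print(process_batch(data))                             # one result for the whole set
    print(list(process_stream(iter(data), buffer_size=2)))  # one result per buffer
```

The batch function needs the whole dataset in memory before it can answer, while the streaming generator produces partial results as data arrives, which is the trade-off the exam objective asks you to recognize.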
1.3 Identify and implement a data transformation solution.
Transforming data in transit (ETL: Glue, EMR, AWS Batch)
Handle ML-specific data using MapReduce (Hadoop, Spark, Hive)
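The MapReduce pattern named above can be sketched as a single-process word count. This is a minimal illustration, not how Hadoop, Spark, or Hive implement it; the function names are hypothetical, and a real framework would distribute the map, shuffle, and reduce phases across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```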
Domain 2: Exploratory Data Analysis
2.1 Sanitize and prepare data for modeling.
Identify and handle missing data, corrupt data, stop words, etc.
Formatting, normalizing, augmenting, and scaling data
Labeled data (recognizing when you have enough labeled data and identifying mitigation strategies [data labeling tools (Mechanical Turk, manual labor)])
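Three of the preparation steps listed above (handling missing data, removing stop words, and scaling) can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: the helper names and the tiny `STOP_WORDS` set are hypothetical, mean imputation and min-max scaling are just one choice among several, and in practice a library such as pandas or scikit-learn would normally do this work.

```python
from statistics import mean
from typing import List, Optional

# Tiny illustrative stop-word set (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "is", "of", "and"}


def impute_mean(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed values.

    Raises statistics.StatisticsError if every value is missing.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def remove_stop_words(text: str) -> List[str]:
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


if __name__ == "__main__":
    filled = impute_mean([1.0, None, 3.0])
    print(filled)
    print(min_max_scale(filled))
    print(remove_stop_words("The cat is on the mat"))
```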