Table of Contents
- Distributed Computing Primer
- Data Ingestion
- Data Cleansing and Integration
- Real-time Data Analytics
- Scalable Machine Learning with PySpark
- Feature Engineering – Extraction, Transformation, and Selection
- Supervised Machine Learning
- Unsupervised Machine Learning
- Machine Learning Life Cycle Management
- Scaling Out Single-Node Machine Learning Using PySpark
- Data Visualization with PySpark
- Spark SQL Primer
- Integrating External Tools with Spark SQL
- The Data Lakehouse

