Table of Contents
- Installing PySpark and Setting Up Your Development Environment
- Getting Your Big Data into the Spark Environment Using RDDs
- Big Data Cleaning and Wrangling with Spark Notebooks
- Aggregating and Summarizing Data into Useful Reports
- Powerful Exploratory Data Analysis with MLlib
- Putting Structure on Your Big Data with SparkSQL
- Transformations and Actions
- Immutable Design
- Avoiding Shuffle and Reducing Operational Expenses
- Saving Data in the Correct Format
- Working with the Spark Key/Value API
- Testing Apache Spark Jobs
- Leveraging the Spark GraphX API

