PySpark Essentials for Data Scientists (Big Data + Python)

English | MP4 | AVC 1280×720 | AAC 48KHz 2ch | 16 Hours | 7.43 GB

Learn how to wrangle Big Data for Machine Learning using Python and MLflow on Apache Spark, taught by an industry expert!

This course is for data scientists (or aspiring data scientists) who want to get PRACTICAL training in PySpark (Python for Apache Spark) using REAL WORLD datasets and APPLICABLE coding knowledge that you’ll use every day as a data scientist! By enrolling in this course, you’ll gain access to over 100 lectures, hundreds of example problems and quizzes, and over 100,000 lines of code!

I’m going to provide the essentials you need to know to be an expert in PySpark by the end of this course, which I’ve designed based on my EXTENSIVE experience consulting as a data scientist for clients like the IRS, the US Department of Labor and United States Veterans Affairs.

I’ve structured the lectures and coding exercises for real-world application, so you can understand how PySpark is actually used on the job. We are also going to dive into custom functions that I wrote MYSELF to get you up and running in the MLlib API fast and make building your first machine learning models a breeze! We will also touch on MLflow, which will help us manage and track our model training and evaluation process in a custom user interface and make you even more competitive on the job market!

Each section will have a concept review lecture, code-along activities and structured problem sets for you to work through to help you put what you have learned into action, along with the solutions to each problem in case you get stuck. Additionally, real-world consulting projects have been provided in every section with AUTHENTIC datasets to help you think through how to apply each of the concepts we have covered.

Lastly, I’ve written up some condensed review notebooks and handouts of all the course content to make it super easy for you to reference later on. This will be super helpful once you land your first job programming in PySpark!

What you’ll learn

  • Use Python with Big Data on a distributed framework (Apache Spark)
  • Work with REAL datasets on realistic consulting projects
  • Get hands-on practice solving REAL problems with BIG DATA
  • Integrate a UI to monitor your model training and development process with MLflow
  • Theory and application of cutting-edge data science algorithms
  • Manipulate, Join and Aggregate Dataframes in Spark with Python
  • Learn how to apply Spark’s machine learning techniques on distributed Dataframes
  • Cross Validation & Hyperparameter Tuning
  • Frequent Pattern Mining Techniques
  • Classification & Regression Techniques
  • Data Wrangling for Natural Language Processing
  • How to write SQL Queries in Spark

Table of Contents

Course Introduction
1 Course Introduction
2 Course Orientation
3 Frequently Asked Questions
4 Resources for Setting up PySpark
5 Python Cheatsheet Resources
6 Introduction to PySpark
7 Transitioning from Python to PySpark Concept Review
8 Transitioning from Python to PySpark Code Along Activity

Dataframe Essentials Read, Write, Validate & Explore
9 Dataframe Essentials Concept Review
10 Search and Filter Dataframes HW Solution Code Review
11 A little something to keep you going….
12 SQL Options in Spark/PySpark Code Along Activity
13 SQL Options in Spark/PySpark HW
14 SQL Options in Spark/PySpark HW Solutions
15 A little something to keep you going….
16 A little something to keep you going….
17 Read, Write and Validate Dataframes Code Along Activity
18 Read, Write and Validate Data HW
19 Read, Write and Validate Data HW Solutions Code Review
20 A little something to keep you going….
21 Search and Filter Dataframes Code Along Activity
22 Search and Filter Dataframes HW

Dataframe Essentials Clean, Manipulate, Join, Aggregate
23 Manipulating Dataframes Code Along Activity
24 Joining and Appending Dataframes HW
25 Joining and Appending Dataframes HW Solution Code Review
26 A little something to keep you going….
27 Handling Missing Data in Dataframes Code Along Activity
28 Handling Missing Data in Dataframes HW
29 Handling Missing Data in Dataframes HW Solution
30 Dataframe Essentials Coding Master Review
31 A little something to keep you going….
32 Manipulating Dataframes HW
33 Manipulating Dataframes HW Solution
34 A little something to keep you going….
35 Aggregating Data in Dataframes Code Along Activity
36 Aggregating Data in Dataframes HW
37 Aggregating Data in Dataframes HW Solution
38 A little something to keep you going….
39 Joining and Appending Dataframes Code Along Activity

Introduction to Spark MLlib
40 Introduction to Machine Learning Concept Review
41 Introduction to MLlib Concept Review
42 Model Selection and Tuning in MLlib Concept Review
43 A little something to keep you going….

Classification in MLlib
44 Introduction to Classification in MLlib Concept Review
45 Classification in MLlib Code Review Part 2.4 Train & Test Models [Naive Bayes]
46 Classification in MLlib Code Review Part 2.5 Train & Test Models [Linear SVM]
47 Classification in MLlib Code Review Part 2.6 Train & Test Models [Decision Tree]
48 Classification in MLlib Code Review Part 2.7 Train & Test Models [Random Forest]
49 Classification in MLlib Code Review Part 2.8 Train & Test Models [GBT]
50 A little something to keep you going….
51 BONUS Add loop functions to your training and evaluation script
52 BONUS Leverage MLflow to better track and manage your results
53 Classification Project
54 Remember to be creative with this project!
55 Classification Project Solution
56 A little something to keep you going….
57 Classification in MLlib Code Along Part 1 Data Formatting and Transformations
58 Classification in MLlib Code Review Part 2.0 Train and Evaluate Models [Intro]
59 Classification in MLlib Code Review Part 2.1 Train & Test Models [Logistic]
60 Classification in MLlib Code Review Part 2.2 Train & Test Models [1 vs Rest]
61 A little something to keep you going….
62 Classification in MLlib Code Review Part 2.3 Train & Test Models [Multilayer PC]

Natural Language Processing in MLlib
63 Introduction to Natural Language Processing
64 Natural Language Processing Project Solution
65 A little something to keep you going….
66 Natural Language Processing Concept Review [Part 1 Feature Transformers]
67 Natural Language Processing Concept Review [Part 2 Feature Extractors]
68 A little something to keep you going….
69 Natural Language Processing Code Along Activity Part 1 Data Prep
70 Natural Language Processing Code Along Activity Part 2 Vectorize, Train & Eval
71 Natural Language Processing Project

Regression in MLlib
72 Regression in MLlib Concept Review
73 A little something to keep you going….
74 BONUS Add loop functions to your regression training and evaluation script
75 Regression Project
76 And finally… have FUN with this project and LOVE what you do!
77 Regression Project Solution Code Along Activity
78 Regression in MLlib Code Review Introduction
79 Regression in MLlib Code Review Part 1 Data Prep
80 Regression in MLlib Code Review Part 2.0 Linear Regression
81 A little something to keep you going….
82 Regression in MLlib Code Review Part 2.1 Decision Tree Regression
83 Regression in MLlib Code Review Part 2.2 Random Forest Regression
84 Regression in MLlib Code Review Part 2.3 Gradient Boosted Tree Regression

Clustering in PySpark
85 Intro to Clustering in MLlib Concept Review
86 K-Means & Bisecting K-Means in MLlib Code Along Activity
87 Latent Dirichlet Allocation in MLlib Code Along Activity
88 A little something to keep you going….
89 Gaussian Mixture Modeling in MLlib Code Along Activity
90 Clustering Project Introduction
91 Clustering Project Solution Code Review
92 A little something to keep you going….

Frequent Pattern Mining in MLlib
93 Frequent Pattern Mining in MLlib Concept Review
94 Frequent Pattern Mining Code Along Activity [Part 1 FP-Growth]
95 Frequent Pattern Mining Code Along Activity [Part 2 PrefixSpan]
96 A little something to keep you going….
97 Frequent Pattern Mining Project Introduction
98 Frequent Pattern Mining Project Solution Code Review

Course Wrap-up
99 Closing Remarks
100 Tips for success moving forward
101 And finally… remember to set your goals HIGH!