The Complete Pandas Bootcamp 2020: Data Science with Python

English | MP4 | AVC 1280×720 | AAC 44KHz 2ch | 32 Hours | 11.4 GB

Pandas fully explained | 150+ Exercises | Must-have skills for Machine Learning & Finance | + Scikit-Learn and Seaborn

This course is structured in four parts, beginning from zero with all the Pandas Basics (Part 1). Part 2 is the heart of this course and shows the complete data workflow: Importing, Cleaning, Merging, Aggregating, Grouping, and Preparing Data for Statistics & Machine Learning. Finally, you can test your new skills in a Comprehensive Project Challenge that is frequently used in Data Science job applications/assessment centres (Part 3). In the last part of this course (Part 4), you will learn how to import, handle, and work with (financial) Time Series Data.

Why should you learn Pandas?

The world is becoming more and more data-driven. Data Scientists are gaining ground with $100k+ salaries. It's time to switch from soapbox cars (spreadsheet software like Excel) to highly tuned racing cars (Pandas)!

Python is a great platform/environment for Data Science with powerful Tools for Science, Statistics, Finance, and Machine Learning. The Pandas Library is the Heart of Python Data Science. Pandas enables you to import, clean, join/merge/concatenate, manipulate, and deeply understand your Data and finally prepare/process Data for further Statistical Analysis, Machine Learning, or Data Presentation. In practice, all of these tasks require a high proficiency in Pandas! Data Scientists typically spend up to 85% of their time manipulating Data in Pandas.
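
To make the import, clean, merge, and aggregate steps concrete, here is a minimal sketch of such a workflow. It is not taken from the course materials: the inline CSV data and the column names (order_id, customer_id, amount, country) are hypothetical and only stand in for real files.

```python
import io

import pandas as pd

# Hypothetical inline CSV data standing in for real files on disk.
sales_csv = io.StringIO(
    "order_id,customer_id,amount\n"
    "1,10,100.0\n"
    "2,11,\n"
    "2,11,\n"
    "3,10,50.0\n"
)
customers_csv = io.StringIO(
    "customer_id,country\n"
    "10,DE\n"
    "11,US\n"
)

# Import: read both tables into DataFrames.
sales = pd.read_csv(sales_csv)
customers = pd.read_csv(customers_csv)

# Clean: drop exact duplicate rows, then fill missing amounts with 0.
sales = sales.drop_duplicates().fillna({"amount": 0.0})

# Merge: attach the customer's country to every order (left join).
merged = sales.merge(customers, on="customer_id", how="left")

# Aggregate: total order amount per country.
totals = merged.groupby("country")["amount"].sum()
print(totals)
```

Real datasets usually need more cleaning than this (inconsistent labels, outliers, wrong dtypes), which is what Part 2 of the course covers.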

What you’ll learn

  • Bring your Data Handling & Data Analysis skills to an outstanding level.
  • Learn and practice all relevant Pandas methods and workflows with Real-World Datasets
  • Learn Pandas based on NEW Version 1.0 (the days of versions 0.x are over)
  • Import, clean, and merge messy Data and prepare Data for Machine Learning
  • Master a complete Machine Learning Project A-Z with Pandas, Scikit-Learn, and Seaborn
  • Analyze, visualize, and understand your Data with Pandas, Matplotlib, and Seaborn
  • Practice and master your Pandas skills with Quizzes, 150+ Exercises, and Comprehensive Projects
  • Import Financial/Stock Data from Web Sources and analyze them with Pandas
  • Learn and master the most important Pandas workflows for Finance
  • Learn how to best transition from Versions 0.X to new Version 1.0
  • Learn the Basics of Pandas and Numpy Coding (Appendix)
  • Learn and master important Statistical Concepts with scipy

Table of Contents

Getting Started
Overview Student FAQ
Tips How to get the most out of this course
Did you know that…
More FAQ Important Information
Installation of Anaconda
Opening a Jupyter Notebook
How to use Jupyter Notebooks
How to tackle Pandas Version 1.0

PART 1: PANDAS FROM ZERO TO HERO (BUILDING BLOCKS)
Intro to Tabular Data Pandas
Download Part 1 Course Materials

Pandas Basics (DataFrame Basics I)
Create your very first Pandas DataFrame (from csv)
Selecting one Column with the dot notation
Zero-based Indexing and Negative Indexing
Selecting Rows with iloc (position-based indexing)
Slicing Rows and Columns with iloc (position-based indexing)
Position-based Indexing Cheat Sheets
Selecting Rows with loc (label-based indexing)
Slicing Rows and Columns with loc (label-based indexing)
Label-based Indexing Cheat Sheets
Indexing and Slicing with reindex()
Summary, Best Practices and Outlook
Pandas Display Options and the methods head() & tail()
Indexing and Slicing
Coding Exercise 2 (Intro)
Coding Exercise 2 (Solution)
Advanced Indexing and Slicing (optional)
First Data Inspection
Built-in Functions, Attributes and Methods with Pandas
Make it easy TAB Completion and Tooltip
First Steps
Explore your own Dataset Coding Exercise 1 (Intro)
Explore your own Dataset Coding Exercise 1 (Solution)
Selecting Columns

Pandas Series and Index Objects
Intro
idxmin() and idxmax()
Manipulating Pandas Series
Pandas Series
Coding Exercise 3 (Intro)
Coding Exercise 3 (Solution)
First Steps with Pandas Index Objects
Creating Index Objects from Scratch
Changing Row Index with set_index() and reset_index()
Changing Column Labels
Renaming Index & Column Labels with rename()
First Steps with Pandas Series
Pandas Index objects
Coding Exercise 4 (Intro)
Coding Exercise 4 (Solution)
Analyzing Numerical Series with unique(), nunique() and value_counts()
Analyzing non-numerical Series with unique(), nunique(), value_counts()
Creating Pandas Series (Part 1)
Creating Pandas Series (Part 2)
Indexing and Slicing Pandas Series
Sorting of Series and Introduction to the inplace parameter
nlargest() and nsmallest()

DataFrame Basics II
Intro
Creating Columns based on other Columns
Adding Columns with insert()
Creating DataFrames from Scratch with pd.DataFrame()
Adding new Rows (hands-on approach)
DataFrame Basics II
Coding Exercise 5 (Intro)
Coding Exercise 5 (Solution)
Filtering DataFrames by one Condition
Filtering DataFrames by many Conditions (AND)
Filtering DataFrames by many Conditions (OR)
Advanced Filtering with between(), isin() and ~
any() and all()
Removing Columns
Removing Rows
Adding new Columns to a DataFrame

Manipulating Elements in a DataFrame Slice +++Important, know the Pitfalls!+++
Intro
Best Practice (How you should do it)
Chained Indexing How you should NOT do it (Part 1)
Chained Indexing How you should NOT do it (Part 2)
View vs. Copy
Simple Rules what to do when…
Manipulating DataFrames Slices
Coding Exercise 6 (Intro)
Coding Exercise 6 (Solution)

DataFrame Basics III
Intro
Hierarchical Indexing (Part 1)
Hierarchical Indexing (Part 2)
String Operations (Part 1)
String Operations (Part 2)
Coding Exercise 8 (Intro)
Coding Exercise 8 (Solution)
Sorting DataFrames with sort_index() and sort_values() (Version 1.0 Update)
Ranking DataFrames with rank()
nunique() and nlargest() / nsmallest() with DataFrames
Summary Statistics and Accumulations
The agg() method
Coding Exercise 7 (Intro)
Coding Exercise 7 (Solution)
User-defined Functions with apply(), map() and applymap()

Visualization with Matplotlib
Intro
The plot() method
Customization of Plots
Histograms (Part 1)
Histograms (Part 2)
Barcharts and Piecharts
Scatterplots
Coding Exercise 9 (Intro)
Coding Exercise 9 (Solution)

PART 2: FULL DATA WORKFLOW A-Z
Welcome to PART 2 Full Data Workflow A-Z
Download Part 2 Course Materials

Importing Data
Importing csv-files with pd.read_csv()
Importing messy csv-files with pd.read_csv()
Importing Data from Excel with pd.read_excel()
Importing messy Data from Excel with pd.read_excel()
Importing Data from the Web with pd.read_html()
Coding Exercise 10

Cleaning Data
First Inspection & Handling of inconsistent Data
Handling / Removing Duplicates
The ignore_index parameter (NEW in Pandas 1.0)
Detection of Outliers
Handling / Removing Outliers
Categorical Data
Pandas Version 1.0 New dtypes and pd.NA
Coding Exercise 11 (Intro)
Coding Exercise 11 (Solution)
String Operations
Changing Datatype of Columns with astype()
Intro NA values missing values
Detection of missing Values
Removing missing values
Replacing missing values
Intro Duplicates
Detection of Duplicates

Merging, Joining, and Concatenating Data
Intro
Right Joins (without Intersection) with merge()
Left Joins with merge()
Right Joins with merge()
Joining on different Column Names Indexes
Joining on more than one Column
pd.merge() and join()
Coding Exercise 12
Adding Rows with append() and pd.concat() (Part 1)
Adding Rows with pd.concat() (Part 2)
Arithmetic with Pandas Objects Data Alignment
EXCURSUS Comparing two DataFrames Identify Differences
Outer Joins with merge()
Inner Joins with merge()
Outer Joins (without Intersection) with merge()
Left Joins (without Intersection) with merge()

GroupBy Operations
Intro
Replacing NA Values by group-specific Values
Generalizing split-apply-combine with apply()
Hierarchical Indexing with Groupby
stack() and unstack()
GroupBy 2
Coding Exercise 13 (Intro)
Coding Exercise 13 (Solution)
Understanding the GroupBy Object
Splitting with many Keys
split-apply-combine explained
split-apply-combine applied
GroupBy 1
Advanced aggregation with agg()
GroupBy Aggregation with Relabeling (NEW – Pandas Version 0.25)
Transformation with transform()

Reshaping and Pivoting DataFrames
Intro
Transposing Rows and Columns
Pivoting DataFrames with pivot()
Limits of pivot()
pivot_table()
pd.crosstab()
melting DataFrames with melt()
Coding Exercise 14

Data Preparation and Feature Creation
Intro
Scaling Standardization
Creating Dummy Variables
String Operations
Coding Exercise 15
Arithmetic Operations (Part 1)
Arithmetic Operations (Part 2)
Transformation / Mapping with map()
Conditional Transformation
Discretization and Binning with pd.cut() (Part 1)
Discretization and Binning with pd.cut() (Part 2)
Discretization and Binning with pd.qcut()
Floors and Caps

Advanced Visualization with Seaborn
Intro
First Steps in Seaborn
Categorical Plots
Joint Plots Regression Plots
Matrixplots Heatmaps
Coding Exercise 16

PART 3: COMPREHENSIVE PROJECT CHALLENGE
Download Part 3 Course Materials
Olympic Medal Tables (Instruction & Hints)
Olympic Medal Tables (Solution Part 1)
Olympic Medal Tables (Solution Part 2)
Olympic Medal Tables (Solution Part 3)

+++ BONUS PROJECT Machine Learning A-Z with Scikit-Learn, Pandas & Seaborn +++
Project Intro
Training the Machine Learning Model
Testing / Evaluating the Model with the Test Set
Feature Importance
Downloads
Importing the Dataset and first Inspection
Cleaning the Data and Creating more Features
Exploratory Data Analysis (Part 1)
Exploratory Data Analysis (Part 2)
Feature Engineering (Part 1)
Feature Engineering (Part 2)
Splitting the Data into Training Set and Test Set

PART 4: MANAGING TIME SERIES DATA WITH PANDAS
Welcome to PART 4 Time Series Data with Pandas
Download Part 4 Course Materials

Time Series Basics
Importing Time Series Data from csv-files
Advanced Indexing with reindex()
Converting strings to datetime objects with pd.to_datetime()
Initial Analysis Visualization of Time Series
Indexing and Slicing Time Series
Creating a customized DatetimeIndex with pd.date_range()
More on pd.date_range()
Downsampling Time Series with resample() (Part 1)
Downsampling Time Series with resample() (Part 2)
The PeriodIndex object

Time Series Advanced Financial Time Series
Intro
Financial Time Series – Covariance and Correlation
Helpful DatetimeIndex Attributes and Methods
Filling NA Values with bfill, ffill and interpolation
Coding Exercise 17
Getting Ready (Installing required package)
Importing Stock Price Data from Yahoo Finance (it still works!)
Initial Inspection and Visualization
Normalizing Time Series to a Base Value (100)
The shift() method
The methods diff() and pct_change()
Measuring Stock Performance with MEAN Returns and STD of Returns
Financial Time Series – Return and Risk

+++ WHAT'S NEW IN PANDAS VERSION 1.0 – A HANDS-ON GUIDE +++
Intro and Overview
The NEW StringDtype
The NEW nullable BooleanDtype
Addition of the ignore_index parameter
Removal of prior Version Deprecations
How to update Pandas to Version 1.0
Downloads for this Section
Important Recap Pandas Display Options (Changed in Version 0.25)
info() method – new and extended output
NEW Extension dtypes (nullable dtypes) Why do we need them
Creating the NEW extension dtypes with convert_dtypes()
NEW pd.NA value for missing values
The NEW nullable Int64Dtype
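
As a rough illustration of the nullable dtypes and pd.NA listed in this section, here is a minimal sketch. It is not taken from the course materials, the sample values are made up, and it assumes pandas 1.0 or later.

```python
import pandas as pd

# Nullable integer column: a missing entry is stored as pd.NA,
# so the column does not fall back to float64.
ints = pd.Series([1, 2, None], dtype="Int64")
print(ints.dtype)        # nullable integer extension dtype
print(ints[2] is pd.NA)  # the missing entry is the pd.NA singleton

# Nullable boolean column with a missing entry.
flags = pd.Series([True, False, None], dtype="boolean")
print(flags.isna().sum())

# convert_dtypes() picks extension dtypes for existing data where possible:
# a float column that only holds whole numbers and missing values
# becomes a nullable integer column.
floats = pd.Series([1.0, 2.0, None])
converted = floats.convert_dtypes()
print(floats.dtype, "->", converted.dtype)
```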

APPENDIX: PYTHON BASICS, NUMPY & STATISTICS
Welcome to the Appendix

Python Basics
Downloads
Data Types Sets
Operators & Booleans
Conditional Statements (if, elif, else, while)
For Loops
Key words break, pass, continue
Generating Random Numbers
User Defined Functions (Part 1)
User Defined Functions (Part 2)
User Defined Functions (Part 3)
Visualization with Matplotlib
Intro
Python Basics
Python Basics Quiz Solution
First Steps
Variables
Data Types Integers and Floats
Data Types Strings
Data Types Lists (Part 1)
Data Types Lists (Part 2)
Data Types Tuples

The Numpy Package
Downloads
Case Study Numpy vs. Python Standard Library
Summary Statistics
Visualization and (Linear) Regression
Numpy
Numpy Quiz Solution
Introduction to Numpy Arrays
Numpy Arrays Vectorization
Numpy Arrays Indexing and Slicing
Numpy Arrays Shape and Dimensions
Numpy Arrays Indexing and Slicing of multi-dimensional Arrays
Numpy Arrays Boolean Indexing
Generating Random Numbers
Performance Issues

Statistical Concepts
Statistics – Overview, Terms and Vocabulary
Downloads for this Section
Population vs. Sample
Visualizing Frequency Distributions with plt.hist()
Relative and Cumulative Frequencies with plt.hist()
Measures of Central Tendency (Theory)
Coding Measures of Central Tendency – Mean and Median
Coding Measures of Central Tendency – Geometric Mean
Variability around the Central Tendency Dispersion (Theory)
Minimum, Maximum and Range with Python / Numpy
Percentiles with Python / Numpy
Variance and Standard Deviation with Python / Numpy
Skew and Kurtosis (Theory)
How to calculate Skew and Kurtosis with scipy.stats
How to generate Random Numbers with Numpy
Reproducibility with np.random.seed()
Probability Distributions – Overview
Discrete Uniform Distributions
Continuous Uniform Distributions
The Normal Distribution (Theory)
Creating a normally distributed Random Variable
Normal Distribution – Probability Density Function (pdf) with scipy.stats
Normal Distribution – Cumulative Distribution Function (cdf) with scipy.stats
The Standard Normal Distribution and Z-Values
Properties of the Standard Normal Distribution (Theory)
Probabilities and Z-Values with scipy.stats
Confidence Intervals with scipy.stats
Covariance and Correlation Coefficient (Theory)
Cleaning and preparing the Data – Movies Database (Part 1)
Cleaning and preparing the Data – Movies Database (Part 2)
How to calculate Covariance and Correlation in Python
Correlation and Scatterplots – visual Interpretation
What is Linear Regression (Theory)
A simple Linear Regression Model with numpy & Scipy
How to interpret Intercept and Slope Coefficient
Case Study (Part 1) The Market Model (Single Factor Model)
Case Study (Part 2) The Market Model (Single Factor Model)

What's next
Get your special BONUS here!