Artificial Intelligence: Reinforcement Learning in Python

English | MP4 | AVC 1280×720 | AAC 44KHz 2ch | 112 lectures (14h 41m) | 4.16 GB

Complete guide to Reinforcement Learning, with Stock Trading and Online Advertising Applications

Ever wondered how AI technologies like OpenAI ChatGPT and GPT-4 really work? In this course, you will learn the foundations of these groundbreaking applications.

When people talk about artificial intelligence, they usually don’t mean supervised and unsupervised machine learning.

These tasks are pretty trivial compared to what we think of AIs doing – playing chess and Go, driving cars, and beating video games at a superhuman level.

Reinforcement learning has recently become popular for doing all of that and more.

Much like deep learning, a lot of the theory was discovered in the 70s and 80s, but only recently have we been able to observe firsthand the amazing results that are possible.

In 2016 we saw Google’s AlphaGo beat the world champion in Go.

We saw AIs playing video games like Doom and Super Mario.

Self-driving cars have started driving on real roads with other drivers and even carrying passengers (Uber), all without human assistance.

If that sounds amazing, brace yourself for the future, because the law of accelerating returns dictates that this progress will only continue to accelerate.

Learning about supervised and unsupervised machine learning is no small feat. To date, I have over TWENTY-FIVE (25!) courses on those topics alone.

And yet reinforcement learning opens up a whole new world. As you’ll learn in this course, the reinforcement learning paradigm is very different from both supervised and unsupervised learning.

It’s led to new and amazing insights in both behavioral psychology and neuroscience. As you’ll learn in this course, there are many analogies between teaching an agent and teaching an animal or even a human. It’s the closest thing we have so far to a true artificial general intelligence.

What’s covered in this course?

  • The multi-armed bandit problem and the explore-exploit dilemma
  • Ways to calculate means and moving averages, and their relationship to stochastic gradient descent (see the sketch after this list)
  • Markov Decision Processes (MDPs)
  • Dynamic Programming
  • Monte Carlo
  • Temporal Difference (TD) Learning (Q-Learning and SARSA)
  • Approximation Methods (i.e., how to plug a deep neural network or other differentiable model into your RL algorithm)
  • How to use OpenAI Gym, with zero code changes
  • Project: Apply Q-Learning to build a stock trading bot
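
To make the bandit material concrete, here is a minimal sketch (assuming a toy 3-armed Gaussian bandit; this is not the course's actual code) of epsilon-greedy with the incremental sample-mean update Q(a) <- Q(a) + (1/N(a)) * (r - Q(a)), which has the same shape as a stochastic gradient descent step with learning rate 1/N(a):

    import numpy as np

    np.random.seed(0)
    true_means = [0.1, 0.5, 0.9]   # hypothetical arm means, unknown to the agent
    eps = 0.1                      # exploration probability
    Q = np.zeros(3)                # estimated value of each arm
    N = np.zeros(3)                # number of times each arm was pulled

    for t in range(10000):
        if np.random.random() < eps:
            a = np.random.randint(3)           # explore: random arm
        else:
            a = int(np.argmax(Q))              # exploit: current best arm
        r = np.random.randn() + true_means[a]  # sample a noisy reward
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]              # incremental sample-mean update

    print(Q)  # estimates approach [0.1, 0.5, 0.9]

Swapping the 1/N(a) step size for a small constant turns the same update into an exponentially weighted moving average, which is the right choice for the nonstationary bandits covered later in the course.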

If you’re ready to take on a brand-new challenge and learn about AI techniques you’ve never seen in traditional supervised machine learning, unsupervised machine learning, or even deep learning, then this course is for you.

What you’ll learn

  • Apply gradient-based supervised machine learning methods to reinforcement learning (see the sketch after this list)
  • Understand reinforcement learning on a technical level
  • Understand the relationship between reinforcement learning and psychology
  • Implement 17 different reinforcement learning algorithms
  • Understand important foundations for OpenAI ChatGPT and GPT-4
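
As a taste of how gradient-based supervised methods carry over to RL, here is a minimal sketch (all names and dimensions are made up for illustration) of semi-gradient Q-learning with a linear model: the TD target is treated as a fixed regression label, and the weights take one squared-error gradient step, exactly as in supervised learning:

    import numpy as np

    n_features, n_actions = 4, 2
    w = np.zeros((n_actions, n_features))  # one weight vector per action
    gamma, alpha = 0.99, 0.1               # discount factor, learning rate

    def q_values(phi):
        # linear model: Q(s, a) = w[a] . phi(s), one estimate per action
        return w @ phi

    def td_update(phi, a, r, phi_next, done):
        # supervised-style target: reward plus discounted bootstrap estimate
        target = r if done else r + gamma * q_values(phi_next).max()
        error = target - q_values(phi)[a]
        w[a] += alpha * error * phi  # gradient step on 0.5 * error**2 w.r.t. w[a]

    # one update with random stand-in feature vectors
    phi, phi_next = np.random.randn(4), np.random.randn(4)
    td_update(phi, a=0, r=1.0, phi_next=phi_next, done=False)
    print(w)
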
Table of Contents

Welcome
1 Introduction
2 Course Outline and Big Picture
3 Where to get the Code
4 How to Succeed in this Course
5 Warmup

Return of the Multi-Armed Bandit
6 Section Introduction: The Explore-Exploit Dilemma
7 Applications of the Explore-Exploit Dilemma
8 Epsilon-Greedy Theory
9 Calculating a Sample Mean (pt 1)
10 Epsilon-Greedy Beginner’s Exercise Prompt
11 Designing Your Bandit Program
12 Epsilon-Greedy in Code
13 Comparing Different Epsilons
14 Optimistic Initial Values Theory
15 Optimistic Initial Values Beginner’s Exercise Prompt
16 Optimistic Initial Values Code
17 UCB1 Theory
18 UCB1 Beginner’s Exercise Prompt
19 UCB1 Code
20 Bayesian Bandits Thompson Sampling Theory (pt 1)
21 Bayesian Bandits Thompson Sampling Theory (pt 2)
22 Thompson Sampling Beginner’s Exercise Prompt
23 Thompson Sampling Code
24 Thompson Sampling With Gaussian Reward Theory
25 Thompson Sampling With Gaussian Reward Code
26 Exercise on Gaussian Rewards
27 Why don’t we just use a library
28 Nonstationary Bandits
29 Bandit Summary, Real Data, and Online Learning
30 (Optional) Alternative Bandit Designs
31 Suggestion Box

High Level Overview of Reinforcement Learning
32 What is Reinforcement Learning
33 From Bandits to Full Reinforcement Learning

Markov Decision Processes
34 MDP Section Introduction
35 Gridworld
36 Choosing Rewards
37 The Markov Property
38 Markov Decision Processes (MDPs)
39 Future Rewards
40 Value Functions
41 The Bellman Equation (pt 1)
42 The Bellman Equation (pt 2)
43 The Bellman Equation (pt 3)
44 Bellman Examples
45 Optimal Policy and Optimal Value Function (pt 1)
46 Optimal Policy and Optimal Value Function (pt 2)
47 MDP Summary

Dynamic Programming
48 Dynamic Programming Section Introduction
49 Iterative Policy Evaluation
50 Designing Your RL Program
51 Gridworld in Code
52 Iterative Policy Evaluation in Code
53 Windy Gridworld in Code
54 Iterative Policy Evaluation for Windy Gridworld in Code
55 Policy Improvement
56 Policy Iteration
57 Policy Iteration in Code
58 Policy Iteration in Windy Gridworld
59 Value Iteration
60 Value Iteration in Code
61 Dynamic Programming Summary

Monte Carlo
62 Monte Carlo Intro
63 Monte Carlo Policy Evaluation
64 Monte Carlo Policy Evaluation in Code
65 Monte Carlo Control
66 Monte Carlo Control in Code
67 Monte Carlo Control without Exploring Starts
68 Monte Carlo Control without Exploring Starts in Code
69 Monte Carlo Summary

Temporal Difference Learning
70 Temporal Difference Introduction
71 TD(0) Prediction
72 TD(0) Prediction in Code
73 SARSA
74 SARSA in Code
75 Q Learning
76 Q Learning in Code
77 TD Learning Section Summary

Approximation Methods
78 Approximation Methods Section Introduction
79 Linear Models for Reinforcement Learning
80 Feature Engineering
81 Approximation Methods for Prediction
82 Approximation Methods for Prediction Code
83 Approximation Methods for Control
84 Approximation Methods for Control Code
85 CartPole
86 CartPole Code
87 Approximation Methods Exercise
88 Approximation Methods Section Summary

Interlude Common Beginner Questions
89 This Course vs. RL Book: What’s the Difference?

Stock Trading Project with Reinforcement Learning
90 Beginners, halt! Stop here if you skipped ahead
91 Stock Trading Project Section Introduction
92 Data and Environment
93 How to Model Q for Q-Learning
94 Design of the Program
95 Code pt 1
96 Code pt 2
97 Code pt 3
98 Code pt 4
99 Stock Trading Project Discussion

Setting Up Your Environment (FAQ by Student Request)
100 Pre-Installation Check
101 Anaconda Environment Setup
102 How to install Numpy, Scipy, Matplotlib, Pandas, IPython, Theano, and TensorFlow

Extra Help With Python Coding for Beginners (FAQ by Student Request)
103 How to Code by Yourself (part 1)
104 How to Code by Yourself (part 2)
105 Proof that using Jupyter Notebook is the same as not using it
106 Python 2 vs Python 3

Effective Learning Strategies for Machine Learning (FAQ by Student Request)
107 How to Succeed in this Course (Long Version)
108 Is this for Beginners or Experts? Academic or Practical? Fast or Slow-Paced?
109 Machine Learning and AI Prerequisite Roadmap (pt 1)
110 Machine Learning and AI Prerequisite Roadmap (pt 2)

Appendix FAQ Finale
111 What is the Appendix
112 BONUS
