English | MP4 | AVC 1280×720 | AAC 44KHz 2ch | 14.5 Hours | 2.92 GB

Complete guide to Reinforcement Learning, with Stock Trading and Online Advertising Applications

When people talk about artificial intelligence, they usually don’t mean supervised and unsupervised machine learning.

These tasks are pretty trivial compared to what we think of AIs doing – playing chess and Go, driving cars, and beating video games at a superhuman level.

Reinforcement learning has recently become popular for doing all of that and more.

Much like deep learning, a lot of the theory was discovered in the 70s and 80s, but only recently have we been able to observe firsthand the amazing results that are possible.

In 2016 we saw Google’s AlphaGo beat the world champion in Go.

We saw AIs playing video games like Doom and Super Mario.

Self-driving cars have started driving on real roads with other drivers and even carrying passengers (Uber), all without human assistance.

If that sounds amazing, brace yourself for the future, because the law of accelerating returns suggests that this progress is only going to keep accelerating.

Learning about supervised and unsupervised machine learning is no small feat. To date I have over TWENTY FIVE (25!) courses just on those topics alone.

And yet reinforcement learning opens up a whole new world. As you’ll learn in this course, the reinforcement learning paradigm is very different from both supervised and unsupervised learning.

It’s led to new and amazing insights both in behavioral psychology and neuroscience. As you’ll learn in this course, there are many analogous processes when it comes to teaching an agent and teaching an animal or even a human. It’s the closest thing we have so far to a true artificial general intelligence.

What’s covered in this course?

- The multi-armed bandit problem and the explore-exploit dilemma
- Ways to calculate means and moving averages and their relationship to stochastic gradient descent
- Markov Decision Processes (MDPs)
- Dynamic Programming
- Monte Carlo
- Temporal Difference (TD) Learning (Q-Learning and SARSA)
- Approximation Methods (i.e. how to plug in a deep neural network or other differentiable model into your RL algorithm)
- How to use OpenAI Gym, with zero code changes
- Project: Apply Q-Learning to build a stock trading bot
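
To make the first two topics concrete, here is a minimal sketch (not the course's own code) of an epsilon-greedy bandit that tracks each arm's value with an incremental sample mean. The function names, the win probabilities, and the default settings are all assumptions chosen for illustration. The mean update has the form "old estimate plus step size times error", which is a gradient descent step on the squared error 0.5 * (x - mean)^2 with step size 1/n. Replacing 1/n with a constant gives an exponentially weighted moving average.

```python
import random


def update_mean(old_mean, n, x):
    """Running sample mean after seeing the n-th sample x (n counts from 1)."""
    return old_mean + (x - old_mean) / n


def update_moving_average(old_avg, x, alpha=0.1):
    """Same update with a constant step size: an exponentially weighted average."""
    return old_avg + alpha * (x - old_avg)


def run_epsilon_greedy(true_probs, epsilon=0.1, steps=10_000, seed=0):
    """Play a Bernoulli bandit; true_probs are hypothetical win rates per arm."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_probs)  # current value estimate per arm
    counts = [0] * len(true_probs)       # number of pulls per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(len(true_probs))
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(len(true_probs)), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        estimates[arm] = update_mean(estimates[arm], counts[arm], reward)
    return estimates, counts


estimates, counts = run_epsilon_greedy([0.2, 0.5, 0.75])
print("estimates:", [round(e, 3) for e in estimates])
print("pull counts:", counts)
```

With a small epsilon the agent spends most pulls on the arm it currently believes is best, while the occasional random pull keeps the other estimates from going stale.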

If you’re ready to take on a brand new challenge, and learn about AI techniques that you’ve never seen before in traditional supervised machine learning, unsupervised machine learning, or even deep learning, then this course is for you.

What you’ll learn

- Apply gradient-based supervised machine learning methods to reinforcement learning
- Understand reinforcement learning on a technical level
- Understand the relationship between reinforcement learning and psychology
- Implement 17 different reinforcement learning algorithms
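
As a taste of what implementing one of these algorithms involves, here is a minimal sketch of tabular Q-Learning, one of the temporal difference methods named in the outline above. It is not taken from the course. The five-state corridor environment, the `step` and `q_learning` names, and every hyperparameter value are assumptions made up for this example.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is terminal
ACTIONS = (-1, +1)  # action index 0 moves left, action index 1 moves right


def step(state, action):
    """Move along the corridor; reaching the right end pays 1 and ends the episode."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action index]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action choice, breaking ties at random.
            if rng.random() < epsilon or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2, r, done = step(s, ACTIONS[a])
            # Q-Learning target: reward plus discounted best value of the next state.
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q


Q = q_learning()
for s in range(N_STATES - 1):
    print(s, [round(q, 3) for q in Q[s]])
```

After training, the value of moving right should exceed the value of moving left in every non-terminal state, so acting greedily on Q walks straight to the goal.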

## Table of Contents

**Welcome**

1 Introduction

2 Course Outline and Big Picture

3 Where to get the Code

4 How to Succeed in this Course

5 Warmup

**Return of the Multi-Armed Bandit**

6 Section Introduction: The Explore-Exploit Dilemma

7 Applications of the Explore-Exploit Dilemma

8 Epsilon-Greedy Theory

9 Calculating a Sample Mean (pt 1)

10 Epsilon-Greedy Beginner’s Exercise Prompt

11 Designing Your Bandit Program

12 Epsilon-Greedy in Code

13 Comparing Different Epsilons

14 Optimistic Initial Values Theory

15 Optimistic Initial Values Beginner’s Exercise Prompt

16 Optimistic Initial Values Code

17 UCB1 Theory

18 UCB1 Beginner’s Exercise Prompt

19 UCB1 Code

20 Bayesian Bandits Thompson Sampling Theory (pt 1)

21 Bayesian Bandits Thompson Sampling Theory (pt 2)

22 Thompson Sampling Beginner’s Exercise Prompt

23 Thompson Sampling Code

24 Thompson Sampling With Gaussian Reward Theory

25 Thompson Sampling With Gaussian Reward Code

26 Why don’t we just use a library?

27 Nonstationary Bandits

28 Bandit Summary, Real Data, and Online Learning

29 (Optional) Alternative Bandit Designs

30 Suggestion Box

**High Level Overview of Reinforcement Learning**

31 What is Reinforcement Learning?

32 From Bandits to Full Reinforcement Learning

**Markov Decision Processes**

33 MDP Section Introduction

34 Gridworld

35 Choosing Rewards

36 The Markov Property

37 Markov Decision Processes (MDPs)

38 Future Rewards

39 Value Functions

40 The Bellman Equation (pt 1)

41 The Bellman Equation (pt 2)

42 The Bellman Equation (pt 3)

43 Bellman Examples

44 Optimal Policy and Optimal Value Function (pt 1)

45 Optimal Policy and Optimal Value Function (pt 2)

46 MDP Summary

**Dynamic Programming**

47 Dynamic Programming Section Introduction

48 Iterative Policy Evaluation

49 Designing Your RL Program

50 Gridworld in Code

51 Iterative Policy Evaluation in Code

52 Windy Gridworld in Code

53 Iterative Policy Evaluation for Windy Gridworld in Code

54 Policy Improvement

55 Policy Iteration

56 Policy Iteration in Code

57 Policy Iteration in Windy Gridworld

58 Value Iteration

59 Value Iteration in Code

60 Dynamic Programming Summary

**Monte Carlo**

61 Monte Carlo Intro

62 Monte Carlo Policy Evaluation

63 Monte Carlo Policy Evaluation in Code

64 Monte Carlo Control

65 Monte Carlo Control in Code

66 Monte Carlo Control without Exploring Starts

67 Monte Carlo Control without Exploring Starts in Code

68 Monte Carlo Summary

**Temporal Difference Learning**

69 Temporal Difference Introduction

70 TD(0) Prediction

71 TD(0) Prediction in Code

72 SARSA

73 SARSA in Code

74 Q Learning

75 Q Learning in Code

76 TD Learning Section Summary

**Approximation Methods**

77 Approximation Methods Section Introduction

78 Linear Models for Reinforcement Learning

79 Feature Engineering

80 Approximation Methods for Prediction

81 Approximation Methods for Prediction Code

82 Approximation Methods for Control

83 Approximation Methods for Control Code

84 CartPole

85 CartPole Code

86 Approximation Methods Exercise

87 Approximation Methods Section Summary

**Interlude Common Beginner Questions**

88 This Course vs. RL Book: What’s the Difference?

**Stock Trading Project with Reinforcement Learning**

89 Beginners, halt! Stop here if you skipped ahead

90 Stock Trading Project Section Introduction

91 Data and Environment

92 How to Model Q for Q-Learning

93 Design of the Program

94 Code pt 1

95 Code pt 2

96 Code pt 3

97 Code pt 4

98 Stock Trading Project Discussion

**Setting Up Your Environment (FAQ by Student Request)**

99 Windows-Focused Environment Setup 2018

100 How to install Numpy, Scipy, Matplotlib, Pandas, IPython, Theano, and TensorFlow

**Extra Help With Python Coding for Beginners (FAQ by Student Request)**

101 How to Code by Yourself (part 1)

102 How to Code by Yourself (part 2)

103 Proof that using Jupyter Notebook is the same as not using it

104 Python 2 vs Python 3

**Effective Learning Strategies for Machine Learning (FAQ by Student Request)**

105 How to Succeed in this Course (Long Version)

106 Is this for Beginners or Experts? Academic or Practical? Fast or slow-paced?

107 Machine Learning and AI Prerequisite Roadmap (pt 1)

108 Machine Learning and AI Prerequisite Roadmap (pt 2)

**Appendix FAQ Finale**

109 What is the Appendix?

110 BONUS Where to get discount coupons and FREE deep learning material
