Reinforcement Learning Algorithms with Python
by Andrea Lonza
Table of Contents
Title Page
Copyright and Credits
Reinforcement Learning Algorithms with Python
Dedication
About Packt
Why subscribe?
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Section 1: Algorithms and Environments
The Landscape of Reinforcement Learning
An introduction to RL
Comparing RL and supervised learning
History of RL
Deep RL
Elements of RL
Policy
The value function
Reward
Model
Applications of RL
Games 
Robotics and Industry 4.0
Machine learning
Economics and finance
Healthcare
Intelligent transportation systems
Energy optimization and smart grid
Summary
Questions
Further reading
Implementing RL Cycle and OpenAI Gym
Setting up the environment
Installing OpenAI Gym 
Installing Roboschool 
OpenAI Gym and RL cycles
Developing an RL cycle
Getting used to spaces
Development of ML models using TensorFlow
Tensor
Constant
Placeholder
Variable
Creating a graph
Simple linear regression example
Introducing TensorBoard
Types of RL environments
Why different environments?
Open source environments
Summary
Questions
Further reading
Solving Problems with Dynamic Programming
MDP
Policy
Return
Value functions
Bellman equation
Categorizing RL algorithms
Model-free algorithms
Value-based algorithms
Policy gradient algorithms
Actor-Critic algorithms
Hybrid algorithms
Model-based RL
Algorithm diversity
Dynamic programming
Policy evaluation and policy improvement
Policy iteration
Policy iteration applied to FrozenLake
Value iteration
Value iteration applied to FrozenLake
Summary
Questions
Further reading
Section 2: Model-Free RL Algorithms
Q-Learning and SARSA Applications
Learning without a model
User experience
Policy evaluation
The exploration problem
Why explore?
How to explore
TD learning
TD update
Policy improvement
Comparing Monte Carlo and TD
SARSA
The algorithm
Applying SARSA to Taxi-v2
Q-learning
Theory
The algorithm
Applying Q-learning to Taxi-v2
Comparing SARSA and Q-learning
Summary
Questions
Deep Q-Network
Deep neural networks and Q-learning
Function approximation 
Q-learning with neural networks
Deep Q-learning instabilities
DQN
The solution
Replay memory
The target network
The DQN algorithm
The loss function
Pseudocode
Model architecture
DQN applied to Pong
Atari games 
Preprocessing
DQN implementation
DNNs
The experience buffer
The computational graph and training loop
Results
DQN variations
Double DQN
DDQN implementation
Results
Dueling DQN
Dueling DQN implementation
Results
N-step DQN
Implementation
Results
Summary
Questions
Further reading
Learning Stochastic and PG Optimization
Policy gradient methods
The gradient of the policy
Policy gradient theorem
Computing the gradient
The policy
On-policy PG
Understanding the REINFORCE algorithm
Implementing REINFORCE
Landing a spacecraft using REINFORCE
Analyzing the results
REINFORCE with baseline
Implementing REINFORCE with baseline
Learning the AC algorithm
Using a critic to help an actor to learn
The n-step AC model
The AC implementation
Landing a spacecraft using AC 
Advanced AC, and tips and tricks
Summary
Questions
Further reading
TRPO and PPO Implementation
Roboschool
Control a continuous system
Natural policy gradient
Intuition behind NPG
A bit of math
FIM and KL divergence
Natural gradient complications
Trust region policy optimization
The TRPO algorithm
Implementation of the TRPO algorithm
Application of TRPO
Proximal Policy Optimization
A quick overview
The PPO algorithm
Implementation of PPO
PPO application
Summary
Questions
Further reading
DDPG and TD3 Applications
Combining policy gradient optimization with Q-learning
Deterministic policy gradient
Deep deterministic policy gradient
The DDPG algorithm
DDPG implementation
Applying DDPG to BipedalWalker-v2
Twin delayed deep deterministic policy gradient (TD3)
Addressing overestimation bias
Implementation of TD3
Addressing variance reduction
Delayed policy updates
Target regularization
Applying TD3 to BipedalWalker
Summary
Questions
Further reading
Section 3: Beyond Model-Free Algorithms and Improvements
Model-Based RL
Model-based methods
A broad perspective on model-based learning
A known model
Unknown model
Advantages and disadvantages
Combining model-based with model-free learning
A useful combination
Building a model from images
ME-TRPO applied to an inverted pendulum
Understanding ME-TRPO
Implementing ME-TRPO
Experimenting with Roboschool
Results on RoboschoolInvertedPendulum
Summary
Questions
Further reading
Imitation Learning with the DAgger Algorithm
Technical requirements
Installation of Flappy Bird
The imitation approach
The driving assistant example
Comparing IL and RL
The role of the expert in imitation learning
The IL structure
Comparing active with passive imitation
Playing Flappy Bird
How to use the environment
Understanding the dataset aggregation algorithm
The DAgger algorithm
Implementation of DAgger
Loading the expert inference model
Creating the learner's computational graph
Creating a DAgger loop
Analyzing the results on Flappy Bird
IRL
Summary
Questions
Further reading
Understanding Black-Box Optimization Algorithms
Beyond RL
A brief recap of RL
The alternative
EAs
The core of EAs
Genetic algorithms
Evolution strategies
CMA-ES
ES versus RL
Scalable evolution strategies
The core
Parallelizing ES
Other tricks
Pseudocode
Scalable implementation
The main function
Workers
Applying scalable ES to LunarLander
Summary
Questions
Further reading
Developing the ESBAS Algorithm
Exploration versus exploitation
Multi-armed bandit
Approaches to exploration
The ε-greedy strategy
The UCB algorithm
UCB1
Exploration complexity
Epochal stochastic bandit algorithm selection
Unboxing algorithm selection
Under the hood of ESBAS
Implementation
Solving Acrobot
Results
Summary
Questions
Further reading
Practical Implementation for Resolving RL Challenges
Best practices of deep RL
Choosing the appropriate algorithm
From zero to one
Challenges in deep RL
Stability and reproducibility
Efficiency
Generalization
Advanced techniques
Unsupervised RL
Intrinsic reward
Transfer learning
Types of transfer learning
1-task learning
Multi-task learning
RL in the real world
Facing real-world challenges
Bridging the gap between simulation and the real world
Creating your own environment
Future of RL and its impact on society
Summary
Questions
Further reading
Assessments
Other Books You May Enjoy
Leave a review - let other readers know what you think