Title Page
Copyright and Credits
  Reinforcement Learning Algorithms with Python
Dedication
About Packt
  Why subscribe?
Contributors
  About the author
  About the reviewer
  Packt is searching for authors like you
Preface
  Who this book is for
  What this book covers
  To get the most out of this book
    Download the example code files
    Download the color images
    Conventions used
  Get in touch
    Reviews

Section 1: Algorithms and Environments

The Landscape of Reinforcement Learning
  An introduction to RL
    Comparing RL and supervised learning
    History of RL
    Deep RL
  Elements of RL
    Policy
    The value function
    Reward
    Model
  Applications of RL
    Games
    Robotics and Industry 4.0
    Machine learning
    Economics and finance
    Healthcare
    Intelligent transportation systems
    Energy optimization and smart grid
  Summary
  Questions
  Further reading

Implementing RL Cycle and OpenAI Gym
  Setting up the environment
    Installing OpenAI Gym
    Installing Roboschool
  OpenAI Gym and RL cycles
    Developing an RL cycle
    Getting used to spaces
  Development of ML models using TensorFlow
    Tensor
      Constant
      Placeholder
      Variable
    Creating a graph
    Simple linear regression example
  Introducing TensorBoard
  Types of RL environments
    Why different environments?
    Open source environments
  Summary
  Questions
  Further reading

Solving Problems with Dynamic Programming
  MDP
    Policy
    Return
    Value functions
    Bellman equation
  Categorizing RL algorithms
    Model-free algorithms
      Value-based algorithms
      Policy gradient algorithms
      Actor-Critic algorithms
      Hybrid algorithms
    Model-based RL
    Algorithm diversity
  Dynamic programming
    Policy evaluation and policy improvement
    Policy iteration
      Policy iteration applied to FrozenLake
    Value iteration
      Value iteration applied to FrozenLake
  Summary
  Questions
  Further reading

Section 2: Model-Free RL Algorithms

Q-Learning and SARSA Applications
  Learning without a model
    User experience
    Policy evaluation
  The exploration problem
    Why explore?
    How to explore
  TD learning
    TD update
    Policy improvement
    Comparing Monte Carlo and TD
  SARSA
    The algorithm
  Applying SARSA to Taxi-v2
  Q-learning
    Theory
    The algorithm
  Applying Q-learning to Taxi-v2
    Comparing SARSA and Q-learning
  Summary
  Questions

Deep Q-Network
  Deep neural networks and Q-learning
    Function approximation
    Q-learning with neural networks
    Deep Q-learning instabilities
  DQN
    The solution
      Replay memory
      The target network
    The DQN algorithm
      The loss function
      Pseudocode
      Model architecture
  DQN applied to Pong
    Atari games
    Preprocessing
    DQN implementation
      DNNs
      The experience buffer
      The computational graph and training loop
    Results
  DQN variations
    Double DQN
      DDQN implementation
      Results
    Dueling DQN
      Dueling DQN implementation
      Results
    N-step DQN
      Implementation
      Results
  Summary
  Questions
  Further reading

Learning Stochastic and PG Optimization
  Policy gradient methods
    The gradient of the policy
    Policy gradient theorem
    Computing the gradient
    The policy
    On-policy PG
  Understanding the REINFORCE algorithm
    Implementing REINFORCE
    Landing a spacecraft using REINFORCE
    Analyzing the results
  REINFORCE with baseline
    Implementing REINFORCE with baseline
  Learning the AC algorithm
    Using a critic to help an actor to learn
    The n-step AC model
    The AC implementation
    Landing a spacecraft using AC
    Advanced AC, and tips and tricks
  Summary
  Questions
  Further reading

TRPO and PPO Implementation
  Roboschool
    Control a continuous system
  Natural policy gradient
    Intuition behind NPG
    A bit of math
      FIM and KL divergence
    Natural gradient complications
  Trust region policy optimization
    The TRPO algorithm
    Implementation of the TRPO algorithm
    Application of TRPO
  Proximal Policy Optimization
    A quick overview
    The PPO algorithm
    Implementation of PPO
    PPO application
  Summary
  Questions
  Further reading

DDPG and TD3 Applications
  Combining policy gradient optimization with Q-learning
    Deterministic policy gradient
  Deep deterministic policy gradient
    The DDPG algorithm
    DDPG implementation
    Applying DDPG to BipedalWalker-v2
  Twin delayed deep deterministic policy gradient (TD3)
    Addressing overestimation bias
      Implementation of TD3
    Addressing variance reduction
      Delayed policy updates
      Target regularization
    Applying TD3 to BipedalWalker
  Summary
  Questions
  Further reading

Section 3: Beyond Model-Free Algorithms and Improvements

Model-Based RL
  Model-based methods
    A broad perspective on model-based learning
      A known model
      Unknown model
    Advantages and disadvantages
  Combining model-based with model-free learning
    A useful combination
    Building a model from images
  ME-TRPO applied to an inverted pendulum
    Understanding ME-TRPO
    Implementing ME-TRPO
    Experimenting with RoboSchool
      Results on RoboSchoolInvertedPendulum
  Summary
  Questions
  Further reading

Imitation Learning with the DAgger Algorithm
  Technical requirements
    Installation of Flappy Bird
  The imitation approach
    The driving assistant example
    Comparing IL and RL
    The role of the expert in imitation learning
    The IL structure
      Comparing active with passive imitation
  Playing Flappy Bird
    How to use the environment
  Understanding the dataset aggregation algorithm
    The DAgger algorithm
    Implementation of DAgger
      Loading the expert inference model
      Creating the learner's computational graph
      Creating a DAgger loop
    Analyzing the results on Flappy Bird
  IRL
  Summary
  Questions
  Further reading

Understanding Black-Box Optimization Algorithms
  Beyond RL
    A brief recap of RL
    The alternative
  EAs
    The core of EAs
      Genetic algorithms
      Evolution strategies
        CMA-ES
    ES versus RL
  Scalable evolution strategies
    The core
      Parallelizing ES
      Other tricks
      Pseudocode
    Scalable implementation
      The main function
      Workers
  Applying scalable ES to LunarLander
  Summary
  Questions
  Further reading

Developing the ESBAS Algorithm
  Exploration versus exploitation
    Multi-armed bandit
  Approaches to exploration
    The ε-greedy strategy
    The UCB algorithm
      UCB1
    Exploration complexity
  Epochal stochastic bandit algorithm selection
    Unboxing algorithm selection
    Under the hood of ESBAS
    Implementation
    Solving Acrobot
    Results
  Summary
  Questions
  Further reading

Practical Implementation for Resolving RL Challenges
  Best practices of deep RL
    Choosing the appropriate algorithm
    From zero to one
  Challenges in deep RL
    Stability and reproducibility
    Efficiency
    Generalization
  Advanced techniques
    Unsupervised RL
      Intrinsic reward
    Transfer learning
      Types of transfer learning
        1-task learning
        Multi-task learning
  RL in the real world
    Facing real-world challenges
    Bridging the gap between simulation and the real world
  Creating your own environment
  Future of RL and its impact on society
  Summary
  Questions
  Further reading

Assessments

Other Books You May Enjoy
  Leave a review - let other readers know what you think