Experimenting with RoboSchool

Let's test ME-TRPO on RoboSchoolInvertedPendulum, a continuous inverted pendulum environment similar to CartPole, its well-known discrete-control counterpart. A screenshot of RoboSchoolInvertedPendulum-v1 is shown here:

The goal is to keep the pole upright by moving the cart. A reward of +1 is obtained for every step that the pole points upward.

Because ME-TRPO needs the reward function and, consequently, a done function to simulate trajectories with the learned models, we have to define both for this task. To this end, we defined pendulum_reward, which returns 1 regardless of the observation and action:

def pendulum_reward(ob, ac):
    # +1 for every step, independently of the observation and the action
    return 1

pendulum_done returns True if the absolute value of the pole angle is greater than a fixed threshold. We can retrieve the angle directly from the state: the third and fourth elements of the state (indices 2 and 3) are, respectively, the cosine and sine of the angle, so we can arbitrarily choose one of the two to compute it. Hence, pendulum_done is as follows:

import numpy as np
def pendulum_done(ob):
    # ob[3] is sin(theta): done when |theta| exceeds 0.2 radians
    return np.abs(np.arcsin(np.squeeze(ob[3]))) > .2
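A quick numerical check of this angle recovery may help. This is a sketch: the helper make_obs is hypothetical, and the observation layout [x, x_dot, cos(theta), sin(theta), theta_dot] is an assumption beyond the cosine and sine entries described above. It compares np.arcsin with np.arctan2, which uses both entries and is valid over the whole circle, and shows that the 0.2 threshold is in radians, since arcsin returns radians.

```python
import numpy as np

THRESHOLD = 0.2  # radians, as in pendulum_done

def pendulum_done(ob):
    # ob[3] holds sin(theta); arcsin maps it back to theta in [-pi/2, pi/2]
    return np.abs(np.arcsin(np.squeeze(ob[3]))) > THRESHOLD

def make_obs(theta):
    # Hypothetical observation: only the cos/sin entries matter for pendulum_done
    return np.array([0.0, 0.0, np.cos(theta), np.sin(theta), 0.0])

for theta in [0.0, 0.1, -0.15, 0.25, -0.3, 2.0]:
    ob = make_obs(theta)
    from_asin = np.arcsin(ob[3])            # uses the sine only
    from_atan2 = np.arctan2(ob[3], ob[2])   # uses both sine and cosine
    print(f"theta={theta:+.2f}  arcsin={from_asin:+.3f}  "
          f"arctan2={from_atan2:+.3f}  done={pendulum_done(ob)}")

print(np.degrees(THRESHOLD))  # about 11.46
```

arcsin only returns angles in [-pi/2, pi/2], so it misreads large angles (2.0 rad comes back as about 1.14), but this never matters here because the episode ends as soon as |theta| exceeds 0.2.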

Besides the usual TRPO hyperparameters, which remain almost unchanged from those used in Chapter 7, TRPO and PPO Implementation, ME-TRPO requires the following:

  • The learning rate of the dynamics models' optimizer, mb_lr
  • The mini-batch size used to train the dynamics models, model_batch_size
  • The number of simulated steps to execute on each iteration, simulated_steps (this is also the batch size used to train the policy)
  • The number of models that constitute the ensemble, num_ensemble_models
  • The number of iterations, model_iter, to wait before interrupting the training of a model if the validation loss hasn't decreased (early stopping)
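The early-stopping rule governed by model_iter, together with mini-batches of model_batch_size samples, can be sketched as follows. This is a minimal NumPy illustration, not the book's implementation: a linear model stands in for a neural-network dynamics model, plain mini-batch SGD stands in for the actual optimizer (hence a learning rate much larger than mb_lr), and the function train_with_early_stopping and the synthetic data are hypothetical.

```python
import numpy as np

def train_with_early_stopping(X_tr, Y_tr, X_val, Y_val, lr=1e-2,
                              model_batch_size=50, model_iter=15,
                              max_epochs=1000, seed=0):
    """Fit a linear model Y ~ X @ W with mini-batch SGD and early stopping.

    Training stops once the validation loss has not improved for
    `model_iter` consecutive epochs; the best weights seen are returned.
    """
    rng = np.random.default_rng(seed)
    W = np.zeros((X_tr.shape[1], Y_tr.shape[1]))
    best_W, best_val = W.copy(), np.inf
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        perm = rng.permutation(len(X_tr))
        for start in range(0, len(X_tr), model_batch_size):
            idx = perm[start:start + model_batch_size]
            xb, yb = X_tr[idx], Y_tr[idx]
            # Gradient of the mean squared error with respect to W
            grad = 2.0 * xb.T @ (xb @ W - yb) / len(idx)
            W -= lr * grad

        val_loss = np.mean((X_val @ W - Y_val) ** 2)
        if val_loss < best_val:
            best_val, best_W = val_loss, W.copy()
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= model_iter:
                break  # validation loss stalled for model_iter epochs
    return best_W, best_val, epoch + 1

# Synthetic "transitions": inputs stand in for (state, action), targets for next states
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 6))
W_true = rng.normal(size=(6, 5))
Y = X @ W_true + 0.05 * rng.normal(size=(1000, 5))
W, best_val, n_epochs = train_with_early_stopping(X[:800], Y[:800], X[800:], Y[800:])
print(f"stopped after {n_epochs} epochs, best validation MSE {best_val:.4f}")
```

Keeping the best weights rather than the last ones means the model returned is the one with the lowest validation loss, not the one reached after the model_iter epochs of stagnation.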

The values of the ME-TRPO hyperparameters used in this environment are as follows:

Hyperparameter                                Value
Learning rate (mb_lr)                         1e-5
Model batch size (model_batch_size)           50
Number of simulated steps (simulated_steps)   50000
Number of models (num_ensemble_models)        10
Early stopping iterations (model_iter)        15
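To see how pendulum_reward and pendulum_done take the place of the real environment, here is a minimal sketch of collecting simulated_steps transitions inside an ensemble of num_ensemble_models models. It is not the book's implementation: simulate, make_toy_model, the placeholder policy, the toy models (which simply tilt the pole by a fixed drift), the per-episode choice of ensemble member, and the observation entries other than the cosine and sine are all hypothetical.

```python
import numpy as np

# The reward and done functions from above, repeated so this block runs on its own
def pendulum_reward(ob, ac):
    return 1

def pendulum_done(ob):
    return np.abs(np.arcsin(np.squeeze(ob[3]))) > .2

# Values from the table above, gathered in a plain dict (hypothetical layout)
hyperparams = {
    "mb_lr": 1e-5,
    "model_batch_size": 50,
    "simulated_steps": 50000,
    "num_ensemble_models": 10,
    "model_iter": 15,
}

def simulate(models, init_obs, policy, simulated_steps, max_ep_len=1000, seed=0):
    """Collect exactly `simulated_steps` transitions inside the learned models.

    One ensemble member is drawn at random at the start of each simulated
    episode (an assumption of this sketch). Rewards and terminations come from
    pendulum_reward and pendulum_done instead of the real environment.
    """
    rng = np.random.default_rng(seed)
    transitions, ep_returns = [], []
    while len(transitions) < simulated_steps:
        model = models[rng.integers(len(models))]
        ob = init_obs[rng.integers(len(init_obs))]
        ep_ret = 0
        for _ in range(max_ep_len):
            ac = policy(ob)
            next_ob = model(ob, ac)         # the model predicts the next observation
            rew = pendulum_reward(ob, ac)   # reward from the hand-written function
            done = pendulum_done(next_ob)   # termination checked on the prediction
            transitions.append((ob, ac, rew, next_ob, done))
            ep_ret += rew
            ob = next_ob
            if done or len(transitions) >= simulated_steps:
                break
        ep_returns.append(ep_ret)
    return transitions, ep_returns

def make_toy_model(drift):
    # Hypothetical stand-in for a trained dynamics model: the pole angle grows
    # by a model-specific drift at every step and the action is ignored
    def model(ob, ac):
        theta = np.arctan2(ob[3], ob[2]) + drift
        return np.array([ob[0], ob[1], np.cos(theta), np.sin(theta), drift])
    return model

def policy(ob):
    # Hypothetical placeholder policy that always outputs a zero action
    return np.zeros(1)

models = [make_toy_model(d)
          for d in np.linspace(0.01, 0.05, hyperparams["num_ensemble_models"])]
init_obs = [np.array([0.0, 0.0, 1.0, 0.0, 0.0])]   # pole upright, at rest

transitions, returns = simulate(models, init_obs, policy, hyperparams["simulated_steps"])
print(len(transitions), len(returns), np.mean(returns))
```

Because every simulated step earns +1, the episode returns add up to exactly simulated_steps, and each return is simply the number of steps the toy model kept the pole within the threshold.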