The experience buffer

The experience buffer is an instance of the ExperienceBuffer class. It stores a FIFO (First In, First Out) queue for each of the following components: observation, reward, action, next observation, and done. FIFO means that once a queue reaches the maximum capacity specified by maxlen, it discards elements starting from the oldest one. In our implementation, the capacity is buffer_size:

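from collections import deque
import numpy as np

class ExperienceBuffer():
    '''
    Experience replay buffer: one bounded FIFO queue (deque) per transition component
    '''

    def __init__(self, buffer_size):
        # When a deque reaches maxlen, each new append evicts its oldest element
        self.obs_buf = deque(maxlen=buffer_size)
        self.rew_buf = deque(maxlen=buffer_size)
        self.act_buf = deque(maxlen=buffer_size)
        self.obs2_buf = deque(maxlen=buffer_size)
        self.done_buf = deque(maxlen=buffer_size)

    def add(self, obs, rew, act, obs2, done):
        # Store one transition (obs, rew, act, obs2, done) across the five queues
        self.obs_buf.append(obs)
        self.rew_buf.append(rew)
        self.act_buf.append(act)
        self.obs2_buf.append(obs2)
        self.done_buf.append(done)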
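The eviction behaviour that the buffer relies on comes entirely from Python's collections.deque. The following minimal illustration uses a toy capacity of 3, chosen only for readability, and shows that once the deque is full, every append drops the oldest element.

```python
from collections import deque

# A deque with maxlen=3 behaves like each per-component buffer above:
# once it is full, every append silently evicts the oldest element.
buf = deque(maxlen=3)
for step in range(5):
    buf.append(step)

print(list(buf))  # [2, 3, 4]
print(len(buf))   # 3
```

Because all five queues receive exactly one append per call to add, they evict in lockstep. Index i therefore refers to the same transition in every queue.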

The ExperienceBuffer class also handles sampling the mini-batches used to train the neural network. Each mini-batch contains batch_size transitions, drawn uniformly at random (with replacement) from the buffer:

    def sample_minibatch(self, batch_size):
        # Draw batch_size indices uniformly at random, with replacement
        mb_indices = np.random.randint(len(self.obs_buf), size=batch_size)

        # scale_frames is a helper not defined in this section; it preprocesses the frame observations
        mb_obs = scale_frames([self.obs_buf[i] for i in mb_indices])
        mb_rew = [self.rew_buf[i] for i in mb_indices]
        mb_act = [self.act_buf[i] for i in mb_indices]
        mb_obs2 = scale_frames([self.obs2_buf[i] for i in mb_indices])
        mb_done = [self.done_buf[i] for i in mb_indices]

        return mb_obs, mb_rew, mb_act, mb_obs2, mb_done
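np.random.randint draws indices with replacement, so the same transition can occasionally appear more than once in a single mini-batch. If you would rather avoid duplicates, the following minimal sketch shows an alternative. The sample_indices function is hypothetical and not part of the original code. It uses NumPy's Generator API (default_rng, integers, choice).

```python
import numpy as np

def sample_indices(buffer_len, batch_size, replace=True, rng=None):
    """Hypothetical helper: return batch_size indices drawn uniformly from range(buffer_len)."""
    rng = np.random.default_rng() if rng is None else rng
    if replace:
        # Same distribution as np.random.randint(buffer_len, size=batch_size)
        return rng.integers(buffer_len, size=batch_size)
    # Without replacement: no transition appears twice in one mini-batch
    return rng.choice(buffer_len, size=batch_size, replace=False)

rng = np.random.default_rng(0)
idx = sample_indices(1000, 32, replace=False, rng=rng)
print(len(idx), len(set(idx.tolist())))  # 32 32
```

When the buffer is much larger than batch_size, duplicates are rare with either strategy, so in practice the difference is usually small.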

Lastly, we implement the __len__ method so that len() returns the number of transitions stored in the buffer. Because all five queues always contain the same number of elements, we return only the length of self.obs_buf:

    def __len__(self):
        return len(self.obs_buf)
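The following end-to-end sketch puts the pieces together. It fills a small buffer with dummy transitions and samples from it only once len(buffer), which calls __len__, reaches batch_size. The body of scale_frames here is an assumption, because the original helper is not shown in this section: it converts uint8 frames to float32 values in [0, 1]. The 4x4 frame shape, the rewards, and the buffer size are illustrative values only.

```python
from collections import deque
import numpy as np

def scale_frames(frames):
    # Hypothetical stand-in for the helper used above: uint8 frames -> float32 in [0, 1]
    return np.array(frames, dtype=np.float32) / 255.0

class ExperienceBuffer():
    def __init__(self, buffer_size):
        self.obs_buf = deque(maxlen=buffer_size)
        self.rew_buf = deque(maxlen=buffer_size)
        self.act_buf = deque(maxlen=buffer_size)
        self.obs2_buf = deque(maxlen=buffer_size)
        self.done_buf = deque(maxlen=buffer_size)

    def add(self, obs, rew, act, obs2, done):
        self.obs_buf.append(obs)
        self.rew_buf.append(rew)
        self.act_buf.append(act)
        self.obs2_buf.append(obs2)
        self.done_buf.append(done)

    def sample_minibatch(self, batch_size):
        mb_indices = np.random.randint(len(self.obs_buf), size=batch_size)
        mb_obs = scale_frames([self.obs_buf[i] for i in mb_indices])
        mb_rew = [self.rew_buf[i] for i in mb_indices]
        mb_act = [self.act_buf[i] for i in mb_indices]
        mb_obs2 = scale_frames([self.obs2_buf[i] for i in mb_indices])
        mb_done = [self.done_buf[i] for i in mb_indices]
        return mb_obs, mb_rew, mb_act, mb_obs2, mb_done

    def __len__(self):
        return len(self.obs_buf)

buffer = ExperienceBuffer(buffer_size=100)
batch_size = 32
rng = np.random.default_rng(0)

for t in range(150):
    # Dummy transition with tiny 4x4 uint8 "frames" (illustrative only)
    obs = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)
    obs2 = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)
    buffer.add(obs, rew=1.0, act=t % 2, obs2=obs2, done=False)

    # len(buffer) calls __len__; skip sampling until a full batch is available
    if len(buffer) >= batch_size:
        mb_obs, mb_rew, mb_act, mb_obs2, mb_done = buffer.sample_minibatch(batch_size)

print(len(buffer))   # 100 (capacity reached; the oldest 50 transitions were evicted)
print(mb_obs.shape)  # (32, 4, 4)
```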