Remember, act, and replay!

Beyond the usual suspects involved in our neural network, we need to define additional functions for our agent's memory. The remember function takes a number of inputs, as follows:

  • State
  • Action
  • Reward
  • Next state
  • Is done

It appends these values to the memory (that is, a sequentially ordered list).
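As a rough sketch of what this might look like in code (written here as a standalone function for illustration; in practice, remember is usually a method of an agent class, and the names below are assumptions rather than the book's final code), with the memory defined as a deque capped at 2,000 entries:

    from collections import deque

    # The agent's memory: a sequentially ordered list capped at 2,000 experiences
    memory = deque(maxlen=2000)

    def remember(memory, state, action, reward, next_state, done):
        # Append one (state, action, reward, next_state, done) tuple to the memory
        memory.append((state, action, reward, next_state, done))

Because the deque has a maximum length, the oldest experiences are discarded automatically once the memory fills up.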

We now define how the agent takes an action in the act function. This is where we manage the balance between exploring the state space and exploiting learned knowledge. These are the steps to follow (a code sketch appears after the list):

  1. It takes in one value, that is, the state.
  2. From there, it applies epsilon; that is, if a random value between 0 and 1 is less than epsilon, then take a random action. Over time, our epsilon decays, reducing the randomness of the action!
  3. Otherwise, we feed the state into our model to predict the expected reward of each possible action.
  4. From this function, we return the action with the highest predicted reward (that is, the argmax of the model's output).
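A minimal sketch of such an epsilon-greedy act function, again written as a standalone function for illustration and assuming a Keras-style model whose output layer has one unit per action (the parameter names are illustrative):

    import random
    import numpy as np

    def act(model, state, epsilon, action_size):
        # Explore: with probability epsilon, take a random action
        if np.random.rand() <= epsilon:
            return random.randrange(action_size)
        # Exploit: predict the expected reward of each action for this state
        q_values = model.predict(state, verbose=0)
        # Return the action with the highest predicted reward
        return int(np.argmax(q_values[0]))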

The additional function we need is for experience replay. The steps that this function takes are as follows (a code sketch follows the list):

  1. Create a random sample (of batch_size) selected from our 2,000-unit memory, which was defined and added to by the preceding remember function
  2. Iterate over the state, action, reward, next_state, and isdone inputs, as follows:
    1. Set target = reward
    2. If not done, then use the following formula:

Estimated future reward = current reward + (discount factor (gamma) * the maximum reward the model predicts for next_state)

    3. Map the target future reward onto the model's prediction for the current state (that is, overwrite the predicted reward for the action that was taken)
    4. Finally, replay the memory by fitting the model on the current state and the targeted future reward for a single epoch of training
  3. Decay epsilon using epsilon_decay, reducing exploration over time
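Putting those steps together, a hedged sketch of the replay routine might look as follows, assuming the same Keras-style model, that states are stored as arrays of shape (1, state_size), and that epsilon is decayed multiplicatively (the function and parameter names are illustrative):

    import random
    import numpy as np

    def replay(model, memory, batch_size, gamma, epsilon, epsilon_decay):
        # 1. Draw a random sample of batch_size experiences from the memory
        minibatch = random.sample(memory, batch_size)
        # 2. Iterate over the sampled (state, action, reward, next_state, done) tuples
        for state, action, reward, next_state, done in minibatch:
            target = reward
            if not done:
                # Estimated future reward = reward + gamma * max predicted reward for next_state
                target = reward + gamma * np.amax(model.predict(next_state, verbose=0)[0])
            # Map the target future reward onto the model's prediction for the current state
            target_f = model.predict(state, verbose=0)
            target_f[0][action] = target
            # Replay: train on this state/target pair for a single epoch
            model.fit(state, target_f, epochs=1, verbose=0)
        # 3. Decay epsilon so the agent explores less as it learns
        return epsilon * epsilon_decay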

This covers the theory of DQNs and Q-learning more generally; now, it's time to write some code.
