Getting ready

The agents in this recipe do not learn a policy; they make their decisions using their initial set of weights (a fixed policy). The agent picks actions according to the probabilities output by the neural network, and each decision is based only on the current observation of the environment.

We implement this with a fully connected neural network. The number of inputs to the NN is determined by the observation space of the environment, and the number of output neurons by the number of possible discrete actions. The Pac-Man game has nine possible discrete actions (NoOp, Turn Right, Turn Left, Turn Up, Turn Down, Move Left, Move Right, Move Up, and Move Down), so our NN has nine output neurons.
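The following is a minimal sketch of such a fixed-policy agent, not the recipe's exact code. It assumes the classic OpenAI Gym step/reset API and the RAM variant of the environment ('MsPacman-ram-v0'), chosen here only so that a single fully connected layer can be written directly in NumPy; the environment name, the NumPy-only layer, and the randomly initialized (and never updated) weights are illustrative assumptions.

```python
import gym
import numpy as np

# RAM observations are a vector of 128 bytes (an assumption: the recipe may
# instead use the image-based environment with a larger input layer).
env = gym.make('MsPacman-ram-v0')
n_inputs = env.observation_space.shape[0]   # input size comes from the observation space
n_actions = env.action_space.n              # nine discrete actions for Pac-Man

# Fixed (untrained) weights of one fully connected layer: this is the "policy".
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(n_inputs, n_actions))
b = np.zeros(n_actions)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

obs = env.reset()
done, total_reward = False, 0.0
while not done:
    # The decision depends only on the current observation.
    probs = softmax(obs @ W + b)
    # Sample an action from the probabilities given by the network.
    action = rng.choice(n_actions, p=probs)
    obs, reward, done, info = env.step(action)
    total_reward += reward
print('Episode reward with the fixed policy:', total_reward)
```

Because the weights are never updated, running several episodes simply evaluates how well this particular random policy happens to play; there is no learning involved.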
