Scalable implementation

To simplify the implementation and make the parallelized version of ES work well with a limited number of workers (and CPUs), we will use a structure similar to the one shown in the following diagram. The main process creates one worker per CPU core and executes the main cycle. On each iteration, it waits until a given number of new candidates have been evaluated by the workers. Unlike the implementation described in the paper, each worker evaluates more than one agent per iteration. So, if we have four CPUs, four workers will be created. If we then want a total batch size larger than the number of workers, say, 40, each worker will create and evaluate 10 individuals on each iteration of the main cycle. The returns and the corresponding seeds are sent back to the main process, which waits for the results of all 40 individuals before continuing with the following lines of code.
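The per-worker evaluation step can be sketched as follows. This is only an illustration, not the book's actual code: `fitness`, `SEEDS_PER_WORKER`, and `NOISE_STD` are hypothetical stand-ins (in the real implementation, the fitness of a candidate is the total reward of an episode in the environment):

```python
import numpy as np

SEEDS_PER_WORKER = 10   # candidates evaluated by each worker per iteration
NOISE_STD = 0.05        # assumed perturbation strength (sigma)

def fitness(params):
    # Hypothetical stand-in for a full episode rollout; in the real code
    # this would run the agent in the environment and return its total reward
    return -float(np.sum(params ** 2))

def evaluate_batch(params):
    """Draw one seed per candidate, evaluate the perturbed parameters,
    and return the (seed, return) pairs to be sent to the main process."""
    results = []
    for _ in range(SEEDS_PER_WORKER):
        seed = np.random.randint(0, 2**31 - 1)
        # The seed alone is enough to regenerate the exact noise later
        noise = np.random.RandomState(seed).normal(size=params.shape)
        results.append((seed, fitness(params + NOISE_STD * noise)))
    return results
```

Returning seeds instead of the noise vectors keeps the messages between processes tiny: any process that holds a seed can regenerate the exact perturbation with `np.random.RandomState(seed)`.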

These results are then broadcast to all the workers, each of which updates its copy of the neural network separately, following the update rule given in formula (11.2):

Figure 11.4. Diagram showing the main components involved in the parallel version of ES

Following what we just described, the code is divided into three main parts:

  • The main process, which creates and manages the queues and the workers.
  • A function that defines the task of each worker.
  • A few helper functions that perform simple tasks, such as ranking the returns and evaluating the agent.
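Before looking at the real code, here is a minimal, self-contained sketch of how these three parts fit together. It uses threads and plain queues for brevity, whereas the actual implementation runs separate processes; `fitness`, the constants, and the centered-rank update used here as a stand-in for formula (11.2) are all assumptions made for the sake of the example:

```python
import queue
import threading
import numpy as np

NUM_WORKERS = 4         # one worker per CPU core in the real version
SEEDS_PER_WORKER = 10   # 4 workers x 10 candidates = a batch of 40
NOISE_STD = 0.05        # assumed perturbation strength (sigma)
LR = 0.01               # assumed learning rate
PARAM_SIZE = 5          # stand-in for the network's parameter vector

def fitness(params):
    # Hypothetical stand-in for an episode rollout returning the total reward
    return -float(np.sum(params ** 2))

def worker(input_q, output_q):
    # Each worker keeps its own copy of the parameters and applies the
    # update locally from the broadcast results, so weights never travel
    # between workers -- only seeds and returns do.
    params = np.zeros(PARAM_SIZE)
    while True:
        msg = input_q.get()
        if msg == "stop":
            break
        elif msg == "evaluate":
            batch = []
            for _ in range(SEEDS_PER_WORKER):
                seed = np.random.randint(0, 2**31 - 1)
                noise = np.random.RandomState(seed).normal(size=params.shape)
                batch.append((seed, fitness(params + NOISE_STD * noise)))
            output_q.put(batch)
        else:
            # Broadcast of all (seed, return) pairs: rebuild the noise from
            # the seeds and apply a centered-rank update (our stand-in for
            # the update rule of formula 11.2)
            seeds, returns = zip(*msg)
            ranks = np.argsort(np.argsort(returns))
            weights = ranks / (len(ranks) - 1) - 0.5
            grad = np.zeros_like(params)
            for seed, w in zip(seeds, weights):
                noise = np.random.RandomState(seed).normal(size=params.shape)
                grad += w * noise
            params += LR / (len(seeds) * NOISE_STD) * grad

def run(iterations):
    input_qs = [queue.Queue() for _ in range(NUM_WORKERS)]
    output_q = queue.Queue()
    threads = [threading.Thread(target=worker, args=(q, output_q))
               for q in input_qs]
    for t in threads:
        t.start()
    for _ in range(iterations):
        for q in input_qs:              # ask every worker for a new batch
            q.put("evaluate")
        results = []
        for _ in range(NUM_WORKERS):    # wait for all 40 evaluations
            results.extend(output_q.get())
        for q in input_qs:              # broadcast the full batch back
            q.put(results)
    for q in input_qs:
        q.put("stop")
    for t in threads:
        t.join()
    return len(results)
```

Because every worker starts from the same parameters and applies the same update from the same broadcast batch, all the copies of the network stay in sync without ever being shipped between processes.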

Let's explain the code of the main process so that you have a broad view of the algorithm before going into detail about the workers.
