Parallelizing the implementation

We can easily parallelize our bagging implementation with ProcessPoolExecutor, imported from concurrent.futures. This executor lets us submit a number of tasks, which it then executes in parallel processes. Each task only needs a target function and its parameters. In our example, we only need to turn the code of sections 2 and 3 into functions:

def create_learner(train_x, train_y):
    # We sample indices with replacement in order to create the bootstrap sample
    bootstrap_sample_indices = np.random.randint(0, train_size, size=train_size)
    bootstrap_x = train_x[bootstrap_sample_indices]
    bootstrap_y = train_y[bootstrap_sample_indices]
    # Train a decision tree on the bootstrap sample
    dtree = DecisionTreeClassifier()
    dtree.fit(bootstrap_x, bootstrap_y)
    return dtree

def predict(learner, test_x):
    # Return the base learner's predictions for the test set
    return learner.predict(test_x)

Then, in the original sections 2 and 3, we modify the code as follows:

# Original Section 2
base_learners = []
with ProcessPoolExecutor() as executor:
    futures = []
    for _ in range(ensemble_size):
        future = executor.submit(create_learner, train_x, train_y)
        futures.append(future)

    for future in futures:
        base_learners.append(future.result())
# Original Section 3
base_predictions = []
base_accuracy = []
with ProcessPoolExecutor() as executor:
    futures = []
    for learner in base_learners:
        future = executor.submit(predict, learner, test_x)
        futures.append(future)

    for future in futures:
        predictions = future.result()
        base_predictions.append(predictions)
        acc = metrics.accuracy_score(test_y, predictions)
        base_accuracy.append(acc)

Each call to executor.submit returns an object (in our case future), which will eventually contain the result of our function; calling future.result() waits for the task to finish and retrieves that result. The rest of the code remains unchanged, except that it is enclosed in an if __name__ == '__main__': guard, as each new process imports the whole script. The guard prevents the worker processes from re-executing the main code. Because our example is small, even with six processes available we need at least 1,000 base learners before the parallel version shows any considerable speedup in execution time. For a fully working version, please refer to 'bagging_custom_parallel.py' from the provided codebase.
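
To make the layout of the whole script concrete, the following is a minimal sketch of how it could be organized around the guard. The dataset (scikit-learn's digits), the train/test split, and the ensemble size are illustrative assumptions, not necessarily the values used in the provided codebase. The sketch also keeps the data preparation at module level instead of under the guard: on platforms that start worker processes with spawn rather than fork (Windows, and macOS by default), code inside the guard is never executed in the workers, so globals such as train_size would otherwise not exist when create_learner runs there.

import numpy as np
from concurrent.futures import ProcessPoolExecutor
from sklearn import metrics
from sklearn.datasets import load_digits
from sklearn.tree import DecisionTreeClassifier

# Module-level data preparation, so the worker processes can see it too
# (illustrative dataset, split, and ensemble size)
digits = load_digits()
train_size = 1500
train_x, train_y = digits.data[:train_size], digits.target[:train_size]
test_x, test_y = digits.data[train_size:], digits.target[train_size:]
ensemble_size = 10

def create_learner(train_x, train_y):
    # Draw a bootstrap sample and fit a decision tree to it
    bootstrap_sample_indices = np.random.randint(0, train_size, size=train_size)
    dtree = DecisionTreeClassifier()
    dtree.fit(train_x[bootstrap_sample_indices], train_y[bootstrap_sample_indices])
    return dtree

def predict(learner, test_x):
    return learner.predict(test_x)

if __name__ == '__main__':
    # Only the driver code lives under the guard; worker processes
    # import the script but skip this block.
    base_learners = []
    with ProcessPoolExecutor() as executor:
        futures = [executor.submit(create_learner, train_x, train_y)
                   for _ in range(ensemble_size)]
        for future in futures:
            base_learners.append(future.result())

    base_predictions = []
    base_accuracy = []
    with ProcessPoolExecutor() as executor:
        futures = [executor.submit(predict, learner, test_x)
                   for learner in base_learners]
        for future in futures:
            predictions = future.result()
            base_predictions.append(predictions)
            base_accuracy.append(metrics.accuracy_score(test_y, predictions))

    print('Mean base learner accuracy: %.2f' % np.mean(base_accuracy))

The speedup mentioned above can be checked by wrapping the driver code under the guard in time.perf_counter() calls and comparing the elapsed time against the serial implementation.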
