Multithreading for parallel processing

Now that we are focusing on writing efficient scripts, a major aspect is how quickly and correctly we fetch information. A for loop processes items one at a time, which is fine as long as each item returns its result quickly.

Now suppose each item in the for loop is a router from which we need the output of show version. If each router takes around 10 seconds to log in, gather the output, and log out, and we have 30 routers, the program needs 10*30 = 300 seconds to complete. If we also perform more advanced or complex calculations on each output, which might take up to a minute per router, the run takes 30 minutes for just 30 routers.

This becomes very inefficient as complexity and scale grow. To help with this, we need to add parallelism to our programs. Put simply, we log in to all 30 routers simultaneously and perform the same task on each of them at the same time. In effect, we get the output from all 30 routers in roughly 10 seconds, because 30 threads are running in parallel.
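The timing argument above can be sketched with a small simulation. This is only an illustration under stated assumptions: the router names are hypothetical, fetch_version is a placeholder that sleeps instead of logging in to a device, and the delay is scaled down from 10 seconds so the script finishes quickly.

```python
import time
from threading import Thread

# Hypothetical device names; no real routers are contacted.
ROUTERS = ["router%d" % n for n in range(1, 31)]
DELAY = 0.05  # scaled-down stand-in for the ~10 seconds per router

def fetch_version(router):
    # Placeholder for: log in, run "show version", log out.
    time.sleep(DELAY)

def run_sequential():
    start = time.perf_counter()
    for router in ROUTERS:
        fetch_version(router)
    return time.perf_counter() - start

def run_parallel():
    start = time.perf_counter()
    threads = [Thread(target=fetch_version, args=(router,)) for router in ROUTERS]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

sequential_time = run_sequential()
parallel_time = run_parallel()
print("sequential: %.2fs, parallel: %.2fs" % (sequential_time, parallel_time))
```

The sequential run should take about 30 times the delay, while the threaded run should take close to a single delay, since the threads spend their time waiting rather than computing.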

A thread, for our purposes, is simply another concurrently running call of the same function. Starting 30 threads means the function runs 30 times at once, with each thread performing the same task.

Here's an example:

import datetime
from threading import Thread

def checksequential():
    for x in range(1,10):
        print (datetime.datetime.now().time())

def checkparallel():
    print (str(datetime.datetime.now().time())+"\n")

checksequential()
print ("\nNow printing parallel threads\n")
threads = []
for x in range(1,10):
    t = Thread(target=checkparallel)
    t.start()
    threads.append(t)

for t in threads:
    t.join()

The output of the multi-threading code is as follows:

As we can see in the preceding example, we created two functions, named checksequential and checkparallel, to print the system's current time, which is obtained through the datetime module. The checksequential function runs a for loop, so its print calls execute one after another and the timestamps in the output increase from one line to the next.

For the threading, we start with an empty list named threads. Each call to Thread(target=checkparallel) creates a new Thread object, and we append that object to the list so that we can refer to the thread later. The start() method makes the thread begin executing the function given as its target.
The last loop is important. Calling join() on a thread blocks the program until that thread has finished, so looping over every thread in the list means the program waits for all the threads to complete before moving on to the next step.
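The example above only prints from each thread, and a Thread target's return value is discarded. To gather each router's output for later use, one possible approach is to have every thread store its result in a shared dictionary and to read that dictionary only after join() has returned for all threads. The following is a minimal sketch, assuming hypothetical router names and a placeholder collect function that fabricates a string instead of contacting a device.

```python
import time
from threading import Thread, Lock

results = {}           # shared between threads: router name -> output
results_lock = Lock()  # guards writes to the shared dictionary

def collect(router):
    # Placeholder for the real work of logging in and running a command.
    time.sleep(0.01)
    output = "output from " + router
    with results_lock:
        results[router] = output

routers = ["r1", "r2", "r3"]  # hypothetical device names
threads = []
for router in routers:
    t = Thread(target=collect, args=(router,))
    t.start()
    threads.append(t)

# Only after every join() returns is the dictionary guaranteed to be complete.
for t in threads:
    t.join()

for router in routers:
    print(router, "->", results[router])
```

Reading results before the join loop could find some entries missing, which is exactly the situation join() is there to prevent.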

Now, as we can see in the output of the threads, some of the timestamps are the same, which means that those instances were invoked and executed at practically the same time rather than one after another.

The output of the parallel threads is not necessarily in order, because each thread prints its output the moment it completes, irrespective of when it was started. This is different from sequential execution, since parallel threads do not wait for any previous thread to complete before running. So, any thread that completes prints its value and ends.
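When the order of the results matters, one option is to return values from the threads and let the caller print them, instead of printing inside each thread. The sketch below swaps in a different standard library tool, concurrent.futures.ThreadPoolExecutor, whose map() method returns results in the order of the inputs regardless of which thread finishes first. The fetch function and its artificial delays are assumptions made purely for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(router_number):
    # Higher-numbered routers finish sooner, so completion order
    # is the reverse of submission order.
    time.sleep(0.05 / router_number)
    return "router%d done" % router_number

with ThreadPoolExecutor(max_workers=5) as pool:
    # map() yields results in input order, not completion order.
    ordered = list(pool.map(fetch, range(1, 6)))

for line in ordered:
    print(line)
```

The calls still overlap in time, but the printed lines follow the input order because the printing happens in the main program after the results are collected.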

PowerShell sample code that applies the same parallel approach, using background jobs, is as follows:

#PowerShell sample code
Get-Job  # Lists the background jobs in the current PowerShell session
Remove-Job -Force * # Forcibly removes all jobs left over from previous runs

$Scriptblock = {
    Param (
        [string]$ipaddress
    )
    if (Test-Connection $ipaddress -quiet)
    {
        return ("Ping for "+$ipaddress+" is successful")
    }
    else
    {
        return ("Ping for "+$ipaddress+" FAILED")
    }
}

$iplist="4.4.4.4","8.8.8.8","10.10.10.10","20.20.20.20","4.2.2.2"

foreach ($ip in $iplist)
{
    Start-Job -ScriptBlock $Scriptblock -ArgumentList $ip | Out-Null
    #The above command runs $Scriptblock as a background job for each IP address
}

#The following logic waits for all the background jobs to complete
While (@(Get-Job | Where { $_.State -eq "Running" }).Count -ne 0)
{
    # Write-Host "Waiting for background jobs..."
    Start-Sleep -Seconds 1
}

#The following logic prints the value returned by each job and then removes the job from memory
ForEach ($Job in (Get-Job)) {
    Receive-Job $Job
    Remove-Job $Job
}
