How it works...

As noted in the previous subsections, this recipe uses two terminal windows. The first transmits event data using nc. The second runs our Spark Streaming application, which reads from the port that the first window is transmitting to.

The important points about this code are as follows:

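  • We're creating a Spark context with two worker threads, hence the use of local[2]. At least two are needed: one thread runs the receiver and the other processes the received data.
  • As with the Netcat window, which transmits on port 9999, we're using ssc.socketTextStream to read text from a socket on localhost, port 9999.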
  • Recall that for each 1-second batch, we're not only reading a single line (for example, blue blue blue blue blue green green green), but also splitting it up into individual words via split.
  • We're using Python lambda functions with the PySpark map and reduceByKey functions to count the occurrences of each word within the 1-second batch. For example, in the case of blue blue blue blue blue green green green, there are five blue and three green events, as reported in the 2018-06-21 23:00:30 batch of our streaming application's output.
  • ssc.start() starts the Spark Streaming context, at which point the application begins receiving and processing data.
  • ssc.awaitTermination() blocks until the streaming application is terminated (for example, with Ctrl + C); otherwise, the application continues to run.
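
To make the split, map, and reduceByKey steps more concrete, the following is a minimal plain-Python sketch of what happens to a single 1-second batch. It does not use Spark at all; the function name count_words_in_batch and the use of a dictionary to stand in for reduceByKey are assumptions made for illustration only, not the recipe's actual code.

```python
from collections import defaultdict


def count_words_in_batch(lines):
    """Emulate split -> map -> reduceByKey for one batch (hypothetical helper)."""
    # Split step: break every line of the batch into individual words
    words = [word for line in lines for word in line.split(" ")]
    # map step: pair each word with an initial count of 1
    pairs = [(word, 1) for word in words]
    # reduceByKey step: add up the counts of pairs that share the same word
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)


# One batch containing the single line typed into the Netcat window
batch = ["blue blue blue blue blue green green green"]
result = count_words_in_batch(batch)
print(result)
```

In the real application, Spark performs the same three steps on every batch and can distribute the work across the worker threads, rather than looping over the pairs in a single process as this sketch does.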