Configuring Hadoop run-time on Windows

If you are developing your machine learning application on Windows using Eclipse (as a Maven project, of course), you will probably run into a problem, because Spark expects a Hadoop runtime environment to be available on Windows too.

More specifically, suppose you are running a Spark project written in Java whose main class is JavaNaiveBayes_ML.java. You will then get an IO exception like this:

16/10/04 11:59:52 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
Figure 1: IO exception due to the missing Hadoop runtime

The reason is that Hadoop is developed primarily for the Linux environment. If you are developing your Spark applications on the Windows platform, a bridge is required that provides the Hadoop runtime environment Spark needs in order to execute properly.
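
The null at the start of the path in the error message is a useful clue: it suggests that no Hadoop home directory was configured when the path to winutils.exe was assembled. The following Java sketch is not Hadoop's source code. It only illustrates, under the assumption that the path is built by concatenating the home directory with bin and the executable name, how a missing setting can produce the text seen in the error. The class and method names are hypothetical.

```java
// Illustrative only: mimics how a missing Hadoop home yields "null\bin\winutils.exe".
class WinutilsPathDemo {

    // Builds the expected location of winutils.exe using Windows-style separators.
    // hadoopHome may be null when no Hadoop home directory has been configured.
    static String expectedWinutilsPath(String hadoopHome) {
        // String concatenation turns a null reference into the literal text "null".
        return hadoopHome + "\\bin\\winutils.exe";
    }

    public static void main(String[] args) {
        // No home directory configured: the result starts with "null".
        System.out.println(expectedWinutilsPath(null));
        // Home directory configured (path taken from the text).
        System.out.println(expectedWinutilsPath("C:\\Users\\spark-2.0.0-bin-hadoop2.7"));
    }
}
```

Seen this way, the fix has two parts: put winutils.exe in a bin directory, and tell Hadoop where the parent of that bin directory is.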

So how do we get rid of this problem? The solution is straightforward. As the error message says, we need an executable named winutils.exe. Download the winutils.exe file from the Packt code directory for this chapter, copy it into the Spark distribution directory, and then configure Eclipse.

More specifically, suppose your Spark distribution containing Hadoop is located at C:/Users/spark-2.0.0-bin-hadoop2.7. Inside the Spark distribution there is a directory named bin. Paste the executable there (that is, path = C:/Users/spark-2.0.0-bin-hadoop2.7/bin/).

The second phase of the solution is to go to Eclipse, select the main class (that is, JavaNaiveBayes_ML.java in this case), and then open the Run menu. From the Run menu, choose the Run Configurations option, and there select the Environment tab. On this tab you will have the option to create a new environment variable that Eclipse passes to the JVM.

Now create a new environment variable named HADOOP_HOME (the variable Hadoop consults to find its home directory) and set its value to C:/Users/spark-2.0.0-bin-hadoop2.7/. Click Apply and rerun your application; the problem should be resolved.
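
As an alternative to the Eclipse run configuration, the same location can be supplied from code through the hadoop.home.dir Java system property, which Hadoop checks in addition to the HADOOP_HOME environment variable. The sketch below is one possible way to do this, not the approach the text prescribes: the HadoopHomeSetup class and its configure method are hypothetical names, and the path is the example location used above. It first verifies that bin/winutils.exe is present and only then sets the property.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: call configure(...) at the start of main(),
// before any Spark object is created.
class HadoopHomeSetup {

    // Sets hadoop.home.dir if <hadoopHome>/bin/winutils.exe exists.
    // Returns true when the property was set, false otherwise.
    static boolean configure(String hadoopHome) {
        Path winutils = Paths.get(hadoopHome, "bin", "winutils.exe");
        if (!Files.isRegularFile(winutils)) {
            System.err.println("winutils.exe not found at " + winutils);
            return false;
        }
        System.setProperty("hadoop.home.dir", hadoopHome);
        return true;
    }

    public static void main(String[] args) {
        // Example path from the text; adjust it to your own Spark distribution.
        boolean ok = configure("C:/Users/spark-2.0.0-bin-hadoop2.7");
        System.out.println("hadoop.home.dir configured: " + ok);
    }
}
```

In a project such as the one described here, the call to HadoopHomeSetup.configure(...) would go on the first line of the main method of JavaNaiveBayes_ML.java. Setting the property in code keeps the setting with the project, whereas the Eclipse environment variable keeps machine-specific paths out of the source.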

More technically, the details of the IO exception are shown in Figure 1.
