How it works...

In this recipe, we implement two programs to square all the numbers. In the first program, we use a standard R function, sapply, to square the sequence from 1 to 100,000. To record the program execution time, we first record the processing time before the execution in a.time, and then subtract a.time from the current processing time after the execution. Normally, the execution takes no more than 10 seconds. In the second program, we use the rmr2 package to implement a program in the R MapReduce version. In this program, we also record the execution time. Normally, this program takes a few minutes to complete a task.

The performance comparison shows that a standard R program outperforms the MapReduce program when processing small amounts of data. This is because a Hadoop system often requires time to spawn daemons, job coordination between daemons, and fetching data from data nodes. Therefore, a MapReduce program often takes a few minutes to a couple of hours to finish the execution. As a result, if you can fit your data in the memory, you should write a standard R program to solve the problem. Otherwise, if the data is too large to fit in the memory, you can implement a MapReduce solution.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset