The Hadoop MapReduce framework is most commonly compared to Apache Spark, a newer technology that targets a similar problem space. Some of their most important attributes are summarized in the following table:
| | Hadoop MapReduce | Apache Spark |
| --- | --- | --- |
| Written in | Java | Scala |
| Programming model | MapReduce | RDD |
| Client bindings | Most high-level languages | Java, Scala, Python |
| Ease of use | Moderate, with high-level abstractions (Pig, Hive, and so on) | Good |
| Performance | High throughput in batch mode | High throughput in both streaming and batch modes |
| Uses | Disk (I/O-bound) | Memory, with degraded performance if disk is needed |
| Typical node | Medium | Medium-large |
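The "Programming model" row is the most visible difference in practice. As an illustration only (plain Python standing in for both frameworks, with hypothetical sample data; this is not the actual Hadoop or Spark API), here is the same word count expressed first as explicit map and reduce phases with a shuffle between them, and then as a chained, RDD-style pipeline of transformations:

```python
from collections import defaultdict

lines = ["spark and hadoop", "hadoop and mapreduce"]  # hypothetical input

# --- MapReduce style: explicit map and reduce phases ---
def mapper(line):
    # Emit (word, 1) pairs, as a Hadoop mapper would.
    return [(word, 1) for word in line.split()]

def reducer(word, counts):
    # Sum the counts for one key, as a Hadoop reducer would.
    return word, sum(counts)

shuffled = defaultdict(list)
for line in lines:
    for word, one in mapper(line):
        shuffled[word].append(one)  # the framework's shuffle/sort step
mr_result = dict(reducer(w, c) for w, c in shuffled.items())

# --- Spark RDD style: chained transformations over a dataset ---
# Real code would start from sc.parallelize(lines); plain lists stand in here.
pairs = [(word, 1) for line in lines
         for word in line.split()]          # flatMap + map
rdd_result = {}
for word, one in pairs:                     # reduceByKey
    rdd_result[word] = rdd_result.get(word, 0) + one

assert mr_result == rdd_result == {"spark": 1, "and": 2,
                                   "hadoop": 2, "mapreduce": 1}
```

Both styles compute the same result; the difference is that MapReduce forces every job into the rigid map-shuffle-reduce shape, while the RDD model lets arbitrary transformations be composed and kept in memory between steps.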
As the preceding comparison shows, both technologies have pros and cons. Spark arguably has better performance, especially for problems that run on fewer nodes. On the other hand, Hadoop is a mature framework with excellent tooling built on top of it to cover almost every use case.