Troubleshooting MapReduce

Over the years, one of the major shortcomings of MapReduce frameworks has been how difficult they are to troubleshoot compared with simpler, non-distributed patterns. Most of the time, the most effective tool is logging: printing intermediate values to verify that they match the values we expect. In the mongo shell, which is a JavaScript shell, this is as simple as printing them with the print() function (the newer mongosh shell also supports console.log()).

Diving deeper into MapReduce in MongoDB, we can debug both the map and the reduce phases by running their functions by hand in the shell and inspecting what they produce.

To debug the map phase, we can override the emit() function so that it prints the key-value pairs the mapper would output, as follows:

> var emit = function(key, value) {
    // Print each pair instead of passing it on to the reduce phase
    print("debugging mapper's emit");
    print("key: " + key + " value: " + tojson(value));
  };

We can then call the mapper manually on a single document to verify that it emits the key-value pair we expect:

> var myDoc = db.orders.findOne( { _id: ObjectId("50a8240b927d5d8b5891743c") } );
> mapper.apply(myDoc);
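To make the emitted pairs checkable rather than just printed, the same override trick can collect them in an array. This is a minimal sketch under stated assumptions: the section does not show the real mapper, so the mapper body and the cust_id and amount fields below are hypothetical, and a plain object stands in for the document returned by findOne(). It uses console.log(), so it runs in Node.js or mongosh.

```javascript
// Hypothetical mapper: the cust_id and amount fields are assumptions for
// illustration, not taken from the original orders collection.
var mapper = function () {
  emit(this.cust_id, this.amount);
};

// Override emit() so that, instead of printing, it records every pair the
// mapper produces in an array we can inspect afterwards.
var emitted = [];
var emit = function (key, value) {
  emitted.push({ key: key, value: value });
};

// Stand-in for a document returned by db.orders.findOne(...).
var myDoc = { cust_id: "A123", amount: 250 };

// apply() runs the mapper with `this` bound to the document, which is how
// map functions read the current document's fields.
mapper.apply(myDoc);

console.log(JSON.stringify(emitted)); // [{"key":"A123","value":250}]
```

Because the pairs end up in an ordinary array, they can be compared against expected values in a script instead of being checked by eye.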

The reducer function is somewhat more complicated. A MapReduce reducer function must meet the following criteria:

  • It must be idempotent
  • It must be associative
  • The order of values coming from the mapper function should not matter for the reducer's result
  • The reducer function must return the same type of result as the values emitted by the mapper function

Let's dissect each of these requirements to understand what they really mean:

  • It must be idempotent: MapReduce, by design, may call the reducer function multiple times for the same key, feeding the output of one reduce call back in as an input value to the next. MongoDB also does not call the reducer for a key that has only a single value; that value is passed through unchanged. The final value should therefore be the same no matter how many times, or in what order, the reducer runs. We can verify this by writing our own verifier function that forces the reducer to re-reduce its own output, or by executing the reducer multiple times, as shown in the following code snippet:
reduce( key, [ reduce(key, valuesArray) ] ) == reduce( key, valuesArray )
  • It must be associative: As the reducer may be invoked several times for the same key, partially reduced results are combined with the remaining values, so the following should hold:
reduce(key, [ C, reduce(key, [ A, B ]) ] ) == reduce( key, [ C, A, B ] )
  • The order of values coming from the mapper function should not matter for the reducer's result: We can test that the order of values from the mapper doesn't change the reducer's output by passing documents to the mapper in a different order and verifying that we get the same results:
reduce( key, [ A, B ] ) == reduce( key, [ B, A ] )
  • The reducer function must return the same type of result as the values emitted by the mapper function: This goes hand in hand with the first requirement. Because the reducer's output can be passed back into the reducer as one of its input values, the object it returns must have the same type as the values the mapper emits.
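The identities above can be bundled into a small verifier and run against candidate reducers before submitting a job. This is a minimal sketch, assuming the emitted values are numbers or JSON-serializable objects and comparing results with JSON.stringify; checkReducer, sumReducer and naiveAvgReducer are hypothetical names for illustration, not MongoDB APIs. The type check is deliberately shallow (typeof only). It runs in Node.js or mongosh.

```javascript
// Compare two reducer results; assumes JSON-serializable values.
function same(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

// Hypothetical helper: evaluates each identity for three sample values.
function checkReducer(reduce, key, A, B, C) {
  return {
    // reduce(key, [reduce(key, values)]) == reduce(key, values)
    idempotent: same(reduce(key, [reduce(key, [A, B, C])]), reduce(key, [A, B, C])),
    // reduce(key, [C, reduce(key, [A, B])]) == reduce(key, [C, A, B])
    associative: same(reduce(key, [C, reduce(key, [A, B])]), reduce(key, [C, A, B])),
    // reduce(key, [A, B]) == reduce(key, [B, A])
    orderIndependent: same(reduce(key, [A, B]), reduce(key, [B, A])),
    // Shallow check that the result has the same type as an emitted value
    sameType: typeof reduce(key, [A, B]) === typeof A,
  };
}

// Summing satisfies every check.
var sumReducer = function (key, values) {
  return values.reduce(function (acc, v) { return acc + v; }, 0);
};

// A naive average fails the associativity check: re-reducing a partial
// average gives it the same weight as a single raw value.
var naiveAvgReducer = function (key, values) {
  var total = values.reduce(function (acc, v) { return acc + v; }, 0);
  return total / values.length;
};

console.log(JSON.stringify(checkReducer(sumReducer, "A123", 10, 20, 30)));
// {"idempotent":true,"associative":true,"orderIndependent":true,"sameType":true}
console.log(JSON.stringify(checkReducer(naiveAvgReducer, "A123", 10, 20, 30)));
// {"idempotent":true,"associative":false,"orderIndependent":true,"sameType":true}
```

The naive average passes three of the four checks, which is exactly why such bugs are easy to miss with a single test run. A common fix is to carry a running sum and count through the reducer and divide only at the end, for example in a finalize function.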