Incremental MapReduce

Incremental MapReduce is a pattern where we use MapReduce to aggregate to previously calculated values. An example of this could be counting non-distinct users in a collection for different reporting periods (that is, by hour, day, or month) without the need to recalculate the result every hour.

To set up our data for incremental MapReduce, we need to do the following:

  • Output our reduce data to a different collection
  • At the end of every hour, query only for the data that got into the collection in the last hour
  • With the output of our reduce data, merge our results with the calculated results from the previous hour

Continuing with the previous example, let's assume that we have a published field in each of the documents with our input dataset, as shown in the following code:

> db.books.find()
{ "_id" : ObjectId("592149c4aabac953a3a1e31e"), "isbn" : "101", "name" : "Mastering MongoDB", "price" : 30, "published" : ISODate("2017-06-25T00:00:00Z") }
{ "_id" : ObjectId("59214bc1aabac954263b24e0"), "isbn" : "102", "name" : "MongoDB in 7 years", "price" : 50, "published" : ISODate("2017-06-26T00:00:00Z") }

Using our previous example of counting books, we will get the following code:

var mapper = function() {
emit(this.id, 1);
};
var reducer = function(id, count) {
return Array.sum(count);
};
> db.books.mapReduce(mapper, reducer, { out: "books_count" })
{
"result" : "books_count",
"timeMillis" : 16700,
"counts" : {
"input" : 2,
"emit" : 2,
"reduce" : 1,
"output" : 1
},
"ok" : 1
}
> db.books_count.find()
{ "_id" : null, "value" : 2 }

Now we get a third book in our mongo_book collection with a document, as follows:

{ "_id" : ObjectId("59214bc1aabac954263b24e1"), "isbn" : "103", "name" : "MongoDB for experts", "price" : 40, "published" : ISODate("2017-07-01T00:00:00Z") }
> db.books.mapReduce( mapper, reducer, { query: { published: { $gte: ISODate('2017-07-01 00:00:00') } }, out: { reduce: "books_count" } } )
> db.books_count.find()
{ "_id" : null, "value" : 3 }

What happened in the preceding code is that by querying for documents in July, 2017, we only got the new document out of the query and then used its value to reduce the value with the already-calculated value of 2 in our books_count document, adding 1 to the final sum of 3 documents.

This example, as contrived as it is, shows a powerful attribute of MapReduce: the ability to re-reduce results to incrementally calculate aggregations over time.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset