Why aggregation?

MongoDB introduced the aggregation framework in version 2.2 (version 2.1 in the development branch). It serves as an alternative to both the MapReduce framework and querying the database directly.

Using the aggregation framework, we can perform GROUP BY operations on the server and project only the fields that are needed in the result set. By applying the $match and $project operators, we can reduce the amount of data passed through the pipeline, resulting in faster data processing.
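As a minimal sketch of this idea, the pipeline below filters, trims, and then groups documents from the books collection. The category field and the grouping itself are assumptions made for illustration, and the snippet assumes a shell version in which aggregate() returns a cursor.

```javascript
// Hypothetical pipeline; the category field is assumed for illustration.
const pipeline = [
  // Discard documents we don't need as early as possible.
  { $match: { price: { $gte: 50 } } },
  // Pass along only the fields the later stages use.
  { $project: { _id: 0, category: 1, price: 1 } },
  // Server-side equivalent of GROUP BY category.
  { $group: { _id: "$category", count: { $sum: 1 }, avgPrice: { $avg: "$price" } } }
];

// db is defined only inside the MongoDB shell, so guard the call.
if (typeof db !== "undefined") {
  db.books.aggregate(pipeline).forEach(printjson);
}
```

Only one document per category leaves the server, rather than every book that matches the filter.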

Self-joins, that is, joins between documents within the same collection, can also be performed using the aggregation framework, as we will see in our use case.
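One possible way to express a self-join, assuming MongoDB 3.2 or later where the $lookup stage is available, is to name the input collection itself as the collection to join from. This is only a sketch and not necessarily the approach taken in the use case; the author field is an assumption for illustration.

```javascript
// Hypothetical self-join: for each book, collect the books by the same author.
const selfJoinPipeline = [
  {
    $lookup: {
      from: "books",          // join the collection with itself
      localField: "author",   // field on the input document (assumed)
      foreignField: "author", // field on the documents being joined (assumed)
      as: "sameAuthor"        // output array holding the matched documents
    }
  },
  // Keep only the book name and the names of the matched books.
  { $project: { _id: 0, name: 1, "sameAuthor.name": 1 } }
];

// db is defined only inside the MongoDB shell, so guard the call.
if (typeof db !== "undefined") {
  db.books.aggregate(selfJoinPipeline).forEach(printjson);
}
```

Note that each book also matches itself, so the sameAuthor array always contains at least one entry.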

When comparing the aggregation framework to the simple queries available via the shell or the various drivers, it is important to remember that there is a use case for both.

For selection and projection queries, it's almost always better to use simple queries, as the complexity of developing, testing, and deploying an aggregation framework operation is rarely justified given the simplicity of the built-in commands. Finding documents with ( db.books.find({price: 50}, {price: 1, name: 1}) ) or without ( db.books.find({price: 50}) ) projecting only some of the fields is simple and fast enough to not warrant the use of the aggregation framework.

On the other hand, if we want to perform GROUP BY and self-join operations using MongoDB, there might be a case for using the aggregation framework. The most important limitation of the group() command in the MongoDB shell is that the result set has to fit in a single document, meaning that it can't be more than 16 MB in size. In addition, the result of any group() command can't have more than 20,000 results. Finally, group() doesn't work with sharded input collections, which means that when our data size grows, we have to rewrite our queries anyway.

In comparison to MapReduce, the aggregation framework is more limited in functionality and flexibility, as we are restricted to the available operators. On the plus side, the API for the aggregation framework is simpler to understand and use than MapReduce. In terms of performance, the aggregation framework was considerably faster than MapReduce in earlier versions of MongoDB, but in the most recent versions the two seem to be on a par, following performance improvements to MapReduce.

Finally, there is always the option of using the database purely as data storage and performing complex operations in the application. Sometimes this can be quick to develop, but it should be avoided, as it will most likely incur memory, networking, and ultimately performance costs down the road.
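To make the trade-off concrete, here is a small sketch of grouping in the application, followed by a server-side counterpart. The countByPrice helper and the sample documents are hypothetical and exist only for illustration.

```javascript
// Application-side grouping: every document must first travel to the client.
function countByPrice(docs) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.price, (counts.get(doc.price) || 0) + 1);
  }
  return counts;
}

// Hypothetical sample data standing in for documents fetched from books.
const sample = [
  { name: "A", price: 50 },
  { name: "B", price: 50 },
  { name: "C", price: 20 }
];
const counts = countByPrice(sample);

// db is defined only inside the MongoDB shell, so guard the calls.
if (typeof db !== "undefined") {
  // Application side: fetch every document, then group on the client.
  const clientCounts = countByPrice(db.books.find({}, { price: 1 }).toArray());
  // Server side: only one small document per distinct price is returned.
  const serverCounts = db.books.aggregate([
    { $group: { _id: "$price", count: { $sum: 1 } } }
  ]).toArray();
}
```

The application-side version holds the whole result of find() in memory and pays the network cost for every document, which is the overhead described above.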

In the next section, we will describe the available operators before using them in a real use case.
