The PDI MongoDB Map/Reduce Output step

Most aggregation operations in MongoDB are done with the Aggregation Framework, which provides better performance. In some cases, however, you need flexibility that the framework does not offer and that is only possible with Map/Reduce commands.

Ivy Information Systems has contributed a plugin with two MongoDB steps—MongoDB Map/Reduce and MongoDB Lookup—under the AGPL license. These are available on GitHub at https://github.com/ivylabs/ivy-pdi-mongodb-steps.

Getting ready

To get ready for this recipe, you will need to start Spoon, your ETL development environment, and make sure that the MongoDB server is running with the data from the previous chapters.

How to do it…

Perform the following steps to create a quick sample for users with MongoDB Map/Reduce in PDI:

  1. Let's install the Ivy PDI MongoDB Steps plugin by performing the following steps:
    1. On the menu bar of Spoon, select Help and then Marketplace.
    2. A PDI Marketplace popup will show you the list of plugins available for installation. Search for MongoDB in the Detected Plugins field.
    3. Expand the Ivy PDI MongoDB Steps Plugin item, as shown in the following screenshot:
      [Screenshot: the Ivy PDI MongoDB Steps Plugin item expanded in the PDI Marketplace]
    4. Click on the Install this plugin button.
    5. Next, click on the OK button in the alert for restarting Spoon.
    6. Restart Spoon.
  2. Let's build the same Map/Reduce transformation that was built in the first chapter with the User Defined Java Class step, to show how much easier it is with this plugin:
    1. In Spoon, create a new transformation with the name mongodb-map-reduce.ktr.
    2. In Transformation properties, under the Parameters tab, create a new parameter named CUSTOMER_NAME.
    3. Select the Design tab in the left-hand-side view.
    4. From the Big Data category folder, find the MongoDB Map/Reduce Input step, and drag and drop it into the working area in the right-hand-side view.
    5. Double-click on the step to open the MongoDB Map/Reduce Input configuration dialog.
    6. Set Step Name to Get data.
    7. In the Configure connection tab, click on the Get DBs button and select the SteelWheels option for the Database field. Then, click on the Get collections button and select the Orders option for the Collection field.
    8. In the Map function tab, set this JavaScript map function:
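      function() {
        // The CUSTOMER_NAME placeholders are filled in from the transformation parameter.
        var category;
        if ( this.customer.name == '${CUSTOMER_NAME}' )
          category = '${CUSTOMER_NAME}';
        else
          category = 'Others';
        // Emit one record per order so that the reduce function can sum prices and counts.
        emit(category, {totalPrice: this.totalPrice, count: 1});
      }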
    9. In the Reduce function tab, set the following JavaScript reduce function:
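      function(key, values) {
        // Start from zero and add up every value emitted for this key.
        var n = { count: 0, totalPrice: 0};
        for ( var i = 0; i < values.length; i++ ) {
          n.count += values[i].count;
          n.totalPrice += values[i].totalPrice;
        }
        // The result has the same shape as the emitted values.
        return n;
      }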
    10. Then, in the Fields tab, click on the Get fields button to retrieve the new fields: _id, count, and totalPrice. Remove the _id field. The final configuration should look like this:
      [Screenshot: final configuration of the MongoDB Map/Reduce Input step]
    11. Click on the OK button.
    12. From the Flow category folder, find the Dummy (do nothing) step, and drag and drop it into the working area in the right-hand-side view.
    13. Connect the Get data step to the Dummy (do nothing) step.
    14. Double-click on the Dummy (do nothing) step to open its configuration dialog.
    15. Set Step Name to OUT.
    16. Click on the OK button. The transformation should be similar to what is shown in the following screenshot, and you can now preview its execution:
      [Screenshot: the finished transformation with the Get data and OUT steps]
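
To get a feel for what the step should return before running it against the server, the map and reduce functions from this recipe can be exercised in plain JavaScript. The following is only a minimal sketch and not how MongoDB or the plugin executes them: the runMapReduce helper, the emit stand-in, the sample orders, and the customer name are all hypothetical, and the ${CUSTOMER_NAME} placeholder is replaced here by an ordinary variable.

```javascript
// Hypothetical stand-in for the value of the CUSTOMER_NAME parameter.
var CUSTOMER_NAME = 'Customer A';

// Invented sample documents with the two fields the recipe's functions read.
var orders = [
  { customer: { name: 'Customer A' }, totalPrice: 100 },
  { customer: { name: 'Customer B' }, totalPrice: 50 },
  { customer: { name: 'Customer A' }, totalPrice: 25 },
  { customer: { name: 'Customer C' }, totalPrice: 10 }
];

// Local replacement for MongoDB's emit: groups emitted values by key.
var emitted = {};
function emit(key, value) {
  if (!Object.prototype.hasOwnProperty.call(emitted, key)) emitted[key] = [];
  emitted[key].push(value);
}

// Same logic as the recipe's map function.
function map() {
  var category;
  if (this.customer.name == CUSTOMER_NAME)
    category = CUSTOMER_NAME;
  else
    category = 'Others';
  emit(category, { totalPrice: this.totalPrice, count: 1 });
}

// Same logic as the recipe's reduce function.
function reduce(key, values) {
  var n = { count: 0, totalPrice: 0 };
  for (var i = 0; i < values.length; i++) {
    n.count += values[i].count;
    n.totalPrice += values[i].totalPrice;
  }
  return n;
}

// Hypothetical helper: runs map on every document, then reduce once per key.
function runMapReduce(docs, mapFn, reduceFn) {
  emitted = {};
  docs.forEach(function (doc) { mapFn.call(doc); });
  return Object.keys(emitted).map(function (key) {
    var values = emitted[key];
    // MongoDB does not call reduce for a key that has only one value.
    var value = values.length > 1 ? reduceFn(key, values) : values[0];
    return { _id: key, value: value };
  });
}

var results = runMapReduce(orders, map, reduce);
// Customer A: count 2, totalPrice 125; Others: count 2, totalPrice 60
console.log(JSON.stringify(results));
```

Note that the reduce function returns an object with the same shape as the values emitted by map. MongoDB may call reduce more than once for the same key and feed earlier results back in, so the function has to accept its own output as input.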

How it works…

Using this step for Map and Reduce is much easier than using the User Defined Java Class (UDJC) step. The latter is much more flexible in the way it processes data, but it also makes users more prone to mistakes.

The Map and Reduce functions in MongoDB are written in JavaScript, which gives you more flexibility: for each document, the map function can emit more than one key and value mapping, or no mapping at all.
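
As an illustration of that flexibility, the sketch below shows a map function that emits two mappings for a document that has a price and none for a document that lacks one. It is a hypothetical variation on this recipe's map function, not part of the transformation; the emit stand-in, the 'All' key, and the sample documents are assumptions made so that the snippet can run outside MongoDB.

```javascript
// Local replacement for MongoDB's emit: records every emitted pair.
var pairs = [];
function emit(key, value) {
  pairs.push({ key: key, value: value });
}

// Hypothetical map function: zero or two mappings per document.
function map() {
  // No mapping at all: skip documents without a numeric totalPrice.
  if (typeof this.totalPrice !== 'number') return;

  var value = { totalPrice: this.totalPrice, count: 1 };
  // First mapping: one key per customer.
  emit(this.customer.name, value);
  // Second mapping: a shared key that accumulates a grand total.
  emit('All', value);
}

// Invented sample documents; the second one has no totalPrice.
var docs = [
  { customer: { name: 'Customer A' }, totalPrice: 100 },
  { customer: { name: 'Customer B' } }
];

docs.forEach(function (doc) { map.call(doc); });
console.log(JSON.stringify(pairs));
```

The first document produces two pairs, keyed by 'Customer A' and 'All', and the second produces none.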

This recipe was a simple example based on the last recipe of the first chapter, but using this popular data processing paradigm, you can perform as many complex queries as you like.

See also

The MongoDB Map/Reduce using the User Defined Java Class step and MongoDB Java Driver recipe of the first chapter explains the same functionality, but using the User Defined Java Class step.
