Working with jobs and filtering MongoDB data using parameters and variables

In this recipe, we will guide you through creating two PDI jobs: one that uses parameters and another that uses variables. In a PDI process, jobs orchestrate other jobs and transformations in a coordinated way to realize the main business process. Both jobs reuse the transformation created in the previous recipe, with some changes described here.

So, in this recipe, we are going to create two different jobs that send data down to a subtransformation, which is a copy of the transformation from the previous recipe.

Getting ready

To get ready for this recipe, start Spoon, your ETL development environment, and make sure the MongoDB server is running with the data inserted in the previous recipes.
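
If you want to verify the server from a terminal first, the following sketch assumes a default local MongoDB installation; the --dbpath value is an assumption, so adjust it to your data directory:

  mongod --dbpath /data/db                 # start the server if it is not already running
  mongo --eval "db.adminCommand('ping')"   # should print { "ok" : 1 }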

How to do it…

Let's start with jobs and parameters. We can orchestrate the ETL to run in different ways; in this simple case, we are just filtering by the customer name. Perform the following steps:

  1. Let's copy and paste the transformation created in the previous recipe and save it as chapter1-mongodb-map-reduce-writelog.ktr.
  2. Open that transformation using Spoon, and from the Utility category folder, find the Write to log step. Drag and drop it into the working area in the right-side view.
    1. Create a hop between the OUTPUT step and the Write to log step.
    2. Double-click on the Write to log step to open the configuration dialog.
    3. Set the Step name property to MapReduce.
    4. Click on the Get Fields button.
    5. Click on OK to finish the configuration.
  3. Let's create a new empty job.
    1. Click on the New file button from the toolbar menu and select the Job item entry. Alternatively, from the menu bar, go to File | New | Job.
    2. Open the Job properties dialog by pressing Ctrl + J or by right-clicking on the right-hand-side working area and selecting Job settings.
    3. Select the Job tab. Set Job Name to Job Parameters.
    4. Select the Parameters tab and add a parameter named CUSTOMER_NAME. Click on OK.
    5. Save the Job with the name job-parameters.
  4. From the General category folder, find the START, Transformation, and Success steps and drag and drop them into the working area in the right-side view.
    1. Create a hop between the START step and the Transformation step.
    2. Then, create a hop from the Transformation step to the Success step.
    3. Double-click on the Transformation step to open the configuration dialog.
    4. Change the Name of job entry property to MapReduce Transf.
    5. Click on the browse button for the Transformation filename field and select the chapter1-mongodb-map-reduce-writelog.ktr file that you copied earlier in your filesystem.
    6. Select the Parameters tab. By default, the Pass all parameter values down to the sub-transformation option is checked, which means our job parameter will be passed to the transformation.
    7. Click on OK to finish.
    8. Run the job, analyze the results, and check the logs on the Logging tab. To run the same job outside Spoon, see the Kitchen command after these steps.
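
You can also launch the job with Kitchen, PDI's command-line job runner, which is handy for scheduling. A minimal sketch, assuming the job file is job-parameters.kjb in the current directory and that Smith is a customer name that exists in your database (both are assumptions):

  ./kitchen.sh -file=job-parameters.kjb -param:CUSTOMER_NAME=Smith -level=Basic

The -param option supplies a value for the named parameter we defined in the job, and -level controls the log verbosity.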

Now let's do a quick and simple example using variables:

  1. Copy and paste the chapter1-mongodb-map-reduce-writelog transformation. Save it as chapter1-mongodb-map-reduce-writelog-without-parameter.
  2. Open the transformation with Spoon and remove the CUSTOMER_NAME parameter from the Transformation properties dialog.
  3. Copy and paste the last job. Save it as job-variables.
    1. Open the job with Spoon.
    2. In Job properties, change the job name to Job Variables. On the Parameters tab, remove the CUSTOMER_NAME parameter: select it, right-click on it and select Delete selected lines, or just press Delete on your keyboard.
    3. Click on OK to finish.
  4. From the General category folder, find the Set variables step and drag and drop it into the working area in the right-side view.
    1. Remove the hop between the START step and the MapReduce Transf step.
    2. Create a hop between the START step and the Set variables step.
    3. Then, create a hop from the Set variables step to the MapReduce Transf step.
    4. Double-click on the Set variables step to open the configuration dialog.
    5. Set the Step name property to Set CUSTOMER_NAME.
    6. In the Variables grid, create a new variable named CUSTOMER_NAME. Set its value to a customer name that exists in the database, and set the Scope type to Valid in the root job.
    7. Click on OK to finish the configuration.
  5. In the MapReduce Transf step, change the Transformation filename to point to the transformation without the parameter (chapter1-mongodb-map-reduce-writelog-without-parameter.ktr).
  6. Run the job and analyze the results, checking the logs on the Logging tab. A way to set the same variable without the Set variables step is shown after these steps.
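
Because the transformation now resolves CUSTOMER_NAME from the variable space rather than from a named parameter, you could also define it globally in kettle.properties instead of using a Set variables step. A sketch, assuming the default Kettle home directory and, again, the hypothetical customer Smith:

  echo "CUSTOMER_NAME=Smith" >> ~/.kettle/kettle.properties   # global variable, loaded at startup

Variables defined in kettle.properties are loaded when Spoon, Kitchen, or Pan starts and are visible to every job and transformation, so they suit values that rarely change.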

How it works…

Most ETL solutions created in Pentaho Data Integration are sets of jobs and transformations.

Transformations are workflows that orchestrate the actions that manipulate data, essentially through input, transformation, and output steps.
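
A transformation can also be executed on its own with Pan, the transformation counterpart of Kitchen. A sketch, where the file path and the customer name Smith are again assumptions:

  ./pan.sh -file=chapter1-mongodb-map-reduce-writelog.ktr -param:CUSTOMER_NAME=Smith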

Jobs are workflows that orchestrate tasks; the hops between job entries can make the execution order depend on the success or failure of the previous entry.

Variables and parameters are extremely useful features that we can use to create dynamic jobs and transformations.
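
Both are referenced with the same ${NAME} substitution syntax in any step field that supports it (marked with a $ icon in Spoon). For example, a MongoDB query field filtering on our value might look like the following, where the customer_name field name is a hypothetical example:

  {"customer_name": "${CUSTOMER_NAME}"}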
