In this recipe, you will transfer data from a sample RDBMS to a MongoDB database. The sample data is called SteelWheels and is available in the Pentaho BA server, running on the Hypersonic Database Server.
Start the Pentaho BA Server by executing the appropriate scripts located in the BA Server's home folder. It is start-pentaho.sh
for Unix/Linux operating systems, and for the Windows operating system, it is start-pentaho.bat
. Also in Windows, you can go to the Start menu and choose Pentaho Enterprise Edition, then Server Management, and finally Start BA Server.
Start Pentaho Data Integration by executing the right scripts in the PDI home folder. It is spoon.sh
for Unix/Linux operating systems and spoon.bat
for the Windows operating system. Besides this, in Windows, you can go to the Start menu and choose Pentaho Enterprise Edition, then Design Tools, and finally Data Integration.
Start MongoDB. If you don't have the server running as a service, you need execute the mongod –dbpath=<data folder>
command in the bin folder of MongoDB.
To make sure you have the Pentaho BA Server started, you can access the default URL, which is http://localhost:8080/pentaho/
. When you launch Spoon, you should see a welcome screen like the one pictured here:
After you have made that sure you are ready to start the recipe, perform the following steps:
To do this, click on the New button next to the Database Connection pulldown. This will open the Database Connection dialog.
Set Connection Name to SteelWheels. Next, select the Connection Type as Hypersonic. Set Host Name to localhost, Database Name to SampleData, Port to 9001, Username to pentaho_user, and finally Password to password. Your setup should look similar to the following screenshot:
SELECT * FROM CUSTOMERs
As you can see, this is a basic transformation that loads data from the RDBMS database and inserts it into a MongoDB collection. This is a very simple example of loading data from one point to another. Not all transformations are like this. That is why PDI comes with various steps that allow you to manipulate data along the way.
In this case, we truncate the collection each time the transformation is run. However, it is also possible to use other combinations, such as Insert&Update or just Insert or Update individually.
Now that we have designed a transformation, let's look at a simple way of reusing the MongoDB connection for future transformations.
If you have to create MongoDB connections manually for each transformation, you are likely to make mistakes and typos. A good way to avoid this is to store the MongoDB connection details in a separate .properties
file on your filesystem. There is a file called kettle.properties
that is located in a hidden directory called .kettle
in your home directory. For example, in Linux, the location will be /home/latino/.kettle
. In Windows, it will be C:Userslatino.kettle
. Navigate to and open this .properties file in your favorite text editor. Then, copy and paste the following lines:
MONGODB_STEELWHEELS_HOSTNAME=localhost MONGODB_STEELWHEELS_PORT=27017 MONGODB_STEELWHEELS_USERNAME= MONGODB_STEELWHEELS_PASSWORD=
Save the .properties file and restart Spoon.
Now, where can we use these properties?
You will notice that when you are setting properties in certain PDI steps, you can see the following icon:
This icon denotes that we can use a variable or parameter in place of a static value. Variables are defined using the following structure: ${MY_VARIABLE}. You will notice that the variables are encapsulated in ${}. If you are not sure what the name of your variable is, you can also press Ctrl and the Spacebar; this will open a drop-down list of the available variables. You will see the MongoDB variables that you defined in the .properties file earlier in this list. With this in mind, we can now replace the connection details in our steps with variables as shown in this screenshot:
You can find out more about the MongoDB Output step on this documentation website: http://wiki.pentaho.com/display/EAI/MongoDB+Output