Running the Pentaho Data Integration server in a single instance

This recipe guides you through starting a Data Integration server and the simple steps required to work with a Data Integration repository. We will add a MongoDB MapReduce transformation to the DI repository and define a data service that runs from the server.

Getting ready

To get ready for this recipe, you first need to start the MongoDB server with the same database as that of the last chapter. You will also have to verify that <user home folder>/.pentaho/metastore is accessible to the Data Integration server.
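
One way to verify the metastore folder from a shell is sketched below. It assumes that the Data Integration server runs under the same account as the shell and that "accessible" means the folder exists and is readable and writable. The check_metastore function and the METASTORE_DIR variable are names made up for this illustration.

```shell
#!/bin/sh
# Sketch: check that the metastore folder exists and is readable and writable
# by the current user. Assumption: the DI server runs under this same account.
# METASTORE_DIR and check_metastore are hypothetical names for this example.
METASTORE_DIR="${METASTORE_DIR:-$HOME/.pentaho/metastore}"

check_metastore() {
    dir=${1:-$METASTORE_DIR}
    if [ -d "$dir" ] && [ -r "$dir" ] && [ -w "$dir" ] && [ -x "$dir" ]; then
        echo "OK: $dir is accessible"
    else
        echo "Problem: $dir is missing or not accessible" >&2
        return 1
    fi
}

# Usage:
# check_metastore
```

If the server runs under a different account (for example, as a service), run the same check as that account instead.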

How to do it…

To run the DI Server, perform the following steps:

  1. The Pentaho EE suite includes a ctlscript.sh script for Unix/Linux operating systems and a ctlscript.bat script for Windows operating systems. It lets you control the servers bundled with the platform. We can start, stop, and restart the individual servers using this script:
    1. Open a command-line tool on your operating system and navigate to the <pentaho-installation-path>/ folder.
    2. Execute the ./ctlscript.sh help command to get all the available options for managing the Pentaho suite.
    3. Next, execute the ./ctlscript.sh start command and all Pentaho services will start. It is also possible to start individual servers with this script. For example, we could have started the PostgreSQL server first (the Data Integration server needs it) and the Data Integration server afterwards, using ./ctlscript.sh start postgresql and then ./ctlscript.sh start data-integration-server.
  2. Another way of running the DI server is by executing the <pentaho-installation-path>/server/data-integration-server/start-pentaho.sh file for Unix/Linux operating systems and <pentaho-installation-path>/server/data-integration-server/start-pentaho.bat for Windows operating systems. On Windows, you can also start the DI server from the Start menu by going to Start | Pentaho Enterprise Edition | Server Management | Start Data Integration Server.
  3. Check whether the DI server has started correctly by accessing http://localhost:9080/pentaho-di in your web browser.
    1. You should see a login page. Enter admin as the username and password as the password.
    2. You can list the available Kettle Data Services by navigating to the http://localhost:9080/pentaho-di/kettle/listServices endpoint.
  4. Open the chapter1-mongodb-map-reduce-writelog-without-parameter.ktr file in Spoon, save it as chapter2-mongodb-map-reduce.ktr, and change the transformation name to MongoDB MapReduce Kettle Thin.
  5. Save the transformation in the DI repository.
    1. In the main menu, navigate to Tools | Repository and click on Connect... or press Ctrl + R.
    2. Click on the plus icon to add a new repository.
    3. Once the Select the repository type dialog opens, select the DI Repository option.
    4. In the Repository configuration dialog, enter http://localhost:9080/pentaho-di in the URL property, PentahoMongoDB in the Name property, and Pentaho MongoDB Cookbook for the Description property. Then click on the OK button.
    5. In the Repository Connection dialog, use the default credentials; the username is admin and the password is password. Click on OK. Then, in the Close files dialog, click on the No button.
    6. Saving your transformation will display the Transformation properties dialog. Click on OK and then the Enter comment dialog will appear with a default comment. Click on OK again. The comment dialog appears because the Data Integration repository is based on JCR version control.
  6. Define a data service for this transformation.
    1. Open the Transformation settings dialog in any of these ways: press Ctrl + T; right-click on the working area on the right-hand side and select Transformation settings; or, in the main menu, select Settings... from the Edit menu.
    2. Once the Transformation properties dialog opens, select the Data Service tab.
    3. Click on the Create new Data Service button.
    4. Set the new virtual table property to MapReduceTable.
    5. Select the OUTPUT option of the Service step drop-down property.
    6. Click on the OK button.
    7. Save the transformation again. Because you are connected to the DI repository, the Enter comment dialog is displayed. Enter a comment and click on OK.
  7. Register the new PentahoMongoDB repository with the DI server by adding the following XML to the <pentaho-ee-installation-path>/server/data-integration-server/pentaho-solutions/system/kettle/slave-server-config.xml file inside the slave_config tag:
      <slave_config>
        …
        <repository>
          <name>PentahoMongoDB</name>
          <username>admin</username>
          <password>password</password>
        </repository>
      </slave_config>
  8. The MongoDB driver is not available in the full class path of the DI server, and it is necessary to add it. Copy the MongoDB driver from <pentaho-ee-installation-path>/design-tools/data-integration/plugins/pentaho-mongodb-plugin/lib/mongo-java-driver-2.13.0.jar and paste it in the <pentaho-ee-installation-path>/server/data-integration-server/tomcat/webapps/pentaho-di/WEB-INF/lib folder.
  9. Restart the Data Integration server using the ./ctlscript.sh restart data-integration-server command.
  10. Get the MapReduceTable service definition by navigating to the http://localhost:9080/pentaho-di/kettle/listServices endpoint.
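
Steps 8 to 10 can also be scripted. The following is a minimal sketch, not part of the recipe itself. It assumes that curl is installed, that the listServices endpoint accepts HTTP basic authentication with the default credentials, and that the service name appears verbatim in the response. The /opt/pentaho default for PENTAHO_HOME is only a placeholder for <pentaho-ee-installation-path>, and the function names are made up for this illustration.

```shell
#!/bin/sh
# Sketch of steps 8 to 10. PENTAHO_HOME is a placeholder for
# <pentaho-ee-installation-path>; all function names are hypothetical.
PENTAHO_HOME="${PENTAHO_HOME:-/opt/pentaho}"
DI_URL="${DI_URL:-http://localhost:9080/pentaho-di}"
DI_USER="${DI_USER:-admin}"
DI_PASSWORD="${DI_PASSWORD:-password}"

# Step 8: copy the MongoDB driver into the DI server web application.
install_mongo_driver() {
    src="$PENTAHO_HOME/design-tools/data-integration/plugins/pentaho-mongodb-plugin/lib/mongo-java-driver-2.13.0.jar"
    dest="$PENTAHO_HOME/server/data-integration-server/tomcat/webapps/pentaho-di/WEB-INF/lib"
    if [ ! -f "$src" ]; then
        echo "Driver not found: $src" >&2
        return 1
    fi
    cp "$src" "$dest/"
}

# Step 9: restart the DI server with the control script.
restart_di_server() {
    ( cd "$PENTAHO_HOME" && ./ctlscript.sh restart data-integration-server )
}

# Step 10: fetch the service list (assumption: basic authentication is accepted).
list_services() {
    curl --silent --user "$DI_USER:$DI_PASSWORD" "$DI_URL/kettle/listServices"
}

# Succeeds if the service name given as $1 appears in the text read from stdin.
service_is_listed() {
    grep -q -F -- "$1"
}

# Usage:
# install_mongo_driver && restart_di_server
# list_services | service_is_listed MapReduceTable && echo "Service is registered"
```

The server needs some time to come back up after a restart, so the last check may fail if it is run immediately.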

Note

For Windows operating systems, if you don't see your service, a likely reason is a wrong value for the KETTLE_HOME variable.

KETTLE_HOME is the folder that contains the .kettle folder. Inside the latter, you can find configurations for Pentaho Data Integration, for example, the repositories.xml file. As the DI server runs as a service under the Administrator user, the KETTLE_HOME variable has the C:\ value by default.

There are two things that you can do to fix this:

  • Copy the repositories.xml file from your user home folder; for example, copy it from C:\Users\<user home folder>\.kettle\repositories.xml to C:\.kettle\repositories.xml.
  • Stop the DI server service and run the following command from <pentaho-installation-path>/server/data-integration-server/tomcat/bin, so that KETTLE_HOME points to the folder that contains your .kettle folder:

tomcat6.exe //US//pentahoDataIntegrationServer ++JvmOptions -DKETTLE_HOME=C:\Users\<user home folder>

How it works…

The user interface of the DI server looks similar to that of the Carte server. However, Carte is a lightweight web server based on the Jetty server and doesn't provide enterprise features, such as scheduling jobs or transformations. The DI server is a Tomcat-based server with more capabilities for integrating with other systems, for example, LDAP authentication.

In this recipe, we walked you through the steps for managing the DI server using the ctlscript.sh script. It's worth noting that it is also possible to use the start-pentaho and stop-pentaho scripts from the <pentaho-ee-installation-path>/server/data-integration-server/ folder.
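
The start sequence used in this recipe can be wrapped in a small script that also waits for the web application to answer. This is a sketch under a few assumptions: curl is installed, the /opt/pentaho default for PENTAHO_HOME is only a placeholder for <pentaho-installation-path>, and the function names are made up for this illustration. Any HTTP response, including the login page, counts as "up" here.

```shell
#!/bin/sh
# Sketch: start the DI server and wait until its web application answers.
# PENTAHO_HOME is a placeholder for <pentaho-installation-path>;
# start_di_server and wait_for_url are hypothetical helper names.
PENTAHO_HOME="${PENTAHO_HOME:-/opt/pentaho}"
DI_URL="${DI_URL:-http://localhost:9080/pentaho-di}"

# Start PostgreSQL first, then the DI server, as described in the recipe.
start_di_server() {
    ( cd "$PENTAHO_HOME" &&
      ./ctlscript.sh start postgresql &&
      ./ctlscript.sh start data-integration-server )
}

# Poll a URL until it responds or the attempts run out.
# $1 = URL, $2 = maximum attempts (default 30), $3 = seconds between attempts (default 5)
wait_for_url() {
    url=$1
    attempts=${2:-30}
    delay=${3:-5}
    i=1
    while [ "$i" -le "$attempts" ]; do
        if curl --silent --output /dev/null --max-time 5 "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage:
# start_di_server && wait_for_url "$DI_URL" && echo "DI server is up"
```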
