Chapter 4. A MongoDB OLAP Schema

In this chapter, we will cover these recipes:

  • Creating a date dimension
  • Creating an Orders cube
  • Creating the customer and product dimensions
  • Saving and publishing a Mondrian schema
  • Creating a Mondrian 4 physical schema
  • Creating a Mondrian 4 cube
  • Publishing a Mondrian 4 schema

Introduction

In this chapter, you'll learn how to create OLAP (short for Online Analytical Processing) schemas for Pentaho with MongoDB as a data source. OLAP is an approach to multidimensional analysis. Pentaho uses a ROLAP (short for Relational Online Analytical Processing) engine called Mondrian, which converts MDX (short for Multidimensional Expressions) queries into SQL queries.

If you aren't a business intelligence consultant, you may never have heard of data warehouses or the preceding terms. Essentially, a data warehouse is a system that stores historical data from different data sources so that it can be used by reporting systems, for example, Pentaho and Mondrian. This is a quick and simple explanation, and we recommend that you research these terms further, as this book focuses on using Pentaho and MongoDB rather than on business intelligence technologies.

Mondrian generates SQL queries, and MongoDB does not support SQL, so a layer is needed between the two to translate SQL into MongoDB queries. With Pentaho, there are three main ways to create OLAP using MongoDB. Choose one of them based on your requirements or your customer's requirements:

  • RDBMS: Use a relational database, in preference to a column-oriented database, and connect Mondrian on top. You need to create an ETL process that extracts the data from MongoDB and loads it into the relational database. This is the approach that was used long before NoSQL databases became popular.
  • Thin Kettle JDBC Driver: This approach uses Pentaho Data Integration as the layer responsible for getting the MongoDB data, based on an SQL query. Depending on the hardware and the configuration, you may face performance issues when MongoDB holds a large volume of data. This approach is only possible with Pentaho Enterprise Edition because the Thin Kettle JDBC Driver is available in that edition only.
  • Mondrian 4 and Pentaho EE native connector for MongoDB: The latest version of Pentaho Enterprise Edition comes with Mondrian 4 and a connector for MongoDB. In terms of performance, this is probably the best way to combine MongoDB and Mondrian. However, this native connection works for single collections only. This means that you need all of the data about one fact in a single JSON document, because the current MongoDB versions don't support joins.
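
The single-collection constraint of the native connector can be illustrated with a small sketch. The following Python snippet is not part of this chapter's source code, and the collection contents and field names (customerId, productId, quantity, priceEach, and so on) are hypothetical. It only shows the general idea of embedding customer and product attributes into each order so that every fact is one self-contained document:

```python
# Hypothetical normalized records, as they might sit in separate collections.
# All names and values here are illustrative assumptions.
customers = {"C1": {"name": "Acme Ltd", "country": "Portugal"}}
products = {"P1": {"name": "Widget", "line": "Hardware"}}
orders = [
    {"orderNumber": 1001, "orderDate": "2015-03-14", "customerId": "C1",
     "productId": "P1", "quantity": 3, "priceEach": 10.0},
]


def denormalize(order, customers, products):
    """Return one self-contained fact document for a single order."""
    # Copy the order fields, leaving out the foreign-key style references.
    doc = {k: v for k, v in order.items()
           if k not in ("customerId", "productId")}
    # Embed the dimension attributes directly, since no join is available.
    doc["customer"] = dict(customers[order["customerId"]])
    doc["product"] = dict(products[order["productId"]])
    # Precompute a measure that a cube could aggregate.
    doc["totalPrice"] = order["quantity"] * order["priceEach"]
    return doc


fact_documents = [denormalize(o, customers, products) for o in orders]
```

Each element of fact_documents could then be stored in a single MongoDB collection. How the documents are loaded (for example, with a Pentaho Data Integration transformation) is left out of this sketch.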

In summary, this chapter is divided into two main parts. The first part is about creating a regular cube using the Thin Kettle JDBC Driver and Mondrian 3.x. We'll use two transformations that come with the source code of this chapter as our Thin Kettle JDBC data services: chapter4-getdates and chapter4-getorders. As explained in previous chapters, you should be able to convert these transformations into data services. With some adaptation, this part can also be used to create a Mondrian schema for an RDBMS, just by changing the database connection.

The second part is about the new Mondrian 4.x schema. It uses the native connection for MongoDB, which is available in Pentaho Enterprise Edition only.
