Define the granularity

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 4. Data modeling: End to end 87

4.2.2 Define the granularity

Stage two of the process is to determine the right level of detail that must be

contained within the model and ultimately, within the fact table. As an example, a

model might contain a complete record of a sales transaction as one entry within

the fact table, with all the products sold contained within the one entry or maybe

not even listed. In this case, the lowest level is be the individual sales transaction.

An alternative can be to set the product as the lowest level, thus requiring

multiple entries within the fact table to represent just one sales transaction. The

grain (detail) of the fact table has a significant impact on the data warehouse

from business, technical, and development points of view.

From a business perspective, a dimensional model designed to the lowest level

of grain is easy to change, with new dimensions or facts being easily added.

Thus, the model can maintain future capability and be flexible enough to answer

questions at a lower level of detail.

From a technical perspective, the grain of a table has a major impact on the size

of the star schema and thus the size of the tables. This has an effect on table

implementation and the performance of transform and query operations.

From a development perspective, the development team might have to consider

more dimensions and have a greater understanding of the underlying production

tables. They might also have to perform a greater number of, and more complex,

transforms for getting the data into the warehouse.

At this point in the modelling process it is worth investigating documents and

account forms from within the business; items such as order forms, invoices, and

sales receipts can help determine an acceptable grain level. You also must

consider time- and date-dependent information, because many of the business

questions that you aim to answer can be time related. An example might be

“What was the number of sales of a particular item on a particular date.” In this

scenario, you have to model time as one of your dimensions. You do, however,

have to check that the source data does contain information at the day, month, or

year level.

4.2.3 Identify the dimension tables

Dimensional tables are a major factor in the construction of a Star Schema

dimensional model. As illustrated in Figure 4-2 on page 84, one or more

dimensional tables are connected to the fact table, with each normally containing

detailed and relevant information about a section or element of the complete fact

table.

88 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition

Therefore, the dimensional model has to create a relationship between the fact

table and its required dimensional tables. This is achieved by having individual

columns within the fact table assigned to hold the foreign key of each primary key

within the dimensional tables.

A dimensional table might, within your model, also be designed to contain

additional information than that normally included within the associated

production tables related to the business process being modeled. A useful

example is the inclusion of extra columns within the dimension table for storing a

“product type” or “suppliers details. This additonal information will not normally

be part of a sales receipt, upon which you have based your fact table. But by

adding this extra, related information to the dimension table, query performance

can be improved because fewer less table joins are required in answering

business questions related to this additonal information.

In stage three of developing the dimensional model, you must determine which

tables currently implemented within the production 3NF model are actually

required as dimension tables within the star schema dimensional model.

However, this is not a simple process of only copying complete sets of tables

from the 3NF model that represents your production database into your

dimensional model as dimensional tables. Doing so does not produce an

effective and efficient star schema. For performance reasons, the 3NF model,

being based on the OLTP production database, might contain multiple related

tables, with each table only containing a small number of columns from the

complete related subject.

As an example, consider that the 3NF model of your production database has

been implemented to contain a set of product-related tables. Your model

contains a table holding the product names, a separate table for the type of

product, and a final table of the supplier’s details. The reason is that this

information is not critical to the core function of storing a sales transaction, and in

all probability, will only be updated within the production database at infrequent

intervals.

However, if this complete set of tables is copied into your dimensional model as

the basis for the “Product” dimension, the resulting model will d have more in

common with a snowflake schema than the more efficient star schema. It is at

this point that the dimensional model has to move away from that of the 3NF

model and its related set of tables.

To create the desired star schema, it might be necessary to collapse many of the

individual tables in the 3NF logical model into one single table with a larger set of

columns within the dimensional model. A side effect of this process is that the

now larger dimensional table will contain a greater amount of repeating

information and require a larger amount of storage than that given to the original

Chapter 4. Data modeling: End to end 89

base tables. Figure 4-4 illustrates the principle of collapsing multiple tables from

the production database into a single dimensional table.

Figure 4-4 Collapse multiple tables into one dimensional table

90 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition

Figure 4-5 illustrates the multiple tables within a production database that can be

collapsed into a single dimensional table.

Figure 4-5 Multiple tables within a production database

The process of collapsing or merging tables together to create a single

dimensional table is often referred to as a table being

de-normalized. Figure 4-6

on page 91 illustrates the logical de-normalized dimensional model of the table in

Figure 4-4 on page 89.

In the operational database, data is normalized

PRODUCT_LINE

PRODUCT_TYPE

Before collapsing into a star schema dimension

Chapter 4. Data modeling: End to end 91

Figure 4-6 Example of a de-normalized dimensional table

Dimension tables contain attributes that describe fact records in the fact table.

Some of these attributes provide descriptive information. Other attributes are

used to specify how fact table data is to be summarized to provide useful

information within the business report. Dimension tables contain hierarchies of

attributes that aid in summarization.

Note the following key points to consider about dimension tables:

???? Each dimension table has one and only one lowest level element, or lowest

level of detail, which is the granularity of the dimension.

???? Have each non-key element (other than the surrogate key) appear in only one

dimension table.

???? If the data for the dimension tables is drawn from different sources, it is

advisable to use surrogate keys as the primary keys in all the dimension

tables.

In the warehouse database, data is de-normalized

PRODUCT_DIM

After collapsing into a star schema dimension

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Define the granularity

Create new playlist

Sign In

Sign Up

Table of Contents for
Define the granularity