Chapter 4. Data modeling: End to end 87
4.2.2 Define the granularity
Stage two of the process is to determine the right level of detail that must be
contained within the model and ultimately, within the fact table. As an example, a
model might contain a complete record of a sales transaction as one entry within
the fact table, with all the products sold contained within the one entry or maybe
not even listed. In this case, the lowest level is be the individual sales transaction.
An alternative can be to set the product as the lowest level, thus requiring
multiple entries within the fact table to represent just one sales transaction. The
grain (detail) of the fact table has a significant impact on the data warehouse
from business, technical, and development points of view.
From a business perspective, a dimensional model designed to the lowest level
of grain is easy to change, with new dimensions or facts being easily added.
Thus, the model can maintain future capability and be flexible enough to answer
questions at a lower level of detail.
From a technical perspective, the grain of a table has a major impact on the size
of the star schema and thus the size of the tables. This has an effect on table
implementation and the performance of transform and query operations.
From a development perspective, the development team might have to consider
more dimensions and have a greater understanding of the underlying production
tables. They might also have to perform a greater number of, and more complex,
transforms for getting the data into the warehouse.
At this point in the modelling process it is worth investigating documents and
account forms from within the business; items such as order forms, invoices, and
sales receipts can help determine an acceptable grain level. You also must
consider time- and date-dependent information, because many of the business
questions that you aim to answer can be time related. An example might be
“What was the number of sales of a particular item on a particular date.” In this
scenario, you have to model time as one of your dimensions. You do, however,
have to check that the source data does contain information at the day, month, or
year level.
4.2.3 Identify the dimension tables
Dimensional tables are a major factor in the construction of a Star Schema
dimensional model. As illustrated in Figure 4-2 on page 84, one or more
dimensional tables are connected to the fact table, with each normally containing
detailed and relevant information about a section or element of the complete fact
table.
88 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition
Therefore, the dimensional model has to create a relationship between the fact
table and its required dimensional tables. This is achieved by having individual
columns within the fact table assigned to hold the foreign key of each primary key
within the dimensional tables.
A dimensional table might, within your model, also be designed to contain
additional information than that normally included within the associated
production tables related to the business process being modeled. A useful
example is the inclusion of extra columns within the dimension table for storing a
“product type” or “suppliers details. This additonal information will not normally
be part of a sales receipt, upon which you have based your fact table. But by
adding this extra, related information to the dimension table, query performance
can be improved because fewer less table joins are required in answering
business questions related to this additonal information.
In stage three of developing the dimensional model, you must determine which
tables currently implemented within the production 3NF model are actually
required as dimension tables within the star schema dimensional model.
However, this is not a simple process of only copying complete sets of tables
from the 3NF model that represents your production database into your
dimensional model as dimensional tables. Doing so does not produce an
effective and efficient star schema. For performance reasons, the 3NF model,
being based on the OLTP production database, might contain multiple related
tables, with each table only containing a small number of columns from the
complete related subject.
As an example, consider that the 3NF model of your production database has
been implemented to contain a set of product-related tables. Your model
contains a table holding the product names, a separate table for the type of
product, and a final table of the supplier’s details. The reason is that this
information is not critical to the core function of storing a sales transaction, and in
all probability, will only be updated within the production database at infrequent
intervals.
However, if this complete set of tables is copied into your dimensional model as
the basis for the “Product” dimension, the resulting model will d have more in
common with a snowflake schema than the more efficient star schema. It is at
this point that the dimensional model has to move away from that of the 3NF
model and its related set of tables.
To create the desired star schema, it might be necessary to collapse many of the
individual tables in the 3NF logical model into one single table with a larger set of
columns within the dimensional model. A side effect of this process is that the
now larger dimensional table will contain a greater amount of repeating
information and require a larger amount of storage than that given to the original
Chapter 4. Data modeling: End to end 89
base tables. Figure 4-4 illustrates the principle of collapsing multiple tables from
the production database into a single dimensional table.
Figure 4-4 Collapse multiple tables into one dimensional table
90 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition
Figure 4-5 illustrates the multiple tables within a production database that can be
collapsed into a single dimensional table.
Figure 4-5 Multiple tables within a production database
The process of collapsing or merging tables together to create a single
dimensional table is often referred to as a table being
de-normalized. Figure 4-6
on page 91 illustrates the logical de-normalized dimensional model of the table in
Figure 4-4 on page 89.
In the operational database, data is normalized
PRODUCT_LINE
PRODUCT_TYPE
PRODUCT_TYPE
Before collapsing into a star schema dimension
Chapter 4. Data modeling: End to end 91
Figure 4-6 Example of a de-normalized dimensional table
Dimension tables contain attributes that describe fact records in the fact table.
Some of these attributes provide descriptive information. Other attributes are
used to specify how fact table data is to be summarized to provide useful
information within the business report. Dimension tables contain hierarchies of
attributes that aid in summarization.
Note the following key points to consider about dimension tables:
???? Each dimension table has one and only one lowest level element, or lowest
level of detail, which is the granularity of the dimension.
???? Have each non-key element (other than the surrogate key) appear in only one
dimension table.
???? If the data for the dimension tables is drawn from different sources, it is
advisable to use surrogate keys as the primary keys in all the dimension
tables.
In the warehouse database, data is de-normalized
PRODUCT_DIM
After collapsing into a star schema dimension
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset