112 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition
multidimensional expressions (MDXs) as the query language. However, the data
is not copied or persisted in another storage mechanism, but remains in the DB2
relational database.
Cubing Services thus is what we refer to as a “multidimensional OLAP hot
cache.” That is, InfoSphere Warehouse Cubing Services loads data into a
multidimensional structure in memory.
That partial MOLAP structure can then deliver fast response times for the data
that is in memory cache. If a user of a query tool requests data that is not in the
cache, the Cubing Services engine generates queries to the relational database
to retrieve the data and builds a partial in-memory cube-like structure for only
that data. Although the first request for that data sees a longer response time
while the in-memory structures are being built, subsequent requests for that data
are satisfied from the in-memory structures and experience MOLAP-style
response times while that data is maintained in memory.
Using Cubing Services as the OLAP engine results in a MOLAP-style response
in a large percentage of queries without the need to copy the data into another
storage mechanism. To improve the relational access, there is an optimization
wizard that recommends the appropriate MQTs that can be implemented to
improve the response of aggregated data when it is needed from the relational
database.
4.4.1 Cube model dimensional concepts
Before creating any cube models, it is important o understand the components
that make up such a model, and to understand their association to the fact and
dimensional tables that make up the dimensional model based on a star schema.
Cube models have four main core ingredients, which themselves have a number
of additional elements. These four ingredients are listed here.
???? Fact: derived from the fact table within the dimensional model, and which
contain:
Measures: derived from the columns or calculated.
???? Dimension: derived from the dimension table within the dimensional model,
and which contain:
Hierarchies: aggregates of data.
Levels: the granularity of the data with uniquely identifiable attributes.
Attributes: identifiable entities.
???? Joins: derived from the relationship between tables:
Joins between fact and dimension tables, in a star schema.
Chapter 4. Data modeling: End to end 113
Joins between multiple dimension tables, in a snow flake schema.
???? Cube: an instance of parts or a complete cube model implemented as a cube
service.
Figure 4-16 illustrates the components of a cube model and their relationship to a
the tables in a warehouse database.
Figure 4-16 Relationship between cube model and tables in a relational database
Fact element
In a dimensional model, the fact table holds the central position within the star
schema, with all the dimensional tables linked to this one table. In a cube model,
the
fact element maintains a similar central position, with the dimensional
elements connected, this time as a defined join. This means, within the definition
of the cube model, you have to define a set of SQL queries that join the fact table
to the individual dimensional tables within the relational database.
A fact object also contains a set of measures or metrics that best describes how
to aggregate data from the fact table, across dimensions. These measures or
metrics can be drawn directly from fact table columns, or derived through
Attribute
Relationship
Cube
Cube Dimension
Cube Hierarchy
Cube Facts
Cube Model
JoinMeasureMeasure
Facts
Cube
Metadata
Fact tables
Dimension tables
Dimension tables
Dimension tables
Relational tables in DB2
AttributeAttribute
Join
Hierarchy
Dimension
114 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition
calculations. It is these measures or metrics that form the basis of what is to be
analyzed within the cube model. As an example, they might be sales amounts,
counts of product failures, or the number of admitted patients.
Within the cube model, understand that measures in themselves only become
meaningful when used in the context of a set of dimensions. For instance, when
used on its own, a sale of 300 units is not meaningful. However, when combined
with dimensions for location and product line, the resulting measure takes on
more meaning, that is, “A clothing line in the eastern sales region sold 300 units.
It is these measures or metrics in connection with the dimension objects, such as
time or location, that allow you to manage your business, analyze options, and
ask your BI questions.
The following types of fact measures can be implemented:
Calculated facts These are calculations that are based on existing
measures or column values in the relational database.
For example, the sales amount might be calculated from
the unit price of a item against the number of items sold.
These calculations can be performed by two methods
within a cube service in InfoSphere Warehouse: SQL and
MDX. For SQL calculations, the work is performed by
DB2 and SQL functions. For MDX calculations, the work
is performed within the Java application that is the cube
service and which is better suited to complex calculations.
Additive facts This refers to where a measure is rolled up across all
dimensions and levels using a consistent aggregation
function. An example of an additive fact is the sum of
sales or cost of goods sold. Cubing Services also support
average, count, standard deviation, max, min, and
variance as aggregation types.
Non-additive facts Non-additive facts cannot be rolled up through any of the
dimensions in the cube model, which is typically the case
for any measure calculations that result in ratios,
averages, or variance percentages, for example. Cubing
Services supports non-additive facts for calculated
measures by setting the aggregation function to None,
which results in that calculation being calculated after the
values are aggregated
Dimension element
In a star schema-based dimensional model, each dimension table contains a
specific set of related columns, biased towards a particular topic such as time,
location, or product range. Because a cube model is an abstraction of the star
Chapter 4. Data modeling: End to end 115
schema, it too has dimension elements. And these in turn are defined as a set of
related attributes that together describe one partition or aspect of the model,
along the lines of its metrics.
Similar to the dimensions within the star schema, these related dimensions
within a cube model organize the data in the fact object according to common
logical entities, such as by time, by location, by person. These dimensions are
then subdivided into additional components that are known as attributes, levels,
and hierarchies, as described here:
???? Hierarchies
Generally, for each dimension defined within the cube model, one or multiple
hierarchies are defined. These hierarchies describe the relationship and
structure of the referenced attributes, and provide a navigational and
computational way to traverse the dimension. Individual hierarchies are
divided into an ordered list of levels that represent the scope of dimension
granularity from the lowest to highest.
During the creation of the hierarchy, you must also specify the hierarchy type.
The following are hierarchy types that can be implemented within a
dimension:
Balanced hierarchy
All the branches in a balanced hierarchy descend to the same level and
have the same corresponding number of levels. A good example of a
balanced hierarchy is time, where the meaning and depth of each level,
such as year, month, and week, are constant.
Unbalanced hierarchy
In this context, although all the branches within an unbalanced hierarchy
have a parent-child relationship, they have an inconsistent number of
levels and might not all reach to the same depth. An example of this type
of hierarchy might be that of an organizational chart where a large
department has a sub-manager, but a smaller department does not.
Ragged hierarchy
Similar to an unbalanced hierarchy, the branches of a ragged hierarchy
can descend to different levels. However, an hierarchy becomes ragged
when the parent of a member is not in the level immediately above that
member and thus, one branch does not have an entity at all the levels. An
good example might be a set of countries in which a larger country might
be divided into states, then a city level. However, a smaller country just
has a city level.
116 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition
Network hierarchy
In this situation, the levels do not have an inherent parent-child
relationship, so their order is not important. But the network hierarchy still
has attributes such as color, size, or product type.
In a relational database, the different levels of a hierarchy can be stored in a
single table (as in a star schema) or in separate tables (as in a snowflake
schema).
A hierarchy can have an “All” level as the top-most level in the hierarchy,
which is a single member that represents the aggregation of all of the
members in the levels below in the hierarchy.
???? Attributes
These are the basic building blocks of the dimension element and are the
identifiable aspects in the model with which to describe, organize, group and
ask questions about entities with the cube model. In the simplest terms,
attributes represent the basic abstraction of a database column. They can be
expressed as a simple SQL expression that maps to a table column within a
dimensional table, or as a more complex expression that can combines
multiple columns within one or more tables.
Attributes often serve as headings for the columns of a report, so make them
descriptive and easy to understand.
???? Levels
Levels define the specific unique “slices” of each hierarchy or the specific
granular layers within the hierarchy for each dimension definition. If a
hierarchy contains more than one level, then these levels have a defined
order within hierarchy structure. As an example, the natural levels of a time
dimension are days, months, years. They are placed in a specific levels order
to form a Calendar Time hierarchy; that is, the highest level will be years,
followed by months, and then at the lowest level by days. Each member, a
specific day or month, of the level is uniquely identifiable. In the context of a
dimension table in a logical model or within the database, this property maps
to the key column or a group of columns that uniquely identifies every row in
the table.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset