So just how do you decide if you're working on a true data warehouse? First, examine the intended nature of your database and the application it supports. For each subject area in your data warehouse, simply ask your sponsoring business user to provide the following eight items:
Mission statement
Number of ad-hoc query users
Number ad-hoc queries per day per ad-hoc user
Number of pre-canned report users
Number of pre-canned reports per day per pre-canned user
Number of pre-canned reports
Amount of history to keep in months, quarters, or years
Typical daily, weekly, or monthly volume of data to record
These answers should help you categorize your database application into one of the following choices:
Online transaction processing (OLTP)
Operational data store (ODS)
Online analytical processing (OLAP)
Data mart/data warehouse (DM/DW)
Use the criteria outlined in Table 1-1 to make your distinction.
OLTP | ODS | OLAP | DM / DW | |
---|---|---|---|---|
Business Focus | Operational | Operational / Tactical | Tactical | Tactical / Strategic |
End User Tools | Client/Server or Web | Client/Server or Web | Client/Server | Client/Server or Web |
DB Technology | Relational | Relational | Cubic | Relational |
Transaction Count | Large | Medium | Small | Small |
Transaction Size | Small | Medium | Medium | Large |
Transaction Time | Short | Medium | Medium | Long |
DB Size in GB | 10–400 | 100–800 | 100–800 | 800—80,000 |
Data Modeling | Traditional ERD | Traditional ERD | N/A | Dimensional |
Normalization | 3–5 NF[1] | 3 NF | N/A | 0 NF |
[1] Normal Form
For example, suppose your answers are as follows:
“The point of sale (POS) subject area of the data warehouse should enable executives and senior sales managers to perform predictive, “what-if” sales analysis and historical analysis of:
A sales campaign's effectiveness
Geographic sales patterns
Calendar sales patterns
The effects of weather on sales
20 ad-hoc query users
10–20 ad-hoc queries a day per ad-hoc user
40 pre-canned report users
1–4 pre-canned reports a day per pre-canned user
60 months of history
40 million sales transactions per day
From this example, we can discern that we genuinely have a candidate for a data mart or data warehouse. First, the mission statement clearly indicates that our users' requirements are of a more tactical or strategic nature. Second, the majority of our report executions will clearly be ad-hoc (200–400 ad-hoc versus a maximum of 160 pre-canned). Third, we have significant historical data requirements and large amounts of raw data—and thus a potentially very large database (especially once we consider aggregates as well).
While it may seem like I've painted an example tailored to the conclusion, I've actually found the process to be this straightforward and easy in most cases. Unfortunately, these days, people tend to call any reporting database a data warehouse. It's okay for people to call their projects whatever they like, but as I pointed out, the techniques in this book only apply to the DM/DW column of Table 1-1.