Chapter 3
Project/Program Planning

With the historical perspective and grounding from the previous chapter, it's time to get everyone organized to embark on a DW/BI project. We begin this chapter by focusing on the project team, their role as information publishers, and the sometimes insurmountable realities that they confront. We then turn our attention to the all-important business stakeholders. Finally, we describe the Kimball Lifecycle approach for tackling your DW/BI initiatives.

Professional Responsibilities

We begin by discussing what the data warehouse/business intelligence manager and team should (and shouldn't) do.

3.1 Professional Boundaries

Ralph Kimball, DBMS, Jul 1998

This article and the next discuss the similarities between a data warehouse manager's job and the responsibilities of an editor in chief.

The data warehouse manager's job is potentially huge, offering many opportunities and just as many risks. The data warehouse manager has been given control of one of the most valuable assets of any organization: the data. Furthermore, the data warehouse manager is expected to interpret and deliver that asset to the rest of the organization in a way that makes it most useful. All eyes are on the data warehouse manager.

In spite of all this visibility, many newly appointed data warehouse managers are simply given their titles without a clear job definition or a clear sense of what is and is not their responsibility. As an industry, we have been groping for a definition of the data warehouse manager position. Perhaps this was appropriate in past years because we needed to define the job. We needed time to get some accumulated experience. We needed to test the boundaries of what being a data warehouse manager means. I think we are somewhat overdue in defining the data warehouse manager's job. It is not sufficient or helpful to just say that a data warehouse manager's job is to “bring all the data into a central place and make it available for management to make decisions.” Although such a definition may be correct, it isn't precise enough for anyone to tell if the data warehouse manager has done the job well. In this article I'll begin to tackle the data warehouse manager's job definition. I will suggest a metaphor for the job that, if accurate, may provide a rich set of criteria to judge a “job well done.” Furthermore, a clear definition will help senior IT executives understand what the data warehouse manager needs to do and, just as importantly, what things a data warehouse manager should not have to do.

A good metaphor for the job of the data warehouse manager is the job of an editor in chief for a magazine or newspaper. At a high level, the editor in chief:

  • Collects input from a variety of sources, including third-party authors, investigative reporters, and in-house writers
  • Assures the quality of this input by correcting spelling, removing mistakes, and eliminating questionable material
  • Applies broad editorial control over the nature of the published material and assures a consistent editorial view
  • Publishes on a regular schedule
  • Relies on and respects the trust of the readers
  • Is named prominently on the masthead to serve as a clear communication as to where the buck stops
  • Is driven by continuously changing demographics and reader interests
  • Is driven by rapidly changing media technologies, especially the internet revolution that is happening as we speak
  • Is very aware of the power of the media and consciously markets the publications

These statements seem a little obvious because we all know, based on experience, what the job title “editor in chief” implies. And most editors in chief understand very clearly that they don't create the content about which they write, report, or publish. They are, rather, the purveyors of content created by others.

I hope that you have been struck by the many parallels between the job of an editor in chief and the job of a data warehouse manager. Perhaps a good way to sum this up is to say that the job of the data warehouse manager is to publish the enterprise's data.

Let's examine the parallels between these two jobs. In most cases, the data warehouse manager is aggressively pursuing the same goals as the editor in chief. In some cases, the data warehouse manager could learn some useful things by emulating the editor in chief. At a high level, the data warehouse manager:

  • Collects data inputs from a variety of sources, including legacy operational systems, third-party data suppliers, and informal sources
  • Assures the quality of these data inputs by correcting spelling, removing mistakes, eliminating null data, and combining multiple sources
  • Applies broad data stewardship over the nature of the published data and assures the use of conformed dimensions and facts across the disparate business process subject areas (which can be thought of as separate publications). See article 5.14, Data Stewardship 101: The First Step to Quality and Consistency.
  • Releases the data from the ETL data staging area to the individual data subject areas on a regular schedule
  • Relies on and respects the trust of the business users
  • Is named prominently on the organizational chart to serve as a clear communication as to where the buck stops
  • Is driven by the continuously changing business requirements of the organization and the increasingly available sources of data
  • Is driven by rapidly changing media technologies, especially the current internet revolution
  • Is very aware of the business significance of the data warehouse and consciously “captures” and takes credit for the business decisions made as a result of using the data warehouse
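
The conformed dimensions responsibility above is the linchpin of this publishing role. As a minimal sketch (all table and column names here are hypothetical, not from the text), the following shows how two business process subject areas that share a conformed product dimension can be summarized separately and then combined on that dimension:

```python
import sqlite3

# Two business process subject areas -- sales and inventory -- share a single
# conformed product dimension, so their results can be combined by product.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, brand TEXT);
    CREATE TABLE fact_sales (product_key INTEGER, sales_dollars REAL);
    CREATE TABLE fact_inventory (product_key INTEGER, quantity_on_hand INTEGER);
    INSERT INTO dim_product VALUES (1, 'Acme'), (2, 'Zenith');
    INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 75.0);
    INSERT INTO fact_inventory VALUES (1, 40), (2, 12);
""")

# Because both fact tables use the same product_key and brand labels, each
# process is summarized separately and the results merged on the dimension.
rows = con.execute("""
    SELECT d.brand, s.dollars, i.on_hand
    FROM (SELECT product_key, SUM(sales_dollars) AS dollars
          FROM fact_sales GROUP BY product_key) s
    JOIN (SELECT product_key, SUM(quantity_on_hand) AS on_hand
          FROM fact_inventory GROUP BY product_key) i
         ON i.product_key = s.product_key
    JOIN dim_product d ON d.product_key = s.product_key
    ORDER BY d.brand
""").fetchall()
print(rows)  # [('Acme', 150.0, 40), ('Zenith', 75.0, 12)]
```

Because both fact tables carry the same surrogate keys and the same dimension labels, the separate "publications" line up row for row; without conformance, the brands from the two sources could not be compared at all.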

In addition, the data warehouse manager has a number of responsibilities that most editors do not have to think about. These special data warehouse responsibilities include backing up the data sources and the final, published versions of the data. These backups must be available—sometimes on an emergency basis—to recover from disasters or provide detail that wasn't published the first time around. The data warehouse manager must deal with overwhelming volumes of data and must diligently avoid being stranded by obsolete backups. He or she must replicate published data in a highly synchronized way to each of the downstream "publications" (business process subject areas) and provide a detailed audit trail of where the data came from and what its lineage and provenance are. The data warehouse manager must be able to explain the significance and true meaning of the data and justify each editing step that the ETL may have performed on the data before it was published, and must protect published data from all unauthorized readers. Of all the responsibilities the data warehouse manager has in addition to the classic editorial responsibilities, this security requirement is the most nightmarish; it is also the biggest departure from the editing metaphor. The data warehouse manager must somehow balance the goal of publishing the data to everyone with the goal of protecting the data from everyone. No wonder data warehouse managers have an identity problem.
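
The audit trail responsibility mentioned above can be made concrete. Assuming a hypothetical schema (none of these table, column, or file names come from the text), one common pattern is to stamp every loaded fact row with a key into an audit table that records the source and load time of its batch:

```python
import sqlite3
from datetime import datetime, timezone

# Lineage sketch: every row landed by the ETL carries an audit key pointing at
# a record of where the data came from and when it was loaded, so its
# provenance can always be explained after publication.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE etl_audit (
        audit_key INTEGER PRIMARY KEY,
        source_system TEXT, extract_file TEXT, load_time_utc TEXT);
    CREATE TABLE fact_orders (
        order_id INTEGER, amount REAL, audit_key INTEGER);
""")

def load_batch(rows, source_system, extract_file):
    """Insert one audit record per batch and stamp every fact row with it."""
    cur = con.execute(
        "INSERT INTO etl_audit (source_system, extract_file, load_time_utc) "
        "VALUES (?, ?, ?)",
        (source_system, extract_file, datetime.now(timezone.utc).isoformat()))
    audit_key = cur.lastrowid
    con.executemany(
        "INSERT INTO fact_orders VALUES (?, ?, ?)",
        [(order_id, amount, audit_key) for order_id, amount in rows])
    return audit_key

load_batch([(1001, 25.0), (1002, 40.0)], "legacy_orders", "orders_20240101.csv")

# Any fact row can now be traced back to its source extract with one join.
lineage = con.execute("""
    SELECT f.order_id, a.source_system, a.extract_file
    FROM fact_orders f JOIN etl_audit a ON a.audit_key = f.audit_key
    ORDER BY f.order_id
""").fetchall()
```

With this pattern, answering "where did this number come from" becomes a simple join rather than an archaeology project.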

In the discussion of the editor in chief's responsibilities, we remarked that nearly all editors in chief understand that they are merely the purveyors of content created by others. Most editors don't have a boundary problem in this area. Many data warehouse managers, on the other hand, do. Frequently, the data warehouse manager agrees to be responsible for allocations, forecasts, behavior scoring, modeling, or data mining. This is a major mistake! All these activities are content creation activities. It is understandable that the data warehouse manager is drawn into these activities, because, in many cases, there is no prior model for the division of the new responsibilities between IT and a business user group such as finance. If an organization has never had good allocated costs, for example, and the data warehouse manager is asked to present these costs, then the data warehouse manager is also going to be expected to create these costs.

The data warehouse manager should treat allocations, forecasts, behavior scoring, modeling, and data mining as clients of the data warehouse. These activities should be the responsibilities of various analytic groups in the finance and marketing departments, and these groups should have an arm's length relationship to the data warehouse. They should consume warehouse data as inputs to their analytic activities and, possibly, engage the data warehouse to republish their results when they are done. But these activities should not be mixed into all the mainline publishing activities of the data warehouse.

Creating allocation rules that let you assign infrastructure costs to various product lines or marketing initiatives is a political hot potato. It is easy for a data warehouse manager to get pulled into creating allocations because it is a necessary step in bringing up a profit-oriented subject area. The data warehouse manager should be aware of the possibility of this task being thrust on the data warehouse and should tell management that, for example, the “data warehouse will be glad to publish the allocation numbers once the finance department has created them.” In this case, the editorial metaphor is a useful guide.

Beyond the boundaries defined by the editorial metaphor, we must add the responsibilities of backing up data, auditing the extract and transform processes, replicating to the presentation tables of the data warehouse, and managing security. These tasks make the data warehouse manager's job more technical, more intricate, and at the same time, broader than the job of an editor in chief.

Perhaps this article can stimulate the development of criteria for data warehouse manager training and help IT executive management appreciate what data warehouse managers face in doing their jobs. In many ways, the responsibilities discussed in this article have been implicitly assumed, but data warehouse managers have neither had them spelled out nor been compensated for them.

Finally, by focusing on boundaries, we can see more clearly some content creation activities the data warehouse manager should leave on the table. Allocating, forecasting, behavior scoring, modeling, and data mining are valuable and interesting, but they are done by the readers (business users) of the data warehouse. Or to put it another way, if the data warehouse manager takes on these activities, then the data warehouse manager (of course) should get two business cards and two salaries. Then the boundaries can be twice as large.

3.2 An Engineer's View

Ralph Kimball, Intelligent Enterprise, Jul 26, 2002

This article builds on the preceding one to illustrate the evolution of the publisher metaphor.

Based on reader feedback, I've decided to do something I haven't done for quite some time: go back to lay the foundation for what data warehouse designers do, and why they use certain techniques. Doing this also lets me restate the assumptions and techniques with the benefit of hindsight and more experience. I hope the results are tighter, more up to date, and more clearly worded.

The Data Warehouse Mission

At the outset, I want to share a perspective that I take very seriously, and in some ways is the foundation for all my work in data warehousing. It's the publishing metaphor.

Imagine that you've been asked to take over responsibility for a high-quality magazine. If you approach this responsibility thoughtfully, you'll do the following 12 things:

  • Identify your readers demographically.
  • Find out what the readers want in this kind of magazine.
  • Identify loyal readers who'll renew their subscriptions and buy products from the magazine's advertisers.
  • Find potential new readers and make them aware of the magazine.
  • Choose the magazine content most appealing to the target readers.
  • Make layout and rendering decisions that maximize the readers' pleasure.
  • Uphold high quality writing and editing standards and adopt a consistent presentation style.
  • Continuously monitor the accuracy of the articles and the advertisers' claims.
  • Keep the readers' trust.
  • Develop a good network of writers and contributors.
  • Draw in advertising and run the magazine profitably.
  • Keep the business owners happy.

While these responsibilities may seem obvious, here are some dubious “goals” that you should avoid:

  • Build the magazine around the technology of a particular printing press.
  • Put most of your management energy into the printing press operational efficiencies.
  • Use a highly technical and complex writing style that many readers may not understand.
  • Use an intricate and crowded layout style that's difficult to read and navigate.

By building the whole business on the foundation of serving the readers, your magazine is likely to be successful.

The point of this metaphor, of course, is to draw the parallel between being a conventional publisher and being a data warehouse project manager. I'm convinced that the correct job description for a data warehouse project manager is to publish the right data. Your main responsibility is to serve your readers who are your business users. While you'll certainly use technology to deliver your data warehouse, the technology is at best a means to an end. The technology and the techniques you use to build your data warehouses shouldn't show up directly in your top 12 responsibilities, but the appropriate technologies and techniques will become much more obvious if your overriding goal is to effectively publish the right data.

Now let's recast the 12 magazine publishing responsibilities as data warehouse responsibilities:

  • Understand your users by business area, job responsibilities, and computer tolerance.
  • Find out what decisions the users want to make with the help of the data warehouse.
  • Identify loyal users who will make effective decisions using the data warehouse.
  • Find potential new users and make them aware of the data warehouse.
  • Choose the most effective, actionable subset of the data to present in the data warehouse, drawn from the vast universe of possible data in your organization.
  • Make the user screens and applications much simpler and more template-driven, explicitly matching the screens to the cognitive processing profiles of your users.
  • Make sure your data is accurate and can be trusted, labeling it consistently across the enterprise.
  • Continuously monitor the accuracy of the data and the content of the delivered reports.
  • Keep the business users' trust.
  • Continuously search for new data sources, and continuously adapt the data warehouse to changing data profiles and reporting requirements.
  • Take a portion of the credit for business decisions made using the data warehouse, and use these successes to justify your staffing, software, and hardware expenditures.
  • Keep the business users, their executives, and your boss happy.

If you do a good job with all these responsibilities, I think you'll be a great data warehouse project leader! Conversely, go down through the list and imagine what happens if you omit any single item. Ultimately your data warehouse would have problems.

Design Drivers

Let's boil down the data warehouse mission into some simple but specific design drivers, because as engineers we have to build something. Our users will be happy if we publish the “right” data that gives them insight into their key business questions, in a way that they perceive as simple and fast.

These design drivers dominate everything. As an engineer, I know I can't compromise them or my data warehouse will fail. But, assuredly, there are other real-world constraints that I have to pay attention to. I also try to:

  • Limit administrative overhead in the design and production phases.
  • Limit the total cost of ownership.
  • Reduce the risk of no result or an irrelevant result.
  • Reduce the risks of centralization.

Design Constraints

As data warehouse designers, we live in an enormously complex world. I'm almost reluctant to make a list of the implicit constraints we have to live with because I worry that the new student will decide on a different career! But here's my list:

  • Necessity for decentralized, incremental development because very few organizations are truly, effectively centralized in every function
  • Requirement to integrate multiple technologies that are incompatible at various levels because there are many vendors of technology
  • Truly unreasonable demands for rapid deployment (business users think that six weeks is a little excessive)
  • Need to allow for continuous change (little surprises and big surprises)
  • Inevitability of remote independent data marts springing up
  • Users' desire for instantaneous system response

And three more requirements that have been added just since 2000:

  • The so-called 360 degree view of the customer
  • Tracking, storing, and predicting customer behavior
  • Access to atomic data, delivered from operational systems in real time, and seamlessly connected to old history

And finally a requirement that has grown explosively since 2010:

  • Access to big data in many forms, including machine data from the Internet of Things, streaming event data from the internet, unstructured textual data, and non-textual data such as images

The Engineer's Response

So what does an engineer make of all this? All these requirements and expectations placed on us are beyond overwhelming. Well, building a bridge across the Golden Gate must have seemed pretty impossible in 1930. And how about going to the moon in the 1960s? Data warehouses may not be that hard, but the examples are inspiring.

A good engineer first sorts through all the available mathematics and science, accumulating ideas and possible techniques. Some of the math and science is practical and some isn't. This is the time to be a skeptic. Next, the engineer decomposes the design problem. If it can be broken into pieces that are relatively independent, then the engineer can focus on manageable parts.

Once the design is started, the engineer must continuously choose practical techniques that are reusable, simple, and symmetrical. Idealized designs and assumptions need to be recognized and thrown out. There's no such thing as perfect information or control (especially of people), either at the beginning of a project or after the project is in production. Throughout the project, the engineer must be a good manager. And, finally, the original design goals have to be constantly in view to provide the foundation for all decisions.

3.3 Beware the Objection Removers

Ralph Kimball, Intelligent Enterprise, Sep 1, 2005

An objection remover is a claim made during the sales cycle intended to counter a fear or concern that you have. Objection removers occupy a gray zone in between legitimate benefits and outright misrepresentations. While an objection remover may be technically true, it's a distraction intended to make you close the door on your concern and move forward in the sales process without careful thinking.

Objection removers crop up in every kind of business, but often the more complex and expensive your problem is, the more likely objection removers will play a role. In the data warehouse world, classic objection removers include:

  • You don't need a data warehouse because now you can query the source systems directly. (Whew! That data warehouse was expensive and complicated. Now I don't need it.)
  • You can leave the data in a normalized structure all the way to the business user query tools because our system is so powerful that it easily handles the most complex queries. (Whew! Now I can eliminate the step of preparing so-called queryable schemas. That gets rid of a whole layer of application specialists, and my business users can sort out the data however they want. And since I don't need to transform the source data, I can run my whole shop with one DBA!)
  • Our “applications integrator” makes incompatible legacy systems smoothly function together. (Whew! Now I don't have to face the issues of upgrading my legacy systems or resolving their incompatibilities.)
  • Centralizing customer management within our system makes your customer matching problems go away and provides one place where all customer information resides. (Whew! Now I don't need a system for deduplication or merge-purge, and this new system will feed all my business processes.)
  • You don't need to build aggregates to make your data warehouse queries run fast. (Whew! I can eliminate a whole layer of administration and all those extra tables.)
  • Leave all your security concerns to the IT security team and their LDAP server. (Whew! Security is a big distraction, and I don't like dealing with all those personnel issues.)
  • Centralizing all IT functions lets you establish control over the parts of your data warehouse. (Whew! By centralizing, I'll have everything under my control, and I won't have to deal with remote organizations that do their own thing.)
  • Leave your backup worries behind with our comprehensive solution. (Whew! I didn't really have a comprehensive plan for backup, and I have no idea what to do about long-term archiving.)
  • Build your data warehouse in 15 minutes. (Whew! Every data warehouse development plan I've reviewed recently has proposed months of development!)

Objection removers are tricky because they are true—at least if you look at your problem narrowly. And objection removers are crafted to erase your most tangible headaches. The relief you feel when you suddenly imagine that a big problem has gone away is so overwhelming that you feel the impulse to rush to the finish line and sign the contract. That is the purpose of an objection remover in its purest form. It doesn't add lasting value; it makes you close the deal.

So, what to do about objection removers? We don't want to throw the baby out with the bathwater. Showing the salesperson to the door is an overreaction. Somehow we have to stifle the impulse to sign the contract and step back to see the larger picture. Here are four steps you should keep in mind when you encounter an objection remover:

  1. Recognize the objection remover. When your radar is working, spotting objection removers is easy. A startling claim that flies in the face of conventional practice is almost always too good to be true. A sudden feeling of exhilaration or relief means that you have listened to an objection remover. In some cases, the claim is written in black and white and is sitting in plain view on your desk. In other cases, you need to do a little self analysis and be honest about why you're suddenly feeling so good.
  2. Frame the larger problem. Objection removers work because they narrow the problem to one specific pain, which they decisively nail, but they often ignore the larger complexity of the problem or transfer the hard work to another stage of the process. Once you recognize an objection remover, count to 10 before signing the contract and force yourself to think about the larger context of your problem. This is the key step. You'll be shocked by how these claims lose their luster, if only they are placed in the proper context. Let's take a second look at the objection removers listed earlier.
    • You don't need a data warehouse; you can query the source systems directly. This is an old and venerable objection remover that has been around since the earliest days of data warehousing. Originally, it was easy to disqualify this objection remover because the source transaction systems didn't have the computing capacity to respond to complex user queries. But with the recent technical advances in grid computing and powerful parallel processing hardware, a good salesperson can argue that the source systems do have the capacity to do the double duty of processing transactions and serving user queries. But in this case there's a larger issue: The data warehouse is the comprehensive historical repository for perhaps your most important company asset—your data. The data warehouse is purpose-built to house, protect, and expose your historical data. Certainly these issues have come sharply into focus with the recent emphasis on compliance and business transparency. Transaction processing systems are virtually never built with the goal of maintaining an accurate historical perspective on your data. For instance, if the transaction system ever modifies a data element by destructively overwriting it, you've lost historical context. You must maintain a separate copy of the data outside the transaction system because the historical context is structurally different from the volatile data in a transaction system. Thus, you must have a data warehouse.
    • You can leave your data in a normalized structure all the way to the users. The larger issue: Decision support systems will only work if they seem simple to the business users. Building a simple view is hard work that requires very specific design steps. In many ways, the delivery of data to the business users and their BI tools is analogous to the final delivery made by a restaurant chef from the kitchen through the door to the customers in the dining room. We all recognize the contribution made by the chef in making sure the food is perfect just as it's being taken from the kitchen. The transfer of an (uncooked) complex database schema to business users and their immediate application support specialists is in some ways worse than an objection remover: It's a snare and a delusion. To make this work, all the transformations necessary to create a simple view of the data are still required, but the work has been transferred out of IT to the business user. Finally, if the simple data views are implemented as relational database “views,” then the simple queries all devolve into extremely complex SQL that requires an oversized machine to process!
    • Applications integrators make incompatible legacy systems smoothly function together. The larger issue: Incompatibilities don't come from deficiencies in any technology, but rather from the underlying business practices and business rules that create the incompatible data in the first place. For any application integration technology to work, a lot of hard work must take place up front, defining new business practices and business rules. This hard work is mostly people sitting around a table hammering out those practices and rules, getting executive support for the business process reengineering that must be adopted across the organization, and then manually creating the database links and transformations that make an integrated system work.
    • A centralized customer management system makes your customer data issues go away. This is a subtle objection remover that's not as blatant as the first three. Probably every company would benefit from a rational single view of their customers. But only a core part of the customer identity should be centralized. Once a single customer ID has been created and deployed through all customer-facing processes, and selected customer attributes have been standardized, there remain many aspects of the customer that should be collected and maintained by individual systems. In most cases, these systems are located remotely from a centralized site.
    • You don't need to build aggregates. The larger issue: Data warehouse queries typically summarize large amounts of data, which implies a lot of disk activity. Supporting a data warehouse requires that you constantly monitor patterns of queries and respond with all the technical tools possible to improve performance. The best ways to improve query performance in data warehouse applications, in order of effectiveness, have proven to be:

      1. Clever software techniques building indexes on the data
      2. Aggregates and aggregate navigation
      3. Large amounts of RAM
      4. Fast disk drives
      5. Hardware parallelism

      Software beats hardware every time. The vendors who talk down aggregates are hardware vendors who want you to improve the performance of slow queries on their massive, expensive hardware. Aggregates, including materialized views, occupy an important place in the overall portfolio of techniques for improving performance of long-running queries. They should be used in combination with all the software and hardware techniques to improve performance.

    • Leave all your security concerns to the IT security team. The larger issue: Data warehouse security can only be administered by simultaneously being aware of data content and the appropriate user roles entitled to use that data. The only staff who understands both these domains is the data warehouse staff. Controlling data security is a fundamental and unavoidable responsibility of the data warehouse team.
    • Centralizing all IT functions lets you establish control over the parts of your data warehouse. This objection remover is a cousin of the earlier application integration claim. But this general claim of centralization is more dangerous because it is less specific and therefore harder to measure. Strong centralization has always appealed to the IT mentality, but in the largest organizations, centralizing all IT functions through a single point of control has about as much chance of succeeding as a centrally planned economy. The grand experiment of centrally planned economies in Eastern Europe lasted most of the 20th century and was a spectacular failure. The arguments for centralization have a certain consistency, but the problem is that it's too expensive and too time consuming to do fully centralized planning, and these idealistically motivated designs are too insular to be in touch with dynamic, real-world environments. These plans assume perfect information and perfect control, and are too often designs of what we'd like to have, not designs reflecting what we actually have. Every data architect in charge of a monolithic enterprise data model should be forced to do business user support.
    • Leave your backup worries behind you. Developing a good backup and recovery strategy is a complex task that depends on the content of your data and the scenarios under which you must recover that data from the backup media. There are at least three independent scenarios:

      1. Immediate restart or resumption of a halted process such as a data warehouse load job
      2. Recovery of a data set from a stable starting point within the past few hours or days, as when a physical storage medium fails
      3. Very long-term recovery of data when the original application or software environment that handles the data may not be available

      The larger picture for this objection remover, obviously, is that each of these scenarios is highly dependent on your data content, your technical environment, and the legal requirements, such as compliance, that mandate how you maintain custody of your data. A good backup strategy requires thoughtful planning and a multipronged approach.

    • Build your data warehouse in 15 minutes. I've saved the best one for last! The only way you can build a data warehouse in 15 minutes is to narrow the scope of the data warehouse so drastically that there's no extraction, no transformation, and no loading of data in a format meant for consumption by your BI tools. In other words, the definition of a 15-minute data warehouse is one that is already present in some form. Too bad this objection remover is so transparent; it doesn't even offer a temporary feeling of relief.

      There's always a letdown after shining a bright light on these objection removers by examining their larger context. Life is complex, after all. Let's finish our list of four steps:

  3. Create the counterargument. Remember, most objection removers aren't patently fraudulent, but by creating a good counterargument, you'll remove the impulse factor and understand the full context of your problem. This is also an interesting point at which you'll see if the salesperson has a textured, reasonable view of a product or service.
  4. Make your decision. It's perfectly reasonable to buy a product or service even when you've detected an objection remover. If you've placed it in an objective context or dismissed it altogether yet are still feeling good about the product, then go for it!

3.4 What Does the Central Team Do?

Ralph Kimball, DBMS, Jun 1997

Are you part of the central data warehouse team in your large organization? Do you also work with remote divisional or departmental teams implementing far-flung independent data marts? Do you wonder what your role should be versus the divisional teams' role? If these questions are familiar to you, you may be frustrated because you feel you have an important responsibility in your central role, yet you may not actually own anything. Even worse, perhaps the divisional teams are implementing their own subject area data stores and are charging ahead with their own technologies, vendor selections, and logical and physical database designs.

If you are part of the central data warehouse team, you need to play an active role in defining and controlling your enterprise data warehouse. You should not be content to just be a liaison, a recommender, or some kind of ill-defined glue that holds projects together. You need to be a leader and a provider, and you must be very visible to the separate divisional teams implementing their own subject area data marts.

Conversely, if you are a part of a decentralized divisional data warehouse team, you should expect the central team to provide you with some very valuable services and impose some noticeable structure and discipline on your effort. You are a part of an overall enterprise, and you need to conform to these structures and disciplines so that the data from your division can be usefully combined with the data from other divisions.

The central data warehouse team has three major responsibilities that not only are essential to the integrity of the overall data warehousing effort, but cannot be provided by any of the individual divisions. These responsibilities are defining and publishing the corporate dimensions, providing cross-divisional applications, and defining a consistent data warehouse security architecture for the enterprise.

Defining and Publishing Corporate Dimensions

If your organization has multiple lines of business, and if you have any interest in combining these lines of business into an overall framework, you need to identify four or five dimensions that are common to the multiple lines of business. The most obvious corporate dimensions are customer, product, geography, and date. Since this article was written in 1997, the industry has given a name to defining and publishing corporate dimensions: master data management (MDM).

The customer dimension has one record for each one of your customers (actually, one record for each historical description of each customer). Although this seems obvious, the crucial step is to make sure that each divisional data mart (subject area) uses the same single customer dimension wherever there is a reference to customer. Each division will therefore see the customer the same way. Individual divisions will be able to see the relationship other divisions have with the customer, and the enterprise will, perhaps for the first time, be able to see the whole customer relationship.

The requirement that each division must always use the corporate customer dimension is a strong requirement that will impact everyone. From the central team's perspective, the corporate customer dimension must embrace the complete range of possible customers. The key for corporate customer almost certainly will need to be an artificial customer number defined and created by the central data warehouse team. No individual divisional customer number will be administered and controlled in a way that serves the needs of the entire enterprise. The descriptive attributes for customer will need to be a broad set that includes all of the desired descriptors for all of the divisional groups. Hopefully the central data warehouse team can meet with the divisional groups and try to compress and standardize the various descriptors into a reasonable working set, but in the long run there is no absolute need to force any individual group to give up its idiosyncratic customer descriptors. Because dimension tables are generally much smaller than fact tables in a dimensional data warehouse design, there is room for multiple sets of customer descriptors in the same customer dimension record. In banks, I have often seen the master customer record contain 200 descriptive attributes. I advocate making this customer table a single flat, denormalized table in order to improve user understandability and allow the exciting new bitmap indexing technologies to speed lookup. The leading database vendors all have bitmap indexes that thrive on fat, wide, denormalized dimension tables.
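The mechanics of assigning an artificial corporate customer key can be illustrated with a minimal sketch (all names, divisions, and matching rules here are hypothetical; real master data management matching is far more involved):

```python
# Hypothetical sketch: assigning enterprise surrogate keys to customer
# records merged from two divisional source systems. Divisional keys are
# kept as attributes; the warehouse key is an artificial integer that the
# central team alone administers.

from itertools import count

def build_corporate_customer_dim(divisional_records):
    """Merge divisional customer records into one dimension, assigning
    an artificial corporate customer key to each distinct customer."""
    next_key = count(1)           # surrogate keys start at 1
    dimension = {}                # matched identity -> dimension row
    for rec in divisional_records:
        # Match on a cleaned name/address pair (a stand-in for real
        # MDM-style matching logic).
        identity = (rec["name"].strip().lower(), rec["address"].strip().lower())
        if identity not in dimension:
            dimension[identity] = {"customer_key": next(next_key),
                                   "name": rec["name"],
                                   "address": rec["address"]}
        # Preserve each division's idiosyncratic identifiers as extra columns;
        # dimension tables are small enough to afford the width.
        dimension[identity][f"{rec['division']}_customer_id"] = rec["source_id"]
    return list(dimension.values())

rows = build_corporate_customer_dim([
    {"division": "retail", "source_id": "R-17", "name": "Acme Corp", "address": "1 Main St"},
    {"division": "loans",  "source_id": "L-09", "name": "acme corp", "address": "1 Main St "},
])
print(len(rows))                   # the two source records match -> one row
print(rows[0]["customer_key"])     # 1
```

The point of the sketch is that no divisional customer number survives as the key; the central team's artificial key is the only one the warehouse relies on.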

The need to standardize on a corporate definition of customer is a very important central team responsibility. This need will arise whenever management wants to look at overall customer relationships, and it will arise whenever your enterprise makes an acquisition. The good news is that you just acquired your biggest competitor. The bad news is that you must now merge the separate customer dimensions.

Often there will be no individual division willing or able to provide the corporate customer dimension. Yet frequently these same divisions will be eager to receive such a customer dimension from you, the central team. Even the production online transaction processing (OLTP) systems in the division may be willing to upload the central data warehouse's customer dimension if it contains cleaned and corrected customer addresses. In these cases, the central data warehouse team becomes the creator and provider of the master enterprise customer file.

The failure to enforce the use of a corporate customer dimension is a very serious lapse and therefore must be addressed by the central data warehouse team. The ability to “drill across” separate business processes can only be accomplished if all of the shared dimensions in the separate data marts are conformed. In other words, if two data marts have the same dimension, such as customer, these two customer dimensions must share a set of common attribute fields that are drawn from the same domains. The descriptive attributes within these shared dimensions must be defined and populated in a consistent way. Allowing two customer dimensions in different business process subject areas to drift apart means that the two subject areas cannot be used together. Ever.
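Drilling across can be sketched in a few lines (hypothetical data and keys): two business process fact tables can be combined only because both reference the same conformed customer key.

```python
# Hypothetical sketch of drilling across two business processes. Both fact
# tables reference the SAME conformed customer dimension, so each can be
# aggregated separately and the answer sets merged on the shared key.

from collections import defaultdict

orders_facts  = [(101, 500.0), (102, 250.0), (101, 100.0)]   # (customer_key, order_amount)
support_facts = [(101, 3), (103, 1)]                          # (customer_key, case_count)

def aggregate(facts):
    totals = defaultdict(float)
    for customer_key, measure in facts:
        totals[customer_key] += measure
    return totals

orders  = aggregate(orders_facts)
support = aggregate(support_facts)

# Drill across: a full outer merge on the conformed customer key.
report = {k: (orders.get(k, 0.0), support.get(k, 0.0))
          for k in orders.keys() | support.keys()}
print(report[101])   # (600.0, 3.0)
```

If the two customer dimensions had drifted apart, the final merge step would be meaningless, which is exactly the failure mode the article warns against.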

The product dimension has one record for each one of your products or services. As the central data warehouse team, you must be the judge as to whether your individual divisions have enough in common to warrant building a corporate product dimension. In a tightly integrated business, such as the multiple geographic divisions of a big retailer, it seems obvious that a single master product dimension is called for, even if there are regional variations in the products sold. In such a case, you probably already have a production group that works constantly to define this product master and download its descriptions and price points to the cash registers of the individual stores. The central data warehouse team can easily work with this group to create a proper product dimension for the enterprise. In a financial services organization, such as a bank or insurance company, there is a lot of interest in a core or super-type view of the business that groups all of the products and services into a single hierarchical framework. But there will be lots of special descriptive attributes that only make sense for a single division, such as mortgage loans versus credit card accounts. In this case, the best approach is to create a core product dimension containing only common attributes, and a set of custom or sub-type product dimensions that are different for each division. This heterogeneous product design is discussed in article 9.17, Hot-Swappable Dimensions. Finally, if your enterprise is so loosely integrated that your products have almost nothing in common, and if your management has never bothered to define a set of hierarchies that group all of the different divisional products into a common framework, then you may not need a corporate product dimension. Just remember, if you don't create a conformed product dimension, you will not be able to combine the separate divisional data marts together.
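The core/sub-type design above can be sketched as follows (hypothetical tables and attributes, assuming a financial services setting): one core dimension holds the attributes common to every division, while each division keeps a sub-type table of its own descriptors joined on the same product key.

```python
# Hypothetical sketch of the core/sub-type ("heterogeneous") product design.
# The core dimension carries only attributes shared across divisions; each
# division's idiosyncratic attributes live in its own sub-type table.

core_product = {
    1: {"product_key": 1, "product_name": "30-Year Fixed Mortgage", "line": "Loans"},
    2: {"product_key": 2, "product_name": "Platinum Card",          "line": "Cards"},
}

# Division-specific sub-type dimensions, keyed by the same product key.
mortgage_subtype = {1: {"term_years": 30, "rate_type": "fixed"}}
card_subtype     = {2: {"annual_fee": 95, "rewards_tier": "platinum"}}

def describe(product_key, subtype):
    """Combine core attributes with one division's sub-type attributes."""
    return {**core_product[product_key], **subtype.get(product_key, {})}

print(describe(1, mortgage_subtype)["rate_type"])   # fixed
print(describe(2, card_subtype)["annual_fee"])      # 95
```

Cross-divisional reporting uses only the core table; divisional reporting joins in the matching sub-type table.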

The geography dimension has one record for each one of your districts, regions, or zones. If your individual divisions have incompatible sales geographies, then the separate division data marts can only be used together at the highest common geographical aggregate. If you are lucky, this highest geographical aggregate will be something useful to all the data marts, such as state.

The date dimension has one record for each calendar time period. Hopefully all of your divisions operate on the same calendar and report on the same fiscal periods. If at all possible, all reporting should be done either at an individual daily grain or at the fiscal period grain, such as month. In this case, days always roll up to the fiscal periods, and the fiscal periods roll up to years. It becomes a monumental headache if the separate divisions have incompatible reporting calendars. Fortunately, the central data warehouse team has a strong ally in the corporate financial reporting group who hopefully is in the process of standardizing the reporting calendars of the divisions. Separate databases denominated in weeks and months should be avoided at all costs because these time dimensions cannot usefully be conformed, and the week databases will always be isolated from the month databases.
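A date dimension that rolls days up to fiscal periods can be sketched as follows (a minimal illustration that assumes fiscal months equal calendar months; real fiscal calendars such as 4-4-5 would need their own mapping):

```python
# Minimal sketch of a date dimension with day -> fiscal period -> year
# rollups. The fiscal calendar is assumed to equal the calendar month,
# which is a simplification.

from datetime import date, timedelta

def build_date_dim(start, end):
    rows = []
    d = start
    while d <= end:
        rows.append({
            "date_key": int(d.strftime("%Y%m%d")),  # e.g. 20240131
            "full_date": d,
            "fiscal_period": d.strftime("%Y-%m"),   # assumption: fiscal month = calendar month
            "fiscal_year": d.year,
        })
        d += timedelta(days=1)
    return rows

dim = build_date_dim(date(2024, 1, 30), date(2024, 2, 2))
print([r["fiscal_period"] for r in dim])   # ['2024-01', '2024-01', '2024-02', '2024-02']
```

Because every daily row carries its fiscal period explicitly, daily facts always aggregate cleanly to periods and years, which is the conformance the article calls for.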

Providing Cross-Divisional Applications

The central data warehouse team is in a unique position of being able to provide cross-divisional applications. A value chain of distributed business process subject areas can be assembled from each of the separate divisions. In article 6.2, Drilling Down, Up, and Across, I showed six subject areas for a clothing store retailer. These subject areas may have been built at different times, but they share a number of common dimensions. From the previous discussion, it should be clear that the central team should define and enforce the use of these dimensions.

With the proper tools, it is fairly easy to build cross-divisional applications in these environments that are capable of drilling across. I discussed drilling across in some detail in article 13.25, Features for Query Tools.

The central team's responsibility is to provide a reporting tool that allows drilling across and to enforce the use of common dimensions.

Defining a Consistent Data Warehouse Security Architecture

The data warehouse team must play a very proactive role in understanding and defining security. The data warehouse team must include a new member: the security architect.

The security architect for the data warehouse should:

  • Define and enforce a data warehouse security plan for the enterprise and all of the individual subject areas.
  • Define consistent user privilege profiles.
  • Identify all possible access points in the corporate network, including every modem and every web interface.
  • Implement a single logon protocol so that a user is authenticated once for the whole network.
  • Track and analyze all attempts to override security or gain unauthorized access to the warehouse.
  • Implement a link encryption scheme, or its equivalent, for remote users.
  • Implement a remote user authentication scheme, such as a biometric reader at each PC, that is more reliable than typed user passwords.
  • Educate the divisional teams about security issues and consistent administration of the subject areas.

3.5 Avoid Isolating DW and BI Teams

Margy Ross, Design Tip #64, Feb 8, 2005

Several people have asked a similar question recently. “Should the DW or BI team gather requirements from the business?” Honestly, this question makes the hair start to stand up on the back of my neck. I'm concerned that too many organizations have overly compartmentalized their data warehouse and business intelligence teams.

Of course, some of this division is natural, especially when the resources allocated to DW/BI grow as the environment expands, creating obvious span-of-control issues. Also, separation of labor allows for specialization. View the overall DW/BI environment as analogous to a commercial restaurant: some team members are highly skilled in kitchen food preparation, while others are extremely attentive to the needs of the restaurant patrons, ensuring their return for a subsequent visit. There are likely few waiters who should suddenly don the chef's garb, and vice versa.

Despite the distribution of responsibilities, the kitchen and dining rooms of a restaurant are tightly entwined. Neither can be successful on its own. The best chefs need a well-trained, well-oiled front room machine; the most attractive dining room requires depth and quality from the kitchen. Only the complete package can deliver consistent, pleasurable dining experiences (and sustainability as a restaurant). That's why the chef and wait staff often huddle to educate and compare notes before a meal rush.

In the world of DW/BI, we've observed that some teams take a more isolationist approach. Matters are further complicated by the complexities and realities of organizational culture and politics. There may be a kitchen and dining area, but there's no swinging door between the two. It's like there's a transom (above eye level) where orders and plates are flung back and forth, but the two teams of specialists aren't communicating or cooperating. In this scenario, you end up with data models that can't reasonably be populated; or data models that don't address the diners' needs and/or leverage their tools; or diners' tools that are overtaxed or slow performing because they're repeatedly doing the work that could have been done once in the kitchen and shared throughout the organization. In the worst case, the wall becomes so impenetrable that the BI dining room substitutes a different kitchen (or creates their own) to source meals.

The data warehouse should be the foundation for effective business intelligence. Too many organizations have focused on one without the other. Sure, you can create a data warehouse without concern for business intelligence, and vice versa, but neither situation is sustainable for long. Isolationism is not a healthy approach for building and supporting the DW/BI environment. Even if you don't report into the same management structure, collaboration and communication are critical.

3.6 Better Business Skills for BI and Data Warehouse Professionals

Warren Thornthwaite, Intelligent Enterprise, May 11, 2008

Anyone who takes even a cursory glance at the common DW/BI failure points would spot the pattern: The hardest parts for most teams are understanding business issues and handling cross-organizational interactions. The challenges will become even more common as the DW/BI system evolves into a standard component of the IT environment, thus weakening the connection to the business.

Historically, DW/BI projects were driven by visionary people who were conversant in both business and technology. They understood the potential of better business insight and were motivated to help the DW/BI team deliver on it. As DW/BI systems have become commonplace, teams are now often made up of more technically oriented people. This change is not bad in and of itself, but it does mean that DW/BI teams will need to build up skills around understanding business requirements and strengthening relationships with business users.

Let's be clear here: Not everyone on the team needs to get an MBA, but everyone on the team should develop a basic understanding of how their organization works, how to work well with other people, and how to communicate more effectively, both in writing and in business meetings and presentations. This article presents advice on improving in these areas, along with valuable resources for professional development.

Building Business Understanding

To better understand business, the first place to start is with documents from your own organization. Annual reports, strategic plans, marketing plans, and internal vision documents all provide solid insight into your business and its challenges and opportunities. However, without a basic business understanding, you may not get full value from these documents. Fortunately, there are many readily accessible resources to gain this understanding. Start with a book or two. There are at least a dozen good books available that attempt to boil a two-year MBA program down into a short, easy read. Here's a book that offers a good overview of the basics of business:

The Ten-Day MBA: A Step-By-Step Guide to Mastering the Skills Taught in America's Top Business Schools, 4th edition, by Steven A. Silbiger (Collins, 2012)

Once your senior management team has selected its top priorities for the DW/BI system and you want to learn more about them, seek out books that go into detail on those areas, whether it's marketing, sales, finance, promotions, manufacturing, logistics, or other functional areas. At this level, it's probably more valuable to take a course in the particular subject area rather than read a book. A good instructor can bring experience and perspective to the topic, drawing from multiple resources and adding exercises, examples, and anecdotes. You also get the benefit of interaction with classmates. If you can't find relevant courses at local universities, colleges, or community colleges, try looking online. Many major institutions offer online MBA courses. Search the internet and you will find plenty of options.

Building Interpersonal Skills

Some would argue that interpersonal skills are even more important than business acumen. Right from the start, someone on the DW/BI team must be able to persuasively articulate the vision and value of the system if you are to bring the project into existence or realign an existing project. Before any software is installed, someone on the team must ask questions about goals and elicit honest, useful answers without frightening or alienating the business users. Fortunately, these, too, are skills that can be learned. To master the basics of interpersonal and relationships skills, try one of these classics:

Crucial Conversations: Tools for Talking When Stakes are High, 2nd edition, by Kerry Patterson, Joseph Grenny, Ron McMillan, Al Switzler (McGraw-Hill, 2011)

How to Win Friends and Influence People by Dale Carnegie (Simon and Schuster, 1936, revised 1998)

The 7 Habits of Highly Effective People by Stephen R. Covey (Simon and Schuster, 2013)

You will need to translate the advice offered in these books to your situation, but the core principles and recommendations are enduring. Basic principles such as having a sincere interest in others, listening to what they say, and looking for a win-win solution are the foundation of interpersonal success. Employ these principles with the goal of building a positive long term relationship with your business users—a relationship based on openness and trust. Use the techniques described in these books in a sincere fashion; if your intent is to manipulate or deceive, your efforts will backfire in the long run.

Building Public Speaking Skills

Every successful leader must be able to effectively communicate to groups. Public speaking and presentations involve two skill sets: the private act of preparing the content itself and the public act of delivering it to the audience. Both skill sets are addressed at length in these two books:

Presentation Zen: Simple Ideas on Presentation Design and Delivery by Garr Reynolds (New Riders Press, 2011)

The Quick and Easy Way to Effective Speaking by Dale Carnegie (Simon and Schuster, 1990; dated but still useful)

Getting better at speaking and presenting takes practice, but practicing in the work environment can be difficult and intimidating. It may help to take a class or find another venue for practice. One excellent, low cost option for developing public speaking skills is Toastmasters International (http://www.toastmasters.org/). Toastmasters is a nonprofit organization with thousands of local chapters across the globe. Toastmasters clubs typically have 20 to 30 members who meet two to four times per month for one or two hours. Members learn by speaking to the group and working with others in a supportive environment. The Toastmasters site can help you find clubs near you. There are also numerous commercial organizations that provide professional training in public speaking.

Building Written Communication Skills

Written communications are another pillar of the communications skill set. Much of how we interact with others is through the written word. Emails, proposals, requirements documents, and related documentation actually define the DW/BI system to 95 percent of your business user community. Poor or sloppy writing can distract the reader and undermine the value and credibility of the message. Try one or more of these bibles on good writing:

The Elements of Style by William Strunk Jr. and E.B. White (Allyn and Bacon, 2011)

Keys to Great Writing by Stephen Wilbers (Writers Digest Books, 2007)

On Writing Well: The Classic Guide to Writing Nonfiction by William Zinsser (Collins, 2006)

In addition to these books, there are hundreds of web sites that offer writing tips and style guides. Take a look at Purdue University's Online Writing Lab (http://owl.english.purdue.edu/).

Most of what you find on the internet is a starting point at best. The web rarely matches the depth of content available in a book or classroom, but it's an easy place to start.

Practice, Practice, Practice

The internet and books are a good starting point, but they are no substitutes for practice. These are skills like any other; the more you work on them, the more you will improve. Every time you communicate with another person, you have an opportunity to practice your interpersonal skills. Every email, document, or even comments in your code present a chance to work on better writing. Opportunities to practice public speaking are all around you, but they need to be cultivated a bit. Ask for time in your team's weekly meeting to present an educational segment on an interesting problem you've been dealing with and how you solved it. Offer to teach one of the DW/BI system ad hoc tool classes. In fact, if you take it seriously—by preparing well, creating clear graphics, and practicing—even a design review is an opportunity to practice your presentation skills.

Ask your manager or human resources department for additional recommendations on communications resources. While you're at it, have them include the whole area of communications in your performance appraisal. That way, you get credit for doing what you need to do to be successful in your project. You also get a little external motivation.

If you are excited by the prospect of improving your business acumen, interpersonal skills, speaking talent, and writing abilities, then you are halfway there! Your enthusiasm and motivation will make it a rewarding endeavor.

3.7 Risky Project Resources Are Risky Business

Margy Ross, Design Tip #173, Mar 9, 2015

Over the years, we've worked with countless exemplary DW/BI project team members: smart, skilled, dedicated, and motivated, coupled with a healthy dose of mutual trust, respect, and camaraderie with their teammates. Teams with members who possess these characteristics tend to fire on all cylinders, with the resulting whole often greater than the sum of the parts. But we've also run into risky project resources; in addition to being individual non-contributors, they can undermine the effectiveness of the entire DW/BI team. Model team members often become short-timers if the team is stacked with unproductive non-performers. We hope your team doesn't include resources that resemble the following profiles:

  • Obstructionist debaters are perpetual naysayers who find fault with everything and get more satisfaction from the process of debating than the process of delivering.
  • Heat seekers are always anxious to try the latest, greatest technical gadgets and gizmos regardless of whether they align with the DW/BI project's objectives.
  • Cookie cutters continue to do exactly what's worked for them in the past, regardless of their latest assignment's nuances.
  • Weed dwellers lose sight of the forest for the trees, focusing exclusively on the nitty-gritty details without regard to the bigger picture.
  • Perpetual students and researchers want to read, read, and then read some more, but are disinclined to ever take action because there's always more to learn.
  • Independent spirits march to their own drummer without regard to rules, standards, or accepted best practices.
  • Honesty dodgers and problem hiders are always nodding “yes” and saying “no problem,” even when serious issues are lurking just around the corner.
  • Dysfunctional incompetents and mental retirees are checked out and unable to perform.
  • Self-declared “know it all” experts don't need to listen because they already have all the answers—just ask them!
  • Threatened worriers are so paralyzed with fear about what might happen that they respond by doing nothing at all.

Of course, even with superstar teammates, the right team leadership is also necessary. Hopefully your DW/BI project/program manager fits the following bill:

  • Establishes a partnership with the business, including joint ownership for the DW/BI project/program, in part because they're respected by the business as being user-oriented rather than technology-focused.
  • Demonstrates excellent interpersonal and organizational skills since the DW/BI project/program is a political and cultural animal.
  • Recruits and retains resources with the talent to deliver, gets them operating cohesively from a common playbook, and understands that adding more mediocre players won't increase the team's chances of winning. Conversely, they also spot individuals who are slowing down the effort and proactively counsel them (or at least minimize the risk of project derailment).
  • Listens keenly, plus communicates effectively and honestly, setting appropriate expectations and having the courage to say “no” when necessary.
  • Optimally possesses some DW/BI domain expertise, in addition to strong project management skills. At a minimum, they're staying one chapter ahead of the project team in The Data Warehouse Lifecycle Toolkit …
  • Understands that DW/BI success is directly tied to business acceptance. Period.

Inexperienced, ineffective, or indecisive DW/BI project managers who don't demonstrate these characteristics are equally risky project resources.

3.8 Implementation Analysis Paralysis

Bob Becker, Design Tip #66, Apr 11, 2005

Many data warehouse teams lean heavily toward the doing side. They leap into implementation activities without spending enough time and energy to develop their data models, identify thorough business rules, or plan their ETL data staging processes. As a result, they charge full speed ahead and end up reworking their processes, delivering bad or incomplete data, and generally causing themselves difficulty.

Other project teams have the opposite challenge. These teams are committed to doing their homework in all the critical areas. They are focused on data quality, consistency, completeness, and stewardship. However, these project teams sometimes bog down on issues that should have been resolved long ago. Of course, this impasse occurs at the worst time—the promised implementation dates are rapidly approaching and design decisions that should be well into implementation remain unresolved.

Naturally, the outstanding issues involve the most difficult choices and the project team disagrees on the best solutions. The easy issues have already been resolved and the solutions for the more difficult issues don't come as easily. Despite copious amounts of time spent in research, data profiling, design meetings, and informal discussions, nothing seems to move the team closer to a decision on the best approach. The project sits at a crossroads unable to move forward. By this time, fear has usually taken hold of the project team. The pressure is on.

One helpful approach is the use of an arbitrator, a trusted individual from outside the project team, to help move the team ahead. The outstanding issues are identified and meetings scheduled with the arbitrator and interested stakeholders. All participants must agree that a final decision will be made during these sessions. The arbitrator should establish a time box to limit discussion on each issue. Discuss the pros and cons of each approach one last time; the arbitrator makes the ultimate decision if the team can't reach consensus.

Another approach is to defer the unresolved issues until a future implementation, after further research and discussion have identified an appropriate solution. The downsides to this approach are that the business requirements may not allow the issues to be deferred, and postponing resolution may simply delay the inevitable without any significant gain.

There is a delicate balance between planning and doing in the data warehouse world. The goal is to identify reasonable solutions, not necessarily perfect solutions, so the team can transition from planning to implementation. There may still be more to learn, but implementation is often more effective than any amount of talking and planning at revealing the weak spots in a plan so they can be reinforced. In fact, for many of the hard choices, much of the information needed to make good choices can only be gained through trial and error.

Clearly, we are not advocating a casual, ad hoc approach to implementing the data warehouse. But we recognize that sometimes you must be pragmatic and move forward with less than ideal solutions that may need to be revisited to achieve your overall goals.

3.9 Contain DW/BI Scope Creep and Avoid Scope Theft

Bob Becker, Design Tip #154, Apr 1, 2013

Keeping tight control over the scope of your data warehouse/business intelligence (DW/BI) program is an important ingredient for success. Surprisingly, in some organizations it's equally important to ensure that the program doesn't suffer the theft of its scope after an otherwise good plan has been developed.

It's nearly impossible to tackle everything at once in a DW/BI program. The DW/BI team should identify subsets of effort based on the organization's business processes (see article 4.7, Identifying Business Processes), to phase the overall design, development, and deployment effort. Each phase or iteration should be meaningful and manageable. That is, the scope of effort should be large enough to result in a deliverable that provides meaningful business value, yet the scope should be small enough that the size of the team, the amount of data involved, and the communications requirements are “reasonable,” especially given the resources allocated. It's important to avoid a large galactic scope that requires an army of resources and years of effort.

Once the scope for the project is established, there will inevitably be pressures to increase the scope. The most disruptive kind of scope creep is adding a new data source. Such a small incremental change in scope seems innocuous enough: “We are just adding a new cost element” or “We are just adding a weather source to the sales data.” However, these so-called small changes add up over time. Often the DW/BI project team is at fault. Most DW/BI teams are very focused on serving the requirements of their business partners and sponsors. But you need to be careful in your enthusiasm to serve not to succumb to scope creep that negatively impacts your ability to meet expectations. While it's important to be flexible, it's necessary to say “no” to maintain scope, especially after the project is underway.

Scope creep also results from overzealous business partners and sponsors. Once they've thrown their support behind the DW/BI initiative, they're eager for quick progress and immediate results. They clamor for a greater scope (more business processes, more history, complex business rules, and so on) delivered more quickly than originally agreed upon. Clearly, the business's enthusiasm and support is positive, but so is sticking to the game plan and delivering the meaningful and manageable scope. The main business sponsor in a DW/BI project needs to be a sophisticated observer of the technical development, as well as a good manager.

A strong program/project manager is critical for successfully managing scope creep. The program manager's job will be made easier with a supportive business sponsor, a solid IT/business partnership, and a project team committed to meeting expectations. Frequent frank discussions between the program manager and business sponsor regarding the scope, challenges, progress, timing, and expectations will ensure everyone remains on the same page.

A second concern regarding project scope is scope theft. Scope theft occurs when proponents of other projects/programs within the organization try to attach their agenda to the DW/BI program. Often these efforts support important enterprise needs; however, when they become attached to the DW/BI program, it's usually the DW/BI program that suffers. These other efforts often lack business sponsorship, payback, and most importantly, funding. Supporters of these initiatives want to attach their agendas to the DW/BI program to leverage (i.e., steal) the momentum, senior management visibility, business sponsorship, staffing, and/or funding of the DW/BI initiative. These efforts are often repackaged as prerequisite, companion, and/or complementary efforts to the DW/BI initiative; suddenly the agreed scope of the DW/BI initiative has been usurped to focus on other initiatives.

There are a wide variety of enterprise initiatives that might potentially attach themselves to a DW/BI program. Some common examples include:

  • Data quality initiatives focused on the replacement of aging operational systems
  • Efforts to solve ongoing operational data integration challenges resulting from ineffective source system integration
  • Master data management efforts
  • IT projects focused on infrastructure re-platforming
  • Implementation of a new role-based security infrastructure
  • First deployment of a publish-subscribe style service oriented architecture (SOA)
  • Utilization of off-premises cloud-based storage for the DW/BI initiative

There is little argument that these initiatives may be in the best interest of the enterprise (as is the DW/BI program); it's hard to dispute them conceptually. But, be wary. The reason the proponents of these initiatives want to leverage the DW/BI program is because their initiatives are complex, challenging, and expensive—they have likely been unable to secure the funding and sponsorship required to launch as a separate viable initiative. Since the DW/BI program has strong business support and funding, these competing initiatives look to cloak themselves as prerequisites or architectural requirements for the DW/BI program to move under its program umbrella.

The key question is whether these other initiatives should become part and parcel of the DW/BI program. Typically not; it is likely best for both initiatives if they stand on their own merits. Otherwise the DW/BI program risks becoming a more IT-driven, galactic effort that delays the planned DW/BI capabilities and fails to meet the business's expectations. These other efforts are almost always clearly distinct from the DW/BI program and should be scoped, funded, sponsored, and governed independently. The DW/BI program manager and business sponsor need to be politically astute and cautious, and must fight to ensure the DW/BI program remains independent. The DW/BI business sponsor must typically work through these challenges given that the organizational politics are often beyond the DW/BI program manager's control.

3.10 Are IT Procedures Beneficial to DW/BI Projects?

Joy Mundy, Design Tip #129, Nov 3, 2010

Back when the world was young and data warehousing was new, projects were a lot more fun. Kimball Group consultants (before there was a Kimball Group) were usually called in by the business users, and we'd help them design and build a system largely outside the aegis of corporate IT. In the intervening years—decades!—data warehousing has become a mainstream component of most IT organizations. For the most part, this is a good thing: The rigor that formal IT management brings to the DW makes our systems more reliable, maintainable, and performant. No one likes the idea of a BI server sitting under a business user's desk.

However, IT infrastructure is not always helpful to the DW/BI project. Sometimes it gets in the way, or is actively obstructionist. No doubt every little sub-specialty of information technology clamors that it is somehow different or special, but in the case of DW/BI, it's really true.

Specifications

The classic waterfall methodology subscribed to by many IT organizations has you develop a painfully detailed specification that's formalized, agreed to by the business and IT, and then turned over to the developers for implementation. Changes to the specification are tightly controlled, and are the exception rather than the rule.

If you try to apply a waterfall methodology to a DW/BI project, the best you'll end up with is a reporting system. The only things you can specify in sufficient detail are standard reports, so that's what you get: a system to deliver those standard reports. Many specs include a demand to support ad hoc analysis, and sometimes include examples of specific analyses the users would like to be able to do. But within the waterfall methodology it's impossible to clearly specify the boundaries and requirements of ad hoc analyses. So the project team “meets” this requirement by plopping an ad hoc query tool in front of the database.

It's really frustrating for the business users who are asked to write or approve the specification. They know the spec doesn't capture the richness of what they want, but they don't know how to communicate their needs to IT. I was recently in a meeting with a disgruntled business user who glared at the DW manager and said “Just provide me with a system that captures all the relationships in the data.” If only that one sentence were sufficient for the poor guy to design a data warehouse.

The data model takes a much more central role in the system design for a data warehouse than for a transaction system. As Bob argued in article 4.12, Using the Dimensional Model to Validate Business Requirements, the data model—developed collaboratively with the business users—becomes the core of the system specification. If the business users agree that any realistic analysis on the subject area can be met through the data in the model, and IT agrees the data model can be populated, the two sides can shake hands over the model. This is quite different from the standard waterfall approach, where the data modeler would take the spec into his cubicle and emerge several weeks later with the fully formed data model.

Naming Conventions

Another place I've seen formal IT processes get in the way is in naming the entities in the data warehouse. Of course, this is much less important an issue than the specifications, but I find myself deeply annoyed by naming dogmatisms. Because the DW database is designed for ad hoc use, business users are going to see the table and column names. They are displayed as report titles and column headers, so extremely long column names are problematic.

That said, naming conventions, the use of class words, and minimizing the use of abbreviations are all good ideas. But like all good ideas, temper them with reason. Taken to an extreme, you can end up with absurdly long column names like (and I'm not making this up) CurrentWorldwideTimezoneAreaIdentifier.

Although they give us something to laugh about, the real problem with absurdly long names is that the users won't tolerate them in their reports. They'll always be changing them in the report display, which means they'll use inconsistent names and introduce a new (and stupid) reason for the “multiple versions of the truth” problem. Please let your naming conventions be tempered by common sense.
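To make the common-sense tempering concrete, here is a minimal sketch of a naming check a team might run against its physical model. The length threshold and column names are purely illustrative assumptions, not a Kimball Group standard:

```python
# Hypothetical lint: flag physical column names that will be painful as
# ad hoc report headers. The 30-character limit is an illustrative
# assumption; pick a threshold your BI tool displays comfortably.
MAX_DISPLAY_LEN = 30

def flag_unfriendly_names(columns, max_len=MAX_DISPLAY_LEN):
    """Return column names longer than the display-friendly limit."""
    return [c for c in columns if len(c) > max_len]

cols = ["CustomerName", "OrderDate",
        "CurrentWorldwideTimezoneAreaIdentifier"]
print(flag_unfriendly_names(cols))
# → ['CurrentWorldwideTimezoneAreaIdentifier']
```

A check like this is cheap to run at design review time, before users start renaming columns in their reports and reintroducing multiple versions of the truth.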

Dogma

Unless it's Kimball Group dogma. I have heard all sorts of rules instituted by “someone at corporate IT.” These include:

  • All queries will be served from stored procedures (clearly someone doesn't understand what “ad hoc” means).
  • All ETL will be done in stored procedures (All? Why?).
  • All database constraints will be declared and enforced at all times (most of the time—sure; but all the time?).
  • All tables will be fully normalized (no comment).
  • There will be no transformations to data in the DW (I don't have any response to this one other than a puzzled expression).

Don't get us wrong… As easy as it is to mock some practices, we believe in professionally developed and managed DW/BI systems, which usually means IT. The advantages are huge:

  • Central skills in data modeling, ETL architecture, development, and operations are greatly leveraged from one project to the next.
  • Professional development practices, including code check-ins, code reviews, ongoing regression testing, automated ETL, and ongoing management, improve quality and reliability.
  • Solid bug tracking, release management techniques, and deployment procedures mean an IT-managed DW/BI system can more smoothly roll out improvements to a system already in production.
  • Security and compliance are handled consistently across the enterprise.

However, if you're on a project that's suffering under an IT mandate that makes no sense to you, don't be afraid to push back.

Justification and Sponsorship

With the project team roles and responsibilities defined, we shift our focus to the all important business stakeholders of the data warehouse program. We'll describe the habits that make or break a business sponsor, challenge the conventional view of total cost of ownership calculations, then close this section with guidance to boost your skills for interacting with the business.

3.11 Habits of Effective Sponsors

Margy Ross, Intelligent Enterprise, Sep 1, 2003

As data warehouse designers, you know how important a business executive sponsor is to your initiative. After focusing on data warehousing for the past two decades, I'm convinced that strong business sponsorship is the leading make-or-break indicator of data warehouse success. Having the right sponsor can overcome a multitude of shortcomings elsewhere in the project. On the other hand, sponsor turnover is one of the most common causes of data warehouse stagnation; unsponsored data warehouses simply don't survive.

In this article, I explore the characteristics that distinguish highly effective data warehouse sponsors. Sponsors who take their responsibilities seriously seem to naturally crave guidance about doing their job well. They're just as interested in data warehouse success as you are. Of course, the more effective the business sponsor is, the more fun you'll have associating with the data warehouse initiative. Remember that you're both on the same team. So after reading this article, route it to your existing or potential sponsors; this one's for them.

Setting Up for Success

You've volunteered (or perhaps been mildly coerced) to serve as the business sponsor for the data warehouse. You've been successful in most previous ventures, but this is new and different. Then again, it can't be that difficult, can it?

No, it's not difficult, if you're committed. After working with hundreds of data warehouse sponsors, I've seen the same patterns of behavior occur repeatedly. I encourage you to learn from others' mistakes and keep these habits in mind as you undertake your new responsibilities.

As a data warehouse sponsor, it's important that you visualize and verbalize the potential effects of improved information on the organization's key initiatives. If you have the authority, but don't truly believe, then you should step aside as the business sponsor because you'll inevitably struggle to be effective. Data warehouse business sponsors need to be passionate about the cause and convey their vision to the organization. If this doesn't sound like you, then you and the data warehouse team need to find an alternate sponsor before plowing full steam ahead; otherwise, you'll be doing a disservice to yourself, the warehouse team, and the entire organization.

Resist the Path of Least Resistance

A common approach to managing the enterprise's data assets is avoidance. Data is managed departmentally rather than across the organization. Each department or organizational function builds its own private data repository, but there's no overall enterprise view. It's initially appealing because every department gets exactly what it wants without contending with organizational consensus. Numerous existing data warehouses have been constructed on this basis.

However, because each department uses slightly different definitions and interpretations, no one's data ties to anyone else's, and the result is anarchy. You lose the ability to look across the enterprise, missing opportunities for cross-functional sharing. Likewise, this mayhem produces significant organizational waste. Partially redundant databases translate into partially redundant data development, administration, and storage expenses. Even more wasteful are the resources devoted to understanding and reconciling this inconsistently defined data.

Of course, you can bring order to the chaos, but you need the political clout, financial means, and inclination to challenge the status quo. Rather than letting everyone build independent, department-centric databases or dashboards, corporate information assets need to be proactively managed. Many CIOs consider this their fiduciary responsibility. As the business sponsor, you'll need to work closely with the CIO to encourage key stakeholders to watch out for the good of the greater whole. The enterprise will need to commit to a common foundation, relying on shared reference information (known as dimensions to our regular readers) as the foundation for integration. No one said it would be a cakewalk.

Establishing order begins with a simple underlying premise: Key performance metrics are reused repeatedly across the organization. In a health insurance company, processing claims is a primary business activity, although one department may analyze the insured party's characteristics while another wants to study the health care professional involved. Despite distinct departmental focuses, claims data is a common link.

Rally Those around You

Data warehouse business sponsors need to influence others up, down, and across the organization. You're the internal data warehouse booster. Rallying the troops typically comes naturally to sponsors. You need to create enthusiasm within your organization, without overpromising or setting expectations that the data warehouse team can never fulfill. Don't be surprised if the project manager asks you to send out a friendly reminder conveying the importance of this initiative to key business representatives, encouraging their participation. Predictably, business folks more actively engage in data warehouse requirements or review meetings when it has been strongly suggested by their boss's boss. In the long run, getting business representatives personally involved during the design and development phases is crucial to their acceptance of the ultimate deliverable.

Rallying your peers is equally important, especially if you understand the ultimate costs associated with the easy route and commit to an enterprise approach. Obviously, you'll need your peers' backing, both financially and organizationally, to realize this vision. However, you may need to create awareness first. Perhaps your peers have already written big checks for new transaction processing systems, but don't yet understand that these operationally oriented systems won't address their analytic needs.

Business sponsors also need to build peer consensus on priorities. This consensus must be resilient enough that everyone involved sends consistent messages to the organization, both in words and actions. Wading into the political fray doesn't come naturally to everyone. You'll need both a keen awareness of your peers' needs and the ability to empathize, without alienating them or being condescending. Early sponsors of the data warehouse often recruit peers to become cosponsors. You and your cosponsors must agree on and be committed to common goals. Inevitably, you will feel organizational pressure to change course.

Patience Is a Virtue

Successful data warehouses are built incrementally. Business users won't get everything they want in the first phase, when establishing the core shared reference data is the priority. You need to be patient and ensure others follow suit. The data warehouse business sponsor should share the team's desire to balance meaningfulness and manageability when considering project iterations and time frames.

Analytic and reporting alternatives to the data warehouse probably already exist in the organization. Any incremental expenditure to enhance or expand these pre-existing point solutions should be evaluated against the overall goal. In the long run, many of these departmental solutions will be migrated or retired. Obviously, you need peer support to make this happen, especially when the near-term resolution to requests is “no” or “you're next in line.”

Serving as a business sponsor for the data warehouse is not a short term assignment, so plan to be involved for the long haul. Your visible participation helps encourage organizational acceptance and maintain momentum and enthusiasm.

Battles will erupt, usually over funding or resources. Business sponsors need to provide ongoing financial support for the foreseeable future. Data warehouses collapse without adequate funding. On the other hand, you can overspend. Although big budgets provide the team with short term euphoria, nothing is sustainable if the costs exceed the benefits. You must expect and ensure that the data warehouse team continues to deliver new incremental value to the organization in a timely and consistent manner.

Remain Focused on the Goal

Data warehouses are built to materially improve the organization's decision-making capabilities. As the data warehouse sponsor, you should continually remind all within earshot that business acceptance is the key success metric for the initiative. Everyone involved with the program, both on the business and IT side, needs to recite this pledge daily. That's especially true for those vulnerable to distraction by whiz-bang technology. Remember, nothing in your mission to improve the organization's decision making says anything about technology.

You must understand that building a data warehouse isn't a single project, but an ongoing program. Asking when the data warehouse will be finished is akin to asking, “When will we stop developing and enhancing our operational transaction processing systems?” The data warehouse environment will advance as an organization's core business processes, associated operational systems, and analytic processes mature.

As the business sponsor for the data warehouse, you ensure that the program is in continuous alignment with the business. As the business and its initiatives evolve, so must the vision for the data warehouse. The team can't afford to become complacent and rest on its laurels. The sponsor should ensure the program stays on a trajectory for excellence, without dictating or expecting compliance with unrealistic time frames. Your strategy should focus on those opportunities that translate into the biggest benefit for the overall enterprise.

The habits of highly effective data warehouse business sponsors don't require a degree in rocket science. They're common sense, largely variations on themes that let us be effective in other aspects of our professional and personal lives. So go forth and be an effective sponsor—the organization and data warehouse team are counting on you!

3.12 TCO Starts with the End User

Ralph Kimball, Intelligent Enterprise, May 13, 2003

The recent industry preoccupation with calculating the total cost of ownership (TCO) for a data warehouse focuses on minor issues while missing the big picture. Metaphors such as “not seeing the forest for the trees” or even “rearranging the deck chairs on the Titanic” come to mind.

Reading various white papers and listening to industry consultants would make you think that data warehouse TCO is dominated by the costs of hardware, software licenses and maintenance contracts, IT staff expenses, and services.

Some of the more detailed analyses break these categories down to reveal chronically unplanned costs such as hiring and firing, training, tuning, testing, and documentation. One large hardware vendor even bases its TCO advantages on reducing the number of DBAs. With all these categories as a foundation, the various vendors “prove” that their approach is better by claiming to show head-to-head comparisons. Observers like you and me are left to wonder how the conflicting claims could be so dramatically at odds. But all of this muddling misses the big picture. As one of my former colleagues used to say, there is a hippopotamus in the room but nobody is talking about it.

Bad Decisions Are Costs

When I think about data warehouse TCO, I start by asking: Why do we have a data warehouse, and who in the organization judges the cost and the return from a data warehouse? My answer is that the data warehouse publishes the data assets of the company to most effectively facilitate decision making. In other words, the data warehouse is an organization-wide resource, the cost and value of which you must judge by its effect on decision making in the broadest possible sense.

So who are the decision makers in an organization? For the most part, they're not the IT employees! The mainline decision makers of an organization are the business users of the data warehouse, whether they're executives, managers, knowledge workers, analysts, shop floor stewards, customer service representatives, secretaries, or clerks. All these users have a powerful, instinctive need to see data. With the computer revolution of the past half century, a profound cultural shift has taken place in the way businesses are run. Business end users are now quite certain that if they can see their data, they can run their businesses better.

So when I step back to the broader perspective of an entire organization trying to make better decisions with a data warehouse, the traditional IT-centric costs don't even make my top ten costs to be concerned about! The hippopotamus in this room is a set of problems that can destroy the value of a data warehouse. And yes, to avoid these problems entails some real costs.

Here's my list of the real sources of data warehouse costs that come before the traditional hardware, software, staff, and services costs that we've been so fixated on. The list begins with the most important in my opinion, but you can adjust the ranking to your own environment:

  • Data needed for decisions is unavailable
  • Lack of partnership between IT and business users
  • Lack of explicit user-focused cognitive and conceptual models
  • Delayed data
  • Unconformed dimensions
  • Unconformed facts
  • Insufficiently verbose data
  • Data in awkward formats
  • Sluggish, unresponsive delivery of data
  • Data locked in a report or dashboard
  • Prematurely aggregated data
  • Focus on data warehouse return on investment (ROI)
  • Creation of a corporate data model
  • A mandate to load all data into the warehouse

As you'll see, most of these problems are akin to snatching defeat from the jaws of victory. When these problems raise their heads, they can threaten the entire data warehouse initiative. If the data warehouse fails, the cost analysis is really ugly. Add the traditional IT-centric costs to the cost of running your business without your data.

In all of these types of problems, the potential cost is the failure of the data warehouse. This cost dominates and renders traditional cost analyses meaningless. There's no upper bound on the potential cost of not being able to make the right decision. Your goal is to replace this potentially unbounded cost with finite, knowable costs, and at the same time eliminate the risks of losing the data warehouse. Many of these finite costs are in the numeric range of the less important costs I listed at the start of this article.

A Closer Look at the Costs

Taking a constructive view, let's look at these sources of cost to the data warehouse, how much they affect the overall organization, and what we can do to reduce them.

  • Data needed for decisions is unavailable. This is the big one. Unavailable data means the data warehouse failed to inform decisions. We want to replace this unbounded and unknowable cost with the predictable costs of gathering business requirements from the users, studying what information users need when making decisions, regularly canvassing business decision makers to understand new requirements, and systematically trawling for new sources of data and new metrics that explain or predict events.
  • Lack of partnership between IT and business users. When IT and the business users don't have a good partnership, the users will be frustrated because they aren't being served well, and IT will blame the users for complaining, not being computer literate, and not reading the documentation. The result is a failed or underperforming data warehouse, with probably no clear consensus on how to fix it. Decisions will be missed because the system isn't usable. A good partnership means that IT staff live in the user environments, and that there's a flow of personnel across the business/IT organization boundary. I've often said that the best application support person is permanently conflicted as to whether the business or the technology is more appealing. These people (of which I am one) spend their entire careers moving back and forth across the business/IT boundary. The cost to address this problem is an explicit program of tours of duty for IT people to spend a year or longer working directly in the business user department. Business credibility for IT personnel is the “gold coin.”
  • Lack of explicit user-focused cognitive and conceptual models. IT application designers systematically make things too complicated, or assume that business users will be adept at wrangling data from one computer window into another, or assume that users even want to perform analysis. Business end users come in many flavors. A good IT applications delivery team will carefully profile the cognitive and computer sophistication of the users, and at the same time construct conceptual models of how the user performs a task and makes a decision. Then the team can choose or configure the delivery tools to be the best match. The cost of this approach is significant. Multiple tools may be needed. More custom user interfaces and canned reports may have to be built. In my experience, this focus on business users is rare.
  • Data needed for decisions is delayed. There's been a groundswell of interest recently in providing real-time data warehouse access. The tongue-in-cheek definition of real time is “any data delivery that is too fast for the current extract, transform, and load system.” The demand for real-time data warehousing appears in many market trading, customer support, credit approval, security authorization, medical diagnosis, and process management situations. The killer, of course, is a data delivery system that's too slow to support the decision that must be made in real time. The costs of real-time data delivery can be significant, and there's no single approach. I described one piece of this puzzle in article 11.44, Real-Time Partitions.
  • Unconformed dimensions. If a customer dimension (for instance) is unconformed, it means that two of your data sources have incompatible customer categorizations and labels. The result is that the two data sources can't be used together. Or, more insidiously, the data sources will look like they can be compared but the logic is wrong. The cost, once again, is a lost opportunity to be well informed about your customers, but there are huge unreported costs when managers waste time resolving the data incompatibilities and vent their anger at not having comprehensible data. The more desirable cost in this case is the cost of resolving the categorization and labeling differences up front when designing conformed dimensions, as I describe in article 5.4, Divide and Conquer.
  • Unconformed facts. Unconformed facts are related to unconformed dimensions. They arise when two numeric measures are similar, but cannot be logically combined in a calculation such as a ratio or a difference. For instance, two different revenue numbers may not be put into the same calculation because one is before tax adjustments and one is after. The cost to fix this problem can be combined with the cost of conforming dimensions and can be accomplished by the same people in the same meetings.
  • Insufficiently verbose data. Providing verbose dimensional descriptions is a basic responsibility of the data warehouse designer. Each attribute in a product or customer dimension is a separate entry point into the data because attributes are the dominant means of constraining and grouping data. The cost of making data more verbose usually comes from finding, cleaning, and merging auxiliary data sources into the database.
  • Data in awkward formats. There are several categories of poorly formatted data that defeat the business users, even when the data is present. By far, the worst offender is data presented to the business user in a normalized entity-relationship format. These complex normalized schemas are impossible for business users to understand, and they require custom schema-dependent programming to deliver queries and reports. A few vendors actually recommend normalized schemas for data warehouse delivery and then sell extremely expensive hardware solutions to IT that are powerful enough to overcome these inefficient schemas. What these vendors systematically avoid is an honest accounting of the application development costs and lost opportunity costs they have transferred to the business users.
  • Sluggish, unresponsive delivery of data. Business users have little tolerance for slow user interfaces. The only truly acceptable response time is instantaneous, and the data warehouse designer must always have this as a goal. Business users aren't likely to try ad hoc queries more than once if they take many minutes or hours to return a result. The use of a really fast decision support system is qualitatively different from a system that has to be used in batch job mode. Users of a fast system try far more alternatives and explore more directions than users of a slow system. Fixing a slow system is a multipronged challenge, but it starts with good database design and good software. Dimensional models are fast for querying and normalized models are slow, given the same hardware capacities. After addressing the database design and choice of software, the next relevant performance knobs are lots of real memory (RAM), proper tuning with aggregations and indexes, distributed parallel architectures, and CPU raw speed.
  • Data locked in a report or dashboard. Data that can't be transferred in tabular format with a single command to a spreadsheet is locked uselessly in an application. Choose applications that allow any visible data to be copied by the user to another tool, especially a spreadsheet.
  • Prematurely aggregated data. Business process subject areas that consist of aggregated, non-atomic data are dangerous because they anticipate a set of business questions and prohibit the business user from drilling down to needed detail. This was the fatal mistake of the executive information system movement in the late 1980s. The solution to this problem, of course, is not to use the outmoded definition of “data mart” but to base all business process subject areas on dimensional models using the most atomic data. Atomic data is the most naturally dimensional data and is the most able to withstand the ad hoc attack where business users pose unexpected and precise questions.
  • Focus on data warehouse ROI. It's fashionable to measure return-on-investment with highly analytic-sounding techniques, such as payback period, net present value, or internal rate of return. In my opinion, these miss the main point of evaluating the costs and eventually the value of a data warehouse. A data warehouse supports decisions. After a decision is made, give the data warehouse a portion of the credit, and then compare that retrospectively to the costs of the warehouse. My rule of thumb is to take 20 percent of the monetary value of a decision made and book that to the benefit of the data warehouse. This really drives the point that the only meaningful view of the warehouse is the ability to support business user decisions.
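The 20 percent rule of thumb above is simple arithmetic, which can be sketched as follows. All figures here are hypothetical, invented purely for illustration; the only element taken from the text is the 20 percent credit factor.

```python
# Hypothetical illustration of the 20 percent rule of thumb:
# credit the warehouse with 20% of the monetary value of each
# decision it supported, then compare that retrospectively to cost.

DECISION_CREDIT = 0.20  # fraction of decision value booked to the DW

def warehouse_benefit(decision_values):
    """Total benefit booked to the data warehouse."""
    return sum(DECISION_CREDIT * v for v in decision_values)

# Illustrative numbers only: three decisions the warehouse supported
decisions = [500_000, 1_200_000, 300_000]
annual_cost = 250_000

benefit = warehouse_benefit(decisions)
print(benefit)                # 400000.0
print(benefit > annual_cost)  # True: benefit exceeds warehouse cost
```

The point of the exercise is not precision; it is to force the evaluation to center on decisions supported rather than on hardware or software line items.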
  • Creation of a corporate data model. In most cases, an a priori effort to model an organization's data is a waste of time. All too often the model is an ideal expression of how data ought to be, and the thousands of entities created are never actually physically populated. It may be fun, and even mildly educational, but the corporate data model is a waste of time that delays the data warehouse. Now, a data model that describes an actual data source, warts and all, is probably a good thing.
  • Mandate to load all data into the warehouse. Finally, following a mandate to source “all data” in an organization is an excuse to avoid talking to the business users. While it's necessary to have a design perspective that understands the basic dimensionality and content of all your data sources, most IT shops will never be able to address more than a fraction of all their possible data sources. When the preliminary data audit is finished, it's time to go hang out with the business users and understand which of those data sources need to be published in the data warehouse first.

I hope that after reading about all these potential and real costs of the data warehouse you've almost forgotten about hardware, software, and services. Years ago at the Xerox Palo Alto Research Center (now just PARC), I was shocked when Alan Kay, inventor of the personal computer, said, “Hardware is tissue paper. You use it and throw it away.” That seemed disrespectful of all the tangible pieces of gear we had at the research center, but he was right. For the data warehouse, the only thing that matters is to effectively publish, in the most pleasing and fulfilling sense, the right data that supports our ability to make decisions.

Kimball Methodology

The final section in this chapter describes the Kimball Lifecycle approach, presents common mistakes made on DW/BI projects, and offers practical advice to reduce project risks.

3.13 Kimball Lifecycle in a Nutshell

Margy Ross, Design Tip #115, Aug 4, 2009

A student in a recent class asked me for an overview of the Kimball Lifecycle approach to share with her manager. Confident that we'd published an executive summary, I was happy to oblige. Much to my surprise, our only published Lifecycle overview was a chapter in a Toolkit book, so this Design Tip addresses the unexpected content void in our archives.

The Kimball Lifecycle approach has been around for decades. The concepts were originally conceived in the 1980s by members of the Kimball Group and several colleagues at Metaphor Computer Systems. When we first published the methodology in The Data Warehouse Lifecycle Toolkit (Wiley, 1998), it was referred to as the Business Dimensional Lifecycle because this name reinforced three fundamental concepts:

  • Focus on adding business value across the enterprise.
  • Dimensionally structure the data delivered to the business via reports and queries.
  • Iteratively develop the solution in manageable lifecycle increments rather than attempting a Big Bang deliverable.

Rewinding to the 1990s, our methodology was one of the few emphasizing this set of core principles, so the Business Dimensional Lifecycle name differentiated our approach from others in the industry. Fast forwarding to the late 2000s when we published the second edition of the Lifecycle Toolkit, we still absolutely believed in these concepts, but the industry had evolved. Our principles had become mainstream best practices touted by many, so we condensed the methodology's official name to simply the Kimball Lifecycle.

In spite of dramatic advances in technology and understanding during the last couple of decades, the basic constructs of the Kimball Lifecycle have remained strikingly constant. Our approach to designing, developing, and deploying DW/BI solutions is tried and true. It has been utilized by thousands of project teams in virtually every industry, application area, business function, and technology platform. The Kimball Lifecycle approach has proven to work again and again.

The Kimball Lifecycle approach is illustrated in Figure 3.1. Successful DW/BI implementations depend on the appropriate amalgamation of numerous tasks and components; it's not enough to have a perfect data model or best of breed technology. The Lifecycle diagram is the overall roadmap depicting the sequence of tasks required for effective design, development, and deployment.


Figure 3.1 The Kimball Lifecycle diagram.

Program/Project Planning and Management

The first box on the roadmap focuses on getting the program/project launched, including scoping, justification, and staffing. Throughout the Lifecycle, ongoing program and project management tasks keep activities on track.

Business Requirements

Eliciting business requirements is a key task in the Kimball Lifecycle because these findings drive most upstream and downstream decisions. Requirements are collected to determine the key factors impacting the business by focusing on what business users do today (or want to do in the future), rather than asking “what do you want in the data warehouse?” Major opportunities across the enterprise are identified, prioritized based on business value and feasibility, and then detailed requirements are gathered for the first iteration of the DW/BI system development. Three concurrent Lifecycle tracks follow the business requirements definition.

Technology Track

DW/BI environments mandate the integration of numerous technologies, data stores, and associated metadata. The technology track begins with system architecture design to establish a shopping list of needed capabilities, followed by the selection and installation of products satisfying those architectural needs.

Data Track

The data track begins with the design of a target dimensional model to address the business requirements, while considering the underlying data realities. The word Kimball is synonymous with dimensional modeling where data is divided into either measurement facts or descriptive dimensions. Dimensional models can be instantiated in relational databases, referred to as star schemas, or multidimensional databases, known as OLAP cubes. Regardless of the platform, dimensional models attempt to address two simultaneous goals: ease of use from the users' perspective and fast query performance. The enterprise data warehouse bus matrix is a key Kimball Lifecycle deliverable representing an organization's core business processes and associated common conformed dimensions; it's a data blueprint to ensure top-down enterprise integration with manageable bottom-up delivery by focusing on a single business process at a time. The bus matrix is tremendously important because it simultaneously serves as a technical guide, a management guide, and a forum for communication with executives. The bus matrix is illustrated and described in detail in articles 5.5, The Matrix, and 5.6, The Matrix: Revisited.
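To make the star schema idea concrete, here is a minimal sketch using Python's built-in sqlite3 module: one atomic fact table surrounded by descriptive dimension tables. All table names, column names, and data values are hypothetical, chosen only to illustrate the pattern of constraining on dimension attributes and aggregating facts.

```python
import sqlite3

# Minimal, hypothetical star schema: one atomic fact table joined to
# two descriptive dimension tables. Names are illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE sales_fact (           -- grain: one row per sale line
    date_key INTEGER REFERENCES date_dim,
    product_key INTEGER REFERENCES product_dim,
    sales_amount REAL);
INSERT INTO date_dim VALUES (1, 'Jan'), (2, 'Feb');
INSERT INTO product_dim VALUES (10, 'Widgets'), (20, 'Gadgets');
INSERT INTO sales_fact VALUES (1, 10, 100.0), (1, 20, 50.0), (2, 10, 75.0);
""")

# A typical dimensional query: constrain on dimension attributes,
# then aggregate the numeric facts.
rows = con.execute("""
    SELECT d.month, SUM(f.sales_amount)
    FROM sales_fact f
    JOIN date_dim d ON f.date_key = d.date_key
    JOIN product_dim p ON f.product_key = p.product_key
    WHERE p.category = 'Widgets'
    GROUP BY d.month ORDER BY d.month
""").fetchall()
print(rows)  # [('Feb', 75.0), ('Jan', 100.0)]
```

Because the fact table is stored at the most atomic grain, the same schema answers both this monthly rollup and far more precise ad hoc questions without restructuring.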

The dimensional model is converted into a physical design where performance tuning strategies are considered, then the extract, transform, and load (ETL) system design and development challenges are tackled. The Lifecycle describes 34 subsystems in the ETL process flow grouped into four major operations: extracting the data from the source, performing cleaning and conforming transformations, delivering the data to the presentation layer, and managing the back room ETL processes and environment.

Business Intelligence Track

While some project members are immersed in the technology and data, others focus on identifying and constructing a broad range of BI applications, including standardized reports, parameterized queries, dashboards, scorecards, analytic models, and data mining applications, along with the associated navigational interfaces.

Deployment, Maintenance, and Growth

The three Lifecycle tracks converge at deployment, bringing together the technology, data, and BI applications. The deployed iteration enters a maintenance phase, while growth is addressed by the arrow back to project planning for the next iteration of the DW/BI system. Remember that a DW/BI system is a long term program, not a one-off project!

Throughout the Kimball Lifecycle, there's a recurring theme acknowledging that DW/BI professionals must continuously straddle the business's requirements and the underlying realities of the source data, technology, and related resources. Project teams who focus exclusively on the requirements (or realities) in isolation will inevitably face significant delivery and/or business adoption risks.

Finally, we've said it before, and we'll surely repeat it again. Regardless of your organization's specific DW/BI objectives, we believe your overarching goal should be business acceptance of the DW/BI deliverables to support decision making. This target must remain in the bull's eye throughout the design, development, and deployment lifecycle of any DW/BI system.

3.14 Off the Bench

Margy Ross, Design Tip #49, Sep 15, 2003

Plenty of folks in the DW/BI industry seem to feel personally qualified to explain the Kimball approach, while further fueling debate about which approach for tackling DW/BI development is best. Unfortunately, they sometimes spread misunderstandings and continue to blur the issues. While we'd never pretend to be an expert on the corporate information factory (CIF), we do feel it's our responsibility to clarify our methods rather than watching from the sidelines.

When we wrote the first edition of The Data Warehouse Lifecycle Toolkit (Wiley, 1998), we referred to our approach as the Business Dimensional Lifecycle. In retrospect, we should have probably just called it the Kimball Approach as suggested by our publisher. We chose the Business Dimensional Lifecycle label instead because it reinforced our core tenets about successful data warehousing based on our collective experiences since the mid-1980s:

  1. First and foremost, you need to focus on the business. If you're not enabling better business decision making, then you shouldn't bother investing resources in data warehouses and business intelligence. Focusing on the business does NOT imply that we encourage the development of isolated data stores to address specific departmental business needs. You must have one eye on the business's requirements, while the other is focused on broader enterprise data integration and consistency issues.
  2. The analytic data should be delivered in dimensional models for ease-of-use and query performance. We recommend that the most atomic data be made available dimensionally so that it can be sliced and diced “any which way.” As soon as you limit the dimensional model to pre-summarized information, you've limited your ability to answer queries that need to drill down into more details.
  3. While the data warehouse program will constantly evolve, each iteration should be considered a project lifecycle consisting of predictable activities with a finite start and end.

Somewhere along the line, we were tagged as a “bottom-up” approach, perhaps because of our strong alignment with the business. Unfortunately, the label fails to reflect that we strongly recommend the development of an enterprise data warehouse bus matrix to capture the relationships between the core business processes/events and core descriptive dimensions BEFORE development begins. These linkages ensure that each project iteration fits into the larger enterprise puzzle.

Finally, we believe conformed dimensions, logically defined in the bus matrix and then physically enforced through the ETL staging process, are absolutely critical to data consistency and integration. They provide consistent labels, business rules/definitions, and domains that are reused as we construct more fact tables to integrate and capture the results from additional business processes/events.

So these are the concepts that we hold near and dear. I know I'm biased, but I frankly don't see that they warrant debate.

3.15 The Anti-Architect

Ralph Kimball, Intelligent Enterprise, Jan 14, 2002

Data warehousing is interesting because it involves so many different kinds of businesses and because the responsibility is so central to the mission of IT. But, as important as this job is, I have often felt overwhelmed when I listen to someone explain all the data warehouse manager's responsibilities. Be responsive to the business. Be responsive to the business users. Use technology wisely. Don't forget anything. Deliver results on time. Be courteous, kind, thrifty, brave, clean, and reverent.

Sometimes I find that casting data warehouse responsibilities in the negative is an effective way to cut through the vagueness. We've always been told what to do; now let's balance the list with what not to do. I list the mistakes in order of increasing seriousness here. Of course, all of these are showstoppers, so you might order them differently.

  • Mistake 1: Rely on past consultants or other IT staff to tell you the data warehouse requirements. Don't interview the business users.

    The reality: Nothing substitutes for direct exposure to the business users. Develop and trust your instincts gained from firsthand experience. Develop the ability to listen.

  • Mistake 2: Live with the assumption that the administrators of the major OLTP source systems of the enterprise are too busy and important to spend a lot of time with the data warehouse team, and they certainly cannot significantly alter their operational procedures for passing data to or from the data warehouse.

    The reality: If your organization really understands and values the data warehouse, then the OLTP source system administrators should be effective partners with you in downloading the data you need and uploading your cleaned data, such as customer names and addresses.

  • Mistake 3: After the data warehouse has been rolled out, set up a planning meeting to discuss ongoing communications with the business users, if the budget allows.

    The reality: Newsletters, training sessions, and ongoing personal support of the user community should all be part and parcel of the first rollout of the data warehouse.

  • Mistake 4: Make sure all the data warehouse support personnel have nice offices in the IT building, which is only a short drive from the business users. Set up a data warehouse support number with lots of touchtone options, and a guarantee that you will get back to the users as soon as possible. And of course, assure the users that they can email the support team any time, day or night.

    The reality: Data warehouse support people should be physically located in the business user departments, and while on assignment, should spend all their waking hours devoted to the business content of the departments they serve. Such a relationship engenders trust and credibility with the business users, which ultimately is the “gold coin” for IT.

  • Mistake 5: Declare user success at the end of the first training class. Make sure that the user's business intelligence tools are very powerful and be sure to demonstrate every feature and every command, including building complex reports, in the first training class. Defer training about the content of the data, because you have scheduled the training class on the dummy data you have been using for development, and the real data won't be ready for another few months. Don't bother to schedule follow up training or training for new employees. You've met the milestone.

    The reality: Delay training until your first business process subject area is ready to go live on real data. Keep the first training session short and focus only on the simple uses of the tool. Train 50 percent on the tool and 50 percent on the content of the data. Plan on a permanent series of beginner and follow up training classes. Take credit for the user success milestone when your trained users are still using the data warehouse six months after training.

  • Mistake 6: Assume that sales, operations, and finance users will naturally gravitate to the good data and will develop their own killer apps.

    The reality: Business users are not application developers. Most will use the data warehouse only if a killer application is waiting to beckon them.

  • Mistake 7: Make sure that before the data warehouse is implemented you write a comprehensive plan that describes all possible data assets of your enterprise and all the intended uses of information. Avoid the seductive illusion of iterative development, which is only an excuse for not getting it right the first time.

    The reality: Very few organizations or human beings can develop the perfect, comprehensive plan for a data warehouse up front. Not only are the data assets of an organization too vast and complex to describe completely up front, but the urgent business drivers, and even the staff, will change significantly over the life of the first implementation. Start with a lightweight data warehouse bus architecture of conformed dimensions and conformed facts, and build your data warehouse iteratively. You will keep altering and building it forever.

  • Mistake 8: Don't bother the senior executives of your organization with the data warehouse until you have it up and running and can point to a significant success.

    The reality: The senior executives must support the data warehouse effort from the very beginning. If they don't, or can't, then your organization may not be able to use a data warehouse effectively. Get this support at the very beginning.

  • Mistake 9: Encourage the business users to give you continuous feedback throughout the development cycle about new data sources and new key performance metrics they would like to see. Make sure to include these requirements in your upcoming release.

    The reality: You need to think like a software developer and manage three very visible stages of developing each subject area deliverable:

    1. The requirements gathering stage, in which every suggestion is considered seriously
    2. The implementation stage, during which changes can be accommodated but must be negotiated and will generally slip the schedule
    3. The rollout stage, where the project features are frozen. In the second and third stages you have to stop being a pal to everyone or else fall victim to scope creep.

    The trick is to turn around the development cycle as fast as possible, where in each “sprint” you have a clearly defined and achievable set of goals which you implement and test before accepting new input.

  • Mistake 10: Agree to deliver a high profile customer-centric dimensional model as your first deliverable. Ideally, choose customer profitability or customer satisfaction as your beginning point.

    The reality: These kinds of subject areas are consolidated, “second level” subject areas with serious dependencies on multiple sources of data. Customer profitability requires all the sources of revenue and all the sources of cost, as well as an allocation scheme to map costs onto the revenue! Focus the first deliverable instead on a single source of data and do the more ambitious subject areas later.

  • Mistake 11: Define your professional role as the authority on appropriate use of the data warehouse. Educate the business users as to how to think about the data, and what the appropriate uses of computers should be. Systematically raise the sophistication of the user community until most business users can develop their own data access applications, thereby eliminating the need for long term support.

    The reality: Your job is to be the publisher of the right data. Your professional role is to listen to the business users, who are always right. The users, not you, define the usability of the computers. You will be successful only if you serve the users' needs, not the other way around.

  • Mistake 12: Collect all the data in a physically centralized data warehouse before interviewing any business users or releasing any data marts. Ideally, implement the data warehouse on a single, monolithic machine where you can control and protect everything.

    The reality: More power to you if you have the organizational clout and the budget to implement a fully centralized data warehouse. But, in my opinion, a centrally planned data warehouse is as likely to be as successful as a centrally planned economy. It's hard to argue against idealistic inspirational promises made in advance of reality, but the truth is, the centrally planned systems often don't work. Instead, build cost-effective, distributed systems, and add incrementally to the logical and physical design as you learn from your business users. And finally, don't assume that your big, expensive, centralized machine is intrinsically secure because it is big, expensive, and centralized. If anything, such a centralized machine is a single point of failure—a system vulnerability.

3.16 Think Critically When Applying Best Practices

Bob Becker and Ralph Kimball, Intelligent Enterprise, Mar 26, 2007

Data warehousing is a mature discipline with well-established best practices. But these best practices are useless or even harmful if they are described inaccurately or incompletely. We have published more than 100 articles describing various aspects of the Kimball method. Yet every year or two we encounter serious misrepresentations made especially by speakers at training organizations where hundreds of students are given misleading information about our approach.

This article addresses major points of misunderstanding and vagueness by providing guidelines that DW/BI professionals can tuck into their project notebooks and refer to as unassailable facts and best practices of the Kimball method.

Take an Enterprise Approach

The Kimball method is specifically intended to deliver large-scale enterprise DW/BI solutions. Occasionally it has been described as a bottom-up approach, but it's more accurately described as a blended approach starting with an enterprise, top-down view. At the same time, it's tempered with the bottom-up realities of real data sources.

We teach an enterprise point of view, starting with horizontal, cross-department requirements gathering. This involves the executive team, senior managers, and data analysts identifying and prioritizing high value needs. The next step is to create the enterprise data warehouse bus matrix, a pivotal design document and powerful tool for understanding and creating the appropriate enterprise data architecture to support the business requirements. As we've said many times, real data sources in their atomic form are the data marts (or business process subject areas) of the enterprise, a definition that differs from other designers who define data marts only as aggregated releases from a centralized data store. When we then say (correctly) that the enterprise data warehouse is the sum of these data marts, other observers sometimes miss the point of our architecture; see article 4.10, The Bottom-Up Misnomer, for more details.

Embrace Business Intelligence

Business intelligence is a term that has emerged and evolved over the past few years and is now often used to describe all the systems and processes an enterprise uses to gather, process, provide access to, and analyze business information. The term data warehouse is now used to mean the platform for all forms of business intelligence.

Since we have been writing on this topic for more than 20 years, we are beholden to our legacy of books and articles. In fact, “data warehouse” is included in the title of all of our books! Nonetheless, changing industry vernacular does not change the core concepts and methodologies described by the Kimball method. Our approach has always embraced the entire, end-to-end process as critical to an organization's success.

Design Dimensional Schemas

The Kimball method is predicated on the principle that all business user access to data is supported via dimensional schemas. Thus, what we call the presentation area of the overall business intelligence solution comprises a number of granular, atomic fact tables decorated with a rich set of descriptive, conformed dimension tables. We specifically avoid summarized, departmental data marts supported by a large, normalized data warehouse providing access to atomic data. We believe that such a centralized and normalized view of a data warehouse is responsible for many of the failures of data warehousing to support business intelligence applications.

Dimensional modeling is a design discipline focused on optimizing the business intelligence platform for business users' ease of use and query performance. To achieve the goals of simple and fast, we describe a set of very specific design recommendations:

  • Conformed master dimensions form the bedrock of the enterprise DW/BI system and by themselves address the central issues of integration.
  • Fact tables are derived directly from measurement processes found in familiar operational transaction applications. A fact table should never be departmentally or functionally bound but rather depends only on the physics of the original measurement process.
  • Fact tables should always be populated at the most atomic level possible for maximum flexibility. Atomic data lets business users ask constantly changing, far-ranging, and very precise questions. It also assures the extensibility of additional attributes, metrics, or dimensions without disrupting existing reports and queries.
  • Exposing snowflaked or normalized dimension tables directly to business users is strongly discouraged. We have shown repeatedly that properly designed denormalized (flat) dimension tables contain precisely the same information content as normalized schemas. The only difference is complexity, as experienced by the business users. We embrace (and teach) normalized designs in the extract, transform, and load (ETL) phases; however, we avoid normalization in the user accessible presentation area.

Articles 5.16, Differences of Opinion, and 6.10, Fables and Facts, describe these concepts in greater detail.

Use Conformed Dimensions for Integration

Data integration and consistency are key goals of any enterprise business intelligence effort. Data integration requires organizational consensus to establish and administer common labels and measures enterprise wide. In the Kimball method, these labels and measures reside in conformed dimensions and conformed facts, respectively. Conformed dimensions are typically built and maintained as centralized, persistent master data during ETL, then reused across dimensional models to enable data integration and ensure consistency.

We enthusiastically support the recent master data management (MDM) and customer data integration (CDI) trends, because they are very consistent with the conformed approach. For more insight, read articles 5.6, The Matrix: Revisited, and 5.12, Integration for Real People.

Carefully Plan the ETL Architecture

Our approach describes a formal data staging area, much like the kitchen in a restaurant, with detailed ETL processes required to bridge the gap between the production system source data and the presentation area dimensional schema. The approach further defines cleaning and conforming activities as part of the transformation process.

Let there be no doubt, the ETL effort is hard work. The ETL system is often estimated to consume 70 percent of the time and effort of building a business intelligence environment. Too often, little thought goes into architecting a robust ETL system, and it ends up as an uncoordinated, spaghetti-mess of tables, modules, processes, scripts, triggers, alerts, and job schedules. This sort of design approach has unmistakably derailed many business intelligence efforts.

The Kimball method describes a comprehensive set of ETL subsystems that comprise a robust set of ETL best practices, including those required to support real-time requirements as described in article 11.2, The 34 Subsystems of ETL.

Be wary of any approach that suggests that ETL is no longer a required architectural component. Some architects believe that a simple intermediate data structure or an integration software layer is all that's needed to perform translation on the fly. Unfortunately, true data integration can only succeed if the textual descriptors in each separate source are physically altered so they have the same label (column name) and content (data domain values). If it sounds too easy, it is.
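The physical alteration described above can be sketched in a few lines. This is a toy example with invented column names and domain values, not a reference implementation: two sources describe the same customer attribute under different labels and codes, and the conforming step rewrites both to one conformed label and domain before loading the dimension.

```python
# Hypothetical sketch of a conforming transformation: two source
# systems use different column names and domain values for the same
# attribute; the ETL step physically rewrites both to the conformed
# label ("gender") and conformed domain values before loading.

CONFORMED_GENDER = {"M": "Male", "MALE": "Male",
                    "F": "Female", "FEMALE": "Female"}

def conform_customer(row, source):
    # Each source uses its own column name for the same attribute
    raw = row["gndr"] if source == "crm" else row["sex_code"]
    return {"customer_id": row["id"],
            "gender": CONFORMED_GENDER.get(raw.upper(), "Unknown")}

crm_row = {"id": 101, "gndr": "F"}
billing_row = {"id": 101, "sex_code": "Female"}

print(conform_customer(crm_row, "crm"))
print(conform_customer(billing_row, "billing"))
# Both sources now yield the identical conformed row:
# {'customer_id': 101, 'gender': 'Female'}
```

An integration layer that merely translates on the fly leaves the inconsistent labels and codes in place; the conforming step above changes the stored content so every downstream query sees one vocabulary.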

This article has highlighted five best practices drawn from the Kimball method, which we recommend designers study carefully in order to avoid the misrepresentations sometimes heard in various teaching and writing venues. As a designer, you are free to choose any approach you are comfortable with, but we want you to think critically when you are making these choices.

3.17 Eight Guidelines for Low Risk Enterprise Data Warehousing

Ralph Kimball, Intelligent Enterprise, Apr 26, 2009

In today's economic climate, DW/BI projects face two powerful and conflicting pressures. On the one hand, business users want more focused insight from their BI tools into customer satisfaction and profitability. On the other hand, these same users are under huge pressure to control costs and reduce risks. The explosion of new data sources and new delivery modes available for BI makes this dilemma especially acute.

How can we fail? We could do nothing, thereby overlooking important customer insights and specific areas where we could be more profitable. We could start a task force to produce a grand architectural specification covering the next couple of years, which is just another way of doing nothing. We could implement several high priority spot solutions, ignoring overall enterprise integration. We could start by buying a big piece of iron, believing that it is so powerful that it will handle any type of data, once we decide what that data is.

You get the idea. Even though some of these ways to fail seem obviously dumb, we can nevertheless find ourselves in these positions when we respond with a crisis mentality.

How can we succeed? How can we move forward quickly and decisively while at the same time clamping down on risk? Enterprise data warehousing (EDW) development is never easy, but this article presents eight guidelines for approaching this intimidating task in a flexible, reasonably low risk way.

Work on the Right Thing

We recommend a simple technique for deciding what the right thing is. Make a list of all your potential DW/BI projects and place them on a simple 2 x 2 grid, like the one in Figure 3.2.

A 2 x 2 grid plotting business impact against feasibility; box A, in the high business impact, high feasibility quadrant, is circled.

Figure 3.2 Impact versus feasibility prioritization.

Figure out, with your business executives, how valuable each of the potential projects would be, independent of the feasibility. Next, do an honest assessment of whether each project has high quality data and how difficult it will be to build the data delivery pipelines from the source to the BI tool. Remember that at least 70 percent of BI project risks and delays come from problems with the data sources and meeting data delivery freshness (latency) requirements.

Once projects have been placed on the grid, work from the upper-right corner. Project A in Figure 3.2 has high business impact and is eminently feasible. Don't take the easy way out and start with low risk project D. That project may be feasible, but even if you do a great job, it won't have much impact. Similarly, don't start with project C. The users would love to have it, but there are big feasibility issues which translate into big risks.
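The grid logic above can be sketched in a few lines. This is a minimal illustration, not part of the article: the numeric 1-to-10 scores and the project labels are hypothetical stand-ins for the qualitative placement of projects A through D on Figure 3.2.

```python
# Hypothetical impact/feasibility scores (1-10) for the projects in Figure 3.2.
projects = {
    "A": {"impact": 9, "feasibility": 9},  # high impact, high feasibility
    "B": {"impact": 8, "feasibility": 3},
    "C": {"impact": 9, "feasibility": 2},  # loved by users, but risky
    "D": {"impact": 2, "feasibility": 8},  # easy, but low payoff
}

# Work from the upper-right corner: rank by impact plus feasibility, so a
# feasible-but-low-impact project (D) never outranks project A.
ranked = sorted(
    projects,
    key=lambda p: projects[p]["impact"] + projects[p]["feasibility"],
    reverse=True,
)
print(ranked)  # 'A' first, 'D' last
```

Any scoring scheme that keeps both axes visible works; the point is simply to resist starting with whichever project happens to be easiest.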

Give Business Users Control

A few years ago, data warehousing was essentially relabeled as “business intelligence.” This relabeling was far more than a marketing tactic, because it correctly signaled the transfer of the initiative and ownership of the data assets to the business users. Everyone knows instinctively that they can do a better job if they can see the right data. Our job in IT is to sort through all the technology in order to give the users what they want.

The transfer of control means having users directly involved with, and responsible for, each DW/BI project. Obviously these users have to learn how to work with IT so as to make reasonable demands. The impact-feasibility grid shown in Figure 3.2 is not a bad place to start.

Proceed Incrementally

In this era of financial uncertainty, it's hard to justify a classic waterfall approach to DW/BI development. In the waterfall approach, a written functional specification is created that completely specifies the sources, the final deliverables, and the detailed implementation. The rest of the project implements this specification, often with a big bang comprehensive release. The origins of the waterfall approach lie in the manufacturing industry, where changes after implementation are prohibitively costly. The problem with the waterfall approach for DW/BI projects is that it takes too long and does not recognize the need to adapt to new requirements or changes in understanding.

Many DW/BI projects are gravitating to what could be called an “agile” approach that emphasizes frequent releases and midcourse corrections. Interestingly, a fundamental tenet of the agile approach is ownership by the business users, not by technical developers.

An agile approach requires tolerating some code rewriting and not depending on fixed price contracts. The agile approach can successfully be adapted to enterprise-wide projects such as master data management and enterprise integration. In these cases, the first few agile releases are not working code but rather architectural guidelines.

Start with Lightweight, Focused Governance

Governance is recognizing the value of your data assets and managing those assets responsibly. Governance is not something that is tacked onto the end of a DW/BI project. Governance is part of a larger culture that recognizes the value of your data assets and is supported and driven by senior executives. At the level of an individual project, governance is identifying, cataloging, valuing, assigning responsibility, securing, protecting, complying, controlling, improving, establishing consistent practices, integrating across subject areas, planning for growth, planning to harvest value, and generally nurturing. Governance doesn't need a waterfall approach, but these issues need to be part of the project from the very start. Failing to think about governance can result in fundamental rework of the DW/BI project.

Build a Simple, Universal Platform

One thing is certain in the BI space: The nature of the user-facing BI tools cannot be predicted. In the future, what's going to be more important: data mining, predictive analytics, delivery to mobile devices, batch reporting, real-time alerts, or something we haven't thought of yet? Fortunately, we have a good answer to this question; we must recognize that the enterprise data warehouse is the single platform for all forms of business intelligence. This viewpoint makes us realize that the enterprise data warehouse's interface to all forms of BI must be agnostic, simple, and universal.

Dimensional modeling meets these goals as the interface to all forms of BI. Dimensional schemas contain all possible data relationships, but at the same time can be processed efficiently with simple SQL emitted by any BI tool. Even the flat files preferred by data mining tools are easily delivered from dimensional models.
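To make the "simple SQL" claim concrete, here is a minimal sketch of a star schema queried the way any BI tool would: join the fact table to its dimensions, group by dimension attributes, and aggregate the facts. The table and column names (a sales fact with date and product dimensions) are hypothetical, and sqlite3 stands in for whatever relational platform you use.

```python
import sqlite3

# Build a tiny hypothetical star schema in memory.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE sales_fact (date_key INTEGER, product_key INTEGER, amount REAL);
    INSERT INTO date_dim VALUES (1, '2009-04'), (2, '2009-05');
    INSERT INTO product_dim VALUES (10, 'Widgets'), (11, 'Gadgets');
    INSERT INTO sales_fact VALUES (1, 10, 100.0), (1, 11, 50.0), (2, 10, 25.0);
""")

# The canonical dimensional query pattern: fact joined to dimensions,
# grouped by dimension attributes, facts aggregated.
rows = con.execute("""
    SELECT d.month, p.category, SUM(f.amount)
    FROM sales_fact f
    JOIN date_dim d ON f.date_key = d.date_key
    JOIN product_dim p ON f.product_key = p.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
""").fetchall()
print(rows)
```

Every report in this style is the same shape, which is precisely why the dimensional interface stays agnostic to whatever BI tool emits the query.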

Integrate Using Conformed Dimensions

Enterprise-wide integration has risen to the top of the list of DW/BI technical drivers along with data quality and data latency. Dimensional modeling provides a simple set of procedures for achieving integration that can be effectively used by BI tools. Conformed dimensions enable BI tools to drill across multiple business process subject areas, assembling a final integrated report. The key insight is that the entire dimension (customer, for example) does not need to be made identical across all subject areas. The minimum requirement for a drill-across report is that at least one field be common across multiple subject areas. Thus, the EDW can define a master enterprise dimension containing a small but growing number of conformed fields. These fields can be added incrementally over time. In this way, we reduce the risk and cost of enterprise integration at the BI interface. This approach also fits well with our recommendation to develop the DW/BI system incrementally.
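A drill-across report can be sketched as follows. This is an illustrative example, not the article's own code: the two business process fact tables (sales and support) and the single conformed field `customer_name` are hypothetical. Each subject area is queried separately at the grain of the conformed attribute, and the two row sets are then joined on that attribute.

```python
import sqlite3

# Two hypothetical business process subject areas sharing exactly one
# conformed field: customer_name.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales_facts (customer_name TEXT, revenue REAL);
    CREATE TABLE support_facts (customer_name TEXT, tickets INTEGER);
    INSERT INTO sales_facts VALUES ('Acme', 500.0), ('Beta', 120.0);
    INSERT INTO support_facts VALUES ('Acme', 3), ('Beta', 7);
""")

# Drill across: aggregate each subject area to the conformed grain,
# then join the separate answer sets on the conformed field.
report = con.execute("""
    SELECT s.customer_name, s.revenue, t.tickets
    FROM (SELECT customer_name, SUM(revenue) AS revenue
          FROM sales_facts GROUP BY customer_name) s
    LEFT JOIN (SELECT customer_name, SUM(tickets) AS tickets
               FROM support_facts GROUP BY customer_name) t
      ON s.customer_name = t.customer_name
    ORDER BY s.customer_name
""").fetchall()
print(report)  # [('Acme', 500.0, 3), ('Beta', 120.0, 7)]
```

Note that nothing here required the full customer dimensions in the two subject areas to be identical; one shared conformed field was enough to assemble the integrated report.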

Manage Quality a Few Screens at a Time

In article 11.19, An Architecture for Data Quality, I describe an effective approach to managing data quality by placing data quality screens throughout the data pipelines leading from the sources to the targets. Each data quality screen is a test. When the test fails or finds a suspected data quality violation, the screen writes a record in an error event fact table—a dimensional schema hidden in the back room away from direct access by business users. The error event fact table lets DW/BI administrators measure the volume and source of the errors encountered. A companion audit dimension summarizes the error conditions and is exposed to the business users along with every dimensional fact table.

The data quality screens can be implemented one at a time, allowing development of the data quality system to grow incrementally.
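The screen mechanism described above can be sketched briefly. This is an assumed, simplified structure, not the implementation from article 11.19: the record layout, the screen name, and the in-memory list standing in for the error event fact table are all hypothetical.

```python
from datetime import datetime, timezone

# Stand-in for the back-room error event fact table.
error_event_fact = []

def screen_nonnegative_amount(row, batch_id):
    """One data quality screen: order amount must not be negative.
    On failure, write an error event record and reject the row."""
    if row["amount"] < 0:
        error_event_fact.append({
            "screen": "nonnegative_amount",
            "batch_id": batch_id,
            "source_row": row["id"],
            "logged_at": datetime.now(timezone.utc).isoformat(),
        })
        return False
    return True

incoming = [{"id": 1, "amount": 19.99}, {"id": 2, "amount": -4.00}]
clean = [r for r in incoming if screen_nonnegative_amount(r, batch_id=42)]
print(len(clean), len(error_event_fact))  # 1 1
```

Because each screen is an independent test that appends to the same error event structure, new screens can be dropped into the pipeline one at a time, exactly as the incremental approach recommends.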

Use Surrogate Keys Throughout

Finally, a seemingly small recommendation to reduce your DW/BI development risk: Make sure to build all your dimensions (even type 1 dimensions) with surrogate primary keys. This insulates you from surprises downstream when you acquire a new division that has its own ideas about keys. What's more, all your databases will run faster with surrogate keys.
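The key-insulation argument can be illustrated with a minimal surrogate key assignment sketch. The in-memory key map and the (source system, natural key) pairing are hypothetical simplifications of what an ETL tool's key lookup would do.

```python
# Map from natural key to surrogate key, as an ETL key-lookup step might keep.
key_map = {}
next_surrogate = 1

def assign_surrogate(natural_key):
    """Return the existing surrogate for a natural key, or mint a new one."""
    global next_surrogate
    if natural_key not in key_map:
        key_map[natural_key] = next_surrogate
        next_surrogate += 1
    return key_map[natural_key]

# Two acquired divisions both use "1001" as a customer number. Including the
# source in the natural key keeps them distinct under separate surrogates.
sk_a = assign_surrogate(("division_a", "1001"))
sk_b = assign_surrogate(("division_b", "1001"))
print(sk_a, sk_b)  # 1 2
```

The downstream fact tables carry only the meaningless integer surrogates, so a new division with clashing operational keys never forces a restructuring of existing data.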
