With the historical perspective and grounding from the previous chapter, it's time to get everyone organized to embark on a DW/BI project. We begin this chapter by focusing on the project team, their role as information publishers, and the sometimes insurmountable realities that they confront. We then turn our attention to the all-important business stakeholders. Finally, we describe the Kimball Lifecycle approach for tackling your DW/BI initiatives.
We begin by discussing what the data warehouse/business intelligence manager and team should (and shouldn't) do.
Ralph Kimball, DBMS, Jul 1998
This article and the next discuss the similarities between a data warehouse manager's job and the responsibilities of an editor in chief.
The data warehouse manager's job is potentially huge, offering many opportunities and just as many risks. The data warehouse manager has been given control of one of the most valuable assets of any organization: the data. Furthermore, the data warehouse manager is expected to interpret and deliver that asset to the rest of the organization in a way that makes it most useful. All eyes are on the data warehouse manager.
In spite of all this visibility, many newly appointed data warehouse managers are simply given their titles without a clear job definition or a clear sense of what is and is not their responsibility. As an industry, we have been groping for a definition of the data warehouse manager position. Perhaps this was appropriate in past years because we needed to define the job. We needed time to get some accumulated experience. We needed to test the boundaries of what being a data warehouse manager means. I think we are somewhat overdue in defining the data warehouse manager's job. It is not sufficient or helpful to just say that a data warehouse manager's job is to “bring all the data into a central place and make it available for management to make decisions.” Although such a definition may be correct, it isn't precise enough for anyone to tell if the data warehouse manager has done the job well. In this article I'll begin to tackle the data warehouse manager's job definition. I will suggest a metaphor for the job that, if accurate, may provide a rich set of criteria to judge a “job well done.” Furthermore, a clear definition will help senior IT executives understand what the data warehouse manager needs to do and, just as importantly, what things a data warehouse manager should not have to do.
A good metaphor for the job of the data warehouse manager is the job of an editor in chief for a magazine or newspaper. At a high level, the editor in chief:
These statements seem a little obvious because we all know, based on experience, what the job title “editor in chief” implies. And most editors in chief understand very clearly that they don't create the content about which they write, report, or publish. They are, rather, the purveyors of content created by others.
I hope that you have been struck by the many parallels between the job of an editor in chief and the job of a data warehouse manager. Perhaps a good way to sum this up is to say that the job of the data warehouse manager is to publish the enterprise's data.
Let's examine the parallels between these two jobs. In most cases, the data warehouse manager is aggressively pursuing the same goals as the editor in chief. In some cases, the data warehouse manager could learn some useful things by emulating the editor in chief. At a high level, the data warehouse manager:
In addition, the data warehouse manager has a number of responsibilities that most editors do not have to think about. These special data warehouse responsibilities include backing up the data sources and the final, published versions of the data. These backups must be available—sometimes on an emergency basis—to recover from disasters or provide detail that wasn't published the first time around. The data warehouse manager must deal with overwhelming volumes of data and must diligently avoid being stranded by obsolete backups. He or she must replicate published data in a highly synchronized way to each of the downstream “publications” (business process subject areas) and provide a detailed audit trail of where the data came from and what its lineage and provenance are. The data warehouse manager must be able to explain the significance and true meaning of the data and justify each editing step that the ETL may have performed on the data before it was published, and must protect published data from all unauthorized readers. Of all the responsibilities the data warehouse manager has in addition to the classic editorial responsibilities, this security requirement is the most nightmarish; it is also the biggest departure from the editing metaphor. The data warehouse manager must somehow balance the goal of publishing the data to everyone with the goal of protecting the data from everyone. No wonder data warehouse managers have an identity problem.
In the discussion of the editor in chief's responsibilities, we remarked that nearly all editors in chief understand that they are merely the purveyors of content created by others. Most editors don't have a boundary problem in this area. Many data warehouse managers, on the other hand, do. Frequently, the data warehouse manager agrees to be responsible for allocations, forecasts, behavior scoring, modeling, or data mining. This is a major mistake! All these activities are content creation activities. It is understandable that the data warehouse manager is drawn into these activities, because, in many cases, there is no prior model for the division of the new responsibilities between IT and a business user group such as finance. If an organization has never had good allocated costs, for example, and the data warehouse manager is asked to present these costs, then the data warehouse manager is also going to be expected to create these costs.
The data warehouse manager should treat allocations, forecasts, behavior scoring, modeling, and data mining as clients of the data warehouse. These activities should be the responsibilities of various analytic groups in the finance and marketing departments, and these groups should have an arm's length relationship to the data warehouse. They should consume warehouse data as inputs to their analytic activities and, possibly, engage the data warehouse to republish their results when they are done. But these activities should not be mixed into all the mainline publishing activities of the data warehouse.
Creating allocation rules that let you assign infrastructure costs to various product lines or marketing initiatives is a political hot potato. It is easy for a data warehouse manager to get pulled into creating allocations because it is a necessary step in bringing up a profit-oriented subject area. The data warehouse manager should be aware of the possibility of this task being thrust on the data warehouse and should tell management that, for example, the “data warehouse will be glad to publish the allocation numbers once the finance department has created them.” In this case, the editorial metaphor is a useful guide.
Beyond the boundaries defined by the editorial metaphor, we must add the responsibilities of backing up data, auditing the extract and transform processes, replicating to the presentation tables of the data warehouse, and managing security. These tasks make the data warehouse manager's job more technical, more intricate, and at the same time, broader than the job of an editor in chief.
Perhaps this article can stimulate the development of criteria for data warehouse manager training and help IT executive management appreciate what data warehouse managers face in doing their jobs. In many ways, the responsibilities discussed in this article have been implicitly assumed, but data warehouse managers have neither had them spelled out nor been compensated for them.
Finally, by focusing on boundaries, we can see more clearly some content creation activities the data warehouse manager should leave on the table. Allocating, forecasting, behavior scoring, modeling, and data mining are valuable and interesting, but they are done by the readers (business users) of the data warehouse. Or to put it another way, if the data warehouse manager takes on these activities, then the data warehouse manager (of course) should get two business cards and two salaries. Then the boundaries can be twice as large.
Ralph Kimball, Intelligent Enterprise, Jul 26, 2002
This article builds on the preceding one to illustrate the evolution of the publisher metaphor.
Based on reader feedback, I've decided to do something I haven't done for quite some time: go back to lay the foundation for what data warehouse designers do, and why they use certain techniques. Doing this also lets me restate the assumptions and techniques with the benefit of hindsight and more experience. I hope the results are tighter, more up to date, and more clearly worded.
At the outset, I want to share a perspective that I take very seriously, and in some ways is the foundation for all my work in data warehousing. It's the publishing metaphor.
Imagine that you've been asked to take over responsibility for a high-quality magazine. If you approach this responsibility thoughtfully, you'll do the following 12 things:
While these responsibilities may seem obvious, here are some dubious “goals” that you should avoid:
By building the whole business on the foundation of serving the readers, your magazine is likely to be successful.
The point of this metaphor, of course, is to draw the parallel between being a conventional publisher and being a data warehouse project manager. I'm convinced that the correct job description for a data warehouse project manager is publish the right data. Your main responsibility is to serve your readers who are your business users. While you'll certainly use technology to deliver your data warehouse, the technology is at best a means to an end. The technology and the techniques you use to build your data warehouses shouldn't show up directly in your top 12 responsibilities, but the appropriate technologies and techniques will become much more obvious if your overriding goal is to effectively publish the right data.
Now let's recast the 12 magazine publishing responsibilities as data warehouse responsibilities:
If you do a good job with all these responsibilities, I think you'll be a great data warehouse project leader! Conversely, go down through the list and imagine what happens if you omit any single item. Ultimately your data warehouse would have problems.
Let's boil down the data warehouse mission into some simple but specific design drivers, because as engineers we have to build something. Our users will be happy if we publish the “right” data that gives them insight into their key business questions, in a way that they perceive as simple and fast.
These design drivers dominate everything. As an engineer, I know I can't compromise them or my data warehouse will fail. But, assuredly, there are other real-world constraints that I have to pay attention to. I also try to:
As data warehouse designers, we live in an enormously complex world. I'm almost reluctant to make a list of the implicit constraints we have to live with because I worry that the new student will decide on a different career! But here's my list:
And three more requirements that have been added just since 2000:
And finally a requirement that has grown explosively since 2010:
So what does an engineer make of all this? All these requirements and expectations placed on us are beyond overwhelming. Well, building a bridge across the Golden Gate must have seemed pretty impossible in 1930. And how about going to the moon in the 1960s? Data warehouses may not be that hard, but the examples are inspiring.
A good engineer first sorts through all the available mathematics and science, accumulating ideas and possible techniques. Some of the math and science is practical and some isn't. This is the time to be a skeptic. Next, the engineer decomposes the design problem. If it can be broken into pieces that are relatively independent, then the engineer can focus on manageable parts.
Once the design is started, the engineer must continuously choose practical techniques that are reusable, simple, and symmetrical. Idealized designs and assumptions need to be recognized and thrown out. There's no such thing as perfect information or control (especially of people), either at the beginning of a project or after the project is in production. Throughout the project, the engineer must be a good manager. And, finally, the original design goals have to be constantly in view to provide the foundation for all decisions.
Ralph Kimball, Intelligent Enterprise, Sep 1, 2005
An objection remover is a claim made during the sales cycle intended to counter a fear or concern that you have. Objection removers occupy a gray zone in between legitimate benefits and outright misrepresentations. While an objection remover may be technically true, it's a distraction intended to make you close the door on your concern and move forward in the sales process without careful thinking.
Objection removers crop up in every kind of business, but often the more complex and expensive your problem is, the more likely objection removers will play a role. In the data warehouse world, classic objection removers include:
Objection removers are tricky because they are true—at least if you look at your problem narrowly. And objection removers are crafted to erase your most tangible headaches. The relief you feel when you suddenly imagine that a big problem has gone away is so overwhelming that you feel the impulse to rush to the finish line and sign the contract. That is the purpose of an objection remover in its purest form. It doesn't add lasting value; it makes you close the deal.
So, what to do about objection removers? We don't want to throw the baby out with the bathwater. Showing the salesperson to the door is an overreaction. Somehow we have to stifle the impulse to sign the contract and step back to see the larger picture. Here are four steps you should keep in mind when you encounter an objection remover:
You don't need to build aggregates. The larger issue: Data warehouse queries typically summarize large amounts of data, which implies a lot of disk activity. Supporting a data warehouse requires that you constantly monitor patterns of queries and respond with all the technical tools possible to improve performance. The best ways to improve query performance in data warehouse applications, in order of effectiveness, have proven to be:
Software beats hardware every time. The vendors who talk down aggregates are hardware vendors who want you to improve the performance of slow queries on their massive, expensive hardware. Aggregates, including materialized views, occupy an important place in the overall portfolio of techniques for improving performance of long-running queries. They should be used in combination with all the software and hardware techniques to improve performance.
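The performance argument for aggregates can be made concrete with a minimal sketch. The fact rows and table names below are hypothetical, not from the article; the point is simply that a summary computed once at load time lets routine summary queries probe a tiny table instead of rescanning every base-level fact row:

```python
from collections import defaultdict

# Hypothetical base-level fact table: one row per sale (grain: day, product).
sales_fact = [
    {"date": "2024-01-05", "product": "A", "amount": 100.0},
    {"date": "2024-01-17", "product": "A", "amount": 50.0},
    {"date": "2024-02-03", "product": "B", "amount": 75.0},
]

def build_monthly_aggregate(fact_rows):
    """Precompute a month/product aggregate once, at load time."""
    agg = defaultdict(float)
    for row in fact_rows:
        month = row["date"][:7]               # "YYYY-MM"
        agg[(month, row["product"])] += row["amount"]
    return dict(agg)

monthly_agg = build_monthly_aggregate(sales_fact)

# A monthly summary query now reads one row from the small aggregate
# table instead of scanning the entire base fact table.
print(monthly_agg[("2024-01", "A")])   # 150.0
```

A materialized view plays exactly this role inside the database engine, with the added benefit that the optimizer can rewrite base-table queries to use it transparently.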
Leave your backup worries behind you. Developing a good backup and recovery strategy is a complex task that depends on the content of your data and the scenarios under which you must recover that data from the backup media. There are at least three independent scenarios:
The larger picture for this objection remover, obviously, is that each of these scenarios is highly dependent on your data content, your technical environment, and the legal requirements, such as compliance, that mandate how you maintain custody of your data. A good backup strategy requires thoughtful planning and a multipronged approach.
Build your data warehouse in 15 minutes. I've saved the best one for last! The only way you can build a data warehouse in 15 minutes is to narrow the scope of the data warehouse so drastically that there's no extraction, no transformation, and no loading of data in a format meant for consumption by your BI tools. In other words, the definition of a 15-minute data warehouse is one that is already present in some form. Too bad this objection remover is so transparent; it doesn't even offer a temporary feeling of relief.
There's always a letdown after shining a bright light on these objection removers by examining their larger context. Life is complex, after all. Let's finish our list of four steps:
Ralph Kimball, DBMS, Jun 1997
Are you part of the central data warehouse team in your large organization? Do you also work with remote divisional or departmental teams implementing far-flung independent data marts? Do you wonder what your role should be versus the divisional teams' role? If these questions are familiar to you, you may be frustrated because you feel you have an important responsibility in your central role, yet you may not actually own anything. Even worse, perhaps the divisional teams are implementing their own subject area data stores and are charging ahead with their own technologies, vendor selections, and logical and physical database designs.
If you are part of the central data warehouse team, you need to play an active role in defining and controlling your enterprise data warehouse. You should not be content to just be a liaison, a recommender, or some kind of ill-defined glue that holds projects together. You need to be a leader and a provider, and you must be very visible to the separate divisional teams implementing their own subject area data marts.
Conversely, if you are a part of a decentralized divisional data warehouse team, you should expect the central team to provide you with some very valuable services and impose some noticeable structure and discipline on your effort. You are a part of an overall enterprise, and you need to conform to these structures and disciplines so that the data from your division can be usefully combined with the data from other divisions.
The central data warehouse team has three major responsibilities that not only are essential to the integrity of the overall data warehousing effort, but cannot be provided by any of the individual divisions. These responsibilities are defining and publishing the corporate dimensions, providing cross-divisional applications, and defining a consistent data warehouse security architecture for the enterprise.
If your organization has multiple lines of business, and if you have any interest in combining these lines of business into an overall framework, you need to identify four or five dimensions that are common to the multiple lines of business. The most obvious corporate dimensions are customer, product, geography, and date. Since this article was written in 1997, the industry has given a name to defining and publishing corporate dimensions: master data management (MDM).
The customer dimension has one record for each one of your customers (actually, one record for each historical description of each customer). Although this seems obvious, the crucial step is to make sure that each divisional data mart (subject area) uses the same single customer dimension wherever there is a reference to customer. Each division will therefore see the customer the same way. Individual divisions will be able to see the relationship other divisions have with the customer, and the enterprise will, perhaps for the first time, be able to see the whole customer relationship.
The requirement that each division must always use the corporate customer dimension is a strong requirement that will impact everyone. From the central team's perspective, the corporate customer dimension must embrace the complete range of possible customers. The key for corporate customer almost certainly will need to be an artificial customer number defined and created by the central data warehouse team. No individual divisional customer number will be administered and controlled in a way that serves the needs of the entire enterprise. The descriptive attributes for customer will need to be a broad set that includes all of the desired descriptors for all of the divisional groups. Hopefully the central data warehouse team can meet with the divisional groups and try to compress and standardize the various descriptors into a reasonable working set, but in the long run there is no absolute need to force any individual group to give up its idiosyncratic customer descriptors. Because dimension tables are generally much smaller than fact tables in a dimensional data warehouse design, there is room for multiple sets of customer descriptors in the same customer dimension record. In banks, I have often seen the master customer record contain 200 descriptive attributes. I advocate making this customer table a single flat, denormalized table in order to improve user understandability and allow the exciting new bitmap indexing technologies to speed lookup. The leading database vendors all have bitmap indexes that thrive on fat, wide, denormalized dimension tables.
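The artificial customer number mentioned above is what dimensional modelers call a surrogate key. As a minimal, hypothetical sketch (the class and key formats are illustrative assumptions, not from the article), the central team issues an enterprise-wide key for each division/natural-key pair, so that no division's operational numbering scheme is forced to serve the whole enterprise. Matching the same real-world customer across divisions is a separate, harder deduplication step not shown here:

```python
class CustomerKeyGenerator:
    """Issues enterprise-wide surrogate keys for customer records,
    independent of any division's natural (operational) numbering."""

    def __init__(self):
        self._next_key = 1
        self._lookup = {}   # (division, natural_key) -> surrogate key

    def surrogate_key(self, division, natural_key):
        k = (division, natural_key)
        if k not in self._lookup:
            self._lookup[k] = self._next_key
            self._next_key += 1
        return self._lookup[k]

keygen = CustomerKeyGenerator()
# Two divisions refer to customers by their own numbering schemes;
# the central team assigns one artificial key per customer record.
print(keygen.surrogate_key("retail", "R-1001"))    # 1
print(keygen.surrogate_key("mortgage", "M-77"))    # 2
print(keygen.surrogate_key("retail", "R-1001"))    # 1 (same record, same key)
```

In a production warehouse this lookup would live in a key-mapping table maintained by the ETL system, but the contract is the same: the surrogate key is meaningless outside the warehouse and fully controlled by the central team.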
The need to standardize on a corporate definition of customer is a very important central team responsibility. This need will arise whenever management wants to look at overall customer relationships, and it will arise whenever your enterprise makes an acquisition. The good news is that you just acquired your biggest competitor. The bad news is that you must now merge the separate customer dimensions.
Often there will be no individual division willing or able to provide the corporate customer dimension. Yet frequently these same divisions will be eager to receive such a customer dimension from you, the central team. Even the production online transaction processing (OLTP) systems in the division may be willing to upload the central data warehouse's customer dimension if it contains cleaned and corrected customer addresses. In these cases, the central data warehouse team becomes the creator and provider of the master enterprise customer file.
The failure to enforce the use of a corporate customer dimension is a very serious lapse and therefore must be addressed by the central data warehouse team. The ability to “drill across” separate business processes can only be accomplished if all of the shared dimensions in the separate data marts are conformed. In other words, if two data marts have the same dimension, such as customer, these two customer dimensions must share a set of common attribute fields that are drawn from the same domains. The descriptive attributes within these shared dimensions must be defined and populated in a consistent way. Allowing two customer dimensions in different business process subject areas to drift apart means that the two subject areas cannot be used together. Ever.
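Drilling across can be sketched minimally: query each business process's fact table separately, then merge the answer sets on the shared, conformed customer key. The fact rows and measure names below are hypothetical, assumed only for illustration:

```python
from collections import defaultdict

def summarize(fact_rows, measure):
    """Sum one fact table's measure by the conformed customer key."""
    totals = defaultdict(float)
    for row in fact_rows:
        totals[row["customer_key"]] += row[measure]
    return totals

def drill_across(facts_a, measure_a, facts_b, measure_b):
    """Query each business process separately, then outer-join the
    answer sets on the shared, conformed customer dimension key."""
    a, b = summarize(facts_a, measure_a), summarize(facts_b, measure_b)
    return {
        key: {measure_a: a.get(key, 0.0), measure_b: b.get(key, 0.0)}
        for key in set(a) | set(b)
    }

# Two business process subject areas sharing one conformed customer dimension.
checking = [{"customer_key": 1, "balance": 500.0},
            {"customer_key": 2, "balance": 120.0}]
mortgage = [{"customer_key": 1, "principal": 90000.0}]

report = drill_across(checking, "balance", mortgage, "principal")
print(report[1])   # {'balance': 500.0, 'principal': 90000.0}
```

The merge on the final line is only meaningful because both subject areas drew their customer keys and attributes from the same conformed dimension; if the two dimensions had drifted apart, the join keys would no longer line up and the combined report would be wrong or empty.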
The product dimension has one record for each one of your products or services. As the central data warehouse team, you must be the judge as to whether your individual divisions have enough in common to warrant building a corporate product dimension. In a tightly integrated business, such as the multiple geographic divisions of a big retailer, it seems obvious that a single master product dimension is called for, even if there are regional variations in the products sold. In such a case, you probably already have a production group that works constantly to define this product master and download its descriptions and price points to the cash registers of the individual stores. The central data warehouse team can easily work with this group to create a proper product dimension for the enterprise. In a financial services organization, such as a bank or insurance company, there is a lot of interest in a core or super-type view of the business that groups all of the products and services into a single hierarchical framework. But there will be lots of special descriptive attributes that only make sense for a single division, such as mortgage loans versus credit card accounts. In this case, the best approach is to create a core product dimension containing only common attributes, and a set of custom or sub-type product dimensions that are different for each division. This heterogeneous product design is discussed in article 9.17, Hot-Swappable Dimensions. Finally, if your enterprise is so loosely integrated that your products have almost nothing in common, and if your management has never bothered to define a set of hierarchies that group all of the different divisional products into a common framework, then you may not need a corporate product dimension. Just remember, if you don't create a conformed product dimension, you will not be able to combine the separate divisional data marts together.
The geography dimension has one record for each one of your districts, regions, or zones. If your individual divisions have incompatible sales geographies, then the separate division data marts can only be used together at the highest common geographical aggregate. If you are lucky, this highest geographical aggregate will be something useful to all the data marts, such as state.
The date dimension has one record for each calendar time period. Hopefully all of your divisions operate on the same calendar and report on the same fiscal periods. If at all possible, all reporting should be done either at an individual daily grain or at the fiscal period grain, such as month. In this case, days always roll up to the fiscal periods, and the fiscal periods roll up to years. It becomes a monumental headache if the separate divisions have incompatible reporting calendars. Fortunately, the central data warehouse team has a strong ally in the corporate financial reporting group, which hopefully is in the process of standardizing the reporting calendars of the divisions. Separate databases denominated in weeks and months should be avoided at all costs because these time dimensions cannot usefully be conformed, and the week databases will always be isolated from the month databases.
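The day-to-fiscal-period rollup can be sketched as follows. This is a minimal illustration that assumes, purely for simplicity, that fiscal months coincide with calendar months; the essential property is that every day belongs to exactly one fiscal month and every fiscal month to exactly one fiscal year:

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """One row per calendar day; each day rolls up unambiguously to a
    fiscal month and a fiscal year (calendar periods assumed here)."""
    rows, d = [], start
    while d <= end:
        rows.append({
            "date_key": d.isoformat(),
            "fiscal_month": f"{d.year}-{d.month:02d}",
            "fiscal_year": d.year,
        })
        d += timedelta(days=1)
    return rows

dim = build_date_dimension(date(2024, 1, 30), date(2024, 2, 2))
# Days roll up to fiscal months with no overlap; months roll up to years.
print([r["fiscal_month"] for r in dim])   # ['2024-01', '2024-01', '2024-02', '2024-02']
```

A week-denominated calendar cannot be expressed this way, because ISO weeks straddle month boundaries; that is precisely why Kimball warns that week and month databases can never be conformed to each other.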
The central data warehouse team is in a unique position of being able to provide cross-divisional applications. A value chain of distributed business process subject areas can be assembled from each of the separate divisions. In article 6.2, Drilling Down, Up, and Across, I showed six subject areas for a clothing store retailer. These subject areas may have been built at different times, but they share a number of common dimensions. From the previous discussion, it should be clear that the central team should define and enforce the use of these dimensions.
With the proper tools, it is fairly easy to build cross-divisional applications in these environments that are capable of drilling across. I discussed drilling across in some detail in article 13.25, Features for Query Tools.
The central team's responsibility is to provide a reporting tool that allows drilling across and to enforce the use of common dimensions.
The data warehouse team must play a very proactive role in understanding and defining security. The data warehouse team must include a new member: the security architect.
The security architect for the data warehouse should define and enforce a data warehouse security plan for the enterprise and all of the individual subject areas; define consistent user privilege profiles; identify all possible access points in the corporate network, including every modem and every web interface; implement a single logon protocol so that a user is authenticated once for the whole network; track and analyze all attempts to override security or gain unauthorized access to the warehouse; implement a link encryption scheme, or its equivalent, for remote users; implement a remote user authentication scheme, such as a biometric reader at each PC, that is more reliable than typed user passwords; and educate the divisional teams about security issues and consistent administration of the subject areas.
Margy Ross, Design Tip #64, Feb 8, 2005
Several people have asked a similar question recently. “Should the DW or BI team gather requirements from the business?” Honestly, this question makes the hair start to stand up on the back of my neck. I'm concerned that too many organizations have overly compartmentalized their data warehouse and business intelligence teams.
Of course, some of this division is natural, especially when the resources allocated to DW/BI grow as the environment expands, creating obvious span-of-control issues. Also, separation of labor allows for specialization. View the overall DW/BI environment as analogous to a commercial restaurant: some team members are highly skilled in kitchen food preparation, while others are extremely attentive to the needs of the restaurant patrons, ensuring their return for a subsequent visit. There are likely few waiters who should suddenly don the chef's garb, and vice versa.
Despite the distribution of responsibilities, the kitchen and dining rooms of a restaurant are tightly entwined. Neither can be successful on its own. The best chefs need a well-trained, well-oiled front room machine; the most attractive dining room requires depth and quality from the kitchen. Only the complete package can deliver consistent, pleasurable dining experiences (and sustainability as a restaurant). That's why the chef and wait staff often huddle to educate and compare notes before a meal rush.
In the world of DW/BI, we've observed that some teams take a more isolationist approach. Matters are further complicated by the complexities and realities of organizational culture and politics. There may be a kitchen and dining area, but there's no swinging door between the two. It's like there's a transom (above eye level) where orders and plates are flung back and forth, but the two teams of specialists aren't communicating or cooperating. In this scenario, you end up with data models that can't reasonably be populated; data models that neither address the diners' needs nor leverage their tools; or diners' tools that are overtaxed or slow performing because they're repeatedly doing work that could have been done once in the kitchen and shared throughout the organization. In the worst case, the wall becomes so impenetrable that the BI dining room substitutes a different kitchen (or creates their own) to source meals.
The data warehouse should be the foundation for effective business intelligence. Too many organizations have focused on one without the other. Sure, you can create a data warehouse without concern for business intelligence, and vice versa, but neither situation is sustainable for long. Isolationism is not a healthy approach for building and supporting the DW/BI environment. Even if you don't report into the same management structure, collaboration and communication are critical.
Warren Thornthwaite, Intelligent Enterprise, May 11, 2008
Anyone who takes even a cursory glance at the common DW/BI failure points would spot the pattern: The hardest parts for most teams are understanding business issues and handling cross-organizational interactions. The challenges will become even more common as the DW/BI system evolves into a standard component of the IT environment, thus weakening the connection to the business.
Historically, DW/BI projects were driven by visionary people who were conversant in both business and technology. They understood the potential of better business insight and were motivated to help the DW/BI team deliver on it. As DW/BI systems have become commonplace, teams are now often made up of more technically oriented people. This change is not bad in and of itself, but it does mean that DW/BI teams will need to build up skills around understanding business requirements and strengthening relationships with business users.
Let's be clear here: Not everyone on the team needs to get an MBA, but everyone on the team should develop a basic understanding of how their organization works, how to work well with other people, and how to communicate more effectively, both in writing and in business meetings and presentations. This article presents advice on improving in these areas, along with valuable resources for professional development.
To better understand business, the first place to start is with documents from your own organization. Annual reports, strategic plans, marketing plans, and internal vision documents all provide solid insight into your business and its challenges and opportunities. However, without a basic business understanding, you may not get full value from these documents. Fortunately, there are many readily accessible resources to gain this understanding. Start with a book or two. There are at least a dozen good books available that attempt to boil a two-year MBA program down into a short, easy read. Here's a book that offers a good overview of the basics of business:
The Ten-Day MBA: A Step-By-Step Guide to Mastering the Skills Taught in America's Top Business Schools, 4th edition, by Steven A. Silbiger (Collins, 2012)
Once your senior management team has selected its top priorities for the DW/BI system and you want to learn more about them, seek out books that go into detail on those areas, whether it's marketing, sales, finance, promotions, manufacturing, logistics, or other functional areas. At this level, it's probably more valuable to take a course in the particular subject area rather than read a book. A good instructor can bring experience and perspective to the topic, drawing from multiple resources and adding exercises, examples, and anecdotes. You also get the benefit of interaction with classmates. If you can't find relevant courses at local universities, colleges, or community colleges, try looking online. Many major institutions offer online MBA courses. Search the internet and you will find plenty of options.
Some would argue that interpersonal skills are even more important than business acumen. Right from the start, someone on the DW/BI team must be able to persuasively articulate the vision and value of the system if you are to bring the project into existence or realign an existing project. Before any software is installed, someone on the team must ask questions about goals and elicit honest, useful answers without frightening or alienating the business users. Fortunately, these, too, are skills that can be learned. To master the basics of interpersonal and relationship skills, try one of these classics:
Crucial Conversations: Tools for Talking When Stakes Are High, 2nd edition, by Kerry Patterson, Joseph Grenny, Ron McMillan, Al Switzler (McGraw-Hill, 2011)
How to Win Friends and Influence People by Dale Carnegie (Simon and Schuster, 1936, revised 1998)
The 7 Habits of Highly Effective People by Stephen R. Covey (Simon and Schuster, 2013)
You will need to translate the advice offered in these books to your situation, but the core principles and recommendations are enduring. Basic principles such as having a sincere interest in others, listening to what they say, and looking for a win-win solution are the foundation of interpersonal success. Employ these principles with the goal of building a positive long term relationship with your business users—a relationship based on openness and trust. Use the techniques described in these books in a sincere fashion; if your intent is to manipulate or deceive, your efforts will backfire in the long run.
Every successful leader must be able to effectively communicate to groups. Public speaking and presentations involve two skill sets: the private act of preparing the content itself and the public act of delivering it to the audience. Both skill sets are addressed at length in these two books:
Presentation Zen: Simple Ideas on Presentation Design and Delivery by Garr Reynolds (New Riders Press, 2011)
The Quick and Easy Way to Effective Speaking by Dale Carnegie (Simon and Schuster, 1990; dated but still useful)
Getting better at speaking and presenting takes practice, but practicing in the work environment can be difficult and intimidating. It may help to take a class or find another venue for practice. One excellent, low cost option for developing public speaking skills is Toastmasters International (http://www.toastmasters.org/). Toastmasters is a nonprofit organization with thousands of local chapters across the globe. Toastmasters clubs typically have 20 to 30 members who meet two to four times per month for one or two hours. Members learn by speaking to the group and working with others in a supportive environment. The Toastmasters site can help you find clubs near you. There are also numerous commercial organizations that provide professional training in public speaking.
Written communications are another pillar of the communications skill set. Much of how we interact with others is through the written word. Emails, proposals, requirements documents, and related documentation actually define the DW/BI system to 95 percent of your business user community. Poor or sloppy writing can distract the reader and undermine the value and credibility of the message. Try one or more of these bibles on good writing:
The Elements of Style by William Strunk Jr. and E.B. White (Allyn and Bacon, 2011)
Keys to Great Writing by Stephen Wilbers (Writers Digest Books, 2007)
On Writing Well: The Classic Guide to Writing Nonfiction by William Zinsser (Collins, 2006)
In addition to these books, there are hundreds of web sites that offer writing tips and style guides. Take a look at Purdue University's Online Writing Lab (http://owl.english.purdue.edu/).
Most of what you find on the internet is a starting point at best. The web rarely offers the depth of content available in a book or classroom, but it's an easy place to start.
The internet and books are a good starting point, but they are no substitutes for practice. These are skills like any other; the more you work on them, the more you will improve. Every time you communicate with another person, you have an opportunity to practice your interpersonal skills. Every email, document, or even comment in your code presents a chance to work on better writing. Opportunities to practice public speaking are all around you, but they need to be cultivated a bit. Ask for time in your team's weekly meeting to present an educational segment on an interesting problem you've been dealing with and how you solved it. Offer to teach one of the DW/BI system ad hoc tool classes. In fact, if you take it seriously—by preparing well, creating clear graphics, and practicing—even a design review is an opportunity to practice your presentation skills.
Ask your manager or human resources department for additional recommendations on communications resources. While you're at it, have them include the whole area of communications in your performance appraisal. That way, you get credit for doing what you need to do to be successful in your project. You also get a little external motivation.
If you are excited by the prospect of improving your business acumen, interpersonal skills, speaking talent, and writing abilities, then you are halfway there! Your enthusiasm and motivation will make it a rewarding endeavor.
Margy Ross, Design Tip #173, Mar 9, 2015
Over the years, we've worked with countless exemplary DW/BI project team members: smart, skilled, dedicated, and motivated, coupled with a healthy dose of mutual trust, respect, and camaraderie with their teammates. Teams with members who possess these characteristics tend to fire on all cylinders, with the resulting whole often greater than the sum of the parts. But we've also run into risky project resources; in addition to being individual non-contributors, they can undermine the effectiveness of the entire DW/BI team. Model team members often become short-timers if the team is stacked with unproductive non-performers. We hope your team doesn't include resources that resemble the following profiles:
Of course, even with superstar teammates, the right team leadership is also necessary. Hopefully your DW/BI project/program manager fits the following bill:
Inexperienced, ineffective, or indecisive DW/BI project managers who don't demonstrate these characteristics are equally risky project resources.
Bob Becker, Design Tip #66, Apr 11, 2005
Many data warehouse teams lean heavily toward the doing side. They leap into implementation activities without spending enough time and energy to develop their data models, identify thorough business rules, or plan their ETL data staging processes. As a result, they charge full speed ahead and end up reworking their processes, delivering bad or incomplete data, and generally causing themselves difficulty.
Other project teams have the opposite challenge. These teams are committed to doing their homework in all the critical areas. They are focused on data quality, consistency, completeness, and stewardship. However, these project teams sometimes bog down on issues that should have been resolved long ago. Of course, this impasse occurs at the worst time—the promised implementation dates are rapidly approaching and design decisions that should be well into implementation remain unresolved.
Naturally, the outstanding issues involve the most difficult choices and the project team disagrees on the best solutions. The easy issues have already been resolved and the solutions for the more difficult issues don't come as easily. Despite copious amounts of time spent in research, data profiling, design meetings, and informal discussions, nothing seems to move the team closer to a decision on the best approach. The project sits at a crossroads unable to move forward. By this time, fear has usually taken hold of the project team. The pressure is on.
One helpful approach is the use of an arbitrator, a trusted individual from outside the project team, to help move the team ahead. The outstanding issues are identified and meetings scheduled with the arbitrator and interested stakeholders. All participants must agree that a final decision will be made during these sessions. The arbitrator should establish a time box to limit discussion on each issue. Discuss the pros and cons of each approach one last time; the arbitrator makes the ultimate decision if the team can't reach consensus.
Another approach is to defer the unresolved issues until a future implementation after further research and discussion have identified an appropriate solution. The downsides to this approach are that the business requirements may not allow the issues to be deferred; postponing resolution may simply delay the inevitable without any significant gain.
There is a delicate balance between planning and doing in the data warehouse world. The goal is to identify reasonable solutions, not necessarily perfect solutions, so the team can transition from planning to implementation. There may still be more to learn, but the implementation process is often more effective than any amount of talking and planning at revealing the weak spots in the plan so they can be reinforced. In fact, for many of the hard choices, much of the information needed to make good choices can only be gained through trial and error.
Clearly, we are not advocating a casual, ad hoc approach to implementing the data warehouse. But we recognize that sometimes you must be pragmatic and move forward with less than ideal solutions that may need to be revisited to achieve your overall goals.
Bob Becker, Design Tip #154, Apr 1, 2013
Keeping tight control over the scope of your data warehouse/business intelligence (DW/BI) program is an important ingredient for success. Surprisingly, in some organizations it's equally important to ensure that the program doesn't suffer the theft of its scope after an otherwise good plan has been developed.
It's nearly impossible to tackle everything at once in a DW/BI program. The DW/BI team should identify subsets of effort based on the organization's business processes (see article 4.7, Identifying Business Processes), to phase the overall design, development, and deployment effort. Each phase or iteration should be meaningful and manageable. That is, the scope of effort should be large enough to result in a deliverable that provides meaningful business value, yet the scope should be small enough that the size of the team, the amount of data involved, and the communications requirements are “reasonable,” especially given the resources allocated. It's important to avoid a large galactic scope that requires an army of resources and years of effort.
Once the scope for the project is established, there will inevitably be pressures to increase the scope. The most disruptive kind of scope creep is adding a new data source. Such a small incremental change in scope seems innocuous enough: “We are just adding a new cost element” or “We are just adding a weather source to the sales data.” However, these so-called small changes add up over time. Often the DW/BI project team is at fault. Most DW/BI teams are very focused on serving the requirements of their business partners and sponsors. But in your enthusiasm to serve, be careful not to succumb to scope creep that negatively impacts your ability to meet expectations. While it's important to be flexible, it's necessary to say “no” to maintain scope, especially after the project is underway.
Scope creep also results from overzealous business partners and sponsors. Once they've thrown their support behind the DW/BI initiative, they're eager for quick progress and immediate results. They clamor for a greater scope (more business processes, more history, complex business rules, and so on) delivered more quickly than originally agreed upon. Clearly, the business's enthusiasm and support is positive, but so is sticking to the game plan and delivering the meaningful and manageable scope. The main business sponsor in a DW/BI project needs to be a sophisticated observer of the technical development, as well as a good manager.
A strong program/project manager is critical for successfully managing scope creep. The program manager's job will be made easier with a supportive business sponsor, a solid IT/business partnership, and a project team committed to meeting expectations. Frequent frank discussions between the program manager and business sponsor regarding the scope, challenges, progress, timing, and expectations will ensure everyone remains on the same page.
A second concern regarding project scope is scope theft. Scope theft occurs when proponents of other projects/programs within the organization try to attach their agenda to the DW/BI program. Often these efforts support important enterprise needs; however, when they become attached to the DW/BI program, it's usually the DW/BI program that suffers. These other efforts often lack business sponsorship, payback, and most importantly, funding. Supporters of these initiatives want to attach their agendas to the DW/BI program to leverage (i.e., steal) the momentum, senior management visibility, business sponsorship, staffing, and/or funding of the DW/BI initiative. These efforts are often repackaged as prerequisite, companion, and/or complementary efforts to the DW/BI initiative; suddenly the agreed scope of the DW/BI initiative has been usurped to focus on other initiatives.
There are a wide variety of enterprise initiatives that might potentially attach themselves to a DW/BI program. Some common examples include:
There is little argument that these initiatives may be in the best interest of the enterprise (as is the DW/BI program); it's hard to dispute them conceptually. But, be wary. The reason the proponents of these initiatives want to leverage the DW/BI program is because their initiatives are complex, challenging, and expensive—they have likely been unable to secure the funding and sponsorship required to launch as a separate viable initiative. Since the DW/BI program has strong business support and funding, these competing initiatives look to cloak themselves as prerequisites or architectural requirements for the DW/BI program to move under its program umbrella.
The key question is whether these other initiatives become part and parcel of the DW/BI program. Typically not; it is likely best for both initiatives if they stand on their own merits. Otherwise the DW/BI program risks becoming a more IT-driven, galactic effort that delays the planned DW/BI capabilities and fails to meet the business's expectations. These other efforts are almost always clearly distinct from the DW/BI program and should be scoped, funded, sponsored, and governed independently. The DW/BI program manager and business sponsor need to be politically astute and cautious, and must fight to ensure the DW/BI program remains independent. The DW/BI business sponsor must typically work through these challenges given that the organizational politics are often beyond the DW/BI program manager's control.
Joy Mundy, Design Tip #129, Nov 3, 2010
Back when the world was young and data warehousing was new, projects were a lot more fun. Kimball Group consultants (before there was a Kimball Group) were usually called in by the business users, and we'd help them design and build a system largely outside the aegis of corporate IT. In the intervening years—decades!—data warehousing has become a mainstream component of most IT organizations. For the most part, this is a good thing: The rigor that formal IT management brings to the DW makes our systems more reliable, maintainable, and performant. No one likes the idea of a BI server sitting under a business user's desk.
However, IT infrastructure is not always helpful to the DW/BI project. Sometimes it gets in the way, or is actively obstructionist. No doubt every little sub-specialty of information technology clamors that it is somehow different or special, but in the case of DW/BI, it's really true.
The classic waterfall methodology subscribed to by many IT organizations has you develop a painfully detailed specification that's formalized, agreed to by the business and IT, and then turned over to the developers for implementation. Changes to the specification are tightly controlled, and are the exception rather than the rule.
If you try to apply a waterfall methodology to a DW/BI project, the best you'll end up with is a reporting system. The only things you can specify in sufficient detail are standard reports, so that's what you get: a system to deliver those standard reports. Many specs include a demand to support ad hoc analysis, and sometimes include examples of specific analyses the users would like to be able to do. But within the waterfall methodology it's impossible to clearly specify the boundaries and requirements of ad hoc analyses. So the project team “meets” this requirement by plopping an ad hoc query tool in front of the database.
It's really frustrating for the business users who are asked to write or approve the specification. They know the spec doesn't capture the richness of what they want, but they don't know how to communicate their needs to IT. I was recently in a meeting with a disgruntled business user who glared at the DW manager and said “Just provide me with a system that captures all the relationships in the data.” If only that one sentence were sufficient for the poor guy to design a data warehouse.
The data model takes a much more central role in the system design for a data warehouse than for a transaction system. As Bob argued in article 4.12, Using the Dimensional Model to Validate Business Requirements, the data model—developed collaboratively with the business users—becomes the core of the system specification. If the business users agree that any realistic analysis on the subject area can be met through the data in the model, and IT agrees the data model can be populated, the two sides can shake hands over the model. This is quite different from the standard waterfall approach, where the data modeler would take the spec into his cubicle and emerge several weeks later with the fully formed data model.
Another place I've seen formal IT processes get in the way is in naming the entities in the data warehouse. Of course, this is much less important an issue than the specifications, but I find myself deeply annoyed by naming dogmatisms. Because the DW database is designed for ad hoc use, business users are going to see the table and column names. They are displayed as report titles and column headers, so extremely long column names are problematic.
That said, naming conventions, the use of case words, and minimizing the use of abbreviations are all good ideas. But like all good ideas, temper them with reason. Taken to an extreme, you can end up with absurdly long column names like (and I'm not making this up) CurrentWorldwideTimezoneAreaIdentifier.
Although they give us something to laugh about, the real problem with absurdly long names is that the users won't tolerate them in their reports. They'll always be changing them in the report display, which means they'll use inconsistent names and introduce a new (and stupid) reason for the “multiple versions of the truth” problem. Please let your naming conventions be tempered by common sense.
Unless it's Kimball Group dogma. I have heard all sorts of rules instituted by “someone at corporate IT.” These include:
Don't get us wrong… As easy as it is to mock some practices, we believe in professionally developed and managed DW/BI systems, which usually means IT. The advantages are huge:
However, if you're on a project that's suffering under an IT mandate that makes no sense to you, don't be afraid to push back.
With the project team roles and responsibilities defined, we shift our focus to the all important business stakeholders of the data warehouse program. We'll describe the habits that make or break a business sponsor, challenge the conventional view of total cost of ownership calculations, then close this section with guidance to boost your skills for interacting with the business.
Margy Ross, Intelligent Enterprise, Sep 1, 2003
As data warehouse designers, you know how important a business executive sponsor is to your initiative. After focusing on data warehousing for the past two decades, I'm convinced that strong business sponsorship is the leading make-or-break indicator of data warehouse success. Having the right sponsor can overcome a multitude of shortcomings elsewhere in the project. On the other hand, sponsor turnover is one of the most common causes of data warehouse stagnation; unsponsored data warehouses simply don't survive.
In this article, I explore the characteristics that distinguish highly effective data warehouse sponsors. Sponsors who take their responsibilities seriously seem to naturally crave guidance about doing their job well. They're just as interested in data warehouse success as you are. Of course, the more effective the business sponsor is, the more fun you'll have associating with the data warehouse initiative. Remember that you're both on the same team. So after reading this article, route it to your existing or potential sponsors as this one's for them.
You've volunteered (or perhaps been mildly coerced) to serve as the business sponsor for the data warehouse. You've been successful in most previous ventures, but this is new and different. Then again, it can't be that difficult, can it?
No, it's not difficult, if you're committed. After working with hundreds of data warehouse sponsors, I've seen the same patterns of behavior occur repeatedly. I encourage you to learn from others' mistakes and keep these habits in mind as you undertake your new responsibilities.
As a data warehouse sponsor, it's important that you visualize and verbalize the potential effects of improved information on the organization's key initiatives. If you have the authority, but don't truly believe, then you should step aside as the business sponsor because you'll inevitably struggle to be effective. Data warehouse business sponsors need to be passionate about the cause and convey their vision to the organization. If this doesn't sound like you, then you and the data warehouse team need to find an alternate sponsor before plowing full steam ahead; otherwise, you'll be doing a disservice to yourself, the warehouse team, and the entire organization.
A common approach to managing the enterprise's data assets is avoidance. Data is managed departmentally rather than across the organization. Each department or organizational function builds its own private data repository, but there's no overall enterprise view. It's initially appealing because every department gets exactly what it wants without contending with organizational consensus. Numerous existing data warehouses have been constructed on this basis.
However, because each department uses slightly different definitions and interpretations, no one's data ties to anyone else's, and the result is anarchy. You lose the ability to look across the enterprise, missing opportunities for cross-functional sharing. Likewise, this mayhem produces significant organizational waste. Partially redundant databases translate into partially redundant data development, administration, and storage expenses. Even more wasteful are the resources devoted to understanding and reconciling this inconsistently defined data.
Of course, you can bring order to the chaos, but you need the political clout, financial means, and inclination to challenge the status quo. Rather than letting everyone build independent, department-centric databases or dashboards, corporate information assets need to be proactively managed. Many CIOs consider this their fiduciary responsibility. As the business sponsor, you'll need to work closely with the CIO to encourage key stakeholders to watch out for the good of the greater whole. The enterprise will need to commit to a common foundation, relying on shared reference information (known as dimensions to our regular readers) as the foundation for integration. No one said it would be a cakewalk.
Establishing order begins with a simple underlying premise: Key performance metrics are reused repeatedly across the organization. In a health insurance company, processing claims is a primary business activity, although one department may analyze the insured party's characteristics while another wants to study the health care professional involved. Despite distinct departmental focuses, claims data is a common link.
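This reuse of shared reference data is the essence of conformed dimensions. The claims example can be sketched in a few lines; the schema, table names, and figures below are purely hypothetical, and SQLite is used only to keep the sketch self-contained:

```python
import sqlite3

# Minimal illustrative sketch (hypothetical schema and numbers): two
# departments analyze the SAME claim facts through one shared dimension,
# so their departmental totals reconcile instead of diverging.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE claim_dim (
        claim_key          INTEGER PRIMARY KEY,
        insured_age_band   TEXT,   -- one department's focus
        provider_specialty TEXT    -- another department's focus
    );
    CREATE TABLE claim_fact (
        claim_key   INTEGER REFERENCES claim_dim(claim_key),
        paid_amount REAL
    );
    INSERT INTO claim_dim VALUES (1, '18-34', 'Cardiology'),
                                 (2, '35-49', 'Oncology');
    INSERT INTO claim_fact VALUES (1, 500.0), (2, 1200.0), (1, 300.0);
""")

# One department slices claims by the insured party's characteristics...
by_age = cur.execute("""
    SELECT d.insured_age_band, SUM(f.paid_amount)
    FROM claim_fact f JOIN claim_dim d USING (claim_key)
    GROUP BY d.insured_age_band""").fetchall()

# ...while another studies the health care professional involved.
# Both views are built on the same claim facts, so they reconcile.
by_provider = cur.execute("""
    SELECT d.provider_specialty, SUM(f.paid_amount)
    FROM claim_fact f JOIN claim_dim d USING (claim_key)
    GROUP BY d.provider_specialty""").fetchall()

print(sorted(by_age))       # [('18-34', 800.0), ('35-49', 1200.0)]
print(sorted(by_provider))  # [('Cardiology', 800.0), ('Oncology', 1200.0)]
```

Because both analyses draw from the same fact and dimension tables, the two departmental totals necessarily sum to the same grand total, which is exactly the cross-functional consistency that independent departmental databases forfeit.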
Data warehouse business sponsors need to influence others up, down, and across the organization. You're the internal data warehouse booster. Rallying the troops typically comes naturally to sponsors. You need to create enthusiasm within your organization, without overpromising or setting expectations that the data warehouse team can never fulfill. Don't be surprised if the project manager asks you to send out a friendly reminder conveying the importance of this initiative to key business representatives, encouraging their participation. Predictably, business folks more actively engage in data warehouse requirements or review meetings when it has been strongly suggested by their boss's boss. In the long run, getting business representatives personally involved during the design and development phases is crucial to their acceptance of the ultimate deliverable.
Rallying your peers is equally important, especially if you understand the ultimate costs associated with the easy route and commit to an enterprise approach. Obviously, you'll need your peers' backing, both financially and organizationally, to realize this vision. However, you may need to create awareness first. Perhaps your peers have already written big checks for new transaction processing systems, but don't yet understand that these operationally oriented systems won't address their analytic needs.
Business sponsors also need to build peer consensus on priorities. This consensus must be resilient enough that everyone involved sends consistent messages to the organization, both in words and actions. Attempting to step above the political fray doesn't come naturally. You'll need both a keen awareness of your peers' needs and the ability to empathize, without alienating them or being condescending. Early sponsors of the data warehouse often recruit peers to become cosponsors. You and your cosponsors must agree on and be committed to common goals. Inevitably, you will feel organizational pressure to change course.
Successful data warehouses are built incrementally. Business users won't get everything they want in the first phase, when establishing the core shared reference data is the priority. You need to be patient and ensure others follow suit. The data warehouse business sponsor should share the team's desire to balance meaningfulness and manageability when considering project iterations and time frames.
Analytic and reporting alternatives to the data warehouse probably already exist in the organization. Any incremental expenditure to enhance or expand these pre-existing point solutions should be evaluated against the overall goal. In the long run, many of these departmental solutions will be migrated or retired. Obviously, you need peer support to make this happen, especially when the near-term resolution to requests is “no” or “you're next in line.”
Serving as a business sponsor for the data warehouse is not a short term assignment, so plan to be involved for the long haul. Your visible participation helps encourage organizational acceptance and maintain momentum and enthusiasm.
Battles will erupt, usually over funding or resources. Business sponsors need to provide ongoing financial support for the foreseeable future. Data warehouses collapse without adequate funding. On the other hand, you can overspend. Although big budgets provide the team with short term euphoria, nothing is sustainable if the costs exceed the benefits. You must expect and ensure that the data warehouse team continues to deliver new incremental value to the organization in a timely and consistent manner.
Data warehouses are built to materially improve the organization's decision-making capabilities. As the data warehouse sponsor, you should continually remind all within earshot that business acceptance is the key success metric for the initiative. Everyone involved with the program, both on the business and IT side, needs to recite this pledge daily. That's especially true for those vulnerable to distraction by whiz-bang technology. Remember, nothing in your mission to improve the organization's decision making says anything about technology.
You must understand that building a data warehouse isn't a single project, but an ongoing program. Asking when the data warehouse will be finished is akin to asking, “When will we stop developing and enhancing our operational transaction processing systems?” The data warehouse environment will advance as an organization's core business processes, associated operational systems, and analytic processes mature.
As the business sponsor for the data warehouse, you ensure that the program is in continuous alignment with the business. As the business and its initiatives evolve, so must the vision for the data warehouse. The team can't afford to become complacent and rest on its laurels. The sponsor should ensure the program stays on a trajectory for excellence, without dictating or expecting compliance with unrealistic time frames. Your strategy should focus on those opportunities that translate into the biggest benefit for the overall enterprise.
The habits of highly effective data warehouse business sponsors don't require a degree in rocket science. They're common sense, largely variations on themes that let us be effective in other aspects of our professional and personal lives. So go forth and be an effective sponsor—the organization and data warehouse team are counting on you!
Ralph Kimball, Intelligent Enterprise, May 13, 2003
The recent industry preoccupation with calculating the total cost of ownership (TCO) for a data warehouse focuses on minor issues while missing the big picture. Metaphors such as “not seeing the forest for the trees” or even “rearranging the deck chairs on the Titanic” come to mind.
Reading various white papers and listening to industry consultants would make you think that data warehouse TCO is dominated by the costs of hardware, software licenses and maintenance contracts, IT staff expenses, and services.
Some of the more detailed analyses break these categories down to reveal chronically unplanned costs such as hiring and firing, training, tuning, testing, and documentation. One large hardware vendor even bases its TCO advantages on reducing the number of DBAs. With all these categories as a foundation, the various vendors “prove” that their approach is better by claiming to show head-to-head comparisons. Observers like you and me are left to wonder how the conflicting claims could be so dramatically at odds. But all of this muddling misses the big picture. As one of my former colleagues used to say, there is a hippopotamus in the room but nobody is talking about it.
When I think about data warehouse TCO, I start by asking: Why do we have a data warehouse, and who in the organization judges the cost and the return from a data warehouse? My answer is that the data warehouse publishes the data assets of the company to most effectively facilitate decision making. In other words, the data warehouse is an organization-wide resource, the cost and value of which you must judge by its effect on decision making in the broadest possible sense.
So who are the decision makers in an organization? For the most part, they're not the IT employees! The mainline decision makers of an organization are the business users of the data warehouse, whether they're executives, managers, knowledge workers, analysts, shop floor stewards, customer service representatives, secretaries, or clerks. All these users have a powerful, instinctive need to see data. With the computer revolution of the past half century, a profound cultural shift has taken place in the way businesses are run. Business end users are now quite certain that if they can see their data, they can run their businesses better.
So when I step back to the broader perspective of an entire organization trying to make better decisions with a data warehouse, the traditional IT-centric costs don't even make my top ten costs to be concerned about! The hippopotamus in this room is a set of problems that can destroy the value of a data warehouse. And yes, to avoid these problems entails some real costs.
Here's my list of the real sources of data warehouse costs that come before the traditional hardware, software, staff, and services costs that we've been so fixated on. The list begins with what I consider the most important, but you can adjust the ranking to your own environment:
As you'll see, most of these problems are akin to snatching defeat from the jaws of victory. When these problems raise their heads, they can threaten the entire data warehouse initiative. If the data warehouse fails, the cost analysis is really ugly. Add the traditional IT-centric costs to the cost of running your business without your data.
In all of these types of problems, the potential cost is the failure of the data warehouse. This cost dominates and renders traditional cost analyses meaningless. There's no upper bound on the potential cost of not being able to make the right decision. Your goal is to replace this potentially unbounded cost with finite, knowable costs, and at the same time eliminate the risks of losing the data warehouse. Many of these finite costs are in the numeric range of the less important costs I listed at the start of this article.
Taking a constructive view, let's look at these sources of cost to the data warehouse, how much they affect the overall organization, and what we can do to reduce them.
I hope that after reading about all these potential and real costs of the data warehouse you've almost forgotten about hardware, software, and services. Years ago at the Xerox Palo Alto Research Center (now just PARC), I was shocked when Alan Kay, inventor of the personal computer, said, “Hardware is tissue paper. You use it and throw it away.” That seemed disrespectful of all the tangible pieces of gear we had at the research center, but he was right. For the data warehouse, the only thing that matters is to effectively publish, in the most pleasing and fulfilling sense, the right data that supports our ability to make decisions.
The final section in this chapter describes the Kimball Lifecycle approach, presents common mistakes made on DW/BI projects, and offers practical advice to reduce project risks.
Margy Ross, Design Tip #115, Aug 4, 2009
A student in a recent class asked me for an overview of the Kimball Lifecycle approach to share with her manager. Confident that we'd published an executive summary, I was happy to oblige. Much to my surprise, our only published Lifecycle overview was a chapter in a Toolkit book, so this Design Tip addresses the unexpected content void in our archives.
The Kimball Lifecycle approach has been around for decades. The concepts were originally conceived in the 1980s by members of the Kimball Group and several colleagues at Metaphor Computer Systems. When we first published the methodology in The Data Warehouse Lifecycle Toolkit (Wiley, 1998), it was referred to as the Business Dimensional Lifecycle because this name reinforced three fundamental concepts:
Rewinding back to the 1990s, our methodology was one of the few emphasizing this set of core principles, so the Business Dimensional Lifecycle name differentiated our approach from others in the industry. Fast forwarding to the late 2000s when we published the second edition of the Lifecycle Toolkit, we still absolutely believed in these concepts, but the industry had evolved. Our principles had become mainstream best practices touted by many, so we condensed the methodology's official name to simply the Kimball Lifecycle.
In spite of dramatic advances in technology and understanding during the last couple of decades, the basic constructs of the Kimball Lifecycle have remained strikingly constant. Our approach to designing, developing, and deploying DW/BI solutions is tried and true. It has been utilized by thousands of project teams in virtually every industry, application area, business function, and technology platform. The Kimball Lifecycle approach has proven to work again and again.
The Kimball Lifecycle approach is illustrated in Figure 3.1. Successful DW/BI implementations depend on the appropriate amalgamation of numerous tasks and components; it's not enough to have a perfect data model or best of breed technology. The Lifecycle diagram is the overall roadmap depicting the sequence of tasks required for effective design, development, and deployment.
The first box on the roadmap focuses on getting the program/project launched, including scoping, justification, and staffing. Throughout the Lifecycle, ongoing program and project management tasks keep activities on track.
Eliciting business requirements is a key task in the Kimball Lifecycle because these findings drive most upstream and downstream decisions. Requirements are collected to determine the key factors impacting the business by focusing on what business users do today (or want to do in the future), rather than asking “what do you want in the data warehouse?” Major opportunities across the enterprise are identified, prioritized based on business value and feasibility, and then detailed requirements are gathered for the first iteration of the DW/BI system development. Three concurrent Lifecycle tracks follow the business requirements definition.
DW/BI environments mandate the integration of numerous technologies, data stores, and associated metadata. The technology track begins with system architecture design to establish a shopping list of needed capabilities, followed by the selection and installation of products satisfying those architectural needs.
The data track begins with the design of a target dimensional model to address the business requirements, while considering the underlying data realities. The word Kimball is synonymous with dimensional modeling where data is divided into either measurement facts or descriptive dimensions. Dimensional models can be instantiated in relational databases, referred to as star schemas, or multidimensional databases, known as OLAP cubes. Regardless of the platform, dimensional models attempt to address two simultaneous goals: ease of use from the users' perspective and fast query performance. The enterprise data warehouse bus matrix is a key Kimball Lifecycle deliverable representing an organization's core business processes and associated common conformed dimensions; it's a data blueprint to ensure top-down enterprise integration with manageable bottom-up delivery by focusing on a single business process at a time. The bus matrix is tremendously important because it simultaneously serves as a technical guide, a management guide, and a forum for communication with executives. The bus matrix is illustrated and described in detail in articles 5.5, The Matrix, and 5.6, The Matrix: Revisited.
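The star schema described above can be made concrete with a small example. This is a minimal sketch using an in-memory SQLite database; the retail sales subject area and all table and column names are illustrative assumptions, not taken from the text.

```python
import sqlite3

# Illustrative star schema: one granular fact table surrounded by
# descriptive dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, full_date TEXT, month_name TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE sales_fact (
    date_key     INTEGER REFERENCES date_dim(date_key),
    product_key  INTEGER REFERENCES product_dim(product_key),
    quantity_sold INTEGER,
    sales_amount  REAL
);
""")
conn.execute("INSERT INTO date_dim VALUES (20240105, '2024-01-05', 'January')")
conn.execute("INSERT INTO product_dim VALUES (1, 'Espresso Beans', 'Coffee')")
conn.execute("INSERT INTO sales_fact VALUES (20240105, 1, 3, 29.85)")

# A typical BI query: constrain and group on dimension attributes,
# aggregate the numeric facts.
row = conn.execute("""
    SELECT d.month_name, p.category, SUM(f.sales_amount)
    FROM sales_fact f
    JOIN date_dim d    ON f.date_key = d.date_key
    JOIN product_dim p ON f.product_key = p.product_key
    GROUP BY d.month_name, p.category
""").fetchone()
```

The query shape illustrates the two goals named above: it is easy for users to express (join, constrain, aggregate) and easy for the database to optimize.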
The dimensional model is converted into a physical design where performance tuning strategies are considered, then the extract, transform, and load (ETL) system design and development challenges are tackled. The Lifecycle describes 34 subsystems in the ETL process flow grouped into four major operations: extracting the data from the source, performing cleaning and conforming transformations, delivering the data to the presentation layer, and managing the back room ETL processes and environment.
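The four major ETL operations can be sketched as a simple pipeline. This is a toy illustration under assumed data; the function names stand in for the grouped operations and are not the Kimball subsystem names themselves.

```python
# Hypothetical record flow through the four major ETL operations:
# extract, clean/conform, deliver, and manage.
def extract(source_rows):
    # Pull raw rows from the operational source system.
    return list(source_rows)

def clean_and_conform(rows):
    # Standardize labels and domain values so they conform across sources.
    return [{
        "customer": row["customer"].strip().title(),
        "amount": round(float(row["amount"]), 2),
    } for row in rows]

def deliver(rows, presentation_area):
    # Load the cleaned rows into the presentation-layer fact table.
    presentation_area.extend(rows)
    return len(rows)

def run_pipeline(source_rows, presentation_area, audit_log):
    # The "manage" operation: coordinate the steps and keep an audit trail.
    staged = clean_and_conform(extract(source_rows))
    loaded = deliver(staged, presentation_area)
    audit_log.append({"rows_loaded": loaded})
    return loaded

fact_table, audit = [], []
run_pipeline([{"customer": "  acme corp ", "amount": "19.999"}], fact_table, audit)
```

Even this toy version shows why the ETL back room dominates project effort: every row passes through several distinct, auditable transformations before it reaches the presentation layer.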
While some project members are immersed in the technology and data, others focus on identifying and constructing a broad range of BI applications, including standardized reports, parameterized queries, dashboards, scorecards, analytic models, and data mining applications, along with the associated navigational interfaces.
The three Lifecycle tracks converge at deployment, bringing together the technology, data, and BI applications. The deployed iteration enters a maintenance phase, while growth is addressed by the arrow back to project planning for the next iteration of the DW/BI system. Remember that a DW/BI system is a long term program, not a one-off project!
Throughout the Kimball Lifecycle, there's a recurring theme acknowledging that DW/BI professionals must continuously straddle the business's requirements and the underlying realities of the source data, technology, and related resources. Project teams who focus exclusively on the requirements (or realities) in isolation will inevitably face significant delivery and/or business adoption risks.
Finally, we've said it before, and we'll surely repeat it again. Regardless of your organization's specific DW/BI objectives, we believe your overarching goal should be business acceptance of the DW/BI deliverables to support decision making. This target must remain in the bull's eye throughout the design, development, and deployment lifecycle of any DW/BI system.
Margy Ross, Design Tip #49, Sep 15, 2003
Plenty of folks in the DW/BI industry seem to feel personally qualified to explain the Kimball approach, while further fueling debate about which approach for tackling DW/BI development is best. Unfortunately, they sometimes spread misunderstandings and continue to blur the issues. While we'd never pretend to be an expert on the corporate information factory (CIF), we do feel it's our responsibility to clarify our methods rather than watching from the sidelines.
When we wrote the first edition of The Data Warehouse Lifecycle Toolkit (Wiley, 1998), we referred to our approach as the Business Dimensional Lifecycle. In retrospect, we should have probably just called it the Kimball Approach as suggested by our publisher. We chose the Business Dimensional Lifecycle label instead because it reinforced our core tenets about successful data warehousing based on our collective experiences since the mid-1980s:
Somewhere along the line, we were tagged as being a “bottom-up” approach. Perhaps the label was attached because of our strong alignment with the business. Unfortunately, the label fails to reflect that we strongly recommend the development of an enterprise data warehouse bus matrix to capture the relationships between the core business processes/events and core descriptive dimensions BEFORE development begins. These linkages ensure that each project iteration fits into the larger enterprise puzzle.
Finally, we believe conformed dimensions, logically defined in the bus matrix and then physically enforced through the ETL staging process, are absolutely critical to data consistency and integration. They provide consistent labels, business rules/definitions, and domains that are reused as we construct more fact tables to integrate and capture the results from additional business processes/events.
So these are the concepts that we hold near and dear. I know I'm biased, but I frankly don't see that they warrant debate.
Ralph Kimball, Intelligent Enterprise, Jan 14, 2002
Data warehousing is interesting because it involves so many different kinds of businesses and because the responsibility is so central to the mission of IT. But, as important as this job is, I have often felt overwhelmed when I listen to someone explain all the data warehouse manager's responsibilities. Be responsive to the business. Be responsive to the business users. Use technology wisely. Don't forget anything. Deliver results on time. Be courteous, kind, thrifty, brave, clean, and reverent.
Sometimes I find that casting data warehouse responsibilities in the negative is an effective way to cut through the vagueness. We've always been told what to do; now let's balance the list with what not to do. I list the mistakes in order of increasing seriousness here. Of course, all of these are showstoppers, so you might order them differently.
Mistake 1: Rely on past consultants or other IT staff to tell you the data warehouse requirements. Don't interview the business users.
The reality: Nothing substitutes for direct exposure to the business users. Develop and trust your instincts gained from firsthand experience. Develop the ability to listen.
Mistake 2: Live with the assumption that the administrators of the major OLTP source systems of the enterprise are too busy and important to spend a lot of time with the data warehouse team, and they certainly cannot significantly alter their operational procedures for passing data to or from the data warehouse.
The reality: If your organization really understands and values the data warehouse, then the OLTP source system administrators should be effective partners with you in downloading the data you need and uploading your cleaned data, such as customer names and addresses.
Mistake 3: After the data warehouse has been rolled out, set up a planning meeting to discuss ongoing communications with the business users, if the budget allows.
The reality: Newsletters, training sessions, and ongoing personal support of the user community should all be part and parcel of the first rollout of the data warehouse.
Mistake 4: Make sure all the data warehouse support personnel have nice offices in the IT building, which is only a short drive from the business users. Set up a data warehouse support number with lots of touchtone options, and a guarantee that you will get back to the users as soon as possible. And of course, assure the users that they can email the support team any time, day or night.
The reality: Data warehouse support people should be physically located in the business user departments, and while on assignment, should spend all their waking hours devoted to the business content of the departments they serve. Such a relationship engenders trust and credibility with the business users, which ultimately is the “gold coin” for IT.
Mistake 5: Declare user success at the end of the first training class. Make sure that the user's business intelligence tools are very powerful and be sure to demonstrate every feature and every command, including building complex reports, in the first training class. Defer training about the content of the data, because you have scheduled the training class on the dummy data you have been using for development, and the real data won't be ready for another few months. Don't bother to schedule follow-up training or training for new employees. You've met the milestone.
The reality: Delay training until your first business process subject area is ready to go live on real data. Keep the first training session short and focus only on the simple uses of the tool. Train 50 percent on the tool and 50 percent on the content of the data. Plan on a permanent series of beginner and follow-up training classes. Take credit for the user success milestone when your trained users are still using the data warehouse six months after training.
Mistake 6: Assume that sales, operations, and finance users will naturally gravitate to the good data and will develop their own killer apps.
The reality: Business users are not application developers. Most will use the data warehouse only if a killer application is waiting to beckon them.
Mistake 7: Make sure that before the data warehouse is implemented you write a comprehensive plan that describes all possible data assets of your enterprise and all the intended uses of information. Avoid the seductive illusion of iterative development, which is only an excuse for not getting it right the first time.
The reality: Very few organizations or human beings can develop the perfect, comprehensive plan for a data warehouse up front. Not only are the data assets of an organization too vast and complex to describe completely up front, but the urgent business drivers, and even the staff, will change significantly over the life of the first implementation. Start with a lightweight data warehouse bus architecture of conformed dimensions and conformed facts, and build your data warehouse iteratively. You will keep altering and building it forever.
Mistake 8: Don't bother the senior executives of your organization with the data warehouse until you have it up and running and can point to a significant success.
The reality: The senior executives must support the data warehouse effort from the very beginning. If they don't, or can't, then your organization may not be able to use a data warehouse effectively. Get this support at the very beginning.
Mistake 9: Encourage the business users to give you continuous feedback throughout the development cycle about new data sources and new key performance metrics they would like to see. Make sure to include these requirements in your upcoming release.
The reality: You need to think like a software developer and manage three very visible stages of developing each subject area deliverable:
The trick is to turn around the development cycle as fast as possible, where in each “sprint” you have a clearly defined and achievable set of goals which you implement and test before accepting new input.
Mistake 10: Agree to deliver a high profile customer-centric dimensional model as your first deliverable. Ideally, choose customer profitability or customer satisfaction as your beginning point.
The reality: These kinds of subject areas are consolidated, “second level” subject areas with serious dependencies on multiple sources of data. Customer profitability requires all the sources of revenue and all the sources of cost, as well as an allocation scheme to map costs onto the revenue! Focus the first deliverable instead on a single source of data and do the more ambitious subject areas later.
Mistake 11: Define your professional role as the authority on appropriate use of the data warehouse. Educate the business users as to how to think about the data, and what the appropriate uses of computers should be. Systematically raise the sophistication of the user community until most business users can develop their own data access applications, thereby eliminating the need for long term support.
The reality: Your job is to be the publisher of the right data. Your professional role is to listen to the business users, who are always right. The users, not you, define the usability of the computers. You will be successful only if you serve the users' needs, not the other way around.
Mistake 12: Collect all the data in a physically centralized data warehouse before interviewing any business users or releasing any data marts. Ideally, implement the data warehouse on a single, monolithic machine where you can control and protect everything.
The reality: More power to you if you have the organizational clout and the budget to implement a fully centralized data warehouse. But, in my opinion, a centrally planned data warehouse is about as likely to succeed as a centrally planned economy. It's hard to argue against idealistic inspirational promises made in advance of reality, but the truth is, centrally planned systems often don't work. Instead, build cost-effective, distributed systems, and add incrementally to the logical and physical design as you learn from your business users. And finally, don't assume that your big, expensive, centralized machine is intrinsically secure because it is big, expensive, and centralized. If anything, such a centralized machine is a single point of failure—a system vulnerability.
Bob Becker and Ralph Kimball, Intelligent Enterprise, Mar 26, 2007
Data warehousing is a mature discipline with well-established best practices. But these best practices are useless or even harmful if they are described inaccurately or incompletely. We have published more than 100 articles describing various aspects of the Kimball method. Yet every year or two we encounter serious misrepresentations made especially by speakers at training organizations where hundreds of students are given misleading information about our approach.
This article addresses major points of misunderstanding and vagueness by providing guidelines that DW/BI professionals can tuck into their project notebooks and refer to as unassailable facts and best practices of the Kimball method.
The Kimball method is specifically intended to deliver large-scale enterprise DW/BI solutions. Occasionally it has been described as a bottom-up approach, but it's more accurately described as a blended approach starting with an enterprise, top-down view. At the same time, it's tempered with the bottom-up realities of real data sources.
We teach an enterprise point of view, starting with horizontal, cross-department requirements gathering. This involves the executive team, senior managers, and data analysts identifying and prioritizing high value needs. The next step is to create the enterprise data warehouse bus matrix, a pivotal design document and powerful tool for understanding and creating the appropriate enterprise data architecture to support the business requirements. As we've said many times, real data sources in their atomic form are the data marts (or business process subject areas) of the enterprise, a definition that differs from that of other designers, who define data marts only as aggregated releases from a centralized data store. When we then say (correctly) that the enterprise data warehouse is the sum of these data marts, other observers sometimes miss the point of our architecture; see article 4.10, The Bottom-Up Misnomer, for more details.
Business intelligence is a term that has emerged and evolved over the past few years and is now often used to describe all the systems and processes an enterprise uses to gather, process, provide access to, and analyze business information. The term data warehouse is now used to mean the platform for all forms of business intelligence.
Since we have been writing on this topic for more than 20 years, we are beholden to our legacy of books and articles. In fact, “data warehouse” is included in the title of all of our books! Nonetheless, changing industry vernacular does not change the core concepts and methodologies described by the Kimball method. Our approach has always embraced the entire, end-to-end process as critical to an organization's success.
The Kimball method is predicated on the principle that all business user access to data is supported via dimensional schemas. Thus, what we call the presentation area of the overall business intelligence solution comprises a number of granular, atomic fact tables decorated with a rich set of descriptive, conformed dimension tables. We specifically avoid summarized, departmental data marts supported by a large, normalized data warehouse providing access to atomic data. We believe that such a centralized and normalized view of a data warehouse is responsible for many of the failures of data warehousing to support business intelligence applications.
Dimensional modeling is a design discipline focused on optimizing the business intelligence platform for business users' ease of use and query performance. To achieve the goals of simple and fast, we describe a set of very specific design recommendations:
Articles 5.16, Differences of Opinion, and 6.10, Fables and Facts, describe these concepts in greater detail.
Data integration and consistency are key goals of any enterprise business intelligence effort. Data integration requires organizational consensus to establish and administer common labels and measures enterprise wide. In the Kimball method, these labels and measures reside in conformed dimensions and conformed facts, respectively. Conformed dimensions are typically built and maintained as centralized, persistent master data during ETL, then reused across dimensional models to enable data integration and ensure consistency.
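The drill-across payoff of conformed dimensions can be sketched concretely. This minimal example, with an assumed customer dimension shared by two hypothetical business process fact tables, shows how one dimension's labels let separately aggregated results be combined consistently.

```python
import sqlite3

# Illustrative conformed dimension: a single customer_dim shared by two
# business process fact tables, enabling a drill-across comparison.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE orders_fact  (customer_key INTEGER, order_amount REAL);
CREATE TABLE returns_fact (customer_key INTEGER, return_amount REAL);
INSERT INTO customer_dim VALUES (1, 'Acme Corp');
INSERT INTO orders_fact  VALUES (1, 500.0), (1, 250.0);
INSERT INTO returns_fact VALUES (1, 100.0);
""")

# Drill across: aggregate each business process separately, then join the
# results on the conformed dimension.
row = conn.execute("""
    SELECT c.customer_name, o.total_ordered, r.total_returned
    FROM customer_dim c
    JOIN (SELECT customer_key, SUM(order_amount) AS total_ordered
          FROM orders_fact GROUP BY customer_key) o USING (customer_key)
    JOIN (SELECT customer_key, SUM(return_amount) AS total_returned
          FROM returns_fact GROUP BY customer_key) r USING (customer_key)
""").fetchone()
```

Because both fact tables reference the same dimension rows, the labels in the combined report are guaranteed to mean the same thing in both columns.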
We enthusiastically support the recent master data management (MDM) and customer data integration (CDI) trends, because they are very consistent with the conformed approach. For more insight, read articles 5.6, The Matrix: Revisited, and 5.12, Integration for Real People.
Our approach describes a formal data staging area, much like the kitchen in a restaurant, with detailed ETL processes required to bridge the gap between the production system source data and the presentation area dimensional schema. The approach further defines cleaning and conforming activities as part of the transformation process.
Let there be no doubt, the ETL effort is hard work. The ETL system is often estimated to consume 70 percent of the time and effort of building a business intelligence environment. Too often, little thought goes into architecting a robust ETL system, and it ends up as an uncoordinated, spaghetti-mess of tables, modules, processes, scripts, triggers, alerts, and job schedules. This sort of design approach has unmistakably derailed many business intelligence efforts.
The Kimball method describes a comprehensive set of ETL subsystems that comprise a robust set of ETL best practices, including those required to support real-time requirements as described in article 11.2, The 34 Subsystems of ETL.
Be wary of any approach that suggests that ETL is no longer a required architectural component. Some architects believe that a simple intermediate data structure or an integration software layer is all that's needed to perform translation on the fly. Unfortunately, true data integration can only succeed if the textual descriptors in each separate source are physically altered so they have the same label (column name) and content (data domain values). If it sounds too easy, it is.
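What "physically altered" means in practice can be shown with a tiny conforming step. The two source encodings and the mapping below are invented for illustration; any real conforming logic would be driven by organizational consensus on the domain values.

```python
# Hypothetical conforming step: two sources encode gender differently, so
# the ETL system physically rewrites both into one conformed domain.
CONFORMED_GENDER = {"M": "Male", "MALE": "Male", "1": "Male",
                    "F": "Female", "FEMALE": "Female", "2": "Female"}

def conform_gender(raw_value):
    # Normalize the raw code, then map it into the agreed domain.
    return CONFORMED_GENDER.get(str(raw_value).strip().upper(), "Unknown")

source_a = [{"gndr_cd": "M"}, {"gndr_cd": "F"}]   # e.g., CRM system codes
source_b = [{"sex": 1}, {"sex": "female"}]        # e.g., billing system codes

conformed = ([conform_gender(r["gndr_cd"]) for r in source_a] +
             [conform_gender(r["sex"]) for r in source_b])
```

An on-the-fly translation layer would have to apply this same logic correctly at every query; rewriting the values once during ETL makes the conformed domain a physical fact of the data.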
This article has highlighted five best practices drawn from the Kimball method, which we recommend designers study carefully in order to avoid the misrepresentations sometimes heard in various teaching and writing venues. As a designer, you are free to choose any approach you are comfortable with, but we want you to think critically when you are making these choices.
Ralph Kimball, Intelligent Enterprise, Apr 26, 2009
In today's economic climate, DW/BI projects face two powerful and conflicting pressures. On the one hand, business users want more focused insight from their BI tools into customer satisfaction and profitability. On the other hand, these same users are under huge pressure to control costs and reduce risks. The explosion of new data sources and new delivery modes available for BI really makes this dilemma acute.
How can we fail? We could do nothing, thereby overlooking important customer insights and specific areas where we could be more profitable. We could start a task force to produce a grand architectural specification covering the next couple of years, which is just another way of doing nothing. We could implement several high priority spot solutions, ignoring overall enterprise integration. We could start by buying a big piece of iron, believing that it is so powerful that it will handle any type of data, once we decide what that data is.
You get the idea. Even though some of these ways to fail seem obviously dumb, we can nevertheless find ourselves in these positions when we respond with a crisis mentality.
How can we succeed? How can we move forward quickly and decisively while at the same time clamping down on risk? Enterprise data warehousing (EDW) development is never easy, but this article presents eight guidelines for approaching this intimidating task in a flexible, reasonably low risk way.
We recommend a simple technique for deciding what the right thing is. Make a list of all your potential DW/BI projects and place them on a simple 2 x 2 grid, like the one in Figure 3.2.
Figure out, with your business executives, how valuable each of the potential projects would be, independent of the feasibility. Next, do an honest assessment of whether each project has high quality data and how difficult it will be to build the data delivery pipelines from the source to the BI tool. Remember that at least 70 percent of BI project risks and delays come from problems with the data sources and meeting data delivery freshness (latency) requirements.
Once projects have been placed on the grid, work from the upper-right corner. Project A in Figure 3.2 has high business impact and is eminently feasible. Don't take the easy way out and start with low risk project D. That project may be feasible, but even if you do a great job, it won't have much impact. Similarly, don't start with project C. The users would love to have it, but there are big feasibility issues which translate into big risks.
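The "work from the upper-right corner" rule can be expressed as a simple ranking. This sketch assumes 1-to-10 scores on each axis and a minimum feasibility bar; both the scores and the threshold are illustrative assumptions, since the real placement comes from executive judgment, not arithmetic.

```python
# Hypothetical scoring of candidate projects on the impact/feasibility grid.
projects = [
    {"name": "A", "impact": 9, "feasibility": 8},  # upper-right: start here
    {"name": "B", "impact": 8, "feasibility": 3},
    {"name": "C", "impact": 9, "feasibility": 2},  # users love it, data is risky
    {"name": "D", "impact": 2, "feasibility": 9},  # easy but low payoff
]

# Screen out projects below a feasibility floor (C and B drop out), then
# rank the survivors by business impact, feasibility as a tiebreaker.
FEASIBILITY_FLOOR = 5
ranked = sorted(
    (p for p in projects if p["feasibility"] >= FEASIBILITY_FLOOR),
    key=lambda p: (p["impact"], p["feasibility"]),
    reverse=True,
)
first_project = ranked[0]["name"]
```

Note how the floor-then-rank order encodes the advice above: project D survives the screen but ranks behind A, and project C never makes the list despite its appeal.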
A few years ago, data warehousing was essentially relabeled as “business intelligence.” This relabeling was far more than a marketing tactic, because it correctly signaled the transfer of the initiative and ownership of the data assets to the business users. Everyone knows instinctively that they can do a better job if they can see the right data. Our job in IT is to sort through all the technology in order to give the users what they want.
The transfer of control means having users directly involved with, and responsible for, each DW/BI project. Obviously these users have to learn how to work with IT so as to make reasonable demands. The impact-feasibility grid shown in Figure 3.2 is not a bad place to start.
In this era of financial uncertainty, it's hard to justify a classic waterfall approach to DW/BI development. In the waterfall approach, a written functional specification is created that completely specifies the sources, the final deliverables, and the detailed implementation. The rest of the project implements this specification, often with a big bang comprehensive release. The origins of the waterfall approach lie in the manufacturing industry, where changes after implementation are prohibitively costly. The problem with the waterfall approach for DW/BI projects is that it takes too long and does not recognize the need to adapt to new requirements or changes in understanding.
Many DW/BI projects are gravitating to what could be called an “agile” approach that emphasizes frequent releases and midcourse corrections. Interestingly, a fundamental tenet of the agile approach is ownership by the business users, not by technical developers.
An agile approach requires tolerating some code rewriting and not depending on fixed price contracts. The agile approach can successfully be adapted to enterprise-wide projects such as master data management and enterprise integration. In these cases, the first few agile releases are not working code but rather architectural guidelines.
Governance is recognizing the value of your data assets and managing those assets responsibly. Governance is not something that is tacked onto the end of a DW/BI project. Governance is part of a larger culture that recognizes the value of your data assets and is supported and driven by senior executives. At the level of an individual project, governance is identifying, cataloging, valuing, assigning responsibility, securing, protecting, ensuring compliance, controlling, improving, establishing consistent practices, integrating across subject areas, planning for growth, planning to harvest value, and generally nurturing. Governance doesn't need a waterfall approach, but these issues need to be part of the project from the very start. Failing to think about governance can result in fundamental rework of the DW/BI project.
One thing is certain in the BI space: The nature of the user-facing BI tools cannot be predicted. In the future, what's going to be more important: data mining, predictive analytics, delivery to mobile devices, batch reporting, real-time alerts, or something we haven't thought of yet? Fortunately, we have a good answer to this question: We must recognize that the enterprise data warehouse is the single platform for all forms of business intelligence. This viewpoint makes us realize that the enterprise data warehouse's interface to all forms of BI must be agnostic, simple, and universal.
Dimensional modeling meets these goals as the interface to all forms of BI. Dimensional schemas contain all possible data relationships, but at the same time can be processed efficiently with simple SQL emitted by any BI tool. Even the flat files preferred by data mining tools are easily delivered from dimensional models.
Enterprise-wide integration has risen to the top of the list of DW/BI technical drivers along with data quality and data latency. Dimensional modeling provides a simple set of procedures for achieving integration that can be effectively used by BI tools. Conformed dimensions enable BI tools to drill across multiple business process subject areas, assembling a final integrated report. The key insight is that the entire dimension (customer, for example) does not need to be made identical across all subject areas. The minimum requirement for a drill-across report is that at least one field be common across multiple subject areas. Thus, the EDW can define a master enterprise dimension containing a small but growing number of conformed fields. These fields can be added incrementally over time. In this way, we reduce the risk and cost of enterprise integration at the BI interface. This approach also fits well with our recommendation to develop the DW/BI system incrementally.
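The drill-across mechanics described above can be sketched concretely. The following is an illustrative example, assuming two hypothetical fact tables (sales and inventory) that share one conformed field, category, in a product dimension; all table and column names are invented for the sketch.

```python
# Drill-across sketch: query each business process fact table
# separately by the conformed row header, then merge the answer sets.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE sales_fact (product_key INTEGER, revenue REAL);
CREATE TABLE inventory_fact (product_key INTEGER, qty_on_hand INTEGER);
INSERT INTO product_dim VALUES (1, 'widgets'), (2, 'gadgets');
INSERT INTO sales_fact VALUES (1, 100.0), (2, 250.0), (1, 50.0);
INSERT INTO inventory_fact VALUES (1, 30), (2, 5);
""")

# Step 1: each subject area is queried on its own, grouped by the
# conformed field (category). Only this field must agree across areas.
sales = dict(con.execute("""
    SELECT d.category, SUM(f.revenue)
    FROM sales_fact f JOIN product_dim d USING (product_key)
    GROUP BY d.category"""))
stock = dict(con.execute("""
    SELECT d.category, SUM(f.qty_on_hand)
    FROM inventory_fact f JOIN product_dim d USING (product_key)
    GROUP BY d.category"""))

# Step 2: assemble the final integrated report on the conformed header.
report = {c: (sales.get(c, 0), stock.get(c, 0))
          for c in sorted(sales | stock)}
print(report)  # {'gadgets': (250.0, 5), 'widgets': (150.0, 30)}
```

Notice that the two fact tables are never joined directly; each is summarized to the conformed row header first, which is what lets the common fields be added incrementally over time.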
In article 11.19, An Architecture for Data Quality, I describe an effective approach to managing data quality by placing data quality screens throughout the data pipelines leading from the sources to the targets. Each data quality screen is a test. When the test fails or finds a suspected data quality violation, the screen writes a record in an error event fact table—a dimensional schema hidden in the back room away from direct access by business users. The error event fact table lets DW/BI administrators measure the volume and source of the errors encountered. A companion audit dimension summarizes the error conditions and is exposed to the business users along with every dimensional fact table.
The data quality screens can be implemented one at a time, allowing development of the data quality system to grow incrementally.
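A minimal sketch of the screen mechanism may help. The screen names, record layout, and source name below are invented for illustration; the point is only the pattern: each screen is an independent test, and a failure writes a row to the error event fact table rather than halting the load.

```python
# Each data quality screen is a test applied to records in the ETL
# pipeline; failures are logged as rows in an error event fact table
# (a back-room table, hidden from business users).
from datetime import datetime, timezone

error_event_fact = []  # one row per screen failure

def run_screen(screen_name, test, record, source):
    """Apply one screen to one record; log an error event on failure."""
    if not test(record):
        error_event_fact.append({
            "screen": screen_name,
            "source": source,
            "record": record,
            "ts": datetime.now(timezone.utc).isoformat(),
        })

incoming = [
    {"customer_id": 17, "zip": "94025"},
    {"customer_id": None, "zip": "10001"},  # fails the not-null screen
    {"customer_id": 23, "zip": "9402"},     # fails the zip-length screen
]

for rec in incoming:
    run_screen("customer_id_not_null",
               lambda r: r["customer_id"] is not None, rec, "crm_feed")
    run_screen("zip_is_5_digits",
               lambda r: len(r["zip"]) == 5, rec, "crm_feed")

print(len(error_event_fact))  # 2 error events recorded
```

Because each screen is self-contained, new screens can be added one at a time, which is exactly what allows the data quality system to grow incrementally.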
Finally, a seemingly small recommendation to reduce your DW/BI development risk: Make sure to build all your dimensions (even type 1 dimensions) with surrogate primary keys. This insulates you from surprises downstream when you acquire a new division that has its own ideas about keys. What's more, all your databases will run faster with surrogate keys.
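The surrogate key recommendation can be illustrated with a toy key map. This is a sketch under simplifying assumptions (an in-memory dictionary standing in for the warehouse key-assignment table); the division and customer identifiers are hypothetical.

```python
# Surrogate key assignment in the ETL back room: fact rows carry the
# warehouse-assigned integer key, never the source system's natural key.
surrogate_map = {}  # (source_system, natural_key) -> surrogate key
next_key = 1

def assign_surrogate(source_system, natural_key):
    """Return the surrogate key for a natural key, minting one if new."""
    global next_key
    k = (source_system, natural_key)
    if k not in surrogate_map:
        surrogate_map[k] = next_key
        next_key += 1
    return surrogate_map[k]

# Two divisions both use customer number "1001" -- no collision,
# because the surrogate key generator, not the source system, is
# the authority on warehouse keys.
a = assign_surrogate("div_east", "1001")
b = assign_surrogate("div_west", "1001")
print(a, b)  # 1 2
```

When a newly acquired division arrives with its own ideas about keys, only the key map grows; the fact tables and existing dimension rows are untouched.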