One of the key mistakes people make is labeling a database as a data warehouse based solely on its size. Over the past decade, three phenomena have combined to drive major increases in average database size:
The cost of storage relative to the value of the data has decreased.
Companies now value the data as a critical business asset.
Companies have merged into large multi-national entities.
In other words, the cost of keeping data online is cheap, the perceived value of that data is now very high, and the size of companies and their data needs have grown. As such, many of today's OLTP and ODS databases routinely grow into the 100–800 gigabyte (GB) range. But that does not make them data warehouses. For example, SAP and PeopleSoft enterprise resource planning (ERP) databases of 400 GB or more are not uncommon, yet they are not data warehouses, even at these extremely large sizes. Remember, size alone does not a data warehouse make.
The simplest way to avoid mislabeling a large database as a data warehouse is to add some DBA-centric questions and answers to the description of that database. For each subject area in the database, simply ask the physical DBA to provide estimates for the following seven items:
The number of tables
Average big table row count
Average big table size in GB
Largest table's row count
Largest table's size in GB
Largest transaction rollback needed in GB
Largest temporary segment needed in GB
Data warehouses generally have fewer, larger tables, whereas non-data warehouse databases usually possess more, smaller tables. Of additional interest are the temporary and rollback segment needs of the database. Data warehouses tend to need them as large as the largest object (for rebuilds), whereas non-data warehouse databases only need them large enough for the largest transaction.
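These seven estimates are easy to capture in a small structure for comparison. The sketch below is purely illustrative: the class and field names are my own invention, not a standard API, and the final check encodes the rule of thumb above (rollback and temp segments sized for a rebuild of the largest object suggest a warehouse, not an OLTP system):

```python
from dataclasses import dataclass

@dataclass
class SubjectAreaProfile:
    """Hypothetical container for the seven DBA estimates (names are mine)."""
    table_count: int
    avg_big_table_rows: int
    avg_big_table_gb: float
    largest_table_rows: int
    largest_table_gb: float
    rollback_needed_gb: float   # largest rollback needed (e.g., table rebuild)
    temp_needed_gb: float       # largest temp segment needed (e.g., index rebuild)

    def segments_sized_for_rebuilds(self) -> bool:
        # Warehouses need rollback roughly as large as the largest object;
        # OLTP/ODS databases only need room for the largest transaction.
        return self.rollback_needed_gb >= self.largest_table_gb
```

With warehouse-sized figures such as a 160 GB rollback need against a 160 GB largest table, the check returns `True`; with an OLTP profile of a few hundred MB of rollback against tens of GB of data, it returns `False`.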
Use the criteria outlined in Table 1-2 for your evaluation.
| | OLTP | ODS | OLAP | DM/DW |
|---|---|---|---|---|
| Number of Tables | 100s–1,000s | 100s–1,000s | 10s–100s | 10s–100s |
| Average Table's Row Count | 10s of thousands | 10s of thousands | 10s–100s of millions | 100s–1,000s of millions |
| Average Table's Size | 10s of MB | 10s of MB | 10s of GB | 10s–100s of GB |
| Largest Table's Row Count | 10s–100s of millions | 10s–100s of millions | 10s–100s of millions | 100s–10,000s of millions |
| Largest Table's Size | 10s of GB | 10s of GB | 10s of GB | 10s–100s of GB |
| Rollback Segment's Size | 100s of MB | 100s of MB | N/A | 10s–100s of GB |
| Temp Segment's Size | 100s of MB | 100s of MB | N/A | 10s–100s of GB |
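Table 1-2's bands can even be turned into a rough classifier. This is only a sketch under my own simplifying assumptions: the function name and the exact cutoffs are mine, chosen to approximate the table's order-of-magnitude ranges, and a real evaluation would weigh all seven items for each subject area.

```python
def looks_like_data_warehouse(table_count: int,
                              avg_big_table_rows: int,
                              largest_table_rows: int,
                              rollback_gb: float,
                              largest_table_gb: float) -> bool:
    """Rough classifier approximating Table 1-2 (cutoffs are my own)."""
    signals = 0
    if table_count < 100:                    # DW/DM: 10s of tables, not 1,000s
        signals += 1
    if avg_big_table_rows >= 100_000_000:    # 100s of millions of rows on average
        signals += 1
    if largest_table_rows >= 1_000_000_000:  # billions of rows in the largest table
        signals += 1
    if rollback_gb >= largest_table_gb:      # segments sized for rebuilds, not transactions
        signals += 1
    return signals >= 3                      # most indicators point at a warehouse

# The example figures from this section: 8 tables, 500 million rows in the
# average big table, 2 billion in the largest, 160 GB rollback vs. a 160 GB table.
print(looks_like_data_warehouse(8, 500_000_000, 2_000_000_000, 160.0, 160.0))  # True
```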
Continuing with our previous example, suppose your requirements are as follows:
8 tables
500 million rows per big table
50 GB per big table
2 billion rows for largest table
160 GB for largest table
160 GB to rebuild largest table
60 GB to rebuild largest index
From this example, we can again discern that we have a data mart or data warehouse. First, we have very few tables; a typical OLTP or ERP database would have hundreds or even thousands of tables. Second, the row counts of our average big table and our largest table have the right order of magnitude: row counts expressed with lots of zeros, or as powers of ten with exponents of ten or more (e.g., 10^10), are more likely to be found in data warehouses. Finally, look at our rollback and temporary segment needs: they're as big as some entire databases!
While it may seem that I've once again tailored the example to fit the conclusion, I've actually found the process to be this straightforward in most real cases. Unfortunately, these days people tend to call any very large database a data warehouse. Once again, it's fine for people to call their projects whatever they like. But as pointed out, the techniques in this book apply only to the DM/DW column of Table 1-2.