Loading Architecture

Back in Chapter 2, we identified some key software architectural decisions the DBA must make. Principal among those were how many instances will comprise the data warehouse and will the data warehouse data be loaded in one step or two? In this chapter, we will focus on just the latter issue—loading the data.

When loading data into a data warehouse, there are two options (shown in Figure 6-1):

  • Option 1— Load the data from the source directly into the data warehouse's query tables.

  • Option 2— Load the data from the source into a staging area first, then into the query tables.

Figure 6-1. Common Data Warehouse ETL Architectures


We'll refer to these approaches as transform, then load and load, then transform, respectively. We reviewed the pros and cons of these approaches back in Chapter 2. Now, we'll examine optimally implementing them using Oracle 8i and 9i in the sections that follow. The implementation options are somewhat different since the database versions' features are different (i.e., newer Oracle versions tend to offer newer, better data warehousing solutions).

We also reviewed back in Chapter 2 that data loading programs must be designed to utilize SMP/MPP multi-CPU architectures, otherwise CPU usage may not exceed 1/No. of CPUs. The Golden Rules are very simple:

  • Minimize inter-process wait states.

  • Maximize total concurrent CPU usage.

Our goal will be to achieve a parallel loading architecture something like Figure 6-2.

Figure 6-2. True Parallel ETL Processing via Forking


However, we must first examine two other criteria before we can settle on our final data loading architecture: upstream data sources and data transformation requirements.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset