270 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition
Choose the one that best matches your situation. A checklist item can have
multiple answers because there might be different answers for different data
sources and tables in your project.
The checklist has the following sections:
• Determining your data ingest pattern
• Transformations involving the target database
• Data volume and latency
• Populating summary (or aggregate) tables
Determine your ETL pattern
In an operational data warehouse environment, the luxury of having an offline
window at the end of each day to process data in large batches is not always
available. Instead, data is presented for processing at frequent intervals
during the day and must be ingested online without affecting the
availability of data to the business. The ingest patterns can be described as
follows:
• Continuous feed
Data arrives continually in the form of individual records from a data source or
data feed using messaging middleware (or by OS pipe, or through SQL
operations). The ETL processes run continuously and ingest each insert and
update as it arrives. Thus, new data is constantly becoming available to
business users rather than at fixed intervals.
• Concurrent batch (“intra-day batch”)
Several times a day, data is extracted from a source system and prepared for
ingesting into the target database. The ETL processes data in batches (files)
as they arrive or on a schedule. The target table is updated at scheduled
intervals, ranging from twice a day to every 15 minutes.
• Dedicated batch window (“daily batch”)
After the close of the business day (for example, 5 p.m.), data is extracted
from a source system and prepared for ingesting into the target database.
The ETL application populates the target project table during a dedicated,
scheduled batch window (for example, 5 p.m. to midnight).
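The three patterns above differ mainly in what triggers an ingest run. The following Python sketch illustrates that difference only; it is not taken from the product. The names IngestPattern and should_ingest, the 15-minute default interval, and the fixed 5 p.m. to midnight window are all assumptions chosen to mirror the examples in the text.

```python
from datetime import datetime, time
from enum import Enum


class IngestPattern(Enum):
    CONTINUOUS_FEED = "continuous feed"
    CONCURRENT_BATCH = "concurrent (intra-day) batch"
    DEDICATED_BATCH_WINDOW = "dedicated (daily) batch window"


# Assumed window start, mirroring the "5 p.m. to midnight" example in the text.
WINDOW_START = time(17, 0)


def in_batch_window(now: datetime) -> bool:
    """Return True when `now` falls between 5 p.m. and midnight."""
    return now.time() >= WINDOW_START


def should_ingest(
    pattern: IngestPattern,
    now: datetime,
    pending_records: int,
    minutes_since_last_run: int,
    interval_minutes: int = 15,  # assumed schedule for intra-day batches
) -> bool:
    """Decide whether an ingest run should start under the given pattern."""
    if pending_records == 0:
        return False  # nothing to ingest under any pattern
    if pattern is IngestPattern.CONTINUOUS_FEED:
        # Each insert or update is ingested as soon as it arrives.
        return True
    if pattern is IngestPattern.CONCURRENT_BATCH:
        # Batches are ingested on a schedule while the warehouse stays online.
        return minutes_since_last_run >= interval_minutes
    # Dedicated batch window: ingest only after the close of business.
    return in_batch_window(now)
```

For example, at noon with ten pending records and five minutes since the last run, only the continuous feed pattern would start a run; the concurrent batch waits for its interval to elapse, and the dedicated batch waits for the evening window.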
A given database or star schema (or even a given dimension or fact table) might be
populated using more than one pattern.
Although the pattern labels emphasize differences in latency, latency is neither
the only nor the primary difference. Each pattern requires a
somewhat different approach in articulating service level objectives and deciding
which ingest methods might be suitable.