Section 10

Loading a Warehouse

  • Lesson 60: Dimension Load
  • Lesson 61: Fact Table Load

Chapter 60

Dimension Load

If you have a data warehouse, you’ve probably been thumbing through this book looking for a way to load your dimension tables. Luckily, what used to take thousands of lines of code is now done with a simple wizard in SSIS. This Slowly Changing Dimension (SCD) Wizard is a Data Flow object that takes all the complexity out of creating a load process for your dimension table.


NOTE This lesson does not cover a Step-by-Step example on how to build a data warehouse from a design perspective because that is a book in itself.

Before we discuss the Slowly Changing Dimension Wizard, you must understand a bit of terminology. The wizard can handle three types of dimensions: Type 0, Type 1, and Type 2. Each of these types is defined on a column-by-column basis.

  • A Type 0 (Fixed Attribute) dimension column does not allow you to make updates to it, such as a Social Security number. Even if the source value changes, the change is not propagated to a fixed attribute.
  • A Type 1 (Changing Attribute) dimension handles updates, but does not track the history of such changes.
  • A Type 2 (Historical Dimension) dimension tracks changes of a column. For example, if the price of a product changes and it’s a Type 2 column, the original row is expired, and a new row with the updated data is created.

The last term you need to be familiar with is inferred members (also called late arriving dimension members). These happen when you load a fact table and the dimension data doesn’t exist yet, such as if you are loading a sale record into the fact table when the product does not exist. Perhaps you get the product data from one server and the sales information from another. The server with the product data was unavailable, but the sales server was available. You, therefore, imported sales information, but were unable to update the product table. There may have been a sale for a product that you were unable to load. In that case, your fact load should create a dimension stub record in the dimension table. When the dimension record finally comes from the source, the transform updates the dimension as if it were a Type 1 dimension, even if it’s classified as a Type 2.

To use the Slowly Changing Dimension Wizard, you should first create a Source and Destination Connection Manager, and then create a source component in your Data Flow. Then drag the SCD Transform onto the Data Flow window and connect it to the source. After connecting it to a source or another transform, double-click the transform to open the Slowly Changing Dimension Wizard. The first screen (Figure 60-1) specifies which destination you want to load. First, select the destination connection manager, then the destination table, and then map the source input columns to the target dimension columns. Lastly, select one key to be your business key (the primary key from the source system is sometimes called the alternate key or the business key).

In the next screen (shown in Figure 60-2), assign a type to each column. These are the slowly changing dimension types discussed earlier. The SCD Wizard calls dimension Type 0 a Fixed Attribute, Type 1 a Changing Attribute, and Type 2 a Historical Attribute. This example uses List Price as Historical. All others will be Changing.

If any of those columns are set to a Historical Attribute, then in a few screens (shown in Figure 60-3) you are asked how you want to expire the row and create a new row. The top section enables you to define a column where you can set a value to Expired, Active, or whatever value you want. The bottom section sets a start date and an end date column to a date system or user variable. Don’t worry—all of this can be customized later. The end date column you choose must allow nulls. Current columns will contain null and expired columns will have a non-null end date. All of this is managed by the SCD Task(s).

After you complete the wizard, the template code is created, and your dimension is ready to load. As the Data Flow Task runs, every row is checked to see if the row is a duplicate, new row, or a Type 1 or Type 2 update. Inferred members are also supported. All the code that you see can be customized, but keep in mind that if you change any code and rerun the wizard, the customization will be dropped and the template code will be recreated.

Try It

In this Try It, you learn how to use the Slowly Changing Dimension Wizard to load a new product dimension. You then make some changes to the source data and see the changes flow into the dimension table. After this lesson, you will understand how to load a dimension table using the Slowly Changing Dimension Wizard.

You can download the completed package and SQL scripts for this lesson from www.wrox.com.

Lesson Requirements

To complete this lesson, you must have permissions to create and drop tables from the AdventureWorks2012 and AdventureWorksDW2012 databases. To create and load the Lesson60ProductSource table and to create the Lesson60DimProduct table, run the Lesson60Create.sql script in Management Studio. (The Lesson60Create.sql creation script is available at www.wrox.com.) This script will load only four product rows, so it will be easy for you to see what is happening. You will use the Slowly Changing Dimension Wizard in your package. Your source table is Production.Lesson60ProductSource in the AdventureWorks2012 database. The destination table is Production.Lesson60DimProduct and is in AdventureWorksDW2012. The ListPrice column will be a Type 2 (Historical) dimension column. As a business requirement, you must replace null values in the Color column with the value Unknown. The Color column will be treated as a Type 1 (Changing) attribute. After you run the package, run Lesson60Update.sql to make changes to the source table. Then run the package again to propagate those changes to the destination dimension table.

Hints

  • Use the OLE DB Source to pull data out of the Production.Lesson60ProductSource table.
  • You can use the Derived Column Transform to change the Color column to Unknown if it is null.
  • Use the Slowly Changing Dimension Wizard to load the dimension.

Step-by-Step

1. Create a new package called Lesson60.dtsx. You may also download this package and all the scripts for this lesson, which are available at www.wrox.com.
2. Create a connection manager to the AdventureWorks2012 database called SourceDB and another connection manager to AdventureWorksDW2012 database called DestinationDB.
3. Create a Data Flow Task, and in the Data Flow tab, drag an OLE DB Source over. Point the OLE DB Source to the Production.Lesson60ProductSource table in the AdventureWorks2012 database.
4. Connect a Derived Column Transform and configure it to replace the Color column with the following expression:
ISNULL(Color) ? “Unknown” : Color
5. Drag the Slowly Changing Dimension Transform from the Toolbox and connect the transform to the Derived Column Transform. Open the wizard and go to the Mappings page (shown in Figure 60-1). The Connection Manager property should be set to DestinationDB, and the table should be dbo.Lesson60DimProduct (this table is created by Lesson60Create.sql). Map all the columns by name, but the Name column from the source left side should map to Product Name in the dimension. The ProductID in the input column will map to the Business Key ProductID in the dimension column. Additionally, the Business Key ProductID should be set to the Business key. As you map columns, you may notice that spaces are added in the dimension to make them more user-friendly.
6. The next screen is the Slowly Changing Dimension screen where you assign a dimension type to each column. Set each column to a Changing Attribute except for the List Price, which should be a Historical Attribute, as shown back in Figure 60-2.
7. Click Next to go to the Fixed and Changing Attribute Options screen. You will not need to change any options on this screen, but you can take a look at what is available. On this screen, the Fail the transformation if changes are detected in a fixed attribute option tells the transform how to handle Type 0 changes. When checked, the task will fail if it detects that a fixed attribute has changed. There are no fixed attributes in the example. The second option on this screen applies to changing attributes. The option is Change all matching records, including outdated records, when changes are detected in a changing attribute. A single table may contain both Changing and Historical Attributes. Each time a Historical Attribute changes, a new version of the row is created. What should happen when a fixed attribute changes, especially when that fixed attribute is contained in all historical versions of the record? When checked, the fixed attribute change will automatically be propagated to all versions of the record.
8. Click Next to go to the Historical Attribute screen (shown back in Figure 60-3). Select the Use start and end dates to identify current and expired records option. Set the Start date column box to Effective Start Date and set the End date column box to Effective End Date. Set the Variable to set date values box to System::ContainerStartTime.
9. For the remainder of the screens, you can keep the default options. Click Next, and then click Finish to finish the wizard. Run the package, and the results should look like Figure 60-4.
10. Run Lesson60Select.sql and see that the rows were copied from the source to the destination. Null colors were changed to Unknown and effective Start and End Dates were set. Figure 60-5 shows these results.
11. Run Lesson60Update.sql to make changes to the underlying data. This made a change to a Type 1 column, color was changed to Blue on ProductID = 1. A change was made to a Type 2 column, ListPrice was changed to $2.00 on ProductID = 2. ProductID = 3 was deleted and ProductID = 600 was inserted. All these changes were made to the source. The script will show you the new source rows and the current destination dimension rows, as in Figure 60-6. The top table is the source with changes, and the bottom table is the current destination.
12. Run your package again. The results should look like Figure 60-7. Only two rows were inserted and one row was updated.
13. Now run Lesson60Select.sql one more time and see the results of your new package. Be sure to save your package before you exit. Figure 60-8 shows the results. You should see the Changing Attribute color has been changed to blue in place. The Historical Attribute List Price was changed for the Ball Bearing product. It changed from $0.00 to $2.00. You will see the original row with $0.00 as well as the new row. Remember the purpose of Historical Attributes is to retain the history of changes. The new product was added to the dimension. BB Ball Bearings was deleted from the source, but remains in the dimension. We chose to ignore deletions. Dimension members are rarely deleted, because there could be old historical facts that refer to them.

Please select Lesson 60 on the DVD, or online at www.wrox.com/go/ssis2012video, to view the video that accompanies this lesson.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset