Chapter 17

Creating a Data Flow

This lesson covers the basics of the Data Flow Task. Section 3 comprises lessons covering the sources, destinations, and transformations in detail. However, this lesson gives you the tools to get started creating a Data Flow and understanding its purpose.

The Data Flow Task is used to transfer data from a source to a destination and can transform the data as needed. The Data Flow Task is capable of handling large amounts of data. The source and destination can be any of several different types, such as a flat file or database, and can contain millions of rows of data. The destination can also be of several types.

You can use the Data Flow Task to extract data from a database and write to a flat file location or to move the data from a flat file to a database. This capability enables you to receive files from outside sources and write this data to your database. You can also use the Data Flow Task to move data from one database to another.

The transforms that exist in the Data Flow enable you to make changes to the data as you move it from a source to a destination. For example, if you are receiving flat files from a vendor and the data is not formatted correctly (say, the Social Security numbers need to have dashes) you can fix that before writing it to a database. Fixing things like that prior to writing to your database prevents you from having to run updates on your database later.

Additionally, these transforms are faster in SSIS. SSIS performs the transforms in memory, which is why it is much faster than reading and writing the data to a drive. This speed is especially evident in the case of running updates to a table. SQL update commands read data from a database and write data back to the same database. This reading and writing to the same location makes the process very slow compared to SSIS.

The Data Flow enables you to save data to multiple locations simultaneously, which also improves performance when you are saving data. You can receive a flat file from a vendor, open it with an SSIS package, parse through the data, make decisions on where data needs to be written, and write the data to multiple locations, all in one Data Flow.

You have two ways to create a Data Flow in a package. You can drag out the Data Flow Task from the Toolbox, or you can click the Data Flow tab at the top and click the blue link in the middle of the screen. This link states, as shown in Figure 17-1, “No Data Flow tasks have been added to this package. Click here to add a new Data Flow task.”

If there is already a Data Flow in the package, clicking the Data Flow tab shows that Data Flow. If multiple Data Flows are in a package, you see a drop-down menu at the top of the Data Flow screen showing the list of all the Data Flows in the package. It is a best practice to give the Data Flows a name that is descriptive. With descriptive names, you can then easily select the correct Data Flow you are trying to alter. Descriptive naming is a major help when your package contains a large number of Data Flows.

After you enter the Data Flow tab by either method previously mentioned, the Toolbox will contain a new set of tools. The top section contains the Favorites, the middle contains the common transforms, and the bottom contains the source and destinations. These tasks can be used only in the Data Flow Task and cannot be used in the Control Flow screen. You can move any item in the Toolbox to the Favorites section by right-clicking on it and clicking Move to Favorites.

Several sources have the same type as a destination. For example, there is an Excel Source and an Excel Destination. These tasks are not interchangeable. A source can only read data and a destination can only write data. Keep in mind that any connections you add to the Connection Managers in a package can be reused in other Data Flows or even in the Control Flow of the package (connection managers are explained in Lesson 6). So, a source in a Data Flow can connect to an Excel file, and an Excel Destination can connect to the same Excel file. The connection exists just once in the connection manager but can be used multiple times in a package.

Once you drag a source into a Data Flow, two lines appear from the bottom of the task. The blue line is the good data. The red line is the bad data. Sources and destinations, including how to use these blue and red lines, are explained in detail in the lessons following this one. Double-clicking the source opens the source editor for that source type. In the editor, you can select the location of the source.

After your source is established, you can connect it to a transform from the transformation section of the Toolbox. This transform can manipulate the data to be in the form you want for your destination. The transform can also make complex decisions to send data to different destinations.

Once the sources and transforms are created, you can drag out a destination and connect the last step of the transforms to the destination. If the Data Flow does not contain any transforms, the source can be connected directly to the destination, which simply moves data from one location to another without changing the data. This arrangement is the simplest form of a Data Flow.

Try It

In this Try It, you create a package with a Data Flow Task. The Data Flow is going to move data from a SQL database table to a flat file. After this lesson, you will have an understanding of how to create a Data Flow Task in the Control Flow of a package.

Lesson Requirements

You need to create a Data Flow in a package and create a source and a destination. The source is going to be an OLE DB connection to the AdventureWorks2012 database to the Products table. The destination is going to be a flat file you create.

You can download the completed Lesson17.dtsx from www.wrox.com.

Hints

  • You need only one Data Flow Task for this example.
  • The Data Flow needs a source and a destination.
  • The package needs two connections in the connection manager.

Step-by-Step

1. Drag a Data Flow Task into a blank package.
2. Double-click the Data Flow Task to enter the Data Flow tab.
3. Drag in an OLE DB Source.
4. Double-click the OLE DB Source to open the Source Editor.
5. Click the New button. The Configure OLE DB Connection Manager dialog box opens (see Figure 17-2). Select the source connection to AdventureWorks2012 and click OK. Note: If the connection exists in this window, you can skip steps 6–8.
6. If the AdventureWorks2012 connection is not shown in the Configure OLE DB Connection Manager dialog box, click the New button, which takes you to the Connection Manager dialog box (see Figure 17-3).
7. Set the Server Name to the location of the server with the AdventureWorks2012 database. Usually this name is LocalHost if the server is on your machine.
8. Select AdventureWorks2012 in the Select or enter a database name drop-down and click OK twice.
9. In the OLE DB Source Editor, click the drop-down menu of tables and select the Production.Product table.
10. Click the Columns node in the left-hand pane. You should see columns in the Products table.
11. Click OK to close the Source Editor.
12. Right-click the OLE DB Source and select Rename.
13. Change the Name to AW Products.
14. Drag a Flat File Destination into the Data Flow.
15. Connect the blue line from the bottom of the AW Products Source to the Flat File Destination.
16. Double-click the Flat File Destination to open the Flat File Destination Editor.
17. Click the New button.
18. Select Delimited in the Flat File format window and click OK.
19. In the Flat File Connection Manager Editor, change the name to AW Products Extract.
20. Change the Description to All AW product data.
21. Type C:AWProducts.txt in the File Name text box.
22. Click OK to close the Connection Manager.
23. Click the Mappings node in the left-hand pane; the mappings should look like Figure 17-4.
24. Click OK to close the Flat File Destination Editor.
25. Right-click the Flat File Destination and click Rename. Change the name to AW Products Extract File. The Data Flow should match Figure 17-5.
26. Click the blue debug arrow on the toolbar. A new file will be created in your C: drive containing all the data from the product table. (If you do not see a file, you may not have rights to create a file and may need to start SQL Server Data Tools MD in administrative mode.)
27. Click the Stop Debug button on the toolbar and look at the contents of the file on your C: drive.
28. You can delete the text file when you are done viewing it.

Please select Lesson 17 on the DVD, or online at www.wrox.com/go/ssis2012video, to view the video that accompanies this lesson.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset