Chapter 53. SQL Server Audit, change tracking, and change data capture

Aaron Bertrand

SQL Server 2008 introduces three new features for tracking data and object changes within the database: change tracking, change data capture, and SQL Server Audit. Based on the marketing information alone, it might be difficult to determine which, if any, would be useful for your specific scenario. In this chapter, I will compare the features, outline their pros and cons, and try to help you to decide which solution or solutions might work best in your environment.

What are these solutions used for?

Since shortly after the very first database was persisted to disk, database administrators everywhere have been asking questions like, “Who did what to my data, and when did they do it?” It is in our nature to be curious about the way users utilize our systems in general. In some specific cases, it is important to track down exactly when a piece of data was changed, and who was responsible for the change.

Before talking about the new features introduced in SQL Server 2008, it will be useful to briefly outline some of the key motivations for implementing an auditing or change tracking solution in the first place.

One reason is compliance enforcement. Depending on your industry, you may be required to adhere to regulations for data access and privacy, for example Sarbanes-Oxley, HIPAA, or PCI-CISP. Even if you are not required to do so by law, you may still have an interest in recording information about all of the activity on your server. (For example, you may be interested in identifying specific activity, such as a delete operation that nobody will take credit for, without resorting to a log reader utility.) In the past, this could be accomplished by placing the server in C2 Audit Mode, which is an all-or-nothing approach, or by using SQL Trace. Both of these solutions can be overkill if, for example, you only care about queries against the EmployeeSalary table, or updates to the OrgChart table. Although you could filter a SQL Trace to reduce the overhead, there are some scenarios that could lead you to miss events.

A related reason for implementing an auditing or change tracking solution is to answer the simple question, “Who is accessing which data in your system?” For a long time, users have been asking for a SELECT trigger that would allow them to log all (or selected) database access to a table or to a file. Auditing products can do this for you, but there is no universal way, with SQL Server alone (at least through SQL Server 2005), to pick and choose which table you want to audit for this purpose.

Another reason is to observe data trends, in the case where you don’t have a nice big data warehouse to play with. Sometimes you will want to monitor values over time, for example the changing price of gasoline you are charging over the period of a week, so that you can correlate that with daily revenue during the same time frame. If your application is not currently set up to track values historically, then all you have at any given time is the current price. You can set this up manually with triggers, but it can be cumbersome to set up and maintain.

Yet another motivation for row-level auditing is for incremental data loading. You may want to mirror incremental changes to other environments, such as a data warehouse or a data cache server, without the overhead of database mirroring, replication, or needing to determine the deltas between large tables at runtime. In SQL Server 2005 you can accomplish this with database snapshots, but these place several limitations on the source database, such as requiring both source and snapshot to be online perpetually (and reside in the same instance), and reduced performance due to additional I/O operations to maintain the snapshot. Database snapshots also apply to the whole database and cannot be targeted at specific tables.

And finally, your semi-connected applications will often need to determine if potential updates are still valid, and can be applied without conflict. Ideally, your application should be able to ask the database a simple question like, “Has the row I’m working on been changed?” Currently, this is usually assisted by a ROWVERSION column, but this gives you an extra column in your table that can be difficult to work with. Some would suggest using merge replication for semi-connected applications, and although that is a valid alternative in some cases, it can be overbearing to set up and maintain for the average user.

What do people do now?

Before SQL Server 2008, people would use third-party tools, or write code themselves (usually with triggers), for auditing or change-tracking purposes. An article written several years ago, titled “How do I audit changes to SQL Server data?” shows how to use a simple trigger to maintain a log of user activity (INSERT, UPDATE, and DELETE statements only) against a specific table: http://sqlserver2000.databases.aspfaq.com/how-do-i-audit-changes-to-sql-server-data.html. Here are some third-party tools that are commonly used for implementing change tracking and auditing:

Some of these are either not enough or overkill, depending on the purpose of the implementation. Over time, SQL Server customers have been clamoring for some of the functionality found in these tools to be included with the product. Microsoft has answered by providing these three new features with SQL Server 2008.

How does SQL Server 2008 solve these problems?

You can look at each of the three new features and determine which of the preceding issues it will help you solve. Assume you have a simple Employees table, which you want to make available to the payroll system, and which you want to monitor for changes by unauthorized individuals. Listing 1 shows how to prepare the schema and data.

Listing 1. Preparing Employees schema and data
USE [master];
GO

CREATE DATABASE [HR];
GO

USE [HR];
GO

CREATE TABLE dbo.Employees
(
EmployeeID INT PRIMARY KEY,
FirstName NVARCHAR(32),
LastName NVARCHAR(32),
Salary DECIMAL(13, 2)
);
GO

INSERT dbo.Employees
(
EmployeeID,
FirstName,
LastName,
Salary
)
VALUES
(1,'John','Elway',300000),
(2,'Sam','Adams',150000),
(3,'Don','Mattingly',125000),
(4,'Teemu','Selanne',113500),
(5,'Aaron','Bertrand',62750);
GO

SQL Server Audit

SQL Server Audit adds a much easier and more flexible manner of auditing database engine events, compared to utilizing SQL Trace or the C2 Auditing feature available in previous versions of SQL Server. There is an underlying framework that allows you to configure audits at the server or database level, all using relatively straightforward T-SQL commands or the graphical user interface (GUI) components in SQL Server Management Studio. The auditing itself is built on top of the new Extended Events infrastructure, and can write audit records either to a file or to the Windows event log (Application or Security). The overhead of Extended Events is advertised as significantly lighter than the overhead of SQL Trace, but the impact of auditing on your system depends wholly on exactly what and how much you audit.

Reviewing the audited events is simple as well. For file targets, you can use the Log File Viewer utility inside Management Studio, or a new function included in SQL Server specifically for this purpose (fn_get_file_audit). If you target the Windows event log, you can access this programmatically through existing APIs, or through the event log viewer included in Windows. Note that this feature is available in Enterprise and Developer Editions only. (For a more thorough treatment of SQL Server Audit, you can start at the “Understanding SQL Server Audit” Books Online topic, located at http://msdn.microsoft.com/en-us/library/cc280386.aspx.)

The list of actions you can monitor with SQL Server Audit is expansive. At the server level, there are 36 action groups that are eligible for tracking with a SQL Server Audit, and at the database level, there are 15. (An action group is a set of actions.) An example of a server level action group is the DBCC_GROUP, which is raised any time a user executes any DBCC command. At the database level, an example of an action group is the SCHEMA_OBJECT_CHANGE_GROUP, which is raised whenever a user issues a CREATE, ALTER, or DROP operation against a schema. There are also seven individual actions at the database level: SELECT, UPDATE, INSERT, DELETE, EXECUTE, RECEIVE, and REFERENCES. And finally, you can audit an audit, using the audit-level action group AUDIT_CHANGE_GROUP. You can learn more about these actions and action groups in the Books Online topic, “SQL Server Audit Action Groups and Actions,” located at http://msdn.microsoft.com/en-us/library/cc280663.aspx.

Now, for this table, if you wanted to monitor changes, you could do so by setting up an audit to the Application Log, and then set up a database audit specification for this table directly. Listing 2 shows the code to do this.

Listing 2. Creating a server and database audit
USE [master];
GO

CREATE SERVER AUDIT ServerAudit
TO APPLICATION_LOG
WITH ( QUEUE_DELAY = 1000, ON_FAILURE = CONTINUE );
GO

ALTER SERVER AUDIT ServerAudit
WITH ( STATE = ON );
GO

USE [HR];
GO

CREATE DATABASE AUDIT SPECIFICATION HRAudit
FOR SERVER AUDIT ServerAudit
ADD ( SELECT, INSERT, UPDATE, DELETE ON dbo.Employees BY [public] )
WITH ( STATE = ON );
GO

When reviewing audit information (whether in a file, or in the event log), there is a variety of information available to you, including the time of the action, the session_id (SPID) of the user that performed the action, the database, server and object that was the target of the action, and whether or not the action succeeded. For a full listing of the columns written to an audit row, see the Books Online topic, “SQL Server Audit Records,” located at http://msdn.microsoft.com/en-us/library/cc280545.aspx. I was disappointed to see that host name and/or IP address of the session_id is not recorded. This can be important information in some instances, and is difficult to determine after the session has disconnected from the server. For example, if SQL Authentication is enabled, and the sa (or another sysadmin) password is commonly known, then anyone can connect that way via their own machine, and be relatively untraceable.

Another important note here is that the type of action (for example, SELECT or ALTER) is recorded, but in the case of SELECT or DML queries, none of the data involved is included. For example, if you run the statement in listing 3, the event log entry will look like listing 4 (I’ve left out some of the columns).

Listing 3. Updating the Employees table
UPDATE dbo.Employees
SET Salary = Salary * 1.8
WHERE EmployeeID = 5;
Listing 4. Event log entry for the UPDATE command in listing 3
Log Name:      Application
User: N/A
Event ID: 33205
Audit event: event_time:2008-10-05 18:14:31.3784745
action_id: UP
session_id: 56
session_server_principal_name: sa
server_instance_name: SENTINELSQL2008
database_name: HR
schema_name: dbo
object_name: Employees
statement: UPDATE [dbo].[Employees] set [Salary] = [Salary]*@1 WHERE
[EmployeeID]=@2

(Note that you may also see other events in the event log corresponding to the auditing activity itself.)

Because the literal values in the statement are replaced by parameter placeholders, and because the previous version of the data is not included, it is going to be difficult to find the entries where salaries were increased. Also, the information in the event log entry does not include the host name and/or IP address of the computer that issued the statement. So, if you are using SQL Authentication and your developers share a single login, it will be difficult with auditing alone to figure out who performed this update. You can work your way around this by adding the SUCCESSFUL_LOGIN_GROUP to the Server Audit Specification, as shown in listing 5.

Listing 5. Creating a Server Audit with the SUCCESSFUL_LOGIN_GROUP
USE [master];
GO

CREATE SERVER AUDIT SPECIFICATION CaptureLogins
FOR SERVER AUDIT ServerAudit
ADD ( SUCCESSFUL_LOGIN_GROUP )
WITH ( STATE = ON );
GO

Once you do this, you will have login records in the log or file that you can correlate with session_id and event_time to the relevant database audit activity. The successful login entry will have (in addition to session_id and other data observed above) host name information in the following form, under the Additional information field:

Additional information:<action_info...>...
<address>local machine / host name / IP</address>

If you are using Windows Authentication, on the other hand, then this seems like a reasonable way to capture exactly who executed the statement (without having to correlate to login events), but not necessarily what values they passed in. Take the case where you find that Employee 5’s salary has been increased from $100,000 to $250,000. Three such events appear in the Application Log, from three different users, with the exact same UPDATE statement. The first could have updated the salary to $250,000, and the other two could have left it that way (by explicitly defining $250,000 in their UPDATE statement, even though it did not ultimately change the data in the table). Or, the increment could have been performed by the second or third person, or each person could have increased the salary by $50,000. There are millions of other possible permutations, and this is a simple case. Imagine trying to unravel this mystery on a busy system with thousands of simultaneous users all affecting the same table.

Before moving on to the next section, if you have created the sample code above, you can remove it using the code in listing 6.

Listing 6. Cleaning up the audit specification
USE [HR];
GO
IF EXISTS
(
SELECT 1
FROM sys.database_audit_specifications
WHERE name = 'HRAudit'
)
BEGIN
ALTER DATABASE AUDIT SPECIFICATION HRAudit
WITH (STATE = OFF);
DROP DATABASE AUDIT SPECIFICATION HRAudit;
END
GO
USE [master];
GO
IF EXISTS
(
SELECT 1
FROM sys.server_audit_specifications
WHERE name = 'CaptureLogins'
)
BEGIN
ALTER SERVER AUDIT SPECIFICATION 'CaptureLogins'
WITH (STATE = OFF);

DROP SERVER AUDIT SPECIFICATION 'CaptureLogins';
END
GO
IF EXISTS
(
SELECT 1
FROM sys.server_audits
WHERE name = 'ServerAudit'
)
BEGIN
ALTER SERVER AUDIT ServerAudit
WITH (STATE = OFF);

DROP SERVER AUDIT ServerAudit;
END
GO

Change tracking

Change tracking is a feature that adds the ability to determine, at a glance, which rows in a table have changed in a specified period of time. This can be useful for synchronizing data between the primary database and a middle-tier data cache, and for allowing semi-connected applications to detect conflicts when updates have been made on both sides. Change tracking is meant to allow you to identify the rows that changed, but does not keep any information about the values that were changed (for example, a previous version of the row). Change tracking occurs synchronously, so there is some overhead to the process. In general, the overhead is equivalent to the maintenance costs of adding an additional nonclustered index to the table.

The process assumes that you can always get the current version of the row directly from the table, and that you only care about knowing whether or not a row has changed. (Change tracking is described more in-depth in Books Online, starting at the topic, “Change Tracking,” at http://msdn.microsoft.com/en-us/library/cc280462.aspx.)

To set up change tracking on a table, the table must have a primary key, and you must first enable the feature at the database level. (Books Online also suggests that the database must be at least at a compatibility level of 90, and that snapshot isolation is enabled.) Using the HR database and the dbo.Employees table created in the previous section, you can enable change tracking as shown in listing 7.

Listing 7. Enabling change tracking
ALTER DATABASE HR SET ALLOW_SNAPSHOT_ISOLATION ON;

ALTER DATABASE HR SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 3 DAYS, AUTO_CLEANUP = ON);
GO

USE HR;
GO

ALTER TABLE dbo.Employees
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = OFF);
GO

At the database level, the option CHANGE_RETENTION indicates how long you keep information about rows that have changed. If the applications last checked for changed data before that period started, then they will need to proceed as if the entire table is brand new (so a caching application, for example, will need to reload the entire table and start from scratch). AUTO_CLEANUP is the option that specifies whether this periodic purging should take place, and it can be disabled for troubleshooting purposes. Although this sounds like something that requires SQL Server Agent, it is handled by an internal background task. It will work on all editions of SQL Server, including Express Edition, with or without SQL Server Agent enabled.

At the table level, the TRACK_COLUMNS_UPDATED option is used to specify whether the system should store information about which columns were changed, or store the fact that the row was changed. The former can be useful for an application that tries to synchronize or cache data from a table that contains both an INT column and a LOB column (for example, VARCHAR(MAX)). Instead of pulling an identical copy of the LOB column for a row that changed, it can ignore that column and keep its local copy if it knows that it was not a part of any update that has happened since it was last loaded.

Once change tracking is enabled, what an application can do is connect to the database, and determine the current baseline version of the table. This is a BIGINT value that is returned by calling the new function CHANGE_TRACKING_CURRENT_VERSION() (this represents the most recent committed transaction). Once the application knows this value, it can load all of the data from the table, and then can check for further updates later using the CHANGETABLE() function. This function will return a set of data representing any rows that have changed in the specified table since the baseline version retrieved above. The following is all in T-SQL, but you can envision how an application would use the same logic. Open two new query windows in Management Studio, connected to the HR database, and run the code in listing 8.

Listing 8. Determining (and updating) the baseline version of a table
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

-- Check the current baseline:
SELECT Baseline = CHANGE_TRACKING_CURRENT_VERSION();

-- Load the current version of the table:
SELECT EmployeeID, FirstName, LastName, Salary FROM dbo.Employees;

-- Now, switch to the second query window, and make some updates to the
table:

UPDATE dbo.Employees SET LastName = 'Kinison' WHERE EmployeeID = 2;

DELETE dbo.Employees WHERE EmployeeID = 5;

INSERT dbo.Employees
(
EmployeeID, FirstName, LastName, Salary
)
SELECT
6, 'Kirby', 'Quigley', 62500;

Listing 9 shows the code to retrieve the changes made to the Employees table. Replace <x> with the result from the baseline query in listing 8.

Listing 9. Retrieving changes to the Employees table
SELECT
NewBaseLine = CHANGE_TRACKING_CURRENT_VERSION(),
cv = SYS_CHANGE_VERSION,
ccv = SYS_CHANGE_CREATION_VERSION,
op = SYS_CHANGE_OPERATION,
EmployeeID
FROM CHANGETABLE(CHANGES dbo.Employees, <x>) AS ChT;

The results should look something like this:

NewBaseLine cv ccv   op EmployeeID
3 3 NULL U 2
3 2 NULL D 5
3 1 1 I 6

Now, the application can use this output to determine which of the following it needs to do:

  • Reload the data from the row for EmployeeID 2, because it has been updated since the last time it was loaded.
  • Load the new row for EmployeeID 6, because it has been created since the table was last loaded.
  • Remove the row for EmployeeID 5 from its local cache, because it has been deleted.
  • Record the new baseline value (3) as it will need to use this as the version to check against the next time it polls for changes.

Note that change tracking does not record any information about the user who made the change. It only records the fact that a row changed. But as with SQL Server Audit, there are ways around this limitation. SQL Server 2008 supports new syntax to allow change tracking to add some contextual information to the DML statement, allowing that data to be stored along with other details of the change. This was intended to provide an application the ability to differentiate between its own updates from those of other applications, but you can use it for whatever other devious purposes you can dream up. For example, as shown in listing 10, you can easily add information such as host name and user name using an INSTEAD OF UPDATE TRIGGER, by utilizing the new WITH CHANGE_TRACKING_CONTEXT() construct, in order to store information about users performing updates to your table.

Listing 10. Using WITH CHANGE_TRACKING_CONTEXT() in an INSTEAD OF trigger
CREATE TRIGGER dbo.AppendEmployeeUpdates
ON dbo.Employees
INSTEAD OF UPDATE
AS
BEGIN
SET NOCOUNT ON;

DECLARE @i VARBINARY(128);

SET @i = CONVERT
(
VARBINARY(128), SUSER_SNAME() + '|' + HOST_NAME()
);

WITH CHANGE_TRACKING_CONTEXT (@i)
UPDATE e
SET e.FirstName = i.FirstName,
e.LastName = i.LastName,
e.Salary = i.Salary
FROM dbo.Employees e
INNER JOIN inserted i
ON e.EmployeeID = i.EmployeeID;
END
GO

In this case, because you are not tracking individual column updates, you don’t have to worry about only updating those columns that have changed. If you do implement a solution where individual columns matter, you might want more complex logic such that the trigger only touches the base table columns that should now contain different values. And for an even more bulletproof trigger, you would also want to handle the case where the primary key might change (even though, in theory, this should never happen). You could do this in a stored procedure instead, if you can prevent direct updates to the table itself, and enforce all access via stored procedures. That is possible in some environments, but not all.

Once the trigger is in place, you can run the following UPDATE statement:

UPDATE dbo.Employees SET LastName = 'Malone' WHERE EmployeeID = 2;

And now when you call the CHANGETABLE function, as shown in listing 11, you can add a new column that will return that contextual information (assuming the existing baseline was 3 after the above statements).

Listing 11. Calling the CHANGETABLE function
SELECT
NewBaseLine = CHANGE_TRACKING_CURRENT_VERSION(),
[user|host] = CONVERT(NVARCHAR(128), SYS_CHANGE_CONTEXT),
cv = SYS_CHANGE_VERSION,
ccv = SYS_CHANGE_CREATION_VERSION,
op = SYS_CHANGE_OPERATION,
EmployeeID
FROM CHANGETABLE(CHANGES dbo.Employees, 3) AS ChT;

-- results:
NewBaseLine user|host cv ccv op EmployeeID
4 SENTINELAaron|SENTINEL 11 NULL U 2

Arguably, you could also use the trigger to store the old and new values off in a table somewhere for deferred analysis. But that would require you to manually create tables to capture all of that information, and come up with your own cleanup mechanism. And, without spoiling any surprises, you would be duplicating the functionality of another feature added in SQL Server 2008.

Before proceeding, you can disable change tracking on the HR database and the dbo.Employees table using the code in listing 12.

Listing 12. Disabling change tracking
USE [HR];
GO

ALTER TABLE dbo.Employees
DISABLE CHANGE_TRACKING;

ALTER DATABASE HR
SET CHANGE_TRACKING = OFF;

Change data capture

Change data capture (CDC) is similar to change tracking in that it captures information about changes to data. But the information it captures (and how) is significantly different. Instead of capturing the primary key for each row that has changed, it records the data that has changed, for all columns, or for the subset of columns you specify. It records all of the data for INSERTs and DELETEs, and in the case of UPDATEs, it records both the before and after image of the row. And it does this by periodically retrieving data from the SQL transaction log, so the activity does not interfere directly with your OLTP processes. It does require that SQL Server Agent is enabled and running.

The primary motivation for including CDC in SQL Server 2008 was to facilitate an easier process for extract, transform, and load (ETL) applications. Making all of the changed data available separately allows the application to pull only the updated data, without having to go to the base tables for the data (or to verify timestamp columns or perform expensive joins to determine deltas). You can investigate CDC in much more depth starting with the Books Online topic, “Overview of Change Data Capture,” located at http://msdn.microsoft.com/en-us/library/cc627397.aspx.

To set up CDC, you must be running Enterprise or Developer Edition, and you must enable it at the database level first, and then for each table you want to capture. Note that unlike SQL Server Audit and change tracking, CDC features are enabled and disabled via system stored procedure calls. Using the same HR database and dbo.Employees table as in previous sections, listing 13 shows the commands necessary to start capturing data changes.

Listing 13. Enabling a database and table for change tracking
USE HR;
GO

EXEC sys.sp_cdc_enable_db;
GO

EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'Employees',
@supports_net_changes = 1,
@role_name = NULL;
GO

The first two parameters to the enable_table stored procedure are self-explanatory, but the last two are not. The @supports_net_changes parameter dictates whether the changed data can be retrieved as a data set that includes one row per key value, summarizing all of the changes that took place in the indicated timeframe (in this way, it works similarly to change tracking, but you will also see the data in each column in addition to the primary key value). Note that to support net changes, the source table must have a primary key or a unique index defined. You will still be able to investigate each individual change, but if you look at the net, this will allow you to perform one DML statement on the target instead of multiple, in the case where your extract, transformextract,transform, and load (ETL) program is replicating changes to another system. The @role_name parameter is used to specify who can access the changed data table. This can be a fixed server role, a database role, or left as NULL (in which case, sysadmin and db_owner have full access, and other users inherit their permissions from the base table).

The sys.sp_cdc_enable_table stored procedure has five other optional parameters. One is called @captured_column_list, which allows you to capture only changes to a specific subset of the columns. For example, you may not want to capture VARCHAR(MAX) or VARBINARY(MAX) contents, when all that has changed is a BIT column. The other is @filegroup_name, which lets you place the captured data on a filegroup other than PRIMARY/DEFAULT. The other three are @capture_instance, which allows you to specify a name for your CDC instance (because you can have multiple captures on the same table); @index_name, allowing you to specify an unique index instead of the primary key; and @allow_partition_switch, which lets you dictate whether partition switches are allowed against the source table. The @capture_instance parameter can be particularly useful in preventing the system from trying to create conflicting names for the capture instance table. For example, if you have a table called dbo_foo.bar and another table called dbo.foo_bar, enabling both for CDC, without specifying a value for @capture_instance, will fail. This is because CDC tries to name both capture tables “dbo_foo_bar.” Although this is a fairly contrived case, if in doubt, use the @capture_instance parameter to ensure you have unique names.

To retrieve information about data changes to a table, you use the new CDC functions cdc.fn_cdc_get_all_changes_<capture_instance> and, if you have enabled net changes, cdc.fn_cdc_get_net_changes_<capture_instance>. These procedures require from and to parameters, but they are not based on time; instead you must determine the range of log sequence numbers (LSNs) that you wish to query. To obtain this information, you can use the function sys.fn_cdc_map_time_to_lsn.

Now that CDC is enabled for the dbo.Employees table (make sure once again that SQL Server Agent is running), you can make some changes to the data, and see how you (or your applications) might query for the individual or net changes. Run the DML statements in listing 14.

Listing 14. Inserting data into the Employees table
SELECT CURRENT_TIMESTAMP;

INSERT dbo.Employees
(
EmployeeID,
FirstName,
LastName,
Salary
)
SELECT
7,
'Howard',
'Jones',
80000;

UPDATE dbo.Employees SET LastName = 'Donaldson' WHERE EmployeeID = 3;

UPDATE dbo.Employees SET Salary = Salary * 2 WHERE EmployeeID = 4;

DELETE dbo.Employees WHERE EmployeeID = 6;

UPDATE dbo.Employees SET LastName = 'Stern' WHERE EmployeeID = 7;

UPDATE dbo.Employees SET LastName = 'Malone' WHERE EmployeeID = 3;

Be sure to copy the result from the very first line in the query. You will need this to determine the range of LSNs you will need to pull from the CDC table. Now you can run the query in listing 15.

Listing 15. Query against (and results from) a change data capture function
DECLARE
@start DATETIME,
@end DATETIME,
@lsn_A BINARY(10),
@lsn_B BINARY(10);

SELECT
@start = '<plug in the value from above>',
@end = CURRENT_TIMESTAMP,
@lsn_A = sys.fn_cdc_map_time_to_lsn('smallest greater than', @start),
@lsn_B = sys.fn_cdc_map_time_to_lsn('largest less than', @end);

SELECT operation = CASE __$operation
WHEN 1 THEN 'D'
WHEN 2 THEN 'I'
WHEN 4 THEN 'U' ELSE NULL END,
EmployeeID, FirstName, LastName, Salary
FROM cdc.fn_cdc_get_all_changes_dbo_Employees(@lsn_A, @lsn_B, 'all'),

-- result:
Operation EmployeeID FirstName LastName Salary
I 7 Howard Jones 80000.00
U 3 Don Donaldson 125000.00
U 4 Teemu Selanne 227000.00
D 6 Kirby Quigley 62500.00
U 7 Howard Stern 80000.00
U 3 Don Malone 125000.00

This result set does not include the before images of rows affected by UPDATE statements, because it is intended to be used to make a target data source look like the source. Hopefully you can see here that it would be easy to reconstruct all of the DML statements, in order to apply the same changes to another table that looked identical to this one before you made changes. If you change the final SELECT to use the get_net_changes function instead, as shown in listing 16, you can see that the set is compressed. Only the values necessary to make the target table look like the source (with one row per key) are included.

Listing 16. Using the get_net_changes function
SELECT operation = CASE __$operation
WHEN 1 THEN 'D'
WHEN 2 THEN 'I'
WHEN 4 THEN 'U' ELSE NULL END,
EmployeeID, FirstName, LastName, Salary
FROM cdc.fn_cdc_get_net_changes_dbo_Employees(@lsn_A, @lsn_B, 'all'),

-- result:
Operation EmployeeID FirstName LastName Salary
U 4 Teemu Selanne 227000.00
D 6 Kirby Quigley 62500.00
I 7 Howard Stern 80000.00
U 3 Don Malone 125000.00

And finally, as per listing 17, you can see the before and after image of each key row throughout all updates by looking directly at the CDC table.

Listing 17. Viewing the before and after image of each key row
SELECT [image] = CASE __$operation
WHEN 3 THEN 'BEFORE'
WHEN 4 THEN 'AFTER' ELSE NULL END,
EmployeeID, FirstName, LastName, Salary
FROM cdc.dbo_Employees_CT
WHERE __$operation IN (3,4)
ORDER BY __$start_lsn, __$operation;

-- result:
Image EmployeeID FirstName LastName Salary
BEFORE 3 Don Mattingly 125000.00
AFTER 3 Don Donaldson 125000.00
BEFORE 4 Teemu Selanne 113500.00
AFTER 4 Teemu Selanne 227000.00
BEFORE 7 Howard Jones 80000.00
AFTER 7 Howard Stern 80000.00
BEFORE 3 Don Donaldson 125000.00
AFTER 3 Don Malone 125000.00

One challenge you might come across is when your schema changes. In this case you will need to disable CDC for the table and re-enable it when the change is complete. CDC will not break without this action, but if you add, remove, or rename columns, your captured data will be incomplete.

Also, because change tracking and SQL Server Audit are synchronous, and CDC polls the transaction log after the fact, it is not so straightforward to capture the username responsible for the change. If this is an important part of your solution, then you are probably better off sticking to one of the other features discussed in this chapter, or resorting to more traditional means (for example, triggers, log reading utilities).

To clean up the CDC settings, you can use the code in listing 18.

Listing 18. Cleaning up change data capture settings
USE HR;
GO
EXEC sys.sp_cdc_disable_table
@source_schema = N'dbo',
@source_name = N'Employees',
@capture_instance ='dbo_Employees';
GO
EXEC sys.sp_cdc_disable_db;
GO

Comparison of features

At first glance, these three new features in SQL Server 2008 seem quite similar. As demonstrated here, their functionality may overlap in some cases, but they are clearly different and serve unique purposes. This treatment should help equip you with much of the information you will need to decide which feature you will need to use. To wrap up, table 1 should help you decide whether to use SQL Server Audit, change tracking, or CDC.

Table 1. Comparing SQL Server Audit, change tracking, and change data capture

Criteria

SQL Server Audit

Change tracking

Change data capture

Capture DML statements

Yes1

No

No

Capture result of DML statements

No

Yes

Yes

Capture before and after values

No

No

Yes

Capture intermediate values

No

No

Yes

Capture SELECT statements

Yes

No

No

Capture user name / spid

Yes

Yes4

No

Capture host name

Yes2

Yes4

No

Should use specific isolation level

No

Yes5

No

Require SQL Server Agent

No

No

Yes

Available in all SKUs

No3

Yes

No3

  1. You can see a tokenized copy of the DML statement, but the values in the statement are replaced by parameter placeholders.
  2. You can capture host name in a separate login audit event, then correlate it manually with the event in question.
  3. This feature is available in Enterprise, Evaluation, and Developer Editions only.
  4. You can capture this information using a trigger to affect the context information included with the change tracking data.
  5. Using snapshot isolation level can significantly impact tempdb usage and performance. Additionally, this may be a concern if you use distributed transactions, change schema frequently, disable constraints when bulk loading, or take databases offline (for example, detach or auto-close). You should read up on snapshot isolation level in Books Online: http://msdn.microsoft.com/en-us/library/ms177404(SQL.100).aspx

Summary

SQL Server 2008 provides a healthy offering of features that can assist you in tracking and dealing with changes to your data and schema. My goal for this chapter was to provide a useful and practical guide to help you decide how these features might help solve data management issues in your environment. Hopefully this will give you a good starting point on implementing one or more of these features where you need it most.

About the author

Aaron Bertrand is the Senior Data Architect at One to One Interactive, a global marketing agency headquartered in Boston, Massachusetts. At One to One, Aaron is responsible for database design and application architecture. Due to his commitment to the community, shown through blogging at http://www.sqlblog.com, peer-to-peer support on forums and newsgroups, and speaking at user group meetings and code camps, he has been awarded as a Microsoft MVP since 1998. Aaron recently published a technical white paper for Microsoft, detailing how to use the new Resource Governor feature in SQL Server 2008.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset