CLR Integration Programming
One of the most prominent enhancements to SQL Server 2005 was the introduction of the integrated SQL Common Language Runtime, that was named SQL CLR at that time. What is now called CLR integration is an SQL Server-specific version of the .NET Common Language Runtime, which allows you to run .NET managed code in the database. CLR integration programming is a broad subject that could easily fill an entire book, and in fact it does—Pro SQL Server 2005 Assemblies, by Robin Dewson and Julian Skinner (Apress, 2005), is an excellent resource for in-depth coverage of CLR integration programming. In this chapter, we’ll discuss the methods used to extend SQL Server functionality in the past, and explain the basics of the CLR integration programming model in SQL Server 2012.
The Old Way
In versions of SQL Server prior to the 2005 release, developers could extend SQL Server functionality by writing extended stored procedures (XPs). Writing high-quality XPs required a strong knowledge of the Open Data Services (ODS) library and the poorly documented C-style Extended Stored Procedure API. Anyone who attempted the old style of XP programming can tell you it was a complex undertaking, in which a single misstep could easily result in memory leaks and corruption of the SQL Server process space. Additionally, the threading model used by XPs required SQL Server to rely on the operating system to control threading within the XP. This could lead to many issues, such as unresponsiveness of XP code.
Caution XPs have been deprecated since SQL Server 2005. Use CLR integration instead of XPs for SQL Server 2012 development.
Earlier SQL Server releases also allowed you to create OLE Automation server objects via the spOACreate SP. Creating OLE Automation servers can be complex and awkward as well. OLE Automation servers created with spOACreate can result in memory leaks and in some instances corruption of the SQL Server process space.
Another option in previous versions of SQL Server was to code all business logic exclusively in physically separate business objects. While this method is preferred by many developers and administrators, it can result in extra network traffic and a less robust security model than can be achieved through tight integration with the SQL Server security model.
The CLR Integration Way
The CLR integration programming model provides several advantages over older methods of extending SQL Server functionality via XPs, OLE Automation, or external business objects. These advantages include the following:
There’s a misperception expressed by some that CLR integration is a replacement for T-SQL altogether. CLR integration is not a replacement for T-SQL, but rather a supplement that works hand in hand with T-SQL to make SQL Server 2012 more powerful than ever. So when should you use CLR code in your database? There are no hard and fast rules concerning this, but here are some general guidelines:
On the flip side, here are some general guidelines for items that should not be converted to CLR integration assemblies:
As with T-SQL SPs, the decision on whether and to what extent CLR integration will be used in your databases depends on your needs, including organizational policies and procedures. The recommendations we have presented here are guidelines of instances that can make good business cases for conversion of existing code and creation of new code.
CLR Integration Assemblies
CLR integration exposes .NET managed code to SQL Server via assemblies. An assembly is a compiled .NET managed code library that can be registered with SQL Server using the CREATE ASSEMBLY statement. Publicly accessible members of classes within the assemblies are then referenced in the appropriate CREATE statements, which we will describe later in this chapter. Creating a CLR integration assembly requires:
CLR integration provides additional SQL Server-specific namespaces, classes, and attributes to facilitate development of assemblies. Visual Studio 2010 and Visual Studio 11 (in beta at the time of this writing) also include an SQL Server project type that assists in quickly creating assemblies. Also, to maximize your SQL Server development possibilities with Visual Studio, install the SQL Server Data Tools (SSDT) from the Microsoft Data Developer Center website (http://msdn.microsoft.com/en-us/data/tools.aspx) which provide an integrated environment for database developers inside Visual Studio by allowing you to create and manage database objects and data and to execute T-SQL queries directly.
Perform the following steps to create a new assembly using Visual Studio 2010:
Figure 14-1. Visual Studio 2010 New Project Dialog Box
Figure 14-2. The Add Database Reference Dialog Box
Figure 14-3. Adding a New CLR Integration Class to Your Project
In addition to the standard .NET namespaces and classes, CLR integration implements some SQL Server-specific namespaces and classes to simplify interfacing your code with SQL Server. Some of the most commonly used namespaces include the following:
Once the assembly is created and compiled, it is registered with SQL Server via the CREATE ASSEMBLY statement. Listing 14-1 demonstrates a CREATE ASSEMBLY statement that registers a CLR integration assembly with SQL Server from an external DLL file. The DLL file used in the example is not supplied in precompiled form in the sample downloads for this book, but you can compile it yourself from the code we will introduce in Listing 14-2. Source code is included in the sample downloads available on the Apress web site. As CLR integration is not enabled by default, we also need to enable it at the server level. Here, we do it using the sp_configure system stored procedure prior to running the CREATE ASSEMBLY statement. CREATE ASSEMBLY would succeed even if CLR integration is disabled; an error would be raised by SQL Server only when a CLR integration code module would be called by a user later on. The RECONFIGURE statement applies the configuration change immediately.
Listing 14-1. Registering a CLR Integration Assembly with SQL Server
EXEC sp_configure 'CLR Enabled';
RECONFIGURE;
CREATE ASSEMBLY ApressExamples
AUTHORIZATION dbo
FROM N'C:MyApplicationApress.Examples.DLL'
WITH PERMISSION_SET = SAFE;
GO
The CREATE ASSEMBLY statement in the example specifies an assembly name of EmailUDF. This name must be a valid SQL Server identifier, and it must be unique within the database. You will use this assembly name when referencing the assembly in other statements.
The AUTHORIZATION clause specifies the owner of the assembly, in this case dbo. If you leave out the AUTHORIZATION clause, it defaults to the current user.
The FROM clause in this example specifies the full path to the external DLL file. Alternatively, you can specify a varbinary value instead of character file name. If you use a varbinary value, SQL Server uses it, as it is a long binary string representing the compiled assembly code, and no external file needs to be specified.
Finally, the WITH PERMISSION_SET clause grants a set of Code Access Security (CAS) permissions to the assembly. Valid permission sets include the following:
After the assembly is installed, you can use variations of the T-SQL database object creation statements (e.g., CREATE FUNCTION, CREATE PROCEDURE) to access the methods exposed by the assembly classes. We will demonstrate these statements individually in the following sections.
CLR integration UDFs that return scalar values are similar to standard .NET functions. The primary differences with standard .NET functions are that the SqlFunction attribute must be applied to the main function of CLR integration functions if you are using Visual Studio to deploy your function or if you need to set additional attribute values like IsDeterministic and DataAccess. Listing 14-2 demonstrates a scalar UDF that accepts an input string value and a regular expression pattern and returns a bit value indicating a match (1) or no match (0). The UDF is named EmailMatch() and is declared as a method of the UDFExample class, inside the Apress.Example namespace that we will use for all our examples in this chapter.
Listing 14-2. Regular Expression Match UDF
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
namespace Apress.Examples
{
public static class UDFExample
{
private static readonly Regex email_pattern = new Regex
(
// Everything before the @ sign (the "local part")
"^[a-z0-9!#$%&'*+/=?^_'{|} ∼ −] + (?:\.[a-z0-9!#$%&'*+/=?^_'{|} ∼ −]+)*" +
// Subdomains after the @ sign
"@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.) + " +
// Top-level domains
"(?:[a-z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b$"
);
[Microsoft.SqlServer.Server.SqlFunction
(
IsDeterministic = true
)]
public static SqlBoolean EmailMatch(SqlString input)
{
SqlBoolean result = new SqlBoolean();
if (input.IsNull)
result = SqlBoolean.Null;
else
result = (email_pattern.IsMatch(input.Value.ToLower()) == true)
? SqlBoolean.True : SqlBoolean.False;
return result;
}
}
}
The first part of the listing specifies the required namespaces to import. This UDF uses the System.Data.SqlTypes and System.Text.RegularExpressions namespaces.
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
The UDFExample class and the EmailMatch function it exposes are both declared static. CLR integration functions need to be declared as static. A static function is shared between all instances of the class. Here, the class itself is also static, so it cannot be instantiated; this allows the class to be loaded faster and its memory to be shared between SQL Server sessions. The function is decorated with the Microsoft.SqlServer.Server.SqlFunction attribute with the IsDeterministic property set to true to indicate the function is a deterministic CLR integration method. The function body is relatively simple. It accepts an SqlString input string value. If the input string is NULL, the function returns NULL; otherwise the function uses the .NET Regex.IsMatch function to perform a regular expression match. If the result is a match, the function returns a bit value of 1; otherwise it returns 0.
public static class UDFExample
{
private static readonly Regex email_pattern = new Regex
(
// Everything before the @ sign (the "local part")
"^[a-z0-9!#$%&'*+/=?^_'{|} ∼ −] + (?:\.[a-z0-9!#$%&'*+/=?^_'{|} ∼ −]+)*" +
// Subdomains after the @ sign
"@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.) + " +
// Top-level domains
"(?:[a-z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b$"
);
[Microsoft.SqlServer.Server.SqlFunction
(
IsDeterministic = true
)]
public static SqlBoolean EmailMatch(SqlString input)
{
SqlBoolean result = new SqlBoolean();
if (input.IsNull)
result = SqlBoolean.Null;
else
result = (email_pattern.IsMatch(input.Value.ToLower()) == true)
? SqlBoolean.True : SqlBoolean.False;
return result;
}
}
The regular expression pattern used in Listing 14-2 was created by Jan Goyvaerts of Regular-Expressions.info (www.regular-expressions.info). Jan’s regular expression validates e-mail addresses according to RFC 2822, the standard for e-mail address formats. While not perfect, Jan estimates that this regular expression matches over 99 percent of “e-mail addresses in actual use today.” Performing this type of e-mail address validation using only T-SQL statements would be cumbersome, complex, and inefficient.
Tip It’s considered good practice to use the SQL Server data types for parameters and return values to CLR Integration methods (SqlString, SqlBoolean, SqlInt32, etc.). Standard .NET data types have no concept of SQL NULL and will error out if NULL is passed in as a parameter, calculated within the function, or returned from the function.
After the assembly is installed via the CREATE ASSEMBLY statement we wrote in Listing 14-1, the function is created with the CREATE FUNCTION statement using the EXTERNAL NAME clause, as shown in Listing 14-3.
Listing 14-3. Creating CLR UDF from Assembly Method
CREATE FUNCTION dbo.EmailMatch (@input nvarchar(4000))
RETURNS bit
WITH EXECUTE AS CALLER
AS
EXTERNAL NAME ApressExamples.[Apress.Examples.UDFExample].EmailMatch
GO
After this, the CLR function can be called like any other T-SQL UDF, as shown in Listing 14-4. The results are shown in Figure 14-4.
Listing 14-4. Validating E-mail Addresses with Regular Expressions
SELECT
'[email protected]' AS Email,
dbo.EmailMatch (N'[email protected]') AS Valid
UNION
SELECT
'123@456789',
dbo.EmailMatch('123@456789')
UNION
SELECT '[email protected]',
dbo.EmailMatch('[email protected]'),
Figure 14-4. Results of E-mail Address Validation with Regular Expressions
Tip Normally you can automate the process of compiling your assembly, registering it with SQL Server, and installing the CLR Integration UDF with Visual Studio’s Build Deploy option. You can also test the CLR Integration UDF with the Visual Studio Debug Start Debugging option. With Visual Studio 2010 this does not work, as it does not recognize SQL Server 2012, wich was released after Visual Studio. In Visual Studio 11, available in beta at the time of this writing, you should be able to deploy the assembly with Visual Studio. This is just a detail; it is straightforward to copy the assembly on the server and register it manually with CREATE ASSEMBLY as shown in Listing 14-1.
As we mentioned previously, CLR UDFs also allow tabular results to be returned to the caller. This example demonstrates another situation in which CLR integration can be a useful supplement to T-SQL functionality—accessing external resources such as the file system, network resources, or even the Internet. Listing 14-5 uses a CLR function to retrieve the Yahoo Top News Stories RSS feed and return the results as a table. Table-valued CLR UDFs are a little more complex than scalar functions. The following code could be added to the sqme Visual Studio project that we created for the first CLR function example. Here we create another class named YahooRSS.
Listing 14-5. Retrieving Yahoo RSS Feed Top News Stories
using System;
using System.Collections;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Xml;
namespace Apress.Examples {
public partial class YahooRSS {
[Microsoft.SqlServer.Server.SqlFunction (
IsDeterministic = false,
DataAccess = DataAccessKind.None,
TableDefinition = "title nvarchar(256),"
+ "link nvarchar(256), "
+ "pubdate datetime, "
+ "description nvarchar(max)",
FillRowMethodName = "GetRow" )
]
public static IEnumerable GetYahooNews() {
XmlTextReader xmlsource =
new XmlTextReader("http://rss.news.yahoo.com/rss/topstories");
XmlDocument newsxml = new XmlDocument();
newsxml.Load(xmlsource);
xmlsource.Close();
return newsxml.SelectNodes("//rss/channel/item");
}
private static void GetRow (
Object o,
out SqlString title,
out SqlString link,
out SqlDateTime pubdate,
out SqlString description )
{
XmlElement element = (XmlElement)o;
title = element.SelectSingleNode("./title").InnerText;
link = element.SelectSingleNode("./link").InnerText;
pubdate = DateTime.Parse(element.SelectSingleNode("./pubDate").InnerText);
description = element.SelectSingleNode("./description").InnerText;
}
}
}
Before we step through the source listing, we need to address security since this function accesses the Internet. Because the function needs to access an external resource, it requires EXTERNAL_ACCESS permissions. In order to deploy a non-SAFE assembly, one of two sets of conditions must be met:
To meet the first set of requirements:
Listing 14-6. CREATE ASSEMBLY with EXTERNAL_ACCESS Permission Set
CREATE ASSEMBLY ApressExample
AUTHORIZATION dbo
FROM N'C:MyApplicationApress.Example.DLL'
WITH PERMISSION_SET = EXTERNAL_ACCESS;
As mentioned previously, signing assemblies is beyond the scope of this book. Additional information on signing assemblies can be found in this MSDN Data Access Technologies blog entry: http://blogs.msdn.com/b/dataaccesstechnologies/archive/2011/10/29/deploying-sql-clr-assembly-using-asymmetric-key.aspx.
The code listing begins with the using statements. This function requires the addition of the System.Xml namespace in order to parse the RSS feed, and the System.Collections namespace to allow the collection to be searched amongst other functionality specific to collections.
using System;
using System.Collections;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Xml;
The primary public function again requires that the SqlFunction attribute be declared. This time there are several additional attributes that need to be declared with it:
[Microsoft.SqlServer.Server.SqlFunction (
IsDeterministic = false,
DataAccess = DataAccessKind.None,
TableDefinition = "title nvarchar(256),"
+ "link nvarchar(256), "
+ "pubdate datetime, "
+ "description nvarchar(max)",
FillRowMethodName = "GetRow" )
]
public static IEnumerable GetYahooNews()
{
XmlTextReader xmlsource =
new XmlTextReader("http://rss.news.yahoo.com/rss/topstories");
XmlDocument newsxml = new XmlDocument();
newsxml.Load(xmlsource);
xmlsource.Close();
return newsxml.SelectNodes("//rss/channel/item");
}
We specifically set the IsDeterministic attribute to false this time to indicate that the contents of an RSS feed can change between calls, making this UDF nondeterministic. Since the function does not read data from system tables using the in-process data provider, the DataAccess attribute is set to DataAccessKind.None. This CLR TVF also sets the additional TableDefinition attribute defining the structure of the result set for Visual Studio. It also needs the FillRowMethodName attribute to designate the fill-row method. The fill-row method is a user method that converts each element of an IEnumerable object into an SQL Server result set row.
The public function is declared to return an IEnumerable result. This particular function opens an XmlTextReader that retrieves the Yahoo Top News Stories RSS feed and stores it in an XmlDocument. The function then uses the SelectNodes method to retrieve news story summaries from the RSS feed. The SelectNodes method, used to return results from the function, generates an XmlNodeList. The XmlNodeList class implements the IEnumerable interface. This is important since the fill-row method is fired once for each object returned by the IEnumerable collection returned (in this case the XmlNodeList).
The GetRow method is declared as a C# void function, which means that no value will be returned by the function; the method communicates with SQL Server via its out parameters. The first parameter is an Object passed by value—in this case an XmlElement. The remaining parameters correspond to the columns of the result set. The GetRow method casts the first parameter to an XmlElement (the parameter cannot be directly an XmlElement because the fill-row method signature must have an Object as first parameter). It then uses the SelectSingleNode method and InnerText property to retrieve the proper text from individual child nodes of the XmlElement, assigning each to the proper columns of the result set along the way.
private static void GetRow (
Object o,
out SqlString title,
out SqlString link,
out SqlDateTime pubdate,
out SqlString description )
{
XmlElement element = (XmlElement)o;
title = element.SelectSingleNode("./title").InnerText;
link = element.SelectSingleNode("./link").InnerText;
pubdate = DateTime.Parse(element.SelectSingleNode("./pubDate").InnerText);
description = element.SelectSingleNode("./description").InnerText;
}
The CLR TVF can be called with a SELECT query, as shown in Listing 14-7. The results are shown in Figure 14-5.
Listing 14-7. Querying a CLR Integration TVF
CREATE FUNCTION dbo.GetYahooNews()
RETURNS TABLE(title nvarchar(256), link nvarchar(256), pubdate datetime, description nvarchar(max))
AS EXTERNAL NAME ApressExamples.[Apress.Examples.YahooRSS].GetYahooNews
GO
SELECT
title,
link,
pubdate,
description
FROM dbo.GetYahooNews();
Figure 14-5. Retrieving the Yahoo RSS Feed with the GetYahooNews() Function
CLR integration SPs provide an alternative to extend SQL Server functionality when T-SQL SPs just won’t do. Of course, like other CLR integration functionality, there is a certain amount of overhead involved with CLR SPs, and you can expect them to be less efficient than comparable T-SQL code for set-based operations. On the other hand, if you need to access .NET functionality or external resources, or if you have code that is computationally intensive, CLR integration SPs can provide an excellent alternative to straight T-SQL code.
Listing 14-8 shows how to use CLR integration to retrieve operating system environment variables and return them as a recordset via an SP. In the Apress.Examples namespace, we create a SampleProc class.
Listing 14-8. Retrieving Environment Variables with a CLR Stored Procedure
using System;
using System.Collections;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
namespace Apress.Examples
{
public partial class SampleProc
{
[Microsoft.SqlServer.Server.SqlProcedure()]
public static void GetEnvironmentVars()
{
try
{
SortedList environment_list = new SortedList();
foreach (DictionaryEntry de in Environment.GetEnvironmentVariables())
{
environment_list[de.Key] = de.Value;
}
SqlDataRecord record = new SqlDataRecord (
new SqlMetaData("VarName", SqlDbType.NVarChar, 1024),
new SqlMetaData("VarValue", SqlDbType.NVarChar, 4000)
);
SqlContext.Pipe.SendResultsStart(record);
foreach (DictionaryEntry de in environment_list)
{
record.SetValue(0, de.Key);
record.SetValue(1, de.Value);
SqlContext.Pipe.SendResultsRow(record);
}
SqlContext.Pipe.SendResultsEnd();
}
catch (Exception ex)
{
SqlContext.Pipe.Send(ex.Message);
}
}
}
};
As with the previous CLR integration examples, appropriate namespaces are imported at the top:
using System;
using System.Collections;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
The GetEnvironmentVars() method is declared as a public void function. The SqlProcedure() attribute is applied to the function in this code to indicate to Visual Studio that this is a CLR SP. The body of the SP is wrapped in a try . . . catch block to capture any .NET exceptions, which are returned to SQL Server. If an exception occurs in the .NET code, it’s sent back to SQL Server via the SqlContext.Pipe.Send method.
public partial class SampleProc
{
[Microsoft.SqlServer.Server.SqlProcedure()]
public static void GetEnvironmentVars()
{
try
{
...
}
catch (Exception ex)
{
SqlContext.Pipe.Send(ex.Message);
}
}
}
};
THROWING READABLE EXCEPTIONS
When you need to raise an exception in a CLR SP, you have two options. For code readability reasons, I’ve chosen the simpler option of just allowing exceptions to bubble up through the call stack. This results in .NET Framework exceptions being returned to SQL Server. The .NET Framework exceptions return a lot of extra information, like call stack data, however.
If you want to raise a nice, simple SQL Server-style error without all the extra .NET Framework exception information, you can use a method introduced in the book Pro SQL Server 2005, by Thomas Rizzo et al. (Apress, 2005). This second method involves using the ExecuteAndSend() method of the SqlContext.Pipe to execute a T-SQL RAISERROR statement. This method is shown in the following C# code snippet:
try {
SqlContext.Pipe.ExecuteAndSend("RAISERROR ('This is a T-SQL Error', 16, 1);");
}
catch
{
// do nothing
}
The ExecuteAndSend() method call executes the RAISERROR statement on the current context connection. The try...catch block surrounding the call prevents the .NET exception generated by the RAISERROR to be handled by .NET and reported as a new error. Keep this method in mind if you want to raise SQL Server-style errors instead of returning the verbose .NET Framework exception information to SQL Server.
As the procedure begins, all of the environment variable names and their values are copied from the .NET Hashtable returned by the Environment.GetEnvironmentVariables() functions to a .NET SortedList. In this procedure, we chose to use the SortedList to ensure that the results are returned in order by key. I added the SortedList just for display purposes, but it’s not required. Greater efficiency can be gained by iterating the HashTable directly without a SortedList.
SortedList environment_list = new SortedList();
foreach (DictionaryEntry de in Environment.GetEnvironmentVariables())
{
environment_list[de.Key] = de.Value;
}
The procedure uses the SqlContext.Pipe to return results to SQL Server as a result set. The first step to using the SqlContext.Pipe to send results back is to set up an SqlRecord with the structure that you wish the result set to take. For this example, the result set consists of two nvarchar columns: VarName, which contains the environment variable names; and VarValue, which contains their corresponding values.
SqlDataRecord record = new SqlDataRecord (
new SqlMetaData("VarName", SqlDbType.NVarChar, 1024),
new SqlMetaData("VarValue", SqlDbType.NVarChar, 4000)
);
Next, the function calls the SendResultsStart() method with the SqlDataRecord to initialize the result set:
SqlContext.Pipe.SendResultsStart(record);
Then it’s a simple matter of looping through the SortedList of environment variable key/value pairs and sending them to the server via the SendResultsRow() method:
foreach (DictionaryEntry de in environment_list) {
record.SetValue(0, de.Key);
record.SetValue(1, de.Value);
SqlContext.Pipe.SendResultsRow(record);
}
The SetValue() method is called for each column of the SqlRecord to properly set the results, and then SendResultsRow() is called for each row. After all results have been sent to the client, the SendResultsEnd() method of the SqlContext.Pipe is called to complete the result set and return the SqlContext.Pipe to its initial state.
SqlContext.Pipe.SendResultsEnd();
The GetEnvironmentVars CLR SP can be called using the T-SQL EXEC statement, shown in Listing 14-9. The results are shown in Figure 14-6.
Listing 14-9. Executing the GetEnvironmentVars CLR Procedure
CREATE PROCEDURE dbo.GetEnvironmentVars
AS EXTERNAL NAME ApressExamples.[Apress.Examples.SampleProc].GetEnvironmentVars;
GO
EXEC dbo.GetEnvironmentVars;
Figure 14-6. Retrieving Environment Variables with CLR
User-defined aggregates (UDAs) are an exciting addition to SQL Server’s functionality. UDAs are similar to the built-in SQL aggregate functions (SUM, AVG, etc.) in that they can act on entire sets of data at once, as opposed to one item at a time. An SQL CLR UDA has access to .NET functionality and can operate on numeric, character, date/time, or even user-defined data types. A basic UDA has four required methods:
Tip In SQL Server 2005, there was a serialization limit of 8,000 bytes for an instance of a SQL CLR UDA, making certain tasks harder to perform using a UDA. For instance, creating an array, hash table, or other structure to hold intermediate results during an aggregation (like aggregates that calculate statistical mode or median) could cause a UDA to very quickly run up against the 8,000-byte limit and throw an exception for large datasets. SQL Server 2008 and 2012 do not have this limitation.
Creating a Simple UDA
The sample UDA in Listing 14-10 determines the statistical range for a set of numbers. The statistical range for a given set of numbers is the difference between the minimum and maximum values for the set. The UDA determines the minimum and maximum values of the set of numbers passed in and returns the difference.
Listing 14-10. Sample Statistical Range UDA
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
namespace Apress.Examples {
[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.Native)]
public struct Range
{
SqlDouble min, max;
public void Init() {
min = SqlDouble.Null;
max = SqlDouble.Null;
}
public void Accumulate(SqlDouble value)
{
if (!value.IsNull) {
if (min.IsNull || value < min)
{
min = value;
}
if (max.IsNull || value > max)
{
max = value;
}
}
}
public void Merge(Range group)
{
if (min.IsNull || (!group.min.IsNull && group.min < min))
{
min = group.min;
}
if (max.IsNull || (!group.max.IsNull && group.max > max))
{
max = group.max;
}
}
public SqlDouble Terminate() {
SqlDouble result = SqlDouble.Null;
if (!min.IsNull && !max.IsNull)
{
result = max - min;
}
return result;
}
}
}
This UDA begins, like the previous CLR integration assemblies, by importing the proper namespaces:
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
Next, the code declares the struct that represents the UDA. The attributes Serializable and SqlUserDefinedAggregate are applied to the struct. We used the Format.Native serialization format for this UDA. Because this is a simple UDA, Format.Native will provide the best performance and will be the easiest to implement. More complex UDAs that use reference types require Format.UserDefined serialization and must implement the IBinarySerialize interface.
[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.Native)]
public struct Range
{
}
The struct declares two member variables, min and max, which will hold the minimum and maximum values encountered during the aggregation process:
SqlDouble min, max;
The mandatory Init() method in the aggregate body initializes the min and max member variables to SqlDouble.Null:
public void Init() {
min = SqlDouble.Null;
max = SqlDouble.Null;
}
The Accumulate() method accepts a SqlDouble parameter. This method first checks that the value is not NULL (NULL is ignored during aggregation). Then it checks to see if the value passed in is less than the min variable (or if min is NULL), and if so, assigns the parameter value to min. The method also checks max and updates it if the parameter value is greater than max (or if max is NULL). In this way, the min and max values are determined on the fly as the query engine feeds values into the Accumulate() method.
public void Accumulate(SqlDouble value)
{
if (!value.IsNull) {
if (min.IsNull || value < min)
{
min = value;
}
if (max.IsNull || value > max)
{
max = value;
}
}
}
The Merge() method merges a Range structure that was created in parallel with the current structure. The method accepts a Range structure and compares its min and max variables to those of the current Range structure. It then adjusts the current structure’s min and max variables based on the Range structure passed into the method, effectively merging the two results.
public void Merge(Range group)
{
if (min.IsNull || (!group.min.IsNull && group.min < min))
{
min = group.min;
}
if (max.IsNull || (!group.max.IsNull && group.max > max))
{
max = group.max;
}
}
The final method of the UDA is the Terminate() function, which returns an SqlDouble result. This function checks for min or max results that are NULL. The UDA returns NULL if either min or max is NULL. If neither min nor max is NULL, the result is the difference between the max and min values.
public SqlDouble Terminate() {
SqlDouble result = SqlDouble.Null;
if (!min.IsNull && !max.IsNull)
{
result = max - min;
}
return result;
}
Note The Terminate() method must return the same data type that the Accumulate() method accepts. If these data types do not match, an error will occur. Also, as mentioned previously, it is best practice to use the SQL Server-specific data types, since the standard .NET types will choke on NULL.
Listing 14-11 is a simple test of this UDA. The test determines the statistical range of unit prices that customers have paid for AdventureWorks products. Information like this, on a per-product or per-model basis, can be paired with additional information to help the AdventureWorks sales teams set optimal price points for their products. The results are shown in Figure 14-7.
Listing 14-11. Retrieving Statistical Ranges with UDA
CREATE AGGREGATE Range (@value float) RETURNS float
EXTERNAL NAME ApressExamples.[Apress.Examples.Range];
GO
SELECT
ProductID,
dbo.Range(UnitPrice) AS UnitPriceRange
FROM Sales.SalesOrderDetail
WHERE UnitPrice > 0
GROUP BY ProductID;
Figure 14-7. Results of the Range Aggregate Applied to Unit Prices
Caution This UDA is an example. It will be faster to use regular T-SQL aggregation functions for this type of calculation, especially if you have a large number of rows to process.
Creating an Advanced UDA
You can create more advanced CLR aggregates that use reference data types and user-defined serialization. When creating a UDA that uses reference (nonvalue) data types such as ArrayLists, SortedLists, and Objects, CLR integration imposes the additional restriction that you cannot mark the UDA for Format.Native serialization. Instead these aggregates have to be marked for Format.UserDefined serialization, which means that the UDA must implement the IBinarySerialize interface, including both the Read and Write methods. Basically, you have to tell SQL Server how to serialize your data when using reference types. There is a performance impact associated with Format.UserDefined serialization as opposed to Format.Native.
Listing 14-12 is a UDA that calculates the statistical median of a set of numbers. The statistical median is the middle number of an ordered group of numbers. If there is an even number of numbers in the set, the statistical median is the average (mean) of the middle two numbers in the set.
Listing 14-12. UDA to Calculate Statistical Median
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlTypes;
using System.Runtime.InteropServices;
using Microsoft.SqlServer.Server;
namespace Apress.Examples {
[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate (
Format.UserDefined,
IsNullIfEmpty = true,
MaxByteSize = −1 )]
#160; [StructLayout(LayoutKind.Sequential)]
public struct Median : IBinarySerialize
{
List<double> temp; // List of numbers
public void Init()
{
// Create new list of double numbers
this.temp = new List<double> ();
}
public void Accumulate(SqlDouble number)
{
if (!number.IsNull) // Skip over NULLs
{
this.temp.Add(number.Value); // If number is not NULL, add it to list
}
}
public void Merge(Median group)
{
// Merge two sets of numbers
this.temp.InsertRange(this.temp.Count, group.temp);
}
public SqlDouble Terminate() {
SqlDouble result = SqlDouble.Null; // Default result to NULL
this.temp.Sort(); // Sort list of numbers
int first, second; // Indexes to middle two numbers
if (this.temp.Count % 2 == 1)
{
// If there is an odd number of values get the middle number twice
first = this.temp.Count / 2;
second = first;
}
else
{
// If there is an even number of values get the middle two numbers
first = this.temp.Count / 2-1;
second = first + 1;
}
if (this.temp.Count > 0) // If there are numbers, calculate median
{
// Calculate median as average of middle number(s)
result = (SqlDouble)( this.temp[first] + this.temp[second] ) / 2.0;
}
return result;
}
#region IBinarySerialize Members
// Custom serialization read method
public void Read(System.IO.BinaryReader r)
{
// Create a new list of double values
this.temp = new List<double> ();
// Get the number of values that were serialized
int j = r.ReadInt32();
// Loop and add each serialized value to the list
for (int i = 0; i < j; i++)
{
this.temp.Add(r.ReadDouble());
}
}
// Custom serialization write method
public void Write(System.IO.BinaryWriter w)
{
// Write the number of values in the list
w.Write(this.temp.Count);
// Write out each value in the list
foreach (double d in this.temp)
{
w.Write(d);
}
}
#endregion
}
}
This UDA begins, like the other CLR integration examples, with namespace imports. We’ve added the System.Collections.Generic namespace this time so we can use the .NET List<T> strongly typed list.
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlTypes;
using System.Runtime.InteropServices;
using Microsoft.SqlServer.Server;
The Median structure in the example is declared with the Serializable attribute to indicate that it can be serialized, and the StructLayout attribute with the LayoutKind.Sequential property to force the structure to be serialized in sequential fashion for a UDA that has a Format different from Native. The SqlUserDefinedAggregate attribute declares three properties, as follows:
Because Format.UserDefined was specified on the Median structure, it must implement the IBinarySerialize interface. Inside the body of the struct, we’ve defined a List<double> named temp that will hold an intermediate temporary list of numbers passed into the UDA.
[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate (
Format.UserDefined,
IsNullIfEmpty = true,
MaxByteSize = −1 )]
[StructLayout(LayoutKind.Sequential)]
public struct Median : IBinarySerialize
{
List<double> temp; // List of numbers
...
}
The Read() and Write() methods of the IBinarySerialize interface are used to deserialize and serialize the list, respectively:
#region IBinarySerialize Members
// Custom serialization read method
public void Read(System.IO.BinaryReader r)
{
// Create a new list of double values
this.temp = new List<double>();
// Get the number of values that were serialized
int j = r.ReadInt32();
// Loop and add each serialized value to the list
for (int i = 0; i < j; i++)
{
this.temp.Add(r.ReadDouble());
}
}
// Custom serialization write method
public void Write(System.IO.BinaryWriter w)
{
// Write the number of values in the list
w.Write(this.temp.Count);
// Write out each value in the list
foreach (double d in this.temp)
{
w.Write(d);
}
}
#endregion
The Init method of the UDA initializes the temp list by creating a new List<double> instance:
public void Init() {
// Create new list of double numbers
this.temp = new List<double>();
}
The Accumulate() method accepts a SqlDouble number and adds all non-NULL values to the temp list. Although you can include NULLs in your aggregate results, keep in mind that T-SQL developers are used to the NULL handling of built-in aggregate functions like SUM and AVG. In particular, developers are used to their aggregate functions discarding NULL. This is the main reason we eliminate NULL in this UDA.
public void Accumulate(SqlDouble number)
{
if (!number.IsNull) // Skip over NULLs
{
this.temp.Add(number.Value); // If number is not NULL, add it to list
}
}
The Merge() method in the example merges two lists of numbers if SQL Server decides to calculate the aggregate in parallel. If so, the server will pass a list of numbers into the Merge() method. This list of numbers must then be appended to the current list. For efficiency, we use the InsertRange() method of List<T> to combine the lists.
public void Merge(Median group)
{
// Merge two sets of numbers
this.temp.InsertRange(this.temp.Count, group.temp);
}
The Terminate() method of the UDA sorts the list of values and then determines the indexes of the middle numbers. If there is an odd number of values in the list, there is only a single middle number; if there is an even number of values in the list, the median is the average of the middle two numbers. If the list contains no values (which can occur if every value passed to the aggregate is NULL), the result is NULL; otherwise the Terminate() method calculates and returns the median.
public SqlDouble Terminate() {
SqlDouble result = SqlDouble.Null; // Default result to NULL
this.temp.Sort(); // Sort list of numbers
int first, second; // Indexes to middle two numbers
if (this.temp.Count % 2 == 1)
{
// If there is an odd number of values get the middle number twice
first = this.temp.Count / 2;
second = first;
}
else
{
// If there is an even number of values get the middle two numbers
first = this.temp.Count / 2–1;
second = first + 1;
}
if (this.temp.Count > 0) // If there are numbers, calculate median
{
// Calculate median as average of middle number(s)
result = (SqlDouble)( this.temp[first] + this.temp[second] ) / 2.0;
}
return result;
}
Listing 14-13 demonstrates the use of this UDA to calculate the median UnitPrice from the Sales.SalesOrderDetail table on a per-product basis. The results are shown in Figure 14-8.
Listing 14-13. Calculating Median Unit Price with a UDA
CREATE AGGREGATE dbo.Median (@value float) RETURNS float
EXTERNAL NAME ApressExamples.[Apress.Examples.Median];
GO
SELECT
ProductID,
dbo.Median(UnitPrice) AS MedianUnitPrice
FROM Sales.SalesOrderDetail
GROUP BY ProductID;
Figure 14-8. Median Unit Price for Each Product
CLR Integration User-Defined Types
SQL Server 2000 had built-in support for user-defined data types, but they were limited in scope and functionality. The old-style user-defined data types had the following restrictions and capabilities:
SQL Server 2012 provides support for old-style user-defined data types and rules, presumably for backward compatibility with existing applications. The AdventureWorks database contains examples of old-style user-defined data types, like the dbo.Phone data type, which is an alias for the varchar(25) data type.
Caution Rules (CHECK constraints that can be applied to user-defined data types) have been deprecated since SQL Server 2005 and will be removed from a future version. T-SQL user-defined data types are now often referred to as alias types.
SQL Server 2012 supports a far more flexible solution to your custom data type needs in the form of CLR user-defined types. CLR integration user-defined types allow you to access the power of the .NET Framework to meet your custom data type needs. Common examples of CLR UDTs include mathematical concepts like points, vectors, complex numbers, and other types not built into the SQL Server type system. In fact, CLR UDTs are so powerful that Microsoft has begun including some as standard in SQL Server. These CLR UDTs include the spatial data types geography and geometry, and the hierarchyid data type.
CLR UDTs are useful for implementing data types that require special handling and that implement their own special methods and functions. Complex numbers, which are a superset of real numbers, are one example. Complex numbers are represented with a “real” part and an “imaginary” part in the format “a + bi,” where a is a real number representing the real part of the value, b is a real number representing the imaginary part, and the literal letter i after the imaginary part stands for the imaginary number i, which is the square root of −1. Complex numbers are often used in math, science, and engineering to solve difficult abstract problems. Some examples of complex numbers include 101.9 + 3.7i, 98 + 12i, -19i, and 12 + 0i (which can also be represented as 12). Because their format is different from real numbers and calculations with them require special functionality, complex numbers are a good candidate for CLR. The example in Listing 14-14 implements a complex number CLR UDT.
Note To keep the example simple, only a partial implementation is reproduced here. The sample download file includes the full version of this CLR UDT that includes basic operators as well as additional documentation and implementations of many more mathematical operators and trigonometric functions.
Listing 14-14. Complex Numbers UDT
using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
namespace Apress.Examples
{
[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedType
(
Format.Native,
IsByteOrdered = true
)]
public struct Complex : INullable
{
#region "Complex Number UDT Fields/Components"
private bool m_Null;
public Double real;
public Double imaginary;
#endregion
#region "Complex Number Parsing, Constructor, and Methods/Properties"
private static readonly Regex rx = new Regex(
"^(?<Imaginary>[+−]?([0–9] + |[0–9]*\.[0–9]+))[i|I]$|" +
"^(?<Real>[+−]?([0–9] + |[0–9]*\.[0–9]+))$|" +
"^(?<Real>[+−]?([0–9] + |[0–9]*\.[0–9]+))" +
"(?<Imaginary>[+−]?([0–9] + |[0–9]*\.[0–9]+))[i|I]$");
public static Complex Parse(SqlString s)
{
Complex u = new Complex();
if (s.IsNull)
u = Null;
else
{
MatchCollection m = rx.Matches(s.Value);
if (m.Count == 0)
throw (new FormatException("Invalid Complex Number Format."));
String real_str = m[0].Groups["Real"].Value;
String imaginary_str = m[0].Groups["Imaginary"].Value;
if (real_str == "" && imaginary_str == "")
throw (new FormatException("Invalid Complex Number Format."));
if (real_str == "")
u.real = 0.0;
else
u.real = Convert.ToDouble(real_str);
if (imaginary_str == "")
u.imaginary = 0.0;
else
u.imaginary = Convert.ToDouble(imaginary_str);
}
return u;
}
public override String ToString()
{
String sign = "";
if (this.imaginary >= 0.0)
sign = " + ";
return this.real.ToString() + sign + this.imaginary.ToString() + "i";
}
public bool IsNull
{
get
{
return m_Null;
}
}
public static Complex Null
{
get
{
Complex h = new Complex();
h.m_Null = true;
return h;
}
}
public Complex(Double r, Double i)
{
this.real = r;
this.imaginary = i;
this.m_Null = false;
}
#endregion
#region "Complex Number Basic Operators"
// Complex number addition
public static Complex operator + (Complex n1, Complex n2)
{
Complex u;
if (n1.IsNull || n2.IsNull)
u = Null;
else
u = new Complex(n1.real + n2.real, n1.imaginary + n2.imaginary);
return u;
}
#endregion
#region "Exposed Mathematical Basic Operator Methods"
// Add complex number n2 to n1
public static Complex CAdd(Complex n1, Complex n2)
{
return n1 + n2;
}
// Subtract complex number n2 from n1
public static Complex Sub(Complex n1, Complex n2)
{
return n1 - n2;
}
#endregion
// other complex operations are available in the source code
}
}
The code begins with the required namespace imports and the namespace declaration for the sample:
using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
Next is the declaration of the structure that represents an instance of the UDT. The Serializable, Format.Native, and IsByteOrdered = true attributes and attribute properties are all set on the UDT. In addition, all CLR UDTs must implement the INullable interface. INullable requires that the IsNull and Null properties be defined. The CLR UDT attributes are detailed in Table 14-1.
Table 14-1. Common CLR UDT Attributes
[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedType
(
Format.Native,
IsByteOrdered = true
)]
public struct Complex : INullable
{
...
}
Table 14-1 shows a few of the common attributes that are used in CLR integration UDT definitions.
The public and private fields are declared inside the body of the Complex structure. The real and imaginary public fields represent the real and imaginary parts of the complex number, respectively. The m_Null field is a bool value that is set to true if the current instance of the complex type is NULL, and is set to false otherwise.
#region "Complex Number UDT Fields/Components"
private bool m_Null;
public Double real;
public Double imaginary;
#endregion
The first method declared in the UDT is the Parse method (required by all UDTs), which takes a string value from SQL Server and parses it into a complex number. The Parse method uses a .NET regular expression to simplify parsing a bit.
private static readonly Regex rx = new Regex(
"^(?<Imaginary>[+−]?([0–9] + |[0–9]*\.[0–9]+))[i|I]$|" +
"^(?<Real>[+−]?([0–9] + |[0–9]*\.[0–9]+))$|" +
"^(?<Real>[+−]?([0–9] + |[0–9]*\.[0–9]+))" +
"(?<Imaginary>[+−]?([0–9] + |[0–9]*\.[0–9]+))[i|I]$");
public static Complex Parse(SqlString s)
{
Complex u = new Complex();
if (s.IsNull)
u = Null;
else
{
MatchCollection m = rx.Matches(s.Value);
if (m.Count == 0)
throw (new FormatException("Invalid Complex Number Format."));
String real_str = m[0].Groups["Real"].Value;
String imaginary_str = m[0].Groups["Imaginary"].Value;
if (real_str == "" && imaginary_str == "")
throw (new FormatException("Invalid Complex Number Format."));
if (real_str == "")
u.real = 0.0;
else
u.real = Convert.ToDouble(real_str);
if (imaginary_str == "")
u.imaginary = 0.0;
else
u.imaginary = Convert.ToDouble(imaginary_str);
}
return u;
}
The regular expression (a.k.a. regex) uses named groups to parse the input string into Real and/or Imaginary named groups. If the regex is successful, at least one (if not both) of these named groups will be populated. If unsuccessful, both named groups will be empty and an exception of type FormatException will be thrown. If at least one of the named groups is properly set, the string representations are converted to Double type and assigned to the appropriate UDT fields. Table 14-2 shows some sample input strings and the values assigned to the UDT fields when they are parsed.
Table 14-2. Complex Number Parsing Samples
The ToString() method is required for all UDTs as well. This method converts the internal UDT data to its string representation. In the case of complex numbers, ToString() needs to perform the following steps:
Notice that if the imaginary part is negative, no sign is appended between the real and imaginary parts, since the sign is already included in the imaginary part:
public override String ToString()
{
String sign = "";
if (this.imaginary >= 0.0)
sign = " + ";
return this.real.ToString() + sign + this.imaginary.ToString() + "i";
}
The IsNull and Null properties are both required by all UDTs. IsNull is a bool property that indicates whether a UDT instance is NULL or not. The Null property returns a NULL instance of the UDT type. One thing you need to be aware of any time you invoke a UDT (or any CLR integration object) from T-SQL is SQL NULL. For purposes of the Complex UDT, we take a cue from T-SQL and return a NULL result any time a NULL is passed in as a parameter to any UDT method. So a Complex value plus NULL returns NULL, as does a Complex value divided by NULL, and so on. You will notice a lot of code in the complete Complex UDT listing that is specifically designed to deal with NULL.
public bool IsNull
{
get
{
return m_Null;
}
}
public static Complex Null
{
get
{
Complex h = new Complex();
h.m_Null = true;
return h;
}
}
This particular UDT includes a constructor function that accepts two Double type values and creates a UDT instance from them:
public Complex(Double r, Double i)
{
this.real = r;
this.imaginary = i;
this.m_Null = false;
}
Tip For a UDT designed as a.NET structure, a constructor method is not required. In fact, a default constructor (that takes no parameters) is not even allowed. To keep later code simple, we added a constructor method to this example.
In the next region, we defined a few useful complex number constants and exposed them as static properties of the Complex UDT:
#region "Useful Complex Number Constants"
// The property "i" is the Complex number 0 + 1i. Defined here because
// it is useful in some calculations
public static Complex i
{
get
{
return new Complex(0, 1);
}
}
...
#endregion
To keep this listing short but highlight the important points, the sample UDT shows only the addition operator for complex numbers. The UDT overrides the + operator. Redefining operators makes it easier to write and debug additional UDT methods. These overridden .NET math operators are not available to T-SQL code, so the standard T-SQL math operators will not work on the UDT.
// Complex number addition
public static Complex operator + (Complex n1, Complex n2)
{
Complex u;
if (n1.IsNull || n2.IsNull)
u = Null;
else
u = new Complex(n1.real + n2.real, n1.imaginary + n2.imaginary);
return u;
}
Performing mathematical operations on UDT values from T-SQL must be done via explicitly exposed methods of the UDT. These methods in the Complex UDT are CAdd and Div, for complex number addition and division, respectively. Note that we chose CAdd (which stands for “complex number add”) as a method name to avoid conflicts with the T-SQL reserved word ADD. We won’t go too deeply into the inner workings of complex numbers, but we chose to implement the basic operators in this listing because some (like complex number addition) are straightforward operations, while others (like division) are a bit more complicated. The math operator methods are declared as static, so they can be invoked on the UDT data type itself from SQL Server instead of on an instance of the UDT.
#region "Exposed Mathematical Basic Operator Methods"
// Add complex number n2 to n1
public static Complex CAdd(Complex n1, Complex n2)
{
return n1 + n2;
}
// Subtract complex number n2 from n1
public static Complex Sub(Complex n1, Complex n2)
{
return n1 - n2;
}
#endregion
Note Static methods of a UDT (declared with the static keyword in C# or the Shared keyword in Visual Basic) are invoked from SQL Server using a format like this:
Complex::CAdd(@nl, @n2)
Nonshared, or instance methods of a UDT are invoked from SQL Server using a format similar to this:
@>nl.CAdd(@n2)
The style of method you use (shared or instance) is a determination you’ll need to make on a case-by-case basis.
Listing 14-15 demonstrates how the Complex UDT can be used, and the results are shown in Figure 14-9.
Listing 14-15. Complex Number UDT Demonstration
CREATE TYPE dbo.Complex
EXTERNAL NAME ApressExamples.[Apress.Examples.Complex];
GO
DECLARE @c complex = ' + 100-10i',
@d complex = '5i';
SELECT 'ADD: ' + @c.ToString() + ' , ' + @d.ToString() AS Op,
complex::CAdd(@c, @d).ToString() AS Result
UNION
SELECT 'DIV: ' + @c.ToString() + ' , ' + @d.ToString(),
complex::Div(@c, @d).ToString()
UNION
SELECT 'SUB: ' + @c.ToString() + ' , ' + @d.ToString(),
complex::Sub(@c, @d).ToString()
UNION
SELECT 'MULT: ' + @c.ToString() + ' , ' + @d.ToString(),
complex::Mult(@c, @d).ToString()
UNION
SELECT 'PI: ',
complex::Pi.ToString();
Figure 14-9. Performing Operations with the Complex UDT
In addition to the basic operations, the Complex class can be easily extended to support several more advanced complex number operators and functions. The code sample download file contains a full listing of an expanded Complex UDT, including all the basic math operators, as well as logarithmic and exponential functions (Log(), Power(), etc.) and trigonometric and hyperbolic functions (Sin(), Cos(), Tanh(), etc.) for complex numbers.
Finally, you can also create .NET triggers. This is logical; after all, triggers are just a specialized type of stored procedures. There are few examples of really interesting .NET triggers. Most of what you want to do in a trigger can be done with regular T-SQL code. When SQL Server 2005 was released, we saw an example of a .NET trigger on a location table that calls a web service to find the coordinates of a city and adds them to a coordinates column. This could at first sound like a cool idea, but if you remember that a trigger is fired inside the scope of the DML statement’s transaction, you can guess that the latency added to every insert and update on the table might be a problem. Usually, we try to keep the trigger impact as light as possible. Listing 14-16 presents an example of a .NET trigger based on our previous regular expression UDF. It tests an e-mail inserted or modified on the AdventureWorks Person.EmailAddress table, and rolls back the transaction if it does not match the pattern of a correct e-mail address. Let’s see it in action.
Listing 14-16. Trigger to Validate an E-mail Address
using System;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
using System.Transactions;
namespace Apress.Examples
{
public partial class Triggers
{
private static readonly Regex email_pattern = new Regex
(
// Everything before the @ sign (the "local part")
"^[a-z0-9!#$%&'*+/=?^_'{|} ∼ −] + (?:\.[a-z0-9!#$%&'*+/=?^_'{|} ∼ −]+)*" +
// Subdomains after the @ sign
"@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.) + " +
// Top-level domains
"(?:[a-z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b$"
);
[Microsoft.SqlServer.Server.SqlTrigger(
Name = "EmailAddressTrigger",
Target = "[Person].[EmailAddress]",
Event = "FOR INSERT, UPDATE")]
public static void EmailAddressTrigger()
{
SqlTriggerContext tContext = SqlContext.TriggerContext;
// Retrieve the connection that the trigger is using.
using (SqlConnection cn
= new SqlConnection(@"context connection=true"))
{
SqlCommand cmd;
SqlDataReader r;
cn.Open();
cmd = new SqlCommand(@"SELECT EmailAddress FROM INSERTED", cn);
r = cmd.ExecuteReader();
try
{
while (r.Read())
{
if (!email_pattern.IsMatch(r.GetString(0).ToLower()))
Transaction.Current.Rollback();
}
}
catch (SqlException ex)
{
// Catch the expected exception.
}
finally
{
r.Close();
cn.Close();
}
}
}
}
}
As we now are used to, we first declare our .NET namespaces. To manage the transaction, we have to declare the System.Transactions namespace. In your Visual Studio project, it might not be recognized. You need to right-click on the project in the Solution Explorer and select “add reference.” Then, go to the SQL Server tab, and check “System.Transactions for framework 4.0.0.0.”
Then, like in our previous UDF, we declare the Regex object. The trigger body follows. In the function’s decoration, we name the trigger, and we declare for which target table it is intended. We also specify at what events it will fire.
[Microsoft.SqlServer.Server.SqlTrigger(
Name = "EmailAddressTrigger",
Target = "[Person].[EmailAddress]",
Event = "FOR INSERT, UPDATE")]
public static void EmailAddressTrigger()
{ ...
Then, we declare an instance of the SqlTriggerContext class. This class exposes a few properties that give information about the trigger’s context, like what columns are updated, what the action is that fired the trigger, and in case of a DDL trigger, it also gives access to the EventData XML structure containing all the execution details.
SqlTriggerContext tContext = SqlContext.TriggerContext;
The next line opens the so-called context connection to SQL Server. There is only one way to access the content of a table: with a T-SQL SELECT statement. Even a .NET code executed inside SQL Server cannot escape from this rule. To be able to retrieve the e-mails that have been inserted or updated, we need to open a connection to SQL Server and query the inserted virtual table. For that, we use a special type of connection available inside CLR integration named the context connection, which is designed to be faster than a regular network or local connection. Then we use a data reader to retrieve the e-mails in the EmailAddress column. We loop through the results and apply the regular expression pattern to each address. If it doesn’t match, we roll back the transaction by using the Transaction.Current.Rollback() method. We need to protect the rollback by a try ... catch block, because it will throw an ambiguous exception, stating that “Transaction is not allowed to roll back inside a user defined routine, trigger or aggregate because the transaction is not started in that CLR level.” This can be safely ignored. Another error will be raised even if the try ... catch block is there, and it must be dealt with at the T-SQL level. We will see that in our example later on.
using (SqlConnection cn
= new SqlConnection(@"context connection = true"))
{
SqlCommand cmd;
SqlDataReader r;
cn.Open();
cmd = new SqlCommand(@"SELECT EmailAddress FROM INSERTED", cn);
r = cmd.ExecuteReader();
try
{
while (r.Read())
{
if (!email_pattern.IsMatch(r.GetString(0).ToLower()))
Transaction.Current.Rollback();
}
}
catch (SqlException ex)
{
// Catch the expected exception.
}
finally
{
r.Close();
cn.Close();
}
}
}
}
Now that the trigger is written, let’s try it out. When the assembly is compiled and added to the AdventureWorks database using CREATE ASSEMBLY, we can add the trigger to the Person.EmailAddress table, as shown in Listing 14-17.
Listing 14-17. Creation of the CLR Trigger to Validate an E-mail Address
CREATE TRIGGER atr_Person_EmailAddress_ValidateEmail
ON Person.EmailAddress
AFTER INSERT, UPDATE
AS EXTERNAL NAME ApressExamples.[Apress.Examples.Triggers].EmailAddressTrigger;
We now try to update a line to an obviously invalid e-mail address in Listing 14-18. The result is shown in Figure 14-10.
Listing 14-18. Setting an Invalid E-mail Address
UPDATE Person.EmailAddress
SET EmailAddress = 'pro%sql@apress@com'
WHERE EmailAddress = '[email protected]';
Figure 14-10. Result of the Trigger’s Action
As you can see, the trigger worked and rolled back the UPDATE attempt, but the error message generated for the CLR code is not very user-friendly. We need to catch the exception in our T-SQL statement. A modified UPDATE dealing with that is shown in Listing 14-19.
Listing 14-19. UPDATE Statement Modified to Handle the Error
BEGIN TRY
UPDATE Person.EmailAddress
SET EmailAddress = 'pro%sql@apress@com'
WHERE EmailAddress = '[email protected]';
END TRY
BEGIN CATCH
IF ERROR_NUMBER() = 3991
RAISERROR('invalid email address', 16, 10)
END CATCH
This CLR trigger is an example, and it might not be the best solution to our e-mail checking needs, for two reasons: firstly because we need to handle the CLR error in our calling code, which forces us to enclose every statement modifying the EmailAddress inside a try ... catch block, and secondly because of performance considerations. Our CLR code loops through a DataReader and checks it line per line. A set-oriented T-SQL trigger like the one shown in Listing 14-20 will certainly be faster, especially if there are many rows affected by the INSERT or UPDATE statement.
Listing 14-20. T-SQL Trigger to Validate an E-mail Address
CREATE TRIGGER atr_Person_EmailAddress_ValidateEmail
ON Person.EmailAddress
AFTER INSERT, UPDATE
AS BEGIN
IF @@ROWCOUNT = 0 RETURN
IF EXISTS (SELECT * FROM inserted WHERE dbo.EmailMatch(EmailAddress) = 0)
BEGIN
RAISERROR('an email is invalid', 16, 10)
ROLLBACK TRANSACTION
END
END;
Summary
SQL Server 2005 introduced SQL CLR integration, allowing you to create UDFs, UDAs, SPs, UDTs, and triggers in managed .NET code. SQL Server 2008 improved on CLR integration by allowing UDTs and UDAs to have a maximum size of 2.1 GB, which is still the case in SQL Server 2012.
In this chapter, we talked about CLR integration usage considerations, and scenarios when CLR integration code might be considered a good alternative to strict T-SQL. We also discussed assemblies and security, including the SAFE, EXTERNAL_ACCESS, and UNSAFE permission sets that can be applied on a per-assembly basis.
Finally, we provided several examples of CLR integration code that cover a wide range of possible uses, including the following:
This chapter has served as an introduction to CLR integration programming. For in-depth CLR integration programming information, I highly recommend Pro SQL Server 2005 Assemblies, by Robin Dewson and Julian Skinner (Apress, 2005). Though written for SQL Server 2005, much of the information it contains is still relevant to SQL Server 2012. In the next chapter, we will introduce client-side .NET connectivity to SQL Server 2012.
EXERCISES
1. [Choose all that apply] SQL Server 2012 provides support for which of the following CLR integration objects:
2. [True/False] SQL Server 2012 limits CLR integration UDAs and UDTs to a maximum size of 8000 bytes.
3. [Choose one] SAFE permissions allow your CLR integration code to
a. Write to the file system
4. [True/False] CLR integration UDAs and UDTs must be defined with the Serializable attribute.
5. [Fill in the blank] A CLR integration UDA that is declared as Format.UserDefined must implement the _________ interface.
6. [Choose all that apply] A CLR integration UDA must implement which of the following methods?
a. Init