SQL Server 2012 is the latest release of Microsoft’s enterprise-class database management system (DBMS). As the name implies, a DBMS is a tool designed to manage, secure, and provide access to data stored in structured collections within databases. T-SQL is the language that SQL Server speaks. T-SQL provides query and data manipulation functionality, data definition and management capabilities, and security administration tools to SQL Server developers and administrators. To communicate effectively with SQL Server, you must have a solid understanding of the language. In this chapter, we will begin exploring T-SQL on SQL Server 2012.
The history of Structured Query Language (SQL), and its direct descendant Transact-SQL (T-SQL), begins with a man. Specifically, it all began in 1970 when Dr. E. F. Codd published his influential paper “A Relational Model of Data for Large Shared Data Banks” in the Communications of the Association for Computing Machinery (ACM). In his seminal paper, Dr. Codd introduced the definitive standard for relational databases. IBM went on to create the first relational database management system, known as System R. It subsequently introduced the Structured English Query Language (SEQUEL, as it was known at the time) to interact with this early database to store, modify, and retrieve data. The name of this early query language was later changed from SEQUEL to the now-common SQL due to a trademark issue.
Fast-forward to 1986 when the American National Standards Institute (ANSI) officially approved the first SQL standard, commonly known as the ANSI SQL-86 standard. Microsoft entered the relational database management system picture a few years later through a joint venture with Sybase and Ashton-Tate (of dBase fame). The original versions of Microsoft SQL Server shared a common code base with the Sybase SQL Server product. This changed with the release of SQL Server 7.0, when Microsoft partially rewrote the code base. Microsoft has since introduced several iterations of SQL Server, including SQL Server 2000, SQL Server 2005, SQL Server 2008 R2, and now SQL Server 2012. In this book, we will focus on SQL Server 2012, which further extends the capabilities of T-SQL beyond what was possible in previous releases.
Imperative vs. Declarative Languages
SQL is different from many common programming languages such as C# and Visual Basic because it is a declarative language. To contrast, languages such as C++, Visual Basic, C#, and even assembler language are imperative languages. The imperative language model requires the user to determine what the end result should be and also tell the computer step by step how to achieve that result. It’s analogous to asking a cab driver to drive you to the airport, and then giving him turn-by-turn directions to get there. Declarative languages, on the other hand, allow you to frame your instructions to the computer in terms of the end result. In this model, you allow the computer to determine the best route to achieve your objective, analogous to just telling the cab driver to take you to the airport and trusting him to know the best route. The declarative model makes a lot of sense when you consider that SQL Server is privy to a lot of “inside information.” Just like the cab driver who knows the shortcuts, traffic conditions, and other factors that affect your trip, SQL Server inherently knows several methods to optimize your queries and data manipulation operations.
Consider Listing 1-1, which is a simple C# code snippet that reads in a flat file of names and displays them on the screen.
Listing 1-1. C# Snippet to Read a Flat File
StreamReader sr = new StreamReader("c:\Person_Person.txt");
string FirstName = null;
while ((FirstName = sr.ReadLine()) != null) {
Console.WriteLine(s); } sr.Dispose();
The example performs the following functions in an orderly fashion:
Consider what happens when you want to add or delete a name from the flat-file “database.” In those cases, you must extend the previous example and add custom routines to explicitly reorganize all the data in the file so that it maintains proper ordering. If you want the names to be listed and retrieved in alphabetical (or any other) order, you must write your own sort routines as well. Any type of additional processing on the data requires that you implement separate procedural routines.
The SQL equivalent of the C# code in Listing 1-1 might look something like Listing 1-2.
Listing 1-2. SQL Query to Retrieve Names from a Table
SELECT FirstName FROM Person.Person;
Tip Unless otherwise specified, you can run all the T-SQL samples in this book in the AdventureWorks 2012 sample database using SQL Server Management Studio or SQLCMD.
To sort your data, you can simply add an ORDER BY clause to the SELECT query in Listing 1-2. With properly designed and indexed tables, SQL Server can automatically reorganize and index your data for efficient retrieval after you insert, update, or delete rows.
T-SQL includes extensions that allow you to use procedural syntax. In fact, you could rewrite the previous example as a cursor to closely mimic the C# sample code. These extensions should be used with care, however, since trying to force the imperative model on T-SQL effectively overrides SQL Server’s built-in optimizations. More often than not, this hurts performance and makes simple projects a lot more complex than they need to be.
One of the great assets of SQL Server is that you can invoke its power, in its native language, from nearly any other programming language. For example, in .NET you can connect and issue SQL queries and T-SQL statements to SQL Server via the System.Data.SqlClient namespace, which we will discuss further in Chapter 15. This gives you the opportunity to combine SQL’s declarative syntax with the strict control of an imperative language.
SQL Basics
Before we discuss developments in T-SQL, or on any SQL-based platform for that matter, we have to make sure we’re speaking the same language. Fortunately for us, SQL can be described accurately using well-defined and time-tested concepts and terminology. We’ll begin our discussion of the components of SQL by looking at statements.
To begin with, in SQL we use statements to communicate our requirements to the DBMS. A statement is composed of several parts, as shown in Figure 1-1.
Figure 1-1. Components of a SQL Statement
As you can see in the figure, SQL statements are composed of one or more clauses, some of which may be optional depending on the statement. In the SELECT statement shown, there are three clauses: the SELECT clause, which defines the columns to be returned by the query; the FROM clause, which indicates the source table for the query; and the WHERE clause, which is used to limit the results. Each clause represents a primitive operation in the relational algebra. For instance, in the example, the SELECT clause represents a relational projection operation, the FROM clause indicates the relation, and the WHERE clause performs a restriction operation.
Note The relational model of databases is the model formulated by Dr. E. F. Codd. In the relational model, what we know in SQL as tables are referred to as relations, hence the name. Relational calculus and relational algebra define the basis of query languages for the relational model in mathematical terms.
ORDER OF EXECUTION
Understanding the logical order in which SQL clauses are applied within a statement or query is important when setting your expectations about results. While vendors are free to physically perform whatever operations, in any order, that they choose to fulfill a query request, the results must be the same as if the operations were applied in a standards-defined order.
The WHERE clause in the example contains a predicate, which is a logical expression that evaluates to one of SQL’s three possible logical results: true, false, or unknown. In this case, the WHERE clause and the predicate limit the results to only rows in which the ContactId equals 1.
The SELECT clause includes an expression that is calculated during statement execution. In the example, the expression EmailPromotion * 10 is used. This expression is calculated for every row of the result set.
SQL THREE-VALUED LOGIC
SQL institutes a logic system that might seem foreign to developers coming from other languages like C++ or Visual Basic (or most other programming languages, for that matter). Most modern computer languages use simple two-valued logic: a Boolean result is either true or false. SQL supports the concept of NULL, which is a placeholder for a missing or unknown value. This results in a more complex three-valued logic (3VL).
Let us give you a quick example to demonstrate. If we asked you the question, “Is x less than 10?” your first response might be along the lines of, “How much is x?” If we refused to tell you what value x stood for, you would have no idea whether x was less than, equal to, or greater than 10; so the answer to the question is neither true nor false—it’s the third truth value, unknown. Now replace x with NULL and you have the essence of SQL 3VL. NULL in SQL is just like a variable in an equation when you don’t know the variable’s value.
No matter what type of comparison you perform with a missing value, or which other values you compare the missing value to, the result is always unknown. We’ll continue the discussion of SQL 3VL in Chapter 3.
The core of SQL is defined by statements that perform five major functions: querying data stored in tables, manipulating data stored in tables, managing the structure of tables, controlling access to tables, and managing transactions. All of these subsets of SQL are defined following:
A SQL Server instance—an individual installation of SQL Server with its own ports, logins, and databases—can manage multiple system databases and user databases. SQL Server has five system databases, as follows:
Microsoft recommends that you use the system-provided stored procedures and catalog views to modify system objects and system metadata, and let SQL Server manage the system databases. You should avoid modifying the contents and structure of the system databases directly through ad hoc T-SQL. Only modify the system objects and metadata by executing the system stored procedures and functions.
User databases are created by database administrators (DBAs) and developers on the server. These types of databases are so called because they contain user data. The AdventureWorks2012 sample database is one example of a user database.
Every SQL Server database has its own associated transaction log. The transaction log provides recoverability in the event of failure and ensures the atomicity of transactions. The transaction log accumulates all changes to the database so that database integrity can be maintained in the event of an error or other problem. Because of this arrangement, all SQL Server databases consist of at least two files: a database file with an .mdf extension and a transaction log with an .ldf extension.
SQL folks, and IT professionals in general, love their acronyms. A common acronym in the SQL world is ACID, which stands for “atomicity, consistency, isolation, durability.” These four words form a set of properties that database systems should implement to guarantee reliability of data storage, processing, and manipulation.
The transaction log is one of the main tools SQL Server uses to enforce the ACID concept when storing and manipulating data.
SQL Server 2012 supports database schemas, which are logical groupings by the owner of database objects. The AdventureWorks2012 sample database, for instance, contains several schemas, such as HumanResources, Person, and Production. These schemas are used to group tables, stored procedures, views, and user-defined functions (UDFs) for management and security purposes.
Tip When you create new database objects, like tables, and don’t specify a schema, they are automatically created in the default schema. The default schema is normally dbo, but DBAs may assign different default schemas to different users. Because of this, it’s always best to specify the schema name explicitly when creating database objects.
SQL Server supports several types of objects that can be created within a database. SQL stores and manages data in its primary data structures, tables. A table consists of rows and columns, with data stored at the intersections of these rows and columns. As an example, the AdventureWorks HumanResources.Department table is shown in Figure 1-2.
Figure 1-2. Representation of the HumanResources.Department Table
In the table, each row is associated with columns and each column has certain restrictions placed on its content. These restrictions comprise the data domain. The data domain defines all the values a column can contain. At the lowest level, the data domain is based on the data type of the column. For instance, a smallint column can contain any integer values between −32,768 and +32,767.
The data domain of a column can be further constrained through the use of check constraints, triggers, and foreign key constraints. Check constraints provide a means of automatically checking that the value of a column is within a certain range or equal to a certain value whenever a row is inserted or updated. Triggers can provide similar functionality to check constraints. Foreign key constraints allow you to declare a relationship between the columns of one table and the columns of another table. You can use foreign key constraints to restrict the data domain of a column to only include those values that appear in a designated column of another table.
RESTRICTING THE DATA DOMAIN: A COMPARISON
In this section, we have given a brief overview of three methods of constraining the data domain for a column. Each method restricts the values that can be contained in the column. Here’s a quick comparison of the three methods:
Which method you use to constrain the data domain of your column(s) needs to be determined by your project-specific requirements on a case-by-case basis.
A view is like a virtual table—the data it exposes is not stored in the view object itself. Views are composed of SQL queries that reference tables and other views, but they are referenced just like tables in queries. Views serve two major purposes in SQL Server: they can be used to hide the complexity of queries, and they can be used as a security device to limit the rows and columns of a table that a user can query. Views are expanded, meaning that their logic is incorporated into the execution plan for queries when you use them in queries and DML statements. SQL Server may not be able to use indexes on the base tables when the view is expanded, resulting in less-than-optimal performance when querying views in some situations.
To overcome the query performance issues with views, SQL Server also has the ability to create a special type of view known as an indexed view. An indexed view is a view that SQL Server persists to the database like a table. When you create an indexed view, SQL Server allocates storage for it and allows you to query it like any other table. There are, however, restrictions on inserting, updating, and deleting from an indexed view. For instance, you cannot perform data modifications on an indexed view if more than one of the view’s base tables will be affected. You also cannot perform data modifications on an indexed view if the view contains aggregate functions or a DISTINCT clause.
You can also create indexes on an indexed view to improve query performance. The downside to an indexed view is increased overhead when you modify data in the view’s base tables, since the view must be updated as well.
Indexes are SQL Server’s mechanisms for optimizing access to data. SQL Server 2012 supports several types of indexes, including the following:
A nonclustered index is analogous to an index in the back of a book.
You can also include nonkey columns in your nonclustered indexes with the INCLUDE clause of the CREATE INDEX statement. The included columns give you the ability to work around SQL Server’s index size limitations.
SQL Server supports the installation of server-side T-SQL code modules via stored procedures (SPs). It’s very common to use SPs as a sort of intermediate layer or custom server-side application programming interface (API) that sits between user applications and tables in the database. Stored procedures that are specifically designed to perform queries and DML statements against the tables in a database are commonly referred to as CRUD (create, read, update, delete) procedures.
User-defined functions (UDFs) can perform queries and calculations, and return either scalar values or tabular result sets. UDFs have certain restrictions placed on them. For instance, they cannot utilize certain nondeterministic system functions, nor can they perform DML or DDL statements, so they cannot make modifications to the database structure or content. They cannot perform dynamic SQL queries or change the state of the database (i.e., cause side effects).
SQL CLR Assemblies
SQL Server 2012 supports access to Microsoft .NET functionality via the SQL Common Language Runtime (SQL CLR). To access this functionality, you must register compiled .NET SQL CLR assemblies with the server. The assembly exposes its functionality through class methods, which can be accessed via SQL CLR functions, procedures, triggers, user-defined types, and user-defined aggregates. SQL CLR assemblies replace the deprecated SQL Server extended stored procedure (XP) functionality available in prior releases.
Tip Avoid using XPs on SQL Server 2012. The same functionality provided by XPs can be provided by SQL CLR code. The SQL CLR model is more robust and secure than the XP model. Also keep in mind that the XP library is deprecated and XP functionality may be completely removed in a future version of SQL Server.
Now that we’ve given a broad overview of the basics of SQL Server, we’ll take a look at some recommended development tips to help with code maintenance. Selecting a particular style and using it consistently helps immensely with both debugging and future maintenance. The following sections contain some general recommendations to make your T-SQL code easy to read, debug, and maintain.
SQL Server ignores extra whitespace between keywords and identifiers in SQL queries and statements. A single statement or query may include extra spaces and tab characters, and can even extend across several lines. You can use this knowledge to great advantage. Consider Listing 1-3, which is adapted from the HumanResources.vEmployee view in the AdventureWorks2012 database.
Listing 1-3. The HumanResources.vEmployee View from the AdventureWorks2012 Database
SELECT e.BusinessEntityID, p.Title, p.FirstName, p.MiddleName, p.LastName, p.Suffix, e.JobTitle,
pp.PhoneNumber, pnt.Name AS PhoneNumberType, ea.EmailAddress,
p.EmailPromotion, a.AddressLine1, a.AddressLine2, a.City, sp.Name AS StateProvinceName,
a.PostalCode, cr.Name AS CountryRegionName, p.AdditionalContactInfo
FROM HumanResources.Employee AS e INNER JOIN Person.Person AS p ON p.BusinessEntityID =
e.BusinessEntityID INNER JOIN Person.BusinessEntityAddress AS bea ON bea.BusinessEntityID =
e.BusinessEntityID INNER JOIN Person.Address AS a ON a.AddressID = bea.AddressID INNER JOIN
Person.StateProvince AS sp ON sp.StateProvinceID = a.StateProvinceID INNER JOIN
Person.CountryRegion AS cr ON cr.CountryRegionCode = sp.CountryRegionCode LEFT OUTER JOIN
Person.PersonPhone AS pp ON pp.BusinessEntityID = p.BusinessEntityID LEFT OUTER JOIN
Person.PhoneNumberType AS pnt ON pp.PhoneNumberTypeID = pnt.PhoneNumberTypeID LEFT OUTER JOIN
Person.EmailAddress AS ea ON p.BusinessEntityID = ea.BusinessEntityID
This query will run and return the correct result, but it’s very hard to read. You can use whitespace and table aliases to generate a version that is much easier on the eyes, as demonstrated in Listing 1-4.
Listing 1-4. The HumanResources.vEmployee View Reformatted for Readability
SELECT
e.BusinessEntityID,
p.Title,
p.FirstName,
p.MiddleName,
p.LastName,
p.Suffix,
e.JobTitle,
pp.PhoneNumber,
pnt.Name AS PhoneNumberType,
ea.EmailAddress,
p.EmailPromotion,
a.AddressLine1,
a.AddressLine2,
a.City,
sp.Name AS StateProvinceName,
a.PostalCode,
cr.Name AS CountryRegionName,
p.AdditionalContactInfo
FROM HumanResources.Employee AS e INNER JOIN Person.Person AS p
ON p.BusinessEntityID = e.BusinessEntityID
INNER JOIN Person.BusinessEntityAddress AS bea
ON bea.BusinessEntityID = e.BusinessEntityID
INNER JOIN Person.Address AS a
ON a.AddressID = bea.AddressID
INNER JOIN Person.StateProvince AS sp
ON sp.StateProvinceID = a.StateProvinceID
INNER JOIN Person.CountryRegion AS cr
ON cr.CountryRegionCode = sp.CountryRegionCode
LEFT OUTER JOIN Person.PersonPhone AS pp
ON pp.BusinessEntityID = p.BusinessEntityID
LEFT OUTER JOIN Person.PhoneNumberType AS pnt
ON pp.PhoneNumberTypeID = pnt.PhoneNumberTypeID
LEFT OUTER JOIN Person.EmailAddress AS ea
ON p.BusinessEntityID = ea.BusinessEntityID;
Notice that the ON keywords are indented, associating them visually with the INNER JOIN operators directly before them in the listing. The column names on the lines directly after the SELECT keyword are also indented, associating them visually with the SELECT keyword. This particular style is useful in helping visually break up a query into sections. The personal style you decide upon might differ from this one, but once you have decided on a standard indentation style, be sure to apply it consistently throughout your code.
Code that is easy to read is easier to debug and maintain. The code in Listing 1-4 uses table aliases, plenty of whitespace, and the semicolon (;) terminator marking the end of the SELECT statement to make the code more readable. Required in some instances, it is a good idea to get into the habit of using the terminating semicolon in your SQL queries.
Tip Semicolons are required terminators for some statements in SQL Server 2012. Instead of trying to remember all the special cases where they are or aren’t required, it is a good idea to use the semicolon statement terminator throughout your T-SQL code. You will notice the use of semicolon terminators in all the examples in this book.
SQL Server allows you to name your database objects (tables, views, procedures, and so on) using just about any combination of up to 128 characters (116 characters for local temporary table names), as long as you enclose them in single quotes (‘’) or brackets ([ ]). Just because you can, however, doesn’t necessarily mean you should. Many of the allowed characters are hard to differentiate from other similar-looking characters, and some might not port well to other platforms. The following suggestions will help you avoid potential problems:
Finally, to make your code more readable, select a capitalization style for your identifiers and code, and use it consistently. Our preference is to fully capitalize T-SQL keywords and use mixed-case and underscore characters to visually “break up” identifiers into easily readable words. Using all capital characters or inconsistently applying mixed case to code and identifiers can make your code illegible and hard to maintain. Consider the example query in Listing 1-5.
Listing 1-5. All-Capital SELECT Query
SELECT P.BUSINESSENTITYID, P.FIRSTNAME, P.LASTNAME, S.SALESYTD
FROM PERSON.PERSON P INNER JOIN SALES.SALESPERSON SP
ON P.BUSINESSENTITYID = SP.BUSINESSENTITYID;
The all-capital version is difficult to read. It’s hard to tell the SQL keywords from the column and table names at a glance. Compound words for column and table names are not easily identified. Basically your eyes have to work a lot harder to read this query than they should, which makes otherwise simple maintenance tasks more difficult. Reformatting the code and identifiers makes this query much easier on the eyes, as Listing 1-6 demonstrates.
Listing 1-6. Reformatted, Easy-on-the-Eyes Query
SELECT
p.BusinessEntityID,
p.FirstName,
p.LastName,
sp.SalesYTD
FROM Person.Person p INNER JOIN Sales.SalesPerson sp
ON p.BusinessEntityID = sp.BusinessEntityID;
The use of all capitals for the keywords in the second version makes them stand out from the mixed-case table and column names. Likewise, the mixed-case column and table names make the compound word names easy to recognize. The net effect is that the code is easier to read, which makes it easier to debug and maintain. Consistent use of good formatting habits helps keep trivial changes trivial and makes complex changes easier.
When writing SPs and UDFs, its good programming practice to use the “one entry, one exit” rule. SPs and UDFs should have a single entry point and a single exit point (RETURN statement). The following SP retrieves the ContactTypelD number from the AdventureWorks2012 Person.ContactType table for the ContactType name passed into it. If no ContactType exists with the name passed in, a new one is created, and the newly created ContactTypelD is passed back. Listing 1-7 demonstrates this simple procedure with one entry point and several exit points.
Listing 1-7. Stored Procedure Example with One Entry and Multiple Exits
CREATE PROCEDURE dbo.GetOrAdd_ContactType
(
@Name NVARCHAR(50),
@ContactTypeID INT OUTPUT
)
AS
DECLARE @Err_Code AS INT;
SELECT @Err_Code = 0;
SELECT @ContactTypeID = ContactTypeID
FROM Person.ContactType
WHERE [Name] = @Name;
IF @ContactTypeID IS NOT NULL
RETURN; -- Exit 1: if the ContactType exists
INSERT
INTO Person.ContactType ([Name], ModifiedDate)
SELECT @Name, CURRENT_TIMESTAMP;
SELECT @Err_Code = 'error';
IF @Err_Code <> 0
RETURN @Err_Code; -- Exit 2: if there is an error on INSERT
SELECT @ContactTypeID = SCOPE_IDENTITY();
RETURN @Err_Code; -- Exit 3: after successful INSERT
GO
This code has one entry point, but three possible exit points. Figure 1-3 shows a simple flowchart for the paths this code can take.
Figure 1-3. Flowchart for Example with One Entry and Multiple Exits
As you can imagine, maintaining code such as in Listing 1-7 becomes more difficult because the flow of the code has so many possible exit points, each of which must be accounted for when you make modifications to the SP. Listing 1-8 updates Listing 1-7 to give it a single entry point and a single exit point, making the logic easier to follow:
Listing 1-8. Stored Procedure with One Entry and One Exit
CREATE PROCEDURE dbo.GetOrAdd_ContactType
(
@Name NVARCHAR(50),
@ContactTypeID INT OUTPUT
)
AS
DECLARE @Err_Code AS INT;
SELECT @Err_Code = 0;
SELECT @ContactTypeID = ContactTypeID
FROM Person.ContactType
WHERE [Name] = @Name;
IF @ContactTypeID IS NULL
BEGIN
INSERT
INTO Person.ContactType ([Name], ModifiedDate)
SELECT @Name, CURRENT_TIMESTAMP;
SELECT @Err_Code = @@error;
IF @Err_Code = 0 -- If there's an error, skip next
SELECT @ContactTypeID = SCOPE_IDENTITY();
END
RETURN @Err_Code; -- Single exit point
GO
Figure 1-4 shows the modified flowchart for this new version of the SP.
Figure 1-4. Flowchart for Example with One Entry and One Exit
The one entry and one exit model makes the logic easier to follow, which in turn makes the code easier to manage. This rule also applies to looping structures, which you implement via the WHILE statement in T-SQL. Avoid using the WHILE loop’s CONTINUE and BREAK statements and the GOTO statement; these statements lead to old-fashioned, difficult-to-maintain spaghetti code.
Defensive Coding
Defensive coding involves anticipating problems before they occur and mitigating them through good coding practices. The first and foremost lesson of defensive coding is to always check user input. Once you open your system up to users, expect them to do everything in their power to try to break your system. For instance, if you ask users to enter a number between 1 and 10, expect that they’ll ignore your directions and key in ; DROP TABLE dbo.syscomments; -- at the first available opportunity. Defensive coding practices dictate that you should check and scrub external inputs. Don’t blindly trust anything that comes from an external source.
Another aspect of defensive coding is a clear delineation between exceptions and run-of-the-mill issues. The key is that exceptions are, well, exceptional in nature. Ideally, exceptions should be caused by errors that you can’t account for or couldn’t reasonably anticipate, like a lost network connection or physical corruption of your application or data storage. Errors that can be reasonably expected, like data entry errors, should be captured before they are raised to the level of exceptions. Keep in mind that exceptions are often resource intensive, expensive operations. If you can avoid an exception by anticipating a particular problem, your application will benefit in both performance and control. In fact, SQL Server 2012 offers a valuable new error handling feature called THROW. The TRY/CATCH/THROW statemetns will be discussed in more detail in Chapter 17.
Consider the SELECT * style of querying. In a SELECT clause, the asterisk (*) is a shorthand way of specifying that all columns in a table should be returned. Although SELECT * is a handy tool for ad hoc querying of tables during development and debugging, you should normally not use it in a production system. One reason to avoid this method of querying is to minimize the amount of data retrieved with each call. SELECT * retrieves all columns, whether or not they are needed by the higher-level applications. For queries that return a large number of rows, even one or two extraneous columns can waste a lot of resources.
Also, if the underlying table or view is altered, columns might be added to or removed from the returned result set. This can cause errors that are hard to locate and fix. By specifying the column names, your front-end application can be assured that only the required columns are returned by a query, and that errors caused by missing columns will be easier to locate.
As with most things, there are always exceptions—for example, if you are using the FOR XML AUTO clause to generate XML based on the structure and content of your relational data. In this case, SELECT * can be quite useful, since you are relying on FOR XML to automatically generate the node names based on the table and column names in the source tables.
Tip SELECT * should be avoided but if you do need to use it always try to limit the data set being returned. One way of doing so is to make full use of the T-SQL TOP command and restrict the number of records returned. In practice though you should never write SELECT * in your code–even for small tables. Small tables today could be large tables tomorrow.
When you create SPs, UDFs, or any script that uses T-SQL user variables, you should initialize those variables before the first use. Unlike some other programming languages that guarantee that newly declared variables will be initialized to 0 or an empty string (depending on their data types), T-SQL guarantees only that newly declared variables will be initialized to NULL. Consider the code snippet shown in Listing 1-9.
Listing 1-9. Sample Code Using an Uninitialized Variable
DECLARE @i INT; SELECT @i = @i + 5; SELECT @i;
The result is NULL, a shock if you were expecting 5. Expecting SQL Server to initialize numeric variables to 0 (like @i in the previous example) or an empty string will result in bugs that can be extremely difficult to locate in your T-SQL code. To avoid these problems, always explicitly initialize your variables after declaration, as demonstrated in Listing 1-10.
Listing 1-10. Sample Code Using an Initialized Variable
DECLARE @i INT = 0; -- Changed this statement to initialize @i to 0
SELECT @i = @i + 5;
SELECT @i;
Summary
This chapter has served as an introduction to T-SQL, including a brief history of SQL and a discussion of the declarative programming style. We started this chapter with a discussion of ISO SQL standard compatibility in SQL Server 2012 and the differences between imperative and declarative languages, of which SQL is the latter. We also introduced many of the basic components of SQL, including databases, tables, views, SPs, and other common database objects. Finally, we provided our personal recommendations for writing SQL code that is easy to debug and maintain. We subscribe to the “eat your own dog food” theory, and throughout this book we will faithfully follow the best practice recommendations that we’ve asked you to consider.
The next chapter provides an overview of the new and improved tools available out of the box for developers. Specifically Chapter 2 will discuss the SQLCMD text-based SQL client (originally a replacement for osql), SSMS, SQL Server 2012 Books Online (BOL), and some of the other available tools that make writing, editing, testing, and debugging easier and faster than ever.
EXERCISES