Databases are one of computing’s most important inventions. They allow applications to store massive quantities of information, with the ability to search through millions of items and retrieve the ones you need in a fraction of a second. A high-quality database can scale to large numbers of concurrent end users, while providing very reliable storage, even in the face of system crashes. And even if you don’t need the scalability, databases still look compelling if your program needs to remember data for any length of time—applications that store valuable information usually rely on databases.
The .NET Framework provides several different ways to communicate with databases. We will mainly be looking at its most recently introduced data access mechanism, the Entity Framework, and how that works with the LINQ features of C#. But first, we’ll take a quick look at all of the database features of the .NET Framework, to put the Entity Framework in context.
The main focus of this chapter, the Entity Framework, was first released as part of Service Pack 1 for Visual Studio 2008, which emerged less than a year after the initial (pre-Service-Pack) release of Visual Studio 2008. This was remarkable, since that first release had already introduced a brand-new data access feature, LINQ to SQL, but then Microsoft has released a lot of data access technologies over the years.
While the pace of change can sometimes seem daunting, each new piece has been a useful advance, and despite the new APIs, the data access services that appeared in .NET v1.0 are still relevant today. So we’re not in a state of continuous revolution—new features mostly add layers of functionality. This means it’s useful to understand all the parts in order to know what to choose for your applications, so we’ll review what each is for and how the pieces build on one another.
.NET v1 provided a set of data access services called ADO.NET.[34] In more recent years, ADO.NET seems to have grown into an umbrella term—as new data access features have been added, most (but not all) appear in the ADO.NET section of the documentation. But to understand the layers, it’s worth starting with the two parts that were in the first version: interfaces for querying and updating databases, and classes that support disconnected use of data.
ADO.NET defines a family of interfaces that provide a uniform way to perform basic operations such as executing queries, inserting new rows into database tables, and updating or deleting existing rows. Some data access features are common to many different programming systems—if you’re familiar with ODBC, or with Java’s JDBC, you could think of these ADO.NET interfaces as being the .NET equivalent of those APIs.
These interfaces provide the most direct and efficient way to access the basic services offered by relational databases, which is why the other data access features we’ll be looking at in this chapter do not replace this part of ADO.NET. They build on this low-level feature to provide higher-level services.
Because it’s not the main focus of this chapter, we won’t go into too much detail on how this part of ADO.NET works, and will instead just provide a quick taste. Table 14-1 shows the main ADO.NET base classes that represent the various things needed to get a database to do some work.
Table 14-1. ADO.NET basic data access abstract base classes
| Class | Represents |
|---|---|
| `DbConnection` | Connection to a database |
| `DbCommand` | Command to be executed by a database |
| `DbParameter` | Parameter for a command |
| `DbDataRecord` | Single row of data returned by a query |
| `DbDataReader` | Iterator over the full results returned by a query (potentially many rows and many row sets); implements `IDataRecord` |
| `DbTransaction` | Database transaction |
Example 14-1 shows the typical pattern of communication. It starts by creating a connection object—a `SqlConnection` here because this code connects to SQL Server, but for other databases you'd use other types derived from `DbConnection`, such as `OracleConnection`. Next, it builds a command object, setting its `CommandText` to the SQL we want the database to execute. This particular example is a parameterized command—it selects addresses from a specified state, so we supply the command with a parameter object specifying the state. Then we execute the command by calling `ExecuteReader`, using the data reader object it returns to iterate through the rows produced by the query, and we print out the values. (This particular example assumes you have a SQL Server instance called `.\SQLEXPRESS`. If you installed the full edition or developer edition of SQL Server, specify just `.` instead of `.\SQLEXPRESS`. See Getting up and running with SQL Server 2008 Express for information on getting the samples installed.)
Example 14-1. Basic data access with ADO.NET
```csharp
string sqlConnectionString = @"Data Source=.\sqlexpress;" +
    "Initial Catalog=AdventureWorksLT2008;Integrated Security=True";
string state = "California";

using (DbConnection conn = new SqlConnection(sqlConnectionString))
using (DbCommand cmd = conn.CreateCommand())
{
    cmd.CommandText =
        "SELECT AddressLine1, AddressLine2, City FROM SalesLT.Address WHERE " +
        "StateProvince=@state";
    DbParameter stateParam = cmd.CreateParameter();
    stateParam.ParameterName = "@state";
    stateParam.Value = state;
    cmd.Parameters.Add(stateParam);

    conn.Open();
    using (DbDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            string addressLine1 = reader.GetString(0);
            // AddressLine2 is nullable, so we need to be prepared to get
            // back either a string or a DBNull
            string addressLine2 = reader.GetValue(1) as string;
            string city = reader.GetString(2);

            Console.WriteLine(addressLine1);
            Console.WriteLine(addressLine2);
            Console.WriteLine(city);
        }
    }
}
```
You might be wondering why we're fiddling around with parameter objects when it would have been simpler to just put the state directly into the SQL string. This particular example hardcodes the state, so that would have worked, but the technique here would be important if the value were picked at runtime. In general, building SQL queries with string concatenation is a dangerous thing to do—if any of the text comes from outside your code (e.g., from a form on a web page, or part of a URL), your code will be vulnerable to a SQL injection attack. Imagine that Example 14-1 were part of a web application, and `state` came from part of a URL such as http://example.com/showinfo?state=California. Users are free to modify URLs—you can just type them into the address bar—so a malicious user might decide to modify that part of the URL. If the code just took the string from the URL and concatenated it directly into the SQL, we would effectively be giving anyone with an Internet connection the ability to run arbitrary SQL commands on our database—SQL queries can contain multiple commands, so users would be able to add extra commands to run after the `SELECT`. Parameters are one way to avoid this, because the value of a parameter will not be treated as SQL. So it's a good idea to get in the habit of using parameters whenever some part of the query needs to change at runtime.
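To make the danger concrete, here is a hypothetical sketch (the query shape mirrors Example 14-1, but the malicious payload is invented for illustration) showing how naive concatenation lets input rewrite the SQL itself, whereas a parameter would simply be bound as a single string value:

```csharp
using System;

class SqlInjectionSketch
{
    // Dangerous: the input becomes part of the SQL text itself
    public static string BuildQueryByConcatenation(string state)
    {
        return "SELECT AddressLine1 FROM SalesLT.Address WHERE StateProvince='"
            + state + "'";
    }

    static void Main()
    {
        // A well-behaved request produces the query you intended...
        Console.WriteLine(BuildQueryByConcatenation("California"));

        // ...but a malicious one closes the string literal and appends an
        // extra command. A parameterized query avoids this because the
        // whole input is treated as one value, never as SQL.
        string attack = "x'; DELETE FROM SalesLT.Address; --";
        Console.WriteLine(BuildQueryByConcatenation(attack));
    }
}
```

Nothing here touches a database; the point is simply that the second call yields a string containing a `DELETE` command that the database would happily execute.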
The API we used here directly reflects the steps needed to communicate with a database, so we have to write a lot of code to bridge between the queries, parameters, and columns of the database world and the world of C#. Just as a sneak preview, Example 14-2 shows the equivalent code using the Entity Framework. Notice that instead of having to build a parameter object for a parameterized query, we've just been able to use a LINQ `where` clause and the C# `==` comparison syntax. (The Entity Framework performs a parameterized query under the covers, so this is safe from SQL injection attacks.) Also notice that all the database columns are available as object properties, so we don't have to call `GetString` or similar helpers to retrieve column values.
Example 14-2. LINQ to Entities versus ADO.NET
```csharp
string state = "California";
using (var context = new AdventureWorksLT2008Entities())
{
    var addresses = from address in context.Addresses
                    where address.StateProvince == state
                    select address;

    foreach (var address in addresses)
    {
        Console.WriteLine(address.AddressLine1);
        Console.WriteLine(address.AddressLine2);
        Console.WriteLine(address.City);
    }
}
```
Example 14-1 has one obvious benefit in exchange for the complexity: we have complete control over the SQL query. You can’t see the SQL in Example 14-2 because it gets generated for you. In general, the low-level ADO.NET API gives us more direct access to database features—for example, with SQL Server you can arrange to be notified when a query you executed earlier will now return different results due to changes in the database. (This can be useful in caching systems—ASP.NET’s cache can take advantage of this. It needs to be used with care, however, because it requires you to keep a database connection open at all times, which can cause severe scaling problems.)
Another potential benefit is that Example 14-1 does not require the application to commit to the Entity Framework’s way of doing things. Not all applications necessarily want to use databases in the way the Entity Framework chooses to.
The use of this old-style part of ADO.NET usually comes down to a need for control over some specific aspects of data access, or occasionally because it offers performance benefits in certain specialized scenarios. But for the majority of developers, this style of data access will be unnecessarily low-level and verbose.
These interfaces are not the only part of ADO.NET v1—it has another piece whose job is to manage data after the query that fetched it has completed.
ADO.NET defines the `DataSet` class, which is a collection of `DataTable` objects. A `DataTable` is an in-memory copy of some tabular data. Typically, this would be fetched from a database table or view, although it's possible to build up a `DataTable` from any information source—it provides methods for creating new rows from scratch.

A `DataSet` can be a convenient way of loading a small subset of a database's contents into client-side code, enabling information to be browsed locally using data binding. It also supports some basic client-side processing of a kind that might normally be done in the database—you can perform searching, filtering, and sorting, for example. In Windows GUIs, moving this sort of work to the client side can improve responsiveness—the user doesn't have to wait for the database to respond to see results. This is what's meant by disconnected operation—you can still work with the data even after closing the connection to the database.
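For instance, here is a small self-contained sketch (the table and column names are invented for illustration) that builds a `DataTable` from scratch and then filters and sorts it client-side with a `DataView`, with no database connection involved:

```csharp
using System;
using System.Data;

class DisconnectedDataDemo
{
    public static DataTable BuildAddressTable()
    {
        var table = new DataTable("Address");
        table.Columns.Add("City", typeof(string));
        table.Columns.Add("StateProvince", typeof(string));

        // Rows can come from anywhere--here we just add them by hand
        table.Rows.Add("Seattle", "Washington");
        table.Rows.Add("Spokane", "Washington");
        table.Rows.Add("Portland", "Oregon");
        return table;
    }

    static void Main()
    {
        DataTable table = BuildAddressTable();

        // Client-side filtering and sorting--no round trip to a database
        var view = new DataView(table)
        {
            RowFilter = "StateProvince = 'Washington'",
            Sort = "City ASC"
        };

        foreach (DataRowView row in view)
        {
            Console.WriteLine(row["City"]); // Seattle, then Spokane
        }
    }
}
```

This is the kind of work that data binding in a Windows GUI would otherwise push back to the database on every change of filter or sort order.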
`DataSet` objects are serializable, so it's possible to save one to disk or send it across a network. It can use an XML representation, which in theory makes it possible for non-.NET code to access the information in a `DataSet`. However, while it's certainly workable, in practice this seems not to be a popular technique. This may be because the XML representation is relatively complex and unique to the `DataSet`, so there's not much support for it outside of the .NET Framework.
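As a rough illustration, the `WriteXml` method can serialize a `DataSet` to any `TextWriter` (the table contents here are invented):

```csharp
using System;
using System.Data;
using System.IO;

class DataSetXmlDemo
{
    public static string SerializeToXml()
    {
        var dataSet = new DataSet("Addresses");
        var table = new DataTable("Address");
        table.Columns.Add("City", typeof(string));
        table.Rows.Add("Seattle");
        dataSet.Tables.Add(table);

        // WriteXml emits the DataSet's XML representation; passing
        // XmlWriteMode.WriteSchema would include an inline schema too
        var writer = new StringWriter();
        dataSet.WriteXml(writer);
        return writer.ToString();
    }

    static void Main()
    {
        Console.WriteLine(SerializeToXml());
    }
}
```

The output nests an element per row inside a root element named after the `DataSet`, which hints at why consuming this format from non-.NET code takes some effort.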
Visual Studio is able to generate derived classes to build a so-called strongly typed `DataSet`, whose tables offer row objects with .NET properties representing columns in the corresponding database table. Strongly typed `DataSet`s are often used to reduce the amount of code required to bridge between C# and the database. However, since LINQ to SQL and LINQ to Entities came along, this use of `DataSet`s has become less popular, because the LINQ-based approaches offer the same benefit but are typically easier to use. So `DataSet`s are somewhat out of favor today.
The low-level ADO.NET data access interfaces were the main way to access data in .NET right up until .NET 3.5 and Visual Studio 2008 shipped, bringing LINQ.
As we saw in Chapter 8, LINQ lets you perform tasks with collections of data including filtering, sorting, and grouping. In that chapter, we were working only with objects, but these are exactly the jobs that databases are good at. Moreover, one of the motivations behind LINQ’s design was to make it easier to use databases from code. As you can see in Example 14-2, LINQ blends data access seamlessly into C# code—this database example looks very similar to the object examples we saw in the earlier chapter.
Example 14-2 uses LINQ to Entities—a LINQ provider for the Entity Framework. The Entity Framework didn’t appear until Service Pack 1 of Visual Studio 2008, and there’s an older database LINQ provider called LINQ to SQL that appeared in the first Visual Studio 2008 release.
LINQ to SQL works only with SQL Server and SQL Server Compact 3.5, and has a fairly narrow goal. It aims to reduce the overhead involved in writing data access code by providing a convenient C# syntax for working with the data in a SQL Server database.
The Entity Framework is similar, but it adds a couple of additional features. First, it is designed to support multiple database vendors—it has an open provider model, enabling support to be written for any database, and you can get providers for most popular databases. Second, the Entity Framework allows the .NET representation to have a different structure from your database schema if necessary. You can define a conceptual model whose entities do not necessarily correspond directly to rows of particular tables—an entity might include data that spans multiple tables in the database itself. This entity can then be represented as a single object.
Of course, it's possible to have your conceptual model correspond exactly to your database model—you're free to create a straightforward mapping where one entity represents one row in one table. Used in this way, the Entity Framework, in conjunction with LINQ to Entities, makes LINQ to SQL look redundant. So why do we have both?
The main reason LINQ to SQL exists is that it was ready when Visual Studio 2008 shipped, whereas Microsoft hadn’t finished the Entity Framework at that point. LINQ was a major part of that release, and since one of the main motivations behind LINQ was data access, shipping without a LINQ-based data access feature would have been a bit of a letdown. LINQ to SQL was developed by a different team (it came from the LINQ team, and not the data access group), and it was ready earlier, due no doubt in part to its less ambitious goals.
Microsoft has stated that while both technologies are fully supported, the Entity Framework is where the majority of its efforts will now be focused. Visual Studio 2010 adds a few new LINQ to SQL features, but LINQ to Entities will see more development in the long run.
That’s why this chapter’s focus is the Entity Framework (although a lot of the concepts here apply equally to both technologies). That being said, both authors really like LINQ to SQL. In scenarios where we’re using SQL Server and where we don’t need the conceptual model and mapping features of the Entity Framework, we’re both more inclined to use LINQ to SQL because of its simplicity and because we’ve already learned how to use it. But if you learn only one data access technology for .NET, the Entity Framework looks like the better choice for the long term.
By the time Microsoft shipped the Entity Framework, various third-party options for mapping relational data into object models had been around for a while. We’re not going to talk about them in this book, but it’s useful to be aware that the Entity Framework isn’t the only game in town.
Perhaps the best known alternative is NHibernate. This is a .NET version of Hibernate, a popular Java ORM (Object Relational Mapper). NHibernate had already been around for a few years by the time the Entity Framework emerged (and its Java progenitor is considerably older). So in many respects it’s a more mature and more fully featured ORM than the Entity Framework. On the other hand, NHibernate predates LINQ (and Java currently has nothing resembling LINQ), so at the time of this writing, its LINQ support is somewhat limited.
Many other ORMs are available for .NET, some free and some commercial. They are too numerous to mention here, as a quick web search will confirm.
Most communication with databases happens over specialized, vendor-specific protocols. Firewalls are usually configured not to let such protocols through, and with good reason: from a security perspective, making your database directly accessible on the Internet tends to look like a very bad idea. Nonetheless, some people want to do this, and there are scenarios in which it’s not the terrible idea it might first seem, particularly if you can exercise sufficient control over what gets exposed.
With WCF Data Services, you can present a relational data store over HTTP and XML or JSON. You can be selective about what data you expose and to whom. Moreover, the model you present doesn’t necessarily have to be the same as your underlying database structure. In fact, there doesn’t have to be a database involved at all—there’s a provider model that enables you to present any data through this mechanism, as long as you can find a way to make it look like relational data.
You will normally use WCF Data Services in conjunction with the Entity Framework—you can define the entities you’d like to present over HTTP, and use the framework’s mapping services to bridge between that and the underlying data store. So we’ll be looking at these services in more detail later in the chapter, once we’ve finished exploring the Entity Framework.
The focus of WCF Data Services is different than for the other data access features we’ve discussed so far—it’s mainly about presenting data on the network, where everything else has been about consuming data. However, there’s also a client-side component that provides LINQ-based querying for such services. While it’s part of the WCF Data Services technology, it’s optional—you’re not obliged to use it on the client. And this client doesn’t strictly require WCF Data Services on the server—the client-side parts could be used against any service that exposes data in the same way.
Silverlight uses a seriously trimmed down version of the .NET Framework to keep its download size and install time tolerably small, and one casualty of that trimming is data access support. Size is not the only reason, though—it wouldn't normally make sense for a Silverlight client application to attempt to connect directly to a database, because Silverlight is a client-side web technology, and most system administrators work to ensure that their databases are not accessible via their native protocols over the Internet.
Of course, a direct connection to a database server might be an option in an intranet scenario, but it's not supported. Silverlight offers LINQ, but neither the LINQ to SQL nor the LINQ to Entities providers are available, because the underlying database access mechanisms these providers rely on are missing. The only supported database access mechanism in Silverlight is the WCF Data Services client.
The full .NET Framework is designed to work with a wide range of databases. The simple ADO.NET data access we started with uses interfaces to allow database vendors to supply their own database-specific implementations. Likewise, the Entity Framework is database-agnostic—it has an open provider model designed to allow support for any relational database to be added. Of course, Microsoft ships a provider for its own database, SQL Server, but other suppliers offer providers for various databases, including Oracle, MySQL, PostgreSQL, SQLite, Sybase, and DB2.
In this book, we will use SQL Server. The examples work with SQL Server 2008 Express, which is available for free. (Some editions of Visual Studio will automatically install SQL Server 2008 Express for you by default.) The Express edition of SQL Server is the same database engine as the "real" versions, but with some limits on database size and with some of the more advanced features missing. Despite being a trimmed down version, it's easily capable of supporting substantial websites. It can also be used in client applications written with WPF or Windows Forms, to support client-side data stores or caching, although it can complicate the installation process for such an application—installing a SQL Server instance is not a trivial task.
If you want to follow the examples in this chapter, not only will you need a copy of SQL Server 2008 Express installed, but you’ll also need to install a sample database. We’ll be using the lightweight version of the Adventure Works database available from http://msftdbprodsamples.codeplex.com/.
Getting this sample up and running is slightly fiddly, because there are numerous different versions of the Adventure Works sample—there are full and lightweight versions for both SQL Server 2005 and SQL Server 2008, and each version of SQL Server comes in various editions, not all of which put their datafiles in the same place. Because of all the variations, it’s quite easy to find that the sample database has failed to appear even though the installation appeared to proceed without error.
Moreover, the steps required to install the database change from time to time, as new versions are released. We had been planning to provide detailed steps here, but while we were writing this book, changes to the database installer rendered the first set of instructions we had produced useless. Since that could well happen again between us finishing the book and you reading it, we’re providing the instructions as part of the sample code you can download for this book from the O’Reilly website so that we can update them when necessary. You can find these at http://oreilly.com/catalog/9780596159832/.
Now that we’ve finished a quick survey of the data access features available in .NET and we’ve seen how to get the sample database installed, let’s look at the Entity Framework in more detail. We’ll start with the model at the heart of the framework.
The main goal of the Entity Framework (or EF for short) is to make it easier for your code to work with data in the database. C# objects are quite different in nature than the information stored in a relational database, and the process of managing these differences, and transferring data between these two worlds, is called mapping. (So the Entity Framework is a kind of ORM.) As Figure 14-1 illustrates, mapping happens in both directions. As information is fetched from the database, it is loaded into objects. And if your C# code modifies these objects or creates new ones, you can arrange for the database to be correspondingly updated.
The design of a database doesn’t always correspond directly to data structures convenient for our application code. There are many reasons we might want our code to work with a model that looks slightly different from the data. The database may contain information not required by the part of the application we’re writing, so we may need only a subset. Information about a particular entity may have been split across multiple tables for performance reasons. Naming conventions in the database might not suit our code.
So the Entity Framework allows us to control the mapping. We can define a conceptual model that describes the entities as we’d like to work with them from C#, along with mappings that describe how that model maps onto the underlying storage. The EF also requires us to provide a store schema, which is a definition of the structure we expect it to find in the database. This may seem redundant—after all, the database knows its own schema, so why would the EF need a copy? There are a couple of reasons. First, it’s possible to define the model before you create the database—you can generate a database schema from the store schema. Second, you can configure aspects of how the Entity Framework uses the database, such as whether it uses queries or stored procedures to access particular tables. Settings that are associated with the database itself rather than what the EF does with data belong in the store schema rather than the mappings or conceptual schema.
The three parts shown in Figure 14-1—the conceptual model, the storage model, and the mappings between them—are collectively known as the Entity Data Model, or EDM.
There are many constraints on the conceptual model, because the model is useful only if you can construct a successful mapping. There are limits on what mappings are able to do, so your existing database structure will impose some restrictions on the model. Developers who are new to the Entity Framework often find that they have a lot less freedom in the design of the conceptual model than they first presumed. We’ll see what mappings are possible in due course, but for now, do not imagine that the EF is able to take any arbitrary conceptual model and bridge it to any old database structure—there is necessarily a close relationship between the database and the conceptual model.
If you use the EF in the simplest way possible, your conceptual model will be the same as your storage model, and the mapping will be very straightforward. If you use Visual Studio’s wizard for adding EF support to a project, you’ll end up with exactly this sort of direct mapping, with one entity type for each table or view you import. But you can then tweak things to suit your needs. We’ll walk through the wizard now—even though it produces a straightforward mapping where the conceptual model matches the storage model, it still has to generate a complete set of model and mapping definitions, so it’s instructive to look at what it produces.
You can add EF support to any .NET project (except for Silverlight projects). We’ll use a console application for the examples. In the Add New Item dialog, we’ll select Visual C# Items→Data, and then choose the ADO.NET Entity Data Model item template, calling the new file “AdventureWorksModel”.
When you add an Entity Data Model to your project, Visual Studio asks whether you want to start from scratch or base your model on an existing database. We’ll choose that simpler second option. If you’ve previously told Visual Studio about any databases you’re using—either via the Server Explorer toolbar or by using this or other data-related wizards—it will show them in a drop down, but you can provide Visual Studio with new connection details from within the wizard. For this walkthrough, we’re going to connect to the AdventureWorksLT2008 sample database.
The wizard uses the name of your connection for one of the types that it generates. You'll see the identifier `AdventureWorksLT2008Entities` cropping up in various examples later. If you happen to give your connection a different name in Visual Studio, you'll need to use that name in the code.
Once you've chosen a database, Visual Studio will show a tree view of all the tables, views, and stored procedures—you can use these as the starting point for your model. For each item you select, it will add corresponding items to the store schema, the conceptual schema, and the mappings. When you complete the wizard, it will generate an .edmx file that defines the generated entity model. Visual Studio opens a graphical view of this file—Figure 14-2 shows the conceptual model that appears if you select the `Customer`, `SalesOrderHeader`, and `SalesOrderDetail` tables in the wizard and then click Finish.
This view shows only the conceptual model. You can see slightly more of the EDM in the Model Browser, shown in Figure 14-3. This will normally appear by default when you open an EDM, but if you rearrange your windows and lose track of it, you can right-click on the background of the model and choose Model Browser from the context menu.

The browser lists both the conceptual schema (under the `AdventureWorksLT2008Model` node here) and the store schema (under `AdventureWorksLT2008Model.Store`). The three selected tables are visible in both, and if you were to expand them, you'd see that the properties of each entity in the conceptual model correspond directly to the columns of the tables in the store schema.
Even the Model Browser doesn’t show the complete picture, as the Entity Data Model has three parts: the conceptual schema, the store schema, and the mappings. To see the mappings, you can select either entities or properties of entities. When you do this in either the main .edmx editor view (Figure 14-2) or the Model Browser (Figure 14-3), the Mapping Details window, shown in Figure 14-4, will display the mappings for the selected item. The Mapping Details panel should appear automatically, but if you don’t see it, you can open it with the View→Other Windows→Entity Data Model Mapping Details menu item.
As Figure 14-4 shows, the generated mapping is pretty simple. On the left you can see each column from the table definition in the store schema, and on the right you can see which entity property it is mapped to. Since the store schema and conceptual model were generated directly from the database schema, there's nothing complicated going on—the same names appear on either side, and the only difference is that the lefthand side shows data types from the database world such as `nvarchar` and `bit`, while the righthand side shows .NET data types such as `String` and `Boolean`.
Visual Studio puts the whole Entity Data Model definition in an .edmx file, which is just XML. The wizards and editor windows we’ve seen so far are just convenient views into that XML. If you look directly at the XML in the .edmx, you’ll see it contains sections corresponding to the three parts of the model—storage schema, conceptual schema, and mappings. But the whole point of this exercise was to make it easier to use data from code. So Visual Studio generates code based on the contents of any .edmx files in your project.
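As a rough, illustrative skeleton only—the real file carries versioned XML namespaces (which differ between EF releases), full schema declarations, and many attributes omitted here—the layout of an .edmx looks something like this:

```xml
<edmx:Edmx xmlns:edmx="http://schemas.microsoft.com/ado/2008/10/edmx">
  <edmx:Runtime>
    <edmx:StorageModels>
      <!-- Store schema: tables and columns as they exist in the database -->
    </edmx:StorageModels>
    <edmx:ConceptualModels>
      <!-- Conceptual schema: the entity types your C# code works with -->
    </edmx:ConceptualModels>
    <edmx:Mappings>
      <!-- Mappings: how conceptual entities map onto store tables -->
    </edmx:Mappings>
  </edmx:Runtime>
  <edmx:Designer>
    <!-- Layout information used by the Visual Studio designer -->
  </edmx:Designer>
</edmx:Edmx>
```

The three `Runtime` sections correspond directly to the three parts of the Entity Data Model described above.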
For each entity type you define, a corresponding .NET class will be generated. These classes provide normal properties for each property in the entity type’s definition. So when you create and edit entity types, you are in effect defining .NET types that you can use from your C# code.
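To give a feel for this, here is a heavily simplified, hypothetical sketch of the kind of class that might be produced for the Customer entity—the real generated code includes mapping attributes, change-notification calls in the property setters, partial methods, and relationship properties, all omitted here:

```csharp
using System;

// Simplified illustration only: real generated entities derive from
// EntityObject and notify the Entity Framework when properties change,
// so it can track which objects need writing back to the database.
public class Customer
{
    public int CustomerID { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateTime ModifiedDate { get; set; }
}
```

The point is simply that each entity property surfaces as an ordinary .NET property, which is what makes the LINQ examples later in the chapter possible.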
The generated types derive from `EntityObject`, a base class that enables the object to participate in the Entity Framework. This includes features such as change tracking, so that the framework can know when it needs to write updates to the database. The first version of the EF required entities to derive from this base class or to implement certain EF interfaces, but .NET 4 introduced so-called POCO (Plain Old CLR Object) support, which makes it possible to use an existing class hierarchy with the Entity Framework without having to modify those classes (as long as you can create a successful mapping). There's more work to do that way—if you don't derive from `EntityObject`, you need to write extra supporting code to provide the EF with enough information to know how it should handle change tracking, identity, and relationships for your objects. Here we're sticking with the simpler approach of letting the wizard generate classes that derive from the EF's base type.
Visual Studio also generates one extra class representing something called the object context. You use this to obtain entity objects representing data already in the database, and it's also where you go to add new data. And as we'll see later, this object provides other services for managing data access operations. This type derives from `ObjectContext`, and sometimes it's just referred to as the context. Example 14-3 uses this generated context type to retrieve rows from the `SalesOrderHeader` table for a particular date.
Example 14-3. Using generated entity types
```csharp
using (var dbContext = new AdventureWorksLT2008Entities())
{
    DateTime orderDate = new DateTime(2004, 6, 1);
    var orders = from order in dbContext.SalesOrderHeaders
                 where order.OrderDate == orderDate
                 select order;

    foreach (SalesOrderHeader order in orders)
    {
        Console.WriteLine(order.TotalDue);
    }
}
```
Notice that this example wraps the context in a `using` statement—the object context is a disposable resource because it does a lot of work behind the scenes, and it needs to tidy up once you no longer need the state it builds up. So it's important that you dispose it when you're done with it.
The object context’s type here is AdventureWorksLT2008Entities
. By default,
Visual Studio will just append the word Entities to
your database connection name. You can change this by selecting the
EntityContainer
item in the Model
Browser—you can see this in the middle of Figure 14-3—and then use the Properties panel
to choose its name. But we’ll keep the default name in the
examples.
Notice that the LINQ query in Example 14-3 uses the context’s SalesOrderHeaders
property as the query
source. That’s not quite the same as the table name—the wizard has added
an s. By default, the Entity Framework wizard will
attempt to pluralize and depluralize words as appropriate—in general, it
gives entity types singular names while properties that return
collections of entities are plural. (The names in our conceptual model
can differ slightly from our storage model thanks to the Entity Data
Model’s mapping.) If you don’t like this plural handling, there’s a
checkbox to turn it off when you import tables with the wizard.
Example 14-3 also uses the
SalesOrderHeader
class generated for
the entity type of the same name. The order
range variable in the LINQ query is of
this type, as is the order
iteration
variable in the loop.
It’s this generated entity class that enables us to refer to
database columns using normal C# syntax. The LINQ query’s where
clause uses that entity class’s OrderDate
property to build a query that uses
the OrderDate
column of the
corresponding database table. Likewise, the loop uses normal C# property
syntax to retrieve TotalDue
, which
represents the column of the same name in the database.
If this seems rather uneventful, well, that’s the idea. Compare this to the much more fiddly code in Example 14-1—by mapping database columns to properties, the Entity Framework reduces the amount of friction involved in writing data access code.
You can find the generated source code for the entities in the
Solution Explorer by expanding the .edmx file—you’ll find a file with a
similar name, but with .Designer.cs in place of .edmx (so AdventureWorks.Designer.cs in this case).
As with all generated code you should avoid modifying
it—Visual Studio tends to regenerate code from scratch each time any
setting changes. But if you’d like to add features to these generated
classes, that’s easily done—all the classes are generated with the
partial
keyword, meaning that you can
put additional members in separate source files. You can add another
class definition with the same name as an entity type, marked with the
partial
keyword, to any source
file. The C# compiler will effectively merge your partial class with
the generated partial class.
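For instance, adding a computed property to the generated Customer entity looks like the following. (The first partial class here stands in for the generated half, so that the sketch is self-contained; in a real project Visual Studio provides that half, and the property names are those of the Adventure Works Customer table.)

```csharp
using System;

// Stand-in for the generated half of the entity class.
public partial class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Our half, in a separate source file. The compiler merges the two
// partial class definitions into a single Customer class.
public partial class Customer
{
    public string FullName
    {
        get { return FirstName + " " + LastName; }
    }
}
```

Because the addition lives in its own file, regenerating the model never touches it.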
Let’s change things a bit, so the mapping has something to
do. Most of the column names in this example database happen to fit the
usual .NET conventions for property names, but there’s an exception: the
rowguid
column in SalesOrderHeader
is not capitalized in the
usual way. (This column exists to support SQL Server replication, so
it’s fairly unusual to want to use it from code, but in this example
it’s the only column with a funny-looking name.) If you change this name
to RowGuid
in the designer (either by
double-clicking on the property or by using the Properties panel) Visual
Studio will update the mapping, and the Mapping Details panel will show
that the rowguid
column in the table
is mapped to the RowGuid
property of
the entity. (If you’d prefer a less subtle change, you could rename the
Customer
entity’s ModifiedDate
to, say, LastChanged
. The mapping lets you use any
names you like.)
Changing the names of a few columns isn’t very exciting. (And with this particular example database it’s not even very useful, although if you’re dealing with more idiosyncratic naming schemes, renaming becomes more important.) So let’s look at one of the more interesting mapping features: the way in which the EF handles relationships between entities.
Databases usually have relationships between tables. In
the Adventure Works example, the Customer
table has a foreign key relationship
with the SalesOrderHeader
table. Both
tables have a CustomerID
column. This
is the primary key of the Customer
table, and the database schema includes a constraint enforcing that the
CustomerID
column in the SalesOrderHeader
can contain only a value for
which a corresponding Customer
row
exists. Or to put it more simply, each SalesOrderHeader
row must be related to a
specific Customer
row.
Incidentally, this has nothing to do with the relations in a relational database. The relational name comes from the set theory underpinning databases, and a relation is not the same thing as a relationship here—a relation effectively corresponds to a table.
Visual Studio’s EDM wizard looks for foreign key constraints in the database schema to discover the relationships between tables. In the EDM, it turns these into associations. (The distinction between a relationship and an association is somewhat subtle. An association is a named item in the Entity Data Model representing a particular relationship. The main reason this distinction exists is that relationships are a slightly more general concept—associations are capable of modeling only certain kinds of relationships.) Just as tables added with the wizard end up appearing in all three parts of the EDM—a table will appear in the store schema, a corresponding entity will be added to the conceptual schema, and there will be a mapping between the two—a similar process occurs for associations. If a foreign key constraint indicates that there’s a relationship between two database tables added through the wizard, Visual Studio will add an association to the EDM’s store schema and also to the conceptual schema, and it will set up a suitable mapping between these two associations. And on top of this, it will add navigation properties to the related entities in the conceptual model.
In previous versions of the Entity Framework, foreign key
columns represented with navigation properties would not get scalar
properties providing direct access to the key values—there would have
been no CustomerID
property on the SalesOrderHeader
type, for example. This
proved awkward in practice, so starting with .NET 4 relationships are
represented both as the underlying foreign key value and also as a
navigation property.
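In entity class terms, that means the same relationship shows up twice. The following sketch shows the general shape (illustrative types, not the exact generated code):

```csharp
// How a .NET 4 entity can expose one relationship two ways.
// (Simplified, hand-written shapes for illustration.)
public class Customer
{
    public int CustomerID { get; set; }
}

public class SalesOrderHeader
{
    // The raw foreign key value is available directly...
    public int CustomerID { get; set; }

    // ...and the same relationship is also exposed as a
    // navigation property referring to the related entity.
    public Customer Customer { get; set; }
}
```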
Associations represent relationships between entities,
and the most natural way to present them in the world of objects is
through properties. For example, you’d expect an object representing a
sales order to provide a property that refers to the related customer.
That’s exactly what the EF does, and it also works in reverse: the
customer object provides a property that holds a collection of
references to all the customer’s orders. Example 14-4 shows a LINQ query that gets
the number of SalesOrderHeader
rows
associated with each customer. It fetches one property from the
Customer
entity class that maps to
a column in the database (CustomerID
) and also uses a property called
SalesOrderHeader
, which represents
the customer’s orders.
Example 14-4. Using a navigation property
using (var dbContext = new AdventureWorksLT2008Entities())
{
    var customerOrderCounts = from cust in dbContext.Customers
                              select new
                              {
                                  cust.CustomerID,
                                  OrderCount = cust.SalesOrderHeaders.Count
                              };

    foreach (var customerInfo in customerOrderCounts)
    {
        Console.WriteLine("Customer {0} has {1} orders",
                          customerInfo.CustomerID, customerInfo.OrderCount);
    }
}
The database table that the Customer
entity class represents does not
have a column called SalesOrderHeader
. The Entity Framework
wizard added this property to represent the relationship between the
Customer
and SalesOrderHeader
tables. This is not an
ordinary property—in Figure 14-2 you can see that it
appears separately, under Navigation Properties.
From C# code, a navigation property looks like a collection.
Example 14-4 just retrieves the
Count
property, but we could do
more advanced things. The query in Example 14-5 has a nested
query for each customer that looks for all shipped orders (those with
a Status
of 5
), and for each one it reads the total due
for that order and a count of all the SalesOrderDetails
rows associated with that
order. So this uses two navigation properties—the one representing the
relationship between customers and orders, and the one representing
the relationship between orders and order details.
Example 14-5. Traversing multiple relationships with navigation properties
var info = from cust in dbContext.Customers
           select new
           {
               cust.CustomerID,
               Orders = from order in cust.SalesOrderHeaders
                        where order.Status == 5
                        select new
                        {
                            order.TotalDue,
                            ItemCount = order.SalesOrderDetails.Count
                        }
           };
There’s a reason we’ve used LINQ in these last two examples—it
happens to avoid an issue with navigation properties. How does the EF
decide how many entities to load for us, and when? In Example 14-4, the LINQ query just
retrieves two pieces of information for each customer—the CustomerID
and the order count—and while
Example 14-5 is more
complex, it’s still circumscribed, so the EF can inspect the query to
work out exactly what it needs to retrieve. But when we’re not using
LINQ, how does the EF know what to do? For instance, consider the code
in Example 14-6.
Example 14-6. Following an association after the initial query
Customer myCustomer = dbContext.Customers.Single(
    cust => cust.CustomerID == 29531);

Console.WriteLine(myCustomer.SalesOrderHeaders.Count);
This fetches the entity for a specific customer, and then tries
to get the number of SalesOrderHeader
entities to which this item
is related. Prior to .NET 4, this did not work—it would print out
0
, even though the example database
has one related order for this customer. In .NET 3.5 SP1, the Entity
Framework would initialize navigation properties such as the Customer
object’s SalesOrderHeaders
property with an empty
collection, and would load the related objects only if we ask it to,
using the Load
method shown in
Example 14-7.
Example 14-7. Explicitly loading entities for an association
Customer myCustomer = dbContext.Customers.Single(
cust => cust.CustomerID == 29531);
myCustomer.SalesOrderHeaders.Load();
Console.WriteLine(myCustomer.SalesOrderHeaders.Count);
.NET 4 adds an alternative way to do this, called lazy loading. Rather than having to
call Load
explicitly, the EF can
automatically load related objects at the point at which you access
them. The context has a property to control this:
dbContext.ContextOptions.LazyLoadingEnabled = false;
This is true
by default;
setting it to false
reverts to the
pre-.NET 4 behavior. With this option switched on, Example 14-6 is equivalent to
Example 14-7, because
the EF will call Load
for us when
we first try to use the navigation property. (The collection ignores
calls to Load
if the entities are
already loaded, so requesting multiple loads is not a problem.)
In either case, the EF has to make an extra trip to the
database. The call to Single
will
fetch the customer from the database before returning, which means
that a second request is required when we later ask it (either
explicitly or implicitly) to fetch the related rows, because the EF
didn’t know we were going to use these items until we asked for them.
This might not be a problem, but in general, the more trips you make
to the database, the slower things go.
Be wary of enabling lazy loading, because it can sometimes result in a lot of unnecessary database requests. For example, one author was involved with a project that had some diagnostic code that “helpfully” wrote a snapshot of certain objects into a log, including the value of all their properties. Unfortunately, this code was recursive—if a property referred to another object, it would display that too, and if a property referred to a collection of objects, it would show all of them. This logging code had cycle detection, so it wouldn’t get stuck in an infinite loop, but otherwise it wouldn’t stop until it had shown every object reachable from the starting point. Unfortunately, lazy loading was enabled, so when this code was given an entity, it ended up fetching all entities that were related, no matter how distantly, to the first object at hand, and it hammered the database with thousands of requests each time a log entry was generated.
Modern databases are surprisingly fast—it’s possible for this sort of problem to go unnoticed on development machines with their own local database instance. But you probably don’t want it happening on a busy live server.
To get consistent results you’d want to make sure the initial
query and subsequent lazy loads happen as part of a transaction (as
shown later), but to ensure scalability in a busy system you want to
minimize the number of requests made in any single transaction. So you
can tell the EF that you want certain related entities to be fetched
at the same time as the main result. You do this with the Include
method shown
in Example 14-8, which is
available on any of the entity sets provided by the context.
Example 14-8. Specifying relationships to preload
var customersWithOrderDetails = dbContext.Customers.Include(
    "SalesOrderHeaders.SalesOrderDetails");
Customer myCustomer = customersWithOrderDetails.Single(
    cust => cust.CustomerID == 29531);

Console.WriteLine(myCustomer.SalesOrderHeaders.Count);
This call to Include
asks to
load related entities available through the Customer
entity’s SalesOrderHeaders
property. (These will be
loaded regardless of the lazy loading setting.) It also says that for
each of those related entities, the EF should load any related
entities visible through the SalesOrderDetails
property. In other words,
this tells the EF that we would like it to fetch all of the orders for
this customer and all of the details for those orders. It will
generate a single query that fetches all of the necessary information
in one request.
If you’re wondering why it doesn’t just prefetch all related items all of the time, consider the performance implications. In some circumstances, aggressively prefetching all related items might amount to attempting to copy a significant fraction of your database into memory! But even in more circumscribed cases, fetching more data than you need can slow your system down or reduce scalability.
So far we have seen only so-called one-to-many relationships—one customer can be related to many orders, one order can be related to many order details. But there are other kinds of relationships.
The multiplicity of a relationship refers to the number of participants at either end of the association. In the Entity Framework, an association’s multiplicity determines the nature of the navigation properties that represent the relationship.
In the Entity Framework, there are always two ends to an association, regardless of the multiplicity. For example, we have customers at one end of a relationship and orders at the other end. The multiplicity describes how many items may be at a particular end, not how many ends there are.
You will sometimes want to represent more complex relationships—for example, a so-called ternary relationship involves three kinds of parties. This is a different concept from multiplicity and is called degree. For example, consider a teaching arrangement in a college, where a student is taught a subject by a teacher; this relationship involves three entities (student, subject, and teacher). These higher-degree relationships are typically modeled in the database by having a table just for the relationship itself. Likewise, the EDM does not directly support relationships with a degree of more than two, so you would represent such a relationship with a distinct entity type in the conceptual model, adding associations between that entity and all the participants in the relationship.
For each end of a relationship, you can specify a multiplicity
of either 1
, 0..1
, or *
. The first, 1
, means what it says—there is always one
item at that end of the association. The last, *
, means any number—there can be zero, one,
or several items at that end. A multiplicity of 0..1
means zero or one—this indicates that
the association is optional, but where present, there is just one
entity at this end.
In a one-to-many relationship, the two
ends have a multiplicity of 1
and
*
, respectively. You can see this
in Figure 14-2—the lines
between entities represent associations, and the multiplicity appears
at each end of the line. So an item at the first end can be related to
any number of items at the second end; an item at the second end is
always related to exactly one item at the first. In C#, the entity at
the 1
end would have a navigation
property that offers a collection, in order to provide access to the
many end. The entity at the *
end would provide a simpler noncollection
property to get back to the one entity it is related to.
A variation on this theme has 0..1
instead of 1
at the first end, and *
at the second end as before. This is
similar to a one-to-many relationship, except items at the
many end don’t necessarily have to be related to
an item at the other end. For example, you might want to represent the
relationship between managers and their reports. But if you go far
enough up the corporate hierarchy, you will find someone who has no
manager—the navigation property would return null. So a simple
one-to-many relationship doesn’t work here—you would need 0..1
instead of 1
at the manager end of
the association.
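In C#, a 0..1 end surfaces as a navigation property that can be null, so code walking such a relationship has to check for the absent case. The following sketch uses hypothetical Employee types (not part of the Adventure Works model) to illustrate:

```csharp
// Hypothetical self-referential relationship: 0..1 at the manager
// end, * at the reports end.
public class Employee
{
    public string Name { get; set; }

    // Null for whoever sits at the top of the corporate hierarchy.
    public Employee Manager { get; set; }
}

public static class HierarchyHelper
{
    // Counts how many management levels sit above an employee by
    // following the 0..1 navigation property until it is null.
    public static int CountLevelsAbove(Employee e)
    {
        int levels = 0;
        while (e.Manager != null)
        {
            levels++;
            e = e.Manager;
        }
        return levels;
    }
}
```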
Sometimes one-to-one relationships crop up—each item at one end is always related to exactly one item at the other end. This is an unusual kind of relationship because it implies that entities are inextricably and exclusively linked. Relationships that sound like they might be one-to-one are often not. Here’s an illustration from popular culture, describing a relationship between a master and an apprentice expressed as: “Always two, there are. No more, no less. A master, and an apprentice.”[35] A master always has an apprentice, an apprentice always has a master, so isn’t that a one-to-one relationship? In fact, this might need to be a one-to-many relationship because on the death of an apprentice, the master takes a new apprentice. (The apprentice has just one master, as the only career paths are promotion to master or untimely death. So we can at least be sure that this is not a many-to-many relationship.) The constraint expressed here is merely that the master has a one-at-a-time approach to relationships, much like serial monogamy. (For example, both Darth Maul and Darth Vader were apprentices of Darth Sidious.) So if the database needs to reflect the full history rather than just the current state, a one-to-one relationship won’t be sufficient. (Although if you only need the database to store the current state, one-to-one might be fine here.) In databases, one-to-one relationships often exist because information about a single entity has been split across multiple tables, perhaps for performance reasons. (The EF lets you map this back to a single entity in the conceptual model, so such relationships are likely to be more common in the store schema than the conceptual schema.)
Variations on one-to-one where one or the other end is optional
can be useful.[36] For example, you might have an entity representing a
customer and an entity representing an account. An organization (such
as a butcher shop) might choose to have a policy where customers are
not required to have accounts, but where accounts are held any single
customer can have only one account, and accounts must be held by
exactly one customer. (That’s not the only imaginable policy, of
course.) The relationship between a customer entity and an account
entity would have a multiplicity of 1
at the customer end and 0..1
at the account end.
Finally, there are many-to-many relationships. For
example, you might have an entity type to represent a standard part
such as an M3 bolt, and an entity to represent a part manufacturer.
Many manufacturers are capable of producing M3 bolts, and most
manufacturers produce more than one kind of product. To model the
relationship of who produces what in the EDM, you could use an
association with a multiplicity of *
for both ends of the association. And in
code, both entities would have navigation properties offering
collections of objects.
However, there’s an issue with many-to-many relationships in the EF. In the database, such a relationship is represented as a separate table, where each row contains two foreign keys, one for each end of the relationship. If that’s all the table contains, the EF will happily let you map this table to an association in the conceptual model, and the navigation properties will work as described. However, if the table contains other information, you will end up needing to represent it as an entity in its own right. For example, given the product/manufacturer example earlier, it might turn out to be useful to know what product code a particular supplier uses for a particular standard product. There’s no place for this information to go if you just have navigation properties on the product and manufacturer that point to one another—you would need an extra entity type to hold this property that is specific to a particular product/manufacturer combination.
This can get slightly awkward when there are columns in the relationship table that your application doesn’t particularly care about, but which the EF insists are mapped because they are nonnullable and don’t have default values. Your conceptual model would not be able to represent this table as a simple many-to-many association, because that would leave nowhere to map the relationship property. (The underlying issue here is the same one that prevents you from omitting certain database columns from your entities.)
Finally, we’ll look at one more feature of the Entity Framework’s mapping capabilities: support for inheritance.
Inheritance presents a challenge for an ORM, because the typical object-oriented notions of inheritance don’t have any direct parallel in the relational model. Various solutions exist because there isn’t one really good way to do this. The Entity Framework supports mappings for a couple of the popular approaches for attempting to bridge this chasm.
While there are several approaches to mapping (which we’ll get to shortly), the conceptual model’s handling of inheritance works the same way in all cases, and is very similar to inheritance in C#. Any entity type can optionally specify one other entity type as its base type. Entities with a base type inherit all the properties from that base. An entity cannot specify more than one base type, but you are allowed to derive from an entity that derives from another entity (i.e., you can have an inheritance chain). And the corresponding generated entity classes that you use from C# will represent these inheritance relationships with normal class inheritance.
You will need to define mappings for your base entity type in the usual way. All the derived types will inherit these mappings. The question is: how do we map features unique to individual derived types?
The first mapping approach involves mapping all entity types
sharing a particular base entity type to a single table in the database.
The entity type the EF chooses to represent a particular row is chosen
based on a discriminator column—in the mapping you
simply provide a list that says, for example, if the discriminator
column contains 1, the entity type is Employee
, and if it’s 2, the type is Manager
, while if it’s 3, the type is Director
, and so on. These derived types will
presumably have additional properties distinguishing them from one
another, and these would map to nullable columns in the table. They will
need to be nullable, because these columns will have values only when
you’re using the derived types that support them—non-nullable database
columns need to be mapped to properties in the base entity type if
you’re using this mapping style.
The second mapping approach uses a separate table for each derived type. Derived types still inherit the base mappings, so in this scenario, derived entity types will be involved with two or more tables: the table unique to the derived type, along with any tables used by the base type. This approach requires all the tables involved to use the same primary key.
None of these mapping features would be much use without some way to retrieve data from the database, so we’ll now look at how to execute queries in the Entity Framework.
We’ve seen some simple LINQ-based examples for retrieving data from the database with the Entity Framework. Under the covers, the EF turns a LINQ query into a SQL query that the database understands. In fact, there are two ways of getting the EF to query the database for data: LINQ and something called Entity SQL. We’ve seen some simple LINQ to Entities examples already, but we’ll now look at it in more detail.
The LINQ provider for the Entity Framework, LINQ to Entities, supports all of the standard LINQ operators we saw in Chapter 8, but it works a little differently. The idea of deferred execution is still present, and it’s even more important. The point at which you cause the LINQ query to execute—the instant at which your code first starts trying to use the results—is the point at which the EF will need to send a request to the database. So looking at the code from Example 14-3, the statement shown in Example 14-9 does not get anything from the database.
Example 14-9. Simple LINQ to Entities query expression
var orders = from order in dbContext.SalesOrderHeaders
             where order.OrderDate == orderDate
             select order;
As always with LINQ, a query expression only
defines a query—orders
is an object that knows what it’s
supposed to return if anything happens to enumerate it. So it’s the
foreach
loop in Example 14-3 that kicks off the actual
request.
The way the EF processes the request is different from how LINQ to
Objects works. LINQ to Objects works by forming a chain of operators
that work sequentially—the source collection might pass through the
Where
operator, followed by, say, an
OrderBy
or a Group
operator. The Where
operator in LINQ to Objects works by
walking through every single item in the source, discarding the ones
that don’t meet the filter criteria, and the ones that do meet the
criteria get passed on to the next item in the chain.
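That item-at-a-time style is easy to sketch. The essence of the LINQ to Objects Where operator is an iterator along these lines (a simplified version, without the argument validation and extension-method packaging the real operator has):

```csharp
using System;
using System.Collections.Generic;

public static class SimplifiedLinq
{
    // A simplified LINQ to Objects-style Where: it walks every item
    // in the source, passing on only those for which the predicate
    // returns true. Thanks to yield return, execution is deferred
    // until something enumerates the result.
    public static IEnumerable<T> Where<T>(
        IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }
}
```

Every single source item flows through the predicate, which is exactly what we do not want to happen with a million-row database table.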
We really don’t want data access code to work that way, and as mentioned earlier, the EF lets the database do the filtering, which is far more efficient than fetching an entire table and then filtering the items in code. We’ll now verify that it really works this way by using the SQL Profiler tool to examine what the EF does for us.
SQL Profiler is not part of SQL Server 2008 Express, not even if you install the version with advanced services and Management Studio. You will need a full edition of SQL Server. (The Developer edition will do.) SQL Profiler works just fine with the Express version of the database, but it’s distributed and licensed only as part of the fuller editions. As long as you have a suitable license, you can install just the tools from a full edition SQL Server onto a machine that has only the Express version of the database, and it will work just fine. (Unfortunately, if you already installed the Express version of Management Studio, you can’t install the full management tools on the same machine.)
A full description of the SQL Profiler is beyond the scope of this book—we’re using it to show you exactly what the Entity Framework asked the database to do. However, it’s a profoundly useful tool, even if you use it only for the simple task of discovering what SQL queries are being executed. If you plan to do much work with databases, it’s well worth learning how to use it.
By single-stepping through the code in Visual Studio while running
the SQL Profiler, we can see that nothing appears in the profiler until
we start to execute the foreach
loop,
at which point the profiler shows an Audit
Login
message, indicating that our program has opened a
connection to the database. This is followed by a RPC:Completed
message,
indicating that SQL Server processed a request. When we select this
message, the profiler shows the SQL that the EF just ran for us:
exec sp_executesql N'SELECT
[Extent1].[SalesOrderID] AS [SalesOrderID],
[Extent1].[RevisionNumber] AS [RevisionNumber],
[Extent1].[OrderDate] AS [OrderDate],
[Extent1].[DueDate] AS [DueDate],
[Extent1].[ShipDate] AS [ShipDate],
[Extent1].[Status] AS [Status],
[Extent1].[OnlineOrderFlag] AS [OnlineOrderFlag],
[Extent1].[SalesOrderNumber] AS [SalesOrderNumber],
[Extent1].[PurchaseOrderNumber] AS [PurchaseOrderNumber],
[Extent1].[AccountNumber] AS [AccountNumber],
[Extent1].[CustomerID] AS [CustomerID],
[Extent1].[ShipToAddressID] AS [ShipToAddressID],
[Extent1].[BillToAddressID] AS [BillToAddressID],
[Extent1].[ShipMethod] AS [ShipMethod],
[Extent1].[CreditCardApprovalCode] AS [CreditCardApprovalCode],
[Extent1].[SubTotal] AS [SubTotal],
[Extent1].[TaxAmt] AS [TaxAmt],
[Extent1].[Freight] AS [Freight],
[Extent1].[TotalDue] AS [TotalDue],
[Extent1].[Comment] AS [Comment],
[Extent1].[rowguid] AS [rowguid],
[Extent1].[ModifiedDate] AS [ModifiedDate]
FROM [SalesLT].[SalesOrderHeader] AS [Extent1]
WHERE [Extent1].[OrderDate] = @p__linq__0',
N'@p__linq__0 datetime',@p__linq__0='2004-06-01 00:00:00'
It might be quite long, but structurally that’s a pretty simple
SELECT
statement. The only reason
it’s so large is that it explicitly requests every column required by
the entity (and it has specified each column in a fairly verbose
manner). The interesting part is in the last two lines. The penultimate
line is a parameterized WHERE
clause
comparing the OrderDate
to a named
argument. This is what became of our LINQ query’s where
clause. And the final line provides a
value for that named argument.
Note that you’re free to chain operators together in LINQ to
Entities just as you can in LINQ to Objects. For example, we could build
on the orders
query from Example 14-3:
var orderedOrders = orders.OrderBy(order => order.OrderDate);
Or, if you’d prefer to carry on using LINQ syntax, Example 14-10 is equivalent.
Example 14-10. Chained query
var orderedOrders = from order in orders
                    orderby order.OrderDate
                    select order;
This doesn’t execute the orders
query. It just means we have two queries now—the orders
query that just filters, and then the
orderedOrders
query that filters and
then sorts. You could think of this chained query as shorthand for Example 14-11, which explicitly
combines the clauses from Example 14-9 and Example 14-10 into one query.
Example 14-11. That same query, written out in full
var orderedOrders = from order in dbContext.SalesOrderHeaders
                    where order.OrderDate == orderDate
                    orderby order.OrderDate
                    select order;
Regardless of the various equivalent ways we would build the
second query, the result of executing it (e.g., by iterating over it in
a foreach
loop) is, as you’d expect,
a SQL query that includes an ORDER BY
clause as well as a WHERE
clause.
(And as it happens, that’s not hugely useful because in this example
database, all the orders have the exact same date. With slightly more
realistic data, this would have the expected effect, though.)
So LINQ to Entities queries work in a fundamentally different way
from the LINQ to Objects queries we saw previously. In LINQ to Objects,
the expression in a where
clause is
simply a delegate in disguise—it’s a method that the Where
operator calls for each item in turn to
work out whether to include that in the results. But with LINQ to
Entities (and LINQ to SQL for that matter) the LINQ query’s where
clause has been translated into T-SQL
and sent to the database—the expression we wrote in C# ends up running
in a different language, probably on a different machine. If you want to
understand how these kinds of queries are able to work so differently
for different providers, see the sidebar.
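The heart of that mechanism is that the C# compiler can turn the very same lambda either into a delegate (executable code) or into an expression tree (a data structure describing the code, which a provider such as the EF can inspect and translate). A minimal sketch, not tied to any database:

```csharp
using System;
using System.Linq.Expressions;

class DelegateVersusExpressionDemo
{
    static void Main()
    {
        // Assigned to a Func<...>, a lambda compiles to a delegate:
        // opaque, executable code.
        Func<int, bool> asDelegate = x => x > 10;

        // Assigned to an Expression<Func<...>>, the same lambda compiles to
        // a tree of objects describing the code, which a query provider can
        // walk and translate into, say, a SQL WHERE clause.
        Expression<Func<int, bool>> asTree = x => x > 10;

        Console.WriteLine(asDelegate(42));        // True - it just runs
        Console.WriteLine(asTree.Body);           // (x > 10) - structure visible
        Console.WriteLine(asTree.Compile()(42));  // True - still executable too
    }
}
```

LINQ to Objects works with delegates (`IEnumerable<T>`), while LINQ to Entities works with expression trees (`IQueryable<T>`); that is what lets the same query syntax behave so differently.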
This translation is obviously very useful for shifting the work to the database, but it brings some limitations. If you try to add arbitrary method calls into the middle of a LINQ query, it’ll fail in LINQ to Entities. For example, suppose we have the following helper:
static DateTime NextDay(DateTime dt) { return dt + TimeSpan.FromDays(1); }
We could try to use this in a LINQ query:
var orders = from order in dbContext.SalesOrderHeaders
where order.OrderDate == NextDay(orderDate)
select order;
With LINQ to Objects this would work just fine—it’s all just C#
code, and you can use any valid Boolean expression in a where
clause, including expressions that
invoke methods. But with LINQ to Entities, although this will compile,
the EF will throw a NotSupportedException
at the point at which
you try to execute the query. Its error message will read:
LINQ to Entities does not recognize the method 'System.DateTime NextDay(System.DateTime)' method, and this method cannot be translated into a store expression.
LINQ to Entities queries are limited to the things that the EF
knows how to turn into database queries, and since it doesn’t know
anything about this NextDay
method
you’ve written, it can’t work out how to do that. Of course, when you
bear in mind that a LINQ to Entities query executes on the database,
it’s hardly surprising that you can’t invoke arbitrary methods in your
application from the middle of a query. But the EF integrates some
database features into your code so seamlessly that it’s sometimes easy
to forget where the boundary between your application and the database
lies.
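The usual workaround is to evaluate such methods locally, before building the query, so that the expression tree the EF sees contains only a plain value. Here’s a sketch of that pattern; the commented-out query assumes the `dbContext` and `orderDate` from the earlier examples:

```csharp
using System;

class HoistedMethodCallDemo
{
    static DateTime NextDay(DateTime dt) { return dt + TimeSpan.FromDays(1); }

    static void Main()
    {
        DateTime orderDate = new DateTime(2004, 6, 1);

        // Call the method up front, outside the query...
        DateTime nextDay = NextDay(orderDate);

        // ...so the where clause captures only a DateTime value, which the
        // EF knows how to translate into T-SQL:
        //
        // var orders = from order in dbContext.SalesOrderHeaders
        //              where order.OrderDate == nextDay
        //              select order;

        Console.WriteLine(nextDay);   // 2004-06-02
    }
}
```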
While LINQ to Entities is a very convenient way to build queries, it’s just a layer on top of the Entity Framework’s underlying query system, which has its own query language.
The Entity Framework defines a query language for making queries against the conceptual model—rather than running queries against a database, as you do in normal SQL dialects, you can run queries that work directly against the entities in your model, as the name Entity SQL (or ESQL) suggests.
ESQL can be used for queries against the EDM storage model too. So it can also function as a kind of vendor-neutral SQL. But here we’re focusing on queries that target the conceptual model.
Why do we need a second way of making queries when we already have LINQ to Entities? Well, from a historical perspective that question has things back to front: during the Entity Framework’s development, ESQL was around long before LINQ to Entities. But since LINQ to Entities made it into the first version of the EF, it’s still reasonable to ask why we have both, what ESQL is for, and when it might look like a better choice than LINQ.
ESQL’s main benefit is that it’s sometimes useful to be able to represent a query as text. In fact, the Entity Data Model itself exploits this—there are some advanced scenarios in which ESQL queries can be embedded in the .edmx file. LINQ wouldn’t be an option here because .edmx is just XML; to use LINQ requires a language that supports LINQ.[37] If you wanted to store custom queries in a configuration file, you really wouldn’t want to have to run the C# compiler at runtime to interpret the queries. And with ESQL you don’t need to—you can represent a query as a string and the EF can execute that for you at runtime.
Another feature of a string-based query language is that it’s
relatively easy to compose queries at runtime. With a LINQ query
expression, the structure is fixed at compile time and you only really
get to tweak individual arguments, much like a fixed SQL query with a
few named arguments. (Technically, it is actually possible to build LINQ
queries dynamically. After all, LINQ operators are chained together with
simple function calls. However, dynamic composition of Expression<T>
trees turns out to be
surprisingly difficult. It’s not a scenario C# attempts to help you
with—you end up having to construct the expression trees without the
compiler’s assistance. This is not for the fainthearted.)
The practice of stitching together strings to form dynamic queries is messy and can be fraught with security issues such as injection attacks. It’s occasionally useful if you understand the risks and can mitigate them, but you need to exercise extreme caution.
Of course, string-based queries have a massive downside compared to LINQ: the C# compiler cannot offer any help as it doesn’t understand ESQL. If your ESQL strings are badly formed, you only get to find that out at runtime. And even if your ESQL is syntactically correct, C# does not understand the relationship between it and your code—whereas with a LINQ to Entities query C# can detect things such as type mismatches, it won’t spot when your ESQL gets out of sync with the code that uses the results.
Besides the inherent benefits and disadvantages of a string-based query, there’s also the fact that ESQL is, in effect, the native query language for the EF. This means there are a few EF features that can be accessed only through ESQL, although they’re all somewhat arcane. For example, an ESQL query can navigate associations between entities even if you’ve neglected to define navigation properties to represent those associations.
Example 14-12 shows a simple example that illustrates the basic use of ESQL.
Example 14-12. Querying with ESQL
using (var dbContext = new AdventureWorksLT2008Entities())
{
    DateTime orderDate = new DateTime(2004, 6, 1);
    var query = dbContext.CreateQuery<SalesOrderHeader>(
        "SELECT VALUE o " +
        "FROM AdventureWorksLT2008Entities.SalesOrderHeaders AS o " +
        "WHERE o.OrderDate = @orderDate",
        new ObjectParameter("orderDate", orderDate));
    foreach (var order in query)
    {
        Console.WriteLine(order.TotalDue);
    }
}
This has the same effect as Example 14-3, but using ESQL in place of a
LINQ query. While this looks similar to a typical SQL query, the
VALUE
keyword is
specific to ESQL. We use this to indicate that we don’t want the usual
column-like behavior of SQL. You can write a more traditional-looking
query in ESQL, such as:
SELECT o.TotalDue, o.OrderDate
FROM AdventureWorksLT2008Entities.SalesOrderHeaders AS o
WHERE o.OrderDate = @orderDate
This asks for specific columns from the entity rather than the
whole entity. This is legal ESQL, but it would fail at runtime in the
context of Example 14-12. That example creates
the query with a call to CreateQuery<SalesOrderHeader>
on the
object context. The generic type argument to CreateQuery
—SalesOrderHeader
here—indicates the type of
result we’re expecting from the query, but this modified query clearly
returns something other than a SalesOrderHeader
. It returns a couple of
columns from each matching entity. When you build a query like this, you
get back objects that implement IDataRecord
—a general-purpose
interface used across all of ADO.NET to represent a record (such as a
table row) whose columns might not be known until runtime. (This is one
of the interfaces listed in Table 14-1.) So you’d need to
use CreateQuery<IDataRecord>
to
create such a query, and a suitably modified loop to extract the
results:
var query = dbContext.CreateQuery<IDataRecord>(
    "SELECT o.TotalDue, o.OrderDate " +
    "FROM AdventureWorksLT2008Entities.SalesOrderHeaders AS o " +
    "WHERE o.OrderDate = @orderDate",
    new ObjectParameter("orderDate", orderDate));
foreach (var order in query)
{
    Console.WriteLine(order["TotalDue"]);
}
Even if you ask for the whole entity as a single column in the
SELECT
clause, for example:
SELECT o
FROM AdventureWorksLT2008Entities.SalesOrderHeaders AS o
WHERE o.OrderDate = @orderDate
the query will still return IDataRecord
objects, not entities. Each data
record returned by this query would have a single column called o
that contains a SalesOrderHeader
entity. To get to the entity
you’d need to unwrap it inside your loop:
foreach (var row in query)
{
    SalesOrderHeader o = (SalesOrderHeader) row["o"];
    Console.WriteLine(o.TotalDue);
}
The VALUE
keyword is just a
shortcut that tells ESQL to omit the IDataRecord
wrapper, and to return a sequence
of unwrapped entities. This enables Example 14-12 to assume that it will get SalesOrderHeader
entities back from the
query.
LINQ to Entities and ESQL are not mutually exclusive. You are free to use an ESQL query as the source for a LINQ query. Here’s a contrived example:
var orders = dbContext.CreateQuery<SalesOrderHeader>(
    "SELECT VALUE o " +
    "FROM AdventureWorksLT2008Entities.SalesOrderHeaders AS o " +
    "WHERE o.OrderDate = @orderDate",
    new ObjectParameter("orderDate", orderDate));
var orderedOrders = from order in orders
                    orderby order.DueDate
                    select order;
This might be useful if you wanted to store ESQL queries in some sort of configuration mechanism to allow the exact query to be changed, but to do further processing of the results of that query with LINQ.
Yet another feature enabled by ESQL is that it lets code
built around the v1 ADO.NET mechanisms shown in Example 14-1 work with the EF. The
System.Data.EntityClient
namespace
defines concrete types that derive from the abstract base classes listed
in Table 14-1: EntityConnection
derives from DbConnection
, EntityCommand
derives from DbCommand
, and so on. As far as code written
to use these abstract base classes is concerned, the Entity Framework
ends up looking like just another database with another funky variety of
SQL. As long as your ESQL selects only column values and not whole
entities, queries will only ever return the same basic data types other
providers would, so the behavior will look much like any other ADO.NET
v1 provider.
As you’ve seen, the object context provides access to
entities. For each entity we define in our EDM, the generated object
context class provides a property that we can use as the source for a LINQ
query. We’ve also used its CreateQuery<T>
method to build ESQL-based
queries. The object context provides some other services.
To execute database queries, it’s necessary to connect to
a database, so the object context needs connection information. This
information typically lives in the App.config file—when you first run the EDM
wizard, it will add a configuration file if your application does not
already have one, and then it adds a connection string. Example 14-13 shows a configuration file
containing a typical Entity Framework connection string. (This has been
split over multiple lines to fit—normally the connectionString
attribute is all on one
line.)
Example 14-13. Connection string in App.config
<configuration>
  <connectionStrings>
    <add name="AdventureWorksLT2008Entities"
         connectionString="metadata=res://*/AdventureWorksModel.csdl|
             res://*/AdventureWorksModel.ssdl|res://*/AdventureWorksModel.msl;
             provider=System.Data.SqlClient;provider connection string=
             &quot;Data Source=.\sqlexpress;Initial Catalog=AdventureWorksLT2008;
             Integrated Security=True;MultipleActiveResultSets=True&quot;"
         providerName="System.Data.EntityClient" />
  </connectionStrings>
</configuration>
This is a rather more complex connection string than the one we
saw back in Example 14-1, because
the Entity Framework needs three things in its connection string:
information on where to find the EDM definition, the type of underlying
database provider to use, and the connection string to pass to that
underlying provider. This last part, an ordinary SQL Server connection
string, is nested inside the outer connection string, enclosed in
&quot; character entities, as you can see in Example 14-13.
The three URIs in the metadata section of the connectionString
—the ones beginning with
res://
—point to the three parts of
the EDM: the conceptual schema, the storage schema, and the mappings.
Visual Studio extracts these from the .edmx file and embeds them as three XML
resource streams in the compiled program. Without these, the EF wouldn’t
know what the conceptual and storage schemas are supposed to look like,
or how to map between them.
It may seem a bit weird for the locations of these EDM resources
to be in a connection string. It might seem more natural for the XML
to use a separate attribute for each one. However, as you’ve seen, the
System.Data.EntityClient
namespace
conforms to the ADO.NET v1 model so that it’s possible for old-style
data access code to perform queries against the EDM. Since the ADO.NET
v1 model includes an assumption that it’s possible to put all the
information defining a particular data source into a single connection
string, the Entity Framework has to follow suit. And since the EF
cannot function without the XML EDM definitions, the connection string
has to say where those live.
After the EDM metadata resources, you can see a provider
property, which in Example 14-13 indicates that the
underlying database connection is to be provided by the SQL Server
client. The EF passes the provider connection
string
on to that provider.
You don’t have to use the App.config to configure the connection. The
object context offers a constructor overload that accepts a connection
string. The configuration file is useful—it’s where the object context’s
no-parameters constructor we’ve been using in the examples gets its
connection information from—but what if you want to let just the
underlying database connection string be configurable, while keeping the
parts of the connection string identifying the EDM resources fixed?
Example 14-14 shows how you
could achieve this. It retrieves the configured values for these two
pieces and uses the EntityConnectionStringBuilder
helper to
combine this with the EDM resource locations, forming a complete EF
connection string.
Example 14-14. Passing an explicit connection string
using System.Configuration;
using System.Data.EntityClient;
...
// Retrieve the connection string for the underlying database provider.
ConnectionStringSettings dbConnectionInfo =
    ConfigurationManager.ConnectionStrings["AdventureWorksSql"];

var csb = new EntityConnectionStringBuilder();
csb.Provider = dbConnectionInfo.ProviderName;
csb.ProviderConnectionString = dbConnectionInfo.ConnectionString;
csb.Metadata = "res://*/AdventureWorksModel.csdl|" +
               "res://*/AdventureWorksModel.ssdl|res://*/AdventureWorksModel.msl";

using (var dbContext =
    new AdventureWorksLT2008Entities(csb.ConnectionString))
{
    ...
}
This code uses the ConfigurationManager
in the System.Configuration
namespace, which provides
a ConnectionStrings
property. (This
is in a part of the .NET Framework class library that’s not referenced
by default in a .NET console application, so we need to add a reference
to the System.Configuration
component
for this to work.) This provides access to any connection strings in
your App.config file; it’s the same
mechanism the EF uses to find its default connection string. Now that
Example 14-14 is providing
the EDM resources in code, our configuration file only needs the SQL
Server part of the connection string, as shown in Example 14-15 (with a long line split across
multiple lines to fit). So when the application is deployed, we have the
flexibility to configure which database gets used, but we have removed
any risk that such a configuration change might accidentally break the
references to the EDM resources.
Example 14-15. SQL Server connection string
<configuration>
  <connectionStrings>
    <add name="AdventureWorksSql"
         providerName="System.Data.SqlClient"
         connectionString="Data Source=.\sqlexpress;
             Initial Catalog=AdventureWorksLT2008;
             Integrated Security=True;MultipleActiveResultSets=True" />
  </connectionStrings>
</configuration>
Besides being able to change the connection information, what else can we do with the connection? We could choose to open the connection manually—we might want to verify that our code can successfully connect to the database. But in practice, we don’t usually do that—the EF will connect automatically when we need to. The main reason for connecting manually would be if you wanted to keep the connection open across multiple requests—if the EF opens a connection for you it will close it again. In any case, we need to be prepared for exceptions anytime we access the database—being able to connect successfully is no guarantee that someone won’t trip over a network cable at some point between us manually opening the connection and attempting to execute a query. So in practice, the connection string is often the only aspect of the connection we need to take control of.
So far, all of our examples have just fetched existing data from the database. Most real applications will also need to be able to add, change, and remove data. So as you’d expect, the Entity Framework supports the full range of so-called CRUD (Create, Read, Update, and Delete) operations. This involves the object context, because it is responsible for tracking changes and coordinating updates.
Updates—modifications to existing records—are pretty
straightforward. Entities’ properties are modifiable, so you can simply
assign new values. However, the EF does not attempt to update the
database immediately. You might want to change multiple properties, in
which case it would be inefficient to make a request to the database for
each property in turn, and that might not even work—integrity
constraints in the database may mean that certain changes need to be
made in concert. So the EF just remembers what changes you have made,
and attempts to apply those changes back to the database only when you
call the object context’s SaveChanges
method.
Example 14-16 does this. In fact, most
of the code here just fetches a specific entity—the most recent order of
a particular customer—and only the last couple of statements modify that
order.
Example 14-16. Modifying an existing entity
using (var dbContext = new AdventureWorksLT2008Entities())
{
    var orderQuery = from customer in dbContext.Customers
                     where customer.CustomerID == 29531
                     from order in customer.SalesOrderHeaders
                     orderby order.OrderDate descending
                     select order;
    SalesOrderHeader latestOrder = orderQuery.First();

    latestOrder.Comment = "Call customer when goods ready to ship";
    dbContext.SaveChanges();
}
To add a brand-new entity, you need to create a new entity object
of the corresponding class, and then tell the object context you’ve done
so—for each entity type, the context provides a corresponding method for
adding entities. In our example, the context has AddToCustomers
,
AddToSalesOrderHeaders
, and AddToSalesOrderDetails
methods. You will need
to make sure you satisfy the database’s constraints, which means that
the code in Example 14-17
will not be enough.
Example 14-17. Failing to meet constraints on a new entity
SalesOrderDetail detail = new SalesOrderDetail();
dbContext.AddToSalesOrderDetails(detail);

// Will throw an exception!
dbContext.SaveChanges();
The Entity Framework will throw an UpdateException
when Example 14-17 calls SaveChanges
because the
entity is missing all sorts of information. The example database’s
schema includes a number of integrity constraints, and will refuse to
allow a new row to be added to the SalesOrderDetail
table unless it meets all the
requirements. Example 14-18 sets the bare
minimum number of properties to keep the database happy. (This is
probably not good enough for real code, though—we’ve not specified any
price information, and the numeric price fields will have default values
of 0
; while this doesn’t upset the
database, it might not please the accountants.)
Example 14-18. Adding a new entity
// ...where latestOrder is a SalesOrderHeader fetched with code like
// that in Example 14-16.
SalesOrderDetail detail = new SalesOrderDetail();
detail.SalesOrderHeader = latestOrder;
detail.ModifiedDate = DateTime.Now;
detail.OrderQty = 1;
detail.ProductID = 680;  // HL Road Frame - Black, 58
dbContext.AddToSalesOrderDetails(detail);
dbContext.SaveChanges();
Several of the constraints involve relationships. A SalesOrderDetail
row must be related to a
particular row in the Product
table,
because that’s how we know what products the customer has ordered. We’ve
not defined an entity type corresponding to the Product
table, so Example 14-18 just plugs in the relevant foreign key
value directly.
The database also requires that each SalesOrderDetail
row be related to exactly one
SalesOrderHeader
row—remember that
this was one of the one-to-many relationships we saw earlier. (The
header has a multiplicity of one, and the detail has a multiplicity of
many.) The constraint in the database requires the SalesOrderID
foreign key column in each
SalesOrderDetail
row to correspond to
the key for an existing SalesOrderHeader
row. But unlike the ProductID
column, we don’t set the
corresponding property directly on the entity. Instead, the second line
of Example 14-18 sets the new entity’s SalesOrderHeader
property, which as you may
recall is a navigation property.
When adding new entities that must be related to other entities,
you normally indicate the relationships with the corresponding
navigation properties. In this example, you could add the new SalesOrderDetail
object to a SalesOrderHeader
object’s SalesOrderDetails
navigation property—since a
header may have many related details, the property is a collection and
offers an Add
method. Or you can
work with the other end of the relationship as Example 14-18 does. This is the usual way to deal
with the relationships of a newly created entity—setting foreign key
properties directly as we did for the other relationships here is
somewhat unusual. We did that only because our EDM does not include all
of the relevant entities—we represent only three of the tables because a
complete model for this particular example would have been too big to
fit legibly onto a single page. There may also be situations where you
know that in your particular application, the key values required will
never change, and you might choose to cache those key values to avoid
the overhead of involving additional entities.
We’ve seen how to update existing data and add new data. This
leaves deletion. It’s pretty straightforward: if you have an entity
object, you can pass it to the context’s DeleteObject
method,
and the next time you call SaveChanges
, the EF will attempt to delete the
relevant row, as shown in Example 14-19.
As with any kind of change to the database, this will succeed only if it does not violate any of the database’s integrity constraints. For example, deleting an entity at the one end of a one-to-many relationship may fail if the database contains one or more rows at the many end that are related to the item you’re trying to delete. (Alternatively, the database might automatically delete all related items—SQL Server allows a constraint to require cascading deletes. This takes a different approach to enforcing the constraint—rather than rejecting the attempt to delete the parent item, it deletes all the children automatically.)
Example 14-18 adds new information that relates to information already in the database—it adds a new detail to an existing order. This is a very common thing to do, but it raises a challenge: what if code elsewhere in the system was working on the same data? Perhaps some other computer has deleted the order you were trying to add detail for. The EF supports a couple of common ways of managing this sort of hazard: transactions and optimistic concurrency.
Transactions are an extremely useful mechanism for dealing with concurrency hazards efficiently, while keeping data access code reasonably simple. Transactions provide the illusion that each individual database client has exclusive access to the entire database for as long as it needs to do a particular job—it has to be an illusion because if clients really took it in turns, scalability would be severely limited. So transactions perform the neat trick of letting work proceed in parallel except for when that would cause a problem—as long as all the transactions currently in progress are working on independent data they can all proceed simultaneously, and clients have to wait their turn only if they’re trying to use data already involved (directly, or indirectly) in some other transaction in progress.[38]
The classic example of the kind of problem transactions are designed to avoid is that of updating the balance of a bank account. Consider what needs to happen to your account when you withdraw money from an ATM—the bank will want to make sure that your account is debited with the amount of money withdrawn. This will involve subtracting that amount from the current balance, so there will be at least two operations: discovering the current balance, and then updating it to the new value. (Actually it’ll be a whole lot more complex than that—there will be withdrawal limit checks, fraud detection, audit trails, and more. But the simplified example is enough to illustrate how transactions can be useful.) But what happens if some other transaction occurs at the same time? Maybe you happen to be making a withdrawal at the same time as the bank processes an electronic transfer of funds.
If that happens, a problem can arise. Suppose the ATM transaction and the electronic transfer both read the current balance—perhaps they both discover a balance of $1,234. Next, if the transfer is moving $1,000 from your account to somewhere else, it will write back a new balance of $234—the original balance minus the amount just deducted. But there’s the ATM transfer—suppose you withdraw $200. It will write back a new balance of $1,034. You just withdrew $200 and paid $1,000 to another account, but your account only has $200 less in it than before rather than $1,200—that’s great for you, but your bank will be less happy. (In fact, your bank probably has all sorts of checks and balances to try to minimize opportunities such as this for money to magically come into existence. So they’d probably notice such an error even if they weren’t using transactions.) In fact, neither you nor your bank really wants this to happen, not least because it’s easy enough to imagine similar examples where you lose money.
This problem of concurrent changes to shared data crops up in all sorts of forms. You don’t even need to be modifying data to observe a problem: code that only ever reads can still see weird results. For example, you might want to count your money, in which case looking at the balances of all your accounts would be necessary—that’s a read-only operation. But what if some other code was in the middle of transferring money between two of your accounts? Your read-only code could be messed up by other code modifying the data.
A simple way to avoid this is to do one thing at a time—as long as each task completes before the next begins, you’ll never see this sort of problem. But that turns out to be impractical if you’re dealing with a large volume of work. And that’s why we have transactions—they are designed to make it look like things are happening one task at a time, but under the covers they allow tasks to proceed concurrently as long as they’re working on unrelated information. So with transactions, the fact that some other bank customer is in the process of performing a funds transfer will not stop you from using an ATM. But if a transfer is taking place on one of your accounts at the same time that you are trying to withdraw money, transactions would ensure that these two operations take it in turns.
So code that uses transactions effectively gets exclusive access to whatever data it is working with right now, without slowing down anything it’s not using. This means you get the best of both worlds: you can write code as though it’s the only code running right now, but you get good throughput.
How do we exploit transactions in C#? Example 14-20 shows the simplest approach: if you create
a TransactionScope
object, the EF
will automatically enlist any database operations in the same
transaction. The TransactionScope
class is defined in the System.Transactions
namespace in the
System.Transactions
DLL (another
class library DLL for which we need to add a reference, as it’s not in
the default set).
Example 14-20. TransactionScope
using (var dbContext = new AdventureWorksLT2008Entities())
{
    using (var txScope = new TransactionScope())
    {
        var customersWithOrders = from cust in dbContext.Customers
                                  where cust.SalesOrderHeaders.Count > 0
                                  select cust;
        foreach (var customer in customersWithOrders)
        {
            Console.WriteLine("Customer {0} has {1} orders",
                customer.CustomerID, customer.SalesOrderHeaders.Count);
        }
        txScope.Complete();
    }
}
For as long as the TransactionScope
is active (i.e., until it is
disposed at the end of the using
block), all the requests to the database this code makes will be part of
the same transaction, and so the results should be consistent—any other
database client that tries to modify the state we’re looking at will be
made to wait (or we’ll be made to wait for them) in order to guarantee
consistency. The call to Complete
at
the end indicates that we have finished all the work in the transaction,
and are happy for it to commit—without this, the transaction would be
aborted at the end of the scope’s using
block. For a transaction that modifies
data, failure to call Complete
will
lose any changes. Since the transaction in Example 14-20 only reads data, this might not cause any
visible problems, but it’s difficult to be certain. If a TransactionScope
was already active on this
thread (e.g., a function farther up the call stack started one) our
TransactionScope
could join in with
the same transaction, at which point failure to call Complete
on our scope would end up aborting
the whole thing, possibly losing data. The documentation recommends
calling Complete
for all transactions
except those you want to abort, so it’s a good practice always to call
it.
TransactionScope
represents an
implicit transaction—any data access
performed inside its using
block will
automatically be enlisted on the transaction. That’s why Example 14-20 never appears to use the TransactionScope
it creates—it’s enough for it
to exist. (The transaction system keeps track of which threads have
active implicit transactions.) You can also work with transactions
explicitly—the object context provides a Connection
property, which in turn offers
explicit BeginTransaction
and
EnlistTransaction
methods. You can
use these in advanced scenarios where you might need to control
database-specific aspects of the transaction that an implicit
transaction cannot reach.
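The ambient nature of the implicit transaction is easy to see even without a database, because TransactionScope manages a per-thread Transaction.Current. A minimal sketch (nothing here touches the EF or a real resource manager):

```csharp
using System;
using System.Transactions;

class AmbientTransactionDemo
{
    static void Main()
    {
        // Outside any scope there is no ambient transaction.
        Console.WriteLine(Transaction.Current == null);       // True

        using (var txScope = new TransactionScope())
        {
            // Inside the scope an implicit transaction is active; any EF or
            // ADO.NET work done on this thread would enlist automatically.
            Console.WriteLine(Transaction.Current != null);   // True
            txScope.Complete();
        }

        // Disposing the scope ends the ambient transaction.
        Console.WriteLine(Transaction.Current == null);       // True
    }
}
```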
These transaction models are not specific to the EF. You can use the same techniques with ADO.NET v1-style data access code.
Besides enabling isolation of multiple concurrent operations,
transactions provide another very useful property: atomicity. This means
that the operations within a single transaction succeed or fail as one:
all succeed, or none of them
succeed—a transaction is indivisible in that it cannot complete
partially. The database stores updates performed within a transaction
provisionally until the transaction completes—if it succeeds, the
updates are permanently committed, but if it fails, they are rolled back
and it’s as though the updates never occurred. The EF uses transactions
automatically when you call SaveChanges
—if you have not supplied a
transaction, it will create one just to write the updates. (If you have
supplied one, it’ll just use yours.) This means that SaveChanges
will always either succeed
completely, or have no effect at all, whether or not you provide a
transaction.
Transactions are not the only way to solve problems of concurrent access to shared data. They are bad at handling long-running operations. For example, consider a system for booking seats on a plane or in a theater. End users want to see what seats are available, and will then take some time—minutes probably—to decide what to do. It would be a terrible idea to use a transaction to handle this sort of scenario, because you’d effectively have to lock out all other users looking to book into the same flight or show until the current user makes a decision. (It would have this effect because in order to show available seats, the transaction would have had to inspect the state of every seat, and could potentially change the state of any one of those seats. So all those seats are, in effect, owned by that transaction until it’s done.)
Let’s just think that through. What if every person who flies on a particular flight takes two minutes to make all the necessary decisions to complete his booking? (Hours of queuing in airports and observing fellow passengers lead us to suspect that this is a hopelessly optimistic estimate. If you know of an airline whose passengers are that competent, please let us know—we’d like to spend less time queuing.) The Airbus A380 aircraft has FAA and EASA approval to carry 853 passengers, which suggests that even with our uncommonly decisive passengers, that’s still a total of more than 28 hours of decision making for each flight. That sounds like it could be a problem for a daily flight.[39] So there’s no practical way of avoiding having to tell the odd passenger that, sorry, in between showing him the seat map and choosing the seat, someone else got in there first. In other words, we are going to have to accept that sometimes data will change under our feet, and that we just have to deal with it when it happens. This requires a slightly different approach than transactions.
Optimistic concurrency describes an approach to concurrency where instead of enforcing isolation, which is how transactions usually work, we just make the cheerful assumption that nothing’s going to go wrong. And then, crucially, we verify that assumption just before making any changes.
In practice, it’s common to use a mixture of optimistic concurrency and transactions. You might use optimistic approaches to handle long-running logic, while using short-lived transactions to manage each individual step of the process.
For example, an airline booking system that shows a map of available seats in an aircraft on a web page would make the optimistic assumption that the seat the user selects will probably not be selected by any other user in between the moment at which the application showed the available seats and the point at which the user picks a seat. The advantage of making this assumption is that there’s no need for the system to lock anyone else out—any number of users can all be looking at the seat map at once, and they can all take as long as they like.
Occasionally, multiple users will pick the same seat at around the same time. Most of the time this won’t happen, but the occasional clash is inevitable. We just have to make sure we notice. So when the user gets back to us and says that he wants seat 7K, the application then has to go back to the database to see if that seat is in fact still free. If it is, the application’s optimism has been vindicated, and the booking can proceed. If not, we just have to apologize to the user (or chastise him for his slowness, depending on the prevailing attitude to customer service in your organization), show him an updated seat map so that he can see which seats have been claimed while he was dithering, and ask him to make a new choice. This will happen only a small fraction of the time, and so it turns out to be a reasonable solution to the problem—certainly better than a system that is incapable of taking enough bookings to fill the plane in the time available.
Sometimes optimistic concurrency is implemented in an application-specific way. The example just described relies on an understanding of what the various entities involved mean, and would require us to write code that explicitly performs the check described. But slightly more general solutions are available—they are typically less efficient, but they can require less code. The EF offers some of these ignorant-but-effective approaches to optimistic concurrency.
The default EF behavior seems, at first glance, to be ignorant and broken—not only does it optimistically assume that nothing will go wrong, but it also does nothing to check that assumption. We might call this blind optimism—we don’t even get to discover when our optimism turns out to be unfounded. While that sounds bad, it’s actually the right thing to do if you’re using transactions—transactions enforce isolation, so additional checks would be a waste of time. But if you’re not using transactions, this default behavior is not good enough for code that wants to change or add data—you’ll risk compromising the integrity of your application’s state.
To get the EF to check that updates are likely to be sound, you can tell it to verify that certain entity properties have not changed since the entity was populated from the database. For example, if you select the ModifiedDate property of the SalesOrderDetail entity in the EDM designer, you can go to the Properties panel and set its Concurrency Mode to Fixed (the default being None). Whenever you update the entity, the EF will then check that this particular column’s value is still the same as it was when the entity was fetched. And as long as all the code that modifies this particular table remembers to update the ModifiedDate, you’ll be able to detect when things have changed.
While this example illustrates the concept, it’s not entirely robust. Using a date and time to track when a row changes has a couple of problems. First, different computers in the system are likely to have slight differences between their clocks, which can lead to anomalies. And even if only one computer ever accesses the database, its clock may be adjusted from time to time. You’d end up wanting to customize the SQL code used for updates so that everything uses the database server’s clock for consistency. Such customizations are possible, but they are beyond the scope of this book. And even that might not be enough—if the row is updated often, it’s possible that two updates might have the same timestamp due to insufficient precision. A stricter approach based on GUIDs or sequential row version numbers is more robust. But this is the realm of database design, rather than Entity Framework usage—ultimately you’re going to be stuck with whatever your DBA gives you.
If any of the columns with a Concurrency Mode of Fixed change
between reading an entity’s value and attempting to update it, the EF
will detect this when you call SaveChanges
and will throw an OptimisticConcurrencyException
, instead of
completing the update.
The EF detects changes by making the SQL UPDATE
conditional—its WHERE
clause will include checks for all of
the Fixed
columns. It inspects the
updated row count that comes back from the database to see whether the
update succeeded.
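To make that concrete, the UPDATE generated for a SalesOrderDetail whose ModifiedDate has a Concurrency Mode of Fixed would be shaped roughly like the following. This is an illustrative sketch rather than the exact SQL the EF emits, and the parameter names are invented:

```sql
UPDATE SalesLT.SalesOrderDetail
SET    OrderQty = @newOrderQty
WHERE  SalesOrderID = @orderId
  AND  SalesOrderDetailID = @detailId
  AND  ModifiedDate = @originalModifiedDate;  -- the Fixed column check

-- If the reported row count is 0, the row changed (or was deleted)
-- since it was read, and SaveChanges throws
-- OptimisticConcurrencyException instead of completing.
```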
How you deal with an optimistic concurrency failure is up to your application—you might simply be able to retry the work, or you may have to get the user involved. It will depend on the nature of the data you’re trying to update.
The object context provides a Refresh
method that you
can call to bring entities back into sync with the current state of the
rows they represent in the database. You could call this after catching
an OptimisticConcurrencyException
as
the first step in your code that recovers from a problem. (You’re not
actually required to wait until you get a concurrency exception—you’re
free to call Refresh
at any time.)
The first argument to Refresh
tells
it what you’d like to happen if the database and entity are out of sync.
Passing RefreshMode.StoreWins
tells
the EF that you want the entity to reflect what’s currently in the
database, even if that means discarding updates previously made in
memory to the entity. Or you can pass RefreshMode.ClientWins
, in which case any
changes in the entity remain present in memory. The changes will not be
written back to the database until you next call SaveChanges
. So the significance of calling
Refresh
in ClientWins
mode is that you have, in effect,
acknowledged changes to the underlying database—if changes in the
database were previously causing SaveChanges
to throw an OptimisticConcurrencyException
, calling
SaveChanges
again after the Refresh
will not throw again (unless the
database changes again in between the call to Refresh
and the second SaveChanges
).
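Putting those pieces together, recovery code might look something like this sketch. The context and entity come from this chapter's model; whether to let the store win, let the client win, or involve the user depends entirely on your application:

```csharp
// Sketch: 'context' is an object context and 'detail' is a
// SalesOrderDetail entity previously fetched through it.
try
{
    context.SaveChanges();
}
catch (OptimisticConcurrencyException)
{
    // Someone else updated the row since we read it. Here we let the
    // database win: our in-memory changes to 'detail' are discarded
    // and replaced with the current database values.
    context.Refresh(RefreshMode.StoreWins, detail);

    // With RefreshMode.ClientWins instead, our in-memory changes
    // would survive, and a second SaveChanges would write them over
    // whatever is now in the database (unless it changes yet again).
}
```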
If you ask the context object for the same entity twice,
it will return you the same object both times—it remembers the identity
of the entities it has returned. Even if you use different queries, it
will not attempt to load fresh data for any entities already loaded
unless you explicitly pass them to the Refresh
method.
Executing the same LINQ query multiple times against the same
context will still result in multiple queries being sent to the
database. Those queries will typically return all the current data for
the relevant entity. But the EF will look at primary keys in the query
results, and if they correspond to entities it has already loaded, it
just returns those existing entities and won’t notice if their values
in the database have changed. It looks for changes only when you call
either SaveChanges
or Refresh
.
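You can see this identity management in a sketch like the following. Both queries cause a round trip to the database, but they yield a single object; the customer ID is the one used elsewhere in this chapter:

```csharp
using (var context = new AdventureWorksLT2008Entities())
{
    var first = context.Customers.Single(c => c.CustomerID == 29531);
    var second = context.Customers.Single(c => c.CustomerID == 29531);

    // The context remembers entity identity: one object, not two.
    // Any changes made to the row in the database between the two
    // queries are not reflected in the entity.
    bool sameObject = ReferenceEquals(first, second);   // true
}
```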
This raises the question of how long you should keep an object context around. The more entities you ask it for, the more objects it’ll hang on to. Even when your code has finished using a particular entity object, the .NET Framework’s garbage collector won’t be able to reclaim the memory it uses for as long as the object context remains alive, because the object context keeps hold of the entity in case it needs to return it again in a later query.
The way to get the object context to let go of everything is to
call Dispose
.
This is why all of the examples that show the creation of an object
context do so in a using
statement.
There are other lifetime issues to bear in mind. In some
situations, an object context may hold database connections open. And
also, if you have a long-lived object context, you may need to add calls
to Refresh
to ensure that you have
fresh data, which you wouldn’t have to do with a newly created object
context. So all the signs suggest that you don’t want to keep the object
context around for too long.
How long is too long? In a web application, if you create an
object context while handling a request (e.g., for a particular page)
you would normally want to Dispose
it
before the end of that request—keeping an object context alive across
multiple requests is typically a bad idea. In a Windows application (WPF
or Windows Forms), it might make sense to keep an object context alive a
little longer, because you might want to keep entities around while a
form for editing the data in them is open. (If you want to apply
updates, you normally use the same object context you used when fetching
the entities in the first place, although it’s possible to detach an
entity from one context and attach it later to a different one.) In
general, though, a good rule of thumb is to keep the object context
alive for no longer than is necessary.
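In practice, that rule of thumb usually means scoping a context to a single unit of work. A minimal sketch of the pattern, assuming the entity is only needed for display (fetching and saving changes would normally share one context):

```csharp
// One short-lived context per logical operation: create it, do the
// work, and let the using statement dispose it promptly.
public Customer LoadCustomer(int customerId)
{
    using (var context = new AdventureWorksLT2008Entities())
    {
        return context.Customers.Single(c => c.CustomerID == customerId);
    }
}
```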
The last data access feature we’ll look at is slightly different from the rest. So far, we’ve seen how to write code that uses data in a program that can connect directly to a database. But WCF Data Services lets you present data over HTTP, making data access possible from code in some scenarios where direct connections are not possible. It defines a URI structure for identifying the data you’d like to access, and the data itself can be represented in either JSON or the XML-based Atom Publishing Protocol (AtomPub).
As the use of URIs, JSON, and XML suggests, WCF Data Services can be useful in web applications. Silverlight cannot access databases directly, but it can consume data via WCF Data Services. And the JSON support means that it’s also relatively straightforward for script-based web user interfaces to use.
WCF Data Services is designed to work in conjunction with the Entity Framework. You don’t just present an entire database over HTTP—that would be a security liability. Instead, you define an Entity Data Model, and you can then configure which entity types should be accessible over HTTP, and whether they are read-only or support other operations such as updates, inserts, or deletes. And you can add code to implement further restrictions based on authentication and whatever security policy you require. (Of course, this still gives you plenty of scope for creating a security liability. You need to think carefully about exactly what information you want to expose.)
To show WCF Data Services in action, we’ll need a web application, because it’s an HTTP-based technology. If you create a new project in Visual Studio, you’ll see a Visual C#→Web category on the left, and the Empty ASP.NET Web Application template will suit our needs here. We need an Entity Data Model to define what information we’d like to expose—for this example, we’ll use the same EDM we’ve been using all along, so the steps will be the same as they were earlier in the chapter.
To expose this data over HTTP, we add another item to the
project—under the Visual C#→Web template
category we choose the WCF Data Service template. We’ll call the service
MyData
. Visual Studio will add a
MyData.svc.cs file to the project,
which needs some tweaking before it’ll expose any data—it assumes that it
shouldn’t publish any information that we didn’t explicitly tell it
to.
The first thing we need to do is modify the base class of the
generated MyData
class—it derives from
a generic class called DataService
, but
the type argument needs to be filled in—Visual Studio just puts a comment
in there telling you what to do. We will plug in the name of the object
context class:
public class MyData : DataService<AdventureWorksLT2008Entities>
This class contains an InitializeService
method
to which we need to add code for each entity type we’d like to make
available via HTTP. Example 14-21 makes all
three entity types in the model available for read access.
Example 14-21. Making entities available
public static void InitializeService(IDataServiceConfiguration config)
{
    config.SetEntitySetAccessRule("Customers", EntitySetRights.AllRead);
    config.SetEntitySetAccessRule("SalesOrderHeaders", EntitySetRights.AllRead);
    config.SetEntitySetAccessRule("SalesOrderDetails", EntitySetRights.AllRead);
}
We can now look at how the data appears. If we press F5, Visual
Studio opens a web browser showing the MyData.svc
URL for our web application. It shows
an XML file describing the available entity types, as Example 14-22 shows. (The exact
value you see in the xml:base
may be
different—it depends on the port number Visual Studio chooses for
debugging.)
Example 14-22. Available entities described by the web service
<service xml:base="http://localhost:1181/MyData.svc/"
         xmlns:atom="http://www.w3.org/2005/Atom"
         xmlns:app="http://www.w3.org/2007/app"
         xmlns="http://www.w3.org/2007/app">
  <workspace>
    <atom:title>Default</atom:title>
    <collection href="Customers">
      <atom:title>Customers</atom:title>
    </collection>
    <collection href="SalesOrderDetails">
      <atom:title>SalesOrderDetails</atom:title>
    </collection>
    <collection href="SalesOrderHeaders">
      <atom:title>SalesOrderHeaders</atom:title>
    </collection>
  </workspace>
</service>
Notice that each <collection>
element has an href
attribute. Typically, href
attributes denote a link to another
resource, the attribute value being a relative URL. So you can just stick
an entity name on the end of the URL. The exact URL will depend on the
port number Visual Studio picks for the test web server, but something
like http://localhost:1181/MyData.svc/Customers will
return all the customers in the system.
There are two things to be aware of when looking at entities in the browser with this sort of URL. First, the simplest URLs will return all the entities of the specified type, which might take a long time. We’ll see how to be more selective in a moment. Second, by default the web browser will notice that the data format being used is a variant of Atom, and will attempt to use the same friendly feed rendering you would get on other Atom- and RSS-based feeds. (Lots of blogs offer an Atom-based feed format.) Unfortunately, the browser’s friendly rendering is aimed at the kind of Atom features usually found in blogs, and it doesn’t always understand AtomPub feeds, so you might just get an error.
To deal with the second problem, you could just View Source to see the underlying XML, or you can turn off friendly feed rendering. In IE8, you open the Internet Options window and go to the Content tab. Open the Feed and Web Slice Settings window from there, and uncheck the “Turn on feed reading view” checkbox. (If you’ve already looked at a feed and hit this problem, you might need to close all instances of IE after making this change and try again.)
WCF Data Services lets you request a specific entity by putting its primary key inside parentheses at the end of the URL. For example, http://localhost:1181/MyData.svc/Customers(29531) fetches the customer entity whose ID is 29531. If you try this, you’ll see a simple XML representation of all the property values for the entity. In that same XML document, you’ll also find this element:
<link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/SalesOrderHeaders"
      type="application/atom+xml;type=feed"
      title="SalesOrderHeaders"
      href="Customers(29531)/SalesOrderHeaders" />
This is how associations in the EDM show up—if an entity has related
entities available through an association, it will offer a link to the URL
on which those associations can be found. So as the href
in this example shows, you can just stick
SalesOrderHeaders
on the end of the
customer instance URL to get all the related orders for customer 29531, as
in the following:
http://localhost:1181/MyData.svc/Customers(29531)/SalesOrderHeaders
So you can see how joins across relationships turn into URLs, and
also how simple key-based queries work. In fact, the URL syntax also
supports more complex queries based on properties. For example, this
returns all customers whose FirstName
has the value Cory
:
http://localhost:1181/MyData.svc/Customers?$filter=FirstName%20eq%20'Cory'
(The %20
is how URLs represent
spaces, so we’ve really just appended $filter=FirstName eq 'Cory'
to the URL.) The URL
syntax also supports ordering and paging. Many standard LINQ operators are
not supported, including grouping and joining.
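For example, the standard query options include $orderby for sorting and $top and $skip for paging, so URLs along these lines (with the usual caveat that your port number may differ) work too:

```
http://localhost:1181/MyData.svc/Customers?$orderby=LastName
http://localhost:1181/MyData.svc/Customers?$top=10&$skip=20
```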
You don’t have to work directly with these URLs and XML documents—WCF Data Services includes a client-side component that supports LINQ. So you can run LINQ queries that will be converted into HTTP requests that use the URL structure you see here. We can demonstrate this by adding a new console application to the same solution as our web application. If we right-click on the console application’s References item in the Solution Explorer and select Add Service Reference, clicking Discover in the dialog that opens will show the WCF Data Service from the other project. Selecting this and clicking OK generates code to represent each entity type defined by the service. That enables us to write code such as Example 14-23.
Example 14-23. Client-side WCF Data Services code
var ctx = new AdventureWorksLT2008Entities(
    new Uri("http://localhost:1181/MyData.svc"));

var customers = from customer in ctx.Customers
                where customer.FirstName == "Cory"
                select customer;

foreach (Customer customer in customers)
{
    Console.WriteLine(customer.CompanyName);
}
This looks superficially similar to the Entity Framework code we saw
earlier—we still have an object context, for example. Visual Studio
generated the AdventureWorksLT2008Entities
class when we
imported the service reference, and it derives from DataServiceContext
. It’s slightly different from
the EF context—it’s not disposable, for one thing. (That’s why there’s no
using
statement here—this object
context doesn’t implement IDisposable
.)
And it’s a lot simpler—it doesn’t do any change tracking. (That’s why it
doesn’t need to implement IDisposable
.)
It’s really just a convenient way to extract the information that a WCF
Data Service exposes as objects in C#.
The LINQ query here will generate a suitable URL that encodes the
query—filtering by FirstName
in this
case. And as with a database query, it won’t actually make the request
until we start to enumerate the results—this LINQ provider follows the
usual deferred execution pattern.
The range of query types supported by the WCF Data Services LINQ provider is much more limited than that offered by LINQ to Entities, LINQ to SQL, or most LINQ providers. It can only implement queries that are possible to turn into WCF Data Services URLs, and the URL syntax doesn’t cover every possible kind of LINQ query.
WCF Data Services also offers more advanced features than those shown here. For example, you can arrange for entities to be updatable and creatable, and you can provide custom filtering code, to control exactly which entities are returned.
In this chapter, we saw that the .NET Framework offers a range of data access mechanisms. The original interface-based API supports direct database access. The Entity Framework makes it easier for C# code to work with data from the database, as well as providing some support for controlling the mapping between the database and the object model representing the data. And WCF Data Services is able to take some or all of an Entity Data Model and present it over HTTP, with either AtomPub or JSON, thus making your data available to AJAX and Silverlight clients.
[34] The name is a little confusing. In a sense, ADO.NET is a successor to ADO (ActiveX Data Objects), a data access system that was around before .NET. So ADO.NET does for .NET what ADO did for Visual Basic 6. But they are quite different technologies—ADO.NET makes no use of ADO, or ActiveX. ADO.NET can use OLE DB, the technology underpinning ADO, but native ADO.NET providers are preferred—the OLE DB provider is mainly for legacy sources.
[35] Yoda discussing Sith terms of employment, from Star Wars Episode I: The Phantom Menace.
[36] Opinion is divided on whether this variant can still be called one-to-one. Strictly speaking it’s incorrect, but in practice you’ll see one-to-zero-or-one relationships widely described informally as one-to-one.
[37] And in case you’re wondering about LINQ to XML, that doesn’t help here. It lets you use LINQ-capable languages like C# or VB.NET to write LINQ queries to look in an XML document. It doesn’t let you put LINQ queries in an XML document.
[38] In fact, it gets a good deal cleverer than that. Databases go to some lengths to avoid making clients wait for one another unless it’s absolutely necessary, and can sometimes manage this even when clients are accessing the same data, particularly if they’re only reading the common data. Not all databases do this in the same way, so consult your database documentation for further details.
[39] And yes, bookings for daily scheduled flights are filled up gradually over the course of a few months, so 28 hours per day is not necessarily a showstopper. Even so, forcing passengers to wait until nobody else is choosing a seat would be problematic—you’d almost certainly find that your customers didn’t neatly space out their usage of the system, and so you’d get times where people wanting to book would be unable to. Airlines would almost certainly lose business the moment they told customers to come back later.