Databases are one of computing’s most important inventions. They allow applications to store massive quantities of information, with the ability to search through millions of items and retrieve the ones you need in a fraction of a second. A high-quality database can scale to large numbers of concurrent end users, while providing very reliable storage, even in the face of system crashes. And even if you don’t need the scalability, databases still look compelling if your program needs to remember data for any length of time—applications that store valuable information usually rely on databases.
The .NET Framework provides several different ways to communicate with databases. We will mainly be looking at its most recently introduced data access mechanism, the Entity Framework, and how that works with the LINQ features of C#. But first, we’ll take a quick look at all of the database features of the .NET Framework, to put the Entity Framework in context.
The main focus of this chapter, the Entity Framework, was first released as part of Service Pack 1 for Visual Studio 2008, which emerged less than a year after the initial (pre-Service-Pack) release of Visual Studio 2008. This was remarkable, since that first release had already introduced a brand-new data access feature, LINQ to SQL, but then Microsoft has released a lot of data access technologies over the years.
While the pace of change can sometimes seem daunting, each new piece has been a useful advance, and despite the new APIs, the data access services that appeared in .NET v1.0 are still relevant today. So we’re not in a state of continuous revolution—new features mostly add layers of functionality. This means it’s useful to understand all the parts in order to know what to choose for your applications, so we’ll review what each is for and how the pieces build on one another.
.NET v1 provided a set of data access services called ADO.NET.[34] In more recent years, ADO.NET seems to have grown into an umbrella term—as new data access features have been added, most (but not all) appear in the ADO.NET section of the documentation. But to understand the layers, it’s worth starting with the two parts that were in the first version: interfaces for querying and updating databases, and classes that support disconnected use of data.
ADO.NET defines a family of interfaces that provide a uniform way to perform basic operations such as executing queries, inserting new rows into database tables, and updating or deleting existing rows. Some data access features are common to many different programming systems—if you’re familiar with ODBC, or with Java’s JDBC, you could think of these ADO.NET interfaces as being the .NET equivalent of those APIs.
These interfaces provide the most direct and efficient way to access the basic services offered by relational databases, which is why the other data access features we’ll be looking at in this chapter do not replace this part of ADO.NET. They build on this low-level feature to provide higher-level services.
Because it’s not the main focus of this chapter, we won’t go into too much detail on how this part of ADO.NET works, and will instead just provide a quick taste. Table 14-1 shows the main ADO.NET base classes that represent the various things needed to get a database to do some work.
Table 14-1. ADO.NET basic data access abstract base classes
| Class | Represents |
|---|---|
| `DbConnection` | Connection to a database |
| `DbCommand` | Command to be executed by a database |
| `DbParameter` | Parameter for a command |
| `DbDataRecord` | Single row of data returned by a query |
| `DbDataReader` | Iterator over the full results returned by a query (potentially many rows and many row sets); implements `IDataRecord` |
| `DbTransaction` | Database transaction |
Example 14-1 shows the typical pattern of communication. It starts by creating a connection object—a `SqlConnection` here because this code connects to SQL Server, but for other databases you'd use other types derived from `DbConnection`, such as `OracleConnection`. Next, it builds a command object, setting its `CommandText` to the SQL we want the database to execute. This particular example is a parameterized command—it selects addresses from a specified state, so we supply the command with a parameter object specifying the state. Then we execute the command by calling `ExecuteReader`, using the data reader object it returns to iterate through the rows produced by the query, and we print out the values. (This particular example assumes you have a SQL Server instance called `.\SQLEXPRESS`. If you installed the full edition or developer edition of SQL Server, specify just `.` instead of `.\SQLEXPRESS`. See Getting up and running with SQL Server 2008 Express for information on getting the samples installed.)
Example 14-1. Basic data access with ADO.NET
```csharp
string sqlConnectionString = @"Data Source=.\sqlexpress;" +
    "Initial Catalog=AdventureWorksLT2008;Integrated Security=True";
string state = "California";

using (DbConnection conn = new SqlConnection(sqlConnectionString))
using (DbCommand cmd = conn.CreateCommand())
{
    cmd.CommandText =
        "SELECT AddressLine1, AddressLine2, City FROM SalesLT.Address WHERE " +
        "StateProvince=@state";
    DbParameter stateParam = cmd.CreateParameter();
    stateParam.ParameterName = "@state";
    stateParam.Value = state;
    cmd.Parameters.Add(stateParam);

    conn.Open();
    using (DbDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            string addressLine1 = reader.GetString(0);
            // AddressLine2 is nullable, so we need to be prepared to get
            // back either a string or a DBNull
            string addressLine2 = reader.GetValue(1) as string;
            string city = reader.GetString(2);

            Console.WriteLine(addressLine1);
            Console.WriteLine(addressLine2);
            Console.WriteLine(city);
        }
    }
}
```
You might be wondering why we're fiddling around with parameter objects when it would have been simpler to just put the state directly into the SQL string. This particular example hardcodes the state, so that would have worked, but the technique here would be important if the value were picked at runtime. In general, building SQL queries with string concatenation is a dangerous thing to do—if any of the text comes from outside your code (e.g., from a form on a web page, or part of a URL), your code will be vulnerable to a SQL injection attack. Imagine that Example 14-1 were part of a web application, and `state` came from part of a URL such as http://example.com/showinfo?state=California. Users are free to modify URLs—you can just type them into the address bar—so a malicious user might decide to modify that part of the URL. If the code just took the string from the URL and concatenated it directly into the SQL, we would effectively be giving anyone with an Internet connection the ability to run arbitrary SQL commands on our database—SQL queries can contain multiple commands, so users would be able to add extra commands to run after the `SELECT`. Parameters are one way to avoid this, because the value of a parameter will not be treated as SQL. So it's a good idea to get in the habit of using parameters whenever some part of the query needs to change at runtime.
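To make the danger concrete, here is a hypothetical sketch (the query shape mirrors Example 14-1, but the malicious payload is invented for illustration) showing how naive concatenation lets input rewrite the SQL itself, whereas a parameter would simply be bound as a single string value:

```csharp
using System;

class SqlInjectionSketch
{
    // Dangerous: the input becomes part of the SQL text itself
    public static string BuildQueryByConcatenation(string state)
    {
        return "SELECT AddressLine1 FROM SalesLT.Address WHERE StateProvince='"
            + state + "'";
    }

    static void Main()
    {
        // A well-behaved request produces the query you intended...
        Console.WriteLine(BuildQueryByConcatenation("California"));

        // ...but a malicious one closes the string literal and appends an
        // extra command. A parameterized query avoids this because the
        // whole input is treated as one value, never as SQL.
        string attack = "x'; DELETE FROM SalesLT.Address; --";
        Console.WriteLine(BuildQueryByConcatenation(attack));
    }
}
```

Nothing here touches a database; the point is simply that the second call yields a string containing a `DELETE` command that the database would happily execute.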
The API we used here directly reflects the steps needed to communicate with a database, so we have to write a lot of code to bridge between the queries, parameters, and columns of the database world and the world of C#. Just as a sneak preview, Example 14-2 shows the equivalent code using the Entity Framework. Notice that instead of having to build a parameter object for a parameterized query, we've just been able to use a LINQ `where` clause and the C# `==` comparison syntax. (The Entity Framework performs a parameterized query under the covers, so this is safe from SQL injection attacks.) Also notice that all the database columns are available as object properties, so we don't have to call `GetString` or similar helpers to retrieve column values.
Example 14-2. LINQ to Entities versus ADO.NET
```csharp
string state = "California";
using (var context = new AdventureWorksLT2008Entities())
{
    var addresses = from address in context.Addresses
                    where address.StateProvince == state
                    select address;

    foreach (var address in addresses)
    {
        Console.WriteLine(address.AddressLine1);
        Console.WriteLine(address.AddressLine2);
        Console.WriteLine(address.City);
    }
}
```
Example 14-1 has one obvious benefit in exchange for the complexity: we have complete control over the SQL query. You can’t see the SQL in Example 14-2 because it gets generated for you. In general, the low-level ADO.NET API gives us more direct access to database features—for example, with SQL Server you can arrange to be notified when a query you executed earlier will now return different results due to changes in the database. (This can be useful in caching systems—ASP.NET’s cache can take advantage of this. It needs to be used with care, however, because it requires you to keep a database connection open at all times, which can cause severe scaling problems.)
Another potential benefit is that Example 14-1 does not require the application to commit to the Entity Framework’s way of doing things. Not all applications necessarily want to use databases in the way the Entity Framework chooses to.
The use of this old-style part of ADO.NET usually comes down to a need for control over some specific aspects of data access, or occasionally because it offers performance benefits in certain specialized scenarios. But for the majority of developers, this style of data access will be unnecessarily low-level and verbose.
These interfaces are not the only part of ADO.NET v1—it has another piece whose job is to manage data after the query that fetched it has completed.
ADO.NET defines the `DataSet` class, which is a collection of `DataTable` objects. A `DataTable` is an in-memory copy of some tabular data. Typically, this would be fetched from a database table or view, although it's possible to build up a `DataTable` from any information source—it provides methods for creating new rows from scratch.

A `DataSet` can be a convenient way of loading a small subset of a database's contents into client-side code, enabling information to be browsed locally using data binding. It also supports some basic client-side processing of a kind that might normally be done in the database—you can perform searching, filtering, and sorting, for example. In Windows GUIs, moving this sort of work to the client side can improve responsiveness—the user doesn't have to wait for the database to respond to see results. This is what's meant by disconnected operation—you can still work with the data even after closing the connection to the database.
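For instance, here is a small self-contained sketch (the table and column names are invented for illustration) that builds a `DataTable` from scratch and then filters and sorts it client-side with a `DataView`, with no database connection involved:

```csharp
using System;
using System.Data;

class DisconnectedDataDemo
{
    public static DataTable BuildAddressTable()
    {
        var table = new DataTable("Address");
        table.Columns.Add("City", typeof(string));
        table.Columns.Add("StateProvince", typeof(string));

        // Rows can come from anywhere--here we just add them by hand
        table.Rows.Add("Seattle", "Washington");
        table.Rows.Add("Spokane", "Washington");
        table.Rows.Add("Portland", "Oregon");
        return table;
    }

    static void Main()
    {
        DataTable table = BuildAddressTable();

        // Client-side filtering and sorting--no round trip to a database
        var view = new DataView(table)
        {
            RowFilter = "StateProvince = 'Washington'",
            Sort = "City ASC"
        };

        foreach (DataRowView row in view)
        {
            Console.WriteLine(row["City"]); // Seattle, then Spokane
        }
    }
}
```

This is the kind of work that data binding in a Windows GUI would otherwise push back to the database on every change of filter or sort order.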
`DataSet` objects are serializable, so it's possible to save one to disk or send it across a network. It can use an XML representation, which in theory makes it possible for non-.NET code to access the information in a `DataSet`. However, while it's certainly workable, in practice this seems not to be a popular technique. This may be because the XML representation is relatively complex and unique to the `DataSet`, so there's not much support for it outside of the .NET Framework.
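As a rough illustration, the `WriteXml` method can serialize a `DataSet` to any `TextWriter` (the table contents here are invented):

```csharp
using System;
using System.Data;
using System.IO;

class DataSetXmlDemo
{
    public static string SerializeToXml()
    {
        var dataSet = new DataSet("Addresses");
        var table = new DataTable("Address");
        table.Columns.Add("City", typeof(string));
        table.Rows.Add("Seattle");
        dataSet.Tables.Add(table);

        // WriteXml emits the DataSet's XML representation; passing
        // XmlWriteMode.WriteSchema would include an inline schema too
        var writer = new StringWriter();
        dataSet.WriteXml(writer);
        return writer.ToString();
    }

    static void Main()
    {
        Console.WriteLine(SerializeToXml());
    }
}
```

The output nests an element per row inside a root element named after the `DataSet`, which hints at why consuming this format from non-.NET code takes some effort.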
Visual Studio is able to generate derived classes to build a so-called strongly typed `DataSet`, whose tables offer row objects with .NET properties representing columns in the corresponding database table. Strongly typed `DataSet`s are often used to reduce the amount of code required to bridge between C# and the database. However, since LINQ to SQL and LINQ to Entities came along, this use of `DataSet`s has become less popular, because the LINQ-based approaches offer the same benefit but are typically easier to use. So `DataSet`s are somewhat out of favor today.
The low-level ADO.NET data access interfaces were the main way to access data in .NET right up until .NET 3.5 and Visual Studio 2008 shipped, bringing LINQ.
As we saw in Chapter 8, LINQ lets you perform tasks with collections of data including filtering, sorting, and grouping. In that chapter, we were working only with objects, but these are exactly the jobs that databases are good at. Moreover, one of the motivations behind LINQ’s design was to make it easier to use databases from code. As you can see in Example 14-2, LINQ blends data access seamlessly into C# code—this database example looks very similar to the object examples we saw in the earlier chapter.
Example 14-2 uses LINQ to Entities—a LINQ provider for the Entity Framework. The Entity Framework didn’t appear until Service Pack 1 of Visual Studio 2008, and there’s an older database LINQ provider called LINQ to SQL that appeared in the first Visual Studio 2008 release.
LINQ to SQL works only with SQL Server and SQL Server Compact 3.5, and has a fairly narrow goal. It aims to reduce the overhead involved in writing data access code by providing a convenient C# syntax for working with the data in a SQL Server database.
The Entity Framework is similar, but it adds a couple of additional features. First, it is designed to support multiple database vendors—it has an open provider model, enabling support to be written for any database, and you can get providers for most popular databases. Second, the Entity Framework allows the .NET representation to have a different structure from your database schema if necessary. You can define a conceptual model whose entities do not necessarily correspond directly to rows of particular tables—an entity might include data that spans multiple tables in the database itself. This entity can then be represented as a single object.
Of course, it's possible to have your conceptual model correspond exactly to your database model—you're free to create a straightforward mapping where one entity represents one row in one table. Used in this way, the Entity Framework, in conjunction with LINQ to Entities, makes LINQ to SQL look redundant. So why do we have both?
The main reason LINQ to SQL exists is that it was ready when Visual Studio 2008 shipped, whereas Microsoft hadn’t finished the Entity Framework at that point. LINQ was a major part of that release, and since one of the main motivations behind LINQ was data access, shipping without a LINQ-based data access feature would have been a bit of a letdown. LINQ to SQL was developed by a different team (it came from the LINQ team, and not the data access group), and it was ready earlier, due no doubt in part to its less ambitious goals.
Microsoft has stated that while both technologies are fully supported, the Entity Framework is where the majority of its efforts will now be focused. Visual Studio 2010 adds a few new LINQ to SQL features, but LINQ to Entities will see more development in the long run.
That’s why this chapter’s focus is the Entity Framework (although a lot of the concepts here apply equally to both technologies). That being said, both authors really like LINQ to SQL. In scenarios where we’re using SQL Server and where we don’t need the conceptual model and mapping features of the Entity Framework, we’re both more inclined to use LINQ to SQL because of its simplicity and because we’ve already learned how to use it. But if you learn only one data access technology for .NET, the Entity Framework looks like the better choice for the long term.
By the time Microsoft shipped the Entity Framework, various third-party options for mapping relational data into object models had been around for a while. We’re not going to talk about them in this book, but it’s useful to be aware that the Entity Framework isn’t the only game in town.
Perhaps the best known alternative is NHibernate. This is a .NET version of Hibernate, a popular Java ORM (Object Relational Mapper). NHibernate had already been around for a few years by the time the Entity Framework emerged (and its Java progenitor is considerably older). So in many respects it’s a more mature and more fully featured ORM than the Entity Framework. On the other hand, NHibernate predates LINQ (and Java currently has nothing resembling LINQ), so at the time of this writing, its LINQ support is somewhat limited.
Many other ORMs are available for .NET, some free and some commercial. They are too numerous to mention here, as a quick web search will confirm.
Most communication with databases happens over specialized, vendor-specific protocols. Firewalls are usually configured not to let such protocols through, and with good reason: from a security perspective, making your database directly accessible on the Internet tends to look like a very bad idea. Nonetheless, some people want to do this, and there are scenarios in which it’s not the terrible idea it might first seem, particularly if you can exercise sufficient control over what gets exposed.
With WCF Data Services, you can present a relational data store over HTTP and XML or JSON. You can be selective about what data you expose and to whom. Moreover, the model you present doesn’t necessarily have to be the same as your underlying database structure. In fact, there doesn’t have to be a database involved at all—there’s a provider model that enables you to present any data through this mechanism, as long as you can find a way to make it look like relational data.
You will normally use WCF Data Services in conjunction with the Entity Framework—you can define the entities you’d like to present over HTTP, and use the framework’s mapping services to bridge between that and the underlying data store. So we’ll be looking at these services in more detail later in the chapter, once we’ve finished exploring the Entity Framework.
The focus of WCF Data Services is different than for the other data access features we’ve discussed so far—it’s mainly about presenting data on the network, where everything else has been about consuming data. However, there’s also a client-side component that provides LINQ-based querying for such services. While it’s part of the WCF Data Services technology, it’s optional—you’re not obliged to use it on the client. And this client doesn’t strictly require WCF Data Services on the server—the client-side parts could be used against any service that exposes data in the same way.
Silverlight uses a seriously trimmed down version of the .NET Framework to keep its download size and install time tolerably small, and one casualty of that trimming is data access support. Size is not the only reason, though—it wouldn't normally make sense for a Silverlight client application to attempt to connect directly to a database, because Silverlight is a client-side web technology, and most system administrators work to ensure that their databases are not accessible via their native protocols over the Internet.
Of course, a direct connection to a database server might be an option in an intranet scenario, but it's not supported. Silverlight offers LINQ, but neither the LINQ to SQL nor the LINQ to Entities providers are available, because the underlying database access mechanisms these providers rely on are missing. The only supported database access mechanism in Silverlight is the WCF Data Services client.
The full .NET Framework is designed to work with a wide range of databases. The simple ADO.NET data access we started with uses interfaces to allow database vendors to supply their own database-specific implementations. Likewise, the Entity Framework is database-agnostic—it has an open provider model designed to allow support for any relational database to be added. Of course, Microsoft ships a provider for its own database, SQL Server, but other suppliers offer providers for various databases, including Oracle, MySQL, PostgreSQL, SQLite, Sybase, and DB2.
In this book, we will use SQL Server. The examples work with SQL Server 2008 Express, which is available for free. (Some editions of Visual Studio will automatically install SQL Server 2008 Express for you by default.) The Express edition of SQL Server is the same database engine as the "real" versions, but with some limits on database size and with some of the more advanced features missing. Despite being a trimmed down version, it's easily capable of supporting substantial websites. It can also be used in client applications written with WPF or Windows Forms, to support client-side data stores or caching, although it can complicate the installation process for such an application—installing a SQL Server instance is not a trivial task.
If you want to follow the examples in this chapter, not only will you need a copy of SQL Server 2008 Express installed, but you’ll also need to install a sample database. We’ll be using the lightweight version of the Adventure Works database available from http://msftdbprodsamples.codeplex.com/.
Getting this sample up and running is slightly fiddly, because there are numerous different versions of the Adventure Works sample—there are full and lightweight versions for both SQL Server 2005 and SQL Server 2008, and each version of SQL Server comes in various editions, not all of which put their datafiles in the same place. Because of all the variations, it’s quite easy to find that the sample database has failed to appear even though the installation appeared to proceed without error.
Moreover, the steps required to install the database change from time to time, as new versions are released. We had been planning to provide detailed steps here, but while we were writing this book, changes to the database installer rendered the first set of instructions we had produced useless. Since that could well happen again between us finishing the book and you reading it, we’re providing the instructions as part of the sample code you can download for this book from the O’Reilly website so that we can update them when necessary. You can find these at http://oreilly.com/catalog/9780596159832/.
Now that we’ve finished a quick survey of the data access features available in .NET and we’ve seen how to get the sample database installed, let’s look at the Entity Framework in more detail. We’ll start with the model at the heart of the framework.
The main goal of the Entity Framework (or EF for short) is to make it easier for your code to work with data in the database. C# objects are quite different in nature than the information stored in a relational database, and the process of managing these differences, and transferring data between these two worlds, is called mapping. (So the Entity Framework is a kind of ORM.) As Figure 14-1 illustrates, mapping happens in both directions. As information is fetched from the database, it is loaded into objects. And if your C# code modifies these objects or creates new ones, you can arrange for the database to be correspondingly updated.
The design of a database doesn’t always correspond directly to data structures convenient for our application code. There are many reasons we might want our code to work with a model that looks slightly different from the data. The database may contain information not required by the part of the application we’re writing, so we may need only a subset. Information about a particular entity may have been split across multiple tables for performance reasons. Naming conventions in the database might not suit our code.
So the Entity Framework allows us to control the mapping. We can define a conceptual model that describes the entities as we’d like to work with them from C#, along with mappings that describe how that model maps onto the underlying storage. The EF also requires us to provide a store schema, which is a definition of the structure we expect it to find in the database. This may seem redundant—after all, the database knows its own schema, so why would the EF need a copy? There are a couple of reasons. First, it’s possible to define the model before you create the database—you can generate a database schema from the store schema. Second, you can configure aspects of how the Entity Framework uses the database, such as whether it uses queries or stored procedures to access particular tables. Settings that are associated with the database itself rather than what the EF does with data belong in the store schema rather than the mappings or conceptual schema.
The three parts shown in Figure 14-1—the conceptual model, the storage model, and the mappings between them—are collectively known as the Entity Data Model, or EDM.
There are many constraints on the conceptual model, because the model is useful only if you can construct a successful mapping. There are limits on what mappings are able to do, so your existing database structure will impose some restrictions on the model. Developers who are new to the Entity Framework often find that they have a lot less freedom in the design of the conceptual model than they first presumed. We’ll see what mappings are possible in due course, but for now, do not imagine that the EF is able to take any arbitrary conceptual model and bridge it to any old database structure—there is necessarily a close relationship between the database and the conceptual model.
If you use the EF in the simplest way possible, your conceptual model will be the same as your storage model, and the mapping will be very straightforward. If you use Visual Studio’s wizard for adding EF support to a project, you’ll end up with exactly this sort of direct mapping, with one entity type for each table or view you import. But you can then tweak things to suit your needs. We’ll walk through the wizard now—even though it produces a straightforward mapping where the conceptual model matches the storage model, it still has to generate a complete set of model and mapping definitions, so it’s instructive to look at what it produces.
You can add EF support to any .NET project (except for Silverlight projects). We’ll use a console application for the examples. In the Add New Item dialog, we’ll select Visual C# Items→Data, and then choose the ADO.NET Entity Data Model item template, calling the new file “AdventureWorksModel”.
When you add an Entity Data Model to your project, Visual Studio asks whether you want to start from scratch or base your model on an existing database. We’ll choose that simpler second option. If you’ve previously told Visual Studio about any databases you’re using—either via the Server Explorer toolbar or by using this or other data-related wizards—it will show them in a drop down, but you can provide Visual Studio with new connection details from within the wizard. For this walkthrough, we’re going to connect to the AdventureWorksLT2008 sample database.
The wizard uses the name of your connection for one of the types that it generates. You'll see the identifier `AdventureWorksLT2008Entities` cropping up in various examples later. If you happen to give your connection a different name in Visual Studio, you'll need to use that name in the code.
Once you've chosen a database, Visual Studio will show a tree view of all the tables, views, and stored procedures—you can use these as the starting point for your model. For each item you select, it will add corresponding items to the store schema, the conceptual schema, and the mappings. When you complete the wizard, it will generate an .edmx file that defines the generated entity model. Visual Studio opens a graphical view of this file—Figure 14-2 shows the conceptual model that appears if you select the `Customer`, `SalesOrderHeader`, and `SalesOrderDetail` tables in the wizard and then click Finish.
This view shows only the conceptual model. You can see slightly more of the EDM in the Model Browser, shown in Figure 14-3. This will normally appear by default when you open an EDM, but if you rearrange your windows and lose track of it, you can right-click on the background of the model and choose Model Browser from the context menu.

The browser lists both the conceptual schema (under the `AdventureWorksLT2008Model` node here) and the store schema (under `AdventureWorksLT2008Model.Store`). The three selected tables are visible in both, and if you were to expand them, you'd see that the properties of each entity in the conceptual model correspond directly to the columns of the tables in the store schema.
Even the Model Browser doesn’t show the complete picture, as the Entity Data Model has three parts: the conceptual schema, the store schema, and the mappings. To see the mappings, you can select either entities or properties of entities. When you do this in either the main .edmx editor view (Figure 14-2) or the Model Browser (Figure 14-3), the Mapping Details window, shown in Figure 14-4, will display the mappings for the selected item. The Mapping Details panel should appear automatically, but if you don’t see it, you can open it with the View→Other Windows→Entity Data Model Mapping Details menu item.
As Figure 14-4 shows, the generated mapping is pretty simple. On the left you can see each column from the table definition in the store schema, and on the right you can see which entity property it is mapped to. Since the store schema and conceptual model were generated directly from the database schema, there's nothing complicated going on—the same names appear on either side, and the only difference is that the lefthand side shows data types from the database world such as `nvarchar` and `bit`, while the righthand side shows .NET data types such as `String` and `Boolean`.
Visual Studio puts the whole Entity Data Model definition in an .edmx file, which is just XML. The wizards and editor windows we’ve seen so far are just convenient views into that XML. If you look directly at the XML in the .edmx, you’ll see it contains sections corresponding to the three parts of the model—storage schema, conceptual schema, and mappings. But the whole point of this exercise was to make it easier to use data from code. So Visual Studio generates code based on the contents of any .edmx files in your project.
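As a rough, illustrative skeleton only—the real file carries versioned XML namespaces (which differ between EF releases), full schema declarations, and many attributes omitted here—the layout of an .edmx looks something like this:

```xml
<edmx:Edmx xmlns:edmx="http://schemas.microsoft.com/ado/2008/10/edmx">
  <edmx:Runtime>
    <edmx:StorageModels>
      <!-- Store schema: tables and columns as they exist in the database -->
    </edmx:StorageModels>
    <edmx:ConceptualModels>
      <!-- Conceptual schema: the entity types your C# code works with -->
    </edmx:ConceptualModels>
    <edmx:Mappings>
      <!-- Mappings: how conceptual entities map onto store tables -->
    </edmx:Mappings>
  </edmx:Runtime>
  <edmx:Designer>
    <!-- Layout information used by the Visual Studio designer -->
  </edmx:Designer>
</edmx:Edmx>
```

The three `Runtime` sections correspond directly to the three parts of the Entity Data Model described above.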
For each entity type you define, a corresponding .NET class will be generated. These classes provide normal properties for each property in the entity type’s definition. So when you create and edit entity types, you are in effect defining .NET types that you can use from your C# code.
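To give a feel for this, here is a heavily simplified, hypothetical sketch of the kind of class that might be produced for the Customer entity—the real generated code includes mapping attributes, change-notification calls in the property setters, partial methods, and relationship properties, all omitted here:

```csharp
using System;

// Simplified illustration only: real generated entities derive from
// EntityObject and notify the Entity Framework when properties change,
// so it can track which objects need writing back to the database.
public class Customer
{
    public int CustomerID { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateTime ModifiedDate { get; set; }
}
```

The point is simply that each entity property surfaces as an ordinary .NET property, which is what makes the LINQ examples later in the chapter possible.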
The generated types derive from `EntityObject`, a base class that enables the object to participate in the Entity Framework. This includes features such as change tracking, so that the framework can know when it needs to write updates to the database. The first version of the EF required entities to derive from this base class or to implement certain EF interfaces, but .NET 4 introduced so-called POCO (Plain Old CLR Object) support, which makes it possible to use an existing class hierarchy with the Entity Framework without having to modify those classes (as long as you can create a successful mapping). There's more work to do that way—if you don't derive from `EntityObject`, you need to write extra supporting code to provide the EF with enough information to know how it should handle change tracking, identity, and relationships for your objects. Here we're sticking with the simpler approach of letting the wizard generate classes that derive from the EF's base type.
Visual Studio also generates one extra class representing something called the object context. You use this to obtain entity objects representing data already in the database, and it's also where you go to add new data. And as we'll see later, this object provides other services for managing data access operations. This type derives from `ObjectContext`, and sometimes it's just referred to as the context. Example 14-3 uses this generated context type to retrieve rows from the `SalesOrderHeader` table for a particular date.
Example 14-3. Using generated entity types
```csharp
using (var dbContext = new AdventureWorksLT2008Entities())
{
    DateTime orderDate = new DateTime(2004, 6, 1);
    var orders = from order in dbContext.SalesOrderHeaders
                 where order.OrderDate == orderDate
                 select order;

    foreach (SalesOrderHeader order in orders)
    {
        Console.WriteLine(order.TotalDue);
    }
}
```
Notice that this example wraps the context in a `using` statement—the object context is a disposable resource because it does a lot of work behind the scenes, and it needs to tidy up once you no longer need the state it builds up. So it's important that you dispose it when you're done with it.
The object context’s type here is AdventureWorksLT2008Entities
. By default,
Visual Studio will just append the word Entities to
your database connection name. You can change this by selecting the
EntityContainer
item in the Model
Browser—you can see this in the middle of Figure 14-3—and then use the Properties panel
to choose its name. But we’ll keep the default name in the
examples.
Notice that the LINQ query in Example 14-3 uses the context’s SalesOrderHeaders
property as the query
source. That’s not quite the same as the table name—the wizard has added
an s. By default, the Entity Framework wizard will
attempt to pluralize and depluralize words as appropriate—in general, it
gives entity types singular names while properties that return
collections of entities are plural. (The names in our conceptual model
can differ slightly from our storage model thanks to the Entity Data
Model’s mapping.) If you don’t like this plural handling, there’s a
checkbox to turn it off when you import tables with the wizard.
Example 14-3 also uses the
SalesOrderHeader
class generated for
the entity type of the same name. The order
range variable in the LINQ query is of
this type, as is the order
iteration
variable in the loop.
It’s this generated entity class that enables us to refer to
database columns using normal C# syntax. The LINQ query’s where
clause uses that entity class’s OrderDate
property to build a query that uses
the OrderDate
column of the
corresponding database table. Likewise, the loop uses normal C# property
syntax to retrieve TotalDue
, which
represents the column of the same name in the database.
If this seems rather uneventful, well, that’s the idea. Compare this to the much more fiddly code in Example 14-1—by mapping database columns to properties, the Entity Framework reduces the amount of friction involved in writing data access code.
You can find the generated source code for the entities in the
Solution Explorer by expanding the .edmx file—you’ll find a file with a
similar name, but with .Designer.cs in place of .edmx (so AdventureWorks.Designer.cs in this case).
As with all generated code you should avoid modifying
it—Visual Studio tends to regenerate code from scratch each time any
setting changes. But if you’d like to add features to these generated
classes, that’s easily done—all the classes are generated with the
partial
keyword, meaning that you can
put additional members in separate source files. You can add another
class definition with the same name as an entity type, marked with the
partial
keyword, to any source
file. The C# compiler will effectively merge your partial class with
the generated partial class.
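For instance, adding a computed property to the generated Customer entity looks like the following. (The first partial class here stands in for the generated half, so that the sketch is self-contained; in a real project Visual Studio provides that half, and the property names are those of the Adventure Works Customer table.)

```csharp
using System;

// Stand-in for the generated half of the entity class.
public partial class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Our half, in a separate source file. The compiler merges the two
// partial class definitions into a single Customer class.
public partial class Customer
{
    public string FullName
    {
        get { return FirstName + " " + LastName; }
    }
}
```

Because the addition lives in its own file, regenerating the model never touches it.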
Let’s change things a bit, so the mapping has something to
do. Most of the column names in this example database happen to fit the
usual .NET conventions for property names, but there’s an exception: the
rowguid
column in SalesOrderHeader
is not capitalized in the
usual way. (This column exists to support SQL Server replication, so
it’s fairly unusual to want to use it from code, but in this example
it’s the only column with a funny-looking name.) If you change this name
to RowGuid
in the designer (either by
double-clicking on the property or by using the Properties panel) Visual
Studio will update the mapping, and the Mapping Details panel will show
that the rowguid
column in the table
is mapped to the RowGuid
property of
the entity. (If you’d prefer a less subtle change, you could rename the
Customer
entity’s ModifiedDate
to, say, LastChanged
. The mapping lets you use any
names you like.)
Changing the names of a few columns isn’t very exciting. (And with this particular example database it’s not even very useful, although if you’re dealing with more idiosyncratic naming schemes, renaming becomes more important.) So let’s look at one of the more interesting mapping features: the way in which the EF handles relationships between entities.
Databases usually have relationships between tables. In
the Adventure Works example, the Customer
table has a foreign key relationship
with the SalesOrderHeader
table. Both
tables have a CustomerID
column. This
is the primary key of the Customer
table, and the database schema includes a constraint enforcing that the
CustomerID
column in the SalesOrderHeader
can contain only a value for
which a corresponding Customer
row
exists. Or to put it more simply, each SalesOrderHeader
row must be related to a
specific Customer
row.
Incidentally, this has nothing to do with the relations in a relational database. The relational name comes from the set theory underpinning databases, and a relation is not the same thing as a relationship here—a relation effectively corresponds to a table.
Visual Studio’s EDM wizard looks for foreign key constraints in the database schema to discover the relationships between tables. In the EDM, it turns these into associations. (The distinction between a relationship and an association is somewhat subtle. An association is a named item in the Entity Data Model representing a particular relationship. The main reason this distinction exists is that relationships are a slightly more general concept—associations are capable of modeling only certain kinds of relationships.) Just as tables added with the wizard end up appearing in all three parts of the EDM—a table will appear in the store schema, a corresponding entity will be added to the conceptual schema, and there will be a mapping between the two—a similar process occurs for associations. If a foreign key constraint indicates that there’s a relationship between two database tables added through the wizard, Visual Studio will add an association to the EDM’s store schema and also to the conceptual schema, and it will set up a suitable mapping between these two associations. And on top of this, it will add navigation properties to the related entities in the conceptual model.
In previous versions of the Entity Framework, foreign key
columns represented with navigation properties would not get scalar
properties providing direct access to the key values—there would have
been no CustomerID
property on the SalesOrderHeader
type, for example. This
proved awkward in practice, so starting with .NET 4 relationships are
represented both as the underlying foreign key value and also as a
navigation property.
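In entity class terms, that means the same relationship shows up twice. The following sketch shows the general shape (illustrative types, not the exact generated code):

```csharp
// How a .NET 4 entity can expose one relationship two ways.
// (Simplified, hand-written shapes for illustration.)
public class Customer
{
    public int CustomerID { get; set; }
}

public class SalesOrderHeader
{
    // The raw foreign key value is available directly...
    public int CustomerID { get; set; }

    // ...and the same relationship is also exposed as a
    // navigation property referring to the related entity.
    public Customer Customer { get; set; }
}
```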
Associations represent relationships between entities,
and the most natural way to present them in the world of objects is
through properties. For example, you’d expect an object representing a
sales order to provide a property that refers to the related customer.
That’s exactly what the EF does, and it also works in reverse: the
customer object provides a property that holds a collection of
references to all the customer’s orders. Example 14-4 shows a LINQ query that gets
the number of SalesOrderHeader
rows
associated with each customer. It fetches one property from the
Customer
entity class that maps to
a column in the database (CustomerID
) and also uses a property called
SalesOrderHeader
, which represents
the customer’s orders.
Example 14-4. Using a navigation property
using (var dbContext = new AdventureWorksLT2008Entities())
{
    var customerOrderCounts = from cust in dbContext.Customers
                              select new
                              {
                                  cust.CustomerID,
                                  OrderCount = cust.SalesOrderHeaders.Count
                              };

    foreach (var customerInfo in customerOrderCounts)
    {
        Console.WriteLine("Customer {0} has {1} orders",
                          customerInfo.CustomerID, customerInfo.OrderCount);
    }
}
The database table that the Customer
entity class represents does not
have a column called SalesOrderHeader
. The Entity Framework
wizard added this property to represent the relationship between the
Customer
and SalesOrderHeader
tables. This is not an
ordinary property—in Figure 14-2 you can see that it
appears separately, under Navigation Properties.
From C# code, a navigation property looks like a collection.
Example 14-4 just retrieves the
Count
property, but we could do
more advanced things. The query in Example 14-5 has a nested
query for each customer that looks for all shipped orders (those with
a Status
of 5
), and for each one it reads the total due
for that order and a count of all the SalesOrderDetails
rows associated with that
order. So this uses two navigation properties—the one representing the
relationship between customers and orders, and the one representing
the relationship between orders and order details.
Example 14-5. Traversing multiple relationships with navigation properties
var info = from cust in dbContext.Customers
           select new
           {
               cust.CustomerID,
               Orders = from order in cust.SalesOrderHeaders
                        where order.Status == 5
                        select new
                        {
                            order.TotalDue,
                            ItemCount = order.SalesOrderDetails.Count
                        }
           };
There’s a reason we’ve used LINQ in these last two examples—it
happens to avoid an issue with navigation properties. How does the EF
decide how many entities to load for us, and when? In Example 14-4, the LINQ query just
retrieves two pieces of information for each customer—the CustomerID
and the order count—and while
Example 14-5 is more
complex, it’s still circumscribed, so the EF can inspect the query to
work out exactly what it needs to retrieve. But when we’re not using
LINQ, how does the EF know what to do? For instance, consider the code
in Example 14-6.
Example 14-6. Following an association after the initial query
Customer myCustomer = dbContext.Customers.Single(
    cust => cust.CustomerID == 29531);

Console.WriteLine(myCustomer.SalesOrderHeaders.Count);
This fetches the entity for a specific customer, and then tries
to get the number of SalesOrderHeader
entities to which this item
is related. Prior to .NET 4, this did not work—it would print out
0
, even though the example database
has one related order for this customer. In .NET 3.5 SP1, the Entity
Framework would initialize navigation properties such as the Customer
object’s SalesOrderHeaders
property with an empty
collection, and would load the related objects only if we ask it to,
using the Load
method shown in
Example 14-7.
Example 14-7. Explicitly loading entities for an association
Customer myCustomer = dbContext.Customers.Single(
cust => cust.CustomerID == 29531);
myCustomer.SalesOrderHeaders.Load();
Console.WriteLine(myCustomer.SalesOrderHeaders.Count);
.NET 4 adds an alternative way to do this, called lazy loading. Rather than having to
call Load
explicitly, the EF can
automatically load related objects at the point at which you access
them. The context has a property to control this:
dbContext.ContextOptions.LazyLoadingEnabled = false;
This is true
by default;
setting it to false
reverts to the
pre-.NET 4 behavior. With this option switched on, Example 14-6 is equivalent to
Example 14-7, because
the EF will call Load
for us when
we first try to use the navigation property. (The collection ignores
calls to Load
if the entities are
already loaded, so requesting multiple loads is not a problem.)
In either case, the EF has to make an extra trip to the
database. The call to Single
will
fetch the customer from the database before returning, which means
that a second request is required when we later ask it (either
explicitly or implicitly) to fetch the related rows, because the EF
didn’t know we were going to use these items until we asked for them.
This might not be a problem, but in general, the more trips you make
to the database, the slower things go.
Be wary of enabling lazy loading, because it can sometimes result in a lot of unnecessary database requests. For example, one author was involved with a project that had some diagnostic code that “helpfully” wrote a snapshot of certain objects into a log, including the value of all their properties. Unfortunately, this code was recursive—if a property referred to another object, it would display that too, and if a property referred to a collection of objects, it would show all of them. This logging code had cycle detection, so it wouldn’t get stuck in an infinite loop, but otherwise it wouldn’t stop until it had shown every object reachable from the starting point. Unfortunately, lazy loading was enabled, so when this code was given an entity, it ended up fetching all entities that were related, no matter how distantly, to the first object at hand, and it hammered the database with thousands of requests each time a log entry was generated.
Modern databases are surprisingly fast—it’s possible for this sort of problem to go unnoticed on development machines with their own local database instance. But you probably don’t want it happening on a busy live server.
To get consistent results you’d want to make sure the initial
query and subsequent lazy loads happen as part of a transaction (as
shown later), but to ensure scalability in a busy system you want to
minimize the number of requests made in any single transaction. So you
can tell the EF that you want certain related entities to be fetched
at the same time as the main result. You do this with the Include
method shown
in Example 14-8, which is
available on any of the entity sets provided by the context.
Example 14-8. Specifying relationships to preload
var customersWithOrderDetails = dbContext.Customers.Include(
    "SalesOrderHeaders.SalesOrderDetails");
Customer myCustomer = customersWithOrderDetails.Single(
    cust => cust.CustomerID == 29531);

Console.WriteLine(myCustomer.SalesOrderHeaders.Count);
This call to Include
asks to
load related entities available through the Customer
entity’s SalesOrderHeaders
property. (These will be
loaded regardless of the lazy loading setting.) It also says that for
each of those related entities, the EF should load any related
entities visible through the SalesOrderDetails
property. In other words,
this tells the EF that we would like it to fetch all of the orders for
this customer and all of the details for those orders. It will
generate a single query that fetches all of the necessary information
in one request.
If you’re wondering why it doesn’t just prefetch all related items all of the time, consider the performance implications. In some circumstances, aggressively prefetching all related items might amount to attempting to copy a significant fraction of your database into memory! But even in more circumscribed cases, fetching more data than you need can slow your system down or reduce scalability.
So far we have seen only so-called one-to-many relationships—one customer can be related to many orders, one order can be related to many order details. But there are other kinds of relationships.
The multiplicity of a relationship refers to the number of participants at either end of the association. In the Entity Framework, an association’s multiplicity determines the nature of the navigation properties that represent the relationship.
In the Entity Framework, there are always two ends to an association, regardless of the multiplicity. For example, we have customers at one end of a relationship and orders at the other end. The multiplicity describes how many items may be at a particular end, not how many ends there are.
You will sometimes want to represent more complex relationships—for example, a so-called ternary relationship involves three kinds of parties. This is a different concept from multiplicity and is called degree. For example, consider a teaching arrangement in a college, where a student is taught a subject by a teacher; this relationship involves three entities (student, subject, and teacher). These higher-degree relationships are typically modeled in the database by having a table just for the relationship itself. Likewise, the EDM does not directly support relationships with a degree of more than two, so you would represent such a relationship with a distinct entity type in the conceptual model, adding associations between that entity and all the participants in the relationship.
For each end of a relationship, you can specify a multiplicity
of either 1
, 0..1
, or *
. The first, 1
, means what it says—there is always one
item at that end of the association. The last, *
, means any number—there can be zero, one,
or several items at that end. A multiplicity of 0..1
means zero or one—this indicates that
the association is optional, but where present, there is just one
entity at this end.
In a one-to-many relationship, the two
ends have a multiplicity of 1
and
*
, respectively. You can see this
in Figure 14-2—the lines
between entities represent associations, and the multiplicity appears
at each end of the line. So an item at the first end can be related to
any number of items at the second end; an item at the second end is
always related to exactly one item at the first. In C#, the entity at
the 1
end would have a navigation
property that offers a collection, in order to provide access to the
many end. The entity at the *
end would provide a simpler noncollection
property to get back to the one entity it is related to.
A variation on this theme has 0..1
instead of 1
at the first end, and *
at the second end as before. This is
similar to a one-to-many relationship, except items at the
many end don’t necessarily have to be related to
an item at the other end. For example, you might want to represent the
relationship between managers and their reports. But if you go far
enough up the corporate hierarchy, you will find someone who has no
manager—the navigation property would return null. So a simple
one-to-many relationship doesn’t work here—you would need 0..1
instead of 1
at the manager end of
the association.
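In C#, a 0..1 end surfaces as a navigation property that can be null, so code walking such a relationship has to check for the absent case. The following sketch uses hypothetical Employee types (not part of the Adventure Works model) to illustrate:

```csharp
// Hypothetical self-referential relationship: 0..1 at the manager
// end, * at the reports end.
public class Employee
{
    public string Name { get; set; }

    // Null for whoever sits at the top of the corporate hierarchy.
    public Employee Manager { get; set; }
}

public static class HierarchyHelper
{
    // Counts how many management levels sit above an employee by
    // following the 0..1 navigation property until it is null.
    public static int CountLevelsAbove(Employee e)
    {
        int levels = 0;
        while (e.Manager != null)
        {
            levels++;
            e = e.Manager;
        }
        return levels;
    }
}
```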
Sometimes one-to-one relationships crop up—each item at one end is always related to exactly one item at the other end. This is an unusual kind of relationship because it implies that entities are inextricably and exclusively linked. Relationships that sound like they might be one-to-one are often not. Here’s an illustration from popular culture, describing a relationship between a master and an apprentice expressed as: “Always two, there are. No more, no less. A master, and an apprentice.”[35] A master always has an apprentice, an apprentice always has a master, so isn’t that a one-to-one relationship? In fact, this might need to be a one-to-many relationship because on the death of an apprentice, the master takes a new apprentice. (The apprentice has just one master, as the only career paths are promotion to master or untimely death. So we can at least be sure that this is not a many-to-many relationship.) The constraint expressed here is merely that the master has a one-at-a-time approach to relationships, much like serial monogamy. (For example, both Darth Maul and Darth Vader were apprentices of Darth Sidious.) So if the database needs to reflect the full history rather than just the current state, a one-to-one relationship won’t be sufficient. (Although if you only need the database to store the current state, one-to-one might be fine here.) In databases, one-to-one relationships often exist because information about a single entity has been split across multiple tables, perhaps for performance reasons. (The EF lets you map this back to a single entity in the conceptual model, so such relationships are likely to be more common in the store schema than the conceptual schema.)
Variations on one-to-one where one or the other end is optional
can be useful.[36] For example, you might have an entity representing a
customer and an entity representing an account. An organization (such
as a butcher shop) might choose to have a policy where customers are
not required to have accounts, but where accounts are held any single
customer can have only one account, and accounts must be held by
exactly one customer. (That’s not the only imaginable policy, of
course.) The relationship between a customer entity and an account
entity would have a multiplicity of 1
at the customer end and 0..1
at the account end.
Finally, there are many-to-many relationships. For
example, you might have an entity type to represent a standard part
such as an M3 bolt, and an entity to represent a part manufacturer.
Many manufacturers are capable of producing M3 bolts, and most
manufacturers produce more than one kind of product. To model the
relationship of who produces what in the EDM, you could use an
association with a multiplicity of *
for both ends of the association. And in
code, both entities would have navigation properties offering
collections of objects.
However, there’s an issue with many-to-many relationships in the EF. In the database, such a relationship is represented as a separate table, where each row contains two foreign keys, one for each end of the relationship. If that’s all the table contains, the EF will happily let you map this table to an association in the conceptual model, and the navigation properties will work as described. However, if the table contains other information, you will end up needing to represent it as an entity in its own right. For example, given the product/manufacturer example earlier, it might turn out to be useful to know what product code a particular supplier uses for a particular standard product. There’s no place for this information to go if you just have navigation properties on the product and manufacturer that point to one another—you would need an extra entity type to hold this property that is specific to a particular product/manufacturer combination.
This can get slightly awkward when there are columns in the relationship table that your application doesn’t particularly care about, but which the EF insists are mapped because they are nonnullable and don’t have default values. Your conceptual model would not be able to represent this table as a simple many-to-many association, because that would leave nowhere to map the relationship property. (The underlying issue here is the same one that prevents you from omitting certain database columns from your entities.)
Finally, we’ll look at one more feature of the Entity Framework’s mapping capabilities: support for inheritance.
Inheritance presents a challenge for an ORM, because the typical object-oriented notions of inheritance don’t have any direct parallel in the relational model. Various solutions exist because there isn’t one really good way to do this. The Entity Framework supports mappings for a couple of the popular approaches for attempting to bridge this chasm.
While there are several approaches to mapping (which we’ll get to shortly), the conceptual model’s handling of inheritance works the same way in all cases, and is very similar to inheritance in C#. Any entity type can optionally specify one other entity type as its base type. Entities with a base type inherit all the properties from that base. An entity cannot specify more than one base type, but you are allowed to derive from an entity that derives from another entity (i.e., you can have an inheritance chain). And the corresponding generated entity classes that you use from C# will represent these inheritance relationships with normal class inheritance.
You will need to define mappings for your base entity type in the usual way. All the derived types will inherit these mappings. The question is: how do we map features unique to individual derived types?
The first mapping approach involves mapping all entity types
sharing a particular base entity type to a single table in the database.
The entity type the EF chooses to represent a particular row is chosen
based on a discriminator column—in the mapping you
simply provide a list that says, for example, if the discriminator
column contains 1, the entity type is Employee
, and if it’s 2, the type is Manager
, while if it’s 3, the type is Director
, and so on. These derived types will
presumably have additional properties distinguishing them from one
another, and these would map to nullable columns in the table. They will
need to be nullable, because these columns will have values only when
you’re using the derived types that support them—non-nullable database
columns need to be mapped to properties in the base entity type if
you’re using this mapping style.
The second mapping approach uses a separate table for each derived type. Derived types still inherit the base mappings, so in this scenario, derived entity types will be involved with two or more tables: the table unique to the derived type, along with any tables used by the base type. This approach requires all the tables involved to use the same primary key.
None of these mapping features would be much use without some way to retrieve data from the database, so we’ll now look at how to execute queries in the Entity Framework.
We’ve seen some simple LINQ-based examples for retrieving data from the database with the Entity Framework. Under the covers, the EF turns a LINQ query into a SQL query that the database understands. In fact, there are two ways of getting the EF to query the database for data: LINQ and something called Entity SQL. We’ve seen some simple LINQ to Entities examples already, but we’ll now look at it in more detail.
The LINQ provider for the Entity Framework, LINQ to Entities, supports all of the standard LINQ operators we saw in Chapter 8, but it works a little differently. The idea of deferred execution is still present, and it’s even more important. The point at which you cause the LINQ query to execute—the instant at which your code first starts trying to use the results—is the point at which the EF will need to send a request to the database. So looking at the code from Example 14-3, the statement shown in Example 14-9 does not get anything from the database.
Example 14-9. Simple LINQ to Entities query expression
var orders = from order in dbContext.SalesOrderHeaders
             where order.OrderDate == orderDate
             select order;
As always with LINQ, a query expression only
defines a query—orders
is an object that knows what it’s
supposed to return if anything happens to enumerate it. So it’s the
foreach
loop in Example 14-3 that kicks off the actual
request.
The way the EF processes the request is different from how LINQ to
Objects works. LINQ to Objects works by forming a chain of operators
that work sequentially—the source collection might pass through the
Where
operator, followed by, say, an
OrderBy
or a Group
operator. The Where
operator in LINQ to Objects works by
walking through every single item in the source, discarding the ones
that don’t meet the filter criteria, and the ones that do meet the
criteria get passed on to the next item in the chain.
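That item-at-a-time style is easy to sketch. The essence of the LINQ to Objects Where operator is an iterator along these lines (a simplified version, without the argument validation and extension-method packaging the real operator has):

```csharp
using System;
using System.Collections.Generic;

public static class SimplifiedLinq
{
    // A simplified LINQ to Objects-style Where: it walks every item
    // in the source, passing on only those for which the predicate
    // returns true. Thanks to yield return, execution is deferred
    // until something enumerates the result.
    public static IEnumerable<T> Where<T>(
        IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }
}
```

Every single source item flows through the predicate, which is exactly what we do not want to happen with a million-row database table.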
We really don’t want data access code to work that way, and as mentioned earlier, the EF lets the database do the filtering, which is far more efficient than fetching an entire table and then filtering the items in code. We’ll now verify that it really works this way by using the SQL Profiler tool to examine what the EF does for us.
SQL Profiler is not part of SQL Server 2008 Express, not even if you install the version with advanced services and Management Studio. You will need a full edition of SQL Server. (The Developer edition will do.) SQL Profiler works just fine with the Express version of the database, but it’s distributed and licensed only as part of the fuller editions. As long as you have a suitable license, you can install just the tools from a full edition SQL Server onto a machine that has only the Express version of the database, and it will work just fine. (Unfortunately, if you already installed the Express version of Management Studio, you can’t install the full management tools on the same machine.)
A full description of the SQL Profiler is beyond the scope of this book—we’re using it to show you exactly what the Entity Framework asked the database to do. However, it’s a profoundly useful tool, even if you use it only for the simple task of discovering what SQL queries are being executed. If you plan to do much work with databases, it’s well worth learning how to use it.
By single-stepping through the code in Visual Studio while running
the SQL Profiler, we can see that nothing appears in the profiler until
we start to execute the foreach
loop,
at which point the profiler shows an Audit
Login
message, indicating that our program has opened a
connection to the database. This is followed by a RPC:Completed
message,
indicating that SQL Server processed a request. When we select this
message, the profiler shows the SQL that the EF just ran for us:
exec sp_executesql N'SELECT
[Extent1].[SalesOrderID] AS [SalesOrderID],
[Extent1].[RevisionNumber] AS [RevisionNumber],
[Extent1].[OrderDate] AS [OrderDate],
[Extent1].[DueDate] AS [DueDate],
[Extent1].[ShipDate] AS [ShipDate],
[Extent1].[Status] AS [Status],
[Extent1].[OnlineOrderFlag] AS [OnlineOrderFlag],
[Extent1].[SalesOrderNumber] AS [SalesOrderNumber],
[Extent1].[PurchaseOrderNumber] AS [PurchaseOrderNumber],
[Extent1].[AccountNumber] AS [AccountNumber],
[Extent1].[CustomerID] AS [CustomerID],
[Extent1].[ShipToAddressID] AS [ShipToAddressID],
[Extent1].[BillToAddressID] AS [BillToAddressID],
[Extent1].[ShipMethod] AS [ShipMethod],
[Extent1].[CreditCardApprovalCode] AS [CreditCardApprovalCode],
[Extent1].[SubTotal] AS [SubTotal],
[Extent1].[TaxAmt] AS [TaxAmt],
[Extent1].[Freight] AS [Freight],
[Extent1].[TotalDue] AS [TotalDue],
[Extent1].[Comment] AS [Comment],
[Extent1].[rowguid] AS [rowguid],
[Extent1].[ModifiedDate] AS [ModifiedDate]
FROM [SalesLT].[SalesOrderHeader] AS [Extent1]
WHERE [Extent1].[OrderDate] = @p__linq__0',
N'@p__linq__0 datetime',@p__linq__0='2004-06-01 00:00:00'
It might be quite long, but structurally that’s a pretty simple
SELECT
statement. The only reason
it’s so large is that it explicitly requests every column required by
the entity (and it has specified each column in a fairly verbose
manner). The interesting part is in the last two lines. The penultimate
line is a parameterized WHERE
clause
comparing the OrderDate
to a named
argument. This is what became of our LINQ query’s where
clause. And the final line provides a
value for that named argument.
Note that you’re free to chain operators together in LINQ to
Entities just as you can in LINQ to Objects. For example, we could build
on the orders
query from Example 14-3:
var orderedOrders = orders.OrderBy(order => order.OrderDate);
Or, if you’d prefer to carry on using LINQ syntax, Example 14-10 is equivalent.
Example 14-10. Chained query
var orderedOrders = from order in orders
                    orderby order.OrderDate
                    select order;
This doesn’t execute the orders
query. It just means we have two queries now—the orders
query that just filters, and then the
orderedOrders
query that filters and
then sorts. You could think of this chained query as shorthand for Example 14-11, which explicitly
combines the clauses from Example 14-9 and Example 14-10 into one query.
Example 14-11. That same query, written out in full
var orderedOrders = from order in dbContext.SalesOrderHeaders
                    where order.OrderDate == orderDate
                    orderby order.OrderDate
                    select order;
Regardless of the various equivalent ways we would build the
second query, the result of executing it (e.g., by iterating over it in
a foreach
loop) is, as you’d expect,
a SQL query that includes an ORDER BY
clause as well as a WHERE
clause.
(And as it happens, that’s not hugely useful because in this example
database, all the orders have the exact same date. With slightly more
realistic data, this would have the expected effect, though.)
So LINQ to Entities queries work in a fundamentally different way
from the LINQ to Objects queries we saw previously. In LINQ to Objects,
the expression in a where
clause is
simply a delegate in disguise—it’s a method that the Where
operator calls for each item in turn to
work out whether to include that in the results. But with LINQ to
Entities (and LINQ to SQL for that matter) the LINQ query’s where
clause has been translated into T-SQL
and sent to the database—the expression we wrote in C# ends up running
in a different language, probably on a different machine. If you want to
understand how these kinds of queries are able to work so differently
for different providers, see the sidebar.
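The heart of that mechanism is that the C# compiler can turn the very same lambda either into a delegate (executable code) or into an expression tree (a data structure describing the code, which a provider such as the EF can inspect and translate). A minimal sketch, not tied to any database:

```csharp
using System;
using System.Linq.Expressions;

class DelegateVersusExpressionDemo
{
    static void Main()
    {
        // Assigned to a Func<...>, a lambda compiles to a delegate:
        // opaque, executable code.
        Func<int, bool> asDelegate = x => x > 10;

        // Assigned to an Expression<Func<...>>, the same lambda compiles to
        // a tree of objects describing the code, which a query provider can
        // walk and translate into, say, a SQL WHERE clause.
        Expression<Func<int, bool>> asTree = x => x > 10;

        Console.WriteLine(asDelegate(42));        // True - it just runs
        Console.WriteLine(asTree.Body);           // (x > 10) - structure visible
        Console.WriteLine(asTree.Compile()(42));  // True - still executable too
    }
}
```

LINQ to Objects works with delegates (`IEnumerable<T>`), while LINQ to Entities works with expression trees (`IQueryable<T>`); that is what lets the same query syntax behave so differently.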
This translation is obviously very useful for shifting the work to the database, but it brings some limitations. If you try to add arbitrary method calls into the middle of a LINQ query, it’ll fail in LINQ to Entities. For example, suppose we have the following helper:
static DateTime NextDay(DateTime dt) { return dt + TimeSpan.FromDays(1); }
We could try to use this in a LINQ query:
var orders = from order in dbContext.SalesOrderHeaders
where order.OrderDate == NextDay(orderDate)
select order;
With LINQ to Objects this would work just fine—it’s all just C#
code, and you can use any valid Boolean expression in a where
clause, including expressions that
invoke methods. But with LINQ to Entities, although this will compile,
the EF will throw a NotSupportedException
at the point at which
you try to execute the query. Its error message will read:
LINQ to Entities does not recognize the method 'System.DateTime NextDay(System.DateTime)' method, and this method cannot be translated into a store expression.
LINQ to Entities queries are limited to the things that the EF
knows how to turn into database queries, and since it doesn’t know
anything about this NextDay
method
you’ve written, it can’t work out how to do that. Of course, when you
bear in mind that a LINQ to Entities query executes on the database,
it’s hardly surprising that you can’t invoke arbitrary methods in your
application from the middle of a query. But the EF integrates some
database features into your code so seamlessly that it’s sometimes easy
to forget where the boundary between your application and the database
lies.
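The usual workaround is to evaluate such methods locally, before building the query, so that the expression tree the EF sees contains only a plain value. Here’s a sketch of that pattern; the commented-out query assumes the `dbContext` and `orderDate` from the earlier examples:

```csharp
using System;

class HoistedMethodCallDemo
{
    static DateTime NextDay(DateTime dt) { return dt + TimeSpan.FromDays(1); }

    static void Main()
    {
        DateTime orderDate = new DateTime(2004, 6, 1);

        // Call the method up front, outside the query...
        DateTime nextDay = NextDay(orderDate);

        // ...so the where clause captures only a DateTime value, which the
        // EF knows how to translate into T-SQL:
        //
        // var orders = from order in dbContext.SalesOrderHeaders
        //              where order.OrderDate == nextDay
        //              select order;

        Console.WriteLine(nextDay);   // 2004-06-02
    }
}
```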
While LINQ to Entities is a very convenient way to build queries, it’s just a layer on top of the Entity Framework’s underlying query system, which has its own query language.
The Entity Framework defines a query language for making queries against the conceptual model—rather than running queries against a database, as you do in normal SQL dialects, you can run queries that work directly against the entities in your model, as the name Entity SQL (or ESQL) suggests.
ESQL can be used for queries against the EDM storage model too. So it can also function as a kind of vendor-neutral SQL. But here we’re focusing on queries that target the conceptual model.
Why do we need a second way of making queries when we already have LINQ to Entities? Well, from a historical perspective that question has things back to front: during the Entity Framework’s development, ESQL was around long before LINQ to Entities. But since LINQ to Entities made it into the first version of the EF, it’s still reasonable to ask why we have both, what ESQL is for, and when it might look like a better choice than LINQ.
ESQL’s main benefit is that it’s sometimes useful to be able to represent a query as text. In fact, the Entity Data Model itself exploits this—there are some advanced scenarios in which ESQL queries can be embedded in the .edmx file. LINQ wouldn’t be an option here because .edmx is just XML; to use LINQ requires a language that supports LINQ.[37] If you wanted to store custom queries in a configuration file, you really wouldn’t want to have to run the C# compiler at runtime to interpret the queries. And with ESQL you don’t need to—you can represent a query as a string and the EF can execute that for you at runtime.
Another feature of a string-based query language is that it’s
relatively easy to compose queries at runtime. With a LINQ query
expression, the structure is fixed at compile time and you only really
get to tweak individual arguments, much like a fixed SQL query with a
few named arguments. (Technically, it is actually possible to build LINQ
queries dynamically. After all, LINQ operators are chained together with
simple function calls. However, dynamic composition of Expression<T>
trees turns out to be
surprisingly difficult. It’s not a scenario C# attempts to help you
with—you end up having to construct the expression trees without the
compiler’s assistance. This is not for the fainthearted.)
The practice of stitching together strings to form dynamic queries is messy and can be fraught with security issues such as injection attacks. It’s occasionally useful if you understand the risks and can mitigate them, but you need to exercise extreme caution.
Of course, string-based queries have a massive downside compared to LINQ: the C# compiler cannot offer any help as it doesn’t understand ESQL. If your ESQL strings are badly formed, you only get to find that out at runtime. And even if your ESQL is syntactically correct, C# does not understand the relationship between it and your code—whereas with a LINQ to Entities query C# can detect things such as type mismatches, it won’t spot when your ESQL gets out of sync with the code that uses the results.
Besides the inherent benefits and disadvantages of a string-based query, there’s also the fact that ESQL is, in effect, the native query language for the EF. This means there are a few EF features that can be accessed only through ESQL, although they’re all somewhat arcane. For example, an ESQL query can navigate associations between entities even if you’ve neglected to define navigation properties to represent those associations.
Example 14-12 shows a simple example that illustrates the basic use of ESQL.
Example 14-12. Querying with ESQL
using (var dbContext = new AdventureWorksLT2008Entities())
{
    DateTime orderDate = new DateTime(2004, 6, 1);
    var query = dbContext.CreateQuery<SalesOrderHeader>(
        "SELECT VALUE o " +
        "FROM AdventureWorksLT2008Entities.SalesOrderHeaders AS o " +
        "WHERE o.OrderDate = @orderDate",
        new ObjectParameter("orderDate", orderDate));
    foreach (var order in query)
    {
        Console.WriteLine(order.TotalDue);
    }
}
This has the same effect as Example 14-3, but using ESQL in place of a
LINQ query. While this looks similar to a typical SQL query, the
VALUE
keyword is
specific to ESQL. We use this to indicate that we don’t want the usual
column-like behavior of SQL. You can write a more traditional-looking
query in ESQL, such as:
SELECT o.TotalDue, o.OrderDate
FROM AdventureWorksLT2008Entities.SalesOrderHeaders AS o
WHERE o.OrderDate = @orderDate
This asks for specific columns from the entity rather than the
whole entity. This is legal ESQL, but it would fail at runtime in the
context of Example 14-12. That example creates
the query with a call to CreateQuery<SalesOrderHeader>
on the
object context. The generic type argument to CreateQuery
—SalesOrderHeader
here—indicates the type of
result we’re expecting from the query, but this modified query clearly
returns something other than a SalesOrderHeader
. It returns a couple of
columns from each matching entity. When you build a query like this, you
get back objects that implement IDataRecord
—a general-purpose
interface used across all of ADO.NET to represent a record (such as a
table row) whose columns might not be known until runtime. (This is one
of the interfaces listed in Table 14-1.) So you’d need to
use CreateQuery<IDataRecord>
to
create such a query, and a suitably modified loop to extract the
results:
var query = dbContext.CreateQuery<IDataRecord>(
    "SELECT o.TotalDue, o.OrderDate " +
    "FROM AdventureWorksLT2008Entities.SalesOrderHeaders AS o " +
    "WHERE o.OrderDate = @orderDate",
    new ObjectParameter("orderDate", orderDate));
foreach (var order in query)
{
    Console.WriteLine(order["TotalDue"]);
}
Even if you ask for the whole entity as a single column in the
SELECT
clause, for example:
SELECT o
FROM AdventureWorksLT2008Entities.SalesOrderHeaders AS o
WHERE o.OrderDate = @orderDate
the query will still return IDataRecord
objects, not entities. Each data
record returned by this query would have a single column called o
that contains a SalesOrderHeader
entity. To get to the entity
you’d need to unwrap it inside your loop:
foreach (var row in query)
{
    SalesOrderHeader o = (SalesOrderHeader) row["o"];
    Console.WriteLine(o.TotalDue);
}
The VALUE
keyword is just a
shortcut that tells ESQL to omit the IDataRecord
wrapper, and to return a sequence
of unwrapped entities. This enables Example 14-12 to assume that it will get SalesOrderHeader
entities back from the
query.
LINQ to Entities and ESQL are not mutually exclusive. You are free to use an ESQL query as the source for a LINQ query. Here’s a contrived example:
var orders = dbContext.CreateQuery<SalesOrderHeader>(
    "SELECT VALUE o " +
    "FROM AdventureWorksLT2008Entities.SalesOrderHeaders AS o " +
    "WHERE o.OrderDate = @orderDate",
    new ObjectParameter("orderDate", orderDate));
var orderedOrders = from order in orders
                    orderby order.DueDate
                    select order;
This might be useful if you wanted to store ESQL queries in some sort of configuration mechanism to allow the exact query to be changed, but to do further processing of the results of that query with LINQ.
Yet another feature enabled by ESQL is that it lets code
built around the v1 ADO.NET mechanisms shown in Example 14-1 work with the EF. The
System.Data.EntityClient
namespace
defines concrete types that derive from the abstract base classes listed
in Table 14-1: EntityConnection
derives from DbConnection
, EntityCommand
derives from DbCommand
, and so on. As far as code written
to use these abstract base classes is concerned, the Entity Framework
ends up looking like just another database with another funky variety of
SQL. As long as your ESQL selects only column values and not whole
entities, queries will only ever return the same basic data types other
providers would, so the behavior will look much like any other ADO.NET
v1 provider.
As you’ve seen, the object context provides access to
entities. For each entity we define in our EDM, the generated object
context class provides a property that we can use as the source for a LINQ
query. We’ve also used its CreateQuery<T>
method to build ESQL-based
queries. The object context provides some other services.
To execute database queries, it’s necessary to connect to
a database, so the object context needs connection information. This
information typically lives in the App.config file—when you first run the EDM
wizard, it will add a configuration file if your application does not
already have one, and then it adds a connection string. Example 14-13 shows a configuration file
containing a typical Entity Framework connection string. (This has been
split over multiple lines to fit—normally the connectionString
attribute is all on one
line.)
Example 14-13. Connection string in App.config
<configuration>
  <connectionStrings>
    <add name="AdventureWorksLT2008Entities"
         connectionString="metadata=res://*/AdventureWorksModel.csdl|
             res://*/AdventureWorksModel.ssdl|res://*/AdventureWorksModel.msl;
             provider=System.Data.SqlClient;provider connection string=
             &quot;Data Source=.\sqlexpress;Initial Catalog=AdventureWorksLT2008;
             Integrated Security=True;MultipleActiveResultSets=True&quot;"
         providerName="System.Data.EntityClient" />
  </connectionStrings>
</configuration>
This is a rather more complex connection string than the one we
saw back in Example 14-1, because
the Entity Framework needs three things in its connection string:
information on where to find the EDM definition, the type of underlying
database provider to use, and the connection string to pass to that
underlying provider. This last part, an ordinary SQL Server connection
string, is nested inside the outer connection string, enclosed in
&quot; character entities, as you can see in Example 14-13.
The three URIs in the metadata section of the connectionString
—the ones beginning with
res://
—point to the three parts of
the EDM: the conceptual schema, the storage schema, and the mappings.
Visual Studio extracts these from the .edmx file and embeds them as three XML
resource streams in the compiled program. Without these, the EF wouldn’t
know what the conceptual and storage schemas are supposed to look like,
or how to map between them.
It may seem a bit weird for the locations of these EDM resources
to be in a connection string. It might seem more natural for the XML
to use a separate attribute for each one. However, as you’ve seen, the
System.Data.EntityClient
namespace
conforms to the ADO.NET v1 model so that it’s possible for old-style
data access code to perform queries against the EDM. Since the ADO.NET
v1 model includes an assumption that it’s possible to put all the
information defining a particular data source into a single connection
string, the Entity Framework has to follow suit. And since the EF
cannot function without the XML EDM definitions, the connection string
has to say where those live.
After the EDM metadata resources, you can see a provider
property, which in Example 14-13 indicates that the
underlying database connection is to be provided by the SQL Server
client. The EF passes the provider connection
string
on to that provider.
You don’t have to use the App.config to configure the connection. The
object context offers a constructor overload that accepts a connection
string. The configuration file is useful—it’s where the object context’s
no-parameters constructor we’ve been using in the examples gets its
connection information from—but what if you want to let just the
underlying database connection string be configurable, while keeping the
parts of the connection string identifying the EDM resources fixed?
Example 14-14 shows how you
could achieve this. It retrieves the configured values for these two
pieces and uses the EntityConnectionStringBuilder
helper to
combine this with the EDM resource locations, forming a complete EF
connection string.
Example 14-14. Passing an explicit connection string
using System.Configuration;
using System.Data.EntityClient;
...
// Retrieve the connection string for the underlying database provider.
ConnectionStringSettings dbConnectionInfo =
    ConfigurationManager.ConnectionStrings["AdventureWorksSql"];

var csb = new EntityConnectionStringBuilder();
csb.Provider = dbConnectionInfo.ProviderName;
csb.ProviderConnectionString = dbConnectionInfo.ConnectionString;
csb.Metadata = "res://*/AdventureWorksModel.csdl|" +
               "res://*/AdventureWorksModel.ssdl|res://*/AdventureWorksModel.msl";

using (var dbContext =
    new AdventureWorksLT2008Entities(csb.ConnectionString))
{
    ...
}
This code uses the ConfigurationManager
in the System.Configuration
namespace, which provides
a ConnectionStrings
property. (This
is in a part of the .NET Framework class library that’s not referenced
by default in a .NET console application, so we need to add a reference
to the System.Configuration
component
for this to work.) This provides access to any connection strings in
your App.config file; it’s the same
mechanism the EF uses to find its default connection string. Now that
Example 14-14 is providing
the EDM resources in code, our configuration file only needs the SQL
Server part of the connection string, as shown in Example 14-15 (with a long line split across
multiple lines to fit). So when the application is deployed, we have the
flexibility to configure which database gets used, but we have removed
any risk that such a configuration change might accidentally break the
references to the EDM resources.
Example 14-15. SQL Server connection string
<configuration>
  <connectionStrings>
    <add name="AdventureWorksSql"
         providerName="System.Data.SqlClient"
         connectionString="Data Source=.\sqlexpress;
             Initial Catalog=AdventureWorksLT2008;
             Integrated Security=True;MultipleActiveResultSets=True" />
  </connectionStrings>
</configuration>
Besides being able to change the connection information, what else can we do with the connection? We could choose to open the connection manually—we might want to verify that our code can successfully connect to the database. But in practice, we don’t usually do that—the EF will connect automatically when we need to. The main reason for connecting manually would be if you wanted to keep the connection open across multiple requests—if the EF opens a connection for you it will close it again. In any case, we need to be prepared for exceptions anytime we access the database—being able to connect successfully is no guarantee that someone won’t trip over a network cable at some point between us manually opening the connection and attempting to execute a query. So in practice, the connection string is often the only aspect of the connection we need to take control of.
So far, all of our examples have just fetched existing data from the database. Most real applications will also need to be able to add, change, and remove data. So as you’d expect, the Entity Framework supports the full range of so-called CRUD (Create, Read, Update, and Delete) operations. This involves the object context, because it is responsible for tracking changes and coordinating updates.
Updates—modifications to existing records—are pretty
straightforward. Entities’ properties are modifiable, so you can simply
assign new values. However, the EF does not attempt to update the
database immediately. You might want to change multiple properties, in
which case it would be inefficient to make a request to the database for
each property in turn, and that might not even work—integrity
constraints in the database may mean that certain changes need to be
made in concert. So the EF just remembers what changes you have made,
and attempts to apply those changes back to the database only when you
call the object context’s SaveChanges
method.
Example 14-16 does this. In fact, most
of the code here just fetches a specific entity—the most recent order of
a particular customer—and only the last couple of statements modify that
order.
Example 14-16. Modifying an existing entity
using (var dbContext = new AdventureWorksLT2008Entities())
{
    var orderQuery = from customer in dbContext.Customers
                     where customer.CustomerID == 29531
                     from order in customer.SalesOrderHeaders
                     orderby order.OrderDate descending
                     select order;
    SalesOrderHeader latestOrder = orderQuery.First();

    latestOrder.Comment = "Call customer when goods ready to ship";
    dbContext.SaveChanges();
}
To add a brand-new entity, you need to create a new entity object
of the corresponding class, and then tell the object context you’ve done
so—for each entity type, the context provides a corresponding method for
adding entities. In our example, the context has AddToCustomers
,
AddToSalesOrderHeaders
, and AddToSalesOrderDetails
methods. You will need
to make sure you satisfy the database’s constraints, which means that
the code in Example 14-17
will not be enough.
Example 14-17. Failing to meet constraints on a new entity
SalesOrderDetail detail = new SalesOrderDetail();
dbContext.AddToSalesOrderDetails(detail);

// Will throw an exception!
dbContext.SaveChanges();
The Entity Framework will throw an UpdateException
when Example 14-17 calls SaveChanges
because the
entity is missing all sorts of information. The example database’s
schema includes a number of integrity constraints, and will refuse to
allow a new row to be added to the SalesOrderDetail
table unless it meets all the
requirements. Example 14-18 sets the bare
minimum number of properties to keep the database happy. (This is
probably not good enough for real code, though—we’ve not specified any
price information, and the numeric price fields will have default values
of 0
; while this doesn’t upset the
database, it might not please the accountants.)
Example 14-18. Adding a new entity
// ...where latestOrder is a SalesOrderHeader fetched with code like
// that in Example 14-16.
SalesOrderDetail detail = new SalesOrderDetail();
detail.SalesOrderHeader = latestOrder;
detail.ModifiedDate = DateTime.Now;
detail.OrderQty = 1;
detail.ProductID = 680;  // HL Road Frame - Black, 58
dbContext.AddToSalesOrderDetails(detail);
dbContext.SaveChanges();
Several of the constraints involve relationships. A SalesOrderDetail
row must be related to a
particular row in the Product
table,
because that’s how we know what products the customer has ordered. We’ve
not defined an entity type corresponding to the Product
table, so Example 14-18 just plugs in the relevant foreign key
value directly.
The database also requires that each SalesOrderDetail
row be related to exactly one
SalesOrderHeader
row—remember that
this was one of the one-to-many relationships we saw earlier. (The
header has a multiplicity of one, and the detail has a multiplicity of
many.) The constraint in the database requires the SalesOrderID
foreign key column in each
SalesOrderDetail
row to correspond to
the key for an existing SalesOrderHeader
row. But unlike the ProductID
column, we don’t set the
corresponding property directly on the entity. Instead, the second line
of Example 14-18 sets the new entity’s SalesOrderHeader
property, which as you may
recall is a navigation property.
When adding new entities that must be related to other entities,
you normally indicate the relationships with the corresponding
navigation properties. In this example, you could add the new SalesOrderDetail
object to a SalesOrderHeader
object’s SalesOrderDetails
navigation property—since a
header may have many related details, the property is a collection and
offers an Add
method. Or you can
work with the other end of the relationship as Example 14-18 does. This is the usual way to deal
with the relationships of a newly created entity—setting foreign key
properties directly as we did for the other relationships here is
somewhat unusual. We did that only because our EDM does not include all
of the relevant entities—we represent only three of the tables because a
complete model for this particular example would have been too big to
fit legibly onto a single page. There may also be situations where you
know that in your particular application, the key values required will
never change, and you might choose to cache those key values to avoid
the overhead of involving additional entities.
We’ve seen how to update existing data and add new data. This
leaves deletion. It’s pretty straightforward: if you have an entity
object, you can pass it to the context’s DeleteObject
method,
and the next time you call SaveChanges
, the EF will attempt to delete the
relevant row, as shown in Example 14-19.
As with any kind of change to the database, this will succeed only if it does not violate any of the database’s integrity constraints. For example, deleting an entity at the one end of a one-to-many relationship may fail if the database contains one or more rows at the many end that are related to the item you’re trying to delete. (Alternatively, the database might automatically delete all related items—SQL Server allows a constraint to require cascading deletes. This takes a different approach to enforcing the constraint—rather than rejecting the attempt to delete the parent item, it deletes all the children automatically.)
Example 14-18 adds new information that relates to information already in the database—it adds a new detail to an existing order. This is a very common thing to do, but it raises a challenge: what if code elsewhere in the system was working on the same data? Perhaps some other computer has deleted the order you were trying to add detail for. The EF supports a couple of common ways of managing this sort of hazard: transactions and optimistic concurrency.
Transactions are an extremely useful mechanism for dealing with concurrency hazards efficiently, while keeping data access code reasonably simple. Transactions provide the illusion that each individual database client has exclusive access to the entire database for as long as it needs to do a particular job—it has to be an illusion because if clients really took it in turns, scalability would be severely limited. So transactions perform the neat trick of letting work proceed in parallel except for when that would cause a problem—as long as all the transactions currently in progress are working on independent data they can all proceed simultaneously, and clients have to wait their turn only if they’re trying to use data already involved (directly, or indirectly) in some other transaction in progress.[38]
The classic example of the kind of problem transactions are designed to avoid is that of updating the balance of a bank account. Consider what needs to happen to your account when you withdraw money from an ATM—the bank will want to make sure that your account is debited with the amount of money withdrawn. This will involve subtracting that amount from the current balance, so there will be at least two operations: discovering the current balance, and then updating it to the new value. (Actually it’ll be a whole lot more complex than that—there will be withdrawal limit checks, fraud detection, audit trails, and more. But the simplified example is enough to illustrate how transactions can be useful.) But what happens if some other transaction occurs at the same time? Maybe you happen to be making a withdrawal at the same time as the bank processes an electronic transfer of funds.
If that happens, a problem can arise. Suppose the ATM transaction and the electronic transfer both read the current balance—perhaps they both discover a balance of $1,234. Next, if the transfer is moving $1,000 from your account to somewhere else, it will write back a new balance of $234—the original balance minus the amount just deducted. But there’s the ATM transfer—suppose you withdraw $200. It will write back a new balance of $1,034. You just withdrew $200 and paid $1,000 to another account, but your account only has $200 less in it than before rather than $1,200—that’s great for you, but your bank will be less happy. (In fact, your bank probably has all sorts of checks and balances to try to minimize opportunities such as this for money to magically come into existence. So they’d probably notice such an error even if they weren’t using transactions.) In fact, neither you nor your bank really wants this to happen, not least because it’s easy enough to imagine similar examples where you lose money.
This problem of concurrent changes to shared data crops up in all sorts of forms. You don’t even need to be modifying data to observe a problem: code that only ever reads can still see weird results. For example, you might want to count your money, in which case looking at the balances of all your accounts would be necessary—that’s a read-only operation. But what if some other code was in the middle of transferring money between two of your accounts? Your read-only code could be messed up by other code modifying the data.
A simple way to avoid this is to do one thing at a time—as long as each task completes before the next begins, you’ll never see this sort of problem. But that turns out to be impractical if you’re dealing with a large volume of work. And that’s why we have transactions—they are designed to make it look like things are happening one task at a time, but under the covers they allow tasks to proceed concurrently as long as they’re working on unrelated information. So with transactions, the fact that some other bank customer is in the process of performing a funds transfer will not stop you from using an ATM. But if a transfer is taking place on one of your accounts at the same time that you are trying to withdraw money, transactions would ensure that these two operations take it in turns.
So code that uses transactions effectively gets exclusive access to whatever data it is working with right now, without slowing down anything it’s not using. This means you get the best of both worlds: you can write code as though it’s the only code running right now, but you get good throughput.
How do we exploit transactions in C#? Example 14-20 shows the simplest approach: if you create
a TransactionScope
object, the EF
will automatically enlist any database operations in the same
transaction. The TransactionScope
class is defined in the System.Transactions
namespace in the
System.Transactions
DLL (another
class library DLL for which we need to add a reference, as it’s not in
the default set).
Example 14-20. TransactionScope
using (var dbContext = new AdventureWorksLT2008Entities())
{
    using (var txScope = new TransactionScope())
    {
        var customersWithOrders = from cust in dbContext.Customers
                                  where cust.SalesOrderHeaders.Count > 0
                                  select cust;
        foreach (var customer in customersWithOrders)
        {
            Console.WriteLine("Customer {0} has {1} orders",
                customer.CustomerID, customer.SalesOrderHeaders.Count);
        }
        txScope.Complete();
    }
}
For as long as the TransactionScope
is active (i.e., until it is
disposed at the end of the using
block), all the requests to the database this code makes will be part of
the same transaction, and so the results should be consistent—any other
database client that tries to modify the state we’re looking at will be
made to wait (or we’ll be made to wait for them) in order to guarantee
consistency. The call to Complete
at
the end indicates that we have finished all the work in the transaction,
and are happy for it to commit—without this, the transaction would be
aborted at the end of the scope’s using
block. For a transaction that modifies
data, failure to call Complete
will
lose any changes. Since the transaction in Example 14-20 only reads data, this might not cause any
visible problems, but it’s difficult to be certain. If a TransactionScope
was already active on this
thread (e.g., a function farther up the call stack started one) our
TransactionScope
could join in with
the same transaction, at which point failure to call Complete
on our scope would end up aborting
the whole thing, possibly losing data. The documentation recommends
calling Complete
for all transactions
except those you want to abort, so it’s a good practice always to call
it.
TransactionScope
represents an
implicit transaction—any data access
performed inside its using
block will
automatically be enlisted on the transaction. That’s why Example 14-20 never appears to use the TransactionScope
it creates—it’s enough for it
to exist. (The transaction system keeps track of which threads have
active implicit transactions.) You can also work with transactions
explicitly—the object context provides a Connection
property, which in turn offers
explicit BeginTransaction
and
EnlistTransaction
methods. You can
use these in advanced scenarios where you might need to control
database-specific aspects of the transaction that an implicit
transaction cannot reach.
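The ambient nature of the implicit transaction is easy to see even without a database, because TransactionScope manages a per-thread Transaction.Current. A minimal sketch (nothing here touches the EF or a real resource manager):

```csharp
using System;
using System.Transactions;

class AmbientTransactionDemo
{
    static void Main()
    {
        // Outside any scope there is no ambient transaction.
        Console.WriteLine(Transaction.Current == null);       // True

        using (var txScope = new TransactionScope())
        {
            // Inside the scope an implicit transaction is active; any EF or
            // ADO.NET work done on this thread would enlist automatically.
            Console.WriteLine(Transaction.Current != null);   // True
            txScope.Complete();
        }

        // Disposing the scope ends the ambient transaction.
        Console.WriteLine(Transaction.Current == null);       // True
    }
}
```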
These transaction models are not specific to the EF. You can use the same techniques with ADO.NET v1-style data access code.
Besides enabling isolation of multiple concurrent operations,
transactions provide another very useful property: atomicity. This means
that the operations within a single transaction succeed or fail as one:
all succeed, or none of them
succeed—a transaction is indivisible in that it cannot complete
partially. The database stores updates performed within a transaction
provisionally until the transaction completes—if it succeeds, the
updates are permanently committed, but if it fails, they are rolled back
and it’s as though the updates never occurred. The EF uses transactions
automatically when you call SaveChanges
—if you have not supplied a
transaction, it will create one just to write the updates. (If you have
supplied one, it’ll just use yours.) This means that SaveChanges
will always either succeed
completely, or have no effect at all, whether or not you provide a
transaction.
Transactions are not the only way to solve problems of concurrent access to shared data. They are bad at handling long-running operations. For example, consider a system for booking seats on a plane or in a theater. End users want to see what seats are available, and will then take some time—minutes probably—to decide what to do. It would be a terrible idea to use a transaction to handle this sort of scenario, because you’d effectively have to lock out all other users looking to book into the same flight or show until the current user makes a decision. (It would have this effect because in order to show available seats, the transaction would have had to inspect the state of every seat, and could potentially change the state of any one of those seats. So all those seats are, in effect, owned by that transaction until it’s done.)
Let’s just think that through. What if every person who flies on a particular flight takes two minutes to make all the necessary decisions to complete his booking? (Hours of queuing in airports and observing fellow passengers lead us to suspect that this is a hopelessly optimistic estimate. If you know of an airline whose passengers are that competent, please let us know—we’d like to spend less time queuing.) The Airbus A380 aircraft has FAA and EASA approval to carry 853 passengers, which suggests that even with our uncommonly decisive passengers, that’s still a total of more than 28 hours of decision making for each flight. That sounds like it could be a problem for a daily flight.[39] So there’s no practical way of avoiding having to tell the odd passenger that, sorry, in between showing him the seat map and choosing the seat, someone else got in there first. In other words, we are going to have to accept that sometimes data will change under our feet, and that we just have to deal with it when it happens. This requires a slightly different approach than transactions.
Optimistic concurrency describes an approach to concurrency where instead of enforcing isolation, which is how transactions usually work, we just make the cheerful assumption that nothing’s going to go wrong. And then, crucially, we verify that assumption just before making any changes.
In practice, it’s common to use a mixture of optimistic concurrency and transactions. You might use optimistic approaches to handle long-running logic, while using short-lived transactions to manage each individual step of the process.
For example, an airline booking system that shows a map of available seats in an aircraft on a web page would make the optimistic assumption that the seat the user selects will probably not be selected by any other user in between the moment at which the application showed the available seats and the point at which the user picks a seat. The advantage of making this assumption is that there’s no need for the system to lock anyone else out—any number of users can all be looking at the seat map at once, and they can all take as long as they like.
Occasionally, multiple users will pick the same seat at around the same time. Most of the time this won’t happen, but the occasional clash is inevitable. We just have to make sure we notice. So when the user gets back to us and says that he wants seat 7K, the application then has to go back to the database to see if that seat is in fact still free. If it is, the application’s optimism has been vindicated, and the booking can proceed. If not, we just have to apologize to the user (or chastise him for his slowness, depending on the prevailing attitude to customer service in your organization), show him an updated seat map so that he can see which seats have been claimed while he was dithering, and ask him to make a new choice. This will happen only a small fraction of the time, and so it turns out to be a reasonable solution to the problem—certainly better than a system that is incapable of taking enough bookings to fill the plane in the time available.
Sometimes optimistic concurrency is implemented in an application-specific way. The example just described relies on an understanding of what the various entities involved mean, and would require us to write code that explicitly performs the check described. But slightly more general solutions are available—they are typically less efficient, but they can require less code. The EF offers some of these ignorant-but-effective approaches to optimistic concurrency.
The default EF behavior seems, at first glance, to be ignorant and broken—not only does it optimistically assume that nothing will go wrong, but it also does nothing to check that assumption. We might call this blind optimism—we don’t even get to discover when our optimism turns out to be unfounded. While that sounds bad, it’s actually the right thing to do if you’re using transactions—transactions enforce isolation, so additional checks would be a waste of time. But if you’re not using transactions, this default behavior is not good enough for code that wants to change or add data—you’ll risk compromising the integrity of your application’s state.
To get the EF to check that updates are likely to be sound, you can tell it to verify that certain entity properties have not changed since the entity was populated from the database. For example, if you select the ModifiedDate property of the SalesOrderDetail entity in the EDM designer, you can go to the Properties panel and set its Concurrency Mode to Fixed (the default being None). Whenever you update the entity, the EF will then check that this particular column’s value is still the same as it was when the entity was fetched. And as long as all the code that modifies this particular table remembers to update the ModifiedDate, you’ll be able to detect when things have changed.
While this example illustrates the concept, it’s not entirely robust. Using a date and time to track when a row changes has a couple of problems. First, different computers in the system are likely to have slight differences between their clocks, which can lead to anomalies. And even if only one computer ever accesses the database, its clock may be adjusted from time to time. You’d end up wanting to customize the SQL code used for updates so that everything uses the database server’s clock for consistency. Such customizations are possible, but they are beyond the scope of this book. And even that might not be enough—if the row is updated often, it’s possible that two updates might have the same timestamp due to insufficient precision. A stricter approach based on GUIDs or sequential row version numbers is more robust. But this is the realm of database design, rather than Entity Framework usage—ultimately you’re going to be stuck with whatever your DBA gives you.
If any of the columns with a Concurrency Mode of Fixed change
between reading an entity’s value and attempting to update it, the EF
will detect this when you call SaveChanges
and will throw an OptimisticConcurrencyException
, instead of
completing the update.
The EF detects changes by making the SQL UPDATE
conditional—its WHERE
clause will include checks for all of
the Fixed
columns. It inspects the
updated row count that comes back from the database to see whether the
update succeeded.
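To make that concrete, the UPDATE generated for a SalesOrderDetail whose ModifiedDate has a Concurrency Mode of Fixed would be shaped roughly like the following. This is an illustrative sketch rather than the exact SQL the EF emits, and the parameter names are invented:

```sql
UPDATE SalesLT.SalesOrderDetail
SET    OrderQty = @newOrderQty
WHERE  SalesOrderID = @orderId
  AND  SalesOrderDetailID = @detailId
  AND  ModifiedDate = @originalModifiedDate;  -- the Fixed column check

-- If the reported row count is 0, the row changed (or was deleted)
-- since it was read, and SaveChanges throws
-- OptimisticConcurrencyException instead of completing.
```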
How you deal with an optimistic concurrency failure is up to your application—you might simply be able to retry the work, or you may have to get the user involved. It will depend on the nature of the data you’re trying to update.
The object context provides a Refresh
method that you
can call to bring entities back into sync with the current state of the
rows they represent in the database. You could call this after catching
an OptimisticConcurrencyException
as
the first step in your code that recovers from a problem. (You’re not
actually required to wait until you get a concurrency exception—you’re
free to call Refresh
at any time.)
The first argument to Refresh
tells
it what you’d like to happen if the database and entity are out of sync.
Passing RefreshMode.StoreWins
tells
the EF that you want the entity to reflect what’s currently in the
database, even if that means discarding updates previously made in
memory to the entity. Or you can pass RefreshMode.ClientWins
, in which case any
changes in the entity remain present in memory. The changes will not be
written back to the database until you next call SaveChanges
. So the significance of calling
Refresh
in ClientWins
mode is that you have, in effect,
acknowledged changes to the underlying database—if changes in the
database were previously causing SaveChanges
to throw an OptimisticConcurrencyException
, calling
SaveChanges
again after the Refresh
will not throw again (unless the
database changes again in between the call to Refresh
and the second SaveChanges
).
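Putting those pieces together, recovery code might look something like this sketch. The context and entity come from this chapter's model; whether to let the store win, let the client win, or involve the user depends entirely on your application:

```csharp
// Sketch: 'context' is an object context and 'detail' is a
// SalesOrderDetail entity previously fetched through it.
try
{
    context.SaveChanges();
}
catch (OptimisticConcurrencyException)
{
    // Someone else updated the row since we read it. Here we let the
    // database win: our in-memory changes to 'detail' are discarded
    // and replaced with the current database values.
    context.Refresh(RefreshMode.StoreWins, detail);

    // With RefreshMode.ClientWins instead, our in-memory changes
    // would survive, and a second SaveChanges would write them over
    // whatever is now in the database (unless it changes yet again).
}
```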
If you ask the context object for the same entity twice,
it will return you the same object both times—it remembers the identity
of the entities it has returned. Even if you use different queries, it
will not attempt to load fresh data for any entities already loaded
unless you explicitly pass them to the Refresh
method.
Executing the same LINQ query multiple times against the same
context will still result in multiple queries being sent to the
database. Those queries will typically return all the current data for
the relevant entity. But the EF will look at primary keys in the query
results, and if they correspond to entities it has already loaded, it
just returns those existing entities and won’t notice if their values
in the database have changed. It looks for changes only when you call
either SaveChanges
or Refresh
.
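You can see this identity management in a sketch like the following. Both queries cause a round trip to the database, but they yield a single object; the customer ID is the one used elsewhere in this chapter:

```csharp
using (var context = new AdventureWorksLT2008Entities())
{
    var first = context.Customers.Single(c => c.CustomerID == 29531);
    var second = context.Customers.Single(c => c.CustomerID == 29531);

    // The context remembers entity identity: one object, not two.
    // Any changes made to the row in the database between the two
    // queries are not reflected in the entity.
    bool sameObject = ReferenceEquals(first, second);   // true
}
```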
This raises the question of how long you should keep an object context around. The more entities you ask it for, the more objects it’ll hang on to. Even when your code has finished using a particular entity object, the .NET Framework’s garbage collector won’t be able to reclaim the memory it uses for as long as the object context remains alive, because the object context keeps hold of the entity in case it needs to return it again in a later query.
The way to get the object context to let go of everything is to
call Dispose
.
This is why all of the examples that show the creation of an object
context do so in a using
statement.
There are other lifetime issues to bear in mind. In some
situations, an object context may hold database connections open. And
also, if you have a long-lived object context, you may need to add calls
to Refresh
to ensure that you have
fresh data, which you wouldn’t have to do with a newly created object
context. So all the signs suggest that you don’t want to keep the object
context around for too long.
How long is too long? In a web application, if you create an
object context while handling a request (e.g., for a particular page)
you would normally want to Dispose
it
before the end of that request—keeping an object context alive across
multiple requests is typically a bad idea. In a Windows application (WPF
or Windows Forms), it might make sense to keep an object context alive a
little longer, because you might want to keep entities around while a
form for editing the data in them is open. (If you want to apply
updates, you normally use the same object context you used when fetching
the entities in the first place, although it’s possible to detach an
entity from one context and attach it later to a different one.) In
general, though, a good rule of thumb is to keep the object context
alive for no longer than is necessary.
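In practice, that rule of thumb usually means scoping a context to a single unit of work. A minimal sketch of the pattern, assuming the entity is only needed for display (fetching and saving changes would normally share one context):

```csharp
// One short-lived context per logical operation: create it, do the
// work, and let the using statement dispose it promptly.
public Customer LoadCustomer(int customerId)
{
    using (var context = new AdventureWorksLT2008Entities())
    {
        return context.Customers.Single(c => c.CustomerID == customerId);
    }
}
```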
The last data access feature we’ll look at is slightly different from the rest. So far, we’ve seen how to write code that uses data in a program that can connect directly to a database. But WCF Data Services lets you present data over HTTP, making data access possible from code in some scenarios where direct connections are not possible. It defines a URI structure for identifying the data you’d like to access, and the data itself can be represented in either JSON or the XML-based Atom Publishing Protocol (AtomPub).
As the use of URIs, JSON, and XML suggests, WCF Data Services can be useful in web applications. Silverlight cannot access databases directly, but it can consume data via WCF Data Services. And the JSON support means that it’s also relatively straightforward for script-based web user interfaces to use.
WCF Data Services is designed to work in conjunction with the Entity Framework. You don’t just present an entire database over HTTP—that would be a security liability. Instead, you define an Entity Data Model, and you can then configure which entity types should be accessible over HTTP, and whether they are read-only or support other operations such as updates, inserts, or deletes. And you can add code to implement further restrictions based on authentication and whatever security policy you require. (Of course, this still gives you plenty of scope for creating a security liability. You need to think carefully about exactly what information you want to expose.)
To show WCF Data Services in action, we’ll need a web application, because it’s an HTTP-based technology. If you create a new project in Visual Studio, you’ll see a Visual C#→Web category on the left, and the Empty ASP.NET Web Application template will suit our needs here. We need an Entity Data Model to define what information we’d like to expose—for this example, we’ll use the same EDM we’ve been using all along, so the steps will be the same as they were earlier in the chapter.
To expose this data over HTTP, we add another item to the
project—under the Visual C#→Web template
category we choose the WCF Data Service template. We’ll call the service
MyData
. Visual Studio will add a
MyData.svc.cs file to the project,
which needs some tweaking before it’ll expose any data—it assumes that it
shouldn’t publish any information that we didn’t explicitly tell it
to.
The first thing we need to do is modify the base class of the
generated MyData
class—it derives from
a generic class called DataService
, but
the type argument needs to be filled in—Visual Studio just puts a comment
in there telling you what to do. We will plug in the name of the object
context class:
public class MyData : DataService<AdventureWorksLT2008Entities>
This class contains an InitializeService
method
to which we need to add code for each entity type we’d like to make
available via HTTP. Example 14-21 makes all
three entity types in the model available for read access.
Example 14-21. Making entities available
public static void InitializeService(IDataServiceConfiguration config)
{
    config.SetEntitySetAccessRule("Customers", EntitySetRights.AllRead);
    config.SetEntitySetAccessRule("SalesOrderHeaders", EntitySetRights.AllRead);
    config.SetEntitySetAccessRule("SalesOrderDetails", EntitySetRights.AllRead);
}
We can now look at how the data appears. If we press F5, Visual
Studio opens a web browser showing the MyData.svc
URL for our web application. It shows
an XML file describing the available entity types, as Example 14-22 shows. (The exact
value you see in the xml:base
may be
different—it depends on the port number Visual Studio chooses for
debugging.)
Example 14-22. Available entities described by the web service
<service xml:base="http://localhost:1181/MyData.svc/"
         xmlns:atom="http://www.w3.org/2005/Atom"
         xmlns:app="http://www.w3.org/2007/app"
         xmlns="http://www.w3.org/2007/app">
  <workspace>
    <atom:title>Default</atom:title>
    <collection href="Customers">
      <atom:title>Customers</atom:title>
    </collection>
    <collection href="SalesOrderDetails">
      <atom:title>SalesOrderDetails</atom:title>
    </collection>
    <collection href="SalesOrderHeaders">
      <atom:title>SalesOrderHeaders</atom:title>
    </collection>
  </workspace>
</service>
Notice that each <collection>
element has an href
attribute. Typically, href
attributes denote a link to another
resource, the attribute value being a relative URL. So you can just stick
an entity name on the end of the URL. The exact URL will depend on the
port number Visual Studio picks for the test web server, but something
like http://localhost:1181/MyData.svc/Customers will
return all the customers in the system.
There are two things to be aware of when looking at entities in the browser with this sort of URL. First, the simplest URLs will return all the entities of the specified type, which might take a long time. We’ll see how to be more selective in a moment. Second, by default the web browser will notice that the data format being used is a variant of Atom, and will attempt to use the same friendly feed rendering you would get on other Atom- and RSS-based feeds. (Lots of blogs offer an Atom-based feed format.) Unfortunately, the browser’s friendly rendering is aimed at the kind of Atom features usually found in blogs, and it doesn’t always understand AtomPub feeds, so you might just get an error.
To deal with the second problem, you could just View Source to see the underlying XML, or you can turn off friendly feed rendering. In IE8, you open the Internet Options window and go to the Content tab. Open the Feed and Web Slice Settings window from there, and uncheck the “Turn on feed reading view” checkbox. (If you’ve already looked at a feed and hit this problem, you might need to close all instances of IE after making this change and try again.)
WCF Data Services lets you request a specific entity by putting its primary key inside parentheses at the end of the URL. For example, http://localhost:1181/MyData.svc/Customers(29531) fetches the customer entity whose ID is 29531. If you try this, you’ll see a simple XML representation of all the property values for the entity. In that same XML document, you’ll also find this element:
<link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/SalesOrderHeaders"
      type="application/atom+xml;type=feed"
      title="SalesOrderHeaders"
      href="Customers(29531)/SalesOrderHeaders" />
This is how associations in the EDM show up—if an entity has related
entities available through an association, it will offer a link to the URL
on which those associations can be found. So as the href
in this example shows, you can just stick
SalesOrderHeaders
on the end of the
customer instance URL to get all the related orders for customer 29531, as
in the following:
http://localhost:1181/MyData.svc/Customers(29531)/SalesOrderHeaders
So you can see how joins across relationships turn into URLs, and
also how simple key-based queries work. In fact, the URL syntax also
supports more complex queries based on properties. For example, this
returns all customers whose FirstName
has the value Cory
:
http://localhost:1181/MyData.svc/Customers?$filter=FirstName%20eq%20'Cory'
(The %20
is how URLs represent
spaces, so we’ve really just appended $filter=FirstName eq 'Cory'
to the URL.) The URL
syntax also supports ordering and paging. Many standard LINQ operators are
not supported, including grouping and joining.
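For example, the standard query options include $orderby for sorting and $top and $skip for paging, so URLs along these lines (with the usual caveat that your port number may differ) work too:

```
http://localhost:1181/MyData.svc/Customers?$orderby=LastName
http://localhost:1181/MyData.svc/Customers?$top=10&$skip=20
```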
You don’t have to work directly with these URLs and XML documents—WCF Data Services includes a client-side component that supports LINQ. So you can run LINQ queries that will be converted into HTTP requests that use the URL structure you see here. We can demonstrate this by adding a new console application to the same solution as our web application. If we right-click on the console application’s References item in the Solution Explorer and select Add Service Reference, clicking Discover in the dialog that opens will show the WCF Data Service from the other project. Selecting this and clicking OK generates code to represent each entity type defined by the service. That enables us to write code such as Example 14-23.
Example 14-23. Client-side WCF Data Services code
var ctx = new AdventureWorksLT2008Entities(
    new Uri("http://localhost:1181/MyData.svc"));

var customers = from customer in ctx.Customers
                where customer.FirstName == "Cory"
                select customer;

foreach (Customer customer in customers)
{
    Console.WriteLine(customer.CompanyName);
}
This looks superficially similar to the Entity Framework code we saw
earlier—we still have an object context, for example. Visual Studio
generated the AdventureWorksLT2008Entities
class when we
imported the service reference, and it derives from DataServiceContext
. It’s slightly different from
the EF context—it’s not disposable, for one thing. (That’s why there’s no
using
statement here—this object
context doesn’t implement IDisposable
.)
And it’s a lot simpler—it doesn’t do any change tracking. (That’s why it
doesn’t need to implement IDisposable
.)
It’s really just a convenient way to extract the information that a WCF
Data Service exposes as objects in C#.
The LINQ query here will generate a suitable URL that encodes the
query—filtering by FirstName
in this
case. And as with a database query, it won’t actually make the request
until we start to enumerate the results—this LINQ provider follows the
usual deferred execution pattern.
The range of query types supported by the WCF Data Services LINQ provider is much more limited than that offered by LINQ to Entities, LINQ to SQL, or most LINQ providers. It can only implement queries that are possible to turn into WCF Data Services URLs, and the URL syntax doesn’t cover every possible kind of LINQ query.
WCF Data Services also offers more advanced features than those shown here. For example, you can arrange for entities to be updatable and creatable, and you can provide custom filtering code, to control exactly which entities are returned.
In this chapter, we saw that the .NET Framework offers a range of data access mechanisms. The original interface-based API supports direct database access. The Entity Framework makes it easier for C# code to work with data from the database, as well as providing some support for controlling the mapping between the database and the object model representing the data. And WCF Data Services is able to take some or all of an Entity Data Model and present it over HTTP, with either AtomPub or JSON, thus making your data available to AJAX and Silverlight clients.
[34] The name is a little confusing. In a sense, ADO.NET is a successor to ADO (ActiveX Data Objects), a data access system that was around before .NET. So ADO.NET does for .NET what ADO did for Visual Basic 6. But they are quite different technologies—ADO.NET makes no use of ADO, or ActiveX. ADO.NET can use OLE DB, the technology underpinning ADO, but native ADO.NET providers are preferred—the OLE DB provider is mainly for legacy sources.
[35] Yoda discussing Sith terms of employment, from Star Wars Episode I: The Phantom Menace.
[36] Opinion is divided on whether this variant can still be called one-to-one. Strictly speaking it’s incorrect, but in practice you’ll see one-to-zero-or-one relationships widely described informally as one-to-one.
[37] And in case you’re wondering about LINQ to XML, that doesn’t help here. It lets you use LINQ-capable languages like C# or VB.NET to write LINQ queries to look in an XML document. It doesn’t let you put LINQ queries in an XML document.
[38] In fact, it gets a good deal cleverer than that. Databases go to some lengths to avoid making clients wait for one another unless it’s absolutely necessary, and can sometimes manage this even when clients are accessing the same data, particularly if they’re only reading the common data. Not all databases do this in the same way, so consult your database documentation for further details.
[39] And yes, bookings for daily scheduled flights are filled up gradually over the course of a few months, so 28 hours per day is not necessarily a showstopper. Even so, forcing passengers to wait until nobody else is choosing a seat would be problematic—you’d almost certainly find that your customers didn’t neatly space out their usage of the system, and so you’d get times where people wanting to book would be unable to. Airlines would almost certainly lose business the moment they told customers to come back later.