Software applications deal with data in a wide array of forms: single values such as integers or strings; composite values paired together as tuples, records, or objects; collections of smaller pieces of data represented as lists, sets, arrays, or sequences; XML strings with tags describing the shape and kind of data; and data coming from relational or object-oriented databases, just to name a few.
This chapter looks at ways of working with some common data sources:
In Chapter 3, you saw that sequences, similar to other enumerable data types such as lists, arrays, maps, and sets, have various aggregate iteration, query, and transform operators. This chapter first looks at how you can use these operators in a straightforward manner to form SQL-like operations over in-memory collections. This mechanism can be further tuned and applied to other data sources where the original data source is mapped to a sequence.
We cover how you can work with relational databases from F# programs using the ADO.NET libraries. You learn how to connect to a database; create, update, and delete tables and records using simple SQL statements; access database data sequentially; and work with parts of tables using disconnected in-memory representations. You also learn how to manage database connections and data sources; create databases; add new tables; and work with relationships, constraints, and stored procedures using Visual Studio.
We cover how you can use the Language Integrated Query (LINQ) infrastructure in combination with F# metaprogramming to bring relational data query logic into reach without explicit use of SQL. The essential goal is to write database queries in F# itself using the same techniques you use to query in-memory data structures.
Finally, the chapter looks at the use of XML as a generic data format. You saw in Chapter 9 how to work with the XML Document Object Model (DOM), and we briefly survey how to desugar XML into sequences using LinqToXml.
Query languages are often made up of building blocks that transform and filter data. Functional programming gives you the basic tools that allow you to apply standard query logic on all F# types that are compatible with the F# sequence type, such as F# lists, arrays, sequences, and anything else that implements the IEnumerable<'a>/seq<'a> interface.
Consider the following aggregate operators described in Chapter 3:
let select = Seq.map
let where = Seq.filter
Here you rename the operations in the F# Seq module to correspond to more standard database query terminology: the select operator is defined as a transform operation, because in a selection you take sequence elements and map (often narrow) them to new values. Similarly, the where operator is defined as a filter applied on a sequence that takes an 'a -> bool predicate to filter out the desired elements. You can use these aggregate operators in a straightforward manner to query and transform in-memory data. The only other piece you need is the glue that makes the query and transform operators work together: the standard pipe operator (|>). Recall that this operator flips the order of its arguments; being an infix operator, it feeds the left-side argument into the function on the right. This is useful because the argument to the function is seen before the function itself, propagating important typing information into the function.
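The pipe operator is itself ordinary F# code; a minimal sketch of its definition and of the typing benefit just described:

```fsharp
// The pipe operator simply applies the function on its right
// to the value on its left; it is defined in the F# library as:
let (|>) x f = f x

// Because the data appears before the function, the element type of
// the list is already known when the lambda is checked, so 's' needs
// no annotation to support the .Length lookup:
let lengths = ["one"; "two"; "three"] |> List.map (fun s -> s.Length)
// lengths = [3; 3; 5]
```

Written in the other order, List.map (fun s -> s.Length) ["one"; "two"; "three"] would require a type annotation on s, because the lambda is checked before the list is seen.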
For instance, given a string * int * string array representing the name, age, and department of various employees, you can select those names that start with the letter R as follows:
let people =
    [| ("Joe", 27, "Sales"); ("Rick", 35, "Marketing"); ("Mark", 40, "Sales");
       ("Rob", 31, "Administration"); ("Bob", 34, "Marketing") |]

let namesR =
    people
    |> select (fun (name, age, dept) -> name)
    |> where (fun name -> name.StartsWith "R")
Note that the types of name, age, and dept are inferred from people, and no type annotation is necessary. Finding those people who work in sales and are older than 30 years old is also straightforward:
> let namesSalesOver30 =
people |> where (fun (_, age, _) -> age >= 30)
|> where (fun (_, _, dept) -> dept = "Sales")
|> select (fun (name, _, _) -> name);;
val namesSalesOver30 : seq<string>
> namesSalesOver30;;
val it : seq<string> = seq ["Mark"]
In the previous section, you used alternative names such as select and where for some standard F# operations such as Seq.map and Seq.filter. This is for illustrative purposes, to show the connection between these operators and SQL-like querying. In most F# code, you should continue to use the standard F# operators from the Seq module.
In addition to the restriction (filter/where) and projection (map/select) operators, the Seq module contains other useful functions, many of which were described in Chapter 3; and you can easily define further operators. For instance, you can define sorting over sequences by using a custom sorting function together with Seq.sortBy.
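For instance, a small sketch using the people array from earlier (repeated here so the snippet is self-contained):

```fsharp
let people = [| ("Joe", 27, "Sales"); ("Rick", 35, "Marketing");
                ("Mark", 40, "Sales") |]

// Seq.sortBy sorts by a key extracted from each element
let byAge = people |> Seq.sortBy (fun (_, age, _) -> age)

// You can also give the operation a query-style name, in the same
// spirit as 'select' and 'where' above
let orderBy = Seq.sortBy
let byName = people |> orderBy (fun (name, _, _) -> name)
```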
Another useful query-like function is Seq.truncate, which takes the first n elements and truncates the rest. Using these new operators, given, for example, an unbounded stream of random numbers, you can extract the first three even numbers and return a pair of those numbers and their squares, as the following example shows:
let rand = System.Random()
let numbers = seq { while true do yield rand.Next(1000) }

numbers
|> Seq.filter (fun i -> i % 2 = 0)   // "where"
|> Seq.truncate 3
|> Seq.sort                          // sort ascending
|> Seq.map (fun i -> i, i * i)       // "select"

// random - results will vary!
val it : seq<int * int> = seq [(814, 662596); (686, 470596); (242, 58564)]
Some of the most general operators supported by most F# data structures are reduce, fold, and foldBack. These apply a function to each element of a collection and accumulate a result. For fold and foldBack, the function is applied in left-to-right or right-to-left order, respectively; when an operator is simply named fold, the ordering is typically left to right. Both functions also take an initial value for the accumulator. For example:
> List.fold (fun acc x -> acc + x) 0 [4; 5; 6];;
val it : int = 15

> Seq.fold (fun acc x -> acc + x) 0.0 [4.0; 5.0; 6.0];;
val it : float = 15.0

> List.foldBack (fun x acc -> min x acc) [4; 5; 6; 3; 5] System.Int32.MaxValue;;
val it : int = 3
The following are equivalent, but no explicit anonymous function values are used:
> List.fold (+) 0 [4; 5; 6];;
val it : int = 15

> Seq.fold (+) 0.0 [4.0; 5.0; 6.0];;
val it : float = 15.0

> List.foldBack min [4; 5; 6; 3; 5] System.Int32.MaxValue;;
val it : int = 3
If used carefully, the various foldBack operators are pleasantly compositional, because they let you apply a selection function as part of the accumulating function:

> List.foldBack (fst >> min) [(3, "three"); (5, "five")] System.Int32.MaxValue;;
val it : int = 3
The F# library also includes more direct accumulation functions such as Seq.sum and Seq.sumBy. These use a fixed accumulation function (addition) with a fixed initial value (zero).
Folding operators are very powerful and can help you avoid many explicit uses of recursion or loops in your code. However, they're sometimes overused in functional programming and can be hard for novice users to read and understand. Take the time to document uses of these operators, or consider using them to build simpler operators that apply a particular accumulation function.
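Following that advice, a small sketch of wrapping folds in simple, well-named operators so that call sites stay readable:

```fsharp
// Wrapping a fold once and giving it a descriptive name documents
// the accumulation it performs
let count xs = Seq.fold (fun acc _ -> acc + 1) 0 xs
let sumAges people = Seq.fold (fun acc (_, age, _) -> acc + age) 0 people

let c = count [4; 5; 6]                                               // 3
let total = sumAges [("Joe", 27, "Sales"); ("Rick", 35, "Marketing")] // 62
```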
Using aggregate operators to form queries is closely related to the sequence-expression notation described in Chapter 3 and used frequently in this book. For example, namesSalesOver30 defined previously can also be defined as follows:
seq { for (name, age, dept) in people do
          if age >= 30 && dept = "Sales" then
              yield name }
This is a different notation for the same computation. For very simple queries, F# sequence expressions also support an even more compact form in which if/then clauses are replaced by when, the final do is dropped from the for statement, and yield is replaced by ->:

seq { for (name, age, dept) in people when age >= 30 && dept = "Sales" -> name }

There is no difference between these two sequence expressions; it's just a matter of syntax.
You can use sequence expressions in conjunction with the |> operator. For example:

seq { for i in numbers do
          if i % 2 = 0 then
              yield (i, i * i) }
|> Seq.truncate 3
|> Seq.sort
There are pros and cons to using sequence-expression syntax for some parts of queries:
Sequence expressions are very good for the subset of queries expressed using iteration (for), mapping (select/yield), and filtering (if/then/when/where). They're particularly good for queries containing multiple nested for statements.

Other query constructs such as ordering, truncating, grouping, and aggregating must be expressed directly using aggregate operators such as Seq.sortBy and Seq.groupBy.

Some queries depend on the index position of an item within a stream. These are best expressed directly using aggregate operators such as Seq.mapi.

Many queries are part of a longer series of transformations chained by |> operators. Often, the type of the data being transformed at each step varies substantially through the chain of operators. These queries are best expressed using aggregate operator chains.
It's likely that in the future, the F# sequence-expression syntax will include support for specifying grouping and aggregation operations within the expression syntax. However, at the time of writing, it's necessary to explicitly use operators such as Seq.sortBy and Seq.groupBy for these operations.
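For example, grouping and ordering, which fall outside the sequence-expression subset, read naturally as an aggregate operator chain (the people array from earlier, repeated here so the snippet stands alone):

```fsharp
let people = [| ("Joe", 27, "Sales"); ("Rick", 35, "Marketing");
                ("Mark", 40, "Sales"); ("Rob", 31, "Administration") |]

// Group employees by department, count each group, and order by
// department name - none of which sequence expressions can express
let byDept =
    people
    |> Seq.groupBy (fun (_, _, dept) -> dept)
    |> Seq.map (fun (dept, members) -> (dept, Seq.length members))
    |> Seq.sortBy fst
// seq [("Administration", 1); ("Marketing", 1); ("Sales", 2)]
```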
Storing data in files, reading it into memory, and querying and manipulating the resulting in-memory collections is a sufficient approach for many applications that work with data, but there comes a point where more persistent data handling is required.
For example, consider a business that has two applications that need to access the same set of data about employees. One way to do this is to store data in a text file and work with this data file from both applications. But this approach is vulnerable; you'd quickly realize how redundant text files can get when adding new pieces of data (say you wanted to add an address next to the name, department, and age of each employee) or if you wanted to group your employee data by introducing various relationships (say, multiple addresses per employee). Even if you managed to find a good storage alternative, you'd still have to modify both applications to read and write this new data format. Databases make scenarios like these much easier to cope with by enforcing centralized control of persistent data (data that is more persistent than what appears in a single application) and by giving you the freedom to define, manipulate, and query this data in an application-neutral way.
Databases provide many benefits. Some of the more important ones are listed here:
Data security: When you have centralized control of your data, you can erect a full security system around the data, implementing specific access rules for each type of access or parts of the database.
Sharing data: Any number of applications with the appropriate access rights can connect to your database and read the data stored within, without each having to contain its own logic for extracting that data. As you see shortly, applications use various query languages (most notably SQL) to communicate with databases.
A logical organization of data: You can write new applications that work with the same data without having to worry about how the data is physically represented and stored. On the basic level, this logical structure is provided by a set of entities (data tables) and their relationships.
Avoiding data redundancy: Having all requirements from each consuming application up front helps to identify a logical organization for your data that minimizes possible redundancy. For instance, you can use foreign keys instead of duplicating pieces of data. Data normalization is the process of systematically eliminating data redundancy, a large but essential topic that we don't consider in this book.
Transactions: Reading from and writing to databases occurs atomically, and as a result, two concurrent transactions can never leave data in an inconsistent, inaccurate state. Isolation levels refer to specific measures taken to ensure transaction isolation by locking various parts of the database (fields, records, tables). Higher isolation levels increase locking overhead and can lead to a loss of parallelism by rendering concurrent transactions sequential; on the other hand, no isolation can lead to inconsistent data.
Maintaining data integrity: Databases make sure the data stored within is accurate. Having no redundancy is one way to maintain data integrity (if a piece of data is changed, it's changed in the only place it occurs; thus, it remains accurate); on the other hand, data security and transaction isolation are needed to ensure that the data stored is modified in a controlled manner.
Table 15-1 shows some of the most common database engines, all of which can be used from F# and .NET.
Table 15.1. Common Databases

Name | Type | Description |
---|---|---|
MySQL | Open source | Open source database engine |
SQLite | Open source | Small, embeddable, zero-configuration SQL database engine |
DB2 | Commercial | IBM's database engine |
Firebird | Open source | Based on Borland Interbase |
PostgreSQL | Open source | Reliable and popular database |
— | Commercial | Reliable database engine |
Oracle | Commercial | One of the most popular enterprise database engines |
SQL Server | Commercial | Microsoft's main database engine |
SQL Server Express | Commercial | Free and easy-to-use version of SQL Server |
SQL Server Compact | Commercial | Mobile database engine |
Applications communicate with relational databases using Structured Query Language (SQL). Each time you create tables, create relationships, insert new records, or update or delete existing ones, you're explicitly or implicitly issuing SQL statements to the database. The examples in this chapter use a dialect of Standard SQL called Transact-SQL (T-SQL), used by SQL Server and SQL Server Express. SQL has syntax to define the structure of a database schema (loosely speaking, a collection of data tables and their relations) and also syntax to manage the data within. These subsets of SQL are called Data Definition Language (DDL) and Data Manipulation Language (DML), respectively. The most important DDL statements are the following:
CREATE/ALTER/DROP TABLE
CREATE/DROP VIEW
ADO.NET is the central database-access machinery in the .NET Framework, and it provides full XML support, disconnected and typed datasets, scalability, and high performance. This section gives a brief overview of the ADO.NET fundamentals.
With ADO.NET, data is acquired through a connection to the database via a provider. This connection serves as a medium against which to execute a command; this can be used to fetch, update, insert, or delete data from the data store. Statements and queries are articulated as SQL text (CREATE, SELECT, UPDATE, INSERT, and DELETE statements) and are passed to the command object's constructor. When you execute these statements, you obtain data (in the case of queries) or the number of affected rows (in the case of UPDATE, INSERT, and DELETE statements). The data returned can be processed via two main mechanisms: sequentially in a read-only fashion using a DataReader object, or by loading it into an in-memory representation (a DataSet object) for further disconnected processing. DataSet objects store data in a set of table objects along with metadata that describes their relationships and constraints in a fully contained model.
ADO.NET 2.0 comes with four data providers: SqlClient, OleDb, Oracle, and Odbc. Table 15-2 describes them. These providers act as the main gateway to the database.
Table 15.2. Common Data Providers

Name | Namespace | Available From |
---|---|---|
SqlClient | System.Data.SqlClient | .NET 2.0 |
OleDb | System.Data.OleDb | .NET 2.0 |
Oracle | System.Data.OracleClient | .NET 2.0 |
Odbc | System.Data.Odbc | .NET 2.0 |
The OleDb and Odbc data providers are provided for compatibility with earlier database-access technologies. All ADO.NET connection and command classes have the data provider name as the prefix to their class name, as in OdbcConnection and OdbcCommand or OleDbConnection and OleDbCommand.
Before you can do any work with a database, you need to establish a connection to it. For instance, you can connect to a locally running instance of SQL Server 2005 Express using the following code:
open System.Data
open System.Data.SqlClient

let connString = @"Server='.\SQLEXPRESS';Integrated Security=SSPI"
let conn = new SqlConnection(connString)
The value connString is a connection string. Regardless of how you created your connection object, to execute any updates or queries on it, you need to open it first:
> conn.Open();; val it : unit = ()
If this command fails, then you may need to do one of the following:
Install SQL Server 2005 Express or a newer version.
Consult the latest SQL Server Express samples for alternative connection strings.
Add UserInstance='true' to the connection string. This starts the database engine as a user-level process.
Change the connection string if you have a different database engine installed and running (for instance, if you're using SQL Server instead of SQL Server Express).
Connections established using the same connection string are pooled and reused depending on your database engine. Connections are often a limited resource and should generally be closed as soon as possible within your application.
Now that you've established a connection to the database engine, you can explicitly create a database from F# code by executing a SQL statement directly. For example, you can create a database called company as follows:
open System.Data
open System.Data.SqlClient

let execNonQuery conn s =
    let comm = new SqlCommand(s, conn, CommandTimeout = 10)
    comm.ExecuteNonQuery() |> ignore

execNonQuery conn "CREATE DATABASE company"
You use execNonQuery in the subsequent sections. This function takes a connection object and a SQL string and executes it as a SQL command, ignoring its result.
If you try to create the same database twice, you receive a runtime exception. However, if you intend to drop an existing database, you can do so by issuing a DROP DATABASE company SQL command. The DROP command can also be used for other database artifacts, including tables, views, and stored procedures.
You can execute a simple SQL command to create a table; all you need to do is specify its data fields and their types and whether null values are allowed. The following example creates an Employees table with a primary key EmpID and FirstName, LastName, and Birthday fields:
execNonQuery conn "CREATE TABLE Employees (
    EmpID int NOT NULL,
    FirstName varchar(50) NOT NULL,
    LastName varchar(50) NOT NULL,
    Birthday datetime,
    PRIMARY KEY (EmpID))"
You can now insert two new records as follows:
execNonQuery conn "INSERT INTO Employees (EmpID, FirstName, LastName, Birthday)
                   VALUES (1001, 'Joe', 'Smith', '02/14/1965')"
execNonQuery conn "INSERT INTO Employees (EmpID, FirstName, LastName, Birthday)
                   VALUES (1002, 'Mary', 'Jones', '09/15/1985')"
and retrieve two columns of what was inserted using a fresh connection and a data reader:
let query() =
    seq { use conn = new SqlConnection(connString)
          do conn.Open()
          use comm = new SqlCommand("SELECT FirstName, Birthday FROM Employees", conn)
          use reader = comm.ExecuteReader()
          while reader.Read() do
              yield (reader.GetString 0, reader.GetDateTime 1) }
When you evaluate the query expression in F# Interactive, a connection to the database is created and opened, the command is built, and the reader is used to read successive elements:
> fsi.AddPrinter(fun (d: System.DateTime) -> d.ToString());;
val it : unit = ()

> query();;
val it : seq<string * System.DateTime> =
  seq [("Joe", 14/02/1965 00:00:00); ("Mary", 15/09/1985 00:00:00)]
The definition of query uses sequence expressions that locally define new IDisposable objects such as conn, comm, and reader using declarations of the form use var = expr. These ensure that the locally defined connection, command, and reader objects are disposed after exhausting the entire sequence. See Chapters 4, 8, and 9 for more details about sequence expressions of this kind.
F# sequences are on-demand (that is, lazy), and the definition of query doesn't itself open a connection to the database. That happens when the sequence is first iterated; a connection is then maintained until the sequence is exhausted.
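If you'd rather release the connection promptly than hold it open while the lazy sequence is consumed, you can materialize the results; a small sketch:

```fsharp
// Seq.toList forces the whole query to run at once: the connection is
// opened, every row is read into memory, and the connection is disposed.
// The cost is that all rows are now held in memory.
let results = query() |> Seq.toList
```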
Note that the command object's ExecuteReader method returns a DataReader instance that is used to extract the typed data returned from the query. You can read from the resulting sequence in a straightforward manner using a sequence iterator. For instance, you can use a simple anonymous function to print data on the screen:
> query() |> Seq.iter (fun (fn, bday) -> printfn "%s has birthday %O" fn bday);;
Joe has birthday 14/02/1965 00:00:00
Mary has birthday 15/09/1985 00:00:00
val it : unit = ()
The query brings the data from the database in-memory, although still as a lazy sequence. You can then use standard F# in-memory data transformations on the result:
> query()
    |> Seq.filter (fun (nm, bday) -> bday < System.DateTime.Parse("01/01/1985"))
    |> Seq.length;;
val it : int = 1
However, be aware that these additional transformations are happening in-memory and not in the database.
The command object has different methods for executing different queries. If you have a nonselect query, you use the ExecuteNonQuery method (for UPDATE, INSERT, and DELETE statements, as in execNonQuery previously), which returns the number of rows affected (updated, inserted, or deleted); or the ExecuteScalar method, which returns the first column of the first row of the result, providing a fast and efficient way to extract a single value, such as the number of rows in a table or a result set.
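As a sketch of the latter, assuming the conn connection and Employees table from earlier, a row count can be fetched with ExecuteScalar; note that it returns obj, so the result must be unboxed:

```fsharp
open System.Data.SqlClient

// ExecuteScalar returns the single result value as obj; unbox it to
// the type the query actually produces (COUNT(*) yields an int)
let countEmployees (conn: SqlConnection) =
    use comm = new SqlCommand("SELECT COUNT(*) FROM Employees", conn)
    comm.ExecuteScalar() :?> int
```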
In the previous command, you extracted fields from the result rows using GetXXX methods on the reader object. The particular method used has to match the type of the field selected in the SQL query; a mismatch results in a runtime InvalidCastException. For these and other reasons, DataReader tends to be suitable only in situations where the following are true:
You need to read data only in a sequential order (as returned from the database). DataReader provides forward-only data access.

The field types of the result are known, and the query isn't configurable.

You're only reading and not writing data. DataReader provides read-only access.

Your use of the DataReader is localized. The data connection is open throughout the reader loop.
Datasets allow applications to work in a disconnected mode, which means the application connects to the database, loads relevant data to an in-memory representation, and processes the data locally. When the processing is completed, the data in the in-memory dataset can be synchronized with the database.
ADO.NET datasets are compact in-memory databases and provide similar functionality to real databases; however, they're designed to work with a limited amount of data. A DataSet contains a collection of tables, the relationships between those tables, and various constraints. Tables are represented by the DataTable type. As in a SQL server, tables are defined by their columns (DataColumn objects), and the data is stored in rows (DataRow objects).
In ADO.NET, the loading of data into the dataset and the synchronization to the database are coordinated by data adapters, for instance, SqlDataAdapter objects. Consider the following example, which defines a buildDataSet function that takes a connection and a SQL SELECT query and returns the resulting DataSet object:
let dataAdapter = new SqlDataAdapter()

let buildDataSet conn queryString =
    dataAdapter.SelectCommand <- new SqlCommand(queryString, conn)
    let dataSet = new DataSet()
    // This line is needed to configure the command
    let _ = new SqlCommandBuilder(dataAdapter)
    dataAdapter.Fill(dataSet) |> ignore   // ignore the number of records returned
    dataSet
The inferred types are as follows:
val dataAdapter : SqlDataAdapter
val buildDataSet : SqlConnection -> string -> DataSet
When setting up the data adapter, you initialize its SelectCommand property to the query string that was passed as an argument. This SQL query is used to populate the DataSet when the Fill method is called. The somewhat mysterious line let _ = new SqlCommandBuilder(dataAdapter) creates a command-builder object, which has the side effect of building the INSERT, UPDATE, and DELETE commands automatically from the query in SelectCommand, making the dataset capable of persisting changes. Also, note that you don't have to worry about opening or closing connections, because this is taken care of by the data adapter; all you need is the connection object itself:
let dataSet = buildDataSet conn "SELECT EmpID, FirstName, LastName, Birthday FROM Employees"
The resulting DataSet contains a single table; you obtain it by index and print its content. This table is assembled based on the query and contains the columns that are part of the SELECT statement:
let table = dataSet.Tables.Item(0)

for row in table.Rows do
    printfn "%O, %O - %O" (row.Item "LastName") (row.Item "FirstName") (row.Item "Birthday")
When run in F# Interactive, this produces the following output:
Smith, Joe - 14/02/1965 00:00:00
Jones, Mary - 15/09/1985 00:00:00
You can refer to each column of a row by the same field name used in the SELECT query. Adding a new row to table is treated similarly:
let row = table.NewRow()
row.Item("EmpID") <- 1003
row.Item("FirstName") <- "Eve"
row.Item("LastName") <- "Smith"
row.Item("Birthday") <- System.DateTime.Today
table.Rows.Add row
dataAdapter.Update(dataSet) |> ignore // ignore the number of affected rows
Repeating the SQL query from the previous section reveals the addition of the new entry to the database table:
> query();;
val it : seq<string * System.DateTime> =
  seq [("Joe", 14/02/1965 00:00:00);
       ("Mary", 15/09/1985 00:00:00);
       ("Eve", 27/09/2007 00:00:00)]
Note that you use the INSERT statement that was built by the command-builder object based on the selection query. Using untyped datasets is a great way to execute dynamic queries and to provide ad hoc access to your data. On the other hand, the lack of strong typing means it suffers from possible type mismatches or incorrect field names.
Typed datasets are derived from ordinary datasets and allow you to work with data in a type-safe manner. This means that instead of (row.Item "FirstName" :?> string), you can write row.FirstName. You can create typed datasets using the xsd.exe command-line tool included in the .NET SDK. Among other things, this tool generates source code from XML schema documents; to use it, you must obtain an XSD document for the tables for which you want data objects. (You can quickly generate an XML schema document using Visual Studio.)
Alternatively, you can extract this schema definition using code:
open System.IO

let dataSet2 = buildDataSet conn "SELECT * FROM Employees"
let file name = Path.Combine(@"c:\fsharp", name)
File.WriteAllText(file "employees.xsd", dataSet2.GetXmlSchema())
Using this extracted XSD document, you can now fire up xsd.exe from the command-line shell:

C:\fsharp> xsd.exe employees.xsd /d /l:CS /n:Employees.Data
Writing file 'C:\fsharp\Employees.cs'.

C:\fsharp> csc /target:library Employees.cs
This generates a C# file Employees.cs in the Employees.Data namespace, containing a typed dataset for the Employees table, which you then compile to Employees.dll. You can now reference this DLL and create an instance of a typed dataset:
> #r @"employees.dll";;
Binding session to 'C:\fsharp\Employees.dll'...

> let employeesTable = new Employees.Data.NewDataSet();;
val employeesTable : Employees.Data.NewDataSet

> dataAdapter.Fill(employeesTable) |> ignore;;   // ignore the number of records
val it : unit = ()
> for emp in employeesTable._Table do
      printfn "%s, %s - %O" emp.LastName emp.FirstName emp.Birthday;;
Smith, Joe - 14/02/1965 00:00:00
Jones, Mary - 15/09/1985 00:00:00
Smith, Eve - 27/09/2007 00:00:00
Note that in the iteration, emp is known to have a strong type that allows you to access the fields LastName, FirstName, and Birthday.
Finally, it's very easy to dump out XML for your data:
> printf "%s" (employeesTable.GetXml());;
<NewDataSet>
<Table>
<EmpID>1001</EmpID>
<FirstName>Joe</FirstName>
<LastName>Smith</LastName>
<Birthday>1965-02-14T00:00:00+00:00</Birthday>
</Table>
<Table>
<EmpID>1002</EmpID>
<FirstName>Mary</FirstName>
<LastName>Jones</LastName>
<Birthday>1985-09-15T00:00:00+01:00</Birthday>
</Table>
<Table>
<EmpID>1003</EmpID>
<FirstName>Eve</FirstName>
<LastName>Smith</LastName>
<Birthday>2007-09-27T00:00:00+01:00</Birthday>
</Table>
</NewDataSet>
You can also write it to a file:
> System.IO.File.WriteAllText(file "employees.xml", employeesTable.GetXml());; val it : unit = ()
Stored procedures are defined and stored in your relational database and provide a number of benefits over literal SQL. First, they're external to the application and thus provide a clear division of the data logic from the rest of the application. This enables you to make data-related modifications without having to change application code or redeploy the application. Second, they're stored in the database in a prepared or compiled form and thus execute more efficiently than literal SQL statements (literal statements can also be prepared, at a one-time cost, but they still live in application space, which is undesirable). Supplying arguments to stored procedures instantiates the compiled formula.
In Visual Studio, you can add stored procedures just like any other database artifacts using the Server Explorer window: right-click the Stored Procedures item in the appropriate database, and select Add New Stored Procedure. Doing so creates a stored procedure template that you can easily customize. Alternatively, you can add stored procedures programmatically using the CREATE PROCEDURE
SQL command. Consider the following stored procedure that returns the first and last names of all employees whose last name matches the given pattern:
execNonQuery conn "
    CREATE PROCEDURE dbo.GetEmployeesByLastName ( @Name nvarchar(50) )
    AS
    SELECT Employees.FirstName, Employees.LastName
    FROM Employees
    WHERE Employees.LastName LIKE @Name"
You can wrap this stored procedure in a function as follows:
let GetEmployeesByLastName (name: string) =
    use comm = new SqlCommand("GetEmployeesByLastName", conn,
                              CommandType = CommandType.StoredProcedure)
    comm.Parameters.AddWithValue("@Name", name) |> ignore
    use adapter = new SqlDataAdapter(comm)
    let table = new DataTable()
    adapter.Fill(table) |> ignore
    table
You can execute the stored procedure as follows to find employees with the last name Smith:
> for row in (GetEmployeesByLastName "Smith").Rows do
      printfn "row = %O, %O" (row.Item("FirstName")) (row.Item("LastName"));;
row = Joe, Smith
row = Eve, Smith
val it : unit = ()
You saw in Chapter 14 how data tables can be visualized in web applications. The return value of GetEmployeesByLastName from the previous section is a DataTable. These objects can also be directly bound to a Windows Forms data grid, a visual data control that supports the DataSource property and that can display data in a tabular format (Windows Forms controls were discussed in Chapter 11):
open System.Windows.Forms

let emps = GetEmployeesByLastName "Smith"
let grid = new DataGrid(Width = 300, Height = 200, DataSource = emps)
let form = new Form(Visible = true, TopMost = true)
form.Controls.Add(grid)
Figure 15-1 shows what you see when you run this code.
Stored procedures can also perform deletions or updates (executed via the ExecuteNonQuery()
method of the command object) or return data through out parameters. You can define these using the OUTPUT
keyword after a single parameter definition in the stored procedure. When calling the stored procedure, the out parameter's direction must be set to ParameterDirection.Output
; after the stored procedure executes, its return value can be read using the Value
property of the given parameter.
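The pattern looks like the following sketch, which assumes the conn object and execNonQuery helper from earlier in the chapter; CountEmployeesByLastName is a hypothetical procedure introduced here purely for illustration:

```fsharp
open System.Data
open System.Data.SqlClient

// A hypothetical stored procedure that returns the number of matching
// employees through an OUTPUT parameter rather than a result set.
execNonQuery conn "
    CREATE PROCEDURE dbo.CountEmployeesByLastName
        ( @Name nvarchar(50), @Count int OUTPUT )
    AS
        SELECT @Count = COUNT(*) FROM Employees
        WHERE Employees.LastName LIKE @Name"

let CountEmployeesByLastName (name: string) =
    use comm = new SqlCommand("CountEmployeesByLastName", conn,
                              CommandType = CommandType.StoredProcedure)
    comm.Parameters.AddWithValue("@Name", name) |> ignore
    // Register the out parameter and mark its direction before executing.
    let countParam = comm.Parameters.Add("@Count", SqlDbType.Int)
    countParam.Direction <- ParameterDirection.Output
    comm.ExecuteNonQuery() |> ignore
    // After execution, the result is available via the parameter's Value.
    countParam.Value :?> int
```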
Many of the database tasks you saw earlier in this chapter can be easily performed using the built-in capabilities of Visual Studio. It also provides good tools for working with stored procedures and views, building SQL queries, and designing entity models.
Assuming you have a version of SQL Server installed, you can create a new SQL Server database in Visual Studio's Server Explorer (Ctrl+Alt+S, or View ➤ Server Explorer): right-click Data Connections, and select Create New SQL Server Database.
In addition to creating a new database, this process also creates a Server Explorer connection to the new database, which you can use in Visual Studio to browse, create, manipulate, and delete tables, triggers, views, stored procedures, functions, and other database artifacts.
Various tools exist to assist application architects in designing the data layer. On a logical level, entity-relationship (ER) models provide a visual representation of data tables and their relationships, showing table fields, primary and foreign keys, and various constraints. Visual Studio simplifies the design process and supports visual modeling; let's take a brief look at how you can exploit its capabilities.
In the previous section, you used Visual Studio to create an SQL Server database called company
; and earlier, you saw how to create a table to store data about employees. Let's extend this database with a new table that stores addresses. You can then link the existing Employees
table to it to store an optional address record for each employee. This means you allow multiple employees to live at the same address, but not multiple addresses for a given employee.
Start by creating the Addresses
table: right-click the Tables list item in Server Explorer, and select Add New Table.
Note that you designate AddID
as a non-null primary key (right-click, select Set Primary Key, and clear the Allow Nulls flag). Also, under Column Properties and Identity Specification, you should set the Is Identity
property to Yes
; this takes care of automatically incrementing the primary-key value when inserting new records.
Next, create a new database diagram, and add/drag your existing two tables, Employees
and Addresses
, onto it. Before you can add a relationship between these two tables, you must create a new nullable field in the Employees
table to store the address foreign key; call it AddressID
.
Now you're ready to link the two tables. First, right-click the Employees
table, and select Relationships. Next, click Add to add a new foreign-key relationship, name it FK_Employee_Address
under Identity
, and then click the ellipsis icon next to Tables and Columns Specification under General to configure the relationship. The foreign-key table (the table that stores references to rows in another table) is Employees
, which is grayed out because you started by adding a relationship to the Employees
table; but you can select the foreign-key field to be AddressID
. Then, select the primary-key table to be Addresses
, with the field AddID
. Doing so links the two tables by storing unique address IDs in the AddressID
field of the Employees
table records, giving you a one-to-many relationship between addresses and employees. Figure 15-4 shows the table design canvas after you're finished. Similarly, linking two primary keys yields a one-to-one relationship.
Linking tables via explicit relationships and controlling the nullability of the foreign-key columns gives you fine control of referential integrity. For instance, you run into a foreign-key violation if you try to remove a record that is linked to another by a foreign-key constraint. On the other hand, storing raw primary-key values without an explicit constraint in related tables loses these benefits.
In the previous example, you made the AddressID
field nullable, meaning it's possible that not every employee has an address. You can also control what happens if employees or addresses are removed or updated. (However, as a design decision, you need to carefully consider one-to-many relationships such as those shown earlier: should a change in one employee's address cause the same change in the addresses of all other employees with the same address? Probably, if it's the same address; otherwise, a particular employee should be updated with a different address record.) Placing a cascade on deletes removes all records that are linked to a record that is deleted; naturally, you should treat this option with care.
The following sections show how to perform relational database queries using FLinq, the F# way of doing LinqToSql queries. FLinq uses F# quotation metaprogramming to represent SQL queries. These are then translated across to SQL and executed using the Microsoft LINQ libraries that are part of .NET Framework 3.5 or higher. At the time of writing, you can work with FLinq as part of the F# Power Pack; be sure to check the latest samples in that distribution for full details and updates.
We assume you're working with the Northwnd.mdf
database, a common database used in many LINQ samples. You can download this sample database as part of the F# Power Pack, or from many other sources on the Web.
The first step in using LINQ with F# is to generate the code that implements the object/relational (O/R) mapping for the database tables to which you're connecting. Let's first look at why you want to do this. You learned previously how to create simple tables and store and retrieve data using SQL code. However, this approach doesn't work well for large data schemas with multiple related tables and constraints. For instance, creating a hierarchy of records in a number of related tables connected by foreign keys involves issuing multiple SQL statements to create the base data and to connect them in ways that obey the foreign-key and other constraints that may be in place. Instead, it can often be much better to view records in tables as if they were objects. This is part of what an O/R mapping provides.
You already saw that datasets can be used to fetch and manipulate data in a disconnected way. These datasets are managed by data adapters that handle the gory details of producing the appropriate SQL statements and issuing them to the underlying database when fetching records or synchronizing the changes made to the dataset back to the database. The main goal of a separate Data Access Layer (DAL) is to bridge the gap between two disparate domains: the database and your application logic. Practically speaking, this frees you from having to write SQL code and mingle it with your application code.
O/R mappings use smart data objects that can load and persist record-level data. Underneath, they use objects such as ADO.NET datasets that are filled and flushed on demand.
The tool this section focuses on is SqlMetal, which computes the O/R mapping for LINQ, part of the .NET Framework 3.5. For example, to use SqlMetal to generate bindings for the Northwnd.mdf
database, you can use this:
sqlmetal /code:northwind.cs /namespace:Nwind /server:.\SQLExpress Northwnd.mdf
This assumes you're running SQL Server Express on your machine as a named instance. Further options are available when you run SqlMetal without parameters. You may want to use the /code
and /xml
flags to specify a different output file of the object bindings and to create a schema definition, respectively. The resulting code by default uses C#, which you can change using the /language
option. You can easily add the generated mappings under a separate DLL project in your F# solution and reference it from your F# project.
The remainder of this section uses the generated data object layer for the classic Northwind database, as used by the F# LINQ samples in the F# distribution. This C# source file contains classes for each data table in the Northwind database, and the relationships between those tables are strongly typed.
Regardless of whether you work with SQL explicitly or implicitly, in applied database scenarios you frequently want to separate data definitions entirely from application code. DDL scripts that create schemas, tables, triggers, and views, as well as scripts that create seed data, should be managed as separate artifacts and executed prior to application deployment.
One of the classes (with the same name as your database) generated by SqlMetal represents the entire database. This class inherits from the DataContext
class and carries with it all the connection details that are used when accessing the data contained in the database through the mapped objects. You can supply a connection string when you instantiate this main database object, as the following code snippet shows (to run this, you need the Northwnd.mdf
file in your source directory):
#I @"c:\Program Files\Reference Assemblies\Microsoft\Framework\v3.5"
#r "System.Core.dll"
#r "System.Data.Linq.dll"
#r "FSharp.PowerPack.Linq.dll"
#r "Northwind.dll"

open System
open System.Data.SqlClient
open Nwind

let db =
    let connB = new SqlConnectionStringBuilder()
    connB.AttachDBFilename <- __YOUR_SOURCE_DIRECTORY__ + @"\Northwnd.mdf"
    connB.IntegratedSecurity <- true
    connB.Enlist <- false
    connB.DataSource <- @".\SQLExpress"
    new Northwnd(connB.ConnectionString)
So far, you've seen how you can perform LINQ-style queries using a set of aggregate operators that work on enumerable objects, and you see these same operators query and manipulate XML data in the coming section. Querying and manipulating relational data works in almost the same way, except that under the hood these operations are implemented by calling LINQ. Assuming you've mapped the Northwind database with SqlMetal and created the main database object as db
, here is an example similar to those in earlier sections using FLinq syntax:
open Microsoft.FSharp.Linq.Query

let res =
    query <@ seq { for emp in db.Employees do
                       if emp.BirthDate.Value.Year > 1960
                          && emp.LastName.StartsWith "S" then
                           yield (emp.FirstName, emp.LastName) }
             |> Seq.take 5 @>

for (first, last) in res do
    printfn "%s %s" first last
One notable difference between this and previous queries is the use of F# quotations inside <@
and @>
. As shown in Chapter 9, quotations are reified syntax trees that can be read and manipulated by F# programs. Quoted expressions (provided by the Microsoft.FSharp.Quotations
namespace) are of type Expr<'a>
, where 'a
is the type of the unquoted expression. You can read more about metaprogramming via F# quotations in depth in Chapter 9.
Note that in F# quotations, you can splice values bound to identifiers into the quotation expression tree. This allows quotations to reference defined objects and values such as those representing the database.
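For instance, a value bound outside the quotation can appear free inside it and is carried into the expression tree when the query runs. A small sketch, reusing the db object and the FLinq query function from earlier (the minYear binding is a name introduced here for illustration):

```fsharp
open Microsoft.FSharp.Linq.Query

// minYear is bound outside the quotation and spliced into the
// expression tree as a value, just like the db object itself.
let minYear = 1960

let recentEmployees =
    query <@ seq { for emp in db.Employees do
                       if emp.BirthDate.Value.Year > minYear then
                           yield emp.LastName } @>
```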
The type of the previous query
function is as follows:
val query: Quotations.Expr<'T> -> 'T
This function works on F# quotations. These are converted to LINQ Expression
objects and passed to the underlying LINQ mechanism to produce SQL that is then executed on demand.
Note When writing embedded queries using F# LinqToSql, you can use only a limited subset of operators to express your queries, in particular those defined in the F# LinqToSql library. Check the latest F# LINQ documentation for more details.
XML provides a way to represent data in plain-text format that is independent of the platform, operating system, and application. Nowadays, XML is ubiquitous; it's widely used to describe application configuration data, as an output format for applications such as Microsoft Word and Excel, to wrap data that is sent across networks, and as a way to interact with the new generation of database servers, including Oracle 8i and newer, as well as Microsoft SQL Server 2000 and above. These database servers can work with XML data directly, allowing you to update the database from XML documents or extract data in XML.
Data represented as XML carries various tags and metainformation that help to identify what sort of data is contained within. This results in a larger size, but typically this can be compensated for by applying compression on the XML text.
As an example, consider the following classic XML example (contacts.xml
):
<contacts>
    <contact>
        <name>John Smith</name>
        <phone type="home">+1 626-123-4321</phone>
    </contact>
</contacts>
One way to represent and work with XML documents is via the XML Document Object Model (DOM) contained in the System.Xml
namespace; you saw how to work with this model in Chapter 9. Using the XML DOM constructors and methods, you can create the previous XML as follows:
open System.Xml

let doc = new XmlDocument()
let rootNode = doc.CreateElement "contacts"
doc.AppendChild rootNode |> ignore

let contactNode = doc.CreateElement "contact"
let nameNode = doc.CreateElement "name"
let nameText = doc.CreateTextNode "John Smith"
let phoneNode = doc.CreateElement "phone"
phoneNode.SetAttribute("type", "home")
let phoneText = doc.CreateTextNode "+1 626-123-4321"

nameNode.AppendChild nameText |> ignore
contactNode.AppendChild nameNode |> ignore
contactNode.AppendChild phoneNode |> ignore
phoneNode.AppendChild phoneText |> ignore
rootNode.AppendChild contactNode |> ignore
Here, you build an XML document in a bottom-up fashion via a series of method calls that mutate the main XML document object. This means various XML elements can't be constructed without this document container object. In addition, construction by mutation makes the shape of the XML hard to read.
Using XmlWriter
, the result is a bit more readable:
let doc = new XmlDocument()
let writer = doc.CreateNavigator().AppendChild()
writer.WriteStartElement "contacts"
writer.WriteStartElement "contact"
writer.WriteElementString ("name", "John Smith")
writer.WriteStartElement "phone"
writer.WriteAttributeString ("type", "home")
writer.WriteString "+1 626-123-4321"
writer.WriteEndElement()
writer.Close()
Here, you don't have to worry about creating the structure of the document; instead, you output each element in a sequence. XmlWriter
also takes care of the closing tags, even if you forget them before closing the writer.
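You can check this by printing the document once the writer is closed (a small sketch, reusing the doc object built above):

```fsharp
// After Close, the writer has emitted the end tags that were never
// written explicitly (</contact> and </contacts> in this example).
printfn "%s" doc.OuterXml
```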
LINQ to XML (LinqToXml) offers a new and easier way to work with XML than using the traditional XML DOM. The System.Xml.Linq
namespace contains everything you need to construct, load and save, manipulate, and query over XML documents. You can reference the DLL containing this namespace as follows:
#I @"c:\Program Files\Reference Assemblies\Microsoft\Framework\v3.5"
#r "System.Xml.Linq.dll"
Being a data format for tree-like structures, XML trees are made up of a collection of XNode
objects. Structurally, two XNode
descendants can contain other nodes (and thus inherit from XContainer
, a subclass of XNode
): XDocument
and XElement
. Therefore, all XML documents are represented either as an XDocument
with nested XElement
objects or as a collection of nested XElement
s.
A fully qualified XML document (an instance of XDocument
) contains metainformation such as declarations (added as a child XDeclaration
object) and a DTD (added as an XDocumentType
) and may contain various XML processing instructions (instances of XProcessingInstruction
). Typically, you need an XDocument
only if the XML you produce is exchanged with the external world (information passed through a web service, for instance); in any other case, you're working with a collection of nested XElement
objects. These can have other XNode
objects such as XText
for storing text, which often represents binary data encoded using Base64 encoding mentioned in Chapter 10, or XComment
for embedding comments. On the other hand, XElement
s typically have attributes (XAttribute
)—key/value pairs, which are non-XNode
objects. Both the XElement
names (the tags themselves) and the XAttribute
keys are XName
objects.
The easiest way to construct an XML document is to call the Parse
method of the XDocument
or XElement
class on a string:
open System.Xml.Linq

let xml = "<contacts>
    <contact>
        <name>John Smith</name>
        <phone type=\"home\">+1 626-123-4321</phone>
    </contact>
</contacts>"
let doc = XDocument.Parse xml
LinqToXml makes functional construction of XML possible. It does so by making all XML constituents first-class values that can be created and embedded in each other. The functional construction also requires that you treat these values uniformly. Let's assume you create a few shorthand functions for various XML constructors:
open System.Xml.Linq

let xname n = XName.op_Implicit(n)
let xdoc (el: seq<XElement>) = new XDocument(Array.map box (Array.ofSeq el))
let xelem s el = new XElement(xname s, box el)
let xatt a b = new XAttribute(xname a, b) |> box
let xstr s = box s
Using these functions, you can construct the XML from the beginning of this section as follows:
let doc =
    xdoc
        [ xelem "contacts"
              [ xelem "contact"
                    [ (xelem "name" (xstr "John Smith"))
                      (xelem "phone"
                           [ xatt "type" "home"
                             xstr "+1 626-123-4321" ]) ] ] ]
This also includes the default document header (<?xml version="1.0" encoding="utf-8"?>
). If this header isn't needed, you can omit the top-level call to xdoc
.
Loading an existing XML document is straightforward; you can call the Load
static method on either an XDocument
or an XElement
:
open System.IO

let file name = Path.Combine(__YOUR_SOURCE_DIRECTORY__, name)
XElement.Load (file "contacts.xml")
Saving is just as easy, reusing doc
from earlier:
doc.Save (file "contacts.xml")
LinqToXml considers an XML document as a collection of XElement
objects, each in a parent/child relationship. The root XElement
's Parent
property is null, even if it's embedded in an XDocument
, which can be ignored as far as the data in the XML is concerned. You can obtain XElement
children using the Elements()
method, or its override, which expects an XName
argument and returns all elements with a given name:
let contacts = doc.Element(xname "contacts")   // get the <contacts> element
for elem in contacts.Elements() do
    printfn "Tag=%s, Value=%A" elem.Name.LocalName elem.Value
Queries on XML are often expressed using the XPath query language, which this book doesn't cover in detail but which is supported by the types in the System.Xml
namespace. As a good alternative to XPath, you can use the standard sequence operators to perform queries over XML data. The following example uses the helper functions and the file contacts.xml
created in the previous section:
open System
open System.Xml.Linq

let elem (e: XElement) s = e.Element(xname s)
let elemv e s = (elem e s).Value

let contactsXml = XElement.Load(file "contacts.xml")
let contacts = contactsXml.Elements ()
> contacts
  |> Seq.filter (fun e -> (elemv e "name").StartsWith "J")
  |> Seq.map (fun e -> (elemv e "name"), (elemv e "phone"));;
val it : seq<string * string> = seq [("John Smith", "+1 626-123-4321")]
This example also defines some helper functions: elem
to extract from an XElement
object the first child element with a given name, and elemv
to convert that to a string value.
You can also use the query operators in building new XML:
xelem "results"
    [ contacts |> Seq.filter (fun e -> (elemv e "name").StartsWith "J") ]
This creates a <results>
tag and inserts all contacts whose name starts with the letter J. You can also use sequence expressions to achieve the same result:
xelem "results"
    [ for e in contacts do
          if (elemv e "name").StartsWith "J" then
              yield e ]
In this chapter, you saw how the functional programming techniques from Chapter 3 are often used to implement in-memory queries similar to those used to access databases. You also learned how to use sequence expressions as an alternative notation for these query expressions. We then turned to databases themselves and covered how to use ADO.NET to access relational databases. You saw how to perform a variety of database-related tasks from Visual Studio. You next learned how to perform simple, typed queries using F# LinqToSql, taking particular advantage of the object/relational data objects generated by the LINQ tool SqlMetal.exe
. Finally, you saw how to use XML as a generic data format, partly by using functionality from the LINQ libraries.
The next chapter covers parsing techniques, including using the lexer and parser generator tools that come with the F# distribution.