Chapter 15. Working with Data

Software applications deal with data in a wide array of forms: single values such as integers or strings; composite values paired together as tuples, records, or objects; collections of smaller pieces of data represented as lists, sets, arrays, or sequences; XML strings with tags describing the shape and kind of data; and data coming from relational or object-oriented databases, just to name a few.

This chapter looks at ways of working with some common data sources:

  • In Chapter 3, you saw that sequences, similar to other enumerable data types such as lists, arrays, maps, and sets, have various aggregate iteration, query, and transform operators. This chapter first looks at how you can use these operators in a straightforward manner to form SQL-like operations over in-memory collections. This mechanism can be further tuned and applied to other data sources where the original data source is mapped to a sequence.

  • We cover how you can work with relational databases from F# programs using the ADO.NET libraries. You learn how to connect to a database; create, update, and delete tables and records using simple SQL statements; access database data sequentially; and work with parts of tables using disconnected in-memory representations. You also learn how to manage database connections and data sources; create databases; add new tables; and work with relationships, constraints, and stored procedures using Visual Studio.

  • We cover how you can use the Language Integrated Query (LINQ) infrastructure in combination with F# metaprogramming to bring relational data query logic into reach without explicit use of SQL. The essential goal is to write database queries in F# itself using the same techniques you use to query in-memory data structures.

  • Finally, the chapter looks at the use of XML as a generic data format. You saw in Chapter 9 how to work with the XML Document Object Model (DOM), and we briefly survey how to desugar XML into sequences using LinqToXml.

Querying In-Memory Data Structures

Query languages are often made up of building blocks that transform and filter data. Functional programming gives you the basic tools that allow you to apply standard query logic on all F# types that are compatible with the F# sequence type, such as F# lists, arrays, sequences, and anything else that implements the IEnumerable<'a>/seq<'a> interface.

Select/Where/From Queries Using Aggregate Operators

Consider the following aggregate operators described in Chapter 3:

let select = Seq.map
let where  = Seq.filter

Here you rename the operations in the F# Seq module to correspond to more standard database query terminology: the select operator is defined as a transform operation, because in a selection you take sequence elements and map (often narrow) them to new values. Similarly, the where operator is defined as a filter applied on a sequence that takes an 'a -> bool predicate to filter out the desired elements. You can use these aggregate operators in a straightforward manner to query and transform in-memory data. The only other piece you need is the glue that makes the query and transform operators work together: the standard pipe operator (|>). Recall that this operator flips the order of its arguments; being an infix operator, it feeds the left-side argument into the function on the right. This is useful because the argument to the function is seen before the function itself, propagating important typing information into the function.

For instance, given a (string * int * string) array representing the name, age, and department of various employees, you can select those names that start with the letter R as follows:

let people = [| ("Joe", 27, "Sales");  ("Rick", 35, "Marketing");
                ("Mark", 40, "Sales"); ("Rob", 31, "Administration");
                ("Bob", 34, "Marketing") |]

let namesR =
    people |> select (fun (name, age, dept) -> name)
           |> where  (fun name -> name.StartsWith "R")

Note that the types of name, age, and dept are inferred from people, and no type annotation is necessary. Finding those people who work in sales and are at least 30 years old is also straightforward:

> let namesSalesOver30 =
    people |> where  (fun (_, age, _)  -> age >= 30)
           |> where  (fun (_, _, dept) -> dept = "Sales")
           |> select (fun (name, _, _) -> name);;
val namesSalesOver30 : seq<string>

> namesSalesOver30;;
val it : seq<string> = seq ["Mark"]

Using Aggregate Operators in Queries

In the previous section, you used alternative names such as select and where for some standard F# operations such as Seq.map and Seq.filter. This is for illustrative purposes to show the connection between these operators and SQL-like querying. In most F# code, you should continue to use the standard F# operators from the Seq module.

In addition to the restriction (filter/where) and projection (map/select) operators, the Seq module contains other useful functions, many of which were described in Chapter 3; and you can easily define further operators. For instance, you can define sorting over sequences by using a custom sorting function together with Seq.sortBy.
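Continuing the renaming theme, you can, for instance, define an orderBy operator directly from Seq.sortBy (orderBy is our illustrative name, not a standard operator) and use it alongside the select and where operators and the people array defined earlier:

```fsharp
let orderBy = Seq.sortBy

// Sales people, youngest first, names only
let salesByAge =
    people |> where   (fun (_, _, dept) -> dept = "Sales")
           |> orderBy (fun (_, age, _)  -> age)
           |> select  (fun (name, _, _) -> name)
// seq ["Joe"; "Mark"]
```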

Another useful query-like function is Seq.truncate, which takes the first n elements and discards the rest. Using these functions—given, for example, an unbounded stream of random numbers—you can extract the first three even numbers and return a pair of those numbers and their squares in descending order, as the following example shows:

let rand = System.Random()
let numbers = seq { while true do yield rand.Next(1000) }

numbers |> Seq.filter (fun i -> i % 2 = 0)  // "where"
        |> Seq.truncate 3
        |> Seq.sortBy (fun i -> -i)         // sort descending
        |> Seq.map (fun i -> i, i*i)        // "select"
// random - results will vary!
val it : seq<int * int> = seq [(814, 662596); (686, 470596); (242, 58564)]

Accumulating Using Folding Operators

Some of the most general operators supported by most F# data structures are reduce, fold, and foldBack. These apply a function to each element of a collection and accumulate a result. For fold, the function is applied in left-to-right order; for foldBack, in right-to-left order. Both fold and foldBack take an initial value for the accumulator, whereas reduce uses the first element of the collection as the initial accumulator. For example:

> List.fold (fun acc x -> acc + x) 0 [4; 5; 6];;
val it : int = 15

> Seq.fold (fun acc x -> acc + x) 0.0 [4.0; 5.0; 6.0];;
val it : float = 15.0

> List.foldBack (fun x acc -> min x acc) [4; 5; 6; 3; 5] System.Int32.MaxValue;;
val it : int = 3

The following are equivalent, but no explicit anonymous function values are used:

> List.fold (+) 0 [4; 5; 6];;
val it : int = 15

> Seq.fold (+) 0.0 [4.0; 5.0; 6.0];;
val it : float = 15.0

> List.foldBack min [4; 5; 6; 3; 5] System.Int32.MaxValue;;
val it : int = 3

If used carefully, the various foldBack operators are pleasantly compositional, because they let you apply a selection function as part of the accumulating function:

> List.foldBack (fst >> min) [(3, "three"); (5, "five")] System.Int32.MaxValue;;
val it : int = 3

The F# library also includes more direct accumulation functions such as Seq.sum and Seq.sumBy. These use a fixed accumulation function (addition) with a fixed initial value (zero).
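For example, the following all compute sums; Seq.sumBy folds a projection into the accumulation (the Seq.fold version is shown only for comparison):

```fsharp
let total        = Seq.sum [1; 2; 3; 4]         // 10
let totalViaFold = Seq.fold (+) 0 [1; 2; 3; 4]  // 10

// sumBy applies a projection to each element before accumulating
let totalChars   = Seq.sumBy String.length ["one"; "two"; "three"]  // 11
```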

Warning

Folding operators are very powerful and can help you avoid many explicit uses of recursion or loops in your code. However, they're sometimes overused in functional programming and can be hard for novice users to read and understand. Take the time to document uses of these operators, or consider using them to build simpler operators that apply a particular accumulation function.

Expressing Some Queries Using Sequence Expressions

Using aggregate operators to form queries is closely related to the sequence-expression notation described in Chapter 3 and used frequently in this book. For example, namesSalesOver30 defined previously can also be defined as follows:

seq { for (name, age, dept) in people do
        if age >= 30 && dept = "Sales" then
          yield name }

This is a different notation for the same computation. For very simple queries, F# sequence expressions also support an even more compact form where if/then clauses are replaced by when, the final do is dropped from the for statement, and yield is replaced by ->:

seq { for (name, age, dept) in people
        when age >= 30 && dept = "Sales" -> name }

There is no difference between these two sequence expressions—it's just a matter of syntax.

You can use sequence expressions in conjunction with the |> operator. For example:

seq { for i in numbers do
          if i % 2 = 0 then
              yield (i, i*i) }
|> Seq.truncate 3
|> Seq.sort

There are pros and cons to using sequence-expression syntax for some parts of queries:

  • Sequence expressions are very good for the subset of queries expressed using iteration (for), mapping (select/yield), and filtering (if/then/when/where). They're particularly good for queries containing multiple nested for statements.

  • Other query constructs such as ordering, truncating, grouping, and aggregating must be expressed directly using aggregate operators such as Seq.sortBy and Seq.groupBy.

  • Some queries depend on the index position of an item within a stream. These are best expressed directly using aggregate operators such as Seq.mapi.

  • Many queries are part of a longer series of transformations chained by |> operators. Often, the type of the data being transformed at each step varies substantially through the chain of operators. These queries are best expressed using aggregate operator chains.
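For example, a query pairing elements from two sources via nested for statements is very direct in this notation. The following is a small, self-contained sketch using its own lists (depts and staff are illustrative values, not defined earlier):

```fsharp
let depts = [ "Sales"; "Marketing" ]
let staff = [ ("Joe", "Sales"); ("Rick", "Marketing"); ("Mark", "Sales") ]

// For each department, yield the names of its staff members
let byDept =
    seq { for dept in depts do
            for (name, d) in staff do
              if d = dept then
                yield (dept, name) }
// seq [("Sales", "Joe"); ("Sales", "Mark"); ("Marketing", "Rick")]
```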

Note

It's likely that in the future, the F# sequence-expression syntax will include support for specifying grouping and aggregation operations within the expression syntax. However, at the time of writing, it's necessary to explicitly use operators such as Seq.sortBy and Seq.groupBy for these operations.

Using Databases to Manage Data

Storing data in various files, reading them into memory, and querying and manipulating the resulting in-memory data collection is a sufficient approach for many applications working with data, but there comes a point where more persistent data handling is required.

For example, consider a business that has two applications that need to access the same set of data about employees. One way to do this is to store data in a text file and work with this data file from both applications. But this approach is vulnerable; you'd quickly realize how redundant text files can get when adding new pieces of data (say you wanted to add an address next to the name, department, and age of each employee) or if you wanted to group your employee data by introducing various relationships (say, multiple addresses per employee). Even if you managed to find a good storage alternative, you'd still have to modify both applications to read and write this new data format. Databases make scenarios like these much easier to cope with by enforcing centralized control of persistent data (data that is more persistent than what appears in a single application) and by giving you the freedom to define, manipulate, and query this data in an application-neutral way.

Databases provide many benefits. Some of the more important ones are listed here:

  • Data security: When you have centralized control of your data, you can erect a full security system around the data, implementing specific access rules for each type of access or parts of the database.

  • Sharing data: Any number of applications with the appropriate access rights can connect to your database and read the data stored within—without needing to worry about containing the logic to extract this data. As you see shortly, applications use various query languages (most notably SQL) to communicate with databases.

  • A logical organization of data: You can write new applications that work with the same data without having to worry about how the data is physically represented and stored. On the basic level, this logical structure is provided by a set of entities (data tables) and their relationships.

  • Avoiding data redundancy: Having all requirements from each consuming application up front helps to identify a logical organization for your data that minimizes possible redundancy. For instance, you can use foreign keys instead of duplicating pieces of data. Data normalization is the process of systematically eliminating data redundancy, a large but essential topic that we don't consider in this book.

  • Transactions: Reading from and writing to databases occurs atomically, and as a result, two concurrent transactions can never leave data in an inconsistent, inaccurate state. Isolation levels refer to specific measures taken to ensure transaction isolation by locking various parts of the database (fields, records, tables). Higher isolation levels increase locking overhead and can lead to a loss of parallelism by rendering concurrent transactions sequential; on the other hand, no isolation can lead to inconsistent data.

  • Maintaining data integrity: Databases make sure the data stored within is accurate. Having no redundancy is one way to maintain data integrity (if a piece of data is changed, it's changed in the only place it occurs; thus, it remains accurate); on the other hand, data security and transaction isolation are needed to ensure that the data stored is modified in a controlled manner.

Choosing Your Database Engine

Table 15-1 shows some of the most common database engines, all of which can be used from F# and .NET.

Table 15.1. Common Databases

Name

Type

Description

Available From

PostgreSQL

Open source

Open source database engine

http://postgresql.org/

SQLite

Open source

Small, embeddable, zero-configuration SQL database engine

www.sqlite.org/

DB2

Commercial

IBM's database engine

www-01.ibm.com/software/data/db2/ad/dotnet.html

Firebird

Open source

Based on Borland Interbase

www.firebirdsql.org/

MySQL

Open source

Reliable and popular database

www.mysql.com/

Mimer SQL

Commercial

Reliable database engine

www.mimer.com/

Oracle

Commercial

One of the most popular enterprise database engines

www.oracle.com/

SQL Server

Commercial

Microsoft's main database engine

www.microsoft.com/sql/default.mspx

SQL Server Express

Commercial

Free and easy-to-use version of SQL Server

www.microsoft.com/express/database/default.aspx

Sybase iAnywhere

Commercial

Mobile database engine

www.ianywhere.com/

Applications communicate with relational databases using Structured Query Language (SQL). Each time you create tables, create relationships, insert new records, or update or delete existing ones, you're explicitly or implicitly issuing SQL statements to the database. The examples in this chapter use a dialect of Standard SQL called Transact-SQL (T-SQL), used by SQL Server and SQL Server Express. SQL has syntax to define the structure of a database schema (loosely speaking, a collection of data tables and their relations) and also syntax to manage the data within. These subsets of SQL are called Data Definition Language (DDL) and Data Manipulation Language (DML), respectively. The most important DDL statements are the following:

CREATE/ALTER/DROP TABLE
CREATE/DROP VIEW

Understanding ADO.NET

ADO.NET is the central database-access machinery in the .NET Framework, and it provides full XML support, disconnected and typed datasets, scalability, and high performance. This section gives a brief overview of the ADO.NET fundamentals.

With ADO.NET, data is acquired through a connection to the database via a provider. This connection serves as a medium against which to execute a command; this can be used to fetch, update, insert, or delete data from the data store. Statements and queries are articulated as SQL text (CREATE, SELECT, UPDATE, INSERT, and DELETE statements) and are passed to the command object's constructor. When you execute these statements, you obtain data (in the case of queries) or the number of affected rows (in the case of UPDATE, INSERT, and DELETE statements). The data returned can be processed via two main mechanisms: sequentially in a read-only fashion using a DataReader object or by loading it into an in-memory representation (a DataSet object) for further disconnected processing. DataSet objects store data in a set of table objects along with metadata that describes their relationships and constraints in a fully contained model.

ADO.NET 2.0 comes with four data providers: SQLClient, OleDb, Oracle, and Odbc. Table 15-2 describes these and a couple of other commonly used providers. These providers act as the main gateway to the database.

Table 15.2. Common Data Providers

Name

Namespace

Available From

SQLClient

System.Data.SqlClient

.NET 2.0

OleDb

System.Data.OleDb

.NET 2.0

Oracle

System.Data.OracleClient

.NET 2.0

ODBC

System.Data.Odbc

.NET 2.0

PostgreSQL

Npgsql

http://npgsql.projects.postgresql.org/

MySql

MySql.Data.MySqlClient

http://dev.mysql.com/downloads/

The OleDb and ODBC data providers are provided for compatibility with earlier database-access technologies. All ADO.NET connection and command classes have the data provider name as the prefix to their class name, as in OdbcConnection and OdbcCommand or OleDbConnection and OleDbCommand.

Establishing Connections to a Database Engine

Before you can do any work with a database, you need to establish a connection to it. For instance, you can connect to a locally running instance of SQL Server 2005 Express using the following code:

open System.Data
open System.Data.SqlClient

let connString = @"Server='.\SQLEXPRESS';Integrated Security=SSPI"
let conn = new SqlConnection(connString)

The value connString is a connection string. Regardless of how you created your connection object, to execute any updates or queries on it, you need to open it first:

> conn.Open();;
val it : unit = ()

If this command fails, then you may need to do one of the following:

  • Install SQL Server 2005 Express or a newer version.

  • Consult the latest SQL Server Express samples for alternative connection strings.

  • Add UserInstance='true' to the connection string. This starts the database engine as a user-level process.

  • Change the connection string if you have a different database engine installed and running (for instance, if you're using SQL Server instead of SQL Server Express).

Connections established using the same connection string are pooled and reused depending on your database engine. Connections are often a limited resource and should generally be closed as soon as possible within your application.
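One simple way to achieve this is to scope the connection with a use binding so it's closed (and returned to the pool) as soon as the enclosing function finishes. The following is a sketch, assuming the connString value defined earlier:

```fsharp
open System.Data.SqlClient

let getServerVersion () =
    use conn = new SqlConnection(connString)
    conn.Open()
    conn.ServerVersion   // conn is disposed (closed) when the function returns
```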

Tip

The "More on Connection Strings" sidebar contains details about creating and managing connection strings. A useful web site for complete connection strings is www.connectionstrings.com.

Creating a Database

Now that you've established a connection to the database engine, you can explicitly create a database from F# code by executing a SQL statement directly. For example, you can create a database called company as follows:

open System.Data
open System.Data.SqlClient

let execNonQuery conn s =
    use comm = new SqlCommand(s, conn, CommandTimeout = 10)
    comm.ExecuteNonQuery() |> ignore

execNonQuery conn "CREATE DATABASE company"

You use execNonQuery in the subsequent sections. This function takes a connection object and a SQL string and executes the string as a SQL command, ignoring its result.

Note

If you try to create the same database twice, you receive a runtime exception. However, if you intend to drop an existing database, you can do so by issuing a DROP DATABASE company SQL command. The DROP command can also be used for other database artifacts, including tables, views, and stored procedures.

Creating Tables and Inserting and Fetching Records

You can execute a simple SQL command to create a table; all you need to do is specify its data fields and their types and whether null values are allowed. The following example creates an Employees table with a primary key EmpID and FirstName, LastName, and Birthday fields:

execNonQuery conn "CREATE TABLE Employees (
   EmpID int NOT NULL,
   FirstName varchar(50) NOT NULL,
   LastName varchar(50) NOT NULL,
   Birthday datetime,
   PRIMARY KEY (EmpID))"

You can now insert two new records as follows

execNonQuery conn "INSERT INTO Employees (EmpId, FirstName, LastName, Birthday)
   VALUES (1001, 'Joe', 'Smith', '02/14/1965')"

execNonQuery conn "INSERT INTO Employees (EmpId, FirstName, LastName, Birthday)
   VALUES (1002, 'Mary', 'Jones', '09/15/1985')"

and retrieve two columns of what was inserted using a fresh connection and a data reader:

let query() =
    seq { use conn = new SqlConnection(connString)
          do conn.Open()
          use comm = new SqlCommand("SELECT FirstName, Birthday FROM Employees",
                                     conn)
          use reader = comm.ExecuteReader()
          while reader.Read() do
              yield (reader.GetString 0, reader.GetDateTime 1)  }

When you evaluate the query expression in F# Interactive, a connection to the database is created and opened, the command is built, and the reader is used to read successive elements:

> fsi.AddPrinter(fun (d: System.DateTime) -> d.ToString());;
val it : unit = ()

> query();;
val it : seq<string * System.DateTime> =
     seq [("Joe", 14/02/1965 00:00:00); ("Mary", 15/09/1985 00:00:00)]

The definition of query uses sequence expressions that locally define new IDisposable objects such as conn, comm, and reader using declarations of the form use var = expr. These ensure that the locally defined connection, command, and reader objects are disposed after exhausting the entire sequence. See Chapters 4, 8, and 9 for more details about sequence expressions of this kind.

F# sequences are on-demand (that is, lazy), and the definition of query doesn't open a connection to the database. This is done when the sequence is first iterated; a connection is maintained until the sequence is exhausted.
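For example, none of the following lines touches the database until the final Seq.toList forces evaluation (assuming query as defined above):

```fsharp
let q = query()                        // builds the sequence; no connection yet
let firstTwo = q |> Seq.truncate 2     // still no connection
let results = firstTwo |> Seq.toList   // opens, reads up to two rows, disposes
```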

Note that the command object's ExecuteReader method returns a DataReader instance that is used to extract the typed data returned from the query. You can read from the resulting sequence in a straightforward manner using a sequence iterator. For instance, you can use a simple anonymous function to print data on the screen:

> query() |> Seq.iter (fun (fn, bday) -> printfn "%s has birthday %O" fn bday);;
Joe has birthday 14/02/1965 00:00:00
Mary has birthday 15/09/1985 00:00:00
val it : unit = ()

The query brings the data from the database in-memory, although still as a lazy sequence. You can then use standard F# in-memory data transformations on the result:

> query()
  |> Seq.filter (fun (nm, bday) -> bday < System.DateTime.Parse("01/01/1985"))
  |> Seq.length;;
val it : int = 1

However, be aware that these additional transformations are happening in-memory and not in the database.

The command object has different methods for executing different kinds of statements. For non-SELECT statements, you use the ExecuteNonQuery method (for UPDATE, INSERT, and DELETE statements, as previously in execNonQuery), which returns the number of rows affected (updated, inserted, or deleted). The ExecuteScalar method returns the first column of the first row of the result, providing a fast and efficient way to extract a single value, such as the number of rows in a table or a result set.
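As a sketch, a row count can be fetched with ExecuteScalar against the Employees table created earlier (the boxed result must be cast back to the expected type):

```fsharp
let countEmployees () =
    use comm = new SqlCommand("SELECT COUNT(*) FROM Employees", conn)
    // ExecuteScalar returns obj; COUNT(*) comes back as a boxed int
    comm.ExecuteScalar() :?> int
```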

In the previous command, you extracted fields from the result rows using GetXXX methods on the reader object. The particular methods have to match the types of the fields selected in the SQL query, and a mismatch results in a runtime InvalidCastException. For these and other reasons, DataReader tends to be suitable only in situations when the following items are true:

  • You need to read data only in a sequential order (as returned from the database). DataReader provides forward-only data access.

  • The field types of the result are known, and the query isn't configurable.

  • You're reading only and not writing data. DataReader provides read-only access.

  • Your use of the DataReader is localized. The data connection is open throughout the reader loop.

Using Untyped Datasets

Datasets allow applications to work in a disconnected mode, which means the application connects to the database, loads relevant data to an in-memory representation, and processes the data locally. When the processing is completed, the data in the in-memory dataset can be synchronized with the database.

ADO.NET datasets are compact in-memory databases and provide similar functionality to real databases; however, they're designed to work with a limited amount of data. A DataSet contains a collection of tables, the relationships between those tables, and various constraints. Tables are represented by the DataTable type. And as in a SQL server, tables are defined by their columns (DataColumn objects), and the data is stored in rows (DataRow objects).

In ADO.NET, the loading of data into the dataset and the synchronization to the database are coordinated by data adapters—for instance, SqlDataAdapter objects. Consider the following example, which defines a buildDataSet function that takes a connection and a SQL SELECT query and returns the resulting DataSet object:

let dataAdapter = new SqlDataAdapter()

let buildDataSet conn queryString =
    dataAdapter.SelectCommand <- new SqlCommand(queryString, conn)
    let dataSet = new DataSet()
    // Constructing the command builder configures the adapter's
    // INSERT, UPDATE, and DELETE commands as a side effect
    let _ = new SqlCommandBuilder(dataAdapter)
    dataAdapter.Fill(dataSet) |> ignore  // ignore the number of records returned
    dataSet

The inferred types are as follows:

val dataAdapter : SqlDataAdapter
val buildDataSet : SqlConnection -> string -> DataSet

When setting up the data adapter, you initialize its SelectCommand property to the query string that was passed as an argument. This SQL query is used to populate the DataSet when the Fill method is called. The somewhat mysterious line let _ = new SqlCommandBuilder(dataAdapter) creates a command-builder object, which has the side effect of building the INSERT, UPDATE, and DELETE commands automatically from the query in SelectCommand, making the dataset capable of persisting changes. Also, note that you don't have to worry about opening or closing connections, because this is taken care of with the data adapter; all you need is the connection object itself:

let dataSet =
    buildDataSet conn "SELECT EmpID, FirstName, LastName, Birthday from Employees"

The resulting DataSet contains a single table; you obtain it by index and print its content. This table is assembled based on the query and contains the columns that are part of the SELECT statement:

let table = dataSet.Tables.Item(0)
for row in table.Rows do
    printfn "%O, %O - %O"
        (row.Item "LastName")
        (row.Item "FirstName")
        (row.Item "Birthday")

When run in F# Interactive, this produces the following output:

Smith, Joe - 14/02/1965 00:00:00
Jones, Mary - 15/09/1985 00:00:00

You can refer to each column of a row by the same field name used in the SELECT query. Adding a new row to table is treated similarly:

let row = table.NewRow()
row.Item("EmpID") <- 1003
row.Item("FirstName") <- "Eve"
row.Item("LastName") <- "Smith"
row.Item("Birthday") <- System.DateTime.Today
table.Rows.Add row
dataAdapter.Update(dataSet) |> ignore  // ignore the number of affected rows

Repeating the SQL query from the previous section reveals the addition of the new entry to the database table:

> query();;
val it : seq<string * System.DateTime> =
    seq [("Joe", 14/02/1965 00:00:00);
         ("Mary", 15/09/1985 00:00:00);
         ("Eve", 27/09/2007 00:00:00)]

Note that you use the INSERT statement that was built by the command-builder object based on the selection query. Using untyped datasets is a great way to execute dynamic queries and to provide ad hoc access to your data. On the other hand, the lack of strong typing means it suffers from possible type mismatches or incorrect field names.

Generating Typed Datasets Using xsd.exe

Typed datasets are derived from ordinary datasets and allow you to work with data in a type-safe manner. This means that instead of (row.Item "FirstName" :?> string), you can write row.FirstName. You can create typed datasets using the xsd.exe command-line tool included in the .NET SDK. Among other things, this tool generates source code from XML schema documents; to use it, you must obtain an XSD document for the tables for which you want data objects. (You can quickly generate an XML schema document using Visual Studio.)

Alternatively, you can extract this schema definition using code:

open System.IO

let dataSet2 = buildDataSet conn "SELECT * FROM Employees"
let file name = Path.Combine(@"c:\fsharp", name)

File.WriteAllText(file "employees.xsd", dataSet2.GetXmlSchema())

Using this extracted XSD document, you can now fire up xsd.exe from the command-line shell:

C:\fsharp> xsd.exe employees.xsd /d /l:CS /n:Employees.Data
Writing file 'C:\fsharp\Employees.cs'.
C:\fsharp> csc /target:library Employees.cs

This generates a C# file Employees.cs in the Employees.Data namespace, containing a typed dataset for the Employees table, which you then compile to Employees.dll. You can now reference this DLL and create an instance of a typed dataset:

> #r @"employees.dll";;
Binding session to 'C:\fsharp\Employees.dll'...

> let employeesTable = new Employees.Data.NewDataSet();;
val employeesTable : Employees.Data.NewDataSet

> dataAdapter.Fill(employeesTable) |> ignore;;  // ignore the number of records
val it : unit = ()

> for emp in employeesTable._Table do
    printfn "%s, %s - %O" emp.LastName emp.FirstName emp.Birthday;;
Smith, Joe - 14/02/1965 00:00:00
Jones, Mary - 15/09/1985 00:00:00
Smith, Eve - 27/09/2007 00:00:00

Note that in the iteration emp is known to have a strong type that allows you to access the fields LastName, FirstName, and Birthday.

Finally, it's very easy to dump out XML for your data:

> printf "%s" (employeesTable.GetXml());;
<NewDataSet>
  <Table>
    <EmpID>1001</EmpID>
    <FirstName>Joe</FirstName>
    <LastName>Smith</LastName>
    <Birthday>1965-02-14T00:00:00+00:00</Birthday>
  </Table>
  <Table>
    <EmpID>1002</EmpID>
    <FirstName>Mary</FirstName>
    <LastName>Jones</LastName>
    <Birthday>1985-09-15T00:00:00+01:00</Birthday>
  </Table>

  <Table>
    <EmpID>1003</EmpID>
    <FirstName>Eve</FirstName>
    <LastName>Smith</LastName>
    <Birthday>2007-09-27T00:00:00+01:00</Birthday>
  </Table>
</NewDataSet>

You can also write it to a file:

> System.IO.File.WriteAllText(file "employees.xml", employeesTable.GetXml());;
val it : unit = ()

Using Stored Procedures

Stored procedures are defined and stored in your relational database and provide a number of benefits over literal SQL. First, they're external to the application and thus provide a clear division of the data logic from the rest of the application. This enables you to make data-related modifications without having to change application code or redeploy the application. Second, they're stored in the database in a prepared or compiled form and thus execute more efficiently than literal SQL statements (those can be prepared as well at a one-time cost, but they're still contained in application space, which is undesirable). Supplying arguments to a stored procedure instantiates this compiled form.

In Visual Studio, you can add stored procedures just like any other database artifacts using the Server Explorer window: right-click the Stored Procedures item in the appropriate database, and select Add New Stored Procedure. Doing so creates a stored procedure template that you can easily customize. Alternatively, you can add stored procedures programmatically using the CREATE PROCEDURE SQL command. Consider the following stored procedure that returns the first and last names of all employees whose last name matches the given pattern:

execNonQuery conn "
CREATE PROCEDURE dbo.GetEmployeesByLastName
        (
        @Name nvarchar(50)
        )
AS
    SELECT
        Employees.FirstName, Employees.LastName
    FROM Employees
    WHERE Employees.LastName LIKE @Name"

You can wrap this stored procedure in a function as follows:

let GetEmployeesByLastName (name: string) =
    use comm = new SqlCommand("GetEmployeesByLastName", conn,
                              CommandType=CommandType.StoredProcedure)

    comm.Parameters.AddWithValue("@Name", name) |> ignore
    use adapter = new SqlDataAdapter(comm)
    let table = new DataTable()
    adapter.Fill(table) |> ignore
    table

You can execute the stored procedure as follows to find employees with the last name Smith:

> for row in (GetEmployeesByLastName "Smith").Rows do
     printfn "row = %O, %O" (row.Item("FirstName")) (row.Item("LastName"));;
row = Joe, Smith
row = Eve, Smith
val it : unit = ()

Using Data Grids

You saw in Chapter 14 how data tables can be visualized in web applications. The return value of GetEmployeesByLastName from the previous section is a DataTable. These objects can also be directly bound to a Windows Forms data grid, a visual data control that supports the DataSource property and that can display data in a tabular format (Windows Forms controls were discussed in Chapter 11):

open System.Windows.Forms

let emps = GetEmployeesByLastName "Smith"
let grid = new DataGrid(Width=300, Height=200, DataSource=emps)
let form = new Form(Visible=true, TopMost=true)
form.Controls.Add(grid)

Figure 15-1 shows what you see when you run this code.


Figure 15.1. Calling the GetEmployeesByLastName stored procedure

Stored procedures can also perform deletions or updates (executed via the ExecuteNonQuery method of the command object) or return data through out parameters. You can define these using the OUTPUT keyword after a parameter definition in the stored procedure. When calling the stored procedure, the out parameter's direction must be set to ParameterDirection.Output; after the stored procedure executes, its return value can be read using the Value property of the given parameter.
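As a sketch of the out-parameter pattern, the following assumes a hypothetical stored procedure GetEmployeeCount with a single @Count OUTPUT parameter of type int, together with the conn connection object used earlier:

```fsharp
open System.Data
open System.Data.SqlClient

// Assumes a hypothetical stored procedure:
//   CREATE PROCEDURE dbo.GetEmployeeCount (@Count int OUTPUT)
//   AS SELECT @Count = COUNT(*) FROM Employees
let getEmployeeCount () =
    use comm = new SqlCommand("GetEmployeeCount", conn,
                              CommandType=CommandType.StoredProcedure)
    // Declare the parameter and mark it as an out parameter
    let countParam = comm.Parameters.Add("@Count", SqlDbType.Int)
    countParam.Direction <- ParameterDirection.Output
    comm.ExecuteNonQuery() |> ignore
    // After execution, the result is available via Value
    countParam.Value :?> int
```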

Working with Databases in Visual Studio

Many of the database tasks you saw earlier in this chapter can be easily performed using the built-in capabilities of Visual Studio. It also provides good tools for working with stored procedures and views, building SQL queries, and designing entity models.

Creating a Database

Assuming you have a version of SQL Server installed, you can create a new SQL Server database in Visual Studio's Server Explorer (Ctrl+Alt+S, or View ➤ Server Explorer): right-click Data Connections, and select Create New SQL Server Database (see Figure 15-2).


Figure 15.2. Creating a new Microsoft SQL Server database

In addition to creating a new database, this process also creates a Server Explorer connection to the new database, which you can use in Visual Studio to browse, create, manipulate, and delete tables, triggers, views, stored procedures, functions, and other database artifacts.

Visual Data Modeling: Adding Relationships

Various tools exist to assist application architects in designing the data layer. On a logical level, entity-relationship (ER) models provide a visual representation of data tables and their relationships, showing table fields, primary and foreign keys, and various constraints. Visual Studio simplifies the design process and supports visual modeling; let's take a brief look at how you can exploit its capabilities.

In the previous section, you used Visual Studio to create an SQL Server database called company; and earlier, you saw how to create a table to store data about employees. Let's extend this database with a new table that stores addresses. You can then link the existing Employees table to it to store an optional address record for each employee. This means you allow multiple employees to live at the same address, but not multiple addresses for a given employee.

Start by creating the Addresses table: right-click the Tables item under the appropriate database in Server Explorer, and select Add New Table. Define its columns as shown in Figure 15-3.


Figure 15.3. The Addresses table in the designer mode

Note that you designate AddID as a non-null primary key (right-click, select Set Primary Key, and clear the Allow Nulls flag). Also, under Column Properties and Identity Specification, you should set the Is Identity property to Yes; this takes care of automatically incrementing the primary-key value when inserting new records.

Next, create a new database diagram, and add/drag your existing two tables, Employees and Addresses, onto it. Before you can add a relationship between these two tables, you must create a new nullable field in the Employees table to store the address foreign key; call it AddressID.

Now you're ready to link the two tables. First, right-click the Employees table, and select Relationships. Next, click Add to add a new foreign-key relationship, name it FK_Employee_Address under Identity, and then click the ellipsis icon next to Tables and Columns Specification under General to configure the relationship. The foreign-key table (the table that stores references to rows in another table) is Employees, which is grayed out because you started by adding a relationship to the Employees table; but you can select the foreign-key field to be AddressID. Then, select the primary-key table to be Addresses, with the field AddID. Doing so links the two tables by storing unique address IDs in the AddressID field of the Employees table records, giving you a one-to-many relationship between addresses and employees. Figure 15-4 shows the table design canvas after you're finished. Similarly, linking two primary keys yields a one-to-one relationship.

Note

Linking tables via explicit relationships and controlling the nullability of the foreign-key columns gives you fine control of referential integrity. For instance, you run into a foreign-key violation if you try to remove a record that is linked to another by a foreign-key constraint. On the other hand, storing raw primary-key values without an explicit constraint in related tables loses these benefits.


Figure 15.4. Adding a one-to-many relationship between Employees and Addresses

In the previous example, you made the AddressID field nullable, meaning it's possible that not every employee has an address. You can also control what happens if employees or addresses are removed or updated. (However, as a design decision, you need to carefully consider one-to-many relationships such as those shown earlier: should a change in one employee's address cause the same change in the addresses of all other employees with the same address? Probably, if it's the same address; otherwise, a particular employee should be updated with a different address record.) Placing a cascade on deletes removes all records that are linked to a record that is deleted; naturally, you should treat this option with care.
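When creating relationships in raw SQL rather than in the designer, a delete cascade can be declared on the foreign key itself. The following sketch (assuming the execNonQuery helper and the table and column names used earlier) is one way to express it:

```fsharp
// Declare the foreign key with a cascade on deletes: removing an
// address record also removes every employee record that references it,
// so this option must be used with care.
execNonQuery conn "
ALTER TABLE Employees
ADD CONSTRAINT FK_Employee_Address
    FOREIGN KEY (AddressID) REFERENCES Addresses (AddID)
    ON DELETE CASCADE"
```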

Accessing Relational Data with Linq Queries

The following sections show how to perform relational database queries using FLinq, the F# way of doing LinqToSql queries. FLinq uses F# quotation metaprogramming to represent SQL queries. These are then translated across to SQL and executed using the Microsoft LINQ libraries that are part of .NET Framework 3.5 or higher. At the time of writing, you can work with FLinq as part of the F# Power Pack; be sure to check the latest samples in that distribution for full details and updates.

We assume you're working with the Northwnd.mdf database, a common database used in many LINQ samples. You can download this sample database as part of the F# Power Pack, or from many other sources on the Web.

Generating the Object/Relational Mapping

The first step in using LINQ with F# is to generate the code that implements the object/relational (O/R) mapping for the database tables to which you're connecting. Let's first look at why you want to do this. You learned previously how to create simple tables and store and retrieve data using SQL code. However, this approach doesn't work well for large data schemas with multiple related tables and constraints. For instance, creating a hierarchy of records in a number of related tables connected by foreign keys involves issuing multiple SQL statements to create the base data and to connect them in ways that obey the foreign-key and other constraints that may be in place. Instead, it can often be much better to view records in tables as if they were objects. This is part of what an O/R mapping provides.

You already saw that datasets can be used to fetch and manipulate data in a disconnected way. These datasets are managed by data adapters that handle the gory details of producing the appropriate SQL statements and issuing them to the underlying database when fetching records or synchronizing the changes made to the dataset back to the database. The main goal of a separate Data Access Layer (DAL) is to bridge the gap between two disparate domains: the database and your application logic. Practically speaking, this frees you from having to write SQL code and mingle it with your application code.

O/R mappings use smart data objects that can load and persist record-level data. Underneath, they use objects such as ADO.NET datasets that are filled and flushed on demand.

The tool this section focuses on is SqlMetal, which computes the O/R mapping for LINQ, part of the .NET Framework 3.5. For example, to use SqlMetal to generate bindings for the Northwnd.mdf database, you can use this:

sqlmetal /code:northwind.cs /namespace:Nwind /server:.\SQLExpress Northwnd.mdf

This assumes you're running SQL Server Express on your machine as a named instance. Further options are available when you run SqlMetal without parameters. You may want to use the /code and /xml flags to specify a different output file of the object bindings and to create a schema definition, respectively. The resulting code by default uses C#, which you can change using the /language option. You can easily add the generated mappings under a separate DLL project in your F# solution and reference it from your F# project.

The remainder of this section uses the generated data object layer for the classic Northwind database, as used by the F# LINQ samples in the F# distribution. This C# source file contains classes for each data table in the Northwind database, and the relationships between those tables are strongly typed.

Tip

Regardless of your choice to work with SQL explicitly or implicitly, in applied database scenarios you frequently want to separate data definitions entirely from application code. DDL scripts that create schemas, tables, triggers, and views and scripts that create seed data should be managed as separate artifacts that are executed prior to application deployment.

Building the DataContext Instance

One of the classes (with the same name as your database) generated by SqlMetal represents the entire database. This class inherits from the DataContext class and carries with it all the connection details that are used when accessing the data contained in the database through the mapped objects. You can supply a connection string when you instantiate this main database object, as the following code snippet shows (to run this, you need the Northwnd.mdf file in your source directory):

#I @"C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.5"
#r "System.Core.dll"
#r "System.Data.Linq.dll"
#r "FSharp.PowerPack.Linq.dll"
#r "Northwind.dll"

open System
open System.Data.SqlClient
open Nwind

let db =
    let connB = new SqlConnectionStringBuilder()
    connB.AttachDBFilename <- __YOUR_SOURCE_DIRECTORY__ + @"\Northwnd.mdf"
    connB.IntegratedSecurity <- true
    connB.Enlist <- false
    connB.DataSource <- @".\SQLExpress"
    new Northwnd(connB.ConnectionString)

Using LINQ from F#

So far, you've seen how you can perform LINQ-style queries using a set of aggregate operators that work on enumerable objects, and you see these same operators query and manipulate XML data in the coming section. Performing querying and manipulation on relational data is done almost the same way, except that these are implemented under the hood by calling LINQ. Assuming you've mapped your employees database with SqlMetal and created the main database object as db, here is an example similar to those in earlier sections using FLinq syntax:

open Microsoft.FSharp.Linq.Query

let res =
    query <@ seq { for emp in db.Employees do
                         if emp.BirthDate.Value.Year > 1960
                           && emp.LastName.StartsWith "S" then
                             yield (emp.FirstName, emp.LastName) }
          |> Seq.take 5 @>

for (first, last) in res do
  printfn "%s %s" first last

One notable difference between this and previous queries is the use of F# quotations inside <@ and @>. As shown in Chapter 9, quotations are reified syntax trees that can be read and manipulated by F# programs. Quoted expressions (provided by the Microsoft.FSharp.Quotations namespace) are of type Expr<'a>, where 'a is the type of the unquoted expression. You can read more about metaprogramming via F# quotations in depth in Chapter 9.

Note that in F# quotations, you can splice values bound to identifiers into the quotation expression tree. This allows quotations to reference defined objects and values such as those representing the database.
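For example, a parameterized query can be built by splicing an ordinary F# value into the quotation; in the following sketch (using the same db object as before), the prefix value is captured from the enclosing scope when the quotation is constructed, so the generated SQL is parameterized by it:

```fsharp
// 'prefix' is an ordinary F# value spliced into the quoted query
let employeesStartingWith (prefix: string) =
    query <@ seq { for emp in db.Employees do
                       if emp.LastName.StartsWith prefix then
                           yield (emp.FirstName, emp.LastName) } @>
```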

The type of the previous query function is as follows:

val query: Quotations.Expr<'T> -> 'T

This function works on F# quotations. These are converted to LINQ Expression objects and passed to the underlying LINQ mechanism to produce SQL that is then executed on demand.

Note

When writing embedded queries using F# LinqToSql, you can use only a limited subset of operators to express your queries, in particular those defined in the F# LinqToSql library. Check the latest F# LINQ documentation for more details.

Working with XML as a Generic Data Format

XML provides a way to represent data in plain-text format that is independent of the platform, operating system, and application. Nowadays, XML is ubiquitous; it's widely used to describe application configuration data, as an output format for applications such as Microsoft Word and Excel, to wrap data that is sent across networks, and as a way to interact with the new generation of database servers, including Oracle 8i and newer, as well as Microsoft SQL Server 2000 and above. These database servers can work with XML data directly, allowing you to update the database from XML documents or extract data in XML.

Data represented as XML carries various tags and metainformation that help to identify what sort of data is contained within. This results in a larger size, but typically this can be compensated for by applying compression on the XML text.
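One way to measure this trade-off is to compress the XML text with GZipStream from System.IO.Compression; the following is a minimal sketch (the size ratio you observe depends on the document, and highly tagged XML typically compresses well):

```fsharp
open System.IO
open System.IO.Compression
open System.Text

// Compress an XML string with gzip and compare the sizes
let reportCompressedSize (xml: string) =
    let raw = Encoding.UTF8.GetBytes xml
    use buffer = new MemoryStream()
    do
        // Dispose the GZipStream before reading the buffer,
        // so the final compressed block is flushed
        use gzip = new GZipStream(buffer, CompressionMode.Compress)
        gzip.Write(raw, 0, raw.Length)
    printfn "raw: %d bytes, compressed: %d bytes" raw.Length buffer.Length
```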

As an example, consider the following classic XML example (contacts.xml):

<contacts>
  <contact>
    <name>John Smith</name>
    <phone type="home">+1 626-123-4321</phone>
  </contact>
</contacts>

One way to represent and work with XML documents is via the XML Document Object Model (DOM) contained in the System.Xml namespace; you saw how to work with this model in Chapter 9. Using the XML DOM constructors and methods, you can create the previous XML as follows:

open System.Xml

let doc = new XmlDocument()
let rootNode = doc.CreateElement "contacts"
doc.AppendChild rootNode |> ignore
let contactNode = doc.CreateElement "contact"
let nameNode = doc.CreateElement "name"
let nameText = doc.CreateTextNode "John Smith"
let phoneNode = doc.CreateElement "phone"
phoneNode.SetAttribute("type", "home")
let phoneText = doc.CreateTextNode "+1 626-123-4321"
nameNode.AppendChild nameText |> ignore
contactNode.AppendChild nameNode |> ignore
contactNode.AppendChild phoneNode |> ignore
phoneNode.AppendChild phoneText |> ignore
rootNode.AppendChild contactNode |> ignore

Here, you build an XML document in a bottom-up fashion via a series of method calls that mutate the main XML document object. This means various XML elements can't be constructed without this document container object. In addition, construction by mutation makes the shape of the XML hard to read.

Using XmlWriter, the result is a bit more readable:

let doc = new XmlDocument()
let writer = doc.CreateNavigator().AppendChild()
writer.WriteStartElement "contacts"
writer.WriteStartElement "contact"
writer.WriteElementString ("name", "John Smith")
writer.WriteStartElement "phone"
writer.WriteAttributeString ("type", "home")
writer.WriteString "+1 626-123-4321"
writer.WriteEndElement()
writer.Close()

Here, you don't have to worry about creating the structure of the document; instead, you output each element in a sequence. XmlWriter also takes care of the closing tags, even if you forget them before closing the writer.

Constructing XML via LINQ

LINQ to XML (LinqToXml) offers a new and easier way to work with XML than using the traditional XML DOM. The System.Xml.Linq namespace contains everything you need to construct, load and save, manipulate, and query over XML documents. You can reference the DLL containing this namespace as follows:

#I @"C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.5"
#r "System.Xml.Linq.dll"

XML is a format for tree-like data, and in LinqToXml, XML trees are made up of a collection of XNode objects. Structurally, two XNode descendants can contain other nodes (and thus inherit from XContainer, a subclass of XNode): XDocument and XElement. Therefore, all XML documents are represented either as an XDocument with nested XElement objects or as a collection of nested XElements.

A fully qualified XML document (an instance of XDocument) contains metainformation such as declarations (added as a child XDeclaration object) and a DTD (added as an XDocumentType) and may contain various XML processing instructions (instances of XProcessingInstruction). Typically, you need an XDocument only if the XML you produce is exchanged with the external world (information passed through a web service, for instance); in any other case, you're working with a collection of nested XElement objects. These can have other XNode objects such as XText for storing text, which often represents binary data encoded using Base64 encoding mentioned in Chapter 10, or XComment for embedding comments. On the other hand, XElements typically have attributes (XAttribute)—key/value pairs, which are non-XNode objects. Both the XElement names (the tags themselves) and the XAttribute keys are XName objects.
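To illustrate these classes, the following sketch builds a fully qualified document, with a declaration, a comment, and an attribute, using the constructors directly (the xn shorthand is a local helper, like the xname function defined later in this section):

```fsharp
open System.Xml.Linq

// Shorthand for converting strings to XName values
let xn s = XName.op_Implicit s

// A fully qualified document: declaration + comment + element tree
let fullDoc =
    new XDocument(
        new XDeclaration("1.0", "utf-8", "yes"),
        new XComment("generated example"),
        new XElement(xn "contacts",
            new XElement(xn "contact",
                new XElement(xn "name", "John Smith"),
                new XElement(xn "phone",
                    new XAttribute(xn "type", "home"),
                    "+1 626-123-4321"))))
```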

The easiest way to construct an XML document is to call the Parse method of the XDocument or XElement class on a string:

open System.Xml.Linq

let xml =
   "<contacts>
      <contact>
         <name>John Smith</name>
         <phone type=\"home\">+1 626-123-4321</phone>
      </contact>
   </contacts>"
let doc = XDocument.Parse xml

LinqToXml makes functional construction of XML possible. It does so by making all XML constituents first-class values that can be created and embedded in each other. The functional construction also requires that you treat these values uniformly. Let's assume you create a few shorthand functions for various XML constructors:

open System.Xml.Linq

let xname n                    = XName.op_Implicit(n)
let xdoc (el: seq<XElement>)   = new XDocument(Array.map box (Array.ofSeq el))
let xelem s el                 = new XElement(xname s, box el)
let xatt  a b                  = new XAttribute(xname a, b) |> box
let xstr  s                    = box s

Using these functions, you can construct the XML from the beginning of this section as follows:

let doc =
    xdoc
        [ xelem "contacts"
            [ xelem "contact"
                [ (xelem "name" (xstr "John Smith"))
                  (xelem "phone"
                       [ xatt "type" "home"
                         xstr "+1 626-123-4321" ])
                ]
            ]
        ]

This also includes the default document header (<?xml version="1.0" encoding="utf-8"?>). If this header isn't needed, you can omit the top-level call to xdoc.

Storing, Loading, and Traversing LinqToXml Documents

Loading an existing XML document is straightforward; you can call the Load static method on either an XDocument or an XElement:

open System.IO

let file name = Path.Combine(__YOUR_SOURCE_DIRECTORY__, name)

XElement.Load (file "contacts.xml")

Saving is just as easy, reusing doc from earlier:

doc.Save (file "contacts.xml")

LinqToXml considers an XML document as a collection of XElement objects, each in a parent/child relationship. The root XElement's Parent property is null, even if it's embedded in an XDocument, which can be ignored as far as the data in the XML is concerned. You can obtain XElement children using the Elements() method, or its override, which expects an XName argument and returns all elements with a given name:

let contacts = doc.Element(xname "contacts")  // Get the root <contacts> element
for elem in contacts.Elements() do
    printfn "Tag=%s, Value=%A" elem.Name.LocalName elem.Value

Querying XML

Queries on XML are often expressed using the XPath query language, which this book doesn't cover in detail but which is supported by the types in the System.Xml namespace. As a good alternative to XPath, you can use the standard sequence operators to perform queries over XML data. The following example uses the helper functions and the file contacts.xml created in the previous section:

open System
open System.Xml.Linq

let elem (e: XElement) s       = e.Element(xname s)
let elemv e s                  = (elem e s).Value

let contactsXml = XElement.Load(file "contacts.xml")
let contacts = contactsXml.Elements ()
> contacts |> Seq.filter (fun e -> (elemv e "name").StartsWith "J")
           |> Seq.map (fun e -> (elemv e "name"), (elemv e "phone"));;
val it : seq<string * string> = seq [("John Smith", "+1 626-123-4321")]

This example also defines some helper functions: elem to extract from an XElement object the first child element with a given name, and elemv to convert that to a string value.

You can also use the query operators in building new XML:

xelem "results"
   [ contacts |> Seq.filter  (fun e -> (elemv e "name").StartsWith "J")  ]

This creates a <results> tag and inserts all contacts whose name starts with the letter J. You can also use sequence expressions to achieve the same result:

xelem "results"
   [ for e in contacts do
         if (elemv e "name").StartsWith "J" then
             yield e ]

Summary

In this chapter, you saw how the functional programming techniques from Chapter 3 are often used to implement in-memory queries similar to those used to access databases. You also learned how to use sequence expressions as an alternative notation for these query expressions. We then turned to databases themselves and covered how to use ADO.NET to access relational databases. You saw how to perform a variety of database-related tasks from Visual Studio. You next learned how to perform simple, typed queries using F# LinqToSql, taking particular advantage of the object/relational data objects generated by the LINQ tool SqlMetal.exe. Finally, you saw how to use XML as a generic data format, partly by using functionality from the LINQ libraries.

The next chapter covers parsing techniques, including using the lexer and parser generator tools that come with the F# distribution.
