In Chapter 3, you saw some of the simple but powerful data types and language constructs that make up F# functional programming. The functional programming paradigm is strongly associated with "programming without side effects," called pure functional programming. In this paradigm, programs compute the result of a mathematical expression and don't cause any side effects, except perhaps reporting the result of the computation. The formulas used in spreadsheets are often pure, as is the core of functional programming languages such as Haskell. F# isn't, however, a pure functional language. For example, you can write programs that mutate data, perform I/O communications, start threads, and raise exceptions. Furthermore, the F# type system doesn't enforce a strict distinction between expressions that perform these actions and expressions that don't.
Programming with side effects is called imperative programming. This chapter looks more closely at a number of constructs related to imperative programming. It describes how to use loops, mutable data, arrays, and some common input/output techniques.
If your primary programming experience has been with an imperative language such as C, C#, or Java, you may initially find yourself using imperative constructs fairly frequently in F#. However, over time, F# programmers generally learn how to perform many routine programming tasks within the side-effect-free subset of the language. F# programmers tend to use side effects in the following situations:
When scripting and prototyping using F# Interactive
When working with .NET library components that use side effects heavily, such as GUI libraries and I/O libraries
When initializing complex data structures
When using inherently imperative, efficient data structures such as hash tables and hash sets
When locally optimizing routines in a way that improves the performance of the functional version of the routine
When working with very large data structures or in scenarios where the allocation of data structures must be minimized for performance reasons
Some F# programmers don't use any imperative techniques except as part of the external wrapper for their programs. Adopting this form of pure functional programming for a time is an excellent way to hone your functional programming techniques.
Programming with fewer side effects is attractive for many reasons. For example, eliminating unnecessary side effects nearly always reduces the complexity of your code, so it leads to fewer bugs. Another thing experienced functional programmers appreciate is that the programmer or compiler can easily adjust the order in which expressions are computed. A lack of side effects also helps you reason about your code: it's easier to visually check when two programs are equivalent, and it's easier to make radical adjustments to your code without introducing new, subtle bugs. Programs that are free from side effects can often be computed on demand, frequently by making very small, local changes to your code to introduce the use of delayed data structures. Finally, side effects such as mutation are difficult to use when data is accessed concurrently from multiple threads, as you'll see in Chapter 13.
Three looping constructs are available to help simplify writing iterative code with side effects:
Simple for loops: for var = start-expr to end-expr do expr
Simple while loops: while expr do expr
Sequence loops: for pattern in expr do expr
All three constructs are for writing imperative programs, indicated partly by the fact that in all cases the body of the loop must have a return type of unit. Note that unit is the F# type that corresponds to void in imperative languages such as C, and it has the single value (). The following sections cover these three constructs in more detail.
Simple for loops are the most efficient way to iterate over integer ranges. This is illustrated here by a replacement implementation of the repeatFetch function from Chapter 2:
let repeatFetch url n =
    for i = 1 to n do
        let html = http url
        printf "fetched <<< %s >>>\n" html
    printf "Done!\n"
This loop is executed for successive values of i over the given range, including both start and end indexes.
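Because repeatFetch depends on the http function from Chapter 2, here is a self-contained sketch of the same construct. The helpers sumOfSquares and countIterations are hypothetical, written only to show that both bounds are included and that a range whose start exceeds its end runs zero times. They use a mutable local, a construct covered later in this chapter.

```fsharp
// Hypothetical helper: sums i * i for i = 1 .. n with a simple for loop.
let sumOfSquares n =
    let mutable total = 0      // a mutable local (covered later in this chapter)
    for i = 1 to n do          // runs for i = 1, 2, ..., n inclusive
        total <- total + i * i
    total

// Hypothetical helper: counts how many times the loop body runs.
let countIterations start finish =
    let mutable count = 0
    for i = start to finish do
        count <- count + 1
    count
```

For example, sumOfSquares 3 computes 1 + 4 + 9 = 14, countIterations 1 5 returns 5, and countIterations 5 4 returns 0 because the body never executes.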
The second looping construct is a while loop, which repeats until a given guard is false. For example, here is a way to keep your computer busy until the weekend:
open System

let loopUntilSaturday() =
    while (DateTime.Now.DayOfWeek <> DayOfWeek.Saturday) do
        printf "Still working!\n"
    printf "Saturday at last!\n"
When executing this code in F# Interactive, you can interrupt its execution by using Ctrl+C.
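A while loop more often terminates because its own body changes the state that the guard inspects. The following minimal sketch uses a hypothetical halvingSteps function that counts how many integer halvings it takes to bring a positive number down to 1; the loop variables are mutable locals, described later in this chapter.

```fsharp
// Hypothetical helper: counts integer halvings until the value reaches 1.
let halvingSteps (n: int) =
    let mutable value = n
    let mutable steps = 0
    while value > 1 do          // guard re-evaluated before every iteration
        value <- value / 2      // the body moves the state toward the exit
        steps <- steps + 1
    steps
```

With this definition, halvingSteps 8 returns 3 (8, 4, 2, 1), and halvingSteps 1 returns 0 because the guard is false on entry.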
As discussed in Chapter 3, any values compatible with the type seq<type> can be iterated using the for pattern in seq do ... construct. The input seq may be an F# list value, any seq<type>, or a value of any type supporting a GetEnumerator method. Here are some simple examples:
> for (b,pj) in [ ("Banana 1",true); ("Banana 2",false) ] do
      if pj then printfn "%s is in pyjamas today!" b;;
Banana 1 is in pyjamas today!
The following example iterates the results of a regular expression match. The type returned by the .NET method System.Text.RegularExpressions.Regex.Matches is a MatchCollection, which for reasons known best to the .NET designers doesn't directly support the seq<Match> interface. It does, however, support a GetEnumerator method that permits iteration over the individual results of the operation, each of which is of type Match; the F# compiler inserts the conversions necessary to view the collection as a seq<Match> and perform the iteration. You learn more about using the .NET Regular Expression library in Chapter 10:
> open System.Text.RegularExpressions;;
> for m in (Regex.Matches("All the Pretty Horses","[a-zA-Z]+")) do
      printf "res = %s\n" m.Value;;
res = All
res = the
res = Pretty
res = Horses
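The same for ... in ... do construct works uniformly over lists, sequence expressions, and other enumerable values. As a sketch, the hypothetical collect function below iterates three kinds of input, using a tuple pattern in the first loop, and gathers what it sees into a ResizeArray (a type described later in this chapter).

```fsharp
// Hypothetical helper: iterates three kinds of input with for ... in ... do.
let collect () =
    let results = ResizeArray<string>()
    // 1. An F# list, destructured with a tuple pattern
    for (name, count) in [ ("apples", 3); ("pears", 0) ] do
        if count > 0 then results.Add(name)
    // 2. A sequence expression, i.e. any seq<'T>
    for sq in seq { for i in 1 .. 3 -> i * i } do
        results.Add(string sq)
    // 3. A string, which can be enumerated character by character
    for c in "ab" do
        results.Add(string c)
    List.ofSeq results
```

Calling collect () yields ["apples"; "1"; "4"; "9"; "a"; "b"].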
The simplest mutable data structures in F# are mutable records. In Chapter 3, you saw some simple examples of immutable records. A record is mutable if one or more of its fields is labeled mutable. This means record fields can be updated using the <- operator: that is, the same syntax used to set a property. Mutable fields are generally used for records that implement the internal state of objects, discussed in Chapters 6 and 7.
For example, the following code defines a record used to count the number of times an event occurs and the number of times the event satisfies a particular criterion:
type DiscreteEventCounter =
    { mutable Total: int;
      mutable Positive: int;
      Name : string }

let recordEvent (s: DiscreteEventCounter) isPositive =
    s.Total <- s.Total+1
    if isPositive then s.Positive <- s.Positive+1

let reportStatus (s: DiscreteEventCounter) =
    printfn "We have %d %s out of %d" s.Positive s.Name s.Total

let newCounter nm =
    { Total = 0;
      Positive = 0;
      Name = nm }
You can use this type as follows (this example uses the http function from Chapter 2):
let longPageCounter = newCounter "long page(s)"

let fetch url =
    let page = http url
    recordEvent longPageCounter (page.Length > 10000)
    page
Every call to the function fetch mutates the mutable record fields in the global variable longPageCounter. For example:
> fetch "http://www.smh.com.au" |> ignore;;
val it : unit = ()
> fetch "http://www.theage.com.au" |> ignore;;
val it : unit = ()
> reportStatus longPageCounter;;
We have 1 long page(s) out of 2
val it : unit = ()
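To try the counter without network access, the following sketch repeats the DiscreteEventCounter definitions from above and feeds the counter from a list of integers instead of web pages. The evenCounter value and the choice of "even number" as the criterion are illustrative assumptions.

```fsharp
type DiscreteEventCounter =
    { mutable Total: int
      mutable Positive: int
      Name : string }

let recordEvent (s: DiscreteEventCounter) isPositive =
    s.Total <- s.Total + 1
    if isPositive then s.Positive <- s.Positive + 1

let reportStatus (s: DiscreteEventCounter) =
    printfn "We have %d %s out of %d" s.Positive s.Name s.Total

let newCounter nm = { Total = 0; Positive = 0; Name = nm }

// Hypothetical criterion: count how many of the inputs are even.
let evenCounter = newCounter "even number(s)"
for n in [ 1; 2; 3; 4; 10 ] do
    recordEvent evenCounter (n % 2 = 0)   // mutates evenCounter in place

reportStatus evenCounter                  // We have 3 even number(s) out of 5
```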
Record types can also support members (for example, properties and methods) and give implicit implementations of interfaces, discussed in Chapter 6. Practically speaking, this means you can use them as one way to implement object-oriented abstractions.
One particularly useful mutable record is the general-purpose type of mutable reference cells, or ref cells for short. These often play much the same role as pointers in other imperative programming languages. You can see how to use mutable reference cells in the following example:
> let cell1 = ref 1;;
val cell1 : int ref = {contents = 1;}
> !cell1;;
val it : int = 1
> cell1 := 3;;
val it : unit = ()
> cell1;;
val it : int ref = {contents = 3;}
> !cell1;;
val it : int = 3
The key type is 'T ref, and its main operators are ref, !, and :=. The types of these operators are as follows:
val ref : 'T -> 'T ref
val (:=) : 'T ref -> 'T -> unit
val (!) : 'T ref -> 'T
These allocate a reference cell, mutate the cell, and read the cell, respectively. The operation cell1 := 3 is the key one; after this operation, the value returned by evaluating the expression !cell1 is changed. You can also use either the contents field or the Value property to access the value of a reference cell.
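The Value property and the contents field can be used both to read and to write a cell. The sketch below shows both forms; it avoids ! and := because recent F# versions (F# 6 onward) emit informational messages recommending .Value in their place, although the operators still work.

```fsharp
let cell = ref 10

// Reading the cell: the Value property and the contents field are equivalent
let viaValue = cell.Value        // 10
let viaContents = cell.contents  // 10

// Writing the cell through either form
cell.Value <- cell.Value + 1         // now 11
cell.contents <- cell.contents * 2   // now 22
```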
Both the 'T ref type and its operations are defined in the F# library as simple record data structures with a single mutable field:
type 'T ref =
    { mutable contents: 'T }

let (!) r = r.contents
let (:=) r v = r.contents <- v
let ref v = { contents = v }
The type 'T ref is a synonym for a type Microsoft.FSharp.Core.Ref<'T> defined in this way.
Like all mutable data structures, two mutable record values or two values of type 'T ref may refer to the same reference cell; this is called aliasing. Aliasing of immutable data structures isn't a problem; no client consuming or inspecting the data values can detect that the values have been aliased. However, aliasing of mutable data can lead to problems in understanding code. In general, it's good practice to ensure that no two values currently in scope directly alias the same mutable data structures. The following example continues from earlier and shows how an update to cell1 can affect the value returned by !cell2:
> let cell2 = cell1;;
val cell2 : int ref = {contents = 3;}
> !cell2;;
val it : int = 3
> cell1 := 7;;
val it : unit = ()
> !cell2;;
val it : int = 7
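The difference between aliasing a cell and copying its current value can be made explicit. In this minimal sketch, alias names the same cell as original, while copy is a freshly allocated cell; obj.ReferenceEquals confirms which values share storage.

```fsharp
let original = ref 3
let alias = original             // binds a second name to the same cell
let copy = ref original.Value    // allocates a new cell holding the current value

original.Value <- 7

// alias observes the update because it is the same cell; copy does not
printfn "alias = %d, copy = %d" alias.Value copy.Value   // alias = 7, copy = 3
let sameCell = obj.ReferenceEquals(original, alias)      // true
let separateCell = obj.ReferenceEquals(original, copy)   // false
```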
Mutable data is often hidden behind an encapsulation boundary. Chapter 7 looks at encapsulation in more detail, but one easy way to do this is to make data private to a function. For example, the following shows how to hide a mutable reference within the inner closure of values referenced by a function value:
let generateStamp =
    let count = ref 0
    (fun () -> count := !count + 1; !count)

val generateStamp: unit -> int
The line let count = ref 0 is executed once, when the generateStamp function is defined. Here is an example of the use of this function:
> generateStamp();;
val it : int = 1
> generateStamp();;
val it : int = 2
This is a powerful technique for hiding and encapsulating mutable state without resorting to writing new type and class definitions. It's good programming practice in polished code to ensure that all related items of mutable state are collected under some named data structure or other entity such as a function.
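Because generateStamp is a value, its single reference cell is shared by every caller. If you instead want several independent counters, one option is to make the allocation happen per call. The makeCounter function below is a hypothetical sketch of that variation: each call to makeCounter () runs let count = ref 0 afresh and returns a closure over its own private cell.

```fsharp
// Hypothetical factory: each call allocates a fresh, private reference cell.
let makeCounter () =
    let count = ref 0
    fun () ->
        count.Value <- count.Value + 1
        count.Value

let stampA = makeCounter ()
let stampB = makeCounter ()

stampA () |> ignore    // 1
stampA () |> ignore    // 2
let b1 = stampB ()     // 1: stampB has its own cell
let a3 = stampA ()     // 3: stampA's cell was not affected by stampB
```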
You saw in the previous section that mutable references must be explicitly dereferenced. F# also supports mutable locals that are implicitly dereferenced. These must either be top-level definitions or be local variables in a function:
> let mutable cell1 = 1;;
val mutable cell1 : int = 1
> cell1;;
val it : int = 1
> cell1 <- 3;;
val it : unit = ()
> cell1;;
val it : int = 3
The following shows how to use a mutable local:
let sum n m =
    let mutable res = 0
    for i = n to m do
        res <- res + i
    res

> sum 3 6;;
val it : int = 18
F# places strong restrictions on the use of mutable locals. In particular, unlike mutable references, mutable locals are guaranteed to be stack-allocated values, which is important in some situations because the .NET garbage collector won't move stack values. As a result, mutable locals may not be used in any inner lambda expressions or other closure constructs, with the exception of top-level mutable values, which can be used anywhere, and mutable fields of records and objects, which are associated with the heap-allocated objects themselves. (Later versions of F#, starting with F# 4.0, relax this restriction by implicitly converting a captured mutable local into a reference cell.) You learn more about mutable object types in Chapter 6. Reference cells and types containing mutable fields can be used instead to make the existence of heap-allocated imperative state obvious.
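When a closure needs to update local state, a reference cell makes the heap-allocated state explicit. As a sketch, the hypothetical sumViaClosure below computes the same kind of total as sum, but the update happens inside a lambda passed to List.iter, so it uses a ref cell rather than a mutable local.

```fsharp
// Hypothetical helper: the lambda captures a reference cell, which lives on
// the heap, so its updates are still visible after List.iter returns.
let sumViaClosure (xs: int list) =
    let total = ref 0
    xs |> List.iter (fun x -> total.Value <- total.Value + x)
    total.Value
```

For example, sumViaClosure [3; 4; 5; 6] returns 18, matching sum 3 6.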
Mutable arrays are a key data structure used as a building block in many high-performance computing scenarios. The following example illustrates how to use a one-dimensional array of double values:
> let arr = [| 1.0; 1.0; 1.0 |];;
val arr : float[]
> arr.[1];;
val it : float = 1.0
> arr.[1] <- 3.0;;
val it : unit = ()
> arr;;
val it : float[] = [| 1.0; 3.0; 1.0 |]
F# array values are usually manipulated using functions from the Array module; its full path is Microsoft.FSharp.Collections.Array, but you can access it with the short name Array. Arrays are created either by using the creation functions in that module (such as Array.init, Array.create, and Array.zeroCreate) or by using sequence expressions, as discussed in Chapter 3. Some useful methods are also contained in the System.Array class. Table 4-1 shows some common functions from the Array module.
Table 4.1. Some Important Functions and Aggregate Operators from the Array Module
Operator | Type | Explanation |
---|---|---|
Array.append | 'T[] -> 'T[] -> 'T[] | Returns a new array containing elements of the first array followed by elements of the second array |
Array.sub | 'T[] -> int -> int -> 'T[] | Returns a new array containing a portion of elements of the input array |
Array.copy | 'T[] -> 'T[] | Returns a copy of the input array |
Array.iter | ('T -> unit) -> 'T[] -> unit | Applies a function to all elements of the input array |
Array.filter | ('T -> bool) -> 'T[] -> 'T[] | Returns a new array containing a selection of elements of the input array |
Array.length | 'T[] -> int | Returns the length of the input array |
Array.map | ('T -> 'U) -> 'T[] -> 'U[] | Returns a new array containing the results of applying the function to each element of the input array |
Array.fold | ('State -> 'T -> 'State) -> 'State -> 'T[] -> 'State | Accumulates left to right over the input array |
Array.foldBack | ('T -> 'State -> 'State) -> 'T[] -> 'State -> 'State | Accumulates right to left over the input array |
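The functions in Table 4-1 can be exercised on two small arrays. This sketch is illustrative only; the array contents are arbitrary, and the comments give the value each binding produces.

```fsharp
let a = [| 1; 2; 3 |]
let b = [| 4; 5 |]

let joined  = Array.append a b                          // [|1; 2; 3; 4; 5|]
let middle  = Array.sub joined 1 3                      // start 1, count 3: [|2; 3; 4|]
let cloned  = Array.copy a                              // independent copy of a
let evens   = Array.filter (fun x -> x % 2 = 0) joined  // [|2; 4|]
let doubled = Array.map (fun x -> x * 2) a              // [|2; 4; 6|]
let total   = Array.fold (fun acc x -> acc + x) 0 joined    // 15
// foldBack visits elements right to left, so consing rebuilds them in order
let asList  = Array.foldBack (fun x acc -> x :: acc) a []   // [1; 2; 3]

Array.iter (fun x -> printf "%d " x) middle             // prints: 2 3 4
cloned.[0] <- 100                                       // a.[0] is still 1
```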
F# arrays can be very large, up to the memory limitations of the machine (a 3GB limit applies on 32-bit systems). For example, the following creates an array of 100 million elements (of total size approximately 400MB for a 32-bit machine):
> let (r : int[]) = Array.zeroCreate 100000000;;
val r : int [] = ...
The following attempt to create an array more than 4GB in size causes an OutOfMemoryException on one of our machines:
> let (r : int[]) = Array.zeroCreate 1000000000;;
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
Arrays of value types (such as int, single, double, and int64) are stored flat, so only one object is allocated for the entire array. Arrays of other types are stored as an array of object references. Primitive types such as integers and floating-point numbers are all value types; many other .NET types are also value types. The .NET documentation indicates whether each type is a value type or not. Often, the word struct is used for value types. You can also define new struct types directly in F# code, as discussed in Chapter 6. All other types in F# are reference types, such as all record, tuple, discriminated union, and class and interface values.
In Chapter 3, you saw in passing that you can use sequence expressions as a way to generate interesting array values. For example:
> let arr = [| for i in 0 .. 5 -> (i,i*i) |];;
val arr : (int * int) [] =
  [|(0, 0); (1, 1); (2, 4); (3, 9); (4, 16); (5, 25)|]
You can also use a convenient syntax for extracting subarrays from existing arrays; this is called slice notation. A slice expression for a single-dimensional array has the form arr.[start..finish], where either start or finish may be omitted, in which case index zero or the index of the last element of the array, respectively, is assumed instead. For example:
> let arr = [| for i in 0 .. 5 -> (i,i*i) |];;
val arr : (int * int) [] =
  [|(0, 0); (1, 1); (2, 4); (3, 9); (4, 16); (5, 25)|]
> arr.[1..3];;
val it : (int * int) [] = [|(1, 1); (2, 4); (3, 9)|]
> arr.[..2];;
val it : (int * int) [] = [|(0, 0); (1, 1); (2, 4)|]
> arr.[3..];;
val it : (int * int) [] = [|(3, 9); (4, 16); (5, 25)|]
Slicing syntax is used extensively in the example "Verifying Circuits with Propositional Logic" in Chapter 12. You can also use slicing syntax with strings and several other F# types such as vectors and matrices, and the operator can be overloaded to work with your own type definitions. The F# library definitions of vectors and matrices can be used as a guide.
Slices on arrays generate fresh arrays. Sometimes it's more efficient to use other techniques, such as accessing the array via an accessor function or object that performs one or more internal index adjustments before looking up the underlying array. If you add support for the slicing operators to your own types, you can choose whether they return copies of data structures or an accessor object.
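The fact that an array slice is a fresh copy can be checked directly: mutating the slice leaves the source array untouched. The sketch also shows the string slicing mentioned earlier; the sample values are arbitrary.

```fsharp
let data = [| 10; 20; 30; 40; 50 |]
let slice = data.[1..3]      // [|20; 30; 40|], a newly allocated array
slice.[0] <- -1              // changes only the copy; data.[1] is still 20

// Strings support the same notation and also return new values.
let text = "Pretty Horses"
let firstWord = text.[0..5]  // "Pretty"
```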
Like other .NET languages, F# directly supports two-dimensional array values that are stored flat: that is, where an array of dimensions (N, M) is stored using a contiguous array of N * M elements. The types for these values are written using [,], such as in int[,] and double[,], and these types also support slicing syntax. Values of these types are created and manipulated using the functions in the Array2D module. Likewise, there is a module, Array3D, for manipulating three-dimensional array values whose types are written int[,,]. You can also use the code in those modules as a template for defining code to manipulate arrays of higher dimension.
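As a brief sketch of the Array2D module, the code below builds a 3 x 4 grid with Array2D.init, reads its dimensions, updates one element using [row, column] indexing, and takes a two-dimensional slice. The grid contents are chosen only so that each value identifies its position.

```fsharp
// A 3 x 4 grid in which each cell holds row * 10 + column.
let grid : int[,] = Array2D.init 3 4 (fun row col -> row * 10 + col)

let rows = Array2D.length1 grid    // 3
let cols = Array2D.length2 grid    // 4

grid.[1, 2] <- 99                  // index and update with [row, column]

// Slicing works in both dimensions and copies the selected block.
let block = grid.[0..1, 2..3]      // rows 0-1, columns 2-3: [[2; 3]; [99; 13]]
```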
The .NET Framework comes equipped with an excellent set of imperative collections under the namespace System.Collections.Generic. You've seen some of these already. The following sections look at some simple uses of these collections.
As mentioned in Chapter 3, the .NET Framework comes with a type System.Collections.Generic.List<'T>, which, although named List, is better described as a resizable array. The F# library includes the following type abbreviation for this purpose:
type ResizeArray<'T> = System.Collections.Generic.List<'T>
Here is a simple example of using this data structure:
> let names = new ResizeArray<string>();;
val names : ResizeArray<string>
> for name in ["Claire"; "Sophie"; "Jane"] do
      names.Add(name);;
val it : unit = ()
> names.Count;;
val it : int = 3
> names.[0];;
val it : string = "Claire"
> names.[1];;
val it : string = "Sophie"
> names.[2];;
val it : string = "Jane"
Resizable arrays use an underlying array for storage and support constant-time random-access lookup. In many situations, this makes a resizable array more efficient than an F# list, which supports efficient access only from the head (left) of the list. You can find the full set of members supported by this type in the .NET documentation. Commonly used properties and members include Add, Count, ConvertAll, Insert, BinarySearch, and ToArray. A module ResizeArray is included in the F# library; it provides operations over this type in the style of the other F# collections.
Like other .NET collections, values of type ResizeArray<'T> support the seq<'T> interface. There is also an overload of the new constructor for this collection type that lets you specify initial values via a seq<'T>. This means you can create and consume instances of this collection type using sequence expressions:
> let squares = new ResizeArray<int>(seq { for i in 0 .. 100 -> i*i });;
val squares : ResizeArray<int>
> for x in squares do
      printfn "square: %d" x;;
square: 0
square: 1
square: 4
square: 9
...
The type System.Collections.Generic.Dictionary<'Key,'Value> is an efficient hash-table structure that is excellent for storing associations between values. The use of this collection from F# code requires a little care, because it must be able to correctly hash the key type. For simple key types such as integers, strings, and tuples, the default hashing behavior is adequate. Here is a simple example:
> open System.Collections.Generic;;
> let capitals = new Dictionary<string, string>(HashIdentity.Structural);;
val capitals : Dictionary<string,string> = dict []
> capitals.["USA"] <- "Washington";;
val it : unit = ()
> capitals.["Bangladesh"] <- "Dhaka";;
val it : unit = ()
> capitals.ContainsKey("USA");;
val it : bool = true
> capitals.ContainsKey("Australia");;
val it : bool = false
> capitals.Keys;;
val it : KeyCollection<string,string> = seq ["USA"; "Bangladesh"]
> capitals.["USA"];;
val it : string = "Washington"
Dictionaries are compatible with the type seq<KeyValuePair<'Key,'Value>>, where KeyValuePair is a type from the System.Collections.Generic namespace and simply supports the properties Key and Value. Armed with this knowledge, you can use iteration to perform an operation for each element of the collection:
> for kvp in capitals do
      printf "%s has capital %s\n" kvp.Key kvp.Value;;
USA has capital Washington
Bangladesh has capital Dhaka
val it : unit = ()
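A common use of Dictionary is counting occurrences. The hypothetical wordCounts function below combines ContainsKey, the indexer, and iteration over KeyValuePair values; splitting on single spaces is a simplifying assumption rather than a real tokenizer.

```fsharp
open System
open System.Collections.Generic

// Hypothetical helper: counts word occurrences with ContainsKey and the indexer.
let wordCounts (text: string) =
    let counts = Dictionary<string, int>(HashIdentity.Structural)
    for word in text.Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries) do
        if counts.ContainsKey(word) then
            counts.[word] <- counts.[word] + 1
        else
            counts.[word] <- 1
    counts

let counts = wordCounts "the cat and the hat"

// Each element seen by the loop is a KeyValuePair<string, int>
for kvp in counts do
    printfn "%s: %d" kvp.Key kvp.Value
```

Here counts.["the"] is 2 and every other word maps to 1.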
The Dictionary method TryGetValue is of special interest because its use from F# is a little nonstandard. This method takes an input value of type 'Key and looks it up in the table. It returns a bool indicating whether the lookup succeeded: true if the given key is in the dictionary and false otherwise. The value itself is returned via a .NET idiom called an out parameter. From F# code, there are three ways to use .NET methods that rely on out parameters:
You can pass the address of a mutable local, written &res.
You can pass a reference cell.
You can omit the final argument, in which case the result is returned as part of a tuple.
Here's how you do it using a mutable local:
open System.Collections.Generic

let lookupName nm (dict : Dictionary<string,string>) =
    let mutable res = ""
    let foundIt = dict.TryGetValue(nm, &res)
    if foundIt then res
    else failwithf "Didn't find %s" nm
The use of a reference cell can be cleaner. For example:
> let res = ref "";;
val res: string ref = {contents = "";}
> capitals.TryGetValue("Australia", res);;
val it: bool = false
> capitals.TryGetValue("USA", res);;
val it: bool = true
> res;;
val it: string ref = {contents = "Washington";}
Finally, here is the technique where you don't pass the final parameter, and instead the result is returned as part of a tuple:
> capitals.TryGetValue("Australia");;
val it: bool * string = (false, null)
> capitals.TryGetValue("USA");;
val it: bool * string = (true, "Washington")
Note that the value returned in the second element of the tuple may be null if the lookup fails when this technique is used. null values are discussed in the section "Working with null Values" at the end of this chapter.
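The tuple form of TryGetValue combines naturally with pattern matching. As a sketch, the hypothetical tryFindCapital helper below converts the (bool * value) result into an option, so callers never see the null placeholder returned on failure.

```fsharp
open System.Collections.Generic

// Hypothetical helper: wraps the tuple-returning form of TryGetValue in an option.
let tryFindCapital (table: Dictionary<string, string>) country =
    match table.TryGetValue(country) with
    | true, city -> Some city
    | false, _ -> None       // ignore the null placeholder

let capitals = Dictionary<string, string>()
capitals.["USA"] <- "Washington"
capitals.["Bangladesh"] <- "Dhaka"
```

With these entries, tryFindCapital capitals "USA" returns Some "Washington" and tryFindCapital capitals "Australia" returns None.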
You can use dictionaries with compound keys such as tuple keys of type (int * int). If necessary, you can specify the hash function used for these values when creating the instance of the dictionary. The default is to use generic hashing, also called structural hashing, a topic covered in more detail in Chapter 8. If you want to indicate this explicitly, you do so by specifying Microsoft.FSharp.Collections.HashIdentity.Structural when creating the collection instance. In some cases, this can also lead to performance improvements, because the F# compiler often generates a hashing function appropriate for the compound type.
Here is an example that uses a dictionary with a compound key type to represent sparse maps:
> open System.Collections.Generic;;
> open Microsoft.FSharp.Collections;;
> let sparseMap = new Dictionary<(int * int), float>();;
val sparseMap : Dictionary<(int * int),float> = dict []
> sparseMap.[(0,2)] <- 4.0;;
val it : unit = ()
> sparseMap.[(1021,1847)] <- 9.0;;
val it : unit = ()
> sparseMap.Keys;;
val it : Dictionary.KeyCollection<(int * int),float> = seq [(0, 2); (1021, 1847)]
Some of the other important mutable data structures in the F# and .NET libraries are as follows:
System.Collections.Generic.SortedList<'Key,'Value>: A collection of key/value pairs sorted by key. Searches are done by a binary search, and the underlying storage is a pair of arrays holding the keys and the values.
System.Collections.Generic.SortedDictionary<'Key,'Value>: A collection of key/value pairs sorted by the key, rather than hashed. The underlying data structure is a balanced binary search tree, so lookups, insertions, and removals all take logarithmic time.
System.Collections.Generic.Stack<'T>: A variable-sized last-in/first-out (LIFO) collection.
System.Collections.Generic.Queue<'T>: A variable-sized first-in/first-out (FIFO) collection.
System.Text.StringBuilder: A mutable structure for building string values.
Microsoft.FSharp.Collections.HashSet<'Key>: A hash table structure holding only keys and no values. From .NET 3.5, a HashSet<'T> type is available in the System.Collections.Generic namespace.
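The following sketch exercises several of these structures in a few lines each. It uses the System.Collections.Generic versions of HashSet, Stack, Queue, and SortedDictionary; all sample values are arbitrary.

```fsharp
open System.Collections.Generic
open System.Text

let stack = Stack<int>()
stack.Push(1); stack.Push(2); stack.Push(3)
let top = stack.Pop()                    // 3: last in, first out

let queue = Queue<string>()
queue.Enqueue("first"); queue.Enqueue("second")
let front = queue.Dequeue()              // "first": first in, first out

let seen = HashSet<int>()
let addedTwice = [ seen.Add(42); seen.Add(42) ]   // [true; false]: no duplicates

let sorted = SortedDictionary<string, int>()
sorted.["pear"] <- 2
sorted.["apple"] <- 1
let keysInOrder = [ for kvp in sorted -> kvp.Key ]   // ["apple"; "pear"]

let sb = StringBuilder()
sb.Append("Hello").Append(", ").Append("world") |> ignore
let greeting = sb.ToString()             // "Hello, world"
```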
When a routine encounters a problem, it may respond in several ways, such as by recovering internally, emitting a warning, returning a marker value or incomplete result, or throwing an exception. The following code indicates how an exception can be thrown by some of the code you've been using:
> let req = System.Net.WebRequest.Create("not a URL");;
System.UriFormatException: Invalid URI: The format of the URI could not be determined.
Similarly, the GetResponse method also used in the http function may raise a System.Net.WebException exception. The exceptions that may be raised by routines are typically recorded in the documentation for those routines. Exception values may also be raised explicitly by F# code:
> (raise (System.InvalidOperationException("not today thank you")) : unit);;
System.InvalidOperationException: not today thank you
In F#, exceptions are commonly raised using the F# failwith function:
> if false then 3 else failwith "hit the wall";;
System.Exception: hit the wall
The types of some of the common functions used to raise exceptions are shown here:
val failwith : string -> 'T
val raise : System.Exception -> 'T
val failwithf : StringFormat<'T,'U> -> 'T
val invalidArg : string -> string -> 'T
Note that the return types of all these are generic type variables: the functions never return normally and instead always raise an exception. This means they can be used to form an expression of any particular type and can be handy when you're drafting your code. For example, in the following code, we've left part of the program incomplete:
if (System.DateTime.Now > failwith "not yet decided") then
    printfn "you've run out of time!"
Table 4-2 shows some of the common exceptions that are raised by failwith and other operations.
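Table 4.2. Common Categories of Exceptions and F# Functions That Raise Them
Exception Type | F# Abbreviation | Description | Example |
---|---|---|---|
System.Exception | failwith | General failure | failwith "hit the wall" |
System.ArgumentException | invalidArg | Bad input | invalidArg "url" "not a valid URL" |
System.DivideByZeroException | (none) | Integer divide by 0 | 1 / 0 |
System.NullReferenceException | (none) | Unexpected use of a null value | (null : string).Length |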
You can catch exceptions using the try ... with ... language construct and :? type-test patterns, which filter any exception value caught by the with clause. For example:
> try
      raise (System.InvalidOperationException ("it's just not my day"))
  with
  | :? System.InvalidOperationException -> printfn "caught!";;
caught!
Chapter 5 covers these patterns more closely. The following code sample shows how to use try ... with ... to catch two kinds of exceptions that may arise from the operations that make up the http function, in both cases returning the empty string "" as the incomplete result. Note that try ... with ... is just an expression, and it may return a result in both branches:
open System.IO

let http(url: string) =
    try
        let req = System.Net.WebRequest.Create(url)
        let resp = req.GetResponse()
        let stream = resp.GetResponseStream()
        let reader = new StreamReader(stream)
        let html = reader.ReadToEnd()
        html
    with
    | :? System.UriFormatException -> ""
    | :? System.Net.WebException -> ""
When an exception is thrown, a value is created that records information about the exception. This value is matched against the earlier type-test patterns. It may also be bound directly and manipulated in the with clause of the try ... with constructs. For example, all exception values support the Message property:
> try
      raise (new System.InvalidOperationException ("invalid operation"))
  with
  | err -> printfn "oops, msg = '%s'" err.Message;;
oops, msg = 'invalid operation'
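Putting these pieces together, the hypothetical tryParseInt below uses try ... with as an expression whose branches all produce an int option. It combines a plain :? type test with the :? ... as form, which binds the exception at its more specific type so that its members are available.

```fsharp
// Hypothetical helper: try ... with used as an expression returning int option.
let tryParseInt (s: string) =
    try
        Some (System.Int32.Parse(s))
    with
    | :? System.FormatException -> None
    | :? System.OverflowException as err ->
        // "as err" binds the exception as System.OverflowException
        printfn "out of range: %s" err.Message
        None
```

For example, tryParseInt "42" returns Some 42, while tryParseInt "abc" and tryParseInt "99999999999" both return None.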
Exceptions may also be processed using the try ... finally ... construct. This construct guarantees that the finally clause runs both when an exception is thrown and when the expression evaluates normally. This allows you to ensure that resources are disposed after the completion of an operation. For example, you can ensure that the web response from the previous example is closed as follows:
let httpViaTryFinally(url: string) =
    let req = System.Net.WebRequest.Create(url)
    let resp = req.GetResponse()
    try
        let stream = resp.GetResponseStream()
        let reader = new StreamReader(stream)
        let html = reader.ReadToEnd()
        html
    finally
        resp.Close()
In practice, you can use a shorter form to close and dispose of resources, simply by using a use binding instead of a let binding. This closes the response at the end of the scope of the resp variable, a technique that is discussed in full in Chapter 8. Here is how the previous function looks using this form:
let httpViaUseBinding(url: string) =
    let req = System.Net.WebRequest.Create(url)
    use resp = req.GetResponse()
    let stream = resp.GetResponseStream()
    let reader = new StreamReader(stream)
    let html = reader.ReadToEnd()
    html
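The ordering guarantees of try ... finally and use can be observed without a network connection. In the sketch below, TrackedResource is a hypothetical type that records its own disposal in an events log; the log shows that the finally block runs before the outer handler sees the exception, and that Dispose runs when the use-bound value goes out of scope.

```fsharp
// Hypothetical resource type that records when it is read and disposed.
type TrackedResource(name: string, events: ResizeArray<string>) =
    member x.Read() = events.Add(name + " read")
    interface System.IDisposable with
        member x.Dispose() = events.Add(name + " disposed")

let events = ResizeArray<string>()

// try ... finally: the finally block runs even though the body raises.
let attempt () =
    try
        try
            events.Add("body started")
            failwith "boom"
        finally
            events.Add("finally ran")
    with _ ->
        events.Add("exception caught")

// use: Dispose is called automatically at the end of the function.
let readOnce () =
    use res = new TrackedResource("file", events)
    res.Read()

attempt ()
readOnce ()
// events: body started; finally ran; exception caught; file read; file disposed
```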
F# lets you define new kinds of exception objects that carry data in a conveniently accessible form. For example, here is a declaration of a new class of exceptions and a function that wraps http with a filter that catches particular cases:
exception BlockedURL of string

let http2 url =
    if url = "http://www.kaos.org"
    then raise(BlockedURL(url))
    else http url
You can extract the information from F# exception values, again using pattern matching:
> try
      raise(BlockedURL("http://www.kaos.org"))
  with
  | BlockedURL(url) -> printf "blocked! url = '%s'\n" url;;
blocked! url = 'http://www.kaos.org'
Exception values are always subtypes of the F# type exn, an abbreviation for the .NET type System.Exception. The declaration exception BlockedURL of string is shorthand for defining a new F# class type BlockedURLException, which is a subtype of System.Exception. Exception types can also be defined explicitly by defining new object types. Chapters 5 and 6 look more closely at object types and subtyping.
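F# exceptions can carry several values, and handlers can add a when guard to a pattern rule. The HttpError exception and the describe function below are hypothetical, chosen to mirror the BlockedURL example: the first rule fires only for status 404, the second for any other HttpError, and the last catches everything else.

```fsharp
// Hypothetical exception carrying a URL and a status code.
exception HttpError of string * int

let describe (action: unit -> string) =
    try
        action ()
    with
    | HttpError(url, code) when code = 404 -> sprintf "not found: %s" url
    | HttpError(url, code) -> sprintf "error %d from %s" code url
    | err -> sprintf "other failure: %s" err.Message

let r1 = describe (fun () -> raise (HttpError("http://example.com/a", 404)))
let r2 = describe (fun () -> raise (HttpError("http://example.com/b", 500)))
let r3 = describe (fun () -> failwith "hit the wall")
let r4 = describe (fun () -> "ok")
```

Here r1 is "not found: http://example.com/a", r2 is "error 500 from http://example.com/b", r3 is "other failure: hit the wall", and r4 is "ok".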
Table 4-3 summarizes the exception-related language and library constructs.
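Table 4.3. Exception-Related Language and Library Constructs
Construct | Kind | Notes |
---|---|---|
raise expr | F# library function | Raises the given exception |
failwith expr | F# library function | Raises a System.Exception carrying the given message |
try expr with rules | F# expression | Catches expressions matching the pattern rules |
try expr finally expr | F# expression | Executes the finally expression both when the computation succeeds and when an exception is raised |
:? ArgumentException -> expr | F# pattern rule | A rule matching the given .NET exception type |
:? ArgumentException as e -> expr | F# pattern rule | A rule matching the given .NET exception type and naming it as its stronger type |
BlockedURL(url) -> expr | F# pattern rule | A rule matching the given data-carrying F# exception |
err -> expr | F# pattern rule | A rule matching any exception, binding the name err to the exception object |
err when expr -> expr | F# pattern rule | A rule matching the exception under the given condition, binding the name err to the exception object |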
Imperative programming and input/output are closely related topics. The following sections show some very simple I/O techniques using F# and .NET libraries.
The .NET types System.IO.File and System.IO.Directory contain a number of simple functions to make working with files easy. For example, here's a way to output lines of text to a file:
> open System.IO;;
> File.WriteAllLines("test.txt", [| "This is a test file.";
                                     "It is easy to read." |]);;
val it : unit = ()
Many simple file-processing tasks require reading all the lines of a file. You can do this by reading all the lines in one action as an array using System.IO.File.ReadAllLines:
> open System.IO;;
> File.ReadAllLines("test.txt");;
val it : string [] = [| "This is a test file."; "It is easy to read." |]
If necessary, the entire file can be read as a single string using System.IO.File.ReadAllText:
> File.ReadAllText("test.txt");;
val it : string = "This is a test file.
It is easy to read.
"
You can also use the results of System.IO.File.ReadAllLines as part of a list or sequence defined using a sequence expression:
> [ for line in File.ReadAllLines("test.txt") do
      let words = line.Split [| ' ' |]
      if words.Length > 3 && words.[2] = "easy" then
          yield line ];;
val it : string list = ["It is easy to read."]
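The steps above can be combined into one self-contained script. As an assumption for the sake of the sketch, it writes to a file in the system temporary directory (the name fsharp-io-sketch.txt is arbitrary) rather than to test.txt, and deletes the file when done.

```fsharp
open System.IO

// Assumption: a temp-directory file stands in for test.txt.
let path = Path.Combine(Path.GetTempPath(), "fsharp-io-sketch.txt")
File.WriteAllLines(path, [| "This is a test file."; "It is easy to read." |])

let lines = File.ReadAllLines(path)   // one array element per line
let text = File.ReadAllText(path)     // the whole file as one string

// Same filter as above: keep lines whose third word is "easy"
let easyLines =
    [ for line in lines do
        let words = line.Split [| ' ' |]
        if words.Length > 3 && words.[2] = "easy" then
            yield line ]

File.Delete(path)                     // clean up the temporary file
```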
The .NET namespace System.IO
contains the primary .NET types for reading/writing bytes and text to/from data sources. The primary output constructs in this namespace are as follows:
System.IO.BinaryWriter: Writes primitive data types as binary values. Create using new BinaryWriter(stream). You can create output streams using File.Create(filename).
System.IO.StreamWriter: Writes textual strings and characters to a stream. The text is encoded according to a particular Unicode encoding. Create by using new StreamWriter(stream) and its variants or by using File.CreateText(filename).
System.IO.StringWriter: Writes textual strings to a StringBuilder, which eventually can be used to generate a string.
Here is a simple example of using `System.IO.File.CreateText` to create a `StreamWriter` and write two strings:
```fsharp
> let outp = File.CreateText("playlist.txt");;
val outp : StreamWriter
> outp.WriteLine("Enchanted");;
val it : unit = ()
> outp.WriteLine("Put your records on");;
val it : unit = ()
> outp.Close();;
val it : unit = ()
```
These are the primary input constructs in the `System.IO` namespace:
`System.IO.BinaryReader`: Reads primitive data types as binary values. When reading the binary data as a string, it interprets the bytes according to a particular Unicode encoding. Create using `new BinaryReader(stream)`.
`System.IO.StreamReader`: Reads a stream as textual strings and characters. The bytes are decoded to strings according to a particular Unicode encoding. Create by using `new StreamReader(stream)` and its variants or by using `File.OpenText(filename)`.
`System.IO.StringReader`: Reads a string as textual strings and characters.
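To see `BinaryWriter` and `BinaryReader` working as a pair, the following sketch writes a few primitive values into an in-memory `System.IO.MemoryStream` (used here instead of a file purely to keep the example self-contained) and reads them back. The binary format carries no type information, so the reader must call the matching `Read...` methods in the same order the values were written.

```fsharp
open System.IO
open System.Text

// Write a few primitive values as binary data into an in-memory stream
let bytes =
    use mem = new MemoryStream()
    use writer = new BinaryWriter(mem, Encoding.UTF8)
    writer.Write(42)          // Int32: 4 bytes
    writer.Write(3.5)         // Double: 8 bytes
    writer.Write("Enchanted") // length-prefixed UTF-8 string: 1 + 9 bytes
    writer.Flush()
    mem.ToArray()

// Read the values back in the order they were written
let roundTrip =
    use reader = new BinaryReader(new MemoryStream(bytes), Encoding.UTF8)
    let i = reader.ReadInt32()
    let d = reader.ReadDouble()
    let s = reader.ReadString()
    (i, d, s)
```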
Here is a simple example of using `System.IO.File.OpenText` to create a `StreamReader` and read two strings:
```fsharp
> let inp = File.OpenText("playlist.txt");;
val inp : StreamReader
> inp.ReadLine();;
val it : string = "Enchanted"
> inp.ReadLine();;
val it : string = "Put your records on"
> inp.Close();;
val it : unit = ()
```
Whenever you create objects such as a `StreamReader` that have a `Close` or `Dispose` operation or that implement the `IDisposable` interface, you should consider how to eventually close or otherwise dispose of the resource. We discuss this later in this chapter and in Chapter 8.
The `System.IO` namespace contains a number of other types that are useful for advanced I/O corner cases but that you won't need to use from day to day. For example, the following abstractions appear in the .NET documentation:
`System.IO.TextReader`: Reads textual strings and characters from an unspecified source. This is the common functionality implemented by the `StreamReader` and `StringReader` types and the `System.Console.In` object. The latter is used to access the `stdin` input.
`System.IO.TextWriter`: Writes textual strings and characters to an unspecified output. This is the common functionality implemented by the `StreamWriter` and `StringWriter` types and the `System.Console.Out` and `System.Console.Error` objects. The latter are used to access the `stdout` and `stderr` output streams.
`System.IO.Stream`: Provides a generic view of a sequence of bytes.
Some functions that are generic over different kinds of output streams make use of these; for example, the formatting function `fprintf` discussed in the section "Using `printf` and Friends" writes to any `System.IO.TextWriter`.
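Because `fprintf` and `fprintfn` accept any `TextWriter`, one formatting routine can target a file, an in-memory `StringWriter`, or the console. A minimal sketch, in which `writeReport` is a hypothetical helper name:

```fsharp
open System
open System.IO

// Hypothetical helper: writes a small aligned report to any TextWriter
let writeReport (out : TextWriter) (items : (string * int) list) =
    for (name, count) in items do
        fprintfn out "%-10s %3d" name count

// The same function can target an in-memory StringWriter...
let captured =
    use sw = new StringWriter()
    writeReport sw [ ("apples", 3); ("pears", 12) ]
    sw.ToString()

// ...or the console's standard output stream
writeReport Console.Out [ ("plums", 7) ]
```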
Some simple input/output routines are provided in the `System.Console` class. For example:
```fsharp
> System.Console.WriteLine("Hello World");;
Hello World
> System.Console.ReadLine();;
<enter "I'm still here" here>
val it : string = "I'm still here"
```
The `System.Console.Out` object can also be used as a `TextWriter`.
Throughout this book, you've used the `printfn` function, which is one way to print strings from F# values. This is a powerful, extensible technique for type-safe formatting. A related function called `sprintf` builds strings:
```fsharp
> sprintf "Name: %s, Age: %d" "Anna" 3;;
val it : string = "Name: Anna, Age: 3"
```
The format strings accepted by `printf` and `sprintf` are recognized and parsed by the F# compiler, and their use is statically type checked to ensure the arguments given for the formatting holes are consistent with the formatting directives. For example, if you use an integer where a string is expected, you see a type error:
```fsharp
> sprintf "Name: %s, Age: %d" 3 10;;
------------------------------^
error FS0001: This expression was expected to have type string but here has type int
```
Several `printf`-style formatting functions are provided in the `Microsoft.FSharp.Core.Printf` module. Table 4-4 shows the most important of these.
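Table 4.4. Formatting Functions in the `Printf` Module
Function(s) | Outputs via Type | Outputs via Object | Example |
---|---|---|---|
`printf`, `printfn` [a] | `TextWriter` | `Console.Out` (stdout) | `printfn "Result: %d" 3` |
`eprintf`, `eprintfn` [a] | `TextWriter` | `Console.Error` (stderr) | `eprintfn "Result: %d" 3` |
`fprintf`, `fprintfn` [a] | `TextWriter` | Any | `fprintfn stdout "Result: %d" 3` |
`sprintf` | `string` | Generates strings | `sprintf "Result: %d" 3` |
`bprintf` | `StringBuilder` | Any | `bprintf buf "Result: %d" 3` |
[a] The functions with a suffix n add a new line to the generated text.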
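As a small illustration of the table, `bprintf` appends formatted text to a `System.Text.StringBuilder`, while `sprintf` returns a fresh string. This sketch opens the `Printf` module so that `bprintf` can be used unqualified; the variable names are illustrative.

```fsharp
open System.Text
open Printf   // brings bprintf into scope

// Accumulate formatted fragments in a StringBuilder
let buf = StringBuilder()
bprintf buf "Name: %s, " "Anna"
bprintf buf "Age: %d" 3
let fromBuffer = buf.ToString()

// sprintf produces the same text directly as a string
let fromSprintf = sprintf "Name: %s, Age: %d" "Anna" 3
```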
Table 4-5 shows the basic formatting codes for `printf`-style formatting.
Table 4.5. Formatting Codes for `printf`-style String and Output Formatting
Code | Type Accepted | Notes |
---|---|---|
`%b` | `bool` | Prints `true` or `false` |
`%s` | `string` | Prints the string |
`%d`, `%x`, `%X`, `%o`, `%u` | Any integer type | Decimal/hex/octal format for any integer types |
`%e`, `%E`, `%f`, `%g`, `%G` | `float`, `float32` | Floating-point formats |
`%M` | `decimal` | See the .NET documentation |
`%A` | Any type | Uses structured formatting, discussed in the section "Generic Structural Formatting" and in Chapter 5 |
`%O` | Any type | Uses `Object.ToString()` |
`%a` | Any type | Takes two arguments: one is a formatting function, and one is the value to format |
`%t` | Function | Runs the function given as an argument |
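The `%a` and `%t` codes are the least obvious entries in the table. With `sprintf`, the function passed for `%a` receives a `unit` argument followed by the value and returns a string; with `fprintf`, it receives the `TextWriter` and writes to it directly. `%t` takes a function that produces the text by itself. A sketch using hypothetical helper names:

```fsharp
open System.IO

// %a with sprintf: the formatter takes unit and the value, and returns a string
let formatPos () (row : int, col : int) = sprintf "(row %d, col %d)" row col
let s1 = sprintf "Cursor at %a" formatPos (3, 7)

// %a with fprintf: the formatter writes to the same TextWriter
let writePos (w : TextWriter) (row : int, col : int) = fprintf w "(row %d, col %d)" row col
let s2 =
    use sw = new StringWriter()
    fprintf sw "Cursor at %a" writePos (3, 7)
    sw.ToString()

// %t with sprintf: the function takes unit and returns the text to insert
let separator () = "----"
let s3 = sprintf "start %t end" separator
```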
Any value can be formatted using a `%O` or `%A` pattern; these patterns are extremely useful when you're prototyping or examining data. `%O` converts the object to a string using the `Object.ToString()` function supported by all values. For example:
```fsharp
> System.DateTime.Now.ToString();;
val it : string = "28/06/20.. 17:14:07 PM"
> sprintf "It is now %O" System.DateTime.Now;;
val it : string = "It is now 28/06/20... 17:14:09"
```
The format strings used with `printf` are scanned by the F# compiler during type checking, which means they are used in a type-safe way: if you forget arguments, a warning is given, and if your arguments are of the wrong type, an error is given. The format strings may also include the usual range of specifiers for padding and alignment used by languages such as C, as well as some other interesting specifiers for computed widths and precisions. You can find the full details in the F# library documentation for the `Printf` module.
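A few of the padding, alignment, and computed-width specifiers mentioned above, shown with `sprintf`. With the `*` width, the width is passed as an extra integer argument placed before the value being formatted:

```fsharp
let rightAligned = sprintf "[%5d]" 42       // pad on the left to width 5
let leftAligned  = sprintf "[%-5d]" 42      // '-' flag: pad on the right instead
let zeroPadded   = sprintf "[%05d]" 42      // '0' flag: pad with zeros
let twoPlaces    = sprintf "[%.2f]" 3.14159 // precision: two decimal places
let computed     = sprintf "[%*d]" 6 42     // '*': width supplied as an argument
```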
`Object.ToString()` is a somewhat undirected way of formatting data. Structural types such as tuples, lists, records, discriminated unions, collections, arrays, and matrices are often poorly formatted by this technique. The `%A` pattern uses .NET reflection to format any F# value as a string based on the structure of the value. For example:
```fsharp
> printfn "The result is %A" [1;2;3];;
The result is [1; 2; 3]
val it : unit = ()
```
Generic structural formatting can be extended to work with any user-defined data types, a topic covered on the F# web site and in detail in the F# library documentation for the `printf` function.
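The difference between `%O` and `%A` is easiest to see on an array, whose `ToString()` reports only the type name. A small sketch; the record type `Contact` is illustrative:

```fsharp
let data = [| 1; 2; 3 |]

// %O calls Object.ToString(), which for arrays reports only the type name
let viaToString = sprintf "%O" data

// %A walks the structure of the value
let viaStructure = sprintf "%A" data

// %A also works on records, unions, and nested structures
type Contact = { Name : string; Age : int }
printfn "%A" { Name = "Anna"; Age = 3 }
```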
Many constructs in the `System.IO` namespace need to be closed after use, partly because they hold on to operating system resources such as file handles. You can ignore this issue when prototyping code in F# Interactive. However, as we touched on earlier in this chapter, in more polished code, you should use language constructs such as `use var = expr` to ensure that the resource is closed at the end of the lexical scope where a stream object is active. For example:
```fsharp
open System.IO

let myWriteStringToFile () =
    use outp = File.CreateText(@"playlist.txt")
    outp.WriteLine("Enchanted")
    outp.WriteLine("Put your records on")
```
This is equivalent to the following:
```fsharp
let myWriteStringToFile () =
    using (File.CreateText(@"playlist.txt")) (fun outp ->
        outp.WriteLine("Enchanted")
        outp.WriteLine("Put your records on"))
```
where the function `using` has the following definition in the F# library:
```fsharp
let using (ie : #System.IDisposable) f =
    try f(ie)
    finally ie.Dispose()
```
`use` and `using` ensure that the underlying stream is closed deterministically and the operating system resources are reclaimed when the lexical scope is exited. This happens regardless of whether the scope is exited because of normal termination or because of an exception. Chapter 8 covers the language construct `use`, the function `using`, and related issues in more detail.
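The guarantee that `use` disposes on both normal and exceptional exit can be observed with a small disposable type that records when it is disposed. `TrackedResource` and the `events` log are hypothetical, defined only for this sketch:

```fsharp
open System

// Hypothetical resource that logs its use and its disposal
type TrackedResource(name : string, events : ResizeArray<string>) =
    member this.Use() = events.Add(name + " used")
    interface IDisposable with
        member this.Dispose() = events.Add(name + " disposed")

let events = ResizeArray<string>()

// Scope exits normally
let normalExit () =
    use r = new TrackedResource("a", events)
    r.Use()

// Scope exits because of an exception
let exceptionalExit () : unit =
    use r = new TrackedResource("b", events)
    r.Use()
    failwith "something went wrong"

normalExit ()
try exceptionalExit () with Failure _ -> ()
```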
If you don't use `use` or `using`, or otherwise explicitly close the stream, the stream is closed when the stream object is finalized by the .NET garbage collector. However, it's generally bad practice to rely on finalization to clean up resources this way, because finalization isn't guaranteed to happen in a deterministic, timely fashion.
The keyword `null` is used in imperative programming languages as a special, distinguished value of a type that represents an uninitialized value or some other kind of special condition. In general, `null` isn't used in conjunction with types defined in F# code, although it's common to simulate `null` with a value of the `option` type. For example:
```fsharp
> let parents = [("Adam",None); ("Cain",Some("Adam","Eve"))];;
val parents : (string * (string * string) option) list = ...
```
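Matching on option values forces every case, including the missing one, to be handled explicitly. A sketch that restates the `parents` data above, with `describeParents` as a hypothetical function:

```fsharp
let parents = [ ("Adam", None); ("Cain", Some ("Adam", "Eve")) ]

// Each "no value" case must be handled; the compiler warns if one is missing
let describeParents name =
    match List.tryFind (fun (n, _) -> n = name) parents with
    | None -> sprintf "%s is unknown" name
    | Some (_, None) -> sprintf "%s has no recorded parents" name
    | Some (_, Some (father, mother)) -> sprintf "%s's parents are %s and %s" name father mother
```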
However, reference types defined in other .NET languages do support `null`; when using .NET APIs, you may have to explicitly pass `null` values to the API and also, where appropriate, test return values for `null`. The .NET Framework documentation specifies when `null` may be returned from an API. It's recommended that you test for this condition using `null` pattern tests. For example:
```fsharp
match System.Environment.GetEnvironmentVariable("PATH") with
| null -> printfn "the environment variable PATH is not defined"
| res -> printfn "the environment variable PATH is set to %s" res
```
The following is a function that incorporates a pattern type test and a `null`-value test:
```fsharp
let switchOnType (a:obj) =
    match a with
    | null -> printfn "null!"
    | :? System.Exception as e -> printfn "An exception: %s!" e.Message
    | :? System.Int32 as i -> printfn "An integer: %d!" i
    | :? System.DateTime as d -> printfn "A date/time: %O!" d
    | _ -> printfn "Some other kind of object"
```
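A common follow-up is to convert a possibly-`null` result from a .NET API into an F# option at the boundary, so that the rest of the code can stay `null`-free. Recent versions of FSharp.Core include `Option.ofObj` for this purpose; the following sketch instead defines a hypothetical helper, `tryOfNull`, to show the underlying `null` pattern:

```fsharp
open System

// Hypothetical helper: maps null to None and any other reference to Some
let tryOfNull (x : 'T when 'T : null) =
    match x with
    | null -> None
    | v -> Some v

// The variable name below is assumed not to be defined in the environment
let missing = tryOfNull (Environment.GetEnvironmentVariable("SURELY_NOT_DEFINED_12345"))
let present = tryOfNull "hello"
```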
There are other important sources of `null` values. For example, the semisafe function `Array.zeroCreate` creates an array whose values are initially `null` or, in the case of value types, an array each of whose entries is the zero bit pattern. This function is included with F# primarily because there is no alternative technique to initialize and create the array values used as building blocks of larger, more sophisticated data structures such as queues and hash tables. Of course, you must use this function with care, and in general you should hide the array behind an encapsulation boundary and be sure the values of the array aren't referenced before they're initialized.
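The following sketch shows both the raw behavior of `Array.zeroCreate` and the kind of encapsulation the paragraph recommends. `SmallStack` is a hypothetical fixed-capacity stack: its array slots start out `null` (or zero), but the type only ever returns slots it has already written.

```fsharp
let strings : string[] = Array.zeroCreate 3 // entries start as null
let ints : int[] = Array.zeroCreate 3       // entries start as 0 (zero bit pattern)

// Hypothetical stack that hides its uninitialized array behind a type boundary
type SmallStack<'T>(capacity : int) =
    let items : 'T[] = Array.zeroCreate capacity
    let mutable count = 0
    member this.Push(x : 'T) =
        if count = capacity then failwith "stack full"
        items.[count] <- x
        count <- count + 1
    member this.Pop() =
        if count = 0 then failwith "stack empty"
        count <- count - 1
        items.[count]   // only slots below the old count were ever written
    member this.Count = count
```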
Although F# generally enables you to code in a `null`-free style, F# isn't totally immune to the potential existence of `null` values: they can come from the .NET APIs, and it's also possible to use `Array.zeroCreate` and other back-door techniques to generate `null` values for F# types. If necessary, APIs can check for this condition by first converting F# values to the `obj` type by calling `box` and then testing for `null` (see the F# Informal Language Specification for full details). But in practice, this isn't required by the vast majority of F# programs; for most purposes, the existence of `null` values can be ignored.
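The `box`-then-test technique can be sketched as follows. `isNullValue` and the record type `Person` are hypothetical; `Array.zeroCreate` is used as the back door that produces a `null` record value.

```fsharp
// Hypothetical helper: null check that also works on F# types,
// by converting the value to obj before testing it
let isNullValue (x : 'T) =
    match box x with
    | null -> true
    | _ -> false

type Person = { Name : string }

// Back door: the second slot is never initialized, so it holds null
let people : Person[] = Array.zeroCreate 2
people.[0] <- { Name = "Anna" }
```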
F# stems from a tradition in programming languages where the emphasis has been on declarative and functional approaches to programming in which state is made explicit, largely by passing extra parameters. Many F# programmers use functional programming techniques first before turning to their imperative alternatives, and we encourage you to do the same, for all the reasons listed at the start of this chapter.
However, F# also integrates imperative and functional programming together in a powerful way. F# is actually an extremely succinct imperative programming language! Furthermore, in some cases, no good functional techniques exist to solve a problem, or those that do are too experimental for production use. This means that in practice, using imperative constructs and libraries is common in F#: for example, many of the examples you saw in Chapters 2 and 3 used side effects to report their results or to create GUI components.
Regardless, we still encourage you to think functionally, even about your imperative programming. In particular, it's always helpful to be aware of the potential side effects of your overall program and the characteristics of those side effects. The following sections describe five ways to help tame and reduce the use of side effects in your programs.
When imperative programmers begin to use F#, they frequently use mutable local variables or reference cells heavily as they translate code fragments from their favorite imperative language into F#. The resulting code is often clumsy and hard to follow. Over time, they learn to avoid many uses of mutable locals. For example, consider the following (naive) implementation of factorization, transliterated from C code:
```fsharp
let factorizeImperative n =
    let mutable primefactor1 = 1
    let mutable primefactor2 = n
    let mutable i = 2
    let mutable fin = false
    while (i < n && not fin) do
        if (n % i = 0) then
            primefactor1 <- i
            primefactor2 <- n / i
            fin <- true
        i <- i + 1
    if (primefactor1 = 1) then None
    else Some (primefactor1, primefactor2)
```
This code can be replaced by the following use of an inner recursive function:
```fsharp
let factorizeRecursive n =
    let rec find i =
        if i >= n then None
        elif (n % i = 0) then Some(i, n / i)
        else find (i + 1)
    find 2
```
The second version is not only shorter but also uses no mutation, which makes it easier to reuse and maintain. You can also see that the loop terminates (`i` is increasing toward `n`) and see the two exit conditions for the function (`i >= n` and `n % i = 0`). Note that the state `i` has become an explicit parameter.
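The same transformation applies to most loops over mutable locals: each mutable variable becomes a parameter of an inner recursive function, and each update becomes an argument to the recursive call. A sketch on a different, hypothetical task (summing the decimal digits of a non-negative integer):

```fsharp
// Imperative style: mutable locals updated in a loop
let digitSumImperative (n : int) =
    let mutable rest = n
    let mutable total = 0
    while rest > 0 do
        total <- total + rest % 10
        rest <- rest / 10
    total

// Functional style: the loop state becomes explicit parameters of an inner
// recursive function; the recursive call is in tail position
let digitSumRecursive (n : int) =
    let rec loop rest total =
        if rest <= 0 then total
        else loop (rest / 10) (total + rest % 10)
    loop n 0
```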
Where possible, separate out as much of your computation as you can into side-effect-free functional code. For example, sprinkling `printf` expressions throughout your code may be a good debugging technique but, if not used wisely, can lead to code that is difficult to understand and inherently imperative.
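One way to follow this advice is to compute results, including any text you intend to display, in pure functions and confine printing to a thin outer layer. A sketch with hypothetical names:

```fsharp
// Pure core: builds the report lines without printing anything
let reportLines (scores : (string * int) list) =
    scores
    |> List.sortBy (fun (_, score) -> -score)
    |> List.mapi (fun i (name, score) -> sprintf "%d. %s (%d)" (i + 1) name score)

// Imperative shell: the only place where output happens
let printReport scores =
    reportLines scores |> List.iter (printfn "%s")

printReport [ ("Ann", 3); ("Bob", 7) ]
```

Because `reportLines` has no side effects, it can be tested by comparing its result directly, without capturing console output.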
A common technique of object-oriented programming is to ensure that mutable data structures are private, nonescaping, and, where possible, fully separated, which means there is no chance that distinct pieces of code can access each other's internal state in undesirable ways. Fully separated state can even be used inside the implementation of what, to the outside world, appears to be a purely functional piece of code.
For example, where necessary, you can use side effects on private data structures allocated at the start of an algorithm and then discard these data structures before returning a result; the overall result is then effectively a side-effect-free function. One example of separation from the F# library is the library's implementation of `List.map`, which uses mutation internally; the writes occur on an internal, separated data structure that no other code can access. Thus, as far as callers are concerned, `List.map` is pure and functional. The following is a second example that divides a sequence of inputs into equivalence classes (the F# library function `Seq.groupBy` does a similar thing):
```fsharp
open System.Collections.Generic

let divideIntoEquivalenceClasses keyf seq =
    // The dictionary to hold the equivalence classes
    let dict = new Dictionary<'key, ResizeArray<'T>>()
    // Build the groupings
    seq |> Seq.iter (fun v ->
        let key = keyf v
        let ok, prev = dict.TryGetValue(key)
        if ok then prev.Add(v)
        else
            let prev = new ResizeArray<'T>()
            dict.[key] <- prev
            prev.Add(v))
    // Return the sequence-of-sequences. Don't reveal the
    // internal collections: just reveal them as sequences
    dict |> Seq.map (fun group -> group.Key, Seq.readonly group.Value)
```
This uses the `Dictionary` and `ResizeArray` mutable data structures internally, but these mutable data structures aren't revealed externally. The inferred type of the overall function is as follows:
```fsharp
val divideIntoEquivalenceClasses : ('T -> 'key) -> seq<'T> -> seq<'key * seq<'T>>
```
Here is an example use:
```fsharp
> divideIntoEquivalenceClasses (fun n -> n % 3) [ 0 .. 10 ];;
val it : seq<int * seq<int>>
    = seq [(0, seq [0; 3; 6; 9]); (1, seq [1; 4; 7; 10]); (2, seq [2; 5; 8])]
```
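The same pattern of separated internal state works for smaller tasks too. The hypothetical `distinctInOrder` below removes duplicates while preserving first occurrences; it mutates a `HashSet` and a `ResizeArray`, but both are allocated per call and never escape, so callers see an ordinary pure function. (The library function `Seq.distinct` offers similar behavior.)

```fsharp
open System.Collections.Generic

// Uses mutable collections internally, but callers only see an immutable list
let distinctInOrder (items : 'T list) =
    let seen = HashSet<'T>()      // private, allocated per call
    let result = ResizeArray<'T>()
    for x in items do
        // HashSet.Add returns true only the first time a value is added
        if seen.Add(x) then result.Add(x)
    List.ofSeq result
```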
It's often helpful to use the weakest set of side effects necessary to achieve your programming task and at least be aware when you're using strong side effects:
Weak side effects are effectively benign given the assumptions you're making about your application. For example, writing to a log file is very useful and is essentially benign (if the log file can't grow arbitrarily large and crash your machine!). Similarly, reading data from a stable, unchanging file store on a local disk is effectively treating the disk as an extension of read-only memory, so reading these files is a weak form of side effect that isn't difficult to incorporate into your programs.
Strong side effects have a much more corrosive effect on the correctness and operational properties of your program. For example, blocking network I/O is a relatively strong side effect by any measure. Performing blocking network I/O in the middle of a library routine can have the effect of destroying the responsiveness of a GUI application, at least if the routine is invoked by the GUI thread of an application. Any constructs that perform synchronization between threads are also a major source of strong side effects.
Whether a particular side effect is stronger or weaker depends very much on your application and whether the consequences of the side effect are sufficiently isolated and separated from other entities. Strong side effects can and should be used freely in the outer shell of an application or when you're scripting with F# Interactive; otherwise, not much can be achieved.
When you're writing larger pieces of code, you should write your application and libraries in such a way that most of your code either doesn't use strong side effects or at least makes it obvious when these side effects are being used. Threads and concurrency are commonly used to mediate problems associated with strong side effects; Chapter 14 covers these issues in more depth.
It's generally thought to be bad style to combine delayed computations (that is, laziness) and side effects. This isn't entirely true; for example, it's reasonable to set up a read from a file system as a lazy computation using sequences. However, it's relatively easy to make mistakes in this sort of programming. For example, consider the following code:
```fsharp
open System.IO

let reader1, reader2 =
    let reader = new StreamReader(File.OpenRead("test.txt"))
    let firstReader() = reader.ReadLine()
    let secondReader() = reader.ReadLine()

    // Note: we close the stream reader here!
    // But we are returning function values which use the reader
    // This is very bad!
    reader.Close()
    firstReader, secondReader

// Note: stream reader is now closed! The next line will fail!
let firstLine = reader1()
let secondLine = reader2()
firstLine, secondLine
```
This code is wrong because the `StreamReader` object `reader` is used after it has been closed at the point indicated by the comment. The returned function values are then called, and they try to read from the captured variable `reader`. Function values are just one example of delayed computations: other examples are lazy values, sequences, and any objects that perform computations on demand. Be careful not to build delayed objects, such as `reader1` and `reader2` here, that capture handles to transient, disposable resources, unless those objects are used in a way that respects the lifetime of that resource.
The previous code can be corrected to avoid using laziness in combination with a transient resource:
```fsharp
open System.IO

let line1, line2 =
    let reader = new StreamReader(File.OpenRead("test.txt"))
    let firstLine = reader.ReadLine()
    let secondLine = reader.ReadLine()
    reader.Close()
    firstLine, secondLine
```
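The same mistake is easy to make with sequences, which are also delayed. In this sketch, `linesOf` (a hypothetical helper) reads lines on demand, and a `System.IO.StringReader` stands in for a file so the example is self-contained. Returning the sequence from inside a `use` scope lets it outlive the reader; materializing it with `List.ofSeq` before the scope ends does not.

```fsharp
open System
open System.IO

// Reads lines lazily: nothing happens until the sequence is enumerated.
// Peek() returning -1 marks the end of input for a StringReader.
let linesOf (reader : TextReader) =
    seq { while reader.Peek() >= 0 do
              yield reader.ReadLine() }

// Wrong: the sequence escapes the 'use' scope, so by the time it is
// enumerated, the reader has already been disposed
let badLines () =
    use reader = new StringReader("first\nsecond")
    linesOf reader

// Better: force the delayed computation while the reader is still open
let goodLines () =
    use reader = new StringReader("first\nsecond")
    linesOf reader |> List.ofSeq

let badLinesFailed =
    try
        badLines () |> List.ofSeq |> ignore
        false
    with :? ObjectDisposedException -> true
```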
Another technique uses language and/or library constructs that tie the lifetime of an object to some larger object. For example, you can use a `use` binding within a sequence expression, which augments the sequence object with the code needed to clean up the resource when iteration is finished or terminates. This technique is discussed further in Chapter 8 and shown by example here:
```fsharp
open System.IO

let reader =
    seq { use reader = new StreamReader(File.OpenRead("test.txt"))
          while not reader.EndOfStream do
              yield reader.ReadLine() }
```
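The effect of a `use` binding inside a sequence expression can be observed with a disposable type that records its disposal (`Tracker` and `trace` are hypothetical). Even when a consumer stops early, as `Seq.take` does here, the enumerator is disposed and the resource with it:

```fsharp
open System

// Hypothetical disposable that records when it is disposed
type Tracker(trace : ResizeArray<string>) =
    interface IDisposable with
        member this.Dispose() = trace.Add("disposed")

let trace = ResizeArray<string>()

let numbers =
    seq { use t = new Tracker(trace)
          for i in 1 .. 10 do
              trace.Add(sprintf "yield %d" i)
              yield i }

// Only two elements are consumed, yet the 'use' binding still runs Dispose
// once enumeration stops
let firstTwo = numbers |> Seq.take 2 |> List.ofSeq
```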
The general lesson is to try to keep your core application pure. Use both delayed computations (laziness) and imperative programming (side effects) where appropriate, but be careful about using them together.
In this chapter, you learned how to do imperative programming in F#, from some of the basic mutable data structures such as reference cells to working with side effects such as exceptions and I/O. You also looked at some general principles for avoiding the need for imperative programming and isolating your uses of side effects. The next chapter returns to some of the building blocks of both functional and imperative programming in F#, with a deeper look at types, type inference, and generics.