Of the three main glues of Haskell (HOFs, the type system, and laziness), laziness is different in that it is not a concrete thing in the language, but is instead related to the way the code will be evaluated in the runtime. Laziness is something that we have to know about rather than something we can always see in the code.
Of course, laziness does show in the language, for example, wherever we want to enforce strict evaluation, as with the sequence
function:
main = do –- lazy IO stream let ios = map putStrLn ["this", "won't", "run"] putStrLn "until ios is 'sequenced'..." sequence_ ios –- perform actions
Where:
sequence_ :: [IO ()] -> IO () sequence_ = foldr (>>) (return ())
The sequence_
function discards the action results because the (>>)
operator discards the results of the its firsts argument action.
In contrast, the sequence
function retains the results:
main = do h <- openFile "jabberwocky.txt" ReadMode line1 <- hGetLine h -– perform action let getLines = [hGetLine h, hGetLine h] [line2, line3] <- sequence getLines -- perform actions hClose h putStrLn line2
The line1
string is read eagerly with the first hGetLine h
action, while the line2
and line3
are only read when "sequenced".
While the hGetLine
function returns a strict string, the hGetContents
function returns a lazy string. Put another way, hGetContents
streams file contents on demand while hGetLine
reads the file eagerly:
main = do h <- openFile "jabberwocky.txt" ReadMode contents <- hGetContents h putStrLn (take 10 contents) -- lazily fetch 10 chars hClose h
Using lines
together with hGetContents
, we get a lazy stream of file lines:
lineStream h = hGetContents h >>= return . lines main = do h <- (openFile "jabberwocky.txt" ReadMode) lines' <- lineStream h sequence_ (map putStrLn lines') hClose h
The mapM
function captures the common pattern of mapping and sequencing:
main = do
h <- (openFile "jabberwocky.txt" ReadMode)
lines' <- lineStream h
mapM_ putStrLn lines'
hClose h
Here, mapM_ f
is equal to sequence_ (map f)
.
The forM_
function is just mapM_
with flipped arguments, which is useful when you want to pass a "trailing lambda":
main = do h <- (openFile "jabberwocky.txt" ReadMode) lines' <- lineStream h forM_ lines' $ line -> do let reversed = reverse line putStrLn reversed hClose h
When performing lazy I/O, we need to make the distinction between an I/O action and performing an I/O action. Also, we need to know the lazy/strict characteristics of the functions we are working with (for example, hGetLine
and hGetContents
).
Let's return to our imperative style I/O code from the previous section and rephrase it in the style of lazy I/O. We'll retain the Chunk
data type and the parseChunk
function from the imperative example:
data Chunk = Chunk {chunk :: String} | LineEnd {chunk :: String, remainder :: String} parseChunk :: String -> Chunk -– parseChunk (B8.pack "gimble in the wabe: All") -- gives: LineEnd "gimble in the wabe" -- "All"
To write this in a lazy I/O style, we'll start by defining a stream of file chunks:
-- import qualified Data.ByteString.Lazy as LB -- import qualified Data.ByteString.Lazy.Char8 as L8 chunkStream :: Handle -> IO [L8.ByteString] chunkStream h = do isEof <- hIsEOF h if isEof then return [] else do chunk <- LB.hGet h 8 rest <- (chunkStream h) return (chunk:rest)
Now we can produce a stream and consume it:
main = do chunks <- chunkStream h print $ take 10 chunks
The chunk stream produces data. Next, we write a consumer:
processChunk :: String -> [L8.ByteString] -> IO () processChunk acc [] = do putStrLn acc -- terminate recursion processChunk' acc (chunk:chunks) = case (parseChunk chunk) of (Chunk chunk') -> do processChunk' (acc ++ chunk') chunks (LineEnd chunk' remainder) -> do let line = acc ++ chunk' putStrLn line -- do something with line processChunk' remainder chunks processChunk = processChunk' ""
The processChunk
function recursively consumes our stream and accumulates chunks into lines. It is tail recursive and uses constant space:
main = do
h <- openFile "jabberwocky.txt" ReadMode
chunkStream h >>= processChunk
hClose h
We decoupled producer from the consumer and the consumer drives the materializing of the source stream.
In contrast, in the imperative example, the loop
function drives the iteration through the chunks. Also, in the imperative case, the consumer is not explicit.
The processChunk
method is an I/O action with side effects. As it loops through the file chunks (via recursion), it keeps accumulating chunks until it has captured a whole line.
Then it does some I/O putStrLn line
and starts accumulating chunks for the next line. Iteration and I/O processing are interleaved. We can decouple this further, by making a pure function, lineStream
, that produces a stream of lines:
lineStream accChunks [] = [accChunks] lineStream accChunks (chunk:chunks) = case (parseChunk chunk) of (Chunk chunk') -> lineStream (accChunks ++ chunk') chunks (LineEnd chunk' remainder) -> (accChunks ++ chunk') : (lineStream remainder chunks) toLines = lineStream ""
Now we can feed chunkStream
to the toLines
function:
main = do h <- openFile "jabberwocky.txt" ReadMode lines' <- liftM toLines (chunkStream h) mapM_ putStrLn lines' hClose h
The toLines
is a pure function of its input stream while chunkStream
is a stream wrapped in an I/O monad. This is why we need liftM
to lift toLines into the Monad
. We could have said this instead:
chunks <- (chunkStream h) let lines' = toLines chunks
It is only when we do mapM putStrLn lines'
that the stream toLines
starts to materialize, which in turn drives the evaluation of chunkStream
. When we attempt to print one line, the line is materialized, just in time, lazily. Underneath the mapM_
function, it is the sequence_
function that drives the evaluation of the stream.
This brings us back full circle to the lines
library function, which lazily returns lines from file chunks:
main = do
h <- openFile "jabberwocky.txt" ReadMode
lines' <- hGetContents h >>= return . Lines
-– vs
-- lines' <- liftM toLines (chunkStream h)
mapM putStrLn lines'
hClose h
In the preceding code, we composed pure functional streams with I/O streams using monadic operators and functions. This is the lazy I/O, the pure functional way for composable I/O. In the same way that we can often express functions as pipelines of simpler functions, the same is true for I/O. Many practical I/O can be modeled as processing pipelines of streams.
Let's try to describe the essence of Lazy I/O, starting with the advantages:
The disadvantages of Lazy I/O are that it provides:
Lazy evaluation is elegant and has far reaching implications. It is an integral part of functional programming that unfortunately does not translate to performing I/O.
Let's use the
hGetLine
function alongside hGetContents
:
main = do h <- openFile "jabberwocky.txt" ReadMode firstLine <- hGetLine h -- returns a string contents <- hGetContents h -- returns a "promise" hClose h -– close file print $ words firstLine print $ words contents
We close the file before consuming the firstLine
string and the contents
stream:
print $ words firstLine ["'Twas","brillig,","and","the","slithy","toves"] print $ words contents []
The contents
stream is a live stream that gets turned off when the file is closed. The firstLine
is an eager string and survives the closing of the file.
The preceding example points to some serious problems with lazy I/O:
print $ words contents
, all the file contents will be held in memory at once. We could have used hGetContents h >>= return . lines
to print the lines incrementally, thereby using only constant space. This shows that the space requirements are contextual to how the stream is used.Despite this, lazy I/O remains an attractive option in simple situations, where space requirements and order of execution are sufficiently predictable and where resource management is easy enough. However, when there is strong demand for precise resource management or predictable space usage, lazy I/O is not an option; for example, writing networking code, handling many files, or handling many HTTP requests in a web server, and so on.
"Extensive experience in Haskell has, however, exposed severe drawbacks of lazy evaluation, which are especially grievous for stream processing of large amounts of data.
Lazy evaluation is fundamentally incompatible with computational effects, can cause fatal memory leaks, and greatly inhibits modular reasoning, especially about termination and space consumption.
Seemingly innocuous and justified changes to the code or code compositions may lead to divergence, or explosion in memory consumption."
Lazy v. Yield: Incremental, Linear Pretty-printing - Kiselyov et al
Before we look at an established solution to the problem of lazy I/O (Iteratee I/O), we'll briefly explore the basic lazy I/O approach to resource management in the face of exceptions.
So far, we have been explicitly opening and closing files. This is what we call explicit resource management:
main = do h <- (openFile "jabberwocky.txt" ReadMode) useResource h hClose h where useResource h' = (stream h') >>= mapM_ putStrLn stream h' = hGetContents h' >>= return . lines
Let's look at some higher level abstractions to capture this pattern: open resource, use it, in some way clean up resource. The crudest solution is to just ignore the problem and rely on the garbage collector for the cleanup:
main = do
contents <- readFile "jabberwocky.txt"
mapM_ putStrLn (lines contents)
The readFile
function encapsulates the file handle, which is then garbage collected when the contents
is garbage collected or when it has been entirely consumed. This is very poor resource management!
It would be more idiomatic to use the wrapper function withFile
:
main = do withFile "jabberwocky.txt" ReadMode enumerateLines where enumerateLines h = lines' h >>= mapM_ putStrLn lines' h' = hGetContents h' >>= return . lines
The withFile
function cleanly decouples the producer from the consumer and gives better control over resource management. The file will be closed in case of completion or error because withFile
makes use of the bracket
function:
bracket
(openFile "filename" ReadMode) -- acquire resource
hClose -- release resource
(h -> "do some work")
where
bracket :: IO a -- before action -> (a -> IO b) -- after action -> (a -> IO c) -- do action -> IO c -- result
The bracket
function relies on higher order functions to express a specific kind of wrapper pattern: "acquire and release". For more information, visit https://wiki.haskell.org/Bracket_pattern.
The finally
function is a special form of bracket:
finally :: IO a -- some action -> IO b -- final action: runs afterwards -> IO b -- result
The bracket family of functions helps us clean up resources more reliably, but by no means definitively solves the problem of closing resources in a timely manner. If we need more precise resource management than this (and more predictable space requirements and ordering of effects), then we must use a more sophisticated pattern for stream programming called Iteratee I/O.