In this chapter, we seek a unified view of "Generic Programming", which comes in many guises; hence the statement,
"Genericity is in the eye of the beholder" | ||
--Jeremy Gibbons, Datatype-Generic Programming |
We start with a broad perspective by reviewing Jeremy Gibbons' patterns of generic programming—many of which we have already encountered. Then we shift focus to one of the patterns—datatype-generic programming—which is characterized by generic functions parameterized by the shape of the datatype instead of the content.
To get a taste of datatype-generic programming, we'll sample three basic approaches: the sum of products, origami programming, and scrap your boilerplate.
Along the way, we'll encounter a few exotic Haskell types (Typeable
and Data
; Bifunctor
and Fix
), reveal the underpinnings of Derivable type classes and also discover four Gang of Four design patterns in the heart of datatype-generic programming:
Jeremy Gibbons describes seven patterns of generic programming, where each pattern is viewed as a different kind of parameterization. For more information, refer to Datatype-Generic Programming (available at http://www.cs.ox.ac.uk/jeremy.gibbons/publications/dgp.pdf).
Functions parameterized by values are more general than functions with hardcoded values. This is the most simple kind of generic programming.
Functions parameterized by functions (higher order functions) represent a more powerful form of genericity.
Polymorphism can also be viewed as a pattern of generic programming. For example, when we parameterize types by other types (type polymorphism):
Tree a = Leaf a | Node a (Tree a) – is more generic than TreeI = Leaf Int | Node Int TreeI
Or when we parameterize functions by types (function polymorphism):
f :: Tree a → a – is more generic than g :: String → String
In the previous chapter, we saw that the Foldable
type-class provides a uniform interface for folding over different types. To make a type Foldable
, we need to implement some functions specific to the Foldable typeclass (for example, foldMap
or foldr
).
On the one hand, the type class serves as a contract for ad hoc implementations, while on the other hand, it facilitates generic functions. Through this combination of type-specific and generic functions, type classes give us a way to generalize functions over disparate types. This can be referred to as ad hoc datatype genericity.
This pattern refers to programs specified by other programs, which can be manifested in many different ways; for example, it can refer to reflection-based programming (analyzing the structure of code and data), template-based meta-programming (for example, TemplateHaskell
, 2002), or other styles of code generation.
Derivable type-classes enable code generation for type-class instances. Haskell 98 included autoderivation for the Eq
, Ord
, Enum
, Bounded
, Show
, and Read
type classes.
A second generation of Derivable type-classes was added to the Glasgow Haskell Compiler (GHC) in stages for Data
, Typeable
, Functor
, Foldable
, Traversable
, and Generic
.
Haskell 98 Derivable type-classes were achieved through compiler analysis of the structure of the derivable datatypes, that is, a form of metaprogramming. As there was no unifying formalism underlying the different types, the early Derivable type-classes represent only a rudimentary form of generic programming.
Later in this chapter, we will encounter more generic approaches to enable Derivable type-classes.
The GeneralizedNewtypeDeriving
language extension allows a newtype
declaration to inherit some or all of the type-class instances of an inner type. This is achieved through a trivial kind of meta-programming.
We used this extension when we created a Monad
transformer stack:
{-# LANGUAGE GeneralizedNewtypeDeriving #-} newtype App a = App {runApp :: ReaderT Config (Writer String) a} deriving (Monad, MonadReader Config, MonadWriter String)
Many fundamental datatypes, such as Functor
, Applicative
, Arrow
, and Monad
are associated with mathematical laws that are meant to be obeyed by the type implementer. Since the Haskell type system is not strong enough to express type laws in general, they are not enforceable by the compiler, and the "burden of proof" remains with the type-class implementer.
There are languages (for example, Coq, Agda, and Idris) with type systems designed for expressing laws and constraints for types (known as dependently-typed programming languages; refer to Chapter 7, Patterns of Kind Abstraction, for more on this).
This limitation in Haskell (and most languages for that matter) is a strong motivator for Derivable type classes. Generic implementations of type-classes still have to obey type laws, but at least we can get this right in the generic code, and implementers can assume that the type laws are obeyed.
Although datatype generic programming is a sophisticated pattern, it is based on a simple premise—instead of defining functions for ad hoc types, we deconstruct our types into a more fundamental type representation and then write generic functions against the lower-level type representation.
Instead of writing ad hoc instances of the foldMap
function for each type, we define a lower-level type representation along with a way to translate all regular datatypes to the lower-level representation. This allows us to write generic functions on the lower level of type representation, impervious to changes in the higher level datatypes.
This is datatype-generic programming—writing generic functions parameterized by the shape of the datatype. It explains why datatype-generic programming is also said to exhibit “shape polymorphism”, “structure polymorphism” or “polytypism”.
As stated in Gibbons' Datatype-Generic Programming:
"Parametric polymorphism abstracts from the occurrence of 'integer' in 'lists of integers', whereas datatype genericity abstracts from the occurrence of 'list'"
There are many patterns of datatype-generic programming that differ in the design of the type representation and the way in which the functions on the type representation are arranged.
For the rest of this chapter, we will explore some key patterns of datatype-generic programming.