To explore datatype-generic functions in the sum of products style, we'll return to the familiar List
and Tree
types:
data List' a = Nil' | Cons' a (List' a) deriving (Show) data Tree a = Node a (Tree a) (Tree a) | Leaf a deriving (Show) aList = (Cons' 2 (Cons' 3 (Cons' 5 Nil'))) intTree = Node 2 (Leaf 3) (Node 5 (Leaf 7) (Leaf 11))
As a reference point, we define the datatype-specific size
functions:
sizeT (Leaf _) = 1 sizeT (Node _ lt rt) = 1 + (sizeT lt) + (sizeT rt) sizeL Nil' = 0 sizeL (Cons' _ xs) = 1 + (sizeL xs)
Instead of these ad hoc polymorphic functions, let's write them in a datatype-generic way. First, we define a type representation. In this section, we follow the generic programming style of Lightweight Implementation of Generics and Dynamics (LIGD) by
Cheney
and Hinze
, 2002, as revised in Libraries for Generic Programming in Haskell by Jeuring et al., 2009.
LIGD uses the sum of products style of type representation. For example, in this style, List'
would be deconstructed as either Nil
or the combination of an element with another list. In order to represent List
' we need to be able to express "Nil", "choice of either", and "combination of":
data List' a = Nil' | Cons' a (List' a)
Constructors that take no arguments (for example, Nil'
) are represented by the “unit” type U
:
data U = U deriving (Show)
To simplify this example, we discard the constructor names and focus only on the structure of the datatype. We can encode the choice between multiple constructors in the style of the Either
type:
data Choice a b = L a | R b
deriving (Show)
that is, List
is a choice between U
and Cons
':
Choice U (Cons' a (List' a))
To encode type constructors with multiple arguments, we define the Combo
type:
data Combo a b = Combo a b
deriving (Show)
Now we can express Cons'
as a combination of types:
Cons' a (List' a) -> Combo a (List' a)
The List
type can be reduced to a choice between U
and the above combination:
type RList a = Choice U (Combo a (List' a))
Note that the RList
type does not recurse but refers to List'
instead; this is called a shallow type representation.
Similarly, we can represent Tree
in this type representation:
type RTree a = Choice (Combo U a) (Combo a (Combo (Tree a) (Tree a)))
Although Combo
and Choice
take two arguments, we can express multiple arguments through nesting, for example, in (Combo a (Combo (Tree a) (Tree a)))
.
In the sum of products representation style, sum refers to our Choice
, product refers to Combo
, and unit refers to U
.
Now that we have an alternative representation for List
, we can translate to and from the type representation:
fromL :: List' a -> RList a fromL Nil' = L U fromL (Cons' x xs) = R (Combo x xs) toL :: RList a -> List' a toL (L U) = Nil' toL (R (Combo x xs)) = (Cons' x xs) main = do print $ fromL aList print $ (toL . fromL) aList
Before we move on, let's express the pair of type translation functions as a single type:
data EP d r = EP {from :: (d -> r), to :: (r -> d)}
At this stage, our type representation consists of the following three data types:
data U = U data Choice a b = L a | R b data Combo a b = Combo a b
If we are to define a generic function parameterized by this type representation, we'll need to group these disparate types into one. There are different ways of doing this and for our purposes, we'll use a GADT:
data TypeRep t where RUnit :: TypeRep U RChoice :: TypeRep a -> TypeRep b -> TypeRep(Choice a b) RCombo :: TypeRep a -> TypeRep b -> TypeRep (Combo a b) RInt :: TypeRep Int Rtype :: EP d r -> TypeRep r -> TypeRep d
What makes the TypeRep
constructor a GADT is that each constructor returns a different specialization of the general type TypeRep t
. We added two other constructors that will only make sense a bit further down the line: RInt
and RType
.
Recall that we defined RList
in terms of the type representation datatypes:
type RList a = Choice U (Combo a (List' a))
We need a corresponding type based on the TypeRep
constructors. rList
creates a more finely-typed list representation that packages the list representation together with toL
and fromL
:
rList :: TypeRep a -> TypeRep (List' a) rList tr = RType (EP fromL toL) (RChoice RUnit (RCombo tr (rList tr)))
The rList
function is a recursive function using the TypeRep
constructors as building blocks. The first argument (TypeRep a)
guides the type resolution of List'
; for example, if we passed in (TypeRep Int)
, the resulting type would be:
TypeRep Int -> TypeRep (List' Int)
This is where we need the RInt
type constructor:
rList (TypeRep Int) –- INVALID rList RInt –- VALID
Similarly, we would need additional constructors for RFloat
, RDouble
, RChar
, and so on, to deal with lists of these types.
Now, we can write a generic function, say gSize
, parameterized by both the type representation and an instance of the type:
gSize :: TypeRep a -> a -> Int gSize RUnit U = 0 gSize (RChoice trA trB) (L a) = gSize trA a gSize (RChoice trA trB) (R b) = gSize trB b gSize (RCombo trA trB) (Combo a b) = (gSize trA a) + (gSize trB b) gSize RInt _ = 1 gSize (RType ep tr) t = gSize tr (from ep t)
GADTs give us finer precision in the return types of data constructors, and hence finer precision when we perform pattern matching against algebraic datatypes.
Finally, we can apply the gSize
function to (List' Int)
:
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE GADTs #-}
main = print $ gSize (rList RInt) aList
It's worth evaluating this in slow motion:
gSize (rList RInt) aList gSize (RType ep listRep) aList gSize listRep (from ep aList) -- substitute listRep, apply 'from' to aList gSize (RChoice RUnit (RCombo RInt (rList RInt))) R (Combo 2 (Cons' 3 (Cons' 5 Nil'))) -- choose the 2nd type-rep because of R in list rep gSize (RCombo RInt (rList RInt)) (Combo 2 (Cons' 3 (Cons' 5 Nil')) –- add the matching type-rep and list-rep pairs (gSize RInt 2) + (gSize (rList RInt) (Cons' 3 (Cons' 5 Nil')) –- evalulate (gSize RInt 2) 1 + (gSize (rList RInt) (Cons' 3 (Cons' 5 Nil')) ...
Note how the TypeRep
constructors guide the pattern matching of the appropriate data contents.
Adding a new datatype, say Tree
, does not require amending the generic functions. (Of course, we may want to override a generic function for a particular type. Generic libraries have different ways to allow for this, but we'll omit those details here).
We already defined RTree
and are left with fromT
, toT
, and rTree
:
fromT :: Tree a > RTree a fromT (Leaf x) = L (Combo U x) fromT (Node x lt rt) = R (Combo x (Combo lt rt)) toT :: RTree a > Tree a toT (L (Combo U x)) = Leaf x toT (R (Combo x (Combo lt rt))) = (Node x lt rt) rTree :: TypeRep a > TypeRep (Tree a) rTree tr = RType (EP fromT toT) (RChoice (RCombo RUnit tr) (RCombo tr (RCombo (rTree tr) (rTree tr))))
Now we can apply the datatype-generic gSize
function to a Tree
instance:
main = print $ gSize (rTree RInt) intTree
This may seem like a lot of work to gain one generic function for Tree
, but we can now use the whole class of generic functions defined against the underlying type representation. If we were to add gEq
, gShow
, gFold
, gTraverse
, and so on, they would all be automatically applicable to Tree
and List
.
In 2010, a more generic approach was introduced to synthesize (and surpass) the Derivable type-classes of Haskell 98 (refer to A Generic Deriving Mechanism for Haskell by Magalhaes et al, 2010)
The generic deriving mechanism was based on a more sophisticated extension of the sum of products approach of this section. Furthermore, the new approach also enabled autoderiving of user-defined type classes.
This work has found its way into the Haskell ecosystem in the form of the GHC.Generics library.