The sum of products style

To explore datatype-generic functions in the sum of products style, we'll return to the familiar List and Tree types:

  data List' a = Nil' | Cons' a (List' a)
      deriving (Show)

  data Tree a = Node a (Tree a) (Tree a)
          | Leaf a
            deriving (Show)

  aList = (Cons' 2 (Cons' 3 (Cons' 5 Nil')))
  intTree
    = Node 2 (Leaf 3)
             (Node 5 (Leaf 7)
                     (Leaf 11))

As a reference point, we define the datatype-specific size functions:

  sizeT (Leaf _) = 1
  sizeT (Node _ lt rt) 
    = 1 + (sizeT lt) + (sizeT rt)

  sizeL Nil' = 0
  sizeL (Cons' _ xs)
    = 1 + (sizeL xs)

Instead of these ad hoc polymorphic functions, let's write them in a datatype-generic way. First, we define a type representation. In this section, we follow the generic programming style of Lightweight Implementation of Generics and Dynamics (LIGD) by Cheney and Hinze, 2002, as revised in Libraries for Generic Programming in Haskell by Jeuring et al., 2009.

The sum of products type representation

LIGD uses the sum of products style of type representation. For example, in this style, List' would be deconstructed as either Nil or the combination of an element with another list. In order to represent List' we need to be able to express "Nil", "choice of either", and "combination of":

  data List' a = Nil' | Cons' a (List' a)

Constructors that take no arguments (for example, Nil') are represented by the “unit” type U:

  data U = U
    deriving (Show)

To simplify this example, we discard the constructor names and focus only on the structure of the datatype. We can encode the choice between multiple constructors in the style of the Either type:

  data Choice a b = L a | R b
    deriving (Show)

that is, List is a choice between U and Cons':

  Choice U (Cons' a (List' a))

To encode type constructors with multiple arguments, we define the Combo type:

  data Combo a b = Combo a b
    deriving (Show)

Now we can express Cons' as a combination of types:

  Cons' a (List' a) -> Combo a (List' a)

The List type can be reduced to a choice between U and the above combination:

  type RList a = Choice U (Combo a (List' a))

Note that the RList type does not recurse but refers to List' instead; this is called a shallow type representation.

Similarly, we can represent Tree in this type representation:

type RTree a = Choice (Combo U a) 
                      (Combo a  
                               (Combo (Tree a)  
                                      (Tree a)))

Although Combo and Choice take two arguments, we can express multiple arguments through nesting, for example, in (Combo a (Combo (Tree a) (Tree a))).

In the sum of products representation style, sum refers to our Choice, product refers to Combo, and unit refers to U.

Translating between the type and representation

Now that we have an alternative representation for List, we can translate to and from the type representation:

  fromL :: List' a -> RList a
  fromL Nil'          = L U
  fromL (Cons' x xs)  = R (Combo x xs)

  toL :: RList a -> List' a
  toL (L U)              = Nil'
  toL (R (Combo x xs))   = (Cons' x xs)

  main = do
    print $ fromL aList
    print $ (toL . fromL) aList

Before we move on, let's express the pair of type translation functions as a single type:

 data EP d r = EP {from :: (d -> r),
                  to :: (r -> d)}

Writing a datatype-generic function

At this stage, our type representation consists of the following three data types:

  data U = U
  data Choice a b = L a | R b
  data Combo a b = Combo a b

If we are to define a generic function parameterized by this type representation, we'll need to group these disparate types into one. There are different ways of doing this and for our purposes, we'll use a GADT:

data TypeRep t where
  RUnit   :: TypeRep U
  RChoice :: TypeRep a -> TypeRep b 
                 -> TypeRep(Choice a b)
  RCombo  :: TypeRep a -> TypeRep b 
             -> TypeRep (Combo a b)
  RInt  :: TypeRep Int
  Rtype :: EP d r -> TypeRep r -> TypeRep d

What makes the TypeRep constructor a GADT is that each constructor returns a different specialization of the general type TypeRep t. We added two other constructors that will only make sense a bit further down the line: RInt and RType.

Recall that we defined RList in terms of the type representation datatypes:

  type RList a = Choice U (Combo a (List' a))

We need a corresponding type based on the TypeRep constructors. rList creates a more finely-typed list representation that packages the list representation together with toL and fromL:

  rList :: TypeRep a -> TypeRep (List' a)
  rList tr = RType (EP fromL toL) 
                   (RChoice RUnit  
                              (RCombo tr (rList tr)))

The rList function is a recursive function using the TypeRep constructors as building blocks. The first argument (TypeRep a) guides the type resolution of List'; for example, if we passed in (TypeRep Int), the resulting type would be:

   TypeRep Int -> TypeRep (List' Int)

This is where we need the RInt type constructor:

  rList (TypeRep Int) –- INVALID
  rList RInt          –- VALID

Similarly, we would need additional constructors for RFloat, RDouble, RChar, and so on, to deal with lists of these types.

Now, we can write a generic function, say gSize, parameterized by both the type representation and an instance of the type:

  gSize :: TypeRep a -> a -> Int
  gSize RUnit  U  = 0
  gSize (RChoice trA trB) (L a) 
    = gSize trA a
  gSize (RChoice trA trB) (R b) 
    = gSize trB b
  gSize (RCombo  trA trB) (Combo a b)
    = (gSize trA a) + (gSize trB b)
  gSize RInt  _ = 1 
  gSize (RType ep tr) t           
    = gSize tr (from ep t)

GADTs give us finer precision in the return types of data constructors, and hence finer precision when we perform pattern matching against algebraic datatypes.

Finally, we can apply the gSize function to (List' Int):

{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE GADTs #-}

main = print $ gSize (rList RInt) aList

It's worth evaluating this in slow motion:

   gSize (rList RInt) aList
   gSize (RType ep listRep) aList
   gSize listRep (from ep aList)

  -- substitute listRep, apply 'from' to aList
  gSize (RChoice RUnit (RCombo RInt (rList RInt))) 
        R (Combo 2 (Cons' 3 (Cons' 5 Nil')))

  -- choose the 2nd type-rep because of R in list rep
  gSize (RCombo RInt (rList RInt))
        (Combo  2    (Cons' 3 (Cons' 5 Nil'))

  –- add the matching type-rep and list-rep pairs
  (gSize RInt 2) 
         + (gSize (rList RInt) (Cons' 3 (Cons' 5 Nil'))

  –- evalulate (gSize RInt 2) 
  1  + (gSize (rList RInt) (Cons' 3 (Cons' 5 Nil'))
  ...

Note how the TypeRep constructors guide the pattern matching of the appropriate data contents.

Adding a new datatype

Adding a new datatype, say Tree, does not require amending the generic functions. (Of course, we may want to override a generic function for a particular type. Generic libraries have different ways to allow for this, but we'll omit those details here).

We already defined RTree and are left with fromT, toT, and rTree:

fromT :: Tree a ­> RTree a 
fromT (Leaf x)         = L (Combo U x)  
fromT (Node x lt rt) = R (Combo x (Combo lt rt)) 
 
toT :: RTree a ­> Tree a 
toT (L (Combo U x))                   = Leaf x 
toT (R (Combo x (Combo lt rt))) = (Node x lt rt) 
 
rTree :: TypeRep a ­> TypeRep (Tree a) 
rTree tr = RType (EP fromT toT) 
                 (RChoice (RCombo RUnit tr)   
                          (RCombo tr  
                                 (RCombo (rTree tr)  
                                         (rTree tr)))) 

Now we can apply the datatype-generic gSize function to a Tree instance:

main = print $ gSize (rTree RInt) intTree

This may seem like a lot of work to gain one generic function for Tree, but we can now use the whole class of generic functions defined against the underlying type representation. If we were to add gEq, gShow, gFold, gTraverse, and so on, they would all be automatically applicable to Tree and List.

GHC.Generics – a generic deriving mechanism

In 2010, a more generic approach was introduced to synthesize (and surpass) the Derivable type-classes of Haskell 98 (refer to A Generic Deriving Mechanism for Haskell by Magalhaes et al, 2010)

The generic deriving mechanism was based on a more sophisticated extension of the sum of products approach of this section. Furthermore, the new approach also enabled autoderiving of user-defined type classes.

This work has found its way into the Haskell ecosystem in the form of the GHC.Generics library.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset