Abstracting datatypes

In this section, we will describe a series of patterns related to data abstraction. We start with existentially quantified types then progress to phantom types and end with GADTs. We'll see that these patterns fall within a spectrum of generality and power.

Universal quantification

Let's explore existential quantification from the perspective of its opposite, universal quantification.

All parametrically polymorphic functions, from Rank 1 to higher rank functions, are universally quantified. Similarly, parametrically polymorphic data­types are universally quantified. If the Haskell syntax for universally quantified functions and data­types were consistent, then we would have had to use the forall keyword in data­type signatures to indicate polymorphism, for example:

  data Maybe' a =           Nothing' | Just' a
  -- conceptually (but not practically!) the same as 
  data Maybe' a = forall a. Nothing' | Just' a

As another example, consider the universally quantified type ObjU, which mimics an OOP object with one property (of type a) and two object methods:

data ObjU a 
    = ObjU a           -- property 
       (a -> Bool)     -- obj method 
       (a -> String)   -- obj method 

To apply either of the object methods to the property value, we write new functions that use pattern matching to extract the respective method and value:

obj_f1 :: ObjU a -> Bool
obj_f1 (ObjU v f1 _) = f1 v

obj_f2 :: ObjU a -> String
obj_f2 (ObjU v _ f2) = f2 v

main = do
  print $ obj_f1 obj -- even 3
  print $ obj_f2 obj -- show 3
  where obj = (ObjU 3 even show)

We've packaged a value with some functions in the ObjU object but we have not managed to encapsulate the property value as a true OOP object would have. This is what existential quantification enables us to do.

Existential quantification and abstract datatypes

In contrast to the universally quantified type ObjU of the previous section, the following existentially quantified type ObjE “hides” the type parameter a (which is no longer present on the lefthand side):

     -- ObjU a  =              ObjU a (a ­> Bool) (a ­> String)  
data ObjE       = ​ forall a.​   ObjE a (a ­> Bool) (a ­> String)

This means that the type parameter is also "hidden" in the following type signatures of objE_f1 and objE_f2:

objE_f1 :: ObjE -> Bool
objE_f1 (ObjE v f1 _) = f1 v

objE_f2 :: ObjE -> String
objE_f2 (ObjE v _ f2) = f2 v

-- requires {-# LANGUAGE ExistentialQuantification #-}
main = do
  print $ objE_f1 obj -- even 3
  print $ objE_f2 obj -- show 3
  where obj = (ObjE 3 even show)

We can access an encapsulated object property only with the functions packaged with that property. For example, we can't apply any other function to the following property:

-- INVALID (cannot infer types)
objE_f3 (ObjE v f1 f2) = v 

Existential quantification provides us with the means to implement abstract datatypes, thus providing functions over a type while hiding the representation of the type. We can also achieve the abstract datatypes on a higher level by using Haskell's module system to hide algebraic datatype constructors while exporting functions over the type.

For the remainder of this chapter, we will refer to types that are existentially qualified as "existentials".

Universal

Existential

Type parametrization

Type abstraction

Parametric Polymorphism

Encapsulation

user of data specifies type

implementer of data specifies type

forall = "for all"

forall = "for some"

Phantom types

Phantom types were introduced in 1999 as a solution to the challenges that arise when embedding a type-safe domain specific language (DSL) in Haskell. (See Fun with Phantom Types, Hinze http://www.cs.ox.ac.uk/ralf.hinze/publications/With.pdf)

Consider this trivial expression language and evaluator:

data Expr1 = I1 Int 
             | Add1 Expr1 Expr1

eval1 :: Expr1 -> Int
eval1 (I1 v) = v
eval1 (Add1 x y) = (eval1 x) + (eval1 y)

When we add another base type (B2 Bool) to the expression language, the situation gets more complicated.

data Expr2 = I2 Int 
             |  B2 Bool 
             |  Add2 Expr2 Expr2
     deriving Show

This has brought about the following two problems:

  1. The Add2 type constructor was meant to only work with I2 values, but regular algebraic datatypes don't allow us to express this relationship (constraint) between the two constructors. This means we can not construct “bad” values:
      -– construct a "bad" value
      (Add2 (I2 11) (B2 True))
  2. The type of the eval function can no longer be inferred (or defined):
        -- INVALID
        eval2 :: Expr2 -> t 
        eval2 (I2 v) = v
        eval2 (B2 v) = v 
        eval2 (Add2 x y) = (eval2 x) + (eval2 y)

Phantom types solve our first problem by adding the type parameter t to the Expr3 type:

data Expr3 t = I3 Int 
               |  B3 Bool 
               |  Add3 (Expr3 Int) (Expr3 Int)
 deriving Show

The type t serves as a type placeholder that can be used by each constructor to describe its particular type. However, all the constructors still return the same type:

I3   ::                   Int  -> Expr3 t  
B3   ::                   Bool -> Expr3 t 
Add3 :: Expr3 Int -> Expr3 Int  -> Expr3 t

The Expr3 type is parametrized by the type t, but t does not appear in any of the constructors, hence the term phantom type. We can still construct invalid values, for example for example the following will be rejected:

  print $ add3 (i3 10) (b3 True)  INVALID

However, now we can use the phantom type information to create type-safe smart constructors:

  i3 :: Int -> Expr3 Int
  i3 = I3

  b3 :: Bool -> Expr3 Bool
  b3 = B3

  add3 :: Expr3 Int -> Expr3 Int -> Expr3 Int
  add3 = Add3

If we use the smart constructors instead of the datatype constructors, the Haskell type-checker will prevent us from creating invalid values, for example, the following will be rejected:

-- rejected by compiler  
   print $ add3 (i3 10) (b3 True) -- NVALID

However, type inference remains a problem because the values are still not described accurately. For example:

  (I3 12) :: Expr3 t   -- this 
  --      :: Expr3 Int – not this

The effect is that adding values remains too ambiguous:

  eval3 (Add3 x y) = (eval3 x) + (eval3 y)

Despite these limitations, phantom types continue to be useful in other areas. For example, the Lens library uses the const phantom type to great effect (See https://github.com/ekmett/lens/wiki/Derivation).

We have seen here how phantom types enable type-safe construction (stated as problem (1) earlier in this section). To solve problem (2) we'll need the power of generalized algebraic datatypes (GADTs).

Generalized algebraic datatypes

GADTs emerged independently from both ML and Haskell camps in 2003 and were a part of the GHC by 2005 (see History of Haskell, Hudak et al. and First-class Phantom Types, Cheney and Hinze).

GADTs bring together phantom types, smart constructors, and refined pattern matching. Let's rewrite our expression language from earlier using GADTs:

{-# LANGUAGE GADTs #-}

data Expr t where  -- phantom 't'
  -- built-in smart constructors
  I :: Int  -> Expr Int
  B :: Bool -> Expr Bool
  Add :: Expr Int -> Expr Int -> Expr Int

The GADT smart constructors describe constrained instances of Expr t. As with phantom types, smart constructors provide increased type safety for data construction. However, GADTs give us something we don't get from phantom types:

  eval :: Expr t -> t
  eval (I v) = v
  eval (B v) = v
  eval (Add x y) = (eval x) + (eval y)

  --  eval (Add (I 10) (I 12))

Because the GADT smart constructors are built into the type, we can pattern match on them. This solves the problem of type inference that we had with phantom types. This is why GADTs are also known as first-class phantom types.

Although GADTs are expressed using special syntax (for example, data Expr t where), they are characterised by the relationship between the type parameters and the constructor return types. On the other hand, phantom types don’t use any special syntax but are instead implied by the lack of appearance of a type parameter in type constructors.

There is a subtle drift in the meaning of a type parameter, from signifying the type of some embedded value to expressing type metadata.

The Typecase pattern

Generic programming (the focus of the following Chapter 6, Patterns of Generic Programming) is another important use-case for GADTs. As an example, consider a type representation Rep that unifies the three types Int, Char and List:

data Rep t where
  RInt  :: Rep Int
  RChar :: Rep Char
  RList :: Show a => Rep a -> Rep [a]

The RList constructor can be thought of as being existentially qualified (a does not appear on the left-hand side). The phantom t in Rep t will serve as type metadata.

We can now write a function that takes a value along with its type representation:

  showT :: Show t => Rep t -> t -> String

  showT RInt i  = (show i) ++ " :: INT"
  showT RChar i = (show i) ++ " :: Char"

  showT (RList rep) [] = "THE END"
  showT (RList rep) (x:xs) 
    = (showT rep x) ++ ", " ++ 
      (showT (RList rep) xs)

The showT function is a type-indexed function because it is defined for each member of the family of types Rep t:

  showT RInt 3
  showT (RList RInt) [12,13,14]
  showT (RList RChar) ['2','3','5']

More precisely, showT is a closed type-indexed function because the type index family (Rep t) is fixed.

In contrast, the show function of the Show type-class is an example of an open type-indexed function where show is simply type-indexed by instances of Show and considered "open" because we can add new types to the type index freely (in this case, new instances of the type-class Show).

In languages that allow us to reflect on the type of a value, we can write showT by dealing with the parameter value on a "type-case" basis:

– pseudo code
case (type t) of 
  Int  -> (show t) ++ ":: INT"
  Char -> (show t) ++ ":: CHAR"
  List -> -- ... show list

This style has been distilled into a design pattern:

 

"TypeCase: a design pattern that allows the definition of closed type-indexed functions, in which the index family is fixed but the collection of functions is extensible"

 
 --Oliveira and Gibbons, TypeCase: A Design Pattern for Type-indexed Functions

Dynamic types

We now have the ingredients to define dynamic types. All we need to do is package a type together with the type representation. In our first attempt, we'll do the packaging using existential quantification:

data DynamicEQ = forall t. Show t => DynEQ (Rep t) t

Even though DynEq dynamic values have opaque type, they are well typed. For example, we can use them to express heterogeneous lists:

dynEQList = [DynEQ RChar 'x',
             DynEQ RInt 3]

Since GADTs generalize existentials, we can also write the above example in the form of a "dynamic GADT":

data Dynamic where 
  Dyn :: Show t => Rep t -> t -> Dynamic

instance Show Dynamic where
  show (Dyn rep v) = showT rep v

We can use this GADT to define a heterogeneous list of dynamically typed values:

dynList :: [Dynamic]
dynList = [Dyn RChar 'x', Dyn RInt 3]

showDyn (Dyn rep v) = showT rep v

To achieve representable lists of dynamic types, (RList RDyn), we need to add another constructor to our representation Rep:

data Rep t where
  RInt  :: Rep Int
  RChar :: Rep Char
  RList :: Show a => Rep a -> Rep [a]
  RDyn :: Rep Dynamic

We also need to add another clause for showT to deal with dynamic values (analogous to ShowDyn):

showT RDyn (Dyn rep v) = showT rep v

Now we have the generic functions acting on dynamic types:

main = do 
  print $ showT RInt 17
  print $ showT (RList RInt) [12,13,14]
  print $ showT (RList RDyn) dynList

Dynamic types carry enough type information about themselves to enable safe type casting:

toInt :: Dynamic -> Maybe Int
toInt (Dyn RInt i) = Just i
toInt (Dyn _ _)    = Nothing

(See Fun with Phantom Types, Hinze: http://www.cs.ox.ac.uk/ralf.hinze/publications/With.pdf, and Generalized Algebraic Data Types in Haskell, Anton Dergunov: https://themonadreader.files.wordpress.com/2013/08/issue221.pdf)

Heterogeneous lists

In the previous sections, we saw that GADTs generalize phantom types and existential types. Let's explore the heterogeneous lists pattern (lists of varying types), first using existentials and then GADTs.

Using existentials

We can quite easily define a heterogeneous list using existentials:

{-# LANGUAGE ExistentialQuantification #-}

data LI_Eq1 = forall  a. LI_Eq1 a

hListEq1 :: [LI_Eq1]
hListEq1 =  [LI_Eq1 3, LI_Eq1 "5"]

However, as we saw earlier, we can't do anything with this list. For example, in order to show list items we need to package a show function with each item:

data LI_Eq2 = forall  a. LI_Eq2 a (a -> String)

hListEq2 :: [LI_Eq2]
hListEq2 =  [LI_Eq2  3  (show :: Int -> String), 
             LI_Eq2 "5" (show :: String -> String)]

(We add the show type signatures here for the sake of clarity but they can be inferred, in other words omitted.)

showEq2 (LI_Eq2 v showF) = showF v
-- e.g. main = mapM_ (putStrLn . showEq2) hListEq2

Instead of passing in the show functions explicitly, we can constrain the type signatures to the Show type­class, allowing us to implicitly supply the show functions:

data LI_Eq3 = forall a. Show a => LI_Eq3 a

hListEq3 :: [LI_Eq3]
hListEq3 =  [LI_Eq3  3, LI_Eq3 "5"]

showEq3 (LI_Eq3 v) = show v

The type-class constraint specified in the existential amounts to what is called bounded quantification (bounded by type-class).

Using GADTs

We can express heterogeneous lists in the same two styles that we used with existentials. In the first style we pass in the show function (a -> String):

{-# LANGUAGE GADTs #-}

data LI_Gadt1 where
  {MkShow1 :: a -> (a -> String) -> LI_Gadt1}

hListGadt1 :: [LI_Gadt1]
hListGadt1 = [MkShow1 "3" show, MkShow1 5 show]

showGadt1 (MkShow1 v showF) = showF v

Alternatively, we can also use a GADT with bounded quantification:

data LI_Gadt2 where
  {MkShow2 :: Show a => a -> LI_Gadt2}

hListGadt1 :: [LI_Gadt1]
hListGadt2 = [MkShow2 "3", MkShow2 5]

showGadt2 (MkShow2 v) = show v
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset