In this section, we will describe a series of patterns related to data abstraction. We start with existentially quantified types then progress to phantom types and end with GADTs. We'll see that these patterns fall within a spectrum of generality and power.
Let's explore existential quantification from the perspective of its opposite, universal quantification.
All parametrically polymorphic functions, from Rank 1
to higher rank functions, are universally quantified. Similarly, parametrically polymorphic datatypes are universally quantified. If the Haskell syntax for universally quantified functions and datatypes were consistent, then we would have had to use the forall
keyword in datatype signatures to indicate polymorphism, for example:
data Maybe' a = Nothing' | Just' a
-- conceptually (but not practically!) the same as
data Maybe' a = forall a. Nothing' | Just' a
As another example, consider the universally quantified type ObjU
, which mimics an OOP object with one property (of type a
) and two object methods:
data ObjU a = ObjU a -- property (a -> Bool) -- obj method (a -> String) -- obj method
To apply either of the object methods to the property value, we write new functions that use pattern matching to extract the respective method and value:
obj_f1 :: ObjU a -> Bool obj_f1 (ObjU v f1 _) = f1 v obj_f2 :: ObjU a -> String obj_f2 (ObjU v _ f2) = f2 v main = do print $ obj_f1 obj -- even 3 print $ obj_f2 obj -- show 3 where obj = (ObjU 3 even show)
We've packaged a value with some functions in the ObjU
object but we have not managed to encapsulate the property value as a true OOP object would have. This is what existential quantification enables us to do.
In contrast to the universally quantified type ObjU
of the previous section, the following existentially
quantified type ObjE
“hides” the type parameter a
(which is no longer present on the lefthand side):
-- ObjU a = ObjU a (a > Bool) (a > String) data ObjE = forall a. ObjE a (a > Bool) (a > String)
This means that the type parameter is also "hidden" in the following type signatures of objE_f1
and objE_f2
:
objE_f1 :: ObjE -> Bool objE_f1 (ObjE v f1 _) = f1 v objE_f2 :: ObjE -> String objE_f2 (ObjE v _ f2) = f2 v -- requires {-# LANGUAGE ExistentialQuantification #-} main = do print $ objE_f1 obj -- even 3 print $ objE_f2 obj -- show 3 where obj = (ObjE 3 even show)
We can access an encapsulated object property only with the functions packaged with that property. For example, we can't apply any other function to the following property:
-- INVALID (cannot infer types) objE_f3 (ObjE v f1 f2) = v
Existential quantification provides us with the means to implement abstract datatypes, thus providing functions over a type while hiding the representation of the type. We can also achieve the abstract datatypes on a higher level by using Haskell's module system to hide algebraic datatype constructors while exporting functions over the type.
For the remainder of this chapter, we will refer to types that are existentially qualified as "existentials".
Universal |
Existential |
---|---|
Type parametrization |
Type abstraction |
Parametric Polymorphism |
Encapsulation |
user of data specifies type |
implementer of data specifies type |
forall = "for all" |
forall = "for some" |
Phantom types were introduced in 1999 as a solution to the challenges that arise when embedding a type-safe domain specific language (DSL) in Haskell. (See Fun with Phantom Types, Hinze http://www.cs.ox.ac.uk/ralf.hinze/publications/With.pdf)
Consider this trivial expression language and evaluator:
data Expr1 = I1 Int | Add1 Expr1 Expr1 eval1 :: Expr1 -> Int eval1 (I1 v) = v eval1 (Add1 x y) = (eval1 x) + (eval1 y)
When we add another base type (B2 Bool)
to the expression language, the situation gets more complicated.
data Expr2 = I2 Int
| B2 Bool
| Add2 Expr2 Expr2
deriving Show
This has brought about the following two problems:
Add2
type constructor was meant to only work with I2
values, but regular algebraic datatypes don't allow us to express this relationship (constraint) between the two constructors. This means we can not construct “bad” values:-– construct a "bad" value (Add2 (I2 11) (B2 True))
eval
function can no longer be inferred (or defined):-- INVALID eval2 :: Expr2 -> t eval2 (I2 v) = v eval2 (B2 v) = v eval2 (Add2 x y) = (eval2 x) + (eval2 y)
Phantom types solve our first problem by adding the type parameter t
to the Expr3
type:
data Expr3 t = I3 Int | B3 Bool | Add3 (Expr3 Int) (Expr3 Int) deriving Show
The type t
serves as a type placeholder that can be used by each constructor to describe its particular type. However, all the constructors still return the same type:
I3 :: Int -> Expr3 t B3 :: Bool -> Expr3 t Add3 :: Expr3 Int -> Expr3 Int -> Expr3 t
The Expr3
type is parametrized by the type t
, but t
does not appear in any of the constructors, hence the term phantom type. We can still construct invalid values, for example for example the following will be rejected:
print $ add3 (i3 10) (b3 True) INVALID
However, now we can use the phantom type information to create type-safe smart constructors:
i3 :: Int -> Expr3 Int i3 = I3 b3 :: Bool -> Expr3 Bool b3 = B3 add3 :: Expr3 Int -> Expr3 Int -> Expr3 Int add3 = Add3
If we use the smart constructors instead of the datatype constructors, the Haskell type-checker will prevent us from creating invalid values, for example, the following will be rejected:
-- rejected by compiler print $ add3 (i3 10) (b3 True) -- NVALID
However, type inference remains a problem because the values are still not described accurately. For example:
(I3 12) :: Expr3 t -- this -- :: Expr3 Int – not this
The effect is that adding values remains too ambiguous:
eval3 (Add3 x y) = (eval3 x) + (eval3 y)
Despite these limitations, phantom types continue to be useful in other areas. For example, the Lens
library uses the const
phantom type to great effect (See https://github.com/ekmett/lens/wiki/Derivation).
We have seen here how phantom types enable type-safe construction (stated as problem (1) earlier in this section). To solve problem (2) we'll need the power of generalized algebraic datatypes (GADTs).
GADTs emerged independently from both ML and Haskell camps in 2003 and were a part of the GHC by 2005 (see History of Haskell, Hudak et al. and First-class Phantom Types, Cheney and Hinze).
GADTs bring together phantom types, smart constructors, and refined pattern matching. Let's rewrite our expression language from earlier using GADTs:
{-# LANGUAGE GADTs #-} data Expr t where -- phantom 't' -- built-in smart constructors I :: Int -> Expr Int B :: Bool -> Expr Bool Add :: Expr Int -> Expr Int -> Expr Int
The GADT smart constructors describe constrained instances of Expr t
. As with phantom types, smart constructors provide increased type safety for data construction. However, GADTs give us something we don't get from phantom types:
eval :: Expr t -> t
eval (I v) = v
eval (B v) = v
eval (Add x y) = (eval x) + (eval y)
-- eval (Add (I 10) (I 12))
Because the GADT smart constructors are built into the type, we can pattern match on them. This solves the problem of type inference that we had with phantom types. This is why GADTs are also known as first-class phantom types.
Although GADTs are expressed using special syntax (for example, data Expr t
where), they are characterised by the relationship between the type parameters and the constructor return types. On the other hand, phantom types don’t use any special syntax but are instead implied by the lack of appearance of a type parameter in type constructors.
There is a subtle drift in the meaning of a type parameter, from signifying the type of some embedded value to expressing type metadata.
Generic programming (the focus of the following Chapter 6, Patterns of Generic Programming) is another important use-case for GADTs. As an example, consider a type representation Rep
that unifies the three types Int
, Char
and List
:
data Rep t where RInt :: Rep Int RChar :: Rep Char RList :: Show a => Rep a -> Rep [a]
The RList
constructor can be thought of as being existentially qualified (a
does not appear on the left-hand side). The phantom t
in Rep t
will serve as type metadata.
We can now write a function that takes a value along with its type representation:
showT :: Show t => Rep t -> t -> String showT RInt i = (show i) ++ " :: INT" showT RChar i = (show i) ++ " :: Char" showT (RList rep) [] = "THE END" showT (RList rep) (x:xs) = (showT rep x) ++ ", " ++ (showT (RList rep) xs)
The showT
function is a type-indexed function because it is defined for each member of the family of types Rep t
:
showT RInt 3 showT (RList RInt) [12,13,14] showT (RList RChar) ['2','3','5']
More precisely, showT
is a closed type-indexed function because the type index family (Rep t)
is fixed.
In contrast, the show
function of the Show
type-class is an example of an open type-indexed function where show
is simply type-indexed by instances of Show
and considered "open" because we can add new types to the type index freely (in this case, new instances of the type-class Show
).
In languages that allow us to reflect on the type of a value, we can write showT
by dealing with the parameter value on a "type-case" basis:
– pseudo code
case (type t) of
Int -> (show t) ++ ":: INT"
Char -> (show t) ++ ":: CHAR"
List -> -- ... show list
We now have the ingredients to define dynamic types. All we need to do is package a type together with the type representation. In our first attempt, we'll do the packaging using existential quantification:
data DynamicEQ = forall t. Show t => DynEQ (Rep t) t
Even though DynEq
dynamic values have opaque type, they are well typed. For example, we can use them to express heterogeneous lists:
dynEQList = [DynEQ RChar 'x', DynEQ RInt 3]
Since GADTs generalize existentials, we can also write the above example in the form of a "dynamic GADT":
data Dynamic where Dyn :: Show t => Rep t -> t -> Dynamic instance Show Dynamic where show (Dyn rep v) = showT rep v
We can use this GADT to define a heterogeneous list of dynamically typed values:
dynList :: [Dynamic] dynList = [Dyn RChar 'x', Dyn RInt 3] showDyn (Dyn rep v) = showT rep v
To achieve representable lists of dynamic types, (RList RDyn)
, we need to add another constructor to our representation Rep
:
data Rep t where
RInt :: Rep Int
RChar :: Rep Char
RList :: Show a => Rep a -> Rep [a]
RDyn :: Rep Dynamic
We also need to add another clause for showT
to deal with dynamic values (analogous to ShowDyn
):
showT RDyn (Dyn rep v) = showT rep v
Now we have the generic functions acting on dynamic types:
main = do print $ showT RInt 17 print $ showT (RList RInt) [12,13,14] print $ showT (RList RDyn) dynList
Dynamic types carry enough type information about themselves to enable safe type casting:
toInt :: Dynamic -> Maybe Int toInt (Dyn RInt i) = Just i toInt (Dyn _ _) = Nothing
(See Fun with Phantom Types, Hinze: http://www.cs.ox.ac.uk/ralf.hinze/publications/With.pdf, and Generalized Algebraic Data Types in Haskell, Anton Dergunov: https://themonadreader.files.wordpress.com/2013/08/issue221.pdf)
In the previous sections, we saw that GADTs generalize phantom types and existential types. Let's explore the heterogeneous lists pattern (lists of varying types), first using existentials and then GADTs.
We can quite easily define a heterogeneous list using existentials:
{-# LANGUAGE ExistentialQuantification #-}
data LI_Eq1 = forall a. LI_Eq1 a
hListEq1 :: [LI_Eq1]
hListEq1 = [LI_Eq1 3, LI_Eq1 "5"]
However, as we saw earlier, we can't do anything with this list. For example, in order to show list items we need to package a show
function with each item:
data LI_Eq2 = forall a. LI_Eq2 a (a -> String) hListEq2 :: [LI_Eq2] hListEq2 = [LI_Eq2 3 (show :: Int -> String), LI_Eq2 "5" (show :: String -> String)]
(We add the show
type signatures here for the sake of clarity but they can be inferred, in other words omitted.)
showEq2 (LI_Eq2 v showF) = showF v -- e.g. main = mapM_ (putStrLn . showEq2) hListEq2
Instead of passing in the show
functions explicitly, we can constrain the type signatures to the Show typeclass, allowing us to implicitly supply the show functions:
data LI_Eq3 = forall a. Show a => LI_Eq3 a
hListEq3 :: [LI_Eq3]
hListEq3 = [LI_Eq3 3, LI_Eq3 "5"]
showEq3 (LI_Eq3 v) = show v
The type-class constraint specified in the existential amounts to what is called bounded quantification (bounded by type-class).
We can express heterogeneous lists in the same two styles that we used with existentials. In the first style we pass in the show
function (a -> String
):
{-# LANGUAGE GADTs #-}
data LI_Gadt1 where
{MkShow1 :: a -> (a -> String) -> LI_Gadt1}
hListGadt1 :: [LI_Gadt1]
hListGadt1 = [MkShow1 "3" show, MkShow1 5 show]
showGadt1 (MkShow1 v showF) = showF v
Alternatively, we can also use a GADT with bounded quantification:
data LI_Gadt2 where
{MkShow2 :: Show a => a -> LI_Gadt2}
hListGadt1 :: [LI_Gadt1]
hListGadt2 = [MkShow2 "3", MkShow2 5]
showGadt2 (MkShow2 v) = show v