Functional programming, in its “purest” sense, is rooted in how functions, variables, and values actually work in mathematics, which is different from how they typically work in most programming languages.
Functional programming got its start before digital computers even existed. Many of the theoretical underpinnings of computation were developed in the 1930s by mathematicians like Alonzo Church and Haskell Curry.
In the 1930s, Alonzo Church developed the Lambda Calculus, which is a formalism for defining and invoking functions (called applying them). Today, the syntax and behavior of most programming languages reflect this model.
Haskell Curry (for whom the Haskell language is named) helped develop Combinatory Logic, which provides an alternative theoretical basis for computation. Combinatory Logic examines how combinators, which are essentially functions, combine to represent a computation. One practical application of combinators is to use them as building blocks for constructing parsers. They are also useful for representing the steps in a planned computation, which can be analyzed for possible bugs and optimization opportunities.
More recently, Category Theory has been a fruitful source of ideas for functional programming, such as ways to structure computations so that side effects like IO (input and output), which change the state of the “world,” are cleanly separated from code with no side effects.
A lot of the literature on functional programming reflects its mathematical roots, which can be overwhelming if you don’t have a strong math background. In contrast, object-oriented programming seems more intuitive and approachable. Fortunately, you can learn and use the principles of functional programming without a thorough grounding in mathematics.
The first language to incorporate functional programming ideas was Lisp,[3] which was developed in the late 1950s and is the second-oldest high-level programming language, after Fortran. The ML family of programming languages started in the 1970s, including Caml, OCaml (a hybrid object-functional language), and Microsoft’s F#. Perhaps the best known functional language that comes closest to functional “purity” is Haskell, which was started in the early 1990s. Other recent functional languages include Clojure and Scala, both of which run on the JVM but are being ported to the .NET environment. Today, many other languages are incorporating ideas from functional programming.
Don’t all programming languages have functions? If so, why aren’t all programming languages considered functional languages? Functional languages share a few basic principles.
The first principle is the use of immutable values. You might recall the famous Pythagorean equation from school, which relates the lengths of the sides of a triangle:
If I give you
values for the variables
x
and y
, say x=3
and y=4
, you can compute the value
for z
(5
in this case). The key idea here is that
values are never modified. It would be crazy to say
3++
, but you could start over by
assigning new values to the
same variables.
Most programming languages
don’t make a clear distinction between a value (i.e., the contents of
memory) and a variable that refers to it. In Java, we’ll use final
to prohibit
variable reassignment, so we get objects that are
immutable values.
Why should we avoid mutating values? First, allowing mutable values is what makes multithreaded programming so difficult. If multiple threads can modify the same shared value, you have to synchronize access to that value. This is quite tedious and error-prone programming that even the experts find challenging [Goetz2006]. If you make a value immutable, the synchronization problem disappears. Concurrent reading is harmless, so multithreaded programming becomes far easier.
A second benefit of immutable values relates to program correctness in other ways. It is harder to understand and exhaustively test code with mutable values, particularly if mutations aren’t localized to one place. Some of the most difficult bugs to find in large systems occur when state is modified non-locally, by client code that is located elsewhere in the program.
Consider the following
example, where a mutable List
is used
to hold a customer’s orders:
public
class
Customer
{
// No setter method
private
final
List
<
Order
>
orders
;
public
List
<
Order
>
getOrders
()
{
return
orders
;
}
public
Customer
(...)
{...}
}
It’s reasonable that
clients of Customer
will want to view
the list of Orders
. Unfortunately, by
exposing the list through the getter method, getOrders
, we’ve lost control over them! A
client could modify the list without our knowledge. We didn’t provide a
setter for orders
and it is declared
final
, but these protections only
prevent assigning a new list to orders
. The list itself can still be
modified.
We could work around this
problem by having getOrders
return a
copy of the list or by adding special accessor methods to Customer
that provide controlled access to
orders
. However, copying the list is
expensive, especially for large lists. Adding ad-hoc accessor methods
increases the complexity of the object, the testing burden, and the
effort required of other programmers to comprehend and use the
class.
However, if the list of orders is immutable and the list elements are immutable, these worries are gone. Clients can call the getter method to read the orders, but they can’t modify the orders, so we retain control over the state of the object.
What happens when the list of orders is supposed to change, but it has become huge? Should we relent and make it mutable to avoid the overhead of making big copies? Fortunately, we have an efficient way to copy large data structures; we’ll reuse the parts that aren’t changing! When we add a new order to our list of orders, we can reuse the rest of the list. We’ll explore how in Chapter 3.
Some mutability is unavoidable. All programs have to do IO. Otherwise, they could do nothing but heat up the CPU, as a joke goes. However, functional programming encourages us to think strategically about when and where mutability is necessary. If we encapsulate mutations in well-defined areas and keep the rest of the code free of mutation, we improve the robustness and modularity of our code.
We still need to handle mutations in a thread-safe way. Software Transactional Memory and the Actor Model give us this safety. We’ll explore both in Chapter 4.
Make your objects immutable. Declare fields final
. Only provide getters for fields and
then only when necessary. Be careful that mutable final
objects can still be modified. Use
mutable collections carefully. See “Minimize Mutability” in [Bloch2008] for more
tips.
In Java, we are accustomed to passing objects and primitive values to methods, returning them from methods, and assigning them to variables. This means that objects and primitives are first-class values in Java. Note that classes themselves aren’t first-class values, although the reflection API offers information about classes.
Functions are not first-class values in Java. Let’s clarify the difference between a method and a function.
A method is a block of code attached to a
particular class. It can only be called in the context of the class,
if it’s defined to be static
, or in
the context of an instance of the class. A
function is more general. It is not attached to
any particular class or object. Therefore, all instance
methods are functions where
one of the arguments is the object.
Java only has methods and methods aren’t first-class in Java. You can’t pass a method as an argument to another method, return a method from a method, or assign a method as a value to a variable.
However, most
anonymous inner classes are effectively function
“wrappers.” Many Java methods take an instance of an interface that
declares one method. Here’s a common example, specifying an ActionListener
for an AWT/Swing application
(see the Preface for details on obtaining and using
all the source code examples in this book):
package
functions
;
import
java.awt.*
;
import
java.awt.event.*
;
class
HelloButtonApp2
{
private
final
Button
button
=
new
Button
();
public
HelloButtonApp2
()
{
button
.
addActionListener
(
new
ActionListener
()
{
public
void
actionPerformed
(
ActionEvent
e
)
{
System
.
out
.
println
(
"Hello There: event received: "
+
e
);
}
});
}
}
If we want the button to
do something, we have to specify an ActionListener
object, which has a single
method: actionPerformed
. We used an
anonymous inner class to implement the interface
and the method.
It is very common in Java APIs to define custom interfaces like this that declare a single abstract method. They are often labelled “callback methods,” because they are typically used to enable registration of client code that will be called for particular events.
The world’s Java APIs
must have hundreds of one-off, special-purpose interfaces like ActionListener
. It greatly increases the
cognitive load on the developer to learn all of them. You spend a lot of
time reading Javadocs or letting your IDE remember for you. We’ve been
told that abstraction is a good thing, right? Well, let’s introduce
abstractions for all these “function objects”!
First, here is an
interface that defines a “function” that takes one argument of type
parameter A and returns void
:
package
functions
;
public
interface
Function1Void
<
A
>
{
void
apply
(
A
a
);
}
You could call the
generic method name anything you want, but I chose apply
because it is a common name in
functional programming, derived from the convention of saying that you
“apply” a function to its arguments when you call it.
Now, let’s pretend that
there is a “functional” version of the Abstract Window Toolkit (AWT),
java.fawt.Component
, with a method
addActionListener
that takes a
Function1Void
object instead of
ActionListener
:
package
functions
;
import
java.fawt.*
;
import
java.fawt.event.*
;
class
FunctionalHelloButtonApp
{
private
final
Button
button
=
new
Button
();
public
FunctionalHelloButtonApp
()
{
button
.
addActionListener
(
new
Function1Void
<
ActionEvent
>()
{
// 1
public
void
apply
(
ActionEvent
e
)
{
// 2
System
.
out
.
println
(
"Hello There: event received: "
+
e
);
}
});
}
}
I have indicated the
changes with the two comments 1
and
2
. Otherwise, the code is identical
to the previous example.
You might argue that
having a custom type for the argument to addActionListener
prevents a user from passing
an arbitrary and inappropriate object to it. You might also claim that
the custom name of the interface and the custom method name help
document the API for the reader. Neither argument really holds
up.
First, giving abstractions
special names does nothing to prevent the user from implementing the
wrong thing. As far as documentation is concerned, addActionListener
must document its
expectations (as we’ll discuss in The Liskov Substitution Principle). The type
parameter for Function1Void<ActionEvent>
must still
appear in addActionListener
signature. That’s another
bit of essential documentation for the user.
Once the developer is
accustomed to using Function1Void<A>
all over the JDK (in
our more perfect world…), it’s no longer necessary to learn all the
one-off interfaces defined in the library. They are all effectively the
same thing; a function wrapper.
So, we have introduced a
new, highly reusable abstraction. You no longer need to remember the
name of the special type you pass to addActionListener
. It’s just the same Function1Void
that you use “everywhere.” You
don’t need to remember the special name of its method. It’s always just
apply
.
It was a revelation for me when I realized how much less I have to learn when I can reuse the same function abstractions in a wide variety of contexts. I no longer care about trivial details like one-off interface names. I only care about what a particular function is supposed to do.
While we’ve reduced some
of the unnecessary complexity in the JDK (or pretended to do so), the
syntax is still very verbose, as we still have to say things like
new Function1Void<ActionEvent>()
{…}
. Wouldn’t it be great if we could just write an
anonymous function with just the argument list and
the body?
Most programming languages now support this. After years of debate, JDK 8 will introduce a syntax for defining anonymous functions, also called lambdas (see [Project Lambda] and [Goetz2010]). Here is what the planned syntax looks like:
public
FunctionalHelloButtonApp
()
{
button
.
addActionListener
(
#
{
ActionEvent
e
->
System
.
out
.
println
(
"Hello There: event received: "
+
e
)
}
);
}
The #{…}
expression is the
literal syntax for lambda expressions. The argument
list is to the left of the “arrow” (->
) and the body of the function is to the
right of the arrow. Notice how much boilerplate code this syntax
removes!
The term lambda is another term for anonymous function. It comes from the use of the Greek lambda symbol λ to represent functions in lambda calculus.
For completeness, here is
another example function type, one that takes two arguments of types
A1
and A2
, respectively, and returns a non-void value
of type R
. This example is inspired
by the Scala types for anonymous functions:
package
functions
;
public
interface
Function2
<
A1
,
A2
,
R
>
{
R
apply
(
A1
a1
,
A2
a2
);
}
Unfortunately, you would need a separate interface for every function “arity” you want (arity is the number of arguments). Actually, it’s that number times two; one for the void return case and one for the non-void return case. However, the effort is justified for a widely used concept. Actually, the [Functional Java] project has already done this work for you.
There is a special term for functions that take other functions as arguments or return them as results: higher-order functions. Java methods are limited to primitives and objects as arguments and return values, but we can mimic this feature with our Function interfaces.
Higher-order functions
are a powerful tool for building abstractions and composing behavior. In
Chapter 3, we’ll show how
higher-order functions allow nearly limitless customization of standard
library types, like Lists
and
Maps
, and also promote reusability.
In fact, the combinators we mentioned at the
beginning of this chapter are higher-order functions.
Another source of complexity, which leads to bugs, are functions that mutate state, e.g., setting values of an object’s field or global variables.
In mathematics, functions
never have side effects, meaning they are
side-effect-free. For example, no matter how much
work sin(x)
has to do, its entire
result is returned to the caller. No external state is changed. Note
that a real implementation might cache previously calculated values, for
efficiency, which would require changing the state of a cache. It’s up
to the implementer to preserve the side-effect-free external behavior
(including thread safety), as seen by users of the function.
Being able to replace a function call for a particular set of parameters with the value it returns is called referential transparency. It has a fundamental implication for functions with no side effects; the function and the corresponding return values are really synonymous, as far as the computation is concerned. You can represent the result of calling any such function with a value. Conversely, you can represent any value with a function call!
Side-effect-free functions make excellent building blocks for reuse, since they don’t depend on the context in which they run. Compared to functions with side effects, they are also easier to design, comprehend, optimize, and test. Hence, they are less likely to have bugs.
Recall that functional
programming in its purest form doesn’t allow mutable values. That means
we can’t use mutable loop counters to iterate through a collection! Of
course, Java already solves this problem for us with the foreach
loop:
for
(
String
str:
myListOfStrings
)
{...}
which encapsulates the required loop counting. We’ll see other iteration approaches in the next chapter, when we discuss operations on functional collections.
The classic functional alternative to an iterative loop is to use recursion, where each pass through the function operates on the next item in the collection until a termination point is reached. Recursion is also a natural fit for certain algorithms, such as traversing a tree where each branch is itself a tree.
Consider the following
example, where a unit test defines a simple tree type, with a value at
each node, and left and right subtrees. The Tree
type defines a recursive toString
method that walks the tree and builds
up a string from each node. After the definition, the unit test declares
an instance of the tree and tests that toString
works as expected:
package
functions
;
import
static
org
.
junit
.
Assert
.*;
import
org.junit.Test
;
public
class
RecursionTest
{
static
class
Tree
{
// public fields for simplicity
public
final
Tree
left
;
// left subtree
public
final
Tree
right
;
// right subtree
public
final
int
value
;
// value at this node
public
Tree
(
Tree
left
,
int
value
,
Tree
right
)
{
this
.
left
=
left
;
this
.
value
=
value
;
this
.
right
=
right
;
}
public
final
String
toString
()
{
String
leftStr
=
left
==
null
?
"^"
:
left
.
toString
();
String
rightStr
=
right
==
null
?
"^"
:
right
.
toString
();
return
"("
+
leftStr
+
"-"
+
value
+
"-"
+
rightStr
+
")"
;
}
}
@Test
public
void
walkATree
()
{
Tree
root
=
new
Tree
(
new
Tree
(
new
Tree
(
null
,
3
,
null
),
2
,
new
Tree
(
new
Tree
(
null
,
5
,
null
),
4
,
null
)),
1
,
new
Tree
(
new
Tree
(
null
,
7
,
null
),
6
,
new
Tree
(
null
,
8
,
null
)));
String
expected
=
"(((^-3-^)-2-((^-5-^)-4-^))-1-((^-7-^)-6-(^-8-^)))"
;
assertEquals
(
expected
,
root
.
toString
());
}
}
However, each recursion adds a new frame to the stack, which can exceed the stack size for deep recursions. Tail-call recursions can be converted to loops, eliminating the extra function call overhead. Unfortunately, the JVM and the Java compiler do not currently perform this optimization.
Mathematics defines some infinite sets, such as the natural numbers (all positive integers). They are represented symbolically. Any particular finite subset of values is evaluated only on demand. We call this lazy evaluation. Eager evaluation would force us to represent all of the infinite values, which is clearly impossible.
Some languages are lazy by default, while others provide lazy data structures that can be used to represent infinite collections and only compute a subset of values on demand. Here is an example that represents the natural numbers:
package
math
;
import
static
datastructures2
.
ListModule
.*;
public
class
NaturalNumbers
{
public
static
final
int
ZERO
=
0
;
public
static
int
next
(
int
previous
)
{
return
previous
+
1
;
}
public
static
List
<
Integer
>
take
(
int
count
)
{
return
doTake
(
emptyList
(),
count
);
}
private
static
List
<
Integer
>
doTake
(
List
<
Integer
>
accumulator
,
int
count
)
{
if
(
count
==
ZERO
)
return
accumulator
;
else
return
doTake
(
list
(
next
(
count
-
1
),
accumulator
),
count
-
1
);
}
}
We start with a
definition of zero, then use next
to
compute each natural number from its predecessor. The take(n)
method is a pragmatic tool for
extracting a fixed subset of the integers. It returns a List
of the integers from 1 to n
. (The List
type shown will be discussed in Chapter 3. It isn’t java.util.List
.) Note that the helper method
doTake
is tail-call
recursive.
We have replaced values, integers in this case, with functions that compute them on demand, an example of the referential transparency we discussed earlier. Lazy representation of infinite data structures wouldn’t be possible without this feature! Both referential transparency and lazy evaluation require side-effect-free functions and immutable values.
Finally, lazy evaluation is useful for deferring expensive operations until needed or never executing them at all.
Finally, functional programming is declarative, like mathematics, where properties and relationships are defined. The runtime figures out how to compute final values. The definition of the factorial function provides an example:
factorial(n) = 1 if n = 1 n * factorial(n-1) if n > 1
The definition relates
the value of factorial(n)
to factorial(n-1)
, a recursive definition. The
special case of factorial(1)
terminates the recursion.
Object-oriented programming is primarily imperative, where we tell the computer what specific steps to do.
To better understand the differences, consider this example, which provides a declarative and an imperative implementation of the factorial function:
package
math
;
public
class
Factorial
{
public
static
long
declarativeFactorial
(
int
n
)
{
assert
n
>
0
:
"Argument must be greater than 0"
;
if
(
n
==
1
)
return
1
;
else
return
n
*
declarativeFactorial
(
n
-
1
);
}
public
static
long
imperativeFactorial
(
int
n
)
{
assert
n
>
0
:
"Argument must be greater than 0"
;
long
result
=
1
;
for
(
int
i
=
2
;
i
<=
n
;
i
++)
{
result
*=
i
;
}
return
result
;
}
}
The declarativeFactorial
method might look
“imperative,” in the sense that it implements a calculation of
factorials, but its structure is more declarative than imperative. I
formatted the method to look similar to the definition of
factorial.
The imperativeFactorial
method uses mutable
values, the loop counter and the result
that accumulates the calculated value.
The method explicitly implements a particular algorithm. Unlike the
declarative version, this method has lots of little mutation steps,
making it harder to understand and keep bug free.
Declarative programming is made easier by lazy evaluation, because laziness gives the runtime the opportunity to “understand” all the properties and relations, then determine the optimal way to compute values on demand. Like lazy evaluation, declarative programming is largely incompatible with mutability and functions with side effects.
Whether you prefer static or dynamic typing, functional programming has some useful lessons to teach us about good type design. First, all functional languages emphasize the use of core container types, like lists, maps, trees, and sets for capturing and transforming data, which we’ll explore in Chapter 3. Here, I want to discuss two other benefits of functional thinking about types, enforcing valid values for variables and applying rigor to type design.
In a
pure functional language where values are
immutable, each variable must be initialized to a value that can be
checked to make sure it is valid. This suggests that we should never
allow a variable to reference our old friend, null
. Null values are a common source of bugs.
Tony Hoare, who invented the concept of null
, has recently called it The
Billion Dollar Mistake [Hoare2009].
Java’s model is to
“pretend” there is a Null
type that
is the subtype of all other types in the system. Suppose you have a
variable of type String
. If the value
can be null
, you could also think of
the type as actually StringOrNull
.
However, we never think in either terms and that’s why we often forget
to check for null
. What’s really
going on is that we have a variable that can “optionally” hold a value.
So, why not explicitly represent this idea in the type system? Consider
the following abstract class:
package
option
;
public
abstract
class
Option
<
T
>
{
public
abstract
boolean
hasValue
();
public
abstract
T
get
();
public
T
getOrElse
(
T
alternative
)
{
return
hasValue
()
==
true
?
get
()
:
alternative
;
}
}
Option
defines a “container” that may have one
item of type T
or not. The hasValue
method returns true
if the container has an item or false
if it doesn’t. Subclasses will define
this method appropriately. Similarly, the get
method returns the item, if there is one.
A variation of this method is the getOrElse
method, which will return the
alternative
value if the Option
doesn’t have a value. This is the one
method that can be implemented in this class.
Here is the first subtype,
Some
:
package
option
;
public
final
class
Some
<
T
>
extends
Option
<
T
>
{
private
final
T
value
;
public
Some
(
T
value
)
{
this
.
value
=
value
;
}
public
boolean
hasValue
()
{
return
true
;
}
public
T
get
()
{
return
value
;
}
@Override
public
String
toString
()
{
return
"Some("
+
value
+
")"
;
}
@Override
public
boolean
equals
(
Object
other
)
{
if
(
other
==
null
||
other
.
getClass
()
!=
Some
.
class
)
return
false
;
Some
<?>
that
=
(
Some
<?>)
other
;
Object
thatValue
=
that
.
get
();
return
value
.
equals
(
thatValue
);
}
@Override
public
int
hashCode
()
{
return
37
*
value
.
hashCode
();
}
}
A Some
instance is used when the Option
has a value. So, its hasValue
always returns true
and its get
method simply returns the value. It also
provides conventional toString
,
equals
, and hashCode
methods. I’ll explain why Some
is declared final
in the next section.
Finally, here is None
, the only other valid
subtype of Option
:
package
option
;
public
final
class
None
<
T
>
extends
Option
<
T
>
{
public
static
class
NoneHasNoValue
extends
RuntimeException
{}
public
None
()
{}
public
boolean
hasValue
()
{
return
false
;
}
public
T
get
()
{
throw
new
NoneHasNoValue
();
}
@Override
public
String
toString
()
{
return
"None"
;
}
@Override
public
boolean
equals
(
Object
other
)
{
return
(
other
==
null
||
other
.
getClass
()
!=
None
.
class
)
?
false
:
true
;
}
@Override
public
int
hashCode
()
{
return
-
1
;
}
}
A None
instance is used when the Option
has no value. So, its hasValue
always returns false
and its get
method throws an exception, because there
is nothing to get! It also provides toString
, equals
, and hashCode
methods. Since
None
has no value, all instances are
considered equal! None
is
also final
.
The following unit test
exercises Option
, Some
, and None
:
package
option
;
import
java.util.*
;
import
org.junit.*
;
import
static
org
.
junit
.
Assert
.*;
public
class
OptionTest
{
private
List
<
Option
<
String
>>
names
=
null
;
@Before
public
void
setup
()
{
names
=
new
ArrayList
<
Option
<
String
>>();
names
.
add
(
new
Some
<
String
>(
"Dean"
));
names
.
add
(
new
None
<
String
>());
names
.
add
(
new
Some
<
String
>(
"Wampler"
));
}
@Test
public
void
getOrElseUsesValueForSomeAndAlternativeForNone
()
{
String
[]
expected
=
{
"Dean"
,
"Unknown!"
,
"Wampler"
};;
System
.
out
.
println
(
"*** Using getOrElse:"
);
for
(
int
i
=
0
;
i
<
names
.
size
();
i
++)
{
Option
<
String
>
name
=
names
.
get
(
i
);
String
value
=
name
.
getOrElse
(
"Unknown!"
);
System
.
out
.
println
(
name
+
": "
+
value
);
assertEquals
(
expected
[
i
],
value
);
}
}
@Test
public
void
hasNextWithGetUsesOnlyValuesForSomes
()
{
String
[]
expected
=
{
"Dean"
,
null
,
"Wampler"
};;
System
.
out
.
println
(
"*** Using hasValue:"
);
for
(
int
i
=
0
;
i
<
names
.
size
();
i
++)
{
Option
<
String
>
name
=
names
.
get
(
i
);
if
(
name
.
hasValue
())
{
String
value
=
name
.
get
();
System
.
out
.
println
(
name
+
": "
+
value
);
assertEquals
(
expected
[
i
],
value
);
}
}
}
static
Option
<
String
>
wrap
(
String
s
)
{
if
(
s
==
null
)
return
new
None
<
String
>();
else
return
new
Some
<
String
>(
s
);
}
@Test
public
void
exampleMethodReturningOption
()
{
System
.
out
.
println
(
"*** Method that Returns an Option:"
);
Option
<
String
>
opt1
=
wrap
(
"hello!"
);
System
.
out
.
println
(
"hello! -> "
+
opt1
);
assertEquals
(
Some
.
class
,
opt1
.
getClass
());
assertEquals
(
"hello!"
,
opt1
.
get
());
Option
<
String
>
opt2
=
wrap
(
null
);
System
.
out
.
println
(
"null -> "
+
opt2
);
assertEquals
(
None
.
class
,
opt2
.
getClass
());
assertEquals
(
"str"
,
opt2
.
getOrElse
(
"str"
));
}
}
After creating an array of
Some
and None
instances in the setup
method, the first test uses getOrElse
to extract the value for Some
instances, or the “alternative” for
None
instances. Print statements
output each case before the assertion verifies the expected
behavior.
The second test shows an
alternative way to work with the Options
. The hasValue
method is called to determine if the
Option
has a value (that is, if it is
a Some
instance). Only then is the
get
method called and the value is
output and tested with an assertion.
The final test
demonstrates the wrap
method defined
in the test, which demonstrates how an arbitrary method might return an
Option
instead of returning another
type when the value could be null
. In
this case, if the input String
is
null
, then a None
is returned. Otherwise, the input
String
is wrapped in a Some
.
Here is the output from
running the test. The following listing shows just the output from the
println
calls:
*** Using getOrElse: Some(Dean): Dean None: Unknown! Some(Wampler): Wampler *** Using hasValue: Some(Dean): Dean Some(Wampler): Wampler *** Method that Returns an Option: hello! -> Some(hello!) null -> None
Look at the method
signature for the test’s wrap
method
again:
static
Option
<
String
>
wrap
(
String
s
)
...
What’s most interesting
about this signature is the return value. The type tells you
that a value may or may not be available. That is, a value is
optional. Furthermore, Java’s type safety won’t let you “forget” that an
option is returned. You must determine if a Some
was returned and extract the value before
calling methods with it, or handle the None
case. Using Option
as a return type improves the
robustness of your code compared to allowing nulls
and it provides better documentation for
users of the code. We are expressing and enforcing the optional
availability of a value through the type system.
In the previous discussion
the Option
interface has only two
valid implementing types: Some
and
None
. Mathematically, Option
is an algebraic data
type, which for our purposes means that there can be only a
few well-defined types that implement the abstraction [AlgebraicDT]. It also means
that there are well-defined rules for transitioning from an instance of
one type to another. We’ll see a good example of these transitions when
we discuss lists in Chapter 3.
A similar-sounding (and easy to confuse) concept is the abstract data type. This is already familiar from object-oriented programming, where you define an interface for an abstraction and give it well-defined semantics. The abstraction is implemented by one or more types. Usually, abstract data types have relatively little polymorphic behavior. Instead, the subtypes optimize for different performance criteria, like search speed vs. update speed. Unlike algebraic data types, you might make these concrete classes private and hide them behind a factory, which could decide which class to instantiate based on the input arguments, for example.
A good example of an abstract data type is a map of key-value pairs. The abstraction tells us how to put new pairs in the map, query for existing pairs, remove pairs, etc.
To compare these two
concepts, an algebraic data type like Option
constrains the number of possible
subtypes that implement the abstraction. Usually these subtypes are
visible to users. In contrast, an abstract data
type imposes no limit on the possible subtypes, but often
those subtypes exist only to support different implementation goals and
they may be hidden behind a factory.
One final point on
algebraic data types. Recall that Some
and None
are final and can’t be subtyped. Final
types are often considered bad in Java, because you can’t subclass them
to create special versions for testing. That’s really only a problem for
types with strong dependencies on other objects that would make testing
difficult, like networked services. Well-designed algebraic
data types should never have such connections, so there is
really nothing that would need to be replaced by a test-only
derivative.
Note: Some of these exercises are difficult.
Write unit tests for Function1Void
and Function2
.
Write a method that uses recursion to add a list of numbers.
Find some Java code you wrote before that does null
checks. Try modifying it to use
Option
instead.
Explore the typing of functions under inheritance. Hint: this
exercise anticipates The Liskov Substitution Principle. If you get stuck, see
the unit tests for the functions
package that is part of the code distribution.
Suppose some method m1
takes a Function1<String,Object>
argument.
What would happen if you
passed an instance f1
of type
Function1<Object,Object>
to m1
? In Java, how could you
change the declaration of m1
so
that the compiler would allow you to pass f1
to it? Why would that be a valid
thing to do, at least from the perspective of “safe
typing”?
Considering the same method m1
, suppose you wanted to pass a
function f2
of type Function1<String,String>
to
m1
? How could you change the
declaration of m1
so that the
compiler would allows you to pass f2
to it? Why would that be a valid
thing to do from the safe typing perspective?