As soon as your first Erlang product reaches the market and is deployed around the world, you start working on feature enhancements for the second release. Imagine 15,000 lines of code, which incidentally happens to be the size of the code base of the first Erlang product Ericsson shipped, the Mobility Server. In your code base, you have tuples that contain data relating to the existing features and constants that have been hardcoded. When you add new features, you need to add fields to these tuples. The problem is that the fields need to be updated not only in the code base where you are adding these features, but also in the remaining 15,000 lines of code where you aren’t adding them. Missing one tuple will cause a runtime error. Assuming your constants also need to be updated, you need to change the hardcoded values everywhere they are used. Even more costly than implementing these software changes is the fact that the entire code base needs to be retested to ensure that no new bugs have been introduced and that no field or constant updates have been omitted.
One of the most common constructions in computing is to bring together a number of pieces of data as a single item. Erlang tuples provide the basic mechanism for collecting data, but they do have some disadvantages, particularly when a larger number of data items are collected as a single object. In the first part of this chapter, you will learn about records, which overcome most of these disadvantages and which also make code evolution easier to achieve. The key to this is the fact that records provide data abstraction by which the actual representation of the data is hidden from the programs that access it.
Macros allow you to write abbreviations that are expanded by the Erlang preprocessor. Macros can be used to make programs more readable, to extend the language, and to write debugging code. We conclude the chapter by describing the include directive, by which header files containing macro and record definitions are used in Erlang projects.
Although neither is essential for writing Erlang programs, both are useful in making programs easier to read, modify, and debug, facilitating code enhancements and support of deployed products. It is no coincidence that records and macros, the two constructs described in this chapter, were added to the language soon after Ericsson’s Mobility Server went into production and developers started to support it while working on enhancing its feature set.
To understand the advantages of records, we will first introduce a small example dealing with information about people. Suppose you want to store basic information about a person, including his name, age, and telephone number. You could do this using three-element tuples of the form {Name,Age,Phone}:
-module(tuples1).
-export([test1/0, test2/0]).

birthday({Name,Age,Phone}) ->
    {Name,Age+1,Phone}.

joe() ->
    {"Joe", 21, "999-999"}.

showPerson({Name,Age,Phone}) ->
    io:format("name: ~p age: ~p phone: ~p~n", [Name,Age,Phone]).

test1() ->
    showPerson(joe()).

test2() ->
    showPerson(birthday(joe())).
At every point in the program where the person representation is used, it must be presented as a complete tuple: {Name,Age,Phone}. Although not apparently a problem for a three-element tuple, adding new fields means you would have to update the tuple everywhere, even in the code base where the new fields are not used. Missing an update will result in a badmatch runtime error when pattern-matching the tuple. Furthermore, tuples do not scale well when dealing with sizes of 30 or even 10 elements, as the potential for misunderstanding or error is much greater.
A record is a data structure with a fixed number of fields that are accessed by name, similar to a C structure or a Pascal record. This differs from tuples, where fields are accessed by position. In the case of the person example, you would define a record type as follows:
-record(person, {name,age,phone}).
This introduces the record type person, where each record instance contains three fields named name, age, and phone. Field names are defined as atoms. Here is an example of a record instance of this type:
#person{name="Joe", age=21, phone="999-999"}
In the preceding code, #person is the constructor for person records. It just so happens in this example that we listed the fields in the same order as in the definition, but this is not necessary. The following expression gives the same value:
#person{phone="999-999", name="Joe", age=21}
In both examples, we defined all the fields, but it is possible to give default values for the fields in the record definition, as in the following:
-record(person, {name,age=0,phone=""}).
Now a person record like this one:
#person{name="Fred"}
will have age zero and an empty phone number; in the absence of a default value being specified, the “default default” is the atom undefined.
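As a minimal sketch of how these defaults behave, the module below builds one record with only the name given and one with no fields at all. The module name person_defaults and the function names fred and summary are ours, chosen purely for illustration:

```erlang
-module(person_defaults).
-export([fred/0, summary/0]).

%% Same definition as in the text: age and phone have explicit defaults,
%% name does not.
-record(person, {name, age = 0, phone = ""}).

%% Only name is supplied; age and phone take their declared defaults.
fred() ->
    #person{name = "Fred"}.

%% No field is supplied: name has no declared default, so it is undefined.
summary() ->
    Anon = #person{},
    {Anon#person.name, Anon#person.age, Anon#person.phone}.
```

Calling person_defaults:summary() should return {undefined,0,[]}, the empty string being printed as the empty list.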
The general definition of a record name with fields named field1 to fieldn will take the following form:
-record(name, {field1 [= default1], field2 [= default2], ..., fieldn [= defaultn]}).
where the parts enclosed in square brackets are optional declarations of default field values. The same field name can be used in more than one record type; indeed, two records might share the same list of names. The name of the record can be used in only one definition, however, as this is used to identify the record.
Suppose you are given a record value. How can you access the fields, and how can you describe a modified record? Given the following example:
Person = #person{name="Fred"}
you access the fields of the record like this: Person#person.name, Person#person.age, and so on. What will be the values of these? The general form for this field access will be:
RecordExp#name.fieldName
where the name and fieldName cannot be variables and RecordExp is an expression denoting a record. Typically, this will be a variable, but it might also be the result of a function application or a field access for another record type.
Suppose you want to modify a single field of a record. You can write this directly, as in the following:
NewPerson = Person#person{age=37}
In such a case, the record syntax is a real advantage. You have mentioned only the field whose value is modified; those that are unchanged from Person to NewPerson need not figure in the definition. In fact, the record mechanism allows for any selection of the fields to be updated, as in:
NewPerson = Person#person{phone="999-999",age=37}
The general case will be:
RecordExp#name{..., fieldNamei = valuei, ...}
where the field updates can occur in any order, but each field name can occur, at most, only once.
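The following sketch pulls field access and update together. It assumes the person definition with default values given earlier; the module name person_update and the function demo are hypothetical names used only for this illustration:

```erlang
-module(person_update).
-export([demo/0]).

-record(person, {name, age = 0, phone = ""}).

demo() ->
    Person = #person{name = "Fred"},
    %% Field access has the form RecordExp#name.fieldName.
    0 = Person#person.age,
    "" = Person#person.phone,
    %% An update builds a new record; the record bound to Person is unchanged.
    NewPerson = Person#person{phone = "999-999", age = 37},
    {Person#person.age, NewPerson#person.age, NewPerson#person.name}.
```

Here person_update:demo() should return {0,37,"Fred"}: the original record still has the default age, while the new one carries the updated age and the untouched name.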
Using pattern matching over records, it is possible to extract field values and to affect the control flow of computation. Suppose you want to define the birthday function, which increases the age of the person by one. You could define the function using field selection and update like this:
birthday(P) -> P#person{age = P#person.age + 1}.
But it is clearer to use pattern matching:
birthday(#person{age=Age} = P) -> P#person{age=Age+1}.
The preceding code makes it clear that the function is applied to a person record, as well as extracting the age field into the variable Age. It is also possible to match against field values so that you increase only Joe’s age, keeping everyone else the same age:
joesBirthday(#person{age=Age,name="Joe"} = P) ->
    P#person{age=Age+1};
joesBirthday(P) ->
    P.
Revisiting the example from the beginning of the section, you can give the definitions using records:
-module(records1).
-export([birthday/1, joe/0, showPerson/1]).

-record(person, {name,age=0,phone}).

birthday(#person{age=Age} = P) ->
    P#person{age=Age+1}.

joe() ->
    #person{name="Joe", age=21, phone="999-999"}.

showPerson(#person{age=Age,phone=Phone,name=Name}) ->
    io:format("name: ~p age: ~p phone: ~p~n", [Name,Age,Phone]).
Although the notation used here is a little more verbose, this is more than compensated for by the clarity of the code, which makes clear our intention to work with records of people, as well as concentrating on the relevant details: it is clear from the definition of birthday that it operates on the age field and leaves the others unchanged. Finally, the code is more easily modified if the composition of the record is changed or extended; the first exercise at the end of this chapter gives you a chance to verify this for yourself.
Record fields can contain any valid Erlang data types. As records are valid data types, fields can contain other records, resulting in nested records. For example, the content of the name field in a person record could itself be a record:
-record(name, {first, surname}).

P = #person{name = #name{first = "Robert", surname = "Virding"}},
First = (P#person.name)#name.first.
Furthermore, field selection of a nested field can be given by a single expression, as in the definition of First above.
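Nested records can also be matched and updated in place. The sketch below is one way to do it, not the only one; the module name nested_records, the helper functions first_name and set_surname, and the replacement surname "Smith" are all assumptions made for the example:

```erlang
-module(nested_records).
-export([first_name/1, set_surname/2, demo/0]).

-record(name, {first, surname}).
-record(person, {name, age = 0, phone}).

%% A nested pattern reaches straight into the inner record.
first_name(#person{name = #name{first = First}}) ->
    First.

%% Updating a nested field means rebuilding the inner record first,
%% then storing it back in the outer one.
set_surname(#person{name = Name} = P, Surname) ->
    P#person{name = Name#name{surname = Surname}}.

demo() ->
    P = #person{name = #name{first = "Robert", surname = "Virding"}},
    P2 = set_surname(P, "Smith"),
    {first_name(P2), (P2#person.name)#name.surname}.
```

With these definitions, nested_records:demo() should return {"Robert","Smith"}.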
Records in Erlang are a compile-time feature, and they don’t have their own types in the virtual machine (VM). Because of this, the shell deals with them differently than it does other constructions.
Using the command rr(moduleName) in the shell, all record definitions in the module moduleName are loaded. You can otherwise define records directly in the shell itself using the command rd(name, {field1, field2, ...}), which defines the record name with fields field1, field2, and so on. This can be useful in testing and debugging, or if you do not have access to the module in which you’ve defined the record. Finally, the command rl() lists all the record definitions currently visible in the shell. Try them out in the shell:
1> c("/Users/Francesco/records1", [{outdir, "/Users/Francesco/"}]).
{ok,records1}
2> rr(records1).
[person]
3> Person = #person{name="Mike",age=30}.
#person{name = "Mike",age = 30,phone = undefined}
4> Person#person.age + 1.
31
5> Person#person{phone=5697}.
#person{name = "Mike",age = 30,phone = 5697}
6> rd(name, {first, surname}).
name
7> NewPerson = Person#person{name=#name{first="Mike",surname="Williams"}}.
#person{name = #name{first = "Mike",surname = "Williams"},
        age = 30,phone = undefined}
8> FirstName = (NewPerson#person.name)#name.first.
"Mike"
9> rl().
-record(name,{first,surname}).
-record(person,{name,age = 0,phone}).
ok
10> Person = Person#person{name=#name{first="Chris",surname="Williams"}}.
** exception error: no match of right hand side value
                    #person{name = #name{first = "Chris",surname = "Williams"},
                            age = 30,phone = undefined}
In the preceding example, we load the person record definition from the records1 module, create an instance of it, and extract the age field. In command 6, we create a new record of type name, with the fields first and surname. We bind the name field of the record stored in the variable Person to a new record instance we create in one operation. Finally, in command 8, we extract the first name by looking up the name field in the record of type person stored in the variable NewPerson, all in one operation.
Look at what happens in command 10. This is a very common error made by beginners and seasoned programmers alike: forgetting that Erlang variables are single assignment and that the = operator is nondestructive. In command 10, you might think you are changing the value of the name field to a new name, but you are in fact pattern-matching a record you’ve just created on the right side with the contents of the bound variable Person on the left. The pattern matching fails, as the name field holds "Mike" in the record on the left, but a name record with the fields "Chris" and "Williams" in the record on the right.
Finally, the shell commands rf(RecordName) and rf() forget one or all of the record definitions currently visible in the shell.
We are now about to let you in on a poorly kept secret. We would rather not tell you, but when testing with records from the shell, using debugging tools to troubleshoot your code, or printing out internal data structures, you are bound to come across this. The Erlang compiler implements records before programs are run. Records are translated into tuples, and functions over records translate to functions and BIFs over the corresponding tuples. You can see this from this shell interaction:
11> records1:joe().
#person{name = "Joe",age = 21,phone = "999-999"}
12> records1:joe() == {person,"Joe",21,"999-999"}.
true
13> Tuple = {name,"Francesco","Cesarini"}.
#name{first = "Francesco",surname = "Cesarini"}
14> Tuple#name.first.
"Francesco"
From the preceding code, you can deduce that person is a 4-tuple, the first element being the atom person “tagging” the tuple and the remaining elements being the tuple fields in the order in which they are listed in the declaration of the record. The name record is a 3-tuple, where the first element is the atom name, the second is the first name field, and the third is the surname field.
Note how the shell by default assumes that a tuple is a record. This will unfortunately be the same in your programs, so whatever you do, never, ever use the tuple representations of records in your programs. If you do, the authors of this book will disown you and deny any involvement in helping you learn Erlang. We mean it!
Why should you never use the tuple representation of records? Using the representation breaks data abstraction, so any modification to your record type will not be reflected in the code using the tuples. If you add a field to the record, the size of the tuple created by the compiler will change, resulting in a badmatch error when trying to pattern-match the record to your tuple (where you obviously forgot to add the new element). Swapping the field order in the record will not affect your code if you are using records, as you access the fields by name. If in some places, however, you use a tuple and forget to swap all occurrences, your program may fail, or worse, may behave in an unexpected and unintended way. Finally, even though this should be the least of your worries, the internal record representation might change in future releases of Erlang, making your code incompatible with those releases.
To view the code produced as a source code transformation on records, compile your module and include the 'E' option. This results in a file with the E suffix. As an example, let’s compile the records1 module using compile:file(records1, ['E']) or the shell command c(records1, ['E']), producing a file called records1.E. No beam file containing the object code is produced. Note the slightly different syntax to what you have read so far, and pay particular attention to the record operations and tests which have been mapped to tuples, as well as the module_info functions which have been added. We will not go into the details of the various commands, as they are implementation-dependent and outside the scope of this book. They are, however, still interesting to see:
-file("/Users/Francesco/records1.erl", 1).

birthday({person,_,Age,_} = P) ->
    begin
        Rec0 = Age + 1,
        Rec1 = P,
        case Rec1 of
            {person,_,_,_} ->
                setelement(3, Rec1, Rec0);
            _ ->
                erlang:error({badrecord,person})
        end
    end.

joe() ->
    {person,"Joe",21,"999-999"}.

showPerson({person,Name,Age,Phone}) ->
    io:format("name: ~p age: ~p phone: ~p~n", [Name,Age,Phone]).

module_info() ->
    erlang:get_module_info(records1).

module_info(X) ->
    erlang:get_module_info(records1, X).
The BIF record_info will give information about a record type and its representation. The function call record_info(fields, recType) will return the list of field names in the recType, and the function call record_info(size, recType) will return the size of the representing tuple, namely the number of fields plus one. The position of a field in the representing tuple is given by #recType.fieldName, where both recType and fieldName are atoms:
15> #person.name.
2
16> record_info(size, person).
4
17> record_info(fields, person).
[name,age,phone]
18> RecType = person.
person
19> record_info(fields, RecType).
* 1: illegal record info
20> RecType#name.
* 1: syntax error before: '.'
Note how commands 19 and 20 failed. If you type the same code in a module as part of a function and compile it, the compilation will also fail. The reason is simple. The record_info/2 BIF and the #RecordType.Field operations must contain literal atoms; they may not contain variables. This is because they are handled by the compiler and converted to their respective values before the code is run and the variables are bound.
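One practical use of the #recType.fieldName position is as the key position for the standard library function lists:keyfind/3, so that the position follows the record definition rather than being hardcoded. The sketch below assumes the person record from records1; the module name record_meta and the functions find_by_name and demo are hypothetical:

```erlang
-module(record_meta).
-export([find_by_name/2, demo/0]).

-record(person, {name, age = 0, phone}).

%% #person.name is replaced at compile time by the position of the name
%% field in the representing tuple, so no position is written by hand.
find_by_name(Name, People) ->
    lists:keyfind(Name, #person.name, People).

demo() ->
    People = [#person{name = "Joe", age = 21},
              #person{name = "Mike", age = 30}],
    Found = find_by_name("Mike", People),
    %% record_info/2 takes the literal atom person, never a variable.
    {record_info(fields, person),
     record_info(size, person),
     Found#person.age,
     find_by_name("Fred", People)}.
```

Calling record_meta:demo() should return {[name,age,phone],4,30,false}; lists:keyfind/3 returns false when no element matches.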
A BIF that you can use in guards is is_record(Term, RecordTag). The BIF will verify that Term is a tuple, that its first element is RecordTag, and that the size of the tuple is correct. This BIF returns the atom true or false.
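As a small illustration of is_record/2 in a guard, the sketch below sums the ages in a list that may also contain terms that are not person records. The module name record_guard and the functions age_or_zero, total_age, and demo are invented for this example:

```erlang
-module(record_guard).
-export([total_age/1, demo/0]).

-record(person, {name, age = 0, phone}).

%% The guard accepts only tuples tagged person that have the right size.
age_or_zero(P) when is_record(P, person) ->
    P#person.age;
age_or_zero(_Other) ->
    0.

total_age(Items) ->
    lists:sum([age_or_zero(I) || I <- Items]).

demo() ->
    total_age([#person{name = "Joe", age = 21},
               not_a_person,
               {person, "wrong size"},   % tagged person, but too short
               #person{name = "Fred"}]).
```

Here record_guard:demo() should return 21: the atom and the two-element tuple are rejected by the guard, and Fred contributes the default age of zero.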
Macros allow you to write abbreviations of Erlang constructs that the Erlang Preprocessor (EPP) expands at compile time. You can use macros to make programs more readable and to implement features outside the language itself. With conditional macros, it becomes possible to write programs that can be customized in different ways, switching between debugging and production modes or among different architectures.
The simplest macro can be used to define a constant, as in:
-define(TIMEOUT, 1000).
The macro is used by putting a ? in front of the macro name, as in:
receive after ?TIMEOUT -> ok end
After macro expansion in epp, the preceding code will give the following Erlang program:
receive after 1000 -> ok end
The general form of a simple macro definition is:
-define(Name,Replacement).
where it is customary, but not required, to CAPITALIZE the Name. In the earlier example, the Replacement was the literal 1000; it can, in fact, be any sequence of Erlang tokens, that is, a sequence of “words” such as variables, atoms, symbols, or punctuation. The result need not be a complete Erlang expression or a top-level form (i.e., a function definition or compiler directive). It is not possible to build new tokens through macro expansion. As an example, consider the following:
-define(FUNC,X).
-define(TION,+X).

double(X) -> ?FUNC?TION.
Here, you can see that the replacement for TION is not an expression, but on expansion a legitimate function (or top-level form) definition is produced. Note that when macro uses are written next to each other, their expansions are separated by a space by default:
double(X) -> X + X.
Macros can take parameters which are indicated by variable names. The general form for parameterized macros is:
-define(Name(Var1,Var2,...,VarN), Replacement).
where, as for normal Erlang variables, the variables Var1, Var2, ..., VarN need to begin with a capital letter. Here is an example:
-define(Multiple(X,Y),X rem Y == 0).

tstFun(Z,W) when ?Multiple(Z,W) -> true;
tstFun(Z,W) -> false.
The macro definition is used here to make a guard expression more readable; a macro rather than a function needs to be used, as the syntax for guards precludes calls to user-defined functions. After macro expansion, the call is “inlined” thus:
tstFun(Z,W) when Z rem W == 0 -> true;
tstFun(Z,W) -> false.
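Because macro arguments are substituted as token sequences rather than as evaluated values, operator precedence in the replacement can produce surprises. The sketch below contrasts the Multiple macro from the text with a parenthesized variant; the macro SafeMultiple, the module name macro_args, and the functions naive and safe are our own, introduced only to show the difference:

```erlang
-module(macro_args).
-export([naive/2, safe/2]).

%% The macro from the text: arguments are pasted in as tokens.
-define(Multiple(X, Y), X rem Y == 0).
%% Hypothetical variant that parenthesizes its arguments and its result.
-define(SafeMultiple(X, Y), ((X) rem (Y) == 0)).

%% Expands to A + 1 rem B == 0, which parses as (A + (1 rem B)) == 0.
naive(A, B) -> ?Multiple(A + 1, B).

%% Expands to ((A + 1) rem (B) == 0), which is what was intended.
safe(A, B) -> ?SafeMultiple(A + 1, B).
```

With these definitions, macro_args:naive(5, 3) evaluates 5 + (1 rem 3) == 0 and returns false, whereas macro_args:safe(5, 3) evaluates (5 + 1) rem 3 == 0 and returns true. Parenthesizing parameters in the replacement is a cheap way to avoid this class of bug.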
Another example of parameterized macros could be for diagnostic printouts. It is not uncommon to come across code where two macros have been defined, but one is commented out:
%-define(DBG(Str, Args), ok).
-define(DBG(Str, Args), io:format(Str, Args)).

birthday(#person{age=Age} = P) ->
    ?DBG("in records1:birthday(~p)~n", [P]),
    P#person{age=Age+1}.
When developing the system, you have all of the debug printouts on in the code. When you want to turn them off, all you need to do is comment out the second definition of DBG and uncomment the first one before recompiling the code.
One of the major uses of macros in Erlang is to allow code to be instrumented in various ways. The advantage of the macro approach is that in using conditional macros (which we will describe in this section), it is possible to generate different versions of code, such as a debugging version and a production version.
The first aspect of this is the ability to get hold of the argument to a macro as a string, made up of the tokens comprising the argument. You do this by prefixing the variable with ??, as in ??Call:
-define(VALUE(Call),io:format("~p = ~p~n",[??Call,Call])).

test1() -> ?VALUE(length([1,2,3])).
The first use of the Call parameter is as ??Call, which will be expanded to the text of the parameter as a string; the second use will be expanded to a call to length so that in the shell, you would see the following:
36> macros1:test1().
"length ( [ 1 , 2 , 3 ] )" = 3
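Second, there is a set of predefined macros that are commonly used in debugging code:
?MODULE expands to the name of the module in which it is used.
?MODULE_STRING expands to a string consisting of the name of the module in which it is used.
?FILE expands to the name of the file in which it is used.
?LINE expands to the line number of the position at which it is used.
?MACHINE expands to the name of the virtual machine being used, 'BEAM'.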
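As a brief sketch of the predefined macros in use, the module below combines two of them, ?MODULE and ?LINE, in a user-defined macro that reports where it was used. The macro name LOCATION, the module name predefined_macros, and the function where are hypothetical:

```erlang
-module(predefined_macros).
-export([where/0]).

%% LOCATION is a hypothetical helper; ?LINE is expanded at the point
%% where LOCATION is used, not where it is defined.
-define(LOCATION, {?MODULE, ?LINE}).

where() ->
    ?LOCATION.
```

Calling predefined_macros:where() should return a tuple of the module name and the line number of the ?LOCATION use inside where/0, which is the kind of information you would typically add to a debug printout.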
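Finally, it is possible to define conditional macros, which will be expanded in different ways according to different flags passed to the compiler. Conditional macros are a more elegant and effective way to get the same effect as the earlier ?DBG example, where given two macros, the user comments one out. The following directives make this possible:
-undef(Flag). unsets the Flag.
-ifdef(Flag). evaluates the lines that follow only if the Flag is set.
-ifndef(Flag). evaluates the lines that follow only if the Flag is not set.
-else. introduces the alternative branch of an ifdef or ifndef.
-endif. closes an ifdef or ifndef construct.
Here is an example of their use:
-ifdef(debug).
-define(DBG(Str, Args), io:format(Str, Args)).
-else.
-define(DBG(Str, Args), ok).
-endif.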
In the code this is used as follows:
?DBG("~p:call(~p) called~n",[?MODULE, Request])
To turn on system debugging, you need to set the debug flag. You can do this in the shell using the following command:
c(Module,[{d,debug}]).
Or, you can do it programmatically, using compile:file/2 with similar flags. You can unset the flag by using c(Module,[{u,debug}]).
Conditional macro definitions such as these need to be properly nested, and cannot occur within function definitions.
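To see the pieces working together, here is a minimal, self-contained sketch of a module that uses the conditional DBG macro. The module name dbg_demo and its simplified birthday function, which takes an age rather than a person record, are assumptions made to keep the example short:

```erlang
-module(dbg_demo).
-export([birthday/1]).

%% The directives sit at the top level of the module, outside any function.
-ifdef(debug).
-define(DBG(Str, Args), io:format(Str, Args)).
-else.
-define(DBG(Str, Args), ok).
-endif.

%% Simplified stand-in for the birthday function: works on a bare age.
birthday(Age) ->
    ?DBG("~p:birthday(~p) called~n", [?MODULE, Age]),
    Age + 1.
```

Compiled with c(dbg_demo), the call dbg_demo:birthday(21) should return 22 silently; compiled with c(dbg_demo, [{d,debug}]), the same call also prints the diagnostic line before returning 22.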
To debug macro definitions, it is possible to get the compiler to dump a file of the results of applying epp to a file. You do this in a shell with c(Module,['P']) and in a program with compile:file/2; these commands dump the result in the file Module.P. The 'P' flag differs from the 'E' flag in that code transformations necessary for record operations are not done by 'P'.
It is customary to put record and macro definitions into an include file so that they can be shared across multiple modules throughout a project, and not simply in a single module. To make the definitions available to more than one module, you place them in a separate file and include them in a module using the -include directive, usually placed after the module and export directives:
-include("File.hrl").
In the preceding directive, the quotes "..." around the filename are mandatory. Include files customarily have the suffix .hrl, but this is not enforced.
The compiler has a list of paths to search for include files, the first of which is the current directory followed by the directory containing the source code being compiled. You can include other paths in the path list by compiling your code using the i option: c(Module, [{i, Dir}]). Several directories can be specified, where the directory specified last is searched first.
Extend the person record type to include a field for the address of the person. Which of the existing functions over person need to be modified, and which can be left unchanged?
Using the BIF is_record(P, person), it is possible to check whether the variable P contains a person record. Explain how you would use this to modify the function foobar, defined as follows:
foobar(P) when P#person.name == "Joe" -> ...
so that it will not fail if applied to a nonrecord.
Revisit the database example db.erl that you wrote in Exercise 3-4 in Chapter 3. Rewrite it using records instead of tuples. As a record, you could use the following definition:
-record(data, {key, data}).
You should remember to place this definition in an include file. Test your results using the database server developed in Exercise 5-1 in Chapter 5.
Define a record type to represent circles; define another to represent rectangles. You should assume the following:
A circle has a radius.
A rectangle has a length and a width.
Give functions that work over these types to give the perimeter and area of these geometric figures. Once this is completed, add the code for triangles to your type definitions and functions, where you can assume that the triangle is described by the lengths of its three sides.
Define a record type to represent binary trees with numerical values held at internal nodes and at the leaves. Figure 7-1 shows an example.
Define functions over the record type to do the following:
Sum the values contained in the tree.
Find the maximum value contained in the tree (if any).
A tree is ordered if, for all nodes, the values in the left subtree below the node are smaller than or equal to the value held at the node, and this value is less than or equal to all the values in the right subtree below the node. Figure 7-2 shows an example:
Define a parameterized macro SHOW_EVAL that will simply return the result of an expression when the show mode is switched off, but which will also print the expression and its value when the show flag is on. You should ensure that the expression is evaluated only once whichever case holds.
How can you use the Erlang macro facility to count the number of calls to a particular function in a particular module?
An enumerated type consists of a finite number of elements, such as the days of the week or months of the year. How can you use macros to help the implementation of enumerated types in Erlang?