Chapter 10. Storing Structured Data

Tuples and lists are powerful tools for creating complex data structures, but there are two key pieces missing from the story so far. First, tuples are relatively anonymous structures. Relying on a specific order and number of components in tuples can create major maintenance headaches. This also means that tuples don’t let you refer to contents by name: you always have to know their location. Second, despite Erlang’s general preference for avoiding side effects, storing and sharing data is a fundamental side effect needed for a wide variety of projects.

Four tools provide more support for structured data. Maps work well when you want to refer to possibly varied information through a single list of names. Records will help you create labeled orderly sets of information. Erlang term storage (ETS) will help you store and manipulate those sets, and the Mnesia database provides additional features for reliable distributed storage.

Mapping Your Data

Referring to data by its place in a list or tuple can tax programmer memory and code quickly, especially if data comes and goes. Erlang 17 (and later) addresses this common challenge with a new data structure, the map. Map processing is slightly slower than list or tuple processing, but is often easier to work with: you don’t have to remember as much.

Creating a map requires a different syntax presenting keys and values:

1> Planemos = #{ earth => 9.8, moon => 1.6, mars => 3.71 }.
#{earth => 9.8,mars => 3.71,moon => 1.6}

The Planemos map now contains three items, with atoms as keys. The key earth references a value of 9.8, mars 3.71, and moon 1.6. The values are gravitational constants, though that doesn’t need to be specified. Unlike records, coming up next, the different pieces don’t get names.

The easiest way to extract values is with the maps module’s get function:

2> maps:get(moon, Planemos).
1.6

If you need to add a value to a map, you can’t—but you can ask for a new map that contains the values of the old map plus the new key-value pair, or a map that contains the old map minus a pair:

3> MorePlanemos = maps:put(venus, 8.9, Planemos).
#{earth => 9.8,mars => 3.71,moon => 1.6,venus => 8.9}
4> maps:get(venus, MorePlanemos).
8.9
5> FewerPlanemos = maps:remove(moon, MorePlanemos).
#{earth => 9.8,mars => 3.71,venus => 8.9}

Ask for a key that isn’t there, and you’ll get an error:

6> maps:get(moon, FewerPlanemos).
** exception error: {badkey,moon}
     in function  maps:get/2
        called as maps:get(moon,#{earth => 9.8,mars => 3.71,venus => 8.9})
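
When a missing key is an expected case rather than a bug, the maps module also offers lookups that do not raise an exception. The following is a minimal sketch; the module name map_lookup and its function names are invented for illustration, while maps:find/2 and maps:get/3 are standard library calls.

```erlang
-module(map_lookup).
-export([gravity/2, gravity_or/3]).

%% maps:find/2 returns {ok, Value} when the key is present
%% and the atom error when it is not.
gravity(Name, Planemos) ->
    maps:find(Name, Planemos).

%% maps:get/3 returns Default instead of raising {badkey, Name}.
gravity_or(Name, Planemos, Default) ->
    maps:get(Name, Planemos, Default).
```

With a map that has no moon key, map_lookup:gravity(moon, Map) should return error, and map_lookup:gravity_or(moon, Map, unknown) should return unknown.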

While most of the power of maps remains locked in maps module functions and hasn’t yet reached Erlang’s own syntax, you can pattern match on maps:

7> #{earth := Gravity} = Planemos.
#{earth => 9.8,mars => 3.71,moon => 1.6}
8> Gravity.
9.8

If you need a flexible way to connect values with keys, maps may be what you’re looking for. The maps module also provides a variety of tools that support processing maps with higher-order functions. If you’d like more structure, you probably want to consider records.
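
As a rough illustration of those higher-order tools, the sketch below applies maps:map/2, maps:filter/2, and maps:fold/3 to a map shaped like Planemos. The module name map_hof and its function names are hypothetical; only the maps functions come from the standard library.

```erlang
-module(map_hof).
-export([scale/2, weaker_than/2, strongest/1]).

%% maps:map/2 builds a new map with the same keys and transformed values.
scale(Factor, Planemos) ->
    maps:map(fun(_Name, Gravity) -> Gravity * Factor end, Planemos).

%% maps:filter/2 keeps only the pairs for which the fun returns true.
weaker_than(Limit, Planemos) ->
    maps:filter(fun(_Name, Gravity) -> Gravity < Limit end, Planemos).

%% maps:fold/3 visits every pair, carrying an accumulator along.
%% Returns {Name, Gravity} for the largest value, or none for an empty map.
strongest(Planemos) ->
    maps:fold(fun(Name, Gravity, none) -> {Name, Gravity};
                 (Name, Gravity, {_, Best}) when Gravity > Best -> {Name, Gravity};
                 (_Name, _Gravity, Acc) -> Acc
              end, none, Planemos).
```

None of these calls modifies the map passed in; each returns a new value, in keeping with the rest of the language.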

Warning

Maps appeared in Erlang 17, but are still slowly evolving and integrating into the language. Functions in the maps module work, but only a few parts of the native Erlang syntax for maps have been implemented as of version 19. If you find examples online or even in books that don’t work, they may be looking a little too far into the future.

From Tuples to Records

Tuples let you build complex data structures, but force you to rely on keeping the order and number of items consistent. If you change the sequence of items in a tuple, or if you want to add an item, you have to check through all of your code to make sure that the change propagates smoothly. As your projects grow, and especially if you need to share data structures with code you don’t control, you’ll need a safer way to store and address information.

Records let you create data structures that use names (rather than order) to connect with data. You can read, write, and pattern match data in a record without having to worry about the details of where in a tuple a field lurks or whether someone’s added a new field.

Warning

There are still tuples underneath records, and occasionally Erlang will expose them to you. Do not attempt to use the tuple representation directly, or you will add all the potential problems of using tuples to the slight extra syntax of using records.

Setting Up Records

Using records requires telling Erlang about them with a special declaration. It looks like a -module or -export declaration, but is a -record declaration:

-record(planemo, {name, gravity, diameter, distance_from_sun}).

That defines a record type named planemo, containing fields named name, gravity, diameter, and distance_from_sun. Right now, when you create a new record, the fields will all have the value undefined, but you can also specify default values if you prefer, for situations where there is a sensible normal option. For example, this declaration creates records for different towers for dropping objects:

-record(tower, {location, height=20, planemo=earth, name}).

Unlike -module or -export declarations, you’ll often want to share record declarations across multiple modules and (for the examples in this chapter at least) even use them in the shell. To share record declarations reliably, just put the record declarations in their own file, ending with the extension .hrl. You can put each record declaration in a separate file or all of them in a single file, depending on your needs. To get started, and to see how these behave, you can put both of the declarations into a single file, records.hrl, as shown in Example 10-1. (You can find it in ch10/ex1-records.)

Example 10-1. A records.hrl file containing two rather unrelated record declarations
-record(planemo, {name, gravity, diameter, distance_from_sun}).
-record(tower, {location, height=20, planemo=earth, name}).
Note

You may want to put individual record declarations into their own files and import them separately, bringing them in only when you actually need to get data into or out of a particular record type. This can be especially important if you’re mixing code in cases where different developers used the same name for a record type but different underlying structures.

The command rr (for read records) lets you bring this into the shell:

1> rr("records.hrl").
[planemo,tower]

The shell now understands records with the names planemo and tower.

Note

You can also declare records directly in the shell with the rd/2 function, but if you’re doing anything more than just poking around, it’s easier to have them in a formal imported declaration, which is a more reliable approach. You can call rl/0 if you want to see what records are defined, or rl/1 if you want to see how a specific record is defined.

Creating and Reading Records

You can now create variables that contain new records. The syntax for referencing records prefaces the name of the record type with a #, and encloses name-value pairs in curly brackets. For example, you could create towers with syntax like the following:

2> Tower1=#tower{}.
#tower{location = undefined,height = 20,planemo = earth,
       name = undefined}
3> Tower2=#tower{location="Grand Canyon"}.
#tower{location = "Grand Canyon",height = 20,
       planemo = earth,name = undefined}
4> Tower3=#tower{location="NYC", height=241, name="Woolworth Building"}.
#tower{location = "NYC",height = 241,planemo = earth,
       name = "Woolworth Building"}
5> Tower4=#tower{location="Rupes Altai 241", height=500, planemo=moon,
    name="Piccolomini View"}.
#tower{location = "Rupes Altai 241",height = 500,
       planemo = moon,name = "Piccolomini View"}
6> Tower5=#tower{planemo=mars, height=500, name="Daga Vallis",
    location="Valles Marineris"}.
#tower{location = "Valles Marineris",height = 500,
       planemo = mars,name = "Daga Vallis"}

These towers (or at least drop sites) demonstrate a variety of ways to use the record syntax to create variables as well as interactions with the default values:

  • Line 2 just creates Tower1 with the default values. You can add real values later.

  • Line 3 creates a Tower2 with a location, but otherwise relies on the default values.

  • Line 4 overrides the default values for location, height, and name, but leaves the planemo alone.

  • Line 5 replaces all of the default values with new values.

  • Line 6 replaces all of the default values, and also demonstrates that it doesn’t matter in what order you list the name/value pairs. Erlang will sort it out.

You can read record entries with two different approaches. To extract a single value, you can use a dot (.) syntax that may look familiar from other languages. For example, to find out which planemo Tower5 is on, you could write:

7> Tower5#tower.planemo.
mars

You could also use pattern matching to extract several pieces simultaneously:

8> #tower{location=L5, height=H5} = Tower5.
#tower{location = "Valles Marineris",height = 500,
       planemo = mars,name = "Daga Vallis"}
9> L5.
"Valles Marineris"
10> H5.
500

The syntax feels a little backward, with the variable being bound on the right side of the equals sign instead of in its usual place on the left.

As always, you can’t write a new value to an existing variable, but you can create a new record based on the values of an old one. The syntax used on line 12 is much like that used for assigning the contents of a field to a variable, but with a value in place of the variable name:

11> Tower5.
#tower{location = "Valles Marineris",height = 500,
       planemo = mars,name = "Daga Vallis"}
12> Tower5a=Tower5#tower{height=512}.
#tower{location = "Valles Marineris",height = 512,
       planemo = mars,name = "Daga Vallis"}
Note

Yes, you always need to specify the record type. Yes, it’s a bit of extra typing.
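
The same update syntax works inside a module. The sketch below is a minimal example under stated assumptions: the module name tower_update and the functions raise/2 and demo/0 are invented for illustration, and the record is declared inline only so the file compiles on its own.

```erlang
-module(tower_update).
-export([raise/2, demo/0]).

%% Declared inline so the sketch is self-contained; a project would
%% normally share this declaration through a .hrl file.
-record(tower, {location, height=20, planemo=earth, name}).

%% Returns a new record; the record bound to T is left unchanged.
raise(#tower{height=Height} = T, Extra) when Extra >= 0 ->
    T#tower{height = Height + Extra}.

%% Shows that the old record keeps its height after the update.
demo() ->
    Old = #tower{location="Grand Canyon"},
    New = raise(Old, 12),
    {Old#tower.height, New#tower.height}.
```

Calling tower_update:demo() should return {20,32}: the default height in the original record, and the raised height in the copy.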

If you ever want to make the shell forget your record declarations, you can issue the shell command rf(). Your record-based variables will still exist, in a raw tuple form you should avoid ever using.

Using Records in Functions and Modules

Records also work well in modules using the same declaration files. You can, of course, just include the record declaration in every module that uses it, but that will require you to hunt down every declaration and update it if you ever want to change it. The saner approach is to use the files like the ones previously shown. You can do that easily with a single extra declaration near the top of your module:

-include("records.hrl").

Once you have the record declaration included, you can pattern match against records submitted as arguments. The simplest way to do this is to just match against the type of the record, as shown in Example 10-2, which is also in ch10/ex1-records.

Example 10-2. A function that pattern matches a complete record
-module(record_drop).
-export([fall_velocity/1]).
-include("records.hrl").

fall_velocity(#tower{} = T) ->
   fall_velocity(T#tower.planemo, T#tower.height).

fall_velocity(earth, Distance) when Distance >= 0  -> math:sqrt(2 * 9.8 * Distance);
fall_velocity(moon, Distance) when Distance >= 0 -> math:sqrt(2 * 1.6 * Distance);
fall_velocity(mars, Distance) when Distance >= 0 -> math:sqrt(2 * 3.71 * Distance).

This code uses a pattern match that will match only tower records, and puts the record into a variable T. Once again, the syntax may seem backward, with T being on the right of the equals sign instead of on the left, but it works. Then, like the original code way back in Example 3-8, it passes the individual arguments to fall_velocity/2 for calculations, this time using the record syntax.

Note

Short variable names suddenly seem more attractive when you have to append the name of the record type on every use. In simple functions this can work, but in more complex functions short names may prove confusing, especially if you have two variables containing the same kind of record.

Because you used the same -record declaration in both the shell and the module, you can use the records you created to test the function.

14> c(record_drop).
{ok,record_drop}
15> record_drop:fall_velocity(Tower5).
60.909769331364245
16> record_drop:fall_velocity(Tower1).
19.79898987322333

You can also extract the specific fields from the record in the pattern match, as shown in Example 10-3, which is in ch10/ex2-records.

In that version, record_drop:fall_velocity/1 pulls out the planemo field and binds it to Planemo, and pulls out the height field and binds it to Distance. Then it returns the velocity of an object dropped from that Distance, just like earlier examples throughout this book.

Example 10-3. A function that pattern matches components of a record
-module(record_drop).
-export([fall_velocity/1]).
-include("records.hrl").

fall_velocity(#tower{planemo=Planemo, height=Distance}) ->
   fall_velocity(Planemo, Distance).

fall_velocity(earth, Distance) when Distance >= 0  -> math:sqrt(2 * 9.8 * Distance);
fall_velocity(moon, Distance) when Distance >= 0 -> math:sqrt(2 * 1.6 * Distance);
fall_velocity(mars, Distance) when Distance >= 0 -> math:sqrt(2 * 3.71 * Distance).

Again, the syntax may seem backwards, but it lets you extract the individual fields. You can take the records created and feed them into this function, and it will tell you the velocity resulting from a drop from the top of that tower to the bottom.

Finally, you can pattern match against both the fields and the records as a whole. Example 10-4, in ch10/ex3-records, demonstrates using this mixed approach to create a more detailed response than just the fall velocity.

Example 10-4. A function that pattern matches the whole record as well as components of a record
-module(record_drop).
-export([fall_velocity/1]).
-include("records.hrl").

fall_velocity(#tower{planemo=Planemo, height=Distance} = T) ->
   io:format("From ~s's elevation of ~p meters on ~p, the object will reach ~p m/s "
             "before crashing in ~s.~n",
             [T#tower.name, Distance, Planemo,
              fall_velocity(Planemo, Distance), T#tower.location]).

fall_velocity(earth, Distance) when Distance >= 0  -> math:sqrt(2 * 9.8 * Distance);
fall_velocity(moon, Distance) when Distance >= 0 -> math:sqrt(2 * 1.6 * Distance);
fall_velocity(mars, Distance) when Distance >= 0 -> math:sqrt(2 * 3.71 * Distance).

If you pass a tower record to record_drop:fall_velocity/1, it will match against the individual fields it needs to do the calculation, and match the whole record into T so that it can produce a more interesting if not necessarily grammatically correct report.

17> record_drop:fall_velocity(Tower5).
From Daga Vallis's elevation of 500 meters on mars, the object will reach
60.909769331364245 m/s before crashing in Valles Marineris.
ok
18> record_drop:fall_velocity(Tower3).
From Woolworth Building's elevation of 241 meters on earth, the object
will reach 68.72845116834803 m/s before crashing in NYC.
ok
Note

record_drop:fall_velocity/1 uses the ~s control sequence for the io:format/2 call. It just includes the contents of the string, without surrounding quotes.

Note

You can learn more about working with records in Chapter 7 of Erlang Programming; Section 3.9 of Programming Erlang; Section 2.11 of Erlang and OTP in Action; and Chapter 9 of Learn You Some Erlang For Great Good!.

Storing Records in Erlang Term Storage

ETS is a simple but powerful in-memory collection store. It holds tuples, and since records are tuples underneath, they’re a natural fit. ETS and its disk-based cousin DETS provide a (perhaps too) simple solution for many data management problems. ETS is not exactly a database, but does similar work, and is useful by itself as well as underneath the Mnesia database you’ll see in the next section.

Every entry in an ETS table is a tuple (or corresponding record), and one piece of the tuple is designated the key. ETS offers a few different structural choices depending on how you want to handle that key. ETS can hold four kinds of collections:

Sets (set)

Can contain only one entry with a given key. This is the default.

Ordered sets (ordered_set)

Same as a set, but also maintains a traversal order based on the keys. Great for anything you want to keep in alphabetic or numeric order.

Bags (bag)

Lets you store more than one entry with a given key. However, if you have multiple entries that have identical values, they get combined into a single entry.

Duplicate bags (duplicate_bag)

Not only lets you store more than one entry with a given key, but also lets you store multiple entries with identical values.

By default, ETS tables are sets, but you can specify one of the other options when you create a table. The examples in this chapter will be sets because they are simpler to figure out, but the same techniques apply to all four table varieties.
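
To make the difference between the four table types concrete, the sketch below inserts the same three tuples into a table of a given type and reports how many entries survive. The module name ets_types and the function entries_kept/1 are made up for this illustration; the table uses the default key position, so the atom mars is the key.

```erlang
-module(ets_types).
-export([entries_kept/1]).

%% Type is one of set, ordered_set, bag, or duplicate_bag.
entries_kept(Type) ->
    Table = ets:new(scratch, [Type]),
    ets:insert(Table, {mars, 3.7}),
    ets:insert(Table, {mars, 3.71}),
    ets:insert(Table, {mars, 3.71}),   % identical to the previous entry
    Size = ets:info(Table, size),
    ets:delete(Table),
    Size.
%% set and ordered_set keep 1 entry (the last insert wins),
%% bag keeps 2 (identical entries are combined),
%% duplicate_bag keeps all 3.
```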

Note

There is no requirement in ETS that all of your entries look at all similar. When you’re starting out, however, it’s much simpler to use the same kind of record, or at least tuples with the same structure. You can also use any kind of value for the key, including complex tuple structures and lists, but again, it’s best not to get too fancy at the beginning.

All of the examples in the following section will use the planemo record type defined in the previous section, and the data in Table 10-1.

Table 10-1. Planemos for gravitational exploration
Planemo    Gravity (m/s^2)   Diameter (km)   Distance from Sun (10^6 km)
mercury    3.7               4878            57.9
venus      8.9               12104           108.2
earth      9.8               12756           149.6
moon       1.6               3475            149.6
mars       3.7               6787            227.9
ceres      0.27              950             413.7
jupiter    23.1              142796          778.3
saturn     9.0               120660          1427.0
uranus     8.7               51118           2871.0
neptune    11.0              30200           4497.1
pluto      0.6               2300            5913.0
haumea     0.44              1150            6484.0
makemake   0.5               1500            6850.0
eris       0.8               2400            10210.0

Creating and Populating a Table

The ets:new/2 function lets you create a table. The first argument is a name for the table, and the second argument is a list of options. There are lots and lots of options, including the identifiers for the table types just described, but the two most important for getting started are named_table and the tuple starting with keypos.

Every table has a name, but only some can be reached using that name. If you don’t specify named_table, the name is there but visible only inside the database. You’ll have to use the value returned by ets:new/2 to reference the table. If you do specify named_table, processes can reach the table as long as they know the name, without needing access to that return value.

Note

Even with a named table, you still have some control over which processes can read and write the table through the private, protected, and public options.

The other important option, especially for ETS tables containing records, is the keypos tuple. By default, ETS treats the first value in a tuple as the key. The tuple representation underneath records (which you shouldn’t really touch) always uses the first value in a tuple to identify the kind of record, so that approach works very badly as a key for records. Using the keypos tuple lets you specify which record value should be the key.

Remember, the record format for a planemo looks like the following:

-record(planemo, {name, gravity, diameter, distance_from_sun}).

Because this table is mostly used for calculations based on a given planemo, it makes sense to use the name as a key. An appropriate declaration for setting up the ETS table might look like the following:

PlanemoTable=ets:new(planemos,[ named_table, {keypos, #planemo.name} ])

This gives the table the name planemos and uses the named_table option to make that table visible to other processes that know the name. Because of the default access level of protected, this process can write to that table but other processes can only read it. It also tells ETS to use the name field as the key. Because it doesn’t specify otherwise, the table will be treated as a set—each key maps to only one instance of a record, and ETS doesn’t keep the list sorted by key.
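
The expression #planemo.name used for keypos is simply an integer: the position of that field in the tuple underneath the record. A small sketch, assuming a hypothetical module named keypos_demo with the record declared inline, shows the values involved.

```erlang
-module(keypos_demo).
-export([positions/0, table_keypos/0]).

-record(planemo, {name, gravity, diameter, distance_from_sun}).

%% Position 1 of the underlying tuple holds the record name (planemo),
%% so the first declared field sits at position 2.
positions() ->
    {#planemo.name, #planemo.gravity}.

%% Creates a throwaway table and asks ETS which position it uses as the key.
table_keypos() ->
    Table = ets:new(scratch, [{keypos, #planemo.name}]),
    Pos = ets:info(Table, keypos),
    ets:delete(Table),
    Pos.
```

Here positions() should return {2,3} and table_keypos() should return 2. Writing #planemo.name rather than a literal 2 keeps the code correct if the order of fields in the record declaration ever changes.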

Once you have the table set up, as shown in Example 10-5, you can use the ets:info/1 function to check out its details. (You can find this in ch10/ex4-ets.)

Example 10-5. Setting up a simple ETS table and reporting on what’s there
-module(planemo_storage).
-export([setup/0]).
-include("records.hrl").

setup() ->
 PlanemoTable=ets:new(planemos, [named_table, {keypos, #planemo.name}]),
 ets:info(PlanemoTable).

If you compile and run this code, you’ll get a report of an empty ETS table with more properties than you probably want to know about at the moment:

1> c(planemo_storage).
{ok,planemo_storage}
2> planemo_storage:setup().
[{compressed,false},
 {memory,317},
 {owner,<0.316.0>},
 {heir,none},
 {name,planemos},
 {size,0},
 {node,nonode@nohost},
 {named_table,true},
 {type,set},
 {keypos,2},
 {protection,protected}]

Most of this is either more information than you need or unsurprising, but it is good to see the name (planemos), size (0—empty!), and keypos (not 1, the default, but 2, the location of the name in the tuple underneath the record). It is, as the defaults specify, set up as a protected set. (nonode@nohost just refers to the current Erlang environment when you aren’t distributing processing across multiple systems. If you’re running a distributed Erlang system, you’ll have multiple nodes, each its own independent Erlang runtime with its own name.)

You can set up only one ETS table with the same name. If you call planemo_storage:setup/0 twice, you’ll get an error:

3> planemo_storage:setup().
** exception error: bad argument
     in function  ets:new/2
        called as ets:new(planemos,[named_table,{keypos,2}])
     in call from planemo_storage:setup/0 (planemo_storage.erl, line 6)

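To avoid this, delete the old table with ets:delete(planemos) before calling setup/0 again. (The f() shell command only clears variable bindings; it does not remove tables. In the shell, though, an error like the one above ends the shell process that owned the table, so the table goes away with it.) If you think you’re likely to call your initialization code repeatedly after you figure the basics out, you can also check whether ets:info/1 returns undefined to make sure the table doesn’t already exist, or put a try…catch construct around the ets:new/2 call.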
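
The two guards mentioned above, checking ets:info/1 for undefined and wrapping ets:new/2 in try…catch, might look like the following sketch. The module name safe_setup and both function names are hypothetical, and the record is declared inline so the file stands alone.

```erlang
-module(safe_setup).
-export([ensure_table/0, ensure_table_catch/0]).

-record(planemo, {name, gravity, diameter, distance_from_sun}).

%% Approach 1: ask ETS whether a table named planemos already exists.
ensure_table() ->
    case ets:info(planemos) of
        undefined ->
            ets:new(planemos, [named_table, {keypos, #planemo.name}]);
        _Info ->
            planemos
    end.

%% Approach 2: try to create the table and treat badarg as "already there".
%% Note that badarg can also signal other problems, such as a bad option.
ensure_table_catch() ->
    try
        ets:new(planemos, [named_table, {keypos, #planemo.name}])
    catch
        error:badarg -> planemos
    end.
```

Both functions return the atom planemos, since ets:new/2 returns the name when named_table is set. The check in the first approach is not atomic, so if several processes might create the table at the same moment, the second approach is probably the safer of the two.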

A more exciting ETS table, of course, will include content. The next step is to use ets:insert/2 to add content to the table. The first argument is the table, referenced either by its name (if you set the named_table option), or by the variable that captured the return value of ets:new/2. In Example 10-6, which is in ch10/ex5-ets, the first call uses the name, to show that it works, and the rest use the variable. The second argument is a record representing one of the rows from Table 10-1.

Example 10-6. Populating a simple ETS table and reporting on what’s there
-module(planemo_storage).
-export([setup/0]).
-include("records.hrl").

setup() ->
PlanemoTable=ets:new(planemos, [named_table, {keypos, #planemo.name}]),

ets:insert(planemos,
 #planemo{ name=mercury, gravity=3.7, diameter=4878, distance_from_sun=57.9 }),
ets:insert(PlanemoTable,
 #planemo{ name=venus, gravity=8.9, diameter=12104, distance_from_sun=108.2 }),
ets:insert(PlanemoTable,
 #planemo{ name=earth, gravity=9.8, diameter=12756, distance_from_sun=149.6 }),
ets:insert(PlanemoTable,
 #planemo{ name=moon, gravity=1.6, diameter=3475, distance_from_sun=149.6 }),
ets:insert(PlanemoTable,
 #planemo{ name=mars, gravity=3.7, diameter=6787, distance_from_sun=227.9 }),
ets:insert(PlanemoTable,
 #planemo{ name=ceres, gravity=0.27, diameter=950, distance_from_sun=413.7 }),
ets:insert(PlanemoTable,
 #planemo{ name=jupiter, gravity=23.1, diameter=142796, distance_from_sun=778.3 }),
ets:insert(PlanemoTable,
 #planemo{ name=saturn, gravity=9.0, diameter=120660, distance_from_sun=1427.0 }),
ets:insert(PlanemoTable,
 #planemo{ name=uranus, gravity=8.7, diameter=51118, distance_from_sun=2871.0 }),
ets:insert(PlanemoTable,
 #planemo{ name=neptune, gravity=11.0, diameter=30200, distance_from_sun=4497.1 }),
ets:insert(PlanemoTable,
 #planemo{ name=pluto, gravity=0.6, diameter=2300, distance_from_sun=5913.0 }),
ets:insert(PlanemoTable,
 #planemo{ name=haumea, gravity=0.44, diameter=1150, distance_from_sun=6484.0 }),
ets:insert(PlanemoTable,
 #planemo{ name=makemake, gravity=0.5, diameter=1500, distance_from_sun=6850.0 }),
ets:insert(PlanemoTable,
 #planemo{ name=eris, gravity=0.8, diameter=2400, distance_from_sun=10210.0 }),
ets:info(PlanemoTable).

Again, the last call is to ets:info/1, which now reports that the table has 14 items:

4> c(planemo_storage).
{ok,planemo_storage}
5> f().
ok
6> planemo_storage:setup().
[{compressed,false},
 {memory,541},
 {owner,<0.342.0>},
 {heir,none},
 {name,planemos},
 {size,14},
 {node,nonode@nohost},
 {named_table,true},
 {type,set},
 {keypos,2},
 {protection,protected}]
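
The repeated calls are easy to follow, but ets:insert/2 also accepts a list of objects, which can shorten setup code. The sketch below is a trimmed variation with only three rows; the module name planemo_bulk and the table name planemos_bulk are invented so that it does not collide with the planemos table.

```erlang
-module(planemo_bulk).
-export([setup/0]).

-record(planemo, {name, gravity, diameter, distance_from_sun}).

setup() ->
    Table = ets:new(planemos_bulk, [{keypos, #planemo.name}]),
    Rows = [
        #planemo{name=mercury, gravity=3.7, diameter=4878, distance_from_sun=57.9},
        #planemo{name=venus, gravity=8.9, diameter=12104, distance_from_sun=108.2},
        #planemo{name=earth, gravity=9.8, diameter=12756, distance_from_sun=149.6}
    ],
    %% A single call inserts every record in the list.
    true = ets:insert(Table, Rows),
    {Table, ets:info(Table, size)}.
```

The second element of the returned tuple should be 3, the number of rows inserted.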

If you want to see what’s in that table, you have a couple of options. The quick way to do it in the shell is to use the ets:tab2list/1 function, which will return a list of records (or tuples, if you leave out the record import on line 7):

7> rr("records.hrl").
[planemo,tower]
8> ets:tab2list(planemos).
[#planemo{name = pluto,gravity = 0.6,diameter = 2300,
          distance_from_sun = 5913.0},
 #planemo{name = saturn,gravity = 9.0,diameter = 120660,
          distance_from_sun = 1427.0},
 #planemo{name = moon,gravity = 1.6,diameter = 3475,
          distance_from_sun = 149.6},
 #planemo{name = mercury,gravity = 3.7,diameter = 4878,
          distance_from_sun = 57.9},
 #planemo{name = earth,gravity = 9.8,diameter = 12756,
          distance_from_sun = 149.6},
 #planemo{name = neptune,gravity = 11.0,diameter = 30200,
          distance_from_sun = 4497.1},
 #planemo{name = makemake,gravity = 0.5,diameter = 1500,
          distance_from_sun = 6850.0},
 #planemo{name = uranus,gravity = 8.7,diameter = 51118,
          distance_from_sun = 2871.0},
 #planemo{name = ceres,gravity = 0.27,diameter = 950,
          distance_from_sun = 413.7},
 #planemo{name = venus,gravity = 8.9,diameter = 12104,
          distance_from_sun = 108.2},
 #planemo{name = mars,gravity = 3.7,diameter = 6787,
          distance_from_sun = 227.9},
 #planemo{name = eris,gravity = 0.8,diameter = 2400,
          distance_from_sun = 10210.0},
 #planemo{name = jupiter,gravity = 23.1,diameter = 142796,
          distance_from_sun = 778.3},
 #planemo{name = haumea,gravity = 0.44,diameter = 1150,
          distance_from_sun = 6484.0}]

If you’d rather keep track of the table in a separate window, Erlang’s table visualizer shows the same information in a slightly more readable form. You can start it from the shell with observer:start(), and then click on the Table Viewer tab. You’ll see something like Figure 10-1. Double-click on the planemos table, and a more detailed report on its contents like the one shown in Figure 10-2 will appear.

Figure 10-1. Opening the table visualizer
Figure 10-2. Reviewing the planemos table in the visualizer

The visualizer doesn’t know about your record declarations.

Note

If you want to see a table of all the current ETS tables, try issuing ets:i() in the shell. You’ll see the tables you’ve created (probably) near the bottom.

Simple Queries

The easiest way to look up records in your ETS table is with the ets:lookup/2 function and the key. You can test this easily from the shell:

9> ets:lookup(planemos,eris).
[#planemo{name = eris,gravity = 0.8,diameter = 2400,
          distance_from_sun = 10210.0}]

The return value is always a list. This is true despite Erlang knowing that this ETS table has the set type, so only one value can match the key, and despite there being only one value. In situations like this where you know that there will only be one returned value, the hd/1 function, which Example 5-5 showed for use with user inputs, can get you the head of a list quickly. Since there is only one item, the head is just that item.

10> hd(ets:lookup(planemos,eris)).
#planemo{name = eris,gravity = 0.8,diameter = 2400,
         distance_from_sun = 10210.0}

The square brackets are gone, which means you can now extract, say, the gravity of a planemo:

11> Result=hd(ets:lookup(planemos,eris)).
#planemo{name = eris,gravity = 0.8,diameter = 2400,
         distance_from_sun = 10210.0}
12> Result#planemo.gravity.
0.8
Note

You can also use pattern matching to extract the value instead of the hd/1 function, as in [Result]=ets:lookup(planemos,eris).. Both approaches will fail if the return value is an empty list.
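
If a missing key is a normal possibility, a case expression can handle the empty list explicitly instead of failing. This is a minimal sketch; the module name planemo_lookup and the return values {ok, Gravity} and not_found are choices made for this illustration, not conventions imposed by ETS.

```erlang
-module(planemo_lookup).
-export([gravity/2]).

-record(planemo, {name, gravity, diameter, distance_from_sun}).

%% Table is a table name or the value returned by ets:new/2.
%% Assumes a set table keyed on the name field, so at most one match.
gravity(Table, Name) ->
    case ets:lookup(Table, Name) of
        [#planemo{gravity=Gravity}] -> {ok, Gravity};
        [] -> not_found
    end.
```

With the planemos table populated as above, planemo_lookup:gravity(planemos, eris) should return {ok,0.8}, while a name that is not in the table yields not_found.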

A Key Feature: Overwriting Values

Up until now, you’ve had to work with (or around) Erlang’s single-assignment paradigm: you can’t overwrite the value of a variable, or change the value of an item in a list directly. However, ETS doesn’t have that restriction. If you want to change the value of gravity on mercury, you can:

13> ets:insert(planemos, #planemo{ name=mercury,
 gravity=3.9, diameter=4878, distance_from_sun=57.9 }).
true
14> ets:lookup(planemos, mercury).
[#planemo{name = mercury,gravity = 3.9,diameter = 4878,
          distance_from_sun = 57.9}]

Just because you can change values in an ETS table, however, doesn’t mean that you should rewrite your code to replace immutable variables with flexible ETS table contents. Nor should you make all your tables public so that various processes can read and write whatever they like to the ETS table, making it a different form of shared memory.

Try to remember the discipline you’ve had to learn up until this point. Ask yourself when making changes is going to be useful, and when it might introduce tricky bugs. You probably won’t have to change the gravity of Mercury, but it certainly could make sense to change a shipping address. If you have doubts, lean toward caution.
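
When a change really is warranted and only one field is affected, ets:update_element/3 can replace that field without rebuilding the whole record. The sketch below assumes a set table keyed on the name field; the module name planemo_update and its functions are hypothetical.

```erlang
-module(planemo_update).
-export([set_gravity/3, demo/0]).

-record(planemo, {name, gravity, diameter, distance_from_sun}).

%% Returns true if an entry with that key was updated, false if none exists.
set_gravity(Table, Name, Gravity) ->
    ets:update_element(Table, Name, {#planemo.gravity, Gravity}).

demo() ->
    Table = ets:new(scratch, [{keypos, #planemo.name}]),
    ets:insert(Table, #planemo{name=mercury, gravity=3.7,
                               diameter=4878, distance_from_sun=57.9}),
    Changed = set_gravity(Table, mercury, 3.9),
    Missing = set_gravity(Table, vulcan, 1.0),
    [#planemo{gravity=G, diameter=D}] = ets:lookup(Table, mercury),
    ets:delete(Table),
    {Changed, Missing, G, D}.
```

Calling planemo_update:demo() should return {true,false,3.9,4878}: the gravity changed, the other fields were left alone, and the update for a missing key reported false rather than creating an entry.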

ETS Tables and Processes

Now that you can extract gravitational constants for planemos, you can expand the drop module to calculate drops in many more locations. Example 10-7 combines the drop module from Example 8-6 with the ETS table built in Example 10-6 to create a more powerful drop calculator. (You can find this in ch10/ex6-ets-calculator.)

Example 10-7. Calculating drop velocities using an ETS table of planemo properties
-module(drop).
-export([drop/0]).
-include("records.hrl").

drop() ->
 setup(),
 handle_drops().

handle_drops() ->
 receive
  {From, Planemo, Distance} ->
  From ! {Planemo, Distance, fall_velocity(Planemo, Distance)},
  handle_drops()
 end.

fall_velocity(Planemo, Distance) when Distance >= 0 ->
  P=hd(ets:lookup(planemos,Planemo)),
  math:sqrt(2 * P#planemo.gravity * Distance).

setup() ->
 ets:new(planemos, [named_table, {keypos, #planemo.name}]),

 ets:insert(planemos,
  #planemo{ name=mercury, gravity=3.7, diameter=4878, distance_from_sun=57.9 }),
 ets:insert(planemos,
  #planemo{ name=venus, gravity=8.9, diameter=12104, distance_from_sun=108.2 }),
 ets:insert(planemos,
  #planemo{ name=earth, gravity=9.8, diameter=12756, distance_from_sun=149.6 }),
 ets:insert(planemos,
  #planemo{ name=moon, gravity=1.6, diameter=3475, distance_from_sun=149.6 }),
 ets:insert(planemos,
  #planemo{ name=mars, gravity=3.7, diameter=6787, distance_from_sun=227.9 }),
 ets:insert(planemos,
  #planemo{ name=ceres, gravity=0.27, diameter=950, distance_from_sun=413.7 }),
 ets:insert(planemos,
  #planemo{ name=jupiter, gravity=23.1, diameter=142796, distance_from_sun=778.3 }),
 ets:insert(planemos,
  #planemo{ name=saturn, gravity=9.0, diameter=120660, distance_from_sun=1427.0 }),
 ets:insert(planemos,
  #planemo{ name=uranus, gravity=8.7, diameter=51118, distance_from_sun=2871.0 }),
 ets:insert(planemos,
  #planemo{ name=neptune, gravity=11.0, diameter=30200, distance_from_sun=4497.1 }),
 ets:insert(planemos,
  #planemo{ name=pluto, gravity=0.6, diameter=2300, distance_from_sun=5913.0 }),
 ets:insert(planemos,
  #planemo{ name=haumea, gravity=0.44, diameter=1150, distance_from_sun=6484.0 }),
 ets:insert(planemos,
  #planemo{ name=makemake, gravity=0.5, diameter=1500, distance_from_sun=6850.0 }),
 ets:insert(planemos,
  #planemo{ name=eris, gravity=0.8, diameter=2400, distance_from_sun=10210.0 }).

The drop/0 function changes a little to call the initialization separately and avoid setting up the table on every call. This moves the message handling to a separate function, handle_drops/0. The fall_velocity/2 function also changes, as it now looks up planemo names in the ETS table and gets their gravitational constant from that table rather than hardcoding those contents into the function. (While it would certainly be possible to pass the PlanemoTable variable from the previous example as an argument to the recursive message handler, it’s simpler to just use it as a named table.)

Note

If this process crashes and needs to be restarted, restarting it will trigger the setup/0 function, which currently doesn’t check to see whether the ETS table exists. That could cause an error, except that ETS tables vanish when the processes that created them die. ETS offers an heir option and an ets:give_away/3 function if you want to avoid that behavior, but for now it works well.
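If you would rather have the setup code check for the table itself, one option is to ask ETS whether the table exists before creating it. The following is a minimal sketch, not the book's code: the module name ets_guard and the function ensure_table/0 are hypothetical, and the planemo record is repeated inline (the book keeps it in records.hrl) so that the sketch compiles by itself.

```erlang
-module(ets_guard).
-export([ensure_table/0]).

%% Assumption: same fields as the planemo record in records.hrl.
-record(planemo, {name, gravity, diameter, distance_from_sun}).

%% Create the named table only when it is missing.
%% ets:info/1 returns the atom undefined for a table that does not exist.
ensure_table() ->
    case ets:info(planemos) of
        undefined ->
            ets:new(planemos, [named_table, {keypos, #planemo.name}]),
            ets:insert(planemos,
                       #planemo{name=earth, gravity=9.8, diameter=12756,
                                distance_from_sun=149.6}),
            created;
        _Info ->
            already_exists
    end.
```

Calling ensure_table/0 twice from the same process returns created the first time and already_exists the second. The check and the creation are two separate steps, so this is only suitable when a single process is responsible for the table.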

If you combine this module with the mph_drop module from Example 8-7, you’ll be able to calculate drop velocities on all of these planemos:

1> c(drop).
{ok,drop}
2> c(mph_drop).
{ok,mph_drop}
3> Pid1=spawn(mph_drop,mph_drop,[]).
<0.33.0>
4> Pid1 ! {earth,20}.
On earth, a fall of 20 meters yields a velocity of 44.289078952755766 mph.
{earth,20}
5> Pid1 ! {eris,20}.
On eris, a fall of 20 meters yields a velocity of 12.65402255793022 mph.
{eris,20}
6> Pid1 ! {makemake,20}.
On makemake, a fall of 20 meters yields a velocity of 10.003883211552367 mph.
{makemake,20}

That’s a lot more variety than its earth, moon, and mars predecessors!

Next Steps

While many applications just need a fast key/value store, ETS tables are far more flexible than the examples so far demonstrate. You can use Erlang’s match specifications and ets:fun2ms to create more complex queries with ets:match and ets:select. You can delete rows (and tables) with ets:delete. The ets:first, ets:next, and ets:last functions let you traverse tables recursively.
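As a rough illustration of those functions, the sketch below builds a small table, selects the lighter planemos with a match specification generated by ets:fun2ms, walks the keys with ets:first and ets:next, and deletes a row and then the table. The module name ets_queries, the table name, and the helper walk/3 are hypothetical. Modules that call ets:fun2ms need to include ms_transform.hrl, as shown.

```erlang
-module(ets_queries).
-export([demo/0]).
-include_lib("stdlib/include/ms_transform.hrl").

%% Assumption: same fields as the planemo record in records.hrl.
-record(planemo, {name, gravity, diameter, distance_from_sun}).

demo() ->
    %% ordered_set keeps the keys sorted, which makes traversal predictable.
    T = ets:new(planemos_demo, [ordered_set, {keypos, #planemo.name}]),
    ets:insert(T, [
        #planemo{name=earth, gravity=9.8, diameter=12756, distance_from_sun=149.6},
        #planemo{name=moon, gravity=1.6, diameter=3475, distance_from_sun=149.6},
        #planemo{name=mars, gravity=3.7, diameter=6787, distance_from_sun=227.9}]),
    %% fun2ms translates the fun into a match specification at compile time.
    MS = ets:fun2ms(fun(#planemo{name=N, gravity=G}) when G < 9.8 -> {N, G} end),
    Light = ets:select(T, MS),
    %% Traverse every key, starting from the first one.
    Names = walk(T, ets:first(T), []),
    %% Delete one row by key, then the whole table.
    ets:delete(T, moon),
    Remaining = ets:info(T, size),
    ets:delete(T),
    {Light, Names, Remaining}.

%% ets:next/2 returns '$end_of_table' when there are no more keys.
walk(_T, '$end_of_table', Acc) -> lists:reverse(Acc);
walk(T, Key, Acc) -> walk(T, ets:next(T, Key), [Key | Acc]).
```

Running ets_queries:demo() returns a tuple holding the selected name and gravity pairs, the list of keys in order, and the row count after the deletion.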

Perhaps most important, you can also explore DETS, the disk-based term storage, which offers similar features but with tables stored on disk. It’s slower, with a 2GB limit, but the data doesn’t vanish when the controlling process stops.
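The DETS calls look much like their ETS counterparts, except that you open and close a file. Here is a minimal sketch, assuming a hypothetical module name dets_sketch, table name planemos_disk, and file name planemos_sketch.dets in the current directory:

```erlang
-module(dets_sketch).
-export([demo/0]).

%% Assumption: same fields as the planemo record in records.hrl.
-record(planemo, {name, gravity, diameter, distance_from_sun}).

demo() ->
    File = "planemos_sketch.dets",            % hypothetical file name
    Opts = [{file, File}, {keypos, #planemo.name}],
    {ok, Tab} = dets:open_file(planemos_disk, Opts),
    ok = dets:insert(Tab, #planemo{name=moon, gravity=1.6, diameter=3475,
                                   distance_from_sun=149.6}),
    ok = dets:close(Tab),
    %% Reopen the file: the record written earlier is still there.
    {ok, Tab2} = dets:open_file(planemos_disk, Opts),
    [P] = dets:lookup(Tab2, moon),
    ok = dets:close(Tab2),
    %% Remove the file so the sketch leaves nothing behind.
    file:delete(File),
    P#planemo.gravity.
```

In real use you would leave the file in place, since keeping the data around is the point of DETS.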

You can dig deeper into ETS and DETS, but if your needs are more complex, and especially if you need to split data across multiple nodes, you should probably explore the Mnesia database.

Note

ETS and DETS are discussed in Chapter 10 of Erlang Programming; Chapter 19 of Programming Erlang, 2nd Edition; Section 2.14 and Chapter 6 of Erlang and OTP in Action; and Chapter 25 of Learn You Some Erlang For Great Good!.

Storing Records in Mnesia

Mnesia is a database management system (DBMS) that comes with Erlang. It uses ETS and DETS underneath, but provides many more features than those components.

You should consider shifting from ETS (and DETS) tables to the Mnesia database if:

  • You need to store and access data across a set of nodes, not just a single node.

  • You don’t want to have to think about whether you’re going to store data in memory or on a disk (or both).

  • You need to be able to roll back transactions if something goes wrong.

  • You’d like a more approachable syntax for finding and joining data.

  • Management prefers the sound of “database” to the sound of “tables.”

You may even find yourself using ETS for some aspects of a project and Mnesia for others.

Note

That isn’t “amnesia,” the forgetting, but “mnesia,” the Greek word for memory.

Starting up Mnesia

If you want to store data on disk, you need to give Mnesia some information. Before you turn Mnesia on, you need to create a database, using the mnesia:create_schema/1 function. For now, because you’ll be getting started using only the local node, that will look like the following:

1> mnesia:create_schema([node()]).
ok

By default, when you call mnesia:create_schema/1, Mnesia will store schema data in the directory you’re in when you start it. If you look in the directory where you started Erlang, you’ll see a new directory with a name like Mnesia.nonode@nohost. Initially, it holds a LATEST.LOG file and a schema.DAT file. The node() function just returns the identifier of the node you’re on, which is fine when you’re getting started. (If you want to change where Mnesia stores data, you can start Erlang with some extra options: erl -mnesia dir '"path"'. The single quotes keep the operating system shell from stripping the double quotes, so that Erlang receives the path as a string. The path will be the location where Mnesia keeps any disk-based storage.)

Note

If you start Mnesia without calling mnesia:create_schema/1, it will keep its schema in memory, and that schema will vanish if and when Mnesia stops.

Unlike ETS and DETS, which are always available, you need to turn Mnesia on:

2> mnesia:start().
ok

There’s also an mnesia:stop/0 function if you want to stop it.

Note

If you run Mnesia on a computer that goes to sleep, you may, when it wakes up, get odd messages like Mnesia(nonode@nohost): ** WARNING ** Mnesia is overloaded: {dump_log, time_threshold}. Don’t worry, it’s a side effect of waking up, and your data should still be safe. You probably shouldn’t run production systems on devices that go to sleep, of course.

Creating Tables

Like ETS, Mnesia’s basic concept of a table is a collection of records. It also offers set, ordered_set, and bag options, just like those in ETS, but does not offer duplicate_bag.

Mnesia also wants to know more about your data than ETS. ETS pretty much takes data in tuples of any shape, counting only on there being a key it can use. The rest is up to you to interpret. Mnesia wants to know more about what you store, and takes a list of field names. The easy way to handle this is to define records and consistently use the field names from the records as Mnesia field names. There’s even an easy way to pass the record names to Mnesia, using record_info/2.
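To see what record_info/2 hands over, you can wrap it in a tiny module. This is only a sketch: the module name record_fields and its two functions are hypothetical, and the record is repeated inline rather than included from records.hrl. Because record_info is resolved at compile time, it has to be called with literal atoms inside a module that knows the record.

```erlang
-module(record_fields).
-export([fields/0, record_size/0]).

%% Assumption: same fields as the planemo record in records.hrl.
-record(planemo, {name, gravity, diameter, distance_from_sun}).

%% The list of field names, in declaration order.
fields() -> record_info(fields, planemo).

%% The size of the underlying tuple: the fields plus the record name.
record_size() -> record_info(size, planemo).
```

Here fields/0 returns [name,gravity,diameter,distance_from_sun] and record_size/0 returns 5.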

The planemos table can work just as easily in Mnesia as in ETS, and some aspects of dealing with it will be easier. Example 10-8, which is in ch10/ex7-mnesia, shows how to set up the planemo table in Mnesia. The setup/0 function creates a schema, then starts Mnesia, and then creates a table based on the planemo record type. Once the table is created, it writes the values from Table 10-1 to it.

Example 10-8. Setting up an Mnesia table of planemo properties
-module(drop).
-export([setup/0]).
-include("records.hrl").

setup() ->
 mnesia:create_schema([node()]),
 mnesia:start(),
 mnesia:create_table(planemo, [{attributes, record_info(fields, planemo)}]),

 F = fun() ->
 mnesia:write(
  #planemo{ name=mercury, gravity=3.7, diameter=4878, distance_from_sun=57.9 }),
 mnesia:write(
  #planemo{ name=venus, gravity=8.9, diameter=12104, distance_from_sun=108.2 }),
 mnesia:write(
  #planemo{ name=earth, gravity=9.8, diameter=12756, distance_from_sun=149.6 }),
 mnesia:write(
  #planemo{ name=moon, gravity=1.6, diameter=3475, distance_from_sun=149.6 }),
 mnesia:write(
  #planemo{ name=mars, gravity=3.7, diameter=6787, distance_from_sun=227.9 }),
 mnesia:write(
  #planemo{ name=ceres, gravity=0.27, diameter=950, distance_from_sun=413.7 }),
 mnesia:write(
  #planemo{ name=jupiter, gravity=23.1, diameter=142796, distance_from_sun=778.3 }),
 mnesia:write(
  #planemo{ name=saturn, gravity=9.0, diameter=120660, distance_from_sun=1427.0 }),
 mnesia:write(
  #planemo{ name=uranus, gravity=8.7, diameter=51118, distance_from_sun=2871.0 }),
 mnesia:write(
  #planemo{ name=neptune, gravity=11.0, diameter=30200, distance_from_sun=4497.1 }),
 mnesia:write(
  #planemo{ name=pluto, gravity=0.6, diameter=2300, distance_from_sun=5913.0 }),
 mnesia:write(
  #planemo{ name=haumea, gravity=0.44, diameter=1150, distance_from_sun=6484.0 }),
 mnesia:write(
  #planemo{ name=makemake, gravity=0.5, diameter=1500, distance_from_sun=6850.0 }),
 mnesia:write(
  #planemo{ name=eris, gravity=0.8, diameter=2400, distance_from_sun=10210.0 })
  end,

  mnesia:transaction(F).

Apart from the setup, the key thing to note is that all of the writes are contained in a fun that is then passed to mnesia:transaction to be executed as a transaction. Mnesia will restart the transaction if there is other activity blocking it, so the code may get executed repeatedly before the transaction happens. Because of this, do not include any calls that create side effects to the function you’ll be passing to mnesia:transaction, and don’t try to catch exceptions on Mnesia functions within a transaction. If your function calls mnesia:abort/1 (probably because some condition for executing it wasn’t met), the transaction will be rolled back, returning a tuple beginning with aborted instead of atomic.
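The difference between the two result tuples is easier to see side by side. The sketch below uses a hypothetical module name txn_abort and a made-up planemo called vulcan. It starts Mnesia without creating a schema, so everything stays in memory.

```erlang
-module(txn_abort).
-export([demo/0]).

%% Assumption: same fields as the planemo record in records.hrl.
-record(planemo, {name, gravity, diameter, distance_from_sun}).

demo() ->
    ok = mnesia:start(),
    %% The result is ignored so that the sketch also works
    %% when the table has already been created.
    mnesia:create_table(planemo, [{attributes, record_info(fields, planemo)}]),
    Moon = #planemo{name=moon, gravity=1.6, diameter=3475,
                    distance_from_sun=149.6},
    Committed = mnesia:transaction(fun() -> mnesia:write(Moon) end),
    %% vulcan is invented purely for this illustration.
    Vulcan = #planemo{name=vulcan, gravity=0.0, diameter=0,
                      distance_from_sun=0.0},
    RolledBack = mnesia:transaction(
                   fun() ->
                       mnesia:write(Vulcan),
                       %% Aborting undoes the write made just above.
                       mnesia:abort(not_a_real_planemo)
                   end),
    Lookup = mnesia:transaction(fun() -> mnesia:read(planemo, vulcan) end),
    {Committed, RolledBack, Lookup}.
```

The first transaction reports {atomic,ok}, the second reports {aborted,not_a_real_planemo}, and the final read finds no vulcan record because the write was rolled back.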

Note

You may also want to explore the more flexible mnesia:activity/2 when you need to mix more kinds of tasks in a transaction.
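As a small taste of mnesia:activity/2, the sketch below runs the same read fun first as a transaction and then as an async_dirty activity. The module name activity_sketch is hypothetical. Unlike mnesia:transaction, mnesia:activity returns the fun's own result rather than wrapping it in an atomic tuple.

```erlang
-module(activity_sketch).
-export([demo/0]).

%% Assumption: same fields as the planemo record in records.hrl.
-record(planemo, {name, gravity, diameter, distance_from_sun}).

demo() ->
    ok = mnesia:start(),
    %% Result ignored so the sketch tolerates an existing table.
    mnesia:create_table(planemo, [{attributes, record_info(fields, planemo)}]),
    Ceres = #planemo{name=ceres, gravity=0.27, diameter=950,
                     distance_from_sun=413.7},
    Write = fun() -> mnesia:write(Ceres) end,
    Read = fun() -> mnesia:read(planemo, ceres) end,
    %% The first argument names the access context for the fun.
    ok = mnesia:activity(transaction, Write),
    [InTransaction] = mnesia:activity(transaction, Read),
    [Dirty] = mnesia:activity(async_dirty, Read),
    {InTransaction#planemo.gravity, Dirty#planemo.gravity}.
```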

Your interactions with Mnesia should be contained in transactions, especially when your database is shared across multiple nodes. The main mnesia:write, mnesia:read, and mnesia:delete functions work only within transactions, period. There are dirty_ functions, but every time you use them, especially to write data to the database, you’re taking a risk.

Note

Just as in ETS, you can overwrite values by writing a new value with the same key as a previous entry.

If you want to check on how this function worked out, try the mnesia:table_info function, which can tell you more than you want to know. The following listing is abbreviated to focus on key results.

1> c(drop).
{ok,drop}
2> rr("records.hrl").
[planemo,tower]
3> drop:setup().
{atomic,ok}
4> mnesia:table_info(planemo,all).
[{access_mode,read_write},
 {active_replicas,[nonode@nohost]},
 {all_nodes,[nonode@nohost]},
 {arity,5},
 {attributes,[name,gravity,diameter,distance_from_sun]},
 ...
 {memory,541},
 {ram_copies,[nonode@nohost]},
 {record_name,planemo},
 {record_validation,{planemo,5,set}},
 {type,set},
 {size,14},
 ...]

You can see which nodes are involved in the table (nonode@nohost is the default for the current node). arity in this case is the size of the record tuple, which is the four fields plus the record name, and attributes tells you what the field names are. ram_copies plus the name of the current node tells you that this table is stored in memory locally. It is, as in the ETS example, of type set, and there are 14 records.

Note

By default, Mnesia will store your table in RAM only (ram_copies) on the current node. This is speedy, but it means the data vanishes if the node crashes. If you specify disc_copies (note the spelling), Mnesia will keep a copy of the database on disk, but still use RAM for speed. You can also specify disc_only_copies, which will be slow. Unlike ETS, the table you create will still be around if the process that created it crashes, and will likely survive even a node crash so long as it wasn’t only in RAM on a single node. By combining these options and (eventually) multiple nodes, you should be able to create fast and resilient systems.

The table is now set up, and you can start to use it. If you’re running the Table Viewer, or start it with observer:start(), you can take a look at the contents of your Mnesia tables as well as your ETS tables. In the View menu, choose Mnesia Tables. The interface is similar to that for ETS tables.

Reading Data

Just like writes, you should wrap mnesia:read calls in a fun, which you then pass to mnesia:transaction. You can do that in the shell if you want to explore:

5> mnesia:transaction(fun() -> mnesia:read(planemo,neptune) end).
{atomic,[#planemo{name = neptune,gravity = 11.0,
                  diameter = 30200,distance_from_sun = 4497.1}]}

The result arrives as a tuple, which when successful contains atomic plus a list with the data from the table. The table data is packaged as a record, and you can get to its fields easily.

You can rewrite the fall_velocity/2 function from Example 10-7 to use an Mnesia transaction instead of an ETS call. The ETS version looked like the following:

fall_velocity(Planemo, Distance) when Distance >= 0 ->
  P=hd(ets:lookup(planemos,Planemo)),
  math:sqrt(2 * P#planemo.gravity * Distance).

Line 2 of the Mnesia version is a bit different:

fall_velocity(Planemo, Distance) when Distance >= 0->
 {atomic, [P | _]}=mnesia:transaction(fun()->mnesia:read(planemo,Planemo) end),
  math:sqrt(2 * P#planemo.gravity * Distance).

Because Mnesia returns a tuple rather than a list, this code uses pattern matching to extract the first item in the list contained in the second item of the tuple (and throws away the tail of that list with _). This table is a set, so a key that exists will always produce exactly one item. (A key that isn’t in the table produces an empty list, which won’t match this pattern and will raise a badmatch error.) Then the data, contained in P, can be used for the same calculation as before.
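If you would rather report an unknown planemo than crash, you can match on the different shapes the transaction result can take. The following sketch is a variation, not the book's version: the module name safe_drop is hypothetical, and returning {ok, Velocity} or {error, Reason} instead of a bare number is a design choice made here for illustration.

```erlang
-module(safe_drop).
-export([setup/0, fall_velocity/2]).

%% Assumption: same fields as the planemo record in records.hrl.
-record(planemo, {name, gravity, diameter, distance_from_sun}).

%% A cut-down setup with a single planemo, kept in memory only.
setup() ->
    ok = mnesia:start(),
    mnesia:create_table(planemo, [{attributes, record_info(fields, planemo)}]),
    mnesia:transaction(
      fun() ->
          mnesia:write(#planemo{name=earth, gravity=9.8, diameter=12756,
                                distance_from_sun=149.6})
      end).

fall_velocity(Planemo, Distance) when Distance >= 0 ->
    case mnesia:transaction(fun() -> mnesia:read(planemo, Planemo) end) of
        {atomic, [P | _]} ->
            {ok, math:sqrt(2 * P#planemo.gravity * Distance)};
        {atomic, []} ->
            %% The key is not in the table.
            {error, unknown_planemo};
        {aborted, Reason} ->
            {error, Reason}
    end.
```

Callers such as mph_drop would then need to handle the tagged tuples, which is the price of making the failure case explicit.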

If you compile and run that code, you’ll see a familiar result:

6> c(drop).
{ok,drop}
7> drop:fall_velocity(earth,20).
19.79898987322333
8> Pid1=spawn(mph_drop,mph_drop,[]).
<0.120.0>
9> Pid1 ! {earth,20}.
{earth,20}
On earth, a fall of 20 meters yields a velocity of 44.289078952755766 mph.

For these purposes, the simple mnesia:read is enough. You can tell Mnesia to build indexes for fields other than the key, and query those with mnesia:index_read.
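Here is a minimal sketch of an index on the gravity field, using a hypothetical module name index_sketch and only three of the planemos. It adds the index with mnesia:add_table_index/2 after the table exists. Passing {index, [gravity]} among the mnesia:create_table options should have the same effect at creation time.

```erlang
-module(index_sketch).
-export([demo/0]).

%% Assumption: same fields as the planemo record in records.hrl.
-record(planemo, {name, gravity, diameter, distance_from_sun}).

demo() ->
    ok = mnesia:start(),
    %% Both results are ignored so the sketch tolerates a table
    %% or an index that already exists.
    mnesia:create_table(planemo, [{attributes, record_info(fields, planemo)}]),
    mnesia:add_table_index(planemo, gravity),
    Rows = [#planemo{name=mercury, gravity=3.7, diameter=4878,
                     distance_from_sun=57.9},
            #planemo{name=mars, gravity=3.7, diameter=6787,
                     distance_from_sun=227.9},
            #planemo{name=moon, gravity=1.6, diameter=3475,
                     distance_from_sun=149.6}],
    {atomic, ok} = mnesia:transaction(
                     fun() -> lists:foreach(fun mnesia:write/1, Rows) end),
    %% Look up records by gravity rather than by name.
    {atomic, Found} = mnesia:transaction(
                        fun() ->
                            mnesia:index_read(planemo, 3.7, #planemo.gravity)
                        end),
    lists:sort([P#planemo.name || P <- Found]).
```

With these three rows the result is [mars,mercury], the two planemos that share a gravity of 3.7.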

Note

If you want to delete records, you can run mnesia:delete/1, passing it a {Table, Key} tuple, also inside of a transaction.
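A deletion can be sketched as follows, using mnesia:delete/1 with a {Table, Key} tuple. The module name delete_sketch is hypothetical, and the function reports how many pluto records exist before and after the delete.

```erlang
-module(delete_sketch).
-export([demo/0]).

%% Assumption: same fields as the planemo record in records.hrl.
-record(planemo, {name, gravity, diameter, distance_from_sun}).

demo() ->
    ok = mnesia:start(),
    %% Result ignored so the sketch tolerates an existing table.
    mnesia:create_table(planemo, [{attributes, record_info(fields, planemo)}]),
    Pluto = #planemo{name=pluto, gravity=0.6, diameter=2300,
                     distance_from_sun=5913.0},
    Read = fun() -> mnesia:read(planemo, pluto) end,
    {atomic, ok} = mnesia:transaction(fun() -> mnesia:write(Pluto) end),
    {atomic, Before} = mnesia:transaction(Read),
    %% The record to delete is identified by table name and key.
    {atomic, ok} = mnesia:transaction(
                     fun() -> mnesia:delete({planemo, pluto}) end),
    {atomic, After} = mnesia:transaction(Read),
    {length(Before), length(After)}.
```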

Query List Comprehensions

If Mnesia is really a database, it should be able to do more than key-value querying, right? It definitely can. You can use Erlang match specifications (as you can with ETS), but query list comprehensions (QLCs) are much more readable. They look like list comprehensions, which you saw in Chapter 7, but operate on Mnesia tables rather than lists.

Suppose you want to find all the planemos with gravity less than that of Earth. You could traverse the table with the mnesia:first and mnesia:next functions, but that seems like a lot of extra work. Instead, you can use the qlc:q function to hold a list comprehension and the qlc:e (or the equivalent but longer qlc:eval) function to process it. Then you run that inside of an mnesia:transaction call.

Note

You can run query list comprehensions in the shell, but if you want to use them in modules you need to add -include_lib("stdlib/include/qlc.hrl"). to the declarations at the top of your module.

The simplest query list comprehension just returns all the values in the table. I’ve broken it out here on separate lines so that you can see how they interact:

mnesia:transaction(
  fun() ->
    qlc:e(
      qlc:q( [X || X <- mnesia:table(planemo)] )
    )
  end
)

As always, the mnesia:transaction function takes a fun as its argument. In this case, the fun contains a qlc:e function, which then contains a qlc:q function, where the real query is. It will build a list from the contents of the planemo table.

If you compact this a bit and run it in the shell, you’ll see that the resulting list—wrapped in a transaction result tuple—contains the entire table.

10> mnesia:transaction( fun() -> qlc:e(qlc:q([X || X <- mnesia:table(planemo)]))
 end).
{atomic,[#planemo{name = pluto,gravity = 0.6,
                  diameter = 2300,distance_from_sun = 5913.0},
         #planemo{name = saturn,gravity = 9.0,diameter = 120660,
                  distance_from_sun = 1427.0},
         #planemo{name = moon,gravity = 1.6,diameter = 3475,
                  distance_from_sun = 149.6},
         #planemo{name = mercury,gravity = 3.7,diameter = 4878,
                  distance_from_sun = 57.9},
         #planemo{name = earth,gravity = 9.8,diameter = 12756,
                  distance_from_sun = 149.6},
         #planemo{name = neptune,gravity = 11.0,diameter = 30200,
                  distance_from_sun = 4497.1},
         #planemo{name = makemake,gravity = 0.5,diameter = 1500,
                  distance_from_sun = 6850.0},
         #planemo{name = uranus,gravity = 8.7,diameter = 51118,
                  distance_from_sun = 2871.0},
         #planemo{name = ceres,gravity = 0.27,diameter = 950,
                  distance_from_sun = 413.7},
         #planemo{name = venus,gravity = 8.9,diameter = 12104,
                  distance_from_sun = 108.2},
         #planemo{name = mars,gravity = 3.7,diameter = 6787,
                  distance_from_sun = 227.9},
         #planemo{name = eris,gravity = 0.8,diameter = 2400,
                  distance_from_sun = 10210.0},
         #planemo{name = jupiter,gravity = 23.1,diameter = 142796,
                  distance_from_sun = 778.3},
         #planemo{name = haumea,gravity = 0.44,diameter = 1150,
                  distance_from_sun = 6484.0}]}

You can add conditions to the query list comprehension. To find all of the planemos with gravity less than that of Earth’s 9.8, you’d run:

mnesia:transaction(
  fun() ->
    qlc:e(
      qlc:q( [X || X <- mnesia:table(planemo),
                   X#planemo.gravity < 9.8] )
    )
  end
)

Compress and run that in the shell, and you’ll get a shorter list of planemos where everything feels a little lighter.

11> mnesia:transaction( fun() -> qlc:e(qlc:q( [X || X <- mnesia:table(planemo),
 X#planemo.gravity < 9.8] )) end).
{atomic,[#planemo{name = pluto,gravity = 0.6,
                  diameter = 2300,distance_from_sun = 5913.0},
         #planemo{name = saturn,gravity = 9.0,diameter = 120660,
                  distance_from_sun = 1427.0},
         #planemo{name = moon,gravity = 1.6,diameter = 3475,
                  distance_from_sun = 149.6},
         #planemo{name = mercury,gravity = 3.7,diameter = 4878,
                  distance_from_sun = 57.9},
         #planemo{name = makemake,gravity = 0.5,diameter = 1500,
                  distance_from_sun = 6850.0},
         #planemo{name = uranus,gravity = 8.7,diameter = 51118,
                  distance_from_sun = 2871.0},
         #planemo{name = ceres,gravity = 0.27,diameter = 950,
                  distance_from_sun = 413.7},
         #planemo{name = venus,gravity = 8.9,diameter = 12104,
                  distance_from_sun = 108.2},
         #planemo{name = mars,gravity = 3.7,diameter = 6787,
                  distance_from_sun = 227.9},
         #planemo{name = eris,gravity = 0.8,diameter = 2400,
                  distance_from_sun = 10210.0},
         #planemo{name = haumea,gravity = 0.44,diameter = 1150,
                  distance_from_sun = 6484.0}]}

That output still contains more information than might be necessary. You can modify the left side of the comprehension to cut things down, creating a tuple that is just the name and gravity of the planemo:

mnesia:transaction(
  fun() ->
    qlc:e(
      qlc:q( [{X#planemo.name, X#planemo.gravity} ||
               X <- mnesia:table(planemo),
               X#planemo.gravity < 9.8] )
    )
  end
)

The result is much trimmer:

12> mnesia:transaction( fun()->qlc:e(qlc:q( [ {X#planemo.name, X#planemo.gravity}
|| X<-mnesia:table(planemo), X#planemo.gravity < 9.8] )) end).
{atomic,[{pluto,0.6},
         {saturn,9.0},
         {moon,1.6},
         {mercury,3.7},
         {makemake,0.5},
         {uranus,8.7},
         {ceres,0.27},
         {venus,8.9},
         {mars,3.7},
         {eris,0.8},
         {haumea,0.44}]}

There are ways to reduce at least some of the syntax overhead here. It’s not difficult, for example, to move the mnesia:transaction, fun definition, and qlc:e call to a function that takes the qlc:q function as its argument. In Programming Erlang, Joe Armstrong does just that to create a do function. You may want to break things up differently depending on your coding style and data structures.
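A helper along those lines might look like the following sketch. The module name qlc_helper and the demo/0 function are hypothetical, and the body of do/1 is written from the description above rather than copied from any book, so treat it as one possible shape.

```erlang
-module(qlc_helper).
-export([do/1, demo/0]).
-include_lib("stdlib/include/qlc.hrl").

%% Assumption: same fields as the planemo record in records.hrl.
-record(planemo, {name, gravity, diameter, distance_from_sun}).

%% Evaluate a query handle inside a transaction and unwrap the result.
%% An aborted transaction fails the match and crashes the caller.
do(Query) ->
    {atomic, Result} = mnesia:transaction(fun() -> qlc:e(Query) end),
    Result.

demo() ->
    ok = mnesia:start(),
    %% Result ignored so the sketch tolerates an existing table.
    mnesia:create_table(planemo, [{attributes, record_info(fields, planemo)}]),
    Rows = [#planemo{name=earth, gravity=9.8, diameter=12756,
                     distance_from_sun=149.6},
            #planemo{name=moon, gravity=1.6, diameter=3475,
                     distance_from_sun=149.6}],
    {atomic, ok} = mnesia:transaction(
                     fun() -> lists:foreach(fun mnesia:write/1, Rows) end),
    %% The query itself is now the only thing the caller has to write.
    do(qlc:q([X#planemo.name || X <- mnesia:table(planemo),
                                X#planemo.gravity < 9.8])).
```

The list returned by demo/0 includes moon and leaves out earth.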

Note

You can use query list comprehensions on more than one table at a time, which is how you can create the equivalent of joins between tables, and it is also possible to use them on ETS tables.
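To give a sense of what a join looks like, the sketch below runs one query list comprehension over two ETS tables by way of ets:table/1. Everything about the second table is invented for illustration: the towers are plain {Name, Planemo, Height} tuples, and the module and table names are hypothetical.

```erlang
-module(qlc_join).
-export([demo/0]).
-include_lib("stdlib/include/qlc.hrl").

%% Assumption: same fields as the planemo record in records.hrl.
-record(planemo, {name, gravity, diameter, distance_from_sun}).

demo() ->
    Planemos = ets:new(planemos_q, [{keypos, #planemo.name}]),
    Towers = ets:new(towers_q, []),
    ets:insert(Planemos, [
        #planemo{name=earth, gravity=9.8, diameter=12756, distance_from_sun=149.6},
        #planemo{name=moon, gravity=1.6, diameter=3475, distance_from_sun=149.6}]),
    %% Hypothetical tower data: {TowerName, Planemo, HeightInMeters}.
    ets:insert(Towers, [{tower_a, earth, 20}, {tower_b, moon, 50}]),
    %% Two generators plus a filter that ties them together form the join.
    Q = qlc:q([{Tower, Height, P#planemo.gravity} ||
                  {Tower, Where, Height} <- ets:table(Towers),
                  P <- ets:table(Planemos),
                  P#planemo.name =:= Where]),
    Result = lists:sort(qlc:e(Q)),
    ets:delete(Planemos),
    ets:delete(Towers),
    Result.
```

Each result tuple pairs a tower with the gravity of the planemo it stands on. For Mnesia tables, you would use mnesia:table/1 in place of ets:table/1 and run qlc:e inside a transaction.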

This is just a brief introduction to Mnesia. It gets some coverage in all of the Erlang books, but eventually I hope it will get a book of its own, about as long as this one.

Note

Mnesia is covered in Chapter 13 of Erlang Programming (O’Reilly); Chapter 20 of Programming Erlang, 2nd Edition (Pragmatic); Section 2.7 of Erlang and OTP in Action (Manning); and Chapter 29 of Learn You Some Erlang For Great Good! (No Starch Press).
