Distributed Erlang

In Chapter 6, Distributed Elixir, we discussed distribution from a development and architectural perspective. This time, we will explore it from an operations perspective, directly tied to the Erlang runtime.

Let’s start with a quick summary.

Distributed Erlang works by establishing TCP connections between nodes. Nodes can only successfully establish connections if they share the same cookie. When distributed Erlang starts, it can automatically create a cookie, but we strongly advise teams to generate their own cookies. Once connected, nodes form a fully meshed network, where each node can communicate with all others. By default, the runtime does not encrypt the connection but can be configured to do so.

Nodes keep an open connection between them while both are up and running. Both nodes send a configurable heartbeat (also called ticktime) over the connection. If either node fails to receive a heartbeat within the configured interval, the connection is dropped and the nodes are disconnected. Since nodes send heartbeats over the same connection as data, you should refrain from sending large amounts of data between them at once, as that would delay the heartbeat messages.
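The heartbeat interval is controlled by the kernel application's net_ticktime setting, which defaults to 60 seconds. As a sketch, it can be lowered when starting Elixir (the node name here is illustrative, and all nodes in a cluster should agree on the value):

```
 $ elixir --erl "-kernel net_ticktime 30" --sname my_app
```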

Each node has a name and host address. Both :"my_app@node1" and :"[email protected]" are valid node names. The former requires host names to be properly configured in your clusters.

To aid the connection between nodes, Erlang ships with a tool called EPMD, Erlang Port Mapper Daemon.[95] Once a node goes up, it registers its name and port to the EPMD running locally. When a node wants to connect to instance :"[email protected]", it first reaches the EPMD instance running on 192.168.1.1 to fetch all available names and ports. If my_app is one of the available names, it then attempts to connect to the Erlang runtime on the registered port.

Communication with EPMD does not use the cookie for authorization and is not encrypted. This means if the port EPMD runs on is publicly available, an external entity will be able to query EPMD for the list of names and ports. Fortunately, the external entity will only be able to connect to a node if they know the cookie.
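You can see this for yourself: the epmd executable that ships with Erlang can query the local daemon for registered names. Assuming a node named my_app is running locally, the output looks roughly like this (the port number is illustrative):

```
 $ epmd -names
 epmd: up and running on port 4369 with data:
 name my_app at port 51234
```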

The use of EPMD means that distributed Erlang needs two ports for every machine. EPMD by default runs on port 4369. The other port, which is used for connecting Erlang nodes, is randomly assigned. Luckily, the range of assigned ports can be configured when starting Elixir via the --erl flag:

 $ elixir --erl "-kernel inet_dist_listen_min 9100
 > -kernel inet_dist_listen_max 9200"

If you need to use a fixed port, you can set both configurations to the same value. An advantage of using a fixed port is that you no longer need to run EPMD, as the whole goal of EPMD is mapping names to ports; this also reduces the number of required ports per machine to 1. You can combine this approach with orchestration tools to provide straightforward management of Erlang clusters.

Let’s look at how to do so. First, we’ll provide general security guidelines. Then, we’ll show a distributed Erlang example without EPMD and we’ll discuss dynamically setting up clusters.

Security Guidelines

Elixir provides the framework for building safe, secure, and reliable applications, but you’ll still need to do your part. Here are the general guidelines for running distributed Erlang:

  • Never expose the Erlang distribution ports and EPMD to the public network.

  • Never rely on automatic cookies. Generate your own and make sure it is sufficiently large. Distillery automatically takes care of this step.

  • If you are running distributed Erlang over a known port, consider disabling epmd (as you’ll see next). Given traffic to and from EPMD cannot be encrypted, disabling it may also appease operations teams that do not allow unencrypted services to run, even when the service is not publicly exposed.

  • Remember the connection between nodes is not encrypted. If someone can eavesdrop on the communication in your cluster or if encryption is required, use TLS.[96] More complete resources are also available.[97]
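The second guideline above is easy to automate. Here is one possible sketch, assuming openssl is available, that generates a long random cookie suitable for passing via the --cookie flag:

```shell
#!/bin/sh
# Generate a cookie with 48 bytes of entropy, base64-encoded,
# stripped of characters that are awkward on the command line.
COOKIE=$(openssl rand -base64 48 | tr -d '/+=\n')
echo "$COOKIE"
```

You could then start a node with `elixir --cookie "$COOKIE" ...`, or write the value to a file read at boot.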

With that out of the way, let’s configure Erlang clusters without EPMD.

Removing EPMD

If your nodes run on a known port, there is no need to run epmd alongside the Erlang VM. We have seen this come in handy in two different situations.

In the first situation, all communication had to be encrypted, no questions asked. In the second case, the operations team had to explicitly authorize each port. Sometimes those rules are in place by corporate mandate. You could try to break the rules, but new technology adopters must choose battles wisely. Other times, the rules are there because of external needs, such as in financial institutions or health organizations dealing with patient data.

There are multiple ways to ensure Erlang nodes run on a known port. One option is to choose a fixed port and apply it to all nodes. For this example, we’ll choose a slightly more complex mechanism. We’ll encode the port in the node name. For example, a node named “example” should now be named “example-9100”, where 9100 is the port it is running on. Once you’ve chosen this mechanism, you can implement a custom EPMD client module that won’t invoke EPMD at all. Instead, you can parse the port out of the node name.

Let’s start with the EPMD client module. It needs to implement a group of functions, like this:

 defmodule NameAndPort do
   # The current distribution protocol version.
   @protocol_version 5

Our new EPMD client does not have an underlying process, so we return :ignore:

   def start_link do
     :ignore
   end

Without EPMD, there is nowhere to register the name and port. We return a “creation” number between 1 and 3 as required by Erlang:

   def register_node(_name, _port, _version) do
     {:ok, :rand.uniform(3)}
   end

This implementation retrieves the port from the node name, so there is no need to contact EPMD:

   def port_please(name, _ip) do
     shortname = name |> to_string() |> String.split("@") |> hd()

     with [_prefix, port_string] <- String.split(shortname, "-"),
          {port, ""} <- Integer.parse(port_string) do
       {:port, port, @protocol_version}
     else
       _ -> :noport
     end
   end

There are also no names to fetch without EPMD:

   def names(_hostnames) do
     {:error, :no_epmd}
   end
 end

Write this code to a file and then compile it:

 $ elixirc name_and_port.ex

This command will generate a .beam file in the current directory.

With the new EPMD client in hand, we can start iex using a custom name, port, and our custom client. Note you will need at least Erlang 19.1, as the ability to configure the EPMD client was added in that version:

 $ iex --sname "example-9100" --erl "-start_epmd false
 > -epmd_module Elixir.NameAndPort -kernel inet_dist_listen_min 9100
 > -kernel inet_dist_listen_max 9100"
 Interactive Elixir (1.5.0) - press Ctrl+C to exit (type h() ENTER for help)
 iex(example-9100@macbook)1>

On another terminal, do the same, except we need to use a different name and a matching port:

 $ iex --sname "example-9101" --erl "-start_epmd false
 > -epmd_module Elixir.NameAndPort -kernel inet_dist_listen_min 9101
 > -kernel inet_dist_listen_max 9101"
 Interactive Elixir (1.5.0) - press Ctrl+C to exit (type h() ENTER for help)
 iex(example-9101@macbook)1>

In the second session, we can connect to the first one by running Node.connect(:"example-9100@macbook"), and everything should work as expected. Remember your host name won’t be precisely macbook, so change the Node.connect call accordingly.

Although in this example we have written a file and compiled it by hand, if you are using releases, Mix will take care of compiling the file for you. All you need to do is move the relevant flags given to --erl to the vm.args file.

Getting rid of EPMD is a fairly straightforward process. If you have fixed ports across all nodes, it can be even simpler. You just need to change the port_please/2 implementation to always return {:port, 9876, @protocol_version}, where 9876 should be replaced by your port of choice. In such cases, the node names are no longer relevant and you can use any name of your choice as long as each one is unique.
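As a sketch, the whole fixed-port variant of the client could look like this, where 9876 is a placeholder for your port of choice:

```
 defmodule FixedPort do
   # Hypothetical EPMD client for clusters where every node
   # listens on the same known port.
   @protocol_version 5

   def start_link, do: :ignore

   def register_node(_name, _port, _version), do: {:ok, :rand.uniform(3)}

   # No lookup needed: every node is assumed to listen on port 9876.
   def port_please(_name, _ip), do: {:port, 9876, @protocol_version}

   def names(_hostnames), do: {:error, :no_epmd}
 end
```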

Setting Up Clusters

In the previous section, we explicitly called Node.connect/1 to connect two nodes. Setting up a cluster is simply a matter of calling Node.connect/1 whenever a new node joins the cluster. In the rare case the list of nodes is static, all you need to do on boot is:

 Enum.map(list_of_known_nodes, &Node.connect/1)

In practice, new nodes may join the cluster at any moment and you need a mechanism to propagate this information throughout the cluster. If you are using a cloud platform like AWS or an orchestration tool such as Kubernetes, it is very likely they expose an API where you can retrieve the IP of all nodes. To dynamically set up a cluster, all you need is to periodically request the list of nodes from such tools, and then call Node.connect/1 whenever there is a new entry. Those tools and platforms are not required for setting up clusters, but when already in place, they play well with the Erlang runtime by removing the hurdles of cluster membership.
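As a sketch of that polling loop, the following GenServer periodically asks a placeholder fetch_nodes/0 function for the node list and connects to any node it is not yet connected to. You would implement fetch_nodes/0 against your platform’s API; the interval is illustrative:

```
 defmodule ClusterConnector do
   use GenServer

   @interval :timer.seconds(5)

   def start_link(opts \\ []) do
     GenServer.start_link(__MODULE__, opts, name: __MODULE__)
   end

   @impl true
   def init(_opts) do
     schedule()
     {:ok, %{}}
   end

   @impl true
   def handle_info(:connect, state) do
     # Connect to every node we are not already connected to.
     for node <- fetch_nodes(), node not in [Node.self() | Node.list()] do
       Node.connect(node)
     end

     schedule()
     {:noreply, state}
   end

   defp schedule, do: Process.send_after(self(), :connect, @interval)

   # Placeholder: replace with a call to your platform's API.
   defp fetch_nodes, do: []
 end
```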

While this mechanism is relatively straightforward to set up, there are existing packages in the community, such as peerage[98] and libcluster,[99] that provide integration with external services as well as their own discovery alternatives via multicast. If you’d rather roll your own, we recommend exploring the source code of those tools for guidance.

The solutions described in this and the last section are orthogonal. For instance, you may use Kubernetes, an alternative deployment system, without EPMD by relying on the fixed port technique outlined in the previous section.
