28 Days - Elixir Node Networking Basics
One of my favorite things about Elixir, some form of magic perhaps, is how simple distributed networking can be. The language makes connecting nodes and sending messages between them a pain-free exercise. Breaking down networking techniques takes more than a single post, so here I am going to look at some of the basics of networking in Elixir. Specifically, what happens when we connect nodes together?
Nodes in Elixir
In Elixir, a Node could be defined as a single running instance of the Erlang VM. There can be multiple nodes running on a single machine. Let’s use this to take a look at some basics of node communication before diving into what is going on.
# In shell 1
iex --name [email protected]
# In shell 2
iex --name [email protected]
# In shell 1
Node.self() # :"[email protected]"
Node.list() # []
Node.connect(:"[email protected]") # true
Node.list() # [:"[email protected]"]
# In shell 2
Node.list() # [:"[email protected]"]
# Leave open
From the interactive example above, we can see that two nodes are started, named [email protected] and test2. These nodes start out disconnected, but can be connected manually. Once connected, both nodes are aware of each other: Node.list() on either side returns the other node’s name.
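Node.list() only shows the current connections, but a process can also be told when they change. The sketch below assumes a fresh pair of shells and uses a module name (NodeWatcher) made up for illustration; :net_kernel.monitor_nodes/1 is the standard Erlang call that subscribes the calling process to {:nodeup, node} and {:nodedown, node} messages.

```elixir
defmodule NodeWatcher do
  # Subscribe the calling process to connection events. From now on it
  # receives {:nodeup, node} and {:nodedown, node} messages.
  def subscribe, do: :net_kernel.monitor_nodes(true)

  # Wait for the next connection event, or give up after `timeout` ms.
  def next_event(timeout \\ 5_000) do
    receive do
      {:nodeup, node} -> {:up, node}
      {:nodedown, node} -> {:down, node}
    after
      timeout -> :timeout
    end
  end
end
```

Paste the module into shell 2 and call NodeWatcher.subscribe() before running Node.connect/1 in shell 1; NodeWatcher.next_event() in shell 2 should then return {:up, node} with shell 1’s name.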
Finally, we can test out running a function on the other node from the shells above:
# In shell 1
Node.spawn(:"[email protected]", fn -> IO.inspect(Node.self()) end)
#PID<13084.113.0>
:"[email protected]"
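In the above example, a function is executed on :"[email protected]". The return value is a remote pid (its first number is not 0), and the printed output contains the remote node’s name. That output shows up in shell 1 because the spawned process inherits the caller’s group leader, so its IO is routed back to the calling node.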
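Reading the first number of a printed pid works, but Kernel.node/1 asks the pid directly which node it belongs to. A tiny sketch with a hypothetical PidInfo module:

```elixir
defmodule PidInfo do
  # Kernel.node/1 returns the node a pid lives on. For a pid returned by
  # Node.spawn/2 targeting another node, that is the remote node's name.
  def owner(pid) when is_pid(pid), do: node(pid)

  # A pid is local when it belongs to the node we are running on.
  def local?(pid) when is_pid(pid), do: node(pid) == Node.self()
end
```

In shell 1, pid = Node.spawn(other_node, fn -> :ok end) followed by PidInfo.owner(pid) should return other_node, and PidInfo.local?(pid) should return false, where other_node stands for the second node’s name. node/1 keeps working after the spawned process has exited.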
The Node module contains lots of interesting tidbits, many of them implemented as thin wrappers on top of Erlang modules. This post won’t go into the ins and outs of node communication, but suffice it to say that there are generally better ways to communicate than spawning functions between the nodes.
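One of those better ways is to run the same code on every node and address a registered process as {name, node}. A minimal sketch, assuming a hypothetical Greeter module that is compiled into the application on both nodes (a module defined only inside one iex session is not loaded on the other node):

```elixir
defmodule Greeter do
  use GenServer

  # Registered under its module name, so any connected node can
  # address it as {Greeter, node_name} without knowing its pid.
  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  # Calls the Greeter registered on `target` (the local node by default)
  # and returns its reply to the caller.
  def hello(target \\ Node.self()) do
    GenServer.call({__MODULE__, target}, :hello)
  end

  @impl true
  def init(:ok), do: {:ok, nil}

  # Replies with the name of the node this server is running on.
  @impl true
  def handle_call(:hello, _from, state) do
    {:reply, {:hello_from, Node.self()}, state}
  end
end
```

With Greeter started on both nodes, Greeter.hello(other_node) from shell 1 returns {:hello_from, other_node}, and the result comes back to the caller instead of only being printed.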
Diving into a connection
Node.connect provides a really simple one-liner over :net_kernel.connect_node/1. Let’s dig into this function and some of the ramifications of using it.
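The same thinness holds across much of the Node module. A quick sketch pairing a few Node functions with the Erlang calls they delegate to; NodeWrappers is a made-up name, and nobody@127.0.0.1 is a hypothetical node assumed not to exist, so the connect attempts fail:

```elixir
defmodule NodeWrappers do
  # Hypothetical node name, assumed not to exist anywhere.
  @missing :"nobody@127.0.0.1"

  # Each tuple holds {Node module call, Erlang call it delegates to}.
  def pairs do
    [
      {Node.self(), :erlang.node()},
      {Node.list(), :erlang.nodes()},
      {Node.alive?(), :erlang.is_alive()},
      # :ignored when this node is not distributed, false when the
      # connection attempt to the missing node fails.
      {Node.connect(@missing), :net_kernel.connect_node(@missing)}
    ]
  end
end
```

Every pair should hold equal values on any node, distributed or not.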
Deep inside the Erlang/OTP libraries, we start to see some interesting connection code. Specifically, there is code to handle automatic, “magical” connections between nodes, as opposed to more explicit connections.
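Node.ping/1 is a handy way to see the automatic path in action, since it connects without ever calling Node.connect/1. A sketch with a hypothetical AutoConnect module; dist_auto_connect is a real kernel configuration parameter, and reading it through Application.get_env/2 is just one way to check it:

```elixir
defmodule AutoConnect do
  # Node.ping/1 delegates to :net_adm.ping/1, which sends a request to the
  # remote node's net_kernel. It never calls Node.connect/1 itself; by
  # default the connection is set up automatically because the message
  # needs one. Returns :pong on success and :pang otherwise.
  def implicit(node), do: Node.ping(node)

  # The kernel parameter dist_auto_connect (:never or :once) restricts
  # automatic connections; nil means it is unset and the default applies.
  def auto_connect_setting, do: Application.get_env(:kernel, :dist_auto_connect)
end
```
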
Digging even further, we discover that net_kernel is actually a registered process on the system. On a distributed node, running Process.whereis(:net_kernel) returns the pid of this net_kernel process.
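Because net_kernel only exists once distribution is running, Process.whereis(:net_kernel) returns nil in a plain iex session started without --name or --sname. A small sketch that handles both cases (NetKernelCheck is a made-up name):

```elixir
defmodule NetKernelCheck do
  # net_kernel is registered only while distribution is running,
  # e.g. after starting with --name/--sname or calling Node.start/2.
  def status do
    case Process.whereis(:net_kernel) do
      nil -> :not_running
      pid -> {:running, pid}
    end
  end
end
```
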
The first time a node is connected to, that connection is not yet present in net_kernel’s ETS lookup table. This leads to setup being called to initialize the connection.
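The lookup-then-setup idea is a common ETS pattern. The sketch below is a hypothetical simplification for illustration only: the ConnCache module, its private table, and the setup function are made up, and this is not net_kernel’s actual code.

```elixir
defmodule ConnCache do
  # A private, unnamed ETS set standing in for a connection table.
  def new, do: :ets.new(:conn_cache_demo, [:set, :public])

  # Return the cached entry for `node`, or run `setup_fun` once on a miss
  # and store its result so later lookups find it.
  def get_or_setup(table, node, setup_fun) do
    case :ets.lookup(table, node) do
      [{^node, conn}] ->
        {:existing, conn}

      [] ->
        conn = setup_fun.(node)
        :ets.insert(table, {node, conn})
        {:new, conn}
    end
  end
end
```

The first call for a node runs the setup; every later call is a cheap table hit.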
By digging into Process.whereis(:net_kernel) |> :sys.get_state(), it’s possible to see a structure like:
{:listen, #Port<0.609>, #PID<0.49.0>,
{:net_address, {{0, 0, 0, 0}, 56479}, 'Steves-MBP', :tcp, :inet},
:inet_tcp_dist}
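To make the fields of that tuple explicit, here is a sketch that pattern-matches it. The field names (listen socket, acceptor pid, address, dist module) are my reading of the printed output, and since these are internal records, the shape may differ between OTP versions. ListenInfo is a hypothetical name.

```elixir
defmodule ListenInfo do
  # Decode a tuple shaped like the one printed above into a map.
  # {0, 0, 0, 0} means the node listens on all IPv4 interfaces; the port
  # is picked at startup; the last element is the distribution module.
  def describe(
        {:listen, _socket, acceptor,
         {:net_address, {ip, port}, host, protocol, family}, dist_module}
      ) do
    %{
      ip: ip,
      port: port,
      host: host,
      protocol: protocol,
      family: family,
      acceptor: acceptor,
      dist_module: dist_module
    }
  end
end
```
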
This state is documented and lets us trace the module that actually connects our nodes together, inet_tcp_dist. From there, we are able to track down the setup code that creates the TCP socket between the nodes.
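Part of that setup is finding out which port the other node listens on. Each node registers its name and listen port (like the 56479 in the tuple above) with epmd, the Erlang Port Mapper Daemon, and the TCP dist module asks epmd for that port before opening the socket. A sketch using :net_adm.names/0, which lists the names registered with the local epmd; EpmdLookup is a hypothetical module name:

```elixir
defmodule EpmdLookup do
  # Ask the local epmd for every registered {name, port} pair.
  # Names come back as charlists; convert them to strings.
  def registered_nodes do
    case :net_adm.names() do
      {:ok, names} ->
        {:ok, Enum.map(names, fn {name, port} -> {List.to_string(name), port} end)}

      # For example when epmd is not running.
      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

With the two shells from earlier running, EpmdLookup.registered_nodes() should list both names (the part before the @) with their ports.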
One small caveat of connecting nodes is that their “cookies” have to match up. A cookie is essentially an atom that is initialized on boot and read from ~/.erlang.cookie. Finding the check was pretty challenging in Erlang, but I traced it back to the dist_util module. Its code (even if you can’t read Erlang) implements an interesting challenge/reply protocol to ensure that the nodes are allowed to talk to each other.
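The core idea of a challenge/reply check fits in a few lines. The sketch below is modeled on the digest used in the Erlang distribution handshake (an MD5 over the cookie followed by the challenge written as decimal text), but it is a simplified illustration under that assumption, not the real handshake, which has more messages and framing. CookieDigest and its functions are hypothetical names.

```elixir
defmodule CookieDigest do
  # Prove knowledge of the cookie without sending it: hash the cookie
  # together with the peer's challenge (as decimal text).
  def digest(challenge, cookie) when is_integer(challenge) and is_atom(cookie) do
    :erlang.md5([Atom.to_string(cookie), Integer.to_string(challenge)])
  end

  # A random 32-bit challenge for the other side to answer.
  def new_challenge, do: :rand.uniform(0xFFFFFFFF)

  # The challenger recomputes the digest with its own cookie and compares.
  def verify?(challenge, reply, my_cookie), do: digest(challenge, my_cookie) == reply
end
```

Only the digest travels between nodes, never the cookie itself, which is why two nodes with different cookies fail the check. On a running node, Node.get_cookie() shows the cookie in use.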
After the connection
Of course, connecting via TCP is just the beginning. There is significantly more at play: cookies to authenticate nodes and heartbeats to ensure connectivity between nodes, for starters. The distributed Erlang guide from learnyousomeerlang.com goes over some of these concepts in detail. I’ll be tackling them in a future post as well.
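Those heartbeats are distribution “ticks”, and their timing is governed by the kernel’s net_ticktime setting (60 seconds by default): a connected node that stays unresponsive for roughly that long is considered down. A small sketch reading the current value with :net_kernel.get_net_ticktime/0; Heartbeat is a made-up module name:

```elixir
defmodule Heartbeat do
  # Returns the net tick time in seconds, which controls how long a
  # silent peer is tolerated before its connection is considered down.
  def ticktime do
    case :net_kernel.get_net_ticktime() do
      # The local node is not alive (distribution not started).
      :ignored -> :not_distributed
      # A change started with :net_kernel.set_net_ticktime/1 is in progress.
      {:ongoing_change_to, seconds} -> {:changing_to, seconds}
      seconds when is_integer(seconds) -> {:ok, seconds}
    end
  end
end
```
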
Addendum
Someone asked in the Elixir Slack group whether it’s possible to switch the distribution mechanism from TCP/IP to something else. It is! It takes fairly involved C code, but there is an example walking through an OS-socket-level distribution.
In addition, some other drivers are provided out of the box, such as an SSL driver for communicating over SSL.
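The carrier is selected at boot with the -proto_dist flag (for example -proto_dist inet_tls for TLS, which also needs certificate options), and the module that does the work is named after the protocol with a _dist suffix. That is why :inet_tcp_dist appears in the net_kernel state printed earlier. A sketch that reports which module the running node would use, assuming the flag is the only source of that choice; DistCarrier is a hypothetical name:

```elixir
defmodule DistCarrier do
  # Derive the distribution module from the -proto_dist boot flag.
  def dist_module do
    proto =
      case :init.get_argument(:proto_dist) do
        # Flag values come back as charlists, e.g. {:ok, [[~c"inet_tls"]]}.
        {:ok, [[value | _] | _]} -> List.to_string(value)
        # No usable -proto_dist flag: the default TCP carrier.
        _ -> "inet_tcp"
      end

    String.to_atom(proto <> "_dist")
  end
end
```

Started with iex --erl "-proto_dist inet_tls" (plus the TLS options), DistCarrier.dist_module() should return :inet_tls_dist; without the flag it returns :inet_tcp_dist.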
I don’t usually read Erlang code, so writing this post was very interesting. A lot of great things can be learned from digging into the Erlang code and module documentation rather than relying solely on the Elixir docs. For instance, the documentation for node networking brings up great points around security and TLS node communication. Dive into the docs and see where you go from there; it might be useful one day when there’s a problem that you just can’t figure out.
Thanks for reading the 5th post in my 28 days of Elixir. Follow along through February to see if I can stand subjecting myself to 28 straight days of writing. I anticipate a few more posts around networking, such as cookie gotchas, Distillery release networking, and pg2.