28 days - Phoenix websocket dive

I, and seemingly most people who use them, have a positive view of Phoenix websockets. They are simple to use, they scale well, and they are stable in production. I have some neat websocket material I want to cover, but it makes sense to start with a deep dive into how they work. In particular, I want to look at their process structure and how they scale.

I have prepared a demo repository that implements a few different websocket concepts. It is meant to reinforce this post and provide material for it. You can find the repo on GitHub.

The first section covers websocket basics and then leads into how the websocket internals work.

Websocket Tests

Before going into the code that powers the websocket, let’s look at the code that tests it. Phoenix.ChannelTest provides helpers that make testing the socket simple. Setting up a socket involves some complex plumbing, and these helpers hide most of it.

The first test I like to write is one for joining a channel. A channel is an abstraction over the websocket that carries the actual bi-directional communication. When we “communicate” with a socket, we are really communicating with a channel. It is good practice to test that the socket is authorized to join a particular channel, but this demo does not use any authorization.

One of the core behaviors of a websocket is responding to a request from the client. This is tested with assert_reply, which asserts that a particular push to the websocket received a reply. We will dig into what “replying to a push” means below.
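The push/reply idea can be made concrete without Phoenix. The sketch below is a rough analogy in plain Elixir, not Phoenix's implementation. The module PushReply, its "ping" event, and the await_reply/2 helper are all hypothetical. Each push carries a unique ref, the process answers with that same ref, and the waiting side matches on it with a timeout. That is roughly the guarantee assert_reply gives you in a channel test.

```elixir
defmodule PushReply do
  # Hypothetical echo "channel" built from a plain process (no Phoenix):
  # it answers every "ping" push, tagging the reply with the push's ref.
  def start do
    spawn(fn -> loop() end)
  end

  defp loop do
    receive do
      {:push, from, ref, "ping", payload} ->
        send(from, {:reply, ref, :ok, payload})
        loop()
    end
  end

  # Sends a push and returns the ref the caller uses to find the reply.
  def push(pid, event, payload) do
    ref = make_ref()
    send(pid, {:push, self(), ref, event, payload})
    ref
  end

  # Rough analogue of assert_reply: wait for the reply carrying this exact
  # ref, and raise if it does not arrive in time.
  def await_reply(ref, timeout \\ 100) do
    receive do
      {:reply, ^ref, status, response} -> {status, response}
    after
      timeout -> raise "no reply for #{inspect(ref)} within #{timeout}ms"
    end
  end
end

channel = PushReply.start()
ref = PushReply.push(channel, "ping", %{"hello" => "world"})
PushReply.await_reply(ref)
# => {:ok, %{"hello" => "world"}}
```

Matching on `^ref` is the key design choice. Even when several pushes are in flight, each waiter picks up only the reply meant for it.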

One test I struggled with was checking that the async timer was enqueued for 5 seconds. I started by making the test wait for 5s, which was not a good strategy. I ended up keeping the timer reference in a state variable and reading it from the test. Reaching into the internals of a process from a test might not be great, but the alternative is a very slow test suite. For the delayed push test, on the other hand, I was okay with actually waiting. Its delay is only 300ms, and I can easily change that value to make the test faster.
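Here is a minimal sketch of the timer-reference approach. It uses a plain GenServer rather than a real channel, and DelayedTick is a hypothetical name. The process keeps the ref returned by Process.send_after/3 in its state. The test pulls that ref out with :sys.get_state/1 and checks the remaining time with Process.read_timer/1, so nothing has to sleep.

```elixir
defmodule DelayedTick do
  # Hypothetical stand-in for a channel process (plain GenServer, no Phoenix)
  # that schedules a message 5 seconds out and keeps the timer ref in state.
  use GenServer

  @delay_ms 5_000

  def start_link, do: GenServer.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok) do
    timer_ref = Process.send_after(self(), :tick, @delay_ms)
    {:ok, %{timer_ref: timer_ref}}
  end

  @impl true
  def handle_info(:tick, state), do: {:noreply, %{state | timer_ref: nil}}
end

{:ok, pid} = DelayedTick.start_link()
# Read the ref out of the process state instead of sleeping for 5 seconds.
%{timer_ref: ref} = :sys.get_state(pid)
# Milliseconds left on the timer (false if it had already fired).
Process.read_timer(ref)
```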

You can find the implementation that satisfies these tests here.

Behind the scenes, what is a websocket?

The real purpose of this post is to explore what a websocket is. Specifically, I want to look at what happens when a connection is made and when pushes and replies are sent. Let’s start with the process structure created for a websocket. I will run :observer.start() and compare the process tree before and after a socket is created.

The rightmost process is the websocket process

Above, the pid <0.369.0> is the channel process created for the websocket. You can confirm this by clicking on that process and viewing its state, which shows the websocket's internal state.

It is possible to view the Socket struct in the state information

One interesting thing about this pid is that it appears in the state itself, as the channel_pid key of the Phoenix.Socket struct (which observer displays as Elixir.Phoenix.Socket). If seeing the process in observer is not enough, this is proof that the channel is a process.

The process state also contains a second pid, transport_pid. This is the process where the websocket protocol itself is handled.

The transport process is the parent of the channel process

Internally, a Phoenix.Channel.Server GenServer wraps the behavior of the channel you define. I initially thought that use Phoenix.Channel turned the channel module itself into a GenServer. That turns out to be false: the Phoenix.Channel.Server implementation is what sets channel_pid: self(). I won’t dig into this server implementation today.
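To see the same idea outside of Phoenix, here is a stripped-down GenServer that records its own pid and its transport's pid in its state. This mirrors how the channel_pid and transport_pid fields appear in observer. SketchChannelServer is made up for illustration and is far simpler than Phoenix.Channel.Server.

```elixir
defmodule SketchChannelServer do
  # Hypothetical, stripped-down analogue of a wrapping channel server: the
  # GenServer stores self() and the transport pid in its state.
  use GenServer

  def start_link(transport_pid), do: GenServer.start_link(__MODULE__, transport_pid)

  @impl true
  def init(transport_pid) do
    {:ok, %{channel_pid: self(), transport_pid: transport_pid, assigns: %{}}}
  end
end

# Here the calling process plays the role of the transport.
{:ok, channel_pid} = SketchChannelServer.start_link(self())
# :sys.get_state/1 returns the same state observer shows for the process.
:sys.get_state(channel_pid)
```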

I also won’t dig into the cowboy code, but we can see that the actual websocket implementation happens at the cowboy level via cowboy_websocket.

Finally, the PubSub.Local0 process monitors the channel process. I will leave its implementation for another blog post, or as an exercise for you.
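As a starting point for that exercise, here is a sketch of the general monitor-and-clean-up pattern. It is a simplification built on assumptions, not Phoenix.PubSub's code. LocalSubs is hypothetical and only tracks subscribers; it does not deliver broadcasts. Process.monitor/1 makes the server receive a :DOWN message when a subscriber, such as a channel process, exits, and the server then drops that subscriber.

```elixir
defmodule LocalSubs do
  # Hypothetical, simplified subscriber registry; NOT Phoenix.PubSub's code.
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, %{})

  def subscribe(server, pid), do: GenServer.call(server, {:subscribe, pid})
  def subscribers(server), do: GenServer.call(server, :subscribers)

  @impl true
  def init(subs), do: {:ok, subs}

  @impl true
  def handle_call({:subscribe, pid}, _from, subs) do
    # Monitor the subscriber so we are told when it exits.
    ref = Process.monitor(pid)
    {:reply, :ok, Map.put(subs, ref, pid)}
  end

  def handle_call(:subscribers, _from, subs), do: {:reply, Map.values(subs), subs}

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, subs) do
    # The subscriber is gone: forget it so nothing is sent to a dead pid.
    {:noreply, Map.delete(subs, ref)}
  end
end

{:ok, subs} = LocalSubs.start_link()

channel =
  spawn(fn ->
    receive do
      :stop -> :ok
    end
  end)

:ok = LocalSubs.subscribe(subs, channel)
LocalSubs.subscribers(subs)
# => [channel]

send(channel, :stop)
# Give the :DOWN message a moment to reach the registry.
Process.sleep(50)
LocalSubs.subscribers(subs)
# => []
```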

Websocket messages

Google Chrome's developer tools let you see what a websocket is doing. Find the websocket request in the Network tab and open its “Frames” section. You may need to drag down the “select a frame” pane to see the frames themselves.

The in/out of the client websocket

When the JS client pushes a message to the backend, it sends it in the format [join_ref, message_ref, topic, event, payload]. The join_ref and message_ref are really important. The response uses the same format and echoes these refs back, which lets the client call back to the right function. Phoenix provides the reply function for replying to a socket_ref, and the demo uses it to reply to requests asynchronously. An asynchronous reply must arrive within the client's push timeout (10s by default). If it does not, a timeout event fires on the client. You can override this timeout to fit your use case.

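Both refs can be null, as seen for the tick message. Such a message is not a reply to any particular push. Instead, the client routes it to the handler registered for that event (a global handler) rather than to the callback of a specific push (a local one).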
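The sketch below models the five-element message list and the routing just described. It is plain Elixir with JSON decoding left out so it runs without dependencies. The WireMessage module, the pending map, and the callback names are hypothetical. The phx_reply event name and its status/response payload match what Phoenix sends for replies, and the topic "room:lobby" is just an example.

```elixir
defmodule WireMessage do
  # Sketch of the five-element message list: [join_ref, ref, topic, event, payload].
  # JSON encoding is skipped so the example runs without dependencies.
  def decode([join_ref, ref, topic, event, payload]) do
    %{join_ref: join_ref, ref: ref, topic: topic, event: event, payload: payload}
  end

  # Hypothetical client-side routing. `pending` maps the ref of each push still
  # waiting for an answer to a callback name.
  def dispatch(list, pending) do
    msg = decode(list)

    case msg.ref do
      # No ref: not an answer to any push, so route by topic and event.
      nil ->
        {:event, msg.topic, msg.event, msg.payload}

      # A ref: look up the push it answers.
      ref ->
        case Map.fetch(pending, ref) do
          {:ok, callback} -> {:reply, callback, msg.payload}
          :error -> {:unknown_ref, ref}
        end
    end
  end
end

pending = %{"2" => :on_ping_reply}

WireMessage.dispatch(
  ["1", "2", "room:lobby", "phx_reply", %{"status" => "ok", "response" => %{}}],
  pending
)
# => {:reply, :on_ping_reply, %{"status" => "ok", "response" => %{}}}

WireMessage.dispatch([nil, nil, "room:lobby", "tick", %{"count" => 1}], pending)
# => {:event, "room:lobby", "tick", %{"count" => 1}}
```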

What are the benefits of a channel / transport process?

The process structure of Phoenix websockets is one of the things that makes them fault tolerant. Say something goes wrong in the channel process and causes it to crash. The channel gets joined again, which calls the join function again and triggers any join callbacks on the client. The crash, however, does not cause the client to see any fault in the underlying socket connection.

If the transport process crashes, the client sees a disconnection and tries to reconnect. The channel process goes down as well, because it is tied to the transport process, and it is started again when the client reconnects and rejoins. In either scenario, the system ends up back in a stable operating state.
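Here is a rough model of that relationship using plain processes and monitors. It is a hypothetical sketch (the Isolation module is made up) and does not reflect how Phoenix actually wires its supervision tree. The channel monitors its transport and exits when the transport goes away. A channel exiting, by contrast, has no effect on the transport.

```elixir
defmodule Isolation do
  # Hypothetical model of the transport/channel relationship using plain
  # processes and monitors; it is not Phoenix's actual supervision code.

  # Starts a "channel" that watches its transport and exits when it goes away.
  def start_channel(transport) do
    spawn(fn ->
      ref = Process.monitor(transport)

      receive do
        {:DOWN, ^ref, :process, _pid, _reason} -> exit(:transport_down)
        :crash -> exit(:crash)
      end
    end)
  end

  # Monitors `pid`, runs `trigger`, and returns the exit reason of `pid`.
  def exit_reason(pid, trigger) do
    ref = Process.monitor(pid)
    trigger.()

    receive do
      {:DOWN, ^ref, :process, ^pid, reason} -> reason
    after
      500 -> :still_alive
    end
  end
end

transport = spawn(fn -> Process.sleep(:infinity) end)

# A channel crash leaves the transport running.
channel = Isolation.start_channel(transport)
Isolation.exit_reason(channel, fn -> send(channel, :crash) end)
# => :crash
Process.alive?(transport)
# => true

# A transport crash takes its channel down with it.
channel2 = Isolation.start_channel(transport)
Isolation.exit_reason(channel2, fn -> Process.exit(transport, :kill) end)
# => :transport_down
```

The sketch uses one-way monitors rather than links because monitors make the asymmetry explicit: the channel depends on the transport, but not the other way around.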

Another benefit of the channel process is that it can be controlled like any other process. It can handle messages in a handle_info callback, which the demo uses to create a tick every 5 seconds. It can also store arbitrary state (best practice is to use the assigns map), which can be used like any other GenServer state.
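Here is a minimal sketch of that tick loop as a plain GenServer. Ticker is hypothetical, and "pushing" is modeled as sending a message to a transport pid rather than calling Phoenix's push/3. The interval is a parameter, so a test can use a few milliseconds instead of 5 seconds. The count lives in an assigns map inside the state, following the best practice above.

```elixir
defmodule Ticker do
  # Hypothetical stand-in for a channel's tick loop; "pushing" is modeled
  # as a plain message to the transport pid, not Phoenix's push/3.
  use GenServer

  def start_link(transport, interval_ms \\ 5_000),
    do: GenServer.start_link(__MODULE__, {transport, interval_ms})

  @impl true
  def init({transport, interval_ms}) do
    schedule(interval_ms)
    {:ok, %{transport: transport, interval_ms: interval_ms, assigns: %{count: 0}}}
  end

  @impl true
  def handle_info(:tick, state) do
    count = state.assigns.count + 1
    send(state.transport, {:push, "tick", %{count: count}})
    # Re-arm the timer so the tick repeats.
    schedule(state.interval_ms)
    {:noreply, put_in(state.assigns.count, count)}
  end

  defp schedule(ms), do: Process.send_after(self(), :tick, ms)
end

# Use a 10ms interval so the example finishes quickly.
{:ok, _ticker} = Ticker.start_link(self(), 10)

receive do
  {:push, "tick", payload} -> payload
after
  1_000 -> :no_tick
end
# => %{count: 1}
```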

I have a few topics in mind that are more advanced than an intro to websockets post. My current ideas are how to debounce a websocket push (push at most once every X seconds) and what happens behind the scenes to set up websocket requests.

Thanks for reading the 10th post in my 28 days of Elixir. Follow along through February to see if I can stand subjecting myself to 28 straight days of writing. I am looking for new topics to write about, so please reach out if there’s anything you really want to see!
