Improve Exq Writes With Pooling
Exq is a background job processing library written in Elixir. It uses Redis, via the
Redix library, to store and then retrieve jobs. In this post, we’ll look at the performance of writing jobs into Redis
via the Exq.Enqueuer
API. You’ll see several benchmarks that utilize a single Enqueuer
, a poolboy queue, and a named
process pool.
The repo for the benchmark and sample application is at https://github.com/sb8244/exq-throughput.
The Problem
Background job processing libraries write their jobs into a persistent storage mechanism and then retrieve those jobs in the future. If you’ve used Ruby, you may be familiar with Sidekiq. The act of writing to Redis is very fast, but there can be overhead at multiple levels. If the overhead is too high, then writing jobs to Redis becomes slow and the application may become backed up. This can lead to errors or even a loss of service, if acknowledged persistence of a job is required.
Types of Overhead
The most common overhead that I’ve seen is the backup of Redis commands being executed end-to-end serially. This happens when you use a single connection to write to Redis, and can occur in any language. The issue arises because a single connection can only send one command at a time. It must then wait for the response before another command can occur. Redis is single-threaded, so it may not be obvious why this is an issue. The problem is that the network overhead is done serially in this type of system—each write has to go over the network and back before the next starts.
The following diagram shows the speed of three hypothetical Redis requests:
Redis single connection versus pooled connection. Pooled connection completes 3 requests much faster.
Each connection sends a command that goes over the network to Redis, which processes the command. A response is returned and also goes over the network. In the real-world, this network latency might be 1ms or less. However, the end result is that the requests complete much faster when multiple commands can be simultaneously sent via multiple connections.
Another type of overhead is the fact that an Elixir process handles messages serially. If a job is enqueued via a single process, the same problem as a single connection emerges.
The Problem in Exq
Exq enqueues jobs through the Exq.Enqueuer
process. This is a single process that holds a single redis connection. Each enqueue
task goes through this one process, serially. If serial processes and single connections lead to less throughput, then this is
will limit the throughput of Exq enqueueing.
Let’s move into what we can do about it, and then benchmarks.
Pool Processes to Increase Throughput
The solution to the problem above is to pool processes, so that multiple Redis commands can be sent to Redis in the same moment of time. There are two main ways that I’ve done this in Elixir: poolboy and named pools.
Poolboy
Poolboy is a nifty Erlang library that can create a pool of any process you want. We could
pool Exq.Enqueuer
processes and then enqueue jobs by using the poolboy functions. Let’s see how we’d do that:
defmodule ExqThroughput.Application do
use Application
def start(_type, _args) do
children =
[
:poolboy.child_spec(:worker, poolboy_config())
]
opts = [strategy: :one_for_one, name: ExqThroughput.Supervisor]
Supervisor.start_link(children, opts)
end
def enqueuer_pool_size(), do: :erlang.system_info(:schedulers_online)
def poolboy_config() do
[
{:name, {:local, :enqueuer}},
{:worker_module, ExqThroughput.PooledEnqueuer},
{:size, enqueuer_pool_size()}
]
end
end
defmodule ExqThroughput.PooledEnqueuer do
def start_link(_) do
# Hack to make Exq happy with running
num = :rand.uniform(100_000_000) + 100
name = :"Elixir.Exq#{num}"
Exq.Enqueuer.start_link(name: name)
# We need to put the enqueuer instance into the pool
{:ok, Process.whereis(:"#{name}.Enqueuer")}
end
end
There is a bit of a hack in the PooledEnqueuer
module to make Exq happy. There may be another way to get around this, but I went
for a quick solution for the purpose of this benchmark. There is also a bit of working around the Exq process tree to get access
directly to the Enqueuer process.
We can now enqueue a job by first checking out the poolboy process:
:poolboy.transaction(:enqueuer, fn pid ->
Exq.enqueue(pid, "throughput_queue", Worker, [])
end)
Named process pooling looks a bit different than this.
Named Processes
You can start multiple processes in Elixir and give them a name like MyProcess1
, MyProcess2
, etc. When you want to send a
message to the process, you would send a message to :"Elixir.MyProcess#{:rand.uniform(2)}"
. This is named process pooling, and is
conceptually very simple—this makes it easier to setup.
We have to start the pool of processes in the application:
defmodule ExqThroughput.Application do
use Application
def start(_type, _args) do
children = named_enqueuer_pool(enqueuer_pool_size())
opts = [strategy: :one_for_one, name: ExqThroughput.Supervisor]
Supervisor.start_link(children, opts)
end
def enqueuer_pool_size(), do: :erlang.system_info(:schedulers_online)
defp named_enqueuer_pool(count) do
for i <- 1..count do
name = :"Elixir.Exq#{i}"
%{
id: name,
start: {Exq.Enqueuer, :start_link, [[name: name]]}
}
end
end
end
We can then enqueue work by directly using these processes:
def named_enqueue() do
num = :rand.uniform(ExqThroughput.Application.enqueuer_pool_size())
Exq.enqueue(:"Elixir.Exq#{num}.Enqueuer", "throughput_queue", Worker, [])
end
I love this approach due to its simplicity. Let’s see how all of the approaches stack up.
Benchmark
Benchee is used to benchmark three scenarios: single process, poolboy, named processes. Benchee is ran with various parallelism amounts to simulate how you might run Exq in production. For example, if you are enqueueing from a web tier, then your parallelism will be quite high. If you’re enqueueing from a single process, you would have no parallelism.
The redis queues are cleaned up before/after each test. The Exq work processor is not running—this test is purely around speed of enqueueing. These tests are all running locally, and Redis is not running through any type of virtualization. The performance would be significantly different depending on how redis is setup and the network speed between your application and redis.
When Benchee was run with a single runner, all of the approaches came out roughly the same. This is expected because we won’t see parallelism benefits without multiple processes trying to enqueue.
Name ips average deviation median 99th %
named enqueuer 9.05 K 110.52 μs ±42.61% 99 μs 210 μs
poolboy enqueuer 8.73 K 114.51 μs ±57.05% 102 μs 240 μs
default enqueuer 8.30 K 120.54 μs ±51.87% 110 μs 249 μs
The difference with 6 parallel testers was quite different. We can see that the pool approaches have significantly higher throughput:
total ips is these numbers * 6
Name ips average deviation median 99th %
poolboy enqueuer 4.40 K 227.14 μs ±39.15% 216 μs 417 μs
named enqueuer 3.95 K 253.41 μs ±45.96% 227 μs 605 μs
default enqueuer 1.05 K 954.02 μs ±21.91% 951 μs 1446.13 μs
Now for 12:
total ips is these numbers * 12
Name ips average deviation median 99th %
poolboy enqueuer 2.83 K 352.86 μs ±26.97% 339 μs 655 μs
named enqueuer 2.78 K 359.24 μs ±53.25% 302 μs 1004 μs
default enqueuer 0.84 K 1187.04 μs ±21.96% 1121 μs 1882.19 μs
and 24:
total ips is these numbers * 24
Name ips average deviation median 99th %
named enqueuer 1.48 K 675.58 μs ±66.26% 541.98 μs 2198.98 μs
poolboy enqueuer 1.06 K 942.92 μs ±51.20% 845.98 μs 2470.98 μs
default enqueuer 0.34 K 2900.89 μs ±19.05% 2765.98 μs 4482.25 μs
That one surprised me because the named enqueuer was significantly more performant. I tried it over 10 times and consistently got the same results. The tests were run in different order each time.
That disappeared for 48:
total ips is these numbers * 48
Name ips average deviation median 99th %
poolboy enqueuer 912.30 1.10 ms ±30.56% 1.01 ms 2.35 ms
named enqueuer 896.40 1.12 ms ±77.47% 0.86 ms 4.06 ms
default enqueuer 203.05 4.92 ms ±18.66% 4.65 ms 8.84 ms
Interpreting the Results
These results show, clearly, that pooling the Exq.Enqueuer
process significantly increases throughput. This might be
even more pronounced when Redis is accessed over the network.
Each test increased the parallelism, and the gap between pooled and single got even larger. With 48 processes enqueueing jobs, the total throughput per second is ~43,000 versus ~9,600. With 12 processes enqueueing jobs, the throughput per second is still ~33,000 versus ~10,000.
Action the Results
If you are using Exq in production, consider pooling the enqueuer processes to increase throughput capacity. You may also increase your enqueue speeds even if you’re not at capacity. You can use any pooling approach you want, they are roughly the same and have a substantial impact to throughput.
Exq already has an open issue to discuss adding some type of parallelism to the enqueuer process. Thanks to my colleague Marco for opening that issue and for letting me look at this problem with him.
The Book Plug
My book “Real-Time Phoenix: Build Highly Scalable Systems with Channels” is now in beta through The Pragmatic Bookshelf. This book explores using Phoenix Channels, GenStage, and more to build real-time applications in Elixir.