Clustering your Elixir application on AWS inside an Auto Scaling Group

To cluster your Elixir application, you need to tell each node how to find the other nodes.

If you use AWS Auto Scaling Groups, this is a little tricky: you can’t hard-code IPs, because they change whenever an instance is launched or replaced.

This post demonstrates one way to handle this. We’ll use the AWS command line tools to fetch the IPs of the other instances in the same Auto Scaling Group and write them to a .hosts.erlang file. When our application starts, we’ll call :net_adm.world(), which reads that file and tries to connect our node to the nodes running on each listed host.

We assume Distillery is used to build the release, and we use its hook system to update the .hosts.erlang file before the application starts.

General cluster setup

First, let’s set up the AWS Security Group for Erlang clustering. Open ports 4369 (for epmd, the Erlang port mapper daemon) and 9100-9155 (for the actual inter-node communication). Traffic on those ports should be allowed between instances inside the Security Group.

(Screenshot: Security Group setup for Erlang clustering)
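
If you would rather script this step than click through the console, the same two rules can be sketched with the AWS CLI. This is a minimal sketch, assuming all instances share a single Security Group. The group ID is a placeholder, and the AWS variable and the open_cluster_ports function are hypothetical names introduced here so that the commands can be previewed before they touch your account.

```shell
#!/bin/bash
# Placeholder values (assumptions): replace with your own group ID and region.
SG_ID="${SG_ID:-sg-0123456789abcdef0}"
REGION="${REGION:-us-east-1}"
# Command used to invoke the AWS CLI; set AWS="echo aws" for a dry run.
AWS="${AWS:-aws}"

# Allow TCP traffic on the epmd port and on the distribution port range,
# but only from instances that belong to the same Security Group.
open_cluster_ports() {
  local ports
  for ports in 4369 9100-9155; do
    $AWS ec2 authorize-security-group-ingress \
      --region "$REGION" \
      --group-id "$SG_ID" \
      --protocol tcp \
      --port "$ports" \
      --source-group "$SG_ID" || return 1
  done
}
```

Running AWS="echo aws" open_cluster_ports prints the two commands without executing them. Calling open_cluster_ports on its own applies the rules.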

Next, you need to create a custom vm.args template. In this, we tell Erlang to actually use the ports we just opened.

Create a file rel/templates/vm.args.eex with the following contents:

## Name of the node
-name <%= release_name %>@127.0.0.1

## Cookie for distributed erlang
-setcookie <%= release.profile.cookie %>

## Heartbeat management; auto-restarts VM if it dies or becomes unresponsive
## (Disabled by default..use with caution!)
##-heart

## Enable kernel poll and a few async threads
##+K true
##+A 5

## Increase number of concurrent ports/sockets
##-env ERL_MAX_PORTS 4096

## Tweak GC to run more often
##-env ERL_FULLSWEEP_AFTER 10

# Enable SMP automatically based on availability
-smp auto

# use ports between 9100 and 9155 to communicate (as set in aws security group)
-kernel inet_dist_listen_min 9100 inet_dist_listen_max 9155

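If you already use a custom vm.args, just add the last line.

One caveat: for nodes on different instances to connect, the host part of -name has to be an address the other nodes can reach, so the 127.0.0.1 shown above will most likely need to be replaced with the instance’s private IP. One option is to let Distillery substitute an environment variable at boot by setting REPLACE_OS_VARS=true.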

In rel/config.exs, tell Distillery to use that file:

release :my_app do
  set vm_args: "rel/templates/vm.args.eex"
end

Fetching IPs via AWS command line tools

Next, we’ll create a script that writes the private IPs of all instances in our Auto Scaling Group to the .hosts.erlang file, one quoted host per line.

Copy the following into rel/hooks/pre_start and set correct values for REGION and APP_PATH:

#!/bin/bash
REGION=us-east-1
APP_PATH=/var/www/my_app
echo "Creating .hosts.erlang"

# ID of this instance, taken from the EC2 instance metadata service
InstanceID=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/instance-id)
# Name of the Auto Scaling Group this instance belongs to
ScalingGroup=$(aws ec2 describe-tags --filters "Name=resource-id,Values=$InstanceID" "Name=key,Values=aws:autoscaling:groupName" --region "$REGION" --query 'Tags[].Value[]' --output text)
# Private IPs of all instances in the group: put each tab-separated IP on its own line, then wrap it as '<ip>'.
aws ec2 describe-instances --region "$REGION" --filters "Name=tag:aws:autoscaling:groupName,Values=$ScalingGroup" --output text --query 'Reservations[].Instances[].NetworkInterfaces[].PrivateIpAddresses[].PrivateIpAddress' | tr -s '\t' '\n' | sed -e "s/\(.*\)/'\1'./" > "$APP_PATH/.hosts.erlang"

echo ".hosts.erlang created"
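
To see what the last stage of that pipeline produces without calling AWS, the formatting step can be tried on its own. This is a small sketch: format_hosts is a hypothetical helper name, and the addresses are made up to mimic the tab-separated text output of the AWS CLI.

```shell
#!/bin/bash
# Turn whitespace-separated IP addresses on stdin into .hosts.erlang
# entries: one single-quoted Erlang atom per line, terminated by a period.
format_hosts() {
  tr -s '\t ' '\n\n' | sed -e '/^$/d' -e "s/.*/'&'./"
}

# Sample input with made-up addresses, for illustration only.
printf '10.0.1.15\t10.0.2.27\t10.0.3.8\n' | format_hosts
```

Each address ends up on its own line in the form '10.0.1.15'. (quoted, with a trailing period), which is the format :net_adm.world() expects in .hosts.erlang.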

Now configure Distillery to use that script as its pre_start hook. In rel/config.exs set:

environment :prod do
  ...
  set pre_start_hook: "rel/hooks/pre_start"
end

Connect to the cluster on application start

In your application’s start callback in my_app.ex, call :net_adm.world() to actually connect to the other nodes:

def start(_type, _args) do
  # connect to other nodes in the cluster
  # requires a .hosts.erlang file in the release root
  :net_adm.world()
  ...
end

Test it

After all those changes, you should be ready to go. To test, open a remote_console on one of your instances and run Node.list(). If it returns a non-empty list, your nodes are connected.
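
If Node.list() comes back empty, two quick checks on the instance can narrow things down. The helpers below are a rough sketch, not part of Distillery or Erlang: check_hosts_file and check_dist_ports are hypothetical names, the first is deliberately strict (it rejects blank lines), and the second assumes epmd -names prints lines of the form "name my_app at port 9100".

```shell
#!/bin/bash
# Succeeds if the given file exists, is non-empty, and every line is a
# single-quoted IPv4 address followed by a period, e.g. '10.0.1.15'.
check_hosts_file() {
  local file="$1"
  [ -s "$file" ] || return 1
  ! grep -qvE "^'[0-9]{1,3}(\.[0-9]{1,3}){3}'\.\$" "$file"
}

# Reads `epmd -names` output on stdin. Succeeds only if at least one node is
# registered and every registered node listens inside the range 9100-9155.
check_dist_ports() {
  awk '$1 == "name" && $3 == "at" && $4 == "port" {
         seen = 1
         if ($5 < 9100 || $5 > 9155) bad = 1
       }
       END { exit (seen && !bad) ? 0 : 1 }'
}
```

For example, check_hosts_file /var/www/my_app/.hosts.erlang && echo "hosts file ok" verifies the file written by the pre_start hook, and epmd -names | check_dist_ports && echo "ports ok" confirms that the node picked a port the Security Group allows. In a release, epmd may not be on the PATH; it usually sits in the bundled erts bin directory.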