Redundant SFTP Servers with AWS: Route53

I recently ran into a case on AWS where I needed an SFTP service that was accessible from the internet but could survive the loss of up to two availability zones. Sadly, Amazon only allows certain ports to be load balanced with the Elastic Load Balancers. Since we could not use any port other than 22, we were forced to look at a few alternatives, such as HAProxy and some commercial solutions, the latter being very expensive from both a licensing and a time perspective.

In order to survive two availability zone outages, any infrastructure that the SFTP process relies on also needs to stay available. Below is a list of the systems required for this whole process to run.

  • A shared authentication system
  • A shared filesystem
  • SFTP servers
  • Some way to automatically route traffic to healthy hosts

I will touch on each of these in separate blog posts, but for this one I want to discuss the overall architecture of the system.

For centralized authentication you have a few options. One would be some sort of master/slave LDAP system that spans availability zones. Another would be SQL-backed authentication (for example, ProFTPD's SQLAuthTypes) pointed at an RDS instance.
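As a rough sketch of the second option, assuming ProFTPD with mod_sql and mod_sql_mysql against a MySQL-backed RDS instance (the endpoint, database, table, and credentials below are all placeholders):

```
# /etc/proftpd/proftpd.conf -- SQL authentication against RDS (sketch)
SQLBackend      mysql
SQLEngine       on
SQLAuthTypes    Crypt
# database@endpoint:port, then the DB user and password
SQLConnectInfo  proftpd@mydb.example.us-east-1.rds.amazonaws.com:3306 proftpd_user secret
# table, then the columns for: username, password, uid, gid, home, shell
SQLUserInfo     users userid passwd uid gid homedir shell
```

Because every SFTP server reads the same RDS-backed user table, a login works identically no matter which server the client lands on.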

One important thing is to ensure that your SFTP servers share a common filesystem. This can be accomplished in a myriad of ways, but if you want to go the cheaper route then I recommend GlusterFS. I won't dive into the setup of this too much, as that could be a whole article in itself. With AWS, you would want three GlusterFS servers, one in each availability zone. You would configure a replica volume (data is replicated across all three GlusterFS servers), and this volume would be mounted on each SFTP server. In the event of a GlusterFS server failure, the client would immediately start reading and writing from one of the other Gluster servers.
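As a sketch of what that looks like (hostnames and brick paths are placeholders), the volume setup is roughly:

```shell
# On gfs-1: join the other two Gluster servers, then create a
# three-way replica volume with one brick per availability zone
gluster peer probe gfs-2.example.com
gluster peer probe gfs-3.example.com
gluster volume create sftpdata replica 3 \
    gfs-1.example.com:/export/brick \
    gfs-2.example.com:/export/brick \
    gfs-3.example.com:/export/brick
gluster volume start sftpdata

# On each SFTP server: mount with the native FUSE client, which
# learns about all three bricks and fails over automatically
mount -t glusterfs gfs-1.example.com:/sftpdata /srv/sftp
```

The server named in the mount command is only used to fetch the volume layout; after that the client talks to all three bricks directly, which is what makes the failover transparent.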

Another important thing to remember is that you want failover to be transparent to the end user. Sync the SSH host keys in /etc/ssh across the servers. This way, if a user connects via SFTP and gets routed to a different SFTP server, they won't get host-key warnings from their client.
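A minimal way to do this (assuming root SSH access between the servers; the hostnames are placeholders) is to push one server's keys to the others and restart sshd:

```shell
# Copy sftp-1's host keys to the other two servers so all three
# present the same identity, then restart sshd to pick them up
for host in sftp-2.example.com sftp-3.example.com; do
    rsync -av /etc/ssh/ssh_host_* "root@${host}:/etc/ssh/"
    ssh "root@${host}" 'service sshd restart'
done
```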

What ties all of this together is Route 53. Amazon recently introduced the ability to create health checks within the DNS service that they offer customers. The documentation for configuring health checks is certainly worth reading over.

First, health checks must target externally reachable addresses, so let's assume you have three servers with Elastic IPs.

Hostname            Public IP
sftp-1.example.com  57.108.34.22
sftp-2.example.com  57.118.11.90
sftp-3.example.com  57.107.39.93

We first want to configure three TCP health checks, one for each of the above IPs, checking port 22.

[Screenshot: Route 53 TCP health checks for the three Elastic IPs]
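The same health checks can also be created from the AWS CLI (a sketch; the caller reference is just any unique string you choose):

```shell
# One TCP health check per Elastic IP, probing port 22 every 30s;
# three consecutive failures mark the endpoint unhealthy
for ip in 57.108.34.22 57.118.11.90 57.107.39.93; do
    aws route53 create-health-check \
        --caller-reference "sftp-${ip}-$(date +%s)" \
        --health-check-config "IPAddress=${ip},Port=22,Type=TCP,RequestInterval=30,FailureThreshold=3"
done
```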

We then want to add weighted record sets for sftp.example.com inside our hosted zone, one per server, each associated with its health check. Make sure the TTL is set to either 30 or 60 seconds; in the event an instance goes down, you want that record yanked out of rotation as quickly as possible. Depending on how you want to configure things (active/active or active/passive), you may want to adjust the Weight value. If each record has the same weight, Route 53 picks among the healthy records at random, spreading traffic evenly. Note that weights are proportional shares, not priorities: weighting sftp-1 at 10 and sftp-2/3 at 20 means sftp-1 answers roughly one fifth of queries, not that it is held in reserve. For a true active/passive setup, give the passive records a Weight of 0; Route 53 only returns a zero-weighted record when no healthy record with a non-zero weight remains.

[Screenshot: adding weighted record sets for sftp.example.com]
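For reference, one of these record sets expressed as a Route 53 change batch looks roughly like this (a sketch; the health check ID is a placeholder, and you would submit one such change per server with its own SetIdentifier, Weight, IP, and health check):

```json
{
  "Changes": [{
    "Action": "CREATE",
    "ResourceRecordSet": {
      "Name": "sftp.example.com.",
      "Type": "A",
      "SetIdentifier": "sftp-1",
      "Weight": 10,
      "TTL": 30,
      "HealthCheckId": "<health-check-id-for-sftp-1>",
      "ResourceRecords": [{ "Value": "57.108.34.22" }]
    }
  }]
}
```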

As you can see, the final configuration shows three records that will respond for sftp.example.com.

[Screenshot: final record sets for sftp.example.com]

Now that we have the above configured, if you run dig you will see that the TTL value is low, and if you stop the SFTP service on a server, that server's record will no longer be returned.
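You can watch this from any machine with dig (the output will vary; the number after the hostname in the answer section is the remaining TTL):

```shell
# Query repeatedly: only healthy records appear, with a TTL of 30s or less
dig +noall +answer sftp.example.com A

# Stop sshd on sftp-1, wait for its health check to fail,
# and 57.108.34.22 should drop out of the responses
```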

In my next post in this series, I will discuss GlusterFS.

Elastic Load Balancing to VPC instances on a private subnet

Generally, when creating an Elastic Load Balancer inside a VPC, you are load balancing to a set of servers in a public subnet (one that routes through the IGW device). Each of these instances has its own Elastic IP, and while Elastic IPs are free while in use, you may want to move your web servers into a private subnet to avoid placing a public IP on them.

I have had discussions with Amazon support and generally gotten mixed responses; the documentation is not exactly clear on the required configuration. First, the important thing to understand is that when you create a load balancer, you are essentially creating hidden instances inside the selected subnets, each with two IPs (a public and a private one). If we want our public load balancer to serve traffic from a set of servers on a private subnet, then we need to create public subnets for the hidden Elastic Load Balancer instances to live in.

Let’s say we have three webservers. Each is in a different availability zone, sitting in a private subnet that routes traffic through a NAT device. These instances do NOT have public IPs.

Hostname           IP Address
www-1.example.com  10.200.21.21
www-2.example.com  10.200.22.21
www-3.example.com  10.200.23.21

We need to create three public subnets, one per availability zone, with the following address spacing:

10.200.11.0/24
10.200.12.0/24
10.200.13.0/24
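A sketch of creating these with the AWS CLI (the VPC ID and availability zone names are placeholders):

```shell
# One public subnet per AZ for the ELB's hidden instances to live in
aws ec2 create-subnet --vpc-id vpc-xxxxxxxx \
    --cidr-block 10.200.11.0/24 --availability-zone us-east-1a
aws ec2 create-subnet --vpc-id vpc-xxxxxxxx \
    --cidr-block 10.200.12.0/24 --availability-zone us-east-1b
aws ec2 create-subnet --vpc-id vpc-xxxxxxxx \
    --cidr-block 10.200.13.0/24 --availability-zone us-east-1c
```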

The subnets we just created are associated with our public routing table, which routes traffic through the Internet Gateway device. We can now create an Elastic Load Balancer using the documentation link above. When you get to the page for selecting subnets, make sure you select the subnets we just created, then select the instances you wish to load balance traffic to. Make sure you configure your health checks correctly, and assuming your security groups allow traffic from the Elastic Load Balancer's subnets or security group to the webservers, your instances will soon report healthy and you will be able to access your content using the new load balancer URL (assuming your configuration permits this).