Crouching Supervisor, Hidden File Descriptor Setting

Here’s an interesting and extremely infuriating problem our team faced last month. We were launching replacement haproxy instances that load balance across the nodes in our RabbitMQ cluster. We’ve done this many times before, and we set all the usual user limits under limits.d to ensure the haproxy process is allocated enough file descriptors. While creating this new role we also decided to have supervisor supervise the haproxy process, because an older release had been observed not to restart automatically after a crash (which is itself a rarity).

Everything looked solid and we began throwing some traffic at the new balancer. Eventually we discovered something had gone horribly wrong! Tons of connection refused errors began showing up, and the behavior was exactly what one would expect if file descriptors weren’t being allocated correctly. Sure enough, a quick look at /proc/<pid>/limits revealed that the maximum number of open file descriptors was set to the very low value of 1024. We directed traffic back to the old balancer and began the investigation. How could this be? All of the settings were correct, so why was the limit being set to 1024?
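For anyone who wants to script that check instead of eyeballing it, here is a minimal sketch in Python. It assumes a Linux /proc filesystem and the usual layout of the limits file (a "Max open files" row followed by soft and hard values); the function names are my own and not part of any tool.

```python
def _to_int(value):
    # The kernel prints "unlimited" when no limit applies.
    return None if value == "unlimited" else int(value)


def max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of a limits file."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return _to_int(soft), _to_int(hard)
    raise ValueError("no 'Max open files' row found")


def read_max_open_files(pid):
    """Read /proc/<pid>/limits for a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        return max_open_files(handle.read())
```

Calling read_max_open_files with the haproxy pid returns the soft and hard limits as a pair of integers, which is handy for a monitoring check or a deploy-time sanity test.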

Supervisor was the one new variable in the mix, so I decided to begin perusing the supervisor documentation and scanning for the number 1024 to see what might be tied to it. Sure enough, I came across the minfds setting. Let’s take a look at what the supervisor documentation has to say about this setting.

The minimum number of file descriptors that must be available before supervisord will start successfully. A call to setrlimit will be made to attempt to raise the soft and hard limits of the supervisord process to satisfy minfds.

The hard limit may only be raised if supervisord is run as root. supervisord uses file descriptors liberally, and will enter a failure mode when one cannot be obtained from the OS, so it’s useful to be able to specify a minimum value to ensure it doesn’t run out of them during execution. This option is particularly useful on Solaris, which has a low per-process fd limit by default.

Default: 1024

Well, that doesn’t make much sense… if I’m reading this correctly, it’s simply saying that the number specified is the minimum that should be available, right? The devil, as they say, is in the details. The documentation for setrlimit makes it clear that the call sets the soft and hard limits to exactly the values it is given; it does not treat them as a floor on top of some larger value configured elsewhere. In our case, max open files for supervisord ended up at whatever minfds was set to in supervisor’s configuration, and because a child process inherits its parent’s resource limits, haproxy ended up with the same number. Sure enough, as an experiment I set minfds to a higher number, and after restarting supervisor the open file limit of the haproxy process rose to match it.
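The inheritance part is easy to see for yourself. The sketch below is not supervisor’s code; it is a small stand-alone illustration using Python’s standard resource and subprocess modules. It applies a soft RLIMIT_NOFILE in a forked child just before exec and then asks the new program what limit it sees. The function name and the value 256 are my own choices for the demo.

```python
import resource
import subprocess
import sys


def child_nofile_limit(soft):
    """Start a child Python process with the given soft RLIMIT_NOFILE and
    return the (soft, hard) pair the child reports for itself."""
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    def apply_limit():
        # Runs in the forked child just before exec. setrlimit overwrites
        # the soft limit with the value passed; the hard limit is unchanged.
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))

    result = subprocess.run(
        [sys.executable, "-c",
         "import resource; print(*resource.getrlimit(resource.RLIMIT_NOFILE))"],
        preexec_fn=apply_limit,
        capture_output=True,
        text=True,
        check=True,
    )
    child_soft, child_hard = (int(x) for x in result.stdout.split())
    return child_soft, child_hard


if __name__ == "__main__":
    # 256 is an arbitrary small value that an unprivileged user can set.
    print(child_nofile_limit(256))
```

The exec’d program never calls setrlimit itself, yet it reports the limit that was in force when it was started, which is the same way haproxy picks up whatever supervisord is running with.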

In the end this pain also turned out to be unnecessary. We had used supervisor because it was “what we know well”, but the newer distribution we were releasing on already managed services via systemd, which by default was also configured to respawn on failure.

FACEPALM!

Hopefully this story will prevent a similar trail of sorrow for others who may encounter the same situation!

TL;DR: If you’re having supervisor supervise an application that is sensitive to max open file descriptors, you’ll want to ensure minfds is set high enough to match what the application needs!
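If you want to catch this before traffic does, a rough sketch of a pre-deploy check follows. It reads a supervisord.conf-style text with Python’s configparser and reports how far minfds falls short of what the application needs. The function name, the example configuration, and the value 65536 are all hypothetical; the 1024 fallback is the default quoted from the supervisor documentation above.

```python
import configparser


def minfds_shortfall(conf_text, required_fds):
    """Return how many descriptors short minfds is (0 if it is high enough)."""
    parser = configparser.ConfigParser(
        interpolation=None,             # leave any %(...)s expressions untouched
        inline_comment_prefixes=(";",)  # supervisor configs use ';' for comments
    )
    parser.read_string(conf_text)
    # Fall back to the documented default when minfds is not set explicitly.
    minfds = parser.getint("supervisord", "minfds", fallback=1024)
    return max(0, required_fds - minfds)


# Hypothetical configuration fragment; size minfds to the supervised program.
EXAMPLE_CONF = """
[supervisord]
minfds = 65536 ; hypothetical value
"""

if __name__ == "__main__":
    print(minfds_shortfall(EXAMPLE_CONF, 65536))
```

A non-zero result means the supervised process would start with fewer descriptors than it needs, so the deploy can fail early instead of refusing connections later.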