A while back I set up an exim server on EC2 for secondary email – at the time I tested that it worked and it all looked fine.
It accepted emails for my domain then forwarded them on to the primary as expected.
The one critical thing that I didn’t test was what happened when the primary was down – it seemed like it might be difficult to test so I didn’t bother.
This proved to be a mistake.
Last weekend when the primary email server did actually go down I found out the hard way that my secondary was bouncing all emails.
Investigations determined that it was trying to deliver to itself after discovering that the primary was down.
This caused an email loop, and each message was eventually bounced back to the sender with the error “Too many received headers”.
This made no sense to me and prompted much head scratching.
After much searching online I figured out what was happening, and more importantly, how to fix it.
When exim detects that a host is down it moves on to the next MX DNS record listed (if any).
It then compares its own IP address(es) against the resolved MX record to make sure it doesn’t try to deliver to itself – this is the part that was failing!
The reason it was failing was that EC2 instances sit behind NAT: the instance itself only has a private-range IP address, which differs from the Elastic IP that I had assigned to it (and which is the address the MX record resolves to).
Exim had no way of knowing that it was trying to deliver to itself – hence its confusion.
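To make the failure concrete, here is a minimal sketch of the kind of comparison involved. It is a simplified model, not Exim's actual implementation, and the function name and all addresses are hypothetical (the IPs come from private and documentation ranges).

```python
import ipaddress


def is_local_host(mx_ip, interface_ips, extra_local_ips=()):
    """Return True if mx_ip matches one of this host's known addresses.

    Hypothetical helper: models the "is this MX me?" check as a plain
    set membership test. extra_local_ips plays the role that
    extra_local_interfaces plays in the Exim config.
    """
    known = {ipaddress.ip_address(ip) for ip in (*interface_ips, *extra_local_ips)}
    return ipaddress.ip_address(mx_ip) in known


# Assumed example values: the instance only sees its private address,
# while DNS publishes the Elastic IP for the secondary MX.
interface_ips = ["10.0.0.12", "127.0.0.1"]
secondary_mx_ip = "203.0.113.25"

# Behind NAT the public address is not on any interface, so the check misses.
without_fix = is_local_host(secondary_mx_ip, interface_ips)

# Declaring the public address as an extra local address makes it match.
with_fix = is_local_host(secondary_mx_ip, interface_ips,
                         extra_local_ips=[secondary_mx_ip])

print(without_fix, with_fix)
```

In the first case the secondary does not recognise the MX address as its own, which is the situation that produced the loop described above.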
The fix was fairly straightforward – exim provides a config setting for this situation.
extra_local_interfaces = YOUR_ELASTIC_IP_ADDRESS
Add that to your /etc/exim4/conf.d/main/000_localmacros file and restart exim (I’m assuming an Ubuntu-style setup here).
Finally, on to testing that this fix works properly. It proved to be a simple case of stopping exim on my primary for a few seconds, routing an email through the secondary (using mutt on the command line), and checking the exim logs.
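If mutt isn't to hand, a test message could also be handed straight to the secondary with a short script. This is only a sketch using Python's standard smtplib. The function names, host name and addresses are placeholders I've made up, and it assumes the secondary accepts mail for your domain on port 25 from wherever you run it.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Build a small, recognisable test email (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Secondary MX failover test"
    msg.set_content("Sent directly to the secondary while the primary is stopped.")
    return msg


def send_via_secondary(msg, secondary_host, port=25, timeout=30):
    """Hand the message to the secondary over SMTP, bypassing MX lookup."""
    with smtplib.SMTP(secondary_host, port, timeout=timeout) as smtp:
        smtp.send_message(msg)


# Example usage (placeholder names, run while the primary is stopped):
#   msg = build_test_message("me@example.org", "test@example.com")
#   send_via_secondary(msg, "mx2.example.com")
```

Connecting to the secondary by name skips the normal MX ordering, so the message lands there even if the primary happens to come back during the test.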
It now accepts emails and holds them on the queue (sudo mailq will list them) for a while, until a later retry forwards them to the primary.
This tip also applies if you’re running an Exim secondary on any sort of NAT network, not just EC2.