Running an exim secondary email server on EC2

A while back I set up an exim server on EC2 for secondary email – at the time I tested that it worked and it all looked fine.

It accepted emails for my domain then forwarded them on to the primary as expected.

The one critical thing that I didn’t test was what happened when the primary was down – it seemed like it might be difficult to test so I didn’t bother.

This proved to be a mistake.

Last weekend, when the primary email server did actually go down, I found out the hard way that my secondary was bouncing all emails.

Investigations determined that it was trying to deliver to itself after discovering that the primary was down.

This caused an email loop, which was eventually bounced back to the sender with the error “Too many received headers”.

This made no sense to me and prompted much head scratching.

After much searching online I figured out what was happening, and more importantly, how to fix it.

When exim detects that a host is down it moves on to the next MX DNS record listed (if any).

It then compares its own IP address(es) against the resolved MX record to make sure it doesn’t try to deliver to itself – this is the part that was failing!

The reason it was failing was that EC2 instances run behind NAT – they have a private-range IP address which differs from the elastic IP I had assigned (and which was effectively what the DNS record pointed at).

Exim had no way of knowing that it was trying to deliver to itself – hence its confusion.

The fix was fairly straightforward – exim provides a config setting for exactly this situation.

extra_local_interfaces = YOUR_ELASTIC_IP_ADDRESS

Add that to your /etc/exim4/conf.d/main/000_localmacros file and restart exim (I’m assuming an Ubuntu-style setup here).
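For reference, the macro file plus the reload ended up looking something like this – the IP below is a placeholder for whatever elastic IP your MX record actually resolves to, and I’m assuming the stock Ubuntu/Debian exim4 packaging:

# /etc/exim4/conf.d/main/000_localmacros
extra_local_interfaces = 203.0.113.10

# regenerate the config and restart exim
sudo update-exim4.conf
sudo /etc/init.d/exim4 restart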

Finally, on to testing that this fix works properly – it proved to be a simple case of stopping exim on my primary for a few seconds while routing an email through the secondary (using mutt on the command line) and checking the exim logs.

It now accepts emails and holds them in the queue (visible with sudo mailq) until they can be forwarded on to the primary.
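For anyone wanting to repeat the test, it boiled down to something like this (addresses are placeholders, and the log path assumes the stock Ubuntu exim4 install):

# on the primary: stop exim briefly
sudo /etc/init.d/exim4 stop

# from a third machine: send a test mail, which should fail over to the secondary MX
echo "failover test" | mutt -s "secondary MX test" me@example.com

# on the secondary: check the message is sitting in the queue and watch the logs
sudo mailq
sudo tail -f /var/log/exim4/mainlog

# start the primary again and watch the secondary deliver the queued message
sudo /etc/init.d/exim4 start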

This tip also applies if you’re running an Exim secondary on any sort of NAT network, not just EC2.

Zsh completion of arbitrary commands

I spent a good few hours over the weekend trying to figure out how to do something with zsh completion that I figured would be quite simple, but I just couldn’t find an example of it anywhere.

I wanted to do tab completion based on the output of an arbitrary command.

This was so that I could make full use of AndyA’s very useful directory shortcut pind script.

Essentially I was porting the _cdpin function from bash to zsh.


function _cdpin() {
    # the word currently being completed
    local cur=${COMP_WORDS[COMP_CWORD]}
    # candidate completions come from the pind helper script
    COMPREPLY=( $( hasle ~/.pind -cx $cur ) )
}

complete -F _cdpin cdpin

It looked like it should be simple, but after reading sections of the zsh manual, searching online, looking at the various completion scripts in my zsh installation and trawling Stack Overflow’s zsh content, I still couldn’t find a succinct example.

Eventually I found the answer in a book called From Bash to Z Shell which I was able to read on my Safari Books Online subscription.

The solution turned out to be ludicrously simple:


function _cdpin() {
    # offer the output of the pind helper script as completion candidates
    compadd $(hasle ~/.pind -cx)
}

compdef _cdpin cdpin

I could make it even simpler if I stuck it in a file called _cdpin, but I wanted to keep it in the same place as the rest of my functions.
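For the curious, the standalone-file version would be a file called _cdpin, in a directory that’s on $fpath before compinit runs, containing just:

#compdef cdpin
compadd $(hasle ~/.pind -cx)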

Now when I type cdpin and hit tab I get a list of my existing entries and I can complete from there.

It’s annoying that something so simple was so hard to find, so I’m posting it here in the hope it helps others.

And as AndyA was smart enough to share his scripts on github, I’ve forked them and will add zsh support (via a conditional), then see if I can get him to pull in my changes (I’m sure he’ll do it if I buy him a beer).

As an aside, gotta say I’m loving github – I joined a few months back but have only just started uploading some code to it.

You can see my stuff at github.com/boncey.

NVIDIA drivers in Ubuntu 8.10 (Intrepid Ibex)

I upgraded from Kubuntu 7.10 to Ubuntu 8.10 this morning.

This was via a fresh installation into a new partition.

Getting it running went fairly smoothly but trying to get the NVIDIA drivers working was a world of pain.

It seemed that no matter what I tried I always got the following error:

Failed to load the NVIDIA kernel module

Every tip and bit of advice I found online didn’t work.

It turns out that the Linux kernel headers aren’t installed by default but are required by the NVIDIA drivers (I only wasted two hours finding this out!).

It’ll never work unless you install the linux headers first.

These commands fixed it for me (exact version numbers may vary according to current Linux kernel version and your particular video card).

apt-get install linux-headers-2.6.27-7-generic

apt-get install nvidia-glx-177
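If you’re on a different kernel, a slightly more future-proof way to grab the matching headers (assuming you want them for the kernel you’re currently running) is:

apt-get install linux-headers-$(uname -r)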

Hope this helps someone.

My convoluted CD importing system

And here’s how a CD gets from HMV on to my iPod…

Hmmm, it’s complicated alright.

But it needs to be as I have several goals I am trying to meet.


Never have to rip the CD more than once.

I rip to FLAC format, which is lossless, so I can recreate the original wav file at any time (there’s a quick sketch of this below).

Be able to play my music back from a variety of sources.

I need to be able to play music on both my iPod(s) and through MPD.
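As a rough illustration of why FLAC makes the first goal workable – this isn’t my actual tooling (that’s the Java app mentioned below), and the filenames are made up – decoding back to wav or transcoding for the iPod never needs the CD again:

# recreate the original wav from the FLAC rip
flac --decode 01-track.flac -o 01-track.wav

# or transcode straight to MP3 without re-ripping
flac --decode --stdout 01-track.flac | lame -V2 - 01-track.mp3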

The ripping and encoding bit is done with a Java app I wrote that encodes in parallel (it can handle Ogg Vorbis too if I want).

It’s not as automated as I’d like: I need to run it by hand when I insert a CD and choose the matching CDDB entry if there are multiple matches.

Oh, and here’s what I bought.

My plan for spam – one month on

A while back I wrote about “my plan for spam”.

“My plan” has been running for a month now so it’s time to review it.

Before I implemented it I was getting around 600 spams a month.

After running it for a full month I’m down to around 250.

So, I’d consider that a fairly successful plan.

Of course, all the ones that were caught in my spam filter were being sent to my home account and put into a spam folder.

I said I’d monitor that and if there were no false positives I’d set it to delete them upon arrival.

There are no false positives, so I will be setting it to delete (need to test properly before I go putting delete rules into my filters).

As for getting that 250 even lower – I’m kinda stuck.

Around 80% of those 250 go straight to my gmail address, so they don’t touch my filtering system.

I don’t really use my gmail address so I could set gmail to delete anything sent to that address.

The thought of doing that scares me a bit – I’ll wait and see how annoyed I get by it all I think.

Of course, no doubt in 6 months time I’ll be back up to 600 a month again, but what can you do?

Update:

I found out that if I tell gmail to delete an email in a filter rule it puts it into the Deleted Items folder which is automatically cleaned out after being in there for 30 days.

So, with that in mind I’ve set it to delete any emails that are addressed to my gmail address.

That way I have 30 days to find a real email if I have reason to believe it was sent to my gmail address.

Since doing that I’ve received an average of just over one spam a day!

That’s going to be around 40 a month.

Eat that spammers!

Purging jsessionids

jsessionid is the parameter that a servlet engine appends to your site’s URLs when you’ve enabled sessions in your config but the user viewing the site doesn’t have cookies enabled.

It then allows a cookie-less user to use your site and maintain their session.

It seems like a good idea but it’s a bit flawed.

The author of randomCoder has summarised the flaws quite well.

Every link on your site needs manual intervention

Cookieless sessions are achieved in Java by appending a string of the format ;jsessionid=SESSION_IDENTIFIER to the end of a URL. To do this, all links emitted by your website need to be passed through HttpServletResponse.encodeURL(), either directly or through mechanisms such as the JSTL <c:url /> tag. Failure to do this for even a single link can result in your users losing their session forever.

Using URL-encoded sessions can damage your search engine placement

To prevent abuse, search engines such as Google associate web content with a single URL, and penalize sites which have identical content reachable from multiple, unique URLs. Because a URL-encoded session is unique per visit, multiple visits by the same search engine bot will return identical content with different URLs. This is not an uncommon problem; a test search for ;jsessionid in URLs returned around 79 million search results.

It’s a security risk

Because the session identifier is included in the URL, an attacker could potentially impersonate a victim by getting the victim to follow a session-encoded URL to your site. If the victim logs in, the attacker is logged in as well – exposing any personal or confidential information the victim has access to. This can be mitigated somewhat by using short timeouts on sessions, but that tends to annoy legitimate users.

There’s one other factor for me too: public users of my site don’t require cookies, so I really don’t need jsessionids at all.

Fortunately, he also presents an excellent solution to the problem.

The solution is to create a servlet filter which will intercept calls to HttpServletResponse.encodeURL() and skip the generation of session identifiers. This will require a servlet engine that implements the Servlet API version 2.3 or later (J2EE 1.3 for you enterprise folks). Let’s start with a basic servlet filter:

He then goes on to dissect the code section by section and presents a link at the end to download it all.

So I downloaded it, reviewed it, tested it and implemented it on my site.

It works a treat!

However, I still had a problem; Google and other engines still have lots of links to my site with jsessionid in the URL.

I wanted a clean way to remove those links from its index.

Obviously I can’t make Google do that directly.

But I can do it indirectly.

The trick is first to find a way to rewrite incoming URLs that contain a jsessionid to drop that part of the URL.

Then to tell the caller of the URL to not use that URL in future but to use the new one that doesn’t contain jsessionid.

Sounds complicated, but there are ways of doing both.

I achieved the first part using a thing called mod_rewrite.

This allows me to map an incoming URL to a different URL – it’s commonly used to provide clean URLs on Web sites.

For the second part there is a feature of the HTTP spec that allows me to indicate that a link has been permanently changed and that the caller should update their link to my site.

301 Moved Permanently

The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible.

So, putting these two together, I wrote the following mod rewrite rules for Apache.


RewriteRule ^/(\w+);jsessionid=\w+$ /$1 [L,R=301]
RewriteRule ^/(\w+\.go);jsessionid=\w+$ /$1 [L,R=301]

The first rule says that any URL ending in a jsessionid will be redirected to the same URL with the jsessionid stripped off.

The second does the same for URLs ending in .go – I was too lazy to work out a single pattern to handle both types of URL in one line.

And I used that all-important 301 code to persuade Google to update its index to the new link.
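A quick way to check the redirect behaves as intended (the URL here is just an example):

curl -I "http://www.example.com/somepage.go;jsessionid=0123456789ABCDEF"

# expect something like:
# HTTP/1.1 301 Moved Permanently
# Location: http://www.example.com/somepage.go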

So, from now on – my pages will no longer output jsessionids and any incoming links that include them will have them stripped out.

In other words; jsessionids purged.

My plan for spam

Every now and again I get sufficiently annoyed by spam to want to do something about it.

Today was one of those days where I had enough time to sit down and work on my plan for dealing with spam.

My email setup is a little unusual so it became a bit complicated.

All my incoming email is processed on my mail server; then one copy is forwarded to gmail, the other to my computer at home.

I then read all my email via gmail, and keep the copy at home just for backup purposes.

I’ve been doing this for a while, but found the manual task of scanning through spam in gmail to be tiresome and annoying.

What I wanted to do was cut out a lot of the spam on my server, before it was even passed on to gmail.

I figured I could script a better spam-filtering solution than gmail’s system.

My plan was to install bogofilter on the mail server and use that to filter out a lot of the spam.

There were two problems with this. One was that I needed a body of spam and ham emails to train it with, and I didn’t keep email on the server.

The other was that it would need constant training as new emails came in; this was tricky because emails were passed on to gmail and my home computer in an automated process with no scope for human intervention.

To deal with the first problem I decided to install bogofilter at home too and train it there, then upload the training database to the mail server.

For the second problem I came up with the following solution:

I would use bogofilter on the mail server; send anything flagged as non-spam on to gmail and home; and send anything classed as spam just to home.

Once it got home, it would be passed through bogofilter a second time; this instance would be set up slightly differently from the first one, classifying emails into one of three folders: ham, spam or unsure.
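I won’t bore you with the exact delivery config, but conceptually the home end does something like this procmail-style recipe (purely illustrative – bogofilter’s passthrough mode adds an X-Bogosity header saying Spam, Ham or Unsure):

# tag each incoming message with an X-Bogosity header
:0fw
| bogofilter -ep

# file spam and unsure into their own folders; ham falls through to the inbox
:0:
* ^X-Bogosity: Spam
spam/

:0:
* ^X-Bogosity: Unsure
unsure/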

I would then use mutt to periodically re-train bogofilter telling it that anything in the “unsure” folder was either ham or spam.

Finally, the newly trained database would be copied back up to the mail server each night.
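The retraining is just a couple of bogofilter flags driven from mutt, and the nightly copy is nothing cleverer than pushing bogofilter’s database file up to the server (paths and hostname here are placeholders):

# re-train from the "unsure" folder
bogofilter -s < message    # register a message as spam
bogofilter -n < message    # register a message as ham

# nightly cron job: copy the updated database to the mail server
scp ~/.bogofilter/wordlist.db mailserver:.bogofilter/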

The more astute reader may have noticed a problem with this solution.

I seem to have replaced scanning a folder full of spam on gmail with scanning a folder full of spam at home.

This is true; initially I will be dealing with the same amount of spam.

However, I have a longer term plan here.

Once I’m happy with the filtering I’m going to tweak my solution so that anything tagged as spam will be deleted outright once it hits my home PC.

This I expect will reduce the amount of spam that I see by about 90%.

I am comfortable with the fact that I will probably lose the occasional non-spam email.

I’m gonna run this system for about a month; if I get through that with zero false positives I’ll feel brave enough to set it to delete.

Reclaiming ext3 disk space

A while back I bought an external hard drive for backing up my flacs and my photos.

Recently it started to fill up.

This was, of course, a bad thing.

I looked at how much space I had left on it:


Filesystem Size Used Avail Use% Mounted on
/dev/external 276G 259G 2.5G 100% /mnt/external

So, I had 2.5GB free. But hold on, the maths doesn’t make sense.

I have a 276GB drive, with 2.5GB free, yet I’ve only used 259GB.

I’m missing 14.5GB!

I did some googling and found out the following:

The most likely reason is reserved space. When an ext2/3 filesystem is formatted, by default 5% is reserved for root. Reserved space is supposed to reduce fragmentation and allow root to log in in case the filesystem becomes 100% used. You can use tune2fs to reduce the amount of reserved space.

So, ext2/3 reserves 5% of space, which on my drive is 13.8GB – well, that’s close enough to 14.5GB, so that explains that mystery.

The next question was: can I, and should I, reduce that amount of reserved space?

More googling:

The reserved blocks are there for root’s use. The reason being that the system gets really narky if you completely run out of room on / (specifically /var or /tmp, I think). Programs won’t start, weird errors will pop up, that sort of thing. With some room reserved for root, you can at least be sure to be able to run the really important programs, like sudo and rm.

So, in short, if the drive doesn’t contain /var or /tmp, then there’s not much point in having space reserved for root.

So, some poster on some Internet forum says it’s probably OK to do away with that reserved space.

That’s usually good enough for me, but I figured this time I’ll play it safe and reduce it to 1%.

So I unmounted the drive and ran the following command: tune2fs -m 1 /dev/external
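You can see the reserved block count before and after with tune2fs’s list mode (device name as per the df output above):

sudo tune2fs -l /dev/external | grep -i 'reserved block count'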

I re-mounted and voila, 11.5GB appeared out of nowhere!


Filesystem Size Used Avail Use% Mounted on
/dev/external 276G 259G 14G 95% /mnt/external

I’ve now run this on my other non-booting partitions.

All seems fine so far.

I think I’ll leave my system partition at 5% though, just to be safe.

Sporadic service will resume

The server that hosts this site is dying.

It’s due to be replaced by a brand new server with lots of RAM – once I’ve finished building it and can get it installed.

So until then, the site may be up and down a lot.

Once it’s fixed, sporadic service will resume.

Update: A reboot seems to have fixed it, no random restarts since.

Missing desktop icons in Ubuntu/Gnome

Ever since I installed Ubuntu I’ve not been able to right-click my desktop – as in, I right-clicked and no menu appeared.

Also, I never had any icons on my desktop.

I could view my ~/Desktop folder in Nautilus and place files there, but they never actually appeared on my desktop.

Compared to all the other things I was setting up this was just a minor annoyance – but today I was able to give it my full attention and thereby allow it to become a major annoyance.

The important thing to note here is that I always keep my home directory between installs – so clearly I had some strange setting from way back hidden in a config file somewhere (and Gnome gives you a lot of config files).

Anyway, suffice to say, after much hacking and googling I found a page advising me how to fix the problem.

Following the instructions to restore the Trash Can (under System Tools – Configuration Editor – Nautilus), I disabled and re-enabled the Preferences – show_desktop option, and now I have my lost icons again.
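For reference, the same toggle can be done from the command line with gconftool-2, assuming the key lives where it did on my install:

gconftool-2 --type bool --set /apps/nautilus/preferences/show_desktop false
gconftool-2 --type bool --set /apps/nautilus/preferences/show_desktop true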

The “show_desktop” option – if only they were all that simple.