about the CD list – part three – automation and optimisation, then going live

Continued from about the CD list – part two.

You may also want to read about the CD list – part one.

How – Automating it

Shell Scripts

The next plan was to wrap it all up in a script that would call my program once for each cache file.

I used "find" as that’s what it’s there for.

That was easy to set up and worked fine, except that it was really slow.

I planned to call this script every hour, and each run took 45 seconds to a minute. Not much time, you may think, but most of that was database access, and hammering the database that much is just not good practice.

I needed to consider some optimisation techniques.

How – Optimisation

Obvious really

A quick look over the program revealed that for every invocation it was doing one database read and one database write.

The read was to see if the CD was already there; the write either inserted it or updated its timestamp.

The most obvious optimisation was to only do an update if the timestamp of the cache file was newer than the entry in the database.
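In rough Python (the language, the sqlite3 database and the cds(discid, title, stamp) schema are all stand-ins for whatever the real program used), the idea looks like this:

    import os
    import sqlite3  # standing in for the actual database

    def parse_title(path):
        # Stand-in parser: real code would understand the cache file format.
        with open(path, errors="replace") as f:
            return f.readline().strip()

    def import_cache_file(db, path):
        """Insert or refresh one CD, writing only when something changed."""
        discid = os.path.basename(path)        # assumed naming scheme
        mtime = int(os.path.getmtime(path))

        row = db.execute("SELECT stamp FROM cds WHERE discid = ?",
                         (discid,)).fetchone()
        if row is None:
            db.execute("INSERT INTO cds (discid, title, stamp) VALUES (?, ?, ?)",
                       (discid, parse_title(path), mtime))
        elif mtime > row[0]:
            # The cache file is newer than the database entry.
            db.execute("UPDATE cds SET stamp = ? WHERE discid = ?",
                       (mtime, discid))
        # Otherwise the entry is up to date and no write happens at all.

    # Usage sketch:
    # db = sqlite3.connect("cds.db")
    # import_cache_file(db, "/home/me/.cddb/a1b2c3d4")
    # db.commit()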

Some may say it was a bit of an oversight on my part to code it that way in the first place; they’re probably right. 😛

As most of the CDs being read each time would already be in the database, this cut down the number of database writes quite dramatically.

However, I was still doing one database read per cache file, and that was still not good enough.

Dramatic improvements

The most obvious thing wrong was that I was calling the program once per file, so it could only read one CD from the database at a time.

It would have been ideal to have read the entire CD table into a list then see what needed updating.

But to do this I would have to have somewhere to store that list between invocations. That was tricky.

The alternative was to have the program take care of traversing the file system itself so I would only need to call it once.

This seemed easier so I set about doing that.

A few hours of coding later, I had a recursive function that parsed the files and stored them in a List object.

A bit of tweaking here and there and I had a program that would do everything in one call.

This now involved only one database read plus one write for each new (or newer) cache file.
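Roughly like this, again as a Python sketch with the same invented schema (os.walk stands in for the hand-rolled recursive function, and parse_title is the stub from the earlier sketch):

    import os
    import sqlite3

    def load_known(db):
        """The single database read: the whole cds table as a dict."""
        return {discid: stamp
                for discid, stamp in db.execute("SELECT discid, stamp FROM cds")}

    def import_all(db, root):
        known = load_known(db)              # one read, up front
        for dirpath, _dirs, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                mtime = int(os.path.getmtime(path))
                if name not in known:
                    db.execute("INSERT INTO cds (discid, title, stamp)"
                               " VALUES (?, ?, ?)",
                               (name, parse_title(path), mtime))
                elif mtime > known[name]:
                    db.execute("UPDATE cds SET stamp = ? WHERE discid = ?",
                               (mtime, name))
                # Writes only happen for new or newer cache files.
        db.commit()

The important change is that the SELECT now happens once per run instead of once per file.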

I finally ran it and was very pleased to see that it took just over a second, down from 40 seconds on the last run. Booyah!

How – Sending it live

Fiddly bits

The only thing left to do now was to get it running on the Web server.

I wrote another script to copy the local cache from my computer to the server, which I did using rsync over ssh.

Then I set it to run from cron on my machine on an hourly basis. I could also trigger the import program on the Web server from cron.

I set it up to run from cron on my work PC too.
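Put together it looked something like this (hostname and paths are invented for the example):

    #!/bin/sh
    # push-cd-cache.sh – copy the local cache to the Web server.
    rsync -az -e ssh "$HOME/.cddb/" webserver:/var/lib/cdlist/cache/

with crontab entries along these lines:

    # Home machine and work PC: push the cache every hour.
    0 * * * *  /usr/local/bin/push-cd-cache.sh
    # Web server: run the one-shot importer a little later.
    15 * * * * /usr/local/bin/cdimport-all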

I was done… or was I?

How – The Future

Potential Improvements

Inevitably I am thinking about improvements that could be made.

The most obvious one is to paginate the results; they are already too long in my opinion (this is now done).

Another is to have a count of how many times a CD is played (this is also now done).

Please add comments below if you have any other suggestions.
