Blog code update

I have changed my blog code so that it now allows me to attach tags to posts instead of filing them into one-dimensional categories.

Of course, for it to have any meaning I have to go back to old entries and add relevant tags. Or train a monkey to do it.

I’ve also updated my CD List code.

Since I ripped all my CDs to disk and no longer play CDs on my PC it was never being updated.

Now that I stream my music from home to work it makes sense for it to update when I play a CD that way – so I made that change.

In essence, when I play a CD in my streaming app, it sends off an HTTP POST to this site, which then adds an entry to the database that tracks what I played and when.
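As a rough illustration of the idea (not my actual code), the sending side could look something like the sketch below. The class name, the "title" and "played" field names and the endpoint URL are all made up for the example, and it assumes Java 11 or later for java.net.http.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Instant;

/** Hypothetical sketch: builds the POST that reports a played CD. */
final class PlayNotifier {

    /** Form-encodes the CD title and play time (field names are assumptions). */
    static String formBody(String discTitle, Instant playedAt) {
        return "title=" + URLEncoder.encode(discTitle, StandardCharsets.UTF_8)
             + "&played=" + URLEncoder.encode(playedAt.toString(), StandardCharsets.UTF_8);
    }

    /** Builds (but does not send) the request for a hypothetical endpoint. */
    static HttpRequest buildRequest(URI endpoint, String discTitle, Instant playedAt) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(discTitle, playedAt)))
                .build();
    }
}
```

Sending it would then be a matter of passing the request to an HttpClient; the server side just has to read the two fields and insert a row.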

The last part of that was to put together a quick script to insert historical data for all the music I was streaming over the past year or so.

Fortunately, I had had the foresight to collect that data locally in anticipation of this change.

The next steps are to track music that I play on my iPod and on wiphi.

What WAP was invented for?

I was just in HMV, and I was trying to remember the name of a band.

I knew that band had a track on a compilation I owned, so I looked for that compilation too, but no joy.

So I whipped out my phone and started wapping my way to boncey.org.

I then selected CD List, then search CD List and entered in the name of the compilation.

Back it came with 4 results. I selected the one I wanted, then scrolled down the track listing to see that the name of the band was Fun Lovin’ Criminals.

Now that’s clearly what WAP was invented for.

CD List

My CD List application has been monitoring and recording my music listening habits for around 20 months now.

I have managed to collect a sizable body of data about what CDs I have been listening to, and now I think it is time to do something with that data.

But first, some statistics.

The first CD entered into the system was Rubber Soul, on 2nd May 2002.

Since then I have listened to 286 different CDs.

The actual number of distinct plays is 1268, which means each CD has been listened to an average of 4.43 times.

The most played CD is Late Lounge Disc One.

So, what should I do with all this data?

The first step is to clean it up.

My CD player has a habit of “finding” an incorrect CD if you accidentally press play when a data CD is inside. So I have some entries in there that I have never actually played.

Then, I want to filter out CDs that I borrowed and don’t belong to me, so I have added a flag and can “disown” a CD after it’s in the system.

Then, when I am happy that I can determine everything that belongs to me I can start to do some interesting stuff.

One idea is something that suggests music for me to listen to.

It can look for CDs I have not listened to in a while, or detect patterns of activity like multiple plays within the first month after initial creation then nothing for a while.
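The simplest version of that, finding CDs I have not played in a while, might be sketched like this. It is only an illustration: the class and method names are invented, and it takes a plain map of CD title to last play date rather than talking to the real database. The pattern detection is not attempted here.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: suggests CDs that have not been played recently. */
final class Suggester {

    /** Returns titles last played before the cutoff, longest-neglected first. */
    static List<String> neglected(Map<String, LocalDate> lastPlayed, LocalDate cutoff) {
        List<Map.Entry<String, LocalDate>> old = new ArrayList<>();
        for (Map.Entry<String, LocalDate> entry : lastPlayed.entrySet()) {
            if (entry.getValue().isBefore(cutoff)) {
                old.add(entry);
            }
        }
        // Oldest last-play date first.
        old.sort((a, b) -> a.getValue().compareTo(b.getValue()));

        List<String> titles = new ArrayList<>();
        for (Map.Entry<String, LocalDate> entry : old) {
            titles.add(entry.getKey());
        }
        return titles;
    }
}
```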

As I now record CDDB genres too, that can help fine-tune the process.

Another idea is to display a list of CDs I was listening to the day I posted a particular blog entry.

I would also like to have an artist table in the database, so I can group music by artist.

This is tricky as the data quality isn’t always high enough to match artists (I am trying to keep manual intervention to an absolute minimum).

An ideal feature, although difficult to implement, would be to link it to something like allmusic or MusicBrainz so I can see other albums by artists I like, and similar features.

I don’t have any ideas on how to do that though. 🙁

about the CD list – part three – automation and optimisation, then going live

Continued from about the CD list – part two.

You may also want to read about the CD list – part one.

How – Automating it

Shell Scripts

The next plan was to wrap it all up in a script that would call my program once for each cache file.

I used "find" as that’s what it’s there for.

That was easy to set up and worked fine, except it was real slow.

I planned to call this script every hour and it took 45 seconds to a minute each time. Not much time, you may think, but most of that was database access, and hammering the database that much is just not good practice.

I needed to consider some optimisation techniques.

How – Optimisation

Obvious really

A quick look over the program revealed that for every invocation it was doing one database read and one database write.

The read to see if the CD was there and the write to either insert it or update its timestamp.

The most obvious optimisation was to only do an update if the timestamp of the cache file was newer than the entry in the database.

Some may say it was a bit of an oversight on my part to code it that way in the first place; they’re probably right. 😛

As most of the CDs being read each time would already be in the database this cut down the number of database writes quite dramatically.
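The check itself is tiny. Here is a minimal sketch of it, with an in-memory map standing in for the CD table; the class, the disc id key and the write counter are invented for the example and are not the real code.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: only write when the cache file is newer than the stored entry. */
final class PlayUpdater {

    // Stand-in for the CD table: disc id -> last played timestamp.
    private final Map<String, Instant> lastPlayed = new HashMap<>();

    // Counts writes so the saving is visible.
    int writes = 0;

    /** Returns true if a write (insert or update) was needed. */
    boolean record(String discId, Instant cacheFileModified) {
        Instant stored = lastPlayed.get(discId); // the read
        if (stored != null && !cacheFileModified.isAfter(stored)) {
            return false; // nothing new, so skip the write
        }
        lastPlayed.put(discId, cacheFileModified); // insert or update
        writes++;
        return true;
    }
}
```

Calling record twice with the same timestamp results in one write; the second call only costs the read.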

However, I was still doing one read per CD cache file read. This was still not good enough.

Dramatic improvements

The most obvious thing wrong was that I was calling the program once per file, therefore I could only read one CD from the database at a time.

It would have been ideal to have read the entire CD table into a list then see what needed updating.

But to do this I would have to have somewhere to store that list between invocations. That was tricky.

The alternative was to have the program take care of traversing the file system itself so I would only need to call it once.

This seemed easier so I set about doing that.

A few hours coding later and I had a recursive function that parsed the files and stored them in a List object.

A bit of tweaking here and there and I had a program that would do everything in one call.

This now involved only one database read plus one write for each new (or newer) cache file.
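The shape of the single-call version is roughly as follows. This is a sketch under assumptions rather than the real program: the names are invented, the "one database read" is represented by a map passed in, and files are matched to entries by file name purely for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: walk the cache once, then compare against one table read. */
final class CacheScanner {

    /** Recursively collects every regular file under dir into found. */
    static void collect(File dir, List<File> found) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, or unreadable
        }
        for (File child : children) {
            if (child.isDirectory()) {
                collect(child, found);
            } else {
                found.add(child);
            }
        }
    }

    /**
     * Given the table read once (file name -> last played, in epoch millis),
     * returns only the files that are new or newer and so need a write.
     */
    static List<File> needingWrite(List<File> files, Map<String, Long> knownTimestamps) {
        List<File> stale = new ArrayList<>();
        for (File file : files) {
            Long stored = knownTimestamps.get(file.getName());
            if (stored == null || file.lastModified() > stored) {
                stale.add(file);
            }
        }
        return stale;
    }
}
```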

I finally ran it and was very pleased to see that it took just over a second to run, down from 40 seconds on the last run. Booyah!

How – Sending it live

Fiddly bits

The only thing left to do now was to get it running on the Web server.

I wrote another script to copy the local cache from my computer to the server, using rsync via ssh.

Then I set it to run from cron on my machine on an hourly basis. I could also trigger the import program on the Web server from cron.

I set it up to run from cron on my work PC too.

I was done… or was I?

How – The Future

Potential Improvements

Inevitably I am thinking about improvements that could be made.

The most obvious one is to paginate the results; they are already too long in my opinion (this is now done).

Another is to have a count of how many times a CD is played (this is also now done).

Please add comments below if you have any other suggestions.

about the CD list – part two – preparation and importing of data.

Continued from about the CD list – part one.

How – Preparation

The first step was to create the database tables and the model classes that would represent them.

I created a CD table and a Track table and the appropriate classes.

The plan was to populate the data from the cache files and keep a timestamp of when the CD was played, based on the timestamp on the cache file.

How – Importing data

Perl

I then had to consider how to get the data from the file system into a database.

My first instinct was to write it in Perl. Perl is normally my first choice when it comes to dealing with parsing data in files but I had no experience of talking to a database with Perl, plus I thought it might be good to do something in Java that I didn’t normally do.

Java

So I wrote a standalone program in Java to parse a file created by libcdaudio.

I planned to call this in a script once for each cache file (libcdaudio creates one per CD), that way I wouldn’t need to write anything to deal with traversing the file system.

Parsing problems

The basis of the file was key-value pairs separated by an equals sign, as follows:

DTITLE=Stuart Hamm / Radio Free Albemuth

Great, I thought, I can use the built-in Java Properties stuff to parse that file.

I then realised that the values were sometimes truncated and continued on a second line with the key repeated, as follows:

DTITLE=Tom Tom Club / The Good The Bad and the Funky [extended dub edit

DTITLE=ion]

Damn, the Properties code would probably barf on that.

I decided to try it just in case, but as suspected, keys had to be unique, so the first line in the example above would have been lost.

So, I wrote my own parser.
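The core of such a parser is small: when a key turns up again, append the value instead of replacing it. The following is a minimal sketch of that approach and not my original code. The class name is invented, and skipping lines that start with "#" is an assumption about how comments appear in the cache files.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: key=value parser where a repeated key continues the value. */
final class CacheFileParser {

    static Map<String, String> parse(List<String> lines) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String line : lines) {
            // Skip blank lines and comment lines (assumed to start with '#').
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // not a key=value line
            }
            String key = line.substring(0, eq);
            String value = line.substring(eq + 1);
            // A repeated key is a continuation: append rather than overwrite.
            entries.merge(key, value, String::concat);
        }
        return entries;
    }
}
```

Fed the two Tom Tom Club lines above, this gives a single DTITLE entry with "ion]" joined straight on to the end of the first line, with nothing lost.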

Uh-oh

Once I was happy with that I then put in the hooks to write the data to the database.

The plan was that entries that weren’t there would be added, and existing entries would have their timestamp updated.

That was when I hit my next problem.

libcdaudio didn’t update the timestamp when reading from the cache, quite sensible of course, but I needed it to do that.

It was time to "hack the source".

I modified libcdaudio to update the timestamp of the cache file whenever it read from the file.

This only involved adding one line of C code to the main source file thankfully.

I now had a program that could parse a libcdaudio cache file and write the results to the database.

It was a little rough around the edges but it worked.

Displaying the data

I then wrote the code to display the data on the site.

This consisted of a set of classes and JSP pages, one set for listing and one for showing an individual entry.

Continued in part three.

about the CD list – part one – introduction and why I did it

Everybody says the same thing when I show them the CD list.

"How did you do that, and why?"

This posting seeks to answer both those questions. If after reading it you still have questions, feel free to add a comment below.

I’ll answer the why first as I imagine not everyone will be interested in the techy details of how.

Why?

I always thought it would be quite nice to have a page on my site that lists the music I’m currently listening to, but as I buy new CDs all the time it would be tedious to update.

I then thought it would be cool to have an automated system to keep it up to date but for a long time did nothing about it.

Then I recently realised that thanks to the power of freedb.org and wmsvencd (the application I use to play CDs at home and work) it was possible to do this.

3 days of furious programming later, followed by the odd hour here or there of tweaking, and I now have an automated system.

Every time I play a CD on my PC at home or work, within an hour it will appear on this site.

How?

The above covers the why; this attempts to cover the how.

I’m a Java Programmer by trade and I want to keep my skills current, so I tend to program my Web sites in Java.

This development was no exception.

The realisation that prompted me to do this work was finding out that libcdaudio (that wmsvencd uses) writes a cache of CDs I have played to the file system.

I saw this and realised all I had to do was read the information from that cache into a database and I could then display it on my site.

It also has the nice side effect of building up a database of CDs I own (or listen to).

Continued in part two.