Do algorithms still matter?

I attended the BCS Mini-SPA event a few months back.

The premise of the mini SPA is as follows:

If you attended SPA2006 you might find that the miniSPA2006 programme allows you to catch up with sessions you didn’t select at the event.



The annual SPA conference (formerly known as OT) is where IT practitioners gather to reflect, exchange ideas and learn.

It also served as a convenient advert for next year’s full SPA event.

It was also free, had a free lunch and got me out of the office for a day, so it pretty much fulfilled all my criteria.

The day was structured as six sessions, divided into two parallel streams.

I attended “Distributed workforces”, “Modelling with Views” and “A Good Read”, and it was the last of these that really interested me.

This was a panel of five people who had each proposed a book to discuss. Each member of the panel then read each book so they could discuss it and give their views and insights.

The really interesting part for me was that someone proposed Programming Pearls by Jon Bentley.

I’ve owned a copy of this book for years but have yet to finish it (it’s back on my “to read” list now though).

Everybody roundly praised the book but one of the members of the panel questioned whether we needed to know that level of detail when it comes to coding efficient algorithms – “wouldn’t it be simpler to throw more CPU and RAM at a problem?” they said.

Someone in the audience then countered that algorithm efficiency was relevant once again when programming Web apps. They said something along the lines of “Wait until 100 people hit that page on your site”.

Sadly the session ran out of time at that point so no conclusion was reached.

My own belief is that you do need to know code at that level, especially if you write Web sites or other similar client/server apps with many concurrent client requests.

I’m not saying everyone should know the Quicksort algorithm inside out, but if you program in Java (for example) you should know the difference between a Vector, an ArrayList and a plain old array and when to use each.
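
As a rough illustration of that difference, here is a minimal sketch that builds the same data three ways. The class and method names are invented for this example, and the comments summarise the usual trade-offs rather than any measured results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

class ListChoices {
    // Plain array: fixed length, no wrapper objects, fine when the size is known up front.
    static int[] squaresArray(int n) {
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = i * i;
        }
        return out;
    }

    // ArrayList: grows on demand and is unsynchronized; the usual default.
    static List<Integer> squaresArrayList(int n) {
        List<Integer> out = new ArrayList<>(n); // pre-sized to avoid repeated regrowth
        for (int i = 0; i < n; i++) {
            out.add(i * i);
        }
        return out;
    }

    // Vector: behaves like ArrayList, but its methods are synchronized,
    // so every call takes a lock even when only one thread uses the list.
    static List<Integer> squaresVector(int n) {
        List<Integer> out = new Vector<>(n);
        for (int i = 0; i < n; i++) {
            out.add(i * i);
        }
        return out;
    }
}
```

All three hold the same values; what differs is whether the size is fixed and whether each access pays for synchronization.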

I have had personal experience of a badly written for loop bringing down a Web site on launch day.

The for loop in itself wasn’t the worst code ever written by any means, but it was probably executed 30 to 40 times per individual home page hit.

Multiply that by a few dozen concurrent hits (it was a busy site) and any flaws in that code were mercilessly exposed.

Embarrassingly for me, it was my code. Oops.

Ever since that day I’ve been unable to forget that no amount of “CPU and RAM” (and we had a lot) will help if you don’t get your algorithms right in the first place.
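
The loop in question isn’t shown here, so the following is only a hypothetical example of the kind of flaw that stays hidden until code runs dozens of times per page under concurrent load. It assumes a made-up task (counting which items appear in a “featured” list) and contrasts a linear scan inside a loop with a hash lookup.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FeaturedFilter {
    // Slow version: List.contains is a linear scan, so the whole loop
    // costs O(items * featured) comparisons.
    static int countFeaturedSlow(List<String> items, List<String> featured) {
        int count = 0;
        for (String item : items) {
            if (featured.contains(item)) {
                count++;
            }
        }
        return count;
    }

    // Faster version: build a HashSet once, then each lookup is O(1) on average.
    static int countFeaturedFast(List<String> items, List<String> featured) {
        Set<String> lookup = new HashSet<>(featured);
        int count = 0;
        for (String item : items) {
            if (lookup.contains(item)) {
                count++;
            }
        }
        return count;
    }
}
```

Both methods return the same answer; only the amount of work differs, and extra hardware doesn’t change that.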

Need a new Programming Language

I need to learn a new Programming Language.

This is for two reasons.

In my time as a programmer I’ve learned and used: Basic, Ada, C, C++, VB, Perl and Java.

So that’s seven (five if you merge Basic with VB and C with C++).

It’s a reasonable number, if a little on the small side.

But that list is only half the truth; most of those languages I’ve not touched in years, some I’m definitely never going to touch again (Ada!).

The only ones I now use in any form are Java and Perl.

I use Java in my day job and to write things like this site, and I use Perl for the odd scripting task.

My first reason for needing a new language is a pragmatic one. I need to learn a new scripting language.

I need a new scripting language because every time I go to do something in Perl I find I have forgotten how to do one of:

a. list the files in a directory.

b. pass an array to a function.

c. iterate over an array.

d. all of the above.

This is because I find Perl’s syntax to be on the whole inconsistent and unintuitive.

So, I’ve had enough of Perl’s kooky ways and would like to learn something a little bit more “sane” (definition: consistent and intuitive syntax).

My second reason goes a little deeper.

I’ve been reading a few articles and blogs of late that in some way or another point out some problems with Java.

A Quick Tour of Ruby

Java doesn’t provide a utility method for opening a text file, reading or processing its lines, and closing the file. This is something you do pretty frequently.

— Steve Yegge

Can Your Programming Language Do This?

Java required you to create a whole object with a single method called a functor if you wanted to treat a function like a first class object.

— Joel Spolsky
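
To make the “functor” complaint concrete, here is a small sketch of the idiom being described. The interface `Transform` and the other names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class FunctorDemo {
    // Hypothetical single-method interface: the "functor" type.
    interface Transform {
        String apply(String input);
    }

    // Behaviour has to arrive wrapped in an object, since a bare function cannot be passed.
    static List<String> mapAll(List<String> inputs, Transform transform) {
        List<String> out = new ArrayList<>();
        for (String input : inputs) {
            out.add(transform.apply(input));
        }
        return out;
    }

    // The anonymous inner class: a whole object created to carry one method.
    static final Transform SHOUT = new Transform() {
        @Override
        public String apply(String input) {
            return input.toUpperCase() + "!";
        }
    };
}
```

Calling `FunctorDemo.mapAll(names, FunctorDemo.SHOUT)` applies the wrapped method to each element. Java 8 later added lambdas, which let the same thing be written as `s -> s.toUpperCase() + "!"`.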

What was interesting was that once I was over my initial denial of such heresy, I found myself mostly agreeing with what they had said.

The surprising part for me was that I had not consciously noticed these things myself – even though I now realise such things had annoyed me at the time.

The reason that they had not bubbled up to the level of consciousness was that I could not see beyond the Java language itself.

Something was awkward to do in Java (ever tried reading a file?) – well, that’s just the way Java is.
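
For anyone who hasn’t tried it, this is roughly what reading a text file looked like with the classic java.io classes. It’s a sketch with an invented class name, not code from this site.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ReadLines {
    // Open the file, read every line, close it: the steps the quote complains about.
    static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader reader = new BufferedReader(new FileReader(path));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            reader.close(); // has to be closed by hand, even if reading fails
        }
        return lines;
    }
}
```

Later versions of Java narrowed the gap: `java.nio.file.Files.readAllLines` arrived in Java 7.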

I couldn’t question it, because I was so deeply ingrained in the ways of Java, I could see no alternatives.

This worried me somewhat: what other concepts and ideas was I ignorant of due to my Java mindset?

Sometimes you need to take a step back and get a fresh perspective on things.

And what better way than to learn a new programming language?

I’m a busy guy though.

I simply can’t afford to take two weeks off just to learn a new language.

So, to be pragmatic (I’m a pragmatic guy too) I’m going to try to solve both of these problems with a single language.

So, I want a general purpose language that’s also good for scripting work.

My shortlist of languages is not long:

Python.

I’ve dabbled with Python.

It’s fun, quick, easy etc.

I’ve not done enough to know if it’s “sane” as defined above, though it doesn’t seem as freaky as Perl.

Ruby.

Ummm, everyone’s talking about it.

A friend of mine is learning it and he’s not swearing about it too much yet.

Apparently it’s mostly “sane”.

I’ve not completely decided yet; I’m leaning towards Ruby at the moment, mind.

Anyone care to convince me either way, or suggest other languages I should be looking at?

Log4j with xml configuration

Like many Java developers, I use Log4j as my logging solution.

However, unlike many Java developers (in my experience anyway) I configure Log4j using XML rather than a properties file.

XML has always struck me as a neater way to represent what is essentially a hierarchical configuration.

However, it’s not terribly well documented (although the latest Log4j download comes with lots of example XML files).

Here’s an example entry from one of my XML files that sets up a DailyRollingFileAppender which appends to the existing file on startup.

<!-- An appender which writes to file -->
<appender name="FILE" class="org.apache.log4j.DailyRollingFileAppender">
    <param name="file" value="${user.home}/conf/apache/logs/boncey_app.log" />
    <!-- Note: a yyyy-MM pattern rolls the file monthly; '.'yyyy-MM-dd would roll it daily -->
    <param name="datePattern" value="'.'yyyy-MM" />
    <!-- Append to the existing file on startup rather than truncating it -->
    <param name="append" value="true" />
    <layout class="org.apache.log4j.PatternLayout">
        <param name="ConversionPattern" value="%d [%t] %-5p %C{6} (%F:%L) - %m%n"/>
    </layout>
</appender>

Getting Tomcat to use a log4j XML file is a bit fiddly. The log4j manual explains how to set it up.

Under Tomcat 3.x and 4.x, you should place the log4j.properties under the WEB-INF/classes directory of your web-applications. Log4j will find the properties file and initialize itself. This is easy to do and it works.

However, it’s not immediately clear what to do if you use an XML file.

Dropping log4j.xml into WEB-INF/classes doesn’t work – or it didn’t for me on Tomcat 5.5.

Log4j manual to the rescue once more.

You can also choose to set the system property log4j.configuration before starting Tomcat. For Tomcat 3.x The TOMCAT_OPTS environment variable is used to set command line options. For Tomcat 4.0, set the CATALINA_OPTS environment variable instead of TOMCAT_OPTS.

So I put export CATALINA_OPTS="-Dlog4j.configuration=log4j.xml" into my shell startup scripts, dropped the log4j.xml file into my WEB-INF/classes dir and I was logging in no time!

One other thing you may find once you start using log4j is that lots of other Apache code uses it too, so your logs start filling up with lots (and I mean lots) of extra logging from the Apache code.

The traditional way to stop this is to add an entry as follows:

<category name="org.apache.tomcat" additivity="false">
    <priority value="info" />
    <appender-ref ref="FILE" />
</category>

This says that for anything logged from org.apache.tomcat, suppress anything less than INFO level, i.e. suppress all DEBUG level logging.

The problem with this approach is that you have to keep adding new entries to suppress the logging from other packages too (Hibernate is a good example of this).

There is an alternative approach which is stunningly obvious once you hear it.

It is of course, better practice, to keep your application log4j.properties ROOT log level set to WARN or ERROR, and just turn on debugging for your own application classes! (ie. org.appfuse=DEBUG )

So obvious! Can’t believe I never thought of that.

So I set my root level to WARN, then added categories to log my own code at DEBUG level.

Voila, no more Tomcat cruft in my logs.
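
The same “quiet root, noisy own code” idea can be sketched in plain Java. Note the deliberate swap: this uses java.util.logging from the JDK rather than log4j, so that it runs with nothing extra on the classpath, and its level names differ (WARNING and FINE instead of WARN and DEBUG). The package name com.example.myapp is made up.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class QuietRoot {
    // Keep strong references: java.util.logging holds named loggers weakly.
    static final Logger ROOT = Logger.getLogger("");
    static final Logger APP = Logger.getLogger("com.example.myapp"); // hypothetical package

    static void configure() {
        ROOT.setLevel(Level.WARNING); // everything else: warnings and errors only
        APP.setLevel(Level.FINE);     // own code: debug-level detail
    }

    // True if the named logger would accept a debug-level (FINE) message.
    static boolean debugEnabled(String loggerName) {
        return Logger.getLogger(loggerName).isLoggable(Level.FINE);
    }
}
```

After `QuietRoot.configure()`, a logger under org.apache.tomcat inherits WARNING from the root, while anything under com.example.myapp inherits FINE. No per-library suppression entries are needed.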

That concludes my ramblings on log4j and XML; hopefully they’ll be of use to someone (other than me of course).

Making Wiphi Good

BBC News has an article about the Slim Devices’ Transporter wi-fi music player.

As usual when I read about such things I tend to compare it to Wiphi, the music player that I built.

It’s interesting to note the amount of work that would have to be done to get Wiphi up to the spec of something that is commercially available.

Visual appeal

My player looks attractive initially.

But it’s a bit chunky, and the green VFD display is not very readable.

Ideally it would be half that height and have a blue LCD display.

It would also have a nice responsive dial for controlling the volume.

It could also do with a nice logo.

Remote control

It doesn’t have a dedicated remote – it’s just configured to use any remote that works with LIRC.

Building a remote control from scratch is beyond my skills.

User interface

The overall user interface isn’t too bad, but it’s not slick.

There’s no indication of what is going on until the box has booted (takes a few minutes as it boots over the network).

The only interface is the remote control and the display panel at the front – which is a bit limiting. It needs an on-screen display on the TV ideally.

Software

I wrote the software myself – it’s essentially some “glue code” (written in Java) between MPD, LCDproc and LIRC.

If it goes wrong and crashes the only way to deal with it is to ssh in and look at log files.

It is MPD based though – I could set it up so that I could control it from a Web browser on another machine for example.

There’s no way to configure it beyond hacking XML files and restarting the box. That could be fixed – but would require a Web interface of course.

Conclusion

It would obviously need a lot of work to get it up to speed.

Also, the software work is far more likely to happen than the hardware work.

I’m not really good with hardware stuff, plus I’ve spent enough on kit for it already.

It’s an academic question anyway; I’m unlikely to give up my day job and build music players for a living.

[Photo: Wiphi]

Hibernate

I’ve just finished giving this site a massive overhaul.

It’s got a new lick of paint. The old design had been around since the site launched in 2002, so a redesign was well overdue (a four-year-old design is ancient by Internet standards!).

It’s also had some major surgery under the hood.

I’ve essentially ripped out the existing home-brewed persistence layer and replaced it with Hibernate.

This really was as painful as it sounds and I wouldn’t really recommend it to anyone.

I did it for two reasons.

The main reason is that Hibernate looks good on my CV. 🙂

The second reason was that my home-brew persistence layer was, errr, a bit shit.

My existing persistence layer was very simple: it read the entire database into its cache at start-up, then never touched it again apart from writing new data (whereupon it refreshed its cache).

It’s a simple but effective approach.

However, it’s not particularly scalable and the site was feeling the strain as data grew.

Hibernate of course is a completely different animal.

Once I’d finished plumbing it in and got a working site I then set about making it efficient.

I didn’t want to replicate the old system of trying to cache everything; I wanted a balance between speed and memory usage.

This involved a lot of fine tuning, plus reading the Hibernate manual and a copy of Hibernate In Action which I borrowed from work.

I think I have struck that balance now: the main sections of the site are read from the cache, but the cache has limits set so that it doesn’t try to cache everything.

The other important thing I set up was a reporting page where I could view the sizes of the various caches to enable future tuning.
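
To show what “a cache with limits” means in principle, here is a minimal sketch of a size-bounded, least-recently-used cache built on the JDK’s LinkedHashMap. It is a stand-in for the idea only: it is not Hibernate’s cache or its configuration, and the class name and report format are invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A size-limited, least-recently-used cache (illustration only).
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // true = keep entries in access order, oldest first
        this.maxEntries = maxEntries;
    }

    // LinkedHashMap calls this after each put; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    // The kind of figure a cache report page could display.
    String report() {
        return size() + "/" + maxEntries + " entries";
    }
}
```

With a limit of two, putting a third entry evicts whichever of the first two was touched least recently, so memory use stays capped while the popular data stays cached.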

As for my thoughts on Hibernate – I’m reasonably happy with it.

When someone at work first told me about it they sang its praises to the limit and talked about its many powerful features and how it was so easy to work with.

They were sort of right, it is powerful, and does have some nice features. It’s also easy to work with. If you don’t care about performance.

If you want Hibernate to work efficiently and scale well you need to spend time learning how it works so that you can tune it properly.

I think I have done that to some degree but I’ll be monitoring cache performance over the coming weeks to see how it shapes up.

Programmer’s Diary (part four)

For my app, I had planned to cache some data from the Flickr API in a database (hence the need for Hibernate).

But I was eager to do a “proof of concept” thing and make sure everything I wanted to do could be done before I wrapped my head around Hibernate.

So I’ve decided to compromise by caching stuff in the session for now.

I had to tweak more Flickrj code to make sure everything I wanted to store in the session implemented Serializable though.
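
A quick way to check that an object really is safe to put in the session is to round-trip it through Java serialization. The sketch below assumes a made-up `PhotoSummary` class; it is not one of Flickrj’s types.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SessionCheck {
    // Hypothetical value object of the sort that might be cached in a session.
    static class PhotoSummary implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        final String title;

        PhotoSummary(String id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    // Serializes then deserializes the object. Throws NotSerializableException
    // (an IOException) if it, or anything it references, is not Serializable.
    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```

Running every session-bound type through `roundTrip` in a test catches a forgotten `implements Serializable` before the servlet container does.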

So, having got Flickrj working, and stuff cached in the session to ease the load on the Flickr API, I started writing the interesting code that actually does stuff.

This part took less time than all the other fiddling around (always the way) so after only a few hours I had a simple prototype app that pretty much proved the concept.

There’s no design or look and feel work yet, just black text on a white background and no navigation other than a link to the home page.

So, it needs a bit of tarting up.

But it works, which is kind of fundamental really.

Next steps are to test everything and do some look and feel work.

At some point I’ll even send it live.

delicious powered linkblog

I’ve added a linkblog to my site; it’s on the right-hand sidebar of my blog homepage.

The nice thing about this linkblog (for me) is that it’s automatically loaded from del.icio.us.

To add something there I just add a URL to my del.icio.us bookmarks with a linkblog tag.

Within 30 minutes it appears on the blog here. Fully automated – nice.

The mechanism behind this is the same as the automatically updated Bloglines links further down the sidebar.

The only difference is the stylesheet, as del.icio.us uses RDF and Bloglines uses OPML.

The other tweak I made in the stylesheet is to only show the ten newest links.

So, here’s the stylesheet that I made if it’s useful to anyone.

Programmer’s Diary (part three)

Robustification*

I want to ensure the app is pretty robust.

By which I mean that it shouldn’t crash (much).

This obviously involves writing proper code.

It also involves dealing with user input and the Flickr API in a sensible manner.

It’s reasonable to state that no user input should cause the app to crash.

To achieve this aim I will be treating all user input with extreme prejudice.

By user input I don’t just mean forms on the site.

I also mean any form of “URL hacking” too.
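
As a minimal sketch of what that can look like for a hacked URL, assume a hypothetical `page` parameter that should be a number within a known range. Anything else falls back to a default instead of throwing.

```java
class SafeParams {
    // Parses a page-number request parameter, falling back to a default
    // for anything missing, non-numeric, or out of range.
    static int parsePage(String raw, int defaultPage, int maxPage) {
        if (raw == null) {
            return defaultPage;
        }
        try {
            int page = Integer.parseInt(raw.trim());
            return (page >= 1 && page <= maxPage) ? page : defaultPage;
        } catch (NumberFormatException e) {
            return defaultPage; // e.g. "abc", "", or a number too big for an int
        }
    }
}
```

So `?page=abc`, `?page=-5` and a missing parameter all quietly produce the default page rather than a stack trace.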

I also need to look at all the possible error conditions that the Flickr API can throw at me and deal with them gracefully.

Errors from Flickr will either be network errors (come back later when it’s better), invalid XML (will need to be investigated to see if I can work around it) or one of its published error codes (should be the easiest to deal with).
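
One way to keep those three cases apart is to map each kind of failure to a single handling decision. This is a sketch under assumptions: the exception classes below are invented stand-ins, not Flickrj’s real types, and the actions just mirror the three outcomes described above.

```java
import java.io.IOException;

class FlickrErrors {
    enum Action { RETRY_LATER, INVESTIGATE, SHOW_MESSAGE }

    // Hypothetical exception for a response that isn't valid XML.
    static class BadXmlException extends Exception {
        BadXmlException(String message) {
            super(message);
        }
    }

    // Hypothetical exception carrying one of the API's published error codes.
    static class ApiErrorException extends Exception {
        final int code;

        ApiErrorException(int code, String message) {
            super(message);
            this.code = code;
        }
    }

    // Maps each failure kind to one way of handling it.
    static Action classify(Exception error) {
        if (error instanceof IOException) {
            return Action.RETRY_LATER;  // network trouble: come back later
        }
        if (error instanceof BadXmlException) {
            return Action.INVESTIGATE;  // invalid XML: log it and look into it
        }
        if (error instanceof ApiErrorException) {
            return Action.SHOW_MESSAGE; // published error code: tell the user
        }
        return Action.INVESTIGATE;      // anything unexpected
    }
}
```

Funnelling every failure through one method like this means there’s a single place to decide what the user sees, whatever goes wrong.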

* Robustify is a word some shit contractor used in a CVS commit message after I slagged off his toString method.

I think it’s an awful word, so I enjoy using it in an ironic manner.

And now, for some gratuitous photos.

[Photos: Empty shop; Candy Chinatown style]

Programmer’s Diary (part two)

So, as discussed previously I’ll be coding against the Flickr API.

I’ve done this before and back then I wrote my own code to do it.

This was fair enough; I was only calling about four API methods.

But as I’ll be making a lot more use of the API this time I’m going to use Flickrj instead.

Flickrj is simply a Java wrapper around the Flickr API.

It seems to cover all the stuff that I want to do, so there’s no point writing my own interface.

The first thing that I usually do when using any new Java library is to get a copy of the Javadocs onto my local Web server and get a copy of the source code onto my machine.

I then set up Vim and ctags so that I can browse and jump to the source code from within Vim (sorta like an IDE, only not as good).

Having done that I started getting to grips with Flickrj.

Unfortunately the first thing I needed to do was to authenticate against Flickr.

Flickr’s authentication system is complicated but it’s also clever. A person can authenticate against Flickr via my site without providing me with any details about themselves. Clever.

It also works in slightly different ways depending on what type of app you are writing: Web, mobile or desktop.

Flickrj didn’t seem to be set up for Web authentication and I’m writing a Web app.

Time to start hacking on the Flickrj source code.

I moved the source for Flickrj into my project so it sat alongside my own – this way I could compile once and pick up any changes to both sets of code.

Once I had done this and added the method that I needed I got it to authenticate.

I then found a number of minor bugs in Flickrj, so I had to fix them too.

I then emailed the lead developer to ask if he wanted me to send him some patches, but I haven’t had a reply yet.

Right, that’s enough for now, next time I’ll talk about “robustifying” my app.

To go on with though, here are some random photos that I’ve taken in London these last few weeks.

[Photos: Covent Garden; Lunchtime photography wander (2); Millenium Silhouettes; Sunset throught the trees]

Programmer’s Diary (part one)

I’m going to keep a programmer’s diary for my next project.

It’s a Web app that will use the Flickr API.

I’m not going to say what it does yet, but I’ll post my progress here.

I’ve decided to do it in Java as that’s what I know.

I’m too impatient to start learning something like Ruby or Python now; I want to crack on and get coding.

I will try to learn something new though and that will be Hibernate.

Previously I’ve used a basic system for database access: code generation using XSLT and a thin layer for caching frequently accessed data. It’s OK but a bit clunky and restrictive.

For the rest I’ll carry on with the current Web framework that I’ve been using for a few years (an “in-house” one).

I also feel that I should have a crack at using Eclipse as an IDE.

I’m not really an IDE person (I normally use Vim) but I suppose I should at least see what I am missing out on (I used it briefly at work, it’s a bit of a monster but having a debugger is nice).

Next time I’ll talk about coding against the Flickr API.