Visible Progress

A few weeks back I’d been sat at my desk at work reading Steve McConnell’s blog entry on building a fort and how it compares with software development.

It’s quite interesting how many similarities there are between a software project and an engineering project (of sorts) even though they don’t really become apparent until after the fact.

Steve even managed to make a “classic mistake”.

4. Substituting a target for an estimate.

I had 7 days to do the project, and my estimate turned out to be 7 days. That’s a little suspicious, and I should have known better than to make that particular mistake!

It shows that when we are outside of our problem domain it’s easy to forget what we’ve learned.

I had an object lesson in this later that same day, when I was thinking about a small project going on in our office.

We’re getting a shower installed at work, and as I’ve just started cycling in, I had been taking an interest in its progress.

The work had started the week before, but I wasn’t getting too excited about when it would be done, as I know these things can drag on and hit all sorts of snags and problems.

It’s like software right? You can plan and plan but some unforeseen problem is always waiting around the next corner.

So, I was quite excited later that same day when I walked past the shower cubicle and saw that the shower unit had been attached to the wall.

My immediate, almost instinctive reaction was that the shower was pretty much installed and I’d be able to use it by the end of the week.

I walked back to my desk feeling quite pleased when suddenly I realised that I too had made a “classic mistake”.

Joel Spolsky explains this problem quite succinctly.

If you show a nonprogrammer a screen which has a user interface which is 100% beautiful, they will think the program is almost done.

People who aren’t programmers are just looking at the screen and seeing some pixels. And if the pixels look like they make up a program which does something, they think “oh, gosh, how much harder could it be to make it actually work?”

The big risk here is that if you mock up the UI first, presumably so you can get some conversations going with the customer, then everybody’s going to think you’re almost done. And then when you spend the next year working “under the covers,” so to speak, nobody will really see what you’re doing and they’ll think it’s nothing.

I was acting like that client, seeing the “front-end” and assuming all the back-end work was done and/or didn’t exist.

I felt pretty silly once I’d had this realisation.

And it turned out that I was being silly as it still took a few more weeks before the shower was actually complete – of course, during that time the workmen were nowhere to be seen – but hey, it’s not like me to question the integrity of the great British workman.

“Source control ate my files!”

Everyone who has worked in software development for long enough must have heard somebody say that source control ate their files – it’s up there with “Works on my machine” and other such silliness.

Invariably source control didn’t eat their files at all – the problem boiled down to a (sadly) not uncommon condition: “Fear of Source Control”.

Here are some of the symptoms of such a fear.

Unwillingness to do an update

I worked on a project years back with a tight deadline, not a huge amount going on in the way of process and about 6 developers all coding like hell.

Invariably, every other update broke your local build as someone had changed an interface or forgotten to check in a file (we had no daily build either), so yes, updating was a pain.

Two of the less experienced team members solved this problem by holding off updating for as long as possible (they’d happily go two weeks without doing an update).

When I queried their approach their answer was that doing an update “broke things” so was best avoided.

This symptom of course goes hand in hand with…

Unwillingness to do a commit

As anyone who has worked with source control systems long enough knows, they won’t let you commit a file if there are outstanding changes to be merged in.

So, leaving long gaps between updates almost always leads to massive problems when you finally commit your changes.

Long update intervals lead to long commit intervals.

My usual solution to this is to do an update every morning before I start any development work for that day.

That way I get the small amount of pain out of the way without huge disruption to whatever I am working on.

I then commit my work once it’s complete and passes its tests.

Of course, I now work on much saner projects where things don’t break so often (and when a change is likely to break things, someone warns you in advance).

I also try to keep an eye on what my colleagues are doing on the project, so I can avoid nasty surprises.

Deleting other people’s code

I have seen this happen, usually when someone gets a merge conflict.

A merge conflict is what happens when two people change the same lines of the same file at the same time, so that the source control tool can’t work out on its own how to combine one person’s changes with the other’s.

Sometimes this happens smoothly and everybody is happy and sometimes not so smoothly and one person is very unhappy.
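To make the idea concrete, here is a toy Java sketch of a line-by-line three-way merge. It is not how CVS or any other real tool is implemented: it assumes both edited copies have exactly as many lines as the common base (nothing added or removed), and the class and method names are made up for illustration. A line changed on only one side merges cleanly; a line changed differently on both sides is a conflict, marked up for a person to resolve.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy line-by-line three-way merge. Assumes "mine" and "theirs" have exactly
 * as many lines as the common base (no lines added or removed); real tools
 * align the files with a diff first.
 */
class ToyMerge {

    /** The merged lines plus the (0-based) base line indexes that conflicted. */
    static final class Result {
        final List<String> lines;
        final List<Integer> conflicts;

        Result(List<String> lines, List<Integer> conflicts) {
            this.lines = lines;
            this.conflicts = conflicts;
        }
    }

    static Result merge(List<String> base, List<String> mine, List<String> theirs) {
        List<String> merged = new ArrayList<>();
        List<Integer> conflicts = new ArrayList<>();
        for (int i = 0; i < base.size(); i++) {
            String b = base.get(i), m = mine.get(i), t = theirs.get(i);
            if (m.equals(t)) {
                merged.add(m);          // both sides agree (changed or not)
            } else if (m.equals(b)) {
                merged.add(t);          // only they changed it: take theirs
            } else if (t.equals(b)) {
                merged.add(m);          // only I changed it: keep mine
            } else {
                // Both changed the same line differently: a person has to
                // decide, ideally keeping the intent of both changes.
                merged.addAll(Arrays.asList("<<<<<<< mine", m, "=======", t, ">>>>>>> theirs"));
                conflicts.add(i);
            }
        }
        return new Result(merged, conflicts);
    }

    public static void main(String[] args) {
        List<String> base   = Arrays.asList("a = 1", "b = 2", "c = 3");
        List<String> mine   = Arrays.asList("a = 10", "b = 2", "c = 30");
        List<String> theirs = Arrays.asList("a = 1", "b = 20", "c = 31");
        Result r = merge(base, mine, theirs);
        r.lines.forEach(System.out::println);
        System.out.println("Conflicting lines: " + r.conflicts);
    }
}
```

In the `main` example, lines a and b merge cleanly, while line c comes out wrapped in conflict markers. Resolving it by simply keeping `c = 30` and throwing away `c = 31` is exactly the mistake this section warns about.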

In the face of a merge conflict the correct approach is to fix the code by hand, which often involves talking to the other person who worked on the file to ensure that both their changes and your changes are preserved.

The incorrect approach (and yes, I’ve seen it done) is to remove the offending lines (someone else’s code), keep yours and commit away. This is of course a “bad thing”.

Merge conflicts are of course best avoided; even experienced developers strongly dislike them.

The way to do this is through communication with your fellow team members and knowing who is working on what area of the code (Scrum-like daily meetings are great for keeping up with who is doing what).

Often, said conflicts can be avoided by a bit of advance planning (you do your bit, test and commit, then they update and pick up your changes before they do their bit etc).

Commenting out unused code

If you’re lucky, the commented-out code will be accompanied by a vague comment along the lines of “I don’t think this is used any more”.

This stems from the fear of not knowing how to use source control to retrieve old versions of a file.

The correct thing to do of course is to remove it, then commit that change and mention said removal in the commit log.

If the code is being replaced then a comment along the lines of “Replaced foobar1 with foobar2 – foobar1 code lives in version 4.1.2 in CVS” would be most appreciated by future developers (which could of course be you).

Committing backup versions of files

I’ve just finished working on a project where a binary file that was kept in CVS had no fewer than 7 alternate versions (none of which were used) checked in alongside it.

I spent ages working out what each one did, seeing it did nothing, then removing it from CVS.

Again, it stems from confusion about, and fear of, using source control to get at older versions of a file.

The solution

The solution to all of these problems is of course to “lose that fear” and learn to love your source control tool.

One good way to start doing this is to stop using a GUI to manage your source control tool and learn the command line instead (assuming your tool has that option).

That will remove a lot of the mystery of what is going on.

Learn the mechanics of your source control tool by reading the manual.

Eric Sink has an excellent series of posts on source control that highlight some of the many benefits of using it.

The final thing to realise is that correct use of source control will save your bacon.

The initial learning investment will be repaid time and time again.

And that is worth its weight in gold.

Thinking in Ruby

So, I’m learning Ruby (it only took me a year to get started!).

I’m working my way through Programming Ruby and doing a few different scripts to see what it can and can’t do.

Most of it seems fairly straightforward stuff and I’m liking what I’m seeing for the most part.

One of the things that crops up from time to time in examples in books and online is something along these lines:

print total unless total.zero?

That’s it, the “unless construct”.

I’ve seen this before in Perl and I’ve always avoided using it – I personally find it unintuitive so I always write my code in the if x do y style.

Do x unless y has always seemed a little, errr, backwards.

Seeing it again in Ruby I again decided I’d avoid using it and carry on as I had before – then I began to wonder if I was simply imposing my “Java programming style” on to my Ruby code.
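For comparison, here is the same line written the way my Java habits would have it. Java has no `unless`, so the closest equivalent is a negated `if`. This is a minimal sketch that assumes `total` is an integer; `UnlessInJava` and `render` are hypothetical names.

```java
/**
 * Java has no "unless", so the Ruby line
 *     print total unless total.zero?
 * ends up as a negated "if", the "if x do y" style.
 */
class UnlessInJava {

    /** Returns what would be printed: the total, unless it is zero. */
    static String render(int total) {
        // "unless total.zero?" is the same test as "if total is not zero"
        if (total != 0) {
            return String.valueOf(total);
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println("[" + render(42) + "]"); // [42]
        System.out.println("[" + render(0) + "]");  // []
    }
}
```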

It’s an easy enough trap to fall into, much like early C++ programmers wrapping their C-style functions up in a class as static methods and thinking they were doing OO.

Thinking about it, most of my Perl code is written in a similar style to my Java – I always apply “use strict” and enable warnings, always put code into methods, almost always have a main method etc.

But hold on, am I writing Perl in a Java style and thereby restricting my ability with the language, or am I simply applying sensible practices to my Perl code?

My Perl code never really extended much beyond occasional scripts to process photos so I have no clear answer to that.

I hope that my Ruby coding will move beyond that (possibly into the realm of Ruby on Rails) so as it does I’ll have to constantly be asking myself if I am thinking in Java or thinking in Ruby.

Sit up and don’t slouch

I’ve just been trying out the suggestions on Jeff Atwood’s post on Computer Workstation Ergonomics.

I’ve known for ages that my seating position when at my computer is wrong; I’m a terrible sloucher you see.

Fortunately I’ve never experienced any back pain from it, so I’ve just carried on doing it.

But at the back of my mind I’ve always suspected that at some point I will experience some ill effects from it and I’ll regret my years of slouching.

So, Jeff’s post provided me with the impetus (and information) to do something about it.

Following the graphics and suggestions of Jeff’s site, I adjusted my chair (the only thing I can adjust) to match the ideal image (back straight, knees bent at 90 degree angles, eyes in line with top of monitor etc).

I’m in that position now as I type this post.

Some initial observations:

  • It feels strange, not uncomfortable, but I do keep getting the urge to slouch downwards – fortunately the new set-up pretty much prevents me from slouching and still being able to type at the keyboard.

  • I’m a little achy from spending much of the bank holiday weekend riding my new bike, so it’ll take me a few days to determine if any aches and pains are bike related or chair related.

  • I’m finding new reasons to learn to touch type! One of the guidelines is to use my chair armrests to support my elbows – for me this means that my left hand can only comfortably reach the left side of the keyboard, and vice versa for my right hand. What I am finding now is that as I type, my left hand sometimes instinctively tries to press a key on the right-hand side of the keyboard (and vice versa; I type with both hands), which means that my arm then lifts off the armrest. I need to learn to touch type properly so that my arm doesn’t have to lift up.

  • My left elbow no longer hurts from resting on the (stupid) curved desk – woohoo!

I need to do the same at home now, and then monitor the situation long term.

Unchecked Exceptions

On our new project at work we’re using JPA sitting on top of Hibernate.

I’ve used Hibernate several times now and am familiar with it.

JPA is mostly similar in use but there are a few gotchas.

One that got me the other day was what happens when you write a query that you expect to return a single result.

In Hibernate I’d have called query.uniqueResult();

The Javadoc for that method says:

Convenience method to return a single instance that matches the query, or null if the query returns no results.

So, the query either returns my object or null (an exception is thrown if my query returns more than one result – fair enough).

I had to do something similar in JPA-land so I looked at its Query class.

It offered a similarly named method: query.getSingleResult().

All good, I wrote my code, compiled it and restarted my application server.

Unfortunately, when I ran the code, it fell over with a NoResultException.

For my particular query, there were no results in our test database.

Fine, my code can deal with that, but clearly the JPA method works quite differently from the Hibernate version.

Its Javadoc says:

Execute a SELECT query that returns a single result.

Returns:

the result

Throws:

NoResultException – if there is no result


So, unlike the Hibernate version this one will throw an exception if the query returns no results.

Hmmmm, I think I prefer the Hibernate version.
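If you do want the Hibernate-style "result or null" behaviour, one option is a small wrapper that catches `NoResultException` and returns `null`. The sketch below is my own assumption rather than anything JPA provides: so that it compiles without JPA on the classpath, it defines a stand-in `NoResultException` (in real code you would import `javax.persistence.NoResultException` instead), and `SingleResults.singleResultOrNull` is a hypothetical helper taking a `Supplier`, which you could call as `singleResultOrNull(query::getSingleResult)`.

```java
import java.util.function.Supplier;

/**
 * Stand-in for javax.persistence.NoResultException so this sketch compiles
 * without JPA on the classpath. In real code, import the JPA class instead.
 */
class NoResultException extends RuntimeException {
    NoResultException(String message) {
        super(message);
    }
}

class SingleResults {

    /**
     * Hibernate-style behaviour on top of a JPA-style lookup: returns the
     * single result, or null if the lookup throws NoResultException.
     * Any other exception (e.g. too many results) still propagates.
     * Hypothetical helper; with JPA: singleResultOrNull(query::getSingleResult)
     */
    static <T> T singleResultOrNull(Supplier<T> lookup) {
        try {
            return lookup.get();
        } catch (NoResultException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String found = singleResultOrNull(() -> "alice");
        String missing = singleResultOrNull(() -> {
            throw new NoResultException("no rows");
        });
        System.out.println(found);   // alice
        System.out.println(missing); // null
    }
}
```

Only the no-result case is swallowed; any other exception, such as JPA's `NonUniqueResultException` when the query matches more than one row, still propagates to the caller.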

Of course, if it had thrown a checked exception my code would not have even compiled.

As it was, it was just luck that the database had no results, so I found the problem right away.

I’m not saying unchecked exceptions are bad; on the whole I prefer them.

But there’s a certain element of retraining your brain to no longer rely on the compiler to tell you that you’re dealing with all possible error conditions.

I know, I know, there wouldn’t be a problem if I’d read the Javadoc up front, but how many people can honestly say that they read the Javadoc for every new method the first time that they call it?

Purging jsessionids

jsessionid is the parameter that a servlet engine adds to your site’s URLs if your config uses cookies for session tracking but the user viewing the site doesn’t have cookies enabled.

It then allows a cookie-less user to use your site and maintain their session.

It seems like a good idea but it’s a bit flawed.

The author of randomCoder has summarised the flaws quite well.

Every link on your site needs manual intervention

Cookieless sessions are achieved in Java by appending a string of the format ;jsessionid=SESSION_IDENTIFIER to the end of a URL. To do this, all links emitted by your website need to be passed through HttpServletResponse.encodeURL(), either directly or through mechanisms such as the JSTL <c:url /> tag. Failure to do this for even a single link can result in your users losing their session forever.

Using URL-encoded sessions can damage your search engine placement

To prevent abuse, search engines such as Google associate web content with a single URL, and penalize sites which have identical content reachable from multiple, unique URLs. Because a URL-encoded session is unique per visit, multiple visits by the same search engine bot will return identical content with different URLs. This is not an uncommon problem; a test search for ;jsessionid in URLs returned around 79 million search results.

It’s a security risk

Because the session identifier is included in the URL, an attacker could potentially impersonate a victim by getting the victim to follow a session-encoded URL to your site. If the victim logs in, the attacker is logged in as well – exposing any personal or confidential information the victim has access to. This can be mitigated somewhat by using short timeouts on sessions, but that tends to annoy legitimate users.

There’s one other factor for me too: public users of my site don’t require cookies, so I really don’t need jsessionids at all.

Fortunately, he also presents an excellent solution to the problem.

The solution is to create a servlet filter which will intercept calls to HttpServletResponse.encodeURL() and skip the generation of session identifiers. This will require a servlet engine that implements the Servlet API version 2.3 or later (J2EE 1.3 for you enterprise folks). Let’s start with a basic servlet filter:

He then goes on to dissect the code section by section and presents a link at the end to download it all.

So I downloaded it, reviewed it, tested it and implemented it on my site.

It works a treat!

However, I still had a problem: Google and other search engines still have lots of links to my site with jsessionid in the URL.

I wanted a clean way to remove those links from their indexes.

Obviously I can’t make Google do that directly.

But I can do it indirectly.

The trick is first to find a way to rewrite incoming URLs that contain a jsessionid to drop that part of the URL.

Then to tell the caller of the URL to not use that URL in future but to use the new one that doesn’t contain jsessionid.

Sounds complicated, but there are ways of doing both.

I achieved the first part using an Apache module called mod_rewrite.

This allows me to map an incoming URL to a different URL – it’s commonly used to provide clean URLs on Web sites.

For the second part there is a feature of the HTTP spec that allows me to indicate that a link has been permanently changed and that the caller should update their link to my site.

301 Moved Permanently

The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible.

So, putting these two together, I wrote the following mod_rewrite rules for Apache.


RewriteEngine On
RewriteRule ^/(\w+);jsessionid=\w+$ /$1 [L,R=301]
RewriteRule ^/(\w+\.go);jsessionid=\w+$ /$1 [L,R=301]

The first rule says that any URL of the form /name;jsessionid=... is redirected to the same URL without the jsessionid.

The second does the same but maps anything ending in .go – I was too lazy to work out a single pattern to do both types of URLs in one line.

And I used that all-important 301 code to persuade Google to update its index to the new link.
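As an aside on the single-pattern question: the two rules can probably be folded into one by making the `.go` suffix optional with a non-capturing group. Here is that pattern exercised in Java so it is easy to test; the class and method names are hypothetical. Apache 2.x's mod_rewrite uses PCRE, which supports `(?:...)`, so the same pattern should work in a single `RewriteRule`, but treat that as untested. Like the original rules, it assumes a single-segment path and a session ID made only of word characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JsessionidStripper {

    // One pattern covering both Apache rules: a single path segment of word
    // characters, optionally ending in ".go", followed by ;jsessionid=...
    private static final Pattern SESSION_URL =
            Pattern.compile("^/(\\w+(?:\\.go)?);jsessionid=\\w+$");

    /** Returns the path with the jsessionid stripped, or null if it does not match. */
    static String redirectTarget(String path) {
        Matcher m = SESSION_URL.matcher(path);
        return m.matches() ? "/" + m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(redirectTarget("/archive;jsessionid=A1B2C3"));   // /archive
        System.out.println(redirectTarget("/search.go;jsessionid=A1B2C3")); // /search.go
        System.out.println(redirectTarget("/archive"));                     // null
    }
}
```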

So, from now on, my pages will no longer output jsessionids, and any incoming links that include them will have them stripped out.

In other words: jsessionids purged.

How to mentor programmers

I was reading an entry over at Raganwald where the author talks about managing projects.

He covers a list of things that one should try to do to ensure a project is a success.

One of his main points is that a tech lead should always know exactly what everyone is doing on a daily basis.

Whenever I’ve allowed the details of a project to escape me, I’ve failed. […] And I can tell you, whenever the details of a project have slipped from my grasp, the project has started to drift into trouble.

I’ve been the tech lead on many projects over the last 6 years or so but I’ve always stopped short of asking people what they are doing on a daily basis.

It has always struck me as a form of “micro-managing”, which is something that I’ve hated when I’ve been on the receiving end of it.

I should clarify though; I always know who is working in what area on a day to day basis (Jim is working on the email module for these two weeks), but I don’t necessarily know what specific task they are trying to achieve on a particular day (I don’t know if Jim is writing display code today, or back-end logic).

However, after reflecting on how this has worked on my projects, I have to conclude that my approach was wrong.

I should know what people are doing – I just need to find a balance between knowing what they are doing and getting on their nerves.

Clearly a balance can be found.

I make no apologies for now insisting on knowing exactly who, what, where, when, and why. There’s a big difference between being asked to explain your work in detail and being told how to do your job.

I’m not sure of the best way to handle getting to that level of detail though.

Daily meetings where everyone reports progress?

I find these to be a bit of a waste of time (especially in large teams) where you talk for 2 minutes then sit and listen to everyone else for 20 minutes.

Walking around and sitting down next to each person in turn (sort of like a Doctor doing his rounds)?

This is better for the team as they are only interrupted when I am talking to them.

I’ve done this before but never in a “tell me what code you are writing now” way.

I still think this might annoy me if I were on the receiving end of it.

Another way?

What about other tech lead people reading this, what works for you?

Or, if you’re on the receiving end of this, where exactly does that all-important line sit?

Do algorithms still matter?

I attended the BCS Mini-SPA event a few months back.

The premise of the mini SPA is as follows:

If you attended SPA2006 you might find that the miniSPA2006 programme allows you to catch up with sessions you didn’t select at the event.



The annual SPA conference (formerly known as OT) is where IT practitioners gather to reflect, exchange ideas and learn.

It also served as a convenient advert for next year’s full SPA event.

It was also free, had a free lunch and got me out of the office for a day, so it pretty much fulfilled all my criteria.

The structure of the day was 6 sessions, divided into two parallel streams.

I attended “Distributed workforces”, “Modelling with Views” and “A Good Read”; it was the last of these that really interested me.

This was a panel of five people who had each proposed a book to discuss. Each member of the panel then read each book so they could discuss it and give their views and insights.

The really interesting part for me was that someone proposed Programming Pearls by Jon Bentley.

I’ve owned a copy of this book for years but have yet to finish it (it’s back on my “to read” list now though).

Everybody roundly praised the book but one of the members of the panel questioned whether we needed to know that level of detail when it comes to coding efficient algorithms – “wouldn’t it be simpler to throw more CPU and RAM at a problem?” they said.

Someone in the audience then countered that algorithm efficiency was relevant once again when programming Web-apps. They said something along the lines of “Wait until 100 people hit that page on your site”.

Sadly the session ran out of time at that point so no conclusion was reached.

My own belief is that you do need to understand code at that level, especially if you write Web sites or other similar client/server apps with many concurrent client requests.

I’m not saying everyone should know the Quicksort algorithm inside out, but if you program in Java (for example) you should know the difference between a Vector, an ArrayList and a plain old array and when to use each.
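As a rough illustration of those differences (the class name and values are made up), here are the three side by side in Java:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

class ListChoices {

    static String demo() {
        // Plain array: fixed length chosen up front, no locking, and it holds
        // primitives directly (no boxing). Good when the size is known.
        int[] scores = new int[3];
        scores[0] = 90;

        // ArrayList: grows as needed and is not synchronized. The usual choice
        // for data confined to one thread, such as a local in a request handler.
        List<Integer> recent = new ArrayList<>();
        recent.add(90);
        recent.add(75);

        // Vector: also grows as needed, but its methods are synchronized, so
        // each call goes through a lock even when only one thread uses it.
        List<Integer> legacy = new Vector<>();
        legacy.add(90);

        return scores.length + " " + recent.size() + " " + legacy.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 3 2 1
    }
}
```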

I have had personal experience of a badly written for loop bringing down a Web site on launch day.

The for loop in itself wasn’t the worst code ever written by any means, but it was probably executed 30 to 40 times per individual home page hit.

Multiply that by a few dozen concurrent hits (it was a busy site) and any flaws in that code were mercilessly exposed.

Embarrassingly for me, it was my code. Oops.

Ever since that day I’ve been unable to forget that no amount of “CPU and RAM” (and we had a lot) will help if you don’t get your algorithms right in the first place.
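The loop from that launch day isn't shown here, so the following is a hypothetical illustration of the same lesson rather than the code in question. Calling `List.contains` inside a loop quietly turns a linear job into a quadratic one; building a `HashSet` once gives the same answer with constant-time lookups on average. Timings vary by machine, so the sketch prints them rather than promising any particular figure.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MembershipCost {

    /** List.contains scans the list, so this does O(wanted * catalogue) comparisons. */
    static int countWithList(List<String> wanted, List<String> catalogue) {
        int hits = 0;
        for (String w : wanted) {
            if (catalogue.contains(w)) {
                hits++;
            }
        }
        return hits;
    }

    /** Builds a HashSet once; each lookup is then O(1) on average. */
    static int countWithSet(List<String> wanted, List<String> catalogue) {
        Set<String> index = new HashSet<>(catalogue);
        int hits = 0;
        for (String w : wanted) {
            if (index.contains(w)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> catalogue = new ArrayList<>();
        List<String> wanted = new ArrayList<>();
        for (int i = 0; i < 5000; i++) {
            catalogue.add("item" + i);         // item0 .. item4999
            wanted.add("item" + (i + 2500));   // item2500 .. item7499
        }
        long start = System.nanoTime();
        int a = countWithList(wanted, catalogue);
        long listNanos = System.nanoTime() - start;
        start = System.nanoTime();
        int b = countWithSet(wanted, catalogue);
        long setNanos = System.nanoTime() - start;
        System.out.println(a + " vs " + b + " hits; list " + listNanos / 1_000_000
                + " ms, set " + setNanos / 1_000_000 + " ms");
    }
}
```

Multiply whatever the slow version costs by 30 to 40 calls per page and a few dozen concurrent hits, and extra hardware stops looking like a fix.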

Need a new Programming Language

I need to learn a new Programming Language.

This is for two reasons.

In my time as a programmer I’ve learned and used: Basic, Ada, C, C++, VB, Perl and Java.

So that’s 7 (5 if you merge Basic with VB and C with C++).

It’s a reasonable number, if a little on the small side.

But that list is only half the truth; most of those languages I’ve not touched in years, some I’m definitely never going to touch again (Ada!).

The only ones I now use in any form are Java and Perl.

I use Java in my day job and to write things like this site, and I use Perl for the odd scripting task.

My first reason for needing a new language is a pragmatic one. I need to learn a new scripting language.

I need a new scripting language because every time I go to do something in Perl I find I have forgotten how to do one of:

a. list the files in a directory.

b. pass an array to a function.

c. iterate over an array.

d. all of the above.

This is because I find Perl’s syntax to be on the whole inconsistent and unintuitive.

So, I’ve had enough of Perl’s kooky ways and would like to learn something a little bit more “sane” (definition: consistent and intuitive syntax).

My second reason goes a little deeper.

I’ve been reading a few articles and blogs of late that in some way or another point out some problems with Java.

A Quick Tour of Ruby

Java doesn’t provide a utility method for opening a text file, reading or processing its lines, and closing the file. This is something you do pretty frequently.

— Steve Yegge
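To show what Yegge means, here is the long-hand pattern in Java for reading a text file line by line, alongside the one-call `Files.readAllLines` utility that Java 7 later added. This is a sketch with hypothetical class and method names; the long-hand version uses `FileReader`, which reads with the platform's default character encoding.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReadLines {

    /** The long-hand way: open the file, loop over its lines, close it yourself. */
    static List<String> readLongHand(String fileName) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(new FileReader(fileName)); // default charset
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            reader.close(); // must happen even if readLine() throws
        }
        return lines;
    }

    /** Java 7 later added a one-call utility for the same job. */
    static List<String> readJava7(Path path) throws IOException {
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines", ".txt");
        Files.write(tmp, Arrays.asList("first", "second"), StandardCharsets.UTF_8);
        System.out.println(readLongHand(tmp.toString())); // [first, second]
        System.out.println(readJava7(tmp));               // [first, second]
        Files.delete(tmp);
    }
}
```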

Can Your Programming Language Do This?

Java required you to create a whole object with a single method called a functor if you wanted to treat a function like a first class object.

— Joel Spolsky
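Here is roughly what Joel is describing, as a sketch with hypothetical names: a one-method `Transformer` interface (the functor) and an anonymous class that exists only to pass a single function to a hand-rolled `map`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Functors {

    /** A one-method interface: the "functor". Hypothetical name. */
    interface Transformer {
        String apply(String input);
    }

    /** Applies a Transformer to every element: a hand-rolled "map". */
    static List<String> map(List<String> items, Transformer f) {
        List<String> out = new ArrayList<String>();
        for (String item : items) {
            out.add(f.apply(item));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ruby", "java");
        // A whole anonymous class just to pass one function.
        List<String> upper = map(words, new Transformer() {
            public String apply(String input) {
                return input.toUpperCase();
            }
        });
        System.out.println(upper); // [RUBY, JAVA]
    }
}
```

For what it's worth, Java 8 later added lambdas, so with the same interface the call can be written as `map(words, s -> s.toUpperCase())`.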

What was interesting was that once I was over my initial denial of such heresy, I found myself mostly agreeing with what they had said.

The surprising part for me was that I had not consciously noticed these things myself – even though I now realise such things had annoyed me at the time.

The reason that they had not bubbled up to the level of consciousness was that I could not see beyond the Java language itself.

Something was awkward to do in Java (ever tried reading a file?) – well, that’s just the way Java is.

I couldn’t question it; I was so steeped in the ways of Java that I could see no alternatives.

This worried me somewhat: what other concepts and ideas was I ignorant of because of my Java mindset?

Sometimes you need to take a step back and get a fresh perspective on things.

And what better way than to learn a new programming language?

I’m a busy guy though.

I simply can’t afford to take two weeks off just to learn a new language.

So, to be pragmatic (I’m a pragmatic guy too) I’m going to try to solve both of these problems with a single language.

So, I want a general purpose language that’s also good for scripting work.

My shortlist of languages is not long:

Python.

I’ve dabbled with Python.

It’s fun, quick, easy etc.

I’ve not done enough to know if it’s “sane” as defined above, it doesn’t seem as freaky as Perl though.

Ruby.

Ummm, everyone’s talking about it.

A friend of mine is learning it and he’s not swearing about it too much yet.

Apparently it’s mostly “sane”.

I’ve not completely decided yet, I’m leaning towards Ruby at the moment mind.

Anyone care to convince me either way, or suggest other languages I should be looking at?

Log4j with xml configuration

Like many Java developers, I use Log4j as my logging solution.

However, unlike many Java developers (in my experience anyway), I configure Log4j using XML rather than a properties file.

XML has always struck me as a neater way to represent what is essentially a hierarchical configuration.

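However, the XML format isn’t terribly well documented (although the latest Log4j download comes with lots of example XML files).

Here’s an example entry from one of my XML files that sets up a DailyRollingFileAppender, appending to the existing file on startup. Despite the class name, a datePattern of '.'yyyy-MM rolls the file over monthly; '.'yyyy-MM-dd would start a new file each day.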

    

<!-- An appender which writes to file -->
<appender name="FILE" class="org.apache.log4j.DailyRollingFileAppender">
    <param name="file" value="${user.home}/conf/apache/logs/boncey_app.log" />
    <param name="datePattern" value="'.'yyyy-MM" />
    <param name="append" value="true" />
    <layout class="org.apache.log4j.PatternLayout">
        <param name="ConversionPattern" value="%d [%t] %-5p %C{6} (%F:%L) - %m%n"/>
    </layout>
</appender>

Getting Tomcat to use a log4j XML file is a bit fiddly. The log4j manual explains how to set it up.

Under Tomcat 3.x and 4.x, you should place the log4j.properties under the WEB-INF/classes directory of your web-applications. Log4j will find the properties file and initialize itself. This is easy to do and it works.

However, it’s not immediately clear what to do if you use an XML file.

Dropping log4j.xml into WEB-INF/classes doesn’t work – or it didn’t for me on Tomcat 5.5.

Log4j manual to the rescue once more.

You can also choose to set the system property log4j.configuration before starting Tomcat. For Tomcat 3.x The TOMCAT_OPTS environment variable is used to set command line options. For Tomcat 4.0, set the CATALINA_OPTS environment variable instead of TOMCAT_OPTS.

So I put export CATALINA_OPTS="-Dlog4j.configuration=log4j.xml" into my shell startup scripts, dropped the log4j.xml file into my WEB-INF/classes dir and I was logging in no time!

One other thing you may find once you start using log4j is that lots of other Apache code uses it too, so once it’s set up your logs start filling up with lots (and I mean lots) of extra logging from the Apache code.

The traditional way to stop this is to add an entry as follows:


<category name="org.apache.tomcat" additivity="false">
    <priority value="info" />
    <appender-ref ref="FILE" />
</category>

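This says that for anything logged from org.apache.tomcat, anything below INFO level is suppressed, i.e. all DEBUG level logging. The additivity="false" attribute also stops those messages from being passed on to the root logger’s appenders.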

The problem with this approach is that you have to keep adding new entries to suppress the logging from other packages too (Hibernate is a good example of this).

There is an alternative approach which is stunningly obvious once you hear it.

It is of course, better practice, to keep your application log4j.properties ROOT log level set to WARN or ERROR, and just turn on debugging for your own application classes! (ie. org.appfuse=DEBUG )

So obvious! Can’t believe I never thought of that.

So I set my root level to WARN, then added category entries to log my own code at DEBUG level.

Voila, no more Tomcat cruft in my logs.
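The reason this works is that log4j finds a logger's effective level by walking up the dotted logger name until it reaches an ancestor with a level set, falling back to the root logger. Here is a toy Java model of that lookup (not log4j's own code; it ignores appender thresholds and uses only a subset of log4j's levels), with the root at WARN and `org.appfuse` at DEBUG as in the quote above.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of how log4j picks a logger's effective level: walk up the dotted
 * name until a configured ancestor is found, falling back to the root level.
 */
class EffectiveLevel {

    enum Level { DEBUG, INFO, WARN, ERROR }  // most to least verbose

    private final Map<String, Level> configured = new HashMap<>();
    private final Level rootLevel;

    EffectiveLevel(Level rootLevel) {
        this.rootLevel = rootLevel;
    }

    void set(String loggerName, Level level) {
        configured.put(loggerName, level);
    }

    Level levelFor(String loggerName) {
        String name = loggerName;
        while (true) {
            Level level = configured.get(name);
            if (level != null) {
                return level;                  // nearest configured ancestor wins
            }
            int dot = name.lastIndexOf('.');
            if (dot < 0) {
                return rootLevel;              // nothing configured: use the root
            }
            name = name.substring(0, dot);     // "org.appfuse.dao" -> "org.appfuse"
        }
    }

    /** An event is logged if its level is at or above the logger's effective level. */
    boolean isEnabled(String loggerName, Level eventLevel) {
        return eventLevel.compareTo(levelFor(loggerName)) >= 0;
    }

    public static void main(String[] args) {
        EffectiveLevel config = new EffectiveLevel(Level.WARN);  // root at WARN
        config.set("org.appfuse", Level.DEBUG);                  // own code at DEBUG
        System.out.println(config.isEnabled("org.appfuse.dao.UserDao", Level.DEBUG));     // true
        System.out.println(config.isEnabled("org.apache.catalina.Session", Level.DEBUG)); // false
        System.out.println(config.isEnabled("org.hibernate.SQL", Level.WARN));            // true
    }
}
```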

That concludes my ramblings on log4j and XML; hopefully they’ll be of use to someone (other than me, of course).