Fixing UTF-8 encoding on my Tomcat websites

Just spent a few hours fixing some UTF-8 encoding problems on my blog.

I had a problem with non-ASCII characters being displayed incorrectly.

Turns out that I had a number of different problems to solve.

First I read through Cagan Senturk’s (very useful) UTF-8 Encoding fix (Tomcat, JSP, etc) post.

Fortunately I’d already read Joel Spolsky’s epic Unicode post so I had the theory.

First off I needed to make sure all my JSPs had the correct pageEncoding at the top.

I also added the ‘Content-Type’ meta header to my template file.

Next I needed to wire in the EncodingFilter that Cagan so kindly provided.

That meant that non-ASCII characters in my JSPs rendered fine, but I still had two problems.

Any text that I entered into a form was still being screwed up, as was anything read from the database.

Stack Overflow had the solution (as usual) for the form input.

I needed to amend my Tomcat config to ensure my connector had URIEncoding="UTF-8" added to it.

That fixed the form input problem.
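To show why the connector setting matters, here is a minimal sketch of the same bytes being decoded with two different charsets. It is only an illustration using the JDK's own URLEncoder and URLDecoder, not Tomcat's actual decoding code; the class name UriEncodingDemo and the sample word are my own. Older Tomcat versions decode the URI as ISO-8859-1 unless told otherwise, which is roughly the second case below.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class UriEncodingDemo {
    // Percent-encode a value using the named charset (roughly what a browser
    // does when it submits a form from a UTF-8 page).
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    // Decode a percent-encoded value using the named charset.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String sent = encode("caf\u00e9", "UTF-8");   // "caf%C3%A9"
        System.out.println(sent);
        // Same charset on both sides: the original text comes back.
        System.out.println(decode(sent, "UTF-8"));
        // Mismatched charset: each UTF-8 byte becomes its own Latin-1 character.
        System.out.println(decode(sent, "ISO-8859-1"));
    }
}
```

The two-byte UTF-8 sequence for the accented letter turns into two separate Latin-1 characters in the mismatched case, which is the kind of garbage I was seeing in my forms.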

That just left my Postgres database.

I first used ‘psql -l’ to see what encoding my database had.

It was set to ‘LATIN1’ – obviously it needed to be ‘UTF-8’.

To fix this I needed to drop and recreate my database.

Luckily this was only my local development database (my production one was already UTF-8) so that was simple enough.

Finally, after all that was done, I had proper UTF-8 support on my site.

And to prove it, here’s some non-ASCII content from the UTF-8 SAMPLER website.

¥ · £ · € · $ · ¢ · ₡ · ₢ · ₣ · ₤ · ₥ · ₦ · ₧ · ₨ · ₩ · ₪ · ₫ · ₭ · ₮ · ₯

Pagination, thing of the past

I’ve been trying to choose a motherboard to replace a dodgy one in a computer.

As usual I went to dabs to get one as I’ve always found them to be reliable.

I needed a motherboard that supported a given chip-set, had on-board video and LAN and did SATA.

Finding a board that meets all those requirements on dabs used to be a real pain.

You had to click on the motherboards section, then do some searches on strings that may or may not have matched the data they had entered for each product.

You also had to hope that they had followed a consistent vocabulary (Socket-775 vs S775 vs S-775 etc).

It’s a wonder I found anything!

A few years back though dabs re-did their site and introduced a new tool for finding products.

Essentially, they made searching redundant with a clever filtering tool that allowed you to drill down by any of the major criteria for a product.

As can be seen from the screen-shot it’s really quick now to choose a motherboard that matches my criteria and is in stock.

When I first saw this I was overjoyed but 2-3 years later I still see very few sites doing anything similar.

Not only does it make pagination redundant (for all but very large data-sets) but it also does away with so-called “advanced search”.

Just chuck out all the results and let the user filter them.

The “page 2 of 20” links become almost redundant.
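For anyone wondering what this sort of drill-down looks like in code, here is a rough sketch. I have no idea how dabs actually built theirs; the Board class, its fields and the sample data are all made up for illustration. The idea is just that every selected criterion is a predicate, and a product stays in the list only if it passes all of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class FacetFilterDemo {
    // Hypothetical product record; the fields are illustrative only.
    static final class Board {
        final String name;
        final String socket;
        final boolean onboardVideo;
        final boolean inStock;

        Board(String name, String socket, boolean onboardVideo, boolean inStock) {
            this.name = name;
            this.socket = socket;
            this.onboardVideo = onboardVideo;
            this.inStock = inStock;
        }
    }

    // One small factory method per facet the user can pick.
    static Predicate<Board> socketIs(String socket) {
        return b -> b.socket.equals(socket);
    }

    static Predicate<Board> hasOnboardVideo() {
        return b -> b.onboardVideo;
    }

    static Predicate<Board> isInStock() {
        return b -> b.inStock;
    }

    // Keep only the boards that satisfy every selected criterion.
    static List<Board> filter(List<Board> all, List<Predicate<Board>> criteria) {
        List<Board> matches = new ArrayList<>();
        for (Board board : all) {
            boolean keep = true;
            for (Predicate<Board> criterion : criteria) {
                if (!criterion.test(board)) {
                    keep = false;
                    break;
                }
            }
            if (keep) {
                matches.add(board);
            }
        }
        return matches;
    }

    // Made-up sample data.
    static List<Board> sampleBoards() {
        return Arrays.asList(
                new Board("Board A", "S775", true, true),
                new Board("Board B", "S775", false, true),
                new Board("Board C", "AM2", true, true),
                new Board("Board D", "S775", true, false));
    }

    public static void main(String[] args) {
        List<Board> result = filter(sampleBoards(),
                Arrays.asList(socketIs("S775"), hasOnboardVideo(), isInStock()));
        for (Board board : result) {
            System.out.println(board.name);
        }
    }
}
```

Each extra criterion the user clicks just adds another predicate to the list, so the result set shrinks without anyone having to type a search string.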

Now obviously this sort of filtering solution only really works with “filterable” data.

But even when there’s only a small amount of data like that it can still help.

I implemented it at home on my photo database – the only meta-data I used was the date the photo was taken (or imported in the case of film photos) and it’s still really useful.

I have thousands of photos in there with little in the way of tagging so searching was next to impossible – the filtering thing makes it much easier to find things though.

I just wish more sites would do this.

Do algorithms still matter?

I attended the BCS Mini-SPA event a few months back.

The premise of the mini SPA is as follows:

If you attended SPA2006 you might find that the miniSPA2006 programme allows you to catch up with sessions you didn’t select at the event.



The annual SPA conference (formerly known as OT) is where IT practitioners gather to reflect, exchange ideas and learn.

It also served as a convenient advert for next year’s full SPA event.

It was also free, had a free lunch and got me out of the office for a day, so it pretty much fulfilled all my criteria.

The structure of the day was 6 sessions, divided into two parallel streams.

I attended “Distributed workforces”, “Modelling with Views” and “A Good Read” but the one that really interested me was “A Good Read”.

This was a panel of five people who had each proposed a book to discuss. Each member of the panel then read each book so they could discuss it and give their views and insights.

The really interesting part for me was that someone proposed Programming Pearls by Jon Bentley.

I’ve owned a copy of this book for years but have yet to finish it (it’s back on my “to read” list now though).

Everybody roundly praised the book but one of the members of the panel questioned whether we needed to know that level of detail when it comes to coding efficient algorithms – “wouldn’t it be simpler to throw more CPU and RAM at a problem?” they said.

Someone in the audience then countered that algorithm efficiency was relevant once again when programming Web-apps. They said something along the lines of “Wait until 100 people hit that page on your site”.

Sadly the session ran out of time at that point so no conclusion was reached.

My own belief is that you do need to know code at that level, especially if you write Web sites or other similar client/server apps with many concurrent client requests.

I’m not saying everyone should know the Quicksort algorithm inside out, but if you program in Java (for example) you should know the difference between a Vector, an ArrayList and a plain old array and when to use each.
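As a quick reminder of what those differences are, here is a small sketch (the class and method names are mine, purely for illustration). A plain array has a fixed size and can hold primitives directly. An ArrayList grows on demand but is not synchronized. A Vector also grows on demand, but every method call takes a lock, which is overhead you don't need unless several threads really do share the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

class CollectionChoiceDemo {
    // Plain array: fixed size, holds ints directly, no boxing and no resizing.
    static long sumWithArray(int n) {
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = i;
        }
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Works with any List: pass an ArrayList (unsynchronized) or a
    // Vector (every call synchronized). Values are boxed as Integer.
    static long sumWithList(List<Integer> values, int n) {
        for (int i = 0; i < n; i++) {
            values.add(i);
        }
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println(sumWithArray(n));
        System.out.println(sumWithList(new ArrayList<Integer>(), n));
        System.out.println(sumWithList(new Vector<Integer>(), n));
    }
}
```

All three give the same answer, of course. The point is what each one costs to get there when the code runs many times per page hit.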

I have had personal experience of a badly written for loop bringing down a Web site on launch day.

The for loop in itself wasn’t the worst code ever written by any means, but it was probably executed 30 to 40 times per individual home page hit.

Multiply that by a few dozen concurrent hits (it was a busy site) and any flaws in that code were mercilessly exposed.

Embarrassingly for me, it was my code. Oops.

Ever since that day I’ve been unable to forget that no amount of “CPU and RAM” (and we had a lot) will help if you don’t get your algorithms right in the first place.

New photoblog

I’ve put together a photoblog for occasional photo posts.

It’s called ‘diminishing horizons’ and it has an RSS feed too.

It’s quite a minimalist design, and is also my first pure CSS-driven design. No more HTML tables for me.

In other news, my programming project is still going ahead and should be nearing completion soon. I’m just working on design, layout and copy at the moment.

Programmer’s Diary (part four)

For my app, I had planned to cache some data from the Flickr API in a database (hence the need for Hibernate).

But I was eager to do a “proof of concept” thing and make sure everything I wanted to do could be done before I wrapped my head around Hibernate.

So I’ve decided to compromise by caching stuff in the session for now.

I had to tweak more Flickrj code to make sure everything I wanted to store in the session implemented Serializable though.
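The reason this matters is that a servlet container may write session attributes out using Java serialization (to persist or replicate the session), and that fails for any object that doesn't implement Serializable. Here is a minimal sketch of the check. CachedPhoto is a made-up stand-in, not one of Flickrj's real classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SessionCacheDemo {
    // Hypothetical cached value; not a real Flickrj class.
    static final class CachedPhoto implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        final String title;

        CachedPhoto(String id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    // Returns true if the object can be written with Java serialization.
    static boolean canSerialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (IOException e) {
            // NotSerializableException is a subclass of IOException.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new CachedPhoto("1", "Covent Garden")));
        System.out.println(canSerialize(new Object()));
    }
}
```

Running a check like this over everything I planned to put in the session is a cheap way to find the classes that still need tweaking.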

So, having got Flickrj working, and stuff cached in the session to ease the load on the Flickr API, I started writing the interesting code that actually does stuff.

This part took less time than all the other fiddling around (always the way) so after only a few hours I had a simple prototype app that pretty much proved the concept.

There’s no design or look and feel work yet, just black text on a white background and no navigation other than a link to the home page.

So, it needs a bit of tarting up.

But it works, which is kind of fundamental really.

Next steps are to test everything and do some look and feel work.

At some point I’ll even send it live.

delicious powered linkblog

I’ve added a linkblog to my site. It’s on the right-hand sidebar of my blog homepage.

The nice thing about this linkblog (for me) is that it’s automatically loaded from del.icio.us.

To add something there I just add a URL to my del.icio.us bookmarks with a linkblog tag.

Within 30 minutes it appears on the blog here. Fully automated – nice.

The mechanism behind this is the same as the automatically updated Bloglines links further down the sidebar.

The only difference is the stylesheet, as del.icio.us uses RDF and Bloglines uses OPML.

The other tweak I made in the stylesheet is to only show the ten newest links.

So, here’s the stylesheet that I made if it’s useful to anyone.

Programmer’s Diary (part three)

Robustification*

I want to ensure the app is pretty robust.

By which I mean that it shouldn’t crash (much).

This obviously involves writing proper code.

It also involves dealing with user input and the Flickr API in a sensible manner.

It’s reasonable to state that no user input should cause the app to crash.

To achieve this aim I will be treating all user input with extreme prejudice.

By user input I don’t just mean forms on the site.

I also mean any form of “URL hacking”.
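As a small example of what I mean by extreme prejudice, here is a sketch of parsing a page number from a request parameter. The parameter and the method name are hypothetical. Whatever arrives (nothing, junk, a negative number, something absurdly large), the caller gets back a usable value instead of an exception.

```java
class InputGuardDemo {
    // Parse a page number from untrusted input. Anything missing, malformed
    // or outside 1..maxPage falls back to defaultPage.
    static int parsePage(String raw, int defaultPage, int maxPage) {
        if (raw == null) {
            return defaultPage;
        }
        try {
            int page = Integer.parseInt(raw.trim());
            if (page < 1 || page > maxPage) {
                return defaultPage;
            }
            return page;
        } catch (NumberFormatException e) {
            return defaultPage;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePage("3", 1, 10));
        System.out.println(parsePage("abc", 1, 10));
        System.out.println(parsePage("-5", 1, 10));
        System.out.println(parsePage(null, 1, 10));
    }
}
```

Falling back to a default is one choice; showing an error page would be just as valid. The important bit is that bad input never reaches the code that assumes it is good.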

I also need to look at all the possible error conditions that the Flickr API can throw at me and deal with them gracefully.

Errors from Flickr will either be network errors (come back later when it’s better), invalid XML (will need to be investigated to see if I can work around it) or one of its published error codes (should be the easiest to deal with).
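Those three categories could be sorted out with something like the sketch below. This is an assumption about how I might structure it, not how Flickrj reports errors: ApiErrorException is a made-up class, and treating IOException as "network" and SAXException as "invalid XML" is my own simplification.

```java
import java.io.IOException;
import org.xml.sax.SAXException;

class FlickrErrorDemo {
    enum Action { RETRY_LATER, INVESTIGATE, HANDLE_CODE, UNEXPECTED }

    // Hypothetical exception carrying one of the API's published error codes.
    static final class ApiErrorException extends Exception {
        private static final long serialVersionUID = 1L;
        final int code;

        ApiErrorException(int code, String message) {
            super(message);
            this.code = code;
        }
    }

    // Map a failure onto one of the three categories described above.
    static Action classify(Exception e) {
        if (e instanceof IOException) {
            return Action.RETRY_LATER;   // network trouble: come back later
        }
        if (e instanceof SAXException) {
            return Action.INVESTIGATE;   // response was not valid XML
        }
        if (e instanceof ApiErrorException) {
            return Action.HANDLE_CODE;   // published error code: easiest case
        }
        return Action.UNEXPECTED;        // anything else is a bug on my side
    }

    public static void main(String[] args) {
        System.out.println(classify(new IOException("connection reset")));
        System.out.println(classify(new SAXException("unexpected end of document")));
        System.out.println(classify(new ApiErrorException(1, "example error")));
    }
}
```

Having one place that decides what each failure means should keep the page code free of scattered try/catch blocks.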

* Robustify is a word some shit contractor used in a CVS commit message after I slagged off his toString method.

I think it’s an awful word, so I enjoy using it in an ironic manner.

And now, for some gratuitous photos.

Empty shop
Candy Chinatown style

Programmer’s Diary (part two)

So, as discussed previously I’ll be coding against the Flickr API.

I’ve done this before and back then I wrote my own code to do it.

This was fair enough, as I was only calling about four API methods.

But as I’ll be making a lot more use of the API this time I’m going to use Flickrj instead.

Flickrj is simply a Java wrapper around the Flickr API.

It seems to cover all the stuff that I want to do, so there’s no point writing my own interface.

The first thing that I usually do when using any new Java library is to get a copy of the Javadocs onto my local Web server and get a copy of the source code onto my machine.

I then set up Vim and ctags so that I can browse and jump to the source code from within Vim (sorta like an IDE, only not as good).

Having done that I started getting to grips with Flickrj.

Unfortunately the first thing I needed to do was to authenticate against Flickr.

Flickr’s authentication system is complicated but it’s also clever. A person can authenticate against Flickr via my site without providing me with any details about themselves. Clever.

It also works in slightly different ways depending on what type of app you are writing: Web, mobile or desktop.

Flickrj didn’t seem to be set up for Web authentication and I’m writing a Web app.

Time to start hacking on the Flickrj source code.

I moved the source for Flickrj into my project so it sat alongside my own – this way I could compile once and pick up any changes to both sets of code.

Once I had done this and added the method that I needed I got it to authenticate.

I then found a number of minor bugs in Flickrj, so I had to fix them too.

I then emailed the lead developer to ask if he wanted me to send him some patches, but I haven’t had a reply yet.

Right, that’s enough for now. Next time I’ll talk about “robustifying” my app.

To go on with though, here are some random photos that I’ve taken in London these last few weeks.

Covent Garden
Lunchtime photography wander (2)
Millennium Silhouettes
Sunset through the trees

Programmer’s Diary (part one)

I’m going to keep a programmer’s diary for my next project.

It’s a Web app that will use the Flickr API.

I’m not going to say what it does yet, but I’ll post my progress here.

I’ve decided to do it in Java as that’s what I know.

I’m too impatient to start learning something like Ruby or Python now. I want to crack on and get coding.

I will try to learn something new though and that will be Hibernate.

Previously I’ve used a basic system for database access: code generation using XSLT and a thin layer for caching frequently accessed data. It’s OK but a bit clunky and restrictive.

For the rest I’ll carry on with the current Web framework that I’ve been using for a few years (an “in-house” one).

I also feel that I should have a crack at using Eclipse as an IDE.

I’m not really an IDE person (I normally use Vim) but I suppose I should at least see what I am missing out on (I used it briefly at work, it’s a bit of a monster but having a debugger is nice).

Next time I’ll talk about coding against the Flickr API.