Fixing UTF-8 encoding on my Tomcat websites

Just spent a few hours fixing some UTF-8 encoding problems on my blog.

I had a problem with non-ascii character being displayed incorrectly.

Turns out that I had a number of different problems to solve.

First I read through Cagan Senturk’s (very useful) UTF-8 Encoding fix (Tomcat, JSP, etc) post.

Fortunately I’d already read Joel Spolsky’s epic unicode post so I had the theory.

First off I needed to make sure all my JSPs had the correct pageEncoding at the top.

I also added the ‘Content-Type’ meta header to my template file.

Next I needed to wire in the EncodingFilter that Cagan so kindly provided.

That meant that non-ascii characters in my JSPs rendered fine but I still had two problems.

Any text that I entered into a form was still being screwed up, as was anything read from the database.

Stack Overflow had the solution (as usual) for the form input.

I needed to amend my Tomcat config to ensure my connector had ‘URIEncoding=”UTF-8″ ‘ added to it.

That fixed the form input problem.

That just left my Postgres database.

I first used ‘psql -l’ to see what encoding my database had.

It was set to ‘LATIN1’ – obviously it needed to be ‘UTF-8’.

To fix this I needed to drop and recreate my database.

Luckily this was only my local development database (my production one was already UTF-8) so that was simple enough.

Finally, after all that was done, I had proper UTF-8 support on my site.

And to prove it – here’s some non-ascii content from the UTF-8 SAMPLER website.

¥ · £ · € · $ · ¢ · ₡ · ₢ · ₣ · ₤ · ₥ · ₦ · ₧ · ₨ · ₩ · ₪ · ₫ · ₭ · ₮ · ₯