Just spent a few hours fixing some UTF-8 encoding problems on my blog.
Non-ASCII characters were being displayed incorrectly.
It turns out I had several separate problems to solve.
First off, I needed to make sure all my JSPs declared the correct page encoding at the top (‘<%@ page pageEncoding="UTF-8" %>’).
I also added the ‘Content-Type’ meta tag (declaring ‘text/html; charset=UTF-8’) to my template file.
Next, I needed to wire in the EncodingFilter that Cagan so kindly provided.
With that in place, non-ASCII characters in my JSPs rendered fine, but I still had two problems.
Any text that I entered into a form was still being screwed up, as was anything read from the database.
Stack Overflow had the solution (as usual) for the form input.
I needed to amend the connector in my Tomcat config (server.xml) by adding ‘URIEncoding="UTF-8"’ to it.
That fixed the form input problem.
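Why does that setting matter? URIEncoding tells Tomcat which charset to use when it turns the percent-encoded bytes of the request URI (including the query string a GET form submits) back into characters, and older Tomcat releases defaulted to ISO-8859-1. The browser, meanwhile, encodes form values using the page's encoding, which here is UTF-8. The following is a minimal, stdlib-only Java sketch of that mismatch rather than Tomcat's actual code; the class name FormMojibakeDemo and its helpers are made up for illustration, and it assumes Java 10 or later for the URLDecoder.decode(String, Charset) overload.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormMojibakeDemo {
    // U+00A3 POUND SIGN as a browser on a UTF-8 page percent-encodes it
    // in a query string: its two UTF-8 bytes, 0xC2 0xA3.
    static final String ENCODED_POUND = "%C2%A3";

    // Percent-decode a value, interpreting the raw bytes with the given charset.
    // This mirrors the choice a container makes based on its URI encoding setting.
    static String decodeAs(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    // Render a string as code points so output doesn't depend on the console encoding.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // ISO-8859-1 maps each byte to its own character: 0xC2 becomes
        // A-circumflex and 0xA3 becomes the pound sign, the classic mojibake.
        String wrong = decodeAs(ENCODED_POUND, StandardCharsets.ISO_8859_1);
        // Decoding with the charset the browser actually used recovers one character.
        String right = decodeAs(ENCODED_POUND, StandardCharsets.UTF_8);
        System.out.println("ISO-8859-1: " + codePoints(wrong)); // ISO-8859-1: U+00C2 U+00A3
        System.out.println("UTF-8: " + codePoints(right));      // UTF-8: U+00A3
    }
}
```

The same two bytes produce "Â£" or "£" depending purely on which charset the server assumes, which is exactly the kind of corruption that form input was showing.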
That just left my Postgres database.
I first used ‘psql -l’ to see what encoding my database had.
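It was set to ‘LATIN1’, when it obviously needed to be ‘UTF8’ (PostgreSQL’s name for UTF-8).
To fix this, I needed to drop and recreate my database, since PostgreSQL can’t change the encoding of an existing database.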
Luckily this was only my local development database (my production one was already UTF-8) so that was simple enough.
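As for why LATIN1 wasn't good enough: PostgreSQL's LATIN1 is ISO 8859-1, a single-byte encoding with room for only 256 characters, so much of the currency sampler at the end of this post simply has no representation in it. Here is a small, stdlib-only Java sketch that checks this against Java's ISO-8859-1 charset rather than talking to Postgres; the class and method names are hypothetical, chosen just for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Latin1CoverageCheck {
    // The currency symbols from the sampler at the end of this post,
    // written as Unicode escapes so the source file itself stays plain ASCII.
    static final String SAMPLER =
        "\u00A5\u00A3\u20AC$\u00A2\u20A1\u20A2\u20A3\u20A4\u20A5"
      + "\u20A6\u20A7\u20A8\u20A9\u20AA\u20AB\u20AD\u20AE\u20AF";

    // Return the code points in 'text' that the given charset cannot represent.
    static List<String> unencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        List<String> missing = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (!encoder.canEncode(ch)) {
                missing.add(String.format("U+%04X", cp));
            }
        });
        return missing;
    }

    public static void main(String[] args) {
        // Assumption for this sketch: Postgres LATIN1 == Java's ISO-8859-1.
        List<String> lost = unencodable(SAMPLER, StandardCharsets.ISO_8859_1);
        int total = SAMPLER.codePointCount(0, SAMPLER.length());
        System.out.println(lost.size() + " of " + total + " symbols have no LATIN1 form: " + lost);
        // UTF-8 can encode every Unicode character, so this list is empty.
        System.out.println("Missing in UTF-8: " + unencodable(SAMPLER, StandardCharsets.UTF_8));
    }
}
```

Only ¥, £, $ and ¢ fit into LATIN1; the euro sign and the fourteen ₡ to ₯ symbols (15 of the 19) are reported as missing, while UTF-8 covers all of them.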
Finally, after all that was done, I had proper UTF-8 support on my site.
And to prove it, here’s some non-ASCII content from the UTF-8 SAMPLER website:
¥ · £ · € · $ · ¢ · ₡ · ₢ · ₣ · ₤ · ₥ · ₦ · ₧ · ₨ · ₩ · ₪ · ₫ · ₭ · ₮ · ₯