The continuing adventures of my Yashica

Carousel

I got my film back from my Yashica.

If anyone’s interested, I used Spectrum Imaging in Newcastle (mail order).

They are very fast and very cheap (yay!), but they don’t do B&W (boo!).

Of the 12 shots I took, this is the one I’m happiest with; it’s also the one with the best exposure.

Most of them were a bit over-exposed (nothing that can’t be rescued) but this one was pretty accurate.

I was trying to compensate for what I believed was the meter’s tendency to over-expose (based on a comparison with my SLR) but it looks like I need to compensate more.

Many people who own this camera ignore the built-in meter as it has a tendency to be wrong.

These people then either carry around an SLR or a light meter, or they guess.

Well, “guess” is not really an accurate description (although I imagine some people do genuinely guess – but I’m not referring to them here).

I’m referring to a form of educated guessing.

There’s a thing called the sunny f/16 rule that can be used to determine exposure.

As usual, Wikipedia knows all…

In photography, the sunny 16 rule (or, less often, the “sunny f/16 rule”) is a method to estimate correct daylight exposures without using a light meter.

The basic sunny 16 rule, applicable on a sunny day, is this:

Set aperture to f/16 and shutter speed to the reciprocal of the ISO film speed.

For example, for ISO 100 film, choose shutter speed of 1/100 second (or 1/125 second).
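To make the arithmetic concrete, here’s a toy sketch (my own, not from the Wikipedia article) of how the rule extends beyond f/16: the shutter time is 1/ISO at f/16, scaled by the square of the ratio between your chosen aperture and f/16, so that the overall exposure stays constant.

public class Sunny16 {
    // Approximate shutter time in seconds for a subject in direct sunlight.
    // At f/16 this is simply 1/ISO; each stop wider halves the time.
    static double shutterSeconds(int iso, double aperture) {
        double ratio = aperture / 16.0;
        return (1.0 / iso) * ratio * ratio;
    }
    public static void main(String[] args) {
        System.out.println(shutterSeconds(100, 16.0)); // 0.01   (1/100s)
        System.out.println(shutterSeconds(100, 8.0));  // 0.0025 (1/400s)
    }
}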

There’s also a page here about calculating exposure that looks quite interesting.

I don’t want to carry my SLR around with me every time I use the Yashica, so it looks like I’ll be going down the “guessing” route (assuming I can’t trust the meter, that is).

Today though I went for a walk in the park and I did have my SLR with me so I used that to set my first reading.

The light didn’t change for a while so I left it at that.

Later the sun came out so I metered again with the SLR.

I’ll find out when I get the film back how it all worked out.

As I’m shooting negative film I can afford to be a bit lax with my metering, since corrections can be made at processing time – if I shoot slide film I’ll have to be more accurate as it’s much less forgiving.

I’m not sure if that matters too much with scans though.

The above was shot with Fujicolor Superia film.

I have another 3 rolls of that.

I’ve also ordered a selection of black and white film.

Some Ilford HP5+, which I’ve heard good things about, and some Kodak Tri-X.

Now I just have to hope that I don’t get bored with it all before I run out of film!

My plan for spam – one month on

A while back I wrote about “my plan for spam”.

“My plan” has been running for a month now so it’s time to review it.

Before I implemented it I was getting around 600 spams a month.

After running it for a full month I’m down to around 250.

So, I’d consider that a fairly successful plan.

Of course, all the ones that were caught in my spam filter were being sent to my home account and put into a spam folder.

I said I’d monitor that and if there were no false positives I’d set it to delete them upon arrival.

There were no false positives, so I will be setting it to delete (though I need to test properly before I go putting delete rules into my filters).

As for getting that 250 even lower – I’m kinda stuck.

Around 80% of those 250 go directly to my gmail address, so they don’t touch my filtering system.

I don’t really use my gmail address so I could set gmail to delete anything sent to that address.

The thought of doing that scares me a bit – I think I’ll wait and see how annoyed I get by it all.

Of course, no doubt in 6 months time I’ll be back up to 600 a month again, but what can you do?

Update:

I found out that if I tell gmail to delete an email in a filter rule, it puts it into the Deleted Items folder, which is automatically cleaned out after 30 days.

So, with that in mind I’ve set it to delete any emails that are addressed to my gmail address.

That way I have 30 days to find a real email if I have reason to believe it was sent to my gmail address.

Since doing that I’ve received an average of just over one spam a day!

That’s going to be around 40 a month.

Eat that, spammers!

Purging jsessionids

jsessionid is the parameter that a Servlet engine adds to your site’s URLs when you’ve enabled sessions in your config but the user viewing the site doesn’t have cookies enabled.

It then allows a cookie-less user to use your site and maintain their session.

It seems like a good idea but it’s a bit flawed.

The author of randomCoder has summarised the flaws quite well.

Every link on your site needs manual intervention

Cookieless sessions are achieved in Java by appending a string of the format ;jsessionid=SESSION_IDENTIFIER to the end of a URL. To do this, all links emitted by your website need to be passed through HttpServletResponse.encodeURL(), either directly or through mechanisms such as the JSTL <c:url /> tag. Failure to do this for even a single link can result in your users losing their session forever.

Using URL-encoded sessions can damage your search engine placement

To prevent abuse, search engines such as Google associate web content with a single URL, and penalize sites which have identical content reachable from multiple, unique URLs. Because a URL-encoded session is unique per visit, multiple visits by the same search engine bot will return identical content with different URLs. This is not an uncommon problem; a test search for ;jsessionid in URLs returned around 79 million search results.

It’s a security risk

Because the session identifier is included in the URL, an attacker could potentially impersonate a victim by getting the victim to follow a session-encoded URL to your site. If the victim logs in, the attacker is logged in as well – exposing any personal or confidential information the victim has access to. This can be mitigated somewhat by using short timeouts on sessions, but that tends to annoy legitimate users.

There’s one other factor for me too: public users of my site don’t require cookies, so I really don’t need jsessionids at all.

Fortunately, he also presents an excellent solution to the problem.

The solution is to create a servlet filter which will intercept calls to HttpServletRequest.encodeURL() and skip the generation of session identifiers. This will require a servlet engine that implements the Servlet API version 2.3 or later (J2EE 1.3 for you enterprise folks). Let’s start with a basic servlet filter:
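His full filter is worth downloading, but the core trick is small enough to sketch. Here’s my own minimal reconstruction of the technique (not his exact code): invalidate any session that arrived encoded in the URL, and wrap the response so that encodeURL() and encodeRedirectURL() become no-ops.

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.*;

public class DisableUrlSessionFilter implements Filter {
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        if (!(req instanceof HttpServletRequest)) {
            chain.doFilter(req, res);
            return;
        }
        HttpServletRequest request = (HttpServletRequest) req;
        // Kill any session identified via the URL rather than a cookie.
        if (request.isRequestedSessionIdFromURL()) {
            HttpSession session = request.getSession(false);
            if (session != null) {
                session.invalidate();
            }
        }
        // Wrap the response so URL rewriting returns the URL untouched.
        HttpServletResponseWrapper wrapped =
            new HttpServletResponseWrapper((HttpServletResponse) res) {
                public String encodeURL(String url) {
                    return url;
                }
                public String encodeRedirectURL(String url) {
                    return url;
                }
            };
        chain.doFilter(req, wrapped);
    }
    public void init(FilterConfig config) throws ServletException {
    }
    public void destroy() {
    }
}

Map the filter against /* in web.xml and every page stops emitting jsessionids.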

He then goes on to dissect the code section by section and presents a link at the end to download it all.

So I downloaded it, reviewed it, tested it and implemented it on my site.

It works a treat!

However, I still had a problem; Google and other engines still have lots of links to my site with jsessionid in the URL.

I wanted a clean way to remove those links from their indexes.

Obviously I can’t make Google do that directly.

But I can do it indirectly.

The trick is first to find a way to rewrite incoming URLs that contain a jsessionid to drop that part of the URL.

Then to tell the caller of the URL to not use that URL in future but to use the new one that doesn’t contain jsessionid.

Sounds complicated, but there are ways of doing both.

I achieved the first part using a thing called mod_rewrite.

This allows me to map an incoming URL to a different URL – it’s commonly used to provide clean URLs on Web sites.

For the second part there is a feature of the HTTP spec that allows me to indicate that a link has been permanently changed and that the caller should update their link to my site.

301 Moved Permanently

The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible.
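In practice that means the server answers a jsessionid-laden request with a response along these lines (URL invented for illustration), and well-behaved clients and crawlers update their records:

HTTP/1.1 301 Moved Permanently
Location: http://www.example.com/photos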

So, putting these two together, I wrote the following mod_rewrite rules for Apache.


# Strip the jsessionid and redirect permanently (301) to the clean URL
RewriteEngine On
RewriteRule ^/(\w+);jsessionid=\w+$ /$1 [L,R=301]
RewriteRule ^/(\w+\.go);jsessionid=\w+$ /$1 [L,R=301]

The first rule says that any URL ending in a jsessionid will be rewritten without it (so a request for /photos;jsessionid=ABC123, say, redirects to /photos).

The second does the same but maps anything ending in .go – I was too lazy to work out a single pattern to do both types of URLs in one line.

And I used that all-important 301 code to persuade Google to update its index to the new link.

So, from now on – my pages will no longer output jsessionids and any incoming links that include them will have them stripped out.

In other words; jsessionids purged.

My plan for spam

Every now and again I get sufficiently annoyed by spam to want to do something about it.

Today was one of those days where I had enough time to sit down and work on my plan for dealing with spam.

My email setup is a little unusual so it became a bit complicated.

All my incoming email is processed on my mail server; then one copy is forwarded to gmail, the other to my computer at home.

I then read all my email via gmail, and keep the copy at home just for backup purposes.

I’ve been doing this for a while, but found the manual task of scanning through spam in gmail to be tiresome and annoying.

What I wanted to do was cut out a lot of the spam on my server, before it was even passed on to gmail.

I figured I could script a better spam-filtering solution than gmail’s system.

My plan was to install bogofilter on the mail server and use that to filter out a lot of the spam.

There were two problems with this. One was that I needed a body of spam and ham emails to train it with, and I didn’t keep email on the server.

The other was that it would need constant training as new emails came in; this was very tricky, as emails were passed on to gmail and my home computer in an automated process with no scope for human intervention.

To deal with the first problem I decided to install bogofilter at home too and train it there, then upload the training database to the mail server.

For the second problem I came up with the following solution:

I would use bogofilter on the mail server; send anything flagged as non-spam on to gmail and home; and send anything classed as spam just to home.

Once it got home, it would be passed through bogofilter a second time; this instance would be set up slightly differently to the first one, classifying emails into one of three folders: ham, spam or unsure.
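As a sketch, that home-side sorting is a couple of procmail recipes wrapped around bogofilter (the folder names are mine, and the exact X-Bogosity values vary with bogofilter’s version and its two-state/three-state configuration):

# Pass each message through bogofilter: -u updates the wordlists with
# the verdict, -e exits 0 so procmail carries on regardless, and -p
# passes the message through with an X-Bogosity header added.
:0fw
| bogofilter -uep

# File by verdict; anything unmatched falls through to the normal inbox.
:0:
* ^X-Bogosity: Spam
spam

:0:
* ^X-Bogosity: Unsure
unsure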

I would then use mutt to periodically re-train bogofilter, telling it that anything in the “unsure” folder was either ham or spam.
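In mutt that retraining can be bound to a couple of keys; something like this in .muttrc (the key choices are mine):

# Pipe the current message to bogofilter: -s trains it as spam, -n as ham.
macro index S "|bogofilter -s\n" "train message as spam"
macro index H "|bogofilter -n\n" "train message as ham"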

Finally, the newly trained database would be copied back up to the mail server each night.

The more astute reader may have noticed a problem with this solution.

I seem to have replaced scanning a folder full of spam on gmail with scanning a folder full of spam at home.

This is true; initially I will be dealing with the same amount of spam.

However, I have a longer term plan here.

Once I’m happy with the filtering I’m going to tweak my solution so that anything tagged as spam will be deleted outright once it hits my home PC.

This I expect will reduce the amount of spam that I see by about 90%.

I am comfortable with the fact that I will probably lose the occasional non-spam email.

I’m gonna run this system for about a month; if I get through that with zero false positives, I’ll feel brave enough to set it to delete.

Reclaiming ext3 disk space

A while back I bought an external hard drive for backing up my flacs and my photos.

Recently it started to fill up.

This was, of course, a bad thing.

I looked at how much space I had left on it:


Filesystem Size Used Avail Use% Mounted on
/dev/external 276G 259G 2.5G 100% /mnt/external

So, I had 2.5GB free. But hold on, the maths doesn’t make sense.

I have a 276GB drive, with 2.5GB free, yet I’ve only used 259GB.

I’m missing 14.5GB!

I did some googling and found out the following:

The most likely reason is reserved space. When an ext2/3 filesystem is formatted, by default 5% is reserved for root. Reserved space is supposed to reduce fragmentation and allow root to log in in case the filesystem becomes 100% used. You can use tune2fs to reduce the amount of reserved space.

So, ext2/3 reserves 5% of space, which on my drive is 13.8GB – well, that’s close enough to 14.5GB, so that explains that mystery.

The next question was: can I, and should I, reduce that amount of reserved space?

More googling:

The reserved blocks are there for root’s use. The reason being that the system gets really narky if you completely run out of room on / (specifically /var or /tmp, I think). Programs won’t start, weird errors will pop up, that sort of thing. With some room reserved for root, you can at least be sure to be able to run the really important programs, like sudo and rm.

So, in short, if the drive doesn’t contain /var or /tmp, then there’s not much point in having space reserved for root.

So, some poster on some Internet forum says it’s probably OK to do away with that reserved space.

That’s usually good enough for me, but I figured this time I’ll play it safe and reduce it to 1%.

So I unmounted the drive and ran the following command: tune2fs -m 1 /dev/external

I re-mounted and voila, 11.5GB appeared out of nowhere!


Filesystem Size Used Avail Use% Mounted on
/dev/external 276G 259G 14G 95% /mnt/external
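Incidentally, rather than inferring the reservation from df, tune2fs can report it directly; something like:

# Show the current reservation (as a count of filesystem blocks)
tune2fs -l /dev/external | grep -i 'reserved block count'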

I’ve now run this on my other non-system partitions.

All seems fine so far.

I think I’ll leave my system partition at 5% though, just to be safe.

How to mentor programmers

I was reading an entry over at Raganwald where the author talks about managing projects.

He covers a list of things that one should try to do to ensure a project is a success.

One of his main points is that a tech lead should always know exactly what everyone is doing on a daily basis.

Whenever I’ve allowed the details of a project to escape me, I’ve failed. […] And I can tell you, whenever the details of a project have slipped from my grasp, the project has started to drift into trouble.

I’ve been the tech lead on many projects over the last 6 years or so but I’ve always stopped short of asking people what they are doing on a daily basis.

It has always struck me as a form of “micro-managing”, which is something that I’ve hated when I’ve been on the receiving end of it.

I should clarify though; I always know who is working in what area on a day to day basis (Jim is working on the email module for these two weeks), but I don’t necessarily know what specific task they are trying to achieve on a particular day (I don’t know if Jim is writing display code today, or back-end logic).

However, after reflecting on how this has worked on my projects, I have to conclude that my approach was wrong.

I should know what people are doing – I just need to find a balance between knowing what they are doing and getting on their nerves.

Clearly a balance can be found.

I make no apologies for now insisting on knowing exactly who, what, where, when, and why. There’s a big difference between being asked to explain your work in detail and being told how to do your job.

I’m not sure of the best way to handle getting to that level of detail though.

Daily meetings where everyone reports progress?

I find these to be a bit of a waste of time (especially in large teams) where you talk for 2 minutes then sit and listen to everyone else for 20 minutes.

Walking around and sitting down next to each person in turn (sort of like a doctor doing his rounds)?

This is better for the team as they are only interrupted when I am talking to them.

I’ve done this before but never in a “tell me what code you are writing now” way.

I still think this might annoy me if I were on the receiving end.

Another way?

What about other tech lead people reading this, what works for you?

Or, if you’re on the receiving end of this, where exactly does that all-important line sit?

Sporadic service will resume

The server that hosts this site is dying.

It’s due to be replaced by a brand new server with lots of RAM – once I’ve finished building it and can get it installed.

So until then, the site may be up and down a lot.

Once it’s fixed, sporadic service will resume.

Update: A reboot seems to have fixed it; no random restarts since.

Need a new Programming Language

I need to learn a new Programming Language.

This is for two reasons.

In my time as a programmer I’ve learned and used: Basic, Ada, C, C++, VB, Perl and Java.

So that’s 7 (5 if you merge Basic with VB and C with C++).

It’s a reasonable amount, if a little on the small side.

But that list is only half the truth; most of those languages I’ve not touched in years, some I’m definitely never going to touch again (Ada!).

The only ones I use in any form now are Java and Perl.

I use Java in my day job and to write things like this site, and I use Perl for the odd scripting task.

My first reason for needing a new language is a pragmatic one. I need to learn a new scripting language.

I need a new scripting language because every time I go to do something in Perl I find I have forgotten how to do one of:

a) list the files in a directory;

b) pass an array to a function;

c) iterate over an array;

d) all of the above.

This is because I find Perl’s syntax to be on the whole inconsistent and unintuitive.

So, I’ve had enough of Perl’s kooky ways and would like to learn something a little bit more “sane” (definition: consistent and intuitive syntax).

My second reason goes a little deeper.

I’ve been reading a few articles and blogs of late that in some way or another point out some problems with Java.

A Quick Tour of Ruby

Java doesn’t provide a utility method for opening a text file, reading or processing its lines, and closing the file. This is something you do pretty frequently.

— Steve Yegge

Can Your Programming Language Do This?

Java required you to create a whole object with a single method called a functor if you wanted to treat a function like a first class object.

— Joel Spolsky
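To see what he means, here’s a contrived sketch (all the names are mine) of passing “a function” in Java: you can’t hand over the doubling operation directly, you have to wrap it in a single-method object.

interface IntFunction {
    int apply(int x);
}

public class FunctorDemo {
    // Apply fn to each element; the "function" travels as a whole object.
    static int[] map(int[] values, IntFunction fn) {
        int[] result = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = fn.apply(values[i]);
        }
        return result;
    }
    public static void main(String[] args) {
        int[] doubled = map(new int[] { 1, 2, 3 }, new IntFunction() {
            public int apply(int x) {
                return x * 2;
            }
        });
        System.out.println(doubled[2]); // prints 6
    }
}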

What was interesting was that once I was over my initial denial of such heresy, I found myself mostly agreeing with what they had said.

The surprising part for me was that I had not consciously noticed these things myself – even though I now realise such things had annoyed me at the time.

The reason that they had not bubbled up to the level of consciousness was that I could not see beyond the Java language itself.

Something was awkward to do in Java (ever tried reading a file?) – well, that’s just the way Java is.

I couldn’t question it; I was so deeply ingrained in the ways of Java that I could see no alternatives.

This worried me somewhat: what other concepts and ideas was I ignorant of due to my Java mindset?

Sometimes you need to take a step back and get a fresh perspective on things.

And what better way than to learn a new programming language?

I’m a busy guy though.

I simply can’t afford to take two weeks off just to learn a new language.

So, to be pragmatic (I’m a pragmatic guy too) I’m going to try to solve both of these problems with a single language.

So, I want a general purpose language that’s also good for scripting work.

My shortlist of languages is not long:

Python.

I’ve dabbled with Python.

It’s fun, quick, easy etc.

I’ve not done enough to know if it’s “sane” as defined above, but it doesn’t seem as freaky as Perl.

Ruby.

Ummm, everyone’s talking about it.

A friend of mine is learning it and he’s not swearing about it too much yet.

Apparently it’s mostly “sane”.

I’ve not completely decided yet; I’m leaning towards Ruby at the moment, mind.

Anyone care to convince me either way, or suggest other languages I should be looking at?

Log4j with xml configuration

Like many Java developers, I use Log4j as my logging solution.

However, unlike many Java developers (in my experience anyway) I configure Log4j using XML rather than a properties file.

XML has always struck me as a neater way to represent what is essentially a hierarchical configuration.

However, it’s not terribly well documented (although the latest Log4j download comes with lots of example XML files).

Here’s an example entry from one of my xml files that sets up a daily rolling file appender appending to the existing file on startup.


<!-- An appender which writes to file -->
<appender name="FILE" class="org.apache.log4j.DailyRollingFileAppender">
  <param name="file" value="${user.home}/conf/apache/logs/boncey_app.log" />
  <param name="datePattern" value="'.'yyyy-MM" />
  <param name="append" value="true" />
  <layout class="org.apache.log4j.PatternLayout">
    <param name="ConversionPattern" value="%d [%t] %-5p %C{6} (%F:%L) - %m%n" />
  </layout>
</appender>

Getting Tomcat to use a log4j XML file is a bit fiddly. The log4j manual explains how to set it up.

Under Tomcat 3.x and 4.x, you should place the log4j.properties under the WEB-INF/classes directory of your web-applications. Log4j will find the properties file and initialize itself. This is easy to do and it works.

However, it’s not immediately clear what to do if you use an XML file.

Dropping log4j.xml into WEB-INF/classes doesn’t work – or it didn’t for me on Tomcat 5.5.

Log4j manual to the rescue once more.

You can also choose to set the system property log4j.configuration before starting Tomcat. For Tomcat 3.x The TOMCAT_OPTS environment variable is used to set command line options. For Tomcat 4.0, set the CATALINA_OPTS environment variable instead of TOMCAT_OPTS.

So I put export CATALINA_OPTS="-Dlog4j.configuration=log4j.xml" into my shell startup scripts, dropped the log4j.xml file into my WEB-INF/classes dir and I was logging in no time!

One other thing you may find once you start using log4j is that lots of other Apache code uses it too, so once you set it up your logs will start filling up with lots (and I mean lots) of extra logging from that Apache code.

The traditional way to stop this is to add an entry as follows:


<category name="org.apache.tomcat" additivity="false">
  <priority value="info" />
  <appender-ref ref="FILE" />
</category>

This says that for anything logged from org.apache.tomcat, suppress anything less than INFO level, i.e. suppress all DEBUG level logging.

The problem with this approach is that you have to keep adding new entries to suppress the logging from other packages too (Hibernate is a good example of this).

There is an alternative approach which is stunningly obvious once you hear it.

It is, of course, better practice to keep your application log4j.properties ROOT log level set to WARN or ERROR, and just turn on debugging for your own application classes! (i.e. org.appfuse=DEBUG)

So obvious! Can’t believe I never thought of that.

So I set my base level to WARN, then added categories to log my own code at DEBUG level.
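In XML terms that combination looks something like the following (the package name is just an example):

<!-- My own code logs at DEBUG... -->
<category name="org.boncey">
  <priority value="debug" />
</category>

<!-- ...everything else only logs at WARN and above -->
<root>
  <priority value="warn" />
  <appender-ref ref="FILE" />
</root>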

Voila, no more Tomcat cruft in my logs.

That concludes my ramblings on log4j and XML; hopefully they’ll be of use to someone (other than me, of course).

Film4’s 50 Films to See Before You Die

I caught the second half of “Film4’s 50 Films to See Before You Die” on Saturday night.

Unlike the usual Top 50-type shows, this one was voted for by “a panel of film experts” rather than the general public – and hence it didn’t contain Titanic.

I shamelessly lifted the list below (and the idea for this entry) from Cowfish’s most excellent posting along the same lines.

He went to the trouble of marking which ones he had seen in bold and came up with (as he says) “a fairly respectable 28”.

I did the same and came up with a shockingly awful 9.

Fortunately, Film4 is now free (the show was essentially a promo for said freeness).

So I can start to improve on that number right away (Apocalypse Now is on this Friday night).

2001: A Space Odyssey (I always seem to miss this when it’s on)

A Bout de Souffle

Aguirre, the Wrath of God

Alien

All About Eve

Apartment, The

Apocalypse Now

Badlands

Black Narcissus

Boyz N the Hood

Brazil

Breakfast Club, The

Cabaret

Chinatown

City of God

Come and See

Dawn of the Dead

Donnie Darko

Erin Brockovich

Fanny and Alexander

Fight Club

Heavenly Creatures

Hero

Ipcress File, The

King of Comedy, The

Ladykillers, The

Lagaan: Once Upon a Time in India

Lost in Translation

Manhattan

Manhunter

Mulholland Drive

Night at the Opera, A

North by Northwest

Pink Flamingos

Player, The

Princess Mononoke

Pulp Fiction

Raising Arizona

Royal Tenenbaums, The

Scarface

Searchers, The

Secrets and Lies

Sexy Beast

Shawshank Redemption, The

Terminator 2: Judgement Day

This Sporting Life

Three Colours Blue

Touch of Evil

Trainspotting

Walkabout