Purging jsessionids

jsessionid is the path parameter that a servlet engine appends to your site’s URLs when sessions are enabled in your config but the visitor’s browser doesn’t accept cookies.

It then allows a cookie-less user to use your site and maintain their session.

It seems like a good idea but it’s a bit flawed.

The author of randomCoder has summarised the flaws quite well.

Every link on your site needs manual intervention

Cookieless sessions are achieved in Java by appending a string of the format ;jsessionid=SESSION_IDENTIFIER to the end of a URL. To do this, all links emitted by your website need to be passed through HttpServletResponse.encodeURL(), either directly or through mechanisms such as the JSTL <c:url /> tag. Failure to do this for even a single link can result in your users losing their session forever.

Using URL-encoded sessions can damage your search engine placement

To prevent abuse, search engines such as Google associate web content with a single URL, and penalize sites which have identical content reachable from multiple, unique URLs. Because a URL-encoded session is unique per visit, multiple visits by the same search engine bot will return identical content with different URLs. This is not an uncommon problem; a test search for ;jsessionid in URLs returned around 79 million search results.

It’s a security risk

Because the session identifier is included in the URL, an attacker could potentially impersonate a victim by getting the victim to follow a session-encoded URL to your site. If the victim logs in, the attacker is logged in as well – exposing any personal or confidential information the victim has access to. This can be mitigated somewhat by using short timeouts on sessions, but that tends to annoy legitimate users.

There’s one other factor for me too; public users of my site don’t require cookies – so I really don’t need jsessionids at all.

Fortunately, he also presents an excellent solution to the problem.

The solution is to create a servlet filter which will intercept calls to HttpServletResponse.encodeURL() and skip the generation of session identifiers. This will require a servlet engine that implements the Servlet API version 2.3 or later (J2EE 1.3 for you enterprise folks). Let’s start with a basic servlet filter:

He then goes on to dissect the code section by section and presents a link at the end to download it all.

So I downloaded it, reviewed it, tested it and implemented it on my site.

It works a treat!

However, I still had a problem; Google and other engines still have lots of links to my site with jsessionid in the URL.

I wanted a clean way to remove those links from its index.

Obviously I can’t make Google do that directly.

But I can do it indirectly.

The trick is first to find a way to rewrite incoming URLs that contain a jsessionid to drop that part of the URL.

Then to tell the caller of the URL to not use that URL in future but to use the new one that doesn’t contain jsessionid.

Sounds complicated, but there are ways of doing both.

I achieved the first part using Apache’s mod_rewrite module.

This allows me to map an incoming URL to a different URL – it’s commonly used to provide clean URLs on Web sites.

For the second part there is a feature of the HTTP spec that allows me to indicate that a link has been permanently changed and that the caller should update their link to my site.

301 Moved Permanently

The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible.

So, putting these two together, I wrote the following mod_rewrite rules for Apache.


ReWriteRule ^/(\w+);jsessionid=\w+$ /$1 [L,R=301]
ReWriteRule ^/(\w+\.go);jsessionid=\w+$ /$1 [L,R=301]

The first rule says that any URL made up of a single run of word characters followed by ;jsessionid= and a session ID will be redirected to the same URL without the jsessionid.

The second does the same but maps anything ending in .go – I was too lazy to work out a single pattern to do both types of URLs in one line.
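One way to see what the two patterns do and do not match is to run the same expressions against a few sample paths. The Java sketch below is only an illustration: the class name, the rewrite() helper and the sample paths are hypothetical, and the COMBINED pattern is an assumed single-rule alternative (it simply makes the .go suffix optional) rather than anything taken from the rules above. Java’s \w matches the same characters as the \w used in the rules: letters, digits and underscore.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for experimenting with the patterns; not part of the
// original configuration.
class JsessionidRules {
    // The same expressions as the two RewriteRule patterns.
    static final Pattern PLAIN = Pattern.compile("^/(\\w+);jsessionid=\\w+$");
    static final Pattern DOT_GO = Pattern.compile("^/(\\w+\\.go);jsessionid=\\w+$");
    // Assumed single-pattern alternative: the ".go" suffix is optional.
    static final Pattern COMBINED = Pattern.compile("^/(\\w+(?:\\.go)?);jsessionid=\\w+$");

    // Returns the redirect target "/$1", or null when the rule does not match.
    static String rewrite(Pattern rule, String path) {
        Matcher m = rule.matcher(path);
        return m.find() ? "/" + m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "/news;jsessionid=A59D5254",
            "/pod_73.go;jsessionid=A59D5254",
            "/news",
            "/a/b;jsessionid=A59D5254"
        };
        for (String path : samples) {
            System.out.println(path + " -> " + rewrite(COMBINED, path));
        }
    }
}
```

If the combined pattern behaves as expected for your URLs, the equivalent Apache rule would presumably be RewriteRule ^/(\w+(?:\.go)?);jsessionid=\w+$ /$1 [L,R=301], though that is worth testing against your own Apache version. Note that none of these patterns match paths containing a slash, such as /a/b;jsessionid=A59D5254.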

And I used that all-important 301 code to persuade Google to update its index to the new link.

So, from now on – my pages will no longer output jsessionids and any incoming links that include them will have them stripped out.

In other words; jsessionids purged.

8 thoughts on “Purging jsessionids”

  1. Good point; I’m using Apache 2.0.

    I’ve used mod rewrite with Apache 1.3 before; there were some differences.
    I *think* that Apache 1.3’s version doesn’t like the “\w” notation.

    Try something like this:
    ReWriteRule ^/([A-Za-z0-9]+);jsessionid=[A-Za-z0-9]+$ /$1 [L,R=301]

    The “\w” matches any word character (letters, digits and underscore); [A-Za-z0-9] means pretty much the same, minus the underscore.
    If no luck, you’ll have to go through the documentation for each version online.

    Good luck!

  2. It works great. Here’s googlebot scrubbing a jsessionid:

    66.249.66.1 - - [09/Apr/2007:21:24:45 -0700] "GET /pod_73.do;jsessionid=A59D5254EA3F316C606E540C75B61E49 HTTP/1.1" 301 330 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.66.1 - - [09/Apr/2007:21:24:46 -0700] "GET /pod_73.do HTTP/1.1" 200 17467 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

    Here’s my httpd.conf:

    ## Hard redirect bad spiders that cached a ;jsessionid=XXXXXX URL
    RewriteEngine On
    ReWriteRule ^/(\w+);jsessionid=\w+$ /$1 [L,R=301]
    ReWriteRule ^/(\w+\.gif);jsessionid=\w+$ /$1 [L,R=301]
    ReWriteRule ^/(images/\w+\.gif);jsessionid=\w+$ /$1 [L,R=301]
    ReWriteRule ^/(\w+\.do);jsessionid=\w+$ /$1 [L,R=301]

  3. If you want to redirect all the pages in one rule:

    ReWriteRule ^(.*);jsessionid=.*$ $1 [L,R=301]

    If you also want to preserve the Query_String, a slightly more complicated rule will do the trick:

    ReWriteRule ^(.*);jsessionid=[A-Za-z0-9]+(.*)$ $1$2 [L,R=301]

    If you want to redirect only static content, you might consider something like this:

    ReWriteRule ^(.*\.(gif|png|jpe?g|html?|css|js|jar));jsessionid=[A-Za-z0-9]+(.*)$ $1$2 [L,R=301]

    Note that “\w” will match only word characters (excluding . and /), but “.” will match anything.

    See also http://www.mail-archive.com/struts-user@jakarta.apache.org/msg84279.html

  4. Hi,
    Regarding hiding the jsessionid in my URLs: I have created a servlet filter with an encodeURL method that should hide the jsessionid, but it is not working in my application. What more do I need to do with the encodeURL method to hide the jsessionid? Please provide the code for hiding the jsessionid in the URL.

    Regards
    Ram

  5. In my application I have included the servlet filter code, but I am working with a WebLogic server, which does not take the filter into account and still shows the jsessionid in the URL. What more can I do with the same filter code?
    Regards
    Karthik
