Hack the Gibson – Episode #62 – sort of


How to have your cake and eat it too?

Sorry for the lack of posts recently, but I’m just swamped at work and I also have to buy books from time to time. However, I can say that I have several JavaScript and Perl goodies prepared, and I’ll post them soon.

The recent show was a fairly good one (definitely one of the better ones), and I just want to make one comment: there is a solution for Leo’s problem with people behind proxies. For those of you who didn’t listen to the show, the problem was basically that because of caching proxies he couldn’t get accurate download figures; he suspected that he has a larger audience, but couldn’t prove it to the marketing people. So here is the solution:

Point the clients (from the RSS feed enclosures and the links on the site) to a server-side script (Perl, PHP, whatever works for you) which sends headers that disallow caching together with a 302 (Found) redirect response whose Location header gives the actual location of the mp3 file. Now track the hits to this script rather than to the audio file.

Why does this work? Because with this the conversation looks like this:

  • Client requests the script file.
  • Proxy sees the headers and says: I won’t cache this.
  • Client sees the 302 response and makes another request to fetch the file from the new location.
  • The response to that request has headers that allow caching, so the proxy caches it.
  • Now another client comes along. The first three steps are the same, but at the last step the proxy kicks in, says: I already have this file, and serves it from its cache.
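The script side of this conversation can be sketched in a few lines. This is only a minimal illustration in Python (the post suggests Perl or PHP; any language works), written as a WSGI application. The `MEDIA_BASE` URL, the in-memory `download_counts` dictionary and the function name `redirect_app` are all hypothetical names made up for the example; a real setup would log hits to a file or database.

```python
# Hypothetical base URL where the real (cacheable) audio files live.
MEDIA_BASE = "http://media.example.com/podcasts/"

# Hypothetical in-memory hit counter, keyed by file name.
download_counts = {}


def redirect_app(environ, start_response):
    """Count the hit, then answer with an uncacheable 302 to the real file."""
    # Use only the last path segment, so /feed/episode62.mp3 -> episode62.mp3
    name = environ.get("PATH_INFO", "/").rsplit("/", 1)[-1]
    if not name.endswith(".mp3"):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]

    # This is the number to show the marketing people.
    download_counts[name] = download_counts.get(name, 0) + 1

    headers = [
        ("Location", MEDIA_BASE + name),
        # Tell proxies (and browsers) not to cache the redirect itself.
        ("Cache-Control", "no-cache, no-store, must-revalidate"),
        ("Pragma", "no-cache"),
        ("Expires", "0"),
        ("Content-Type", "text/plain"),
        ("Content-Length", "0"),
    ]
    start_response("302 Found", headers)
    return [b""]
```

To try it locally, the app can be served with `wsgiref.simple_server.make_server("", 8000, redirect_app).serve_forever()` from the standard library. Because the script is addressed by the audio file's own name, a save-as on the link should also suggest that name.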

The benefits of this method are better download tracking and the same bandwidth utilization as before (you basically let proxies do their thing, as opposed to the solution where you deny caching of your material altogether).

The drawbacks: there is a maximum number of redirects a client is willing to follow, so if you combine this method with other redirects (from podtrack for example), you might create a longer chain of redirects than the client is willing to handle, and it just gives up. Another potential problem is that people figure out the actual location of the audio file (it’s not rocket science after all) and start downloading from there. However, this will probably be a small percentage of the audience. If you’re still concerned, you can deliver the content only if the client sends the right referrer header. This of course has the drawback that some people turn off referrer headers. A combined solution might be to require referrer headers only from people who are behind a proxy (based on the request header).
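The combined referrer rule could look roughly like the sketch below. It assumes that a proxied request can be recognised by the presence of a `Via` request header (proxies commonly add one, but not all do), and that the referrer should start with the site's own URL; the function name `should_serve` and the `SITE_PREFIX` value are hypothetical.

```python
# Hypothetical URL prefix of the pages that link to the audio files.
SITE_PREFIX = "http://www.example.com/"


def should_serve(request_headers, site_prefix=SITE_PREFIX):
    """Decide whether to deliver the audio file for this request.

    request_headers: dict of header name -> value as sent by the client.
    Direct clients are always served; clients that appear to be behind a
    proxy (assumption: they carry a Via header) must send a matching Referer.
    """
    # Header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value for name, value in request_headers.items()}

    if "via" not in headers:
        return True  # no sign of a proxy: do not require a referrer

    return headers.get("referer", "").startswith(site_prefix)
```

For example, `should_serve({"Via": "1.1 proxy.example.net"})` is refused, while the same request with `"Referer": "http://www.example.com/show/62"` is served. Many podcast clients send no referrer at all, so a rule like this would need testing against real traffic before being relied on.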

A final note: please, please give the script the same name as the audio file (using mod_rewrite, for example) so that when I do a save-as on the link I don’t get some generic name like redirect.mp3 as the file name.

Update: one example of an implementation of this is podtrack. While I think that their site is … (I’m trying to find a word here which isn’t too derogatory) and wonder why anybody would use IIS instead of Apache, they got this much right. You can check, for example, by downloading a podcast tracked by them with curl and the -v option. There you can see that the first connection (made to their redirect server) replies with:

HTTP/1.1 302 Found
Date: Mon, 23 Oct 2006 12:33:33 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Location: http://www.esanity.co.uk/podcasts/23-10-06-boagworld.mp3
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: -1
Content-Type: text/html; charset=utf-8
Content-Length: 173

while the original podcast location in my case replies with:

Content-Length: 27704743
Content-Type: audio/mpeg
Last-Modified: Sun, 22 Oct 2006 22:54:03 GMT
Accept-Ranges: bytes
ETag: "de6dcef82cf6c61:9e4"
Server: Microsoft-IIS/6.0
MicrosoftOfficeWebServer: 5.0_Pub
X-Powered-By: ASP.NET
Date: Mon, 23 Oct 2006 12:33:44 GMT
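To make the difference between the two responses explicit, here is a rough Python sketch that parses a few of the headers shown above and applies a deliberately simplified cacheability check. The helper names `parse_headers` and `looks_cacheable` are made up for this example, and the rule is only a heuristic: real proxies follow the much more detailed caching rules of the HTTP specification.

```python
REDIRECT_RESPONSE = """\
HTTP/1.1 302 Found
Location: http://www.esanity.co.uk/podcasts/23-10-06-boagworld.mp3
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: -1
"""

AUDIO_RESPONSE = """\
Content-Length: 27704743
Content-Type: audio/mpeg
Last-Modified: Sun, 22 Oct 2006 22:54:03 GMT
ETag: "de6dcef82cf6c61:9e4"
"""


def parse_headers(raw):
    """Turn raw header lines into a dict with lower-cased names."""
    headers = {}
    for line in raw.strip().splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skips the status line, which has no "Name: value" form
            headers[name.strip().lower()] = value.strip()
    return headers


def looks_cacheable(headers):
    """Simplified heuristic: would a shared cache be likely to keep this?"""
    directives = {d.strip().lower()
                  for d in headers.get("cache-control", "").split(",")}
    if directives & {"no-store", "no-cache", "private"}:
        return False
    if headers.get("pragma", "").lower() == "no-cache":
        return False
    # Without explicit directives, validators let a cache apply heuristics.
    return "last-modified" in headers or "etag" in headers


print(looks_cacheable(parse_headers(REDIRECT_RESPONSE)))  # False
print(looks_cacheable(parse_headers(AUDIO_RESPONSE)))     # True
```

The redirect is rejected because of its `Cache-Control` and `Pragma` headers, while the audio response carries no such directives and offers `Last-Modified` and `ETag` validators, which is the combination this technique relies on.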
