Again, this will be something new here (at least for me): I’ll publish a pre-rant for Security Now! Steve Gibson expressed interest in the subject of cookies, so I’ll tackle that in this post and also the more general question of user-tracking. I discuss different ways it can be accomplished, ways you could protect yourself and the question: should you?
In a way the World Wide Web is a marketing company’s wet dream: just imagine tracking the moves of the users and building a profile which lists their potential interests (as they can be inferred from the list of visited sites and the frequency of the visits). Using this profile, they can show ads which they consider relevant to us. Of course they don’t do this out of the goodness of their heart. They do it because you have a higher probability of reacting to an advertisement if it’s relevant to you.
Here are the means I know of which can be used to accomplish this:
- Tracking cookies or third party cookies – this is IMHO a bad name (from a technical point of view), and I’ll explain in a minute why. But first let’s answer the question: what are cookies? Cookies (or “HTTP State Management Mechanism”, as the official RFC refers to them) are opaque tokens (from the point of view of the client) containing information that helps the server-side application recognize that different HTTP requests are part of the same session. This is necessary, since the HTTP protocol does not define any method for creating, tracking and destroying sessions. That is, whenever you request an object from the web server, it will treat it as a separate request, having no idea what you requested earlier. The cookie is used as a token in the following way: the server says to the client “take this piece of information and return it to me on subsequent requests”. This way it can determine whether a request is part of the same session (because it can hand out a different value to each client, and when the client returns the value, the server can identify the session it belongs to). Before you ask: you can’t use IP addresses as a reliable unique identifier because of proxies and NATs. You can observe two things here: this behavior is entirely voluntary on the client’s part (it may choose not to return the token), and it applies to every HTTP transaction, not just HTML documents (including images, Flash animations, Java applets, etc.). Of course the standard defines a policy which specifies in which requests the cookie should be returned. The
a third party cookie because it is set by a different entity than the server you see in your address bar. However, I think that this is a bad name since it implies that some kind of spoofing is going on, as if a server were setting a cookie for another server – which, by the way, is explicitly prohibited by the standard and won’t work in any modern browser. To sum up:
- Applicability: (almost) every browser supports it. The standard itself is relatively old (almost 10 years).
- Customizability: Current browsers offer ways to set a policy on what cookies should / should not be accepted both in a whitelist and blacklist format. Usually they do not include the option to view the cookies stored on the machine, but there are many free third party tools / extensions which enable you to do this.
- Risk of disabling it: if cookies are disabled altogether, many sites which have a member-only area will break and the user will be unable to log in. Disabling third party cookies breaks pages which host elements fetched from a third party server (which represents a small but growing percentage of the web in the age of mashups).
- Flash Local Shared Objects (AKA flash
- Applicability: on any platform which has at least version 6 of the Flash Player installed.
- Customizability: you can go to the site of Adobe to completely disable or to manage the shared objects which are on your computer. There is also a Firefox extension, however it seems dated and not maintained any more, so probably the safest bet is to go with the official links provided above.
- Risk of disabling it: sites which rely on it may break. However, I haven’t found any sites so far which rely on it for purposes other than tracking, so currently it can be disabled without any problems. This may change in the future.
- Referrer URLs – A referrer URL is a piece of information sent by your browser when requesting an object from a web server. For example, if you click a link at http://foo.com/link.htm which takes you to http://bar.com/target.htm, the bar.com web server will receive as part of the request (if you didn’t disable it in your browser) the string “http://foo.com/link.htm” as the referrer. This can be (and is) used by sites for statistical purposes (to see who links to them) and for security (however, this is a pretty weak form of security since it relies on the client
- Applicability: on almost every browser
- Customizability: you can see a tutorial about enabling it here which should point you in the right direction.
- Risk of disabling it: you shouldn’t encounter any problems, because few sites use it for purposes other than statistics. But if you don’t mind, let them have this piece of information: it can be used to create better content for you!
everything third party is bad.
- Sign-in information – a fact people often overlook is that the big three “identity” providers (Google, Yahoo and MSN) also provide advertising. Because of this, they can correlate tracking information obtained by any of the methods listed above with the personal information you provided at signup. Now, I’m not saying that they do this, I’m just saying that they have the technical means to do it.
- Applicability: if you are a user of any of these sites and, while logged on, browse sites which display advertisements from them, you are affected.
- Customizability: log off before browsing to other sites and clear all the cookies these providers set. Before logging back in, also clear the cookies placed by their ads.
- Risk of disabling it: the inconvenience of constantly having to clear cookies.
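To make the cookie mechanism from the first item of the list more concrete, here is a minimal sketch of the token exchange, with no real network involved. The names `handle_request` and `sessions` and the cookie name `sid` are made up for illustration; a real server would also set attributes such as expiry and path, which are left out here.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side store: session token -> data kept for that session.
sessions = {}

def handle_request(cookie_header):
    """Simulate the server handling one HTTP request.

    cookie_header is the value of the client's Cookie header (or None).
    Returns (session_id, set_cookie_value); set_cookie_value is None
    when the client already presented a token the server recognizes.
    """
    jar = SimpleCookie(cookie_header or "")
    morsel = jar.get("sid")
    if morsel is not None and morsel.value in sessions:
        # The client voluntarily returned the token: same session as before.
        sessions[morsel.value]["hits"] += 1
        return morsel.value, None

    # No (known) token: hand out a fresh, unguessable one.
    sid = secrets.token_hex(16)
    sessions[sid] = {"hits": 1}
    out = SimpleCookie()
    out["sid"] = sid
    return sid, out["sid"].OutputString()

# First request carries no cookie, so the server issues one.
first_sid, set_cookie = handle_request(None)
# A cooperating client sends the token back on the next request.
second_sid, _ = handle_request(set_cookie)
print(first_sid == second_sid)
```

A client that simply never sends the header back shows up as a new session on every request, which is exactly the "voluntary" part mentioned in the list.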
Now for the philosophical question: should you be worried? Should you go to great lengths to avoid this tracking, even at the cost of breaking useful features on the site? You should consider the following ideas (they are not absolute truths, but arguments which are used in this debate):
- Nothing is free, and advertisement is an (arguably) quick and (mostly) painless way of paying for the content / service. So disabling advertisement can be thought of as a way of “cheating” (getting what you desire without payment).
- Contextual ads can be useful. For example, if I would like to buy a laptop and I see an ad for a laptop, I will most probably click it. This is useful for both parties: for me, because I possibly learn about an offer I didn’t know about, and for the company which put out the ad, because I might buy something from them.
- Some people say: but this is not right! The user should be in control! If you want to buy laptops, search for them yourself! Of course no rational person (no offense to anybody) would buy something of significant value based on one ad (because usually it’s only showing one detail of the product – probably not mentioning the not-so-bright sides) but it may add value to your research. So, while you shouldn’t buy based on what they say on the teleshopping channel – err I mean ad 🙂 – it may add value to your research while you are considering your options.
- The tinfoil hat people may say: I don’t want the government / Amazon / Google / whatever to track my every movement! I have a right to privacy! – and they are right, they do have a right to privacy; however, they must be willing to give up certain benefits or to take some additional steps. And before you object, saying: why do I have to make extra efforts to get the same service everybody receives while keeping my information as private as possible? – just consider how things work in the real world: if you want to drive a car, you must get a license. It is your right to drive a car (if you are of legal age), yet you still have to get a license. Because every analogy breaks down, let’s consider the technical point of view: every technology can be used for good and bad (this is even more so if there is no clear distinction between good and bad). The only way of preventing 100% of the bad usages of a technology is to ban it altogether. You may choose this, but be aware that you are not getting the benefits either. Now, some of the technologies (like session cookies) can be emulated by other technologies (like appending the SID – the session identifier – to every request as a GET parameter); however, the given technology was introduced to make it easier to accomplish certain tasks without the complication and hassle the old method needed. Guess what a rational website owner / creator would do: use the more complex, less reliable and more expensive technology for a very small percentage of its visitors, or go with the easier and more powerful technology?
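To give an idea of what the cookie-less alternative mentioned in the last point involves, here is a rough sketch of carrying the session identifier as a GET parameter. The function names `add_sid` and `read_sid` and the parameter name `sid` are assumptions chosen for illustration, not taken from any particular framework.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode, urlsplit, urlunsplit

def add_sid(url, sid):
    """Return url with the session identifier appended as a GET parameter."""
    parts = urlsplit(url)
    # Keep the existing parameters, dropping any stale sid first.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "sid"]
    query.append(("sid", sid))
    return urlunsplit(parts._replace(query=urlencode(query)))

def read_sid(url):
    """Return the session identifier carried by url, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("sid")
    return values[0] if values else None

link = add_sid("http://foo.com/link.htm?page=2", "abc123")
print(link)            # http://foo.com/link.htm?page=2&sid=abc123
print(read_sid(link))  # abc123
```

The catch is that the server has to run something like `add_sid` over every single link and form target in every page it sends out, and one missed link silently drops the session. That is the complication and hassle the cookie mechanism was meant to remove.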