I came across this article: Our Revised News, and it reminded me of a huge problem: it is very easy to modify things on websites and then claim that “it was like this since the beginning of time” (sidenote: you can f’*** it up and let the HTTP server show the real update date in the headers, or let it slip through some other automated process you’ve forgotten about – but this is rarely observed). If the changes in question were made (I’m not accusing the author, but you have to be careful these days), it is extremely unethical of the newspaper. Every edit should be clearly shown, in a wiki-like fashion, even if that edit just fixes a typo (this way we avoid arguing over where the line is drawn between “edits to be shown” and “edits to be hidden”). In some exceptional cases (for example, where the newspaper is obligated by a court to remove some content) revisions could be deleted, but the page should still show that an unavailable revision once existed.
Of course I’m not fooling myself into thinking that this sort of system will ever be implemented. As a sidenote: in my articles I usually mark changes with the text “Update”, but there is no support in Blogger per se to mark these changes. The cryptography behind the idea of signing sounds fairly simple (as it is described in the original post): when I write some text, I take the hash of it (preferably SHA-1 or, better, something newer) and send it to a third-party service. This trusted service creates a signed statement which says “I’ve seen this hash at this time”. Later, readers can use this statement to verify that the text has not changed (there is a small sketch of this scheme after the list below). Of course there are several pitfalls:
- What if at a later date you want to change the look of the article? Since the HTML tags are part of the signed hash, you can’t do that. What you can do is sign an intermediate version (like a Wiki-markup version) and make that version available to the reader (important, so that she can verify the signature), then use a transformation to create the HTML. Of course this raises the question: can you trust the process which transforms the intermediate markup into HTML? (similar to the classic article Reflections on Trusting Trust, which analyzes the question: what if your compiler included some trojan code in each compiled program?)
- There is a second problem: many elements are not directly present in the page (like stylesheets or JavaScript files) and can be modified so that the link to them stays the same but their content changes. All of these can be used to dynamically change the content shown to the user. This is similar to the problem faced by search engines when they try to detect specially crafted “gateway pages”.
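For the curious, here is a minimal sketch of the scheme in Python. Everything in it is illustrative: it assumes the trusted service holds an Ed25519 key pair whose public key readers already know, it uses SHA-256 for the hash, and it relies on the third-party cryptography package; a real service (along the lines of RFC 3161 trusted timestamping) would be considerably more involved.

```python
# Illustrative sketch only; requires: pip install cryptography
import hashlib
import time

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# --- The trusted third-party service ---
service_key = Ed25519PrivateKey.generate()
service_public_key = service_key.public_key()  # published, known to readers


def timestamp(article_text: str) -> tuple[float, bytes]:
    """Hash the article, note the time, and sign "I've seen this hash at this time"."""
    digest = hashlib.sha256(article_text.encode("utf-8")).digest()
    seen_at = time.time()
    signature = service_key.sign(digest + str(seen_at).encode("ascii"))
    return seen_at, signature


# --- A reader, at any later date ---
def verify(article_text: str, seen_at: float, signature: bytes) -> bool:
    """Recompute the hash and check the service's signed statement against it."""
    digest = hashlib.sha256(article_text.encode("utf-8")).digest()
    try:
        service_public_key.verify(signature, digest + str(seen_at).encode("ascii"))
        return True
    except InvalidSignature:
        return False  # the text differs from what was timestamped


article = "Breaking: nothing happened today."
seen_at, signature = timestamp(article)
print(verify(article, seen_at, signature))        # True – text unchanged
print(verify(article + " ", seen_at, signature))  # False – text was edited
```

Note that this only proves the text existed at a given time; it says nothing about the stylesheets and scripts from the second pitfall above, since those are outside the signed hash.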
The conclusion? Be afraid, be very afraid. Don’t trust anyone. And donate to the Internet Archive :-).
Image taken from surfstyle’s photostream with permission.
Update: a good example of preserving historical data is the new feature in Google Earth which allows you to view historical imagery. However, it also highlights the fact that you can’t trust a single entity to keep its history straight.