Just after I wrote my last post on data citation, I found Joseph Reagle’s blog entry on bibliography and citation. He makes a number of points, one of which is about transience. In the comments to his post, and in Joseph’s comment on my post, two solutions for dealing with internet transience are mentioned: the Wayback Machine and WebCite.

I’ve looked at the Wayback Machine in the past, but there is no way it represents any realistic full sample of the internet (for example, as of today, it has exactly one impression of home.badc.rl.ac.uk/lawrence, from 2004!) … but how could it? That’s an unrealistic task. What I do see it as is a (potentially very useful) set of time capsules … that is, samples!
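As an aside, you can check the Wayback Machine’s coverage of any URL yourself. Here’s a minimal Python sketch using the Internet Archive’s public availability API (the endpoint and the shape of the JSON it returns are as documented by the archive today; treat the details as illustrative):

```python
import json
import urllib.parse
import urllib.request

def wayback_snapshot(url):
    """Return (snapshot_url, timestamp) for the closest Wayback Machine
    capture of `url`, or None if nothing has been archived."""
    api = ("https://archive.org/wayback/available?url="
           + urllib.parse.quote(url, safe=""))
    with urllib.request.urlopen(api) as resp:
        data = json.load(resp)
    # The API returns an empty "archived_snapshots" object when
    # the URL has never been captured.
    snap = data.get("archived_snapshots", {}).get("closest")
    return (snap["url"], snap["timestamp"]) if snap else None

print(wayback_snapshot("home.badc.rl.ac.uk/lawrence"))
```

A single capture from 2004 for a page that changes regularly is a time capsule, not coverage, which is exactly the point.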

By contrast, WebCite allows the creator of content to submit URLs for archival, thus ensuring that when one writes an academic document, the cited material will be archived and the citation will be persistent. This is a downright excellent idea, provided you believe in the persistence of the WebCitation consortium (and I have no reason not to). The subtext, however, is that the thing being cited is a document; it won’t help us with data. And not just because data may be large: the other issue is that the WebCitation folk would have to take on support for data access tools, and I think the same argument applies to them as applies to libraries in this regard!
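The workflow itself is simple enough: hand WebCite a URL, get back an archived snapshot URL, and cite that instead of the original. Here’s a hypothetical sketch of that round trip in Python; note that the submission endpoint and its parameters below are my assumptions for illustration, not a documented API, so consult webcitation.org for the actual interface:

```python
import urllib.parse
import urllib.request

def webcite_archive(url, email):
    """Ask WebCite to archive `url` and return the service's response.

    NOTE: the endpoint and query parameters here are assumed for the
    sake of illustration; check webcitation.org for the real interface."""
    query = urllib.parse.urlencode({"url": url, "email": email})
    request = "http://www.webcitation.org/archive?" + query  # assumed endpoint
    with urllib.request.urlopen(request) as resp:
        return resp.read().decode("utf-8", errors="replace")

# The archived snapshot URL returned by the service is what one would
# then cite, rather than the original (transient) URL.
```

For a document, that snapshot is the whole story. For data, it plainly isn’t: archiving the bytes does nothing about the access tools.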

This brings me back to my point about data citation: we had better only allow it when we believe in the persistence of the organisation making the data available, and that will take rather more than just having the bits and bytes available for an HTTP GET!