More on Data Citation

There was a lot interest at the e-science meeting about a throw away comment I made about our claddier project.

In talking to folk, it became clear that it helps to draw a distinction between those who interpret data and those who produce data. Often these are the same people, and there is a clear one-to-one link between some data, the data creator(s) and a resulting paper which describes the data in some detail. However, in this situation, the key reason why the paper is published will normally be the interpretation that is presented, not the data description per se (although there are some disciplines in which the paper may simply describe the data, this is not the normal case).

So what CLADDIER is about is the situation where the data creator(s) has (have) only been involved in the creation of the data, not the resulting interpretation. Again, in some disciplines the data creators may end up as authors on relevant papers (and may never even see or comment on the text). However, in most disciplines this either doesn’t happen, or is severely frowned upon, and in these situations the data creators are somehow second class academic citizens (despite being an essential part of the academic food chain). What we want to be able to do is have recognition for these folk via citation of the dataset itself, not a journal paper … that way important datasets will be measurably important.

There are a lot of complications, some of which I’ve addressed before. Some of the additional things we discussed yesterday included

How to handle small contributions (such as annotations). My take on this is that small contributions are visible via authorship within the cited entity, but probably ought not be visible outside of it (although in the semantic web world one ought to be able to count the number of these things any individual does). At some point folk have to decide on the definition of small though (probably a discipline dependent decision) …
The situation is rather more complicated with meta-analyses. Arguably we’re in the same situation as an anthology or book of papers … in either case we would expect the contributions to be citable as is the aggregated work.

One new concept for me was the taxonomy world might want to count (in some way) the number of times a specific taxa is mentioned, and use that as a metric of the work done categorising the species. It struck me that this might not be too clever in their world - surely the description of some rare species ought to be pretty important, but it might not get cited as frequently as some pretty routine work on, for example some agriculturally important species. (In the journal paper world this is the same reason why citation impact alone never used to be the whole story in the UK Research Assessment Exercise)