Yesterday, while on a train to a JISC meeting in Bristol on Rights in a Digital Environment (of which more later), I found time to read through the ICSU Report of the CSPR Assessment Panel on Scientific Data and Information (pdf).

It’s an excellent report. I’ve extracted the key points from my perspective. (I did the extraction to a file, knowing I would be putting it on the web, while listening to a talk on copyright. The verbatim extraction is probably a violation of copyright, but I’m also sure that ICSU would grant me the right to have done this. It’s a case where ICSU really ought to publish the document with an appropriate license, e.g. a Creative Commons license, rather than put the onus on the user to seek permission to use it in this way.)

Definition of Data

For the purposes of the assessment, the panel considered data and information as a continuum ranging from raw research data through to published papers. Data includes, at a minimum: digital observations, scientific monitoring, data from sensors, metadata, model output and scenarios, qualitative or observed behavioral data, visualizations, and statistical data collected for administrative or commercial purposes. Data are generally viewed as input to the research process. Information generally refers to conclusions obtained from analysis of data and the results of research. But the distinction between them is flexible and will vary according to the situation. Increasingly, the output of research (traditionally viewed as “information”) includes data and has become input to other research, rendering the output-input distinction between data and information meaningless. In this report, both “data” and “data and information” are used interchangeably to refer to the entire continuum because the continuum as a whole has been affected by changes in information technology and is subject to many of the same issues. Where appropriate, the distinction is made in the text between “data” and “scientific publications”, which are a specific sub-set of scientific information that raise particular issues.

Example of importance of data archiving

The cost of archiving, which includes the preservation of data integrity and technologically upgrading databases as the software and hardware technologies in which they are embedded are superseded by more advanced versions, is significant and is rarely seen as a high priority activity. However, the adverse effects on future scientific research of not paying due attention to long-term data preservation can be dramatic. For example, data from the first International Polar Year in 1882 would have been invaluable for refining models of global environmental change by today’s researchers, but the data have been lost and cannot be recovered.

Professional Data and Information Management

Scientific data and information management can no longer be viewed as a task for untrained amateurs or as part of the routine clean up at the completion of a research project. The use of advanced information technology in scientific data management and dissemination makes it essential that data management be the responsibility of professionals. Scientific data centers and archives require stability in their financial resources so that they can make institutional commitments to data management and preservation over many decades.

Recommendations 16 through 21

  • The panel recommends that ICSU play a major role in promoting professional data management and that it foster greater attention to consistency, quality, permanent preservation of the scientific data record, and the use of common data management standards throughout the global scientific community.
  • Recognizing that scientific data and information management is undergoing rapid innovation and change, information technology specialists, librarians, research scientists, government data producers, donors, and others should be involved in a concerted effort to develop standards and curricula for professional training for scientific data managers.
  • Financial support for data and information management should become a routine component in all research budgets and the evaluation criteria for assessing research funding proposals should include evaluation of data management.
  • All scientists should receive training in data management as part of their graduate and postgraduate education. ICSU should encourage the development of guidelines for data management by working scientists and their institutions.
  • Scientists should be recognized and given credit for the scientific contribution of the data sets that they produce as well as for the analysis of those data.
  • ICSU, its members and associated bodies should raise awareness of the increasingly important role that institutional repositories play in relation to scientific information management and preservation and the need to ensure that such repositories are properly resourced, developed and maintained.

Metadata

Metadata standards vary across fields of science. However, in a period in which there is increasing interdisciplinary research directed at problems outside the purview of the traditional disciplines, many scientists look to metadata to obtain information about data in fields in which they have little direct training. It is important that they be able to obtain the basic information they need through on-line, publicly accessible metadata catalogues.

The use of common metadata standards across fields of science facilitates the identification, re-use, and integration of scientific data and provides information that future scientists can use to evaluate the data. Metadata should be the principal vehicle for documenting known data quality issues.

Recommendations 22-25

  • ICSU should work with its members to promote the development and use of flexible, open, and easy to use community standards for metadata. These standards should be interoperable and independent of specific hardware and software platforms. Guidelines for their use should be widely circulated and incorporated into data management training courses.
  • Data repositories and publishers should ensure that standard metadata are available for all databases and records.
  • Metadata should be archived and made freely available electronically in multidisciplinary metadata catalogues.
  • To foster the efficient production of metadata, ICSU should encourage the development of software for writing metadata that can be made available to scientists throughout the world.

Archiving

Permanent archiving of scientific data and information is essential. In some fields, there are institutions that have a clear responsibility for data archiving, but this is not always the case and varies from one scientific field to another. There is a distinction between data centers, which are responsible for providing immediate access to scientific data and information, and archives, which provide for permanent preservation and management of data and information.

Recommendations 26-28

  • ICSU and its members should raise awareness of the need for long-term institutional support for data archives both at the national and international level.
  • ICSU should foster discussion within the scientific community, including its members and interdisciplinary bodies, on criteria, institutional structures, and models for decision making related to the permanent preservation of scientific data and information.
  • Ways to reduce the costs of archiving, such as sampling from extant data bases or establishing multiple classifications to prioritize levels of archival support, should be examined by the scientific community.

Also Recommendations 30 and 31

  • Data security and integrity must be addressed in the context of procedures for data management. A professional information technology staff is required for both archives and data centers if they are to maintain adequate system and database security and integrity.
  • Data that are disseminated by scientists as part of research projects should have metadata that describe the security and integrity measures employed in the collection and management of the data.

Recommendations 39 and 40

  • Governments, and other bodies concerned with international and national policy development, should ensure that IPR legislation recognizes the value of ensuring full and open access to data for scientific research and education purposes.
  • ICSU and its members should investigate appropriate mechanisms to ensure that science is fully represented in international treaty negotiations that might have an impact on access to data for scientific purposes.

Recommendation 58

  • (… ICSU should) establish an international Scientific Data and Information Forum (SciDIF) involving all the key stakeholders: ICSU members, interdisciplinary bodies, science funding bodies and other data providers and users. Through SciDIF, ICSU should aim to ensure that the full benefits of new data and information technologies and capabilities are extended to scientists throughout the world.

Significant ICSU bodies for data analysis

There are several interdisciplinary ICSU bodies whose principal focus is the management and use of large scientific data sets: the Committee on Data for Science and Technology (CODATA), the Panel of the World Data Centers (WDC), and the Federation of Astronomical and Geophysical Data Analysis Services (FAGS). ICSU is also a co-sponsor of the Global Observing Systems (GOS). Both individually and working together, many members of ICSU are very actively involved in issues related to scientific data.

Financial Support Issues

Financial support for data management must become a normal component in all research budgets and part of proposal evaluation criteria. One consequence of this in a situation where the overall funding is limited is that financial support for other research activities may decline relative to data management costs. However, by providing widespread access to well-documented and managed research data, improved data management practices will provide economies of scale for the scientific enterprise as a whole, now and in the future.