CHAMPAIGN, Ill. - The proliferation of scientific research data is creating an urgent situation for the organizations and professionals charged with data handling and stewardship, according to new research published by a University of Illinois expert in information science. The research indicates that demand for data management and preservation services will be high, especially among scientists working in areas traditionally considered "small science" disciplines.
Melissa Cragin, a professor of library and information science, says that although the scientists in her study regularly exchanged data with trusted collaborators, sharing with anyone outside their inner circle, sometimes including other members of a project team, usually took place through "just in time" negotiations - that is, at the moment of need.
The study, published in the journal Philosophical Transactions of the Royal Society A, highlights the need for data curation services tailored for the "small sciences" - typically described as hypothesis-driven research led by a single investigator or small research group that generates and analyzes its own data.
One particular concern lies in the high level of variation and complexity in research data and data sharing practices among scientists. For most of the participants in this study, "there were no field-wide norms for sharing, and none of the scientists routinely deposited data into any shared repositories," Cragin said.
"A lot of the current cyber-infrastructure and e-science initiatives have been developed around sciences that are already well-established and have regularly applicable standards and sharing practices in place," she said.
"For some other disciplines, these initiatives act as catalysts for developing community-based practices; all of these research communities tend to be large and distributed, and are often dependent on common resources."
However, data management at the small-science level tends to be very idiosyncratic, with practices often varying from lab to lab. In addition, scientists rarely have the resources to prepare data for public sharing.
The focus on disciplines that produce very large data sets can often obscure the possible value in small data sets.
"Other research has shown that small science generates a substantial amount of data, and, over time, perhaps even more than big science," Cragin said. "Unfortunately, we don't yet have systems in place to facilitate best practice for data management or to address the range of needs related to sharing practices, but this is becoming increasingly important, particularly as funding agencies add requirements for data sharing and data-management plans."
The ideal of open science, in which scientists freely share their data, is complicated by a number of issues, according to the study.
"Sharing practices are conditioned by a number of factors, including the research methods, the characteristics of the data themselves, scientists' personal experiences, and the need to control the dissemination of one's own work," Cragin said.
When researchers do share data prior to publication, there is often concern that findings could be misused by a rival researcher, or someone seeking to cherry-pick data to fit a pre-existing hypothesis.
"Luckily, those sorts of negatives are not widespread but they do temper researchers' inclination to share data," Cragin said.
The data generated from small science research often have long-term value, such as observational data that record a non-repeating occurrence of some phenomenon. Whether they come from agronomists studying water quality, atmospheric scientists studying severe weather patterns, or civil engineers studying traffic patterns, such data may also be aggregated with similar observations, which increases their value for future computation because "we might be able to put them together and mine them for new findings," Cragin said.
"Building research data collections requires that we understand how all those research practices happen, so that we are better able to support publication and stewardship of research data, much as we have with preserving traditional publications like scientific articles."
Cragin's co-authors are Carole L. Palmer, a professor of library and information science at Illinois, and Jacob R. Carlson and Michael Witt, both of Purdue University.