The Obama administration recently announced a $200 million research initiative in what's known as big data computing - the science of analyzing digital data from a burgeoning diversity of sources to reveal scientific discoveries, educational strategies and profit-making potential, or to make sociological predictions. The Graduate School of Library and Information Science at the University of Illinois has made a specialty of curating big data since 2006. Carole Palmer, who directs the library school's Center for Informatics Research in Science and Scholarship (CIRSS), discussed big data and the Obama initiative in a recent interview with News Bureau news editor Dusty Rhodes.
How big do data have to be to qualify as big data?
I don't really weigh in on the numbers. It's about what you can do with the data - the power of data aggregated from many different investigators across different fields. I look for more high-impact data than just big data.
Give me an example or two of big discoveries made by mining big data.
There's a high-profile example of a schoolteacher identifying a new galaxy, and there are examples involving the impacts of climate change and all kinds of modeling and predicting. The idea is that as data become more public and are accessed not just by the experts who produced them but by other researchers and by citizens, we will have more of these discoveries.
Informatics and data curation are relatively new fields of scholarship. What is your cocktail-party description of your work?
Informatics is about strategies for using information in organizations, networks, cultures and societies. Our job is to make advances that help people get access to and work with information to solve problems and make new discoveries.
That's where data curation comes in. The definition of data curation that we promote is - the active and ongoing management of data through its life cycle of interest and usefulness to scholarship, science and education.
Data are very valuable assets - the raw materials of research - with tremendous potential for re-use in new and innovative ways. But digital data are high risk - extremely fragile and with few standards of good practice.
We study how to collect and add value to data, to promote sharing and integration across institutions and fields of research, looking at both technical and social problems in making data a collective, shared resource.
One of the data curation projects we're involved in is the Data Conservancy. It is a large multi-institutional collaboration led by Johns Hopkins University. We are partners, contributing to research and education through our data-curation initiatives at CIRSS.
How has GSLIS helped pioneer this field?
As a result of research we were doing in 2002 on high-impact information in neuroscience, we recognized the need for data professionals to work together with scientists on the information problems of collecting and organizing data and making it accessible and usable. We saw how our expertise could help scientists do less data management and focus more on solving scientific problems. We got grant funding to develop a specialization in data curation for the master's program - the first of its kind in the country.
We launched the data curation specialization in 2006 with a focus on the sciences and expanded it in 2008 to include the humanities. We now have more than 50 students a year in the Foundations of Data Curation course, with many completing the specialization. Placement is excellent - they are in high demand in the workforce.
We have built partnerships with the National Snow and Ice Data Center, the National Center for Atmospheric Research, and other science and humanities data centers that serve as internship sites and contribute to developing the curriculum to reflect the current and emerging state of practice.
Private businesses such as Google and Facebook have proved capable of analyzing vast amounts of data in creative ways. Why do we need the government to fund this type of work?
We talk about this as an ecosystem of data, with lots of roles for different stakeholders. Google and other kinds of private-sector operations are going to play a really important role, but how do we support our researchers in a particular discipline? What will help the typical scientist in a small lab?
Also, the private sector is not likely to invest in the long-term preservation of data. There are a lot of concerns about the resources being preserved and sustained.
Do you anticipate the U. of I. getting any part of these funds, and if so, for what type of project?
I'm talking to collaborators, and we're cooking up our projects. There's a lot of flurry around this but there really has been for years. Many of these activities are just ongoing. We have a systematic set of research and education initiatives that we've been working on since 2003, so we'll be trying to align that with this recent call.
Some people find the data revolution scary, because they think it smacks of Big Brother. What do you say to those people?
I'm not someone who buys into the worries so much as I see them as research opportunities, although we need to be aware of the ways that technology can be used that aren't beneficial to society. That's part of our job - to also make sure it goes the other way.
Do you ever miss the days when all knowledge, it seemed, could be contained in a card catalog plus Reader's Guide to Periodical Literature?
I think that was always misleading. Our Reader's Guide and the card catalog were always just snapshots that didn't represent the deep, long history of what was happening in the world. What you saw in the Reader's Guide was not an accurate representation of all the world's knowledge.