Science. Communication. Community.
When scientists spend millions of dollars and decades of their lives collecting data, why are many so willing to trust all of their efforts to a cheap $100 hard drive?
At any given time while I was in graduate school, my advisor (like most) wanted instant access to my latest data. In our lab, this usually meant a low-tech printout of my spreadsheets or latest PowerPoint slides – and maybe a data DVD if I had something really good.
DVDs of data may have kept me out of hot water with my advisor, but many experts are wondering if these archaic storage media – and even the file formats saved on them – will be of any use to scientists and researchers in the future.
Last month, librarians, researchers, scientists, and information technology specialists met at the Research Data Management Implementation Workshop in Arlington, Virginia. Panelists discussed strategies for research data storage, curation, and sustainability from the perspective of universities, funding agencies, libraries, and industry.
Today’s technology allows scientists to generate terabytes of data in just one experiment. Panelists at the workshop said that the overwhelmingly popular solution for storing this data is DVDs and inexpensive external hard drives.
Though these are affordable storage options (sometimes less than $50 per terabyte), the data they store can only be accessed by one computer in one location at a time. And as graduate students filter through the lab every few years, data is rarely saved in uniform folders or formats.
However, as research becomes more interdisciplinary, data will need to be made more openly accessible – a feat that many of the panelists say can be accomplished with central data storage facilities. These data hubs would have the hardware and security to safely store data, and the librarians and other staff to sustainably curate a myriad of data types.
But all of this equipment and know-how comes at a cost, and with monthly or yearly pricing structures, that cost could be indefinite. Also, funding agencies don’t provide financial support for data storage after the life of the grant itself.
So who is responsible for data storage and curation?
Many at the workshop felt that there should eventually be federally funded storage centers for data collected from federally funded projects. For now though, the ultimate consensus placed the responsibility on individual universities and researchers.
A few facilities (the San Diego Supercomputer Center and RENCI) are already trying out this university-based data hub model, with some success. And the key, they say, is universal user adoption and cooperation. Scientists and researchers will store their data in the campus data center, as long as it’s easy and cheap. Librarians will help researchers find the data they need, as long as researchers provide the correct storage metadata (necessary information about how and when the data were collected).
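To make "storage metadata" concrete, here is a minimal sketch of what such a record might look like. The field names below are purely illustrative assumptions, not a formal standard from the workshop or from any repository:

```python
import json

# Hypothetical example: a minimal metadata record a researcher might
# deposit alongside a dataset so librarians can curate it and other
# scientists can find it. All field names here are illustrative.
metadata = {
    "title": "Hypothetical microscopy time series",
    "creator": "A. Researcher",
    "collected_on": "2012-06-15",          # when the data were collected
    "instrument": "confocal microscope",   # how the data were collected
    "file_format": "TIFF",
    "description": "Cell growth imaged every 10 minutes for 24 hours",
}

# Saving the record as a plain-text JSON "sidecar" file keeps it readable
# long after any one lab's software (or graduate student) is gone.
with open("dataset_metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```

Even a simple plain-text record like this is enough for a curator to answer the two questions that matter decades later: what is this file, and how was it made?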
All of this expense and collaboration might seem like a lot of trouble, but consider this: if the key to life’s mysteries were stored somewhere in a mountain of floppy disks, do you know anyone who’d be willing to look for it?