Figure One

Science. Communication. Community.

The Problem With Research Data Storage Is…You

When scientists spend millions of dollars and decades of their lives collecting data, why are many so willing to trust all of their efforts to a cheap $100 hard drive?

by Jessica Stoller-Conrad

Scientists, are these really the best we’ve got?

At any given time while I was in graduate school, my advisor (like most) wanted instant access to my latest data.  In our lab, this usually meant a low-tech printout of my spreadsheets or latest PowerPoint slides – and maybe a data DVD if I had something really good.

DVDs of data may have kept me out of hot water with my advisor, but many experts are wondering if these archaic storage media – and even the file formats saved on them – will be of any use to scientists and researchers in the future.

Last month, librarians, researchers, scientists, and information technology specialists met at the Research Data Management Implementation Workshop in Arlington, Virginia.  Panelists discussed strategies for research data storage, curation, and sustainability from the perspective of universities, funding agencies, libraries, and industry.

Today’s technology allows scientists to generate terabytes of data in just one experiment.  Panelists at the workshop said that the overwhelmingly popular solution for storing this data is the use of DVDs and inexpensive external hard drives.

Though these are affordable storage options (sometimes less than $50 per terabyte), the data they store can only be accessed by one computer in one location at a time.  And as graduate students filter through the lab every few years, data is rarely saved in uniform folders or formats.

However, as research becomes more interdisciplinary, data will need to be made more openly accessible – a feat that many of the panelists say can be accomplished with central data storage facilities.  These data hubs would have the hardware and security to safely store data, and the librarians and other staff to sustainably curate a myriad of data types.

But all of this equipment and know-how comes at a cost, and with monthly or yearly pricing structures, that cost could be indefinite.  Also, funding agencies don’t provide financial support for data storage after the life of the grant itself.

So who is responsible for data storage and curation?

Many at the workshop felt that there should eventually be federally-funded storage centers for data collected from federally-funded projects.  For now though, the ultimate consensus placed the responsibility on individual universities and researchers.

A few facilities (the San Diego Supercomputer Center and RENCI) are already trying out this university-based data hub model, with some success. And the key, they say, is universal user adoption and cooperation. Scientists and researchers will store their data in the campus data center, as long as it’s easy and cheap. Librarians will help researchers find the data they need, as long as researchers provide the correct storage metadata (necessary information about how and when the data were collected).

All of this expense and collaboration might seem like a lot of trouble, but consider this:  if the key to life’s mysteries were stored somewhere in a mountain of floppy disks, do you know anyone who’d be willing to look for it?


About jstoll01

Jessica began her journalistic endeavors as an embarrassingly informal food critic for her college newspaper. After dropping the fork and picking up a micropipettor, she spent two years as a genetics research technician and three years in graduate school before trying her hand at science writing. Upon receiving a Master’s degree in Biological Sciences from the University of Notre Dame in May 2012, Jessica participated in the AAAS Mass Media Fellowship program as a Science Desk intern at NPR in Washington, D.C. There, she contributed a number of posts to the health blog (Shots) and the food blog (The Salt). She continues to write regularly for the NPR blogs, National Geographic News and other media outlets as a freelancer, currently based in Southern California.

3 comments on “The Problem With Research Data Storage Is…You”

  1. pdiff
    April 3, 2013

    Yes, Yes, Yes!!! This is my biggest pet peeve as a consulting statistician. When the research is done, many can’t even locate the “old” data a few years later. Thank you for covering this topic 🙂 .

  2. molsym
    April 3, 2013

    Surely this is where we should put pressure on academic publishers. They make fortunes out of research so can’t we force them to give something back by supplying links to the raw data, stored at their cost, on their websites? Obviously, more detail than this initial idea would need to be thought about but they are currently pariahs and this would give them a way to demonstrate ‘added value’.

  3. Pingback: Compiler 4/12/13: TEDxCERN and City Data Culture | ScaleOut



This entry was posted on April 2, 2013 in Policy and Research.