Megabytes, terabytes, petabytes… data is growing by leaps and bounds. In the sciences and social sciences, the use of geolocation maps, telescope imagery, and particle residue data has compelled computer scientists to create faster, smaller, and higher-capacity data storage devices. But supporting big data in the short term is one of the more manageable challenges facing academia today. Publishing and archiving all of the important data––from the sparest of spreadsheets to corpulent computer files––is perhaps the greatest information challenge facing modern research institutions. Accordingly, U.S. funding agencies such as the National Science Foundation (NSF) have begun to nudge institutions forward by requiring data management plans from some grant recipients (typically, recipients of the agencies' largest grants). Given the importance of these research revenue streams, data management planning is likely to become standard practice soon.
The march toward data management planning has been relatively long and uneven, considering the rapid rise in digital data production. Until recently, individual researchers largely archived their own data (in desk drawers, on office shelves) and made it available to other scholars upon request. Over the past few years, high-impact journals such as PLOS ONE, along with many journals in economics and other fields, have made images, data tables, and raw data files available via their websites. Libraries and schools have launched data repositories; the Yale Law School repository (eYLS), for instance, now houses datasets. These early efforts are giving way to university-wide data management committees, working groups, and pilot projects. Now, research universities are formalizing data management protocols and assisting researchers in managing the lifecycle of their data, from project formulation to long-term preservation.
At Yale, the Data Management Planning Consultation Group, a collaboration of the Yale University Libraries and Information Technology Services, provides pre-submission (i.e., before the funding request is submitted to an agency) consultation on data management plans for NSF and NEH grant proposals. Researchers can submit a consultation request to the group via an online form. Beyond that, a host of campus working groups and task forces have produced reports in the last couple of years; their work is being synthesized into a university-wide data management strategy. Last fall, big data researchers such as YLS professor Andrew Papachristos, along with library and IT administrators, convened Yale's first Day of Data to discuss disciplinary trends and campus strategy. A second symposium is now in the planning stages.
Whether research data is big or small, the academy has recognized the importance of disseminating and preserving it. In the near future, data management planning will be a taken-for-granted step in the research process, akin to human subjects review or research animal care review. If you would like to learn more about data management, contact empirical research librarian Sarah Ryan, a member of Yale's DMP Consultation Group, or check out these library resources: