Dataverse Project

Summary

The Dataverse Project is an open-source web application for sharing, preserving, citing, exploring, and analyzing research data. It makes data easier to share with others and makes the work of others easier to replicate.

The idea behind the Dataverse Project is to automate much of the work of the professional archivist while providing services for data creators and ensuring they receive proper credit.

Promoting organizations

Dataverse is an initiative driven by the Institute for Quantitative Social Science (IQSS) at Harvard University, which has led the development of the application since 2006. The Harvard Dataverse is supported by the Harvard Library and Harvard University Information Technology (HUIT), ensuring its technical and institutional sustainability.

At the international level, the project has expanded through a network of national and institutional repositories that use the Dataverse software, such as DataverseNL (Netherlands), DataverseNO (Norway), heiDATA (Germany), CIRAD Dataverse (France), and Borealis (Canada), the last of which is coordinated by the Ontario Council of University Libraries. These implementations have given rise to a global, federated community that shares best practices and collaborates in the ongoing development of the platform.

Objectives

Before the Dataverse Project existed, researchers had to choose between receiving credit for their data (by managing distribution themselves, but without long-term preservation guarantees) and sending the data to a professional archive, where long-term preservation was ensured but they received little credit.

The Project eliminates this choice through the establishment of a Dataverse repository, which hosts multiple virtual archives called Dataverse collections. Installing a Dataverse collection on an institution’s website preserves the look, branding, and URL of the site, while also providing an academic citation for the data that gives the institution full credit and visibility on the web. The institution’s webpage is served by a Dataverse repository, with institutional support and long-term preservation guarantees.

At present, the strategic goals of the Project are to:

  • Increase adoption (by users, Dataverse repositories, Dataverse collections, datasets, and journals).
  • Develop the capacity to handle sensitive (level-3), large-scale, and streaming data.
  • Expand data and metadata functionalities for existing and emerging disciplines.
  • Extend archiving and preservation features.
  • Increase contributions from the open-source development community.
  • Improve user experience and user interface.
  • Continue to enhance the overall quality of the software.

Beneficiaries and stakeholders

The main beneficiaries of the Dataverse Project are researchers, journals, data authors, editors, data distributors, and affiliated institutions, as they gain academic recognition and online visibility while also ensuring the long-term preservation of research data.

Results

The Dataverse Project standardizes dataset citation and makes it easier for researchers to publish their data and receive proper recognition for their work. Because the platform increases the visibility of research data, researchers gain appropriate academic credit for their contributions.

At the same time, depositing research data in a Dataverse repository helps researchers meet the data management plan requirements set by funding agencies.

In addition, the project helps researchers meet the obligation, imposed by many journals, publishers, and funding bodies, to deposit replication datasets in a public repository.

The project supports researchers in depositing replication datasets and makes these data easily discoverable, so that other researchers can reuse them and verify that a study can be replicated without needing to contact the original authors.

Challenges

The implementation of Dataverse requires overcoming significant technical complexity and a steep learning curve, which demands specialized training. In addition, it involves infrastructure costs and the commitment of IT staff to ensure its maintenance and sustainability.

Evidence of success

The Dataverse Project has grown considerably over time and is now a major international collaborative initiative. The Dataverse software has also been translated into several languages.

Between 2017 and 2022, the number of Dataverse collections increased by more than 10,000, and the total number of datasets grew by nearly 200,000.

As for the total volume of files in the project, it rose from 184,000 files in 2016 to more than 2 million by June 2022. Over the same period, the number of file downloads increased roughly 37-fold, from 1.86 million to a total of 68.1 million in June 2022.
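
The growth factors implied by these figures can be checked with simple arithmetic. The following is a minimal sketch in Python; the function name `growth_factor` and the variable names are illustrative assumptions, and the file count for 2022 is treated as a lower bound because the text only states "more than 2 million".

```python
def growth_factor(start: float, end: float) -> float:
    """Return how many times larger `end` is than `start`."""
    if start <= 0:
        raise ValueError("start must be positive")
    return end / start


# Figures quoted in the text (2016 versus June 2022).
# The 2022 file count is a lower bound ("more than 2 million").
files_2016, files_2022 = 184_000, 2_000_000
downloads_2016, downloads_2022 = 1_860_000, 68_100_000

file_growth = growth_factor(files_2016, files_2022)
download_growth = growth_factor(downloads_2016, downloads_2022)

print(f"Files grew at least {file_growth:.1f}-fold")
print(f"Downloads grew about {download_growth:.1f}-fold")
```

Under these assumptions, the file count grew more than tenfold and downloads grew roughly 37-fold over the period.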

Specific information

Topic: Open access policies, Research data, Digital preservation

Implementation scale: International

Responsible agents: Universities (governing bodies), Researchers

Location: United States

Key words: repositories, preservation, open source software

Start and end date: 2006 - present

Sustainability: Yes

Authorship information

Created on: 3 August 2022

Author of record: Berta Ollé Pérez

Institution author: Universitat de Barcelona