Research Data Management: Access and Reuse

Data Access 

Granting access to data requires the data to be well-documented, properly organized and stored, licensed (Creative Commons, for example), easily identifiable (DOI), and preserved for the long term.

Benefits of Sharing Data

There are many benefits to sharing data, including:

  • reinforcing open scientific inquiry
  • supporting verification and replication of original results
  • promoting new research and allowing new methods to be tested
  • encouraging collaboration and varied perspectives
  • providing resources for education/teaching
  • reducing cost by avoiding duplicate data collection efforts
  • protecting against faulty or falsified data
  • enhancing visibility and impact of research projects
  • preserving data for future use
  • helping researchers in the broader community produce better research

Citing Data

Citing data properly is essential in order to:

  • Give the data producer appropriate credit
  • Allow access to the data for reference or reuse
  • Enable readers to verify your results

Citation Elements

A dataset should be cited formally in an article's reference list, not just informally in the text. Many data repositories and publishers provide specific instructions for how to cite their data. If no citation information is provided, you can use generally agreed-upon guidelines, such as those based on the DataCite Metadata Schema.

Core elements

There are five core elements of a dataset citation, with additional elements added as needed.

  • Creator(s) – individuals or organizations
  • Title
  • Publication year – the year the dataset was released (may differ from the access date)
  • Publisher – the data center, archive, or repository
  • Identifier – a unique public identifier (e.g., an ARK or DOI)
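As a minimal sketch, the five core elements above can be joined into a single reference-list entry. The function name, the element order (creator, year, title, publisher, identifier), and all sample values are assumptions for illustration only; if a repository or publisher gives its own citation format, follow that instead.

```python
def format_dataset_citation(creators, title, publication_year, publisher, identifier):
    """Join the five core elements into one citation string.

    The ordering used here is one common convention, not the only valid one.
    """
    creator_text = "; ".join(creators)  # individuals or organizations
    return f"{creator_text} ({publication_year}). {title}. {publisher}. {identifier}"


# Placeholder values only: this is not a real dataset or a real DOI.
example = format_dataset_citation(
    creators=["Doe, J.", "Example Lab"],
    title="Example Survey Responses",
    publication_year=2020,
    publisher="Example Data Repository",
    identifier="https://doi.org/10.1234/example",
)
print(example)
```

Additional elements such as a version or access date could be added as optional parameters in the same way.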

Common additional elements

Although the core elements are sufficient for a simple citation (i.e., a citation to the entirety of a static dataset), some additional elements may be needed if you are citing a dynamic or evolving dataset, or a subset of a larger dataset. These include:

  • Version of the dataset analyzed in the citing paper
  • Access date when the data was accessed for analysis in the citing paper
  • Subset of the dataset analyzed (e.g., a range of dates or record numbers, a list of variables)
  • Verifier that the dataset or subset accessed by a reader is identical to the one analyzed by the author (e.g., a checksum)
  • Location of the dataset on the internet, needed if the identifier is not "actionable" (convertible to a web address)
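The verifier element can be illustrated with a checksum comparison. The sketch below assumes the citation publishes a SHA-256 checksum as a hexadecimal string; the choice of SHA-256 and the function names are assumptions, since the algorithm a given repository uses may differ.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so a large dataset does not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, published_checksum):
    """Check a downloaded copy against the checksum given in the citation."""
    return sha256_of_file(path) == published_checksum.strip().lower()
```

If the function returns False, the file a reader downloaded is not byte-for-byte identical to the one the author analyzed, for example because the dataset was updated after publication.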

Unique Identifiers

A final, important step is obtaining a persistent, unique identifier for the dataset (a digital object identifier, or DOI, for example). Unique identifiers make data easier to cite and allow usage statistics to be tracked.

Public data identifiers should be:

  • actionable (you can click on them in a web browser)
  • globally unique
  • persistent
  • maintained (metadata is kept up-to-date)
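To illustrate what "actionable" means in practice, a bare DOI can be turned into a clickable web address by prefixing it with the https://doi.org/ resolver. The sketch below is a simplified assumption: the function name is hypothetical, the DOI shown is a made-up placeholder, and special characters in a DOI are not URL-encoded.

```python
def doi_to_url(doi):
    """Turn a DOI in any common written form into a resolver URL."""
    doi = doi.strip()
    known_prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in known_prefixes:
        # Strip an existing prefix so the result is always in one form.
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi


# Placeholder DOI for illustration; it does not point to a real dataset.
print(doi_to_url("doi:10.1234/example"))
```

Writing the identifier in a citation as a full URL means readers can follow it directly from a web browser.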

Federal Data Sharing Policies