
Research Data Management: Archive & Share

Why Should I Preserve My Data?

Data preservation is the process of ensuring that your research data remain accessible over the long term so that others can verify and reuse them. Many institutions and researchers are also legally required to manage and retain research data for several years.

Storage keeps your data accessible during active research, but it does not protect against degrading media or obsolete data formats. Without preservation, your data may eventually become unreadable or unusable.

Data preservation has three steps:

  1. Decide what datasets are worth preserving,
  2. Select a repository, and
  3. Document and deposit your data.

Data Repositories

A data repository or data archive preserves data for the long term. Repositories are often web-accessible, which makes data easy to discover and reuse, and they assign persistent identifiers (such as DOIs) that support proper citation.

Selecting a Repository

  • Important factors to consider when selecting a repository include the type of data, its importance to the field, potential future uses, and privacy.
  • Check whether your funder or journal has specific repository requirements.
  • Domain (or disciplinary) repositories offer specialized metadata, review and validation by experts in the field, and specialized search and discovery tools. They are the preferred option when one exists for your discipline.
  • If a domain repository is not available, there are several general-purpose repositories that can fulfill journal and funder requirements.

SMU Dataverse Repository

More Repositories

Preferred File Formats

Try to choose formats that are open, unencrypted, uncompressed, and in common usage by the research community. Examples include:

  • Databases: XML, CSV
  • Containers: TAR, GZIP, ZIP
  • Geospatial: SHP, DBF, GeoTIFF, NetCDF
  • Moving images: MOV, MPEG, AVI, MXF
  • Sound: WAVE, AIFF, MP3, MXF
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP
  • Tabular data: CSV
  • Text: XML, PDF/A, HTML, plain text (ASCII or UTF-8 encoded)
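Before depositing, it can help to scan a dataset folder for files that are not in one of the formats listed above. The following is a minimal sketch, not an official tool. It assumes that a file's extension reliably indicates its format, and the extension mapping (for example ".nc" for NetCDF, ".jp2" for JPEG 2000, ".sas7bdat" for SAS data, ".txt" for plain text) is an assumption you should adjust for your field. The function names are hypothetical.

```python
from dataclasses import dataclass  # noqa: F401  (kept minimal; only pathlib is required)
from pathlib import Path

# Hypothetical mapping from the preferred-format list to common file extensions.
# These extension choices are assumptions; adjust them for your discipline.
PREFERRED_EXTENSIONS = {
    ".xml", ".csv",                                # databases, tabular data
    ".tar", ".gz", ".zip",                         # containers
    ".shp", ".dbf", ".tif", ".tiff", ".nc",        # geospatial
    ".mov", ".mpg", ".mpeg", ".avi", ".mxf",       # moving images
    ".wav", ".aif", ".aiff", ".mp3",               # sound
    ".txt", ".dta", ".por", ".sas7bdat", ".sav",   # statistics, plain text
    ".jp2", ".pdf", ".png", ".gif", ".bmp",        # still images
    ".html", ".htm",                               # text
}


def is_preferred_format(filename):
    """Return True if the file's last extension (case-insensitive) is in the preferred set."""
    return Path(filename).suffix.lower() in PREFERRED_EXTENSIONS


def find_nonpreferred_files(root):
    """Recursively list files under `root` whose extension is not in the preferred set."""
    return [
        path
        for path in sorted(Path(root).rglob("*"))
        if path.is_file() and not is_preferred_format(path.name)
    ]
```

For example, `find_nonpreferred_files("my_dataset")` would return every file in a hypothetical `my_dataset` folder, such as `.docx` or `.xlsx` files, that may be worth converting to an open format. Note that an extension check cannot tell, say, PDF from PDF/A; it only flags candidates for a closer look.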

If converting to a non-proprietary format would lose information, keep a copy in both the proprietary and the open format, and record the software name and version in a Readme file.
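One way to keep that Readme consistent is to generate the relevant section from a short list of file pairs. This is a minimal sketch under stated assumptions: the plain-text layout, the `FormatPair` fields, and the function names are all hypothetical, and your repository may prescribe its own Readme template.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FormatPair:
    """Hypothetical record of a proprietary file and its open-format copy."""
    proprietary: str  # e.g. the original statistics or spreadsheet file
    open_copy: str    # the exported open-format version
    software: str     # software that created the proprietary file
    version: str      # version of that software


def format_readme(pairs):
    """Build plain-text Readme lines describing each proprietary/open pair."""
    lines = ["File formats", "============", ""]
    for pair in pairs:
        lines.append(f"Proprietary file: {pair.proprietary}")
        lines.append(f"Open-format copy: {pair.open_copy}")
        lines.append(f"Created with: {pair.software} {pair.version}")
        lines.append("")  # blank line between entries
    return "\n".join(lines)


def write_readme(pairs, path="README.txt"):
    """Write the Readme text to `path` as UTF-8 and return the text."""
    text = format_readme(pairs)
    Path(path).write_text(text, encoding="utf-8")
    return text
```

For instance, `format_readme([FormatPair("survey.sav", "survey.csv", "ExampleStats", "1.2")])` describes one hypothetical pair; replace the placeholder software name and version with the ones you actually used.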