Research Data Management: Archive and Preserve

Preserving Data

Data preservation should be considered early in the planning of any new research project.

  • Preserving data ensures that it will be accessible and re-usable for the long term.
  • Your research is often judged by the data you collect.  Verifiable data is the only way your research can be judged as sound. 
  • Many institutions and researchers are legally required to manage and retain research data for several years after project funding has ended.

Storage vs. Preservation

Storage simply refers to placing data somewhere it can be accessed when needed. Data can be stored on local internal and external hard drives, cloud-based systems (Box@SMU, Dropbox, Google Drive, Amazon Web Services, etc.), or servers.

Storing data does not safeguard against degrading media or obsolescence of the data formats. When you leave digital data on servers or hard drives without performing proper preservation maintenance, the data can eventually become unreadable or unusable.
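
One common form of preservation maintenance is a fixity check: recording a checksum for each file and re-checking it later to detect silent corruption or loss. The Python sketch below is illustrative only and is not a procedure prescribed by this guide. The function names (sha256_of, build_manifest, find_changes) and the choice of SHA-256 are assumptions made for the example, and it uses only the Python standard library (Python 3.9 or later).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to data_dir) to its checksum."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def find_changes(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are now missing or no longer match."""
    current = build_manifest(data_dir)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

In practice, a manifest would be built once when the dataset is finalized and saved alongside it; running find_changes on a schedule then flags files that need to be restored from another copy. A check like this detects damage but does not address format obsolescence, which requires migrating data to current formats.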

Preservation is the process by which research data is maintained so that it remains usable over the long term. The preservation process has three general steps: appraising the data, selecting a repository, and documenting and depositing the data.

Benefits of preservation include:

  • Allowing for re-analysis of the same products to determine whether the same conclusions are reached
  • Allowing for re-use of the products for new analysis and discovery
  • Allowing for restoration of original products in the case that working datasets are lost

Repositories

A data repository (or archive) stores and maintains data for the long term. Data repositories are often web-accessible and user-friendly to allow for easy discovery and re-use. They also provide supporting identifiers that facilitate proper citation.

There are two types of data repositories: domain and institutional.

Domain Repositories: These are discipline-specific repositories that offer benefits such as specialized metadata, review and validation by experts in the field, and specialized search and discovery tools. They are the preferred option when one is available for a particular discipline.

Examples of domain repositories include eCrystals, PubChem, the National Oceanographic Data Center (NODC), the Protein Data Bank, GenBank, and the Inter-university Consortium for Political and Social Research (ICPSR). Not all disciplines have a dedicated repository.

Institutional Repositories: These repositories collect and maintain the research outputs of a particular institution or group of institutions. 

SMU Scholar is the repository we use. It houses research articles, theses and dissertations, data sets, and other digital assets. It is open-access and free for faculty and students to publish in, and it offers unlimited storage and secure, perpetual links.

 

Data Repository Lists

Repository Assessment

Important factors to consider when selecting a repository include the type of data, its importance to the field, potential future uses, and privacy.

CRL TRAC Metrics - This tool is used by the Center for Research Libraries (CRL) to audit and certify digital repositories. The three main categories are Organizational Infrastructure; Digital Object Management; and Technologies, Technical Infrastructure, & Security. The checklist specifies the requirements for certification as a trusted archive.

CRL Ten Principles - The ten basic characteristics of quality digital preservation repositories, created by the CRL.

Data Seal of Approval - An international organization that certifies repositories based on a set of 16 requirements. These requirements offer a good basis on which to evaluate the repositories you are considering.