
Research Data Management: Process & Analyze

Processing Data

Data must be processed before it can be analyzed. This can involve verifying, organizing, transforming, integrating, or extracting the data from its current form. The processing phase is where problems with the data are identified and corrected.

Document your processing methods so that you can reuse your data and so that others can use it as well. Well-documented data is identifiable and usable, and research results based on it are more likely to be replicated and verified.
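
As an illustration of processing and documenting in the same step, here is a minimal Python sketch. The column names, sample values, and the valid temperature range are hypothetical and chosen only for the example. It verifies each record, drops the ones that fail a check, and keeps a log of every decision so the processing can be described later.

```python
import csv
import io

# Hypothetical raw data; column names and values are illustrative only.
RAW_CSV = """site,date,temp_c
A,2021-03-01,12.5
B,2021-03-01,
A,2021-03-02,999
B,2021-03-02,11.0
"""


def process(raw_text, valid_range=(-50.0, 60.0)):
    """Return (clean_rows, log); the log records every processing decision."""
    clean_rows, log = [], []
    reader = csv.DictReader(io.StringIO(raw_text))
    # Line 1 of the file is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        value = row["temp_c"].strip()
        if not value:
            log.append(f"line {line_no}: dropped, temp_c missing")
            continue
        temp = float(value)
        if not valid_range[0] <= temp <= valid_range[1]:
            log.append(f"line {line_no}: dropped, temp_c={temp} outside {valid_range}")
            continue
        clean_rows.append({"site": row["site"], "date": row["date"], "temp_c": temp})
    log.append(f"kept {len(clean_rows)} rows")
    return clean_rows, log


rows, processing_log = process(RAW_CSV)
for entry in processing_log:
    print(entry)
```

Saving the log alongside the cleaned file is one simple way to keep a record of what was changed and why.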

Analyzing Data

Analysis of data helps you to describe facts, detect patterns, develop explanations, and test hypotheses. It can also mean reviewing and evaluating whether data that has been created or acquired can be saved for long-term access and preservation. The process includes data quality assurance, statistical data analysis, modeling, and interpretation of analysis results.

Different techniques are used for data analysis, depending on the field of research. Some institutions use high-performance computing (HPC) systems to analyze very large volumes of data.

Data mining and data visualization are important techniques in this process, and there are various tools that are used. R and Python are among the most popular languages used for data analysis. 
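
As a small example of describing data in Python, the sketch below summarizes a list of measurements using only the standard library. The values are made up for illustration, and the choice of summary statistics is an assumption; the right analysis depends on your field and research question.

```python
import statistics

# Hypothetical measurements, for illustration only.
measurements = [12.5, 11.0, 13.2, 12.8, 11.9]

# Basic descriptive statistics: a first step before modeling or testing.
summary = {
    "n": len(measurements),
    "mean": round(statistics.mean(measurements), 2),
    "median": statistics.median(measurements),
    "stdev": round(statistics.stdev(measurements), 2),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```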

Documenting and Describing Data

Data must be properly documented if it is to be understood, reused, and cited. Metadata is the information that documents and describes data. Basic information that needs to be recorded includes:

  • Data collection: who, when, and why 
  • Data interpretation information: experimental conditions, statistical sampling, calibration information
  • Data rights and responsibilities, including licensing (if the data is shared) or conditions of access (if access is restricted)

 

Metadata can also be created at the project level (broader) and at the dataset level (narrower). Examples of these are:

Project-level Documentation - the “who, what, where, when, how and why” of the dataset, context for understanding why the data were collected and how data were used.

  • Name of project
  • Principal investigator and collaborators
  • Context of data collection (geographic location, date of collection, etc.)
  • Data collection methods
  • Structure, organization of data files
  • Data sources used
  • Data validation, quality assurance
  • Transformations of data from the raw data through analysis
  • Information on confidentiality, access & use conditions
  • Project sponsor (if any)
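
One possible way to capture the project-level items listed above is a structured record that can be saved next to the data. The following Python sketch is only an illustration: the field names, the placeholder values, and the `missing_fields` helper are hypothetical and do not follow any particular metadata standard.

```python
import json

# All values are placeholders for illustration, not real project details.
project_metadata = {
    "project_name": "Example stream temperature survey",
    "principal_investigator": "A. Researcher",
    "collaborators": ["B. Colleague"],
    "collection_context": {"location": "Example County", "dates": "2021-03-01/2021-03-02"},
    "collection_methods": "Handheld probe, one reading per site per day",
    "file_structure": "data/raw/ holds original files, data/clean/ holds processed files",
    "data_sources": ["field measurements"],
    "quality_assurance": "Readings outside the range -50 to 60 degrees C were dropped",
    "transformations": "Raw CSV files cleaned and merged before analysis",
    "access_conditions": "No confidential data; openly shareable",
    "sponsor": None,  # optional: recorded only if the project has a sponsor
}

# Fields treated as required in this sketch (an assumption, not a standard).
REQUIRED = [
    "project_name", "principal_investigator", "collection_context",
    "collection_methods", "file_structure", "data_sources",
    "quality_assurance", "transformations", "access_conditions",
]


def missing_fields(record, required):
    """List required fields that are absent or left empty."""
    return [key for key in required if key not in record or record[key] in ("", [], {})]


missing = missing_fields(project_metadata, REQUIRED)
readme_text = json.dumps(project_metadata, indent=2)
print("missing fields:", missing)
print(readme_text)
```

A plain-text README covering the same items works just as well; the benefit of a structured record is that completeness can be checked automatically.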

Dataset-level Documentation - more detail about the data and the dataset

  • Variable names and descriptions
  • Explanation of codes and classification schemes used
  • Algorithms used to transform data
  • Data acquisition details
  • File format and software (including version) used
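
Dataset-level documentation of this kind is often kept as a data dictionary (codebook). The Python sketch below assumes a hypothetical dataset with `site`, `date`, and `temp_c` variables. It records a description for each variable, explains the codes used, and flags any variable or code in the data that the dictionary does not document.

```python
# Hypothetical data dictionary; variables, codes, and units are illustrative.
data_dictionary = {
    "site": {
        "description": "Sampling site identifier",
        "codes": {"A": "Upstream site", "B": "Downstream site"},
    },
    "date": {"description": "Collection date, ISO 8601 (YYYY-MM-DD)"},
    "temp_c": {"description": "Water temperature", "unit": "degrees Celsius"},
}


def check_against_dictionary(rows, dictionary):
    """Report variables and coded values that the dictionary does not document."""
    problems = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            entry = dictionary.get(column)
            if entry is None:
                problems.append(f"row {i}: undocumented variable '{column}'")
            elif "codes" in entry and value not in entry["codes"]:
                problems.append(f"row {i}: undocumented code '{value}' in '{column}'")
    return problems


rows = [
    {"site": "A", "date": "2021-03-01", "temp_c": 12.5},
    {"site": "C", "date": "2021-03-02", "temp_c": 11.0, "notes": "windy"},
]

problems = check_against_dictionary(rows, data_dictionary)
for problem in problems:
    print(problem)
```

Running a check like this before sharing a dataset helps keep the documentation and the data in step.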

Open Source Option