
Using Data in Research: Process

Stage 3: Process data

Processing data can include cleaning, wrangling, formatting, compression, encryption, and creating summary data. 
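To illustrate two of these steps, the sketch below uses only Python's standard library. It creates summary data for one column and compresses a small table with gzip. The field names (`site`, `value`) and the values are hypothetical. Encryption is left out because it usually relies on a third-party library or an institutional tool.

```python
import csv
import gzip
import io
import statistics


def summarize(values):
    """Create simple summary data for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


def compress_rows(rows, fieldnames):
    """Write rows as CSV text, then gzip-compress the resulting bytes."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return gzip.compress(buffer.getvalue().encode("utf-8"))


# Hypothetical example data
rows = [
    {"site": "A", "value": 3.0},
    {"site": "A", "value": 5.0},
    {"site": "B", "value": 4.0},
]
print(summarize([r["value"] for r in rows]))

packed = compress_rows(rows, ["site", "value"])
# gzip is lossless: decompressing restores the original CSV text exactly
print(gzip.decompress(packed).decode("utf-8"))
```

Very small files may not shrink, because gzip adds a header. The savings appear with larger datasets.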

Cleaning

Cleaning is the process of finding and correcting (or removing) errors in a dataset, such as typos, duplicate records, and inconsistent formatting. Excel can be used for basic cleaning, but a specialized tool like OpenRefine lets you save your steps and reproduce them later.
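A scripted workflow offers a similar benefit to saving your steps in OpenRefine, because the same cleaning can be rerun exactly on new or corrected data. The Python sketch below is a hypothetical example and is not an OpenRefine export. The field names (`name`, `age`) and the choice to drop rows with missing or unparseable ages are assumptions made for illustration.

```python
def clean_records(records):
    """Apply the same cleaning steps, in the same order, every time it runs.

    Assumes each record is a dict with hypothetical 'name' and 'age' text fields.
    """
    missing_markers = {"", "n/a", "na"}
    cleaned = []
    seen = set()
    for rec in records:
        # Step 1: trim stray whitespace and standardize capitalization
        name = rec["name"].strip().title()
        age_text = rec["age"].strip()
        # Step 2: drop rows where the age is blank or a missing-value placeholder
        if age_text.lower() in missing_markers:
            continue
        # Step 3: drop rows where the age is not a whole number
        try:
            age = int(age_text)
        except ValueError:
            continue
        # Step 4: remove duplicates that differed only in spacing or capitalization
        key = (name, age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


raw = [
    {"name": " alice ", "age": "34"},
    {"name": "ALICE", "age": "34"},
    {"name": "bob", "age": "n/a"},
    {"name": "Cara", "age": "twenty"},
    {"name": "dan", "age": " 41 "},
]
print(clean_records(raw))
```

Whether a problem row should be dropped, flagged, or fixed is a research decision. This example drops rows only to keep it short.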

Transformation

Data transformation is the process of converting data from one format or structure into another to facilitate analysis. Transformation may also be referred to as data wrangling or data munging. Depending on the tools you are using for analysis, you may need to change the format, structure, or content of your data to make sure it is compatible and useful for analysis. You may need to normalize the data, convert data types, aggregate the data, or filter out irrelevant data.
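The four operations listed above can be sketched in a few lines of Python. This minimal illustration uses hypothetical sales records. Here, "normalize" means min-max scaling to the range 0 to 1, which is only one of the term's meanings. In databases, for example, normalization refers to restructuring tables.

```python
from collections import defaultdict


def transform(rows, min_year):
    """Convert types, filter, aggregate, and normalize hypothetical sales records."""
    # Convert data types: the raw values arrive as text
    typed = [
        {"region": r["region"], "year": int(r["year"]), "sales": float(r["sales"])}
        for r in rows
    ]
    # Filter out irrelevant data: keep only records from min_year onward
    recent = [r for r in typed if r["year"] >= min_year]
    # Aggregate: total sales per region
    totals = defaultdict(float)
    for r in recent:
        totals[r["region"]] += r["sales"]
    if not totals:
        return {}, {}
    # Normalize: min-max scale each total to the range 0..1
    low, high = min(totals.values()), max(totals.values())
    span = high - low
    scaled = {
        region: (total - low) / span if span else 0.0
        for region, total in totals.items()
    }
    return dict(totals), scaled


raw = [
    {"region": "north", "year": "2021", "sales": "120"},
    {"region": "north", "year": "2022", "sales": "180"},
    {"region": "south", "year": "2022", "sales": "60"},
    {"region": "south", "year": "2019", "sales": "90"},
]
totals, scaled = transform(raw, min_year=2021)
print(totals)
print(scaled)
```

The order matters. Types are converted before filtering so the year comparison is numeric. Filtering happens before aggregation so excluded years do not affect the totals.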

Ethics & Compliance

For sensitive data, like medical information, researchers may need to follow additional legal or ethical guidelines. There are general guidelines as well as discipline-specific considerations. 

One universal consideration is de-identification, which involves removing information that could identify individual participants from the data.
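The following minimal sketch shows how direct identifiers might be removed. It assumes records are stored as Python dictionaries, and all field names are hypothetical. The function drops name, email, and phone, reduces an exact birth date to a year, and assigns a random study ID. These steps alone do not guarantee that a dataset is de-identified under a specific standard such as HIPAA, because combinations of the remaining fields can still point to individuals. Check the rules that apply to your data.

```python
import secrets

# Hypothetical field names; adjust to match your own dataset
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def deidentify(records):
    """Return new records without direct identifiers; the input is not modified."""
    used_ids = set()
    result = []
    for rec in records:
        # Drop fields that directly identify a person
        kept = {k: v for k, v in rec.items() if k not in DIRECT_IDENTIFIERS}
        # Generalize an exact birth date (assumed YYYY-MM-DD) to just the year
        if "birth_date" in kept:
            kept["birth_year"] = kept.pop("birth_date")[:4]
        # Assign a random study ID that is not derived from the original data
        study_id = secrets.token_hex(4)
        while study_id in used_ids:
            study_id = secrets.token_hex(4)
        used_ids.add(study_id)
        kept["study_id"] = study_id
        result.append(kept)
    return result


record = {
    "name": "Ana Diaz",
    "email": "ana@example.org",
    "phone": "555-0100",
    "birth_date": "1987-06-14",
    "diagnosis": "asthma",
}
print(deidentify([record]))
```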

When using data, it's important that the output be both useful and ethical. One way to achieve this is to follow a set of five guidelines known as the Five C's, a framework for applying the golden rule to data.

  • Consent: Obtain consent from participants to collect and use their data
  • Clarity: Be clear about what data you are collecting, what will be done with it, and any consequences of how it will be used
  • Consistency and trust: Maintain trust by consistently following through on protections (HIPAA, IRB, etc.) and communications
  • Control and transparency: Support participants' control of their data by being transparent about how it is being used and stored
  • Consequences: Be honest about potential harms and stay vigilant for unforeseen consequences