Processing data can include cleaning, wrangling, formatting, compression, encryption, and creating summary data.
Cleaning is the process of removing errors in datasets. Excel can be used for basic cleaning, but a specialized tool like OpenRefine allows you to save your steps and reproduce them later.
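Writing cleaning steps as a script is another way to make them reproducible. The following is a minimal sketch in Python, not a prescribed workflow: the function name `clean_rows`, the field names, and the sample rows are all made up for illustration. It trims stray whitespace, drops rows with a missing required value, and drops exact duplicates.

```python
def clean_rows(rows, required):
    """Trim whitespace, drop incomplete rows, and drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Strip stray whitespace from every string value.
        tidy = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        # Skip rows where a required field is empty or absent.
        if any(tidy.get(field) in ("", None) for field in required):
            continue
        # Skip rows identical to one that was already kept.
        key = tuple(sorted(tidy.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(tidy)
    return cleaned


# Hypothetical sample data: one duplicate (after trimming) and one incomplete row.
raw = [
    {"id": "1", "city": " Boston "},
    {"id": "1", "city": "Boston"},
    {"id": "2", "city": ""},
    {"id": "3", "city": "Denver"},
]
cleaned = clean_rows(raw, required=["id", "city"])
print(cleaned)
```

Because the steps live in code, rerunning the script on an updated copy of the data repeats exactly the same cleaning.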
Data transformation is the process of converting data from one format or structure into another to facilitate analysis. Transformation may also be referred to as data wrangling or data munging. Depending on the tools you are using for analysis, you may need to change the format, structure, or content of your data to make sure it is compatible and useful for analysis. You may need to normalize the data, convert data types, aggregate the data, or filter out irrelevant data.
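The four operations listed above can be sketched in a few lines of Python. This is an illustrative example only: the records, field names, and the choice of min-max scaling as the meaning of "normalize" are assumptions, and other kinds of normalization (such as standardizing units or restructuring tables) may be what your analysis needs.

```python
from collections import defaultdict

# Hypothetical records as they might arrive from a CSV file: every value is text.
records = [
    {"site": "A", "temp_c": "20.0", "valid": "yes"},
    {"site": "A", "temp_c": "30.0", "valid": "yes"},
    {"site": "B", "temp_c": "10.0", "valid": "yes"},
    {"site": "B", "temp_c": "99.0", "valid": "no"},
]

# Convert data types: turn the temperature text into a number.
typed = [{**r, "temp_c": float(r["temp_c"])} for r in records]

# Filter out irrelevant data: keep only readings flagged as valid.
kept = [r for r in typed if r["valid"] == "yes"]

# Aggregate: compute the mean temperature per site.
groups = defaultdict(list)
for r in kept:
    groups[r["site"]].append(r["temp_c"])
site_means = {site: sum(vals) / len(vals) for site, vals in groups.items()}

# Normalize (assumed here to mean min-max scaling to the range 0..1).
temps = [r["temp_c"] for r in kept]
lowest, highest = min(temps), max(temps)
normalized = [(t - lowest) / (highest - lowest) for t in temps]

print(site_means)
print(normalized)
```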
For sensitive data, like medical information, researchers may need to follow additional legal or ethical guidelines. There are general guidelines as well as discipline-specific considerations.
One universal consideration is called de-identification, which involves removing all identifying information from data.
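As a rough illustration of the idea, the Python sketch below drops fields that directly identify a person and replaces them with an arbitrary record label. The field list, the `deidentify` function, and the sample record are hypothetical. Dropping direct identifiers alone may not be sufficient in practice, since combinations of remaining fields can sometimes still point to an individual, so treat this as a starting point rather than a complete procedure.

```python
# Hypothetical list of fields that directly identify a person.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def deidentify(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records without direct identifier fields."""
    result = []
    for number, record in enumerate(records, start=1):
        # Keep only the fields that are not on the identifier list.
        stripped = {k: v for k, v in record.items() if k not in identifiers}
        # Add an arbitrary label so rows can still be told apart.
        stripped["record_id"] = f"P{number:03d}"
        result.append(stripped)
    return result


# Made-up sample record for illustration.
patients = [
    {"name": "Jane Doe", "email": "jane@example.org", "age": 42, "diagnosis": "asthma"},
]
shared = deidentify(patients)
print(shared)
```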
When using data, it's important that the output be both useful and ethical. One way to do this is to follow five framing guidelines, called the Five C's. They’re a framework for implementing the golden rule for data.