Data must be processed before it can be analyzed. Processing can involve verifying, organizing, transforming, integrating, or extracting data from its current form. The processing phase is where problems with the data are identified and corrected.
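As a minimal sketch of such a processing step, the snippet below verifies a small list of records (dropping incomplete entries) and transforms one field into a consistent form. The record layout and field names here are hypothetical; a real pipeline would typically use a library such as pandas.

```python
def clean_records(records):
    """Verify and transform raw records: drop entries missing an 'id'
    and normalize the 'name' field to title case."""
    cleaned = []
    for rec in records:
        if rec.get("id") is None:      # verification: reject incomplete rows
            continue
        rec = dict(rec)                # avoid mutating the caller's data
        rec["name"] = rec.get("name", "").strip().title()  # transformation
        cleaned.append(rec)
    return cleaned

# Illustrative input only, not real research data.
raw = [
    {"id": 1, "name": "  ada lovelace "},
    {"id": None, "name": "unknown"},   # dropped: no identifier
    {"id": 2, "name": "ALAN TURING"},
]
processed = clean_records(raw)
```

Recording exactly what a step like this did (which rows were dropped and why, how fields were normalized) is the kind of processing documentation the next paragraph describes.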
Documenting your processing methods is essential: it allows you to reuse your data and lets others use it as well. Well-documented data is identifiable and usable, and your research results are more likely to be replicated and verified.
Analysis of data helps you to describe facts, detect patterns, develop explanations, and test hypotheses. It can also mean reviewing and evaluating whether data that has been created or acquired should be saved for long-term access and preservation. The process includes data quality assurance, statistical data analysis, modeling, and interpretation of analysis results.
Different techniques are used for data analysis, depending on the field of research. Some institutions use High-Performance Computing (HPC) systems to analyze very large volumes of data.
Data mining and data visualization are important techniques in this process, and a wide range of tools supports them. R and Python are among the most popular languages for data analysis.
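To make this concrete, here is a minimal sketch of descriptive analysis in Python using only the standard library. The measurements are illustrative values invented for this example, not real data, and the two-standard-deviation outlier rule is just one simple quality-assurance heuristic.

```python
import statistics

# Hypothetical repeated measurements of the same quantity.
measurements = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]

mean = statistics.mean(measurements)       # describe facts
stdev = statistics.stdev(measurements)     # sample standard deviation

# Flag values more than two standard deviations from the mean
# as candidate outliers (a basic data-quality check).
outliers = [x for x in measurements if abs(x - mean) > 2 * stdev]
```

In practice the same kind of summary would usually be done with libraries such as pandas in Python or base functions in R, but the idea is identical: compute summaries, then use them to detect patterns or problems.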
It is essential that data be properly documented so that it can be understood, reused, and cited. Metadata is the term for this documentation. Basic descriptive information about the data needs to be recorded.
Metadata can also be created at the project level (broader) and at the dataset level (narrower). Examples of these are:
Project-level documentation - the "who, what, where, when, how, and why" of the dataset; it provides context for understanding why the data were collected and how they were used.
Dataset-level documentation - more detail about the data and the dataset itself.
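The two levels above can be sketched as structured metadata saved alongside the data, for example as JSON. All field names and values below are hypothetical and do not follow a formal metadata standard such as Dublin Core or DataCite, which a real project should consider adopting.

```python
import json

project_metadata = {   # broader: the who, what, where, when, how, and why
    "title": "Example Water Quality Survey",
    "creator": "Example Research Group",
    "coverage": "River sites A-C, 2023",
    "purpose": "Monitor seasonal changes in dissolved oxygen",
}

dataset_metadata = {   # narrower: detail about one specific dataset
    "filename": "sites_2023.csv",
    "variables": {"site": "station code", "do_mgl": "dissolved oxygen (mg/L)"},
    "rows": 312,
    "collection_method": "handheld probe, weekly readings",
}

# A machine-readable record that could be stored next to the data file.
record = json.dumps({"project": project_metadata,
                     "dataset": dataset_metadata}, indent=2)
```

Keeping metadata in a machine-readable format like this makes the dataset easier to find, cite, and validate later.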