Search for Data: Overview

Definitions

Data are observations or measurements recorded for the purpose of analysis. They are represented as text, numbers, or multimedia.

A dataset is a structured collection of data, generally associated with a unique body of work. Datasets are typically organized as tables with many variables.

A database is an organized collection of data stored as multiple datasets.

Statistics are the results of data analysis or the interpretation of raw data, often presented in charts or as percentages. They show relationships between variables.
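
The distinction between a dataset and statistics can be illustrated with a minimal sketch. The rows below are made-up values for illustration only, and the function name `percent_by_group` is hypothetical; this is one simple way to turn raw observations into percentages, not a prescribed method.

```python
from collections import Counter

# Hypothetical dataset: each row is one observation, each key is a variable.
# These values are invented for illustration and describe no real survey.
dataset = [
    {"respondent": 1, "state": "Ohio", "owns_car": True},
    {"respondent": 2, "state": "Ohio", "owns_car": True},
    {"respondent": 3, "state": "Ohio", "owns_car": False},
    {"respondent": 4, "state": "Ohio", "owns_car": True},
    {"respondent": 5, "state": "Texas", "owns_car": True},
    {"respondent": 6, "state": "Texas", "owns_car": False},
]


def percent_by_group(rows, group_key, flag_key):
    """Return the percentage of rows in each group where flag_key is true."""
    totals = Counter(row[group_key] for row in rows)
    hits = Counter(row[group_key] for row in rows if row[flag_key])
    return {group: 100 * hits[group] / totals[group] for group in totals}


# The statistics summarize the relationship between two variables:
# state and car ownership.
statistics = percent_by_group(dataset, "state", "owns_car")
print(statistics)
```

Here the list of rows is the dataset, and the percentages per state are the statistics derived from it.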

Search Strategies

Plan

Establish the goals of your project and consider what kind of data you might need. List your parameters for searching, including:

  • Geography (e.g., United States by state, city-level data)
  • Time frame
  • Longitudinal or cross-sectional
  • Demographics

Find Context

Read the methodology sections of scholarly articles on similar topics to see what data the authors used; those data may be available to you. For example, you might search for articles that applied sentiment analysis to newspapers.

Internet Searching

Searching the internet for data is useful for finding government data and open access data. Be sure to evaluate what you find.

Add "data" to your Google search, or use Google Dataset Search.

Recommended Resources

Ask yourself who would be most likely to generate data on the desired topic. Consider these likely places. Be sure to consider possible bias.

  • Government agencies and departments
  • Professional or trade associations 
  • Special interest groups
  • Data collection agencies

Checklist for Evaluating Data

Source

  • Who collected and published the statistics? What are their credentials?
  • Can you find the original data source from which the statistics were created? Do the statistics look accurate based on the data? If it was a survey, can you read the survey questions? Is there technical documentation (a README file or a data dictionary)?

Intention 

  • For what purpose were the data collected?
  • Why was the information collected? Was it collected to understand, to persuade, or to sell a product?

Representation/Omissions

  • What data or statistics are being reported? What populations are included?
  • Dataset accuracy: is there a sampling bias, a nonresponse bias, or an inadequate sample size?
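
As a rough aid for judging sample size, the sketch below computes the standard margin of error for a proportion at about 95% confidence. It assumes a simple random sample, which many datasets are not, and it says nothing about sampling bias or nonresponse bias. The function name `margin_of_error` is hypothetical.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion.

    Assumes a simple random sample. z=1.96 corresponds to roughly 95%
    confidence, and proportion=0.5 gives the widest (most cautious) margin.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Compare a small survey with a larger one.
small = margin_of_error(100)
large = margin_of_error(1000)
print(f"n=100:  +/- {small:.1%}")
print(f"n=1000: +/- {large:.1%}")
```

A survey of 100 respondents carries a margin of roughly ten percentage points under these assumptions, while 1,000 respondents narrows it to about three, so small reported differences from a small sample deserve caution.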

Dates

  • When were the data collected? Are the data current and updated?

Categories 

  • How have demographic categories changed over time? How do these changes affect the data?