Data are observations or measurements recorded for the purpose of analysis. They are represented as text, numbers, or multimedia.
A dataset is a structured collection of data, generally associated with a unique body of work. Datasets are typically structured as tables with many variables.
A database is an organized collection of data stored as multiple datasets.
Statistics are the results of data analysis or the interpretation of raw data, often presented in charts or as percentages. They show relationships between variables.
Establish the goals of your project and consider what kind of data you might need. List your parameters for searching, including:
- Geography (e.g., United States by state, city-level data)
- Time frame
- Longitudinal or cross-sectional
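The difference between longitudinal and cross-sectional data can be checked directly once you have a candidate table in hand. The following is a minimal Python sketch, not a definitive test: the field names `unit` and `year`, the function name `classify_design`, and the sample records are all hypothetical and made up for illustration. It treats a table as longitudinal when at least one unit (a state, a respondent, etc.) is observed in more than one time period.

```python
from collections import defaultdict


def classify_design(rows, unit_key="unit", time_key="year"):
    """Label a table 'longitudinal' if any unit is observed in more
    than one time period; otherwise label it 'cross-sectional'."""
    periods_by_unit = defaultdict(set)
    for row in rows:
        periods_by_unit[row[unit_key]].add(row[time_key])
    repeated = any(len(periods) > 1 for periods in periods_by_unit.values())
    return "longitudinal" if repeated else "cross-sectional"


# Made-up records for illustration only.
state_snapshot = [
    {"unit": "Ohio", "year": 2020, "value": 1.0},
    {"unit": "Iowa", "year": 2020, "value": 2.0},
]
# Adding a second observation of Ohio makes the table a repeated measure.
state_panel = state_snapshot + [
    {"unit": "Ohio", "year": 2021, "value": 1.5},
]

print(classify_design(state_snapshot))
print(classify_design(state_panel))
```

A check like this only looks at the table's structure. Always confirm the study design in the dataset's documentation, since a survey repeated each year with different respondents is a series of cross-sections rather than a true longitudinal study.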
Look at the methodology section in scholarly articles on a similar topic to see what data they used, which may be available to you. For example, you might search for articles that used sentiment analysis on newspapers.
Searching the internet for data is useful for finding government data and open access data. Be sure to evaluate what you find.
Add "data" to your Google search, or use Google Dataset Search.
Ask yourself who would be most likely to generate data on the desired topic. Consider these likely places. Be sure to consider possible bias.
- Government agencies and departments
- Professional or trade associations
- Special interest groups
- Data collection agencies
Data Planet (SAGE)
Statistical data for the social sciences
ICPSR
Data archive of research in the social sciences
Locate datasets across multiple data centers and repositories, including ICPSR, Harvard Dataverse, Data-Planet, Figshare, Dryad, and the Center for Open Science.
Best site for finding global data repositories. Search by keyword or browse by subject, content type, or country.
Similar to re3data, FAIRsharing is a catalog of searchable databases.
Search the dataset collections by domain, project or organization.
Authors can upload and share information: figures, datasets, media, papers, posters, presentations and filesets. Figshare is free but requires registration.
The U.S. government's open data repository.
Find and request access to restricted microdata from U.S. federal statistical agencies
A community-driven place to share and discover data sets.
Search for specific datasets or browse through one of the virtual data archives.
Amazon Web Services (AWS) Public Datasets
Public datasets in a variety of fields, including machine learning, statistical, geospatial and environmental, and bioscience.
Checklist for Evaluating Data
- Who collected and published the statistics? What are their credentials?
- Can you find the original data source from which the statistics were created? Do the statistics look accurate based on the data? If it was a survey, can you read the survey questions? Is there technical documentation (a ReadMe file or a data dictionary)?
- For what purpose were the data collected?
- Why was the information collected? Was it collected to understand, to persuade, to sell a product?
- What data or statistics are being reported? What populations are included?
- Dataset accuracy: is there a sampling bias, a nonresponse bias, or an inadequate sample size?
- When were the data collected? Are the data current and updated?
- How have demographic categories changed over time, and how do these changes affect the data?
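Two of the accuracy questions in the checklist, sample size and nonresponse, can be roughed out with simple arithmetic. The sketch below is illustrative only and rests on stated assumptions: a simple random sample, a 95% confidence level (z = 1.96), and the most conservative proportion of 0.5. The function names `margin_of_error` and `response_rate` and the example numbers are hypothetical, not taken from any particular dataset.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion estimated from a
    simple random sample (assumption: normal approximation applies)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


def response_rate(completed, invited):
    """Share of invited participants who completed the survey."""
    if invited <= 0:
        raise ValueError("invited must be positive")
    return completed / invited


# Made-up example: 100 completed surveys out of 400 invitations.
moe = margin_of_error(100)
rate = response_rate(100, 400)
print(f"Margin of error: about {moe:.1%}")
print(f"Response rate: {rate:.0%}")
```

A large margin of error suggests the sample may be too small for the claims being made, and a low response rate is a prompt to look for nonresponse bias. Neither number can detect sampling bias on its own, so the dataset's methodology documentation is still the primary source.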