
Finding Text Data Sets: Overview

Text as Data

This guide has information on how to acquire textual data for computational text analysis from various resources.

SMU Libraries has contractual agreements with some scholarly publishing vendors that permit text mining. Each publisher writes specific access rules into its license. For example, it may require that you use specific tools, or it may want to conduct the process itself.

  • Each library database vendor has specific policies around text and data mining (TDM):
    • some allow the user to initiate mass downloading
    • some allow the user to initiate limited downloading
    • some allow it only under specific circumstances (additional agreements or funds are required)
    • some do not allow it at all
  • Library licenses do not generally permit web scraping of database content. If you do not see a request form or API, please contact your librarian.

If you have a question about a vendor not listed on this guide, or otherwise need assistance accessing data, please contact the Scholarship & Research Studio at srstudio@smu.edu.

Guidelines for Using Databases

Use of these resources is governed by copyright law and individual license terms of use.

By using these resources, you agree to the terms of use.

Need More Help?

Scholarship and Research Studio (SRS)

API

An Application Programming Interface, or API, is an interface that allows applications to talk to one another. APIs can be used in a variety of ways, including downloading large amounts of data from a website without requiring manual input for each item. In this way, a researcher can, where the license permits, download the entire contents of a digital library hands-free. Using an API does require some technical or programming knowledge. Some, but not all, resources in this guide require use of an API to access data.
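To give a sense of what "hands-free" downloading can look like, here is a minimal Python sketch that requests results one page at a time and collects the text of each record. Everything vendor-specific in it is an assumption made for illustration: the endpoint `https://api.example.org/v1/articles`, the `q` and `page` parameters, the bearer-token header, and the `results` and `text` fields in the response are all hypothetical. Real vendor APIs differ, so check the vendor's documentation and license terms before adapting it.

```python
import json
import time
import urllib.parse
import urllib.request

# Hypothetical endpoint; replace with the URL given in the vendor's API documentation.
BASE_URL = "https://api.example.org/v1/articles"


def fetch_page(query, page, api_key):
    """Request one page of search results and return the parsed JSON."""
    params = urllib.parse.urlencode({"q": query, "page": page})  # assumed parameter names
    request = urllib.request.Request(
        f"{BASE_URL}?{params}",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed authentication scheme
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def collect_texts(query, api_key, max_pages=5, delay=1.0, fetch=fetch_page):
    """Gather the 'text' field of every record, one page at a time."""
    texts = []
    for page in range(1, max_pages + 1):
        data = fetch(query, page, api_key)
        records = data.get("results", [])  # assumed response shape
        if not records:
            break  # an empty page means there is nothing more to download
        texts.extend(record["text"] for record in records)
        time.sleep(delay)  # pause between requests to stay within rate limits
    return texts
```

With a real endpoint and key, a call such as `collect_texts("climate change", api_key="YOUR_KEY")` would return a list of document texts. The `max_pages` and `delay` parameters are there to keep the download within whatever limits the vendor's license sets.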

Some content on this guide is adapted from library guides. Used under a Creative Commons Attribution (CC BY) 4.0 license.