Search for Text Data Sets: Overview

Text Data Permissions

Each publisher has specific access rules and licensing requirements. They may require you to use specific tools, they may want to conduct the process themselves, or they may have limitations on the amount you are allowed to download. Most databases require additional licensing and funds.

Library licenses do not generally permit web scraping of database content. If a database does not offer a data request form or an API, contact us so we can determine what the license for that product allows.

Acquire Data

If you need to acquire data, we can help you find and negotiate with vendors. We purchase data sets for researchers on a case-by-case basis and prioritize data with permanent rights and multi-user licenses. Request a data purchase.

Text & Data Sets Collection Policy

SMU Libraries will acquire text and data mining sets selectively, with the primary goals being to support faculty research and dissertation- and thesis-level work by graduate students. Priority will be given to the acquisition of data sets to which SMU retains permanent rights. For coursework and exploratory research, we recommend starting with resources that provide a web-based interface for exploration, such as the HathiTrust Research Center and JSTOR Data for Research.

Any independent data sets must include documentation which describes their internal format and meaning, and which can assist in the recovery of the data should changes to the University computing environment render it unreadable in the future.

An Application Programming Interface, or API, is a type of software interface that allows two or more computer programs to communicate with each other. APIs can be used to download large amounts of data from a website programmatically, without clicking through each page by hand.

Many resources in this guide require use of an API to access data. Using an API requires some technical or programming knowledge.
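
To give a sense of the programming involved, here is a minimal Python sketch of downloading search results from an API one page at a time. The endpoint (`api.example.org`), the parameter names (`q`, `page`, `page_size`), the `results` field, and the bearer-token header are all hypothetical placeholders; each provider documents its own URL, parameters, authentication method, and download limits, so treat this as an illustration rather than working code for any specific database.

```python
import json
import time
import urllib.parse
import urllib.request

# Hypothetical endpoint and parameter names, for illustration only;
# a real provider's documentation defines its own.
BASE_URL = "https://api.example.org/v1/documents"


def build_url(query, page, page_size=100):
    """Return the request URL for one page of search results."""
    params = {"q": query, "page": page, "page_size": page_size}
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def parse_page(body):
    """Extract the list of records from one JSON response body."""
    payload = json.loads(body)
    # Assumes the provider wraps records in a "results" list.
    return payload.get("results", [])


def fetch_all(query, api_key, max_pages=5, delay_seconds=1.0):
    """Download up to max_pages of results, pausing between requests."""
    records = []
    for page in range(1, max_pages + 1):
        request = urllib.request.Request(
            build_url(query, page),
            # Assumed authentication scheme; providers vary.
            headers={"Authorization": "Bearer " + api_key},
        )
        with urllib.request.urlopen(request) as response:
            batch = parse_page(response.read().decode("utf-8"))
        if not batch:
            break  # an empty page means there are no more results
        records.extend(batch)
        time.sleep(delay_seconds)  # pause so requests stay within rate limits
    return records
```

The `max_pages` cap and the pause between requests are deliberate: many licenses limit how much you may download and how quickly, so check the terms for each resource before raising either value.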