Research Guides: Search for Text Data Sets: Overview

Text Data Permissions

Each publisher has its own access rules and licensing requirements. A publisher may require you to use specific tools, may prefer to run the extraction itself and deliver the data to you, or may limit how much you are allowed to download. Text mining most databases requires additional licensing and funds.

Library licenses do not generally permit web scraping of database content. If you do not see a request form or API for a database, contact us so we can determine the licensing terms for that product.

SMU Libraries databases that allow text and data mining
These databases let you download data without mediation by library staff or additional permissions. Use of these resources is governed by copyright law and by each resource's license terms of use. By using these resources, you agree to those terms of use.

Acquire Data

If you need to acquire data, we can help you find and negotiate with vendors. We work on a case-by-case basis to purchase datasets for researchers and prioritize data with permanent rights and multi-user licenses. Request a data purchase.

Text & Data Sets Collection Policy

SMU Libraries will acquire text and data mining sets selectively, with the primary goal of supporting faculty research and graduate students' dissertation- and thesis-level work. Priority will be given to data sets to which SMU retains permanent rights. For coursework and exploratory research, we recommend starting with resources that provide a web-based interface for exploration, such as the HathiTrust Research Center and JSTOR Data for Research.

Any independent data sets must include documentation that describes their internal format and meaning and that can assist in recovering the data if changes to the University computing environment make it unreadable in the future.

Need more help?

Scholarship & Research Support
Consultations, support, and tools to help researchers.

APIs

An Application Programming Interface, or API, is a type of software interface that allows two or more computer programs to communicate with each other. APIs can be used to download large amounts of data from a website automatically, without a person having to click through each page.

Many resources in this guide require use of an API to access data. Using an API does require some technical or programming knowledge.
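As a rough illustration of how an API lets a program download data page by page without manual input, here is a minimal Python sketch. The endpoint URL, the `page` query parameter, and the `results`/`next_page` response fields are hypothetical placeholders; every vendor's API documentation defines its own URLs, parameters, authentication, and rate limits, and the license terms described above still apply.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# Hypothetical endpoint for illustration only; a real vendor API documents
# its own URL, parameters, authentication, and rate limits.
BASE_URL = "https://api.example.org/texts"


def fetch_page(page: int) -> dict:
    """Request one page of results from the (hypothetical) API and decode the JSON."""
    query = urllib.parse.urlencode({"page": page})
    with urllib.request.urlopen(f"{BASE_URL}?{query}", timeout=30) as resp:
        return json.load(resp)


def download_all(fetch: Callable[[int], dict] = fetch_page,
                 pause_seconds: float = 1.0) -> Iterator[dict]:
    """Yield every record, page by page, until the API reports no next page.

    Assumes (hypothetically) that each response looks like
    {"results": [...], "next_page": <int or null>}.
    """
    page = 1
    while page is not None:
        data = fetch(page)
        yield from data["results"]
        page = data.get("next_page")
        if page is not None:
            # Pause between requests to stay within the vendor's rate limits.
            time.sleep(pause_seconds)
```

Against a real service you would call `download_all()` and save each record to disk; passing `fetch` as a parameter simply makes it easy to try the loop on sample data before contacting a live API.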

LinkedIn Learning
To learn about APIs, search for the "Introduction to Web APIs" course.

Web APIs for non-programmers
A brief overview of what an API is and what you can use one for.

Some content on this guide is adapted from other guides. Used under a Creative Commons Attribution (CC BY) 4.0 license.
