EEBO-TCP is a partnership with ProQuest and more than 150 libraries to generate highly accurate, fully searchable, SGML/XML-encoded texts corresponding to books from the Early English Books Online database.
The European Data Portal harvests the metadata of Public Sector Information available on public data portals across European countries. Information regarding the provision of data and the benefits of re-using data is also included.
The Caselaw Access Project (“CAP”) expands public access to U.S. law.
Our goal is to make all published U.S. court decisions freely available to the public online, in a consistent format, digitized from the collection of the Harvard Law Library.
Text Mining and Library Databases
If you are thinking of basing a research project on data extracted from a library database, contact your subject librarian to discuss issues around permissions (copyright and licensing agreements), formats and fees.
In addition to copyright considerations, we must take into account what the database vendors’ own policies specify about this type of use. To provide access to a database, the library enters into a licensing agreement with the vendor, and that agreement also dictates what types of data can be extracted and used.
This guide provides information about the collection of datasets at the Library of Congress, suggests tools for researchers, considers how datasets can be used for research, and provides guidance for locating datasets that may be sources for data science and machine learning projects.