
Text Data Mining: TDM Tools


Voyant is a tool that allows for lightweight text analytics.


Constellate, from JSTOR and Portico, is a platform for learning and performing text analysis, building datasets, and sharing analytics course materials.


The Text Encoding Initiative (TEI) is a consortium that collectively develops and maintains a standard for the representation of texts in digital form. The standard is most useful for handwritten texts that predate the printing press.

Topic Modeling

"Topic modeling is a form of text mining, a way of identifying patterns in a corpus. You take your corpus and run it through a tool which groups words across the corpus into 'topics'."


A concordance is a listing of each word in a text (corpus) and the words that occur near it. "Key Word In Context" (KWIC) is a type of concordance.
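
As a rough illustration of the idea, a KWIC concordance can be sketched in a few lines of Python. The function name `kwic`, the `window` parameter, and the sample sentence are assumptions made for this example; a dedicated tool will tokenize and align its output more carefully.

```python
import re

def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each match.

    Hypothetical helper: tokenization is a simple lowercase word split.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    rows = []
    for i, tok in enumerate(tokens):
        if tok == target:
            # Up to `window` words on each side of the keyword
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, tok, right))
    return rows

sample = "The cat sat on the mat. The dog chased the cat across the yard."
for left, kw, right in kwic(sample, "cat", window=2):
    # Right-align the left context so the keywords line up in a column
    print(f"{left:>20} [{kw}] {right}")
```

Each output row shows one occurrence of the keyword with its neighboring words, which is the layout a KWIC display uses.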


An n-gram is a sequence of n items from a given sample of text or speech.
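
The definition above can be made concrete with a short Python sketch that slides a window of n items across a list of words. The function name `ngrams` and the sample phrase are assumptions for illustration; here the "items" are whitespace-separated words, though they could equally be characters or syllables.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive items as a tuple (hypothetical helper)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

words = "to be or not to be".split()
bigrams = ngrams(words, 2)   # n = 2: pairs of adjacent words
counts = Counter(bigrams)    # how often each pair occurs

for gram, freq in counts.most_common():
    print(" ".join(gram), freq)
```

Counting n-grams this way is a common first step toward finding recurring phrases in a corpus.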

Text analysis glossary


An Application Programming Interface, or API, is a software interface that allows two or more computer programs to communicate. APIs can be used to download large amounts of data from a website without requiring user input. Using an API requires some technical or programming knowledge.

Data Cleaning