I have always been fascinated by word clouds because they give you a quick snapshot of the topics in the documents they represent. I was therefore excited to try Voyant, a free visualization tool that does just that: it turns your documents into a word cloud. I was not disappointed. Voyant provides an easy way to convert a text file or a collection of documents, known as a corpus, into a word cloud. It also shows word counts, trend graphs, a summary of the corpus by document, and the context surrounding the most frequent or distinctive words. In addition, it offers a way to filter out common words, such as “the,” “there,” and “are,” so users can adjust the word cloud to make it more relevant. Word clouds reinforce the famous saying “a picture is worth a thousand words.” Paradoxically, this picture is composed exclusively of words.

Voyant proved to be a valuable tool for approaching the WPA Slave Narratives, a collection of narratives that American writers gathered in the 1930s by interviewing formerly enslaved people. These writers collected testimony from more than 2,300 people across 17 states, totaling more than 9,500 pages. The activity consisted of sorting through these narratives with digital tools to get a sense of the topics of those interviews.

The first step was to go to the website: https://voyant-tools.org/

Fig. 1 Voyant Initial Screen

Here we uploaded the interview files: 17 files, one for each state involved in the narratives program.

Clicking the “Reveal” button took me to a screen divided into five sections:

  • Cirrus shows the word frequency for the corpus as a word cloud. By default, the cloud displays every word in the documents, including common words that appear in every text. The upper right corner of the panel offers options such as exporting files and editing the word list. Every section includes these options.
  • Reader shows documents as colored blocks. The wider the block, the more occurrences of the selected word in that document. When you select a word, a line graph appears that shows the frequency of that word across the documents.
  • Trends displays a line graph showing the frequency of the selected word in each document. To see the count of each term, click on Document Frequencies in the menu bar.
  • Summary displays information on each document, as well as on the corpus as a whole.
  • Contexts shows every appearance of a selected word within the text.
Fig. 2 First Reveal

As you can see, the initial word cloud does not make much sense. The main reason is that the interview transcripts contain typographic and spelling errors as well as regional and obsolete words. However, Cirrus provides a way to exclude these words: click the “options” button in the upper right corner of the panel and add them to the existing “stop words” list.

Fig. 3 Cirrus Options Panel

After editing the list by adding the stop words provided in the activity, the updated word cloud showed the most relevant topics more accurately.

Fig. 4 Cirrus Word Cloud after adding stop words
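
For readers curious about what a stop-word list does behind the scenes, the idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not Voyant's actual implementation: the function name `word_frequencies`, the sample sentence, and the tiny stop-word set are all made up for this example.

```python
import re
from collections import Counter


def word_frequencies(text, stop_words):
    """Count how often each word appears, skipping stop words.

    Hypothetical helper for illustration only; Voyant's own
    tokenizer and stop-word handling may work differently.
    """
    # Lowercase the text and pull out runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Keep only the tokens that are not in the stop-word set.
    return Counter(t for t in tokens if t not in stop_words)


# Made-up sample text and stop words, not taken from the narratives.
sample = "The house was old. The people in the house were kind."
stop_words = {"the", "was", "were", "in"}

counts = word_frequencies(sample, stop_words)
print(counts.most_common(3))
```

The words that survive the filter, weighted by their counts, are what a word cloud would draw. Adding misspellings or dialect spellings to `stop_words` removes them from the result in the same way.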

I then explored the other Voyant sections by selecting two significant words from the cloud, “white” and “slaves,” to see how often those words occur within individual documents as well as across the entire corpus.

Fig. 5 Trends graph showing “white” and “slaves” word occurrence across the corpus
Fig. 6 The word “white” in context in the Alabama file
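
The two views above can be approximated with a short Python sketch: one function computes a word's share of each document (a rough analogue of the Trends graph), and another lists each occurrence with a few neighboring words (a rough analogue of Contexts). Everything here is an assumption made for illustration: the function names, the `window` parameter, and the two made-up sample texts, which merely stand in for state files. Voyant may compute and display these values differently.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency(text, word):
    """Share of the tokens in `text` that equal `word`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return tokens.count(word.lower()) / len(tokens)


def keyword_in_context(text, word, window=3):
    """Return each occurrence of `word` with up to `window` tokens on either side."""
    tokens = tokenize(text)
    hits = []
    for i, token in enumerate(tokens):
        if token == word.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


# Made-up stand-ins for two documents; not real narrative text.
documents = {
    "doc_a": "The white house stood near the river and the white fence.",
    "doc_b": "We walked along the river until dark.",
}

# Trends-like view: one value per document for the chosen word.
for name, text in documents.items():
    print(name, round(relative_frequency(text, "white"), 3))

# Contexts-like view: each hit with its neighboring words.
for left, kw, right in keyword_in_context(documents["doc_a"], "white", window=2):
    print(f"{left} [{kw}] {right}")
```

Dividing by the document length matters when the files differ greatly in size, since a long file would otherwise dominate the comparison simply by containing more words.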

Voyant provides excellent visualization tools for approaching “big data” without going through the time-consuming process of close reading. However, keep in mind that to get accurate results, your files must be as error-free as possible. Here the famous computer science saying “garbage in, garbage out” could not be more apt.
