Shayne’s project | American Literature in the World

Shayne McGregor

Professor Wai Chee Dimock

Performing American Literature

06 March 2017

Death in Late Nineteenth-Century American Literature

My project aims to examine the recurrence of death in American novels published between 1852 and 1875. Having read Moby Dick, The Narrative of Arthur Gordon Pym of Nantucket, and Uncle Tom’s Cabin, I’m interested in what seems to be a covert obsession with death among American literary artists of the nineteenth century. Using one of the various open-source programs on the web, I want to track death as a topic in American novels published in this roughly twenty-year window. To this end, I can potentially text mine for specific words. Most obviously, these would include words such as “death” and “expire” and phrases such as “passed on.” Because of the nature of art, however, I cannot limit myself to such narrow thinking about the ways authors referenced death in the nineteenth century. I will therefore, over the course of the next two months, experiment with a variety of text-mining tools in an effort to accumulate data sets that are as accurate as possible. Once I’ve gathered this data, my plan is to visualize it using a graphing application that clearly highlights the years I’m charting (with the Civil War years color coded) and their respective data.
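
As a rough illustration of the keyword approach described above, the following Python sketch counts whole-word matches for a short term list and normalizes the total by text length. The term list, the function names, and the sample sentence are assumptions made for illustration only; they are not the output or method of any of the tools discussed in this proposal.

```python
import re
from collections import Counter

# Hypothetical starter list: the three terms named in the proposal.
DEATH_TERMS = ["death", "expire", "passed on"]

def count_terms(text, terms=DEATH_TERMS):
    """Count case-insensitive, whole-word matches of each term in text."""
    counts = Counter()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

def rate_per_10k(text, terms=DEATH_TERMS):
    """Total matches per 10,000 words, so long and short novels can be compared."""
    total_words = len(re.findall(r"[A-Za-z']+", text))
    if total_words == 0:
        return 0.0
    return 10000 * sum(count_terms(text, terms).values()) / total_words

# Invented sample sentence, not a quotation from any novel.
sample = "Death came quietly. She had passed on before the death of her brother."
print(count_terms(sample))
print(rate_per_10k(sample))
```

Because the match is whole-word, “expire” would not catch “expired,” which is exactly the kind of gap that makes a fixed word list too narrow on its own.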

While my date range may seem arbitrary, this project is ultimately meant to serve as an initial proof of concept for a broader argument that I believe can be applied to the long nineteenth century. The larger argument holds that literary authors’ preoccupation with death shifted over the course of the American nineteenth century. The goal of this type of big data analysis is to 1) reveal the significance of death in nineteenth-century American fiction and 2) push scholars to consider further questions related to death and the American nineteenth century given the results of the data. For the purposes of this class, my project will begin in 1852 with the publication of Uncle Tom’s Cabin and extend through the years before, during, and after the Civil War, up to 1875. The Civil War, then, works as a pivot point from which I will analyze the roughly ten years preceding the war and the ten years following its conclusion. Examining these years will reveal in microcosm what I hope will be transferable to the larger nineteenth century.

Literature Review

Because this is a digital humanities project, the term “literature” takes on a different texture. While conventional literature such as Franco Moretti’s Distant Reading (2013) and Gareth Cook’s The Best American Infographics 2016 will prove useful in helping me think about word placement in text and data visualization respectively, the first half of this review will focus on the various online tools I can potentially use for this project, the most obvious of which is Google’s Ngram Viewer.

The Ngram Viewer is a free text-mining tool that allows users to search for specific words within the Google Books corpus. The system allows users to bookend their searches with specific dates (e.g., 1800-1900) and to narrow their searches further by selecting specific corpora within Google Books. This seems useful given my project’s interest in the recurrence of death, which might most easily be tracked by searching for the word itself, but multiple issues arise when I try to apply the system to the aims of my project. Namely, the system’s search abilities are only as good as the user’s ability to select the perfect search term. I could search for the word “death” and all its variants within the Google Books corpus, but there may be a term that gets at the telos of my project better than “death” does. And if I don’t know what that term is, the Ngram Viewer becomes significantly less useful. Furthermore, while Google Books has an impressive corpus, the categories into which it groups its books do not allow me to focus on American fiction. Its two closest categories, “American English” and “English Fiction,” fall frustratingly short of the type of texts I aim to examine.

Two other programs that come to mind when I think about my project are Topicgraph and Voyant. Both are useful in that they can text mine without any added input from the user, which allows for a more comprehensive survey of any text I decide to examine. This survey is essentially a list of the words in a text, ranked from most common to least common. Both programs also have a wide assortment of tools that offer users a closer inspection of their texts. For the purposes of this project, though, I feel compelled to choose Topicgraph over Voyant because Voyant, like Google’s Ngram Viewer, searches solely for the most commonly used words in a text without regard to how those words interact with each other. Topicgraph, on the other hand, groups words together to create a list of key subjects. The latter utility is better for the purposes of this project because I cannot anticipate the various ways in which authors engage with death. I will therefore require a tool that can both scan entire works of literature and categorize the material therein, so that I do not have to read all the literature published between 1852 and 1875.
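
The difference between counting words and looking at how words interact can be sketched in a few lines of Python. The example below is a much simpler stand-in for what a topic-based tool does, not a reproduction of Topicgraph or Voyant: it tallies the words that appear within a few tokens of a seed word. The seed words, stopword list, window size, and sample sentences are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical seed words and a tiny stopword list, for illustration only.
SEEDS = {"death", "died", "grave"}
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "was",
             "her", "his", "he", "she", "it"}

def neighbors(text, seeds=SEEDS, window=4):
    """Count non-stopword tokens found within `window` tokens of each seed word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok not in seeds:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            word = tokens[j]
            if j != i and word not in seeds and word not in STOPWORDS:
                counts[word] += 1
    return counts

# Invented sample text, not a quotation from any novel.
sample = ("The whale brought death to the ship. "
          "He died in the cold sea, and the sea was his grave.")
print(neighbors(sample).most_common(3))
```

A list like this hints at the vocabulary that surrounds death in a given text, which a plain frequency count cannot show.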

The second portion of this project requires that I graph my collected data in a way that best captures the results of my study. While Topicgraph automatically graphs the data collected from any uploaded text, there is no way to place one graph’s data in conversation with another’s. In other words, there’s no way for me to place all my data on the same X and Y axes with Topicgraph. Nor are its graphs user friendly: it is not immediately obvious what one is looking at when one first looks at a topicgraph because of both its small size and its lack of a defined Y axis. Therefore, I need to transfer my data from Topicgraph to another program.

To graph my data in a way that is both visually appealing and clear, I’m considering both RAWGraphs and Tableau. Both will prove useful because of their ability to visualize data in a variety of interesting ways. RAWGraphs is a free program that lets users drag and drop their data into a graph of their choice. While multiple graphs can’t be shown at once, the type of graph used to visualize the data can be changed on the fly, which would allow me to revisualize if necessary. This might come in handy when delivering a presentation. Tableau operates in much the same way as RAWGraphs, though it seems to offer more graphing options. However, Tableau’s interface comes off as a bit more intimidating, and using it will require some training. For now I’ll use RAWGraphs because of its gentle learning curve while also attending workshops on Tableau at Yale’s DH lab.
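
Getting every novel onto the same axes mostly comes down to putting all results into one table before handing it to a graphing program. The Python sketch below assumes that each novel has already been reduced to a single rate and writes one CSV row per novel, with a period column that could drive the color coding of the Civil War years. The column names, the period labels, and the numbers are placeholders, not real measurements.

```python
import csv
import io

def period(year):
    """Label a publication year relative to the Civil War (1861 through 1865)."""
    if year < 1861:
        return "antebellum"
    if year <= 1865:
        return "civil war"
    return "postbellum"

def to_csv(rows):
    """Turn (title, year, rate) tuples into one CSV table sorted by year."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["title", "year", "period", "rate_per_10k"])
    for title, year, rate in sorted(rows, key=lambda r: r[1]):
        writer.writerow([title, year, period(year), f"{rate:.1f}"])
    return buf.getvalue()

# Placeholder titles and numbers only; these are not findings.
rows = [("Novel B", 1863, 12.0), ("Novel A", 1852, 8.5), ("Novel C", 1870, 10.3)]
print(to_csv(rows))
```

A table in this shape, with year on the X axis, rate on the Y axis, and period as the color dimension, is the kind of input that spreadsheet-style graphing tools generally expect.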

The second half of this literature review will explore relevant digital humanities projects that are either similarly interested in death or use a similar methodology. “Victorian Eyes” is one example of a DH project that is similarly interested in text mining. With the help of text-mining tools, researchers at the University of Wisconsin-Madison were able to determine which body parts in Mary Shelley’s Frankenstein appeared the most frequently. Their data showed that “eyes” appeared the most frequently, followed by “mouth” (Spoon). With this information, the researchers “graphed” their findings by creating art pieces that would illustrate them. While it is perhaps not critical to their project, I am curious about the quality of their quantitative approach. While “eyes” did appear in the text more frequently than other parts of the body, I wonder if the context in which those words appear matters, especially given that their guiding question, “What makes a human ‘human’? Is being human based on biology alone, the sum of human body parts, or is it something else?” (qtd. in Spoon), is preoccupied with the body as a collection of actual body parts. Is every “eye” that appears in Frankenstein a reference to the body part? Does it matter if the word is being used metaphorically? Maybe it doesn’t matter generally because there is something to be said about the frequency with which a word appears in a text, regardless of its context. However, I feel that their findings would have been strengthened if they had been able to draw a connection between “eyes” as a signifier and “eyes” as a signified. My project, because it is interested in topics rather than in particular keywords and phrases, is largely quantitative, but the very nature of parsing words and phrases and placing them in larger categories is, I believe, inherently qualitative (because it’s not just frequency; it’s also the kinds of reference the word or phrase is making relative to every other word and phrase). Additionally, part of what I find most compelling about this project is its use of art to visualize its findings. I am intrigued by the possibility of visualizing my data by way of a piece of art. I think it’s effective in that it creates a clearer picture of both the kind of research that’s being done and how that research influences the text.

Dr. Martin Gilserman’s teXtRays project at Rutgers is a DH program that lists and charts representations of the body in one hundred novels written in English and published between 1719 and 1997. Like the researchers at UW-Madison, Dr. Gilserman is looking for specific words and key phrases, but, like me, he is interested in charting how those representations change over time. To this end, Dr. Gilserman creates semantic webs that chart how each book represents the body and how representations of the body in English novels shifted over time. My project looks to pursue the same kind of research as Dr. Gilserman’s. However, rather than look at the body, my project aims to look at death. I’m interested in seeing how Dr. Gilserman accumulated his data. Earlier I discussed the possibility of using Topicgraph, but, because accuracy is so important to this project, I am open to the idea of using a different text-mining tool. Dr. Gilserman’s teXtRays provides productive insight into literary representations of the body. I question, however, the ease with which users can synthesize the information on his semantic webs, given that the finalized chart includes all one hundred novels and, once all the connecting lines are included, can look quite busy. Ideally, I would like to produce a visualization that takes cues from both Dr. Gilserman’s project, particularly its comprehensiveness, and the “Victorian Eyes” project, particularly its clarity.

Limitations

This project has a couple of limitations, and the biggest one is the means by which it aims to collect its data. Without knowing how Dr. Gilserman collected his data, I’m stuck using the next best thing, which is Topicgraph (a program that is still in the beta phase of its development). This project is also limited by its largely quantitative approach. This analytical strategy engages with literature with a bluntness that disavows the nuances embedded in any given literary text. While my project is largely concerned with frequency and quantity, it’s important to point out that these recurrences do not necessarily denote literary significance.

Links and Sources

Gilserman, Martin. “Dr. Martin Gilserman-The representation of the body in the novel.” YouTube, uploaded by Rutgers, 27 May 2009.

Krulwich, Robert, and Gareth Cook. The Best American Infographics 2016. Boston: Mariner, Houghton Mifflin Harcourt, 2016. Print.

Moretti, Franco. Distant Reading. London: Verso, 2013. Print.

Spoon, Marianne. “Victorian Eyes Exhibit Draws from Statistics and Art to Experience Literature.” WID. UW-Madison, 31 Oct. 2013. Web. 4 Mar. 2017.

https://labs.jstor.org/topicgraph/

http://rawgraphs.io/

https://www.tableau.com

https://voyant-tools.org/

https://books.google.com/ngrams