The New York Times published a very interesting update on how humanists are applying big data approaches to their scholarship (see The New York Times, January 27, 2013, p. B3). The article begins with a description of research by Matthew L. Jockers at the University of Nebraska-Lincoln. He conducted word- and phrase-level textual analysis on thousands of novels, enabling longer-term patterns to emerge in how authors use words and find inspiration. This kind of textual analysis revealed the impact of a few major authors on many others, and identified the outsized impact of Jane Austen and Sir Walter Scott.
Jockers said, “Traditionally, literary history was done by studying a relative handful of texts…what this technology does is let you see the big picture–the context in which a writer worked–on a scale we’ve never seen before.”
The implications for comparative literature and other fields that bump up against disciplinary boundaries are compelling. This kind of data analysis has long been the domain of sociologists, linguists and other social scientists, but it is increasingly finding a home in the humanities.
Steve Lohr, the Times article’s author, provides a number of other examples. One of my favorites is the research conducted by Jean-Baptiste Michel and Erez Lieberman Aiden, who are based at Harvard. They used Google Books’ graph utility–open to the public–to chart the evolution of word use over long periods of time. One interesting example: for centuries, references to “men” vastly outnumbered references to “women,” but in 1985 references to women began to lead references to men (Betty Friedan, are you there?).
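For readers curious about what this kind of word counting looks like in practice, here is a minimal sketch in Python. It is not the researchers' method or Google's tool; the function names (`yearly_counts`, `first_year_leading`) and the three sample sentences are invented purely for illustration. It tallies how often each target word appears per year and reports the first year in which one word overtakes another.

```python
import re
from collections import Counter


def yearly_counts(corpus, words):
    """Count occurrences of each target word per year.

    corpus: iterable of (year, text) pairs (hypothetical toy data)
    words:  target words, matched case-insensitively as whole words
    """
    targets = {w.lower() for w in words}
    counts = {}
    for year, text in corpus:
        # Split into whole-word tokens so "women" is never counted as "men".
        tokens = re.findall(r"[a-z']+", text.lower())
        tally = counts.setdefault(year, Counter())
        tally.update(t for t in tokens if t in targets)
    return counts


def first_year_leading(counts, a, b):
    """Return the earliest year in which word a outnumbers word b, or None."""
    for year in sorted(counts):
        if counts[year][a] > counts[year][b]:
            return year
    return None


# Invented sample sentences, not real data from any corpus.
toy_corpus = [
    (1900, "The men spoke while the men worked and the women listened."),
    (1950, "Men and women voted; the men debated."),
    (1990, "Women led the firm; women and men shared the work."),
]

counts = yearly_counts(toy_corpus, ["men", "women"])
print(first_year_leading(counts, "women", "men"))
```

A real study would work from millions of digitized pages and would normalize counts by the total number of words published each year, so the sketch only conveys the basic idea of tracking word use over time.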
Studying literature on this scale is indicative of the power and potential of big data to revolutionize how scholarship is done. Indeed, the availability of useful data is subtly transforming humanist scholarship, to the point that some interested humanists are taking on a new identity as computer programmers.
Lohr also points out that quantitative methods are most effective when experts with deep knowledge of the subject matter guide the analysis, and even second-guess the algorithms.
What is new and distinctive is the ability to ramp up the study of a few texts to a few hundred texts. The trick will be to keep the “humanity” in humanism.
I also draw considerable inspiration from the growing awareness that pattern recognition–a daily exercise for information professionals–is gaining new attention as part of the research process in general.
Perhaps it’s time for some of us to collaborate as co-principal investigators….