Geospatial Librarian

information, all over the place!

(text mining) – (hyperbole)

Wordcloud showing word usage in 30 short stories by A.Chekhov

By Orlovma – Own work, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=104697657

November 13, 2023

I’ve decided to explore the development and use of text mining in digital history by tracing the observations, descriptions, warnings, and guidance of several scholars: Jo Guldi in The Dangerous Art of Text Mining (2023), Ian Milligan in History in the Age of Abundance? (2019), and Cameron Blevins in his 2019 digital project Women and Federal Officeholding in the Late Nineteenth Century U.S. I also want to compare that project with Blevins’ own earlier criticism regarding argument in the digital humanities, Digital History’s Perpetual Future Tense (2018). There he singles out his own earlier text mining project, Topic Modeling Martha Ballard’s Diary (2010), for finding only what one might have expected to learn about eighteenth-century New England gardening:

“Lost amidst all of this attention, however, was the fact that there was little new or revelatory in my writing about the past itself. It made no new interpretations about women’s history or colonial New England or the history of medicine. It largely showed us results that we already knew—like the fact that people in Maine did not plant beans in January—or visualized patterns that had already been analyzed in far richer detail by historian Laurel Ulrich in A Midwife’s Tale. Outside of a few scattered and underdeveloped sentences, interpretive historical arguments were almost entirely absent from the blog post.”

In the thirteen years from the publication of Topic Modeling Martha Ballard’s Diary (2010), through the American Historical Association’s Digital History and Argument (2017), to Jo Guldi’s The Dangerous Art of Text Mining (2023), the historical profession has gone from criticizing itself for being too timid in making scholarly arguments based on text and data mining to regarding the relative ease and accessibility of text mining as dangerous. How and why did this change happen?

Guldi especially expresses concern that her undergraduate economics and data science students seem to lack the ability to make inferences from data and to question their own assumptions about what might be missing in the way of historical context; her example concerns nineteenth-century British Parliamentarians’ views of women. But haven’t there always been inept undergraduates, especially in a required class? Is this really as serious a problem as she wants us to think: that it has become too easy for people with a minimal background in history or statistics to draw superficial and even wrong conclusions from their data?

Blevins, on the other hand, sees the problem from the opposite direction. Seeking causes for the lack of argument in digital history in 2018, he calls out what he sees as a type of timidity on the part of more capable scholars. In the part of his essay devoted to “Argument and Genealogy,” he attributes this fear directly to the bogeyman of all our readings, Fogel and Engerman’s Time on the Cross:

“Digital historians are much more eager to distance themselves from the mistakes of their quantitative predecessors than they are to proudly carry forward their methodological mantle. This is an unfortunate part of quantitative history’s legacy: a fear of argumentative overreach based on numerical evidence.”

Ian Milligan and Blevins each note that the proliferation of records online poses great challenges for historians: partly in terms of preservation, and partly because, as Milligan points out, the history of the period from the last decade of the twentieth century onward will not be accessible at all without digital tools.

For the digital demo component of my project, I intend to learn the rudiments of text mining by using ProQuest’s TDM Studio to explore several full-text datasets and publications, possibly including two historically Black newspapers, the Baltimore Afro-American and the Chicago Defender, as well as the NAACP papers held at the Library of Congress.