Posts Tagged ‘Ngram’

Word Counts and What Counts

A post back in June on “digital humanities” discussed the promises and perils of turning to “Big Data” to answer questions about American history. I focused there on a study that looked specifically at the history of American literature. A paper in Psychological Science this August uses the same tool – Google’s Ngram function, which counts a word’s appearances in the company’s sample of over 1 million books published in the U.S. and calculates the percentage of all words it represents – to make broad claims about historical changes in American character.
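For readers who want to see the arithmetic, the calculation described above (a word's count as a percentage of all words printed in a given year) can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not Google's actual implementation: the function name `relative_frequency`, the whitespace tokenizing, and the tiny `toy_corpus` are all made up for the example.

```python
from collections import Counter


def relative_frequency(word, texts_by_year):
    """Return {year: percent of all words in that year's texts equal to `word`}.

    Hypothetical helper for illustration; the real Ngram tool works on a far
    larger corpus and tokenizes text differently.
    """
    result = {}
    for year, texts in sorted(texts_by_year.items()):
        counts = Counter()
        for text in texts:
            # Naive tokenization: lowercase and split on whitespace.
            counts.update(text.lower().split())
        total = sum(counts.values())
        result[year] = 100.0 * counts[word.lower()] / total if total else 0.0
    return result


# Invented sample sentences, standing in for "all books published that year".
toy_corpus = {
    1800: ["we give and we share", "the town must give"],
    2000: ["i get what i choose", "we get more"],
}

print(relative_frequency("give", toy_corpus))
print(relative_frequency("get", toy_corpus))
```

Note that the percentage depends entirely on which texts end up in each year's pile, which is why the makeup of the book sample matters so much for any historical claim built on these numbers.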

Patricia Greenfield, an eminent UCLA psychologist who has conducted terrific research on cognitive development, changes in cognitive skills, and cultural differences in thinking, much of it based on her work in rural Mexico (mentioned in this 2012 post), uses Ngram to argue that there was a major shift in America over two centuries from a communal to a self-centered culture. Ngram word counts in American books from 1800 to 2000 show, she claims, that Americans changed from being group-oriented and sharing to being individualistic and self-absorbed. Maybe. But there are a lot of issues to consider before accepting the claim. These concerns show, once again, the pitfalls in using such statistical methods ahistorically.


“Big Data” and “Digital Humanities” are two of the hot terms – “with a bullet,” as they used to say on the pop music charts – in the academy these days. The terms label a variety of projects: preserving large archives by digitizing them and crunching vast amounts of raw data to address topics in the humanities, such as visualizing the economic interconnections of ancient China, mapping the lines of influence among abstract artists, and finding out who authored the anonymous Federalist papers (although that was answered 50 years ago here).

An article in the summer issue of Social Science History by Marc Egnal is a nice example of both the kinds of discoveries that might be found and the kinds of pitfalls that might be encountered while tramping through the Big Data jungle. Egnal seeks to describe in numbers the thematic evolution of the American novel by drawing on Google’s “Ngram” program. This is a publicly available resource that tallies the words that have appeared in millions of books from before 1800 through 2008. We’ll see what a fertile terrain of findings it offers, and how easily one can get tripped up exploring them.
