MADE IN AMERICA

Notes on American life from American history.


Novel Data: Promise and Perils

June 18, 2013 by Claude Fischer

“Big Data” and “Digital Humanities” are two of the hot terms – “with a bullet,” as they used to say on the pop music charts – in the academy these days. The terms label a variety of projects: preserving large archives by digitizing them and crunching vast amounts of raw data to address topics in the humanities, such as visualizing the economic interconnections of ancient China, mapping the lines of influence among abstract artists, and finding out who authored the anonymous Federalist papers (although that was answered 50 years ago here).

An article in the summer issue of Social Science History by Marc Egnal is a nice example of both the kinds of discoveries that might be found and the kinds of pitfalls that might be encountered while tramping through the Big Data jungle. Egnal seeks to describe in numbers the thematic evolution of the American novel by drawing on Google’s “Ngram” program, a publicly available resource that tallies the words that have appeared in millions of books from before 1800 through 2008. We’ll see what a fertile terrain of findings it offers, and how easily one can get tripped up exploring them.

Ngram-ing

Google has scanned millions of books and identified just about every word in every one of those books. The user enters a word or phrase into the Ngram Viewer and it produces a graph. The graph shows how often that term appeared each year from before 1800 to after 2000, as a percentage of all the words in scanned books for that year. Jean-Baptiste Michel et al. introduced the technology and its possibilities in Science in 2010. Here is an example of mine: In American books published around 1900, “gentleman” appeared about once in every 10,000 words, which was 130 times as often as “guy” appeared. In American books published around 2000, “gentleman” appeared much less often, only once every 90,000 words and only about one-third as often as “guy.” Maybe guys have replaced gentlemen – at least in American books. Fun! And you can get much fancier than that (see here and here). I have dipped into Ngrams for this blog a couple of times (here, here) and for academic work, too.
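For readers who like to check the arithmetic, here is a minimal Python sketch of that comparison. It assumes the rounded rates quoted above and treats them as exact; nothing is fetched from Google, and the function and variable names are mine, for illustration only.

```python
from fractions import Fraction


def relative_frequency(term_count: int, total_words: int) -> Fraction:
    """Share of all scanned words in a given year accounted for by one term."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return Fraction(term_count, total_words)


# Rounded rates quoted in the post (illustrative, not pulled from Ngram).
gentleman_1900 = Fraction(1, 10_000)
guy_1900 = gentleman_1900 / 130      # "gentleman" was 130 times as common
gentleman_2000 = Fraction(1, 90_000)
guy_2000 = gentleman_2000 * 3        # "gentleman" was one-third as common

gentleman_decline = gentleman_1900 / gentleman_2000  # factor by which it fell
guy_growth = guy_2000 / guy_1900                     # factor by which it rose

print(f"'guy' around 1900: about 1 in {int(1 / guy_1900):,} words")
print(f"'guy' around 2000: about 1 in {int(1 / guy_2000):,} words")
print(f"'gentleman' fell about {float(gentleman_decline):.0f}-fold")
print(f"'guy' rose about {float(guy_growth):.0f}-fold")
```

On these rounded figures, the shift owes more to the rise of “guy” than to the decline of “gentleman,” though the inputs are too coarse to lean on the exact factors.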

In his article on the novel, Egnal uses Ngram to track words that he believes indicate central themes in four eras of American fiction. This accounting, he suggests, shows more concretely than literary critics’ analyses the topical shifts from one period to another. In the “Sentimental Era, 1789-1860,” for example, words like seduce and faithful appeared much more often, proportionally, than they did later on. Religious words, too, were relatively most common then – although it may be of interest that “God”’s Ngram low point was over 70 years ago.

(An aside: One of the authors of the Science article announcing Ngram is noted psychologist Steven Pinker, a “New Atheist” writer. That may explain the snarky penultimate sentence of the article: “‘God’ is not dead but needs a new publicist.” Yet, between 1971, the year of Pinker’s college graduation, and 2008, “God” appeared, proportionally, 50% more often, “heaven” 170% more often, and “Jesus” 250% more often. God language seems to be making a comeback. Also: “atheism” peaked in 1810, dwindled away, got a brief spurt in the 1960s, and faded again. Maybe atheism needs a new publicist.)

Egnal sees a rejection of bourgeois values during the “Post-Modern Era, 1960-on,” in, for instance, the explosive increase in four-letter words (readers can check that out for themselves as a homework assignment); the growing importance of women (“women” overtakes “men” around 1980; see graph below); and, perhaps surprisingly, a rise in words having to do with children, such as nurturing and childhood, even as the birth rate was dropping in the 1960s and 1970s.

Ngram: Men v. Women


A key description of “Post-Modern” writing is narcissism. Psychologist Jean Twenge and colleagues also use Ngram to argue that individualism and self-absorption have been on the rise since 1960 (see here and critiques here and here). Egnal shows that the words I and he were about equally frequent in American books until about the 1970s and then “I” rose, while “he” fell, a finding that perhaps suggests an increase in self-involvement, at least by American authors. This finding provides a good opportunity to see how the procedural details matter and can trip us up.

Egnal looked at “I” and “he” but neglected “she” and “you.” Using additional tools available in Ngram, I found that around 1970 “I” appeared at the start of sentences – thus clearly as the subject of the action – about 70% as often as “He” or “She”; in 2000, that ratio was about the same. No change. But the ratio of “I” at the start of a sentence to “You” at the start of a sentence dropped from around 5.5:1 to 3.5:1 – maybe a sign that you-absorption rather than I-absorption has been on the rise. Then, if one starts playing with “us” and “we” and “they”... well, things get more complicated. Which brings us to the deeper issues.
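The two ratios above can be combined to see what they imply about “You.” The sketch below is only a back-of-the-envelope check in Python. It assumes the rounded ratios quoted in the text are exact and that “He” or “She” refers to the same comparison base in both years; the variable names are mine.

```python
# Rounded ratios quoted in the post, treated as exact for illustration.
I_PER_HE_SHE_1970 = 0.70  # sentence-initial "I" relative to "He"/"She", ~1970
I_PER_HE_SHE_2000 = 0.70  # about the same in 2000
I_PER_YOU_1970 = 5.5      # sentence-initial "I" : "You", ~1970
I_PER_YOU_2000 = 3.5      # sentence-initial "I" : "You", ~2000


def you_per_he_she(i_per_he_she: float, i_per_you: float) -> float:
    """Implied frequency of "You" relative to "He"/"She".

    (I / HeShe) divided by (I / You) equals You / HeShe.
    """
    if i_per_you <= 0:
        raise ValueError("i_per_you must be positive")
    return i_per_he_she / i_per_you


you_1970 = you_per_he_she(I_PER_HE_SHE_1970, I_PER_YOU_1970)
you_2000 = you_per_he_she(I_PER_HE_SHE_2000, I_PER_YOU_2000)

print(f"'You' relative to 'He'/'She', ~1970: {you_1970:.2f}")
print(f"'You' relative to 'He'/'She', ~2000: {you_2000:.2f}")
print(f"Implied growth in 'You': about {you_2000 / you_1970:.2f}x")
```

If “I” held steady against “He” and “She” while losing ground to “You,” then it is “You” that moved, which is the point of the comparison.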

Issues

Even as scholars continue exploring Big Data, we see some of the perils in premature conclusions. A few are apparent in this particular application of Ngrams to what its proponents call “culturomics.”

There are procedural issues. For instance, which books become the basis of inferring something about the culture, or about the writers, or about the readers? The mix of books that were published changed sharply over time. Early in American history, books were expensive to make and expensive to buy, and readership was limited. Didactic books – e.g., how to keep accounts; farmers’ almanacs; religious books – were especially common. (Even a lot of sensationalist crime material appeared surrounded by religious text warning the reader against following in the criminal’s footsteps.) Later, publishers churned out cheap romance and adventure novels. The mix of words in the books changed accordingly.

Similarly, we can ask which books have survived to be scanned. Egnal admits that many if not most of the earlier books in the Google collection are not even novels, and many cheap novels never made it into the archives. (I could not confirm whether the novel shown above, The Texan Scout, appears in the Google database.) Google goes to university libraries for the books of yesteryear. How many of those libraries have, for example, collected and kept books from earlier times with crude obscenities? So, did the American novel change, did American society change, or did the mix of books published and surviving change?

Second: Words change, in various ways. Over time, there are simply more distinct words and phrases. Michel et al. “estimated the number of words in the English lexicon as . . . 597,000 in 1950, and 1,022,000 in 2000.” Any particular word has to compete with a growing number of new ones in the denominator of the Ngram calculations. Also, new words crowd out the old. One study found that the apparent decline of Americans’ vocabulary skills reflects in part the fact that some of the words used in vocabulary tests have just become less common. Moreover, word meanings shift over time. The word spiritual used to be associated with the occult, as in spiritualism. The word text as a verb – as in, text me the address – is apparently so new that Ngram does not (yet) recognize it as a verb, instead counting all instances of “text,” even in 2008, as nouns.
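The denominator point can be made concrete with a toy calculation in Python. This is one simple way to read it, not how Google computes anything: suppose a word keeps exactly the same share among the “old” vocabulary while newly coined words take up some fraction of all the words printed. The fractions below are invented for illustration, and a bigger lexicon does not by itself tell us how large that fraction is.

```python
def overall_share(share_among_old_words: float, new_word_fraction: float) -> float:
    """Share of ALL words in print held by one old word.

    share_among_old_words: the word's share counting only old vocabulary.
    new_word_fraction: hypothetical fraction of all words in print that are
        newly coined (0 means no new words at all).
    """
    if not 0.0 <= new_word_fraction < 1.0:
        raise ValueError("new_word_fraction must be in [0, 1)")
    return share_among_old_words * (1.0 - new_word_fraction)


# A word used once per 10,000 old-vocabulary words (hypothetical).
share_among_old = 1 / 10_000

# Invented scenarios: new words make up 0%, 10%, then 20% of all text.
for new_fraction in (0.0, 0.10, 0.20):
    per_million = overall_share(share_among_old, new_fraction) * 1_000_000
    print(f"new words = {new_fraction:.0%} of text -> "
          f"{per_million:.0f} occurrences per million words")
```

The word's plotted line falls in each scenario even though writers are using it exactly as much as before, relative to the rest of the old vocabulary.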

At a yet deeper level, we have to think hard (as, for example, here) about the connection between the words writers use and American culture, whether it is the literary culture or the wider culture. Are book writers like weather vanes, showing in the words they use the direction society is moving, perhaps revealing the Zeitgeist or at least fluctuations in customer demand? Or, are writers actually agents of change, using words to move the culture? Or, are writers in a separate world of literature with its own shifting fads and fashions, little connected to the wider society?

The word vampire appeared 10 times as often, proportionally, in American books in 2008 as it did in 1950. What shall we make of that?

(Cross-posted on The Berkeley Blog and on the Boston Review blog on June 20, 2013.)

Update (June 24, 2013): Here is a later introductory essay on Digital Humanities at work in analyzing fiction.


Posted in Uncategorized | Tagged Big Data, Digital Humanities, Ngram |


