The words that give away a writer’s gender, in classic works of literature

Overwhelmingly “he.”
Image: AP/Invision/Chris Pizzello

Are books getting dumber? Does good writing really use fewer -ly adverbs? Is there really such a thing as “how women write”?

In a new book, data journalist Ben Blatt uses statistics to test common questions and assumptions about writing against large sets of classic literature, modern bestsellers, and contemporary literary fiction. Nabokov’s Favorite Word Is Mauve came out March 14 from Simon & Schuster. While its results should be taken as seriously as the questions (that is to say, lightly, and with salt), many offer interesting insights and make for fun reading themselves.

To find out which words indicated female and male authorship in classic literature, Blatt used methods similar to those used in 2013 by University of Pennsylvania computer scientists to analyze Facebook statuses. He took a list of words found in classic literature and created a ratio for each one: how many female authors used it more than average, over how many male authors used it less than average, or vice versa. This generated a list of the words with the biggest imbalance between the two genders.
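The description above is loose, so the following is only a rough sketch of the general idea and not Blatt's actual computation. It assumes each book is reduced to per-word usage rates, compares every author's rate with the average across all authors, and scores a word by the share of female authors above that average minus the share of male authors above it. The difference of shares is a stand-in for the ratio described in the text, chosen to avoid dividing by zero. The function names and the toy texts are hypothetical.

```python
import re
from collections import Counter


def word_rates(text):
    """Return each word's share of all words in one text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}


def gender_imbalance(female_texts, male_texts, vocabulary):
    """Score each word from -1.0 (male-leaning) to 1.0 (female-leaning).

    Assumption: the score is the share of female authors who use the word
    more than the overall average, minus the same share for male authors.
    """
    female_rates = [word_rates(t) for t in female_texts]
    male_rates = [word_rates(t) for t in male_texts]
    all_rates = female_rates + male_rates
    scores = {}
    for word in vocabulary:
        average = sum(r.get(word, 0.0) for r in all_rates) / len(all_rates)
        female_above = sum(r.get(word, 0.0) > average for r in female_rates)
        male_above = sum(r.get(word, 0.0) > average for r in male_rates)
        scores[word] = (female_above / len(female_rates)
                        - male_above / len(male_rates))
    return scores


# Toy stand-ins for whole books (hypothetical data, not from the study).
female_texts = ["her curls fell softly", "the curls shone"]
male_texts = ["the rear guard fled", "at the rear of the enemy"]
scores = gender_imbalance(female_texts, male_texts, ["curls", "rear", "the"])
```

Sorting `scores` by value would put the most female-leaning words at one end and the most male-leaning at the other, which is the kind of ranked list the article describes.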

One of the words most commonly used by female authors of classic lit, and rarely used by male ones, is “curls.” Female authors hardly ever used the word “rear,” though male authors did.

Blatt’s sample is small (50 books by female authors and 50 by male authors, drawn from a list of the best English-language fiction of the 20th century), but the words it turned up are still indicators of an author’s gender in literature written today, says Blatt. So if you come across a bestselling book today with the words “civil” and “enemy,” chances are it’s by a male author.

In a follow-up analysis, Blatt took the same 100 works of classic literature and looked at the ratio of the words “he” and “she.” The results showed that among those books, male authors were far more likely to include virtually no mentions of “she” than female authors were to include virtually no mentions of “he.”
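As a minimal sketch of that kind of measure, the snippet below counts the words “he” and “she” in a text and reports each as a share of the two combined. It assumes capitalization is ignored and that only those two exact words are counted (not “him,” “her,” and so on). The function name and sample sentence are hypothetical, and Blatt's own counting rules may differ.

```python
import re


def pronoun_share(text):
    """Return (he_share, she_share) among all uses of "he" and "she".

    Returns None when the text contains neither word.
    """
    words = re.findall(r"[a-z]+", text.lower())
    he_count = words.count("he")
    she_count = words.count("she")
    total = he_count + she_count
    if total == 0:
        return None
    return he_count / total, she_count / total


# Hypothetical sample: three uses of "he" and one of "she".
sample = "He ran. She laughed, and he ran on while he sang."
shares = pronoun_share(sample)
```

On a measure like this, a book that uses “she” only once among roughly a thousand uses of the two words would come out around 99.9% “he.”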

Tolkien’s The Hobbit, the beloved adventure story about a hobbit, a dragon, and a merry band of 13 male dwarves, is perhaps not surprisingly male dominated. But by Blatt’s measure, it’s literally 99.9% male: Tolkien uses the word “she” exactly one time, to refer to Bilbo Baggins’s mother.

The Old Man and the Sea, about a man, his apprentice, and a fish, also has a 1% ratio of “she,” as does Lord of the Flies, about schoolboys stranded on an island.

On the other end of the chart, the most extreme cases of “she” versus “he” are indeed mostly by women authors. But they still mention “he” at least one-fifth of the time. The Joy Luck Club, about Chinese American immigrant women, and The House of Mirth, about a woman in search of a suitable husband, each contain 29% mentions of “he” (and 71% “she”). The male-authored book with the highest proportion of “she” versus “he” is Vladimir Nabokov’s Lolita.

The English-language canon clearly affords boys and men the luxury of adventure without women. But the results could also mean that culturally, we’ve normalized stories and whole worlds without women, and not the other way around.

The book is full of other tidbits, including data on other key differences between kinds of writers. The language used in erotica stories from the site Literotica, for example, shows big differences between the words used by Texas and New York writers.