Last year I wrote about some very interesting research being done by Paul J. Heald at the University of Illinois, based on software that crawled Amazon for a random selection of books. At the time, his results were only preliminary, but they were nevertheless startling: There were as many books available from the 1910s as there were from the 2000s. The number of books from the 1850s was double the number available from the 1950s. Why? Copyright protections (which cover titles published in 1923 and after) had squashed the market for books from the middle of the 20th century, keeping those titles off shelves and out of the hands of the reading public.
Heald has now finalized his research and the picture, though more detailed, is largely the same: “Copyright correlates significantly with the disappearance of works rather than with their availability,” Heald writes. “Shortly after works are created and proprietized, they tend to disappear from public view only to reappear in significantly increased numbers when they fall into the public domain and lose their owners.”
The graph above shows the simplest interpretation of the data. It reveals, shockingly, that there are substantially more new editions available of books from the 1910s than from the 2000s. Editions of books that fall under copyright are available in about the same quantities as those from the first half of the 19th century. Publishers are simply not publishing copyrighted titles unless they are very recent.
But this isn’t a totally honest portrait of how many different books are available, because books in the public domain often exist in many different editions, and a random sample is likely to overrepresent them. “After all,” Heald explains, “if one feeds a random ISBN number [into] Amazon, one is more likely to retrieve Milton’s Paradise Lost (with 401 editions and 401 ISBN numbers) than Lorimer’s A Wife out of Egypt (1 edition and 1 ISBN).” He found that public-domain titles had a median of four editions each. (The mean was 16, but it was highly distorted by a small number of books with hundreds of editions. For this reason, statisticians whom Heald consulted recommended using the median.) Heald divided the number of public-domain editions by four, producing a graph that compares the number of titles, rather than editions, available.
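To see why the median is the safer choice here, consider a rough sketch in Python. The edition counts below are invented for illustration and are not Heald's data; the function name `estimate_titles` is likewise hypothetical. The point is only that a single heavily reprinted classic can drag the mean far above what a typical title looks like, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical editions-per-title counts for eight public-domain books.
# Illustrative only (not Heald's data): one classic with 401 editions
# sits alongside seven books with only a handful each.
editions_per_title = [1, 2, 3, 4, 4, 5, 7, 401]

typical_mean = mean(editions_per_title)      # pulled upward by the outlier
typical_median = median(editions_per_title)  # reflects the typical book


def estimate_titles(edition_count, editions_per_typical_title):
    """Convert a count of sampled editions into an estimated count of titles."""
    return edition_count / editions_per_typical_title


# Hypothetical: 200 public-domain editions turned up in a random sample.
sampled_editions = 200
titles_by_median = estimate_titles(sampled_editions, typical_median)
titles_by_mean = estimate_titles(sampled_editions, typical_mean)

print(f"mean editions per title:   {typical_mean}")
print(f"median editions per title: {typical_median}")
print(f"estimated titles (median): {titles_by_median}")
print(f"estimated titles (mean):   {titles_by_mean:.1f}")
```

With these made-up numbers, dividing by the mean would suggest only a few distinct titles, while dividing by the median gives a figure much closer to what the typical book in the list implies.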
Heald says the picture is still “quite dramatic.” The most recent decade looks better by comparison, but the depression of the 20th century is still notable, followed by a little boom for the most recent decades whose works have fallen into the public domain. Presumably, as Heald writes, in a market with no copyright distortion, these graphs would show “a fairly smoothly downward sloping curve from the decade 2000-2010 to the decade of 1800-1810 based on the assumption that works generally become less popular as they age (and therefore are less desirable to market).” But that’s not at all what we see. “Instead,” he continues, “the curve declines sharply and quickly, and then rebounds significantly for books currently in the public domain initially published before 1923.” Heald’s conclusion? Copyright “makes books disappear”; its expiration brings them back to life.
The books worst affected by this are those from relatively recent decades, such as the 1980s and 1990s, for which there is presumably the largest gap between what would satisfy some abstract notion of people’s interest and what is actually available. As Heald writes:
This is not a gently sloping downward curve! Publishers seem unwilling to sell their books on Amazon for more than a few years after their initial publication. The data suggest that publishing business models make books disappear fairly shortly after their publication and long before they are scheduled to fall into the public domain. Copyright law then deters their reappearance as long as they are owned. On the left side of the graph before 1920, the decline presents a more gentle time-sensitive downward sloping curve.
But even this chart may understate the effects of copyright, since the comparison assumes that the same quantity of books has been published every decade. This is of course not the case: Increasing literacy coupled with technological efficiencies means that far more titles are published per year in the 21st century than in the 19th. The exact number per year for the last 200 years is unknown, but Heald and his assistants were able to arrive at a pretty good approximation by relying on the number of titles available for each year in WorldCat, a library catalog that contains the complete listings of 72,000 libraries around the world. Heald then normalized his graph to the decade of the 1990s, which saw the greatest number of titles published.
By this calculation, the effect of copyright appears extreme. Heald says that the WorldCat research showed, for example, that there were eight times as many books published in the 1980s as in the 1880s, but there are roughly as many titles available on Amazon for the two decades. A book published during the presidency of Chester A. Arthur has a greater chance of being in print today than one published during the time of Reagan.
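The arithmetic behind that comparison can be sketched in a few lines of Python. This is one plausible reading of "normalizing to the 1990s," not necessarily Heald's exact procedure: each decade's count of available titles is scaled as if that decade had published as many books as the 1990s did. All figures below are hypothetical, chosen only so that the 1980s published eight times as many books as the 1880s while the same number of titles from each remains available; the function name `normalize_to_reference` is also an invention for this example.

```python
def normalize_to_reference(available, published, reference_decade):
    """Scale each decade's available-title count as if that decade had
    published as many titles as the reference decade."""
    reference_output = published[reference_decade]
    return {
        decade: available[decade] * reference_output / published[decade]
        for decade in available
    }


# Hypothetical counts (not Heald's or WorldCat's figures).
published = {"1880s": 1_000, "1980s": 8_000, "1990s": 10_000}
available = {"1880s": 100, "1980s": 100, "1990s": 250}

normalized = normalize_to_reference(available, published, "1990s")

for decade, value in normalized.items():
    print(f"{decade}: {value:.0f} titles available per 1990s-sized output")

# How much likelier is an 1880s book to be available than a 1980s book?
advantage = normalized["1880s"] / normalized["1980s"]
print(f"1880s advantage over 1980s: {advantage:.0f}x")
```

Under these assumed numbers, equal raw counts conceal an eightfold difference in the chance that any given book from the decade is still for sale.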
Copyright advocates have long (and successfully) argued that keeping books copyrighted assures that owners can make a profit off their intellectual property, and that that profit incentive will “assure [the books’] availability and adequate distribution.” The evidence, it appears, says otherwise.