The first time someone at Google tried to scan a book, it was a disaster. The company's founders had talked about putting books online - and thus making them searchable - since their days as Stanford graduate students in 1996, but even now, in 2002, nothing had come of it. So Larry Page decided to try it for himself. He strapped a camera up in his office with rubber bands, snapping pictures while a coworker turned pages. It didn't work - her thumbs kept creeping into the images - until, in a flash of inspiration, Page added a metronome.
From these decidedly lo-fi beginnings, Google Books has blossomed into a technological force. Working with Harvard, Oxford and the New York Public Library, among others, Google has now scanned 10 million books - an impressive number until you remember that there are somewhere between 50 and 100 million titles in the world.
So Google keeps scanning, and recently it came to Indiana.
Or, more accurately, Indiana came to them.
This summer, librarians at IU packed up semi-trucks full of books - 57,000 titles in all - and pointed them toward Google's California campus, where each book scanned is a small step toward the company's techno-utopian goal: a future where your Google search will include results not just from newspapers and blogs but from Harvard's rarest books and back issues of Indianapolis Monthly.
And yet, five years in, Google Books has stalled. No one knows when and where book scanning starts to violate copyright, and the legal battle has produced what may be the largest licensing agreement in American history. The opponents of Google Books keep comparing it to Standard Oil and invoking the Sherman Antitrust Act. So IU finds its books safely returned, even as Google can't do anything with them online.
The Question of Orphans
This Friday, Nov. 13, Google, along with a group of authors and publishers, will submit a revision of their 2008 settlement to the U.S. District Court. Copyright infringement can get expensive - as much as $150,000 per infringed work - so Google tried to cover its legal bases from the beginning. While it scanned entire books, the company would show readers only the "snippets" relevant to their searches. But the Authors Guild and the Association of American Publishers took offense even at this, and, in 2005, they sued Google.
The various interests needed three years to hammer out a $125 million settlement, and the result is as complicated as you'd expect. Books published between Gutenberg and 1923 are in the public domain - no permission needed. Newer and more popular books also present little problem, as Google can negotiate directly with their publishers.
These two groups - really old books and really new books - make up most of what you'll find on Google Books right now, but they don't cover everything. Remember those 50 to 100 million books? Most estimates place 20 percent of them in the public domain. Of the other 80 percent, perhaps half remain in copyright and available on Amazon or at bookstores. But that still leaves 40 percent - or at least 20 million books - that are "orphans."
Orphaned books are books that are in copyright but out of print, and whose authors or other rights holders can prove impossible to track down. They quickly became the settlement's main trouble spot, even as they remained the books for which Google could do the most good. Under the 2008 settlement, which let Google display part of each orphaned book and sell downloadable copies, Google agreed to give authors and publishers 63 percent of all revenue, including ads. The company would also include links to libraries and booksellers and offer public libraries and universities free access to its database. It seemed an elegant solution to the orphan problem.
But it also created new problems. Some were complex legal issues, everything from the settlement's status as a class-action lawsuit to the quagmire of international copyright. Again, though, the main concern centered on orphaned books - this time, the fear that Google would develop a monopoly on their digital copies. After receiving more than 400 complaints, including one from the Department of Justice, the U.S. District Court gave Google et al. a chance to revise their settlement.
In other words, the programmers are losing to the lawyers, and no one's quite sure how it will all turn out.
IU's Father of Folklore
On March 7, 1885, a Kentucky farmer and his wife welcomed their first son into the world. They named him Stith Thompson. He was a bright child and, when he turned 12, the family moved to Indianapolis, in large part to put him in a better school. Stith managed to carve out a fairly normal childhood, working as an Indianapolis Press paper boy on the north side, up to where the woods started at 30th Street. But Indianapolis also launched him on one of those impossible-to-believe backstories, and, by the time he died, in 1976, Stith Thompson would stand as one of the most interesting intellectuals this state has ever produced.
After high school, Thompson studied at Butler, then Harvard, where he earned a Ph.D. in English in 1914. He was now a tall, thin man with an era-appropriate taste for bow ties, but Thompson was still too young to stay in one place. So he traveled, teaching at high schools and colleges across the country, learning Latin, Greek, French, German, Dutch, Swedish, Danish, Norwegian, Russian - and all while spending his summers as a lumberjack. In 1921, Thompson finally settled down as a faculty member in IU's English department, where he remained until his retirement.
But Thompson would achieve his lasting reputation, for IU as much as for himself, as the "Father of Folklore." Simply put, folklore studies examine a community's cultural traditions - not only legends and fairy tales, but also beliefs, practices, even recipes. The importance of Thompson's Motif-Index of Folk Literature was clear as soon as the first of its six volumes came out in 1932. The work classified thousands of cross-cultural motifs - or narrative building blocks - and gave folklorists a foothold in their quest to compare and contrast the world's many characters ("The unpromising hero"), events ("Chastity test by magic objects or ordeals"), places ("Treasure trove") and morals ("Pride brought low").
More than anyone or anything else, Thompson and his Motif-Index established the field of folklore studies in America. By the late 1930s, students were arriving in Bloomington from around the world to study under him, and New York University swooped in and offered Thompson a substantial raise in pay and prestige. Herman B. Wells, who had just become IU's president, cobbled together a counteroffer that, along with Thompson's Midwestern roots, was enough to fend off NYU. At Thompson's funeral, which was held at Bloomington's Unitarian Church, Wells would remark that keeping Thompson was the best decision he made as president.
Collections of Distinction
One of Wells' promises to Thompson was more money to buy books - and here we get back to Google. In 2007, the CIC, a group of 12 universities often called the "Academic Big Ten," asked Google about joining its scanning project. Each university offered to let Google scan a "collection of distinction." First up for scanning would be IU's Folklore Collection.
What Thompson started as a one-room, interdepartmental program is now IU's Folklore Institute, the largest and top-ranked program in America. And, while many of IU's folklorists do research far different from Thompson's, his presence is still felt in the Folklore Collection. In fact, the collection grew directly out of Thompson's travels, as the library sent money with him to buy books on his research trips to Europe, Asia and South America.
Many of these books still contain Thompson's signature and notes, and it all adds up to the largest folklore library in the world, according to Moira Smith, the Collection's librarian. (Originally from New Zealand, Smith first came to Indiana to earn a Folklore Ph.D.) "Wherever Thompson went," Smith says, "he bought folklore books and brought them back." Today, on the seventh floor of IU's Herman B. Wells Library, you can browse the books Thompson used to compile his Motif-Index, in addition to countless other oddities - everything from 1960s fanzines from the folk music revival to a rich collection of Indiana ghost stories.
But the question remains: when, exactly, will you be able to browse them online?
"Don't Be Evil." Google's unofficial motto is also its mythology, the carefully upheld image of a business that doesn't act like other businesses. Google must have expected its book-scanning initiative to only add to this goodwill. After all, libraries had long wanted to digitize their collections, with many launching their own projects, only to run out of money.
And that's where Google Books could help, with its irresistible blend of free labor and superior technology. Google guards its post-metronome techniques as carefully as it does its search algorithm. (That's why IU had to ship books to Google.) Still, we know enough - Google Books' chief engineer came from NASA, and a Google patent mentions infrared cameras and 3D imaging - to know that we should be impressed. The stats back this up. In IU's case, Google scanned all but a small fraction of the books, passing over only those that were too big, too small or too fragile. IU's librarians estimate that, working on their own, they won't finish scanning even that small fraction until 2029.
But Google offered more than faster, gentler scanning. It had also, apparently, solved the aforementioned "orphan" mess. Again, IU's situation provides a particularly absurd example of this. Folklore, almost by definition, is meant to be shared, not protected and profited from. Thanks to current copyright law, though, 95 percent of IU's Folklore Collection remains in copyright. Smith told me that "most of our authors and publishers are dead, out of business, or both - they'd be perfectly happy to make their stuff available." But because IU can't spare the staff to investigate the books individually, it must lump them together.
Which means it must wall them off. Let's say you want to pick up a copy of Stith Thompson's Oral Tales of India. It's out of print and likely to stay that way; used copies go for $100. Or let's say you want to flip through Thompson's memoir, A Folklorist's Progress. Good luck doing that in an Indiana library - only seven have a copy. Of course, Google scanned both books from IU's Folklore Collection, but, because of legal issues, it can't make them available.
The Villain of the Story
This raises a second question: who, exactly, is being "evil"? Over the course of the Google Books debate, Google's behavior hasn't changed as much as its status. Google can now claim 30 percent of online ad revenue and 70 percent of online searches; its most recent quarterly report blew investors away, and its stock trades at more than $500 a share.
It's difficult to muster any sympathy for a company sitting on $20 billion in cash, but Google's swelling empire provides even more motivation to sink the Google Books settlement. And you can see this in the settlement's loyal opposition. For every genuinely disinterested critic of the settlement, there are several competing corporations. It's become a classic case of if-you-can't-beat-them-file-a-legal-brief-against-them.
Take Microsoft. Microsoft, which dealt with its own antitrust demons in the late 1990s, has attacked Google from all angles. But its behavior toward Google Books seems particularly duplicitous. Microsoft has funneled money to New York Law School's anti-Google Books group through an academic grant program. More broadly, it has joined the Open Book Alliance, along with Yahoo! and Amazon. To be fair, the Alliance also includes the nonprofit Internet Archive and a few library associations. But Microsoft started its own book-scanning project in 2005, one year after Google's, only to end it in 2008. The reason? Microsoft couldn't figure out how to make money off it. Yahoo! launched and folded a similar initiative for similar reasons. And Amazon continues its firm stand for authors' rights, even as it gives them only 30 percent of Kindle subscription revenues.
If there's a true villain in this story, it's almost certainly the copyright fiends like the Authors Guild, whose president recently called the Kindle's text-to-speech feature, which must be a boon to any blind customer, a "swindle" because it might cut into the sales of audiobooks. But these are the people - and this is the system - with which Google must work. It's no surprise that, of the original settlement's $125 million, the lawyers were slated to get more than all of the authors combined. But that isn't Google's fault. The company isn't in the copyright business; it's in the information business.
Of course, Google will benefit from Google Books - if not in revenue, then certainly in branding, institutional goodwill and market separation from Microsoft's Bing and Yahoo!'s search. But book digitization on the widest scale will benefit everyone. In addition to the improvements in searching, sorting and citation - and let's not overlook the simple but potentially profound ability to link to book pages - Google Books' glut of data will improve the company's other offerings. Digital copies of bilingual books will improve Google Translate; the company will be able to mash up books with Google Maps or Google Earth, so that readers can browse books by location - or, even better, pick a location and see every book that mentions it.
This still comes back to Google's algorithm, Google's technology, Google's page views and Google's bottom line. But what unites these applications is Google's commitment to "free": software suites, search engines, email. Google supports them all with ads, which account for 97 percent of the company's revenue.
A Higher Purpose
So: who do we trust? Until Google got involved, the digitization of books lagged far behind that of more immediately lucrative media like movies and music. Should we reward technological innovation or punish it? Should we check a forward-thinking company because it got a head start? Should we penalize a company for possessing the resources and the drive to accomplish something others abandoned? Or, again, and more broadly: who do we trust?
One thing that's been lost in all the litigation is just how excited people were in 2004. IU serves as a ghostly reminder of that sentiment. The university plans to continue working with Google Books, sending it non-folklore items through 2012. Its officials and librarians have nothing but good things to say about Google. Patricia Steele, who was IU's Dean of University Libraries until September, when she took the same position at the University of Maryland, is a national figure in these debates. In fact, after the settlement, Steele met with Google representatives in New York and helped to rewrite the CIC contract. "I was just very impressed with Google's culture," Steele says. "These are people who really have a mission to bring knowledge to everyone. They have a higher purpose!"
That higher purpose can help IU's scholars. Jason Baird Jackson, the chair of IU's Department of Folklore and Ethnomusicology, believes the kind of scholarship practiced by Stith Thompson will soon see a comeback. "The reason for that," Jackson says, "is the digitization of the Folklore Collection. We're on the verge of doing the work he did, but on a scale that was unimaginable."
By the time Thompson began his Motif-Index, he had amassed 20 drawers' worth of 4-by-6-inch paper slips. Even today, it's impressive that he forged those notes into a coherent and authoritative work. But it's no longer necessary. Instead of comparing 10 or even 100 versions of a folklore motif, scholars can compare thousands - or could, if we'd only give them the chance.
But the reach of Google Books extends far beyond the academy. By digitizing information, Google hopes to democratize it. In this future, it wouldn't matter whether you lived in New York or Bloomington, Indianapolis or Elkhart. You could access any book - even, or especially, the one you didn't know existed.