June 23, 2011
By mining a database of the world's books, Erez Lieberman Aiden is attempting to automate much of humanities research. But is the field ready to be digitized?
Published online, 17 June 2011, Nature
Reading very not-carefully
As a reader with a finite amount of time, Lieberman Aiden likes to say, you pretty much have two choices. You can read a small number of books very carefully. Or you can read lots of books "very, very not-carefully". Most humanities scholars abide by the former approach. In a process known as close reading, they seek out original sources in archives, where they underline, annotate and cross-reference the text in an effort to identify and interpret authors' intentions, historical trends and linguistic evolution. It's the approach Lieberman Aiden followed for a 2007 paper in Nature. Sifting through old grammar books, he and his colleagues identified 177 verbs that were irregular in the era of Old English (around AD 800) and studied their conjugation in Middle English (around AD 1200), then in the English used today. They found that less commonly used verbs regularized much more quickly than commonly used ones: 'wrought' became 'worked', but 'went' has not become 'goed'. The study gave Lieberman Aiden a first-hand lesson in how painstaking a traditional humanities approach could be.
But what if, Lieberman Aiden wondered, you could read every book ever written 'not-carefully'? You could then show how verbs are conjugated not just at isolated moments in history, but continuously through time, as the culture evolves. Studies could take in more data, faster. As he began thinking about this question, Lieberman Aiden realized that 'reading' books in this way was precisely the ambition of the Google Books project, a digitization of some 18 million books, most of them published since 1800. In 2007, he 'cold e-mailed' members of the Google Books team, and was surprised to get a face-to-face meeting with Peter Norvig, Google's director of research, just over a week later. "It went well," Lieberman Aiden says, in an understatement.