
How the Internet Archive Digitizes 3,500 Books a Day–the Hard Way, One Page at a Time

Does turning the pages of an old book excite you? How about 3 million pages? That’s how many pages Eliza Zhang has scanned over her ten years with the Internet Archive, using Scribe, a specialized scanning machine invented by Archive engineers over 15 years ago. “Listening to 70s and 80s R&B while she works,” Wendy Hanamura writes at the Internet Archive blog, “Eliza spends a little time each day reading the dozens of books she handles. The most challenging part of her job? ‘Working with very old, fragile books.’”

The fragile state and wide variety of the millions of books scanned by Zhang and the seventy or so other Scribe operators explain why this work has not been automated. “Clean, dry human hands are the best way to turn pages,” says Andrea Mills, one of the leaders of the digitization team. “Our goal is to handle the book once and to care for the original as we work with it.”

Raising the glass with a foot pedal, adjusting the two cameras, and shooting the page images are just the beginning of Eliza’s work. Some books, like the Bureau of Land Management publication featured in the video, have myriad fold-outs. Eliza must insert a slip of paper to remind her to go back and shoot each fold-out page, while at the same time inputting the page numbers into the item record. The job requires keen concentration.

If this experienced digitizer accidentally skips a page, or if an image is blurry, the publishing software created by our engineers will send her a message to return to the Scribe and scan it again.

It’s not a job for the easily bored. “It takes concentration and a love of books,” says Internet Archive founder Brewster Kahle. The painstaking process allows digitizers to preserve valuable books online while maintaining the integrity of the physical copies. “We do not disbind the books,” says Kahle. That approach has allowed the Archive to partner with hundreds of institutions around the world, digitizing 28 million texts over two decades. Many of those books are rare and valuable, while many others have been deemed of little or no value. “Increasingly,” writes the Archive’s Chris Freeland, “the Archive is preserving many books that would otherwise be lost to history or the trash bin.”

In one example, Freeland cites The dictionary of costume, “one of the millions of titles that reached the end of its publishing lifecycle in the 20th century.” It is also a work cited in Wikipedia, a key source for “students of all ages… in our connected world.” The Internet Archive has preserved the only copy of the book available online, making sure Wikipedia editors can verify the citation and researchers can use the book in perpetuity. If looking up the definition of “petticoat” in an out-of-print reference work seems trivial, consider that the Archive digitizes about 3,500 books every day in its 18 digitization centers. (The dictionary of costume was identified as the Archive’s 2 millionth “modern book.”)

Libraries “have been vital in times of crisis,” writes Alistair Black, emeritus professor of Information Sciences at the University of Illinois, and “the coronavirus pandemic may prove to be a challenge that dwarfs the many episodes of anxiety and crisis through which the public library has lived in the past.” Much of our combined global crisis turns on access to reliable information, and book scanners at the Internet Archive are key agents in preserving knowledge. The collections they digitize “are critical to educating an informed populace at a time of massive disinformation and misinformation,” says Kahle. When asked what she liked best about her job, Zhang replied, “Everything! I find everything interesting…. Every collection is important to me.”

The Internet Archive offers over 20,000,000 freely downloadable books and texts.

Related Content: 

Libraries & Archivists Are Digitizing 480,000 Books Published in 20th Century That Are Secretly in the Public Domain

10,000 Vintage Recipe Books Are Now Digitized in The Internet Archive’s Cookbook & Home Economics Collection

Classic Children’s Books Now Digitized and Put Online: Revisit Vintage Works from the 19th & 20th Centuries

Josh Jones is a writer and musician based in Durham, NC. Follow him at @jdmagness

How the Internet Archive Digitizes 3,500 Books a Day–the Hard Way, One Page at a Time is a post from: Open Culture. Follow us on Facebook, Twitter, and Google Plus, or get our Daily Email. And don't miss our big collections of Free Online Courses, Free Online Movies, Free eBooks, Free Audio Books, Free Foreign Language Lessons, and MOOCs.



