
How the Internet Archive Digitizes 3,500 Books a Day–the Hard Way, One Page at a Time

Does turning the pages of an old book excite you? How about 3 million pages? That’s how many pages Eliza Zhang has scanned over her ten years with the Internet Archive, using Scribe, a specialized scanning machine invented by Archive engineers over 15 years ago. “Listening to 70s and 80s R&B while she works,” Wendy Hanamura writes at the Internet Archive blog, “Eliza spends a little time each day reading the dozens of books she handles. The most challenging part of her job? ‘Working with very old, fragile books.’”

The fragile state and wide variety of the millions of books scanned by Zhang and the seventy or so other Scribe operators explain why this work has not been automated. “Clean, dry human hands are the best way to turn pages,” says Andrea Mills, one of the leaders of the digitization team. “Our goal is to handle the book once and to care for the original as we work with it.”

Raising the glass with a foot pedal, adjusting the two cameras, and shooting the page images are just the beginning of Eliza’s work. Some books, like the Bureau of Land Management publication featured in the video, have myriad fold-outs. Eliza must insert a slip of paper as a reminder to go back and shoot each fold-out page, while at the same time entering the page numbers into the item record. The job requires keen concentration.

If this experienced digitizer accidentally skips a page, or if an image is blurry, the publishing software created by our engineers will send her a message to return to the Scribe and scan it again.

It’s not a job for the easily bored. “It takes concentration and a love of books,” says Internet Archive founder Brewster Kahle. The painstaking process allows digitizers to preserve valuable books online while maintaining the integrity of the physical copies. “We do not disbind the books,” says Kahle. That practice has allowed the Archive to partner with hundreds of institutions around the world, digitizing 28 million texts over two decades. Many of those books are rare and valuable; many others have been deemed of little or no value. “Increasingly,” writes the Archive’s Chris Freeland, “the Archive is preserving many books that would otherwise be lost to history or the trash bin.”

In one example, Freeland cites The dictionary of costume, “one of the millions of titles that reached the end of its publishing lifecycle in the 20th century.” It is also a work cited in Wikipedia, a key source for “students of all ages… in our connected world.” The Internet Archive has preserved the only copy of the book available online, making sure Wikipedia editors can verify the citation and researchers can use the book in perpetuity. If looking up the definition of “petticoat” in an out-of-print reference work seems trivial, consider that the Archive digitizes about 3,500 books every day in its 18 digitization centers. (The dictionary of costume was identified as the Archive’s 2 millionth “modern book.”)

Libraries “have been vital in times of crisis,” writes Alistair Black, emeritus professor of Information Sciences at the University of Illinois, and “the coronavirus pandemic may prove to be a challenge that dwarfs the many episodes of anxiety and crisis through which the public library has lived in the past.” A huge part of our combined global crises involves access to reliable information, and book scanners at the Internet Archive are key agents in preserving knowledge. The collections they digitize “are critical to educating an informed populace at a time of massive disinformation and misinformation,” says Kahle. When asked what she liked best about her job, Zhang replied, “Everything! I find everything interesting…. Every collection is important to me.”

The Internet Archive offers over 20,000,000 freely downloadable books and texts.

Related Content: 

Libraries & Archivists Are Digitizing 480,000 Books Published in 20th Century That Are Secretly in the Public Domain

10,000 Vintage Recipe Books Are Now Digitized in The Internet Archive’s Cookbook & Home Economics Collection

Classic Children’s Books Now Digitized and Put Online: Revisit Vintage Works from the 19th & 20th Centuries

Josh Jones is a writer and musician based in Durham, NC. Follow him at @jdmagness

How the Internet Archive Digitizes 3,500 Books a Day–the Hard Way, One Page at a Time is a post from: Open Culture. Follow us on Facebook, Twitter, and Google Plus, or get our Daily Email. And don't miss our big collections of Free Online Courses, Free Online Movies, Free eBooks, Free Audio Books, Free Foreign Language Lessons, and MOOCs.



