Although this session had a simple one-word title, it imparted a wealth of information on digitization projects. Charlotte Spinner and Christine Rasmussen, Information Architecture Analyst and Manager, respectively, at the AARP Library, discussed how they created the AARP The Magazine database. AARP The Magazine, the world’s largest-circulation magazine, is delivered bimonthly to all AARP members. Before the project started, some recent issues had been digitized, but the library staff was frequently asked to find an article from a back issue, and this is how they had to do it.
Typical articles in the magazine have sidebars, photos, and text blocks. PRISM XML, an industry standard for magazines, was selected for AARP’s database. The librarians thought that all the back issues were available, and the Publications Department agreed to pay for the project. But there were complications: not all the back issues turned out to be available, and the content varied widely; the librarians did not know it as well as they had thought. In some cases, the same content was published in different versions of the magazine for different age groups, for example, “Skin care in your 50s”, “Skin care in your 60s”, and “Skin care in your 70s”. There were also regional variations. Some articles had simply been inserted between pages and did not have page numbers.
The Publications Department had no list of the content variations, so the librarians created the lists manually. One-third of the database had content variations! At least one-quarter of the content was available in print only. There were also variations in format. Some issues were available as PDFs, which were converted to XML; others existed only in print, so they had to be scanned and then converted to XML. The digitization project took several months longer than originally planned, but in the end the entire magazine was put online.
The database includes links to the PDF files, which users like because they can see how the article looked in the print issue.
There were several unexpected outcomes from this project. The library uses the database all the time, and the Publications Department uses it to check whether titles proposed for new articles have already been used. Media Relations staff use the database, and the Legal Department uses it for copyright questions. Relationships were forged between the library and other departments. New digitization ideas were spawned: the library was asked to digitize Modern Maturity, and letters, speeches, and documents of AARP’s founder have been digitized. The library’s visibility has increased, and its stature in the eyes of managers has been enhanced. Distribution deals with vendors such as EBSCO have been created, and they bring in revenue for AARP.
Lessons learned:
- It’s always harder than you think.
- It always takes longer than you think.
- It always costs more than you think (although this project actually came in under budget).
But do it anyway:
- Pave the way.
- Have solutions ready for naysayers.
- Be prepared to roll up your sleeves.
- Gently push and push some more.
Richard Hulser from the Los Angeles Natural History Museum spoke about enabling discoverability through crowdsourcing and purposeful gaming in a biological database. Although the natural history literature and its archives contain information critical to studying life on Earth, much of this information is difficult to find, and much of it is not in digital form. Most of it is available in only a few select libraries in the developed world. This lack of access to the literature is a major impediment to the efficiency of scientific research. The Biodiversity Heritage Library (BHL) is changing this by providing free and open online access to library collections from around the world. The BHL is an open access digital library for biodiversity literature and archives and has materials dating back to the 15th century.
The typical digitized book contains text as well as images, taxonomic names, and so on. Simply scanning the pages is not enough in many cases, because unusual fonts, smudges, and foreign-language material result in poor OCR output. Much manual effort is needed to correct the OCR and bring it up to modern standards, and the BHL does not have the resources for this, so it has turned to crowdsourcing, engaging the public to help.
Two purposeful games were developed to improve and enhance discovery. In Beanstalk, players enter words, earn points, and grow a beanstalk. The other game, Smorball, uses a football theme in which players score points. The game output is used to add terms to the index.
The games had 5,000 participants over 6 months; Smorball had 4,365 sessions, and Beanstalk had 2,757 sessions. The two games will be available through the end of 2016 and maybe longer.
Digital library communities that manage large text collections and need novel or more cost-effective ways of generating text will benefit most from this type of approach. The games also have limitations: they cannot automatically generate output for handwritten texts and catalogs, and they cannot collect enough game data to apply corrections to the full BHL corpus of 48 million images.
Lessons learned: a more robust marketing plan is needed, the game designer should be selected before the project begins, and the agreement threshold for considering input on a term complete should be lowered from 4 to 2.
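The session did not describe how the games decide that a term is complete. The following is a minimal sketch, assuming each player types a transcription for the same OCR-flagged word and a term counts as complete once at least `threshold` players submit identical text. The function name `consensus_term` and the sample inputs are hypothetical.

```python
from collections import Counter
from typing import Optional


def consensus_term(inputs: list[str], threshold: int) -> Optional[str]:
    """Return the agreed transcription if at least `threshold` players match.

    Hypothetical illustration of an agreement threshold; not the BHL games' code.
    Surrounding whitespace is stripped, but case is kept because taxonomic
    names are case-sensitive (genus names are capitalized).
    """
    normalized = [s.strip() for s in inputs if s.strip()]
    if not normalized:
        return None  # no usable input yet
    word, count = Counter(normalized).most_common(1)[0]
    return word if count >= threshold else None


# Three players have transcribed the same smudged word.
player_inputs = ["Quercus", "Quercus ", "Ouercus"]

print(consensus_term(player_inputs, threshold=4))  # None: only 2 players agree
print(consensus_term(player_inputs, threshold=2))  # Quercus
```

Under these assumptions, lowering the threshold lets a term be marked complete with fewer players, which matters when participation is limited, at the cost of a higher chance that two players agree on the same mistake.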
The project demonstrated that games are a viable way to crowdsource improvements to digitized text. Further information is available at biodivlib.wikispaces.com/purposeful+gaming.