ACRL News Issue (B) of College & Research Libraries, C&RL News ■ June 2001 / 603, College & Research Libraries News

From the ground up

Lessons learned from a librarian’s experience with digitizing special collections

by Beth M. Russell

Digitizing library collections is becoming a common request, and libraries are responding to these requests by attempting to make images of rare or interesting materials available to a wider, “virtual” audience. My involvement in digitization began quite by accident; the chance to work on a large-scale digitization project crossed my desk completely unsolicited. As a special collections cataloger with just enough computer expertise to know that imaging was a complicated and demanding process, my first reaction was far from enthusiastic.

Nonetheless, more than two years after beginning the project, I can now point to a large database of searchable photographs that has already proven useful to researchers.1 My experiences are living proof that starting a digitization project from the ground up can be done.

This article will highlight how the digitization project was built from the ground up, stressing a few simple lessons learned that may prove useful for other librarians who find themselves in a similar situation. In short, I would advise librarians beginning such a project from scratch to keep in mind three simple maxims: 1) do your research, 2) nurture collaborations, and 3) be flexible.

[Figure caption: The original boxes and housings of the material and the state in which it arrived at the archives, after many years of storage.]

Beginnings

A digitization initiative was in the works at Texas A&M University long before my arrival in 1996. I was unaware of this, though, when I was encouraged to pursue grant funding from TexShare to make accessible a collection of thousands of agricultural photographs.
The program, called a TexShare Access to Local Holdings grant at the time, and now named TexTreasures, was designed to assist libraries “to provide access to their special or unique local collection holdings and to make information about these holdings available to library users across the state.”

This was clearly a case of having the right knowledge at the right time. I had been working with a collection of photographs from the university’s Agricultural Communications Office for a short time, using several images to illustrate a library publication and researching the relationships between this material and historical archives of the Texas Agricultural Extension Service and other agencies held in the Cushing Memorial Library at Texas A&M.

About the author: Beth M. Russell is head of special collections cataloging at Ohio State University, e-mail: russell.363@osu.edu

[Figure caption: An example of a photograph from the collection and how the information appears in the database.]

At first glance, the collection fit the goals of the grant program very well. The material was unique to Texas A&M, but featured images from all over the state. Historical interest in agricultural education and home demonstration was demonstrable, but since most of the photographs also had descriptive captions identifying individuals and locales, they would have a much broader appeal. African Americans and women were also featured prominently in segments of the collection.

The photographs were fragile and deteriorating quickly; negatives were often paper-clipped and placed in acidic pockets, while the prints themselves were usually glued to cardstock. If the images were to be preserved for future generations, something had to be done.

1. Do your research.
The opportunity to pursue a grant with a Texas theme was a good match with my knowledge of the collection. Promoting the collection and fully understanding it, however, required much more in-depth research. I began researching the history of the collection itself. I was able to contact people who had worked with the photographs. Many aspects of the organization and content of the collection that weren’t immediately apparent were made clear through these contacts.

This knowledge paid off in many ways. I was able to compose a suitable narrative about why the photographs were important: they documented extension activities for the whole state over a period of decades and highlighted a few important areas of the enterprise that scholars, genealogists, and the public might find interesting, specifically housekeeping instruction, agricultural education, publicity, and other activities. In addition, my newly gained knowledge allowed me to anticipate problems and those ever-present exceptions to the rule throughout the project design.

2. Nurture collaborations. I began working with Dilawar Grewal, then imaging manager for the Texas A&M Libraries. Grewal assured me that soon Texas A&M would have “multi-terrabyte storage capabilities and commercial grade, parametric, full-text searching and retrieval capabilities” at its disposal. We discussed, in broad terms, the workflow and process by which the collection could be digitized and indexed. I did not pretend to understand the details, but I used this information to design the project workflow in the grant proposal.

Project design

Once TexShare approved the broad outline I had created, I realized a more concrete project design would be necessary.
I put a workflow in place immediately, since the deadline for completing the project was an ambitious 13 months away.

3. Be flexible. Again I consulted Grewal, who was in the process of establishing the Texas A&M University Digital Library. The software and hardware for this entity were not actually in place yet, but I had a large, fragile, and unprocessed collection on my hands. I decided to spend some of the grant money on preservation supplies and on hiring the first student worker. Instead of digitizing, she was put to work sleeving and foldering photographs, assigning a unique item number to each based on the number of the box in which the photographs had arrived at the archives. Initially, I thought this would get work underway for a few months, and that this preservation processing would eventually be carried out concurrently with the digitization.

To say that Grewal faced innumerable challenges in procuring hardware and software would be an understatement, and I am sure it comes as no surprise to anyone who has attempted to start up a similar facility. I will simply advise anyone planning on purchasing and setting up a state-of-the-art imaging center to remain flexible and plan ahead.

Faced with a deadline and no usable computers, I got creative. Preservation processing continued while a temporary Microsoft Access database was set up (with the help of the staff of the nascent Digital Library) for entry of descriptive records. The database was designed to allow input of all the information fields that appeared in the photographic files. The cardstock on which the photographs were mounted often listed a negative number, photographer, county, date, or caption. Since some photographs did not have captions, or had captions that did not accurately describe the contents of the image, an additional field for a descriptive caption was added.
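For readers who think more easily in code than in database forms, the record described above can be sketched roughly as follows. This is only an illustration in Python, not the Access design the project actually used: the class name, the field names, the "box-sequence" layout of the item number, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


def make_item_number(box: int, sequence: int) -> str:
    """Build an item number from the arrival box and the position in that box.

    The zero-padded "box-sequence" layout is a hypothetical format chosen
    for this sketch; the article does not say how the numbers looked.
    """
    if box < 1 or sequence < 1:
        raise ValueError("box and sequence must be positive")
    return f"{box:03d}-{sequence:04d}"


@dataclass
class PhotoRecord:
    """One descriptive record; every field except the item number is optional,
    because the cardstock did not always carry every piece of information."""
    item_number: str
    negative_number: Optional[str] = None
    photographer: Optional[str] = None
    county: Optional[str] = None
    date: Optional[str] = None
    caption: Optional[str] = None              # as found on the cardstock
    descriptive_caption: Optional[str] = None  # written by project staff

    def display_caption(self) -> str:
        """Prefer the staff-written description, fall back to the original."""
        return self.descriptive_caption or self.caption or ""


# Made-up sample values, for illustration only.
record = PhotoRecord(
    item_number=make_item_number(12, 7),
    county="Brazos",
    descriptive_caption="Home demonstration agent showing canning equipment",
)
print(record.item_number)        # 012-0007
print(record.display_caption())
```

Keeping the original caption and the staff-written caption in separate fields means that nothing found on the cardstock is overwritten, while a display can still show the more accurate description whenever one exists.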
In retrospect, the design of the database was remarkably workable; although we encountered several problems that did not fit the “ideal” model, we were able to modify the data to record all the necessary information.

The database

Creating the database quickly revealed itself to be the most time-consuming part of the process. Of course, there was the limitation of typing speed, even though we had designed the data entry form to be easily navigable with minimal reliance on mousing. Students unfamiliar with the vocabulary of agriculture (or more specifically, the agriculture of mid-century America) composed some interesting descriptions of machinery, plants, and animals.

[Figure caption: A student worker in the Digital Library manipulates an image from the Texas Agricultural Extension Service collection.]

In fact, training and supervising the summary writing was a major concern because everyone had a different idea of how much information was enough and how best to compose that information. We were also limited to a certain field length because of the setup of the database. This was a problem only a few times, for photos that had pages and pages of typed “captions.” For the most part, this limit forced the students to think of succinct ways to represent an image’s content in text, which will hopefully be helpful to users.

My goal, after I had scheduled four students for the project, was to have one student working on data entry from 8 a.m. to 5 p.m., Monday through Friday (office hours of the Digital Library). Additional students worked on preserving the material, then the actual scanning, and finally, problem solving and cleanup.

The scanning phase

A number of unrelated factors kept the Texas A&M Digital Library from having its arsenal of scanning equipment up and running on the initial timetable.
Instead, three Macintosh G3s were used, running first the software resident on the flatbed scanners and then Adobe Photoshop, often all three at the same time. Where negatives were present, these were scanned; otherwise the actual prints were scanned, cropped, and just barely cleaned up.

Because we could not create digital watermarks, we decided to mount monitor-level images and urge people to contact the Cushing Library to obtain high-quality (digital or print) copies. Therefore, very little image enhancement was done.

Despite these issues, the scanning phase was completed well ahead of schedule, and students began working on other projects while the database was being completed.

Cleanup began, but it was complicated by system differences between the (PC-based) database and the (Apple-based) stored images. All images were backed up on ZIP disks, and the unique numbers (which had been used as filenames) were sorted and eyeballed for any obvious errors.

Later, two students working together would call up a database record, check it for mistakes or for problems with the description, then consult the accompanying image; many problems were discovered this way. Some images had been very poorly scanned or over- or under-adjusted. Often there was a database record for an item but no scan, or vice versa. These problems had to be resolved by pulling the files in question (which had been returned to their permanent home in the Cushing Library, a few buildings away), often re-scanning and re-keying data.

The Web phase

During cleanup, I began working with Digital Library staff to plan the Web interface for the database. Again, this was a trial-and-error process. While I had designed basic personal Web pages for years, I had no idea how to design a graphics-intensive site that would link to a database mounted on a server.
I sometimes resorted to sketching out on paper how I wanted results displays to look.

There was a steep learning curve for the students, the Digital Library staff, and myself in the project. I would advise other librarians who anticipate a similar project to hire students familiar with digitization and photographic manipulation, if possible. The graduate assistant working for the Digital Library also had to learn database and Web site design from scratch, so there was sometimes difficulty in knowing just what we could do.

Still, given the time constraints of the process, I was very fortunate to have worked with an enthusiastic and competent group of people. I certainly enjoyed the learning process, however, and the student assistants who went on to work on other projects with the Digital Library clearly learned their lesson.

Conclusion

If I can do this, anyone can. Certainly having a grant to hire students and a graduate assistant was a major factor in the success of this project, as was having access to the knowledge of others. However, I believe it was a firm understanding of the collection and the ability to think things through that really made this project work.

Regardless of the environment, it’s likely that there is someone knowledgeable around to help. As the project manager for this grant, I relied heavily on others for technical expertise and troubleshooting. I would encourage anyone attempting such a project not to reinvent the wheel. Chances are good that there is someone around to help you.

I am very grateful to TexShare for their financial assistance, as well as to Dilawar Grewal and the staff of the Texas A&M University Digital Library for technical assistance throughout the project.

Note

1. The database can be accessed at http://dl.tamu.edu/aggiana/collections/texshare/home.html.