Integrating Digital Resources into a Traditional University Research Library
Issues in Science and Technology Librarianship, Summer 1999
DOI:10.5062/F4MW2F4K

URLs in this document have been updated. Links enclosed in {curly brackets} have been changed. If a replacement link was located, the new URL was added and the link is active; if a new site could not be identified, the broken link was removed.

Fiona C. Coutinho
Department of Computer Science
University of South Carolina
Columbia, SC 29208

Caroline M. Eastman
Department of Computer Science
University of South Carolina
Columbia, SC 29208

Christopher B. Hare
Thomas Cooper Library
University of South Carolina
Columbia, SC 29208

Robert F. Skinder
Thomas Cooper Library
University of South Carolina
Columbia, SC 29208
rskinder@gwm.sc.edu

Abstract

We describe the ongoing Electronic Library Project at the University of South Carolina. The goal of this project is the integration of digital resources within a traditional university research library. The first step was the development of an Electronic Science Library (ESL), followed by an Electronic Academic Library (EAL) which includes non-science subjects. We discuss the structure of these libraries and comment on our experiences with their implementation and use. The prototype implementation used static web pages, a technology which we knew would not scale up well. This implementation is being replaced by a database system using SQL Server and Active Server Pages. Future plans are briefly discussed.

Introduction

Every year Internet-based resources become increasingly important to the academic community, but they do not necessarily become easier to use or find. This paper describes the Electronic Library Project being developed at the University of South Carolina.
The objective is to harvest thousands of useful academic resources that are available on the Internet and make them available to the academic community in the form of an academic digital library. This is an ongoing project of which various components have been completed. We first discuss the Electronic Science Library (ESL) and the subsequent Electronic Academic Library (EAL). We then describe our supporting database and retrieval system. Planned future work is briefly summarized. Fundamental to our plan is a system that targets discrete items and places them within the structure of an electronic library. The items are placed on clearly marked shelves and assigned call numbers, thus providing structured access to the available but disorganized academic resources on the Internet. In this manner we make them available to faculty and students for teaching and research. To achieve that goal, we have laid out a multi-part plan, parts of which have been completed while others have yet to begin. The ESL was the first step and has taught us a great deal. One of the key lessons that we learned was that the human intellect is better used developing tools to produce electronic libraries than searching for the resources to be included. This is illustrated by the development of our database shell and retrieval engine. The ESL was constructed and we learned from it. When we proposed the non-science Electronic Academic Library (EAL) we not only expanded our scope but went beyond the library to develop computerized solutions. As time passed, new applications for related projects have emerged, such as a project involving Internet-based biological databases. New computer-based tools have also been developed or proposed and will be discussed. In short, our original work has now become a test bed for a wide range of topics. The overall goal has been to design a system that will produce an easy and efficient way for the academic community to use the resources of the Internet. 
The librarians are concerned with organizing the materials. Our Computer Science allies hope to utilize state-of-the-art technology to reduce the amount of tedious labor inherent in such an endeavor. To date, these innovations include an ASP (Active Server Pages) based database coupled with a sophisticated retrieval engine. Additional tools will include a customizable search robot for discovering resources as well as computerized "Library Assistant" programs to manage housekeeping routines. These will be implemented as opportunities appear.

Generally speaking, our approach has been for the librarians to build small segments of the library and observe both the researchers and the users. As problems or opportunities are discovered, we confer with faculty of the Computer Science Department, who have often been able to provide both answers and student programmers. To date, the Computer Science students have worked on the prototype database shell and its search engine. They are beginning work on the spiders as we begin our biological database project.

Electronic Science Library

Objectives

The prototype Electronic Science Library (ESL) was developed to determine whether it was feasible to locate, evaluate and make available to the classroom and laboratory the numerous educational resources available on the Internet. Limiting our initial efforts to science seemed to be a manageable goal, and the initial investigators were science librarians. It was, however, always assumed that the work would be expanded given any positive results. Secondary objectives included:

Determine the best way to locate these resources.
Evaluate the quantity and quality of these resources.
Devise a system for housing these resources that would be conducive to our arranging them and our patrons locating them.

The working model of the ESL can be viewed at {http://www.sc.edu/library/science/elibind.html}

Approach

The ESL, like most libraries, consists of two distinct parts.
First is the infrastructure comprising the physical plant, the computerized system(s), the materials and the librarians. The other half of the equation consists of the users and those parts of the system that allow them to use the library. The approach therefore is divided. Management and Development Electronic or digital libraries include a wide range of concepts such as digitizing existing copies of documents or collecting materials that are already digitized such as maps or statistical data. Some digital collections concentrate on cultural or geographical highlights of the parent organization. Our particular strategy was to locate discrete, catalogable Internet resources such as particular books, journals or online courses. This concept is opposed to the "list of lists" approach or the use of search engines. These methods might be suitable for use by some professionals but are not sufficient by themselves for the undergraduate community. Our resources were collected and arranged by two information specialists working from 5 to 10 hours a week for one year. Resources were located using the very tools that the ESL was designed to replace. The searchers used search engines, virtual libraries, mailing list messages, and Internet surfing. The information specialists used cataloging information when it was available. If not, they approximated call numbers and assigned appropriate locations. The work that they did was extremely labor intensive and turned out to be one of the major problems in exploitation of digital resources. At the completion of our prototype, the format of our pages or shelves was still very much the same as when we began our work, which suggests that we guessed well. We have identified and entered into the database approximately 2,500 resources. Patron Usage To locate materials, the user chooses one of the sciences on the opening or {index page}. 
The subjects listed are not all of the sciences taught at USC, Columbia, but represent the larger departments and, perhaps more importantly, those that have a reasonable number of electronic resources available on the Internet. The user is then taken to the next page or area, which is functionally always the same regardless of the subject chosen. Here are listed the separate {categories} of items. In the science library, these categories are the same for each subject. This allows the user to feel comfortable wherever they may be within the ESL. Note that the categories vary considerably among the non-science EAL pages.

The individual categories are generally self-explanatory. The first two areas, Department Home Pages and Faculty, are included because the ESL is intended to be a part of the educational process at a specific institution. The Online Courses category reinforces this but also offers the students many opportunities to look at classes similar to, but not exactly the same as, those that they may be taking. We expect the online courses area to grow very rapidly.

Full-text Books are exactly that; when you select a particular title, the full text of the book appears. The National Academy Press has placed a large percentage of its books online. There are also archival books whose copyrights have expired as well as several that are online due to the beneficence of the authors. There are also several full-text books that actually appear in other sections such as Reference or may be found under another subject. There are far more full-text books available in the non-science areas. The exact opposite situation appears to exist with the online journals.

Online Journals are not as clear-cut an issue as the full-text books. This is the result of the many academic and society publishers who seem unable to deal with the myriad issues associated with electronic publishing.
As a result we have journals whose access ranges from full text to abstracts or tables of contents. Since the number of full-text journals is growing so steadily, we will be giving serious consideration to eliminating the products that provide only a table of contents or abstract.

The Reference section currently consists of dictionaries, glossaries and encyclopedias. These are often divided by disciplines within a subject for ease of use.

The third row consists of Search Tools, Databases, and Calculators and Tools. The search tools are not the typical search engines associated with the Internet such as Yahoo and Excite but are Internet-based tools associated with a particular field, such as Medline in the medical field or NASA Reports in the engineering sections. These provide an enormous wealth of information through, rather than within, the Internet. Databases offer enormous potential in the classroom but they also present problems. One that is particularly obvious is the differentiation between a search engine and a database. Although we have placed them in separate categories, it is increasingly apparent that the difference is relatively minor and they will soon be combined.

The third section of row three is called Calculators and Tools and consists of three primary areas. The first of these comprises a number of scientific calculators that have been developed using Java technology and that are available on the web at no cost. Applets, also made possible by Java, include virtual experiments and can be found primarily in the physics and astronomy sections. A typical applet displays one or more graphs or some other way to illustrate data points. An area to enter or change numerical values is also provided. Changing these values will immediately change the voltage, the trajectory, the orbit or any other value-dependent variable.
The final part of Calculators and Tools is a page designed to go beyond the discrete, catalogable item by pointing to important information regardless of where it is housed. Broad classes of discipline-specific information needs and appropriate resources are identified. This is an experimental tool, designed by the Science Library, and is aimed at less sophisticated students who might not recognize that a particular tool or device was available on the Internet. This particular effort first appeared as the Prototype Chemistry Page but there are obviously many other applications where it can be used. The referenced page is for illustration only and shows only a few of the actual tools available. The remaining three categories on row four are of minimal interest. Primarily they are pointers to areas that we have chosen not to develop fully but that are important enough to be noted. Incidentally, work on the prototype ESL took approximately one year or 500 hours that included searching for resources and cataloging as well as routine housekeeping chores such as ensuring that URLs remain stable. Other tasks included developmental work in non-science areas and in the design and construction of a new database. Results The immediate results are a set of tools, useful for faculty, students and researchers that brings thousands of valuable Internet resources to them in a logical and easy-to-use format. Those are the visible results. We have also gained an understanding of what really is available on the Internet as well as the values and limitations associated with those resources, most notably the arduous work involved in a project of this nature. The Electronic Science Library has often been used as the core around which Internet classes are built or it is cited as an important resource. These classes range from University 101 (an immigration course for new undergraduates) through graduate level seminars on information retrieval. 
We also introduce the ESL in all library information classes whether they are Internet-based or not. The design of the system in its present configuration is particularly well suited to the library classroom because we need only to pick the subject under discussion and from there, each category (books, journals, etc.) is merely a click away.

The project has also allowed librarians to collaborate with teaching and research faculty for projects, classes and proposals. For this work to reach its full potential, participation by the faculty is required, and this is beginning to occur. Seminars, workshops and regular library newsletters will increase faculty and student awareness in the future. We are also making the ESL more visible by integrating important research tools such as the Web of Science, Lexis-Nexis and Current Contents with it.

On the negative side, this work has proven to be far too labor intensive. Searchers first must find a candidate object, then examine it. At a minimum they will then add it to a given page, making sure that the connection works. In many cases, depending on the category of material, the searcher also recorded call numbers, keywords, the author's and publisher's names, and the availability of a hard copy in the library. In some cases the same record might then be added to different pages. A good hour might produce two or three completed items.

Upkeep of the system has also proven to be a problem, particularly in the area of electronic journals, where the publishers have continually developed new and trial pricing arrangements. We have had several occasions where one such arrangement caused a hundred or more journals to be added or to have their status changed.

Before the prototype was finished it was determined that the results warranted the continuation of our work but that modifications needed to be made to our mode of operation, namely to facilitate the librarian's task by developing automated "Librarian Assistants."
These, we hope, will increase productivity by assisting with all or part of the necessary tasks such as acquisition, search, retrieval, classification and indexing of resources. Software will also be used to simplify the user's involvement by including search capabilities and arranging materials in a logical and user-friendly manner.

Database Development

Objective

The objective of this part of the project was to create a database, searchable on the web, to replace the numerous HTML links created in the initial phase of the ESL project. That database would include selected discrete, catalogable items, both web and non-web. Subject specialists can use the system to create and modify discipline-specific databases.

As previously stated, constructing the original web links turned out to be very labor intensive. Any one item may have been set up with numerous links. In addition, there were only predefined paths to subjects and no way to search by keywords or subject headings.

Each item in the new database would be assigned a limited number of subject headings to minimize the work of the selecting subject specialist. Subject headings could be set up, using an established controlled vocabulary, by the manager of the database or by the subject specialists themselves. At this time it appears that we will use the Dublin Core Metadata format, modified to accept our nested subject headings.

Approach

An existing Microsoft Windows NT Server, SQL Server and Internet Information Server installation was used. This avoided additional software and hardware costs and minimized the need for additional support personnel. In addition, only HTML, ASP and Visual Basic Script were used, to keep the skill level needed for support minimal. The project was broken down into three tasks:

Create a database with a table for each category.
All categories, whether they are online classes, books, journals or calculators, will be assigned the following Dublin Core Metadata fields: title, creator, subject (nested) and keywords, description, publisher, other contributors, date of publication, type of resource, format, URL, source, language, relation, coverage, and rights holder. In our case we also intend to include the applicable discipline or disciplines. Distilling and then building the database will remain our largest challenge. We are currently experimenting with manual procedures but have no doubt that our success relies on complete or partial automation of the process. Our future plans call for the development of such tools. Anyone who is involved in a similar project should look at the DC-dot Dublin Core Generator. This, at least, offers some help in developing the metadata. It works extremely well when the metadata have previously been developed and applied to a given item. In the near future, it is hoped that everyone who is developing an online resource will ensure that each page has the core items installed or that the proper information is included in the design of the product so that meaningful metadata might be developed.

Create a web-based staff module with input and editing screens for each category. Upon reaching maturity, our system will automatically identify and locate those fields that constitute the database. In the meantime, input to editing screens will be done by humans with some subject expertise. The principal tasks will be to enter new records, create or delete disciplines, create or delete any of the core fields and edit existing records. This portion of the system has been developed and we will be entering data for our biology database project by the time that this paper is published.

Create a web-based patron module with search options. This portion has also been developed and awaits the input of data.
It is designed to be integrated with other electronic projects and search tools. Its design is intended to meet the requirements of all searchers regardless of experience. In fact, the system will perform best for the more inexperienced users. Search options include selection of one or more categories, selection of a discipline, subject search, keyword search and call number search. One of the more interesting features is the nested subject search. Selecting a major subject heading reveals all of the subject headings subordinate to it, down to three levels.

Results

The result of this work was the creation of a database shell. The shell allows a librarian to easily create a World Wide Web searchable database on any discipline. The collections are not meant to include everything on the web that claims to cover a discipline or subject heading, which is what traditional web search engines now return. The collection model resembles the collection management policies of traditional librarians. Collections can be tailored to fit the needs of the curriculum taught at the University or to fill the needs of faculty and students based on their requests.

Searching is quite flexible. All hits that are on the web are presented as links. The searcher can choose one, a few or all the categories to search at the same time. Each subject search returns hits for all records for which that subject is the last heading in the record. In addition, each subject search returns a list of all subject headings nested within it. Keyword searches query all fields in the selected category or categories and discipline. If a keyword happens to be a subject heading, the searcher will be given both keyword hits and the subject headings nested within that term. Call number searches can be used to browse related material. Wild card capability is built in.

Conclusion

We have attempted to provide a look at one university's attempt to integrate the new and ever increasing digital resources of the Internet with the traditional resources.
This is not only a physical matter but also a psychological one. A considerable amount of inertia exists in both directions regarding Internet usage. Many people, including some librarians, are loath to consider an Internet resource a reliable one. Conversely, many students spend hours looking for non-existent answers on the Internet when the actual answers are available on the library's shelves.

The physical portion of our project began with a fairly simple goal. We would evaluate resources available on the Internet. We would seek a sensible way to locate, arrange and use them and we would attempt to introduce them into the classroom and laboratory. Because we had limited resources, we started with the sciences and within one year (and for less than $5,000) developed the Electronic Science Library.

We learned important lessons from this effort. The resources were more plentiful and more useful than we expected. The work to locate them and assign order to them was also more than we expected. Our conclusion was to proceed but to change the focus. Instead of using humans to search for and arrange the resources, we would use humans to design the tools for searching and to evaluate and teach the retrieved resources.

Exploratory work began on the EAL before the ESL was near completion. It was important to have an idea of the quality and quantity of the non-science resources if we were to continue. In many ways the non-science resources differed from the scientific ones but there was no perceived decrease in value.

We have begun construction of the Electronic Academic Library (EAL). Graduate students in the College of Library and Information Science recently completed preliminary pages for English Literature, Library Science, Music and Philosophy. A collection for Statistics has also been placed in both the ESL and EAL. Work in non-science areas is as interesting as our initial endeavors, with certain variations.
Instead of databases and calculators, we are finding materials such as interviews, artwork and sheet music. It appears, however, that the EAL will now pass to the control of another group within this library. The original designers will assist as consultants. At this point it seems likely that materials in the EAL will be entered into the University OPAC rather than into the database that we designed. Non-science subjects currently available are: {English Literature}, {Library Science}, {Music}, {Philosophy}, {Statistics}.

In closing, what began as an experiment now stands as a sturdy tool at a fraction of the cost of commercial databases. Librarians, faculty and students are becoming more familiar each day with using digital resources from around the world. After two years of development we have thousands of separate resources in an orderly format, as well as important computerized tools and techniques. We are in a good position to expand the scope of our system by a factor of at least five within the next year and continue with substantial growth for the foreseeable future. This work also serves as the basis for additional opportunities including those involving other departments, libraries and companies.

Future Projects

At the present time, Summer 1999, the Electronic Science Library is well established. A model for the academic library has been established and will be passed on to others. The database project is undergoing modification to adhere to the Dublin Core Metadata concept and the search engine is ready for use.

We are currently working on a project somewhat smaller in scope, the Biological Database Project. Two of the most important aspects of this project will be the automated spider that will find databases and our ability to develop metadata for each newly found object and enter it immediately into our database of databases. Both of these projects will depend on automation, which will probably be accomplished in several steps.
The value of attempting these automation projects in conjunction with the Biological Database Project is that everything is starting from ground zero. We will not be facing the prospect of developing and installing additional fields of information on a database that already holds several hundred or several thousand items.

Acknowledgment

We gratefully acknowledge BellSouth Instructional Innovation Grants, 1997 and 1999.
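
Illustrative sketch of the search behavior

The search behavior described under Database Development can be illustrated with a short Python sketch. It is not the project's SQL Server and Active Server Pages implementation. The record fields, the sample records and the function names (subject_search, keyword_search, call_number_search) are all hypothetical. Only the behaviors being mimicked come from the description in this article: subject hits on the last heading of a nested subject, a list of the headings nested beneath it, keyword matching across fields within the chosen categories and discipline, and wild card call number browsing.

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Record:
    """One catalogued resource; field names are illustrative assumptions."""
    title: str
    category: str                    # e.g. "Reference", "Online Courses"
    discipline: str
    subject_path: tuple[str, ...]    # nested headings, broadest first
    call_number: str = ""
    keywords: tuple[str, ...] = ()
    url: str = ""


def subject_search(records, heading):
    """Return (hits, nested) for one subject heading.

    hits:   records whose LAST subject heading equals `heading`.
    nested: sorted heading paths that start at `heading` and go deeper.
    """
    hits = [r for r in records if r.subject_path[-1:] == (heading,)]
    nested = set()
    for r in records:
        path = r.subject_path
        if heading in path:
            start = path.index(heading)
            # Collect every deeper path rooted at the searched heading.
            for end in range(start + 2, len(path) + 1):
                nested.add(path[start:end])
    return hits, sorted(nested)


def keyword_search(records, term, categories=None, discipline=None):
    """Case-insensitive substring match over all text fields of a record."""
    term = term.lower()
    found = []
    for r in records:
        if categories and r.category not in categories:
            continue
        if discipline and r.discipline != discipline:
            continue
        text = " ".join(
            [r.title, r.call_number, r.url, *r.subject_path, *r.keywords]
        ).lower()
        if term in text:
            found.append(r)
    return found


def call_number_search(records, pattern):
    """Browse by call number; `*` and `?` act as wild cards."""
    return [r for r in records if fnmatchcase(r.call_number, pattern)]


# Hypothetical sample records, invented purely for this illustration.
RECORDS = [
    Record("General Chemistry Glossary", "Reference", "Chemistry",
           ("Chemistry", "Physical chemistry"), "QD5", ("glossary",)),
    Record("Thermodynamics Notes", "Online Courses", "Chemistry",
           ("Chemistry", "Physical chemistry", "Thermodynamics"), "QD504"),
    Record("Orbit Simulator", "Calculators and Tools", "Physics",
           ("Physics", "Mechanics"), "QC125"),
]

if __name__ == "__main__":
    hits, nested = subject_search(RECORDS, "Physical chemistry")
    print([r.title for r in hits], nested)
    print([r.title for r in call_number_search(RECORDS, "QD*")])
```

In this sketch a search on a broad heading such as "Chemistry" may return no direct hits while still listing the headings nested beneath it, which is how the nested subject search is described as guiding a user toward narrower terms.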