College and Research Libraries

Scholarly Information

William Y. Arms

A theme of recent EDUCOM conferences has been the merging of technical areas which have traditionally been separate. The same is becoming true of scholarly information, but universities have been slow to react to the need.

The problem is simple. A student writing a paper, a faculty member preparing a course, or a scholar working on a research project begins by assembling information from many sources. These sources can include libraries, museums, photographic archives, commercial services, computer data bases, personal contacts, and private files. The search may be on-campus or worldwide. In some fields of study, assembling information can form the major part of a research project; in others it is an essential building block.

Computing has the potential to improve this process, but requires coordination. Otherwise the various areas will continue to develop services that fulfill parts of the need but do not provide the links that would allow scholars access to all the resources of a modern university.

LIBRARIES

In the field of information, the pioneers have been the libraries. Long ago they realized that merely to collect books was of little value to scholars. Librarianship developed as a profession around the disciplines of cataloging and classification, tools used to give information about library collections.

The principles of librarianship are carefully spelled out in documents such as the Anglo-American Cataloguing Rules, and library schools have been established to teach these principles to new librarians. Nobody claims that the classification systems or subject headings are perfect, but they are in widespread use and provide a reasonably effective way to find items in a library.

Scholars often require more information than can be found in an orthodox catalog. Secondary information services exist to fill this need.
These provide information (titles, keywords, or abstracts) about individual journal articles. Most secondary services are discipline-specific. Some are huge. For instance, Index Medicus, Chemical Abstracts, and Lexis cover the entire fields of medicine, chemistry, and law respectively. Others are tiny.

When library computing developed in the early 1970s, two major success stories were shared cataloguing and on-line computer searching of secondary information services.

Shared Cataloguing

Cataloguing a book accurately is a skilled and time-consuming task. Since many libraries acquire the same books, it is sensible for libraries to share their records with each other. This is not an easy computing problem. Bibliographic data is extremely subtle, and an effective shared cataloguing system requires an enormous number of terminals to use a very large bibliographic data base.

The pioneer in this area was OCLC under the direction of Fred Kilgour. OCLC has been followed by a number of other systems, most notably the Research Libraries Group based at Stanford University.

William Y. Arms is vice provost for computing and planning, Dartmouth College, Hanover, New Hampshire. Reprinted with permission from the EDUCOM Bulletin, v.18, no.3/4, Fall/Winter 1983, p.21-23. The EDUCOM offices are located at P.O. Box 364, Carter and Rosedale Rds., Princeton, NJ 08540.

OCLC was able to build on earlier work by the Library of Congress and the British National Bibliography in establishing an international format for exchanging catalog records between computer systems. This format, known as MARC, is supported by all major cataloguing services.

Dartmouth was an early member of both OCLC and the Research Libraries Group. Over the past ten years shared cataloguing has allowed the library to improve the quality of its cataloguing and build up a large machine-readable data base despite the recent budget pressures.
Information Retrieval Services

Large secondary information services produce so much material that searching them becomes a major problem. In this field the computer pioneer was the National Library of Medicine. The library had an early computer system to assemble the numerous items for printing in Index Medicus. As a result, the entire text was available on magnetic tape. The earliest medical search system, Medlars I, was a batch processing system which searched these tapes to find articles that matched specified search profiles.

When this concept spread to other disciplines, two requirements emerged. The first was a demand for on-line searching. The second was a "standard procedure" for users. Secondary information services use a wide variety of approaches; indeed, the disciplines they serve are so diverse it is difficult to envisage any single standard satisfying them all. Yet it is important for library staff to be able to use them with a minimum of training.

Several commercial companies provide libraries with on-line searching of secondary information. The first was Lockheed, with the system now known as Dialog, followed by SDC and BRS. These companies acquire data bases from many sources, mount them on-line, and provide a standard search procedure. This is a competitive business and the companies use advanced methods for storing and searching huge data bases, including free text searching.

May 1984

These two major achievements are now converging. Libraries are beginning to replace local card catalogs with on-line computer systems. These use both the MARC records produced through shared cataloguing and the methods of data base searching developed by the various bibliographic services. At Dartmouth, the Pew Foundation provided funds to load the MARC records developed on OCLC and Research Libraries Group computers onto a duplicate of the BRS search system. This was a convenient way to provide a generally available on-line catalog.
NON-BOOK MATERIALS

The success of library computing has led to extensions in a variety of areas. Some of these are traditionally housed within the university library; examples are maps and manuscripts. Others, such as artifacts and paintings, are likely to be found in the university museum. Some areas, such as films and photographs, have a variety of homes in different universities. Collectively these are sometimes called "non-book materials."

For a number of reasons computing progress has been slower in these areas than in libraries. One reason is that most of the materials are resources for the humanities, usually less well funded than the sciences. In addition, scholars in the humanities have been less familiar with computing than their colleagues in the quantitative disciplines. Another difficulty is that library automation has made its contributions in sharing information about items that are held by many libraries; most manuscripts, paintings, and museum objects are unique. Finally, no widely accepted standards exist for cataloguing and classifying most scholarly materials other than books and journals.

Despite these difficulties, numerous attempts have been made to develop information systems for museums and other non-book materials. Funding has been limited, but still much useful work has been accomplished.

Recently this work has received a champion in the J. Paul Getty Trust. The Trust has the prestige to coordinate many areas and the long-term funding to tackle some of the underlying problems. The Trust has projects in a wide variety of fields. One is to build a computer catalog of the collections of a group of museums and galleries, beginning with paintings. This will include several major national museums and two universities, Dartmouth and Princeton. Another project is to catalog several enormous photographic archives.
Both these areas require subject indexes of visual objects such as paintings, vases, and architectural sites. This topic, known as iconography, is extremely complex with no established standards, yet is essential for success in these disciplines. Many of the finest collections are in Europe, which adds the complications of foreign languages and latent chauvinism.

DATA ARCHIVES

The discussion so far has been of computer systems that provide information about traditional scholarly materials such as books or paintings. In other fields, the information is more closely linked to the computer. Data archives were an early case.

Perhaps the best example of a data archive is the U.S. Census; in fact, the Hollerith punched card was originally developed to tabulate census data. More recent censuses have released raw data on magnetic tape. This data is invaluable for studies in several social sciences, but extracting information from hundreds of reels of tape is so tedious that for the most recent census each state has set up a dissemination bureau and several universities have provided their own services. The cost of such service is so great that even universities the size of Harvard and MIT have found it cheaper to work together.

Several universities, most notably in Michigan, have centers whose task is to gather data archives and make them available for research. Project Impress at Dartmouth College, developed during the early 1970s, was a data base system for teaching students how to analyze such data archives, a large number of which are stored on-line. The value of Impress lies in the combination of data archives and good quality search software.

COMMERCIAL DATA BASES

Some academic disciplines use data bases from the commercial sector. These are varied both in quality and scope and have two types of origin.
Some, such as the news services, began life as information services used internally by an organization which realized that outsiders would pay for access. Others, such as the services giving information to financial investors, are aimed at specific groups of professionals. By academic standards all these services are extremely expensive.

An interesting experiment in this area is The Source. This commercial company licenses a range of commercial data bases and mounts them on its own time-shared computers. A more or less standard user interface is provided so subscribers can reach a variety of information with minimal training. The Source, in its present form, is of marginal use to scholars, but in five years' time such services may mature into more usable form.

COMPUTING INFORMATION

For many years librarians have been asking computing specialists for assistance. Unfortunately, assistance has not been forthcoming. The computing systems of our universities have become enormous collections of poorly indexed tools and resources. In the days when computing was restricted to a few specialists this was not important. When computer users were concentrated into terminal clusters, with many users sitting side by side, word of mouth was still an effective way of disseminating information. Now that computing has become widely distributed across campus, some better way is needed for scholars to learn of the riches at their fingertips.

Dartmouth, as the first university to place emphasis on universal computing, developed a set of indexes that were suitable for a single large time-sharing system. These include an enormous collection of files which can be read either with system commands or from within programs. In addition there are indexes to library programs and publications. Although few universities can rival the completeness of information available at Dartmouth, the system is still far from perfect.
One problem is that many of the most useful programs are unknown to central staff. They are in departmental libraries or even in personal catalogs. Another problem is the variety of computer systems. A user of Dartmouth College Time Sharing may be unaware of a program that runs under the UNIX System.* Finally, computing is always changing. As services are introduced or withdrawn, keeping information up-to-date is a perpetual problem.

INTEGRATION

The word integration is much used in computing, but rarely defined, and even more rarely achieved. Each supplier of scholarly information has a different vision of how to integrate specific areas.

For example, libraries want to integrate their internal data processing, their services to scholars, and their links to other libraries. The aim is for a single description of each item to be used by all library systems.

A scholar has a different set of objectives. A faculty member or student using a library catalog through an on-line terminal is not interested in how smoothly that catalog fits with other data processing carried out by the library. However, after finding a reference in the catalog, the scholar demands follow-up services such as being able to copy the reference into a personal bibliography or word processor. At Dartmouth this problem has been partially solved by the fortunate accident of having a catalog system that runs under the UNIX system. UNIX is primarily an academic operating system and works well with other computers used for teaching and research.

The scholarly information system of the future will have the university providing central coordination of a variety of independent suppliers of information. These suppliers can be large or small, on-campus or off-campus. Since many of these suppliers will not be under the direct control of the university, providing smooth access to them all is not easy.
Key aspects of this information system will be:

Quality Control

The university must identify major sources of information and ensure that information provided is accurate and current.

*UNIX is a trademark of Bell Laboratories.

Terminals

A major assumption of computer planning for universities is that within a few years almost all scholars will have a small computer on their desks. One use of such a small computer is as a terminal to larger computers functioning as major information sources. The university must standardize on a small number of different types of personal computers.

Communications

Computer planning for universities assumes the existence of campus networks, but Dartmouth is one of the few to have such a network in place. Any terminal or personal computer connected to the network has equal access to all computers on the network and is also able to make off-campus connections using services such as Telenet.

Currently, almost all information services are designed around low-speed serial communications. The future is likely to require much higher capacities, either digital or video, so images or complete documents can be transmitted.

User Interface

Since each information source is likely to have a different user interface, the only way to provide integrated service to the scholar is for the personal computer to translate procedures used by the various sources into some homogeneous user interface.

Today most information services assume that the service is being used directly by a human, either a scholar or a supporting professional. In the future, the user is more likely to be another computer. This requires agreement on application protocols.

Long-term Planning

New technology and new sources of information are going to become available continuously throughout the next decade. The university must watch these developments, anticipate some, and consciously decide to ignore others.
Each of these areas requires standards. One of the most valuable services a university can provide is an acceptable set of standards for computing and information. The difficulty is finding a balance between overstandardization, which restricts flexibility, and the chaos that results when there are no standards.

CONCLUSION

Scholarly information is too big a topic for universities to ignore. Moreover, it has so many ramifications that leaving its planning to the library, or worse still the computer center, is unlikely to provide good balance. The only sensible solution is a coordinated plan in which many parts of the university work toward the common goal of providing faculty and students with the information they need for study and research.