College and Research Libraries

ALLEN B. VEANER

The Application of Computers to Library Technical Processing

Mr. Veaner is Assistant Director for Bibliographic Operations, Stanford University Libraries. This paper was prepared for "New Dimensions in Acquisitions," an American Library Association preconference held in Atlantic City, New Jersey, June 19-20, 1969.

A 1967 White House report, Computers in Higher Education, begins with an arresting statement: "After growing wildly for years, the field of computing now appears to be approaching its infancy."1 Library automation has passed through similar throes, and we may be at the beginning of a period of new and significant development.

Several important milestones have already been reached. Computer experts, now facing the problem of structuring and maintaining complex files, and dealing with a wide span of graphic output characters, have begun to appreciate the data management complexities inherent in bibliographic data. We no longer hear from computer people that our problems are trivial. We, in turn, have realized that it is no longer possible to speak of one component or subsystem, such as an acquisition system, in isolation from other technical processing functions. Automation has confirmed the integrity and unity of technical processing.

The economics of applying computers to library data processing has come as a rude shock to many administrators. The old idea that an automated system could be operated at a new lower cost than a manual system is dead, indeed. One now needs to plan future budgets in terms of cost avoidance or improved library services.

The Large System: Maker or Solver of Problems?

The choice between stand-alone equipment and procurement of services from a central facility is the first major decision in any automation endeavor. The small or medium-sized stand-alone device is attractive because one can fully dedicate it to a specific application.
But as the user's sophistication and system requirements increase, he outgrows the smaller machine and soon finds that he must cast his lot with a larger facility in order to enjoy certain technical benefits and operational features not available on smaller devices. It is at this point that one must be prepared to give up some freedom in exchange for more computer power, and where the complexities of scale begin to compete with the economies of scale.

Software in the large system carries with it unforeseen problems that seem to crop up endlessly and affect the scope of many operations in unknown and unpredictable ways. Hardware manufacturers and software developers have already learned about this, much to their chagrin, especially with time-sharing. W. F. Miller, Associate Provost for Computing at Stanford, characterizes this facet of software thus: "The reward, and at the same time the retribution, of software is self-change."2 The reward is the enormous increase in our power to do things; the retribution is the unforeseeable perturbations which come back to haunt operations thought to be fully debugged and dependable.

Fifteen major hazards in the development of large multi-use systems have been enumerated in a paper by F. J. Corbató, of Project MAC at MIT.3 The dangers cited include lack or inadequacy of documentation, failure to implement designs, overstaffing of the design team with its attendant communication and supervision problems (Corbató conceives of ten as a maximum number), overextension in time, the attempt to undertake more than one significant advance at a time, the assumption that a finish date can be predetermined, lack of essential hardware, geographic scattering of resources (people and equipment), and too many maintenance people in the systems programs.

Yet, once in the grasp of an automated system, there is no turning back.
Entering upon an automated system in any enterprise is practically an irreversible step. This is why reliability in automated systems is a factor of overwhelming importance for library operations. The thing about library operations is simply that they must be operational. Our users and our management demand facilities that work during all normal service hours, and sometimes beyond that.

With this critical background, I would now like to describe what I believe are useful and profitable computer applications to acquisitions and technical processing. I also wish to report in some detail Stanford's development work in automated technical processing, an effort supported by the Office of Education's Bureau of Research (Contract OEG-1-7-071145-4428).

Candidates for Library Automation

First, it is clear that a significant number of libraries do not require and should not embark upon library automation programs; they should instead participate in regional technical processing centers, operated either by a jurisdiction or a commercial organization. Typically, these libraries order and process mainly current English language imprints marketed in the book trade, and they buy multiple copies of the same title for branch libraries. NELINET (New England Library Information Network), the Ohio College Library Center, and the Colorado Academic Library Book Processing Center are examples of service agencies for libraries which should not individually undertake automation, because their local operations are too small in scale. In the aggregate, the scale is sufficient to support the personnel and machine overhead demanded by computerized operations. These new centers may soon supplant in-house technical processing operations. While it is not clear at this time that technical processing will disappear altogether in the small and medium-sized library, it will certainly be radically altered in the near future.
It is doubtful whether large university and research libraries can ever dispense with internal technical processing services, but even there, more widespread utilization of centrally produced data is likely to shrink the size of technical processing departments.

Standardization

Second, it is abundantly clear that the major impact of library automation will be felt in the area of bibliographic standardization. Page 1 of the final report, The MARC Pilot Project, contains a crucial observation: "The single most significant result of MARC has been the impetus to set standards."4 Standardization efforts will be greatly aided by budgetary considerations. In every enterprise there is keen competition for the dollars needed to run every operation in the organization, and the dollars can be very determining. The increasing trend towards measuring performance effectiveness is already being felt in libraries. For example, Booz, Allen and Hamilton is conducting a major management study for the Association of Research Libraries, a study whose aim is management improvement and increased adequacy of budget justification.

College & Research Libraries • January 1970

Two-thirds of a century ago, Herbert Putnam, then the Librarian of Congress, outlined the Library's proposed card distribution service. The purposes of distributing centrally produced bibliographic data are stated in clear and simple language:

to supply libraries with information of books which they do not possess . . . to enable them to avoid expense in the preparation for use of those which they do possess.

He goes on to quote the contemporary library press, pointing out that the two most costly factors of getting a book recorded in the catalog are the work of the cataloguer, the expert, and the work of the compositor or transcriber.
It is worth the time and space to quote in extenso from this 1901 report:

Now, the interesting thing is that until now libraries have been, in effect, duplicating this entire expense, multiplying it, in fact, by each one undertaking to do the whole work individually for itself. There are thousands of books which are acquired by hundreds of libraries: exactly the same books, having the same titles, the same authors and contents, and subject to the same processes. But each library has been doing individually the whole work of cataloguing the copies received by it, putting out the whole expense. . . .

American instinct and habit revolt against multiplication of brain effort and outlay where a multiplication of results can be achieved by machinery. This appears to be a case where it may. Not every result, but results so great as to effect a prodigious saving to the libraries of this country. The Library of Congress cannot ignore the opportunity and the appeal. It is, as I have said, an opportunity unique, presented to no other national library. For in the United States alone are the library interests active in cooperative effort, urgent to "standardize" forms, methods and processes, and willing to make concession of individual preference and convenience in order to secure results of the greatest general benefit. . . .

A centralization of cataloguing work, with a corresponding centralization of bibliographic apparatus, has been for a quarter of a century an ambition of the librarians of the United States. It was a main purpose in the formation of the American Library Association in 1876. . . .

The economies effected to the libraries of the country might alone justify the maintenance expenses of the Library of Congress even without a single direct service to scholarship.
The country at large might indeed save great expense by purchasing a copy of a book merely to be catalogued at Washington, even if that copy should never go outside of the walls of the Library nor find a reader within it.

There are many difficulties of detail, and the whole project will fail unless there can be built up within the Library a comprehensive collection of books, and a corps of cataloguers and bibliographers adequate in number and representing in the highest degree (not merely in a usual degree, but in the highest degree) expert training and authoritative judgment. But the possible utilities are so great; they suggest so obvious, so concrete a return to the people of the United States for the money expended in the maintenance of this Library; and the service which they involve is so obviously appropriate a service for the National Library of the United States, that I communicate the project of this report as the most significant of our undertakings of this first year of the new century.5

Is it not time to realize Putnam's dream? Is not the day long gone when we can justify a host of alternatives to centrally produced bibliographic data? It is my conviction that there will be no justifiable computer operations in libraries until we realize that the computer is an instrument of standardization, not a device whereby we perpetuate the alteration of bibliographic data produced by a central source. The idea of a local cataloger examining LC-prepared data on a CRT terminal for editorial modification is economically unsupportable and managerially unwise. Yet there are still libraries which, even in their manual systems, alter 100 percent of the card sets they receive from the Library of Congress. The cost of performing such chores of questionable necessity is likely to be intolerable in a computer environment.
The aggregate of system resources spent on data management, Central Processing Unit cycles, Input/Output, channel time, and so forth, will be too great, and the computer's ability to do its own bookkeeping is relentless. Hence, it will be impossible to bury the cost of changing bibliographic data.

Perfectionism: Friend or Enemy?

Perfectionism and permanence are two interdependent fallacies of modern bibliography. Perfectionism is based upon the idea that the librarian is creating a permanent record. Unfortunately, even in the manual system this has never been true. Even the Library of Congress' Official Catalog changes substantially, the amount varying according to the age of the record and ranging from an estimated rate of about 5 percent in the first year of a record's life to an aggregate of about 40 percent of all records after thirty years.6 To prepare for future network applications it is essential that changes in the nation's bibliographic records be kept as consistent as possible, and this is achievable only by rigorous adherence to data centrally produced at a national bibliographic center, even if those data contain errors when issued. At least in this way, errors will be consistent, and they can be corrected later in a consistent way by the central distribution service.

The abandonment of perfectionism in bibliography needs to be established as a goal. (It need not be employed as an excuse for deliberate carelessness.) The future of a computerized update mechanism for bibliographic records should encourage libraries to make rapid inroads on arrearages now, without waiting until every bibliographic problem is solved with a score of 100. We may be approaching the first time in history when we can afford a few errors.

Another facet of the technical processing problem has been a traditional view, fortunately not shared by everyone, that all books are equal and must receive equal technical processing.
Just as we need to establish time priorities for processing, we need to make intellectual judgments concerning the quality, amount, and depth of bibliographic treatment to be given publications. Because such decisions are no longer irreversible, there is an opportunity for expedited processing and the preparation for public use of more books.

The idea of self-sufficiency in resources, i.e., exhaustive collection building, is dead. Self-sufficiency is a laudable heritage of the Protestant ethic, needed in eras of slower communication. High "budget visibility" of book funds has aided in the development of a variety of cooperative acquisition programs, based on the idea of building national rather than purely local resources. The costs of technical processing have not been so visible, but they are coming into sharper focus all the time. Costs now hidden in personnel and overhead are likely to be surfaced by the application of computer technology.

Applications to Technical Processing

There are two categories of work which can be substantially aided by computer applications.

First, we have a great mix of data-management activities: keyboarding, updating, deleting, sorting, printing, distributing, calculating, merging, filing, all dull and boring activities. It is difficult to recruit and train, and almost impossible to retain, staff for this kind of work. Rapid staff growth needed to accommodate recent large increases in publication output makes for very difficult management problems: supervision troubles, lack of employee satisfaction, high turnover, poor communication within the organization, and difficulty of following standard procedures.

Searching is the second category of technical-processing work which can be materially aided by computer applications.
Stanford has applied substantial effort to develop a capability for on-line searching, because we believe that in this area there can be a future payoff in public service when computer costs come down to the point where public terminals can be justified. Meantime, the paucity and rigidity of access points for searching card catalogs and in-process files makes searching for technical processing frustrating and much less productive than it should be.

Development in On-Line Search and Retrieval

Stanford has developed a search facility by which many users can search the same or different files simultaneously, just as one can do with the card catalog, but with these additional features which no card catalog can offer: (1) users can interact or negotiate with the files, expanding or contracting searches at will, even saving them for future reference if desired (saved searches can be run against new MARC tapes); (2) users can carry out coordinate searches; and (3) users can access any of several central files anywhere that there is a terminal. System response time can be kept reasonably short (a few seconds) because an inverted file structure searches index files which point to data base entries. In other words, no serial searching is employed.

With the aid of a grant from the Library and Information Science Branch of the Office of Education's Bureau of Research, Stanford is developing an on-line bibliographic control system dubbed SPIRES: Stanford Public Information REtrieval System. Acquisition and cataloging are the two chief areas of current research and application. However, it is well to mention that interactive searching is practical only on fairly large computer systems.
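
The inverted-file idea described in this section can be illustrated with a short sketch in present-day Python. It is not SPIRES code and makes no claim about how SPIRES was implemented: the sample records, the naive whitespace tokenization, and the names build_index, add_records, and coordinate_search are all hypothetical, chosen only to show an index file pointing to data base entries, a coordinate (Boolean AND) search, and a saved search rerun against newly loaded records.

```python
from collections import defaultdict


def add_records(index, records):
    """Post every word of every field to the index (word -> set of record ids)."""
    for rec_id, fields in records.items():
        for value in fields.values():
            for word in value.lower().split():  # naive tokenization (assumption)
                index[word].add(rec_id)
    return index


def build_index(records):
    """Build a fresh inverted index over a batch of records."""
    return add_records(defaultdict(set), records)


def coordinate_search(index, terms):
    """Return sorted ids of records containing every term (Boolean AND).

    Only the index is consulted; the records themselves are never scanned,
    which is what "no serial searching" means here.
    """
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical sample data, loosely modeled on titles cited in this paper.
records = {
    1: {"author": "Putnam", "title": "Report of the Librarian of Congress"},
    2: {"author": "Corbato", "title": "Sensitive Issues in the Design of Multi-use Systems"},
    3: {"author": "Miller", "title": "Hardware and Software in Library Applications"},
}
index = build_index(records)

# A "saved search" is just the list of terms, kept for later reuse.
saved_search = ["library", "software"]
hits = coordinate_search(index, saved_search)

# Later, a new batch arrives (for example, a new tape); the saved search is rerun.
new_batch = {4: {"author": "Example", "title": "Software for the Library Catalog"}}
add_records(index, new_batch)
new_hits = coordinate_search(index, saved_search)

print(hits, new_hits)
```

The design point is that each search costs a few set lookups and one intersection, regardless of how many records the file holds, while the cost of maintaining the index is paid when records are loaded.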
7

Requirements for On-Line Retrieval

An on-line search facility requires several things: (1) a large computer facility (Stanford's system uses a partition of an IBM 360/67); (2) software with built-in feedback features to facilitate system modification; (3) a large data base; (4) very large storage facilities; (5) a means of rapidly displaying search results, preferably by visual terminals; (6) a wideband communication network to transmit processed data to remote stations.

SPIRES software already provides its users with the capability of communicating their satisfaction or dissatisfaction to the system's designers. A large data base is obtainable through MARC, and an even larger one will be available through RECON (REtrospective CONversion), if the full RECON Project materializes. Really large storage facilities, enough to store even a million records locally, must await future, more economical devices, perhaps photodigital stores or laser beam recorders, such as the UNICON (Unidensity Coherent Light Recording). In terms of screen capacity, character set, and writing speed, visual displays are still quite costly and not yet truly satisfactory for bibliographic data. A wideband communication network means coaxial cable, which costs about $1.50 per installed foot.

Need for Collaborative Development

One of the first automation lessons librarians learned was the astronomical communications gap between computer people and librarians. We conclude that this gap must be reduced nearly to zero if the automation of library technical processing is to succeed. Three groups need to be brought together: the librarian, the computer expert, and the information scientist. The library can't do this job alone; in fact, none of these people acting alone is likely to succeed.

Expendable Equipment?

For many years we have been in an era of expendable software.
In fact, software investment commonly runs two to three times the cost of hardware. It is not unreasonable to expect that the future is likely to bring us quickly to an era of expendable hardware. The American economy already provides an outstanding precedent: the automobile is a piece of expendable hardware. Basically, hardware and software are no different. Some hardware, terminals in particular, may have a useful lifetime of only one or two years.

The Future of Books and Bibliographic Files

About ten years ago, the book began to come under some concerted attack as an inefficient means of storing and transmitting information. Despite the controversy surrounding this issue, one fact stands out: the book is still the cheapest to produce, the simplest and easiest-to-use device for information storage and retrieval. A 1969 article on the impact of the computer on publishing begins: "The most efficient information storage medium, by far, is the least sophisticated to produce-the printed page."8 In 1968, consumers spent $4 billion for broadcasting services and another $4 billion for consumer electronic products. Yet in the same year, the value of printed and published goods totaled $22 billion, of which $12 billion was for newspapers, books and periodicals, substantially more than the sum spent for nonprint communication media.

Looking ahead some distance in the future, I see a long life for the book. I see the retention of paper as a major medium of communicating data for acquisition processing; booksellers in developing countries (and even in some advanced ones) will continue to issue paper invoices, some written in a familiar illegible scrawl. I foresee continued lack of rationalization of the processing unit in book procurement (invoices, purchase orders, checks, etc.
), the factor responsible for the great amount of effort we face in distributing and redistributing data over media in reconciling our budget accounts and invoice documents. I do not see vast on-line bibliographic files in our major research libraries, except possibly at the Library of Congress and maybe at a few regional bibliographic service centers.

Rather I see the possibility that our entire concept of file organization will be restructured. A highly simplified model, which I hasten to add I have not costed, might look something like this: closest to the library user might be on-line access to current items in process and to those permanently held items known to be heavily in demand. Somewhat further away, in terms of ease of search and retrieval, might be book catalogs with relatively brief and simple entries supplemented by full bibliographic data in microfilm cartridges permanently arranged by sequence numbers in the form of a register. Such a master file could be centrally produced by computer output microfilm printers as a by-product of the MARC and RECON projects. This register would require virtually no updating; all the organization and maintenance would be confined to the book catalogs or on-line files, which would act as indexes to it. Even the book catalogs might be organized far differently from our present ones; some might be topical, others chronological. A microform register would be extremely cheap to duplicate and distribute. Hard copy of full bibliographic data could be easily obtained by conventional reader/printers.

Before any idealized file structure or service like this can be implemented, we need to know much more about our users than we now do. It is unlikely that we will reach this future by postulating great, all-embracing "total system designs," either conceived in ignorance of user requirements, or representing someone's pet idea.
The necessary research, experimentation and implementation should be dominated by two principles: (1) construction and testing of development models capable of self-change through user feedback, and (2) implementation of major functional modules one step at a time.

REFERENCES

1. Computers in Higher Education. Report of the President's Science Advisory Committee. Washington, The White House, February 1967, p. 1.
2. W. F. Miller, "Economic and Operating Realities of Present-Day Hardware and Software in Library Applications," Proceedings of the Stanford Conference on Collaborative Library Systems Development (Stanford: Stanford University Libraries, 1969), p. 145. (I am indebted to Professor Miller's paper for the section on large systems.)
3. F. J. Corbató, Sensitive Issues in the Design of Multi-use Systems (Cambridge: Massachusetts Institute of Technology, Project MAC, 1968), p. 17.
4. The MARC Pilot Project, Final Report (Washington: Library of Congress, 1968), p. 1.
5. Report of the Librarian of Congress for the Fiscal Year Ending June 30, 1901 (Washington: Government Printing Office, 1901), p. 28-37.
6. Conversion of Retrospective Catalog Records to Machine Readable Form (Washington: Library of Congress, 1968), p. 144-147.
7. A searching guide, the SPIRES Reference Manual, has been issued to explain the search facility; this publication is available on request. Further information on Stanford's automation program is contained in the Proceedings of the Stanford Conference on Collaborative Library Systems Development and in the first issue of the Project BALLOTS Quarterly Newsletter, which was distributed at the Atlantic City ALA Conference. The Proceedings are available for $7.00 (prepaid) from the Office of the Financial Manager, Stanford University Libraries, Stanford, Calif. 94305. Overseas orders are priced at $8.00 and must be prepaid in U.S. dollars.
8.
Ron Schneiderman, "Printers Seek Electronic Image," Electronic News, vol. 14 (April 21, 1969), 5.