College and Research Libraries

LOUIS KAPLAN

Information Retrieval From the Management Point of View

Several conclusions may now be drawn by management, based on results derived from several "laboratory" experiments in information retrieval. A major finding is that a controlled indexing language (controlled by an authority list of headings) will not provide more effective retrieval than will the uncontrolled type. Automatic indexing, using semantic and syntactic devices, does not improve upon the performance of a manual system. Increasing the number of subject entries per document (with or without computer) will increase the number of retrievals relevant to a question, but will at the same time disproportionately increase the number of nonrelevant references.

Mr. Kaplan is Director of Libraries, The Memorial Library, University of Wisconsin, Madison.

INTRODUCTION

A number of investigations conducted recently by documentalists have grave implications for those library administrators contemplating the development of a large-scale information system. In this paper some well-known experiments are discussed, and the results evaluated from a management point of view.

During the past few years a number of significant tests of information retrieval systems have been conducted, of which three are perhaps most important to librarians: the work by Cleverdon and his associates at the College of Aeronautics in Cranfield, England; by Saracevic, Rees, and others at the Center for Documentation and Communication Research at Case Western Reserve University; and by Salton and his co-workers in the Department of Computer Sciences at Cornell. These information scientists have indisputably advanced our understanding of information retrieval; on the other hand, their efforts to optimize retrieval have not met with undivided success.
Furthermore, from the library management point of view, the depth of indexing employed, the construction of thesauri, and the sophisticated devices introduced seem terribly expensive. Nevertheless, it would be a mistake for librarians to ignore the implications of the work done by these information scientists.

BRIEF DESCRIPTION OF THE TESTS UNDER DISCUSSION

1. The Cranfield tests. The Cranfield tests emphasized the significance of language devices which influence recall and precision, such as roles, links, interfixing, partitioning; also studied was the influence of the number of coordinate terms in a search question and the depth of indexing.1

Three indexing languages were tested: single-terms, concepts, and a controlled language, all in the subject field of aerodynamics. With each language several recall devices were tested, and for each of the languages several precision devices were used, including coordination.

College & Research Libraries • May 1970

2. The Case Western Reserve tests. Several indexing languages were tested by Saracevic and his team.2 Those that need be referred to in this context are: (a) keywords assigned by indexers (that is, in the language of the text) and (b) a language based on the so-called "telegraphic abstracts" (a language employing a number of formal recall and precision devices).

The tests conducted at Case Western Reserve University emphasized the influence of the manipulation of search questions. Depth of indexing was tested by treating full texts, abstracts, and titles as independent variables. A third major variable was the indexing languages.

3. SMART.
The SMART system (originally established at Harvard, now at Cornell) is described in a recent text by Gerard Salton and in a number of reports entitled Information Retrieval System, coming most recently from the Department of Computer Sciences at Cornell.3 Unlike MEDLARS, where machine manipulation follows manual indexing, SMART indexing depends as well upon machine manipulation of the documents prior to the actual retrieval process. Each search question and each document is manipulated from the viewpoint of word and phrase frequency and from the viewpoint of establishing, by frequency studies, clusters of related documents.

In addition, dictionaries are provided to reduce the variety of words by compounding stems and suffixes; for example, one dictionary makes it possible to recognize the singular and the plural of a word as a single term, and words such as economize, economical, economies are also gathered up as a single term. Semantic relationships are established by means of a dictionary of synonyms, and the hierarchical relationships are established in a classified system. The syntactic relationship between phrases is controlled by phrase dictionaries, for example, library schools and schools of librarianship.

The emphasis in SMART, then, is on the influence of these dictionaries on the document search and in the manipulation of the search questions. These dictionaries are studied independently and also with respect to their cumulative effect. Thus the SMART system identifies the single best dictionary, as well as those which in combination prove most efficient with respect to recall and precision.

RESULTS OF TESTS

The inverse relationships of recall and precision. There is general agreement that there is usually an inverse relationship between recall and precision, that is, while recall can be raised to 100 percent, the cost in the number of nonrelevant documents retrieved is great.
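The two ratios can be made concrete with a minimal sketch. The function name, the document identifiers, and all counts below are hypothetical and are not drawn from any of the tests under discussion; the sketch assumes only the usual definitions, recall being the share of relevant documents that are retrieved and precision the share of retrieved documents that are relevant.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for a single search.

    retrieved: set of document ids returned by the search
    relevant:  set of document ids judged relevant to the question
    """
    hits = retrieved & relevant  # relevant documents actually retrieved
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical collection: documents 0-9 are the relevant ones.
relevant = set(range(10))

# A narrow search: 5 documents retrieved, 4 of them relevant.
narrow = {0, 1, 2, 3, 50}

# A broad search: all 10 relevant documents plus 30 nonrelevant ones.
broad = set(range(40))

print(recall_precision(narrow, relevant))  # (0.4, 0.8)
print(recall_precision(broad, relevant))   # (1.0, 0.25)
```

In this hypothetical case the broad search reaches full recall only by admitting thirty nonrelevant documents, so that precision falls from 0.8 to 0.25.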
The nearer one approaches 100 percent recall, the greater proportionately is the drop in precision.

Automatic indexing. Using SMART methods Salton came to the conclusion that, "Fully automatic text analysis and search systems do not appear to produce a retrieval performance which is inferior to that obtained by conventional systems using manual document indexing and manual search formulations."

Precision and recall devices. Precision devices, except for coordination, proved of little value. Of the various recall devices, the use of synonyms proved significant, while the hierarchical (classified) device proved less effective than had been supposed. At Case Western the use of role indicators proved to be significant only when the full text was available to the indexers; with abstracts, role indicators and other retrieval devices were not superior. At Cranfield, the controlled language performance was not improved by manipulating it hierarchically.

At Cranfield a surprising outcome was the realization that the uncontrolled single-term natural language of the text was little improved by most recall or precision devices. At Cornell, it was found that the cumulative effect of all the dictionaries was more effective than any lesser combination.

In summary, in any system a significant recall device is the dictionary of synonyms, but the hierarchical element is not of major significance. Coordination is a powerful retrieval procedure.

Controlled languages. At Cranfield, a rank order of thirty-three indexing languages and devices was published, indicating their power of recall. The top seven languages were all uncontrolled. The best controlled language ranked tenth; its recall ratio was 61 percent compared to 65 percent for the best of the uncontrolled languages. The statistical difference between them is regarded as significant.

SOME OBSERVATIONS FROM THE MANAGERIAL POINT OF VIEW

Cost factors.
Information scientists have not seriously attacked the question of the cost of the various indexing languages.4 It would appear, given the emphasis placed on the indexing languages at Cranfield and the search strategy at Western Reserve, that a number of those engaged in the testing were probably well acquainted with the subject matter of the tests. Despite this, Saracevic reported that the single greatest and most important variable was the quality of the indexing. A study of MEDLARS failures shows that with respect to recall, 72 percent of the failures can be attributed to faulty indexing or to faulty search strategy, while with respect to precision the number attributable to these two factors was 45 percent.

From these bits of evidence the relative insignificance of the indexing system and language, compared to the indexing itself, and the imaginativeness of the search strategy, rises to haunt us. Furthermore, realizing that automatic indexing is not now superior to manual indexing, and guessing at the cost of this kind of indexing, the prospects are anything but bright.

Depth of indexing. Also significant is the considerable depth of indexing employed in these tests, depth considerably greater than is provided by conventional subject catalogers. At Western Reserve, the number of indexing terms extracted from the full text ranged from thirty-six to forty, while twenty-three to thirty were taken from the abstracts.

The significance of the depth of indexing can be seen in the statistics supplied by Cranfield in tests run on the single-term, natural language indexing language: with fourteen index terms, the recall ratio was 62.8; twenty-two terms produced a ratio of 63.5; and thirty-three terms produced a ratio of 65. However, there is a law of diminishing returns with respect to the depth of indexing. When an average of sixty terms were taken from abstracts, the recall ratio dropped to 60.9.

Automatic indexing.
Turning to automatic indexing, of considerable significance from the managerial point of view is the fact that the intellectual effort required is considerable and of great significance with respect to the results. In the absence of a good dictionary of synonyms, the results can be disappointing, while the time required to compose a dictionary is an imposing consideration, as Salton has noted.

On the average, using all the devices available, SMART performs as follows:

Recall Ratio    Precision Ratio
10              85-95
50              60-80
100             30-45

As Salton himself has admitted, these are not satisfactory levels of performance.

Coordinate indexing. The first Cranfield study (1962) tested four indexing systems, of which one was a coordinate system, best known as Uniterms. As summarized by Cleverdon, "It achieved the best overall figures in the test, it presented no difficulties for the technical searchers ... and was notably successful with short indexing times. It appears to have as good a relevance figure as any other system."

Nevertheless, the Cranfield group refuses to concede any natural advantage to Uniterms (a "post-coordinate" system) over the others tested (the "pre-coordinate" types). The capability of retrieving any combination of terms is a feature of a post-coordinate system, yet "the results of the investigation show that this advantage, though it existed, was not large." Also: "the difference between the two types of system is therefore shown to be not a fundamental difference but merely one of cost or convenience, and it has not been proven as yet on which side the advantage lies."5

It should be made clear in this connection that the Uniterm index system tested at Cranfield was devoid of various precision devices which are a feature of other coordinate indexing systems (such as the metallurgical index at Case Western Reserve).
In the presence of such precision devices, the recall ratio found at Cranfield presumably would have been lowered.

The argument has been made that a Uniterm system will break down if used with a large collection of documents.6 Cleverdon disputes this, though neither disputant can argue from experience. Still another theoretical argument against the Uniterm system is that it might prove less effective with social science and humanistic materials than with materials in the natural sciences.

Computer manipulation of a manual-based system. Such a system is MEDLARS; it is not an automatic system in the sense of the SMART system. In the MEDLARS system, other than the machine search itself, the indexing operations are performed manually. The MEDLARS system, on the average, provides the user with about 60 percent of the relevant documents in the collection, but of the total documents retrieved, about 50 percent will not be relevant.

It is widely believed that computer manipulation when applied to a controlled indexing language will greatly improve its efficiency. This is not true; even if more subject terms per document are posted, the overall efficiency of a controlled indexing language will not be significantly improved by computer manipulation, assuming that improvement of the recall factor alone is not enough.

This raises a perplexing question. Are all our users equally allergic to an increase in the number of nonrelevant documents, given an increase in the number of relevant ones? For example, in this regard are historians to be equated with chemists? With economists?

Another perplexing question is this: librarians suspect that scholars do not use the subject catalog extensively, and most often use it with respect to subjects outside their own speciality. Is this mainly because the subject catalog is inadequate, or because their more urgent retrieval needs lie in nonmonographic documents not now indexed in our subject catalogs?

Search strategy.
Also of importance is the amount of manipulation of questions (commonly termed search strategy) that took place in these experiments. In university libraries few questions are manipulated to the extent that took place in the tests under discussion. In the Cranfield tests and at Case Western the manipulation of the questions was extensive. At Case Western each question was searched in four different ways, namely: (1) the searchable terms found in the question itself; (2) to (1) are added terms taken from a thesaurus; (3) to (1) are added terms taken from encyclopedias and sources other than the thesaurus; (4) a combination of (2) and (3).

The considerable influence of these four manipulations can be seen in the number of relevant and nonrelevant documents retrieved:

                   (1)    (2)    (3)    (4)
Relevant           106    130    180    192
Nonrelevant        124    197    509    598
Recall Ratio       .43    .52    .72    .77
Precision Ratio    .54    .48    .34    .33

At Cornell, various semantic and syntactic procedures are applied both to the questions and to the documents; to put it otherwise, the heart of the SMART system is the correlation coefficients by which terms in the question are matched with terms from the documents.

Except in libraries serving a small group of users, in a manual setting this kind of question manipulation will not be possible unless highly skilled librarians in considerable numbers are employed. In the automatic system the manipulation of the questions is mandatory.

Whether the costs of sophisticated information systems can be justified in either the manual or the automatic mode remains to be seen. At the moment we have no idea what costs would be incurred by systems such as the SMART system in the setting of a large library with a large number of scholars engaged in research. As for making the system available to undergraduates, this involves an entirely different order of cost magnitude.

REFERENCES

1. Cyril W.
Cleverdon, Factors Determining the Performance of Indexing Systems (Cranfield, England: College of Aeronautics, 1966), 2v.
2. Tefko Saracevic, An Inquiry into Testing of Information Retrieval Systems (Cleveland, Ohio: Center for Documentation and Communication Research, Case Western Reserve Univ., 1968).
3. Gerard Salton, Automatic Information Organization and Retrieval (New York: McGraw-Hill, 1968).
4. Frank B. Rogers, "Costs of Operating an Information Retrieval Service," Drexel Library Quarterly 4:271-78 (Oct. 1968).
5. Cyril W. Cleverdon, Report on the Testing and Analysis of . . . Indexing Systems (Cranfield, England: College of Aeronautics, 1962), pp.101-02.
6. Arthur D. Little, Inc., Centralization and Documentation. Final Report to the National Science Foundation. 2d ed. (Cambridge, Mass.: 1964).
7. Cleverdon, Report.