College and Research Libraries


LOUIS KAPLAN

Information Retrieval

From the Management Point of View 

Several conclusions may now be drawn by management, based on 
results derived from several "laboratory" experiments in information 
retrieval. A major finding is that a controlled indexing language (con-
trolled by an authority list of headings) will not provide more ef-
fective retrieval than will the uncontrolled type. Automatic indexing, 
using semantic and syntactic devices, does not improve upon the 
performance of a manual system. Increasing the number of subject 
entries per document (with or without computer) will increase the 
number of retrievals relevant to a question, but will at the same time 
disproportionately increase the number of nonrelevant references. 

INTRODUCTION 

A NUMBER OF INVESTIGATIONS conduct-
ed recently by documentalists have 
grave implications for those library ad-
ministrators contemplating the develop-
ment of a large-scale information sys-
tem. In this paper some well-known ex-
periments are discussed, and the results 
evaluated from a management point of 
view. 

During the past few years a number
of significant tests of information retriev-
al systems have been conducted, of 
which three are perhaps most important 
to librarians: the work by Cleverdon 
and his associates at the College of Aer-
onautics in Cranfield, England; by Sar-
acevic, Rees, and others at the Center 
for Documentation and Communication 
Research at Case Western Reserve Uni-
versity; and by Salton and his co-work-
ers in the Department of Computer Sci-
ences at Cornell. These information sci-
entists have indisputably advanced our 

Mr. Kaplan is Director of Libraries, The
Memorial Library, University of Wiscon-
sin, Madison.

understanding of information retrieval; 
on the other hand, their efforts to op-
timize retrieval have not met with undi-
vided success. Furthermore, from the li-
brary management point of view, the 
depth of indexing employed, the con-
struction of thesauri, and the sophisti-
cated devices introduced seem terribly 
expensive. Nevertheless, it would be a 
mistake for librarians to ignore the im-
plications of the work done by these in-
formation scientists. 

BRIEF DESCRIPTION OF THE TESTS
UNDER DISCUSSION 

1. The Cranfield tests. The Cranfield 
tests emphasized the significance of lan-
guage devices which influence recall 
and precision, such as roles, links, inter-
fixing, partitioning; also studied was the 
influence of the number of coordinate 
terms in a search question and the depth
of indexing.1 

Three indexing languages were test-
ed: single-terms, concepts, and a con-
trolled language, all in the subject field 
of aerodynamics. With each language 
several recall devices were tested, and 



170 j College & Research Libraries • May 1970 

for each of the languages several preci-
sion devices were used, including coor-
dination. 

2. The Case Western Reserve tests. 
Several indexing languages were tested 
by Saracevic and his team.2 Those that
need be referred to in this context are:
(a) keywords assigned by indexers
(that is, in the language of the text) and
(b) a language based on the so-called
"telegraphic abstracts" (a language em-
ploying a number of formal recall and
precision devices).

The tests conducted at Case Western 
Reserve University emphasized the in-
fluence of the manipulation of search 
questions. Depth of indexing was tested 
by treating full texts, abstracts, and ti-
tles as independent variables. A third 
major variable was the indexing lan-
guages. 

3. SMART. The SMART system 
(originally established at Harvard, now
at Cornell) is described in a recent text 
by Gerard Salton and in a number of re-
ports entitled Information Retrieval Sys-
tem, coming most recently from the De-
partment of Computer Sciences at Cor-
nell.3 Unlike MEDLARS, where ma-
chine manipulation follows manual in-
dexing, SMART indexing depends as 
well upon machine manipulation of the 
documents prior to the actual retrieval 
process. Each search question and each 
document is manipulated from the view-
point of word and phrase frequency and 
from the viewpoint of establishing, by 
frequency studies, clusters of related 
documents. 
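
The frequency-based matching of question and document described above can be sketched as a correlation between two term-frequency vectors. What follows is only a minimal illustration: the cosine measure, the whitespace tokenizer, the function names, and the sample texts are all assumptions made here, not details taken from the SMART reports.

```python
import math
from collections import Counter


def term_vector(text):
    """Count word occurrences in lowercased, whitespace-split text."""
    return Counter(text.lower().split())


def cosine_correlation(query_vec, doc_vec):
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    shared = set(query_vec) & set(doc_vec)
    dot = sum(query_vec[t] * doc_vec[t] for t in shared)
    norm_q = math.sqrt(sum(c * c for c in query_vec.values()))
    norm_d = math.sqrt(sum(c * c for c in doc_vec.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)


def rank_documents(question, documents):
    """Return (doc_id, score) pairs, best match first."""
    q = term_vector(question)
    scored = [(doc_id, cosine_correlation(q, term_vector(text)))
              for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical two-document collection.
documents = {
    "a": "wing flutter in supersonic flow",
    "b": "library budget planning",
}
ranking = rank_documents("supersonic wing flutter", documents)
```

In this sketch every document receives a score, so the output is a ranked list rather than a yes-or-no match; the searcher decides how far down the list to read.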

In addition, dictionaries are provided 
to reduce the variety of words by com-
pounding stems and suffixes; for exam-
ple, one dictionary makes it possible to 
recognize the singular and the plural of 
a word as a single term, and words such 
as economize, economical, economies 
are also gathered up as a single term. 
Semantic relationships are established 
by means of a dictionary of synonyms, 

and the hierarchical relationships are es-
tablished in a classified system. The syn-
tactic relationship between phrases is 
controlled by phrase dictionaries, for ex-
ample, library schools and schools of li-
brarianship. The emphasis in SMART, 
then, is on the influence of these dic-
tionaries on the document search and in 
the manipulation of the search ques-
tions. These dictionaries are studied in-
dependently and also with respect to 
their cumulative effect. Thus the 
SMART system identifies the single best 
dictionary, as well as those which in 
combination prove most efficient with 
respect to recall and precision. 
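
A rough sketch may make the role of the dictionaries clearer. It assumes two small hand-made tables, one for stems and one for synonyms, and lets each be switched on or off so that its effect can be studied alone or in combination. The entries, the concept label THRIFT, and the function name are hypothetical; the actual SMART dictionaries were far larger and are not reproduced here.

```python
# Hypothetical stem dictionary: word form -> common stem.
STEMS = {
    "economize": "econom",
    "economical": "econom",
    "economies": "econom",
    "schools": "school",
}

# Hypothetical synonym dictionary: stem or word -> concept class.
SYNONYMS = {
    "econom": "THRIFT",
    "saving": "THRIFT",
    "thrift": "THRIFT",
}


def normalize(words, use_stems=True, use_synonyms=True):
    """Reduce a list of words to a set of index terms.

    Each dictionary can be disabled so that its contribution
    can be measured separately or cumulatively.
    """
    terms = set()
    for word in words:
        term = word.lower()
        if use_stems:
            term = STEMS.get(term, term)
        if use_synonyms:
            term = SYNONYMS.get(term, term)
        terms.add(term)
    return terms


# Stem dictionary alone: three word forms collapse to one term.
stems_only = normalize(["economize", "economical", "economies"],
                       use_synonyms=False)

# Both dictionaries: an unrelated word form joins the same concept.
both = normalize(["Economies", "saving"])
```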

RESULTS OF TESTS 

The inverse relationships of recall and 
precision. There is general agreement 
that there is usually an inverse relation-
ship between recall and precision, that 
is, while recall can be raised to 100 per-
cent, the cost in the number of nonrele-
vant documents retrieved is great. The 
nearer one approaches 100 percent re-
call, the greater proportionately is the 
drop in precision. 
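
The two measures can be made concrete with a short sketch. It assumes the usual definitions: recall is the proportion of the relevant documents that are retrieved, and precision is the proportion of the retrieved documents that are relevant. The function name and the sample document numbers are illustrative and are not taken from any of the tests.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for one search, each between 0.0 and 1.0."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical collection in which documents 1 to 4 are relevant.
relevant = {1, 2, 3, 4}

# A narrow search finds half the relevant documents and nothing else:
# recall 0.5, precision 1.0.
narrow = recall_and_precision({1, 2}, relevant)

# A broad search finds all four, but only among twenty retrieved:
# recall 1.0, precision 0.2.
broad = recall_and_precision(range(1, 21), relevant)
```

The broad search doubles recall while cutting precision to a fifth, which is the inverse relationship in miniature.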

Automatic indexing. Using SMART 
methods Salton came to the conclusion 
that, "Fully automatic text analysis and 
search systems do not appear to produce 
a retrieval performance which is inferior 
to that obtained by conventional systems 
using manual document indexing and 
manual search formulations." 

Precision and recall devices. Precision 
devices, except for coordination, proved 
of little value. Of the various recall de-
vices, the use of synonyms proved sig-
nificant, while the hierarchical (classi-
fied) device proved less effective than had
been supposed. At Case Western the
use of role indicators proved to be sig-
nificant only when the full text was 
available to the indexers; with abstracts, 
role indicators and other retrieval de-
vices were not superior. At Cranfield, 
the controlled language performance
was not improved by manipulating it
hierarchically.

At Cranfield a surprising outcome 
was the realization that the uncon-
trolled single term natural language of 
the text was little improved by most re-
call or precision devices. At Cornell, it 
was found that the cumulative effect of 
all the dictionaries was more effective 
than any lesser combination. 

In summary, in any system a signifi-
cant recall device is the dictionary of 
synonyms, but the hierarchical element 
is not of major significance. Coordina-
tion is a powerful retrieval procedure. 

Controlled languages. At Cranfield, a 
rank order of thirty-three indexing lan-
guages and devices was published, in-
dicating their power of recall. The top 
seven languages were all uncontrolled. 
The best controlled language ranked 
tenth; its recall ratio was 61 percent 
compared to 65 percent for the best of 
the uncontrolled languages. The statis-
tical difference between them is regard-
ed as significant. 

SOME OBSERVATIONS FROM THE
MANAGERIAL POINT OF VIEW

Cost factors. Information scientists 
have not seriously attacked the question 
of the cost of the various indexing lan-
guages.4 It would appear, given the em-
phasis placed on the indexing languages 
at Cranfield and the search strategy at 
Western Reserve, that a number of those 
engaged in the testing were probably 
well acquainted with the subject matter 
of the tests. Despite this, Saracevic re-
ported that the single greatest and most 
important variable was the quality of 
the indexing. A study of MEDLARS fail-
ures shows that with respect to recall, 72
percent of the failures can be attributed
to faulty indexing or to faulty search
strategy, while with respect to precision
the number attributable to these two
factors was 45 percent. From these bits
of evidence the relative insignificance of 


the indexing system and language, com-
pared to the indexing itself, and the 
imaginativeness of the search strategy, 
rises to haunt us. Furthermore, realizing 
that automatic indexing is not now su-
perior to manual indexing, and guessing 
at the cost of this kind of indexing, the 
prospects are anything but bright. 

Depth of indexing. Also significant is 
the considerable depth of indexing em-
ployed in these tests, depth considerably 
greater than is provided by conventional 
subject catalogers. At Western Reserve, 
the number of indexing terms extracted
from the full text ranged from thirty-six 
to forty, while twenty-three to thirty 
were taken from the abstracts. 

The significance of the depth of in-
dexing can be seen in the statistics sup-
plied by Cranfield in tests run on the 
single-term, natural language indexing 
language: with fourteen index terms, the 
recall ratio was 62.8; twenty-two terms 
produced a ratio of 63.5; and thirty-three 
terms produced a ratio of 65. However, 
there is a law of diminishing returns 
with respect to the depth of indexing. 
When an average of sixty terms were 
taken from abstracts, the recall ratio 
dropped to 60.9. 

Automatic indexing. Turning to auto-
matic indexing, what matters most from
the managerial point of view is that the
intellectual effort required is consider-
able and bears heavily on the results. In
the absence of a good dictionary of syn-
onyms, the results can be disappointing,
while the time required to compose a
dictionary is an imposing consideration,
as Salton has noted.

On the average, using all the devices 
available, SMART performs as follows: 

Recall Ratio    Precision Ratio
10              85-95
50              60-80
100             30-45



As Salton himself has admitted, these 
are not satisfactory levels of perform-
ance. 

Coordinate indexing. The first Cranfield
study (1962) tested four indexing
systems, of which one was a coordinate 
system, best known as Uniterms. As 
summarized by Cleverdon, "It achieved 
the best overall figures in the test, it pre-
sented no difficulties for the technical 
searchers ... and was notably successful 
with short indexing times. It appears to 
have as good a relevance figure as any 
other system." 

Nevertheless, the Cranfield group re-
fuses to concede any natural advantage 
to Uniterms (a "post-coordinate" system)
over the others tested (the "pre-coordinate"
types). The capability of retrieving
any combination of terms is a
feature of a post-coordinate system, yet
"the results of the investigation show
that this advantage, though it existed, 
was not large." Also: "the difference be-
tween the two types of system is there-
fore shown to be not a fundamental dif-
ference but merely one of cost or con-
venience, and it has not been proven as 
yet on which side the advantage lies."5 
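
The post-coordinate principle can be sketched in a few lines: each document is posted under single terms at indexing time, and terms are combined only at search time. The data structure, function names, and sample postings below are assumptions made for illustration; they do not describe the Uniterm files actually used at Cranfield.

```python
from collections import defaultdict


def build_index(documents):
    """Post each document number under every single term assigned to it."""
    index = defaultdict(set)
    for doc_id, terms in documents.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def coordinate_search(index, terms):
    """Return the documents posted under all of the given terms."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical postings for three documents.
documents = {
    1: {"wing", "flutter"},
    2: {"wing", "drag"},
    3: {"flutter", "panel"},
}
index = build_index(documents)

one_term = coordinate_search(index, ["wing"])              # documents 1 and 2
two_terms = coordinate_search(index, ["wing", "flutter"])  # document 1 only
```

Any combination of terms can be asked for without having been foreseen by the indexer, and each term added to the coordination narrows the result.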

It should be made clear in this con-
nection that the Uniterm index system 
tested at Cranfield was devoid of vari-
ous precision devices which are a fea-
ture of other coordinate indexing sys-
tems (such as the metallurgical index at 
Case Western Reserve). In the presence 
of such precision devices, the recall ratio 
found at Cranfield presumably would 
have been lowered. 

The argument has been made that a 
Uniterm system will break down if used 
with a large collection of documents.6
Cleverdon disputes this, though neither
disputant can argue from experience.
Still another theoretical argument
against the Uniterm system is that it
might prove less effective with social 
science and humanistic materials than 
with materials in the natural sciences. 

Computer manipulation of a manual-
based system. Such a system is 
MEDLARS; it is not an automatic sys-
tem in the sense of the SMART system. 
In the MEDLARS system, other than 
the machine search itself, the indexing 
operations are performed manually. The 
MEDLARS system, on the average, pro-
vides the user with about 60 percent of 
the relevant documents in the collec-
tion, but of the total documents retrieved, 
about 50 percent will not be relevant. 
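
A small worked example shows what these averages mean for one search. It assumes a question for which the collection holds 100 relevant documents; that figure and the function name are hypothetical, and only the two ratios come from the text.

```python
def expected_output(relevant_in_collection, recall, precision):
    """Translate recall and precision ratios into document counts."""
    relevant_retrieved = relevant_in_collection * recall
    total_retrieved = relevant_retrieved / precision
    return {
        "relevant_retrieved": relevant_retrieved,
        "nonrelevant_retrieved": total_retrieved - relevant_retrieved,
        "relevant_missed": relevant_in_collection - relevant_retrieved,
    }


# About 60 percent recall and about 50 percent precision, as cited above.
result = expected_output(100, recall=0.60, precision=0.50)
```

Under these assumptions the user receives about 60 relevant documents, must also examine about 60 nonrelevant ones, and never sees the remaining 40 or so relevant documents.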

It is widely believed that computer 
manipulation when applied to a con-
trolled indexing language will greatly 
improve its efficiency. This is not true; 
even if more subject terms per docu-
ment are posted, the overall efficiency of 
a controlled indexing language will not 
be significantly improved by computer 
manipulation, assuming that improve-
ment of the recall factor alone is not 
enough. 

This raises a perplexing question. Are 
all our users equally allergic to an in-
crease in the number of nonrelevant 
documents, given an increase in the 
number of relevant ones? For example, 
in this regard are historians to be equat-
ed with chemists? With economists? 

Another perplexing question is this: 
librarians suspect that scholars do not 
use the subject catalog extensively, and 
most often use it with respect to subjects 
outside their own speciality. Is this 
mainly because the subject catalog is 
inadequate, or because their more ur-
gent retrieval needs lie in nonmono-
graphic documents not now indexed in 
our subject catalogs? 

Search strategy. Also of importance is 
the amount of manipulation of ques-
tions (commonly termed search strate-
gy) that took place in these experi-
ments. In university libraries few ques-
tions are manipulated to the extent that 
took place in the tests under discussion. 
In the Cranfield tests and at Case West-
ern the manipulation of the questions 


was extensive. At Case Western each 
question was searched in four different 
ways, namely: (1) the searchable terms
found in the question itself; (2) to (1)
are added terms taken from a thesaurus;
(3) to (1) are added terms taken from
encyclopedias and sources other than
the thesaurus; (4) a combination of (2)
and (3).
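
The four formulations can be pictured as successively broader sets of search terms. The sketch below only assembles the term sets; how the terms were then matched against the index is not stated here, and the function name and sample terms are hypothetical.

```python
def search_formulations(question_terms, thesaurus_terms, other_terms):
    """Build the four term sets, keyed by formulation number."""
    base = set(question_terms)
    return {
        1: base,                                             # question only
        2: base | set(thesaurus_terms),                      # plus thesaurus
        3: base | set(other_terms),                          # plus other sources
        4: base | set(thesaurus_terms) | set(other_terms),   # (2) and (3) combined
    }


# Hypothetical question and expansion terms.
formulations = search_formulations(
    question_terms=["corrosion", "steel"],
    thesaurus_terms=["oxidation"],
    other_terms=["rust", "pitting"],
)
sizes = {number: len(terms) for number, terms in formulations.items()}
```

Each formulation contains the first, so a broader formulation can be expected to retrieve more documents, relevant and nonrelevant alike.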

The considerable influence of these 
four manipulations can be seen in the 
number of relevant and nonrelevant 
documents retrieved: 

                  (1)    (2)    (3)    (4)
Relevant          106    130    180    192
Nonrelevant       124    197    509    598
Recall Ratio      .43    .52    .72    .77
Precision Ratio   .54    .48    .34    .33

At Cornell, various semantic and syn-
tactic procedures are applied both to
the questions and to the documents; to
put it otherwise, the heart of the
SMART system is the correlation coef-
ficients by which terms in the question
are matched with terms from the docu-
ments.

Except in libraries serving a small
group of users, in a manual setting this
kind of question manipulation will not
be possible unless highly skilled librari-
ans in considerable numbers are em-
ployed. In the automatic system the
manipulation of the questions is manda-
tory.

Whether the costs of sophisticated in-
formation systems can be justified in ei-
ther the manual or the automatic mode
remains to be seen. At the moment we
have no idea what costs would be in-
curred by systems such as the SMART
system in the setting of a large library
with a large number of scholars engaged
in research. As for making the system
available to undergraduates, this in-
volves an entirely different order of cost
magnitude.

REFERENCES 

1. Cyril W. Cleverdon, Factors Determin-
ing the Performance of Indexing Sys-
tems (Cranfield, England: College of 
Aeronautics, 1966), 2v. 

2. Tefko Saracevic, An Inquiry into Testing
of Information Retrieval Systems
(Cleveland, Ohio: Center for Docu-
mentation and Communication Re-
search, Case Western Reserve Univ., 
1968). 

3. Gerard Salton, Automatic Information
Organization and Retrieval (New York: 
McGraw-Hill, 1968). 

4. Frank B. Rogers, "Costs of Operating an 
Information Retrieval Service," Drexel 
Library Quarterly 4:271-78 (Oct.
1968). 

5. Cyril W. Cleverdon, Report on the 
Testing and Analysis of . . . Indexing 
Systems (Cranfield, England: College 
of Aeronautics, 1962), pp.101-02. 

6. Arthur D. Little, Inc., Centralization 
and Documentation. Final Report to the 
National Science Foundation. 2d ed. 
(Cambridge, Mass.: 1964).

7. Cleverdon, Report.