A TRUNCATED SEARCH KEY TITLE INDEX 

Philip L. LONG: Head, Automated Systems Research and Development 
and Frederick G. KILGOUR: Director, Ohio College Library Center, 
Columbus, Ohio. 

An experiment showing that 3,1,1,1 search keys derived from titles are
sufficiently specific to be an efficient computerized, interactive index to 
a file of 135,938 MARC II records. 

This paper reports the findings of an experiment undertaken to design a 
title index to entries in the Ohio College Library Center's on-line shared 
cataloging system. Several large libraries participating in the center
requested a title index because experience in those libraries had shown that
the staff could locate entries in files more readily by title than by author 
and title. Users of large author-title catalogs have long been aware of great 
difficulties in finding entries in such catalogs. Since the center's computer 
program for producing an author-title index could be readily adapted to 
produce a title index, it was decided to add title access to the system. 

A previous paper has shown that truncated three-letter search keys 
derived from the first two words of a title are less specific than
author-title keys (1). Earlier work had revealed that addition of only the
first letter of another word in a title improved specificity (2). Therefore, the
experiment was designed to test the specificity of keys consisting of the 
first three characters of the first non-English-article word of the title plus 
the first letter of a variable number of consecutive words. 

The experiment was also designed to produce an index that catalogers 
could use efficiently and that would operate efficiently in the computer 
system. It was assumed that the terminal user would have in hand the 
volume for which an entry was to be sought in the on-line catalog. The 
index was not to be designed for use by library users; subsequent
experiments will be done to design an index for nonlibrarian users.

Other investigations into computerized, derived-key title indexes include 


18 Journal of Library Automation Vol. 5/1 March, 1972 

the previous paper in this series to which reference has already been made
(1) and development of a title index in Stanford's BALLOTS system (3).
Although Stanford has not published results observed from experiment 
or experience that describe the retrieval specificity of its technique, it is 
clear that the Stanford procedure is not only more powerful than the one 
described in this paper but also more adaptable for user employment. The 
Stanford index is probably less efficient. 

MATERIALS AND METHODS 

A file of 135,938 MARC II records was used in this experiment. This 
file contains title-only and name-title entries, and keys were derived from 
titles in both types of entries. A key was extracted consisting of the first 
three characters of the first non-English-article word of each title plus 
the first character of each following word up to four. If there were fewer 
than four additional words, the key was left-justified, with trailing blank 
fill. Only alphabetic and numeric characters were used in key derivation; 
alphabetic characters were forced to uppercase. All other characters were 
eliminated and the space occupied by an eliminated character was closed 
up before the key was derived. A total of 115,623 distinct keys was derived 
from the 135,938 entries. 
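
The derivation rule just described can be sketched in a few lines of
Python. This is an illustration only, not the center's program: the
function name derive_key, the list of English articles, the restriction
to unaccented ASCII letters and digits, and the padding of a first word
shorter than three characters are all assumptions made for the example.

```python
import re

# Assumed list of English articles skipped at the start of a title.
ENGLISH_ARTICLES = {"A", "AN", "THE"}


def derive_key(title, single_chars=4):
    """Return a fixed-width 3,1,1,... key as a string of 3 + single_chars characters."""
    # Keep only letters, digits, and whitespace; deleting any other character
    # closes up the space it occupied. Then force uppercase.
    cleaned = re.sub(r"[^A-Za-z0-9\s]", "", title).upper()
    words = cleaned.split()
    # Skip leading English articles so the key starts at the first other word.
    while words and words[0] in ENGLISH_ARTICLES:
        words = words[1:]
    if not words:
        return " " * (3 + single_chars)
    # First three characters of the first word (blank-padded if shorter:
    # an assumption, since the paper does not cover this case).
    head = words[0][:3].ljust(3)
    # First character of each following word, up to single_chars words.
    tail = "".join(word[0] for word in words[1:1 + single_chars])
    # Left-justify with trailing blank fill.
    return (head + tail).ljust(3 + single_chars)


print(repr(derive_key("The Heritage of the English Library")))  # 'HEROTEL'
print(repr(derive_key("Proceedings")))                          # 'PRO    '
```

Only a leading article is skipped; the article inside "of the English"
still contributes its first letter, as in the worked example given later
in this section.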

The keys derived from all 135,938 entries were then sorted. Each key in
the file was compared with the subsequent key or keys, and equal
comparisons were counted. A frequency distribution of identical keys was
thus prepared, and a table was constructed giving the number of entries
per key as percentages of the total number of distinct keys. This table
gives the percentage of time that each number of entries can be expected
in a reply, on the assumption that every key is equally likely to be used.
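
A minimal sketch of this tabulation follows. It replaces the
sort-and-compare pass with Python's collections.Counter, which yields
the same counts; the function name reply_size_distribution and the
sample keys are hypothetical.

```python
from collections import Counter


def reply_size_distribution(keys):
    """Map each reply size to the percentage of distinct keys producing it."""
    # Number of entries sharing each key (the size of the reply for that key).
    entries_per_key = Counter(keys)
    # Number of distinct keys that produce a reply of each size.
    size_counts = Counter(entries_per_key.values())
    total_distinct = len(entries_per_key)
    return {
        size: 100.0 * count / total_distinct
        for size, count in sorted(size_counts.items())
    }


# Hypothetical keys: two unique keys, one shared twice, one shared three times.
sample = ["A", "B", "B", "C", "C", "C", "D"]
print(reply_size_distribution(sample))  # {1: 50.0, 2: 25.0, 3: 25.0}
```

Dividing by the number of distinct keys, rather than by the number of
entries, is what encodes the assumption that every key is equally likely
to be used.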

Next, by eliminating the fourth single character and then the fourth
and third, files of 3,1,1,1 and 3,1,1 keys were prepared from the 3,1,1,1,1
file. For example, the 3,1,1,1,1 key for Raymond Irwin's The Heritage
of the English Library is HER, O, T, E, L; the 3,1,1,1 key for this title
is HER, O, T, E; and the 3,1,1 key, HER, O, T. The same processing given
to the 3,1,1,1,1 file was employed on these two files.
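
If each key is held as a fixed-width string (an assumption of this
sketch, as is the function name truncate_key), the shorter keys are
simply prefixes of the longer one:

```python
def truncate_key(key, single_chars):
    """Keep the three-character stem plus the first single_chars single characters."""
    if not 0 <= single_chars <= len(key) - 3:
        raise ValueError("single_chars must fit within the key")
    return key[:3 + single_chars]


full_key = "HEROTEL"               # 3,1,1,1,1 key: HER, O, T, E, L
print(truncate_key(full_key, 3))   # HEROTE  (3,1,1,1)
print(truncate_key(full_key, 2))   # HEROT   (3,1,1)
```

Because truncation can only merge keys and never split them, the number
of entries sharing a key cannot decrease as the key is shortened.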

RESULTS 

Table 1 contains the maximum number of entries in 99 percent of
replies. Inspection of the table reveals that there is a large increase in
specificity when the key is enlarged from 3,1,1 to 3,1,1,1; the maximum
number of entries (99+ percent of the time) drops from twelve to five.
However, when the key goes to 3,1,1,1,1, the number of entries per
reply goes down only from five to four.

The percentage of replies that contained a single entry was 67.8 for 
the 3,1,1 key, 84.0 for the 3,1,1,1 key, and 90.0 for the 3,1,1,1,1 key. 
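
The 99 percent figures can be read off a cumulative form of the
frequency distribution. The sketch below shows one way to do so; the
function name max_entries_at_coverage and the sample data are
hypothetical, and each distinct key is again assumed equally likely to
be used.

```python
from collections import Counter


def max_entries_at_coverage(keys, coverage=99.0):
    """Return (n, pct): the smallest reply size n such that at least
    `coverage` percent of distinct keys yield n or fewer entries."""
    entries_per_key = Counter(keys)
    size_counts = Counter(entries_per_key.values())
    total_distinct = len(entries_per_key)
    cumulative = 0
    for size in sorted(size_counts):
        cumulative += size_counts[size]
        pct = 100.0 * cumulative / total_distinct
        if pct >= coverage:
            return size, pct
    raise ValueError("no keys supplied")


# Hypothetical file: 99 keys with one entry each and one key with five entries.
sample = [f"K{i:03d}" for i in range(99)] + ["PRO"] * 5
print(max_entries_at_coverage(sample, 99.0))   # (1, 99.0)
print(max_entries_at_coverage(sample, 99.5))   # (5, 100.0)
```

The percentage of single-entry replies is the first step of the same
cumulative sum.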



Table 1. Maximum Number of Entries in 99 Percent of Replies

                      Title Index Entries
Search Key       Maximum Entries     Percentage
                 Per Reply           of Time

3,1,1                 12               99.0
3,1,1,1                5               99.1
3,1,1,1,1              4               99.2

The Irascope cathode ray tube terminals used in the OCLC system 
can display nine truncated entries on one screen, and it is felt that
catalogers can use with ease up to two screensful of entries. Therefore, 
the keys producing more than eighteen titles were listed. For 3,1,1,1,1 
there were only 33; for 3,1,1,1 there were 67; and for 3,1,1 there were 357. 
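
The threshold follows from the display: nine entries per screen times
two screens gives eighteen entries. A sketch of the listing step is
given below, with the function name keys_exceeding_screens and the
sample keys as assumptions.

```python
from collections import Counter


def keys_exceeding_screens(keys, entries_per_screen=9, screens=2):
    """List (key, entry count) pairs whose replies overflow the given screens,
    largest first."""
    limit = entries_per_screen * screens   # 9 * 2 = 18 entries
    counts = Counter(keys)
    overflow = [(key, n) for key, n in counts.items() if n > limit]
    return sorted(overflow, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical file: one key with 19 entries, one with 18, one with 1.
sample = ["PRO"] * 19 + ["HIS"] * 18 + ["HER"]
print(keys_exceeding_screens(sample))  # [('PRO', 19)]
```

A key with exactly eighteen entries still fits on two screens, so only
keys with nineteen or more entries are listed.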

The maximum number of identical keys was 321 for 3,1,1,1,1 and 3,1,1,1;
the key was PRO, b, b, b, b (where b denotes a blank), most instances of
which were derived from "Proceedings." For 3,1,1 the maximum was 417,
for HIS, O, T, derived from "History of the."

DISCUSSION 

It is clear from the findings that a 3,1,1 search key is not sufficiently 
specific to operate efficiently as a title index in a large file. However, the 
3,1,1,1 key appears to be sufficiently specific for efficient operation, while 
the 3,1,1,1,1 key does not appear to possess sufficient increased specificity 
to justify its additional complexity. 

The observation that there is a large increase in specificity between
keys employing three and four title words that constitute Markov strings
suggests that the second and third words may be highly correlated. Indeed
this suggestion is substantiated by the maximum case for 3,1,1: HIS, O, T.
In the more-than-eighteen group for 3,1,1,1, these characters occurred in
seven keys for a total of 206 entries, and for 3,1,1,1,1 they did not occur
at all in the more-than-eighteen group.

CONCLUSION 

This experiment has shown that a 3,1,1,1 or 3,1,1,1,1 derived search 
key is sufficiently specific to operate efficiently as a title index to a 
file of 135,938 MARC II records. Since a previous paper observed that as
a file of entries increases the number of entries per reply does not increase
in a one-to-one ratio (1), it is likely that these keys will operate efficiently
for files of considerably greater size. 



REFERENCES 

1. Frederick G. Kilgour, Philip L. Long, Eugene B. Leiderman, and Alan
L. Landgraff, "Title-Only Entries Retrieved by Use of Truncated
Search Keys," Journal of Library Automation 4:207-10 (Dec. 1971).

2. Frederick G. Kilgour, "Retrieval of Single Entries from a Computerized
Library Catalog File," Proceedings of the American Society for
Information Science 5:133-36 (1968).

3. Edwin B. Parker, SPIRES (Stanford Physics Information REtrieval
System) 1969-70 Annual Report (Palo Alto: Stanford University, June
1970), p. 77-78.