

Catalog Records Retrieved by Personal 
Author Using Derived Search Keys 


Alan L. LANDGRAF and Frederick G. KILGOUR: The Ohio College Library Center 

This investigation shows that search keys derived from personal author
names possess a sufficient degree of distinctness to be employed in an
efficient computerized interactive index to a file of MARC II catalog records
having 167,745 personal author entries.

Previous papers in this series and experience at the Ohio College Library
Center have established that truncated derived search keys are efficient for
retrieval of entries by name-title and title from large on-line computerized
files of catalog records.1-4 Experiments reported in the earlier papers were
"... based on the assumption that each key had a probable use equal to all
other keys."5 However, Guthrie and Slifko have shown that random selection
of entries, rather than keys, yields results closer to actual experience
but with a higher number of entries per reply.6 For example, they found
on retrieving from a file of 857,725 records using a 4,5 (four characters of
main entry, five characters of title) key that when the basis of the search
was random keys there was one entry per reply 81.3 percent of the time,
but when the basis was random records, there was one entry per reply 55.7
percent of the time.
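The difference between the two bases can be illustrated with a minimal sketch. The function name and the five sample keys are hypothetical and are not drawn from either study; the sketch only shows why sampling records, rather than keys, lowers the share of one-entry replies.

```python
from collections import Counter

def single_entry_rates(keys):
    """Return the share of one-entry replies on a key basis and on a record basis."""
    counts = Counter(keys)                      # entries filed under each distinct key
    singles = sum(1 for c in counts.values() if c == 1)
    by_key = singles / len(counts)              # every distinct key equally likely
    by_record = singles / len(keys)             # every record equally likely
    return by_key, by_record

# Hypothetical file of five records: three share one key, two have keys of their own.
keys = ["SMITJOHN", "SMITJOHN", "SMITJOHN", "LANDCATA", "KILGTITL"]
by_key, by_record = single_entry_rates(keys)
```

On this toy file two of the three distinct keys give a one-entry reply, but only two of the five records do, because heavily shared keys are chosen more often when records are sampled.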

This paper presents the results of experimentation with search keys to
be used in constructing an author index to a large file of on-line catalog
records. An interactive environment is assumed, with the interrogator
employing a remote terminal. A companion paper describes the findings of an
investigation into retrieval efficiency of search keys derived from corporate
author names.7

MATERIALS AND METHODS 

The investigation employed a MARC II file containing approximately
200,000 monographic records from which a computer program extracted
167,745 personal-name keys. The program extracted these keys from main
entry, series statement, added entry, and series added entry fields. The basic
key structure consisted of sixteen characters: the first eight from the
surname, the first seven from the forename, and the first character from the
middle name (8,7,1). If the surname or forename contained fewer characters
than the key segment to be derived, the segment was left-justified
and padded out with blanks. If there was no middle name or middle
initial, a blank was used.

Journal of Library Automation Vol. 6/2 June 1973

First-name  Middle                    No. of characters extracted from the surname
characters  initial  Likelihood         3         4         5         6

0           0        90.00%          (>200)    (>200)    (>200)      171
                     99.00%, 99.50%  (>200)

1           0        90.00%            67        25        18        16
                     99.00%           172        90        71        63
                     99.50%          (>200)     105       102        81

1           1        90.00%            16         8         6         6
                     99.00%            55        25        23    [illegible]
                     99.50%            67        36        32        30

2           0        90.00%            26        12         9
                     99.00%            87        44        38
                     99.50%           106        62        57

2           1        90.00%             8         5         5
                     99.00%            29        21        21
                     99.50%            37        30        30

3           0        90.00%            17
                     99.00%            50
                     99.50%            78

3           1        90.00%             5
                     99.00%            23
                     99.50%            31

Fig. 1. Number of Names Retrieved 90, 99, and 99.5 Percent of the Time
for Different Key Structures
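The derivation of the basic 8,7,1 key can be sketched as follows. This is an illustration under stated assumptions, not the program used in the investigation: the function name `derive_key` is hypothetical, and the conversion to upper case is an assumption, since the text does not say how case was handled.

```python
def derive_key(surname, forename="", middle="", widths=(8, 7, 1)):
    """Build a fixed-length key from surname, forename, and middle name.

    Each segment is truncated to its width, left-justified, and padded
    with blanks; a missing middle name therefore becomes a single blank.
    Upper-casing is an assumption made for this sketch.
    """
    parts = (surname, forename, middle)
    return "".join(p.upper()[:w].ljust(w) for p, w in zip(parts, widths))

key = derive_key("Landgraf", "Alan", "L")   # "LANDGRAFALAN   L"
```

Every key built this way is sixteen characters long, so the surname, forename, and middle-initial segments can later be recovered by position alone.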

Another program derived shorter keys from the 8,7,1 structure ranging 
from 3,0 to 5,2,1. Next, a sort program arranged the shorter keys in alpha-
betical order. A statistics collection program then processed the alpha-
betical file. This program counted the number of distinct keys, built a fre-
quency distribution of names per distinct key and cumulative frequency 
distributions of names per distinct key in percentile groups. 
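The shortening, sorting, and counting steps might look like the sketch below. The function names and the five sample names are hypothetical, and the code assumes the input keys are already in the sixteen-character 8,7,1 layout described earlier.

```python
from collections import Counter

def shorten(full_key, n_sur, n_fore, n_mid=0):
    """Cut an n_sur,n_fore[,n_mid] key out of a 16-character 8,7,1 key."""
    surname, forename, middle = full_key[:8], full_key[8:15], full_key[15:16]
    return surname[:n_sur] + forename[:n_fore] + middle[:n_mid]

def key_statistics(full_keys, n_sur, n_fore, n_mid=0):
    """Return the number of distinct short keys and the frequency
    distribution {names per key: number of distinct keys}."""
    short_keys = sorted(shorten(k, n_sur, n_fore, n_mid) for k in full_keys)
    names_per_key = Counter(short_keys)
    frequency = Counter(names_per_key.values())
    return len(names_per_key), dict(frequency)

# Hypothetical sample names, laid out as 8,7,1 keys with blank padding.
names = [("SMITH", "JOHN", "A"), ("SMITH", "JOHN", "B"), ("SMITH", "JANE", ""),
         ("SMYTHE", "JOAN", ""), ("LANDGRAF", "ALAN", "L")]
full_keys = [f"{s:<8.8}{f:<7.7}{m:<1.1}" for s, f, m in names]

distinct_3_1, freq_3_1 = key_statistics(full_keys, 3, 1)
distinct_4_2_1, freq_4_2_1 = key_statistics(full_keys, 4, 2, 1)
```

On this sample the 3,1 structure leaves three names under one key, while the 4,2,1 structure separates all five.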

RESULTS 

Figure 1 presents the findings at three levels of likelihood for retrieving n
or fewer names when a variety of search key combinations was employed,
ranging from three to six characters from the surname, zero to three
characters from the first name, and with or without the middle initial. Table 1
is an extraction from Figure 1 and contains the number of names retrieved
at a level of 90 percent likelihood for the various search keys employed.

Table 1. Number of Names Retrieved With 90 Percent Likelihood

No. of Characters    No. of Names Retrieved    Key Structure

3                    (>200)                    3,0

4                    (>200)                    4,0
                     (>200)                    3,1

5                    (>200)                    5,0
                     26                        3,2
                     25                        4,1
                     16                        3,1,1

6                    171                       6,0
                     18                        5,1
                     17                        3,3
                     12                        4,2
                     8                         3,2,1
                     8                         4,1,1

7                    16                        6,1
                     9                         5,2
                     6                         5,1,1
                     5                         3,3,1
                     5                         4,2,1

Figure 2 has the same structure as Figure 1 but contains the degree of
distinctness as percentages,

$$100 \times \frac{\text{no. of distinct keys}}{\text{no. of entries}}\ \text{percent}.$$

Table 2 records distinctness arranged by number of characters per key.
Figure 3 is a graphical representation of the degrees of distinctness of the
various keys. In this figure, different types of lines connect points
representing key structures that contain an equal number of characters.

The bottom line in Table 1 may be read as saying that 90 percent of the
time a 4,2,1 key will retrieve five or fewer names from a file of 167,745
personal name keys. The bottom line of Table 2 states that from the same
file the 4,2,1 key yields a single name 64.1 percent of the time.
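Both measures can be computed from a list of derived keys, as in the sketch below. The function names and sample keys are hypothetical. The text does not state whether the likelihood levels were taken over keys or over entries; this sketch assumes every entry is an equally likely search target, so it should be read as one possible interpretation rather than as the procedure actually used.

```python
from collections import Counter

def distinctness(keys):
    """Degree of distinctness: 100 x (distinct keys / entries), in percent."""
    return 100.0 * len(set(keys)) / len(keys)

def names_at_likelihood(keys, likelihood):
    """Smallest n such that at least `likelihood` percent of searches
    retrieve n or fewer names (assumption: entries are equally likely)."""
    names_per_key = Counter(keys)
    total = len(keys)
    covered = 0
    for n in sorted(set(names_per_key.values())):
        # Add the entries whose key retrieves exactly n names.
        covered += n * sum(1 for c in names_per_key.values() if c == n)
        if 100.0 * covered / total >= likelihood:
            return n
    return max(names_per_key.values())

# Hypothetical file: five entries share one key, five have keys of their own.
keys = ["SMIJ"] * 5 + ["LANA", "KILF", "LONP", "WYCJ", "RASK"]
d = distinctness(keys)                    # 60.0
n50 = names_at_likelihood(keys, 50)       # 1
n90 = names_at_likelihood(keys, 90)       # 5
```

Here six distinct keys over ten entries give a distinctness of 60 percent, yet a reply of five names is needed to cover 90 percent of searches, which shows why the two measures are reported side by side.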

DISCUSSION

This experiment has shown the degree of distinctness (that is to say, the
number of distinct keys divided by the total number of entries from which
all keys were derived) to be a useful tool in determining what key
structures may be efficiently used. As seen by comparing Figure 1 with Figure
2 and Table 1 with Table 2, there is a high degree of correlation between
distinctness and the likelihood of retrieving a certain number of names 90,
99, or 99.5 percent of the time. Thus, the investigator can eliminate many
undesirable key structures on the merits of distinctness alone and pool
his remaining resources toward studying in detail other structures.

First-name  Middle       No. of characters extracted from the surname
characters  initial        3          4          5          6

0           0            2.271      9.934     19.220     24.587
0           1              -          -          -          -

1           0           17.106     35.360     44.850     48.345
1           1           44.551     57.148     61.449     62.891

2           0           34.676     49.870     55.803
2           1           56.979     64.155     66.186

3           0           44.914     56.294
3           1           66.133     66.599

Fig. 2. Degree of Distinctness in Percent for Different Key Structures

Table 2. Distinctness by Number of Characters Per Key

No. of Characters    Degree of Distinctness    Key Structure

3                    2.3                       3,0

4                    9.9                       4,0
                     17.1                      3,1

5                    19.2                      5,0
                     34.8                      3,2
                     35.7                      4,1
                     44.5                      3,1,1

6                    24.6                      6,0
                     44.9                      5,1
                     44.9                      3,3
                     49.9                      4,2
                     57.0                      3,2,1
                     57.1                      4,1,1

7                    48.3                      6,1
                     55.8                      5,2
                     56.3                      4,3
                     61.4                      5,1,1
                     62.1                      3,3,1
                     64.1                      4,2,1

When the 8,7,1 key was tested, it yielded a uniqueness percentage of
68.8, which represents the upper limit of uniqueness in this experiment. From
Table 2 it is apparent that the bottom three keys yield a percentage of
uniqueness near the upper limit.

[Figure 3: plot of degree of distinctness in percent, on a scale of 0 to 70,
for each key structure, with the upper limit marked at 68.78.]

Fig. 3. Degree of Distinctness. Lines Connect Points Whose Key Structures
Have an Equal Number of Characters

Table 2 shows a distinct jump in percentage of uniqueness between the
n,0 and n,1 key structures. Another sharp increase occurs between n,m and
n,m,1 structures. Each section of the key is derived from a Markov string,
and it appears from the discontinuities between sections that the parts of
personal names are not highly correlated.

As pointed out in previous papers, a key structure that possesses a
relatively high degree of distinctness also yields a small percentage of replies
containing many entries. For the name-only search key, this effect could be
reduced by performing the retrieval in two steps when necessary. First, the
full names for each author whose name matches the entered search key
would be displayed; names appearing with more than one work would be
displayed only once. Next, the retriever would choose the name desired and
request all of the titles associated with it. However, some title displays
could be excessive: William Shakespeare's name appears with more than
500 works. A paper currently in preparation at OCLC describes an
algorithm whose interactive use resolves this type of search problem.8
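The two-step retrieval described above can be sketched as follows. All names here are hypothetical: `key_3_1` is a stand-in for whichever key structure is chosen, and the records are made-up sample data. The sketch does not attempt the algorithm of the paper in preparation, which is not described in this article.

```python
from collections import defaultdict

def key_3_1(author):
    """Hypothetical 3,1 key from a 'Surname, Forename' heading."""
    surname, _, forename = author.partition(", ")
    return (surname[:3] + forename[:1]).upper()

def build_index(records, make_key):
    """Map each derived key to {full author name: [titles]}."""
    index = defaultdict(lambda: defaultdict(list))
    for author, title in records:
        index[make_key(author)][author].append(title)
    return index

def step_one(index, key):
    """First step: display each matching full name only once."""
    return sorted(index.get(key, {}))

def step_two(index, key, chosen_name):
    """Second step: list all titles associated with the chosen name."""
    return list(index.get(key, {}).get(chosen_name, []))

# Made-up sample records (author heading, title).
records = [("Shakespeare, William", "Hamlet"),
           ("Shakespeare, William", "Macbeth"),
           ("Shaw, George Bernard", "Pygmalion"),
           ("Sharp, William", "Poems")]
index = build_index(records, key_3_1)
names = step_one(index, "SHAW")
titles = step_two(index, "SHAW", "Shakespeare, William")
```

Keeping names and titles in separate steps means the first display grows with the number of distinct authors under a key, not with the number of works, although a prolific author can still produce a long second display.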

CONCLUSION 

This investigation has yielded findings showing that there are several
truncated search keys derived from personal names that are sufficiently
specific to perform efficiently as an author index to a file of 167,745 personal
names, thereby providing an on-line index that will make it possible for a
terminal user to obtain a listing of all titles by a given author in an on-line
catalog.

ACKNOWLEDGMENT 

This study was supported in part by Office of Education contract OEC-0-72-2289
(506) and Council on Library Resources grant CLR-526.

REFERENCES 

1. P. L. Long and F. G. Kilgour, "A Truncated Search Key Title Index," Journal of
Library Automation 5:17-20 (March 1972).

2. F. G. Kilgour, P. L. Long, E. B. Leiderman, and A. L. Landgraf, "Title-Only
Entries Retrieved by Use of Truncated Search Keys," Journal of Library Automation
4:207-310 (Dec. 1971).

3. F. G. Kilgour, P. L. Long, and E. B. Leiderman, "Retrieval of Bibliographic Entries
from a Name-Title Catalog by Use of Truncated Search Keys," Proceedings of the
American Society for Information Science 7:79-82 (1970).

4. F. G. Kilgour, P. L. Long, A. L. Landgraf, and J. A. Wyckoff, "The Shared
Cataloging System of the Ohio College Library Center," Journal of Library Automation
5:157-183 (Sept. 1972).

5. Long and Kilgour, "A Truncated Search Key," p.18.

6. Gerry P. Guthrie and Steven D. Slifko, "Analysis of Search Key Retrieval on a
Large Bibliographic File," Journal of Library Automation 5:96-100 (June 1972).

7. K. B. Rastogi, A. L. Landgraf, and P. L. Long, "Corporate Author Entry Records
Retrieved by Use of Derived Truncated Search Keys," Journal of Library
Automation, in press.

8. J. A. Wyckoff, "A Technique for Extending Searches through Large Numbers of
Duplicate Matches," in preparation.