Catalog Records Retrieved by Personal Author Using Derived Search Keys

Alan L. LANDGRAF and Frederick G. KILGOUR: The Ohio College Library Center

This investigation shows that search keys derived from personal author names are distinct enough to be used in an efficient computerized interactive index to a file of MARC II catalog records having 167,745 personal author entries.

Previous papers in this series and experience at the Ohio College Library Center have established that truncated derived search keys are efficient for retrieving entries by name-title and by title from large on-line computerized files of catalog records.¹⁻⁴ Experiments reported in the earlier papers were "... based on the assumption that each key had a probable use equal to all other keys."⁵ However, Guthrie and Slifko have shown that random selection of entries, rather than keys, yields results closer to actual experience but with a higher number of entries per reply.⁶ For example, when retrieving from a file of 857,725 records with a 4,5 key (four characters of the main entry, five characters of the title), they found one entry per reply 81.3 percent of the time when searches were based on random keys, but only 55.7 percent of the time when they were based on random records.

This paper presents the results of experiments with search keys intended for constructing an author index to a large file of on-line catalog records. An interactive environment is assumed, with the interrogator using a remote terminal. A companion paper describes the findings of an investigation into the retrieval efficiency of search keys derived from corporate author names.⁷

MATERIALS AND METHODS

The investigation employed a MARC II file containing approximately 200,000 monographic records, from which a computer program extracted 167,745 personal-name keys.
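The contrast Guthrie and Slifko draw between sampling random keys and sampling random records can be illustrated with a minimal Python sketch. The helper `name_title_key` is hypothetical. It assumes a 4,5 key is formed by upper-casing the text, truncating the main entry to four characters and the title to five, and blank-padding short segments. This is an illustration, not the exact normalization used in either study. The toy records are invented.

```python
from collections import Counter


def name_title_key(main_entry: str, title: str, m: int = 4, t: int = 5) -> str:
    """Hypothetical m,t name-title key: the first m characters of the main
    entry plus the first t characters of the title. Upper-casing and blank
    padding are illustrative assumptions, not a documented normalization."""
    return main_entry.upper()[:m].ljust(m) + title.upper()[:t].ljust(t)


def one_entry_rates(keys: list[str]) -> tuple[float, float]:
    """Return (key-based, record-based) rates of exactly one entry per reply."""
    counts = Counter(keys)
    # Sampling random keys: each distinct key is equally likely to be searched.
    by_key = sum(1 for c in counts.values() if c == 1) / len(counts)
    # Sampling random records: each record is equally likely to be sought,
    # so a key shared by c records is drawn c times as often.
    by_record = sum(1 for k in keys if counts[k] == 1) / len(keys)
    return by_key, by_record


# Invented toy records (main entry, title); not data from the study.
records = [
    ("Smith, John", "History of Ohio"),
    ("Smith, Jane", "History of Iowa"),
    ("Smith, Adam", "Historical essays"),
    ("Jones, Mary", "Poems"),
    ("Brown, Carl", "Chemistry"),
]
keys = [name_title_key(author, title) for author, title in records]
by_key, by_record = one_entry_rates(keys)
print(f"random keys: {by_key:.1%}, random records: {by_record:.1%}")
# prints "random keys: 66.7%, random records: 40.0%"
```

The three Smith records share a single key. That key is counted once when keys are sampled but three times when records are sampled. As a result, the record-based rate gives crowded keys more weight and comes out lower, which matches the direction of the 81.3 versus 55.7 percent contrast reported above.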
The program extracted these keys from the main entry, series statement, added entry, and series added entry fields. The basic key structure consisted of sixteen characters: the first eight from the surname, the first seven from the forename, and the first character from the middle name (8,7,1). If a surname or forename contained fewer characters than the key segment to be derived from it, the segment was left-justified and padded out with blanks. If there was no middle name or middle initial, a blank was used. Another program derived shorter keys from the 8,7,1 structure, ranging from 3,0 to 5,2,1. Next, a sort program arranged the shorter keys in alphabetical order. A statistics collection program then processed the alphabetical file. This program counted the number of distinct keys and built a frequency distribution of names per distinct key, as well as cumulative frequency distributions of names per distinct key in percentile groups.

Journal of Library Automation Vol. 6/2 June 1973

[Fig. 1. Number of Names Retrieved 90, 99, and 99.5 Percent of the Time for Different Key Structures. For each key structure, the figure gives the number of names retrieved at 90.00, 99.00, and 99.50 percent likelihood; columns give the number of characters extracted from the surname (3, 4, 5, 6).]

RESULTS

Figure 1 presents the findings at three levels of likelihood for retrieving n or fewer names when a variety of search key combinations were employed, ranging from three to six characters from the surname, zero to three characters from the first name, and with or without the middle initial. Table 1 is an extraction from Figure 1 and contains the number of names retrieved at a level of 90 percent likelihood for the various search keys employed.

Table 1. Number of Names Retrieved With 90 Percent Likelihood

No. of Characters   Key Structure   No. of Names Retrieved
3                   3,0             (>200)
4                   4,0             (>200)
                    3,1             (>200)
5                   5,0             (>200)
                    3,2             26
                    4,1             25
                    3,1,1           16
6                   6,0             171
                    5,1             18
                    3,3             17
                    4,2             12
                    3,2,1           8
                    4,1,1           8
7                   6,1             16
                    5,2             9
                    5,1,1           6
                    3,3,1           5
                    4,2,1           5

Figure 2 has the same structure as Figure 1 but gives the degree of distinctness as a percentage:

$$\text{degree of distinctness} = \frac{\text{no. of distinct keys}}{\text{no. of entries}} \times 100\ \text{percent}$$

Table 2 records distinctness arranged by number of characters per key. Figure 3 is a graphical representation of the degrees of distinctness of the various keys; in this figure, different types of lines connect points representing key structures that contain an equal number of characters.

The bottom line in Table 1 may be read as saying that 90 percent of the time a 4,2,1 key will retrieve five or fewer names from a file of 167,745 personal name keys. The bottom line of Table 2 states that, for the same file, the 4,2,1 key yields a single name 64.1 percent of the time.

DISCUSSION

This experiment has shown the degree of distinctness (that is, the number of distinct keys divided by the total number of entries from which all keys were derived) to be a useful tool in determining which key structures may be used efficiently. As seen by comparing Figure 1 with Figure 2 and Table 1 with Table 2, there is a high degree of correlation between distinctness and the likelihood of retrieving a certain number of names 90,

[Figure: columns give the number of characters extracted from the surname]