College and Research Libraries

By C. D. GULL

Alphabetic Subject Indexes and Coordinate Indexes: An Experimental Comparison*

Mr. Gull is a member of the staff, Documentation, Inc.

One of the objects of this contract is to make an experimental study of classification systems,1 alphabetic subject indexes and coordinate indexes. Because the existing catalogs present difficulties of size, location and security restrictions, a sampling technique was employed. The first comparison was made between the alphabetic subject indexes of the Technical Information Division (TID) of the Library of Congress and of the Document Service Center (DSC) in Dayton, and coordinate indexes developed by Documentation Incorporated. Cards were obtained from TID representing 1207 reports cataloged under its Office of Naval Research contract, and cards were obtained from DSC representing 543 reports cataloged for the Air Force.

* Technical Report No. 5, prepared under Contract No. AF 18(600)-376 for the Armed Services Technical Information Agency by Documentation, Inc., Washington, D.C.
1 A comparison of classifications and coordinate indexes will be described in a later Technical Report.

All cards found under headings beginning with certain words, such as Antennas, Electric, Electronic and Microwaves, were chosen as part of the sample, because the headings incorporating these words were thought to be the most heavily used headings in the list for unclassified reports and would thus illustrate the maximum concentration of numbers on coordinate index cards that could be obtained for the sample. These cards represented 707 TID reports. The remaining 500 cards were in numerical order, from U20400 through U20899, making the sample 1207 out of 21,000 unclassified TID reports.
The subject matter of these cards is so diverse that they are considered representative of the complete subject catalog. The 543 DSC cards were chosen at random and not in consecutive numerical order, thus making the samples equally representative for both catalogs.

Preparation of Sample Subject Heading Catalogs

Sample subject heading catalogs were set up from the two groups of cards, and the following figures were determined from them:

TABLE I

Catalog cards from                          TID     DSC    Totals or Averages
Reports                                    1207     543    1750
Different headings used                    1110     899    2009
Subject heading assignments                1950    1357    3307
Average subject headings per report        1.61    2.50    1.89
Cross references needed
  To headings:     See                      630     460    1090
                   See also                 208      67     275
  To subdivisions: See                      212     229     441
                   See also                  25       5      30
Total cards                                3025    2118    5143
Total cross references                     1075     761    1836
Cross references per heading               0.97    0.85    0.93

It was not difficult to underline the subject headings on the cards and arrange them in alphabetic order, but it took a great deal of time to establish the cross reference structures for the two samples. The cross references had to be included to make the samples correspond to the original catalogs and to permit comparison of the subject catalogs and the coordinate indexes for reference purposes, as well as to determine the relative difficulty of preparation.
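As a check on the figures above, the three right-hand columns of Table I can be recomputed from the counts. The short Python sketch below is an illustration only: the dictionary layout and the function name are assumptions introduced here, and the relation "total cards = heading assignments + cross references" is inferred from the printed figures rather than stated in the report.

```python
# Counts transcribed from Table I. Cross references are listed in the order
# see / see also (to headings), then see / see also (to subdivisions).
TABLE_I = {
    "TID": {"headings": 1110, "assignments": 1950, "cross_refs": (630, 208, 212, 25)},
    "DSC": {"headings": 899, "assignments": 1357, "cross_refs": (460, 67, 229, 5)},
}


def derived_columns(row):
    """Recompute the three right-hand columns of Table I for one catalog."""
    total_refs = sum(row["cross_refs"])
    return {
        # Assumed relation: one card per heading assignment plus one per cross reference.
        "total_cards": row["assignments"] + total_refs,
        "total_cross_refs": total_refs,
        "cross_refs_per_heading": round(total_refs / row["headings"], 2),
    }


results = {name: derived_columns(row) for name, row in TABLE_I.items()}
for name, cols in results.items():
    print(name, cols)
```

Under these assumptions the recomputed values agree with the printed ones: 3025 cards, 1075 cross references and 0.97 cross references per heading for TID, and 2118, 761 and 0.85 for DSC.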
The second edition of the Navy Research Section (NRS) List of Subject Headings2 was followed in creating the see and see also references for the TID catalog; for the few headings found on the cards but lacking in the list, the references were made according to the policies used for the List. The ASTIA Document Service Center Subject Heading List* lacks the type of cross reference structure of the NRS list, and it provides only seven see references and 28 see also references for the 899 subject headings and none to subdivisions. It was therefore necessary to supply the cross reference structure for the DSC sample catalog, and this was done according to the policies followed in the NRS list. Thus both samples were supplied with all cross references required by the permutations of words in the various subject headings. While it was possible to provide all of the necessary see references, no attempt was made to supply any but the most obvious see also references for the DSC sample catalog, since it is extremely difficult to guess the relationship of headings in a list lacking in its own see also references. The situation accounts for the small number of see also references in the DSC sample compared to the greater number for the TID sample.

2 U.S. Library of Congress, Navy Research Section. List of Subject Headings, 2d ed. Washington, 1950.
* U.S. Dept. of Defense. Armed Services Technical Information Agency. Document Service Center. ASTIA Document Service Center Subject Heading List (Alphabetically), Dayton, 1952. The subject catalog maintained by the Document Service Center has been provided with cross references, even though they are lacking from the printed list.

The form of the NRS list also indicated the production of 220 see also references which could not be used in the sample TID catalog as such, but 186 of these were changed to see references and added, leaving only 34 to be discarded.
A similar situation prevailed for the sample DSC catalog.

It is particularly noteworthy from these samples that 97 cross references are needed for every 100 TID headings and at least 85 cross references for every 100 DSC headings. These references are required by the inherent difficulties of alphabetic indexes: they must be included for synonyms, relations between headings, and the permutations of words in multiple-word headings.

Preparation of Sample Coordinate Indexes

Certain assumptions about coordinate indexes were current in our thinking when we undertook to prepare the first coordinate index from the TID cards:

1. Coordinate index terms should be simple.
2. A coordinate index is used by coordinating two or more terms to discover the original materials providing the desired coordination. Coordination is accomplished by any of the logical operations of conjunction, alternation, and negation, or any combination of them.
3. In order to make it unnecessary to search the entire index, the record of the original materials should be posted on the coordinate index term cards. Since numbers are very convenient for such posting, the original materials should be arranged in numerical order.
4. Since we lacked the original materials to put in numerical order, a numerical or accessions catalog was essential. The numbers already on the cards were ideal for this purpose.
5. Coordinate indexing can be accomplished by manual and mechanical means. The samples described here were made on cards for manual coordination, divided into ten vertical columns according to the terminal digits of the numbers, a device expected to facilitate the coordination of numbers.3

3 A sixth assumption was this: The distribution of coordinate index terms into categories will facilitate both the cataloging operations and reference use. Although an attempt was made to categorize the terms, this phase of the investigation is yet to be completed.
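Assumptions 2 through 5 can be pictured with a minimal Python sketch, offered as an illustration and not as the procedure used in the study. Each unit-term card holds ten columns keyed by the terminal digit of the report number, and coordination is modelled with set operations. The class and method names are hypothetical; report 23 and its four words come from the first TID card discussed in this report (U23), while reports 57 and 103 are invented for the example.

```python
from collections import defaultdict


class CoordinateIndex:
    """Hypothetical model of unit-term cards with ten terminal-digit columns."""

    def __init__(self):
        # Each term card holds ten columns, one per terminal digit 0-9.
        self.cards = defaultdict(lambda: [set() for _ in range(10)])

    def post(self, number, terms):
        """Post a report number on the card for every term assigned to it."""
        for term in terms:
            self.cards[term][number % 10].add(number)

    def numbers(self, term):
        """All report numbers posted on one term card (empty if no card exists)."""
        if term not in self.cards:
            return set()
        return set().union(*self.cards[term])

    def conjunction(self, a, b):
        """Numbers common to both cards, compared column by column."""
        if a not in self.cards or b not in self.cards:
            return set()
        common = set()
        for col_a, col_b in zip(self.cards[a], self.cards[b]):
            common |= col_a & col_b
        return common

    def alternation(self, a, b):
        """Numbers posted on either card."""
        return self.numbers(a) | self.numbers(b)

    def negation(self, a, b):
        """Numbers posted on card a but not on card b."""
        return self.numbers(a) - self.numbers(b)


index = CoordinateIndex()
index.post(23, ["Power", "Meters", "Microwaves", "Absorption"])  # card U23
index.post(57, ["Microwaves", "Antennas"])   # invented for illustration
index.post(103, ["Power", "Antennas"])       # invented for illustration

print(index.conjunction("Power", "Meters"))        # {23}
print(index.alternation("Meters", "Antennas"))     # reports 23, 57 and 103
print(index.negation("Microwaves", "Absorption"))  # {57}
```

Comparing the cards one column at a time mirrors the manual device described in assumption 5: a searcher holding two cards side by side only has to compare numbers that share a terminal digit.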
JULY, 1953

After the TID file was set up in order by TID number, the coordinate index was started by considering the card with the lowest number: in this sample, U23, bearing the two subject headings Power meters and Microwaves--Absorption. At this stage, only the subject headings were considered in preparing the coordinate index, and no attention was paid to the titles and abstracts included on the cards.

Clearly, the term Microwaves could be used on one coordinate index card and Absorption on another card, and 23 entered on each card in the column headed 3 (for the final digit), but what about the phrase Power meters? If used as a phrase, it is not as simple as if broken into two words, and it requires in an alphabetic file a cross reference from the permutation, Meters, Power. If broken into two words, the specialized meaning of the phrase becomes lost in the general character of the single words, but is recovered when the two words are coordinated, showing 23 to be common to both words. In an attempt to test the assumptions and with a keen realization of the costly, time-consuming character of a cross reference structure, the phrase was broken up into two words, and the next cards were considered.

As the work progressed, it soon became a goal to create the coordinate index without any cross references, if possible. However, not all phrases seemed as easy to break into single words, and the progress of the work was marked by indecision and inconsistency. A chronicle of the efforts to solve the problem, and the solution itself, are found in our Technical Report No. 3, November, 1952.4 The rule for the solution is repeated here because of its importance to coordinate indexing:

"Enter every word in a coordinate index system as a filing word on a single coordinate index card. Whenever in a particular system a word is used in one, and only one, descriptive phrase, enter that word as the filing word on a card, followed by the remaining word or words in the phrase. The word or words following the filing word on any card will themselves be filing words on other cards."

4 Also published as "Unit Terms in Coordinate Indexing" by Taube, Mortimer, Gull, C. D., and Wachtel, Irma S., in American Documentation, 3:213-218, October 1952.

An example shows the practical application of this rule. Given a report dealing with digital computers, two cards are made, one headed Computers and the other Digital. If there are no computers in the system except digital computers, the Computers card is modified to read Computers, Digital. If there is nothing digital in the system except computers, the Digital card is modified to read Digital computers. If later a report is received on analog computers, the card for Computers, Digital is shortened to read Computers and a new card is made reading Analog computers, providing, of course, there is nothing analog in the system but analog computers. The Digital computers card is not affected until a report is received on some other digital device, when the term is shortened to Digital alone.

With this rule for a guide in choosing unit terms, the coordinate index for the TID cards was rapidly completed, with no further problems, and the cards were arranged in alphabetic order. The sample coordinate index possesses these characteristics:

1. Every term in the system is a filing term.
2. Since there are no subdivisions, every term is on equal footing with every other and can be the subject of a complete search.
3. All "see" references required in a standard system, by virtue of the order of words in index-headings, are eliminated.
4. All "see also" references from general to specific subjects are eliminated.
5. The subjective choice of the indexer between possible permutations of multiple-term descriptions is eliminated.
6. Since every word in the system is a filing word and each word in the system appears only once as a filing word, searching for the "proper subdivision" in the proper phrase is unnecessary.

Since serial numbers do not reveal the security classification of reports, a single coordinate index can be used for all classifications without compromising the security requirements based on the "need-to-know."

The new rule made it easy to prepare a coordinate index from the headings on the DSC cards, and both coordinate indexes possessed the same characteristics.

At this stage of the work, the two sample alphabetic subject heading catalogs were of approximately the same quality for reference purposes because of their full cross-reference structures; but they could not be combined easily into one catalog because the headings are uninverted for TID (e.g., Digital computers) and inverted for DSC (e.g., Computers, Digital). Any attempt at combination would require extensive changes on at least one set of cards, as well as new cross references.

Merging the Coordinate Indexes

It was soon perceived that the contrary was true of the two coordinate indexes. Because the terms in each coordinate index were unit terms, and predominantly single words, it was entirely feasible and easy to merge the two indexes into one. Before the merger there were 815 unit terms for the TID coordinate index and 771 for the DSC coordinate index, a total of 1586 terms; but after the merger there were only 1214 unit terms for the combined coordinate index, since 372 terms were common to both indexes.

The merged coordinate index provided a marked contrast to the sample subject heading catalogs, as shown in these figures:

TABLE 2

Catalog cards from                            TID     DSC    Common to    Totals or
                                                             TID & DSC    Averages
Reports                                      1207     543       -           1750
Cards in sample subject heading catalog      3025    2118       -           5143
Cards in separate coordinate indexes          815     771       -           1586
Cards in merged coordinate index              443     399      372          1214
Subject heading assignments                  1950    1357       -           3307
Subject headings per report                  1.61    2.50       -           1.89
Unit terms assigned                          4249    2317       -           6566
Converted to unit terms per report           3.52    4.26       -           3.75

The merged coordinate index requires less than one-fourth the number of cards in the two subject heading samples, yet the average number of indexing assignments was doubled, even though the assignment of unit terms was restricted by the policy of creating unit terms from the subject headings only.

Improving the Quality of Coordinate Indexing

It was recognized that an improvement in quality could be obtained for the merged coordinate index by

1. Assigning additional unit terms based on information obtained from the titles and abstracts on the cards, or
2. Assigning additional unit terms based on titles and abstracts on the cards plus a review of the original documents.

The second alternative was not tested, under the assumption that it would be too expensive an undertaking for any large collection of documents, but an investigation of the first alternative was undertaken. A new merged coordinate index was created from a sample of 200 cards, consisting of 100 DSC cards (the lowest numbers in our non-consecutively numbered sample) and 100 TID cards (U20200-U20299). Since all unit term assignments required by the subject headings were retained, the base of the new coordinate index was identical with the old coordinate index for these 200 cards.

The preparation of the new index revealed that 388 unit terms were used for the 200 cards in the old merged coordinate index. The review of titles and abstracts resulted in the use of 90 terms already in the old merged coordinate index and in the addition of 117 new terms, bringing the sum total to 595 unit terms. The average number of unit term assignments was increased from 3.52 to 6.88 for each TID report and from 4.26 to 5.57 for each DSC report, or an average of 6.24 unit terms per report.

This last average is three and a quarter times the average number of subject headings per report, and it indicates that the depth of indexing is much greater for coordinate indexes as we assume from this test they would be prepared than for the conventional subject heading catalogs as they are now prepared. This difference is not as great as indicated here, since the conventional subject heading catalogs provide access to reports by means of cross references in addition to entry under the headings. Lacking an accepted terminology for the sum of subject headings plus cross-references for any report, we are calling this "access points."

The following Table has been developed to show per report the comparison between subject headings, access points, converted unit terms, and unit terms resulting from improved coordinate indexing.

TABLE 3
PER REPORT

Column                                            TID     DSC    Averages
1  Subject headings                              1.61    2.50      1.89
2  Times cross references per heading            0.97    0.85      0.91
3  Equals cross references                       1.58    2.12      1.72
4  Access points (1 + 3)                         3.19    4.62      3.61
5  Column 1 converted to unit terms              3.52    4.26      3.75
6  New assignments of existing unit terms        2.69    0.95      1.82
7  Assignments of new unit terms                 0.91    0.43      0.67
8  Unit term assignments (5 + 6 + 7)             6.88    5.64      6.24

It is interesting to note that while the DSC reports have more subject headings and more converted unit terms per report than do TID reports (2.50 to 1.61 and 4.26 to 3.28, respectively), they have fewer unit terms after a review of titles and abstracts (5.64 to 6.88). As an explanation of this situation, we conjecture that it is probable that the DSC policy of assigning headings liberally assures a better conversion to a coordinate index than does the TID policy of restricting the assignment of headings, but that the TID abstracts are more informative for indexing purposes than the DSC abstracts.

If a search of a subject catalog is considered from the viewpoint of an average TID report, it will be found entered under 1.61 subject headings and access to it will be provided under 1.58 cross references, or a total of 3.19 access points, compared to 3.52 entries when the same subject headings are converted to unit terms.

A similar comparison for DSC cards shows 2.50 subject headings plus 2.12 cross references per report, or a total of 4.62 access points per report compared to 4.26 entries per report when the same subject headings are converted to unit terms. If the pattern of the TID cards had been repeated, there should have been five or more unit terms per DSC report, rather than 4.26. An examination of the subject headings on the DSC cards reveals why there are fewer unit terms than access points, for many of the reports are assigned overlapping headings with certain words in common which are used only once in converting to unit terms, for example, Meteorological equipment and Meteorology--Research, in which four words (or four access points when cross references are included) reduce to three unit terms:

1. Meteorology; Meteorological (on one card)
2. Equipment
3. Research

Since the number of access points is equal for all practical purposes to the number of unit terms for both catalogs, it might be assumed that a coordinate index whose terms are converted directly from subject headings offers no advantage in reference use over a subject heading catalog, but this assumption is incorrect for these reasons:

Coordinate Index
1. Reports are listed on all cards consulted, for there are no cross references and no subordination of words.
2. Unit terms can be freely combined in the searching process, thus providing combinations to meet each searcher's need, i.e., more generic or more specific searches.
3. The searcher is certain that he has access to all reports listed under a single word.

Subject Heading Catalog
1. Reports are listed under subject headings only (just over half of the access points) and not on the cross references (the remainder).
2. Combinations of words are frozen because of the use of multiple term subject headings and cross references.
3. Because no cross reference system includes all permutations of words in the headings, the searcher is never certain he has access to all reports to which a word applies.

The searcher is interested in how many reports can be provided to meet his particular need with the least effort and time, rather than in the number of access points or unit terms per report. An extensive comparison of subject catalogs and coordinate indexes for reference use is planned, but until the statistics are available, the value of converting subject headings to unit terms can be measured only as shown above, although demonstrations performed with the samples indicate that the reference advantage of this level of coordinate indexing is considerable.

The value of improving the level of coordinate indexing by considering titles and abstracts of reports in addition to converting subject headings has been demonstrated, however, in unit terms per report. If it is assumed that the average of 6.88 unit terms per report is the optimum for coordinate indexing of the 100 TID reports (and here we recognize that all cataloging and indexing are subjective accomplishments), then 2.69 terms per report of this total are assignments of unit terms already used in the previous sample; in other words, unit terms under which the searcher would expect to find reports but under which he would not find them in a subject heading catalog or in a coordinate index prepared by converting subject heading assignments. The same condition applies to 0.95 unit terms out of the average total of 5.64 unit terms for the 100 DSC reports. New unit terms were needed for both sets of reports: 0.91 unit terms per TID report and 0.43 unit terms per DSC report. Thus the review doubled the unit terms used for the TID reports (from 3.28 to 6.88) and increased those for the DSC reports by one-third (4.26 to 5.64), and these figures are a measure of the superiority of this level of coordinate indexing over subject headings.
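The arithmetic behind the closing comparison can be retraced with a few lines of Python. This is only a sketch using the per-report averages quoted in the paragraph above; the key names and the function are hypothetical, and `Decimal` is used simply to keep the two-place figures exact.

```python
from decimal import Decimal

# Per-report averages quoted in the closing paragraph; key names are illustrative.
FIGURES = {
    "TID": {"converted": "3.28", "existing": "2.69", "new": "0.91"},
    "DSC": {"converted": "4.26", "existing": "0.95", "new": "0.43"},
}


def review_effect(row):
    """Return total unit terms per report and the growth factor over conversion alone."""
    converted = Decimal(row["converted"])
    total = converted + Decimal(row["existing"]) + Decimal(row["new"])
    return total, float(total / converted)


effects = {name: review_effect(row) for name, row in FIGURES.items()}
for name, (total, factor) in effects.items():
    print(f"{name}: {total} unit terms per report, {factor:.2f} times the converted figure")
```

With these inputs the totals come to 6.88 and 5.64 unit terms per report, and the growth factors to roughly 2.10 for TID and 1.32 for DSC, which is consistent with the statements that the review doubled the TID figure and raised the DSC figure by about one-third.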