COMPUTER-BASED SUBJECT AUTHORITY FILES AT THE UNIVERSITY OF MINNESOTA LIBRARIES

Audrey N. GROSCH: University of Minnesota Libraries

A computer-based system to produce listings of topical subject terms and geographically subdivided terms is described. The system files and their associated listings are called the Subject Authority File (SAF) and the Geographic Authority File (GAF). Conversion, operation, problems, and costs of the system are presented. Details of the optical scanning conversion, with illustrations, show the relative ease of the technique for simple upper case data files. Program and data characteristics are illustrated with record layouts and sample listings.

INTRODUCTION

As a corollary to the creation and maintenance of large library catalogs, it has become necessary for academic or research libraries to maintain authority files of various kinds, such as author name, subject, and series. In a manual cataloging system these files serve to unravel the mysteries of form, meaning, and usage for the cataloger. They also serve as a control to help avoid conflicts, synonyms, or overlapping subjects. With a system of decentralized catalogs using different subject entries from a system's union catalog, some method must be derived to preserve such usage for the cataloger. A computer-based subject authority file provides that means.

In January 1970, the University of Minnesota Libraries began studying the relationship of subject authority files to both the present manual cataloging system and to a planned mechanized system employing the MARC II format for storage of bibliographic data. Minnesota's subject authority files are divided into two distinct logical files: Subject Authority and Geographic Authority Subdivisions. The Subject Authority File (SAF) contains all topical subject heading terms and their subdivisions down to nine levels of term, and Geographic main headings, i.e.
U.S. with nongeographic subdivisions. Nonterm data such as origin, usage notes, "libraries using," and other kinds of information are contained in the SAF. The Geographic Authority File (GAF) contains topical headings found in the SAF, with geographical place names as subdivisions and indications of direct or indirect terms in geographic heading assignment. Nonterm data similar to those in the SAF are also found in the GAF.

Immediate and long range benefits, together with the cost of conversion versus photocopying, showed that greater flexibility would be achieved through conversion to machine-readable form. Some of the benefits were: 1) immediate assistance to the libraries performing their own decentralized cataloging, while providing cards to the union catalog at Minnesota; 2) future assistance to our coordinate campus libraries should they wish to increase compatibility of their catalogs with the Minneapolis Campus union catalog; 3) future provision of a machine-readable authority to enable linking of various subject vocabularies together for an on-line controlled vocabulary subject searching system.

When the decision had been made to convert the files to machine-readable form, we tried to determine what others had done regarding this application. Although much previous work has been done on subject analysis, cataloging, vocabulary construction, and mechanization of bibliographic processes, very few designers have developed systems to support thesauri or subject heading files. In 1967 Heald (1) reported on the system for TEST-Thesaurus of Engineering and Scientific Terms. The following year Hammond of Aries Corp. (2) described the NASA Thesaurus, and Way (3) outlined in detail the Rand Corporation Library Subject Heading Authority List (SHAL), mechanized using punch cards and computer in 1967.
Mount and Kollin (4) described the use of the computer in the updating and revision of the subject heading list for Applied Science and Technology Index. Of course several famous information systems use mechanized thesauri, among them the National Library of Medicine's MEDLARS System with its MeSH vocabulary and the Department of Defense DDC Descriptors. In addition, the seventh edition of the Library of Congress Subject Headings utilized computer photocomposition.

Another reported work on subject headings in a mechanized system is that of the Library of Congress in which a MARC record for subject headings is discussed. Avram et al. (5) give examples of this record and describe the system now under development at LC. Unfortunately for us, we completed the work herein reported in 1971, thereby not structuring our file to MARC specifications. We mention this work here, as our file will lend itself to such a conversion, should we later require it.

Journal of Library Automation Vol. 5/4 December, 1972

DATA PREPARATION AND FILE CONVERSION

The SAF and GAF files comprised 59 catalog card drawers of information (about 115,000 lines of typed data). Each file would be converted and maintained separately, but would use the same system design and processing programs. At a later stage, merging the files would be considered. Moreover, the cost of the system would be lower if one design could be used for both files.

Two conversion methods were evaluated, keypunching and optical scanning. Other methods would have lent themselves to this conversion, such as IBM Magnetic Tape Selectric Typewriters (MT/ST) or an on-line system such as IBM's Administrative Terminal System (ATS). However, because of the relatively small file size (under six million characters) and a desire for as economical a conversion as possible, only keypunching and optical scanning input were seriously considered.
MT/ST typewriters were ruled out because of cost and lack of locally available tape conversion equipment. Keypunching was considered too slow in relation to typing. Our assessment of optical scanning as the cheapest method was confirmed after completion of the conversion phase of the project, with an estimated $1,800 in total savings over keypunching.

Files were converted without intermediate coding, permitting the typists to transcribe directly from the subject and geographic authority card files. The data preparation was done by the Catalog Division's subject authority coordinator. This librarian edited the file to eliminate ambiguities before the typist received the drawer. Otherwise, except for a quick check of the typist's finished sheets, the data were not examined again until after they were in machine-readable form on tape. This procedure worked very smoothly, and caused the staff of the Catalog Division little inconvenience during the conversion phase. Figure 1 shows the flow of the complete conversion activity.

Equipment used for preparation of the data consisted of two IBM Selectric typewriters Model 715 with carbon ribbon, dual cam inhibitor, and 065 typing element (Rabinow font). One machine had a pin feed platen. This feature later proved to make no discernible difference in the quality of the typed output, but some typists stated that they preferred the pin feed platen over the standard platen.

The Control Data 915 page reader with a CDC 8092 Teleprogrammer operating under GRASP III software was used for the conversion. Block time was rented at a commercial service bureau for $50.00 per hour. Library Systems Division personnel operated the system during these time periods. Control Data provided a system manual and debugging time in order to prepare for our operation during conversion.
However, little assistance in handling the application was actually received from the Control Data personnel, who were familiar only with business data processing.

[Fig. 1. Conversion Process for SAF and GAF. Flowchart not reproduced; box labels legible in the source include "Scanned sheets", "CDC 915 processing", "CDC 3300 list tape for error checking", "CDC 3300 update, convert raw tape to intermediate format", and "Print SAF & GAF master lists".]

A stock form, called the CDC 915 page reader form, procured from a local forms vendor, was used. This form has a typing area of 9~" x 13" marked off by faint blue lines. Top and bottom alignment areas are provided to check for line skew. Scanner throughput is increased by use of the longest permissible form with as much single line data as possible.

[Fig. 2. SAF Input Typing Sample Page. Typed sample not reproduced; legible entries include "T SOCIAL SCIENCE RESEARCH$SGF n", "T ESSAYS n", "T PERIODICALS n", "D CL, E n", "T SOCIAL SCIENCES n", "N DO NOT SUBDIVIDE FURTHER WITHOUT APPROVAL - n", "T ABSTRACTS n", and "T PERIODICALS n".]

Figure 2 shows a portion of a typed page from the SAF. Line 1 is the format recognition line, which was repeated on each sheet as a precaution against its loss by the optical scanner program during processing. Such a loss of the format recognition line would have forced complete rerunning of the job. The remaining lines show the various data elements identified by tag characters. The complete set of tag characters is shown in Table 1. The end of page symbol # is used on pages which terminate before the last physical line of the page to increase scanner throughput.
The end-of-line symbol, which appears as n in the typed samples, terminates each line and serves the same speed-increasing function.

Table 1. Conversion Identification Tags

Tag   Description
T     Term
D     Departmental catalog in which the term is used
N     Scope note or general note on use of the term
C     Continuation line
R     Reference from which the term was verified if other than LC
Z     Followed by S = See; by SA = See also; by X = See tracing; by XX = See also tracing
X     Geographic authority file cross reference tracing (implied)

Table 2. Term Subfield Indicators

Indicator   Description
$SGF        Term also entered in GAF
$DIR        Direct
$IND        Indirect
$MNU        Local University of Minnesota subject term
$PROV       Provisional term
$MeSH       Medical Subject heading term
$NAL        National Agriculture Library term

Indentation spaces serve as a flag to the conversion program to show the level of the term or other data element. This technique decreased the number of characters to be typed, yet level errors were easy to detect during proofreading. Subfield indicators for certain nonterm data completed the input format used during conversion. Table 2 describes these indicators and the meaning of each subfield.

The GAF typed input is shown in Figure 3. Note the similarity between the two files, yet the presence of the variant treatment of an older term (SOCIAL SURVEYS IN) from a newer term (SOCIAL SCIENCES). As a result the Catalog Division has now changed these old form terms to conform with Library of Congress subject heading forms.

[Fig. 3. GAF Input Typing Sample Page. Typed sample not reproduced; legible entries include "T SOCIAL SCIENCES n", "T HISTORY$DIR n", "T BYZANTINE EMPIRE n", "T SOURCES n", "D ART n", "T SOCIAL SURVEYS IN$DIR n", "X AFRICA, SOUTH n", "X ALABAMA n", a mistyped "BRYNMAWRM?, WALES" line, "X BRYNMAWR, WALES n", and the end of page symbol #.]

During typing, error correction by typists was facilitated by the use of three special characters:

[symbol illegible] - Delete line
? - Delete preceding character
t - Type over a character to delete the character without inserting blanks.
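The input conventions described above (a leading tag character, indentation to show level, $ subfield indicators, and the delete-preceding-character mark) can be illustrated with a short modern sketch. This is not the conversion program used in the project, which ran on the CDC equipment; it is a hypothetical Python illustration. The names parse_line, apply_deletes, and InputLine are invented here, the end-of-line symbol is assumed to have been removed already, and the rule that each space after the tag counts as one level is an assumption, since the exact column convention cannot be read from the samples.

```python
from dataclasses import dataclass, field

# Tag characters from Table 1 and subfield indicators from Table 2.
TAGS = {"T", "D", "N", "C", "R", "Z", "X"}
INDICATORS = {"SGF", "DIR", "IND", "MNU", "PROV", "MESH", "NAL"}


@dataclass
class InputLine:
    tag: str
    level: int
    text: str
    indicators: list = field(default_factory=list)


def apply_deletes(raw: str, delete_char: str = "?") -> str:
    """Drop each delete mark together with the character typed before it."""
    kept = []
    for ch in raw:
        if ch == delete_char:
            if kept:
                kept.pop()
        else:
            kept.append(ch)
    return "".join(kept)


def parse_line(raw: str) -> InputLine:
    """Split one typed line into tag, level, text, and subfield indicators."""
    line = apply_deletes(raw.rstrip())
    if not line or line[0] not in TAGS:
        raise ValueError(f"unknown tag in line: {raw!r}")
    rest = line[1:]
    body = rest.lstrip(" ")
    # ASSUMPTION: each space between the tag and the text is one level.
    level = len(rest) - len(body)
    text, *marks = body.split("$")
    indicators = [m.strip().upper() for m in marks]
    unknown = [m for m in indicators if m not in INDICATORS]
    if unknown:
        raise ValueError(f"unknown subfield indicator(s): {unknown}")
    return InputLine(line[0], level, text.strip(), indicators)
```

Under these assumptions, a line typed as "T  HISTORY$DIR" yields a level-2 term HISTORY flagged as direct, and a line containing "BRYNMAWRM?, WALES" is read as "BRYNMAWR, WALES".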
A program is typed on an optical scanning sheet in an assembly level language for the CDC 915 page reader. It is then assembled into object code which operates the page reader and its controlling computer. An example of the program used in this conversion is shown in Table 3.

Line 1 of this program defines the input-output and control characters together with a coordinate to terminate reading of a line if data are not found on the line. It also defines the special characters described above for error correction, end of line, etc. Line 2 specifies that a stock form (not preprinted) is to be read, giving the left-most and right-most character positions and maximum number of lines per page together with the first line number to establish the scanning area coordinates. These coordinates are expressed as three digit octal values determined through use of a forms grid and ruler. Line 3 describes the tape record format including the field size, the blank fill character, left or right justification, and alphanumeric or numeric only data field content. Line 4 instructs the 8092 teleprogrammer unit to convert certain characters to octal values matching the CDC 3300 computer system which are not identical to the normal 915 page reader octal values. The final E terminates reading of the program sheet. From this sheet GRASP III compiles an object program which is stored in the 8092 teleprogrammer memory, enabling scanner operation.

SYSTEM DESCRIPTION AND OPERATION

The raw data tape created during optical scanning was used to build the SAF and GAF data files. The magnetic tape coding is binary (odd parity) using 800 bpi density. A fixed length record of 20 characters is used with 100 records per physical block. As many 20 character Format C (continuation of data) records are used as needed to achieve variable length logical records. Table 4 shows the three record formats used.

Table 3. CDC 915 Program for Raw Data Tape Creation

ICTLIBLK,DSICAN, ? IDLT,tiEOL,niEOP,#IfMT,wlww
ISTKID27,350,116,004lww
E

Table 4. SAF and GAF Record Formats

Format A - Control Record

Char. Pos.   Contents                                   Values
1            Record type
2-5          Page number                                1-9999
6            Column number                              1-3
7-14         File creation date                         MM-DD-YY
15           File identification
               Subj. Auth. (SAF)                        S
               Geog. Auth. (GAF)                        G
16-18        Columns used (123 standard)                123, 121, 111, 131
19-20        Number of lines per page (75 standard)     80 max.

Format B - Data Record (initial)

Char. Pos.   Contents                                   Values
1            Record type
               Term                                     T
               Reference term (GAF only)                X
               Reference                                R
               Dept. Library                            D
               See                                      1
               See also                                 2
               See from                                 3
               See also from                            4
2            Level number                               1-7
3            Sort exception code
               Numeric exception                        N
               Hyphen exception                         H
               Substitution excep.                      S
               U.S. abbreviation                        U
               Gt. Brit. abbrev.                        [illegible]
4            Qualification code (6 bit binary)
               SGF (See Geographic)                     1
               DIR (Direct entry)                       2
               IND (Indirect ent.)                      4
               PRO (Provisional entry)                  8
               MNU (Minnesota term)                     16
               MESH (Medical subj. heading term)        32
               NAL (National Agri. Library term)        48
             Combinations of these terms are possible. They are stored by
             adding the above values together, i.e. 17 = MNU/SGF.
5-6          Number of display lines for item
7-20         First 14 characters of item

Format C - Data Record (continuation)

Char. Pos.   Contents                                   Values
1            Record type                                blank or [illegible]
2-20         Continuation of item

To change or modify the file, keypunched cards are used; one transaction card is used for each correction for both SAF and GAF files. Table 5 shows the layout of this card.

Table 5. SAF and GAF Transaction Card

Column   Contents                     Values
1-4      Page of master list          1-9999
5        Column of master list        1-3
6-7      Line of master list          1-80
8-9      Sequence number              00-99 or blank
10       Deck number                  0-9
11       Continuation number          blank or 0-9
12       Level number                 1-7
13       Transaction type
           Add                        A
           Cancel                     C
           Modify                     M
14-15    Record type
           Term                       T
           Reference term (GAF)       XT
           Reference                  R
           Departmental Library       D
           See                        S
           See also                   SA
           See from                   X
           See also from              XX
16-80    Data

Catalogers in the Wilson Library (the University's largest and central library) and the Bio-Medical Library use a 3 x 5 card as an input form. This card is filled in and transmitted to the librarian acting as subject coordinator. Then the information is keypunched and prepared for submission to an updating run. The normal schedule as originally planned was to run a cumulative supplement monthly, with a quarterly full updating of the file. However, this schedule has been flexible as the transaction volume has varied considerably from early estimates. Currently updates are run quarterly to produce supplements, with a full listing annually. These updates vary from 5,000 to 14,000 transactions.

The program for the system is written in COBOL for the CDC 3300 computer operating under the MASTER operating system. Upon demand the program performs four basic functions on the data files: 1) creation of a cumulative supplement list from a transaction card deck; 2) updating of the tape files from the transaction card deck; 3) preparation of master lists either during the update process or independently; and 4) querying the file on the basis of user defined search terms.

Parameter cards control the options available when supplements or master lists are to be run.
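Both the supplement and update functions start from the transaction deck, so it may help to see the Table 5 card layout expressed as a fixed-column parse. The following is a hypothetical Python sketch, not the COBOL program used on the CDC 3300. The name parse_card and the dictionary keys are invented for illustration, and the handling of blank or zero-filled numeric columns is an assumption.

```python
# Column ranges (1-based, inclusive) as listed in Table 5.
CARD_FIELDS = [
    ("page", 1, 4),
    ("column", 5, 5),
    ("line", 6, 7),
    ("sequence", 8, 9),
    ("deck", 10, 10),
    ("continuation", 11, 11),
    ("level", 12, 12),
    ("transaction_type", 13, 13),
    ("record_type", 14, 15),
    ("data", 16, 80),
]
TRANSACTION_TYPES = {"A": "add", "C": "cancel", "M": "modify"}
RECORD_TYPES = {"T", "XT", "R", "D", "S", "SA", "X", "XX"}
# Numeric fields with the value ranges given in Table 5.
RANGES = {"page": (1, 9999), "column": (1, 3), "line": (1, 80), "level": (1, 7)}


def parse_card(card: str) -> dict:
    """Slice an 80-column transaction card into named fields and validate them."""
    card = card.ljust(80)[:80]  # pad short cards, ignore anything past column 80
    rec = {name: card[start - 1:end].strip() for name, start, end in CARD_FIELDS}
    if rec["transaction_type"] not in TRANSACTION_TYPES:
        raise ValueError(f"bad transaction type: {rec['transaction_type']!r}")
    if rec["record_type"] not in RECORD_TYPES:
        raise ValueError(f"bad record type: {rec['record_type']!r}")
    for name, (low, high) in RANGES.items():
        # ASSUMPTION: numeric columns are digits, blank-padded or zero-filled.
        value = int(rec[name])
        if not low <= value <= high:
            raise ValueError(f"{name} out of range: {value}")
        rec[name] = value
    rec["action"] = TRANSACTION_TYPES[rec["transaction_type"]]
    return rec
```

A card adding a level-2 term at page 142, column 2, line 5 of the master list would, under these assumptions, be punched as "0142" + "2" + "05" + two blanks + "1" + one blank + "2" + "A" + "T " followed by the term text starting in column 16.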
The ACCEPT, DECK, LIST, ABORT, LINE, SPACE, COLUMN parameters provide control over cutoff for new supplement, transaction card list form, termination of job if the number of error cards exceeds a given value, number of lines per page of output, number of blank lines before and after each transaction on the supplement, and whether a single or double column supplement is to be produced. Figure 4 shows a sample from the SAF Supplement.

The updating phase of the program creates the new master file and produces an update error listing accompanied by a report on composition of the file by level number, kind of data, and logical/physical record counts.

The master list printout is also controlled through parameter cards. The LINE, COLUMN, SELECT options indicate the number of lines of data to be printed in each column, the number of columns per page, and which pages are to be listed. This latter feature permits supplying replacements for pages improperly printed or bound and suppression of printing when a program restart is necessary. Figure 5 shows the most commonly used Master List format.

The file query function is performed upon demand to assist in file revision, to change a term throughout the file, or for other special purposes. The search items can be composed of any and/or combinations of record types, record levels, qualification codes, sort exception codes, and key words or phrases. A keyword search is a character by character search of file items. Thus, by specifying a root word, all derivatives of the word formed by adding prefixes or suffixes will be identified. If these derivatives are not desired, a blank preceding and/or following the root word in the search key will prevent their display. However, the word will not be identified if it is [...]

[Fig. 4. Sample from the SAF Supplement. Listing not reproduced.]

[Fig. 5. Master List Format Using 3 Column Standard. Listing not reproduced.]

[...] expansion of the shelf list. Although file conversion took five months to complete, the program to operate the system was delayed because of termination of the programmer originally assigned to the project. Although the basic program features were ready in about 3-4 months, it was not until January of 1971 that the system was installed. During that year the staff gained experience in the system and cleansed the data of many ancient errors. By the end of the year, the system was an integral part of our Catalog Division support activities.

COSTS

As was pointed out previously, there was consideration given to photocopying the authority files to provide a duplicate set for the Bio-Medical Library. It was determined that this would cost $2,400 (60,000 cards @ $.04 each). This equalled the cost of the typing personnel and rental of optical scanning equipment. Moreover, there would have to be duplicate cards and filing to maintain both files, with no assurance that they would remain exact duplicates of one another. In our opinion the benefits of this computer-based system offset the additional cost over the photocopying approach.

Table 6. Conversion Costs

Item                                                   Cost
Senior clerk typists @ $2.40 (2 FTE for 3 mos.)        $1810.56
CDC 915 rental (20.1 hours @ $50 per hour)              1007.50
Typewriter purchase                                      532.70
Typewriter rental (2 mos.)                                60.00
Magnetic tape                                             74.00
CDC 915 forms                                            400.00
CDC 3300 computer time @ $95.00/hr.                     1411.45
Total                                                  $5296.21

To create these files completely cost $5,296.21 for all direct expenditures for clerical help, scanner time, typewriter purchase and rental, supplies, and CDC 3300 computer time.
Table 6 shows the breakdown of these costs. During the conversion and development phase, salaries of the systems personnel were absorbed by the library so that only these direct costs were charged to the project. Also, the library absorbed the Subject Coordinator's time for editing the file of cards prior to typing. Two senior clerk-typists at $2.40 per hour each were employed for three months full time to type the data.

Operating costs are borne by the library, which requires a half time librarian as Subject Coordinator and a student keypunch operator for 15-20 hours per week. The Systems Division provides program maintenance as required. Supplies and computer time require about $2,100 per year if quarterly full lists are used with monthly supplements.

Some idea of the relative processing economy can be shown by examining some typical running times on the computer. The sizes of the SAF and GAF files are respectively 4.35 and 1.75 million characters. A typical supplement with 12,000 transactions takes 45 minutes to print on the CDC 3300 equipped with a 1000 line-per-minute printer for either SAF or GAF. Printing of a full master list for the SAF and GAF takes 1 hour 25 minutes and 45 minutes respectively. Updating the files takes about 1 hour 40 minutes for 12,000 transactions. A query of the file takes about 30 minutes. Current computer and channel charges are $95 per hour.

GENERAL OBSERVATIONS

Our experience with this project has shown us the high reliability of the CDC 915 page reader as a conversion device. Less than 1 percent of the total amount of data the page reader scanned was rejected. Those errors rejected were easily spotted and retyped. No scanner-produced errors were found in the data; however, there was an occasional failure to pick up spaces when more than three occurred together. These errors were very infrequent and were discovered in the raw data proofreading.
These errors were corrected and, after the final output file was generated, we again checked for similar conditions and found everything in order with regard to term level indication.

With an upper-case file such as this, use of the CDC 915 is simple and easily accomplished. However, the library should not rely upon a scanner manufacturer or the installation where a unit is being leased to provide all the assistance required. The library will have to design its application and become familiar with the equipment in order to achieve best results.

All optical scanning usage requires that certain care be exercised in the typing operation. Lines must not be skewed, characters must not be blurred, and length of line can be critical even though the scan optics may be opened and closed over longer lines than are intended to be typed. Further, it is imperative that the paper used in the scanning operation meet specifications for use with the chosen scanner. Our experience indicates that a pin feed platen is not necessary to maintain forms alignment if typists use care in initial alignment.

We experienced some operational problems when we actually tried our program on the page reader. Initially, the system would not compile our program. The cause was not a catastrophic error in our program, but rather a hardware fault in the 8092 teleprogrammer. In trying to read the program onto tape after compilation, the system consistently failed. We finally gave up trying and recompiled from the scanned input sheet at the beginning of each conversion run. No one at the data center could explain our failure to load, but we must assume an intermittent or undetected hardware problem.

During the job run it was imperative that the scanner be watched closely, as occasionally it would stop reading or fail to feed a sheet. These were not difficult problems but did require occasional attention by the center's customer engineer.
On one occasion the scanner failed during our run, and we could not achieve a timely repair. We rescheduled for the next week and then experienced no problem.

After our experiences with the 915 page reader at the data center we felt that we knew as much about the equipment as any of the operators we met while doing our production runs. We would not hesitate to use the page reader again for a simple file conversion, and would continue to handle the operation ourselves as the center operators were no better able to run our job.

ACKNOWLEDGMENTS

The author wishes to thank Mr. Eugene D. Lourey for developing the program for this system. Mr. Curt Herbert deserves recognition for the preliminary design for the system and initiating the optical scanning activities. Also, Mr. Carl O. Sandberg, who was responsible for the many details of the conversion portion and who now maintains these programs, contributed many significant design parameters. The staff of the Catalog Division, too, deserve our gratitude for their file cleansing and data editing during and after conversion.

REFERENCES

1. J. Heston Heald, The Making of TEST-Thesaurus of Engineering Scientific Terms. (Final Report of Project LEX, [U.S. Office of Naval Research: Nov. 1967] AD 661,001).
2. William Hammond, Construction of the NASA Thesaurus, Computer Processing Support, Final Report. (Aries Corp., 1968) N 68-28811.
3. William Way, "Subject Heading Authority List, Computer Prepared," American Documentation 19: 188-99, (April 1968).
4. Ellis Mount and Richard Kollin, "Analysis and Revision of Subject Headings for Applied Science and Technology Index," Special Libraries 60: 639-46, (Dec. 1969).
5. Henriette D. Avram, Lenore S. Maruyama, and John C. Rather, "Automation Activities in the Processing Department of the Library of Congress," Library Resources and Technical Services 16: 195-239, (Spring 1972).