A MARC II-BASED PROGRAM FOR RETRIEVAL AND DISSEMINATION

Georg R. MAUERHOFF: Head, Tape Services, National Science Library and Richard G. SMITH: Analyst/Programmer, Research and Planning Branch, National Library, Ottawa, Canada (Formerly with Library and Computation Center, University of Saskatchewan)

Journal of Library Automation Vol. 4/3 September, 1971

Subscriptions to the Library of Congress' MARC tapes number approximately sixty. The uses to which the weekly tapes have been put have been minimal in the area of Selective Dissemination of Information (SDI) and current awareness. This paper reviews work that has been performed on batched retrieval/dissemination and provides a description of a highly flexible cooperative SDI system developed by the Library, University of Saskatchewan, and the National Science Library. The system will permit searching over all subject areas represented by the English language monographic literature on MARC.

INTRODUCTION

With subscriptions to the Library of Congress' MARC II tapes numbering approximately sixty (1), the utilization of standardized bibliographic information in machine readable form has reached an all-time high. Numerous subscribers have written programs to access the tapes in order to produce acquisitions and cataloging products, but, unfortunately, the search techniques in these programs have been limited to searching of fixed-length information, such as LC card numbers, Standard Book Numbers (SBN's) and compression codes. Accelerated developments of searching mechanisms have been made by those involved with on-line bibliographic systems, but work on MARC information retrieval in the batch mode has been evolving very slowly. That is, the proffering of an assortment of remedies for one of the oldest library problems, that of current awareness and Selective Dissemination of Information (SDI) using MARC, has not received the emphasis it should.

The Library of the University of Saskatchewan has been utilizing the MARC tapes since their weekly distribution on 1 April 1969, with areas of usefulness so far having been restricted by the kinds of searching methods available. Concern has therefore been shown for a far greater exploitation of the MARC records. Since no algorithms other than time decay have been established locally for limiting the size of the file to items which have a high degree of usefulness, and since the cost of updating and storing the weekly files has to be incurred, it is only fitting that as many bibliographic records as possible be monitored and disseminated to those sections of the University where they can be most effectively used. A program package for current awareness/SDI is the most likely method for achieving this.

Collaborative efforts are now the only realistic means of exploiting MARC. Costs can be spread over a large user group, and at the same time personalized services are assured to those taking part. It is for this reason that the Office of Technical Services (OTS), Library, University of Saskatchewan, has been cooperating with the National Science Library (NSL), National Research Council of Canada, on the development of such a current awareness/dissemination system. Known by the acronym SELDOM (Selective Dissemination of MARC), the program represents cooperation in the true sense of the word, in that the OTS's experiences with MARC are being coupled with NSL's expertise in nation-wide SDI. This paper will describe in detail the evolution of SELDOM, with a future paper to document user reaction to the SELDOM program.

HISTORY

The University of Saskatchewan is not alone in the investigation of MARC-based retrieval/dissemination programs. The Oklahoma Department of Libraries, under the coordination of K. J. Bierman (2, 3, 4, 5), has been operating a weekly MARC SDI service since February of 1970 and found its reception overwhelming.
Over twenty user groups in the United States and Canada are presently experimenting with this current awareness service in various subject fields, using the Dewey Decimal and the Library of Congress classification numbers as search keys.

Oklahoma's efforts followed the study by William J. Studer (6) and the Aerospace Research Applications Center (ARAC) at Indiana University. Studer's hypothesis was "that an SDI system concerned with book-type material would be of significant benefit to faculty in keeping them alerted to what is being published in their fields of interest - especially faculty in the non-technical areas where books are probably still as vital, if not more important, a medium of information and ideas as periodical and report literature (7)". In his experiment, Studer translated participants' interests into profiles consisting of weighted Library of Congress subject headings and classification numbers.

Henriette Avram (8) of the Library of Congress' MARC Development Office reported on information retrieval using the MARC Retriever, a modification of Programmatics Inc.'s system known as AEGIS. Regarded as "essentially a research tool that should be implemented as inexpensively as possible," the MARC Retriever is tape based and able to accept almost any kind of bibliographic query. Unfortunately, it is only operational at the Library of Congress. Along similar lines are Syracuse University's L.C. MARC on MOLDS and LEEP projects (9, 10, 11, 12). The interactive retrieval capabilities, which are used in both batch and on-line modes, permit a variety of queries over their MARC data bases.

Additional projects reporting on the subject approach to MARC tapes in a batch environment are not numerous. Dohn Martin (13) at the Washington University School of Medicine describes a searching method by L.C. classification numbers, in which a PL/1 program is used to produce selection lists for the medical library.
This is along the same lines as the work reported by J. G. Veenstra (14) of the University of Florida, D. L. Weisbrod (15) of the Yale University Library, and F. M. Palmer (16) of Harvard University Library. In Sweden, Bjorn Tell (17) has run a MARC II test tape in his integrated information retrieval system called ABACUS, while in Edmonton, Canada, Doreen Heaps (18) reports on author and title searches of MARC tapes in a Chemical Titles format. In England, related research is being contemplated by F. H. Ayres (19) for BNB MARC tapes. In Ireland (20), also, plans are in the offing for SDI services based on BNB MARC tapes, while in the United States, the first commercial venture is underway by Richard Abel and Company (21), which is contemplating selective dissemination of announcements.

BACKGROUND

The National Science Library has been providing an SDI service for Canada's Scientific and Technical Information (STI) community since April, 1969, spinning a variety of machine readable indexing and abstracting services on a regular basis. A questionnaire (22) was sent out by CAN/SDI Project officials in May 1970, asking its subscribership to suggest where subject expansion should take place in the future. Although the responses emphasized the life sciences, e.g. Biological Abstracts' BA Previews and Medlars, the NSL was nevertheless quite enthusiastic about adding the Library of Congress' MARC II tapes to their present SDI service, especially if the project programming could be accomplished elsewhere. Twenty-one subscribers responded to the MARC II tapes, indicating the existence of a good user group, although not one of top priority. The University of Saskatchewan Library expressed a willingness to perform the systems work and project programming, which was estimated to require less than four man-months, making SELDOM operational by February 1971.

THE SELDOM PROGRAM

Facilities and Programming Languages

In order for the OTS to make use of the PL/1 and Assembler programs, an IBM S360 computer configuration consisting of at least 100K memory and a PL/1 compiler was deemed necessary. This presented no problem because the Library had at its disposal an IBM S360/50 with 256K bytes of memory. Additional hardware specifications include four tape drives, a 2314 disk and two 1403 printers, one with a TN option. The latter is soon to be replaced with the ALA approved library print train. Now, however, because of the addition of Large Core Storage (LCS), large bibliographic files such as MARC will be processed much more easily. Release 19 of OS MFT was also implemented in order to effectively utilize this additional million bytes of LCS memory. This more than modest memory has great utility, although serious investigations of automated library systems such as this one can take place even with small memories.

As can be imagined, the switchover to Release 19 came at an inopportune time as far as the SELDOM programs were concerned. Implementation of the new release affected the scheduling and turn-around times.

The SELDOM Record Format

Several years ago, the National Science Library decided to adopt a standard MARC II-like format and design programs to convert suppliers' tapes to this standard format. When a decision is made to add a new tape service, such as Biological Abstracts' BA-Previews, to the present inventory of CAN/SDI tapes, the NSL personnel select those bibliographic items which will find use in an SDI environment. Selected items are then pulled from the input tape by the conversion program, and structured into an NSL format. This then was the first of many tasks facing the OTS: determining which fields should be utilized from the LC MARC tape for searching and printing.
Of approximately fifty MARC tags, fixed and variable, only 32 contain information that might be of interest to users of the system for searching. These tags, however, can be grouped into analytical units, i.e. units of like information. Arranged in six term types, they are: personal name, corporate name, classification, title, geographic area code, and date. The abbreviations for the term types are P, B, K, T, G, and D respectively. Users then will be able to request information from the system in many ways, whether it be for a title term, or a combination of categories such as classification number and geographic area code.

The twenty-three fields and five subfields chosen, along with their respective analytics, are shown in Table 1, where entries in [ ] are not searched and entries marked ° are OTS calculations. Percentages of occurrence, which was the criterion used for selection of the tags, are also indicated in the table. All the 500 tags were omitted because NSL and the OTS do not wish to search abstracts, annotations, or bibliographic notes at this time. Where frequencies were not available from the Library of Congress' publication entitled Format Recognition Process for MARC Records: a Logical Design, the OTS conducted its own counts over a tape selected at random. The tape chosen (Volume 2, Number 23) for the counts contained 881 records.

Table 1. Search Field Definitions

Search Key             Field/Subfield    % of Occurrence Per Record
Personal Name (P)      100               84.7
                       [400]             <0.1
                       600               12.1
                       700               22.4
                       [800]             0
Corporate Name (B)     110               11.7
                       260$B             97.9
                       410$A             2.4
                       610               4.8
                       710               11.1
                       810$A             4.6
Title (T)              111               1.5
                       130               0.2
                       240               4.3°
                       [241]             0.1°
                       245               100.0
                       410$T             2.4
                       [411]             <0.1
                       440               6.0
                       [611]             0.1
                       630               0.9
                       650               95.9
                       651               17.5
                       711               0.2
                       730               0.8
                       740               4.1°
                       810$T             4.6
                       [811]             0.1
                       840               0.5
Classification (K)     050               105.1
                       051               0.9°
                       082               95.8
Geographic Code (G)    043               34.0°
Date (D)               009 (i.e. 008)    100.0

The fact that only 28 data elements were chosen for searching purposes proved highly useful, since the National Science Library's search module was designed, for the sake of efficiency, to accommodate a maximum of 32 search field definitions. The program can handle this many fields, but on the average it makes use of approximately twelve fields per record. There may be occasions, however, when as few as seven or as many as twenty-two directory entries will be handled, not counting subfields. Table 2 is a distribution of directory entries for the sample MARC II file tape. The mean of the distribution of entries is 13, and the median 12.

Table 2. Distribution of Directory Entries

# of dir. entries    # Records    %
≤6                   0            0
7                    2            .23
8                    0            0
9                    28           3.18
10                   74           8.40
11                   116          13.17
12                   160          18.16
13                   149          16.91
14                   123          13.96
15                   106          12.03
16                   69           7.83
17                   25           2.84
18                   19           2.16
19                   5            .57
20                   3            .34
21                   1            .11
22                   1            .11
≥23                  0            0
Total                881          100.00

At the same time that decisions were being made regarding the inclusion of certain search fields, print field definitions were structured. Although the programs can accommodate any number of directory items, only 31 are required for satisfactory and meaningful output. The analytics for these definitions make up Table 3, where entries marked ° are OTS calculations. Frequency statistics are again included.

Description of Programs

The SELDOM software is comprised of four modules. These modules (A, B, C, D) are easily identified in the system flowchart (Figure 1) and are: "A" the translation and conversion of MARC; "B" the searching of files; "C" the outputting of the search results, and "D" the compiling of profiles. Two IBM utility programs are also used.

Table 3. Print Field Definitions

Definition                    % of Occurrence Per Record
Term(s) Causing Retrieval
Main Entry                    98.6
Title Statement               100.0
Edition Statement             4.1
Imprint                       100.0
Collation Statement           100.0
Series Statement/Notes        13.6
Bibliographic Price           39.7°
Subject Added Entries         131.6
LC Card Number                100.0
Profile Number
Expression Number
Threshold Weight
Weight
Source
Form of Content               53.2°
Language                      100.0
LC Class Number               105.1°
Dewey Decimal Number          95.8°
ISBN                          39.7°

Translation and Conversion Program (LCONV)

The conversion program, called LCONV, converts the weekly MARC tape into a SELDOM MARC II-like format tape. The input records see the following changes: "%" used as field terminator, "$" used as subfield delimiter, "@" used as record terminator, upper- and lower-case ASCII translated to upper case EBCDIC, diacritics removed, text compressed, and unromanized characters that can't be approximated removed.

The program is driven by two tables, one of which consists of the MARC tags in which the OTS is interested, and the other, the processes to which the selected tags will be subjected. Currently, all tags can be handled by one of four processes:

1) Process 1 extracts the language and the form of content code from MARC tag 008, and creates a new field 008 consisting of only these two units. Instead of a one-character code for form of content, a four-letter abbreviation delimited by "$A" is used. Language of publication is delimited by "$B". Process 1 also extracts the first publication date from the original tag 008, and sets up a new field, tagged 009 and delimited "$A".

2) Process 2 handles the Library of Congress (051, 052) and Dewey Decimal Classification (082). It utilizes only the first subfield, compresses out slashes, and limits the length of these fields to 20 characters.

3) The geographic area code (043) and imprint (260) are routed through a third process which retains the MARC subfield delimiters.
Subfield delimiters are retained to narrow the object field and reduce search time.

4) All other tags are routed through process 4, which removes subfield delimiters and heads up the entire field with "$A". Narrowing down the object field is not desirable for fields input to this process.

The conversion program also outputs for each record a field identified by 035, a MARC II tag for local system number. This field contains data base code (R for MARC), volume and issue number (extracted from MARC tape label), and the Library of Congress card number truncated to the first eight characters. LCONV sorts the tags, calculates base address and record length, builds a new directory, and writes the SELDOM MARC II-like record out on tape.

[Fig. 1. System Flowchart of SELDOM. Flowchart not reproduced; recoverable labels: Weekly MARC II Tape, Conversion Program LCONV (A), Converted MARC, Hits & MARC Records (B), Sorted Hits & MARC Records, Print Program PRINPRO (C), Printed Profiles, Statistics, Compile Profiles COMPRO (D), IBM Update Utility, Current Profiles, Current Addresses.]

The Searching Program (SRCHPRO)

The searching program accepts as input compiled profiles, the converted MARC tape from LCONV, and parameter cards specifying data base and up to 32 search field definitions. Each field definition consists of a term type code, tag, and delimiter of the field or subfield to be searched. Six term types are allowed, although additions, deletions and changes to these six may be performed upon request. All terms except date may be truncated on the right, with title terms benefitting from left truncation. The right truncation feature reduces storage and search time requirements. The searches are conducted over the converted tape according to the Boolean expressions which connect symbols representing profile words.
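
The term types and truncation rules described for SRCHPRO can be illustrated with a short Python sketch. It is not taken from the SELDOM programs, which were written in PL/1 and Assembler. The function names, the use of "*" as the truncation mark, the restriction of left truncation to title terms, and the word-by-word comparison of title fields are all assumptions made for illustration.

```python
from typing import Sequence

# Illustrative sketch only: names and matching rules are assumptions,
# not taken from the SRCHPRO source.
TERM_TYPES = {"P", "B", "K", "T", "G", "D"}


def term_matches(pattern: str, candidate: str) -> bool:
    """Compare one profile word with one string taken from a record.

    A trailing '*' requests right truncation, a leading '*' left truncation.
    """
    left = pattern.startswith("*")
    right = pattern.endswith("*")
    stem = pattern.strip("*")
    if left and right:
        return stem in candidate
    if right:
        return candidate.startswith(stem)
    if left:
        return candidate.endswith(stem)
    return candidate == stem


def word_hits(term_type: str, pattern: str, field_values: Sequence[str]) -> bool:
    """Return True if the profile word matches any of the searched field values."""
    if term_type not in TERM_TYPES:
        raise ValueError(f"unknown term type: {term_type}")
    if term_type == "D" and "*" in pattern:
        raise ValueError("date terms may not be truncated")
    if pattern.startswith("*") and term_type != "T":
        # Assumption: left truncation is accepted for title terms only.
        raise ValueError("left truncation is assumed to apply to title terms only")
    for value in field_values:
        # Assumption: title terms are compared word by word, all other
        # term types against the whole field value.
        candidates = value.split() if term_type == "T" else [value]
        if any(term_matches(pattern, c) for c in candidates):
            return True
    return False
```

Under these assumptions, `word_hits("T", "DICTIONAR*", ["A DICTIONARY OF GEOLOGY"])` is true, as is `word_hits("K", "LB1043*", ["LB1043.5"])`, while a truncated date term such as `"197*"` is rejected.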

Profile words are simply entered into core until the allotted core is filled, and the source tape is sequentially passed against the profiles; i.e., each of the records on the tape precipitates a search of the profile words in core. If all of the profile words were not entered into core, the source tape is rewound and another search is conducted. This continues until all profiles have been searched. An output tape is created containing the SELDOM record retrieved with a prefix consisting of the profile number, threshold weight, weight, expression number, hit number, and terms which caused retrieval of the record.

Users also have the option of applying a weight (-99 to +99) to each profile word. Each time profile words match terms in a record, the weight value of each of the words found is tallied. Upon completing the search of that record and upon satisfying the expression logic, the total of the weight values is compared to a threshold value. Thus, if the total is greater than or equal to the threshold value (-999 to +999), that particular record is retrieved. Another option available to the user is a hit option, in which the user may specify the maximum number of records he would like the various expressions in his profile to retrieve for him.

The Output Programs

The output from the search program is sorted by calling up the IBM Sort Utility, which sorts the records on prefix. The sorted output is then input to the print program along with the address file. The latter is a separate file that is merely updated using the IBM Utility IEBUPDTE. It is in this address file, however, that several options can be specified. Duplicate printouts can be obtained, such that the left and right sides of the page carry identical output, with the right side carrying a feedback mechanism. Two-up printouts, notes, and if necessary, punched card output can be requested.
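
The weighting, threshold and hit options described above for the search program can be sketched in Python as follows. This is a minimal illustration and not the SRCHPRO logic itself. The `Expression` class, the `retrieves` function, the representation of the Boolean logic as a callable over matched symbols, and the rule that each matched word is tallied once per record are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass
class Expression:
    """One search expression of a profile (hypothetical structure)."""
    number: int
    logic: Callable[[Set[str]], bool]  # Boolean test over matched symbols
    threshold: int = 0                 # allowed range: -999 to +999
    max_hits: Optional[int] = None     # hit option; None means no limit
    hits: int = 0                      # records retrieved so far

    def __post_init__(self) -> None:
        if not -999 <= self.threshold <= 999:
            raise ValueError("threshold must lie between -999 and +999")


def retrieves(expr: Expression, weights: Dict[str, int], matched: Set[str]) -> bool:
    """Decide whether one record is retrieved by one expression.

    weights maps each profile-word symbol to its weight (-99 to +99);
    matched holds the symbols whose words were found in the record.
    """
    if any(not -99 <= w <= 99 for w in weights.values()):
        raise ValueError("weights must lie between -99 and +99")
    if expr.max_hits is not None and expr.hits >= expr.max_hits:
        return False                   # hit limit already reached
    if not expr.logic(matched):
        return False                   # expression logic not satisfied
    # Assumption: each matched word contributes its weight once per record.
    total = sum(weights[symbol] for symbol in matched)
    if total < expr.threshold:
        return False                   # tally below the threshold value
    expr.hits += 1
    return True
```

With weights of 50, 30 and -40 for symbols A, B and C, an expression "A and (B or C)" with a threshold of 60 retrieves a record matching A and B (tally 80) but not one matching A and C (tally 10).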
On the whole, the record printed out (see Figure 2) is similar in format to a 3 x 5 catalog card, the only differences being the fixed format, term or terms causing retrieval, the lack of name added entries and notes, and the control information at the bottom of each printout.

PROFILE COMPILATION

Because of the Library's bibliographic responsibility to the University, an alerting service such as SELDOM will vastly improve user awareness of the published monographic resources. First, users, in house and out, would not only be alerted to many works to be acquired by the Library, but would also be alerted to items that are currently not being purchased. Secondly, they would be assured of personalized services. Users of SELDOM will not receive listings of just new books, but will be notified of the latest books which are presumed to be relevant to their interests.

Profiling

When a prospective user (group) wishes to search a weekly MARC tape, his (its) interests are entered onto profile formulation sheets. These sheets (see Figures 3 and 4) contain a description of the user's subject interests, several references to the monographic literature, and a listing of the profile words with logical connectives. The profile words may number as many as 500. Figure 5 shows three of the approximately eighty profiles currently running under SELDOM.

The profiles are formulated by search editors using words that appear in the user's narrative and references. Additional words are sought in the Library of Congress' List of Subject Headings. Classification numbers that express the appropriate areas are incorporated; depending upon the information need, personal names, corporate names, geographic area codes, and date are also prescribed. According to Mauerhoff (23), approximately twenty-seven hours per year are required of an information specialist/search editor in order to accurately capture and maintain a user's need for information.
This figure incorporates interviewing time, user education, analyses of user feedback, and revision time. The success of this system or of any information retrieval system therefore depends on having sufficient profiling staff.

The Compile Program (COMPRO)

COMPRO, the compiling of profiles program, edits the profile transactions for sequence, syntax and semantics, and generates codes for the Boolean operators in the search expressions. The program flags incorrect data base specifications, incorrect term types, and incorrect alpha codes (i.e. symbols corresponding to the profile words). Profile transactions are by way of card input, and can consist of profile additions, profile updates, and profile deletes. Listings accompany all transactions.

[Fig. 2. Sample Profile Notices. Printed notices for profile 0018, addressed to the Shortt Library, c/o Murray Memorial Library, University of Saskatchewan, Saskatoon; not reproduced.]

[Fig. 3. Sample Profile Formulation Sheet: Narrative and References. Sheet for profile 0003, Reference Department, Murray Memorial Library, University of Saskatchewan. The narrative reads: "This profile is intended to obtain information on current monographs that would most likely be of interest to the Reference Department, in order to keep the collection up to date. Reference works such as dictionaries, encyclopedias, handbooks, catalogues, etc. are the kinds of items sought." Four sample references follow on the sheet; not reproduced.]

[Fig. 4. Sample Profile Formulation Sheet: Terms and Logic. Profile words for the same profile, among them ALMANAC*, DICTIONAR*, ENCYCLOPED*, HANDBOOK* and YEARBOOK*, with their expression lines; not reproduced.]

[Fig. 5. Computer Version of Profiles. Listings of three compiled profiles; not reproduced.]

OPERATION AND COSTS OF SELDOM

From the time that SELDOM became operable on a day-to-day basis, cost information has been gathered, and since SELDOM is composed of four modules, the recording of items of cost has been easily done. For example, LCONV computer charges are presently $0.019 per record converted, based on weekly files ranging in size from 1194 records to 2399. The files average about 1939 records per week, which comes to about $37 per MARC tape. Following the preparation of the tape for searching, the SRCHPRO-Sort routine is run.
The average computer cost has been about $0.186 per profile per issue. SELDOM's user group presently numbers 81, with profile terms numbering 1121 or 14 terms per profile, and questions or expressions numbering 273 or about 4 per profile.

PRINPRO was formerly running under stream-oriented transmission, at a total computer cost of $1.70 per 1000 lines of output. A shift to record-oriented transmission has lowered charges to $1.50 per 1000 lines. With profiles having averaged about 832 lines of output, the total cost of printing out search results has been about $1.25 per profile.

Overall costs for the 81 profiles are presently about $2.23 per profile per tape, or $116.00 per profile per year. Since the profiles require updating at frequent intervals, charges of $0.37 per profile per tape have been incorporated into this charge to take care of changes in terms and addresses. Costs which have not been included in the calculations are such items as MARC tape subscriptions, forms, and staff time.

DISCUSSION

The OTS and the NSL have at their disposal a program package that is highly flexible. For instance, search keys can be added or deleted at will. Fields from the MARC tapes can either be incorporated or removed from the directory. Any number of fields and subfields can be searched on tape, and any new directory items may be created, with the SRCHPRO limit, however, being 32. This number was chosen because it satisfies 99% of the users' needs. Almost every procedure in the program is table driven, the result being that variations can easily be introduced into the programs. In consequence, if and when BNB MARC tapes are made available, and if and when a Canadian MARC service becomes a reality, searching of these tapes would present no problems whatsoever.
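
The cost figures reported under OPERATION AND COSTS OF SELDOM can be checked with a few lines of arithmetic. The Python sketch below uses only numbers quoted in the text, except for the figure of 52 tapes per year, which is an assumption based on weekly distribution. The variable names are illustrative.

```python
# Figures quoted in the text.
COST_PER_RECORD_CONVERTED = 0.019   # dollars, LCONV
AVERAGE_RECORDS_PER_TAPE = 1939
PRINT_COST_PER_1000_LINES = 1.50    # dollars, record-oriented transmission
AVERAGE_LINES_PER_PROFILE = 832
COST_PER_PROFILE_PER_TAPE = 2.23    # dollars, overall
PROFILES = 81
PROFILE_TERMS = 1121
EXPRESSIONS = 273

# Assumption: one tape per week, 52 tapes per year.
TAPES_PER_YEAR = 52

conversion_per_tape = COST_PER_RECORD_CONVERTED * AVERAGE_RECORDS_PER_TAPE
printing_per_profile = PRINT_COST_PER_1000_LINES * AVERAGE_LINES_PER_PROFILE / 1000
yearly_per_profile = COST_PER_PROFILE_PER_TAPE * TAPES_PER_YEAR
terms_per_profile = PROFILE_TERMS / PROFILES
expressions_per_profile = EXPRESSIONS / PROFILES

print(f"conversion per tape:      ${conversion_per_tape:.2f}")      # 36.84
print(f"printing per profile:     ${printing_per_profile:.2f}")     # 1.25
print(f"yearly cost per profile:  ${yearly_per_profile:.2f}")       # 115.96
print(f"terms per profile:        {terms_per_profile:.1f}")         # 13.8
print(f"expressions per profile:  {expressions_per_profile:.1f}")   # 3.4
```

These values agree with the quoted figures of about $37 per tape, about $1.25 per profile for printing, $116.00 per profile per year, and 14 terms per profile. The ratio of expressions to profiles works out nearer 3.4 than the "about 4" given in the text.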

The benefits to be derived from SELDOM go beyond the concept of SDI, because SELDOM can produce outputs for a wide variety of applications. SDI and current awareness have received considerable emphasis in the literature by those providing search services over a spectrum of scientific-technical tape services. Since MARC II has also elicited a tremendous response, especially from Kenneth Bierman of the Oklahoma Department of Libraries, these utilities do not merit additional treatment in this paper. SELDOM, however, is unique in that it is the only MARC-based SDI system capable of searches using six coordinated entry points, linear matching, truncation, weighting, and output options.

From the point of selection, MARC has great appeal. Since the majority of the University's acquisitions (i.e. almost 80%) are English-language monographs, faculty and staff who have the responsibility for book selection would benefit from regular alerting services based on their areas of interest. Apart from receiving verified bibliographic information, the participants benefit from the timeliness of the records. At the same time, selection costs per record will be brought down significantly, especially now that this selection process becomes tied in to TESA-1, the Library's automated MARC-based acquisitions and cataloguing system. It has been suggested that selection and ordering could be done for the cost of selection alone.

The only problem areas envisaged are the lack of Canadian imprints, and the lack of other non-English monographs, such as French, German, Spanish and Portuguese. A partial solution to this problem may take the form of a Canadian MARC Project. A more complete solution is on its way, since MARC coverage for other languages is anticipated by the beginning of 1972.

Collection rationalization, an area receiving considerable attention along regional and national lines, can also benefit from SELDOM.
Devising divisions of responsibility in the acquisition of library materials will enable libraries to acquire, organize, store, and make available to the public comprehensive monographic collections.

MARC deselection, where practised by subscribers, is being pursued mainly along the lines of time decay. The University of Chicago (21) has so far exhibited the only deselection algorithm employing a subject and intellectual level approach, in addition to date. They use classification numbers to eliminate from their file those records that fall outside their collection policy. The OTS will be able to perform the same function, but much more rigorously, since its deselection criteria can consist of six elements. In this way, file size can be kept to a reasonable level, and update and storage charges can be held down.

Internal library data and information services will be along the lines of SDI, current awareness, demand bibliographies, and management statistics. These in-house utilities, which are already being obtained, have been very useful. The Reference Department, for instance, receives a bibliography each week of MARC II reference sources. Another profile, for one of the catalogers, monitors the publications of modern-day novelists and poets.

OUTLOOK

SELDOM has been operational for only a few months. While it has tremendous potential in the library field, and although immediate interest has been keen, the system will have to undergo considerable acceptance testing. Attention will have to be given to costs and to the user and his evaluation of the service. How SELDOM fits into a library's patron or reference services will be especially important, since the system will be integrated into a library's current accessions program and also the card catalog service.

ACKNOWLEDGMENTS

Major credit for the existence of the SELDOM Project is due to the systems analysts and programmers at the National Research Council of Canada, Messrs. P. H. Wolters, R. A.
Green, J. Heilik, Miss R. Smith; and to Dr. J. E. Brown, National Science Librarian.

REFERENCES

1. Personal Communication with Henriette Avram, MARC Development Office, Library of Congress, Washington, D.C.
2. Bierman, K. J.: "SDI Service," JOLA Technical Communications, 1 (October 1970), 3.
3. Bierman, K. J.; Blue, Betty J.: "A MARC-Based SDI Service," Journal of Library Automation, 3 (December 1970), 304-319.
4. Bierman, K. J.: "An Operating MARC-Based SDI System: Some Preliminary Services and User Reactions," Proceedings of American Society for Information Science, 7 (1970), 87-90.
5. Bierman, K. J.: Statements of Progress of Cooperative SDI Project. In Oklahoma Department of Libraries: Automation Newsletter, 2 (February 1970), 3-4; 2 (June-August 1970); 2 (September 1970); 16, 25-26; 2 (December 1970), 34-35; 3 (February 1971), 1-3.
6. Studer, William J.: Computer-Based Selective Dissemination of Information (SDI) Service for Faculty Using Library of Congress Machine-Readable Catalog (MARC) Records. (Ph.D. Dissertation, Graduate Library School, Indiana University, September, 1968).
7. Studer, William J.: "Book-Oriented SDI Service Provided for 40 Faculty." In Avram, Henriette: The MARC Pilot Project, Final Report (Washington, D.C.: Library of Congress, 1968), pp. 179-183. Also in Random Bits, 3:3 (November 1967), 1-4; 3:4 (December 1967), 1-4, 6.
8. Avram, Henriette: "MARC Program Research and Development: A Progress Report," Journal of Library Automation, 2 (December 1969), 257-265.
9. Atherton, Pauline: "LC/MARC on MOLDS; An Experiment in Computer-Based, Interactive Bibliographic Storage, Search, Retrieval, and Processing," Journal of Library Automation, 3 (June 1970), 142-165.
10. Atherton, Pauline; Wyman, John: "Searching MARC Tapes with IBM/Document Processing System," Proceedings of American Society for Information Science, 6 (1969), 83-88.
11. Atherton, Pauline; Tessier, Judith: "Teaching with MARC Tapes," Journal of Library Automation, 3 (March 1970), 24-35.
12. Hudson, Judith A.: "Searching MARC/DPS Records for Area Studies: Comparative Results Using Keywords, LC and DC Class Numbers," Library Resources and Technical Services, 14 (Fall 1970), 530-545.
13. Martin, Dohn H.: "MARC Tape as a Selection Tool in the Medical Library," Special Libraries, (April 1970), 190-193.
14. Veenstra, J. G.: "University of Florida." In Avram, Henriette D.: The MARC Pilot Project, Final Report (Washington, D.C.: Library of Congress, 1968), pp. 137-140.
15. Weisbrod, D. L.: "Yale University." In Avram, Henriette D.: The MARC Pilot Project, Final Report (Washington, D.C.: Library of Congress, 1968), pp. 167-173.
16. Palmer, Foster M.: "Harvard University Library." In Avram, Henriette D.: The MARC Pilot Project, Final Report (Washington, D.C.: Library of Congress, 1968), pp. 103-111.
17. Tell, B. V.; Larsson, R.; Lindh, R.: "Information Retrieval With the ABACUS Program: an Experiment in Compatibility," Proceedings of a Symposium on Handling of Nuclear Information (Vienna: 16-20 February, 1970), p. 184.
18. Heaps, D.; Shapiro, V.; Walker, D.; Appleyard, F.: "Search Program for MARC Tapes at the University of Alberta," Proceedings of the Annual Meeting of the Western Canada Chapter of the American Society for Information Science, (Vancouver: September 14, 15, 1970), 83-94.
19. Ayres, F. H.: "Making the Most of MARC; its Use for Selection, Acquisitions, and Cataloguing," Program, 3 (April 1969), 30-37.
20. Dieneman, W.: "MARC Tapes in Trinity College Library," Program, 4 (April 1970), 70-75.
21. "MARC II and its Importance for Law Libraries," Law Library Journal, 63 (November 1970), 505-525.
22. Wolters, Peter H.; Brown, Jack E.: "CAN/SDI System: User Reaction to a Computerized Information Retrieval System for Canadian Scientists and Technologists," Canadian Library Journal, 28 (January, February 1971), 20-23.
23. Mauerhoff, Georg R.: "NSL Profiling and Search Editing," Proceedings of the Annual Meeting of the Western Canada Chapter of the American Society for Information Science, (Vancouver: September 14, 15, 1970), 32-53.