Needs of the Habitability Data Base
   4.1 Conceptual Needs of the Habitability Data Base
   4.2 Current Needs of the Habitability Data Base
5. NBS Survey of Interactive Information Systems
6. The Choice of an Appropriate HDB System
   6.1 The Choice of Natural Language Query
   6.2 Suitability of SMART
   6.3 The Choice of Continuing With SMART
7. Developing a Replacement for SMART

[...] choice of implementation, as well as the implications of continuing with SMART for the next phase of HDB development. Section 7 discusses some of the issues inherent in abandoning the SMART program and replacing it with some other system, whether developed anew or adapted from existing systems.

CERL HDB Recommendations

4. Needs of the Habitability Data Base

We separate the notions of conceptual needs of the HDB from the current needs in the following way: Conceptual needs are based on the problem itself, that is, the problem of storing the HDB and retrieving the information stored therein on the basis of user requests. Conceptual needs reflect the end user of the HDB. Current needs, on the other hand, deal with the more practical immediate concerns of the HDB effort: making the service available to the current set of users in a cost-effective manner as quickly as possible. The current need seems to be primarily for a system which will run the AND/OR programs and allow SMART requests to be submitted in a batch mode. We include in current needs any system which could perform the user's end function as effectively as SMART, without extensive reprogramming or reformatting of the data base.

4.1 Conceptual Needs of the Habitability Data Base

Conceptually, the HDB consists of a set of statements drawn from an appropriate literature. These statements are formulated by trained specialists who not only condense the information in the literature, but also classify the information by indexing the statement.
This process is conceptually equivalent to abstracting a document and providing an index classification of the document. Whereas most bibliographic retrieval systems keep the information about the document in the same record as the abstract (and possibly key words in addition to author, title, etc.), in the HDB the only information directly linking an entry with the original source of the information is a document number encoded as part of the sequence number field of the HDB statements. The index of the document is a multidigit string of codes which is prepended to the first card image of a particular statement.

The user of the HDB wishes to formulate a simple request to retrieve information which is of immediate concern to him. With the HDB as originally designed, this request is stated in terms of the classification of the statement as represented by the index. This index is comprised of 10 coded values:

   FUNC...a 3 digit functional area code
   TRFC...a [?] digit training facility code
   PHYS...a 1 digit physical setting code
   AENV...a 2 digit "A" environmental descriptor
   BENV...a 2 digit "B" environmental descriptor
   OCCU...a 1 digit occupant code
   PSTR...a 1 digit code for posture of people
   INVM...a 1 digit code for involvement of people
   ORGF...a 1 digit code for organizational functions
   SFCN...a 1 digit code for function of the statement

A more complete description of the classification and indexing scheme is given in [1].

The primary programs for selecting statements interactively on the basis of the indexes are the AND and OR programs, which run interactively on the DEC-10 system as part of the Prototype HDB [4],[5],[3]. These programs use a rather forced dialog to input the fields which are to be searched on and the values to be searched for.
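The selection performed by the AND and OR programs can be pictured with a short sketch. It is only an illustration under stated assumptions: the field names come from the list above, but the `Statement` record, the `select` function, and every code value in the sample data are invented here and are not taken from the Prototype HDB programs.

```python
from dataclasses import dataclass

# The ten index field names listed in the text.
FIELDS = ("FUNC", "TRFC", "PHYS", "AENV", "BENV",
          "OCCU", "PSTR", "INVM", "ORGF", "SFCN")


@dataclass
class Statement:
    doc_no: int   # number of the source document
    stmt_no: int  # number of the statement within that document
    index: dict   # field name -> code string (values below are invented)
    text: str


def matches(stmt, criteria, mode):
    """Test one statement against {field: code} criteria."""
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")
    unknown = set(criteria) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown index fields: {sorted(unknown)}")
    hits = [stmt.index.get(field) == code for field, code in criteria.items()]
    return all(hits) if mode == "AND" else any(hits)


def select(statements, criteria, mode="AND"):
    """Return (doc_no, stmt_no) for every statement that matches."""
    return [(s.doc_no, s.stmt_no) for s in statements
            if matches(s, criteria, mode)]


# Invented sample data; real HDB codes are defined by the indexing scheme.
SAMPLE = [
    Statement(12, 1, {"FUNC": "101", "PHYS": "2", "OCCU": "1"}, "first"),
    Statement(12, 2, {"FUNC": "101", "PHYS": "3", "OCCU": "2"}, "second"),
    Statement(40, 1, {"FUNC": "205", "PHYS": "2", "OCCU": "1"}, "third"),
]

both = select(SAMPLE, {"FUNC": "101", "PHYS": "2"}, "AND")
either = select(SAMPLE, {"FUNC": "101", "PHYS": "2"}, "OR")
```

Storing each field separately, rather than slicing the raw digit string, avoids having to assume field widths in the sketch.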
The interactive response is a set of statements, each with the appropriate document number and the number of the statement with respect to the source document. There is no capability to get a count of documents which meet one criterion or set of criteria and then determine whether that set should be further limited by ANDing with another set. The entire request is made at the outset, and the entire set of documents which match the request is printed as output. There seems to be no capability of saving the numbers of these documents for later refinement by further search requests.

Two other programs exist in the Prototype HDB system which reflect both conceptual and current needs. The function of the programs is to allow the user to see the bibliographic citation of a document if he knows the document number and to see the text of the document if he knows the number. Note that ANDOR returns the statement and the number of the document. (The document is not really there; it is just the collection of all statements which came from that document.)

This technique of finding statements is perhaps appropriate to some potential users of the HDB. In particular, the person who wishes to write a criteria manual for design of a certain training facility might want to retrieve what is available and related to that kind of facility. However, information specialists responding to a submitted query, and to some extent the end customer himself, might find that a better way of expressing the inquiry and conducting the search is needed. The HDB does not contain keywords which can be used to characterize the content of statements. (In its present form, content is characterized only by the index digit string.) Thus a retrieval system based on full-text search of the statements, preferably with natural language input, is a second conceptual need of the HDB.
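As a rough picture of what full-text search with natural language input means in practice, the sketch below ranks statements by how many content words they share with an English query. It is a deliberately minimal stand-in, not a description of any existing HDB program: the stop-word list, the scoring rule, and the sample statements are all assumptions made for illustration.

```python
import re

# A tiny, assumed stop-word list; a real system would use a fuller one.
STOPWORDS = frozenset({"a", "an", "and", "are", "for", "in", "is", "of",
                       "on", "the", "to", "what", "which", "with"})


def terms(text):
    """Lower-case the text and keep the distinct non-stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def rank(query, statements):
    """Order statement keys by number of terms shared with the query."""
    wanted = terms(query)
    scored = []
    for key, text in statements.items():
        overlap = len(wanted & terms(text))
        if overlap:
            scored.append((overlap, key))
    # Highest overlap first; ties broken by (doc_no, stmt_no).
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [key for _, key in scored]


# Invented statements keyed by (document number, statement number).
STATEMENTS = {
    (12, 1): "Noise levels in classrooms should be kept low during lectures.",
    (12, 2): "Lighting in classrooms affects reading performance.",
    (40, 1): "Barracks ventilation affects sleep quality.",
}

ranked = rank("What is known about noise in classrooms?", STATEMENTS)
```

Here the user never sees an index code; the query is plain English and the statement text itself carries the content description.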
At the present time this need is met by the collection of programs known as the SMART system. This system operates in batch mode on the IBM 360 system. It has been implemented at the Computer Services Office (University of Illinois at Urbana) as part of the prototype HDB effort.

"The system takes documents and search requests in English, performs a fully automatic content analysis of the texts, matches analyzed documents with analyzed search requests, and retrieves those stored items believed to be most similar to the queries. Among the language analysis procedures incorporated into the system are word suffix cutoff methods, thesaurus lookup procedures, phrase generation methods, statistical term associations, syntactic analysis, hierarchical term expansion, and others." [6]

As a part of the SMART user interface for the prototype HDB, a program on the DEC-10 computer accepts input of query submittals and formulates batch jobs for the 360. These jobs are submitted across a link to the batch machine. The user returns later to see if his job is done and retrieves his output (responses to his query) by running another program on the DEC-10. The time lag between request and response has not been satisfactory with the present implementation.

What is needed and is missing in the current implementation is an interactive on-line version of SMART. Salton recognized this as a need [6]. To run SMART interactively would require a different operating system on the 360, namely one that allows for time-shared interactive user terminals. There have been no major updates of SMART since the library was obtained from Cornell for the prototype HDB. At latest report, no interactive version of SMART is available in release form, although some effort was expended at Cornell in implementing an interactive version under the IBM TSO operating system.
Even if that were successful, it would be of little value to any solution which proposes using the 360 at CSO, since that system will stay batch until its eventual retirement. If the conceptual need for natural language processing of a query is artificial, some of the systems to be mentioned in Section 5 would probably serve the needs of the HDB well.

4.2 Current Needs of the Habitability Data Base

The current need of the HDB is a system which provides for the conceptual needs outlined above as well as the more immediate concerns of finding an appropriate operating system and computer to run it on. The scope of work for this contract lists five definitions of the needs of the HDB. Two of these fall into the class of conceptual needs:

   1) the types of programs currently in use must be available
   2) the system has the capability of handling summary data as well as bibliographic and textual data

These have been examined in the section on conceptual needs. The other needs are current needs discussed in this section.

The text-editing capability is desirable so that corrections and changes can be made to the HDB statements, and so that new statements can be added as the collection grows. Any system which will be capable of the interactive access required for the AND/OR programs will, without exception, have text-editing capability. So long as the HDB statements are part of a non-specific text file, they will be accessible and editable with the editors on most systems. However, the capability to edit HDB statements which are already included in a data base which has undergone some degree of inversion might be somewhat of a problem. The typical retrieval system requires that the data and the fields which will be searched be made ready for a large inversion process which is run against the data base to get it properly organized for faster retrieval. In some organizations this data base inversion process is very time consuming.
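The effect of inversion on updating can be seen in a small sketch. It assumes the simplest possible inverted file, a mapping from each word to the set of statements containing it; the function name `invert` and the sample statements are invented for illustration and do not describe any particular retrieval package.

```python
import re
from collections import defaultdict


def invert(statements):
    """Build an inverted file: word -> set of statement numbers."""
    index = defaultdict(set)
    for number, text in statements.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(number)
    return dict(index)


# Invented statement texts, held as an ordinary editable text collection.
statements = {1: "noise in classrooms", 2: "lighting in classrooms"}
inverted = invert(statements)

# A new statement is typed into the text collection after the inversion.
statements[3] = "noise in barracks"

# Searches still consult the old inverted file, so statement 3 is missed.
stale = set(inverted.get("noise", set()))

# Only after the inversion is run again does the new statement appear.
inverted = invert(statements)
fresh = set(inverted["noise"])
```

The text collection can be edited at any moment, but retrieval reflects the edit only after the next inversion run, which is why updates to an inverted data base tend to be periodic.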
The capability to access the statements independent of indexes to the statements is thus a requirement for on-line correction to the HDB statements. Similarly, in order to keep the data base updated, it should be possible to input new statements in text form. This is not a problem. However, it is quite likely that before new statements can be used as an integral part of the HDB, the inversion process must be run again. This would restrict updating to periodic updates of perhaps once a month. This is the norm rather than the exception in data base systems of the capability described.

Commercially available systems will automatically be able to take care of control and billing of outside users (they make a living doing it). University computer centers sometimes have more difficulty with this in that their process for establishing user accounts is sometimes rather cumbersome. However, the systems under consideration and outlined in the accompanying recommendations all meet the criterion that outside users can be admitted to the system and billed directly.

Similarly, the capability for remote low-speed access from terminals should be taken as given in all of the systems under discussion here. The only systems for which this is not the case are systems for which access is restricted to remote batch, and such a system cannot meet the editing and interactive requirements. Where necessary, submission to batch systems should be accomplished via an interactive system, similar to the technique used between the DEC-10 and the 360 in the prototype HDB work. This should always, however, be considered as clumsy and not conducive to the kind of immediate feedback to be obtained with interactive systems such as those commercially available.

5. NBS Survey of Interactive Information Systems

The National Bureau of Standards has already anticipated the need for government agencies to consider the choice of an interactive information system. A report published in 1974 constitutes a reference to the technical features and operational status of such systems available at the time [?]. From the introduction to that report:

"This report is written for the purpose of providing Federal ADP customers with information on a certain class of computer systems which are capable of handling scientific and technical information. The report attempts to show what is available and to characterize these systems in such a way as to answer questions which naturally arise prior to selecting such a system for a particular installation. The report is written at a level of technical detail which is aimed at information specialists rather than programming experts. It is intended to be informative and instructive, and not critical or evaluative."

"We have reviewed for inclusion in this index over 200 systems which came to our attention from various published and unpublished sources as well as from word-of-mouth. The systems which were selected conform to the following definition: "Information Retrieval" or "Data Management" packages or services which are available to any Federal ADP installation, and which offer an interactive query and search capability that is geared for use by non-programmers."

They eliminated from consideration systems which:

   1) are batch systems,
   2) have query languages not for use by non-programmers,
   3) are in research or development,
   4) are no longer supported,
   5) are no longer in business or locatable,
   6) are subject to legal or security problems in the way of releasing the system, or
   7) were not documented.
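The seven exclusion rules amount to a simple screening pass over the candidate systems. The sketch below shows one way such a screen could be written; the flag names are shorthand invented here for the seven rules, and the three candidates are hypothetical placeholders rather than systems from the survey.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    batch_only: bool = False           # rule 1
    needs_programmer: bool = False     # rule 2
    in_development: bool = False       # rule 3
    unsupported: bool = False          # rule 4
    vendor_unreachable: bool = False   # rule 5
    release_restricted: bool = False   # rule 6
    undocumented: bool = False         # rule 7


EXCLUSIONS = ("batch_only", "needs_programmer", "in_development",
              "unsupported", "vendor_unreachable", "release_restricted",
              "undocumented")


def reasons_excluded(candidate):
    """List every exclusion rule that the candidate trips."""
    return [flag for flag in EXCLUSIONS if getattr(candidate, flag)]


def survivors(candidates):
    """Names of the candidates that trip no exclusion rule."""
    return [c.name for c in candidates if not reasons_excluded(c)]


# Hypothetical candidates, for illustration only.
POOL = [
    Candidate("SYSTEM-A"),
    Candidate("SYSTEM-B", batch_only=True),
    Candidate("SYSTEM-C", in_development=True, undocumented=True),
]

kept = survivors(POOL)
why_c = reasons_excluded(POOL[2])
```

Recording the reasons, not just the verdict, makes it easy to revisit a rejected system if one of its disqualifying conditions later changes.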
It seems at least strongly suggestive that these systems meet the basic needs of the US Army CERL, if one of them meets the specific conceptual needs of the HDB. The intent of this section is to examine the organization of that report and to frame current concerns in terms of the selection criteria outlined therein.

Table 1 is a list of the systems which met the criteria for inclusion in this survey. Table 2 is the questionnaire which was used to characterize the features of the various systems. Included in the report is a summary of the features of each of the examined systems, listed in a manner similar to the format of the Questionnaire.

However, before examining each of the systems reported, the suggestion is made that the needs of potential users of the system be classified in order to make a first cut at system selection. Their recommendation for a first elimination is based upon potential usage and estimated cost first, then on the availability of a given main-frame, and, in the case of a requirement for a specific data base, on the availability of that data base as a service. In the case of the HDB investigation, several choices of main-frame are available, and it has not been determined whether a package should be put up on one of these main-frames or a service bureau should be used. Since a decision can be made on these choices at a later time, we can proceed immediately to elimination of unsuitable choices based on the technical features.

   Name                  Name

   BASIS                 MARS VI
   CDMS                  MASTER CONTROL
   CIRCOL                MICROTEXT
   (Data/Central)        MINIDATA
   DIALOG                MIRADS
   DMARS                 MUSE
   DML                   NASIS
   DRS                   N.Y. TIMES
   DS/3                  OLIVER
   EMESARI               ORBIT III
   ENFORM                PIRETS
   FLEXIMIS              QUERY UPDATE
   GIM                   RAMTS
   GIPSY                 RECON
   IMARS                 RFI
   IMS(OEP)              RIQS
   IMS/360               SHOEBOX
   IMS/8                 SOLAR
   INQUIRE               SPIRES II
   INSYTE                STAIRS
   LEADERMART            SYSTEM 2000
   MARK IV               TICON
   MARS III              UNIDATA

TABLE 1. SYSTEMS INCLUDED IN THE NBS SURVEY

[TABLE 2. Questionnaire used to characterize the features of the systems; table body not recoverable.]
The NBS report suggests drawing distinctions in three broad classes of system applications: formatted data processing, structured text searching, and personal text handling. The needs of the HDB fall into the class of structured text processing. In the following excerpt from the NBS report, the items in parentheses refer to the characteristic features listed in the Questionnaire.

Structured text searching is conceived [...] representative of bibliographic information [...] searching, legal [...] text searching, and similar problems where the file records consist of [...] segments of text [...] (1000 characters [...]). Examples of text segments would be report abstracts, patent claims, paragraphs, [...] sections. To identify a text record for selection there must be a technique for abbreviated content description, since requiring an exact match to all the text in a segment would be inconceivable and inconsistent with the intended function. Content description may be provided by indexing each file record by a set of keywords from a controlled vocabulary (E.2.a) which the user can inspect to check his desired term for acceptability (E.2.f). Or else, any significant word occurring in text may be provided as a valid search term (E.2.b). Because these systems are specially aimed at users unaccustomed to programming in encoded forms, English-like phrasing is deemed essential as well. Because a search may select voluminous text records that would be exceedingly long to print on the usual 10 or 30 character/second terminals, off-line printing (F.2.a) at high speed is also essential. Moreover, these systems should present a count (F.3.c) of the records that would be selected by a proposed search so that the user can judge the desirability of continuing the search.
The data files of structured text searching systems would be expected to be unchanging in content and very large in volume. It would be expensive to reorder or restructure them as new data is received, so it would be desirable for the system to accept new data in any order (D.3). Other desirable features would extend content searching capability, for example by giving a synonym facility (E.2.d) or a presentation of other terms that are conceptually related (E.2.e). As in formatted data processing, tutorial aid is desirable. In contrast to that application however, full Boolean capability, optional report formatting, and optional ordering are suggested here as desirable rather than essential. Only a Boolean AND, allowing the conjunction of distinct search terms, is imperative for user convenience, to avoid a tedious selection from record subsets found by individual terms. Optional formatting and ordering may not be used often for such simple structured output records as bibliographic citations. A standard output presentation then is generally sufficient, unless text fields become numerous and frequently of marginal importance, requiring more selectivity to be given the user.

The chart from the NBS report for categorizing systems is reproduced here as Table 3. Figure 1 shows just the entries which have an x in the feature row corresponding to structured text processing systems. This figure shows in a compact format the choices which on the face of it would be suitable for the HDB application. Those systems marked with a "+" are listed specifically as allowing customer data bases to be added to a "service" system. Also, two systems are included on this table which are not mentioned in the NBS report. These are the CELDS system and the EUREKA system, currently in some stage of development at the Urbana campus of the University of Illinois.
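Three of the features named in the NBS excerpt (a Boolean AND of search terms, a count of matching records before anything is printed, and an optional synonym facility) can be illustrated together. The sketch below is an assumption-laden toy, not a model of any surveyed system: the record texts, the synonym table, and all function names are invented.

```python
import re


def build_postings(records):
    """Map each word to the set of record numbers containing it."""
    postings = {}
    for rec_id, text in records.items():
        for word in set(re.findall(r"[a-z]+", text.lower())):
            postings.setdefault(word, set()).add(rec_id)
    return postings


def lookup(postings, term, synonyms=None):
    """Records containing the term or any listed synonym (cf. E.2.d)."""
    words = [term] + list((synonyms or {}).get(term, []))
    found = set()
    for word in words:
        found |= postings.get(word, set())
    return found


def search_and(postings, query_terms, synonyms=None):
    """Boolean AND: records that satisfy every query term."""
    sets = [lookup(postings, t, synonyms) for t in query_terms]
    return set.intersection(*sets) if sets else set()


# Invented records and synonym table.
RECORDS = {
    1: "noise in classroom",
    2: "sound levels in barracks",
    3: "noise in barracks",
}
SYNONYMS = {"noise": ["sound"]}
POSTINGS = build_postings(RECORDS)

plain = search_and(POSTINGS, ["noise"])
widened = search_and(POSTINGS, ["noise"], SYNONYMS)
narrowed = search_and(POSTINGS, ["noise", "barracks"], SYNONYMS)
count_before_printing = len(narrowed)   # the count of feature F.3.c
```

Because each search returns a set rather than printed output, the user can look at the count, decide whether the set is small enough, and AND in a further term before any text is listed.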
The SMART system is also indicated on this chart, though it does [...]

[TABLE 3. Chart from the NBS report for categorizing systems; table body not recoverable.]

[FIGURE 1. Systems with an x in the feature row corresponding to structured text processing; figure body not recoverable.]
per hour connected to the network, plus the normal user fees at CCN. The 360/91 installation at CCN is one of the more reliable places we have come in contact with over the last two years. The interactive portions of the HDB tasks would have to be recoded under the TSO system at CCN. However, since they are now coded in Fortran, a mere conversion would suffice to make the system as usable in that environment as it is in its present environment. Costs there are comparable to costs at the University of Illinois, except that in our experience jobs which require a large region size (as SMART does) are generally cheaper to run at CCN. Also, because of more core on the CCN system, large jobs can be run at any time of day, and the turn-around time is generally better than for a comparable large job on the 360/75 at CSO. The processor there is also much faster, and as a result the wait for results of a query should be much shorter.

In either the University of Michigan or the UCLA situation, the disadvantage of SMART being a strictly batch system would still apply. However, with the appropriate cooperation of the originators of SMART at Cornell, either of these systems would be suitable for converting SMART into an interactive system. This would be no small undertaking.
It could not (or should not) be done without the active cooperation of the group at Cornell who are intimately familiar with the inner workings of SMART. The software development for such a task would conservatively take about a year and about two to two and one-half man-years of programming.

As a purely batch system, SMART could be installed on one of the nationally available time-sharing systems which offer IBM equipment. If the remote job entry equipment at CERL is at some point attached to such a service, it would be easy to move a copy of the SMART library to that service and run just the SMART system as pure batch jobs. If the system also supports time-sharing service, the AND and OR programs could be recoded just as they would have to be with any of the other choices.

7. Developing a Replacement for SMART

While the SMART system in its present implementation is not quite satisfactory for the production stages of the HDB effort, careful consideration must be given to any proposals to change systems at this stage of development. Certainly a change from the implementation on the IBM 360 and DEC-10 systems at CSO is going to be necessary, because those systems are scheduled to be phased out of service over the next two years. Section 6 discussed some of the issues which must be addressed in an information retrieval system suitable for the HDB, but outlined the options available to CERL if their decision were to stay with the SMART program. The assumption in this section is that a decision has been made to abandon the SMART programs and develop or find something else.

Given that assumption, two avenues of investigation are open. One is to develop the CELDS system, which is performing a similar function for environmental data bases in conjunction with another group at CERL. The other is to make the necessary modifications to the HDB to make the information retrievable using one of the nationally available information retrieval systems.
The DIALOG system at Lockheed is given as an example because it comes the closest to meeting the criteria outlined in Section 5.

7.1 Revision of CELDS for the HDB

Any initial implementation of HDB on a CELDS-copy retriever would have to include at least the capabilities that ANDOR, BIBAX and DOCAX already provide to HDB users. CELDS provides these options now, and in addition provides:

1) all functions combined into one retrieval language
2) SAVE, to keep interesting and often used output sets
3) HELP
4) partial search, which tells the user how many statements satisfy sub-expressions
5) parentheses and full expression nesting
6) OOPS, to return to the previous statement set
7) multiple values per field
8) off-line printing
9) simple logon-logoff
10) fast response time, even for very large databases

To convert to a field-oriented system (like CELDS) the HDB could be broken into the following fields:

ACC  - accession number
DOC  - document number
STMT - statement number
DATE - date published, researched, input [unknown for current database]
BIB  - bibliographic data
AUTH - name of author(s) [unknown for current DB]
FUNC - functional area code
TRFC - training facility code
PHYS - physical settings
ENV  - environmental descriptors (however many apply)
OCCU - occupants
PSTR - posture
INVM - involvement
ORGF - organizational functions
SFCN - function of statement
TEXT - the text of the statement
KEY  - keywords [unknown for the current DB]

Several new values would have to be added to the SFCN field, including "objectives", "data", and "procedure". Several of the fields (such as PSTR and INVM) could be dropped and their values used as keywords. This would help streamline the list of fields without loss of generality. The DATE field is a useful field to include, but not strictly necessary. The only non-searchable fields would be BIB, TEXT, and DATE.
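As a rough illustration of how a field-oriented retriever of this kind might behave, the following Python sketch builds an inverted index over the searchable fields and evaluates a nested AND/OR request, recording how many statements satisfy each sub-expression. It is not CELDS code: the sample statements, their code values, and the names build_index and evaluate are all assumptions made for the example.

```python
from collections import defaultdict

# Fields the text lists as non-searchable; every other field is indexed.
NON_SEARCHABLE = {"BIB", "TEXT", "DATE"}

# Hypothetical statements. The code values are invented for illustration only.
# Each searchable field holds a list, since multiple values per field are allowed.
STATEMENTS = [
    {"ACC": 1, "DOC": ["D12"], "FUNC": ["210"], "ENV": ["04", "17"], "SFCN": ["2"], "TEXT": "..."},
    {"ACC": 2, "DOC": ["D12"], "FUNC": ["210"], "ENV": ["09"], "SFCN": ["3"], "TEXT": "..."},
    {"ACC": 3, "DOC": ["D40"], "FUNC": ["330"], "ENV": ["17"], "SFCN": ["2"], "TEXT": "..."},
]


def build_index(statements):
    """Invert the data base: map (field, value) to the set of accession numbers."""
    index = defaultdict(set)
    for rec in statements:
        for field, values in rec.items():
            if field == "ACC" or field in NON_SEARCHABLE:
                continue
            for value in values:
                index[(field, value)].add(rec["ACC"])
    return index


def evaluate(query, index, trace=None):
    """Evaluate a nested request given as tuples.

    ("TERM", field, value), ("AND", left, right) or ("OR", left, right).
    If trace is a list, the hit count of every sub-expression is appended to it.
    """
    op = query[0]
    if op == "TERM":
        _, field, value = query
        result = set(index.get((field, value), set()))
    elif op == "AND":
        result = evaluate(query[1], index, trace) & evaluate(query[2], index, trace)
    elif op == "OR":
        result = evaluate(query[1], index, trace) | evaluate(query[2], index, trace)
    else:
        raise ValueError(f"unknown operator: {op!r}")
    if trace is not None:
        trace.append((query, len(result)))
    return result


INDEX = build_index(STATEMENTS)

# FUNC = 210 AND (ENV = 17 OR ENV = 09)
QUERY = ("AND", ("TERM", "FUNC", "210"),
         ("OR", ("TERM", "ENV", "17"), ("TERM", "ENV", "09")))
```

Calling evaluate(QUERY, INDEX, trace) with an empty list for trace returns the matching accession numbers and fills the list with one count per sub-expression, which corresponds to the partial search feature in item 4 above.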
A CELDS-like format includes one line per field, and each line is prefixed by accession number and field number. The current HDB lines are suffixed by statement number, card number, and document number, and separated (unnecessarily) by 'NEXT TEXT' cards. Converting data formats would be fairly simple, except that a few desirable fields would be missing (e.g. keywords) and the current HDB uses digit strings for the indexes. Names would be much easier for novice users to read, and the digit strings could be converted to names automatically. Two CELDS input programs would have to be modified slightly (made more general) to accommodate the different field names. The CELDS retriever program would also have to be modified to use the new fields. The inversion program would have to be run on the newly created Habitability Data Base.

The next obvious improvements would include adding keywords to the database and adding an on-line thesaurus to the retriever. Then the combined retriever could be modified to use the thesaurus to recognize concepts in a very SMART-like environment. Concept numbers and weighting are not currently practical for interactive searching, but this could make a fascinating research project.

7.2 Using a Commercial Information Retrieval System

One of the more popular and widely used of the commercially available information retrieval systems is the DIALOG system operated by Lockheed in Palo Alto, California. This system was included in the survey discussed in Section 5. If a commercially available system is considered as a home for the HDB, certainly DIALOG should be considered a prime candidate.

The decision to move to a commercial retrieval system presents questions both of a technical nature and of a purely operational nature. We address both kinds of questions, but from the very limited basis of the specific information which is available to us in the course of this investigation.
We consider first the technical questions of what would be required to put the HDB into the DIALOG system. Putting the HDB into DIALOG would require almost exactly the same amount of effort as putting it into CELDS format. DIALOG is a field-oriented system with full text searching ability, but not natural language query. The HDB would almost certainly have to be converted to a DIALOG format, and keywords should be added. DIALOG would require a very complete thesaurus, which would then be available on-line. Full text searching in DIALOG requires an exact match to the words in the statement.

The DIALOG system works primarily with searches on predefined fields. Although the system is designed for bibliographic retrieval, the similarity to the information in the HDB suggests that only a small perturbation of the HDB would be required for conversion to DIALOG. The fields in the index used with the HDB statements could be made into fields in the DIALOG sense. The statements in the HDB are similar to abstracts and thus could be treated by DIALOG in the same way abstracts are treated. The task of converting what now exists in HDB to a form suitable for DIALOG could be assisted by some of the text processing capability in the UNIX system at CAC.

One conceptual dissimilarity between the two systems is that in DIALOG all information of one record (or set of records) concerns a single document, and there is no field to refer to a parent document. In HDB, on the other hand, the basic information is a statement, several of which come from the same parent document. It would be possible to think of each HDB statement as a document in the DIALOG sense, providing that an extra field is added to give reference to the parent document. Also, in this context, it would probably be advisable to encode keywords for each of the HDB statements. Other information would be based on the parent document.
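The idea of treating each HDB statement as a separate record that carries a reference to its parent document can be sketched as follows. This is only an illustration in Python, not DIALOG's actual record format: the field names PARENT and PARENT_KEY, the sample data, and the helper names to_record and full_text_match are assumptions made for the example. The matching function mimics the exact-word behavior described above, so a search for "noise" does not find "noisy".

```python
import re

# Hypothetical parent-document information, keyed by document number.
PARENTS = {
    "D12": {"AUTH": "Author A", "DATE": "1974", "KEY": ["barracks"]},
}

# Hypothetical statements; texts and codes are invented for illustration.
STATEMENTS = [
    {"DOC": "D12", "STMT": 1, "FUNC": "210", "KEY": ["noise"],
     "TEXT": "Noise in sleeping areas interferes with rest."},
    {"DOC": "D12", "STMT": 2, "FUNC": "210", "KEY": ["lighting"],
     "TEXT": "Noisy corridors were reported less often than poor lighting."},
]


def to_record(stmt, parents):
    """Flatten one statement into a self-contained record.

    The document number becomes an explicit parent reference, and the
    author, date and keywords of the parent document are copied in.
    """
    parent = parents[stmt["DOC"]]
    record = dict(stmt)
    record["PARENT"] = record.pop("DOC")
    record["AUTH"] = parent["AUTH"]
    record["DATE"] = parent["DATE"]
    record["PARENT_KEY"] = list(parent["KEY"])
    return record


def full_text_match(record, word):
    """Return True only if the word occurs exactly in the statement text."""
    words = re.findall(r"[a-z0-9]+", record["TEXT"].lower())
    return word.lower() in words


RECORDS = [to_record(s, PARENTS) for s in STATEMENTS]
```

Because an exact match misses variant word forms, the sketch also suggests why careful keywording and a standardized vocabulary matter for this kind of system.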
The information taken from the parent document would probably need to include the author or some other reference to the source, the date if that applies, the corporate author if one exists, and inevitably, the key words for the parent document.

Another pressing need, in the event of this choice as well as several others, is for a completed thesaurus for the HDB. The approach taken in the thesaurus for the early portions of the HDB is a step in the right direction, but it needs to be expanded to include terms peculiar to the whole range of habitability statements, not just the limited subset available to CERL now. With a standardized HDB vocabulary and careful keywording, DIALOG could be a fast, easy to use system for retrieving from the HDB.

It is impossible to determine the cost of putting the HDB on DIALOG, or to estimate what it would cost to run, except by comparing the complexity of HDB to some of the other available databases for which at least representative user costs are available. The cost for accessing the NTIS data base, for example, is $25 per connect hour. (The system is purely interactive.) In addition to this, a communication cost is added, dependent upon the mode of access. For access via Telenet this charge is $8 per hour. Since the HDB is considerably smaller than NTIS, one would expect the charge to be less, except for the fact that fewer customers might mean higher prices. The documentation of DIALOG makes it very clear that they will not be able to predict the cost for a new database and the accompanying service to access it. Such an estimate could be nothing but a raw guess without an extremely detailed proposal from Lockheed.

One immediate suggestion is that Lockheed should be contacted, given as much information as possible about the HDB, including this report, and then asked to submit a cost proposal. Sales brochures for DIALOG indicate the price for out of the ordinary services as "negotiable."
Although we have no idea what it costs to put up the NTIS information on DIALOG, it seems clear that one of the justifications is the wide interest in accessing the NTIS database, and thus the customer base with which to recover the installation costs. For a special purpose client like CERL the cost cannot be spread over so many customers, and thus the apparent cost will seem higher. In order to operate an IAC which includes the capability to search the HDB for clients, CERL would thus have to pass on fairly high operational costs to the client, or operate the service at a loss until the number of clients spreads the cost out over a wider base of users.

There is another whole question, still unanswered, as to whether Lockheed would even be interested in putting HDB on their system. Certainly the customer base at the present time would not warrant their covering the cost of transforming the HDB into a form suitable for DIALOG. CERL would either have to do that themselves or pay Lockheed to do it. Now it is certainly true that Lockheed is intended to be a profit making venture, and thus they may be willing to put whatever someone wants onto their system for an appropriately large sum of money. However, it may be that their growth plans do not allow for yet another potentially large data base to come on the scene in the near future. If this is true they will not be able to put the HDB database on DIALOG, regardless of whether or not they could recover their costs for doing so. We were able to contact DIALOG users, and to use DIALOG on-line. The DIALOG users we called were largely pleased with Lockheed service.

LIST OF REFERENCES

[1] T. A.