Report No. 210
COO-1018-1094

ON THE APPLICATION AND DESIGN OF AN IMAGE STORE

by

Sylvian R. Ray

August 22, 1966

Department of Computer Science
University of Illinois
Urbana, Illinois 61801

(Supported in part by Contract AT(11-1)-1018 with the U.S. Atomic Energy Commission and the Advanced Research Projects Agency.)

ACKNOWLEDGMENT

Professor Bruce H. McCormick and Mr. Robert Amendola have contributed many valued ideas to this paper. Mrs. Letitia A. Prendergast prepared the illustrations for this report.

SUMMARY

The value and use of an image store are discussed in the context of its application to an Information Resource Center: an information retrieval system incorporating a pattern-oriented general purpose computer and a remote video communication network. With the proposed system configuration, text is stored only once, in low-cost, high-introduction-rate image form. Indexing research, providing indexing in depth where required, and information reorganization may then proceed on a large scale without further recourse to external data handling.

The use of a remote video communications net contributes a complication to the global system design which is rooted in channel capacity economics.
In particular, the information content of stored images needs adjustment to match channel capacity, a problem which can be resolved by format control during original loading of the image store. Finally, two macrodesigns for image stores with different access times and storage capacities are sketched.

1. INTRODUCTION

The emergence of electronically accessible Information Resource Centers sprinkled throughout the United States appears to be an increasing certainty, as hardware capability and systems knowledge continue to improve. Various visions of the goal and the paths leading to it have been proposed by Licklider [1], Clapp [2] and McCormick [3]. In addition, at least one working conference, INTREX [4], has been devoted to planning an experimental program directed toward similar objectives.

In general terms, an Information Resource Center consists of a library-like information collection, with facilities to encourage interaction between all participants of the collection. The center is further augmented by an evolving capability for fact retrieval and possibly facilities for on-line control of experiments and the processing of experimental data.

The present paper summarizes the system specifications for the design of the library-like aspects of an Information Resource Center and proposes alternate image store designs which satisfy these requirements. The term "library-like" refers to document-type data, which includes books, journal articles, and picture and map files, but excludes some forms of numerical experimental data and other highly structured data.

2. SYSTEM DESIGN

Limiting our purview to documents, as normally construed, the required system hardware consists of four major components:*

1) a general purpose computer with special facility for high-volume processing, e.g. a machine of the ILLIAC III class.
2) a bank of photographic microimage stores.
3) a bank of semirandom access bit store, e.g. magnetic disk storage.
4) an extensive communications network with remote video, as well as low-speed digital, intercommunication between users and the central equipment.

* All of these components except (2) are well advanced in design, under construction, or being purchased commercially at the University of Illinois.

System macrooperations consist of insertion of images into photographic image store and insertion of basic index data (title, author, image store address, etc.) into bit store. A normal user types his search query into the system, the computer generates the image store addresses of responsive documents, and these documents may then be viewed or copied after transmission of the video image. The image processing facility proposed above will, in addition, be utilized by research workers and system programmers to investigate machine indexing methods and to evolve deeper indexing for more important and frequently used documents.

First, a fundamental distinction is made between text in image store and the index in bit store. This dichotomy is essential to high speed index searching as well as to flexibility in altering the length of the index.†

† On this point, I heartily agree with Van Dam and Evans [5].

Secondly, the contemplated system provides for economical, high volume, rapid introduction of basic data by the provision for image format insertion. The objective is that this initial introduction shall be the only introduction of any particular piece of original source data. The image data will have sufficient resolution that character recognition processing can be successfully performed; the processing capacity of the central computer must be adequate to the task. As documents of special interest are identified and algorithms for index development sharpen, deeper and more precise indexes can be evolved without return to original documents.
Furthermore, any qualified and interested researcher can reorganize, regroup, and synthesize information from the system into new image formats to be introduced into the system. This facility to work directly with photographic images of documents makes possible a number of the objectives proposed by Licklider while abrogating his pessimistic forecast on the rate of introduction of documents into the system.

The third remark concerns the technical nature of images introduced into storage. In every available or proposed image store, design is based on the assumption that images are viewed and/or photocopied locally. Such a system would be obsolete before it became operational. Facilities for image transmission are essential. The problem, however, is that the achievable resolution of any practical television system is not adequate to reproduce most text pages to human readability standards. The limitation is imposed either by the camera/monitor capabilities (presently, 1275 lines maximum) or by channel bandwidth. In the present commercial market, the camera sets the resolution limit, but even if this limit is eased, the time required to transmit one frame over available communication channels will set the next barrier.

Considering the economics and limitations of transmission equipment, monitors, etc., it is necessary that each image in the image store correspond to a unit transmittable frame. One solution is to reformat original text material to conform to transmittable resolution standards. In many cases, reformatting amounts only to division of a page into two images, one for the upper half and another for the lower half. In other cases, computer control of the recording camera system will be necessary.

3. THE IMAGE STORE: DESIGN CRITERIA

We discuss next some criteria which affect the design of the image store itself.
3.1 Image Access Time

In any multiple user on-line system, once the files have been organized in the optimum practical manner, the residual limitation on the rate of servicing users (or the total number of users who can be served in tolerable waiting time) is a function of image access time. A maximum access time to one image of about one second seems a reasonable goal. That law of nature, the trade-off of access time for capacity, will affect this goal by factors of two to four.

3.2 Capacity

In order to service even one scientific research field, a minimum count of the order of 10^[?] pages is indicated. For example, the EURATOM information system, servicing only the field of nuclear science, presently counts 4·10^[?] documents,* averaging perhaps 10 to 20 pages per document. Stores of 10^[?] to 5·10^[?] images would find application in regional or national centers.

In summary, the widest range of application would be served by an image store designed for a minimum capacity of 10^[?] images and economically expandable to 10^[?] images. Larger storage requirements could be served by multiple storage units.

* L. Rolling, private communication.

3.3 Image Editing, Insertion and Readout

The readout system for stored images must satisfy several conditions:

1) Output image resolution must be adequate for human viewing without strain.
2) Images must be transmittable to remote stations.
3) Output resolution and signal/noise ratio must be adequate for mechanical recognition without extravagant noise reduction, line thinning, and contrast enhancement procedures.
4) Image quality must be sufficient that file images edited by electronic "scissors-and-paste" can be reintroduced into the file.

Clearly, there is a substantial economic advantage in using a standard or near-standard television system for readout and image distribution.

Regarding condition (1), experimental tests of text displays on high resolution television systems were performed.
The results show that between 10 (minimum) and 15 (maximum) TV lines per gross character line height provide readable resolution of text. This assumes that the horizontal resolution, per inch, is approximately equal to the vertical resolution of the TV system. The minimum quantity, 10, corresponds to a barely readable image quality, while the maximum would be acceptable to a severe critic.

Table 1 presents calculated resolutions required for three cases. These results can be interpreted as asserting that one whole document page can be legibly reproduced by a 1029 line television system (a readily available type) for a substantial fraction of documents. However, there remain a non-negligible number of special cases (footnotes, figures with small captions, etc.) for which one page-one frame resolution is inadequate.

One universally valid solution to this problem is to edit (reformat) the original image. Generally, reformatting under machine control is much less time consuming than complete character recognition, but the actual complexity of format control logic can be determined only after sampling a representative range of practical cases.

[Table 1: calculated resolutions required for three cases]
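
The readability figures quoted above (10 to 15 TV lines per gross character line, and a 1029 line television system) lend themselves to a short calculation of how many frames a page must be divided into. The following Python sketch is illustrative only and is not taken from the report. The function names and the sample page sizes are hypothetical, and it assumes that every scan line of the system is usable for the image and that pages are split at text-line boundaries.

```python
import math

# Figures quoted in the text.
MIN_TV_LINES_PER_TEXT_LINE = 10   # barely readable
MAX_TV_LINES_PER_TEXT_LINE = 15   # acceptable to a severe critic
SYSTEM_TV_LINES = 1029            # readily available television system


def required_tv_lines(text_lines: int, tv_lines_per_text_line: int) -> int:
    """TV lines needed to display `text_lines` gross character lines."""
    return text_lines * tv_lines_per_text_line


def frames_needed(text_lines: int,
                  tv_lines_per_text_line: int = MAX_TV_LINES_PER_TEXT_LINE,
                  system_tv_lines: int = SYSTEM_TV_LINES) -> int:
    """Number of frames a page must be divided into to remain readable.

    Assumption (not from the report): all scan lines of the system carry
    image content, and the page is split only between text lines.
    """
    # Whole text lines that fit in one frame at the chosen quality level.
    text_lines_per_frame = system_tv_lines // tv_lines_per_text_line
    return math.ceil(text_lines / text_lines_per_frame)


if __name__ == "__main__":
    # Hypothetical page sizes, in gross character lines per page.
    for page_lines in (55, 68, 90, 120):
        strict = frames_needed(page_lines, MAX_TV_LINES_PER_TEXT_LINE)
        lenient = frames_needed(page_lines, MIN_TV_LINES_PER_TEXT_LINE)
        print(f"{page_lines:4d} text lines: "
              f"{strict} frame(s) at 15 TV lines per text line, "
              f"{lenient} frame(s) at 10")
```

Under these assumptions a page of up to 68 text lines fits in one frame at the stricter standard, while a denser page falls into the two-image case described in Section 2.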