Digitized by the Internet Archive in 2013
http://archive.org/details/trainablecharact944elso

UIUCDCS-R-78-944
UILU-ENG 78 1737

TRAINABLE CHARACTER RECOGNITION INTERFACE COMPUTER (INCOM)

by

Mohamed Taher Abdalla El-Sonni

October 1978

DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
URBANA, ILLINOIS 61801

Supported in part by the Department of Computer Science, and submitted in partial fulfillment of the requirements of the Graduate College for the degree of Doctor of Philosophy.

© Copyright by Mohamed Taher Abdalla El-Sonni 1978

TRAINABLE CHARACTER RECOGNITION INTERFACE COMPUTER (INCOM)

Mohamed Taher Abdalla El-Sonni, Ph.D.
Department of Computer Science
University of Illinois at Urbana-Champaign, 1978

A trainable, real-time character recognition device has been designed and built. The visual feature extraction method is chosen as the most favorable computational approach around which the feature extractor is designed. A new method of local feature extraction is presented which uses a minimal size window and simple raster scanning of the character image.
Merging rules are devised to reduce the number of these features by merging two features at a time. A method of dynamic segmentation of the character representations is introduced, in which the character is described as a collection of horizontal or vertical segments connected by links. A multiple description technique for the character segments makes it possible to reduce the number of training characters and to use a simple data structure for the classification dictionary as well as a simple decision criterion for recognition. Experiments show the power of the described techniques.

ACKNOWLEDGEMENT

I wish to express my gratitude to my advisor, Professor Michael Faiman, for suggesting the thesis topic and for his continuous guidance and friendship. I am also grateful to Professor Wolfgang Poppelbaum for the opportunity to work in the Information Engineering Laboratory and for his friendship. I thank Professors Sylvian Ray and William Kubitz for encouragement and friendship.

A special word of thanks goes to Frank Serio for his expertise in fabricating the printed circuit boards and the panel, Stan Zundo for his personal care and skill in drafting the figures, Cinda Robbins for her assistance and for typing the final draft, Clyde Helm for cooperation and to June Wingler for her excellent job in typing the thesis.

I am indebted to my friend Professor Ahmed Sameh for his most appreciated advice and encouragement. I would like also to thank my friends in the Information Engineering Laboratory: Al Irwin, Gary Gostin, Dan Pitt, Joe Luhukay, Mike Robinson, Izumi Suwa and Randy Moss for their friendship and for many valued discussions.

Finally, I wish to express my special thanks to my wife Magda for typing the early drafts of the thesis and especially for most needed encouragement and support, and to my son Taher for encouragement in his special way.

TABLE OF CONTENTS

1. INTRODUCTION
   1.1 Design Goals
   1.2 Relevant Literature
   1.3 Comparison of Computational Methods
   1.4 Design Strategy
2. SYSTEM DESIGN
   2.1 Introduction
   2.2 General Description
   2.3 Scanning-Windowing Schemes
   2.4 The PREPROCESSOR
   2.5 The FEATURE EXTRACTOR
   2.6 Local Feature Extraction
   2.7 Structural Features Extraction
   2.8 Feature Vector Construction
   2.9 The CLASSIFIER
   2.10 V-ELEMENT CONSTRUCTION
   2.11 DATA STRUCTURING and LEARNING
   2.12 DECISION and RECOGNITION
3. HARDWARE IMPLEMENTATION
   3.1 General Discussion
   3.2 The PANEL
   3.3 System Organization and the MASTER CONTROL
   3.4 The WORKING STORAGE
   3.5 PRELOG Module
   3.6 The MERGER
   3.7 FEATURE VECTOR FORMATION SUBPROCESSOR
   3.8 The TEMPORARY SEGMENT Module
   3.9 The ENCODING LOGIC
   3.10 The FORMATION SUBPROCESSOR Operation
   3.11 The CLASSIFIER
   3.12 The V-ELEMENT CONSTRUCTION Module
   3.13 The DATA STRUCTURING Part
   3.14 The DECISION Module
   3.15 The V-CONTROL
   3.16 The C-CONTROL
   3.17 The B-CONTROL
   3.18 The A-CONTROL
4. SUMMARY AND CONCLUSIONS
REFERENCES
APPENDIX
   A. OPERATION AND EXPERIMENTS
   B. INCOM CIRCUIT DIAGRAMS
VITA

LIST OF FIGURES

1.1 General Block Diagram of a Character Recognition Device
1.2 Window Cells Designation and Types of Connectivities
2.1 Character Segmentation
2.2 INCOM as a Character Recognition Device
2.3 Horizontal and Vertical Scanning-Windowing Schemes using 3x3 Window
2.4 Anticlockwise Scanning-Windowing Scheme using 1x2 Window
2.5 Examples of Preprocessing Patterns
2.6 Line Representation of Node Types
2.7 Binary Representation of Nodes
2.8 Examples of Nodal Interactions
2.9 MERGING RULES
2.10 Different Shapes of 'A' with the Same Structural Features (Horizontal Segmentation)
2.11 Example of Horizontal Separator Vector (HSEPVEC)
2.12 Example of Feature Vector Coding
2.13 INCOM CLASSIFIER Block Diagram
2.14 The Relationship between the Feature Vector, the V-ELEMENTS and the Character Codes
2.15 Characters with the same Top and Bottom Segments
2.16 CLASSIFIER Word Partitions
2.17 Recognition Cycle
3.1 INCOM General Block Diagram
3.2 INCOM Architecture
3.3 The PANEL
3.4 System Operation Flow Chart (M-CONTROL)
3.5 System Data Flow Diagram
3.6 WORKING STORAGE Plane Organization
3.7 Examples of Possible WORKING STORAGE Scanning
3.8 PRELOG Logic Diagram
3.9 MERGER Block Diagram
3.10 MERGER Logic Diagram
3.11 FEATURE VECTOR SUBPROCESSOR Block Diagram
3.12 TEMPORARY SEGMENT Module Logic Diagram
3.13 ENCODING LOGIC Block Diagram
3.14 F-CONTROL Flow Chart
3.15 INCOM CLASSIFIER Block Diagram
3.16 V-ELEMENT CONSTRUCTION Module Block Diagram
3.17 DATA STRUCTURING Part Data Flow Diagram
3.18 DECISION Module Logic Diagram
3.19 V-CONTROL Flow Chart
3.20 C-CONTROL Flow Chart
3.21 B-CONTROL Flow Chart
3.22 CLASSIFIER Operation Flow Chart (A-CONTROL)

LIST OF TABLES

1.1 Ranking of Computational Methods According to Comparison Criteria
3.1 PROCESSORS-OPERATIONS-MODULES

1. INTRODUCTION

1.1 Design Goals

INCOM (INterface COMputer) is a real-time, trainable hand-drawn symbol recognition device. It is intended as an interface between a graphics input tablet and a computer system in an interactive graphics environment. INCOM can be trained on-line to recognize hand-drawn symbols such as alphanumerics, flowchart symbols and symbols of similar complexity. In addition, the average number of training samples is reasonably small: about 10 samples/symbol for a recognition rate of 80% or better. The system response, in training or recognition, is almost instantaneous (about 50 msec).
The device, as it stands, can be regarded as a character recognition machine, and a character recognition technique has been chosen and implemented in hardware. Such a device relieves the computer system of the burden of recognizing the input character, besides the obvious speed-up of the recognition task. There is also the advantage of having a character recognition machine of reasonable size which can be used for man-machine interaction purposes, for personal use (e.g. reading for the blind), or as the recognition part of an optical character recognition device.

It is interesting to note that there is no reported device which has been designed and implemented specifically to be trained on-line to recognize hand-drawn symbols or characters. Most of the known approaches in character recognition have been implemented on large computers [1]. Several techniques have employed a front-end processor, a special purpose computer, or a minicomputer, to pre-process the input data and rely on a bigger computer to store the recognition dictionary and to handle the training [5].

1.2 Relevant Literature

Character recognition has been the subject of a large number of papers for almost two decades [1,2,3,4]. Almost every approach to pattern recognition has been tested on samples of characters to show some degree of applicability.

Character recognition methods can be classified in several ways. For the purpose of this thesis they will be discussed according to the following three major requirements: accessibility to the user; expandability of the character set; and the type of computation used in the recognition process.

In terms of user accessibility, both off-line and on-line (real-time) approaches have been considered [1]. In the former there is no interaction between the user and the device; optical character recognition falls under this heading.
In the on-line approach, the user interacts with the device by inputting, modifying and even training the system to recognize his own symbols. We are interested in this latter approach.

According to the expandability of the character set, the device can be characterized as of the specialized type or the trainable type. If specialized, the device is tuned to recognize a limited set of characters which the user may not change. INCOM, on the other hand, is trainable to recognize characters specified by the user. Trainable systems are more flexible, but may exhibit some degradation of recognition if used by others without retraining.

A character recognition device can be viewed as consisting of three main stages: preprocessor, feature extractor and classifier (Figure 1.1). The preprocessor operates on the raw input data, preparing it for further processing. A set of measurements is then performed on the preprocessed pattern image by the feature extractor, which produces a compacted form of these measurements. We call this form a feature vector. To train the device, the resultant feature vector is tagged with the input character class and presented to the classifier. The classifier then stores this information into its data structure. In the testing stage (recognition) the feature vector of the unknown character is presented to the classifier. The classifier uses the feature vector to retrieve relevant information from the classifier data structure. According to some decision criterion, the retrieved information is used to find the unknown character class.

To obtain good performance from a character recognition device, the set of measurements performed by the feature extractor must be relevant to the classes of characters to be recognized. This makes the feature extractor stage the most important and critical stage of any character recognition device.
Therefore, the computational approaches will be categorized here according to the types of measurements used in order to construct representations of input character patterns. These may take any of the following forms: template matching, pattern sampling, spatial transform, geometrical moments and visual feature extraction.

[Figure 1.1 General Block Diagram of a Character Recognition Device]

Template matching [1] is the simplest of all the computational methods. Measurements are taken directly from the character image and, therefore, the resultant features are the points constructing the image. In the learning phase the character prototypes are stored in the classifier dictionary. In the recognition phase the input character image is matched against the character prototypes using a decision criterion. The character prototype closest to the input character according to this criterion is considered the recognized character. This technique has been applied successfully to the problem of fixed-font optical character recognition. It is not suitable for hand-printed character recognition because of the large number of prototypes required to accommodate the many variations in characters.

Because of the inherent redundancy in human generated symbols we can expect that only a fraction of the points constructing the character images actually contribute to the recognition process. It should be sufficient to sample the input character pattern at some known places in the image plane and use these samples, e.g. groups of points, as measurements. The measurements do not necessarily represent the character image faithfully but can nevertheless be useful in the recognition task. These measurements can be the number of points in some chosen fixed areas of the input image, points of intersection between image lines and lines of known directions [6], the presence or absence of certain points in the image [7], etc.
This method is called pattern sampling. It has obvious advantages over the template matching approach by allowing more variations in the input character. But both of them have the disadvantage of requiring large numbers of training prototypes, especially for the kind of patterns we are dealing with: skeletonized patterns or character images from an input tablet. However, pattern sampling has been tested successfully on optical character recognition systems for letter sorting [8].

A third technique is to use a set of orthogonal functions, e.g., Fourier, Walsh/Hadamard, Haar, etc. [9,10] to spatially transform the input image, and use the resultant transform coefficients as measurements. Because of the large number of resultant coefficients, $n^2$, where n is the dimension of the input matrix, some of them are chosen under program control, in the hope that they will be sufficient to recognize the unknown input patterns. The drawback here lies in the choice of these coefficients, which requires a large sampling of input patterns, as well as a powerful computer with a sufficiently large memory [10]. The technique works well if the number of character classes is limited and the variations in the unknown input characters are not severe. For example, if the unknown character is skewed or shifted in one direction the transform coefficients will change values appreciably. This will result in either misclassification or rejection.

A fourth approach is that of weighting the different points of the input character image with known weight functions and summing the result, an instance of which is the method of geometrical moments [11]. Although these measurements can be made translation-invariant, this technique shares the major disadvantages of the spatial transform method: a large amount of training data and insensitivity to local variations of character classes.
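The remark that moment measurements can be made translation-invariant can be illustrated with a small sketch. It assumes the standard definition of central moments (moments taken about the centroid of the 1-cells), which is one common way of obtaining translation invariance and is not necessarily the exact formulation of [11]. The function names and the test pattern are hypothetical.

```python
from fractions import Fraction

def points(img):
    """Return the (x, y) coordinates of all 1-cells in a binary image."""
    return [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]

def raw_moment(img, p, q):
    """Raw geometrical moment: sum of x^p * y^q over all 1-cells."""
    return sum(x ** p * y ** q for x, y in points(img))

def central_moment(img, p, q):
    """Central moment: the same sum, taken about the centroid (assumed definition)."""
    pts = points(img)
    n = len(pts)
    xc = Fraction(sum(x for x, _ in pts), n)  # exact centroid, no rounding error
    yc = Fraction(sum(y for _, y in pts), n)
    return sum((x - xc) ** p * (y - yc) ** q for x, y in pts)

def shift(img, dx, dy):
    """Translate the pattern by (dx, dy); the caller keeps it inside the matrix."""
    out = [[0] * len(img[0]) for _ in img]
    for x, y in points(img):
        out[y + dy][x + dx] = 1
    return out

# Hypothetical L-shaped test pattern in an 8x8 matrix.
pattern = [[0] * 8 for _ in range(8)]
for x, y in [(1, 1), (1, 2), (1, 3), (2, 3), (3, 3)]:
    pattern[y][x] = 1
moved = shift(pattern, 3, 2)

# Raw moments change under translation, central moments do not.
print(raw_moment(pattern, 1, 0) == raw_moment(moved, 1, 0))
print(central_moment(pattern, 2, 0) == central_moment(moved, 2, 0))
```

Exact rational arithmetic is used only so that the comparison is not disturbed by floating-point rounding; a hardware or fixed-point realization would differ.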
The fifth scheme is that of visual feature extraction, in which a character is considered as a collection of interrelated meaningful features. For example, a character can be described as a collection of line strokes [5,12], or as a collection of line endings (spurs), line-crossings, etc. [13]. A "feature" represents a set of characteristics of the image plane that is invariant to several character classes and is also invariant to translation. Therefore, describing a character using these kinds of features will inherently describe variations of this character, thus enhancing recognition. Clearly, the choice of appropriate features, as well as methods of extracting and interrelating them, are crucial to the efficiency of any character recognition device which uses this scheme.

1.3 Comparison of Computational Methods

Before adopting a computational technique for implementation, three general criteria are considered: (i) amount of storage per character set, (ii) amount of processing per sample, and (iii) number of training samples per character class. Table 1.1 gives the ranking (1-5) of the five computational approaches. Rank 1 indicates the most favorable and rank 5 the least favorable, according to size, complexity, cost, speed, etc.

Table 1.1 Ranking of Computational Methods According to Comparison Criteria.

Approaches             Amount of Storage/   Amount of Processing/   Training Samples/
                       Character Set        Sample                  Character Class
Template Matching              5                    1                       5
Pattern Sampling               4                    2                       4
Spatial Transforms             3                    5                       3
Geometrical Moments            2                    4                       2
Feature Extraction             1                    3                       1

The first criterion, amount of storage per character set, is an indication of the complexity of the system as a whole and its response. It includes the storage required for programs, intermediate results and classifier dictionaries, as well as the amount of processing required for the relevant measurements. The feature extraction method is ranked first because the measurements need less storage and processing than the rest.
In addition, the size of the classification dictionaries is usually much smaller than that of the other schemes. The large sizes of the classification dictionaries and the amount of processing required for the associated measurements make template matching and pattern sampling the least favorable. Geometrical moments have some advantages over the spatial transform approach because the resulting moments are more locally sensitive and can be made translation invariant and, hence, better recognition characteristics can be expected.

The second criterion, amount of processing per sample, indicates the amount of computation required for performing the measurements on the input character. The top two methods in Table 1.1 are simpler and require less computation than the rest. Feature extraction, however, is superior to spatial transforms and geometrical moments. For the case of line-like patterns it is computationally simpler and requires less time to extract features than to generate transform coefficients or geometrical moments.

The third criterion, number of training samples per character class, is a direct requirement from the design goals: the smaller the number of training samples the better the approach. The feature extraction approach excels in this requirement as mentioned in section 1.2.

From the table it is clear that feature extraction has the best overall ranking and is therefore the method of choice for INCOM.

1.4 Design Strategy

After adopting the feature extraction approach, it is necessary to determine which features and interrelationships will be used. To achieve such a goal, first, human factors must be considered, since they are involved in generating and interpreting characters. Secondly, in detecting the chosen features, it would be advantageous to exploit the input form, i.e., binary pattern, of the character image.

a. The Human Factors

(i) The lines drawn are affected by the complex motor activity of the hand as well as its inertia.
Because of this, lines people draw are not necessarily straight, even if they are meant to be so, and lines meant to meet at a vertex may cross each other or may be slightly separated. A good technique has to tolerate such deviations and eliminate some of them before further processing. It has also been observed that obvious changes in the line direction best describe line-like scenes [14]. Line-curvature, line-endings and line intersections have been chosen as the local features of characters for implementing INCOM. Before detecting these features, gaps are filled between two points considered close enough to be connected.

(ii) It has been observed in experimental psychology, using human subjects, that segmenting the character horizontally gives a good description [14]. This segmentation has already been used in a practical system to recognize hand-written numerical characters for automatic letter sorting [15]. Another approach is to segment the character image plane into horizontal, vertical and diagonal strips of known positions and detect the presence of simple features, like short lines and long lines, within these strips. This approach has been applied successfully to recognize alphanumeric characters [16]. In INCOM horizontal as well as vertical segmentation is done dynamically on the character representations. Segmentation here is made dependent on the distribution of local features within the character itself, instead of the fixed segmentation of the image plane of the other approaches. In addition, the resulting segments are themselves segmentable for the purpose of establishing desirable topological relationships with other parts of the character.

(iii) Horizontal and vertical orientations are preferred to others in describing the spatial relationship between objects [17]. In INCOM relationships like "above," "below," "left of" and "right of" are used to describe spatial interrelationships between parts of a hand-drawn character.
However, these descriptions are not explicitly coded but are embedded in the feature vector describing the character.

b. The Input Form

Since the input from the tablet is a series of points to be stored in a matrix storage, an effort has been made to exploit the binary representation of line-like drawings. The input points are stored as 1-cells, while 0-cells represent the unwritten part of the image matrix. Let $x_2, x_3, \ldots, x_9$ denote the cells (1's or 0's) neighboring an arbitrary center cell $x_1$ in Figure 1.2(a). Two types of connectivity, 4-connectivity and 8-connectivity, are defined between a neighboring cell and the center cell when both of them are 1-cells, as shown in Figure 1.2(b) and (c). Two points in a binary image are connected when there is a series of pairs of 4-connected or 8-connected points between them.

[Figure 1.2 Window Cells Designation and Types of Connectivities: a) Window Cells Designation (rows, top to bottom: x7 x8 x9 / x6 x1 x2 / x5 x4 x3); b) 4-Connectivity; c) 8-Connectivity]

To simplify the feature extraction operation the binary image of the input character is thinned by changing superfluous 1-cells to 0-cells on the edges of lines, while preserving connectivity and tips of lines. After thinning (cleaning) the binary image, it may be observed that 3x3 cells is the smallest window through which a change of line direction can be detected.

Using the above properties three sets of local patterns of 3x3 binary matrices have been constructed. These are the gap-filling, image cleaning and nodal extraction templates for connecting close parts of the binary image, eliminating redundant points and extracting local topological features (e.g., tips of lines, change of direction, etc.).

2. SYSTEM DESIGN

2.1 Introduction

In INCOM a character is viewed as a set of local features connected by a set of links. In other words, it may be regarded as a linear graph with local features as vertices, or nodes, and links as edges.
Instead of using lists of node coordinate information and explicit searching for connections between them, a simple method is described for specifying the character (graph) by means of a set of one-dimensional list subvectors. This method is segmentation, in which the character is described as a collection of either horizontal or vertical segments, which are connected by links (Figure 2.1). Segments may consist of several disconnected subsegments, each of which may contain one or more nodes. For the current implementation minimum descriptions of the constituent parts of the character (segments, links, and nodes) have been considered. The existence of segments and subsegments, the number of links and their distribution within subsegments and the number of nodes in each subsegment constitute the feature vector.

It should be noted that all the operations involved in preprocessing and feature extraction as well as constructing the feature vector are done by simple raster scanning of the WORKING STORAGE memories without resorting to contour following or back tracking. This makes the control simpler and the implementation more elegant than the other approaches.

[Figure 2.1 Character Segmentation: a) Horizontal Segmentation (first to last horizontal segment, connected by UD links); b) Vertical Segmentation (first to last vertical segment, connected by LR links)]

In addition, the multiple-description approach makes it easier to devise a simple data structure for learning and a simple best fit criterion for decision (recognition).

2.2 General Description

INCOM consists functionally of three main processors: the PREPROCESSOR, the FEATURE EXTRACTOR and the CLASSIFIER (Figure 2.2). The PREPROCESSOR accepts a series of points (x,y coordinates) from a tablet digitizer, or its equivalent, and stores them in the image memory (IMAGEM).
Upon finishing the drawing of the input character the PREPROCESSOR begins the operation of gap-filling followed by image-cleaning, thus preparing for feature extraction operations.

The FEATURE EXTRACTOR operates on IMAGEM by extracting local features and storing them in the node memory (NODEM). Furthermore, according to the current scanning mode the vertically or horizontally inclined links are extracted from IMAGEM and stored in LINKM. The WORKING STORAGE memories (IMAGEM, NODEM and LINKM), which now represent the input character, are scanned either horizontally or vertically. While scanning, they are partitioned into segments. Each segment is represented by the links which connect it to the neighboring segments and the number of local features in each segment. Each segment is then coded as elements of three feature subvectors, ILINKV, OLINKV and TABV. The locations of these elements are indicative of the order of their corresponding segments. The feature vector which consists of these three subvectors is the input to the CLASSIFIER.

The CLASSIFIER preprocesses the feature vector by combining parts of its constituents into compound features (V-ELEMENTS) according [...]
/,< A Y, V, vl Y/ Y/ AA A sy V, %■ // ' A V, // f /j> /,* // 42 7a //, 54 */, %■ /,< Y,< V> 'A* // A AA M //> Y%> £* v.* /,> Y,> } A Ws // y Z< sy, Y,< 22 y* 'A y' t V, y< Yl \Y M /y Y,t A % % vl YA v y< y< VY A 77 Y, A A y+ '/< \y,i % 3 /!> /y, %Y 'A Y,< Y< x< % Y< v) / // z^ // // zz V V >jY 7/ j& A Ya V', /,< % Y< '// /<■ 7/ a) HSCANW33 DIRECTION OF SCANN -SCANNED AREA b) VSCANW33 Figure 2.3 Horizontal and Vertical Scanning-Windowing Schemes using 3x3 Window 20 DIRECTION OF ANNIN6 CENTER CELL (W23) WINDOW TO BE SCANNED £ W12 1st POSITION OF ASCANW12 §§ 1 3rd POSITION OF ASCANW12 ^ SCANNED CELLS OF W23 Figure 2.4 Anticlockwise Scanning-Windowing Scheme using 1x2 Window 21 IMAGEM is a two-dimensional 16x16 cells matrix for storing the binary image of the input character. The image is stored in the inner 14x14 matrix leaving the outer rows and columns empty (0-cells) for ease of manipulation. We consider 14x14 a reasonable size for the binary pattern under consideration. As a matter of fact a matrix of 14 rows by 9 columns has been found satisfactory for representing character images (17) . Gap-filling is the process by which gaps (0-cells) are filled with points (1-cells) between closely adjacents points, or between a point and a line or tips of lines. A gap may occur as a result of quantizing the input drawing on a discrete grid. It can also occur because the user may consider the gap insignificant (see section 1.4 on Human Factors) . One cell is considered a reasonable gap in a 16x16 cells IMAGEM. This assumption makes the gap-filling easy to perform using HSCANW33 (section 2.3). Examples of fill-in-gap patterns (FIG) are shown in Figure 2.5(a). For gap-filling the center cell of a 3x3 pattern is a O-cell while the neighboring cells may take any combination of 0-cells and 1-cells; 256 combinations in all. A FIG-Table of 256 1-bit entries has been constructed. Each entry is addressed by the contents of the outer cells of the current window. 
If the pattern is a FIG pattern the addressed entry contains '1'; otherwise it contains '0'. To fill gaps, IMAGEM is scanned using the HSCANW33 scheme and the center cell of the current window is changed to '1' if the surrounding cells address a FIG pattern in the table. Otherwise, it remains the same and the next window location is considered. The FIG operation is completed in one IMAGEM scan.

[Figure 2.5 Examples of Preprocessing Patterns: a) Fill-in-gap (FIG) Patterns; b) Clean-image (CLN) Patterns]

The clean-image process (CLN) follows the fill-in-gap (Figure 2.5(b)), and is designed to get rid of some of the quantization noise by smoothing staircase-like lines and deleting one-cell extensions of intersecting lines. In cleaning, the center cell under consideration is reset to '0'. The same scanning and table look-up mechanism used for fill-in-gap is also used here. A CLN-Table is constructed: an entry contains '0' if it corresponds to a CLN pattern, otherwise it contains '1'. While scanning IMAGEM the center cell will be changed to '0' (cleaned) if the surrounding cells address a CLN pattern; otherwise it remains the same. The clean-image operation also needs one scan only.

After the above two operations, the binary pattern in IMAGEM is suitable for feature extraction.

2.5 The FEATURE EXTRACTOR

The feature extraction technique adopted here is based on the hierarchical description of characters represented by multiple descriptions of the components. A character can be described as a human-generated line-like pattern which can be segmented into smaller patterns (segments) inter-related by connectivity (links) and spatial relations (above, left-of, etc.). Each segment in turn may consist of subsegments. Each segment is described by the distribution of links connecting it to its neighboring segments, by its contents of local features (ends of lines, curved lines, etc.), their distributions in the segment, etc.
Each description is encoded and stored separately in a feature subvector.

In INCOM each character is segmented horizontally and vertically. For each segment three descriptions are given: two descriptions using the distribution of the lines linking it to the neighboring segments (ILINKV and OLINKV), and the distribution of local features in each segment (TABV).

Three main operations are performed in the FEATURE EXTRACTOR: local feature extraction, structural feature extraction and feature vector formation. Local feature extraction involves two consecutive operations: one on IMAGEM to extract nodes and store them in NODEM, and the other (Merging) is applied to NODEM to eliminate the superfluous nodes. Both of these operations are explained in the next two sections.

2.6 Local Feature Extraction

a. Node Extraction

The types of nodes and their line representations as well as examples of their equivalent binary patterns are shown in Figures 2.6 and 2.7. It has been observed that 3x3 cells is the smallest window through which line endings, changes of direction and line intersections can be detected (section 1.4). The image cleaning operation has made it possible to use such a small size window to detect these nodes. The same scanning-windowing scheme, HSCANW33, in conjunction with addressing a table, NEX-TABLE, is used here to detect the nodes in IMAGEM. The contents of NEX-TABLE are the class codes of the nodes.

[Figure 2.6 Line Representation of Node Types (for the N-type node: more than two links intersect in one point)]

[Figure 2.7 Binary Representation of Nodes]

b. Merging

Because of the way that people draw line-like patterns (section 1.4), the quantization noise and the small size of the pattern used in node extraction, one may expect that not all the extracted nodes are necessary for representing the drawn character.
With this in mind it may be observed that a net change of line direction may be attributed to the sum of small local changes in direction, and that the place of this net change is not critical. One may also ignore changes in line direction (R, L, U, and D) when they are too close to line intersections (N) or line tips (T). To make use of these observations, line-like drawings were studied and a set of MERGING RULES was devised. These rules, when applied to the NODEM contents, leave some nodes as local features.

The MERGING RULES are designed to be applied to two neighboring nodal cells in a certain order defined later. Any two neighboring nodes can be seen as interacting, and the type of interaction determines the net result. Three types of interactions are defined between neighboring nodes:

i. Cancellation Interaction (Figure 2.8(a)): This occurs between R-type and L-type or U-type and D-type nodes. The result is to null the two nodes, i.e., to rewrite the two cells containing them as type 0 (null) nodes.

ii. Domination Interaction (Figure 2.8(b)): In this type of interaction one node dominates the other. After merging, the dominated node is nulled and the dominating node remains unaltered. Examples are interactions between an N-type or T-type node and any other type of node, resulting in a dominating N-type or T-type node, respectively.

iii. Modification Interaction (Figure 2.8(c)): This is similar to the previous type except that the dominating node is rewritten as another node (modified) and the dominated node is deleted (nulled).

Figure 2.8 Examples of Nodal Interactions: a) Cancellation Interaction, b) Domination Interaction, c) Modification Interaction

ALGORITHM MERGE has been designed to perform the merging operation using the HSCANW23 scheme and applying the MERGING RULES (Figure 2.9).

ALGORITHM MERGE:
1. Scan NODEM using HSCANW23 (section 2.3).
2. For each position of W23 apply the ASCANW12 scheme.
3.
For each position of W12 find the MERGING RULE which can be applied to merge the two cells seen through W12. Rewrite the contents of these two cells.

This algorithm needs five steps for each NODEM cell: one step for positioning W23 and four steps for scanning W23 using ASCANW12. Therefore, the total number of steps is 5N², where N is the dimension of the NODEM matrix. However, as will be seen in the hardware implementation of the algorithm (section 3.6), one can reduce this number to 4N² by overlapping the positioning of W23 with the merging of two cells.

2.7 Structural Features Extraction

Structurally, a character is described by means of horizontal (or vertical) segments connected together by links, as has been seen in Figure 2.1. This level of description will suppress some of the details (local features) and can be used to classify the characters into subgroups having the same structures. It also allows for a variety of forms of the same character to be recognized using the structural features (Figure 2.10).

Figure 2.9 MERGING RULES: i. Cancellation Rules, ii. Domination Rules (T0 ← Tn and 0T ← nT, where n is any node other than N; N0 ← Nk and 0N ← kN, where k is any node), iii. Modification Rules

a. Segmentation

The character is segmented horizontally or vertically. A segment may consist of several separable subsegments if there are no lines interconnecting them within the segment. Each subsegment may consist of a part of a line or one or more local features.
These local features are related within the subsegment by the spatial relationships 'left of' or 'above', in the case of horizontal or vertical segmentation, respectively.

Segmenting the character into horizontal segments will be considered here. Intuitively, the local features which appear in consecutive rows in NODEM could be included in the same horizontal segment. However, a limit on the number of these rows has to be set. Otherwise, a character which has local features in every row will be considered as one horizontal segment no matter what its size is. In this case the spatial interrelationship in the vertical direction will be lost. To avoid this difficulty and to be compatible with the previous processing (3x3 window), no more than two rows are merged into one row if each row contains at least one local feature. In other words, the WORKING STORAGE memories (IMAGEM, LINKM and NODEM) are segmented according to the local feature distribution in NODEM. In view of this discussion the following algorithm defines the boundaries of horizontal segments.

Figure 2.10 Different Shapes of 'A' with the Same Structural Features (Horizontal Segmentation)

ALGORITHM_HSEGMENT:
1. The first segment begins at the first row of the working storage memories.
2. It ends at the row for which one of the following situations occurs:
   i. Both the rows immediately above and below have local features.
   ii. A previous row has local features in the same segment and the next row also has local features.
   iii. The whole working storage has been scanned.
3. The next segment begins at the row immediately below the last segment.
4. It ends at the next row for which one of the previously mentioned situations (step 2) occurs.

While defining the boundary rows of a segment a vector is constructed to define the boundaries of its subsegments. This vector is called the separation vector (SEPVEC).
Construction of SEPVEC

SEPVEC has the same length as the dimension of the working storage memories (IMAGEM, NODEM and LINKM). It is constructed by projecting vertically the points (1's) contained in the segment onto a horizontal line (HSEPVEC) (Figure 2.11). In this example the segment under consideration consists of two subsegments. Adjacent points (1's) in SEPVEC define the ranges of the subsegments, which are separated by adjacent 0's.

Figure 2.11 Example of Horizontal Separator Vector (HSEPVEC)

b. Linking of Segments

Linking is the property that two neighboring segments are connected by links. Since these links are at the boundaries of segments, they contain no nodal features. Links are designated as input links (ILINKS) or output links (OLINKS) according to the way they connect the segment to its neighbors. The ILINKS are either ULINKS or LLINKS, and the OLINKS are either DLINKS or RLINKS, according as the segmentation is horizontal or vertical (see Figure 2.1). In other words, IOLINKS flow vertically for horizontal segmentation, or horizontally for vertical segmentation.

A simple way of extracting these IOLINKS is to eliminate horizontal or vertical lines from the binary image in IMAGEM for horizontal or vertical segmentation, respectively. The resulting binary pattern is stored in LINKM. LINKM is then segmented, and the links that appear on the segment boundaries are the IOLINKS for the particular segmentation. An algorithm which does this follows.

ALGORITHM_IOLINKS:
1. Clear LINKM.
2. Scan IMAGEM using the HSCANW12 (VSCANW12) scheme for horizontal (vertical) segmentation.
3. For each position of W12 in IMAGEM copy its right (down) cell into the corresponding cell of LINKM if and only if its other cell is 0.

After applying this algorithm, the LINKM memory will contain a subset of the character binary image (IMAGEM).
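The SEPVEC projection and ALGORITHM_IOLINKS described above can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the INCOM hardware: the memories are represented as Python lists of 0/1 rows, the function names (`sepvec`, `subsegment_ranges`, `iolinks`) are hypothetical, and a cell lying outside the plane is assumed to read as 0.

```python
from typing import List, Tuple

Grid = List[List[int]]  # a working storage plane as rows of 0/1 cells


def sepvec(plane: Grid, first_row: int, last_row: int) -> List[int]:
    """Project the 1's of rows first_row..last_row (inclusive) onto one row."""
    width = len(plane[0])
    return [
        1 if any(plane[r][c] for r in range(first_row, last_row + 1)) else 0
        for c in range(width)
    ]


def subsegment_ranges(vec: List[int]) -> List[Tuple[int, int]]:
    """Return the (start, end) column range of every run of adjacent 1's."""
    ranges = []
    start = None
    for c, bit in enumerate(vec):
        if bit and start is None:
            start = c                      # a new subsegment begins
        elif not bit and start is not None:
            ranges.append((start, c - 1))  # a 0 closes the subsegment
            start = None
    if start is not None:
        ranges.append((start, len(vec) - 1))
    return ranges


def iolinks(imagem: Grid, horizontal: bool = True) -> Grid:
    """Copy a cell into LINKM only if the other cell of W12 is 0.

    For horizontal segmentation the other cell is the left neighbour;
    for vertical segmentation it is the upper neighbour. A neighbour
    outside the plane is assumed to be 0 (an assumption of this sketch).
    """
    rows, cols = len(imagem), len(imagem[0])
    linkm = [[0] * cols for _ in range(rows)]  # step 1: clear LINKM
    for r in range(rows):                      # step 2: scan IMAGEM
        for c in range(cols):
            if horizontal:
                other = imagem[r][c - 1] if c > 0 else 0
            else:
                other = imagem[r - 1][c] if r > 0 else 0
            if other == 0:                     # step 3: conditional copy
                linkm[r][c] = imagem[r][c]
    return linkm
```

In this model a horizontal stroke is reduced to its leftmost cell under horizontal segmentation, while a one-cell-wide vertical stroke is copied unchanged, which is the sense in which LINKM holds a subset of IMAGEM.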
After segmentation, if row I and row J (I [...]

[...] are needed to perform them. A control module produces open-collector signals which can be wire-ORed with other control signals. This arrangement facilitates modularizing the controls and makes it possible to add or modify control signals or even entire modules. In terms of the algorithms described in Chapter 2, the memory modules hold the data structures inherent in the algorithms, the data manipulation modules perform the operations and the control modules implement the sequencing of these operations.

3.2 The PANEL

Besides its communication role as an input-output tool, the PANEL (Figure 3.3) has been designed to help the user direct the operation of INCOM. It also has debugging aids which help in constructing the device. Its switches and indicators are grouped into the OPERATION SELECTION, CLOCK, SIMULATED TABLET SIGNALS, CONTROL COUNTERS, DATA IN and DATA OUT groups.
The OPERATION SELECTION group switches are set according to the chosen machine mode. The MODE switch is active only when the OPERATION SELECTION switch is in the NORMAL position. By choosing the RECOGNITION or the LEARNING position of the MODE switch the user can make the machine perform these respective functions. In the PROGRAM position the INPUT switch is active and the DATA IN switches are interpreted by the machine according to the position of the INPUT switch.

To help in operating the machine, status LEDs are lit when actions are required from the user. To inform the user, the PROCEED LED is turned on in conjunction with one or more of the remaining status LEDs. These are INITIALIZE and DRAW in the OPERATION SELECTION group, CHARACTER DISPLAYED in the DATA OUT group and the DEPOSIT LED in the DATA IN group. The user should act according to the requirements of the status of the machine before pressing the PROCEED button. Once this button is pressed the machine resumes operation until it reaches another state in which a status LED is lit, asking for a new action from the user.

Figure 3.3 (the PANEL)

When the INITIALIZE LED is on, the machine is in the INITIALIZATION state. If the user wants to train the machine from scratch, the main memories of the machine are initialized.
If this is not the case and the operation is NORMAL, the MODE switch is positioned accordingly. In the LEARNING mode the input character code is inserted via the DATA IN switches before pressing the PROCEED button. In the RECOGNITION mode there is no need to input anything at this stage, except to push the PROCEED button. This makes the machine go to the next state, DRAW, and the DRAW LED is lit. The user can now draw a character point by point by pressing the ENTER switch. The DRAW action using the SIMULATED TABLET SIGNALS is enabled when these switches are in the CONTACT and X-Y COORDS positions. When the user has completed the drawing, the upper switch must be placed in the END position for the machine to begin processing the stored image of the drawn character.

As a debugging aid the master CLOCK can be FAST or MANUAL. In the FAST mode the CLOCK runs at its maximum speed, while in the MANUAL mode pressing the STEP button passes one CLOCK pulse only. This is used for single stepping through the machine states. The machine states are displayed using the CONTROL COUNTERS' LEDs. Each control counter is indicated by a letter (M, S, F, A, B, C, V) followed by the corresponding LEDs.

The DATA IN group consists of the DATA IN switches and LEDs and the DEPOSIT switch and its LED. The DATA IN switches are enabled when the MODE switch is in the LEARNING position or the OPERATION SELECTION switch is in the PROGRAM position. Information to be input is set up on the DATA IN switches and then the DEPOSIT switch is pressed.

The DATA OUT group consists of the DATA OUT LEDs, which display the code of the recognized character, the CHARACTER DISPLAYED status LED, and the CORRECT/TRY AGAIN! switch. The CHARACTER DISPLAYED LED is on when the machine is in the recognition mode and is asking the user whether the character code displayed on the DATA OUT LEDs is satisfactory. Accordingly, the user sets the switch to CORRECT if satisfied or to TRY AGAIN! otherwise.
The PROCEED switch is then pressed to let the machine continue operation.

3.3 System Organization and the MASTER CONTROL

Before describing in detail the different processors and modules of INCOM, the operation of INCOM will be described step by step following the flow chart of the MASTER CONTROL (M-CONTROL) (Figure 3.4). This description will provide a general understanding of the different processes involved in learning and recognition of input characters. Reference is also made to the SYSTEM DATA FLOW DIAGRAM (Figure 3.5) in conjunction with the description of the operation of the M-CONTROL. For convenience Table 3.1 of PROCESSORS-OPERATIONS-MODULES is included, in which the different operations performed in INCOM are listed in conjunction with their associated modules.

Figure 3.4 System Operation Flow Chart (M-CONTROL)
States: M0 RESET, M1 INITIALIZE, M2 DRAW CHARACTER, M3 FILL IMAGE, M4 CLEAN IMAGE, M5 EXTRACT NODES, M6 MERGE NODES, M7 H-LINK, M8 H-FORM VECTOR, M9 H-CLASSIFY, M10 DISPLAY CHARACTER CODE, M11 V-LINK, M12 V-FORM VECTOR, M13 V-CLASSIFY, M14 NO OPERATION, M15 NO OPERATION.
Phase labels: PREPROCESSING, FEATURE EXTRACTION, CLASSIFICATION. Branch labels at M10: RECOGNITION (YES / NO, TRY AGAIN), LEARNING.

Figure 3.5 System Data Flow Diagram

Table 3.1 PROCESSORS-OPERATIONS-MODULES
Columns: PROCESSORS; SUBPROCESSORS/PARTS; OPERATIONS PERFORMED; MODULES INVOLVED (MAIN CONTROLS, SPECIAL CONTROLS, DATA MANIPULATORS, MEMORY MODULES).
PREPROCESSOR: operations DRAW, FILL, CLEAN; controls include S-CONTROL; data manipulators PANEL + PRELOG, PRELOG, PRELOG; memory module IMAGEM.
FEATURE EXTRACTOR: operations NODE EXTRACTION, MERGING, LINKING; controls include SEQUENCER; data manipulators PRELOG, MERGER, PRELOG; memory modules IMAGEM + NODEM, NODEM, IMAGEM + LINKM.
FEATURE EXTRACTOR: operations SEGMENTATION, FEATURE VECTOR FORMATION; data manipulators TEMP. SEGM., ENCODING LOGIC; memory modules IMAGEM + NODEM + LINKM, SUBVECTORS MEMORIES.
CLASSIFIER, V-ELEMENT CONSTRUCTION part: operation V-ELEMENT CONSTRUCTION; control V-CONTROL; data manipulator V-ELEMENT CONSTRUCTION MODULE; memory modules SUBVECTORS MEMORIES, RF register of FEPOLIST.
CLASSIFIER, DATA STRUCTURING part: operations LEARNING/RECOGNITION; controls A-CONTROL, B-CONTROL; data manipulator INSERT LOGIC of CONFUMAT; memory modules FEPOLIST + CONFUMAT + CHAR register of DECISION MODULE + CHAROUT of PANEL.
CLASSIFIER, DECISION part: operation RECOGNITION; control C-CONTROL; data manipulator DECISION MODULE; memory module RF of FEPOLIST.

The M-CONTROL has 16 states (M0-M15), two of which, M14 and M15, are dummy states in which no operations are performed. The normal learning cycle begins with M1, ends in M13 and returns to M1. In the first phase of recognition, M2-M9, the character image is processed by scanning it horizontally, while in the second phase, M11-M13, the scanning is done vertically. The code of the recognized character is displayed in M10 after the first phase, waiting for the user response. According to the response, the machine may repeat the first phase by returning to M1 and waiting for a new character. Otherwise, the machine will start the second phase, completing the process, displaying the character code and returning to M1. While describing the operations of INCOM the actions required by the user will also be stated.

M0: RESET
On power on, or when the RESET switch is pressed, the M-CONTROL counter is cleared and the RESET LED is turned on. The WORKING STORAGE memories (IMAGEM, NODEM and IOLINKM) are also cleared in this state. The PROCEED LED will be on, waiting for the user to press the PROCEED switch. When it is pressed, the machine proceeds to the next state, M1.

M1: INITIALIZE
The INITIALIZE and PROCEED LEDs will be on, indicating that the machine is ready for initialization.
If a new set of V-ELEMENT CONSTRUCTION SCHEMES is to be written into the VMEM memory of the V-ELEMENT CONSTRUCTION MODULES, the OPERATION SELECTION switch is set to the PROGRAM position and the user enters the necessary information using the DATA IN group and the INPUT mode switch. Each time a new set of these SCHEMES is introduced the CLASSIFIER memories (FEPOLIST and CONFUMAT) have to be initialized. This initialization involves clearing the CONFUMAT and FEPOLIST memories.

In the normal operation of INCOM the above initialization is done only once, during start-up. After that the OPERATION SELECTION switch remains in the NORMAL position while the MODE switch is set either to the LEARNING or to the RECOGNITION position. In the LEARNING mode the input character code is deposited in the CHAR register/counter of the DECISION module using the DATA IN group. In the recognition mode the DATA IN switches are disabled, and this step is ignored. In either case PROCEED has to be pressed.

M2: DRAW CHARACTER
The DRAW LED will be on, signalling the user to begin drawing a character. The CONTACT/END switch of the SIMULATED TABLET SIGNALS group should be in the CONTACT position as long as the user has not finished drawing the input character. When the user flips the switch to the END position the machine proceeds to the next state and begins processing the character. This DRAW operation is under the supervision of the S-CONTROL (Appendix A).

M3: FILL IMAGEM, M4: CLEAN IMAGEM
The preprocessing operations of fill-in-gap and clean-image are performed by the PRELOG module under the supervision of the M-CONTROL while scanning IMAGEM horizontally. Upon receiving a completion-of-scanning signal from the PRELOG module the machine proceeds to the next state, enabling the node extraction operation.

M5: EXTRACT NODES
This is the front end state of the FEATURE EXTRACTION processor.
While scanning both IMAGEM and the node memory, NODEM, nodes (MNEX) are extracted from IMAGEM by the PRELOG module. They are routed through the MERGER module to be stored in their corresponding cells in NODEM. At the end of the scan (CRV=0) the machine proceeds to MERGE.

M6: MERGE
During the MERGE cycle, the COPYING REGISTERS in the MERGER module act as the W23 window of NODEM. While scanning NODEM, the COPYING REGISTERS contents are merged using the MERGING RULES and shifted back into NODEM. One extended scanning cycle is needed for the completion of merging. This cycle needs 4x256=1024 master clock cycles.

M7: H-LINK, M11: V-LINK
While scanning the WORKING STORAGE the LINKING part of the PRELOG module extracts LINKS from IMAGEM and stores them in the LINKM memory. The H-LINK and the V-LINK operations are identical in every respect except for the scan mode.

M8: H-FORM VECTOR, M12: V-FORM VECTOR
The FEATURE VECTOR FORMATION SUBPROCESSOR control (F-CONTROL) is enabled while scanning the WORKING STORAGE memories. The SUBVECTORS, ILINKV, OLINKV and TABV, are formed using the TEMPORARY SEGMENT module, the F-CONTROL and the ENCODING LOGIC of the FEATURE SUBVECTORS module. They are stored in their respective memories of the last module. According to the direction of scanning the H-FORM or the V-FORM is enabled. One scanning cycle is all that is needed to complete the FORM operation. At the end of scanning, the F-CONTROL returns control to the M-CONTROL, which proceeds to the next state.

M9: H-CLASSIFY, M13: V-CLASSIFY
The CLASSIFIER processor is enabled in this state. It consists of the V-ELEMENT CONSTRUCTION, the DATA STRUCTURING and the DECISION parts. The first, with its control (V-CONTROL), forms the front end of the CLASSIFIER. It reads parts of the SUBVECTORS memories, forming a V-ELEMENT. Each formed V-ELEMENT is held in the register file RF of the FEPOLIST module of the DATA STRUCTURING part, waiting for further processing.
The DATA STRUCTURING part has two control modules: the A-CONTROL and the B-CONTROL. The A-CONTROL acts as the main control of the CLASSIFIER processor while the B-CONTROL complements the operation of the A-CONTROL.

In the learning mode new entries are added to, or modifications are performed on, the contents of the memories of the DATA STRUCTURING SUBPROCESSOR (FEPOLIST and CONFUMAT) to accommodate the input character code and its constructed V-ELEMENTS. When all the possible V-ELEMENTS of the current scanning mode have been processed the A-CONTROL returns control to the M-CONTROL.

In the recognition mode the FEPOLIST memory is searched and the CONFUMAT memory is retrieved to find the character codes corresponding to the V-ELEMENTS constructed. If a V-ELEMENT is found in FEPOLIST which matches the constructed V-ELEMENT of the drawn character and has a unique character code associated with it, this character code is displayed in the DATA OUT group of the PANEL. Then control returns to the M-CONTROL. On the other hand, if a V-ELEMENT is found with a pointer to a word in CONFUMAT, this indicates that there is more than one character associated with this V-ELEMENT. The CONFUMAT word is read, under the B-CONTROL, and added to the scores accumulated from the previous retrievals of CONFUMAT. These scores are stored in a set of registers in the DECISION module. Now the C-CONTROL, which controls the DECISION part, takes over and the character code which corresponds to the maximum score is found and displayed. However, this does not terminate the classification process unless all the V-ELEMENTS have been processed. The processes described in this paragraph are repeated for all the remaining V-ELEMENTS and at the end the CLASSIFIER returns control to the M-CONTROL.

M10: DISPLAY
In the learning mode the machine proceeds to the next state, M11, on the next clock pulse.
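As an aside on the recognition-mode search of state M9 described above, the decision procedure can be sketched in software as follows. This is a minimal model under stated assumptions rather than the actual FEPOLIST/CONFUMAT organization: FEPOLIST is modelled as a dictionary keyed by V-ELEMENT, each entry is tagged `"unique"` or `"confused"`, a CONFUMAT word is modelled as a list of per-character score increments, ties are resolved in favour of the lowest character code, and unmatched V-ELEMENTS are skipped. All of these names and layouts are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

# Hypothetical FEPOLIST entry: ("unique", character code) or
# ("confused", index of a CONFUMAT word).
Entry = Tuple[str, int]


def recognize(
    v_elements: Sequence[Hashable],
    fepolist: Dict[Hashable, Entry],
    confumat: List[List[int]],
    num_chars: int,
) -> Optional[int]:
    """Return the character code displayed at the end of classification."""
    scores = [0] * num_chars          # score registers of the DECISION part
    displayed = None                  # nothing displayed yet
    for v in v_elements:
        entry = fepolist.get(v)
        if entry is None:
            continue                  # assumption: no match, no contribution
        kind, value = entry
        if kind == "unique":
            return value              # unique code: display it and stop
        # Confusable V-ELEMENT: accumulate the CONFUMAT word into the scores.
        for code, increment in enumerate(confumat[value]):
            scores[code] += increment
        # Display the code with the maximum accumulated score so far
        # (ties go to the lowest code, an assumption of this sketch).
        displayed = max(range(num_chars), key=scores.__getitem__)
    return displayed
```

For example, with two confusable V-ELEMENTS whose CONFUMAT words are `[1, 1, 0]` and `[0, 1, 1]`, the accumulated scores become `[1, 2, 1]` and character code 1 is the one displayed.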
In the recognition mode the machine stays in state M10 until the user presses the PROCEED button. The CHARACTER DISPLAYED LED is now on, indicating that the DATA OUT LEDs display the code of the recognized character. In this case the user has the choice of returning the machine to state M1 or proceeding to the next state, M11. If the displayed character code is satisfactory, the user may set the switch in the DATA OUT group on the PANEL to the CORRECT position and press PROCEED, returning the machine to M1 to wait for a new input. However, if the displayed code is not satisfactory, setting the switch to the TRY AGAIN! position makes the machine proceed to M11 to 'look' at the character from a different view (vertical scanning).

3.4 The WORKING STORAGE

The WORKING STORAGE consists of the memories which hold the image of the input character and the features which are to be extracted from this image. It also contains the necessary logic for the scanning and windowing described in Chapter 2. There are three distinct memories: IMAGEM, NODEM and LINKM. These memories are essentially two-dimensional matrices of the same dimensions: 16x16 cells in the current implementation. A submatrix of 3x3 cells in each memory, called the window, is the means through which the contents of each memory can be processed. This window is located identically in the upper left corner of the matrices. Although this window is fixed in position, the particular organization and implementation of the memories make it possible to virtually move this window one location horizontally or vertically in one clock cycle.

Each memory consists of one or more WORKING STORAGE planes. IMAGEM and LINKM are one plane each while NODEM is four planes.

WORKING STORAGE Plane Organization:
A WORKING STORAGE plane is organized with the following design objectives in mind.
i. Resemblance, as close as possible, to the two-dimensional properties of the digitized input pattern.
This suggests regularity, with cell structure and accessibility in both row and column dimensions.
ii. Ease of scanning of the memories horizontally (row by row) or vertically (column by column).
iii. Ease of accessing windows. Ideally, by addressing the center cell of a window, its cells will be easily available for processing.

The universal shift register (74198) chips were found to meet the requirements stated above if they are connected in the organization shown in Figure 3.6. Each column consists of two 74198s, giving 32 chips in a 16x16 plane. In the upper left corner is the 3x3 window (W33) through which processing is performed. Inputs and outputs of an array are connected through selectors as shown in Figure 3.6. By skewing the outer row (or column) one cell position and feeding it back to the input while shifting the whole array, all the memory cells in the plane will pass a chosen fixed location. Thus, scanning horizontally (or vertically) is achieved. This fixed location is chosen to be the center cell of W33. With this technique the physically fixed window W33 can be effectively scanned over every location in the plane.

To avoid end effects when the window is at the edges of the plane, the information contents of the memories are always contained [...]

Figure 3.8 PRELOG Logic Diagram (blocks: ADDRESSING and STORING LOGIC; TABLET SIMULATION and IMAGE DISPLAY LOGIC)

The display signals, the counter load signals and the cursor positioning signals are generated and controlled by the TABLET SIMULATION and IMAGE DISPLAY LOGIC. When the XYCORDS signal is high the counter load signals are disabled and the cursor signals are enabled and passed to the ADDRESSING and STORING LOGIC. The XN counter is incremented by a SLOW CLOCK (0.5 sec) to advance the cursor in the X-direction on the display screen. For convenience the YN counter can also be incremented by pushing the DEPOSIT button on the DATA IN group of the PANEL. When XYCORDS is low the load signals LDYN and LDXN are passed to their corresponding counters. In this case the device can accept input coordinates from a tablet digitizer under the S-CONTROL. Because an actual tablet is not available, the TABLET SIMULATION XYCORDS signal is kept high.

Preprocessing and node extraction involve scanning the IMAGEM memory and using the outer cells of its window, OIMC2-OIMC9, to address the contents of ROM-1. ROM-1 holds the FIG, CLN and NEX tables, as described in Chapter 2. The FILL and CLN bits of ROM-1 are used for preprocessing while the NEX bits are used for the node extraction operation.

The operations of storing points in IMAGEM or preprocessing its contents can be viewed as modifying the output of the center cell, OIMC1, before storing the result in its next position in IIMC6. These modifications are done by a multiplexer whose output, IIMC6, is controlled by the control signals ECORD, EFILL and ECLN. When all these signals are inactive (high), OIMC1 is passed without change to IIMC6. The storing operation (ECORD=LOW) is performed by OR-ing the output of the coincidence comparator with OIMC1 before it is passed to IIMC6.
Since the fill-in-gap operation actually inserts new points, it is done similarly, except that the FILL bit of ROM-1 is ORed with OIMC1 and passed to IIMC6 when EFILL=LOW. When ECLN=LOW the clean-image operation is performed by ANDing the CLN bit with OIMC1.

To extract nodes from IMAGEM, the contents of its window outer cells, OIMC2-OIMC9, are used to address ROM-1 and the output NEX bits are ANDed with the contents of the center cell (OIMC1) to form the extracted node MNEX. MNEX is routed through the MERGER module to be written in the corresponding NODEM cell during the node extraction cycle (ENEX=0).

The LINKING portion of the PRELOG module performs the linking operation by copying the contents of IMAGEM into LINKM with either the horizontal lines or the vertical lines deleted, according to whether the scanning is horizontal or vertical. Two adjacent cells of the IMAGEM window, including its center cell, are tested. The next content of the center cell of the LINKM window is cleared if the processed outer cell of the IMAGEM window has a '1' in it. The outer cell tested is one of two IMAGEM window cells, depending on whether the scanning is horizontal or vertical.

3.6 The MERGER

The MERGER will be described using Figures 3.9 and 3.10. Merging is enabled when the EMERGE signal from the M-CONTROL is low. In this case the COPYING REGISTERS act as the NODEM W23 window. The logic associated with them acts as an interface between the MERGER module and NODEM by rerouting the processed cells of the window back to NODEM.

Figure 3.10 MERGER Logic Diagram

To perform merging the SEQUENCER simulates the ASCANW12 scheme by sequencing the register clocks and selecting two cells of the COPYING REGISTERS to address the MERGING DICTIONARY (ROM-2) through the ADDRESSING LOGIC.
Each ROM word corresponds to a merging rewriting rule for the two-cell combination which addresses it. The addressed word is used to rewrite the contents of the two cells in their new positions. In other words, one clock cycle is needed to address the ROM while the previous two cells are rewritten. There are four two-cell combinations processed for each position of W23. Instead of using a clock period exclusively for positioning W23, positioning the window and processing the first two-cell combination are performed simultaneously. This saves a clock period: instead of five clock periods to process W23, only four are required.

The scanning clock (CKW) of the WORKING STORAGE is controlled in this module. During merging (EMERGE=0) a divide-by-4 counter is enabled and the main clock is scaled down by a factor of 4 to produce CKW. However, in any other state, the counter is always reset to 0 and CKW is the same as the main clock. At the same time the COPYING REGISTERS are bypassed. In the NODE EXTRACT state (ENEX=0), the MNEX output of PRELOG is routed through the MERGER to be stored in NODEM, while the merging operation is disabled.

3.7 FEATURE VECTOR FORMATION SUBPROCESSOR

By the time control is transferred to this processor (EFETV=0) the WORKING STORAGE memories, IMAGEM, NODEM and LINKM, contain the different representations of the character: the preprocessed image, the local features and the horizontal or the vertical links.

The SUBPROCESSOR segments and subsegments the character representations in the WORKING STORAGE and codes the resultant segments into the three feature subvectors ILINKV, OLINKV and TABV. These subvectors are formed concurrently and stored in their corresponding memories.

The SUBPROCESSOR consists of three main modules: the TEMPORARY SEGMENT, the FEATURE SUBVECTORS MEMORIES and the F-CONTROL (Figure 3.11).
The TEMPORARY SEGMENT module acts as the front-end processor, while the FEATURE SUBVECTORS MEMORIES hold the processed subvectors. The F-CONTROL module directs the different operations performed by the SUBPROCESSOR. The data manipulation logic of this subprocessor is distributed among its different modules, so as to reduce the number of modules and the total number of components and interconnects.

As the WORKING STORAGE memories are scanned, they are segmented, and each segment is compacted into three temporary vectors by the UPDATING LOGIC and stored in the corresponding registers of the TEMPORARY SEGMENT module. The contents of these vectors, as well as the output of the LINKM memory, are transferred through the F-CONTROL module to the FEATURE SUBVECTORS MEMORIES module, where they are encoded by the ENCODING LOGIC. When the end of the currently scanned character segment is encountered, the encoded subvector elements are stored in their respective locations in the FEATURE SUBVECTORS MEMORIES.

3.8 The TEMPORARY SEGMENT Module

The segmentation of the WORKING STORAGE is determined by the distribution of nodes in NODEM (Figure 3.12). The history of the segmenting flip-flop (SEGF) in the F-CONTROL module represents this distribution. A

[Figure 3.11]

[Figure 3.12]

pointer. The pointer points initially to the position immediately left of the 4-bit INSERT REGISTER. Every time an event is encountered in the scanned segment the COUNTER is incremented, thus moving the pointer one position to the right.
At the end of each subsegment (a string of 1's in the SEPARATOR VECTOR), the position of the pointer, which now equals the number of events, is ORed with the current contents of the INSERT REGISTER. When the scan of the current segment is completed, the different INSERT REGISTERS are written into their corresponding SUBVECTOR MEMORY locations, addressed by the SEGMENT NUMBER COUNTER (BC counter).

3.10 The FORMATION SUBPROCESSOR Operation

The subprocessor operation (Figure 3.14) is controlled mainly by the F-CONTROL module. Besides the control logic there are two flag flip-flops (SEGF and HVF) and two loop counters (U and V). SEGF is set to '1' whenever the next row (or column) of NODEM contains at least one node. It is always cleared at the beginning of each row (or column) (U=0). Its output is always sampled by the F-CONTROL at the end of the currently scanned row (or column). HVF is the scanning mode flip-flop. It is set to 1 or to 0 according as the mode is horizontal or vertical.

The scanning counters U and V indicate the position of the currently processed cell. The U-counter indicates the column or row, while the V-counter indicates the row or column, for the horizontal or vertical scanning mode, respectively. The U-counter is incremented by the WORKING STORAGE scanning clock (CKW). The carry output of the U-counter indicates the end of the currently scanned row or column, and it also increments the contents of the V-counter. The carry of the V-counter (CRV=0) signals the end of scanning.

[Figure 3.14: F-CONTROL Flow Chart]

Before describing the operation of the F-CONTROL we observe the following. First, the memory buffers and counters of the ENCODING LOGIC, in the FEATURE VECTOR module, are always cleared at the falling edge of the master clock when U=0.
Second, the BC counter holds the number of the currently scanned segment. It is updated by an incrementing pulse from the F-CONTROL whenever a new segment is encountered. At the end of scanning it contains the total number of segments minus one.

While scanning the WORKING STORAGE the following operations are performed in the subprocessor.

F0: INITIALIZE (BEGIN FIRST SEGMENT)
The TEMPORARY SEGMENT vectors SEPV, NODV and ILV and the BC counter are cleared. The control remains in this state for one clock period. At the end of the state the output of SEGF is sampled. Since SEGF has not been cleared yet, its contents indicate whether the row (or column) which is about to be processed has a node in it or not. If SEGF=1 the control goes to state F3. Otherwise, it proceeds to state F1.

F1: UPDATE1
The processing of the first segment (BC=0) begins in this state. ILV remains cleared. This indicates that there is no connection between this segment and a segment before it, because it is the first segment. The updating of SEPV and NODV begins here. The duration of this state is one clock period.

F2: UPDATE2
The procedure begun in F1 is continued here: SEPV and NODV are updated. Since SEGF is not tested at the end of F1, the current row (or column) is ORed into SEPV and NODV whether it contains a node or not. At the end of F2 (CRU=0), SEGF is sampled. If SEGF=0 the control remains in F2; otherwise it proceeds to F3.

F3: UPDATE3
This state indicates the first occurrence of a row (or column) which has a node in it. Updating is also continued here, but only for one row (or column).

F4: CONSTRUCT FIRST SEGMENT
The updating of the SEPARATOR and NODES vectors is continued. At the same time the encoding of the different subvector elements is performed. At the end of this state the constructed elements in the memory buffers are written into the memory locations addressed by BC.
This means that they will reside in location '0'. The control will remain in this state unless SEGF is 1. If there are no other segments the scan will end by branching to F7. The subsequent segments are processed during states F5 and F6.

F5: BEGIN NEW SEGMENT
The processing of a new segment, other than the first, begins here. To indicate this fact BC is incremented. The TEMPORARY SEGMENT vectors are cleared at the beginning of the first row (or column) before the new information is stored in them. The remainder of the state is devoted to updating the above vectors. The ILV will contain the first row of the new segment of LINKM. To ensure continuity and consistency, the last row (or column) of the last segment is ORed with the new row (or column) of the new segment. The NODES vector is updated as usual.

F6: CONSTRUCT NEW SEGMENT
The operations performed in this state are similar to those performed in F4, except that they are performed for new values of BC, which is no longer zero.

F7: END-F
When the end of scan is reached (CRV=0) the control is transferred to this state. On the next clock cycle the control is transferred to F0, waiting for the next activation of the F-CONTROL (EFETV=0).

3.11 The CLASSIFIER

Besides the controls, the CLASSIFIER consists mainly of three parts: the V-ELEMENT CONSTRUCTION part, the DATA STRUCTURING part, and the DECISION part (Figure 3.15). The first part reads the FEATURE VECTOR memories, constructs a V-ELEMENT and sends it to the DATA STRUCTURING part for further processing. The DATA STRUCTURING part accepts this V-ELEMENT and tries to find a match for it among the V-ELEMENTS in its store. There are two modes of operation for this module: the learning mode and the recognition mode. If the matching is not successful and the module is in the learning mode, the new V-ELEMENT is stored with its character code in a new location in the FEATURE POINTER LIST memory (FEPOLIST).
If the matching is successful and a character code is stored in the same word of FEPOLIST, two possibilities arise. The first possibility is that the stored character code is the same as the new input character code. The next V-ELEMENT is then produced and the above processes are repeated. The second possibility is that the input character code is different from the stored character code. In this case, these two codes are decoded into one CONFUMAT word. The address of this word is stored as a pointer in the corresponding FEPOLIST word. A last possibility arises when the V-ELEMENT matches the feature part of a word in FEPOLIST but the POINTER part addresses a CONFUMAT word. In this case, the new character code is encoded and added to the CONFUMAT word by setting the corresponding character bit to '1'.

[Figure 3.15]

In the recognition mode, each new V-ELEMENT is matched against all the V-ELEMENTS stored in FEPOLIST. If the matching is successful and a character code is stored in the matched FEPOLIST word, this code is displayed as the recognized character code. If a pointer is stored instead, the addressed word of CONFUMAT is read and its contents are sent to the DECISION module. When all the V-ELEMENTS have been processed, the decision logic displays the character which possesses the maximum matching score.

Four control modules supervise the different operations performed by the CLASSIFIER. These modules are the A-control, the B-control, the V-control and the C-control. The A-control acts as the master control for the remaining modules and also supervises the operations of the DATA STRUCTURING part. The B-control complements the operations of the A-control. The V-control supervises the operations of the V-ELEMENT CONSTRUCTION module. Physically, the A-control and the B-control are on two separate cards, while the V-control and C-control occupy a third card.
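The learning and recognition rules described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design: the class name `ClassifierModel`, the use of a Python dictionary in place of the sequential FEPOLIST search, the integer bitmask standing in for a CONFUMAT word, the `None` result when nothing matches, and the tie-breaking rule (lowest character code wins) are all choices made for the illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

VElement = Tuple[int, int, int]  # (ID byte, feature byte 1, feature byte 2)


class ClassifierModel:
    """Toy model of the FEPOLIST / CONFUMAT bookkeeping (hypothetical names)."""

    def __init__(self, n_chars: int = 32) -> None:
        self.n_chars = n_chars
        # FEPOLIST: V-ELEMENT -> (TP, P-part).
        # TP=1: the P-part is a character code (the V-ELEMENT is unique).
        # TP=0: the P-part is the address of a CONFUMAT word.
        self.fepolist: Dict[VElement, Tuple[int, int]] = {}
        # CONFUMAT: one bit per character, modelled as an integer bitmask.
        self.confumat: List[int] = []

    def learn(self, v: VElement, char: int) -> None:
        entry = self.fepolist.get(v)
        if entry is None:
            # No match: store the new V-ELEMENT with its character code.
            self.fepolist[v] = (1, char)
            return
        tp, p = entry
        if tp == 1:
            if p != char:
                # Two different characters share this V-ELEMENT: build a
                # CONFUMAT word holding both and store a pointer to it.
                self.confumat.append((1 << p) | (1 << char))
                self.fepolist[v] = (0, len(self.confumat) - 1)
        else:
            # The pointer already addresses a CONFUMAT word: set one more bit.
            self.confumat[p] |= 1 << char

    def recognize(self, v_elements: Iterable[VElement]) -> Optional[int]:
        scores = [0] * self.n_chars
        for v in v_elements:
            entry = self.fepolist.get(v)
            if entry is None:
                continue  # an unknown V-ELEMENT is ignored
            tp, p = entry
            if tp == 1:
                return p  # unique V-ELEMENT: its character code is the answer
            for c in range(self.n_chars):
                scores[c] += (self.confumat[p] >> c) & 1
        best = max(scores)
        # Assumption: ties go to the lowest character code.
        return scores.index(best) if best > 0 else None
```

In this model a V-ELEMENT seen for only one character answers a recognition request immediately, while shared V-ELEMENTS only contribute votes to the score of every character they were learned with.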
In the following sections, the data manipulator and memory modules of the CLASSIFIER are described separately. The constructs of each module are described and their functions are discussed briefly. This should prepare the reader for the detailed discussion of the CLASSIFIER operations. These operations are distributed among the control modules. The operations of the satellite controls (V-CONTROL, C-CONTROL and B-CONTROL) are described first. The A-CONTROL, which acts as the master control for the CLASSIFIER, is described last. This will sum up the description of the CLASSIFIER.

3.12 The V-ELEMENT CONSTRUCTION Module

This consists of three parts: the VECTOR ADDRESSING part, the V-SCHEMES MEMORY part and the V-MULTIPLEXORS (Figure 3.16). The V-SCHEMES MEMORY (VMEM) is a writable memory (16 bytes) in which the construction schemes are stored. Each byte, when read into the memory buffer (BV), controls the outputs of the VECTOR ADDRESSING and V-MULTIPLEXORS parts. The contents of either the TC counter or the BC counter address the FEATURE SUBVECTOR memories, some of whose outputs are selected to be passed through the V-MULTIPLEXORS to the R-BUS.

A V-ELEMENT constructed by this module consists of three bytes. The first byte is the identification byte (ID); it names the construction scheme used to construct the remaining two bytes, the scanning mode and a segment location tag (contents of BC or TC). The other two bytes constitute the feature part, which usually consists of a combination of several elements from different feature subvectors. The feature part is produced byte by byte using two bytes from VMEM (one for each feature byte). These construction scheme bytes are addressed by the AV register/counter. The three most significant bits of AV are considered the name of the construction scheme and are connected to the input of the V-MULTIPLEXORS. The TC and BC counters contain at most the maximum number of segments, which is less than 8. Therefore, the identification byte contains 3 bits of AV, 3 bits of TC/BC, 1 bit of HV and a bit reserved for storing the type of pointer in the FEPOLIST memory.

[Figure 3.16: V-ELEMENT CONSTRUCTION Module Block Diagram]

3.13 The DATA STRUCTURING Part

This consists mainly of the FEATURE POINTER LIST (FEPOLIST) memory and the CONFUSION MATRIX (CONFUMAT) memory modules (Figure 3.17). FEPOLIST is a RAM of 256 words, each 4 bytes wide; three bytes contain the F-part and the fourth byte contains the P-part. The CONFUMAT memory is another RAM of 256 words. In the current implementation, each word has a maximum width of 4 bytes; thus 32 characters can be processed. However, a DIP switch on the DECISION module can change the word width from one byte to 4 bytes in one-byte increments.

Two up-down counters (MF and MC for FEPOLIST and CONFUMAT, respectively) act as memory address registers. LF and LC are two registers which act as top-of-stack pointers to FEPOLIST and CONFUMAT, respectively. As long as their memories are not full, they point to the next available memory location. LF is also useful when searching FEPOLIST: its contents are loaded into MF, which is then used as a loop counter.

[Figure 3.17]

IFP and ICN are index counters which address bytes in FEPOLIST and CONFUMAT words, respectively. While IFP addresses a byte in FEPOLIST it also addresses a corresponding register in the register file RF. Each register acts as a memory buffer for a FEPOLIST byte. RF(0), RF(1) and RF(2) hold a V-ELEMENT from the V-ELEMENT CONSTRUCTION part or an F-part from FEPOLIST. RF(3) may hold an input character code or a P-part.

The first bit of the first byte of the F-part is the type of pointer bit (TP). The type of pointer flip-flop (TPF) always holds the TP bit of the currently accessed FEPOLIST word. To write a new TP, the contents of TPF are modified first. Then, while RF(0) is written into FEPOLIST, its bit 0 is overridden by TPF. On all other occasions the outputs of RF(1), RF(2) and RF(3) are passed to the F-BUS without modification.

BCN is the memory buffer register of CONFUMAT. With the aid of the CHARACTER INSERT LOGIC, it also plays an important role in encoding a character code into its equivalent bit position before it is stored in CONFUMAT. The contents of BCN can also be shifted serially to the DECISION module during recognition.

To encode a character, its code is presented on the R-BUS. Assuming the code is 5 bits wide, the two most significant bits are used to address a byte of an addressed CONFUMAT word. They are loaded into the index register ICN. The remaining 3 bits address a bit in BCN. This bit is set to '1' by the CHARACTER INSERT LOGIC. To complete the encoding operation, the new contents of BCN are written into the byte addressed by ICN in the word addressed by MC.
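The character encoding step just described amounts to splitting the 5-bit code into a byte index and a bit index. The following sketch shows that arithmetic; the function names and the use of a `bytearray` for a CONFUMAT word are illustrative assumptions, and the physical ordering of bytes and bits in the actual memory is not specified here.

```python
from typing import List


def insert_character(word: bytearray, code: int) -> None:
    """Set the bit for `code` in a CONFUMAT word (1 to 4 bytes wide)."""
    if not 0 <= code < 8 * len(word):
        raise ValueError("character code does not fit the CONFUMAT word width")
    icn = code >> 3        # most significant bits: byte index (role of ICN)
    bit = code & 0b111     # least significant 3 bits: bit position in BCN
    bcn = word[icn]        # read the addressed byte into the buffer
    bcn |= 1 << bit        # CHARACTER INSERT LOGIC sets the bit to '1'
    word[icn] = bcn        # write the buffer back to the addressed byte


def characters_in(word: bytearray) -> List[int]:
    """List the character codes whose bits are set in a CONFUMAT word."""
    return [8 * i + b for i, byte in enumerate(word)
            for b in range(8) if (byte >> b) & 1]
```

With a 4-byte word this gives the 32 distinct character positions mentioned above; a narrower word simply rejects codes that do not fit.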
3.14 The DECISION Module

Its main components are the INPUT CHARACTER (CHAR) register/counter, the SCORE ACCUMULATOR (SAC), the SCORE ADDER, the MAXIMUM SCORE REGISTER (MSR) and the SCORE COMPARATOR (Figure 3.18). CHAR acts as a register for the input character code in the case of learning, or as a loop counter for the C-CONTROL in the case of recognition. In the second case the initial contents are loaded from a switch which is set to contain the CONFUMAT word width. The same switch controls the length of SAC.

The SCORE ACCUMULATOR (SAC) holds the accumulated scores resulting from adding all the CONFUMAT words read during a recognition cycle. This is done by recirculating SAC and adding a new CONFUMAT word to the previous contents of SAC using the SCORE ADDER. To decide which is the recognized character code, SAC is recirculated while the ADDER is inhibited. Under the supervision of the C-control, MSR is compared with each cell of SAC. CHAR keeps track of the position of the compared SAC cell. The maximum score is always loaded into MSR, while the corresponding character code (CHAR contents) is passed to the R-BUS to be sent through RF(3) to the output character register (CHAROUT) for display.

[Figure 3.18]

3.15 The V-CONTROL

This controls the operation of the V-ELEMENT CONSTRUCTION module. Upon receiving a request signal from the A-CONTROL, a V-ELEMENT is constructed and stored in the registers RF(0), RF(1) and RF(2) of the RF file in the FEPOLIST module.

The V-CONTROL has been designed such that the V-ELEMENTS constructed are the results of 'looking at' the character, represented by its feature subvectors, from two opposite sides: top and bottom or left and right, for horizontal or vertical scanning, respectively.
This is done by scanning the subvectors using the TC and BC counters as pointers to both ends of the character. One of them is chosen at a time by the construction schemes to address two 4-bit subvector elements for constructing a V-ELEMENT byte. At the beginning TC contains 0, while BC contains the number of character segments of the current scanning scheme (horizontal or vertical) minus one. TC is incremented and BC is decremented to move these pointers inward from both ends of the character. For each construction scheme a V-ELEMENT is constructed for each position of these counters. To avoid duplication of V-ELEMENTS, the construction is done until TC and BC point to the middle segment of the character (BC=TC when BC is even or BC=TC+1 when BC is odd).

To try a new construction scheme (a new even value of AV), TC and BC are reset to their initial values. V-ELEMENTS are then constructed upon request, using the new construction scheme for each position of TC and BC. When all possible schemes have been tried (when AV overflows) the V-ELEMENT CONSTRUCTION is terminated. The A-CONTROL is notified and the CLASSIFICATION is terminated as well.

The detailed operation of this module is described using the flow chart in Figure 3.19. For each new V-ELEMENT the VMEM buffer (BV) is cleared in order to pass the identification byte to the R-BUS. To write this byte in RF(0) the index register IFP is also cleared. We should remember that two construction scheme bytes are needed to write the feature part in RF(1) and RF(2). The first byte is addressed by the new AV (even value), while the second byte is addressed by incrementing AV (odd value). The detailed description of operation now follows (Figure 3.19).

V0: WAIT
The control remains in this state waiting for the enabling signal (ENV) to be low.
The control will also go to this state whenever a reset pulse (CLRV) is received, or when END-V is reached and the classifier has completed operation (ENDCLASFY=0).

V1: INITIALIZE
To prepare for constructing the identification byte, the BV register is cleared, making the selector lines of the MULTIPLEXORS low. The HV signals, the TC counter output and the AV address counter output are ready to be passed to the R-BUS. The IFP counter is also reset to 0 so that the identification byte will be stored in location 0 of the register file RF.

V2: WRITE R-BUS in RF
The MULTIPLEXORS are enabled in this state. The information on the R-BUS is written into RF. If IFP=0 the identification part appears on the R-BUS. Otherwise, one byte of the feature part appears and is stored in RF(1) or RF(2) according to the contents of IFP. If IFP=0 the address register AV remains the same; otherwise, it is incremented at the end of the state to address the next scheme byte of VMEM.

[Figure 3.19: V-CONTROL Flow Chart]

V3: READ VMEM
The addressed byte in VMEM is loaded into BV. The IFP counter is also incremented. The completion code for constructing a V-ELEMENT is tested by sampling the outputs of IFP. After the three bytes of the V-ELEMENT have been stored in RF (IFP=3) the control transfers to V4. Otherwise, it transfers to V2.

V4: CLEAR IFP
To begin processing the constructed V-ELEMENT when returning to the classifier A-control, the IFP counter is cleared.

V5: RETURN-V
The V-control signals the A-control that the operation is complete. The V-ELEMENT bytes reside in RF waiting for further operation. At the end of V2 the condition TEB (TC equal BC) is tested and the control is transferred either to V6 or to the reset states (V8, V9) according as TEB is 0 or 1.
If TEB is 1, the current scheme addressed by the scheme number held in AV has been tried for all TC and BC combinations, allowing for TC ≤ BC only. In this case the control is transferred to the reset states (V8, V9). If TEB is 0 the branching is to V6.

V6: DECREMENT BC, V7: INCREMENT TC
BC and TC can be considered as two pointers which point to two opposite segments of the character. They are not allowed to overlap, so that combinations are not repeated. From this comes the condition that TC ≤ BC. Coming from state V5, it has not been determined yet whether TC+1=BC. The check is needed to ensure that the condition TC ≤ BC still holds when BC is decremented and TC is incremented. For this reason BC is decremented first and TC is compared with the new value of BC. If they are not equal (TEB=0), TC is incremented in V7 to get a new combination of BC and TC with the condition TC ≤ BC. The counter AV is also decremented to return it to the first byte of the current scheme. The V-control goes to the WAIT state V0, waiting for another request from the A-control, and the current scheme will be used again to process the segments addressed by BC and TC.

V8, V9: RESET BC and TC
Branching occurs to one of these states in order to return BC to its original value which is 0. This prepares for a new scheme to be tried with new BC and TC combinations.

V10: NEW SCHEME (INCREMENT AV)
TC is cleared in this state because of the underflow which occurred in V9. The AV counter is incremented to address the new scheme. At the end of the state the AV overflow (CRAV) is tested for completion of all schemes. If not all the schemes have been tried the control branches to V0. Otherwise, it goes to V11.

V11: END-V
The control reaches this state when there are no further schemes to be tried on the current character in the current scanning mode. The V-control remains in this state until a completion signal (ENDCLASFY) is received from the A-control. This makes the V-control branch to V0.
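The order in which the V-control visits segments and schemes can be summarized as two nested loops. The sketch below is a simplified software view, not a model of the individual states V0 to V11: the function names are hypothetical, and the default of 8 schemes is an inference from the 16-byte VMEM holding two bytes per scheme.

```python
from typing import Iterator, Tuple


def segment_pairs(n_segments: int) -> Iterator[Tuple[int, int]]:
    """Yield (TC, BC) positions, moving inward from both ends of the character."""
    tc, bc = 0, n_segments - 1      # TC starts at 0, BC at (segments - 1)
    while tc <= bc:                 # the pointers may meet but never cross
        yield tc, bc
        tc += 1                     # corresponds to INCREMENT TC
        bc -= 1                     # corresponds to DECREMENT BC


def construction_requests(n_segments: int,
                          n_schemes: int = 8) -> Iterator[Tuple[int, int, int]]:
    """Yield (scheme, TC, BC) for every V-ELEMENT that would be constructed."""
    for scheme in range(n_schemes):             # a new scheme restarts TC and BC
        for tc, bc in segment_pairs(n_segments):
            yield scheme, tc, bc
```

For a character with five segments the pairs are (0, 4), (1, 3) and (2, 2); with four segments they are (0, 3) and (1, 2), which matches the stopping condition BC=TC or BC=TC+1 given in Section 3.15.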
3.16 The C-CONTROL

The primary purpose of this module is to search the SCORING ACCUMULATOR (SAC) for the maximum score. When the maximum score is found it is stored in the MAXIMUM SCORE REGISTER (MSR), and the corresponding character code is passed via the R-BUS to RF(3) and from RF(3) via the F-BUS to the CHAROUT register for display. The description of operation follows (Figure 3.20).

C0: WAIT
The C-control remains in this state waiting for an enabling signal from the B-control to go to the INITIALIZE state C1. Resetting the C-control counter also puts the control in this state.

C1: INITIALIZE
The IFP counter is set to contain 1's to address RF(3), and the MAXIMUM SCORE REGISTER (MSR) is cleared to contain 0.

C2: LOAD MSR and LOAD RF(3)
Branching to this state means that the current score in SAC is greater than or equal to the score already stored in MSR. MSR is loaded with the new maximum score from SAC, and the corresponding character code, held in the CHAR counter, is loaded into RF(3). The CHAR counter is decremented at the end of this state. The underflow signal (BRWCHAR) is also tested. If BRWCHAR=0, all SAC contents have been processed and the control jumps to C5. Otherwise, there are some locations in SAC still to be tested and the control proceeds to the next state, C3.

[Figure 3.20: C-CONTROL Flow Chart]

C3: SHIFT SAC
SAC is clocked to shift the next unprocessed location to its output. This output is tested against the current contents of MSR. If the SAC output is greater than or equal to the contents of MSR, MSR and RF(3) have to be updated. In this case the control is returned to C2. Otherwise, the control proceeds to C4.

C4: DECREMENT CHAR
The CHAR counter is decremented at the end of this state and BRWCHAR is also tested. If BRWCHAR=0 more SAC contents have to be tested and control is returned to C3.
Otherwise, testing of SAC has been completed and RF(3) will contain the recognized character code.

C5: LOAD CHAROUT
To load the character code into the CHAROUT register, the contents of RF(3) are passed to the F-BUS and CHAROUT is loaded at the end of the state.

C6: CLEAR IFP
The IFP counter is cleared, preparing for the next cycle of the A-control.

C7: END-C
In this state the RTRN2 signal goes low, which makes the A-control proceed from A6 to A11, indicating the completion of a recognition cycle. The C-control returns to its WAIT state C0 on the edge of the next Master Clock pulse.

3.17 The B-CONTROL

In the learning mode, the B-CONTROL is responsible for updating the CONFUMAT memory as well as the type of pointer (TP) and the pointer part (P-part) of the FEPOLIST memory. These updating operations are considered in detail when describing the B-CONTROL flow chart. In the recognition mode, the CONFUMAT word addressed by the P-part of the FEPOLIST word is shifted bit by bit and added to the contents of the SCORING ACCUMULATOR registers (SAC). The control is then passed to the C-control to find the character code which obtains the maximum score in SAC.

The following is the description of the B-CONTROL operations using the flow chart in Figure 3.21. Note that the control signals of the B-control are active only when ENB=0 is received from the A-control in A6.

B0: READ1 CONFUMAT
When this state is active, the A-control is in A6. In the recognition mode, BCN is loaded with the addressed byte of CONFUMAT. In the learning mode, BCN is cleared to prepare for inserting the appropriate character code in its corresponding bit position of the addressed byte. In the recognition mode the control transfers to B1, while in the learning mode it transfers to either B9 or B12 according to the value of TP, 1 or 0, respectively.

[Figure 3.21: B-CONTROL Flow Chart]

B1-B8: ADD TO SCORE
In this group of states BCN is shifted and its serial output is added to the previous contents of SAC. This process is repeated by transferring back to B0 and re-entering this group until all the bits of the current CONFUMAT word have been added to the contents of SAC. Then the control jumps to B15 (END-B).

B9: INSERT1 CHARACTER
The B-control branches to this state when the machine is in the learning mode and the information in the pointer byte of FEPOLIST is of the character code type (TP=1). FEPOLIST(3) and RF(3) are read and compared. If they are equal (FIT=1), the input character code is the same as the stored character code and any other processing done in this state is ignored. The control jumps to END-B. If FIT=0, more operations are performed in the following control states, beginning with B10. This state can be considered a look-ahead state. The idea is to store the codes of the characters represented by FEPOLIST(3) and RF(3) in a new CONFUMAT word. In the current state the character code which resides in FEPOLIST(3) is put on the R-BUS, and R0 through R2 are encoded into the buffer register BCN of CONFUMAT. These correspond to the least significant 3 bits of the character code. The remaining most significant bits (R3-R4) are loaded into ICN to address the corresponding byte in the CONFUMAT word. To summarize, the end result of B9 is a character encoded in ICN and BCN. As discussed before, the control proceeds to B10 if FIT=0.

B10: WRITE POINTER IN RF(3)
The address of the next available word of CONFUMAT is loaded from LC into MC. ICN and MC address a single byte in CONFUMAT, into which the contents of BCN are written. The new pointer held in LC is also loaded into RF(3).
On the next clock the control proceeds to B11. TPF is also cleared in this state, indicating that the V-ELEMENT is no longer unique to a particular character.

B11: WRITE RF(3) IN FEPOLIST
The new pointer is finally written into FEPOLIST(3). The update flip-flop (UPDF) is set to enable updating LCN later. The control proceeds to B12.

B12: READ2 CONFUMAT
The B-control enters this state either from B10 or from B0. The input character code, held in CHAR, is used to choose its corresponding byte in the addressed CONFUMAT word by loading ICN. This byte is read into BCN. The control proceeds to B13.

B13: INSERT2 CHARACTER
The new input character is encoded by setting its corresponding bit in the BCN contents. IFP is also cleared, preparing for updating the least significant bit of FEPOLIST(0).

B14: UPDATE TPF
To update FEPOLIST(0), RF(0) is read while its least significant bit is overridden by the contents of TPF. The F-BUS is then written into FEPOLIST(0). To prepare for updating LCN, MC is incremented at the end of this state.

B15: END-B
This is the end state of the B-control. The control returns to the A-control if the machine is in the learning mode. Otherwise, the C-control is enabled. The LCN register is also updated by loading the contents of MC, but only when a new entry has been added to CONFUMAT, that is, when UPDF contains 1 (see B11).

3.18 The A-CONTROL

Under the supervision of this control a request is issued to the V-control to construct a new V-ELEMENT and store it in the RF register file. When the control returns from the V-CONTROL, the FEPOLIST memory is searched for a match between the contents of RF(0), RF(1) and RF(2) and the FEPOLIST F-part. At this point RF(3) contains the input character code. If the machine is in the learning mode and no match is found, a new entry in FEPOLIST is generated (MARF is already pointing to the top of the memory stack). The type of pointer flip-flop (TPF) is set to contain '1', indicating a unique V-ELEMENT.
This will be inserted in the first bit (bit number 0) of RF(0). The RF contents are written into the new memory location in FEPOLIST. If a match is found and the machine is in the learning mode, the control is transferred to the B-CONTROL to complete the task of updating the DATA STRUCTURING memories.

In the recognition mode there are two possibilities. The first occurs when the newly constructed V-ELEMENT is not found in FEPOLIST. In this case the control goes back to the V-CONTROL to construct another V-ELEMENT. This means that the current V-ELEMENT is ignored. The second possibility occurs when a match is found. If the corresponding TP in the matched FEPOLIST word is '1', the current V-ELEMENT corresponds to a unique character code, which is considered the recognized character. Otherwise, when TP=0, the control transfers to the B-CONTROL for further processing.

[Figure 3.22: CLASSIFIER Operation Flow Chart (A-CONTROL)]

The above is a summary of the operation of the A-CONTROL; the details are discussed below with the aid of the flow chart in Figure 3.22.

A0: WAIT
This is the reset state of the A-control. The A-control stays in this state until an enable signal (ENA=0) has been received from the M-CONTROL (ENCLASFY=0). Upon receiving this signal the A-control goes to the INITIALIZE state A1.

A1: INITIALIZE
Three operations are performed in this state: loading RF with the input character code, initializing the memory address register MF of FEPOLIST, and clearing the scheme address register/counter AV of the V-ELEMENT CONSTRUCTION module. The first operation is accomplished by loading IFP with 1's, enabling the output of the CHAR counter and writing the CHAR contents into the addressed location in RF.
This will place the input character code in RF(3) in the case of the learning mode. In the recognition mode the initial contents of RF(3) will be ignored. The MF register/counter is loaded with the contents of LF, which holds the address of the next available location (the top of the stack) of the FEPOLIST memory. AV is cleared to begin processing of the FEATURE VECTOR with the first scheme of VMEM.

A2: GET A V-ELEMENT
A request is issued to the V-control to get a new V-ELEMENT by holding down the ENV line. This signal makes the V-control leave the WAIT state (V0) once it gets there. On return from the V-control the registers RF(0), RF(1) and RF(2) hold a V-ELEMENT. The A-control then proceeds to state A3.

A3: SEARCH FEPOLIST
This state and the states A7 and A4 can be viewed as the SEARCHING states. In state A3 a match is tried between the currently addressed register in RF and the corresponding byte of FEPOLIST. If a match is found the control transfers to A4. Otherwise, it transfers to A7. The TPF flip-flop is always loaded with the least significant bit of the R-BUS when IFP=0. The IFP counter is incremented at the end of this state so that the next byte of the V-ELEMENT will be tried if the control comes back to this state.

A4: INITIALIZE ICN
ICN is loaded with all 1's to prepare for addressing the most significant byte of CONFUMAT in case the control transfers to the B-CONTROL. At the end of this state IFP is tested for completion of the matching of the V-ELEMENT in RF against the contents of FEPOLIST. If the matching has been completed successfully (IFP=3) the control proceeds to A5. Otherwise, the control is transferred back to A3 with a new IFP in order to match the remaining bytes of the V-ELEMENT.
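The byte-by-byte search performed by the SEARCHING states can be pictured in software. The following is a minimal Python sketch, not a description of the hardware: the function name `search_fepolist` and the list-based memory model are hypothetical, each FEPOLIST word is modelled as three feature bytes followed by a pointer byte, and it is assumed (from the handling of TPF at IFP=0) that bit 0 of byte 0 carries the TP flag and is excluded from the comparison. The scan order, from the most recently stored word down to word 0, is likewise an assumption.

```python
TP_MASK = 0x01  # assumption: bit 0 of byte 0 holds the TP flag, not feature data


def search_fepolist(fepolist, rf):
    """Return (address, tp, pointer) of the first word whose three feature
    bytes match rf[0], rf[1], rf[2], or None if no word matches.

    fepolist: list of words, each [f0, f1, f2, pointer]
    rf:       register file contents [rf0, rf1, rf2, rf3]
    """
    mf = len(fepolist) - 1              # start at the most recently stored word
    while mf >= 0:
        word = fepolist[mf]
        ifp = 0                         # byte index, playing the role of IFP
        while ifp < 3:
            a, b = rf[ifp], word[ifp]
            if ifp == 0:                # ignore the TP flag in byte 0
                a, b = a & ~TP_MASK, b & ~TP_MASK
            if a != b:
                break                   # mismatch: give up on this word
            ifp += 1                    # match: move on to the next byte
        if ifp == 3:                    # all three feature bytes matched
            return mf, word[0] & TP_MASK, word[3]
        mf -= 1                         # try the next word, restarting at byte 0
    return None


if __name__ == "__main__":
    memory = [[0x10, 2, 3, 65], [0x21, 5, 6, 82]]
    print(search_fepolist(memory, [0x20, 5, 6, 0]))
```

The inner loop corresponds to cycling between A3 and A4, and the `mf -= 1` step to the decrement that follows a failed comparison.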
A5: LOAD MC
The pointer byte is read from FEPOLIST(3) and loaded into MC. If the device is in the recognition mode the pointer byte is also loaded into RF(3) and passed to the F-BUS (RRF=0) to be loaded into CHAROUT (LDCHAROUT=0) at the end of A5. If TP=1, this is the recognized character code; otherwise it will be overwritten in subsequent cycles with the recognized character code. If TP=1 and the machine is in the recognition mode, the current classification ends by transferring the control to the END-A state. If TP=0, further processing is needed in both the learning and recognition modes, and the control proceeds to A6, enabling the B-CONTROL.

A6: CONTINUE LEARN/RECOGNITION
The A-CONTROL remains in this state until a completion code has been received from the B-control or the C-control. When this happens the control jumps to state A11.

A7: DECREMENT MF
If the matching has failed at the end of state A3 (FIT=0) the control transfers to this state. Here the memory address register MF is decremented and the IFP counter is cleared. This ensures that another FEPOLIST word will be tested and that the testing will begin with the first byte of the new FEPOLIST word and RF(0). If the last word of FEPOLIST has been tested (MF=0), the BRW line of MF signals this and branching occurs on the next clock pulse to either A8 or A11. If the machine is in the RECOGNITION mode (LRN=0) or the FEPOLIST memory is full (FPFULL=1) the next state will be A11. Otherwise, the next state will be A8.

A8: PRESET TPF
This state can be viewed as an initialization state for the combination of A9 and A10, which are responsible for storing a new entry in FEPOLIST. To do that, MF is loaded with the address of the first available memory location (the top of the stack) from the LF register. To indicate that this entry is unique the TP flip-flop is preset to 1.

A9: WRITE RF IN FEPOLIST
The content of TPF is passed to F0 of the F-BUS when writing RF(0) into FEPOLIST.
In the subsequent passes of A9 the contents of RF(1), RF(2) and RF(3) are written in FEPOLIST(1), (2) and (3), respectively. At the end of the state IFP is incremented to address the next byte of FEPOLIST and the corresponding register of RF.

A10: INCREMENT LF (STACK POINTER)
LF is updated by incrementing MF, which contains the current value of LF, and loading MF back into LF. This is done provided that FEPOLIST is not full. At the end of A10 the contents of IFP are tested for completion of the storage operation. When IFP is 4 the operation has been completed and the control proceeds to A11. Otherwise, the control transfers back to A9.

A11: LOAD MF
In this state MF is loaded with the address of the next available memory location (LF) in FEPOLIST. If there are more V-ELEMENTs to be extracted and processed (ENDV=1) the next state will be A2. If ENDV=0 the classification cycle (LEARNING or RECOGNITION) is completed and the next state will be END-A (ENDCLASFY=0).

A15: END-A (ENDCLASFY=0)
The classification cycle ends here and the control transfers back to the M-CONTROL.

4. SUMMARY AND CONCLUSIONS

A trainable, real-time character recognition device has been designed and built. The visual feature extraction method was chosen as the most favorable computational approach around which the feature extractor was designed. Human factors were considered in choosing the features, and the two-dimensional, binary input form was exploited in devising methods for preprocessing the character image and extracting the local features. A method of segmentation was introduced, in which the character is described as a collection of either horizontal or vertical segments connected by links. Segments may consist of several disconnected subsegments, each of which may contain one or more local features (nodes). Links connect nodes of a segment to neighboring segments.
For the current implementation, minimum descriptions of the constituent parts of the character (segments, links and nodes) were considered. The existence of segments and subsegments, the number of links and their distribution within subsegments, and the number of nodes in each subsegment constitute the feature vector. This multiple description technique made it possible to reduce the number of training samples and to use a simple data structure for the classification dictionary as well as a simple maximum fit decision criterion. The introduction of compound features (V-ELEMENTS) allowed the number of training samples to be reduced still further. Moreover, the programmability of the V-ELEMENT construction schemes gives the user more flexibility in enhancing the recognition rate by choosing the feature combinations and writing their schemes in the writable memory VMEM.

To reduce the storage required for the classification dictionary, the DATA STRUCTURING part of the CLASSIFIER was designed around some general properties of the relationships between features (V-ELEMENTS) and character classes. A V-ELEMENT may appear in several character classes, which can be grouped and encoded as a CONFUMAT word addressed by a pointer associated with the V-ELEMENT in FEPOLIST. In this way, storing the true character codes was avoided, which reduced the storage required. On the other hand, if a V-ELEMENT is unique to a certain character, a flag bit indicates that the pointer part is the character code. In this case a CONFUMAT word is saved and the recognition is speeded up.

The CLASSIFIER memories, CONFUMAT and FEPOLIST, were also designed to make it easy to add new character classes and even to introduce new features not detected in previous character samples. CONFUMAT can be expanded horizontally, bit-wise, to accommodate more character classes, and FEPOLIST can be expanded vertically, word-wise, to store more features.
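The two-memory dictionary and the counting decision can be illustrated with a small software model. This is a sketch under stated assumptions rather than the INCOM design: the class name `Dictionary` and its methods are hypothetical, FEPOLIST is modelled as a Python dict keyed by the V-ELEMENT, CONFUMAT as a list of integers used as bit sets, character classes are small integer indices, and the rule for converting a unique entry into a CONFUMAT word is an assumed reading of the learning procedure.

```python
class Dictionary:
    """Toy model of the FEPOLIST / CONFUMAT pair."""

    def __init__(self):
        self.fepolist = {}   # V-ELEMENT -> [tp, pointer]
        self.confumat = []   # each word is an int; bit c set means class c

    def learn(self, v_element, cls):
        entry = self.fepolist.get(v_element)
        if entry is None:
            # New V-ELEMENT: unique so far, the pointer is the class code itself.
            self.fepolist[v_element] = [1, cls]
        elif entry[0] == 1:
            if entry[1] != cls:
                # No longer unique: allocate a CONFUMAT word holding both classes.
                self.confumat.append((1 << entry[1]) | (1 << cls))
                entry[0], entry[1] = 0, len(self.confumat) - 1
        else:
            # Already shared: set the bit of the new class in the addressed word.
            self.confumat[entry[1]] |= 1 << cls

    def recognize(self, v_elements, n_classes):
        scores = [0] * n_classes
        for v in v_elements:
            entry = self.fepolist.get(v)
            if entry is None:
                continue                      # unknown V-ELEMENTs are ignored
            tp, pointer = entry
            if tp == 1:
                return pointer                # unique V-ELEMENT decides at once
            word = self.confumat[pointer]
            for c in range(n_classes):
                scores[c] += (word >> c) & 1  # count one match per listed class
        best = max(range(n_classes), key=scores.__getitem__)
        return best if scores[best] > 0 else None


if __name__ == "__main__":
    d = Dictionary()
    d.learn("b", 0)
    d.learn("b", 1)
    d.learn("d", 0)
    d.learn("d", 2)
    print(d.recognize(["b", "d"], 3))
```

In this model, adding a character class only widens the integers stored in `confumat`, and adding a feature only adds a key to `fepolist`, which mirrors the bit-wise and word-wise expansion described above.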
In addition, the encoding of character classes as CONFUMAT words made it possible to implement a simple maximum fit decision criterion.

The input image as well as the extracted features are stored in two-dimensional registers (WORKING STORAGE) organized so as to be easily accessed in both the x and y directions. This organization, together with the scanning facility, made the preprocessing of the image plane and the extraction of different features straightforward, easy to implement and reasonably fast. Minimal size, 3x3 patterns were used not only in preprocessing the input image but also in extracting the nodes. Table look-up techniques were used to perform these operations, with ROMs providing efficient storage of the necessary patterns and faster processing.

Because of the way people draw line-like patterns, and because of quantization noise and the small size of the windows used in node extraction, not all the extracted nodes are necessary for representing the drawn character. MERGING RULES were therefore devised to operate on the node memory contents, eliminating superfluous nodes and modifying others. The careful selection of these rules enabled them to be applied to only two neighboring cells at a time, using a special scanning facility and ROM addressing while scanning the WORKING STORAGE. Nodal extraction using 3x3 windows, combined with the merging operations on two neighboring nodal cells, made an elegant but nevertheless powerful method for detecting local features in line-like patterns.

It should also be noted that all the operations involved in preprocessing and feature extraction, as well as constructing the feature subvectors, are done by simple raster scanning of the WORKING STORAGE memories without resorting to contour following or backtracking. In addition, no mathematical computations were involved in any of these operations other than simple ROM addressing using small size memories.
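The combination of raster scanning, 3x3 windows and table look-up can be sketched in a few lines. The example below is only an illustration under assumptions: the function names are hypothetical, the 512-entry table stands in for a ROM, and its contents (marking a set cell with exactly one set neighbour as a line end) are an invented example pattern, not the node patterns actually stored in INCOM.

```python
def window_address(image, r, c):
    """Pack the 3x3 neighbourhood of cell (r, c) into a 9-bit table address.
    Cells outside the image are read as 0."""
    rows, cols = len(image), len(image[0])
    addr = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < rows and 0 <= cc < cols
            addr = (addr << 1) | (image[rr][cc] if inside else 0)
    return addr


def build_line_end_table():
    """Hypothetical look-up table: 1 where the centre cell is set and exactly
    one of its eight neighbours is set (the centre is bit 4 of the address)."""
    table = []
    for addr in range(512):
        centre = (addr >> 4) & 1
        neighbours = bin(addr & ~(1 << 4)).count("1")
        table.append(1 if centre and neighbours == 1 else 0)
    return table


def extract_nodes(image, table):
    """Single raster scan: one table look-up per cell, no contour following."""
    return [(r, c)
            for r in range(len(image))
            for c in range(len(image[0]))
            if table[window_address(image, r, c)]]


if __name__ == "__main__":
    stroke = [[0, 0, 0, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 0, 0, 0]]
    print(extract_nodes(stroke, build_line_end_table()))
```

The only per-cell work is forming an address and reading one table entry, which is the property that keeps the control simple.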
These techniques of simple raster scanning and ROM addressing simplified the control and rendered the implementation more elegant than in the other approaches described earlier.

Because of the unavailability of a writing tablet and hardware reliability problems, statistical studies could not be made. Otherwise, detailed data could have been obtained on how the recognition rate is affected by various parameters, e.g. the character classes and their number, the number of training samples per class, the V-ELEMENT construction schemes, etc. However, the device as it stands now shows the validity of the general approach and the power of the new methods of processing and hardware organization.

Immediate enhancements could be made to the device to achieve a higher recognition rate. One is adding a module which forms the local feature subvector NODEV and changing the V-ELEMENT construction schemes to reflect that. Expanding the FEPOLIST memory might also be desirable to accommodate more feature compounds and hence more character classes. Currently the construction schemes are tried separately for horizontal or vertical segmentation. A possible improvement would be new construction schemes which construct compounds of features from different horizontal and vertical segments. This would require wider words in both VMEM and FEPOLIST.

The decision criterion was based on simple matching and counting the number of successful matchings. Another improvement might be the introduction of higher level feature compounds consisting of combinations of the previously addressed pointers. These new compounds would be stored in turn with a new set of pointers to CONFUMAT. This process can be iterated to any higher level wanted. The technique could enhance the recognition rate substantially and reduce the training samples still further.
Another improvement could be the addition of an EXCEPTION MATRIX, organized similarly to CONFUMAT, to be used in erasing the mistakes otherwise made in recognition.

At the time this project was envisioned (1973), large scale integrated (LSI) chips, such as microprocessors and bit-sliced microprocessors, were not readily available and, even two years later, were rather expensive. But now, in 1978, they are available at affordable prices. If one were to redesign the whole system, use would be made of these LSI chips and others, e.g. programmed logic arrays, in implementing the device. For example, microsequencers and PLAs could be used instead of the random logic circuits which were used in implementing the controls of the preprocessor and the feature extractor. The MASTER control could be implemented as a microcomputer, and a slave microcomputer could take over the functions of the CLASSIFIER. However, because of the current state of the art of technology, speed and cost, this system would be slower and more expensive than the current system. On the other hand, reliability would be higher and the system would gain in versatility and flexibility. But the shift in implementation from hardware to software might restrict the ability to investigate architectures more suitable to the character recognition problem.

Still further in the future is the prospect of VLSI content addressable memories. This type of architecture would be ideal for the pattern recognition problem in general, including character recognition. Notice, for example, the CLASSIFIER and how its operations, search and insert by content, are exactly the operations of content addressable memories. If only a part could be implemented directly in LSI, it would be the WORKING STORAGE, because of its simple regular cell structure. One might also envision a WORKING STORAGE implemented with CCD technology as an integral part of the retina of a CCD camera.
Besides meeting the challenge of implementing a trainable, real-time character recognition machine, this project has introduced new powerful techniques which have opened the way for future improvements and for new possibilities in designing a character recognition machine.

APPENDIX A
OPERATION AND EXPERIMENTS

A.1 Initialization

This is the process of initializing the writable memories (FEPOLIST and CONFUMAT) of the DATA STRUCTURING part of the INCOM CLASSIFIER. This process is done after turning on the machine or reprogramming the V-ELEMENT CONSTRUCTION memory (VMEM), or when changing the character set to be recognized by the machine. CONFUMAT is initialized by clearing its contents: the INPUT switches are set to 0's and the INIT-C switch on the B-CONTROL card is set to the CLR-C position. Similarly, to initialize FEPOLIST the INPUT switches are set to 0's and the INIT-FP switch on the A-CONTROL card is set to the CLR-F position.
In either case, setting the INIT switch to the CLR position enables the corresponding circuits on the A-CONTROL or the B-CONTROL card to produce the signals needed to increment the memory address counters and the index registers, as well as the write pulses to the corresponding memories. This results in addressing the memory bytes sequentially and writing the INPUT switch data into them.

A.2 Programming VMEM

This is the process of writing new V-ELEMENT CONSTRUCTION SCHEMES in VMEM or modifying old ones. The contents of VMEM (16 bytes) can be examined or rewritten one byte at a time. As in the INITIALIZATION process, the PROGRAMMING is done when the machine is in the INITIALIZE state (M1). To program VMEM the OPERATION SELECTION switch is set in the PROGRAM position. To examine a VMEM byte, the INPUT switch must be in the ADDRESSES position and the rightmost four switches of the INPUT group are set to indicate the byte address. Upon depressing the DEPOSIT switch the contents of the examined byte are displayed in the DATA OUT group. To rewrite the contents of a byte, the examine step is performed first, followed by setting the INPUT switch to DATA and positioning the DATA INPUT switches to reflect the new data. To complete the writing step, the DEPOSIT switch is depressed. The signals necessary for the above operations are generated on the CLOCK and PROGRAMMING module (see Appendix B).

A.3 Drawing Characters

Under the STORE CONTROL LOGIC (S-CONTROL) (see Appendix B) INCOM can accept a series of points either automatically, by communicating with a tablet-digitizer, or manually from the user in the form of ENTER! commands. In either case the machine must be in the DRAW state (M2) and the CONTACT/END switch in the CONTACT position. To terminate the drawing operation the switch is set in the END position. Each accepted point is stored in its corresponding cell in IMAGEM, whose contents appear on the CRT display during the DRAW state.
Because of the unavailability of a graphics tablet, only the manual form of input is used, by means of the SIMULATED TABLET SIGNALS switch group. When the X-Y COORDS LED is on, a cursor point appears on the CRT screen. The cursor scans the contents of IMAGEM horizontally. It can also be positioned on any horizontal line by pressing the DEPOSIT button. To input a point the user presses the ENTER! switch when the cursor coincides with the required point position on the screen. To input a character the user can follow these steps:
i. Draw the character on a transparent sheet.
ii. Overlay the sheet on the CRT screen and trace the drawing point by point by pressing the ENTER! switch whenever the cursor coincides with the drawing.
iii. When the drawing process is finished, position the CONTACT/END switch in the END position. This lets the machine begin processing the input image.

A.4 Example of V-ELEMENTS Construction Schemes

As discussed before, V-ELEMENTS are constructed using construction schemes which have been written in VMEM while initializing the machine. Because of the experimental nature of INCOM, we have chosen to leave VMEM programming accessible to the user. We may recall that the construction schemes are used in both scanning modes. We may also recall that a construction scheme is stored as two bytes in VMEM; each byte is used in constructing one byte of the feature part of the V-ELEMENT. In other words, each feature byte can be constructed independently of the other one. But we have to remember that TC or BC will remain the same for either byte. Furthermore, the V-CONTROL was designed in such a way that TC scans half of the character representation from one end while BC scans the other half from the opposite end. The construction schemes can vary from the simple to the complex and from the general to the more specific.
A simple scheme may result in constructing a V-ELEMENT consisting of elements of different feature subvectors which describe a particular segment. A more complex scheme results in combining several elements from one subvector but belonging to two segments addressed by the top (left) segment counter (TC) and the bottom (right) segment counter (BC). Since no more than eight different schemes can be stored in VMEM, only schemes which combine two or more subvectors are considered.

A possible set of construction schemes is shown in Table A.1. They are ordered hierarchically from the general to the specific and from the simple to the complex. The first four construction pairs construct V-ELEMENTS using elements from different subvectors of the same segment, addressed by either the TC or the BC counter (shown in parentheses in the table). The first two pairs use the LINKV subvectors, ILINKV and OLINKV. The first pair scans the character from the top (left) using TC while the second scans it from the bottom (right) using BC. The next two pairs use the three subvectors ILINKV, OLINKV and TABV. Hence, the resulting V-ELEMENTS are more complex and more specific than the previous ones.

The remaining four pairs construct more complex and more specific V-ELEMENTS than the first four. They construct V-ELEMENTS using two or more subvectors of two different segments, one addressed by TC and the other by BC. The first pair of the last four uses the LINKV subvectors, the second pair uses ILINKV and TABV and the third uses OLINKV and TABV. The most complex of them all is the last pair, which constructs V-ELEMENTS using OLINKV and TABV of a top (left) segment and ILINKV and TABV of a bottom (right) segment.
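To make the role of TC and BC concrete, the last (most complex) pair can be mimicked in software. The sketch below is a simplified model under explicit assumptions: the function name `construct_v_elements` is hypothetical, each segment is a dict of subvector values, only the two scheme-derived feature bytes are modelled, and TC and BC are assumed to advance in lockstep from opposite ends over half of the segments each. The actual VMEM bit encoding and counter sequencing are not reproduced here.

```python
from math import ceil


def construct_v_elements(segments, scheme):
    """Build one compound feature per scan step.

    segments: list of dicts, ordered top to bottom (or left to right), each
              mapping a subvector name to that segment's value
    scheme:   two 'bytes'; each byte is a tuple of (subvector, counter) picks,
              where counter is "TC" or "BC"
    """
    n = len(segments)
    elements = []
    for step in range(ceil(n / 2)):
        # Assumed lockstep scan: TC from the top end, BC from the bottom end.
        index = {"TC": step, "BC": n - 1 - step}
        element = tuple(
            tuple(segments[index[counter]][name] for name, counter in byte)
            for byte in scheme
        )
        elements.append(element)
    return elements


# The last pair described above: OLINKV and TABV of a top (left) segment,
# TABV and ILINKV of a bottom (right) segment.
LAST_PAIR = (
    (("OLINKV", "TC"), ("TABV", "TC")),
    (("TABV", "BC"), ("ILINKV", "BC")),
)

if __name__ == "__main__":
    segs = [
        {"OLINKV": 1, "ILINKV": 0, "TABV": 2},
        {"OLINKV": 2, "ILINKV": 1, "TABV": 0},
        {"OLINKV": 0, "ILINKV": 2, "TABV": 1},
    ]
    print(construct_v_elements(segs, LAST_PAIR))
```

A scheme that names only TC (or only BC) in both bytes describes a single segment, which corresponds to the simpler pairs at the start of the hierarchy.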
Table A.1 V-ELEMENTS CONSTRUCTION SCHEMES

ADDRESSES   DATA (first byte / second byte)   SCHEMES
0, 1        0100 0100 / 0100 0100             OLINKV(TC), ILINKV(TC)
2, 3        0100 1100 / 0100 1100             OLINKV(BC), ILINKV(BC)
4, 5        0100 0100 / 0010 0010             OLINKV(TC), ILINKV(TC) / TABV(TC), TABV(TC)
6, 7        0100 1100 / 0010 1010             OLINKV(BC), ILINKV(BC) / TABV(BC), TABV(BC)
8, 9        0100 0100 / 0100 1100             OLINKV(TC), ILINKV(TC) / OLINKV(BC), ILINKV(BC)
10, 11      0010 0100 / 0010 1100             TABV(TC), ILINKV(TC) / TABV(BC), ILINKV(BC)
12, 13      0100 0010 / 0100 1010             OLINKV(TC), TABV(TC) / OLINKV(BC), TABV(BC)
14, 15      0100 0010 / 0010 1100             OLINKV(TC), TABV(TC) / TABV(BC), ILINKV(BC)

A.5 Experiments in Recognition

After programming VMEM with the construction schemes of Table A.1, the machine was trained to recognize two different classes, 'A' and 'R', using only one training character for each class (Figure A.1a). Several testing characters were presented to INCOM for recognition. Examples of the correctly recognized and the misrecognized characters are shown in Figure A.1. The testing characters were intentionally drawn distorted and noisy to show the recognition power of INCOM. In a real-life situation the characters a user draws should not vary this much once the user has trained the machine to recognize them. The two misrecognized characters were intended to be members of class 'R'. However, on close inspection these characters might also be misrecognized by humans as 'A's. Although only one training character was used for each class, INCOM recognized quite a variety of character shapes using only the three subvectors ILINKV, OLINKV and TABV.
[Figure A.1 Example of Recognition: a) Training Characters; b) Recognized Characters; c) Misrecognized Characters ('A' instead of 'R')]

APPENDIX B
INCOM CIRCUIT DIAGRAMS

LIST OF FIGURES

Figure
B.1 The PANEL Cards
B.2 CLOCK and PROGRAMMING Card
B.3 WORKING STORAGE Plane
B.4 SKEW/TRANSFER Module
B.5 PRELOG Module
B.6 MERGER Module
B.7 MASTER and CONTROL Card
B.8 TEMPORARY SEGMENT Module
B.9 FEATURE SUBVECTORS MEMORIES Module
B.10 F-CONTROL Card
B.11 V-ELEMENT CONSTRUCTION Module
B.12 FEATURE POINTER LIST Module
B.13 CONFUSION MATRIX Module
B.14 V-CONTROL Card
B.15 A-CONTROL Card
B.16 B-CONTROL Card
B.17 DECISION Module
B.18 C-CONTROL Card

[Figures B.1 to B.18: circuit diagrams]

VITA

Mohamed Taher El-Sonni was born in Alexandria, Egypt on May 12, 1944. He received his Bachelor of Science degree in Electrical Engineering (Communications Section) with grade Distinction, First Class of Honor from Alexandria University, Alexandria, Egypt in June 1966. He was a teaching assistant in the Electrical Engineering Department in the same university from October 1966 until February 1968. After working with the Broadcasting Services in Libya, he got a teaching assistantship from the Alfateh University in Tripoli, Libya in February 1969. In September 1969, he joined the University of Illinois at Urbana-Champaign under a graduate study scholarship from the Libyan government.
During the period from January 1972 until May 1975 he served as a research and teaching assistant. He is an associate member of Sigma Xi and a student member of the Institute of Electrical and Electronics Engineers.

BIBLIOGRAPHIC DATA SHEET

1. Report No.: UIUCDCS-R-78-944
4. Title and Subtitle: TRAINABLE CHARACTER RECOGNITION INTERFACE COMPUTER (INCOM)
5. Report Date: Oct. 1978
7. Author(s): Mohamed Taher Abdalla El-Sonni
8. Performing Organization Rept. No.: UIUCDCS-R-78-944
9. Performing Organization Name and Address: Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801
12. Sponsoring Organization Name and Address: Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801
13. Type of Report & Period Covered: Ph.D. Thesis
16. Abstracts: A trainable, real-time character recognition device has been designed and built. The visual feature extraction method is chosen as the most favorable computational approach around which the feature extractor is designed. A new method of local feature extraction is presented which uses a minimal size window and simple raster scanning of the character image. Merging rules are devised to reduce the number of these features by merging two features at a time. A method of dynamic segmentation of the character representations is introduced, in which the character is described as a collection of horizontal or vertical segments connected by links. A multiple description technique of the character segments makes it possible to reduce the number of training characters and to use a simple data structure for the classification dictionary as well as a simple decision criterion for recognition. Experiments show the power of the described techniques.
17a. Descriptors: Character Recognition; Trainable Machine; Feature Extraction; Node Extraction; Dynamic Segmentation; Scanning-Windowing Schemes
18. Availability Statement: Unlimited
19. Security Class (This Report): UNCLASSIFIED
20. Security Class (This Page): UNCLASSIFIED
21. No. of Pages: 152