Report No. 299
ILLIAC IV Document No. 162

STRING PROCESSING ON A PARALLEL COMPUTER*

by

Walter L. Heimerdinger

January 13, 1969

Department of Computer Science
University of Illinois
Urbana, Illinois 61801

*This work was supported in part by the Advanced Research Projects Agency as administered by the Rome Air Development Center under Contract No. US AF 30(602)4144 and submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering, February, 1969.

ACKNOWLEDGMENT

The author wishes to express his gratitude to Professor S. R. Ray for his guidance, encouragement, and abundant patience.

TABLE OF CONTENTS

1. INTRODUCTION
2. THE ILLIAC IV MACHINE
   2.1 The Quadrant
   2.2 The Processing Element
   2.3 The Control Unit
   2.4 Memory Addressing
3. RECOGNIZER SPECIFICATIONS
   3.1 The Source Language
   3.2 Internal Identifiers
   3.3 Tokens
   3.4 Sections
4. THE CHARACTER CLASSIFIER
   4.1 The Classification Process
   4.2 The Control String
5. THE SECTION BUILDER
   5.1 The Section Builder Process
6. THE SECTION JOINER
   6.1 The Section Joiner Process
   6.2 Character Count Determination
   6.3 Symbol Tail Concatenation
7. TABLE MAINTENANCE
   7.1 Table Use
   7.2 The Reserved Word Table
   7.3 The Main Token Table
8. RESULTS
   8.1 Execution Time Data
REFERENCES
APPENDIX

1. INTRODUCTION

The name "computer" immediately associates the digital computer with numerical calculation. However, a significant portion of digital computer time is spent on non-numerical tasks such as string processing. Probably the most common programs that use string processing are programming language translators, or compilers. Since the string processing operations used in these applications are well known, they are useful for comparing alternative string processing implementations.

A compiler may conveniently be divided into two major portions: the recognizer and the analyzer. The recognizer scans the input string of source language characters and recognizes it as a series of tokens (identifiers, numbers, etc.). The analyzer then uses information generated by the recognizer to determine the structure of the source program and to generate the required object code.

The structure of the program to be translated depends only upon the syntactic units used and their order of appearance. For that reason, the syntax analyzer does not need the actual name or value of each token. In fact, many translators use the recognizer to rename tokens with internal identifiers or pointers before passing them to the analyzer. If the internal identifiers or pointers are coded to indicate the classification of the token, then the analyzer has all of the information it needs in a compact, well-defined form.
In such a system, the recognizer converts the string of sparsely and irregularly distributed source language characters into a new string of compact, standardized token identifiers or pointers. All translators must maintain a table of the identifier tokens encountered. In many cases the internal identifiers are merely pointers to the table location occupied by the original token, so the maintenance of token tables is a natural task for a recognizer.

A conventional recognizer usually works with only one symbol at a time. The required operations are often simple and repetitive, so we might ask whether a parallel machine capable of operating on several characters simultaneously could be used efficiently to speed up the process. The first large-scale implementation of a parallel computer is the Illiac IV computer, which is currently under construction. In this machine, the single accumulator of the von Neumann computer is replaced by multiple accumulators. Each has its own subsidiary registers, arithmetic hardware, and memory modules. The machine was designed for arithmetic problems that use vectors, matrices, or meshes, but it has some features that are useful for string processing.

The operations required to translate a modern field-independent programming language such as ALGOL or PL/I are a challenge to implement on a parallel machine. The basic source program elements (tokens), such as numbers or identifiers, vary in length and separation. Furthermore, the treatment required for a given token depends not only upon the identity of the token, but also upon the context in which it appears. Ordinarily, the string of input symbols is scanned from left to right one character at a time to ensure that all of the information needed is available at each step. For example, the interpretation of a group of symbols such as A + B changes if the string delimiter, ", is inserted into the input string ahead of the group. The group then becomes part of a string token rather than an expression.
Despite such obstacles, it will be shown that a recognizer can be turned sufficiently "inside out" to obtain reasonable efficiency on a parallel computer such as Illiac IV.

2. THE ILLIAC IV MACHINE

2.1 The Quadrant

The Illiac IV system is built around four identical arrays known as quadrants. Each quadrant contains 64 processing units (P.U.s) operating under the control of a single control unit (C.U.) in the arrangement shown in figure 1.

The construction of an Illiac IV quadrant can be visualized by considering a conventional computer arrangement with three parts:
- an arithmetic/logical section for transforming or comparing operands;
- a control section with the instruction decoding and sequencing hardware;
- a memory section.

A quadrant essentially replicates the arithmetic/logical section and the memory section of the conventional computer 64 times, but retains a single control section.

In the single quadrant array mode of operation, the quadrants act independently. The 64 arithmetic/logical units, named processing elements or P.E.s, are arranged in a string numbered from 0 to 63. Thus each element has an adjacent higher neighbor and an adjacent lower neighbor, with P.E. 0 acting as the higher neighbor of P.E. 63. Single words may be transferred between neighboring P.E.s, a process referred to as "routing." Additional routing paths connect each P.E. with the P.E.s located eight positions higher and eight positions lower in the string. Each P.E. can therefore communicate directly with four other P.E.s in the quadrant.

Figure 1 - Illiac IV Quadrant. [Diagram contents:
- The control unit handles instruction decoding, sequencing, and address indexing. It has an 8-word C.U. buffer and connections to the other control units and to the input/output controller.
- It drives processing units 0 through 63 over a 64-bit common data bus plus about 260 control lines.
- Each processing unit consists of a processing element and its P.E. memory.
- Routing connections link each P.E. to the P.E.s one and eight positions away. For example, P.E. 0 is connected to P.E.s 1, 63, 8, and 56.
- A 1024-bit bus connects the P.E. memories to the input/output switch.]

2.2 The Processing Element

The processing element is a 64-bit unit with the arithmetic equipment of a large-scale floating-point computer. As shown in figure 2, the P.E. contains four 64-bit registers:
- an accumulator (register A);
- a register that holds the second operand for binary operations or acts as an extension of register A for double-length operands (register B);
- a temporary storage register (register S);
- a register with the connections for routing data to or from one of four other P.E.s (register R).

The P.E. also has a 16-bit address register (register X) and a separate adder for address arithmetic. Address arithmetic is performed modulo 2^16, but the memory accessing hardware uses only the 11 least significant bits of the P.E. address field.

The P.E.s can be set to treat the 64-bit register contents either as 64-bit operands or as pairs of "inner" and "outer" 32-bit operands. Thus a single instruction may cause 64 simultaneous 64-bit operations or 128 simultaneous 32-bit operations in a quadrant. All but a few of the P.E. instructions apply to both 64-bit and 32-bit operands. The word size is changed by a control unit instruction, which sets all P.E.s of the quadrant to the selected word size.

A small set of P.E. instructions allows simultaneous addition or subtraction of the eight 8-bit fields, or "bytes," of the 64-bit word. These instructions are unaffected by the word size state. They are implemented by breaking the carry propagation in the adder circuitry at eight-bit intervals.

The same equipment is used to test 8-bit fields for equality and inequality relations. These tests operate on the eight fields simultaneously. They leave register A with ones in the least significant bits of the byte fields that meet the test conditions, and clear the remaining bits of register A. The character manipulation schemes will rely heavily on these 8-bit operations.

Figure 2 - Processing Element (P.E.). [Block diagram contents:
- Operand select gates feed the arithmetic unit.
- Register R (routing register) connects to the neighboring P.E.s.
- The diagram also shows register B (extension register), register A (accumulator), the logic unit (Boolean operations), the barrel switch (shift unit), the mode register, and register S (temporary storage register).
- The address adder and register X (index register) supply addresses to P.E. memory.
- Data arrive from P.E. memory and from the C.U. over the common data bus.]

Each P.E. also has an 8-bit register not commonly found in conventional computers, known as the mode register (register D). Its eight bits are designated E, E1, F, F1, I, G, J, and H, respectively.
- The I and J bits record the results of various P.E. test operations. In the 32-bit mode, two operands are tested simultaneously, and the G and H bits hold the additional test results corresponding to the I and J bits, respectively.
- The F bit indicates arithmetic overflow and is supplemented by the F1 bit in the 32-bit mode.
- The E and E1 bits, or enable bits, perform a function unique to the array computer.

Setting both enable bits of a particular P.E. to one puts that P.E. in the "enabled" state. An enabled P.E. fully executes every P.E. instruction issued by its controlling C.U. However, one of these instructions may set the E or E1 bit to zero, either directly or by using the value of one of the other mode register bits. The P.E. then enters a "disabled" condition. In this condition registers A, S, and X cannot be changed and memory references are ignored, effectively stopping activity in the P.E.
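The byte-wise equality test described above can be imitated in software on ordinary 64-bit integers. The Python sketch below uses the well-known "SWAR" zero-byte trick. It is not Illiac IV code, and several details are assumptions:
- the function names `pack_bytes` and `byte_equal_mask`;
- the byte order, with the first character in the most significant byte.

It returns a word with a one in the least significant bit of every byte equal to the target and zeros elsewhere, which is the result the text attributes to the P.E. test instructions.

```python
MASK64 = (1 << 64) - 1
LOW_BITS = 0x0101010101010101   # least significant bit of every byte
LOW7 = 0x7F7F7F7F7F7F7F7F       # low seven bits of every byte
HIGH_BITS = 0x8080808080808080  # most significant bit of every byte


def pack_bytes(data: bytes) -> int:
    """Pack eight bytes into a 64-bit word, first byte most significant (assumed order)."""
    if len(data) != 8:
        raise ValueError("a word holds exactly eight bytes")
    return int.from_bytes(data, "big")


def byte_equal_mask(word: int, target: int) -> int:
    """Put a one in the low bit of each byte of `word` that equals `target`.

    All other bits of the result are zero, mimicking register A after the
    P.E. 8-bit equality test.
    """
    diff = (word ^ ((target & 0xFF) * LOW_BITS)) & MASK64  # equal bytes become 0x00
    # The high bit of each byte of `nonzero` is set iff that byte of `diff` is
    # non-zero. Adding 0x7F to a 7-bit value never carries into the next byte.
    nonzero = (((diff & LOW7) + LOW7) | diff) & HIGH_BITS
    return (~nonzero & HIGH_BITS) >> 7


mask = byte_equal_mask(pack_bytes(b"A+B+C+D;"), ord("+"))
print(hex(mask), bin(mask).count("1"))
```

Here bytes 1, 3, and 5 (counting from the left) are flagged. On the array, each of the 64 P.E.s would run the same test on its own word at once, so one instruction examines 512 characters.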
The E bit controls the register sections corresponding to the outer operand of the 32-bit word format, while the E1 bit controls the inner operand sections. Thus, when E and E1 differ, it is possible to modify 32 bits of a register while leaving the rest of the register unchanged. Two registers of a disabled P.E. can still change:
- Some test instructions use the address adder and will change the B register of a disabled P.E.
- Register R of a disabled P.E. is also liable to change, since it must remain available for routing between other, enabled P.E.s.

P.E.s may be enabled according to predetermined patterns or by the results of test operations. One can therefore obtain the effect of program branching by executing several sets of instructions, with a different group of P.E.s enabled for each set. This technique will be used in many places in the character processing routines to follow. It must be used with caution, however, since disabled P.E.s represent unused processing power.

Processes such as normalization require a single shift command to apply differing shift counts in the various P.E.s. The common technique of repeatedly shifting a short distance until reaching the desired count was judged unacceptably slow. Instead, a cascade of shift gates is provided that can produce any shift or rotation from 0 to 64 bits in two clock times. Most Illiac IV descriptive literature calls this shifter the "barrel switch."

The shift count for the barrel switch can come from a unit that detects the position of the first one in the mantissa field of the P.E. accumulator (the leading one detector). It may also come from a P.E. or C.U. register. The shift count may be indexed as if it were a memory address, a feature that is extremely useful for masking and alignment operations.

2.3 The Control Unit

Figure 3 shows the major sections of the control unit.
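The branching technique described above can be modelled by running both arms of a conditional over all P.E.s, each arm gated by an enable mask. In this hypothetical Python sketch:
- a quadrant is simply a list of 64 register A values;
- `run_masked` and `masked_if_else` are illustrative names, not Illiac IV instructions;
- disabled P.E.s keep their old register A value, as the text describes.

```python
from typing import Callable, List

NUM_PES = 64


def run_masked(acc: List[int], enabled: List[bool], op: Callable[[int], int]) -> List[int]:
    """Apply `op` to register A of each enabled P.E.; disabled P.E.s keep their value."""
    return [op(a) if on else a for a, on in zip(acc, enabled)]


def masked_if_else(acc: List[int], test: Callable[[int], bool],
                   then_op: Callable[[int], int], else_op: Callable[[int], int]) -> List[int]:
    """SIMD-style `if test(A) then then_op else else_op`, one pass per arm."""
    flags = [test(a) for a in acc]                           # like a test setting the I bit
    acc = run_masked(acc, flags, then_op)                    # pass 1: P.E.s where the test held
    acc = run_masked(acc, [not f for f in flags], else_op)   # pass 2: the complementary group
    return acc


registers = list(range(NUM_PES))
result = masked_if_else(registers, lambda a: a % 2 == 0, lambda a: a // 2, lambda a: 3 * a + 1)
print(result[:4])
```

Every P.E. sits idle during one of the two passes. That idle time is the unused processing power the text warns about.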
The advanced station (ADVAST) decodes, interprets, and controls the flow of all instructions executed in the quadrant. ADVAST has four 64-bit accumulators, designated ACAR 0 through ACAR 3. It also has a local storage area of 64 integrated circuit registers known as the ADVAST data buffer (ADB). In addition, ADVAST has a 24-bit address arithmetic unit, which can use any of the four accumulators as an index register divided as shown below:

    Bit 0       Bits 1-15          Bits 16-39      Bits 40-63
    Not used    Increment field    Limit field     Index field
    (1 bit)     (15 bits)          (24 bits)       (24 bits)

The first bit of the increment field (bit 1 of the ACAR) acts as a sign for the increment. Indexing instructions that use the increment field to modify the index field work as follows:
- if bit 1 is a one, the increment is subtracted from the index;
- otherwise, the increment is added to the index.

Although the arithmetic section is limited to 24 bits, ADVAST can apply any of the standard Boolean operations to 64-bit operands. It can also set, clear, complement, or test any individual bit in an accumulator.

The C.U. can examine the mode bit settings of the P.E.s in its quadrant. One instruction sets each of the 64 bits in a designated ACAR to the value of the selected mode bit in the corresponding P.E. For example, if the E bit were selected, a one in the last position of the accumulator would indicate that the E bit of P.E. 63 was set to one, meaning P.E. 63 was enabled. An E bit pattern of all zeros indicates that all of the P.E.s are disabled. The complementary pattern of all ones indicates that all P.E.s are enabled.

Figure 3 - Illiac IV Control Unit. [Block diagram contents:
- The sections shown are the memory service unit (MSU), the advanced station (ADVAST), the test-maintenance unit (TMU), the final station (FINST), and the instruction look-ahead unit (ILA).
- ADVAST contains the 64-word ADVAST data buffer (ADB), ACAR 0 to ACAR 3, and a 24-bit adder and 64-bit logic unit.
- FINST contains the 8-word final queue (FINQ) and the P.E. sequencer. The sequencer sends P.E. instructions, operands, and P.E. control signals to the P.E.s.
- The MSU connects to the P.E. memories.]
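The index-register layout above can be decoded with shifts and masks. The sketch below makes several assumptions:
- Bits are numbered from 0 at the most significant end, which the text implies by calling the first bit of the increment field "bit 1".
- The remaining 14 bits of the increment field are taken as the magnitude.
- Wrap-around of the 24-bit index modulo 2^24 is assumed; the text does not specify it.
- The function names are hypothetical.

Only the sign convention for stepping the index is modelled, not the limit test or any particular ADVAST instruction.

```python
FIELD24 = (1 << 24) - 1


def decode_acar(word: int) -> dict:
    """Split a 64-bit ACAR (bit 0 = most significant) into its index-register fields."""
    increment_field = (word >> 48) & 0x7FFF      # bits 1-15
    return {
        "sign": increment_field >> 14,           # bit 1 of the ACAR
        "increment": increment_field & 0x3FFF,   # remaining 14 bits (assumed magnitude)
        "limit": (word >> 24) & FIELD24,         # bits 16-39
        "index": word & FIELD24,                 # bits 40-63
    }


def encode_acar(sign: int, increment: int, limit: int, index: int) -> int:
    """Assemble the fields back into a 64-bit word (bit 0 left unused)."""
    return (((sign & 1) << 62) | ((increment & 0x3FFF) << 48)
            | ((limit & FIELD24) << 24) | (index & FIELD24))


def step_index(word: int) -> int:
    """Add the increment to the index, or subtract it when the sign bit is one."""
    f = decode_acar(word)
    delta = -f["increment"] if f["sign"] else f["increment"]
    new_index = (f["index"] + delta) & FIELD24   # assumed modulo-2**24 wrap-around
    return (word & ~FIELD24) | new_index         # other fields are left untouched


w = encode_acar(0, 8, 2048, 100)
print(decode_acar(step_index(w)))
```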
C.U. test instructions can detect such patterns by checking for all zeros or all ones, either in the entire 64-bit accumulator or in its rightmost 24 bits. P.E. test operations merely set status bits. C.U. tests, by contrast, control the instruction flow by skipping forward or backward in the instruction stream. A pattern in an ACAR may also be transferred to the P.E. mode registers, so that each of the 64 bits in the pattern sets the selected mode register bit in a different P.E.

A novel feature of the C.U. is its leading one detection hardware. It can replace a 64-bit pattern in one of the accumulators with a binary integer giving the position of the leftmost one (or zero) in the pattern. This facility is a boon for finding special characters or subfields in a long chain of characters.

The second major section of the control unit is the final station, or FINST. It receives only instructions that require P.E. action. FINST converts these P.E. instructions into sequences of enable and gate signals, which it transmits to all P.E.s in the quadrant over a system of about 260 control lines. FINST also supplies any operands or addresses needed from the C.U. by "broadcasting" them over a 64-bit common data bus (CDB).

Since ADVAST controls the instruction stream, every instruction passes through it first for decoding:
- If the operation involves only control unit hardware (a C.U. instruction), ADVAST completes it and the instruction never reaches FINST.
- If the instruction is a P.E. instruction, ADVAST decodes it and performs any indexing needed at the control unit level. It then passes the recoded instruction, with an operand if required, to FINST for disposal.

Thus some instructions are processed entirely by ADVAST, while others pause there only long enough for decoding before being sent to FINST for execution.
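A small model of the ACAR mode-bit pattern and the leading one detector follows. As in the text, P.E. i corresponds to bit position i counted from the left, so the last position belongs to P.E. 63. The function names are hypothetical. Returning 64 for an all-zero pattern is this sketch's own convention, since the text does not say what the hardware produces in that case.

```python
from typing import Sequence

WIDTH = 64
ALL_ONES = (1 << WIDTH) - 1


def pattern_from_mode_bits(bits: Sequence[bool]) -> int:
    """Gather one mode bit per P.E. into a 64-bit ACAR pattern (P.E. 0 leftmost)."""
    word = 0
    for pe, bit in enumerate(bits):
        if bit:
            word |= 1 << (WIDTH - 1 - pe)
    return word


def leading_one(word: int) -> int:
    """Position of the leftmost one, counted from 0 at the left; 64 if none (assumed)."""
    return WIDTH - word.bit_length()


def leading_zero(word: int) -> int:
    """Position of the leftmost zero, found by complementing the pattern."""
    return leading_one(~word & ALL_ONES)


# Example: the first P.E. whose test flag (say, "holds a semicolon") is set.
flags = [False] * 64
flags[5] = flags[40] = True
print(leading_one(pattern_from_mode_bits(flags)))
```

The all-zeros and all-ones checks described in the text correspond here to comparing the pattern with `0` and `ALL_ONES`.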
Instruction-operand pairs pass from ADVAST to FINST through an eight-word first-in, first-out final queue named FINQ. The queue prevents either section from sitting idle while it waits for the other. Occasionally ADVAST needs results from the previous FINST operation, usually when it reads test results from the P.E.s. In such a case ADVAST is halted until FINQ is empty. Otherwise, FINQ allows a considerable amount of overlap between C.U. and P.E. instructions. This overlap makes program timing hard to estimate: the total execution time is rarely the sum of the ADVAST (C.U.) and FINST (P.E.) times, although it cannot exceed that sum.

The three other control unit sections are of little interest here:
- The instruction look-ahead unit (ILA) tries to keep a supply of new instructions in the instruction word store, a set of 64 integrated circuit instruction storage registers. It does this by fetching blocks of eight instruction words (two 32-bit instructions per word) from quadrant memory.
- The memory service unit (MSU) controls access to quadrant memory.
- The remaining section is a test-maintenance unit (TMU).

2.4 Memory Addressing

Each processing element has a P.E. memory unit (P.E.M.) consisting of 2048 words of 64-bit thin film memory. Each quadrant therefore contains an aggregate of 131,072 words of memory, which will be referred to as quadrant memory.

P.E. memory addresses usually originate in the address field of a P.E. instruction, and they pass through two levels of indexing:
- The control unit extracts this field when the instruction is decoded. For a first level of indexing, it may increment or decrement the address by the contents of one of its four accumulator registers.
- The address is then distributed to all P.E.s in the quadrant. For a second level of indexing, each P.E. may add it to, or subtract it from, the contents of its own index register (register X).
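The two levels of indexing can be summarized in a few lines. In this hypothetical sketch:
- The C.U. level adds a signed offset taken from an ACAR.
- The P.E. level either adds the broadcast address to register X or, in the subtracting form, computes X minus the address. That is the direction the hardware dictates, as the report notes.
- Arithmetic is modulo 2^16, and the low 11 bits select one of the 2048 words of a P.E.M.
- Reducing the C.U.-level sum to 16 bits is a simplification, and the names are illustrative, not Illiac IV mnemonics.

```python
from typing import List

ADDR_MASK = (1 << 16) - 1   # P.E. address arithmetic is modulo 2**16
PEM_MASK = (1 << 11) - 1    # the memory hardware uses only the low 11 bits


def pem_location(instr_addr: int, cu_offset: int, x_reg: int, subtract: bool = False) -> int:
    """Return the P.E.M. word (0-2047) that one P.E. would access."""
    addr = (instr_addr + cu_offset) & ADDR_MASK   # first level: C.U. indexing (simplified)
    if subtract:
        eff = (x_reg - addr) & ADDR_MASK          # X minus address, not address minus X
    else:
        eff = (x_reg + addr) & ADDR_MASK          # second level: P.E. index register
    return eff & PEM_MASK


def slice_access(instr_addr: int, cu_offset: int, x_regs: List[int],
                 subtract: bool = False) -> List[int]:
    """One instruction touches one word in each of the 64 P.E.M.s."""
    return [pem_location(instr_addr, cu_offset, x, subtract) for x in x_regs]


print(slice_access(100, 3, [0] * 64)[:3])          # no P.E. indexing: a single slice
print(slice_access(100, 3, list(range(64)))[:3])   # P.E. indexing: words in different slices
```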
(Note that hardware design considerations have dictated that the address be subtracted from the index register contents. This is the reverse of the more usual practice of subtracting the index register contents from the incoming address. Some index subtraction operations in the sequel may be confusing if this requirement is not kept in mind.)

The 16-bit address produced by the P.E. hardware to access one of the 2048 P.E.M. locations will be called a P.E.M. address. (The memory hardware, however, uses only the least significant 11 bits of this 16-bit field.)

Normally, an instruction that references a P.E.M. address causes 64 memory locations to be accessed simultaneously, one in each P.E.M. of the quadrant. If no indexing is performed in the P.E.s, every location in this block of 64 quadrant memory locations has the same P.E.M. address. Such a block will be called a slice. One of the 64 words, or slice elements, of the slice is located in P.E.M. 0; this word will be defined as the slice leader. Because indexing in the P.E.s is possible, however, the locations actually accessed may not all be in the same slice.

The control units have no memory except for a small buffer, and obtain their instructions and operands from the P.E.M.s. To permit any location in the four quadrant memories to be referenced, a 24-bit quadrant address is formed by appending eight bits to the least significant end of the 16-bit P.E. address:
- the most significant two of these eight bits indicate the desired quadrant;
- the remaining six bits locate a specific P.E.M. in that quadrant.

In the single quadrant mode, the quadrant specification bits become irrelevant. The address arithmetic is arranged so that, as addresses are incremented, the successor of a given address is the same location in the next higher numbered P.E. The exception is the last P.E. (P.E. 63): the successor of one of its addresses is the next higher numbered location in the first P.E. (P.E. 0). The 24-bit quadrant address has the following format:

    Bits 0-15               Bits 16-17      Bits 18-23
    P.E. memory location    Quad number     P.E. number

In this format, the most significant five bits of the P.E. memory location field are not currently meaningful. The quad number field should not change if a single quadrant is being used.

Note that 11 of the first 16 bits of a quadrant address act as a slice address, specifying which of the 2048 slices contains the desired address. The last six bits locate the specific slice element desired.

The array may operate with the four quadrants completely separated. Alternatively, two or more quadrants may execute the same instruction stream and coordinate their routing operations to form a larger array. Only a single quadrant will be used for the problem in this study. If the full array of 256 P.E.s were available for compilation, each quadrant would probably be assigned to a separate compilation task.

Most of the Illiac IV supervisory programs reside in a B6500 commercial computer, which acts as the controller for the array. The B6500 controls an extremely large disk file system with transfer rates of up to 5 x 10^8 bits/second. Since input/output operations are not directly controlled by the array, this study will assume that all of the necessary information is available in quadrant memory.

3. RECOGNIZER SPECIFICATIONS

3.1 The Source Language

To see how a parallel processor may be used for program translation, we will study an Illiac IV assembly language coding of a recognizer for a dialect of ALGOL known as Burroughs Extended ALGOL. The source material should therefore conform to the rules specified in the 1966 edition of the Burroughs Extended ALGOL Language Manual, with a few exceptions. Burroughs Extended ALGOL is defined for a set of 63 characters, each composed of six bits.
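The quadrant address format of section 2.4 can be decoded with shifts and masks. The single-quadrant successor rule follows directly: the same location in the next P.E., wrapping from P.E. 63 to the next location in P.E. 0. The function names are hypothetical. Keeping the quad number fixed during the wrap is this sketch's reading of "the quadrant specification bits become irrelevant".

```python
from typing import Tuple


def split_quadrant_address(qaddr: int) -> Tuple[int, int, int]:
    """Return (P.E. memory location, quad number, P.E. number) from a 24-bit address."""
    return (qaddr >> 8) & 0xFFFF, (qaddr >> 6) & 0b11, qaddr & 0b111111


def make_quadrant_address(location: int, quad: int, pe: int) -> int:
    return ((location & 0xFFFF) << 8) | ((quad & 0b11) << 6) | (pe & 0b111111)


def successor(qaddr: int) -> int:
    """Single-quadrant successor: step the P.E. number, carrying into the location."""
    location, quad, pe = split_quadrant_address(qaddr)
    if pe == 63:
        return make_quadrant_address((location + 1) & 0xFFFF, quad, 0)
    return make_quadrant_address(location, quad, pe + 1)


def slice_address(qaddr: int) -> int:
    """The 11 meaningful bits of the location field name one of the 2048 slices."""
    return split_quadrant_address(qaddr)[0] & 0x7FF


print(split_quadrant_address(successor(make_quadrant_address(5, 0, 63))))
```

A plain `+ 1` on the 24-bit integer would carry out of the P.E. number into the quad number field. The sketch instead skips that field and carries into the location.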
A source string of eight-bit characters will be assumed, for two reasons. The smallest word subdivision provided in the Illiac IV instruction set is an eight-bit "byte." In addition, newer communication standards such as EBCDIC (Extended BCD) and ASCII (American Standard Code for Information Interchange) provide for an eight-bit character size. The first two bits of each character will be assumed to be zeros. The remaining six bits define the input symbols according to the coding of appendix B-1 of the Burroughs Extended ALGOL Language Manual. The recognizer should properly process all constructs allowed by the manual, with two exceptions: the COMMENT construct, and a more stringent limitation on the use of reserved words.

A conventional compiler usually calls upon the recognizer to process only one token before returning control to the analyzer. However, if the full capabilities of the Illiac IV array are to be realized, more than one token must be processed during each recognizer cycle. With the eight-bit instructions, each Illiac IV quadrant can manipulate 512 characters simultaneously (64 P.E.s, each holding eight bytes per word). Since Burroughs Extended ALGOL allows tokens from 1 to 63 characters long, the recognizer can examine several tokens simultaneously. The parallel recognizer should therefore build and classify several tokens concurrently, and these tokens need not all be in the same class.

ALGOL is built upon four classes of tokens: delimiters, numbers, identifiers, and strings. The Burroughs specifications further require identifier and string tokens to consist of 1 to 63 contiguous characters. Delimiter tokens are of two types:
- reserved words such as BEGIN, FOR, and PROCEDURE;
- single-character delimiters, or "special" characters, which are the character symbols that are neither numeric nor alphabetic.
Care must be taken to ensure that delimiter symbols inside string tokens are not confused with delimiter symbols occurring in the normal context. Recognizers that scan serially resolve this confusion quite naturally, but it presents a problem for parallel recognizers.

3.2 Internal Identifiers

The final goal of the recognizer is the production of internal identifiers in the following format:

    | HEADER   |             | POINTER    |
      bits 0-7   bits 8-39     bits 40-63

All internal identifiers begin with an eight-bit header. If the internal identifier represents a delimiter token, these eight bits are all zero. Otherwise, the eight bits of the header may be represented as follows:

    A B N N N N N N

The two bits labelled A and B classify the token as a number, identifier, or string token according to a fixed two-bit convention. The six bits designated N form a binary integer giving the number of symbols in the token.

Delimiter tokens belong to a predefined set and are not filed in a token table, so their internal identifiers can be simplified:
- A delimiter that appears in the input block as a single non-alphanumeric symbol is represented by an identifier word with the original symbol in the rightmost eight bits and zeros elsewhere.
- Reserved word delimiters such as BEGIN, PROCEDURE, and FOR are replaced by eight-bit quantifiers. Each word is assigned a unique integer between 63 and 256 (that is, 64 through 255), which is used as the last eight bits of an identifier word with zeros elsewhere.

Input symbols are limited to six bits (0 through 63). The eight-bit quantifiers can therefore represent all possible non-alphanumeric symbols as well as 192 reserved word delimiter tokens, over 190, without conflict. Only the last eight bits of the delimiter token pointer field are used, holding either the actual delimiter symbol or a pseudo-symbol for a reserved word delimiter.
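A sketch of packing internal identifiers into 64-bit words follows. The header occupies bits 0-7 (the most significant byte) and the 24-bit pointer occupies bits 40-63, following the layout above with bit 0 at the left. The classification bits A and B are taken as plain integers, and their meanings are deliberately not assigned here. Reserved-word codes come from 64 through 255, the range the text describes. All function names are hypothetical.

```python
RESERVED_CODES = range(64, 256)   # codes available to reserved words


def header_byte(a: int, b: int, count: int) -> int:
    """Build the ABNNNNNN header: two class bits followed by a 6-bit symbol count."""
    if not 1 <= count <= 63:
        raise ValueError("tokens hold 1 to 63 symbols")
    return ((a & 1) << 7) | ((b & 1) << 6) | count


def token_identifier(a: int, b: int, count: int, pointer: int) -> int:
    """Internal identifier for a number, identifier, or string token."""
    return (header_byte(a, b, count) << 56) | (pointer & 0xFFFFFF)


def delimiter_identifier(code: int) -> int:
    """Internal identifier for a delimiter: zeros except for the last eight bits.

    Single-character delimiters use their 6-bit symbol (0-63); reserved words
    use an assigned code from RESERVED_CODES.
    """
    if not 0 <= code <= 255:
        raise ValueError("delimiter codes must fit in eight bits")
    return code


print(hex(token_identifier(1, 1, 3, 0x000123)), len(RESERVED_CODES))
```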
If the internal identifier replaces a number, string, or identifier token, the pointer field contains the location in a table where the stored token begins.

3.3 Tokens

Tokens that are not delimiters are arranged to make them easy to file in a table. Each of these tokens begins with the same eight-bit header used in the corresponding internal identifier. The two classification bits of the header allow number, string, and identifier tokens to be held in the same table. The character count gives the number of bytes after the header that contain meaningful characters. It can therefore be used to determine how many words the stored token occupies. A header byte plus at most 63 symbols fills at most 64 bytes, so no token can occupy more than eight words.

A delimiter token contains exactly one input symbol, placed in the rightmost eight bits of the one-word token. The rest of the word, including the header field, consists of zeros, so a delimiter token and its internal identifier are identical. The zeros in the header field distinguish a delimiter token from the other token types.

Since the header is an essential part of every token, the term token will hereafter refer to a construct made of an eight-bit token header followed by a symbol tail:
- For an identifier, number, or string, the symbol tail is the input symbols followed by enough zeros to reach a word boundary.
- For a delimiter, the symbol tail is forty-eight zeros followed by the original delimiter character. The complete one-word token is therefore fifty-six zeros followed by the delimiter character.

Tokens thus appear in one of the following formats:

    DELIMITER TOKEN:
    | 00000000 | 48 zero bits ...                          | DELIMITER SYMBOL |
      HEADER

    NUMBER, IDENTIFIER, OR STRING TOKEN:
    | ABNNNNNN | XXXXXXXX     | YYYYYYYY      | ... | 00000000 ...                        |
      HEADER     FIRST SYMBOL   SECOND SYMBOL         ZEROS TO END OF WORD (if necessary)

    (A and B represent the two-bit classification code. The N's form a six-bit character count.)
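The token layout can be sketched as a byte sequence cut into 64-bit words: a header byte, the symbols, then zero fill to the next word boundary. The helpers below are hypothetical. They use Python `bytes` for the six-bit symbol codes, and the codes in the example are arbitrary, not Burroughs codes. They also show the arithmetic behind the eight-word limit: 1 header byte plus 63 symbols is 64 bytes, or 8 words.

```python
from typing import List


def token_words(header: int, symbols: bytes) -> List[int]:
    """Lay out a non-delimiter token as 64-bit words (first byte most significant)."""
    if not 1 <= len(symbols) <= 63 or header & 0x3F != len(symbols):
        raise ValueError("header count must match 1 to 63 symbols")
    raw = bytes([header]) + symbols
    raw += bytes(-len(raw) % 8)                  # zeros to the end of the word
    return [int.from_bytes(raw[i:i + 8], "big") for i in range(0, len(raw), 8)]


def delimiter_token(symbol: int) -> int:
    """One-word delimiter token: fifty-six zeros followed by the symbol."""
    return symbol & 0xFF


def words_needed(count: int) -> int:
    """Words occupied by a header byte plus `count` symbols."""
    return (1 + count + 7) // 8


print([hex(w) for w in token_words(0x03, bytes([0x21, 0x25, 0x22]))])
```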
The recognizer could be required to convert number tokens to a machine format, but this step will be left for the analyzer. Number tokens are formed exclusively from digit symbols and are added to the table as if they were identifier or string tokens. Floating-point quantities may therefore appear as several number tokens separated by appropriate delimiters.

3.4 Sections

The recognizer is confronted with a block of 512 characters, many of them blank, forming tokens of 1 to 63 symbols each. Each of the 64 P.E.s in a quadrant holds eight of the input characters, partitioning the 512 input symbols into eight-character sub-blocks. The P.E. boundaries also divide the symbols of the tokens into sections of eight symbols or fewer. Consider, for example, the following input statement:

    ANS ← IDENTIFIER + 24 TIMES SUM;

This statement could appear in P.E. memory with the following grouping (blanks not shown):

    P.E. 0:  A N S ← I
    P.E. 1:  D E N T I F I E
    P.E. 2:  R + 2 4 T I M
    P.E. 3:  E S S U M ;

Note that TIMES is not an identifier token but a reserved word that replaces the delimiter symbol ×. Reserved words have the same structure as identifier tokens, so the token building routine treats them as identifier tokens. They are recognized as delimiters later, in the table maintenance process.

To minimize interference from the P.E. boundaries, the scanner works in two stages:
- the section builder first gathers the characters into sections;
- the section joiner then assembles the sections into complete tokens.

Sections will ultimately be combined into tokens, so they are constructed with the same format as tokens:
- number, identifier, and string sections have an eight-bit header followed by a symbol tail;
- delimiter sections consist of a single word containing a delimiter symbol preceded by fifty-six zeros.

The symbol tail of a section therefore contains at most eight symbols.
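The splitting of tokens at P.E. boundaries can be sketched sequentially. The hypothetical functions below make several simplifications:
- Characters are cut into eight-character sub-blocks, one per P.E.
- Within each sub-block, a run of letters and digits forms one section, any other non-blank character forms a one-character delimiter section, and blanks only separate sections.
- Strings, nulls, and the distinction between numbers and identifiers are ignored.
- The blank positions in the example are assumed, since the grouping above does not show blanks.

```python
from typing import List

CHARS_PER_PE = 8


def split_into_pes(block: str) -> List[str]:
    """Partition the input block into eight-character sub-blocks, one per P.E."""
    return [block[i:i + CHARS_PER_PE] for i in range(0, len(block), CHARS_PER_PE)]


def sections_in_pe(sub_block: str) -> List[str]:
    """Cut one P.E.'s characters into sections (simplified rules, see lead-in)."""
    sections, current = [], ""
    for ch in sub_block:
        if ch.isalnum():
            current += ch          # extend the current letter/digit run
            continue
        if current:
            sections.append(current)
            current = ""
        if ch != " ":
            sections.append(ch)    # one-character delimiter section
    if current:
        sections.append(current)
    return sections


# Assumed blank placement for the example statement: four P.E.s, eight characters each.
example = "ANS ←  I" + "DENTIFIE" + "R+24 TIM" + "ES SUM; "
for pe, chunk in enumerate(split_into_pes(example)):
    print(pe, sections_in_pe(chunk))
```

With this layout IDENTIFIER comes out as three sections (I, DENTIFIE, R) and TIMES as two (TIM, ES), matching the text.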
If all of the symbols of a token occur within the same P.E., the section formed from them is a complete token. This is always the case for delimiter tokens that are not reserved words. As noted before, the section builder treats reserved words as identifiers and can thus assume that every delimiter token contains exactly one character. In the example, every token except IDENTIFIER and the reserved word TIMES generates a section that is a complete token. The token for TIMES is built from two sections, and the token for IDENTIFIER requires three.

4. THE CHARACTER CLASSIFIER

4.1 The Classification Process

The section builder does not examine the input symbols directly to decide what to do. Instead, it is guided by a control string containing one eight-bit control byte for each input character. This control string is generated by a character classifier, which begins by placing each of the 512 input symbols into one of the following classes:

    α-symbol: An alphabetic letter.
    β-symbol: A blank symbol that does not belong in a string token.
    γ-symbol: A symbol that is part of a string token (a symbol properly enclosed by quote symbols).
    δ-symbol: A delimiter symbol, that is, a symbol that does not fit into any of the other classes.
    1-symbol: A numeric symbol (digit).
    φ-symbol: A null character (it should not appear in any output token).

A conventional recognizer applies a series of tests to each character in turn. The result of one test determines the next, until the symbol is identified with a specific class. The parallel recognizer instead uses the eight-bit relational test instructions to test all 512 input characters simultaneously.

The only characters that change the classification of other characters are the quote characters, which always signal the presence of string tokens. The classifier therefore checks for these symbols first.
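A rough software analogue of the classifier's range tests is given below. Each list comprehension plays the role of one relational test applied to every character of the block at once. Several assumptions apply:
- An ASCII-like character set stands in for the Burroughs six-bit code, so the letters form one contiguous range here. (In the Burroughs code the alphabetic characters fall into three groups.)
- The string-membership flags are taken as an input; they come from the quote-handling procedure described next.
- Quote characters themselves fall into the delimiter mask in this sketch.
- The null class is omitted.
- The function name is hypothetical.

```python
from typing import Dict, List, Sequence


def class_masks(block: str, in_string: Sequence[bool]) -> Dict[str, List[bool]]:
    """Return one boolean mask per symbol class for every character of `block`."""
    outside = [not s for s in in_string]
    beta = [c == " " and o for c, o in zip(block, outside)]          # blanks outside strings
    alpha = ["A" <= c <= "Z" and o for c, o in zip(block, outside)]  # letters (ASCII stand-in)
    digit = ["0" <= c <= "9" and o for c, o in zip(block, outside)]  # digits
    # Anything outside a string that passed none of the other tests is a delimiter.
    delta = [o and not (b or a or d) for o, b, a, d in zip(outside, beta, alpha, digit)]
    return {"alpha": alpha, "beta": beta, "gamma": list(in_string),
            "delta": delta, "digit": digit}


block = 'X1 "A B"+'
masks = class_masks(block, [i in (4, 5, 6) for i in range(len(block))])
print({name: [i for i, v in enumerate(m) if v] for name, m in masks.items()})
```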
If quote symbols are present, every odd-numbered quote symbol should be the beginning of a string token that will be terminated by the next even-numbered quote character. If the previous block of source symbols ended with an uncompleted string token, then the roles of the odd- and even-numbered quote symbols are interchanged.

If any quote symbols are detected, a marker bit is generated for each symbol in the string tokens. To do this, the bits that mark quote symbols are shifted right one character position in each P.E. and added modulo two. The process is repeated eight times. The bit corresponding to the rightmost character in each P.E. now indicates whether an odd number of quote signs occurred in that P.E. These bits are accumulated in an ACAR and the process is repeated for 63 shifts. Each one in the ACAR now indicates that the markers in the P.E. corresponding to that ACAR bit position should be complemented. When this has been done, the newly generated bits mark all of the symbols that belong to a string token.

Unfortunately, the routine just described breaks down if a "three-quote" construct is encountered. This construct arises because the Burroughs character set does not include the right and left quote symbols used in the ALGOL report. Thus, while quote characters may be readily inserted into the string tokens described in the ALGOL report, a group of three successive quote symbols is required to represent a single quote character inside of a string token in Burroughs Extended ALGOL. Since three consecutive quotes upset the modulo-two scheme used in generating the string symbol markers, such groups must be detected and remedial action must be taken first. This is done by detecting pairs of adjacent quote symbols, deleting their markers from the set of quote markers, and then marking them as null characters to prevent their appearance in the output tokens.
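The modulo-two marking step described above can be modelled in Python as a prefix XOR computed in two levels: a running XOR inside each eight-character P.E. block, then a carry that tells each P.E. whether to complement its markers. The sketch assumes that three-quote groups have already been removed, and it marks an opening quote together with the symbols that follow it, up to but not including the closing quote. `serial_markers` is a one-pass reference included only for comparison. The carry loop is written serially here, whereas the report derives the carries from the parity bits collected in an ACAR.

```python
# Two-level prefix XOR for string markers (sketch). Assumptions: eight
# characters per P.E.; three-quote groups already removed; a marker is 1
# for an opening quote and the symbols after it, up to but not including
# the closing quote.

CHARS_PER_PE = 8

def string_markers(quote_bits: list[int], carry_in: int = 0) -> list[int]:
    if len(quote_bits) % CHARS_PER_PE:
        raise ValueError("length must be a multiple of eight")
    n_pe = len(quote_bits) // CHARS_PER_PE
    local, parity = [], []
    # Level 1: running XOR inside each P.E. (the repeated shift-and-add mod 2).
    for pe in range(n_pe):
        acc = 0
        for q in quote_bits[pe * CHARS_PER_PE:(pe + 1) * CHARS_PER_PE]:
            acc ^= q
            local.append(acc)
        parity.append(acc)   # 1 if this P.E. holds an odd number of quotes
    # Level 2: P.E. p complements its markers if an odd number of quotes
    # occurred before it (carry_in = 1 if the previous block ended in a string).
    markers, flip = [], carry_in
    for pe in range(n_pe):
        block = local[pe * CHARS_PER_PE:(pe + 1) * CHARS_PER_PE]
        markers.extend(bit ^ flip for bit in block)
        flip ^= parity[pe]
    return markers

def serial_markers(quote_bits: list[int], carry_in: int = 0) -> list[int]:
    """Reference result from one left-to-right pass over the block."""
    out, acc = [], carry_in
    for q in quote_bits:
        acc ^= q
        out.append(acc)
    return out
```

Because XOR is associative, combining per-P.E. results with a carry gives the same markers as a single left-to-right pass, which is what allows the work inside each P.E. to proceed simultaneously.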
If a single quote symbol follows such a pair, the quote marker corresponding to this symbol is deleted so that the now unmarked quote sign will be ignored by the modulo-two routine. Sequences of several contiguous quote symbols must be reduced by repeated use of the quote pair remover. An unfortunate by-product of the removal of quote symbol pairs is a shrinkage in the length of the token. As we will see, the possibility that there may be gaps in the symbol tails necessitates the use of more general (and thus more complicated) section builder and section joiner schemes.

After string components have been properly identified, the character classifier uses a series of eight-bit inequality tests to separate numeric and alphabetic characters from the delimiter characters. Digits may be easily isolated, but the Burroughs coding requires several inequality tests to identify the alphabetic characters, as they occur in three groups separated by delimiter symbols. This is of little concern to a parallel recognizer, since each step is being applied to 512 characters instead of just one symbol.

The results of these tests are recorded in a marker string. Each eight-bit byte of the marker string corresponds to one of the input symbols and indicates the classification of that symbol, with one bit position for each of the six classes (φ, β, γ, numeric, δ, and α); the remaining two bit positions are not used.

[Figure: the example input statement, P.E.s 0 through 3, with its marker string beneath it.]

4.2 The Control String

The marker string is easily converted into a control string in which each eight-bit byte carries the α, γ, and δ indicators together with two new control indicators. The α, γ, and δ indicators have the same meaning as they did in the marker string.
The two new bits of control information have the following significance:

Null/blank indicator: Indicates a null character or a blank character that is not contained in a string token. (Such characters should not appear in any output token.)
Section store indicator: Marks either a delimiter symbol or a blank symbol that is not contained in a string, provided the preceding symbol is neither a delimiter symbol nor a non-string blank symbol.

The control string format is designed so that the control bytes may be loaded directly into the eight-bit P.E. mode registers to control the section assembly process. The section store indicators appear in bit positions 0 and 1, since both the E and E1 enable bits must be set to one to completely enable a P.E. Bit positions 2 and 3 are the fault bits in the mode register. These are always set to zero to avoid complications with the interrupt hardware. The example input statement would give rise to the following control string:

[Figure: the control string for the example statement, showing the control indicators beneath each character of P.E.s 0 through 3.]

5. THE SECTION BUILDER

5.1 The Section Builder Process

The section builder processes the eight characters held in each of the 64 P.E.s, beginning with the leftmost character and moving right under the direction of the control string. The input characters are held in the R registers of the P.E.s. The section builder begins constructing a partial section in each P.E. This partial section consists of a symbol tail in register A and a header in register S. The eight-bit header is kept in bit positions 53 through 60 of the S register, an offset of three bit positions from the right end of the register. Thus, the character count of the header is incremented by adding eight instead of one. The reason for this will become clear later. Initially, the partial section symbol tail and header are both all zeros.
If an alphabetic, numeric, or string character is encountered, this character is extracted from the set of input characters and added to the end of the symbol tail in register A. The character count in register S is also incremented. If the new symbol is an alphabetic character, the first bit of the partial section header, located in bit 53 of the S register, is set to one. Bit 54 is set to one when a string character is added. If a delimiter symbol is encountered in the input, the partial section previously being accumulated (if any) is assembled into a complete section and stored in P.E. memory. The delimiter symbol is then extracted from the set of input symbols and placed in the rightmost eight bits of register A to form a delimiter section, which is also stored in P.E. memory. Register A and register S are then both cleared to begin a new partial section. The occurrence of a blank symbol that is not a part of a string token will also cause the section builder to assemble and store any unfinished partial section, but the blank symbol itself will be discarded.

The section builder does not need to directly examine the input symbols to determine the action to be taken, since its behavior is completely determined by the control string. The first step in processing a new input character is to load the control byte corresponding to that symbol into the P.E.'s mode register. This immediately disables the P.E.s that do not have partial sections to be stored. When the partial section storage operation is complete, the mode register bits are rearranged to enable only P.E.s that are required to extract a non-blank, non-null character from the set of input characters. The section builder continues in this fashion, using the mode bits to ensure that only the correct P.E.s are active at each step, until all of the possible operations on the current character have been completed.
It then proceeds to the next character and loads a new control byte into the mode register until all eight of the characters in each P.E. have been assimilated.

The kernel of the section builder is the process of simultaneously extracting the i-th character from the eight-character input word in each of the enabled P.E.s. This could be done by masking a copy of the input word in each enabled P.E. with a pattern properly positioned in an ACAR. However, since it is important not only to extract a character from the word in which it is embedded, but also to shift that character to a position aligned with the end of a symbol tail, an alternate method which dispenses with the mask in favor of a series of shift operations was adopted.

As noted before, each P.E. is equipped with a shifter, which is capable of shifting a 64-bit operand in either direction, end-off or end-around, in a single clock time. Even with the necessary instruction decoding and other overhead, the shift instructions require only three clock times for any desired shift. Furthermore, the shift count can be indexed separately in each P.E., so that a single instruction may produce a variety of shift lengths in the different P.E.s.

The section builder uses two end-off shifts to extract the character from the input word and place it in the rightmost eight bits of register A. The word is first shifted left far enough to left-justify the selected character. Next, a 56-bit right shift leaves the desired character in the rightmost byte of a word that contains zeros elsewhere. Delimiter characters in this position need no further treatment before they are stored as completed tokens. Otherwise, the extracted character is added to the partial section symbol tail, which is then shifted left eight bits so that another symbol may be added to the right end.
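The per-P.E. behaviour described in this section can be sketched in Python for a single P.E. It is a simplified model under stated assumptions: characters are packed big-endian into a 64-bit integer, the class letters a, n, g, b, d and 0 stand for alphabetic, numeric, string, non-string blank, delimiter and null symbols, the section store indicator is recomputed from those classes using the rule of Section 4.2, and a section is represented as a tuple `(count, is_alpha, is_string, symbols)` rather than as packed header and tail words (a count of 0 marks a delimiter section in this model). `extract` performs the two end-off shifts described above.

```python
# Simplified single-P.E. model of the section builder (assumptions:
# big-endian 64-bit input word; class letters a=alphabetic, n=numeric,
# g=string, b=non-string blank, d=delimiter, 0=null; a section is the
# tuple (count, is_alpha, is_string, symbols), not the packed words).

MASK64 = (1 << 64) - 1

def extract(word: int, i: int) -> int:
    """Two end-off shifts: left-justify character i, then shift right 56 bits."""
    return ((word << (8 * i)) & MASK64) >> 56

def section_store(classes: str, prev_class: str) -> list[bool]:
    """Section store indicator: a delimiter or non-string blank whose
    predecessor is neither a delimiter nor a non-string blank."""
    flags, prev = [], prev_class
    for cls in classes:
        flags.append(cls in "db" and prev not in "db")
        prev = cls
    return flags

def build_sections(word: int, classes: str, prev_class: str = "b"):
    """Return (stored_sections, partial_section_or_None) for one P.E."""
    stored, tail, alpha, string = [], [], False, False
    flags = section_store(classes, prev_class)
    for i, (cls, store) in enumerate(zip(classes, flags)):
        if store and tail:            # finish and store the partial section
            stored.append((len(tail), alpha, string, bytes(tail)))
            tail, alpha, string = [], False, False
        ch = extract(word, i)
        if cls in "ang":              # append symbol, update header class bits
            tail.append(ch)
            alpha |= cls == "a"
            string |= cls == "g"
        elif cls == "d":              # delimiter forms its own one-symbol section
            stored.append((0, False, False, bytes([ch])))
        # "b" (non-string blank) and "0" (null) symbols are discarded
    partial = (len(tail), alpha, string, bytes(tail)) if tail else None
    return stored, partial
```

Run on the P.E. 2 characters of the example (R + 24 TIM, with a hypothetical blank layout and the preceding P.E. ending in a letter), the sketch stores the sections R, + and 24 and leaves TIM as the partial section, which agrees with the example in the text.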
When a symbol tail is to be joined with its companion header to form a completed section, the exact position of the right end of the symbol tail becomes unimportant, and the tail must be shifted so that its leftmost symbol is aligned with the header to be added. A rotation (end-around shift) to the right is used for this. The symbol tails may vary from one to seven characters in length, so the shift distance will vary. The correct shift counts are derived from the character count portion of the header accompanying the symbol tail. Displacing the header three bits from the right of the S register effectively multiplies the character count by eight to give the necessary shift count increment. After the header in the S register is OR-ed with the symbol tail, an eleven-bit right rotation aligns the completed section for storage.

Given the example input statement, the section builder would produce the following result (bracketed digits are header bytes and give the value of the character count field in the header; blanks are not shown):

                R register          S register    A register          P.E. memory
                (input string)      (partial      (partial section    (stored sections)
                                    header)       symbol tail)
P.E. 0          A N S ← I           [1]           I                   [3] A N S;  ←
P.E. 1          D E N T I F I E     [8]           D E N T I F I E     (none)
P.E. 2          R + 2 4 T I M       [3]           T I M               [1] R;  +;  [2] 2 4
P.E. 3          E S S U M ;         (none)        (none)              [2] E S;  [3] S U M;  ;

6. THE SECTION JOINER

6.1 The Section Joiner Process

Many of the sections constructed by the section builder are complete tokens. The amalgamation of the remaining sections into tokens is the function of the section joiner. A P.E. that contains the beginning section of a multi-section token is designated as a RECEIVER P.E., since this P.E. receives the remaining sections of the token from adjacent P.E.s and adds them, one at a time, to the growing partial token until a completed token has been assembled. The ending section of a multi-section token will always be the first token in its P.E., which is called a GENERATOR P.E.
Finally, LINK P.E.s hold the intermediate, or LINK, sections of tokens which include three or more sections. A P.E. which is a RECEIVER for one token may be a GENERATOR for another token, but a LINK P.E. must contain exactly one section and cannot also be either a RECEIVER or a GENERATOR P.E.

The section joiner maintains three 64-bit patterns in separate ACAR registers to indicate which P.E.s are RECEIVERS, GENERATORS, or LINKS. The registers in each P.E. are tested to determine whether any complete sections were stored by the P.E., whether any partial sections remain in the registers of the P.E., and whether the section store indicator of the first control byte in the P.E. was zero. The patterns resulting from these tests are used to produce the RECEIVER, GENERATOR, and LINK patterns. In the case of the example input statement, P.E. 0 and P.E. 2 would be considered RECEIVER P.E.s, P.E. 2 and P.E. 3 would be GENERATOR P.E.s, and P.E. 1 would be a LINK P.E.

6.2 Character Count Determination

The sections are combined by merging all of the section headers into a single token header and by concatenating the section symbol tails. As in the assembly of single-section tokens, the symbol tail of the first partial section must be aligned so that it will be adjacent to the token header. Subsequent partial section symbol tails are received from neighboring P.E.s and must be shifted to align them with the previously assembled symbols. If the "three-quote" convention for inserting quote symbols into strings did not exist, all intermediate or LINK partial sections would contain exactly eight symbols, and all sections except the beginning section would be shifted identically during the symbol concatenation process. Unfortunately, however, the "three-quote" convention can shorten the length of LINK sections, so a separate shift count must be used whenever a new partial section symbol tail is joined to the token.
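The three per-P.E. tests can be turned into the three patterns with simple bitwise operations. The Python sketch below uses integers as stand-ins for the 64-bit ACAR patterns; the combination rule is one possible derivation consistent with the example, not taken from the report's instruction sequence, and it assumes that bit p stands for P.E. p and that patterns are shifted end-off.

```python
# Python ints stand in for 64-bit ACAR patterns; bit p represents P.E. p.
# Assumption: patterns are shifted end-off (no wrap from P.E. 63 to P.E. 0),
# and the combination rule below is one derivation consistent with the
# example, not the report's instruction sequence.

def joiner_patterns(stored: int, partial: int, first_store_zero: int, n_pe: int = 64):
    """Return (receiver, generator, link) patterns.

    stored           -- P.E. stored at least one complete section
    partial          -- a partial section remains in the P.E.'s registers
    first_store_zero -- section store indicator of the P.E.'s first byte is 0
    """
    mask = (1 << n_pe) - 1
    # Partial section of P.E. p continues into P.E. p+1.
    into_next = partial & (first_store_zero >> 1)
    # P.E. p continues a token whose partial section is in P.E. p-1.
    from_prev = first_store_zero & (partial << 1) & mask
    link = from_prev & into_next & ~stored & mask
    receiver = into_next & ~link & mask
    generator = from_prev & ~link & mask
    return receiver, generator, link
```

For the four-P.E. example (complete sections stored in P.E.s 0, 2 and 3; partial sections left in P.E.s 0, 1 and 2; no section store indicator on any first character), the sketch yields RECEIVERS 0 and 2, GENERATORS 2 and 3, and LINK 1, as stated in the text.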
As before, the character count fields of the section headers provide the necessary shift counts. The R registers are used to send the character count of the first section in each P.E. to lower-numbered P.E.s in shift register fashion. The P.E.s add the character count of their last section to the incoming counts to determine the number of symbols that would be obtained by joining two sections. The R register contents are again routed to the next lower-numbered P.E.s and the addition process is repeated. The resulting sums now give the number of symbols included in three sections. Repeating the process and saving the sums obtained at each step finally produces a counter string of eight sums. Each sum represents the number of symbols accumulated in a successive stage of the section joining process. The same steps are applied to the classification bits from the headers, except that the bits are OR-ed together instead of added. These are combined with the character counts to form a word containing eight potential headers in each P.E. A RECEIVER P.E. forming a token from only two sections uses its rightmost header byte as the header for the assembled token. A P.E. that forms a token using the maximum of nine sections uses its leftmost header byte for that token.

6.3 Symbol Tail Concatenation

The process of joining the symbol tails is best illustrated by examining the steps used in assembling the multi-section tokens IDENTIFIER and TIMES in the example input statement. First, the chain of header bytes is temporarily stored in P.E. memory so that the information remaining from the section builder process may be returned to the S and A registers, with the symbols in the A register left-justified in the register. The symbol tail of the first section in each P.E. is also left-justified and loaded into the R register for routing to lower-numbered P.E.s.
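The counter string of Section 6.2 can be reproduced with a short Python sketch that rotates the first-section counts one P.E. to the left per step and accumulates them. It assumes wrap-around routing, as in the four-P.E. example, handles only the character counts (the classification bits would be OR-ed in the same way), and packs the first sum into the rightmost byte; the function names are illustrative.

```python
# Counter-string sketch (assumptions: wrap-around routing as in the
# four-P.E. example; character counts only, the classification bits
# would be OR-ed the same way; first sum kept in the rightmost byte).

def counter_strings(first_counts: list[int], last_counts: list[int],
                    stages: int = 8) -> list[list[int]]:
    """For each P.E., the running symbol totals after joining 2, 3, ... sections."""
    n = len(first_counts)
    routed = list(first_counts)           # R register contents
    sums = list(last_counts)              # start from the P.E.'s last-section count
    result = [[] for _ in range(n)]
    for _ in range(stages):
        routed = routed[1:] + routed[:1]  # each P.E. receives from the next higher one
        sums = [s + r for s, r in zip(sums, routed)]
        for p in range(n):
            result[p].append(sums[p])
    return result

def pack_headers(sums: list[int]) -> int:
    """Pack up to eight sums into a 64-bit word, first sum in the rightmost byte."""
    word = 0
    for k, s in enumerate(sums):
        word |= (s & 0xFF) << (8 * k)
    return word
```

With first-section counts 3, 8, 1, 2 and last-section counts 1, 8, 3, 0 for the example, P.E. 0 obtains 9, 10, 12, 15, 23, 24, 26, 29, so its second header byte gives the ten symbols of IDENTIFIER, while the first sum in P.E. 2 gives the five symbols of TIMES.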
P.E.s processing the example input would contain the following information at this point (bracketed digits are header bytes):

                R register          S register    A register          P.E. memory
                (first section      (partial      (partial section    (stored sections)
                symbol tails)       headers)      symbol tails)
P.E. 0          A N S               [1]           I                   [3] A N S;  ←
P.E. 1          D E N T I F I E     [8]           D E N T I F I E     (none)
P.E. 2          R                   [3]           T I M               [1] R;  +;  [2] 2 4
P.E. 3          E S                 (none)        (none)              [2] E S;  [3] S U M;  ;

All P.E.s that are not RECEIVER P.E.s are now disabled. This does not affect the routing action of the R registers, but it allows only RECEIVER P.E.s to perform the other symbol joining operations. In the example, only P.E. 0 and P.E. 2 will be active initially. The R register contents are routed to the left to begin the first cycle. (The example is constructed as if there were only four P.E.s in the quadrant, so P.E. 0 routes to P.E. 3.) Each RECEIVER P.E. shifts the symbols in the A register eight bits to the right to make room for a header and then moves them to the B register. The incoming symbol tail is brought into the A register and is shifted right by one character position more than the number of characters given by the character count in the S register. The symbols in the A and B registers are now properly aligned. After shifting the symbols of the example statement, the registers of the two RECEIVER P.E.s appear as follows:

                A register          B register    R register          S register
P.E. 0          D E N T I F         I             D E N T I F I E     [1]
P.E. 2          E S                 T I M         E S                 [3]

The symbols in the A and B registers are then OR-ed together and the eight header bytes are loaded into the S register. The bit in the "eights" position in the character count is compared with the corresponding bit in the previous character count. If the bit has changed, register A is filled with symbols and must be stored. Bit 60 of the S register is monitored for this purpose in the first cycle. In the example, P.E. 0 would be the only P.E. to store its A register contents.
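The alignment rule used in each cycle (place the incoming tail one character position beyond the current count, and write out a word when the eights bit of the count changes) can be modelled with Python integers acting as the A:B double register. This is a sketch for one RECEIVER P.E. under stated assumptions: symbols are 8-bit codes, tails arrive left-justified, and the header byte is OR-ed into the first word at the end; it does not model the GENERATOR cancellation or the other P.E.s.

```python
# Sketch of symbol-tail concatenation in one RECEIVER P.E., using a
# 128-bit A:B "double register" built from Python ints. Assumptions:
# 8-bit symbol codes, left-justified incoming tails, byte 0 of the first
# word reserved for the header (inserted at the end).

MASK64 = (1 << 64) - 1

def left_justified(symbols: bytes) -> int:
    return int.from_bytes(symbols.ljust(8, b"\0"), "big")

def concatenate(first_tail: bytes, incoming: list[bytes], header: int) -> list[bytes]:
    count = len(first_tail)
    current = left_justified(first_tail) >> 8     # make room for the header
    words = []
    for tail in incoming:
        offset = count % 8 + 1                     # byte position of the next symbol
        window = (current << 64) | ((left_justified(tail) << 64) >> (8 * offset))
        new_count = count + len(tail)
        if (new_count >> 3) & 1 != (count >> 3) & 1:   # eights bit changed: word full
            words.append(window >> 64)             # store the filled word
            current = window & MASK64              # overflow symbols carry on
        else:
            current = window >> 64
        count = new_count
    words.append(current)
    words[0] |= header << 56                       # insert the token header byte
    return [w.to_bytes(8, "big") for w in words]
```

Since no incoming tail holds more than eight symbols, a change in the eights bit of the count coincides exactly with the token crossing a word boundary. For IDENTIFIER the sketch yields a first word holding the header byte 10 and IDENTIF, followed by a second word holding IER.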
If the A register contents are stored, the symbols in the R register are again transferred to the A register, and the symbols that did not fit into the A register before are passed into the B register using a double-length shift. The pattern of GENERATOR bits in the C.U. is now shifted left and compared with the RECEIVER pattern. A GENERATOR bit that is shifted into coincidence with a RECEIVER bit cancels that bit and causes the corresponding P.E. to be disabled for the rest of the concatenation process. In the example, P.E. 2 is disabled in this way after the first cycle.

The second cycle begins by routing the R register contents to the left again. P.E. 0 would be the only P.E. active at this point in the example, and the registers of this P.E. would have the following contents:

                A register          B register    R register    S register (eight header counts)
P.E. 0          D E N T I F         I E           R             29 26 24 23 15 12 10 9

As before, the incoming symbols are transferred from the R register to the A register and shifted into alignment with the symbols in the B register before the symbol groups are OR-ed together. The cycles continue until the absence of RECEIVER P.E.s signals the end of the concatenation process. In the example, this occurs at the end of the second cycle. In any case, the process is stopped after eight cycles. All RECEIVER P.E.s are then reactivated to store any symbols remaining in the A registers and to insert the appropriate header byte in the first word of the new multi-section tokens. The tokens are now completely formed and are ready for storage in the token table or addition to the output string. At this point, the example input statement symbols are stored in P.E. memory as follows:

P.E. 0:  [3] A N S;  ←;  [10] I D E N T I F, followed by the second word I E R
P.E. 1:  (none)
P.E. 2:  +;  [2] 2 4;  [5] T I M E S
P.E. 3:  [3] S U M;  ;

7. TABLE MAINTENANCE

7.1 Table Use

After the tokens have been created, the recognizer retrieves them one at a time and creates an internal identifier for each one.
Delimiter tokens are transferred to the output string without change. Identifier tokens are compared with a reserved word table so that reserved words masquerading as identifiers can be detected and replaced with an appropriate delimiter-type internal identifier. Other identifier tokens, as well as number and string tokens, are located in the main token table or are entered in this table if no matching entry is found. The address of the token in the main table is then combined with the header of the token to create the internal identifier that represents the token in the output string.

7.2 The Reserved Word Table

The recognizer uses two tables, the reserved word table and the main token table. The two tables could be combined, but the entries in the reserved word table are predetermined single-word entries that are never changed by the recognizer, so a simpler arrangement may be used for this table. The Burroughs Extended ALGOL manual lists 111 reserved words, but 58 of these are reserved words only when used in certain contexts. For simplicity, the parallel recognizer arbitrarily treats the entire set as reserved words in all situations. (If desired, this restriction could be relaxed by treating the semi-reserved words as identifiers and allowing the analyzer to detect them.) The reserved word table entries are stored in 41 consecutive quadrant memory locations and fit easily into two slices of 64 words each. The query word is broadcast to all P.E. accumulators, where a P.E. test instruction is used to compare the query with the 64 words in the first slice of the reserved word table. The test instructions report a successful match by setting a specified mode register bit in each P.E. The test bit pattern is read into an ACAR for examination. If no ones appear in the pattern, none of the 64 table entries accessed match the query. Otherwise, the leading one detector of the C.U. is used to identify the P.E. containing the matching entry.
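The associative search can be sketched in Python: every "P.E." compares the broadcast query with its own word, the results form a 64-bit match pattern, and a leading-one detector returns the lowest-numbered matching P.E. The table contents, the bit ordering (P.E. 0 as the most significant bit), and the function names are assumptions for illustration only.

```python
# Sketch of the reserved-word lookup as a 64-word associative search.
# Assumptions: each slice is a list of up to 64 single-word entries (here
# short byte strings); bit (63 - p) of the match pattern stands for P.E. p,
# so the leading one belongs to the lowest-numbered matching P.E.

N_PE = 64

def match_pattern(query: bytes, slice_words: list[bytes]) -> int:
    """Every P.E. compares the broadcast query with its own word at once."""
    if len(slice_words) > N_PE:
        raise ValueError("a slice holds at most one word per P.E.")
    pattern = 0
    for pe, word in enumerate(slice_words):
        if word == query:
            pattern |= 1 << (N_PE - 1 - pe)
    return pattern

def leading_one(pattern: int):
    """P.E. number of the leftmost one, or None if no entry matched."""
    return None if pattern == 0 else N_PE - pattern.bit_length()

def lookup(query: bytes, slices: list[list[bytes]]):
    """Search slice by slice (one memory cycle each); return (slice, P.E.)."""
    for s, words in enumerate(slices):
        pe = leading_one(match_pattern(query, words))
        if pe is not None:
            return s, pe
    return None
```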
Essentially, the quadrant is being used as a 64-word associative memory here. Only two P.E. memory cycles are required to exhaustively search the reserved word table. Since the entries in the reserved word table are predetermined and do not change, the fastest conventional search technique orders these entries so that the set of possible matching entries is divided in half each time a comparison is made. On a serial machine, such a binary search procedure would require at least six memory cycles to determine that a given query word is not contained in this reserved word table. The only reserved word that contains more than seven characters is PROCEDURE. A test for this word is made separately from the search for shorter reserved words to simplify the table.

7.3 The Main Token Table

The token table is much more complicated to search and maintain than the reserved word table. Since new entries are constantly being added to this table, it is not feasible to order the contents. However, since the table may grow to include several hundred entries, it may not be feasible to examine every table entry for each new query. Conventional recognizers attack this problem by dividing the table into blocks (sometimes called "buckets"). Some property of all or part of the query word is used to associate this word with exactly one block. The recognizer can then limit the search to the table entries stored in the selected block. Obviously, this scheme is an advantage only if the entries are fairly evenly distributed among the blocks. Often, many of the identifiers are similar to one another. For example, all of the variables associated with a certain process may begin with the same letter. To keep this clustering effect from adversely affecting the distribution of entries among table blocks, parts of the query word are usually transformed in some way to further "randomize" the keys obtained.
Since Illiac IV has high-speed multiplication hardware, the query word is multiplied by a "hash constant" to use as many bits of the query as possible in the randomizing process. A four-bit hash key is derived from this process. The four bits designate one of the 16 hash blocks used in the table. The table is designed to hold a minimum of 1024 entries, with room for 2048 entries under optimum conditions. The organization of the table is evident from the following diagram of the table portion of a typical P.E. memory (B = base address):

MEMORY ADDRESS           CONTENTS OF MEMORY CELL
B - 1                    End of table pointer
B + 0 ... B + 15         First word of main blocks #0 ... #15
B + 16 ... B + 31        Second word of main blocks #0 ... #15
B + 32 ... B + 47        First word of extension blocks #0 ... #15
B + 48 ... B + 63        Second word of extension blocks #0 ... #15
B + 64, B + 65, ...      Continuation words #0, #1, ...

Each P.E. has space for one entry for each of the 16 main blocks. Two locations are provided for each entry, the second location 16 words after the first one. Zeros are stored in the first word locations when the table is set up so that empty spaces in a block may be found by matching with a query word of all zeros. A second set of 16 blocks is provided to extend main blocks with more than 64 entries. The extension blocks are linked to the main blocks or to other extension blocks through a set of 32 link words kept in the ADVAST Data Buffer of the C.U. When an entry is stored in the table, the first location in the table receives the first word of the token: a header followed by up to seven characters. The information stored in the second word of the table depends upon the length of the token being stored.
If the token contains more than seven but fewer than sixteen characters, the eighth character is stored in the leftmost byte of the second word, followed by the rest of the characters of the token and enough zeros to fill out the memory word. If the token contains more than fifteen characters, the second word of the table entry is loaded with the address of a continuation word that holds the second word of the token. The remainder of the token is stored in subsequent continuation words. An end of table pointer, stored in the memory location preceding the first main block word, holds the address of the last continuation word used.

The search for a match to a query token is begun by multiplying the proper section of the first word of the query by the hash constant and then extracting the four-bit hash key. The hash key is added to a base address to obtain the slice address of the first word of the selected main block. The query word is compared with the set of 64 first-word entries for the selected main block. If no matches occur, the link corresponding to the main block is checked to see if the block has been extended. If so, the first-word entries of the associated extension blocks are also compared with the first query word. When a match is found, the next word of the query token is broadcast to the P.E.s for matching. An examination of the token character count indicates whether the second word of the entry should be matched against the second word of the query token or used to access a continuation word. If several long tokens with identical beginning portions are stored in the table, the first comparisons may produce several bits indicating matches. These bits are AND-ed with the bits produced by subsequent comparisons until at most one match bit remains when all of the words in the query token have been used.
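The search procedure can be outlined in Python. Every numeric detail here is hypothetical: the report does not give the hash constant, which product bits form the key, the base address, or the quadrant address formula, so `HASH_CONSTANT`, the choice of the four high-order bits of a 64-bit product, `BASE`, and `slice * 64 + P.E.` are placeholders. Extension blocks and continuation words are omitted. Because the first word carries the header with its character count, a one-word query cannot match the first word of a longer entry.

```python
# Sketch of the main token table search. Hypothetical values (not from
# the report): HASH_CONSTANT, the four high-order product bits used as
# the key, BASE, and the quadrant address formula slice * 64 + P.E.
# Extension blocks and continuation words are omitted. Each P.E. memory
# is modelled as a dict from slice address to a 64-bit word.

N_PE = 64
BASE = 1000                          # hypothetical table base address B
HASH_CONSTANT = 0x9E3779B97F4A7C15   # hypothetical odd multiplier
MASK64 = (1 << 64) - 1

def hash_key(first_word: int) -> int:
    """Multiply by the hash constant and keep four product bits (0..15)."""
    return ((first_word * HASH_CONSTANT) & MASK64) >> 60

def slice_addresses(key: int) -> tuple[int, int]:
    """First and second word of main block `key` in every P.E. memory."""
    return BASE + key, BASE + 16 + key

def match_bits(memories: list[dict], address: int, query_word: int) -> int:
    """All P.E.s compare their word at `address` with the broadcast query."""
    bits = 0
    for pe, mem in enumerate(memories):
        if mem.get(address, 0) == query_word:
            bits |= 1 << (N_PE - 1 - pe)
    return bits

def search(memories: list[dict], query: list[int]):
    """`query` holds one or two words; returns a quadrant address or None."""
    first_addr, second_addr = slice_addresses(hash_key(query[0]))
    bits = match_bits(memories, first_addr, query[0])
    if len(query) > 1:
        bits &= match_bits(memories, second_addr, query[1])  # AND the match bits
    if bits == 0:
        return None
    pe = N_PE - bits.bit_length()        # leading one detector
    return first_addr * N_PE + pe        # hypothetical quadrant address
```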
Note that this process examines several of the candidates for a match in parallel, whereas a serial table routine might try several entries that do not quite match the query before locating the correct matching entry. Once an entry is located in the table, the leading one detector is used to determine which P.E. holds the entry. The P.E. number is combined with the slice address of the entry to produce a quadrant address. It is this address that appears in the output string.

8. RESULTS

8.1 Execution Time Data

Execution times were estimated for the parallel recognizer for a variety of conditions. As noted before, the total execution time is not the sum of the individual instruction execution times, due to the overlap between P.E. and C.U. instructions created by the final queue (FINQ) and by the action of the C.U.'s instruction look-ahead unit. All intermediate calculations are expressed in Illiac IV clock times. (An Illiac IV clock period is about 40 nanoseconds.) However, since it is desirable to have some measure that is not dependent on circuit speed, the final results are expressed in equivalent memory cycles. An Illiac IV memory cycle requires six clocks.

The routines that assemble the 512 input characters into tokens are evaluated separately from the table maintenance and output string generation procedures. The former are evaluated on a memory-cycles-per-input-character basis, while the latter are judged on a memory-cycles-per-output-token basis. Table 1 gives the execution times obtained for the token building routines. Four possible input character sets are considered:

CASE I: Best possible case. No quote symbols appear in the input and the longest token is two words in length.
CASE II: Longest possible tokens (eight words) appear but no quote symbols are present in the input.
Table 1 - Token Building Routines: Execution Times (in clocks unless noted)

                                                   CASE I   CASE II   CASE III   CASE IV
Set-up routines                                      1560      1560       1560      1560
Quote string builder and first cycle of
  quote-pair routine                                   --        --       1282      1282
Quote-pair routine, additional cycles                  --        --         --      2340
Marker and control string generator                   195       195        195       195
Section builder                                       728       728        728       728
Section joiner, minimum portion                       662       662        662       662
Section joiner, additional cycles                      --       595        595       595
Total clocks                                         3145      3740       5022      7362
Equivalent memory cycles                              525       624        837      1227
Memory cycles/input character                        1.03      1.22       1.63      2.40

CASE III: Longest possible tokens as well as quote symbols appear in the input. No more than three quote symbols appear contiguously.
CASE IV: Worst possible case. This very rare input string includes 63 consecutive quote symbols which form a maximum-length token.

The execution time stays near one memory cycle per input character unless quote symbols are present. Even then, the increase is slight unless an absurd number of contiguous quote symbols appear. Obviously, the three-quote convention would be one of the first things omitted in a language designed for a parallel recognizer.

A conventional recognizer would probably operate fastest on a set of all blank input symbols, but even then at least two memory cycles would be required for each symbol scanned. As more and more non-blank symbols, especially delimiter symbols, appear in the input string, a conventional recognizer slows noticeably. Thus the parallel token building routines provide a definite speed advantage when processing non-trivial input strings.

The table maintenance and output string routines do not show a similar advantage, however. The execution time required for these routines depends upon a multitude of factors, including the length of the token, the number of tokens already in the table, the distribution of the table entries among the table blocks, and the existence of a table entry that matches the query token.
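The derived rows of Table 1 follow from simple arithmetic, which the short Python check below reproduces. It assumes six clocks per memory cycle and 512 characters per block, as stated in the text, and rounds partial memory cycles up, which is the rounding consistent with the table's figures (3145 clocks give 525 cycles, not 524).

```python
import math

CLOCKS_PER_CYCLE = 6      # one Illiac IV memory cycle = six clocks
CHARS_PER_BLOCK = 512     # characters handled per recognizer pass

def memory_cycles(clocks: int) -> int:
    """Equivalent memory cycles, rounding a partial cycle up (assumption)."""
    return math.ceil(clocks / CLOCKS_PER_CYCLE)

def summarize(parts: list[int]) -> tuple[int, int, float]:
    """Total clocks, memory cycles, and memory cycles per input character."""
    clocks = sum(parts)
    cycles = memory_cycles(clocks)
    return clocks, cycles, round(cycles / CHARS_PER_BLOCK, 2)

# Component clock counts from Table 1 (empty cells omitted).
TABLE1 = {
    "I":   [1560, 195, 728, 662],
    "II":  [1560, 195, 728, 662, 595],
    "III": [1560, 1282, 195, 728, 662, 595],
    "IV":  [1560, 1282, 2340, 195, 728, 662, 595],
}

for case, parts in TABLE1.items():
    clocks, cycles, per_char = summarize(parts)
    print(f"CASE {case}: {clocks} clocks, {cycles} memory cycles, {per_char} cycles/char")
```

The same `memory_cycles` conversion reproduces the per-token figures of Table 2 (178, 285, 360 and 700 clocks give 30, 48, 60 and 117 memory cycles).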
The execution times for four widely differing cases are presented in Table 2. The cases selected are as follows:

Table 2 - Table Maintenance Routines: Execution Times (in clocks unless noted)

                                                   CASE I   CASE II   CASE III   CASE IV
Preparation of token for table search                 110       139        164       298
Table search                                           44       104         77        77
Entry of token into table                              --        --         77       283
Creation of output internal identifier for token       24        42         42        42
Total clocks                                          178       285        360       700
Equivalent memory cycles/token                         30        48         60       117

CASE I: A single-word reserved word token. (Naturally there is a matching entry for this token in the reserved word table.)
CASE II: A single-word string token that has a matching entry already in the table.
CASE III: A single-word identifier token that does not match any entry in the table.
CASE IV: A maximum-length token (eight words) that does not match any entry in the table.

The parallel table maintenance routines show little, if any, gain over conventional schemes. This occurs because each token receives a considerable amount of individual attention by the C.U., mostly in the preparation of the token for the table search. An algorithm that performed some of the more common pre-search steps in the P.E.s would help to reduce the dominance of the preparation step in the total time. Possibly a linear table search could be used for the main token table, especially if fewer than 500 tokens are expected. But no matter what modifications are adopted, a well-designed conventional table routine is not easily eclipsed.

Since conventional serial recognizers are usually embedded in a compiler or similar program, an exact comparison cannot be made, but the Illiac IV parallel computer should provide an increase in string processing speed of from two to fifteen times. Thus the parallel computer should be a useful tool for string processing.

REFERENCES

Burroughs Corporation. "B5500 Extended ALGOL Language Manual," 1966.
Burroughs Corporation. "ILLIAC IV Systems Characteristics and Programming Manual," with change 1 of June 12, 1968.
Graham, Robert M. "Bounded Context Translation," in Rosen, Saul, ed., "Programming Systems and Languages," McGraw-Hill Book Company, New York, 1967.

Naur, P., et al. "Revised Report on the Algorithmic Language ALGOL 60," Communications of the Association for Computing Machinery, Volume 6, Number 1, pp. 1-17, January, 1963.

Peterson, W. W. "Addressing for Random Access Storage," IBM Journal of Research and Development, Volume 1, Number 2, pp. 130-146, April, 1957.

APPENDIX

ILLIAC IV ASSEMBLY LANGUAGE LISTING OF PARALLEL RECOGNIZER

**** LABEL DECLARATIONS ****
.QUOTS: EQU $D1;      % PATTERN OF EIGHT QUOTE CHARACTERS
.BTMSK: EQU $D2;      % MASK ALL BUT EACH BYTE'S LAST BIT
.LIM3:  EQU $D3;      % STEP 1 UNTIL 3 INDEX PATTERN
.LIM7:  EQU $D4;      % STEP 1 UNTIL 7 INDEX PATTERN
.LIM63: EQU $D5;      % STEP 1 UNTIL 63 INDEX PATTERN
.ENDST: EQU $D6;      % TELLS IF PREVIOUS INPUT BLOCK
                      % ENDED WITH AN UNTERMINATED STRING
.HINBR: EQU $D7;      % PATTERN OF BYTES = LARGEST NUMBER
.BLANK: EQU $D8;      % PATTERN OF 8 BLANK CHARACTERS
.HAMSK: EQU $D9;      % MASK OUT LEFT 4 BITS OF EACH BYTE
.CYC56: EQU $D10;     % STEP 8 UNTIL 56 INDEX PATTERN
.IDENT: EQU $D11;     % MASK ALL BUT IDENTIFIER MARKS
.STRNG: EQU $D12;     % MASK ALL BUT STRING MARKS
.GENR:  EQU $D13;     % FLAG BITS FOR GENERATOR P.E.S
.NLINK: EQU $D14;     % INDICATES NON-LINK P.E.S
.RCVR:  EQU $D15;     % FLAG BITS FOR RECEIVER P.E.S
.ACTIV: EQU $D16;     % FLAG BITS FOR P.E.S STILL ACTIVE
.HASH:  EQU $D17;     % HASH CODING CONSTANT
.HMSK:  EQU $D18;     % MASKS ALL BUT 4-BIT HASH KEY
.WRDCT: EQU $D19;     % WORD COUNT OF QUERY
.CLIMT: EQU $D20;     % LIMIT OF CONTINUATION WORD AREA
.LASTX: EQU $D21;     % NUMBER OF LAST EXTENSION BLK USED
.XLIMT: EQU $D22;     % LIMIT OF EXTENSION BLOCK AREA
.ALIM:  EQU $D23;     % STEP 1 INDEX PATTERN (LIM=0)
.PROCD: EQU $D24;     % "9PROCEDU" PATTERN
.RE:    EQU $D25;     % "RE000000" PATTERN
.POYNT: EQU $D26;     % TEMPORARY STORAGE FOR POINTER
.HDMSK: EQU $D27;     % MASKS OUT ALL BUT LEFTMOST BYTE
.OUTPT: EQU $D28;     % OUTPUT STRING INDEX REGISTER
.LOALF: EQU $D29;     % USES $D29-$D31 FOR TEST BYTES
                      % FOR ALPHABETIC CHARACTERS
.HIALF: EQU $D32;     % USES $D32-$D34 FOR TEST BYTES
                      % FOR ALPHABETIC CHARACTERS
.QUERY: EQU $D36;     % USES $D36-$D44 FOR QUERY STRING
.QUERP: EQU $D37;     % QUERY + 1
.XBASE: EQU $D46;     % USES $D46-$D61 FOR LINKS TO
                      % EXTENSION BLOCKS
.CNTR:  EQU $D63;     % COUNTER FOR SECTION JOINER CYCLES
FRSTR:  EQU 8192;     % FIRSTSTORE FLAG BIT
QBASE:  EQU 72192;    % 256*TBASE
TSIZE:  EQU 510;      % SET TABLE SIZE TO 228 WORDS/P.E.
NXSRC:  EQU 64x256;
OLDMK:  EQU 64x257;   % MARKER STRING FROM PREVIOUS CYCLE
MARKS:  EQU 64x258;   % MARKER STRING FOR CURRENT CYCLE
XTEND:  EQU 64x259;   % END MARKER FROM PREVIOUS CYCLE
SAVE:   EQU 64x260;   % TEMP STORAGE FOR PARTIAL SECTION
SAVEX:  EQU 64x261;   % TEMPORARY STORAGE FOR X REGISTER
SSAVE:  EQU 64x262;   % TEMPORARY STORAGE FOR S REGISTER
HEADR:  EQU 64x263;   % HEADER OF TOKEN BEING FORMED
LASTC:  EQU 64x264;   % ADDRESS OF LAST CONTIN. WORD USED
RESRV:  EQU 64x265;   % FIRST SLICE OF RESERVED WORD TABLE
RESRP:  EQU 64x266;   % SECOND RESERVED WORD TABLE SLICE
SECTN:  EQU 64x267;   % USES LOCATIONS 267-282 FOR COMPLETED SECTIONS
SECTP:  EQU 64x268;   % SECTION + 1
TBASM:  EQU 64x283;   % TBASE - 1
TBASE:  EQU 64x284;   % USES LOCS 284-511 FOR MAIN TABLE
TBASP:  EQU 64x285;   % TBASE + 1
SORCE:  EQU 64x512;   % SOURCE STRING

**** SET-UP PROCESS ****
        0000100000007000000010|8;   .CYC56;     % STEP 8 UNTIL 56 INDEX PATTERN
        0004010020040100200401|8;   .BTMSK;     % MASK ALL BUT EACH BYTE'S LAST BIT
        0074170360741703607417|8;   .HAMSK;     % MASK OUT LEFT 4 BITS OF EACH BYTE
        0374771763747717637477|8;   .QUOTS;     % PATTERN OF EIGHT QUOTE CHARACTERS
        0000010000000300000000|8;   .LIM3;      % STEP 1 UNTIL 3 INDEX PATTERN
        0000010000000700000000|8;   .LIM7;      % STEP 1 UNTIL 7 INDEX PATTERN
        0000010000007700000000|8;   .LIM63;     % STEP 1 UNTIL 63 INDEX PATTERN
                                    .ENDST;
        0300601403006014030060|8;   .BLANK;     % PATTERN OF 8 BLANK CHARACTERS
        0050120240501202405012|8;   .HINBR;     % PATTERN OF BYTES = LARGEST NUMBER
        0100200401002004010020|8;   .LOALF;     % FIRST ALPHA GROUP TEST BYTES
        0150320641503206415032|8;   .HIALF;
        0200401002004010020040|8;   .LOALF+1;   % SECOND ALPHA GROUP TEST BYTES
        0250521242505212425052|8;   .HIALF+1;
        0304611423046114230461|8;   .LOALF+2;   % THIRD ALPHA GROUP TEST BYTES
        0350721643507216435072|8;   .HIALF+2;
                                                % ENSURE ALL P.E.S ARE ENABLED
        LDX .HASH;                  % SET UP HASH CONSTANT
                                    % INITIALIZE TABLE ENTRIES TO ZERO
                                    % SET ACAR0INCR = +1, ACAR0INDX = 1
                                    % MASKS OUT ALL BUT LEFTMOST BYTE
                                    % SET UP INDEX FOR 8192 WORD OUTPUT STRING
                                    % SET UP "9PROCEDU" PATTERN
                                    % SET UP "RE000000" PATTERN

**** QUOTE-PAIR ROUTINE ****
        % LOAD SOURCE STRING
        % LOAD PATTERN OF 8 QUOTE SYMBOLS
        % CHECK FOR QUOTE CHARACTERS
        % OBTAIN MARK FOR EACH QUOTE SYMBOL
        % MOVE MARK TO PROPER POSITION
        % SAVE MARKS
        % "OR" MARKS INTO ACAR0
        % SKIP IF NO QUOTES FOUND
        % LOAD QUOTE MARKERS FOR ROUTING
        % ROUTE MARKERS LEFT ONE P.E.
        % SET RIGHTMOST BIT OF ACAR1
        % ENABLE ONLY RIGHTMOST P.E.
        % BRING NEXT SOURCE STRING WORD IN AT RIGHT END OF STRING
        % RE-ENABLE ALL P.E.S
        % EXTRACT LEFTMOST TWO BYTES
        % SAVE BYTES IN REGISTER
        SMAR 8;        LDB $A;        LDA $S;        SMAL 8;        OR $B;
        ANDN $S;       LDR $A;        LDA $S;        SHAL 16;       OR $X;
        AND $R;        LDC(0) $A;     ZFRT(0) #SBITD;               LDR $A;
        RTL 1;         LDB $A;        LDA $R;        CROTR(1) 1;    LDEE1 $C1;
        LDA OLDMK;     STR OLDMK;     SETE -E.OR.E;  SETE1 -E.OR.E; RTAR 17;
        LDX $A;        SHAR 56;       SMAL 56;       SWAP;          SHAR 9;
        OR $B;
        LDR $A;        OR $S;         LDS $A;        LDA $X;        SHAL 56;
        LDS $A;        LDA $R;        SHAR 8;        OR $B;         LDR $A;
        OR $S;         LDS $A;        LDA $R;        SHAL 8;        OR $X;
        RTAR 15;       AND $S;        LDS $A;        LDA $S;
        % NOW GET LEFTMOST BYTE ALONE
        % SAVE LEFTMOST BYTE IN REGISTER B
        % RELOAD ORIGINAL QUOTE MARKERS
        % MOVE MARKERS LEFT ONE BYTE POS.
        % FORM NOQUOTE-QUOTE MARKERS
        % MOVE MARKERS LEFT TWO BYTE POS.
        % OBTAIN NOQUOTE-QUOTE-QUOTE MARKERS
        % "OR" MARKERS INTO ACAR0
        % SKIP IF NO MORE MARKERS REMAIN
        % MOVE NEW MARKERS RIGHT ONE P.E.
        % ENABLE ONLY LEFTMOST P.E.
        % OBTAIN MARKS FROM PREVIOUS CYCLE
        % SAVE MARKER STRING WORD SHIFTED OUT RIGHT END FOR NEXT CYCLE
        % RE-ENABLE ALL P.E.S
        % RE-ASSEMBLE MARKERS SHIFTED RIGHT 9 BITS
        % RE-ASSEMBLE MARKERS SHIFTED RIGHT 17 BITS
        % SET NULL MARKERS FOR SECOND SYMBOL IN CHAINS OF THREE QUOTES
        % RESET QUOTE MARKER FOR THIRD QUOTE SYMBOL IN CHAINS OF THREE QUOTES
        % BEGIN "MODULO TWO" PROCESS
        % .BTMSK IS ALREADY IN ACAR3
        % EXTRACT ONLY QUOTE MARKERS
        % SET ACAR0INCR = +1, ACAR0LIM = 7
        % REPEAT THE PROCESS EIGHT TIMES
        % SET ACAR0INCR = +1, ACAR0LIM = 63
        % RE-ENABLE ALL P.E.S
        % GET ORIGINAL QUOTE MARKERS AGAIN
        % ALIGN QUOTE MARKERS WITH NULL MARKER POSITION IN
        %   MARKER STRING
        % SET REMAINING QUOTE MARKS TO NULLS
        % SAVE MARKER STRING

**** MARKER STRING GENERATOR ****
        % BEGIN NUMBER MARKER
        % LOAD SOURCE STRING
        % SAVE A COPY OF THE SOURCE STRING
        % LOAD NUMBER TEST BYTES
        % MARK ALL DIGIT CHARACTERS
        % SHIFT MARKS TO NUMBER MARKER POSN
        % ADD MARKERS TO MARKER STRING
        % SAVE MARKER STRING
        % END NUMBER MARKING
        % BEGIN MARKING ALPHABETIC SYMBOLS
        % SET ACAR0INCR = +1, ACAR0LIM = 3
        % TEST FOR ALPHABETIC CHARACTERS
        % SAVE TEST RESULTS IN REGISTER S
        % RELOAD SOURCE STRING IN REGISTER A
        % COMPLETE TEST FOR ALPHA SYMBOLS
        % COMBINE ALPHA TEST RESULTS
        % ADD ALPHA MARKERS TO MARKER STRING
        % SAVE MARKER STRING
        % END ALPHA MARKING AFTER TESTING FOR THREE ALPHA GROUPS
        % BEGIN BLANK SYMBOL MARKING
        % RELOAD SOURCE STRING
        % LOAD BLANK CHARACTER TEST BYTES
        % TEST FOR BLANK CHARACTERS
        % OBTAIN MARKER FOR EACH BLANK
        % MOVE MARKS TO BLANK MARKER POSTN
        % ADD NEW MARKERS TO MARKER STRING
        % BEGIN DELIMITER MARKING
        % OBTAIN MARK FOR ALL CHARACTERS NOT PREVIOUSLY MARKED
        % SHIFT DELIMITER MARKER INTO PLACE
        % ADD DELIMITER MARKER TO MARKER STRING

**** CONTROL STRING GENERATOR ****
        % EXTRACT BLANK CHARACTER MARKERS
        % MOVE MASK TO STRING MARKER POSITION IN BYTE
        % EXTRACT STRING MARKERS
        % ROTATE TO BIT POSITION #2 OF BYTE
        % MARK NON-STRING BLANK CHARACTERS
        % MOVE MASK TO DELIMITER MARKER POSTN
        % EXTRACT DELIMITER MARKERS
        % SHIFT TO BIT POSITION #2 OF BYTE
        % MARKERS NOW INDICATE EITHER DELIMITERS OR NON-STRING BLANKS
        % SET BIT #0 OF ACAR0
        % ENABLE ONLY LEFTMOST P.E.
        % ADD MARKER FROM PREVIOUS PASS TO LEFT OF STRING
        % SAVE END MARKER FOR NEXT PASS
        % RE-ENABLE ALL P.E.S
        % RE-ASSEMBLE STRING SHIFTED RIGHT EIGHT BITS
        NAND $R;              % CREATE PARTIAL SECTION STORAGE IND.
        SHAL 1;               % MOVE INDICATOR TO BIT POSITION #1
        LDB $A;
        SHAL 1;
        OR $B;                % DUPLICATE INDICATOR BIT IN BIT POSITION #0 OF BYTE
        LDR $A;
        LDA $S;
        LDL(0) .HAMSK;
        AND $C0;              % MASK OUT LEFT 4 BITS OF EACH BYTE
        OR $R;                % ADD PARTIAL SECTION STORAGE INDICATORS TO CREATE CONTROL STRNG
        STA MARKS;            % CONTROL STRING IS NOW READY FOR USE

**** SECTION BUILDER ****
        CLC(0);               % BEGIN SECTION BUILDER SET-UP
        COMPC(1);             % ACAR1 PATTERN TO RAPIDLY ENABLE ALL P.E.S
        LDL(2) .IDENT;        % SET UP FLAG BIT FOR IDENTIFIERS
        LDL(3) .STRNG;        % SET UP FLAG BIT FOR STRING SECTIONS
        LDR SORCE;            % LOAD SOURCE STRING IN REGISTER R
        CLRA;
        LDS $A;
        LDX $A;
        STA SECTN;
        STA SECTP;            % END SECTION BUILDER SET-UP
        LDB MARKS;            % LOAD FIRST CONTROL BYTE
        LDD $B;               % SET MODE REGISTERS
        SETE -E.OR.-E;        % COMPLEMENT SECTION STORAGE
        SETE1 -E1.OR.-E1;     % INDICATORS
        XT .FRSTR;            % SET FIRSTSTORE FLAG
        LDEE1 $C1;            % RE-ENABLE ALL P.E.S
        SETE -H.AND.E;        % CONSIDER ONLY NON-BLANK, NON-NULL
        SETE1 -H.AND.E1;      % CHARACTERS
        LDR $A;
        SHAR 56;              % EXTRACT FIRST CHARACTER
        SETE -I.AND.E;        % CONSIDER NON-BLANKS, NON-NULLS, AND
        SETE1 -I.AND.E1;      % NON-DELIMITERS ONLY
        RTAL 8;
        LDS $A;
        CLRA;
        ADMA =8;              % SET PARTIAL SECTION CHARACTER COUNT TO 1
        SETE J.AND.E;         % CONSIDER ONLY ALPHABETIC CHARACTERS
        SETE1 J.AND.E1;
        OR $C2;               % SET IDENTIFIER MARKER
        LDEE1 $C1;            % RE-ENABLE ALL P.E.S
        SETE G.AND.E;         % CONSIDER ONLY STRING CHARACTERS
        SETE1 G.AND.E1;
        OR $C3;               % SET STRING MARKER
        LDEE1 $C1;            % RE-ENABLE ALL P.E.S
        SETE -H.AND.E;        % CONSIDER ONLY NON-BLANKS AND
        SETE1 -H.AND.E1;      % NON-NULL CHARACTERS
        LDB $A;
        LDA $S;               % RETURN PARTIAL SECTION HEADER TO
        LDS $B;               % REGISTER S
        % CONSIDER ONLY DELIMITER SYMBOLS
        % STORE DELIMITER IN OUTPUT STRING
        % RE-ENABLE ALL P.E.S
        % COMPLETE FIRST CHARACTER
        % LOAD STEP 8 UNTIL 56 INDEX
        % MOVE TO N'TH CHARACTER
        % LOAD CONTROL BYTES
        % SELECT BYTE #N
        % MOVE CONTROL BYTE TO REGISTER B
        % LOAD CONTROL BYTE IN MODE REGISTER
        % ADD HEADER TO PARTIAL SECTION
        % ALIGN PARTIAL SECTION FOR STORAGE
        % STORE PARTIAL SECTION
        % CREATE NEW EMPTY PARTIAL SECTION
        % CLEAR CHARACTER COUNT
        % ENABLE ALL P.E.S
        % CONSIDER ONLY NON-BLANK AND NON-NULL CHARACTERS
        % EXTRACT NEXT CHARACTER
        % CONSIDER NON-BLANK, NON-NULL AND NON-DELIMITER CHARACTERS ONLY
        % ADD NEW CHARACTER TO PARTIAL SECTN.
        % INCREMENT CHARACTER COUNT
        % CONSIDER ONLY ALPHABETIC SYMBOLS
        % ADD MARKER FOR IDENTIFIER
        % RE-ENABLE ALL P.E.S
        % CONSIDER ONLY STRING CHARACTERS
        % ADD STRING MARKER
        % RE-ENABLE ALL P.E.S
        % CONSIDER NON-BLANK, NON-NULL, AND NON-DELIMITER CHARACTERS ONLY
        % RETURN PARTIAL SECTION HEADER TO REGISTER S
        % CONSIDER ONLY DELIMITER CHARACTERS
        % STORE DELIMITER IN OUTPUT STRING
        % CREATE NEW EMPTY PARTIAL SECTION
        % CLEAR CHARACTER COUNT
        LDEE1 $C1;            % RE-ENABLE ALL P.E.S
        TXLTM(0) #REPET;      % CONTINUE UNTIL ALL CHARACTERS HAVE BEEN PROCESSED

**** SECTION JOINER - CHARACTER COUNT ROUTINE ****
        IXE =8192;            % SET I BIT IF X = 2**13 (FIRSTSTORE
        SETC(0) I;            % MARKER = 1, X = 0 OTHERWISE)
        IXE =0;
        SETC(1) I;
        COR(0) $C1;           % ACAR0 = NOBREAK
        IXG =4096;            % SET J BIT IF X > 2**12 (FIRSTSTORE MARKER = 1)
        SETC(1) J;            % ACAR1 = FIRSTSTORE
        ISE =0;               % SET I BIT IF THE P.E. CONTAINS
        SETC(2) I;            % NO UNSTORED PARTIAL SECTIONS
        COMPC(2);             % ACAR2 = REMAIN
        LDL(3) $C2;
        CSHR(3) 1;
        CAND(0) $C3;
        CAND(3) $C1;
        LDL(3) .LIM7;         % SET ACAR3INCR = +1, ACAR3LIM = 7
        RTAR 8;               % LEFT JUSTIFY PARTIAL SECTION TAIL
        STA SAVE;             % SAVE "ENDING" PARTIAL SECTION
        STS SSAVE;            % SAVE "ENDING" HEADER
        LDA SECTN;            % LOAD FIRST PARTIAL SECTION STORED
        LDEE1 $C1;            % ACTIVATE ONLY LINK P.E.S
        LDA $S;
        SHAL 53;              % SHIFTED COUNT IS DIVIDED BY 8
        SETE E.OR.-E;         % CORRECT STARTING SECTION HEADER IS
        SETE1 E.OR.-E;        % NOW IN LEFTMOST BYTE OF REGISTER A
        LDB $A;
        SHAL 2;
        SHAR 58;
        LDR $A;               % STARTING SECTION CHARACTER COUNT IS
                              %   NOW IN REGISTER R
        LDA $B;
        SHAR 62;
        SHAL 6;
        LDB $S;               % ENDING SECTION CHARACTER COUNT AND
                              % MARKERS ARE NOW IN REGISTER B
        LDS $A;               % STARTING SECTION MARKERS NOW IN RGS
        LDA $B;
        SHAR 3;
        RTL 63;               % ROUTE COUNT BYTES LEFT ONE P.E.
        ADMA $R;              % COMBINE COUNTS
MORE:   RTAR 8;
        LDB $A;
        % COMBINE MARKERS
        % ROUTE COUNT BYTES LEFT ONE P.E.
        % ADD IN NEXT COUNT BYTE
        % REPEAT SEVEN TIMES
        % 8 ELEMENT COUNTER STRING COMPLETE
        % RESET ACAR3INCR = +1, ACAR3LIM = 7
        % SAVE COUNTER STRING IN REGISTER S
        % ENDING SECTION MARKS IN REGISTER A
        % ROUTE MARKERS LEFT ONE P.E.
        % COMBINE MARKERS
        % REPEAT SEVEN TIMES
        % 8 ELEMENT MARKER STRING COMPLETE
        % FORM COMPOSITE MARKER / COUNTER STRING
        % SAVE COMPOSITE STRING
        % CHARACTER COUNT ROUTINE COMPLETE

**** SECTION JOINER - RECEIVER/LINK/GENERATOR ROUTINE ****
        % SET ACAR3INCR = +1, ACAR3LIM = 7
        % SET UP SECTION JOINER CYCLE COUNTER
        % SAVE LOCATION OF FIRST NEW ENTRY
        % RELOAD ENDING PARTIAL SECTION
        % RELOAD STARTING SECTION (IF ANY)
        % REMOVE OLD HEADER FROM STARTING SECTION
        % ACAR1 = LINK, THE ONLY CASE WHERE AN ENDING PART.
        %   SECTN. IS ROUTED LEFT
        % ENABLE ONLY GENERATOR P.E.S
        % CLEAR STARTING SECTIONS IN GENERATOR P.E.S
        % RE-ENABLE ALL P.E.S
        % SAVE SECTIONS TO BE ROUTED LEFT
        % RETRIEVE HEADER OF ENDING PARTIAL SECTION
        % MULTIPLY CHARACTER COUNTS BY EIGHT
        % LOCATE 8-CHARACTER PARTIAL SECTIONS
        % FORM -LINK
        % STORE ONLY NON-LINK, 8-CHARACTER
        % SECTIONS
        LDA $B;
        XT 1;
        SHAL 56;
        CPB(0) 63;            % PREVENT END-AROUND ROUTING
        STL(0) .RCVR;         % SAVE PATTERN OF RECEIVER P.E.S
        COMPC(2);             % FORM -GENERATOR

**** SECTION JOINER - FIRST CONCATENATION CYCLE ****
        LDEE1 $C0;            % CONSIDER ONLY ACTIVE RECEIVERS
        LDB $A;
        RTL 63;
        LDA $R;               % LOAD NEXT ELEMENT FROM THE RIGHT
        SHAR #0;
        OR $B;                % ADD NEXT ELEMENT TO PARTIAL TOKEN
        SHAR 8;               % MAKE SPACE FOR THE HEADER
        LDB $A;
        LDA HEADR;
        JR 60;
        LDA $B;
        SETC(1) J;            % CHECK TO SEE IF CURRENT PARTIAL
        CPXOR(3) $C1;         % TOKEN IS LARGE ENOUGH TO STORE
        CAND(3) $C0;
        LDEE1 $C3;
        LDL(3) $C1;
        STA *SECTN;           % ADD PARTIAL TOKEN TO OUTPUT
        XT 1;
        CLRA;
        LDB $A;
        LDA $R;
        SHABR #8;             % PREPARE LEFT PART OF NEXT PARTIAL
                              %   TOKEN IN REGISTER B
        CSHL(2) 1;
        CAND(0) $C2;          % UPDATE ACTIVE RECEIVER INDICATORS
        LDA $B;
        LDEE1 $C0;
        LDB $A;
        LDA HEADR;            % RELOAD CHAR. COUNT / MARKER STRING
        SHAL 3;               % MULTIPLY COUNT BY EIGHT
        LDS $A;
        ZFRT(0) #FINAL;       % END FIRST CONCATENATION CYCLE

**** SECTION JOINER - EXTRA CONCATENATION CYCLES ****
CYCLE:  LDEE1 $C0;            % CONSIDER ONLY ACTIVE RECEIVER P.E.S
        RTL 63;
        LDA $R;               % LOAD NEXT ELEMENT FROM THE RIGHT
        SHAR #8;
        OR $B;                % ADD NEW ELEMENT TO PARTIAL TOKEN
        LDB $A;
        LDA $S;
        JR 49;                % SEE IF NEXT CHARACTER COUNT BYTE HAS
                              % ITS "8"-BIT = 1
        LDA $B;
        SETC(1) J;            % CHECK TO SEE IF CURRENT PARTIAL
        CPXOR(3) $C1;         % TOKEN IS LARGE ENOUGH TO STORE
        CAND(3) $C0;
        LDEE1 $C3;
        LDL(3) $C1;
        % ADD PARTIAL TOKEN TO OUTPUT
        % PREPARE LEFT PART OF NEXT PARTIAL TOKEN IN REGISTER B
        % UPDATE ACTIVE RECEIVER INDICATORS
        % SHIFT TO NEXT CHARACTER COUNT
        % MULTIPLY COUNT BY EIGHT
        % LOAD CYCLE COUNTER
        % SKIP IF MORE CYCLES ARE POSSIBLE
        % ERROR HALT - TOO MANY CYCLES
        % SAVE CYCLE COUNTER
        % CONTINUE ONLY IF ACTIVE RECEIVERS STILL REMAIN

**** SECTION JOINER - FINAL PORTION ****
        % RELOAD ORIGINAL RECEIVER PATTERN
        % USE PATTERN TO ENABLE P.E.S
        % MAKE SPACE FOR HEADER
        % RELOAD -LINK
        % CONSIDER ONLY (REMAIN), (-LINK)
        % ADD LAST ELEMENTS TO OUTPUT
        % RE-ENABLE ALL P.E.S
        % STORE ZEROS TO MARK END OF OUTPUT
        % SAVE BITS THAT INDICATE P.E.S WITH AT LEAST ONE OUTPUT SECTION
        % RELOAD POINTER TO FIRST LOCATION USED BY SECTION JOINER FOR OUTPUT
        % RELOAD FIRST SECTION JOINER OUTPUT
        % OBTAIN HEADER
        % ATTACH HEADER TO FINAL TOKENS
        % RETURN FIRST ENTRY WORD TO OUTPUT
        % RE-ENABLE ALL P.E.S
        % END OF SECTION JOINER PROCESS

**** RESULT STRING GENERATOR ****
        % FIND P.E.S WITH TOKENS STORED
        ZFRF(2) 1;
        JUMP CEASE;           % JUMP IF ALL TOKENS PROCESSED
        LEADO(2);             % FIND NEXT P.E. WITH A STORED TOKEN
        CPB(1) $C2;           % CLEAR MARKER FOR P.E. FOUND
        CTSBF(0) $C2,1;       % SKIP IF P.E. HAD NO GENERATOR SECT.
        SLIT(2) =256;         % INDEX PAST FIRST ENTRY OF GEN. P.E.
        ALIT(2) =QBASE;       % ADD 256xQBASE TO INDEX
NEXWD:  LOAD(2) .QUERY;
                              % LOAD FIRST WORD OF TOKEN
        LDL(3) .QUERY;
        ZFRT(3) #NEXPE;       % SKIP IF NO TOKENS LEFT IN THIS P.E.
        ALIT(2) -256;
        LDL(0) $C3;
        CSHL(3) 2;
        CSHR(3) 58;           % EXTRACT CHARACTER COUNT OF TOKEN
        ZFRT(3) #EMIT;        % SKIP IF TOKEN IS A DELIMITER
                              % FORM WORD COUNT FOR TOKEN
        STL(3) .WRDCT;        % SAVE WORD COUNT
        ZFRT(3) #ONEWD;       % SKIP IF TOKEN IS A SINGLE WORD
        LDL(3) .ALIM;         % SET ACAR3INCR = +1, ACAR3INDX = 1
ANTHR:  LOAD(2) .QUERY(3);    % GET ADDITIONAL TOKEN WORDS
        ALIT(2) =256;
        TXLTM(3) #ANTHR;
TABLE:  LDL(0) .QUERY;        % CHECK FOR "PROCEDURE" RESERVED WORD
        CPXOR(0) .PROCD;      % SKIP IF TOKEN IS NOT A 9-CHARACTER
        ZFRF(0) #ENTER;       % IDENT. STARTING WITH "9PROCEDU"
        LDL(0) .QUERP;        % LOAD SECOND WORD OF QUERY TOKEN
        CPXOR(0) .RE;         % CHECK SECOND WORD OF QUERY TOKEN
        ZFRF(0) #ENTER;       % SKIP IF TOKEN IS NOT "9PROCEDURE"
        LIT(0) =128;          % FORM DUMMY DELIMITER FOR PROCEDURE
        SKIP #EMIT;           % OUTPUT DUMMY DELIMITER
ONEWD:  CTSBT(0) 1,#ENTER;    % SKIP IF TOKEN IS A STRING TOKEN
        CTSBF(0) #ENTER;      % SKIP IF TOKEN IS A NUMBER
        LDA $C0;
        JLE RESRV;            % COMPARE WITH RESERVED WORD TABLE
        ILE RESRP;            % CHECK SECOND RESERVED WORD SLICE
        SETC(1) J;
        ZFRT(1) #SECND;       % SKIP IF NO MATCH IN FIRST SLICE
        LEADO(1);             % FIND NUMBER OF MATCHING RES. WORD
        ALIT(1) =64;          % FORM DUMMY DELIMITER TO
        SKIP #EMIT;           % REPLACE RESERVED WORD
SECND:  SETC(1) I;
        ZFRT(1) #ENTER;       % SKIP IF TOKEN NOT A RESERVED WORD
        LEADO(1);             % FIND NUMBER OF MATCHING RES. WORD
        ALIT(1) -128;         % FORM DUMMY DELIMITER TO
        SKIP #EMIT;           % REPLACE RESERVED WORD
ENTER:  STL(1) .ACTIV;        % SAVE ACARS
        STL(2) .POYNT;
        ZFRT(1) 1;            % SKIP IF THIS IS THE LAST P.E. WITH A TOKEN
        JUMP SERCH;           % JUMP TO TABLE MAINTENANCE ROUTINE
        LOAD(2) $C3;
        ZFRT(3) 1;            % SKIP IF NO MORE TOKENS IN THIS P.E.
        JUMP SERCH;           % JUMP TO TABLE MAINTENANCE ROUTINE
        JUMP CEASE;           % JUMP IF ALL TOKENS STORED BUT ONE
        LDL(0) .QUERY;        % RELOAD FIRST WORD OF TOKEN
        CAND(0) .HDMSK;       % EXTRACT TOKEN HEADER
        COR(0) $C1;           % ATTACH HEADER TO RESULT
        LDL(1) .ACTIV;        % RELOAD ACARS
        LDL(2) .POYNT;
        LDL(3) .OUTPT;        % GET RESULT STRING OUTPUT POINTER
        STORE(3) $C0;         % PUT RESULT IN RESULT STRING
        ALIT(3) 1;            % INCREMENT OUTPUT POINTER
        STL(3) .OUTPT;        % SAVE OUTPUT POINTER
        SKIP #NEXWD;          % CONTINUE TO NEXT TOKEN

**** TABLE MAINTENANCE - SEARCH PROCEDURE ****
        % LOAD FIRST WORD OF QUERY TOKEN
        % LOAD HASH CODING CONSTANT IN RGB
        % LOAD FIRST WORD OF QUERY IN RGA
        % SHIFT QUERY WORD TO THE RIGHT OUT OF THE EXPONENT FIELD
        % PERFORM HASH CODING MULTIPLICATION
        % RETURN MOST SIGNIFICANT HALF OF HASH CODING RESULT TO ACAR1
        % CLEAR ALL BITS NOT IN 4-BIT KEY
        % ACAR2LIM = (NUMBER OF QUERY WORDS - 1)
        % TEST BLOCK FOR MATCHING FIRST WORDS
        % SKIP IF SOME FIRST WORD MATCHES
        % NO MATCH - LOOK FOR EXTENSION BLKS
        % SET WORD COUNTER TO FIRST WORD
        % SKIP IF NO MORE EXTENSION BLOCKS
        % USE EXT. BLK. POINTER VICE HASH KEY
        % RELOAD FIRST WORD OF QUERY TOKEN
        % INTO REGISTER A
        % TEST FIRST WORDS IN NEW BLOCK
        % SKIP IF QUERY IS A SINGLE WORD
        % SKIP IF QUERY IS TWO WORDS LONG
        % LOAD POINTER TO CONTINUATION WORDS
        % GET NEXT QUERY WORD
        % TEST FOR MATCH
        % SEE IF ALL WORDS MATCH SO FAR
        % SKIP IF NO ENTRIES STILL MATCH
        % SKIP IF ALL WORDS OF QUERY USED
        % TEST NEXT WORD
        % GET SECOND QUERY WORD
        % TEST FOR MATCH
        % SEE IF BOTH QUERY WORDS ARE MATCHED
        % SKIP IF BOTH WORDS ARE NOT MATCHED
        % FIND NUMBER OF P.E. WITH MATCH
        % CONVERT P.E. ADDRESS TO QUAD ADDR.
        % FORM COMPLETE RELATIVE QUAD ADDRESS
        % ADDRESS OF MATCHING ENTRY IN ACAR0
        % ERROR HALT FOR TABLE OVERFLOW

**** TABLE MAINTENANCE - NEW ENTRY PROCEDURE ****
        % SKIP IF SPACE IS ALREADY AVAILABLE
        % SKIP IF NO SPACE CAN BE FOUND
        % UPDATE LAST EXTENSION BLOCK ADDRESS
        % FIND NUMBER OF P.E. WITH MATCH
        % ENABLE SELECTED P.E. TO STORE ENTRY
        % LOAD FIRST WORD OF QUERY
        % STORE FIRST WORD
        % SET ACAR2INDX
        % SKIP IF ENTRY HAS MORE THAN 2 WORDS
        % SKIP IF ENTRY IS A SINGLE WORD
        % LOAD SECOND WORD OF QUERY
        % STORE SECOND WORD OF ENTRY
        % BOTH QUERY WORDS ARE NOW STORED
        % RESET ACAR2INDX = 1
        % RGS CONTAINS LASTC, THE ADDRESS OF THE LAST CONTINUATION WORD USED
        % INCREMENT CONTINUATION WORD ADDRESS
        % STORE POINTER TO LATER ENTRY WORDS
        % GET NEXT QUERY WORD TO BE STORED
        % STORE QUERY IN CONTINUATION WORD
        % SKIP IF MORE WORDS TO STORE
        % GET OLD VALUE OF LASTC
        % UPDATE LASTC
        % SAVE LAST CONTINUATION WORD ADDRESS
        % RE-ENABLE ALL P.E.S
        % TABLE MAINTENANCE COMPLETE

**** END OF RECOGNIZER ****
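The search procedure outlined by the comments in the table maintenance routines works as follows. The first query word is multiplied by a hash coding constant, the most significant half of the product is kept and masked to a 4-bit key, and the selected block is tested for matching first words. Continuation words are then compared, and extension-block links are followed when the home block has no match. The Python sketch below reproduces that flow serially. It is an illustration under stated assumptions, not the author's routine: the hash constant is hypothetical (the value in the listing is not legible), a block is assumed to hold 64 entries (one per P.E.), the shift out of the exponent field is omitted, and the names `hash_key`, `Block`, and `TokenTable` are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

WORD_BITS = 64
KEY_BITS = 4                        # 4-bit key, as in the listing -> 16 home blocks
BLOCK_WIDTH = 64                    # assumption: one block = one word in each of 64 P.E.s
HASH_CONSTANT = 0x9E3779B97F4A7C15  # hypothetical; the report's constant is not legible

Token = Tuple[int, ...]             # a token as a tuple of 64-bit words


def hash_key(first_word: int) -> int:
    """Multiply, keep the most significant half of the 128-bit product, then 4 bits of it."""
    product = (first_word & ((1 << WORD_BITS) - 1)) * HASH_CONSTANT
    high_half = product >> WORD_BITS
    return high_half & ((1 << KEY_BITS) - 1)


@dataclass
class Block:
    entries: List[Token] = field(default_factory=list)
    extension: Optional["Block"] = None   # link to an overflow ("extension") block


class TokenTable:
    def __init__(self) -> None:
        self.home = [Block() for _ in range(1 << KEY_BITS)]

    def lookup(self, words) -> Optional[Tuple[int, int, int]]:
        """Return (key, extension depth, slot) of the matching entry, or None."""
        token = tuple(words)
        key = hash_key(token[0])
        block, depth = self.home[key], 0
        while block is not None:
            # Step 1: every slot compares its first word with the query
            # (simultaneous across the P.E.s on ILLIAC IV; serial here).
            candidates = [i for i, e in enumerate(block.entries) if e[0] == token[0]]
            # Step 2: compare the remaining (continuation) words of the candidates.
            for i in candidates:
                if block.entries[i] == token:
                    return key, depth, i
            block, depth = block.extension, depth + 1
        return None

    def insert(self, words) -> Tuple[int, int, int]:
        """Return the position of an existing entry, or add the token and return its position."""
        token = tuple(words)
        found = self.lookup(token)
        if found is not None:
            return found
        key = hash_key(token[0])
        block, depth = self.home[key], 0
        while len(block.entries) >= BLOCK_WIDTH:   # home block full: use extension blocks
            if block.extension is None:
                block.extension = Block()
            block, depth = block.extension, depth + 1
        block.entries.append(token)
        return key, depth, len(block.entries) - 1


if __name__ == "__main__":
    table = TokenTable()
    pos = table.insert([0x1234, 0x5678])
    print(pos, table.lookup([0x1234, 0x5678]), table.lookup([0x1234]))
```

Only the first-word comparison over a block corresponds to work the 64 P.E.s would do at once; the hashing and the walk along extension links stay sequential, which fits the report's observation that table maintenance gains little from the parallel hardware.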