THE AUTOMATIC GENERATION OF FLOYD PRODUCTION SYNTACTIC ANALYZERS*

by

Alan J. Beals
Jacques E. LaFrance
Robert S. Northcote

September 9, 1969

DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
URBANA, ILLINOIS 61801

* This work was supported in part by the Department of Computer Science, University of Illinois, Urbana, Illinois, and in part by the Advanced Research Projects Agency as administered by the Rome Air Development Center under Contract No. USAF 30(602)-4144.

ABSTRACT

This paper describes an algorithm for the conversion of a grammar in the form of a set of BNF productions into a deterministic parsing algorithm as described by a set of modified Floyd productions. It describes the implementation of a recognizer based on Floyd productions, including optimization of the recognizer and syntactic error recovery. A complete example is given in an appendix and illustrations from it are used in the text.

Computing Reviews Category Numbers: 4.1 and 4.2

1. Introduction

Floyd Production Language (FPL), developed by Floyd [1] and later modified by Evans [2] and Feldman [3], has often been used to specify the syntax of a language for writing compilers. An FPL syntax specification of a language ℒ is a direct specification of a (usually) very fast deterministic parsing algorithm for ℒ. The specification is written as a set of labeled groups of FPL statements (often called "productions", although "reductions" would be more appropriate). Many minor variations on the form of the statements and on the mnemonic meanings of the labels have been used.
The form used in this paper is influenced by the fact that the statements, and the labels to them, are generated automatically. The notation used will be described as it is introduced. An FPL statement will have the form:

    L1:   α | β   →N   ←N | *   L2

where
(a) L1: (optional) labels the group L1 of 1 or more FPL statements;
(b) α is a string of n symbols which is to be compared with the top n symbols of the recognition stack;
(c) β (optional) is a string of m symbols to be compared with the next m symbols on the input string;
(d) →N (optional) means reduce the string α in the recognition stack to the single nonterminal symbol N;
(e) ←N (optional) means place the marker symbol N on the marker stack to indicate that the nonterminal symbol N is being sought;
(f) * (optional) is a call on the lexical analyzer (scanner) to place the next symbol from the input string into the recognition stack;
(g) L2 is the label of the next group of statements to be executed;
(h) if the comparisons in (b) and (c) are unsuccessful, the next statement in sequence (in the same group) is executed; if there are no statements left in the group, a syntactic error has occurred.

An example of a 36 statement FPL syntax specification of a simple language is given in Appendix B.

Most FPL specifications have been hand coded. While this is not a particularly difficult task, the syntax definition so obtained is of little benefit to programmers who wish to program in ℒ. They would prefer to work from a BNF specification of ℒ. Earley [4] and DeRemer [5] have proposed schemes to convert BNF productions into FPL statements. The algorithm described in this paper is an extension and implementation of DeRemer's algorithm. The use of special markers (bar symbols) in an auxiliary stack facilitates the automatic generation of syntactic error recovery, as well as minimizing the number of FPL statements which must be matched.
Also, reductions to nonterminal symbols are allowed only in the top of the recognition stack, thereby further reducing the number of stack comparisons required.

The algorithm described below enables very fast and efficient recognizers for many BNF grammars to be generated automatically. It has been used successfully to generate parsers for TRANQUIL [6], a language for specifying array-type algorithms, as well as several other languages being developed on the ILLIAC IV project.

2. The Basic Algorithm

Consider a language ℒ to be defined by a grammar G = (V_T, V_N, S, P), where
V_T = the set of terminal symbols of ℒ (represented by lower case Latin letters and the system supplied end marker ⊥);
V_N = a set of nonterminal symbols (represented by upper case Latin letters);
S ∈ V_N is the objective symbol;
P is a numbered set of BNF production rules defining ℒ, with Σ ::= ⊥S⊥.

Appendix B is a sample BNF grammar with both intermediate and final results of the application of the algorithm. The examples used in this and the following chapters are chosen from this appendix. Some of the examples used will differ slightly from the appendix but in each case the difference will be explained as further details of the algorithm are described.

Define X ∈ V_T ∪ V_N to be a head symbol of N ∈ V_N if there exist BNF productions

    N_1 ::= N_2 ...,  N_2 ::= N_3 ...,  ...,  N_n ::= X ...

with N_1 = N, for n ≥ 1.

The following formal rules for converting from BNF productions to FPL statements, for determining the formal labels of groups of FPL statements, and for determining the statements which should constitute each group were developed by DeRemer [5].

If α represents the string made up of the first n symbols on the right of BNF production π, then the following BNF to FPL mapping rules apply:

    BNF Production          FPL Statement
    (a) M ::= αN...         α| *Nh
    (b) M ::= αt...         α| *t(π, n+1)
    (c) M ::= α             α| →M| Mt

In these FPL statements the string α is to be compared with the top n symbols in the recognition stack; * denotes scan another symbol from the source program into the stack; →M| means replace the string α in the stack by the nonterminal symbol M. The symbols on the right are labels of the next group of FPL statements to be executed after complete execution of those productions: Nh identifies a group of statements which attempts to locate an initial (head) symbol of N in the stack; t(π, n+1) labels the statement group which attempts to find a terminal t in the top of the stack corresponding to the t in position n+1 of BNF production number π; Mt labels the group which attempts to match constructs with an M at the top of the stack.

For example, BNF production 1

    Σ ::= ⊥S⊥

converts to FPL productions 1, 14 and 15:

    ⊥|         *Nh-S
    ⊥S|        *t(1, 3)
    ⊥S⊥| →Σ|   Nt-Σ

for n = 1, 2 and 3, respectively.

Define X_n(π) to be the n-th symbol on the right side of BNF production π. Then the following rules determine which labels (groups of FPL statements) must exist.
(a) For each N ∈ V_N: label Nh exists if ∃ π, n: N = X_n(π), n > 1; e.g., for S but not for D in the example.
(b) For each N ∈ V_N: label Nt exists if ∃ π, n: N = X_n(π), n > 0; i.e., for all elements of V_N except Σ in the example.
(c) For each t ∈ V_T: label t(π, n) exists for each π, n: t = X_n(π), n > 1; e.g., for x but not d in BNF production 7, C ::= dEx.

The next step is to determine the set of FPL descriptors for each FPL group which must exist, where each descriptor will correspond to (normally) an FPL production. The three rules are:
(a) D_Nh = {(π, 1) | X_1(π) ∈ V_T and the left side of BNF production π is a head symbol of N}; e.g., D_Nh-E = {(11, 1), (12, 1), (13, 1)}.
(b) D_Nt = {(π, n) | N = X_n(π), n > 0, all π}; e.g., D_Nt-C = {(3, 2), (6, 1)}.
(c) D_t(π, n) = {(π, n)}; e.g., D_t(1, 3) = {(1, 3)}.

By determining the group labels, then the descriptors, and applying the mapping rules to the set of BNF productions, it is always possible to obtain an equivalent set of FPL statements which may be used as a syntax recognizer for the language specified by the BNF grammar. Unfortunately, this recognizer will usually be nondeterministic. To make it deterministic, FPL statements within a group may have to be reordered, and, when two statements mutually preclude each other (the placement of either before the other will always preclude execution of the second one), expansion of syntactic context in one or both of them may be necessary. In theory this can always be done, but in practice it is necessary to obtain determinism in a way which minimizes the expansion of context.

3. Rules to Make the Algorithm Deterministic

A simple example of mutual preclusion is obtained when the BNF grammar has productions of the form:

    1: N ::= ab...
    2: N ::= ac...

which lead to the creation in the Nh group of the FPL statements:

    a| *t(1, 2)
    a| *t(2, 2)

i.e., statements which have identical stack comparison strings, e.g.,

    Nh-E:   f| →E|   Nt-E
            f|       *t(12, 2)
            f|       *t(13, 2)

A first attempt at resolution of this problem is to combine the destination group labels of the offending statements into a single combined group label, combine the groups into a single group with that label, and replace the FPL statements by a single statement. No combination of destination groups is possible when one of the precluding statements involved is of the type

    α| →N| Nt

i.e., reductions are not delayed. In this case an expansion of context will be necessary.

The following revisions are made to the BNF grammar prior to conversion to FPL statements:

(a) Two BNF productions of the form

    A ::= αB...   and   C ::= αt...

are changed to

    A ::= αB...
    C ::= αD...
    D ::= t

(b) Two BNF productions of the form

    A ::= αN...   and   B ::= αM...,

where M is a head symbol of N, are changed to

    A ::= αN...
    B ::= αD...
    D ::= M

e.g., BNF production 3

    S ::= aCqr

becomes BNF productions 3 and 18

    S ::= aZqr
    Z ::= C

since the C of production 3 is a head symbol of the B of production 2.

When the above revisions have been made to the BNF grammar, and neither of two precluding FPL statements involves a reduction, then the statements will be of the form:

    α| *Nh        and   α| *Mh
or
    α| *t(π1, m)  and   α| *t(π2, m)

where m = n + 1. These can be combined into

    α| *ch(p)   or   α| *ct(p)

where the ch(p) and ct(p) groups of statements are the union of the Nh and Mh groups, and the t(π1, m) and t(π2, m) groups, respectively, and p is described below; e.g., FPL production 2

    Nh-S:   a| *ch(1)

is a result of combining

    a| *Nh-B
    a| *Nh-Z

which arise from the descriptors (2, 1) and (3, 1) for Nh-S.

When a production of the form

    α| *Nh

corresponding to the descriptor (π, n) is executed, a special marker (bar symbol) denoted by N(π, m) is pushed into a separate bar symbol stack. This is represented in the appropriate FPL statement by:

    α| ←N(π, m)| *Nh

If the statements arising from descriptors (π1, n) and (π2, n) are identical then a special bar symbol N*(p) is used in a single combined statement, where p is a pointer to the bar symbols N(π1, m) and N(π2, m) (there may be more than two). A combined statement involving several different nonterminal symbols is represented by α| ←(p)| *ch(p), where p is a pointer to a list of the bar and special bar symbols involved, e.g., FPL productions 1, 2 and 26:

    ⊥| ←S(1, 2)|  *Nh-S
    a| ←(1)|      *ch(1)
    d| ←E*(2)|    *Nh-E

The FPL statements in each Nt group are classified as type a or type b, according as their descriptors (π, n) have n = 1 or n > 1, respectively. Only type a statements are relevant in the syntax analysis if the symbol at the top of the bar symbol stack is not N, or a pointer (p) to a symbol list containing N, because a terminal head symbol is always sought first to begin the construction of a nonterminal and a bar symbol is not, therefore, pushed onto the bar stack for any nonterminal head symbol, i.e., the latter is never explicitly looked for. Thus the type a FPL statements form a subgroup, labeled Nta. The Nta subgroup is void if N is never a head symbol of a BNF definition of a nonterminal other than itself; e.g.,

    Nta-C:   C|j        *t(6, 2)
             C| →Z|     Nt-Z

If, after a reduction to N, the top symbol in the bar stack is N(π, n) or a pointer (p) to a symbol list containing N(π, n), then either a left recursive production (to continue reduction to N) or the FPL statement with descriptor (π, n) applies next. Thus an Ntb(π, n) subgroup is generated which includes the FPL statements arising from the recursive BNF production defining N, followed by a statement to remove the top symbol from the bar stack, followed by the statement with descriptor (π, n). If the top bar symbol is N*(p), or if this symbol is included in a combined list at the top of the stack, then a combined subgroup cNtb(p) is generated in like manner but with the several statements determined by the descriptors in the list pointed to by (p) being placed after the bar removal statement; e.g.,

    Ntb(9, 2):   F|         *t(14, 2)
                 pop bar stack
                 iF| →D|    Nt-D

The combination rules described above may now be applied in a Nta subgroup, in the recursive part of a Ntb or cNtb subgroup, and in the nonrecursive part of a cNtb subgroup. No combination is allowed between recursive and nonrecursive statements in any subgroup.
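The descriptor rules of Section 2 and the type a / type b split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grammar is the revised Appendix B grammar with one character per symbol, the function names (head_nonterminals, d_nh, d_nt, split_nt) are hypothetical, and N is treated as one of its own head symbols, which appears to be needed to reproduce the D_Nh-E example.

```python
# Revised Appendix B grammar: production number -> (left side, right side).
# Every symbol is one character; "⊥" is the end marker, "Σ" the goal symbol.
PRODS = {
    1: ("Σ", "⊥S⊥"), 2: ("S", "aB"), 3: ("S", "aZqr"),
    4: ("B", "Db"), 5: ("B", "Dc"), 6: ("B", "Cj"),
    7: ("C", "dEx"), 8: ("C", "dEy"), 9: ("D", "iF"), 10: ("D", "i"),
    11: ("E", "f"), 12: ("E", "fg"), 13: ("E", "fh"),
    14: ("F", "Fn"), 15: ("F", "k"), 16: ("F", "l"), 17: ("F", "m"),
    18: ("Z", "C"),
}
NONTERMINALS = {lhs for lhs, _ in PRODS.values()}


def head_nonterminals(n):
    """N plus every nonterminal head symbol of N (assumption: N itself counts)."""
    found, work = {n}, [n]
    while work:
        current = work.pop()
        for lhs, rhs in PRODS.values():
            first = rhs[0]
            if lhs == current and first in NONTERMINALS and first not in found:
                found.add(first)
                work.append(first)
    return found


def d_nh(n):
    """Descriptors (π, 1) whose production starts with a terminal head symbol of N."""
    heads = head_nonterminals(n)
    return sorted((p, 1) for p, (lhs, rhs) in PRODS.items()
                  if lhs in heads and rhs[0] not in NONTERMINALS)


def d_nt(n):
    """Descriptors (π, k) for every occurrence of N at position k of production π."""
    return sorted((p, i + 1) for p, (_, rhs) in PRODS.items()
                  for i, x in enumerate(rhs) if x == n)


def split_nt(n):
    """Split D_Nt into type a (position 1) and type b (position > 1) descriptors."""
    descriptors = d_nt(n)
    type_a = [d for d in descriptors if d[1] == 1]
    type_b = [d for d in descriptors if d[1] > 1]
    return type_a, type_b
```

For the revised grammar, d_nh("B") yields the four descriptors that Appendix B lists for the combined group ch(1), and split_nt("F") separates the left recursive occurrence (14, 1) from the occurrence (9, 2) that gives rise to Ntb(9, 2).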
With this subgrouping, transfer to the group label Nt becomes a dynamic transfer, DNt, to Nta, a Ntb(π, n) or a cNtb(p) subgroup depending on the symbol currently at the top of the bar stack; e.g., FPL production 21

    iF| →D| DNt-D

If preclusions still exist after the subgrouping and combining described above then contextual expansion is required. A great deal of information concerning the symbols preceding the string α in the stack comparison part of an FPL statement is implicit in the grouping of the statement. Therefore, only right context expansion (lookahead) is employed. This is done by generating, for each statement involved in the preclusion, all possible strings (up to length k) of terminal symbols which may follow the α in the stack comparison part. If all such strings are different for the involved statements the preclusion has been eliminated. Experience has shown that a lookahead of k = 1 symbol is usually sufficient to eliminate preclusions and no practical examples have been found where a finite lookahead of more than k = 3 symbols is necessary to resolve a preclusion. Thus, when contextual analysis is necessary, a one symbol lookahead is generated. If this fails to differentiate, a three symbol lookahead is generated. If this also fails to resolve preclusions, the attempt to obtain a deterministic FPL recognizer from the given BNF grammar is terminated. The lookahead for a particular FPL production need only be enough to differentiate it from any following productions which it precludes. Thus the last of a set of precluding productions needs no lookahead; e.g.,

    Nh-E:   f| x →E|   DNt-E
            f|         *ct(4)

4. Optimization of Interpretive Instructions

The construction of a syntactic analyzer out of the FPL statements involves the construction of a string of operators and operands which is either interpretively executed or converted to an ALGOL program to be compiled and then executed directly.
The operators fall into seven classes: pointer initializing, recognition stack tests, lookahead queue tests, recognition stack manipulation, transfer of control, semantic routine calls, and error recovery, a complete list of which is given in Appendix A.

Recognition Stack Tests

In the Nta, Ntb(π, n) and cNtb(p) type subgroups no recognition stack tests are necessary. In the Nta type subgroups, such as Floyd production number 12:

    Nta-D:   D| * ct(3)

the string α consists of the single symbol D which is put there immediately before transfer to this group either by production number 21:

    iF| →D| DNt-D

or by production number 28:

    i| →D| DNt-D

In the Ntb(π, n) and cNtb(p) type groups the string α is of the form βN, where β is of length n − 1, n > 1, for example, Floyd production number 21 above. In this case, the presence of the string β = "i" is verified just before the symbol N(π, n) (= "F(9, 2)") is pushed into the bar stack, in production number 27:

    i| k,l,m ←F(9, 2)| * Nh-F

and a transfer is made to the Nh-F group to begin seeking the constituents of F. When F has been found, the recognition stack contains iF, the top symbol in the bar stack is then F(9, 2), and control is transferred to the type b subgroup, Ntb(9, 2).

In the Nh and ch(p) type groups, a terminal head symbol of a nonterminal is sought. Hence, the α's all consist of single terminal symbols, as in

    Nh-F:   k| →F|   DNt-F
            l| →F|   DNt-F
            m| →F|   DNt-F

In the ct(p) type groups the α in the stack is of the form βt where the β was recognized in the previous production and the t is the symbol scanned just before the transfer to this group. For example,

    ct(5):   dEx| →C|   DNt-C
             dEy| →C|   DNt-C

where the "dE" has been recognized in production 30:

    dE| * ct(5)

Hence, in all three of these type groups it is sufficient to test only the top (terminal) symbol of the recognition stack.
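To make the control flow of groups, bar symbols and dynamic transfers concrete, the 36 FPL statements of Appendix B can be run by a small table-driven recognizer. The following Python sketch is an illustration only, not the interpreter of the paper: the tuple encoding (Stmt, st), the NTB and NTA tables, and the function recognize are hypothetical names, labels follow the descriptor table of Appendix B (t(14,2) and ct(5)), the contents of bar list (1) are inferred as B(2,2) and Z(3,2), and error recovery is left out, so any failure simply rejects the input.

```python
from collections import namedtuple

END = "⊥"   # end marker; "Σ" is the goal symbol, every symbol is one character
# One FPL statement: stack string, lookahead set, reduction, bar symbol, scan flag, next label.
Stmt = namedtuple("Stmt", "alpha la red bar scan goto")
POP = "pop bar stack"


def st(alpha, goto, la=None, red=None, bar=None, scan=False):
    return Stmt(alpha, la, red, bar, scan, goto)


GROUPS = {   # "DNt" as next label means a dynamic transfer on the reduced symbol
    "Nh-Σ": [st("⊥", "Nh-S", bar="S(1,2)", scan=True)],
    "Nh-S": [st("a", "ch(1)", bar="(1)", scan=True)],
    "Nh-E": [st("f", "DNt", la="xy", red="E"), st("f", "ct(4)", scan=True)],
    "Nh-F": [st(c, "DNt", red="F") for c in "klm"],
    "Nta-C": [st("C", "t(6,2)", la="j", scan=True), st("C", "DNt", red="Z")],
    "t(6,2)": [st("Cj", "DNt", red="B")],
    "Nta-D": [st("D", "ct(3)", scan=True)],
    "Ntb(1,2)": [POP, st("⊥S", "t(1,3)", scan=True)],
    "t(1,3)": [st("⊥S⊥", "DNt", red="Σ")],
    "Ntb(2,2)": [POP, st("aB", "DNt", red="S")],
    "Ntb(9,2)": [st("F", "t(14,2)", la="n", scan=True), POP, st("iF", "DNt", red="D")],
    "t(14,2)": [st("Fn", "DNt", red="F")],
    "Ntb(3,2)": [POP, st("aZ", "t(3,3)", scan=True)],
    "t(3,3)": [st("aZq", "t(3,4)", scan=True)],
    "t(3,4)": [st("aZqr", "DNt", red="S")],
    "ch(1)": [st("d", "Nh-E", bar="E*(2)", scan=True),
              st("i", "Nh-F", la="klm", bar="F(9,2)", scan=True),
              st("i", "DNt", red="D")],
    "cNtb(2)": [POP, st("dE", "ct(5)", scan=True)],
    "ct(3)": [st("Db", "DNt", red="B"), st("Dc", "DNt", red="B")],
    "ct(4)": [st("fg", "DNt", red="E"), st("fh", "DNt", red="E")],
    "ct(5)": [st("dEx", "DNt", red="C"), st("dEy", "DNt", red="C")],
}
# Dynamic transfer: (top bar symbol, reduced nonterminal) -> type b subgroup.
NTB = {("S(1,2)", "S"): "Ntb(1,2)", ("(1)", "B"): "Ntb(2,2)", ("(1)", "Z"): "Ntb(3,2)",
       ("F(9,2)", "F"): "Ntb(9,2)", ("E*(2)", "E"): "cNtb(2)"}
NTA = {"C": "Nta-C", "D": "Nta-D", "Σ": "accept"}   # type a subgroups (Nta-Σ is the exit)


def recognize(sentence):
    """Return True if sentence (without end markers) is accepted; no error recovery."""
    inp = list(sentence) + [END]
    stack, bars, label = [END], [], "Nh-Σ"
    while label != "accept":
        for s in GROUPS.get(label, []):
            if s == POP:                      # unconditional, then fall through
                bars.pop()
                continue
            if not "".join(stack).endswith(s.alpha):
                continue
            if s.la is not None and not (inp and inp[0] in s.la):
                continue
            if s.red:                         # reduce alpha to a single nonterminal
                del stack[-len(s.alpha):]
                stack.append(s.red)
            if s.bar:
                bars.append(s.bar)
            if s.scan:
                if not inp:
                    return False
                stack.append(inp.pop(0))
            label = s.goto
            if label == "DNt":
                top = bars[-1] if bars else None
                label = NTB.get((top, s.red)) or NTA.get(s.red)
            break
        else:
            return False                      # no statement of the group matched
    return not inp
```

For instance, recognize("aiknb") follows the path Nh-Σ, Nh-S, ch(1), Nh-F, Ntb(9,2), t(14,2), Ntb(9,2), Nta-D, ct(3), Ntb(2,2), Ntb(1,2), t(1,3) and then the exit, while recognize("aibb") fails in t(1,3) because the end marker is not on top of the stack.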
The statements in each of these groups are ordered before they are processed so that all those with identical stack comparisons (differentiated by lookahead) are together. The symbol at the top of the recognition stack is tested on the first statement with an instruction which, upon failure, transfers to the beginning of the next statement with a different symbol at the top of the stack. No stack test is made on the statements in between. An example is group

    ch(1):   d| ←E*(2)|          * Nh-E
             i| k,l,m ←F(9, 2)|  * Nh-F
             i| →D|              DNt-D

Here the test for "i" in the third production is not made. Failure in the stack test in the second production causes a transfer over the third production to the end of the group; failure in the lookahead test in the second production causes a transfer to the next (third) production.

In the event that several statements have different stack comparisons but all take the same actions (same semantic routine calls and same recognition stack reduction), a mode pattern is built with a bit on for the stack symbol of each statement. One instruction is produced for the stack tests of all of these statements. It simply checks to see if, in the appropriate row, the bit corresponding to the top stack symbol is on. The Nh-F group mentioned above is an example of this.

Since the t(π, m) groups are identical in basic form to the ct(p) groups they are handled in the same way as the preceding except that there is always only one statement in each group and only one transfer to each group (which yields a further optimization to be discussed later).
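The mode pattern test described above amounts to a bit set membership check. The Python sketch below illustrates the idea with the four pattern rows listed in Appendix B; the bit assignment, the names BIT, make_row and PATTERN, and the function xsbb (named after the XSBB operator, but without its pointer and branch side effects) are assumptions made for the illustration.

```python
# Assign one bit per terminal symbol (the order is arbitrary for this sketch).
TERMINALS = "abcdfghijklmnqrxy⊥"
BIT = {t: 1 << i for i, t in enumerate(TERMINALS)}


def make_row(symbols):
    """Build one row of the pattern array with a bit on for each listed symbol."""
    row = 0
    for s in symbols:
        row |= BIT[s]
    return row


# Rows of the pattern array as listed for the Appendix B example.
PATTERN = {1: make_row("xy"), 2: make_row("klm"), 3: make_row("bc"), 4: make_row("gh")}


def xsbb(row, top_symbol):
    """True if the bit for the top stack symbol is on in the given row."""
    return bool(PATTERN[row] & BIT.get(top_symbol, 0))
```

With this encoding the three statements of the Nh-F group need a single test, xsbb(2, top), instead of three separate comparisons against k, l and m.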
Lookahead Contextual Analysis Tests

These tests follow the recognition stack tests whenever needed and are implemented with three main types of instructions:
(1) if the right symbol is present, increment the lookahead level pointer and go on with next instruction, otherwise branch to another instruction;
(2) if the right symbol is present, branch to another instruction, otherwise go on with the next instruction;
(3) if the right symbol is present, go on with next instruction, otherwise branch to the beginning of the next statement.

The address of the next statement is set in a global location just before the string of lookahead test instructions. Each of the last two types includes a bit pattern test like that used in the stack tests above.

To optimize the lookahead test the lookahead strings are ordered as in the following example:

    given strings    ordered
    cef              a
    cd               b
    cij              cd
    cik              cef
    a                ceg
    ceg              ceh
    b                cij
    cil              cik
    ceh              cil

[Figure: the ordered lookahead strings marked with subscripts indicating which of the three types of test is needed for each symbol; the diagram is not recoverable from the source.]

Subscript [illegible] indicates that no test is needed, and x, y and z are the bit patterns formed for this example:

    a, b ∈ x        f, g, h ∈ y        j, k, l ∈ z

Let XLB be the instruction type 1 (branch on failure), XLA be type 2 (branch on success), and XLL be type 3 (branch to next production on failure), with a suffix "B" meaning a bit-pattern test, and L_i representing a transfer to label L_i as indicated by the instruction type. Then the following is the list of instructions performing this lookahead contextual analysis:

           XLAB (x, L2)
           XLL  (c)
           XLA  (d, L2)
           XLB  (e, L1)
           XLLB (y)
           GOTO (L2)
    L1:    XLL  (i)
           XLLB (z)
    L2:    (rest of this Floyd production)

Transfer of Control

Some additional optimization can be applied if the symbol following α is a terminal which is not combined.
The fact indicated earlier, namely, that there is always only one statement in a t(π, m) group and only one transfer to it, means that there is no need to generate a transfer. Therefore, the next symbol is scanned and the t(π, m) statement is created immediately as if it were part of the same statement. Floyd productions 18 and 19 are an example:

                F|n        * t(14, 2)
    t(14, 2):   Fn| →F|    DNt-F

Further, if a lookahead was performed then the symbol was successfully tested in the lookahead test so no test need be performed in the t(π, m) statement. The next symbol is scanned and the recognition stack pointer is incremented without a test. This is the case in the above example. If a lookahead of at least k symbols is required, then this is done for t(π, n + i) (1 ≤ i ≤ k) statements, if they exist.

Several other minor optimizations involving stack reduction and transfer of control have been implemented, as indicated by the description of the parser operators given in Appendix A.

5. Error Recovery

At the end of each group there is an error statement which is applied if every statement in that group failed to match. The general error recovery technique used has been to look at the top symbol in the bar symbol stack, reduce the stack to the nonterminal named in the case of an N or N*, skip the input to the first symbol which can follow N, and transfer to the Ntb group indicated by the bar symbol. If the bar symbol is a combined bar symbol, then the input is scanned for the first symbol that can follow any of the bar symbols in the combined group. When one is found the bar symbol it follows is treated as above. This implies the existence of a table which gives, for each occurrence of a nonterminal, a table of terminal symbols which may immediately follow that nonterminal occurrence. This table actually includes terminal occurrences also, as will be seen later, and is generated in the same manner as the lookahead strings are generated.
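The general recovery step can be sketched as follows. This Python fragment is a hedged illustration rather than the mechanism of the paper: the RECOVERY table (bar symbol to candidate nonterminals, follower sets and resume groups) was derived by hand from the Appendix B grammar, the function name recover is hypothetical, and the sketch assumes that each bar stack entry also records the recognition stack depth at the time it was pushed, which the paper does not state.

```python
# Hand-derived for the Appendix B grammar (assumption, not from the paper):
# bar symbol -> list of (nonterminal, symbols that may follow it, group to resume in).
RECOVERY = {
    "S(1,2)": [("S", {"⊥"}, "Ntb(1,2)")],
    "(1)": [("B", {"⊥"}, "Ntb(2,2)"), ("Z", {"q"}, "Ntb(3,2)")],
    "F(9,2)": [("F", {"n", "b", "c"}, "Ntb(9,2)")],
    "E*(2)": [("E", {"x", "y"}, "cNtb(2)")],
}


def recover(stack, bars, inp):
    """Reduce to the nonterminal sought by the top bar symbol and resynchronize.

    bars holds (bar symbol, stack depth when it was pushed); the depth is an
    assumption of this sketch. Returns the label of the group to resume in,
    or None if the input is exhausted first. The bar symbol is left on the
    bar stack because the resumed Ntb or cNtb group pops it.
    """
    bar, depth = bars[-1]
    while inp:
        for nonterminal, followers, target in RECOVERY[bar]:
            if inp[0] in followers:
                del stack[depth:]          # discard the partly built construct
                stack.append(nonterminal)  # pretend the nonterminal was found
                return target
        inp.pop(0)                         # skip an input symbol
    return None
```

As a usage example, suppose "ad" has been read, E*(2) was pushed at depth 3, and an unexpected symbol was scanned in place of f. Calling recover with the remaining input "qxj⊥" skips q, leaves ⊥adE on the stack, and resumes in cNtb(2) with x as the next input symbol.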
It sometimes happens that the symbol in error is first encountered in a lookahead test. This can cause the appropriate FPL statement to be skipped in favor of a wrong one. The parse is directed down a wrong path and several reductions sometimes can be made and bar symbols popped before the error is detected. This causes a greater portion of the input to be skipped than if all the bar symbols had been retained. An ALGOL example of this problem would be the following: Suppose an arithmetic expression in an assignment statement contains an incorrect exponentiation operator. Then when it becomes the next input symbol, the bar stack will contain primary, factor, term, arithmetic expression, statement, compound tail, program. The lookaheads needed to decide if the end of each construct has been reached could be ordered in such a way that the reduction is always made and the bar symbol popped until the END following the last statement is needed, and inserted, to make the compound tail until, finally, only the program symbol is left. Then the input is scanned to the first symbol following program, namely the end-of-file mark, i.e., the rest of the source string is skipped; whereas, if the primary were the top bar symbol, only a few symbols up to the next operator, ";", END, ELSE, etc. would have been skipped. In order to avoid this problem an additional test is made in any group where lookahead testing is needed to check whether the next symbol is in the set of symbols which can occur at that point. If it is not, then the general error recovery scheme is called immediately.

There are many situations which have special features and allow for more specialized error recovery than that outlined above. A discussion of these now follows.

The t(π, m) type of statement is executed when the previous history of the parse leaves no choice.
Since there is only one production and, therefore, only one possibility for the top symbol in the stack, it may be inserted by the following insertion rules if it is not there. The parse then may proceed as if no error had occurred; no error production is required.

Rules for Insertion

Assume the following symbolism:
β = the part of the stack below the top symbol,
γ = the unscanned portion of the input after the next symbol,
a = the symbol that is sought by the test,
b = any symbol which can immediately follow this occurrence of a,
c = any symbol,
| = the top of the stack (the stack to the left, and the unscanned input to the right),
⇒ = if the situation to the left of the arrow holds, then change it to the situation on the right.

The following are the rules; the first one from the top that applies is the one that is used:

    β b | a γ      ⇒   β a | b γ
    β b | c γ      ⇒   β a | b c γ
    β c | a γ      ⇒   β a | γ
    β c1 | c2 γ    ⇒   β a | c2 γ

The second special case occurs when an Nh, ch(p), or ct(p) group consists only of statements all of which contain the same symbol in the stack comparison field; then the stack test can insert the symbol if it is not there, using the above rules for insertion as in the t(π, m) statements. The error statement is not then needed since the parse will continue in the same way regardless of the outcome of the stack comparison.

The next special case is that of a group in which one statement has a stack symbol of a character or special word whereas all the others of that group have, as stack comparison symbol, a terminal class symbol (identifier, number, or string). In this case the assumption is made that the error is far more likely to have occurred with the specific terminal symbol than with a terminal class symbol. The error statement here inserts the specific terminal symbol according to the rules for insertion and transfers back to the appropriate place in the corresponding statement.
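The four rules for insertion given earlier in this section can be sketched as a single repair function. The Python fragment below is an illustration under one reading of the rules (transposed symbols, omitted symbol, extra symbol, wrong symbol, tried in that order); the function name insert_sought and the list representation of the stack and the input are assumptions of the sketch.

```python
def insert_sought(stack, inp, a, followers):
    """Force the sought symbol a onto the top of the stack.

    stack[-1] is the top stack symbol and inp[0] the next input symbol;
    followers is the set of symbols that can immediately follow this
    occurrence of a. Intended to be called only when stack[-1] != a.
    """
    top = stack[-1]
    nxt = inp[0] if inp else None
    if top in followers and nxt == a:
        # Rule 1: β b | a γ  =>  β a | b γ   (a and b were transposed)
        stack[-1] = a
        inp[0] = top
    elif top in followers:
        # Rule 2: β b | c γ  =>  β a | b c γ   (a was omitted; b goes back to the input)
        stack[-1] = a
        inp.insert(0, top)
    elif nxt == a:
        # Rule 3: β c | a γ  =>  β a | γ   (c was an extra symbol before a)
        stack[-1] = a
        inp.pop(0)
    else:
        # Rule 4: β c1 | c2 γ  =>  β a | c2 γ   (c1 was written in place of a)
        stack[-1] = a
```

For example, in group t(3,3) of Appendix B the symbol q is sought and only r can follow it. If the stack holds ⊥aZr and only ⊥ remains in the input, rule 2 applies: the stack becomes ⊥aZq, r is returned to the input, and the parse continues as if q had been present.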
The last special case applies when all the statements of a group make the same reduction. In this case that reduction is made anyway. The top symbol of the stack, which didn't match any of the stack tests, is put back into the input queue if it can follow the nonterminal to which the stack is reduced. The Nh-F group is an example:

    k| →F|   DNt-F
    l| →F|   DNt-F
    m| →F|   DNt-F

Note that no error statement is needed in the Nta, Ntb(π, m) and cNtb(p) type subgroups because no stack test is made in these groups and the last production requires no lookahead test so it, at least, will match.

6. Conclusion

The FPL parsing algorithm, not surprisingly, has similarities to precedence parsing algorithms in that the three different possible stack actions (do nothing, ←N, and →N) which can be specified in a FPL statement correspond to the three precedence operators ≐, <·, and ·>, respectively. The conversion algorithm is rather slow compared with some other algorithms but has the advantage that, with one exception, it is able to make use of more context than is employed in precedence schemes in determining the bounds of the phrase next to be reduced. A more general error recovery capability than that usually associated with precedence techniques is included in the algorithm.

Careful consideration of rather obvious optimizations in the information included in the FPL statements has been reflected in the FPL statement generation algorithm, thereby enabling the production of highly efficient syntax recognizers. Representation of the basic parser interpreter instructions as hardware instructions, or at least as microprogrammed sequences, would further enhance overall compiler performance. Interpretation of syntax tables also can be avoided by directly coding the FPL statements in a higher level language for machines, such as the B5500 and B6500, which have a suitably matched software-hardware capability.
Appendix A: Parser Operators

    Mnemonic     Operands   Action

    LLVL                    Initialize lookahead buffer test level pointer to the first position
    ILVL                    Increment the recognition stack pointer
    XSBS         S,A        Test the top of the stack for symbol S
                              yes => increment stack pointer
                              no  => branch to address A
    XSBT         T,A        Test the top of the stack for class symbol type T
                              yes => increment stack pointer
                              no  => branch to address A
    XSBB         R,A        Test the top of the stack with row R of the pattern array
                              marked     => increment stack pointer
                              not marked => branch to address A
    XSIS         S          Test the top of the stack for symbol S
                              yes => increment stack pointer
                              no  => insert S at top of stack and increment stack pointer
    XSIT         T          Test the top of the stack for class symbol type T
                              yes => increment stack pointer
                              no  => insert a symbol of type T at top of stack and increment stack pointer
    XLBS         S,A        Test the input queue for symbol S
                              yes => increment lookahead level pointer
                              no  => branch to address A
    XLBT         T,A        Test the input queue for class symbol type T
                              yes => increment lookahead level pointer
                              no  => branch to address A
    XLAS         S,A        Test the input queue for symbol S
                              yes => branch to address A
                              no  => go on
    XLAT         T,A        Test the input queue for class symbol type T
                              yes => branch to address A
                              no  => go on
    XLAB         R,A        Test the input queue with row R of the pattern array
                              marked     => branch to address A
                              not marked => go on
    XLLS         S          Test the input queue for symbol S
                              yes => go on
                              no  => branch to address in NEXTPRODUCTION
    XLLT         T          Test the input queue for class symbol type T
                              yes => go on
                              no  => branch to address in NEXTPRODUCTION
    XLLB         R          Test the input queue with row R of the pattern array
                              marked     => go on
                              not marked => branch to address in NEXTPRODUCTION
    NPOP         N          Subtract N from the recognition stack pointer
    RED1         S          Change the name of the top symbol of the recognition stack to S
    REDN         N,S        Subtract N from the recognition stack pointer and change the name of the top symbol of the recognition stack to S
    BPSH         B          Push bar symbol B into the bar stack
    BPOP                    Pop the top bar symbol from the bar stack
    TPSH                    Put the next input symbol into the recognition stack at location of the recognition stack pointer
    EXEC         N          Execute semantic routine N
    XTSM         N          Execute semantic routine N and test global Boolean SEMANTICTEST
                              true  => go on
                              false => branch to address in NEXTPRODUCTION
    XRSM         N          Execute semantic routine N and test global Boolean SEMANTICTEST
                              true  => go on
                              false => print error message and go on
    NOOP                    Go on
    SKIP         N          Skip N characters to next row of parser instruction table
    GOTO         A          Branch to address A
    XBGO         A          Test top stack symbol with top bar stack symbol (possibly going into a combined group)
                              match    => branch to address in bar symbol
                              no match => branch to address A
    SETS         A          Put A in NEXTPRODUCTION
    ERKE [?]                Test top of stack with next input symbol to see if latter can follow the former
                              yes => go on
                              no  => execute code for ERRR instruction
    [illegible]  S,A        Print error message, insert terminal symbol S at top of stack and go to address A
    ERRN         S,A        Print error message, reduce stack to nonterminal symbol S, and go to A
    ERRR                    Print error message, recover from error by using top bar symbol

Appendix B: Conversion of a Simple BNF Grammar

(a) The BNF productions:

    Production number
     1:   Σ ::= ⊥ S ⊥
     2:   S ::= a B |
     3:         a C q r
     4:   B ::= D b |
     5:         D c |
     6:         C j
     7:   C ::= d E x |
     8:         d E y
     9:   D ::= i F |
    10:         i
    11:   E ::= f |
    12:         f g |
    13:         f h
    14:   F ::= F n |
    15:         k |
    16:         l |
    17:         m

A dummy nonterminal symbol Z is needed at (3, 2) so production 3 is changed and 18 is added as follows:

     3:   S ::= a Z q r
    18:   Z ::= C

(b) FPL statement group labels and descriptors

    FPL Statement Group Label    Descriptor Set
    Nh-Σ (start)                 (1,1)
    Nh-S                         (2,1)(3,1)
    Nh-E                         (11,1)(12,1)(13,1)
    Nh-F                         (15,1)(16,1)(17,1)
    Nta-Σ                        exit
    Nta-C                        (6,1)(18,1)
    Nta-D                        (4,1)(5,1)
    Ntb(1,2)                     (1,2)
    Ntb(2,2)                     (2,2)
    Ntb(3,2)                     (3,2)
    Ntb(9,2)                     (14,1)(9,2)
    ch(1)                        (7,1)(8,1)(9,1)(10,1)
    cNtb(2)                      (7,2)(8,2)
    ct(3)                        (4,2)(5,2)
    ct(4)                        (12,2)(13,2)
    ct(5)                        (7,3)(8,3)
    t(1,3)                       (1,3)
    t(3,3)                       (3,3)
    t(3,4)                       (3,4)
    t(6,2)                       (6,2)
    t(14,2)                      (14,2)

(c) The FPL statements generated:

     1.  Nh-Σ (start):  ⊥| ←S(1,2)|          * Nh-S
     2.  Nh-S:          a| ←(1)|             * ch(1)
     3.  Nh-E:          f| x,y →E|           DNt-E
     4.                 f|                   * ct(4)
     5.  Nh-F:          k| →F|               DNt-F
     6.                 l| →F|               DNt-F
     7.                 m| →F|               DNt-F
     8.  Nta-Σ:         success exit
     9.  Nta-C:         C|j                  * t(6,2)
    10.  t(6,2):        Cj| →B|              DNt-B
    11.                 C| →Z|               DNt-Z
    12.  Nta-D:         D|                   * ct(3)
    13.  Ntb(1,2):      pop bar stack
    14.                 ⊥S|                  * t(1,3)
    15.  t(1,3):        ⊥S⊥| →Σ|             DNt-Σ
    16.  Ntb(2,2):      pop bar stack
    17.                 aB| →S|              DNt-S
    18.  Ntb(9,2):      F|n                  * t(14,2)
    19.  t(14,2):       Fn| →F|              DNt-F
    20.                 pop bar stack
    21.                 iF| →D|              DNt-D
    22.  Ntb(3,2):      pop bar stack
    23.                 aZ|                  * t(3,3)
    24.  t(3,3):        aZq|                 * t(3,4)
    25.  t(3,4):        aZqr| →S|            DNt-S
    26.  ch(1):         d| ←E*(2)|           * Nh-E
    27.                 i| k,l,m ←F(9,2)|    * Nh-F
    28.                 i| →D|               DNt-D
    29.  cNtb(2):       pop bar stack
    30.                 dE|                  * ct(5)
    31.  ct(3):         Db| →B|              DNt-B
    32.                 Dc| →B|              DNt-B
    33.  ct(4):         fg| →E|              DNt-E
    34.                 fh| →E|              DNt-E
    35.  ct(5):         dEx| →C|             DNt-C
    36.                 dEy| →C|             DNt-C

(d) The FPL parser interpreter instructions

    Nh-Σ:              push ⊥ into recognition stack
                       TPSH
                       GOTO (L1)
               L0:     ERRR
    Nh-S:      L1:     XSIS (a)
                       BPSH (1)
                       TPSH
                       GOTO (L16)
    Nh-E:      L2:     XSIS (f)
                       SETS (L3)
                       LLVL
                       XLLB (1)
                       RED1 (E)
                       XBGO (L0)
               L3:     TPSH
                       GOTO (L24)
    Nh-F:      L4:     XSBB (2, L6)
               L5:     RED1 (F)
                       XBGO (L0)
               L6:     ILVL
                       ERRN (F, L5)
    Nta-Σ:     L7:     success exit
    Nta-C:     L8:     SETS (L9)
                       XSLR
                       LLVL
                       XLLS (j)
                       TPSH            (t(6,2):)
                       ILVL
                       REDN (1, B)
                       XBGO (L0)
               L9:     RED1 (Z)
                       XBGO (L0)
    Nta-D:     L10:    TPSH
                       GOTO (L21)
    Ntb(1,2):  L11:    BPOP
                       TPSH
                       XSIS (⊥)
                       REDN (2, Σ)
                       XBGO (L7)
    Ntb(2,2):  L12:    BPOP
                       REDN (1, S)
                       XBGO (L0)
    Ntb(9,2):  L13:    SETS (L14)
                       XSLR
                       LLVL
                       XLLS (n)
                       TPSH            (t(14,2):)
                       ILVL
                       NPOP (1)
                       XBGO (L0)
               L14:    BPOP
                       REDN (1, D)
                       XBGO (L10)
    Ntb(3,2):  L15:    BPOP
                       TPSH            (t(3,3):)
                       XSIS (q)
                       TPSH            (t(3,4):)
                       XSIS (r)
                       REDN (3, S)
                       XBGO (L0)
    ch(1):     L16:    XSBS (d, L17)
                       BPSH (E*(2))
                       TPSH
                       GOTO (L2)
               L17:    XSBS (i, L19)
                       SETS (L18)
                       LLVL
                       XLLB (2)
                       BPSH (F(9,2))
                       TPSH
                       GOTO (L4)
               L18:    RED1 (D)
                       XBGO (L10)
               L19:    ILVL
                       ERRR
    cNtb(2):   L20:    BPOP
                       TPSH
                       GOTO (L27)
    ct(3):     L21:    XSBB (3, L23)
               L22:    REDN (1, B)
                       XBGO (L0)
               L23:    ILVL
                       ERRN (B, L22)
    ct(4):     L24:    XSBB (4, L26)
               L25:    REDN (1, E)
                       XBGO (L0)
               L26:    ILVL
                       ERRN (E, L25)
    ct(5):     L27:    XSBB (1, L29)
               L28:    REDN (2, C)
                       XBGO (L8)
               L29:    ILVL
                       ERRN (C, L28)

    row 1 of pattern array: x,y
    row 2 of pattern array: k,l,m
    row 3 of pattern array: b,c
    row 4 of pattern array: g,h

References

[1] Floyd, R. W. "A descriptive language for symbol manipulation", J. ACM 8 (Oct. 1961), p 579-584.
[2] Evans, A. "An Algol 60 compiler", Annual Review in Automatic Programming, Vol. 4, 1964, p 37-50.
[3] Feldman, J. A. "A formal semantics for computer languages and its application in a compiler-compiler", Comm. ACM 9 (Jan. 1966), p 3-9.
[4] Earley, J. C. "Generating a recognizer for a BNF grammar", Report, Computation Center, Carnegie Mellon University (1965).
[5] DeRemer, F. L. "On the generation of parsers for BNF grammars: an algorithm", Proc. SJCC (1969).
[6] Abel, N. E., Budnik, P. P., Kuck, D. J., Muraoka, Y., Northcote, R. S., and Wilhelmson, R. B. "TRANQUIL: a language for an array processing computer", Report No. 315, Department of Computer Science, University of Illinois at Urbana-Champaign (April 1969).

DOCUMENT CONTROL DATA - R&D

Security classification: UNCLASSIFIED
Originating activity: Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
Report title: THE AUTOMATIC GENERATION OF FLOYD PRODUCTION SYNTACTIC ANALYZERS
DESCRIPTIVE NOTES (Type of report and inclusive dates): Research Report
5. AUTHOR(S) (First name, middle initial, last name): Alan J. Beals, Jacques E. LaFrance, and Robert S. Northcote
6. REPORT DATE: September 9, 1969
7a. TOTAL NO. OF PAGES: 38
7b. NO. OF REFS: 6
8a. CONTRACT OR GRANT NO.: 46-26-15-305
b. PROJECT NO.: USAF 30(602)-4144
9a. ORIGINATOR'S REPORT NUMBER(S): DCS Report No. 350
9b. OTHER REPORT NO(S) (Any other numbers that may be assigned this report):
10. DISTRIBUTION STATEMENT: Qualified requesters may obtain copies from DCS.
11. SUPPLEMENTARY NOTES: NONE
12. SPONSORING MILITARY ACTIVITY: Rome Air Development Center, Griffiss Air Force Base, Rome, New York 13440
13. ABSTRACT: This paper describes an algorithm for the conversion of a grammar in the form of a set of BNF productions into a deterministic parsing algorithm as described by a set of modified Floyd productions. It describes the implementation of a recognizer based on Floyd productions, including optimization of the recognizer and syntactic error recovery. A complete example is given in an appendix and illustrations from it are used in the text.

DD FORM 1473 (1 NOV 65)

KEY WORDS: programming language; syntax analy[illegible]; parsing algorithm; [illegible]er; error recovery; Floyd productions

UNCLASSIFIED
Security Classification
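
As an illustration of how the recognition stack and the bar stack of Appendices A and B interact, the following Python sketch replays by hand the stack actions for the sentence a i k b of the Appendix B grammar (S ::= a B, B ::= D b, D ::= i F, F ::= k). It is only a sketch under stated assumptions and is not the report's interpreter: the class and function names are hypothetical, the control-flow instructions (tests, branches, error handling) are left out, ⊥ is written for the end marker and Σ for the start symbol, the bar symbol name "B-or-Z" is invented for illustration, and REDN (n, S) is read as "pop n symbols, then set the new top symbol to S", which is how the listing in Appendix B(d) appears to use it.

```python
class TwoStackMachine:
    """Recognition stack plus bar stack, with a few Appendix A style actions.

    Hypothetical illustration only; tests, branches and error recovery
    are deliberately omitted.
    """

    def __init__(self, tokens, bottom="⊥"):
        self.tokens = list(tokens)   # remaining input symbols
        self.stack = [bottom]        # recognition stack, bottom marker first
        self.bars = []               # bar (marker) stack

    def tpsh(self):
        # Put the next input symbol onto the recognition stack.
        self.stack.append(self.tokens.pop(0))

    def bpsh(self, bar):
        # Push a bar symbol: records which nonterminal is being sought.
        self.bars.append(bar)

    def bpop(self):
        # Pop the top bar symbol once the sought nonterminal was found.
        return self.bars.pop()

    def red1(self, nonterminal):
        # Replace the single top symbol by a nonterminal.
        self.stack[-1] = nonterminal

    def redn(self, n, nonterminal):
        # Assumed reading of REDN: pop n symbols, then overwrite the new top.
        del self.stack[-n:]
        self.stack[-1] = nonterminal


def recognize_a_i_k_b():
    """Replay the stack actions for the sentence a i k b followed by ⊥."""
    m = TwoStackMachine(["a", "i", "k", "b", "⊥"])
    m.bpsh("S(1,2)")     # seek S for production 1
    m.tpsh()             # shift a
    m.bpsh("B-or-Z")     # seek B or Z after a (bar name is hypothetical)
    m.tpsh()             # shift i
    m.bpsh("F(9,2)")     # seek F for production 9
    m.tpsh()             # shift k
    m.red1("F")          # k -> F        (production 15)
    m.bpop()             # F found
    m.redn(1, "D")       # i F -> D      (production 9)
    m.tpsh()             # shift b
    m.redn(1, "B")       # D b -> B      (production 4)
    m.bpop()             # B found
    m.redn(1, "S")       # a B -> S      (production 2)
    m.bpop()             # S found
    m.tpsh()             # shift the closing ⊥
    m.redn(2, "Σ")       # ⊥ S ⊥ -> Σ    (production 1)
    return m


if __name__ == "__main__":
    machine = recognize_a_i_k_b()
    print(machine.stack, machine.bars, machine.tokens)
```

In this replay every bar symbol pushed is popped again when the nonterminal it names has been built, so a successful recognition ends with the start symbol alone on the recognition stack and an empty bar stack.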