Report No. 276

ON THE GENERATION OF PARSERS FOR BNF
GRAMMARS: AN ALGORITHM

by

Franklin L. DeRemer

ILLIAC IV Document No. 199

DEPARTMENT OF COMPUTER SCIENCE • UNIVERSITY OF ILLINOIS • URBANA, ILLINOIS
 
ILLIAC IV Document No. 199 
 
 DEPARTMENT OF COMPUTER SCIENCE 
 UNIVERSITY OF ILLINOIS 
URBANA, ILLINOIS 61801

Contract No.
US AF 30(602)4144
 
 ON THE GENERATION OF PARSERS FOR BNF 
 GRAMMARS: AN ALGORITHM 
 
 by 
 
 Franklin L. DeRemer 
 
 Report No. 276 
 August 1, 1968 
 
This work was supported in part by the Department of Computer Science,
University of Illinois, Urbana, Illinois, and in part by the Advanced
Research Projects Agency as administered by the Rome Air Development
Center, under Contract No. US AF 30(602)4144.
 
ACKNOWLEDGEMENT
 
 The author would like to thank Alan J. Beals for his help in 
 evaluating and debugging the algorithm. Thanks are also due Dr. R. S. 
 Northcote for suggested improvements in the paper itself. 
 
 
ABSTRACT 
 
 This paper describes an algorithm which, for suitable grammars, 
 maps the Backus Naur Form (BNF) definition of the grammar of a 
 language into a parser for the sentences in that language. By design 
the algorithm generates a suitable parser for any bounded right context
grammar. It happens that it also covers some LR(k) grammars which are
 not bounded right context. A modified version of Floyd's descriptive 
 language for symbol manipulation is used to describe the parser. 
 Several examples illustrate the application and generality of the 
 algorithm. 
 
 
Introduction 
 
The algorithm described herein is in essence an extension,
albeit a simplification, of the work of Earley (1), which in turn was
based on Evans (2), Feldman (3), Floyd (4, 5), and Standish (7). For a
large subset of grammars, the algorithm maps the Backus Naur Form (BNF)
definition of the grammar of a language into a deterministic,
left-to-right parser for the sentences in that language. It is shown below
that the algorithm, by design, covers all bounded right context grammars
and, as a by-product, some LR(k) grammars which are not bounded right
context (see Knuth (6) for the definitions of these classes of grammars).
 
More precisely, the algorithm maps a set of BNF productions
into a program: a "reductions analysis" program* consisting of modified
Floyd productions (really reductions) referred to below as FPL (Floyd
Production Language) statements.** The program consists of labeled,
 mutually exclusive groups of statements called sections. Each section 
 has a specific task to perform. It is activated, by transfer of control 
 to its first statement via the label, only at appropriate times. Upon 
 each activation it either scans a new terminal symbol or makes a reduction 
 (combined with an "unscan" in the case of a production with an empty 
 right part) and then transfers control to the appropriate next section, 
 or it transfers control to an ERROR routine if control falls out the 
 bottom of the section. 
 
 The algorithm is based on Earley's intuitive notion that the 
 top symbols on the stack matched against the right parts of certain 
 productions should determine parsing decisions. It is an extension of 
 his algorithm in that it provides for both finite look-ahead and finite 
 look-back and in that it covers productions with empty right parts. 
It is a simplification of his algorithm in that it allows reductions
 only at the top of the stack, therefore reducing the number of mapping 
 rules. 
 
*It is assumed that the reader is familiar with reductions
analysis programs and the associated stack, input string, and
manipulations thereupon.

**This nomenclature is adopted to clarify the distinction between
the BNF productions, which together define the grammar of a language, and
the FPL statements which, when combined to form a program, describe a parser.
 
A word on notation is in order before proceeding. In this paper,
non-terminal symbols are represented by Latin capitals, terminals by
lower case Latin letters, arbitrary strings by the Greek letter α, and
the empty string by the Greek letter ε. The Greek letter σ designates
a symbol which matches any other symbol.
 
 The Algorithm 
 
The algorithm simply consists of: (a) three rules to
determine what sections are necessary for the program, (b) three
corresponding rules to determine which productions should be mapped into
statements for each section, (c) four rules to map the productions into
statements, (d) a rule which prescribes the combination of some
statements in a given section and a corresponding combination of certain
sections, and (e) a contextual analysis rule for expanding statements
so no two statements in a given section are both applicable to a given
stack and input string configuration. (The latter operation is referred
to below as making the statements disjoint.) Of course, there are also
several rules for optimizations, some of which are given toward the
end of the paper.
 
(a) Necessary Sections. A special START section and a special section
for SUCCESS EXIT are required together with the following:
 
 (1) A section labeled Nh is required for each non-terminal N 
 which appears in the right part of some production as other than 
 the first symbol. This section is activated whenever one of the 
 terminals, which may begin N, is supposed to be at the top of the 
 stack. It is the purpose of section Nh to verify that one of 
 these terminal "head symbols" is indeed at the top and, depending 
 upon which terminal is there, to take appropriate action to 
 commence, and perhaps conclude, a reduction to N. 
 
(2) A section labeled t(π,p) is required for each occurrence of
a terminal as the p-th symbol in the right part of each
production π, where p ≥ 2. This section consists of exactly one
statement which compares the first p symbols of production π
with the top p symbols of the stack. It is activated only when
the match must occur for a well-formed string. Its purpose is
to verify the top symbol and to take appropriate action to
continue the parse.
 
(3) A section labeled Nt is required for each non-terminal
N which appears in the right part of some production. The
section is activated immediately after a reduction to N occurs
at the top of the stack. The statements in this section
indicate comparisons to the stack to determine which of the
production(s) in whose right part N appears is applicable to
the case at hand. A match determines the appropriate
subsequent action.
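The three rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `necessary_sections`, the encoding of a grammar as (left part, right part) string pairs, and the convention that upper-case letters are non-terminals are choices made for this sketch and are not taken from the report. The grammar is the six-production grammar used later in the Examples section.

```python
def necessary_sections(productions):
    """Apply rules (a)(1)-(a)(3) to a list of (left_part, right_part) pairs.

    Productions are numbered from 1; upper-case letters are non-terminals and
    every other character is a terminal (an assumption of this sketch).
    """
    head, term, tail = set(), set(), set()
    for pi, (_, right) in enumerate(productions, start=1):
        for p, symbol in enumerate(right, start=1):
            if symbol.isupper():
                tail.add(symbol + "t")        # rule (3): N anywhere in a right part
                if p > 1:
                    head.add(symbol + "h")    # rule (1): N not the first symbol
            elif p >= 2:
                term.add(f"t({pi},{p})")      # rule (2): terminal as p-th symbol, p >= 2
    return sorted(head), sorted(term), sorted(tail)


# The six-production grammar from the Examples section.
GRAMMAR = [("S", "cAB"), ("S", "dAe"), ("A", "aG"),
           ("B", "xe"), ("G", "Gx"), ("G", "x")]

heads, terms, tails = necessary_sections(GRAMMAR)
print(heads)   # ['Ah', 'Bh', 'Gh']
print(terms)   # ['t(2,3)', 't(4,2)', 't(5,2)']
print(tails)   # ['At', 'Bt', 'Gt']
```

The special START and SUCCESS EXIT sections are not produced by these rules and would have to be added separately.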
 
(b) Descriptor Sets. In order to generate the appropriate set of
statements for a given section, a descriptor set of pairs D = {(π,p), ...}
is associated with each section label. This descriptor set is
determined by investigating the productions and serves to indicate to which
part of which production(s) the mapping rules described below are to
be applied. The pair (π,p) points to the first p symbols of production
π as the stack comparison symbols of the corresponding statement. The
descriptor sets are determined as follows:

(1) D_Nh: Initially D_Nh is empty and the following recursive
procedure is applied. The right part of each production π that
defines N is examined. If it is empty, then (π,0) is added to
D_Nh; if it begins with a terminal, then (π,1) is added to D_Nh;
otherwise it begins with a non-terminal and the procedure is
applied to that non-terminal.

(2) D_t(π,p) contains exactly one pair (π,p).

(3) D_Nt: The right part of each production π is examined. If
the non-terminal N appears as the p-th symbol, then (π,p) is
in D_Nt.
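As a rough illustration of rules (b)(1) and (b)(3), the following Python sketch computes D_Nh and D_Nt for the six-production grammar of the Examples section. The function names, the grammar encoding, and in particular the `seen` guard that stops the recursion on left-recursive productions such as G ::= Gx are assumptions of this sketch; the report itself only says that the procedure is recursive.

```python
def head_descriptors(productions, n, seen=None):
    """D_Nh by the recursive procedure of rule (b)(1)."""
    seen = set() if seen is None else seen
    if n in seen:                  # assumed guard: do not revisit a non-terminal
        return set()
    seen.add(n)
    d = set()
    for pi, (left, right) in enumerate(productions, start=1):
        if left != n:
            continue
        if not right:
            d.add((pi, 0))         # empty right part
        elif not right[0].isupper():
            d.add((pi, 1))         # begins with a terminal
        else:                      # begins with a non-terminal: recurse on it
            d |= head_descriptors(productions, right[0], seen)
    return d


def tail_descriptors(productions, n):
    """D_Nt by rule (b)(3): every position p at which N occurs."""
    return {(pi, p)
            for pi, (_, right) in enumerate(productions, start=1)
            for p, symbol in enumerate(right, start=1)
            if symbol == n}


GRAMMAR = [("S", "cAB"), ("S", "dAe"), ("A", "aG"),
           ("B", "xe"), ("G", "Gx"), ("G", "x")]

print(sorted(head_descriptors(GRAMMAR, "S")))   # [(1, 1), (2, 1)]
print(sorted(head_descriptors(GRAMMAR, "G")))   # [(6, 1)]
print(sorted(tail_descriptors(GRAMMAR, "G")))   # [(3, 2), (5, 1)]
```

The pairs (3,2) and (5,1) in D_Gt are the ones whose statements come into conflict in the Gt section discussed in the Examples.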
 
(c) The BNF to FPL Mapping Rules. Presented in Table I are four rules
for mapping BNF productions into FPL statements. Together with the
descriptor sets they represent a naive first try at generating a parser
for the grammar. Implicitly, the rules assume there is no question about
which production applies to the case at hand but only what action is to
be taken by the parser next, given that a certain production is applicable.
 
Table I

The BNF to FPL mapping rules. (α represents the first p
symbols of production π, σ is a symbol which matches any other symbol,
and q = p + 1.)

     BNF production (π,p)      maps into      FPL statement

(1)  M ::= αN ...                             α|   *         Nh
(2)  M ::= αb ...                             α|   *         t(π,q)
(3)  M ::= α                                  α|   → M|      Mt
(4)  M ::= ε                                  σ|   → M|σ     Mt
 
It is the purpose of the last two rules of the algorithm to extend it to
cover a reasonable set of grammars by resolving confusion about which
production(s) may apply to different cases within a given section.

The rules of Table I are explained intuitively as follows. If
the first p symbols of the right part of production π are at the top of
the stack and

(1) if the (p+1)st symbol is a non-terminal N, then the parser
should scan (*) the next terminal and activate section Nh to
begin to reduce a substring to N.

(2) if the (p+1)st symbol is a terminal b, then the parser should
scan the next terminal and activate section t(π,q), where
q = p + 1, to verify that that terminal is indeed b and to
decide how to continue the parse.

(3) if the p-th symbol is last in the right part of the
production, then the parser should make a reduction (→) to the symbol
M defined by the production and activate section Mt to decide
how to continue the parse.

(4) if p = 0 (and, therefore, the right part of the production
is empty), then the parser should "unscan" the top symbol, push
an M onto the top of the stack, and activate section Mt to
decide how to continue the parse. (The symbol unscanned will
always be a terminal since this statement will appear only in
an Nh-type section, the activation of which is always immediately
preceded by a scan (see rule (1)).)
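The four mapping rules can be written as a single function from a descriptor pair (π,p) to a statement. The sketch below is illustrative only: the name `map_pair`, the representation of a statement as a (stack comparison, action, next section) triple of strings, and the grammar encoding are assumptions made here, not notation from the report.

```python
def map_pair(productions, pi, p):
    """Map the pair (pi, p) to an FPL statement following Table I.

    Returns (stack comparison, action, next section) as strings. Upper-case
    letters are taken to be non-terminals (an assumption of this sketch).
    """
    left, right = productions[pi - 1]
    if p == 0:                                        # rule (4): empty right part
        return ("σ|", f"→ {left}|σ", left + "t")
    alpha = right[:p]                                 # first p symbols of the right part
    if p == len(right):                               # rule (3): reduce to the left part
        return (alpha + "|", f"→ {left}|", left + "t")
    following = right[p]                              # the (p+1)st symbol
    if following.isupper():                           # rule (1): scan, go to Nh
        return (alpha + "|", "*", following + "h")
    return (alpha + "|", "*", f"t({pi},{p + 1})")     # rule (2): scan, go to t(pi,q)


GRAMMAR = [("S", "cAB"), ("S", "dAe"), ("A", "aG"),
           ("B", "xe"), ("G", "Gx"), ("G", "x")]

print(map_pair(GRAMMAR, 1, 1))   # ('c|', '*', 'Ah')
print(map_pair(GRAMMAR, 5, 1))   # ('G|', '*', 't(5,2)')
print(map_pair(GRAMMAR, 3, 2))   # ('aG|', '→ A|', 'At')
```

Applied to the pairs (5,1) and (3,2) of D_Gt, the function yields the two statements with comparisons G| and aG|, which are not disjoint; this is the situation that rules (d) and (e) are meant to resolve.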
 
(d) Combinations. In general, a reductions analysis program generated
according to the above rules will contain sections in which some of the
statements are not disjoint. That is, the conflicting statements will
indicate stack comparisons (1) which are identical, or (2) the shorter
of which are identical to the top few symbols of the longer ones. Thus,
several statements may be applicable to a single stack and input string
configuration, and the parser is in some sense non-deterministic. To
render the parser deterministic it must be modified so it can either delay
or determine the decisions concerning which of the several similar
productions associated with the conflicting statements is applicable
in various cases. Decision delays are effected by pairwise statement
combinations as follows.
 
If a pair of statements in a given section are not disjoint
and if each was generated according to either mapping rule (1)
or (2), then replace them with a single statement: one whose
stack comparison is the shorter of the two and which, upon a
successful stack match, scans a new terminal and activates a
new combination section which must be added to the program.
The new section is that section whose descriptor set is the
union of the two descriptor sets of the sections which the
original statements would have activated. Of course, the new
section must be checked for disjointness, and the old sections,
of which the new one is a combination, should be checked for
usefulness, since the only reference in the entire program to one
or both might have been deleted by removal of the two statements.
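A minimal sketch of the combination rule follows, using the pair of statements in section Ht of the final example, which activate t(1,2) and t(4,2). The function name `combine`, the statement triples, the dictionary of descriptor sets, and the way the new label is spelled from its descriptor set are assumptions of this sketch; the report prescribes only the shorter comparison and the union of the descriptor sets.

```python
def combine(stmt_a, stmt_b, descriptors):
    """Combine two non-disjoint scan statements as described in rule (d).

    A statement is (stack comparison, action, next section); `descriptors`
    maps section labels to descriptor sets and is updated in place.
    """
    (cmp_a, act_a, tgt_a), (cmp_b, act_b, tgt_b) = stmt_a, stmt_b
    if act_a != "*" or act_b != "*":
        raise ValueError("only statements from mapping rules (1) and (2) combine")
    shorter, longer = sorted((cmp_a, cmp_b), key=len)
    if not longer.endswith(shorter):
        raise ValueError("the statements are already disjoint")
    new_set = descriptors[tgt_a] | descriptors[tgt_b]     # union of descriptor sets
    # Assumed labeling convention, modeled on the label t(1,2; 4,2).
    new_label = "t(" + "; ".join(f"{pi},{p}" for pi, p in sorted(new_set)) + ")"
    descriptors[new_label] = new_set
    return (shorter, "*", new_label)


descriptors = {"t(1,2)": {(1, 2)}, "t(4,2)": {(4, 2)}}
merged = combine(("H", "*", "t(1,2)"), ("H", "*", "t(4,2)"), descriptors)
print(merged)                        # ('H', '*', 't(1,2; 4,2)')
print(descriptors["t(1,2; 4,2)"])    # the union of the two descriptor sets
```

The sketch does not perform the follow-up checks the rule calls for: the new section still has to be tested for disjointness, and the old sections for usefulness.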
 
(e) Expansion by Contextual Analysis. The only decisions which cannot be
delayed are those concerning reductions. This limitation is due to the
requirement that reductions be made only at the top of the stack. Thus,
conflicts with statements generated according to mapping rules (3) and
(4) cannot be cured by combination. In this case the statements'
comparison fields are expanded by contextual analysis to provide the parser
with whatever finite look-ahead and look-back are necessary to make
the decision at hand, i.e.,

for each of the conflicting statements the grammar is
investigated and generation begun of the strings of symbols which,
in the context of the production associated with the statement,
may surround the original stack comparison substring α of the
statement. Appropriate comparison of the composite strings
associated with each of the original statements indicates the
minimum context which must be checked to make the statements
disjoint. In the worst case each statement must be replaced
with several statements which differ from the original in
that they indicate more symbols which must be matched in the
stack and/or the input string.
 
 Examples 
 
Since the parser proceeds from left to right, always making
reductions at the top of the stack on the basis of whatever finite
look-ahead and look-back are necessary, the algorithm by definition covers all
bounded right context grammars. Further, due to the fact that the sections of
the program themselves imply certain extra information about the stack
configuration, in the same sense that a state of a finite state acceptor
implies information about the string read, the algorithm also covers
some LR(k) grammars which are not bounded right context. An example
grammar in this class is S ::= aA|bB, A ::= cA|d, B ::= cB|d, the
sentences of which are a c^n d and b c^n d (n ≥ 0). It is not bounded right
context since the clue as to whether to reduce d to A or B is an a or b
arbitrarily far down the stack. The grammar is, however, LR(0) and can
be parsed by the algorithmically generated parser of Figure 1. Note
that a transfer of control to an ERROR routine is implicit at the bottom
of each section in case no match occurs.
 
START (Sh)    a|     *        Ah
              b|     *        Bh

Ah            c|     *        Ah
              d|     → A|     At

Bh            c|     *        Bh
              d|     → B|     Bt

At            cA|    → A|     At
              aA|    → S|     St

Bt            cB|    → B|     Bt
              bB|    → S|     St

St            SUCCESS EXIT

Figure 1. Algorithmically generated parser for a grammar which is LR(0)
but not bounded right context.
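To see how such a program runs, the sections for the grammar S ::= aA|bB, A ::= cA|d, B ::= cB|d can be transcribed into a table and driven by a small interpreter. The Python below is a sketch under assumptions: the table layout, the name `parse`, the convention that the START section is entered with the first symbol already scanned, and the requirement that the input be exhausted at SUCCESS EXIT are choices of this sketch rather than statements of the report.

```python
# Each statement: (stack comparison, action, reduction result, next section).
FIGURE_1 = {
    "Sh": [("a", "*", None, "Ah"), ("b", "*", None, "Bh")],
    "Ah": [("c", "*", None, "Ah"), ("d", "->", "A", "At")],
    "Bh": [("c", "*", None, "Bh"), ("d", "->", "B", "Bt")],
    "At": [("cA", "->", "A", "At"), ("aA", "->", "S", "St")],
    "Bt": [("cB", "->", "B", "Bt"), ("bB", "->", "S", "St")],
}


def parse(sentence, program=FIGURE_1, start="Sh", exit_label="St"):
    """Return True if the reductions analysis program accepts the sentence."""
    symbols = list(sentence)
    if not symbols:
        return False
    # Assumption: START is entered with the first terminal already on the stack.
    stack, pos, label = [symbols[0]], 1, start
    while label != exit_label:
        for pattern, action, result, target in program[label]:
            if "".join(stack[-len(pattern):]) == pattern:
                break
        else:
            return False                  # control fell out the bottom: ERROR
        if action == "*":                 # scan the next terminal
            if pos == len(symbols):
                return False
            stack.append(symbols[pos])
            pos += 1
        else:                             # reduce at the top of the stack
            del stack[-len(pattern):]
            stack.append(result)
        label = target
    # Assumption: success also requires that no input remains.
    return pos == len(symbols) and stack == ["S"]


print(parse("accd"))   # True
print(parse("bd"))     # True
print(parse("ab"))     # False
```

Note that the Ah and Bh sections contain the same comparisons; only the section in control records whether an a or a b lies at the bottom of the stack, which is why no look-back is needed.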
 
As an example of a grammar requiring both look-ahead and
look-back consider the following.

π              p = 1 2 3

1    S ::=         c A B
2    S ::=         d A e
3    A ::=         a G
4    B ::=         x e
5    G ::=         G x
6    G ::=         x
 
Confusion arises in the Gt section about when to terminate the gathering
of x's into the non-terminal G. Generation of the context related to
production five produces three possible strings:

(1) G| → G|x → G|xx
(2) G| → G|x → aG|x → daG|xe
(3) G| → G|x → aG|x → caG|xB → caG|xxe

There are two possible strings for production three:

(1) aG| → daG|e
(2) aG| → caG|B → caG|xe
 
Most of the confusion is between case (2) of production five and case (2)
of production three. One possible solution is to construct the following
Gt section:

Gt    G|xx      *       t(5,2)
      daG|xe    *       t(5,2)
      aG|       → A|    At
 
Note that advantage has been taken of the sequential nature of the
program here. Since the first two statements will catch all
configurations to which production five is applicable, the statement associated
with production three checks no extra context. That is, the restriction
that the statements in a given section must be disjoint may be relaxed
in special cases where advantage is taken of the order in which statements
are executed; however, the contextual analysis must still be performed to
ascertain the validity of such an optimization. Finally, note that had
production five been G ::= xG the grammar would not have been bounded right
context nor covered by this algorithm, although it would still be LR(2).
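The effect of statement order in the Gt section can be checked with a short Python sketch. Each statement is written as a look-back string (matched against the top of the stack), a look-ahead string (matched against the start of the remaining input), and a description of its action; the names `GT_SECTION` and `select` and this three-field encoding are assumptions of the sketch.

```python
# (look-back on the stack, look-ahead in the input, action), in program order.
GT_SECTION = [
    ("G",   "xx", "scan, then t(5,2)"),
    ("daG", "xe", "scan, then t(5,2)"),
    ("aG",  "",   "reduce aG to A, then At"),
]


def select(stack, lookahead, section=GT_SECTION):
    """Return the action of the first statement that matches, else ERROR."""
    for back, ahead, action in section:
        if stack.endswith(back) and lookahead.startswith(ahead):
            return action
    return "ERROR"


print(select("caG", "xxe"))   # scan, then t(5,2)
print(select("caG", "xe"))    # reduce aG to A, then At
print(select("daG", "xe"))    # scan, then t(5,2)
print(select("daG", "e"))     # reduce aG to A, then At
```

The third statement carries no extra context, yet it is reached only for configurations that the first two have already rejected, which is the relaxation of disjointness described above.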
 
 
As a final, larger, and more practical example consider the
grammar of Table II, which is Earley's example of a simple algebraic
language. The corresponding list of necessary sections and their
descriptor sets are presented in Table III, and the parser is given
in Figure 2. This grammar requires no special look-back, and
look-ahead of more than one symbol in only one case, section Dt. A single
pair of statements were combined in section Ht causing the combination
of sections t(1,2) and t(4,2) to form a section labeled t(1,2; 4,2).
Note that such combinations are probably most efficiently effected by
operations on the descriptor sets before the sections are generated.
Also note that maximum advantage was taken of ordering the statements.
However, for expositional purposes several optimizations were not made:
(1) since the first p-1 symbols are matched immediately prior to its
activation, a t(π,p) section need match only the p-th symbol with the
top symbol of the stack, (2) since a reduction to N occurs immediately
prior to the activation of an Nt section, it need not match the top
symbol, and (3) several sections could have been "concatenated", as for
example sections Dt, t(6,2), and t(6,3) which would form

Dt    D|;r    * * *     T_l h
      bD|     → H|      Ht

Finally, since sections Ph, Fh, and Th are identical, and are a subset
of section Eh, all these could have been combined to save space; however,
this is probably undesirable as it implies a loss of information
useful for error recovery.
 
PRODUCTION TABLE

                   π               p = 1      2      3      4

<AXIOM>            0    Z   ::=        ⊢      B      ⊣
<BLOCK>            1    B   ::=        H      e
<HEAD>             2    H   ::=        b
                   3    H   ::=        b      D
                   4    H   ::=        H      ;      S
<DECLARATION>      5    D   ::=        r      T_l
                   6    D   ::=        D      ;      r      T_l
<TYPE LIST>        7    T_l ::=        i
                   8    T_l ::=        T_l    ,      i
<STATEMENT>        9    S   ::=        i      ←      E
<EXPRESSION>      10    E   ::=        T
                  11    E   ::=        ±      T
                  12    E   ::=        E      ±      T
<TERM>            13    T   ::=        F
                  14    T   ::=        T      *      F
<FACTOR>          15    F   ::=        P
                  16    F   ::=        F      ↑      P
<PRIMARY>         17    P   ::=        i
                  18    P   ::=        (      E      )

NOTE: i is identifier
      r is real
      b is begin
      e is end

Table II. Production table for a simple algebraic language.
 
NECESSARY SECTIONS       DESCRIPTOR SETS

START (Zh)               (0,1)
Bh                       (2,1)   (3,1)
Dh                       (5,1)
T_l h                    (7,1)
Th                       (17,1)  (18,1)
Eh                       (11,1)  (17,1)  (18,1)
Sh                       (9,1)
Fh                       (17,1)  (18,1)
Ph                       (17,1)  (18,1)

t(1,2)                   (1,2)   } combined to form t(1,2; 4,2)
t(4,2)                   (4,2)   }
t(6,2)                   (6,2)
t(8,2)                   (8,2)
t(9,2)                   (9,2)
t(12,2)                  (12,2)
t(14,2)                  (14,2)
t(16,2)                  (16,2)
t(0,3)                   (0,3)
t(6,3)                   (6,3)
t(8,3)                   (8,3)
t(18,3)                  (18,3)

Ht                       (1,1)   (4,1)    (combination)
Dt                       (6,1)   (3,2)
T_l t                    (8,1)   (5,2)   (6,4)
Tt                       (10,1)  (14,1)  (11,2)  (12,3)
Et                       (12,1)  (18,2)  (9,3)
Ft                       (13,1)  (16,1)  (14,3)
Pt                       (15,1)  (16,3)
Bt                       (0,2)
St                       (4,3)
Zt

Table III. List of necessary sections and the corresponding descriptor
sets for the grammar of Table II.
 
START (Zh)      ⊢|          *           Bh

Bh              b|r         *           Dh
                b|          → H|        Ht

Dh              r|          *           T_l h

T_l h           i|          → T_l|      T_l t

Th              i|          → P|        Pt
                (|          *           Eh

Eh              ±|          *           Th
                i|          → P|        Pt
                (|          *           Eh

Sh              i|          *           t(9,2)

Fh              i|          → P|        Pt
                (|          *           Eh

Ph              i|          → P|        Pt
                (|          *           Eh

t(1,2; 4,2)     He|         → B|        Bt
                H;|         *           Sh

t(6,2)          D;|         *           t(6,3)

t(8,2)          T_l,|       *           t(8,3)

t(9,2)          i←|         *           Eh

t(12,2)         E±|         *           Th

t(14,2)         T*|         *           Fh

t(16,2)         F↑|         *           Ph

t(0,3)          ⊢B⊣|        → Z|        Zt

t(6,3)          D;r|        *           T_l h

t(8,3)          T_l,i|      → T_l|      T_l t

t(18,3)         (E)|        → P|        Pt

Ht              H|          *           t(1,2; 4,2)

Dt              D|;r        *           t(6,2)
                bD|         → H|        Ht

T_l t           T_l|,       *           t(8,2)
                D;rT_l|     → D|        Dt
                rT_l|       → D|        Dt

Figure 2. Algorithmically generated parser for a simple algebraic
language.
 
Tt              T|*         *           t(14,2)
                E±T|        → E|        Et
                ±T|         → E|        Et
                T|          → E|        Et

Et              E|±         *           t(12,2)
                (E|         *           t(18,3)
                i←E|        → S|        St

Ft              F|↑         *           t(16,2)
                T*F|        → T|        Tt
                F|          → T|        Tt

Pt              F↑P|        → F|        Ft
                P|          → F|        Ft

Bt              ⊢B|         *           t(0,3)

St              H;S|        → H|        Ht

Zt              SUCCESS EXIT

Figure 2. (continued)
 
REFERENCES

1. Earley, J. Generating a Recognizer for a BNF Grammar, Carnegie-Mellon
   Institute of Technology, June 1965, unpublished.

2. Evans, A. An ALGOL 60 Compiler, National ACM Conference, Denver,
   1963.

3. Feldman, J. A Formal Semantics for Computer-Oriented Languages,
   Doctoral Thesis, Carnegie-Mellon Institute of Technology, 1964.

4. Floyd, R. A Descriptive Language for Symbol Manipulation, J. ACM 8,
   4 (1961), 579-584.

5. Floyd, R. Bounded Context Syntactic Analysis, Comm. ACM 7, 2 (1964),
   62-67.

6. Knuth, D. On the Translation of Languages from Left to Right,
   Information and Control 8 (1965), 607-639.

7. Standish, T. Generating Productions from a Restricted Class of BNF
   Grammars, Carnegie-Mellon Institute of Technology Computation
   Center, unpublished.
 
UNCLASSIFIED
Security Classification

DOCUMENT CONTROL DATA - R&D

(Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified)

1. ORIGINATING ACTIVITY (Corporate author)
   Department of Computer Science
   University of Illinois
   Urbana, Illinois 61801

2a. REPORT SECURITY CLASSIFICATION
   UNCLASSIFIED

2b. GROUP

3. REPORT TITLE
   ON THE GENERATION OF PARSERS FOR BNF GRAMMARS: AN ALGORITHM

4. DESCRIPTIVE NOTES (Type of report and inclusive dates)
   Research Report

5. AUTHOR(S) (First name, middle initial, last name)
   Franklin L. DeRemer

6. REPORT DATE

7a. TOTAL NO. OF PAGES
   19

7b. NO. OF REFS
   7

8a. CONTRACT OR GRANT NO.
   46-26-15-305

b. PROJECT NO.
   US AF 30(602)4144

9a. ORIGINATOR'S REPORT NUMBER(S)
   ILLIAC IV DOCUMENT NO. 199

9b. OTHER REPORT NO(S) (Any other numbers that may be assigned this report)
   DCS Report No. 276

10. DISTRIBUTION STATEMENT
   Qualified requesters may obtain copies of this report from DCS.

11. SUPPLEMENTARY NOTES
   NONE

12. SPONSORING MILITARY ACTIVITY
   Rome Air Development Center
   Griffiss Air Force Base
   Rome, New York 13440
 
 
14. KEY WORDS
   Parser
   Syntax analysis
   Compiler
   Compiler-Compiler