Report No. 459

SYNTAX-DIRECTED ERROR RECOVERY FOR COMPILERS

Jacques Emmett LaFrance

June 21, 1971

ILLIAC IV Document No. 249

Report No. 459

SYNTAX-DIRECTED ERROR RECOVERY FOR COMPILERS

by

Jacques Emmett LaFrance

June 10, 1971

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

This work was supported in part by the Advanced Research Projects Agency as administered by the Rome Air Development Center under Contract No. USAF 30(602)-4144 and submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science, 1971.

ABSTRACT

This paper presents a system of automatic error recovery for syntax-directed parsing algorithms which is based solely on the syntax of the language. The system uses a table of the symbols which can follow a construct to determine how that construct might be inserted. Four compilers built with this system are described, along with examples of their error recovery. The translator writing system in which this system of automatic error recovery was developed is discussed, including the syntax description language, the generation of Floyd productions, and the parsing table built from the Floyd productions. Finally, the author presents suggestions for improving the error recovery in compilers built using either of the two parsing algorithms considered (modified Floyd production language and recursive descent), and for research into further extensions.
Possible applications of the techniques described in this paper include extensible compilers and compilers for computer science education.

TABLE OF CONTENTS

CHAPTER
1. INTRODUCTION
   1.1 Purpose of This Work
   1.2 Contributions of Other Research
   1.3 Philosophy of This Work
   1.4 Measurement of Effectiveness of Error Recovery
   1.5 Overview of the Thesis
2. THE TRANSLATOR WRITING SYSTEM
   2.1 Overall System Description
   2.2 The TWINKLE Syntax Description Language and Compiler
   2.3 Conversion of the Backus Naur Form of the Language to Modified Floyd Production Language
   2.4 Construction and Use of Parser Instruction Tables
   2.5 Optimization of the Parser Instruction Table
   2.6 Source Code Generation for the Parsing Tables
   2.7 Summary
3. ERROR ANALYSIS IN MODIFIED FLOYD PRODUCTION LANGUAGE
   3.1 Introduction to Chapter III
   3.2 Error Detection
   3.3 Error Recovery in Case of Unique Terminal Symbols
   3.4 Error Recovery in the Case of Same Parser Actions
   3.5 Error Recovery with Procedure ERRR
   3.6 The String Generator Procedure ERRFPINTERPRETER
   3.7 Summary
4. DEMALGOL COMPILER
   4.1 Purpose and Definition of DEMALGOL
   4.2 Error Recovery in DEMALGOL
   4.3 Summary of the Errors
   4.4 Comparison of Burroughs ALGOL Compiler with DEMALGOL Compiler
   4.5 Comparison of 7090 ALCOR Compiler with DEMALGOL Compiler
   4.6 Summary Comparison of Three Compilers
5. COMPILERS FOR THREE OTHER LANGUAGES
   5.1 Introduction to the Three Languages and Their Compilers
   5.2 TESLA
      5.2.1 Introduction to TESLA
      5.2.2 TESLA Error Recovery
   5.3 ICL
      5.3.1 Introduction to ICL
      5.3.2 ICL Error Recovery
      5.3.3 Error Recovery in the Recursive Descent ICL Compiler
   5.4 OSL
      5.4.1 Introduction to OSL
      5.4.2 OSL Error Recovery
   5.5 Summary of the Results from DEMALGOL, TESLA, ICL, and OSL
6. LANGUAGE DESIGN CONSIDERATIONS FOR IMPROVED ERROR RECOVERY
   6.1 Introduction
   6.2 Identifiers and Semantic Tests
   6.3 Errors Which Resemble Valid Constructs
   6.4 Representing Error Constructs in the Syntax
   6.5 Problem of Segmented Language Structure
   6.6 Redundancy
   6.7 Noise Symbols
   6.8 Delineators
   6.9 Ordering of Alternatives
   6.10 Conclusion
7. EXTENSION TO RECURSIVE DESCENT PARSING
   7.1 Additional Considerations Needed for Recursive Descent
   7.2 Initial Analysis
   7.3 Parsing and Error Detection
   7.4 Rationale for Required Searches
   7.5 Conclusion
8. DEMALGOL RECURSIVE DESCENT COMPILER
   8.1 The Recursive Descent Compiler Building System
   8.2 Construction of the DEMALGOL Compiler
   8.3 Error Recovery Results
   8.4 Summary of the Error Recovery in Seven Programs
   8.5 Comparison of These Results with Others
9. LANGUAGE DESIGN CONSIDERATIONS FOR IMPROVED RECURSIVE DESCENT ERROR RECOVERY
   9.1 Introduction
   9.2 Forcing Required Symbols in Positions of Common Errors
   9.3 Syntax Description for Longer Required Searches
   9.4 Semantic Setting of the Current Search as Required
   9.5 Uniqueness of Different Alternatives
   9.6 Conclusion
10. FURTHER RESEARCH
   10.1 Introduction
   10.2 Refinements to the Current System
   10.3 Extension to Other Parsing Algorithms
   10.4 Extension to Probabilistic Grammars
   10.5 Applications in Computer Science Education
   10.6 Applications to Extensible Compilers
   10.7 Summary
11. SUMMARY
LIST OF REFERENCES
APPENDIX
   A. DEMALGOL PROGRAMS
   B. ALCOR-ILLINOIS ERROR RECOVERY EXAMPLES
   C. TESLA SYNTAX
   D. ICL SYNTAX
VITA

LIST OF TABLES
   1. Table of DEMALGOL Syntax Errors
   2. Error Recovery Effectiveness of Three Compilers
   3. Comparison of DEMALGOL, TESLA, ICL, and OSL
   4. Comparison of All DEMALGOL Compilers

LIST OF FIGURES
   1. Interdependence of Error Recovery Variables
   2. Example of a Floyd Production Group
   3. A Small BNF Grammar
   4. Modified Floyd Productions of Grammar in Figure 3
   5. Format of Parser Instruction Table Entries on Burroughs B5500
   6. Syntax of the Parser Language
   7. A BNF Grammar
   8. Floyd Productions for Grammar in Figure 7
   9. Partial Optimization of Productions in Figure 8
   10. Complete Optimization of Productions in Figure 8
   11. An NTH Group before Optimization of Transfers
   12. The NTH Group after Optimization of Transfers
   13. Conditions for Inserting a Symbol
   14. Two-symbol String Patterns for Error Correction
   15. Three-symbol String Patterns for Error Correction
   16. Syntax of DEMALGOL
   17. Semantics of DEMALGOL
   18. A BNF Grammar Illustrating Unique Terminal Symbol Flag Use
   19. Illustration of Poor Syntax for Long Required Searches
   20. Illustration of Good Syntax for Long Required Searches

CHAPTER 1. INTRODUCTION

1.1 Purpose of This Work

The purpose of the research described in this thesis is to provide an automatic error recovery system which is based on the syntax of a language and which can operate without any additional effort on the part of the language designer. These objectives appear to have been met. Further, the system described comes reasonably close to the ultimate goal of finding all the syntactic errors in a program and describing them precisely to the programmer, so as to minimize the effort required of him to achieve a syntactically correct program. The system operates in the context of batch compilation; however, the techniques used could equally well be applied to an on-line compilation system. The error recovery system described here has been built into a compiler building system based on a modification of Floyd's production language (40), and it is also extended to a recursive descent parsing algorithm.
Tests of several different compilers implemented on this system have shown it to be superior, in both effectiveness and clarity to the programmer, to each of the systems with which it was compared. These include the Burroughs B5500 ALGOL compiler, the 7090 ALCOR Illinois ALGOL compiler, and one other compiler with hand-written error recovery mechanisms.

One can identify five levels of errors which a programmer must overcome before his program performs correctly; this research attempts to provide automatic and effective mechanisms only for the third level, that of syntactic errors. The first level is that of keypunching errors, which prevent the job from even getting on the computer. The second level is that of errors in the job description to the operating system, so that no processing can be done. The third level is that of syntactic errors in the compilation, which prevent the object program from running. The fourth level is that of fatal run-time errors, such as division by zero, which terminate the job prematurely. The fifth level is that of logical errors, which cause the results of the job to be erroneous even though it runs to completion.

1.2 Contributions of Other Research

A considerable amount of work has been done on helping the programmer with errors at levels four and five, but he has not been given as much help with level three, syntactic errors, and almost no work has been done on automatic syntactic error recovery. The work on debugging, that is, on finding level four and five errors, has taken quite varied approaches. Chapin (18) has proposed a change in machine design specifically for more effective debugging. Bayer (6) and Hext (52) have considered matters relating to program failures (level four errors).
Some theoretical concepts have been discussed by Constantine (22), Green (47), and Van Horn (93), while specific debugging systems have been proposed by Balzer (5), Grishman (51), Kulsrad (64), and Zadrevskii (100). Fuchi et al. (46) have considered a debugging system which operates by simulated execution of the program. The most popular approach has been on-line debugging; examples are given in Baecker (4), Bernstein and Owens (10), Brady (11), Dean (25), Evans and Darley (35), Josephs (58), Jossen (59), and Zimmerman (101). Irving and Morrison (57) and Pullen and Shuttee (78) have considered debugging in some special situations, and Levy (69) and Varley (94) have considered data-related errors.

Many compilers have been constructed for practical application, and each of them has had to handle syntax errors in some way. Some have handled the situation quite well, even though the measures used have been ad hoc. Compiler building systems have tended to be less satisfactory with regard to syntactic error recovery. All but two have required the language designer to provide a certain amount of error recovery information, and in some cases the systems have depended entirely on the language designer for any error recovery. Feldman and Gries (38) have said "the problem of automatic recovery from syntax errors could use considerably more attention" (p. 107) and "There has been very little effort on the problems of automatic error detection and recovery in syntax-directed processors. Once again, even a bad system would be of great value to users." (p. 108)

The two compiler building systems which have had error recovery built into them independently of the language are those of Irons (55) and Leinius (68). Irons' error correcting algorithm is based on a multiple-tracking top-down parse algorithm, i.e.
goals are established and each possible branch is extended to the next subgoal until either all the goals of one branch are found or all branches terminate. If all currently active branches terminate without finding their respective goals, the error recovery procedure is called. This procedure attempts to fix the source string by insertion, deletion, or replacement. It makes a list of all syntactic elements which can follow on any branch of the last set of active parsing branches, then scans the input until it comes to one of the items in the list. The string is then fixed in the simplest way which allows this symbol to be connected to the previous parse. Irons has stated informally (personal communication) that this procedure has done the correct thing in about 20% of the cases.

Leinius based his method on a simple precedence parsing algorithm. He associates with the simple precedence parser an error recovery procedure which attempts to find the smallest substring containing the error which can be reduced. Although the material available at the time of this writing does not make it clear, this appears to be a more sophisticated version of the commonly used technique of throwing away symbols until something recognizable is found. He further says that when two or more syntactically valid recovery actions are possible, more text is examined to decide which one to choose. He concludes with a discussion of the extension of his method to the class of LR(k) languages.

Other compiler building systems leave the error recovery either partially or entirely to the particular language designer. For example, McKeeman (72) expects the user to initialize an array called STOPIT with an appropriate set of terminal symbols. When an error is found, an XPL compiler skips symbols until it comes to one of the symbols in STOPIT. It then makes a reduction appropriate for the symbol at which the skipping stopped, i.e.
it finds a reduction which can be followed by that symbol or can include it as the last symbol. Simpson (88) also skips to an appropriate symbol.

Wirth (98) discusses the handling of syntactic errors in PL360 from the point of view of a precedence relation parsing algorithm, so the method is not limited to PL360. However, the method he outlines relies on information provided by the language designer. When no precedence relation is defined between two symbols, an error is detected. A set of symbols provided by the language designer is searched for one which has a precedence relation with each of the two symbols that had no relation between them. This symbol is then inserted between the two symbols. If no such symbol is found, a table of erroneous productions, again provided by the language designer, is searched for one that applies; associated with it is an appropriate message, also provided by the language designer. Nothing is said about what happens when no production from the table applies.

META-PI (76), a top-down interactive compiler-compiler, allows a CLAMP operation to be used in the syntax. This operation prevents backtracking past it, and any need to backtrack past it is considered an error. It is not clear what is done when such an error is detected. A later chapter will discuss an algorithm for automatically achieving the effect of the CLAMP operation in top-down compilers.

Some compilers and special techniques designed for specific systems can also be noted for their contributions to syntactic error recovery. One of these is the use of spelling correction described by Morgan (74). He describes the use of a spelling correction algorithm, based on work by Damerau (24) and Freeman (45), to correct simple spelling errors in language processors and operating systems.
The corrections were limited to a single letter change, a single letter insertion, a single letter deletion, or an interchange of two adjacent letters. These changes were able to correct over 80% of the spelling errors in several hundred programs studied by Morgan at Cornell; no attempt was made to fix the remaining errors, somewhat less than 20%. This algorithm could reasonably be applied at some points in the algorithms discussed later.

The Illinois ALCOR 7090 ALGOL compiler (50) is worthy of mention because of the thought its designers gave to its error recovery methods. It is used later as a basis of comparison for a compiler produced by the system described below. WATFOR (23, 87) and DITRAN (75) are both FORTRAN-based compilers designed to improve error detection and diagnosis at all of the levels 3, 4, and 5 mentioned earlier. Their goal is to achieve rapid turnaround and throughput for student programs and to give as many error detection aids as possible, described as clearly as possible. Both seem to have achieved this goal reasonably well. Evans (34) describes a Floyd production language compiler for ALGOL 60 which has a fairly complete set of error recovery productions. Chapter fifteen of Gries (49) gives some good principles for error recovery in several parsing algorithms; these are aimed at helping in the construction of specific compilers rather than at general automatic techniques.

Discussion of error recovery techniques is either absent or incomplete in most other compiler and system descriptions. However, their references are included here to document the compiler building field more fully for subsequent readers.
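As an illustration of the single-edit spelling correction attributed to Morgan above, the following sketch checks whether a misspelled word is one letter change, insertion, deletion, or adjacent-letter interchange away from a reserved word. Only the four edit classes come from the description above; the reserved-word list, the function names, and the rule of rejecting ambiguous matches are assumptions made for this example, not details of Morgan's algorithm.

```python
from typing import Optional


def within_one_edit(written: str, keyword: str) -> bool:
    """True if `written` equals `keyword` or differs from it by one letter
    change, one letter insertion, one letter deletion, or one interchange
    of two adjacent letters."""
    if written == keyword:
        return True
    if len(written) == len(keyword):
        diffs = [i for i, (a, b) in enumerate(zip(written, keyword)) if a != b]
        if len(diffs) == 1:
            return True  # single letter change
        if len(diffs) == 2:
            i, j = diffs
            # interchange of two adjacent letters
            return j == i + 1 and written[i] == keyword[j] and written[j] == keyword[i]
        return False
    if abs(len(written) - len(keyword)) != 1:
        return False
    # Single insertion or deletion: removing one letter from the longer
    # string must yield the shorter one.
    shorter, longer = sorted((written, keyword), key=len)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def correct(word: str, vocabulary: list[str]) -> Optional[str]:
    """Return `word` if it is already in `vocabulary`; otherwise return the
    single entry within one edit of it, or None when there is no candidate
    or more than one (an ambiguous correction)."""
    if word in vocabulary:
        return word
    matches = [k for k in vocabulary if within_one_edit(word, k)]
    return matches[0] if len(matches) == 1 else None


# Hypothetical reserved-word list, chosen only for this example.
KEYWORDS = ["BEGIN", "END", "IF", "THEN", "ELSE", "FOR", "STEP", "UNTIL", "DO"]

print(correct("BEGN", KEYWORDS))    # BEGIN  (a letter was left out)
print(correct("TEHN", KEYWORDS))    # THEN   (adjacent letters interchanged)
print(correct("UNTILL", KEYWORDS))  # UNTIL  (an extra letter was typed)
print(correct("XYZZY", KEYWORDS))   # None
```

Refusing to choose between two equally close keywords is a deliberately conservative choice. A compiler that guesses wrongly can produce the misleading "poor" recoveries that the effectiveness measure of Section 1.4 penalizes.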
Other work in the area of translator writing systems, parser generators, and recognition algorithms includes that of Brooker and Morris (12, 13, 14, 15, 16, 17, 81), Cheatham (19), Schneider and Johnson (85), Backes (3), DeRemer (26, 27), Earley (28, 29, 30), Eickel (31, 32), Ferentzy and Gabura (39), Floyd (40, 41, 42, 44), Ingerman (53), Korenjak (63), Ungar (91), and Warshall (95). Other descriptions of compilers for specific languages can be found in Irons (54), Kanner et al. (61), Randell and Russell (79), Resnick and Sable (80), Uracheva (92), and Wirth and Weber (99). Syntactic and semantic specification are discussed in Feldman (36, 37), Ledgard (67), Machado (71), Schorre (86), and Whitney (96). Some special topics relating to parsing and compilers can be found in Cohen and Nguyen-Dinh (21), Gries (48), Kahan and Dumas-Primbault (61), Rosenberg (82), and Samelson and Bauer (84). Articles of a survey or general nature include Cheatham and Sattley (20), Feldman and Gries (38), Floyd (43), Irons (56), Lomet (70), and Samelson (83).

The main point to be derived from all of these references is that there has been much ad hoc work on syntactic error recovery, some of it not very effective, but surprisingly little of an automatic or algorithmic nature.

1.3 Philosophy of This Work

Since the programmer was actually trying to produce a string of symbols which would constitute a legal program, the strings of symbols given to a compiler will ordinarily differ only slightly from legal strings. Hence the error recovery need only concern itself with the set of strings which differ in minor ways from a valid program, not with the entire set of possible strings. In the method described below, small perturbations of the string in the vicinity of the error are considered in an attempt to find one which is syntactically correct.
However, provision must be made for some recovery technique for cases in which the string is further from a valid string than any of the attempted perturbations. The results reported in later chapters support this approach.

There has been discussion about whether a compiler should attempt to "fix" errors. Advocates say that as long as the compiler can make some sense out of the program, it should go ahead, produce something, and run it, so that the programmer can find as many errors from all levels as possible on each run. Opponents say that the programmer will come to depend on the compiler to fix his errors and so become sloppy in his programming, or that an incorrect program will be compiled and run without any indication that it contains an error.

The error recovery system described in this thesis "corrects" errors but is not affected by this debate, for two reasons. First, it is part of a compiler building system, and the decision whether the compiler should attempt to run a program containing syntactic errors is left to the language designer. In all cases in which the system has actually been used, the compilers did not run programs with syntactic errors. Second, the error correction arises from a different motivation. The aim is to tell the programmer as clearly as possible what his error is, and to let the compiler continue parsing with minimum loss of continuity, so that all the syntactic errors are detected on the first run and described accurately. The compiler's attempt to fix an error is a by-product of analyzing the error situation thoroughly enough to tell the programmer exactly what his mistake was and to direct the parser along the most desirable parsing path.
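The idea stated at the start of this section, trying small perturbations of the string near the error, can be illustrated with a minimal sketch. It is not the procedure developed in this thesis, which relies on a table of the symbols that can follow a construct. The toy grammar, the names `first_error` and `perturb`, and the "furthest progress" scoring rule are all hypothetical and serve only to show the shape of such a search: try every single deletion, insertion, or replacement at the point of error, keep the one that lets the parse get furthest, and signal failure when none helps.

```python
from typing import Optional

# Hypothetical toy language, assumed only for this illustration:
#   program   ::= statement { statement }
#   statement ::= id := id ;
STATEMENT = ["id", ":=", "id", ";"]
TERMINALS = ["id", ":=", ";"]


def first_error(tokens: list[str]) -> Optional[int]:
    """Index of the first token that cannot extend a valid prefix,
    len(tokens) if the input ends too early, or None if it is valid."""
    for i, tok in enumerate(tokens):
        if tok != STATEMENT[i % len(STATEMENT)]:
            return i
    if not tokens or len(tokens) % len(STATEMENT) != 0:
        return len(tokens)
    return None


def perturb(tokens: list[str]) -> Optional[tuple[list[str], str]]:
    """Try each single-symbol deletion, insertion, or replacement at the
    point of error and keep the one that lets the parse get furthest."""
    err = first_error(tokens)
    if err is None:
        return tokens, "no error"

    candidates: list[tuple[list[str], str]] = []
    if err < len(tokens):
        candidates.append((tokens[:err] + tokens[err + 1:], f"delete {tokens[err]!r}"))
    for sym in TERMINALS:
        candidates.append((tokens[:err] + [sym] + tokens[err:], f"insert {sym!r}"))
        if err < len(tokens) and sym != tokens[err]:
            candidates.append(
                (tokens[:err] + [sym] + tokens[err + 1:],
                 f"replace {tokens[err]!r} with {sym!r}")
            )

    def score(candidate: tuple[list[str], str]) -> tuple[int, int]:
        e = first_error(candidate[0])
        # A fully valid string outranks any string that still has an error;
        # otherwise a later first error means more of the input was accepted.
        return (1, 0) if e is None else (0, e)

    best = max(candidates, key=score)
    if score(best) <= (0, err):
        # No small perturbation gets past the error. With this toy grammar,
        # inserting the expected symbol always does, so this branch only
        # matters for richer grammars, where a coarser recovery (such as
        # skipping input symbols) would have to take over.
        return None
    return best


print(perturb(["id", "id", ";"]))         # (['id', ':=', 'id', ';'], "insert ':='")
print(perturb(["id", ":=", "id", ":="]))  # (['id', ':=', 'id', ';'], "replace ':=' with ';'")
```

Ranking candidates by how far the parse then proceeds is only one plausible heuristic. The `None` result stands in for the "provision for some recovery technique" that the text requires when the string is further from valid than any attempted perturbation.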
1.4 Measurement of Effectiveness of Error Recovery

What is needed is a method of measuring the effectiveness of error recovery that gives a numerical basis to the intuitive notions of more and less effective error recovery. To arrive at such a measure, we must begin with the goals of error recovery. One goal is to get the compiler parsing again. This is actually part of achieving the other goals and so can be ignored for the moment. The ultimate goal of error recovery is to minimize the programmer's effort in getting a syntactically correct program. This effort is determined by the number of runs he is required to make and by the clarity of the error messages, that is, the time and energy he spends understanding the messages he receives. The clarity of the messages is somewhat subjective, since it depends to some extent on the programmer. The number of runs, however, is determined by two things: the number of errors missed and the number of extraneous error messages. When extraneous messages occur in large numbers, they have the same effect as missed errors, since the programmer gives up searching through them for the genuine ones. Missed errors result from the recovery having to skip symbols, which in turn is largely the result of incorrect or poor recoveries. Extraneous errors are also the result of incorrect or poor recoveries. The clarity of the messages depends on the way they are presented and on their content, but also on the accuracy of the recovery and, to some extent, on the confusion caused by extraneous error messages. Figure 1 illustrates this interdependence of variables: each higher item is influenced by the lower items whose arrows point to it.
[Figure 1. Interdependence of Error Recovery Variables. Items, from top to bottom: programmer's effort to correct syntactic errors; number of runs required; clarity of error descriptions; number of extra errors; number of errors missed; number of incorrect recoveries; effectiveness of compiler's syntactic error recovery.]

To measure the effectiveness of the error recovery of a particular compiler on a given set of runs of one or more programs, the examples described later use the following easily obtained quantities:

M - the number of errors missed in all runs,
N - the number of errors detected, and
X - the number of extraneous errors produced which were the result of the recovery and not of any direct action on the programmer's part.

Further, the recoveries from the detected errors were rated on a 4-level scale:

E - excellent (the recovery identified the string the programmer actually intended and told him precisely what was wrong),
G - good (the recovery was not exact, but close enough that the programmer could tell almost as easily as in the former case what the error actually was),
F - fair (the recovery told the programmer where there was an error, but what it said about the error did not help him much in identifying it),
P - poor (the recovery could easily have misled the programmer as to the cause of the error, making him choose the wrong correction or look in the wrong place).

The following formula measures the effectiveness of the error recovery of a compiler for a given set of runs. It has the value 0 for a set of recoveries in which all errors were missed, and the value 1 for a set of recoveries in which all errors were detected with a rating of "E" and no extra errors were created. The measure is proportionately reduced by extra errors and by poorer recoveries.

$$\text{effectiveness} = \frac{E + \frac{3}{4}G + \frac{1}{2}F + \frac{1}{4}P}{N + M} \cdot \frac{N}{N + X}$$

Here E, G, F, and P are the numbers of error recoveries with ratings of "E", "G", "F", and "P" respectively, so that E + G + F + P = N. This formula gives a measure of the effectiveness of error recovery which corresponds to intuitive notions of effectiveness and is based on numerical quantities.

1.5 Overview of the Thesis

The rest of this thesis is structured as follows.

Chapter 2 describes the compiler building system. This consists of the syntax description language TWINKLE, the TWINKLE compiler, the modified Floyd production language, the algorithm that converts the Backus-Naur form productions output by the TWINKLE compiler into modified Floyd productions, the operation of the parser, and the optimization of the parsing tables.

Chapter 3 then discusses the error recovery operation in the parser and the changes made to the parsing table by the preprocessor for error recovery.

The DEMALGOL test compiler is described in chapter 4, and the results of its error recovery on actual programs are compared with those of the Burroughs ALGOL compiler and the ALCOR Illinois 7090 ALGOL compiler.

Chapter 5 presents the results for three other compilers implemented on this system; for one of these, another compiler already existed with which it could be compared. The results of these four compilers demonstrate not only that automatic error recovery based on the syntax of the language is possible, but also that it can come reasonably close to the ultimate goal of finding all the syntactic errors in the program and describing each one accurately and clearly.

Chapter 6 summarizes observations, based on all the examples that were run, which suggest language design considerations that improve error recovery in this system.

Chapter 7 discusses the changes necessary to extend this method of automatic error recovery to recursive descent parsing; the primary factor here is the detection of the error.
A compiler for the DEMALGOL language was built with a recursive descent compiler-building system and modified according to the algorithms presented in chapter 7. This compiler, and its results on the same programs used for the compiler described in chapter 4, are described in chapter 8. Additional language design considerations for recursive descent parsing are given in chapter 9, and chapter 10 gives some applications and directions for further research which can build on the work reported here.

CHAPTER 2. THE TRANSLATOR WRITING SYSTEM

2.1 Overall System Description

The translator writing system in which this system of syntactic error recovery has been developed consists of five programs and one file of procedures written in Burroughs extended ALGOL for the B5500. These components are:

1. TWINKLE/DISK. A compiler for the syntax description language TWINKLE which produces a Backus Naur form description of the language being processed.
2. BNF2FPL/TWS. A program which converts the Backus Naur form productions into modified Floyd productions.
3. FPL2PAR/TWS. A program which generates parsing tables from the Floyd productions.
4. PAR2ALG/TWS. A program which generates B5500 ALGOL source code from the parsing tables.
5. ISL/DISK. A compiler which translates the language semantics description written in ISL into B5500 ALGOL source code.
6. TWS/FILES. A file of B5500 ALGOL source code containing the scanning and parsing procedures.

These components will be discussed in separate sections below, except that ISL/DISK will not be discussed, since this thesis is concerned only with syntactic matters. The parsing procedures in TWS/FILES will be discussed along with the parsing table generation program FPL2PAR/TWS. The error recovery techniques which are the contribution of this thesis are described in chapter 3, but they involve components 2, 3, and 6 above.
A discussion of the structure of this system, the semantics language ISL, and the structure of compilers built with this system can be found in Machado (71).

2.2 The TWINKLE Syntax Description Language and Compiler

The TWINKLE language is based on Backus Naur form, but it extends Backus Naur form in two ways. First, it has additional forms of expression which simplify the definition of several language structures, such as lists and sets of symbols. An example of a list is the <compound tail> in ALGOL, which is a list of <statement>s separated by ";". An example of a symbol set is "any character except ;". Second, it allows the language description to be English-like, so that the syntactic description used to construct the compiler can also be given to users of the language to explain its syntactic structure. For example, the two examples given above could be stated "A <COMPOUND TAIL> CONSISTS OF A LIST OF <STATEMENT>S SEPARATED BY SEMICOLONS" and "ANY CHARACTER BUT SEMICOLON". The English form is not required, and some very compact symbolic notations are also allowed.

The TWINKLE compiler produces a Backus Naur form description of the language, inserting dummy productions where necessary so that the grammar suits the algorithm that converts Backus Naur form to modified Floyd production language. A complete description of the TWINKLE language and of the TWINKLE to Backus Naur form conversion can be found in Mercer (73).

2.3 Conversion of the Backus Naur Form of the Language to Modified Floyd Production Language

The next step in the syntax preprocessing is the construction of modified Floyd productions from the Backus Naur form (BNF) of the language. These productions are arranged in groups which correspond to specific situations in the BNF grammar.
This section describes these Floyd productions, the significance of their grouping, and their generation from the BNF grammar.

Each modified Floyd production consists of four parts:

(1) a test of the current symbol, i.e. either the current symbol from the input source program or a nonterminal to which a reduction was just made,
(2) a forward context test of the next few symbols of the input source program beyond the current symbol,
(3) a set of semantic actions, and
(4) a set of parser actions.

Any of these parts except the parser action part may be empty. The parser action part consists of four different types of actions:

(1) whether or not the stack that is maintained for semantic purposes is reduced,
(2) whether or not a new symbol from the input source program is put on top of the semantics stack,
(3) what action, if any, is taken with regard to a marker stack, i.e. whether one or more markers are popped, a new marker is pushed, or neither, and
(4) which group of productions is to be attempted next.

The significance of the markers is explained below.

These modified Floyd productions are arranged in groups, some of which have an error production at the end. Testing usually begins with the first production in the group. Figure 2 illustrates a typical Floyd production group. In it:

α1 and α2 are the current symbol or stack tests,
β1 and β2 are the lookahead tests,
→N_j means reduce the parsing or semantics stack to nonterminal N_j,
←N'_m means push marker N'_m onto the marker stack,
* means take a new symbol from the input source program and put it on top of the semantics stack, and
G_k, G_n, and G_p are the names of the next groups to be executed.

    G_i:  α1 ; β1 | →N_j       G_k
          α1 ; β2 | ←N'_m  *   G_n
          α2      | *          G_p
          ERROR

    Figure 2. Example of a Floyd Production Group

Four basic types of Floyd production groups are generated, corresponding to four different situations in the BNF grammar. To see how these groups are determined, consider Figures 3 and 4. Figure 3 gives the BNF productions for a small language, and Figure 4 gives the Floyd productions generated for this language.

    A → C B
    B → B d
    B → D E
    C → F
    F → f
    D → F g
    E → e

    Figure 3. A Small BNF Grammar

    TH-A:         f | →F
    TH-B:         f | →F
    TH-E:         e | →E
    NTH-A:        F | →C
                  C | ←B'  TH-B
    NTH-B:        F | TPN(6,2)
                  D | ←E'  TH-E
    NTPN(1,2)-B:  B | d  TPN(2,2)
                  B | →A
    NTPN(3,2)-E:  E | →B
    TPN(2,2):     d | →B
    TPN(6,2):     g | →D

    Figure 4. Modified Floyd Productions of Grammar in Figure 3

There is one group for each terminal symbol in a non-first position, called TPN for "terminal as the Nth symbol (N > 1) of production P". There is also one group for each nonterminal symbol in a non-first position, called NTPN for "nonterminal as the Nth symbol (N > 1) of production P". Each nonterminal symbol which has a non-first occurrence has two further groups:

a TH group (for "terminal head symbol") containing every terminal symbol that is in the first position of some derivation from this nonterminal, and
an NTH group (for "nonterminal head symbol") containing every nonterminal symbol that is in the first position of some derivation from this nonterminal.

Figures 3 and 4 give examples of all four kinds of groups.

The significance of these groups in the parser is as follows. Whenever the portion of a production to the left of a terminal symbol has been identified, the TPN group is executed. (Since this group has just one production, it is put in as a continuation of the current Floyd production.)
If the next symbol is a nonterminal symbol, a marker is put on the marker stack indicating the nonterminal, the NTPN group to apply when this nonterminal is recognized, and the NTH group for this nonterminal. Control is then transferred to the TH group for this nonterminal. When the symbol on the right end of a BNF production is recognized, the next Floyd production is determined by comparing the name on the left-hand side of the BNF production with the name of the top marker in the marker stack. If the names are the same, the NTPN group indicated by the marker is the next group executed; otherwise, the NTH group indicated by the marker is executed.

Preclusion conflicts between Floyd productions are resolved by forward contextual analysis or by the formation of new groups which are the combination of the next groups for each production in conflict. This gives rise to combined markers and to CTPN, CNTPN, CTH, and CNTH groups in addition to the groups mentioned above. In some cases, the grammar is ambiguous because the resolution of the conflict depends on semantic information; for example, the same symbol may have to be reduced to one of two different nonterminals depending on information stored in an identifier table in the semantics. A semantic test is used in this case to resolve the conflict. In the parser, this is the same as a regular semantic action call except that, when control returns from the semantic routine, the global Boolean variable SEMANTICTEST is tested. If this variable is false, this production is terminated and the next one is tried. In the BNF to FPL conversion, any semantic test is assumed to resolve any conflict associated with the stack symbol preceding the test in the BNF. The details of this algorithm for converting BNF to modified Floyd production language can be found in Beals (7,8). The program BNF2FPL/TWS also makes some small optimizations to the set of Floyd productions it generates.
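The marker rule given at the start of this passage (push a marker when a nonterminal is sought; on each reduction, compare the reduced name with the top marker) can be traced through Figure 4 with a short Python sketch. The three `Marker` fields mirror the three items the text says a marker indicates; the function names and the trace, which follows the input "f g e" after C has been recognized, are illustrative assumptions, and popping a marker is shown as an explicit call rather than as a parser action field.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Marker:
    name: str        # nonterminal being sought
    ntpn_group: str  # group to apply when that nonterminal is recognized
    nth_group: str   # group to apply after any other reduction


def seek(nonterminal: str, ntpn_group: str, markers: List[Marker]) -> str:
    """Push a marker for `nonterminal` and return the TH group to run next."""
    markers.append(Marker(nonterminal, ntpn_group, f"NTH-{nonterminal}"))
    return f"TH-{nonterminal}"


def after_reduction(reduced_to: str, markers: List[Marker]) -> str:
    """Choose the next group by comparing the reduction with the top marker."""
    top = markers[-1]
    return top.ntpn_group if reduced_to == top.name else top.nth_group


markers: List[Marker] = []
trace = [seek("B", "NTPN(1,2)-B", markers)]      # C | <-B' TH-B
trace.append(after_reduction("F", markers))      # f | ->F   (F is not B)
trace.append(after_reduction("D", markers))      # g | ->D   (D is not B)
trace.append(seek("E", "NTPN(3,2)-E", markers))  # D | <-E' TH-E
trace.append(after_reduction("E", markers))      # e | ->E   (E matches E')
markers.pop()                                    # E' is popped when E -> B
trace.append(after_reduction("B", markers))      # B matches B'
print(trace)
```

Each comparison needs only the top marker, which is why reductions never require searching the marker stack.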
One of the optimizations made by BNF2FPL/TWS is placing a TPN production in sequence after the production which calls it, as mentioned above. Another is using just one group for several which are identical. This helps reduce the number of Floyd productions generated and sometimes leads to further optimizations. Also, a table of following symbols is constructed; it will be referred to again when the error situations in which it is used are discussed. This table contains a bit row for each marker symbol and each TPN production. Each row has marked all the terminal symbols which can immediately follow the symbol which references that row.

At the conclusion of BNF2FPL/TWS, a file of Floyd production groups has been created which corresponds to the BNF grammar for the language. Also created is a table of following symbols which will be used by the error recovery procedures as described in Chapter 3.

2.4 Construction and Use of Parser Instruction Tables

The completed file of modified Floyd productions is then converted into a table of parser instructions by the program FPL2PAR/TWS. The format of this table is given in Figure 5. Also at this time, four other tables are created or added to: tests for lookaheads of length greater than one are put in one table; multiple semantic action calls are put in another; the combined markers are put in a third; and additions are made to the symbol set table. The meaning and use of these parser instructions and tables are discussed briefly in this section.

The syntax of the parser language is given in Figure 6. Except for the error production or finis production, each production has four parts: a stack test, a lookahead test, semantic actions, and parser actions. A successful test in each part causes the parser to move to the next part. If the stack test fails, then one of two actions is taken, depending on which parser instruction made the stack test. If the instruction was the XSBS or XSBB instruction of Figure 6, then the stack test failure skip number (Figure 5) is used to determine the distance down the table to the next production to be applied. If the instruction was the XSIS or XSIB instruction of Figure 6, an error procedure is called to insert the missing symbol and the next part of this production is applied.

The null stack tests (NONE, CNONE, SKIP, ILVL, and FDNT) are always successful. SKIP is used in those groups which have stack tests, namely TH, NTH, CTH, CNTH, and CTPN. NONE is used in groups without stack tests, NTPN and CNTPN. ILVL is used in TPN productions. FDNT is used on the first nonrecursive production in an NTPN or CNTPN group, and CNONE is used in CNTPN groups following the production with FDNT. The purpose of all these different null stack tests is to allow TPN productions to be skipped over after a false lookahead or semantic test, to set flags, and to serve as markers for the error recovery mechanism.

If a lookahead test XLS, XLB, or LK fails, then the next non-TPN production in sequence is applied. This production will have a stack test, so another lookahead test will be applied. If a lookahead test XLRR or XLNR fails, then an error procedure is called to recover from the error in the next symbol. This procedure resets the production pointer to the appropriate production following the recovery measures.

If a semantic test fails, the production pointer is incremented to point to the next production, unless this production is the last production for this stack test symbol. In that case, an error message is printed and the test is considered true.

The parser action section consists of four possible separate actions. A number of markers may be popped from the marker stack; a new marker may be pushed onto the marker stack; the semantics stack may be reduced; and a new address in the production table is taken in one of four ways:

1. If a reduction to a nonterminal is made, then the address is taken from the marker symbol according to whether or not the nonterminal matches the marker name;
2. A new input symbol is taken and the next production in sequence is applied;
3. A next address is given for the next production;
4. A new input symbol is taken and a next address is given for the next production.

If transfer case 2 occurs, the next production is a TPN production and is considered to be part of this one. Its stack test will be XSIS, XSIB, or ILVL, and such productions are skipped over in the XLS, XLB, LK, and semantic test failures mentioned above.

    First word:   lookahead symbol; lookahead operator; stack test operator;
                  stack test symbol; stack test failure skip number or error table row
    Second word:  semantic test error bit; semantic action operator; transfer directly bit;
                  scan-a-new-symbol bit; push-a-marker bit; reduce semantics stack bit;
                  pop-markers bit; number of additional action calls; semantic action symbol;
                  number of markers popped; next address for direct transfer
    Third word    error table row; NTPN group address or number of markers combined;
    (marker       NTH group address; combined marker bit;
    symbol):      symbol or combined marker group row

    Format of Parser Instruction Table Entries on Burroughs B5500
    Figure 5

    Stack tests that skip on failure:           XSBS, XSBB
    Stack tests that signal an error on failure: XSIS, XSIB
    Null stack tests:                           NONE / CNONE / SKIP / ILVL / FDNT
    Lookahead tests:                            XLS, XLB, LK
    Error lookahead tests:                      XLRR / XLNR
    Semantic actions:                           ACT / TEST / MANY / empty
    Pop markers:                                POP / empty
    Reduce semantics stack:                     RED / empty
    Push a marker:                              PUSH / empty
    Next production:                            DNT <nonterminal symbol> / SCAN / GO / SCAN GO
    Error productions:                          ERRN <nonterminal symbol> / ERRR / ERNR
    Finis production:                           EXIT

    Syntax of the Parser Language
    Figure 6

This section has given a basic description of the modified Floyd production language as implemented in the parser, the parsing operators, and how those operators are used.

2.5 Optimization of the Parser Instruction Table

The parser represented by the parsing table is considerably optimized over the set of Floyd productions produced by the conversion from BNF. This section describes four optimizations which FPL2PAR/TWS makes to the parsing table:

1. combining single symbol tests into symbol set tests where possible;
2. using one group of Floyd productions for two or more which are identical;
3. eliminating unnecessary markers and stack reductions; and
4. introducing direct transfers within NTH groups.

Wherever possible in stack tests or lookahead tests, sequential tests are combined into single set membership tests, sometimes creating new symbol sets which are added to the symbol set table. Lookahead tests of one symbol can always be combined into one symbol set test, as can the nth symbol tests (n = 2, 3, or 4) of a multiple symbol lookahead test of length n.
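The first optimization, merging consecutive single-symbol tests into one symbol set test, can be sketched as follows. This is a hypothetical Python model, not FPL2PAR/TWS: a production is reduced to a stack-test symbol set plus its other three parts, and two consecutive productions are merged only when their lookahead test, semantic actions, and parser action are all identical. The example uses group TH-D of Figure 8.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Production:
    stack_test: FrozenSet[str]   # one symbol, or a symbol set after merging
    lookahead: Tuple[str, ...]   # empty tuple = no lookahead test
    semantics: Tuple[str, ...]   # empty tuple = no semantic action
    parser_action: str

    def rest(self) -> Tuple[Tuple[str, ...], Tuple[str, ...], str]:
        """Everything except the stack test."""
        return (self.lookahead, self.semantics, self.parser_action)


def combine_stack_tests(group: List[Production]) -> List[Production]:
    """Merge runs of consecutive productions that differ only in the stack test."""
    out: List[Production] = []
    for p in group:
        if out and out[-1].rest() == p.rest():
            prev = out.pop()
            out.append(Production(prev.stack_test | p.stack_test,
                                  p.lookahead, p.semantics, p.parser_action))
        else:
            out.append(p)
    return out


def single(sym: str, action: str) -> Production:
    """A production with a one-symbol stack test and no lookahead or semantics."""
    return Production(frozenset({sym}), (), (), action)


th_d = [single("e", "->D"), single("f", "->D")]
print(combine_stack_tests(th_d))  # one production: stack test {e, f}, action ->D
```

Only adjacent productions are merged, because production order within a group matters to the parser; productions such as those of TH-C in Figure 8, which push different markers, stay separate.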
The stack tests of two consecutive productions in a group can be combined into a symbol set test if all the other corresponding parts (lookahead tests, semantic actions, and parser actions) are identical. Each new NTPN and CNTPN group is compared with the previous ones for that nonterminal. If it has the same set of parser instructions, then that group is identified with the previous one.

After the table is initially completed, all reductions and markers that are not needed syntactically or semantically are removed. For example, consider the BNF syntax of Figure 7 and its Floyd production counterpart in Figure 8. First, the tests for e and f can be combined, so group TH-D of Figure 8 becomes "TH-D: {e,f} →D". Then groups NTPN-D2 and NTPN-D3 are the same, so NTPN-D2 can serve for both. Since B and C each have only one NTPN group, which has a reduction as its parser action and contains no lookahead test or semantic action, and since they do not occur in an NTH stack test, their markers are not needed. Also, since B and C each have only one NTPN group and do not occur in an NTH group, and since the one production in each NTPN group contains no lookahead test or semantic action, the reductions to B and C can be replaced with the parser actions of their respective NTPN productions and the NTPN groups eliminated. After eliminating B and C in order, we have the productions of Figure 9. Now groups NTPN-D1 and NTPN-D2 are identical, and so we have only one NTPN-D group.
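Identifying an NTPN group with an earlier identical group for the same nonterminal amounts to keeping one canonical name per distinct production list. The Python sketch below is illustrative only: group names follow Figure 8, productions are plain strings, and the returned alias map sends each later duplicate to the first group with the same nonterminal and the same productions.

```python
from typing import Dict, List, Tuple


def share_identical_groups(
    groups: List[Tuple[str, str, Tuple[str, ...]]]
) -> Dict[str, str]:
    """groups: (group name, nonterminal, productions) in table order.
    Returns a map from every group name to the group actually used."""
    first_seen: Dict[Tuple[str, Tuple[str, ...]], str] = {}
    alias: Dict[str, str] = {}
    for name, nonterminal, productions in groups:
        key = (nonterminal, productions)  # compare only within one nonterminal
        alias[name] = first_seen.setdefault(key, name)
    return alias


# NTPN groups for D in Figure 8.
ntpn_d = [
    ("NTPN-D1", "D", ("D ->B",)),
    ("NTPN-D2", "D", ("D ->C",)),
    ("NTPN-D3", "D", ("D ->C",)),
]
print(share_identical_groups(ntpn_d))
# {'NTPN-D1': 'NTPN-D1', 'NTPN-D2': 'NTPN-D2', 'NTPN-D3': 'NTPN-D2'}

# After B and C are eliminated (Figure 9), both remaining groups read D ->A.
ntpn_d_after = [("NTPN-D1", "D", ("D ->A",)), ("NTPN-D2", "D", ("D ->A",))]
print(share_identical_groups(ntpn_d_after))
# {'NTPN-D1': 'NTPN-D1', 'NTPN-D2': 'NTPN-D1'}
```

Running the comparison again after other optimizations, as in the second call, is what lets one optimization create opportunities for another.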
    <A> ::= a <B>
    <B> ::= b <C> / c <D>
    <C> ::= c <D> / d <D>
    <D> ::= e / f

    A BNF Grammar
    Figure 7

    TH-A:      a   ←B'    TH-B
    TH-B:      b   ←C'    TH-C
               c   ←D1'   TH-D
    TH-C:      c   ←D2'   TH-D
               d   ←D3'   TH-D
    TH-D:      e   →D
               f   →D
    NTPN-B:    B   →A
    NTPN-C:    C   →B
    NTPN-D1:   D   →B
    NTPN-D2:   D   →C
    NTPN-D3:   D   →C

    Floyd Productions for Grammar in Figure 7
    Figure 8

D now satisfies the same criteria that B and C satisfied, so the D markers, the reductions to D, and the NTPN-D group can be eliminated, leaving the set of productions of Figure 10 as the final set for the parser. Although the system currently implemented (9) does not do it, group TH-C could become "TH-C: {c,d} TH-D".

If the NTPN group (or the production in an NTH group, if the single nonterminal occurrence is in an NTH group instead of an NTPN group) consists of several productions distinguished by lookahead or semantic tests, then the group must be retained, but it may be transferred to directly without reference to the marker stack. If these productions are in an NTH group instead of an NTPN group, they are removed from the NTH group and made into a special NTPN-like group to be transferred to.

After all of the above has been completed, the reductions which remain in an NTH group are changed so that the parser action transfers directly to the applicable production within the group without looking at the marker, as long as the nonterminal named in the reduction is not the same as the nonterminal name of the group (which would require the NTPN group of the top marker). Further, if the new production has no lookahead test or semantic action and is not followed by a TPN production, then the parser action field of the new production is copied into the former production. As with the previous process, this process is iterated until all possible backsubstitutions have been made. For example, the group in Figure 11 would become the one in Figure 12.
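The transformation from Figure 11 to Figure 12 can be reproduced with a small fixpoint loop. In this hypothetical Python model a parser action is an opaque string, a plain reduction is written "->X", and "->X 103" stands for "reduce, then transfer directly to address 103", as in Figure 12. As a simplification, the sketch copies the target's parser action wholesale instead of combining the fields (adding marker pop counts and OR-ing the flags).

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Prod:
    addr: int
    stack: str
    action: str                        # "->X" reduces to X; anything else is opaque
    followed_by_tpn: bool = False
    has_lookahead_or_semantics: bool = False
    is_tpn: bool = False


def reduction_target(action: str) -> Optional[str]:
    """Return X for a plain reduction "->X", else None."""
    if action.startswith("->") and " " not in action:
        return action[2:]
    return None


def optimize_transfers(group_nt: str, group: List[Prod]) -> List[Prod]:
    prods = [replace(p) for p in group]            # work on copies
    by_stack = {p.stack: p for p in prods if not p.is_tpn}
    for _ in range(len(prods)):                    # bounded fixpoint iteration
        changed = False
        for p in prods:
            x = reduction_target(p.action)
            if x is None or x == group_nt or x not in by_stack:
                continue                           # must still consult the marker
            target = by_stack[x]
            if target.followed_by_tpn or target.has_lookahead_or_semantics:
                p.action = f"->{x} {target.addr}"  # reduce, then jump directly
            else:
                p.action = target.action           # copy the parser action
            changed = True
        if not changed:
            break
    return prods


FIG11 = [
    Prod(101, "B", "->A"),
    Prod(102, "C", "<-Z' TH-Z"),
    Prod(103, "D", "TPN", followed_by_tpn=True),
    Prod(104, "d", "<-Y' TH-Y", is_tpn=True),
    Prod(105, "E", "->C"),
    Prod(106, "F", "->B"),
    Prod(107, "G", "->F"),
    Prod(108, "H", "->G"),
    Prod(109, "I", "->D"),
]
for p in optimize_transfers("A", FIG11):
    print(p.addr, p.stack, p.action)
```

The chain G → F → B → A collapses in a single pass here only because the productions are visited in table order; the outer loop covers orders in which a later backsubstitution enables an earlier one.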
Combining parser actions includes adding the numbers of markers popped, using the next address and the marker or reduction name of the second production, and OR-ing the rest of the parser action fields.

    TH-A:      a       TH-B
    TH-B:      b       TH-C
               c       ←D1'   TH-D
    TH-C:      c       ←D2'   TH-D
               d       ←D2'   TH-D
    TH-D:      {e,f}   →D
    NTPN-D1:   D       →A
    NTPN-D2:   D       →A

    Partial Optimization of Productions in Figure 8
    Figure 9

    TH-A:   a       TH-B
    TH-B:   b       TH-C
            c       TH-D
    TH-C:   c       TH-D
            d       TH-D
    TH-D:   {e,f}   →A

    Complete Optimization of Productions in Figure 8
    Figure 10

    101   NTH-A:   B   →A
    102            C   ←Z'   TH-Z
    103            D   TPN
    104            d   ←Y'   TH-Y
    105            E   →C
    106            F   →B
    107            G   →F
    108            H   →G
    109            I   →D

    An NTH Group Before Optimization of Transfers
    Figure 11

    101   NTH-A:   B   →A
    102            C   ←Z'   TH-Z
    103            D   TPN
    104            d   ←Y'   TH-Y
    105            E   ←Z'   TH-Z
    106            F   →A
    107            G   →A
    108            H   →A
    109            I   →D    103

    The NTH Group After Optimization of Transfers
    Figure 12

The following summarizes the optimization criteria:

1. Set tests can be used in the stack test field of productions in TH, CTH, and CTPN groups as long as the three other fields (lookahead test, semantic action, and parser action) are identical for each production combined.
2. A previous NTPN or CNTPN group may be used instead of a later one if they consist of the same set of productions.
3. A reduction may be replaced with a direct transfer if the nonterminal named by the reduction occurs only once, either as an NTPN group or in only one NTH group. In the latter case, the production or productions are removed from the NTH group and made into a separate special group. Further, if there is just one production, possibly followed by TPN productions, and it has no lookahead test and no semantic actions, then its parser actions can be combined with the parser actions of the production making the reduction, and the NTPN or NTH production removed. (If there were TPN productions, they must be copied after the production which made the reduction.)
   This optimization and the previous one interrelate in that the application of one of them can create the conditions for the other. A marker can be removed if the above conditions for removing reductions to that nonterminal are met, the NTPN production has no semantic action and its parser action is a reduction, and there is no NTH group for this nonterminal. The conditions for this optimization can arise during application of the previous optimization, since an NTH group disappears if all its productions are either eliminated or made into special groups.
4. Within NTH groups, any reduction to any nonterminal other than the one for which the group was made must come back to this group. Hence those reductions can be made to transfer directly to the applicable production. If the applicable production has no lookahead test or semantic action and is not followed by a TPN production, the parser action of the production making the reduction can be combined with the parser action of the production it would have transferred to.

2.6 Source Code Generation for the Parsing Tables

The program PAR2ALG/TWS converts the parsing tables to Burroughs ALGOL so that they can be compiled into the object code of the compiler. This eliminates the need to have the file of parsing tables present during execution of the compiler. Burroughs Extended ALGOL FILL statements are used to implement this. Also, multiple symbol lookahead testing maps naturally onto nested conditional statements, so converting the lookahead table to ALGOL code and compiling it in for direct execution, instead of interpreting it, is available as an option.

2.7 Summary

This chapter has discussed the syntax preprocessing portion of the translator writing system, from the language specification in TWINKLE to the parsing tables or code generation output.
More detail was given on the Floyd production generation and the construction of the parsing table, as these are basic to an understanding of the implementation of the error recovery mechanism discussed in the next chapter.

CHAPTER 3. ERROR ANALYSIS IN MODIFIED FLOYD PRODUCTION LANGUAGE

3.1 Introduction to Chapter 3

When an error is detected, three steps are taken: (1) the fact that an error occurred is described to the programmer; (2) appropriate recovery measures are taken; and (3) the recovery is described to the programmer. The recovery measures used are determined by the situation in which the error is detected. There are three possibilities: (1) the absence of a unique required symbol; (2) a stack test error in a group in which all productions have the same parser action; and (3) all other cases, including lookahead errors. Within each of these three situations, several different measures are attempted until a recovery can be made. All three cases may make use of a string generator procedure. These situations and their recovery measures are described in this chapter.

3.2 Error Detection

Because of the choice of parsing algorithm, the identification of syntactic errors is automatic; it is inherent in the algorithm. At the end of a group, the failure of all the stack tests means that the symbol at the top of the stack is not legal at that point, i.e. it is an error. Similarly, at the end of a subgroup or series of lookahead tests, if all of the lookahead tests fail, there is an error in the symbols ahead.

When an error is detected, the procedure TSKTSK is called. This procedure prints the initial part of the error message as follows. If the source program is not being printed, the current card is printed. An "X" is printed under the last character scanned to indicate approximately where on the card the error occurred.
Then a line is printed which is easily seen in a listing because it has asterisks all across the page except for the words describing the error. This line gives the ordinal number of the error, whether or not the error occurred in a lookahead (which is determined by the first parameter passed to TSKTSK), and the name of the symbol the parser was looking at, for example:

    ***** 13TH ERROR IN LOOKAHEAD AT IDENTIFIER WHATISITSNAME ***************

A second Boolean parameter passed to TSKTSK determines whether a second line is printed giving the name of the marker at the top of the marker stack, for example:

    ***** WHILE SEEKING STATEMENT

If the marker happens to be a combined marker, then the names of all the markers combined are printed, for example:

    ***** WHILE SEEKING STATEMENT OR DECLARATION

After printing these lines, procedure TSKTSK is finished and control is returned to the calling procedure for appropriate recovery measures.

3.3 Error Recovery in Case of Unique Terminal Symbols

Recovery from the error is based on which of several error situations is encountered. These situations are identified during the syntax preprocessing, and appropriate parser instructions are placed in the parsing table to signal each particular error situation.

The first situation is that of unique terminal symbols. These are situations in which there is only one stack test. This is the case with TPN productions, and it is the case in TH, CTH, and CTPN groups in which there is only one stack test. This can happen in three ways: there is only one production; there are several productions but all have the same stack test; or there are several productions with different stack tests but they have all been combined into one production with a symbol set test, as described in Section 2.5. The parser stack test instruction used in this case is the XSIS instruction for a single symbol test or the XSIB instruction for a symbol set test.
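Whether a failed stack test leads to this situation depends only on the instruction that made it. The Python sketch below is a simplified model, not the B5500 parser: each table entry holds an operator, the set of symbols it accepts (a one-element set for a single-symbol test), and a skip distance, and any operator that is neither a stack test nor a null stack test is treated as the group's error production. The table contents and the result labels are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

NULL_TESTS = {"NONE", "CNONE", "SKIP", "ILVL", "FDNT"}  # always succeed
SKIPPING_TESTS = {"XSBS", "XSBB"}  # on failure, skip down the table
UNIQUE_TESTS = {"XSIS", "XSIB"}    # on failure, a unique symbol is missing


@dataclass(frozen=True)
class Entry:
    op: str
    symbols: FrozenSet[str] = frozenset()
    skip: int = 1  # distance to the next production when XSBS/XSBB fails


def apply_stack_tests(table: List[Entry], pc: int, top: str) -> Tuple[str, int]:
    """Return ("match", pc) when a stack test succeeds, ("insert", pc) when
    XSIS/XSIB fails, or (op, pc) when the group's error production is reached."""
    while True:
        e = table[pc]
        if e.op in NULL_TESTS:
            return ("match", pc)
        if e.op in SKIPPING_TESTS or e.op in UNIQUE_TESTS:
            if top in e.symbols:
                return ("match", pc)
            if e.op in UNIQUE_TESTS:
                return ("insert", pc)  # PUTINSTACK would be called here
            pc += e.skip
            continue
        return (e.op, pc)  # e.g. ERRR or ERNR: every stack test failed


group = [
    Entry("XSBS", frozenset({"IF"}), skip=2),  # followed by one TPN production
    Entry("XSIS", frozenset({"THEN"})),        # that TPN production
    Entry("XSBB", frozenset({"BEGIN", "DO"})),
    Entry("ERRR"),
]
print(apply_stack_tests(group, 0, "DO"))  # ('match', 2)
print(apply_stack_tests(group, 0, "+"))   # ('ERRR', 3)
print(apply_stack_tests(group, 1, "IF"))  # ('insert', 1)
```

The skip distance lets a failed XSBS jump over the TPN productions that continue the production it guards, so only genuine alternatives are tried.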
With these two instructions, the failure of the test constitutes an error. In this case, after calling TSKTSK, the procedure PUTINSTACK is called to put the correct symbol at the top of the stack. This procedure uses a hierarchy of three levels of attempts to insert the symbol.

The first level uses the table of following symbols, mentioned previously as being created at the same time as the Floyd productions, to determine whether the symbol in error, the subsequent symbol, or the second symbol following can succeed the correct symbol. This information is used to determine whether there is any simple way of fixing the input string to correct the error. This attempt is based on the table of Figure 13. The conditions are tested from top to bottom, and the first one that is satisfied is applied.

    b a γ     →   β a | b γ
    b c γ     →   β a | b c γ
    c a γ     →   β a | γ
    c b γ     →   β a | b γ
    c c a γ   →   β a | γ
    c c b γ   →   β a | b γ

    β = the initial portion of the source string
    γ = the terminal portion of the source string
    a = the symbol that is required at the top of the stack
    b = any symbol which can immediately follow this a
    c = any symbol
    | = the top of the stack
    → = if the condition on the left applies, transform it to the string on the right

    Conditions for Inserting a Symbol
    Figure 13

If none of these conditions is satisfied, the second level is attempted. This level calls a string generator procedure, ERRFPINTERPRETER, which goes through a more sophisticated process to determine whether there is any way to change the string to correct the error. If there is such a way, then the change is made and described to the programmer in the same manner as for the first level. Otherwise the third level is used: the correct symbol is simply inserted in front of the symbol which is there.
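The first and third levels of PUTINSTACK can be sketched directly from Figure 13. In this hypothetical Python version, `rest` holds the source symbols starting with the symbol in error, `required` is a, `follows` is the row of the table of following symbols for a, and the result is the new source sequence beginning with a. Reading the rows as a swap, an insertion, deletions, and replacements is an interpretation of the figure, and the string generator (the second level) is omitted.

```python
from typing import List, Optional, Set, Tuple


def putinstack_first_level(required: str, follows: Set[str],
                           rest: List[str]) -> Tuple[str, List[str]]:
    """Try the rows of Figure 13 from top to bottom; if none applies, fall back
    to the third level and simply insert the required symbol.
    `rest` starts with the symbol in error; the result starts with `required`."""

    def at(i: int) -> Optional[str]:
        return rest[i] if i < len(rest) else None

    a = required
    if at(0) in follows and at(1) == a:  # b a γ   -> β a | b γ   (swap)
        return ("b a", [a, rest[0]] + rest[2:])
    if at(0) in follows:                 # b c γ   -> β a | b c γ (insert a)
        return ("b c", [a] + rest)
    if at(1) == a:                       # c a γ   -> β a | γ     (drop c)
        return ("c a", [a] + rest[2:])
    if at(1) in follows:                 # c b γ   -> β a | b γ   (c becomes a)
        return ("c b", [a] + rest[1:])
    if at(2) == a:                       # c c a γ -> β a | γ     (drop c c)
        return ("c c a", [a] + rest[3:])
    if at(2) in follows:                 # c c b γ -> β a | b γ   (c c becomes a)
        return ("c c b", [a] + rest[2:])
    return ("insert", [a] + rest)        # third level: insert a in front


print(putinstack_first_level(";", {"IF", "BEGIN"}, ["IF", "A"]))
# ('b c', [';', 'IF', 'A'])
```

Because the rows are tried in order, the swap in the first row is preferred over a plain insertion whenever the required symbol appears right after a symbol that may follow it.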
Inserting the symbol in this way means that the next symbol will be an error, but, being one symbol further along in the parse, either the additional symbol of context may allow the recovery mechanism to fix the string, or one of the other error recovery techniques, which are based on the marker stack, will be used. The change is described to the programmer by printing two lines, one showing the string as it was and the other showing the string as it was changed, for example:

    ***** CHANGED: IF A
    ***** TO: ; IF A

To summarize, this section has described the procedure used to recover from errors detected by an XSIS or XSIB instruction. The recovery is made by the procedure PUTINSTACK, which first tries a simple method of fixing the string using the table of following symbols. If this does not work, it uses the procedure ERRFPINTERPRETER, which follows a more complex method to determine whether the string can be fixed to correct the error. If this method also fails, the missing symbol is simply inserted in front of the symbol at which the error was detected.

3.4 Error Recovery in the Case of Same Parser Actions

Sometimes all the productions of a group have the same parser action (not counting following TPN productions). In this case, the error production at the end of the group takes one of two courses of action. First, it attempts to fix the source string by using the string generator procedure. If this procedure is not able to find an acceptable change to the program which will fix the error, then the second course of action is followed: the reduction is made anyway. The symbol on the top of the stack is included in the reduction if it is not marked as one of the symbols which can follow this nonterminal, and it is excluded if it is so marked. An example of the message given to the programmer in this case is:

    ***** REDUCED TO A DECLARATIONTYPE

3.5 Error Recovery with Procedure ERRR

All other cases use the procedure ERRR.
This includes the error production at the end of all groups which do not satisfy the conditions for either of the two preceding special cases. It also includes the error lookahead test at the end of a subgroup with lookahead tests. This section describes the two methods used by ERRR (first the string generator procedure, and second the discarding of symbols based on the symbols in the marker stack) and all the special situations that are checked by the procedure.

Groups which do not fit either of the two special cases described in Sections 3.3 and 3.4 are concluded with an error production consisting of the parser instruction ERRR. This production is used after each of the stack tests has been applied without success. The first step in the recovery in this case is to attempt to correct the source string by calling the string generator procedure. If this procedure fails to find a correction which will fix the program, the ERRR procedure then looks at the marker stack. It forms a symbol set which is the union of the following symbol sets associated with the symbols in the marker stack. Markers for symbols which have been found but not yet popped, because the end of the right-hand side has not yet been reached, are marked as found by the FDNT stack comparison operator, and these markers are not included in this symbol set. The source symbols are then skipped, beginning with the one at which the error was detected, until one is found which is a member of the set of following symbols formed above. This symbol is compared with the following symbol sets of the entries in the marker stack, beginning at the top, to find which marker this symbol follows. The parsing stack is then reduced to that marker's nonterminal, and the next group of productions applied is the NTPN group corresponding to this nonterminal, whose address is in the marker symbol. The marker stack is also reduced to this level.
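The marker-stack method of ERRR is a form of panic-mode recovery guided by the table of following symbols. The Python sketch below rests on several assumptions: each marker carries its nonterminal, its following-symbol set, and a `found` flag standing for the FDNT marking; found markers are skipped both when the union is formed and when the matching marker is chosen; the input ends with an end-of-file mark, handled as in the end-of-file rule stated later in this section; and the result gives the number of symbols skipped, the nonterminal to reduce to, and how many markers remain. The loop-breaking checks are not modeled.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

EOF = "EOF"  # assumption: the input always ends with an end-of-file mark


@dataclass
class Marker:
    nonterminal: str
    follows: Set[str]
    found: bool = False  # marked as found by FDNT; excluded from recovery


def errr_skip(markers: List[Marker], symbols: List[str]) -> Tuple[int, str, int]:
    """Skip input until a symbol can follow some live marker.
    Returns (symbols skipped, nonterminal to reduce to, markers kept)."""
    live = [i for i, m in enumerate(markers) if not m.found]  # assumed non-empty
    acceptable: Set[str] = set().union(*(markers[i].follows for i in live))
    skipped = 0
    while symbols[skipped] not in acceptable and symbols[skipped] != EOF:
        skipped += 1
    sym = symbols[skipped]
    for i in reversed(live):  # search from the top marker down
        if sym in markers[i].follows:
            return skipped, markers[i].nonterminal, i + 1
    top = live[-1]            # end of file: reduce to the top marker
    return skipped, markers[top].nonterminal, top + 1


stack = [Marker("PROGRAM", {EOF}),
         Marker("STATEMENT", {";", "END"}),
         Marker("EXPRESSION", {")", ";", "THEN"})]
print(errr_skip(stack, ["+", "*", "ELSE", "END", EOF]))  # (3, 'STATEMENT', 2)
```

Searching from the top means the innermost construct that the resynchronizing symbol can follow is the one closed off, so as little of the enclosing parse as possible is abandoned.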
Skipping and reducing in this way amounts to assuming that all the symbols skipped were supposed to make up the construct identified by the nonterminal to which they were reduced, and that some gross error made them unparsable. It is also possible that the error made the string appear to be a different construct and that, by the time the error was detected, the parse had gone irretrievably far down the wrong path. This second attempt at recovery nearly always gets the parser out of the difficulty, allowing it to continue the parse.

Procedure ERRR has some other measures which it applies to catch infinite loops and other special situations which can occasionally arise from the above recovery techniques. If a reduction would be made to the same name as is already on the top of the stack, then that reduction is not made and the search for an acceptable following symbol is continued. An end of file mark is always an acceptable following symbol, and it causes a reduction to the top marker symbol if it is not a legitimate following symbol of one of the other markers in the stack. If ERRR is called four times in succession without the scanner moving further down the input stream, then the first level of recovery, the string generator, is bypassed. Since the input is then skipped to a symbol which is guaranteed to follow the nonterminal to which the reduction is made, that symbol will be correct and the parse is assured of moving at least one symbol past the hangup. A parse completely dependent upon semantic tests can get into a situation in which the string generator creates a string for which the semantic tests fail, causing the error to recur. The previous check prevents this sort of situation from causing the parser to get stuck.

Other special considerations apply with regard to end of file marks and program goal symbols.
If ERRR is called a fifth time with an end of file mark as the next symbol, it automatically terminates the parse by transferring to the DONEWITHPASS1 label, which has the same effect as the EXIT parser instruction. If the top marker in the stack is the program marker, which is followed only by an end of file mark, then rather than skipping the rest of the program, the parse is restarted at that point by transferring to the starting group of productions. This will usually cause at least one more error, such as the insertion of a BEGIN, but at least the rest of the program receives a syntax check.

The error production at the end of NTH and CNTH groups is always the ERNR instruction. The action in this case is to mark the top stack symbol as a nonterminal, so that it will not be considered in the generation of strings or as a possible following symbol, and then to call the ERRR procedure, which, except for these two differences, proceeds as described above.

In subgroups (groups of productions with a common stack test, differentiated by lookahead tests), the last production has a lookahead test which catches errors that would cause all the lookahead tests to fail. In this case, which is signaled by the XLRR parser instruction, if the lookahead test fails, the procedure ERRR is called. The procedure ERRR has a Boolean parameter which distinguishes the lookahead case from the nonlookahead cases, both for the subsequent use of the TSKTSK procedure and to determine whether or not to include the top stack symbol in the recovery. If the group is one with a nonterminal stack test, such as NTH or NTPN, the corresponding error lookahead parser instruction is XLNR. This instruction does the same thing as the ERNR instruction before calling the procedure ERRR.
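The loop-breaking rules above can be collected into a small policy object. This Python sketch is hypothetical: it counts consecutive ERRR calls made without the scanner advancing, bypasses the string generator from the fourth such call on, terminates on a fifth call with an end-of-file mark as the next symbol, and restarts the parse instead of skipping when the top marker is the program marker. The class name, method name, and step labels are illustrative, not taken from the compiler.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ErrrGuard:
    last_position: Optional[int] = None
    calls_here: int = 0  # consecutive ERRR calls at the same input position

    def plan(self, position: int, next_symbol: str, top_marker: str) -> List[str]:
        """Return the recovery steps this ERRR call should try, in order."""
        if position == self.last_position:
            self.calls_here += 1
        else:
            self.last_position, self.calls_here = position, 1
        if self.calls_here >= 5 and next_symbol == "EOF":
            return ["terminate"]  # same effect as the EXIT instruction
        steps = [] if self.calls_here >= 4 else ["string generator"]
        if top_marker == "PROGRAM":
            steps.append("restart at starting group")
        else:
            steps.append("skip to following symbol")
        return steps


guard = ErrrGuard()
for _ in range(4):
    print(guard.plan(10, "X", "STATEMENT"))
# the first three calls try the string generator; the fourth goes straight to skipping
```

Tying the counter to the input position, rather than counting all calls, is what distinguishes a genuine hangup from a series of unrelated errors.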
This section has discussed the two methods which procedure ERRR uses to recover from errors which do not satisfy the conditions for the situations handled in the previous two sections. These two methods are the use of the string generator procedure and, if the string cannot be corrected easily, the use of the marker stack symbols to control the skipping of symbols until one is found which can follow one of the marker symbols. Also discussed were the various checks used by ERRR to prevent the parser from becoming stuck in particular sequences of errors which can occasionally arise.

3.6 The String Generator Procedure ERRFPINTERPRETER

The procedure ERRFPINTERPRETER calls a recursive procedure TESTFP, which examines all the productions of a group in order to generate all the strings of length two or three which can legally come next in the syntax. Each string is compared with the next four input symbols, and any string with some sort of match is saved. These strings are then examined for certain recognizable patterns. If one is found, the source string of symbols is modified to be like the pattern of the generated string.

The string generator procedure uses the parsing table to find all possible two or three symbol strings which could occur at this point in the syntax. It uses the stack comparison field and the parser action field, and it uses a copy of the marker stack. Some modifications of the parser instructions were necessary to allow them to be used to generate strings as well as to parse. Because the lookahead comparison field is not used in generating strings for error recovery, it is necessary to have the symbol number in the stack test field of the TPN productions with an ILVL stack test. The ILVL null stack test is used on TPN productions which follow a production with a lookahead test, since in parsing it is not necessary to test the stack in such cases.
Other additions are due to the need to apply all the productions of a group instead of just one. Hence it is necessary to be able to tell when a group ends. This is no problem for groups which end with an error production, but it has necessitated the various null stack tests used in NTPN and CNTPN groups, which have no error production. For example, in applying an NTPN group, all the productions through the FDNT and CNONE productions are used until a NONE or another FDNT comes up, indicating a new NTPN group. All of these modifications are made by the table building program, FPL2PAR/TWS, in the preprocessing of the syntax.

Each string found which bears some resemblance to the actual string is saved in a table. Each entry consists of three 12-bit fields, which contain the three symbols of the string, and a 9-bit field, which contains a coded representation of the extent to which this string matches the given string. The four symbols of the given string are numbered 0 through 3. The 9-bit field consists of three 3-bit fields, one for each of the symbols of the generated string. If a symbol of the generated string matches one of the symbols of the given string, the number of that given symbol is put in the 3-bit field for the generated symbol; otherwise a 7 is put in the 3-bit field.

If after two symbols the pattern is "77", i.e. neither symbol matched any of the symbols of the given string, then this path is discontinued and not put in the table. If the pattern is "70" to "73" and the second symbol is a class symbol, such as identifier or number, the string is extended to three symbols. If the string has only one match with the given string and that match is a class symbol, the string is not saved in the table. Consequently the table will contain all two or three symbol strings in which there are two matches with the given string, and all two symbol strings with one match which is not a class symbol.
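The match code and the save and extend rules can be sketched as follows. In this hypothetical Python model the four given symbols are (class, text) pairs, a generated symbol matches a given symbol when it equals either its text or its class (so "identifier" matches ID2), the first matching given position is the one recorded, and the class symbols are assumed to be "identifier" and "number". Padding a two-symbol code with 7 when packing it into 9 bits is also an assumption.

```python
from typing import List, Tuple

CLASS_SYMBOLS = {"identifier", "number"}  # assumption
NO_MATCH = 7

Token = Tuple[str, str]  # (class, text); ordinary symbols use the text as class


def match_code(generated: List[str], given: List[Token]) -> List[int]:
    """One 3-bit field per generated symbol: a given position 0-3, or 7."""
    return [next((i for i, (cls, text) in enumerate(given) if g in (cls, text)),
                 NO_MATCH)
            for g in generated]


def pack(code: List[int]) -> int:
    """Pack three 3-bit fields into the 9-bit field, first symbol highest."""
    value = 0
    for field in code + [NO_MATCH] * (3 - len(code)):
        value = (value << 3) | field
    return value


def decide(generated: List[str], given: List[Token]) -> str:
    """Return "discard", "extend", "save", or "drop" for a generated string."""
    code = match_code(generated, given)
    if len(generated) == 2:
        if code == [NO_MATCH, NO_MATCH]:
            return "discard"  # "77": abandon this path
        if code[0] == NO_MATCH and generated[1] in CLASS_SYMBOLS:
            return "extend"   # "70".."73" ending in a class symbol
    matches = [g for g, c in zip(generated, code) if c != NO_MATCH]
    if len(matches) >= 2:
        return "save"
    if len(generated) == 2 and matches[0] not in CLASS_SYMBOLS:
        return "save"         # two symbols, one non-class match
    return "drop"


given = [(":", ":"), ("=", "="), ("identifier", "ID2"), ("+", "+")]
print(match_code(["←", "identifier"], given), decide(["←", "identifier"], given))
# [7, 2] extend
print(match_code(["←", "identifier", "+"], given), decide(["←", "identifier", "+"], given))
# [7, 2, 3] save
```

Refusing to save a lone class-symbol match keeps the table from filling with strings that agree with the source only because almost any position can hold an identifier.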
Under the control of a control card, the string generator can extend all strings to three symbols, instead of just those involving a class symbol as described above.

The string generator is implemented with a recursive procedure TESTFP that consecutively adds each of the stack test symbols of a group to the current string. For each one it adds, it performs the parser action field, calling TESTFP for the next group. Since the markers cannot actually be popped but must remain available for other paths, those that are to be popped are simply marked with the current recursive nesting level. After applying the parser action field, TESTFP unmarks any markers marked as popped at that level or greater and applies the next production of that group.

The initial call of TESTFP depends on the type of error situation in which the string generator is called. Each time the parser action transfers to a new group, the starting address of that group is saved. In an ERRR-type error, this address is used as the starting point of the group. When the string generator is called from the PUTINSTACK procedure, only the current production is used in the initial TESTFP call. When the string generator is called from a lookahead error, only those productions with the same stack test (the subgroup in which the lookahead failed) are considered as the group for the initial TESTFP call.

After the strings which could apply have been generated, the string generator searches the table for occurrences of particular patterns, in order. These patterns are given in Figure 14 for the normal case and in Figure 15 for the extended, three-symbol string case. If the first pattern, "70" in the normal case or "701" in the extended case, comes up during the generation process, the generation of strings is stopped and the string just generated is used. The first recognizable pattern found is used as the basis for modifying the program.
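The mark-instead-of-pop technique used by TESTFP can be illustrated with a small recursive generator. This is a hedged sketch over a made-up table, not the Floyd production parser itself: the group names, the `(symbol, markers_to_pop, next_group)` production format, and the rule that a production with no next group resumes in the group named by the topmost unpopped marker are all assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Marker:
    resume_group: str                 # group to continue in when this marker is on top
    popped_at: Optional[int] = None   # nesting level that "popped" it, or None if live


# Hypothetical toy table: group -> productions (symbol, markers_to_pop, next_group).
# next_group None means "continue in the group of the topmost live marker".
TOY_TABLE = {
    "stmt": [("ID", 0, "assign")],
    "assign": [(":=", 0, "expr")],
    "expr": [("ID", 0, "after_expr"), ("NUM", 0, "after_expr")],
    "after_expr": [("+", 0, "expr"), ("END", 1, None), (";", 0, None)],
    "after_block": [(".", 0, None)],
}


def live(markers):
    """Markers not yet popped on the current path, topmost first."""
    return [m for m in reversed(markers) if m.popped_at is None]


def generate(table, group, markers, length, prefix=(), level=0, out=None):
    """Collect every symbol string of the requested length that can come next."""
    if out is None:
        out = []
    for symbol, pops, next_group in table[group]:
        # "Pop" markers by tagging them with this nesting level instead of removing them.
        for m in live(markers)[:pops]:
            m.popped_at = level
        target = next_group
        if target is None:
            top = live(markers)
            target = top[0].resume_group if top else None
        string = prefix + (symbol,)
        if len(string) == length:
            out.append(string)
        elif target is not None:
            generate(table, target, markers, length, string, level + 1, out)
        # Undo every pop made at this level or deeper so sibling paths still see the markers.
        for m in markers:
            if m.popped_at is not None and m.popped_at >= level:
                m.popped_at = None
    return out


markers = [Marker("after_block"), Marker("stmt")]   # topmost marker is last
print(generate(TOY_TABLE, "after_expr", markers, 2))
# prints [('+', 'ID'), ('+', 'NUM'), ('END', '.'), (';', 'ID')]
```

The END alternative marks the statement marker as popped, yet the ";" alternative tried afterwards still resumes in `stmt`, because the mark is cleared when control returns to the level that set it.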
When a pattern is recognized, the program is changed to be equivalent to the generated string, the program before and after the change is described to the programmer, the parser table address is set to the one which was used as the starting address for the initial TESTFP call, and the string generator procedure returns the value true.

For example, assume that "ID1 := ID2 + 3" is a statement in a language in which ":=" is supposed to be the single symbol "←", and that after seeing "ID1" the parser follows a path in which ":" is not valid. The string generator would get the four symbols ": = ID2 +" for comparison. When it generates the string "← identifier", it has the pattern 72, in which the second symbol is a class symbol, so it goes ahead one more symbol and finally gets the pattern 723 for the string "← identifier +". In this case the first pattern it recognizes when searching the table of generated strings will be 72 (or 723 in the extended string mode). It will then replace the symbols in positions 0 and 1 with the symbol corresponding to "←". The messages printed will be:

***** ERROR AT CHARACTER ":" ***************************************
***** CHANGED: : = ID2 +
***** TO: ← ID2 +

If none of the acceptable patterns matches any of the entries in the table, the string generator procedure returns the value false and the calling routine takes another course of action. These other courses of action have been mentioned previously for PUTINSTACK, ERRR, and ERRN.

This section has shown how the string generator procedure generates legal strings at the point of the error, and how these strings are then examined for the best match with the program symbols, both to correct the program for the compiler's recovery and to describe the error to the programmer.
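The search of the saved table and the resulting rewrite of the input can be sketched as follows, using the ":=" example above. The priority list copies the order of the two-symbol patterns in Figure 14. Two points are assumptions chosen to reproduce the example rather than a description of the original code: a three-digit entry satisfies a two-digit pattern by prefix (so 723 satisfies 72), and the input is rewritten by replacing everything up to the last matched position with the generated string. The entry format and the function names `choose` and `apply_correction` are hypothetical.

```python
NORMAL_PATTERNS = [  # Figure 14, in search order
    ("70", "a symbol was left out"),
    ("71", "the wrong symbol was used, possibly a misspelling of a reserved word"),
    ("12", "an extra symbol was inserted"),
    ("10", "two symbols were interchanged"),
    ("72", "two symbols were used where one other was required"),
    ("23", "two extra symbols were inserted"),
]


def choose(entries, patterns=NORMAL_PATTERNS):
    """Return (generated, code, meaning) for the first pattern, in priority
    order, that some saved entry satisfies; None if no pattern applies."""
    for pattern, meaning in patterns:
        for generated, code in entries:
            if code.startswith(pattern):
                return generated, code, meaning
    return None


def apply_correction(given, generated, code):
    """Rewrite the given symbols to follow the generated string. Matched
    symbols keep the programmer's actual token (e.g. ID2 for 'identifier').
    Saved entries always contain at least one match, so max() is safe."""
    last = max(int(d) for d in code if d != "7")
    corrected = [given[int(d)] if d != "7" else sym for sym, d in zip(generated, code)]
    return corrected + given[last + 1:]


given = [":", "=", "ID2", "+"]
entries = [(("identifier", "+"), "23"), (("←", "identifier", "+"), "723")]
generated, code, meaning = choose(entries)
print("CHANGED:", " ".join(given))
print("TO:", " ".join(apply_correction(given, generated, code)))
print(meaning)
```

Both entries are acceptable, but 72 precedes 23 in the search order, so the "←" correction is chosen and the two characters ":" and "=" are replaced by the single symbol.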
Pattern   Interpretation
70        a symbol was left out
71        the wrong symbol was used, possibly a misspelling of a reserved word
12        an extra symbol was inserted
10        two symbols were interchanged
72        two symbols were used where one other was required
23        two extra symbols were inserted

Figure 14. Two-symbol String Patterns for Error Correction

Pattern   Interpretation
701       a symbol was left out
712       the wrong symbol was used, possibly a misspelling of a reserved word
123       an extra symbol was inserted
102       two symbols were interchanged
770       two symbols were left out
120       three symbols out of order
201       three symbols out of order
301       three symbols out of order
12        an extra symbol was inserted
723       two symbols were used where one other was required
23        two extra symbols were inserted
771       one symbol was used where two others were required
772       two symbols were used in place of two others

Figure 15. Three-symbol String Patterns for Error Correction

3.7 Summary

The error recovery of the modified Floyd production language consists of two parts: the conditions that are determined during the preprocessing of the syntax and built into the parsing table, and the procedures in the compiler which perform the error recovery. The error conditions for which parser instructions are built into the parsing table fall into three groups: the error productions at the end of groups, which include the parser instructions ERECT, ERRR, and ERRN; the unique-symbol stack test instructions XSIS and XSIB; and the lookahead test instructions used at the end of subgroups with lookahead, XLRR and XLNR. Each of these three groups of instructions uses the procedure TSKTSK to print the initial part of the error message, but the recovery measures taken vary. The XSIS and XSIB instructions call PUTINSTACK, which first uses the table of following symbols, then the procedure ERRFPINTERPRETER, and finally just inserts the symbol.
Instruction ERRN first uses procedure ERRFPINTERPRETER and then makes a reduction anyway. The rest of the error parser instructions use procedure ERRR, which first tries the string generator procedure ERRFPINTERPRETER and, if that fails to provide any recovery, discards a portion of the input based on the symbols in the marker stack. These are the basic mechanisms described in this chapter. Their effectiveness is discussed in the succeeding chapters.

CHAPTER 4. DEMALGOL COMPILER

In order to test and develop both the translator writing system and its error recovery portion, a small language was used. This small language, called DEMALGOL, is essentially a subset of ALGOL. In this section, the syntax and semantics of DEMALGOL are given. The rest of this chapter discusses the error recovery of the DEMALGOL compiler. The results of this compiler are compared with the results of two ALGOL compilers for the same programs.

DEMALGOL includes three declaration types: INTEGER, BOOLEAN, and LABEL; most basic statements: assignment, block, compound statement, go to, and conditional; and both Boolean and arithmetic expressions with OR, AND, NOT, relational operators, +, -, ×, /, and unary -. Besides the LABEL declaration, there are three additions to DEMALGOL beyond ALGOL 60: (1) in the definition of a block, ";" acts as a terminator rather than merely as a separator; (2) a declaration type that accepts declarations not valid in DEMALGOL has been added; and (3) a program is concluded with a "." following the last END. The first change was made for compatibility with the recursive descent compiler to be described in Chapter VIII. The second change was to improve the error recovery in case someone used a non-DEMALGOL declaration type such as REAL, FILE, ARRAY, etc. The third change was to make DEMALGOL programs compatible with the Burroughs ALGOL compiler. The syntax of DEMALGOL is given in Figure 16. Only some basic semantic functions have been implemented in the compiler.
Figure 16. Syntax of DEMALGOL

Identifiers are marked in the symbol table as integer, Boolean, or label according to their last declared occurrence. The left-hand-side identifier of an assignment statement is tested for the integer or Boolean attribute to resolve a local ambiguity in the parser. If at that time the identifier is undeclared, it is assumed to be integer and is marked as such in the symbol table, and a message regarding this error is printed. When this added declaration type is used, the semantic routine prints a message stating that the declaration type is not allowed, and the identifiers that follow remain undeclared. The semantics of DEMALGOL is given in Figure 17. DEMALGOL is a small language, and the only semantics implemented is that necessary for parsing. However, it is large enough to contain a variety of syntactic constructs, and so it has been useful in testing and demonstrating the effectiveness of the error recovery system described in the previous chapter.

4.2 Error Recovery in DEMALGOL

The examples described in this section and the next are from seven programs. One of these was written by the author with five intentional errors, while the others were written by novice programmers who were attempting to write correct programs. This section discusses the effectiveness of the DEMALGOL compiler in recovering from the errors in these programs and in describing them to the programmers. The next section summarizes all the errors and the recovery in each case.

The only time the compiler could not make any sense at all of the program was with the string "FILE DATA (2, 10);". The mistake was in declaring a file.
When the word FILE was read, a message was printed by a semantic routine telling the programmer that this declaration type was not allowed. An error was also detected at the left parenthesis. A pattern involving the insertion of two symbols was found, and "; IF" was inserted after "DATA". The "," was changed to a "+" at a subsequent error. Finally an error was detected at the ";". Since this was followed by "LABEL ENDA,", no generated string of symbols resembled the given string. Hence the compiler assumed it had found a statement followed by the ";" and started looking for another statement. "LABEL" was an error at the beginning of a statement, but this was easily fixed by inserting a "BEGIN". At the end of the program, it was then necessary to insert a matching "END".

at declaration type INTEGER
    DECLARATIONTYPE := 1;
at declaration type BOOLEAN
    DECLARATIONTYPE := 3;
at declaration type LABEL
    DECLARATIONTYPE := 2;
at declaration type <…>
    BEGIN
    DECLARATIONTYPE := 15;
    TSKTSK(FALSE, FALSE);
    WRITE(LINE, "***** THIS DECLARATION TYPE NOT ALLOWED.")
    END;
at each identifier in the declaration type list
    IF DECLARATIONTYPE = 15 THEN ELSE mark this entry in the symbol table with DECLARATIONTYPE;
to test for BOOLEAN declared primary identifiers
    SEMANTICTEST := is this entry in the symbol table marked = 3;
to test for labels at the beginning of statements
    SEMANTICTEST := is this entry in the symbol table marked = 2;
after label in <…>
    IF J := this entry in symbol table marked = 2 THEN ELSE
    BEGIN
    TSKTSK(FALSE, FALSE);
    WRITE(LINE, "***** THIS LABEL WAS NOT DECLARED.");
    IF J = 0 THEN mark this entry in the symbol table with 2;
    END;
at arithmetic expression primary identifiers
    IF J := this entry in the symbol table marked = 1 THEN ELSE
    BEGIN
    TSKTSK(FALSE, FALSE);
    WRITE(LINE, "***** THIS IDENTIFIER WAS NOT DECLARED INTEGER.");
    IF J = 0 THEN mark this entry in the symbol table with 1;
    END;

Figure 17. Semantics of DEMALGOL
Although in most cases generating two-symbol strings for error recovery is either equal to or poorer than generating three-symbol strings, in this case it would have been better. None of the possible two-symbol strings generated at that point would have matched the given string, so the compiler would have scanned to the ";" and reduced the skipped symbols to a single syntactic unit. This would have avoided all the extra errors mentioned above and would have been clearer to the programmer.

In all these example programs, the most frequent mistake was leaving out a symbol; this was done 41 times. Eight times the wrong symbol was used; six times an extra symbol was put in; and once two symbols were interchanged. On three of these occasions, there was another error within the range of the context used to analyze the first error. In each of these, the first error was corrected in such a way that the second error disappeared too, once exactly as the programmer had intended. Also, in one place, three errors were each affected by a subsequent error, as the programmer made four mistakes in a row. The first error was corrected as the programmer intended, and the correction of the second eliminated the third, leaving the fourth as an isolated error which was handled correctly. Of the remaining 52 errors, four were not corrected exactly as the programmer had intended, although in three of the four the effect was the same as that of the perfect correction. Three more of the 52 errors were caught by the added declaration type and flagged by the semantic routine as an invalid declaration type. This means that in only one of the 53 isolated error situations did the compiler make the wrong adjustment to the program. This was the string "IF M ≠ THAN 55". Instead of producing "IF M ≠ 55", it produced "IF M ≠ THAN + 55". These results show that this compiler was able to tell the programmer what his mistake was in 98% of the isolated errors and in about 90% of all the errors.
4.3 Summary of the Errors

In this section, all the errors in the seven DEMALGOL programs are presented along with the compiler's recovery from them. The recoveries are rated, and the effectiveness of the compiler for these programs is calculated. Each error is identified, followed in parentheses by the number of times it occurred if it occurred more than once. The construct actually intended is then given if this is not clear from the statement of the error. The line concludes with the recovery made by the compiler and the rating of that recovery according to the ratings given in Chapter I:

"E" means the compiler changed the program to what the programmer intended;
"G" means the compiler did not change the program to what the programmer intended, but the change it made was such that the programmer should have no trouble identifying his error;
"F" means the description of the compiler's recovery would not help the programmer much in identifying his actual error;
"P" means the compiler could have misled the programmer as to the identity of his error.

The errors made in the seven programs and the compiler's recovery measures are as follows:

; left out after either <…> or <…> (19 times) ... missing ; inserted (E)
ELSE left out (8 times) ... ELSE inserted (E)
THEN left out (3 times) ... THEN inserted (E)
. left out (6 times) ... . inserted (E)
initial BEGIN left out ... BEGIN inserted (E)
: after label left out ... : inserted (E)
: of assignment operator left out ... : inserted (E)
operator missing in I ← I + 4J; ... multiply sign inserted (E)
=: ... changed to := (E)
GO TO 6; ... changed to GO TO <…> (E)
GO TO 7E; ... 7 deleted (E)
I := TRUE; (I declared INTEGER) ... changed to I := <…>; (E)
IF L := ... : deleted (E)
FILE, REAL, LABLE ... declaration types not valid but allowed syntactically by the added declaration type; the semantic routine printed the invalid type message (E, E, E)
THE DAY := ... DAY deleted (Actually the error here is one at the lexical analysis level, since THE DAY was meant to be the single identifier THEDAY.) (G)
IF THEDAY + JAN 1 > 7 ... multiply operator inserted before the 1 (Here again it is actually a lexical error.) (G)
GO TO MULTIPLY END; ... END and ; interchanged, and then, after detecting an error following the END, a ; was inserted following the END (G)
IF M ≠ THAN 55 ... + inserted before 55 (G)
SUM ← SUM + 10 END; IF ... a ; is missing after the 10, and the END; is extraneous, since there has been only the one BEGIN at the beginning and this is not the end of the program ... The recovery treated this as one error and deleted the END. (E)
GO TO MIX ENDA; ... a ; is missing after MIX and ENDA is the label on the next statement, i.e. the ; is intended to be a : ... Again the recovery treated this as one error and deleted the ENDA. (G)
A conditional statement was of the form IF <…> THEN <…> with ELSE <…> ; missing (since <…> ::= EMPTY, ELSE ; can be considered missing) ... The recovery inserted only the ELSE, causing the next statement to follow the ELSE. (G)
I + A followed by A + 1 ... The programmer intended this to be I ← A; and A ← 1; ... In the three-symbol string mode this was changed, after three errors were flagged, to I ← + A × A + 1; Because of the way the implementation was handled, the two-symbol string error recovery changed this to I ← A × A + 1; If the two-symbol string error recovery had not allowed the pattern 710 as an acceptable pattern with the second symbol an identifier, and then checked only the 71 when the program was changed, it would have found no acceptable pattern for the first +, and, since the top marker was <…>, it would have skipped to the first A and called the intervening + an ASSIGNARROW. Note that the effect is the same, but the way it was actually done is clearer to the programmer than the message:

***** REDUCED 1 SYMBOL TO A ASSIGNARROW

(E, G, E)

Table 1 gives a tabulation of the above results.
From this table, the measure of effectiveness given in Chapter I can be calculated. This formula was:

$$\text{Effectiveness} = \frac{E + \frac{3}{4}G + \frac{1}{2}F + \frac{1}{4}P}{N} \cdot \frac{N}{N + M} \cdot \frac{N}{N + X}$$

where N = E + G + F + P. From the table, E = 51, G = 8, F = 0, P = 0, M = 0, and X = 5. Hence the effectiveness of the DEMALGOL compiler on these programs is .891.

Gross errors: 1
    Number of symbols involved: 5
    Number of symbols skipped: 5
    Subsequent errors: 4
Multiple errors:
    Doubles: 3 (first recovery fixed second error)
    Triples: 1 (first error corrected; second correction fixed the third)
Single errors: 53
    Subsequent errors: 1
Totals:
    Actual errors: 63
    Compiler error count: 64
    Extraneous errors: 5
    Errors missed: 0
Ratings: E = 51, G = 8, F = 0, P = 0

Table 1. Table of DEMALGOL Syntax Errors

If we were to ignore the four extra errors produced as a result of the FILE declaration, the effectiveness would be .95. Appendix 1 gives listings of the DEMALGOL programs.

4.4 Comparison of Burroughs ALGOL Compiler with DEMALGOL Compiler

The seven programs mentioned above were also run on the Burroughs ALGOL compiler. This was possible because DEMALGOL is a subset of Burroughs Extended ALGOL. This section presents the results of these runs. The errors detected and the corresponding diagnostic messages were as follows:

; missing (8 times) ... "missing ';' or END" (E)
THEN missing (2 times) ... "missing THEN" (E)
THE DAY ... "undeclared identifier" (E)
JAN 1 ... "undeclared identifier" (E)
× (multiply sign) missing ... "missing ';' or END" (G)
IF L := THEN ... "expression not of type Boolean" (G)
GO TO 6 ... "expression not of designational type" (G)
GO TO 7E ... "expression not of designational type" (G)
≠ THAN ... "undeclared identifier" (G)
I := TRUE ... "primary may not begin with a quantity of this type" (F)
final . missing (6 times) ... compiler terminated with an end of file error (P)

This is a total of 24 errors. Twelve were rated "E", five were rated "G", one was rated "F", and six were rated "P".
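As a check on the arithmetic, the Chapter I formula can be written as a small Python function; the function and parameter names are illustrative only. It reproduces the DEMALGOL value of .891 given above, and the .95 obtained when the four FILE-related extras are ignored.

```python
def effectiveness(E: int, G: int, F: int, P: int, M: int = 0, X: int = 0) -> float:
    """Effectiveness measure from Chapter I.

    E, G, F, P: detected errors rated excellent, good, fair, poor
    M: errors missed;  X: extraneous errors reported
    """
    N = E + G + F + P
    quality = (E + 0.75 * G + 0.5 * F + 0.25 * P) / N   # weighted rating per detected error
    return quality * (N / (N + M)) * (N / (N + X))       # penalties for misses and extras


print(round(effectiveness(51, 8, 0, 0, M=0, X=5), 3))   # DEMALGOL -> 0.891
print(round(effectiveness(51, 8, 0, 0, M=0, X=1), 2))   # four FILE extras ignored -> 0.95
```

Because M and X enter only through the factors N/(N + M) and N/(N + X), a compiler that skips many errors or floods the listing with extraneous ones is penalized even when its individual diagnostics are good.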
The total is less than that for the previous section, since some of the former errors are valid constructs in ALGOL. Since the recovery used is to skip to the next semicolon, the compiler missed 21 errors as it skipped over them. Although no extra errors were created here, this compiler has been known to create hundreds of extra errors. The value of the effectiveness formula for this set of recoveries is .394.

The above ratings do not take into account the fact that the error messages actually give just a number, which is used as a reference to an external listing of the error messages. If this listing is unavailable, the message given to the programmer is simply that an error occurred. Such a diagnostic is probably a "fair" description of the programmer's error. The author has frequently heard someone saying something like "What is error 114?". If all the excellent and good recoveries are assumed to be just fair, the value of the effectiveness formula for these recoveries with the listing of error messages unavailable is .233. Clearly the automatic error recovery functioned much better than the hand-coded techniques used in the ALGOL compiler.

4.5 Comparison of 7090 ALCOR Compiler with DEMALGOL Compiler

The 7090 ALCOR-Illinois compiler developed by Gries, Paul, and Wiehle (50) was also available for comparison, running on an IBM 7094. This section gives the results of running the seven DEMALGOL programs discussed previously on this compiler. This compiler is based on a transition matrix parsing algorithm (48) and hence has the same advantage as the modified Floyd production language parsing algorithm used in the DEMALGOL compiler, namely immediate error detection. The subroutines which handle the error recovery construct the error messages from skeleton forms, as the DEMALGOL compiler does, using the symbols given and the names of the nonterminal symbols defining the language.
The ALCOR compiler is more like the DEMALGOL compiler than the Burroughs ALGOL compiler in its effect. It missed only a few errors, but it generated quite a few extraneous errors, and the error messages were sometimes vague. The compiler detected 104 errors, of which 54 were extraneous, and it missed six errors. Of the 50 valid errors which it detected, the error descriptions received the following ratings: 27 were rated "E", 15 "G", seven "F", and one "P". By the effectiveness formula given in Chapter 1, this compiler has an effectiveness of .42 on these programs. Sixteen of the extra errors were due to the LABEL declarations, in which all the identifiers in the identifier list were flagged as undeclared. With these sixteen extra errors ignored, the effectiveness value is .497. The specific errors and the ALCOR compiler's descriptions of them can be found in Appendix 2. The ALCOR compiler found almost all the errors and described more than half of them excellently. Except for the large number of extraneous errors, it performed very well.

4.6 Summary Comparison of Three Compilers

The ALCOR compiler is more effective than the Burroughs ALGOL compiler by the measures described in Chapter 1, but the DEMALGOL compiler is significantly more effective than either of the other two. This is illustrated by the summary given in Table 2.

                  DEMALGOL      Burroughs ALGOL    ALCOR-Illinois
Excellent (E)     51            12 (0)             27
Good (G)          8             5 (0)              15
Fair (F)          0             1 (18)             7
Poor (P)          0             6                  1
Missed (M)        0             21                 6
Extra (X)         5 (1)         0                  54 (38)
Effectiveness     .891 (.95)    .394 (.233)        .421 (.497)

Table 2. Error Recovery Effectiveness of Three Compilers

This table gives, for each of the three compilers operating on the seven programs, (1) the number of errors detected, arranged according to their ratings; (2) the number of errors missed; (3) the number of extraneous errors detected; and (4) the effectiveness value calculated by the formula:
$$\text{Effectiveness} = \frac{E + \frac{3}{4}G + \frac{1}{2}F + \frac{1}{4}P}{N} \cdot \frac{N}{N + M} \cdot \frac{N}{N + X}$$

where N = E + G + F + P. The numbers in parentheses give the alternative counts which were discussed in the preceding sections and the resulting changes in the effectiveness values. These results demonstrate the practicability of the methods of error recovery proposed in this thesis. On these programs, the methods proved superior to both of the other working compilers with which they were compared.

CHAPTER 5. COMPILERS FOR THREE OTHER LANGUAGES

5.1 Introduction to the Three Languages and Their Compilers

Compilers for three other languages have been implemented using the current version of this translator writing system with its automatic error recovery. Another is nearly complete, and some work has been done on a fifth, although no data are available at this time on the effectiveness of the error recovery for either of these two. Other compilers have been built in the past, but since these were completed before the current version of the error recovery system was implemented, they too offer no data for effectiveness studies. Each of these has been a complete compiler, as opposed to just a syntactic processor such as the DEMALGOL compiler. This chapter gives a brief description of each language, a discussion of some of the error situations encountered by each compiler, and a summary of the error recoveries from all the available programs for each language.

5.2 TESLA

5.2.1 Introduction to TESLA

TESLA is a computer design diagnostics language developed by Luther Abel, implemented by Nicole Allègre, and modified by Bill McTeer. It allows the description of circuit board logic and simulates the output signals which would be produced by that logic for a given set of input signals. The hardware failure diagnostic work for the ILLIAC IV is being done with TESLA.
A complete description of TESLA can be found in Allègre (1), although for reference the syntax has been copied in Appendix 3.

5.2.2 TESLA Error Recovery

TESLA has a quite different structure from the other languages considered and, because of the manner in which it is usually used, it is subject to different kinds of errors. This section indicates some of the special error situations and then summarizes the error recoveries in the runs analyzed.

Frequently the same program is used many times with only a few cards changed each time. This happens when the same circuit logic is to be tested with different input signals. As a result, a higher proportion of the errors in TESLA than in other languages are caused by cards being out of order. Since there are usually several symbols on a card, this means that, to the parser, two large sets of symbols have been interchanged. The time needed to analyze a context greater than three symbols is prohibitive except in very simple languages. Hence the interchange of two cards cannot usually be identified as such by the parser, and the error recovery must handle the error in some other way.

Another common cause of errors in TESLA is the position of the bracketed sequence of the form [ number : number ], for example [0:0]. It is easy to get such a sequence out of order and, since it consists of five symbols, it is larger than the context available for analysis in error recovery. Such a sequence can therefore never be moved to the right position in one error recovery operation.

Only a few programs with syntactic errors were saved from discard and could be used for the evaluation of the error recovery. The following errors were obtained from those listings of runs of the TESLA compiler which were saved:

; missing (3 times) ... ; inserted (E)
final END missing (2 times) ... END inserted (E)
final . missing (2 times) ... . inserted (E)
DUMMY used where only <…> is allowed ... DUMMY changed to <…> (E)
INPUT SIGC ; , SIGC ... ; extra ... , changed to INPUT (equivalent to deleting ;) (E)
DIGIT A 2( ... = missing after A ... Because the scanning mode is changed back and forth by some semantic actions, the 2 was scanned as a character instead of a number. Hence the recovery replaced the 2 by =. Had the 2 been scanned as a number, the = would have been inserted between A and 2. (E)
<…> (i.e. , <…>) out of order, coming after the end of the <…> (4 times) ... the , was changed to a : making the <…> a <…> (G)
<…>, NOT ( ... [ <…> : <…> ] before NOT missing ... <…> inserted before NOT (also valid) (G)
CLOCK 3; ... <SIMULATION CONTROLS> missing ... changed to REPEAT 3, a <SIMULATION CONTROLS> (G)
INPUT 0:23 SIGA 2( ... identifier SIGA should be after INPUT ... changed to INPUT 0:23 SIGA, ( (G)
Given: INPUT SIG12 STEP12202 : FFOLD; [0:0] ...
    Should have been: STEP12202 : FFOLD; INPUT SIG12 [0:0] ...
    Created: INPUT SIG12 STEP12202 , [0:0] ... (G)
E46:46]2( ... should have been [46:46]2( ... changed to E46[46:2)( (G)
NOT [23:23] ... should have been [23:23] NOT ... changed to NOT , [23:23] (F)
card sequence number in <…> (12 A 1 2 ...) (F)

Of these 21 errors, the recovery was excellent in 10 cases, good in nine, and fair in two. Six extraneous errors were created and no errors were missed. The effectiveness of TESLA in these situations was:

$$\text{Effectiveness} = \frac{10 + 9 \cdot \frac{3}{4} + 2 \cdot \frac{1}{2}}{21} \cdot \frac{21}{21 + 6} = .657$$

5.3 ICL

5.3.1 Introduction to ICL

ICL, the Illiac Control Language, was designed to be the language for controlling job descriptions for execution on the Burroughs B6500 or the ILLIAC IV. The compiler was originally designed using a recursive descent compiler building system with no automatic error recovery features. This required the language designers to implement all of the error recovery themselves as they built the compiler. The working version of the ICL compiler was modified to run on the modified Floyd production translator writing system.
Most of the work involved making the semantic action calls equivalent, as only minor modifications to the syntax were required. The productions that had been inserted for error recovery were removed, and only the semantic error description features were retained. A description of ICL can be found in Pavis (77), and a complete syntax for ICL is given in Appendix 4.

5.3.2 ICL Error Recovery

As the ILLIAC IV and its operating system have not yet been completed, no ICL programs exist other than the error-free ones that were used in testing the ICL system. However, with three solicited contributions and some of the author's tests, some indication of the effectiveness of the error recovery can be given. The error recovery was not as effective as the DEMALGOL error recovery, for several reasons: (1) the constructs were less familiar to the programmers than those of DEMALGOL, and hence the programmers tended toward more complex mistakes; (2) the language, being larger, had greater complexity, allowing the error recovery to produce a greater variety of possibilities, with an increased likelihood of an incorrect match fitting the error situation; (3) the ICL compiler depends much more on semantic tests, which in error situations tend to direct the compiler down the wrong path before an error is detected, so that when the error is finally detected the recovery mechanism is not able to find the correct path. Too late to be included in this thesis, ICL was modified to eliminate the problem with semantic tests completely. This is expected to improve the error recovery considerably in those situations influenced by the semantic tests.

Whereas the DEMALGOL compiler had to reject symbols in only one case, the ICL compiler had to reject symbols in seven cases in slightly more than half as many error situations.
Nevertheless, in over half of the situations, the recovery was perfect and in most of the rest the recovery although incorrect clearly showed the programmer what his errors were. Six errors were missed by the recoveries which had to skip symbols to recover, and five extra errors were generated. The following summary lists the errors, the recoveries, and the ratings assigned by the author: FILE XF[1000] ... = needed after XF (2 times) ... = inserted (E) PRINT ( ) ... should not be included in parentheses (5 times) ... ( deleted and later ) deleted (E,E) ; missing after or (k times) . . . ; inserted (E) SANE AS . . . misspelled, should be SAME . . . changed to SAME AS (E) ILLIAC PROGRAM = COMPILEONU ... = extra ... deleted = (E) ALL OF TRY . . . BEGIN missing after OF . . . BEGIN inserted after OF (E) 70 •<- used as assignment arrow instead of : = (5 times) ... := inserted with two errors , one to insert : the second to replace *- with = (G) (Although this is perfect, because of the distraction to the programmer of having two errors for one, it was given a G rating. ) THEN ; ELSE ... changed ELSE to BEGIN (G) THEN missing (2 times) ... changed ELSE to THEN (G) missing after THEN . . . inserted which was syntactically correct but, because the null identifier failed all semantic tests, a parsing path was chosen which required additional errors to insert " : = " (G) COMPILE GLYPNIR ( ... COMPILE is not an ICL key word ... inserted := after COMPILE in two errors, one for each symbol GLYPNIR ( INI , CI ) ... changed , to = PROGPAM Y( INPUT, 0UTPUT=X/Yj = BY ; ... The label equation =X/Y is not allowed as part of the parameter list ... inserted ) after OUTPUT. This caused another error at the later ) in which the recovery skipped to the ; (G) , left out of list in a declaration . . . skipped to the next , in the parameter list of the next identifier. 
This caused the remaining parameters to be declared as separate identifiers and created one additional extraneous error (G); it was rated G since the recovery communicated well what the error actually was.
label equation used in an (4 times) ... label equations are allowed only in declarations, not in executions ... skipped to the symbol following the execution step: THEN (twice), ELSE, and END. This caused it to miss three errors and create one extraneous error. (F)
B6500 FILES A, B, C; ... FILES should be singular ... skipped to ;, missing three errors (each of the identifiers was to have a label equation) (F)
• missing after an (which is a single program name with parameters) ... inserted the word WITH and caused two extraneous errors. Because this was more confusing to the programmer, it was rated (P).

This summary indicates 19 recoveries rated E, 12 rated G, five rated F, and one rated P. There were six errors missed and five extra errors created, so the effectiveness of the error recovery was:

$$\text{Effectiveness} = \frac{19 + \frac{3}{4}\cdot 12 + \frac{1}{2}\cdot 5 + \frac{1}{4}\cdot 1}{37}\cdot\frac{37}{37 + 6 + 5} = .630$$

If the errors regarding using <- instead of := are rated E, then this measure changes to .655.

5.3.3 Error Recovery in the Recursive Descent ICL Compiler

The same programs described in the previous section were run on the original recursive descent ICL compiler. This section discusses the results of these runs, which were much less satisfactory than the ones just described. Any situation which the language designer does not anticipate causes the compiler to back up to the top and fail. This top-level failure is not checked, and hence the compiler does not include it in its error count. In three of the four programs, the first error the compiler encountered was an unforeseen one, so the compiler terminated after two or three cards, saying that there were no errors. The compiler required five runs to get to the end of one program.
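The effectiveness figures used in this chapter can be recomputed with a short Python function. This is a sketch, not the author's program: it assumes the weights that appear in the worked formulas (1 for E, 3/4 for G, 1/2 for F, 1/4 for P) and that the average rating is scaled by the number of rated recoveries over the total of rated, missed, and extra errors. The function name and its argument layout are hypothetical.

```python
from fractions import Fraction

# Assumed weights per rating, as they appear in the worked effectiveness formulas.
WEIGHTS = {"E": Fraction(1), "G": Fraction(3, 4), "F": Fraction(1, 2), "P": Fraction(1, 4)}


def effectiveness(counts: dict, missed: int, extra: int) -> float:
    """Hypothetical helper: weighted recovery quality scaled by missed and extra errors.

    counts maps a rating letter ("E", "G", "F", "P") to the number of recoveries
    given that rating. The result is (weighted / n) * (n / (n + missed + extra)).
    """
    n = sum(counts.values())
    weighted = sum(WEIGHTS[rating] * k for rating, k in counts.items())
    quality = weighted / n                        # average quality of the recoveries made
    coverage = Fraction(n, n + missed + extra)    # penalty for missed and extraneous errors
    return float(quality * coverage)
```

For the recursive descent ICL compiler discussed in this section (one E, one G, 15 F, nine P, 98 missed, no extra), `effectiveness({"E": 1, "G": 1, "F": 15, "P": 9}, missed=98, extra=0)` returns about 0.0927, which rounds to the .093 reported for that compiler. Exact fractions are used so that the weights of 3/4 and 1/4 introduce no rounding before the final conversion.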
As a result, this compiler missed a lot of errors, including some that were missed several times. Because some things had to be fixed to get the compiler to run to conclusion, the set of errors listed below is somewhat different from the set described for the previous ICL compiler. The following are the errors found, the explanations of the errors, and the recoveries:

,ALGSE,1, in parameter list ... should have been ,ALGSEM1, ... "impermissible parameter" (E)
SANE AS ... "syntax error in file map" (G)
label equation in parameter list (5 times) ... "impermissible parameter" (F)
<- (5 times) ... should have been := ... "statement not recognizable" (F)
COMPILE GLYPNIR ... "statement not recognizable" (F)
B5500 FILES ... "statement not recognizable" (F)
ILLIAC PROGRAM = ... "statement not recognizable" (F)
no BEGIN after ALL OF ... "statement not recognizable" (F)
, OUTPUT = X/Y) = BX ... "impermissible parameter" and "label equation required" (F,P)
PRINT( "string") (4 times) ... "missing string" (P)
; missing (2 times) ... "statement not recognizable" (P)
THEN missing ... "statement not recognizable" (P)
XF [1000] ... "missing file map specification" (P)

This is a total of 26 errors. The recoveries were rated as follows: one excellent, one good, 15 fair, and nine poor. There were 98 errors missed, including those missed several times, but no extra errors were created. This gives an effectiveness rating of .093 by the effectiveness formula:

$$\text{Effectiveness} = \frac{1 + \frac{3}{4}\cdot 1 + \frac{1}{2}\cdot 15 + \frac{1}{4}\cdot 9}{26}\cdot\frac{26}{26 + 98 + 0} = .093$$

This compiler was rarely able to do more than tell the programmer that an error occurred, and it was unable to recover so many times that it missed many errors, causing the programmer to make additional runs. Hence this compiler was very ineffective in its error recovery, as reflected by its effectiveness value of .093.

5.4 OSL

5.4.1 Introduction to OSL

OSL was designed by Peter Alsberg (2) as part of his Ph.D. research.
As an operating system language, it is functionally similar to ICL. It is much more complex than ICL, however; in fact it is the most complex and the largest language implemented on the translator writing system. This fact itself led to some of the conclusions reported later. A complete description of OSL is available in Alsberg (2). Because of the size of the definition of OSL, the syntax is not included in this document. Hence appropriate comments will be made along with the discussion of each error in order to clarify the syntactic error that was made in each case.

5.4.2 OSL Error Recovery

As OSL was being developed, its creator ran several test programs to debug his work and found the error recovery features implemented quite satisfactory. The only drawback was the significantly long time the compiler took to make the recoveries. It was this fact which led to the use of two-symbol strings instead of three-symbol strings. Other changes were also made to the string generation procedure to reduce the time involved. Unfortunately, all of the creator's tests were discarded. Hence no set of programs with errors was available for this thesis, and all of those discussed below are ones the author ran himself. In the following listing of the errors found by the OSL compiler, the error is given, followed by the appropriate syntax constructs and the recovery that was used.

; missing (4 times) ... same use as in ALGOL ... ; inserted (E)
ESAC missing ... CASE OF list of [ELSE ] ? FI ... FI inserted (E)
END missing ... same use as in ALGOL ... END inserted (E)
operator missing in K <- J + 61 ; ... * (multiply) inserted (E)
, used as separator in a can be was inserted after K (E,G)
6 x 23 ... *, not x, is the multiply operator ... changed x to + (G)
) missing in a structure declaration ... ) inserted but not at the place intended (G)
extra ) in structure declaration ... last ) deleted whereas the next to last was really the extra (G)
initial BEGIN missing ...
an OSL program can be a or a list with pattern specification> lists as its elements. Both lists are separated The list begins with an identifier . by possibly one "[number : number]" followed by either an identifier <bit pattern>. Each succeeding . A common error was interchanging the "[number : number]" identifier in front of it. This made that identifier look like which really can follow the "[number : number]". Hence after the the missing identifier error in which an identifier was in- ; another error when the literal came up.

Because the identifier looked legitimate, the parser went too far, finishing the prematurely before it found the error. Hence when it did find the error, the recovery had to be more complicated and less clear to the programmer, since it had gone past the actual circumstance of the error. The recovery could only describe the error in terms of the position in the parse at which the error was finally detected. This happened to be the beginning of either a or an ::=.

The problem for which this production was a solution was that the use of a non-DEMALGOL declaration type appeared to the DEMALGOL compiler as an identifier. An identifier following a semicolon or BEGIN looked like the beginning of an assignment statement. When the compiler came to the first identifier that was declared to have that type, an error in an assignment statement was detected and an assignment arrow was inserted. The ensuing commas all had to be changed to operators. Furthermore, the next declaration was in error, since all the declarations had to come at the beginning of the block, before any statements. This error was easily fixed with the insertion of a BEGIN, but that necessitated an additional END at the end of the program. Adding the above production with a semantic routine which stated that the declaration type was not valid solved all of these problems very neatly.
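The effect of adding such a production can be sketched with a toy recursive descent recognizer in Python. Everything here is hypothetical rather than taken from DEMALGOL: the set of valid types, the function name, the message text, and the simplified declaration form (a type word, a comma-separated identifier list, and a semicolon). Any identifier is accepted as the type word, and a stand-in for the semantic routine records that the type is not valid, so the parser is never led into treating the declaration as an assignment statement.

```python
from typing import List, Tuple

VALID_TYPES = {"INTEGER", "REAL", "BOOLEAN"}  # hypothetical list of legal declaration types


def parse_declaration(tokens: List[str]) -> Tuple[bool, List[str]]:
    """Recognize <type word> <identifier> {, <identifier>} ; and report invalid types.

    The added (error) production lets any identifier stand in the type position;
    the "semantic routine" is the message appended when it is not a valid type.
    """
    messages: List[str] = []
    if len(tokens) < 3 or not tokens[0].isidentifier():
        return False, messages
    if tokens[0] not in VALID_TYPES:
        messages.append(f"{tokens[0]}: invalid declaration type")
    pos = 1
    while True:
        if pos >= len(tokens) or not tokens[pos].isidentifier():
            return False, messages
        pos += 1
        if pos < len(tokens) and tokens[pos] == ",":
            pos += 1  # another identifier must follow the comma
            continue
        break
    return pos == len(tokens) - 1 and tokens[pos] == ";", messages


print(parse_declaration(["INTEGER", "A", ",", "B", ";"]))  # (True, [])
print(parse_declaration(["FILE", "F", ",", "G", ";"]))     # (True, ['FILE: invalid declaration type'])
print(parse_declaration(["A", "B", ";"]))                  # (True, ['A: invalid declaration type'])
```

The third call shows the cost of the extra production: A B ;, an assignment with its arrow missing, is now read as a declaration with an invalid type.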
However, that change now makes an assignment statement with a missing assignment arrow look like a declaration. Of the two choices, the latter seems to be a less frequent mistake than the former and is therefore preferable. Another example of an error which could be allowed as valid is that illustrated by the semicolon in the following ALGOL construct: IF THEN ; ELSE. Since the semicolon is valid if the statement is a partial conditional, the parser concludes the partial conditional statement and finds an error at the beginning of a statement when it looks at the ELSE. The ALCOR compiler checked for a semicolon preceding an ELSE. The ELSE in GLYPNIR, see Lawrie (66), was replaced with [ELSE / ; ELSE], making the semicolon syntactically valid. Although there is no reason not to allow the semicolon to be a valid construct in the language once it has been added to the syntax, a semantic routine could print an error message. OSL and ALGOL 68, see van Wijngaarden et al. (97), avoid this error by allowing statement lists following the THEN and the ELSE but requiring an easily forgotten keyword FI at the end of the conditional statement. The FI enables the parser to conclude the conditional statement. A forgotten FI, however, could allow a number of statements to pass before a situation is reached where the FI is required for the parser to continue. Again, it is not possible to conceive of all the potential error situations at the time the language is designed and so anticipate them through additions to the syntax, but any which are discovered or anticipated and can be handled precisely with such additions will make the error recovery that much better at communicating with the programmer.

6.5 Problem of Segmented Language Structure

The problem dealt with in this section is one which arises in constructs like an ALGOL block, in which all declarations must precede all statements.
The problem of declarations coming after statements, or of an error in a declaration making it look like a statement and therefore throwing off all the remaining declarations, has occasionally been helped by allowing statements and declarations to be intermixed syntactically. This is done by ICL and also by the GLYPNIR compiler.

6.6 Redundancy

The example in Section 6.4 with the "; ELSE" illustrates another step that can be taken in the language design to facilitate error description and recovery. That step is allowing some redundancy in the language. This actually amounts to allowing some situations to be accepted as valid which would otherwise be errors. Sometimes it is quite easy to allow several different ways of saying the same thing, including several of the more common ways that a given construct might be written in error. When a pair of symbols or constructs might easily be interchanged, thought should be given as to whether there is any reason why both alternatives might not be permissible. If this could be done in TESLA, it would help the TESLA error recovery. The use of redundancy in places which are easily written incorrectly would be a big help to the programmer.

6.7 Noise Symbols

Another type of redundancy which could be added is the presence of noise symbols, i.e., either symbols which are currently in the language but not actually needed for parsing, or symbols which are not in the language but are apt to be inserted by a programmer. In either case, permitting both the presence and absence of these symbols in the language would probably allow a few situations which are errors to become valid, and therefore the programmer would have less to do in order to write an acceptable program. In ALGOL, the ":" of ":=" in an assignment statement is noise, but the same symbol is not noise where an assignment appears as a primary in a Boolean expression, since there it distinguishes ":=" from the relation "=".
Allowing "=" in addition to ":=" in the places where there would be no ambiguity would eliminate some errors. Also in ALGOL, the TO of GO TO is noise and is not required. In ICL, OF in ALL OF BEGIN is a noise word and is not required.

However, none of these examples causes problems for the error recovery. In each case, the compiler would be able to identify exactly what was missing and tell the programmer what symbol or symbols he must put in. In fact, it is the characteristic that makes symbols noise symbols, namely that the parse is completely determined without them, that allows the parser to tell exactly what is missing. Therefore, when the parser gets to them and does not find them, it is able to tell exactly what is missing, since there is only one valid path at that point. Also, since the parse is completely determined without them, they can be left out of the definition of the language without introducing any ambiguity in the resulting parser. Another aspect of required noise symbols is that they can sometimes help in the recovery from other errors. For example, if a crucial symbol which distinguishes several alternative branches is in error (left out, misspelled, etc.) but each branch contains another noise symbol which is not in error, then the error recovery mechanism will be able to tell from the noise symbol which path was intended. Similarly, if a particular language allows optional noise symbols throughout the program, it is probably to the programmer's advantage to make a habit of using them, since for the same reason his errors might be more clearly indicated to him than if he left them out. The use of noise symbols is an example of a redundancy which can improve the general error recovery.

6.8 Delineators

A type of redundancy which is useful in some error situations is the delineator or separator between constructs of the language. This is illustrated by the semicolon in ALGOL.
The semicolon between statements and declarations is not needed for parsing but serves to delineate the statements and declarations. When the parser takes the wrong path first and is not able to find an appropriate correction, or when the program contains a large set of extraneous symbols so that nothing the parser tries will fix the bad string, the parser must skip symbols to find one which follows one of the symbols that it is seeking. It is with this type of error that it is important to have effective delineation symbols in the program. In an ALGOL-like language, there is usually a statement or declaration marker in the marker stack, and hence the symbol skipping will usually stop at a semicolon and finish off either the statement or the declaration. This gets the parser back on track reasonably effectively, missing only those errors which remained in that statement or declaration. Without the semicolon, an identifier could follow a statement as either a label or the left-hand side of an assignment statement, and therefore the skipping would stop at the next identifier, the error recovery assuming that it had come to the end of the current statement. Clearly, the use of an identifier as a delineator will frequently be erroneous, and the parser will find another serious error immediately. The recovery will be poor, causing the programmer much frustration in identifying the legitimate errors. Having delineators such as semicolons scattered throughout the program will help significantly in recovering from errors for which the only recovery available is to skip symbols until a suitable following symbol is identified.

Since identifiers usually occur in many different constructs, it is usually a good plan to avoid situations in which they would become following symbols. It is important to know how the compiler building system operates in order to know what will become following symbols in the language.
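The interaction between delineators and following symbols can be sketched in Python. The sketch computes following symbols by the rules this chapter attributes to the translator writing system of Chapter 2 (a terminal right after a nonterminal follows it, a following nonterminal contributes its first terminal symbols, and a nonterminal at the end of a right-hand side inherits the followers of the left-hand side), which is the classical FOLLOW-set computation. The two toy grammars, which differ only in whether statements are separated by semicolons, and all function names are hypothetical; empty productions are assumed absent.

```python
from typing import Dict, List, Set

Grammar = Dict[str, List[List[str]]]  # nonterminal -> list of alternatives (symbol lists)


def first_terminals(g: Grammar) -> Dict[str, Set[str]]:
    """First terminal symbols of each nonterminal (no empty productions assumed)."""
    first: Dict[str, Set[str]] = {nt: set() for nt in g}
    changed = True
    while changed:
        changed = False
        for nt, alts in g.items():
            for alt in alts:
                head = alt[0]
                new = first[head] if head in g else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def following_symbols(g: Grammar) -> Dict[str, Set[str]]:
    """Terminal symbols that can follow each nonterminal, iterated to a fixed point."""
    first = first_terminals(g)
    follow: Dict[str, Set[str]] = {nt: set() for nt in g}
    changed = True
    while changed:
        changed = False
        for lhs, alts in g.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in g:
                        continue  # only nonterminals get following symbols
                    if i + 1 < len(alt):
                        nxt = alt[i + 1]
                        new = first[nxt] if nxt in g else {nxt}
                    else:
                        new = follow[lhs]  # at the end: inherit the left side's followers
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def skip_until(tokens: List[str], pos: int, followers: Set[str]) -> int:
    """Discard symbols from pos until one can follow the construct being sought."""
    while pos < len(tokens) and tokens[pos] not in followers:
        pos += 1
    return pos


# Hypothetical toy grammars; "id" stands for any identifier token.
WITH_SEMICOLONS: Grammar = {
    "block": [["BEGIN", "stmts", "END"]],
    "stmts": [["stmt"], ["stmts", ";", "stmt"]],
    "stmt": [["id", ":=", "expr"], ["GOTO", "id"]],
    "expr": [["id"], ["expr", "+", "id"]],
}
WITHOUT_SEMICOLONS: Grammar = {**WITH_SEMICOLONS, "stmts": [["stmt"], ["stmts", "stmt"]]}

if __name__ == "__main__":
    with_semi = following_symbols(WITH_SEMICOLONS)["stmt"]        # contains ';' and 'END'
    without_semi = following_symbols(WITHOUT_SEMICOLONS)["stmt"]  # contains 'END', 'id', 'GOTO'
    # An error at the '*' (index 5) in "x := y + * z ; w := v" and its semicolon-free twin.
    a = ["BEGIN", "id", ":=", "id", "+", "*", "id", ";", "id", ":=", "id", "END"]
    b = ["BEGIN", "id", ":=", "id", "+", "*", "id", "id", ":=", "id", "END"]
    print(skip_until(a, 5, with_semi), skip_until(b, 5, without_semi))  # 7 6
```

With the semicolon, skipping stops at the delineator (index 7) and the statement is finished off cleanly. Without it, identifiers become following symbols of a statement, so skipping stops at the stray operand (index 6), which the parser then misreads as the start of a new statement.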
In the translator writing system described in Chapter 2, there is a marker for every nonterminal symbol in a nonfirst position on the right-hand side of the BNF productions. If the nonterminal symbol is followed by a terminal symbol, then that terminal symbol is a unique following symbol. If it is followed by a nonterminal symbol, then all the first terminal symbols of all the derivations of that nonterminal are the following symbols. If the nonterminal is at the end of the right-hand side of the BNF production, then the following symbols are all the symbols which can follow the left-hand-side nonterminal wherever it occurs in the syntax. By keeping this in mind, it is possible to choose the productions or definitions in such a way as to improve the sets of following symbols, in particular so that right parentheses and brackets, semicolons, commas, key words, etc., are following symbols for markers that are as global as possible.

This section has shown the desirability of having delineators in the syntax, and the language specification considerations necessary to make the delineators effective in error recovery.

6.9 Ordering of Alternatives

The final consideration to be mentioned is the order of the definitions. Because of the parsing algorithm used in the examples in this research, the order of the definitions does not affect the parsing except perhaps in the speed of the compiler. In other parsing algorithms, the order may be fixed in order to parse correctly. However, where possible, the ordering can be changed to improve the error recovery in some situations. Since the error recovery changes the source string according to the first occurrence of the first pattern match it finds, the order of the definitions will affect which symbols are put in.
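Why the order matters can be sketched in Python. This is a hypothetical illustration, not the thesis's string generator: patterns are modeled as predicates over candidate strings, and the candidate list stands in for the table of generated strings in definition order. Because every generated string is tried against the first pattern before the second pattern is tried at all, swapping two parallel definitions changes which correction is chosen.

```python
from typing import Callable, List, Optional, Sequence

Pattern = Callable[[List[str]], bool]  # hypothetical: a test applied to a generated string


def choose_correction(patterns: Sequence[Pattern],
                      generated: Sequence[List[str]]) -> Optional[List[str]]:
    """Return the first generated string matched by the earliest pattern that matches any."""
    for pattern in patterns:          # the first pattern is tried against every string...
        for candidate in generated:   # ...in the order the strings were generated
            if pattern(candidate):
                return candidate      # the search stops at the first match
    return None


def exact_two(candidate: List[str]) -> bool:
    """Hypothetical stricter pattern, tried first: nothing needs to be inserted."""
    return candidate == ["J", "6"]


def bridges_j_and_6(candidate: List[str]) -> bool:
    """Hypothetical pattern: one inserted symbol bridges the gap between J and 6."""
    return len(candidate) == 3 and candidate[0] == "J" and candidate[2] == "6"


PATTERNS = [exact_two, bridges_j_and_6]
PLUS_FIRST = [["J", "+", "6"], ["J", "-", "6"]]   # "+" defined before "-"
MINUS_FIRST = [["J", "-", "6"], ["J", "+", "6"]]  # the two definitions swapped

print(choose_correction(PATTERNS, PLUS_FIRST))   # ['J', '+', '6']
print(choose_correction(PATTERNS, MINUS_FIRST))  # ['J', '-', '6']
```

Both operators satisfy the same pattern, so only their definition order decides which one is inserted.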
The error recovery will always search the table of generated strings for any occurrence of the first pattern before trying the second, then for any occurrence of the second before trying the third, and so forth. The search stops with the first pattern to match a generated string, and that string is used to correct the program. Therefore, whenever there are parallel constructs (which would create the same patterns in most cases), the most desirable choice should be the first definition. Since the most desirable choice for error correction or description is also probably the most frequently occurring construct, the order that is best for error description is also the order that is best for parsing speed. For example, "+" and "-" are usually in parallel constructs, and since "+" is the more frequently used of the two (62), it should be the first definition. Similarly, the most common declaration type should be the first definition of declaration type. Other orderings may be more subtle. For example, it may be possible to cause the error recovery to insert a multiplication operator rather than an addition operator where an operator was left out of an expression by changing the order of two productions which would not at first seem to be parallel. This may have been the case with DEMALGOL, although this has not been verified. In short, the definitions should be ordered in such a way as to have the most frequently occurring constructs first. This will not only improve parsing speed but will also cause the error recovery to insert the more common of two or more symbols defined in parallel.

6.10 Conclusion

This chapter has given various suggestions which a language designer can use to improve the error recovery in a compiler implemented on the system described in this thesis. Some of these suggestions pertain only to the language design and are independent of the system on which the compiler is implemented.
The considerations discussed have included: (1) avoidance of semantic tests, (2) use of different identifier classes, (3) avoidance of syntactic constructs which resemble common errors, (4) representation of some errors in the syntax, then either printing an error message in a semantic routine or allowing the construct as valid, (5) an alternative to separating syntactic constructs into sequential groups, (6) use of redundancy, (7) use of noise symbols, (8) use of delineators, and (9) ordering of alternative definitions. The application of these concepts to the design of a language whose compiler is constructed by the system outlined in this thesis would lead to a compiler with very effective syntactic error recovery. This compiler would be very easy for a programmer to use, in the sense that he would have very little trouble getting his programs to be syntactically correct.

CHAPTER 7. EXTENSION TO RECURSIVE DESCENT PARSING

7.1 Additional Considerations Needed for Recursive Descent

Error detection is inherent in the modified Floyd production language parsing algorithm but not in the recursive descent parsing algorithm. This is the primary problem in applying the method of automatic error recovery presented in this thesis to recursive descent parsing, and it is primarily with the problem of error detection in recursive descent parsing that this chapter is concerned. If, in parsing a program according to a group of Floyd productions, no Floyd production applied, then an error was detected. In recursive descent parsing algorithms, the fact that a given production did not apply does not imply an error. It could be just that the wrong branch was tried and that some other branch is the valid one. In general, the only guarantee that there is an error is the event that the parse backs up to the program goal symbol with a false value.
The approach toward error recovery taken by many recursive descent compiler building systems is to require the language designer to catch all the errors through special error productions and semantic routines. He must define not only the language but also the complement of the language, the latter denoted as error constructs. He must also define appropriate semantic routines to tell the programmer of the error. If the problem of identifying the error can be solved sufficiently well, then the previously described error recovery mechanisms can be used to recover from the errors. Uniquely occurring terminal symbols can be inserted in the same manner as was described earlier. Other errors can be recovered from by using the parsing tables to generate strings which are compared with the given string for a possible change which will fix the error; if this fails, symbols can be thrown away until one is found which can follow any of the constructs currently being sought. This chapter describes a method of putting error detection into a recursive descent parsing algorithm. This error detection mechanism requires a minimal initial analysis of the syntax of a language.

7.2 Initial Analysis

In the initial analysis of the syntax, the syntax table is marked with three types of information: (1) some symbols are marked as required; (2) some terminal symbols are marked as unique; and (3) a table of following symbols is constructed. To mark the appropriate symbols as required, all the definitions for a particular nonterminal are compared from left to right. When the string to the left of the symbol being examined is different from the initial strings of all other definitions for this nonterminal, then that symbol and all those following it in the definition are marked as required symbols. The second function which must be performed is the identification of all those terminal symbols which occur in only one place in the syntax.
The occurrence of each such unique terminal symbol is marked with a special flag. Finally, an entry in a table of following symbols must be made. This table will have an entry for each symbol, terminal or nonterminal, which is marked as required, and an entry for each non-required nonterminal symbol which has a unique terminal symbol in one of its definitions. Each entry will include all the terminal symbols which can immediately follow that entry.

7.3 Parsing and Error Detection

In parsing, the special markings denoting required symbols and unique terminal symbols are used to mark each search as required or not. A required search which concludes with the value false constitutes an error. Recovery takes one of two approaches, depending on whether a terminal or a nonterminal was sought. When the search for a symbol is initiated, the search is marked as required or not according to the logical AND of the current search and the required flag for that symbol. The entry pointer to the table of following symbols is stacked. An error is then detected when the symbol is not found and the search for that symbol is marked as required; otherwise the wrong branch has simply been attempted. When a terminal symbol is found which has the unique terminal flag, the current search is marked as required, regardless of whether or not it was so marked before. When an error is detected, the following steps are taken. If the symbol being sought is a terminal symbol, then exactly the same steps are taken as described for the case of unique terminal symbols in the modified Floyd production parsing algorithm. These are the steps implemented by the PUTINSTACK procedure. If the symbol being sought is a nonterminal, the string generator procedure is called first. If it can find a change to the string which will correct the error, then that change is made and parsing begins again at the beginning of the search for this nonterminal.
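The bookkeeping of this section, together with the symbol-discarding step used when no correction is found, can be sketched in Python. This is a hypothetical sketch: the class and method names are assumptions, following-symbol sets are stored directly rather than as pointers into a table, the outermost search is assumed to be required, and the string generator step is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Search:
    symbol: str
    required: bool
    followers: Set[str]  # stands in for the stacked pointer into the following-symbol table


class SearchStack:
    def __init__(self) -> None:
        self.searches: List[Search] = []

    def begin(self, symbol: str, symbol_required: bool, followers: Set[str]) -> None:
        # Required = logical AND of the current (enclosing) search and the symbol's flag.
        # Assumption: the outermost search (the program goal) counts as required.
        enclosing = self.searches[-1].required if self.searches else True
        self.searches.append(Search(symbol, enclosing and symbol_required, followers))

    def unique_terminal_found(self) -> None:
        # A uniquely occurring terminal confirms the branch: mark the current search required.
        self.searches[-1].required = True

    def end(self, found: bool) -> Optional[Set[str]]:
        """Close the innermost search.

        If it failed while required, an error has been detected and the OR of all
        stacked following-symbol sets (this search's included) is returned.
        Otherwise None is returned: a success, or merely a wrong branch.
        """
        current = self.searches[-1]
        recovery = None
        if not found and current.required:
            recovery = set().union(*(s.followers for s in self.searches))
        self.searches.pop()
        return recovery


def discard_until(tokens: List[str], pos: int, wanted: Set[str]) -> int:
    """Discard symbols until one is in the recovery set; the sought nonterminal
    is then assumed to have been found and parsing continues from there."""
    while pos < len(tokens) and tokens[pos] not in wanted:
        pos += 1
    return pos
```

For example, a statement search begun with its own flag off inside a required block is not itself required, so its failure returns None. If a unique terminal was recognized first, its failure returns the union of the stacked sets, and `discard_until` then skips to the first symbol in that union.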
If no correction can be found, a set of symbols is formed which is the OR of all the following-symbol sets indicated in the stack of following-symbol table entries. Symbols are then discarded until one is found which is a member of this set. The nonterminal symbol is then assumed to have been found, and parsing continues.

7.4 Rationale for Required Searches

This section gives the reasoning behind the setting of the flags discussed above and their use in parsing. If a particular nonterminal symbol is required, one of the definitions for that nonterminal is required. Once enough symbols have been seen to identify which definition is valid, the rest of the symbols of that definition are required. This is the motivation for the setting of the "required symbol" flags. If the nonterminal is not required, then neither are any of its definitions nor any of the symbols in those definitions. Hence at parsing time, the requirement of finding a symbol is the logical AND of the current search and the requirement for this symbol. If a particular terminal symbol occurs in only one place in the syntax, then the recognition of that symbol confirms that the partial parse ending with that symbol is correct, since there is no other branch on which this symbol can occur. Hence the recognition of this symbol means that the current search can be set as required. With a more detailed analysis of the syntax, this could be done for two-symbol pairs or even triples. That is, all unique two-symbol pairs could be identified and, whenever the second symbol of a pair was recognized, the current search could be marked as required. However, only the current search can be marked as required by the recognition of a unique symbol. To illustrate this, consider the syntax of Figure 18.
::= c e
::= g
::= h
::= f
Figure 18. A BNF Grammar Illustrating Unique Terminal Symbol Flag Use

The symbol "f" is a unique symbol, but it can only cause the search for to be required, since the right parse may include either or . One could expand the analysis of the syntax in order to determine those cases in which the recognition of a unique symbol could mark more than just the current search as required. However, at some point the analysis might become complex enough that one should simply make the conversion to modified Floyd productions, or to some other deterministic parsing algorithm, which has much more accurate error detection anyway.

7.5 Conclusion

The error recovery system discussed in previous chapters can be applied to recursive descent parsing by means of the special techniques outlined in this chapter. These techniques include marking symbols as required, marking terminal symbols as uniquely occurring, and building a table of following symbols. Each search is marked as required or not according to these markers. A required search which fails constitutes the detection of an error. The recovery can then follow essentially the same methods as discussed in previous chapters.

CHAPTER 8. DEMALGOL RECURSIVE DESCENT COMPILER

8.1 The Recursive Descent Compiler Building System

A compiler employing most of the system discussed in Chapter 7 was implemented for the DEMALGOL language. The compiler building system that was used is one designed and built by Robert Trout (90) for the ILLIAC IV Project at the University of Illinois. This compiler building system uses an extended BNF. It outputs ALGOL procedures for the recognition of each nonterminal. These procedures are merged with the global scanner and parser procedures and the semantic procedures. Since Trout's compiler building system outputs ALGOL code for the syntax and not tables, there were no tables available for use in string generation.
Hence this DEMALGOL compiler did not implement that feature of the error recovery system.

8.2 Construction of the DEMALGOL Compiler

The description of DEMALGOL was given to the compiler building system, and the ALGOL source code that was produced was modified by hand in accordance with the methods outlined in Chapter 7. The syntax was analyzed to determine the settings of the various flags and the entries in the table of following symbols. A table of nonterminal names was also prepared for use in printing names in the error messages. The message printing and recovery procedures and the filling of the additional tables were added to the compiler source code. The parsing procedures were modified to contain the flag setting and testing, the stacking and unstacking of the following-symbol table pointers, and the calling of the recovery procedures. Another change was made in the initial calling sequence: if the call of program returned with the value false, or returned with the value true but the parse was not at the end of the program, then the next symbol was discarded and the parse was restarted at that point.

8.3 Error Recovery Results

The results of the error recovery of this compiler were not as good as those of the modified Floyd production compiler. This is primarily the result of not having the string generator procedure available, since there was no parsing table to use to generate strings. Nevertheless, the effectiveness of the recovery was still better than that of the ALGOL compilers with which it was compared, as will be seen in Section 8.5. This version of DEMALGOL was not quite the same as the previous one: it allowed assignment as a primary but did not allow comments. For the following results, the comments were removed from the programs. This compiler depended more on the semantic tests than the previous one, and this fact affected the error recovery to some extent.
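The modified calling sequence of Section 8.2 amounts to a restart loop, sketched here in Python. The sketch is hypothetical: `parse_program` stands for the generated recognizer of the program goal symbol and is assumed to return a success flag together with the position of the next unconsumed symbol, and the toy recognizer accepts only BEGIN, a run of x symbols, and END.

```python
from typing import Callable, List, Tuple

Recognizer = Callable[[List[str], int], Tuple[bool, int]]


def parse_with_restarts(tokens: List[str], parse_program: Recognizer) -> int:
    """Call the program recognizer, restarting after discarding one symbol whenever it
    fails or stops short of the end of the input; return the number of restarts."""
    pos, restarts = 0, 0
    while pos < len(tokens):
        ok, nxt = parse_program(tokens, pos)
        if ok and nxt == len(tokens):
            break                 # a complete program was recognized
        pos = max(nxt, pos) + 1   # discard the next symbol and restart at that point
        restarts += 1
    return restarts


def toy_program(tokens: List[str], pos: int) -> Tuple[bool, int]:
    """Hypothetical recognizer for: <program> ::= BEGIN {x} END."""
    if pos >= len(tokens) or tokens[pos] != "BEGIN":
        return False, pos
    pos += 1
    while pos < len(tokens) and tokens[pos] == "x":
        pos += 1
    if pos < len(tokens) and tokens[pos] == "END":
        return True, pos + 1
    return False, pos


print(parse_with_restarts(["BEGIN", "x", "END"], toy_program))       # 0
print(parse_with_restarts(["x", "BEGIN", "x", "END"], toy_program))  # 1
```

Each restart corresponds to one of the restart messages counted among the extraneous errors in the results below.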
There was a certain amount of backtracking before the compiler was able to identify the error, and this meant that in some cases the error was detected at a higher level, making the recovery less effective. In seven cases, the compiler had to restart the parse because, by the time the error was detected, the only paths available were those which conclude the parse. In each case this amounted to inserting END. BEGIN, although it took several messages to say this (one for each symbol and one for the restart message).

8.4 Summary of the Error Recovery in Seven Programs

The following summary lists the errors which were found in the same seven programs as were used in the discussion of the results of the modified Floyd production DEMALGOL compiler.

; missing (18 times) ... ; inserted (E)
ELSE missing (8 times) ... ELSE inserted (E)
. missing (6 times) ... . inserted (E)
THEN missing (3 times) ... THEN inserted (E)
initial BEGIN missing ... BEGIN inserted (E)
final END missing ... END inserted (E)
TRY ; ... TRY was meant to be a label ... : inserted (E)
GO TO 6 ... changed 6 to identifier (E)
GO TO 7E ... deleted 7 (E)
END ; ... interchanged and then inserted • (E and one extraneous error)
END; ... this END matched the initial BEGIN and the previous statement was missing a semicolon ... interchanged, inserted a period, and restarted the parse (E,E)
declaration types not valid but allowed syntactically by ::= FILE; the semantic routine printed an invalid type message (E,E,E, and 11 extra errors because of the (2,10) part of the FILE declaration)
ELSE ; missing ... inserted ELSE (G)
ENDA ; ... ; missing from previous statement, ENDA intended to be a label ... deleted ENDA (G)
I <- I kj; ... J deleted (G)
IF L := THEN ... assignment allowed but a relation is needed ... > inserted and then skipped to THEN (0 symbols) seeking an (G,G)
HOME : END ... ; missing ... skipped to END (0 symbols) seeking a and inserted • (G)
JAN 1 ...
skipped 2 symbols seeking (G) M ^ THAN 55 ... skipped 2 symbols (THAN 55) seeking (G) I + A ... the + was meant to be an <- . . . at + skipped one symbol seeking (G) A+ 1 ... at + skipped one symbol seeking (G) On each of these, the missing semicolon was correctly inserted and this was counted above in the first error reported. However, after inserting the semicolon on the first one, the parse terminated and was restarted creating k extra errors. = : ... skipped two symbols seeking (G) R := TRUE; ... TRUE and FALSE are not primaries in this version of DEMALGOL, hence they look like identifiers ... at TRUE skipped to ; seeking (F) THE DAY . . . inserted END . and restarted the parse (F and 5 extra errors) 101 BIN = ... : of assignment operator missing ... error at BIN while seeking , skipped to semicolon (p) INTO <- LO ... LO undeclared identifier, should have been 60 . . . error at INTO while seeking skipped to following symbol (P) PLUSONE <- TRUE . . . error at PLUSONE while seeking , skipped to following symbol (p) I : = TRUE . . . error at I while seeking , inserted END and . and restarted parse, inserting BEGIN (P and k extra errors) ENDOFJOB : END . . . label not declared . . inserted END . and restarted parse (P and 5 extra errors) 8.5 Comparison of These Results With Others Table k compares these results with those of the modified Floyd production DEMALGOL compiler, the ALCOR compiler, and the Burroughs ALGOL compiler for the same programs. Even though the performance of the compiler with error detection inherent in the parsing algorithm and a string gener- ator procedure as part of the recovery mechanism was much better than the others, the results of the DEMALGOL compiler built on a recursive descent parsing algorithm are still enough better than those of the ALGOL compilers to demonstrate that automatic error recovery is not only possible in recur- sive descent parsing but that it can be better than hand written error re- covery. 
Its biggest drawback appears to be that it generates many extraneous errors. All but one of these resulted from situations in which the parse terminated and had to be reinitiated. Terminating and reinitiating the parse involved inserting several symbols so that the given program looked like two programs placed end to end. The extraneous errors all came in groups of three to five, except for the 11 from the FILE declaration. They therefore do not seem likely to interfere seriously with the programmer's study of the other messages when he looks for his legitimate errors. If these extraneous errors prove to be a problem, one solution might be to suppress all but the first error message occurring at a particular card location.

These results demonstrate that effective automatic error recovery in recursive descent parsing is definitely possible.

Columns: DEMALGOL FPL Compiler; DEMALGOL Recursive Compiler; Illinois ALCOR Compiler; Burroughs ALGOL Compiler
Rows: Excellent, Good, Fair, Poor, Missed, Extra, Effectiveness
51 8 5(D h6 11 2 5 30 27 15 7 1 6 5^ (28) 12 (0) 5 (0) 1 (18) 6 21 ,891 (.95) .60.1 .tel (.U97) •39 1 + (.233;
Table 4. Comparison of All DEMALGOL Compilers

CHAPTER 9. LANGUAGE DESIGN CONSIDERATIONS FOR IMPROVED RECURSIVE DESCENT ERROR RECOVERY

9.1 Introduction

The earlier remarks on language design also apply to the recursive descent case, except that in recursive descent parsing the order of the productions matters. The order can be made optimal for error recovery only if the order that is best for error recovery is also the order required for parsing. This chapter discusses some additional optimizations of the error recovery for recursive descent compilers.

9.2 Forcing Required Symbols in Positions of Common Errors

By arranging the syntax appropriately, symbols which are commonly in error can be made to occur in positions of required symbols.
The commonly omitted semicolon illustrates this. Chapter 4 noted that DEMALGOL's syntax differs from ALGOL's: a semicolon is required after each statement rather than between statements, a change made to improve the recursive descent parser. If a semicolon separates statements, then finding a semicolon tells the compiler that another statement is expected. If a semicolon terminates a statement, then finding a statement tells the compiler that a semicolon is expected. In the first case, a missing semicolon makes the compiler end its search for a statement list, but an erroneous statement beginning will be fixed. In the second case, a missing semicolon will be inserted, but an erroneous statement beginning will make the compiler end its search for a statement list. Missing semicolons were more common than incorrect statement beginnings, so the second form of the syntax improved the error recovery. This illustrates the principle of arranging the syntax so that the more common errors occur at positions where the syntax has required symbols.

9.3 Syntax Description for Longer Required Searches

Whether a search is required depends not only on the required symbols but also on whether the search at the previous recursion level was itself required. The syntax must therefore be arranged so that required searches reach as low a level as possible. Figures 19 and 20 illustrate two different ways of defining a <block>. In each case, the underlined symbols are the ones marked as required.
Assume that <block> is called from a non-required search and that BEGIN occurs only in the places shown in Figures 19 and 20. In the example in Figure 19, no subsequent search will then be marked as required unless the preprocessor recognizes that all the BEGINs are in the same set of definitions. In that case the ";" following <...> in <...> will be required. In the example in Figure 20, by contrast, BEGIN will set <block> as required. This in turn makes <DECLARATION LIST>, <...>, ";" following <...>, ";" following <...>, <...>, ";" following <...>, and END required searches. Clearly, the required searches are carried to a much lower level in the second case, so the error recovery will be much more precise than in the first case.

Figure 19. Illustration of Poor Syntax for Long Required Searches

Figure 20. Illustration of Good Syntax for Long Required Searches

The main difference between these two examples is where the uniquely occurring symbol BEGIN appears. In the second example it is part of a much larger definition, namely <block>, so every subsequent search arising from the search for a block becomes a required search. In the first example, BEGIN made only the search for <...> required, so it had little effect on the rest of the search for <block>.

Error recovery can become much more effective if the syntax is arranged to do two things: mark more symbols as required, and let unique terminal symbols mark constructs as required at the most global level possible. This section has shown how the definition of an ALGOL block can be chosen to make the error recovery more effective.
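The contrast drawn in Section 9.2 between a semicolon that separates statements and one that terminates them can be sketched with two toy recursive descent routines. This is an illustrative Python sketch, not code from the thesis. The token names `S` (a well-formed statement) and `BAD` (a statement whose beginning is in error) are hypothetical, and "repair" here simply means reporting the error and consuming the bad token.

```python
from typing import List, Tuple


def _at(tokens: List[str], i: int) -> str:
    """Symbol at position i, or an end marker past the end of input."""
    return tokens[i] if i < len(tokens) else "<eof>"


def terminated_list(tokens: List[str]) -> Tuple[int, List[str]]:
    """<stmt list> ::= <stmt> ; { <stmt> ; }   (";" is required after a statement).
    Returns the index where the list ends and the messages issued."""
    msgs: List[str] = []
    i = 0
    while _at(tokens, i) == "S":            # finding a statement ...
        i += 1
        if _at(tokens, i) == ";":           # ... makes the ";" a required symbol
            i += 1
        else:
            msgs.append(f"; inserted before symbol {i}")
    return i, msgs                          # a bad statement beginning ends the list


def separated_list(tokens: List[str]) -> Tuple[int, List[str]]:
    """<stmt list> ::= <stmt> { ; <stmt> }   (a statement is required after ";")."""
    msgs: List[str] = []
    i = 1 if _at(tokens, 0) == "S" else 0
    while _at(tokens, i) == ";":            # finding a ";" ...
        i += 1
        if _at(tokens, i) == "S":           # ... makes a statement required
            i += 1
        elif _at(tokens, i) == "BAD":
            msgs.append(f"statement beginning repaired at symbol {i}")
            i += 1                          # treat the bad beginning as fixed
    return i, msgs                          # a missing ";" ends the list
```

For `S S ; END`, the terminator routine inserts the missing semicolon and continues to `END`. The separator routine, given `S S END`, stops the list at the second `S` without repairing anything. With a bad statement beginning the roles reverse: the separator routine repairs it locally, while the terminator routine ends the list at `BAD`.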
9.4 Semantic Setting of the Current Search as Required

Control over the error recovery could be further improved by letting semantic routines or markers in the syntax set the required search flags, as the unique symbols do. Reaching a certain point in the syntax may guarantee that the parse up to that point is correct, even when the analysis needed to establish this automatically is beyond the scope of the preprocessing program. In such a case, the language designer could insert a special semantic marker in the syntax to have the same effect as the unique symbol markers. This would improve the error recovery by keeping the required searches extended down to lower levels.

9.5 Uniqueness of Different Alternatives

Besides extending required searches down to as low a level as possible, it also helps to have alternative definitions differ as close to their beginnings as possible, so that more of their symbols are required symbols. More searches then become required, more errors are detected at lower levels, and the recovery can be more precise.

9.6 Conclusion

This chapter has discussed four suggestions, beyond those of Chapter 6, for improving the effectiveness of error recovery in the automatic error recovery scheme outlined in Chapter 7. They are: (1) putting symbols commonly subject to error in positions of required symbols, (2) arranging definitions so that required searches are carried to as low a level as possible, (3) setting the required search flag from semantic routines, and (4) making alternative definitions differ as close to the beginning as possible.

CHAPTER 10. FURTHER RESEARCH

10.1 Introduction

This chapter presents several paths that future research in this area could take.
These include more work on the methods described in this thesis, extension of these methods to other parsing algorithms, special extensions for probabilistic grammars, applications in computer science education, and applications in extensible compilers. Each is discussed in a separate section.

10.2 Refinements to the Current System

Three additional avenues of research apply directly to the systems of automatic error recovery already discussed. The first is optimizing the speed of the string generator procedure. As Chapter 5 indicated, the original three-symbol string generator was so slow that OSL programs with many errors took about five to ten times as long as error-free ones, although the other compilers were not as badly affected. This led to shortening the string length to two symbols. Some additional tests were also added to prevent the string generator from following two or more identical paths. A version of OSL with these changes was never implemented, so the actual improvement has not been measured. The string generator might also be programmed to recognize other conditions that would further reduce the time needed for error recovery.

The second avenue of research is whether the order of the Floyd productions could be optimized automatically, based on a record of how many times each production was actually applied. This would probably affect the speed of the parser more than the effectiveness of the error recovery.

The third area is a full implementation and development of the recursive descent error recovery system, since this research used only a hand-modified compiler. A compiler building system that incorporated this error recovery system would make it possible to refine the technique further and to evaluate its effectiveness against existing hand-coded error detection and recovery systems.
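The number of strings a generator must examine can grow as fast as the number of legal continuations raised to the power of the string length, which is why cutting strings from three symbols to two matters. The following Python sketch illustrates a bounded generator and a simple comparison against the input. It is not the thesis's procedure. The transition-table format (`TOY_TABLE`), the state names, and the insert, replace, and delete rules are assumptions chosen for illustration. Paths that reach the same state with the same symbols are merged in a set, which is one simple way to avoid following identical paths more than once.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical table: for each parser state, the (symbol, next state) pairs
# that are legal there. Alternatives may repeat, as grammar paths can.
Table = Dict[str, List[Tuple[str, str]]]

TOY_TABLE: Table = {
    "after_stmt": [(";", "stmt"), (";", "stmt"), ("END", "done")],
    "stmt": [("IDENT", "after_ident"), ("END", "done")],
    "after_ident": [(":=", "expr")],
    "expr": [("IDENT", "after_stmt"), ("NUMBER", "after_stmt")],
    "done": [(".", "final")],
}


def generate(table: Table, state: str, k: int) -> Set[Tuple[str, ...]]:
    """Return every string of exactly k symbols legal from `state`.

    Each frontier entry is (symbols so far, state reached). Keeping the
    frontier in a set merges identical paths, so each is expanded once.
    Paths with no continuation simply drop out."""
    frontier: Set[Tuple[Tuple[str, ...], str]] = {((), state)}
    for _ in range(k):
        frontier = {
            (prefix + (sym,), nxt)
            for prefix, st in frontier
            for sym, nxt in table.get(st, [])
        }
    return {prefix for prefix, _ in frontier}


def suggest(table: Table, state: str, rest: List[str], k: int = 2) -> List[str]:
    """Compare generated strings with the symbol in error (rest[0]) and the
    symbols after it, and list the one-symbol repairs they imply."""
    fixes: List[str] = []
    for s in sorted(generate(table, state, k)):
        tail = list(s[1:])
        if tail == rest[: k - 1]:                   # s[0] fits before rest[0]
            fixes.append(f"insert {s[0]}")
        if tail == rest[1:k] and s[0] != rest[0]:   # s[0] fits in place of rest[0]
            fixes.append(f"replace {rest[0]} by {s[0]}")
        if list(s) == rest[1 : k + 1]:              # rest[0] is surplus
            fixes.append(f"delete {rest[0]}")
    return fixes
```

For example, in state `after_stmt` with the input `IDENT := NUMBER` (a missing semicolon), the only repair proposed is to insert `;`.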
10.3 Extension to Other Parsing Algorithms

Another area of research is extending the present system to other parsing algorithms, such as those based on precedence techniques. This research has shown that the form of automatic error detection and recovery described here can be applied to two different parsing algorithms. It could conceivably be extended to other parsing algorithms as well. This would need to be investigated to find out how far it can be used and what modifications, extensions, or limitations would be needed to make it work.

10.4 Extension to Probabilistic Grammars

The work of Clarence Ellis (33) on probabilistic grammars suggests a further question. Could the error recovery be based on the probabilities of the generated strings, combined with how closely those strings match the given string, rather than on the closeness of the match alone? If so, to what extent? It would be interesting to see whether such additional information improves the recovery where it is currently only fair, without degrading it where it is currently excellent.

10.5 Applications in Computer Science Education

Compilers which describe syntactic errors clearly and precisely could be a great help to beginning students. It could be worthwhile to study using the automatic error recovery system described in this thesis to develop compilers for student use.

Beginning students in computer science seem to have three main areas of frustration in learning to program. The first is the discipline of breaking the logic of a problem down to the level needed to program it. It seems difficult to learn how to describe explicitly the procedure used to solve a problem. The second area is representing the algorithm in a programming language.
Misplaced commas and misspelled words matter less in English themes than they do in programming, and it is sometimes difficult to become accustomed to this precision. The third area is identifying and correcting the logical bugs in the program and in the algorithm. This area is certainly not limited to beginning programmers.

Compilers for computer science education, particularly for beginning programming courses, could be built using the error detection, recovery, and diagnostic features outlined in this thesis. Such compilers could help eliminate the frustration beginning programmers experience in writing correct statements in the programming language. Research into whether such compilers significantly reduce the effort needed to learn programming would be helpful. It might also be interesting to determine whether they help experienced programmers accomplish their work significantly more quickly and easily.

10.6 Applications to Extensible Compilers

Perhaps the most important area of application and further research is extensible compilers, since an extensible compiler needs some form of automatic error recovery to operate effectively.

An extensible compiler is a compiler system that starts from a basic or core language to which other language constructs can be added. In theory, the core contains all the primitive syntactic constructs and semantic operators needed for any addition or extension to the language. Any area of computer application that existing languages cannot handle easily is a candidate for an extensible compiler. The syntactic constructs needed for that application can easily be specified in terms of the existing syntactic constructs of the extensible language. The semantics associated with these additions can likewise easily be written in terms of operators provided by the core language.
The extensible compiler system merges these additions into itself, giving a compiler for the new "extended" language in which problems from the given area of application can then be programmed.

To a number of people, extensible compilers appear to be the best way to provide programming languages for the growing variety of problem areas to which computers are being applied. Producing separate languages and compilers for each problem area is considered too expensive and limiting. Producing one universal programming language is considered both too large and too rigid to allow for future changes. These concepts were presented and discussed at a symposium of the A.C.M. special interest group on programming languages, and the papers were printed in volume 4, number 8 (August 1969) of SIGPLAN Notices. Besides discussing the concept of extensible compilers, this symposium also presented seven extensible languages that were either under development or operational.

Two factors make this research on automatic error recovery more important. The first is the increasing effort currently being spent on extensible compilers and extensible languages. The second is that an extensible compiler must be based on some compiler building system so that it can build the extensions into the compiler. If the extensible compiler is to function effectively, it must also handle errors in the extensions effectively. Moreover, the extensions will make some formerly invalid constructs valid, so the error detection in the basic part of the compiler must itself be dynamic. Therefore the only error detection and recovery mechanism suitable for extensible compilers is an automatic one. Research applying the error recovery system outlined in this thesis to an extensible compiler system would be very much in order.
This research might parallel that described in Section 10.3, namely determining the extent to which this error detection and recovery is applicable to other parsing algorithms.

10.7 Summary

This chapter has indicated five directions that future research with the system of automatic error recovery described in this thesis could take. One of these, refinement of the current system, includes timing evaluations and speed improvements, automatic optimization based on run-time records, and implementation of the recursive descent automatic error recovery system. Perhaps the single most important application of this system would be in the implementation of extensible compilers. Also important might be using a compiler building system with this automatic error recovery to build compilers for computer science education or other student use. The chapter also suggested extending the methods to other parsing algorithms and to probabilistic grammars. Pursuing any of these areas of further study could be rewarding and could make valuable contributions to the field of compiler construction.

CHAPTER 11. SUMMARY

This thesis has set forth a method of automatic syntactic error recovery and discussed its implementation. It has presented example results and compared them with the error recovery of other compilers. It has also offered suggestions for improving error recovery in specific languages and for further research.

The first chapter summarized the results of other research and gave the philosophy of this research. Although there has been much work in some related areas, there has been very little work on automatic error recovery itself.
The basis of the philosophy of this work is the conviction that the compiler should try to find all of the programmer's syntactic errors on the first run and describe them clearly to the programmer. To accomplish this, the compiler should diagnose the error situation far enough that it can correct the source program. Only in that context can the compiler give the programmer complete information about the nature of his error. A method of evaluating and comparing error recovery systems was devised. It includes several measurable variables, together with a functional relationship between them that reflects how closely an error recovery system approaches the goal of describing all of a programmer's syntactic errors precisely.

The second chapter presented the translator writing system in which this work on automatic error recovery was embedded. It mentioned the TWINKLE language and compiler for syntax description. It then discussed the conversion from the Backus Naur form created by the TWINKLE compiler to modified Floyd production language, including the types of Floyd production groups generated and their basis in the Backus Naur form. The parser language and the parsing tables constructed from the modified Floyd productions were discussed, as well as the optimization of the parser instruction table. The chapter concluded with a short comment on the conversion of the parsing table to Burroughs extended ALGOL source code.

Chapter 3 described how the error recovery system operates within the translator writing system. It discussed the parser instructions specific to error recovery and the procedures those instructions call. Three types of error situations were identified by the type of parser instruction in which they were detected.
The first of these was the case of a specific required terminal symbol. Here the recovery inserted that symbol in one of three ways:

(1) compare the symbol in error and the next one or two symbols with a table of legal following symbols, looking for a simple way of inserting the missing symbol to make the string legal;

(2) call a string generating procedure that generates all two- or three-symbol strings legal at that point in the parse, looking for one which, compared with the symbol in error and the next three symbols, indicates a way to correct the program;

(3) insert the missing symbol in front of the symbol at which the error was detected.

The second type of error situation was a group of productions that all make the same reduction. Here the recovery either performed step (2) above or made the reduction anyway. The third type covered all other cases. Here the recovery first tried step (2); if that failed, it discarded input symbols until it reached one that allowed a reduction.

The basic test and development language DEMALGOL was described in Chapter 4. The results of the error recovery of the DEMALGOL compiler on seven small programs were presented. The results of running the same seven programs on two ALGOL compilers were compared with the results of the DEMALGOL compiler and summarized in Table 2.

Three additional languages were implemented on the translator writing system described in Chapter 2. These three, TESLA, ICL, and OSL, were discussed in Chapter 5, along with examples of the error recovery of their compilers. For one of them, ICL, another compiler was also available for comparison. These results, along with those of DEMALGOL, were summarized in Table 3.

Chapter 6 offered several ways in which error recovery could be improved for any compiler constructed with this translator writing system.
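Step (1) of the recovery for a missing required terminal, together with the step (3) fallback, can be sketched as follows. This is a hedged Python illustration, not the thesis's procedure. The contents of `FOLLOW` are invented for a toy ALGOL-like language, and the order in which the two simple fits are tried is an assumption. When neither fit applies, the sketch falls through to step (3) and omits the string generator of step (2).

```python
from typing import Dict, List, Set, Tuple

# Hypothetical table of legal following symbols: FOLLOW[t] holds the
# terminals that may immediately follow the required terminal t.
FOLLOW: Dict[str, Set[str]] = {
    ";": {"IDENT", "BEGIN", "IF", "GO", "END"},
    "THEN": {"IDENT", "BEGIN", "IF", "GO"},
    ":=": {"IDENT", "NUMBER", "("},
}


def recover_missing(required: str, rest: List[str]) -> Tuple[str, List[str]]:
    """`required` was expected but rest[0] was found. Return a message
    describing the repair and the repaired input."""
    follow = FOLLOW.get(required, set())
    # (1a) rest[0] may legally follow the missing symbol: just insert it.
    if rest and rest[0] in follow:
        return f"{required} inserted", [required] + rest
    # (1b) rest[1] may follow it: rest[0] was probably written in its place.
    if len(rest) > 1 and rest[1] in follow:
        return f"{rest[0]} replaced by {required}", [required] + rest[1:]
    # (3) no simple fit: insert the missing symbol in front of the symbol
    # in error anyway (step (2), the string generator, is omitted here).
    return f"{required} inserted (default)", [required] + rest
```

For instance, `recover_missing(":=", ["=", "NUMBER"])` treats the `=` as a mistyped assignment operator and replaces it. This is the kind of error seen in the `BIN =` entry of Section 8.4, although the recursive descent compiler there detected that error elsewhere.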
The system of error recovery outlined in Chapter 3 could be extended to work in a recursive descent parsing algorithm. Chapter 7 presented an error detection algorithm for recursive descent parsing that allows the automatic error recovery system to be used. This algorithm provided a means of marking subgoals as required, so that an error is detected if a required subgoal is not found.

Chapter 8 described a DEMALGOL compiler built with this recursive descent error detection system and automatic error recovery. Its results on the same DEMALGOL programs used earlier were compared with those of the other DEMALGOL and ALGOL compilers and summarized in Table 4.

Just as Chapter 6 had offered ways to improve the error recovery of a modified Floyd production language compiler, Chapter 9 offered additional suggestions for improving the error recovery in recursive descent compilers built with the system described in Chapter 7.

Chapter 10 discussed several further areas of research involving the systems described in this thesis. The most important are probably the applications to extensible compilers and to computer science education.

By all the comparisons and tests that have been discussed, the system of automatic error recovery described in this thesis appears to be very effective. It proved superior to every other system with which it was compared in its ability to recover from errors so as to find all the errors in most programs and to describe most of them exactly. It has thus been shown to be superior in effectiveness of error recovery, by the definition of effectiveness given in this thesis.

LIST OF REFERENCES

[1] Allegre, Nicole G., "TESLA, A Control Language for Logic Simulations of Digital Circuits", Report No. 347, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, Aug. 1969.
[2] Alsberg, P. A., "OSL/2, An Operating System Language", Ph.D. Thesis, University of Illinois at Urbana-Champaign, Urbana, Illinois, 1971.
[3] Backes, S., "Automatic Generation of Syntax-Processors", (AEG-Telefunken Res. Inst., Ulm, West Germany) Elektronische Rechenanlagen 12 (April 1970), 80-86.
[4] Baecker, Ronald M., "Experiments in On-line Graphical Debugging: The Interrogation of Complex Data Structures", Proceedings of the Hawaii International Conference on System Sciences, University of Hawaii Press, Honolulu, Hawaii, 1968, 128-129.
[5] Balzer, R. M., "EXDAMS-Extendable Debugging and Monitoring System", Proc. AFIPS 1969 SJCC, 567-580.
[6] Bayer, R., "Post Mortem Program Failure Analysis in a Time-Sharing Environment", (Boeing, Seattle) Rept. Information Sciences 33, June 1969, 23 pp. CFSTI, AD 692 616.
[7] Beals, A. J., "The Generation of a Deterministic Parsing Algorithm", Report No. 304, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, January 1969.
[8] Beals, A. J., "The Automatic Generation of Deterministic Parsing Algorithms", Ph.D. Thesis, University of Illinois at Urbana-Champaign, Urbana, Illinois, 1971.
[9] Beals, A. J., LaFrance, J. E., and Northcote, R. S., "The Automatic Generation of Floyd Production Syntactic Analyzers", Report No. 350, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, (September 1969), presented at the SIGPLAN Symposium on Programming Language Definition, San Francisco, August 1969.
[10] Bernstein, William A., and Owens, James T., "Debugging in a t-s Environment", Proc. AFIPS 1968 FJCC, Part I, 7-14.
[11] Brady, Paul T., "Writing an On-line Debugging Program for the Experienced User", Comm. ACM 11,6 (June 1968), 423-427.
[12] Brooker, R. A., and Morris, D., "An Assembly Program for a Phrase Structure Language", Comput. J. 3 (1960), 168.
[13] Brooker, R. A., and Morris, D., "Some Proposals for the Realization of a Certain Assembly Program", Comput. J. 3 (1961), 220.
[14] Brooker, R. A., and Morris, D., "A Descriptive Mercury Autocode in Terms of a Phrase Structure Language", Second Annual Review of Automatic Programming, Pergamon, New York, 1961.
[15] Brooker, R. A., and Morris, D., "A General Translation Program for Phrase Structure Languages", J. ACM 9 (Jan. 1962), 1.
[16] Brooker, R. A., Morris, D., and Rohl, J. S., "Experience With the Compiler-Compiler", Comput. J. 9,4 (Feb. 1967), 345-349.
[17] Brooker, R. A., et al., "The Compiler-Compiler", Annual Review in Automatic Programming, Vol. 3, 1963, 229.
[18] Chapin, Ned, "Logical Design to Improve Software Debugging - A Proposal", Computers and Automation 15,2 (Feb. 1966), 22-24.
[19] Cheatham, T. E., "The TGS-II Translator-Generator System", Proc. IFIP Congress, New York, 1965, 592-593.
[20] Cheatham, T. E., and Sattley, "Syntax-Directed Compiling", Proc. AFIPS 1964 SJCC.
[21] Cohen, J., and Nguyen-Dinh, X., "Note on Ordering of Grammar Rules in Syntax-Analyzers", Comput. J. 9,3 (Nov. 1966), 250-251.
[22] Constantine, Larry L., "Design and the Reduction of Bugs", In Concepts in Program Design, Paragon Press, Somerville, Mass., 1966, 115-126.
[23] Cowan, D. C., and Graham, J. W., "Design Characteristics of the WATFOR Compiler", SIGPLAN Notices 5,7 (July 1970).
[24] Damerau, F., "A Technique for Computer Detection and Correction of Spelling Errors", Comm. ACM 7,3 (Mar. 1964), 171-176.
[25] Dean, P., "Information Retrieval and Conversational Diagnostic Techniques", Computer Bulletin 12,2 (June 1968), 48-53.
[26] DeRemer, F. L., "Generating Parsers for BNF Grammars", 1969 SJCC AFIPS Proc. Vol. 34, AFIPS Press, Montvale, N. J., 1969, 793-799.
[27] DeRemer, F. L., "Practical Translators for LR(k) Languages", Rep't. MAC-TR-65, Oct. 1969, 219 pp.; CFSTI, AD 699 501.
[28] Earley, J. C., "Generating a Recognizer for a BNF Grammar", Computation Center Rep., Carnegie Institute of Technology, Pittsburgh, Pa., 1965.
[29] Earley, J. C., "An LR(k) Parsing Algorithm", Carnegie Institute of Technology, Pittsburgh, Pa., 1967 (mimeo).
[30] Earley, J. C., "An Efficient C. F. Parsing Algorithm", Comm. ACM 13 (Feb. 1970), 94-102.
[31] Eichel, J., "Generation of Parsing Algorithms for Chomsky 2-type Languages", 6401, Mathematisches Institut der Technischen Hochschule, Munich, 1964.
[32] Eichel, J., Paul, M., Bauer, F. L., and Samelson, K., "A Syntax Controlled Generator of Formal Language Processors", Comm. ACM 6 (Aug. 1963), 451-455.
[33] Ellis, C. A., "Probabilistic Languages and Automata", Ph.D. Thesis, University of Illinois at Urbana-Champaign, Urbana, Illinois, 1969.
[34] Evans, Arthur, "An ALGOL 60 Compiler", Annual Review in Automatic Programming, Vol. 4, 1964, 87-124.
[35] Evans, T., and Darley, D., "On-line Debugging Techniques: A Survey", 1966 FJCC AFIPS Proc. Vol. 29, 37-50.
[36] Feldman, Jerome A., "A Formal Semantics for Computer-Oriented Languages", Ph.D. Thesis, Carnegie Institute of Technology, Pittsburgh, Pa., 1964.
[37] Feldman, Jerome A., "A Formal Semantics for Computer Languages and Its Application in a Compiler-Compiler", Comm. ACM 9 (Jan. 1966), 3-9.
[38] Feldman, Jerome A., and Gries, David, "Translator Writing Systems", Comm. ACM 11,2 (Feb. 1968), 77-113.
[39] Ferentzy, Eors N., and Baubura, James R., "A Syntax-Directed Processor Writing System", Proc. AFIPS 1968 FJCC, Part I, 637-647.
[40] Floyd, R. W., "A Descriptive Language for Symbol Manipulation", J. ACM 8 (Oct. 1961), 579-584.
[41] Floyd, R. W., "Syntactic Analysis and Operator Precedence", J. ACM 10 (July 1963), 316-333.
[42] Floyd, R. W., "Bounded Context Syntactic Analysis", Comm. ACM 7 (Feb. 1964), 62-67.
[43] Floyd, R. W., "The Syntax of Programming Languages - A Survey", IEEE Trans. EC-13, 4 (Aug. 1964), 346-353.
[44] Floyd, R. W., "Nondeterministic Algorithms", J. ACM 14, 4 (Oct. 1967), 636-644.
[45] Freeman, D. N., "Error Correction in CORC: the Cornell Computing Language", Ph.D. Thesis, Cornell University, Ithaca, New York, 1963.
[46] Fuchi, Kazuhiro, Tanaka, Hozumi, Manago, Uriko, and Yiba, Toshitsugu, "A Program Simulator by Partial Interpretation", Second Symposium on Operating Systems Principles, Princeton University, Princeton, New Jersey, October 1969, Brandon/Systems Press, Princeton, New Jersey, 97-104.
[47] Green, Joseph H., "Program Analysis - A Program in Man-Computer Communication", Ph.D. Thesis, Harvard University, Cambridge, Mass., 1969.
[48] Gries, David, "The Use of Transition Matrices in Compiling", Comm. ACM 11,1 (Jan. 1968), 26-34, or Tech. Rep. CS-57, Computer Science Department, Stanford University, Stanford, Calif., or Clearinghouse, U. S. Dep't. of Comm., Springfield, Va., PB 176 765, March 1967, 63 pp.
[49] Gries, David, "Compiler Construction", John W. Wiley and Sons, New York, 1971.
[50] Gries, D., Paul, M., and Wiehle, H. R., "Some Techniques Used in the ALCOR-Illinois 7090", Comm. ACM 8 (Aug. 1965), 496-500.
[51] Grishman, Ralph, "The Debugging System AIDS", 1970 SJCC Proc. AFIPS Vol. 36, Atlantic City, N. J., 1969, AFIPS Press, Montvale, N. J., 1970, 59-64.
[52] Hext, Jan B., "Recovery From Error", Computers and Automation 16,4 (April 1967), 29-31.
[53] Ingerman, P. Z., "A Syntax-Oriented Translator", Academic Press, New York, 1966.
[54] Irons, E. T., "A Syntax-Directed Compiler for ALGOL 60", Comm. ACM 4 (Jan. 1961), 51-55.
[55] Irons, E. T., "An Error-Correcting Parse Algorithm", Comm. ACM 13 (Nov. 1963), 669-673.
[56] Irons, E. T., "Experience with an Extensible Language", Comm. ACM 13 (Jan. 1970), 31-40.
[57] Irving, D. C., and Morrison, G. W., "PICTURE: An Aid in Debugging Geometric Input Data", Rep't ORNL-TM-2892, Oak Ridge Nat'l Lab., Oak Ridge, Tenn., May 1970, 14 pp. CFSTI, AD 706 695.
[58] Josephs, William H., "An On-line Machine Language Debugger for OS/360", Proc. AFIPS 1969 FJCC, 179-186.
122 [59] Jossen, Sven Ingvar, "On-line Program Debugging", BIT 8,2 (1968), 122-127. [60] Kahan, B., and Dumas-Primbault, H. , "Principles of a Syntactical Method for Compiler Writing", Revue Francaise d'Informatique et de Recherche Operationnelle B-2 (Aug. 1969), 51-75. [61] Kanner, H., Kosinski, P., and Robinson, C. L. , "The Structure of Yet Another ALGOL Compiler", Comm. ACM 8,7 (July 1965), 1+27-^38. [62] Knuth, D., "An Empirical Study of FORTRAN Programs", Computer Science Report CS-186, Stanford University, Stanford, California, 1970, 9- [63] Korenjak, A. J., "A Practical Method for Constructing LR(k) Processors' Comm. ACM 12 (Nov. 1969), 613-623. [6k] Kulsrad, H. E., "HELPER: An Interactive Extensible Debugging System", Second Symposium on Operating Systems Principles, Princeton University, Princeton, New Jersey, October 1969* Brandon/ Systems Press, Princeton, New Jersey, 105-111. [65] LaFrance, Jacques E., "Optimization of Error Recovery in Syntax- Directed Parsing Algorithms", SIGPLAN Notices 5,12 (Dec. 1970). [66] Lawrie, D. H., "GLYPNIR, A List Processing Language for ILLIAC IV", Report No. 322, Department of Computer Science, University of Illinois at Urb ana- Champaign, Urbana, Illinois, April 1969« [67] Ledgard, H. F. , "A Formal System for Defining the Syntax and Semantics; of Computer Languages", Report MAC-TR-60, (April 1969), 206; CFSTI, AD 689 305. [68] Leinius, Ronald, "Error Detection and Recovery in Syntax-Directed Compilers", Ph.D. Thesis, University of Wisconsin, Madison, Wisconsin, 1970. [69] Lewy, Arieh, "A Routine for Detecting Data Cards Out of Sequence in Standard Programs", Educational and Psychological Measurement 28,1 (1968), I7I-I75. [70] Lomet, David Bruce, "The Construction of Efficient Deterministic Language Processors", Ph.D. Thesis, University of Pennsylvania, Philadelphia, Pa., 1969. [71] Machado, N. C, "ISL - A Semantics Language for a Translator Writing System", Report No. 
367, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, (December 1969). 123 [72] McKeeman, Horning, Nelson, and Wortman, "The XPL Compiler -Generator • System", 1968 FJCC Proc. AFIPS Part I, 6.17-635 [73] Mercer, R. L. , "TWINKLE— A Syntax Language for a Translator Writing System", Report No. 396, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, June 1970. [74] Morgan, H. L., "Spelling Correction in Systems Programs", Comm. ACM 13 (Feb. I97O), 90-94. [75] Moulton, P. G., and Muller, M. E., "DITRAN-A Compiler Emphasizing Diagnostics", Comm. ACM 10,1 (Jan. 1967), 45-52. [76] O'Neil, John T., Jr., "META-PI - An On-line Interactive Compiler- Compiler", 1968 FJCC Proc. AFIPS Part I, 201-218. [77] Pavis, Denise C, "ICL - A Control Language for ILLIAC IV", Report No. 356, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, October 1969* [78] Pullen, E. W. , and Shuttee, D. F., "MUSE: A Tool for Testing and Debugging a Mult i- terminal Programming System", 1968 SJCC Proc. AFIPS 491-502. [79] Randell, B., and Russel, D. J., "ALGOL 60 Implementation ", Academic Press, London, 1964. [80] Resnick, Mark, and Sable, Jerome, "INSCAN: A Syntax-Directed Language Processor", Proc. ACM 23rd Nat M Conf . , 423-432. [8.1] Rosen, S., "A Compiler-Building System Developed by Brooker and Morris", Comm. ACM 7 (July 1964), 403-4l4. [82] Rosenberg, A. L., "A Note on Ambiguity of C. F. Languages and Presentations of Semilinear Sets", J. ACM 17 (Jan. 1970), 44-50. [83] Samelson, K. , "Programming Languages and Their Processing", Proc. IFIP Congress, Munich, 1962, 487-492. [84] Samelson, K. , and Bauer, F. L. , "Sequential Formula Translation", Comm. ACM 3 (Feb. i960), 76-83. [85] Schneider, F. W. , and Johnson, G. D. , "Meta-3: A Syntax-Oriented Compiler-Writing Compiler to Generate Efficient Code", Proc. ACM 19th NatM Conf., 1964. [86] Schorre, D. 
V., "Meta-II: A Syntax-Oriented Compiler-Writing Language", Proc. ACM 19th NatM Conf., 1964. 12k [87] Shantz, P. W. , German, R. A., Mitchell, J. G., Shirley, R. S. K. , and Zarnke, C. R. , "WATFOR - The University of Waterloo F ortran IV Compiler", Comm. ACM 10, 1 (Jan. 1967), kl-k 1 ?. [88] Simpson, H. R. , "A Compact Form of One-Track Syntax Analyser", Comput. J. 12,3 (Aug. 1969), 233-2^3. [89] Trout, R. G., "A Compiler-Compiler System", Proc. ACM 22nd Nat'l Conf., 1967, 317-322. [90] Trout, R. G., "A BNF-Like Language for the Description of Syntax- Directed Compilers", Report No. 300, Department of Computer Science, University of Illinois at Urban a -Champaign, Urbana, Illinois, January 1969* [91] Ungar, Stephen H. , "A Global Parser for Context-Free Phrase Structure Grammars", Comm. ACM 11, k (Apr. 1968), 2UO-2U7. [92] Usacheva, N. A., "Detection of Syntactic Errors in L-Programs", in Gavrilov, M. A., and Zakrevskii, A. D., (Eds.) LYaPAS ; a Programming Language for Logic and Coding Algorithms , Academic Press, New York, 1969~ [93] Van Horn, Earl C, "Three Criteria for Designing Computing Systems to Facilitate Debugging", Comm. ACM 11,5 (May 1968), 36O-365. [9^-] Varley, T. C, "Data Input Error Detection and Correction Procedures", Report Serial-T-222, June 1969, 2k6; CFSTI, AD 698 365- [95] Warshall, S., "A Syntax-Directed Generator", Proc. EJCC Washington, 1961, MacMillan and Co., 1961. [96] Whitney, G. E., "An Extended BNF for Specifying the Syntax of Declarations", 1969 SJCC AFIPS Proc. Vol. 3^, AFIPS Press, Montvale, N. J., 1969, 801-812. [97] van Wijngaarden, A., Mailloux, B. j., Peck, J. E. L., and Koster, C. H. A., "Report on the Algorithmic Language ALGOL 68". Numerische Mathematik 1^,2, 1969, 8U-218. [98] Wirth, Niklaus, "PL360, A Programming Language for the 36O Computers", J. ACM 15,1 (Jan. 1968), 37-7 1 +. [99] Wirth, Niklaus, and Weber, Helmut, "Euler-A Generalization of ALGOL, and its Formal Definition", Comm. ACM 9,1 (Jan. 
1966), 13-25 and 9,2 (Feb. 1966), 89-99. 125 [100] Zakrevskii, A. D., "Checking Out L-Programs on Ural-1", in Gavrilov,. JVU A..,, and. Zakrev.skii, A. D. (Eds.) LYaPAS : a Programming Language for Logic and Coding Algorithms , Academic Press, New York, 1969* [101] Zimmerman, Luther L. , "On-Line Program Debugging - A Graphic Approach", Computers and Automation, 16,11 (Nov. 1967), SO-S^- 126 APPENDIX A DEMALGOL PROGFAMS 127 <_> 2 ■JL ■a I »-< ••■! UJ Si _i »- + i c c O O * —• LJ i DC a> •- »- O z o 2 —■ 2 »— .2 O Uj >- U, 1 a k a. »- c UJ 11 >-■ a o o 128 »- a. Q. C X U. C C.Z o o 2 O UJ 1 2 . UJ X a U. X « _ a i u a CL D u. z 1— c: a >- C i »-i U. O O uj O Z* u_ o z _J ►- o — 2 m ►- *~ MS 2 O »- z ►- - o >- Cj _> 2 a ^ 2 i/i 2 •- c -n o a. -a ►- •— o <\J J. o X o m X « X o CL S x * u. z a. *- c. z a: uj a c x t/> ►- •— US X z o UJ u. C. C- >- 2 ■a G u. 2 IT Uj • c. CK — a o L u u. »- O X z »- 1 ox 129 1/5 4/3 4/5 4/5 4/3 4/» o o o o o o z z z z z Z o o o o o o uucuuu i.i 1. 1 tjj UJ UJ LU IA V) lA IA- ►-—••- i-i —• >-i 3 mux z o o o o o o x c c c c c c 4/5 4/} 4/*, 4/V f/> 4/5 O O O O O O z <£ 2 z 2 2 a o o C o O o a <-. o O 4 u U U- U Lu u. o • • • 1/5 »/. 4/. t/j vn %/t CM 4/5 4/5 4/3 c c c • /n oc <— o «-• o CM Z TL 1 -~ o o o ■p- •- 4N B- cv. o o o o 4/> uuu 1/ » 4/; 4/5 4/3 ■o 1/*. V. V 4/- V. 4/5 u O- U - U u- u- u. UJ ►- 3 «» «r *-* L. w _. w _, _ ^ Uj 1 i 1 c a a a D. O- Ol a. o o o L> <-J «-> o o -« — i: — M JC 2 o > o <* a »— Q- ^_ rr «/> V, u «— 1 »~ u. Im cv. •a *- 3 X •— cc Q_ N^ _* J- o o % ui ci •-* •-H a. X •- • i ►- Ul ►- >■ < —I o o «»>-•« i « _j 2 t~ * u «- * > a. u ►- •-• u c a CM c x «-» a «x k^. o z o »- ft o --> M O z " o _* ►- o w _l o o ►- Ui X* ►- »— Z O • u zoo «X _J o «x «. — 2 1% *— H2 ' X* Ul a. *— X •- a u. a. a -a a. * v* PM CL H « o c _j C Ul Cj U a: •• •• ex u a Ul M •• Ct UJ *• a. o c. a. u a Cj _i 2 X _> z l/> -4 X s •-I U. 
[Further computer listings, including a syntax listing produced by the University of Illinois translator writing system, are not legible in the scanned copy.]
APPENDIX D

ICL SYNTAX

LANGUAGE ICL; [rest of header line illegible]
SPECIAL SYMBOLS COMMENT, BEGIN, END, DO, THEN, ELSE;
[illegible] ::= <BLOCK> ;
****************************************************************
<DECLARATION> ::= <RESULT DECLARATION> / <PROGRAM DECLARATION> / <FILE DECLARATION> ;
<RESULT DECLARATION> ::= RESULT LIST <*I> SEPARATED BY , ;
<PROGRAM DECLARATION> ::= [ILLIAC ? PROGRAM / B6500 PROGRAM / JOB ? PARTNER / COMPILER] LIST [<*I> [<PARAMETER LIST> <LABEL EQUATION> / <PARAMETER LIST>]] SEPARATOR , ;
<FILE DECLARATION> ::= B6500 <B6500 FILE MODE> FILE LIST [<*I> / <*I> <LABEL EQUATION>] SEPARATOR , / ABSOLUTE ? ILLIAC ? <ILLIAC FILE MODE> LIST [<*I> <DISK LAYOUT> / <*I>] SEPARATOR , ;
<B6500 FILE MODE> ::= [illegible] PRECISION ? DOUBLE PRECISION ? PACKED / UNPACKED ? ;
<ILLIAC FILE MODE> ::= BYTE FILE / UNSIGNED FILE / [SHORT ? [INTEGER / REAL] ?] ? FILE ;
****************************************************************
<DISK LAYOUT> ::= <*N> ? [<UNIT> ? <NON LIST REQUEST> / #[ [illegible]] ;
<DISK SPACE> ::= <REQUESTS> / <NON LIST REQUEST> / <UNIT> [<NON LIST REQUEST> / <SIMPLE REQUEST> / <ITERATION REQUEST>] ;
<SIMPLE REQUEST> ::= <*N> [illegible] ;
<NON LIST REQUEST> ::= SAME AS <B6500 FILE ID> <CONTIGUOUS> ? BREAK FILE ;
<CONTIGUOUS> ::= <*I> ;
<UNIT> ::= <EU> <SU> ? <SU> <EU> ? <EU> ; [separators partly illegible]
<EU> ::= [EUA / EUB / EU0 / EU1] ;
<SU> ::= [SUA / SUB / SUC / SUD / SUE / SUF / SU0 / SU1 / SU2 / SU3 / SU4 / SU5] ;
<QUANTIFICATION> ::= <*I> PT STARTSWITHLETTERC / <*I> PT STARTSWITHLETTERS [<*I> PT STARTSWITHLETTERC] ? ;
<MULTIPLE> ::= <*N> ;
<REQUESTS> ::= LIST <REQUEST> SEPARATOR , ;
<REQUEST> ::= <PHASED REQUEST> / <SIMPLE REQUEST> / <ITERATION REQUEST> / <UNIT> ? <MULTIPLE> #( <REQUESTS> #) ;
<PHASED REQUEST> ::= <ADDRESS> [illegible] <SIMPLE REQUEST> ;
<ADDRESS> ::= <*N> <*I> <*N> ; [partly illegible]
<ITERATION REQUEST> ::= <NUMBER> #[ <ADDRESS> [illegible] <DELTA> [illegible] <SIMPLE REQUEST> #] ;
<DELTA> ::= <*N> <*I> ? ;
[illegible] ::= <*N> ;
<PARAMETER LIST> ::= #( LIST <*I> SEPARATED BY , #) / EMPTY ;
****************************************************************
<STATEMENT> ::= <IF STATEMENT> / <BLOCK> / <ASSIGNMENT STATEMENT> / <REMOVE STATEMENT> / <QUIT STATEMENT> / <PRINT STATEMENT> / <MOVE STATEMENT> / <TO PROGRAM STATEMENT> / <FROM PROGRAM STATEMENT> / <EXECUTION STATEMENT> / <DURING STATEMENT> / <FORK STATEMENT> / <CASE STATEMENT> / EMPTY ;
<IF STATEMENT> ::= IF <EXPRESSION> THEN <STATEMENT> [PT NOTELSE / ELSE <STATEMENT>] ;
<ASSIGNMENT STATEMENT> ::= <IDENTIFIER> PT RESULT #:= <EXPRESSION> ;
<MOVE STATEMENT> ::= <ILLIAC FILE ID> #:= <B6500 FILE ID> / <B6500 FILE ID> #:= <ILLIAC FILE ID> ;
<EXECUTION STATEMENT> ::= <EXECUTION STEP> / TRY [<DURING STATEMENT> / <EXECUTION STATEMENT>] [illegible] <STATEMENT> [PT NOTELSE / ELSE <STATEMENT>] ;
<ASSIGNMENT STEP> ::= <IDENTIFIER> PT RESULT #:= <EXECUTION STEP> ;
<DURING STATEMENT> ::= DURING [<EXECUTION STATEMENT> / <ASSIGNMENT STEP>] DO #BEGIN [<DURING STATEMENT LIST> WHICH IS LIST <STATEMENT> SEPARATED BY #;] [illegible] ;
<ALL OF> ::= #ALL #OF / #ALL ;
<FORK STATEMENT> ::= <ALL OF> #BEGIN [<FORK STATEMENT LIST> WHICH IS LIST <STATEMENT> SEPARATED BY #;] [illegible] ;
<CASE STATEMENT> ::= CASE <EXPRESSION> #OF #BEGIN [<CASE STATEMENT LIST> WHICH IS LIST <STATEMENT> SEPARATED BY #;] #END ;
<CASE EXPRESSION> ::= CASE <EXPRESSION> #OF #( [<CASE EXPRESSION LIST> WHICH IS LIST <EXPRESSION> SEPARATED BY ,] #) ;
<IF EXPRESSION> ::= IF <EXPRESSION> THEN <EXPRESSION> ELSE <EXPRESSION> ;
<TO PROGRAM STATEMENT> ::= <IDENTIFIER> PT INDURINGSTATEMENTANDTHISISTHEDURINGPROGRAM #:= <EXPRESSION> ;
<FROM PROGRAM STATEMENT> ::= <IDENTIFIER> PT RESULT #:= #FROM <IDENTIFIER> PT ISTHISTHEDURINGPROGRAM ;
<PRINT STATEMENT> ::= PRINT <*S> ;
<QUIT STATEMENT> ::= STOP <*N> / STOP / ERROR <*N> / ERROR ;
<REMOVE STATEMENT> ::= REMOVE LIST [<ILLIAC FILE ID> / <ILLIAC PROGRAM ID>] SEPARATED BY , ;
<BLOCK> ::= #BEGIN [<BLOCK LIST> WHICH IS LIST [<DECLARATION> / <STATEMENT>] SEPARATED BY #;] #END ;
****************************************************************
<LABEL EQUATION> ::= LIST [ANY NONCHARACTER BUT <*N> BUT <*S>] SEPARATOR #/ <*S> ? ;
****************************************************************
<EXPRESSION> ::= <IDENTIFIER> PT RESULT [illegible] <EXPRESSION> / <IF EXPRESSION> / [illegible] ;
[The productions defining the operator levels of <EXPRESSION> are largely illegible in the scanned copy.]
<PRIMARY> ::= [illegible] / <IDENTIFIER> PT RESULT / TRUE / FALSE / <*N> / #( <EXPRESSION> #) / <EXECUTION STEP> / [MIN / MAX] #( [<MIN MAX EXPRESSION LIST> WHICH IS LIST OF [illegible] SEPARATED BY COMMAS STARTING WITH [illegible]] #) / <CASE EXPRESSION> ;
<IDENTIFIER> ::= <*I> ;
<ILLIAC FILE ID> ::= <IDENTIFIER> PT ILLIACFILEID ;
<ILLIAC PROGRAM ID> ::= <IDENTIFIER> PT ILLIACPROGRAMID ;
<B6500 FILE ID> ::= <IDENTIFIER> PT B6500FILEID ;
<EXECUTION STEP> ::= <COLLECT STATEMENT> / <IDENTIFIER> PT ILLIACPROGRAMORB6500PROGRAMORCOMPILER <ACTUAL PARAMETER PART> [WITH <IDENTIFIER> <ACTUAL PARAMETER PART>] ? ;
<ACTUAL PARAMETER PART> ::= #( LIST [<*S> / DUMPFILE = <ILLIAC FILE ID> / <*I> = <*I>] SEPARATED BY , #) / EMPTY ;
<COLLECT STATEMENT> ::= COLLECT #( LIST <B6500 FILE ID> SEPARATOR , #) INTO <B6500 FILE ID> ;

VITA

Jacques Emmett LaFrance was born June 25, 1939, in Stockton, California. He received most of his elementary and secondary education in the public schools of Overland Park, Kansas. His undergraduate years were spent at Harvard University, from which he was graduated in 1961 with the degree of Bachelor of Arts in mathematics. Following two and one-half years' study at the University of Kansas, he received the degree of Bachelor of Science in Education in January 1964. He then taught mathematics for one and one-half years at Northeast Junior High School in Kansas City, Kansas. While working on his M.A. at the University of Illinois, beginning in the fall of 1965, he did substitute mathematics teaching in the public schools of Champaign and Urbana and was employed as a programmer for the physics department. Following the granting of his Master of Arts degree in mathematics in January 1967, he transferred to the Department of Computer Science. He has served as a research assistant on the ILLIAC IV project and for the Center for Advanced Computation while working on his Ph.D.
He is the author of two papers entitled "Implementation of 'A Fast Direct Solution of Poisson's Equation Using Fourier Analysis' on ILLIAC IV" and "Optimization of Error Recovery in Syntax-Directed Parsing Algorithms". He is also co-author, with Alan Beals and Robert Northcote, of a paper entitled "The Automatic Generation of Floyd Production Syntactic Analyzers". Mr. LaFrance is a member of the Association for Computing Machinery and three of its subgroups: the Special Interest Group for Programming Languages (SIGPLAN), the Special Interest Group for Computer Science Education (SIGCSE), and the Special Interest Group on Computers and Society (SIGCAS). He also holds membership in the Mathematical Association of America and was elected to membership in the scientific honorary, the Society of the Sigma Xi, in 1968. He is currently Special Instructor in Mathematics and Director of the Academic Computer Center at Wheaton College, Wheaton, Illinois.

Security Classification: UNCLASSIFIED

DOCUMENT CONTROL DATA - R&D

1. Originating activity (corporate author): Center for Advanced Computation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
2a. Report security classification: UNCLASSIFIED
3. Report title: SYNTAX-DIRECTED ERROR RECOVERY FOR COMPILERS
4. Descriptive notes (type of report and inclusive dates): Research Report
5. Author(s): Jacques Emmett LaFrance
6. Report date: June 10, 1971
7a. Total no. of pages: 172
7b. No. of refs: 101
8a. Contract or grant no.: USAF 30(602)-4144
8b. Project no.: ARPA Order 788
9a. Originator's report number(s): ILLIAC IV Document No. 249
9b. Other report no(s).: DCS Report No. 459
10. Distribution statement: Copies may be requested from the address given in (1) above.
11. Supplementary notes: None
12. Sponsoring military activity: Rome Air Development Center, Griffiss Air Force Base, Rome, New York 13440
14. Key words: Utility Programs; Debugging