UIUCDCS-R-77-861
UILU-ENG 77 1713

A PASCAL COMPILER

by

Yao-Ching Stephen Chen

April 1977

DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
URBANA, ILLINOIS

A PASCAL COMPILER

BY

YAO-CHING STEPHEN CHEN
B.S., National Taiwan University, 1971

THESIS

Submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science in the Graduate College of the University of Illinois at Urbana-Champaign, 1977

Urbana, Illinois

Digitized by the Internet Archive in 2013
http://archive.org/details/pascalcompiler861chen

ACKNOWLEDGEMENTS

I wish to express my gratitude to Professor Thomas T. Chen for his continued advice and support throughout the development of this project. Thanks also go to A. B. Baskin and Douglas W. Jones for their many helpful suggestions during the implementation phase of this project.

TABLE OF CONTENTS

1. INTRODUCTION
   1.1. THE SYSTEM ENVIRONMENT
   1.2. THE DESIGN OF THIS COMPILER
      1.2.1. LANGUAGE USED FOR ENCODING THE COMPILER
      1.2.2. COMPILER TARGET LANGUAGE
      1.2.3. THE CHOICE OF A PARSING ALGORITHM
      1.2.4. ENCODING THE COMPILER
         1.2.4.1. SYNTAX ANALYZER AND SCANNER
         1.2.4.2. SEMANTIC ANALYZER AND CODE GENERATION
         1.2.4.3. RUNTIME ROUTINES
2. MSP PASCAL GRAMMAR
   2.1. MIXED STRATEGY PRECEDENCE
      2.1.1. MSP AND STACKING-DECISION PREDICATE
      2.1.2. MSP AND PRODUCTION SELECTION
   2.2. THE MSP PASCAL GRAMMAR
   2.3. DIFFERENCES IN SYNTAX BETWEEN THE MSP PASCAL GRAMMAR AND THE STANDARD PASCAL GRAMMAR
      2.3.1. THE EMPTY STATEMENT
      2.3.2. FILE VARIABLE AND POINTER VARIABLE
3. THE SCANNER AND THE SYNTAX ANALYZER
   3.1. THE SCANNER OF THIS COMPILING SYSTEM
      3.1.1. THE PROCEDURES READ AND WRITE
      3.1.2. THE SCANNER AND READ, WRITE PROCEDURES
   3.2. SYNTAX ANALYSIS
4. SYMBOL TABLE MANAGEMENT
   4.1. STRUCTURE OF AN ENTRY IN THE SYMBOL TABLE
   4.2. REPRESENTING STRUCTURE IN THE SYMBOL TABLE
5. RUNTIME STORAGE MANAGEMENT
   5.1. STACK-BASED STORAGE MANAGEMENT
   5.2. HEAP STORAGE ALLOCATION
   5.3. STATIC STORAGE MANAGEMENT - AN EXTENSION
   5.4. DATA STRUCTURES
      5.4.1. STORAGE FOR ELEMENTARY DATA TYPES
      5.4.2. ARRAY
      5.4.3. RECORD
      5.4.4. FILE
      5.4.5. SET TYPES
      5.4.6. PACKED STRUCTURE TYPE
      5.4.7. ACTUAL - FORMAL PARAMETERS
6. SEMANTIC TRANSLATION
   6.1. THE SEMANTIC ANALYZER AND CODE GENERATION
   6.2. CODE GENERATION ROUTINES
      6.2.1. REGISTER ALLOCATION ROUTINES
      6.2.2. THE BINOP ROUTINE
   6.3. AN EXAMPLE FOR CODE GENERATION
   6.4. REMARKS ON GOTO STATEMENT
   6.5. ERROR RECOVERY
      6.5.1. RECOVERING FROM SEMANTIC ERRORS
      6.5.2. RECOVERING FROM SYNTACTIC ERRORS
7. EXTENSIONS TO THE MODCOMP PASCAL SYSTEM
   7.1. COMMON VARIABLE
      7.1.1. SPECIFICATIONS FOR THE COMMON VARIABLE
      7.1.2. AN ALTERNATIVE APPROACH FOR VARIABLE INITIALIZATION
   7.2. EXTERNAL PROCEDURES
      7.2.1. SYNTAX FOR EXTERNAL PROCEDURES IN MAIN PROGRAM
      7.2.2. PREPARING ASSEMBLY PROGRAM AS EXTERNAL PROCEDURE
   7.3. FORWARD PROCEDURES
   7.4. FILE TYPES
   7.5. READ/WRITE PROCEDURES AND RANDOM ACCESS FILES
SUMMARY
REFERENCES

LIST OF APPENDIX

APPENDIX A - THE PASCAL BNF GRAMMAR
APPENDIX B - STATISTICS FOR SYNTAX TABLES
APPENDIX C - SPECIFICATIONS FOR THE SEMANTIC STACK
APPENDIX D - PASCAL RUNTIME PACKAGE AND EXTERNAL PROCEDURES
APPENDIX E - MACROS FOR PREPARING ASSEMBLY SUBROUTINE AS EXTERNAL PROCEDURE
APPENDIX F - REGISTER ALLOCATION
APPENDIX G - SUBROUTINE LINKAGE

1. INTRODUCTION

A compiler for the PASCAL [4,5,21] programming language has been implemented on the MODCOMP IV [15] computer system. The compiler, which is encoded in FORTRAN, is one-pass and generates reentrant assembly code. Its syntax parsing algorithm is based on Mixed Strategy Precedence (MSP) [10], and the MSP PASCAL grammar [Chapter 2] used in this compiler is listed in Appendix A.

The FORTRAN and BASIC currently supplied by MODCOMP are not well suited for system software development. The principal goal of this project is to implement a powerful language which may be compiled into efficient machine code and used for system software and data base management. It is also desired that the project be completed within a reasonable amount of time. After careful analysis, PASCAL, which combines a rich data type structure with strong type checking, was selected as the language to be implemented.

PASCAL, designed by Dr. Niklaus Wirth, is an ALGOL-like [12,14] programming language with extensive data structure facilities. The original language definition is in "The Programming Language PASCAL" [21], and the concise definition of an updated standard PASCAL can be found in the Revised Report in the "PASCAL User Manual and Report" [5]. Since detailed information regarding PASCAL is readily available in the literature, this paper concentrates only on the local implementation.

1.1. THE SYSTEM ENVIRONMENT

The MODCOMP IV is a medium-to-large, multiprogrammable, paged memory, general purpose computer, with a maximum central memory capacity of one million 8 bit bytes.
A disk operating system, MAX IV, is provided to control the MODCOMP IV paged memory system, the register context hardware and the priority task execution [15,16,17].

Due to local system constraints, the size of the compiler is limited to a maximum of 27,648 (Hex 6C00) 16 bit words. This has necessitated the tight packing of many tables and the exclusion of some language features from the implementation. However, these limitations may be lifted when more core memory becomes available on the local system.

1.2. THE DESIGN OF THIS COMPILER

Basically, this implementation of a PASCAL compiler is a one-pass processor which generates assembly language as its object code. The execution of a PASCAL program requires the service of the PASCAL compiler, assembler and link-loader, in that order. The assembly code generated by the compiler is system compatible and can be stored, examined and edited by system utilities.

1.2.1. LANGUAGE USED FOR ENCODING THE COMPILER

It was decided in the early stages of development that a high level language should be used to encode the compiler. The use of assembly language could be justified on grounds of efficiency. However, the current trend in programming is toward using higher level languages and regarding programming as an engineering discipline. The reasons are obvious: easier programming, less effort required in debugging and modification, and lower cost.

FORTRAN certainly is not an ideal compiler writing language. FORTRAN is not flexible enough in its control structures, especially in its lack of recursion. In dealing with data structures, FORTRAN provides very little support in maintaining such well-defined structures as stacks and binary trees. Given the conventional FORTRAN data structures, strings, stacks and binary trees have to be simulated by the use of vectors and appropriate routines to support operations on them.
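The simulation of stacks and binary trees by vectors can be illustrated with a minimal sketch. It is written in Python rather than FORTRAN, and every name in it (VectorStack, VectorTree, the capacities) is hypothetical and not taken from the compiler. It assumes fixed-size vectors, an integer stack pointer, and index 0 standing for "no node", as a FORTRAN program with 1-based arrays might do. The tree traversal uses an explicit stack, since recursion is not available in FORTRAN.

```python
class VectorStack:
    """A stack kept in a fixed-size vector with an integer stack pointer."""

    def __init__(self, capacity):
        self.cells = [0] * capacity   # the vector holding the stack cells
        self.top = 0                  # number of cells in use; 0 means empty

    def push(self, value):
        if self.top >= len(self.cells):
            raise OverflowError("stack overflow")
        self.cells[self.top] = value
        self.top += 1

    def pop(self):
        if self.top == 0:
            raise IndexError("stack underflow")
        self.top -= 1
        return self.cells[self.top]


class VectorTree:
    """A binary search tree kept in three parallel vectors.

    Node i has key[i], left[i] and right[i]; index 0 means 'no node'.
    """

    def __init__(self, capacity):
        self.key = [0] * (capacity + 1)
        self.left = [0] * (capacity + 1)
        self.right = [0] * (capacity + 1)
        self.count = 0                # nodes allocated so far
        self.root = 0

    def insert(self, k):
        if self.count >= len(self.key) - 1:
            raise OverflowError("tree vectors are full")
        self.count += 1
        new = self.count
        self.key[new] = k
        if self.root == 0:
            self.root = new
            return
        cur = self.root
        while True:
            # Pick the child link to follow, or fill it if it is empty.
            links = self.left if k < self.key[cur] else self.right
            if links[cur] == 0:
                links[cur] = new
                return
            cur = links[cur]

    def inorder(self):
        """Return the keys in ascending order, without recursion."""
        result = []
        pending = VectorStack(len(self.key))
        cur = self.root
        while cur != 0 or pending.top > 0:
            while cur != 0:           # walk left, remembering the path
                pending.push(cur)
                cur = self.left[cur]
            cur = pending.pop()
            result.append(self.key[cur])
            cur = self.right[cur]
        return result
```

In this sketch the "appropriate routines" are push, pop, insert and inorder; the vectors themselves carry no structure beyond what those routines impose on them.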
However, FORTRAN was used for the encoding of the PASCAL compiler because it was the best available language on the local system.

It would be possible to have a boot-strap PASCAL compiler written in PASCAL on another system which already has a PASCAL compiler, and the boot-strapping compiler could generate the target machine executable code. Indeed, this is an elegant method for writing a compiler. It was not adopted here for two main reasons. First, writing a boot-strapping compiler places a tremendous demand on the resources of the other computer system, which are not readily available. Secondly, the MSP parsing algorithm [10] is based on many syntax tables [Appendix B], and there exist no adequate features in PASCAL to initialize those tables.

1.2.2. COMPILER TARGET LANGUAGE

The compiler can generate either assembly or machine code. In order to ease the burden of debugging the compiler, assembly language was selected as the target language. Moreover, this approach takes advantage of utility routines in the assembler, thus reducing the size of the basic compiler. For example, the memory allocation of PASCAL can be handled by pseudo assembly instructions. Of equal significance is the availability of macro assembly language features for code generation.

1.2.3. THE CHOICE OF A PARSING ALGORITHM

A formal parsing algorithm is used because of the availability of many well-developed Translator-Writing-Systems (TWS), which are very powerful tools for compiler writing. The Mixed-Strategy-Precedence (MSP) parsing algorithm was selected because of the existence of an MSP grammar analyzer [10] on the IBM/360.

1.2.4. ENCODING THE COMPILER

A decision must be made about the number of passes for the compiling process before encoding the compiler. A one-pass approach was selected mainly in the interest of compilation speed. This section serves as an introduction to the encoding of the compiler; detailed discussions are given in later chapters.
1.2.4.1. SYNTAX ANALYZER AND SCANNER

The first step in encoding the PASCAL compiler was to manually develop a grammar [Appendix A] in Backus-Naur Form (BNF). Based on this grammar, a grammar analyzer, which is part of the TWS designed by McKeeman [10], was used to produce the syntax tables [Appendix B] for the MSP parsing algorithm. The grammar analyzer was also designed to check whether the BNF grammar for PASCAL was MSP of degree (2,1;1,1) parsable. Although the grammar analyzer was run under the IBM OS/360, all of the other tasks in implementing this compiler were done on the local system.

The next step was the encoding of the MSP syntax recognizer in FORTRAN. This syntax recognizer utilized the tables [Appendix B] produced by the grammar analyzer; however, those tables first had to be transformed from XPL data format into FORTRAN data statements (XPL is a dialect of PL/I and is part of the TWS [10]). The scanner [Chapter 3] was developed next, and the compiler at this stage could be used for syntax checking.

1.2.4.2. SEMANTIC ANALYZER AND CODE GENERATION

Semantic analysis and code generation constituted the major effort of writing this compiler. The semantic analyzer includes symbol table management [Chapter 4], runtime storage management [Chapter 5] and semantic routines [Chapter 6] for various constructs in PASCAL. Since this was to be a one-pass compiler, code generation was mixed with semantic analysis, and the code generation routines [Chapter ...

[...]

   <...> ::= +
   <...> ::= -
   <...> ::= +
   <sign> ::= -

If the four rules above were merged into two, the error message 'stacking cannot be made with (2,1) context' and/or 'production cannot be distinguished with (1,1) context' would be generated during the grammar analysis phase.

Some of the production rules were also created because of semantic considerations.
For example:

   rule 46   <...> ::= REPEAT

was inserted so that detection of the beginning of a REPEAT statement could be communicated to the semantic routines by the syntax parser.

Once the compiler has been written, it is rather difficult to adapt it to changes in the MSP PASCAL grammar. Thus, it is advisable to resolve the grammar specifications carefully before advancing to the phase of semantic analysis and code generation. Even after the grammar is accepted by the grammar analyzer, there remains a risk that the grammar might not represent the intended target language, or that semantic considerations might require the creation of additional productions.

2.3. DIFFERENCES IN SYNTAX BETWEEN THE MSP PASCAL GRAMMAR AND THE STANDARD PASCAL GRAMMAR

The standard PASCAL syntax was presented as a top-down grammar [5]. Several difficulties occurred during the process of generating the MSP PASCAL grammar for this implementation, partly because of restraints imposed by the MSP algorithm, and partly because the top-down grammar had to be restructured for a bottom-up parsing analyzer.

In addition to the extensions presented in Chapter 7, there are two minor differences between the syntax specification of this grammar and that of the PASCAL Report. These are explained in the sub-sections below.

2.3.1. THE EMPTY STATEMENT

Since the semicolon [;] is considered a statement separator instead of a terminator, the empty statement presents some problems in making the MSP PASCAL grammar acceptable to the MSP grammar analyzer. The empty statement must be explicitly represented in the MSP PASCAL grammar wherever it is allowed, and a considerable amount of manipulation of the grammar is required in order to make the grammar acceptable. As a compromise, this implementation allows only restricted usage of the empty statement.

Empty statements are most commonly used at the following two locations:

a) Before keyword END or UNTIL; and
b) After