UIUCDCS-R-76-828

VALIDATION OF THE ANALYSIS PASS OF THE CLEOPATRA COMPILER

by

SANDRA ANN LEACH

October 1976

DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
URBANA, ILLINOIS

BY SANDRA ANN LEACH
B.S., University of Illinois, 1975

THESIS

Submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science in the Graduate College of the University of Illinois at Urbana-Champaign, 1976

Urbana, Illinois

Acknowledgments

Preparing a thesis is not only a technically demanding process, but a lonely and frustrating one as well. I have been fortunate to have many colleagues and friends who have supported me in both respects.

On the technical side, I would first like to thank my thesis advisor, Dr. H. George Friedman, Jr., for suggesting this project, for his helpful comments while reading this thesis, and for his patience. I also wish to thank Dr. Axel T. Schreiner for his advice in the planning of the project, and Mr. Scott Fisher for his cooperation and assistance.

My heartfelt appreciation and thanks go to my good friends, Dr. R. L. Danielson, Mr. W. D. Gillett, Ms. H. J. Irwin, Dr. *. r. Weaver and Dr. T. R. Wilcox, for their help in editing and proofreading this thesis, and most of all for their moral support.

Last and most importantly, I thank my parents for their unflagging confidence and encouragement.
Without them this thesis would not have been possible. Thank you all.

Table of Contents

1 Introduction
2 Testing Philosophy
  2.1 Definition of Validation
  2.2 Available Testing Methods
  2.3 Selecting a Method
  2.4 Implementation of the Testing
3 Lexical Analysis
  3.1 Character Set Mapping
  3.2 Delimiting Boundaries
  3.3 Comment Tokens
  3.4 Identifier and Operator Tokens
  3.5 Numeric Constants
  3.6 Character Literals
  3.7 Complete Lexical Analysis
4 Syntactic Analysis
  4.1 Structure and Data Blocks
  4.2 Routine Blocks
  4.3 Expressions
5 Semantic Analysis
  5.1 Symbol Table Complex
    5.1.1 Predefined Symbols
    5.1.2 User-defined Symbols
  5.2 Intermediate Text
6 Conclusions
References
Appendix: Summary of Productions Implemented

1 Introduction

CLEOPATRA, the "Comprehensive Language For Elegant Operating System And Translator Design", was designed in 1974 by Schreiner [1,2] as a high-level, block-structured, extensible language aimed primarily at operating systems implementors. During 1975 a first attempt at an implementation of the language was made in PL/I on an IBM/360. This version implemented only a subset of CLEOPATRA and was intended to be a bootstrap so that future CLEOPATRA compilers could be written in the language itself.

The compiler consisted of two phases: analysis and code generation. The former performed lexical, syntactic, and semantic analysis and produced a symbol table and intermediate text. The latter phase operated on the intermediate text to produce object code in the form of a relocatable binary deck suitable for input into the OS/360 loader.

Unfortunately, soon after the completion of this project the analysis phase implementation was found to be inadequate; it was bug-ridden and, because of inferior coding, would have required substantial effort to correct.
Therefore, the decision was made not to attempt repair of the bad code, but to rewrite it. During 1976 a second attempt was made; the implementation of the old code generation phase (Halbur [3,4]) was considered to be of sufficient quality to retain, at least for the present, so the new analysis phase (Fisher [5]) was written to fit the coder's specifications.

The faults of the original analysis phase and their detection at such a late date resulted from meager and ad hoc testing methods during its development. In order to preclude running amok in the same fashion a second time, an extensive testing effort has been made concurrently with the development of Fisher's implementation; a documentation of the testing effort comprises the body of this thesis. The goal of the testing is twofold: first, to demonstrate that Fisher's implementation of the analysis phase is "correct" (or, more realistically, to document any deviations from the specifications) so that later CLEOPATRA compilers can be assured of a solid basis; and second, to produce a test data package which can be permanently associated with the compiler and used for regression testing when the currently implemented subset is expanded.

Chapter 2 discusses the difficult problem of testing a large piece of software and the basic philosophies that this testing effort follows. Subsequent chapters detail the design, production, and execution of test cases for each component of the compiler's analysis phase: Chapter 3 on lexical analysis, Chapter 4 on syntactic analysis, and Chapter 5 on semantic analysis. The final chapter summarizes the results. Appendix A gives the definition of the language actually accepted by this implementation of the analysis phase, and Appendix B presents the complete test data generated.

This thesis is intended as a companion to the reports mentioned above. A knowledge of the information contained in [1] and [4] is essential to understanding the material discussed here.
To minimize confusion, the terminology and notation used in this thesis are, as much as possible, consistent with the earlier reports.

2 Testing Philosophy

The intent to validate a piece of software immediately gives rise to many questions. Some of them are:
- What does validation mean?
- What testing methods are available?
- Which method is to be used and why?
- How is the method to be carried out?
Before discussing the actual validation of CLEOPATRA's analysis phase, it would be beneficial to answer these questions.

2.1 Definition of Validation

Gruenberger [6] gives a very useful definition of validation. Assume that a program has been designed and encoded, and the implementor has debugged the code, thus removing the mechanical errors of coding and producing a superficially correct program. The program as written clearly solves some problem; the purpose of validation is to show that it solves the desired problem.

Validation usually involves testing performed by an outside agent playing the role of the "Devil's Advocate". This testing has a dual purpose: showing that all functions in the program specifications are implemented, and showing that all implemented functions are specified. These are what Elmendorf [7] calls "specification testing" and "program testing," respectively. If the two sets of functions are identical, the correct problem has been solved, and the program has been validated.

However, if the two sets of functions are not identical, the testing will have pinpointed the aberrant function(s). The cause of the deviation could be inconsistent, poorly-stated or erroneous specifications or miscoding (or even a combination of these). The implementor must then decide how best to remedy the discrepancy and carry out the correction; the testing can then be repeated.

The repetition of the entire bank of test cases run against the software after each repair of the code is called "regression testing" (Brown and Sampson [8]).
Any modification of the program runs the risk of introducing an error which was not present before the change was made and which may be totally unrelated to the successful implementation of the change itself. So testing must be repeated to ensure that previous work has not been damaged in some obscure way by the alteration.

2.2 Available Testing Methods

There have been many methods of software testing discussed in the literature during the past few years. Some of these techniques are more thorough than others, both in a formal sense (Goodenough and Gerhart [9]) and in relation to the concepts of program and specification testing as discussed above. Some are more costly in human time or computing resources or both. Therefore a practical balance between thoroughness and expense must be found.

One way to show the correctness of a program is by proof. However, there is rarely enough information available to allow a formal proof of correctness. As London [10] points out, assumptions must be made about the semantics of the programming language used, the completeness of the problem domain specification, and the correctness of the proof itself. Going a step further, the entire running environment of the program (language, operating system, and hardware processors) must be completely axiomatized and proved consistent before the program can be formally implemented and proved within the system. This is clearly beyond current state-of-the-art techniques, so more informal proof methods are employed. But these less formal proofs are more prone to error. Goodenough and Gerhart show that several programs which have been "proven" in the literature contain numerous bugs. It is interesting to note that many of these bugs are due to imprecisely stated specifications and would have been found if even rudimentary test cases had been run.

Another testing method, which relies on brute-force computing power, is to test all possible input combinations.
Even for a program having a very small number of inputs, exhausting input combinations is absurdly impractical. Take, for example, a program with two inputs X and Y that computes one output Z = F(X,Y) (Huang [11]). If X and Y are represented in 32-bit registers, there are 2^32 x 2^32 = 2^64 possible input combinations. Even if the program itself took an average of only one millisecond per execution, it would take more than 50 billion years to complete the test!

The next three methods involve similar approaches. The first method is to generate test cases that exercise every statement of the program. This test is practical, since choosing the test data is a fairly straightforward, mechanical process. However, there are several types of errors that are not necessarily discovered using this technique. Erroneous transfer of control may not be found since it is possible to execute every statement in a program and not traverse all (possibly faulty) control paths. Also, missing control paths (i.e., failure to examine a special case) probably will not be discovered since the choice of test data is based on the program code and not on the program specification.

The second method, which is much more likely to detect errors, is to require that each and every control path be executed at least once. This method guarantees that erroneous transfers of control will be found, and also detects subtle errors which are contingent on the exact sequence of previous events in the execution. However, the method is impractical due to loop constructs. A program with a loop has at least as many different control paths as the number of times the loop can be iterated, and for programs with several loops the total number of paths is the product of the paths through each individual loop. Obviously, this can lead to a prohibitively large number of paths in many cases.
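
The growth in the number of paths described above can be made concrete with a short calculation. The following is only an illustrative sketch in Python: the function name total_paths and the per-loop path counts are assumptions chosen for the example, not figures taken from any program discussed in this thesis. It simply multiplies the number of paths through each loop, as the text describes.

```python
from math import prod

def total_paths(paths_per_loop):
    """Multiply the number of distinct paths through each loop.

    paths_per_loop is a list with one (hypothetical) path count per loop.
    """
    return prod(paths_per_loop)

# Three hypothetical loops, each of which can be traversed in 100 ways.
print(total_paths([100, 100, 100]))
```

Even these modest, made-up counts give a number of paths far larger than any practical set of test cases.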
A more practical approach, a compromise between the previous two, is to generate test data that exercises all program statements and all branches. This is equivalent to requiring that each edge of the flowchart corresponding to the program be traversed at least once (Huang [11]). This scheme is reasonably effective, but still does not ensure that all errors will be detected; those such as missing control paths, incorrect path selection, and an incorrect or missing action may not be found due to an (un)fortunate choice of test data.

The last technique to be discussed has a completely different emphasis. Instead of basing tests on the program itself, this method generates test cases directly from the specifications, using program structure only to complement the specification information. Goodenough and Gerhart describe how decision tables may be employed to facilitate identification of "all conditions relevant to a program's correct operation," from which test cases are generated that exercise all possible combinations of these conditions. This is an effective technique, but it involves an element of insight on the part of the tester to correctly determine the "relevant conditions" from the specifications and program.

2.3 Selecting a Method

Clearly, some of the techniques described above are more suited than others to a project such as verifying a compiler. The criteria used in rating these methods include practicality, effectiveness, and expense.

Attempting to formally prove a piece of software as substantial as a compiler is an exercise in complexity and frustration. The applicable techniques are extremely tedious and rapidly become unmanageable. Use of less formal techniques leads to "proofs" which are much less conclusive than one would desire.

Testing by exhausting input combinations is similarly impractical: the complexity of the input domain of the CLEOPATRA compiler results in sets of test data which are intolerably large.
Random sampling is feasible, but it is very difficult to objectively measure the degree of success of this method since the test process is entirely subjective.

Using test data that exercise all statements of a program is easier to evaluate, but is not effective enough in exposing errors. Exercising all program paths is impractical due to the sheer number of possible paths; the problems in using this method are similar to those encountered with exhaustive input testing.

Two testing attacks remain: exercising all program statements and branches, and exercising all combinations of specification conditions. The former approach implements program testing, but ignores the specifications; the latter implements specification testing but tests the program only to a lesser degree. Since neither of these individually constitutes a sufficient test, the logical answer is to combine the two approaches, obtaining a testing method that performs both program and specification testing. Therefore, the attack to be taken in the validation of the CLEOPATRA analysis phase is to first generate test cases from the specifications and from a limited knowledge of the program's structure so that all combinations of relevant conditions are exercised, and then to verify that these test cases do indeed cause the execution of all program statements and branches.

2.4 Implementation of the Testing

Whereas good program design is performed top-down, program testing is normally performed bottom-up. Hetzel [12] advocates "segmentation," the concept of testing new functions only after testing all sub-functions. The advantages of segmentation are increased conceptual clarity, economy of test cases, and the ability to do more testing in parallel with program development. This modular approach was used in the validation effort to be discussed.
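
The two halves of the combined method selected in section 2.3 can be sketched briefly. The following Python fragment is a minimal illustration under stated assumptions, not the procedure actually used in this validation: the function names decision_table and uncovered, the condition names, and the statement labels are all hypothetical. The first function enumerates every Y/N combination of a set of conditions (one decision-table column per combination); the second reports which statements or branches a set of test runs failed to execute.

```python
from itertools import product

def decision_table(conditions):
    """Return one dict per decision-table column: every Y/N combination."""
    names = list(conditions)
    return [dict(zip(names, combo))
            for combo in product("YN", repeat=len(names))]

def uncovered(all_items, executed):
    """Return the statements or branches that no test case exercised."""
    return sorted(set(all_items) - set(executed))

# Hypothetical conditions for some token type.
columns = decision_table(["has_sign", "has_blank"])
print(len(columns))   # one column per combination of conditions

# Hypothetical coverage report: statement labels are made up.
print(uncovered(["s1", "s2", "s3"], ["s1", "s3"]))
```

In practice a tester would prune combinations that the specifications make impossible, which is where the insight mentioned in section 2.2 comes in.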
Test cases for sub-functions were generated using Goodenough and Gerhart's decision table technique, then the test cases for all the sub-functions were synthesized to test the complete function. This last, unified test was run on the PL/I optimizing compiler, which offered an option of counting statements executed and branches taken, in order to verify that all program statements and branches were exercised.

The major modules considered in this validation were lexical, syntactic and semantic analysis. Each of these modules was tested bottom-up, beginning with lexical analysis, and the correctness of each successive module depended on the correctness of the preceding module. The bulk of this thesis is composed of a detailed account of the validation of each of these major modules.

The specifications used for the testing were the original language definition (Schreiner [1], hereafter called "the Report"), abbreviated by Halbur's subset definition [4], and modified somewhat by Fisher's implementation considerations [5].

3 Lexical Analysis

Because lexical analysis, the identifying and isolating of tokens from an input stream, is the foundation of any compiler, the testing of this module was very thorough. The aim was to ensure that any legal token (and many marginally illegal ones) would be correctly isolated from any possible environment. This resulted in a greater concern with how the lexical analysis handled illegal input than with how later modules processed illegal input.

The sections of this chapter describe the sub-functions that were tested, including the character set mapping, the detection of delimiting boundaries, and the isolation of tokens (comments, identifiers, operators and constants). The last section discusses the integration of all sub-function test cases and the testing of the complete lexical analysis module.

3.1 Character Set Mapping

The kernel of lexical analysis is the mapping of the input characters into the functional classes of character types (letter, digit, delimiter, special and control). The specifications against which the code was tested were productions 1.1 through 1.5 of the Report. The code for this mapping function was isolated, and all characters that could be input via a DEC/10 terminal were presented to the function. The results were as follows. (See Appendix A for an explanation of the BNF notation.)

(1.1) letter ::= A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z

(1.2) delimiting_character ::= ( | ) | . | : | , | ' | ; | blank_character | _

(1.3) special_character ::= a | * i S | « | & I * I - I ♦ I = I ■ I ? t / I < I > I I I *-'l -

(1.4) digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

(1.5) control ::= !

Productions 1.1 and 1.4 agree exactly with the specifications. Production 1.2 differs only in that the underscore character has been added; this will be discussed further with identifiers.

The characters < and > do not appear in production 1.3, having been deleted from the character set because of the inability to enter them via input devices available at this installation. Square brackets [ and ] have also been deleted from the character set. The user is cautioned that the character t (which is entered as a * on a keypunch, but as a [ on a DEC/10 terminal) prints on the pw train printer as a [ ; this is due to the internal character mapping of the system link between the DEC/10 and IBM/360, and a policy decision was made to let it remain so.

Production 1.5 differs from the specifications in that the backspace character was deleted, again because of linkage problems, and the end_of_source_record "character" is not mentioned here since it is more a state of the lexical analyzer and is therefore not given a mapping.
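
The character set mapping tested in this section amounts to a lookup from a character to one of the five classes. The sketch below is written in Python rather than the PL/I of the compiler and is an illustration only: the name char_class, the "undefined" result for unmapped characters, and above all the contents of SPECIALS (an arbitrary subset used here for the example) are assumptions, not the compiler's actual tables.

```python
import string

LETTERS = set(string.ascii_letters)      # production 1.1
DIGITS = set(string.digits)              # production 1.4
DELIMITERS = set("().:,;' _")            # production 1.2, blank and underscore included
SPECIALS = set("$&*-+=?/")               # assumed subset, for illustration only
CONTROLS = {"!"}                         # production 1.5

def char_class(ch):
    """Map one input character to its functional class."""
    if ch in LETTERS:
        return "letter"
    if ch in DIGITS:
        return "digit"
    if ch in DELIMITERS:
        return "delimiter"
    if ch in SPECIALS:
        return "special"
    if ch in CONTROLS:
        return "control"
    return "undefined"

# Present a few characters to the mapping, as the test did for the whole set.
for ch in "A7_+![":
    print(repr(ch), char_class(ch))
```

Testing such a function exhaustively is feasible because its input domain is a single character, which is why every enterable character could be presented to it.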
3.2 Delimiting Boundaries

The delimiting boundaries (as described in section 1.1 of the Report) between the characters of the five character classes are defined by the following rule: there is an understood delimiting boundary between any two characters, except that there is no delimiting boundary between
- two letters
- a letter and a digit
- a digit and a letter
- two digits
- two special characters.

In order to verify that this rule holds, a decision matrix was constructed representing the cross product of the set of five character types with itself. That is, an entry in row i and column j of the matrix indicates that a character of type j immediately follows a character of type i. Test data were generated so that there was a boundary corresponding to each position in the matrix. Figure 1 shows the test data along with the matrix, where each position is filled with the character pair satisfying that boundary.

The results of the test, in the form of tokens identified, are listed in Figure 2. These results agree with the boundary rule except in the case of a digit followed by a letter: if the digit (or digit string) is immediately preceded by an alphabetic character, there is no boundary as the rule predicts; however, if the digit string is immediately preceded by a non-alphabetic character, then there is a delimiting boundary between the last

[Figure 1: decision matrix of character-type pairs (rows and columns: letter, digit, delimiter, special, control) with the corresponding test data; ■ indicates end of source record]

Figure 3 Test Cases for Comment Tokens

3.4 Identifier and Operator Tokens

The main specifications for identifier and operator tokens are again taken from the Report (section 2.1):

(2.1) identifier ::= letter { letter | digit }*

(2.2) operator ::= { special_character }* | identifier

There are only two exceptions to the discussion in the Report.
First, the transparent underscore is not implemented; an underscore appearing within an identifier acts as a delimiter, thus dividing the identifier. (This is the reason the underscore was added to the character set as a delimiting character.) Second, there is no explicit maximum length on either identifiers or operators; they are limited in length only by the restriction that they cannot be broken across source record boundaries. Therefore, in this implementation these tokens may be up to 80 characters long, if positioned properly.

From these specifications the decision tables in Figure 4 were produced, enumerating cases to be tested for identifiers and operators. Figure 5 shows actual test cases satisfying the criteria of the decision tables and the results after execution. All tokens were isolated correctly, verifying the correctness of the identifier/operator handling portion of the lexical analysis.

Identifiers

Legal cases:                     1 2 3 4 5 6 7 8
  ends with letter or digit      L L L L D D D D
  contains interior letters      N Y N Y N Y N Y
  contains interior digits       N N Y Y N N Y Y

Pathological cases:
  starts with a digit
  broken across card boundary
  contains a blank
  contains an underscore
  contains other illegal character

Operators

Legal cases:                     1 2
  length > 1                     Y N

Pathological cases:
  broken across card boundary
  contains a blank
  contains other illegal character

Figure 4 Decision Tables for Identifiers and Operators

II D0918273645D X9BCD8BFG7HI J6KLH5WOP0QFS3TUV2WITZ1 ZTXHVOTSHQPOHHLKJIHGPEDCBA I
I XONBERS OHDBR_SCORE A BCD0EMGH2IJ3KLttHH5OP6QR7ST8 0V9WXTZ G0123456789 K9 *|
|iillB»-*»"?/<>|f-» $♦->:<*/$ 7BBONG A*TI"ES BIGHTS SEHI;COL3I
I naBbers IDEMTifier L

Figure 5 Identifier and Operator Test Cases

3.5 Numeric Constants

The specifications for constants, as are the specifications for the remainder of this testing, are based on Halbur [4] with only minor modifications.
Halbur's productions 2.1 through 2.11 are correct, but 2.15 and 2.17 have been changed to:

(2.15) long_integer ::= F. decimal_string

(2.17) bit ::= S. binary_digit

This change in the BNF reflects the restriction that only one of a base or a size can be specified in one constant (e.g., F.X.7FFF is not legal, even though X.7FFF and F.32767 are legal). The only other modification is that one or more blanks are allowed between the letter-dot preface and the minus symbol, if it appears, or between the preface and the digit string if there is no minus present.

The decision table given in Figure 6 was constructed from the specification productions, and the test cases shown in Figure 7 were generated to satisfy the decision table conditions.

[Figure 6: decision table for numeric constants]

32767 U58 10 57B3 601298 -32768 -1592 -69437 - 2914
X.7FFF X.5P98 X.8FF1 X.4G7 X. OCDE X. PPP3 X. 4LA9
X.-8000 X.-ABCD X.-5Q31 X. -23BP X. -P93A X. -P112 X. - 73B X. - 9ABE X. - 35D X . U56E
F. 2147483647 F. 5147483609 F.1234567ABC F. 1000000000 F. 9876543210 F. 215Z137
F. -2147483648 F. -273456 1 8900 F.-983A76 F. -2147483648 F. -3987654021 F. -165291C257
F. - 2147483648 F. - 39875642301 F. - 529816D2C
B. 111111111111111 B. 1000000100100000 B. 10141011 B. 111111111111111 B. 10000000000000000 B. 00110P011
B. -1000000000000000 B. -11000111000111000 B.-1010Q0110 B. -1110001 B. -10111000111000111 B. -00111610
B. - 01000101 B. - 11000111101010011 B. - 001012110
B.- 1100 X.- 56PE F.- 34345654
S.01 S.1 S.1101 S.2 S. 1 S. 010 S.012
S.-1 S.-101 S.-1A0 S. -0 S. -0111 S. -1C0 S. - 1 S. - 10 S. - D01

Figure 7 Test Data for Constants

The results were not completely as expected. The major problem hinged on constants that exceeded the specification bounds for minimum and maximum values (see Figure 8). First, no provision had been made in the code to detect constants whose values were out-of-bounds. Therefore, the first time an extremely large constant was encountered, the FIXEDOVERFLOW condition occurred and aborted the execution of lexical analysis.
Clearly, this drastic an error "message" was undesirable, so an ON FIXEDOVERFLOW unit, which printed a reasonable error message and set the constant's value to zero, was incorporated into the procedure that calculates constant values, and the test was rerun. Unfortunately, this repair did not completely correct the problem. This time constants of extremely large magnitude (>2147483648) were correctly detected, but decimal constants whose values were only moderately out-of-bounds (32767 < |x| < 2147483648) were not flagged, and both decimal and long_integer constants in this range were given spurious values. The cause of this turned out to be the mixing of FIXED BINARY 15 and 31 variables to hold the constant's computed value, causing the loss of high-order significant bits. The problem was finally solved by uniformly using FIXED BINARY 31 variables to hold the computed value and checking normal decimal constants to ensure that their values were within the correct range.

Second, the value F. -2147483648 was flagged as being out of bounds. The reason for this was that, by definition, the value of a negative constant is -1 times the positive value computed. Therefore, when the positive value 2147483648 was computed, it was correctly flagged as out-of-range before the value could be negated. This indicates that the specifications were inconsistent; the range of long_integer constants should be |long_integer| < 2147483648, and not as shown in Figure 8.

                MAXIMUM          MINIMUM
DECIMAL         32767            -32768
HEXADECIMAL     X.7FFF           X.-8000
LONG_INTEGER    F. 2147483647    F. -2147483648
BIT             S.1              S.0

Figure 8 Range of Constant Values

A third problem that this test data revealed was that constants with an F. preface which contained any letters A through F were evaluated interpreting these letters as hexadecimal digits.
The cause of this problem was a local character set mapping for digits that distinguished between binary, octal and other digits, but did not distinguish between decimal and hexadecimal digits. The mapping was modified, and a rerun of the test data showed that the problem was solved.

The fourth inconsistency involves the processing of those constants containing illegal blanks following the minus sign. When a negative decimal constant contains blanks separating the minus sign and the digit string, the string is interpreted as an operator followed by a positive decimal constant, which is correct according to the specifications. If a hexadecimal or binary constant contains one or more blanks following the minus, the entire token is correctly isolated and flagged as being in error, which again is what the specifications stipulate. However, if a long_integer constant contains illegal blanks after the minus, the F.- is tokenized and given a value of zero; this forces the string that would have been the body of the constant to be analyzed on its own individual characteristics (being typed as either a decimal string or identifier or decimal string followed by an identifier), giving unexpected and unwanted results. Similarly, if an S. is followed by any character other than a binary digit or a string of blanks followed by a binary digit, the S. is tokenized and given a value of FALSE, and the rest of the string is interpreted on its own. The processing of long_integer constants was changed so that it was consistent with hexadecimal and binary constants (i.e., the entire token was isolated and flagged as illegal), but it was decided to let the bit constant handling remain as it was.

The last problem uncovered by this test data was the handling of those numeric constants containing illegal characters, such as X.-5Q31 or 57B3.
In cases like these, the constant is interpreted as containing all the characters upto bat rot including the illegal character; then everything from that 23 character on is interpreted on its own. As a result, these constant are interpreter! as a legal constant followed by another token (or tokens) , and not as a single illegal constant as one would expect, since it is in general aore probable that the illa- qitiaate letter was a typographical error (as in using the letter o instead of the digit zero) or resulted in using a digit of an incorrect base. The solution to this problea mainly consists of repairing the digit/letter boundary problew discussed in section 3.2, which, as stated before, was not done. 3.6 Character Literals Character constants are defined by: (2.28) literal_value ::= C. any seguence of characters up to but not including the first following blank_character That is, the first character following the C. preface begins the li*-eral_value, and the first blanx_character following the C. ends the literal_value. Additionally, End_of_source_record is transparent to a literal_value and dses not becoae a part of it. The underscore character represents a blank_character in the literal_walue. Neither a ! nor the string COHHBNT initiate conents; they are considered part of the literal_walue. K literal_value aay not exceed 256 characters; longer literals are truncated to 256. 2* Picjure 9 shows the lit er al_walue test cases. Contained in at least one of the literals are ! and COHHPWT, each character of ♦■he character s*t, an andefined character, an underline, and an end_of_source_r?corfl (bat no blank_character, of course). The data also include a literal longer than 256 characters and one of length zero. |C. ABCDEFGBIJKLBNOPQRSTUVIXYZ_() . : ;, • i# $*&*- + /="-.>< | ? 123456789 _!THISI SNDTACOHH | I ENTBOTAtOHGCHARACTERSTPI1IGCONTAININGivERYCHARACTERINTHECRARSET_ITISVERYL01IGBOTST| I ILK256! !abcdefghjijkl»nopqrstUTwxyz C._ C. I | C. 
A BCDEFGHIJKLB!inPQRSTnVWXYZ_01 2 3U56 789_S#J«r,* -♦="/><-. |?_() . , ;:• _!THTSL ITERAL ALS | |OCONTAINSE?ERYCHARACTERTHTHECHAPACTERSET_C0HHE!IT BUTITISH0CHLOIGERTHA1I256ANDW ILL| | BETRONCATEDSOTHATITISLESSTHANT«OHONDREDPIPTYSIXCHARS;HOHEVERITGBTSINCREASI1IGLY#?| | nFBICOlTTOBRITECBARACTERSTRIRGSTHATARESOVERYLO!IG[ abcdefghi jklin opqrstUTtrxyz? ]t \ I C.THISISJOSTASBORTLITERALTHATHAPPERSTOEIIDIHCOLUHNaOl I I

Figure 9 Character Literal Test Cases

The first execution of this test was literally a disaster. Literals that were contained completely on one line and surrounded by blanks were handled correctly, but strings containing end_of_source_records were badly mangled. The first occurring end_of_source_record ended the literal, and, if 'x' was the string between the C. and the first end_of_source_record, the token "recognized" became 'xx' (i.e., the string concatenated with itself). The remainder of the literal was interpreted as a sequence of new tokens. The main cause of these problems was the procedure that read in a new line of source text and the conditions this procedure expected to be true when invoked; the procedure was designed to do more processing than it logically should have, and by doing so thwarted the handling of character literals. These problems were repaired, and the tests were rerun. This time all literal_values were correctly isolated.

3.7 Complete Lexical Analysis

After having tested each portion of the lexical analysis individually, it was necessary to integrate all these tests in order to test the entire lexical analysis phase. The motivation behind the design of the composite test was the following: assume all tokens in the input stream have been correctly processed (i.e., as predicted) up to a given point; the correct isolation of the next token now depends only on what follows it.
(The fact that the lexical analyzer is a procedure that is called each time a token is needed for the parse was part of the basis for making this assumption.)

The first step in the design of this test was to generate a list of specific token types that had been previously tested individually. Then this list was expanded by differentiating, within each token type, tokens that might have distinct ending characteristics (e.g., identifiers ending in a letter vs. identifiers ending in a digit). The original list was again expanded by differentiating on beginning characteristics (e.g., decimal constants beginning with a decimal digit vs. those beginning with a minus sign). Figure 10 shows both complete lists. Character literals were not included in the distinct endings group since these must all be followed by a blank_character which is not considered a part of the literal.

    DISTINCT ENDINGS                          DISTINCT BEGINNINGS
 1  delimiter character                    1  delimiter character
 2  comment: COMMENT ;                     2  comment: COMMENT ;
 3  comment: ! end_of_source_record        3  comment: ! end_of_source_record
 4  identifier ending in a letter          4  identifier
 5  identifier ending in a digit           5  operator
 6  operator                               6  positive decimal constant
 7  decimal constant                       7  negative decimal constant
 8  hex constant ending in a letter        8  hex constant
 9  hex constant ending in a digit         9  long_integer constant
10  long_integer constant                 10  binary constant
11  binary constant                       11  bit constant
12  bit constant                          12  character literal
13  end of source record                  13  end of source record

Figure 10 Token Classes Used in Composite Test

A matrix was constructed, similar to that used in the delimiter test in section 3.2, where the rows were numbered with the classes of tokens possessing distinct ending characteristics, and the columns numbered with the classes of tokens having distinct beginning characteristics. (See Figure 11.)
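The bookkeeping behind such a matrix can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thesis matrix was filled in by hand, and the function names and the dictionary representation of a token used here are hypothetical. Each token in the test stream is assumed to carry the class of its ending and the class of its beginning, and the sketch reports which (ending, beginning) combinations no adjacent pair of tokens has yet exercised.

```python
from itertools import product


def covered_pairs(tokens):
    """Return the set of (ending class, beginning class) pairs that occur
    between adjacent tokens in the test stream."""
    return {(a["ending"], b["beginning"]) for a, b in zip(tokens, tokens[1:])}


def uncovered_pairs(endings, beginnings, tokens):
    """List every (ending, beginning) combination not yet exercised,
    in row-major order (rows are endings, columns are beginnings)."""
    seen = covered_pairs(tokens)
    return [(e, b) for e, b in product(endings, beginnings) if (e, b) not in seen]


# Hypothetical miniature example using a few class names from Figure 10.
endings = ["identifier ending in a letter", "operator"]
beginnings = ["operator", "negative decimal constant"]

identifier = {"ending": "identifier ending in a letter", "beginning": "identifier"}
op = {"ending": "operator", "beginning": "operator"}
negative = {"ending": "decimal constant", "beginning": "negative decimal constant"}

stream = [identifier, op, identifier, negative]
print(uncovered_pairs(endings, beginnings, stream))
```

In this miniature stream the operator row is never followed by an operator or by a negative decimal constant, so those two combinations are reported as still needing a test case.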
The idea here is to ensure that each token with a distinct ending is followed in the test case by each token with a distinct beginning. Another important criterion of this composite test is that it include every instance from the sub-function tests, in order to preserve the "testedness" of the sub-functions.

The actual test cases were then generated, filling in the matrix as follows: the entry in row i and column j of the matrix received the value 'n/m' when in the test data a token of type j followed a token of type i on card n, and the first character of the type j token appeared in card-column m. For example, the entry in row 4 and column 7 is 3/11; this indicates that an identifier ending in a letter (IDentifier) immediately precedes a negative decimal constant (-65000), where the minus sign is the 11th character on line 3 of the test data. The actual test data appear in Figure 12. It is easy to verify that all sub-function tests are included in this data.

The results of this test reaffirm that previous repairs had been successful, but brought to light one more problem. If any numeric constant was followed immediately by a negative decimal constant (or an operator beginning with a "-"), the two were recognized together as a "constant" (e.g., B.0110-35); when evaluation was then attempted, an error occurred because the interior minus was not a digit. Rerunning the test showed that the first repair made for this problem was incorrectly performed, which wreaked havoc everywhere. The repairs themselves were then fixed, and a re-execution of the data showed that indeed the problem was finally solved.

[Figure 11: matrix of distinct-ending classes (rows) against distinct-beginning classes (columns), with entries of the form n/m]
[Figure 12: composite lexical analysis test data]
4 Syntactic Analysis

The testing of the syntactic analysis phase should encompass the design of test data which would exercise every legal production rule of the CLEOPATRA grammar. Unfortunately, as discussed in Chapter 2, it was virtually impossible to exercise each production rule in every valid context.
Therefore, the motivation behind the design of the test data was simply to exercise each legal production at least once; in cases where there were several productions for a single non-terminal (e.g., the many syntactically different ways of writing an ITERATE statement), however, the productions appeared in as many different contexts in the test data as was feasible. In addition, since error-correction facilities are very limited in this implementation, illegal constructs were generally not tested.

Test data for a parser usually consist of syntactically correct but semantic-free program segments. Because syntax and semantics were implemented concurrently in this compiler, the use of semantic-free test programs would have been impractical due to the abundance of semantic error messages that would have been generated. Furthermore, the use of semantically meaningful data in this phase of testing facilitated the tests for semantic correctness, as reported in Chapter 5.

Two main programming examples were chosen as the basis of the remaining testing effort. One is a very skeletal scanner based somewhat on the CLEOPATRA language itself; it is not intended to be complete, or even correct, but rather is intended to illustrate how one might approach such a task using CLEOPATRA. The second example shows how several user-defined data types might be manipulated. The presentation of these examples is divided into three parts: the first consists of the structure and data blocks, the second incorporates routine blocks (including all statements, but with only the most rudimentary expressions), and the third covers expressions in detail.

4.1 Structure and Data Blocks

This section reports the results obtained from testing four of the types of program blocks in CLEOPATRA: global structure blocks, local structure blocks, global data blocks, and local data blocks.
The specification production rules for these blocks are: (4.1) basic_ref_type ::= IHTBGER | LONGINTEGBR | BIT | CHARACTER (1.2) basic_typ? ::= UTEGER | LONGIRTRGER | BIT | CHARACTER [ (expression) ] (1.1) reftype ::= { basic_ref_type | type_name ) ; integer EXTENTS] (1.5) typ=» ::= ( basic_type | type_name } [array] (5.1) struct ure_block ::= STRUCTURE conf iguration_name { ; link_item )• [;] EHD configuration_name [;] 32 (5.2) conf igurationnaae ::= procedare_naie | type_naae I aperator^link (5.1) linlc.iten ::= TYPE type_na«e [ALIAS identifier] | qlobal_link_itea (5.«) qlobal_link_itei ::= PROCEDURE procedure_naae [ALIAS identifier] [ ref_type_list .] PETDRRS basic_ref_type (5.5) (INTEGER , INTEGER , INTEGER) . SYHTA BENTRY RETURNS BIT END SYHTABENTRY; GLORAL DATA PARSER CONSTANT INTEGER SYHTABHAI INIT 100 SYHTABENTRY RIGHT(IOO) SYRTAB CHARACTBR(I) VECTOR (1:1000) NAHBTAB RI^ EOF INIT S.O ; END PARSER •JT.OBAL DATA SCANNER CONSTANT INTEGER LINELIHIT INIT 80 INTEGER CAPDNO INIT 1, CAPDPTR INIT 1 CHARACTER (80) CARD, TOKEN INTT C. CHAR ACTER(81) TCARD END SCANNER; STRUCTURE SCANNER -.OROCED'JRE LTNETN ALIAS GETLINE (CHA R ACTER BY A DDRESS) . RBTURNS BIT ;CONSTA"JTEVAI,: OPERATOR | i | (INTEGER) . CHAR ACTER RETURNS LONGINTEGER ;PROCEDMRE SCA NFRROR (INTEGER) . RETURNS BIT ;LOOKnP: OPERATOR LOOKFOR (SYHTABENTRY 1 EXTENTS) .CHARACTER RETURNS INTEGER ; FND SCANNER: DATA SCANNER ;INTEGER T; CHAPACTEP(I) TCHAR END SCANNER; STRUCTURE LIN^IN ;PPOCEDURE INPUT (CHA RACTER BY ADDRESS) . RETO PNS LONGINTE1ER TRANSLATE: OPERATOR 9$% ALIAS TRANS (CHAR ACTER, CHAR ACTER) . CHARACTER RETURNS CHARACTER RND LINRIN; OATA LINRTN CHARACTER (80) TNPOTSTRING INIT C. ;CONSTANT CHARACTER(80) INCHARS INIT C. ABCDEFGHI JELENOPQ RSTUViX YZ ( ) . : , : »#t%r,*-* = »?/<>|[ -,012 31^678 9! .TCHARS INIT C. 
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 222 222 2 233133333 1333 33 11 3am»U4U HUH U5 RND LINEIN DATA CONSTANTEVAL ;TNTEGER RADIX INIT 10, SIGN INIT 1, LOB INIT 1,L, DIGIT ;LONRINTEGER VALUE INIT F.O ;CHAPACTER (80) STRING ; END rTNSTANTEVAL Figure 13 Structure ann: Data Block Test Data 35 STRUCTURE I1ANIP; TYPE STACK ALIOS YOTOLIST; TYPE F1ATRIX1BY3; TYPE COMPLEX; TNNERPRODnCT: OPERATOR INTEGER 1 EXTENTS . (INTEGER , IHTES ER) ♦ * ALIAS DOTPRODnCT ( INTEGER) . INTEGER 1 EXTENTS RETURNS LONGINTEGER; PROCEDURE PROCWTTR2PARHS (INTEGER, INTEGER) . RETURNS BIT; END NANIP; DA""A HANIP; STACK STACK1,STACK2; COHPTEX RTGHT(10,5) CR1,CR2; MATRIX3BY3 M,H2 ; INTEGER INTEXPR END MANTP; GLOBAL DA^A HANIP; INTEGER VECTOR(B) VEC1 , VEC2 , VKC3 END HANIP; GLOBAL DATA STACK; INTEGER VECTOR (1: 100) PDLTST; INTEGER TOP INTT 0; END STACK; GLOBAL STRUCTURE STACK; POSH: OPERATOR INTEGER POSH: STACK BY ADDRESS RETURNS BIT; POP: OPERATOR INTEGER BY ADDRESS: POP STACK RETURNS BIT; INSPPCT: OPERATOR STACK. (INTEGER) ??? IHTEGBR RETURNS INTEGER; END STACK; DA^A PnSH; STACK SI; INTEGER I; END PUSH; DATA pnp ; STUCK S1; INTEGER T; END POP; T>\t\ INSPECT; INTEGER Z,T,J; STACK S1 END INSPECT DATA TNNEPPPODUCT; lONGIN^EGFR PRODUC" INIT F.O; INTEGER VECTOR (*:*) V1,V2; INTEGER I,J,nl,L1,02 END TNNFPPPODUCT; GLOBAL DATA n A^P IX3BY 7 ; INTEGER VECTOR(3,3) W1, EXTRA; END MATRIX3BYT: GLOBAL STRUCTURE w A" , RTX3BY3 ; flATRITOP: OPERATOR MATRIX3BY3 BY ADDRESS : (INTEGER) DHALNULT (INTEGER) :HATRTX3BY3 BY ADDRESS RETURNS INTEGER END NATRTX3RY3; GLOBAL DATA COHPLFX; INTEGER TNTPART,I"AGPART END COMPLEX; GLOBAL STRUCTURE COMPLEX; ABS: OPERATOR ABS COMPLEX RETURNS INTEGER; END COMPLEX; Piaur* 3 13 Continued Struc»-.ar o and Data Block Test Data 36 The execution of these test data exposed several discrepancies between the language described by the BMP and the language accepted by the parser. These discrepancies and their descriptions are given below. 
1) In production 5.1 the semicolon after the "END configuration_name" is not optional, although the BNF implies that it is. When it is omitted, an error message is issued and normal processing continues.

2) The operator of production 5.5 cannot be the same as the operator_link. If they are identical, an error message claiming "duplicate declaration" is issued, and the name is no longer recognized as being an operator_link. This causes additional problems later in the program when that operator's definition blocks are encountered.

3) There are several problems with production 5.12. First, the ref_type_list is not optional after "BY ADDRESS:". If it does not appear, every token up to and including the next semicolon is ignored, the token following the semicolon is assumed to be the operator of production 5.5, and processing continues from there as if no error had occurred. Inevitably, more problems arise as a result of the "correction." Second, it appears that the ref_type in production 5.12 cannot include an EXTENTS phrase. When one is included, it causes the next n tokens declared (where n is an unpredictable number, usually on the order of five) all to be mapped into the same symbol table location as the integer extent number. The most serious side-effect of this is that if the programmer is unfortunate enough to have included an array declaration later in the program, the confusion in the symbol table causes an infinite loop in the array cleanup routine.

4) An interesting deviation involving production 5.36 is that if it appears in any context other than CONSTANT, the identifier is not restricted to being initialized to a constant value; it may be initialized to any previously declared token-name (e.g., a procedure name or a punctuation mark). Only in the CONSTANT context is any type-checking done.
With the exception of the EXTENTS phrase problem in production 5.12 (the cause of which is still unclear), proposals for the repair of these inconsistencies have been documented, but not yet implemented. However, the restrictions imposed by these errors are not overly confining in view of the more serious problems encountered in routine blocks, as described in the next section.

4.2 Routine Blocks

Routine blocks in CLEOPATRA define the actual operations to be carried out by a procedure or operator. The production rules for these blocks (including expressions) are:

(5.16) routine_block ::= PROCEDURE procedure_name [name_list .] { ; statement }* [;] END procedure_name [;]
(5.18) name_list ::= (formal [, formal]*)
(5.19) formal ::= identifier
(5.20) procedure_call ::= procedure_name [parameters]
(5.21) parameters ::= (actual [, actual]*)
(5.22) actual ::= expression
(5.23) routine_block ::= operator_link : OPERATOR [left_names] operator right_names { ; statement }* [;] END operator_link [;]
(5.25) left_names ::= ref_type identifier [ { : | . } name_list | : ]
(5.26) right_names ::= [;] ref_type identifier | name_list { : | . } ref_type identifier
(5.27) operator_call ::= [expression [ { : | .
} [params1]]] operator [ [params2] { : | . } ] expression
(5.27a) params1 ::= (actual [, actual]*)'
(5.27b) params2 ::= '(actual [, actual]*)
(6.1) expression ::= constant | system_supplied_value | system_supplied_constant | array_reference | procedure_call | operator_call | (expression)
(6.2) array_reference ::= array_expression
(6.3) array_expression ::= identifier
(7.1) stmt ::= expression | NIL
(7.2) stmt ::= RETURN expression
(7.3) stmt ::= EXIT [label]
(7.4) stmt ::= IF expression THEN statement [ [;] ELSE statement ]
(7.7) lstmt ::= cstmt | label : cstmt
(7.8) label ::= identifier
(7.9) cstmt ::= BEGIN statement [; statement]* [;] END
(7.10) cstmt ::= [for_phrase] ITERATE [while_phrase] statement [; statement]* [;] [when_phrase] END
(7.11) for_phrase ::= FOR identifier [FROM expression] STEP expression [ [;] { UPTO | DOWNTO } expression ] ;
(7.12) while_phrase ::= WHILE expression ;
(7.13) when_phrase ::= WHEN expression [;]
(7.14) cstmt ::= DECISION decision [; decision]* [;] ACTION action [; action]* [;] [ ELSE statement [;] ] END
(7.15) decision ::= switch : expression
(7.16) decision ::= switch [, switch]* [list_layout] : list_index
(7.17) switch ::= identifier
(7.18) list_layout ::= VECTOR (integer : integer)
(7.19) list_index ::= expression
(7.20) action ::= switch_expression : statement
(7.21) switch_expression ::= [¬] switch [ switch_operator [¬] switch ]*
(7.22) switch_operator ::= & | | | == | ¬= | AND | OR
(7.23) statement ::= stmt | lstmt

These productions are again based on Halbur [9], but reflect the following modifications introduced by Fisher [5]: 1) parentheses and quotes are used in production 5.27 to disambiguate parameter lists; 2) ALLOCATE and RELEASE statements have been deleted; and 3) a revision of the FOR phrase of the ITERATE statement was made.

The initial test program for these productions is similar to that shown in Figure 14.
The routine blocks were incorporated into a revised (i.e., "error"-free) version of this data, and expressions were replaced with single identifiers.

Initial attempts at execution of these test programs were disastrous. A significant number of serious compiler errors were found. The following is a description of these errors:

1) The dot after the name_list in production 5.16 is not accepted at all. If it appears, it is ignored and an error message is issued.

2) Productions 5.25 and 5.26 are implemented as if all occurrences of ref_type were replaced with basic_ref_type. This means that the operands of a user-defined operator cannot be arrays or user-defined types, even though these may appear in the original declaration of the operator. If a user-type name or an extent integer does appear as a left_name, the type_name or integer, respectively, is assumed to be the operator, processing continues expecting right_names, and more errors result. If either of these entities appears in the right_names, an error message is issued, and the remainder of the right_names is ignored with no other apparent side-effect.

3) If at any time during the processing of the statement lists of productions 5.16, 5.23, 7.9 or 7.10 a statement is encountered which begins with any token whose symbol-table number is greater than or equal to 18 and less than 66 (this includes most reserved words and all delimiters, see Halbur [4]), the parser enters an infinite loop. This trap is particularly easy to fall into since the letters B, C, F, S and X are considered reserved words, but no warning is given if they are declared as identifiers.

4) When a compound statement is labelled (via production 7.7), error messages are issued claiming that the label is an undeclared identifier. The identifier is later recognized as a label when the following colon is found, but because of the error messages the program can never be passed on to the code generator and therefore can never be executed.
5) The FOR phrase of production 7.11 is not implemented as Fisher claimed. The form that is implemented is a compromise between the Fisher and Halbur versions:

for_phrase ::= FOR identifier [ FROM expression ] { { DOWNTO | UPTO } expression [;] | STEP expression [;] [ { DOWNTO | UPTO } expression [;] ] }

6) A decision statement (production 7.14) cannot be labelled since it is not considered a compound statement in the implementation.

7) Switch expressions (production 7.21) do not work for several reasons. First, switch identifiers (production 7.17) are given the type "unknown identifier" in the symbol table instead of being of the type bit as one might expect. Second, the switch operators (production 7.22) & and | are not predefined system operators, so they may not be used at all, and the other switch_operators understandably do not accept unknown identifiers as operands. Even if the switches were of type bit, the wrong type result would be returned.

With the exception of errors 3 and 6, repairs for these discrepancies have been documented but not yet incorporated into the current version of the implementation. Since the FOR phrase as described in 5 is closer to the design goals discussed in Fisher [5] than the specified form, it was decided to leave it as is. Error 3 involves fairly far-reaching side effects, not all of which have yet been tracked down.

4.3 Expressions

It would appear to the casual observer that the sections of this chapter so far have described errors of increasing number and gravity. In this section, unfortunately, the trend is accelerated. To summarize briefly, expressions in this implementation simply do not work.

The specification productions for expressions were listed in section 4.2 and describe expressions whose operators are associated from right to left, along with means for calling on user-defined operators and procedures.
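To make the phrase "associated from right to left" concrete, the following Python sketch evaluates a flat sequence of operands and binary operators both ways. It is an illustration only and is not taken from the compiler: the operator set is a stand-in rather than CLEOPATRA's predefined operators, and the sketch assumes that all operators have equal precedence, which the text above does not state explicitly.

```python
import operator

# Illustrative operator set only; these are not CLEOPATRA's predefined operators.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def _check(tokens):
    # A well-formed sequence is: operand (operator operand)*
    if len(tokens) % 2 == 0:
        raise ValueError("expected operand (operator operand)*")


def eval_right_to_left(tokens):
    """Apply the rightmost operator first: a op1 b op2 c = a op1 (b op2 c)."""
    _check(tokens)
    value = tokens[-1]
    for i in range(len(tokens) - 2, 0, -2):
        value = OPS[tokens[i]](tokens[i - 1], value)
    return value


def eval_left_to_right(tokens):
    """Apply the leftmost operator first, shown only for contrast."""
    _check(tokens)
    value = tokens[0]
    for i in range(1, len(tokens), 2):
        value = OPS[tokens[i]](value, tokens[i + 1])
    return value


expr = [10, "-", 4, "-", 3]
print(eval_right_to_left(expr))  # 10 - (4 - 3)
print(eval_left_to_right(expr))  # (10 - 4) - 3
```

The two results differ (9 against 3), which is why test expressions for a right-to-left language need expected values computed under that rule.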
For the initial test data attempted, expressions were reincorporated into the routine blocks of Figure 14. The errors uncovered by this experiment are given below.

1) The system-supplied substring operator <- can never be found in the symbol table because of incorrectly initialized buckets and bucket-links used by the lookup routine. More specifically, these links contain two different pointers to the token " (concatenate) but none to <-.

2) Many of the system-supplied operators return the wrong type result. In particular, the comparison operators should return bit values; instead, they return a result of the type of the two operands being compared. For example, I>J returns an integer result if I and J are of type integer, a character result if I and J are of type character, etc. LBOUND returns a result of type error, and HBOUND, LENGTH, "-> (index) and ?-> (verify) all return integer results. LINT and CHAR return character and longinteger results, respectively, instead of the reverse. Also, several operators do not accept the specified type of operands, notably LENGTH, HBOUND, -> and <-. Figure 15 shows a sample program illustrating these points. (No error messages were issued except in the two places noted in-line!)

structure syntaitbst procedure purser returns longinteger procedure nantp alias typecreck returis loigtiteger procedure ontpot(craractrf) . returns longintegef end syntaxtest STRUCTURE PARSER ; TYPE SYRTabeNTRY : PROCEDURE SCANNER RETURNS INTEGER ; END PARSER GLOBAL DATA SYNTABBNTRY ; INTEGER NANETABPTR TWIT 0, TONLENGTH TNIT 0, TOKTTPE INIT 0, VALUE IlfIT END SYNTABBNTRY 3LOBAL STRUCTURE STWABPRTRT ; SETENTRY: OPERATOR # = > (INTEGER, INTEGER, INTEGER) . SYNTABBNTRY RETURNS BIT END SYNTABBNTRY; GLOBAL DATA PARSER CONSTANT INTEGER STHTAFHIX INIT 100 SYNTABBNTRY RIGHT(100) SYNTAB CHARACTBR(I) VICTOR (1:1000) NAHETAB BT* Elf INIT S.O END PARSER PROCEDURE PARSER ; ITERATE SCANNEP WREN BOP = ENO PAPSER = S.
1 END GLOBAL DATA SCANNER CONSTANT INTEGER TTNBLIHIT INIT 90 INTEGER CARDNO INIT 1, CAPDPTP INIT 1 CHARACTER (10) CARD, *OEEN INIT C. rHAR A?trp («1j tcaRD END SCANNER; STRUCTURE SCANNER :PROCED(TPE LTN»TN ALIAS GETI TNE (CHA R ACT BR BY A DDRESS) . R"! TtJRNS BIT ;CONSTANTEVAL: OPERATOR | # | (INTEGER) . CHAR A ~TER RETURNS LONGT* T ESER :PP0CED0RE STANPRROR (INTEGEP) . RETURNS BIT ;L">0E1P: OPrpjTnR LOOKBOR (SYNTABENTRY 1 EXTENTS) . CHA P ACTER RETURNS TNTEGFP ; END SCANNER: DATA SCANNER ;INTEGER I; CHARACTER (1) TCHAP END SCANNER; PROCEDURE SCANNER ; RETTONEN: ITERATE WHILE TRnE: OBCISION LETTER, DELTNITER, SPECIAL, DIGIT, "ONTPOL,EOSR : 3HAR 1 -> TCARD <- CAPDPTR; ACTION; LETTER: BEGIN POR I EPON CARDPTRO OPTO LINELIHIT; ITERATE TOKEN:=TOEBN"1->CARD<-I - 1; TCHAP := 1->TCARD<-I; WHEN (TCHAP-*=C1 ) AND TCRAP-.=C. * END; PETORN LOOKPOR'SYNTAB) . TOKEN END; LETTER AND DIGTT: Nil; !IEPOSSTBLE CASE DELTNTTBP: mr EN : *1 ->CARD<- 1 ; CONTROL OP EOSR: IE LINBIN(CARD) THEN SCANERROR(I) ELSE EITT; DIGIT: 'CHAP := ".« ; SPPCTAL: "*HAP := P. T ; Pigure 1* Routine Block Test Data '4 5 DIGIT OR SPECIAL : BEGIR TORE! :» (CARDPTR:*TCRtR ?-> TCARD <- CAPDPTP»1) -> CARD ) <- 3ARDPTP; RETORR LOORFOR'STRTAB) .TDKRR; ERD PI.SE SCARRRR"*RC?) •RRDEPTRED CHARACTER BRD 'DECISIS ERD GBTTORBB BRD SCAKRPR STROCTORE LIRPTH ;PROCEDfTPE TRPnT(CHAPACTER BT ADDBBSSJ . RETDBWS LORGIRTE3EF ;TRARSLATE: OPERATOR »S% ALIAS TBIRS (CHARACTER, CHARACTBR) . CHARACTER RPTORRS CHARACTER ERD LTRBIR; DATA LIHBTB ;CHARACi*ER (90) IRPOTSTRTNG TRTT C. ;CORSTMT CHARACTER (80) IRCRARS IRTT C. ABCDEPGRIJKLHROPJRSTOVilTZ ().:,; • »*$«£•-♦ ="?/<>| [-.0 12 3*56789! ,TCHAPS IRIT C. 111111111111111111111111112222222 23 3 33333 33 33 33 33 330R \ t| (RADTT) .CHARACTEP STRTRG : TP "..- -- (I:=1| ->STRIRG rRER BEGTH SIGB := -1; T:=2 PRO; PDF I nPTO LPRGTH STRTRG TTEPATB DIGIT := ( 1 -> STRTRG <- I) "-> C. 
1 21056789ABCDBF ; TP nrGTT -.= IRTBGER RIL THE* 7AIUP := (DIGIT - 1) «-PADTf •▼ALOE BI.SP RETORR P.O ERD; PRTORR SIGR*VALUE ERD CORSTARTETAL STRUCTURE HARIP; TT»E STACR ALTAS TOTOLTST; "TPE NArRHr3PT3; TYPP CORPLE*; TRRERPRDDOCT: OPERATOR IRTEGEP 1 EfTERTS . (IRTEGER, IBTESEt) «•* ALTAS DOTPPODttCT (TRTPGER) . IRTPGER 1 EITERTS RPTURRS T.ORGIRTBGEP; PROCEDURE ORDCRITR7PARI1S (IRTBGER , THTEGER) . RPTORRS BIT; ERD HARIP; DA^A RAHIP; STACK STACR1,STACK2; COHPLPT RTGR ,, (10,S) CR1,CP2; "ATRTT3«T3 R1,R2 ; IRTErtPR TR^RXPP ERD HARIP: Pigare 1* Continued Routine Block Test Data »6 GLOBAL DA'A NANTP; INTPGPR VPCTORfR) VEC1,¥EC2,VEC3 END HHWTP: PROCEDORE HANTP; NONSPNSP: begin PROCWTTH2PARWS (S.O,B. 01001) ; PPC1. (U, B • ♦ * • ?) . ?EC7; DEPISTOi sw»,S»S,SW*- VECT0R(U:f>) : INTEXPP; SS1,«»2 : rvTFXPR; SiO : PILSE -.SHO == SNtt : EITT HORSEHSE; SRS ,= SH7 UNn s?1 : RETORH F.21»8«6 PLS* REWORK F.O "ND END NONSPNSE END HANTP "5L0BAL DATA STUCK: INTEGPR VECTOR (1: 100) PDLIST; INTEGER 'OP INTT 0; FID STACK; "LOBAL STROCTORP STACK: POSH: OPERATOR INTEGER POSH: STACK BT ADDRESS RETTIRNS BIT; POP: TPERATOR TBT»-ur B y ADDRPSS: POP STACK RETORRS HIT; mSPECT: OPERATOR STACK. (IRTEGEP) ??? TRTEGER RETORRS TNTBGER; PND STACK; DATA posh; STACK 51; TRTEGF" T; "so posh: POSH: OPERATOR INTEGER I °flSRI : STACK SI : DECISION OVERFLOW : 100 <= T0P.S1 AC'TON; -.OVERFLOW : BFGIN pdi st.si (TOP.si:*rop.si*i) :=i: RETORR TROB; END; OVERFLOW : RFTORN FALSE : END PND POSH DATA P^P; STAC* SI: TWEGEW I; PND POP; POP: OPPPATOR TN^PGPR T : POP! STACK 51; DECISTON UNDERFLOW: TOP.S1 <= 1 ACTON; ONDERPT.OH: RETORN FALSE; ELSE BEGIN I := PDLIST.S1 (TOP.S1) ; TOP.S1 := 'OP. SI- 1; RETORN TROE; END: END POP; DATA INSPPCT; TNTBGPR 2,I,.T; STACK S1 END INSPPCT r"SPEC T : OPERATOR STA^K SI. (I) ??? TNTPGBR T; "OR 3 »<»OH TOP. 
SI STEP -1 T'FPATB RHILF -1<=I; XV X == PPLIST.SI(J) THEN FPTPPR J PNO; PPT0»N PND INSPECT Fiqure 1U Continued Routine niock Test Data »7 DATA IRRERPRODUCT; LORGIWTGFR PRODUCT TRIT T. ; IRTEGER VECTOR (*:*) T1,7?; IRTEGER I,J,U1,L1,U2 ERD IRRERPRODUCT; IRRERPPOOnCT: OPERATOR IRTEGER 1 EfTERTS T1.{Ol,L1| ♦ * (02). IRTEGER 1 EXTBRTS »2; J: = 1 ♦ 1 HROURD 72; EOR T EROE 1 HBOURD V1 D0RBT0 1 LBODRD T1 ITERATE PRODUCT := PR0DUCT*T1 (T) *V2 (J: =J- 1) ERT>; PETnRR PRODUCT ERD IRPERPRODDCT; 3L0BAL DATA RATPTX3BT3; IRTEGER fECTOR(3,3| R1, EXTRA; ERD EATRII3BT3: 3L0BAL STRUCTURE RATRTX3BY3; NATRIXOP: OPERATOR RATRIX3BY3 BT ADDRESS: (IRTEGER) DUAL1DLT (INTEGER) :EATPIX3BY3 BT 1DDRESS RETURNS IRTEGER ERD EATRIX3BT3; 3L0RAL DATA CORPLEX; TRTEGEP TITPAPT r IEAGPART ERD CORPLEX; 3L0BAL STRUCTURE CORPLEX; IBS: OPERATOR UBS CORPLEX RETURRS IRTEGER; BRD CORPLEX; Figure 1U Continued Routine Block Test Da*-a f 4 8 STRUCTURE EXPR ; PPOrEDORP, TU T P<1T (CHARACTER) .RBTORRS LORGTRTEGER ERD EXPR; OATA EXPR TRTRGER T,J,* tORGIHTPGRR R,T,7 BIT B1,B2,B3 THTRGEP VBCTDRflO, 100) ¥1,Y2 CHARACTER^) C1,C2,C3 ERD EXPR; PPOCEDBRE BTOR; J := - T K := r * K := ( I B1 := B1 K T ♦ ABS J ; ! 80, 66, 68, 67 3 // K ♦ 1 j ! 69, 70, 71, 8* :»■ T- • 1 | * J := J // 2 ; AMD B? 1R -,B3; ! 70,76,120,77 (J: =T *oo f»l **J; 2 LPOtJRP 71 ; r T I T T r C2 = J = =3; • ■ J-=3; = ,7>3; ! = 1<3; • = J>=3; ! = J<=3; ! := LERGTR CI: 78 79 80 81 82 83 C. 1 HBOORD n C3 C3 C3 C3 C2 V T 9 7 Z V B1 CI R P1 n W C1 R := C1 -> C2; := ci <- C2: := CI "-> C2; := CI ?-> C2; := I.TNT ?1; = CHAR 9; = ABS - »: = R*r*7//P.2; * i*»T; = i BOD r: = r==F.i*: : = p2==R3; := c? ; := T-=7: := B? -.= B3: := C2 -.» CI; := r>7; := C2>C3: : = T=7: CI := C2>=f3; W:=Y<=7; C1:=C2<=r3; ci : = ci " CI PRO ETPR; ( ! 73,72,80 ! 75, SHOOLD GET EBBOR BESSAGE OR := ! STRCE tBOORD RBTORRS "ERROR" TTPE 86 87 88 89 SHOOLD GET ERROR RBSSAGF OR <• 90 91 93,119 90,121 15,96 97,98,99,100 101 102 103 100 105 106 108 109 111 112 113 110 115 116 117 118 9? 
Figure 15 Unusual Expressions Accepted

3) Productions 5.27 and 5.27b are not accepted for unary operators. For example,

LOOKFOR'(SYMTAB).TOKEN

is not accepted on the grounds of mismatched parentheses. On the assumption that the specifications were not complete (i.e., unary operators were perhaps considered a special case), the following was tried:

LOOKFOR('SYMTAB).TOKEN

This caused the parser to enter a loop that vacillated between the quote and some other unspecified token, printing repeatedly

Illegal or inactive token. Bad token was '
Illegal or inactive token. Bad token was
Illegal or inactive token. Bad token was '

until a message was issued claiming the statement was too long. This operator call was tried one last time deleting the quote altogether, but the parser objected to the dot.

There are many more errors in the processing of expressions, too numerous to list and too nebulous to describe intelligently. Unfortunately, most of these cause fatal IBM/360 errors which made further testing fruitless.

5 Semantic Analysis

As stated before, it was difficult to separate pure syntax from semantics since these were implemented concurrently. Section 4 covered type-checking and some other semantics along with syntax. This section is concerned with the correctness of the symbol table complex and the intermediate text.

5.1 Symbol Table Complex

The major data base for the compiler is the symbol table, which includes the name table, the hash table, the type analysis table (token attributes and some scope information), the constant table, the level/configuration table (activation information), the configuration table, and the type table (parameter information). These tables are first inspected for the correctness of the system predefined symbol entries and then for user-defined entries.
5.1.1 Predefined Symbols

The system's predefined symbols are locations 1 through 122, and all of these entries in all the tables listed above are initialized via the PL/I INIT option before the compilation of the user program.

The hash table is incorrectly initialized as discussed in section 3.3: the entry <- has no pointer to it, while " has two pointers, one of which is never used. Unfortunately, the repair is not as simple as changing the unused "-pointer to point to <- ; the repair is documented elsewhere. All other entries are correct.

The type field (giving the type returned or the type of the identifier) of the type analysis table is also incorrectly initialized, as discussed in section 4.3. Entries 78 through 83 and 103 through 118, the comparison operators, should contain 8 indicating a bit value is to be returned. The returned types of LINT and CHAR are correctly initialized to longinteger and character respectively, but in actuality they return the reverse of this. (It is unclear why this is the case.) LENGTH, HBOUND, "-> and ?-> are initialized to return character results, but all of these should return integer results. All other entries of the type analysis table are correct.

The type table field defining the types of procedure parameters and operator operands also has some incorrectly initialized entries. The operators HBOUND, -> and <- are all set to accept character operands while the first should take an integer operand and the last two should take an integer and a character operand and the reverse, respectively. Again the entries of LINT and CHAR appear to be reversed. All other entries are correct.

The name table, the constant table, the configuration table and the level/configuration table are correctly initialized for the predefined symbols.

5.1.2 User-defined Symbols

Information about user-defined symbols is added to the symbol table as it is determined from the syntax.
For a symbol which occurs in an error context, the information in its entry may not be complete or correct, which is understandable. However, in some cases these errors can have adverse side-effects on other entries in the symbol table by overwriting them. For instance, the token :=> was mistakenly entered as an operator. In the operator definition in the program, the symbol table lookup routine was in declare-token mode when it reached the :=>, correctly isolated it as the special symbol :=, and proceeded to change that entry to be a unary operator, which :=> was supposed to be. Then when the parser reached the >, it correctly issued an error message claiming an unexpected token, but entered the return type of error in the := symbol table location (and the one before it was changed also, since configurations use two consecutive rows of the symbol table). The reason this is mentioned is that if later in the program there are integer assignment statements, much confusion will result. The point is that the predefined symbol entries should have a higher degree of protection.

With the exception of the problem described in section 3.1 which caused several different tokens to be placed in the same location of the symbol table, thus making more than one hash link point to that location, no other errors have been detected in the hash links of inserted items.

No errors have been detected in the configuration table and level/configuration table, with one exception: all the predecessor links in the level/configuration table remain zero throughout the compilation of any program. Perhaps this pointer is not explicitly used, since the "father" link of the configuration table seems to contain the same information.

Also no errors have been found in the manipulation of the constant table, name table or the type table except for those tokens found in an error context.
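The remark earlier in this section that predefined symbol entries should have a higher degree of protection can be illustrated with a minimal sketch. It is written in Python, not in the PL/I of the implementation, and it is not the repair proposed for the compiler. The class SymbolTable, its methods, and the exception ProtectedEntryError are hypothetical names; the only fact taken from the text is that predefined symbols occupy locations 1 through 122.

```python
PREDEFINED_LIMIT = 122  # locations 1..122 hold predefined symbols (from the text)


class ProtectedEntryError(Exception):
    """Raised when a declaration would overwrite a predefined entry."""


class SymbolTable:
    """Hypothetical table whose predefined rows are read-only after loading."""

    def __init__(self):
        self._rows = {}

    def preload(self, location, attributes):
        # Used only while the predefined symbols are being initialized.
        if not 1 <= location <= PREDEFINED_LIMIT:
            raise ValueError("not a predefined location: %d" % location)
        self._rows[location] = dict(attributes)

    def declare(self, location, attributes):
        # Declare-token mode: user declarations may never touch rows 1..122.
        if location <= PREDEFINED_LIMIT:
            raise ProtectedEntryError("location %d is predefined" % location)
        self._rows[location] = dict(attributes)

    def lookup(self, location):
        # Return a copy so that callers cannot modify the stored row.
        return dict(self._rows[location])
```

With a guard of this kind, the erroneous declaration of :=> would have been rejected when it reached the := entry instead of silently changing it, and later integer assignment statements would have been unaffected.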
5.2 Intermediate Text

One problem with the testing of the intermediate text was that it was difficult to construct a program that had few errors and therefore a readable intermediate text. A second problem was that by this point in the project it had become clear that the CLEOPATRA code generator contained some serious incompatibilities which prevented the linking of the two passes of the compiler. But the programs presented thus far were "doctored up" a bit and the intermediate text from their compilation was analysed. Happily enough, no errors were detected.

It might seem somewhat suspicious that the semantics of nesting levels and intermediate text generation, much more complicated tasks than lexical and syntactic analysis, would be relatively error-free, while the "easier" parts of the compiler were bug-ridden. There could be two reasons for this. Perhaps the implementor spent a greater amount of time and care in constructing the more difficult parts, knowing that the chance of an error was greater, and spent less time and attention to detail on the "easier" parts. Also it could be true that the test programs that could be used for testing in this phase were very simple, due to the errors in the previous phase of the compiler. Be that as it may, Chapter 4 certainly documents enough errors for two chapters.

6 Conclusions

The three parts of the CLEOPATRA analysis phase have now been tested. A thorough test of the lexical analysis uncovered a moderate number of errors, all of which were rectified, thus establishing a high level of reliability for the scanner. Reasonably thorough tests, considering the complexity of CLEOPATRA, were run against the syntax analyser; again only a moderate number of purely syntactic discrepancies were found (but not yet corrected). When the documented repairs are implemented, the parser will be acceptably close to the specifications.
However, the syntax tests were confounded by the shambles of the type-checking and expression-handling portions of the semantic analyzer. The remainder of the semantic analyzer, intermediate text generation, is fairly good.

Evaluation of this project is difficult. In one sense, it was very successful since so many errors were found, and that was the purpose of the testing. However, one of the basic assumptions of a validation effort is that the code has been debugged already by the implementor to remove the more mechanical errors of coding; it is difficult to believe that this was done. Too many of the errors found would have become immediately obvious if even the most superficial testing had been performed. For example, any test program that contained even a very simple expression (and one that does not is rare) would have caused the compiler to come crashing down. As a result, this project was "successful" as a debugging venture.

Unfortunately, the project was not as successful as a validation effort; the answer to the question "Is this implementation of a CLEOPATRA analysis phase a valid one?" would be an unequivocal "NO!". At least not for this version of the implementation, because the validation process is not really completed. The documented errors must first be repaired, then the tests must be rerun. No doubt more errors will be found requiring the process to be repeated.

Despite the pessimism exhibited above, the future of the CLEOPATRA project is promising. Plans include the production of an improved version of the implementation to incorporate corrections of the errors described in this thesis, and the retesting of the new version using the data that has been presented here. The result should be a well-written, reliable analysis phase implementation which will provide a solid basis for future CLEOPATRA expansion.

References

[1] Schreiner, Axel T.
, "CLEOPATRA Comprehensive Language for Elegant Operating System and Translator Design", Report UIUCDCS-R-74-646, Department of Computer Science, University of Illinois, May 1974.

[2] ________, "CLEOPATRA A Proposal for Another System Implementation Language", Report UIUCDCS-R-74-654, Department of Computer Science, University of Illinois, June 1974.

[3] Ralbur, John D., "A Code Generator for the CLEOPATRA Language", Report UIUCDCS-R-75-739, Department of Computer Science, University of Illinois, July 1975.

[4] ________, "CLEOPATRA Code Generator User's Guide", Report UIUCDCS-R-76-mo, Department of Computer Science, University of Illinois, January 1976.

[5] Fisher, Scott H., "Implementation of the Language CLEOPATRA: the Analysis Pass," Master's thesis, University of Illinois, September 1976.

[6] Gruenberger, Fred, "Program Testing and Validating", Datamation, Vol. 14, No. 7 (July 1968), pp. 39-47.

[7] Elmendorf, W. R., "Controlling the Functional Testing of an Operating System", IEEE Transactions on Systems Science and Cybernetics, Vol. SSC-5, No. 4 (October 1969), pp. 284-290.

[8] Brown, A. R., and Sampson, W. A., Program Debugging, American Elsevier, New York, 1973.

[9] Goodenough, John B., and Gerhart, Susan L., "Toward a Theory of Test Data Selection", IEEE Transactions on Software Engineering, Vol. SE-1, No. 2 (June 1975), pp. 156-173.

[10] London, R. L., "Proving Program Correctness: Some Techniques and Examples," BIT, Vol. 10, No. 2 (September 1970), pp. 168-182.

[11] Huang, J. C., "An Approach to Program Testing", Computing Surveys, Vol. 7, No. 3 (September 1975), pp. 113-128.

[12] Hetzel, William C., "Principles of Computer Program Testing", Program Test Methods, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1973.

Appendix: Summary of Productions Implemented

This appendix summarizes the productions that define CLEOPATRA as it is implemented.
Upper-case symbols and punctuation designate terminal symbols of the language; lower-case symbols represent non-terminals. Other notation used is:

     |       indicates a choice
     { }     combine a number of choices
     [ ]     the enclosed entity may appear or may be omitted
     [ ]*    the enclosed entity may appear zero or more times
     { }*    the enclosed entity must appear one or more times

Since the character | is also a member of the CLEOPATRA character set, it will be underlined when it denotes itself as a terminal symbol.

(1.1) letter ::= A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
(1.2) delimiting_character ::= ( | ) | . | : | , | ' | ; | blank_character | _
(1.3) special_character ::= aff|S|%ict*j-t + | = I « I » I / l'< ! > I 1 1 * I •"■
(1.4) digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
(1.5) control ::= !
(1.6) comment ::= COMMENT any sequence of characters with the exception of a semicolon ;
(1.7) ::= ! any sequence of characters with the exception of an end of the source record end of source record
(2.1) identifier ::= letter [ letter | digit ]*
(2.2) operator ::= { special_character }* | identifier
(2.3) constant ::= integer | long_integer | bit | literal_value
(2.4) decimal_digit ::= digit
(2.5) hexadecimal_digit ::= digit | A | B | C | D | E | F
(2.7) binary_digit ::= 0 | 1
(2.8) minus_symbol ::= -
(2.9) decimal_string ::= [ minus_symbol ] { decimal_digit }*
(2.10) hexadecimal_string ::= T. [ minus_symbol ] { hexadecimal_digit }*
(2.12) binary_string ::= B. [ minus_symbol ] { binary_digit }*
(2.13) basic_constant ::= decimal_string | hexadecimal_string | binary_string
(2.14) integer ::= basic_constant
(2.15) long_integer ::= P. decimal_string
(2.17) bit ::= S. binary_digit
(2.28) literal_value ::= C.
any sequence of characters up to but not including the first following blank_character
(2.29) system_supplied_value ::= basic_type !»TL | { LONGINTEGER | INTEGER } { LARGE | SMALL }
(2.30) system_supplied_constant ::= FALSE | FIRST | LAST | TRUE
(4.1) basic_ref_type ::= INTEGER | LONGINTEGER | BIT | CHARACTER
(4.2) basic_type ::= INTEGER | LONGINTEGER | BIT | CHARACTER [ (expression) ]
(4.4) ref_type ::= { basic_ref_type | type_name } [ integer EXTENTS ]
(5.1) structure_block ::= STRUCTURE configuration_name { ; link_item }* [ ; ] END configuration_name ;
(5.2) configuration_name ::= procedure_name | type_name | operator_link
(5.3) link_item ::= TYPE type_name [ ALIAS identifier ] | global_link_item
(5.4) global_link_item ::= PROCEDURE procedure_name [ ALIAS identifier ] [ ref_type_list . ] RETURNS basic_ref_type
(5.5) global_link_item ::= operator_link : OPERATOR [ left_ref_types ] operator [ ALIAS identifier ] right_ref_types RETURNS basic_type
(5.10) ref_type_list ::= ( type_formal [ , type_formal ]* )
(5.11) type_formal ::= ref_type [ BY ADDRESS ]
(5.12) left_ref_types ::= { basic_ref_type | type_name } [ BY ADDRESS : ref_type_list | . ref_type_list ]
(5.13) right_ref_types ::= ref_type | : ref_type BY ADDRESS | ref_type_list { . ref_type | : ref_type BY ADDRESS ]
(5.14)