UIUCDCS-R-75-695

AN INTERACTIVE ANALYSIS SYSTEM FOR EXECUTION-TIME ERRORS*

by

Alan Mark Davis

January 1975

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

* This work was supported in part by the National Science Foundation under Grant No. US-NSF-EC-41511 and was submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science, January 1975.

Dedicated to my parents for their love, support, and faith which have enabled me to accomplish this work

in memory of Morris Slobodow

ACKNOWLEDGEMENTS

The author wishes to express his gratitude to his thesis advisor, Professor Thomas Wilcox, for his very useful guidance and suggestions throughout all stages of this project, and to Professor Jurg Nievergelt for his helpful comments about the implementation and the written thesis. A sincere thank you must also be given to the other members of the thesis committee, Professors D. B. Gillies, W. Kubitz, and M. D. Mickunas, for their active support, as well as to Mike Tindall and Professor W. Hansen for their many helpful suggestions.

The National Science Foundation (Grant EC-41511) and the Department of Computer Science of the University of Illinois deserve many thanks for their financial support. Thanks also go to the entire PLATO staff and to Professor H. G. Friedman for their assistance in enabling the implementation to be completed. The Fall 1974 CS 121 class must also be thanked for volunteering to be guinea pigs on the experimental system.

Sandy Leach read the manuscript, and her multitudinous suggestions on both grammar and content were much appreciated. Also, a word of thanks is in order for Mary Jane Doolen for her excellent job of typing this manuscript when time was of major importance.

TABLE OF CONTENTS

1. INTRODUCTION
   1.1 Purpose
   1.2 Batch Error Handling
   1.3 Interactive Error Handling
2. A SURVEY OF INTERACTIVE DEBUGGING SYSTEMS
   2.1 The Past
   2.2 The Present
   2.3 The Future
3. THE PROGRAMMING SYSTEM
   3.1 Introduction
   3.2 The Editor/Compiler Package
   3.3 The Execution Supervisor
   3.4 Language Used for Experimentation
4. ANALYSIS OF EXECUTION-TIME ERRORS
   4.1 Motivation
   4.2 Introduction
   4.3 Static Analysis
       4.3.1 Introduction
       4.3.2 What Must Be Remembered
       4.3.3 What Static Analysis Does
       4.3.4 Implementation
   4.4 Dynamic Analysis
       4.4.1 Introduction
       4.4.2 The Main Algorithm
       4.4.3 Reverse Execution
       4.4.4 Searching for Common Misconceptions
       4.4.5 Implementation
   4.5 Flow Exhibition
   4.6 Implementation Problems
5. RESEARCH CONCLUSIONS AND FURTHER RESEARCH
   5.1 Error Analysis Research (Restrictions and Improvements)
   5.2 Research Conclusions
       5.2.1 Static Analysis
       5.2.2 Dynamic Analysis
       5.2.3 Flow Exhibition
       5.2.4 PLATO
LIST OF REFERENCES
APPENDIX A - SAMPLE DATABASES
APPENDIX B - SAMPLE COMMON MISCONCEPTION TABLE
APPENDIX C - SAMPLE DIALOGUES
VITA

1. INTRODUCTION

1.1 Purpose

A project is underway at the University of Illinois to automate the beginning (freshman) computer programming courses taught by the Department of Computer Science. The Automated Computer Science Education System (ACSES) hopes to replace most of the work of the human consultants, instructors, teaching assistants and tutors [16]. This project is being implemented on the PLATO computer-assisted instruction system [1]. One essential part of ACSES is a highly interactive programming system on which beginning programming students may write, edit, execute and debug their programs. This Computer-Assisted Programming System (CAPS) will be discussed in Chapter 3 and is described fully in [22].

One of the obvious goals of CAPS is to supply the diagnostic assistance necessary for beginning programmers when either syntax or run-time errors are encountered. Much work has been done in the area of detection, correction, and analysis of syntax errors [12,13,20,21], but research with execution-time errors has been limited to detection and repair methods. Error detection at run-time deals with methods of locating programming bugs as soon as possible and reporting them to the programmer. Error repair at run-time deals with methods of correcting the situation in some way so as to permit execution to continue. This thesis is concerned with error analysis. Error analysis at run-time deals with methods of discovering why the error has occurred. Up to now execution-time error analysis has not been explored in depth.

Specifically, the purpose of this thesis is threefold: (1) to present the reasons why analysis of execution-time errors is needed for beginning programmers, (2) to develop the actual features and algorithms necessary for such analysis to be effective, and (3) to describe and analyze an actual implementation of the developed algorithms.
It is the purpose of the present chapter to explore the differences between run-time error analysis and the other types of run-time error handling which exist on present programming systems.

1.2 Batch Error Handling

The barest possible execution environment is one in which the intermediate text generated by a compiler is nothing but machine language for the target computer, and the computer simply executes the instructions. In this non-interpretive environment, the only execution errors that can be determined easily are those which the hardware will detect. These would typically include such things as addressing on a non-word boundary, addressing outside the program's data area, and arithmetic overflow or underflow. The information given to the programmer would probably include the type of error interrupt triggered and the offset in machine words from the base of the program. The programmer could then, if given the proper cross-reference tables, symbol tables and dumps, locate the statement in which the interrupt occurred. All of this information is utterly useless to a beginning programmer.

An execution environment that could be more useful to the beginner is one in which the intermediate text generated by the compiler is interpreted by an execution supervisor that also monitors the data structures used, in order to locate an execution error long before it would have been detected by the hardware. An interpreter, for example, might be able to flag an array subscript which is out of bounds; however, a non-interpreting execution supervisor might be able to locate it only if the subscripting caused a memory reference outside the user area.
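The difference between the two environments can be sketched in a few lines. The following is a minimal illustration in Python, not the CAPS implementation: the names CheckedArray, SubscriptError, store and store_unchecked are hypothetical, and the flat list merely stands in for the words of a user area. The checked path plays the role of an interpreting supervisor; the unchecked path computes an address and stores blindly, as compiled machine code would.

```python
class SubscriptError(Exception):
    """Raised when a subscript falls outside the declared bounds (hypothetical name)."""


class CheckedArray:
    """An array with declared bounds laid out in a shared flat memory list."""

    def __init__(self, memory, base, low, high):
        self.memory = memory  # flat list standing in for the user area
        self.base = base      # position in memory of element number `low`
        self.low = low
        self.high = high

    def _address(self, subscript):
        # Address arithmetic a compiler would emit for A(subscript).
        return self.base + (subscript - self.low)

    def store(self, subscript, value, line):
        # Interpretive path: validate the subscript before touching memory.
        if not self.low <= subscript <= self.high:
            raise SubscriptError(f"SUBSCRIPT OUT OF RANGE IN LINE #{line}.")
        self.memory[self._address(subscript)] = value

    def store_unchecked(self, subscript, value):
        # Non-interpretive path: store blindly at the computed address.
        self.memory[self._address(subscript)] = value


# Word 0 and word 3 belong to other variables; A(1:2) occupies words 1 and 2.
memory = [99, 0, 0, 77]
a = CheckedArray(memory, base=1, low=1, high=2)

a.store_unchecked(0, -1)      # no complaint, but word 0 is silently overwritten
corrupted_word = memory[0]

try:
    a.store(0, -1, line=3)    # same fault, caught at the offending statement
    message = None
except SubscriptError as err:
    message = str(err)
```

In this sketch the unchecked store to A(0) lands in word 0, which belongs to some other variable, while the checked store is refused before any word is touched and can name the line at fault.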
Notice that if a program were allowed to modify words which are immediately above or below an array by subscripting out of range, the effects of that error would be disastrous and, if finally detected many statements later, would be extremely difficult to isolate.

Both of the above environments can be visualized by the block chart given in Figure 1.1. The only difference between them, but a big one, is their effectiveness in locating errors.

[Figure 1.1: Execution Environment Model 1. Blocks: EXECUTION OF PROGRAM, EXECUTION ERROR, DISPLAY ERROR MESSAGE.]

The present model of an interpreter can locate only one error at a time during execution. If the interpreter could somehow "repair" the error on the spot as well as supply an error message, then execution could continue until termination or until another error was located. In this case the model looks like Figure 1.2. This technique is used by various batch compilers today [5]. The main problem with them is that neither the error handler nor the execution supervisor knows what the programmer had intended to do. The repair made is quite definitely a "repair" and not a "correction". This repair merely enables execution to continue unhindered by the presence of an obvious fault in the program. This approach is not a bad one; it is excellent within the confines of a batch environment, where no method exists of finding out what the programmer intended.

[Figure 1.2: Execution Environment Model 2. Blocks: EXECUTION OF PROGRAM, EXECUTION ERROR, ERROR REPAIR, DISPLAY ERROR MESSAGE.]

1.3 Interactive Error Handling

Usually, interactive systems return to the model in Figure 1.1. The main advantage of the interaction is that the programmer is nearby and can therefore interact with a debugging system in order to initiate editing, re-executing, tracing, etc. In this case, however, the student is on his own after an execution-time error has occurred. It is felt that this approach lacks necessary guidance by the error system.
Many examples of such systems appear in Chapter 2.

This thesis proposes an alternative approach to execution-time errors in an interactive environment. At the occurrence of an error, the basic goal is to discuss with the programmer the problem at hand, obtain from the programmer information about what specific sections of the program are supposed to do, and finally point out specific changes that the student might make in his program which would prevent the occurrence of the particular error. An additional goal is to perform these tasks in an easy-to-follow, orderly manner so as to (1) not confuse a beginning programmer, and (2) demonstrate to the beginning programmer how a program should be debugged in the future by the student himself. This environment is shown in Figure 1.3. This interaction is directed by the system and not by the student. This is essential because the beginning programmer does not know what questions to ask; he does not know how to debug yet. An added result of this is that the student does not have to be burdened with learning another language (i.e., the command language for the debugging package).

[Figure 1.3: Proposed Execution Environment Model 3. Blocks: EXECUTION OF PROGRAM, EXECUTION ERROR, DISPLAY ERROR MESSAGE, REQUEST FOR HELP, INTERACTIVE ERROR ANALYSIS WITH PROGRAMMER.]

At this time a trivial example should point out the basic differences between the first two models discussed and the proposed model. Here is a very simple PL/I program and some sample responses by the three models:

    line #1   DECLARE A(1:2);
         #2   DO I = 0 TO 2 BY 1;
         #3   A(I)=I;
         #4   END;

    Model 1:  SUBSCRIPT OUT OF RANGE IN LINE #3.
              EXECUTION HALTED.

    Model 2:  SUBSCRIPT OUT OF RANGE IN LINE #3.
              SUBSCRIPT SET TO 1.
              EXECUTION CONTINUED.

    Model 3:  SUBSCRIPT OUT OF RANGE IN LINE #3.
              DID YOU WANT YOUR ARRAY TO HAVE 2 ELEMENTS NUMBERED 1 THROUGH 2?
              IF SO, CHANGE LINE #2 TO READ:
                  DO I = 1 TO 2 BY 1;
              OTHERWISE, CHANGE LINE #1 TO READ:
                  DECLARE A(0:2);

An execution-time error analysis system using this approach has been implemented as part of CAPS. It aims at assisting programmers with correcting execution errors as well as teaching students how to debug their programs.

Chapter 2 will discuss the history of debugging systems. Chapter 3 will describe the features of CAPS which influence the run-time error analysis. Chapter 4 will describe the actual run-time error analysis in detail.

2. A SURVEY OF INTERACTIVE DEBUGGING SYSTEMS

2.1 The Past

From the moment that the computer became a useful tool, it became apparent that a major portion of the time used in preparing a program for proper execution on the computer would be spent in debugging. The term debugging means any systematic method of detecting and correcting faulty logic in a program. In this chapter, the reasons for the excessive amount of time spent at debugging will not be explored, nor will new approaches to programming be proposed which could result in fewer "bugs" in the original program. Instead, this chapter will be a survey of the interactive debugging facilities that have been designed to assist the programmer in the burdensome task of eliminating the logic errors from his creation. This survey is intended to point out the shortcomings of the present repertoire of debugging services.

The earliest form of programming was that of assembly language. With each machine and assembly language today the manufacturer usually supplies some form of debugging package. Before these were readily available, the debugger had only the computer's switches (front panel) and perhaps a console to assist him. Through these devices, he was capable of examining or modifying any location in memory and of stepping through his program's execution either by single instructions or by processor cycles if necessary.
After many attempts at supplying test data and modifying various instructions to comply with his logic constraints, he would finally debug his program. Debugging in this way was certainly a tiresome task, and luckily there soon arose some new features.

The debugging package afforded the user a welcome set of console commands, including setting of breakpoints to permit the program to execute at full speed through a debugged section of machine language and then return control to the user at the breakpoint location. Then the programmer could make requests to see the contents of memory locations, examine registers and status words, and insert or delete additional breakpoints. Because of the very low-level instructions to which the user was restricted, debugging assembly language programs was (and still is) very time-consuming.

As higher level languages became popular, new debugging features had to be developed. No longer should a user be required to know the machine language in order to debug his program. With the advent of more complex operating systems, the user had no way of knowing where his particular program or data resided in memory. Thus there evolved higher level language debugging systems that possessed the same features as their assembly language counterparts but were applicable in a complex operating system environment. However, these debugging systems introduced new implementation problems. For example, it was necessary to have the symbol table present during execution so that the user could refer to data symbolically. Also, it was necessary to locate the beginning and end of source-level statements within all the machine code generated by a typical compiler.

Bernstein and Owens [3] describe four approaches to debugging that could be used:

1. Attempt to duplicate the failing condition in order to obtain information about the program just before failing.
2. Acquire storage maps, tables and data areas.
3. Initiate machine-level debugging procedures at the operator's console.
4. Correct the error.

But as time went on, it became apparent that this was not sufficient. In addition to examining and modifying symbolic locations and possibly setting breakpoints, a few other tools arose which set the trend for debugging systems of today. These included automatic data storage dumps, various trace facilities, and an automatic dump of specific data which described the machine's state at the time of error. Even these tools provide limited information about the cause of errors. The repertoire of debugging tools had to be further extended.

2.2 The Present

What types of services are available today from higher level interactive debugging systems? Project MAC's LISP has a variety of useful debug features [7]. These include trace of both statement flow of control and variable changes, conditional breakpoints, and modification (using an editor) of the actual program's list structures. Conditional breakpoints have a definite advantage over the usual unconditional type. A user may specify in his program that if a certain condition occurs in a certain place, then control is to be returned to the user for further requests. The modification to the list structure (both the program and its data are list structures in LISP) is also quite unique. Since this feature is available during interpretation, the programmer may modify the program when control has reached a breakpoint and continue execution; there is no necessity for him to request a recompilation and execution. This is a very useful and timesaving feature, but of course it is easily implemented only in a run-time storage environment which is as dynamic as that under LISP control.

QUIKTRAN also possesses some unique built-in debugging features [7].
In addition to unconditional breakpoints, examination and modification of variables, trace, and insertion and deletion of FORTRAN statements without a recompilation step, QUIKTRAN also has an audit feature. The audit feature, when requested, supplies the programmer with statistics showing the number of times that each statement in his program was executed. This is particularly useful after debugging, when the programmer wishes to optimize the most used sections of his program. QUIKTRAN stops just short of a similar feature available in a batch environment described by Wolman [23]. His audit operation supplies the user with the following information about each source-level PL/I statement: number of times executed, unit cost (a function of the amount of machine code generated), total effective cost (number of times executed times unit cost), as well as similar statistics on the time spent on each statement. This is certainly a unique programming aid, but for an experienced programmer!

More conventional debugging features are available on the Berkeley time-sharing system for the MAD and FORTRAN languages [7]. These include acquiring the contents of variables upon request, setting unconditional breakpoints, and an extensive editing feature which must be followed by a recompilation. More information about LISP, QUIKTRAN, MADBUG and FORTRAN at Berkeley may be acquired from [14], [6], [8], and [4] respectively.

R. M. Balzer [2] has developed an Extendable Debugging and Monitoring System (EXDAMS) at RAND Corporation with the following goals:

1. To test some proposed but unimplemented debugging facilities,
2. To be extendable to enable new aids to be added easily, and
3. To be independent of the language of the machine.

The basic philosophy of EXDAMS is that in order to perform at an ultimate level, all a debugging system needs is a complete history of a program's execution. (Contrary to Balzer's opinion, a system requires more than just a program history.
This opposing philosophy will be fully discussed in Chapter 4.) Upon the occurrence of an execution error, EXDAMS accepts requests from the programmer for various kinds of information. EXDAMS then looks at the history, extracts the proper information, formats it nicely, and gives it to the programmer for his critical examination. The burden of asking the proper questions (and thus the burden of discovering what caused the error) is regretfully left to the user. However, the features available to assist him in the debugging are quite useful. The programmer has the ability to execute forwards or backwards at any position in the program and may request that these be done at any speed (obviously with some upper limit!) so that the debugger may speed through correct sections of code and may go slowly through possible problem areas. The user may also stop execution at any time and request certain displays. These include "flowback analysis", "control space flowback", and standard tracing facilities.

The two flowback features are worth a special note. They are presented to the user when he effectively asks the questions: "How did variable ___ get the value of ___?" and "How did I get from position ___ in the program to position ___?" In the first case, a tree is displayed showing the history of a certain variable, as in Figure 2.1. The root of the tree represents the current statement being executed, and its descendant nodes represent past events. The debugger may then request more information of this type by indicating in the displayed tree a particular leaf which is to be further expanded into another displayed tree with that leaf at the root. The control space flowback analysis displays a similar but degenerate (linear) tree that describes the path taken during execution between any two points.

[Figure 2.1: EXDAMS' Variable Flowback Analysis. Tree display rooted at the statement A = B + C - 10;]

EXDAMS appears to be the most useful and original debugging package of its kind.
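The variable flowback display can be illustrated with a small sketch. This is not Balzer's implementation: it assumes a deliberately simple history in which each assignment is recorded together with the variables it read, and the names Event, Node, flowback and render, as well as the sample statements and values, are hypothetical. The depth parameter stands in for the user's choice of how far the displayed tree is expanded.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    step: int          # position of this event in the history list
    target: str        # variable assigned by the statement
    text: str          # source text of the statement
    value: object      # value the target received
    reads: tuple = ()  # variables the statement read


@dataclass
class Node:
    event: Event
    children: list = field(default_factory=list)


def flowback(history, var, before=None, depth=2):
    """Return a tree explaining how `var` got its value, or None if never assigned."""
    limit = len(history) if before is None else before
    # Search backwards for the most recent assignment to `var` before `limit`.
    for event in reversed(history[:limit]):
        if event.target == var:
            node = Node(event)
            if depth > 0:
                # Each operand is explained by its own most recent earlier assignment.
                for name in event.reads:
                    child = flowback(history, name, event.step, depth - 1)
                    if child is not None:
                        node.children.append(child)
            return node
    return None


def render(node, indent=0):
    """Return the tree as a list of indented text lines."""
    ev = node.event
    lines = [" " * indent + f"{ev.text}  [{ev.target} = {ev.value}]"]
    for child in node.children:
        lines.extend(render(child, indent + 4))
    return lines


# Hypothetical execution history; each step number equals the list index.
history = [
    Event(0, "D", "D = 10;", 10),
    Event(1, "B", "B = 100;", 100),
    Event(2, "C", "C = D + 5;", 15, ("D",)),
    Event(3, "A", "A = B + C - 10;", 105, ("B", "C")),
]

tree = flowback(history, "A")                            # root is the latest statement
shallow = flowback(history, "A", depth=1)                # C is shown as a leaf
expanded = flowback(history, "C", before=3, depth=1)     # user expands that leaf
```

Calling flowback with a smaller depth and then again on a chosen leaf mirrors the interaction described above, in which the user selects a leaf to be expanded into a new tree. The sketch also makes plain that every question still originates with the user.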
The system combines the more common debugging features with a number of very useful displays; however, its major debugging philosophy appears to agree with that of most of its competitors. That is: first, the user ascertains what has happened; then, if it is incorrect, the user must determine how or why it happened. This philosophy is obvious when one sees that the user must ask all the questions of the system. This is not an adequate situation, because in many cases the programmer has little idea of what went wrong. Programmers are also quite blind to their own bugs; even while staring directly at one, the programmer often will not notice a glaring mistake. If the debugging system were somehow "smarter," it could ask the user certain relevant questions about his program and then assist in locating the bug by carefully steering the individual down correct paths of thought.

A new approach to debugging has recently been introduced by Marvin Zelkowitz of Cornell University in his Ph.D. dissertation [24]. He has modified the PL/C compiler to include a modified ON condition statement, a TRACE statement which supplies a statement and variable trace, and a RETRACE statement. The RETRACE statement has the general form, RETRACE