UIUCDCS-R-76-771

AN EXPERIMENT ON CAI SEQUENCING CONSTRUCTS

By DAVID W. EMBLEY

February, 1976

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

This work was supported in part by the National Science Foundation under Grant No. US-NSF-EC-41511.

TABLE OF CONTENTS

1. INTRODUCTION
   1.1 Psychological Studies in Programming
   1.2 The KAIL Project
   1.3 General Complaints About TUTOR
2. THE LANGUAGES AND HYPOTHESES
   2.1 The Two Languages of the Experiment
   2.2 TUTOR and the T Language
      2.2.1 Program Structure
      2.2.2 Exception Handling
      2.2.3 Modifications
      2.2.4 Hidden Side Effects
   2.3 Hypotheses
      2.3.1 Uniformity
      2.3.2 Orthogonality
      2.3.3 Dynamic Execution
      2.3.4 Hidden Side Effects
   2.4 The K Language
      2.4.1 Program Structure
      2.4.2 Exception Handling
      2.4.3 Modifications
      2.4.4 Hidden Side Effects
3. THE EXPERIMENT
   3.1 Overview
   3.2 Subjects
   3.3 Lectures on T and K
   3.4 The PLATO Session
4. RESULTS
   4.1 Measures of Ability
   4.2 Difficulties
   4.3 Background
   4.4 Debugging Statistics
      4.4.1 Bug 1
      4.4.2 Bug 2
      4.4.3 Bug 3
      4.4.4 Bug 4
      4.4.5 Bug 5
   4.5 Modifying Statistics
   4.6 Correlations
5. CONCLUSIONS
   5.1 Summary
      5.1.1 Uniformity
      5.1.2 Orthogonality
      5.1.3 Dynamic Execution
      5.1.4 Hidden Side Effects
   5.2 Recommendations
   5.3 Suggested Improvements
REFERENCES
APPENDIX A: PORTIONS OF THE REFERENCE MANUALS FOR K AND T
   A.1 Pages from the T Reference Manual
   A.2 Pages from the K Reference Manual
APPENDIX B: INFORMATION USED FOR SUBJECT GROUPING
   B.1 Background Information Sheet
   B.2 KAIL Quiz
   B.3 Subject Grouping Schema
APPENDIX C: COMPLETE PROGRAM LISTINGS FOR LESSON "INTREP" IN K AND T
   C.1 T Listing
   C.2 K Listing
APPENDIX D: PERFORMANCE STATISTICS FOR THE DEBUGGING SECTION
   D.1 Scores
   D.2 Timing Data
   D.3 Editing Data
APPENDIX E: PERFORMANCE STATISTICS FOR THE MODIFYING SECTION
   E.1 Scores
   E.2 Timing Data
   E.3 Editing Data
APPENDIX F: CORRELATIONS
   F.1 Correlations for T Language Subjects
   F.2 Correlations for K Language Subjects

LIST OF TABLES

Table 1. Summary of Background and Ability Data
Table 2. The Extent to which Subjects Worked the Problems
Table 3. Error Contingency Table for Bug 1
Table 4. Error Contingency Table for Bug 3
Table 5. Data Correlated with Scores

LIST OF FIGURES

Figure 1. Lesson and Help Sequences in T
Figure 2. Overview of the Experiment
Figure 3. Flow Diagram for "intrep"
Figure 4. BUG: Screen Overwritten
Figure 5. Solution in K
Figure 6. Solution in T
Figure 7. Handling back in K
Figure 8. Handling back in T
Figure 9. Section of T Code
Figure 10. Section of K Code
Figure 11. Double pause Problem

ABSTRACT

This report describes and gives the results of an experiment designed to investigate sequencing constructs in computer-aided instruction (CAI) programs. The experiment compared two languages. One incorporated the sequencing constructs of an established CAI programming language; the other contained alternate constructs that stressed uniformity, orthogonality, and static definition and minimized the use of hidden side effects.
Working on-line, subjects debugged and modified a large program while the system monitored their progress and gathered data. Although several interesting results emerged, it was impossible to state conclusively that the entire set of sequencing constructs in either language proved better. Instead, the experiment indicated that direct, explicit, and simple constructs are best. Recommendations are given for designing sequencing features for CAI languages.

ACKNOWLEDGMENTS

Professor Wilfred J. Hansen deserves special thanks for guiding me through this study. He not only discussed the experiment with me on numerous occasions, but also allowed me to conduct the experiment as part of his CS 317 course on computer-aided instruction. I thank Professor Richard G. Montanelli for his help on the statistical aspects of this study. He also read an early draft of the manuscript and offered several pertinent and important suggestions. Many others helped in several ways. The CS 317 students cooperated beyond expectations. Ron Danielson suggested some improvements for the experiment. Professor George H. Friedman gave technical and administrative assistance, and Kathy Gee typed this report.

1. INTRODUCTION

1.1 Psychological Studies in Programming

Programming languages have traditionally been designed by individuals or relatively small groups. Language designers generally introduced and propagated language features based on their own feelings, thoughts, and experiences. They rarely consulted the intended user community and made little or no effort to determine the psychological soundness of newly designed language constructs. One objective way to consult the user community and, at the same time, establish the psychological soundness of new language constructs is through thorough, carefully documented, and replicable experiments [1].
Through empirical tests, designers can gain assurance that their language features actually meet the needs of the intended user community, and they can present evidence to support their claims about stylistic and language design issues.

In 1971, Gerald M. Weinberg wrote The Psychology of Computer Programming [2] to trigger the study of computer programming as a human activity. Although most of the book concentrates on social and individual activity in programming, Weinberg also hoped to encourage language designers to include the psychology of computer programming as a new dimension in their design philosophy.

Since the appearance of Weinberg's book, an increasing number of researchers have conducted psychological experiments on programming language features. Using inexperienced programmers, three psychologists concluded that nested if programs are easier to understand than corresponding goto programs [3]. Larry Weisman listed and categorized factors affecting the complexity of programs, began developing a methodology for studying the psychological complexity of computer programs, and conducted experiments to determine which factors or combinations of factors reduce complexity [4, 5]. At the University of Indiana, memorization tests were used to measure the effect of program structure on understanding [6]. John D. Gannon, in his Ph.D. thesis, gathered empirical evidence to support or discredit specific language design decisions [7]. Recently, the author reported the results of an experiment that investigated the psychological soundness of a new control construct called the KAIL selector [8].

This report describes another experiment intended to verify hypotheses about programming language design issues experimentally. It explores alternatives for expressing sequencing constructs in computer-aided instruction programs.
1.2 The KAIL Project

This work is part of the Automated Computer Science Education System (ACSES) project [9], which uses the PLATO CAI system [10] to teach elementary computer science. At this writing, PLATO runs on a CDC Cyber 73 dual processor with two million 60-bit words of extended core storage. Attached to the processor are over nine hundred plasma display terminals, which were invented by the PLATO group for this project. The peak comfortable system load is about four hundred simultaneous users. Under PLATO, all programs are written in a special system language called TUTOR [11], which mixes FORTRAN-like flow of control with peculiar CAI sequencing and sophisticated answer-judging facilities. The language KAIL is an attempt to influence the development of TUTOR by showing that modern control constructs and sequencing features can be combined with the higher-level features of TUTOR.

1.3 General Complaints about TUTOR

Peculiar sequencing rules (see section 2.2) make TUTOR code difficult to read; moreover, they are often a source of bugs. To understand a listing, an author must memorize the rules of unit, jump, help, inhibit, etc. Bugs arise because authors forget, for instance, that goto automatically returns to the end of the current main unit. Often, students become lost in lessons either because unanticipated function keys are active or because normal PLATO defaults are inactive. These errors can generally be attributed to an author's misunderstanding of sequencing rules. In contrast to TUTOR, a CAI language that follows more generally accepted sequencing rules should encourage authors to write more reliable code. The experiment described in this report tests this general hypothesis.

2. THE LANGUAGES AND HYPOTHESES

2.1 The Two Languages of the Experiment

The languages designed for the experiment emphasize differences in sequencing constructs and minimize other differences as much as possible.
The common features include the KAIL selector [12, 13], which was previously tested for psychological soundness [8], input-output statements similar to those found in TUTOR, and declarations and expressions typical of high-level programming languages. One language, T, incorporates TUTOR sequencing; the other language, K, contains an alternate specification for sequencing, similar to that proposed for KAIL. The important differences between K and T are discussed in the sections below, and the pertinent pages of the reference manuals are contained in appendix A.

2.2 TUTOR and the T Language

TUTOR's approach to computer-aided instruction, in its simplest form, is based on a sequence of "display-response" frames. A TUTOR program repeatedly displays information on the screen and then accepts and processes student responses. A lesson author codes a particular "display-response" frame as a unit and strings units together to build a lesson using a variety of inter-unit connections. The language T is based on TUTOR.

2.2.1 Program Structure

In T, a unit is coded as a procedure. A T program or lesson consists of a sequence of procedures optionally preceded by global variable declarations:

    ::= lesson ; [ ; ] ; end
    ::= | ;
    ::= ; ; end

Procedure nesting and local declarations are not permitted. Execution begins with the first statement in the first procedure and flows into other procedures via internally and externally initiated transfers of control. The system initiates an internal transfer of control to another procedure when it encounters one of the branch statements: jump, goto, or procedure call.

    ::= jump | goto |

If control enters a procedure via jump, entry into a lesson, or an externally initiated transfer of control, it is a main procedure; if control enters a procedure via goto or a procedure call, it is an attached procedure. Attached procedures always return control to a procedure in the calling routine.
For procedure calls, control returns to the statement textually following the call; for goto, control returns to the end of the main procedure.

2.2.2 Exception Handling

Externally initiated transfers of control are called exceptions. A student raises an exception by pressing a function key.

    ::= next | back | help | lab | data |
    ::= next1 | back1 | help1 | lab1 | data1

To process exceptions, the system maintains an internal pointer for each function key. A pointer is active, or enabled, if it references a specific procedure in the lesson; it is inactive, or disabled, if it is null. When a student raises an exception whose pointer is enabled, the system transfers control to the designated procedure. When control enters a main procedure, all pointers are set to null except the next pointer, which is set by default to the textually next procedure. When execution reaches the end of a main procedure, the system automatically pauses to wait for student input; if the student presses next, control passes to the procedure designated by the next pointer. By executing an exception enabling statement, a lesson author can explicitly set any pointer, including the next pointer. For example,

    help hint;
    help1 big-hint;

causes the help pointer to reference procedure hint and the help1 pointer to reference procedure big-hint.

The additional semantics of exception handling in T depend on the function key named in an exception enabling statement. Next, next1, back, and back1 evoke lesson sequences, and help, help1, lab, lab1, data, and data1 evoke help sequences. Lesson sequences are completely independent of any execution history; help sequences are not. When a student raises an exception that evokes a help sequence, the system designates the current main procedure as the base procedure and sets an internal pointer to it. Following the help sequence, control returns to the beginning of the base procedure.
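The pointer rules above can be modeled in a few lines. The following Python sketch is illustrative only and assumes a greatly simplified T: procedures are identified by name in textual order, and attached procedures, pauses, and endhelp are ignored. The class and method names (TState, enter_main, enable, press) are hypothetical. The sketch also folds in the back/back1 rule for help sequences described in the next paragraph.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical model of T's function-key pointers (all names are illustrative).
KEYS = ("next", "next1", "back", "back1", "help", "help1",
        "lab", "lab1", "data", "data1")
HELP_KEYS = {"help", "help1", "lab", "lab1", "data", "data1"}


@dataclass
class TState:
    procedures: List[str]                      # procedure names in textual order
    pointers: Dict[str, Optional[str]] = field(default_factory=dict)
    current_main: Optional[str] = None
    base: Optional[str] = None                 # set while a help sequence is active

    def enter_main(self, name: str) -> None:
        """Entering a main procedure nulls every pointer except next,
        which defaults to the textually next procedure."""
        self.current_main = name
        self.pointers = {key: None for key in KEYS}
        i = self.procedures.index(name)
        if i + 1 < len(self.procedures):
            self.pointers["next"] = self.procedures[i + 1]
        if self.base is not None:              # inside a help sequence
            self.pointers["back"] = self.pointers["back1"] = self.base

    def enable(self, key: str, target: str) -> None:
        """An exception enabling statement, e.g. `help hint;`."""
        self.pointers[key] = target

    def press(self, key: str) -> Optional[str]:
        """Raise an exception; return the procedure entered, or None if disabled."""
        target = self.pointers.get(key)
        if target is None:
            return None
        if key in HELP_KEYS and self.base is None:
            self.base = self.current_main      # a help sequence begins
        elif key in ("back", "back1") and target == self.base:
            self.base = None                   # the help sequence ends at the base
        self.enter_main(target)
        return target


t = TState(["title", "intro", "hint", "quiz"])
t.enter_main("intro")        # next -> "hint"; every other key disabled
t.enable("help", "hint")
print(t.press("help"))       # hint  ("intro" becomes the base procedure)
print(t.press("back"))       # intro (the help sequence ends)
```

Because every entry into a main procedure rebuilds the pointer table, any key the author enabled in the previous procedure is silently lost; this is the history-free behavior of lesson sequences, while the saved base procedure is the one piece of history that help sequences keep.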
As shown in figure 1, a help sequence consists of one or more main procedures. In a help sequence, the system not only enables the next pointer as usual, but also enables the back and back1 pointers and associates them with the base procedure. When execution encounters an endhelp in place of a simple end, the next pointer is set to the base procedure, and the help sequence terminates.

Figure 1. Lesson and Help Sequences in T. (Diagram: main and attached procedures linked by NEXT in a lesson sequence; HELP and LAB lead into a help sequence of main and attached procedures linked by NEXT; BACK and BACK1 return to the base procedure.) If the help sequence is evoked from one of the main or attached procedures in the lesson sequence, the main procedure in the calling routine becomes the base procedure to which control returns following the help sequence.

2.2.3 Modifications

T modification statements are defined as

    <modification statement> ::= rewrite | echo | erasure | size | rotate | long

These commands modify input-output statements. For the write statement, rewrite, echo, and erasure set the mode, and size and rotate designate the character size and the angle of writing, respectively. The long statement limits the length of an input string. Modifications are set and reset dynamically, and their effect extends beyond procedure boundaries.

2.2.4 Hidden Side Effects

When the execution of a statement causes an action unrelated to the main function of the statement or not denoted by the statement name, this action is a hidden side effect. For example, the T jump statement causes a screen erase. Hidden side effects permeate the T sequencing features. For each main procedure, the system automatically generates a screen erase at the beginning and a pause at the end. If a lesson author wishes to prevent these defaults, an inhibit erase statement nullifies the automatic screen erase, and a jump statement transfers control without pausing.
In lesson sequences, the system automatically associates next with the textually next procedure. In help sequences, it automatically associates back and back1 with the base procedure and activates next as in lesson sequences, except on encountering endhelp. Other features, such as the returning goto, can also be considered hidden side effects.

2.3 Hypotheses

There are at least four problems with T sequencing constructs. Each has led to a hypothesis tested in this experiment.

2.3.1 Uniformity

T lacks uniformity. One example is the nonuniform behavior of procedures. Syntactically, procedures appear identical, but their semantics depend on whether a procedure is connected into a sequence via jump, goto, procedure call, textual sequential ordering, or student-raised exceptions. A common error is to insert an attached procedure between two main procedures and then let control fall through to it from the textually preceding main procedure, so that it is entered as a main procedure rather than as an attached one. When languages lack uniformity, many rules and exceptions to rules are needed. This causes infrequently used constructs to be forgotten, makes deviation from the basic norm difficult, and discourages casual users from gaining an in-depth understanding.

UNIFORMITY HYPOTHESIS: programs would be easier to understand and use if written in a language where syntactically similar constructs have similar semantics.

2.3.2 Orthogonality

T lacks orthogonality. When language facilities are highly independent, they are said to be orthogonal. An example of interdependence in T is the tight binding of the semantics of lesson and help sequences to particular function keys. This binding discourages authors from using exceptions in other contexts where they might be useful. For example, in T it is difficult to code an on-page help sequence: a help sequence in which help text appears on the same page as instructional text and control resumes at the point of interruption rather than at the beginning of the base procedure.
One solution to this problem might be to introduce a special exception handling statement with its own rules and semantics, but this only compounds the problem. Features that are interdependent and tightly bound together lack generality and thus create a need for further extensions, rules, and exceptions to rules.

ORTHOGONALITY HYPOTHESIS: an orthogonal language would be easier to understand, modify, and use.

2.3.3 Dynamic Execution

T depends extensively on dynamic execution. T exception enabling statements are activated dynamically. This allows an author, for instance, to enable the help key in an attached procedure far removed from the calling routine where the exception may be raised. It also allows an author to sprinkle several specifications for a single function key throughout a procedure and activate them depending on the flow of control. T modifications also depend on dynamic execution. A common mistake is to set size and then forget to reset it. Dynamic execution features are error-prone because they impose fewer restrictions on the author.

DYNAMIC EXECUTION HYPOTHESIS: a language that properly limits the dynamic nature of these and similar statements would likely be less error-prone.

2.3.4 Hidden Side Effects

T sequencing constructs have many hidden side effects. Each of these is useful and makes coding easier, provided an author understands them all and uses them correctly. T authors commonly forget to inhibit unwanted side effects and often use explicit statements where basic defaults are already provided. Also, different needs may create situations that oppose basic defaults. These needs usually force authors to write awkward sections of code. Sometimes further extensions (e.g., on-page help) might alleviate these awkward situations, but the extensions often impose new defaults or additional side effects. The number of rules and exceptions to rules soon grows beyond an author's ability to remember.
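The size-reset mistake mentioned under dynamic execution (section 2.3.3) can be reproduced in a toy simulation. In the Python sketch below, which is only an analogy and not T or K code, a hypothetical Screen object records the character size in effect for each write. The T-style routine sets a global size and forgets to reset it, so the setting leaks into a later, unrelated write; the K-style version scopes the setting to one statement with a context manager, roughly as K's {size ...} modification is described in section 2.4.3.

```python
from contextlib import contextmanager
from typing import List, Tuple


class Screen:
    """Hypothetical display that logs (text, size) for each write."""

    def __init__(self) -> None:
        self.size = 0.0                       # 0 = default character size
        self.log: List[Tuple[str, float]] = []

    def write(self, text: str) -> None:
        self.log.append((text, self.size))


def t_style_title(screen: Screen) -> None:
    # T analogue: the modification is a dynamically executed statement ...
    screen.size = 1.5
    screen.write("C. FIBONACCI REPRESENTATION")
    # ... and the author forgot the matching reset (size 0).


@contextmanager
def modified(screen: Screen, size: float):
    """K analogue: the size applies only inside the with-block."""
    saved = screen.size
    screen.size = size
    try:
        yield
    finally:
        screen.size = saved                   # restored even if the body fails


def k_style_title(screen: Screen) -> None:
    with modified(screen, 1.5):
        screen.write("C. FIBONACCI REPRESENTATION")


t_screen, k_screen = Screen(), Screen()
t_style_title(t_screen)
t_screen.write("body text")                   # leaks: still size 1.5
k_style_title(k_screen)
k_screen.write("body text")                   # back to the default size
print(t_screen.log[-1], k_screen.log[-1])     # ('body text', 1.5) ('body text', 0.0)
```

The design point is that the scoped form makes the reset impossible to forget, at the cost of no longer being able to let a setting deliberately persist across statements.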
HIDDEN SIDE EFFECTS HYPOTHESIS: although hidden side effects usually reduce the amount of coding, a language from which most of them are eliminated would likely be easier to understand and use.

2.4 The K Language

K was designed to propose alternate solutions to T's sequencing constructs. It stresses uniformity, orthogonality, and static definition and minimizes the use of hidden side effects.

2.4.1 Program Structure

A K program is defined as

    ::= lesson ; ; end
    ::= | ;
    ::= | | |
    ::= on [ ]

2.4.2 Exception Handling

K enables exceptions statically and thus avoids dynamic execution as a means of activation. If an exception is raised in a block containing an associated exception declaration, control immediately passes to its handler. After the handler executes, control continues with the statement textually following the handler unless a goto explicitly transfers control elsewhere. The principle of orthogonality is observed because the function key named in an exception declaration and the handler are totally independent.

To keep an exception active in a called procedure, the exception must be passed as a parameter. A parameter list in K is defined as

    ::= | ;
    ::= [ result ] | on
    ::= integer | real | string ( )

When the system detects an exception in a procedure whose parameter list contains an associated exception declaration, control immediately returns and the system raises the exception in the calling routine.

2.4.3 Modifications

K modifications are defined as

    ::= {<modification list>}
    <modification list> ::= | ,
    ::= rewrite | echo | erasure | size | rotate | long

K modifications avoid dynamic execution problems because a modification affects only the statement to which it is attached. Thus,

    {size 0.75} [at 429; {size 3} write TITLE; at 605; write sentence;]

writes "TITLE" in size 3 but writes "sentence" in size 0.75. When a statement to which modifications are attached includes a procedure call, the modifications carry into the procedure.
For example, the following code places a box on the screen and then erases it.

    drawbox(x, y);
    {erasure} drawbox(x, y);

    procedure drawbox(integer x, y);
        draw x,y..x+100,y..x+100,y+100..x,y+100..x,y;
    end;

2.4.4 Hidden Side Effects

K contains no hidden side effects. The language requires every action to be explicit.

3. THE EXPERIMENT

3.1 Overview

To test the hypotheses stated in chapter 2, it was necessary to devise a method to measure a subject's ability to understand and use a language. Several factors were involved.

Personal factors:
1. Educational experience
2. Knowledge of programming languages
3. Basic intelligence
4. Motivation

Observable factors:
1. Accuracy in coding
2. Speed at finding and correcting program bugs
3. Facility at finding and correcting bugs and at making modifications

Together, these factors formed a model of subject understanding. This model of a subject's ability to perform programming tasks broke the hypotheses down into tests involving measurable variables. For the experiment, subjects were divided into two groups, the T language group and the K language group. The effects of personal factors were controlled by grouping subjects so that background and motivation in both groups were similar. After receiving instruction in their assigned languages, subjects were asked to debug and modify a reasonably large lesson consisting of about 500 lines of code. Subjects performed this task on-line while the system gathered data.

3.2 Subjects

The experiment took place during the Fall Semester, 1975, and the subjects were students in CS 317. Professor W. Hansen, the instructor, required each student to participate in the experiment. As part of the course, the students learned TUTOR and used it to code a substantial CAI lesson. To expose them to the additional language concepts needed for the experiment, the course included a section on KAIL.
Each student read through lesson KAIDS [13], an introduction to KAIL, coded the control flow for a typical KAIL lesson, and took a quiz to test comprehension. Based on this quiz and a background information sheet, subjects were divided into two groups. To accomplish this division, each subject received a score in the indicated range on each of the following items: quiz score (0-37), status at the university (FR = 1, ..., G1 = 5, G2 = 6), number of CS courses taken (0-6), understanding of ALGOL 68 (1-5), KAIL (1-5), and TUTOR (1-5), and PLATO experience (1-10). Subjects were ranked according to the sum of these scores. (Those with identical sums were ranked alphabetically.) Using this ranking, subjects received a group assignment according to a predetermined random grouping schema. The quiz, background information sheet, and subject grouping schema are contained in appendix B.

3.3 Lectures on T and K

Once divided into groups, subjects learned their assigned language in a one-hour lecture given by the author. Since they were already familiar with TUTOR and KAIL, it was only necessary to review the applicable language features for each group and emphasize important points. Because of scheduling problems, subjects attended these lectures in small subgroups ranging in size from one to six. Neither language group, however, gained an advantage by having more small subgroups. At these lectures, subjects signed up for their session on PLATO; these sessions were scheduled between 5 and 8 days after their lecture.

3.4 The PLATO Session

The on-line portion of the experiment proceeded as depicted in Figure 2.
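The ranking and group-assignment procedure of section 3.2 can be sketched as follows. This Python fragment is a minimal illustration under stated assumptions: the Subject type and the example scores are hypothetical, higher totals are assumed to rank first (the report does not say which end of the ranking comes first), and the grouping schema of appendix B is represented simply as a list of group labels indexed by rank, since its contents are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Subject:
    name: str
    scores: Dict[str, int]   # quiz, status, CS courses, ALGOL 68, KAIL, TUTOR, PLATO


def assign_groups(subjects: List[Subject], schema: List[str]) -> Dict[str, str]:
    """Rank subjects by total score (assumed highest first), break ties
    alphabetically, then give the subject at rank i the label schema[i]."""
    ranked = sorted(subjects, key=lambda s: (-sum(s.scores.values()), s.name))
    return {s.name: schema[rank] for rank, s in enumerate(ranked)}


# Hypothetical subjects (only two of the seven items shown) and schema.
subjects = [
    Subject("Chen", {"quiz": 28, "status": 6}),
    Subject("Baker", {"quiz": 31, "status": 4}),
    Subject("Adams", {"quiz": 30, "status": 5}),
]
print(assign_groups(subjects, ["K", "T", "T"]))
# {'Adams': 'K', 'Baker': 'T', 'Chen': 'T'}
```

Adams and Baker tie at 35, so the alphabetical rule places Adams first; fixing the schema before ranking is what keeps the assignment random with respect to any individual subject.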
Figure 2. Overview of the Experiment. (Flowchart: overview; instructions for the debugging section; find and fix up to 10 bugs; instructions for the modifying section; modify the lesson according to 2 specifications; stop. The sections carry the note "Unless you happen to finish this section in less than 25 minutes, you will be interrupted.")

Each subject was asked to debug and modify lesson "intrep", on integer representations, written in the subject's assigned language, K or T. The logical structure of "intrep" was as depicted in Figure 3. The solid arrows indicated paths through the lesson; the broken arrows indicated paths to the quiz. A student taking "intrep" was to complete every section before attempting the quiz (i.e., before traversing a broken path).

Figure 3. Flow Diagram for "intrep" (diagram; legible labels include "title page", "contents", "introduction", and "help"). Solid lines indicate paths through the lesson; broken lines indicate paths to the quiz.

Several on-line aids were available during the experiment:

(1) A complete program listing. The listings of "intrep" for the two languages appear in Appendix C.
(2) An editor with the following commands:
    Fn  move forward n lines
    Bn  move backward n lines
    In  insert text after line n
    Rn  replace line n
    Dn  delete n lines
    Xt  locate text t
(3) A reference manual. The essential pages of the reference manuals for K and T are in Appendix A. These contained syntax diagrams and a minimal explanation of the semantics.
(4) Program execution. A subject could execute "intrep" to assist in locating and fixing program bugs.
(5) The flow diagram for "intrep" (figure 3).

In the debugging section, subjects had 25 minutes to find as many of the bugs as possible. The system seeded the program with bugs one at a time and in a specific order, so at any given moment the program contained only one bug. The subject's task was to find it and fix it. All bugs were logic errors and typified those that might actually arise in programming "intrep". Naturally, bugs were selected that related to the hypotheses, but care was taken to keep them typical and thus, hopefully, fair.
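The one-bug-at-a-time procedure can be sketched as a simple loop. This Python sketch uses hypothetical names (Bug, run_debugging_section, and an attempt callback standing in for the subject's editing session) and simplifies the timing: the 25-minute limit is checked only between bugs, whereas the actual system could interrupt a subject in the middle of a problem. The clock is a parameter so the loop can be exercised without waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Bug:
    explanation: str   # the one-line explanation shown to the subject
    solution: str      # the solution displayed after the attempt


def run_debugging_section(
    bugs: List[Bug],
    attempt: Callable[[Bug], str],
    limit_s: float = 25 * 60,
    clock: Callable[[], float] = time.monotonic,
) -> List[Tuple[str, str, str]]:
    """Seed the bugs one at a time, in a fixed order, until time runs out.

    Returns (explanation, subject's fix, solution) triples; the fixes are
    graded later by hand, not here.
    """
    start = clock()
    record = []
    for bug in bugs:                      # only one bug is present at a time
        if clock() - start >= limit_s:
            break
        fix = attempt(bug)                # subject edits the listing (or gives up)
        record.append((bug.explanation, fix, bug.solution))
    return record


# Example with a fake clock: the third bug is never reached.
times = iter([0, 0, 600, 1600])
bugs = [Bug(f"bug {i}", f"solution {i}") for i in range(1, 4)]
done = run_debugging_section(bugs, lambda b: "my fix", clock=lambda: next(times))
print(len(done))  # 2
```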
To help subjects identify a bug, the system directed them through a path in the execution of the lesson leading to the problem and then gave a one-line explanation. To shorten the path leading to an error, the system began each path at a convenient point, not necessarily the title page. Subjects were able to execute "intrep" as desired to help locate the bug. When a subject pressed the term key and entered "editor", the program listing became available for editing. While editing, subjects could return to the execution of "intrep" with the bug. Any changes to the code, however, were not reflected in the execution of the program. After subjects fixed a bug (or gave up), the system presented a solution to the problem. The subject's solution was graded later by hand. As an example, figures 4, 5, and 6 show one bug and its solution, first in K and then in T.

Figure 4. BUG: Screen Overwritten (screen image: the message "BUG: screen overwritten (term editor)" above lesson text on base B integer representation written over another display, with the prompt "Press a, b, c, or NEXT.")

Figure 5. Solution in K (screen image of the solution display).

Figure 6. Solution in T (screen image of the solution display, showing part of the T listing for the table of contents and an instruction to replace the goto's in lines 19 through 22).

After completing the debugging section, subjects were given an additional 25 minutes to modify a bug-free listing of "intrep" according to two specifications.

[Specifications (screen image, partly illegible): "Add ... sequence so that whenever [help] is pressed ..."; "... flow of control, but just have the ..."]

While editing, subjects could refer to these specifications as often as desired. The system imposed no constraints on how subjects were to accomplish the task; thus, they could modify the program as they wished. Their efforts were graded later by hand.

4. RESULTS

4.1 Measures of Ability

As indicated in section 3.1, a subject's ability to comprehend and perform was measured in several ways:

(1) accuracy in coding:
    (a) subjective scores
    (b) number of coding errors
    (c) type of coding errors
(2) speed at finding and correcting each program bug:
    (a) time spent executing "intrep"
    (b) time spent searching the listing
    (c) time spent altering the listing
    (d) time spent using the aids
    (e) total time
(3) facility at finding and correcting each bug and at making modifications:
    (a) number of times "intrep" was executed for the debugging section, and number of times the specifications were accessed for the modifying section
    (b) number of times editor searching commands were used
    (c) number of changes to the listing
    (d) number of times aids were referenced
    (e) total number of editing keypresses

These performance statistics appear in appendices D, E, and F, and the important results are discussed below.

4.2 Difficulties

Several difficulties were encountered during the experiment. Two K subjects and one T subject dropped out of the experiment, leaving an imbalance of 13 K subjects and 15 T subjects. T-tests showed, however, that this did not bias the experiment.

During one session, the system crashed. Because this problem was anticipated, experiment sessions on PLATO were limited to at most four subjects, two from each group. The crash caused the loss of partial data from two T language subjects and one K language subject. In other sessions, three subjects accidentally terminated the experiment prematurely. A T language subject accidentally hit stop1, a built-in escape from any PLATO lesson. Misunderstanding the directions in the modification section caused two K language subjects to press prematurely the combination of keys provided for exiting the experiment. In all of these cases, partial data was lost.

In analyzing the results, an unexpected number of language interference errors appeared.
K subjects used a few TUTOR-like constructs, and T subjects used a few KAIL-like constructs. Despite the interference, it was relatively easy to decide what caused a subject to choose a particular syntactic form or expect some specific semantic action.

4.3 Background

As discussed in section 3.2, subjects were grouped so that background and ability would be comparable. An analysis of the statistics confirmed this, although there were a few specific areas in which unexpected differences occurred. A typical subject in the K language group was a first-year graduate student, while a typical subject in the T language group was nearly, but not quite, a senior at the university. This difference approached significance (p ≤ .059). More K subjects had taken CS 323, System Programming (p ≤ .024), and K subjects better understood ALGOL-W (p ≤ .014). Both groups had similar PLATO experience and KAIL quiz scores. Table 1 summarizes the background and ability data.

                                 Means
   Personal Data               K        T        t
   university status          4.38     3.60     1.97
   courses taken              3.38     2.40     1.62
   language understanding    27.38    25.40     1.18
   PLATO experience           4.31     4.40      .19
   quiz score                28.92    29.4?      .39

   Table 1. Summary of Background and Ability Data

4.4 Debugging Statistics

The debugging section of the experiment was completed by 12 of 13 K subjects and 12 of 15 T subjects. The other 4 encountered the difficulties discussed in section 4.2. Table 2 shows the extent to which subjects from each group worked the problems. T subjects worked more problems (5.8 on average) than K subjects (4.8 on average), but this difference is not statistically significant. Bugs 1 through 5 are discussed below; not enough subjects worked problems 6 through 10 to allow statistical analysis.

4.4.1 Bug 1

The first bug tested the dynamic execution hypothesis. A title in "intrep" was supposed to be written in size 1.5, but the statement

   write C. FIBONACCI REPRESENTATION;

wrote it in size 0.
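The difference this bug turns on can be sketched in a small Python model. It is a hypothetical toy, not PLATO's display: `Screen`, `write`, and `scoped_size` are names invented for illustration, and each write simply records a (text, size) pair. In T, as in TUTOR, size is dynamic state that stays in effect for every later write until it is changed. K's braces bind the modification to a single statement, which is modeled here as a context manager that restores the previous size on exit.

```python
from contextlib import contextmanager


class Screen:
    """Toy display model (hypothetical): records (text, size) for each write."""

    def __init__(self):
        self.size = 0.0      # current character size; dynamic state, as in T
        self.output = []

    def write(self, text):
        self.output.append((text, self.size))

    @contextmanager
    def scoped_size(self, size):
        """K-style {size s} modifier: applies only inside the with-block."""
        saved = self.size
        self.size = size
        try:
            yield
        finally:
            self.size = saved  # previous size restored automatically


# T style: set, write, and explicitly reset.
t = Screen()
t.size = 1.5
t.write("FIBONACCI REPRESENTATION")
t.size = 0.0
t.write("body text")

# T style with the reset forgotten: size 1.5 leaks into the next write.
leaky = Screen()
leaky.size = 1.5
leaky.write("FIBONACCI REPRESENTATION")
leaky.write("body text")

# K style: the modifier is bound to one statement, so nothing can leak.
k = Screen()
with k.scoped_size(1.5):
    k.write("FIBONACCI REPRESENTATION")
k.write("body text")
```

The `leaky` run reproduces the T error the report describes for this bug: failing to reset size, so the larger size carries over into later writes. The K form cannot express that mistake.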
K language subjects should have fixed this bug with

   {size 1.5} write C. FIBONACCI REPRESENTATION;

and T language subjects should have fixed it with

   size 1.5;
   write C. FIBONACCI REPRESENTATION;
   size 0;

   Problem  Group  Completed  Worked on or Only
   Number          Problem    Looked at Problem
      1       K       11             2
      1       T       14             1
      2       K       13
      2       T       14
      3       K       10             2
      3       T       12             2
      4       K        8             2
      4       T       11             2
      5       K        5             3
      5       T        8             3
      6       K        2             2
      6       T        5             2
      7       K        1
      7       T        3
      8       K        1
      8       T        2
      9       K
      9       T
     10       K
     10       T

   Table 2. The Extent to which Subjects Worked the Problems. There were 13 K subjects and 15 T subjects.

Only 1 of 14 T subjects who attempted the problem solved it correctly, whereas 5 of 11 K subjects solved it correctly (table 3). An exact test [14] showed that this difference was significant (p ≤ .039). Moreover, only 3 subjects in the T group made any attempt to reset size, and another 3 had the failure to reset size as their only error. The 11 K subjects made 8 errors; the 14 T subjects made 18 errors. Timing and editing data showed essentially no differences.

                   Number of errors
   Group        0        1       ≥2
   K            5        4        2
   T            1        9        4

   Table 3. Error Contingency Table for Bug 1

This data supports the dynamic execution hypothesis: subjects committed fewer coding errors when using static modifications.

4.4.2 Bug 2

The second bug tested the hidden side effects hypothesis. In the K listing, an unwanted erase was placed at the end of a help sequence; in the T listing, a necessary inhibit erase was deleted. Of 13 K subjects, 8 performed well; of 14 T subjects, 8 also performed well. Two T subjects, however, placed inhibit erase in the wrong procedure; apparently, they understood the problem but not the rules of inhibit erase. The K group introduced 0.23 errors per person; the T group introduced 0.31 errors per person. Although K subjects spent less time altering the listing (p ≤ .009), this only indicates that inserting takes more time than deleting. No other significant differences emerged.
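The two versions of this bug can be contrasted in a small Python model. It rests on a simplifying assumption drawn from the report's description of T (see the discussion of bug 5): T erases the screen automatically whenever control passes to the next main procedure, unless the procedure being left executed inhibit erase, whereas K erases only on an explicit erase statement. The `run` function and the statement tuples are hypothetical.

```python
def run(procedures, implicit_erase):
    """Run a list of procedures; each is a list of statement tuples.

    implicit_erase=True models T (hypothetical simplification): moving on to
    the next main procedure erases the screen unless the procedure just left
    executed ("inhibit erase",). implicit_erase=False models K: only an
    explicit ("erase",) statement clears the screen.
    """
    screen = []
    for index, statements in enumerate(procedures):
        inhibited = False
        for op, *args in statements:
            if op == "write":
                screen.append(args[0])
            elif op == "erase":
                screen.clear()
            elif op == "inhibit erase":
                inhibited = True
        is_last = index == len(procedures) - 1
        if implicit_erase and not inhibited and not is_last:
            screen.clear()  # hidden side effect of the transition in T
    return screen


# Goal in every case: the question stays visible when the hint appears.
t_correct = [[("write", "question"), ("inhibit erase",)], [("write", "hint")]]
t_bug = [[("write", "question")], [("write", "hint")]]               # inhibit erase deleted
k_correct = [[("write", "question")], [("write", "hint")]]
k_bug = [[("write", "question"), ("erase",)], [("write", "hint")]]  # stray erase added

for name, procs, implicit in [("T correct", t_correct, True), ("T bug", t_bug, True),
                              ("K correct", k_correct, False), ("K bug", k_bug, False)]:
    print(name, run(procs, implicit))
```

In this model `t_bug` and `k_correct` are the same statements. The T bug is an absence that nothing in the text points to, while the K bug is a visible statement that can be found and deleted.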
Apparently, it was no more difficult to detect the absence of an inhibit erase and insert it than to find and delete an unwanted erase. This bug was probably too simple to reveal any significant information about the hidden side effects hypothesis.

4.4.3 Bug 3

The third bug involved program structure and exception handling and tested the hypotheses relating to uniformity and dynamic execution. Coding CAI lesson sequences with traditional constructs seemed to demand a level of indirection (goto label 1 ... 1: call procedure p), but T sequencing (continue at procedure p) did not. Figure 7 shows K's indirect method of going back to the previous frame; figure 8 shows T's direct method. The bug was seeded into the programs by deleting

   on back goto ttl;

from K's listing and

   back title;

from T's listing.

Only 1 of 10 K subjects who completed the problem solved it correctly, whereas 7 of 12 T subjects solved it correctly (table 4). An exact test [14] showed that this difference was significant (p ≤ .026). Of the 10 K subjects, 8 entered goto; moreover, 2 T subjects used KAIL constructs but also entered goto. The 10 K subjects made 16 errors; the 12 T subjects made only 9 errors. On average, the T subjects scored 8.1 (of 10 possible) and the K subjects scored 5.6 (p ≤ .013). Moreover, even though the amount of code to be inserted for a correct solution was nearly identical in both languages, K subjects spent more time inserting (p ≤ .030), perhaps because they were unsure of themselves.

   Table 4. Error Contingency Table for Bug 3

Contrary to the general hypothesis (section 1.3), traditional sequencing in K resulted in poorer performance. The K language subjects seemed confused by going one place to get somewhere else.

   ttl.  title;
   indx. index;

   procedure title;
   end title;

   procedure index;
      on back goto ttl;
   end index;

   Figure 7. Handling back in K

   procedure title;
   end title;

   procedure index;
      back title;
   end index;

   Figure 8. Handling back in T

On the other hand, 2 T subjects placed "back title" in the wrong procedure. In both instances, these subjects may have expected back to remain active once it was enabled, since under this assumption they essentially solved the problem correctly. This lends support to both the uniformity and dynamic execution hypotheses: the 2 T subjects seemed to be confused about the interaction between dynamic activation of exceptions and entrance into a main procedure.

4.4.4 Bug 4

The fourth bug tested the hidden side effects hypothesis. In T, the jump statement, in addition to directing control to a designated procedure, erased the screen. The goto only directed control elsewhere and could often be used in place of a jump. Bug 4 was a screen overwrite problem: an erase was left out of the K listing, and a goto, instead of a jump, was used in the T listing.

In T, 10 of 11 subjects gave a correct solution, but only 1 replaced the goto with a jump. The rest, 9 of 10, gave an alternate solution: they properly inserted erase in the appropriate procedure. T subjects evidently found it easier to fix the bug directly. In K, 5 of 8 solved the problem correctly. Both groups fixed the bug in the same manner; however, K subjects inserted more often (p ≤ .045), activated more searching options (p ≤ .015), and pressed more editing keys (p ≤ .040). There is no clear explanation why K subjects edited more than T subjects. The results lend support to the hidden side effects hypothesis because the explicit solution was preferred; underlying implicit effects were more difficult to find.

4.4.5 Bug 5

The fifth bug tested the hidden side effects hypothesis. Because help sequences in T returned to the beginning of the base procedure, procedures had to be broken at unusual junctures; and because the screen erased automatically when entering a main procedure, normal defaults had to be overridden.
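The first of these constraints can be illustrated with a small Python model. This is a hypothetical sketch, not T's implementation. HELP is pressed just before statement `help_at` of a base procedure. Afterward, control either restarts the base procedure from the top, as the report says T did, or resumes at the interruption point, which is shown only for contrast. The function and statement names are invented.

```python
def run_with_help(base, help_at, help_seq, resume):
    """Trace statements executed when HELP is pressed before base[help_at].

    resume=False models the T rule (return to the beginning of the base
    procedure); resume=True returns to the interruption point instead.
    """
    trace = []
    i = 0
    helped = False
    while i < len(base):
        if i == help_at and not helped:
            helped = True
            trace.extend(help_seq)
            if not resume:
                i = 0      # T: restart the base procedure from the top
                continue
        trace.append(base[i])
        i += 1
    return trace


base = ["write explanation", "write Press NEXT", "pause"]
print(run_with_help(base, 2, ["write help"], resume=False))
print(run_with_help(base, 2, ["write help"], resume=True))

# Splitting the prompt into its own procedure (compare mixedr1 in Figure 9)
# limits what the restart repeats.
print(run_with_help(["write Press NEXT", "pause"], 1, ["write help"], resume=False))
```

Under the restart rule, the explanation is executed twice. Moving the prompt and the help attachment into a separate small procedure confines the repetition to the prompt. That is the kind of break "at unusual junctures" the T listing needed.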
Solving both these problems at once resulted in particularly awkward and unusual code. Figure 9 shows the essential statements in a section of T code where unusual coding was necessary. The inhibit erase prevented an unwanted screen erase on entering the help sequence, mrhelp. The erase in mixedr2 was essential because the automatic erase that normally would have occurred in falling through from mixedr1 was inhibited. Figure 10 shows the corresponding section of K code. In both languages, the bug was created by removing the erase.

Of the 8 T subjects who attempted to solve this problem, 3 removed inhibit erase from procedure mixedr1, which fixed the immediate problem but introduced another bug. All 5 K subjects who attempted this problem solved it correctly. K subjects scored 14.0 of 15 on a subjective measure that adjusted the score to give more credit to those who did well quickly, but T subjects scored only 9.6. This difference is significant (p ≤ .049). This supports the hidden side effects hypothesis and shows how combinations of hidden side effects interfere with one another to produce unexpected bugs.

   procedure mixedrad;
   end mixedrad;

   procedure mixedr1;
      inhibit erase;
      help mrhelp;
      at 3020; write Press NEXT to continue.;
   end mixedr1;

   procedure mixedr2;
      erase;
   end mixedr2;

   Figure 9. Section of T Code

   procedure mixedrad;
      on help mrhelp;
      at 3020; write Press NEXT to continue.;
      pause next;
      erase;
   end mixedrad;

   Figure 10. Section of K Code

4.5 Modifying Statistics

The modifying section of the experiment was completed by 10 of 13 K subjects and 13 of 15 T subjects. The other 5 encountered the difficulties discussed in section 4.2. Refer to section 3.4 for the specifications used in the modifying section of the experiment.

K subjects made several errors related to the hypotheses of the experiment. Most of these related to the hidden side effects hypothesis and were probably a result of TUTOR language interference.
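The pause errors discussed in this section, in both groups, come down to whether a help sequence ends with an implicit pause. The following minimal Python sketch assumes only what the report states: a K procedure has no automatic pause at its end, while a T help sequence pauses implicitly at endhelp. The function name and statement strings are hypothetical.

```python
def pauses_seen(help_statements, implicit_pause_at_end):
    """Count the pauses a student meets in one help sequence."""
    explicit = sum(1 for s in help_statements if s.startswith("pause"))
    implicit = 1 if implicit_pause_at_end else 0  # T: endhelp waits on its own
    return explicit + implicit


K, T = False, True  # does the language add a pause at the end of a help sequence?

print(pauses_seen(["write sorry"], K))                           # K, pause omitted: 0
print(pauses_seen(["write sorry", "pause next"], K))             # K, as intended: 1
print(pauses_seen(["write sorry", "pause next", "erase 5"], T))  # T, Figure 11 form: 2
print(pauses_seen(["write sorry"], T))                           # T, relying on default: 1
```

The same habit produces opposite errors. Writing the pause explicitly gives a double pause in T, and omitting it gives no pause at all in K.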
On 11 of a possible 20 occasions, no pause occurred after help messages; subjects may have been counting on a nonexistent automatic pause at the end of a procedure. K subjects also seemed confused about erasing after help messages: in 5 places in connection with the first specification, an unnecessary erase appeared, and in 4 places in connection with the second specification, the message was left on the screen. Contrary to the general hypothesis (section 1.3), K subjects continued to include errors of the form goto. A total of 10 of these errors crept into the listings. This further supports the results found in bug 3 and suggests that this kind of indirect transfer to a frame processor is unnatural.

T subjects also made several errors related to the hypotheses of the experiment. With regard to the uniformity hypothesis, T subjects positioned 5 of a possible 26 attached procedures in sequences of main procedures, so that the attached procedures would also be executed as main procedures. In the second specification, no one solved the pause problem correctly. With the exception of those who did even worse, everyone included a double pause in approximately the form shown in figure 11: one pause occurred explicitly, and the other occurred implicitly at the end of the help sequence. This relates to the hidden side effects hypothesis, but it also relates to the orthogonality hypothesis, because the implicit pause at the end of the help sequence is bound to, rather than separate from, the semantics of the help sequence.

Other errors were related to the hidden side effects hypothesis. On 9 of 26 occasions, subjects did not take advantage of normal defaults and referred unnecessarily to next, back, and back1. T subjects omitted endhelp in 5 of 26 help sequences and forgot to include inhibit erase in 6 of 13 instances where it was required.

   procedure sorry;
      inhibit erase;
      at 3010; write sorry;
      pause next, back, back1;
      at 3010; erase 5;
   endhelp sorry;

   Figure 11. Double pause Problem

All subjects in both groups coded the control for the second specification improperly. The specification called for control to resume at the point of interruption in the quiz. Most subjects, however, wrote code that simply restarted the quiz, and no one even attempted to force control to resume at the interruption point. The problem was difficult enough that subjects were unable or unwilling to solve it.

In general, observations on the modification section of the program support the hidden side effects hypothesis. In 25 minutes, 10 K subjects made 20 errors related to hidden side effects, and 13 T subjects made 36 such errors. The K errors seemed to be based on assumptions learned from TUTOR.

4.6 Correlations

Appendix F contains the correlation data for the major factors in the experiment. Using score as a measure of performance, table 5 shows how it correlated with other factors. The number of years of university training showed no correlation with performance. Other background data correlated highly with scores from the K language group, but not so highly with scores from the T language group. Programmers with greater insight and ability performed better in K, but insight and ability did not seem to matter as much in T. Perhaps this was because T, being like TUTOR, was more familiar to all subjects, whereas K, containing less familiar concepts, allowed experienced programmers to perform better and hindered less experienced programmers even more.

   Table 5. Data Correlated with Scores
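The report does not say which exact test reference [14] describes. Under the assumption that it is a one-tailed Fisher exact test on the 2x2 table of correct versus incorrect solutions, the p-values reported in section 4.4 for bug 1 (p ≤ .039) and bug 3 (p ≤ .026) can be recomputed from the counts given there. The check below uses only the Python standard library. `fisher_one_tailed` is a hypothetical helper name.

```python
from math import comb


def fisher_one_tailed(a, b, c, d):
    """One-tailed Fisher exact test for the 2x2 table

                  correct  incorrect
        group 1      a         b
        group 2      c         d

    Returns P(X >= a), where X is the number of correct solutions in group 1
    under the hypergeometric null distribution with all margins fixed.
    """
    n1 = a + b          # subjects in group 1
    k = a + c           # correct solutions overall
    n = a + b + c + d   # all subjects
    tail = sum(comb(k, x) * comb(n - k, n1 - x) for x in range(a, min(k, n1) + 1))
    return tail / comb(n, n1)


# Bug 1: K 5 of 11 correct, T 1 of 14 correct.
p_bug1 = fisher_one_tailed(5, 6, 1, 13)
# Bug 3: T 7 of 12 correct, K 1 of 10 correct.
p_bug3 = fisher_one_tailed(7, 5, 1, 9)
print(round(p_bug1, 3), round(p_bug3, 3))  # 0.039 0.026
```

Both values agree with the report to three decimal places. This is consistent with, though it does not prove, the one-tailed Fisher assumption. A two-tailed version would sum both tails and give larger p-values.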