Report No. UIUCDCS-R-76-832

AN AUTOMATIC VERIFIER FOR A CLASS OF SORTING PROGRAMS

by

PRABHAKER MATETI

October 1976

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

This work was supported in part by the National Science Foundation under Grant No. NSF EC 41511 and was submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science, October 1976.

ACKNOWLEDGMENTS

I am indebted to Jurg Nievergelt, my thesis advisor, for his encouragement, interest, help, advice and patience. I am grateful to Jayadev Misra, who has significantly influenced my thinking about programming; to Dave Plaisted for his help during the development of the theorem prover; to Dave Eland, who always helped me out of TUTOR troubles; and to Ron Danielson, without whose critical reading this thesis would have been even poorer in style. I am thankful to Bert Speelpenning for a number of useful discussions, and to Wilfred Hansen for his encouragement and criticism.

Table of Contents

1. INTRODUCTION
   1.1 Automatic Program Verification
   1.2 Limited Domain Program Verifiers
   1.3 Program Verification in Teaching Programming
   1.4 The Sorting Program Verifier
2. VERIFIER
   2.1 Inductive Assertion Method of Verification
   2.2 The Programming and Assertion Languages
   2.3 Verification Condition Generator
3. THEOREM PROVER
   3.1 Basic Theorem Prover
   3.2 Basic Theorem Prover is a Decision Procedure
   3.3 Evaluation of Backward Functions
   3.4 Extended Theorem Prover is a Decision Procedure
   3.5 Counterexample Generation
4. GENERALITY
   4.1 Constraints of the Present Verification System
   4.2 Partitioning
   4.3 Closure and Local Implication
   4.4 Examples
   4.5 On the Applicability of Partitioning
5. SORTLAB
   5.1 PLATO
   5.2 ACSES
   5.3 SORTLAB: A Programming Laboratory
6. DISCUSSION
   6.1 A Critique of Program Verifiers
   6.2 Previous Work Related to This Thesis
   6.3 Salient Features of the Sorting Program Verifier
   6.4 Conclusion
REFERENCES
APPENDIX
VITA

1. INTRODUCTION

This thesis primarily discusses the theoretical basis of a verifier for sorting programs designed for use in an automatic tutor for computer programming. A system called SORTLAB, which embeds the sorting program verifier, has been built. SORTLAB allows a student to write programs for sorting an array and decides whether these programs are correct; if they are not, it generates counterexamples. SORTLAB has been implemented on the PLATO system for computer-aided instruction [Alpert and Bitzer 1970], as a part of an Automated Computer Science Education System, ACSES [Nievergelt 1975].

The development of SORTLAB required several components, in particular:

- a programming language convenient enough for writing programs that sort an array, though not necessarily other programs,
- an assertion language with just the expressive power required to assert the state of an array with respect to the order of its elements,
- special-purpose techniques for verifying these programs with assertions, including a theorem prover for the class of lemmas generated by the verifier.
1.1 Automatic Program Verification

With the increased concern for program reliability, the verification of programs is receiving greater attention than ever before. The verification process consists of checking whether the program meets its specifications, namely, that it always terminates and that, when it does, certain variables have a desired property, provided the input given to the program meets the input specification. The inductive assertion method of proving programs considers these two problems separately: proving that the program meets its input/output specifications is done separately from proving termination. The method also requires that an invariant property of the program variables be given for every loop. Given these specifications and loop assertions, a set of mathematical lemmas is generated, which depends on the assertions given and on the semantics of the programming language used. If these lemmas are true, the correctness of the program is guaranteed; thus proving that a program meets its specification is reduced to proving a certain set of lemmas. This is the crux of the problem.

Much of the verification process is very mechanical in nature, and unless a large part of the process is carried out by the computer, few programmers would be willing to verify their programs by hand. A number of program verifiers have been constructed (see the survey by London [1972]), requiring varying degrees of human intervention. However, these are far from helpful to a programmer, for several reasons. Typically, human aid is required in pruning the proof trees. An ordinary programmer is not trained in theorem proving, and is usually not interested in how these lemmas are proved. In addition, if the program is incorrect, the verifiers cannot provide assistance either by generating a counterexample or by pointing out where the error lies. Finally, the verifiers are slow in operation, even for small programs.
These conditions combine to tempt a programmer to test-run his programs rather than submit them to an automatic verifier!

1.2 Limited Domain Program Verifiers

The failure to construct verifiers that are mechanical aids to program writing can be attributed largely to the ambitious approach taken in building them. Except for the earliest of the verifiers [King 1969], each has been more ambitious than its predecessors in the variety of programs it is intended to verify. The wide scope of programs to be proven requires that the programs be written using elementary but powerful operations. Further, the assertions, and hence the lemmas generated, have to be formulated in first-order predicate calculus (or the equivalent), which is undecidable. By increasing the power of the theorem provers, we not only make them cease to be decision procedures, but they also lose a sense of direction toward their goal; a large number of useless inferences are then generated. Even among decidable problem domains, theorem provers must be carefully designed to yield a decision procedure that works in practice. A "good" theorem prover should quickly prove the large class of theorems that are frequently encountered, even if it takes a while to decide about others.

A verifier and its theorem prover can become simple if they incorporate certain aspects of the semantics of the problem domain. For example, most well-written nonnumeric programs manipulate their data structures in a "disciplined and uniform" way, which is as yet not formally characterizable. The verification lemmas arising from such programs seem to be of a different nature from those that may arise in ordinary mathematics, say, number theory. If this is indeed the case, the underlying formal system may be decidable.
If so, an incorrect program may be proven to be incorrect, counterexample generation may be feasible, and fast theorem-proving procedures may exist for the specific class of lemmas.

Strictly speaking, every program verifier constructed so far is a limited domain verifier. For example, the programs being verified are often limited to those that operate on integer-valued variables. But we mean to limit the domain even further. Examples of such domains are programs operating on linear arrays with no arithmetic, and programs using lists, binary trees, etc. It is doubtful whether it will ever be possible to construct successful general-purpose program verifiers. On the other hand, practical verifiers dealing with programs from a limited domain of discourse can be designed. This thesis provides one such example, namely, a verifier for in-place sorting programs, which is being used in an automatic tutor of computer programming.

1.3 Program Verification in Teaching Programming

It is important that a student programmer realize the need for program reliability. A concern for the correctness of programs at an early stage of one's education has a great impact on one's attitude toward programming in later years. As exemplified by Dijkstra and others, a systematic method of designing abstract programs depends heavily on correctness proofs of programs. The "elegance" of a program is usually directly proportional to the ease with which it can be proven correct. There can be no question that one's understanding of one's own program increases greatly after inventing the loop assertions for it. Quite often one discovers better ways of writing the program.

In teaching programming, one would like to supervise the student's program design process, as well as examine the finished product thoroughly. Both these aspects are amenable to computerization, particularly if an interactive computer system is available. The teacher-program supervising the program design process should be an expert in the programming problem domain, must have an "opinion" about various design methodologies, and, perhaps more importantly, must be able to converse with the student in a reasonable language. If the problem domain is sufficiently simple, such teaching programs can indeed be designed. For an example of such a system, see [Danielson 1975].

On the other hand, a teacher program examining the student's finished program will not, and should not, consider the design process. Regardless of how the program was constructed, its concern should be judging the program's correctness and elegance. The teacher program may, at one extreme, simply test-run the student program or, at the other extreme, attempt to formally verify it. Such a teacher program should contain at least a program editor, a run-time system, a program verifier, and a counterexample generator. Apart from these technical requirements, teacher programs should be fast enough to give interactive response to the student. These considerations led us to write a specialized teacher program with built-in knowledge of a programming domain, resulting in an interactive programming laboratory, SORTLAB, wherein the student can prepare a sorting program and use the program verifier iteratively until a correct program is obtained.

1.4 The Sorting Program Verifier

We have chosen in-place sorting as the limited domain of discourse in SORTLAB for two main reasons. First, every program verifier constructed so far has verified several sorting programs; their authors quote these examples, quite often exclusively. This gives us a basis for comparison. Second, sorting programs are perhaps the most widely used examples in introductory programming courses.
The verifier can actually prove any program, sorting or not, that is written in our mini-programming language and whose behavior can be asserted in the assertion language. (See Sections 2.2 and 3.1 for a description of these languages.) If the program is not proven correct, then some mathematical "lemma(s)" generated from the program and its assertions must be false. The verifier can generate counterexamples to these lemmas.

2. VERIFIER

Every program operates on a certain set of data objects and aims to produce an output set of data objects with desired properties. A subset of these data objects, the input, is given to the program, and the remaining data objects are the result of program execution. Quite often, the input changes in its structure: data objects get created or destroyed, and their structure and relationships change. The program is expected to realize a desired property on the output only if the input meets certain requirements. To this end, the programmer asserts what relationships are to hold on the input data objects, and what holds on the output.

2.1 Inductive Assertion Method of Verification

Given the input and output assertions, say $\phi$ and $\psi$, we are interested in verifying that the program P behaves properly, i.e., whenever P is given input satisfying $\phi$, the output satisfies $\psi$, if and when P terminates. Notationally, following [Manna and Pnueli 1974], let us express this statement by:

$$\{\phi\}\; P\; \{\psi\} \tag{2.1}$$

The program P is said to be partially correct with respect to

              xj+1 then
     5              exchange xj with xj+1
     6           else
     7           end if
                 * 1 ≤ J < I ≤ N & A(1;J) ≤ XJ+1 & A(1;I) ≤ S(I+1;N)
     8        endscan
              * 1 ≤ I ≤ N & A(1;I-1) ≤ S(I;N)
     9     endscan
           * S(1;N)
    10     endproc

    Abbreviations:
       A(   for   array(
       S(   for   sorted(
       array(s,t) ≤ sorted(u,v)   for   array(s,t) ≤ array(u,v) and sorted(u,v)

Figure 2.1 A Bubble Sort Program with Assertions

... more general assertion language will use elementary, but powerful, atomic predicates. This use of elementary predicates makes it difficult to lump together all related predicates. The loss of expressive power in a limited assertion language is compensated for by the large and recognizable chunks of properties in the assertions. Further, while it has been advocated that theorem provers make large inferences, it appears necessary that related information be recognizable as such before large inferences can be made.

These considerations led us to design a mini-programming language and an assertion language that are specific to the sorting of arrays. A formal specification of the language is given in Chapter 5. Below we touch upon only the salient features.

2.2.1 Programming Language

Operations on Keys

It is a well-recognized principle in program design that basic procedures, specific to the particular problem and the data structures being used, should be developed and used so that data integrity may be preserved [Dahl et al. 1972]. Sorting programs must conserve the keys they are sorting. Hence, we provide two basic operations, exchange and insertion of keys, and forbid value assignments to the keys of the array. This guarantees that the elements of the array are conserved throughout the program. Therefore, our verifier need only prove that the array is sorted.

Operations on Array Indices

Successor and predecessor functions on the indices ("ptrs") of the array provide sequential access to the elements.
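As an informal companion to Figure 2.1, the Python sketch below runs a bubble sort with the loop bounds of Figure 2.2 and checks the Figure 2.1 assertions, as we read them, at the points where they appear. It is a run-time check on particular inputs, not the verifier's proof method. The helpers `seg`, `array_le`, and `is_sorted` are hypothetical renderings of the A( and S( abbreviations with 1-based, inclusive segment bounds, and the placement of the outer decrement after the second assertion is our assumption. Because the keys are only ever exchanged, they are conserved, which is why a final sortedness check suffices.

```python
from itertools import permutations


def seg(x, s, t):
    """Hypothetical helper: keys of array(s,t), 1-based inclusive; empty when s > t."""
    return x[s - 1:t] if s <= t else []


def array_le(x, s, t, u, v):
    """array(s,t) <= array(u,v): true if either segment is empty,
    otherwise no key of the first exceeds any key of the second."""
    left, right = seg(x, s, t), seg(x, u, v)
    return not left or not right or max(left) <= min(right)


def is_sorted(x, s, t):
    """sorted(s,t): the segment is empty or nondecreasing."""
    keys = seg(x, s, t)
    return all(a <= b for a, b in zip(keys, keys[1:]))


def bubble_sort_checked(keys):
    """Bubble sort that checks the Figure 2.1 assertions as it runs."""
    x = list(keys)
    n = len(x)
    i = n
    while i >= 2:
        j = 1
        while j <= i - 1:
            if x[j - 1] > x[j]:                   # if xj > xj+1 then
                x[j - 1], x[j] = x[j], x[j - 1]   #   exchange xj with xj+1
            # * 1 <= J < I <= N & A(1;J) <= XJ+1 & A(1;I) <= S(I+1;N)
            assert 1 <= j < i <= n
            assert array_le(x, 1, j, j + 1, j + 1)
            assert array_le(x, 1, i, i + 1, n) and is_sorted(x, i + 1, n)
            j += 1
        # * 1 <= I <= N & A(1;I-1) <= S(I;N)   (checked before i is decremented)
        assert 1 <= i <= n
        assert array_le(x, 1, i - 1, i, n) and is_sorted(x, i, n)
        i -= 1
    # * S(1;N)
    assert is_sorted(x, 1, n)
    return x


# Run every permutation of up to five distinct keys through the checks.
for size in range(6):
    for p in permutations(range(size)):
        assert bubble_sort_checked(p) == sorted(p)
```

Duplicate keys also pass, since every relation between keys in the assertions is a non-strict one.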
A ptr variable may be assigned the value of a ptr expression, which is of the form ... + j rather than j + 1 ... < i or i = j rather than 1 + ... < j and j + ... < i.

2.3 Verification Condition Generator

We first discuss the generation of verification conditions for a simple loop program segment W with a loop-free body S:

    W: while B do S endwhile                                   (2.2)

This is then generalized to cover arbitrary procedures. Two general methods for the generation of verification conditions are forward substitution and backward substitution [King 1969].

2.3.1 Forward Substitution

Let $\phi_W$ be the entry assertion of W, and $\phi_S$ be the entry assertion of S. We symbolically execute S on $\phi_S$ to obtain an assertion $Sf(\phi_S)$. The exit assertion of W is then generated as

$$\psi_W \equiv (\phi_W \wedge \neg B) \vee (Sf(\phi_S) \wedge \neg B) \tag{2.3}$$

The lemmas to be proven are

$$\phi_W \wedge B \ \text{ logically implies } \ \phi_S \tag{2.4}$$

$$Sf(\phi_S) \wedge B \ \text{ logically implies } \ \phi_S \tag{2.5}$$

Proving (2.4) and (2.5) guarantees that the entry assertion $\phi_S$ of S will be true each time S is entered. The assertion $Sf(\phi_S)$ can be obtained by forward substitution as follows. If S is empty, then $Sf(\phi_S)$ is the same as $\phi_S$. Otherwise, let S be the concatenation of S1 and S2, where S2 may be empty. Then we obtain $Sf(\phi_S)$ by recursively applying rules F1 and F2, defined as follows:

Rule F1 (applicable iff S1 is an assignment statement). Let S1 be u ← t, where t is an expression. Then $Sf(\phi_S)$ is

$$S2f\big((\mathbf{subst}\ \bar{u}\ \mathbf{for}\ u\ \mathbf{in}\ \phi_S) \wedge u = (\mathbf{subst}\ \bar{u}\ \mathbf{for}\ u\ \mathbf{in}\ t)\big) \tag{2.6}$$

Rule F2 (applicable iff S1 is an if statement). Let S1 be if B1 then S3 else S4 endif. Then $Sf(\phi_S)$ is

$$S2f\big(S3f(\phi_S \wedge B1) \vee S4f(\phi_S \wedge \neg B1)\big) \tag{2.7}$$

where subst y for z in F stands for the expression obtained by substituting y for all occurrences of z in the expression F. The variable $\bar{u}$ refers to the previous value (before S1) of the variable u; thus (2.6) asserts the existence of a value $\bar{u}$ which satisfied $\phi_S$ prior to S1 and which is to be used in the expression t.
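Rules F1 and F2 can be imitated semantically when every variable ranges over a small finite domain: an assertion is represented by the set of states that satisfy it, and Sf maps that set to the set of states reachable by executing S. The Python sketch below works under that assumption and is not the syntactic procedure of the thesis; the statement encoding, `DOMAIN`, and the example loop are hypothetical. In this reading the previous value ū of Rule F1 is never named, because the old states are simply replaced; a syntactic implementation must instead carry the existential quantifier that (2.6) introduces.

```python
from itertools import product

VARS = ("i", "j")        # hypothetical ptr variables
DOMAIN = range(-1, 6)    # hypothetical small stand-in for the integers


def all_states():
    """Every assignment of DOMAIN values to VARS."""
    return [dict(zip(VARS, vals)) for vals in product(DOMAIN, repeat=len(VARS))]


def sf(stmts, states):
    """Forward substitution, read semantically: the image of `states` under `stmts`.
    Statements (hypothetical encoding): ("assign", var, fn) or ("if", cond, then, else)."""
    current = [dict(s) for s in states]
    for stmt in stmts:
        if stmt[0] == "assign":
            # Rule F1: the new value of u is t evaluated in the old state.
            _, var, expr = stmt
            current = [{**s, var: expr(s)} for s in current]
        else:
            # Rule F2: run S3 where B1 holds and S4 where it fails, then take the union.
            _, cond, then_stmts, else_stmts = stmt
            current = (sf(then_stmts, [s for s in current if cond(s)])
                       + sf(else_stmts, [s for s in current if not cond(s)]))
    return current


def implies(states, pred):
    """Bounded check that every state in the set satisfies pred (not a proof)."""
    return all(pred(s) for s in states)


# Hypothetical loop W: while j <= i-1 do j <- j+1 endwhile
def B(s):
    return s["j"] <= s["i"] - 1


def phi_W(s):            # entry assertion of W
    return 1 <= s["j"] <= s["i"]


def phi_S(s):            # entry assertion of the body S
    return 1 <= s["j"] < s["i"]


body = [("assign", "j", lambda s: s["j"] + 1)]

entry_W = [s for s in all_states() if phi_W(s)]
after_S = sf(body, [s for s in all_states() if phi_S(s)])

lemma_2_4 = implies([s for s in entry_W if B(s)], phi_S)   # phi_W and B => phi_S
lemma_2_5 = implies([s for s in after_S if B(s)], phi_S)   # Sf(phi_S) and B => phi_S
exit_W = [s for s in entry_W + after_S if not B(s)]        # the states of (2.3)
```

Both lemmas hold for this loop, and every exit state has j = i, which is what (2.3) predicts for it.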
This introduction of existential quantifiers causes certain technical difficulties for our theorem prover (see Chapter 3). Hence we have chosen to abandon forward substitution, even though it seems appealing due to its close association with ordinary execution of programs, and to adopt the backward substitution method, which does not introduce any quantifiers.

2.3.2 Backward Substitution

Without loss of generality, let the given loop invariant be the exit assertion of the loop body S. Given the exit assertion $\psi_S$ of S, we generate an entry assertion $\phi_S$ such that $\{\phi_S\}\, S\, \{\psi_S\}$. It should be noted that several such assertions $\phi_S$ exist, one of them being the trivial false. However, the $\phi_S$ generated by backward substitution is such that, for any $\phi$, if $\{\phi\}\, S\, \{\psi_S\}$ then $\phi$ logically implies $\phi_S$.

Now let $\psi_W$ be the exit assertion of W, and $\psi_S$ be the exit assertion of the loop body S. We can symbolically "unwind" the execution in the backward direction and obtain the entry assertion of S as ...

$$Sb(\psi_S) = \mathbf{subst}\ t\ \mathbf{for}\ u\ \mathbf{in}\ S2b(\psi_S) \tag{2.11}$$

B1.2: Let S1 be exchange $x_a$ with $x_b$, where a, b are ptr expressions. Then

$$Sb(\psi_S) = \mathbf{exchb}\ x_a\ \mathbf{with}\ x_b\ \mathbf{in}\ S2b(\psi_S) \tag{2.12}$$

B1.3: Let S1 be insert $x_a$ below $x_b$, where a, b are ptr expressions. Then

$$Sb(\psi_S) = \mathbf{nsrtb}\ x_a\ \mathbf{below}\ x_b\ \mathbf{in}\ S2b(\psi_S) \tag{2.13}$$

We postpone (to Chapter 3) an accurate description of these inverse functions exchb ... and nsrtb ..., as their evaluation plays an important role in our theorem proving. Intuitively, these functions reproduce the situation(s) that must have existed prior to the exchange or insert.
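For a deterministic, loop-free body, Sb(ψ_S) holds in a state exactly when ψ_S holds after running the body from that state: subst t for u evaluates ψ_S with u set to the value of t, and exchb x_a with x_b evaluates ψ_S on the array with x_a and x_b swapped. The Python sketch below uses this semantic reading (not the symbolic evaluation of exchb described in Chapter 3) to check, by exhaustive enumeration over arrays of four small keys, the verification condition for an inner bubble-sort loop modelled on Figure 2.2. The state encoding, the helper names, and the invariant `inv` (placed after j is incremented) are hypothetical, and a bounded check like this is evidence, not a proof.

```python
from itertools import product


def seg(x, s, t):
    """Keys of array(s,t), 1-based and inclusive; empty when s > t."""
    return x[s - 1:t] if s <= t else ()


def array_le(x, s, t, u, v):
    a, b = seg(x, s, t), seg(x, u, v)
    return not a or not b or max(a) <= min(b)


def is_sorted(x, s, t):
    k = seg(x, s, t)
    return all(p <= q for p, q in zip(k, k[1:]))


# A state is (ptrs, keys): a dict of ptr values and a tuple of keys x_1..x_n.
# Each statement is modelled as a transformer from exit predicate to entry predicate.
def assign(var, expr):
    # Rule B1.1: evaluate psi with var replaced by the value of expr.
    return lambda psi: lambda st: psi(({**st[0], var: expr(st)}, st[1]))


def exchange(a, b):
    # Rule B1.2: evaluate psi on the keys with x_a and x_b swapped.
    def swapped(st):
        x = list(st[1])
        ia, ib = a(st) - 1, b(st) - 1
        x[ia], x[ib] = x[ib], x[ia]
        return st[0], tuple(x)
    return lambda psi: lambda st: psi(swapped(st))


def if_(cond, then_stmts, else_stmts):
    # Conditional: use the branch selected by B1 in the entry state.
    return lambda psi: lambda st: (sb(then_stmts, psi)(st) if cond(st)
                                   else sb(else_stmts, psi)(st))


def sb(stmts, psi):
    """Backward substitution: push psi back through stmts, last statement first."""
    for stmt in reversed(stmts):
        psi = stmt(psi)
    return psi


def inv(st):
    """Hypothetical inner-loop invariant (n = number of keys):
    1 <= j <= i <= n & A(1;J-1) <= XJ & A(1;I) <= S(I+1;N)."""
    p, x = st
    i, j, n = p["i"], p["j"], len(x)
    return (1 <= j <= i <= n
            and array_le(x, 1, j - 1, j, j)
            and array_le(x, 1, i, i + 1, n) and is_sorted(x, i + 1, n))


def guard(st):                                    # B: j <= i-1
    return st[0]["j"] <= st[0]["i"] - 1


def vc_holds(body, n=4, keys=range(3)):
    """Bounded check of the lemma: inv and B logically implies Sb(inv)."""
    pre = sb(body, inv)
    for x in product(keys, repeat=n):
        for i, j in product(range(n + 2), repeat=2):
            st = ({"i": i, "j": j}, x)
            if inv(st) and guard(st) and not pre(st):
                return False
    return True


def j_of(st):
    return st[0]["j"]


# if xj > xj+1 then exchange xj with xj+1 endif; j <- j+1
bubble_step = [
    if_(lambda st: st[1][j_of(st) - 1] > st[1][j_of(st)],
        [exchange(j_of, lambda st: j_of(st) + 1)], []),
    assign("j", lambda st: j_of(st) + 1),
]
# The same body with the exchange forgotten.
no_exchange = [assign("j", lambda st: j_of(st) + 1)]
```

With the exchange in place the condition holds; without it, a state such as i = 2, j = 1 with keys (1, 0, 2, 2) satisfies the premise but not Sb(inv), i.e., it is a counterexample to the lemma.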
Rule B2 (applicable iff S1 is an if statement). Let S1 be if B1 then S3 else S4 endif. Then

$$Sb(\psi_S) = \big(S3b(S2b(\psi_S)) \wedge B1\big) \vee \big(S4b(S2b(\psi_S)) \wedge \neg B1\big)$$

... S1 is a call statement. Let the call statement and the called procedure be as shown below:

       * α
       call Q(s,t) : (u,v)
       * β

       procedure Q(a,b) : (c,d)
          * φQ
          [procedure body]
          * ψQ
       endproc

where a and b are the two distinct input variables of procedure Q, receiving values from the ptr expressions s and t respectively; the lists given after ":" are the output parameters; $\alpha$ is the given entry assertion of the call statement, and $\beta$ is the exit assertion of call, generated by looking at the statements below call; $\phi_Q$ and $\psi_Q$ are the given entry and exit assertions of the procedure Q. We must prove the following two lemmas:

$$\alpha \ \text{ logically implies } \ \phi_Q' \tag{2.16}$$

$$\text{unmodified parts of } \alpha \text{ wrt } (s,t) \wedge \psi_Q' \ \text{ logically implies } \ \beta \tag{2.17}$$

where $\phi_Q'$, $\psi_Q'$, and unmodified parts of $\alpha$ are obtained from $\phi_Q$, $\psi_Q$, and $\alpha$ as described below.

The entry assertion $\phi_Q$ should not have any ptr variables other than a or b, because no other ptr variables are defined on entry. We substitute in $\phi_Q$ the expressions s and t for a and b, resulting in $\phi_Q'$. The exit assertion $\psi_Q$ should not have ptr variables other than a and b or those contained in the ptr expressions c and d. We substitute in $\psi_Q$ s, t, u and v respectively for a, b, c and d to obtain $\psi_Q'$. Note that the ptr variables u and v are substituted for the expressions c and d. Also note that if c or d contains either a or b, then the ptr variable u or v will be equal to an expression involving s or t. The substitution of s and t for a and b is valid because the variables a and b are not subject to assignment in procedure Q.

Recall that procedure Q can permute elements belonging to the segment array(a,b). Thus, those predicates of $\alpha$ that do not involve segments which are strict subsegments of array(s,t) will still be true upon exit from Q.
Such predicates are collected together in unmodified parts of $\alpha$ wrt (s,t). We postpone the description of this function to Section 3.3.2.

When the program consists of more than one procedure, we must prove the lemmas (2.16) and (2.17) for each procedure call, and further prove that the called procedures meet their specifications.

To generate the entry assertion for the body of a given procedure, we begin at the bottommost and innermost loop and successively generate the entry assertions of loops as described above. If p ... (2.15)

Clearly, the proof of all these lemmas guarantees that $\{\phi\}\, P\, \{\psi\}$, i.e., (2.1), holds. An example of lemma generation appears in Figures 2.2 and 2.3.

     1  procedure sort (n)
          * TRUE
     2    i ← n
     3    while i ≥ 2 do
     4       j ← 1
     5       while j ≤ i-1 do
     6          if xj > xj+1 then
     7             exchange xj with xj+1
     8          else
     9          endif
                * 1 ≤ J < I ≤ N & A(1;J) ≤ XJ+1 & A(1;I) ≤ S(I+1;N)
    10       endwhile
             * 1 ...

    ... logically implies subst i-1 for i in 12*                      (V2)

       where stmts[6...10]b(10*) ≡ the generated entry assertion of the body of loop 5
             ≡ xj ≤ xj+1 and 9* or xj > xj+1 and exchb xj with xj+1 in 9*
       where 9* ≡ subst j+1 for j in 10*

    - for loop at 3:
       12* and i > 1 logically implies stmts[4...12]b(12*)                (V3)
       12* and i < 1 logically implies sorted(1,n)                        (V4)
       where stmts[4...12]b(12*) ≡ subst 1 for j in (subst i-1 for i in 12*) and j > i
             or stmts[6...10]b(10*) and j < i

    - and for proc body:
       true logically implies subst n for i in                            (V5)
          (sorted(1,n) and i < 1 or stmts[4...12]b(12*) and i > 1)

Figure 2.3 The Five Verification Conditions Generated for Bubble Sort

3. THEOREM PROVER

In this chapter, we describe a procedure for proving or disproving a theorem whose premise and conclusion are augmented well-formed formulas.
Well-formed formulas (wffs) are the sentences of the assertion language (see Chapter 5). Augmented wffs involve the functions subst, exchb and nsrtb, which map a pair consisting of a wff and a programming language statement into a wff. The details of these mappings will be given in Section 3.3.

The proving or disproving of a theorem is done in two phases: in the first phase, the augmented wffs are converted into wffs; in the second phase, the actual proof begins. We discuss the second phase first, as the evaluation of the above functions involves the concept of partitioning, which is an important part of the second phase. The basic theorem prover is described in Section 3.1. A proof that this basic theorem prover is a decision procedure for theorems stated as wffs is given in Section 3.2. The theorem prover is then extended (in Section 3.3) to include evaluation of the aforementioned functions. The chapter concludes with Section 3.5, where the generation of counterexamples is discussed.

The treatment will be informal. The level of rigor in the proofs is comparable to that generally found when discussing combinatorial algorithms. Several "remarks" are made soon after describing a procedure. These are meta-lemmas describing properties of the verification system. We omit the proofs of these remarks, as they are neither illuminating nor interesting.

3.1 Basic Theorem Prover

A theorem prover attempts to prove that a given conclusion $\omega$ follows from a certain hypothesis or premise $\Omega$. If $\omega$ does not follow from $\Omega$, a general theorem prover may not always halt and say so. However, if $\Omega$ and $\omega$ are sentences of a properly chosen assertion language, it is possible, as we demonstrate in this chapter for our assertion language, to give a decision procedure for the question: Does $\Omega$ logically imply $\omega$?
We construct a "most general model" for $\Omega$, and then determine whether $\omega$ is "true" in this model. If $\omega$ is indeed satisfied by this model, then $\Omega$ logically implies $\omega$; otherwise, we will be able to produce a counterexample which gives specific values to the variables, making $\omega$ false and $\Omega$ true. To make the discussion more precise, we need the following definitions.

3.1.1 Definitions

Def 1 An interpretation of a wff is a mapping of the set of all ptrs, constants and the elements of the array into the set of integers. The ptr constants 0, 1, 2, ..., the function symbols +, -, and the relation symbols <, ≤, >, ≥, =, ≠ are given their usual meanings. The key function x maps the ptr expressions into key elements.

We have taken the set of integers as the universe for the keys of the array only for the sake of simplicity in the ensuing discussion. However, any set of keys on which a linear ordering is defined will do for this universe. The domain of ptr values can similarly be enlarged.

Def 2 (Truth of Predicates) The ptr predicates are interpreted as relations on integers in the conventional way. The array predicate array(s,t) ≤ array(u,v) is true if either of the arrays is empty (i.e., s > t or u > v), or if no element of array(s,t) is greater than any element of array(u,v). (Similar meanings are given to the <, >, and ≥ relations between array segments.) The array predicate sorted(s,t) is true either if the array segment array(s,t) is empty, or if the elements of the segment are arranged in nondecreasing order from the lower boundary s to the upper boundary t of the segment.

A disjunct is of the form $p_1$ and $p_2$ and ..., where each $p_i$ is a predicate. A wff is of the form $d_1$ or $d_2$ or ..., where each $d_i$ is a disjunct. The logical connectives and and or are interpreted in the conventional way.

Def 3 An interpretation M is said to satisfy a wff $\phi$, notationally $M \models \phi$, if $\phi$ is true in M. The interpretation M is then a model for $\phi$.
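Def 2 and Def 3 can be read operationally. In the Python sketch below, which is illustrative only, an interpretation is a hypothetical pair of a dict of ptr values and a tuple of keys (so the key function x is 1-based indexing into the tuple), a predicate is a function of the interpretation, a disjunct is a list of predicates, and a wff is a list of disjuncts. The example encodes the Ω and ω of Figure 3.1 and evaluates them in one interpretation.

```python
from typing import Callable, Dict, List, Tuple

Interp = Tuple[Dict[str, int], Tuple[int, ...]]   # hypothetical: (ptr values, keys x_1..x_N)
Expr = Callable[[Interp], int]                     # a ptr expression
Pred = Callable[[Interp], bool]


def var(name: str) -> Expr:
    return lambda m: m[0][name]


def const(c: int) -> Expr:
    return lambda m: c


def plus(e: Expr, c: int) -> Expr:
    return lambda m: e(m) + c


def segment(m: Interp, s: int, t: int) -> Tuple[int, ...]:
    """Keys of array(s,t); empty when s > t (nonempty segments assume 1 <= s, t <= N)."""
    return m[1][s - 1:t] if s <= t else ()


def ptr_le(u: Expr, k: int, v: Expr) -> Pred:
    """The ptr predicate u + k <= v."""
    return lambda m: u(m) + k <= v(m)


def key_gt(a: Expr, b: Expr) -> Pred:
    """The key predicate x_a > x_b."""
    return lambda m: m[1][a(m) - 1] > m[1][b(m) - 1]


def array_le(s: Expr, t: Expr, u: Expr, v: Expr) -> Pred:
    """array(s,t) <= array(u,v): a side is empty, or no key of the first exceeds any of the second."""
    def holds(m: Interp) -> bool:
        a, b = segment(m, s(m), t(m)), segment(m, u(m), v(m))
        return not a or not b or max(a) <= min(b)
    return holds


def sorted_seg(s: Expr, t: Expr) -> Pred:
    """sorted(s,t): the segment is empty or nondecreasing from s to t."""
    def holds(m: Interp) -> bool:
        k = segment(m, s(m), t(m))
        return all(p <= q for p, q in zip(k, k[1:]))
    return holds


def satisfies(m: Interp, wff: List[List[Pred]]) -> bool:
    """Def 3 for a wff in disjunctive normal form: some disjunct has all predicates true."""
    return any(all(p(m) for p in disjunct) for disjunct in wff)


# Omega and omega of Figure 3.1, written as lists of disjuncts.
i, n = var("i"), var("n")
premise = [[ptr_le(const(2), 0, plus(i, 1)), ptr_le(plus(i, 1), 0, n),
            sorted_seg(const(1), i), sorted_seg(plus(i, 1), n)]]
conclusion = [[array_le(const(1), i, plus(i, 1), n)],
              [key_gt(i, plus(i, 1))]]
```

For example, with i = 2, n = 4 and keys (1, 3, 2, 4), both `premise` and `conclusion` are satisfied; the first disjunct of the conclusion fails there, but x_2 > x_3 holds.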
Def 4 A wff $\Omega$ logically implies a wff $\omega$, notationally $\Omega \models \omega$, if $\omega$ is satisfied by every model of $\Omega$.

Def 5 A wff $\phi_1$ is equivalent to $\phi_2$, notationally $\phi_1 \mathrel{\dashv\models} \phi_2$, if $\phi_1 \models \phi_2$ and $\phi_2 \models \phi_1$.

3.1.2 Outline of Theorem Prover

The general strategy of the proof procedure is as follows. The wffs $\Omega$ and $\omega$ are in disjunctive normal form. If $\Omega = \Omega_1 \vee \Omega_2 \vee \cdots$, then the proof of $\Omega \models \omega$ is a collection of proofs:

$$\Omega \models \omega \quad\text{iff}\quad \Omega_1 \models \omega \ \text{and}\ \Omega_2 \models \omega \ \text{and}\ \cdots$$

Given a disjunct $\Omega_1$ and the conclusion $\omega$ to be made from it, we construct $\Omega_1^*$ from $\Omega_1$ using certain "inference rules." (For our purposes, an inference rule is a procedure which transforms a given wff $\phi$, according to certain criteria, into a wff which is in a more convenient form than $\phi$.) The wff $\Omega_1^*$ represents all that can be "deduced" from the facts given by $\Omega_1$, and is equivalent to $\Omega_1$. However, $\Omega_1^*$ is not necessarily a single disjunct.

Thus, for each disjunct $\Omega_{1i}^*$ of $\Omega_1^*$, we "normalize" $\Omega_{1i}^*$ and $\omega$ so that both wffs use not only the same set of ptrs but also the same set of array segments. This may require "partitioning" the array segments they originally referred to into smaller segments. The wff $\omega$ is rewritten as $\omega^{\#i}$ using these smaller segments. Thus,

$$\Omega_1 \models \omega \quad\text{iff}\quad \Omega_{11}^* \models \omega^{\#1} \ \text{and}\ \Omega_{12}^* \models \omega^{\#2} \ \text{and}\ \cdots$$

As we shall see later, $\omega^{\#i}$ is equivalent to $\omega$ in the context of $\Omega_{1i}^*$. The $\Omega_{1i}^*$'s further have the property that, if $\Omega_{1i}^*$ is satisfiable, then

$$\Omega_{1i}^* \models p \quad\text{iff}\quad P \models p \tag{3.1}$$

where $P$ is a particular predicate of $\Omega_{1i}^*$ that "corresponds" to the predicate $p$ of $\omega^{\#i}$. Thus, if $\omega^{\#1}$ is a disjunct,

$$\Omega_{11}^* \models \omega^{\#1} \quad\text{iff}\quad P_{11} \models p_{11} \ \text{and}\ P_{12} \models p_{12} \ \text{and}\ \cdots$$

where $\omega^{\#1} = p_{11} \wedge p_{12} \wedge \cdots$.

If $\omega^{\#1}$ is not a single disjunct, let it be $\omega_1^{\#1} \vee \omega_2^{\#1}$, where $\omega_1^{\#1}$ is a single disjunct. Clearly, if $\Omega_{11}^* \models \omega_1^{\#1}$, then $\Omega_{11}^* \models \omega^{\#1}$. Otherwise, let $p_{ij}$ be a predicate of $\omega_1^{\#1}$ which is not implied by $\Omega_{11}^*$, i.e., $\Omega_{11}^* \not\models p_{ij}$.
We then consider two cases of the premise:

$$\Omega_{11}^* \models \omega^{\#1} \quad\text{iff}\quad \Omega_{11}^* \wedge p_{ij} \models \omega_1^{\#1} \vee \omega_2^{\#1} \ \text{and}\ \Omega_{11}^* \wedge \neg p_{ij} \models \omega_2^{\#1}$$

We now take the transitive closure of each new premise, $\Omega_{11}^* \wedge p_{ij}$ and $\Omega_{11}^* \wedge \neg p_{ij}$, to ensure that property (3.1) holds, and repeat the whole process for each of the new premises. An example of $\Omega$, $\omega$, $\Omega^*$ and $\omega^{\#}$ is given in Figure 3.1.

3.1.3 Inference Rules

We now describe a procedure for generating the aforementioned $\Omega^*$ and $\omega^{\#}$ from $\Omega$ and $\omega$. We will assume that $\Omega$ and $\omega$ are disjuncts; the more general case will be dealt with later (see Algorithms 1, 2, 3 and 4 of Section 3.2).

    Ω  = 2 ≤ i + 1 ≤ n and sorted(1,i) and sorted(i+1,n)
    ω  = array(1,i) ≤ array(i+1,n) or x_i > x_{i+1}

    array(1,n) is partitioned into array(1,i-1); array(i,i); array(i+1,i+1); array(i+2,n)

    Ω* = 2 ≤ i and i + 2 ≤ n and 4 ≤ n and sorted(1,i-1) ≤ x_i and x_{i+1} ≤ sorted(i+2,n)
    ω# = ω1# or ω2#, where
       ω1# = array(1,i-1) ≤ array(i+2,n) and x_i ≤ array(i+2,n)
             and array(1,i-1) ≤ x_{i+1} and x_i ≤ x_{i+1}
       ω2# = x_i > x_{i+1}

    The predicate x_i ≤ x_{i+1} of ω1# is not implied by Ω*. The two new premises are:
       Ω* and x_i ≤ x_{i+1}
       Ω* and x_i > x_{i+1}

    The transitive closure of Ω* and x_i ≤ x_{i+1} is:
       2 ≤ i and i + 2 ≤ n and 4 ≤ n and sorted(1,i-1) ≤ x_i and x_{i+1} ≤ sorted(i+2,n)
       and sorted(1,i-1) ≤ x_{i+1} and x_i ≤ sorted(i+2,n)
       and sorted(1,i-1) ≤ sorted(i+2,n) and x_i ≤ x_{i+1}

    The predicate sorted(1,i-1) ≤ sorted(i+2,n) above "corresponds" to the
    predicate array(1,i-1) ≤ array(i+2,n) of ω1#.

Figure 3.1 An Example of Ω, ω, Ω* and ω#

The procedure consists of several subprocedures (inference rules), each performing a distinct transformation on a subset of the predicates of $\Omega$ and $\omega$. The following descriptions of the effect of applying the inference rules will be useful. We are interested only in "sound" inference rules:
Def 6 An inference rule r is sound if it yields $\phi'$ when applied on $\phi$, notationally $\phi \xrightarrow{r} \phi'$, such that $\phi \models \phi'$.

Def 7 An inference rule r is information preserving iff ($\phi \xrightarrow{r} \phi'$ implies $\phi' \models \phi$). Intuitively, no information carried by $\phi$ has been lost in the process of applying an information preserving rule.

Not all inference rules are information preserving. For example, our rule of local implication (Section 3.1.3) lets us conclude (u + $k_2$ ≤ v) from (u + $k_1$ ≤ v) whenever $k_2 \le k_1$. Clearly, this rule is not information preserving.

Def 8 An inference rule r is an enriching rule if it yields $\phi'$ when applied on $\phi$ such that

1. r is information preserving, and
2. for every $a\,R'\,b$ of $\phi'$, considering the predicates $a\,R\,b$ of $\phi$ on the same variables a and b, we have $a\,R'\,b \models a\,R\,b$, but not necessarily $a\,R\,b \models a\,R'\,b$.

Note that an enriching rule does not actually create new information, but rather makes whatever information was present more readily usable. All our inference rules, except the abovementioned rule of local implication, are enriching rules.

It will be convenient to describe the rules on a directed graph representation of the wffs. The representation of a wff is the collection of the representations of its disjuncts. Each disjunct $\phi$ is represented as a pair of graphs: a ptr graph representing the conjunction of ptr predicates in the wff, and a key graph representing the array predicates. There is a coupling between these two graphs, namely, the boundaries of the array segments of the key graph are defined by the pointers. The construction of the ptr graph $\pi$ of a disjunct $\phi$ is described below. A partitioned array (key) graph will be constructed later.

3.1.3.1 Graph Construction

The ptr graph $\pi$ will have a vertex for each ptr variable referred to in $\phi$. For each ptr predicate (u + $k_1$ ≤ v) in $\phi$, we put a directed edge from u to v and label it "$k_1$". Note that $k_1$ is a signed integer, and the relation is always ≤.
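The ptr graph lends itself to a compact sketch. Assuming each ptr predicate has already been normalized to the form u + k ≤ v, and measuring integer constants from a vertex named "0" (our assumption), the Python below builds the graph, keeping only the largest label between each ordered pair of vertices (since u + k1 ≤ v implies u + k2 ≤ v when k2 ≤ k1), embeds the axiom u + 0 ≤ u, and closes the graph with a Floyd-Warshall pass over (max, +), which computes for each pair the largest k derivable by chaining edges. This closure algorithm is our choice for illustration; the thesis's own transitive-closure and subsumption procedures may differ. A positive label on a self-loop signals an unsatisfiable premise, and a ptr conclusion can then be checked locally against the single closed edge from u to v. The example uses the ptr predicates 1 ≤ j ≤ i ≤ n and j < i, our reading of the disjunct of Figure 3.2.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, int, str]      # (u, k, v) encodes the ptr predicate u + k <= v
Graph = Dict[Tuple[str, str], int]


def build_ptr_graph(preds: List[Edge]) -> Graph:
    """One vertex per ptr; of several edges u -> v keep the largest label."""
    graph: Graph = {}
    for u, _, v in preds:
        graph[(u, u)] = 0                    # equality axiom: u + 0 <= u
        graph[(v, v)] = 0
    for u, k, v in preds:
        # u + k <= v subsumes u + k' <= v whenever k' <= k
        graph[(u, v)] = max(k, graph.get((u, v), k))
    return graph


def close(graph: Graph) -> Graph:
    """Floyd-Warshall over (max, +): for each pair, the largest k with u + k <= v derivable."""
    g = dict(graph)
    vertices = sorted({u for u, _ in g} | {v for _, v in g})
    for w in vertices:
        for u in vertices:
            for v in vertices:
                if (u, w) in g and (w, v) in g:
                    k = g[(u, w)] + g[(w, v)]
                    if (u, v) not in g or k > g[(u, v)]:
                        g[(u, v)] = k
    return g


def unsatisfiable(closed: Graph) -> bool:
    """A derived u + k <= u with k > 0 is a contradiction."""
    return any(k > 0 for (u, v), k in closed.items() if u == v)


def locally_implied(closed: Graph, u: str, k: int, v: str) -> bool:
    """u + k <= v follows from a satisfiable closed graph iff its u -> v label is >= k."""
    return (u, v) in closed and k <= closed[(u, v)]


# 1 <= j <= i <= n and j < i, with the hypothetical vertex "0" standing for the constant 0.
phi = [("0", 1, "j"), ("j", 0, "i"), ("i", 0, "n"), ("j", 1, "i")]
pi = close(build_ptr_graph(phi))
```

After closure the edge from "0" to n carries the label 2, so n ≥ 2 is implied locally, while n ≥ 3 is not; adding i ≤ j creates the positive cycle j + 1 ≤ j.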
The graph $\pi$ may have more than one edge from a vertex u to a vertex v. An example of a disjunct $\phi$ and its pointer graph appears in Figure 3.2. In constructing the ptr graph, the following equality axiom is embedded: for any ptr u, u + 0 ≤ u.

3.1.3.2 Subsumption for Ptrs

The graph, as constructed in the preceding section, may have ...

    φ = 1 ≤ j ≤ i ≤ n and array(1,j-1) ≤ x_j and array(1,i) ≤ sorted(i+1,n) and j < i
    (φ is the premise of the verification condition V1 of Figure 2.3.)
    The ptr graph π of φ is ...

... in terms of a common set of segments is obtained as the product of all individual partitions defined by the array segments occurring in either $\Omega$ or $\omega$. Each linear ordering of the boundaries used in $\Omega$ or $\omega$ defines a partition of the array. The partitioning procedure collects the relevant boundaries and produces all linear orderings of these boundaries in the context of the partial ordering specified by the ptr graph $\pi$ of $\Omega$.

The set B of boundaries is constructed as follows:

    Initially, B ← {-∞, +∞}
    for each array segment array(s,t) referred to either in Ω or in ω do
        B ← B ∪ {s-1, s, t, t+1}
    endfor

Consider a maximal chain of the following kind, in the context of the given ptr graph $\pi$ of $\Omega$:

$$C:\ -\infty = b_0,\ b_1,\ b_2,\ \ldots,\ b_{2q},\ b_{2q+1} = +\infty$$

where, for $1 \le i \le q$, $b_{2i}$ and $b_{2i+1}$ are boundaries in B, and $b_{2i} = 1 + b_{2i-1}$ and $b_{2i} \le b_{2i+1}$, as implied by the ptr graph $\pi$ of $\Omega$. If every boundary b of B either appears in C or is equal to a boundary appearing in C, then a composition (product) of the partitions is readily obtained:

    b_0 to b_1; b_2 to b_3; ...; b_{2q} to b_{2q+1}

However, if for some boundary pair b and 1 + b of B at least one of them is not in C, then we resort to case analysis. We find the largest j such that $b_j < b$. Clearly, it is not known whether $b \le b_{j+1}$ or $b > b_{j+1}$.
For otherwise, we would have either a larger j or a longer chain. The wff Ω is equivalent to the disjunction of Ω_1 and Ω_2, where

    Ω_1 ≡ Ω and (b ≤ b_{j+1})
    Ω_2 ≡ Ω and (b > b_{j+1})

Proving Ω ⊨ ω is equivalent to proving Ω_1 ⊨ ω and Ω_2 ⊨ ω, and in each of Ω_1 and Ω_2 we can produce longer chains of boundaries than in Ω. We apply the same procedure of obtaining maximal chains of boundaries to each of Ω_1 ⊨ ω and Ω_2 ⊨ ω. Clearly, this process terminates, since B is finite and every split lengthens a chain. Let Ω_1, Ω_2, Ω_3, …, Ω_c be the decompositions thus obtained; in each Ω_i, the boundaries can all be put into one chain (Figure 3.5). The following lemma immediately follows:

Lemma 1 Let Ω′ = Ω_1 or Ω_2 or … or Ω_c, where the Ω_i's are the result of decompositions of Ω made while ordering the boundaries. Then Ω ⊨⫤ Ω′.

Note that the disjuncts Ω_1, Ω_2, …, Ω_c differ only in the ptr predicates; they all have the same set of array predicates. The ptr graph of Ω defines a partial ordering on the set of boundaries B. From this partial ordering, say L, we obtain all linear orders L_1, L_2, …, L_c. Hence

    L ⊨⫤ L_1 or L_2 or … or L_c.

If Ω = S_π and S_α, where S_π and S_α are respectively the sets of ptr and array predicates of the disjunct Ω, then Ω_i = L_i and S_α.

Lemma 2 For i ≠ j, (Ω and Ω_i and Ω_j) is unsatisfiable.

Construction of Key Graph

We construct the key graph α of a disjunct in the context of a given partition of the array defined by a linear ordering L of boundaries. For each segment array(s,t), there are two vertices: minx(s,t), representing the minimum key, and maxx(s,t), representing the maximum key of [...]

    Let ω ≡ 1 ≤ i−1 ≤ n and array(1, i−1) ≤ sorted(i, n)
    and Ω ≡ 1 ≤ j ≤ i ≤ n and S_α and j ≥ i,
    where S_α ≡ array(1, j−1) ≤ array(j, j) and array(1, i) ≤ sorted(i+1, n).
    (The verification condition V2 in Figure 2.3 is Ω ⊨ ω.)
    The set of boundaries B = {−∞, +∞, 0, 1, i−1, i, n, n+1, j−1, j, j+1, i+1}.
    Maximal chain C induced by the ptr graph π: [figure not reproduced]

[...] ⊨ p, where p is a ptr or key predicate in a certain form, from a local property p′ ⊨ p, p′ being a particular predicate of φ.

Def 11 A disjunct φ* = π* and α* is an enriched disjunct with respect to a set B of boundaries if
1. the ptr graph π* of φ* defines exactly one linear ordering L of the boundaries of B,
2. the array predicates of α* have been expressed using the segments defined by the partition induced by the linear ordering L of the boundaries, and
3. both π* and α* are transitively closed.

Let φ_1* = π_1* and α_1* be an enriched disjunct. Let φ_2 = π_2 and α_2 be a disjunct such that the vertex set of π_2 is the same as that of π_1*, and the vertex set of α_2 is the same as that of α_1*.

Def 12 For each predicate (u R_2 v) of φ_2, the corresponding predicate in φ_1* [...]
2. If there is no edge from u to v in α_1*, then the corresponding predicate is (u null v), an empty predicate, which is defined to be true in all interpretations. (Intuitively, take null as "−∞ ≤".)

Consider a disjunct ω_c of the conclusion ω to be proven from an enriched disjunct Ω*. Let ω_c* be the rewritten version of ω_c using the partitioned array segments. Since ω_c* is equivalent to ω_c in this context, it follows that

    Ω* ⊨ ω_c iff Ω* ⊨ ω_c* iff Ω* ⊨ p for every predicate p of ω_c*.

Because Ω* is an enriched disjunct, we can make the following stronger statement:

$$\Omega^* \models \omega_c^* \iff \text{for every predicate } p \text{ of } \omega_c^*,\ \text{the corresponding predicate } P \text{ of } \Omega^* \text{ satisfies } P \models p \tag{3.2}$$

We shall refer to (3.2) as the rule of local implication. A proof of the validity of this rule is given in Section 3.2.2.

3.2 Basic Theorem Prover is a Decision Procedure

An informal, but complete, description of the basic theorem prover is contained in Algorithms 1, 2, 3 and 4 in the following pages.
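As an illustration of graph construction and of the rule of local implication (3.2), the Python sketch below builds a ptr graph, closes it under transitivity, and then decides a single ptr predicate by looking only at the one corresponding edge. It is a minimal sketch under stated assumptions, not the SORTLAB implementation. Predicates are encoded as hypothetical triples (u, k, v) meaning u + k ≤ v, and constants are measured from a vertex named "0". Parallel edges are merged by keeping the largest label, which is our reading of ptr subsumption. The closure is computed Floyd-Warshall style, keeping the largest label on each edge.

```python
NEG_INF = float("-inf")


def build_ptr_graph(predicates):
    """Ptr graph from triples (u, k, v), each read as u + k <= v."""
    graph = {}
    for u, k, v in predicates:
        graph.setdefault(v, {})
        edges = graph.setdefault(u, {})
        # Keep the strongest (largest) label among parallel edges u -> v.
        edges[v] = max(k, edges.get(v, k))
    for u in graph:
        # Equality axiom u + 0 <= u as a zero-labelled self-loop.
        graph[u][u] = max(0, graph[u].get(u, 0))
    return graph


def transitive_closure(graph):
    """label[u][v] = largest k such that u + k <= v follows by chaining edges."""
    vs = list(graph)
    label = {u: {v: graph[u].get(v, NEG_INF) for v in vs} for u in vs}
    for w in vs:
        for u in vs:
            for v in vs:
                # u + k1 <= w and w + k2 <= v give u + (k1 + k2) <= v.
                if label[u][w] + label[w][v] > label[u][v]:
                    label[u][v] = label[u][w] + label[w][v]
    return label


def locally_implies(closed, u, k2, v):
    """On a closed, satisfiable graph: implies u + k2 <= v iff the graph's own
    label k1 on u -> v satisfies k1 >= k2. A missing edge acts like the null
    predicate (label -inf)."""
    return closed.get(u, {}).get(v, NEG_INF) >= k2


# Ptr part of the premise of V1: 1 <= j <= i <= n and j < i.
pi = build_ptr_graph([("0", 1, "j"), ("j", 0, "i"), ("i", 0, "n"), ("j", 1, "i")])
closed = transitive_closure(pi)
print(locally_implies(closed, "0", 2, "n"))  # True: n >= 2 follows
print(locally_implies(closed, "0", 3, "n"))  # False: n >= 3 does not follow
```

The cubic closure is the simplest correct choice for a sketch. Once it is computed, each conclusion predicate costs a single lookup. This is the efficiency argument Section 4.3 makes for local implication.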
This section proves that the theorem prover is a decision procedure for Ω ⊨ ω, where both Ω and ω are wffs. The theorem prover will be extended in Section 3.3 to prove Ω ⊨ ω where either or both of Ω and ω can be augmented wffs.

That the basic theorem prover terminates follows immediately by considering the "length" of the conclusion ω: in Algorithms 3 and 4, we delete either a whole disjunct or a predicate of a disjunct from ω in every iteration. In the present section, we shall be occupied with the proof that the basic theorem prover gives correct answers, that is,

    when the basic theorem prover terminates, the boolean variable istheorem is true iff it is indeed the case that the premise Ω logically implies the conclusion ω.   (3.3)

The core of the proof is that the rule of local implication is valid. The structure of the proof closely follows the structure of the algorithms. We show (1) that the satisfiability of a graph is decidable and that it is obtained as a by-product of transitive closure, and (2) that Ω* ⊨ p iff p follows from Ω* by local implication.

3.2.1 Model Construction

Given an enriched disjunct φ* = π* and α*, we want to construct a model for it by assigning values to ptrs and keys. Without loss of generality, we can assume that the vertex 0 is present in the ptr graph π*; if it is not, we introduce it by anding π* with 0 ≤ 0. This vertex is assigned the value zero. A model for π* is constructed first, and then a model for α* is constructed similarly.

    procedure basic theorem prover (Ω : wff ⊨ ω : wff)
        istheorem ← true
        if ω is empty then ω ← false endif
        for each disjunct Ω_1 of Ω and while istheorem do
            provetheorem (Ω_1 ⊨ ω)
        endfor-while
    endproc
    Algorithm 1

    procedure provetheorem (Ω_1 : disjunct ⊨ ω : wff)
        construct the ptr graph π of Ω_1; apply subsumption
        if π is satisfiable then   {see Algorithm 5 of Section 3.3.1}
            collect the boundaries referred to in Ω_1 and ω into a set B
            for each linear ordering L, according to π, of boundaries of B and while istheorem do
                construct the key graph α of Ω_1 for the partition defined by L
                apply subsumption on α
                istheorem ← provelemma (π and L and α ⊨ ω)
            endfor-while
        endif
    endproc
    Algorithm 2

    function provelemma (Ω_11 : disjunct ⊨ reference ω : wff) returns value of local islemma : boolean
        π* ← transitive closure of ptr graph of Ω_11
        α* ← transitive closure of key graph of Ω_11
        islemma ← (π* and α* is unsatisfiable)
        if not (ω is empty or islemma) then
            choose a disjunct ω_1 of ω; delete ω_1 from ω
            ω_1* ← equivalent of ω_1 expressed using the partition defined by π*
            islemma ← does (π* and α*) imply (ω_1* or ω)?
        endif
    endfunction
    Algorithm 3

    function does (Ω_11* : enriched disjunct) imply (ω_1* : partitioned disjunct or reference ω : wff)?
            returns value of local implies : boolean
        repeat
            choose a predicate p of ω_1*; delete p from ω_1*
        until ω_1* is empty or Ω_11* ⊭ p
        if Ω_11* ⊨ p   {local implication}
            then implies ← true
            else implies ← provelemma (Ω_11* and p ⊨ ω_1* or ω) and provelemma (Ω_11* and not p ⊨ ω)
                 {see Theorem 4 of Section 3.2.2}
        endif
    endfunction
    Algorithm 4

Model for π*

The model for π* is constructed iteratively. Let assigned be the subset of vertices of π* to which values have already been assigned, such that for u_1, u_2 ∈ assigned, each predicate (u_1 + k_1 ≤ u_2) of π* is satisfied, i.e., valueof(u_1) + k_1 ≤ valueof(u_2). This model is then extended by choosing an arbitrary vertex v of π* which is not in assigned. If no such vertex exists, then the construction of a model for π* is finished, and π* is satisfiable. Consider the set S of predicates of π* to be satisfied by v:

$$S = \{(u_i + k_i \le v) \mid u_i \in \mathit{assigned}\} \cup \{(v + k_j \le u_j) \mid u_j \in \mathit{assigned}\}$$

The value assigned to v should satisfy all the predicates in S, so that assigned ← assigned ∪ {v}, provided the label on the self-loop at v is zero. Let V_max = max of val_i and V_min = min of val_j, where

$$\mathit{val}_i = \{\mathit{valueof}(u_i) + k_i \mid (u_i + k_i \le v) \in S\} \cup \{-\infty\}, \qquad \mathit{val}_j = \{\mathit{valueof}(u_j) - k_j \mid (v + k_j \le u_j) \in S\} \cup \{+\infty\}$$

If ptr v is assigned a value V such that

$$V_{\max} \le V \le V_{\min} \tag{3.4}$$

then all the predicates in S are satisfied. However, if the self-loop at v, (v + k ≤ v), has a label k > 0, this predicate is not satisfiable in any interpretation, and model construction cannot proceed further.

Now, assuming that the self-loop at v is labeled with zero, we show that a value V satisfying inequality (3.4) must exist. (If either bound is infinite, this is immediate.) Let V_max be valueof(u_1) + k_1 and V_min be valueof(u_2) − k_2 (without loss of generality on the subscripts of the u's). Thus (u_1 + k_1 ≤ v) ∈ S and (v + k_2 ≤ u_2) ∈ S. Since π* is transitively closed, (u_1 + k ≤ u_2) ∈ π* for some k ≥ k_1 + k_2, and since u_1 and u_2 are in assigned, our hypothesis on the set assigned gives

    valueof(u_1) + k ≤ valueof(u_2),

and therefore valueof(u_1) + k_1 ≤ valueof(u_1) + k − k_2 ≤ valueof(u_2) − k_2, that is, V_max ≤ V_min. The vertex v can, therefore, be assigned any finite value in the range [V_max, V_min].

Model for α*

The boundaries of array segments are defined by ptrs, and hence a model for α* can be constructed only after a model for the ptrs is given. If array(s,t) is a segment used in α*, we have, in general,

    minx(s,t) ≤ maxx(s,t).

If s = t, this becomes an equality. We remark that the labels on self-loops of α* cannot be negative. A positively labeled self-loop is clearly unsatisfiable. Thus, these labels can only be either sorted or zero. For each pair of vertices minx(s,t), maxx(s,t) of α*, we will assign a single value. This assignment clearly satisfies all self-loops and sorted predicates.
Once this decision is made, the model construction for α* is identical to that for π*.

3.2.2 Unsatisfiability and Local Implication Theorem

Theorem 1 (Unsatisfiability Theorem) The ptr graph π is unsatisfiable iff the transitive closure π* of π has a self-loop whose edge label is positive.

Proof We know from Remarks 1 and 2 that π ⊨⫤ π*. Thus, π is unsatisfiable iff π* is.
(⇐) If π* has a self-loop at v with a positive label k, then clearly (v + k ≤ v) is unsatisfiable in any interpretation.
(⇒) It is well known that the transitive closure G* of a graph G has a self-loop at a vertex v iff G has a directed cycle (not necessarily of length 1) passing through v. The rule of transitivity is such that the label on the self-loop at v in π* is not less than the sum of the edge labels of any directed cycle of π passing through v; conversely, every label in π* is the sum of the labels along some path of π. Thus, if π has no directed cycle with positive edge-label sum passing through v, then π* does not have a self-loop at v with a positive label. For such a π*, we can indeed construct a model (see Section 3.2.1), and hence π is satisfiable. ∎

Corollary to Theorem 1 (Unsatisfiability of Key Graphs) The key graph α, in the context of an enriched ptr disjunct, is unsatisfiable iff the transitive closure α* of α has a self-loop whose edge label is positive.

Theorem 2 (Local Implication Theorem) Let π* be a transitively closed ptr graph. Then π* ⊨ (u + k_2 ≤ v) iff the corresponding predicate (u + k_1 ≤ v) in π* is such that k_1 ≥ k_2.

Proof (⇐) Obvious. (⇒) We prove that if k_1 < k_2 then π* ⊭ (u + k_2 ≤ v). Assign u an arbitrary value, and then assign v the value valueof(u) + k_1. Set assigned ← {u, v}. Now we can complete the construction of the model as in Section 3.2.1. Clearly, (u + k_2 ≤ v) is false in this model. ∎

Corollary to Theorem 2 Theorem 2 holds for a transitively closed key graph α* and an array predicate (u + k_2 ≤ v).
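The model construction of Section 3.2.1, together with the satisfiability test of Theorem 1, can be sketched in Python as follows. This is an illustrative sketch, not the SORTLAB code. The graph encoding (a dict mapping every vertex to a dict of successor labels, each edge meaning u + k ≤ v) and the function names are our own. The origin vertex is assumed to be named "0". Each vertex receives V_max when that bound is finite, which is one admissible choice within the range allowed by (3.4).

```python
NEG_INF, POS_INF = float("-inf"), float("inf")


def transitive_closure(graph):
    """label[u][v] = largest k such that u + k <= v follows from the edges."""
    vs = list(graph)
    label = {u: {v: graph[u].get(v, NEG_INF) for v in vs} for u in vs}
    for w in vs:
        for u in vs:
            for v in vs:
                if label[u][w] + label[w][v] > label[u][v]:
                    label[u][v] = label[u][w] + label[w][v]
    return label


def construct_model(graph, origin="0"):
    """Integer values satisfying every u + k <= v, or None if unsatisfiable."""
    closed = transitive_closure(graph)
    # Theorem 1: unsatisfiable iff some closed self-loop has a positive label.
    if any(closed[v][v] > 0 for v in closed):
        return None
    value = {}
    # Visit the origin first so that it receives the value zero.
    for v in sorted(closed, key=lambda x: x != origin):
        v_max = max((value[u] + closed[u][v] for u in value), default=NEG_INF)
        v_min = min((value[u] - closed[v][u] for u in value), default=POS_INF)
        assert v_max <= v_min  # (3.4) is solvable because the graph is closed
        if v_max != NEG_INF:
            value[v] = v_max
        elif v_min != POS_INF:
            value[v] = v_min
        else:
            value[v] = 0
    return value


# 1 <= j < i <= n, with every vertex listed as a key.
g = {"0": {"j": 1}, "j": {"i": 1}, "i": {"n": 0}, "n": {}}
model = construct_model(g)
print(model)  # {'0': 0, 'j': 1, 'i': 2, 'n': 2}
```

A graph such as a + 1 ≤ b and b ≤ a produces a positive self-loop in its closure, so `construct_model` returns None for it. This matches the unsatisfiability criterion of Theorem 1.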
Theorem 3 Let Ω* be an enriched disjunct, and ω* a disjunct partitioned with respect to the linear ordering of boundaries defined by Ω*. Then Ω* ⊨ ω* iff either Ω* locally implies p for every predicate p of ω*, or Ω* is unsatisfiable.

Proof By Theorem 1 and repeated application of Theorem 2. ∎

When an enriched disjunct π* and α* does not imply a predicate p of ω_1*, we consider two cases (refer to the if-statement of Algorithm 4):

$$\pi^* \text{ and } \alpha^* \text{ and not } p \models \omega \tag{3.5}$$
$$\pi^* \text{ and } \alpha^* \text{ and } p \models (\omega_1^* \text{ or } \omega) \tag{3.6}$$

If p is sorted(s,t), not p cannot be represented in our scheme (Section 3.1.3.4). Hence a proof or disproof of (3.5) cannot be obtained in this deductive system. The following theorem avoids this problem by showing that, if p is a sorted-predicate not implied by α*, then proving (3.5) is equivalent to proving

    (π* and s < t) and α* and minx(s,t) < maxx(s,t) ⊨ ω.

Theorem 4 Let φ* be an enriched disjunct and array(s,t) be a segment in the partition defined by φ*. Further assume that φ* ⊭ sorted(s,t). Then

    φ* and not sorted(s,t) ⊨ ψ   iff   φ′ ⊨ ψ,

where ψ is a partitioned wff in the context of φ*, and

    φ′ ≡ φ* and s < t and minx(s,t) < maxx(s,t).

Proof If φ′ ⊨ ψ, then φ* and not sorted(s,t) ⊨ ψ is obvious, since not sorted(s,t) implies s < t and minx(s,t) < maxx(s,t).

Suppose φ* and not sorted(s,t) ⊨ ψ. If φ* is unsatisfiable, or if array(s,t) is empty, the theorem is trivially true. So let φ* and not sorted(s,t) be satisfiable. Consider any model M for φ′. If sorted(s,t) is false in M, then M ⊨ ψ. Otherwise, since s < t and minx(s,t) < maxx(s,t), some permutation of the elements of array(s,t) yields a model M′ of φ* and not sorted(s,t), so M′ ⊨ ψ. Any permutation of the elements of array(s,t) conserves minx(s,t) and maxx(s,t). Thus, when the elements of array(s,t) of M′ are permuted back to give M, all predicates of ψ, with the possible exception of (array(s,t) R array(s,t))-type predicates, must still be true. Since ψ is a wff (in our system), there are only three possibilities for (array(s,t) R array(s,t)):
1. array(s,t) < array(s,t)
2. array(s,t) ≤ array(s,t)
3. array(s,t) sorted array(s,t)

The first one is unsatisfiable in every interpretation. The second one will still be true after permutation. The third one was false in M′, and it may be true in M. But if ψ had this sorted(s,t) predicate, then φ* and not sorted(s,t), being a satisfiable disjunct, could not imply ψ. Thus, if M′ ⊨ φ* and not sorted(s,t), then ψ is also true in the model M obtained by permuting the elements of array(s,t). That is, ψ is true in any model for φ′. ∎

3.2.3 Basic Theorem Prover is Correct

The structure of the proof of Ω ⊨ ω, as constructed by this theorem prover, is shown in Figure 3.9. Theorem 3 yields the proof at the lowest level in Figure 3.9; the remaining proofs are obtained by appropriate recursive or iterative calls (indicated by dotted lines; see Algorithms 1, 2, 3 and 4). We omit further details of the correctness proof of the basic theorem prover.

    Let Ω = Ω_1 or Ω_2, where Ω_1 is a disjunct and Ω_2 is a wff, possibly false.
    Ω ⊨ ω  ←  Ω_1 ⊨ ω and Ω_2 ⊨ ω
      for each linear ordering of boundaries of Ω_1 and ω, prove Ω_1i ⊨ ω
        let ω = ω_1 or ω_2 (ω_2 may be empty)
          Ω_1i ⊨ p, for p of ω_1, and Ω_1i ⊨ ω_1′ (ω_1′ is ω_1 with p deleted)
          or: Ω_1i and not p ⊨ ω_2, and Ω_1i and p ⊨ ω_1′ or ω_2
    Figure 3.9 Structure of the Proof of Ω ⊨ ω

3.3 Evaluation of Backward Functions

Recall that the conclusion ω of the lemmas Ω ⊨ ω to be proven was an augmented wff, possibly involving the functions subst, exchb and nsrtb because of backward substitution (see Chapter 2). Similarly, Ω was an augmented wff possibly involving the functions subst and unmodifiedpartsof.
To be able to use the basic theorem prover presented in Section 3.1, we transform these augmented wffs by evaluating the functions to produce simple wffs not involving any of these functions. Strictly speaking, the evaluation of functions like exchb, nsrtb, etc., cannot be considered part of theorem proving. However, we include it here because it plays an important role in our theorem proving, and because this evaluation is done in the midst of the theorem-proving effort.

Given the lemma Ω ⊨ ω to be proven, the subst functions, if any, of Ω and ω are evaluated first. Let us call the resulting augmented wffs Ω′ and ω′. The premise Ω′ can be considered to be Ω_1 or Ω_2, where Ω_1 is a disjunct and Ω_2 is an augmented wff, possibly the wff false. We then prove that Ω_1 ⊨ ω′ and that Ω_2 ⊨ ω′. In Section 3.3.2, we describe the proof of Ω_1 ⊨ ω′. (This procedure is used repeatedly for each disjunct of Ω′.) The boundaries of Ω_1 and ω′ are collected into a set B. Each linear ordering of boundaries defines a partition of the array. The boundaries collected are such that this partition has single-element array segments array(s,s) and array(t,t) for each exchange x_s with x_t statement (similarly for insert statements). It is then a simple matter to evaluate the exchb and nsrtb functions. The resulting ω′ is a simple wff.

We now describe these two passes of evaluation in greater detail.

3.3.1 First Pass of Evaluation

The evaluation of the subst functions is the simplest, and constitutes the first pass of our evaluation. Clearly, before subst t for u in ψ can be evaluated, all subst functions inside the augmented wff ψ must be evaluated. Assuming that ψ is free of subst functions, the ptr expression t is substituted for every occurrence of the ptr variable u in the augmented wff ψ, which may contain only exchb and nsrtb functions.

Remark 7: Let S be a u ← t statement.
Then the entry assertion φ_E = subst t for u in ψ_S, generated for an exit assertion ψ_S, is such that if ⊨_I φ_E then ⊨_I′ ψ_S, and if ⊭_I φ_E then ⊭_I′ ψ_S, where I′ is the result of executing S on an interpretation I.

The boundaries referred to in the conclusion ω and in the current disjunct Ω_1 of the premise are collected by Algorithm 5. As can easily be seen, the boundaries included in B guarantee that the partition produced contains the segments needed in the evaluation of exchb, nsrtb and unmodifiedpartsof in the second pass.

    B ← {−∞, +∞}
    for each array segment array(s,t) referred to either in Ω_1 or in ω do
        B ← B ∪ {s−1, s, t, t+1}
    endfor
    for each exchb x_s with x_t in ψ occurring in ω do
        B ← B ∪ {s−1, s, s+1, t−1, t, t+1}
    endfor
    for each nsrtb x_s below x_t in ψ occurring in ω do
        B ← B ∪ {s−1, s, s+1, t−2, t−1, t, t+1}
    endfor
    for each unmodifiedpartsof α wrt (s,t) occurring in Ω_1 do
        B ← B ∪ {s−1, s, t, t+1}
    endfor
    Algorithm 5: Collecting Boundaries

3.3.2 Second Pass of the Evaluation

The second pass is made for each linear ordering L of the boundaries collected as above. None of the functions exchb, nsrtb and unmodifiedpartsof changes the ptr expressions. While the first pass has an effect only on ptr expressions, the second pass has its effect only on the array segments, which depend on the context L. Again, the evaluation of exchb and nsrtb proceeds from the inside out.

exchb

Assuming that ψ is a wff, that is, that ψ is free of exchb and nsrtb functions, exchb x_s with x_t in ψ is evaluated in the context of the partition defined by the current linear ordering of boundaries. The wff ψ is expressed as ψ* using the partitioned array segments. Note that the partition produced will have single-element array segments A = array(s,s) and B = array(t,t) (see the second for-loop of Algorithm 5). The exchb is evaluated by substituting B for A, and vice versa, in every array predicate of ψ*.
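Algorithm 5 can be sketched in Python as follows. The sketch encodes ptr expressions as hypothetical (variable, offset) pairs, with variable None for integer constants and the strings "-inf" and "+inf" as the two sentinels. The function and parameter names are ours, not the thesis's. Run on the six array segments of verification condition V2 (Figure 3.5), it reproduces the twelve boundaries listed there.

```python
MINUS_INF, PLUS_INF = ("-inf", 0), ("+inf", 0)


def shift(p, d):
    """The ptr expression p + d, for p = (variable, offset)."""
    var, off = p
    return (var, off + d)


def collect_boundaries(segments, exchbs=(), nsrtbs=(), unmodified=()):
    """Boundary set B of Algorithm 5; each argument is a list of (s, t) pairs."""
    B = {MINUS_INF, PLUS_INF}
    # Plain array segments and unmodifiedpartsof ranges: s-1, s, t, t+1.
    for s, t in list(segments) + list(unmodified):
        B |= {shift(s, -1), s, t, shift(t, 1)}
    # exchb x_s with x_t: isolate array(s,s) and array(t,t).
    for s, t in exchbs:
        B |= {shift(s, d) for d in (-1, 0, 1)} | {shift(t, d) for d in (-1, 0, 1)}
    # nsrtb x_s below x_t: isolate array(s,s), array(t-1,t-1), array(t,t).
    for s, t in nsrtbs:
        B |= {shift(s, d) for d in (-1, 0, 1)} | {shift(t, d) for d in (-2, -1, 0, 1)}
    return B


one, i, j, n = (None, 1), ("i", 0), ("j", 0), ("n", 0)
v2_segments = [
    (one, shift(i, -1)), (i, n),          # array(1,i-1), sorted(i,n) of omega
    (one, shift(j, -1)), (j, j),          # array(1,j-1), array(j,j) of Omega
    (one, i), (shift(i, 1), n),           # array(1,i), sorted(i+1,n) of Omega
]
B = collect_boundaries(v2_segments)
print(len(B))  # 12
```

A set is used because duplicate boundaries (such as i, which arises from several segments) must collapse. Boundaries that are merely equal under the ptr graph, such as j and i when j = i, stay distinct here. Identifying them is the job of the linear orderings, not of this collection step.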
nsrtb

Again assuming that ψ is free of exchb and nsrtb functions, nsrtb x_s below x_t in ψ is evaluated in the context of the present partition. The wff ψ is expressed as ψ* using the partitioned array segments. Note that since s−1, s, s+1, t−2, t−1, t and t+1 are included in the set of boundaries, the partition produced will have single-element segments array(s,s), array(t−1,t−1) and array(t,t). The boundaries are already ordered, and we consider the two cases s < t and s > t.

Suppose s < t. Then the following transformations are made on the array segments A = array(u,v) of the predicates of ψ*:
1. If A is a subsegment of array(s, t−2), then A is redefined as array(u+1, v+1).
2. If A is the same segment as array(t−1, t−1), then A is redefined as array(s,s).
3. Otherwise, the definition of A is unchanged.

Now suppose s > t. Then the following transformations are made on the array segment A = array(u,v):
1. If A is a subsegment of array(t+1, s), then A is redefined as array(u−1, v−1).
2. If A is the same segment as array(t,t), then A is redefined as array(s,s).
3. Otherwise, the definition of A is unchanged.

Remark 8: Let S be either an exchange x_s with x_t or an insert x_s below x_t statement, and let φ_E be the corresponding exchb x_s with x_t in ψ_S or nsrtb x_s below x_t in ψ_S, where ψ_S is the exit assertion of S. Then ψ_S is true in M′, the result of executing S on M, iff φ_E is true in M, where M is a model for the context. More formally, if L is the present linear ordering of boundaries and ⊨_M L, then if ⊨_M φ_E then ⊨_M′ ψ_S, and if ⊭_M φ_E then ⊭_M′ ψ_S.

Lemma 3 [...] Let φ_E be the entry assertion of S obtained as above in the context of L. Then if ⊨_M φ_E then ⊨_M′ ψ, and if ⊭_M φ_E then ⊭_M′ ψ, where M is a model for the linear ordering L of boundaries and M′ is the result of executing S on M.

Proof By repeated application of Remarks 7 and 8. ∎
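The segment rewriting rules for exchb and nsrtb can be checked on concrete arrays. The Python sketch below is a hypothetical illustration. It fixes s and t to integers, so the linear ordering of boundaries is known. It then maps each post-state segment to the pre-state segment holding the same keys, and compares the result against a direct simulation. The concrete meaning assumed for insert x_s below x_t is the one implied by the two lists of transformations above: x_s moves to the slot just below the original x_t, and the elements in between shift by one. Indices are 0-based Python positions, which does not affect the index arithmetic.

```python
def exchb_segment(seg, s, t):
    """Pre-state segment feeding post-state `seg` after exchange x_s with x_t."""
    if seg == (s, s):
        return (t, t)
    if seg == (t, t):
        return (s, s)
    return seg


def nsrtb_segment(seg, s, t):
    """Pre-state segment feeding post-state `seg` after insert x_s below x_t."""
    u, v = seg
    if s < t:
        if s <= u and v <= t - 2:
            return (u + 1, v + 1)
        if seg == (t - 1, t - 1):
            return (s, s)
    elif s > t:
        if t + 1 <= u and v <= s:
            return (u - 1, v - 1)
        if seg == (t, t):
            return (s, s)
    return seg  # s == t is not covered by the text; left unchanged here


def exchange(x, s, t):
    y = list(x)
    y[s], y[t] = y[t], y[s]
    return y


def insert_below(x, s, t):
    """Assumed semantics: x_s ends up immediately below the original x_t."""
    y = list(x)
    item = y.pop(s)
    y.insert(t - 1 if s < t else t, item)
    return y


x = [10 + k for k in range(10)]
y = insert_below(x, 2, 6)
u, v = nsrtb_segment((2, 4), 2, 6)   # post-state array(2,4) ...
print(y[2:5] == x[u:v + 1])          # ... holds pre-state array(3,5): True
```

The mapping is applied to whole segments only because the partition built from Algorithm 5 guarantees that no segment straddles s, t−1 or t. The mapping would be wrong for a segment that did.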
unmodifiedpartsof

The description of the evaluation of unmodifiedpartsof α wrt (s,t) is somewhat complicated because of the details needed. Intuitively, since the procedure called can permute the elements of array(s,t), all predicates of α which depend on strict subsegments of array(s,t) are deleted from α. In addition, the predicate sorted(s,t), if present in α, is deleted from α. The complication arises from the possibility that the current linear ordering of boundaries partitions the segment array(s,t) into smaller segments. In such a case, it is necessary to temporarily "join together" contiguous segments to see whether the entire segment array(s,t) is related to other segments of array(−∞, s−1) or array(t+1, +∞).

Let array(s,t) = S_1; S_2; …; S_p, that is, the S_i's constitute the p subsegments of array(s,t) from the boundary s to t. Then the evaluation is done as described in Algorithm 6.

    φ ← the wff false
    for each disjunct α_1 of α do
        π ← ptr graph of α_1
        for each linear ordering L induced by π do
            φ_1 ← L
            let α_1* be the disjunct α_1 expressed using partitioned segments
            for each array predicate (A R B) of α_1* do
                cases
                    neither A nor B is an S_i: [...] is unchanged
                    [...]
            endfor
            [...]
        endfor
    endfor
    {φ ≡ unmodifiedpartsof α wrt (s,t)}
    Algorithm 6

This completes the description of the evaluation of functions. In the next section, we present the theorem prover with the extensions required by the evaluation, and give arguments establishing that the extended theorem prover is a decision procedure.

3.4 Extended Theorem Prover is a Decision Procedure

Let us briefly review the backward substitution method (Section 2.2.2) of generating the verification conditions. To prove {φ | P | ψ}, the program P is decomposed into straightline program segments S, and we then prove {φ | S | ψ} for each segment, where these φ and ψ are generated from the given entry and exit assertions and the loop invariants.
Each {φ | S | ψ} is proven by proving the generated lemma φ ⊨ φ_B, where φ_B is the entry assertion for S generated from ψ by backward substitution. We recall that the backward substitution of [King 1969] is such that

$$\text{if } \models_I \phi_B \text{ then } \models_{I'} \psi, \quad\text{and}\quad \text{if } \not\models_I \phi_B \text{ then } \not\models_{I'} \psi \tag{3.7}$$

where I is any interpretation, and I′ is the result of executing the program segment S on I. Note that {φ_B | S | ψ} is a milder statement than (3.7). It follows immediately that, for any entry assertion φ,

$$\{\phi \mid S \mid \psi\} \iff \phi \models \phi_B \tag{3.8}$$

In general, φ_B has many disjuncts which are unsatisfiable in every model of φ, making it unnecessary to consider them. The diligent reader may have noticed that the contextual backward function evaluation of the previous section may generate entry assertions φ_E that are essentially φ_B with some disjuncts deleted. To illustrate this dramatically, consider the one-line program segment S = exchange x_i with x_j, the exit assertion ψ_S being sorted(1,n). The φ_B generated by true backward substitution is the equivalent of that shown in Figure 3.10a; this φ_B does have the property (3.7). (For readability we have not written φ_B in disjunctive normal form.) However, the backward function evaluation in the context of 1 ≤ i < j ≤ n yields the φ_E of Figure 3.10b, which is much simpler than φ_B, whose generation does not depend on the context of the given entry assertion.

    S = exchange x_i with x_j
    ψ_S = sorted(1,n)

    (a) φ_B, with context 'true':
    φ_B ≡    1 ≤ i ≤ j ≤ n and sorted(1,i−1) ≤ x_j ≤ sorted(i+1,j−1) ≤ x_i ≤ sorted(j+1,n)
          or 1 ≤ j < i ≤ n and sorted(1,j−1) ≤ x_i ≤ sorted(j+1,i−1) ≤ x_j ≤ sorted(i+1,n)
          or 1 ≤ i ≤ n < j and sorted(1,i−1) ≤ x_j ≤ sorted(i+1,n)
          or j < 1 ≤ i ≤ n and sorted(1,i−1) ≤ x_j ≤ sorted(i+1,n)
          or 1 ≤ j ≤ n < i and sorted(1,j−1) ≤ x_i ≤ sorted(j+1,n)
          or i < 1 ≤ j ≤ n and sorted(1,j−1) ≤ x_i ≤ sorted(j+1,n)
          or sorted(1,n) and (i < 1 or i > n) and (j < 1 or j > n)

    (b) φ_E, in the context of 1 ≤ i < j ≤ n:
    φ_E ≡ sorted(1,i−1) ≤ x_j ≤ sorted(i+1,j−1) ≤ x_i ≤ sorted(j+1,n)

    Figure 3.10 Contextual and Context-Free Backward Substitutions (Simplified)

Theorem 5 (Validity of contextual backward function evaluation) For a straightline program segment S, {φ | S | ψ} iff φ* ⊨ φ_E.

Proof Without loss of generality, we assume that the given entry assertion is φ*, the enriched version of φ. Thus, we wish to prove

$$\{\phi^* \mid S \mid \psi\} \iff \phi^* \models \phi_E \tag{3.9}$$

We actually prove a slightly stronger version than (3.9), namely,

    {φ* | S | ψ} iff for every disjunct φ_i* of φ*, φ_i* ⊨ φ_Ei,

where φ_Ei is the entry assertion of S obtained by evaluating the backward functions in the context of L_i, the linear ordering of boundaries defined by the (enriched) disjunct φ_i*. That {φ_1* | S | ψ} iff φ_1* ⊨ φ_E1 (taking i = 1) is shown by proving

    φ_B and L_1 ⊨⫤ φ_E1 and L_1.

Thus, φ_B is φ_B1 or φ_B2 or … .

(φ_B and L_1 ⊨ φ_E1 and L_1): Suppose φ_B and L_1 ⊭ φ_E1. Let M be a model for φ_B and L_1 such that ⊭_M φ_E1. By Lemma 3, ⊭_M′ ψ, where M′ is the result of executing S on M. But since M is a model for φ_B, property (3.7) gives ⊨_M′ ψ, a contradiction.

(φ_E1 and L_1 ⊨ φ_B and L_1): Suppose φ_E1 and L_1 is true in M, and ⊭_M φ_B. By property (3.7), ⊭_M′ ψ. But by Lemma 3, if ⊨_M φ_E1 and L_1 then ⊨_M′ ψ, a contradiction. ∎

The advantage of generating φ_E's rather than φ_B should be obvious. The assertion φ_B will have as many disjuncts as there are linear orderings of the boundaries collected from S and ψ. Several of these linear orderings are of no concern to us, since we only need to prove that execution of S on an input satisfying φ results in ψ. We do not care what S does on a linear ordering of boundaries contradicting the partial order specified by φ.

Since contextual backward function evaluation is valid, the extended theorem prover is a decision procedure for Ω ⊨ ω, where ω and Ω are augmented wffs.
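The contrast drawn above between φ_B and φ_E comes down to how many linear orderings of the boundaries survive a context. The Python sketch below enumerates the total orders of a set of boundaries that are consistent with a given set of strict precedences, using a standard topological-sort enumeration. It is illustrative only and simplifies the setting above. Boundaries are assumed pairwise distinct, whereas the orderings considered by the theorem prover may also equate boundaries. The boundary names and the strict context 1 < i < j < n are hypothetical stand-ins.

```python
def linear_orderings(elements, precedes):
    """Yield each total order of `elements` consistent with `precedes`,
    a set of pairs (a, b) meaning a < b (need not be transitively closed)."""
    def extend(prefix, rest):
        if not rest:
            yield list(prefix)
            return
        for x in rest:
            # x may come next only if no remaining element must precede it.
            if any((y, x) in precedes for y in rest if y != x):
                continue
            prefix.append(x)
            yield from extend(prefix, [y for y in rest if y != x])
            prefix.pop()

    yield from extend([], list(elements))


bounds = ["1", "i", "j", "n"]
free = list(linear_orderings(bounds, set()))
context = {("1", "i"), ("i", "j"), ("j", "n")}   # 1 < i < j < n
in_context = list(linear_orderings(bounds, context))
print(len(free), in_context)  # 24 [['1', 'i', 'j', 'n']]
```

Every ordering yielded is distinct, so the cases are mutually exclusive in the spirit of Lemma 2. With no context all 24 orders must be considered; with the context only one remains. This is the saving that evaluating φ_E in context exploits.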
3.5 Counterexample Generation

Whenever the theorem prover determines that Ω ⊭ ω, it is possible, in this system, to construct a model M for Ω such that ω is false in M. However, it should be realized that M may not be a "counterexample to the program." This is because, even though {φ | P | ψ} holds, the loop invariants given may not be strong enough to prove all the lemmas generated. Counterexamples will, hopefully, provide clues for strengthening the loop invariants.

Suppose Ω ⊭ ω. Then there must exist (see Algorithm 2: provetheorem) a linear ordering L, and a ptr graph π and key graph α of a disjunct Ω_1 of Ω, such that π and L and α ⊭ ω. Let ω* be the partitioned version of ω in the context of L:

    ω and L ⊨⫤ ω* and L
    ω* = ω_1* or ω_2* or … or ω_c*

The last call (from either Algorithm 2 or 4) of Algorithm 3: provelemma gives a satisfiable disjunct Ω_11* and ω = ω_c* such that

    Ω_11* ⊭ ω_c*, and for 1 ≤ i < c, Ω_11* and ω_i* is unsatisfiable.

Thus, a model M for Ω_11* such that ⊭_M ω_c* is a counterexample to (π and L and α ⊨ ω), and hence to Ω ⊨ ω. Since Ω_11* ⊭ ω_c*, there must be a predicate p in ω_c* such that Ω_11* ⊭ p (see Algorithm 4). The disjunct Ω_11* and not p is satisfiable, and a model can be constructed as in Section 3.2.1 for the transitive closures of Ω_11* and not p.

4. GENERALITY

In the last two chapters, we have seen the successful application of inference rules about partitioning, closure and local implication in the verification of programs written and asserted in our languages. Though these vital inference rules were developed here as the result of severe constraints imposed primarily by the assertion language, they apply to a wider class of programs manipulating data structures. We now give several examples to support this contention.

4.1 Constraints of the Present Verification System
The verification system was designed with the specific goal of being usable in SORTLAB to verify the correctness of student programs for sorting an array. Severe constraints were imposed on the programming and assertion languages, both to limit the class of programs to sorting-type problems and to obtain a system that is usable in a practical situation. Not all these constraints are technically necessary for making the theorem prover a decision procedure, though they have pedagogical value.

For example, the verifier can be enhanced quite easily to permit many arrays, temporary variables, ptr expressions like j + 8, and predicates like array(s,t) − 3 ≤ array(u,v), which means

$$\mathrm{array}(s,t) - 3 \le \mathrm{array}(u,v) \;\equiv\; \forall i\,\forall j\,\big(s \le i \le t \wedge u \le j \le v \rightarrow x_i - 3 \le x_j\big)$$

However, if arbitrary assignments to array elements are allowed, it is not clear how the verifier can be extended to prove the key-preserving property of sorting algorithms.

It is not possible to characterize the class of programs provable in this system except as those programs that can be written in our programming language and for which sufficiently strong assertions can be made in our assertion language. Theoretically speaking, all computable functions are programmable in the programming language. However, for most computable functions, our assertion language cannot express assertions strong enough to prove that the corresponding program computes the function. Thus, for example, heap sort and several merging programs can be written in the programming language, but assertions strong enough to prove that these programs sort do not exist in our assertion language.

4.2 Partitioning

Several properties of a data structure can be expressed as properties of its substructures and interrelationships among these components.
For example,

    sorted(s,t) iff s ≥ t or (for all u, s ≤ u < t: sorted(s,u) ≤ sorted(u+1,t))
    avl-tree(r) iff empty-tree(r) or (avl-tree(left(r)) and avl-tree(right(r)) and −1 ≤ height(left(r)) − height(right(r)) ≤ 1)

A typical verification condition Ω ⊨ ω of a program aiming to produce such a property on a data structure is of the following kind: the conclusion ω refers to larger parts of a data object having the property, while the premise Ω refers to smaller parts of the data object which have the same (or a similar) property, and contains certain interrelationships between these parts. Proving Ω ⊨ ω becomes much simpler in such cases if both Ω and ω are expressed in terms of a common set of parts of the data object. Partitioning is a technique which decomposes the data object into small enough components so that every segment of the data structure referred to in Ω or ω is a union of some of these components.

4.3 Closure and Local Implication

Much of the inefficiency in general theorem provers can be traced to their inability to choose appropriately those predicates of the premise which would imply a certain conclusion. The rule of local implication completely avoids this problem by specifying the predicate of the premise that determines whether a given predicate of the conclusion follows from the premise. It should be noted that the rule of local implication is valid only when the ptr and key graphs are transitively closed.

A rule of local implication can trivially be formulated in any deductive system if all possible inferences from the given premises are collected as the closure of the premise. However, this may not be practical, either because it takes a long time or because the closure is not finite. We therefore seek inference rules yielding only finitely many inferences from given premises, and obtain the closure under such rules.
In the context of proving lemmas about partitionable properties of data structures, it is generally possible to obtain this closure rapidly and to invent appropriate rules of local implication.

4.4 Examples

This section uses several examples from the literature to support our contention that the techniques developed for SORTLAB apply to a wider class of programs. The treatment of these examples is necessarily brief; we only indicate how a relevant partition may be constructed. We also assume, without further ado, that appropriate extensions are made to the programming and assertion languages where necessary.

4.4.1 A Geometric Example

Consider finite plane maps that can be described using rectangles with one side parallel to the x-axis, together with the operations union (+), intersection (·) and negation (¬). Thus A + B is the map covered by the rectangle A or B, A·B represents the map common to both A and B, and ¬A represents the map not covered by A. The shaded map shown below can be described by several expressions, where i-j-k-l denotes the rectangle with corners i, j, k and l.

[Figure: shaded map built from rectangles whose corners are numbered 1 to 16]

For example,

    (1-2-3-4)·¬(5-6-7-8)·¬(9-10-11-12)
    (1-14-15-4)·¬(5-6-7-8) + (13-2-3-16)·¬(9-10-11-12)
    (1-2-3-4)·¬(5-10-11-8) + (6-9-12-7)

The problem we wish to consider is this: given two expressions E1 and E2, decide whether E1 and E2 describe the same map. If the coordinates of all points referred to in E1 and E2 are constants, the problem is trivial. If the points are arithmetic expressions (with plus and minus only) of free variables and constants, the problem can still be answered by decomposing the maps described by E1 and E2, as follows. Let the rectangle A contain a corner p of another rectangle B. Then p splits A into four smaller rectangles A1, A2, A3 and A4, as shown below.

[Figure: corner p of rectangle B splitting rectangle A into A1, A2, A3 and A4]

Repeat this process until none of the partitioned rectangles contains a corner of another rectangle. Clearly, each original rectangle is a union of some of these partitioned rectangles.
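For the trivial case of constant coordinates, the partition-and-compare idea can be sketched as follows. This is an illustrative variant and not the procedure described in the text. Rather than splitting rectangles corner by corner, it cuts the plane at every x and y coordinate in use. The result is a possibly finer partition in which every rectangle is still a union of cells. Each expression is then evaluated to a set of cells, and two expressions describe the same map exactly when their sets agree (maps are compared up to their boundaries). The expression encoding ("rect", "+", "*", "~"), the "OUT" token for the unbounded exterior, and all function names are hypothetical. The sketch does not handle symbolic coordinates, the case the text is actually concerned with, because that would require ordering coordinate expressions.

```python
from itertools import product

def coords(expr, xs, ys):
    """Collect every x and y coordinate used by the rectangles in expr."""
    if expr[0] == "rect":
        x1, y1, x2, y2 = expr[1]
        xs.update((x1, x2))
        ys.update((y1, y2))
    else:
        for sub in expr[1:]:
            coords(sub, xs, ys)

def cells_of(expr, grid):
    """Evaluate expr to the set of partition cells it covers.

    Each cell is (x1, y1, x2, y2); the extra token "OUT" stands for the
    region outside every rectangle, which only negation can cover.
    """
    op = expr[0]
    if op == "rect":
        x1, y1, x2, y2 = expr[1]
        return {c for c in grid if c != "OUT"
                and x1 <= c[0] and c[2] <= x2 and y1 <= c[1] and c[3] <= y2}
    if op == "+":                                   # union
        return cells_of(expr[1], grid) | cells_of(expr[2], grid)
    if op == "*":                                   # intersection
        return cells_of(expr[1], grid) & cells_of(expr[2], grid)
    if op == "~":                                   # negation
        return set(grid) - cells_of(expr[1], grid)
    raise ValueError(f"unknown operator {op!r}")

def same_map(e1, e2):
    """Decide whether e1 and e2 describe the same map (up to boundaries)."""
    xs, ys = set(), set()
    coords(e1, xs, ys)
    coords(e2, xs, ys)
    xs, ys = sorted(xs), sorted(ys)
    grid = [(a, c, b, d)
            for (a, b), (c, d) in product(zip(xs, xs[1:]), zip(ys, ys[1:]))]
    grid.append("OUT")
    return cells_of(e1, grid) == cells_of(e2, grid)

def R(x1, y1, x2, y2):
    return ("rect", (x1, y1, x2, y2))

frame = ("*", R(0, 0, 4, 4), ("~", R(1, 1, 3, 3)))          # square with a hole
strips = ("+", ("+", R(0, 0, 1, 4), R(3, 0, 4, 4)),
               ("+", R(1, 0, 3, 1), R(1, 3, 3, 4)))
print(same_map(frame, strips))            # True
print(same_map(frame, R(0, 0, 4, 4)))     # False
```

Comparing sets of cells here plays the role of comparing the canonical forms of the partitioned expressions.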
Now impose a linear ordering on these partitioned rectangles. For example, let the top-left corner of A be (x1, y1) and that of B be (x2, y2); A precedes B if either x1 < x2, or x1 = x2 and y1 < y2. With such an ordering, the original expressions E1 and E2 can be rewritten in a canonical form, and E1 is equivalent to E2 if their partitioned expressions are identical.

4.4.2 Simple Array Examples

All the verification conditions of the two examples given in this section can be proven by partitioning the array as described in Section 3.1.3.4.

4.4.2.1 Binary Search

Algorithm 7 is a classical binary search algorithm. The proof that it correctly searches a sorted array x(m...n) for an element z does not depend on the index k being equal to (i+j) div 2. This particular choice of k only makes the algorithm more efficient, taking O(log2(n − m)) iterations. For the algorithm to search properly, it is sufficient that the function f satisfy i ≤ f(i,j) < j whenever i < j. The verification condition for the loop is

$$\begin{aligned}
&\mathit{sorted}(m,n) \land i \le k < j \land \bigl(z\ \mathit{isin\text{-}array}(i,j) \lor z\ \mathit{notin\text{-}array}(m,n)\bigr) \\
\models\ & \bigl(\mathit{sorted}(m,n) \land i \le k \land x_k \ge z \land (z\ \mathit{isin\text{-}array}(i,k) \lor z\ \mathit{notin\text{-}array}(m,n))\bigr) \\
\lor\ & \bigl(\mathit{sorted}(m,n) \land k+1 \le j \land x_k < z \land (z\ \mathit{isin\text{-}array}(k+1,j) \lor z\ \mathit{notin\text{-}array}(m,n))\bigr)
\end{aligned}$$

The predicate notin-array is the negation of isin-array, where

$$z\ \mathit{isin\text{-}array}(s,t) \equiv (s = t \land x_s = z) \lor \exists u\,\bigl(s \le u < t \land (z\ \mathit{isin\text{-}array}(s,u) \lor z\ \mathit{isin\text{-}array}(u+1,t))\bigr)$$

    * sorted (m,n)
    i ← m; j ← n
    while i < j do
        k ← f(i,j)
        if x_k < z then i ← k + 1 else j ← k endif
        * i ≤ j and (z isin-array (i,j) or z notin-array (m,n)) and sorted (m,n)
    endwhile
    found ← (x_i = z)
    * (found ↔ z isin-array (m,n))

Algorithm 7.
Classical Binary Search

4.4.2.2 Dutch National Flag Problem

The problem is to rearrange the elements of an array x, which are three-valued (viz., either red, white or blue), into contiguous red, white and blue segments, running from the low end to the high end respectively [Dijkstra 1976]. Algorithm 8 gives a solution to the problem. The predicates red, white and blue on array segments are defined as follows:

$$c(s,t) \equiv \bigl(s < t \land \forall u\,(s \le u < t \rightarrow c(s,u) \land c(u+1,t))\bigr) \lor (s = t \land \mathit{color}(s) = c) \lor s > t$$

where c is to be replaced by red, white or blue. The backward function evaluation and partitioning techniques of Chapter 3 are adequate to prove the partial correctness of this algorithm.

    r ← 1; w ← 1; b ← n
    while w ≤ b do
        cases color (w) of
            white: w ← w + 1
            red:   (exchange x_r with x_w; r ← r + 1; w ← w + 1)
            blue:  (exchange x_w with x_b; b ← b − 1)
        endcases
        * red (1,r−1) and white (r,w−1) and blue (b+1,n) and 1 ≤ r ≤ w ≤ b+1 ≤ n+1
    endwhile
    * red (1,r−1) and white (r,w−1) and blue (w,n)

Algorithm 8.

4.4.3 Heap Sort

Algorithm 9 [Floyd 1964] imposes the structure of a binary tree on the array in order to sort its elements. We formulate the siftup algorithm recursively, because an iterative version of this algorithm cannot be proven using our partitioning technique (see Section 4.5). The predicates x_u ≥ tree (·,·), ordt, x_u ≥ ordt (·,·) and heap are defined below:

$$x_u \ge \mathit{tree}(s,t) \equiv \bigl(s \le t \land x_u \ge x_s \land x_u \ge \mathit{tree}(2s,t) \land x_u \ge \mathit{tree}(2s+1,t)\bigr) \lor s > t$$

$$\mathit{ordt}(s,t) \equiv \bigl(s \le t \land x_s \ge \mathit{ordt}(2s,t) \land x_s \ge \mathit{ordt}(2s+1,t)\bigr) \lor s > t$$

$$x_u \ge \mathit{ordt}(s,t) \equiv x_u \ge \mathit{tree}(s,t) \land \mathit{ordt}(s,t)$$

$$\mathit{heap}(s,t) \equiv \bigl(s \le t \land \mathit{heap}(s+1,t) \land \mathit{ordt}(s,t)\bigr) \lor s > t$$

Since our interest here is to demonstrate the applicability of the principle of partitioning, we take the liberty of simplifying the verification conditions. A crucial verification condition of the siftup procedure is:

$$\{\, j = 2i < n \land x_j < x_i \land x_{j+1} \le x_i \land \mathit{ordt}(2j,n) \land \mathit{ordt}(2j+1,n) \land x_i \ge \mathit{tree}(j,n) \land x_i \ge \mathit{ordt}(j+1,n) \mid \texttt{call siftup}(j,n) \mid \mathit{ordt}(i,n) \,\} \tag{4.1}$$

    procedure siftup (i,n)
    * ordt (2i,n) and ordt (2i+1,n)
        j ← 2 * i
        if j ≤ n then
            if j < n then
                if x_j < x_{j+1} then j ← j + 1 endif
            endif
            if x_i < x_j then
                exchange x_i with x_j
                * ordt (2j,n) and ordt (2j+1,n) and x_i ≥ tree (j,n) and x_i ≥ ordt (j+1,n)
                call siftup (j,n)
            endif
        endif
    * ordt (i,n)
    endproc

Algorithm 9(a). Recursive Siftup Algorithm

    procedure heapsort (n)
        for i ← n div 2 downto 2 do
            call siftup (i,n)
            * heap (i,n) and 2 ≤ i ≤ n div 2
        endfor
        for i ← n downto 2 do
            call siftup (1,i); exchange x_1 with x_i
            * heap (2,i−1) and array (1,i−1) ≤ sorted (i,n) and 2 ≤ i ≤ n
        endfor
    * sorted (1,n)
    endproc

Algorithm 9(b). Heap Sort

We assume that siftup does not change the order of elements in any tree (s,n) unless that tree is a subtree of tree (i,n). Lemma (4.1) therefore reduces to

$$j = 2i < n \land \mathit{ordt}(2j,n) \land \mathit{ordt}(2j+1,n) \land x_i \ge \mathit{tree}(j,n) \land x_i \ge \mathit{ordt}(j+1,n) \land \mathit{ordt}(j,n) \;\models\; \mathit{ordt}(i,n) \tag{4.2}$$

where call siftup (j,n) has added ordt (j,n). The relevant partition of the "array" is not a decomposition into contiguous array segments. It is a decomposition of tree (i,n) into its two subtrees tree (2i,n) and tree (2i+1,n) and the root x_i. The proof of (4.2) requires consideration of three cases: 2j > n, 2j = n, and 2j < n. To demonstrate the use of a partition of this type, consider the most interesting case, 2j < n. We can rewrite (4.2) as:

$$\begin{aligned}
& j = 2i < n \land 2j < n \land \mathit{ordt}(2j,n) \land \mathit{ordt}(2j+1,n) \land x_i \ge \mathit{tree}(2j,n) \land x_i \ge \mathit{tree}(2j+1,n) \\
& \quad \land x_i \ge x_j \land x_i \ge \mathit{ordt}(j+1,n) \land x_j \ge \mathit{ordt}(2j,n) \land x_j \ge \mathit{ordt}(2j+1,n) \\
\models\ & x_i \ge x_{2i} \land x_i \ge \mathit{ordt}(4i,n) \land x_i \ge \mathit{ordt}(4i+1,n) \\
& \quad \land x_{2i} \ge \mathit{ordt}(4i,n) \land x_{2i} \ge \mathit{ordt}(4i+1,n) \land x_i \ge \mathit{ordt}(2i+1,n)
\end{aligned} \tag{4.3}$$

As can be seen, the conclusion follows from the premise if 2i is substituted for j.
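The recursive siftup and the heap sort driver can be rendered in Python as a sketch, assuming a 1-based list whose element 0 is unused. The helper functions tree and ordt (hypothetical names) mirror the predicates defined above. The assert statements check the assertions of Algorithm 9 at run time on one input; they test the assertions, they do not prove them.

```python
def tree(x, s, t):
    """Elements of the subtree rooted at s, restricted to indices <= t."""
    if s > t:
        return []
    return [x[s]] + tree(x, 2 * s, t) + tree(x, 2 * s + 1, t)

def ordt(x, s, t):
    """ordt(s, t): every subtree inside tree(s, t) is heap-ordered."""
    if s > t:
        return True
    below = tree(x, 2 * s, t) + tree(x, 2 * s + 1, t)
    return (all(x[s] >= v for v in below)
            and ordt(x, 2 * s, t) and ordt(x, 2 * s + 1, t))

def siftup(x, i, n):
    """Recursive siftup of Algorithm 9(a); x is 1-based, x[0] is unused.

    Entry assertion: ordt(2i, n) and ordt(2i+1, n).  Exit: ordt(i, n).
    """
    j = 2 * i
    if j <= n:
        if j < n and x[j] < x[j + 1]:
            j += 1                         # j now indexes the larger child
        if x[i] < x[j]:
            x[i], x[j] = x[j], x[i]
            siftup(x, j, n)                # only tree(j, n) can be disordered

def heapsort(x, n):
    """Heap sort of Algorithm 9(b), with its assertions checked at run time."""
    for i in range(n // 2, 1, -1):         # i = n div 2 downto 2
        siftup(x, i, n)
        assert ordt(x, i, n)
    for i in range(n, 1, -1):              # i = n downto 2
        siftup(x, 1, i)
        assert ordt(x, 1, i)
        x[1], x[i] = x[i], x[1]
        assert all(v <= x[i] for v in x[1:i])          # array(1,i-1) <= x_i
        assert x[i:n + 1] == sorted(x[i:n + 1])        # sorted(i, n)

data = [None, 5, 3, 8, 1, 9, 2]
heapsort(data, 6)
print(data[1:])   # [1, 2, 3, 5, 8, 9]
```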
The verification conditions for the two for-loops of heapsort (Algorithm 9(b)) require even more complex partitioning: a decomposition into subtrees as well as into one-element array segments. However, an iterative version of the siftup algorithm does not yield to such a decomposition of the heap, and hence is not provable by our techniques.

4.4.4 A List Moving Algorithm

Algorithm 10 [Reingold 1973, Wagner 1974] moves all nodes of a list structure that are accessible from a root to a new contiguous set of nodes. We outline a proof that what the algorithm copies is isomorphic to the original list structure composed of all, and only, those nodes accessible from the root. For convenience in this proof, we have introduced the tables copyof [·] and origof [·], and the boolean flags copied [·]. The algorithm itself does not require origof [q], the original node of the newly copied node q. The tables copied [·] and copyof [·] may be overlapped with the left [·] fields of the original nodes (see [Wagner 1974]). The predicates in the loop invariant are defined below:

isocopy (q) = (q = 0 or isocopy (q−1) and data [q] = data [q0] and right [q] = copyof [right [q0]] and left [q] = copyof [left [q0]])

where q0 = origof [q]. (The nodes 1 through q constitute an isomorphic copy of a substructure of the original list.)

    procedure movelist (root)
        p ← 0; q ← 0; ℓ ← root
        call copy (ℓ)
        while q ≠ p do
            q ← q + 1
            call copy (left [q])
            call copy (right [q])
            * isocopy (q) and q to p and p from q and dupe (q,p)
        endwhile
    * isocopy (p) and p to p and p from p
    endproc

    procedure copy (var x)
        if x ≠ nil then
            if not copied [x] then
                p ← p + 1; node [p] ← node [x];
                copied [x] ← true; copyof [x] ← p; origof [p] ← x
            endif
            x ← copyof [x]
        endif
    endproc

Algorithm 10. A List Moving Algorithm
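A Python sketch of Algorithm 10 follows, under stated assumptions. Nodes are records with data, left and right fields, and nil is None. The original and the new node areas are separate lists. Membership in the dictionary copyof plays the role of the flag copied[x], and copy returns the new index instead of updating a var parameter. The names Node, movelist and is_isocopy are hypothetical; is_isocopy checks the exit assertion isocopy(p) directly on the result.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    data: object
    left: Optional[int]      # index of another node, or None for nil
    right: Optional[int]

def movelist(old, root):
    """Copy every node reachable from root into a new contiguous area.

    old holds the original nodes (indices act as addresses). Returns
    (new, new_root, origof): new[1..p] are the copies, new[0] is unused,
    and origof maps a copy back to its original, as in the proof.
    """
    new = [None]
    copyof = {}              # original index -> copy index
    origof = {}              # copy index -> original index (proof aid only)

    def copy(x):
        """Procedure copy(var x): return the copy of x, creating it if needed."""
        if x is None:
            return None
        if x not in copyof:                            # not copied[x]
            n = old[x]
            new.append(Node(n.data, n.left, n.right))  # links still point into old
            copyof[x] = len(new) - 1
            origof[len(new) - 1] = x
        return copyof[x]

    new_root = copy(root)
    q = 0
    while q != len(new) - 1:                           # while q != p
        q += 1
        new[q].left = copy(new[q].left)
        new[q].right = copy(new[q].right)
    return new, new_root, origof

def is_isocopy(old, new, origof):
    """isocopy(p): each copy has its original's data, with links translated."""
    copyof = {o: c for c, o in origof.items()}
    for q in range(1, len(new)):
        o = old[origof[q]]
        if new[q].data != o.data:
            return False
        if new[q].left != copyof.get(o.left) or new[q].right != copyof.get(o.right):
            return False
    return True

old = [None, Node("a", 2, 3), Node("b", 3, None),
       Node("c", 1, None), Node("junk", 1, 1)]
new, r, origof = movelist(old, 1)
print([n.data for n in new[1:]])      # ['a', 'b', 'c']  (node 4 is unreachable)
print(is_isocopy(old, new, origof))   # True
```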
The remaining predicates are defined as follows:

q to p = (q = 0 or q−1 to p and left [q] ≤ p and right [q] ≤ p or q−1 to p−1 and (right [q] ≤ p−1 and left [q] = p or right [q] = p and left [q] ≤ p−1) or q−1 to p−2 and left [q] = p−1 and right [q] = p)

p from q = (q = 0 or p from q−1 or p−1 from q−1 and (p = left [q] or p = right [q]) or p−2 from q−1 and p−1 = left [q] and p = right [q])

(q to p means that all nodes reachable from q using right-left links are included in 1 ... p. Similarly, p from q denotes the converse, i.e., all nodes included in 1 ... p are reachable from nodes in 1 ... q via the right-left links.)

dupe (s,t) = (s > t or s = t and node [s] = node [origof [s]] or for some u, s ≤ u < t and dupe (s,u) and dupe (u+1,t))

(Nodes s through t are exact copies of their original nodes.)

The partition of the copied list structure indicated by these predicate definitions readily yields proofs of the various verification conditions of the list moving algorithm.

4.5 On the Applicability of Partitioning

As the examples of the preceding section show, a class of programs with loops (or recursive calls) operates on its data objects, building up the desired property iteratively (or recursively). Two general approaches to the iterative build-up of properties can be distinguished:

A1. The data structure having a desired property P is gradually built up. If D is a segment of the data object having property P, we find δD, an incremental part taken from the remaining part of the data object. The composite segment D + δD is manipulated so that D + δD has the property P. The process is repeated until all of the data object has the property P [Misra 1976].

A2. The desired property P on a data object is gradually built up. If D has a property Q, we manipulate D so that it now has a property Q' which is "closer" to P than Q was.

The examples of Sections 4.4.2 and 4.4.4 belong to class A1. Partitioning seems applicable to all such programs.
It is, of course, possible to describe an algorithm belonging to class A1 in terms of A2. A bubble sorting algorithm can be thought of as converting an array that is less sorted into an increasingly sorted array; however, the algorithm is best put in class A1. On the other hand, some algorithms belonging to class A2 would be very difficult to describe in terms of A1. A nonrecursive siftup algorithm of heap sort (see [Floyd 1964] and Section 4.4.3) descends the tree, confining the undesirable property that some tree is not ordered (ordt) to smaller and smaller trees. This algorithm clearly belongs to class A2.

Thus, for partitioning to be applicable, it seems necessary that the following requirements be satisfied:

R1. The data structures used must have disjoint components. (Thus circular lists and "trees" with shared structures do not satisfy this requirement, while stacks, queues, linear lists, trees and tables do.)

R2. It should be possible to describe the property P on a data object D equivalently in terms of the same property P on components of D obtained by a finite decomposition, and possibly some interrelationships among the components. (Properties like "A is a permutation of A0" are not partitionable in this sense. Properties like "T is an AVL-tree", "array A is sorted" and "array A is a heap" are.)

R3. The algorithm should build up the property P using approach A1.

When the desired property P and the data object D satisfy requirements R1 and R2, it is generally possible to write programs that satisfy R3. Thus, the applicability of partitioning depends not only on the intrinsic properties of the data structure and the property P, but also on how P is built up.

5. SORTLAB

The verification system described in Chapters 2 and 3 is at the heart of a programming laboratory called SORTLAB. SORTLAB helps the student programmer turn the basic ideas of sorting algorithms into correct programs. It consists of a program editor, an interpreter, the program verifier described earlier and a counterexample generator. These are implemented on the PLATO interactive system as a "lesson." This lesson is part of the Automated Computer Science Education System (ACSES) developed by the Department of Computer Science of the University of Illinois. This chapter describes SORTLAB, its use and its implementation. Sections 5.1 and 5.2 provide the context in which the performance of SORTLAB should be evaluated.

5.1 PLATO

The PLATO IV interactive system [Alpert and Bitzer 1970] is designed to support more than 500 users logged in on plasma-panel graphic terminals. The users can be divided into "authors," who write teaching programs ("lessons"), and "students," who execute these lessons at their own pace. A user is expected to limit CPU usage to 2 milliseconds per clock-second; any attempted overuse is reduced to this level by offering fewer time slices. Each student user has a data segment of 1,650 60-bit words. A lesson is assigned a data space of 1,500 words in central memory, and it can access these 1,500 words and the first 150 words of the student data segment. The 1,500-word space must be loaded (and unloaded) with the contents of either the remaining 1,500 words of the student data segment or a segment of extended core storage containing information common to all users executing the lesson. Thus any lesson using more than 150 words of data must explicitly control this "paging."

The single most annoying factor in using the PLATO system for program development is TUTOR, the only programming language available to authors for writing lessons.
(For a short introduction, see Popular Computing 1975; a detailed, though slightly outdated, description may be found in [Sherwood 1975].) TUTOR is a high-level language with an assembly-language-like format. It contains several machine-dependent data manipulation statements, with such niceties as nested assignment statements and generalized versions of the computed-goto and do-loop statements of FORTRAN. Procedure blocks may be defined, but there are no local variables. The programmer must assign each variable name an address. Several variables with small values may be assigned to different segments of the same 60-bit word. In addition, several statements are useful for judging the student's response. The run-time system of TUTOR permits nested procedure calls (recursive or not) at most 10 levels deep.

Most lessons written for PLATO have a simple structure; for these programs, the lack of control structures, local variables, etc. is not a serious impediment. Typically, such lessons also use little CPU time. Most students find it pleasant to "read" such lessons because of the near-instantaneous response and excellent graphics. Any unpleasantness is usually attributable to the author's style of writing the lesson.

5.2 ACSES

The Department of Computer Science of the University of Illinois has developed on PLATO an Automated Computer Science Education System [Nievergelt 1975] for beginning students in computer science. It consists of a large body of lessons, the GUIDE information retrieval and management system [Eland 1975], and an interactive programming system [Wilcox 1973]. A student may use the GUIDE to find out about his records or to choose a lesson of interest. The programming system supports several languages with excellent error diagnostics. The body of lessons consists largely of conventional Computer Assisted Instruction lessons about various aspects of computer science.
Among this collection are two lessons that incorporate novel concepts of artificial intelligence and program proving, adapted to run on limited computer resources: PATTIE [Danielson 1975], which tutors students in top-down program design, and SORTLAB, presented in the next section.

5.3 SORTLAB: A Programming Laboratory

SORTLAB concerns itself with the implementation of certain sorting algorithms. It provides a "laboratory" in which a student can perform programming "experiments" using the equipment provided. It does not actively suggest how an algorithm should be implemented. Instead, it focuses the student's attention on the correctness of his program by providing tools such as a specially designed, easy-to-learn mini programming language, an excellent program editor, a program verifier, a counterexample generator, and an interpreter for his programs.

[Figure 5.1. Components of SORTLAB: program editor, programming language recognizer, assertion language recognizer, interpreter, and the sorting program verifier (verification condition generator, theorem prover, counterexample generator)]

5.3.1 Programming and Assertion Languages

The languages are chosen so that it is convenient and natural to express several sorting algorithms, while writing other programs is not easy. The particular choice of basic operations in the programming language, and of predicates in the assertion language, is strongly influenced by decidability considerations (see Section 2.2). Figure 5.2 gives an example program. Figures 5.3 and 5.4 specify the syntax of the languages, and Section 3.1.1 specifies the semantics of the assertion language. The ptr assignment, while, if and call statements have their conventional meaning. The semantics of the other statements of the programming language is explained in the examples below.
[Figure 5.2: example SORTLAB program]

[Figure 5.3. Syntax of the Programming Language. The nonterminal names are not recoverable from this copy; the statement forms shown are: procedure ... endproc; while ... do ... endwhile; scan ... from ... to ... endscan; if ... then ... else ... endif; ptr assignment (←); exchange x_ptr with x_ptr; insert x_ptr below x_ptr; call name (ptr, ptr); conditions joined by or and and; scan directions "up with ptr" and "down with ptr".]

[Figure 5.4. Syntax of the Assertion Language. The nonterminal names are not recoverable from this copy; assertions are disjunctions (or) of conjunctions (and) of predicates over the segment terms array (ptr, ptr) and sorted (ptr, ptr).]

The statement

    scan up with i from j + 1 to k − 1 endscan

is equivalent to

    i ← j + 1
    while i ≤ k − 1 do i ← i + 1 endwhile

The loop variable i of the scan statement is not considered unmodifiable by the body.

The statement "insert x_i below x_j" is equivalent to the following abstract program:

    t ← x_i; p ← i
    if i < j then
        while p ≤ j − 2 do x_p ← x_{p+1}; p ← p + 1 endwhile   {circular up shift}
    else
        while p ≥ j + 1 do x_p ← x_{p−1}; p ← p − 1 endwhile   {circular down shift}
    endif
    x_p ← t

In SORTLAB, a program is a collection of procedures, and it always includes the main procedure "sort." All procedures are external and may be recursive. The array x is global to all procedures, and indices are always local. Thus, the only way a procedure can receive an index value is as a (value) parameter. Notice that, apart from the array x to be sorted and the ptr variables, no temporary variables are provided.
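The insert statement can be sketched in Python as below, assuming a 1-based list with x[0] unused. Taking the up-shift branch when i < j follows the abstract program above. The insertion_sort function is an added, hypothetical illustration that is not part of SORTLAB: it shows a straight insertion sort written with insert_below. Its j > 1 guard could be dropped if x[0] held a sentinel smaller than every element.

```python
def insert_below(x, i, j):
    """SORTLAB's 'insert x_i below x_j' on a 1-based list (x[0] unused).

    x_i is moved to the slot just below (at the index before) x_j, and the
    elements in between shift by one place: a circular shift of the segment.
    """
    t = x[i]
    p = i
    if i < j:
        while p <= j - 2:            # circular up shift
            x[p] = x[p + 1]
            p += 1
    else:
        while p >= j + 1:            # circular down shift
            x[p] = x[p - 1]
            p -= 1
    x[p] = t

def insertion_sort(x, n):
    """Straight insertion sort on x[1..n] written with insert_below."""
    for k in range(2, n + 1):
        j = k
        while j > 1 and x[j - 1] > x[k]:     # scan down for the insertion point
            j -= 1
        if j < k:
            insert_below(x, k, j)

a = [None, "a", "b", "c", "d", "e"]
insert_below(a, 1, 4)
print(a[1:])   # ['b', 'c', 'a', 'd', 'e']

b = [None, 4, 2, 5, 1, 3]
insertion_sort(b, 5)
print(b[1:])   # [1, 2, 3, 4, 5]
```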
Two padding elements, x_0 and x_{n+1}, are predefined to be −∞ and +∞ respectively; these may be used as sentinels. Thus, the entry and exit assertions of the main procedure sort (n) are:

    n ≥ 1 and x_0 < array (1,n) < x_{n+1}
    sorted (1,n)

5.3.2 Language Recognizers

The tokens of the programming and assertion languages are chosen so that, except for if, insert and i ← ..., they can be recognized by their first character. As soon as the first character of a token is typed, the statement is completed as far as possible and displayed. An illegal key press causes the key to be flashed and is otherwise ignored. Thus, in writing the following statements only the underlined keys need be pressed:

    scan down with i from ... to 2 endscan
    exchange x_{i+1} with x_{j+1}

5.3.3 Program Editor

Each procedure constitutes a "display page," and a page may be selected by typing in the name of its procedure. A statement is inserted by first giving it a line number and then writing the statement. An assertion is given as the exit assertion of a statement and is displayed at the end of that statement. Thus, the line labeled 16* in Figure 5.2 is the exit assertion of the if-statement at line 11. It is also the loop invariant of the while-loop at line 4. Any sequence of statements can be deleted and, if desired, saved. Any one of several saved program segments may later be inserted into a procedure. Compound statements like the while-statement are written in two steps. First, the while-envelope is written with its corresponding endwhile and without a body. At a later time, the body is formed either as a sequence of new statements or by inserting a saved program segment. As a result, a number of simple but common errors do not arise, such as unmatched end-brackets of statements or unintentional nesting of bodies caused by a missing begin, end or semicolon. Further, structural changes to a procedure do not require reparsing.
Every structural change results in a new page displaying the updated version of the procedure, with automatic indentation. A number of ideas incorporated into this editor are originally due to [Hansen 1971].

5.3.4 Interpreter

The interpreter can execute any program written in the programming language. The assertions are also executed, and their truth values at run time are indicated. The program can be executed in various modes, including a step-by-step mode. During execution, the contents of the array being sorted are dynamically displayed along with the locations of the various indices (Figure 5.2). Only the currently active procedure is displayed; as each new procedure is entered, that procedure is displayed. An invocation trace is also displayed. The interpreter carefully checks for all possible violations of the assumptions made by the verification system. Each procedure is assumed to permute only the elements of the array segment between its two input parameters. (1 is an "implicit" input parameter of procedure sort; this prevents sort from becoming a recursive procedure, since each call statement must have two input (actual) parameters!) The values of all index variables should lie between 0 and n + 1, where n is the size of the array; once an index variable has a value outside this range, that variable can never again have a legal value.

5.3.5 Sorting Program Verifier

The student requests verification of his program when he has finished writing it. The verifier then proceeds to verify the program, provided all the required assertions are given: an invariant for each loop, an entry and an exit assertion for each procedure, and an entry assertion for each call statement. The process of verification is not interactive. The student is informed only of the outcome of the verification. If his program is not proven correct, the lemmas that were false are indicated.
He may then request a counterexample, or proceed directly to edit his program.

We emphasize that when a program is not proven correct, the reason may be that strong enough assertions were not given.

5.3.6 Possible Extensions of SORTLAB

It seems possible to construct a "sorting expert" consisting of such components as a loop invariant generator, a termination prover, an efficiency analyzer, an elegance judger and an algorithms expert. Systems similar in intent to these subcomponents have been designed in other contexts. Elspas [1973] describes how the efficiency of a program can be analyzed automatically, with a termination proof as a by-product. A considerable literature (see, e.g., [Wegbreit 1974]) has appeared on the automatic generation of loop invariants. Ruth [1974] discusses a system that attempts to give quality feedback to the student using built-in knowledge about specific sorting algorithms such as bubble sort. An elegance judger could readily be constructed if that elusive characteristic, the "elegance" of a program, were quantified in terms of measurable quantities such as the length of its correctness proofs and the number of statements and variables.

Such a sorting expert would certainly make the tutoring system SORTLAB more attractive. Constructing this component seems doable, but it would be another project of the same magnitude as the verifier.

6. DISCUSSION

Many verifiers have been constructed. Yet none of them can be considered a tool usable by ordinary programmers. The number and variety of programs proven is small. Data structures more complex than linear arrays or lists are handled unnaturally. More significant is the poor performance of these verifiers in terms of memory space and computation time. This failure to advance significantly toward verifiers that are mechanical aids to program writing can largely be attributed to the very attitude taken in building several of the present-day verifiers.
They all seem to start from the presumption: given an arbitrary program with assertions, prove it. Evidence is building up that practically usable verifiers cannot be constructed unless four conditions hold: the problem domain is limited, programs are well composed, abstract data structures and operations are used, and properties of programs and data structures are studied from a semantic viewpoint. Thus, we foresee not one ultimate program verifier but a class of limited-domain program verifiers, each capable of proving or disproving a certain class of programs. Section 6.1 elaborates these points. Section 6.2 describes a few of the significant verifiers and theorem provers built so far.

6.1 A Critique of Program Verifiers

McCarthy [1963] was one of the first to recognize the need to replace the debugging of systems (computer programs, engineering systems, etc.) by proofs that the systems meet their specifications. Treating programs as mathematical objects, he goes on to show how statements about programs may be proven. The theory developed by Floyd [1967] for iterative programs is comprehensive; it equates the correctness of a program with the truth of a certain set of lemmas generated from it. King [1969] constructed a verifier that mechanized both lemma generation and proof. This clearly demonstrated the feasibility of an automatic program verifier, and it became the pilot system for a dozen or so systems that followed (see [London 1972]). Many of these verifiers are the result of unfortunate marriages between a lemma generator and a classic automatic theorem prover, and none can be considered significantly superior to King's verifier.

6.1.1 Theorem Provers for Program Verifiers

Work on classic theorem proving has always concerned itself with the general problem of syntactically deducing that a given statement of first-order logic follows from a set of axioms (see, e.g., [Chang and Lee 1974] and [Bledsoe 1975]).
Pointing out some of the theoretical impediments to automatic theorem proving, Rabin [1974] comments that this work had such high hopes and aims as:

    ...to develop a theorem prover which will enable them to solve mathematical problems, and hopefully even difficult mathematical problems, by the computer. If one wants to slide into the realm of science fiction then one may talk about proving or disproving Fermat's conjecture by an automated theorem proving program. ...

Since first-order logic is undecidable, one looks only for efficient semi-decision procedures. Such a procedure produces proofs of statements that are theorems and halts on them, but may not halt on nontheorems. But, as Rabin makes plain, even in theoretically decidable domains such as Presburger arithmetic (first-order sentences involving natural numbers and the operation of addition only), determining computationally whether a given sentence is true or false may be practically undecidable.

If verification is ever to replace debugging, verifiers must be able to handle incorrect programs. That is, we need theorem provers that are decision procedures for the lemmas generated. Thus, the programs that a verifier attempts to prove or disprove should be limited so that the lemmas generated belong to a decidable domain. This can be done only by carefully designing a language for assertions that is expressive enough to allow all the "legitimate" assertions one might want to make when proving properties of programs from an interesting class. The theorem prover should then be a decision procedure for all sentences in the assertion language. Even decision procedures may take impractically long to decide whether a sentence is true or false. They should therefore be engineered so that such decisions are made rapidly for a large subset of the lemmas that can be considered "naturally occurring" in well-designed programs.
Thus, we may not mind if it takes super-exponential time to decide whether a verification condition of the following kind

    {Ω | i ← i | ω}

is correct, so long as the verifier gives correctness proofs of legitimate programs quickly. (Such a condition arises only when the programmer has the bad manners to misuse the verifier to prove an irrelevant mathematical theorem, namely that Ω implies ω.)

Furthermore, the lemmas generated in proving well-designed, legitimate programs are not typical of manual mathematics. These lemmas are shallow and follow fairly directly from (properly chosen) axioms and inference rules. Clearly, it is impractical to include every lemma to be proven among the inference rules. Instead, a small number of inference rules should be carefully tailored so that short proofs of naturally occurring lemmas can be given rapidly. Two examples of theorem provers designed this way are [King and Floyd 1972] and the theorem prover described in Chapter 3 of this thesis.

6.1.2 Effect of Program Composition

The structure and statements of a program clearly affect its verification. Authors such as Dijkstra and Hoare have advocated writing abstract programs using abstract data structures. The solution to a programming problem is constructed using operations on data structures that are natural to the problem. These operations and data structures are then written at a lower level of abstraction, and so on, until all operations and abstract data structures are implemented in the host programming language. The advantage of this approach lies in the factorization of detail at each level of abstraction.

Such abstraction helps not only the human designer of the program but also the program verifier. When data structures are manipulated solely through designated procedures, properties related to data integrity can be proven by considering these procedures independently of their invocations, using generator induction [Hoare 1972].
Thus, for example, the fact that a sorting algorithm has only permuted the given ordered set of elements can be shown by proving that the primitive operations exchange and insert are element-conserving.

Another important advantage is that undecidable domains of lemmas may be isolated in a program. Arithmetic operations such as multiplication, division and addition, which result in theoretically or practically undecidable domains, can be grouped together and their input/output relationships given explicitly. These relationships may then be proven separately by ad hoc techniques. Often, such arithmetic is not essential to the property of the program being proven. For example, the division by 2 in binary search and the multiplication by 2 in the siftup of heap sort are not essential to the correctness proofs. The only thing that matters for the correctness of the search is that the interval of uncertainty be partitioned into two smaller subintervals.

These operations on data structures are generally implemented as procedures. The procedures modify only selected components of a data structure and leave the remaining environment of the procedure intact. However, the rules of inference about procedure calls, such as those given in [Hoare 1973] or in [Elspas et al. 1973], deal only with "entire variables" (a whole array, a whole stack, etc.). They are therefore weaker than they should be. That is, correct programs exist which cannot be proven using such inference rules. A "predicate transformer" (à la [Dijkstra 1976]) offers a solution to this problem.

The rule of procedure invocation of [Hoare 1973] can be roughly described as follows. Let $Q$ be a procedure whose correctness with respect to $\phi$ and $\psi$ has been established independently, i.e., $\{\phi \mid Q \mid \psi\}$. Then to prove $\{\alpha \mid \mathtt{call}\ Q \mid \beta\}$, verify the following:

$\alpha \models \phi'$ and $\psi' \models \beta$

where $\phi'$ and $\psi'$ are obtained from $\phi$ and $\psi$ with appropriate substitutions made for the formal parameters of $Q$.
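The two side conditions of this rule can be checked mechanically on a toy example, which also previews the weakness discussed next. Everything in the sketch below is a hypothetical choice made for illustration. States are all arrays of length 3 over {0, 1, 2}. Q is a compare-and-exchange procedure whose established exit assertion says only that its two positions are in order. Entailment is decided by brute-force enumeration. The predicate transformer is computed semantically, by running Q on each state, which works only because the state space is finite; a verifier would compute it symbolically by backward substitution.

```python
from itertools import product

# Hypothetical finite state space: every array of length 3 over {0, 1, 2}.
STATES = [list(a) for a in product(range(3), repeat=3)]


def compare_exchange(a, i, j):
    """Hypothetical procedure Q: order a[i] and a[j]; returns a new array."""
    a = a[:]
    if a[i] > a[j]:
        a[i], a[j] = a[j], a[i]
    return a


def call_q(a):
    """The call under study: call Q(a, 1, 2)."""
    return compare_exchange(a, 1, 2)


# Q's contract after substituting the actual parameters 1, 2 for i, j.
def phi_prime(a):        # entry assertion: indices in range and ordered
    return 0 <= 1 < 2 < len(a)


def psi_prime(a):        # exit assertion: the two positions are in order
    return a[1] <= a[2]


# Assertions at the call site.
def alpha(a):            # a[0] is a minimum of the array
    return a[0] <= a[1] and a[0] <= a[2]


def beta(a):             # the array is sorted
    return a[0] <= a[1] <= a[2]


def entails(p, q):
    """p |= q, decided by enumerating the finite state space."""
    return all(q(s) for s in STATES if p(s))


# Hoare-style rule: alpha |= phi' and psi' |= beta.
rule_proves_call = entails(alpha, phi_prime) and entails(psi_prime, beta)

# Ground truth: every state satisfying alpha is mapped to one satisfying beta.
call_is_correct = all(beta(call_q(s)) for s in STATES if alpha(s))


# Weakest entry condition for beta, computed here by running Q on each state.
def wp_beta(a):
    return beta(call_q(a))


wp_proves_call = entails(alpha, wp_beta)
```

Here `rule_proves_call` is False, because the state [2, 0, 1] satisfies psi' but not beta. By contrast, `call_is_correct` and `wp_proves_call` are both True. The property "a[0] is a minimum", guaranteed by alpha, survives the call but is invisible to psi'.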
Clearly, this rule is sufficient to prove $\{\alpha \mid \mathtt{call}\ Q \mid \beta\}$. But the exit assertion $\psi$ of $Q$ cannot, in general, contain enough information to imply $\beta$ when $Q$ is called under different input environments, all of them satisfying $\phi$. A number of properties guaranteed by $\alpha$ may be unchanged by $Q$, and hence true upon exiting $Q$. What is needed is a meta-operator which produces $\beta$ as the transformation made by $Q$ on $\alpha$ when $\alpha$ implies $\phi$. In the context of backward substitution, such an operator is a "predicate transformer." It transforms the given exit assertion $\beta$ of the call $Q$ into $\alpha$, the weakest entry condition to call $Q$ such that $\beta$ is true if and when call $Q$ returns.

The verifier should be given a predicate transformer for each procedure $Q$ which may be invoked under varying circumstances. However, if the procedure $Q$ is not well written (e.g., global variables were used where local variables should have been), the predicate transformer will be an overspecification of $Q$. It should also be realized that some procedures are called only in certain contexts. In such cases, Hoare's rule is simpler to use.

6.1.3 Proving Certain Properties of Programs

It is not difficult to invent innocent-looking programs whose correctness is very difficult to establish. Pure and deep mathematical results may be used in the program, and hence there may not be a "directly perceivable" relation between what is being computed and the stated intentions of the program.

For example, a depth-first search algorithm [Tarjan 1972] computes certain simple functions NUMBER(·) and LOWPT(·) on vertices. It then deletes edges from a stack until a certain condition on NUMBER(·) is satisfied. This property is easy to prove. That this set of edges constitutes a biconnected component of the graph, however, is a difficult theorem.
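The depth-first search just mentioned can be written down in a few lines, which makes the contrast concrete. The code visibly computes NUMBER and LOWPT with nothing more than a successor and comparisons. Nothing in it shows that each group of edges popped from the stack is a biconnected component. The sketch below follows the usual presentation of the algorithm of [Tarjan 1972] for simple undirected graphs. The function name and the adjacency-dictionary input format are our assumptions.

```python
def biconnected_components(adj):
    """Edge sets of the biconnected components of a simple undirected graph.

    adj maps each vertex to a list of its neighbours. NUMBER(v) is the order in
    which v is first visited; LOWPT(v) is the smallest NUMBER reachable from the
    subtree of v using tree edges followed by at most one back edge.
    """
    number, lowpt = {}, {}
    stack, components = [], []
    counter = 0

    def dfs(v, parent):
        nonlocal counter
        counter += 1                                  # successor function
        number[v] = lowpt[v] = counter
        for w in adj[v]:
            if w not in number:                       # tree edge v -> w
                stack.append((v, w))
                dfs(w, v)
                lowpt[v] = min(lowpt[v], lowpt[w])
                if lowpt[w] >= number[v]:             # v separates w's subtree
                    comp = set()
                    # delete edges whose first endpoint has NUMBER >= NUMBER(w)
                    while number[stack[-1][0]] >= number[w]:
                        comp.add(stack.pop())
                    comp.add(stack.pop())             # the tree edge (v, w)
                    components.append(comp)
            elif w != parent and number[w] < number[v]:   # back edge
                stack.append((v, w))
                lowpt[v] = min(lowpt[v], number[w])

    for v in adj:
        if v not in number:
            dfs(v, None)
    return components
```

Take a triangle 1-2-3 with a pendant edge 3-4, i.e. `{1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}`. For this graph the function returns two components: the bridge edge (3, 4) and the three edges of the triangle.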
It is interesting to note that this and several other graph algorithms use very simple arithmetic (the successor function +1 and the relation <). Habermann [1975] gives another example of an algorithm (a quadratic-hash algorithm) whose correctness proof does not readily follow from the program structure itself.

"Existential" properties are also quite difficult to prove using the inductive assertion approach. Consider, for example, an algorithm enumerating all circuits of a graph. Its exit assertion is: Every subgraph g (of the given graph G) that is output is a circuit of G, and conversely, every circuit of G is output. As another example, consider a shortest path algorithm. The exit assertion is: The graph G has no path shorter than the one found by the algorithm. The path p found by the algorithm often appears explicitly in the algorithm, while the set of all paths of G that p is being compared to does not.

6.2 Previous Work Related to This Thesis

In a survey, London [1972] reports that more than a dozen verifiers have been constructed so far, most of them using the inductive assertion method. None of these verifiers can, in general, handle incorrect programs. Only algorithms that were known to be correct a priori have been mechanically verified, with varying degrees of human intervention in their proofs. We briefly describe two of these verifiers, King's verifier and the SRI verifier, which have influenced the verifier presented in this thesis. Other significant verifiers include [Luckham et al. 1973], [Deutsch 1973], [Boyer and Moore 1975], [Good et al. 1975] and [Marmier 1975]. Cooper [1975] independently discusses some ideas similar to those expressed in Chapter 3.

6.2.1 King's Verifier

King [1969] constructed a verifier which mechanized both the generation of lemmas and their proof. A commendable engineering approach was taken in tailoring the theorem prover.
The programs, and hence the lemmas, were limited to integer-valued variables, including linear arrays. Several ad hoc techniques which depend on detailed knowledge of integer expressions are used in proving a large class of lemmas about integers. The premise and the negation of the conclusion of the lemma to be proven are represented in a "normal" form, and the resulting set of linear inequalities and nonlinear equations is solved algebraically [King and Floyd 1972]. King's verifier has proven several programs without any human intervention. Among them are simple insertion sort, bubble sort, and computing $x^y$ using the binary representation of $y$. Subsequent verifiers ([Elspas et al. 1973], [Luckham et al. 1973], [Good et al. 1975], [Deutsch 1973]) have provided for interaction with the user in an attempt to prove a much larger class of programs, resulting in the proofs of such programs as Hoare's FIND.

6.2.2 SRI Verifier

The theorem prover [Elspas et al. 1973] is a collection of inference rules together with a set of strategies. Given the premise of a verification condition to be proven, determining whether it implies the conclusion proceeds in a goal-driven manner. The theorem prover has several high-level inference rules about arrays. Unfortunately, the theorem prover is embedded in a disastrously general QA4 system [Rulifson 1972], and lacks a sense of direction. At any given point, several inference rules are applicable. The system applies each one in turn until it either succeeds in proving the goal or exhausts all inference rules, in which case, of course, the lemma is false. However, the application of one inference rule may generate further instances of application for another rule, and vice versa, resulting in thrashing.
The user may be called upon to provide advice on such and other occasions, which can then alter the course of deduction.

Both King's verifier and the SRI verifier handle arrays unsatisfactorily, using the equivalent of the access and change functions of McCarthy [1967]. They do so because array elements are considered to be of the same type as their indices, and interassignments between them are allowed. Our own inference rules about arrays (see Chapter 3) may be considered refinements of the rules in the SRI verifier.

6.3 Salient Features of the Sorting Program Verifier

The verifier presented in this thesis has been designed to meet specific performance requirements. It was to be usable in an interactive computing system which imposed severe constraints on both the amount of memory and the computation time that can be used (see Section 5.1). This section briefly analyzes the factors that contributed to the fast decision procedure, and notes some of its shortcomings.

6.3.1 Decidable

The verifier presented here is unique in that it is the only verifier with a decision procedure for the verification conditions of the programs it accepts for verification. It makes no pretense of being general. The syntax of the input programs has been carefully designed to reject all programs that the verifier cannot prove or disprove. It provides two basic operations, exchange and insert, to permute the elements of the array, thereby guaranteeing that the elements of the array are conserved. The assertion language is just powerful enough to express all the assertions that may be made about sorting-type algorithms. The basic predicates provided capture the notion of sequential access in sorting algorithms. The decidability is due to this restriction of the lemmas generated and to the partitionability of the sequentially accessed array structure. This results in a canonical representation for each lemma to be proven.
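The discipline described above can be imitated at run time: the array is changed only by element-conserving primitives, and assertions are stated about array segments. Several elements of the sketch below are our assumptions rather than the thesis's definitions. These are the class `SortArray`, the predicate names `S` and `LE`, and the reading of an assertion of the form S(1,I-1) < A(I,N), which appears later in this report. That assertion is read here as "segment 1..I-1 is sorted and none of its elements exceeds any element of segment I..N". Only exchange is modelled, and the assertions are checked by execution rather than proved.

```python
class SortArray:
    """A 1-based array whose elements can only be permuted, never overwritten."""

    def __init__(self, items):
        self._a = [None] + list(items)   # slot 0 unused, so indices run 1..n
        self.n = len(self._a) - 1

    def __getitem__(self, i):
        return self._a[i]

    def exchange(self, i, j):
        """The only mutator: swapping two elements conserves the multiset."""
        self._a[i], self._a[j] = self._a[j], self._a[i]

    def to_list(self):
        return self._a[1:]


def S(a, lo, hi):
    """sorted(lo, hi): a[lo..hi] is non-decreasing (vacuously true if empty)."""
    return all(a[k] <= a[k + 1] for k in range(lo, hi))


def LE(a, l1, h1, l2, h2):
    """No element of a[l1..h1] exceeds any element of a[l2..h2]."""
    left = [a[k] for k in range(l1, h1 + 1)]
    right = [a[k] for k in range(l2, h2 + 1)]
    return not left or not right or max(left) <= min(right)


def selection_sort(a):
    """Selection sort using only exchange, with its loop invariant checked."""
    n = a.n
    for i in range(1, n + 1):
        assert S(a, 1, i - 1) and LE(a, 1, i - 1, i, n)   # loop invariant
        m = i
        for j in range(i + 1, n + 1):
            if a[j] < a[m]:
                m = j
        a.exchange(i, m)
    assert S(a, 1, n)                                      # exit assertion
    return a
```

Because `exchange` is the only way to change the array, element conservation holds by construction. Only the ordering assertions remain to be checked, which mirrors the division of labour described above.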
The rule of local implication lets us decide whether a given predicate is implied by the hypothesis without any search. At no time does our theorem prover need to backtrack or consider various inference rules for their applicability.

6.3.2 Fast

The theorem prover is not only a decision procedure, but reaches its decisions rapidly for most theorems encountered in proving sorting algorithms. It should be noted that loop invariants of most algorithms (not necessarily sorting) are conjunctions of predicates. This theorem prover is especially suited to proving such theorems by natural deduction. It might appear that a large number of linear orderings of boundaries will be considered in the proof of a lemma. However, if the algorithm is well written, this is generally not the case. Such lack of information about how the boundaries are ordered is not typical of sorting algorithms. Two factors contribute to the speed of the theorem prover. One is the large inferences made about array segments, without considering their individual elements. The other is the rule of local implication.

6.3.3 Backward Function Evaluation

Backward function evaluation considerably simplifies a given lemma in the context provided by the ptr expressions, which constrain the boundaries of array segments. This completely eliminates the need for such pseudo-functions as McCarthy's access and change, used in nearly all other verifiers. It is important to realize that such contextual evaluation is valid only if assignments among array indices and elements are not permitted.

6.3.4 Counterexample Generation

We consider the generation of counterexamples one of the most important duties of a program verifier. If debugging is ever to be replaced by verification, incorrect programs must be handled by verifiers
by either suggesting corrective actions, indicating the unproven verification condition, or actually generating a counterexample for the skeptic. As shown in Chapter 3, a modified shortest-path algorithm is the counterexample generator used by this verifier.

6.3.5 Some Shortcomings

It is interesting to note that the theorem prover is not goal oriented. Thus, even a trivial theorem such as

$\mathit{sorted}(1,n) \models \mathit{sorted}(1,n)$

requires the prover to consider two partitions of the array, one for each of the cases $n \le 0$ and $n > 0$. This is typical of decision procedures in that they may ignore shortcuts. However, the strength of our decision procedure lies in its orientation toward naturally occurring theorems.

More seriously, it is hard to generalize the theorem prover. For example, if we permit the predicate that all keys of an array segment are distinct, the theorem prover cannot be extended in a straightforward manner.

6.4 Conclusion

SORTLAB shows that verifiers for programs from a limited domain of application, which incorporate some of the semantics of the domain, are practical. It would be interesting to see an approach similar to that described in this thesis tried for another domain that is well understood and easily formalized mathematically.

We believe that such limited program verifiers will be the trend of the future. This belief rests on recent results on practical undecidability and on the lack of progress in mechanical program verification in general.

REFERENCES

[Bledsoe 1975] W. W. Bledsoe, "Non-Resolution Theorem Proving," ATP-29, Departments of Mathematics and Computer Sciences, University of Texas, Austin, Texas 78712, September 1975.

[Boyer and Moore 1975] R. S. Boyer and J. S. Moore, "Proving Theorems about LISP Functions," Journal of ACM 22 (1975), 129-144.
[Chang and Lee 1973] Chin-Liang Chang and Richard Char-Tung Lee, "Symbolic Logic and Mechanical Theorem Proving," Academic Press, New York, 1973.

[Cooper 1975] D. C. Cooper, "Proofs about Programs with One-Dimensional Arrays," unpublished manuscript, March 1975.

[Dahl et al. 1972] O.-J. Dahl, E. W. Dijkstra, and C. A. R. Hoare, "Structured Programming," Academic Press, New York, 1972.

[Danielson 1975] Ronald L. Danielson, "PATTIE: An Automated Tutor for Top-Down Programming," Ph.D. Thesis, University of Illinois, Urbana, Illinois 61801, October 1975.

[Deutsch 1973] L. Peter Deutsch, "An Interactive Program Verifier," Ph.D. Thesis, University of California, Berkeley, California, May 1973.

[Dijkstra 1976] Edsger W. Dijkstra, "A Discipline of Programming," Prentice-Hall, Englewood Cliffs, New Jersey, 1976.

[Eland 1975] Dave R. Eland, "An Information and Advising System for an Automated Introductory Computer Science Course," Ph.D. Thesis, University of Illinois, Urbana, Illinois 61801, June 1975.

[Floyd 1964] Robert W. Floyd, "Algorithm 245: Treesort 3," Communications of ACM 7 (1964), 701.

[Floyd 1967] Robert W. Floyd, "Assigning Meanings to Programs," Proceedings of a Symposium on Applied Mathematics, American Mathematical Society 19 (1967), 19-32.

[Elspas et al. 1973] Bernard Elspas, Karl N. Levitt and Richard J. Waldinger, "An Interactive System for the Verification of Computer Programs," Stanford Research Institute, SRI Project 1891, Menlo Park, CA 94025, September 1973.

[Good et al. 1975] Donald I. Good, Ralph L. London and W. W. Bledsoe, "An Interactive Program Verification System," IEEE Transactions on Software Engineering 1 (1975), 59-67.

[Habermann 1975] A. N. Habermann, "The Correctness Proof of a Quadratic-Hash Algorithm," Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA 15213, March 1975.

[Hansen 1971] Wilfred J.
Hansen, "Creation of Hierarchic Text with a Computer Display," ANL-7818, Argonne National Laboratory, June 1971.

[Hoare 1971a] C. A. R. Hoare, "Proof of a Program: FIND," Communications of ACM 14 (1971), 39-45.

[Hoare 1971b] C. A. R. Hoare, "Procedures and Parameters: An Axiomatic Approach," Proceedings of the Symposium on Semantics of Algorithmic Languages, Lecture Notes in Mathematics 188, Springer-Verlag, 1971.

[Hoare 1972] C. A. R. Hoare, "Proof of Correctness of Data Representations," Acta Informatica 1 (1972), 271-281.

[Luckham et al. 1973] David C. Luckham, Friedrich W. von Henke, Shigeru Igarashi, Ralph L. London and Norihisa Suzuki, "Automatic Program Verification," STAN-CS-(73-365, 74-473, 74-475, 75-522), Stanford University, Stanford, California, 1973.

[King 1969] James C. King, "A Program Verifier," Ph.D. Thesis, Carnegie-Mellon University, National Technical Information Service, Springfield, Virginia 22151, #AD 699248, September 1969.

…

… xj then exchange xi with xj else endif
S(1,I) < XI < A(I+1,J) & 1 ≤ I < J < … endscan
S(1,I) < A(I+1,N) & 1 < I < N endscan
* S(1,N)
10 endproc

The theorem prover disproves the corresponding verification condition, (subst j-1 for j … 7*) and j < n stnts [4...7] b (7*), in 1114 CPU-milliseconds. When the assertion at 7* is given as: S(1,I-1) < A(I,N) & XI …

Availability Statement: Release Unlimited. Security Class (This Report): UNCLASSIFIED. Security Class (This Page): UNCLASSIFIED. No. of Pages: 122.