UIUCDCS-R-74-654                                             June, 1974

CLEOPATRA
A Proposal for Another System Implementation Language

by

Axel T. Schreiner

DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
URBANA, ILLINOIS

Report No. UIUCDCS-R-74-654

CLEOPATRA
A Proposal for Another System Implementation Language

by

Axel T. Schreiner

1974

Departments of Mathematics and Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

This work was submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science, June 1974.

A personal note.

When I came to the United States in 1968 on a Fulbright Travel Grant to study mathematics at Northern Illinois University, I was firmly determined to return to my alma mater in Stuttgart, Germany, within a year to complete my studies there. My life took quite a different course. Over the last six years I have become deeply involved with computer-oriented and -aided mathematics instruction and with systems programming. CLEOPATRA is just one of the results of my extended stay.

Many people contributed to my education. First and foremost I must mention Werner Meyer-König, whose example made me come here and who is a patient friend.
Marvin Wunderlich turned assembler language programming into a challenge. H. George Friedman continued to give me almost unlimited opportunities to program and to cope with all aspects of OS/360, and he later served admirably as my Ph.D. advisor. John Brown and the Mathematics Department at Illinois allowed me vast freedom over three years in designing Computer Calculus and in working with the PLATO system, and my friend Gary Pace convinced me to attempt degree work in computer science. The Department of Computer Science proved to be an ideal environment for the studies reported here. To all of them I wish to express my sincerest thanks for all the opportunities I found.

My wife Carol, however, an elementary school teacher, had to live with PLATO, CLEOPATRA, and me, and in about that order. She may not read this thesis, but she will understand. And to her it is dedicated.

A.T.S.

Table of Contents.

1 Introduction and background.
1.1 Previous languages for system implementation.
1.2 The impact of structured programming.
1.3 Extensible languages.
2 The design of CLEOPATRA.
2.1 Program structure.
2.2 Data processing.
2.3 Control structures.
2.4 A summary of local innovations.
3 Two examples.
3.1 A symbolic differentiator.
3.1.1 Problem statement.
3.1.2 Solution.
3.1.3 Improvements.
3.1.4 Design process.
3.2 A "buddy system" storage allocator.
3.2.1 Problem statement.
3.2.2 Solution.
3.3 Conclusions.
4 Implementation.
4.1 Global considerations.
4.2 A few specific problems.
4.2.1 POINTER problems.
4.2.2 Arrays.
4.2.3 Generic values.
4.3 Extensions left to the implementor.
5 Conclusions.
A A summary of CLEOPATRA.
B Annotated list of references.

1. Introduction and background.

Operating systems have grown in size and complexity.
They now provide extensive services to their users, ranging from dynamic storage allocation through assistance in input-output operations to multi-tasking and time-sharing. At the same time, system design must meet a variety of design goals such as efficiency, maximum throughput, reliability, clarity, ease of maintenance, etc. [Abernathy 1973].

Operating systems have traditionally been coded in assembler language. Only in recent years have we observed an increase in the use of ALGOL-style block structured high-level languages for parts of operating systems [Sammet 1971b]. I do not quite agree with Fletcher [1972] when he claims that "Universities use high-level languages, because of the high turnover in graduate students": the use of high-level languages in general seems to me to encourage a certain amount of coding discipline, and it usually affords the user a problem-oriented approach to debugging.

In this report we shall discuss the design and a possible implementation of 'CLEOPATRA', a new high-level programming language, addressed primarily to the needs of an operating system implementor [Schreiner 1974, henceforth called the Report]. To put the discussion into proper perspective, we shall first briefly review some existing programming languages and their applications in system implementation (section 1.1), the concept of "structured programming" and its implications (section 1.2), and extensible languages (section 1.3). Section 2 presents some considerations which led to the final form of CLEOPATRA, and in section 3 we shall illustrate the language by means of some coding examples. Finally, section 4 of this report is intended as a guide to the prospective implementor.

1.1 Previous languages for system implementation.

Lyle [1971] presents the hierarchy of ALGOL languages used on the Burroughs 6700 computer. They range from a slight extension of standard ALGOL 60 [Naur 1963] to the unsafe and machine-dependent extension ESPOL.
All system programming on the Burroughs machine is done in one of these languages; in particular, the Master Control Program (MCP), the operating system, is written in ESPOL. The 6700 is the successor to the Burroughs 5500, and the concept of ALGOL system programming was carried over from that machine. I would like to emphasize that the Burroughs 6700 in particular is designed as a host for a high-level language; e.g., subscript checking for arrays is a hardware feature.

I have used the term "unsafe". By this I mean features of a language that can be used to counteract otherwise implicit protection mechanisms in a potentially destructive fashion. E.g., the ability to disable subscript range checking or the (hardware) storage protection mechanism affords the user the opportunity to overwrite areas of memory which may not be rightfully his, and the user may thus interfere with the proper execution of his and other programs. A language is "safe" if even syntactically or logically incorrect programs cannot interfere with the execution of other programs in an uncontrolled and unpredictable manner. Some operating system components must necessarily be written using unsafe features of a language.

The idea of unsafe supersets of a protected high-level language was further pursued by Rain [1972, 1973a-c, Holager 1972] with MARY, a relative of ALGOL 68 [van Wijngaarden 1969, Currie 1971, Lindsey 1972, Suzuki 1971, van der Poel 1971, Woodward 1972, Yoneda 1971]. Rain compiles programs into code for various machines, most notably the UNIVAC 1108 and several minicomputers. Portability and an efficient exploitation of the idiosyncrasies of a particular hardware are achieved by a macro-like prelude facility, which details the amount of unsafety of the language, together with particular code sequences that the compilation is to employ.
Hewlett-Packard announced its System/3000 together with a system programming language [Start 1971a-b, 1972], and claims to have reduced system software development time by a factor of five over conventional assembler language programming. SPL/3000, the system programming language for the System/3000, is similar in style to ALGOL, and it provides access to the hardware through in-line assembler code and through the use of reserved operands such as TOS, denoting the top of the built-in hardware stack. From this point of view, SPL/3000 is not a very safe language. Except for conventional procedure mechanisms it lacks the possibility of extending the built-in features of the language.

Another recent system implementation language is BLISS/10 for the DEC PDP/10 and BLISS/11 for the DEC PDP/11 computers [Wulf 1971a-d, 1972b, 1973b]. BLISS is a typeless language in which every expression as well as every control construct has a value. Wulf has recently implemented an optimizing compiler for BLISS/11, which is claimed to produce extremely efficient code. Many operating system related applications have been coded in BLISS; Wulf reports [1972b] that "a considerable number of systems have been written using it: compilers, interpreters, i/o systems, simulators, operating systems, etc."

Turning, finally, to the IBM System/360, we note that a sizeable amount of new software is distributed by IBM in PL/S, a system implementation language which also allows in-line assembler code. Details of PL/S are not readily available: IBM merely offers a guide to reading PL/S programs [IBM], and Wiederhold [1971] presented educated guesses for the syntax and semantics of PL/S. The language allows assembler code in-line with essentially block structured text. The programmer has fairly complete control over register assignments, allocation, etc.
PL/S code is compiled into machine code, mostly with absolute address computation (which makes the resulting code hard to read), and the PL/S statements can be included as comments.

Another system implementation language is PL360 [Wirth 1968]; we again find in-line assembler code and very thorough access to the hardware. BCPL, a typeless language [Richards 1969, 1971], has been implemented, among other machines, for the IBM System/360. It allows sophisticated address computations (and thus the creation of complex data structures) on the left hand side of an assignment statement. A related language for the PDP/11 series of machines is STAB [Colin 1972].

An ambitious project in system design, project SUE [Atwood 1972, Sevcik 1972], is dedicated to the design and implementation of a simple time-sharing system for the IBM System/360 computers. Design techniques are influenced by Dijkstra's THE system [1968b], and by his ideas on structured programming [1970]. The designers of SUE based all their implementation work on a language designed and implemented solely for this purpose [Clark 1971a-b, 1973]. The SUE language resembles PASCAL [Wirth 1971a-b], adding the concept of compilation blocks to separate the descriptions of data, program context, and executable code.

We close this brief survey with a reference to the GE 645 MULTICS system [Sammet 1971b], which was completely implemented in various versions of PL/I. It is interesting to note that in this case even a very poor compiler proved to be useful enough to have the implementors of MULTICS resist the temptation of coding in assembler code. In one case code originally written in assembler language became so complex that it was recoded in PL/I to make it intellectually manageable. Optimization, the standard argument in favour of low-level languages, was obtained by redesigning modules and their relationships, and not by recoding in lower-level languages.
I have not defined what an operating system is, or what makes a system implementation language different from a general purpose language and from an application language. Likewise, I have neglected to mention a large number of programming languages that a "system analyst" at one time or another somewhere has employed. For a comprehensive state-of-the-art-1972 survey I recommend the papers presented at the 1971 Purdue Symposium on Languages for Systems Implementation [Proceedings 1971b]. For a survey of languages in general, consult Sammet's annual rosters [1971a, 1972c, 1974], and the survey articles by Cheatham [1971], Rosen [1972], and Sammet [1972a-b]. For an overview of current and anticipated projects, see [Proceedings 1973a].

As for a definition of system implementation, let it be that part of computer programming which attempts to construct a synchronous deterministic machine from the black (or blue) boxes found in the hardware store. Additionally, I think we should interpret "deterministic" to also mean "conversant in a humanized language".

1.2 The impact of structured programming.

The term "structured programming" is due to Dijkstra, and it has been the subject of much speculation ever since. I do not feel competent enough to add yet another definition, and such is not really necessary for this thesis. Let me trace some of the history and the implications of this programming technique instead.

Dijkstra [1968b, 1970, 1971, 1972] attempts to analyze and formalize the process by which we arrive at "correct" implementations for solutions to programming problems which initially can be both incompletely understood and too complicated to be managed comfortably. It is crucial to Dijkstra's approach to solve a problem by successive refinement and elaboration. The problem is kept intellectually manageable by restricting the scope of one's attention to a well-delimited and completely bounded fragment.
Knuth [1973], however, observes that even in this fashion eventually the entire program is in Dijkstra's mind, except, perhaps, that it arrived there in a controlled fashion.

In the spirit of independently considered local fragments, undisciplined transfers of control into such fragments were to be eliminated: thus arose the great goto controversy. The results are summarized in [Proceedings 1972]; arguments have been found in favour of a low-overhead (i.e., relatively disciplined) goto [Wegner 1972] as well as in favour of a rich set of more structured control mechanisms, such as a case statement [Wulf 1971d, Hoare 1973a].

Structured programming to me is a question of discipline in using one's intellect much more than a question of external discipline imposed by the designer and the implementors of a language which may or may not provide a goto. A language designer can encourage and aid cleanliness of code, but he cannot guarantee it. Schroeder [1973] puts this quite nicely: "If Dijkstra were stranded on a desert island with nothing but FORTRAN he would still practice structured programming..." The quotation continues "but probably would not produce a structured program, for FORTRAN is not up to the task."

At this point the problem of formalizing the creative approach to problem solving becomes painfully obvious. Schroeder elsewhere in his paper writes: "While it is not yet clear precisely what structured programming is, there is general agreement that such a thing is possible and would be extremely useful in constructing large, complex systems." This sounds like an invitation for yet another large software psychology proposal. Schroeder subsequently prompted Denning [1974] to inquire "Is it not time to define structured programming", and Zelkowitz countered "It is not time..." [1974]. The literature on structured programming actually is quite extensive.
It ranges from theoretical discussions, which mostly more or less rephrase Dijkstra's terminology [Liskov 1972, Mills 1970, Smoliar 1974], to the exposition of actual programming activities carried out in a demonstratively orderly fashion [Henderson 1972, Hoare 1972b, Ledgard 1973, Naur 1969, 1972, Parnas 1972a-c, Wirth 1971c]. Ledgard provides an excellent, discriminating definition for the terms "top-down programming", "stepwise refinement", and "structured programming". Henderson, however, together with Ledgard's paper, shows that if two programmers solve the same problem, theoretically even with the same problem-solving techniques, the results need neither agree nor even be initially correct. Naur [1972] and Wirth demonstrate that very different problem-solving techniques can yield suitable programs for the same problem.

Parnas, and to some extent Mills, discuss quite a different aspect: program structuring by a supervisor in the interest of having a group of programmers cooperate successfully to create a complex system. This aspect, incidentally, will be the commercially much more important consideration in the long run.

Confusion as to terminology, manner of use, and feasibility rules, but be that as it may, a language designer should consider Dijkstra's approach of top-down refinement, as well as Naur's local refinement in "action clusters" [1969], as extremely enlightening examples of the way in which a programmer might think and then use the programming language. This tool then should support and enhance the thinking process, if possible.

Languages have been designed and implemented which cater to the local and top-down approaches. We already mentioned SUE, and we should add Snowdon's PEARL [1972] as probably the most pronounced example. Freeman [1972] proposed a software laboratory similar to PEARL, geared toward interactive top-down programming.
Knuth [1973] provided an interesting suggestion for the use of SIMULA 67 [Dahl 1966 and in Dijkstra 1970, Ichbiah 1971, Palme 1974] in a structured programming context. His is the best execution of the technique with a contemporary language which I have seen.

Dijkstra's related attack on the harmful goto [1968a] prompted extensive research into the different conceivable control structures. The research ranged from a purely theoretical expressibility approach to attempts at the definition of the most flexible, manageable, and "natural" set of control mechanisms. My references certainly being incomplete, I would recommend [Ashcroft 1971, Foulk 1972, Hoare 1972a, Martin 1973, Nassi 1973, Wegner 1972, 1973, Wulf 1971d] for further study. The most promising results to me seem to be Nassi and Shneiderman's flowcharting technique, for teaching, and Martin's set of control structures, for its completeness according to the very rational criterion of a flowchart of limited complexity. Both techniques demand further study.

Among newer languages we observe a definite trend towards cleaner control structures. Nevertheless, PASCAL and BLISS demonstrate that this can be achieved with or without the goto, although Habermann [1973] took exception. "Structured" subsets of PL/I have been defined [Holt 1973], and even SNOBOL is under revision [Griswold 1974, Abrahams 1974b].

Although sometimes (wrongly) considered synonymous with structured programming, "goto-free" is not the only offspring. Another related idea is "abstract data types", the separation of usage and realization of complex data items. To give the user more than arrays and structures was perhaps first suggested by Balzer [1967]. BLISS and Alsberg's OSL/2 [1971, 1972] allow the user to designate fetch and store mechanisms with the declaration of data items, and to have these mechanisms implicitly be invoked in the general source text. Later languages carry the idea further.
Earley's VERS2 [1973c] is an example of a very high level language with user-conceived data usage mechanisms, and Liskov's operation clusters [1974] seem to follow CLEOPATRA's technique of packaging various type-sensitive routines in separate blocks as the realization of a user-defined new data type, an idea which is also present in SIMULA 67's classes.

Block structure and the name binding it implies have also been under investigation in the aftermath of the arrival of structured programming. While Wulf [1973a] indicts the globally known variable as too unprotected an information carrier, George [1973] and Earley [1973b] suggest very interesting, sometimes dynamic, not-so-global binding mechanisms. In particular, George's ideas should prove to be fruitful in future applications. CLEOPATRA's scope mechanisms anticipated some of Wulf's suggestions.

1.3 Extensible languages.

[Proceedings 1971c], although definitely a state-of-the-art document, is a good indication of the uncertainty as to just what exactly an extensible language should be. The scope of the concept there includes well designed compilers to which a daring user may add his own extensions, as well as such complete environments as the ECL system, which certainly is the most ambitious effort in the area [Brosgol 1971, Holloway 1971, Prenner 1971, 1972, Wegbreit 1971a-c, 1972a-c, 1974]. Better and more critical definitions are provided by Cheatham [1971], and by Hoare [1973a]. Cheatham's definition, however, seems to be a little too restrictive (it accepts ECL as the only extensible language); he demands syntax, control mechanism, and data type extensions, and he states that "there is essentially a complete elimination of compile-, load-, or run-time distinctions." Hoare postulates that only McCarthy's idea of "overloading", i.e., the extension of the meaning of the existing operators of the language, should be admissible in an efficient extensible language.
Perhaps a more workable definition is provided by Schuman [1970], who distinguishes syntactic extensions, mostly by a macro processing mechanism, from semantic extensions, mostly by a data type ("mode") construction mechanism. If we adopt this classification, I need mostly address myself to the latter: this section is only intended to sketch some earlier developments which influenced my design of CLEOPATRA, and not as a comprehensive discussion of the extensive subject area.

Schuman's idea of mode constructors is largely present in the ECL system. A similar extension mechanism for data types is found in ALGOL 68. Both languages support overloading, i.e., the definition of additional meanings for existing operators such as +, -, etc. on arguments of a new type. In the context of ALGOL 68's coercions (implied conversions) the recognition of such new operators is difficult; see [Jorrand 1971].

The syntax extensions of the ECL system are accomplished by modifications to the parse tables of the language. Irons [1970] and Bilofsky [1974] discuss some of the ambiguity problems which this technique can cause. Their IMP family of extensible languages does not support new data types explicitly. Another approach to syntax extension is taken by sophisticated macro and procedure call processors, such as those described by Bowlden [1971] and Garwick [1968]. Garwick's GPL uses the interesting trick of allowing arbitrary words as delimiters between procedure call parameters, so that procedure calls present themselves almost like syntax extensions. The same technique, to a lesser extent, is used in the list oriented language LOGO; a comparison of Newman's technique to add graphing capabilities to LOGO [1973] with Smith's proposed extensions to the fixed syntax of PL/I to obtain graphing mechanisms [1971] demonstrates some of the elegance inherent in extensible languages.

A different approach to semantic extension is pursued by languages like SIMULA 67, OSL/2, and CLEOPATRA.
Instead of treating mode as a primitive and providing operations for it such as union or structure of, these languages allow the creation of a program package to realize a new data type; the package contains declarations which describe the representation of the new data type in terms of a set of data items of previously defined data types, and routines which have access to the representation and which thus manipulate the new data type. The calling conventions differ considerably: Alsberg's OSL/2 permits a certain kind of overloading, while SIMULA 67 and Liskov's language [1974] designate the new operations together with a reference to the name of the encompassing package. CLEOPATRA was designed to use overloading exclusively, and I was pleasantly surprised to find my idea reinforced by Hoare's opinions.

2. The design of CLEOPATRA.

"CLEOPATRA will be used to code operating systems, i.e., production systems. It is expected that "normal" programs written in CLEOPATRA will be lengthy, and that normally compilation, linking, and loading will be distinct events in time.

"Therefore, emphasis in CLEOPATRA design lies primarily on ease of maintenance, execution efficiency, and clarity of systems written in the language, and to a much lesser extent on ease of compilation. Maintaining control is a major consideration; we wish to protect the ignorant coder from himself. However, we will provide the necessary means to eliminate those control options, thus allowing the construction of seemingly efficient "dirty" programs - a practice which we would strongly like to discourage.

"Operating systems require the manipulation of a variety of data items, representing system components like 'task', 'job', 'unit record device', 'reply', etc.
It is not possible to anticipate all such data items, and their aggregates, when a system implementation language such as CLEOPATRA is designed; it may well be the case that such data items and their aggregates receive their final layout well into the actual system design period. Consequently, CLEOPATRA provides user defined data items and aggregates; the usage of such data items through their operators and the definition of each operator can be kept completely independent, so that the redefinition of the algorithm performed by an operator need not affect any code calling the operator.

"CLEOPATRA is to be transportable. Most of the run-time support routines for the system are therefore expected to be written in CLEOPATRA itself. In particular, we expect to supply with the system proper (i.e., inside the coder) only those operators which reflect the target machine's hardware. All other operators will be synthesized in the language, not in the coder.

"Only those "basic" data types and operators are built into the language which correspond to special hardware instructions on the target machine, or which present a major convenience for the programmer. All other operators and data types should be built using CLEOPATRA code. It is expected that a library of commonly used types will be built; provisions should be made in the control statements of the compiler to incorporate such library routines.

"Main features of CLEOPATRA are extensible (i.e., user definable) data types and operators, the ability to create synchronizing primitives, interrupt mechanisms, the ability to create parallel processes, and the ability to create stand-alone programs (i.e., operating systems)." [From the introduction to the Report.
]

I started with the firm intention merely to slightly modify Alsberg's OSL languages [196a, 1971, 1972] so as to eliminate his anticipated need for special hardware features, and then to proceed to implement the resulting OSL/3 language for the IBM System/360, a "conventional" machine as opposed to Alsberg's fictitious implementation for the Burroughs line of machines. One of the first design decisions was to use overloading, i.e., the addition of new semantic interpretations to existing operators, together with Alsberg's access definition technique, to let the CLEOPATRA user design and realize his own data types together with a very conventional usage capability. At this point the possibility of a multitude of 'like' operators of completely different semantic content was discovered, and the need for a drastic decision with regard to precedence and recognition of operators became apparent.

An extended abstract, describing the design at this point, appeared in [Proceedings 1973a], and a number of copies of a (very) preliminary version of the Report were made available to those who requested them. OSL/2 had fairly completely disappeared from the picture, and subsequently I proceeded to carry my design far afield to arrive at completely rethought (and hopefully sometimes more suitable) features for CLEOPATRA. An actual implementation, although its feasibility remained a constant concern in all my discussions with Professor Friedman, my advisor, was judged to be beyond my physical capacity at this time. An implementation attempt is presently being carried out by students under Professor Friedman's guidance, and some feedback was already used in the Report.

Hoare [1973a] distinguishes the designer of a new feature, who "should concentrate on one feature at a time", from the language designer, who "should have the resources to implement the language on one or more machines, to write user manuals, introductory texts, advanced texts; ...
one thing he should not do is to include untried ideas of his own. His task is consolidation, not innovation." I found this article too late - according to this definition I have designed so many features that they make up a new language, which perhaps is unfortunate. On the other hand I believe that I have succeeded in merging a significant number of features into a coherent language for system implementation, features which are either new or have not yet received the kind of attention in this context which I think they deserve.

It is the purpose of this section to sketch the more important ideas and to motivate their inclusion in CLEOPATRA. These ideas deal foremost with the overall structure of a program, with the realization and use of data items, and with the flow of control in a CLEOPATRA program. What CLEOPATRA code actually looks like can be seen from the examples in section 3; what semantic features are available is discussed in tabular form in appendix A; the interested reader is in any case expected to read the Report. Some more details are implicitly supplied in the discussion of a prospective implementation in section 4 of this report.

2.1 Program structure.

On a coding level we distinguish essentially three types of information which the source code, explicitly or implicitly, has to convey: calling conventions and nesting of procedures, available data items, and the actual executable statements. SUE distinguished these information carriers by introducing the context, data, and program blocks, respectively. This coding oriented structuring of information can be particularly helpful for program maintenance and for the sharing of code generation among the members of a programming team.

There is, however, another classification of the source code possible. On a functional level we can distinguish modules that describe algorithms in general, modules which realize the operations on a particular data type, i.e.,
usually a package of algorithms, and, finally, modules concerned with the handling of an exceptional condition. In a parallel programming context, we might further add Dijkstra's "secretaries" or Hoare's "monitors" [1973b], i.e., the serial administrators of a shared resource, etc.

CLEOPATRA adopts and refines SUE's concept of different blocks as carriers of textually different information. First, we distinguish three kinds of data descriptor blocks: the local data_block, which describes all those data items that are local and accessible by only one routine, and inaccessible to any routines nested within; the global_data_block, which describes all data items that are accessible by the given routine and by routines nested into it; and the type_data_block. The type_data_block is really a generalization of the global_data_block; it describes the underlying representation of a user defined data item, and its contents are accessible, with an indication as to which instance of the data type they belong to, throughout the type_pack, the package of algorithms realizing operations on the new data type.

This distinction of data description blocks by the scope of their constituents has two advantages. First, we have influence on the global accessibility of data items, i.e., we can prevent indiscriminate access or inadvertent damage to data items which are crucial for a routine but best hidden from its internal routines. Second, as the type_data_block illustrates and as will be discussed later, the concept generalizes quite nicely; the three types of blocks allow a functional construction of source modules merely by the selection of appropriate constituent blocks.

For reasons of efficiency I made one more amendment to the concept of local and global scope: data items in a type_data_block may be declared with a SHARE attribute.
This opens their scope in a read-only fashion toward the enclosing routine; it is thus possible to efficiently read the status of certain, type_pack-designer selected, data items. Let us now consider the second information carrier. CLEOPATRA provides two kinds of blocks to describe calling conventions and to establish the logical nesting of otherwise physically entirely separate routines: the local structure_block describes the calling conventions for all routines logically nested into the given routine (to which the structure_block belongs). The idea of a routine here is quite general. It includes the conventional procedure and operator as well as the less conventional generation time routine of a user defined type, i.e., essentially the type_data_block, and information on exceptional conditions and on their interrupt handling routines. We also have the global_structure_block, which is used to describe the calling conventions of a package of routines realizing operations on a user defined new data type. This package of routines is known "globally" in the sense that it should be introduced wherever an instance of the data type is available. This is accomplished by the nesting conventions together with a special scope rule applying to the contents of the global_structure_block as described below. A number of blocks, usually a data block, a structure block, and a routine block together, are termed a configuration. A configuration usually describes an algorithm and may provide its services to other configurations, or it may call on the services of some other configurations. CLEOPATRA nests configurations in the same manner in which ALGOL nests its procedures, except that the nesting is logical - a configuration is nested into another if its structure block entry is in the structure block of the other configuration - rather than physical.
Hence, either type of structure block, in addition to listing the calling conventions, describes the nesting of configurations. Before we discuss configurations more closely, let me mention the third information carrier, the routine block. A routine block is composed of statements; it contains the executable code of a configuration. Here, too, we distinguish a number of different kinds of blocks: operators and procedures merely differ by their calling conventions, conversions are a special kind of unary operators, and all three return a value. Interrupts are invoked as a result of an exceptional condition; they may have parameters, but they do not return a value (they may change their parameters). The separation of information into different carrier blocks has two results. First, it makes the relevant information content easily distinguishable, it allows very distinct compiling techniques, and it admits very clear rules governing referencability (all things in CLEOPATRA are defined before they are used, and only structure blocks may define routines so that they can be used before they have actually been elaborated; the complete listing of the calling conventions by the structure block entry allows a complete verification of the calls at compile time). Second, the combination of selected different carrier blocks, essentially by a common name, imposes a distinct functional meaning for the combination. I believe that this is the point where the separation really pays off. The ability to combine blocks into a configuration is not carried to the extreme of complete orthogonality: only specific combinations make sense and are admitted. At present, CLEOPATRA recognizes three such configurations: algorithms, type_packs, and interrupts. An algorithm and an interrupt both describe some action, potentially on some data items, possibly while employing other configurations.
They therefore use a (local) structure_block, a data_block, a global_data_block, and one kind of a routine block. Only an interrupt can use an interrupt routine block; in general, the type of the routine block will determine the structure block entry (the calling sequence) for the configuration. A type_pack has two jobs: it must describe the underlying representation for an instance of the data type, and it must make the representation available to a number of algorithms (or interrupts) which realize operations on the new type. Conversely, these operations must be made known to the owner of the type_pack, i.e., to the configuration in whose structure_block the type_pack is recorded. The first task is handled by allowing the type_pack to own a (local) data_block (to describe some of the generation time parameters, if any) and a type_data_block to specify the underlying representation. The elaboration of the initialization requests of the type_data_block is termed the generation time routine for the type, and the type_data_block therefore may have parameters, i.e., the generation time parameters. Furthermore the type_pack owns a global_structure_block listing all routines usable not only within the type_pack but also by the encompassing configuration. With this special scope rule we can pass the ability to call type-sensitive routines (which by their inclusion into the type_pack have access to the underlying representation of a type) to the point where the type is used. The type_pack finally may own a structure_block to describe global utility routines inaccessible to the outside, but it may not own a routine block. One could consider (as Alsberg did) admitting a routine simply to have a more elaborate initialization process. Let me summarize: we have three basic blocks of source information, carrying data, linkage, and algorithmic information.
We compose three types of functional modules, to realize algorithms, interrupt handling, and operations on a user defined data type. The functional modules, configurations, are created as combinations of the source blocks, and there is an interplay between the types of blocks participating in the configuration, the functional result of the configuration, and the linkage description by the structure block entry for the configuration. It is expected that this rigorous but uniform approach will aid in the design and self-documentation of programs, and that it will enhance understandability and maintenance by delimiting the kind of information presented in the context of each source block.

2.2 Data processing.

This section is concerned with some aspects of data manipulation. I will try to exhibit and motivate the basic features of data as reflected by their use in expressions. Let me begin with a discussion as to what constitutes a data type, i.e., how much information about a data item is abstracted when it is encountered in context. I recall two productions from the Report; the first one describes what a recognized type is, and the second one describes what it takes to create an instance of a type.

    (4.4) ref_type ::= { basic_ref_type | type_name |
                         { ALIGNED | COMPACT } ( ref_type { , ref_type }• ) }
                       [ { ALIGNED | COMPACT } integer EXTENTS ]
                       [ ALIAS identifier ]

    (4.5) type ::= { basic_type | type_name [ parameters ] |
                     { ALIGNED | COMPACT } ( type { , type }• ) }
                   [ array ] [ ALIAS identifier ]

In agreement with Hoare [1973a], CLEOPATRA is strongly typed, i.e., with but very few exceptions there is never an implied conversion. This is not quite as restrictive as it may seem, since operators are recognized by their name in connection with their principal argument types, so that "mixed mode" expressions are possible as long as the "mixed mode" operators have been defined.
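The idea that operators are recognized by their name together with their principal argument types, with no implied conversions, can be sketched outside CLEOPATRA. The following Python sketch is illustrative only; all names in it are invented for the illustration and are not from the Report:

```python
# Illustrative sketch (not CLEOPATRA): operators are looked up by
# (name, argument types); a "mixed mode" use works only if such an
# operator was explicitly defined -- there is never an implied conversion.

ops = {}  # (operator name, tuple of argument type names) -> implementation

def defop(name, *argtypes):
    def register(fn):
        ops[(name, argtypes)] = fn
        return fn
    return register

def apply_op(name, *args):
    key = (name, tuple(type(a).__name__ for a in args))
    if key not in ops:
        # strong typing: reject the call instead of converting implicitly
        raise TypeError(f"no operator {key} defined")
    return ops[key](*args)

@defop("+", "int", "int")
def add_ii(a, b):
    return a + b

@defop("+", "int", "float")        # a "mixed mode" operator, defined explicitly
def add_if(a, b):
    return float(a) + b

print(apply_op("+", 2, 3))         # -> 5
print(apply_op("+", 2, 0.5))       # -> 2.5
```

Note that `apply_op("+", 0.5, 2)` fails in this sketch, since only the int-first "mixed mode" operator was defined; every admissible combination must be provided by the user.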
Production (4.4) above should give an impression of what contributes to a recognized data type. There are three components that determine the recognition of a data type: first, the primitive type, as indicated by the type_name. It may be one of the predefined data types that are, essentially, machine specific, or it may be a user defined data type. The second component is the aggregation of a number of these types into a data_group, and the fact whether this group is aligned to (hardware determined) divisible boundaries, or whether the group is packed as tightly as possible. Here I follow the SUE language: for some problems in system implementation it is essential to be able to pack data items tightly, with no regard to alignment, and this is best discussed at the level of aggregating data items into a structure, record, or data_group (the names are more or less synonymous). We must distinguish alignment as part of the ref_type, so as to be able to generate efficient code. Types with the ALIGNED attribute will match COMPACT types, but not conversely. Finally, the last component of type recognition is concerned with the aggregation into arrays. Here we again recognize COMPACT packing of the array elements, and we also recognize as distinct arrays with a different number of extents. The latter is necessary to allow the user to write operators which are, for example, sensitive to matrices and vectors, and it frees us from the worries of APL, where, essentially, any variable must be carried with a dope structure to allow for it to be used as an array. Production (4.5) indicates one more important detail about CLEOPATRA data: the user, when creating a new data item, must specify or imply parameters for the generation time routines for the type. Such parameters typically govern for example the maximally anticipated length of a CHARACTER item, or the highest number of elements which a stack may have, etc.
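Generation time parameters behave much like constructor arguments in later object-oriented languages. As a hedged analogy only (the class and names below are invented for the illustration, not taken from the Report), the idea can be sketched as:

```python
# Illustrative analogy (not CLEOPATRA code): a generation time parameter
# fixes a property of one instance when it is created -- here, a stack's
# capacity -- while differently parameterized instances remain the same
# recognized type.

class Stack:
    def __init__(self, max_elements):   # the generation time parameter
        self.max_elements = max_elements
        self.items = []

    def push(self, item):
        if len(self.items) >= self.max_elements:
            raise OverflowError("stack is full")
        self.items.append(item)

    def pop(self):
        return self.items.pop()

small, large = Stack(2), Stack(1000)
# Instances with different generation time parameter values are not
# recognized as different types:
print(type(small) is type(large))       # -> True
```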
Some types, of course, may not require generation time parameters. Our treatment of recognized types is such that we do not require instances of the same type with different generation time parameter values to be recognized as different. The entire design of data types was governed by the intention to do all type checking at compile time, and to keep the entire feature controlled in the sense of a "safe" language (section 1.1). It is for this reason that we have excluded the ability to define unions of types, and operators on them, where a dynamic decision would have to be made which of a set of operator encodings to invoke. This, however, prohibits us from defining generalized access methods, such as a stack, without simultaneously defining the type of the constituent, as is possible in Liskov's language [1974]. Liskov claims that they expect to eventually be able to verify semantic consistency at compile time - at this point, CLEOPATRA may undergo some revisions. Let me close this section on data with a few minor remarks on further features. First, CLEOPATRA supports row-major and column-major storage schemes for arrays, and a very elaborate crossectioning and reshaping mechanism. Admitting both storage schemes was done in order to allow teams of users to share their arrays and array-sensitive routines without copying of the arrays, but it will require a run-time routine for array accessing, rather than in-line code. The decision should not prove to be too inefficient in view of my firm conviction that CLEOPATRA should check and enforce all subscripted references, which is best accomplished by a central subroutine, rather than by voluminous code in-line. Second, I have attempted in the Report to give a careful definition as to what constitutes a modifiable "storable" reference. The definition is aided by a consistent labeling of principal arguments for operators which are expected to be modified with the colon ":" as a special delimiter.
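The central, checked array accessor suggested above for both storage schemes can be sketched in a few lines. This is a minimal illustration of the idea, not the Report's mechanism; the function and parameter names are invented here:

```python
# A minimal sketch of one central run-time accessor that enforces every
# subscripted reference and serves both storage schemes: the element
# offset is computed from a dope-vector-like description of the extents.

def offset(subscripts, extents, row_major=True):
    if len(subscripts) != len(extents):
        raise IndexError("wrong number of extents")
    for s, e in zip(subscripts, extents):
        if not 0 <= s < e:                # check and enforce the reference
            raise IndexError(f"subscript {s} out of range 0..{e - 1}")
    off = 0
    dims = range(len(extents))
    # row-major: last subscript varies fastest; column-major: first does
    order = dims if row_major else reversed(dims)
    for d in order:
        off = off * extents[d] + subscripts[d]
    return off

print(offset((1, 2), (3, 4)))                    # row-major:    1*4 + 2 -> 6
print(offset((1, 2), (3, 4), row_major=False))   # column-major: 2*3 + 1 -> 7
```

Centralizing the check in one such routine, rather than emitting it in-line at every reference, is exactly the efficiency trade-off discussed above.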
This results, in particular, in the familiar notation of ":=" for right to left assignment, assuming, of course, that "=" is used to denote assignment. More importantly, however, the consistent requirement to have to define parameters as to be modified, i.e., passed BY ADDRESS, enables the compiler easily to store-protect all other values, and as a trivial byproduct, a CONSTANT attribute for declarations is defined. This convention, too, should aid in the creation of correct programs. The two most drastic innovations are a precedence-free right to left evaluation of all expressions, and a default initialization of all variables (except for use of the very privileged direct access to memory; see section 9.2.1 of the Report and section 3.3 below). The elimination of operator precedence was considered necessary so as to free two users who use the same operator symbols with different precedences from this error-prone consideration. I generally prefer for an operator oriented language no precedence to a definition of precedence when the operators are defined. An absolute precedence numbering will be too restrictive, and a relative scheme [Suzuki 1971, Yoneda 1971] is incomplete. Right to left execution, unfortunately, is imposed by my habit of appending unary operators to the left, not being in homological algebra anymore. Default initialization, although perhaps expensive, seems to be another feature that can aid in the creation of correct programs, and in an increase of security. I decided to opt for an actual physical initialization, although Earley's idea [1973a] of an "uninitialized" status for references looks attractive. As Earley admits, however, it may require a special tagging mechanism for its implementation. Initialization and the related environment inquiries [Naur; compare ALGOL 68] will most likely pose a hard but necessary implementation problem.

2.3 Control structures.
Some new ideas about control flow are another feature of CLEOPATRA which I hope will be considered useful for the creation of clean programs. CLEOPATRA does not have a goto statement; I have instead included a decision table as a very powerful mechanism for selective execution, and a fairly general iterate statement for repetitive execution. Let me discuss these in turn. Decision tables have been used for business data processing applications for many years, mostly together with fixed field special coding forms for COBOL programs; [Proceedings 1971a] may be consulted for an overview. Probably because of the required fixed format, decision tables do not seem to have made an appearance in free format, block structured languages, BABEL being the notable exception [Scowen 1971a-c, 1973a]. I would think that decision tables, which contain Hoare's case statement as a special case, should be an ideal control structure, since they provide a high level of top-down execution discipline, as well as a large potential for optimization of the code controlling the flow of execution once we apply techniques similar to those used for the simplification of logical networks. CLEOPATRA provides for selective execution the if then else statement, since it was recognized that this is such an often used special case of a decision table that it merits a special syntax. The main structure for selective execution, however, is a free format, extended entry decision table. It consists of a decision part, where in one or more decisions "switches" are set to be either true or false. Decisions can set a single switch, based on the evaluation of, essentially, a logical expression (which may have side-effects), or they can set one of a set of switches (a case statement) depending on a subscript pointing into the array-formatted set. The decision table then has an action part, where in one or more actions statements are executed.
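The decision and action parts just described can be sketched outside CLEOPATRA. The following Python sketch is an illustration of the control idea only, not of CLEOPATRA's syntax; it also anticipates the else part, and all names in it are invented:

```python
# Illustrative sketch of an extended entry decision table: the decision
# part sets boolean switches once; each action runs if its switch
# expression over those (read-only) switches is true; an else part is
# selected only when no other action was.

def decision_table(decisions, actions, otherwise=None):
    switches = {name: bool(test()) for name, test in decisions.items()}
    fired = False
    for guard, action in actions:
        if guard(switches):          # switches cannot be changed by actions
            action()
            fired = True
    if not fired and otherwise:
        otherwise()
    return switches

log = []
decision_table(
    decisions={"neg": lambda: -3 < 0, "big": lambda: abs(-3) > 100},
    actions=[
        (lambda s: s["neg"] and not s["big"], lambda: log.append("small negative")),
        (lambda s: s["big"],                  lambda: log.append("large")),
    ],
    otherwise=lambda: log.append("other"),
)
print(log)   # -> ['small negative']
```

Because every switch is evaluated exactly once before any action runs, the set of guards is open to the kind of logical-network simplification mentioned above.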
Switches cannot be changed in any action, and their combination, a switch_expression with a number of logical operators, determines whether or not the statement of the action labeled by the switch_expression is executed. Finally a decision table may have an else part, an action to be selected if no other action is selected. The decision table will require a new approach to coding, but it was my experience that it can produce a clean and efficient flow of control. It is more general than the case statement, but it still has the property of a controlled top-down execution flow, which the goto statement lacks. Repetitive execution is controlled by the iterate statement. (CLEOPATRA provides synonyms for many keywords; one synonym for ITERATE is, of course, DO.) In its design I essentially followed Hoare [1972a], but I did generalize slightly. The user can specify conditional or unconditional repetition, controlled by a while clause checked to be true before each iteration, and by a when clause checked to be false after each iteration. (The termination of the loop reads WHEN ... END, and the phrase must be true to prevent further iteration. I hope that this idea, as many others, has a mnemonic value.) A loop may also be preempted by an exit statement. In its design I followed Wulf's considerations of the shortcomings of BLISS's escape mechanism [1971b]: my exit terminates the closest compound, or the explicitly mentioned labeled compound. It is still a static, compile-time bounded mechanism, and as such it should admit a very efficient implementation. Let me now return to Hoare's ideas on the for statement: we should provide for a finite, predetermined number of iterations, and the index controlling the iterations should not be modified from within the loop. It turned out to be quite simple; the index must be an INTEGER in the (local) data_block, thus it is unknown to deeper nested configurations which might be called from within the loop.
Additionally the index is termed CONSTANT within the loop, i.e., the compiler can protect it from being passed as a storable reference. The index ranges over a predetermined set of values, from either its present (possibly default-initialized) value, or an expression which is evaluated at that time, and up to or down to a terminal value, also just once evaluated at loop initialization time. The user can define a step size, which may or may not be in the proper direction of the range of the loop. One hopes that in this case the user is smart enough to terminate his loop properly. The latter is not such a clean decision. It arose as a compromise between performing the dynamic check for each loop initialization and reporting it as a condition (one of the problems with errors is the absence of a standard device for a report), or eliminating the user defined step altogether. I feel that if the user designates a step, it is his responsibility to terminate the loop properly. This seems to me a reasonable compromise between total security and efficiency of the range-checking code. Rules for the cooperation of the various control mechanisms for a loop are stated in section 7.2.2 of the Report, and in particular there is a clear rule determining the value of the index when the loop is terminated; this value is made available outside of the loop since I feel that it will sometimes be used in further processing; here CLEOPATRA differs for example from ALGOL 68. Summarizing, I would consider the decision table as a powerful control mechanism whose properties and implications need to be tried, explored, and considered for future applications, and I consider the iterate statement as a workable version of the many do loops in existence. I was amazed how easy it was to have a well protected index variable in the context of the scope mechanisms of CLEOPATRA.

2.4 A summary of local innovations.

I do not wish to overly extend this section.
Let me therefore just give a list of other items in the Report which I feel have been rethought and which should merit further consideration. The list is presented in the order in which the items appear in the Report; the numbers on the margin indicate the page numbers of the Report.

10  The distinction between a delimiting_character and a special_character is made to allow a clear definition (which, unfortunately, still resorts to the concept of a lexical analyzer) of tokens and their possible separations.

11  The comment feature agrees with Scowen [1973b] in that major comments are delimited by unbalanced and frequent brackets (the semicolon ";"), in that in-line comments are available, and in that comments are legal anywhere. I would like to consider Knuth's suggestion [1973] when he recommends that in-line explanations of all variables should be mandatory. This could be facilitated by a convention that an in-line comment delimiter when appearing on the same line with source code should act as a statement separator; this would avoid a number of unpleasant "; !" combinations.

12  The minus_symbol recognition is not handled adequately. I would much prefer APL's convention of "-" vs. "¯", but this is not readily available on the customary input equipment. Allowing constants to be written in various bases might be helpful for composing extreme patterns. The base designators, as well as most of the constant type designators, are borrowed from assembler language conventions.

16  It was a difficult decision whether to have LONG or SHORT integers. The decision was finally made in favour of LONG integers, since I felt that this would considerably compact array representations (it limits them too, but I do not consider this a serious practical limit), and it eliminated a problem in dealing with the lengths of CHARACTER values.

17  The decision to choose "P" to distinguish real number exponents, rather than "E" for REAL and "D" for LONG_REAL values, was difficult.
I still would like a better and equally unambiguous representation. ("P" stands for power.)

18  Decimal numbers are one of the worst features of the IBM System/360 hardware. But so it goes. They have definitely added their own level of idiosyncrasies to the language.

19  The representation of literal constants is a consistent attempt to define them with a frequent bracket as a delimiter, and to use a reasonable grammar based on a single control character. The fact that there are no CHARACTER values of fixed length (although a clever implementation may provide them internally) may yet prove to be an oversight with respect to space economy. Usage should analyze this problem.

21  System supplied quantities are an attempt both to provide environment inquiries (and a more complete list could be obtained from ALGOL 68), and to provide a basic set of more or less generic values. In the strongly typed language, these values, unfortunately, must be presented in a typed format.

24  Configurations have been discussed in section 2.1 of this report.

27  The rules for recompiling individual blocks show how localized the information in each separate block is. Actual usage must show how much optimization of the recompilation effort is sensible.

29  I have made an effort to allow identical names to be used with similar scopes but for different purposes. The effort is intended to minimize the need to watch out for an identifier conflict in larger programs; but the idea may backfire in the same sense in which global variables bridge definitional gaps for local variables. Actual usage must show whether or not my distinction of identifier classes is workable.

31  CLEOPATRA completely abolishes the physical nesting of procedures. This was done in favour of the well structured approach provided by the blocks and configurations. Actual experience with the language must show whether this physical separation enhances understandability or whether it hinders it.
Structured programming as I understand it should be facilitated by this decision.

32  The decision not to monitor recursion is dangerous, but I find the implementation of recursion monitoring about as expensive as the implementation of recursion. An exception must be made in the context of hardware interrupts, but there the hardware can be called upon to prevent recursion.

34  The compilation environment request is intended to provide almost automatic and finely tuned library services for the compilation. We seem to have developed a reasonable secondary storage handler for our prototype compiler, but again actual usage will have to demonstrate the feasibility.

38  The decision to have a BYTE have a positive numerical value only was made in the interest of an efficient implementation and in order to simplify CHARACTER comparisons on a numerical basis. Whether or not the SIZE condition can be monitored efficiently for BYTE values needs to be investigated.

39  POINTER values and their merits are extensively discussed in the Report and below in section 4. The decision to make POINTER values sensitive to the types of their arguments is obvious in the interest of maintaining control; still, the inclusion of POINTER values "has been a step from which we may never recover." [Hoare 1973a]

48  ALIAS names are an attempt to simplify the coding for the user. To some extent, of course, they mirror the behaviour of current naming conventions in OS/360.

51  The concept of BUILT IN procedures, especially in the context of ALIAS names, needs further analysis. It effectively counteracts transparency considerations, and it is similar to attempts to reestablish access to lost global variables. Its merits are mainly efficiency considerations, and the possibility of redefining existing procedures without causing unwanted recursion.

52  Keyword parameters are motivated by assembler macro coding techniques, as are my ideas about omission of parameters in procedure calls.
The quote "'" as a separator for the keyword is not quite a satisfactory solution. The admittedly unfamiliar delimiter rules for operator invocations are the consequence of allowing the omission of parameter lists together with the possibility of returning array values. We searched for a considerable amount of time for an alternative solution which equally signals the absence of an optional parameter list. The only "syntactic sugar" is the clean extension of the delimiting "'" into the delimiting ":" if the argument is passed BY ADDRESS; the whole concept of these mostly mandatory extra delimiters may well turn out to be "syntactic rat poison" [Hoare 1973a, who attributes this term to Peter Landin]. At least additional periods should not hurt.

54  The END phrase could in general be taken for an automatic request that a procedure return a NIL value, or that an interrupt RESUME, but in the spirit of Hoare's indictments of the abuses of the labeled END statement (which CLEOPATRA does not commit) I prefer my decision to have the execution of the END phrase signal an error. The syntax of structure block entries, routine headings, and routine calls is carefully arranged to be as similar as possible, while conveying the necessarily very dissimilar information. This should aid in using routines.

57  Not to signal omitted parameters more explicitly than by juxtaposed delimiters may well be another case of "syntactic rat poison" but it does clean up and shorten the code.

61  CONVERSION routines are an attempt to create names for unary operators which make the source code as functionally descriptive as possible. They are also (because of the necessary ALIAS names to handle data_groups cleanly) an open invitation to write overly compacted code. I can but appeal to the better judgment of the users. Admittedly, the naming trick was too nice to omit.
64  I believe that the design of data_groups allows a very concise and controlled layout of a data area, but it does admit some fairly inefficient and inaccessible constructs. Here, again, I would hope that the user would only employ the full power of the feature to advantage and not for trickery. The SHARE option really is intended for efficient information gathering, but it obviously also invites information leakage.

71  The array mechanism with all its extraction and reshaping power may turn out to be a functional elephant. On the other hand I firmly believe that for example the inability to destroy the dimensional association of the elements will improve the clarity of the encoded algorithms. Crossectioning is too powerful and useful a mechanism (and it is very efficiently implemented at the dope vector level) not to be supplied. Admitting both storage schemes, although inefficient, will eliminate the strange complications when communicating between FORTRAN and PL/I. The ability to redefine bounds is necessary to create clean and protected code. It, too, is implemented very efficiently at the dope vector level.

77  The distributive law for operators with respect to a data_group is dangerous, although efficiently implemented. I would hope that the compiler would give a clear indication as to what code actually was created.

83  The COPY option for parameters, i.e., the possibility to create a RETURN VALUE parameter convention similar to ALGOL W [Wirth 1966], is a necessity not exactly dictated by elegance. We unfortunately have to overcome the dereferencing problem for pointed objects, the problem of ill-aligned values, and the problem of improperly sized decimals. The compiler can do this slightly more efficiently than the user, which is where the COPY option comes in. Implementation and usage will have to provide further data on this decision.
85  I think that the syntax of extracting crossections or elements of an array, as well as reshaping, has an elegant solution. Those are definitely two cooperating mechanisms and I found it easiest to separate them into two parameter lists for the mechanisms. That extraction precedes reshaping functionally as well as in the right to left execution scan I consider true "syntactic frosting".

90  The temporary array mechanism provides an elegant solution to the problem of functionally creating an array constant.

103  My ideas on statement separation, i.e., to reserve the statement keywords and then to force the user to only supply the minimal amount of statement separators in the form of ";", is another one of those close decisions between aiding the user by a tolerant compiler and creating confusion. The coding convention would be to introduce statement separators liberally; the language definition in this context should be an aid to the negligent and not a challenge to the keypunching economist.

104  Exceptional conditions were added after the general purpose part of the language was frozen. I think my proposal meshes nicely with the general syntax of the language. In particular the idea to have interrupts admit a type-sensitive argument would be an elegant solution to the problem of the potential multitude of basic conditions such as SIZE or OVERFLOW. The corresponding ability to enable ALL like-named interrupts will only prove successful if the compiler provides a lot of helpful hints.

109  The ASSERT condition, found in a number of languages, is a reasonable approach to debugging and the problem of program correctness, but its resolution needs a lot of further thought. ALGOL W's termination of a program once the condition is raised seems to be slightly unsatisfactory.

110  Section 9 as a whole will be subject to further study when it is actually put to use in designing an operating system.
In particular, the idea of having a language to control the channels should be pursued further.

129  The list of basic operators is incomplete; missing are for example operators on arrays, and logical operators on integer values (see section 3.2 for an example of the latter). Some of the innovations are: CEIL, FLOOR, ROUND, TRUNC as unary operators with positional parameters indicating after what digit the truncation is to take place (we might even consider another argument to denote the base for this count), and ROOT as an operator for all roots. All CHARACTER operators, in particular the substring mechanism realized by two binary operators, the character interrogation operators, and the general interconnection between CHARACTER values and their internal representations in BYTE arrays for comparison and conversion. Among the arithmetic operators, the sign-preserving exponentiation "**" needs to be mentioned. The distinction between preemptive English names for the logical operators, and non-preemptive symbols, should prove efficient.

3. Two examples.

Implementation languages are employed to code large software systems. Examples of their use therefore necessarily must be somewhat large in order to be realistic. In selecting the two examples for this section I was guided by Project Rosetta Stone [Wulf 1972a] which attempts a comparison of implementation languages by means of a code-off, i.e., by requiring that a few "typical" medium scale problems be coded in each language to participate in the comparison. The coding examples which I selected here are, first, a symbolic differentiator, and second a "buddy system" storage allocator. The first example exhibits typical top-down application programming. The main routine is constructed first, assuming and employing suitable abstract new data types to realize algebraic expressions.
The bulk of the implementation work subsequently is concerned with the realization of operations on these data types, in a well localized and manageable fashion. The second example shows one of the key parts of an interrupt driven operating system. The Report has only made a preliminary proposal for an interrupt system for CLEOPATRA. My solution for the second problem therefore serves to elaborate on this proposal and is not necessarily a generalized example.

Operational code is but one of the consequences of a programming language. Operational code together with a line by line discussion of the language features involved can explain the exposition of algorithms. More important than exposition, though, is usually the creation of an algorithm or its implementation. Following Naur [1972] I will therefore try to show how I arrived at the solution for the first example. In order to establish a basis for communication, however, I shall first present the completed solution. Quotations in this section are taken from [Wulf 1972a]; the solutions have essentially been contributed to Project Rosetta Stone [Friedman 1974].

3.1 A symbolic differentiator.

Programs for the symbolic differentiation of mathematical expressions have been coded for many years. Knuth [1968] reports that they have been written as early as 1952. He describes an elegant solution in assembly language which proceeds iteratively over the given expression in postfix order. The mathematical definition of the solution, however, is essentially recursive, and I chose to model it rather closely.

3.1.1 Problem statement.

"Write a procedure, DERIVATIVE(F), which accepts an expression F involving a "symbol" (i.e., variable) x, real constants, and the operators +, -, *, /, ln, and pow, and produces (returns) an expression which is the derivative of F with respect to the variable x. No simplification of the resultant expression is required.
"Symbolic differentiation involves application of the following recursive rules: (D(f) means "derivative of f with respect to x".)

      D(constant) = 0 ; D(x) = 1
      D(f±g) = D(f) ± D(g)
      D(f*g) = f * D(g) + D(f) * g
      D(f/g) = (D(f)*g - f*D(g)) / pow(g,2)
      D(ln(f)) = D(f) / f
      D(pow(f,g)) = D(f)*g*pow(f,g-1) + ln(f)*D(g)*pow(f,g)

"The expression F is an arbitrary (but finite) mathematical expression involving x, constants, and the operators listed. Its precise form ... is optional, but no restrictions on its length or complexity may be assumed. The result expression is to have the same form as F."

3.1.2 Solution.

The code presented here is annotated by comments as numbered on the left margin. The comments which follow the code are intended to be a short course on CLEOPATRA, rather than an in-depth explanation of the simple-minded algorithm.

 1    STRUCTURE symbolic_differentiator ;
 2    ! Symbolic differentiator ATS 6/74
 3    TYPE formel ; TYPE variable ;
 4    de: OPERATOR derive (variable). formel RETURNS formel
 5    END symbolic_differentiator

 6    GLOBAL STRUCTURE formel ;
 7    ad: OPERATOR formel + formel RETURNS formel ;
      su: OP formel - formel RET formel ;
      mu: OP formel * formel RET formel ;
      di: OP formel / formel RET formel ;
      po: OP formel ** formel RET formel ;
 8    ln: OP log formel RET formel ;
 9    le: OP left ALIAS num formel RET formel ;
      ri: OP right ALIAS denom formel RET formel ;
      ar: OP arg formel RET formel ;
10    ty: OP type (variable). formel RET INTEGER ;
11    fo: CONVERSION TO formel FROM INTEGER ;
12    te: OP term (formel, formel). CHARACTER RET formel ;
13    as: OP formel BY ADDRESS := formel RET formel
14    END formel

15    GLOBAL STRUCTURE variable ;
16    PROCEDURE fill (variable BY ADR, formel, formel, CHAR, INT). RET variable ;
17    in: OP variable BY ADR :INIT INT RET INT ;
18    eq: OP variable == variable RET BIT
      END variable

19    DATA de ;
20    variable x ; formel f
21    END de

22    de: OPERATOR derive (x). formel f ;
23    DECISION
24    con, var, sum, dif, pro, quo, pow, log: type (x).
f
25    ACTION
26    con: RETURN formel 0 ;
      var: RETURN formel 1 ;
27    sum: RETURN (derive (x). left f) + derive (x). right f ;
      dif: RETURN (derive (x). left f) - derive (x). right f ;
      pro: RETURN ((derive (x). left f) * right f) + (derive (x). right f) * left f ;
      quo: RETURN (((derive (x). num f) * denom f) - (derive (x). denom f) * num f) / (denom f) ** formel 2 ;
      pow: RETURN (((derive (x). left f) * right f) * (left f) ** ((right f) - formel 1)) + ((log left f) * derive (x). right f) * f ;
      log: RETURN (derive (x). arg f) / arg f
28    END
29    END de

30    GLOBAL DATA variable ;
31    SHARE formel l, r ;
32    SHARE INTEGER t INIT 1 ;
33    SHARE CHARACTER (10) symbol INIT C.x
34    END variable

35    DATA fill ; variable v ; formel f, g ;
36    CHARACTER s INIT C.x ;
37    INTEGER i INIT 1
38    END fill

39    PROCEDURE fill (v, f, g, s, i) ;
40    v.symbol := s ; v.t := i ; v.l := f ; v.r := g ;
41    RETURN v
42    END fill

      DATA in ; variable v ; INTEGER i END in
43    in: OPERATOR variable v :INIT INTEGER i ;
44    RETURN fill (v, , , CHARACTER i)
      END in

      DATA eq ; variable u, v END eq
      eq: OPERATOR variable u == variable v ;
45    IF (u.t == v.t) AND (u.symbol == v.symbol) AND (u.t < 3) AND v.t < 3
46    THEN RETURN TRUE ELSE RETURN FALSE
      END eq

47    GLOBAL DATA formel ;
48    POINTER (variable) ptr ; DEFER variable var
      END formel

      DATA ad ; formel x, y END ad
49    ad: OPERATOR formel x + formel y ;
50    RETURN term (x, y). C.+
      END ad

      DATA ln ; formel x END ln
51    ln: OPERATOR log formel x ;
52    RETURN term (x). C.log
      END ln

      DATA le ; formel x END le
      le: OPERATOR left formel x ;
53    RETURN x.ptr.x.var.l
      END le

      DATA ty ; variable x ; formel f END ty
54    ty: OPERATOR type ...
      ...
      UNTIL c == operators (i)
63    END
      ALLOCATE r.var FOR r.ptr ;
      fill (r.ptr.r.var, f, g, c, i) ;
      RETURN r
      END te

      DATA as ; formel f, g END as
      as: OPERATOR formel f := formel g ;
      RETURN f.ptr := g.ptr
      END as

Comments.

1 This (more or less fictitious) outermost structure_block describes the calling conventions for the main routines available in the program.
It would probably include further routines to read and write formel values, and it would belong to a configuration which would probably implement a command interpreter for a formel manipulation system.

2 An in-line comment. More elaborate comments have the form "COMMENT ...text... ;".

3 The example will use two abstract data types, formel and variable. A formel represents an algebraic expression and it is built from many variable values. A variable value is either a terminal operand, or it is an operator. In the latter case, it will connect to further (sub-) formel values.

4 The coding example is only concerned with the realization of this differentiation operator. Its structure_block entry describes the calling sequence: the unary derive operator has a principal argument of type formel and an optional parameter of type variable. The latter will designate the symbol with respect to which we differentiate. The operator returns a formel value.

5 designates the end of the structure_block listing the calling conventions for the outermost routines.

6 This global_structure_block belongs to the configuration which realizes the data type formel. Routines listed in this block are available wherever the type formel is introduced; the declaration of the type (item 3) reports the routines out into the predecessor configuration tree, and makes the type_pack internal to the symbolic_differentiator configuration.

7 CLEOPATRA supports overloading, i.e., the addition of new semantic interpretations to existing operators. This set of structure_block entries defines arithmetic on formel values by means of the usual binary operators.

8 log becomes a unary operator. Due to overloading the operator name log alone is not sufficient for an easy identification of the parts of the configuration for compilation purposes; instead, a separate operator_link ln is introduced.

9 These three operators implement access to operands.
Observe how the ALIAS names numerator and denominator are used to illustrate the idea more closely.

10 This operator interrogates the type of a formula, i.e., it returns an indication of the character of the outermost constituent.

11 CONVERSION is a special notation for unary operators. The presented CONVERSION will serve to create formel representations of some INTEGER values.

12 This unary operator has two optional parameters in addition to its principal argument. It will serve to create a formel representation of a combination of the principal argument, which will denote the printable representation of a unary or binary operator, with one or two formel arguments provided by the parameters.

13 This binary operator lets us assign a formel value to a formel variable. The variable must be a "storable reference" (it is to be modified); this is indicated by the special delimiter ":", and by requesting that the left hand argument be received BY ADDRESS.

14 designates the end of the global_structure_block.

15 variable values implement formel values. In this global_structure_block we describe the routines that let us manipulate variable values.

16 This procedure allows us to enter information into the variable specified as first parameter.

17 This routine will be used to initialize a new variable to represent an INTEGER value. We are overloading the INIT phrase of a declaration.

18 This operator serves to compare two variable values. It will return TRUE iff the two arguments are both terminal operands and equal.

19 This data_block describes data items available only to the de configuration, i.e., to the implementation of the derive operator. Notice that we describe the differentiation symbol as a variable. This is why the type variable cannot be internal to the type formel. [I could have masked some handling routines such as items 12 or 16 by declaring them in local structure_blocks so that they become internal to the type_packs.]
20 x is the differentiation symbol; it is defaulted to be the CHARACTER C.x (item 33), and f is the formel to be differentiated.

21 designates the end of the data area.

22 designates the beginning of the routine_block. For each variable in the original formel we will call recursively on this operator to obtain the derivative of the variable; then we combine the information to construct the derivative of the formel. Recursion ends at terminal operand variable values.

23 A DECISION table serves as the main control structure in CLEOPATRA.

24 This form of a decision implements a case selection: the list of switches is numbered (by default) from 1 on the left on up; the INTEGER expression selects one or none of these switches and sets it to TRUE, and all other switches to FALSE.

25 designates the beginning of the action part. All switches are now frozen and read-only.

26 If f is a constant, type will return a "1" in item 24, and thus con will be true. In this case the present action will be carried out, i.e., the derivative of a constant is evaluated to be "0". This item also illustrates the application of a CONVERSION: "0" is an INTEGER, and it is converted into a formel before it is returned.

27 illustrates the first recursion. Note that in CLEOPATRA expressions are evaluated right to left. The operator + has two formel arguments; hence, a call will be compiled to item 7.

28 designates the end of the DECISION table.

29 designates the end of the routine. An error condition would result if control reached this statement: this is only possible if no switch was selected in item 24, which in turn corresponds to a variable of illegal type appearing in a formel.

30 This type_data_block describes the underlying representation for each variable. One copy of it is made for each new data item of this type.
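The recursive scheme that items 22 through 29 describe can be paraphrased outside CLEOPATRA. The following Python sketch is my own illustration, not a rendering of the thesis code: formulas are tagged tuples, and a derive function dispatches on the node type exactly as the DECISION table does, applying the rules from the problem statement.

```python
# Illustrative paraphrase (not CLEOPATRA): a formula is a tagged tuple,
# and derive() selects a case per node type, like the DECISION table.

def C(v):                      # constant node, like the INTEGER-to-formel CONVERSION
    return ('con', v)

X = ('var', 'x')               # the differentiation symbol

def derive(f):
    kind = f[0]
    if kind == 'con':
        return C(0)            # D(constant) = 0
    if kind == 'var':
        return C(1)            # D(x) = 1
    op, l, r = f
    if op == '+':
        return ('+', derive(l), derive(r))
    if op == '-':
        return ('-', derive(l), derive(r))
    if op == '*':              # D(f*g) = f*D(g) + D(f)*g
        return ('+', ('*', l, derive(r)), ('*', derive(l), r))
    if op == '/':              # D(f/g) = (D(f)*g - f*D(g)) / g**2
        return ('/', ('-', ('*', derive(l), r), ('*', l, derive(r))),
                ('**', r, C(2)))
    raise ValueError('illegal node type')   # corresponds to falling out at item 29
```

As in the thesis code, no simplification is attempted; D(x*x) comes back as x*1 + 1*x.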
31 A variable has two formel operands, which would be NIL if the variable is a terminal operand, and

32 an INTEGER denoting the type of the variable,

33 and a CHARACTER string (of variable length up to 10 characters) giving a printable image of the contents of the variable. Conversions between numerical data types and literal values are standard in CLEOPATRA, which is why I made the somewhat inefficient choice of encoding; see section 3.1.3. By default the formel items will be initialized to be NIL, and the other items are set to be a constant C.x. For ease of access (item 53) the contents of a variable are made read-only accessible by means of the SHARE attribute.

34 designates the end of the description.

35 This data_block lists data items local only to the fill routine.

36 The INIT attribute for a formal parameter is only executed if the parameter is omitted in the call. It would designate a symbol C.x with

37 a type constant ("1") designation.

38 designates the end of the data area.

39 designates the beginning of the routine_block. Observe that the connection between parameters and their declarations is by name (the declarations define the scope of the names of the parameters), and the connection between parameters and their types in the structure_block entry is positional or by keyword. The connections must match both ways, and they must also match in a call; CLEOPATRA is strongly typed.

40 The routine merely performs a number of assignments. Since the routine is part of the variable type_pack tree, and since v is a variable, the underlying representation of v can be accessed in a storable fashion by structure-style references.

41 Every routine must return something.

42 designates the end of the routine_block.

43 This operator will be invoked whenever a variable is being declared and if the declaration has an INIT phrase with an INTEGER on its right. It could also be used directly in an expression as a special assignment operator.
44 Observe this call on fill specifying two parameters explicitly, and omitting three others positionally. i is converted to CHARACTER format; the default for the fifth parameter will take effect, namely an initialization to "1" designating a constant type.

45 IF THEN is provided as second control structure. Observe that the AND operators are defined to abort (right to left) execution as soon as it becomes apparent what the result will be. Using "&" instead would still cause the side-effects to take place.

46 TRUE and FALSE are built-in logical constants.

47 This global_data_block describes the underlying representation for a formel.

48 In CLEOPATRA a POINTER can only refer to objects of a fixed type. A DEFER declaration serves essentially as a name carrier for allocation purposes; it specifies that the data item will be explicitly allocated (item 57).

49 This operator performs addition of two formel values. We would have to code subtraction, multiplication, division, and exponentiation in a similar fashion.

50 Addition is achieved by using term to connect the two operands x and y with a C.+ operator.

51 Here we create a unary operator on formel values.

52 Note that we attach the formel x by default as the first formel in this call to term. This must be considered when the interrogation arg is implemented (item 53).

53 This expression uses the fact that the fields of the variable x.ptr.x.var were shared. Similarly we would have to implement the operators right and arg. As noted in item 52, arg would have to be a carbon copy of left.

54 This operator runs the DECISION table (item 24). It must construct the appropriate values from the contents of the designated formel.

55 Observe how again shared information is used. "2", incidentally, is never recorded in a variable; it exists as a type indication merely through this comparison.

56 It may be confusing, but the returned item has no name, not even in this CONVERSION heading.
Only the parameter i is indicated by name.

57 This statement causes the reservation of a new data item of type variable for the POINTER ptr which can point to a variable. The data item is initialized (i.e., a call to INIT is issued, item 43) to represent an integer, namely i.

58 In this declaration, a one-dimensional array operators is created, and initialized with a temporary array containing all the supported operators in formel values as literal values. [This is a slight extension of the Report: no assignment operator on arrays was defined there, but it could easily be written.]

59 This INTEGER will be used as index of a loop over the operators array; it is here set to the (first) lower bound of the array.

60 This loop is executed while the index i ranges from its current value (item 59) up to the (first) upper bound of the operators array in steps of "1".

61 The empty statement; the word NIL is necessary because of the way CLEOPATRA defines statement separations.

62 The loop terminates with i designating the relevant element of operators. I assume that the element is found. Note that the contents of operators correspond through the type routine to the entries in the DECISION table (item 24), admittedly a far reaching consequence. However, type could easily offset this.

63 designates the end of the loop.

3.1.3 Improvements.

Let us discuss two modifications. First, we should note that the symbolic differentiator as presented here does not perform algebraic simplification. A certain number of basic simplifications can be implemented at nominal cost, and I shall illustrate with an example: I will present an improved version of the multiplication routine for two formel values.
      DATA mu ; formel x, y ;
 1    CONSTANT variable one INIT 1, zero INIT 0
      END mu

      mu: OPERATOR formel x * formel y ;
 2    IF (x.ptr.x.var == zero) OR y.ptr.y.var == zero
      THEN RETURN formel 0 ;
 3    DECISION
      x_one: x.ptr.x.var == one ;
      y_one: y.ptr.y.var == one
      ACTION
 4    x_one & y_one: RETURN formel 1 ;
      x_one: RETURN y ;
      y_one: RETURN x
 5    ELSE RETURN term (x, y). C.*
      END
      END mu

Comments.

1 The INIT operator (item 43 in section 3.1.2) is used to create two constants representing "0" and "1". The variable values are store protected by the CONSTANT attribute.

2 OR abandons further processing as soon as the result is apparent, contrary to its otherwise equivalent counterpart "|"; compare item 45 in section 3.1.2.

3 This decision phrase sets the switch x_one to TRUE if the expression is non-zero, and to FALSE otherwise. Equal-comparison is TRUE or non-zero on equality.

4 Action phrases allow general switch expressions to control their execution. This particular action phrase implements the simplification of the product "1 * 1".

5 The ELSE exit in a DECISION table is taken only if no other action was selected. This phrase implements the "normal" multiplication if no simplification could be found.

This example demonstrates how we could simplify multiplications by one and by zero. The next simplification would involve having term evaluate all possible numerical expressions. To this extent we would have to implement a CONVERSION from variable to REAL, which is not entirely a trivial task.

The second modification which I would like to consider is concerned with collecting the variable information in a symbol table, rather than in the variable itself. This requires a redesign of the implementation of the type_pack for variable but it does not require recoding of the formel routines. I think I was successful in packaging all information about the internal representation of variable into its type_pack, in spite of the close operational ties between variable and formel.
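The logic of the improved multiplication routine can be restated compactly outside CLEOPATRA. The Python sketch below is my own paraphrase (the tuple node encoding is an assumption, not the thesis representation), preserving the same order of tests: the zero cases short-circuit first, then the decision over the unit cases, then the ordinary product as the ELSE exit.

```python
# Paraphrase of the improved mu routine: zero cases first, then unit
# cases in a small decision, else build the ordinary product node.

ZERO, ONE = ('con', 0), ('con', 1)

def mul(x, y):
    if x == ZERO or y == ZERO:     # 0 * y and x * 0 collapse to 0
        return ZERO
    x_one, y_one = x == ONE, y == ONE
    if x_one and y_one:            # 1 * 1
        return ONE
    if x_one:                      # 1 * y
        return y
    if y_one:                      # x * 1
        return x
    return ('*', x, y)             # no simplification found
```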
I will not present a solution to this new problem. Essentially, we would have to support the same SHAREd information and the same operations as before, and we would have to encode the symbol information differently. I would most likely design a separate type_pack symbol table together with operations to locate and enter a new symbol. This type_pack can then be made as efficient as desired by the introduction of hash-coding access mechanisms, etc.

3.1.4 Design process.

The elapsed time for my attempts at a solution as recorded here was about 20 minutes; during the first two minutes the differentiator was essentially completed; the remaining time was spent on elaborating on the representation and manipulation functions for formulas. (I hope the reader does not consider this an unfair example: I have previously coded two differentiators [Schreiner 1973], but neither one was recursive.) In this section the code will not be rigorous in the sense of the Report; I reproduce more or less exactly what I thought.

      STRUCTURE example ; ! Symbolic differentiator.
      D: OPERATOR derive (variable). formel RETURNS formel ;
      ! namely the derivative.
      END example ;

So far, so good. I would, of course, have to insert declarations for the types variable and formel. Let us now write the main program:

      D: OPERATOR derive (variable x). formel y ;
      DECISION
      constant, variable, plus, minus, mult, div, pow, u_minus : type y
      ACTION
      constant: RETURN formel 0 ;
      variable: RETURN formel 1 ;
      plus: RETURN (der left y) + der right y ;
      minus: ...
      mult: RETURN (left y) * ...
      div: ...
      END
      END D

We can now define the type formel; from the preceding code it is pretty clear what operations I expect to perform. I should have listed the global_structure_block for the type_pack beforehand, but I wanted to see what I was getting into. Basically I need four-species arithmetic, and a few conversions.

      GLOBAL STRUCTURE formel ;
      T: OPERATOR type formel RETURNS INTEGER ; !
This operator runs the DECISION switch above.
      L: OPERATOR left formel RETURNS formel ;
      ! This operator splits off the left argument of
      ! a binary operator. Similarly OPERATOR right.
      A: OPERATOR formel + formel RETURNS formel ;
      ! Let addition suffice. Actually I need more.
      C: CONVERSION TO formel FROM INTEGER ;
      END formel

This cuts out the work. Next I made my first attempt at defining an underlying representation. I knew two things: my formel was a tree, i.e., it had (for binary operators) left and right sub-trees, and it carried type information. The most logical thing was to define a formel as a collection of an INTEGER for the information, and two POINTER variables for the sub-trees. This, of course, was where I made my biggest mistake thus far, but let us see how I recover.

      GLOBAL DATA formel ;
      INTEGER info ;
      POINTER (formel) l, r ;
      END formel

I tried to implement a CONVERSION to construct a formel from an INTEGER. (Wulf's original problem calls for REAL values in the expression, but for the design exercise this is relatively unimportant.)

      C: CONVERSION TO formel FROM INTEGER i ;
      x.info := 1 ; ! constant has info 1.
      ! The POINTER variables are defaulted empty.
      RETURN x

But before I even closed this routine, I discovered a basic flaw: info must consist of more than just the type. It must describe the various node types, as well as the information content of a terminal node. In the second attempt I therefore expanded the information field to include a CHARACTER variable to hold the information in printable form (conversions between numerical values and character strings are a built-in feature of CLEOPATRA), and an INTEGER variable to record the type information. The CONVERSION as well as the type interrogation operator were readily coded. Still being in the experimental stage, I did not worry about efficiency.
I noted that I could implement the type operator through SHARing the INTEGER variable, but I was not sure whether the decision table and the formel management needed the same encoding. The script, i.e., the global_structure_block, calls for a definition of addition:

      A: OPERATOR formel a + formel b ;
      ALLOCATE formel x FOR p_x ;
      p_x.x.t := 3 ; ! Type of plus
      p_x.x.s := C.+ ;
      p_x.x.l := a

I spared myself the data_block. Essentially, x would have been a DEFERed formel and p_x would have been a POINTER to it. I guess now that I should have written the data_block, but it did not really matter. The last line of code was impossible: formel by its global_data_block and its being a formal parameter was allocated in the block stack, but the assignment called for a POINTER formel a. For reasons of overhead avoidance, CLEOPATRA does not permit POINTER objects to reside in the block stack.

I tried again. I used the technique that was suggested by the presence of the differentiation variable, i.e., I packaged the information describing a node of the formula in a separate data type, and I made formel simply a POINTER to this new data type node.

      GLOBAL DATA node ; ! I should have called it 'variable'.
      INTEGER info ;
      CHARACTER (10) symbol ;
      POINTER (node) le, ri
      END node

      GLOBAL DATA formel ;
      POINTER (node) ptr ;
      END formel

In the original notes I now made another mistake. I coded the CONVERSION as a member of the formel type_pack, and in this routine I made direct assignments to the node members, an impossibility. The whole idea of introducing the new type node is the introduction of a further fire-wall and protection of the knowledge of the contents of a node. The mistake could easily have been avoided had I considered what defines the type node, i.e., what operators will be needed for it. The minimum will again be a CONVERSION, or a routine that allows us to construct a new node.
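The node/formel split can be sketched in a modern notation. This Python fragment is my illustration only (the field names follow the listing above, while the make_node constructor is a hypothetical stand-in for the routine just mentioned): node carries all the information, and a formel is nothing but a reference to a node, so that knowledge of a node's contents stays behind one fire-wall.

```python
# Illustration of the representation: all node knowledge sits in one
# place, and a formel merely refers to a node.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    info: int                      # type code of the node
    symbol: str                    # printable image, up to 10 characters
    le: Optional['Node'] = None    # left sub-tree
    ri: Optional['Node'] = None    # right sub-tree

@dataclass
class Formel:                      # plays the role of POINTER (node) ptr
    ptr: Node

def make_node(info, symbol, le=None, ri=None):   # hypothetical constructor
    return Formel(Node(info, symbol, le, ri))

one = make_node(1, '1')
x   = make_node(2, 'x')
s   = make_node(3, '+', one.ptr, x.ptr)          # the tree for 1 + x
```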
Another alternative would have been to define node as a DEFERed data_group inside the type_pack for formel. The disadvantage there, of course, is that we need the node information to describe the differentiation variable. Under the concept of information hiding [Parnas 1972a-c, Liskov 1972] we should not make the information about the structure of a node available in more than one place. With one more iteration, namely with investigating the effect of making node a data_group inside formel, I did arrive at the final form as described in section 3.1.2.

3.2 A "buddy system" storage allocator.

A storage allocator is a basic module in any operating system. At its level, memory will not yet have been formatted into types; it will be handled by its size and base-address alone. I assume that an operating system usually will be interrupt driven, and that the storage allocation interrupt ("GETMAIN" in the Operating System/360) is one of the most basic and most privileged interrupts of the system. The Report has only proposed, not firmly defined, the handling of interrupts in CLEOPATRA, as well as the access afforded privileged users to such special features of the hardware as storage protection keys. Consequently the program suggested in this section is but my imagination of how it could be coded under an actual implementation of CLEOPATRA.

3.2.1 Problem statement.

"Write three subroutines, get(n), free(i,n), and init, which use the "buddy system" to allocate memory from a linear vector m (the bounds on m are assumed to be from zero to 2**k-1). The subroutine get(n) is to return the (positive) index of a free area of size 2**n unless it is not possible to allocate an area of this size — in which case -1 should be returned. The subroutine free(i,n) will free the area of storage beginning at index i and of size 2**n. The value of free is normally zero unless an error is detected in which case -1 is returned.
The (value-less) subroutine init must be called before either get or free." Details of the algorithm can be found in [Knuth 1968] in the second printing (the first printing, according to Professor Knuth, has an error in the description of the algorithm). A few useful hints concerning the detection of "buddies" as found in [Wulf 1972a] are:

"If a and b are the binary encodings of the indices of two areas of size 2**sa and 2**sb respectively, then: a) these areas are 'buddies' iff: sa = sb and (a||b) = 2**sa; b) the index of the buddy of a is given by: a||2**sa; c) the area defined by a is included in the area described by b iff: (a||b) < 2**sb."

Observe that "||" denotes the bit-wise exclusive or.

3.2.2 Solution.

The code presented in this section again is annotated with comments as indicated by the numbers on the left margin. The comments follow the code. These comments are less intended to introduce the reader to CLEOPATRA; rather, they are intended to describe the implementation of the algorithm. Familiarity with the algorithm as such, and with the IBM System/360 family of computers, is assumed.

 1    GLOBAL DATA buddy_system ;
 2    CONSTANT INTEGER max INIT 24, min INIT 11 ;
 3    LONG_INTEGER LEFT (2, min:max) avail_list
      END buddy_system

 4    STRUCTURE buddy_system ;
      ! Buddy system storage allocator ATS 6/74
 5    CONDITION init ;
 6    COND obtain LONG_INTEGER BY ADDRESS (INTEGER) ;
 7    COND free LONG_INTEGER (INTEGER) ;
 8    IN: INTERRUPT init ;
 9    OB: IR obtain LONG_INTEGER BY ADDRESS (INTEGER) ;
10    FR: IR free LONG_INTEGER (INTEGER) ;
11    COND LOW INTEGER BY ADDRESS ;
12    COND HIGH INTEGER BY ADDRESS ;
13    COND NONE INTEGER ;
14    COND ODD LONG_INTEGER BY ADDRESS (INTEGER) ;
15    COND FREE LONG_INTEGER (INTEGER) ;
16    FI: OPERATOR first_available INT RETURNS INT ;
17    PU: OP put_to_use INTEGER RET LONG_INTEGER ;
18    RE: OP recombine LONG_INTEGER RET LONG_INTEGER ;
19    MA: OP make_available (INT).
LONG_INTEGER RET INT ;
20    TA: OP TAG LONG_INTEGER RET BIT ;
21    KV: OP KVAL LONG_INTEGER RET INTEGER ;
22    NE: OP next LONG_INTEGER RET LONG_INTEGER
      END buddy_system

23    PROCEDURE buddy_system ;
24    ON init init ;
25    ON free free LONG_INTEGER ;
26    ON obtain obtain LONG_INTEGER ;
27    SIGNAL init ;
28    END buddy_system ;

      DATA IN ; INTEGER i INIT min END IN
29    IN: INTERRUPT init ;
      FOR i UPTO max - 1 DO
30    avail_list (1,i) := avail_list (2,i) := @ avail_list (*:*,i)
      END
31    avail_list (1,max) := avail_list (2,max) := F.0 ;
32    first_available max ;
33    RESUME
      END IN

      DATA OB ;
34    INTEGER n INIT min, j, k INIT n ;
35    LONG_INTEGER L
      END OB

      OB: INTERRUPT obtain LONG_INTEGER L (n) ;
36    L := F.-1 ;
37    IF k < min THEN BEGIN SIGNAL LOW k ; IF k < min THEN RESUME END
38    IF k > max THEN BEGIN SIGNAL HIGH k ; IF k > max THEN RESUME END
39    FOR j FROM k UPTO max DO
40    WHILE avail_list (1,j) == @ avail_list (*:*,j) ; NIL END
41    IF j > max THEN BEGIN SIGNAL NONE k ; RESUME END
42    L := put_to_use j ;
43    FOR j FROM j - 1 DOWNTO k DO
44    avail_list (1,j) := avail_list (2,j) := L + 2 ** j ;
45    first_available j END
46    RESUME
      END OB

      DATA FR ;
47    LONG_INTEGER i, L INIT i, P ;
48    INTEGER n INIT min, k INIT n, kk INIT max
      END FR

      FR: INTERRUPT free LONG_INTEGER i (n) ;
49    IF k < min THEN BEGIN SIGNAL LOW k ; IF k < min THEN RESUME END
50    IF k > max THEN BEGIN SIGNAL HIGH k ; IF k > max THEN RESUME END
51    IF L MOD 2 ** k THEN BEGIN SIGNAL ODD L (k) ; IF L MOD 2 ** k THEN RESUME END
52    FOR kk DOWNTO min DO
53    P := avail_list (1,kk)
54    DO
55    WHILE P ¬= @ avail_list (*:*,kk) ;
56    IF k > kk
57    THEN IF (P DIFF L) < 2 ** k THEN BEGIN SIGNAL FREE P (kk) ; recombine P END
58    ELSE NIL
59    ELSE IF (P DIFF L) < 2 ** kk THEN BEGIN SIGNAL FREE P (k) ; RESUME END
      next P END END
60    FOR k UPTO max DO
61    P := L DIFF 2 ** k ;
62    IF (k ¬= KVAL P) OR (¬ TAG P) OR k == max THEN EXIT ;
63    recombine P ;
64    IF P < L THEN L := P END
65    make_available (k).
L ;
      RESUME
      END FR

      DATA FI ;
66    INTEGER size ;
67    ALIGNED (LONG_INTEGER F, B, INTEGER kval) block AT (avail_list (1,size), F.10)
      END FI

      FI: OPERATOR first_available INTEGER size ;
68    avail_list (1,size) SETKEY Q.0 ;
69    block.F := block.B := @ avail_list (*:*,size) ;
70    block.kval := size ;
      RETURN 0
      END FI

      DATA PU ;
71    INTEGER size ;
72    LONG_INTEGER L INIT avail_list (1,size) ;
73    ALIGNED (LONG_INTEGER F, B, INTEGER kval) block AT (L, F.10), next AT (block.F, F.10)
      END PU

      PU: OPERATOR put_to_use INTEGER size ;
74    avail_list (1,size) := block.F ;
75    next.B := @ avail_list (*:*,size) ;
76    L SETKEY Q.1 ;
77    RETURN L
      END PU

      DATA RE ;
78    LONG_INTEGER address ;
79    ALIGNED (LONG_INTEGER F, B, INTEGER kval) block AT (address, F.10), prior AT (block.B, F.10), next AT (block.F, F.10)
      END RE

      RE: OPERATOR recombine LONG_INTEGER address ;
80    next.B := block.B ;
81    prior.F := block.F ;
      RETURN address
      END RE

      DATA MA ;
82    INTEGER size ;
83    LONG_INTEGER address ;
84    ALIGNED (LONG_INTEGER F, B, INTEGER kval) block AT (address, F.10), first AT (avail_list (1,size), F.10)
      END MA

      MA: OPERATOR make_available (size). L_INT address ;
85    address SETKEY Q.0 ;
86    block.F := avail_list (1,size) ;
87    block.B := @ avail_list (*:*,size) ;
88    block.kval := size ;
89    first.B := avail_list (1,size) := address ;
      RETURN 0
      END MA

      DATA TA ;
      LONG_INTEGER address ; BYTE key
      END TA

      TA: OPERATOR TAG LONG_INTEGER address ;
90    IF key := READKEY address THEN RETURN FALSE ELSE RETURN TRUE
      END TA

      DATA KV ;
91    LONG_INTEGER address ;
2 Protection is available for blocks of 2048 bytes or more. min denotes the base-2 logarithm of the smallest allocatable block size, and max denotes the dual logarithm of the largest available block size. It is assumed that the storage allocator resides outside of this area. min and max are read-only because of their CONSTANT attribute.

3 This is the array of list-heads for the storage scheme. Addresses are treated as LONG_INTEGER values. CLEOPATRA supports column-major (LEFT) and row-major (RIGHT) arrays.

4 In this block we define the calling conventions for all routines which the allocator uses and provides.

5 The routines required by the problem will actually be provided as handlers for these interrupts. We are thus able to issue privileged instructions while we are executing storage allocator routines.

6 The obtain condition is raised to obtain memory. The address (or F.-1) is returned as the principal argument of the condition, and the size is supplied as the first and only parameter.

7 The free condition is raised to release memory. The address is supplied as principal argument, the size is the first and only parameter.

8 This entry defines a handler for the init condition,

9 this entry defines a handler for the obtain condition (the actual allocator), and

10 this entry defines a handler for the free condition. Observe that condition and handler names are recognized together with the type of a possible principal argument.

11 This condition is raised by obtain or free if the requested size is smaller than the smallest allocatable block size. The block size requested is supplied as argument and may be changed for use by the allocation routines. The allocation routines fail if the size is not corrected.

12 This condition is raised by obtain and free if the requested size exceeds the largest allocatable block size. The block size requested is supplied as argument and may be changed for use by the allocation routines.
The allocation routines fail if the size is not corrected.

13 This condition is raised by obtain if no block of the requested size, as indicated by the argument, is available. The allocation routine fails, i.e., it will return F.-1.

14 This condition is raised by free if the address as indicated by the principal argument is not a multiple of the requested size as indicated by the first and only parameter. The address may be corrected for use by the allocation routine. The allocation routine fails if the address is not adjusted.

15 This condition is raised by free if the area to be released is part of, or contains, an area which is already free. The area which is already free is indicated by the argument and parameter. The free operation continues by recombining a contained area, or it terminates if the requested area is part of a larger free area.

16 The node in avail_list indicated by the argument points to an area which is to be linked into this avail_list as the first entry. Zero is returned.

17 The first area on the free list as indicated by the argument is to be detached from the free list. Its address is returned.

18 The referenced available area is to be delinked from the free list for recombination with its "buddy". Its address is returned.

19 The referenced area is to be added to the indicated free list (but not necessarily as only member). Zero is returned.

20 returns TRUE iff the indicated area is available.

21 returns the size of the indicated free area.

22 returns the area following the given area on the free list.

23 This procedure essentially is fictitious. It indicates how we would connect the handlers and how we would initialize the storage allocator.

24 connects the init handler.

25 connects the free handler. Observe that the ref_type of the principal argument must be stated for clear identification.

26 connects the obtain handler.

27 This statement raises the init condition. It serves to initialize the storage allocator.
28 At this point further requests could be made.

29 designates the beginning of the code for the initialization routine.

30 All elements of the avail_list array are set to point to themselves; "@" indicates the address of its argument. Observe that at this point control over the allocation scheme of the avail_list array is needed.

31 The last element of the avail_list array is set to point to the first available storage block at address F.0.

32 links the area onto the last free list.

33 terminates processing of the init interrupt.

34 n is the size to be allocated; if omitted, it would be initialized to min. j is a local indexing variable, and k serves as the routine's own modifiable copy of n.

35 L is the address to be returned,

36 it is initialized to the error return value F.-1.

37 ensures that the requested size is large enough.

38 ensures that the requested or adjusted size is within the limits of the system.

39 Sweep over the free lists of the given or of larger size.

40 As long as the lists are empty (point to themselves) do nothing.

41 If the search was not successful, j would now exceed max and we signal failure.

42 L is set to point to the available area, and the area is released from the free list.

43 Sweep back down over the empty free lists, up to the list for the requested size.

44 Point each list-head to the buddy of the given area, and

45 link the buddy as a free element onto the list.

46 Terminate the processing of the obtain condition.

47 i denotes the address to be released, L is a copy of this address for use by the routine, and P is a local address variable.

48 n denotes the size to be released; it would be initialized to the size of the smallest possible area. k is a copy of n for use by the routine, and kk is a local variable.

49 ensures that the requested area is large enough.

50 ensures that the requested or adjusted size is within the limits of the system.
51 If the given address is not divisible by the (possibly adjusted) size, i.e., if it has a non-zero remainder when divided by the size, raise the ODD condition.

52 Sweep down over all free lists to analyze an inclusion relation.

53 On each list, pick up the first area, if any, and

54 sweep along each free list

55 back to the list-head. If the list-head has not been reached,

56 if the free area is of smaller size than the area to be freed, and

57 if there is an inclusion: raise the FREE condition and recombine the free area (it will be absorbed).

58 If there is no inclusion, do nothing. Observe the otherwise "dangling" ELSE.

59 If the free area is larger than or equal to the given area, and if there is an inclusion (or equality), terminate processing the free condition. The search over the free areas was from largest to smallest areas and would return the largest inclusion.

60 From the given size up to the system bound,

61 determine the buddy, and

62 if we are looking at the largest possible size (there is no buddy), or if the buddy is not free, or if the buddy is free but has the wrong size, then quit moving up over the lists.

63 If we still sweep up over the lists, combine the buddies,

64 adjust the address so that it points to the buddy if it should be lower in memory, and continue on to the next larger size.

65 Add the largest contiguous free area to the proper list.

66 This variable designates the free list to be used. It is an index into avail_list.

67 This is a data_group, consisting of two addresses (the forward and backward links) and an INTEGER (the dual logarithm of the size of the area); it is a block descriptor overlay, of length F.10 bytes, for the beginning of the area pointed to by the entry in the avail_list, and it is referred to by the name block.

68 The storage key of the indicated area is set to Q.0, indicating an available area.

69 The area is linked to the list-head, and

70 its size is recorded.
71 This variable designates the free list to be used. It is an index into avail_list.

72 L points to the area to be put to use.

73 We have two block descriptor overlays, block for the area to be used, and next for the second area on the list, which may indicate the list-head, however.

74 Forward bypass on the linked list.

75 Backward bypass on the linked list.

76 The storage key of the delinked area is set to be non-zero. This storage key would, of course, normally be a parameter to the put_to_use routine.

77 The routine returns the address of the area.

78 This variable designates the area to be recombined.

79 We have three block descriptor overlays, block for the area to be recombined, prior and next for the surrounding entries on the list. The latter may designate the list-head itself.

80 Backward bypass on the linked list.

81 Forward bypass on the linked list.

82 This variable designates the free list to be considered.

83 This variable designates the area to be added to the free list.

84 We have two block descriptor overlays, block for the area to be freed, and first for the first area, if any, on the linked list.

85 Zero the storage key to indicate a free area.

86 Forward link the area to the first list entry.

87 Backward link the area to the list-head.

88 Record the size.

89 Backward link the first area on the list, and forward link the list-head, both to the area.

90 Interrogate the storage key, and if zero indicate that the given area is free.

91 This variable designates the area to be interrogated.

92 We have one block descriptor overlay for the given area.

93 Simply return the size as recorded in the block descriptor.

94 This variable designates the area to be interrogated.

95 We have one block descriptor for the given area.

96 Simply return the forward link as recorded in the block descriptor.

3.3 Conclusions.

Having completed coding and exposition of these examples, I would like to offer some conclusions.
We should perhaps ask four questions: Was it easy to code the algorithms, i.e., is there positive interference between the use of CLEOPATRA and the thought process executed while creating a concrete representation of the originally fuzzy idea of the algorithms? Is the exposition of these essentially familiar algorithms concise and clear? Do we expect an efficient encoding of the algorithms, i.e., can we expect that a compiler for CLEOPATRA would produce reasonably efficient machine code with a reasonable amount of effort? Can the language express all algorithms that we wish to perform?

It was the purpose of section 3.1.4, the detailed description of the design process leading to the symbolic differentiator as presented in section 3.1.2, to support an affirmative answer to the first of these questions. Designing a programming language is a terribly subjective thing, and I cannot hope to have designed a language that will not permit a user to write incorrect programs. However, I am under the impression, at least in this example, that there was indeed a very positive interaction between grammatical correctness of the code and correctness of the program. The 'define before use' rule, the basic strategy imposed by the availability and the structuring of type_packs, and the scope rules for variable names helped and encouraged me tremendously to stay on a sensible top-down programming track. I was never at a loss as to what the next implementation task at hand would be.

Can we do the same thing in machine language?
It seems to me that this does not only take more effort, and more intellectual discipline to stay on the top-down track and to defer one's decisions, but I find myself constantly crossing the artificial walls self-imposed by modules and their hidden knowledge. Admittedly, TUTOR IV, the language of the PLATO IV CAI system, although offered as sole means of communication even to absolute newcomers, is very far from being suitable for coding complex systems, and it is thus not quite a fair target for comparison. Nevertheless, I struggled for about a month while realizing suitable data structures for symbol manipulation in this type-less language, which does offer a rudimentary procedure structure. Short of imposing intellectual discipline and short of employing macros (and thus the initial stages of an implementation language [Zelkowitz 1972]), a machine language programmer is even more deprived of the guidance offered by a block structure, or by the even stronger structured CLEOPATRA type_pack mechanisms.

Turning to the second question, I would tend to believe that CLEOPATRA programs can offer a very clear description of algorithms. I find the separation of the use and the realization of abstract data types, together with the facilities for overloading and thus for creating problem-oriented data structures and operators, extremely helpful in structuring the presentation of a program. If we adopt Denning's definition [1974], I would definitely call the symbolic differentiator a structured program, and my sometimes slightly "artistic" approach to coding then owes to CLEOPATRA that its result follows a reasonable pattern. The "buddy system", although not employing a type_pack, also seems fairly well structured. I found in particular the assembler-language-style handling of addresses and linked lists very easy to accomplish, and generally less subject to error than similar projects which I coded in assembler language earlier.
It was particularly helpful to take certain standard tasks out of Knuth's algorithms and worry about their encoding later. The structure block helps me to annotate yet missing routines and the assumptions made for them. The technique of signaling errors, allowing the user a corrective action before an error return is issued, should allow a very flexible embedding of the allocation modules into context. (My solution is not quite foolproof, but an array subscript range check certainly would prevent possible abuses.)

Section 4 of this report will present some thoughts on a possible implementation. Until an actual optimizing compiler exists, we can only hope for efficient code; an actual implementation will have to discover and revise the language features which cause inefficiency without being very convenient or an addition to the security of the language. In general I would hope that clean and segmented code, such as that presented in these examples, can be translated into a set of equally clean and efficient machine instructions. Too much sectioning of a program can easily result in a large overhead in inter-segment calling, which is why many authors call for efficient macro facilities which are indistinguishable from procedures, so as to afford the compiler (or the user) the possibility of allocating seldom called logical segments physically in-line with their invocations [Bilofsky 1974, Bowlden 1971, Clark 1971a-b, 1973, Earley 1973b, Hammer 1971].

The general purpose part of the CLEOPATRA language, as defined by the Report in sections 1 through 7, was designed and frozen before the language was extended to provide interrupt handling and access to "odd" parts of the underlying hardware. With, so far at least, but one exception, this extension was accomplished within the prescribed framework of the language. It was not really necessary to add major new units of expression to the language.
The exception was the AT operator, and I shall briefly sketch what went wrong to address myself to question four above on the completeness of the language. As defined by the Report, AT was intended to allow the (restricted) access to memory by address, rather than within the framework of typed variables. It is clear from the coding of the storage allocator in section 3.2.2 that such access both for interrogation and for storing is necessary, at least at that primitive level of an operating system at which unformatted memory exists. The problem was to find a description for AT which somewhat fit into the general guidelines of the syntax of CLEOPATRA: AT requires a type designation to indicate which way the addressed memory is to be used, and it requires an address, and for security reasons it also requires a size indication. The Report used the concept of a system_supplied_value, such as NIL, which must be typed before it can be used, and it extended the concept by stating that this particular system_supplied_value was returned as a storable quantity. I considered that a minor and reasonable extension of the general language.

Unfortunately, as the storage allocator demonstrates, it is much more convenient to give the result a name, and it is imperative that the result can be read as well as written, and that it is not automatically preinitialized as the Report demands. The alternative is implicitly presented above, namely, AT is now a phrase like the INIT phrase in a declaration, and it requests the creation (but not the implied, only the explicitly requested initialization) of a named entity at a specific address. A problem, at this point left to the implementor and to the user community, is when to allow the use of AT: I would expect that it must appear at the same level of a declaration at which the SHARE, DEFER, and CONSTANT attributes reside, so as to prevent artificial splitting of a contiguous data_group.
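In C terms, the revised AT - a declaration phrase creating a named, typed, readable and writable entity at a given address, with no implied initialization - corresponds roughly to overlaying a struct on raw storage. The following sketch is mine, not part of the Report; the block_desc layout merely mirrors the ALIGNED (LONG_INTEGER F, B, INTEGER kval) overlay used by the storage allocator in section 3.2.2:

```c
#include <stdint.h>

/* Hypothetical block descriptor, mirroring the overlay of section 3.2.2. */
struct block_desc {
    uint64_t f;       /* forward link                     */
    uint64_t b;       /* backward link                    */
    int      kval;    /* dual logarithm of the block size */
};

/* "block AT (address)": a typed, named view over existing storage.
 * Nothing is preinitialized - reads see whatever is already there,
 * and only explicitly requested stores change memory. */
static struct block_desc *overlay_at(uintptr_t address) {
    return (struct block_desc *)address;
}
```

Like AT, this gives repeated read and write access through a name; unlike a system_supplied_value, it is not a one-shot storable quantity.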
Still, I would consider this new definition to be within the framework of the general language, and I believe that this is an extremely powerful and quite dangerous extension to the language. "Surely the restrictions will be felt a burden only by those programmers whose delight it is to use their programming tools for purposes for which they were not originally intended." [Hoare 1972a]. Or conversely: it will never be possible to protect those programmers from themselves who see a challenge in such protective measures.

4. Implementation.

As remarked in section 2, I have not yet implemented a compiler for CLEOPATRA or for a subset of the language. I believe that I have designed the language in a reasonable fashion so that a subsequent implementation effort on the whole should succeed without major complications. The present section is intended to be less a guide book for a prospective implementor and more a collection of my own thoughts, a collection of the various implementation decisions which have or have not yet been made.

The section will discuss basically three topics: first, I will give an overview of the compiling process; without becoming too technical, I intend to sketch the basic components of a CLEOPATRA compiler and their tasks. Second, I am going to discuss the three major problems which a compiler for the well-defined general purpose part of the language will have to overcome; one of these problems, the precise implementation of POINTER variables and the restrictions imposed by the implementation, has thus far not been resolved and I shall sketch two possible solutions. Third, there is one area where the CLEOPATRA implementor must closely cooperate with his primary user, the operating system designer: I have in the Report only sketched a possible interrupt scheme together with the related direct access to certain hardware components which the operating system designer must be afforded.
I shall therefore summarize what extensions to the general purpose part of the language will have to be made by the implementor of a complete CLEOPATRA compiler.

Let me just mention a few papers on compiler design, deliberately excluding the more easily accessible and relatively well-known textbooks on the subject. Concrete representations and the related problem of portability of the source modules were analyzed, for ALGOL 68, by Hansen [1973, 1974]. Overall descriptions of the design of a compiler were given, among others, by Conway [1963] for COBOL, by Irons [1970] for the extensible IMP language, by Richards [1971] for BCPL, by Wirth [1971b] and Welsh [1972] for PASCAL, by Colin [1972] for STAB, and by Wulf [1973b] for BLISS/11. The last paper describes an optimizing compiler for a minicomputer. The technique of bootstrapping, i.e., the generation of a compiler while using the language which it is intended to compile, was used for BCPL, PASCAL, and STAB. Further remarks on this subject have been made on the extensible ECL system by Brosgol [1971] and mainly in Wegbreit's thesis [1972a], and on the use of SNOBOL by Dunn [1973] and Hanson [1973]. Wilcox [1970, 1971] addressed himself mainly to the problem of code generation and presented this synthesis phase of compilation as a very rigorous procedure. One of the trickier aspects of code generation, especially for reasons of compatibility with other languages, is the definition of subroutine linkage conventions; papers on this subject exist by Dickman [1972] and Wulf [1973c]. Dickman proposes macro techniques, while Wulf suggests that the user should be able to participate in the process. Access to memory from higher level languages was discussed by Duijvestijn [1971]. A number of authors deal with suitable symbol table management strategies. A practical example, for PL/I, is given by Busam [197?], and Brent [1973] describes some search optimization strategies.
CLEOPATRA will require the analysis of many names consisting of a sequence of identifiers. Gates [1973] describes a very elaborate deterministic mechanism, while Abrahams [1974a] holds that these references (in PL/I) appear too seldom to warrant such an approach - the truth might well lie somewhere in between.

4.1 Global considerations.

There exists a large number of excellent books and papers on compiler design; in particular, much research has been concerned with the recognition of languages. I therefore do not expect major complications in accomplishing the first task with which a compiler is faced, the task of recognizing a correct program. Two design decisions should be of assistance to the recognizer phase of the CLEOPATRA compiler: since I expect to have upper and lower case letters available, I decided to reserve all keywords of the language, and similarly there is the rule that each symbol must be defined or declared before it is used. Reserving keywords should significantly ease the recognition problem - it also allows us to be extremely lenient with respect to explicit statement separation - and with two sets of letters available, it should not be a bothersome restriction. The "define before use" rule I consider essential for the construction of clean programs. I may let a compiler help me in defining undefined symbols, but I expect the compiler in any case to clearly report such defaulting actions. Assuming and enforcing this rule makes compilation much simpler, at no expense to the user.
The recognition phase is further simplified by the very context dependent information that it can encounter. Corresponding to the different blocks, the compiler can essentially be split into three parts which are each capable of handling one type of compilation block: structure blocks can be completely handled by one segment of the compiler (they carry only symbol table information); the routine and data blocks presumably would share the expression encoder segment of the compiler. The data declaration scanner will only be applied in data blocks, the statement scanner need only apply to routine blocks.

The compilation blocks pose a table management problem, however. Environment requests, either explicit (COMPILE INTO) or implicit by context, must be acted upon by a compiler module which precedes the partitioned recognition phase, and which, together with an efficient symbol table library manager, must precondition the compiler tables so as to create an appropriate static environment. This environment is not entirely the result of a tree walk, or rather, the tree walk does include a lookahead each time a reference to a type_pack is encountered; the lookahead must include the global_structure_block and potential SHAREd data item names among the statically available configurations. Altogether, this requires a doubly linked representation of the tree of configurations. Mike Jamerson has successfully written a prototype file access mechanism in PL/I which could be used to manage a CLEOPATRA symbol table library on secondary storage.

The name table management as such during compilation of a sequence of blocks is fairly conventional in its requirements. We must be able to close blocks of information and to delegate them to external storage, we must be able to analyze complete structured references, and we must be able to redefine names in inner scopes. Suitable table schemes have been suggested various times in the literature.
CLEOPATRA poses three slight problems: like names can be used in semantically different contexts, i.e., within the same scope they may simultaneously denote two different things; there is an ALIAS name mechanism; and names may be recalled as BUILT IN. The "identifier classes" were explained in section 3.3.1 of the Report; basically, the classes are selected in such a fashion as to enable the recognizer to make a unique selection from the classes based on the semantic context (e.g., a switch cannot appear as a label, or an operator_link appears in a position where an operator cannot appear, etc.). The treatment of ALIAS names is more difficult, but it should be easily solvable by a suitably structured symbol table. What procedures and operators can be reestablished through the BUILT IN attribute is left to the implementor to decide. He may even be guided by the privilege level at which the user is coding. Generally, I would expect that all the operators in appendix B of the Report are available as BUILT IN. Being able to recall BUILT IN operators at any time implies that the symbol table of the compiler is initialized at any time at least to contain, in addition to the keywords of the language, the names of all the BUILT IN operators, marked as defined in some encompassing block to which the BUILT IN mechanism then would provide access.
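The lookup discipline just described - the same spelling living in several identifier classes, resolved by the semantic context, with BUILT IN names held in an implicit outermost scope - might be sketched as follows. The C types and names here are purely illustrative; CLEOPATRA prescribes none of them:

```c
#include <string.h>
#include <stdlib.h>

/* Hypothetical identifier classes; the semantic context selects one. */
enum id_class { ID_VARIABLE, ID_LABEL, ID_OPERATOR, ID_SWITCH };

struct symbol {
    const char    *name;
    enum id_class  class;
    struct symbol *next;        /* chain within one scope */
};

struct scope {
    struct symbol *symbols;
    struct scope  *enclosing;   /* static nesting of blocks */
};

/* Lookup walks from the innermost scope outward; since the class is
 * part of the key, "x" the variable and "x" the label can coexist
 * in the same scope. */
static struct symbol *lookup(struct scope *s, const char *name,
                             enum id_class class) {
    for (; s != NULL; s = s->enclosing)
        for (struct symbol *sym = s->symbols; sym; sym = sym->next)
            if (sym->class == class && strcmp(sym->name, name) == 0)
                return sym;
    return NULL;                /* a "define before use" violation */
}

static void define(struct scope *s, const char *name, enum id_class class) {
    struct symbol *sym = malloc(sizeof *sym);
    sym->name = name;
    sym->class = class;
    sym->next = s->symbols;
    s->symbols = sym;
}
```

A BUILT IN recall would then simply be a lookup that starts at the outermost (predefined) scope rather than at the current one.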
In addition to recognizing the given language, a compiler must be able to provide its user with an indication of what was recognized. This usually requires at least a listing of the incoming program, together with some indication as to statement numbers, nesting levels, etc. One of the more promising techniques (which has for example been used in SUE) is the introduction of a paragraphing module which will print the presented source text in a syntactically oriented format. This technique allows in particular the clear identification of the depth of exit statements, and it could be used to also annotate the amount of distribution of operators over data_groups, etc. In general, I would definitely favour at least a visual report to the user as to what has become of his constructs - as a minimum this enables a post mortem analysis of program malfunctions in a simple fashion.

While the recognition and exposition phases of the compiler should not prove to be difficult to realize, the synthesis phase definitely has some unanswered questions for an implementor. In the next two sections I shall discuss part of these. At this point I would still like to mention our initial idea: one of the influencing factors in the design of CLEOPATRA was to be able to write the run-time support modules in the language proper. It is not so difficult to support for example mathematical functions and the like. The problem lies in realizing such basic operations as the maintenance of the activation record stack, the area that handles automatic storage allocation. Since the privileged user, and the compiler implementor would qualify as such, is given almost unrestricted access to the underlying hardware, the realization of the entire run-time support should be feasible, but I would expect this to be one of the fine points of communication between the language implementor and the operating system designer. It is extremely hard to classify what constitutes run-time support, and what is a proper part of an operating system.

4.2 A few specific problems.

Most statements in CLEOPATRA can be found in other languages, and their synthesis problems therefore have been solved before.
Some fine points, such as an efficient, optimized realization of a generalized decision table, the proposed ability to return arbitrary values from all routines, or the mandatory initialization of all data items and its interplay with the omission of parameters in a routine call, will pose some difficulties. In this section I would like to discuss three major problems: the POINTER problem, as explained in section 4.2.1 of the Report, the implementation of the proposed array referencing mechanism, and some difficulties inherent to the proposed data type extension mechanism in connection with the also proposed generic constants. I can only offer some suggestions which might aid an actual implementation to arrive at definitive solutions.

4.2.1 POINTER problems.

The Report has at great length discussed the difficulties that a completely unstructured strategy for the heap can cause. More material can, incidentally, be found in [Hoare 1973a], who calls POINTER values "a step backwards from which we may never recover". Essentially we wish to provide explicit allocation and explicit or implicit release in conjunction with the inability to make uncontrolled references through POINTER variables and with the inability to lose memory through loss of referencability. Additionally we prohibit garbage collection as memory demand dictates, and undue unrelated overhead.

Professor Friedman worked out a completely secure management, whereby each POINTER consists of four address fields. One field points to the object, i.e., it is the actual "pointer value" described in the Report. Two fields are used to maintain a two way linked list of all copies of the pointer value of a given object. (This list could just be circular; essential is only the ability to randomly delete any member of the list.) The remaining field maintains a one way link through all POINTER variables residing in an object or activation record.
Maintenance is the obvious: upon release of an object, the chain through all POINTER variables in the object is used to actively detach each and every one from its object, and if this causes the deletion of the last pointer value copy for another object, the release mechanism is invoked recursively. The two way linked lists serve to speed up the delinking process. My solution to the storage allocator in section 3.2 leads me to believe that this mechanism can be coded in CLEOPATRA.

This management of POINTER variables has the advantage of causing moderate overhead for the feature proper, and of meeting all our other criteria above. In particular, objects can live as long as the user may desire. There is no tie from the activation record stack, i.e., the nesting of configurations, to the objects in the heap. The flaw is described as the "parameter passing problem" in the Report: we wish to pass objects as routine parameters in a manner indistinguishable from data items in the stack. Each time we pass an object, we "dereference" it, i.e., we create another copy of its pointer value which is the actual item passed. In view of our POINTER management, we then must require that the parameter description in general take the form of a pointer, and that for any passed parameter we always check the REFERENCE condition before we access it, whether the parameter is an object or not. This, however, I would definitely consider an undue and unrelated overhead. A prospective implementation would have to overcome this problem somehow, if the management is chosen as outlined here.

Manacher's ESPL language [1971] uses a different approach, which to me seems to be both less overhead affected and overall more secure. Essentially the heap is managed as an extension of the activation record stack. Before another configuration is added to the stack, the user (in the data block) may declare the size of a 'hole' to be reserved in the stack.
Subsequent allocations may be charged against these holes, and there is no explicit release so as to avoid fragmentation. The holes disappear as the activation record stack decreases. There is no memory leakage since all holes have been reclaimed once the user's stack ceases to exist. The management requires only one extra field per POINTER: this field describes the life expectancy of the POINTER and that of the pointer value. The system is secure in the sense that the REFERENCE condition cannot be raised, as long as we do not assign pointer values to POINTER variables which outlive them. In other words, a POINTER may never point to an object that sits at a higher nested level of the stack than the POINTER. The condition is easily monitored on assignment, but it may seem restrictive. Space is not exceeded globally, but locally within each hole, and the life of objects is somewhat tied to the nesting of configurations.

An actual implementation of CLEOPATRA must decide which of the two described POINTER managements, or which other management, will be most suitable for the operating system to be implemented in CLEOPATRA. There is an obvious question: why not let the user design his own type_pack for POINTER items? I am at a loss as to what constitutes a more primitive and secure data item from which this type_pack could be realized. Additionally I find that the ALLOCATE statement cannot be readily replaced by a procedure which without any extensions of the usual calling mechanism would provide the same feature of initialization before or after POINTER allocation. Moreover, POINTER is an object-type-sensitive data type. For reasons of efficiency I took care not to include any features in CLEOPATRA which would require dynamic type checking; we therefore would have to write POINTER facilities for each new data type that is ever pointed to.
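Friedman's four-field scheme described above might be sketched in C as follows. The struct layout and all names are mine, purely illustrative; the actual object storage and its return to the allocator are elided:

```c
#include <stddef.h>

struct object;

/* Each POINTER carries four address fields, per the scheme above. */
struct pointer {
    struct object  *obj;        /* the actual "pointer value"             */
    struct pointer *copy_prev;  /* two-way list of all copies of the      */
    struct pointer *copy_next;  /*   pointer value of obj                 */
    struct pointer *rec_next;   /* one-way chain through the POINTERs in
                                   the same object or activation record   */
};

struct object {
    struct pointer *copies;     /* head of the copy list                  */
    struct pointer *pointers;   /* chain of POINTERs residing in it       */
};

static void release(struct object *o);

/* Remove p from its object's copy list; when the last copy disappears,
 * the object has become unreferencable and is released. */
static void detach(struct pointer *p) {
    struct object *o = p->obj;
    if (o == NULL) return;
    if (p->copy_prev) p->copy_prev->copy_next = p->copy_next;
    else o->copies = p->copy_next;
    if (p->copy_next) p->copy_next->copy_prev = p->copy_prev;
    p->obj = NULL;
    if (o->copies == NULL) release(o);
}

/* Releasing an object detaches every POINTER residing in it, which
 * may invoke release recursively for other objects. */
static void release(struct object *o) {
    for (struct pointer *p = o->pointers; p; p = p->rec_next)
        detach(p);
    /* the object's storage would be returned to the allocator here */
}

/* Assignment: drop the old pointer value, link into the new copy list. */
static void assign(struct pointer *p, struct object *o) {
    detach(p);
    p->obj = o;
    p->copy_prev = NULL;
    p->copy_next = o->copies;
    if (o->copies) o->copies->copy_prev = p;
    o->copies = p;
}
```

The two-way copy list makes random deletion of a copy constant-time, which is exactly why the scheme carries both a forward and a backward field for it.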
Let me close this discussion with a reference to Dijkstra's Turing Award lecture [1972]: he anticipates that even the do loop of FORTRAN might be eliminated from future programming languages as too baroque. The concept of POINTER variables seems to me a much more suitable candidate.

4.2.2 Arrays.

CLEOPATRA proposes a rich set of restructuring mechanisms for arrays. While there must be full subscript range checking, we allow the redefinition of lower bounds, we allow the extraction of just a part of each extent, and we allow the (artificial) insertion of further extents of value one. One possible dope vector structure could provide a base address field, a chop field, and a vector each of virtual and actual extents, and of virtual and actual lower bounds. The location formula for an array would look as follows:

    location = base + chop * SUM(i=1,n) [ PRODUCT(j=1,i-1) [ extent(j) ] * (subscript(i) - lowbound(i)) ]

Initially, the chop field is set to one, and the virtual and actual bounds and extents reflect the initial creation data. Actual lower bounds and extents participate in the subscript checking, and the virtual lower bounds and extents are used in the location formula above. New extents of length one can be added or inserted without problem, since they will effectively not participate in the location formula. If we eliminate extents, i.e., if we crossect the array, the chop field or the neighboring virtual extent fields absorb the vanishing extent so as to make the location formula "jump" across the areas eliminated in the crossection. The chop field contains the product of all extents that were completely eliminated from the major side of the dope structure. Selection of a part of an extent modifies the actual lower bound for subscript checking.
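A minimal Python sketch of the location formula and the subscript check above, assuming left-major order (the stride of dimension i is the product of the extents of dimensions 1 through i-1) and the dope fields just described:

```python
def locate(base, chop, vlow, vextent, subscripts):
    """Compute an element address from the dope structure, using the virtual
    lower bounds and extents; left-major order is assumed, so the running
    stride is the product of the virtual extents seen so far."""
    offset = 0
    stride = 1
    for low, ext, sub in zip(vlow, vextent, subscripts):
        offset += stride * (sub - low)
        stride *= ext
    return base + chop * offset

def check(alow, aextent, subscripts):
    """Full subscript range checking against the actual bounds and extents."""
    for low, ext, sub in zip(alow, aextent, subscripts):
        if not low <= sub < low + ext:
            raise IndexError("subscript out of range")
```

For a freshly created 3-by-4 array (chop one, lower bounds zero), element (1, 2) lies at base + 1*1 + 3*2 = base + 7; crossecting away major-side extents only changes the chop factor, leaving the loop untouched.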
A redefinition of a lower bound changes the actual lower bound for subscript verification, and it also changes the virtual lower bound by a corresponding amount so as to offset the effect of the changed lower bound in the location formula. While the virtual extent remains unchanged in either case, the actual extent always reflects the allowed range of the subscript relative to the lower bound, and therefore needs to be adjusted in the first case.

Accommodating row-major and column-major storage schemes is slightly more complicated. They differ essentially only by the direction in which the products over the extents are taken in the location formula. I would most likely carry a flag in the dope structure indicating the storage scheme, and I would instruct the location mechanism to consider the flag. This must be done at run time, since we do not know at compile time whether an incoming array parameter will be left or right major allocated, so that we cannot compile complete code at that time.

4.2.3 Generic values.

The Report indicates that the user is allowed to create instances of NIL, SMALL, and LARGE for arbitrary data types. He has no explicit control over the values that will be returned (the idea to make the user write NIL, etc., procedures with each type_pack was abandoned as too restrictive); instead, these values are recursively defined: e.g., NIL is numerically zero for numerical types, and unless explicitly initialized, fields in the underlying representation of a user-defined data type will be set to NIL if NIL for the new data type is requested. As for an implementation, each generation time routine simply will have to provide entry points (routines) which realize the generic values. It should be clear to the user that reliance on default initialization may prove to be costly in time and space requirements, should he call on the expansion of generic values. But I believe that the overhead can be well localized to the feature itself.
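The recursive definition of NIL can be sketched in a few lines of Python; the type descriptors here are invented for the sketch (a basic numerical type is simply tagged, and a user-defined type lists the fields of its underlying representation with their explicit initializations, if any), and SMALL and LARGE would be realized analogously:

```python
# Hypothetical type descriptors, invented for this sketch: a user-defined
# type lists the fields of its underlying representation as
# (name, field_type, explicit_initialization) triples, where the
# initialization is None when the field was not explicitly initialized.
INTEGER = ("basic", None)

def nil(typ):
    """NIL is numerically zero for a basic numerical type; for a
    user-defined type, every field without an explicit initialization is
    recursively set to NIL for its own type."""
    kind, fields = typ
    if kind == "basic":
        return 0
    return {name: (init if init is not None else nil(ftype))
            for name, ftype, init in fields}
```

The cost warning in the text is visible here: expanding NIL for a deeply nested type walks the entire underlying representation.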
Generation time routines must provide other services, such as a uniform means to determine the storage requirements both for an allocated instance of the data type, and for the respective dope structure. The generation time routines also might have to participate in POINTER management as described in section 4.2.1. The ECL system uses a "mode" compiler for the creation of all the creation time routines; I believe that an implementation of CLEOPATRA will have to cope with some standardization problems in this area.

4.3 Extensions left to the implementor.

While the general purpose part of CLEOPATRA has been fairly completely specified, the operating system dependent areas of interrupt resolution and of access to basic hardware features have received less attention. Since I believe that an actual operating system will be interrupt driven, and since I do not wish to preempt desirable mechanisms at this time, I left this area to be decided in cooperation between the language designer and the system implementor. My own suggestions in section 8 and * of the Report try to mirror the interrupt structure of the underlying anticipated machines very closely. I assume that my idea of having the interrupt routines receive parameters, and of making their names, as well as the corresponding conditions, type-sensitive like unary operators, will considerably simplify the amount of implementation dependent information that must be reported to the user. In this fashion we can provide the user with a standard set of computational exceptions which he may expect and provide for, for each of the basic types that he will use. There is just one slight catch: such a standard set of interrupts may be somewhat denser than the implementor is willing to distinguish. He might, for example, decide to implement BYTE arithmetic through INTEGER arithmetic, and a user may receive an unexpected, although efficient, INTEGER OVERFLOW where he should have received a BYTE OVERFLOW.
I am aware of having thus left another definition problem for the implementor, but this definition problem of the calling sequences for the built-in conditions I would consider a true degree of freedom of the implementation.

While the handling of the software conditions might not prove to be overly complicated, the handling of the hardware interrupt conditions is quite a different problem. Here we enter the domain of the privileged user, and I am not quite sure where to draw the line between simply assuming such things as automatic non-recursiveness and residency of the routines, and providing explicit commands which may be rarely if ever used. By definition of the mechanism, we can implement hardware condition handlers in a non-recursive, self-stacking form. At this level of access to the machine we must, however, give the user access to the basic control registers of the machine (if only to be able to write that part of the run-time support which differentiates the kind of software condition to be reported further), and we must simultaneously hope that the user will not abuse his powers.

Two more extensions of the basic language have also been left up to the implementor. The Report sketched a possible syntax for control over the communication channels, and it also sketched some desirable debugging features. Either area should be quite suitable for independent, localized further development.

5. Conclusions.

Although quite an incomplete project, CLEOPATRA to me has proven to be quite an educational experience.
While designing a language capable of providing access to all features of the underlying hardware, and providing this access most of the time in as safe a fashion as possible; while designing a language that reflects the growing concern for ease in the creation and exposition of reliable software; while designing a language that combines orthodox and radically revised features of other contemporary languages into a consolidated new entity, I have been confronted with a large number of aspects of programming language and operating system design.

I am left with admiration for the success of new languages such as PASCAL or the more mathematical APL. Can I hope for, and could I justify, a similar acceptance of CLEOPATRA? Actually, I do not think that this should be the immediate, and probably not even the ultimate, goal of this research. CLEOPATRA must be considered as a collection of features which I feel we should investigate as to their feasibility and applicability in future programming. I happened to encounter so many suitable features that the best context for their presentation and application seems to be an entirely new language. Whether this then remains the only environment for the features needs to be seen.

Among the innovations, I would hope that the extreme structuring of the compilation blocks, together with their scope rules, can aid us in the creation of easily presented, understood, and maintained programs. I would still think that even more flexible and more closely controllable variable binding mechanisms should be investigated; the static scope and block structure might yet outlive its all-encompassing usefulness.

Abstract data types, the separation of usage and realization of problem oriented data mechanisms, is a concept which I think should greatly enhance system design.
The ability to transparently elaborate data mechanisms should allow us to separate algorithmic notions from their final implementation details, and it should greatly ease the implementation problems by delegating the responsibilities in a functional fashion. Additionally, we should thus easily be able to separate machine-dependent implementation aspects from the portable ideas of the algorithm - we might design a portable operating system yet. Along the lines of the typeless BLISS and the largely structured OSL/2, Balzer's ideas have come a long way. CLEOPATRA and Liskov's techniques of maintaining the underlying representation should both now be tried in a real application. The next step would probably have to be a separation of the access path from the actually handled element, but in a tightly controlled, compile time bound fashion.

Recent years have seen a multitude of new, rediscovered, or extended control structures. Here, too, I have offered to reconsider and experiment with an old structure. For the free format decision table I could see a great future, since it seems to be both regimented enough to placate the purists, and flexible and efficiently implementable enough to please and constrain the clever programmers. Decision tables provide a large area for further research, both as to their general application in such system related tasks as compiler writing or scheduling, and as to their optimized implementation in the restricted and generalized form as proposed here.

What has been left out? A richer looping structure, or in general a structure of the language where the limits between statements and values are less pronounced, could be conceived. A loop might well return as one of its values its final index, thus providing a different solution to the index protection problem. BLISS is a language where there is hardly a distinction between statements and values.
A different interpretation of types may be attractive, efficient, and much more portable; I am referring to the range interpretation of data types in PASCAL, which Habermann found to have some deficiencies. A consolidation of the concepts in either area may well prove to be another advancement.

Every once in a while, though, we will have to take a language design from the drawing boards to the real world as personified by at least the implementors. I hope that CLEOPATRA has reached this stage, at least to the extent as delimited in this report. As an ultimate test for this system implementation language we originally and still foresee an actual operating system implementation; I both hope for, and dread, that moment.

Let me close with another quote from Hoare's "Hints on programming language design": "A final hint: listen carefully to what language users say they want, until you have an understanding of what they really want. Then find some way of achieving the latter at a small fraction of the cost of the former. This is the test of success in language design, and of progress in programming methodology. Perhaps these two are the same subject anyway."

To some extent I have used the prerogative of the doubting newcomer. But there will always be another time - language design to me is the most subjective activity conceivable.

A. A summary of CLEOPATRA.

CLEOPATRA is defined in the Report [Schreiner 1974]. In this appendix we shall give a brief summary of its main features. The summary is given in the form of a Taxonomic Description by Feature, as suggested by Wulf [1972a] in the Project Rosetta Stone, appendix B.

I. Data

A.
Literal types: a notation exists for denoting constants of type

    (1) INTEGER (long and short)
    (2) REAL (long and short)
    (3) boolean: BIT
    (4) CHARACTER (variable length)
    (5) POINTER (to no object)
    (6) label: EXIT labels
    (10) other: ADDRESS (addresses are LONG_INTEGER)
         BYTE (non-negative integer 0..255)
         CONSTANT (any variable can be store-protected)
         DECIMAL (packed decimal format)
         E, FALSE, FIRST, LAST (CHARACTER in the collating sequence), PI, TRUE are provided.
         LARGE, NIL, SMALL (the user can influence a maximum, zero, and minimum value for all user-defined types.)

    All numerical constants can be written in number bases 2, 8, 10, and 16.

B. Literal aggregates: a notation exists for denoting constants of type

    (1) array (bounds within [-32768,32767], at most 32767 dimensions): a temporary array can be constructed with a built-in ARRAY mechanism. It can be of any type.
    (2) string (maximum length 32767 elements): discussed under I.A.4 above.
    (9) other: CONSTANT (any variable can be store-protected.)
        LARGE, NIL, SMALL (the user can influence a maximum, zero, and minimum value for all user-defined types.)

C. Supported data types

    (1) All users:
        INTEGER (LONG or short)
        REAL (LONG or short)
        BYTE
        BIT
        DECIMAL (size 1..16 bytes)
        CHARACTER (length 1..32767 elements)
        POINTER (to a specified type)
        DECISION table 'switches'
    (2) Privileged users:
        ADDRESS
        PSW   Program Status Word
        CSW   Channel Status Word
        CCW   Channel Command Word
        KEY   protection key
        TIME
    (3) other:
        (a) aggregation as arrays with crossectioning and reshaping of all data types. Reshaping consists of changing the number of dimensions and the bounds, and decreasing the extents of an array. The number of dimensions distinguishes recognized types.
        (b) aggregation as data_groups (structures) of all data types. Operators can be distributed over data_groups. Order and alignment of the members distinguishes recognized types.
        (c) alignment to dividing boundaries or BIT boundaries.
Alignment contributes to the recognized type.
        (d) the user may construct new data types by defining a type_pack: an underlying representation and operators for the new type. Scope rules encourage the confinement of access to the underlying representation to routines nested into the type_pack.

II. Naming Issues and Variables

A. Type Association

    (3) typed: all names are strongly typed, i.e., no implied conversions (coercions) in passing parameters. An exception is the flexible treatment of DECIMAL values of various sizes.

B. Scope Association

    (1) static (as per ALGOL 60 block structure)
    (3) other: names can be local to a block (configuration), global to the current configuration tree, or global to the predecessor configuration tree. ALIAS names can be declared. BUILT IN names can be recalled.

C. Extent (or "lifetime")

    (1) local (automatic)
    (b) deletion
    (3) controlled (POINTER based)
    (5) other: the underlying representation for a user-defined type is retained according to the declaration of a variable of that type, not according to the dynamic nesting of the creation-time routine, i.e., the configuration defining and initializing the underlying representation.

D. Parameters (to subprograms)

    (1) all types (basic, user-defined, aggregated, etc.) can be passed.
    (2) Parameter convention
        (b) call-by-reference (as per FORTRAN)
        (c) call-by-result (as per ALGOL-W): the COPY option
        (e) other: unless explicitly stated, actual parameters are read-only. Intent to change must be stated (BY ADDRESS), and requires a modifiable actual parameter. Almost no conversions are implied; see II.A.3 above.

E. Storage Management

    (2) stack
    (3) heap (or free-list)
    (b) other: garbage collection as per explicit RELEASE request or per implicit loss of referencability is proposed. Alternatively, the heap is managed as a semi-dynamic extension of the stack a la ESPL [Manacher 1971].

III. Control

A.
Simple forms

    (1) complete operator hierarchy: right to left expression evaluation
    (3) statement grouping: with BEGIN..END bracket
    (6) conditional forms
        (a) IF..THEN (statement)