 
Digitized by the Internet Archive 
 in 2013 
 
 http://archive.org/details/nucleolminimalli324niev 
 
 
 NUCLEOL - A MINIMAL LIST PROCESSOR 
 
 by 
 
 J. Nievergelt, F. Fischer, M. I. Irland and J. R. Sidlo 
 
April, 1969
 
 DEPARTMENT OF COMPUTER SCIENCE 
 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 
 
 URBANA, ILLINOIS 
 
Report No. 324 
 
 
 Department of Computer Science 
 University of Illinois 
Urbana, Illinois 61801
 
 Paper presented at the Purdue Centennial Year Symposium on Information 
 Processing, April 28, 1969. 
 
 This work was supported by the Department of Computer Science at the 
 University of Illinois, the National Science Foundation under NSF Grant 
 GJ 217, and by BUILD, which sponsors cooperation between the Universities 
 of Illinois and Colorado. 
 
NUCLEOL - A Minimal List Processor

J. Nievergelt, F. Fischer, M. I. Irland*, J. R. Sidlo*
 Department of Computer Science, University of Illinois 
 
 Abstract 
 
 NUCLEOL is a low-level list processor designed as a basis in terms of 
 which higher-level list- and string-processing languages could be imple- 
 mented easily and efficiently. Hence its design aims at: 
 
 a) Simplicity 
 
 b) Complete and concise description 
 
 c) General data structures and a small, well-chosen set of primitive 
 operations 
 
 d) A scheme for implementation which makes it easy to transfer the system 
 from one computer to another. 
 
The system is currently implemented as a PL/I program.
 
 1. Background and purpose 
 
 The project to be described has its origin in our experiences with 
transferring a symbol manipulation language from one computer to others.
The language in question is EOL [1,2,3], which was first implemented on
the Polish computer ZAM at the Institute for Mathematical Machines in
Warsaw and later on an IBM 7094 and an IBM 360 at the University of
Illinois. The specific features of EOL are not important for the purpose
 of this paper. They have, however, strongly influenced our design of 
 NUCLEOL. 
 
Implementations of different list-processing languages have many common
features; indeed, their very name refers to a particular scheme for
organizing storage. In most implementations, however, these common
features are tied, usually but unnecessarily, to a particular language,
and it does not appear to be common practice for different list-processing
languages to share the same basic subroutines.
 
 Our guideline in designing NUCLEOL was to draw an explicit dividing 
 line between what we consider to be "high-level features" in which list- 
 processing languages tend to differ a great deal, and the "low-level 
features" common to most of them; and to have NUCLEOL provide most of the
 latter and so, in a sense, simulate a computer specifically adapted to 
 list processing. 
 
An important qualification must be made at this point. The name "list"
covers several data structures among which it is useful to distinguish,
in order of increasing generality:

strings or linear lists (one-way or two-way)

trees (lists without shared sublists)

graphs (structures whose elements are linked to each other in arbitrary
ways)

* M. I. Irland is currently with the University of Waterloo and
J. R. Sidlo with Computer Sciences Corporation.
 
Probably the only thing common to handling all of these data structures is
the ability to define fields and manipulate pointers. There are languages,
like L6 [4], that are aimed at this level. We think that drawing the
dividing line at the level of fields and pointers leaves a much greater part
of the implementation of a high-level list-processing language above the
line than below it, and we aimed at easing the burden of implementing
list-processing languages to a greater extent. In particular, we wanted
NUCLEOL itself to take care of the organization of lists in terms of data
fields and pointers, so that a user would not have to refer explicitly to
pointers to carry out such list-oriented operations as insertion and
deletion of sublists (but could, e.g., say something like "insert list x at
point y in list z").
 
 Aiming at this level, however, forced us to renounce the generality of 
 graphs as data structures. It is in the restricted case of trees that we 
 felt there were sufficiently general and efficient common operations that 
 would warrant our effort. 
 
Our goal in designing NUCLEOL can now be stated as follows: to provide
a system, as simple as possible, which is sufficient, in a practical sense,
for any list-processing language that operates on tree-structured data to
be implemented in terms of it easily and efficiently. It therefore became
crucially important that the description of NUCLEOL itself be complete, so
that it would leave no room for misunderstanding by a person who studied it
sufficiently deeply, and that the implementation of NUCLEOL itself be
simple.

At the end of this paper we discuss to what degree we consider this goal
to have been achieved.
 
 2. Informal Description 
 
NUCLEOL programs as well as data are well-formed strings (abbreviated
as wfs) of units called constituents. A constituent carries the following
information: its type, an attribute, and (with the exception of parenthesis
constituents mentioned below) data. Among the various types of constituents
there are two, the left parenthesis $( and the right parenthesis
$), which occur in a wfs in a balanced way. Hence it is convenient to
introduce a unit called a block, which is either a single constituent
(other than a parenthesis) or an appropriate string enclosed in parentheses.
Because of this block structure a wfs can be interpreted as being
organized as a tree as well as a linear string.
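To make the block structure concrete, the following Python sketch reads a wfs written in the reference language introduced below and groups its constituents into blocks. The token pattern, the (type, attribute, data) triples and the nested-list representation of blocks are our own assumptions for illustration; they are not part of the NUCLEOL definition.

```python
import re

# Assumed token syntax, modelled on the reference-language examples:
#   $( or $) followed by attribute letters, e.g. $(X, $)XW (no data)
#   $B, $C, $D, $P, $R, $S followed by attribute letters and quoted data
TOKEN = re.compile(r"\$([()])([A-Z]*)|\$([BCDPRS])([A-Z]*)\s*'([^']*)'")


def tokenize(text):
    """Split reference-language text into (type, attribute, data) triples."""
    tokens, pos = [], 0
    for m in TOKEN.finditer(text):
        if text[pos:m.start()].strip():
            raise ValueError(f"unexpected text: {text[pos:m.start()]!r}")
        if m.group(1):  # parenthesis constituent: carries no data
            tokens.append(("$" + m.group(1), m.group(2), None))
        else:
            tokens.append(("$" + m.group(3), m.group(4), m.group(5)))
        pos = m.end()
    if text[pos:].strip():
        raise ValueError(f"unexpected text: {text[pos:]!r}")
    return tokens


def to_blocks(tokens):
    """Group tokens into blocks. A block is a single constituent other than a
    parenthesis, or a list starting with $( and ending with the matching $)."""
    stack = [[]]
    for tok in tokens:
        if tok[0] == "$(":
            stack.append([tok])
        elif tok[0] == "$)":
            if len(stack) == 1:
                raise ValueError("unbalanced: $) without $(")
            block = stack.pop()
            block.append(tok)
            stack[-1].append(block)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced: $( without $)")
    return stack[0]


source = "$(X $B '00111' $( $D '-17' $C 'STRING OF ARBITRARY LENGTH' $) $S 'SOURCE' $)X"
wfs = to_blocks(tokenize(source))[0]  # the single outermost block
print(wfs[1])     # ('$B', '', '00111')
print(wfs[2][1])  # ('$D', '', '-17')
```

Reading the nested lists as a tree and flattening them back as a string gives the two interpretations of a wfs mentioned above.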
 
 Each wfs is accessed by means of a unique constituent $S called the 
 scanner, which can be moved around in a manner convenient for both of the 
 interpretations of a wfs as a linear list or a tree. The scanner gives 
 access to the two blocks to its left and right (if present), and also 
 designates the two gaps to its left and right (where a new block may be 
 inserted). The scanner carries a name as its data, which is also the name 
 of the wfs accessed by this scanner. 
 
Apart from parentheses and scanner, there is one more type of constituent
which relates to the structure of wfs's, called a reference constituent
$R. It may occur anywhere in a wfs and refer to any wfs, either in its
entirety or to the blocks or gaps near the scanner.
 
The remaining types are data constituents, of which there are three,
namely $B (bitstrings), $C (characterstrings) and $D (numbers), and
finally a parameter constituent $P, which is used to represent formal
arguments in macros.
 
A NUCLEOL state is a set of wfs's no two of which have the same name,
and exactly one of which has a scanner whose attribute characterizes it as
the execution scanner. The syntax of NUCLEOL is given by a set of rules
(mostly in Backus-Naur Form, but the sentence above is also part of the
definition) which define what a NUCLEOL state is.
 
The semantics is defined by a function NUCSTEP, which assigns a next state
to some of the states. A NUCLEOL execution is a sequence of states, each
of which is transformed by NUCSTEP into its successor. If there is a last
state, then either NUCSTEP is undefined on it, or one of a small number of
stop conditions occurred during the last transition.
 
The following example shows a NUCLEOL state consisting of three wfs's
(represented in what we call the reference language):

$(X $S 'SINK' $)X

$(X $B '00111' $( $D '-17' $C 'STRING OF ARBITRARY LENGTH' $) $S 'SOURCE' $)X

(here $( $D '-17' $C 'STRING OF ARBITRARY LENGTH' $) is the block at the left of scanner $S 'SOURCE')

$SN 'PROGRAM' $(XN $CK 'MOVE' $RL 'SOURCE' $RL 'SINK' $)XW
 
The wfs SINK is as small as it can be, since every wfs must contain at
least a pair of external parentheses (distinguished by an attribute X) and a
scanner. The scanner $S 'SOURCE' has a block to its left but none to its
right.
 
The scanner named PROGRAM is in its external position (i.e., outside the
external parentheses). This position is distinguished by the fact that (by
definition) the block to the right of the scanner is the same as the block
to its left, namely, the entire wfs exclusive of the scanner; i.e., wfs's are
considered to be circular. This scanner is also distinguished, by its
attribute N (for neutral), as the execution scanner, and its motion
represents the flow of control in the program.
 
The $C constituent inside wfs PROGRAM has an attribute K (for keyword),
which marks the beginning of the instruction $CK 'MOVE' $RL 'SOURCE' $RL 'SINK'.
Each of the two $R constituents has an attribute L (for left), and execution
of this instruction causes the block currently to the left of scanner $S
'SOURCE' to be deleted and inserted in the gap to the left of scanner
$S 'SINK'.
 
 Let us now trace the sequence of successive states generated by repeated 
 application of the function NUCSTEP on the state described above. 
 
First step: the attribute (N) of the execution scanner is matched against
the protection attribute (N) of the $( constituent to its right. Because
they match, the scanner enters the block, i.e., is shifted past the
constituent to its right. (If the attributes do not match, the execution
scanner skips around the block to its right.)
 
Second step: the MOVE instruction is executed, which deletes the block
$( $D '-17' $C 'STRING OF ARBITRARY LENGTH' $) from wfs SOURCE and inserts
it to the left of scanner $S 'SINK'. Thereafter, the execution scanner is
placed to the right of the instruction just executed.
 
 Third step: the execution scanner, which still has an attribute N, cannot 
 pass through the $) constituent with attribute W. This causes the scanner 
 to bounce to the corresponding left parenthesis, where it finds itself 
 again in front of the MOVE instruction. 
 
Fourth step: the MOVE instruction is executed again, and this time the
single constituent $B '00111' is deleted from wfs SOURCE and inserted in
the gap to the left of scanner $S 'SINK'. Then the execution scanner moves
past the instruction.
 
 Fifth step: execution scanner bounces again. 
 
Sixth step: the execution scanner attempts to execute the MOVE
instruction a third time. During evaluation of its first argument $RL 'SOURCE',
however, it is found that there is now no block to the left of the
scanner $S 'SOURCE'. This causes the execution scanner to skip the MOVE
instruction as before, but now its attribute changes from N to W (our
mnemonic for "something went wrong").
 
 Seventh step: now that the attribute of the execution scanner matches 
 the attribute of the parenthesis, the scanner passes through the $)XW, 
 instead of bouncing. Since the execution scanner's exiting through an 
 external parenthesis is a condition for stopping, the sequence terminates 
 here. 
 
 Hence, the one-instruction program above has the effect of deleting all 
 the blocks (at a given level of nesting — in this case all first-level 
 blocks) from wfs SOURCE and inserting them in reversed order in wfs SINK. 
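The seven steps can be replayed with a small Python sketch. It hard-codes the one-instruction program, keeps only the blocks to the left of the scanners SOURCE and SINK as Python lists, and assumes (consistently with the example, but not as a quotation of the definition) that a parenthesis lets the execution scanner through when the scanner's attribute is one of the letters of the parenthesis' protection attribute.

```python
def can_pass(scanner_attr, protection_attr):
    # Assumption: passing is allowed when the (non-blank) scanner attribute
    # is one of the letters of the parenthesis' protection attribute.
    return bool(scanner_attr) and scanner_attr in protection_attr


def run_example():
    # Blocks to the left of scanner $S 'SOURCE' (leftmost first); SINK starts empty.
    source_left = ["$B '00111'", "$( $D '-17' $C 'STRING OF ARBITRARY LENGTH' $)"]
    sink_left = []
    attr, pos, trace = "N", "external", []
    while True:
        if pos == "external":            # constituent to the right: $(XN
            if not can_pass(attr, "N"):
                break                    # skipping the whole wfs is not modelled
            pos = "before_move"
            trace.append("enter")
        elif pos == "before_move":       # $CK 'MOVE' $RL 'SOURCE' $RL 'SINK'
            if source_left:
                sink_left.append(source_left.pop())  # delete at SOURCE, insert at SINK
                trace.append("move")
            else:
                attr = "W"               # first argument could not be accessed
                trace.append("skip")
            pos = "before_close"
        else:                            # constituent to the right: $)XW
            if can_pass(attr, "W"):
                trace.append("exit")     # leaving the external parentheses: stop
                break
            pos = "before_move"          # bounce to just after the matching $(XN
            trace.append("bounce")
    return trace, sink_left


trace, sink = run_example()
print(trace)  # ['enter', 'move', 'bounce', 'move', 'bounce', 'skip', 'exit']
print(sink)   # the parenthesized block first, then $B '00111': reversed order
```

Appending to `sink_left` is the same as inserting into the gap immediately to the left of scanner SINK, which is why the order comes out reversed.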
 
NUCLEOL has 15 instructions (or 16, depending on how they are counted),
of which we now give a rather abbreviated description (compare this with
the syntactic description in the next section).
 
 Structural Operations 
 
MOVE In addition to the motion of blocks described earlier, this
instruction also serves the purpose of returning blocks and entire wfs's
to the free storage list (e.g., $CK 'MOVE' $R 'WFS' $R 'SYS_FREE'), and
of input and output (e.g., $CK 'MOVE' $R 'SYSINPUT' $R '<WFS NAME>' adds a
named wfs to the current NUCLEOL state from the system's input device).
 
 COPY Acts in the same way as MOVE, except that the original is copied, 
 not deleted. 
 
SHFT Shifts a scanner one constituent to the left or right, and hence
may be used to enter and leave blocks (the MOVE instruction is used to skip
blocks). Since wfs's are considered to be circular, shifting a scanner from
its external position in either direction makes sense.
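The circular view can be sketched as follows, assuming a wfs is stored as its sequence of constituents without the scanner, and the scanner is identified by the gap it occupies, gap 0 being the external position. `shift` and `neighbours` are hypothetical helpers, not NUCLEOL basic functions.

```python
def shift(gap, direction, n):
    """Shift a scanner one constituent in a circular wfs of n constituents
    (the scanner itself not counted). Gap 0 is the external position,
    between the closing $)X and the opening $(X."""
    step = 1 if direction == "RIGHT" else -1
    return (gap + step) % n


def neighbours(constituents, gap):
    """The constituents immediately to the left and right of the scanner."""
    n = len(constituents)
    return constituents[(gap - 1) % n], constituents[gap % n]


wfs = ["$(X", "$B '00111'", "$)X"]           # scanner omitted
print(neighbours(wfs, 0))                    # external: $)X on the left, $(X on the right
print(neighbours(wfs, shift(0, "LEFT", 3)))  # just inside, in front of $)X
```

Shifting left from the external position enters the wfs from its right end, and shifting right enters it from its left end; n shifts in one direction bring the scanner back to where it started.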
 
RSTR Restores a scanner to one of three positions: its external position,
the position just to the right of the next outer left parenthesis, or the
position just to the left of the next outer right parenthesis.
 
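A possible reading of RSTR on a flat list of constituents is sketched below; the function name, the gap convention (gap i lies just before constituent i) and the mode names are our assumptions. The two inner modes scan outward, counting nesting depth so that complete inner blocks are passed over.

```python
def restore(tokens, gap, mode):
    """RSTR on a wfs stored as its constituents without the scanner, from $(X
    to $)X; gap i lies just before tokens[i]. mode is 'EXTERNAL', 'LEFT'
    (just right of the next outer left parenthesis) or 'RIGHT' (just left of
    the next outer right parenthesis). Returns the new gap."""
    if mode == "EXTERNAL":
        return 0
    depth = 0
    if mode == "LEFT":
        for i in range(gap - 1, -1, -1):   # scan leftwards
            if tokens[i].startswith("$)"):
                depth += 1                  # entering a complete block from its end
            elif tokens[i].startswith("$("):
                if depth == 0:
                    return i + 1
                depth -= 1
    else:
        for i in range(gap, len(tokens)):   # scan rightwards
            if tokens[i].startswith("$("):
                depth += 1
            elif tokens[i].startswith("$)"):
                if depth == 0:
                    return i
                depth -= 1
    raise ValueError("scanner has no enclosing parenthesis")


toks = ["$(X", "$B '1'", "$(", "$D '2'", "$C 'A'", "$)", "$)X"]
print(restore(toks, 4, "LEFT"), restore(toks, 4, "RIGHT"))  # 3 5
```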
 
 Bitstring Operations 
 
AND, OR, NOT generate a $B constituent whose bitstring is obtained by
performing bitwise logical operations on the bitstrings in their arguments.
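A minimal Python sketch of the three bitstring operations, with bitstrings held as strings of '0' and '1'. Returning None stands for the case where the execution scanner's attribute would be set to W; treating operands of unequal length as such a case is our assumption, since the text does not say.

```python
def bitwise(op, a, b=None):
    """AND, OR, NOT on bitstrings held as strings of '0'/'1'.
    Returns the result bitstring, or None where NUCLEOL would set the
    execution scanner's attribute to W (assumed: unequal lengths)."""
    if op == "NOT":
        return "".join("1" if bit == "0" else "0" for bit in a)
    if b is None or len(a) != len(b):
        return None
    if op == "AND":
        return "".join("1" if x == y == "1" else "0" for x, y in zip(a, b))
    if op == "OR":
        return "".join("1" if "1" in (x, y) else "0" for x, y in zip(a, b))
    raise ValueError(f"unknown operation {op}")


print(bitwise("AND", "1100", "1010"), bitwise("NOT", "00111"))  # 1000 11000
```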
 
 Characterstring Operations 
 
 CONC generates a $C constituent whose characterstring is the concatena- 
 tion of the strings in its arguments. 
 
 SPLT splits the last character from a $C constituent and generates a 
 new $C constituent from it. 
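CONC and SPLT are simple enough to sketch directly. Here a characterstring is a Python string, and treating SPLT of an empty string as an operation that cannot be performed (None, i.e. attribute W) is an assumption.

```python
def conc(s1, s2):
    """CONC: the concatenation of two character strings."""
    return s1 + s2


def splt(s):
    """SPLT: split off the last character. Returns (remainder, last), or None
    if there is no character to split (assumed to be a W outcome)."""
    if not s:
        return None
    return s[:-1], s[-1]


# Repeated SPLT takes a string apart from the right, one character at a time.
word, chars = "EOL", []
while (parts := splt(word)) is not None:
    word, last = parts
    chars.append(last)
print(chars)  # ['L', 'O', 'E']
```

The loop ends exactly when SPLT can no longer be performed, mirroring the way a NUCLEOL program would stop on the W attribute.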
 
 Numerical Operations 
 
 ADD, SUB, MLT, DIV generate a $D constituent whose number is the result 
 of performing an arithmetic operation on the numbers in the arguments. 
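The arithmetic instructions can be sketched on the digit strings carried by $D constituents, using the <NUMBER> syntax of the next section (an optional sign followed by digits). The text does not say how DIV rounds or what happens on division by zero; truncation toward zero and a W outcome (None) are our assumptions.

```python
import re

NUMBER = re.compile(r"[+-]?[0-9]+")   # <NUMBER> ::= <DIGIT*> | +<DIGIT*> | -<DIGIT*>


def arith(op, x, y):
    """ADD, SUB, MLT, DIV on the numbers carried by two $D constituents (as
    strings). Returns the result string, or None for a W outcome.
    Assumptions: DIV truncates toward zero; division by zero fails."""
    if not (NUMBER.fullmatch(x) and NUMBER.fullmatch(y)):
        return None
    a, b = int(x), int(y)
    if op == "ADD":
        r = a + b
    elif op == "SUB":
        r = a - b
    elif op == "MLT":
        r = a * b
    elif op == "DIV":
        if b == 0:
            return None
        r = abs(a) // abs(b)
        if (a < 0) != (b < 0):
            r = -r               # truncate toward zero rather than floor
    else:
        raise ValueError(f"unknown operation {op}")
    return str(r)


print(arith("ADD", "-17", "5"), arith("DIV", "-17", "5"))  # -12 -3
```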
 
 Data Conversion 
 
 CVRT converts (if possible) a constituent of one type to a constituent 
 of another type and/or attribute (e.g., $D to $C or $B and vice versa, $C 
 to $R, etc.). 
 
 TEST 
 
All of the above instructions set the attribute of the execution scanner
to W if they cannot be performed. The last instruction, TEST, can set
this attribute to one of four values, namely:
 
 S test was successful 
 
 F failure (the comparison demanded by the test was carried out and the re- 
 sult was negative) 
 
 U undefined (the data to be compared was not of the proper type) 
 
 W wrong (the data to be compared could not be accessed) 
 
 Having more than two possible outcomes for a test is natural and very 
 useful when accessing a data item is as much part of the test as comparing 
 it once it has been found. The outcome indicates how far execution of the 
 test could be carried out. 
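The four outcomes can be illustrated with a hypothetical `evaluate_test` function. Its arguments are (type, data) pairs, or None when the data could not be accessed. Which modes apply to which types is our assumption (ordering comparisons only between $D data, equality only between constituents of the same type); the actual test modes are listed in the syntax of the next section.

```python
def evaluate_test(mode, left, right):
    """Four-valued TEST. left and right are (type, data) pairs such as
    ('$D', '-17'), or None when the argument could not be accessed."""
    if left is None or right is None:
        return "W"                       # data could not be accessed
    (ltype, ldata), (rtype, rdata) = left, right
    if mode in ("<", "<=", ">", ">="):
        if ltype != "$D" or rtype != "$D":
            return "U"                   # data not of the proper type
        a, b = int(ldata), int(rdata)
        ok = {"<": a < b, "<=": a <= b, ">": a > b, ">=": a >= b}[mode]
    elif mode in ("=", "¬="):
        if ltype != rtype:
            return "U"
        same = int(ldata) == int(rdata) if ltype == "$D" else ldata == rdata
        ok = same if mode == "=" else not same
    else:
        raise ValueError(f"unsupported test mode {mode!r}")
    return "S" if ok else "F"            # comparison carried out


print(evaluate_test("<", ("$D", "-17"), ("$D", "5")))  # S
```

The order of the checks follows the text: access first (W), then type (U), and only then the comparison itself (S or F).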
 
 Jumps 
 
Notice that there seem to be no go-to statements in this list of
instructions. This is not quite true, as the instruction RSTR, when it
refers to the execution scanner itself, is a jump (of limited usefulness).
Much more control is available by using the "bouncing and skipping" logic,
which depends on the protection attributes of parentheses and the attribute
of the execution scanner.
 
There is, however, a hidden 16th instruction, which takes effect when
the execution scanner finds itself just in front of a $R constituent,
as in

$SN 'PROGRAM' $RR 'NEW'
 
 The NUCSTEP function causes the following changes to occur in the state. 
 
 a) Shift the scanner PROGRAM past the reference constituent $RR 'NEW' 
 
b) Reset the attribute of scanner PROGRAM (to blank) so it is no longer
the execution scanner.
 
 c) Set the attribute of scanner NEW to N, so it becomes the execution 
 scanner. 
 
 Notice that execution continues wherever the scanner NEW happened to be. 
 By executing the reference $R 'NEW' instead of $RR 'NEW', the scanner NEW 
 would have been reset to its external position before exchanging control. 
 
 It is clear that with this facility, and given that reference consti- 
 tuents can be operated upon, such devices as subroutine call and return, 
 coroutine jumps, and switches are easily programmable. 
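The three state changes a) to c) can be sketched in Python as below. The scanner table, the integer gap positions (0 standing for the external position) and the name `transfer` are illustrative assumptions. The usage lines show a coroutine-style exchange in which each side resumes where it left off because $RR, not $R, is used.

```python
def transfer(scanners, current, ref_attr, target):
    """The hidden instruction: the execution scanner `current` meets
    $R<ref_attr> '<target>'. scanners maps a wfs name to a dict holding the
    scanner attribute and its position (a gap index; 0 = external position).
    Returns the name of the new execution scanner."""
    scanners[current]["pos"] += 1        # a) shift past the reference constituent
    scanners[current]["attr"] = ""       # b) blank: no longer the execution scanner
    if ref_attr == "":                   # plain $R: reset target to external position
        scanners[target]["pos"] = 0
    scanners[target]["attr"] = "N"       # c) target becomes the execution scanner
    return target


scanners = {"PROGRAM": {"attr": "N", "pos": 3}, "NEW": {"attr": "", "pos": 5}}
callee = transfer(scanners, "PROGRAM", "R", "NEW")   # PROGRAM executes $RR 'NEW'
caller = transfer(scanners, callee, "R", "PROGRAM")  # NEW later executes $RR 'PROGRAM'
# PROGRAM now resumes just after its $RR 'NEW'; NEW will resume after its own $RR.
```

Executing a plain $R 'NEW' instead would set NEW's position to 0 first, so NEW would always start from its beginning, which is the behaviour one wants for calling a subroutine.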
 
 We don't present this as evidence that labels and go-to statements are 
 obsolete (maybe Dijkstra would? - see [5]). We considered seriously having 
 label constituents and allowing references to them. In NUCLEOL, however, 
 such labels would necessarily be dynamic, and the overhead associated with 
 their use (e.g., what happens when you copy a label?) did not seem consis- 
 tent with our aims of simplicity. Not having labels in NUCLEOL, of course, 
 does not imply that there could not be labels in a language based on it. 
 
3. Formal Definition
 
 Describing a programming language and defining it are two very different 
 things. In a description to someone unfamiliar with a language one wants 
 to stress a few highlights and avoid burdening his memory with details. 
 This is what we have attempted to do in the previous section. In a defi- 
 nition, on the other hand, one has to say everything there is to say. Be- 
 cause of the intended use of NUCLEOL as a basis in terms of which other 
 languages may be implemented, we felt it necessary to attempt at least to pro- 
 vide a complete rigorous definition of the language. 
 
Below is a complete definition of the syntax of NUCLEOL, mostly in (a
slightly modified) Backus-Naur Form but also containing some English
sentences (for convenience and, in one case, necessity). The notation
<something*> means "one or more occurrences of <something>" and
<something*?> means "zero or more occurrences of <something>".
 
NUCLEOL Syntax

<STATE> ::= A SET OF <WFS>'s NO TWO OF WHICH HAVE <$S>'s WITH THE SAME
            <WFS NAME> AND EXACTLY ONE OF WHICH HAS A <$S> WITH <SA>
            EQUAL TO N, S, F, U, OR W.

<WFS> ::= <$S> $(X <PA> <BLOCK*?> $)X <PA> |
          $(X <PA> <BLOCK*?> <SBLOCK> <BLOCK*?> $)X <PA>

<SBLOCK> ::= <$S> | <$(> <BLOCK*?> <SBLOCK> <BLOCK*?> <$)>

<$B> ::= $B<BA>'<BITSTRING>'
<$C> ::= $C<CA>'<CHARACTERSTRING>'
<$D> ::= $D<DA>'<NUMBER>'
<$P> ::= $P<TA>'<CHARACTERSTRING>'
<$R> ::= $R<RA>'<WFS NAME>'
<$(> ::= $(<PA>
<$)> ::= $)<PA>
<$S> ::= $S<SA>'<WFS NAME>'

<BA> ::= <BLANK> | S
<CA> ::= <BLANK> | K | M | S
<DA> ::= <BLANK> | S
<TA> ::= <BLANK> | B | C | D | R
<RA> ::= <BLANK> | L | R
<PA> ::= <BLANK> | N | S | F | U | W OR COMBINATIONS OF N, S, F, U, W
<SA> ::= <BLANK> | N | S | F | U | W

<BITSTRING> ::= <BIT*?>
<BIT> ::= 0 | 1
<CHARACTERSTRING> ::= <CHARACTER*?>
<CHARACTER> ::= <BLANK> | <REST> | <LETTER> | <DIGIT>
<BLANK> ::= A SINGLE SPACE
<REST> ::= . | < | ( | + | & | $ | * | ) | / | , | % | _ | > | ? | : | # | @
<LETTER> ::= A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P |
             Q | R | S | T | U | V | W | X | Y | Z
<DIGIT> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<NUMBER> ::= <DIGIT*> | +<DIGIT*> | -<DIGIT*>
<WFS NAME> ::= SEQUENCE OF LETTERS, DIGITS, AND THE CHARACTER '_',
               BEGINNING WITH A LETTER AND NOT LONGER THAN 8 CHARACTERS.

<INSTRUCTION> ::= <MOVE> | <COPY> | <SHFT> | <RSTR> | <CVRT> | <ADD> | <SUB> |
                  <MLT> | <DIV> | <AND> | <OR> | <NOT> | <CONC> | <SPLT> | <TEST>

<MOVE> ::= $CK'MOVE' <$R> <$R>
<COPY> ::= $CK'COPY' ( <BLOCK> <$R> | <$R> <$R> )
<SHFT> ::= $CK'SHFT' <SR>
<RSTR> ::= $CK'RSTR' ( <SR> | <WR> )
<CVRT> ::= $CK'CVRT' ( <T/A> | <SR> ) <SR>
<ADD>  ::= $CK'ADD' ( <$D> | <SR> ) ( <$D> | <SR> ) <SR>
<SUB>  ::= $CK'SUB' ( <$D> | <SR> ) ( <$D> | <SR> ) <SR>
<MLT>  ::= $CK'MLT' ( <$D> | <SR> ) ( <$D> | <SR> ) <SR>
<DIV>  ::= $CK'DIV' ( <$D> | <SR> ) ( <$D> | <SR> ) <SR>
<AND>  ::= $CK'AND' ( <$B> | <SR> ) ( <$B> | <SR> ) <SR>
<OR>   ::= $CK'OR' ( <$B> | <SR> ) ( <$B> | <SR> ) <SR>
<NOT>  ::= $CK'NOT' ( <$B> | <SR> ) <SR>
<CONC> ::= $CK'CONC' ( <$C> | <SR> ) ( <$C> | <SR> ) <SR>
<SPLT> ::= $CK'SPLT' <SR> <SR>
<TEST> ::= $CK'TEST' ( <$B> | <$C> | <$D> | <$P> | <SR> ) ( <TEST MODE> | <SR> )
                     ( <$B> | <$C> | <$D> | <$P> | <SR> )

<SR> ::= $RL'<WFS NAME>' | $RR'<WFS NAME>'
<WR> ::= $R'<WFS NAME>'

<T/A> ::= $C'B/<BA>' | $C'C/<CA>' | $C'D/<DA>' | $C'P/<TA>' | $C'R/<RA>' |
          $C'/<BA>' | $C'/<CA>' | $C'/<DA>' | $C'/<TA>' | $C'/<RA>' | $C'/<PA>'

<TEST MODE> ::= $C'=' | $C'¬=' | $C'<' | $C'<=' | $C'>' | $C'>=' |
                $C'D=A' | $C'D¬=A' | $C'A=D' | $C'A¬=D' | $C'D=D' | $C'D¬=D' |
                $C'A=A' | $C'A¬=A' | $C'A=T' | $C'T=A' | $C'T¬=A' | $C'T=D' |
                $C'T=T'
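Some of the lexical rules above translate directly into regular expressions, as in the Python sketch below. The extra character allowed in <WFS NAME> besides letters and digits is assumed to be '_', as in the name SYS_STOP used by the interpreter; the function names are hypothetical.

```python
import re

BITSTRING = re.compile(r"[01]*")               # <BIT*?>: zero or more bits
NUMBER = re.compile(r"[+-]?[0-9]+")            # optional sign, one or more digits
# Assumption: the extra character allowed in wfs names is '_'.
WFS_NAME = re.compile(r"[A-Z][A-Z0-9_]{0,7}")  # starts with a letter, at most 8 chars


def is_bitstring(s):
    return BITSTRING.fullmatch(s) is not None


def is_number(s):
    return NUMBER.fullmatch(s) is not None


def is_wfs_name(s):
    return WFS_NAME.fullmatch(s) is not None
```

Note the asymmetry the notation encodes: a bitstring may be empty (<BIT*?>), while a number needs at least one digit (<DIGIT*>).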
 
 While there are well-established tools for the definition of the syntax 
 of programming languages, the situation is completely different with res- 
 pect to semantics. 
 
We insisted that the definition should serve the dual purpose of defining
NUCLEOL to humans and to machines. This principle is not often taken
into consideration. It is true that any compiler or interpreter defines
a language completely to a particular computer, but this is not of much
use to somebody who must implement the language on a new machine.
 
A review of earlier attempts to define programming languages indicated
to us that McCarthy's approach ([6, 7] and other papers) would
be best suited to serve our dual purpose. Hence a definition of NUCLEOL
was written which is, at the same time, a PL/I program for an interpreter.
PL/I was chosen because, among well-known high-level languages, it offers
the greatest flexibility of notation, which is an important point if a
program is to be its own documentation.
 
The resulting interpreter currently consists of about 1500 PL/I
statements. We estimate that through "tight coding" this number could be
reduced to 1000, but our aim was clear documentation and avoidance of all
"tricky" programming.
 
Only the top part of the interpreter, which consists of about 400
statements, is part of the formal definition of NUCLEOL. It is written in
terms of about 50 basic predicates and functions, listed below. The
remaining 1000 statements implement these predicates and functions; they
are too detailed and machine-dependent (in this case, PL/I-dependent) to
be very enlightening.
 
NUCLEOL Basic Functions and Predicates

BASIC PREDICATES:

IS_STATE(STATE);
IS_WFS(WELL_FORMED_STRING);
IS_BLOCK(BLOCK);
IS_CONSTITUENT(CONSTITUENT);
IS_TYPE(TYPE);
IS_DIRECTION(DIRECTION);
IS_BITS(BITSTRING);
IS_CHRS(CHARACTERSTRING);
IS_NUMB(NUMBER);
IS_NAME(NAME);
IS_CONVERTIBLE(CONVERSION_MODE, CONSTITUENT);
CAN_PASS(SCANNER_ATTRIBUTE, PARENTHESIS_ATTRIBUTE);
TESTS(TEST_MODE, CONSTITUENT1, CONSTITUENT2);

STATE LEVEL FUNCTIONS:

EXEC(STATE)=WFS_NAME;
WFS_NAMED(WFS_NAME)=WFS;
KILL(WFS_NAME, STATE)=NEW_STATE;
CREATE(WFS_NAME, WFS, STATE)=NEW_STATE;

WFS LEVEL FUNCTIONS:

BLOCK_AT(DIRECTION, WFS)=BLOCK;
CONSTITUENT_AT(DIRECTION, WFS)=CONSTITUENT;
DELETE(DIRECTION, WFS)=NEW_WFS;
INSERT(DIRECTION, BLOCK, WFS)=NEW_WFS;
SKIP_BLOCK(DIRECTION, WFS)=NEW_WFS;
SHIFT(DIRECTION, WFS)=NEW_WFS;
RESTORE(WFS)=NEW_WFS;

CONSTITUENT LEVEL FUNCTIONS:

TYPE_OF(CONSTITUENT)=TYPE;
ATTR_OF(CONSTITUENT)=ATTRIBUTE;
BITS_IN(CONSTITUENT)=BITSTRING;
CHRS_IN(CONSTITUENT)=CHARACTERSTRING;
NUMB_IN(CONSTITUENT)=NUMBER;
NAME_IN(CONSTITUENT)=WFS_NAME;
CONVERT_DATA(CONVERSION_MODE, CONSTITUENT)=NEW_CONSTITUENT;
SET_ATTR(ATTRIBUTE, CONSTITUENT)=NEW_CONSTITUENT;
ADDS(CONSTITUENT1, CONSTITUENT2)=NEW_CONSTITUENT;
SUBS(CONSTITUENT1, CONSTITUENT2)=NEW_CONSTITUENT;
MLTS(CONSTITUENT1, CONSTITUENT2)=NEW_CONSTITUENT;
DIVS(CONSTITUENT1, CONSTITUENT2)=NEW_CONSTITUENT;
ANDS(CONSTITUENT1, CONSTITUENT2)=NEW_CONSTITUENT;
ORS(CONSTITUENT1, CONSTITUENT2)=NEW_CONSTITUENT;
NOTS(CONSTITUENT1)=NEW_CONSTITUENT;
CONCS_CHRS(CONSTITUENT1, CONSTITUENT2)=NEW_CONSTITUENT;
SPLIT_CHRS1(CONSTITUENT)=NEW_CONSTITUENT;
SPLIT_CHRS2(CONSTITUENT)=NEW_CONSTITUENT;
L_NEBR(CONSTITUENT)=OTHER_CONSTITUENT;
R_NEBR(CONSTITUENT)=OTHER_CONSTITUENT;
MATCH_PAREN(PARENTHESIS)=OTHER_PARENTHESIS;

SUBCONSTITUENT LEVEL FUNCTIONS:

OPPOSITE(DIRECTION)=NEW_DIRECTION;
 
As an example of the definitional part of the interpreter, we show below
the top level of the function NUCSTEP discussed earlier, and the
interpreter's "main loop", which calls NUCSTEP.
 
DO WHILE (IS_STATE(STATE));
   STATE = NUCSTEP(STATE);
END;

NUCSTEP: PROCEDURE (STATE);
   NXT = CONSTITUENT_AT_RIGHT(EXEC_SCANNER);
   IF TYPE_OF(NXT) = $C & ATTR_OF(NXT) = $K THEN RETURN(EXECUTE_INSTR(STATE));
   IF TYPE_OF(NXT) = $LEFT_PAREN | TYPE_OF(NXT) = $RIGHT_PAREN THEN DO;
      IF CAN_PASS(ATTR_OF(EXEC_SCANNER), ATTR_OF(NXT)) THEN DO;
         ATTR_OF(EXEC_SCANNER) = $N;
         RETURN(SHIFT(RIGHT, EXEC_SCANNER));
      END;
      CONSTITUENT_AT_RIGHT(EXEC_SCANNER) = R_NEBR(MATCH_PAREN(NXT));
      RETURN(STATE);
   END;
   IF TYPE_OF(NXT) = $R THEN DO;
      IF NAME_IN(NXT) = 'SYS_STOP' THEN GO TO STOP;
      STATE = SHIFT(RIGHT, EXEC_SCANNER);
      ATTR_OF(EXEC_SCANNER) = $BLANK;
      WFS = WFS_NAMED(NAME_IN(NXT));
      ATTR_OF(WFS) = $N;
      EXEC_SCANNER = WFS;   /* CHANGE EXECUTION SCANNER */
      IF ATTR_OF(NXT) = $BLANK THEN RETURN(RESTORE(WFS));
      RETURN(STATE);
   END;
   RETURN(SHIFT(RIGHT, EXEC_SCANNER));   /* IN ALL OTHER CASES */
END NUCSTEP;
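For experimentation, the parenthesis branch of NUCSTEP can be transliterated into Python as below. The flat list of (type, protection attribute) pairs, the precomputed matching table and the membership test standing in for CAN_PASS are our assumptions; in the definition, CAN_PASS is a basic predicate constrained by the postulates.

```python
def match_table(tokens):
    """Map the index of every parenthesis to the index of its partner."""
    match, stack = {}, []
    for i, (kind, _) in enumerate(tokens):
        if kind == "$(":
            stack.append(i)
        elif kind == "$)":
            j = stack.pop()
            match[i], match[j] = j, i
    return match


def paren_step(tokens, gap, attr):
    """The parenthesis case of NUCSTEP. tokens are (type, protection attribute)
    pairs; gap i lies just before tokens[i], and tokens[gap] is a parenthesis.
    Returns (new gap, new scanner attribute)."""
    _, protection = tokens[gap]
    if attr and attr in protection:         # CAN_PASS (assumed membership test)
        return gap + 1, "N"                 # SHIFT(RIGHT, ...), attribute reset to N
    # Otherwise R_NEBR(MATCH_PAREN(NXT)) becomes the constituent to the
    # scanner's right: a skip when NXT is $( and a bounce when NXT is $).
    return match_table(tokens)[gap] + 1, attr


program = [("$(", "N"), ("INSTR", ""), ("$)", "W")]
print(paren_step(program, 2, "N"))  # (1, 'N'): bounce back in front of INSTR
```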
 
The definition of NUCLEOL is completed by a set of about 60 postulates
which relate the basic predicates and functions to each other. Here is a
sample.
 
NUCLEOL Postulates

IS_TYPE(TYPE) <=> TYPE=$B
                | TYPE=$C
                | TYPE=$D
                | TYPE=$P
                | TYPE=$R
                | TYPE=$LEFT_PAREN
                | TYPE=$RIGHT_PAREN ;

IS_DIRECTION(D) <=> D=LEFT | D=RIGHT ;
OPPOSITE(LEFT)=RIGHT ;
OPPOSITE(RIGHT)=LEFT ;
IS_DIRECTION(OPPOSITE(D)) <=> IS_DIRECTION(D) ;

IS_TYPE(TYPE_OF(C)) <=> IS_CONSTITUENT(C) ;
IS_BITS(BITS_IN(C)) <=> IS_CONSTITUENT(C) ;
IS_CHRS(CHRS_IN(C)) <=> IS_CONSTITUENT(C) ;
IS_NUMB(NUMB_IN(C)) <=> IS_CONSTITUENT(C) ;
IS_NAME(NAME_IN(C)) <=> IS_CONSTITUENT(C) ;

IS_CONSTITUENT(ADDS(C1,C2)) <=> TYPE_OF(C1)=$D & TYPE_OF(C2)=$D ;
IS_CONSTITUENT(ADDS(C1,C2)) => TYPE_OF(ADDS(C1,C2))=$D ;

IS_CONVERTIBLE(CONVERSION_MODE, CONSTITUENT)
   => IS_CONSTITUENT(CONVERT_DATA(CONVERSION_MODE, CONSTITUENT)) ;

IS_CONSTITUENT(CONSTITUENT) => IS_BLOCK(CONSTITUENT)
   | TYPE_OF(CONSTITUENT)=$LEFT_PAREN
   | TYPE_OF(CONSTITUENT)=$RIGHT_PAREN ;

IS_CONSTITUENT(CONSTITUENT_AT(DIRECTION, WFS))
   <=> IS_DIRECTION(DIRECTION) & IS_WFS(WFS) ;

RESTORE(RESTORE(WFS))=RESTORE(WFS) ;
IS_DIRECTION(D) & IS_WFS(WFS)
   => RESTORE(WFS)=RESTORE(SHIFT(D,WFS)) ;

IS_BLOCK(BLOCK_AT(DIRECTION, WFS))
   => INSERT(DIRECTION, BLOCK_AT(DIRECTION, WFS), DELETE(DIRECTION, WFS))=WFS ;
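One way to read the postulates is as executable properties. The Python sketch below checks four of them against a deliberately simplified model of our own (not the interpreter's representation): for INSERT and DELETE a wfs is reduced to the tuples of blocks to the left and right of its scanner, and for SHIFT and RESTORE to a constituent count and a circular gap index.

```python
OPPOSITE = {"LEFT": "RIGHT", "RIGHT": "LEFT"}


# Model 1: (blocks left of the scanner, blocks right of it), innermost level only.
def block_at(d, wfs):
    left, right = wfs
    side = left[-1:] if d == "LEFT" else right[:1]
    return side[0] if side else None


def delete(d, wfs):
    left, right = wfs
    return (left[:-1], right) if d == "LEFT" else (left, right[1:])


def insert(d, block, wfs):
    left, right = wfs
    return (left + (block,), right) if d == "LEFT" else (left, (block,) + right)


# Model 2: (number of constituents, scanner gap), gap 0 = external position.
def shift(d, wfs):
    n, gap = wfs
    return n, (gap + (1 if d == "RIGHT" else -1)) % n


def restore(wfs):
    return wfs[0], 0


blocks = ("a", "b", "c")
for d in OPPOSITE:
    assert OPPOSITE[OPPOSITE[d]] == d                    # OPPOSITE is an involution
    for k in range(len(blocks) + 1):
        w = (blocks[:k], blocks[k:])
        if block_at(d, w) is not None:                   # IS_BLOCK(BLOCK_AT(D, W))
            assert insert(d, block_at(d, w), delete(d, w)) == w
    for gap in range(5):
        w = (5, gap)
        assert restore(restore(w)) == restore(w)         # RESTORE is idempotent
        assert restore(shift(d, w)) == restore(w)        # SHIFT does not affect RESTORE
print("the sampled postulates hold in this model")
```

A model passing such checks is only evidence of consistency, not of the completeness the authors would have liked to prove.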
 
To summarize: our definition of NUCLEOL consists of about 400 PL/I
statements, which are part of an interpreter, and about 60 postulates,
which relate to each other the functions in terms of which the interpretive
part of the definition is written. Needless to say, we would have
liked to prove some sort of completeness of this definition, but we
did not know how to go about doing this.
 
4. Conclusion
 
We consider that we have succeeded in reducing a programming language of
potentially great complexity (because of the data structures involved) to
a small yet practically usable core, whose parts fit into a conceptual
system with few basic notions. The adequacy of the instruction set was
tested during the design stage by writing a macrogenerator for NUCLEOL in
NUCLEOL.
 
We have not yet reached a definite opinion concerning the practical
feasibility of a formal definition of programming languages, even one as
simple as NUCLEOL. McCarthy's approach (which, incidentally, is the main
basis for an attempt at the formal definition of PL/I by a group at the IBM
laboratory in Vienna; see [8] and many reports) appeared to amount
essentially to "good programming": e.g., identify the basic functions in
terms of which the interpreter should be written, and distinguish carefully
among different levels of activity (in our case: operations on the state,
on a wfs, a block, a constituent, and finally on the data contained in a
constituent).
 
5. Current Work
 
One of the guiding lights in the design of NUCLEOL was its applicability
to tree transformations as they occur in linguistic analysis, particularly
in testing transformational grammars. Such a system, in which tree
transformations can be specified by patterns and replacements, and in which
subtrees that match a pattern are replaced recursively, is currently
being written in NUCLEOL.
 
Lastly, for NUCLEOL to serve its purpose it is important that it can be
implemented easily on other machines, and that it can run efficiently.
Having an interpreter written in PL/I solves the first problem for
installations which have a PL/I compiler, but hardly the second one.
 
Our aim in writing the PL/I interpreter was mainly one of documentation.
It is intended that efficient implementations of NUCLEOL will be obtained
by using this interpreter not as a PL/I program, but as input to a macro
processor. For each of the PL/I constructs used (care was exercised to
limit this set as much as possible) a macro has to be defined in the target
language. This leaves an implementor free to choose the internal
representation of the NUCLEOL data structure to be the most efficient on his
particular computer.
 
W. M. Waite of the University of Colorado is currently working on the
implementation of NUCLEOL on a CDC 6400 and a (decimal) Librascope computer
using this scheme and his Mobile Programming System [9].
 
 References 
 
1. Lukaszewicz, L. "EOL - A Symbol Manipulation Language," The Computer
Journal, Vol. 10, No. 1, May 1967.

2, 3. Lukaszewicz, L. and Nievergelt, J. "EOL-Report" and "EOL Programming
Examples," U. of Illinois, DCS Reports Nos. 241, 242, Sept. 1967.

4. Knowlton, K. C. "A Programmer's Description of L6," Comm. ACM, Vol. 9,
No. 8, Aug. 1966.

5. Dijkstra, E. "Go To Statement Considered Harmful," Comm. ACM, Vol. 11,
No. 3, March 1968 (letter to the editor).

6. McCarthy, J. "A Formal Description of a Subset of ALGOL," in
"Formal Language Description Languages" (ed. Steel), pp. 1-12,
North-Holland, 1966.

7. McCarthy, J. and Painter, J. "Correctness of a Compiler for Arithmetic
Expressions," in "Mathematical Aspects of Computer Science," Proc. Symp.
Applied Math., Vol. 19, American Math. Society, 1967.

8. Bandat, K. "On the Formal Definition of PL/I," Proc. AFIPS SJCC, 1968,
pp. 363-373.

9. Orgass, R. J. and Waite, W. M. "A Base for a Mobile Programming System,"
IBM Research Paper RC-1952, Nov. 1967.

10. Sidlo, J. R. "NUCLEOL - The Basis for the List-Processing Language
EOL-4," M.S. Thesis, U. of Illinois, Aug. 1968.

11. Irwin-Zarecki, M. (I. Irland). "NUCLEOL as a Formal System," M.S. Thesis,
U. of Illinois, Feb. 1969.
 
 Acknowledgments 
 
 We are grateful to Professor L. Lukaszewicz of the Polish Academy of 
 Sciences, W. M. Waite of the University of Colorado, and B. D. Weathers of the 
 University of Missouri for important discussions on the design of NUCLEOL; and 
 to Mr. Kiyoshi Maruyama for his help in programming and debugging. 
 
Our work was supported by the Department of Computer Science at the
University of Illinois, the National Science Foundation under NSF Grant GJ 217,
and by BUILD, which sponsors cooperation between the Universities of
Illinois and Colorado.