 UIUCDCS-R-73-607

 A Revised ALGOL 68 Hardware Representation
 for ISO-code and EBCDIC

 November, 1973

 by
 Wilfred J. Hansen

 DEPARTMENT OF COMPUTER SCIENCE
 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

 URBANA, ILLINOIS
 
 
 This work was supported by the Department of Computer Science 
 
 
 Annotated Table of Contents

 I. Design Considerations for ALGOL 68 Representations
    A philosophical discussion of the problems that
    complicate representation design

    I.1 Psychological considerations
    I.2 Decisions demanded by the Report
    I.3 Hardware Considerations
        Including tables of ISO-code and EBCDIC

 II. Five ALGOL 68 Symbol Set Suggestions

    II.1 plus-i-times-symbol: '+*'
    II.2 of-symbol: '-<'
    II.3 stick-symbol, again-symbol, or-symbol
    II.4 Disentangling '↓', '↑', '~', and '⌊'
    II.5 'flip': true, 'flop': false

 III. The Design of the Hardware Representation
    Using characters available in both ISO and EBCDIC

    III.1 Letter tokens
    III.2 Bold tags
    III.3 Composite characters
    III.4 Carriage Return, Line Feed, and Delete
    III.5 Notes on Particular Representations
    III.6 Other-string-items, other-pragment-items
    III.7 Abs, repr, and conversion
    III.8 Use of remaining characters
    III.9 Representations with smaller character sets
    III.10 Guide to reading Appendix A

 IV. Stropping Recommendations
    Stropping clutters, but if you must strop, use case
    shift or a post-fix underbar

    IV.1 Postfix underline is bold (least favored, but better
         than sixteen other schemes)
    IV.2 Upper case is bold
    IV.3 Lower case or postfix underline is bold
    IV.4 Reserved words are bold (most preferred technique)

 References

 Appendix

    A. Proposed Hardware Representation
 
 Abstract 
 
 Because of the latitude allowed by the Revised ALGOL 68 Report, 
 each implementation has a slightly different representation for the 
 constructs of the language. This diversity can only lead to confusion
 as ALGOL 68 trained individuals find they need readaptation to programs 
 at a new installation. The solution proposed here is to develop a single 
 hardware representation which can be used on many computer systems. In 
 fact this representation can conveniently be designed using only the 
 intersection of the graphic characters available in the ISO code and 
 EBCDIC. 
 
 The paper also proposes comfortable new representations for a few 
 symbols and discusses the thorny problem of distinguishing bold face words.
 
 Preface 
 
 I didn't want to write this paper. 
 
 After the Los Angeles meeting of WG2.1 approved the Revised Report on
 the Algorithmic Language ALGOL 68 (1) I decided to spend two leisurely days
 designing a transportation representation for the language. It wasn't
 that easy (but I did it; it's another paper). Because the transportation
 representation is an encoding of ALGOL 68 program texts, an adequate
 explanation requires sample encoder and decoder programs. These must
 assume some specific hardware representation. Lacking a compiler, I
 decided I could quickly design a suitable hardware representation language
 of my own.
 
 Older now, but wiser, I offer the following. 
 
 Sections three and four explain the hardware representation and sections 
 one and two explain why I chose it. To a large extent, the sections can be 
 read independently and the recommendations of one adopted without adopting 
 any other. In particular, the transportation representation in no way 
 depends on this hardware representation. 
 
 
 Acknowledgements 
 
 This paper has been written with continuous reference to 'An ISO-Code
 Representation for ALGOL 68' by C. H. Lindsey (2). I am indebted to
 J. E. L. Peck for introducing me to ALGOL 68 and patiently explaining 
 obscurities I encountered. 
 
I. Considerations in the Design of Hardware Representations 
 
 Now that the Revised Report has been approved by WG2.1, it is appropriate 
 to reconsider the question of hardware representations of the language. 
 (Hereafter, the Revised Report will be referred to as the Report; if there 
 were any references to the earlier Report, they would specify the Original 
 Report. Most remarks apply to both, anyway.) The Report has been written 
 with the thought that it will be implemented on a wide variety of hardware 
 with vastly differing character sets. As a consequence, it is not 
 particularly specific as to how any construct will be represented on (say) 
 cards. 
 
 Rather than envision the possibility of a tower of Babel of 
 representations, I suggest that there is in fact a widely available 
 set of graphics with which all constructs of the language can be 
 represented. Selection of a widely available character set will bring 
 these benefits : 
 
 o as trained ALGOL 68 programmers move from implementation to
 implementation, they will be able to begin programming and reading programs 
 immediately and without confusion. 
 
 o as programs are sent from one installation to another, they will be 
 understood without lengthy explanation and constant referral by the reader 
 between the text and a codebook. 
 
 o many installations have a variety of devices with different character 
 sets. Only by choice of a widely available representation will programmers 
 be able to access the files containing their programs from all these devices, 
 including both terminals and line printers. 
 
 o ALGOL 68 will more readily be accepted by outsiders as a single 
 language and not a collection of similar languages. 
 
 It must be recognized that at the majority of installations, and 
 especially in North America, ALGOL 68 will not be the primary language in 
 use. For this reason it cannot be expected that the operating system will 
 have provisions especially suited to the language. In particular, for many 
 years ALGOL 68 by itself will not be a strong force determining the nature 
 of character sets provided as standard by manufacturers. 
 
 The many considerations that affect design of a representation can be 
 categorized into psychology, ALGOL 68, and hardware. These topics are 
 covered in the remainder of section I. Section II suggests a few 
 representations for symbols that may be controversial. It is important to 
 note that the representation finally arrived at can be adjusted to take 
 into account rejection of any of these suggestions. Section III details 
 the decisions made in the rest of the representation and section IV 
 discusses the sticky question of representations for bold-tags. 
 
 I.1 Psychological considerations.
 
 Care must be exercised in the design of representations for a number of 
 general psychological reasons : 
 
 a) There are many possible sources of small confusions in representation
 design: odd characters, context dependent usage, dissimilar usage in similar
 contexts in similar languages, breaks in typing rhythms, and more. Each
 instance of confusion may be only a minor annoyance, or it may interrupt a
 train of thought and result in omission of critical phrases. Moreover,
 confusions may have a cumulative effect that can lead to frustration and
 breakdown in communication.
 
 b) To some extent the physical representation and not the abstract 
 'strict language' is the medium of thought. This is true when writing a 
 program and even more so when reading a program. Consequently, variations 
 between physical representations ought to be carefully restricted. 
 
 c) When writing a program, a trained programmer writes by reflex and
 concentrates instead on the task at hand. For small changes of representation
 when changing installations, the retraining period is probably small, but
 if it can be avoided, everyone benefits.

 d) Representations should be chosen with an eye to the representations
 used in other fields and in other languages. The task of attracting ALGOL 68
 users is sufficiently difficult without repelling them with unusual symbol
 choices. (The Report has excellent symbol choices; I worry about implementations.)
 
 Beyond these general considerations, the designer must keep in mind a 
 number of human factors effects : 
 
 a) Some reasonably consistent aesthetic should be followed in the design 
 to assist in readability. The aesthetic of the ALGOL 68 reference language is a 
 pleasant combination of natural language and mathematical conventions. It seems 
 characterized by economy of expression and avoidance of clutter. 
 
 b) Simultaneous with aesthetics, the designer must strive for understand- 
 ability, unmistakability, and clarity. 
 
 c) A specific problem is that operators that bind closely ought to take
 less space than those that bind more loosely. In this regard, a bold tag used
 as an operator can suffer "stropping separation". This includes not only the
 length of the tag, but also the character(s) needed to strop it and any
 necessary blanks.

 d) Another factor is the length of the text. Too short a text may
 correspond to a program that has been abbreviated beyond reason; but too long
 a text slows both the writer and the reader. A longer length for an
 infrequently used operator is acceptable because it will not substantially
 affect the length of the program.

 e) No one is aided if an implementation provides too many alternative
 ways of expressing a single construct. Programmers are constantly forced to
 choose among alternatives; readers must be prepared to encounter that many
 more symbols. When a text includes a rare alternative form, the reader may
 remember it erroneously; he will at least have to interrupt his reading to
 try to recall the symbol.
 
Selection of specific symbols must depend on yet other factors: 
 
 a) The existence of groups of potential programmers trained in the 
 meaning of a symbol. For example, '+' ought to mean addition because most 
 potential users have studied algebra. 
 
 b) Use of the symbol in other widespread languages. Conflicting usage 
 could be a source of confusion. 
 
 c) Relation of a symbol to its meaning ("graphic onomatopoeia"). For 
 example, parentheses, braces, and brackets do appear to surround their 
 contents. 
 
 d) The possibility of a confusing similarity between two graphics. 
 Lindsey points out, for example, that '+' and '*' appear alike on teletypes.
 
 I.2 Decisions demanded by the Report.

 In a number of areas, the Report leaves considerable latitude to
 accommodate implementations with varying character sets. The representation
 must specify what is allowed for other-string-item, STYLE-other-PRAGMENT-item,
 style-TALLY-letter-ABC, and style-TALLY-monad {R9.4 a}. At least one means
 must be provided to write any operator in the standard prelude {R9.4 b};
 decisions must be made as to the values of abs, repr, 'null character',
 'error char', 'flip', 'flop', 'blank', and 'max abs char' {R10.2.1}; and a
 conversion must be associated with 'stand conv' {R10.3.1.2 d}.
 
 In 9.4 b), the Report accepts an implementation even if it provides
 only one of the alternatives for each symbol (say either '@' or at). The
 intention, in fact, seems to be to accept any representation language that
 provides at least one way of expressing each construct in the language.
 Thus an implementation need not necessarily have both stick-symbol and
 then-symbol, since with either one a choice-clause can be constructed
 {R9.1.1 c,h}. Similarly, a times-ten-to-the-power-symbol might be omitted
 since some letter-e-symbol will usually be available {R8.1.2.1 h}.
 
 Selection of representations for standard prelude operators is complicated 
 by the fact that not only do many operators have a number of symbols, but 
 many symbols are assigned more than one function. Some way must be found 
 through the mazes of relationships among uses and alternatives for '↑', '~',
 and '⌊'. Likewise, there are some complications if other than the reference
 representations are chosen for any of the multiple symbols that map into 
 certain representations (for example, four symbols map into ':'). Some 
 decision must be made as to whether the implementation will accept the
 "allowable" alternatives like '..' for ':'. {R9.4 b}
 
 Occasionally, a representation designer will be forced to consider use 
 of a diphthong for some operation. In this effort, he must check the 
 operator grammar in 9.4.2.1 of the Report to see that the diphthong is legal
 and thus will not cause ambiguity. A secondary consideration is to try to 
 leave intact the possibility of families of operators. For example, diphthongs 
 ending in '/' should be avoided because they are all available for a family: 
 
 V V // -/ ^/ 
 
Indeed, this is the family of APL reduction operators. 
 
 Finally, 9.4.2.2 b specifies that a bold-tag is composed of marks
 corresponding to its LETTERs and DIGITs, where the 'mark corresponding
 to each LETTER ([or] DIGIT) is similar to the mark representing the corresponding
 LETTER-symbol ([or] DIGIT-symbol)'. Interpretation of the word 'similar' has
 led to a variety of "stropping techniques". The representation designer must 
 choose one of these techniques. My own suggestions will hinge on the 
 observation that nothing is more similar to an object than itself, but see 
 section IV for the gory details. 
 
 I.3 Hardware Considerations.

 Two standard codes for computer text have been defined and widely
 used: the ISO code (2) (and especially its ASCII subset (3,5)) and
 EBCDIC (4,5). The former is used by most of the world, and the latter
 is used by only one manufacturer. Tables defining these two codes are
 reproduced in figures 1, 2, and 3. Note that ISO has many national variants
 and the code provides spaces where national groups can place characters
 specific to their own needs.
 
 An ALGOL 68 program will certainly not be given the same binary encoding
 in the two codes, since, for example, '+' is '01001110' in EBCDIC and '0101011'
 in ISO. However, in an important sense one can make programs in the two
 codes appear similar: the graphics chosen to represent each ALGOL 68 construct
 can be the same in both codes.
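
 The difference in binary encoding can be checked mechanically. The following
 is a minimal sketch, not part of the proposal; it assumes that Python's "cp037"
 codec (one IBM EBCDIC code page) is an acceptable stand-in for EBCDIC, and the
 helper name bit_pattern is hypothetical.

```python
# Sketch: the binary encodings of '+' in the two codes.
# Assumption: the "cp037" codec (an IBM EBCDIC code page) stands in for EBCDIC.

def bit_pattern(ch: str, codec: str, width: int) -> str:
    """Return the code position of `ch` in `codec` as a binary string."""
    (position,) = ch.encode(codec)      # exactly one byte per character
    return format(position, f"0{width}b")

ebcdic_plus = bit_pattern("+", "cp037", 8)   # EBCDIC is an 8-bit code
iso_plus = bit_pattern("+", "ascii", 7)      # ISO/ASCII is a 7-bit code

print("EBCDIC:", ebcdic_plus)
print("ISO:   ", iso_plus)
```

 The graphic is the same in both cases; only the stored bit pattern differs,
 which is the sense in which programs in the two codes can "appear similar".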
 
 Examining the code tables, we see that ISO has a number of characters 
 that will not be the same on every terminal, the so called "national 
 characters". These interfere with a common graphic representation of 
 ALGOL 68 so they should be avoided. Leaving them aside, the following 
 characters are available in both codes: 
 
 upper and lower case letters: a-z A-Z

 digits: 0-9

 space ! " # $ % & ' ( ) * + , - . /

 : ; < = > ? @ _
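
 As an illustration only, a program text can be screened against this common
 set. The sketch below assumes the set is exactly the letters, digits, and
 special graphics listed above; the names COMMON_GRAPHICS, LINE_CONTROLS, and
 nonportable_characters are hypothetical, and treating newline, carriage return,
 and tab as acceptable is an assumption of the sketch.

```python
import string

# Graphics taken to be common to ISO-code and EBCDIC (from the list above).
COMMON_GRAPHICS = set(
    string.ascii_letters + string.digits + ' !"#$%&\'()*+,-./:;<=>?@_'
)
# Format effectors tolerated in a stored text (an assumption of this sketch).
LINE_CONTROLS = {"\n", "\r", "\t"}

def nonportable_characters(text):
    """Return (index, character) pairs for characters outside the common set."""
    return [
        (index, ch)
        for index, ch in enumerate(text)
        if ch not in COMMON_GRAPHICS and ch not in LINE_CONTROLS
    ]

print(nonportable_characters("u +* v"))   # every character is in the common set
print(nonportable_characters("a[1]"))     # brackets are national-variant positions
```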
 
 The following control characters appear in both codes and ought to 
 be considered in design of a representation: 
 
 BS    backspace
 HT    horizontal tab
 CR    carriage return
 LF    linefeed
 DEL   delete
 NUL   null
 FF    form feed
 VT    vertical tab
 
 [Figure 1: code chart of 16 rows (0 to 15); the columns are the offsets 0
 through 240 in steps of 16.]

 Figure 1. Extended Binary-Coded-Decimal Interchange Code (EBCDIC)
 (Chart adapted from (4).)
 
 [Figure 2: code chart of 16 rows (0 to 15); the columns are the offsets 0+,
 16+, 32+, 48+, 64+, 80+, 96+, and 112+.]
 Figure 2. ISO 7-bit Coded Character Set
 (also known as ASCII) 
 
 Some codes are not officially assigned graphics, but the 
 preferred alternatives are shown in parentheses. The right 
 hand side of columns 32, 80, and 112 show the alternatives 
 chosen for ASCII. (ISO chart taken from (2).) 
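
 The layout of such a chart (row number plus column offset gives the code
 position) can be regenerated for the ASCII variant. This is a sketch under the
 assumption that the ASCII graphics are shown at the national-variant
 positions; the names CONTROL_NAMES, cell, and chart_lines are hypothetical.

```python
# Names of the 32 control characters at positions 0-31 of the 7-bit code.
CONTROL_NAMES = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US"
).split()

def cell(code):
    """Return the chart entry for a 7-bit code position (0-127)."""
    if code < 32:
        return CONTROL_NAMES[code]
    if code == 32:
        return "space"
    if code == 127:
        return "DEL"
    return chr(code)          # ASCII graphic for this position

def chart_lines():
    """Build the chart: one header line, then rows 0-15."""
    columns = range(0, 128, 16)
    header = "    " + "".join(f"{str(col) + '+':>7}" for col in columns)
    rows = [
        f"{row:>4}" + "".join(f"{cell(col + row):>7}" for col in columns)
        for row in range(16)
    ]
    return [header] + rows

print("\n".join(chart_lines()))
```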
 
 [Figure 3: table of the graphics assigned to ISO code positions 35, 64, 91,
 92, 93, 94, 96, 123, 124, 125, and 126 in ASCII and in the national variants
 of Australia; Denmark, Finland, Norway, and Sweden; France; W. Germany; Japan;
 New Zealand; and the United Kingdom.]

 Figure 3. ISO Alternatives Adopted by Various Countries

 (Taken from Lindsey (2).)
 
 
 A few comments on individual characters: 
 
 '#' is not ISO standard, but is the ASCII choice for position 35 in
 ISO. The alternative is '£', a currency symbol as is '$'.
 
 '@' is not ISO standard, but is the preferred alternative for position
 64 and is only infrequently assigned other graphics.
 
 '|', '!', '¬', and '~'
 
 Extended discussion of these follows. 
 
 The manufacturer supporting EBCDIC has helped introduce endless confusion
 into the codes for these graphics. The following chart illustrates this by
 giving the hexadecimal location in the code of various graphics.
 
 code:          ISO    ISO Alternatives    ASCII-8    IBM ASCII-8    EBCDIC-8    IBM EBCDIC
 (references)   (2)    (2)                 (3,5)      (4)            (5)         (4)

 [Chart body: one row per graphic, giving its hexadecimal position in each code.]
 
 Many terminal control software routines interpret the "best" meaning of 
 characters. Thus on some North American timesharing systems a '!' at an 
 ASCII terminal is converted to an EBCDIC ' | ' by the time it reaches the 
 executing program. In such an environment or even in more benign 
 environments there are then eight characters that can be produced by the 
 system in response to some code that at one time was supposed to be ' | ' : 
 ! ' ] ^ A U il d. Similarly, '-%' can be ~ n - t and p. This 
 peregrination can render a character a poor choice for an important task 
 in a representation language. 
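
 The migration of '!' described above can be imitated in a few lines. This is a
 sketch only: the one-entry table TERMINAL_TRANSLATION is a hypothetical
 stand-in for a terminal support routine, and Python's "cp037" codec is assumed
 to be an acceptable model of the EBCDIC seen by the executing program.

```python
# Hypothetical terminal-support table: ISO/ASCII position -> EBCDIC position.
# 0x21 is '!' in ASCII; 0x4F is the vertical bar in the "cp037" code page.
TERMINAL_TRANSLATION = {0x21: 0x4F}

def as_seen_by_program(typed):
    """Translate ASCII text typed at a terminal into what the program reads."""
    received = bytearray()
    for byte in typed.encode("ascii"):
        if byte in TERMINAL_TRANSLATION:
            received.append(TERMINAL_TRANSLATION[byte])      # "best" meaning
        else:
            received.extend(chr(byte).encode("cp037"))       # same graphic
    return received.decode("cp037")

print(as_seen_by_program("a ! b"))
```

 The typist and the compiler no longer agree on which graphic was entered,
 which is why such a character is a risky choice for an important symbol.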
 
 Another "feature" of terminals and terminal support routines is the 
 translation of lower case characters to upper case because time-sharing 
 supervisors expect upper case commands. This will influence the choice of 
 letter symbols and stropping techniques below. 
 
 The final set of hardware problems is concerned with format effectors.
 In particular, should backspace and carriage return be permitted to instruct 
 the compiler to reconstruct the input text image exactly as it would appear 
 on a page produced by an ISO-code terminal? If so, bold stropping could be 
 accomplished by backspacing and underlining, or even by returning the 
 carriage and underlining. For a number of reasons, such composite characters 
 are bothersome and must not be allowed: 
 
 o mechanically, backspace is a slow operation. For example an IBM
 Selectric® backspaces 42% slower than it forward spaces. This delay is
 enough to break typing rhythm and cause marginal discomfort and confusion.

 o many terminals do not interpret ASCII backspace correctly and replace
 the character after repositioning the typing element (or cursor). R. W. Bemer
 has complained about this in a letter to Datamation (8), but I do not expect
 his plea to stem the tide.
 
 o a compiler that interprets the printed image is not interpreting the 
 characters in their sequential presentation. Error indications may not be 
 correctly associated with the text, and even if so there may be no good 
 clue as to how the text is stored in a file and should be modified. This 
 problem would not be as bad if all editing were done interactively, but that 
 is not always economical. 
 
 o some systems use only carriage return as a signal for end of line,
 so if the compiler were interpreting the image, all the characters would be 
 on one line. (Certainly the compiler for such an installation would be 
 more clever, but it is a curious thought.) 
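
 To make the third objection concrete, here is a sketch of what "interpreting
 the printed image" would involve. It is not a proposal; the function name
 print_image is hypothetical, and it assumes a blank strikes nothing but still
 advances the carriage.

```python
def print_image(text):
    """Return, for each print column, the tuple of characters struck there."""
    columns = []
    position = 0
    for ch in text:
        if ch == "\b":                      # backspace: move left one column
            position = max(position - 1, 0)
        elif ch == "\r":                    # carriage return: back to column 0
            position = 0
        else:
            while len(columns) <= position:
                columns.append([])
            if ch != " ":                   # a blank strikes nothing
                columns[position].append(ch)
            position += 1
    return [tuple(column) for column in columns]

# Two different stored sequences that yield the same underlined "if".
by_backspace = print_image("i\b_f\b_")
by_return = print_image("if\r__")
print(by_backspace == by_return)
```

 Because distinct character sequences collapse to one image, an error message
 tied to a print column gives no clue as to which stored characters to edit.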
 
 
 II. Five ALGOL 68 Symbol Set Suggestions 
 
 II.1 plus-i-times-symbol: '+*'

 Few devices provide '⊥', the reference language character for the
 plus-i-times-symbol. It can be constructed from overlaid characters, but
 these are a questionable recourse at best. The Report also suggests i,
 but this suffers stropping separation or, if reserved words are used,
 conflicts with the identifier 'i'. The Lindsey ISO-code representation
 proposes '!' and '|_', but these encounter the ISO vertical bar uncertainty.
 
 In Algol Bulletin '^k W. Freeman (6) proposed that modulo-operation be 
 represented by '÷×'. This has been accepted in the Revised Report, and a
 parallel construction suggests '+×' for the plus-i-times-symbol. More
 usually, this will be written as '+*'; but note that this does not
 interfere with the x-asterisk family of diphthongs, because the family
 members '**' and '÷*' already have standard meanings. Note: '+*' would
 also be used in transput to represent plus-i-times.
 
 Examples:      3+*5      u +* v

 (R10.2.3.7 j)  (re a+re b) +* (im a+im b)

 (R11.1)        (x >= 0 | rp+*ip | abs ip+*(y >= 0 | rp | -rp))
 
 II.2 of-symbol: '-<'

 At Los Angeles, '->' was removed as a representation of of-symbol
 because it points the wrong way - from son to parent, and because in practice
 it points the opposite way from the arrow in a similar construct in PL/I.
 Only of is left and it suffers extreme stropping separation for an operation
 that binds even more tightly than a monadic operator. One proposal for
 of-symbol is '.', but again this has the opposite meaning to the
 corresponding construct in PL/I. I derived an alternative to of by
 starting from the is-an-element-of-symbol, '∈', and then considering ' -< '
 and ^' . The latter has an excellent diphthong: '-<', so I propose that 
 the Report list of and '%< ' while the "approved alternative representation" 
 be '-<'. Note: It should cause little difficulty that '-<' is also the
 symbol for a photocathode.
 
 Examples:      father-<p

 (R10.2.3.7 c)  sqrt(re-<a**2 + im-<a**2)

 (R10.3.1.1 d)  p-<a > p-<b or p-<a = p-<b
                and (l-<a > l-<b
                or l-<a = l-<b
                and c-<a > c-<b)
 
 
 II.3 stick-symbol, again-symbol, or-symbol

 The Report proposes that the stick-symbol be represented by '|', a
 reasonable enough choice given the name. Three problems arise: the
 problems of '|' migration in ISO; the fact that PL/I uses this character
 for 'or', so it appears in boolean contexts in that language as well as
 in ALGOL 68; and the fact that the symbol has no inherent relation to
 decision. Despite these problems a brief-in-symbol is essential to avoid
 stropping separation in if-clauses yielding a value. The '?' meets all the
 objectives of the stick-symbol (which I would now rename the decision-symbol):
 it is available in most character sets without ambiguity, it is unused in
 other languages, and it naturally implies that a decision is being made.

 A disadvantage of the stick-symbol is that it is also used for in and
 out in case-clauses, and '?' would be less meaningful in this context.
 Consider, however, the unfortunate similarity of '(a ? b, c ? d)' and
 '(a ? b; c ? d)'. (It is just as unfortunate with stick-symbols.) It would
 be too drastic to eliminate the brief case clauses, but why not specify that
 CASE-brief-in and CASE-brief-out be in and out? Then the example would be
 '(a in b, c out d)', a reasonable result, even with stropping separation.
 The case-again should not have a brief form: it occurs in long clauses
 that easily confuse a reader's parse; it is non-intuitive - an integral-again
 seemingly ought to select a new range for the original enquiry clause.
 
 The proposal for choice clause tokens can be diagrammed:

               start   in     again   out    finish
 IF-brief      (       ?      ?:      ?      )
 CASE-brief    (       in     xxxxx   out    )
 IF-bold       if      then   elif    else   fi
 CASE-bold     case    in     ouse    out    esac
 
 Reassigning the brief-in frees the stick-symbol for representation of 
 or. For the time being, at least, I will not propose this because many 
 systems will still use stick for decisions. I would, however, like to
 contradict a statement in the Lindsey ISO-code representation paper: 
 "Hopefully, it [the or-symbol] is not as common as & [the and-symbol], so 
 we shall restrict it to just or." In view of DeMorgan's laws, this 
 assumption seems dubious. The proposed Revised Report (LAl(25l) version 2) 
 uses k-"] and 's and only 27 or's, but 1^ of these and 's are in checks to see 
 whether a given operation can be performed on a given file. Changing these 
 to or's is not only conceivable, but would enable the "good programming 
 practice" of specifying the shortest alternative first. For example, 
 'put char' could begin: 
 
 if not opened of f or not put possible (f) 
 
 then undefined 
 
 else . . . 
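
 The rewriting appealed to here is an application of DeMorgan's laws. The
 sketch below checks, over all truth values, that an 'and' form and the
 corresponding 'or' form select the same alternative; the function names and
 the strings "proceed" and "undefined" are illustrative assumptions, not text
 from the Report.

```python
from itertools import product

def with_and(opened, put_possible):
    # "and" form: the long alternative comes first, the failure case last.
    if opened and put_possible:
        return "proceed"
    return "undefined"

def with_or(opened, put_possible):
    # "or" form after DeMorgan: the short failure alternative comes first.
    if not opened or not put_possible:
        return "undefined"
    return "proceed"

all_agree = all(
    with_and(opened, possible) == with_or(opened, possible)
    for opened, possible in product([False, True], repeat=2)
)
print(all_agree)
```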
 
 
 Examples:

 (R11.1)        (rp = 0 ? 0 ? y/(2 * rp))

 (R10.3.1.1 d)  (p of a < p of b ? false
                 ?: p of a > p of b ? true
                 ?: l of a < l of b ? false
                 ?: l of a > l of b ? true
                 ? c of a > c of b)

 (R10.3.3.2 a string)
     (y(j) in
         (ref char c):
             (upb s = 1 ? c := s ? incomp := true),
         (ref () char cc):
             (upb s = upb cc - lwb cc ? cc(:) := s(:) ? incomp := true),
         (ref string ss): ss := s
      out incomp := true)
 
 II.4 Disentangling '↓', '↑', '~', and '⌊'

 Because few devices provide these graphics, the Report allows a profusion
 of alternatives. The down-symbol, up-symbol, tilde-symbol, and floor-symbol
 are variously specified to be replaceable by the bold symbols down, shr, up, shl,
 skip, not, lwb, and entier; also tilde may be synonymous with '¬' and up-arrow
 with '**'. Unfortunately, the synonymity is context dependent. For example
 entier may replace the floor-symbol in

 ⌊ (flex | a1 | x1) (1)

 but the same replacement cannot be made in

 ⌊ (flex | a1 | x1).
 
 The intertwining of symbols for exponentiation, shift left, and raise
 semaphore is even more complex. We can diagram the relations thus:

 [Diagram relating the symbols for exponentiation, shift left, and raise semaphore.]

 An instance of up cannot readily be associated with the correct operation by
 either a reader or a compiler. Moreover, the hapless programmer is given 
 little guidance as to how he might be able to write his program to avoid 
 possible reader confusion. 
 
 These synonymities were introduced to permit an ALGOL 68 program to be 
 written on any device, but only one representation is required at any one time. 
 It is not difficult to assign symbols to operations so no symbol represents 
 more than one operation. In addition to reducing reader ambiguity, this 
 assignment simplifies construction of encoders for transportation representations.
 If the problems are resolved as follows, programs can still be represented on 
 any device, but the synonymity conflicts are removed. Where possible, 
 non-stropped representations are also proposed. 
 
 operation           representation   remarks

 boolean negation    not              as in the Report
 skip                skip             no '~'; the standard prelude does not use
                                      the tilde and has no problem
 lower bound         lwb              no '⌊'
 entier              entier           no '⌊'

 Neither operation has a stronger claim to '⌊' than the other, and it would be
 silly to allow entier to replace lwb; this restriction will not make programs
 appreciably longer.

 lower semaphore     down             as in the Report
 raise semaphore     up               as in the Report
 shift left          shl
 shift right         shr

 It is not intuitively clear why 'up' should mean 'left' and not 'right'; I
 considered suggesting '<*' and '*>', but their meanings could be forgotten
 since shifts are infrequent.

 exponentiation      ↑  **  pow       '**' will seldom be unavailable; '↑' is
                                      kept because it looks right; pow will do
                                      if the others are unavailable.
 
 Examples:

 (in an assembler)  op shl 8 or adrs

 (tautology test)   (not (2 pow k /= 2r1 shl k) | skip | undefined)

 (binary search)
 proc bisearch = (() int a, int arg, default) int :
    (int val := default; bits i := 2r1 shl (bits width - 1);
     while abs i > upb a - lwb a + 1 do i := i shr 1 od;
     int k := abs i + lwb a - 1;
     while abs i /= 0 do i := i shr 1;
        (a(k) < arg | k := min (k + abs i, upb a)
         |: a(k) = arg | val := k; i := bin 0 | k -:= abs i)
     od; val)
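
 For readers without an ALGOL 68 compiler, the binary search example can be
 followed in Python. This is a hedged transliteration, not the author's code:
 it assumes 0-based indexing (so lwb is 0 and upb is len(a) - 1), assumes the
 array is sorted in ascending order, and computes the starting power of two
 with int.bit_length instead of shifting down from 'bits width'.

```python
def bisearch(a, arg, default=-1):
    """Search sorted list `a` for `arg` using a shrinking power-of-two step."""
    n = len(a)
    # Largest power of two not exceeding n (0 for an empty list).
    i = 1 << (n.bit_length() - 1) if n else 0
    k = i - 1                      # corresponds to abs i + lwb a - 1 with lwb = 0
    val = default
    while i != 0:
        i >>= 1                    # i := i shr 1
        if a[k] < arg:
            k = min(k + i, n - 1)  # never step past the upper bound
        elif a[k] == arg:
            val = k
            i = 0                  # found: force the loop to end
        else:
            k -= i
    return val

primes = [2, 3, 5, 7, 11, 13]
print(bisearch(primes, 11), bisearch(primes, 4))
```

 Note that the final comparison is made with a step of zero, so the element
 reached by the last move is still examined before the loop ends.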
 
 II.5. 'flip': true, 'flop': false

 For bool and bits, ALGOL 68 violates the principle that valid denotations
 should be valid transput values. As there seems to be no good reason for this,
 I propose the elimination of 'flip' and 'flop'. The value 'put' for true will
 be 'true'; that for false, 'false'. 'Get' for a bool will read a tag and set
 the bool to true if the tag starts with 't', false for 'f', and call 'char
 error mended' if the tag starts with neither. The transput value for a bits
 will be a BITS-denotation.
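The proposed 'put' and 'get' rules for bool can be sketched in a few lines. This is only an illustration in Python, not part of the proposal: the names put_bool, get_bool, and CharErrorMended are hypothetical, and folding the first letter to lower case is an assumption in the spirit of section III.1.

```python
class CharErrorMended(Exception):
    """Stand-in for the 'char error mended' event (hypothetical name)."""


def put_bool(value: bool) -> str:
    """Write a bool as its denotation: 'true' or 'false'."""
    return "true" if value else "false"


def get_bool(tag: str) -> bool:
    """Read a tag and decide by its first letter only."""
    first = tag.strip()[:1].lower()  # case folding is an assumption
    if first == "t":
        return True
    if first == "f":
        return False
    raise CharErrorMended(f"cannot read {tag!r} as a bool")
```

With this rule anything written by put_bool can be read back by get_bool, which is the point of making denotations valid transput values.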
 
 III. The Design of the Hardware Representation

 This section covers the points raised in section 4 of Lindsey's paper.
 The Representation is detailed in Appendix A, and section III.10 below
 contains some notes to explain that Appendix.

 III.1. Letter tokens

 The central question here is whether to allow both upper and lower case
 letters in tags and the letters assigned denotation functions (a, b, c, d, e, f,
 g, i, k, l, n, p, q, r, s, x, y, z for radix, times ten to the power, and format
 markers). In fact few programmers will want to have tags differing only
 in the case of one or more letters. (Is there really enough difference
 between 'ox' and 'OX'? Is 'scan' not 'Scan'?) Moreover, many line printers
 are normally operated with only upper case. Therefore, the basic answer to
 the central question is that internally the compiler will have only one
 case of letter symbols (which one is immaterial). For string denotations
 and transput of strings, though, case will be distinguished. We now consider
 three categories of terminals and three contexts for tags and denotation
 function letters:
 
                       tag and denotation    denotation function letters
                       function letters      (a,b,c,d,e,f,r,t)
                                             input        output

 UC/LC terminal,                             UC/LC*       LC
 UC means bold

 UC/LC terminal,       UC/LC*                UC/LC*       LC
 case not used
 for stropping

 UC terminal           UC*                   UC*          LC**
 
 *Converted to one case internally. 
 **System software or terminal converts to UC. 
 
 Appendix A lists lower case letters for ISO and upper case letters for EBCDIC.
 This is only because these character sets are normally available on the
 corresponding equipment.
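The one-case rule can be illustrated with a small sketch. It is written in Python purely for illustration, the function name fold_case is hypothetical, and it makes two simplifying assumptions: lower case is chosen as the internal case, and comments and pragmats are not treated specially. Letters are folded everywhere except inside string denotations, where a doubled quote is taken as the quote-image.

```python
def fold_case(source: str) -> str:
    """Fold letters to lower case except inside string denotations."""
    out = []
    in_string = False
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == '"':
            if in_string and source[i + 1:i + 2] == '"':
                out.append('""')   # quote-image: still inside the string
                i += 2
                continue
            in_string = not in_string
            out.append(ch)
        else:
            # Case is significant only inside a string denotation.
            out.append(ch if in_string else ch.lower())
        i += 1
    return "".join(out)
```

Under this sketch 'scan' and 'Scan' become the same tag, while "Scan" as a string value keeps its capital.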
 
 Of the national variant characters in the last four ISO columns, only
 '@' and '~' are assigned a function in this hardware representation. The
 other eight are available for use as additional letter-ABC-symbols. When
 they are not so used they can serve, as suggested below, as style-TALLY-monads
 and other-string-items.
 
 III.2. Bold tags.

 This hardware representation is designed to be applicable to any
 stropping convention. To facilitate this, only one case is assumed and no
 function is assigned to ''' and '_'. Stropping is discussed in section IV.

 III.3. Composite characters.
 Outlawed as discussed in section I.3.

 III.4. Carriage Return, Line Feed, and Delete.
 
 As Lindsey's paper suggests, CR, LF, CR/LF, LF/CR (my addition), form 
 feed, and vertical tab all should terminate a line of input to the compiler. 
 Delete, backspace, and all other control characters should be ignored. 
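A minimal sketch of this line-splitting rule follows, in Python for illustration only. The function name source_lines is hypothetical, and the sketch reads the paragraph literally: every control character that is not a line terminator (including horizontal tab) is dropped, and CR/LF or LF/CR counts as one terminator.

```python
CR, LF, VT, FF = "\r", "\n", "\x0b", "\x0c"
TERMINATORS = {CR, LF, VT, FF}


def source_lines(text: str) -> list[str]:
    """Split compiler input into lines, ignoring other control codes."""
    lines, current = [], []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in TERMINATORS:
            nxt = text[i + 1:i + 2]
            # CR/LF and LF/CR are a single line terminator.
            if ch in (CR, LF) and nxt in (CR, LF) and nxt != ch:
                i += 1
            lines.append("".join(current))
            current = []
        elif ch >= " " and ch != "\x7f":
            current.append(ch)
        # delete, backspace, and all other control characters are ignored
        i += 1
    if current:
        lines.append("".join(current))
    return lines
```

Two line feeds in a row are not a pair, so they still produce an empty line between them.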
 
 III.5. Notes on Particular Representations

 The controversial representations in this design have been discussed
 in section II. A few further choices deserve comment here.

 'e' for times-ten-to-the-power-symbol

 The Report implies that both 'e' and some other graphic ('₁₀' or '\')
 will be available for this symbol. It does not seem valuable to have two
 or three graphics mean the same thing, especially where one is widely
 available and the others are not.

 (space) for space-symbol

 The Report proposes that both the space and visible space ('^') be
 available. The latter cannot be allowed because it would have to be
 composite on available equipment. Using underlines for visible spaces does
 not help because they run together and are no easier to count than spaces.

 '&' for and-symbol

 This is reasonable and follows Lindsey's proposal.

 lwb and entier for floor-symbol

 These two bold tags are not interchangeable alternatives.
 
 '¬' and '~' for not-symbol

 Section I has described the ridiculous confusion as to the location of
 these symbols, including the possibility that in some circumstances
 not-symbol would be typed as '↑'. One solution would be, as with stick-symbol,
 to suggest an alternate representation or to abolish the graphics in favor of
 not. The situation is not as serious with not-symbol, however, because
 not-symbol is only an operator and its interpretation is not crucial to
 interpretation of the structure of the program text. Hence, '¬' and '~'
 are both allowed, depending on which character code is used. Note that, to
 avoid even further confusion, '~' is not a representation of the tilde-symbol
 and cannot be used in TAO's.
 
 '?' for 'error char'

 The Report makes no proposal for this value. 'Error char' is used in
 transput to be the string value for a value that cannot be translated as
 specified. Is it not reasonable that this situation should be signalled
 with the '?'?

 ' ' for 'blank'

 This assignment corresponds to the decision to use (space) for
 space-symbol.

 '{' ... '}' for ISO brief-comment-symbol

 The desirability of this representation is shown by PASCAL programs.
 Some brief-comment-symbol is necessary because '¢' may not be available on
 all ISO terminals. There is no reason the compiler cannot insist that the
 left-brace start a comment and the right-brace end it.
 
 III.6. Other-string-items, other-PRAGMENT-items.

 The characters available for these will certainly vary depending on the
 code in use; I am not trying to define an implementation independent language.
 Essentially every character that can be typed should be allowed in these
 positions, except for the format effector characters, which only control the
 listing of the program. (Though a format effector could be a string value
 by means of transput or repr.) Except for the quote-image-symbol every
 character in a string will represent itself; there will be no composite
 characters or representations of one character by several.
 
 III.7. Abs, repr, and conversion.

 Abs and repr should certainly be designed so the position of a character
 in the code is its abs value. In ISO, abs "A" = 65 and repr 65 = "A"; in EBCDIC
 abs "A" = 193 and repr 193 = "A". Moreover, in both codes the null character
 will be repr 0. As far as possible, programs should be written so as to be
 independent of the actual values associated with abs and repr. As the
 encoder in my transportation representation paper shows, this can be achieved
 with minimal effort except for control codes. Appendix A proposes predefined
 identifiers for the six format effector control codes.
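The abs and repr values quoted above can be checked with a short Python sketch. The helper names abs_char and repr_char are hypothetical, and two assumptions are made: Python's "ascii" codec stands in for ISO-code, and code page 037 (Python's "cp037" codec) stands in for EBCDIC, of which it is only one variant.

```python
# Assumed stand-ins for the two codes discussed in the text.
CODECS = {"ISO": "ascii", "EBCDIC": "cp037"}


def abs_char(ch: str, code: str = "ISO") -> int:
    """Position of a character in the given code (the 'abs' value)."""
    return ch.encode(CODECS[code])[0]


def repr_char(n: int, code: str = "ISO") -> str:
    """Character found at position n of the given code (the 'repr' value)."""
    return bytes([n]).decode(CODECS[code])
```

The sketch also shows why programs should not depend on the actual values: abs "J" - abs "I" is 1 in ISO but 8 in this EBCDIC code page, because the EBCDIC letters are not contiguous.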
 
 Lindsey's paper proposes two conversion regimes. 'Stand conv' transmits 
 printable characters unchanged but ignores most control codes. The format 
 codes would create the requisite spaces to position the text as it would 
 appear on paper. Backspace would cause a call of 'char error' of the file. 
 His other conversion was 'complete conv' which transmitted all characters
 unchanged. Ignoring these, Lindsey's paper contains a program that uses
 what is called 'special conv'. In fact this latter is an important conversion
 that should be available for system programmers and anyone else who wants to
 know exactly how the user encoded spaces (and wants to avoid the processing
 time required to skip over them). I would call this conversion 'layout
 encoded conv': printable characters and format effectors are transmitted
 intact, but other control codes are ignored. When 'get'ing characters, each
 code is delivered in turn. When 'get'ing strings, a line is terminated by
 any format effector except horizontal tab; the terminating character is
 delivered as the last character of the line. If the line is terminated with
 CR/LF or LF/CR, both will appear at the end of the line. (This will depend
 on whether the operating system delivers both codes to the ALGOL 68 monitor.)
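The behaviour of 'get' for strings under 'layout encoded conv' can be sketched as follows. This is an illustrative Python reading of the paragraph above, not a definition: the function name layout_encoded_lines is hypothetical, ISO control code values are assumed, and backspace is treated as a line-ending format effector because the text excepts only horizontal tab.

```python
FORMAT_EFFECTORS = {"\t", "\b", "\r", "\n", "\x0b", "\x0c"}
LINE_ENDERS = FORMAT_EFFECTORS - {"\t"}


def layout_encoded_lines(data: str) -> list[str]:
    """Split data into lines, keeping each terminator at the end of its line."""
    lines, current = [], ""
    i = 0
    while i < len(data):
        ch = data[i]
        if (ch < " " and ch not in FORMAT_EFFECTORS) or ch == "\x7f":
            i += 1                      # other control codes are ignored
            continue
        current += ch                   # printable or format effector: kept
        if ch in LINE_ENDERS:
            if data[i:i + 2] in ("\r\n", "\n\r"):
                current += data[i + 1]  # both codes end the same line
                i += 1
            lines.append(current)
            current = ""
        i += 1
    if current:
        lines.append(current)
    return lines
```

Unlike the compiler's own input rule in section III.4, nothing about the layout is lost here: tabs stay in place and every terminator is delivered to the program.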
 
 III.8. Use of remaining characters.

 No virtue is attained by associating a standard meaning with every
 available character. Indeed, a few ought to remain available as
 style-TALLY-monads. The characters not yet assigned are
 
 (ISO) [ ] ! I - \ 
 
 (EBCDIC) I ! jzf 
 
 Of these the brackets should not be monads because they would normally be 
 used in pairs as delimiters, but the others can be monads. 
 
 It is unfortunate that the Report makes no provision for style-TALLY-nomads
 because the given set {=, <, >, *, ×, /} is limited and many of its combinations are
 in use. Indeed, of the characters above, it would not be unreasonable to

 specify that '!',

 and '•^' could not be monadic, but could be nomads.

 With these new nomads there could be dyadic diphthongs like '*!',

 and even

 Appendix A, however, proposes no new nomads.
 
 III.9. Representations with smaller character sets.
 
 When only a subset of the characters assumed above is available, bold 
 tags should be used to make up for missing characters. 
 
 if this graphic      these bold tags
 is missing           can replace its uses

 %                    over  mod
 ¬ ~                  not
 @                    at
 #                    co
 <                    lt  le  of
 >                    gt  ge
 &                    and
 ?                    then  else  elif
 
 If worst comes to worst, colon and semicolon will be missing and must be
 replaced with ugly diphthongs as suggested in R9.4b. Compilers should
 implement these only if numerous devices at the installation lack the
 specified graphic.

 the graphic     is replaced by

 +=:             plusto
 :=:             is
 :/=:            isnt

 (The second and third of these are no longer mentioned in the Report, but
 still do not lead to ambiguity. It can be remembered that semicolon is '.,'
 and not ',.' because real-denotations can start with '.' but cannot end with
 it, so that no ambiguity exists in '3.,5' for '3;5'.)

 In cases where standard graphics are absent but other symbols are available,
 the temptation to use the latter should be resisted with passion.
 
 The minimum character set for ALGOL 68 is thus: 
 
 letters, digits, =, +, -, *, ,, ., (, ), $, /, ", and possibly 
 some stropping character. 
 
 III.10. Guide to reading Appendix A.
 
 Some symbols in Appendix A are marked boldface by underlining them. 
 They are to be keywords, reserved words, or stropped words, according to 
 whatever convention is adopted. 
 
 The first listed "representation" for a symbol should be used if possible. 
 If no representation is possible (and the symbol is not intended as a MONAD), 
 the first possible graphic listed in the "alternatives" column should be 
 chosen. Up-symbol and down-symbol have neither representation nor alternative
 and cannot be written in a program {see section II.4}.
 
 Please note that "typographic display features" (spaces, new lines, and 
 new pages) cannot divide the characters of a diphthong or bold tag. 
 
 Numerals in braces refer to sections of this paper where decisions are 
 discussed. Where no comment is made, the representation follows the suggestions 
 of the Revised Report. 
 
 IV. Stropping Recommendations

 ALGOL 60 invented the concept of indivisible symbols represented by
 boldface identifiers. ALGOL 68 extended this concept to allow the user to
 invent his own bold tags. In the Revised Report, it is made clear that
 these symbols are no longer indivisible, but are composed of ordinary
 letters possibly set off in some way to indicate their type face. The
 Report does not suggest how they are to be set off, but gives five examples
 {R9.4.2.2b} including one using script letters, a feature found on few
 existing computer transput devices.

 Implementations have adopted diverse stropping conventions; some have
 even attempted to implement several conventions with provision for pragmatic
 selection of one at compile time. Such attempts at universality seem doomed 
 to failure, however. Without much effort I was able to write a list of 
 twenty different stropping conventions, most incompatible with two-thirds 
 of the rest. Instead, the compiler writer should implement one convention. 
 Moreover, each program will at one time or another be transput on every 
 device in the installation, so the stropping convention chosen must be 
 applicable to every device. 
 
 Traditionally, ALGOL 60 programs have been stropped with apostrophes 
 at the beginning and end of each boldface symbol. To me this convention 
 seems unduly cluttering. I have always been struck by the clean aesthetics 
 of ALGOL: short symbols, indentation emphasized as a tool, no semicolon
 before else, ALGOL 68's brief comment forms, the top-down structure of
 program texts. In this atmosphere, apostrophes seem as appropriate as neon 
 lights in a Japanese garden. Examine 
 
 'begin' 'real' x; 'char' c;

 get (f, (c,x));

 'if' c = "i" 'then' x+1 'else' x 'fi' 'end'

 Note that in the natural (Western) way of looking top-to-bottom and
 left-to-right, the first item the eye encounters is not a piece of information,
 but is the interspersive apostrophe. Indeed, the most important distinction
 between pairs of words is their first letters, but the first 'letter' of
 all apostrophied words is the same. Note also that a hurried programmer who
 used other languages might waste time over the confusion between 'if' and
 "i"; not much time, but perhaps enough to lose his train of thought.
 Moreover, there is no one apostrophe stropping method. Lindsey lists three, 
 and my list included five, only one of which has been made illegal by the 
 Revised Report. 
 
 Another stropping technique that has been proposed is underlining; it 
 does not clutter and has a rather pleasant appearance. Its difficulties are 
 not aesthetic, but operational: it takes longer to enter and revise a
 program with underline stropping; many text editors and line printer routines
 do not support underlining. Mechanical backspacing is a slow operation.
 Finally, underlining constructs composite characters and suffers the 
 disadvantages of listings that are not in one-one correspondence with the 
 file. 
 
 Apostrophes clutter. Underlining has operational problems. 
 I reject them both for stropping. (Blithely ignoring the derivation of 
 the word 'stropping'.) What do I propose instead? Reserved words. However, 
 many implementers will feel they must provide some stropping convention so 
 I discuss below four conventions in order of increasing preference. 
 
 IV.1. Postfix underline is bold (least favored, but better than
 sixteen other schemes). 
 
 The least intrusive stropping scheme is to append an underline to 
 the end of a bold word. The reader can easily ignore the character, but it 
 is there to resolve ambiguity if he needs it. A trained ALGOL reader scans 
 the indentation before examining the text. With prefix stropping, most 
 lines begin with the low-information-content strop character; with postfix 
 stropping, the strop character is out of the way. 
 
 Our earlier random program would look like this: 
 
 begin_real_x; char_c; 
 
 get (f, (x, c)); 
 
 if_ c="i" then_ x+1 else_ x fi_ end_ 
 
 Note that '_' alone is enough and no space is needed. One of the advantages 
 of this notation to the programmer-cum-keypuncher is that the underline is
 typed in the same sequence that it would be drawn: last after the rest of 
 the word. Thus written work could still underline bold words and the 
 transliteration while keypunching is not onerous. 
 
 In a small way, this convention is compatible with complete underlining
 of bold words. If backspaces are ignored and multiple underlines converted
 to a single underline, the compiler can accept "beginPPPPP_____" (where
 'P' means backspace) as equivalent to "begin_". But the user could not
 type "bP_eP_gP_iP_nP_" and achieve the same effect. (Alternatively, the
 latter could mean begin and a non-letter be required to end a bold tag.
 This introduces undue stropping separation in, for example, "re of_ z".)
 
 Postfix underline stropping is poor for terminals with a non-escaping 
 underline key. They would have to type "begin_ " which would be printed 
 as "begin_". Presumably implementors for such terminals would choose some 
 convention later in this section. 
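A rough sketch of how a token scanner might recognize postfix underline stropping is given here in Python, for illustration only. The names normalize and words are hypothetical, and the sketch assumes that the backspace is the ISO code 8 and that string denotations and comments are handled elsewhere. It applies the two rules above: backspaces are ignored and runs of underlines collapse to one.

```python
import re


def normalize(text: str) -> str:
    """Ignore backspaces and convert multiple underlines to a single one."""
    return re.sub(r"_+", "_", text.replace("\b", ""))


def words(text: str) -> list[tuple[str, str]]:
    """Classify each word as 'bold' (followed by '_') or plain 'tag'."""
    result = []
    for m in re.finditer(r"([A-Za-z][A-Za-z0-9]*)(_?)", normalize(text)):
        kind = "bold" if m.group(2) else "tag"
        result.append((kind, m.group(1)))
    return result
```

Because the underline itself ends the bold word, "begin_real_x" needs no spaces to be read as two bold words followed by the tag x.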
 
 IV.2. Upper case is bold.

 One convention gaining wide use in Europe is case stropping: bold tags
 in upper case and tags in lower case.
 
 BEGIN REAL x; CHAR c; 
 
 get (f, (c, x)); 
 
 IF c="i" THEN x+1 ELSE x FI END 
 
 If the tags are chosen to be real words, they look more like normal text if 
 they are lower case. (Theoretically, no space would be required between
 "REAL" and "x", but it might be considered gauche to omit it. Consider 
 "reOFz".) 
 
 Case stropping is excellent, except that existing devices, especially 
 line printers, often support only upper case. One could postulate combining 
 this technique with underlining or apostrophes to be used on an upper case 
 only device. Unfortunately, this fails because in systems with only one 
 case, it is usually upper case; plain tags would be upper case and thus 
 mistaken for bold. 
 
 IV.3. Lower case or postfix underline is bold.
 
 A reversal of the previous convention allows combination of both the 
 first two conventions. Bold words would be lower case (as they are in the 
 Report) or would be followed by an underline: 
 
 BEGIN_real X; char C; 
 
 GET (F, (X,C)); 
 
 if C="i" THEN_ X+1 else X fi end 
 
 (The mixed case if-THEN_-else-fi looks unpleasant and is. I postulate that 
 in this particular case it was forced on the programmer because he was fixing 
 the "THEN_" at an upper case terminal.) Presumably, the user would use case 
 distinction at a mixed case terminal and fall back on postfix underline at 
 an upper-case-only terminal. An upper case line printer would print all as
 upper case, and the user would have to rely on context to distinguish bold 
 from roman. The latter is never an arduous chore in a well written program. 
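The combined rule of this third convention is easy to state as code. The following Python sketch is illustrative only: the name classify is hypothetical, string denotations and comments are not handled, and lower case is assumed as the single internal case. A word is bold if it is written in lower case or is followed by an underline; anything else is a roman tag.

```python
import re


def classify(text: str) -> list[tuple[str, str]]:
    """Mark each word 'bold' or 'tag' and fold it to one internal case."""
    out = []
    for m in re.finditer(r"([A-Za-z][A-Za-z0-9]*)(_?)", text):
        word, strop = m.group(1), m.group(2)
        bold = bool(strop) or word.islower()
        out.append(("bold" if bold else "tag", word.lower()))
    return out
```

Applied to the sample above, "BEGIN_" and "real" both come out bold, so text entered at a mixed case terminal and text patched at an upper-case-only terminal are read identically.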
 
 The only disadvantage with this third stropping convention is that the 
 second, contradictory, convention has achieved considerable usage.

 IV.4. Reserved words are bold (most preferred technique).

 Few languages other than ALGOL have stropping conventions; most rely
 on reserved words or a language design that makes all distinctions 
 determinable from immediate context. Many successful ALGOL compilers have 
 avoided stropping. Several ALGOL 68 implementations (at least Vancouver, 
 Dartmouth, Illinois Institute of Technology, and probably many more) have 
 worked out techniques to minimize or eliminate stropping. Here is a random 
 program without stropping: 
 
 begin real x; char c; 
 
 get (f, (x, c)); 
 
 if c = "i" then x+1 else x fi end 
 
 Is there any reason to believe that the human reader should have difficulty 
 distinguishing bold from roman in this program? 
 
 One interesting point is the fact that the Revised Report lists 20^
 more predefined roman words than bold tags. To be sure, the roman words
 can all be redefined with far less penalty than a redefinition of if; but 
 still they constitute a large class of identifiers the user must remember. 
 
 In the past, automatically generated parsers have behaved poorly when 
 confronted with a reserved word used as an identifier. There is no inherent 
 need for this to be the case, and work such as that reported by Graham and 
 Rhodes (7) is showing how to avoid these problems. 
 
 What is to be done about identifiers that contain a reserved word as 
 an integral part, for example "year to date"? It is possible to forbid 
 this, but that leaves too many opportunities for the programmer to forget 
 that something is a reserved word. After all, the entire identifier is 
 far from any reserved word. Instead, I suggest that underlines replace 
 blanks in identifiers. Where an identifier is continued from one line to 
 the next, it may have embedded blanks, but it must end with an underline on 
 the first line or begin with an underline on the second line. Either 
 
 year_to_ 
 
 date 
 
 or 
 
 year_to 
 
 _date 
 
 will be acceptable renderings of "year_to_date".
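The continuation rule for identifiers can be sketched as a small check, here in Python for illustration. The function name join_continued is hypothetical, and the sketch looks only at the two line fragments that carry the identifier; it accepts the join when the first fragment ends with an underline or the second begins with one.

```python
def join_continued(first_line: str, second_line: str) -> str:
    """Join an identifier that is continued from one line to the next."""
    head = first_line.rstrip()
    tail = second_line.lstrip()
    if head.endswith("_") or tail.startswith("_"):
        return head + tail
    raise ValueError("an underline must mark the continuation")
```

Without the underline on either side the two fragments would be read as separate words, which is exactly the case the rule is meant to exclude.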
 
 In ALGOL 68 a user can declare arbitrary tags to be bold face. There
 are three reasonable techniques for dealing with these. (1) Such tags would
 be unstropped and treated as bold only in the block where they were so
 declared. This technique implies that the token scanner will analyze the
 block structure, a good idea because it can aid automatic correction of
 bracket errors. (2) They could be unstropped, but treated as bold throughout
 the compilation. At least one implementation (IIT) has chosen this route.
 (3) They could be stropped in some way. Underlines serve as spaces in roman
 tags, so they cannot be used; apostrophes at both ends are unnecessary;
 postfix apostrophes are almost as unobtrusive as postfix underlines; so I
 propose that a postfix apostrophe be used.
 
 Much as I dislike stropping of standard bold tags, I recommend the third 
 method of distinguishing user created bold tags. They will certainly be 
 less used than syntactic words like if and real, and usually they will not 
 appear more than once in a phrase. More critically, they will be unfamiliar 
 to a reader of the program, so he deserves to have them set off in some 
 manner. Here is a part of a program: 
 
 mode token' = struct (int type, string val, rtok'next), 
 
 rtok' = ref token'; 
 rtok'toklist := nil, eop := heap token'; 
 ref rtok'tokput := toklist; 
 sema tokens_ready = level 0, 
 
 may_move_tokens = level 1; 
 
 Note that in its use as a mode or a monadic operator, a user defined bold 
 tag is separated from its object by the apostrophe. It can be viewed as a 
 notation that the thing to its left operates on the thing to its right. 
 Moreover, a postfix apostrophe is somewhat like normal usage where an 
 apostrophe may appear near the end of a word. 
 
 Some emergency method of stropping may be necessary anyway to handle 
 cases where a tag is declared both bold and roman. In such cases, it would 
 be assumed roman unless stropped or unless the syntactic position demanded 
 that it be bold. This mechanism is one way to solve the problem of 're' 
 and re_ and 'im' and im. 
 
 (Writing this, I have come to worry about "re" and friends. Why do 
 both operator and field selector exist? Synonymously, is compl a primitive 
 mode or is it a struct ? If it is a mode then we need only operators to 
 process it and an operator to construct objects of that mode (which we
 have: '!'). Viewed thus, it makes no sense to assign a value to part of
 the primitive object "z". On the other hand, given the stropping separation
 of re and im and given "-<" for of, it is just as reasonable to eliminate
 re and im from the language.) 
 
 Final thought: I believe an efficient compiler can be written so that 
 no more than 29 words are reserved: 
 
 BEGIN, BY, CASE, CO, COMMENT, DO, ELIF, ELSE, END, ESAC, FI,
 FLEX, FOR, FROM, IF, IN, MODE, OD, OP, OUSE, OUT, PR, PRAGMAT,
 PRIO, PROC, REF, STRUCT, THEN, TO, UNION, WHILE.
 
 With these words, the structure of the text can be determined. The other 
 bold words in the Report could be redeclared, but could not be used in their 
 bold sense within that block. 
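As a first step toward such a scanner, the word-level decision might look like the following Python sketch. It is illustrative only: the name classify and the three labels are hypothetical, the reserved set is copied from the list above, user-created bold tags are recognized by the postfix apostrophe recommended earlier, and string denotations and comments are not handled.

```python
import re

# The reserved words listed above, folded to one case.
RESERVED = frozenset("""
    begin by case co comment do elif else end esac fi flex for from if in
    mode od op ouse out pr pragmat prio proc ref struct then to union while
""".split())


def classify(text: str) -> list[tuple[str, str]]:
    """Label each word 'reserved', 'user bold', or 'tag'."""
    out = []
    for m in re.finditer(r"([A-Za-z][A-Za-z0-9_]*)('?)", text):
        word = m.group(1).lower()
        if m.group(2):
            kind = "user bold"   # postfix apostrophe strops a user bold tag
        elif word in RESERVED:
            kind = "reserved"
        else:
            kind = "tag"         # includes redeclarable words such as int
        out.append((kind, word))
    return out
```

Words such as int and string come out as plain tags here; deciding whether they are being used in their bold sense is left to the parser, which is the challenge posed above.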
 
 I offer this limited-reserved-words approach as a challenge to parser 
 implementers. 
 
 References 
 
 (1) van Wijngaarden, A., et al., Almost the Revised Report on the
 Algorithmic Language ALGOL 68, private communication (Sept., 1973). This
 version is slightly more recent than the version considered at Los Angeles
 and includes most of the corrections agreed on there.

 (2) Lindsey, C. H., 'An ISO-Code Representation for ALGOL 68', ALGOL
 Bulletin 31 (March, 1970), pp. 37-60.

 (3) ANSI, 'Data Communication Control Procedures for the USA Standard Code
 for Information Interchange', CACM 12, 3 (March, 1969), pp. 166-178. ASCII
 is listed in Appendix-E.

 (4) IBM Corp., IBM System/360 Principles of Operation, Order No. GA22-6821-8,
 1970, pp. 150.2-150.3. (Note that the graphic '.' has been omitted from
 position '4B' of the document.)

 (5) ANSI, 'Correspondences of 8-Bit and Hollerith Codes for Computer
 Environments - A USASI Tutorial', CACM 11, 11 (Nov., 1968), pp. 783-789.
 Corrected in CACM 12, 5 (May, 1969), p. 29^^.

 (6) Freeman, W., 'Suggestions regarding certain representations in ALGOL 68',
 ALGOL Bulletin 34 (July, 1972), pp. 41-44.

 (7) Graham, S. L. and S. P. Rhodes, 'Practical Syntactic Error Recovery
 in Compilers', Conference Record of ACM Symposium on Principles of Programming
 Languages, Boston, Massachusetts (Oct., 1973), pp. 52-58.

 (8) Bemer, R. W., 'Backspace Bungle', Datamation 19, 9 (September, 1973), p. 25.
 
 Appendix A. Proposed Hardware Representation 
 
 Where there is a difference between the ISO and EBCDIC representations,
 the respective parts are labeled (ISO) and (EBCDIC).

 8.1.4 Character denotations

 d) other string item : style i monad {9.4.2b};                {III.6}
      (ISO) ~ [ ] { } (and upper case letters)
      (EBCDIC) -I (non-printing codes above 'space'; lower case letters).
 
 9.1.1 Syntax 
 
 (111.3} 
 
 IF :: choice using boolean. 
 CHOICE brief start : open token. 
 IF bold start : if token. 
 CASE bold start : case token. 
 IF brief in : decision token. 
 IF bold in : then token. 
 CASE STYLE in : in token. 
 IF brief again : again token. 
 IF bold again : else if token. 
 CASE bold again : out case token. 
 IF brief out : decision token. 
 IF bold out : else token. 
 CASE STYLE out : out token. 
 CHOICE brief finish : close token. 
 IF bold finish : fi token. 
 CASE bold finish : esac token. 
 
 {There is no 'CASE brief again'. The provisions above can be diagrammed:

               start   in     again   out    finish
 IF brief      (       ?      ?:      ?      )
 CASE brief    (       in     X       out    )
 IF bold       if      then   elif    else   fi
 CASE bold     case    in     ouse    out    esac
 }
 
 9.2 Comments and pragmats 
 
 d) STYLE other PRAGMENT item : quote symbol;
      other string item (except not STYLE-PRAGMENT-symbol).      {III.6}
 
 9.4.2 Other TAX symbols

 a) style i letter ABC : [if case is not used for stropping]    {III.1}
      (ISO) upper case of letter ABC symbol
      (EBCDIC) lower case of letter ABC symbol
    {internally all STYLE-letter-ABC's are converted to one case}.

 b) style i monad : (ISO) ! \ | '^                               {III.8}
                    (EBCDIC) \ \ i

 9.4.2.2 Representation                                          {IV}

 b) Stropping. {This representation is compatible with all proposed
    stropping conventions and favors none (pun intended).}
 
 9.4.1 Representations of symbols 
 
 a) Letter symbols                                               {III.1}

 symbol             (ISO)  (EBCDIC)     symbol             (ISO)  (EBCDIC)

 letter a symbol    a      A            letter n symbol    n      N
 letter b symbol    b      B            letter o symbol    o      O
 letter c symbol    c      C            letter p symbol    p      P
 letter d symbol    d      D            letter q symbol    q      Q
 letter e symbol    e      E            letter r symbol    r      R
 letter f symbol    f      F            letter s symbol    s      S
 letter g symbol    g      G            letter t symbol    t      T
 letter h symbol    h      H            letter u symbol    u      U
 letter i symbol    i      I            letter v symbol    v      V
 letter j symbol    j      J            letter w symbol    w      W
 letter k symbol    k      K            letter x symbol    x      X
 letter l symbol    l      L            letter y symbol    y      Y
 letter m symbol    m      M            letter z symbol    z      Z
 
 b) Denotation symbols

 symbol                             representation

 digit zero symbol                  0
 digit one symbol                   1
 digit two symbol                   2
 digit three symbol                 3
 digit four symbol                  4
 digit five symbol                  5
 digit six symbol                   6
 digit seven symbol                 7
 digit eight symbol                 8
 digit nine symbol                  9
 point symbol                       .
 times ten to the power symbol      (ISO) e  (EBCDIC) E          {III.5}
 
 symbol 
 
 true symbol 
 
 false symbol 
 
 quote symbol 
 
 quote image symbol 
 
 space symbol
 
 comma symbol 
 
 empty symbol 
 
 c ) Operator symbols 
 
 symbol 
 
 or symbol 
 and symbol 
 ampersand symbol 
 differs from symbol 
 is less than symbol 
 is at most symbol 
 is at least symbol 
 is greater than symbol 
 divided by symbol 
 over symbol 
 percent symbol 
 window symbol 
 floor symbol 
 ceiling symbol 
 plus i times symbol 
 not symbol 
 tilde symbol
 down symbol 
 up symbol 
 plus symbol 
 minus symbol 
 equals symbol 
 times symbol 
 asterisk symbol 
 assigns to symbol 
 becomes symbol 
 
 representation 
 
 true 
 false 
 
 (space) 
 
 {111.5} 
 
 empty 
 
 representation alternates 
 
 & 
 < 
 
 > 
 
 / 
 
 (EBCDIC) "• 
 
 or 
 
 & 
 
 and 
 
 (III. 5 
 
 ( 
 
 /= 
 
 It 
 
 ne 
 
 
 
 <= 
 >= 
 
 le 
 St 
 
 
 
 i 
 
 over 
 
 
 
 elem 
 
 
 
 
 Iwb, 
 upb 
 
 entier 
 
 {II. ^ 
 {11.^} 
 {11.1} 
 
 III. 5] 
 
 (ISO) ~ not [Tl.h, III. 5] 
 {II. i^} 
 
 > 
 
 e£ 
 
 ■X- 
 
 {Although they are not listed in 9.4, the following are defined in chapter 10.}
 
 exponentiation operator 
 modulo operator 
 plus and becomes operator 
 minus and becomes operator 
 times and becomes operator 
 divided by and becomes operator 
 over and becomes operator 
 modulo and becomes operator 
 plus to operator 
 shift left operator 
 shift right operator
 raise semaphore operator 
 lower semaphore operator 
 
 + 
 
 ■X- 
 
 9 
 
 pow ↑
 mod
 plusab
 minusab
 timesab
 divab
 overab
 modab

 plusto

 shl

 shr

 up

 down

 {II.4}

 ;

 {II.4}
 
 d) Declaration symbols 
 As in the Revised Report 
 
 e) Mode standards 
 
 As in the Revised Report 
 
 f) Syntactic symbols 
 
 symbol 
 
 bold begin symbol 
 bold end symbol 
 brief begin symbol 
 brief end symbol 
 and also symbol 
 goon symbol 
 completion symbol 
 label symbol 
 parallel symbol 
 open symbol 
 close symbol 
 decision symbol 
 again symbol 
 if symbol 
 then symbol 
 else if symbol 
 else symbol 
 fi symbol 
 case symbol 
 in symbol 
 out case symbol 
 out symbol 
 esac symbol 
 colon symbol 
 brief sub symbol 
 brief bus symbol 
 style i sub symbol 
 style i bus symbol 
 up to symbol 
 at symbol 
 is symbol 
 is not symbol 
 nil symbol 
 of symbol 
 routine symbol 
 go to symbol 
 go symbol 
 skip symbol 
 formatter symbol 
 
 representation 
 
 begin 
 end 
 
 ) 
 
 par 
 
 ? : 
 
 if 
 
 then 
 
 elif 
 
 else 
 
 fi 
 
 case 
 
 in 
 
 ouse 
 
 out 
 
 esac 
 
 @ 
 
 nil 
 
 -< 
 
 goto 
 
 go 
 
 skip 
 
 } {11.3} 
 
 at 
 is 
 isnt 
 
 of 
 
 {II.3}
 
 g) Loop symbols 
 
 No change from Report 
 
 h) Pragment symbols

 symbol                        representation

 brief comment symbol          (ISO) {...}
 bold comment symbol           comment
 style i comment symbol        co
 style ii comment symbol       #
 bold pragmat symbol           pragmat
 style i pragmat symbol        pr

 10.2.1 Environment enquiries
 
 p) int max abs char = (ISO) 127 (EBCDIC) 255;
 q) char null character = repr 0;
 r) char flip = "t";
 s) char flop = "f";
 t) char errorchar = "?";
 u) char blank = " ";
 v) char horizontal tab = repr ((EBCDIC) 5 (ISO) 9),
         backspace = repr ((EBCDIC) 22 (ISO) 8),
         carriage return = repr 13,
         line feed = repr ((EBCDIC) 37 (ISO) 10),
         vertical tab = repr 11,
         form feed = repr 12;
 
 10.6.1. Library Preludes 
 
 a) proc complete conv = (ref book b) conv: 
      (conv c; 
       for i from 0 to max abs char do 
          (aleph of c) (i) := (repr i, repr i) od; 
       c); 
      {III.5} 

 b) proc layout encoded conv = (ref book b) conv: 
      # characters to be ignored are set to null # 
      (conv c; 
       for i from 0 to abs blank - 1 do 
          (aleph of c) (i) := (null character, repr i) od; 
       for i to 6 do 
          char ch = (i in horizontal tab, backspace, 
                     carriage return, line feed, vertical tab, 
                     form feed); 
          (aleph of c) (abs ch) := (ch, ch) od; 
       for i from abs blank to max abs char do 
          (aleph of c) (i) := (repr i, repr i) od; 
       c); 
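
 The table built by layout encoded conv can be paraphrased in a modern
 language. The following is only a rough Python sketch, not part of the
 proposal. The names CONTROL_CODES, BLANK, complete_conv and
 layout_encoded_conv are hypothetical, and entries are shown as pairs of
 integer codes rather than characters. The blank codes (ISO 32, EBCDIC 64)
 and the values 13 for carriage return and 37 for EBCDIC line feed are the
 standard code points, assumed here rather than quoted from the listing.

```python
# Sketch of the two conversion tables; integer codes stand in for characters.
# All names below are illustrative and do not come from the report.

MAX_ABS_CHAR = {"ISO": 127, "EBCDIC": 255}
BLANK = {"ISO": 32, "EBCDIC": 64}  # assumed standard code points for blank
NULL = 0                           # null character = repr 0

# Layout characters in the order used by the ALGOL 68 listing.
CONTROL_CODES = {
    "ISO":    [9, 8, 13, 10, 11, 12],   # HT, BS, CR, LF, VT, FF
    "EBCDIC": [5, 22, 13, 37, 11, 12],  # HT, BS, CR, LF, VT, FF
}


def complete_conv(code):
    """Identity table: every code maps to itself."""
    return [(i, i) for i in range(MAX_ABS_CHAR[code] + 1)]


def layout_encoded_conv(code):
    """Codes below blank are set to null, except the six layout characters."""
    # First pass: everything below blank is marked as ignored (null).
    table = [(NULL, i) for i in range(BLANK[code])]
    # Second pass: the layout characters keep their own codes.
    for ch in CONTROL_CODES[code]:
        table[ch] = (ch, ch)
    # Third pass: blank and above map to themselves.
    table += [(i, i) for i in range(BLANK[code], MAX_ABS_CHAR[code] + 1)]
    return table
```

 For example, layout_encoded_conv("ISO")[9] keeps the horizontal tab, while
 layout_encoded_conv("ISO")[1] is replaced by the null entry.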
 
BIBLIOGRAPHIC DATA 
 
 1. Report No. 
 
 UIUCDCS-R-73-607 
 
 2. 
 
 3. Recipient's Accession No. 
 
 4. Title and Subtitle 

 A Revised ALGOL 68 Hardware Representation 
 for ISO-code and EBCDIC 
 
 5. Report Date 
 
 November, 1973 
 
 6. 
 
 7. Author(s) 
 
 Wilfred J. Hansen 
 
 8. Performing Organization Rept. 
 No. 
 
 9. Performing Organization Name and Address 
 
 Department of Computer Science 
 University of Illinois 
 Urbana, Illinois 
 
 10. Project/Task/Work Unit No. 
 
 11. Contract/Grant No. 
 
 12. Sponsoring Organization Name and Address 

 Department of Computer Science 
 University of Illinois 
 Urbana, Illinois 

 13. Type of Report & Period Covered 
 
 14. 
 
 15. Supplementary Notes 

 16. Abstracts 

 Because of the latitude allowed by the Revised ALGOL 68 Report, each 
 implementation has a slightly different representation for the constructs of the 
 language. This diversity can only lead to confusion as ALGOL 68 trained individuals 
 find they need readaptation to program at a new installation. The solution 
 proposed here is to develop a single hardware representation which can be used 
 on many computer systems. In fact this representation can conveniently be designed 
 using only the intersection of the graphic characters available in the ISO code 
 and EBCDIC. 

 The paper also proposes comfortable new representations for a few symbols 
 and discusses the thorny problem of distinguishing bold face words. 
 
 17. Key Words and Document Analysis. 17a. Descriptors 

 ALGOL 68, hardware representation, program interchange, 
 symbols, characters, bold face letters, 
 ASCII, ISO-code, EBCDIC 

 17b. Identifiers/Open-Ended Terms 

 17c. COSATI Field/Group 

 18. Availability Statement 
 
 19. Security Class (This Report) 
 UNCLASSIFIED 

 20. Security Class (This Page) 
 UNCLASSIFIED 
 
 21. No. of Pages 
 
 22. Price 
 
 