^U^ i ' ». LIBRARY OF THE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 510. 84- no. 60V GI2 cob. 2 CEMIRAl CIRCUUTIOM AMD BOOKSTACKS r person >J---|rarof /et™ responsible or -t^ J™™ ^^,„„. You APR y. m When renewing by phone. «rlt. new due dale below previous due date. " 7 UIUCDCS-R-T5-607 Ho. o2^ A Revised ALGOL 68 Hardware Representation for ISO-code and EBCDIC November, 1975 by Wilfred J. Hansen THE LIBRARY OF THE JAN 9 1974 UNIVERSITY OF ILLINOIS AT ')F?P ■'N' ■^. • ■ • -onipM DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN URBANA, ILLINOIS uiucDCS-R-73-607 A Revised ALGOL 68 Hardware Representation for ISO-code and EBCDIC by Wilfred J. Hansen November, 1973 Department of Computer Science University of Illinois at Urbana-Champaign Urbana, Illinois This work was supported by the Department of Computer Science Digitized by the Internet Archive in 2013 http://archive.org/details/revisedalgol68ha607hans 310. /¥ if Annotated Table of Contents Page I. Design Considerations for AlfiOL 68 Representations A philosophical discussion of the problems that complicate representation design 1 1.1 Psychological considerations 1 1 . 2 Deci sions demanded by the Report 3 1.3 Hardware Considerations Including tables of ISO-code and EBCDIC k II. Five ALGOL 68 Symbol Set Suggestions 10 II. 1 plus -i -times -symbol: '+*' 10 II. 2 of-symbol: '-<' 10 II. 3 stick-symbol, again-symbol, or-symbol 11 11.^ Disentangling U ', 't % '-', and '[ ' 12 II. 5 'flip': true, 'flop': false lit- III. The Design of the Hardware Representation Using characters available in both ISO and EBCDIC I5 III . 1 Letter tokens I5 111. 2 Bold tags 16 111 . 3 Composite characters 16 III.i^- Carriage Return, Line Feed, and Delete I6 III. 5 Notes on Particular Representations I6 111. 6 Other-string-items, other-pragment -items I7 111. 7 Abs, repr, and conversion 17 111. 8 Use of remaining characters I8 111. 9 Representations with smaller character sets I8 Ill page III . 10 Guide to reading Appendix A I9 rV. Stropping Recommendations Stropping clutters, but if you must strop, use case shift or a post-fix underbar 20 IV. 1 Postfix underline is bold (least favored, but better than sixteen other schemes) 21 IV. 2 Upper case is bold - 21 IV. 3 Lower case or postfix underline is bold 22 IV.k- Reserved words are bold (most preferred technique) 22 References 25 Appendix A. Proposed Hardware Representation 26 IV Abstract Because of the latitude allowed by the Revised ALGOL 68 Report, each implementation has a slightly different representation for the constructs of the language. This diversity can only lead to confuse as ALGOL 68 trained individuals find they need readaptation to programs at a new installation. The solution proposed here is to develop a single hardware representation which can be used on many computer systems. In fact this representation can conveniently be designed using only the intersection of the graphic characters available in the ISO code and EBCDIC. The paper also proposes comfortable new representations for a few symbols and discusses the thorny problem of distinguishing bold face words, V Preface I didn't want to write this paper. After the Los Angeles meeting of WG2.1 approved the Revised Eeport on the Algorithmic Language ALGOL 68 (l) I decided to spend two leisurely days designing a transportation representation for the language. It wasn't that easy (hut I did it, it's another paper). Because the transportation representation is an encoding of ALGOL 68 program texts, an adequate explanation requires sample encoder and decoder programs. These must assume some specific hardware representation. Lacking a compiler I decided I could quickly design a suitable hardware presentation language of my own. Older now, but wiser, I offer the following. Sections three and four explain the hardware representation and sections one and two explain why I chose it. To a large extent, the sections can be read independently and the recommendations of one adopted without adopting any other. In particular, the transportation representation in no way depends on this hardware representation. vn Acknowledgements This paper has been written with continuous reference to 'An ISO-Code Representation for ALGOL 68* by C. H. Lindsey (2). I am indebted to J. E. L. Peck for introducing me to ALGOL 68 and patiently explaining obscurities I encountered. I. Considerations in the Design of Hardware Representations Now that the Revised Report has been approved by WG2.1, it is appropriate to reconsider the question of hardware representations of the language. (Hereafter, the Revised Report will be referred to as the Report; if there were any references to the earlier Report, they would specify the Original Report. Most remarks apply to both, anyway.) The Report has been written with the thought that it will be implemented on a wide variety of hardware with vastly differing character sets. As a consequence, it is not particularly specific as to how any construct will be represented on (say) cards. Rather than envision the possibility of a tower of Babel of representations, I suggest that there is in fact a widely available set of graphics with which all constructs of the language can be represented. Selection of a widely available character set will bring these benefits : o as trained AlfiOL 68 programmers move from implementation to implementation, they will be able to begin programming and reading programs immediately and without confusion. o as programs are sent from one installation to another, they will be understood without lengthy explanation and constant referral by the reader between the text and a codebook. o many installations have a variety of devices with different character sets. Only by choice of a widely available representation will programmers be able to access the files containing their programs from all these devices, including both terminals and line printers. o ALGOL 68 will more readily be accepted by outsiders as a single language and not a collection of similar languages. It must be recognized that at the majority of installations, and especially in North America, ALGOL 68 will not be the primary language in use. For this reason it cannot be expected that the operating system will have provisions especially suited to the language. In particular, for many years ALGOL 68 by itself will not be a strong force determining the nature of character sets provided as standard by manufacturers. The many considerations that affect design of a representation can be categorized into psychology, ALGOL 68, and hardware. These topics are covered in the remainder of section I. Section II suggests a few representations for symbols that may be controversial. It is important to note that the representation finally arrived at can be adjusted to take into account rejection of any of these suggestions. Section III details the decisions made in the rest of the representation and section IV discusses the sticky question of representations for bold-tags. I.l Psychological considerations. Care must be exercised in the design of representations for a number of general psychological reasons : a) There are many possible sources of small confusions in representation design: odd. characters, context dependent usage, dissimilar usage in similar contexts in similar languages, breaks in typing rhythms, and more. Each instance of confusion may be only a minor annoyance, or it may interrupt a train of thought and result in omission of critical phrases. Moreover, confusions may have a cummulative effect that can lead to frustration and breakdown in communication. b) To some extent the physical representation and not the abstract 'strict language' is the medium of thought. This is true when writing a program and even more so when reading a program. Consequently, variations between physical representations ought to be carefully restricted. c) When writing a program, a trained programmer writes by reflex and concentrates instead on the task at hand. For small changes of representation when changing installations, the retraining period is probably small, but if it can be avoided, everyone benefits. d) Representations should be chosen with an eye to the representations used in other fields and in other languages. The task of attracting ALGOL 68 users is sufficiently difficult without repelling them with unusual symbol choices. (The Report has excellent symbol choices, I worry about implementations Beyond these general considerations, the designer must keep in mind a number of human factors effects : a) Some reasonably consistent aesthetic should be followed in the design to assist in readability. The aesthetic of the ALGOL 68 reference language is a pleasant combination of natural language and mathematical conventions. It seems characterized by economy of expression and avoidance of clutter. b) Simultaneous with aesthetics, the designer must strive for understand- ability, unmistakability, and clarity. c) A specific problem is that operators that bind closely ought to take less space than those that bind more loosely. In this regard, a bold tag used as an operator can suffer "stropping separation". 'This includes not only the length of the tag, but also the character (s) needed to strop it and any necessary blanks. d) Another factor is the length of the text. Too short a text may correspond to a program that has been abbreviated beyond reason; but too long a text slows both the writer and the reader. A longer length for an infrequently used operator is acceptable because it will not substantially j affect the length of the program. . \ e) No one is aided if an implementation provides too many alternative ways of expressing a single construct. Programmers are constantly forced to [ choose among alternatives; readers must be prepared to encounter that many more symbols. When text includes rare alternative form, the reader may , remember it erroneously; he will at least have to interrupt his reading to try to recall the symbol. Selection of specific symbols must depend on yet other factors: a) The existence of groups of potential programmers trained in the meaning of a symbol. For example, '+' ought to mean addition because most potential users have studied algebra. b) Use of the symbol in other widespread languages. Conflicting usage could be a source of confusion. c) Relation of a symbol to its meaning ("graphic onomatopoeia"). For example, parentheses, braces, and brackets do appear to surround their contents. d) The possibility of a confusing similarity between two graphics. Lindsey points out, for example, that '+' and '«•' appear alike on teletypes. 1.2. Decisions demanded by the Eeport. In a number of areas, the Report leaves considerable latitude to accommodate implementations with varying character sets. The representation must specify what is allowed for other-string-item, STYLE -other-PRAGMENT-item, style-TALLY-letter-ABC, and style -TALLY -monad (R9.J4-a}. At least one means must be provided to write any operator in the standard prelude {R9.^b}, decisions must be made as to the values of abs, repr, 'null character', 'error char', 'flip', 'flop', 'blank', and 'max abs char' [RIO. 2.1}, and a conversion must be associated with 'stand conv' {10. 5.1. 2d}. In 9«^ t), the Report accepts an implementation even if it provides only one of the alternatives for each symbol (say either '@' or at_). The intention, in fact, seems to be to accept any representation language that provides at least one way of expressing each construct in the language. Thus an implementation need not necessarily have both stick-symbol and then-symbol, since with either one a choice-clause can be constructed {R9.1.1 c,h}. Similarly, a times -ten -to -the -power -symbol might be omitted since some letter-e- symbol will usually be available [R8. 1.2.1 h}. Selection of representations for standard prelude operators is complicated by the fact that not only do many operators have a number of symbols, but many symbols are assigned more than one function. Some way must be found through the mazes of relationships among uses and alternatives for 't ', '~', and '['. Likewise, there are some complications if other than the reference representations are chosen for any of the multiple symbols that map into certain representations (for example, four symbols map into ':'). Some decision must be made as to whether the implementation will accept the "allowable" alternatives like '..' for ':'„ [R^.kh] Occasionally, a representation designer will be forced to consider use of a diphthong for some operation. In this effort, he must check the operator grammar in 9«^'2.1 of the Report to see that the diphthong is legal and thus will not cause ambiguity. A secondary consideration is to try to leave intact the possibility of families of operators. For example, diphthongs ending in '/' should be avoided because they are all available for a family: V V // -/ ^/ Indeed, this is the family of APL reduction operators. Finally, 9«^«2.2 b specifies that a bold-tag is composed of marks corresponding to its LETTER'S and DIGIT'S, where the 'mark corresponding to each LETTER ([or] DIGIT) is similar to the mark representing the corresponding LETTER-symbol ([or] DIGIT symbol)'. Interpretation of the word 'similar' has led to a variety of "stropping techniques". The representation designer must choose one of these techniques. My own suggestions will hinge on the observation that nothing is more similar to an object than itself, but see section IV for the gory details. 1.3 Hardware Considerations. Two standard codes for computer text have been defined and widely used: the ISO code (2) (and especially its ASCII subset (3^5)) and EBCDIC (^,5). The former are used by most of the world, and the latter is used by only one manufacturer. Tables defining these two codes are reproduced in figures 1, 2, and J* Note that ISO has many national variants and the code provides spaces where national groups can place characters specific to their own needs. An ALGOL 68 program will certainly not be given the same binary encoding in the two codes, since, for example, '+' is '00101110' in EBCDIC and '0101011' in ISO. However, in an important sense one can make programs in the two codes appear similar; the graphics chosen to represent each ALGOL 68 construct can be the same in both codes . Examining the code tables, we see that ISO has a number of characters that will not be the same on every terminal, the so called "national characters". These interfere with a common graphic representation of ALGOL 68 so they should be avoided. Leaving them aside, the following characters are available in both codes: upper and lower case letters : a-z A-Z digits : 0-9 space ! " # $ f^ p, ' ( ) * + ^ - • / : ; < = > ? @ _ The following control characters appear in both codes and ought to be considered in design of a representation: BS backspace HT horizontal tab CR carriage return LF linefeed DEL delete MJL null FF form feed VT vertical tab + 16 32 i^8 Gk 80 112 128 ihk l6o 176 192 208 22^ 2I+0 NUL DLE DS space & - 1 SOH DCl SOS / a 3 A J 1 2 i i STX DC 2 FS SYN b k s B K S 2 1 3 ' ETX TM c 1 t C L T 3 1 i. ' FF RES BYP PN d m u D M U ^ ' 5 HT NL LF RS e n V E N V 1 5 ! 6 LC BS ETB UC f w F W 1 6 ■ 7 DEL XL ESC EOT ' g P X G P X 7 : 8 CAN h i q r y z H I Q R Y Z 8 ' 9 SMM EM cc SM / j , — ^ l • i 9 10 11 VT GUI CU2 CU3 • $ } # i 1 i i [ 12 FF IFS DCi^ < * I0 1 @ : 1 i 13 CR IGS ENQ NAK ( ) ' ! 1 li^ 15 SO SI IRS lUS ACK BEL SUB + • -I > 9 tt ;' Figure ; 1. Extended Binary- ■Coded- ■Decimal Interchange Code (EBCDIC ) (Chart adapted from (k).) 0+ 16+ 32+ k-8+ 6k+ 80+ 96+ 112+ 1 2 3 1^ 5 6 7 8 9 10 11 12 13 11+ 15 NUL DIE space (@) P n P SOH DCl 1 ■ 1 A Q a q SIX DC 2 IT 2 B R b r ETX DC5 (£) # 3 C S c s EOT DCi^ $ i^ D T d t ENQ NM i 5 E U e u ACK SYN & 6 F V f V BEL ETB t 7 G W g w BS CAN ( 8 H X h X HT EM ) 9 I Y i y LF SUB ■)«• • J Z J z VT ESC + 5 K ([) [ k { KF FS ^ < L \ 1 CR GS - = M (]) ] m } SO RS • > N (-) /-\ n (") SI US / 9 — DEL Figure 2. ISO 7 -bit Coded Character Set (also known as ASCII) Some codes are not officially assigned graphics, but the preferred alternatives are shown in parentheses. The right hand side of columns 32, 80, and 112 show the alternatives chosen for ASCII. (ISO chart taken from (2).) 35 61^ 91 92 93 9^ 96 123 121+ 125 126 ASCII # @ [ \ ] y\ - [ ] Australia # @ [ \ ] /\ ^ { ] Denma.rk, Finland, A 6 • A a b • a Norway, Sweden France 1 £ a o 9 § /N ^ e u e ~ 2 [ \ ] W. Germany 1 # @ [ \ ] /\ «» 1 } 2 £ S A U a o a p Japan 1 # @ [ ^ ] ■^ V 1 1 } 2 £ # 1 New Zealand @ [ ] -\ "* - i United Kingdom 1 £ @ [ \ ] t ^ { } 1 2 1 10 3 1 1 1 Figure 3- ISO Alternatives Adopted by Various Countries (Taken from Lindsey (2).) 8 A few comments on individual characters: '#' is not ISO standard, but is the ASCII choice for position 35 in ISO. The alternative is '£', a currency symbol as is ';^'. •@' is not ISO standard, hut is the preferred alternative for position 6k and is only infrequently assigned other graphics. ' I ', '! ', '-', and '-' Extended discussion of these follows. The manufacturer supporting EBCDIC has helped introduce endless confusion into the codes for these graphics. The following charts illustralTe this by giving the hexadecimal location in the code of various graphics. ISO Alter- IBM IBM code: ISO natives ASCII-8 ASCII-8 EBCDIC-8 EBCDIC (references) (2) (2) (3,5) (h) (5) (k) graphic 70 u d 21 5D A U S kl 21 kF 7C FC 6a 5D BD 5A kF 8e 6e BE 6e 6e Be 5F Al 5F Many terminal control software routines interpret the "best" meaning of characters. Thus on some North American timesharing systems a '!' at an ASCII terminal is converted to an EBCDIC ' | ' by the time it reaches the executing program. In such an environment or even in more benign environments there are then eight characters that can be produced by the system in response to some code that at one time was supposed to be ' | ' : ! ' ] ^ A U il d. Similarly, '-%' can be ~ n - t and p. This peregrination can render a character a poor choice for an important task in a representation language. Another "feature" of terminals and terminal support routines is the translation of lower case characters to upper case because time-sharing supervisors expect upper case commands. This will influence the choice of letter symbols and stropping techniques below. i The final set of hard-ware problems is concerned vd.th format effectors. In particular, should backspace and carriage return be permitted to instruct the compiler to reconstruct the input text image exactly as it would appear on a page produced by an ISO-code terminal? If so, bold stropping could be accomplished by backspacing and underlining, or even by returning the carriage and underlining. For a number of reasons, such composite characters are bothersome and must not be allowed: mechanically, backspace is a slow operation. For example an IBM Selectric© backspaces J+2^ slower than it forward spaces. This delay is enough to break typing rhythm and cause marginal discomfort and confusion. 1 many terminals do not interpret ASCII backspace correctly and replace the character after repositioning the typing element (or cursor). R. W. Bemer has conrplained about this in a letter to Datamation (8), but I do not expect his plea to stem the tide. o a compiler that interprets the printed image is not interpreting the characters in their sequential presentation. Error indications may not be correctly associated with the text, and even if so there may be no good clue as to how the text is stored in a file and should be modified. This problem would not be as bad if all editing were done interactively, but that is not always economical. ° some systems use only carriage return as a signal for end of line, so if the compiler were interpreting the image, all the characters would be on one line. (Certainly the compiler for such an installation would be more clever, but it is a curious thought.) IjO II. Five ALGOL 68 Symbol Set Suggestions 1. plus -i -times -symbol: '+*' Few devices provide 'J_', the reference language character for the plus -i -times -symbol. It can be constructed from overlaid characters, but these are a questionable recourse at best. The report also suggests i_, but this suffers stropping separation or, if reserved words are used, conflicts with the identifier 'i'. The Lindsey ISO-code representation proposes '!' and ' |_', but these encounter the ISO vertical bar uncertainty. In Algol Bulletin '^k W. Freeman (6) proposed that modulo-operation be represented by 'tX'. This has been accepted in the Revised Report, and a parallel construction suggests '+X' for the plus -i -times -symbol. More usually, this will be written as '+-^'; but note that this does not interfere with the x-asterisk family of diphthongs, because the family members '^•^' and "^*' already have standard meanings. Note: '+*' would also be used in transput to represent plus-i -times. Examples : 3+*5 u +* v (R 10.2.3.7J) (re a+re b) +^ (im a+im b) (R 11.1) (x > I rp+*ip I abs_ ip+*(y > | rp | -rp)) 2. of -symbol: '-<' At Los Angeles, '->' was removed as a representation of of -symbol because it points the wrong way - from son to parent, andbecause in practice it points the opposite way from the arrow in a similar construct in PL/I. Only of is left and it suffers extreme stropping separation for an operation that binds even more tightly than a monadic operator. One proposal for of-symbol is '.', but again this has the opposite meanings to the corresponding construct in PL/I. I derived an alternative to _of by starting from the is-an-element -of-symbol, 'e', and then considering ' -< ' and ^' . The latter has an excellent diphthong: '-<', so I propose that the Report list of and '%< ' while the "approved alternative representation" be '-<'. Note: It should cause little difficulty that '-C' is also the symbol for a photocathode. Examples: father -

p- l- c-pofb ? true ?: lofalofb ? true ? cofa>cofb) (RIO. 3. 5-2 a string) (y(j) in ( ref char c) : ( upb s=l?c:=s? incomp := true ) , (ref () char cc) : I ( upb s = upb cc - Iwb cc ? cc (:) := s (:) ? incomp := true ) , ( ref string ss): ss := s out incomp := true ) I II. i^. Disentangling U ', 't ', '-', and '[' Because few devices provide these graphics, the Report allows a profusion of alternatives. The down-symbol, up-symbol, tilde-symbol, and floor-symbol are variously specified to replaceable by the bold symbols down, shr, up, shl, 1 skip, n ot , Iwb, and entier ; also tilde may be synonymous with -i and up-arrow r with **. Unfortunately, the synonymity is context dependent. For example entier may replace the floor-symbol in [ (flex I al I xl) (l) but the same replacement cannot be made in [ (flex I al I xl). The intertwining of symbols for exponentiation, shift left;, and raise semaphore is even more complex. We can diagram the relations thus 13 raise semaphore An instance of u£ cannot readily be associated with the correct operation by either a reader or a compiler. Moreover, the hapless programmer is given little guidance as to how he might be able to write his program to avoid possible reader confusion. These synonymities were introduced to permit an ALGOL 68 program to be written on any device, but only one representation is required at any one time. It is not difficult to assign symbols to operations so no symbol represents more than one operation. In addition to reducing reader ambiguity, this assignment simplifies construction of encoders for transportation representations, If the problems are resolved as follows, programs can still be represented on any device, but the synonymity conflicts are removed. Where possible, non-stropped representations are also proposed. boolean negation skip lower bound entier not skip Iwb entier as in the Report no ~, the standard prelude does not use the tilde and has no problem no '[• no neither operation has a stronger claim to [ than the other, and it would be silly to allow entier to replace Iwb ; this restriction will not make programs appreciably longer lower semaphore raise semaphore shift left d own up shl as in the Report as in the Report shift right shr it is not intuitively clear why 'up' should mean 'left' and not 'right' ; I considered suggesting '<*' and '*>', but their meanings could be forgotten since shifts are infrequent 3Jf exponentiation t "^^ pow *^ will seldom be unavailable; t is kept because it looks right; pow will do if the others are imavailable. Examples: (in an assembler) op shl 8 or adrs (tautology test) ( not (2 pow k /= 2rl shl k) | skip | undefined) (binary search) proc bis ear ch = (() int a, int arg, default) int : ( int val := default; bit s i :- 2rl shl (bits width - l); while abs i > u pb a - Iwb a + 1 do i := i shr 1 od; int k := ab s i + Iwb a - 1; while abs i /= do i := i shr 1; (a(k) < arg | k := min (k+ abs i, upb a) I : a(k) = arg | val := k; i := bin | k -:= abs i) od; val) II 5. 'flip': true, 'flop': false For bool and bits, ALGOL 68 violates the principle that valid denotations should be valid transput values. As there seems to be no good reason for this, I propose the elimination of 'flip' and 'flop'. The value 'put' for true will be 'true'; that for false, ' false ' . 'Get' for a bool will read a tag and set the bool to true if the tag starts with 't', false for 'f, and call 'char error mended' if the tag starts with neither. The transput value for a bits will be a BITS -denotation. 15 III. The Design of the Hardware Eepresentation This section covers the points raised in section k of Lindsey's paper. The Representation is detailed in Appendix A_, and section III. 10 below contains some notes to explain that Appendix. III.l Letter tokens The central question here is whether to allow both upper and lower case letters in tags and the letters assigned denotation functions (a,b, c, d, e, f^ g,i,k, 1, n, p, q, r, s^x, y, z for radix, times ten to the power, and format markers). In fact few programmers will want to have tags differing only in the case of one or more letters. (is there really enough difference between 'ox' and 'OX'? Is 'scan' not 'Scan'?) Moreover, many line printers are normally operated with only upper case. Therefore, the basic answer to the central question is that internally the compiler will have only one case of letter symbols (which one is immaterial) . For string denotations and transput of strings, though, case will be distinguished. We now consider three categories of terminals and three contexts for tags and denotation function letters: UC/liC terminal, UC means bold tag and denotation function letters denotation function letters (a,b,c, d, e, f, r, t) input UC/lC* output LC UC/LC terminal, case not used for stropping UC/LC^ UC/LC^ LC UC terminal UC* UC* LC** *Converted to one case internally. **System software or terminal converts to UC. Appendix A lists lower case letters for ISO and upper case letters for EBCDIC. This is only because these characters sets are normally available on the corresponding equipment. Of the national variant characters in the last foirr ISO columns, only '@' and '~' are assigned a function in this hardware representation. The other eight are available for use as additional letter-ABC -symbols. When they are not so used they can serve, as suggested below, as style -TALLY -monads and other-string-items. 16 111. 2. Bold tags. This hardware representation is designed to be applicable to any stropping convention. To facilitate this, only one case is assvimed and no function is assigned to ' ' ' and '_'. Stropping is discussed in section IV. 111. 3. Composite characters. Outlawed as discussed in section I. 3' Ill.i^-. Carriage Return, Line Feed, and Delete. As Lindsey's paper suggests, CR, LF, CR/LF, LF/CR (my addition), form feed, and vertical tab all should terminate a line of input to the compiler. Delete, backspace, and all other control characters should be ignored. 111.5. Notes on Particular Representations The controversial representations in this design have been discussed in section II. A few further choices deserve comment here. 'e' for times -ten-to-the-power-symbol The Report implies that both 'e' and some other graphic (' „ ' or '\') will be available for this symbol. It does not seem valuable to have two or three graphics mean the same thing, especially where one is widely available and the others are not. (space) for space -symbol The Report proposes that both the space and visible space ('^') be available. The latter cannot be allowed because it would have to be composite on available equipment. Using underlines for visible spaces does not help because they run together and are no easier to count than spaces. '& ' for and-symbol This is reasonable and follows Lindsey's proposal. Iwb and entier for floor-symbol These two bold tags are not interchangeable alternatives. '-)' and '~' for not-symbol Section I has described the ridiculous confusion as to the location of these symbols, including the possibilility that in some circumstances not-symbol would be typed as 't ' . One solution would be, as with stick-symbol, to suggest an alternate representation or to abolish the graphics in favor of not . The situation is not as serious with not-symbol, however, because 17 not-synibol is only an operator and its interpretation is not crucial to interpretation of the structure of the program text. Hence, 't' and '~' are both allowed, depending on which character code is used. Note that, to avoid even further confusion, '~' is not a representation of the tilde-symbol and cannot be used in TAO's. ' ? ' for ' error char ' The report makes no proposal for this value. 'Error char' is used in transput to be the string value for a value that cannot be translated as specified. Is it not reasonable that this situation should be signalled with the ? ' ' for 'blank' This assignment corresponds to the decision to use (space) for space -symbol. '{' ... '}' for ISO brief -comment -symbol The desirability of this representation is shown by PASCAL programs. Some brief-comment-symbol is necessary because 'vf may not be available on all ISO terminals. There is no reason the compiler cannot insist that the left-brace start a comment and the right-brace end it. 111. 6 Other-string -it ems, other -pragment-i terns. The characters available for these will certainly vary depending on the code in use; I am not trying to define an implementation independent language. Essentially every character that can be typed should be allowed in these positions, except for the format effector characters, which only control the listing of the program. (Though a format effector could be a string value by means of transput or repr . ) Except for the quote -image -symbol every character in a string will represent itself; there will be no composite characters or representations of one character by several. 111. 7 Abs, repr, and conversion. Abs and repr should certainly be designed so the position of a character in the code is its abs value. In ISO, abs "A" = 65 and repr 65 = "A"; in EBCDIC abs "A" = 193 and repr 193 = "A". Moreover, in both codes the null character will be repr 0. As far as possible, programs should be written so as to be independent of the actual values associated with abs and repr . As the encoder in my transportation representation paper shows, this can be achieved with minimal effort except for control codes. Appendix A proposes predefined identifiers for the six format affector control codes. Lindsey's paper proposes two conversion regimes. 'Stand conv' transmits printable characters unchanged but ignores most control codes. The format codes would create the requisite spaces to position the text as it would appear on paper. Backspace would cause a call of 'char error' of the file. His other conversion was 'complete conv' which transmitted all characters IB unchanged. Ignoring these, Lindsey's paper contains a program that uses what is called 'special conv'. In fact this latter is an important conversion that should be available for system programmers and anyone else -who vra.nts to know exactly how the user encoded spaces (and wants to avoid the processing time required to skip over them). I would call this conversion 'layout encoded conv': printable characters and format effectors are transmitted intact, but other control codes are ignored. When 'get'ing characters, each code is delivered in turn. When 'get'ing strings, a line is terminated by any format effector except horizontal tab; the tenninating character is delivered as the last character of the line. If the line is terminated with CR/LF or LF/CR, both will appear at the end of the line. (This will depend on whether the operating system delivers both codes to the ALGOL 68 monitor. ) III. 8. Use of remaining characters. Wo virtue is attained by associating a standard meaning with every available character. Indeed, a few ought to remain available as style-TALLY- monads. The characters not yet assigned are (ISO) [ ] ! I - \ (EBCDIC) I ! jzf Of these the brackets should not be monads because they would normally be used in pairs as delimiters, but the others can be monads. It is unfortunate that the report makes no provision for style -TALLY -nomads because the given set {=,<,>,*, \ /) is limited and many of its combinations are in use. Indeed, of the characters above, it would not be unreasonable to specify that ' ! ', and '•^' could not be monadic, but could be nonads. With these new nomads there could be dyadic diphthongs like '*! ', and even Appendix A, however proposes no new nomads. 7-S III. 9. Representations with smaller character sets. When only a subset of the characters assumed above is available, bold tags should be used to make up for missing characters. if this graphic is missing I0 @ < > ? these bold tags can replace its uses over mod not at CO It le of si ge and then else elif 19 If worst comes to worst, colon and semicolon will be missing and must be replaced with iigly diphthongs as suggested in R9.^b. Compilers should implement these only if numerous devices at the installation lack the specified graphic. the graphic is replaced by += : plus to := : is /= : isnt (the second and third of these are no longer mentioned in the Report, but still do not lead to ambiguity. It can be remembered that semicolon is '., ' and not ',.' because real-denotations can start with '.' but cannot end with it, so that no ambiguity exists in '3.,5' for '3;5'«) In cases where standard graphics are absent but other symbols are available, the temptation to use the latter should be resisted with passion . The minimum character set for ALGOL 68 is thus: letters, digits, =, +, -, *, ,, ., (, ), $, /, ", and possibly some stropping character. III. 10. Guide to reading Appendix A. Some symbols in Appendix A are marked boldface by underlining them. They are to be keywords, reserved words, or stropped words, according to whatever convention is adopted. The first listed "representation" for a symbol should be used if possible. If no representation is possible (and the symbol is not intended as a MONAD), the first possible graphic listed in the "alternatives" column should be chosen. Up-syrabol and down-symbol have neither representation nor alternative and cannot be written in a program {see section 11.^}. Please note that "typographic display features" (spaces, new lines, and new pages) cannot divide the characters of a diphthong or bold tag. Numerals in braces refer to sections of this paper where decisions are discussed. Where no comment is made, the representation follows the suggestions of the Revised Report. 20 rv. stropping Recommendations ALGOL 60 invented the concept of indivisible symbols represented by boldface identifiers. ALGOL 68 extended this concept to allow the user to invent his own bold tags. In the Revised Report, it is made clear that these symbols are no longer indivisible, but are composed of ordinary letters possibly set off in some way to indicate their type face. The Report does not suggest how they are to be set off, but gives five examples IR9.I1-. 2. 2b} including one using script letters, a feature found on few existing computer transput devices. Implementations have adopted diverse stropping conventions; some have even attempted to implement several conventions with provision for pragmat ic selection of one at compile time. Such attempts at universality seem doomed to failure, however. Without much effort I was able to write a list of twenty different stropping conventions, most incompatible with two-thirds of the rest. Instead, the compiler writer should implement one convention. Moreover, each program will at one time or another be transput on every device in the installation, so the stropping convention chosen must be applicable to every device. Traditionally, ALGOL 60 programs have been stropped with apostrophes at the beginning and end of each boldface symbol. To me this convention seems unduly cluttering. I have always been struck by the clean aesthetics of ALGOL: short symbols, indentation emphasized as a tool, no semi -colon before else, ALGOL 68 's brief comment forms, the top-down structure of program texts. In this atmosphere, apostrophes seem as appropriate as neon lights in a Japanese garden. Examine 'begin' 'real' x; 'char' c; get (f, (c,x)); 'if c = "i" 'then' x+1 'else' x 'fi' 'end' Note that in the natural (Western) way of looking top-to-bottom and left -to -right, the first item the eye encounters is not a piece of information, but is the interspersive apostrophe. Indeed, the most important distinction between pairs of words is their first letters, but the first 'letter' of all apostrophied words is the same. Note also that a hurried programmer who used other languages might waste time over the confusion between 'if and "i"; not much time, but perhaps enough to lose his train of thought. Moreover, there is no one apostrophe stropping method. Lindsey lists three, and my list included five, only one of which has been made illegal by the Revised Report. Another stropping technique that has been proposed is underlining; it does not clutter and has a rather pleasant appearance. Its difficulties are not aesthetic, but operational: it takes longer to enter and revise a prograjn with underline stropping, many text editors and line printer routines do not support underlining. Mechanical backspacing is a slow operation. Finally, underlining constructs composite characters and suffers the disadvantages of listings that are not in one-one correspondence with the file. 21 Apostrophes clutter. Underlining has operational problems. I reject them both for stropping. (Blithely ignoring the derivation of the word 'stropping'.) What do I propose instead? Reserved words. However, many implementers will feel they must provide some stropping convention so I discuss below four conventions in order of increasing preference. IV. 1. Postfix underline is bold (least favored, but better than sixteen other schemes). The least intrusive stropping scheme is to append an underline to the end of a bold word. The reader can easily ignore the character, but it is there to resolve ambiguity if he needs it. A trained ALGOL reader scans the indentation before examining the text. With prefix stropping, most lines begin with the low-information-content strop character; with postfix stropping, the strop character is out of the way. Our earlier random program would look like this: begin_real_x; char_c; get (f, (x, c)); if_ c="i" then_ x+1 else_ x fi_ end_ Note that '_' alone is enough and no space is needed. One of the advantages of this notation to the programmer- cum- keypuncher is that the underline is typed in the same sequence that it would be drawn: last after the rest of the word. Thus written work could still underline bold words and the transliteration while keypunching is not onerous. In a small way, this convention is compatible with complete underlining of bold words. If backspaces are ignored and multiple underlines converted to a single underline, the compiler can accept "begin 3PPP3 " (where 'P' means backspace) as equivalent to "begin_". But the user could not type "bP_eP_gP_iP_nP_" and achieve the same effect. (Alternatively, the latter could mean begin and a non-letter be required to end a bold tag. This introduces undue stropping separation in, for example, "re of_ z".) Postfix underline stropping is poor for terminals with a non-escaping underline key. They would have to type "begin_ " which would be printed as "begin_". Presumably implementors for such terminals would choose some convention later in this section. IV. 2. Upper case is bold. One convention gaining wide use in Europe is case stropping: bold tag: in upper case and tags in lower case. BEGIN REAL x; CHAR c; get (f, (c, x)); IF c="i" THEN x+1 ELSE x FI END 22 If the tags are chosen to be real words, they look more like normal text if they are lower case. (Theoretically, no space woiild be required between "REAL" and "x", but it might be considered gauche to omit it. Consider "reOFz".) Case stropping is excellent, except that existing devices, especially line printers, often support only upper case. One could postulate combining this technique with underlining or apostrophes to be used on an upper case only device. Unfortunately, this fails because in systems with only one case, it is usually upper case; plain tags would be upper case and thus mistaken for bold. 17.3. Lower case or postfix underline is bold. A reversal of the previous convention allows combination of both the first two conventions. Bold words would be lower case (as they are in the Report) or would be followed by an underline: BEGIN_real X; char C; GET (F, (X,C)); if C="i" THEN_ X+1 else X fi end (The mixed case if-THEN_-else-fi looks unpleasant and is. I postulate that in this particular case it was forced on the programmer because he was fixing the "THEN_" at an upper case terminal.) Presumably, the user would use case distinction at a mixed case terminal and fall back on postfix underline at an upper -case -only terminal. An upper case line printer would print all as upper case, and the user would have to rely on context to distinguish bold from roman. The latter is never an arduous chore in a well written program. The only disadvantage with this third stropping convention is that the second, contradictory, convention has achieved considerable useage, IV. k. Reserved words are bold (most preferred technique). Few languages other than ALGOL have stropping conventions, most rely on reserved words or a language design that makes all distinctions determinable from immediate context. Many successful ALGOL compilers have avoided stropping. Several ALGOL 68 implementations (at least Vancouver, Dartmouth, Illinois Institute of Technology, and probably many more) have worked out techniques to minimize or eliminate stropping. Here is a random program without stropping: begin real x; char c; get (f, (x, c)); if c = "i" then x+1 else x fi end 23 Is there any reason to believe that the human reader should have difficulty distinguishing bold from roman in this program? One interesting point is the fact that the Eevised Report lists 20^ more predefined reman words than bold tags. To be sure, the roman words can all be redefined with far less penalty than a redefinition of if; but still they constitute a large class of identifiers the user must remember. In the past, automatically generated parsers have behaved poorly when confronted with a reserved word used as an identifier. There is no inherent need for this to be the case, and work such as that reported by Graham and Rhodes (7) is showing how to avoid these problems. What is to be done about identifiers that contain a reserved word as an integral part, for example "year to date"? It is possible to forbid this, but that leaves too many opportunities for the programmer to forget that something is a reserved word. After all, the entire identifier is far from any reserved word. Instead, I suggest that underlines replace blanks in identifiers. Where an identifier is continued from one line to the next, it may have embedded blanks, but it must end with an underline on the first line or begin with an underline on the second line. Either year_to_ date or year_to _date will be acceptable renderings of "year_to_date " . In ALGOL 68 a user can declare arbitrary tags to be bold face. There are three reasonable techniques for dealing with these. (l) Such tags would be iinstropped and treated as bold only in the block where they were so declared. This technique implies that the token scanner will analyze the block structure, a good idea because it can aid automatic correction of bracket errors. (2) They could be unstropped, but treated as bold throughout the compilation. At least one implementation (IIT) has chosen this route. (3) They could be stropped in some way. Underlines serve as spaces in roman tags, so they cannot be used; apostrophes at both ends are unnecessary; postfix apostrophes are almost as unobtrusive as postfix -underlines; so I propose that a postfix apostrophe be used. Much as I dislike stropping of standard bold tags, I recommend the third method of distinguishing user created bold tags. They will certainly be less used than syntactic words like if and real, and usually they will not appear more than once in a phrase. More critically, they will be unfamiliar to a reader of the program, so he deserves to have them set off in some manner. Here is a part of a program: 2k mode token' = struct (int type, string val, rtok'next), rtok' = ref token'; rtok'toklist := nil, eop := heap token'; ref rtok'tokput := toklist; sema tokens_ready = level 0, may_move_tokens = level 1; Note that in its use as a mode or a monadic operator, a user defined bold tag is separated from its object by the apostrophe. It can be viewed as a notation that the thing to its left operates on the thing to its right. Moreover, a postfix apostrophe is somewhat like normal usage where an apostrophe may appear near the end of a word. Some emergency method of stropping may be necessary anyway to handle cases where a tag is declared both bold and roman. In such cases, it would be assumed roman unless stropped or unless the syntactic position demanded that it be bold. This mechanism is one way to solve the problem of 're' and re_ and 'im' and im. (Writing this, I have come to worry about "re" and friends. Why do both operator and field selector exist? Synonymously, is compl a primitive mode or is it a struct ? If it is a mode then we need only operators to process it and an operator to construct objects of that modes (which we have: '!')• Viewed thus, it makes no sense to assign a value to part of the primitive object "z". On the other hand, given the stropping separation of re and im and given "-<" for of^ it is just as reasonable to eliminate re and im from the language.) Final thought: I believe an efficient compiler can be written so that no more than 29 words are reserved: BEGIN, BY, CASE, CO, COMMENT, D0, ELIF, ELSE, END, ESAC, FI, FLEX, FOR, FROM, IF, IN, MODE, OD, OP, OUSE, OUT, PR, PRAGMAT, roiO, PROC, REF, STRUCT, THEN, TO, UNION, WHILE. With these words, the structure of the text can be determined. The other bold words in the Report could be redeclared, but could not be used in their bold sense within that block. I offer this limited-reserved-words approach as a challenge to parser implementers. 25 References (1) van Wijngaarden, A., et al.. Almost the Revised Report on the Algorithmic Lanugage ALGOL 68, private communication^ (SeptV7l9T3 ) . This version is slightly more recent than the version considered at Los Angeles and includes most of the corrections agreed on there. (2) Lindsey, C. H., 'An ISO-Code Representation for ALGOL 68', ALGOL Bulletin 31 (March, 1970), pp. 3T-6o. (3) ANSI, 'Data Communication Control Procedures for the USA Standard Code for Information Interchange', CACM 12, 3 (March, I969), PP- I66-I78. ASCII is listed in Appendix-E. {k) IBM Corp., IBM System/360 Principles of Operation, Order No. GA22 -6821-8, 1970, pp. 150.2-150.3. (Note that the graphic '.' has been omitted from position 'i|B' of the document.) (5) ANSI, 'Correspondences of 8-Bit and Hollerith Codes for Computer Environments - A USASI Tutorial', CACM 11, 11 (Nov., I968), pp. 783-789. Corrected in CACM 12, 5 (May, I969), p. 29^^. (6) Freeman, ¥., 'Suggestions regarding certain representations in ALGOL 68', ALGOL Bulletin 3h (July, 1972), pp. kl-kh. (7) Graham, S. L. and S. P. Rhodes, 'Practical Syntactic Error Recovery in Compilers', Conference Record of ACM Symposium on Principles of Programming Languages, Boston, Massachusetts (Oct., 1973)^ PP- 52-58. (8) Bemer, R. W., 'Backspace Bungle', Datamation 19, 9 (September, 1973)^ p. 25. 26 Appendix A. Proposed Hardware Representation Where there is a difference between the ISO and EBCDIC representations, the respective parts are labeled (iSO) and (EBCDIC). 8.1.4 Character denotations d) other string item : style i monad {942b}; {111.6} (iSO) ~ [ ] { ] (and upper case letters) (EBCDIC) -I (non-printing codes above 'space'; lower case letters). 9.1.1 Syntax (111.3} IF :: choice using boolean. CHOICE brief start : open token. IF bold start : if token. CASE bold start : case token. IF brief in : decision token. IF bold in : then token. CASE STYLE in : in token. IF brief again : again token. IF bold again : else if token. CASE bold again : out case token. IF brief out : decision token. IF bold out : else token. CASE STYLE out : out token. CHOICE brief finish : close token. IF bold finish : fi token. CASE bold finish : esac token. {there is no 'CASE brief again'. The provisions above can be diagrammed: start in again out finish IF brief CASE brief IF bold CASE bold ( 9 ?: 9 ) ( in X out ) if then elif else fi case in ouse out esac out esac ] 9.2 Comments and pragmats d) STYLE other FRAGMENT item : quote symbol; other string item (except not STYLE-PRAGMENT-symbol). {111.6} 27 9.'4-.2 other TAX symbols a) style i letter ABC : [if case is not used for stropping] tlH-l} (ISO) upper case of letter ABC symbol (EBCDIC) lower case of letter ABC synibol (internally all STYLE-letter-ABC ' s are converted to one case}. b) style i monad : (ISO) ! \ | '^ (EBCDIC) \ \ i 9.^.2.2 Representation [111.8] {IV] b) Stropping. {This representation is compatible with all proposed stropping conventions and favors none (pun intended).] 9.4.1 Representations of symbols a) Letter symbols (III.l] symbol (ISO) (EBCDIC ) symbol (ISO) (EBCDIC ) letter a symbol a A letter n symbol n N letter b symbol b B letter symbol letter c symbol c C letter P symbol P P letter d symbol d D letter q symbol q Q letter e symbol e E letter r symbol r R letter f symbol f F letter s symbol s S letter g symbol g G letter t symbol t . T letter h symbol h H letter u symbol u U letter i symbol i I letter V symbol V V letter j symbol J J letter w symbol w W letter k symbol k K letter X symbol X X letter 1 symbol 1 L letter y symbol y Y letter m symbol m M letter z symbol z Z b) Denotation symbols symbol representation digit zero symbol digit one symbol 1 digit two symbol 2 digit three symbol 5 digit four symbol k digit five symbol 5 digit six symbol 6 digit seven symbol 7 digit eight symbol 8 digit nine symbol 9 point symbol • times ten to the power symbol (ISO) e (EBCDIC) E (III. 5] 28 symbol true symbol false symbol quote symbol quote image symbol space sjrmbol comma symbol empty symbol c ) Operator symbols symbol or symbol and symbol ampersand symbol differs from symbol is less than symbol is at most symbol is at least symbol is greater than symbol divided by symbol over symbol percent symbol window symbol floor symbol ceiling symbol plus i times symbol not symbol tilde sj.Tribol down symbol up symbol plus symbol minus symbol equals symbol times symbol asterisk symbol assigns to symbol becomes symbol representation true false (space) {111.5} empty representation alternates & < > / (EBCDIC) "• or & and (III. 5 ( /= It ne <= >= le St i over elem Iwb, upb entier {II. ^ {11.^} {11.1} III. 5] (ISO) ~ not [Tl.h, III. 5] {II. i^} > e£ ■X- (Although they are not listed in S.k, the following are defined in chapter 10.} exponentiation operator modulo operator plus and becomes operator minus and becomes operator times and becomes operator divided by and becomes operator over and becomes operator modulo and becomes operator plus to operator shift left operator shift rif^ht operator raise semaphore operator lower semaphore operator + ■X- 9 pow t mod plusab minusab time sab divab overab mo dab plus to shl shl "^own (11.^} ; (11.^} 29 d) Declaration symbols As in the Revised Report e) Mode standards As in the Revised Report f) Syntactic symbols symbol bold begin symbol bold end symbol brief begin symbol brief end symbol and also symbol goon symbol completion symbol label symbol parallel symbol open symbol close symbol decision symbol again symbol if symbol then symbol else if symbol else symbol fi symbol case symbol in symbol out case symbol out symbol esac symbol colon symbol brief sub symbol brief bus symbol style i sub symbol style i bus symbol up to symbol at symbol is symbol is not symbol nil symbol of symbol routine symbol go to symbol go symbol skip symbol formatter symbol representation begin end ) par ? : if then elif else fi case in ouse out esac @ nil -< goto go skip } {11.3} at is isnt of CII.3} 30 g) Loop symbols No change from Report h) Fragment symbols symbol brief comment symbol bold comment symbol style 1 comment symbol style ii comment symbol bold pragmat symbol style i pragmat symbol 10.2.1 Environment enquiries representation (ISO) (...} comment CO # pragmat HI p) int max abs char = (iSO) 127 (EBCDIC) 255; q) char null character = repr 0; r) char flip = "t"; s) char flop = "f"; t) char errorchar = u) char blank = " " v) char horizontal tab = repr ((EBCDIC) 5 (iSO) 9), backspace = repr ((EBCDIC) 22 (iSO) 8), carriage return = repr 15, line feed = repr ((EBCDIC) 57 (iSO) 10 ), vertical tab = repr 11, form feed = repr 12; 10.6.1. Library Preludes a) proc complete conv = (ref book b) conv: (conv c; for i from to max abs char do c); (aleph of c) (i) := ( repr i, repr i) od ; {111.5} b) proc layout encoded conv = ( ref book b) conv : # characters to be ignored are set to null # ( conv c; for i from to_ abs blank - 1 do (aleph of c"5 {i) :- (null character, repr i) od ; for i to 6 do char ch = (i in horizontal tab, backspace, carriage return, line feed, vertical tab, form feed) ; (aleph of c) ( abs ch) := (c, c) od ; for i from abs blank to max abs char do c); (aleph of c ) (i) := ( repr i, repr i) od; OGRAPHIC DATA r 1. Report No. UIUCDCS-R-75-607 2. 3. Recipient's Accession No. c and Subt itle A Revised ALGOL 68 Hardware Representation for ISO-code and EBDCID 5. Report Date November, I973 6. ior(s) Wilfred J. Hansen 8. Performing Organization Rept. No. orming Organization Name and Address Department of Computer Science University of Illinois Urbana, Illinois 10. Project/Task/Work Unit No. 11. Contract/Grant No. insoring Organization Name and Address Department of Computer Science 13. Type of Report & Period Covered University of Illinois Urbana, Illinois 14. jplementary Notes stracts Because of the latitude allowed by the Revised AIGOL 68 Report, each ementation has a slightly different representation for the constructs of the uage. This diversity can only lead to confusion as ALGOL 68 trained individuals they need readaptation to program at a new installation. The solution osed here is to develop a single hardware representation which can be used any computer systems. In fact this representation can conveniently be designed g only the intersection of the graphic characters available in the ISO code EBCDIC . The paper also proposes comfortable new representations for a few symbols discusses the thorny problem of distinguishing bold face words. y Words and Document Analysis. 17o. Descriptors ALGOL68, hardware representation, program interchange, symbols, characters, bold face letters, ASCII, ISO-code, EBCDIC lentifiets/Open-Ended Terms OSATI Field/Group 'liability Statement 19. Security Class (This Report) UNCLASSIFIED 20. Security Class (This Page UNCLASSIFIED 21. No. of Pages 22. Price <5 (10-70) USCOMM-DC 40329-r>71 'i. UNivEBsrrv of Illinois ubbana 3 0112 064441527 • '" " '>■ "-i -^f-^ ^ii ' * i i i ... ' ' - C-t' W Iri .■'....• •■.!•■■• '^-WM