UIUCDCS-R-73-607
A Revised ALGOL 68 Hardware Representation
for ISO-code and EBCDIC
November, 1973
by
Wilfred J. Hansen
DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
URBANA, ILLINOIS
This work was supported by the Department of Computer Science
Annotated Table of Contents
I. Design Considerations for ALGOL 68 Representations
A philosophical discussion of the problems that
complicate representation design
I.1 Psychological considerations
I.2 Decisions demanded by the Report
I.3 Hardware Considerations
Including tables of ISO-code and EBCDIC
II. Five ALGOL 68 Symbol Set Suggestions
II.1 plus-i-times-symbol: '+*'
II.2 of-symbol: '-<'
II.3 stick-symbol, again-symbol, or-symbol
II.4 Disentangling '↓', '↑', '~', and '⌊'
II.5 'flip': true, 'flop': false
III. The Design of the Hardware Representation
Using characters available in both ISO and EBCDIC
III.1 Letter tokens
III.2 Bold tags
III.3 Composite characters
III.4 Carriage Return, Line Feed, and Delete
III.5 Notes on Particular Representations
III.6 Other-string-items, other-pragment-items
III.7 Abs, repr, and conversion
III.8 Use of remaining characters
III.9 Representations with smaller character sets
III.10 Guide to reading Appendix A
IV. Stropping Recommendations
Stropping clutters, but if you must strop, use case
shift or a post-fix underbar
IV.1 Postfix underline is bold (least favored, but better
than sixteen other schemes)
IV.2 Upper case is bold
IV.3 Lower case or postfix underline is bold
IV.4 Reserved words are bold (most preferred technique)
References
Appendix
A. Proposed Hardware Representation
Abstract
Because of the latitude allowed by the Revised ALGOL 68 Report,
each implementation has a slightly different representation for the
constructs of the language. This diversity can only lead to confusion
as ALGOL 68 trained individuals find they need readaptation to programs
at a new installation. The solution proposed here is to develop a single
hardware representation which can be used on many computer systems. In
fact this representation can conveniently be designed using only the
intersection of the graphic characters available in the ISO code and
EBCDIC.
The paper also proposes comfortable new representations for a few
symbols and discusses the thorny problem of distinguishing bold face words.
Preface
I didn't want to write this paper.
After the Los Angeles meeting of WG2.1 approved the Revised Report on
the Algorithmic Language ALGOL 68 (1) I decided to spend two leisurely days
designing a transportation representation for the language. It wasn't
that easy (but I did it; it's another paper). Because the transportation
representation is an encoding of ALGOL 68 program texts, an adequate
explanation requires sample encoder and decoder programs. These must
assume some specific hardware representation. Lacking a compiler I
decided I could quickly design a suitable hardware presentation language
of my own.
Older now, but wiser, I offer the following.
Sections three and four explain the hardware representation and sections
one and two explain why I chose it. To a large extent, the sections can be
read independently and the recommendations of one adopted without adopting
any other. In particular, the transportation representation in no way
depends on this hardware representation.
Acknowledgements
This paper has been written with continuous reference to 'An ISO-Code
Representation for ALGOL 68' by C. H. Lindsey (2). I am indebted to
J. E. L. Peck for introducing me to ALGOL 68 and patiently explaining
obscurities I encountered.
I. Considerations in the Design of Hardware Representations
Now that the Revised Report has been approved by WG2.1, it is appropriate
to reconsider the question of hardware representations of the language.
(Hereafter, the Revised Report will be referred to as the Report; if there
were any references to the earlier Report, they would specify the Original
Report. Most remarks apply to both, anyway.) The Report has been written
with the thought that it will be implemented on a wide variety of hardware
with vastly differing character sets. As a consequence, it is not
particularly specific as to how any construct will be represented on (say)
cards.
Rather than envision the possibility of a tower of Babel of
representations, I suggest that there is in fact a widely available
set of graphics with which all constructs of the language can be
represented. Selection of a widely available character set will bring
these benefits:
o as trained ALGOL 68 programmers move from implementation to
implementation, they will be able to begin programming and reading programs
immediately and without confusion.
o as programs are sent from one installation to another, they will be
understood without lengthy explanation and constant referral by the reader
between the text and a codebook.
o many installations have a variety of devices with different character
sets. Only by choice of a widely available representation will programmers
be able to access the files containing their programs from all these devices,
including both terminals and line printers.
o ALGOL 68 will more readily be accepted by outsiders as a single
language and not a collection of similar languages.
It must be recognized that at the majority of installations, and
especially in North America, ALGOL 68 will not be the primary language in
use. For this reason it cannot be expected that the operating system will
have provisions especially suited to the language. In particular, for many
years ALGOL 68 by itself will not be a strong force determining the nature
of character sets provided as standard by manufacturers.
The many considerations that affect design of a representation can be
categorized into psychology, ALGOL 68, and hardware. These topics are
covered in the remainder of section I. Section II suggests a few
representations for symbols that may be controversial. It is important to
note that the representation finally arrived at can be adjusted to take
into account rejection of any of these suggestions. Section III details
the decisions made in the rest of the representation and section IV
discusses the sticky question of representations for bold-tags.
I.1 Psychological considerations.
Care must be exercised in the design of representations for a number of
general psychological reasons:
a) There are many possible sources of small confusions in representation
design: odd characters, context dependent usage, dissimilar usage in similar
contexts in similar languages, breaks in typing rhythms, and more. Each
instance of confusion may be only a minor annoyance, or it may interrupt a
train of thought and result in omission of critical phrases. Moreover,
confusions may have a cumulative effect that can lead to frustration and
breakdown in communication.
b) To some extent the physical representation and not the abstract
'strict language' is the medium of thought. This is true when writing a
program and even more so when reading a program. Consequently, variations
between physical representations ought to be carefully restricted.
c) When writing a program, a trained programmer writes by reflex and
concentrates instead on the task at hand. For small changes of representation
when changing installations, the retraining period is probably small, but
if it can be avoided, everyone benefits.
d) Representations should be chosen with an eye to the representations
used in other fields and in other languages. The task of attracting ALGOL 68
users is sufficiently difficult without repelling them with unusual symbol
choices. (The Report has excellent symbol choices; I worry about implementations.)
Beyond these general considerations, the designer must keep in mind a
number of human factors effects:
a) Some reasonably consistent aesthetic should be followed in the design
to assist in readability. The aesthetic of the ALGOL 68 reference language is a
pleasant combination of natural language and mathematical conventions. It seems
characterized by economy of expression and avoidance of clutter.
b) Simultaneous with aesthetics, the designer must strive for understand-
ability, unmistakability, and clarity.
c) A specific problem is that operators that bind closely ought to take
less space than those that bind more loosely. In this regard, a bold tag used
as an operator can suffer "stropping separation". This includes not only the
length of the tag, but also the character(s) needed to strop it and any
necessary blanks.
d) Another factor is the length of the text. Too short a text may
correspond to a program that has been abbreviated beyond reason; but too long
a text slows both the writer and the reader. A longer length for an
infrequently used operator is acceptable because it will not substantially
affect the length of the program.
e) No one is aided if an implementation provides too many alternative
ways of expressing a single construct. Programmers are constantly forced to
choose among alternatives; readers must be prepared to encounter that many
more symbols. When text includes a rare alternative form, the reader may
remember it erroneously; he will at least have to interrupt his reading to
try to recall the symbol.
Selection of specific symbols must depend on yet other factors:
a) The existence of groups of potential programmers trained in the
meaning of a symbol. For example, '+' ought to mean addition because most
potential users have studied algebra.
b) Use of the symbol in other widespread languages. Conflicting usage
could be a source of confusion.
c) Relation of a symbol to its meaning ("graphic onomatopoeia"). For
example, parentheses, braces, and brackets do appear to surround their
contents.
d) The possibility of a confusing similarity between two graphics.
Lindsey points out, for example, that '+' and '«•' appear alike on teletypes.
I.2 Decisions demanded by the Report.
In a number of areas, the Report leaves considerable latitude to
accommodate implementations with varying character sets. The representation
must specify what is allowed for other-string-item, STYLE-other-PRAGMENT-item,
style-TALLY-letter-ABC, and style-TALLY-monad {R9.4 a}. At least one means
must be provided to write any operator in the standard prelude {R9.4 b};
decisions must be made as to the values of abs, repr, 'null character',
'error char', 'flip', 'flop', 'blank', and 'max abs char' {R10.2.1}; and a
conversion must be associated with 'stand conv' {R10.5.1.2 d}.
In 9.4 t), the Report accepts an implementation even if it provides
only one of the alternatives for each symbol (say either '@' or at_). The
intention, in fact, seems to be to accept any representation language that
provides at least one way of expressing each construct in the language.
Thus an implementation need not necessarily have both stick-symbol and
then-symbol, since with either one a choice-clause can be constructed
{R9.1.1 c,h}. Similarly, a times-ten-to-the-power-symbol might be omitted
since some letter-e-symbol will usually be available {R8.1.2.1 h}.
Selection of representations for standard prelude operators is complicated
by the fact that not only do many operators have a number of symbols, but
many symbols are assigned more than one function. Some way must be found
through the mazes of relationships among uses and alternatives for '↑', '~',
and '⌊'. Likewise, there are some complications if other than the reference
representations are chosen for any of the multiple symbols that map into
certain representations (for example, four symbols map into ':'). Some
decision must be made as to whether the implementation will accept the
"allowable" alternatives like '..' for ':' {R9.4 h}.
Occasionally, a representation designer will be forced to consider use
of a diphthong for some operation. In this effort, he must check the
operator grammar in 9.4.2.1 of the Report to see that the diphthong is legal
and thus will not cause ambiguity. A secondary consideration is to try to
leave intact the possibility of families of operators. For example, diphthongs
ending in '/' should be avoided because they are all available for a family:
V V // -/ ^/
Indeed, this is the family of APL reduction operators.
Finally, 9.4.2.2 b specifies that a bold-tag is composed of marks
corresponding to its LETTERs and DIGITs, where the 'mark corresponding
to each LETTER ([or] DIGIT) is similar to the mark representing the corresponding
LETTER-symbol ([or] DIGIT-symbol)'. Interpretation of the word 'similar' has
led to a variety of "stropping techniques". The representation designer must
choose one of these techniques. My own suggestions will hinge on the
observation that nothing is more similar to an object than itself, but see
section IV for the gory details.
I.3 Hardware Considerations.
Two standard codes for computer text have been defined and widely
used: the ISO code (2) (and especially its ASCII subset (3,5)) and
EBCDIC (4,5). The former is used by most of the world, and the latter
is used by only one manufacturer. Tables defining these two codes are
reproduced in figures 1, 2, and 3. Note that ISO has many national variants
and the code provides spaces where national groups can place characters
specific to their own needs.
An ALGOL 68 program will certainly not be given the same binary encoding
in the two codes, since, for example, '+' is '01001110' in EBCDIC and '0101011'
in ISO. However, in an important sense one can make programs in the two
codes appear similar; the graphics chosen to represent each ALGOL 68 construct
can be the same in both codes .
Examining the code tables, we see that ISO has a number of characters
that will not be the same on every terminal, the so called "national
characters". These interfere with a common graphic representation of
ALGOL 68 so they should be avoided. Leaving them aside, the following
characters are available in both codes:
upper and lower case letters: a-z A-Z
digits: 0-9
space ! " # $ % & ' ( ) * + , - . /
: ; < = > ? @ _
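As a rough illustration of this intersection, the following Python sketch lists the ISO graphics that remain once the national-variant positions of figure 3 are set aside. It rests on assumptions not made in the text: Python's `cp037` codec stands in for EBCDIC, positions 35 and 64 ('#' and '@') are kept because the list above retains them, and the names `NATIONAL_POSITIONS`, `KEPT`, and `common_graphics` are purely illustrative.

```python
# Code positions at which ISO permits national variants (per figure 3).
NATIONAL_POSITIONS = {35, 64, 91, 92, 93, 94, 96, 123, 124, 125, 126}
# Assumption: '#' (35) and '@' (64) are kept, as in the list above.
KEPT = {35, 64}


def common_graphics():
    """Return the ISO graphics usable in both codes, in ISO order."""
    chars = []
    for code in range(32, 127):          # space through '~'
        if code in NATIONAL_POSITIONS - KEPT:
            continue                     # graphic varies by country
        ch = chr(code)
        ch.encode("cp037")               # raises if EBCDIC lacks the graphic
        chars.append(ch)
    return chars


if __name__ == "__main__":
    print("".join(common_graphics()))
    # The same graphic sits at different positions in the two codes.
    print(format(ord("+"), "07b"), format("+".encode("cp037")[0], "08b"))
```

The graphic is what the two codes share; the bit pattern is not, which is why the representation is defined in terms of graphics.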
The following control characters appear in both codes and ought to
be considered in design of a representation:
BS   backspace
HT   horizontal tab
CR   carriage return
LF   linefeed
DEL  delete
NUL  null
FF   form feed
VT   vertical tab
[Chart of the EBCDIC code positions]
Figure 1. Extended Binary-Coded-Decimal Interchange Code (EBCDIC)
(Chart adapted from (4).)
[Chart of the ISO 7-bit code positions]
Figure 2. ISO 7-bit Coded Character Set
(also known as ASCII)
Some codes are not officially assigned graphics, but the
preferred alternatives are shown in parentheses. The right
hand side of columns 32, 80, and 112 show the alternatives
chosen for ASCII. (ISO chart taken from (2).)
[Chart of the graphics assigned to code positions 35, 64, 91, 92, 93, 94, 96, 123, 124, 125, and 126 by ASCII, Australia, Denmark, Finland, Norway, Sweden, France, W. Germany, Japan, New Zealand, and the United Kingdom]
Figure 3. ISO Alternatives Adopted by Various Countries
(Taken from Lindsey (2).)
A few comments on individual characters:
'#' is not ISO standard, but is the ASCII choice for position 35 in
ISO. The alternative is '£', a currency symbol as is '$'.
'@' is not ISO standard, but is the preferred alternative for position
64 and is only infrequently assigned other graphics.
'|', '!', '¬', and '~'
Extended discussion of these follows.
The manufacturer supporting EBCDIC has helped introduce endless confusion
into the codes for these graphics. The following charts illustrate this by
giving the hexadecimal location in the code of various graphics.
code:         ISO   ISO alternatives   ASCII-8   IBM ASCII-8   EBCDIC-8   IBM EBCDIC
(references)  (2)   (2)                (3,5)     (4)           (5)        (4)
graphic
70 u d
21
5D A U S
kl
21 kF
7C FC 6a
5D BD 5A
kF
8e
6e
BE
6e
6e
Be
5F
Al
5F
Many terminal control software routines interpret the "best" meaning of
characters. Thus on some North American timesharing systems a '!' at an
ASCII terminal is converted to an EBCDIC ' | ' by the time it reaches the
executing program. In such an environment or even in more benign
environments there are then eight characters that can be produced by the
system in response to some code that at one time was supposed to be ' | ' :
! ' ] ^ A U il d. Similarly, '-%' can be ~ n - t and p. This
peregrination can render a character a poor choice for an important task
in a representation language.
Another "feature" of terminals and terminal support routines is the
translation of lower case characters to upper case because time-sharing
supervisors expect upper case commands. This will influence the choice of
letter symbols and stropping techniques below.
The final set of hardware problems is concerned with format effectors.
In particular, should backspace and carriage return be permitted to instruct
the compiler to reconstruct the input text image exactly as it would appear
on a page produced by an ISO-code terminal? If so, bold stropping could be
accomplished by backspacing and underlining, or even by returning the
carriage and underlining. For a number of reasons, such composite characters
are bothersome and must not be allowed:
o mechanically, backspace is a slow operation. For example an IBM
Selectric® backspaces 42% slower than it forward spaces. This delay is
enough to break typing rhythm and cause marginal discomfort and confusion.
o many terminals do not interpret ASCII backspace correctly and replace
the character after repositioning the typing element (or cursor). R. W. Bemer
has complained about this in a letter to Datamation (8), but I do not expect
his plea to stem the tide.
o a compiler that interprets the printed image is not interpreting the
characters in their sequential presentation. Error indications may not be
correctly associated with the text, and even if so there may be no good
clue as to how the text is stored in a file and should be modified. This
problem would not be as bad if all editing were done interactively, but that
is not always economical.
o some systems use only carriage return as a signal for end of line,
so if the compiler were interpreting the image, all the characters would be
on one line. (Certainly the compiler for such an installation would be
more clever, but it is a curious thought.)
II. Five ALGOL 68 Symbol Set Suggestions
II.1 plus-i-times-symbol: '+*'
Few devices provide '⊥', the reference language character for the
plus-i-times-symbol. It can be constructed from overlaid characters, but
these are a questionable recourse at best. The Report also suggests i_,
but this suffers stropping separation or, if reserved words are used,
conflicts with the identifier 'i'. The Lindsey ISO-code representation
proposes '!' and ' |_', but these encounter the ISO vertical bar uncertainty.
In Algol Bulletin '^k W. Freeman (6) proposed that modulo-operation be
represented by '÷×'. This has been accepted in the Revised Report, and a
parallel construction suggests '+×' for the plus-i-times-symbol. More
usually, this will be written as '+*'; but note that this does not
interfere with the x-asterisk family of diphthongs, because the family
members '**' and '÷*' already have standard meanings. Note: '+*' would
also be used in transput to represent plus-i -times.
Examples : 3+*5 u +* v
(R 10.2.3.7 j) (re a + re b) +* (im a + im b)
(R 11.1) (x > I rp+*ip I abs_ ip+*(y > | rp | -rp))
II.2 of-symbol: '-<'
At Los Angeles, '->' was removed as a representation of of-symbol
because it points the wrong way - from son to parent, and because in practice
it points the opposite way from the arrow in a similar construct in PL/I.
Only of is left and it suffers extreme stropping separation for an operation
that binds even more tightly than a monadic operator. One proposal for
of-symbol is '.', but again this has the opposite meanings to the
corresponding construct in PL/I. I derived an alternative to _of by
starting from the is-an-element -of-symbol, 'e', and then considering ' -< '
and ^' . The latter has an excellent diphthong: '-<', so I propose that
the Report list of and '%< ' while the "approved alternative representation"
be '-<'. Note: It should cause little difficulty that '-C' is also the
symbol for a photocathode.
Examples: father -
p- l- c-pofb ? true
?: lofalofb ? true
? cofa>cofb)
(RIO. 3. 5-2 a string)
(y(j) in
( ref char c) :
( upb s=l?c:=s? incomp := true ) ,
(ref () char cc) : I
( upb s = upb cc - Iwb cc ? cc (:) := s (:) ? incomp := true ) ,
( ref string ss): ss := s
out incomp := true ) I
II.4 Disentangling '↓', '↑', '~', and '⌊'
Because few devices provide these graphics, the Report allows a profusion
of alternatives. The down-symbol, up-symbol, tilde-symbol, and floor-symbol
are variously specified to be replaceable by the bold symbols down, shr, up, shl,
skip, not, lwb, and entier; also tilde may be synonymous with '¬' and up-arrow
with '**'. Unfortunately, the synonymity is context dependent. For example
entier may replace the floor-symbol in
[ (flex I al I xl) (l)
but the same replacement cannot be made in
[ (flex I al I xl).
The intertwining of symbols for exponentiation, shift left, and raise
semaphore is even more complex. We can diagram the relations thus
[Diagram relating the symbols for exponentiation, shift left, and raise semaphore]
An instance of up cannot readily be associated with the correct operation by
either a reader or a compiler. Moreover, the hapless programmer is given
little guidance as to how he might be able to write his program to avoid
possible reader confusion.
These synonymities were introduced to permit an ALGOL 68 program to be
written on any device, but only one representation is required at any one time.
It is not difficult to assign symbols to operations so no symbol represents
more than one operation. In addition to reducing reader ambiguity, this
assignment simplifies construction of encoders for transportation representations.
If the problems are resolved as follows, programs can still be represented on
any device, but the synonymity conflicts are removed. Where possible,
non-stropped representations are also proposed.
boolean negation   not      as in the Report
skip               skip     no '~'; the standard prelude does not use the
                            tilde and has no problem
lower bound        lwb      no '⌊'
entier             entier   no '⌊'
    neither operation has a stronger claim to '⌊' than the
    other, and it would be silly to allow entier to replace
    lwb; this restriction will not make programs appreciably
    longer
lower semaphore    down     as in the Report
raise semaphore    up       as in the Report
shift left         shl
shift right        shr
    it is not intuitively clear why 'up' should mean 'left'
    and not 'right'; I considered suggesting '<*' and '*>',
    but their meanings could be forgotten since shifts are
    infrequent
exponentiation     '↑', '**', pow
    '**' will seldom be unavailable; '↑' is kept because it
    looks right; pow will do if the others are unavailable.
Examples:
(in an assembler) op shl 8 or adrs
(tautology test) ( not (2 pow k /= 2r1 shl k) | skip | undefined)
(binary search)
proc bisearch = (() int a, int arg, default) int :
( int val := default; bits i := 2r1 shl (bits width - 1);
while abs i > upb a - lwb a + 1 do i := i shr 1 od;
int k := abs i + lwb a - 1;
while abs i /= 0 do i := i shr 1;
(a(k) < arg | k := min (k + abs i, upb a)
|: a(k) = arg | val := k; i := bin 0 | k -:= abs i)
od; val)
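For readers without an ALGOL 68 compiler, the binary search above can be approximated in Python. This is a hedged transliteration and not the author's code: the array is assumed to be sorted in ascending order with bounds 0 to len(a) - 1, the starting step is taken from `int.bit_length` rather than from 'bits width', and a successful probe ends the loop by zeroing the step, as the original does with its bits value.

```python
def bisearch(a, arg, default=-1):
    """Search sorted list a for arg with a halving power-of-two step."""
    n = len(a)
    if n == 0:
        return default
    val = default
    step = 1 << (n.bit_length() - 1)   # largest power of two <= n
    k = step - 1                       # first probe position
    while step != 0:
        step >>= 1                     # the 'i shr 1' of the original
        if a[k] < arg:
            k = min(k + step, n - 1)   # move right, clamped to the top
        elif a[k] == arg:
            val = k
            step = 0                   # found: stop the loop
        else:
            k -= step                  # move left
    return val
```

The shift is the only operation applied to the step, which is why the example needs an unambiguous representation for shr.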
II.5 'flip': true, 'flop': false
For bool and bits, ALGOL 68 violates the principle that valid denotations
should be valid transput values. As there seems to be no good reason for this,
I propose the elimination of 'flip' and 'flop'. The value 'put' for true will
be 'true'; that for false, 'false'. 'Get' for a bool will read a tag and set
the bool to true if the tag starts with 't', false for 'f', and call 'char
error mended' if the tag starts with neither. The transput value for a bits
will be a BITS-denotation.
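The proposed transput rule for bool values can be sketched in Python as follows. The sketch makes assumptions the text does not: the event routine 'char error mended' is modelled as an exception named `CharErrorMended`, the function names `put_bool` and `get_bool` are invented for illustration, and the first letter is compared without regard to case.

```python
class CharErrorMended(Exception):
    """Stands in for a call of the 'char error mended' routine."""


def put_bool(value):
    """Write a bool as the same tag used for its denotation."""
    return "true" if value else "false"


def get_bool(tag):
    """Read a bool from a tag, looking only at its first letter."""
    first = tag[:1].lower()
    if first == "t":
        return True
    if first == "f":
        return False
    raise CharErrorMended(f"tag {tag!r} starts with neither 't' nor 'f'")
```

With this rule anything written by `put_bool` is read back unchanged by `get_bool`, which is the property that 'flip' and 'flop' lack.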
III. The Design of the Hardware Representation
This section covers the points raised in section 4 of Lindsey's paper.
The Representation is detailed in Appendix A, and section III.10 below
contains some notes to explain that Appendix.
III.1 Letter tokens
The central question here is whether to allow both upper and lower case
letters in tags and the letters assigned denotation functions (a, b, c, d, e, f,
g, i, k, l, n, p, q, r, s, x, y, z for radix, times ten to the power, and format
markers). In fact few programmers will want to have tags differing only
in the case of one or more letters. (Is there really enough difference
between 'ox' and 'OX'? Is 'scan' not 'Scan'?) Moreover, many line printers
are normally operated with only upper case. Therefore, the basic answer to
the central question is that internally the compiler will have only one
case of letter symbols (which one is immaterial). For string denotations
and transput of strings, though, case will be distinguished. We now consider
three categories of terminals and three contexts for tags and denotation
function letters:
UC/LC terminal,
UC means bold
tag and
denotation
function letters
denotation function letters
(a, b, c, d, e, f, r, t)
input
UC/LC*
output
LC
UC/LC terminal,
case not used
for stropping
UC/LC*
UC/LC*
LC
UC terminal
UC*
UC*
LC**
*Converted to one case internally.
**System software or terminal converts to UC.
Appendix A lists lower case letters for ISO and upper case letters for EBCDIC.
This is only because these characters sets are normally available on the
corresponding equipment.
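The rule that letters are held in one case internally while string denotations keep their case can be sketched as a small Python filter. It assumes a terminal on which case is not used for stropping, folds to lower case (the text notes the choice is immaterial), treats every '"' as opening or closing a string denotation, and ignores comments; the name `fold_case` is illustrative only.

```python
def fold_case(source):
    """Fold letters to one case except inside string denotations."""
    out = []
    in_string = False
    for ch in source:
        if ch == '"':
            # A doubled quote inside a string toggles twice,
            # so the state is still correct afterwards.
            in_string = not in_string
            out.append(ch)
        elif in_string:
            out.append(ch)           # case is significant in strings
        else:
            out.append(ch.lower())   # one internal case for tags
    return "".join(out)
```

For example, `fold_case('Scan := "Ox and OX"')` keeps the string untouched but treats 'Scan' and 'scan' as the same tag.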
Of the national variant characters in the last four ISO columns, only
'@' and '~' are assigned a function in this hardware representation. The
other eight are available for use as additional letter-ABC-symbols. When
they are not so used they can serve, as suggested below, as style-TALLY-monads
and other-string-items.
III.2 Bold tags.
This hardware representation is designed to be applicable to any
stropping convention. To facilitate this, only one case is assumed and no
function is assigned to ''' and '_'. Stropping is discussed in section IV.
III.3 Composite characters.
Outlawed as discussed in section I.3.
III.4 Carriage Return, Line Feed, and Delete.
As Lindsey's paper suggests, CR, LF, CR/LF, LF/CR (my addition), form
feed, and vertical tab all should terminate a line of input to the compiler.
Delete, backspace, and all other control characters should be ignored.
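A compiler front end following this rule might split its input as in the Python sketch below. It reads the rule literally, so horizontal tab is dropped along with the other control characters, which is an assumption an implementer may well want to revisit; the names `LINE_ENDERS` and `split_lines` are illustrative and do not come from Lindsey's paper or this one.

```python
# CR, LF, form feed, and vertical tab each end a line of input.
LINE_ENDERS = {"\r", "\n", "\f", "\v"}


def split_lines(text):
    """Split compiler input into lines, ignoring other control codes."""
    lines, current = [], []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in LINE_ENDERS:
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if {ch, nxt} == {"\r", "\n"}:
                i += 1               # CR/LF or LF/CR is one terminator
            lines.append("".join(current))
            current = []
        elif ord(ch) < 32 or ord(ch) == 127:
            pass                     # delete, backspace, etc. are ignored
        else:
            current.append(ch)
        i += 1
    if current:
        lines.append("".join(current))
    return lines
```

Because backspace is simply discarded, no composite characters can arise from the input.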
III.5 Notes on Particular Representations
The controversial representations in this design have been discussed
in section II. A few further choices deserve comment here.
'e' for times-ten-to-the-power-symbol
The Report implies that both 'e' and some other graphic ('₁₀' or '\')
will be available for this symbol. It does not seem valuable to have two
or three graphics mean the same thing, especially where one is widely
available and the others are not.
(space) for space-symbol
The Report proposes that both the space and visible space ('^') be
available. The latter cannot be allowed because it would have to be
composite on available equipment. Using underlines for visible spaces does
not help because they run together and are no easier to count than spaces.
'&' for and-symbol
This is reasonable and follows Lindsey's proposal.
lwb and entier for floor-symbol
These two bold tags are not interchangeable alternatives.
'¬' and '~' for not-symbol
Section I has described the ridiculous confusion as to the location of
these symbols, including the possibility that in some circumstances
not-symbol would be typed as 't ' . One solution would be, as with stick-symbol,
to suggest an alternate representation or to abolish the graphics in favor of
not . The situation is not as serious with not-symbol, however, because
not-symbol is only an operator and its interpretation is not crucial to
interpretation of the structure of the program text. Hence, 't' and '~'
are both allowed, depending on which character code is used. Note that, to
avoid even further confusion, '~' is not a representation of the tilde-symbol
and cannot be used in TAO's.
' ? ' for ' error char '
The report makes no proposal for this value. 'Error char' is used in
transput to be the string value for a value that cannot be translated as
specified. Is it not reasonable that this situation should be signalled
with the ?
' ' for 'blank'
This assignment corresponds to the decision to use (space) for
space-symbol.
'{' ... '}' for ISO brief -comment -symbol
The desirability of this representation is shown by PASCAL programs.
Some brief-comment-symbol is necessary because 'vf may not be available on
all ISO terminals. There is no reason the compiler cannot insist that the
left-brace start a comment and the right-brace end it.
III.6 Other-string-items, other-pragment-items.
The characters available for these will certainly vary depending on the
code in use; I am not trying to define an implementation independent language.
Essentially every character that can be typed should be allowed in these
positions, except for the format effector characters, which only control the
listing of the program. (Though a format effector could be a string value
by means of transput or repr.) Except for the quote-image-symbol every
character in a string will represent itself; there will be no composite
characters or representations of one character by several.
III.7 Abs, repr, and conversion.
Abs and repr should certainly be designed so the position of a character
in the code is its abs value. In ISO, abs "A" = 65 and repr 65 = "A"; in EBCDIC
abs "A" = 193 and repr 193 = "A". Moreover, in both codes the null character
will be repr 0. As far as possible, programs should be written so as to be
independent of the actual values associated with abs and repr . As the
encoder in my transportation representation paper shows, this can be achieved
with minimal effort except for control codes. Appendix A proposes predefined
identifiers for the six format effector control codes.
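The proposed meaning of abs and repr, a character's position in the code, can be checked with a short Python sketch. It assumes Python's `ascii` and `cp037` codecs as stand-ins for ISO and EBCDIC; the names `CODECS`, `abs_char`, and `repr_char` are invented for the illustration.

```python
# Assumption: these Python codecs approximate the two codes.
CODECS = {"ISO": "ascii", "EBCDIC": "cp037"}


def abs_char(ch, code):
    """Position of a character in the given code (ALGOL 68 'abs')."""
    return ch.encode(CODECS[code])[0]


def repr_char(value, code):
    """Character at a position in the given code (ALGOL 68 'repr')."""
    return bytes([value]).decode(CODECS[code])


if __name__ == "__main__":
    for code in CODECS:
        # The letters are contiguous in ISO but not in EBCDIC.
        print(code, abs_char("A", code),
              abs_char("Z", code) - abs_char("A", code))
```

The distance from "A" to "Z" differs between the two codes, so a program that does arithmetic on abs values is tied to one code, which is the dependence the text advises against.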
Lindsey's paper proposes two conversion regimes. 'Stand conv' transmits
printable characters unchanged but ignores most control codes. The format
codes would create the requisite spaces to position the text as it would
appear on paper. Backspace would cause a call of 'char error' of the file.
His other conversion was 'complete conv' which transmitted all characters
IB
unchanged. Ignoring these, Lindsey's paper contains a program that uses
what is called 'special conv'. In fact this latter is an important conversion
that should be available for system programmers and anyone else who wants to
know exactly how the user encoded spaces (and wants to avoid the processing
time required to skip over them). I would call this conversion 'layout
encoded conv': printable characters and format effectors are transmitted
intact, but other control codes are ignored. When 'get'ing characters, each
code is delivered in turn. When 'get'ing strings, a line is terminated by
any format effector except horizontal tab; the terminating character is
delivered as the last character of the line. If the line is terminated with
CR/LF or LF/CR, both will appear at the end of the line. (This will depend
on whether the operating system delivers both codes to the ALGOL 68 monitor. )
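The behaviour described for 'layout encoded conv' can be sketched in Python. This is an illustration under assumptions, not the Appendix A definition: the input is taken to be ISO text held in a Python string, and the function names layout_encoded and get_lines are hypothetical.

```python
# The six format effectors: horizontal tab, backspace, carriage return,
# line feed, vertical tab, form feed.
FORMAT_EFFECTORS = {"\t", "\b", "\r", "\n", "\v", "\f"}
LINE_TERMINATORS = FORMAT_EFFECTORS - {"\t"}  # every effector except horizontal tab

def layout_encoded(text):
    """Keep printable characters and format effectors; drop other control codes."""
    return "".join(ch for ch in text if ch >= " " or ch in FORMAT_EFFECTORS)

def get_lines(text):
    """Split text into lines, delivering each terminator at the end of its line."""
    data = layout_encoded(text)
    lines, current, i = [], "", 0
    while i < len(data):
        ch = data[i]
        current += ch
        i += 1
        if ch in LINE_TERMINATORS:
            # A CR/LF or LF/CR pair is kept together at the end of the line.
            if i < len(data) and {ch, data[i]} == {"\r", "\n"}:
                current += data[i]
                i += 1
            lines.append(current)
            current = ""
    if current:
        lines.append(current)
    return lines
```

Because the terminators are delivered rather than discarded, a caller can tell exactly how the user encoded the layout.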
III. 8. Use of remaining characters.
No virtue is attained by associating a standard meaning with every
available character. Indeed, a few ought to remain available as
style-TALLY-monads. The characters not yet assigned are
(ISO) [ ] ! I - \
(EBCDIC) I ! jzf
Of these the brackets should not be monads because they would normally be
used in pairs as delimiters, but the others can be monads.
It is unfortunate that the Report makes no provision for style-TALLY-nomads,
because the given set {=, <, >, *, ×, /} is limited and many of its combinations are
in use. Indeed, of the characters above, it would not be unreasonable to
specify that ' ! ',
and '•^' could not be monadic, but could be nomads.
With these new nomads there could be dyadic diphthongs like '*! ',
and even
Appendix A, however, proposes no new nomads.
III. 9. Representations with smaller character sets.
When only a subset of the characters assumed above is available, bold
tags should be used to make up for missing characters.
    if this graphic is missing:            I0   @   <   >   ?
    these bold tags can replace its uses:  over, mod, not, at, co, lt le of, gt ge, and, then else, elif
If worst comes to worst, colon and semicolon will be missing and must be
replaced with ugly diphthongs as suggested in R9.^b. Compilers should
implement these only if numerous devices at the installation lack the
specified graphic.
the graphic is replaced by
+= : plus to
:= : is
/= : isnt
(The second and third of these are no longer mentioned in the Report, but
still do not lead to ambiguity. It can be remembered that semicolon is '.,'
and not ',.' because real-denotations can start with '.' but cannot end with
it, so that no ambiguity exists in '3.,5' for '3;5'.)
In cases where standard graphics are absent but other symbols are available,
the temptation to use the latter should be resisted with passion.
The minimum character set for ALGOL 68 is thus:
letters, digits, =, +, -, *, ,, ., (, ), $, /, ", and possibly
some stropping character.
III. 10. Guide to reading Appendix A.
Some symbols in Appendix A are marked boldface by underlining them.
They are to be keywords, reserved words, or stropped words, according to
whatever convention is adopted.
The first listed "representation" for a symbol should be used if possible.
If no representation is possible (and the symbol is not intended as a MONAD),
the first possible graphic listed in the "alternatives" column should be
chosen. Up-symbol and down-symbol have neither representation nor alternative
and cannot be written in a program {see section II.4}.
Please note that "typographic display features" (spaces, new lines, and
new pages) cannot divide the characters of a diphthong or bold tag.
Numerals in braces refer to sections of this paper where decisions are
discussed. Where no comment is made, the representation follows the suggestions
of the Revised Report.
IV. Stropping Recommendations
ALGOL 60 invented the concept of indivisible symbols represented by
boldface identifiers. ALGOL 68 extended this concept to allow the user to
invent his own bold tags. In the Revised Report, it is made clear that
these symbols are no longer indivisible, but are composed of ordinary
letters possibly set off in some way to indicate their type face. The
Report does not suggest how they are to be set off, but gives five examples
{R9.4.2.2b} including one using script letters, a feature found on few
existing computer transput devices.
Implementations have adopted diverse stropping conventions; some have
even attempted to implement several conventions with provision for pragmatic
selection of one at compile time. Such attempts at universality seem doomed
to failure, however. Without much effort I was able to write a list of
twenty different stropping conventions, most incompatible with two-thirds
of the rest. Instead, the compiler writer should implement one convention.
Moreover, each program will at one time or another be transput on every
device in the installation, so the stropping convention chosen must be
applicable to every device.
Traditionally, ALGOL 60 programs have been stropped with apostrophes
at the beginning and end of each boldface symbol. To me this convention
seems unduly cluttering. I have always been struck by the clean aesthetics
of ALGOL: short symbols, indentation emphasized as a tool, no semicolon
before else, ALGOL 68's brief comment forms, the top-down structure of
program texts. In this atmosphere, apostrophes seem as appropriate as neon
lights in a Japanese garden. Examine
'begin' 'real' x; 'char' c;
get (f, (c,x));
'if' c = "i" 'then' x+1 'else' x 'fi' 'end'
Note that in the natural (Western) way of looking top-to-bottom and
left-to-right, the first item the eye encounters is not a piece of information,
but is the interspersive apostrophe. Indeed, the most important distinction
between pairs of words is their first letters, but the first 'letter' of
all apostrophied words is the same. Note also that a hurried programmer who
used other languages might waste time over the confusion between 'if' and
"i"; not much time, but perhaps enough to lose his train of thought.
Moreover, there is no one apostrophe stropping method. Lindsey lists three,
and my list included five, only one of which has been made illegal by the
Revised Report.
Another stropping technique that has been proposed is underlining; it
does not clutter and has a rather pleasant appearance. Its difficulties are
not aesthetic, but operational: it takes longer to enter and revise a
program with underline stropping, and many text editors and line printer routines
do not support underlining. Mechanical backspacing is a slow operation.
Finally, underlining constructs composite characters and suffers the
disadvantages of listings that are not in one-one correspondence with the
file.
Apostrophes clutter. Underlining has operational problems.
I reject them both for stropping. (Blithely ignoring the derivation of
the word 'stropping'.) What do I propose instead? Reserved words. However,
many implementers will feel they must provide some stropping convention so
I discuss below four conventions in order of increasing preference.
IV. 1. Postfix underline is bold (least favored, but better than
sixteen other schemes).
The least intrusive stropping scheme is to append an underline to
the end of a bold word. The reader can easily ignore the character, but it
is there to resolve ambiguity if he needs it. A trained ALGOL reader scans
the indentation before examining the text. With prefix stropping, most
lines begin with the low-information-content strop character; with postfix
stropping, the strop character is out of the way.
Our earlier random program would look like this:
begin_real_x; char_c;
get (f, (x, c));
if_ c="i" then_ x+1 else_ x fi_ end_
Note that '_' alone is enough and no space is needed. One of the advantages
of this notation to the programmer-cum-keypuncher is that the underline is
typed in the same sequence that it would be drawn: last after the rest of
the word. Thus written work could still underline bold words and the
transliteration while keypunching is not onerous.
In a small way, this convention is compatible with complete underlining
of bold words. If backspaces are ignored and multiple underlines converted
to a single underline, the compiler can accept "beginPPPPP_____" (where
'P' means backspace) as equivalent to "begin_". But the user could not
type "bP_eP_gP_iP_nP_" and achieve the same effect. (Alternatively, the
latter could mean begin and a non-letter be required to end a bold tag.
This introduces undue stropping separation in, for example, "re of_ z".)
Postfix underline stropping is poor for terminals with a non-escaping
underline key. They would have to type "begin_ " which would be printed
as "begin_". Presumably implementors for such terminals would choose some
convention later in this section.
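A scanner for postfix underline stropping is small. The following Python sketch is one possible reading of the rules in this section, not a definitive implementation: it assumes lower case ISO input, uses "\b" for the backspace written 'P' above, and the names normalize_underlines and tokenize are hypothetical.

```python
import re

def normalize_underlines(line):
    """Ignore backspaces and reduce each run of underlines to a single one."""
    return re.sub(r"_+", "_", line.replace("\b", ""))

TOKEN = re.compile(r'''
    (?P<bold>[a-z]+)_          # letters ended by the strop character
  | (?P<tag>[a-z][a-z0-9]*)    # ordinary (roman) tag
  | (?P<string>"[^"]*")        # string denotation, kept whole
  | (?P<other>\S)              # any other single graphic
''', re.VERBOSE)

def tokenize(line):
    """Return (kind, text) pairs; the strop character is dropped from bold words."""
    return [(m.lastgroup, m.group(m.lastgroup))
            for m in TOKEN.finditer(normalize_underlines(line))]

print(tokenize("begin_real_x; char_c;"))
print(tokenize("begin\b\b\b\b\b_____"))
print(tokenize("b\b_e\b_g\b_i\b_n\b_"))
```

The second call yields the single bold word begin, while the third yields five one-letter bold words, which is the incompatibility with letter-by-letter underlining noted above.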
IV. 2. Upper case is bold.
One convention gaining wide use in Europe is case stropping: bold tags
in upper case and tags in lower case.
BEGIN REAL x; CHAR c;
get (f, (c, x));
IF c="i" THEN x+1 ELSE x FI END
If the tags are chosen to be real words, they look more like normal text if
they are lower case. (Theoretically, no space would be required between
"REAL" and "x", but it might be considered gauche to omit it. Consider
"reOFz".)
Case stropping is excellent, except that existing devices, especially
line printers, often support only upper case. One could postulate combining
this technique with underlining or apostrophes to be used on an upper case
only device. Unfortunately, this fails because in systems with only one
case, it is usually upper case; plain tags would be upper case and thus
mistaken for bold.
IV. 3. Lower case or postfix underline is bold.
A reversal of the previous convention allows combination of both the
first two conventions. Bold words would be lower case (as they are in the
Report) or would be followed by an underline:
BEGIN_real X; char C;
GET (F, (X,C));
if C="i" THEN_ X+1 else X fi end
(The mixed case if-THEN_-else-fi looks unpleasant and is. I postulate that
in this particular case it was forced on the programmer because he was fixing
the "THEN_" at an upper case terminal.) Presumably, the user would use case
distinction at a mixed case terminal and fall back on postfix underline at
an upper-case-only terminal. An upper case line printer would print all as
upper case, and the user would have to rely on context to distinguish bold
from roman. The latter is never an arduous chore in a well written program.
The only disadvantage with this third stropping convention is that the
second, contradictory, convention has achieved considerable usage.
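The rule of this third convention (a word is bold if it is lower case or is followed by an underline) can be stated in a few lines of Python. This is a sketch under simplifying assumptions: words consist of letters only, string denotations are skipped, and the function name classify is hypothetical.

```python
import re

WORD = re.compile(r"([A-Za-z]+)(_?)")

def classify(line):
    """Label each word of a line as 'bold' or 'roman' under the third convention."""
    text = re.sub(r'"[^"]*"', " ", line)  # string denotations are not words
    result = []
    for word, strop in WORD.findall(text):
        bold = word.islower() or strop == "_"
        # Internally all letters are converted to one case.
        result.append((word.lower(), "bold" if bold else "roman"))
    return result

mixed = classify('if C="i" THEN_ X+1 else X fi end')
upper_only = classify('IF_ C="I" THEN_ X+1 ELSE_ X FI_ END_')
print(mixed == upper_only)
```

The comparison holds because the mixed case line and its upper-case-only rendering classify identically once every bold word carries its underline.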
IV. 4. Reserved words are bold (most preferred technique).
Few languages other than ALGOL have stropping conventions; most rely
on reserved words or a language design that makes all distinctions
determinable from immediate context. Many successful ALGOL compilers have
avoided stropping. Several ALGOL 68 implementations (at least Vancouver,
Dartmouth, Illinois Institute of Technology, and probably many more) have
worked out techniques to minimize or eliminate stropping. Here is a random
program without stropping:
begin real x; char c;
get (f, (x, c));
if c = "i" then x+1 else x fi end
Is there any reason to believe that the human reader should have difficulty
distinguishing bold from roman in this program?
One interesting point is the fact that the Revised Report lists 20^
more predefined roman words than bold tags. To be sure, the roman words
can all be redefined with far less penalty than a redefinition of if; but
still they constitute a large class of identifiers the user must remember.
In the past, automatically generated parsers have behaved poorly when
confronted with a reserved word used as an identifier. There is no inherent
need for this to be the case, and work such as that reported by Graham and
Rhodes (7) is showing how to avoid these problems.
What is to be done about identifiers that contain a reserved word as
an integral part, for example "year to date"? It is possible to forbid
this, but that leaves too many opportunities for the programmer to forget
that something is a reserved word. After all, the entire identifier is
far from any reserved word. Instead, I suggest that underlines replace
blanks in identifiers. Where an identifier is continued from one line to
the next, it may have embedded blanks, but it must end with an underline on
the first line or begin with an underline on the second line. Either
year_to_
date
or
year_to
_date
will be acceptable renderings of "year_to_date".
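Both halves of this proposal can be sketched in Python: underlines keep an embedded reserved word from being seen on its own, and a split identifier is rejoined only when one of the two underline conditions holds. The reserved set shown is an illustrative subset and the function names are hypothetical.

```python
import re

# Illustrative subset of reserved words; not the full list.
RESERVED = {"begin", "end", "if", "then", "else", "fi", "to", "do", "od"}

def reserved_words_in(text):
    """Reserved words that appear as whole identifiers in the text."""
    return [w for w in re.findall(r"[a-z_][a-z0-9_]*", text) if w in RESERVED]

def join_continued(first, second):
    """Join an identifier split across two lines, or return None if the rule is not met."""
    head, tail = first.rstrip(), second.lstrip()
    if head.endswith("_") or tail.startswith("_"):
        return head + tail
    return None

print(reserved_words_in("year to date"))
print(reserved_words_in("year_to_date"))
print(join_continued("year_to_", "    date"))
```

With blanks the scanner finds the reserved word to; with underlines it finds none.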
In ALGOL 68 a user can declare arbitrary tags to be bold face. There
are three reasonable techniques for dealing with these. (1) Such tags would
be unstropped and treated as bold only in the block where they were so
declared. This technique implies that the token scanner will analyze the
block structure, a good idea because it can aid automatic correction of
bracket errors. (2) They could be unstropped, but treated as bold throughout
the compilation. At least one implementation (IIT) has chosen this route.
(3) They could be stropped in some way. Underlines serve as spaces in roman
tags, so they cannot be used; apostrophes at both ends are unnecessary;
postfix apostrophes are almost as unobtrusive as postfix underlines; so I
propose that a postfix apostrophe be used.
Much as I dislike stropping of standard bold tags, I recommend the third
method of distinguishing user created bold tags. They will certainly be
less used than syntactic words like if and real, and usually they will not
appear more than once in a phrase. More critically, they will be unfamiliar
to a reader of the program, so he deserves to have them set off in some
manner. Here is a part of a program:
mode token' = struct (int type, string val, rtok'next),
rtok' = ref token';
rtok'toklist := nil, eop := heap token';
ref rtok'tokput := toklist;
sema tokens_ready = level 0,
may_move_tokens = level 1;
Note that in its use as a mode or a monadic operator, a user defined bold
tag is separated from its object by the apostrophe. It can be viewed as a
notation that the thing to its left operates on the thing to its right.
Moreover, a postfix apostrophe is somewhat like normal usage where an
apostrophe may appear near the end of a word.
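The combination of reserved words with postfix apostrophes for user-declared bold tags can be sketched as a scanner in Python. This is an illustration only: the reserved set is a small assumed subset chosen to cover the program fragment above, and the name scan is hypothetical.

```python
import re

# Illustrative subset of standard bold words, treated as reserved (section IV. 4).
RESERVED = {"mode", "struct", "int", "string", "ref", "nil", "heap", "sema", "level"}

TOKEN = re.compile(r"""
    (?P<user_bold>[a-z]+)'        # user-declared bold tag: letters plus postfix apostrophe
  | (?P<word>[a-z][a-z0-9_]*)     # reserved word or roman tag (underlines allowed inside)
  | (?P<other>:=|\S)              # becomes-symbol or any other single graphic
""", re.VERBOSE)

def scan(line):
    """Return (kind, text) pairs with kind in bold, user_bold, tag, other."""
    tokens = []
    for m in TOKEN.finditer(line):
        kind, text = m.lastgroup, m.group(m.lastgroup)
        if kind == "word":
            kind = "bold" if text in RESERVED else "tag"
        tokens.append((kind, text))
    return tokens

print(scan("ref rtok'tokput := toklist;"))
```

The apostrophe doubles as the separator between rtok and tokput, so no blank is needed between the mode and the tag it declares.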
Some emergency method of stropping may be necessary anyway to handle
cases where a tag is declared both bold and roman. In such cases, it would
be assumed roman unless stropped or unless the syntactic position demanded
that it be bold. This mechanism is one way to solve the problem of 're'
and re_ and 'im' and im.
(Writing this, I have come to worry about "re" and friends. Why do
both operator and field selector exist? Synonymously, is compl a primitive
mode or is it a struct? If it is a mode then we need only operators to
process it and an operator to construct objects of that mode (which we
have: '!'). Viewed thus, it makes no sense to assign a value to part of
the primitive object "z". On the other hand, given the stropping separation
of re and im and given "-<" for of, it is just as reasonable to eliminate
re and im from the language.)
Final thought: I believe an efficient compiler can be written so that
no more than 29 words are reserved:
BEGIN, BY, CASE, CO, COMMENT, DO, ELIF, ELSE, END, ESAC, FI,
FLEX, FOR, FROM, IF, IN, MODE, OD, OP, OUSE, OUT, PR, PRAGMAT,
PRIO, PROC, REF, STRUCT, THEN, TO, UNION, WHILE.
With these words, the structure of the text can be determined. The other
bold words in the Report could be redeclared, but could not be used in their
bold sense within that block.
I offer this limited-reserved-words approach as a challenge to parser
implementers.
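As a small illustration of the claim that these words suffice to determine the structure of the text, the following Python sketch checks bracket nesting using only words from the list above. It is a sketch under assumptions, not a parser: comments and pragmats are not skipped, only four bracket pairs are checked, and the name structure_ok is hypothetical.

```python
import re

# The reserved words as listed in the text above.
RESERVED = set("""begin by case co comment do elif else end esac fi
    flex for from if in mode od op ouse out pr pragmat
    prio proc ref struct then to union while""".split())

CLOSER_FOR = {"begin": "end", "if": "fi", "case": "esac", "do": "od"}
CLOSERS = set(CLOSER_FOR.values())
assert (set(CLOSER_FOR) | CLOSERS) <= RESERVED  # every bracket word is reserved

def structure_ok(program):
    """True if the bold brackets of the program nest properly."""
    text = re.sub(r'"[^"]*"', " ", program)  # ignore string denotations
    expected = []
    for word in re.findall(r"[a-z_][a-z0-9_]*", text):
        if word in CLOSER_FOR:
            expected.append(CLOSER_FOR[word])
        elif word in CLOSERS:
            if not expected or expected.pop() != word:
                return False
    return not expected

print(structure_ok('begin real x; char c; get (f, (x, c)); if c = "i" then x+1 else x fi end'))
```

Words such as real and char are not in the list, so the check treats them like any other tag; an identifier such as end_of_file is not mistaken for end because underlines are part of the word.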
References
(1) van Wijngaarden, A., et al., Almost the Revised Report on the
Algorithmic Language ALGOL 68, private communication (Sept., 1973). This
version is slightly more recent than the version considered at Los Angeles
and includes most of the corrections agreed on there.
(2) Lindsey, C. H., 'An ISO-Code Representation for ALGOL 68', ALGOL
Bulletin 31 (March, 1970), pp. 37-60.
(3) ANSI, 'Data Communication Control Procedures for the USA Standard Code
for Information Interchange', CACM 12, 3 (March, 1969), pp. 166-178. ASCII
is listed in Appendix E.
(4) IBM Corp., IBM System/360 Principles of Operation, Order No. GA22-6821-8,
1970, pp. 150.2-150.3. (Note that the graphic '.' has been omitted from
position '4B' of the document.)
(5) ANSI, 'Correspondences of 8-Bit and Hollerith Codes for Computer
Environments - A USASI Tutorial', CACM 11, 11 (Nov., 1968), pp. 783-789.
Corrected in CACM 12, 5 (May, 1969), p. 29^^.
(6) Freeman, W., 'Suggestions regarding certain representations in ALGOL 68',
ALGOL Bulletin 34 (July, 1972), pp. 41-44.
(7) Graham, S. L. and S. P. Rhodes, 'Practical Syntactic Error Recovery
in Compilers', Conference Record of ACM Symposium on Principles of Programming
Languages, Boston, Massachusetts (Oct., 1973), pp. 52-58.
(8) Bemer, R. W., 'Backspace Bungle', Datamation 19, 9 (September, 1973), p. 25.
Appendix A. Proposed Hardware Representation
Where there is a difference between the ISO and EBCDIC representations,
the respective parts are labeled (ISO) and (EBCDIC).
8.1.4 Character denotations
d) other string item : style i monad {942b};
{III.6}
(ISO) ~ [ ] { } (and upper case letters)
(EBCDIC) -I (non-printing codes above 'space'; lower case letters).
9.1.1 Syntax
{III.3}
IF :: choice using boolean.
CHOICE brief start : open token.
IF bold start : if token.
CASE bold start : case token.
IF brief in : decision token.
IF bold in : then token.
CASE STYLE in : in token.
IF brief again : again token.
IF bold again : else if token.
CASE bold again : out case token.
IF brief out : decision token.
IF bold out : else token.
CASE STYLE out : out token.
CHOICE brief finish : close token.
IF bold finish : fi token.
CASE bold finish : esac token.
{There is no 'CASE brief again'. The provisions above can be diagrammed:

                start   in     again   out    finish
    IF brief    (       ?      ?:      ?      )
    CASE brief  (       in     X       out    )
    IF bold     if      then   elif    else   fi
    CASE bold   case    in     ouse    out    esac }
9.2 Comments and pragmats
d) STYLE other PRAGMENT item : quote symbol;
other string item (except not STYLE-PRAGMENT-symbol).
{III.6}
9.4.2 Other TAX symbols
a) style i letter ABC : {if case is not used for stropping} {III.1}
(ISO) upper case of letter ABC symbol
(EBCDIC) lower case of letter ABC symbol
{internally all STYLE-letter-ABC's are converted to one case}.
b) style i monad : (ISO) ! \ | '^
(EBCDIC) \ \ i
9.4.2.2 Representation
{III.8}
{IV}
b) Stropping. {This representation is compatible with all proposed
stropping conventions and favors none (pun intended).}
9.4.1 Representations of symbols
a) Letter symbols
{III.1}
symbol             (ISO)  (EBCDIC)    symbol             (ISO)  (EBCDIC)
letter a symbol      a       A        letter n symbol      n       N
letter b symbol      b       B        letter o symbol      o       O
letter c symbol      c       C        letter p symbol      p       P
letter d symbol      d       D        letter q symbol      q       Q
letter e symbol      e       E        letter r symbol      r       R
letter f symbol      f       F        letter s symbol      s       S
letter g symbol      g       G        letter t symbol      t       T
letter h symbol      h       H        letter u symbol      u       U
letter i symbol      i       I        letter v symbol      v       V
letter j symbol      j       J        letter w symbol      w       W
letter k symbol      k       K        letter x symbol      x       X
letter l symbol      l       L        letter y symbol      y       Y
letter m symbol      m       M        letter z symbol      z       Z
b) Denotation symbols
symbol                           representation
digit zero symbol                0
digit one symbol                 1
digit two symbol                 2
digit three symbol               3
digit four symbol                4
digit five symbol                5
digit six symbol                 6
digit seven symbol               7
digit eight symbol               8
digit nine symbol                9
point symbol                     .
times ten to the power symbol    (ISO) e  (EBCDIC) E  {III.5}
symbol
true symbol
false symbol
quote symbol
quote image symbol
space symbol
comma symbol
empty symbol
c) Operator symbols
symbol
or symbol
and symbol
ampersand symbol
differs from symbol
is less than symbol
is at most symbol
is at least symbol
is greater than symbol
divided by symbol
over symbol
percent symbol
window symbol
floor symbol
ceiling symbol
plus i times symbol
not symbol
tilde symbol
down symbol
up symbol
plus symbol
minus symbol
equals symbol
times symbol
asterisk symbol
assigns to symbol
becomes symbol
representation
true
false
(space)
{III.5}
empty
representation alternates
&
<
>
/
(EBCDIC) "•
or
&
and
(III. 5
(
/=
lt
ne
<=
>=
le
St
i
over
elem
Iwb,
upb
entier
{II.4}
{II.4}
{II.1}
{III.5}
(ISO) ~ not {II.4, III.5}
{II.4}
>
e£
■X-
{Although they are not listed in 9.4, the following are defined in chapter 10.}
exponentiation operator
modulo operator
plus and becomes operator
minus and becomes operator
times and becomes operator
divided by and becomes operator
over and becomes operator
modulo and becomes operator
plus to operator
shift left operator
shift right operator
raise semaphore operator
lower semaphore operator
+
■X-
9
pow t
mod
plusab
minusab
timesab
divab
overab
modab
plus to
shl
shr
down
{II.4}
;
{II.4}
d) Declaration symbols
As in the Revised Report
e) Mode standards
As in the Revised Report
f) Syntactic symbols
symbol
bold begin symbol
bold end symbol
brief begin symbol
brief end symbol
and also symbol
goon symbol
completion symbol
label symbol
parallel symbol
open symbol
close symbol
decision symbol
again symbol
if symbol
then symbol
else if symbol
else symbol
fi symbol
case symbol
in symbol
out case symbol
out symbol
esac symbol
colon symbol
brief sub symbol
brief bus symbol
style i sub symbol
style i bus symbol
up to symbol
at symbol
is symbol
is not symbol
nil symbol
of symbol
routine symbol
go to symbol
go symbol
skip symbol
formatter symbol
representation
begin
end
)
par
? :
if
then
elif
else
fi
case
in
ouse
out
esac
@
nil
-<
goto
go
skip
} {11.3}
at
is
isnt
of
{II.3}
g) Loop symbols
No change from Report
h) Pragment symbols
symbol                      representation
brief comment symbol        (ISO) { ... }
bold comment symbol         comment
style i comment symbol      co
style ii comment symbol     #
bold pragmat symbol         pragmat
style i pragmat symbol      pr
10.2.1 Environment enquiries
p) int max abs char = (ISO) 127 (EBCDIC) 255;
q) char null character = repr 0;
r) char flip = "t";
s) char flop = "f";
t) char errorchar =
u) char blank = " ";
v) char horizontal tab = repr ((EBCDIC) 5 (ISO) 9),
        backspace = repr ((EBCDIC) 22 (ISO) 8),
        carriage return = repr 13,
        line feed = repr ((EBCDIC) 37 (ISO) 10),
        vertical tab = repr 11,
        form feed = repr 12;
10.6.1. Library Preludes
a) proc complete conv = (ref book b) conv:
   (conv c;
    for i from 0 to max abs char do
       (aleph of c) (i) := (repr i, repr i) od;
    c);
{III.5}
b) proc layout encoded conv = (ref book b) conv:
   # characters to be ignored are set to null #
   (conv c;
    for i from 0 to abs blank - 1 do
       (aleph of c) (i) := (null character, repr i) od;
    # the six format effectors are transmitted unchanged #
    for i to 6 do
       char ch = (i in horizontal tab, backspace,
                  carriage return, line feed, vertical tab,
                  form feed);
       (aleph of c) (abs ch) := (ch, ch) od;
    # printable characters are transmitted unchanged #
    for i from abs blank to max abs char do
       (aleph of c) (i) := (repr i, repr i) od;
    c);
BIBLIOGRAPHIC DATA
Report No.: UIUCDCS-R-73-607
Title and Subtitle: A Revised ALGOL 68 Hardware Representation
for ISO-code and EBCDIC
Report Date: November, 1973
Author(s): Wilfred J. Hansen
Performing Organization Name and Address:
Department of Computer Science
University of Illinois
Urbana, Illinois
Sponsoring Organization Name and Address:
Department of Computer Science
University of Illinois
Urbana, Illinois
Abstract:
Because of the latitude allowed by the Revised ALGOL 68 Report, each
implementation has a slightly different representation for the constructs of the
language. This diversity can only lead to confusion as ALGOL 68 trained individuals
find they need readaptation to program at a new installation. The solution
proposed here is to develop a single hardware representation which can be used
on many computer systems. In fact this representation can conveniently be designed
using only the intersection of the graphic characters available in the ISO code
and EBCDIC.
The paper also proposes comfortable new representations for a few symbols
and discusses the thorny problem of distinguishing bold face words.
Key Words and Document Analysis. Descriptors:
ALGOL 68, hardware representation, program interchange,
symbols, characters, bold face letters,
ASCII, ISO-code, EBCDIC
Security Class (This Report): UNCLASSIFIED
Security Class (This Page): UNCLASSIFIED