Report No. 173
COO-415-1041

ILLIAC II--A SHORT DESCRIPTION AND ANNOTATED BIBLIOGRAPHY

by H. C. Brearley

February 25, 1965

Department of Computer Science
University of Illinois
Urbana, Illinois

ILLIAC II is a large, high-speed, general-purpose computer built by the Digital Computer Laboratory, University of Illinois, Urbana. Comprehensive plans for its construction were given in a widely quoted 1957 report [38]. No similarly comprehensive post-construction report exists, although a number of papers describing various aspects of the computer have been published. This bibliography lists these papers and provides a short description of the computer as a guide to the entries.

The papers cited fall into two classes. The first class is the open literature, consisting of journal articles, symposia proceedings, and the like. The second class is Digital Computer Laboratory Reports. These are included because they are fairly widely held in the libraries of computer organizations and have been cited in the literature. The internal Digital Computer Laboratory documents related to construction are not cited here.

History

Planning for ILLIAC II began June 1, 1956, and culminated in 1957 in a report describing the proposed design [38]. Design began in 1957 and final chassis construction began in 1960. In 1962, the two controls, arithmetic unit and core memory began operation with paper tape input and output. At the present time the machine is essentially complete and in use, and work continues on the addition of input-output devices and other peripheral equipment.
The work has been supported jointly by the University of Illinois, the Atomic Energy Commission and the Office of Naval Research. The IBM Corporation donated a number of input-output devices.

(The Digital Computer Laboratory was recently renamed the Department of Computer Science.)

Organization

ILLIAC II is a highly parallel computer, with three simultaneously operating controls. Operations of the floating-point arithmetic unit are controlled by an arithmetic control. Transfer of data between the core memory and the slower memories is controlled by an interplay control. Other control functions are performed by a supervisory control called Advanced Control. Among the functions of Advanced Control are fetching and storing of operands, address construction and indexing, and partial decoding of the orders for the other two controls. The Advanced Control order code is rather elaborate, and in conjunction with the 13-bit registers in the fast memory it provides for a large variety of 13-bit fixed-point arithmetic and logical operations, except multiplication and division.

The hierarchy of memories consists of:

1. Fast transistor memory, 10 words, 0.2 µsec.
2. Core memory, 8,192 words (soon to be 12,288 words), 1.8 µsec.
3. Drum memory, 65,536 words, 8.5 msec average access time, 7.8 µsec word period.
4. Magnetic tapes and disk files.

The order code contains long and short instructions. A 13-bit short instruction, which occupies only a quarter word, contains four bits to specify an index register containing an operand or an address. A 26-bit long instruction contains in addition a 13-bit address. Long instructions may be packed two to a word. Two words of orders are held in the fast memory. This makes it possible to execute a loop of up to eight short instructions (two words) without any instruction fetches from the core memory. If at the same time the operands are held in the ten-word fast memory, a very fast loop can be written.
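
The quarter-word packing described above can be illustrated with a minimal Python sketch. It only shows how four 13-bit quarter words fill a 52-bit word and how a 26-bit long instruction occupies two quarters. The function names, the ordering of quarters within a word (first quarter in the high-order bits), and the placement of the address in the second quarter are assumptions made for illustration; they are not taken from the ILLIAC II order code.

```python
QUARTER_BITS = 13                      # one short instruction = one quarter word
QUARTERS_PER_WORD = 4                  # 4 x 13 = 52-bit word
QUARTER_MASK = (1 << QUARTER_BITS) - 1


def pack_word(quarters):
    """Pack four 13-bit quarter words into one 52-bit word.

    Assumption: the first quarter goes into the high-order bits.
    """
    if len(quarters) != QUARTERS_PER_WORD:
        raise ValueError("a word holds exactly four quarter words")
    word = 0
    for q in quarters:
        if not 0 <= q <= QUARTER_MASK:
            raise ValueError("quarter word does not fit in 13 bits")
        word = (word << QUARTER_BITS) | q
    return word


def unpack_word(word):
    """Split a 52-bit word back into its four 13-bit quarter words."""
    return [(word >> (QUARTER_BITS * i)) & QUARTER_MASK
            for i in reversed(range(QUARTERS_PER_WORD))]


def long_instruction(short_part, address):
    """Return a 26-bit long instruction as two quarters.

    Assumption: the 13-bit address follows the 13-bit short-format part.
    """
    return [short_part, address]


# Two long instructions fill one word; eight short instructions fill two words.
word = pack_word(long_instruction(5, 100) + long_instruction(6, 200))
print(unpack_word(word))
```

Under these assumptions, a loop of eight short instructions is simply two such words, which is the amount of order storage the text says is held in the fast memory.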
A detailed consideration of the size and speed requirements of the various parts of the machine for several classes of problems is given in Taub, et al. [38], which also contains an early version of the order code. Consideration of problem types is also contained in Taub [39]. More detailed descriptions of the organization and the order code are contained in Gillies [8], [9], [11]. Up-to-date details are given in the ILLIAC II programmer's manual [5].

Arithmetic Unit

The arithmetic unit is asynchronous, double-precision, floating-point. It is radix 4 in almost all respects. Single-precision operands are 52 bits long, with a 45-bit fraction and a 7-bit exponent (base 4) in radix complement representation. The range of normalized single-precision numbers in the memory is

$4^{-64} < |x| < 4^{63}$

Results of most arithmetic operations are not normalized and the programmer is free to normalize or not as he stores them. To aid in fixed-point programming, orders are provided which force the exponent to one of three values, with corresponding shifts in the fraction part.

The roundoff which occurs when storing a double-precision arithmetic result in the single-precision memory is obtained by adding 1 or 0 to the last retained fraction bit for discarded fractions greater or less than one-half, respectively. The equality case is made dependent on the (presumably random) last retained bit to produce an unbiased roundoff.

A number of features are provided to increase the speed of operation. Redundant number representations and separate carry storage are used within part of the arithmetic unit to eliminate carry propagation during repeated additions such as occur in multiplication. In general a carry bit is provided for each two fraction bits. Multiplier digits, originally having values 0, 1, 2, 3, are recoded to the range -1, 0, 1, 2, and two-at-a-time shifts are provided.
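
The recoding of multiplier digits from the set 0, 1, 2, 3 to the set -1, 0, 1, 2 can be sketched as follows. This is a textbook-style carry recoding written for illustration, assuming the digits are processed from the least significant end; it is not claimed to be the procedure wired into the ILLIAC II arithmetic unit, and the function names are hypothetical.

```python
def recode_multiplier(digits):
    """Recode radix-4 digits (0..3) into digits in the range -1..2.

    `digits` is least significant digit first. A digit 3 is replaced by -1
    with a carry of 1 into the next higher digit (3 = 4 - 1). The result may
    be one digit longer than the input.
    """
    recoded = []
    carry = 0
    for d in digits:
        if d not in (0, 1, 2, 3):
            raise ValueError("radix-4 digits must lie in 0..3")
        t = d + carry              # t is in 0..4
        if t >= 3:
            recoded.append(t - 4)  # 3 -> -1, 4 -> 0
            carry = 1
        else:
            recoded.append(t)
            carry = 0
    if carry:
        recoded.append(1)
    return recoded


def radix4_value(digits):
    """Value of a radix-4 digit string given least significant digit first."""
    return sum(d * 4 ** i for i, d in enumerate(digits))


# 15 is written 33 in radix 4; recoded it becomes (1)(0)(-1): 16 + 0 - 1.
print(recode_multiplier([3, 3]))
```

The point of such a recoding is that the multiples of the multiplicand needed per digit are only -1, 0, 1 and 2 times the multiplicand, each of which is a shift or a complement rather than a true 3x multiple.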
Two adders are provided so that addition may be performed both while gating from the accumulator (A, Q) to the temporary accumulator (S, R) and vice versa. Radix 4 division was considered by Robertson [30], but rejected in favor of redundant binary nonrestoring division, wherein the quotient digits are generated as -1, 0, +1 and then recoded as base 4 digits with values between -3 and +3. Carries are assimilated before a store, since the other parts of the computer do not use redundant number representation.

The floating-point arithmetic unit as constructed is described theoretically in Robertson [31] and in detail in Penhollow [19]. Earlier plans were described in Taub, et al. [38] and Wheeler [40]. In addition there were a number of earlier studies. These included redundant number representations by Avizienis [1], [2], [3] and Metze [14], use of redundant number representation in the whole computer instead of just the arithmetic unit by Metze and Robertson [15], separate carry storage adders by Takahashi [37], efficient multiplier and division recodings by Penhollow [18], and efficient division by Robertson [30], Metze [16] and Shively [34].

Speed Independence and Control Design

Theories of asynchronous circuits and speed independence were studied extensively prior to construction. The speed independence problem is stated physically and theoretically in Taub, et al. [38]. Detailed theoretical studies are in Muller and Bartky [17], Shelly [33], and Bartky [4]. A circuit is speed-independent if its function does not depend on the speeds at which its constituent parts operate. Advantages of speed independence are increased reliability and ease of maintenance.

The realization of speed independence used in the controls of ILLIAC II involves the collection of reply signals to ensure that all the operations which must be performed at each step are complete before going on to the next step.
Some of the problems involved in designing the arithmetic control in this way are described in Swartwout [35], Robertson [32], and Gillies [10]. Advanced Control was designed in a similar but not identical way. The arithmetic unit was made not speed-independent to avoid increasing its complexity and cost and decreasing its speed. The electro-mechanical peripheral devices are inherently synchronous, but the philosophy of speed independence was partly extended to them by the provision of replies and alarms for many of the control signals.

A theoretical study of methods of designing a speed-independent control, including the method actually used for the arithmetic control, is contained in Swartwout [36].

Speeds

Some approximate operation times are as follows:

    Floating add or subtract    2.5 to 3.5 µsec
    Floating multiply           6.3 µsec
    Floating divide             16.0 µsec
    Indexing                    1.0 µsec
    13-bit integer orders       2.0 µsec
    Fast memory                 0.2 µsec
    Core memory                 1.8 µsec

The times shown for arithmetic do not include instruction or operand accessing times because Advanced Control performs memory accesses concurrently with arithmetic, usually with zero net time charges. Instruction decoding, address construction and indexing are similarly overlapped with arithmetic, and most absorb no effective time at all.

Fast Memory

Ten words of very fast storage are provided, called the fast memory or flow gating memory. These ten registers are composed of transistor flipflops with common input and output buses and special gating arrangements to keep the number of transistors small. The design achieves high speed and high sensitivity along with the usually contradictory high stability by using variable feedback. During the write-in operation, a gate signal lowers the average potential of the flipflop. This produces two effects: (1) information is allowed to flow into the circuit through a diode from the input bus, and (2) the feedback in the flipflop is disabled.
This reduces the circuit to a difference amplifier, and the information is stored in the base-emitter capacitances. At the end of the write-in operation the average potentials are raised back to normal, thus cutting off the input diode and allowing the feedback to permanently store the information. The operation time is 0.2 µsec. The transistor counts per bit are: basic flipflop 2, output driver 1, write and read drivers and terminations about 2.3.

The fast memory sits at the "crossroads" of the computer, and some of its registers are also intimately identified with other parts of the machine, e.g., the core memory, Advanced Control, the arithmetic control and the arithmetic unit. Four of the fast registers are also addressable as quarter words, thus providing 16 registers of 13 bits each for use as index registers and for other purposes.

The early plans for the fast memory were given in Taub, et al. [38], and Poppelbaum [20], [21], [22]. A brief mention is also made in Poppelbaum [24]. Detailed experimental data on the fast memory, including tolerance analyses, waveforms and other details, are given in Guckel, Kunihiro and Crow [12]. A patent covering the flow-gating principle was issued in 1962 [25].

Core Memory

The core memory was originally planned to contain 8,192 words of 52 bits plus parity each. There were to be two 4,096-word modules, with odd addresses in one module and even addresses in the other to halve the average access time for sequential addresses. The first 4,096-word module was completed in 1962. It was word oriented, with one switch core per word and two data cores per bit. Two data cores per bit give bipolar output and a loading on the switch cores that was virtually independent of the digit pattern. Partial switching was used to increase speed and reduce core heating. Readout was destructive and a restoration cycle was provided.

Early plans for the core memory were described in Taub, et al. [38].
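
The odd/even interleaving of the two core modules can be sketched in a few lines. This is an illustration only: it assumes the low-order address bit selects the module (even addresses in module 0, odd addresses in module 1) and that the remaining bits give the location within the module. The function name and the module numbering are hypothetical.

```python
MODULES = 2            # two 4,096-word modules
MODULE_WORDS = 4096


def interleaved_location(address):
    """Map a core address to (module, offset) under odd/even interleaving.

    Assumption: the low-order bit of the address selects the module.
    """
    if not 0 <= address < MODULES * MODULE_WORDS:
        raise ValueError("address outside the 8,192-word core memory")
    return address % MODULES, address // MODULES


# Sequential addresses alternate between modules, so one module can be
# completing its cycle while the other begins the next access.
for a in range(4):
    print(a, interleaved_location(a))
```

Because consecutive addresses never fall in the same module under this mapping, a sequential scan can overlap the cycles of the two modules, which is the source of the halved average access time mentioned above.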
Some earlier experiments were reported in McKay, Yu and Pottle [13]. Detailed plans for the construction of the first 4,096-word module were described in Ray [27]. Theoretical studies of partial switching are contained in Ray [28], [29].

The first 4,096-word module was finished in 1962, and has been in operation since then (without the interleaved addresses feature) at a cycle time of 1.8 µsec. In 1964, a commercial 8,192-word core memory was purchased. The original 4,096-word module and 4,096 words of the commercial core memory are now in operation with interleaved addresses. This exhausts the addressing capabilities of the original 13-bit address field. The addressing scheme is presently being modified to allow the additional 4,096 words also to be used.

Circuits

The basic circuits used in the high-speed portions of the machine are nonsaturating current switching circuits using pnp germanium mesa transistors. Switching times are 10 to 40 nanoseconds. Early reports on these circuits are Taub, et al. [38] and Poppelbaum and Wiseman [22]. The actual construction was based on a revised design completed in the summer of 1960. A patent covering the asymmetrical flipflop was issued in 1960 [23]. A tutorial description of some of the memory elements is in Rao [26].

The slower parts of the computer (Interplay, Drum Memory, Input-Output Channels, etc.) contain a variety of slower circuits. These include saturating, nonsaturating, current switching and NOR topologies using germanium transistors.

The computer contains about 55,000 transistors and 133,000 diodes, exclusive of the commercially built input-output devices.

Input-Output and Interrupt

Two input-output systems are provided: a high-capacity full word system and a slower quarter word system.

Full word data transfers in the memory hierarchy are between the core memory and one of the other memories or devices. Transfers between the core memory and the ten-word fast memory are supervised by Advanced Control.
All other full word transfers are performed by Interplay, which contains the necessary controls and data buffers. Interplay is a wired program computer of a limited sort. It begins a data transfer between the core memory and one of the other memories or devices in response to a command from Advanced Control. After the initial set-up, Advanced Control and Interplay operate independently without interaction except that they compete for core memory accesses. All of the Interplay channels can be performing transfers at the same time. Currently there are nine channels in use out of a possible 32. The capacity of Interplay is one word every 3.5 µsec.

The slower input-output system, called the special register system, allows Advanced Control to exchange 13-bit characters with up to 64 input-output registers. Each 13-bit transfer requires Advanced Control to execute one order, as distinguished from Interplay, which operates in parallel with Advanced Control and requires execution of only two Advanced Control orders to transfer a block of data, generally 256 words. The special register system is used for low-speed input-output and to transmit control and status information for peripheral devices.

An interrupt system is connected to certain bits of the special registers. For example, when an Interplay channel completes the transfer of a block of data, a completion signal is provided via one of the special registers. This may, if desired, interrupt the program then running and call a supervisory program to initiate another transfer or take other action. The interrupt system may also be actuated by errors, power failures, requests from consoles, etc.

Magnetic Drum Memory

The Magnetic Drum Memory stores 65,536 words on two 3400-rpm drums. Each word is stored as four 13-bit characters plus parity. The character period is 1.95 µsec; the word period is 7.8 µsec. Non-return-to-zero recording is used at a packing density of 288 bits per inch.
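
The relation between the character period and the word period, and the time needed to move a 256-word block, can be checked with a short sketch. The timing constants come from the text; everything else is assumed for illustration: that each 13-bit character carries its own parity bit, that the parity is even, and that characters are recorded most significant first. The function names are hypothetical.

```python
CHAR_BITS = 13
CHARS_PER_WORD = 4
CHAR_MASK = (1 << CHAR_BITS) - 1
CHAR_PERIOD_US = 1.95     # from the text
BLOCK_WORDS = 256         # typical block size, from the text


def parity_bit(char):
    """Even-parity bit for one 13-bit character (parity sense is assumed)."""
    return bin(char).count("1") % 2


def word_to_drum_chars(word):
    """Split a 52-bit word into four (character, parity) pairs.

    Assumption: the most significant character is recorded first.
    """
    chars = []
    for i in reversed(range(CHARS_PER_WORD)):
        c = (word >> (CHAR_BITS * i)) & CHAR_MASK
        chars.append((c, parity_bit(c)))
    return chars


def word_period_us():
    """Four character periods make one word period."""
    return CHARS_PER_WORD * CHAR_PERIOD_US


def block_time_us():
    """Time to pass one 256-word block under the heads, ignoring gaps."""
    return BLOCK_WORDS * word_period_us()


print(word_period_us(), block_time_us())
```

Four character periods of 1.95 µsec reproduce the 7.8 µsec word period quoted above, and a 256-word block then occupies roughly 2 msec of drum time before any allowance for gaps.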
Full 52-bit parallel recording with a 1.95-µsec word period was considered but not used because it would have required four times as many read and write amplifiers and it would have almost completely occupied the core memory while a drum transfer was in progress.

Drum data is written and read in 256-word blocks, with eight blocks per band and 16 bands per drum. Gaps between the blocks allow for head switching so that, following any block transfer, random access to one of the 16 blocks in the next sector may be obtained without waiting.

System Programs

The ILLIAC II software includes an assembler called NICAP, a FORTRAN II translator, and an operating system program. Among other things, NICAP handles the multiple-orders-per-word problem and translates complex address field expressions, including nested parentheses to any depth. Parts of address field expressions which can be evaluated at translation time are so evaluated. The remaining additions and subtractions are prepared for execution at run time by the 13-bit fixed-point arithmetic unit in Advanced Control; multiplications and divisions are prepared for execution by the floating-point arithmetic unit. The address field compilation algorithm is described in Gear [6].

The FORTRAN II translator produces assembly language in a single pass. Effective use of the drum memory enables the translator to proceed without the use of magnetic tapes, thus gaining an order of magnitude in speed. The operating system program provides for batch processing.

The various system and library programs are described in a user's manual [5] and in a compiler writer's manual [7].

Acknowledgments

The assistance of Professors J. E. Robertson and D. B. Gillies in the preparation of this bibliography is gratefully acknowledged.

BIBLIOGRAPHY

[1] A. Avizienis, "A Study of Redundant Number Representations for Parallel Digital Computers," Ph.D. Thesis, University of Illinois, Urbana, Illinois, 1960. (Also DCL Report 101, May 20, 1960.)

[2] A. Avizienis, "Signed-Digit Number Representations for Fast Parallel Arithmetic," IRE Trans. on Electronic Computers, vol. EC-10, pp. 389-400; September, 1961.

[3] A. Avizienis, "On a Flexible Implementation of Digital Computer Arithmetic," Information Processing 1962 (Proceedings of IFIP Congress, Munich, Germany, August, 1962), North Holland Publishing Co., Amsterdam, 1963, pp. 664-670.

[4] W. S. Bartky, "A Theory of Asynchronous Circuits III," DCL Report 96, January 6, 1960.

[5] C. W. Gear, editor, "ILLIAC II Manual," Digital Computer Laboratory, University of Illinois, March, 1963. This loose-leaf programmer's manual is kept up-to-date with revision pages.

[6] C. W. Gear, "Optimization of the Address Field Compilation in the ILLIAC II," The Computer Journal, vol. 6, pp. 332-335, January, 1964.

[7] C. W. Gear, editor, "New Illinois Compiler and Assembler Programming System Systems Manual," Digital Computer Laboratory, University of Illinois, July, 1964. This loose-leaf manual describes the system programs, including the NICAP and FORTRAN translators.

[8] D. B. Gillies, "Organization of a Very-High-Speed Computer," DCL Report 93, August 24, 1959.

[9] D. B. Gillies, "The Design of a Very-High-Speed Scientific Computer," notes for course "Theory of Computing Machine Design," June 26-29, 1961, University of Michigan.

[10] D. B. Gillies, "A Flow Chart Notation for the Description of a Speed Independent Control," Switching Circuit Theory and Logical Design (Proceedings of the Second Annual Symposium on Switching Circuit Theory and Logical Design, Detroit, October 17-20, 1961), pp. 109-110, AIEE Publication S-134.

[11] D. B. Gillies, "Order Code for the New Illinois Computer," Digital Computer Laboratory internal report, June 15, 1962, revised August 14, 1962. Current version is available as Chapter 3 of the ILLIAC II Manual [5].

[12] H. Guckel, T. Kunihiro and R. K. Crow, "Final Report--Flow Gating," DCL Report 106, March 24, 1961.

[13] R. W. McKay, N. N. Yu and C. Pottle, "A One-Word Model of a Word-Arrangement Memory," DCL Report 79, May, 1957.

[14] G. Metze, "A Study of Parallel One's Complement Arithmetic Units with Separate Carry or Borrow Storage," Ph.D. Thesis, University of Illinois, Urbana, Illinois, 1958. (Also DCL Report 81, November 11, 1957.)

[15] G. Metze and J. E. Robertson, "Elimination of Carry Propagation in Digital Computers," Information Processing (Proceedings of International Conference on Information Processing, June 15-20, 1959), pp. 389-396, UNESCO, Paris.

[16] G. Metze, "A Class of Binary Divisions Yielding Minimally Represented Quotients," IRE Trans. on Electronic Computers, vol. EC-11, pp. 761-764; December, 1962.

[17] D. E. Muller and W. S. Bartky, "A Theory of Asynchronous Circuits," Proc. International Symposium on Theory of Switching, Part I, vol. 29 of Annals of the Harvard Computation Laboratory, Harvard University Press, pp. 204-243, 1959.

[18] J. O. Penhollow, "A Study of Arithmetic Recoding with Applications in Multiplication and Division," Ph.D. Thesis, University of Illinois, Urbana, Illinois, September, 1962. (Also DCL Report 128, September 10, 1962.)

[19] J. O. Penhollow, "The Arithmetic Subsystem of the New Illinois Computer," DCL Report 160, January 24, 1964.

[20] W. J. Poppelbaum, "Flow Gating," Proceedings of the Western Joint Computer Conference (May, 1958), pp. 138-141, AIEE, New York, 1959.

[21] W. J. Poppelbaum, "Flow Gating," DCL Report 83, July 10, 1958.

[22] W. J. Poppelbaum and N. E. Wiseman, "Circuit Design for the New Illinois Computer," DCL Report 90, August 20, 1959.

[23] W. J. Poppelbaum, "Transistor Flipflop," United States Patent No. 2,933,621, granted April 19, 1960.

[24] W. J. Poppelbaum, "Millimicrosecond Computer Circuits," NEREM Record (Northeast Electronics Research and Engineering Meeting, Boston, November 15-17, 1960), Boston Section of IRE, pp. 22-23.

[25] W. J. Poppelbaum, "Flow Gating," United States Patent No. 3,067,339, granted December 4, 1962.

[26] P. V. S. Rao, "Some Memory Elements Used in ILLIAC II," DCL Report 119, June 21, 1962.

[27] S. R. Ray, "Design of the Core Storage Unit," DCL Report 91, August 21, 1959.

[28] S. R. Ray, "Engineering Model of a Partial Switching Effect in Ferrite Cores," Ph.D. Thesis, University of Illinois, Urbana, Illinois, 1961. (Also DCL Report 111, September 5, 1961.)

[29] S. R. Ray, "Model of Partial Switching in Polycrystalline Ferrites," Proceedings of the International Conference on Nonlinear Magnetics (Washington, April 17-19, 1963), IEEE publication T-149.

[30] J. E. Robertson, "A New Class of Digital Division Methods," IRE Trans. on Electronic Computers, vol. EC-7, pp. 218-222; September, 1958.

[31] J. E. Robertson, "Theory of Computer Arithmetic Employed in the Design of the New Computer at the University of Illinois," notes for course "Theory of Computing Machine Design," June 13-17, 1960, University of Michigan.

[32] J. E. Robertson, "Problems in the Physical Realization of Speed Independent Circuits," Switching Circuit Theory and Logical Design (Proceedings of the Second Annual Symposium on Switching Circuit Theory and Logical Design, Detroit, October 17-20, 1961), pp. 106-108, AIEE publication S-134.

[33] J. H. Shelly, "The Decision and Synthesis Problems in Semi-Modular Switching Theory," Ph.D. Thesis, University of Illinois, Urbana, 1959. (Also DCL Report 88, May 29, 1959.)

[34] R. R. Shively, "Stationary Distributions of Partial Remainders in S-R-T Digital Division," Ph.D. Thesis, University of Illinois, Urbana, Illinois, 1963. (Also DCL Report 136, May 15, 1963.)

[35] R. E. Swartwout, "One Method of Designing a Speed-Independent Logic for a Control," Switching Circuit Theory and Logical Design (Proceedings of the Second Annual Symposium on Switching Circuit Theory and Logical Design, Detroit, October 17-20, 1961), pp. 94-105, AIEE Publication S-134.

[36] R. E. Swartwout, "Further Studies in Speed-Independent Logic for a Control," Ph.D. Thesis, University of Illinois, Urbana, Illinois, 1962. (Also DCL Report 130, December 13, 1962.)

[37] S. Takahashi, "Separate Carry Storage Adders," DCL Report 97, March 7, 1960.

[38] A. H. Taub (Chairman), D. B. Gillies, R. E. Meagher, D. E. Muller, R. W. McKay, J. P. Nash, W. J. Poppelbaum, and J. E. Robertson, "On the Design of a Very High-Speed Computer," DCL Report 80, First ed., October, 1957, Second ed., April, 1958.

[39] A. H. Taub, "Machine Organization with Respect to Problem Types with Particular Emphasis on the Scientific Computer," notes for course "Theory of Computing Machine Design," June 20-24, 1960, University of Michigan.

[40] D. J. Wheeler, "The Arithmetic Unit," DCL Report 92, August 21, 1959.