UNIVERSITY OF ILLINOIS 
 GRADUATE COLLEGE 
 DIGITAL COMPUTER LABORATORY 
 
 INTERNAL REPORT NO. 67 
 
 NOTES ON THE USE OF PARALLELISM 
 IN ULTRAHIGH SPEED COMPUTER DESIGNS 
 
 BY J. M. WIER 
 
 December 12, 1955 
 
I. Introduction 
 
 In the construction of high speed computational facilities it seems 
 of great importance to produce as much computation speed per tube used as 
 possible. As the tubes themselves are intrinsically very fast, a certain 
 basic speed is attainable with nothing but simple tube circuits in more or 
 less conventional forms. This speed may be improved upon to a certain 
 extent by making the circuits somewhat more powerful, by making some effort 
 to minimize circuit capacities and by using tubes with higher figures of 
 merit. This simple increase in speed seems to saturate for normal direct 
 coupled circuits at a point where basic transfers take place in the region 
 of 50 to 100 milli-microseconds. It should be stressed that even this speed is
 attained with some effort. 
 
 In some circuits it is possible to reduce the operation times
 marginally by increasing the number or power of the driving tubes
 considerably. This process is far from linear: doubling the number of
 tubes changes the speed by far less than a factor of two. This is
 evidently not a satisfactory solution, for in nearly all cases doubling
 the tube count would enable one almost to double the effective
 computation speed were the added tubes used in two like circuits. Thus,
 once the point has been reached where a given number of added tubes no
 longer adds a proportional increase in speed, it would seem that
 parallelism should be followed wherever it leads to more or less
 proportional increases in overall speed for the added expenditure of
 tubes. Of course, even this process will saturate eventually for a few
 problems and still later for all sufficiently complex calculations. The
 computer which has very high individual circuit speeds does have one
 field in which it is absolutely supreme. This field includes all
 problems in which a sequence of very simple operations must be
 performed, not even the least effort in the parallel direction is
 possible, and the result of each step is required before the next step
 can proceed. Such a problem is soluble just as rapidly on a serial
 machine as on a parallel one. As is fairly extensively recognized, the
 class of such problems is rather restricted.
 
 The use of increased parallelism may be shown to be of great value
 in the performance of certain arithmetic and transfer operations.
 
 It has been shown that it is possible essentially to parallel 
 the carry chain so that the carry sequence is at least partially calculated 
 in parallel. This method reduces the carry time to something like one-tenth
 of the normal time for expenditure of about as many more tubes as originally 
 used in the adder. With this scheme plus a speeding up of the circuits 
 which pass straight through the adder it has been found to be feasible to 
 produce an adder which settles down to the correct solution in a time in 
 the neighborhood of 0.1 microsecond after an input number is changed. 
 
 By employing a multiplicity of adders, further effective increases
 in the computation speed result. By using n adders with a settle down time
 of T_1 microseconds plus a transfer time of T_2 microseconds, an m digit
 multiplication may be completed in about

 T = T_1 (m/n) + T_2 (m/n) = (m/n)(T_1 + T_2) microseconds.
 
 It will be difficult to surpass, or even to approach, the speeds
 achieved by parallelism if more serial methods are used, even with twice
 the expenditure of tubes. It could be argued that the speed might be
 further increased by using faster circuitry in this place, but this same
 additional expenditure of equipment may more profitably be used to increase
 the parallelism further.
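
 The estimate above may be checked numerically. The following is a minimal
 sketch, assuming the relation T = (m/n)(T_1 + T_2); the function name and
 the sample values chosen for T_1 and T_2 are illustrative assumptions and
 are not part of the original analysis.

```python
def multiplication_time_us(m, n, t1_us, t2_us):
    """Approximate time, in microseconds, for an m digit multiplication.

    n      number of adders (multiplier digits handled per pass)
    t1_us  adder settle down time T_1, in microseconds
    t2_us  transfer time T_2, in microseconds
    """
    passes = m / n  # each pass disposes of n multiplier digits
    return passes * (t1_us + t2_us)


# Assumed sample values: a 40 digit word, T_1 = T_2 = 0.1 microsecond.
for adders in (1, 2, 4, 8):
    t = multiplication_time_us(40, adders, 0.1, 0.1)
    print(f"{adders} adder(s): about {t:.2f} microseconds")
```

 Under these assumed values the time falls in direct proportion to the
 number of adders, which is the proportional return argued for in the
 Introduction.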
 
 II. Discussion of some parallel circuits. 
 
 In order to produce increased speed in the registers by parallelism, 
 two courses of action may be taken. One of these is to allow the transfer 
 of information between more than one pair of registers at a time. This 
 will allow for moving more than one number at a time when necessary. A 
 second method of procedure utilizes a switching plus gating process for 
 transferring information rather than a gating procedure alone. In the case 
 of the production of shifting facilities, this process is remarkably
 useful. By utilizing a multiple depth shift the time necessary to perform a
 shift of n places becomes essentially independent of n. The system operates
 by arranging a set of binary switches in series. The method is illustrated
 in Figure 1 for four digits. Provisions are made in this logical
 arrangement for a cyclic left shift of up to three places. The two flipflops
 on the left hold the binary number which specifies the number of shifts to
 be performed.
 
 Figure 1. Multiple Depth Shift 
 
 The circuit of Figure 1 utilizes a multitude of binary switches
 of the kind shown in Figure 2.
 
 Figure 2. Binary Switch for Multiple Depth Shift
 
 It is evident that the output shown in Figure 2 is equal to input
 B if F = 0 and it is equal to input A if F = 1. At each level the inputs
 are switched so that they appear at the output displaced by 2^i places if
 the shift digit in the 2^i th place of the shift number is a 1 and are not
 displaced at all if the shift digit is a 0. By passing through a series
 of such circuits the outputs become displaced with respect to the input by 
 a number of places equal to the shift number. The time of transit through 
 the circuit is practically independent of the number of shifts executed 
 since the number of elements through which the signals pass in any case is 
 the same. In the circuit illustrated in Figure 1 the output signals pass 
 through a series of four logical elements in coming from the input. 
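
 The switching action described above may be imitated in software. The
 following is a minimal sketch only: the function names, the choice of
 which switch input carries the displaced digit, and the representation
 of a register as a list with the leftmost digit first are all
 assumptions made for illustration and are not taken from the circuit of
 Figure 1.

```python
def binary_switch(a, b, f):
    """Two input switch: the output is input A if F = 1 and input B if F = 0."""
    return a if f else b


def multiple_depth_shift(bits, shift, n_levels):
    """Cyclic left shift of a register through n_levels switch levels.

    bits      list of 0/1 digits, leftmost digit first
    shift     number of places to shift (0 .. 2**n_levels - 1)
    n_levels  number of digits in the shift number
    """
    m = len(bits)
    for i in range(n_levels):
        f = (shift >> i) & 1  # shift digit in the 2**i place
        step = 1 << i         # displacement offered at this level
        # Every output digit passes through exactly one switch per level,
        # whether or not it is displaced, so the transit depth is fixed.
        bits = [
            binary_switch(bits[(j + step) % m], bits[j], f)
            for j in range(m)
        ]
    return bits


register = [1, 0, 1, 1]
for places in range(4):
    print(places, multiple_depth_shift(register, places, n_levels=2))
```

 The loop over levels runs the same number of times for every shift
 number, which mirrors the observation that the transit time is
 practically independent of the number of shifts executed.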
 
 The system shown in Figure 1, when converted to its circuit
 equivalent in a direct fashion, requires about three tubes for each two
 input-one output switch. Thus, in order to produce a maximum possible
 cyclic shift of (2^n - 1) places in a shift register of m digits, a total
 of 3mn double sided tubes are required. Such a shifting device for a 40
 digit shifting register with a maximum shift of 63 places, as in the
 Illiac, would be quite expensive. Here n = 6 and m = 40 so the tube count
 would be (3)(6)(40) = 720 tubes. Fortunately a better method exists for
 producing this two input switch which requires but one tube instead of
 three. This circuit is identical to the complement gate in the Illiac in
 which the two inputs to be switched appear as high impedance signals at
 the grids, the switching signals are applied to the plates, and the low
 impedance output appears at the common cathode of a double triode. The
 circuit enables the above mentioned shifting circuit to be constructed for
 mn = (40)(6) = 240 tubes. The switch is shown in Figure 3.
 
 Figure 3. Switch Circuit
 
 A further economy may be realized if the cyclic shift is not
 required, for then the end around carry portion may be omitted. The number
 of triode sections required in this case for a maximum shift of (2^n - 1)
 places in an m digit register is 2mn - \sum_{i=0}^{n-1} 2^i. An evaluation
 for m = 40 and n = 6 gives

 2(40)(6) - \sum_{i=0}^{5} 2^i = 480 - 63 = 417 triodes.

 This is 208.5 double triodes, a saving of 31.5 tubes from the complete case
 indicated above.
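
 The three tube counts quoted for the shifting circuit may be reproduced
 with a few lines of arithmetic. The sketch below simply evaluates the
 expressions 3mn, mn and 2mn - \sum 2^i given in the text; the function
 names are hypothetical and chosen only for this illustration.

```python
def tubes_direct(m, n):
    """Direct conversion: three double sided tubes per two input switch."""
    return 3 * m * n


def tubes_double_triode(m, n):
    """One double triode per two input switch (cyclic shift retained)."""
    return m * n


def triode_sections_noncyclic(m, n):
    """Triode sections when the end around portion is omitted."""
    return 2 * m * n - sum(2 ** i for i in range(n))


m, n = 40, 6  # the Illiac case discussed in the text
sections = triode_sections_noncyclic(m, n)
print("direct conversion :", tubes_direct(m, n), "tubes")
print("double triode     :", tubes_double_triode(m, n), "tubes")
print("non-cyclic        :", sections, "triodes =", sections / 2, "double triodes")
print("saving            :", tubes_double_triode(m, n) - sections / 2, "tubes")
```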
 
 The speed of operation of a shift using the above process is
 dependent upon the time required to set up the switch signals and to allow
 the switched signals to pass through the multilayer switch. Since these
 all may be rather fast with fairly low impedances, it is conceivable that
 an arbitrary length shift can be completed in something like one
 microsecond using individual circuits of very little more power consumption
 than those used in the Illiac. It will be found to be very difficult to
 construct a similarly fast device out of so called "high speed" circuits
 operating in a more serial mode in which one shift at a time is executed,
 even if the same number of tubes as in the switch shifting circuit be used
 in high speed circuitry. The switch shift system will also be much less
 critical in its design since it need not operate so near to the limits of
 tube (or transistor) speed.
 
 A second place where great gains in speed may be obtained through
 the selective use of parallelism is in the carry chain in the adder. Since
 the inherent speed of the adder is limited by the time taken to complete a
 carry, an improvement in this time will directly affect the time to do an
 addition. The carry into any stage of a parallel adder is usually produced
 by a recursive relation involving all of the less significant stages, so
 the time for producing the proper carry into all stages is the time
 necessary to allow the carry signals to pass down the carry chain. In the
 worst case this may require a number of sequential operations equal to the
 number of digits in a word. Such a sequential process is not fundamentally
 necessary but is used to decrease the circuit element count. At the expense
 of increasing the element count, the carry function into any stage of the
 adder may be generated in a more parallel fashion. Since the carry into a
 given stage is present if a carry is originated at some stage toward the
 least significant end of the adder and if there is at least a single 1
 at each intervening stage, a separate high speed carry generation circuit
 may be made for each stage. Unfortunately it becomes impractical to carry
 this process to completion for any large adder because of the huge number
 of switching elements required. A practical compromise does exist, however.
 If the complete adder be divided into sections of several digits, all of
 the carries within each of these smaller sections may be generated by the
 parallel method. Then a sequential carry process may be used which
 propagates the carries from section to section only. It has been found that
 a high speed carry circuit for 10 digits may be constructed for about 75
 tubes. In the case of the Illiac, four of these would be required. The
 maximum length of carry chain then would be through a total of eight
 sequential logical elements, as compared to eighty in the present system.
 As the sequence is shorter, all, or nearly all, amplifiers may be eliminated
 to yield a really high speed carry which is not slowed down by amplifier
 stages. Estimates of speed indicate that with fairly standard circuits,
 the carry might be completed in 0.1 microsecond. By redesigning the adder
 proper the propagation time straight through the adder may be made much
 smaller than in the Illiac so that an adder of 40 digits could complete
 an addition in about 0.1 microsecond, the propagation through the adder
 yielding a negligible increment of time over the bare carry time.
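
 The sectioned carry scheme may be illustrated in software. The
 following is a minimal sketch under stated assumptions: digits are held
 least significant first, a carry is "originated" at a stage when both
 addend digits are 1, and an intervening stage passes a carry when it
 holds at least a single 1. The function names and the 40 digit, 10 digit
 per section defaults are chosen for illustration; the sketch models the
 logic only and says nothing of tube counts or timing.

```python
def all_ones(bits):
    """1 if every digit in bits is 1 (also 1 for an empty run), else 0."""
    return int(all(bits))


def section_carries(a_bits, b_bits, carry_in):
    """Carries into every stage of one section, each formed directly.

    Returns (list of carries into each stage, carry out of the section).
    Each carry depends only on the section inputs, not on a neighbouring
    carry, which is the parallel generation described in the text.
    """
    originate = [x & y for x, y in zip(a_bits, b_bits)]
    passes_on = [x | y for x, y in zip(a_bits, b_bits)]
    size = len(a_bits)
    carries = []
    for stage in range(size + 1):
        c = carry_in & all_ones(passes_on[:stage])
        for j in range(stage):
            c |= originate[j] & all_ones(passes_on[j + 1:stage])
        carries.append(c)
    return carries[:size], carries[size]


def sectioned_add(a, b, width=40, section=10):
    """Add two width digit numbers; carries pass serially between sections."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    total, carry = 0, 0
    for start in range(0, width, section):
        sa = a_bits[start:start + section]
        sb = b_bits[start:start + section]
        carries, carry = section_carries(sa, sb, carry)
        for i, (x, y, c) in enumerate(zip(sa, sb, carries)):
            total |= (x ^ y ^ c) << (start + i)
    return total, carry


print(sectioned_add(2**40 - 1, 1))  # worst case: carry runs the full length
```

 With four sections of ten digits the only serial dependence left in
 this model is the hand-off of one carry from each section to the next.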
 
 In the process of performing a multiplication some considerable
 advantage may be taken of parallel adders to further increase the speed of
 this process. If it be desired to multiply by n digits of the multiplier
 at a time, n adders may be used. They are arranged as shown in Figure 4
 for a four adder array.
 
 Figure 4. Multiplication Using Four Adders
 (blocks: 1st adder, 1st partial product, 2nd adder, 2nd partial product,
 3rd adder, 3rd partial product, 4th adder)
 
 The time taken to do a 4 by 40 bit multiplication in this way is
 approximately 0.11 microsecond if addition takes 0.1 microsecond. Thus a
 forty digit multiplication may be carried out in (10)(4 bit multiplication
 time) + (10)(shift time). This means that with a 0.1 microsecond shift time
 the multiplication time will be about 1.1 + 1 = 2.1 microseconds. This is in
 the desired speed range. It may be shown that this four adder circuit will
 cost in the neighborhood of 2500 double triodes. It is rather doubtful that
 the given speed could be approached even approximately with more or less
 serial modes of very high speed circuitry, even with an infinite
 expenditure of tubes. It may be that the use of 2500 tubes is not practical,
 but this system indicates that higher speeds are more readily achievable
 by the use of increased parallelism than with brute force speed circuitry
 with the normal amount of parallelism.
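
 The procedure of multiplying by four multiplier digits at a time, and
 the timing estimate that follows from it, may be sketched as below. This
 is an illustration under assumptions rather than a description of the
 actual circuit: the function names are hypothetical, the cascade of
 adders is modelled by ordinary integer addition, and a left shift of the
 multiplicand stands in for the relative displacement of operand and
 partial product between adders and between passes.

```python
def multiply_in_groups(multiplicand, multiplier, digits=40, adders=4):
    """Multiply by `adders` multiplier digits per pass.

    Returns (product, number of passes through the adder array).
    """
    accumulated = 0
    passes = 0
    for base in range(0, digits, adders):
        for k in range(adders):  # the k-th adder of the cascade
            if (multiplier >> (base + k)) & 1:
                accumulated += multiplicand << (base + k)
        passes += 1  # one pass, then one shift of `adders` places
    return accumulated, passes


def estimated_time_us(passes, t_pass_us=0.11, t_shift_us=0.1):
    """(passes)(pass time) + (passes)(shift time), in microseconds."""
    return passes * t_pass_us + passes * t_shift_us


product, passes = multiply_in_groups(123456789, 987654321)
print(product == 123456789 * 987654321, passes)
print(f"about {estimated_time_us(passes):.1f} microseconds")
```

 The default times of 0.11 and 0.1 microsecond are the figures quoted in
 the text for one pass through the four adders and for one shift.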
 
 III. Summary
 
 Because of the above considerations it is felt that it is more
 profitable to expend tubes (or transistors) to increase the parallelism of a
 computer rather than to use them for increasing the individual circuit
 speeds beyond a certain point. This conclusion is based on the following
 considerations.
 
 1. When the active elements are not being pushed for all possible 
 speed, it is possible to provide more circuit reliability since not so 
 much effort needs to be expended in cutting tolerance corners to get speed. 
 
 2. After a certain limiting speed is reached the additional 
 expenditure in elements rises much faster than the increase in speed. This 
 point of diminishing returns seems to occur, for vacuum tube circuits, when 
 the gating time has been reduced to the 50 to 100 milli-microsecond region. 
 
 3. When high circuit speeds are used, problems of circuit stability
 and propagation time within the machine become more unmanageable, often 
 causing it to be necessary to employ excessively complex and expensive 
 construction techniques to secure any degree of success. 
 
 As a result of this situation, it is proposed that circuit speeds
 be increased to the point where further increases sacrifice too much in
 tolerance and components to be profitable. Beyond this point some of the
 many methods of parallelism may be employed to gain further effective
 computation speed.
 
 
 J. M. Wier 
 
 JMW/MGE 
 
 1. R. K. Richards, "Arithmetic Operations in Digital Computers",
 D. Van Nostrand, 1955.
 