UNIVERSITY OF ILLINOIS
GRADUATE COLLEGE
DIGITAL COMPUTER LABORATORY

INTERNAL REPORT NO. 67

NOTES ON THE USE OF PARALLELISM IN ULTRAHIGH SPEED COMPUTER DESIGNS

BY
J. M. WIER

December 12, 1955

I. Introduction

In the construction of high speed computational facilities it seems of great importance to produce as much computation speed per tube as possible. As the tubes themselves are intrinsically very fast, a certain basic speed is attainable with nothing but simple tube circuits in more or less conventional forms. This speed may be improved to a certain extent by making the circuits somewhat more powerful, by making some effort to minimize circuit capacities, and by using tubes with higher figures of merit. This simple increase in speed seems to saturate for normal direct coupled circuits at a point where basic transfers take place in the region of 50 to 100 mμsec. It should be stressed that even this speed is attained only with some effort.

In some circuits it is found to be possible to reduce the operation times marginally by considerably increasing the number or power of the driving tubes. This process is far from linear: doubling the number of tubes changes the speed by far less than a factor of two. This is evidently not too satisfactory a solution, for in nearly all cases doubling the tube count would enable one almost to double the effective computation speed were the added tubes used in a second, like circuit. Thus, once the point has been reached where a given number of added tubes does not yield a proportional increase in speed, it would seem that parallelism should be pursued wherever it leads to more or less proportional increases in overall speed for the added expenditure of tubes.
Of course, even this process will eventually saturate, first for a few problems and still later for all sufficiently complex calculations.

The computer which has very high individual circuit speeds does have one field in which it is absolutely supreme. This field includes all problems in which a sequence of very simple operations must be performed, in which not even the least effort in the parallel direction is possible, and in which the result of each step is required before the next step can proceed. Such a problem will be soluble just as rapidly on a serial machine as on a parallel one. As is fairly widely recognized, the class of such problems is rather restricted.

The use of increased parallelism may be shown to be of great use in the performance of certain arithmetic and transfer operations. It has been shown that it is possible essentially to parallel the carry chain so that the carry sequence is at least partially calculated in parallel. This method reduces the carry time to something like 1/10 of the normal time for an expenditure of about as many more tubes as were originally used in the adder. With this scheme, plus a speeding up of the circuits which pass straight through the adder, it has been found feasible to produce an adder which settles down to the correct solution in a time in the neighborhood of 0.1 microsecond after an input number is changed.

By employing a multiplicity of adders, further effective increases in the computation speed result. By using n adders with a settle down time of $T_1$ microseconds plus a transfer time of $T_2$ microseconds, an m digit multiplication may be completed in about

$$T = T_1\left(\frac{m}{n}\right) + T_2\left(\frac{m}{n}\right) = \frac{m}{n}\,(T_1 + T_2) \text{ microseconds.}$$

Some difficulty will be found in bettering or even approaching the speeds achieved by parallelism if more serial methods are used, even with twice the expenditure of tubes.
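The multiplication time estimate just given can be evaluated numerically. The following Python sketch is only an illustration: the function name `multiplication_time` and the sample values (40 digits, 0.1 microsecond for both the settle down and transfer times) are assumptions chosen for the example, not figures fixed by the estimate itself.

```python
def multiplication_time(m, n, t_settle, t_transfer):
    """Approximate time, in microseconds, for an m digit multiplication.

    Evaluates T = (m / n) * (T1 + T2): the multiplier is consumed n digits
    at a time, so m / n passes are needed, each costing one adder settle
    down time (T1) plus one transfer time (T2).
    """
    passes = m / n
    return passes * (t_settle + t_transfer)


# Assumed illustrative values: 40 digit words, T1 = T2 = 0.1 microsecond.
for adders in (1, 2, 4):
    t = multiplication_time(m=40, n=adders, t_settle=0.1, t_transfer=0.1)
    print(f"{adders} adder(s): about {t:.2f} microseconds")
```

Under these assumed values, going from one adder to four cuts the estimated time by a factor of four, which is the proportional gain the argument relies on.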
It could be argued that the speed might be further increased by using faster circuitry here, but the same additional expenditure of equipment may more profitably be used to further increase the parallelism.

II. Discussion of some parallel circuits.

In order to produce increased speed in the registers by parallelism, two courses of action may be taken. One of these is to allow the transfer of information between more than one pair of registers at a time. This allows more than one number to be moved at a time when necessary. A second method utilizes a switching plus gating process for transferring information rather than a gating procedure alone. In the provision of shifting facilities, this process is remarkably useful. By utilizing a multiple depth shift, the time necessary to perform a shift of n places becomes essentially independent of n. The system operates by arranging a set of binary switches in series. The method is illustrated in Figure 1 for four digits. Provision is made in this logical arrangement for a cyclic left shift of up to three places. The two flipflops on the left hold the binary number which specifies the number of shifts to be performed.

Figure 1. Multiple Depth Shift

The circuit of Figure 1 utilizes a multitude of binary switches of the kind shown in Figure 2.

Figure 2. Binary Switch for Multiple Depth Shift

It is evident that the output shown in Figure 2 is equal to input B if F = 0 and is equal to input A if F = 1. At each level the inputs are switched so that they appear at the output displaced by $2^i$ places if the shift digit in the $2^i$ th place of the shift number is a 1, and not displaced at all if that shift digit is a 0. By passing through a series of such circuits the outputs become displaced with respect to the input by a number of places equal to the shift number.
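The multiple depth shift described above can be modelled in a few lines of Python. This is a logical sketch only, not a description of the tube circuit: the names `binary_switch` and `multiple_depth_shift`, the list representation of a register (most significant digit first), and the parameter `n_levels` are assumptions made for the illustration.

```python
def binary_switch(a, b, f):
    """Two input switch: the output is input A if F = 1, input B if F = 0."""
    return a if f else b


def multiple_depth_shift(bits, shift, n_levels):
    """Cyclic left shift through n_levels layers of binary switches.

    bits     : list of 0/1 digits, most significant digit first (assumed layout).
    shift    : shift number, 0 <= shift <= 2**n_levels - 1.
    n_levels : number of switch layers, one per digit of the shift number.

    Layer i displaces the word by 2**i places when digit i of the shift
    number is 1, and passes it undisplaced when that digit is 0.
    """
    m = len(bits)
    word = list(bits)
    for i in range(n_levels):
        f = (shift >> i) & 1      # shift digit in the 2**i place
        step = (1 << i) % m       # displacement contributed by this layer
        # Input A is the displaced digit, input B the undisplaced digit.
        word = [binary_switch(word[(k + step) % m], word[k], f)
                for k in range(m)]
    return word


# Four digit register with two layers, allowing shifts of 0 to 3 places.
register = [1, 1, 0, 0]
for s in range(4):
    print(s, multiple_depth_shift(register, s, n_levels=2))
```

Every signal passes through the same number of layers whatever the shift number, which is why the transit time does not depend on the length of the shift.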
The time of transit through the circuit is practically independent of the number of shifts executed, since the number of elements through which the signals pass is the same in every case. In the circuit illustrated in Figure 1 the output signals pass through a series of four logical elements in coming from the input.

The system shown in Figure 1, when converted to its circuit equivalent in a direct fashion, requires about three tubes for each two input, one output switch. Thus, in order to produce a maximum possible cyclic shift of $(2^n - 1)$ places in a shift register of m digits, a total of 3mn double sided tubes are required. Such a shifting device for a 40 digit shifting register with a maximum shift of 63 places, as in the Illiac, would be quite expensive. Here n = 6 and m = 40, so the tube count would be (3)(6)(40) = 720 tubes. Fortunately a better method exists for producing this two input switch, which requires but one tube instead of three. This circuit is identical to the complement gate in the Illiac, in which the two inputs to be switched appear as high impedance signals at the grids, the switching signals are applied to the plates, and the low impedance output appears at the common cathode of a double triode. The circuit enables the above mentioned shifting circuit to be constructed for mn = (40)(6) = 240 tubes. The switch is shown in Figure 3.

Figure 3. Switch Circuit

A further economy may be realized if the cyclic shift is not required, for then the end around carry portion may be omitted. The number of triode sections required in this case for a maximum shift of $(2^n - 1)$ places in an m digit register is

$$2mn - \sum_{i=0}^{n-1} 2^i .$$

An evaluation for m = 40 and n = 6 gives

$$2(40)(6) - \sum_{i=0}^{5} 2^i = 480 - 63 = 417 \text{ triodes.}$$

This is 208.5 double triodes, a saving of 31.5 tubes from the complete case indicated above.
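The tube counts quoted above can be checked with a short Python calculation. The function names are hypothetical and serve only to label the three cases (three tubes per switch, one double triode per switch, and the non-cyclic shifter counted in triode sections).

```python
def tubes_three_per_switch(m, n):
    """Direct conversion: about three tubes per switch, 3mn in all."""
    return 3 * m * n


def tubes_one_per_switch(m, n):
    """Complement gate switch: one double triode per switch, mn in all."""
    return m * n


def triode_sections_non_cyclic(m, n):
    """Non-cyclic shifter: 2mn triode sections less the sum of 2**i for i < n."""
    return 2 * m * n - sum(2 ** i for i in range(n))


m, n = 40, 6  # 40 digit register, maximum shift of 2**6 - 1 = 63 places
sections = triode_sections_non_cyclic(m, n)
print(tubes_three_per_switch(m, n))                # 720
print(tubes_one_per_switch(m, n))                  # 240
print(sections)                                    # 417
print(sections / 2)                                # 208.5 double triodes
print(tubes_one_per_switch(m, n) - sections / 2)   # 31.5 tubes saved
```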
The speed of operation of a shift using the above process is dependent upon the time required to set up the switch signals and to allow the switched signals to pass through the multilayer switch. Since all of these may be rather fast with fairly low impedances, it is conceivable that a shift of arbitrary length can be completed in something like one microsecond using individual circuits of very little more power consumption than those used in the Illiac. It will be found very difficult to construct a similarly fast device out of so called "high speed" circuits operating in a more serial mode in which one shift at a time is executed, even if as many tubes as are used in the multiple depth shifting circuit be spent on the high speed circuitry. The switch shift system will also be much less critical in its design, since it need not operate so near to the limits of tube (or transistor) speed.

A second place where great gains in speed may be obtained through the selective use of parallelism is in the carry chain of the adder. Since the inherent speed of the adder is limited by the time taken to complete a carry, an improvement in this time will directly affect the time to do an addition. The carry into any stage of a parallel adder is usually produced by a recursive relation involving all of the less significant stages, so the time for producing the proper carry into all stages is the time necessary to allow the carry signals to pass down the carry chain. In the worst case this may require a number of sequential operations equal to the number of digits in a word. Such a sequential process is not fundamentally necessary but is used to decrease the circuit element count. At the expense of increasing the element count, the carry function into any stage of the adder may be generated in a more parallel fashion.
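The difference between the recursive carry relation and a more parallel generation of the carries can be sketched in Python as follows. This is a logical model under stated assumptions, not the circuit itself: the function names are hypothetical, words are lists of 0/1 digits with the least significant digit first, and the usual conditions are assumed that a stage originates a carry when both of its digits are 1 and passes a carry along when at least one of its digits is 1.

```python
from itertools import product


def ripple_carries(a, b, carry_in=0):
    """Carry into each stage by the recursive relation, one stage at a time."""
    carries = []
    c = carry_in
    for ai, bi in zip(a, b):
        carries.append(c)
        # Carry out of this stage: originated here, or passed along from below.
        c = (ai & bi) | ((ai | bi) & c)
    return carries


def parallel_carries(a, b, carry_in=0):
    """Carry into each stage written directly in terms of all lower stages."""
    g = [ai & bi for ai, bi in zip(a, b)]  # stage originates a carry
    p = [ai | bi for ai, bi in zip(a, b)]  # stage passes a carry along
    carries = []
    for k in range(len(g)):
        # The input carry reaches stage k only if every lower stage passes it.
        c = 1 if (carry_in and all(p[:k])) else 0
        # A carry originated at stage j reaches stage k if every stage
        # strictly between j and k passes it along.
        for j in range(k):
            if g[j] and all(p[j + 1:k]):
                c = 1
        carries.append(c)
    return carries


# Exhaustive check that the two formulations agree for four digit words.
for a in product((0, 1), repeat=4):
    for b in product((0, 1), repeat=4):
        assert ripple_carries(a, b) == parallel_carries(a, b)
```

In the second function each carry depends only on the input digits, not on the carry into the stage below, so in hardware all of the carries could be formed at once. The price is that the number of terms grows rapidly with the number of stages.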
Since the carry into a given stage is present if a carry is originated at some stage toward the least significant end of the adder and if there is at least a single 1 at each intervening stage, a separate high speed carry generation circuit may be made for each stage. Unfortunately it becomes impractical to carry this process to completion for any large adder because of the huge number of switching elements required. A practical compromise does exist, however. If the complete adder be divided into sections of several digits, all of the carries within each of these smaller sections may be generated by the parallel method. Then a sequential carry process may be used which propagates the carries from section to section only. It has been found that a high speed carry circuit for 10 digits may be constructed for about 75 tubes. In the case of the Illiac, four of these would be required. The maximum length of carry chain then would be through a total of eight sequential logical elements, as compared to eighty in the present system. As the sequence is shorter, all, or nearly all, amplifiers may be eliminated to yield a really high speed carry which is not slowed down by amplifier stages. Estimates of speed indicate that with fairly standard circuits the carry might be completed in 0.1 microsecond. By redesigning the adder proper, the propagation time straight through the adder may be made much smaller than in the Illiac, so that an adder of 40 digits could complete an addition in about 0.1 microsecond, the propagation through the adder adding a negligible increment to the bare carry time.

In performing a multiplication, considerable advantage may be taken of parallel adders to further increase the speed of the process. If it be desired to multiply by n digits of the multiplier at a time, n adders may be used. They are arranged as shown in Figure 4 for a four adder array.
Figure 4. Multiplication Using Four Adders

The time taken to do a 4 by 40 bit multiplication in this way is approximately 0.11 microsecond if addition takes 0.1 microsecond. Thus a forty digit multiplication may be carried out in

(10)(4 bit multiplication time) + (10)(shift time).

This means that with a 0.1 microsecond shift time the multiplication time will be about 1.1 + 1 = 2.1 microseconds. This is in the desired speed range. It may be shown that this four adder circuit will cost in the neighborhood of 2500 double triodes. It is rather doubtful that the given speed could be approached, even approximately, with more or less serial modes of very high speed circuitry, even with an infinite expenditure of tubes. It may be that the use of 2500 tubes is not practical, but this system indicates that higher speeds are more readily achievable by the use of increased parallelism than with brute force speed circuitry with the normal amount of parallelism.

III. Summary

Because of the above considerations it is felt that it is more profitable to expend tubes (or transistors) to increase the parallelism of a computer than to use them for increasing the individual circuit speeds beyond a certain point. This conclusion is based on the following considerations.

1. When the active elements are not being pushed for all possible speed, it is possible to provide more circuit reliability, since not so much effort need be expended in cutting tolerance corners to get speed.

2. After a certain limiting speed is reached, the additional expenditure in elements rises much faster than the increase in speed. This point of diminishing returns seems to occur, for vacuum tube circuits, when the gating time has been reduced to the 50 to 100 milli-microsecond region.
3. When high circuit speeds are used, problems of circuit stability and propagation time within the machine become more unmanageable, often making it necessary to employ excessively complex and expensive construction techniques to secure any degree of success.

As a result of this situation, it is proposed that circuit speeds be increased to the point where further increases sacrifice too much in tolerance and components to be profitable. Beyond this point some of the many methods of parallelism may be employed to gain further effective computation speed.

J. M. Wier

JMW/MGE

1. R. K. Richards, "Arithmetic Operations in Digital Computers", D. Van Nostrand, 1955.