COMBINATIONAL CIRCUIT SYNTHESIS WITH TIME AND COMPONENT BOUNDS

Report No. UIUCDCS-R-75-775

by S. C. Chen and D. J. Kuck

December 1975

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

This work was supported in part by the National Science Foundation under Grant No. US NSF DCR73-07980 A02.

1. Introduction

This paper discusses some aspects of the relationship between sequential circuits and combinational circuits. Circuit design in both areas has been studied extensively in the past, including efforts to reduce the time and gates required to compute various functions. This paper establishes upper bounds on time and gates, and also provides a systematic procedure for transforming a sequential circuit design into a combinational circuit. The upper bounds on time which we prove are quite good, relative to the best known lower bounds in most cases. We also give gate bounds, which have often eluded detailed analysis in the past. Our gate bounds seem quite sharp relative to the actual numbers found in real logic design examples.
The algorithms we have for transforming sequential circuit designs into combinational ones yield circuits which meet the above-mentioned gate and time bounds. In this sense, we present a uniform design procedure for the realization of any linear sequential machine in combinational circuit form. The advantage of this is that one can often specify the behavior of some desired function quite easily as a sequential circuit, whereas it is somewhat more difficult to translate such a specification into a faster combinational circuit form. A classic example is the ease with which a bit serial adder is specified in sequential form. On the other hand, the design of combinational parallel adders (with various lookahead schemes) occupied many logic designers for some years in the 1950s. The automatic design of a fast parallel combinational adder derived from a bit serial specification is one example of the use of our method.

Not all interesting logic design problems are presented in a sequential form that is linear. As we shall see later, multiplication is an example. While some nonlinear cases can be linearized mathematically, we shall discuss another approach. We will show how nonlinear logic circuits can be used to remove the nonlinearity in the sequential specification. Then, in terms of elements which contain the nonlinearities, we obtain a linear system at a higher level. Our method can then be applied in a straightforward way.

An important question in modern, practical logic design is what to put in one integrated circuit package and then how to synthesize useful circuits using such packages. One of the methods we present deals with what can be regarded as logic design at the integrated circuit package level. We show what logic should be contained in a package and then give a method for interconnecting packages. Again our discussion is centered on transforming given sequential logic specifications into combinational logic in the form of packages.
This is closely related to the subject of the previous paragraph, in the sense that nonlinear logic functions can often be hidden in integrated circuit packages, leaving us with a linear problem at a higher level.

Throughout the paper we illustrate our methods with examples giving gate and time bound coefficients for several practically useful logic design problems, including adders, multipliers, and ones' position counters.

The techniques described in this paper are variations on our earlier efforts to design fast parallel operation computers [1], [2]. There our basic units were adders and multipliers which operated on whole floating-point numbers, while here we are dealing with logic design at lower levels: operations on bits and bytes at the gate and integrated circuit package level. It is important to notice that, mathematically, precisely the same ideas and algorithms are used at all levels; only the details of the technology change. Thus we feel that in attempts to automate the design of general purpose or special purpose machines, one set of underlying ideas may be of general use.

The following definitions and assumptions will hold throughout the paper. An atom is a constant or variable denoted by a lower case letter. In some parts of the paper we will deal with Boolean atoms (which have value 0 or 1) and in other parts we will deal with arithmetic atoms (which represent binary numbers). A dyadic Boolean operator is either a logical "or" or a logical "and". A dyadic arithmetic operator is either an addition or a multiplication operator. We denote these by + and · respectively, in either case. The context will make our meaning clear when necessary, and in some cases the same result will hold in either the Boolean or the arithmetic case. Except as noted in the paper, we assume that all Boolean "not"s and arithmetic subtractions are distributed down to the level of atoms.
In the arithmetic case, this is discussed in [3], while in the Boolean case a similar procedure may be carried out using De Morgan's laws. We do this without loss of generality to simplify our discussion. An expression (Boolean or arithmetic) is a well-formed string consisting of atoms and operators and is denoted by an upper case letter. We write $E\langle e\rangle$, for example, to denote an expression E containing e atoms. The distinction between Boolean and arithmetic atoms and expressions will be clear from the context of our discussion.

We assume throughout the paper that "and", "or" and "not" gates each have one gate delay of unit time. We assume that all "and" and "or" gates have fan-in 2 and fan-out f. By dealing with such stylized gates we are able to compare various designs in elementary terms. If one assumes more complex gates with higher fan-ins, our gate and time upper bounds can obviously be reduced, in general. Another way in which the coefficients in our bounds can be uniformly improved is by ignoring the time required to complement signals. Many circuit families have gates in which both true and complemented outputs are available with no time or cost penalty. To make our bounds conservative and as widely useful as possible, we have not taken advantage of any such features. We emphasize the fact that in practice fan-out is usually greater than fan-in, but fan-out delays may be nonnegligible. We account for fan-out delays and gates in all of our bounds. Thus our results represent a more refined treatment than is usually found in abstract bounds of this type, which often ignore fan-out limitations.

We use the notation $T_G[E]$ to denote the number of gate delays in a circuit which implements expression E using G gates. Similarly, we use the notation $T_P[E]$ to denote the number of processor delays required to compute E using P processors. Throughout the paper we use $\log x$ to denote $\log_2 x$.

2. Combinational Circuits

In this section we discuss gate and time bounds for combinational logic circuits. We give bounds for gates with fan-out f and fan-in 2. After giving some elementary fan-out and combinational fan-in bounds, we present an overall circuit bound. This is expressed in terms of the number of inputs and outputs, and could, for example, be used to bound the gates and time needed for an integrated circuit package.

Throughout the paper, we assume that signals appear from some external source and are returned to some external destination after our operations on them. Effectively we are ignoring the registers from which signals come and to which they are returned. Thus we can count gates and time delays and compose them in a uniform way, without ad hoc accounting procedures at the source and destination of our signals. Our first lemma concerns the fan-out of signals and will be used extensively later.

Lemma 1. An e-way fan-out can be accomplished using gates with fan-out $f \ge 2$ in
$$T_G \le \lceil \log_f e \rceil - 1 \quad \text{with} \quad G \le \frac{e-1}{f-1}.$$

[Figure 1: Signal Fan-out (stage 1, stage 2, ..., signal destinations).]

Lemma 2. An expression $E\langle e\rangle$ of e atoms can be realized using gates of fan-in 2 in
$$T_G[E] \le \begin{cases} 1 + 2d + \lceil \log e \rceil & \text{if } d < \tfrac{3}{2}\log e,\\ 4\lceil \log e \rceil & \text{otherwise,} \end{cases}$$
with
$$G[E] \le \begin{cases} e - 1 & \text{if } d < \tfrac{3}{2}\log e,\\ 2(e-1) & \text{otherwise,} \end{cases}$$
where d is the depth of parenthesis nesting in E.

The proof of this lemma for $d < \frac{3}{2}\log e$ is found in [3]. In most practical expressions, the depth of parenthesis nesting is small, so this provides the best bound. However, if $d \ge \frac{3}{2}\log e$, we use the second half of the lemma, which is proved in [4], where it is also shown that this may be extended to $T_G[E] \le 3\log e$ with $G[E] \le 2.5e$. We have found, however, that for practical purposes a low gate bound is more important than a low time bound. In much of the following we will use Lemma 2, assuming for simplicity that $d < \frac{3}{2}\log e$. Next we define a combinational circuit and then give overall gate and time bounds for such circuits.
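To make the fan-out accounting of Lemma 1 concrete, here is a small Python sketch of our own; it is not the construction of the figure or of the cited proofs. It assumes that the signal source can itself drive f loads, so an e-way fan-out with $e \le f$ costs no gates, and it adds fan-out-f buffer gates one level at a time, using only as many as are still needed. The helper names `ceil_log` and `fanout_tree` are hypothetical.

```python
def ceil_log(e: int, f: int) -> int:
    """Smallest p with f**p >= e, i.e. ceil(log_f e), using integers only."""
    p, reach = 0, 1
    while reach < e:
        reach *= f
        p += 1
    return p


def fanout_tree(e: int, f: int) -> tuple[int, int]:
    """Return (levels, gates) for an e-way fan-out built from fan-out-f buffers.

    Assumption: the source drives f loads directly; each buffer occupies one
    open output slot and provides f new ones, a net gain of f - 1 slots.
    """
    if f < 2 or e < 1:
        raise ValueError("need f >= 2 and e >= 1")
    slots, levels, gates = f, 0, 0
    while slots < e:
        need = -(-(e - slots) // (f - 1))  # buffers still needed (ceiling division)
        b = min(need, slots)               # at most one buffer per open slot
        gates += b
        slots += b * (f - 1)
        levels += 1                        # this level of buffers costs one gate delay
    return levels, gates


# Compare with Lemma 1: T <= ceil(log_f e) - 1 and G <= (e - 1)/(f - 1).
for f in (2, 4, 8):
    for e in (2, 9, 64, 65, 300):
        levels, gates = fanout_tree(e, f)
        print(f, e, levels, ceil_log(e, f) - 1, gates, (e - 1) / (f - 1))
```

Under these assumptions the greedy tree meets the time bound of Lemma 1 with equality and uses $\lceil (e-f)/(f-1) \rceil$ gates when $e > f$, which stays within $(e-1)/(f-1)$.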
Definition 1. A combinational circuit C is defined by
1) A set of inputs $x_i$, $1 \le i \le r$.
2) A set of outputs $y_j$, $1 \le j \le s$, where $y_j$ is defined by an output expression $E_j$ of $e_j$ atoms (representing inputs or complements of inputs) and with parenthesis nesting depth $d_j$.
3) $e = \max\{e_j\}$ is the maximum number of atoms contained in $E_j$, $1 \le j \le s$.
4) $n = \sum_{j=1}^{s} e_j$ is the total number of atoms in all $E_j$'s.
5) $d = \max\{d_j\}$ is the maximum parenthesis nesting depth among all of the output expressions $E_j$.

It is clear that $n \ge s$, and we assume that $n \ge r$, i.e., each input is used in at least one output expression.

Theorem 1. Any combinational circuit C can be realized using gates of fan-in 2 and fan-out f in
$$T_G \le \lceil \log e \rceil + 2(d + \lceil \log_f n \rceil)$$
with
$$G \le \left(1 + \tfrac{2}{f-1}\right)n + \left(1 - \tfrac{1}{f-1}\right)r - s.$$

Proof. First, consider the fan-out of the inputs. Let the i-th input be used $e_i$ times in output expressions. Since we may need to complement the input, we first fan it out to $e_i + 1$ places (the extra one for complementation). By Lemma 1 we need (since we assume each input atom is used at least once)
$$T_{G1} \le \lceil \log_f (n-s+1) \rceil - 1 \le \lceil \log_f n \rceil - 1$$
with
$$G1 \le \sum_{i=1}^{r} \frac{e_i}{f-1} = \frac{n}{f-1}.$$
Now we can complement each input variable in $T_{G2} = 1$ with $G2 \le r$, and fan the complemented variable out, each to at most $e_i$ places. Thus we have
$$T_{G3} \le \lceil \log_f (n-s+1) \rceil - 1 \le \lceil \log_f n \rceil - 1$$
with
$$G3 \le \sum_{i=1}^{r} \frac{e_i - 1}{f-1} = \frac{n-r}{f-1}.$$
Next, we consider the fan-in of the atoms to form the output variables according to the output expressions $E_j$. By Lemma 2 we have
$$T_{G4} \le \begin{cases} 1 + 2d + \lceil \log e \rceil & \text{if } d_j < \tfrac{3}{2}\log e_j,\ 1 \le j \le s,\\ 4\lceil \log e \rceil & \text{otherwise,} \end{cases}$$
with
$$G4 \le \begin{cases} \sum_{j=1}^{s} (e_j - 1) = n - s & \text{if } d_j < \tfrac{3}{2}\log e_j,\ 1 \le j \le s,\\ \sum_{j=1}^{s} 2(e_j - 1) = 2(n-s) & \text{otherwise.} \end{cases}$$
Thus (assuming $d_j < \frac{3}{2}\log e_j$, $1 \le j \le s$), summing the four stages, we have a total of
$$T_G \le \lceil \log e \rceil + 2(d + \lceil \log_f n \rceil)$$
with
$$G \le \left(1 + \tfrac{2}{f-1}\right)n + \left(1 - \tfrac{1}{f-1}\right)r - s.$$
Q.E.D.

Example 1. Suppose we have a 16 pin integrated circuit package which contains only combinational logic.
Assume we can use 7 pins for inputs and 7 pins for outputs, i.e., $r = s = 7$. Assume that we have an average of 4 atoms per output expression, so $n = 4 \cdot 7 = 28$, the maximum number of atoms per expression is $e = 8$, and $d = 2$. Thus a typical output expression may be of the form $y_j = (x_1 + x_2)\cdot(\bar{x}_3 + x_5)$. Let us use circuits with fan-in 2 and fan-out 8. Now for any possible combinational logic with the above characteristics, a package can be designed such that the total package time in gate delays is
$$T_G \le \lceil \log e \rceil + 2(d + \lceil \log_f n \rceil) = \lceil \log 8 \rceil + 2(2 + \lceil \log_8 28 \rceil) = 3 + 2(4) = 11.$$
The total number of gates in any such package is at most
$$G \le \left(1 + \tfrac{2}{7}\right)28 + \left(1 - \tfrac{1}{7}\right)7 - 7 = 35.$$

Example 2. Suppose we have a 48 pin package for large-scale integrated circuits. Let $r = s = 23$, $n = 6 \cdot 23 = 138$, $e = 16$, $d = 3$, and $f = 8$. Now any possible combinational circuit can be realized with
$$T_G \le \lceil \log 16 \rceil + 2(3 + \lceil \log_8 138 \rceil) = 4 + 2(6) = 16$$
and
$$G \le \left(1 + \tfrac{2}{7}\right)138 + \left(1 - \tfrac{1}{7}\right)23 - 23 \approx 174.$$
Thus we see that for realistic assumptions about packages and logical expressions, we obtain gate and time bounds that are of practical interest.

3. Sequential Circuits

In this section we discuss methods of transforming sequential circuits into combinational ones and give time bounds and component bounds on the resulting circuits.

Definition 2. A sequential circuit S is defined at time t by
1) A set of inputs $x_i(t)$, $1 \le i \le r = r_1 + r_2$. We call the $x_i(t)$, $1 \le i \le r_1$, the external inputs, and the $x_i(t)$, $r_1 + 1 \le i \le r$, the feedback inputs.
2) A set of outputs $y_j(t)$, $1 \le j \le s = s_1 + s_2$, where for any logical functions $f_j$,
$$y_j(t) = f_j[x_1(t), x_2(t), \ldots, x_r(t)] = f_j[x_1(t), \ldots, x_{r_1}(t), y_{s_1+1}(t - m_1), \ldots, y_s(t - m_{r_2})],$$
as shown in Figure 2. We call the $y_j(t)$, $1 \le j \le s_1$, the external outputs and the $y_j(t)$, $s_1 + 1 \le j \le s$, the feedback outputs. Note that $r > s$. Each output is defined by an output expression $E_j$ of $e_j$ atoms (representing inputs or their complements). Expression $E_j$
has parenthesis nesting depth $d_j$.
3) $e = \{e_1, e_2\}$, where $e_1 = \max_{1 \le j \le s_1}\{e_j\}$ and $e_2 = \max_{s_1+1 \le j \le s}\{e_j\}$.

[Figure 2: Sequential Circuit. A clocked combinational logic block with inputs $1, \ldots, r_1 + r_2$ and outputs $1, \ldots, s_1 + s_2$; outputs $s_1 + 1, \ldots, s_1 + s_2$ are fed back to inputs $r_1 + 1, \ldots, r_1 + r_2$ through delays $m_1, \ldots, m_{r_2}$.]

4) $n = n_1 + n_2$, where $n_1 = \sum_{j=1}^{s_1} e_j$ and $n_2 = \sum_{j=s_1+1}^{s} e_j$.
5) $d = \{d_1, d_2\}$, where $d_1 = \max_{1 \le j \le s_1}\{d_j\}$ and $d_2 = \max_{s_1+1 \le j \le s}\{d_j\}$.

[...] are derived from any logical functions of the inputs $x_1(t), \ldots, x_r(t)$.

Definition 4. An m-th order linear recurrence system of n equations $R\langle n,m\rangle$ is defined by $x_i = 0$ for $i \le 0$, and
$$x_i = c_i + \sum_{j=i-m}^{i-1} a_{ij}x_j \quad \text{for } 1 \le i \le n,$$
where $1 \le m \le n-1$, and the $c_i$ and $a_{ij}$ are constants. We assume that n and m are powers of 2. If either is not, we choose the next higher power of 2 and apply our bounds and algorithms directly. The solution of this recurrence is the set $\{x_i \mid 1 \le i \le n\}$.

The following lemma forms the basis of much of our subsequent work. We will use it to count gates as well as higher level components such as integrated circuit packages or whole processors. Thus we state the lemma in terms of operations θ, which can be interpreted as logical "or" and "and", or as arithmetic addition and multiplication. When we deal with fan-out, a θ operation at the gate level corresponds to a gate, while at the processor level it corresponds to a register or demultiplexer.

Lemma 3. Any m-th order linear recurrence $R\langle n,m\rangle$ can be solved in
$$T_\theta \le \left(\tfrac{5}{2} + \log m + \tfrac{1}{2}\log_f n\right)\log n - \tfrac{1}{2}(\log^2 m + \log m)$$
with
$$\theta \le \tfrac{1}{2}\left[m^2\left(2 + \tfrac{1}{f-1}\right) + m\left(1 + \tfrac{1}{f-1}\right)\right]n\log n + \left[m^3\left(1 + \tfrac{1}{f-1}\right) - m^2\left(2 + \tfrac{1}{f-1}\right) - \tfrac{m}{f-1}\right]n + 2m^2 + \tfrac{2}{f-1}\log n,$$
where $f = 2^q$, $q \ge 1$.

Proof. Our proof follows the proof of Theorem 2 of [2], and a logical circuit can be constructed following Algorithm 2 of [2]. First, we consider the time required. The computational θ delays follow directly from the time bound for solving an $R\langle n,m\rangle$ system in Theorem 2 of [2].
Thus, for the first part of our time bound, we have from Theorem 2 of [2]
$$T1 \le (2 + \log m)\log n - \tfrac{1}{2}(\log^2 m + \log m).$$
To complete the time bound, we must consider the fan-out time required by Theorem 2 of [2]. Such times were regarded as negligible compared to arithmetic operation times in [2]. The solution of an $R\langle n,m\rangle$ system is generated in $\log n$ iterations. It may be seen from Figure 4 of [2] that on iteration $i = \log k$, we perform at most $(k/2 + m - 1)$-way fan-outs. Thus, by Lemma 1, the fan-out time on iteration i is $\lceil \log_f (k/2 + m - 1) \rceil - 1$. Summing over all iterations, for $k \in K = \{2, 4, 8, \ldots, n\}$, we have (since $f = 2^q \ge 2$)
$$T2 \le \sum_{k \in K}\left(\left\lceil \log_f\left(\tfrac{k}{2} + m - 1\right)\right\rceil - 1\right),$$
and grouping terms, we get
$$T2 \le q\left(1 + 2 + 3 + \cdots + (\lceil \log_f n \rceil - 1)\right) \le q(1 + 2 + 3 + \cdots + \log_f n) = \tfrac{q}{2}\log_f n\,(1 + \log_f n) = \tfrac{1}{2}\log n\,(1 + \log_f n).$$
Thus our total time is $T1 + T2$, or
$$T_\theta \le (2 + \log m)\log n - \tfrac{1}{2}(\log^2 m + \log m) + \tfrac{1}{2}\log n\,(1 + \log_f n) = \left(\tfrac{5}{2} + \log m + \tfrac{1}{2}\log_f n\right)\log n - \tfrac{1}{2}(\log^2 m + \log m).$$

Next, we consider the number of operations required. In the proof of Theorem 2 of [2], we gave expressions for counting the number of processors in evaluating an $R\langle n,m\rangle$ system. Since a tree of n leaves has at most $2n - 1$ nodes, we can upper bound the number of θ operations by doubling the processor count from Theorem 2 of [2]. We choose the worst expression for the processor count on iteration $i = \log k$, namely expression (2) of [2] (the $2m \le k \le n$ case), sum over all iterations $k \in K$, and multiply by 2 to bound the θ operations. Thus, ignoring fan-out for the moment, rearranging terms and summing over all iterations, we obtain
$$\theta1 \le -m^2(2n - 2) + (m^3 - m)n + (m^2 + m)n\log n = (m^2 + m)n\log n + (m^3 - 2m^2 - m)n + 2m^2.$$
As is discussed in [2], the trees we are evaluating are of a special form, with · operations at the leaf nodes and + operations elsewhere. The above sum can be used as an exact count of · operations. But since the trees are somewhat sparse, a more refined count reduces the number of + operations, so our factor of 2 above is too large. By a straightforward but long argument similar to the above, we can show that the θ operation count is actually bounded by
$$\theta1 \le \left(m^2 + \tfrac{m}{2}\right)n\log n + (m^3 - 2m^2)n + m(2m - 1),$$
which we use in the statement of the lemma.

Now we consider the number of fan-out θ operations required. It follows from Theorem 2 of [2] that iteration i requires $(m^2 + m)n/k - m^2$ fan-outs, each fanning out to at most $k/2 + m - 1$ destinations. Thus the total number of operations can be computed using Lemma 1 as
$$\theta2 \le \sum_{k \in K}\left(\frac{(m^2+m)n}{k} - m^2\right)\frac{k/2 + m - 2}{f-1} = \frac{(m^2+m)n}{f-1}\sum_{k \in K}\frac{1}{k}\left(\frac{k}{2} + m - 2\right) - \frac{m^2}{f-1}\sum_{k \in K}\left(\frac{k}{2} + m - 2\right).$$
Summing, we obtain
$$\theta2 \le \frac{(m^2+m)n}{f-1}\left[\frac{\log n}{2} + (m-2)\frac{n-1}{n}\right] - \frac{m^2}{f-1}\left[(n-1) + (m-2)\log n\right] \le \frac{m^2+m}{2(f-1)}n\log n + \frac{m^3 - m^2 - m}{f-1}n + \frac{2}{f-1}\log n.$$
Note that at the gate level these θ operations are gates and are comparable to the gates counted in θ1. At the integrated circuit or processor level, these θ operations correspond to registers or demultiplexers, which are generally less costly than the θ operations of θ1. But to be conservative we count each of them as one operation. Thus our total operation count is $\theta = \theta1 + \theta2$, so
$$\theta \le \left[m^2 + \frac{m}{2} + \frac{m^2+m}{2(f-1)}\right]n\log n + \left[m^3 - 2m^2 + \frac{m^3 - m^2 - m}{f-1}\right]n + 2m^2 + \frac{2}{f-1}\log n$$
$$= \tfrac{1}{2}\left[m^2\left(2 + \tfrac{1}{f-1}\right) + m\left(1 + \tfrac{1}{f-1}\right)\right]n\log n + \left[m^3\left(1 + \tfrac{1}{f-1}\right) - m^2\left(2 + \tfrac{1}{f-1}\right) - \tfrac{m}{f-1}\right]n + 2m^2 + \tfrac{2}{f-1}\log n.$$
Q.E.D.
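The closed forms in Lemma 3 are easy to mistype when applying them, so a small evaluator can help. The sketch below is an illustration under stated assumptions: n, m and f are powers of 2 (as the paper assumes for n and m, and as $f = 2^q$ requires), so every logarithm is an integer and exact `fractions.Fraction` arithmetic can be used. The function names are hypothetical, and the formulas are transcribed from the statements of Lemma 3 and of Corollary 1, which follows; the scanned source of these formulas is partly damaged, so treat the coefficients as a reading of it.

```python
from fractions import Fraction


def ilog2(x: int) -> int:
    """Exact log2 of a power of two (n, m and f = 2**q are powers of two here)."""
    if x <= 0 or x & (x - 1):
        raise ValueError(f"{x} is not a power of two")
    return x.bit_length() - 1


def lemma3_bounds(n: int, m: int, f: int) -> tuple[Fraction, Fraction]:
    """Upper bounds (T_theta, theta) from Lemma 3 for R<n,m> with fan-out f."""
    lg_n, lg_m, q = ilog2(n), ilog2(m), ilog2(f)
    log_f_n = Fraction(lg_n, q)              # log_f n = log n / q since f = 2**q
    c = Fraction(1, f - 1)                   # the recurring 1/(f-1) factor
    t = (Fraction(5, 2) + lg_m + log_f_n / 2) * lg_n - Fraction(lg_m**2 + lg_m, 2)
    theta = (Fraction(1, 2) * (m**2 * (2 + c) + m * (1 + c)) * n * lg_n
             + (m**3 * (1 + c) - m**2 * (2 + c) - m * c) * n
             + 2 * m**2 + 2 * c * lg_n)
    return t, theta


def corollary1_bounds(n: int, f: int) -> tuple[Fraction, Fraction]:
    """Corollary 1 (m = 1), written out separately as a cross-check."""
    lg_n, q = ilog2(n), ilog2(f)
    c = Fraction(1, f - 1)
    t = Fraction(1, 2) * (5 + Fraction(lg_n, q)) * lg_n
    theta = Fraction(1, 2) * (3 + 2 * c) * n * lg_n - (1 + c) * n + 2 * c * lg_n + 2
    return t, theta


t, theta = lemma3_bounds(32, 1, 8)           # 32 equations, first order, fan-out 8
print(t, float(t), theta, float(theta))
```

For a 32-equation first-order system with fan-out 8 this gives $T_\theta \le 50/3$, about 16.7 θ delays. Setting m = 1 reproduces Corollary 1 term by term, and setting f = 2 reproduces the upper bound of Corollary 2.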
The following corollary follows directly from Lemma 3 and covers a case of wide practical interest.

Corollary 1. Any first order linear recurrence $R\langle n,1\rangle$ can be solved in
$$T_\theta \le \tfrac{1}{2}(5 + \log_f n)\log n$$
with
$$\theta \le \tfrac{1}{2}\left(3 + \tfrac{2}{f-1}\right)n\log n - \left(1 + \tfrac{1}{f-1}\right)n + \tfrac{2}{f-1}\log n + 2.$$
Thus we see that for large fan-outs, we can solve any $R\langle n,1\rangle$ system in $T_\theta = O(\log n)$ with $\theta = O(n\log n)$.

Example 3. The $R\langle 8,1\rangle$ system $c_i = 0$, $i < 1$, and $c_i = y_i + x_i c_{i-1}$, [...] system as fan-out ranges from 2 to an arbitrarily high number.

Corollary 2. Any m-th order linear recurrence $R\langle n,m\rangle$ can be solved in
$$(2 + \log m)\log n - \tfrac{1}{2}(\log^2 m + \log m) \le T_\theta \le \left(\tfrac{5}{2} + \log m + \tfrac{1}{2}\log n\right)\log n - \tfrac{1}{2}(\log^2 m + \log m)$$
with
$$\left(m^2 + \tfrac{m}{2}\right)n\log n + (m^3 - 2m^2)n + O(m^2\log n) \le \theta \le \tfrac{3m^2 + 2m}{2}\,n\log n + (2m^3 - 3m^2 - m)n + 2\log n + 2m^2.$$

Proof. The lower bounds follow directly from T1 and θ1 in the proof of Lemma 3, assuming that fan-out time and the fan-out θ count are negligible. The upper bounds follow from Lemma 3 by setting f = 2.

Thus we see that for large fan-outs we can solve an $R\langle n,m\rangle$ system in $T_\theta = O(\log m \log n)$ with $\theta = O(m^2 n\log n)$.

Definition 5. The k step operation of a sequential circuit S is defined by k pairs of vectors $[(x_1(t), \ldots, x_{r_1}(t)), (y_1(t), \ldots, y_{s_1}(t))]$ for $1 \le t \le k$. These vectors represent the external inputs and outputs of S at each time step t.

Theorem 2. The k step operation of any linear sequential circuit S can be realized by a combinational circuit such that for large k
$$T_G \le \tfrac{1}{2}(\log_f s_2 k)(\log s_2 k) + O(\log k)$$
with
$$G \le (m+1)^2 s_2^3\left(2 + \tfrac{1}{f-1}\right)k\log s_2 k + O(k).$$

Proof. Our proof is in three parts. First, we set up the A and b arrays of Definition 4. Then we evaluate the resulting recurrence system. Finally, we generate the external outputs. The A matrix and b vector components can be generated from the external inputs at any of the k time steps. Thus we have a total of $kr_1$ inputs to combinational circuit $C_1$, which produces as outputs the components of A and b.
Since a total of $n_2$ atoms are used in generating all of the feedback outputs of S, there are at most $kn_2$ non-zero components in A and b. The maximum number of atoms in any expression is $e_2$, the total number of atoms is $kn_2$, and the maximum parenthesis depth is $d_2$, so we can set up the A and b arrays with a combinational circuit $C_1$ having $kr_1$ inputs, $kn_2$ outputs, $e = e_2$, $n = kn_2$ and $d = d_2$.

Next we solve the linear recurrence. There are a total of $ks_2$ feedback outputs in k time steps, so $n = ks_2$. Since the maximum delay is $m = \max_i m_i$ time steps with $s_2$ feedback outputs per time step, the bandwidth of this system is at most $(m+1)s_2 - 1$. Thus we have a recurrence of the form $R\langle ks_2, (m+1)s_2\rangle$.

Finally, we generate the external outputs with combinational circuit $C_2$. There are a total of kr inputs and $ks_1$ external outputs. The maximum number of atoms in any output expression is $e_1$, the total number of atoms in all output expressions is $kn_1$, and the maximum depth is $d_1$, so $C_2$ has kr inputs, $ks_1$ outputs, $e = e_1$, $n = kn_1$ and $d = d_1$.

Now we bound the gates and time required for each of these. By Theorem 1, for $C_1$ we have
$$T_{G1} \le \lceil \log e_2 \rceil + 2(d_2 + \lceil \log_f kn_2 \rceil)$$
with
$$G1 \le \left(1 + \tfrac{2}{f-1}\right)kn_2 + \left(1 - \tfrac{1}{f-1}\right)kr_1 - kn_2 = \left(1 - \tfrac{1}{f-1}\right)kr_1 + \tfrac{2}{f-1}kn_2.$$
By Lemma 3, we can solve $R\langle ks_2, (m+1)s_2\rangle$ in
$$T_{G2} \le \left(\tfrac{5}{2} + \log (m+1)s_2 + \tfrac{1}{2}\log_f ks_2\right)\log ks_2$$
with
$$G2 \le \tfrac{1}{2}\left[(m+1)^2 s_2^2\left(2 + \tfrac{1}{f-1}\right) + (m+1)s_2\left(1 + \tfrac{1}{f-1}\right)\right]ks_2\log ks_2 + \left(1 + \tfrac{1}{f-1}\right)(m+1)^3 s_2^3\,ks_2 + O\!\left((m+1)^2 s_2^2\log ks_2\right).$$
By Theorem 1 we have for $C_2$
$$T_{G3} \le \lceil \log e_1 \rceil + 2(d_1 + \lceil \log_f kn_1 \rceil)$$
with
$$G3 \le \left(1 + \tfrac{2}{f-1}\right)kn_1 + \left(1 - \tfrac{1}{f-1}\right)kr - ks_1.$$
Combining the above we have a total time of
$$T_G \le \lceil \log e_1 \rceil + \lceil \log e_2 \rceil + 2(d_1 + d_2 + \lceil \log_f kn_1 \rceil + \lceil \log_f kn_2 \rceil) + \left(\tfrac{5}{2} + \log (m+1)s_2 + \tfrac{1}{2}\log_f s_2 k\right)\log s_2 k.$$
Thus, for a fixed circuit, as we increase the number of operating time steps k, we have
$$T_G \le \tfrac{1}{2}(\log_f s_2 k)(\log s_2 k) + O(\log k).$$
The total gate count is
$$G \le \tfrac{1}{2}\left[(m+1)^2 s_2^2\left(2 + \tfrac{1}{f-1}\right) + (m+1)s_2\left(1 + \tfrac{1}{f-1}\right)\right]s_2 k\log s_2 k + \left[\left(1 - \tfrac{1}{f-1}\right)(r_1 + r) + \tfrac{2}{f-1}(n_1 + n_2) + n_1 - s_1\right]k + \left(1 + \tfrac{1}{f-1}\right)(m+1)^3 s_2^4 k + O\!\left((m+1)^2 s_2^2\log ks_2\right).$$
Thus, for any fixed circuit, as k increases we have
$$G \le \tfrac{1}{2}\left[(m+1)^2 s_2^2\left(2 + \tfrac{1}{f-1}\right) + (m+1)s_2\left(1 + \tfrac{1}{f-1}\right)\right]s_2 k\log s_2 k + O(k)$$
or (since $m \ge 1$ and $f \ge 2$)
$$G < (m+1)^2 s_2^3\left(2 + \tfrac{1}{f-1}\right)k\log s_2 k + O(k).$$
Q.E.D.

Now we turn to the consideration of higher level components as our basic circuit elements. We will define two package types which could be implemented directly using integrated circuits. Our time bounds will be expressed in package delays. The techniques of the previous section could be used to design such packages. Our component bounds will be expressed in terms of the total number of packages required.

Our strategy in this case is to decompose a linear recurrence system $R\langle n,m\rangle$ into a number of small identical systems. These smaller systems can be solved directly by interconnecting the integrated circuit packages we specify. An algorithm to decompose a large $R\langle n,m\rangle$ system has been given in [2], [5] for arithmetic operations. Here we present the algorithm for logic design and consider only the $R\langle n,1\rangle$ case, for ease of explanation. The $R\langle n,1\rangle$ case is by far the most common one occurring in practical logic design, and our method can be extended to larger m in a straightforward way.

Definition 6. We define two types of integrated circuit packages.
a) $IC_{R\langle n,1\rangle}$ is a package which accepts input atoms $c_i$ for $1 \le i \le n$, and $a_i$ for $2 \le i \le n$. It computes the outputs $x_i$ for $1 \le i \le n$ according to the recurrence relation $x_0 = 0$, $x_i = c_i + a_i x_{i-1}$. For signal input and output it has a total number of pins equal to $3n - 1$ times the number of bits per atom.
b) $IC_U$ is a package which may accept input atoms $a_i$ and $b_i$ for $1 \le i \le n$, and c and d. It computes the outputs $x_i$ for $1 \le i \le n$ according to $x_i = v_i w_i + y_i z_i$, where either
i) $v_i = a_i$, $w_i = c$, $y_i = b_i$ and $z_i = d$, or
ii) $v_i = \bar{a}_i$, $w_i = b_i$, $y_i = a_i$ and $z_i = \bar{b}_i$, $1 \le i \le n$.
For signal input and output it has a total number of pins of at most $3n + 2$ times the number of bits per atom.
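A behavioral model of the two package types can help when checking designs built from them. The following sketch is our own (the names `ic_r`, `ic_u`, `pins_r` and `pins_u` are hypothetical) and models only the Boolean case, with + as OR and · as AND. We read part ii of Definition 6b as the exclusive-or $\bar{a}_i b_i + a_i \bar{b}_i$; that is our interpretation of a damaged passage. The pin formulas are the ones stated in Definition 6.

```python
def ic_r(c, a):
    """IC_R<n,1> (Definition 6a), Boolean case: x_0 = 0, x_i = c_i + a_i x_{i-1}.

    c holds c_1..c_n and a holds a_2..a_n (a_1 is not an input since x_0 = 0).
    """
    if len(a) != len(c) - 1:
        raise ValueError("expected n values of c and n - 1 values of a")
    x, out = False, []
    for i, ci in enumerate(c):
        x = ci or (i > 0 and a[i - 1] and x)
        out.append(x)
    return out


def ic_u(a, b, c=None, d=None, part="i"):
    """IC_U (Definition 6b): x_i = v_i w_i + y_i z_i for 1 <= i <= n."""
    if part == "i":       # v_i = a_i, w_i = c, y_i = b_i, z_i = d
        if c is None or d is None:
            raise ValueError("part i needs the scalar inputs c and d")
        return [(ai and c) or (bi and d) for ai, bi in zip(a, b)]
    if part == "ii":      # our reading: v_i = not a_i, w_i = b_i, y_i = a_i, z_i = not b_i
        return [((not ai) and bi) or (ai and not bi) for ai, bi in zip(a, b)]
    raise ValueError(f"unknown part {part!r}")


def pins_r(n, bits=1):
    """Signal pins of IC_R<n,1>: n c's, n - 1 a's and n x's, per bit of an atom."""
    return (3 * n - 1) * bits


def pins_u(n, bits=1):
    """Upper bound on IC_U signal pins: a's, b's and x's plus c and d (part i)."""
    return (3 * n + 2) * bits


print(pins_r(4), pins_r(4, bits=4), pins_u(3), pins_u(3, bits=4))
print(ic_r([True, False, False, True], [True, False, True]))
```

With n = 4, `pins_r(4)` gives 11 signal pins and `pins_r(4, bits=4)` gives 44, and an $IC_U$ with n = 3 likewise gives 11 and 44, the figures used in Example 4.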
In general, we denote the total number of integrated circuits in some logical circuit by IC.

Example 4. An $IC_{R\langle 4,1\rangle}$ has a total of $3 \cdot 4 - 1 = 11$ signal pins if it is to solve a Boolean recurrence. Suppose we are summing the bits in a 16-bit word and will produce a $\log 16 = 4$ bit result. Then 4 bits are required per atom, and an arithmetic $IC_{R\langle 4,1\rangle}$ to solve this problem would need 44 signal pins. An $IC_U$ (with n = 3) for Boolean operations requires $3 \cdot 3 + 2 = 11$ signal pins. An arithmetic package for handling 4 bit numbers would need a total of 44 signal pins.

The following algorithm is adapted from [5] (cf. Ch. 4). It solves any $R\langle n,1\rangle$ system by partitioning it into smaller systems.

Algorithm 1. Any given first-order linear recurrence $R\langle n,1\rangle$:
$$x_0 = 0, \qquad x_i = c_i + a_i x_{i-1}, \quad 1 \le i \le n,$$
can be solved as follows.

Step 1
a) For any $h \ge 2$, compute $n/h$ independent recurrence systems $Z^{(j)}$, $1 \le j \le n/h$, defined as follows:
$$Z^{(j)}:\quad z_0^{(j)} = 0, \qquad z_i^{(j)} = c_i^{(j)} + a_i^{(j)} z_{i-1}^{(j)}, \quad 1 \le i \le h,$$
where $c_i^{(j)} = c_{i+(j-1)h}$ and $a_i^{(j)} = a_{i+(j-1)h}$.
b) Compute $(n/h - 1)$ independent recurrence systems $Y^{(j)}$, $2 \le j \le n/h$, defined as follows:
$$Y^{(j)}:\quad y_0^{(j)} = 1, \qquad y_i^{(j)} = a_i^{(j)} y_{i-1}^{(j)}, \quad 1 \le i \le h.$$
From this step we obtain h elements of the solution of the original system, i.e., $x_i = z_i^{(1)}$ for $1 \le i \le h$.

Step 2
From the results of Step 1, compute the following recurrence system:
$$z_h^{(0)} = 0, \qquad z_h^{(j)} = z_h^{(j)} + y_h^{(j)} z_h^{(j-1)}, \quad 1 \le j \le n/h,$$
where on the right-hand side $z_h^{(j)}$ is the value from Step 1 and $z_h^{(j-1)}$ is the value already computed in this step. From this step we obtain another $(n/h - 1)$ elements of the solution, i.e., $x_{jh} = z_h^{(j)}$ for $2 \le j \le n/h$.

Step 3
From the results of Steps 1 and 2, compute the remaining elements of the solution using the following $n - n/h - (h-1)$ independent expressions:
$$x_{i+(j-1)h} = z_i^{(j)} + y_i^{(j)} z_h^{(j-1)}, \quad 1 \le i \le h-1, \; 2 \le j \le n/h.$$

Lemma 4. Any first-order linear recurrence $R\langle n,1\rangle$ can be solved in time
$$T_{IC} \le 2\frac{\log n}{\log h} - 1$$
using a total package count of
$$IC \le 6\frac{n}{h} + 4\frac{\log n}{\log h} - 7$$
with package types $IC_{R\langle h,1\rangle}$ and $IC_U$
for $h \ge 2$.

Proof. It follows directly from the above algorithm that we need one $IC_{R\langle h,1\rangle}$ type package for each $Z^{(j)}$ and $Y^{(j)}$ in Steps 1a and 1b. This results in $(2n/h - 1)$ packages. In Step 3 we use $(n/h - 1)$ packages of type $IC_U$, corresponding to Definition 6b, part i. We can treat Step 2 as a new $R\langle n/h,1\rangle$ system and apply the same algorithm recursively to solve this system. This implies that we reduce the size of the original system from n to less than or equal to h, following the sequence $n' = n, n/h, n/h^2, \ldots, h$, and finally use one extra $IC_{R\langle h,1\rangle}$ package to solve the residual system. Hence, for each iteration we need $(2n'/h - 1)$ packages of type $IC_{R\langle h,1\rangle}$ and $(n'/h - 1)$ packages of type $IC_U$. Since at most $\log n/\log h - 1$ iterations are required, we have for $IC_{R\langle h,1\rangle}$ type packages a total of
$$IC_R = \left(\frac{2n}{h} - 1\right) + \left(\frac{2n}{h^2} - 1\right) + \cdots + 1,$$
and since $h \ge 2$,
$$IC_R \le 4\frac{n}{h} + 3\frac{\log n}{\log h} - 5.$$
Similarly, for $IC_U$ type packages, we have a total of
$$IC_U = \left(\frac{n}{h} - 1\right) + \left(\frac{n}{h^2} - 1\right) + \cdots \le 2\frac{n}{h} + \frac{\log n}{\log h} - 2.$$
The time bound is obtained from the fact that all packages in the same step of each iteration operate in parallel. Q.E.D.

Example 5. The $R\langle 16,1\rangle$ system $x_i = 0$ for $i \le 0$ and $x_i = c_i + a_i x_{i-1}$ for $1 \le i \le 16$ can be solved by the circuit of Figure 4, which follows directly from Algorithm 1 with h = 4. The packages marked R represent $IC_{R\langle 4,1\rangle}$ types and those marked U represent $IC_U$ types.

For use in a later application, we now consider a special case of an $R\langle n,1\rangle$ system. Let $a_i = 1$ for all i in Algorithm 1. In this case we need not perform Step 1b, so for each iteration only $n'/h$ packages of type $IC_{R\langle h,1\rangle}$ are required. Also, note that all $Z^{(j)}$ are computed in Step 1a by merely summing atoms. Since Steps 2 and 3 require only multiplication by the y's generated in Step 1, which are 1's, no multiplication is required in any package.
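Algorithm 1 and the counting in the proof of Lemma 4 can be exercised in software. The sketch below is a behavioral simulation of our own, not a circuit netlist: each package is modeled by the function it computes over Boolean atoms (+ as OR, · as AND), a dictionary tallies the $IC_{R\langle h,1\rangle}$ and $IC_U$ packages used, and the returned delay count charges one package delay each for Steps 1 and 3 of every iteration plus one for the residual system. It assumes n is a power of h, so every level splits into whole blocks; the function names are hypothetical.

```python
import random


def solve_serial(c, a):
    """Reference evaluation of x_0 = 0, x_i = c_i OR (a_i AND x_{i-1})."""
    x, out = False, []
    for ci, ai in zip(c, a):
        x = ci or (ai and x)
        out.append(x)
    return out


def prefix_and(bits):
    """Y system of Step 1b: y_0 = 1, y_i = a_i AND y_{i-1} (a running AND)."""
    acc, out = True, []
    for b in bits:
        acc = acc and b
        out.append(acc)
    return out


def algorithm1(c, a, h, count):
    """Solve a Boolean R<n,1> system by the block method of Algorithm 1.

    c and a hold c_1..c_n and a_1..a_n (a_1 is never used because x_0 = 0).
    count is a dict tallying "R" (IC_R<h,1>) and "U" (IC_U) packages.
    Returns (x, package_delays). Assumes n is a power of h.
    """
    n = len(c)
    if n <= h:                        # residual system: one IC_R<h,1> package
        count["R"] += 1
        return solve_serial(c, a), 1
    blocks = n // h
    # Step 1a: n/h independent Z systems, one per block of h equations.
    z = [solve_serial(c[j * h:(j + 1) * h], a[j * h:(j + 1) * h]) for j in range(blocks)]
    # Step 1b: n/h - 1 Y systems for blocks 2..n/h (block 1 needs none).
    y = [None] + [prefix_and(a[j * h:(j + 1) * h]) for j in range(1, blocks)]
    count["R"] += 2 * blocks - 1
    # Step 2: the block-end values form a new R<n/h,1> system; solve it recursively.
    c2 = [z[j][-1] for j in range(blocks)]
    a2 = [False] + [y[j][-1] for j in range(1, blocks)]
    ends, t2 = algorithm1(c2, a2, h, count)   # ends[j] is the last x of block j
    # Step 3: the other h - 1 elements of every block after the first (IC_U, part i).
    x = list(z[0])
    for j in range(1, blocks):
        x += [z[j][i] or (y[j][i] and ends[j - 1]) for i in range(h - 1)]
        x.append(ends[j])
    count["U"] += blocks - 1
    return x, 1 + t2 + 1                       # Step 1, Step 2 (recursive), Step 3


random.seed(1)
n, h = 16, 4                                   # the size used in Example 5
c = [random.random() < 0.3 for _ in range(n)]
a = [random.random() < 0.7 for _ in range(n)]
count = {"R": 0, "U": 0}
x, delays = algorithm1(c, a, h, count)
print(x == solve_serial(c, a), delays, count)
```

For n = 16 and h = 4 (the case of Example 5) the simulation uses 8 packages of type R and 3 of type U and finishes in 3 package delays, within the bounds of Lemma 4.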
From this we have

Corollary 3. Any $R\langle n,1\rangle$ system of the form $x_0 = 0$, $x_i = c_i + x_{i-1}$, $1 \le i \le n$, can be solved in time
$$T_{IC} \le 2\frac{\log n}{\log h} - 1$$
using a total package count of
$$IC \le 4\frac{n}{h} + 3\frac{\log n}{\log h} - 4.$$

4. Applications

In this section we will study several practical logic design problems. The methods of Section 3 will be used to derive time and component bounds. We will consider binary addition and ones' position counting in detail. In less detail we will consider binary multiplication, digital filtering and a control problem.

Definition 7. By the addition of two n digit binary numbers $a = a_n \cdots a_1$ and $b = b_n \cdots b_1$ we mean the generation of sum digits $s = s_n \cdots s_1$ and carry digit $c_n$, defined as follows. We write
$$s_i = (a_i b_i + \bar{a}_i \bar{b}_i)\,c_{i-1} + (a_i \bar{b}_i + \bar{a}_i b_i)\,\bar{c}_{i-1}, \tag{1}$$
where $1 \le i \le n$ and $c_0 = 0$, such that $s_i = 1$ iff just one or all three of $a_i$, $b_i$ and $c_{i-1}$ are equal to 1. Also we write
$$c_i = a_i b_i + (a_i + b_i)\,c_{i-1}, \tag{2}$$
where $1 \le i \le n$ and $c_0 = 0$, such that $c_i = 1$ iff any two or all three of $a_i$, $b_i$ and $c_{i-1}$ are equal to 1. Now let
$$x_i = a_i + b_i \tag{3}$$
and
$$y_i = a_i b_i. \tag{4}$$
If we write
$$d_i = a_i \bar{b}_i + \bar{a}_i b_i = \overline{\overline{(a_i + b_i)} + a_i b_i} = \overline{\bar{x}_i + y_i}, \tag{5}$$
then Equation 1 can be rewritten as
$$s_i = d_i\,\bar{c}_{i-1} + \bar{d}_i\,c_{i-1}, \tag{6}$$
and Equation 2 can be rewritten as
$$c_i = y_i + x_i c_{i-1}, \quad 1 \le i \le n, \quad c_0 = 0. \tag{7}$$

Our first result concerns binary addition using gates as components.

Theorem 3. Two $n = 2^t$, $t \ge 0$, digit binary numbers can be added in
$$T_G \le \tfrac{1}{2}(5 + \log_f n)\log n + 4$$
with
$$G \le \left(\tfrac{3}{2} + \tfrac{1}{f-1}\right)n\log n + \left(8 - \tfrac{1}{f-1}\right)n + \tfrac{2}{f-1}\log n + 2.$$

Proof. Our proof consists of three parts.
1) To generate the $x_i$ and $y_i$, $1 \le i \le n$, from $a_i$ and $b_i$ by Equations 3 and 4, we need 2n gates and one gate delay, so $T_{G1} = 1$ and $G1 = 2n$.
2) To generate the $s_i$, $1 \le i \le n$, from $x_i$, $y_i$ and $c_{i-1}$ using Equation 6, we refer to Figure 5. A total of 7 gates are required for each $s_i$, for a total of 7n gates. After $d_i$ and $c_{i-1}$
are available, three gate delays are required. It will be seen in part 3 that the circuit generating the $c_i$, $1 \le i \le n$, from $x_i$ and $y_i$ uses at least $2\log n$ gate delays. So for $n \ge 2$ the two steps required to generate $d_i$ from $x_i$ and $y_i$ take no more than the time required to generate the $c_i$, since $2\log 2 = 2$.

[Figure 5: Sum Generation]

It is easy to verify that the theorem holds for n = 1 by a direct construction. Thus we have $T_{G2} = 3$ with $G2 \le 7n$.
3) To generate the $c_i$, $1 \le i \le n$, from $x_i$ and $y_i$ using Equation 7, we turn to Lemma 3. Since Equation 7 defines an $R\langle n,1\rangle$ system, it follows immediately from Corollary 1 (cf. Figure 3) that
$$T_{G3} \le \tfrac{1}{2}(5 + \log_f n)\log n$$
with
$$G3 \le \tfrac{1}{2}\left(3 + \tfrac{2}{f-1}\right)n\log n - \left(1 + \tfrac{1}{f-1}\right)n + \tfrac{2}{f-1}\log n + 2.$$
Thus we have from parts 1, 2 and 3 a total of
$$T_G = 1 + 3 + \tfrac{1}{2}(5 + \log_f n)\log n = \tfrac{1}{2}(5 + \log_f n)\log n + 4$$
with
$$G \le 2n + 7n + \left(\tfrac{3}{2} + \tfrac{1}{f-1}\right)n\log n - \left(1 + \tfrac{1}{f-1}\right)n + \tfrac{2}{f-1}\log n + 2 = \left(\tfrac{3}{2} + \tfrac{1}{f-1}\right)n\log n + \left(8 - \tfrac{1}{f-1}\right)n + \tfrac{2}{f-1}\log n + 2.$$
Q.E.D.

Next we consider binary addition with integrated circuit packages as components.

Theorem 4. Two $n = 2^t$, $t \ge 0$, digit binary numbers can be added in time
$$T_{IC} \le 2\frac{\log n}{\log h} + 1$$
using a total package count of
$$IC \le 9\frac{n}{h} + 4\frac{\log n}{\log h} - 7$$
with package types $IC_{R\langle h,1\rangle}$ and $IC_U$, for $h \ge 2$.

Proof. The $x_i$ and $y_i$ of Definition 7 can be generated in one package delay using $2n/h$ IC packages. The carries of Equation 7 (cf. Figure 4) can be generated following Lemma 4 in
$$T_{IC} \le 2\frac{\log n}{\log h} - 1$$
using
$$6\frac{n}{h} + 4\frac{\log n}{\log h} - 7$$
packages. Then the sum bits of Equation 6 can be generated in one package delay using $n/h$ packages of type $IC_U$, following Definition 6b, part ii. Summing these counts proves the theorem. Q.E.D.

Example 6. Consider the problem of adding two 32-bit binary numbers using gates with fan-in 2 and fan-out 8. By the method of Theorem 3, the sum can be formed in at most 21 gate delays since $T_G \le \frac{1}{2}(5 + \log_8 32)\log 32 + 4$