Hll unll HHBi 1 IHHi llKr lH H 1111111 — msssm 9& villi m HHHH1 ■BHHBil ggg illilSSiI W®m Mil lipHiflil ffigSm WW®. LIBRARY OF THE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 510.84 IJHor "0.698-702. cod. 2- t The person charging this material is re- sponsible £or its return to the library from which it was withdrawn on or before the Latest Date stamped below. Theft, mutilation, and underlining of books are reasons for disciplinary action and may result in dismissal from the University. UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN RECTJ L161 — O-1096 u , *jti Report No. UIUCDCS-R-75-700 NSF - OCA - GJ -36936 - 000008 A PARALLEL QR-ALGORITHM FOR TRIDIAGONAL SYMMETRIC MATRICES by Ahmed H. Sameh and David J. Kuck February 1975 DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN URBANA, ILLINOIS Digitized by the Internet Archive in 2013 http://archive.org/details/parallelqralgori700same * A Parallel QR-Algorithm for Tri diagonal Symmetric Matrices by Ahmed H. Sameh and David J. Kuck This work was supported in part by US NSF GJ-36936; and in part by the Advanced Research Project Agency of the Department of Defense under Contract No. DAHC04-72-C-0001 . ** Department of Computer Science and the Center for Advanced Computation, University of Illinois, Urbana, Illinois 61801. *** Department of Computer Science University of Illinois, Urbana, Illinois 61801. Abstract We show that if the size of the tridiagonal matrix in any given iteration is n, then the parallel QR-algorithm requires 0(log ? n) steps with 0(n) processors per iteration and no square roots. This results in a speedup of 0(n/log 2 n) and an efficiency of 0(l/log ? n). Keywords : Givens reduction Linear algebra LU-decomposition Parallelism AMS(MOS) Subject Classification Primary 68A20 Secondary 65F15 1 . Introduction Reinsch [1] proposed a stable QR algorithm for finding the eigen- values of symmetric tri diagonal matrices that requires no square roots. In this paper we present a version of his algorithm that is suitable for parallel computers. We give upper bounds on the time and number of proces- sors required, per iteration. Throughout this paper we assume that any number of processors may be used at any time, but we will give bounds on this number. Each processor can perform any of the four arithmetic operations in one time step, and there are no memory or data alignment time penalties. We also give the following definitions. If p >^ 1 processors are used, we denote the computa- tion time by T . The speedup of a parallel algorithm which uses p processors over a serial algorithm is defined by S = T,/T >_ 1, and the corresponding efficiency of the computation is given by E = S /p <_ 1. We show that if the size of the tri diagonal matrix in any given iteration is n, then the parallel QR algorithm requires 0(log 2 n) steps per iteration (and no square roots) using 0(n) processors. This results in a speedup of 0(n/log 2 n) and an efficiency of 0(l/log ? n). This algorithm depends on a parallel scheme developed by Chen and Kuck [2], for solving linear triangular systems. 2. The Algorithm One iteration of the QR algorithm can be expressed in the form [3], Q(A-kl) = R A = RQ t (1) where k is an origin shift, Q is orthogonal, R is upper triangular, and we assume that the tridiagonal matrix A is of order n = 2 m , for a positive integer m. Note that the tridiagonal matrix A is similar to (A-kl) rather than A. Let the diagonal elements of (A-kl) be denoted by a. (i = 1, 2, ..., n) and the off-diagonal elements by b. (i = 2, 3, ..., n). The orthogonal matrix Q is chosen as the product of (n-1) plane rotations, i.e., n 1 4- 1 Q = p n -i P 2 P V where p i rotates rows j and j+1 annihilating the element in the position (j+1, j), pj+l row j -s Therefore, if P, P* P^ (A-kl) is given by, L 3 ^3 Pr-1 Y+l 'r+1 c r-l b r+l 'r+1 Y+2 then it follows by induction that, c r = g r /P r s r = W p r r = 1 , 2, .. . , n - 1 Vl " c r-l c r b r+l + s r a r+l (2) where , 9 r = c r-1 a r - c r _ 2 s r-1 b r r = 1, 2, ..., n (3) in which c Q = 1, s Q = 0. From (2) and (3) we obtain the recurrence relation, r = 2, 3, ..., n - 1 (4) cp=c 1 a-c s,b r K r r-1 r r-2 r-1 r Let w = c n p., then (4) reduces to the second order linear recurrence r r i=l ] relation, w Q = 1, W] - 0l w r = a r Vl - b r w r _ 2 r = 2, 3, ..., n - 1 Solving (5) is equivalent to solving the lower triangular system of equations, Lw = e 1 where, w = (w Q , w^ ..., w n _ 1 ) , and ij " a j ii 1 = J i = j + 1 i = j + 2 otherwise 2 2 2 2 Using the identity c + s = 1, and defining z = n p. , r r r r ^ l n - 1 , we obtain the first order linear recurrence relation 2 2 , z = w = ] = 1, 2, 2 - k2 2 . 2 2 r " b r + l 2 r-l + w r r = 1, 2, ..., n - 1 This is equivalent to the lower triangular system, Lz = w where, ~t 2 2 2 w = (w Q , w r . .., w n-1 ) , ~t _ / 2 2 2 x Z VZq> Z] » • • • » Zp_] / > r 1 1 - J and tlj = -b? i = j + 1 otherwise Note that the squares of the subdi agonal elements rather than the elements themselves are the given data. So squaring these elements of the original matrix will cost only one step and (n-1) processors, which is negligible compared to the time of finding all the eigenvalues of the tri diagonal matrix. The linear systems (6) and (8) are solved sequentially. Chen and Kuck [2] have shown that any m-th order linear recurrence system of n equations, or any unit lower triangular system of bandwidth m + 1, can be solved in 1 2 T p = (2+log 2 m)log 2 n - ■ 2 -( lo 9 2 m+ ' lo 92 m ) 2 steps with p = 0(m n) processors. Specifically, the system (6) can be solved in (31og 2 n - 1) steps with no more than 4n processors. Once w is obtained, forming the right-hand side w in (8) requires one step with (n-1) processors, and the system itself can be solved in 21og 2 n steps with (n-1) processors. Thus, solving (6) and (8) requires T = 51og 9 n steps Pi L with p, = 4n processors. From w and z we can obtain, A - 0, £.. > 0, £.. <_0 i / j), will have low relative error in the solution no matter how ill-conditioned L may be. a j * W M w j Z J (b j + i + z j z m> * Vi Vi <4 z ;V 2 x-1 b ! +1 ""j + i ' j j + i z J' (z j z J-i> j = 1, 2, ..., n - 1 j = 1, 2, .... n - 2 n-2 w n-l 'n-1 Figure 1 REFERENCES [1] C. H. Reinsch, "A stable, rational QR algorithm for the computa- tion of the eigenvalues of an Hermitian, tri diagonal matrix," Math. Comp., v. 25, 1971, pp. 591-597. MR 45 # 4621. [2] S. C. Chen, and D. J. Kuck, "Time and parallel processor bounds for linear recurrence systems," to appear IEEE Transactions on Computers. [3] H. Bowdler, R. S. Martin, C. Reinsch, and J. H. Wilkinson, "The QR and QL algorithms for symmetric matrices," Numer. Math., v. 11, 1968, pp. 293-306. [4] P. E. Gill, G. H. Golub, W. Murray, and M. A. Saunders, "Methods for modifying matrix factorizations," Math. Comp., v. 28, 1974, pp. 505-535. [5] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, 1965. MR 32 # 1894. [6] J. H. Wilkinson, "Error analysis of direct methods of matrix inversion," J. ACM, v. 8, 1961, pp. 281-330. iriiOGfiAPHIC DATA UIUCDCS-R-75-700 3. Recipient's Accession No. hi it k \LLEL QR-ALGORITHM FOR TRIDIAGONAL SYMMETRIC MATRICES 5- Report bate February 1975 h. Sameh and David J. Kuck 8. Performing Organization Rept N '"UIUCDCS-R-75-700 . mization \ame and Address iversity of Illinois at Urbana-Champaign lartment of Computer Science ana, IL 61801 10. Project Task Vtork Unit No. 1 1. ( ontract Grant No. US NSF-GJ-36936 , mization Name and Address ional Science Foundation gton, D. C. 20550 13. I ypc ol Report & Period ( overed Technical Report 14. We show that if the size of the tridiagonal matrix in any given iteration -. then the parallel QR-algorithm requires 0(log 2 n) steps with 0(n) processors iteration and no square roots. This results in a speedup of 0(n/log ? n) and an iciencv of 0(l/log ? n) . im