LIBRARY OF THE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAICN 510.84 1463 c no.51-GO •;•. ENGINEERING AUG 51976 The person charging this material is re- sponsible for its return to the library from which it was withdrawn on or before the Latest Date stamped below. Theft, mutilation, and underlining of books are reasons for disciplinary action and may result in dismissal from the University. UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN o OCT is OCT 1: MAY m 7 ■ ■■ ■ 31990 FEB 27 mt MM 5 lot ••. . ! • L161 — O-1096 ENGINEERING LIBRARY UNIVERSITY OF ILLINOIS URBA NA. ILLINOIS URBANA, ILLINOIS 61801 CAC Document No. 57 ON JACOBI AND JACOBI-LIKE ALGORITHMS FOR A PARALLEL COMPUTER By Ahmed H. Sameh December 11, 1972 Digitized by the Internet Archive in 2012 with funding from University of Illinois Urbana-Champaign http://archive.org/details/onjacobijacobili57same CAC Document No. 57 ON JACOBI AND JACOBI-LIKE ALGORITHMS FOR A PARALLEL COMPUTER By Ahmed H. Sameh Center for Advanced Computation University of Illinois at Urbana-Champaign Urbana, Illinois 61801 December 11, 1972 This work was supported in part by the Advanced Research Projects Agency of the Department of Defense, was monitored by the U. S. Army Research Office-Durham under Contract No. DAHCOU-72-C-OOOl, and appeared in Mathematics of Computation, July, 1971, Vol. 25, No. 115- IMGINEEJUNG LiBKAR* ABSTRACT Many existing algorithms for obtaining the eigenvalues and eigenvectors of matrices would make poor use of such a powerful parallel computer as the ILLIAC IV. In this paper Jacobi's algorithm for real symmetric or complex Hermitian matrices, and a Jacobi-like algorithm for real non- symmetric matrices developed by P. J. Eberlein, are modified so as to achieve maximum efficiency for the parallel compu- tations. TABLE OF CONTENTS Page 1. Introduction 2. Jacobi's Algorithm 3. A Jacobi-Like Algorithm for Non-Symmetric Matrices .... 11 k. Acknowledgement 22 APPENDIX . ' 23 REFERENCES 2h 1. Introduction . With the advent of parallel computers, the study of computationally massive problems became economically possible. Such problems include, for example, solution of sets of partial differential equations over sizable grids, and multiplication, inversion, or determination of eigenvalues and eigenvectors of large matrices. An example of a parallel computer is the ILLIAC IV . This computer is essentially an array of coupled arithmetic units driven by instructions from a common control unit. Each of the arithmetic units, called processing elements (PE's), have 20^+8 words of 6^-bit memory with an access time under ^20 nanoseconds. Each PE is capable of 6U-bit floating point multiplication in about 550 nanoseconds . Two 32-bit floating point operations may be per- formed in each PE in approximately the same times . The PE instruction set is similar to that of conventional machines with two exceptions. First, the PE's are capable of communicating data to four neighboring PE's by means of routing instructions. Second, the PE's are able to set their own mode reg- isters to effectively disable or enable themselves. For more detailed description of this system the reader is referred to [2, 8, 9, 12]. The purpose of this paper is to introduce modified Jacobi and Jacobi-like algorithms for the computation of the eigenvalues and eigen- vectors of large real symmetric or complex Hermitian matrices, and real non-symmetric matrices respectively, that are suitable for a parallel computer. 2. Jacobi's Algorithm. In the classical method of Jacobi (l8^6), [13] * a real symmetric matrix is reduced to the diagonal form by a sequence of plane rotations A, .. = R,iR, (k = 1,2,...,), where A = A is the original matrix and each rotation R = R(p,q,cr ) in the p, q plane through an angle ct eliminates the off -diagonal element a^ (and hence a ), and affects pq pq qp " only elements in rows and columns p and q. (See Appendix for the appropriate value of Or to annihilate the element a .) Because of symmetry only the off- pq pq diagonal elements above the main diagonal are considered in what follows. It is possible, however, to modify the present Jacobi algorithm for a parallel computer so as to eliminate more than one off -diagonal element. For example, for a matrix A of order k, if the orthogonal transformation R is chosen as, s., (2.1) R = 1 1 '2 where c. = cos Ct. , s. = sin a. (i = 1,2), then R A R would have zero elements 1 11 x ' ' ' in positions (1,3) and(2,U) provided that the angles C£ and a o are properly chosen, a and a o are determined by (a n , , a_~, a__) and (a__, a, , , a~, ) 12 J v 11' 33' 13' v 22 kk' 2V respectively. Define m by [(n + l)/2], where n is the order of the matrix A and [x'J is the greatest integer less than or equal to x. Let each (2m - l) orthogonal transformations be denoted by a sweep. Observing that there are n(n - l)/2 off -diagonal elements, and that the maximum number of these elements which can be annihilated by an orthogonal transformation of the type (2.1) is ['n/2"], then the modified Jacobi algorithm will attain maximum efficiency of parallel computation if the following two conditions are satisfied: (i) each orthogonal transformation R should "be constructed so as to annihilate [n/2] off-diagonal elements. (ii) each sweep should annihilate each off -diagonal element only once; i.e., each of the (2m - l) orthogonal transformations in a sweep should annihilate different [n/2] off -diagonal elements. Several annihilation regimes that satisfy the above requirements are possible. Two different regimes are discussed below. First Annihilation Regime . For a given sweep each of the (2m - l) orthogonal matrices R consists of the elements, sin a v ' p < q , Pq sin a> ' p > q, pq. where p and q are sequences defined by (a) for k = 1,2,...., m - 1, q=m-k+l,m-k+2, , n-k, f(2m -2k+l)-q, m-k+l 2m-k-l 2k . Let n = 8 and k - 3, then the pairs (p,q) are {(5,2); (7,k); (1,6); (3,8)} and R^ is of the form ,(3) 11 t 1 ,(3) l i6 ,(3) 22 I ,(3) 25 ,(3) l 33 1 •(3) : " R 25 " ii 1 1 if 3) 1+7 R (3) R l6 ,(3) ! ^55 ; ;(3) { 66 -R (3) ^7 77 -R J(3) 38 ,(3) v 38 3) 8 In order to construct the orthogonal transformations R for k = n/2 + 1, n/2 + 2,..., n-1; consider the sequence L = 1,2,..., y - 1. 7-L-l each value, of L there are N = 2 / " orthogonal matrices R given by, For k (2.6) R k = diag ( H { k) , H^ k) ,...., H\[ k) ) L-l -L where t = 2 , k = n(l-2 ) + I, and i = 1,2,..., N. The sequences p and q for each HA ', (M = l,2,..,t), are defined by. where, p = i + 1+N (M-l) , q = p + 2(N+i-l) - 2N[0(1)] , i = 1,2,. ..,21, , i + 2 (M-l) < UN 0(1) 1 » otherwise . Let n =8, L =2, and I = 1, then k = 7, and the pairs (p,q) are given by {(1,3); (2,10 j (5,7); (6,8)} and 1^ is of the form, r R (7) 11 i in. 13 ,(7) 13 ,(7) 22 K (7) K 33 -R (7) 2k' >(7) V 2U t (7) R (7) R 55" 57 n(7) >(7) v 57 ,>J)____L_. R (7) ? (7) 77 ,(7) 1 The pattern of the annihilated elements in one sweep for a matrix of order 16 is shown below, where those elements annihilated in the k-th transformation are denoted by the integer k. * 1 QJ) 2(iJ 3(19 k(J) 5(19 6(iij 7(1 * 8 © 7© 6 © 5 © 4 © 3 @ 2 © * 1© 2© 3© U© 5© 6© 7 * 8© 7© 60 5© h@ 3© * 1© 2© 3© i+O 5© 6 * 8© 7© 6© 5© O * 1© 2© 3© M© 5 * 8© 7© 6© 5© * l(B) 2(11) 3(lty *+ * 8© 7© 6© * 1© 2© 3 * 8© 7© * 1© 2 * 8© *- 1 10 Using one quadrant of the ILLIAC IV, (6k PE's), then for a 128 x 128 matrix, the 6^4- angles of each transformation are determined sim- ultaneously, one angle per PE. Once the transformation matrix R is deter- mined, the matrix A, -, = R A, R is computed in parallel [7]. Assuming that the matrix has converged, (using some criterion [13]), to the diagonal form after u sweeps, or after r - 1 = u (2m-l) orthogonal transformations, then the diagonal elements of A = WAW are taken to be the eigenvalues of A. The columns of ¥ = (V V V, ) are the corresponding eigen- x u u-1 1 to & vectors, where for the j-th sweep V. = n (R ) ., (j = 1,2,. ..,u). J k=l K J A similar algorithm as that described above [11] has been programmed in ILLIAC IV assembly language and successfully tested on an ILLIAC IV execution simulator [1]. 11 3. A Jacobi-Like Algorithm for No n symmetric Matrices . Eberlein [3,k] showed that for an n x n matrix A, complex in general, there exists a matrix U = II. U, (k,m) generated from a sequence of two dimensional transformations U„ (k,m), where (k,m) is the pivot pair, such that A T = U~" A U is arbitrarily close to being normal) i.e., the matrix (A T A - A T A ) is arbitrarily small. At each stage of the iteration, based on the elements of the k-th and m-th rows and columns, the para- meters of U „ were chosen such that the decrement of the Euclidean norm I of A» is given by, N 2 ^) - ^(U^Ug) > [l/3n(n-l)]. N^A* - A*A^) where, if (A) = Z la. . I . In this paper, the above algorithm has been modified for parallel computation. The transformations U, are n-dimensional, and their parameters are based on all the elements of the matrix A.. A lower bound on the decrement of the Euclidean norm of A„ is given by, H 2 ^) - ^(U^Ug) > (lAn) ^(A £ A* - A*A^). Once the matrix is practically normal, one can use the optimal procedure of Goldstine and Horwitz [5] for reducing it to the diagonal form, thus the eigenvalues and eigenvectors of A are obtained. Since a nondiagonable matrix cannot be similar to a normal matrix, then this procedure yields its best results for diagonable matrices, (see example 7 in [3], P- Qk) . Let the original matrix A be real, diagonable, and of an even order n = 2r, (if n is odd A is replaced by diag (A,v) of order n + l), then it can be partitioned as follows, 12 (3.1) A = A ll A 12 A 21 A 22 ^1 \2 A lr A 2r A rr where each siibmatrix A km , (k, m = 1, 2, ..., r), is given by, 3-2) V a 2k-l, 2m- 1 a 2k-l, 2m a a.~, _ 2k, 2m-l 2 k; 2m Let, D km ~ (a 2k-l, 2m-l " a 2k, 2m j ^ 3 ' 3 ^ E km " (a 2k-l, 2m ~2k, 2m-l' - a. V = l-1, 2m + a 2R, 2m-l' 13 and, (3.10 kAa) = E (i? + e 2 ) 1 k,m km fan K 2 {A) - im D km E km Assume also that A has been scaled such that F~(A) < 1, and denote the N 2 ^ t t matrix (A A - A A) by C. -1 Lemma 1. Let A' = Q~ A Q,, where Q, = diag (S , S , . ..,S ), and S = S = . . . = S = S is given by, (3-5) cosh Y sinh Y sinh ¥ cosh ¥ Define Y by, (3.6) tanh h ¥ = -2k 2 (A)/k 1 (A] Provided that k _ (A) > 2 |k (A)|, the following relation holds, (3-7) A^(A) > k 2 (a)/k.(a) 2 V " 1 where £tr(A) = IT (A) - tr(A') is the decrement of the Euclidean of A. norm Proof. The elements of each submatrix A' = S A, S are km km given by Ik a 2k-l,2m-l = a 2k-l,2m-l COsh2 *" a 2k,2m sinh2 * + | E km sinh 2 * a 2k,2m =: - a 2k-l,2m-l sinh ^ + a 2k?2m cosh 2 '* - 1 E^ sinh 2* ! , 2. (3' Q ) a 2k-l, 2m = | D km slnh 2 * + a 2k-l,2m COsh * " a 2k, 2m-l Sinh * 2 2 a ', „ -, = -1 D. sinh 2\lr - a_. , sinh \|/ + a„ _ , cosh \|/ 2k,2m-l ^ km Y 2k-!, 2m Y 2k,2m-l T Therefore, 1,2 sinh2 2 * + "wi™ sinh ^ and consequently, (3.9) ^ 2 (cosh u * - 1} Since N (A) = Z N (A. ), then k,m (3.10) AN 2 (A) = -1 (cosh ky -1) k ± (a) - (sinh k^i) * 2 (a). 2 2 A necessary condition for AN (A) to be an extremum with respect to * is ^ AN 2 (A) = 0, this yields relation (3-6), cnjr tanh ky = -2/c 2 (A)//c 1 (A). From the definition (3.U) it is clear that ^(A) > 2|/c 2 (A)|. Excluding for the time being the case ^(A) = a|fCg(A)|, then the second derivative of ^ 2 (A) with respect to ¥ evaluated for ¥ in (3-6) is given by, (3 11} -8k 1 (A)[1-(Uk|(A)/^(A))] (cosh H) 15 and is less than zero. Thus, for the choice (3-6) of \j/, ZW (A) ach its maximum value, leves (3.12) AN 2 (A) =|/c 1 (A)[l-{l-(^(A)/4(A))}2] which vanishes only if kAA) = 0. Since one is considering the case k (A) > 2 1 k: (A) J then "by the binomial theorem, (3.13) (1 - ^(A)/^(A)) 2 = 1 - i(k/ 2 (A)/Kl(A)) - 1(^(A)/^(A))' and (3-12) yields the relation (3-7). If ^(A) = 2|/< 2 (A)|, then from P 2 - (3.10), £N (A) is given by J/< (A)[l - ((l + tanh kty)/(l - tanh H) 2 }]. — P P Choosing tanh 4i|r = + (l - e )/(l + 6 ), where e is a small number, then o - AKT~(A) - •§■(! - e)/< (A; which is greater than zero. Lemma 2. Let A' = P A P, where P is the orthogonal trans- formation, (3.1>0 P = diag (T , T , , T ) A. c. r in which (3.15) k cos ep k sin rp k •sin rp, ^k cos rn, (k = 1,2, ....,r) 16 Then, if rp is determined "by (3-16) tan 2m. °2k-l,2k-l - °2k,2k 2c 2k-l,2k ,t .t where c. . are the elements of the matrix C = AA -A A, (3.17) Kp(A') > 1_ N 2 (C) 2n Proof. The 2X2 diagonal submatrices C of the matrix C can be expressed as (3.18) o w = l, k,4"*'a 'kk " m=i mk mk ] k = 1,2, ....,r Therefore, (3.19: where, k=l r C - v L kk ~~ k,m = 1 Km km Km Km l" A km ,t .t Km Ion Km •] E. B. km km km km ■D, E. km km -K B. km km Equating the off-diagonal elements of the left and right-hand sides of (3«19)> (3.21) f Z D. E. = - kJa) km km 2 V k,m 2k-l,2k Consequently, if the orthogonal matrix P is chosen such that the off- diagonal elements c^. . _. , for all k, attain their maximum positive 2k-l,2k values; then the inequality (3*17) is achieved. To show that, consider the matrix C - A'A' -A' A'. Since A' = P t A P, then C = P t C P, and the elements of the diagonal submatrices 0' = T C T are given by, 17 C 2k-l,2k = C 2k-l,2k C0S 2cP k + \ (C 2k-l,2k-l " C 2k,2k, } Sln *\ (3 ' 22) C '2k-l,2k-l = C 2k-l,2k-l C ° s2 fD k + C 2k,2k Sln2 \ -°2k-l,2k S±n *\ C '2k,2k = C 2k-l,2k-l Sln2 r \ + C 2k,2k C ° s2 Cp k + C 2k-l,2k Sin *\ and C '2k,2k-1 = C 2k-l,2k Hence, for c ' , n _, to be an extremum (3«l6) must hold. Also for the choice (3«l6) of cp the second derivative of c' with respect to en is given by, (3-23) -(^ 2k . 1>2k ) ««*k where, h = [^.^ * (o^.^^^ - c 2k;2k ) 2 ]*- As a result if cos 2 % is of the same sign as c_. , _. , c ' . n „ attains its maximum value. 2k-l,2k' 2k-l,2k Restricting cp to the interval [0,jt], the elements of T are given by, sin 2 o) k = 1 - (c 2k _^ 2k /h) (3.24) COs2c Pk = | + (G 2k-l,2k/ h) in which sin cp > and cos cp. is of the same sign as (c - c ). K K. which res ^lts in T fc being the identity matrix and hence s* . 2k = 0, then from (3-21) one obtains the inequality (3.25) 4(A>)>1 0^ k=l - 2k 18 2 2 Assuming that Z c' . _. > 1 N (C), then from the fact that the k=l 2k -^ 2k " 2n" Euclidean norm is invariant under orthogonal transformations and from (3.25) one obtains relation (3-17)- From Lemmas 1 and 2 it can be seen that in order to obtain the largest possible value of AN 2 (A), the matrix A should be subjected to the t orthogonal transformation M AM where M is a permutation matrix determined as follows: Let A" -M^M and C" = A" A nt - A nt A", then M is chosen such that each 2x2 diagonal submatrix C" has an element c" kk '2k-l,2k of at least average absolute value of all the off-diagonal elements of C" if any, and/or the difference (c" ) different from zero. For ~2k-l,2k-l 2k, 2k' example, in order to bring the off -diagonal element c- , (u < v), of maximum absolute value in the position (1,2) M is given by I I where I..=I-(e. -e.) (e. -e.) . Essentially I. . AI. . has the i-th and ij i J i J iJ iJ j -th rows and columns of A exchanged. After the matrix A is "prepared" by the transformation M, A' = p A" P will produce a matrix C whose off -diagonal elements r 2 c' ov are of such magnitudes that Z c' is at least equal to k=l ' (l/2n) ^(c; Theorem. Let A = A be a diagonable matrix with an even order -1 n = 2r and IT (A) these transformations are defined as follows: < 1. Let k = U £ A, U^ where U, = M| ^ « . If (i) M, is chosen as discussed above. 19 ii •Pn = diag (T^ } , T { 2 £) , . in which, U) _ cos cp (i) with, -sin cp^ } .CO sm cp. cos cp. I) :/) U) o (i) 2k- l,2k-l 2k, 2k tan 2cd; j = 2 777 2 — k 2 Wj c 2k-l,2k (iii) ^ = diag(S^, *W, .... , 8 (i)) in which, ,(i) cosh \j/ sinh \jr sinh' \j/ cosh \|r . with, tanh 1^ = -2K(Ap/ Kl (A' £ where, A i ■ W WV Then, lim N (C.) = 0. Proof. With no loss of generality assume that M„ = I. By Lemma 2, ^(Aj) > 1, N 2 (C ). From (3-3), (D^ } f + (E^ } ) 2 < SN 2 ^), 2n then (3-^) yields, k (A.) < 2W (A J < 2. Since the Euclidean norm is invariant under orthogonal transformations, then k, (A') < 2, and hence by Lemma 1, an 2 (A ) > £(A')/k .(a;) > 1 1^(0.) l v T Hn" 20 But since IT (A.) is a decreasing monotone function bounded below by Z |\. | , where \. are the eigenvalues of A, ["10], then AN (A.) -* as i -> oo. Hence TT(C«) -» 0, and A- is arbitrarily close to being normal. Let A be a 128 x 128 matrix. Using one quadrant of the ILLIAC IV, (6^4- PE's), the matrix can be stored in memory such that for a given m the 2x2 submatrices A^ (k = 1,2,..., 6^-) are assigned to the m-th D E. Once the matrix C is determined by parallel multiplication and stored in the same way; i.e., the k-th PE contains the submatrix C , the 6k angles d meluttw dmtmm) Research Report 8. author.si (Flrmlnmm: mlddt. Inlti.l. ft n.m») Ahmed A. Sameh «. REPORT DATE December 11, 1972 •a CONTRACT OR GRANT NO. DAHCOU 72-C-OOOl b. PROJECT NO. ARPA Order No. 1899 tm. TOTAL NO. OF PACES -20. 7b. NO. OF REFS 13 •a. ORIGINATOR 1 * REPORT NUMBER!** CAC Document No. 57 Vb. OTHER REPORT NO<*> (Any otfiaf nunfc.™ that may b. m»ml0n»d thlm mport) 10 DISTRIBUTION STATEMENT Copies may be requested from the address given in (l) above. II. SUPPLEMENTARY NOTES \2. SPONSORING MILITARY ACTIVITY U.S. Army Research Office-Durham Duke Station, Durham, North Carolina 13. ABSTRACT Many existing algorithms for obtaining the eigenvalues and eigenvectors of matrices would make poor use of such a powerful parallel computer as the ILLIAC IV. In this paper Jacobi 's algorithm for real symmetric or complex Hermitian matrices, and a Jacobi- lifce^ algorithm for real non- symmetric matrices developed. by P. J. Eberlem, are modified so as to achieve maximum efficiency for the parallel computations . DD ^..1473 TTTJnLARRTFTTCD "AS SI irity Cli Security Classification UNCLASSIFIED Security Classification k rv woroi HOLK WT Parallel Computers ILLIAC IV Jacobi ' s Algorithm Jacobi-like Algorithm Orthogonal Transformations Eigenvalues and Eigenvectors Normal Matrix IINCTiASSJFTEP Security Classification . . 6 7 8 10 ■