LIBRARY OF THE UNIVERSITY OF ILLINOIS AT URBANA-OHAMPAICN aop.fc The person charging this material is re- sponsible for its return to the library from which it was withdrawn on or before the Latest Date stamped below. Theft, mutilation, and underlining of books are reasons for disciplinary action and may result in dismissal from the University. To renew call Telephone Center, 333-8400 UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN L161— O-1006 Digitized by the Internet Archive in 2013 http://archive.org/details/sparsematrixinve443ratl liN Report No. M*3 klUZ COO-ll»69-Ol80 SPARSE MATRIX INVERSION by K. Ratliff June 1971 Report No. kk3 SPAESE MATRIX INVERSION* by K. Ratliff June 1971 Department of Computer Science University of Illinois Urbana, Illinois 6l801 Supported in part by contract US AEC AT(ll-l)lU69 TABLE OF CONTENTS Page 1. Background 1 2. Compile Time and Execution Time Processes 3 3. Pivot Selection Algorithm 5 4 . The Inversion Process 8 5. The Symbolic Inversion 10 6 . Register Optimization 15 7. Tests 18 APPENDICES I. DIAGRAM OF SUBROUTINE INTERACTION 20 II. DESCRIPTION OF ARRAYS 22 III. FLOVJCHARTS OF SPARSE AND COMPIL .27 IV. CALLS TO COMPIL 34 V. CODE GENERATED FOR VARIOUS CASES 38 1. Background In numerical problems one is often presented with an equation of the type Ax = b waere x and b are vectors of length n and A is an n x n matrix. To obtain the value of x - A -1 b is an expensive project in terms of computer time and memory if n is large. Typically this computation is incidental to the main problem ana x may have to be re-evaluated many times due to changes in b and/or A. Therefore tne cost of the computation is not proportional to its importance. Additionally the A matrix may be sparse (i.e. having few nonzero entries). Standard matrix inversion routines using Gauss elimination require that the matrix be stored in its entirety. Further the ordinary pivot selection algorithm will in most cases introduce a large number of new nonzero entries in A, which in turn increases the number of arithmetic or needed to solve for x. The purpose of this report is to describe a pre ,/. m which saves a significant amount of computer time and space in the inversion of sparse matrices. There are three factors which make the program more efficient for sparse matrices: 1. The pivot selection algorithm is designed to introduce few new nonzero matrix elements. 2. The matrix A~ is never actually computed. A record (in subroutine form) is kept of the operations required to produce the resultant vector x. 3. The A matrix is not stored as a whole. Instead various arrays are constructed which give necessary information about the nonzero elements only. 2 . Compile Tine and Execution Time Processes Prior to the inversion process the A matrix is analyzed and its nonzero elements are stored along with information about the elements. (See Appendix II) . In particular the elements of A are classified according to their type; an integer is type one, a constant type two, and a variable type three. The procedure used in obtaining the x vector is Gauss elimination-back substitution performed on A and b. This procedure is a series of operations of the form: a/d ■»■ a a - b*c ■* a The first type is used in the division of row elements by the pivot, while the second is used in row operations. The program, SPARSE, which formulates the sequence of such operations needed to produce x, outputs this information in the form of two subroutines, MATINV and MATMUL. The operations involving only elements of A are placed in MATINV; those involving b are put in MATMUL. Thus when x must be re-evaluated due to changes in the b vector, only MATMUL need be called. When A charjes, both subroutines must be executed. Since the elements of A are classified by type, i t is possible for SPARSE to recognize operations exclusively involving elements which never vary (types one and two) , and to perform these operations at compile time (during the compilation of MAT IN V and MATMUL) . In such a case no code is compiled so that the compiled subroutines contain fewer instructions. This is quite a saving since SPARSE is executed only once, while MAT IN V and MATMUL are executed frequently. 3. Pivot Selection Algorithm The pivot selection strategy is an important part of the program, since the choice of different pivots or the same pivots in a different sequence greatly effects the number of arithmetic operations necessary for the inversion. The selection is based on the number of nonzero elements in a row or column in order to avoid introducing new elements during the Gauss elimination. The following are some of the pivot selection algorithms which were tested. 1. Minimum column, minimum row: This strategy compares all columns of A with respect to the number of nonzero elements in the column. The column (s) which is minimum is then searched for the element with the minimum number of nonzero elements in its row. (The nonzero comparisons refer only to that part of A not already involved in pivoting.) 2. Minimum row, minimum column: This algorithm is the same as the first applied to the transpose of A. 3. Minimum product: Each matrix element is compared according to the product of the nonzero element count of its row and that of its column, rhe basis for judging the efficiency of a method was the number of computer operations required to obtain x. (See Chapter 7.) rhe first two algorithms were found to be about equivalent, rhe minimum product strategy was much more efficient in some rases, but less efficient in the majority of examples. The present program uses the first method. In the search for an optimal pivot candidate, two or more elements nay have identical nonzero row and column counts. In such cases the new element, b, is tested to see if it is the integer one. [f so, b is the best element so far. If b is not equal to one, Its type (see Chapter 2) is compared with that of the former sivot candidate, a. If b is of a simpler type than a, it is sreferred as a pivot. If a and b are both type one (integers), i is retained as the best choice. If a and b are both type :wo or tiiree (constants or variables), they are compared by ;ize and the larger (in absolute value) is chosen. In summary lonvarying elements, types one and two, are preferred to those which change when the A matrix changes, and large elements are preferred to small ones. The motivation for choosing elements of types one and two rather than those of type three is that row operations or pivot divisions involving only nonvarying elements can be performeu at compile time as explained in Chapter 2. Integers are preferred over constants, since integer arithmetic not involving division is exact. 4. The Inversion Process The program consists of a group of routines which are divided into two types; those which are used initially to set up and analyze the A matrix, and those used repeatedly to obtain the x vector. Appendix I gives a diagram of the interaction of these programs. The routine SETUP codes information about the A matrix for use by SPARSE. It does this by differentiating a series of expressions E Q to form 3E. 3E. SPARSE performs the bulk of the decision making for the inversion. SPARSE calls COMPIL (an assembly language program) to generate the routines (MATINV and MATMLL) whicli manipulate A and b to obtain x. As was mentioned previously, the inverse of A is never calculated explicitly, rather SPARSE records only those operations which directly effect the final value of x. These essential operations are of two types which correspond to f orri ng A~ and finding the product A - £>. The former arithmetic is contained in the routine MATINV and the latter in MATMLL. Hence if b changes more often than A, the x vector can be obtained frequently by calling MATMUL only. When the A matrix changes, both MATIWV and MATMUL must be called in that order. Since all decisions pertaining to pivot selection, etc., are made by SPARSE, the generated routines are very simple and execute rapidly. They are straight-line code using only basic machine instructions (LOAD, STORE, ADD, SUBTRACT, MULTIPLY, DIVIDE, and LOAD COMPLEMENT). The COMPIL routine is also simple in that its function is to translate three parameters into one instruction and store that instruction in the appropriate location in memory. The parameters to COMPIL indicate the arithmetic operation, the array location containing the operand, and the subroutine (MATINV or MATMUL) which is effected. (See Appendix IV) 10 5 . The Synbolic Inversion This section will outline in iaore detail the process by which MAT IN V and MATMliL are generated. The SLTLP routine uses eight arrays to store data about the A matrix. These arrays are described in Appendix II. Since only nonzero elements are considered, information about the position of these elements must be available. This data, along with pointers to the next elements in the sane row and column, information about the simplicity of the element, and the location of its numerical value are passed tc the SP/iRLSE routine. Basically SPARSL analyzes the attributes of the A matrix and tries to determine the best code for MATIdV and MATMLL. It can be divided into three sections: selection of a pivot, Gauss elimination and back substitution. The pivot selection process uses the algorithm described in Chapter 3. The matrices used to store information about A contain information about the inversion process itself at different stages of the analysis of the matrix. The set cf columns which has a minimum nonzero count is recorded by means of a chain. The pointers to columns in the chain arc stored in the location formerly used for nonzero column counts. 11 (This space is no longer needed since all columns in the chain have the sane nonzero count, which is stored in a single location. ) When all columns have been compared , those in the minimum chain are searched for the element which is the best pivot according to nonzero row count, type, ana size. After a column has been searched in this fashion the information about its nonzero count is restored if the column is not used for pivoting. The space for the nonzero count on the column which does contain the pivot is set to zero during pivoting ana to minus one after pivoting. When the pivot element is chosen, the instructions to divide the elements in the pivot row and the b vector element corresponding to the pivot row are compiled (if the pivot equals one no instructions are compiled) . 7\s each pivot row element is divided, its row number is set negative to indicate that it is in a pivot row. Then the nonzero count in the column of this element is reduced by one. During the divisions the pivot element is neglected (as are elements in pivoted columns) and at the completion of the pivot division it is removed from the row chain. Finally the b vector element is divided by the pivot. 12 The elements in the pivot column are then checked to determine which ones are to be eliminated (those not in pivoted rows). When an element to be eliminated is founu, its row is searched simultaneously with the pivot row to decide what row operations are necessary. The indicated instructions are compiled to change the elements in the eliminated-element row and the corresponding b entry. (See Appendix V) Luring the elimination process the compiled arithmetic is also performed on the variable elements (type three) by SPARSE, since it is necessary to avoid choosing an element as pivot which has value zero or is very small. As the pivot column is being searched in the elimination process, the pivot and eliminated elements are removed from the column chain (i.e. the preceding column element is made to point to the element following the one removed) . The place in the I IX array (see Appendix II) formerly occupied by an eliminated element is placed on a chain of free elements. These spaces are then reused when a new matrix element is generated by the elimination process. The pivot element is placed on a backward pivot chain (i.e. the beginning of the chain is the most recently chosen pivot). Pointers to elements in the free element chain are in the fix location normally used 13 to give the type of the element, while the back pivot chain pointers occupy the location for the column chain pointers. When all elements in the pivot column have been checker (and eliminated where appropriate) another pivot is selected and the procedure is repeated. (The last pivot requires little processing, since it is the only nonzero element in its row. Instructions are compiled to simply divide the b entry by the pivot value and store the result in x. ) The back pivot chain is used in the back substitution process in the following way. The column of the first pivot on the chain is searched for remaining elements ana as they are found, instructions are compiled to alter the b vector in those entries where the corresponding rows of A have elements in the pivot column. This continues until the pivot is the only element in the column. Whenever an element is the last nonpivot in its row, the b entry for that row is stored in the x vector in the position corresponding to the column number of the pivot in the row. This indicates that no further changes are necessary in the row. When a pivot column contains only the pivot element, the pivot is removed from the back pivot chain anu the process is repeated for the next pivot. After all pivots have been 14 used in the back substitution process, the x vector is in its final form (x = A _1 b) . Flow charts of SPARSE and COMPIL are given in Appendix III for more specific information on the mechanics of the program. 15 6 . Register Optir.ization The COMPIL subroutine generates the machine code for riNV and MATML'L in such a way that all computation is performed in floating point registers and 2, with general registers 1 through 15 used as base registers in addressing the arrays PW , XIN, and XOUT. (General register holds the address of the beginning of MATINV or IIATMbL at execution time.) MATINV and IIATMLL are called with the addresses of the arrays as parameters so the subroutine can use relative addresses in their register-to-storage instructions. The XIN and XOLT arrays have fixed length, n, and are referenced using registers 14 and 15, respectively, as base q registers. These are adequate for all n < 2 . The PW array on the other hand has variable length depending on the number of nonintegers in the A matrix. The limitation on its length is imposed by n (i.e., #PW entries < 2 ). Since there are only 13 index registers available for PW , at any one time only 13*2~ locations can be accessed (if PW is double precision, this figure is further reduced to 13*2"). This restriction on the length of PW is circumvented in the following way. Initially the address of PW(1) is in register 1 and those of PW(n*2 10 ), n = 1, 12 in registers 2 1G through 13. (This is done by taking the PW parameter address as the value of register 1 and adding U09G to this for each successive register.) Thereafter when a PW location not currently accessible is required, the base register useu lease recently is reset to contain the needed base address. The recomputation of the base register value is done within the subroutines by instructions also generated by COMPIL, In order to accomplish this, COMPIL keeps a count of the instructions generated and stores the current count in a table in a position corresponding to the base register used in the instruction. The decision of what base address to discard is made by searching this table. A second table is needed to indicate which base registers contain various base addresses. For example, if PW(14,000) is needed for an instruction, the entry in this second table corresponding to base address 14 (indicating PW (13312)) is checked. If there is a positive number in that position in the table, it gives the base register to be used for PW(14,000); if there is a negative number, no base register is currently being used for that part of the PW array. In the latter case the last used register is set to the desired base 17 address and the table entry for its former base address is made negative. In the matrix inversion computations performed by the generated suoroutines , floating point register 2 is uscct when possible for the values of elements which are used several times in succession (i.e., the value of the pivot during the division of the pivot row, or the value of the eliminated element during the row operations due to the elimination step) . Floating point register is used for those values which change with each operation (i.e., the pivot row elements to be divided or the elements in the pivot row used in operations of the type a - b*c ■* a) . In this way more economical register-to-register instructions can be used, with register being reloaded for eacn new element used and register 2 remaining fixed for a period of several operations. These decisions are made within SPARSE. Elements with special properties also provide the opportunity to cut down on the number of register-to-storage instructions. A comprehensive description of these properties is given in Appendix V. 18 7. Tests Two types of tests have been employed in the development of the program. The first type checks the efficiency of the pivot selection algorithms and the second type determines the accuracy of the inversion. The pivot selection algorithm test consists of varying the initial part of the SPARSE program so that different methods of pivot selection are used on the same examples with all procedures following the selection of a pivot remaining fixed. The number of times COHPIL is called (equivalent to the number of instructions generated for the subroutines) is compared for all methods tested. A summary of these tests was given in Chapter 3. The accuracy tests refer to the validity of the matrix inversion procedure as well as to the numerical accuracy of the results. A check on the inversion procedure was accomplished by printing all operations performed during the execution of SPARSE and duplicating these by hand on the same matrices. The examples useu in this way had eight as a maximum value of n. The test for numerical accuracy (which of course implies a check on the inversion logic) is a program which 19 calculates an arbitrary b vector and calls SETUP, MATSET , and SPARSE to produce MATINV and MATMUL for some A matrix. The two generated routines are then executed to produce an x vector, Finally the following is calculated | |Ax - b] | to measure the accuracy of the output vector, x. 20 APPENDIX I DIAGRAM OF SUbROUTINE INTERACTION 21 SPARSE SYMBOLICALLY INVERTS A AND PREMULTIPLIES b BY A" 1 (PIVOT SELECTION, GAUSS ELIMINATION , AND BACK SUBSTITUTION) SETUP GENERATES THE ARRAYS CONTAINING INFO. ABOUT A * MATSET EVALUATES THOSE ELEMENTS IN A WHICH ARE] VARIABLES COMPIL SET3 EACH MACHINE INSTRUCTION IN MEMORY IN THE APPROPRIATE SUBROUTINE T MATINV £ MATMUL ♦ PERPORMS ARITHMETIC RELATED TO FORMING A" 1 PERFORMS ARITHMETIC RELATED TO FORMING A _1 b 22 APPENDIX II DESCRIPTION OF ARRAYS 23 MX is an integer array which is the primary source of information about the elements of A. It is dimensioned 6 x K, where K is the upper limit on the number of elements possible in A. (Usually n*n) There are six entries for each element which contain the following data. fix (1,1) contains the type of the element, initially 1, 2, or 3, meaning integer, constant, or variable, respectively. If the type is 0, the element has been reduced to zero value and will be removed from A. This space is also used for pointers in a chain of free spaces in MX, where the previous element has been removed. fix (2, I) contains the row number of the element. This is set to minus the row number when the row is pivoted. MX (3, 1) contains the column number of the element, 24 MX (4, 1) points to the location of the next row element in MX. MX (5,1) points to the location of the next column element in MX. If the element is a pivot this points to the next element in the backward pivot chain. MX (6, I) contains the numerical value of type 1 elements and the location of this value in PW (see below) for type 2 and 3 elements. IR1, IC1, IR2, and IC2 are integer arrays dimensioned 1 by n, IR1(I) gives the location in MX of the first element of row I. IC1(I) similarly gives the first element of column I. IR2(I) contains the number of nonzero elements in row I . 25 column I. During the pivot selection this slot is a pointer to the next column in a chain of columns with minimum nonzero counts. IC2(I) is set to while the I-th column is being pivoted and to -1 after pivoting. PW is a floating point array dimensioned 1 by K where K is the upper limit on the number of constants or variables possible (usually n*n) . It is used to store the numerical values of noninteger elements in A. The values of variables are set into PW by the routine MATSET which computes new values in A. The values of constants are set in PW by the routine, SLTuP. The first twenty locations are filled with nonzero integers between -10 and +10. These locations are referenced wnen computation must be compiled involving an integer. IQ and IP are integer arrays, dimensioned 1 by n and 1 by n**2, respectively, containing information about the position of elements in PW which are variables. These arrays are used by MATSET to compute the PW entries initially and each time the A matrix cnanges. IQ(I) gives the number of elements to be computed in the I-th column of A. 26 IP (J) gives the row numbers of successive entries in PW which are variables. For example, if IQ(1) = 4 and the first four entries in IP were 6, 8, 11, 15, the nonzero variable elements in column 1 of A would appear in rows 6, 8, 11, 15. XIN and XOUT are double precision arrays dimensioned 1 dv n which correspond to the vectors b and x, repsectively. They are used by MATMUL. 27 APPENDIX III FLOWCHARTS OF SPARSE AND COMPIL 28 K £ t SE" 8*3 J it pa. a = £5 29 30 3l£ — jg Ul Ul 5 5 £B 1 I SSal 31 W £? !-• H Ul 2 5 -i o « S- ft. fa" > « q2 £ £ SB O Z > >• p « fal Q < M E-> ft. > as to ww W ^ K ° « H |9! 3! 32 o - i 33 3 2 B * B S •• s e — •< ■ — » 1 ' 5 ! ? SEg 1 i\. - \ - 3 3 X ■ - * ■ 34 APPLiMDIX IV CALLS TO COMPIL 35 The form of the call to COMPIL is Call COMPIL (I, J, K) . The following tables give the values of the three parameters ana the corresponding interpretation. Vhe first parameter may assume 25 different values, most inuicating a machine instruction involving floating point registers. I If K=0 when 1=0 If K=l when 1=0 -1 Compile end of subrts same Compile beginning of subrts same 1 LE 2,- LD 2,- 2 LE 0,- LD 0,- 3 LD 0,- same 4 STE 0,- STD 0,- 5 STD 0,- same 6 LCEP 2,2 LCDR 2,2 7 LCDR 0,0 same 8 DER 0,2 DDR 0,2 9 I LLP. ,2 MDR 0,2 10 bE 0,- Db 0,- 11 ILL ,- MD 0,- 12 DD 0,- same 13 \j 0,- same 14 AE 0,- AD 0,- 15 AD 0,- same 16 LCER 0,2 LCDR , 2 17 LER 0,2 LDR 0,2 lb LCER 0,0 LCDR 0,0 19 SD 0,- same 20 Lb 2 ,- same 21 LCbR 0,2 s ame 22 SDR 0,2 same 23 ADR , 2 same 36 The second parameter gives the array index of the element to be used in register-to-indexed-storage instructions. In register-to-register instructions, it is ignored. The third parameter may take on four different values which in general indicate the array referenced in r-x instructions and the subroutine (MATMUL or MATINV) which is being added to. The following table gives the values of k and their meaning. k When 1^0 dhen 1 = array index refers to PW array is PW, instruction is for single precision MATINV 1 array index refers to PW array is PW, instruction is for double precision MATMUL 2 array index refers to XIN, instruction is for not to be used r'iATMUL 3 array index refers to XOLT, instruction is for not to oe used MATMUL 37 To illustrate the manner in which instructions are generated for MATINV and MATMUL, take the following example. The pivot element throughout the example is A(6,4); its value is in PW(13) / which indicates it has value +3. There is a non-zero element in A (6 ,7) with value in PW(25). The element being eliminated in the Gauss elimination process is A (8, 4) , stored in PW(30). There is an element 7.(8,7) stored in Pis (32). During the back substitution process there is an element A(2,4), located in PW(35); it is the last non-pivot in row 2, and the pivot for row 2 is in column 5. CALLS TO COMPIL CODL FOR MATINV CODE POD MATMUL CALL COMPIL CALL COMPIL CALL COMPIL CALL COMPIL CALL COMPIL CALL COMPIL CALL COMPIL CALL COMPIL CALL COMPIL CALL COMPIL CALL COMPIL CALL COMPIL CALL COMPIL CALL COMPIL CALL COMPIL CALL COMPIL CALL COMPIL CALL COMPIL CALL COMPIL CALL COM.PIL CALL CuMPIL CALL COMPIL CALL COMPIL 1,13,0) 2,25,0) 8,0,0) 4,25,0) LE 2, PW(13) LE 0, PW(25) DR 0,2 STE 0, PW(25) More pivot row elements Doing divided will follow this repeatinq the last three instructions 3,6,2) The b element LD 0, XIN (6) 10,13,1) is being DE 0, PW(13) 5,6,2) divided STD 0, XIN (6 ) 1,30,0) LE 2, PW(30) 6,0,0) LCER 2,2 2,25,0) LE 0, PW(25) 9,0,0) MER 0,2 14,32,0) AE 0, PW(32) 4,32,0) STE 0, PW(32) More elements in the eliminated rcw will be cr. a n a e d foil ow in g this, repeating the last four instructions 2.30.1) The b element LE 0, PW(30) 7,0,1) in the row of LCDR 0,0 13.6.2) the elim eleto ML 0, XIN (6) 15,3,2) is being chang- AD 0, XIN (8) 5,8,2) ed due to elim. STD 0, XIN (8) 2 0,6,2) This is the 21,0,1) back subst 11,35,1) proc which 15,2,2) puts the 5,4,3) result in x LD 2, XIN (6) LCDR 0,2 ME 0, PW(35) AD 0, XIN (2) STD 0, XOu"T(5) 38 APPENDIX V CODE GENERATED FOR VARIOUS CASES 39 This Appendix shows various cases of a/d ■* a and a - b*c ■* a which occur in SPARSE, and the resulting computation. The following abbreviations are used: PV: Pivot element P: Pivot row element (^ PV) XINi: Entry in b vector corresponding to row of i element XOUTj_: Entry in x corresponding to column of pivot in i element row T#: Type of element, Tl = Type one, etc. E: Eliminated element X: Eliminated row element (5* E) These are snown in the figure below. ARRAY A: — F-6 Pivot ROW Pivot Column 40 Operations of the Type a/d ■♦ d (if PV = 1, no operations required) p?" = Tl, PV = Tl or T2 NO CODE COMPILED form P/PV -*- P (if new P = T2 get new space in PW] | P = T2, PV = Tl or T2 j form P/PV P = T3 or PV = T3 NO CODE COMPILED MATINV CODE Compile *LE 2, PV (compiled only once for each pivot) LE 0, P DER 0,2 STE 0,P When a is in the b. vector MATMUL CODE compile LD 0, XIN p DE 0, PV **STD 0, XIN pv * If only one non-pivot in pivot row, compile LE 0, P DE 0, PV STE 0, P 41 II. Operations of the Type a - b*c -> a P = Tl, E = Tl, X = Tl NO CODE COMPILED form X - E*P ■* x (if new X is too large make X = T2) HO CODE COMPILED P = Tl or T2, E = Tl or T2, X = Tl or T2 form X - E*P ■* X (change X to T2 if it was Tl, get space for it in PW array) P = Tl or T2, E = Tl or T2, X = T3 I1ATIMV CODE form y = -P*E compile LE 0, y AE 0, X